Some time ago I described how to use the Video Intelligence (VI) API to create the ‘dna’ of a film by using VI labels. You can then use these fingerprints to find films that are likely to be copies, or similar to other films. We did a Totally Unscripted show a couple of years ago to describe the technique. You can see the video here.
In this post I’ll show another technique that creates a different kind of DNA, this time based on color. It can enhance film disambiguation alongside the VI content labels, and it can also be used to find films whose shots look alike. Color is often used to set the atmosphere or feel of a film, so we can find films that share an atmosphere in a more subtle way than by simply comparing content. This article is limited to showing how to create color ‘strips’ of a film. I’ll do another later to show how these strips can be compared with each other for similarities.
First though, let’s recap on content dna. Using the VI API we can analyze a film and pick up these kinds of labels.
These can be used to navigate through a film. Here for example, I’ve noticed that we have a frame label of ‘gym’, and clicked on it to jump to the part of the film with a gym. The ‘dna’ of the frame labels is shown on the right – color coded for each label. You can imagine if we had a duplicate or film with similar content, the dna signature would be similar.
For disambiguation, we also use ‘object tracking’. This follows a given item’s movement through the film. For example, if we had 2 films, both with a dog and a balloon making the same relative movement, there’s a good chance they are the same film, especially if backed up by other coincident object tracking matches.
Color strips
Here’s a music video with a color strip dna just below the film, as well as content labelling.
Just as we used the content dna to navigate the film, we can also use the color tracking dna to navigate by clicking on a color in the strip.
I clicked on the yellowish section to get here.
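As a rough illustration of how a click on the strip could be turned into a position in the film, here is a minimal sketch. It assumes the strip layout produced by the script later in this post (256 samples, each 3 pixels wide, spread evenly over the film's duration). The function name strip_x_to_seconds and its arguments are hypothetical, and this is not necessarily how the player shown above does it.

```shell
#!/bin/bash
# Hypothetical helper: convert a click at pixel X on the strip to a time offset.
# Usage: strip_x_to_seconds X DURATION_SECS [IMAGES] [PX_PER_IMAGE]
# Assumes IMAGES evenly spaced samples, each PX_PER_IMAGE pixels wide.
strip_x_to_seconds() {
  local x="$1" secs="$2" images="${3:-256}" px="${4:-3}"
  awk -v x="$x" -v s="$secs" -v n="$images" -v p="$px" 'BEGIN {
    i = int(x / p)             # index of the sample this pixel belongs to
    if (i > n - 1) i = n - 1   # clamp clicks on the right hand edge
    printf "%.2f\n", i * s / n # start time of that sample in seconds
  }'
}

# a click halfway along the 768 pixel strip of a 4 minute music video
strip_x_to_seconds 384 240   # prints 120.00
```

Because each sample covers duration/256 seconds, the result is only as precise as the sampling interval, which is fine for jumping to roughly the right part of the film.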
How to make color strips
Here’s a shell script that copies a film from Cloud Storage and finds its entry in my database by using the md5 of the file (my films are named by their md5). Because I want all the strips to be the same length regardless of the length of the film, I normalize the sample rate so that every film yields the same fixed number of sample images, and reduce each sampled image to a single pixel of its average color. Then we combine those image files into a strip, load it to Cloud Storage, and update my database with the url of the color strip.
#!/bin/bash
# film name
FILM=$1
# the api key to access my api to update my database
APIKEY=$2

# some of this may not be relevant for you
BUCKET="gs://MY_BUCKET/"
PFX="MY_DESTINATION_GCS_FOLDER/"
FX="MY_VIDEO_SOURCE_GCS_FOLDER/"
API="https://MY_API_ENDPOINT"
if [ -z "$FILM" ]; then
  echo "Arg 1 should be the film name"
  exit 1
fi
echo "...doing film ${FILM}"
# the strip name, sampling rate and output folder are all derived further down,
# so the api key is the only other argument needed
if [ -z "$APIKEY" ]; then
  echo "Arg 2 should be the api key"
  exit 1
fi
## get the md5 of the film
## eg 0006c7fe42adaa132c21324e121c304c.mp4 = 0006c7fe42adaa132c21324e121c304c
MD5="${FILM%.*}"
echo "...working on md5 ${MD5}"

# this is my small app to pick up the ID in my database that matches the film MD5
# you'll have to replace this however you are going to record the image color strip produced
FILMMASTER=$(node index.mjs -v false -a "${API}" -k "${APIKEY}" -m "${MD5}")
echo "...FILMMASTER ${FILMMASTER}"

## get film from gcs
gsutil cp "${BUCKET}${FX}${FILM}" ./
# get the current frame rate
# avg_frame_rate comes back as a fraction such as 25/1 or 30000/1001, so let bc evaluate it
# ffprobe gets installed with ffmpeg
FPS=$(ffprobe -v error -select_streams v:0 -show_entries stream=avg_frame_rate -of default=noprint_wrappers=1:nokey=1 "${FILM}" | bc -l)
if [ -z "$FPS" ]; then
  echo "Couldnt get the frame rate"
  exit 1
fi

# now get the duration in secs
S=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "${FILM}" | bc -l)
if [ -z "$S" ]; then
  echo "Couldnt get the duration"
  exit 1
fi
# we want a standard number of images no matter the fps or duration
# so in other words change the frame rate
# FPS is reused here to hold the sampling interval: duration in secs / number of images required
# it is inverted back into a frame rate when passed to ffmpeg as fps=1/${FPS}
IMAGES=256
FPS=$(echo "$S / $IMAGES" | bc -l)

echo "...${FILM} (seconds:${S} sampling interval ${FPS}s for ${IMAGES} images)"
FOLDER="strips/"
STRIP=$(basename "${FILM}")
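echo "...result will be in ${FOLDER}${STRIP}"
SCRATCH="./tmp/"
PREFIX="${MD5}-"
EXT=".png"
# make sure the working folders exist
mkdir -p "${SCRATCH}" "${FOLDER}"

# make the series of images
# %04d is the ffmpeg sequence pattern, giving MD5-0001.png, MD5-0002.png and so on
BITS="${SCRATCH}${PREFIX}%04d${EXT}"
FINAL="${FOLDER}${STRIP}${EXT}"

# clean the previous stuff in case there's any around
# the * has to sit outside the quotes or the shell won't expand it
rm -f -- "${SCRATCH}${PREFIX}"*"${EXT}"
rm -f -- "${FINAL}"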
# this will cut the film into a standard number of images and take the average color
ffmpeg -i "${FILM}" -hide_banner -loglevel error -vf "tblend=all_mode=average,scale=1:1,fps=1/${FPS}" "${BITS}" < /dev/null
# number of files created
# count them rather than parsing the last file name, which depends on sort order
NFILES=$(ls "${SCRATCH}${PREFIX}"*"${EXT}" | wc -l | tr -d ' ')
echo "...${NFILES} created in ${SCRATCH}"
# get the files created and combine
ffmpeg -i "${BITS}" -hide_banner -loglevel error -filter_complex "scale=3:75,tile=${NFILES}x1,avgblur=sizeX=2" -update true "${FINAL}" < /dev/null

# clean tmp files
rm "${SCRATCH}"*
echo "...final result is ${FINAL}"
# now move to gcs
# PFX already ends with a / so there's no need to add another
DEST="${BUCKET}${PFX}"
gsutil cp "${FINAL}" "${DEST}"

# delete the film
rm "${FILM}"

# do the mutation
# this is my small app to update my database
# you'll have some other way of recording the result location on cloud storage
node index.mjs -v true -a "${API}" -k "${APIKEY}" -f "${FILMMASTER}" -s "${STRIP}${EXT}"
strip.sh
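To make the normalization arithmetic in strip.sh concrete, here is a small worked sketch. It uses the same constants as the script (256 images, each scaled to 3 pixels wide), but the function names sample_interval and strip_width are made up for illustration, and awk stands in for bc so the snippet has no extra dependencies.

```shell
#!/bin/bash
# Hypothetical helpers illustrating the arithmetic used in strip.sh.

# Seconds between sampled images: duration / number of images.
# Usage: sample_interval DURATION_SECS [IMAGES]
sample_interval() {
  local secs="$1" images="${2:-256}"
  awk -v s="$secs" -v n="$images" 'BEGIN { printf "%.6f\n", s / n }'
}

# Width in pixels of the finished strip: images * pixels per image.
# Usage: strip_width [IMAGES] [PX_PER_IMAGE]
strip_width() {
  local images="${1:-256}" px="${2:-3}"
  echo $(( images * px ))
}

sample_interval 64     # a 64 second ad: prints 0.250000
sample_interval 5400   # a 90 minute film: prints 21.093750
strip_width            # prints 768
```

So a 64 second ad is sampled 4 times a second and a 90 minute film about once every 21 seconds, yet both end up as a strip 768 pixels wide, which is what will make the strips directly comparable.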
Batching
I have many thousands of films to process, so I’ll need a script that can compare all the video files I have on GCS with all the image files, and select those that don’t yet have a color strip prepared. Here’s how:
BUCKET="gs://MY_BUCKET/"
gsutil ls "${BUCKET}MY_VIDEO_FOLDER" > films.lst
cat films.lst | sed -E "s/.*\///g" | sort > vids.lst
gsutil ls "${BUCKET}MY_IMAGE_FOLDER" | sed -E "s/\.png$//" | sed -E "s/.*\///g" | sort > strips.lst
## find vids without strips
## -23 keeps only the lines that are unique to vids.lst
comm -23 vids.lst strips.lst > work.lst
Video files without strips – mks.sh
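The set difference at the heart of mks.sh can be tried out locally without touching GCS. This is a minimal sketch with made-up file names. The helper missing_strips is hypothetical, and it forces the C locale so that sort and comm agree on ordering, which comm relies on.

```shell
#!/bin/bash
# Hypothetical helper: print entries of VIDS_FILE that have no match in STRIPS_FILE.
# Usage: missing_strips VIDS_FILE STRIPS_FILE
missing_strips() {
  local tmp
  tmp=$(mktemp -d)
  # comm needs both inputs sorted using the same collation
  LC_ALL=C sort "$1" > "${tmp}/a"
  LC_ALL=C sort "$2" > "${tmp}/b"
  # -2 drops lines only in the second file, -3 drops lines in both
  LC_ALL=C comm -23 "${tmp}/a" "${tmp}/b"
  rm -r "${tmp}"
}

DEMO=$(mktemp -d)
# three films on storage
printf '%s\n' "aaa.mp4" "bbb.mp4" "ccc.mp4" > "${DEMO}/vids.lst"
# two strips already made, with the .png extension stripped as in mks.sh
printf '%s\n' "ccc.mp4.png" "aaa.mp4.png" | sed -E 's/\.png$//' > "${DEMO}/strips.lst"

RESULT=$(missing_strips "${DEMO}/vids.lst" "${DEMO}/strips.lst")
echo "${RESULT}"   # prints bbb.mp4
rm -r "${DEMO}"
```

Note that comm -3 on its own would also list strips that have no matching video, in a second tab-indented column, so -23 is the safer choice for building a work list.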
Processing the batch
Since there are new films arriving all the time, I usually select out a chunk at a time with something like this
sh mks.sh && head -n 100 work.lst > w.lst
Then, run the strip.sh for the selected chunk
KEY=MY_API_KEY
while IFS= read -r LINE; do
  echo "$LINE"
  bash strip.sh "${LINE}" "${KEY}"
done < w.lst
work.sh – process the list of videos
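With thousands of films, some will inevitably fail (a corrupt file, a missing database entry). One possible variation on work.sh, sketched here, carries on past failures and records them for a later retry. The names run_batch, demo_handler and failed.lst are hypothetical, and the demo handler is only a stand-in for the real call to strip.sh.

```shell
#!/bin/bash
# Hypothetical helper: run HANDLER once per line of LIST_FILE, carrying on after failures.
# Usage: run_batch LIST_FILE FAILED_FILE HANDLER
run_batch() {
  local list="$1" failed_file="$2" handler="$3"
  local ok=0 failed=0 line
  : > "$failed_file"   # start with an empty failure list
  while IFS= read -r line; do
    [ -z "$line" ] && continue
    # stdin is redirected so the handler cannot swallow the rest of the list
    if "$handler" "$line" < /dev/null; then
      ok=$((ok + 1))
    else
      failed=$((failed + 1))
      echo "$line" >> "$failed_file"
    fi
  done < "$list"
  echo "${ok} ok, ${failed} failed"
}

# Stand-in for: bash strip.sh "$1" "${KEY}"
# Here it just pretends that any film with "bad" in its name fails.
demo_handler() {
  case "$1" in
    *bad*) return 1 ;;
    *) return 0 ;;
  esac
}

DEMO_DIR=$(mktemp -d)
printf '%s\n' "aaa.mp4" "bad.mp4" "ccc.mp4" > "${DEMO_DIR}/w.lst"
SUMMARY=$(run_batch "${DEMO_DIR}/w.lst" "${DEMO_DIR}/failed.lst" demo_handler)
FAILED=$(cat "${DEMO_DIR}/failed.lst")
echo "${SUMMARY}"   # prints 2 ok, 1 failed
echo "${FAILED}"    # prints bad.mp4
rm -r "${DEMO_DIR}"
```

This only helps if strip.sh exits with a non-zero status when something goes wrong, so it would be worth checking the exit codes of the ffmpeg and gsutil steps there too.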
A few more
Here are a few more color strips from random ads.
bruce mcpherson is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at http://www.mcpher.com. Permissions beyond the scope of this license may be available at code use guidelines