(March 2012)

Offline playback of Youtube videos and their annotations

So you just watched a nifty video on Youtube... it's really cool, so you proceed to download it (via youtube-dl) and keep it in your offline video library. But... the video is also annotated; i.e. it displays these "subtitle-like" text balloons while it plays, and these are a big part of why it is actually worth watching...

How can you save these, too? Resort to drastic measures, like recording your desktop while watching...? Surely there must be a better way - after all, these annotations are plain text that is "splashed on" during playback... Isn't there a way to keep both "sources" (i.e. the video and its annotations data) and play them back "in tandem"?

Different scenario: you are careless, like me:

It's my fault, of course - I assumed that if Vimeo can handle it, so would Youtube - but I was wrong. How can I salvage the effort I spent on my video annotations, and use it with my high-res video (which I still have on my hard drive)?

Open source tools to the rescue...

Getting the annotations

Googling a bit, I soon find out about this:

$ wget -O annotations.xml \

...where VIDEO_ID is the video identification part from Youtube videos, i.e. the part in red in the link below:

Update, October 2013: It seems this service has moved now to...

$ wget -O annotations.xml \

So I try it; and review my video's annotation data...

<?xml version="1.0" encoding="utf-8"?>
<document latest_timestamp="1330014529506131" polling_interval="30">
  <requestHeader video_id="o0BgAp11C9s" />
    <annotation author="ttsiodras" id="annotation_150202" 
        style="anchored" type="text">
      <TEXT>The most important parts for C/C++ coding...</TEXT>
        <movingRegion type="anchored">
          <anchoredRegion d="0" h="47.77" sx="17.68" sy="54.72" 
            t="0:02:42.1" w="96.54" x="2.64" y="4.72" />
          <anchoredRegion d="0" h="47.77" sx="17.68" sy="54.72" 
            t="0:02:51.7" w="96.54" x="2.64" y="4.72" />

This file contains all the annotation data I need:

Ingredient 1 for the soup: annotation data - check.
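
A minimal Python sketch of reading such a file, under two assumptions that the listing above does not confirm: that the fragment is closed with the obvious end tags, and that x, y, w and h are percentages of the frame size. The names parse_time and read_annotations are made up for this illustration; they are not the ones used in my script.

```python
import xml.etree.ElementTree as ET

# The fragment shown above, closed with the end tags one would expect
# (the closing tags are an assumption; the listing is cut short).
SAMPLE = """<document>
  <annotation author="ttsiodras" id="annotation_150202"
      style="anchored" type="text">
    <TEXT>The most important parts for C/C++ coding...</TEXT>
    <movingRegion type="anchored">
      <anchoredRegion d="0" h="47.77" sx="17.68" sy="54.72"
        t="0:02:42.1" w="96.54" x="2.64" y="4.72" />
      <anchoredRegion d="0" h="47.77" sx="17.68" sy="54.72"
        t="0:02:51.7" w="96.54" x="2.64" y="4.72" />
    </movingRegion>
  </annotation>
</document>"""


def parse_time(stamp):
    """Convert 'H:MM:SS.s' (or 'MM:SS.s') into seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def read_annotations(xml_text, width, height):
    """Return (text, start, end, x, y, w, h) tuples, geometry in pixels.

    Assumption: x, y, w, h are percentages of the frame size.
    """
    result = []
    for node in ET.fromstring(xml_text).iter("annotation"):
        text = node.findtext("TEXT")
        regions = node.findall(".//anchoredRegion")
        if text is None or len(regions) < 2:
            continue  # nothing to show, or no start/end pair
        first, last = regions[0], regions[1]
        box = tuple(
            int(round(float(first.get(key)) * size / 100.0))
            for key, size in (("x", width), ("y", height),
                              ("w", width), ("h", height))
        )
        times = (parse_time(first.get("t")), parse_time(last.get("t")))
        result.append((text,) + times + box)
    return result
```

Calling read_annotations(SAMPLE, 1024, 768) yields one tuple per annotation, with the start and end times in seconds and the rectangle in pixels.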

(if my annotations were "subtitle-like" - i.e. small one liners, that can just be placed at the bottom of the screen - then this data would be enough; I would code a simple xml2srt filter, that would create an .srt subtitle for my video (MPlayer can use .srt during playback). Alas, my annotations are more complex; each one is to be displayed at a different rectangular area in the video, so more work is needed...)
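
For the simple "subtitle-like" case, such an xml2srt filter could look roughly like the sketch below. It is an illustration only, not part of my script: it assumes each annotation carries a TEXT element and at least two anchoredRegion elements whose t attributes mark the start and the end, as in the sample above. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_seconds(stamp):
    """Convert 'H:MM:SS.s' into seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def srt_time(seconds):
    """Format seconds as the HH:MM:SS,mmm used by .srt files."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3600000)
    minutes, millis = divmod(millis, 60000)
    secs, millis = divmod(millis, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, secs, millis)


def xml2srt(xml_text):
    """Turn the annotation XML into the text of an .srt file."""
    entries = []
    for node in ET.fromstring(xml_text).iter("annotation"):
        text = node.findtext("TEXT")
        times = [to_seconds(r.get("t")) for r in node.iter("anchoredRegion")]
        if text and len(times) >= 2:
            entries.append((min(times), max(times), text.strip()))
    entries.sort()  # subtitles must appear in chronological order
    blocks = []
    for index, (start, end, text) in enumerate(entries, 1):
        blocks.append("%d\n%s --> %s\n%s\n"
                      % (index, srt_time(start), srt_time(end), text))
    return "\n".join(blocks)
```

The result can be written to a file next to the video and handed to the player as a regular subtitle file.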

Displaying text during playback

Opening up the manpage of the awesome MPlayer, I review the list of video filters, looking for one that allows me to display stuff during playback - and I notice bmovl:

bmovl: The bitmap overlay filter reads bitmaps from a FIFO and displays
       them on top of the movie...

Spot on! Time to try this out: I set up a FIFO, and try sending MPlayer a box of RGB noise, just to see it while playing:

$ cp /path/to/snapshot.png . # this is a 1024x768 snapshot
$ mkfifo bmovlFIFO
$ # Tell MPlayer to play the snapshot over and over (phony video)
$ # and to read bmovl overlay bitmap data from the bmovlFIFO
$ mplayer -vf bmovl=0:0:bmovlFIFO mf://snapshot.png -loop 0 >/dev/null 2>&1 & 
$ sleep 3 # wait a bit for MPlayer to start
$ # Now prepare and send a 517x58 rectangle of noise: 517 x 58 x 4
$ # (times 4 for RGBA: one byte per component) = 119944 bytes
$ dd if=/dev/urandom of=box.rgba  bs=1 count=119944 >/dev/null 2>&1
$ echo 'RGBA32 517 58 100 100 0 1' > bmovlFIFO
$ cat box.rgba > bmovlFIFO
$ # At this point, I should be seeing a rectangle
$ sleep 10

Instead, I see this distorted image in the MPlayer window:

bmovl distortion in current MPlayer (2012/03)

This looks like an off-by-one error - each new scanline moves one pixel to the left, so the box ends up "tilted". Surely this is a bug, so I report it to the MPlayer folks and open a Bugzilla ticket.

A day later, a gentleman replies - he indicates that a related patch is in the pipeline for inclusion in MPlayer...

I check out the latest MPlayer from the official repos, apply the patch to it...

It works! A perfect rectangle is shown during playback!

Ingredient 2 for the soup: showing bitmaps inside MPlayer's window during playback: check.
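
The same experiment can be sketched in Python. The header line mirrors the 'RGBA32 517 58 100 100 0 1' command used above (width, height, x, y, alpha, clear), followed by width x height x 4 bytes of pixel data. The function names are hypothetical and this is not the code of my script, just one way to frame the message.

```python
def rgba32_message(width, height, x, y, pixels, alpha=0, clear=1):
    """Build one bmovl 'RGBA32' command followed by its raw pixel data."""
    expected = width * height * 4  # one byte each for R, G, B and A
    if len(pixels) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(pixels)))
    header = "RGBA32 %d %d %d %d %d %d\n" % (width, height, x, y, alpha, clear)
    return header.encode("ascii") + pixels


def send_overlay(fifo, width, height, x, y, pixels):
    """Write the command and the bitmap to an already opened binary FIFO."""
    fifo.write(rgba32_message(width, height, x, y, pixels))
    fifo.flush()  # make sure MPlayer sees the whole bitmap right away
```

With MPlayer already running as above, opening bmovlFIFO with open("bmovlFIFO", "wb") and calling send_overlay(fifo, 517, 58, 100, 100, os.urandom(119944)) should reproduce the noise rectangle.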

Text to image

There's only one final ingredient missing: I need to convert my TEXT into nice bitmap rectangles, that will be sent to MPlayer over the bmovl FIFO... Well, I use ImageMagick for most of my image processing - it has never failed me before...

$ cat > sampleText
But that's not all - it is also vital to be able to navigate across 
macro, variable, type, function and class declarations - by using ctags, 
I can instantly jump to the relevant declarations by placing the cursor 
on something and hitting "Ctrl-]" (going to the declaration of 
CallTypeLevel1). I use it again to go deeper (to the declaration of BUF_TYPE) 
and go back to where I was, by "Ctrl-t"
$ convert -trim -size 517x358 -pointsize 19 -depth 8 \
    -fill black -background orange caption:@sampleText box.png

And as ever, it works its magic - giving me this:

TEXT converted to bitmap

I improve the output a bit, using some ImageMagick-foo:

$ convert -bordercolor orange -border 15 box.png annotation.png
$ convert annotation.png -fill gray50 -colorize '100%' -raise 8 \
    -normalize -blur 0x4 light.png
$ convert annotation.png light.png -compose hardlight \
    -composite finalAnnotation.png

Now it gives me this:

TEXT converted nicely to bitmap

Final ingredient for the soup: text to bitmap: check.
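
From Python, the first convert call can be driven through subprocess. The sketch below only parameterizes the command shown above; the function names are hypothetical. The second helper relies on ImageMagick's rgba: output prefix to dump raw RGBA bytes, which is the kind of data bmovl expects. Whether my script does the conversion exactly this way is not shown here, so treat it as one possible approach.

```python
import subprocess


def caption_command(text_file, width, height, out_png, pointsize=19):
    """argv for the caption 'convert' call above, with the size as parameters."""
    return [
        "convert", "-trim", "-size", "%dx%d" % (width, height),
        "-pointsize", str(pointsize), "-depth", "8",
        "-fill", "black", "-background", "orange",
        "caption:@" + text_file, out_png,
    ]


def raw_rgba_command(in_png, out_raw):
    """argv that asks ImageMagick for raw 8-bit RGBA bytes (no header)."""
    return ["convert", in_png, "-depth", "8", "rgba:" + out_raw]


def render_caption(text_file, width, height, out_png):
    """Run ImageMagick; raises CalledProcessError if convert fails."""
    subprocess.check_call(caption_command(text_file, width, height, out_png))
```

Note that -trim changes the final size of the image, so the real width and height have to be read back from the PNG before the bitmap is sent to MPlayer.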

Time to enter the Python kitchen... :‑)

Putting it all together in a Python script

The end-user steps:

  1. The user downloads his video from youtube (via youtube-dl or whatever other Youtube downloader):

    $ youtube-dl -o vimPower.flv ''

  2. The video's annotation data are next:

    $ wget -O annotations.xml ''

  3. Then he runs my tiny Python script:

    $ annotations.xml vimPower.flv

The script then...

This is the core of my script's main() function:

width, height, fps = DetectVideoSizeAndLength(sys.argv[2])
childMPlayer = CreateFifoAndSpawnMplayer()
annotations = parseAnnotations(sys.argv[1])
startTime = time.time()
fifoToMplayer = open("bmovl", "w")
# Process the annotations in sorted key order
for bt in sorted(annotations.keys()):
    annotation = annotations[bt]
    # Render the bitmap first, then wait until the annotation is due
    nextTimeInSeconds = getTime(annotation._t0)
    CreateAnnotationImage(annotation, width, height)
    currentTime = time.time()
    if not SleepAndCheckMplayer(childMPlayer, startTime+nextTimeInSeconds-currentTime):
        break  # assumed: a false result means MPlayer has exited
    renderArea = SendAnnotationImageToFIFO(annotation, width, height, fifoToMplayer)
    # Keep it on screen until its end time, then clear the area it used
    nextTimeInSeconds = getTime(annotation._t1)
    currentTime = time.time()
    if not SleepAndCheckMplayer(childMPlayer, startTime+nextTimeInSeconds-currentTime):
        break  # assumed: a false result means MPlayer has exited
    SendClearBufferToFIFO(fifoToMplayer, renderArea)
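
The loop leans on a helper that sleeps until the next event while keeping an eye on the player. A minimal sketch of such a helper is below; it assumes the child is a subprocess.Popen object and that a false result means "MPlayer is gone, stop". The name sleep_and_check and the polling step are choices made for this illustration, not necessarily what my script does.

```python
import subprocess
import sys
import time


def sleep_and_check(child, delay, step=0.1):
    """Sleep for 'delay' seconds; return False as soon as 'child' has exited."""
    deadline = time.time() + max(delay, 0.0)  # a negative delay means "already due"
    while True:
        if child.poll() is not None:
            return False  # the player is gone: the caller should stop
        remaining = deadline - time.time()
        if remaining <= 0:
            return True
        time.sleep(min(step, remaining))
```

Sleeping in small steps means the script notices within a fraction of a second when the user closes the player, instead of waiting for the next annotation. Since the loop measures time from when the script started, pausing or seeking in the player is not accounted for.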

Here's the script: it works for my VIM video, and I have also tested it on a few other Youtube videos. You can see the results (a) in the Vimeo version of my VIM advocacy or (b) download a full-HD version of the same video. If you do decide to use this script, remember that you must also patch your MPlayer, since the bmovl filter is currently (2012/03) broken.


