Streaming HLS video (redux)

In a previous post, I suggested a method for streaming HLS video using ffmpeg. Unfortunately, although I could have sworn it worked at the time, based on what I now know, this seems unlikely.

As far as I can tell, there’s no way you can use the hls muxer, and pipe the output; certainly not in the way I needed to use it, as the playlist is updated over time.

All is not lost though, the solution is to output the necessary files to disk:

app.get('/hls/video', function(req, res) {
    var output = "videos/out.m3u8";
    var stats;
    try {
        stats = fs.statSync(output);
    } catch (e) {
    }
    if (stats) {
        return res.redirect(output);
    }
    var redirected = false;
    var watcher = fs.watch("videos/", (e, f) => {
        if (e === "rename" && f === file) {
            if (!res.finished) {
                res.redirect(output);
                redirected = true;
            }
            watcher.close();
        }
    });
    var proc = ffmpeg(output);
    proc.on("exit", code => {
        if (code === 0) {
            if (!redirected && !res.finished) {
                res.redirect(output);
            }
            watcher.close();
        } else {
            res.status(500).end();
        }
    });
});
  
function ffmpeg(output) {
    var cmd = "ffmpeg";
    var filter = "some complex filter expr";
    var args = ["-i", "video1.mp4"];
    ...
    args.push(
        "-vcodec", "libx264",
        "-pix_fmt", "yuv420p",
        "-f", "hls",
        "-hls_playlist_type", "event",
        "-profile:v", "baseline",
        "-level", "3.0"
    );
    args.push(output);
    return spawn(cmd, args);
}

When a video is requested, we check if the playlist already exists, if so we redirect immediately. If not, we run the ffmpeg command, and add a filewatcher.

Once the playlist (m3u8) is created, we redirect. As this is a “live” stream (playlist type: event), the playlist will be updated as the new chunks become ready.

We need to take care not to leak the filewatchers though, in the event of an error. The final piece is to serve the output artifacts, as static files:

app.use("/videos", express.static("videos", {
    setHeaders: function(res, path, stat) {
        if (path.indexOf(".ts") > -1) {
            res.set("cache-control", "public, max-age=300");
        }
    }
}));

The ts chunks can be cached (upstream, by varnish & cloudfront), but we need to ensure the m3u8 isn’t, as it will be changing until the video is complete.

Returning errors when piping a stream

Most of the examples of piping a stream of data using expressjs look like this:

app.get('/video', function(req, res) {
    var cmd = "ffmpeg";
    var args = [...];
    var proc = spawn(cmd, args);
    res.contentType('video/mp4');
    proc.stdout.pipe(res);
});

Which is great for the happy path, but means any errors from the child proc are returned as a 200; and, in my case, cached. Not ideal.

I googled it pretty hard, and even asked on SO, with no joy. Eventually, I found this article, at which point I realised I’d been asking the wrong question!

The pipe method is part of the base library, nothing to do with express (obvious, in retrospect). And, as the documentation clearly states, it calls end when the readable stream ends.

So, the solution is to do that bit yourself:

proc.stdout.pipe(res, {end: false});
proc.on("error", err => {
    console.log("error from ffmpeg", err.stack);
    res.status(500).end();
}); 
proc.on("exit", code => {
    console.log("child proc exited", code);
    res.status(code === 200 ? 200 : 500).end();
});

Boom! (Just remember to handle all the cases, so end is always called).

Streaming video to iOS devices

UPDATE: this is fake news, see my newer post for more info.

It seems that neither iOS devices, nor Safari on OS X, support mp4; so if you’re trying to stream video, you need to provide another format.

The recommendation is to use HLS, which fortunately is supported by ffmpeg, you merely need to adjust your incantation:

app.get('/hls/video', function(req, res) {
    res.contentType('application/vnd.apple.mpegurl');
    var proc = ffmpeg();
    proc.stdout.pipe(res);
});
 
function ffmpeg() {
    var cmd = "ffmpeg";
    var filter = "some complex filter expr";
    var args = ["-i", "video1.mp4"];
    ...
    args.push(
        "-vcodec", "libx264",
        "-f", "hls",
        "-hls_time", "9",
        "-hls_list_size", "0",
        "-profile:v", "baseline",
        "-level", "3.0",
        "pipe:1"
    );
    return spawn(cmd, args);
}

I could probably use conneg to decide which format to return, rather than the uri, but I’m not convinced that my caching infrastructure (varnish and cloudfront now!) would handle that correctly.

Streaming video from ffmpeg

ffmpeg is a fantastic tool for converting, concatenating, or otherwise fiddling with video content. If you can generate what you need in advance, then you can upload it to s3 (or some other CDN like option); but sometimes you need to stream video on demand.

First, you need to get the ffmpeg incantation right:

ffmpeg -i video1.mp4 -i video2.mp4 ... \
       -filter_complex "something something" \
       -movflags "frag_keyframe+empty_moov"
       -f mp4 pipe:1

Using pipe:1 sends the output to stdout (as documented here), and the movflags are needed to allow streaming in mp4 format.

With this in place, it’s easy to pipe the output data to a browser using expressjs:

#!/usr/bin/env node
"use strict";

const express = require('express');
const { spawn } = require('child_process');

var app = express();

app.get('/video', function(req, res) {
    res.contentType('video/mp4');
    var proc = ffmpeg();
    proc.stdout.pipe(res);

    res.on("close", () => {
        proc.kill("SIGKILL");
    });
});

app.listen(4000);

function ffmpeg() {
    var cmd = "ffmpeg";
    var filter = "some complex filter expr";
    var args = ["-i", "video1.mp4"];
    ...
    args.push(
        "-filter_complex", filter,
        "-s", "1280x720",
        "-acodec", "aac",
        "-vcodec", "h264",
        "-movflags", "frag_keyframe+empty_moov",
        "-f", "mp4",
        "pipe:1"
    );
    return spawn(cmd, args);
}

I found I ended up with “zombie” ffmpeg processes, if the client connection had closed before the video ended (because it’s still trying to write data to the pipe?). There’s probably a neater way to solve that, but kill -9 is pretty effective!

At this point, you can stream some video; but this solution won’t scale, every request runs the ffmpeg command, which is pretty cpu intensive. It would be relatively simple to cache the output, either in memory or on disk; or you could put the node process behind a caching proxy.

However, the best solution for our needs seemed to be using this as an “origin server” for Cloudfront. This means that only the first request (of each type) hits our server, and the response is cached for as long as necessary.

If you need to render different videos for every request, then you’ll just have to throw money at it!

SSH keys in Docker

One of the first stumbling blocks, when using Docker, is often discovering that your SSH keys are on the wrong side of the hatchway.

A simple solution is just to copy them, when building the image:

COPY ~/.ssh /root/.ssh

And this is reasonable, if the image is never going to leave your laptop. Any respectable tinfoil hat wearing neckbeard will look for an alternative though.

An obvious next step is to mount the same folder, as a volume:

SSH=~/.ssh
docker run -it --rm -v ${SSH}:/root/.ssh app npm install

Unfortunately, you’ll probably find at this point that the permissions are not correct inside the container, and that if you try and change them, they are hopelessly entangled with the permissions outside the container.

After much googling, and trying ideas from StackOverflow, the best solution I found was to use the same ssh-agent as the host:

SSH=~/.ssh
docker run -it --rm -v ${SSH}:/root/.ssh -v $SSH_AUTH_SOCK:/ssh-agent -e SSH_AUTH_SOCK=/ssh-agent app npm install

(I’m still mounting the .ssh folder, but only to recycle my known_hosts file).

Or, using compose:

version: '2' 
services:
  app:
    build: .
    volumes:
      - .:/app
      - ~/.ssh:/root/.ssh
      - $SSH_AUTH_SOCK:/ssh-agent

Getting a rax image id

If you use the Rackspace API to create servers (e.g. via the ansible module), you need to know the id of the image you want to use.

It’s pretty easy to make the list request using curl, but the json is quite noisy and it’s paged; so I wrote a script:

#!/usr/bin/env python2

# https://developer.rackspace.com/docs/cloud-images/quickstart/

import os
import json
import urllib2
from urlparse import urlparse

username = os.environ['RAX_USERNAME']
api_key = os.environ['RAX_API_KEY']
region = os.environ['RAX_REGION']

url = 'https://identity.api.rackspacecloud.com/v2.0/tokens'
headers = {
    'Content-Type': 'application/json'
}
req_body = {
    'auth': {
        'RAX-KSKEY:apiKeyCredentials': {
            'username': username,
            'apiKey': api_key
        }
    }
}

data = json.dumps(req_body)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
res_body = json.loads(response.read())
access = res_body['access']
token = access['token']['id']
#print "Token: {0}".format(token)
endpoints = [s for s in access['serviceCatalog'] if s['name'] == 'cloudImages'][0]['endpoints']
endpoint = [e for e in endpoints if e['region'] == region][0]['publicURL']
#print "Endpoint: {0}".format(endpoint)
url = '{0}/images'.format(endpoint)
#print "Url: {0}".format(url)
images = []

while url != None :
    headers = {
        'X-Auth-Token': token
    }
    req = urllib2.Request(url, headers=headers)
    response = urllib2.urlopen(req)
    res_body = json.loads(response.read())
    images = images + map(lambda i: {'id': i['id'], 'name': i['name']}, res_body['images'])

    if 'next' in res_body :
        u = urlparse(req.get_full_url())
        next_page = '{0}://{1}{2}'.format(u.scheme, u.netloc, res_body['next'])
        #print "Next: {0}".format(next_page)
        url = next_page
    else :
        url = None

for i in images:
    print "{0}: {1}".format(i['name'], i['id'])

UPDATE: the rax module allows you to use a name now (e.g. “Debian 8 (Jessie) (PVHVM)”), but this can still be useful for listing other rax “things”