“Trouble parsing json”

We use Bunyan in our node apps, for “structured logging”. The output json string is passed to syslog, by systemd, and then fed into ELK.

{
    "name":"foo-service",
    "hostname":"app-01",
    "pid":30988,
    "ip":"1.19.24.8",
    "requestId":"1c11f448-73f2-4efa-bc63-3de787618d49",
    "level":50,
    "err": {
        "message":"oh noes!"
    }
}

Unfortunately, if that string is longer than 2048 chars (usually a stacktrace, or html returned from a web service instead of json), then the json blob ends up split over 2 lines in syslog.

This causes ELK to barf when attempting to parse the broken lines (assuming you are parsing as json), and means you won’t see those errors in Kibana.

It is possible to detect the error parsing the error, by searching for the string “Trouble parsing json”, but that’s not really a solution.

I would prefer to see a truncated error, than have the current situation, but that means either wrapping or patching Bunyan itself.

Node.js and systemd-notify

Running a nodejs app as a systemd service is pretty simple. But the downside to this approach is that if the app dies straight away (e.g. a config file is malformed), then systemd remains unaware.

An alternative is to switch to using notify:

[Service]
Type=notify
ExecStart=/var/www/app_name/app.js
Restart=always
...

There are various ways to notify systemd that your service is ready; but the simplest is to use systemd-notify, a console wrapper around sd_notify.

this.start = function(port) {
    var deferred = Q.defer();

    server = app.listen(port, '127.0.0.1', function(error) {
        if (error) {
            deferred.reject(new Error(error));
        } else {
            logger.info('Listening on port %d', port);
            exec('systemd-notify --ready');
            deferred.resolve();
        }
        });

        return deferred.promise;
    };
};

If the process that calls systemd-notify is not the one in ExecStart (e.g. you are using cluster), you will also need to set NotifyAccess to “all”.

Now when you use systemctl to start your service, it will wait for the notification before deeming the start process complete.

Jenkins and oauth2_proxy

We hide Jenkins behind bitly’s oauth2_proxy, to control access using our Google accounts. After recently upgrading to Debian Jessie (amongst other things), we found that enabling security on Jenkins (using the Reverse Proxy Auth plugin) resulted in an error:

java.lang.NullPointerException
	at org.jenkinsci.plugins.reverse_proxy_auth.ReverseProxySecurityRealm$1.doFilter(ReverseProxySecurityRealm.java:435)
	at hudson.security.HudsonFilter.doFilter(HudsonFilter.java:171)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1482)
	at org.kohsuke.stapler.compression.CompressionFilter.doFilter(CompressionFilter.java:49)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1482)
...

Following the stack trace, we find ourselves here. It’s pretty obvious that the NRE is caused by u being null, but the real question is why we are in that if block at all.

It turns out that at some point the oauth proxy started sending a basic auth header, as well as the X-Forwarded ones we need. This makes the Jenkins plugin sad, when it tries to look up the user.

Unfortunately, there is currently no way to have one without the other, which is an issue for other upstream applications. Hopefully at some point that flag will be added, but until then I’ve simply deleted the offending line.

Error from postgresql_user

We have an ansible task that creates a postgres user (role) for vagrant:

- name: Create vagrant user
  sudo: true
  sudo_user: postgres
  postgresql_user: name=vagrant role_attr_flags=CREATEDB,CREATEUSER

which was working fine with ansible 1.9; but when we upgraded to 2.0, we started getting an error if the user already existed.

TASK [pg-vagrant : Create vagrant user] ****************************************
fatal: [default]: FAILED! => {"changed": false, "failed": true, "module_stderr": "", "module_stdout": "\r\nTraceback (most recent call last):\r\n  File \"/tmp/ansible-tmp-1457016468.17-9802839733620/postgresql_user\", line 2722, in \r\n    main()\r\n  File \"/tmp/ansible-tmp-1457016468.17-9802839733620/postgresql_user\", line 621, in main\r\n    changed = user_alter(cursor, module, user, password, role_attr_flags, encrypted, expires, no_password_changes)\r\n  File \"/tmp/ansible-tmp-1457016468.17-9802839733620/postgresql_user\", line 274, in user_alter\r\n    if current_role_attrs[PRIV_TO_AUTHID_COLUMN[role_attr_name]] != role_attr_value:\r\n  File \"/usr/lib/python2.7/dist-packages/psycopg2/extras.py\", line 144, in __getitem__\r\n    x = self._index[x]\r\nKeyError: 'rolcreateuser'\r\n", "msg": "MODULE FAILURE", "parsed": false}

When I checked that the user had been created successfully the first time I provisioned:

=# \dg
                                 List of roles
    Role name     |                   Attributes                   | Member of
------------------+------------------------------------------------+-----------
 postgres         | Superuser, Create role, Create DB, Replication | {}
 vagrant          | Superuser, Create DB                           | {}

I noticed that the role actually had Superuser privileges, something the documentation confirmed:

CREATEUSER
NOCREATEUSER
These clauses are an obsolete, but still accepted, spelling of SUPERUSER and NOSUPERUSER. Note that they are not equivalent to CREATEROLE as one might naively expect!

So it looks like somewhere in the toolchain (ansible, psycopg2, postgres) this is no longer supported. Substituting SUPERUSER for CREATEUSER fixed the issue.

Tennis Kata in Erlang

I’ve recently been reading ploeh’s series on doing the Tennis Kata in F#. I decided to have a go in Erlang, we might not have a fancy type system but we do have pattern matching and partial functions.

You can find the source code to follow along here.

I decided to implement a “game” as a gen_server (when you have a hammer…), I refer you to a previous post for the pre-amble.

Love all

The first test is to check the initial score:

initial_score(Pid) ->
    fun() ->
        ?assertEqual(#{p1=>love,p2=>love}, gen_server:call(Pid, get_score))
    end.

I decided to return a map, rather than a tuple or record, and to use the atom “love” rather than 0. The implementation is pretty trivial:

init([]) ->
    {ok, #{p1=>love,p2=>love}}.

handle_call(get_score, _From, State) ->
    {reply, State, State};

If there was more information in the state map, you might want to return a projection rather than the whole object.

Fifteen love

fifteen_love(Pid) ->
    fun() ->
        ?assertEqual(#{p1=>15,p2=>love}, gen_server:call(Pid, {won_point, p1}))
    end.

This time we need to calculate the new score when a player has won a point:

handle_call({won_point, p1}, _From, State = #{p1:=CurrentScore}) ->
    NewState = State#{p1:=new_score(CurrentScore)},
    {reply, NewState, NewState};

...

new_score(love) -> 15.

Love fifteen

love_fifteen(Pid) ->
    fun() ->
        ?assertEqual(#{p1=>love,p2=>15}, gen_server:call(Pid, {won_point, p2}))
    end.

We can re-use the new_score method from before:

handle_call({won_point, p2}, _From, State = #{p2:=CurrentScore}) ->
    NewState = State#{p2:=new_score(CurrentScore)},
    {reply, NewState, NewState};

Fifteen all

fifteen_all(Pid) ->
    fun() ->
        won_point(Pid, p1),
        ?assertEqual(#{p1=>15,p2=>15}, won_point(Pid, p2))
    end.

won_point(Pid, Player) ->
    gen_server:call(Pid, {won_point, Player}).

No need for any new code, but it seemed like time for a little test refactoring.

Thirty fifteen

thirty_fifteen(Pid) ->
    fun() ->
        won_point(Pid, p1),
        won_point(Pid, p2),
        ?assertEqual(#{p1=>30,p2=>15}, won_point(Pid, p1))
    end.

Now we need to add another function head, to handle the new score:

new_score(love) -> 15;
new_score(15) -> 30.

Thirty all

thirty_all(Pid) ->
    fun() ->
        won_point(Pid, p1),
        won_point(Pid, p2),
        won_point(Pid, p2),
        ?assertEqual(#{p1=>30,p2=>30}, won_point(Pid, p1))
    end.

No new code required.

Forty thirty

forty_thirty(Pid) ->
    fun() ->
        won_point(Pid, p1),
        won_point(Pid, p2),
        won_point(Pid, p1),
        won_point(Pid, p2),
        ?assertEqual(#{p1=>40,p2=>30}, won_point(Pid, p1))
    end.

Again, we need to handle a new score:

...
new_score(30) -> 40.

P1 win

p1_wins(Pid) ->
    fun() ->
        won_point(Pid, p1),
        won_point(Pid, p1),
        won_point(Pid, p1),
        ?assertEqual({game_over, p1}, won_point(Pid, p1))
    end.

This requires a special case, for when the player has 40 points:

handle_call({won_point, p1}, _From, State = #{p1:=40}) ->
    {stop, normal, {game_over, p1}, State};

But now we have a problem, as I skipped over the troublesome “deuce”.

Deuce

deuce(Pid) ->
    fun() ->
        won_point(Pid, p1),
        won_point(Pid, p2),
        won_point(Pid, p1),
        won_point(Pid, p1),
        won_point(Pid, p2),
        ?assertEqual(#{p1=>deuce,p2=>deuce}, won_point(Pid, p2))
    end.

Again, I decided to use an atom, rather than 40-40. But now we need to take both player’s scores into account when determining the new score:

handle_call({won_point, p1}, _From, State = #{p1:=P1Score,p2:=P2Score}) ->
    {NewP1Score, NewP2Score} = new_score(P1Score, P2Score),
    NewState = State#{p1:=NewP1Score, p2:=NewP2Score},
    {reply, NewState, NewState};

...

new_score(love, Score) -> {15, Score};
new_score(15, Score) -> {30, Score};
new_score(30, 40) -> {deuce, deuce};
new_score(30, Score) -> {40, Score}.

Advantage

advantage_p1(Pid) ->
    fun() ->
        won_point(Pid, p1),
        won_point(Pid, p2),
        won_point(Pid, p1),
        won_point(Pid, p1),
        won_point(Pid, p2),
        won_point(Pid, p2),
        ?assertEqual(#{p1=>advantage,p2=>40}, won_point(Pid, p1))
    end.

Another case to handle:

...
new_score(deuce, deuce) -> {advantage, 40}.

Back to Deuce

advantage_p1_back_to_deuce(Pid) ->
    fun() ->
        won_point(Pid, p1),
        won_point(Pid, p2),
        won_point(Pid, p1),
        won_point(Pid, p1),
        won_point(Pid, p2),
        won_point(Pid, p2),
        won_point(Pid, p1),
        ?assertEqual(#{p1=>deuce,p2=>deuce}, won_point(Pid, p2))
    end.

Now we need to add a guard to the win case:

handle_call({won_point, p2}, _From, State = #{p1:=P1Score, p2:=40}) when P1Score =/= advantage ->
    {stop, normal, {game_over, p2}, State};

...

new_score(40, advantage) -> {deuce, deuce}.

Advantage, then win

advantage_p1_then_win(Pid) ->
    fun() ->
        won_point(Pid, p1),
        won_point(Pid, p2),
        won_point(Pid, p1),
        won_point(Pid, p1),
        won_point(Pid, p2),
        won_point(Pid, p2),
        won_point(Pid, p1),
        ?assertEqual({game_over, p1}, won_point(Pid, p1))
    end.

Finally, we need to allow a win from Advantage:

handle_call({won_point, p1}, _From, State = #{p1:=advantage}) ->
    {stop, normal, {game_over, p1}, State};

I think that’s everything covered. As Mark says, “it’s easy enough that it’s fun to play with, but difficult enough that it’s fun to play with”. I personally feel that the use of pattern matching made the intent of the code clearer, but of course YMMV.

Managing nodejs dependencies

The easiest way to “deploy” a node app is to clone the git repo on a server, and run npm install. There are a couple of disadvantages though: first, I don’t really like having to install git, and manage credentials for a private repo.

Second, installing the dependencies like that means you may get different versions of the modules you rely on than you were expecting. One of the tenets of a reliable build pipeline is ensuring that builds are repeatable, and that what you deploy matches what you tested.

There are a few alternatives: you could vendor in the node_modules folder, but this dirties the commit history, and increases the size of your repo. You could use npm shrinkwrap, which is the same concept as Bundler’s Gemfile.lock, a list of specific versions to install. This is definitely an improvement, but still leaves the risk of npm i failing during a deployment.

I’d prefer to only install the dependencies once, on the build server. This means I can run the tests, then tar up the folder and upload that to each environment in turn:

npm install
npm test
npm prune --production
npm dedupe
tar -czf app.tar.gz --exclude='*/coverage' app/

After running the tests, we prune the dependencies to remove mocha etc. We then dedupe to try and reduce the number of copies of shared modules, and finally create an archive of the app. This is output as an artifact of the build, and pulled in the by the deploy to staging build, output from that again, and finally pulled in by the deploy to production build.

Buffered channels in Erlang

Golang has the handy concept of a buffered channel (a similar concept to .NET’s blocking collections). It allows you to throttle your code, by blocking when adding an item to a collection that is already full.

Erlang doesn’t offer anything similar, out of the box (mailboxes will happily increase in size until the VM topples over), but it’s pretty easy to implement using a gen_server. You only need to handle 2 messages, adding an item:

init([Limit]) ->
    State = #{
        limit => Limit,
        count => 0,
        add_q => queue:new(),
        get_q => queue:new()
    },
    {ok, State}.

handle_call(add, From, State = #{count:=Count, limit:=Limit, add_q:=AddQ, get_q:=GetQ}) ->
    UpdatedCount = Count + 1,
    case UpdatedCount > Limit of
        false ->
            case queue:is_empty(GetQ) of
                true ->
                    {reply, ok, State#{count => UpdatedCount}};
                _ ->
                    {{value, Client}, UpdatedQ} = queue:out(GetQ),
                    gen_server:reply(Client, ok),
                    {reply, ok, State#{get_q => UpdatedQ}}
            end;
        _ ->
            {noreply, State#{add_q => queue:in(From, AddQ)}}
    end;

The process is initialised with a limit, the number of items that can be added to the queue before it will block. This collection is FIFO, but another data structure, e.g. a stack, could be used.

When an add message is received, the current size of the collection is checked. If adding another item would breach the limit, then the pid of the caller is added to the wait queue. Otherwise, the number of items is incremented (this version only keeps a count of items, rather than an actual list, but adding that would be trivial).

If any process is blocked waiting to get an item from the collection, then it is unblocked and an item removed immediately. We also need to handle a get message:

handle_call(get, From, State = #{count:=Count, name:=_Name, add_q:=AddQ, get_q:=GetQ}) ->
    case Count > 0 of
        true ->
            case queue:is_empty(AddQ) of
                true ->
                    UpdatedCount = Count - 1,
                    {reply, ok, State#{count => UpdatedCount}};
                _ ->
                    {Client, UpdatedQ} = queue:out(AddQ),
                    gen_server:reply(Client, ok),
                    {reply, ok, State#{add_q => UpdatedQ}}
            end;
        _ ->
            {noreply, State#{get_q => queue:in(From, GetQ)}}
    end;

which is pretty much the opposite, check if the collection is empty. Block if it is, or return an element to the caller.

Once your collection is up & running (and supervised!), using it is as simple as calling any other gen_server:

ok = gen_server:call(queue_1, get, infinity),
do_something(),
ok = gen_server:call(queue_2, add, infinity),

I’ve set the timeout to infinity, but that might not be a good idea for production code.

It would also be pretty simple to back the collection with disk storage, e.g. using a DETS table.

Ansible templates and urlencode

If you use ansible templates to generate your config, and strong passwords, chances are that you know to use urlencode:

{
    "postgres": "postgres://{{ db_user }}:{{ db_password|urlencode() }}@{{ db_server }}/{{ db_name }}",
}

to ensure that any special chars are not mangled. Unfortunately, as I discovered to my cost, the jinja filter does NOT encode a forward slash as %2F.

Arrow functions returning an expression

One of the new features of ES6 that has made it into Node 4 is arrow functions. According to the documentation, there are 2 alternative syntaxes:

(param1, param2, …, paramN) => { statements }
(param1, param2, …, paramN) => expression
         // equivalent to:  => { return expression; }

Using the 2nd version works as expected for primitive values:

> [1,2,3].map(i => i);
[ 1, 2, 3 ]

but not when you return an object:

> [1,2,3].map(i => { id: i });
[ undefined, undefined, undefined ]

As was pointed out to me on SO, it’s impossible to tell the difference between a statement block and an expression consisting of an object literal.

The solution is to wrap the returned object in an extra set of parentheses:

> [1,2,3].map(i => ({ id: i }));
[ {id: 1}, {id: 2}, {id: 3} ]

Jenkins + SSH keys

Jenkins makes it very easy to manage SSH keys. You can use the Credentials plugin to store the key, and then the SSH Agent plugin to seamlessly expose it to your build.

The downside is that now everyone with access to Jenkins has access to that key. It’s possible to use roles to restrict access through the web UI, but in our case it’s useful to allow access to the machine Jenkins is running on (for debugging purposes). And Jenkins itself has r+w privileges, so it’s all but impossible to prevent reading that file.

When the key is used for deploying to production, that’s a problem. Access to the key itself is actually useless, as it’s passphrase protected, but using the solution described above means the passphrase is stored in a credentials.xml file in $JENKINS_HOME. The file is encrypted, but reversing that is trivial.

It would be handy if the SSH Agent plugin allowed prompting for the passphrase before running a build, but that doesn’t appear to be a thing. It is possible however, to use the Parameterized Build plugin to emulate that.

This means you need to start ssh-agent yourself, and due to the fact that ssh-add doesn’t play nice with stdin, there’s some hoop jumping involved. The easiest method seems to involve using expect:

#!/bin/bash

expect << EOF
  spawn ssh-add $1
  expect "Enter passphrase"
  send "$SSH_PASSPHRASE\r"
  expect eof
EOF

Then, assuming you added a build parameter named SSH_PASSPHRASE, you can use this script after launching ssh-agent and before you need the ssh key:

eval `ssh-agent`
./ssh-add-pass ./key_file
./run_playbook