Grouping by time period in Redshift

We use Redshift for reporting (or, more accurately, the tool we use for reporting uses Redshift).

It’s pretty simple to calculate daily rollups, using datetrunc:

select date_trunc('day', foo) date, sum(bar) bar
from [baz]
where ...
group by date

Or just by casting to a date:

select [foo:date] date, sum(bar) bar
from [baz]
where ...
group by date

But what if you want to slice into smaller time periods?

The easiest way I’ve found, is to use to_char, and format the timestamp appropriately. e.g.

select to_char(foo, 'YYYY-MM-DD HH24:MI') date, sum(bar) bar
from [baz]
where ...
group by date

Using “free” wi-fi on Linux

I recently found myself at a train station that claimed to offer “free super-fast wi-fi”; but when I connected to it, I was stuck with a little question mark in the gnome toolbar, and the chrome dinosaur.

I’m not the first person to note this, but I did manage to get it working… eventually.

I opened up a terminal, and even though the browser was sad, I could ping:

$ ping google.com
PING google.com (216.58.214.14) 56(84) bytes of data.
From _gateway (10.102.0.5) icmp_seq=1 Destination Net Prohibited

Sort of. DNS was working, anyway. So I tried curl:

$ curl -vL "https://www.google.com"
* Rebuilt URL to: https://www.google.com/
*   Trying 172.217.19.196...
* TCP_NODELAY set
* Connected to www.google.com (172.217.19.196) port 443 (#0)
... snip ...

And it looked like I was getting the actual Google homepage, not a hijacked page, which was interesting.

At that point I took the IP address from that request, and stuck it directly in the browser, hoping to skip straight there; and now the browser decided to show me the T&C page I needed. Once I’d ticked the right box, I was online.

Adding/removing LBaaS pool members, using Ansible

We use a load balancer to achieve “zero downtime deployments”. During a deploy, we take each node out of the rotation, update the code, and put it back in. If you’re using the Rackspace Cloud, there are ansible modules provided (as described here).

If you’re just using OpenStack though, it’s not quite as simple. While there are some ansible modules, there doesn’t seem to be one for managing LBaaS pool members. It is possible to create a pool member using a heat template, but that doesn’t really fit into an ansible playbook.

The only alternative seems to be using the neutron cli, which is deprecated (but doesn’t seem to have been replaced, for controlling a LBaaS anyway).

Fortunately, it can return output in a machine readable format (json), which makes calling it from ansible relatively simple. To add a pool member, in an idempotent fashion:

- name: Get current pool members
  local_action: command neutron lbaas-member-list test-gib-lb-pool --format json
  register: member_list

- set_fact: pool_member={{ member_list.stdout | from_json | selectattr("address", "equalto", ansible_default_ipv4.address) | map(attribute="id") | list }}

- name: Remove node from LB
  local_action: command neutron lbaas-member-delete {{ pool_member | first }} test-gib-lb-pool
  when: (pool_member | count) > 0

You first need to check if the current host is in the list of pool members. If so, remove it; otherwise, do nothing.

Once the code is updated, and the services restarted, you can put the node back in the pool:

- name: Add node to LB
  local_action: command neutron lbaas-member-create --name {{ inventory_hostname }} --subnet gamevy_subnet --address {{ ansible_default_ipv4.address }} --protocol-port 443 test-gib-lb-pool

This seems to be quite effective, but we have seen some errors when the members are deleted (which isn’t exactly “zero” downtime). If I find a better approach I’ll update this.

A squad of squids

We use squid as a forward proxy, to ensure that all outbound requests come from the same (whitelisted) IP addresses.

Originally, we chose the proxy to use at deploy time, using an ansible template:

"proxy": "http://{{ hostvars['proxy-' + (["01", "02"] | random())].ansible_default_ipv4.address }}:3128"

This “load balanced” the traffic, but wouldn’t be very useful if one of the proxies disappeared (the main reason for having two of them!)

I briefly considered using nginx to pass requests to squid, as it was already installed on the app servers; but quickly realised that wouldn’t work, for the same reason we needed to use squid in the first place: it can’t proxy TLS traffic.

A bit of research revealed that you can group multiple squid instances into a hierarchy. Taken care of by some more templating, in the squid config this time:

{% if inventory_hostname in groups['app_server'] %}
{% for host in proxy_ips %}
cache_peer {{ host }} parent 3128 0 round-robin
{% endfor %}
never_direct allow all
{% endif %}

Where the list of IPs is a projection from inventory data:

     
proxy_ips: "{{ groups['proxy_server'] | map('extract', hostvars, ['ansible_default_ipv4', 'address']) | list }}"

Withdraw!

The next transition is a withdrawal. Naively, we add another command:

open(_Data) ->
    [
        {open, {call, bank_statem, deposit, [pos_integer()]}},
        {open, {call, bank_statem, withdraw, [pos_integer()]}}
    ].

Which causes an immediate failure:

===> Testing prop_bank_statem:prop_test()
...
=ERROR REPORT==== 29-Apr-2018::15:08:41 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {withdraw,2}}
** When server state  = {open,#{balance => 1}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,open,
                  [{call,{,#Ref}},
                   {withdraw,2},
                   #{balance => 1}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,46}]},
...
Crash dump is being written to: erl_crash.dump...done

Having the entire process crash isn’t very useful, so we follow the recommendation to use the TRAPEXIT macro:

prop_test() ->
    ?FORALL(Cmds, proper_fsm:commands(?MODULE),
        ?TRAPEXIT(
            begin
                bank_statem:start_link(),
                {History,State,Result} = proper_fsm:run_commands(?MODULE, Cmds),
                bank_statem:stop(),
                ?WHENFAIL(io:format("History: ~p\nState: ~p\nResult: ~p\n", [History,State,Result]),
                    aggregate(zip(proper_fsm:state_names(History), command_names(Cmds)), Result =:= ok))
            end)).

This allows shrinking to take place, and gives us (slightly) more useful output:

===> Testing prop_bank_statem:prop_test()
.!
Failed: After 2 test(s).
A linked process died with reason {function_clause,[{bank_statem,open,[{call,{,#Ref}},{withdraw,3},#{balance=>0}],[{file,[47,97,112,112,47,95,98,117,105,108,100,47,116,101,115,116,47,108,105,98,47,98,97,110,107,95,115,116,97,116,101,109,47,115,114,99,47,98,97,110,107,95,115,116,97,116,101,109,46,101,114,108]},{line,46}]},{gen_statem,call_state_function,5,[{file,[103,101,110,95,115,116,97,116,101,109,46,101,114,108]},{line,1633}]},{gen_statem,loop_event_state_function,6,[{file,[103,101,110,95,115,116,97,116,101,109,46,101,114,108]},{line,1023}]},{proc_lib,init_p_do_apply,3,[{file,[112,114,111,99,95,108,105,98,46,101,114,108]},{line,247}]}]}.

=ERROR REPORT==== 29-Apr-2018::15:11:36 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {withdraw,3}}
** When server state  = {open,#{balance => 0}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,open,
                  [{call,{,#Ref}},
                   {withdraw,3},
                   #{balance => 0}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,46}]},
...
[{set,{var,1},{call,bank_statem,withdraw,[3]}}]

Shrinking (0 time(s))
[{set,{var,1},{call,bank_statem,withdraw,[3]}}]

=ERROR REPORT==== 29-Apr-2018::15:11:36 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {withdraw,3}}
** When server state  = {open,#{balance => 0}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,open,
                  [{call,{,#Ref}},
                   {withdraw,3},
                   #{balance => 0}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,46}]},
....
===> 
0/1 properties passed, 1 failed
===> Failed test cases:
  prop_bank_statem:prop_test() -> false

We can see that the error is caused by trying to withdraw funds that don’t exist. This can be avoided by adding a pre-condition, to ensure that invalid commands aren’t generated:

precondition(_From, _To, #{balance:=Balance}, {call, bank_statem, withdraw, [Amount]}) ->
    Balance - Amount >= 0;

And also update our model when a withdrawal occurs:

next_state_data(_From, _To, #{balance:=Balance}=Data, _Res, {call, bank_statem, withdraw, [Amount]}) ->
    NewBalance = Balance - Amount,
    Data#{balance:=NewBalance};

At this point, I would expect the property to pass, but it’s still failing:

===> Testing prop_bank_statem:prop_test()
...................!
Failed: After 20 test(s).
A linked process died with reason {function_clause,[{bank_statem,open,[{call,{,#Ref}},{withdraw,9},#{balance=>9}],[{file,[47,97,112,112,47,95,98,117,105,108,100,47,116,101,115,116,47,108,105,98,47,98,97,110,107,95,115,116,97,116,101,109,47,115,114,99,47,98,97,110,107,95,115,116,97,116,101,109,46,101,114,108]},{line,46}]},{gen_statem,call_state_function,5,[{file,[103,101,110,95,115,116,97,116,101,109,46,101,114,108]},{line,1633}]},{gen_statem,loop_event_state_function,6,[{file,[103,101,110,95,115,116,97,116,101,109,46,101,114,108]},{line,1023}]},{proc_lib,init_p_do_apply,3,[{file,[112,114,111,99,95,108,105,98,46,101,114,108]},{line,247}]}]}.

=ERROR REPORT==== 29-Apr-2018::15:14:34 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {withdraw,9}}
** When server state  = {open,#{balance => 9}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,open,
                  [{call,{,#Ref}},
                   {withdraw,9},
                   #{balance => 9}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,46}]},
...

Shrinking .
=ERROR REPORT==== 29-Apr-2018::15:14:34 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {withdraw,9}}
** When server state  = {open,#{balance => 9}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,open,
                  [{call,{,#Ref}},
                   {withdraw,9},
                   #{balance => 9}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,46}]},
...
.
=ERROR REPORT==== 29-Apr-2018::15:14:34 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {withdraw,9}}
** When server state  = {open,#{balance => 9}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,open,
                  [{call,{,#Ref}},
                   {withdraw,9},
                   #{balance => 9}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,46}]},
...
(2 time(s))

=ERROR REPORT==== 29-Apr-2018::15:14:34 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {withdraw,9}}
** When server state  = {open,#{balance => 9}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,open,
                  [{call,{,#Ref}},
                   {withdraw,9},
                   #{balance => 9}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,46}]},
...
[{set,{var,1},{call,bank_statem,deposit,[2]}},{set,{var,2},{call,bank_statem,deposit,[7]}},{set,{var,3},{call,bank_statem,withdraw,[9]}}]
===> 
0/1 properties passed, 1 failed
===> Failed test cases:
  prop_bank_statem:prop_test() -> false

Rather embarrassingly, that’s an actual bug in my code, the guard is checking that the remaining balance is strictly greater than 0. Hurray for property testing!

State machine properties

One of the more interesting corners of the Erlang ecosystem is property based testing. Building on Haskell’s QuickCheck, it allows generation of test cases, similar to “fuzzing”.

The integration with gen_fsm/statem makes stateful testing remarkably easy, by describing the possible transitions for our state machine:

-module(prop_bank_statem).

-include_lib("proper/include/proper.hrl").

-compile(export_all).

prop_test() ->
    ?FORALL(Cmds, proper_fsm:commands(?MODULE),
        begin
            bank_statem:start_link(),
            {History,State,Result} = proper_fsm:run_commands(?MODULE, Cmds),
            bank_statem:stop(),
            ?WHENFAIL(
                io:format("History: ~p\nState: ~p\nResult: ~p\n", [History,State,Result]),
                aggregate(zip(proper_fsm:state_names(History), command_names(Cmds)), Result =:= ok)
            )
        end).

initial_state() -> open.

initial_state_data() -> #{}.

open(_Data) -> [{open, {call, bank_statem, deposit, [pos_integer()]}}].

weight(_FromState, _ToState, _Call) -> 1.

precondition(_From, _To, #{}, {call, _Mod, _Fun, _Args}) -> true.

postcondition(_From, _To, _Data, {call, _Mod, _Fun, _Args}, _Res) -> true.

next_state_data(_From, _To, Data, _Res, {call, _Mod, _Fun, _Args}) ->
    NewData = Data,
    NewData.

To begin with, we’re just checking that a deposit can be made to an open account. This property succeeds (after the default 100 cases):

===> Testing prop_bank_statem:prop_test()
....................................................................................................
OK: Passed 100 test(s).

100% {open,{bank_statem,deposit,1}}
===> 
1/1 properties passed

But, if we switch to using a less strict generator:

open(_Data) -> [{open, {call, bank_statem, deposit, [integer()]}}].

We very quickly hit a failure:

===> Testing prop_bank_statem:prop_test()

=ERROR REPORT==== 29-Apr-2018::14:05:12 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {deposit,0}}
** When server state  = {open,#{balance => 0}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,handle_deposit,
                  [0,
                   #{balance => 0},
                   {,#Ref}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,77}]},

caused by an attempt to make a deposit of 0.

To make things more interesting, we can amend our deposit transition to return the updated balance, and track that in the model:

initial_state_data() -> #{balance=>0}.

...

postcondition(_From, _To, #{balance:=PrevBalance}, {call, bank_statem, deposit, [Amount]}, {deposit_made, UpdatedBalance}) ->
    UpdatedBalance =:= (PrevBalance + Amount);

...

next_state_data(_From, _To, #{balance:=Balance}=Data, _Res, {call, bank_statem, deposit, [Amount]}) ->
    NewBalance = Balance + Amount,
    Data#{balance:=NewBalance};

For this toy example, the model is almost exactly the same as the implementation. And, in fact, the hardest thing with this style of property testing is keeping the model simple, when testing a more complex example.

Next time, we will add the rest of the transitions to our model.

Hold pls

On hold

The next step for our bank account is to place a hold on an account.

account-events-withself-held

Using state functions, this is as simple as:

open({call, From}, place_hold, Data) ->
    {next_state, held, Data, [{reply, From, hold_placed}]};

...

open({call, From}, {deposit, Amount}, Data) ->
    handle_deposit(Amount, Data, From);

...

held({call, From}, {deposit, Amount}, Data) ->
    handle_deposit(Amount, Data, From);

held({call, From}, remove_hold, Data) ->
    {next_state, open, Data, [{reply, From, hold_removed}]}.

handle_deposit(Amount, #{balance:=Balance} = Data, From) when is_number(Amount) andalso Amount > 0 ->
    NewBalance = Balance + Amount,
    {keep_state, Data#{balance:=NewBalance}, [{reply, From, deposit_made}]}.

Now any attempt to withdraw funds from a held account, will cause the gen_statem process to crash (hopefully having previously persisted any important data!).

And using handle event:

handle_event({call, From}, place_hold, open, Data) ->
    {next_state, held, Data, [{reply, From, hold_placed}]};

...

handle_event({call, From}, remove_hold, held, Data) ->
    {next_state, open, Data, [{reply, From, hold_removed}]};

...

handle_event({call, From}, {deposit, Amount}, open, Data) ->
    handle_deposit(Amount, Data, From);

handle_event({call, From}, {deposit, Amount}, held, Data) ->
    handle_deposit(Amount, Data, From);

...

handle_deposit(Amount, #{balance:=Balance} = Data, From) when is_number(Amount) andalso Amount > 0 ->
    NewBalance = Balance + Amount,
    {keep_state, Data#{balance:=NewBalance}, [{reply, From, deposit_made}]}.

Insufficient funds?

We can also add an availableToWithdraw method, with different behaviour for held accounts, without too much hassle:

open({call, From}, available_to_withdraw, #{balance:=Balance} = Data) ->
    {keep_state, Data, [{reply, From, Balance}]};

...

held({call, From}, available_to_withdraw, Data) ->
    {keep_state, Data, [{reply, From, 0}]};

Or:

handle_event({call, From}, available_to_withdraw, open, #{balance:=Balance} = Data) ->
    {keep_state, Data, [{reply, From, Balance}]};

...

handle_event({call, From}, available_to_withdraw, held, Data) ->
    {keep_state, Data, [{reply, From, 0}]};