Word chains (Part 3)

Last time, we found the possible next words. Now we want to build on that, and use that function to build a chain from the first word, to the goal word. Sounds like a job for recursion (divide and conquer)!

This time, we’ll check that the chains generated all end in the expected word:

prop_all_chains_should_include_last_word() ->
    ?FORALL({FirstWord, LastWord}, valid_words(),
            Words = word_chains:word_list(length(FirstWord)),
            Chains = word_chains:all_chains(FirstWord, LastWord, Words, length(FirstWord)),
            InvalidChains = lists:filter(fun([W|_]) -> W =/= LastWord end, Chains),
            length(InvalidChains) =:= 0

We pass in the word list (all words of the chosen length), to avoid reading the file multiple times.

all_chains(FirstWord, LastWord, Words, MaxLength) ->
    lists:sort(fun(A, B) -> length(A) =< length(B) end, all_chains(FirstWord, LastWord, Words, MaxLength, [[FirstWord]])).

all_chains(FirstWord, LastWord, Words, MaxLength, Chains) ->
    lists:append(lists:map(fun(Chain) ->
        [CurrentWord | _Rest] = Chain,
        case CurrentWord =:= LastWord of
            true -> [Chain];
            false ->
                NextWords = next_words(CurrentWord, Words),
                NewChains = compact(lists:map(fun(NewWord) ->
                    case lists:member(NewWord, Chain) of
                        false ->
                            NewChain = [NewWord | Chain],
                            case length(NewChain) > MaxLength of
                                true -> [];
                                false -> NewChain
                        true -> []
                end, NextWords)),
                all_chains(FirstWord, LastWord, Words, MaxLength, NewChains)
    end, Chains)).

Our first chain is simply the first word, e.g. [“cat”]. We then iterate over the list, and find all possible next words, and create the possible chains using those words, [[“bat”, “cat”], [“cab”, “cat”], &c …] .

If any chain ends in the target word, no more work is required. Otherwise we continue to extend, and branch, the chains. If the proposed next word already exists in the current chain, then that branch is dead (to avoid looping forever).

Once all branches have been exhausted, we return the list of valid chains, sorted by length (shortest first).

Unfortunately, while it seemed like a good idea to generate all possible chains, it turns out that some of them can be very long. So I added a max length param, to cut short further exploration.

Even with that, execution can be pretty slow; so next time we’ll do some profiling, and see if caching the possible next words will help.

Word chains (Part 2)

Previously, we laid some groundwork for generating word chains. Rather than arbitrarily returning one word, we might as well get all the words that are one letter different from the first word:

prop_next_words_should_be_near() ->
    ?FORALL({FirstWord, LastWord}, valid_words(),
            NextWords = word_chains:next_words(FirstWord),
            InvalidWords = lists:filter(fun(W) -> word_chains:get_word_distance(W, FirstWord) =/= 1 end, NextWords),
            length(InvalidWords) =:= 0

We can calculate the “word distance” using map/reduce:

get_word_distance(Word1, Word2) ->
    Differences = lists:zipwith(fun(X, Y) -> case X =:= Y of true -> 0; false -> 1 end end, Word1, Word2),
    lists:foldl(fun(D, Acc) -> Acc + D end, 0, Differences).

For each letter in Word1, we compare it with the same (position) letter in Word2, and assign a 0 if it matches and a 1 if it differs. The sum of these values tells us the difference between the 2 words.

2> word_chains:get_word_distance("cat", "cat").
3> word_chains:get_word_distance("cat", "cot").
4> word_chains:get_word_distance("cat", "cog").

Using this helper function, we can easily find all the possible next words:

next_words(FirstWord) ->
    WordList = word_list(),
    SameLengthWords = lists:filter(fun(W) -> length(W) =:= length(FirstWord) end, WordList),
    WordDistances = lists:map(fun(W) -> {W, get_word_distance(W, FirstWord)} end, SameLengthWords),
    lists:map(fun({Word, _}) -> Word end, lists:filter(fun({_, Distance}) -> Distance =:= 1 end, WordDistances)).

Almost there! Next time, we will actually start generating some word chains.

Word chains (Part 1)

I’ve recently been using the word chains kata for interviewing, and I thought it might be interesting to try using Erlang, and property testing.

The first step is to get a word list. I thought most linux distros came with a dictionary file, but my laptop only had a crack lib, which wasn’t really what I was looking for.

I had used this npm package before, so I just downloaded the text file it provides. With that hand, getting a list of words is easy:

word_list() ->
    {ok, Data} = file:read_file("words.txt"),
    binary:split(Data, [<<"\n">>], [global]).

The next step is to find all words that are one letter away from the first word. So we create a property:

prop_next_word_should_be_new() ->
    ?FORALL({FirstWord, LastWord}, valid_words(),
            NextWord = word_chains:next_word(FirstWord, LastWord),
            NextWord =/= FirstWord

For each first word/last word pair, we check that the next word is different from the first word. We also need a generator, of valid words:

valid_words() ->
    ?SUCHTHAT({FirstWord, LastWord},
        ?LET(N, choose(2, 10),
                WordList = word_chains:word_list(),
                SameLengthWords = lists:filter(fun(W) -> length(W) =:= N end, WordList),
                {random_word(SameLengthWords), random_word(SameLengthWords)}
    FirstWord =/= LastWord).

random_word(Words) ->
    lists:nth(rand:uniform(length(Words)), Words).

First we pick a random number, in the range (2, 10), and then pick 2 words of that length, from the full word list, at random. This could result in the same word being used as both first & last word, so we filter that out, using the ?SUCHTHAT macro.

For now, we can make this pass by simply returning the last word:

next_word(_FirstWord, LastWord) ->
$ ./rebar3 proper
===> Verifying dependencies...
===> Compiling word_chains
===> Testing prop_word_chains:prop_next_word_should_be_new()
OK: Passed 100 test(s).
1/1 properties passed

Boom! Next time, a more useful implementation of next word.

Grouping by time period in Redshift

We use Redshift for reporting (or, more accurately, the tool we use for reporting uses Redshift).

It’s pretty simple to calculate daily rollups, using datetrunc:

select date_trunc('day', foo) date, sum(bar) bar
from [baz]
where ...
group by date

Or just by casting to a date:

select [foo:date] date, sum(bar) bar
from [baz]
where ...
group by date

But what if you want to slice into smaller time periods?

The easiest way I’ve found, is to use to_char, and format the timestamp appropriately. e.g.

select to_char(foo, 'YYYY-MM-DD HH24:MI') date, sum(bar) bar
from [baz]
where ...
group by date

Using “free” wi-fi on Linux

I recently found myself at a train station that claimed to offer “free super-fast wi-fi”; but when I connected to it, I was stuck with a little question mark in the gnome toolbar, and the chrome dinosaur.

I’m not the first person to note this, but I did manage to get it working… eventually.

I opened up a terminal, and even though the browser was sad, I could ping:

$ ping google.com
PING google.com ( 56(84) bytes of data.
From _gateway ( icmp_seq=1 Destination Net Prohibited

Sort of. DNS was working, anyway. So I tried curl:

$ curl -vL "https://www.google.com"
* Rebuilt URL to: https://www.google.com/
*   Trying
* Connected to www.google.com ( port 443 (#0)
... snip ...

And it looked like I was getting the actual Google homepage, not a hijacked page, which was interesting.

At that point I took the IP address from that request, and stuck it directly in the browser, hoping to skip straight there; and now the browser decided to show me the T&C page I needed. Once I’d ticked the right box, I was online.

Adding/removing LBaaS pool members, using Ansible

We use a load balancer to achieve “zero downtime deployments”. During a deploy, we take each node out of the rotation, update the code, and put it back in. If you’re using the Rackspace Cloud, there are ansible modules provided (as described here).

If you’re just using OpenStack though, it’s not quite as simple. While there are some ansible modules, there doesn’t seem to be one for managing LBaaS pool members. It is possible to create a pool member using a heat template, but that doesn’t really fit into an ansible playbook.

The only alternative seems to be using the neutron cli, which is deprecated (but doesn’t seem to have been replaced, for controlling a LBaaS anyway).

Fortunately, it can return output in a machine readable format (json), which makes calling it from ansible relatively simple. To add a pool member, in an idempotent fashion:

- name: Get current pool members
  local_action: command neutron lbaas-member-list test-gib-lb-pool --format json
  register: member_list

- set_fact: pool_member={{ member_list.stdout | from_json | selectattr("address", "equalto", ansible_default_ipv4.address) | map(attribute="id") | list }}

- name: Remove node from LB
  local_action: command neutron lbaas-member-delete {{ pool_member | first }} test-gib-lb-pool
  when: (pool_member | count) > 0

You first need to check if the current host is in the list of pool members. If so, remove it; otherwise, do nothing.

Once the code is updated, and the services restarted, you can put the node back in the pool:

- name: Add node to LB
  local_action: command neutron lbaas-member-create --name {{ inventory_hostname }} --subnet gamevy_subnet --address {{ ansible_default_ipv4.address }} --protocol-port 443 test-gib-lb-pool

This seems to be quite effective, but we have seen some errors when the members are deleted (which isn’t exactly “zero” downtime). If I find a better approach I’ll update this.

A squad of squids

We use squid as a forward proxy, to ensure that all outbound requests come from the same (whitelisted) IP addresses.

Originally, we chose the proxy to use at deploy time, using an ansible template:

"proxy": "http://{{ hostvars['proxy-' + (["01", "02"] | random())].ansible_default_ipv4.address }}:3128"

This “load balanced” the traffic, but wouldn’t be very useful if one of the proxies disappeared (the main reason for having two of them!)

I briefly considered using nginx to pass requests to squid, as it was already installed on the app servers; but quickly realised that wouldn’t work, for the same reason we needed to use squid in the first place: it can’t proxy TLS traffic.

A bit of research revealed that you can group multiple squid instances into a hierarchy. Taken care of by some more templating, in the squid config this time:

{% if inventory_hostname in groups['app_server'] %}
{% for host in proxy_ips %}
cache_peer {{ host }} parent 3128 0 round-robin
{% endfor %}
never_direct allow all
{% endif %}

Where the list of IPs is a projection from inventory data:

proxy_ips: "{{ groups['proxy_server'] | map('extract', hostvars, ['ansible_default_ipv4', 'address']) | list }}"