It’s a Syn (Part 2)

To make things more interesting, we need to take the simple model, and run it with multiple concurrent nodes:

CONSTANTS
Nodes = {n1, n2}

Just two, to start with. The model state needs to become more complex:

Init ==
    /\ inbox = [n \in Nodes |-> <<>>]
    /\ registered = [n \in Nodes |-> {}]
    /\ next_val = 0
    /\ added = 0
    /\ removed = 0

Each node now has its own set of registered values; and, this being an Erlang system, it makes sense to model the inbox for messages (a sequence, for each node). In this case, a node is really a process (or gen_server).

Next ==
    /\ \E n \in Nodes:
        \/ Register(n)
        \/ SyncRegister(n)
        \/ Unregister(n)
        \/ SyncUnregister(n)
        \/ Complete

There are more options, for the next state: we now pick a node from the set of all nodes, and apply an action to that. Also, register & unregister have split in two (async).

Register(n) ==
    /\ \A o \in Nodes: Len(inbox[o]) = 0
    /\ next_val < MaxValues
    /\ registered' = [registered EXCEPT![n] = registered[n] \union {next_val}]
    /\ next_val' = next_val + 1
    /\ added' = added + 1
    /\ inbox' = [o \in Nodes |-> IF o = n THEN inbox[o] ELSE Append(inbox[o], [action |-> "sync_register", name |-> next_val])]
    /\ UNCHANGED <<removed>>

First we update the local registry, then broadcast a message to all other nodes, to do the same.

SyncRegister(n) ==
    /\ Len(inbox[n]) > 0
    /\ Head(inbox[n]).action = "sync_register"
    /\ registered' = [registered EXCEPT![n] = registered[n] \union {Head(inbox[n]).name}]
    /\ inbox' = [inbox EXCEPT![n] = Tail(inbox[n])]
    /\ UNCHANGED <<removed, added, next_val>>

To make things simpler, I decided that a pre-condition for register would be that all nodes have empty inboxes (i.e. any previous messages have already been processed). This obviously isn’t realistic, and the constraint will be removed later.

ItemToRemove(n) ==
    CHOOSE r \in registered[n]: TRUE

Unregister(n) ==
    /\ \A o \in Nodes: Len(inbox[o]) = 0
    /\ Cardinality(registered[n]) > 0
    /\ LET item_to_remove == ItemToRemove(n)
        IN registered' = [registered EXCEPT![n] = registered[n] \ {item_to_remove}]
        /\ inbox' = [o \in Nodes |-> IF o = n THEN inbox[o] ELSE Append(inbox[o], [action |-> "sync_unregister", name |-> item_to_remove])]
    /\ removed' = removed + 1
    /\ UNCHANGED <<added, next_val>>

SyncUnregister(n) ==
    /\ Len(inbox[n]) > 0
    /\ Head(inbox[n]).action = "sync_unregister"
    /\ registered' = [registered EXCEPT![n] = registered[n] \ {Head(inbox[n]).name}]
    /\ inbox' = [inbox EXCEPT![n] = Tail(inbox[n])]
    /\ UNCHANGED <<removed, added, next_val>>

Unsurprisingly, unregister is still a mirror image of the same steps. We maintain the same invariant:

AllRegistered ==
    \A n \in Nodes:
        Len(inbox[n]) = 0 => Cardinality(registered[n]) = (added - removed)

but evaluated on each node, individually; and we don’t bother checking until all messages have been processed (as the counters obviously won’t match).

And now we also check that all messages have been consumed, when the system halts:

PROPERTIES
AllMessagesProcessed

AllMessagesProcessed ==
    \A n \in Nodes:
        <>(Len(inbox[n]) = 0)

Again, we can run this with the model checker:

Starting...
Implied-temporal checking--satisfiability problem has 2 branches.
Computing initial states...
Finished computing initial states: 1 distinct state generated at 2024-01-05 13:15:50.
Progress(41) at 2024-01-05 13:15:50: 543 states generated, 286 distinct states found, 0 states left on queue.
Checking 2 branches of temporal properties for the complete state space with 572 total distinct states at (2024-01-05 13:15:50)
Finished checking temporal properties in 00s at 2024-01-05 13:15:50
Model checking completed. No error has been found.
  Estimates of the probability that TLC did not check all reachable states
  because two distinct states had the same fingerprint:
  calculated (optimistic):  val = 4.0E-15
543 states generated, 286 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 41.

And the number of possible states has ballooned from the simple model, but still no errors!

We can also try registering a larger number of keys, or using more nodes, with the same result; but the search space, and runtime, can increase dramatically (this is the downside to exhaustive checking, vs example based testing or even generative random testing).

It’s a Syn (Part 1)

I was looking at the Syn process registry recently, and thought it might be interesting to try modelling its netsplit recovery protocol (eventual consistency) in TLA+.

It’s always worth starting with something so simple, that it can’t be wrong. In this case I decided that scopes were out of scope, as was metadata; and I would first try to build a model for a single node.

Init ==
    /\ registered = {}
    /\ next_val = 0
    /\ added = 0
    /\ removed = 0

We have an empty set, for the registered values. A serial for the next item to be added (using integers as “keys”); and two counters, for the number of items added & removed.

Next ==
    \/ Register
    \/ Unregister
    \/ Complete

The next state is a choice of registering a value, unregistering a value; or reaching the end (an arbitrary limit).

Register ==
    /\ next_val < 5
    /\ registered' = registered \union {next_val}
    /\ next_val' = next_val + 1
    /\ added' = added + 1
    /\ UNCHANGED <<removed>>

If we haven’t hit that limit yet (pre-condition), then we add the next value to the set of registered values, and increment the counter.

ItemToRemove ==
    CHOOSE r \in registered: TRUE

Unregister ==
    /\ Cardinality(registered) > 0
    /\ registered' = registered \ {ItemToRemove}
    /\ removed' = removed + 1
    /\ UNCHANGED <<added, next_val>>

Unregistration is pretty similar. The pre-condition this time is that the set is not empty, and obviously we remove the (randomly selected) key instead.

Complete ==
    /\ next_val = 5
    /\ UNCHANGED <<added, next_val, registered, removed>>

Finally, once 5 values have been added (no matter how many were removed), we give up.

INVARIANTS
AllRegistered

AllRegistered == Cardinality(registered) = (added - removed)

And after every step, we ensure that the number of items in the set is equal to those added minus those removed.

If we run the model:

$ docker run --rm -it -v $PWD:/app -w /app openjdk:14-jdk-alpine java -XX:+UseParallelGC -cp tla2tools.jar tlc2.TLC -config syn.cfg -workers auto -cleanup syn.tla
TLC2 Version 2.18 of Day Month 20?? (rev: cab6f13)
...
Computing initial states...
Finished computing initial states: 1 distinct state generated at 2024-01-05 12:23:27.
Model checking completed. No error has been found.
  Estimates of the probability that TLC did not check all reachable states
  because two distinct states had the same fingerprint:
  calculated (optimistic):  val = 1.8E-17
37 states generated, 21 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 11.

No errors are found. Not a huge surprise, but this gives us a good foundation to build something more realistic.

CQRS Bookings (Part 2)

Last time, we looked at taking a booking. Now we want to persist that, over a server restart. We are going to do something very simple, using DETS; obviously for a more realistic scenario you might want to store your data in an RDBMS, or Kafka, or even a distributed ledger (Raft/Paxos).

First, we need to open the file, during init:

{ok, _Name} = dets:open_file(bookings, []),

We can then save the command, after validation:

handle_call({book_room, Cmd}, _From, #{available_rooms:=AvailableRooms, version:=Version} = State) ->
     ...
    NewVersion = Version + 1,
    ok = dets:insert(bookings, {NewVersion, Cmd}),
    ...

Now, if the process dies, and is respawned; we can resurrect the state, by replaying all the saved commands:

init(unused) ->
     ...
    MatchSpec = ets:fun2ms(fun({N,Cmd}) when N >= 0 -> {N, Cmd} end),
    ExistingBookings = lists:sort(fun({A,_}, {B,_}) -> A =< B end, dets:select(bookings, MatchSpec)),
    NewBookings = lists:foldl(fun({_, Cmd}, B) -> add_new_booking(Cmd, B) end, maps:get(bookings, State), ExistingBookings),

This isn’t exactly “event sourcing“, but something more akin to the Write Ahead Log (WAL) that databases use. We could just load all the history (which is basically what is happening right now), but the match spec:

[{{'$1','$2'},[{'>','$1',{const,0}}],[{{'$1','$2'}}]}]

will come in handy. We can now stop & start the server, and as long as the file on disk remains, any history will survive. To make things more efficient, we can save a snapshot of the state, at regular intervals; and only replay any events newer than that:

init(unused) ->
    {ok, _} = dets:open_file(snapshot, []),
    Results = dets:lookup(snapshot, latest),
    State = case length(Results) of
        0 ->
            new_state();
        1 ->
            [{latest, S}] = Results,
            S
    end,
    LatestVersion = maps:get(version, State),
    MatchSpec = ets:fun2ms(fun({N,Cmd}) when N > LatestVersion -> {N, Cmd} end),
    ...
    erlang:send_after(60 * 1000, self(), save_snapshot),
    ...

handle_info(save_snapshot, State) ->
    ok = dets:insert(snapshot, {latest, State}),
    {noreply, State};

This isn’t without risk. The main benefit of the “let it crash” philosophy is that any bad state is blown away, so we would need to take into account that we could attempt to replay a poison message; particularly if e.g. the actual hotel availability was from an external system, so any action might not be idempotent. We might need to discard some commands, or even perform a compensating action.

CQRS Bookings (Part 1)

I recently came across the CQRS Bookings kata, and thought it might be a good chance to experiment. The actual kata description isn’t very detailed, but there is a good example repo (in C#).

You first need some data structure to define what rooms are available & when (in a more realistic example, this would come from some external source, e.g. a third party API). I decided to start with something relatively simple: a map of hotel id to date, then to a list of available rooms (another map).

3 => #{
   {2023, 12, 1} => [
                     #{
                       id => <<"101">>,
                       prices => #{
                                   "EUR" => #{
                                              1 => 109,
                                              2 => 140
                                             }
                                  }
                      },
                     #{
                       id => <<"102">>,
                       prices => #{
                                   "EUR" => #{
                                              1 => 109,
                                              2 => 140
                                             }
                                  }
                      },
                      ...

The rooms have two prices, depending on the number of occupants.

We can then have a first stab at making a booking:

gen_server:call(cqrs_booking, {book_room, {1, 2, <<"101">>, {2023, 12, 1}, {2023, 12, 2}}}).

handle_call({book_room, Cmd}, _From, #{available_rooms:=AvailableRooms, bookings:=Bookings} = State) ->
    {Client, Hotel, Room, CheckIn, _CheckOut} = Cmd,
    AvailableRoomsForHotel = maps:get(Hotel, AvailableRooms),
    AvailableRoomsForDay = maps:get(CheckIn, AvailableRoomsForHotel),
    [_RoomInfo] = lists:filter(fun(R) -> maps:get(id, R) == Room end, AvailableRoomsForDay),
    NewBookings = case maps:is_key(Client, Bookings) of
        true ->
            BookingsForClient = maps:get(Client, Bookings),
            maps:update(Client, [Cmd | BookingsForClient], Bookings);
        false ->
            maps:put(Client, [Cmd], Bookings)
    end,
    {reply, ok, State#{bookings:=NewBookings}};

First, we look up the hotel by id, then the day (using the check-in date). This would crash the process, if the key did not exist. If the room is available, then the booking is added to another map (part of the gen_server state). I’m not removing the room from the available list, which would obviously be a problem if you were actually running a hotel; but that’s not really the part I’m interested in.

To make things a bit more CQRS, we can now split the validation of the command, from the actual processing. This obviously introduces a whole new assortment of trade-offs, but deciding whether that is worthwhile for a particular use case is out of the scope of this blog post.

self() ! {new_booking, Cmd},

This is simply sending a new message to the same process, to be handled later:

handle_info({new_booking, {Client, _Hotel, _Room, _CheckIn, _CheckOut} = Cmd}, #{bookings:=Bookings} = State) ->
    NewBookings = case maps:is_key(Client, Bookings) of
        true ->
            BookingsForClient = maps:get(Client, Bookings),
            maps:update(Client, [Cmd | BookingsForClient], Bookings);
        false ->
            maps:put(Client, [Cmd], Bookings)
    end,
    {noreply, State#{bookings:=NewBookings}};

This works; but as of v21, there is a more elegant way to handle it:

{reply, ok, State, {continue, {new_booking, Cmd}}};

...

handle_continue({new_booking, {Client, _Hotel, _Room, _CheckIn, _CheckOut} = Cmd}, #{bookings:=Bookings} = State) ->
    ...

All well and good, so far; but if the server crashes, we lose everything. Next time, we’ll look at persisting some data.

Minimal web app with cowboy & rebar3

The erlang docker images now include rebar3, so it’s very easy to get started with a new project (the name needs to be an erlang term, so underscores not dashes!):

$ docker run -v $PWD:/app/foo_app -w /app erlang:25 rebar3 new release foo_app
Unable to find image 'erlang:25' locally
25: Pulling from library/erlang
e756f3fdd6a3: Pull complete 
...
Digest: sha256:4eafc58e4475a7be2416af55ea142a7cd00c14b6ec2490a38db3a0869efde7e4
Status: Downloaded newer image for erlang:25
===> Writing foo_app/apps/foo_app/src/foo_app_app.erl
===> Writing foo_app/apps/foo_app/src/foo_app_sup.erl
===> Writing foo_app/apps/foo_app/src/foo_app.app.src
===> Writing foo_app/rebar.config
===> Writing foo_app/config/sys.config
===> Writing foo_app/config/vm.args
===> Writing foo_app/.gitignore
===> Writing foo_app/LICENSE
===> Writing foo_app/README.md

As dockerd is running as root, you then need to chown the generated files (you can run the docker cmd as the current user, but that didn’t go well when I tried it).

Then you need a Dockerfile:

# Build stage 0
FROM erlang:25-alpine

RUN apk add --no-cache git

# Set working directory
RUN mkdir /buildroot
WORKDIR /buildroot

# Copy our Erlang test application
COPY rebar.config .
COPY apps/ apps/
COPY config/ config/

# And build the release
RUN rebar3 as prod release

# Build stage 1
FROM alpine

# Install some libs
RUN apk add --no-cache openssl ncurses-libs libstdc++

# Install the released application
COPY --from=0 /buildroot/_build/prod/rel/foo_app /foo_app

# Expose relevant ports
EXPOSE 8080

CMD ["/foo_app/bin/foo_app", "foreground"]

At this point, you should be able to build the image:

$ docker build -t foo_app .
Sending build context to Docker daemon  104.4kB
Step 1/13 : FROM erlang:25-alpine
25-alpine: Pulling from library/erlang
2408cc74d12b: Already exists 
1e90e213ba89: Pull complete 
Digest: sha256:1fdd18a383206eeba257f18c5dd22d7f381942eda4cd76a88e36a1d3247c4130
Status: Downloaded newer image for erlang:25-alpine
 ---> 8f1437fc7749
Step 2/13 : RUN apk add --no-cache git
 ---> Running in 1af099a684a7
fetch https://dl-cdn.alpinelinux.org/alpine/v3.16/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.16/community/x86_64/APKINDEX.tar.gz
(1/6) Installing brotli-libs (1.0.9-r6)
(2/6) Installing nghttp2-libs (1.47.0-r0)
(3/6) Installing libcurl (7.83.1-r1)
(4/6) Installing expat (2.4.8-r0)
(5/6) Installing pcre2 (10.40-r0)
(6/6) Installing git (2.36.1-r0)
Executing busybox-1.35.0-r13.trigger
OK: 23 MiB in 30 packages
Removing intermediate container 1af099a684a7
 ---> 99d4529dc490
Step 3/13 : RUN mkdir /buildroot
 ---> Running in 47daa05e777b
Removing intermediate container 47daa05e777b
 ---> ac097064f098
Step 4/13 : WORKDIR /buildroot
 ---> Running in 51479ca5c35a
Removing intermediate container 51479ca5c35a
 ---> 649f740b900b
Step 5/13 : COPY rebar.config .
 ---> 3fe63710afa6
Step 6/13 : COPY apps/ apps/
 ---> 91e46ef9dfc9
Step 7/13 : COPY config/ config/
 ---> 4194973d4e23
Step 8/13 : RUN rebar3 as prod release
 ---> Running in 7e7a2112d86e
===> Verifying dependencies...
===> Analyzing applications...
===> Compiling foo_app
===> Assembling release foo_app-0.1.0...
===> Release successfully assembled: _build/prod/rel/foo_app
Removing intermediate container 7e7a2112d86e
 ---> 91154e8634db
Step 9/13 : FROM alpine
latest: Pulling from library/alpine
2408cc74d12b: Already exists 
Digest: sha256:686d8c9dfa6f3ccfc8230bc3178d23f84eeaf7e457f36f271ab1acc53015037c
Status: Downloaded newer image for alpine:latest
 ---> e66264b98777
Step 10/13 : RUN apk add --no-cache openssl ncurses-libs libstdc++
 ---> Running in 935857dec575
fetch https://dl-cdn.alpinelinux.org/alpine/v3.16/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.16/community/x86_64/APKINDEX.tar.gz
(1/5) Installing libgcc (11.2.1_git20220219-r2)
(2/5) Installing libstdc++ (11.2.1_git20220219-r2)
(3/5) Installing ncurses-terminfo-base (6.3_p20220521-r0)
(4/5) Installing ncurses-libs (6.3_p20220521-r0)
(5/5) Installing openssl (1.1.1o-r0)
Executing busybox-1.35.0-r13.trigger
OK: 9 MiB in 19 packages
Removing intermediate container 935857dec575
 ---> e62baafed2cd
Step 11/13 : COPY --from=0 /buildroot/_build/prod/rel/foo_app /foo_app
 ---> d768a4b774ea
Step 12/13 : EXPOSE 8080
 ---> Running in 11f013006a16
Removing intermediate container 11f013006a16
 ---> 0e347e1c2d46
Step 13/13 : CMD ["/foo_app/bin/foo_app", "foreground"]
 ---> Running in 7bf493256ded
Removing intermediate container 7bf493256ded
 ---> dc0f1632d27f
Successfully built dc0f1632d27f
Successfully tagged foo_app:latest

This is a good approach for building a final deployable image, but it’s a bit painful for local development, and doesn’t make the best use of the docker layer caching. You probably want to use a mounted vol instead.

You should now be able to run the application:

$ docker run -p 8080:8080 --rm foo_app
Exec: /foo_app/erts-13.0.1/bin/erlexec -noinput +Bd -boot /foo_app/releases/0.1.0/start -mode embedded -boot_var SYSTEM_LIB_DIR /foo_app/lib -config /foo_app/releases/0.1.0/sys.config -args_file /foo_app/releases/0.1.0/vm.args -- foreground
Root: /foo_app
/foo_app

Unfortunately Ctrl+C doesn’t work, so you need to docker kill it.

This isn’t a web app, yet, so we need to add cowboy to the dependencies list in rebar.config:

{deps, [
    {cowboy,"2.9.0"}
]}.

I’m using the latest tag, but you might want a specific version. If you build again, you should see some extra steps:

Step 8/13 : RUN rebar3 as prod release
 ---> Running in 95791181399b
===> Verifying dependencies...
===> Fetching cowboy v2.9.0
===> Fetching cowlib v2.11.0
===> Fetching ranch v1.8.0
===> Analyzing applications...
===> Compiling cowlib
===> Compiling ranch
===> Compiling cowboy
===> Analyzing applications...
===> Compiling foo_app
===> Assembling release foo_app-0.1.0...
===> Release successfully assembled: _build/prod/rel/foo_app

You also need to add cowboy to the list in the app.src file:

  {applications,
   [kernel,
    stdlib,
    cowboy
   ]},

You can then configure cowboy, in your _app file:

start(_StartType, _StartArgs) ->
    Port = 8080,
    Dispatch = cowboy_router:compile([
        {'_', [{"/ping", ping_handler, []}]}
    ]), 
    {ok, _} = cowboy:start_clear(my_http_listener,
        [{port, Port}],
        #{env => #{dispatch => Dispatch}}
    ), 
    foo_app_sup:start_link().

And create a handler:

-module(ping_handler).

-export([init/2]).

init(Req, State) ->
    {ok, Req, State}.

This is the absolute minimum, and will return a 204 for any request:

$ curl -I -XGET "http://localhost:8080/ping"
HTTP/1.1 204 No Content

Or, for something a bit more realistic:

init(#{method := Method} = Req, _State) ->
    handle_req(Method, Req).

handle_req(<<"GET">>, Req) ->
    {ok, text_plain(Req, <<"pong">>)};

handle_req(_Method, Req) ->
    cowboy_req:reply(404, Req).

text_plain(Request, ResponseBody) ->
    ResponseHeaders = #{
        <<"content-type">> => <<"text/plain">>
    },--
    cowboy_req:reply(200, ResponseHeaders, ResponseBody, Request).

This will return some text, for a GET:

$ curl "http://localhost:8080/ping"
pong

And a 404 for any other HTTP method:

$ curl -I -XPOST "http://localhost:8080/ping"
HTTP/1.1 404 Not Found
content-length: 0
date: Fri, 17 Jun 2022 11:54:35 GMT
server: Cowboy

Word chains (Part 3)

Last time, we found the possible next words. Now we want to build on that, and use that function to build a chain from the first word, to the goal word. Sounds like a job for recursion (divide and conquer)!

This time, we’ll check that the chains generated all end in the expected word:

prop_all_chains_should_include_last_word() ->
    ?FORALL({FirstWord, LastWord}, valid_words(),
        begin
            Words = word_chains:word_list(length(FirstWord)),
            Chains = word_chains:all_chains(FirstWord, LastWord, Words, length(FirstWord)),
            InvalidChains = lists:filter(fun([W|_]) -> W =/= LastWord end, Chains),
            length(InvalidChains) =:= 0
        end).

We pass in the word list (all words of the chosen length), to avoid reading the file multiple times.

all_chains(FirstWord, LastWord, Words, MaxLength) ->
    lists:sort(fun(A, B) -> length(A) =< length(B) end, all_chains(FirstWord, LastWord, Words, MaxLength, [[FirstWord]])).

all_chains(FirstWord, LastWord, Words, MaxLength, Chains) ->
    lists:append(lists:map(fun(Chain) ->
        [CurrentWord | _Rest] = Chain,
        case CurrentWord =:= LastWord of
            true -> [Chain];
            false ->
                NextWords = next_words(CurrentWord, Words),
                NewChains = compact(lists:map(fun(NewWord) ->
                    case lists:member(NewWord, Chain) of
                        false ->
                            NewChain = [NewWord | Chain],
                            case length(NewChain) > MaxLength of
                                true -> [];
                                false -> NewChain
                            end;
                        true -> []
                    end
                end, NextWords)),
                all_chains(FirstWord, LastWord, Words, MaxLength, NewChains)
        end
    end, Chains)).

Our first chain is simply the first word, e.g. [“cat”]. We then iterate over the list, and find all possible next words, and create the possible chains using those words, [[“bat”, “cat”], [“cab”, “cat”], &c …] .

If any chain ends in the target word, no more work is required. Otherwise we continue to extend, and branch, the chains. If the proposed next word already exists in the current chain, then that branch is dead (to avoid looping forever).

Once all branches have been exhausted, we return the list of valid chains, sorted by length (shortest first).

Unfortunately, while it seemed like a good idea to generate all possible chains, it turns out that some of them can be very long. So I added a max length param, to cut short further exploration.

Even with that, execution can be pretty slow; so next time we’ll do some profiling, and see if caching the possible next words will help.

Word chains (Part 2)

Previously, we laid some groundwork for generating word chains. Rather than arbitrarily returning one word, we might as well get all the words that are one letter different from the first word:

prop_next_words_should_be_near() ->
    ?FORALL({FirstWord, LastWord}, valid_words(),
        begin
            NextWords = word_chains:next_words(FirstWord),
            InvalidWords = lists:filter(fun(W) -> word_chains:get_word_distance(W, FirstWord) =/= 1 end, NextWords),
            length(InvalidWords) =:= 0
        end).

We can calculate the “word distance” using map/reduce:

get_word_distance(Word1, Word2) ->
    Differences = lists:zipwith(fun(X, Y) -> case X =:= Y of true -> 0; false -> 1 end end, Word1, Word2),
    lists:foldl(fun(D, Acc) -> Acc + D end, 0, Differences).

For each letter in Word1, we compare it with the same (position) letter in Word2, and assign a 0 if it matches and a 1 if it differs. The sum of these values tells us the difference between the 2 words.

2> word_chains:get_word_distance("cat", "cat").
0
3> word_chains:get_word_distance("cat", "cot").
1
4> word_chains:get_word_distance("cat", "cog").
2

Using this helper function, we can easily find all the possible next words:

next_words(FirstWord) ->
    WordList = word_list(),
    SameLengthWords = lists:filter(fun(W) -> length(W) =:= length(FirstWord) end, WordList),
    WordDistances = lists:map(fun(W) -> {W, get_word_distance(W, FirstWord)} end, SameLengthWords),
    lists:map(fun({Word, _}) -> Word end, lists:filter(fun({_, Distance}) -> Distance =:= 1 end, WordDistances)).

Almost there! Next time, we will actually start generating some word chains.

Word chains (Part 1)

I’ve recently been using the word chains kata for interviewing, and I thought it might be interesting to try using Erlang, and property testing.

The first step is to get a word list. I thought most linux distros came with a dictionary file, but my laptop only had a crack lib, which wasn’t really what I was looking for.

I had used this npm package before, so I just downloaded the text file it provides. With that hand, getting a list of words is easy:

word_list() ->
    {ok, Data} = file:read_file("words.txt"),
    binary:split(Data, [<<"\n">>], [global]).

The next step is to find all words that are one letter away from the first word. So we create a property:

prop_next_word_should_be_new() ->
    ?FORALL({FirstWord, LastWord}, valid_words(),
        begin
            NextWord = word_chains:next_word(FirstWord, LastWord),
            NextWord =/= FirstWord
        end).

For each first word/last word pair, we check that the next word is different from the first word. We also need a generator, of valid words:

valid_words() ->
    ?SUCHTHAT({FirstWord, LastWord},
        ?LET(N, choose(2, 10),
            begin
                WordList = word_chains:word_list(),
                SameLengthWords = lists:filter(fun(W) -> length(W) =:= N end, WordList),
                {random_word(SameLengthWords), random_word(SameLengthWords)}
            end),
    FirstWord =/= LastWord).

random_word(Words) ->
    lists:nth(rand:uniform(length(Words)), Words).

First we pick a random number, in the range (2, 10), and then pick 2 words of that length, from the full word list, at random. This could result in the same word being used as both first & last word, so we filter that out, using the ?SUCHTHAT macro.

For now, we can make this pass by simply returning the last word:

next_word(_FirstWord, LastWord) ->
    LastWord.
$ ./rebar3 proper
===> Verifying dependencies...
===> Compiling word_chains
===> Testing prop_word_chains:prop_next_word_should_be_new()
....................................................................................................
OK: Passed 100 test(s).
===> 
1/1 properties passed

Boom! Next time, a more useful implementation of next word.

Withdraw!

The next transition is a withdrawal. Naively, we add another command:

open(_Data) ->
    [
        {open, {call, bank_statem, deposit, [pos_integer()]}},
        {open, {call, bank_statem, withdraw, [pos_integer()]}}
    ].

Which causes an immediate failure:

===> Testing prop_bank_statem:prop_test()
...
=ERROR REPORT==== 29-Apr-2018::15:08:41 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {withdraw,2}}
** When server state  = {open,#{balance => 1}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,open,
                  [{call,{,#Ref}},
                   {withdraw,2},
                   #{balance => 1}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,46}]},
...
Crash dump is being written to: erl_crash.dump...done

Having the entire process crash isn’t very useful, so we follow the recommendation to use the TRAPEXIT macro:

prop_test() ->
    ?FORALL(Cmds, proper_fsm:commands(?MODULE),
        ?TRAPEXIT(
            begin
                bank_statem:start_link(),
                {History,State,Result} = proper_fsm:run_commands(?MODULE, Cmds),
                bank_statem:stop(),
                ?WHENFAIL(io:format("History: ~p\nState: ~p\nResult: ~p\n", [History,State,Result]),
                    aggregate(zip(proper_fsm:state_names(History), command_names(Cmds)), Result =:= ok))
            end)).

This allows shrinking to take place, and gives us (slightly) more useful output:

===> Testing prop_bank_statem:prop_test()
.!
Failed: After 2 test(s).
A linked process died with reason {function_clause,[{bank_statem,open,[{call,{,#Ref}},{withdraw,3},#{balance=>0}],[{file,[47,97,112,112,47,95,98,117,105,108,100,47,116,101,115,116,47,108,105,98,47,98,97,110,107,95,115,116,97,116,101,109,47,115,114,99,47,98,97,110,107,95,115,116,97,116,101,109,46,101,114,108]},{line,46}]},{gen_statem,call_state_function,5,[{file,[103,101,110,95,115,116,97,116,101,109,46,101,114,108]},{line,1633}]},{gen_statem,loop_event_state_function,6,[{file,[103,101,110,95,115,116,97,116,101,109,46,101,114,108]},{line,1023}]},{proc_lib,init_p_do_apply,3,[{file,[112,114,111,99,95,108,105,98,46,101,114,108]},{line,247}]}]}.

=ERROR REPORT==== 29-Apr-2018::15:11:36 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {withdraw,3}}
** When server state  = {open,#{balance => 0}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,open,
                  [{call,{,#Ref}},
                   {withdraw,3},
                   #{balance => 0}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,46}]},
...
[{set,{var,1},{call,bank_statem,withdraw,[3]}}]

Shrinking (0 time(s))
[{set,{var,1},{call,bank_statem,withdraw,[3]}}]

=ERROR REPORT==== 29-Apr-2018::15:11:36 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {withdraw,3}}
** When server state  = {open,#{balance => 0}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,open,
                  [{call,{,#Ref}},
                   {withdraw,3},
                   #{balance => 0}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,46}]},
....
===> 
0/1 properties passed, 1 failed
===> Failed test cases:
  prop_bank_statem:prop_test() -> false

We can see that the error is caused by trying to withdraw funds that don’t exist. This can be avoided by adding a pre-condition, to ensure that invalid commands aren’t generated:

precondition(_From, _To, #{balance:=Balance}, {call, bank_statem, withdraw, [Amount]}) ->
    Balance - Amount >= 0;

And also update our model when a withdrawal occurs:

next_state_data(_From, _To, #{balance:=Balance}=Data, _Res, {call, bank_statem, withdraw, [Amount]}) ->
    NewBalance = Balance - Amount,
    Data#{balance:=NewBalance};

At this point, I would expect the property to pass, but it’s still failing:

===> Testing prop_bank_statem:prop_test()
...................!
Failed: After 20 test(s).
A linked process died with reason {function_clause,[{bank_statem,open,[{call,{,#Ref}},{withdraw,9},#{balance=>9}],[{file,[47,97,112,112,47,95,98,117,105,108,100,47,116,101,115,116,47,108,105,98,47,98,97,110,107,95,115,116,97,116,101,109,47,115,114,99,47,98,97,110,107,95,115,116,97,116,101,109,46,101,114,108]},{line,46}]},{gen_statem,call_state_function,5,[{file,[103,101,110,95,115,116,97,116,101,109,46,101,114,108]},{line,1633}]},{gen_statem,loop_event_state_function,6,[{file,[103,101,110,95,115,116,97,116,101,109,46,101,114,108]},{line,1023}]},{proc_lib,init_p_do_apply,3,[{file,[112,114,111,99,95,108,105,98,46,101,114,108]},{line,247}]}]}.

=ERROR REPORT==== 29-Apr-2018::15:14:34 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {withdraw,9}}
** When server state  = {open,#{balance => 9}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,open,
                  [{call,{,#Ref}},
                   {withdraw,9},
                   #{balance => 9}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,46}]},
...

Shrinking .
=ERROR REPORT==== 29-Apr-2018::15:14:34 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {withdraw,9}}
** When server state  = {open,#{balance => 9}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,open,
                  [{call,{,#Ref}},
                   {withdraw,9},
                   #{balance => 9}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,46}]},
...
.
=ERROR REPORT==== 29-Apr-2018::15:14:34 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {withdraw,9}}
** When server state  = {open,#{balance => 9}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,open,
                  [{call,{,#Ref}},
                   {withdraw,9},
                   #{balance => 9}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,46}]},
...
(2 time(s))

=ERROR REPORT==== 29-Apr-2018::15:14:34 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {withdraw,9}}
** When server state  = {open,#{balance => 9}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,open,
                  [{call,{,#Ref}},
                   {withdraw,9},
                   #{balance => 9}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,46}]},
...
[{set,{var,1},{call,bank_statem,deposit,[2]}},{set,{var,2},{call,bank_statem,deposit,[7]}},{set,{var,3},{call,bank_statem,withdraw,[9]}}]
===> 
0/1 properties passed, 1 failed
===> Failed test cases:
  prop_bank_statem:prop_test() -> false

Rather embarrassingly, that’s an actual bug in my code, the guard is checking that the remaining balance is strictly greater than 0. Hurray for property testing!

State machine properties

One of the more interesting corners of the Erlang ecosystem is property based testing. Building on Haskell’s QuickCheck, it allows generation of test cases, similar to “fuzzing”.

The integration with gen_fsm/statem makes stateful testing remarkably easy, by describing the possible transitions for our state machine:

-module(prop_bank_statem).

-include_lib("proper/include/proper.hrl").

-compile(export_all).

prop_test() ->
    ?FORALL(Cmds, proper_fsm:commands(?MODULE),
        begin
            bank_statem:start_link(),
            {History,State,Result} = proper_fsm:run_commands(?MODULE, Cmds),
            bank_statem:stop(),
            ?WHENFAIL(
                io:format("History: ~p\nState: ~p\nResult: ~p\n", [History,State,Result]),
                aggregate(zip(proper_fsm:state_names(History), command_names(Cmds)), Result =:= ok)
            )
        end).

initial_state() -> open.

initial_state_data() -> #{}.

open(_Data) -> [{open, {call, bank_statem, deposit, [pos_integer()]}}].

weight(_FromState, _ToState, _Call) -> 1.

precondition(_From, _To, #{}, {call, _Mod, _Fun, _Args}) -> true.

postcondition(_From, _To, _Data, {call, _Mod, _Fun, _Args}, _Res) -> true.

next_state_data(_From, _To, Data, _Res, {call, _Mod, _Fun, _Args}) ->
    NewData = Data,
    NewData.

To begin with, we’re just checking that a deposit can be made to an open account. This property succeeds (after the default 100 cases):

===> Testing prop_bank_statem:prop_test()
....................................................................................................
OK: Passed 100 test(s).

100% {open,{bank_statem,deposit,1}}
===> 
1/1 properties passed

But, if we switch to using a less strict generator:

open(_Data) -> [{open, {call, bank_statem, deposit, [integer()]}}].

We very quickly hit a failure:

===> Testing prop_bank_statem:prop_test()

=ERROR REPORT==== 29-Apr-2018::14:05:12 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {deposit,0}}
** When server state  = {open,#{balance => 0}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,handle_deposit,
                  [0,
                   #{balance => 0},
                   {,#Ref}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,77}]},

caused by an attempt to make a deposit of 0.

To make things more interesting, we can amend our deposit transition to return the updated balance, and track that in the model:

initial_state_data() -> #{balance=>0}.

...

postcondition(_From, _To, #{balance:=PrevBalance}, {call, bank_statem, deposit, [Amount]}, {deposit_made, UpdatedBalance}) ->
    UpdatedBalance =:= (PrevBalance + Amount);

...

next_state_data(_From, _To, #{balance:=Balance}=Data, _Res, {call, bank_statem, deposit, [Amount]}) ->
    NewBalance = Balance + Amount,
    Data#{balance:=NewBalance};

For this toy example, the model is almost exactly the same as the implementation. And, in fact, the hardest thing with this style of property testing is keeping the model simple, when testing a more complex example.

Next time, we will add the rest of the transitions to our model.