Word chains (Part 3)

Last time, we found the possible next words. Now we want to build on that, and use that function to build a chain from the first word, to the goal word. Sounds like a job for recursion (divide and conquer)!

This time, we’ll check that the chains generated all end in the expected word:

prop_all_chains_should_include_last_word() ->
    ?FORALL({FirstWord, LastWord}, valid_words(),
        begin
            Words = word_chains:word_list(length(FirstWord)),
            Chains = word_chains:all_chains(FirstWord, LastWord, Words, length(FirstWord)),
            InvalidChains = lists:filter(fun([W|_]) -> W =/= LastWord end, Chains),
            length(InvalidChains) =:= 0
        end).

We pass in the word list (all words of the chosen length), to avoid reading the file multiple times.

all_chains(FirstWord, LastWord, Words, MaxLength) ->
    lists:sort(fun(A, B) -> length(A) =< length(B) end, all_chains(FirstWord, LastWord, Words, MaxLength, [[FirstWord]])).

all_chains(FirstWord, LastWord, Words, MaxLength, Chains) ->
    lists:append(lists:map(fun(Chain) ->
        [CurrentWord | _Rest] = Chain,
        case CurrentWord =:= LastWord of
            true -> [Chain];
            false ->
                NextWords = next_words(CurrentWord, Words),
                NewChains = compact(lists:map(fun(NewWord) ->
                    case lists:member(NewWord, Chain) of
                        false ->
                            NewChain = [NewWord | Chain],
                            case length(NewChain) > MaxLength of
                                true -> [];
                                false -> NewChain
                            end;
                        true -> []
                    end
                end, NextWords)),
                all_chains(FirstWord, LastWord, Words, MaxLength, NewChains)
        end
    end, Chains)).

Our first chain is simply the first word, e.g. [“cat”]. We then iterate over the list, and find all possible next words, and create the possible chains using those words, [[“bat”, “cat”], [“cab”, “cat”], &c …] .

If any chain ends in the target word, no more work is required. Otherwise we continue to extend, and branch, the chains. If the proposed next word already exists in the current chain, then that branch is dead (to avoid looping forever).

Once all branches have been exhausted, we return the list of valid chains, sorted by length (shortest first).

Unfortunately, while it seemed like a good idea to generate all possible chains, it turns out that some of them can be very long. So I added a max length param, to cut short further exploration.

Even with that, execution can be pretty slow; so next time we’ll do some profiling, and see if caching the possible next words will help.

Word chains (Part 2)

Previously, we laid some groundwork for generating word chains. Rather than arbitrarily returning one word, we might as well get all the words that are one letter different from the first word:

prop_next_words_should_be_near() ->
    ?FORALL({FirstWord, LastWord}, valid_words(),
        begin
            NextWords = word_chains:next_words(FirstWord),
            InvalidWords = lists:filter(fun(W) -> word_chains:get_word_distance(W, FirstWord) =/= 1 end, NextWords),
            length(InvalidWords) =:= 0
        end).

We can calculate the “word distance” using map/reduce:

get_word_distance(Word1, Word2) ->
    Differences = lists:zipwith(fun(X, Y) -> case X =:= Y of true -> 0; false -> 1 end end, Word1, Word2),
    lists:foldl(fun(D, Acc) -> Acc + D end, 0, Differences).

For each letter in Word1, we compare it with the same (position) letter in Word2, and assign a 0 if it matches and a 1 if it differs. The sum of these values tells us the difference between the 2 words.

2> word_chains:get_word_distance("cat", "cat").
0
3> word_chains:get_word_distance("cat", "cot").
1
4> word_chains:get_word_distance("cat", "cog").
2

Using this helper function, we can easily find all the possible next words:

next_words(FirstWord) ->
    WordList = word_list(),
    SameLengthWords = lists:filter(fun(W) -> length(W) =:= length(FirstWord) end, WordList),
    WordDistances = lists:map(fun(W) -> {W, get_word_distance(W, FirstWord)} end, SameLengthWords),
    lists:map(fun({Word, _}) -> Word end, lists:filter(fun({_, Distance}) -> Distance =:= 1 end, WordDistances)).

Almost there! Next time, we will actually start generating some word chains.

Word chains (Part 1)

I’ve recently been using the word chains kata for interviewing, and I thought it might be interesting to try using Erlang, and property testing.

The first step is to get a word list. I thought most linux distros came with a dictionary file, but my laptop only had a crack lib, which wasn’t really what I was looking for.

I had used this npm package before, so I just downloaded the text file it provides. With that hand, getting a list of words is easy:

word_list() ->
    {ok, Data} = file:read_file("words.txt"),
    binary:split(Data, [<<"\n">>], [global]).

The next step is to find all words that are one letter away from the first word. So we create a property:

prop_next_word_should_be_new() ->
    ?FORALL({FirstWord, LastWord}, valid_words(),
        begin
            NextWord = word_chains:next_word(FirstWord, LastWord),
            NextWord =/= FirstWord
        end).

For each first word/last word pair, we check that the next word is different from the first word. We also need a generator, of valid words:

valid_words() ->
    ?SUCHTHAT({FirstWord, LastWord},
        ?LET(N, choose(2, 10),
            begin
                WordList = word_chains:word_list(),
                SameLengthWords = lists:filter(fun(W) -> length(W) =:= N end, WordList),
                {random_word(SameLengthWords), random_word(SameLengthWords)}
            end),
    FirstWord =/= LastWord).

random_word(Words) ->
    lists:nth(rand:uniform(length(Words)), Words).

First we pick a random number, in the range (2, 10), and then pick 2 words of that length, from the full word list, at random. This could result in the same word being used as both first & last word, so we filter that out, using the ?SUCHTHAT macro.

For now, we can make this pass by simply returning the last word:

next_word(_FirstWord, LastWord) ->
    LastWord.
$ ./rebar3 proper
===> Verifying dependencies...
===> Compiling word_chains
===> Testing prop_word_chains:prop_next_word_should_be_new()
....................................................................................................
OK: Passed 100 test(s).
===> 
1/1 properties passed

Boom! Next time, a more useful implementation of next word.

Withdraw!

The next transition is a withdrawal. Naively, we add another command:

open(_Data) ->
    [
        {open, {call, bank_statem, deposit, [pos_integer()]}},
        {open, {call, bank_statem, withdraw, [pos_integer()]}}
    ].

Which causes an immediate failure:

===> Testing prop_bank_statem:prop_test()
...
=ERROR REPORT==== 29-Apr-2018::15:08:41 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {withdraw,2}}
** When server state  = {open,#{balance => 1}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,open,
                  [{call,{,#Ref}},
                   {withdraw,2},
                   #{balance => 1}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,46}]},
...
Crash dump is being written to: erl_crash.dump...done

Having the entire process crash isn’t very useful, so we follow the recommendation to use the TRAPEXIT macro:

prop_test() ->
    ?FORALL(Cmds, proper_fsm:commands(?MODULE),
        ?TRAPEXIT(
            begin
                bank_statem:start_link(),
                {History,State,Result} = proper_fsm:run_commands(?MODULE, Cmds),
                bank_statem:stop(),
                ?WHENFAIL(io:format("History: ~p\nState: ~p\nResult: ~p\n", [History,State,Result]),
                    aggregate(zip(proper_fsm:state_names(History), command_names(Cmds)), Result =:= ok))
            end)).

This allows shrinking to take place, and gives us (slightly) more useful output:

===> Testing prop_bank_statem:prop_test()
.!
Failed: After 2 test(s).
A linked process died with reason {function_clause,[{bank_statem,open,[{call,{,#Ref}},{withdraw,3},#{balance=>0}],[{file,[47,97,112,112,47,95,98,117,105,108,100,47,116,101,115,116,47,108,105,98,47,98,97,110,107,95,115,116,97,116,101,109,47,115,114,99,47,98,97,110,107,95,115,116,97,116,101,109,46,101,114,108]},{line,46}]},{gen_statem,call_state_function,5,[{file,[103,101,110,95,115,116,97,116,101,109,46,101,114,108]},{line,1633}]},{gen_statem,loop_event_state_function,6,[{file,[103,101,110,95,115,116,97,116,101,109,46,101,114,108]},{line,1023}]},{proc_lib,init_p_do_apply,3,[{file,[112,114,111,99,95,108,105,98,46,101,114,108]},{line,247}]}]}.

=ERROR REPORT==== 29-Apr-2018::15:11:36 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {withdraw,3}}
** When server state  = {open,#{balance => 0}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,open,
                  [{call,{,#Ref}},
                   {withdraw,3},
                   #{balance => 0}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,46}]},
...
[{set,{var,1},{call,bank_statem,withdraw,[3]}}]

Shrinking (0 time(s))
[{set,{var,1},{call,bank_statem,withdraw,[3]}}]

=ERROR REPORT==== 29-Apr-2018::15:11:36 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {withdraw,3}}
** When server state  = {open,#{balance => 0}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,open,
                  [{call,{,#Ref}},
                   {withdraw,3},
                   #{balance => 0}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,46}]},
....
===> 
0/1 properties passed, 1 failed
===> Failed test cases:
  prop_bank_statem:prop_test() -> false

We can see that the error is caused by trying to withdraw funds that don’t exist. This can be avoided by adding a pre-condition, to ensure that invalid commands aren’t generated:

precondition(_From, _To, #{balance:=Balance}, {call, bank_statem, withdraw, [Amount]}) ->
    Balance - Amount >= 0;

And also update our model when a withdrawal occurs:

next_state_data(_From, _To, #{balance:=Balance}=Data, _Res, {call, bank_statem, withdraw, [Amount]}) ->
    NewBalance = Balance - Amount,
    Data#{balance:=NewBalance};

At this point, I would expect the property to pass, but it’s still failing:

===> Testing prop_bank_statem:prop_test()
...................!
Failed: After 20 test(s).
A linked process died with reason {function_clause,[{bank_statem,open,[{call,{,#Ref}},{withdraw,9},#{balance=>9}],[{file,[47,97,112,112,47,95,98,117,105,108,100,47,116,101,115,116,47,108,105,98,47,98,97,110,107,95,115,116,97,116,101,109,47,115,114,99,47,98,97,110,107,95,115,116,97,116,101,109,46,101,114,108]},{line,46}]},{gen_statem,call_state_function,5,[{file,[103,101,110,95,115,116,97,116,101,109,46,101,114,108]},{line,1633}]},{gen_statem,loop_event_state_function,6,[{file,[103,101,110,95,115,116,97,116,101,109,46,101,114,108]},{line,1023}]},{proc_lib,init_p_do_apply,3,[{file,[112,114,111,99,95,108,105,98,46,101,114,108]},{line,247}]}]}.

=ERROR REPORT==== 29-Apr-2018::15:14:34 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {withdraw,9}}
** When server state  = {open,#{balance => 9}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,open,
                  [{call,{,#Ref}},
                   {withdraw,9},
                   #{balance => 9}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,46}]},
...

Shrinking .
=ERROR REPORT==== 29-Apr-2018::15:14:34 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {withdraw,9}}
** When server state  = {open,#{balance => 9}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,open,
                  [{call,{,#Ref}},
                   {withdraw,9},
                   #{balance => 9}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,46}]},
...
.
=ERROR REPORT==== 29-Apr-2018::15:14:34 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {withdraw,9}}
** When server state  = {open,#{balance => 9}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,open,
                  [{call,{,#Ref}},
                   {withdraw,9},
                   #{balance => 9}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,46}]},
...
(2 time(s))

=ERROR REPORT==== 29-Apr-2018::15:14:34 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {withdraw,9}}
** When server state  = {open,#{balance => 9}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,open,
                  [{call,{,#Ref}},
                   {withdraw,9},
                   #{balance => 9}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,46}]},
...
[{set,{var,1},{call,bank_statem,deposit,[2]}},{set,{var,2},{call,bank_statem,deposit,[7]}},{set,{var,3},{call,bank_statem,withdraw,[9]}}]
===> 
0/1 properties passed, 1 failed
===> Failed test cases:
  prop_bank_statem:prop_test() -> false

Rather embarrassingly, that’s an actual bug in my code, the guard is checking that the remaining balance is strictly greater than 0. Hurray for property testing!

State machine properties

One of the more interesting corners of the Erlang ecosystem is property based testing. Building on Haskell’s QuickCheck, it allows generation of test cases, similar to “fuzzing”.

The integration with gen_fsm/statem makes stateful testing remarkably easy, by describing the possible transitions for our state machine:

-module(prop_bank_statem).

-include_lib("proper/include/proper.hrl").

-compile(export_all).

prop_test() ->
    ?FORALL(Cmds, proper_fsm:commands(?MODULE),
        begin
            bank_statem:start_link(),
            {History,State,Result} = proper_fsm:run_commands(?MODULE, Cmds),
            bank_statem:stop(),
            ?WHENFAIL(
                io:format("History: ~p\nState: ~p\nResult: ~p\n", [History,State,Result]),
                aggregate(zip(proper_fsm:state_names(History), command_names(Cmds)), Result =:= ok)
            )
        end).

initial_state() -> open.

initial_state_data() -> #{}.

open(_Data) -> [{open, {call, bank_statem, deposit, [pos_integer()]}}].

weight(_FromState, _ToState, _Call) -> 1.

precondition(_From, _To, #{}, {call, _Mod, _Fun, _Args}) -> true.

postcondition(_From, _To, _Data, {call, _Mod, _Fun, _Args}, _Res) -> true.

next_state_data(_From, _To, Data, _Res, {call, _Mod, _Fun, _Args}) ->
    NewData = Data,
    NewData.

To begin with, we’re just checking that a deposit can be made to an open account. This property succeeds (after the default 100 cases):

===> Testing prop_bank_statem:prop_test()
....................................................................................................
OK: Passed 100 test(s).

100% {open,{bank_statem,deposit,1}}
===> 
1/1 properties passed

But, if we switch to using a less strict generator:

open(_Data) -> [{open, {call, bank_statem, deposit, [integer()]}}].

We very quickly hit a failure:

===> Testing prop_bank_statem:prop_test()

=ERROR REPORT==== 29-Apr-2018::14:05:12 ===
** State machine bank_statem terminating
** Last event = {{call,{,#Ref}},
                 {deposit,0}}
** When server state  = {open,#{balance => 0}}
** Reason for termination = error:function_clause
** Callback mode = state_functions
** Stacktrace =
**  [{bank_statem,handle_deposit,
                  [0,
                   #{balance => 0},
                   {,#Ref}],
                  [{file,"/app/_build/test/lib/bank_statem/src/bank_statem.erl"},
                   {line,77}]},

caused by an attempt to make a deposit of 0.

To make things more interesting, we can amend our deposit transition to return the updated balance, and track that in the model:

initial_state_data() -> #{balance=>0}.

...

postcondition(_From, _To, #{balance:=PrevBalance}, {call, bank_statem, deposit, [Amount]}, {deposit_made, UpdatedBalance}) ->
    UpdatedBalance =:= (PrevBalance + Amount);

...

next_state_data(_From, _To, #{balance:=Balance}=Data, _Res, {call, bank_statem, deposit, [Amount]}) ->
    NewBalance = Balance + Amount,
    Data#{balance:=NewBalance};

For this toy example, the model is almost exactly the same as the implementation. And, in fact, the hardest thing with this style of property testing is keeping the model simple, when testing a more complex example.

Next time, we will add the rest of the transitions to our model.

Hold pls

On hold

The next step for our bank account is to place a hold on an account.

account-events-withself-held

Using state functions, this is as simple as:

open({call, From}, place_hold, Data) ->
    {next_state, held, Data, [{reply, From, hold_placed}]};

...

open({call, From}, {deposit, Amount}, Data) ->
    handle_deposit(Amount, Data, From);

...

held({call, From}, {deposit, Amount}, Data) ->
    handle_deposit(Amount, Data, From);

held({call, From}, remove_hold, Data) ->
    {next_state, open, Data, [{reply, From, hold_removed}]}.

handle_deposit(Amount, #{balance:=Balance} = Data, From) when is_number(Amount) andalso Amount > 0 ->
    NewBalance = Balance + Amount,
    {keep_state, Data#{balance:=NewBalance}, [{reply, From, deposit_made}]}.

Now any attempt to withdraw funds from a held account, will cause the gen_statem process to crash (hopefully having previously persisted any important data!).

And using handle event:

handle_event({call, From}, place_hold, open, Data) ->
    {next_state, held, Data, [{reply, From, hold_placed}]};

...

handle_event({call, From}, remove_hold, held, Data) ->
    {next_state, open, Data, [{reply, From, hold_removed}]};

...

handle_event({call, From}, {deposit, Amount}, open, Data) ->
    handle_deposit(Amount, Data, From);

handle_event({call, From}, {deposit, Amount}, held, Data) ->
    handle_deposit(Amount, Data, From);

...

handle_deposit(Amount, #{balance:=Balance} = Data, From) when is_number(Amount) andalso Amount > 0 ->
    NewBalance = Balance + Amount,
    {keep_state, Data#{balance:=NewBalance}, [{reply, From, deposit_made}]}.

Insufficient funds?

We can also add an availableToWithdraw method, with different behaviour for held accounts, without too much hassle:

open({call, From}, available_to_withdraw, #{balance:=Balance} = Data) ->
    {keep_state, Data, [{reply, From, Balance}]};

...

held({call, From}, available_to_withdraw, Data) ->
    {keep_state, Data, [{reply, From, 0}]};

Or:

handle_event({call, From}, available_to_withdraw, open, #{balance:=Balance} = Data) ->
    {keep_state, Data, [{reply, From, Balance}]};

...

handle_event({call, From}, available_to_withdraw, held, Data) ->
    {keep_state, Data, [{reply, From, 0}]};

Handle event function

The main difference between gen_statem and the older gen_fsm, is the addition of a new “handle event function” callback mode, with less restrictions on the state data type.

Converting an existing state machine, is pretty easy. Change the callback mode, and move the existing state functions into one callback:

callback_mode() -> handle_event_function.

handle_event({call, From}, get_balance, open, #{balance:=Balance} = Data) ->
    {keep_state, Data, [{reply, From, Balance}]};

handle_event({call, From}, close, open, Data) ->
    {next_state, closed, Data, [{reply, From, closed}]};

handle_event({call, From}, {deposit, Amount}, open, #{balance:=Balance} = Data) when is_number(Amount) andalso Amount > 0 ->
    NewBalance = Balance + Amount,
    {keep_state, Data#{balance:=NewBalance}, [{reply, From, deposit_made}]};

handle_event({call, From}, {withdraw, Amount}, open, #{balance:=Balance} = Data) when is_number(Amount) andalso (Balance - Amount > 0) ->
    NewBalance = Balance - Amount,
    {keep_state, Data#{balance:=NewBalance}, [{reply, From, withdrawal_made}]};

handle_event({call, From}, reopen, closed, Data) ->
    {next_state, open, Data, [{reply, From, open}]}.

Whether you prefer this style, or the previous version, seems like a matter of personal taste, if you don’t need the extra flexibility.