PropEr – Look on my works, ye Mighty, and despair!

Word chains (Part 3)

January 7, 2019, by Graham Hay

Last time, we found the possible next words. Now we want to build on that, and use that function to build a chain from the first word to the goal word. Sounds like a job for recursion (divide and conquer)!

This time, we’ll check that every generated chain ends in the expected word. Each chain is built by prepending, so its most recent word is at the head of the list:

prop_all_chains_should_include_last_word() ->
    ?FORALL({FirstWord, LastWord}, valid_words(),
        begin
            Words = word_chains:word_list(length(FirstWord)),
            Chains = word_chains:all_chains(FirstWord, LastWord, Words, length(FirstWord)),
            InvalidChains = lists:filter(fun([W|_]) -> W =/= LastWord end, Chains),
            length(InvalidChains) =:= 0
        end).

We pass in the word list (all words of the chosen length), to avoid reading the file multiple times.
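The one-argument word_list and two-argument next_words used here are not shown in this post. A minimal sketch of what they might look like, assuming words are plain strings; the module name, words_of_length/2 and distance/2 are hypothetical, and the full list is passed in rather than read from words.txt so the example runs on its own:

```erlang
-module(word_helpers_sketch).
-export([words_of_length/2, next_words/2, distance/2]).

%% Hypothetical stand-in for word_list/1: keep only the words of length N.
%% Takes the full list as an argument instead of reading words.txt.
words_of_length(N, AllWords) ->
    [W || W <- AllWords, length(W) =:= N].

%% Hypothetical next_words/2: every word in Words exactly one letter away
%% from Word. Words must already be the same length as Word.
next_words(Word, Words) ->
    [W || W <- Words, distance(W, Word) =:= 1].

%% Number of positions at which two equal-length strings differ.
distance(Word1, Word2) ->
    length([ok || {X, Y} <- lists:zip(Word1, Word2), X =/= Y]).
```

Filtering by length once, up front, means the per-word lookup only has to compare distances.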

%% Returns every chain from FirstWord to LastWord of at most MaxLength words,
%% shortest first. Chains are built by prepending, so each one is stored in
%% reverse order, with LastWord at its head.
all_chains(FirstWord, LastWord, Words, MaxLength) ->
    Chains = all_chains(FirstWord, LastWord, Words, MaxLength, [[FirstWord]]),
    lists:sort(fun(A, B) -> length(A) =< length(B) end, Chains).

all_chains(FirstWord, LastWord, Words, MaxLength, Chains) ->
    lists:append(lists:map(fun(Chain) ->
        [CurrentWord | _Rest] = Chain,
        case CurrentWord =:= LastWord of
            true -> [Chain];
            false ->
                %% Extend the chain with each next word that is not already
                %% in it, unless that would make it longer than MaxLength.
                NewChains = [[NewWord | Chain] ||
                                NewWord <- next_words(CurrentWord, Words),
                                not lists:member(NewWord, Chain),
                                length(Chain) < MaxLength],
                all_chains(FirstWord, LastWord, Words, MaxLength, NewChains)
        end
    end, Chains)).

Our first chain is simply the first word, e.g. ["cat"]. We then iterate over the list of chains, find all possible next words for each one, and create the possible chains using those words: [["bat", "cat"], ["cab", "cat"], ...].

If a chain has reached the target word, no more work is required for it. Otherwise we continue to extend, and branch, the chain. If the proposed next word already exists in the current chain, then that branch is dead (to avoid looping forever).

Once all branches have been exhausted, we return the list of valid chains, sorted by length (shortest first).
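Because new words are prepended, each chain comes back newest word first. A small sketch of how the result might be turned into something readable, assuming the sorted list of chains described above; the module and function names are hypothetical:

```erlang
-module(chain_output_sketch).
-export([shortest/1, forwards/1]).

%% Each chain is stored newest word first, so reverse it for display.
forwards(Chain) ->
    lists:reverse(Chain).

%% The chains are sorted shortest first, so the head of a non-empty result
%% is a shortest chain. Returns none when no chain was found.
shortest([]) -> none;
shortest([Chain | _]) -> {ok, forwards(Chain)}.
```

For example, shortest([["cog", "cot", "cat"], ["cog", "cot", "cut", "cat"]]) gives {ok, ["cat", "cot", "cog"]}.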

Unfortunately, while it seemed like a good idea to generate all possible chains, it turns out that some of them can be very long. So I added a max length parameter, to cut short further exploration. In the property above, the limit is the number of letters in the first word.

Even with that, execution can be pretty slow; so next time we’ll do some profiling, and see if caching the possible next words will help.
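One shape that caching could take is to work out every word's neighbours once, and look them up during the search. This is only a sketch under that assumption, not a measured improvement; the module and function names are hypothetical, and building the map compares every pair of words up front:

```erlang
-module(next_words_cache_sketch).
-export([build/1, next_words/2]).

%% Precompute a map from every word to its one-letter neighbours.
%% Words must all be the same length.
build(Words) ->
    maps:from_list([{W, neighbours(W, Words)} || W <- Words]).

%% Look up the cached neighbours; words missing from the cache have none.
next_words(Word, Cache) ->
    maps:get(Word, Cache, []).

neighbours(Word, Words) ->
    [W || W <- Words, differs_by_one(W, Word)].

%% True when two equal-length strings differ in exactly one position.
differs_by_one(Word1, Word2) ->
    length([ok || {X, Y} <- lists:zip(Word1, Word2), X =/= Y]) =:= 1.
```

Whether the up-front cost pays for itself depends on how often the same word is revisited, which is what profiling should show.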

Word chains (Part 2)

January 6, 2019, by Graham Hay

Previously, we laid some groundwork for generating word chains. Rather than arbitrarily returning one word, we might as well get all the words that are one letter different from the first word:

prop_next_words_should_be_near() ->
    ?FORALL({FirstWord, _LastWord}, valid_words(),
        begin
            NextWords = word_chains:next_words(FirstWord),
            InvalidWords = lists:filter(fun(W) -> word_chains:get_word_distance(W, FirstWord) =/= 1 end, NextWords),
            length(InvalidWords) =:= 0
        end).

We can calculate the “word distance” (the Hamming distance between the two words) using map/reduce:

get_word_distance(Word1, Word2) ->
    Differences = lists:zipwith(fun(X, Y) -> case X =:= Y of true -> 0; false -> 1 end end, Word1, Word2),
    lists:foldl(fun(D, Acc) -> Acc + D end, 0, Differences).

For each letter in Word1, we compare it with the letter at the same position in Word2, and assign a 0 if it matches and a 1 if it differs. The sum of these values is the distance between the two words. Both words must be the same length.

2> word_chains:get_word_distance("cat", "cat").
0
3> word_chains:get_word_distance("cat", "cot").
1
4> word_chains:get_word_distance("cat", "cog").
2

Using this helper function, we can easily find all the possible next words:

next_words(FirstWord) ->
    WordList = word_list(),
    SameLengthWords = lists:filter(fun(W) -> length(W) =:= length(FirstWord) end, WordList),
    WordDistances = lists:map(fun(W) -> {W, get_word_distance(W, FirstWord)} end, SameLengthWords),
    lists:map(fun({Word, _}) -> Word end, lists:filter(fun({_, Distance}) -> Distance =:= 1 end, WordDistances)).

Almost there! Next time, we will actually start generating some word chains.

Word chains (Part 1)

January 5, 2019, by Graham Hay

I’ve recently been using the word chains kata for interviewing, and I thought it might be interesting to try using Erlang, and property testing.

The first step is to get a word list. I thought most Linux distros came with a dictionary file, but my laptop only had cracklib, which wasn’t really what I was looking for.

I had used this npm package before, so I just downloaded the text file it provides. With that in hand, getting a list of words is easy:

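word_list() ->
    {ok, Data} = file:read_file("words.txt"),
    %% Convert each line to a string, since the rest of the code calls
    %% length/1 and list functions on words; trim_all drops empty lines.
    [binary_to_list(Word) || Word <- binary:split(Data, <<"\n">>, [global, trim_all])].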

The next step is to find a word that is one letter away from the first word. We start with a simple property:

prop_next_word_should_be_new() ->
    ?FORALL({FirstWord, LastWord}, valid_words(),
        begin
            NextWord = word_chains:next_word(FirstWord, LastWord),
            NextWord =/= FirstWord
        end).

For each first word/last word pair, we check that the next word is different from the first word. We also need a generator, of valid words:

valid_words() ->
    ?SUCHTHAT({FirstWord, LastWord},
        ?LET(N, choose(2, 10),
            begin
                WordList = word_chains:word_list(),
                SameLengthWords = lists:filter(fun(W) -> length(W) =:= N end, WordList),
                {random_word(SameLengthWords), random_word(SameLengthWords)}
            end),
    FirstWord =/= LastWord).

random_word(Words) ->
    lists:nth(rand:uniform(length(Words)), Words).

First we pick a random number in the range 2 to 10 (inclusive), and then pick two words of that length at random from the full word list. This could result in the same word being used as both first and last word, so we filter that out using the ?SUCHTHAT macro.
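The idea behind that filter is to draw a pair, and draw again if it is not acceptable. A plain Erlang sketch of the same idea, without PropEr; the module and function names are hypothetical, and unlike a real generator it gives no help with shrinking:

```erlang
-module(word_pair_sketch).
-export([distinct_pair/1]).

%% Pick two different words at random from Words, retrying on a clash.
%% Words must contain at least two distinct entries, or this never returns.
distinct_pair(Words) ->
    First = random_word(Words),
    Last = random_word(Words),
    case First =/= Last of
        true -> {First, Last};
        false -> distinct_pair(Words)
    end.

%% Uniformly random element of a non-empty list.
random_word(Words) ->
    lists:nth(rand:uniform(length(Words)), Words).
```

With a large word list a clash is rare, so the retry almost never happens.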

For now, we can make this pass by simply returning the last word:

next_word(_FirstWord, LastWord) ->
    LastWord.

$ ./rebar3 proper
===> Verifying dependencies...
===> Compiling word_chains
===> Testing prop_word_chains:prop_next_word_should_be_new()
....................................................................................................
OK: Passed 100 test(s).
===> 
1/1 properties passed

Boom! Next time, a more useful implementation of next word.