It’s a Syn (Part 3)

I said last time that waiting for all messages to be processed before performing any new actions was unrealistic, so let’s remove those two pre-conditions:

diff --git a/syn.tla b/syn.tla
index 5bab535..d75d5d3 100644
--- a/syn.tla
+++ b/syn.tla
@@ -15,7 +15,6 @@ Init ==
     /\ removed = 0
 
 Register(n) ==
-    /\ \A o \in Nodes: Len(inbox[o]) = 0
     /\ next_val < MaxValues
     /\ registered' = [registered EXCEPT![n] = registered[n] \union {next_val}]
     /\ next_val' = next_val + 1
@@ -34,7 +33,6 @@ ItemToRemove(n) ==
     CHOOSE r \in registered[n]: TRUE
 
 Unregister(n) ==
-    /\ \A o \in Nodes: Len(inbox[o]) = 0
     /\ Cardinality(registered[n]) > 0
     /\ LET item_to_remove == ItemToRemove(n)
         IN registered' = [registered EXCEPT![n] = registered[n] \ {item_to_remove}]

Now if we run the model checker, we have a problem:

Error: Invariant AllRegistered is violated.
Error: The behavior up to this point is:
State 1: <Initial predicate>
/\ next_val = 0
/\ inbox = (n1 :> <<>> @@ n2 :> <<>>)
/\ added = 0
/\ removed = 0
/\ registered = (n1 :> {} @@ n2 :> {})

State 2: <Next line 55, col 5 to line 60, col 19 of module syn>
/\ next_val = 1
/\ inbox = (n1 :> <<>> @@ n2 :> <<[action |-> "sync_register", name |-> 0]>>)
/\ added = 1
/\ removed = 0
/\ registered = (n1 :> {0} @@ n2 :> {})

State 3: <Next line 55, col 5 to line 60, col 19 of module syn>
/\ next_val = 1
/\ inbox = ( n1 :> <<>> @@
  n2 :>
      << [action |-> "sync_register", name |-> 0],
         [action |-> "sync_unregister", name |-> 0] >> )
/\ added = 1
/\ removed = 1
/\ registered = (n1 :> {} @@ n2 :> {})

State 4: <Next line 55, col 5 to line 60, col 19 of module syn>
/\ next_val = 1
/\ inbox = (n1 :> <<>> @@ n2 :> <<[action |-> "sync_unregister", name |-> 0]>>)
/\ added = 1
/\ removed = 1
/\ registered = (n1 :> {} @@ n2 :> {0})

State 5: <Next line 55, col 5 to line 60, col 19 of module syn>
/\ next_val = 1
/\ inbox = ( n1 :> <<[action |-> "sync_unregister", name |-> 0]>> @@
  n2 :> <<[action |-> "sync_unregister", name |-> 0]>> )
/\ added = 1
/\ removed = 2
/\ registered = (n1 :> {} @@ n2 :> {})

State 6: <Next line 55, col 5 to line 60, col 19 of module syn>
/\ next_val = 1
/\ inbox = (n1 :> <<>> @@ n2 :> <<[action |-> "sync_unregister", name |-> 0]>>)
/\ added = 1
/\ removed = 2
/\ registered = (n1 :> {} @@ n2 :> {})

In this scenario, a key (0) was added on node 1 and then immediately removed. But node 2 still had both messages waiting, so it also added the key; and then decided to remove it itself before getting to the unregister message from node 1. This is getting more interesting, but the problem lies with our model, not the protocol.

It’s generally considered bad form to update the registry for another process (and this can actually be enforced); but the real problem here is that even though removing an item that doesn’t exist from a set is a no-op, we are double counting. So we think that we have added 1 item, and removed 2; and therefore the size of the set should be -1, which is never going to happen.

Assuming we want to allow the non-strict behaviour, we could probably solve this with some clever conditional updates of the counters; but I think I would prefer instead to track a “platonic ideal” of what should be registered, and compare the actual state to that instead.

Init ==
    ...
    /\ registered = {}
    /\ locally_registered = [n \in Nodes |-> {}]

Register(n) ==
    ...
    /\ registered' = registered \union {next_val}
    /\ locally_registered' = [locally_registered EXCEPT![n] = locally_registered[n] \union {next_val}]

AllRegistered ==
    \A n \in Nodes:
        Len(inbox[n]) = 0 => locally_registered[n] = registered

This does remove the previous error, but even with just 5 keys, we are generating far more states than before:

Model checking completed. No error has been found.
    ...
1782413 states generated, 357188 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 26.

This means running the model will be slower, but should also mean it’s more likely to find some interesting concurrent executions.

Next stop, netsplits!

Leave a comment