We are starting to get to the business end. So how can we model a netsplit? The easiest thing seems to be to add a set for each node, listing the nodes that are visible from it:
AllOtherNodes(n) ==
Nodes \ {n}
Init ==
...
/\ visible_nodes = [n \in Nodes |-> AllOtherNodes(n)]
At the start, all nodes can see all other nodes. Now, when we register a new value:
Register(n) ==
...
/\ inbox' = [o \in Nodes |->
IF o \in visible_nodes[n] THEN
Append(inbox[o], [action |-> "sync_register", name |-> next_val])
ELSE
inbox[o]
]
we only send messages to the nodes that are still visible (same for unregister). And we add two new state transitions:
Next ==
/\ \E n \in Nodes:
...
\/ Disconnect(n)
\/ Reconnect(n)
To disconnect:
Disconnect(n) ==
/\ Cardinality(visible_nodes[n]) > 0
/\ LET other_node == CHOOSE o \in visible_nodes[n]: TRUE
IN visible_nodes' = [o \in Nodes |-> CASE
(o = other_node) -> visible_nodes[o] \ {n}
[] (o = n) -> visible_nodes[o] \ {other_node}
[] OTHER -> visible_nodes[o]
]
/\ UNCHANGED <<registered, locally_registered, inbox, next_val>>
If other nodes are still visible; we select a node, at random, from the set and remove it (on both sides). And reconnection is the opposite:
Reconnect(n) ==
/\ Cardinality(AllOtherNodes(n) \ visible_nodes[n]) > 0
/\ LET other_node == CHOOSE o \in (AllOtherNodes(n) \ visible_nodes[n]): TRUE
IN visible_nodes' = [o \in Nodes |-> CASE
(o = other_node) -> visible_nodes[o] \union {n}
[] (o = n) -> visible_nodes[o] \union {other_node}
[] OTHER -> visible_nodes[o]
]
/\ UNCHANGED <<registered, locally_registered, inbox, next_val>>
We also need to update our invariant:
AllRegistered ==
\A n \in Nodes:
Len(inbox[n]) = 0 /\ visible_nodes[n] = AllOtherNodes(n) => locally_registered[n] = registered
to only compare the registered values, once a netsplit has healed. If we run this model:
Error: Invariant AllRegistered is violated.
Error: The behavior up to this point is:
State 1: <Initial predicate>
/\ next_val = 0
/\ inbox = (n1 :> <<>> @@ n2 :> <<>>)
/\ locally_registered = (n1 :> {} @@ n2 :> {})
/\ registered = {}
/\ visible_nodes = (n1 :> {n2} @@ n2 :> {n1})
State 2: <Next line 78, col 5 to line 85, col 19 of module syn>
/\ next_val = 0
/\ inbox = (n1 :> <<>> @@ n2 :> <<>>)
/\ locally_registered = (n1 :> {} @@ n2 :> {})
/\ registered = {}
/\ visible_nodes = (n1 :> {} @@ n2 :> {})
State 3: <Next line 78, col 5 to line 85, col 19 of module syn>
/\ next_val = 1
/\ inbox = (n1 :> <<>> @@ n2 :> <<>>)
/\ locally_registered = (n1 :> {0} @@ n2 :> {})
/\ registered = {0}
/\ visible_nodes = (n1 :> {} @@ n2 :> {})
State 4: <Next line 78, col 5 to line 85, col 19 of module syn>
/\ next_val = 1
/\ inbox = (n1 :> <<>> @@ n2 :> <<>>)
/\ locally_registered = (n1 :> {0} @@ n2 :> {})
/\ registered = {0}
/\ visible_nodes = (n1 :> {n2} @@ n2 :> {n1})
it fails pretty quickly. But that is what we should expect, as new keys could have been added while the network was partitioned, and we aren’t doing anything to resolve that. And that is what we will be looking into, next time.