I was looking at the Syn process registry recently, and thought it might be interesting to try modelling its netsplit recovery protocol (eventual consistency) in TLA+.
It’s always worth starting with something so simple, that it can’t be wrong. In this case I decided that scopes were out of scope, as was metadata; and I would first try to build a model for a single node.
Init ==
/\ registered = {}
/\ next_val = 0
/\ added = 0
/\ removed = 0
We have an empty set, for the registered values. A serial for the next item to be added (using integers as “keys”); and two counters, for the number of items added & removed.
Next ==
\/ Register
\/ Unregister
\/ Complete
The next state is a choice of registering a value, unregistering a value; or reaching the end (an arbitrary limit).
Register ==
/\ next_val < 5
/\ registered' = registered \union {next_val}
/\ next_val' = next_val + 1
/\ added' = added + 1
/\ UNCHANGED <<removed>>
If we haven’t hit that limit yet (pre-condition), then we add the next value to the set of registered values, and increment the counter.
ItemToRemove ==
CHOOSE r \in registered: TRUE
Unregister ==
/\ Cardinality(registered) > 0
/\ registered' = registered \ {ItemToRemove}
/\ removed' = removed + 1
/\ UNCHANGED <<added, next_val>>
Unregistration is pretty similar. The pre-condition this time is that the set is not empty, and obviously we remove the (randomly selected) key instead.
Complete ==
/\ next_val = 5
/\ UNCHANGED <<added, next_val, registered, removed>>
Finally, once 5 values have been added (no matter how many were removed), we give up.
INVARIANTS
AllRegistered
AllRegistered == Cardinality(registered) = (added - removed)
And after every step, we ensure that the number of items in the set is equal to those added minus those removed.
If we run the model:
$ docker run --rm -it -v $PWD:/app -w /app openjdk:14-jdk-alpine java -XX:+UseParallelGC -cp tla2tools.jar tlc2.TLC -config syn.cfg -workers auto -cleanup syn.tla
TLC2 Version 2.18 of Day Month 20?? (rev: cab6f13)
...
Computing initial states...
Finished computing initial states: 1 distinct state generated at 2024-01-05 12:23:27.
Model checking completed. No error has been found.
Estimates of the probability that TLC did not check all reachable states
because two distinct states had the same fingerprint:
calculated (optimistic): val = 1.8E-17
37 states generated, 21 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 11.
No errors are found. Not a huge surprise, but this gives us a good foundation to build something more realistic.