Fetch tags when cloning repo

I recently moved a freestyle Jenkins job to a (multibranch) pipeline, and discovered a subtle change in the behaviour of the checkout being made. Before:

git fetch --tags --force --progress -- git@github.com:*** +refs/heads/*:refs/remotes/origin/* # timeout=10

And after:

git fetch --no-tags --force --progress -- https://github.com/*** +refs/heads/master:refs/remotes/origin/master # timeout=10

The important difference here is the --no-tags. This is probably a sensible default for most use cases (i.e. a faster clone), but this job was using semantic-release, which needs the tags.

It’s relatively simple to fix this in the UI: you can add an “advanced clone behaviours” block, and tick a box.

But this job is created using the Job DSL, and I couldn’t see any easy way to add that to the branchSource. In the end, I changed the checkout step from:

stage('Checkout') {
    steps {
        checkout scm
    }
}

to:

checkout scmGit(
    branches: scm.branches,
    extensions: [cloneOption(noTags: false, reference: '', shallow: false)],
    userRemoteConfigs: scm.userRemoteConfigs
)
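
To confirm that the tags actually arrive in the workspace, you can drop a throwaway verification stage in after the checkout (the stage is my own invention; sh and git tag --list are standard):

stage('Verify tags') {
    steps {
        // semantic-release walks the tag history, so this should print something
        sh 'git tag --list'
    }
}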

It’s a Syn (Part 13)

It took a few attempts, but I managed to come up with a merge implementation that worked for both AckSync and SyncRegister, with more than 2 nodes:

Merge(left, right) ==
    LET to_keep == {name \in DOMAIN left : (name \notin DOMAIN right \/ left[name] > right[name])}
    IN [name \in to_keep |-> left[name]]

RECURSIVE Flatten(_, _, _)

Flatten(keys, struct, acc) ==
    IF keys = {} THEN acc
    ELSE
        LET k == CHOOSE k \in keys: TRUE
        IN Flatten(keys \ {k}, struct, acc @@ struct[k])

MergeRegistries(local, remote, remote_node) ==
    LET all_registered == Flatten(DOMAIN local, local, << >>)
    IN [r \in DOMAIN local |-> CASE
        (r = remote_node) -> local[r] @@ Merge(remote, all_registered)
        [] OTHER -> Merge(local[r], remote)
    ]

SyncRegister(n) ==
    ...
    l == MergeRegistries(locally_registered[n], [r \in {message.name} |-> message.time], message.from)

AckSync(n) ==
    ...
    l == MergeRegistries(locally_registered[n], message.local_data, message.from)
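
As a sanity check, here are a couple of ASSUME-style unit tests (see “Testing fragments of TLA+”, below), with invented values. Merge keeps the left-hand entries that either don’t appear on the right, or carry a newer timestamp; Flatten glues the per-node structs into one:

ASSUME LET left == [a |-> 2, b |-> 9]
           right == [a |-> 5, c |-> 1]
       IN Merge(left, right) = [b |-> 9]

ASSUME LET locals == ("n1" :> [a |-> 2] @@ "n2" :> [b |-> 5])
       IN Flatten(DOMAIN locals, locals, << >>) = [a |-> 2, b |-> 5]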

And now I can run simulation mode (with 5 nodes & more keys) for hours, without failure. I have no idea how much of the search space has been covered though; I suspect not a lot.

And on that note, I think I’m done with this topic 🎉

It’s a Syn (Part 12)

Up till now, we have been using the default behaviour of TLC: model checking mode. But there is another tool in our quiver (mixing my metaphors there), simulation mode!

This is more like the way property testing tooling works: rather than a breadth-first search of the problem space, a random walk is generated. As we have discovered, the time taken to run an exhaustive model check rapidly expands as the model becomes more complicated.

If you have added constraints to prevent the model running forever (e.g. in our case, the disconnection limit, and not re-using registered names), you can remove those now. When you run the spec:

$ docker run --rm -it -v $PWD:/app -w /app amazoncorretto:8-alpine3.17-jdk java -cp tla2tools.jar tlc2.TLC -simulate syn.tla
TLC2 Version 2.18 of 20 March 2023 (rev: 3ea3222)
Running Random Simulation with seed -7079380368092791143 with 1 worker on 8 cores with 3488MB heap and 64MB offheap memory (Linux 6.7.2-arch1-1 amd64, Amazon.com Inc. 1.8.0_392 x86_64).
...

It gives you the seed, so you can reproduce the same run after fixing any issues; I believe you can feed it back in with the -seed flag:
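
$ docker run --rm -it -v $PWD:/app -w /app amazoncorretto:8-alpine3.17-jdk java -cp tla2tools.jar tlc2.TLC -simulate -seed -7079380368092791143 syn.tla

TLC will run forever in this mode if there aren’t any failures, but we are not so lucky: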

Error: Invariant ThereCanBeOnlyOne is violated.
Error: The behavior up to this point is:

...

State 34: <Next line 290, col 5 to line 303, col 22 of module syn>
/\ disconnections = 6
/\ locally_registered = ( n1 :> (n1 :> (a :> 2 @@ b :> 26) @@ n2 :> <<>> @@ n3 :> <<>>) @@
  n2 :> (n1 :> <<>> @@ n2 :> (a :> 22) @@ n3 :> <<>>) @@
  n3 :> (n1 :> (a :> 2) @@ n2 :> (a :> 22) @@ n3 :> << >>) )
/\ time = 33
/\ states = << <<"Register", n1, a>>,
   <<"Register", n2, a>>,
   <<"RegisterOrUpdateOnNode", n1, a>>,
   <<"Disconnect", n2, n1>>,
   <<"Register", n1, b>>,
   <<"Disconnect", n2, n3>>,
   <<"Register", n2, b>>,
   <<"SyncRegister", n3, a>>,
   <<"Register", n1, c>>,
   <<"Reconnect", n3, n2>>,
   <<"Unregister", n1, a>>,
   <<"Register", n2, c>>,
   <<"Disconnect", n2, n3>>,
   <<"Down", n3, n2>>,
   <<"Reconnect", n3, n2>>,
   <<"Register", n1, d>>,
   <<"Unregister", n1, a>>,
   <<"Unregister", n1, a>>,
   <<"Reconnect", n1, n2>>,
   <<"Disconnect", n2, n1>>,
   <<"Reconnect", n2, n1>>,
   <<"Disconnect", n2, n1>>,
   <<"RegisterOrUpdateOnNode", n2, a>>,
   <<"Register", n1, e>>,
   <<"Down", n1, n2>>,
   <<"Discover", n3, n2>>,
   <<"RegisterOrUpdateOnNode", n1, b>>,
   <<"Down", n3, n2>>,
   <<"Register", n1, f>>,
   <<"Disconnect", n1, n3>>,
   <<"Register", n3, b>>,
   <<"Discover", n3, n2>>,
   <<"SyncRegister", n3, a>> >>
/\ visible_nodes = (n1 :> {} @@ n2 :> {n3} @@ n3 :> {n2})
/\ registered = (a :> 2 @@ b :> 1 @@ c :> 0 @@ d :> 0 @@ e :> 0 @@ f :> 0 @@ g :> 0 @@ h :> 0)
/\ inbox = (n1 :> <<[action |-> "register_or_update_on_node", name |-> c], [action |-> "unregister_on_node", name |-> a], [action |-> "register_or_update_on_node", name |-> d], [action |-> "unregister_on_node", name |-> a], [action |-> "unregister_on_node", name |-> a], [action |-> "discover", from |-> n2], [action |-> "DOWN", from |-> n2], [action |-> "discover", from |-> n2], [action |-> "DOWN", from |-> n2], [action |-> "register_or_update_on_node", name |-> e], [action |-> "register_or_update_on_node", name |-> f], [action |-> "DOWN", from |-> n3]>> @@ n2 :> <<[time |-> 2, action |-> "sync_register", name |-> a, from |-> n1], [action |-> "DOWN", from |-> n1], [action |-> "DOWN", from |-> n3], [action |-> "register_or_update_on_node", name |-> b], [action |-> "discover", from |-> n3], [action |-> "register_or_update_on_node", name |-> c], [action |-> "DOWN", from |-> n3], [action |-> "discover", from |-> n3], [action |-> "discover", from |-> n1], [action |-> "DOWN", from |-> n1], [action |-> "discover", from |-> n1], [action |-> "DOWN", from |-> n1], [action |-> "ack_sync", from |-> n3, local_data |-> << >>], [action |-> "ack_sync", from |-> n3, local_data |-> << >>]>> @@ n3 :> <<[time |-> 26, action |-> "sync_register", name |-> b, from |-> n1], [action |-> "DOWN", from |-> n1], [action |-> "register_or_update_on_node", name |-> b]>>)
/\ names = (n1 :> {a, g, h} @@ n2 :> {d, e, f, g, h} @@ n3 :> {a, c, d, e, f, g, h})

The stack trace is much deeper now (34 transitions), and with 3 nodes it is much harder to understand what has gone wrong. This looks like an issue with my conflict resolution handling though: I am only checking for conflicts between the current node and the sender. That works fine with a two-node cluster, but not with 3 (or more).

I could probably change that check to \in RegisteredOnThisNode(n), but I would need to know which node had registered it to resolve the conflict; and whatever change I make will need to be reproduced in AckSync too. It might be time to bite the bullet, and factor out the conflict resolution (yet more recursion).

It’s a Syn (Part 11)

Assuming that it is possible to end up with a 🧟 pid, I think one solution would be to send a message when the conflict resolution kills a process:

SyncRegister(n) ==
    ...
       /\ inbox' = [o \in Nodes |->
            (IF (o = n) THEN
                Tail(inbox[o])
            ELSE
                IF keep_remote THEN
                    Append(inbox[o], [action |-> "killed", name |-> message.name, time |-> locally_registered[n][n][message.name], from |-> n])
                ELSE
                    inbox[o])
        ]

and then remove any exact matches, i.e. registered with the same name, at the same time:

Killed(n) ==
    /\ Len(inbox[n]) > 0
    /\ Head(inbox[n]).action = "killed"
    /\ LET message == Head(inbox[n])
        exact_match == message.name \in DOMAIN locally_registered[n][message.from] /\ locally_registered[n][message.from][message.name] = message.time
        l == (IF exact_match THEN
                [r \in (DOMAIN locally_registered[n][message.from] \ {message.name}) |-> locally_registered[n][message.from][r]]
            ELSE
                locally_registered[n][message.from])
        IN inbox' = [inbox EXCEPT![n] = Tail(inbox[n])]
        /\ locally_registered' = [locally_registered EXCEPT![n] = ([locally_registered[n] EXCEPT![message.from] = l])]
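
A quick ASSUME-style sanity check of the exact_match guard (values invented): if the name has since been re-registered at a different time, a stale killed message leaves it alone:

ASSUME LET remote == [a |-> 5]                   \* our copy of the sender's registrations
           message == [name |-> "a", time |-> 9] \* stale: refers to an older incarnation
       IN (message.name \in DOMAIN remote /\ remote[message.name] = message.time) = FALSE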

With this fix, I could run the model significantly longer without failure (but sadly ran out of memory on my laptop; I’m not sure I’m interested enough to pay AWS for the answer).

It’s a Syn (Part 10)

If we also update the refcount when a conflict occurs during sync:

AckSync(n) ==
        ...
        conflicts == DOMAIN locally_registered[n][n] \intersect DOMAIN message.local_data
        c1 == [c \in { r \in conflicts : message.local_data[r] > locally_registered[n][n][r] } |-> registered[c] - 1]
        c2 == [c \in { r \in conflicts : locally_registered[n][n][r] > message.local_data[r] } |-> registered[c]]
        IN ...
        /\ registered' = c1 @@ c2 @@ [r \in (DOMAIN registered \ conflicts) |-> registered[r]]
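
Before running it, a quick ASSUME-style check of the refcount arithmetic, with invented values; the remote registration is newer, so the local process loses the conflict and the count for a drops from 2 to 1:

ASSUME LET conflicts == {"a"}
           local == [a |-> 2]  \* this node's registration time
           remote == [a |-> 6] \* the incoming local_data
           counts == [a |-> 2] \* the current refcounts
           c1 == [c \in {r \in conflicts : remote[r] > local[r]} |-> counts[c] - 1]
           c2 == [c \in {r \in conflicts : local[r] > remote[r]} |-> counts[c]]
       IN c1 @@ c2 @@ [r \in (DOMAIN counts \ conflicts) |-> counts[r]] = [a |-> 1]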

With that change, we get another failure (I’ll paste the full stack trace, this time):

Error: Invariant AllRegistered is violated.
Error: The behavior up to this point is:
State 1: <Initial predicate>
/\ disconnections = 0
/\ locally_registered = (n1 :> (n1 :> <<>> @@ n2 :> <<>>) @@ n2 :> (n1 :> <<>> @@ n2 :> <<>>))
/\ time = 0
/\ states = <<>>
/\ visible_nodes = (n1 :> {n2} @@ n2 :> {n1})
/\ registered = (a :> 0)
/\ inbox = (n1 :> <<>> @@ n2 :> <<>>)
/\ names = (n1 :> {a} @@ n2 :> {a})

State 2: <Next line 249, col 5 to line 261, col 22 of module syn>
/\ disconnections = 0
/\ locally_registered = (n1 :> (n1 :> <<>> @@ n2 :> <<>>) @@ n2 :> (n1 :> <<>> @@ n2 :> <<>>))
/\ time = 1
/\ states = <<<<"Register", n1, a>>>>
/\ visible_nodes = (n1 :> {n2} @@ n2 :> {n1})
/\ registered = (a :> 0)
/\ inbox = (n1 :> <<[action |-> "register_or_update_on_node", name |-> a]>> @@ n2 :> <<>>)
/\ names = (n1 :> {} @@ n2 :> {a})

State 3: <Next line 249, col 5 to line 261, col 22 of module syn>
/\ disconnections = 1
/\ locally_registered = (n1 :> (n1 :> <<>> @@ n2 :> <<>>) @@ n2 :> (n1 :> <<>> @@ n2 :> <<>>))
/\ time = 2
/\ states = <<<<"Register", n1, a>>, <<"Disconnect", n1, n2>>>>
/\ visible_nodes = (n1 :> {} @@ n2 :> {})
/\ registered = (a :> 0)
/\ inbox = ( n1 :>
      << [action |-> "register_or_update_on_node", name |-> a],
         [action |-> "DOWN", from |-> n2] >> @@
  n2 :> <<[action |-> "DOWN", from |-> n1]>> )
/\ names = (n1 :> {} @@ n2 :> {a})

The two nodes (n1 & n2) are disconnected:

State 4: <Next line 249, col 5 to line 261, col 22 of module syn>
/\ disconnections = 1
/\ locally_registered = (n1 :> (n1 :> (a :> 2) @@ n2 :> <<>>) @@ n2 :> (n1 :> <<>> @@ n2 :> <<>>))
/\ time = 3
/\ states = << <<"Register", n1, a>>,
   <<"Disconnect", n1, n2>>,
   <<"RegisterOrUpdateOnNode", n1, a>> >>
/\ visible_nodes = (n1 :> {} @@ n2 :> {})
/\ registered = (a :> 1)
/\ inbox = ( n1 :> <<[action |-> "DOWN", from |-> n2]>> @@
  n2 :> <<[action |-> "DOWN", from |-> n1]>> )
/\ names = (n1 :> {} @@ n2 :> {a})

State 5: <Next line 249, col 5 to line 261, col 22 of module syn>
/\ disconnections = 1
/\ locally_registered = (n1 :> (n1 :> (a :> 2) @@ n2 :> <<>>) @@ n2 :> (n1 :> <<>> @@ n2 :> <<>>))
/\ time = 4
/\ states = << <<"Register", n1, a>>,
   <<"Disconnect", n1, n2>>,
   <<"RegisterOrUpdateOnNode", n1, a>>,
   <<"Down", n1, n2>> >>
/\ visible_nodes = (n1 :> {} @@ n2 :> {})
/\ registered = (a :> 1)
/\ inbox = (n1 :> <<>> @@ n2 :> <<[action |-> "DOWN", from |-> n1]>>)
/\ names = (n1 :> {} @@ n2 :> {a})

State 6: <Next line 249, col 5 to line 261, col 22 of module syn>
/\ disconnections = 1
/\ locally_registered = (n1 :> (n1 :> (a :> 2) @@ n2 :> <<>>) @@ n2 :> (n1 :> <<>> @@ n2 :> <<>>))
/\ time = 5
/\ states = << <<"Register", n1, a>>,
   <<"Disconnect", n1, n2>>,
   <<"RegisterOrUpdateOnNode", n1, a>>,
   <<"Down", n1, n2>>,
   <<"Register", n2, a>> >>
/\ visible_nodes = (n1 :> {} @@ n2 :> {})
/\ registered = (a :> 1)
/\ inbox = ( n1 :> <<>> @@
  n2 :>
      << [action |-> "DOWN", from |-> n1],
         [action |-> "register_or_update_on_node", name |-> a] >> )
/\ names = (n1 :> {} @@ n2 :> {})

State 7: <Next line 249, col 5 to line 261, col 22 of module syn>
/\ disconnections = 1
/\ locally_registered = (n1 :> (n1 :> (a :> 2) @@ n2 :> <<>>) @@ n2 :> (n1 :> <<>> @@ n2 :> <<>>))
/\ time = 6
/\ states = << <<"Register", n1, a>>,
   <<"Disconnect", n1, n2>>,
   <<"RegisterOrUpdateOnNode", n1, a>>,
   <<"Down", n1, n2>>,
   <<"Register", n2, a>>,
   <<"Down", n2, n1>> >>
/\ visible_nodes = (n1 :> {} @@ n2 :> {})
/\ registered = (a :> 1)
/\ inbox = (n1 :> <<>> @@ n2 :> <<[action |-> "register_or_update_on_node", name |-> a]>>)
/\ names = (n1 :> {} @@ n2 :> {})

State 8: <Next line 249, col 5 to line 261, col 22 of module syn>
/\ disconnections = 1
/\ locally_registered = (n1 :> (n1 :> (a :> 2) @@ n2 :> <<>>) @@ n2 :> (n1 :> <<>> @@ n2 :> (a :> 6)))
/\ time = 7
/\ states = << <<"Register", n1, a>>,
   <<"Disconnect", n1, n2>>,
   <<"RegisterOrUpdateOnNode", n1, a>>,
   <<"Down", n1, n2>>,
   <<"Register", n2, a>>,
   <<"Down", n2, n1>>,
   <<"RegisterOrUpdateOnNode", n2, a>> >>
/\ visible_nodes = (n1 :> {} @@ n2 :> {})
/\ registered = (a :> 2)
/\ inbox = (n1 :> <<>> @@ n2 :> <<>>)
/\ names = (n1 :> {} @@ n2 :> {})

During the netsplit, both manage to register the same name (“a”):

State 9: <Next line 249, col 5 to line 261, col 22 of module syn>
/\ disconnections = 1
/\ locally_registered = (n1 :> (n1 :> (a :> 2) @@ n2 :> <<>>) @@ n2 :> (n1 :> <<>> @@ n2 :> (a :> 6)))
/\ time = 8
/\ states = << <<"Register", n1, a>>,
   <<"Disconnect", n1, n2>>,
   <<"RegisterOrUpdateOnNode", n1, a>>,
   <<"Down", n1, n2>>,
   <<"Register", n2, a>>,
   <<"Down", n2, n1>>,
   <<"RegisterOrUpdateOnNode", n2, a>>,
   <<"Reconnect", n2, n1>> >>
/\ visible_nodes = (n1 :> {n2} @@ n2 :> {n1})
/\ registered = (a :> 2)
/\ inbox = ( n1 :> <<[action |-> "discover", from |-> n2]>> @@
  n2 :> <<[action |-> "discover", from |-> n1]>> )
/\ names = (n1 :> {} @@ n2 :> {})

State 10: <Next line 249, col 5 to line 261, col 22 of module syn>
/\ disconnections = 1
/\ locally_registered = (n1 :> (n1 :> (a :> 2) @@ n2 :> <<>>) @@ n2 :> (n1 :> <<>> @@ n2 :> (a :> 6)))
/\ time = 9
/\ states = << <<"Register", n1, a>>,
   <<"Disconnect", n1, n2>>,
   <<"RegisterOrUpdateOnNode", n1, a>>,
   <<"Down", n1, n2>>,
   <<"Register", n2, a>>,
   <<"Down", n2, n1>>,
   <<"RegisterOrUpdateOnNode", n2, a>>,
   <<"Reconnect", n2, n1>>,
   <<"Unregister", n2, a>> >>
/\ visible_nodes = (n1 :> {n2} @@ n2 :> {n1})
/\ registered = (a :> 2)
/\ inbox = ( n1 :> <<[action |-> "discover", from |-> n2]>> @@
  n2 :>
      << [action |-> "discover", from |-> n1],
         [action |-> "unregister_on_node", name |-> a] >> )
/\ names = (n1 :> {} @@ n2 :> {})

State 11: <Next line 249, col 5 to line 261, col 22 of module syn>
/\ disconnections = 1
/\ locally_registered = (n1 :> (n1 :> (a :> 2) @@ n2 :> <<>>) @@ n2 :> (n1 :> <<>> @@ n2 :> (a :> 6)))
/\ time = 10
/\ states = << <<"Register", n1, a>>,
   <<"Disconnect", n1, n2>>,
   <<"RegisterOrUpdateOnNode", n1, a>>,
   <<"Down", n1, n2>>,
   <<"Register", n2, a>>,
   <<"Down", n2, n1>>,
   <<"RegisterOrUpdateOnNode", n2, a>>,
   <<"Reconnect", n2, n1>>,
   <<"Unregister", n2, a>>,
   <<"Discover", n1, n2>> >>
/\ visible_nodes = (n1 :> {n2} @@ n2 :> {n1})
/\ registered = (a :> 2)
/\ inbox = (n1 :> <<>> @@ n2 :> <<[action |-> "discover", from |-> n1], [action |-> "unregister_on_node", name |-> a], [action |-> "ack_sync", from |-> n1, local_data |-> (a :> 2)]>>)
/\ names = (n1 :> {} @@ n2 :> {})

State 12: <Next line 249, col 5 to line 261, col 22 of module syn>
/\ disconnections = 1
/\ locally_registered = (n1 :> (n1 :> (a :> 2) @@ n2 :> <<>>) @@ n2 :> (n1 :> <<>> @@ n2 :> (a :> 6)))
/\ time = 11
/\ states = << <<"Register", n1, a>>,
   <<"Disconnect", n1, n2>>,
   <<"RegisterOrUpdateOnNode", n1, a>>,
   <<"Down", n1, n2>>,
   <<"Register", n2, a>>,
   <<"Down", n2, n1>>,
   <<"RegisterOrUpdateOnNode", n2, a>>,
   <<"Reconnect", n2, n1>>,
   <<"Unregister", n2, a>>,
   <<"Discover", n1, n2>>,
   <<"Discover", n2, n1>> >>
/\ visible_nodes = (n1 :> {n2} @@ n2 :> {n1})
/\ registered = (a :> 2)
/\ inbox = (n1 :> <<[action |-> "ack_sync", from |-> n2, local_data |-> (a :> 6)]>> @@ n2 :> <<[action |-> "unregister_on_node", name |-> a], [action |-> "ack_sync", from |-> n1, local_data |-> (a :> 2)]>>)
/\ names = (n1 :> {} @@ n2 :> {})

State 13: <Next line 249, col 5 to line 261, col 22 of module syn>
/\ disconnections = 1
/\ locally_registered = (n1 :> (n1 :> << >> @@ n2 :> (a :> 6)) @@ n2 :> (n1 :> <<>> @@ n2 :> (a :> 6)))
/\ time = 12
/\ states = << <<"Register", n1, a>>,
   <<"Disconnect", n1, n2>>,
   <<"RegisterOrUpdateOnNode", n1, a>>,
   <<"Down", n1, n2>>,
   <<"Register", n2, a>>,
   <<"Down", n2, n1>>,
   <<"RegisterOrUpdateOnNode", n2, a>>,
   <<"Reconnect", n2, n1>>,
   <<"Unregister", n2, a>>,
   <<"Discover", n1, n2>>,
   <<"Discover", n2, n1>>,
   <<"AckSync", n1, n2>> >>
/\ visible_nodes = (n1 :> {n2} @@ n2 :> {n1})
/\ registered = (a :> 1)
/\ inbox = (n1 :> <<>> @@ n2 :> <<[action |-> "unregister_on_node", name |-> a], [action |-> "ack_sync", from |-> n1, local_data |-> (a :> 2)]>>)
/\ names = (n1 :> {} @@ n2 :> {})

State 14: <Next line 249, col 5 to line 261, col 22 of module syn>
/\ disconnections = 1
/\ locally_registered = (n1 :> (n1 :> << >> @@ n2 :> (a :> 6)) @@ n2 :> (n1 :> <<>> @@ n2 :> << >>))
/\ time = 13
/\ states = << <<"Register", n1, a>>,
   <<"Disconnect", n1, n2>>,
   <<"RegisterOrUpdateOnNode", n1, a>>,
   <<"Down", n1, n2>>,
   <<"Register", n2, a>>,
   <<"Down", n2, n1>>,
   <<"RegisterOrUpdateOnNode", n2, a>>,
   <<"Reconnect", n2, n1>>,
   <<"Unregister", n2, a>>,
   <<"Discover", n1, n2>>,
   <<"Discover", n2, n1>>,
   <<"AckSync", n1, n2>>,
   <<"UnregisterOnNode", n2, a>> >>
/\ visible_nodes = (n1 :> {n2} @@ n2 :> {n1})
/\ registered = (a :> 0)
/\ inbox = (n1 :> <<[action |-> "sync_unregister", name |-> a, from |-> n2]>> @@ n2 :> <<[action |-> "ack_sync", from |-> n1, local_data |-> (a :> 2)]>>)
/\ names = (n1 :> {} @@ n2 :> {})

State 15: <Next line 249, col 5 to line 261, col 22 of module syn>
/\ disconnections = 1
/\ locally_registered = (n1 :> (n1 :> << >> @@ n2 :> << >>) @@ n2 :> (n1 :> <<>> @@ n2 :> << >>))
/\ time = 14
/\ states = << <<"Register", n1, a>>,
   <<"Disconnect", n1, n2>>,
   <<"RegisterOrUpdateOnNode", n1, a>>,
   <<"Down", n1, n2>>,
   <<"Register", n2, a>>,
   <<"Down", n2, n1>>,
   <<"RegisterOrUpdateOnNode", n2, a>>,
   <<"Reconnect", n2, n1>>,
   <<"Unregister", n2, a>>,
   <<"Discover", n1, n2>>,
   <<"Discover", n2, n1>>,
   <<"AckSync", n1, n2>>,
   <<"UnregisterOnNode", n2, a>>,
   <<"SyncUnregister", n1, a>> >>
/\ visible_nodes = (n1 :> {n2} @@ n2 :> {n1})
/\ registered = (a :> 0)
/\ inbox = (n1 :> <<>> @@ n2 :> <<[action |-> "ack_sync", from |-> n1, local_data |-> (a :> 2)]>>)
/\ names = (n1 :> {} @@ n2 :> {})

State 16: <Next line 249, col 5 to line 261, col 22 of module syn>
/\ disconnections = 1
/\ locally_registered = (n1 :> (n1 :> << >> @@ n2 :> << >>) @@ n2 :> (n1 :> (a :> 2) @@ n2 :> << >>))
/\ time = 15
/\ states = << <<"Register", n1, a>>,
   <<"Disconnect", n1, n2>>,
   <<"RegisterOrUpdateOnNode", n1, a>>,
   <<"Down", n1, n2>>,
   <<"Register", n2, a>>,
   <<"Down", n2, n1>>,
   <<"RegisterOrUpdateOnNode", n2, a>>,
   <<"Reconnect", n2, n1>>,
   <<"Unregister", n2, a>>,
   <<"Discover", n1, n2>>,
   <<"Discover", n2, n1>>,
   <<"AckSync", n1, n2>>,
   <<"UnregisterOnNode", n2, a>>,
   <<"SyncUnregister", n1, a>>,
   <<"AckSync", n2, n1>> >>
/\ visible_nodes = (n1 :> {n2} @@ n2 :> {n1})
/\ registered = (a :> 0)
/\ inbox = (n1 :> <<>> @@ n2 :> <<>>)
/\ names = (n1 :> {} @@ n2 :> {})

Once re-connected, the sync takes place; but one node also unregisters the name, and we somehow end up with a zombie registration (for a pid that was killed by the conflict resolution).

So far, I’ve not been able to find anything in the Syn code (or the Erlang VM) that prevents this; but I’m quite open to it being a problem with my model 🤷

It’s a Syn (Part 9)

We are now handling registration conflicts correctly, but the invariant is being violated due to limitations in change tracking in our model.

We can move from a simple set to refcounting:

Init ==
     ...
    /\ registered = [n \in Names |-> 0]

RegisterOrUpdateOnNode(n) ==
    ...
    registered' = [registered EXCEPT![message.name] = @ + 1]

If a name is registered on two nodes, it will look something like this: [a |-> 2]. Running this, we get a failure:

Error: Invariant AllRegistered is violated.
...

State 10: <Next line 235, col 5 to line 247, col 22 of module syn>
/\ disconnections = 0
/\ locally_registered = (n1 :> (n1 :> << >> @@ n2 :> (a :> 3)) @@ n2 :> (n1 :> << >> @@ n2 :> (a :> 3)))
/\ time = 9
/\ states = << <<"Register", n1, a>>,
   <<"Register", n2, a>>,
   <<"RegisterOrUpdateOnNode", n1, a>>,
   <<"RegisterOrUpdateOnNode", n2, a>>,
   <<"Unregister", n1, a>>,
   <<"SyncRegister", n1, a>>,
   <<"UnregisterOnNode", n1, a>>,
   <<"SyncRegister", n2, a>>,
   <<"SyncUnregister", n2, a>> >>
/\ visible_nodes = (n1 :> {n2} @@ n2 :> {n1})
/\ registered = (a :> 0)
/\ inbox = (n1 :> <<>> @@ n2 :> <<>>)
/\ names = (n1 :> {} @@ n2 :> {})

It looks like we think a has been unregistered everywhere, when it hasn’t. The condition for unregister needs updating to respect the refcount; roughly (a sketch, not the exact diff):
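
UnregisterOnNode(n) ==
    ...
    \* hypothetical: the removal becomes a decrement
    registered' = [registered EXCEPT![message.name] = @ - 1]

With that updated, we get a more interesting failure: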

Error: Invariant ThereCanBeOnlyOne is violated.
...

State 12: <Next line 238, col 5 to line 250, col 22 of module syn>
/\ disconnections = 1
/\ locally_registered = ( n1 :> (n1 :> (a :> 3) @@ n2 :> <<>>) @@
  n2 :> (n1 :> (a :> 3) @@ n2 :> (a :> 9)) )
/\ time = 11
/\ states = << <<"Disconnect", n1, n2>>,
   <<"Register", n1, a>>,
   <<"Down", n1, n2>>,
   <<"RegisterOrUpdateOnNode", n1, a>>,
   <<"Reconnect", n1, n2>>,
   <<"Register", n2, a>>,
   <<"Discover", n1, n2>>,
   <<"Down", n2, n1>>,
   <<"Discover", n2, n1>>,
   <<"RegisterOrUpdateOnNode", n2, a>>,
   <<"AckSync", n2, n1>> >>
/\ visible_nodes = (n1 :> {n2} @@ n2 :> {n1})
/\ registered = (a :> 2)
/\ inbox = (n1 :> <<[action |-> "ack_sync", from |-> n2, local_data |-> <<>>], [time |-> 9, action |-> "sync_register", name |-> a, from |-> n2]>> @@ n2 :> <<>>)
/\ names = (n1 :> {} @@ n2 :> {})

Connor MacLeod would not be pleased. Currently, when a node re-joins the cluster, we are just taking its version of the data as the truth. We also need to implement conflict resolution here:

MergeRegistries(local, remote, local_node, remote_node) ==
    LET local_merged == [rr \in {r \in DOMAIN local[local_node] : (r \notin DOMAIN remote \/ local[local_node][r] > remote[r])} |-> local[local_node][rr]]
        remote_merged == [rr \in {r \in DOMAIN remote : (r \notin DOMAIN local[local_node] \/ remote[r] > local[local_node][r])} |-> remote[rr]]
    IN [r \in DOMAIN local |-> CASE
        (r = remote_node) -> remote_merged
        [] (r = local_node) -> local_merged
        [] OTHER -> local[r]
    ]

AckSync(n) ==
    ...
    l == MergeRegistries(locally_registered[n], message.local_data, n, message.from)
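
A quick ASSUME-style check of the merge, with invented values: the older local registration loses the conflict, and the winner ends up under the remote node:

ASSUME LET local == ("n1" :> [a |-> 2] @@ "n2" :> << >>)
       IN MergeRegistries(local, [a |-> 6], "n1", "n2") = ("n1" :> << >> @@ "n2" :> [a |-> 6])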

The real implementation actually uses the same code as for registration, which is probably sensible, so I might need to refactor this later.

Running this looked like it might run forever (again), as I think I added another stuttering case by allowing multiple attempts to register the same name. If we remove that, we get a pretty long run; but eventually it still fails:

Error: Invariant AllRegistered is violated.
...

State 16: <Next line 247, col 5 to line 259, col 22 of module syn>
/\ disconnections = 1
/\ locally_registered = (n1 :> (n1 :> << >> @@ n2 :> << >>) @@ n2 :> (n1 :> << >> @@ n2 :> << >>))
/\ time = 15
/\ states = << <<"Disconnect", n1, n2>>,
   <<"Register", n1, a>>,
   <<"Down", n1, n2>>,
   <<"RegisterOrUpdateOnNode", n1, a>>,
   <<"Register", n2, a>>,
   <<"Down", n2, n1>>,
   <<"RegisterOrUpdateOnNode", n2, a>>,
   <<"Reconnect", n1, n2>>,
   <<"Discover", n1, n2>>,
   <<"Discover", n2, n1>>,
   <<"AckSync", n1, n2>>,
   <<"Unregister", n2, a>>,
   <<"AckSync", n2, n1>>,
   <<"UnregisterOnNode", n2, a>>,
   <<"SyncUnregister", n1, a>> >>
/\ visible_nodes = (n1 :> {n2} @@ n2 :> {n1})
/\ registered = (a :> 1)
/\ inbox = (n1 :> <<>> @@ n2 :> <<>>)
/\ names = (n1 :> {} @@ n2 :> {})

And this time, it’s the other way round. We think a should still be registered, somewhere, but it’s not. I think the problem here is that when we resolve conflicts on sync, we aren’t updating the refcount.

It’s a Syn (Part 7)

Before we can resolve conflicts, we have to generate some. We can let each node try to register names from the same pool:

Init ==
     ...
    /\ names = [n \in Nodes |-> 1..MaxValues]

 Register(n) ==
    /\ names[n] # {}
    /\ LET next_val == CHOOSE o \in (names[n]): TRUE
         IN ...
        /\ names' = [names EXCEPT![n] = names[n] \ {next_val}]

This produces an immediate failure, and reveals that we have a race even without disconnection:

...

State 3: <Next line 139, col 5 to line 149, col 22 of module syn>
/\ states = <<<<"Register", n1, 1>>, <<"Register", n2, 1>>>>
/\ names = (n1 :> {} @@ n2 :> {})
/\ inbox = ( n1 :> <<[action |-> "sync_register", name |-> 1, from |-> n2]>> @@
  n2 :> <<[action |-> "sync_register", name |-> 1, from |-> n1]>> )
/\ locally_registered = (n1 :> (n1 :> {1} @@ n2 :> {}) @@ n2 :> (n1 :> {} @@ n2 :> {1}))
/\ disconnections = 0
/\ registered = {1}
/\ visible_nodes = (n1 :> {n2} @@ n2 :> {n1})

Both nodes can attempt to register the same name, even when connected, if it’s before the sync_register message has arrived/been processed.

Let’s park that, for now. Our conflict resolution process will pick the most recent registration as the winner, so we will need the concept of time:

Init ==
    ...
    /\ time = 0

Register(n) ==
    ...
    /\ time' = time + 1

A counter incremented by every state transition. Now, instead of storing registrations in a set, we will have a struct (function) of name -> time:

 Init ==
    ...
    /\ locally_registered = [n1 \in Nodes |-> [n2 \in Nodes |-> <<>>]]
    /\ names = {"a"}

I was seeing some confusing behaviour from TLC when using integer keys for the registered names, so I switched to a set of strings.

Register(n) ==
    /\ names # {}
    /\ LET next_val == CHOOSE o \in names: TRUE
        l == [locally_registered[n] EXCEPT![n] = locally_registered[n][n] @@ [r \in {next_val} |-> time]]
        IN registered' = registered \union {next_val}
        /\ locally_registered' = [locally_registered EXCEPT![n] = l]
        /\ names' = names \ {next_val}
    ...

SyncRegister(n) ==
    ...
    /\ LET message == Head(inbox[n])
        l == [locally_registered[n] EXCEPT![message.from] = locally_registered[n][message.from] @@ [r \in {message.name} |-> message.time]]

Now locally_registered will look something like this:

( n1 :> (n1 :> << >> @@ n2 :> [a |-> 0]) @@ n2 :> (n1 :> << >> @@ n2 :> [a |-> 0]) )

This model still has no failures, as we can only register a key once. If that limitation is removed, we need to handle a conflict:

        conflict == message.name \in DOMAIN locally_registered[n][n]
        l == [o \in DOMAIN locally_registered[n] |-> CASE
            (o = message.from) -> (
                IF conflict /\ locally_registered[n][n][message.name] > message.time THEN
                    locally_registered[n][message.from]
                ELSE
                    locally_registered[n][message.from] @@ [r \in {message.name} |-> message.time]
            )
            [] (o = n) ->
                IF conflict /\ locally_registered[n][n][message.name] > message.time THEN
                    locally_registered[n][n]
                ELSE
                    [r \in (DOMAIN locally_registered[n][n] \ {message.name}) |-> locally_registered[n][n][r]]
            [] OTHER -> locally_registered[n][o]
        ]

We keep whichever process was registered with a more recent time. I also added the “Highlander” invariant, to guarantee that we can’t keep both:

 INVARIANTS
...
ThereCanBeOnlyOne

RECURSIVE Duplicates(_, _, _)

Duplicates(keys, struct, acc) ==
    IF Cardinality(keys) < 2 THEN acc
    ELSE
        LET k1 == CHOOSE k \in keys: TRUE
        k2 == CHOOSE k \in (keys \ {k1}): TRUE
        duplicates == DOMAIN struct[k1] \intersect DOMAIN struct[k2]
        IN Duplicates(keys \ {k1}, struct, duplicates \union acc)

ThereCanBeOnlyOne ==
    \A n \in Nodes:
        Duplicates(DOMAIN locally_registered[n], locally_registered[n], {}) = {}

We immediately hit a failure:

...

State 4: <Next line 176, col 5 to line 186, col 22 of module syn>
/\ states = <<<<"Register", n1, "a">>, <<"Register", n2, "a">>, <<"Unregister", n1, "a">>>>
/\ time = 3
/\ names = (n1 :> {} @@ n2 :> {})
/\ inbox = ( n1 :>
      <<[time |-> 1, action |-> "sync_register", name |-> "a", from |-> n2]>> @@
  n2 :>
      << [time |-> 0, action |-> "sync_register", name |-> "a", from |-> n1],
         [action |-> "sync_unregister", name |-> "a", from |-> n1] >> )
/\ locally_registered = (n1 :> (n1 :> << >> @@ n2 :> <<>>) @@ n2 :> (n1 :> <<>> @@ n2 :> [a |-> 1]))
/\ disconnections = 0
/\ registered = {}
/\ visible_nodes = (n1 :> {n2} @@ n2 :> {n1})

Both nodes have tried to register the same name, but before we have resolved the conflict, one node has removed its registration; and even though the other node still holds a (valid) registration, we are now expecting the name to not exist anywhere.

The easiest way to solve this is to check all the other nodes before removing the name from the canonical set:

RECURSIVE AllRegisteredNames(_, _, _)

AllRegisteredNames(nodes, locals, registrations) ==
    IF nodes = {} THEN registrations
    ELSE
        LET n == CHOOSE n \in nodes: TRUE
        IN AllRegisteredNames(nodes \ {n}, locals, registrations \union DOMAIN locals[n][n])

RegisteredElsewhere(node) ==
    AllRegisteredNames(AllOtherNodes(node), locally_registered, {})

Unregister(n) ==
    ...
     /\ LET item_to_remove == ItemToRemove(n)
        IN registered' = (IF item_to_remove \in RegisteredElsewhere(n) THEN registered ELSE registered \ {item_to_remove})
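
A quick ASSUME-style check of the new helper, with invented values; each node’s own registrations get gathered into a single set:

ASSUME LET locals == ("n1" :> ("n1" :> [a |-> 0] @@ "n2" :> <<>>) @@ "n2" :> ("n1" :> <<>> @@ "n2" :> [b |-> 1]))
       IN AllRegisteredNames({"n1", "n2"}, locals, {}) = {"a", "b"}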

Now the failure has morphed slightly:

...

State 7: <Next line 187, col 5 to line 197, col 22 of module syn>
/\ disconnections = 0
/\ locally_registered = (n1 :> (n1 :> << >> @@ n2 :> << >>) @@ n2 :> (n1 :> [a |-> 0] @@ n2 :> << >>))
/\ time = 6
/\ states = << <<"Register", n1, "a">>,
   <<"Register", n2, "a">>,
   <<"SyncRegister", n1, "a">>,
   <<"Unregister", n2, "a">>,
   <<"SyncUnregister", n1, "a">>,
   <<"SyncRegister", n2, "a">> >>
/\ visible_nodes = (n1 :> {n2} @@ n2 :> {n1})
/\ registered = {}
/\ inbox = (n1 :> <<>> @@ n2 :> <<>>)
/\ names = (n1 :> {} @@ n2 :> {})

And the 👻 registration is actually the process that lost the conflict resolution battle.

This caused a fair bit of head scratching, and I nearly persuaded myself that I had found a real race condition. My first thought was that a sync_register message for a dead process would be discarded, but erlang:is_process_alive can only be used on a local pid.

Now I think that the root cause is that I am treating register/unregister as instantaneous, synchronous calls (when in reality they go through gen_server:call, and therefore the node’s message queue); which has allowed them to jump ahead of the sync_register message.

We’ll see if I’m right, next time!

It’s a Syn (Part 8)

If we make the register request async:

Register(n) ==
    ...
            IN inbox' = [inbox EXCEPT![n] = Append(inbox[n], [action |-> "register_or_update_on_node", name |-> next_val])]

RegisterOrUpdateOnNode(n) ==
    /\ Len(inbox[n]) > 0
    /\ Head(inbox[n]).action = "register_or_update_on_node"
    /\ LET message == Head(inbox[n])
        l == [locally_registered[n] EXCEPT![n] = locally_registered[n][n] @@ [r \in {message.name} |-> time]]
        already_registered == message.name \in AllRegisteredForNode(locally_registered[n])
        IN
        (IF already_registered THEN
            \* {error, taken}
            registered' = registered
            /\ locally_registered' = locally_registered
            /\ inbox' = [inbox EXCEPT![n] = Tail(inbox[n])]
            /\ names' = names
        ELSE
            registered' = registered \union {message.name}
            /\ locally_registered' = [locally_registered EXCEPT![n] = l]
            /\ inbox' = [o \in Nodes |-> CASE
                (o = n) -> Tail(inbox[n])
                [] (o \in visible_nodes[n]) -> Append(inbox[o], [action |-> "sync_register", name |-> message.name, from |-> n, time |-> time])
                [] OTHER -> inbox[o]
            ]
            /\ names' = [names EXCEPT![n] = names[n] \ {message.name}]
        )
        /\ states' = Append(states, <<"RegisterOrUpdateOnNode", n, message.name>>)
    /\ time' = time + 1
    /\ UNCHANGED <<visible_nodes, disconnections>>

we now need to handle the case where the name was already registered, thanks to the new race we have added.

Other than that, the process is the same, and the same vars are updated. If we do the same for unregister:

Unregister(n) ==
        ...
        IN inbox' = [inbox EXCEPT![n] = Append(inbox[n], [action |-> "unregister_on_node", name |-> item_to_remove])]
 
UnregisterOnNode(n) ==
    /\ Len(inbox[n]) > 0
    /\ Head(inbox[n]).action = "unregister_on_node"
    /\ LET message == Head(inbox[n])
        l == [r \in (DOMAIN locally_registered[n][n] \ {message.name}) |-> locally_registered[n][n][r]]
        already_removed == message.name \notin AllRegisteredForNode(locally_registered[n])
        IN
        (IF already_removed THEN
            \* {error, undefined}
            registered' = registered
            /\ locally_registered' = locally_registered
            /\ inbox' = [inbox EXCEPT![n] = Tail(inbox[n])]
            /\ names' = names
        ELSE
            registered' = (IF message.name \in RegisteredElsewhere(n) THEN registered ELSE registered \ {message.name})
            /\ locally_registered' = [locally_registered EXCEPT![n] = ([locally_registered[n] EXCEPT![n] = l])]
            /\ inbox' = [o \in Nodes |-> CASE
                (o = n) -> Tail(inbox[n])
                [] (o \in visible_nodes[n]) -> Append(inbox[o], [action |-> "sync_unregister", name |-> message.name, from |-> n])
                [] OTHER -> inbox[o]
            ]
            /\ names' = [names EXCEPT![n] = names[n] \ {message.name}]
        )
        /\ states' = Append(states, <<"UnregisterOnNode", n, message.name>>)
    /\ time' = time + 1
    /\ UNCHANGED <<visible_nodes, disconnections>>

we still have a failure of the same invariant:

Error: Invariant AllRegistered is violated.
Error: The behavior up to this point is:
...

State 10: <Next line 234, col 5 to line 246, col 22 of module syn>
/\ disconnections = 0
/\ locally_registered = (n1 :> (n1 :> << >> @@ n2 :> << >>) @@ n2 :> (n1 :> <<>> @@ n2 :> << >>))
/\ time = 9
/\ states = << <<"Register", n1, "a">>,
   <<"Register", n2, "a">>,
   <<"RegisterOrUpdateOnNode", n1, "a">>,
   <<"RegisterOrUpdateOnNode", n2, "a">>,
   <<"SyncRegister", n2, "a">>,
   <<"Unregister", n2, "a">>,
   <<"UnregisterOnNode", n2, "a">>,
   <<"SyncRegister", n1, "a">>,
   <<"SyncUnregister", n1, "a">> >>
/\ visible_nodes = (n1 :> {n2} @@ n2 :> {n1})
/\ registered = {"a"}
/\ inbox = (n1 :> <<>> @@ n2 :> <<>>)
/\ names = (n1 :> {} @@ n2 :> {})

but now the problem is that our canonical list of registrations still thinks the name should be there, even though it was removed.

I think this fix was too simplistic, and I will need to go back to my first idea: refcounting registrations.

It’s a Syn (Part 6)

So the other half of the protocol requires that registrations for a disconnected node are removed. First, we send a DOWN message on disconnect (to both nodes):

Disconnect(n) ==
          ...
        /\ inbox' = [o \in Nodes |-> CASE
            (o = n) -> Append(inbox[o], [action |-> "DOWN", from |-> other_node])
            [] (o = other_node) -> Append(inbox[o], [action |-> "DOWN", from |-> n])
            [] OTHER -> inbox[o]
        ]

And just do nothing with the message, for now:

Down(n) ==
    /\ Len(inbox[n]) > 0
    /\ Head(inbox[n]).action = "DOWN"
    /\ inbox' = [inbox EXCEPT![n] = Tail(inbox[n])]
    /\ states' = Append(states, "Down")
    /\ UNCHANGED <<registered, next_val, visible_nodes, locally_registered>>

Next ==
    /\ \E n \in Nodes:
        ...
        \/ Down(n)

To be able to remove the registrations for the disconnected node, we need to track who has registered what. There’s probably more than one way to achieve this, but I decided to convert my locally_registered struct into a deeper HashMap<Node, HashMap<Node, Set>>, so that each node has its own map of what the others have registered:

 Init ==
    ...
    /\ locally_registered = [n1 \in Nodes |-> [n2 \in Nodes |-> {}]]

This makes updates more complicated:

 Register(n) ==
     /\ next_val < MaxValues
     /\ registered' = registered \union {next_val}
     /\ LET l == [locally_registered[n] EXCEPT![n] = locally_registered[n][n] \union {next_val}]
        IN locally_registered' = [locally_registered EXCEPT![n] = l]

which we will just have to live with. And the invariant needs to flatten everything into one set for comparison:

RECURSIVE ReduceStruct(_, _, _)

ReduceStruct(keys, struct, acc) ==
    IF keys = {} THEN acc
    ELSE
        LET k == CHOOSE k \in keys: TRUE
        IN ReduceStruct(keys \ {k}, struct, acc \union struct[k])

AllRegisteredForNode(locals) ==
    ReduceStruct(DOMAIN locals, locals, {})

 AllRegistered ==
     \A n \in Nodes:
        (\A o \in Nodes: Len(inbox[o]) = 0) /\ visible_nodes[n] = AllOtherNodes(n) => AllRegisteredForNode(locally_registered[n]) = registered

This is pretty fugly, but if there’s a built-in way to do it, I couldn’t find it. Finally, we can use this extra info to remove the remote registrations:

Down(n) ==
    ...
    /\ LET message == Head(inbox[n])
        l == [locally_registered[n] EXCEPT![message.from] = {}]
        IN locally_registered' = [locally_registered EXCEPT![n] = l]

If we run this, the error seems to be gone, but it also looks like it is going to run forever (again). So we also need to limit the number of disconnections:

Init ==
     ...
    /\ disconnections = 0

 Disconnect(n) ==
    /\ disconnections < MaxDisconnections
    ...
    /\ disconnections' = disconnections + 1

Back to a working system! But even with a strictly limited number of nodes/registrations/disconnects:

CONSTANTS
Nodes = {n1, n2}
MaxValues = 2 
MaxDisconnections = 1 

we have had a combinatorial explosion of interleavings, and running this model now takes 45 mins on my laptop:

Model checking completed. No error has been found.
  Estimates of the probability that TLC did not check all reachable states
  because two distinct states had the same fingerprint:
  calculated (optimistic):  val = 2.7E-4
  based on the actual fingerprints:  val = 3.1E-4
148808949 states generated, 49693165 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 17.
The average outdegree of the complete state graph is 1 (minimum is 0, the maximum 7 and the 95th percentile is 2).
Finished in 46min 15s at (2024-01-10 13:55:28)

Nearly 50M distinct states! As I said previously, this is the downside to an exhaustive search of the domain. If this is an existential question for your business (e.g. a smart contract that can’t be fixed once out in the wild, or a spacecraft), then you might need to just rent a honking great EC2 instance and run your model for a week.

There are some possible optimisations. We can declare the nodes to be a “symmetry set”, removing any states that are mirror images of each other (about half in a 2-node cluster). If I have the incantation right, that means defining a permutation set in the module, and pointing the config at it:
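
Symmetry == Permutations(Nodes)

(Permutations comes from the TLC standard module.) And in the config file:

SYMMETRY Symmetry

This makes a huge difference: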

74404477 states generated, 24846583 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 17.
The average outdegree of the complete state graph is 1 (minimum is 0, the maximum 7 and the 95th percentile is 2).
Finished in 05min 03s at (2024-01-10 14:19:02)

but you need to be confident that it really doesn’t affect the outcome. Beyond that, the only option is to start removing some concurrency (i.e. combining two state transitions), and hope that the missing interleavings wouldn’t have revealed any bugs.

And we still haven’t got to the interesting bit! As it stands, our model can never have any registration conflicts because a) we are using an incrementing integer as the key, and b) when we sync, we are unioning two sets, which would be quite happy with two nodes having registered the same name.

Next time, we will look into the default conflict resolution process.

Testing fragments of TLA+

As your specs grow, you quickly reach a point where the feedback loop of running the whole model (even with restricted inputs) is too slow.

If you are using the Toolbox (GUI), it’s pretty easy to set up a scratchfile; but it is also possible to write some “unit” tests for any helper functions:

---- MODULE syn_tests ----
EXTENDS syn

VARIABLE x

I == FALSE /\ x = 0 
N == FALSE /\ x' = x 

...

\* AllRegisteredForNode

ASSUME LET l == ("n1" :> <<>> @@ "n2" :> [a |-> 0, b |-> 2] @@ "n3" :> [c |-> 1])
       res == AllRegisteredForNode(l)
       IN res = {"a", "b", "c"}

\* Duplicates

ASSUME LET l == ("n1" :> <<>> @@ "n2" :> [a |-> 0] @@ "n3" :> [a |-> 1])
       res == Duplicates(DOMAIN l, l, {})
       IN res = {"a"}
====

We need to override the Init and Next functions with this config file (syn_tests.cfg), to prevent the existing model from running:

INIT I
NEXT N

If the assumptions are all valid:

$ docker run --rm -it -v $PWD:/app -w /app amazoncorretto:8-alpine3.17-jdk java -XX:+UseParallelGC -cp tla2tools.jar tlc2.TLC -workers auto -cleanup syn_tests.tla
TLC2 Version 2.18 of 20 March 2023 (rev: 3ea3222)
...
Model checking completed. No error has been found.
  Estimates of the probability that TLC did not check all reachable states
  because two distinct states had the same fingerprint:
  calculated (optimistic):  val = 0.0
0 states generated, 0 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 1.
Finished in 00s at (2024-01-17 14:16:36)

The failure output isn’t very useful:

Starting... 
Error: Assumption line 17, col 8 to line 19, col 18 of module syn_tests is false.
Finished in 00s 

But you can output values to debug, using PrintT:

ASSUME ... IN PrintT(res) /\ res = {}

Starting...
{"a"}
Error: Assumption line 17, col 8 to line 19, col 33 of module syn_tests is false.

There are some more examples here.