This presentation talked about:
- What TLA+, PlusCal and TLC are like
- How to get started with TLA+, using the Lost Update anomaly as
an example
- Overview of the TLA+ spec for ScalarDB
Example: Worker1 and Worker2 lease tasks from the Task Manager (TM)

This is what the developers expect:
- Worker1: Lease task-1 from the TM
- Task Manager: task-1 is leased by Worker1 for a while
- Worker1: Crash (Restarted. Doesn't remember task-1)
- Task Manager: The lease of task-1 is expired
- Worker2: Lease task-1 from the TM
- Task Manager: task-1 is leased by Worker2 for a while
- Worker2: Process task-1, then Finished task-1
- Task Manager: task-1 is completed

It's hard to exhaustively test all the potential cases…

This could potentially happen (e.g., due to long STW by GC):
- Worker1: Lease task-1 from the TM
- Task Manager: task-1 is leased by Worker1 for a while
- Worker1: Process task-1 checking the expiration, then Stuck
- Task Manager: The lease of task-1 is expired
- Worker2: Lease task-1 from the TM
- Task Manager: task-1 is leased by Worker2 for a while
- Worker1: Notice the expiration of task-1, then Aborted task-1
- Task Manager: task-1 is aborted
- Worker2: Process task-1, then Finished task-1
- Task Manager: ???? task-1 is already aborted
logic works in (almost) all potential cases
- TLA+
- SPIN
  - Its language, Promela, has an array data type, but doesn't have set, map or list…
- P-lang
  - It has some modern features and is easy to use. But I felt the exhaustive checking mode was unstable…
- Alloy

From my experience playing with these tools (except for Alloy), TLA+ has a good balance of functionality, performance and exhaustive testing capability.
Leslie Lamport
- It's based on mathematics and set theory
- Easy to define "liveness" in addition to "safety"

It's not easy to write down the spec in TLA+…?
https://lamport.azurewebsites.net/tla/summary-standalone.pdf
use of formal methods
- It resembles imperative pseudo-code for describing concurrent algorithms
- It serves as a front-end to TLA+: the code is translated into TLA+ code

Easy to use. So, we don't need to use TLA+ at all, right?
Unfortunately, no. In the end, you still need to write parts of the spec in TLA+ itself.
used for specifications written in TLA+ to verify properties and behaviors of systems
- TLC can check both safety and liveness properties, which helps establish the correctness and reliability of system designs
- Download TLAToolbox-x.x.x-${OS}.zip from the latest release
  https://github.com/tlaplus/tlaplus/releases/tag
- Extract the zip file somewhere
- Run the toolbox executable in the extracted directory
Lost Update: T1 and T2 each increment x by 1, starting from x: 100
- T1: Read(x) → x: 100
- T2: Read(x) → x: 100
- T1: Write(100 + 1) → x: 101
- T2: Write(100 + 1) → x: 101

This should be 102!

Schedule: R1(x0) R2(x0) W1(x1) W2(x2)
Edges between T1 and T2: RW(x), WW(x)
Not serializable…
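To make the anomaly concrete, here is a minimal Python sketch (not TLA+) that replays the schedule above and then enumerates every interleaving of the two transactions. The names `run`, `all_schedules` and `lost_update` are made up for this illustration; the only things taken from the slide are the initial value 100 and the Read-then-Write steps.

```python
from itertools import permutations

def run(schedule, initial=100):
    """Replay one schedule; each entry is (transaction, step)."""
    x = initial
    local = {}                      # per-transaction copy of x
    for txn, step in schedule:
        if step == "read":
            local[txn] = x          # Read(x)
        else:
            x = local[txn] + 1      # Write(local + 1)
    return x

def all_schedules(txns=("T1", "T2")):
    """All interleavings in which each transaction reads before it writes."""
    steps = [(t, s) for t in txns for s in ("read", "write")]
    valid = []
    for perm in permutations(steps):
        if all(perm.index((t, "read")) < perm.index((t, "write")) for t in txns):
            valid.append(perm)
    return valid

# The schedule from the slide: R1 R2 W1 W2
lost_update = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]
final_values = {run(s) for s in all_schedules()}

print(run(lost_update))     # value after the schedule on the slide
print(sorted(final_values)) # every final value that some interleaving can produce
```

Only the two serial schedules end at 102; every schedule where both reads happen before the first write loses an update. Enumerating interleavings by hand like this stops scaling quickly, which is the motivation for handing the job to a model checker.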
(PlusCal code for the Lost Update example, with these annotations:)
- necessary modules
- PlusCal code is written as TLA+ comments
- value starts at 100
- value must eventually be 102
- There are 2 processes that increment value
- Fetch the global value into the local variable
- Write the incremented local variable back to the global value
- All the operations in a label are executed atomically
(TLA+ code translated from the PlusCal code, with these annotations:)
- The pc (program counter) variable is automatically added
- The Init operator for initialization is automatically added
- All the PlusCal labels are converted into TLA+ operators
- The Terminating state is necessary so that stuttering after all the processes have finished is not reported as a deadlock. (Stuttering is related to the concept of a crash-stop fault)
- The specification:
  - Starts with the Init operator
  - Executes the Next operator
  - WF_vars is related to stuttering: weak fairness rules out behaviors that stutter forever while Next stays enabled
- This liveness check is automatically added
- This data structure is similar to a Map or Dictionary. In this case, tmp_value is like {"t1": 0, "t2": 0}
- /\ means AND and \/ means OR. ${variable}' means the value of the variable in the next state
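The Init / Next structure described above can be mimicked in plain Python to get a feel for what TLC explores. This is only a rough sketch under assumptions: the label names `Read` and `Write`, the process names `t1` and `t2`, and the helpers `init`, `next_states`, `key` and `explore` are chosen for this illustration, and the check on finished states is a simplification of the real temporal property.

```python
from collections import deque

PROCS = ("t1", "t2")

def init():
    # Init: value = 100; tmp_value and pc are "functions" (dicts) keyed by process.
    return {"value": 100,
            "tmp_value": {p: 0 for p in PROCS},
            "pc": {p: "Read" for p in PROCS}}

def next_states(s):
    # Next: for some process p, either Read(p) or Write(p) (the OR of the actions).
    # Each action is a guard on pc[p] AND a description of the primed (next) values.
    for p in PROCS:
        if s["pc"][p] == "Read":
            yield {"value": s["value"],                          # value' = value
                   "tmp_value": {**s["tmp_value"], p: s["value"]},
                   "pc": {**s["pc"], p: "Write"}}
        elif s["pc"][p] == "Write":
            yield {"value": s["tmp_value"][p] + 1,               # value' = tmp_value[p] + 1
                   "tmp_value": dict(s["tmp_value"]),
                   "pc": {**s["pc"], p: "Done"}}

def key(s):
    # Hashable fingerprint of a state, used to avoid revisiting it.
    return (s["value"],
            tuple(sorted(s["tmp_value"].items())),
            tuple(sorted(s["pc"].items())))

def explore():
    # Breadth-first search over every state reachable from Init via Next.
    start = init()
    seen = {key(start)}
    queue = deque([start])
    finished = []                   # states where no action is enabled any more
    while queue:
        s = queue.popleft()
        successors = list(next_states(s))
        if not successors:
            finished.append(s)
        for t in successors:
            if key(t) not in seen:
                seen.add(key(t))
                queue.append(t)
    return seen, finished

seen, finished = explore()
final_values = sorted({s["value"] for s in finished})
print(len(seen), final_values)
```

A finished state whose value is not 102 corresponds to the kind of counterexample TLC reports for the "eventually 102" property.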
operator
- This model can check that no deadlock happens
- In this example, no invariants are used
- Specify the liveness-ish properties here
- Select "New Model"
a single coordinator state string.
- rState is what TLA+ calls a "function", which is very similar to Map and Dictionary data structures. In this case, the key is a record name and the value is a record state.
- This operator means:
  - Only when the record state is prepared
  - and the coordinator state is committed,
  - the record state will be updated to committed
  - the coordinator state won't be updated
- In the Next operator, one of these operators will be executed
- TypeOK and Consistent are used as invariants
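As a rough illustration of how such an operator and its invariants fit together, here is a Python sketch. It is not the ScalarDB spec: the record names, the state names other than prepared and committed, the `prepare` and coordinator-commit actions, and the definition of `consistent` are all assumptions made up for this example. Only the record-commit action follows the description above.

```python
from collections import deque

RECORDS = ("r1", "r2")                               # hypothetical record names
RECORD_STATES = {"initial", "prepared", "committed"}  # assumed set of record states
COORD_STATES = {"initial", "committed"}               # assumed set of coordinator states

def init():
    return {"c_state": "initial", "r_state": {r: "initial" for r in RECORDS}}

def next_states(s):
    c, rs = s["c_state"], s["r_state"]
    for r in RECORDS:
        # Hypothetical action: prepare a record before the coordinator decides.
        if rs[r] == "initial" and c == "initial":
            yield {"c_state": c, "r_state": {**rs, r: "prepared"}}
        # Action from the slide: only when the record is prepared and the
        # coordinator is committed, the record becomes committed and the
        # coordinator state is left unchanged.
        if rs[r] == "prepared" and c == "committed":
            yield {"c_state": c, "r_state": {**rs, r: "committed"}}
    # Hypothetical action: the coordinator commits once every record is prepared.
    if c == "initial" and all(v == "prepared" for v in rs.values()):
        yield {"c_state": "committed", "r_state": dict(rs)}

def type_ok(s):
    return (s["c_state"] in COORD_STATES
            and all(v in RECORD_STATES for v in s["r_state"].values()))

def consistent(s):
    # Assumed invariant: a record is committed only if the coordinator is committed.
    return all(v != "committed" or s["c_state"] == "committed"
               for v in s["r_state"].values())

def key(s):
    return (s["c_state"], tuple(sorted(s["r_state"].items())))

def check():
    # Visit every reachable state and collect the ones that break an invariant.
    start = init()
    seen, queue, violations = {key(start)}, deque([start]), []
    while queue:
        s = queue.popleft()
        if not (type_ok(s) and consistent(s)):
            violations.append(s)
        for t in next_states(s):
            if key(t) not in seen:
                seen.add(key(t))
                queue.append(t)
    return seen, violations

seen, violations = check()
print(len(seen), violations)
```

Dropping the coordinator guard from the record-commit action makes `consistent` fail on some reachable state, which is the kind of mistake an invariant check is meant to catch.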
See also
- https://lamport.azurewebsites.net/tla/tla.html
- https://lamport.azurewebsites.net/tla/summary-standalone.pdf
- https://learntla.com/index.html
- https://github.com/Apress/practical-tla-plus