Introduction of TLA+ toolbox

A brief introduction to the TLA+ Toolbox.

This presentation covers:
- What TLA+, PlusCal and TLC are like
- How to get started with TLA+, using the Lost Update anomaly as an example
- An overview of the TLA+ spec for ScalarDB

Mitsunori Komatsu

March 03, 2024

Transcript

  1. A distributed system can have a bunch of states…

    It’s hard to exhaustively test all the potential cases…

    [Figure: two sequence diagrams between Worker1, Worker2 and the Task Manager (TM)]
    - What the developers expect: Worker1 leases task-1 from the TM, so task-1 is leased by Worker1 for a while. Worker1 crashes and the lease of task-1 expires. Worker1 restarts and doesn’t remember task-1. Worker2 leases task-1 from the TM, so task-1 is leased by Worker2 for a while. Worker2 processes and finishes task-1, and task-1 is completed.
    - What could potentially happen (e.g., due to a long STW pause by GC): Worker1 leases task-1 from the TM, starts processing it and gets stuck. The lease of task-1 expires, and task-1 is leased by Worker2 for a while. While checking the expiration, a worker notices the expiration of task-1 and aborts it, so task-1 is aborted. A worker then reports that it finished task-1, but task-1 is already aborted (????).
  2. Model checking tools are very useful to see if the logic works in (almost) all potential cases

    - TLA+
    - SPIN: its language, Promela, has an array data type, but doesn’t have set, map or list…
    - P-lang: it has some modern features and is easy to use, but I felt the exhaustive checking mode was unstable…
    - Alloy

    From my experience of playing with these tools (except for Alloy), TLA+ has a good balance of functionality, performance and exhaustive test capability.
  3. TLA+

    - TLA+ is a formal specification language developed by Leslie Lamport
    - It’s based on mathematics and set theory
    - Easy to define “liveness” in addition to “safety”

    It’s not easy to write down the spec in TLA+…? https://lamport.azurewebsites.net/tla/summary-standalone.pdf
  4. PlusCal

    - PlusCal is a language designed to simplify the use of formal methods
    - It resembles imperative pseudo-code for describing concurrent algorithms
    - It serves as a front-end to TLA+, being converted into TLA+ code

    Easy to use. So, we don’t need to use TLA+ at all, right? Unfortunately, no. You’d still need to write part of the spec in TLA+ in the end.
  5. TLC (Temporal Logic Checker)

    - TLC is a model checker for specifications written in TLA+. It verifies properties and behaviors of the specified system
    - TLC can check both safety and liveness properties, which helps to establish the correctness and reliability of a system design
  6. Install the TLA+ Toolbox

    Basically, follow the instructions on https://lamport.azurewebsites.net/tla/toolbox.html
    - Download TLAToolbox-x.x.x-${OS}.zip from the latest release https://github.com/tlaplus/tlaplus/releases/tag
    - Extract the zip file somewhere
    - Execute the toolbox executable file in the extracted directory
  7. Example: Lost Update

    Txn 1 and Txn 2 each increment x by 1.
    - Initial value: x: 100
    - Txn 1: Read(x), x: 100
    - Txn 2: Read(x), x: 100
    - Txn 1: Write(100 + 1), x: 101
    - Txn 2: Write(100 + 1), x: 101 (This should be 102!)

    [Figure: dependency graph between T1 and T2] Operations: R1(x0) R2(x0) W1(x1) W2(x2). Dependencies: RW(x), WW(x). Not serializable…
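The anomaly can be reproduced with a minimal Python sketch that enumerates every interleaving of the two transactions. The step names and helper functions (`STEPS`, `run`, `valid`) are made up for illustration. This is plain brute force, not TLA+ or TLC.

```python
from itertools import permutations

# Each transaction is two steps: read x into a local, then write local + 1.
STEPS = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]


def run(schedule):
    """Run one interleaving and return the final value of x."""
    x = 100
    local = {}
    for txn, op in schedule:
        if op == "read":
            local[txn] = x
        else:
            x = local[txn] + 1
    return x


def valid(schedule):
    """Keep only schedules where each transaction reads before it writes."""
    return all(
        schedule.index((t, "read")) < schedule.index((t, "write"))
        for t in ("T1", "T2")
    )


schedules = [s for s in permutations(STEPS) if valid(s)]
outcomes = {s: run(s) for s in schedules}
final_values = sorted(set(outcomes.values()))
lost = [s for s, v in outcomes.items() if v != 102]

print(len(schedules), "interleavings, final values:", final_values)
for s in lost:
    print("lost update:", " ".join(f"{t}.{op}" for t, op in s))
```

Only the two serial schedules end with 102. Every schedule in which both reads happen before the first write ends with 101, which is the case shown on the slide.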
  8. Example: Lost Update

    Write the spec in PlusCal
    - Import necessary modules
    - PlusCal code is written as TLA+ comments
    - value starts with 100
    - value must eventually be 102
    - There are 2 processes to increment value
    - Fetch the global value to the local variable
    - Write the incremented local variable to the global value
    - All the operations in a label are atomically executed
  9. Example: Lost Update

    Launch the TLA+ Toolbox and add the specification
    - Select “New Specification”
    - Specify your PlusCal (TLA+) file
  10. Example: Lost Update

    Translate the PlusCal algorithm into TLA+
    - Select “File” -> “Translate PlusCal Algorithm”
    - Then, the TLA+ code will be generated here
  11. Example: Lost Update

    Look at the generated TLA+
    - A new pc (program counter) variable is automatically added
    - The Init operator for initialization is automatically added
    - All the PlusCal labels are converted into TLA+ operators
    - tmp_value is a data structure similar to Map and Dictionary. In this case, tmp_value is like {“t1”: 0, “t2”: 0}
    - /\ means AND and \/ means OR. ${variable}’ means the value of the variable in the next state
    - The Terminating step is necessary so that TLC does not report normal termination as a deadlock: once every process is done, the system is allowed to stutter, i.e., take steps that change nothing. (Stuttering is related to the concept of a crash-stop fault)
    - The specification starts with the Init operator and then repeatedly executes the Next operator. WF_vars is a weak fairness condition related to stuttering: it rules out behaviors that stutter forever while a step is still possible
    - This liveness check is automatically added
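To make the shape of the translated spec more concrete, here is a rough Python analogy of Init, the per-label actions and Next. The label names `Read` and `Write`, the process names and the dict layout are assumptions for illustration. The real translation is TLA+ and is generated by the Toolbox.

```python
PROCS = ("t1", "t2")


def init():
    """Init: the starting state, including the auto-added pc variable."""
    return {
        "value": 100,
        "tmp_value": {p: 0 for p in PROCS},  # a TLA+ function, like a dict
        "pc": {p: "Read" for p in PROCS},     # the label each process is at
    }


def read(s, p):
    """Label Read as an action: check the guard, then build the next state."""
    if s["pc"][p] != "Read":
        return None  # the action is not enabled for this process
    return {
        "value": s["value"],  # value stays unchanged in this step
        "tmp_value": {**s["tmp_value"], p: s["value"]},
        "pc": {**s["pc"], p: "Write"},
    }


def write(s, p):
    """Label Write as an action: store the incremented local copy."""
    if s["pc"][p] != "Write":
        return None
    return {
        "value": s["tmp_value"][p] + 1,
        "tmp_value": dict(s["tmp_value"]),
        "pc": {**s["pc"], p: "Done"},
    }


def next_states(s):
    """Next: the OR of every action of every process, as a list of successors."""
    candidates = (action(s, p) for p in PROCS for action in (read, write))
    return [t for t in candidates if t is not None]


s0 = init()
successors = next_states(s0)
for t in successors:
    print(t)
```

Each action returns a fresh dict instead of mutating `s`. That mirrors how a TLA+ action relates the current variables to the primed ones rather than assigning in place.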
  12. Example: Lost Update

    Create a model
    - Select “New Model”
    - Specify the Spec operator
    - This model can check that no deadlock happens
    - In this example, no invariants are used
    - Specify the liveness-ish properties here
  13. Example: Lost Update

    Execute the model on the spec
    - “Temporal properties were violated” !!!
    - The final value was supposed to be 102, but the actual value was 101…
  14. Example: Fix Lost Update

    How about using a lock? Txn 1 and Txn 2 each increment x by 1.
    - Initial value: x: 100
    - Txn 1: AcquireLock(x)
    - Txn 1: Read(x), x: 100
    - Txn 1: Write(100 + 1), x: 101
    - Txn 1: ReleaseLock(x)
    - Txn 2: AcquireLock(x)
    - Txn 2: Read(x), x: 101
    - Txn 2: Write(101 + 1), x: 102
    - Txn 2: ReleaseLock(x)

    [Figure: dependency graph between T1 and T2] Operations: R1(x0) R2(x1) W1(x1) W2(x2). Dependencies: WR(x), WW(x). 🎉 Serializable!
  15. Example: Fix Lost Update

    Update the spec written in PlusCal
    - locked is added
    - The lock is acquired before the read and write operations
    - The lock is released after the read and write operations
  16. Example: Fix Lost Update

    Execute the model on the spec
    - All the possible cases passed without any violation!
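The effect of the lock can be sketched with a small exhaustive search in Python that runs the same model with and without the lock. The label names (`Lock`, `Read`, `Write`, `Unlock`) and the state layout are assumptions for illustration, not the PlusCal code from the slides. The check only inspects terminal states, a simplification of TLC's temporal-property check that is enough for this small terminating model.

```python
from collections import deque

PROCS = ("t1", "t2")


def initial_state(use_lock):
    start = "Lock" if use_lock else "Read"
    # State layout: (value, locked, tmp_value per process, pc per process)
    return (100, False, (0, 0), (start, start))


def step(state, i, use_lock):
    """Return the state after process i runs its current label, or None if it cannot run."""
    value, locked, tmp, pc = state
    tmp, pc = list(tmp), list(pc)
    label = pc[i]
    if label == "Lock":
        if locked:
            return None  # blocked until the lock is free
        locked, pc[i] = True, "Read"
    elif label == "Read":
        tmp[i], pc[i] = value, "Write"
    elif label == "Write":
        value, pc[i] = tmp[i] + 1, ("Unlock" if use_lock else "Done")
    elif label == "Unlock":
        locked, pc[i] = False, "Done"
    else:
        return None  # "Done": nothing left to run
    return (value, locked, tuple(tmp), tuple(pc))


def check(use_lock):
    """Explore every reachable state; collect final values and deadlocked states."""
    init = initial_state(use_lock)
    seen, queue = {init}, deque([init])
    final_values, deadlocks = set(), []
    while queue:
        state = queue.popleft()
        successors = [t for i in range(len(PROCS))
                      if (t := step(state, i, use_lock)) is not None]
        if not successors:
            if all(label == "Done" for label in state[3]):
                final_values.add(state[0])
            else:
                deadlocks.append(state)
        for t in successors:
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return final_values, deadlocks, len(seen)


for use_lock in (False, True):
    finals, deadlocks, n = check(use_lock)
    print(f"use_lock={use_lock}: {n} states, final values {sorted(finals)}, "
          f"deadlocks {len(deadlocks)}")
```

Without the lock the search reaches terminal states with both 101 and 102. With the lock, 102 is the only final value and no state is deadlocked.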
  17. Example: ScalarDB https://github.com/scalar-labs/scalardb/blob/master/tla%2B/consensus-commit/CCSpec.tla

    Interpreting this TLA+ spec
    - cState contains a single coordinator state string.
    - rState is what TLA+ calls a “function”, which is very similar to Map and Dictionary data structures. In this case, the key is a record name and the value is a record state.
    - This operator means: only when the record state is prepared and the coordinator state is committed, the record state will be updated to committed. The coordinator state won’t be updated.
    - In the Next operator, one of these operators will be executed.
    - TypeOK and Consistent are kind of invariants.
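A rough Python analogy of the operator described above, with rState as a dict from record name to record state. The function name `r_commit` and its signature are made up for illustration, and the real spec may express the update differently.

```python
def r_commit(r_state, c_state, r):
    """Commit record r; return the next (rState, cState), or None if not enabled."""
    if r_state[r] != "prepared" or c_state != "committed":
        return None  # the guard is false, so the step cannot happen
    # Build a copy of rState with one key changed; cState is left unchanged.
    return {**r_state, r: "committed"}, c_state


before = ({"R1": "prepared", "R2": "prepared"}, "committed")
after = r_commit(*before, "R1")
blocked = r_commit({"R1": "prepared", "R2": "prepared"}, "initialized", "R1")
print(after)
print(blocked)
```

The first call commits R1 only. The second call returns `None` because the coordinator state is not committed yet.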
  18. Example: ScalarDB https://github.com/scalar-labs/scalardb/blob/master/tla%2B/consensus-commit/CCSpec.tla

    #1 Initialized
    - customers table (rState): R1: initialized, R2: initialized. Possible states are: initialized, prepared, committed, aborted
    - coordinator.state table (cState): state: initialized. Possible states are: initialized, committed, aborted
  19. Example: ScalarDB https://github.com/scalar-labs/scalardb/blob/master/tla%2B/consensus-commit/CCSpec.tla

    #2 Prepare records: rPrepare(R1), rPrepare(R2)
    - customers table (rState): R1: prepared, R2: prepared. Possible states are: initialized, prepared, committed, aborted
    - coordinator.state table (cState): state: initialized. Possible states are: initialized, committed, aborted
  20. Example: ScalarDB https://github.com/scalar-labs/scalardb/blob/master/tla%2B/consensus-commit/CCSpec.tla

    #3 Commit state: cCommit()
    - customers table (rState): R1: prepared, R2: prepared. Possible states are: initialized, prepared, committed, aborted
    - coordinator.state table (cState): state: committed. Possible states are: initialized, committed, aborted
  21. Example: ScalarDB https://github.com/scalar-labs/scalardb/blob/master/tla%2B/consensus-commit/CCSpec.tla

    #4 Commit records (Completed🍺): rCommit(R1), rCommit(R2)
    - customers table (rState): R1: committed, R2: committed. Possible states are: initialized, prepared, committed, aborted
    - coordinator.state table (cState): state: committed. Possible states are: initialized, committed, aborted
  22. Example: ScalarDB https://github.com/scalar-labs/scalardb/blob/master/tla%2B/consensus-commit/CCSpec.tla

    TLA+ (actually TLC) automatically explores all the possible states!

    [Figure: state graph; each node is listed below in the order it appears]
    - cState: initialized, rState(R1): initialized, rState(R2): initialized
    - cState: initialized, rState(R1): prepared, rState(R2): initialized
    - cState: initialized, rState(R1): initialized, rState(R2): prepared
    - cState: aborted, rState(R1): initialized, rState(R2): initialized
    - cState: initialized, rState(R1): prepared, rState(R2): prepared
    - cState: aborted, rState(R1): prepared, rState(R2): initialized
    - cState: initialized, rState(R1): prepared, rState(R2): prepared
    - cState: aborted, rState(R1): initialized, rState(R2): prepared
    - cState: committed, rState(R1): prepared, rState(R2): prepared
    - cState: aborted, rState(R1): prepared, rState(R2): prepared
    - cState: committed, rState(R1): prepared, rState(R2): prepared
    - cState: aborted, rState(R1): prepared, rState(R2): prepared
    - cState: committed, rState(R1): committed, rState(R2): prepared
    - cState: committed, rState(R1): prepared, rState(R2): committed
    - cState: committed, rState(R1): committed, rState(R2): committed
    - cState: committed, rState(R1): committed, rState(R2): committed
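The kind of exploration TLC performs here can be imitated with a breadth-first search in Python over a simplified model. Only the record-commit guard comes from the slides. The guards for rPrepare, cCommit and cAbort are assumptions, record rollback is left out, and `committed_only_after_coordinator` is a hypothetical invariant that may differ from the Consistent invariant in CCSpec.tla.

```python
from collections import deque

RECORDS = ("R1", "R2")
R_STATES = {"initialized", "prepared", "committed", "aborted"}
C_STATES = {"initialized", "committed", "aborted"}


def next_states(state):
    """All successors of (cState, rState); rState is a tuple aligned with RECORDS."""
    c, rs = state
    out = []
    for i, r in enumerate(rs):
        if r == "initialized" and c == "initialized":  # rPrepare (assumed guard)
            out.append((c, rs[:i] + ("prepared",) + rs[i + 1:]))
        if r == "prepared" and c == "committed":       # rCommit (guard from the slide)
            out.append((c, rs[:i] + ("committed",) + rs[i + 1:]))
    if c == "initialized":
        if all(r == "prepared" for r in rs):           # cCommit (assumed guard)
            out.append(("committed", rs))
        out.append(("aborted", rs))                    # cAbort (assumed guard)
    return out


def type_ok(state):
    """Every variable holds one of its allowed values."""
    c, rs = state
    return c in C_STATES and all(r in R_STATES for r in rs)


def committed_only_after_coordinator(state):
    """Hypothetical invariant: no record is committed unless cState is committed."""
    c, rs = state
    return c == "committed" or "committed" not in rs


def explore(init, invariants):
    """Visit every reachable state once and record invariant violations."""
    seen, queue, violations = {init}, deque([init]), []
    while queue:
        state = queue.popleft()
        violations += [(inv.__name__, state) for inv in invariants if not inv(state)]
        for t in next_states(state):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen, violations


init = ("initialized", ("initialized",) * len(RECORDS))
reachable, violations = explore(init, [type_ok, committed_only_after_coordinator])
print(len(reachable), "reachable states,", len(violations), "invariant violations")
```

Under these assumed guards the search reaches 12 distinct states, which are the same distinct states that appear in the graph on the slide. Both invariants hold in all of them.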
  23. Wrap-up

    This presentation talked about:
    - What TLA+, PlusCal and TLC are like
    - How to get started with TLA+, using the Lost Update anomaly as an example
    - An overview of the TLA+ spec for ScalarDB

    See also
    - https://lamport.azurewebsites.net/tla/tla.html
    - https://lamport.azurewebsites.net/tla/summary-standalone.pdf
    - https://learntla.com/index.html
    - https://github.com/Apress/practical-tla-plus