
TLA+ for programmers

As developers, we have a number of well-known practices to ensure code quality, such as unit tests, code review and so on. But these practices often break down when we need to design concurrent systems. Often, there can be subtle and serious bugs that are not found with conventional practices.

But there’s another approach that you can use -- model-checking -- that can detect potential concurrency errors at design time, and so dramatically increase your confidence in your code. In this talk, I’ll demonstrate and demystify TLA+, a powerful design and model-checking system. We’ll see how it can check your concurrent designs for errors, saving you time up front and frustration later!

Scott Wlaschin

June 10, 2020

Transcript

  1. Building confidence in
    concurrent code
    using a model checker
    (aka TLA+ for programmers)
    @ScottWlaschin
    fsharpforfunandprofit.com
    Warning – this talk will have
    too much information!

  2. People who have
    written concurrent
    code
    People who have had
    weird painful bugs in
    concurrent code
    Why concurrent code in particular?

  4. People who have
    written concurrent
    code
    People who have had
    weird painful bugs in
    concurrent code
    A perfect circle 
    Why concurrent code in particular?

  5. How many programmers are
    very confident about their code?

  6. "This code doesn't work
    and I don't know why"

  7. "This code works and
    I don't know why"

  8. Tools to improve confidence
    • Design
    – Domain driven design
    – Behavior driven design
    – Rapid prototyping
    – Modeling with UML etc
    • Coding
    – Static typing
    – Good libraries
    • Testing
    – TDD
    – Property-based testing
    – Canary testing

  9. Tools to improve confidence
    All of the above, plus
    • "Model checking"

  10. What is "model checking"?
    • Use a special DSL to design a "model"
    • Then "check" the model:
    – Are all the constraints met?
    – Does anything unexpected happen?
    – Does it deadlock?
    • This is part of a "formal methods" approach

  11. Two popular model checkers
    • TLA+ (TLC)
    – Focuses on temporal properties
    – Good for modeling concurrent systems
    • Alloy (Alloy Analyzer)
    – Focuses on relational logic
    – Good for modeling structures

  13. Here's what TLA+ looks like
    Start(s) == serverState[s] = "online_v1"
                /\ ~(\E other \in servers : serverState[other] = "offline")
                /\ serverState' = [serverState EXCEPT ![s] = "offline"]
    Finish(s) == serverState[s] = "offline"
                 /\ serverState' = [serverState EXCEPT ![s] = "online_v2"]
    UpgradeStep == \E s \in servers : Start(s) \/ Finish(s)
    Done == \A s \in servers : serverState[s] = "online_v2"
            /\ UNCHANGED serverState
    Spec == /\ Init /\ [][Next]_serverState
            /\ WF_serverState(UpgradeStep)

  14. Here's what TLA+ looks like
    By the end of the talk you should be
    able to make sense of it!

  15. Time for some live polling!

  16. bit.ly/tlapoll

  17. Poll #1 results:
    "Can you see this poll?"
    Link to live poll: bit.ly/tlapoll

  18. Outline of this talk
    • How confident are you?
    • Introducing TLA+
    • Examples:
    – Using TLA+ for a simple model
    – Checking a Producer/Consumer model
    – Checking a zero-downtime deployment model

  19. Part I
    How confident are you?

  20. To sort a list:
    1) If the list is empty or has 1 element, it is already sorted.
    So just return it unchanged.
    2) Otherwise, take the first element (called the "pivot")
    3) Divide the remaining elements into two piles:
    * those < than the pivot
    * those > than the pivot
    4) Sort each of the two piles using this sort algorithm
    5) Return the sorted list by concatenating:
    * the sorted "smaller" list
    * then the pivot
    * then the sorted "bigger" list
    Here's a spec for a sort algorithm

  22. Poll #2 results:
    "What is your confidence in
    the design of this sort algorithm?"

  24. Some approaches to gain confidence
    • Careful inspection and code review
    • Create an implementation
    and then test it thoroughly
    – E.g. Using property-based tests
    • Use a mathematical proof assistant tool
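
To make the "create an implementation and then test it" approach concrete, here is a minimal Python sketch that transcribes the five steps of the sort spec above as literally as possible. The function name `spec_sort` is made up for illustration and is not from the talk. Step 3 of the spec only mentions elements smaller than or bigger than the pivot, so the sketch does exactly that and nothing more.

```python
def spec_sort(items):
    # 1) An empty or one-element list is already sorted.
    if len(items) <= 1:
        return list(items)
    # 2) Otherwise, take the first element as the pivot.
    pivot, rest = items[0], items[1:]
    # 3) Divide the remaining elements into two piles.
    smaller = [e for e in rest if e < pivot]
    bigger = [e for e in rest if e > pivot]
    # 4) and 5) Sort each pile, then concatenate around the pivot.
    return spec_sort(smaller) + [pivot] + spec_sort(bigger)


print(spec_sort([3, 1, 2]))     # [1, 2, 3]
print(spec_sort([2, 1, 2, 3]))  # [1, 2, 3]
```

A test that compares input and output lengths, as a property-based test might, would flag the second call: the duplicate 2 belongs to neither pile and disappears.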

  25. How confident are you when
    concurrency is involved?

  26. A concurrent producer/consumer system
    A queue
    Consumer spec (2 separate steps)
    1) Check if queue is not empty
    2) If true, then read item from queue
    Producer spec (2 separate steps)
    1) Check if queue is not full
    2) If true, then write item to queue
    Consumer
    reads from
    queue
    Producer
    writes to
    queue

  27. Given a bounded queue of items
    And 1 producer, 1 consumer running concurrently
    Constraints:
    * never read from an empty queue
    * never add to a full queue
    Producer spec (separate steps)
    1) Check if queue is not full
    2) If true, then write item to queue
    3) Go to step 1
    Consumer spec (separate steps)
    1) Check if queue is not empty
    2) If true, then read item from queue
    3) Go to step 1
    A spec for a producer/consumer system

  28. Poll #3 results:
    "What is your confidence in the design
    of this producer/consumer system?"

  29. Given a bounded queue of items
    And 2 producers, 2 consumers running concurrently
    Constraints:
    * never read from an empty queue
    * never add to a full queue
    Producer spec (separate steps)
    1) Check if queue is not full
    2) If true, then write item to queue
    3) Go to step 1
    Consumer spec (separate steps)
    1) Check if queue is not empty
    2) If true, then read item from queue
    3) Go to step 1
    A spec for a producer/consumer system

  30. Poll #4 results:
    "What is your confidence in the design
    of this producer/consumer system
    (now with multiple clients)?"

  31. Being confident in the
    design of concurrent systems
    is hard

  32. How to gain confidence for concurrency?
    • Careful inspection and code review
    – Human intuition for concurrency is very bad
    • Create an implementation and then test it
    – Many concurrency errors might never show up
    • Use a mathematical proof assistant tool
    – A model checker is much easier!

  33. Part II
    Introducing TLA+

  34. Stand Back!
    I'm going to use
    Mathematics!

  35. TLA+ was designed by Leslie Lamport
    – Famous "Time & Clocks" paper
    – Paxos algorithm for consensus
    – Turing award winner
    – Initial developer of LaTeX

  36. TLA+ stands for
    – Temporal
    – Logic
    – of Actions
    – plus …

  42. The "Logic" in TLA+

  43. Boolean Logic
    Boolean   Mathematics   TLA+      Programming
    AND       a ∧ b         a /\ b    a && b
    OR        a ∨ b         a \/ b    a || b
    NOT       ¬a            ~a        !a; not a
    You all know how
    this works, I hope!

  44. Boolean Logic
    A "predicate" is an expression that returns a boolean
    \* TLA-style definition
    operator(a,b,c) ==
    (a /\ b) \/ (a /\ ~c)
    // programming language definition
    function(a,b,c) {
    (a && b) || (a && !c)
    }

  45. The "Actions" in TLA+
    a.k.a. state transitions

  46. State A State B State C
    Transition from A to B
    A state machine
    Transition from B to A
    Transition
    from B to C

  47. White to play
    Black to
    play
    Game
    Over
    White plays
    and wins
    Black plays
    White plays
    Black plays
    and wins
    States and transitions for a chess game

  48. Undelivered
    Out for
    delivery
    Delivered
    Send out for delivery
    Address
    not found
    Signed for
    Failed
    Delivery
    Redeliver
    States and transitions for deliveries

  49. "hello" "goodbye"
    States and transitions in TLA+
    State before State after
    state = "hello"
    In TLA+
    state' = "goodbye"
    In TLA+
    An "action"

  50. "hello" "goodbye"
    States and transitions in TLA+
    Next ==
    state = "hello"
    /\ state' = "goodbye"
    In TLA+, define the action "Next" like this
    Next
    Or in English:
    state before is "hello"
    AND state after is "goodbye"
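
One way for a programmer to read an action is as a boolean function of two states, the one before and the one after. The Python sketch below is only an analogy under that assumption; the function name `next_action` and the sample pairs are chosen for illustration.

```python
def next_action(before, after):
    # Python analogue of:  state = "hello" /\ state' = "goodbye"
    # Nothing is assigned here; the pair of states is only tested.
    return before == "hello" and after == "goodbye"


candidates = [("hello", "goodbye"), ("hello", "ciao"), ("howdy", "goodbye")]
for before, after in candidates:
    verdict = "does match" if next_action(before, after) else "doesn't match"
    print(f"{before!r} -> {after!r}: {verdict}")
```

Because both conjuncts are plain tests, swapping their order changes nothing about which pairs match.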

  52. "hello" "goodbye"
    States and transitions in TLA+
    Next ==
    state' = "goodbye"
    /\ state = "hello"
    Next

  53. Actions are not assignments.
    Actions are tests
    state = "hello" /\ state' = "goodbye"
    "hello" "goodbye" Does match
    "hello" "ciao" Doesn't match
    "howdy" "goodbye" Doesn't match

  54. The "Temporal" in TLA+

  55. TLA+ models a series of state transitions over time
    In TLA+ you can ask questions like:
    • Is something always true?
    • Is something ever true?
    • If X happens, must Y happen afterwards?

  56. Temporal Logic of Actions
    Boolean logic of state transitions over time

  60. Part III
    Using TLA+ for a simple model

  61. Count to three
    1 2 3
    // programming language version
    var x = 1
    x = 2
    x = 3

  62. Count to three
    1 2 3
    \* TLA version
    Init == \* initial state
    x=1
    Next == \* transition
    (x=1 /\ x'=2) \* match step 1
    \/ (x=2 /\ x'=3) \* or match step 2

  67. A quick refactor

  68. Count to three, refactored
    1 2 3
    Init == x=1
    Step1 == x=1 /\ x'=2
    Step2 == x=2 /\ x'=3
    Next == Step1 \/ Step2
    Refactored version.
    Steps are now explicitly named

  70. Introducing the TLA+ Toolbox
    (the IDE)

  71. This is the TLA+ Toolbox app

  72. b) Tell the model checker what
    the initial and next states are

  73. c) Run the model checker

  74. And if we run this script?
    • Detects "3 distinct states"
    – Good – what we expected
    • But also "Deadlock reached"
    – Bad!

  75. 1 2 3
    So "Count to three" deadlocks when it reaches 3
    If there is no valid transition available,
    that is what TLA+ calls a "deadlock"
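
The deadlock report can be reproduced with a toy explicit-state search. The following Python sketch is not how TLC is implemented; it only mimics the idea of visiting every reachable state of the "Count to three" model and flagging states with no valid transition. The function names are invented for this example.

```python
from collections import deque


def init_states():
    # Init == x = 1
    return [1]


def next_states(x):
    # Next == (x=1 /\ x'=2) \/ (x=2 /\ x'=3)
    successors = []
    if x == 1:
        successors.append(2)
    if x == 2:
        successors.append(3)
    return successors


def explore():
    """Breadth-first search over reachable states; returns (seen, deadlocks)."""
    seen = set(init_states())
    queue = deque(seen)
    deadlocks = []
    while queue:
        state = queue.popleft()
        successors = next_states(state)
        if not successors:
            deadlocks.append(state)  # no valid transition from this state
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen, deadlocks


seen, deadlocks = explore()
print(len(seen), "distinct states")  # 3 distinct states
print("deadlock at", deadlocks)      # deadlock at [3]
```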

  76. It's important to think of these state machines as
    an infinite series of state transitions.
    1 2 3 ? ? ?

  77. When we're "done", we can say that
    a valid transition is from 3 to 3, forever
    1 2 3 3 3 3

  78. Updated "Count to three"
    Init == x=1
    Step1 == x=1 /\ x'=2
    Step2 == x=2 /\ x'=3
    Done == x=3 /\ UNCHANGED x
    Next == Step1 \/ Step2 \/ Done

    1 2 3

  79. Doing nothing is
    always an option

  80. Staying in the same state is
    almost always a valid state transition!
    1 1 2 2 3 3
    What is the difference between these two systems?
    1 2 3
    1 -> 1 2 -> 2 3 -> 3

  81. "Count to three" with stuttering
    Init == x=1
    Step1 == x=1 /\ x'=2
    Step2 == x=2 /\ x'=3
    Done == x=3 /\ UNCHANGED x
    Next == Step1 \/ Step2 \/ Done \/ UNCHANGED x
    1 2 3

  82. Part IV
    The Power of Temporal Properties

  83. Temporal properties
    A property applies to the whole system over time
    – Not just to individual states
    Checking these properties is important
    – Humans are bad at this
    – Programming languages are bad at this too
    – TLA+ is good at this!

  84. Useful properties to check
    • Always true
    – For all states, "x > 0"
    • Eventually true
    – At some point in time, "x = 2"
    • Eventually always
    – x eventually becomes 3 and then stays there
    • Leads to
    – if x ever becomes 2 then it will become 3 later

  88. Properties for "count to three"
    In English                                        Formally                     In TLA+
    x is always > 0                                   Always (x > 0)               [] (x > 0)
    At some point x is 2                              Eventually (x = 2)           <> (x = 2)
    x eventually becomes 3 and then stays there.      Eventually (Always (x = 3))  <>[] (x = 3)
    if x ever becomes 2 then it will become 3 later.  (x=2) leads to (x=3)         (x=2) ~> (x=3)

  89. Adding properties to the script
    \* Always, x >= 1 && x <= 3
    AlwaysWithinBounds == [](x >= 1 /\ x <= 3)
    \* At some point, x = 2
    EventuallyTwo == <>(x = 2)
    \* At some point, x = 3 and stays there
    EventuallyAlwaysThree == <>[](x = 3)
    \* Whenever x=2, then x=3 later
    TwoLeadsToThree == (x = 2) ~> (x = 3)

  90. Tell the model checker what
    the properties are,
    and run the model checker again
    Adding properties to the model in the TLA+ toolbox

  92. Poll #5 results:
    "How many of these
    properties are true?"

  93. Oh no! The model checker says there are errors!

  94. Who forgot about stuttering?
    1 2 3
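
To see why stuttering breaks some of the properties, it helps to evaluate them on concrete behaviors. The Python sketch below assumes every infinite behavior of this tiny model can be written as a finite `prefix` followed by a non-empty `loop` that repeats forever (a "lasso"). That encoding and all function names are assumptions made for this example; TLC does not work this way internally.

```python
def always(p, prefix, loop):
    # []p: p holds in every state of the behavior
    return all(p(x) for x in prefix + loop)


def eventually(p, prefix, loop):
    # <>p: p holds in at least one state
    return any(p(x) for x in prefix + loop)


def eventually_always(p, prefix, loop):
    # <>[]p: the loop repeats forever, so p must hold in every loop state
    return all(p(x) for x in loop)


def leads_to(p, q, prefix, loop):
    # p ~> q: every state satisfying p is followed, now or later, by a q state
    states = prefix + loop
    return all(
        any(q(y) for y in states[i:] + loop)
        for i, x in enumerate(states)
        if p(x)
    )


def check_properties(prefix, loop):
    return {
        "AlwaysWithinBounds": always(lambda x: 1 <= x <= 3, prefix, loop),
        "EventuallyTwo": eventually(lambda x: x == 2, prefix, loop),
        "EventuallyAlwaysThree": eventually_always(lambda x: x == 3, prefix, loop),
        "TwoLeadsToThree": leads_to(lambda x: x == 2, lambda x: x == 3, prefix, loop),
    }


print(check_properties([1, 2], [3]))  # the intended behavior 1, 2, 3, 3, 3, ...
print(check_properties([], [1]))      # stuttering at 1 forever
print(check_properties([1], [2]))     # stuttering at 2 forever
```

The intended behavior satisfies all four properties. Stuttering at 1 forever keeps x within bounds but never reaches 2 or 3, and stuttering at 2 forever also breaks the leads-to property.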

  95. How to fix this?
    • Make sure every possible transition is followed
    • Not just stay stuck in an infinite loop!
    This is called "fairness"

  96. How can we model fairness in TLA+?
    We have to do some refactoring first
    Then we can add fairness to the spec
    (warning: the syntax is a bit ugly)

  97. How to fix?
    Refactor #1: change the spec to merge init/next
    Init == x=1
    Step1 == x=1 /\ x'=2
    Step2 == x=2 /\ x'=3
    Done == x=3 /\ UNCHANGED x
    Next == Step1 \/ Step2 \/ Done
    Spec == Init /\ [](Next \/ UNCHANGED x)

  100. Spec == Init /\ [](Next \/ UNCHANGED x)
    Refactor #2: Use a special syntax for stuttering
    Before

  101. Spec == Init /\ [][Next]_x
    Refactor #2: Use a special syntax for stuttering
    After

  102. Spec == Init /\ [][Next]_x
    Refactor #3: Now we can add fairness!

  103. Spec == Init /\ [][Next]_x /\ WF_x(Next)
    Refactor #3: Now we can add fairness!
    With fairness

  105. The complete spec with fairness
    Init == x=1
    Step1 == x=1 /\ x'=2
    Step2 == x=2 /\ x'=3
    Done == x=3 /\ UNCHANGED x
    Next == Step1 \/ Step2 \/ Done
    Spec == Init /\ [][Next]_x /\ WF_x(Next)

    \* properties to check
    AlwaysWithinBounds == [](x >= 1 /\ x <= 3)
    EventuallyTwo == <>(x = 2)
    EventuallyAlwaysThree == <>[](x = 3)
    TwoLeadsToThree == (x = 2) ~> (x = 3)
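
As a rough illustration of what the `WF_x(Next)` conjunct rules out, the Python sketch below applies one reading of weak fairness to this spec: if a step that really changes x stays enabled forever, such a step must eventually be taken. It assumes a behavior ends by repeating some non-empty `loop` of states forever; the function names are made up for the example.

```python
def next_changing_x(x, x_next):
    # A Next step that really changes x, i.e. Step1 or Step2.
    # Done is a Next step too, but it leaves x unchanged.
    return (x == 1 and x_next == 2) or (x == 2 and x_next == 3)


def enabled(x):
    # Is some x-changing Next step possible from this state?
    return any(next_changing_x(x, y) for y in (1, 2, 3))


def weakly_fair(loop):
    """Weak fairness for a behavior that ends by repeating `loop` forever."""
    steps = list(zip(loop, loop[1:] + loop[:1]))  # transitions inside the loop
    continuously_enabled = all(enabled(x) for x in loop)
    taken_in_loop = any(next_changing_x(x, y) for x, y in steps)
    return (not continuously_enabled) or taken_in_loop


print(weakly_fair([1]))  # False: Step1 stays enabled but is never taken
print(weakly_fair([2]))  # False: Step2 stays enabled but is never taken
print(weakly_fair([3]))  # True: at 3 nothing can change x, so staying is fair
```

Under this reading, the only fair way to stay put forever is at 3, which is why the four properties can hold once fairness is part of the spec.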

  107. Part V
    Using TLA+ to model the
    producer/consumer examples

  108. Modeling a Producer/Consumer system
    A queue
    Consumer spec (2 separate steps)
    1) Check if queue is not empty
    2) If true, then read item from queue
    Producer spec (2 separate steps)
    1) Check if queue is not full
    2) If true, then write item to queue
    Consumer
    reads from
    queue
    Producer
    writes to
    queue

  109. ready canWrite
    CheckWritable Write
    States for a Producer
    We're choosing to model this
    as two distinct state transitions,
    not one atomic step

  110. ready canWrite
    CheckWritable Write
    States for a Producer
    # programming-language version (Python-style)
    def CheckWritable():
        global producerState
        if queueSize < MaxQueueSize and producerState == "ready":
            producerState = "canWrite"
    def Write():
        global producerState, queueSize
        if producerState == "canWrite":
            producerState = "ready"
            queueSize = queueSize + 1

  111. States for a Producer
    CheckWritable ==
    producerState = "ready"
    /\ queueSize < MaxQueueSize
    /\ producerState' = "canWrite" \* transition
    /\ UNCHANGED queueSize
    ready canWrite
    CheckWritable Write
    Write ==
    producerState = "canWrite"
    /\ producerState' = "ready" \* transition
    /\ queueSize' = queueSize + 1 \* push to queue
    ProducerAction == CheckWritable \/ Write
    All the valid actions
    for a producer

  112. States for a Consumer
    CheckReadable ==
    consumerState = "ready"
    /\ queueSize > 0
    /\ consumerState' = "canRead" \* transition
    /\ UNCHANGED queueSize
    Read ==
    consumerState = "canRead"
    /\ consumerState' = "ready" \* transition
    /\ queueSize' = queueSize - 1 \* pop from queue
    ConsumerAction == CheckReadable \/ Read
    ready canRead
    CheckReadable Read
    All the valid actions
    for a consumer

  113. Complete TLA+ script (1/2)
    VARIABLES
    queueSize,
    producerState,
    consumerState
    MaxQueueSize == 2 \* can be small
    Init ==
    queueSize = 0
    /\ producerState = "ready"
    /\ consumerState = "ready"
    CheckWritable ==
    producerState = "ready"
    /\ queueSize < MaxQueueSize
    /\ producerState' = "canWrite"
    /\ UNCHANGED queueSize
    /\ UNCHANGED consumerState
    Write ==
    producerState = "canWrite"
    /\ producerState' = "ready"
    /\ queueSize' = queueSize + 1
    /\ UNCHANGED consumerState
    ProducerAction ==
    CheckWritable \/ Write

  114. Complete TLA+ script (2/2)
    CheckReadable ==
    consumerState = "ready"
    /\ queueSize > 0
    /\ consumerState' = "canRead"
    /\ UNCHANGED queueSize
    /\ UNCHANGED producerState
    Read ==
    consumerState = "canRead"
    /\ consumerState' = "ready"
    /\ queueSize' = queueSize - 1
    /\ UNCHANGED producerState
    ConsumerAction ==
    CheckReadable \/ Read
    Next ==
    ProducerAction
    \/ ConsumerAction

  116. Complete TLA+ script (2/2)
    CheckReadable ==
    consumerState = "ready"
    /\ queueSize > 0
    /\ consumerState' = "canRead"
    /\ UNCHANGED queueSize
    /\ UNCHANGED producerState
    Read ==
    consumerState = "canRead"
    /\ consumerState' = "ready"
    /\ queueSize' = queueSize - 1
    /\ UNCHANGED producerState
    ConsumerAction ==
    CheckReadable \/ Read
    Next ==
    ProducerAction
    \/ ConsumerAction
    \/ (UNCHANGED producerState
    /\ UNCHANGED consumerState
    /\ UNCHANGED queueSize)

  117. AlwaysWithinBounds ==
    [] (queueSize >= 0
    /\ queueSize <= MaxQueueSize)
    What are the temporal properties for
    the producer/consumer design?

  118. And if we run this script?
    • Detects "8 distinct states"
    – Good
    • No errors!
    – Means invariant was always true.
    – We now have confidence in this design!
    – But only with a single producer/consumer
    We don't need to guess, as
    we did in the earlier poll!

  119. Now let's do a
    concurrent version!

  120. Time for the "Plus" in TLA+

  121. TLA plus… Set theory
    Set theory                                                  Mathematics    TLA+            Programming
    e is an element of set S                                    e ∈ S          e \in S
    Define a set by enumeration                                 {1,2,3}        {1,2,3}         [1,2,3]
    Define a set by predicate "p"                               { e ∈ S | p }  {e \in S : p}   Set.filter(p)
    For all e in Set, some predicate "p" is true                ∀ e ∈ S : p    \A e \in S : p  Set.all(p)
    There exists e in Set such that some predicate "p" is true  ∃ e ∈ S : p    \E e \in S : p  Set.any(p)
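
For readers who think in code, the same five rows can be written with Python's built-in sets. This is only an analogy; the predicate `p` below is an arbitrary example chosen for illustration.

```python
S = {1, 2, 3}                  # define a set by enumeration


def p(e):                      # an example predicate, made up for illustration
    return e > 1


print(2 in S)                  # e \in S          -> True
print({e for e in S if p(e)})  # {e \in S : p}    -> {2, 3}
print(all(p(e) for e in S))    # \A e \in S : p   -> False
print(any(p(e) for e in S))    # \E e \in S : p   -> True
```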

  124. • We need
    – a set of producers
    – a set of consumers
    • Need to use the set-description part of TLA+
    producers={"p1","p2"}
    consumers={"c1","c2"}

  125. CONSTANT producers, consumers
    \* e.g
    \* 2 producers={"p1","p2"}
    \* 2 consumers={"c1","c2"}
    VARIABLES queueSize, producerState, consumerState
    MaxQueueSize == 2
    Init ==
    queueSize = 0
    /\ producerState = [p \in producers |-> "ready"]
    \* same as {"p1":"ready","p2":"ready"}
    /\ consumerState = [c \in consumers |-> "ready"]
    Producer/Consumer Spec, part 1

  126. CONSTANT producers, consumers
    \* e.g
    \* 2 producers={"p1","p2"}
    \* 2 consumers={"c1","c2"}
    VARIABLES queueSize, producerState, consumerState
    MaxQueueSize == 2
    Init ==
    queueSize = 0
    /\ producerState = [p \in producers |-> "ready"]
    \* same as {"p1":"ready","p2":"ready"}
    /\ consumerState = [c \in consumers |-> "ready"]
    For each producer, set
    the state to be "ready"
    Producer/Consumer Spec, part 1

  127. CheckWritable(p) ==
    producerState[p] = "ready"
    /\ queueSize < MaxQueueSize
    /\ producerState' =
    [producerState EXCEPT ![p] = "canWrite"]
    /\ UNCHANGED queueSize
    /\ UNCHANGED consumerState
    Producer/Consumer Spec, part 2

  128. CheckWritable(p) ==
    producerState[p] = "ready"
    /\ queueSize < MaxQueueSize
    /\ producerState' =
    [producerState EXCEPT ![p] = "canWrite"]
    /\ UNCHANGED queueSize
    /\ UNCHANGED consumerState
    Parameterized by a producer
    Update one element of the
    state map/dictionary only
    Check the state
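
The two function expressions used here have close Python analogues, sketched below. The helper name `except_update` is invented for this example. The point it tries to show is that `EXCEPT` describes a new mapping and does not modify the old one, which matches the idea that actions are tests on a before state and an after state.

```python
producers = {"p1", "p2"}

# [p \in producers |-> "ready"]
producer_state = {p: "ready" for p in producers}


def except_update(mapping, key, value):
    """Analogue of [mapping EXCEPT ![key] = value]: a new dict, original untouched."""
    updated = dict(mapping)
    updated[key] = value
    return updated


producer_state_next = except_update(producer_state, "p1", "canWrite")
print(producer_state)       # both producers are still "ready"
print(producer_state_next)  # p1 is "canWrite", p2 is "ready"
```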

  129. Write(p) ==
    producerState[p] = "canWrite"
    /\ queueSize' = queueSize + 1
    /\ producerState' =
    [producerState EXCEPT ![p] = "ready"]
    /\ UNCHANGED consumerState
    ProducerAction ==
    \E p \in producers : CheckWritable(p) \/ Write(p)
    Producer/Consumer Spec, part 2
    CheckWritable(p) ==
    producerState[p] = "ready"
    /\ queueSize < MaxQueueSize
    /\ producerState' =
    [producerState EXCEPT ![p] = "canWrite"]
    /\ UNCHANGED queueSize
    /\ UNCHANGED consumerState

  130. CheckWritable(p) ==
    producerState[p] = "ready"
    /\ queueSize < MaxQueueSize
    /\ producerState' =
    [producerState EXCEPT ![p] = "canWrite"]
    /\ UNCHANGED queueSize
    /\ UNCHANGED consumerState
    Write(p) ==
    producerState[p] = "canWrite"
    /\ queueSize' = queueSize + 1
    /\ producerState' =
    [producerState EXCEPT ![p] = "ready"]
    /\ UNCHANGED consumerState
    ProducerAction ==
    \E p \in producers : CheckWritable(p) \/ Write(p)
    Find any producer which has a valid action
    Producer/Consumer Spec, part 2

  131. CheckReadable(c) ==
    consumerState[c] = "ready"
    /\ queueSize > 0
    /\ consumerState' =
    [consumerState EXCEPT ![c] = "canRead"]
    /\ UNCHANGED queueSize
    /\ UNCHANGED producerState
    Producer/Consumer Spec, part 3

  132. CheckReadable(c) ==
    consumerState[c] = "ready"
    /\ queueSize > 0
    /\ consumerState' =
    [consumerState EXCEPT ![c] = "canRead"]
    /\ UNCHANGED queueSize
    /\ UNCHANGED producerState
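    Annotations:
    – CheckReadable(c): parameterized by a consumer
    – consumerState[c] = "ready": check the state
    – [consumerState EXCEPT ![c] = ...]: update one element of the
    state map/dictionary only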

  133. Read(c) ==
    consumerState[c] = "canRead"
    /\ queueSize' = queueSize - 1
    /\ consumerState' =
    [consumerState EXCEPT ![c] = "ready"]
    /\ UNCHANGED producerState
    ConsumerAction ==
    \E c \in consumers : CheckReadable(c) \/ Read(c)
    CheckReadable(c) ==
    consumerState[c] = "ready"
    /\ queueSize > 0
    /\ consumerState' =
    [consumerState EXCEPT ![c] = "canRead"]
    /\ UNCHANGED queueSize
    /\ UNCHANGED producerState
    Producer/Consumer Spec, part 3

  134. CheckReadable(c) ==
    consumerState[c] = "ready"
    /\ queueSize > 0
    /\ consumerState' =
    [consumerState EXCEPT ![c] = "canRead"]
    /\ UNCHANGED queueSize
    /\ UNCHANGED producerState
    Read(c) ==
    consumerState[c] = "canRead"
    /\ queueSize' = queueSize - 1
    /\ consumerState' =
    [consumerState EXCEPT ![c] = "ready"]
    /\ UNCHANGED producerState
    ConsumerAction ==
    \E c \in consumers : CheckReadable(c) \/ Read(c)
    Find any consumer which has a valid action

  135. And if we run this script?
    • Run model checker with 2 producers, 2 consumers
    – And same "AlwaysWithinBounds" property
    • Detects 38 distinct states now
    – Too many for human inspection
    • Error: "Invariant AlwaysWithinBounds is violated"
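    – We are confident that this design doesn't work!
    – For example, two producers can both pass CheckWritable when only one slot is left, and then both write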
    We don't need to guess, as
    we did in the earlier poll!

  136. Fixing the error
    • TLA+ won't tell you how to fix it
    – You have to think!
    • But it is easy to test fixes:
    – Update the model with the fix
    • Atomic operations (or locks, or whatever)
    – Then rerun the model checker
    – You have confidence that the fix works (or not!)
    • All this in only 50 lines of code
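
    The slides do not show a corrected spec, so the module below is only a sketch of the "atomic operations" idea. It merges the check and the update into a single step, so no other process can change queueSize in between. The module name and the AtomicWrite/AtomicRead actions are hypothetical, and Init and AlwaysWithinBounds are guesses at the definitions from part 1 of the spec. Because each step is atomic, the per-process state maps are no longer needed and are left out.

```tla
---- MODULE ProducerConsumerAtomic ----
\* Hypothetical module: an illustration, not the spec from the talk.
EXTENDS Naturals

CONSTANTS producers, consumers, MaxQueueSize
VARIABLES queueSize

\* Assumption: the queue starts empty.
Init == queueSize = 0

\* Check and write happen in ONE step, so the guard cannot go stale.
AtomicWrite(p) ==
    /\ queueSize < MaxQueueSize
    /\ queueSize' = queueSize + 1

\* Check and read happen in ONE step as well.
AtomicRead(c) ==
    /\ queueSize > 0
    /\ queueSize' = queueSize - 1

ProducerAction == \E p \in producers : AtomicWrite(p)
ConsumerAction == \E c \in consumers : AtomicRead(c)

Next == ProducerAction \/ ConsumerAction

Spec == Init /\ [][Next]_queueSize

\* Assumed shape of the property; check it as an invariant.
AlwaysWithinBounds == queueSize >= 0 /\ queueSize <= MaxQueueSize
====
```

    With the guard and the update in the same action, every step that changes queueSize is checked against the bounds in that same step, so AlwaysWithinBounds holds by construction. A real implementation would still need a lock or an atomic instruction to make the step indivisible.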

  137. Part VI
    Using TLA+ to model
    zero-downtime deployment

  138. Using TLA+ as a tool to improve design
    The process is:
    – Sketch the design in TLA+
    – Then check it with the model checker
    – Then fix it
    – Then check it again
    – Repeat until TLA+ says the design is correct
    Think of it as TDD but for concurrency design
    Red, Green, Remodel

  139. Modeling a zero-downtime deployment
    What to model
    – We have a bunch of servers
    – Each server must be upgraded from v1 to v2
    – Each server goes offline during the upgrade
    Conditions to check
    – There must always be an online server
    – All servers must be upgraded eventually
    Idea credit: https://www.hillelwayne.com/post/modeling-deployments/

  140. Sketching the design
    Server state: Online(v1) -> (Start) -> Offline -> (Finish) -> Online(v2) -> (Done)
    \* a dictionary of key/value pairs: server => state
    VARIABLES serverState
    Init == serverState = [s \in servers |-> "online_v1"]
    Start(s) ==
      serverState[s] = "online_v1"
      /\ serverState' = [serverState EXCEPT ![s] = "offline"]
    Finish(s) ==
      serverState[s] = "offline"
      /\ serverState' = [serverState EXCEPT ![s] = "online_v2"]

  141. Sketching the design
    Server state: Online(v1) -> (Start) -> Offline -> (Finish) -> Online(v2) -> (Done)
    \* try to find a server to start or finish
    UpgradeStep == \E s \in servers : Start(s) \/ Finish(s)
    \* done if ALL servers are finished
    Done ==
      \A s \in servers : serverState[s] = "online_v2"
      /\ UNCHANGED serverState
    \* overall state transition
    Next == UpgradeStep \/ Done

  142. Stop and check
    • Run the script now to check our assumptions
    – With 1 server: 3 distinct states (as expected)
    – With 2 servers: 9 distinct states
    – With 3 servers: 27 distinct states
    • The number of states gets large very quickly (3^n for n servers, since each server has 3 possible states)!
    – Eyeballing for errors will not work

  143. Now let's add some properties
    • Zero downtime
    – "Not all servers should be offline at once"
    • Upgrade should complete
    – "All servers should eventually be upgraded to v2"
    Temporal properties

  144. \* It is always true that there exists
    \* a server that is not offline (!= is /= in TLA)
    ZeroDowntime ==
    [](\E s \in servers : serverState[s] /= "offline")
    Temporal properties
    Reading: always, there exists a server such that the state for that server is not "offline"

  145. \* Eventually, all servers will be online at v2
    EventuallyUpgraded ==
    <>(\A s \in servers : serverState[s] = "online_v2")
    Temporal properties
    Reading: eventually, for all servers, the state for that server is "online_v2"
    \* It is always true that there exists
    \* a server that is not offline (!= is /= in TLA)
    ZeroDowntime ==
    [](\E s \in servers : serverState[s] /= "offline")

  146. Running the script
    If we run this script with two servers
    Error: "Invariant ZeroDowntime is violated"
    The model checker trace shows us how:
    s1 -> "online_v1", s2 -> "online_v1"
    s1 -> "offline", s2 -> "online_v1"
    s1 -> "offline", s2 -> "offline" // boom!
    No problem, we think we
    have a fix for this

  147. Improving the design with upgrade condition
    Start(s) ==
    \* server is ready
    serverState[s] = "online_v1"
    \* NEW: there does not exist any other server which is offline
    /\ ~(\E other \in servers : serverState[other] = "offline")
    \* then transition
    /\ serverState' = [serverState EXCEPT ![s] = "offline"]
    A new condition for the Start action:
    You can only transition to "offline" if no other servers are offline.

  148. Running the script
    Now re-run this script with two servers
    • "ZeroDowntime" works
    – We have confidence in the design!
    • "EventuallyUpgraded" fails
    – Because of stuttering: the model is allowed to stay in the same state forever, so the upgrade may never progress
    – But add fairness and it works again, yay!
    We now have confidence in the design!
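
    The slides do not show how fairness was added. One common choice, sketched here as an assumption, is weak fairness on the whole Next action: if a state-changing step stays possible, one must eventually be taken. The module name, SpecNoFairness, and the model values in the configuration comment are hypothetical. The actions are the ones from the earlier slides, with the improved Start.

```tla
---- MODULE DeploymentWithFairness ----
\* Hypothetical module: assembles the slide fragments and adds fairness.
CONSTANT servers
VARIABLES serverState

Init == serverState = [s \in servers |-> "online_v1"]

Start(s) ==
    /\ serverState[s] = "online_v1"
    \* no server may already be offline
    /\ ~(\E other \in servers : serverState[other] = "offline")
    /\ serverState' = [serverState EXCEPT ![s] = "offline"]

Finish(s) ==
    /\ serverState[s] = "offline"
    /\ serverState' = [serverState EXCEPT ![s] = "online_v2"]

UpgradeStep == \E s \in servers : Start(s) \/ Finish(s)

Done ==
    /\ \A s \in servers : serverState[s] = "online_v2"
    /\ UNCHANGED serverState

Next == UpgradeStep \/ Done

\* Without fairness, a behavior may stutter forever in any state,
\* so "eventually" properties cannot be guaranteed.
SpecNoFairness == Init /\ [][Next]_serverState

\* Assumption: weak fairness on Next rules out infinite stuttering
\* while some state-changing step remains enabled.
Spec == SpecNoFairness /\ WF_serverState(Next)

ZeroDowntime ==
    [](\E s \in servers : serverState[s] /= "offline")

EventuallyUpgraded ==
    <>(\A s \in servers : serverState[s] = "online_v2")

\* A possible TLC configuration (s1, s2 are assumed model values):
\*   SPECIFICATION Spec
\*   CONSTANT servers = {s1, s2}
\*   PROPERTY ZeroDowntime
\*   PROPERTY EventuallyUpgraded
====
```

    Checking SpecNoFairness instead of Spec is a quick way to see the stuttering counterexample for EventuallyUpgraded. Weak fairness is enough in this sketch because a Start or Finish step stays enabled until all servers reach "online_v2"; a design where actions are enabled only intermittently might need strong fairness instead.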

  149. Adding another condition
    New rule! All online servers must be running the same version
    \* Define the set of servers which are online.
    OnlineServers ==
    { s \in servers : serverState[s] /= "offline" }
    \* It is always true that
    \* any two online servers are the same version
    SameVersion ==
    [] (\A s1,s2 \in OnlineServers :
    serverState[s1] = serverState[s2])

  150. Running the script
    Now run this script with the new property
    Error: "Invariant SameVersion is violated"
    The model checker trace shows us how:
    s1 -> "online_v1", s2 -> "online_v1"
    s1 -> "offline", s2 -> "online_v1"
    s1 -> "online_v2", s2 -> "online_v1" // boom!
    Let's add a load balancer to fix this

  151. Improving the design with a load balancer
    VARIABLES serverState, loadBalancer
    \* initialize all servers to "online_v1"
    Init == serverState = [s \in servers |-> "online_v1"]
    /\ loadBalancer = "v1"
    \* the online servers depend on the load balancer
    OnlineServers ==
    IF loadBalancer = "v1"
    THEN { s \in servers : serverState[s] = "online_v1" }
    ELSE { s \in servers : serverState[s] = "online_v2" }
    The load balancer points to only "v1" or "v2" servers

  152. Improving the design with a load balancer
    Finish(s) ==
    serverState[s] = "offline"
    /\ serverState' = [serverState EXCEPT ![s] = "online_v2"]
    \* and load balancer can point to v2 pool now
    /\ loadBalancer' = "v2"
    Then, when one server has successfully upgraded,
    the load balancer can switch over to using v2
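
    Assembling the load balancer slides into one module needs a few pieces that the slides leave out. The sketch below fills them in as assumptions: Start and Done state that loadBalancer is unchanged (TLC rejects a step that leaves a variable's next value unspecified), and the vars tuple, the module name, and the weak fairness condition are additions. The Finish guard uses "offline", the state name used by Start.

```tla
---- MODULE DeploymentWithLoadBalancer ----
\* Hypothetical module: a sketch assembled from the slide fragments.
CONSTANT servers
VARIABLES serverState, loadBalancer

vars == <<serverState, loadBalancer>>

\* all servers start at "online_v1", load balancer points at v1
Init ==
    /\ serverState = [s \in servers |-> "online_v1"]
    /\ loadBalancer = "v1"

\* the online servers depend on the load balancer
OnlineServers ==
    IF loadBalancer = "v1"
    THEN { s \in servers : serverState[s] = "online_v1" }
    ELSE { s \in servers : serverState[s] = "online_v2" }

Start(s) ==
    /\ serverState[s] = "online_v1"
    /\ ~(\E other \in servers : serverState[other] = "offline")
    /\ serverState' = [serverState EXCEPT ![s] = "offline"]
    \* Assumption: taking a server offline does not move the load balancer
    /\ UNCHANGED loadBalancer

Finish(s) ==
    /\ serverState[s] = "offline"
    /\ serverState' = [serverState EXCEPT ![s] = "online_v2"]
    \* the load balancer can point to the v2 pool now
    /\ loadBalancer' = "v2"

UpgradeStep == \E s \in servers : Start(s) \/ Finish(s)

Done ==
    /\ \A s \in servers : serverState[s] = "online_v2"
    /\ UNCHANGED vars

Next == UpgradeStep \/ Done

\* Assumption: weak fairness, so the upgrade cannot stall forever
Spec == Init /\ [][Next]_vars /\ WF_vars(Next)

ZeroDowntime ==
    [](\E s \in servers : serverState[s] /= "offline")

EventuallyUpgraded ==
    <>(\A s \in servers : serverState[s] = "online_v2")

SameVersion ==
    [](\A s1, s2 \in OnlineServers : serverState[s1] = serverState[s2])
====
```

    Note that SameVersion now holds almost by definition, since OnlineServers only ever contains servers in a single state. A stricter model might also check that OnlineServers is never empty, which is closer to what "zero downtime" means once traffic goes through the load balancer.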

  153. Running the script
    Now re-run this script with the load balancer
    • "ZeroDowntime" works
    • "EventuallyUpgraded" works
    • "SameVersion" works

  154. Our sketch is complete (for now)
    Think of TLA+ as "agile" modeling
    for software systems
    A few minutes of sketching =>
    much more confidence!

  155. Some common questions
    • How to handle failures?
    – Just add failure cases to the state diagram!
    • How does this model convert to code?
    – It doesn't! Modeling is a tool for thinking, not a
    code generator.
    – It's about having confidence in the design.

  156. Conclusion
    • TLA+ and model checking are not that scary
    – It's just agile modeling for software systems!
    – For concurrency, it's essential
    – Check it out! A bigger toolbox is a good thing to have
    • TLA+ can do much more than I showed today
    – Not just model checking, but refinements, proofs, etc
    • More information:
    – TLA+ Home Page with videos, book, papers, etc
    – learntla.com book (and trainings!) by Hillel Wayne

  157. Slides and video here
    fsharpforfunandprofit.com/tlaplus
    Thank you!
    "Domain Modeling Made Functional" book
    fsharpforfunandprofit.com/books
    @ScottWlaschin (me on Twitter)
