
Consistency without Clocks: The FaunaDB Distributed Transaction Protocol


Jeferson David Ossa

May 17, 2019


Transcript

  1. Consistency without Clocks:
    The FaunaDB Distributed
    Transaction Protocol


  2. About me
    Jeferson David Ossa
    @unyagami on Twitter
    FP Developer at s4n.co
    https://jobs.lever.co/s4n


  3. FaunaDB
    A distributed, indexed document store based on the Calvin
    transaction protocol that provides snapshot isolation up to strict
    serializability and does not rely on physical clock synchronization to
    maintain consistency.
    ACID transactions with up to serializable isolation.
    Linearizable, consistent operations across geographically
    distributed replicas.


  5. Calvin: Fast Distributed Transactions for Partitioned
    Database Systems
    [Diagram: transactions T2, T3, T1, ..., Tn -> sequencer -> log
    (T1 T2 T3 ... Tn) -> Replica 1, Replica 2, ..., Replica X]
    Ordering transactions, and actually executing those transactions,
    are separable problems.
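
A minimal sketch of that separation, assuming deterministic transactions (this is an illustration, not Calvin's or FaunaDB's code). The `sequence` function is a hypothetical stand-in for the sequencer: it only fixes an order. Each replica then replays the same log on its own and reaches the same state.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
Txn = Callable[[State], None]


def add(key: str, amount: int) -> Txn:
    """Build a deterministic transaction that adds `amount` to `key`."""
    def apply(state: State) -> None:
        state[key] = state.get(key, 0) + amount
    return apply


def multiply(key: str, factor: int) -> Txn:
    """Build a deterministic transaction that multiplies `key` by `factor`."""
    def apply(state: State) -> None:
        state[key] = state.get(key, 0) * factor
    return apply


def sequence(submitted: List[Txn]) -> List[Txn]:
    """Hypothetical sequencer: agree on one global order (here, arrival order)."""
    return list(submitted)


def replay(log: List[Txn]) -> State:
    """Executor: apply the log in order, with no further coordination."""
    state: State = {}
    for txn in log:
        txn(state)
    return state


log = sequence([add("x", 10), multiply("x", 2), add("x", 1)])
replica_1 = replay(log)
replica_2 = replay(log)
other_order = replay(list(reversed(log)))
```

Both replicas end in the same state because they share the log; replaying the same transactions in a different order gives a different state, which is why the order has to be agreed on first.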


  6. Snapshot Isolation
    Each transaction appears to operate on an independent, consistent
    snapshot of the database, and transactions are externally
    consistent (Tn-1 is visible to Tn).
    If transaction T1 has modified an object X, and another transaction
    T2 committed a write to X after T1’s snapshot began, and before
    T1’s commit, then T1 must abort.
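
The abort rule above can be sketched as a commit-time check. This is a toy model under stated assumptions: `Store`, `Txn`, and the integer logical clock are hypothetical names for illustration, and only write-write conflicts are tracked.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Txn:
    name: str
    start_ts: int                      # logical time at which the snapshot was taken
    write_set: Set[str] = field(default_factory=set)


class Store:
    def __init__(self) -> None:
        self.clock = 0                             # logical commit counter
        self.last_commit_ts: Dict[str, int] = {}   # key -> time of latest committed write

    def begin(self, name: str) -> Txn:
        return Txn(name, start_ts=self.clock)

    def commit(self, txn: Txn) -> bool:
        # Abort if any key we wrote was committed by someone else
        # after our snapshot began.
        for key in txn.write_set:
            if self.last_commit_ts.get(key, -1) > txn.start_ts:
                return False
        self.clock += 1
        for key in txn.write_set:
            self.last_commit_ts[key] = self.clock
        return True


store = Store()
t1 = store.begin("T1")
t2 = store.begin("T2")
t1.write_set.add("X")
t2.write_set.add("X")
t2_committed = store.commit(t2)   # T2 commits its write to X first
t1_committed = store.commit(t1)   # T1 now conflicts on X and must abort
```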


  7. Strict Serializability
    Serializability: transactions appear to have occurred in some total
    order.
    +
    Linearizability: each operation appears to take place atomically, in
    some order consistent with the real-time ordering of those operations.
    =
    Transactions with total order and real-time constraints.
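
A small sketch of the real-time half of this definition, assuming each transaction is described by a hypothetical (start, end) wall-time interval. It only checks that a proposed total order never places a transaction before one that had already finished; it does not check that the results match that order.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (time submitted, time completed)


def respects_real_time(order: List[str], intervals: Dict[str, Interval]) -> bool:
    """True if, whenever A completed before B was submitted, A precedes B in `order`."""
    position = {name: i for i, name in enumerate(order)}
    for a, (_, a_end) in intervals.items():
        for b, (b_start, _) in intervals.items():
            if a_end < b_start and position[a] > position[b]:
                return False
    return True


# T1 and T2 overlap in time; T3 starts after both have completed.
intervals = {"T1": (0, 2), "T2": (1, 3), "T3": (4, 5)}

ok_order = respects_real_time(["T2", "T1", "T3"], intervals)   # overlapping txns may swap
bad_order = respects_real_time(["T3", "T1", "T2"], intervals)  # T3 cannot come first
```

A serializable system is free to pick either order; a strictly serializable one may only pick orders that pass this check.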


  8. Serializable Isolation
    The system can process many transactions in parallel, but the final
    result is equivalent to processing them one after another.
    For most database systems, the order is not determined in
    advance. Instead, transactions are run in parallel, and some variant
    of locking is used to ensure that the final result is equivalent to
    some serial order.
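
To make "equivalent to some serial order" concrete, here is a brute-force sketch that compares a final state against every serial execution. The transactions `add_ten` and `double` are made up for illustration, and the check looks at final state only, which is a simplification of the formal definition.

```python
from itertools import permutations
from typing import Callable, Dict, List

State = Dict[str, int]
Txn = Callable[[State], None]


def add_ten(state: State) -> None:
    state["x"] += 10


def double(state: State) -> None:
    state["x"] *= 2


def serial_outcomes(initial: State, txns: List[Txn]) -> List[State]:
    """Final states produced by running the transactions one after another, in every order."""
    outcomes = []
    for order in permutations(txns):
        state = dict(initial)
        for txn in order:
            txn(state)
        outcomes.append(state)
    return outcomes


def is_serializable(result: State, initial: State, txns: List[Txn]) -> bool:
    return result in serial_outcomes(initial, txns)


initial = {"x": 1}
txns = [add_ten, double]

good = is_serializable({"x": 22}, initial, txns)        # add_ten, then double
also_good = is_serializable({"x": 12}, initial, txns)   # double, then add_ten
lost_update = is_serializable({"x": 11}, initial, txns) # both read x=1, one write lost
```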


  9. FaunaDB’s replication protocol
    Caveat:
    FaunaDB’s replication protocol uses consensus, not wall clocks, to
    construct its transaction logs, but it still relies on wall clocks to
    decide when to seal time windows in the log, which means that clock
    skew can delay transaction processing.


  10. Transaction Log & Snapshots
    1:00 P.M.  1:01 P.M.  1:03 P.M.  1:07 P.M.
    T1         T2         T3         T4
    Time Stamp  Customer ID  Credit
    T4          1            50
    T1          2            100
    T3          3            200
    T2          2            50
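
A sketch of how a snapshot read over such a versioned table could work, using the rows above with time stamps T1..T4 written as the integers 1..4. `VersionedStore` is a hypothetical name for illustration, not a FaunaDB API.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class VersionedStore:
    """Keeps every version of each key, tagged with the transaction time stamp."""

    def __init__(self) -> None:
        self._versions: Dict[int, List[Tuple[int, int]]] = defaultdict(list)

    def write(self, key: int, ts: int, value: int) -> None:
        versions = self._versions[key]
        versions.append((ts, value))
        versions.sort(key=lambda version: version[0])

    def read(self, key: int, as_of: int) -> Optional[int]:
        """Return the newest value written at or before `as_of`, if any."""
        versions = self._versions.get(key, [])
        timestamps = [ts for ts, _ in versions]
        i = bisect_right(timestamps, as_of)
        return versions[i - 1][1] if i else None


store = VersionedStore()
store.write(key=1, ts=4, value=50)    # T4: customer 1, credit 50
store.write(key=2, ts=1, value=100)   # T1: customer 2, credit 100
store.write(key=3, ts=3, value=200)   # T3: customer 3, credit 200
store.write(key=2, ts=2, value=50)    # T2: customer 2, credit 50

credit_at_t1 = store.read(2, as_of=1)   # snapshot at T1 sees 100
credit_at_t4 = store.read(2, as_of=4)   # snapshot at T4 sees 50
missing = store.read(1, as_of=3)        # customer 1 does not exist before T4
```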


  11. Replication
    Replica ABC: Node 1, Node 2
    Replica XYZ: Node 1, Node 2
    Time Stamp  Ticket  Price  Stock
    T3          1       100    12
    Time Stamp  Customer ID  Credit
    T1          2            200
    T4          2            100
    T2          6            300
    Time Stamp  Ticket  Price  Stock
    T2          3       80     2
    T3          3       80     1
    Time Stamp  Customer ID  Credit
    T0          7            50


  12. The FaunaDB Distributed Transaction Protocol
    Transaction submitted to Replica ABC:
    1. Read ticket 3, validate there’s at least 1 in stock, check price.
    2. Read customer 2, validate credit is enough to buy ticket 3.
    3. Subtract one from ticket 3’s stock.
    4. Subtract price from customer 2’s credit.
    A similar transaction is submitted to Replica XYZ for customer 6 at
    the same time.
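
The four steps above can be sketched as a function that reads from a snapshot and buffers its writes instead of applying them. This is an illustrative model only: the names `Buffered` and `buy_ticket` and the dictionary layout are assumptions, and the snapshot values are the latest versions shown on the slides (time stamps T2..T4 written as integers).

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Key = Tuple[str, int]          # e.g. ("ticket", 3)
Versioned = Tuple[int, dict]   # (time stamp of the version, document)


@dataclass
class Buffered:
    reads: Dict[Key, int]      # key -> time stamp of the version that was read
    writes: Dict[Key, dict]    # key -> new document, not yet applied anywhere


def buy_ticket(snapshot: Dict[Key, Versioned], ticket: int, customer: int) -> Optional[Buffered]:
    t_ts, t = snapshot[("ticket", ticket)]         # step 1: read ticket
    c_ts, c = snapshot[("customer", customer)]     # step 2: read customer
    if t["stock"] < 1 or c["credit"] < t["price"]:
        return None                                # validation failed, nothing to submit
    return Buffered(
        reads={("ticket", ticket): t_ts, ("customer", customer): c_ts},
        writes={
            ("ticket", ticket): {**t, "stock": t["stock"] - 1},                  # step 3
            ("customer", customer): {**c, "credit": c["credit"] - t["price"]},   # step 4
        },
    )


snapshot = {
    ("ticket", 3): (3, {"price": 80, "stock": 1}),
    ("customer", 2): (4, {"credit": 100}),
    ("customer", 6): (2, {"credit": 300}),
}
abc = buy_ticket(snapshot, ticket=3, customer=2)   # submitted to Replica ABC
xyz = buy_ticket(snapshot, ticket=3, customer=6)   # submitted to Replica XYZ
```

Both calls succeed against the same snapshot even though only one ticket is in stock; nothing has been written yet, so the conflict has to be resolved later, once the transactions are ordered.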


  13. Replica ABC: Running Transaction
    Stored data:
      Time Stamp  Ticket  Price  Stock
      T3          1       100    12
      Time Stamp  Customer ID  Credit
      T1          2            200
      T4          2            100
      T2          6            300
      Time Stamp  Ticket  Price  Stock
      T2          3       80     2
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T0          7            50
    Log: T0 T1 T2 T3 T4
    Reads:
      Time Stamp  Ticket  Price  Stock
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T4          2            100


  14. Replica XYZ: Running Transaction
    Stored data:
      Time Stamp  Ticket  Price  Stock
      T3          1       100    12
      Time Stamp  Customer ID  Credit
      T1          2            200
      T4          2            100
      T2          6            300
      Time Stamp  Ticket  Price  Stock
      T2          3       80     2
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T0          7            50
    Log: T0 T1 T2 T3 T4
    Reads:
      Time Stamp  Ticket  Price  Stock
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T2          6            300


  15. Replica ABC: Coordinator’s buffer
    Stored data:
      Time Stamp  Ticket  Price  Stock
      T3          1       100    12
      Time Stamp  Customer ID  Credit
      T1          2            200
      T4          2            100
      T2          6            300
      Time Stamp  Ticket  Price  Stock
      T2          3       80     2
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T0          7            50
    Log: T0 T1 T2 T3 T4
    Reads:
      Time Stamp  Ticket  Price  Stock
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T4          2            100
    Buffered writes:
      Ticket  Price  Stock
      3       80     0
      Customer ID  Credit
      2            20


  16. Replica XYZ: Coordinator’s buffer
    Stored data:
      Time Stamp  Ticket  Price  Stock
      T3          1       100    12
      Time Stamp  Customer ID  Credit
      T1          2            200
      T4          2            100
      T2          6            300
      Time Stamp  Ticket  Price  Stock
      T2          3       80     2
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T0          7            50
    Log: T0 T1 T2 T3 T4
    Reads:
      Time Stamp  Ticket  Price  Stock
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T2          6            300
    Buffered writes:
      Ticket  Price  Stock
      3       80     0
      Customer ID  Credit
      6            220


  17. Replica ABC: Coordinator
    Reads:
      Time Stamp  Ticket  Price  Stock
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T4          2            100
    Buffered writes:
      Ticket  Price  Stock
      3       80     0
      Customer ID  Credit
      2            20
    Replica XYZ: Coordinator
    Reads:
      Time Stamp  Ticket  Price  Stock
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T2          6            300
    Buffered writes:
      Ticket  Price  Stock
      3       80     0
      Customer ID  Credit
      6            220
    Log: T0 T1 T2 T3 T4 T5 T6


  18. Replica ABC: T5’s buffered writes
    Stored data:
      Time Stamp  Ticket  Price  Stock
      T3          1       100    12
      Time Stamp  Customer ID  Credit
      T1          2            200
      T4          2            100
      T2          6            300
      Time Stamp  Ticket  Price  Stock
      T2          3       80     2
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T0          7            50
    Log: T0 T1 T2 T3 T4 T5 T6
    Original reads:
      Time Stamp  Ticket  Price  Stock
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T4          2            100
    Current values:
      Time Stamp  Ticket  Price  Stock
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T4          2            100
    Buffered writes:
      Ticket  Price  Stock
      3       80     0
      Customer ID  Credit
      2            20


  19. Replica XYZ: T5’s buffered writes
    Stored data:
      Time Stamp  Ticket  Price  Stock
      T3          1       100    12
      Time Stamp  Customer ID  Credit
      T1          2            200
      T4          2            100
      T2          6            300
      Time Stamp  Ticket  Price  Stock
      T2          3       80     2
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T0          7            50
    Log: T0 T1 T2 T3 T4 T5 T6
    Original reads:
      Time Stamp  Ticket  Price  Stock
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T4          2            100
    Current values:
      Time Stamp  Ticket  Price  Stock
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T4          2            100
    Buffered writes:
      Ticket  Price  Stock
      3       80     0
      Customer ID  Credit
      2            20


  20. Replica ABC: Commit T5
    Original reads vs. current values: SAME
    Stored data:
      Time Stamp  Ticket  Price  Stock
      T3          1       100    12
      Time Stamp  Customer ID  Credit
      T1          2            200
      T5          2            20
      T2          6            300
      Time Stamp  Ticket  Price  Stock
      T2          3       80     2
      T5          3       80     0
      Time Stamp  Customer ID  Credit
      T0          7            50
    Log: T0 T1 T2 T3 T4 T5 T6
    Original reads:
      Time Stamp  Ticket  Price  Stock
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T4          2            100
    Current values:
      Time Stamp  Ticket  Price  Stock
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T4          2            100
    Buffered writes:
      Ticket  Price  Stock
      3       80     0
      Customer ID  Credit
      2            20


  21. Replica XYZ: Commit T5
    Original reads vs. current values: SAME
    Stored data:
      Time Stamp  Ticket  Price  Stock
      T3          1       100    12
      Time Stamp  Customer ID  Credit
      T1          2            200
      T5          2            20
      T2          6            300
      Time Stamp  Ticket  Price  Stock
      T2          3       80     2
      T5          3       80     0
      Time Stamp  Customer ID  Credit
      T0          7            50
    Log: T0 T1 T2 T3 T4 T5 T6
    Original reads:
      Time Stamp  Ticket  Price  Stock
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T4          2            100
    Current values:
      Time Stamp  Ticket  Price  Stock
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T4          2            100
    Buffered writes:
      Ticket  Price  Stock
      3       80     0
      Customer ID  Credit
      2            20


  22. Replica ABC: T6’s buffered writes
    Stored data:
      Time Stamp  Ticket  Price  Stock
      T3          1       100    12
      Time Stamp  Customer ID  Credit
      T1          2            200
      T5          2            20
      T2          6            300
      Time Stamp  Ticket  Price  Stock
      T2          3       80     2
      T5          3       80     0
      Time Stamp  Customer ID  Credit
      T0          7            50
    Log: T0 T1 T2 T3 T4 T5 T6
    Original reads:
      Time Stamp  Ticket  Price  Stock
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T2          6            300
    Current values:
      Time Stamp  Ticket  Price  Stock
      T5          3       80     0
      Time Stamp  Customer ID  Credit
      T2          6            300
    Buffered writes:
      Ticket  Price  Stock
      3       80     0
      Customer ID  Credit
      6            220


  23. Replica XYZ: T6’s buffered writes
    Stored data:
      Time Stamp  Ticket  Price  Stock
      T3          1       100    12
      Time Stamp  Customer ID  Credit
      T1          2            200
      T5          2            20
      T2          6            300
      Time Stamp  Ticket  Price  Stock
      T2          3       80     2
      T5          3       80     0
      Time Stamp  Customer ID  Credit
      T0          7            50
    Log: T0 T1 T2 T3 T4 T5 T6
    Original reads:
      Time Stamp  Ticket  Price  Stock
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T2          6            300
    Current values:
      Time Stamp  Ticket  Price  Stock
      T5          3       80     0
      Time Stamp  Customer ID  Credit
      T2          6            300
    Buffered writes:
      Ticket  Price  Stock
      3       80     0
      Customer ID  Credit
      6            220


  24. Replica ABC: ABORT
    Original reads vs. current values: DIFFERENT
    Stored data:
      Time Stamp  Ticket  Price  Stock
      T3          1       100    12
      Time Stamp  Customer ID  Credit
      T1          2            200
      T5          2            20
      T2          6            300
      Time Stamp  Ticket  Price  Stock
      T2          3       80     2
      T5          3       80     0
      Time Stamp  Customer ID  Credit
      T0          7            50
    Log: T0 T1 T2 T3 T4 T5 T6
    Original reads:
      Time Stamp  Ticket  Price  Stock
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T2          6            300
    Current values:
      Time Stamp  Ticket  Price  Stock
      T5          3       80     0
      Time Stamp  Customer ID  Credit
      T2          6            300
    Buffered writes:
      Ticket  Price  Stock
      3       80     0
      Customer ID  Credit
      6            220


  25. Replica XYZ: ABORT
    Original reads vs. current values: DIFFERENT
    Stored data:
      Time Stamp  Ticket  Price  Stock
      T3          1       100    12
      Time Stamp  Customer ID  Credit
      T1          2            200
      T5          2            20
      T2          6            300
      Time Stamp  Ticket  Price  Stock
      T2          3       80     2
      T5          3       80     0
      Time Stamp  Customer ID  Credit
      T0          7            50
    Log: T0 T1 T2 T3 T4 T5 T6
    Original reads:
      Time Stamp  Ticket  Price  Stock
      T3          3       80     1
      Time Stamp  Customer ID  Credit
      T2          6            300
    Current values:
      Time Stamp  Ticket  Price  Stock
      T5          3       80     0
      Time Stamp  Customer ID  Credit
      T2          6            300
    Buffered writes:
      Ticket  Price  Stock
      3       80     0
      Customer ID  Credit
      6            220


  26. Multi-Region Global Replica Consistency
    Once a transaction commits, it is guaranteed that any subsequent
    read-write transaction—no matter which replica is processing
    it—will read all data that was written by the earlier transaction.


  27. Summary
    1. Reads are performed as of a recent snapshot, and writes are
    buffered.
    2. A consensus protocol (Raft) is used to insert the transaction
    into a distributed log. This is the only point at which global
    consensus is required.
    3. Each replica checks for potential violations of serializability
    guarantees.
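
The three steps can be tied together in a small simulation of the ticket example. It is a sketch under stated assumptions, not FaunaDB's implementation: a plain Python list stands in for the Raft-replicated log, `Replica` is a hypothetical class, and "the values read are unchanged" is modeled by comparing version time stamps (T2..T6 written as integers).

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]
Entry = Tuple[int, Dict[Key, int], Dict[Key, dict]]   # (log time stamp, reads, buffered writes)


class Replica:
    def __init__(self, data: Dict[Key, Tuple[int, dict]]) -> None:
        self.data = dict(data)   # key -> (time stamp of latest version, document)

    def apply(self, ts: int, reads: Dict[Key, int], writes: Dict[Key, dict]) -> str:
        # Step 3: abort if anything this transaction read has changed since.
        for key, read_ts in reads.items():
            if self.data[key][0] != read_ts:
                return "ABORT"
        # Otherwise apply the buffered writes at the transaction's log position.
        for key, doc in writes.items():
            self.data[key] = (ts, doc)
        return "COMMIT"


initial = {
    ("ticket", 3): (3, {"price": 80, "stock": 1}),
    ("customer", 2): (4, {"credit": 100}),
    ("customer", 6): (2, {"credit": 300}),
}

# Step 2: both transactions end up in one agreed log, as T5 and T6.
log: List[Entry] = [
    (5, {("ticket", 3): 3, ("customer", 2): 4},
        {("ticket", 3): {"price": 80, "stock": 0}, ("customer", 2): {"credit": 20}}),
    (6, {("ticket", 3): 3, ("customer", 6): 2},
        {("ticket", 3): {"price": 80, "stock": 0}, ("customer", 6): {"credit": 220}}),
]

abc, xyz = Replica(initial), Replica(initial)
outcomes_abc = [abc.apply(*entry) for entry in log]
outcomes_xyz = [xyz.apply(*entry) for entry in log]
```

Each replica decides on its own, yet both reach the same verdicts (T5 commits, T6 aborts) and the same data, because the decision depends only on the shared log and on local state derived from it.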


  28. Performance implications
    - Transactions that update data only go through a single round of
    global consensus.
    - FaunaDB does not require clock synchronization or bounds on
    clock skew uncertainty across machines in a deployment.
    - FaunaDB has a global notion of "FaunaDB time" that is agreed
    upon by every node in the system.
    - FaunaDB supports serializable snapshot reads with no
    consensus or locking, so they complete with local datacenter
    latency.


  29. References
    - https://fauna.com/faunadb
    - http://jepsen.io/analyses/faunadb-2.5.4
    - https://jepsen.io/consistency
    - http://cs.yale.edu/homes/thomson/publications/calvin-sigmod12.pdf
    - https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf
    - http://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf
    - http://web.cecs.pdx.edu/~len/sql1999.pdf
    - http://pmg.csail.mit.edu/papers/adya-phd.pdf


  30. Thank you!
    @unyagami
