Slide 1

Slide 1 text

Consistency without Clocks: The FaunaDB Distributed Transaction Protocol

Slide 2

Slide 2 text

About me

Jeferson David Ossa
@unyagami on Twitter
FP Developer at s4n.co
https://jobs.lever.co/s4n

Slide 3

Slide 3 text

FaunaDB

A distributed, indexed document store based on the Calvin transaction protocol that provides snapshot isolation up to strict serializability and does not rely on physical clock synchronization to maintain consistency.

- ACID transactions with up to serializable isolation.
- Linearizable, consistent operations across geographically distributed replicas.

Slide 5

Slide 5 text

Calvin: Fast Distributed Transactions for Partitioned Database Systems

[Diagram: transactions T2, T3, T1, Tn enter a sequencer, which writes them to a log in the order T1, T2, T3, Tn; the log is delivered to Replica 1, Replica 2, and Replica X.]

Ordering transactions, and actually executing those transactions, are separable problems.
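The separation described on this slide can be illustrated with a minimal sketch. It is not Calvin's implementation: the `sequence` and `execute` functions, the two toy transactions, and the `credit` field are all hypothetical names chosen for illustration. The point is only that once one order is fixed in a log, replicas that execute the log deterministically end in the same state.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
Txn = Callable[[State], None]


def sequence(submitted: List[Txn]) -> List[Txn]:
    """Hypothetical sequencer: fixes one global order (here, arrival order)."""
    return list(submitted)


def execute(log: List[Txn], initial: State) -> State:
    """A replica applies the log, in log order, to its own copy of the data."""
    state = dict(initial)
    for txn in log:
        txn(state)
    return state


def deposit(state: State) -> None:
    state["credit"] += 50


def double(state: State) -> None:
    state["credit"] *= 2


log = sequence([deposit, double])
replica_1 = execute(log, {"credit": 100})
replica_2 = execute(log, {"credit": 100})
# Both replicas hold the same state because they applied the same ordered log.
```

Executing the same two transactions in the opposite order gives a different result, which is why the order has to be agreed on before execution.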

Slide 6

Slide 6 text

Snapshot Isolation

Each transaction appears to operate on an independent, consistent snapshot of the database, and transactions are externally consistent (Tn-1 is visible to Tn). If transaction T1 has modified an object X, and another transaction T2 committed a write to X after T1’s snapshot began and before T1’s commit, then T1 must abort.
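The abort rule above can be written as a small predicate. This is a sketch under assumptions: timestamps are plain integers, and the function name `must_abort` and its parameters are hypothetical, not part of any FaunaDB API.

```python
from typing import Iterable, Set, Tuple


def must_abort(
    snapshot_ts: int,
    commit_ts: int,
    write_set: Set[str],
    other_commits: Iterable[Tuple[int, str]],
) -> bool:
    """Return True if another transaction committed a write to an object in
    write_set after this transaction's snapshot began and before its commit.

    other_commits holds (commit timestamp, object written) pairs.
    """
    return any(
        key in write_set and snapshot_ts < ts < commit_ts
        for ts, key in other_commits
    )


# T1 took its snapshot at 10, wants to commit at 20, and modified X.
conflict = must_abort(10, 20, {"X"}, [(15, "X")])      # T2 wrote X in between
no_overlap = must_abort(10, 20, {"X"}, [(15, "Y")])    # T2 wrote another object
before_snapshot = must_abort(10, 20, {"X"}, [(5, "X")])  # write already in snapshot
```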

Slide 7

Slide 7 text

Strict Serializability

Serializability: transactions appear to have occurred in some total order.
+ Linearizability: each operation appears to take place atomically, in some order consistent with the real-time ordering of those operations.
= Transactions with a total order and real-time constraints.

Slide 8

Slide 8 text

Serializable Isolation

The system can process many transactions in parallel, but the final result is equivalent to processing them one after another. For most database systems, the order is not determined in advance. Instead, transactions are run in parallel, and some variant of locking is used to ensure that the final result is equivalent to some serial order.
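To make "equivalent to some serial order" concrete, the sketch below checks a final state against every serial execution of a small set of transactions. It only compares final states, it enumerates all orders (so it is usable for tiny examples only), and the names `serial_outcomes`, `is_serializable`, and `add_one` are hypothetical.

```python
from itertools import permutations
from typing import Callable, Dict, List, Set, Tuple

State = Dict[str, int]
Txn = Callable[[State], None]
Frozen = Tuple[Tuple[str, int], ...]


def serial_outcomes(initial: State, txns: List[Txn]) -> Set[Frozen]:
    """Final states reachable by running the transactions one after another."""
    outcomes: Set[Frozen] = set()
    for order in permutations(txns):
        state = dict(initial)
        for txn in order:
            txn(state)
        outcomes.add(tuple(sorted(state.items())))
    return outcomes


def is_serializable(final: State, initial: State, txns: List[Txn]) -> bool:
    """True if the final state matches the result of some serial order."""
    return tuple(sorted(final.items())) in serial_outcomes(initial, txns)


def add_one(state: State) -> None:
    state["x"] += 1


initial = {"x": 100}
txns = [add_one, add_one]
both_applied = is_serializable({"x": 102}, initial, txns)
lost_update = is_serializable({"x": 101}, initial, txns)
```

A parallel run in which both increments read 100 and write 101 loses one update, and no serial order produces that result.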

Slide 9

Slide 9 text

FaunaDB’s replication protocol

Caveat: FaunaDB’s replication protocol uses consensus, not wall clocks, to construct its transaction logs, but it still relies on wall clocks to decide when to seal time windows in the log, which means that clock skew can delay transaction processing.

Slide 10

Slide 10 text

Transaction Log & Snapshots

Time       Transaction
1:00 P.M.  T1
1:01 P.M.  T2
1:03 P.M.  T3
1:07 P.M.  T4

Time Stamp  Customer ID  Credit
T4          1            50
T1          2            100
T3          3            200
T2          2            50
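One way to read the table above is as a set of versions, where a snapshot read at a given timestamp sees the latest version written at or before that timestamp. The sketch below assumes that reading; it encodes T1 to T4 as the integers 1 to 4, and `read_as_of` is a hypothetical helper, not a FaunaDB function.

```python
from typing import List, Optional, Tuple

# (time stamp, customer ID, credit), copied from the rows of the table,
# with T1..T4 encoded as 1..4.
VERSIONS: List[Tuple[int, int, int]] = [
    (4, 1, 50),
    (1, 2, 100),
    (3, 3, 200),
    (2, 2, 50),
]


def read_as_of(
    versions: List[Tuple[int, int, int]], customer_id: int, snapshot_ts: int
) -> Optional[int]:
    """Return the customer's credit as of snapshot_ts, or None if the
    customer has no version at or before that timestamp."""
    visible = [
        (ts, credit)
        for ts, cid, credit in versions
        if cid == customer_id and ts <= snapshot_ts
    ]
    if not visible:
        return None
    # The newest visible version wins.
    return max(visible)[1]


credit_at_t1 = read_as_of(VERSIONS, 2, 1)
credit_at_t4 = read_as_of(VERSIONS, 2, 4)
not_yet_written = read_as_of(VERSIONS, 3, 2)
```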

Slide 11

Slide 11 text

Replication

Replica ABC: Node 1, Node 2
Replica XYZ: Node 1, Node 2

Time Stamp  Ticket  Price  Stock
T3          1       100    12

Time Stamp  Customer ID  Credit
T1          2            200
T4          2            100
T2          6            300

Time Stamp  Ticket  Price  Stock
T2          3       80     2
T3          3       80     1

Time Stamp  Customer ID  Credit
T0          7            50

Slide 12

Slide 12 text

The FaunaDB Distributed Transaction Protocol

Transaction submitted to Replica ABC:
1. Read ticket 3, validate there’s at least 1 in stock, check price.
2. Read customer 2, validate credit is enough to buy ticket 3.
3. Subtract one from ticket 3’s stock.
4. Subtract price from customer 2’s credit.

A similar transaction is submitted to Replica XYZ for customer 6 at the same time.
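The four steps above can be sketched as a function that reads from a snapshot and returns buffered writes instead of modifying the data. This is an illustration only: `buy_ticket`, the dictionary layout, and the `SNAPSHOT` constant are hypothetical, and the values are the ones shown in the slides (ticket 3 costs 80 with 1 in stock, customer 2 has 100 credit, customer 6 has 300).

```python
from typing import Dict, Tuple

Key = Tuple[str, int]
Record = Dict[str, int]

SNAPSHOT: Dict[Key, Record] = {
    ("ticket", 3): {"price": 80, "stock": 1},
    ("customer", 2): {"credit": 100},
    ("customer", 6): {"credit": 300},
}


def buy_ticket(snapshot: Dict[Key, Record], ticket_id: int, customer_id: int):
    """Run the purchase against a snapshot; return (reads, buffered writes)."""
    reads: Dict[Key, Record] = {}

    def read(key: Key) -> Record:
        reads[key] = dict(snapshot[key])  # remember what was read
        return reads[key]

    ticket = read(("ticket", ticket_id))                 # step 1
    if ticket["stock"] < 1:
        raise ValueError("ticket is sold out")
    customer = read(("customer", customer_id))           # step 2
    if customer["credit"] < ticket["price"]:
        raise ValueError("not enough credit")
    writes: Dict[Key, Record] = {
        ("ticket", ticket_id): {                          # step 3
            "price": ticket["price"],
            "stock": ticket["stock"] - 1,
        },
        ("customer", customer_id): {                      # step 4
            "credit": customer["credit"] - ticket["price"],
        },
    }
    return reads, writes


reads_abc, writes_abc = buy_ticket(SNAPSHOT, 3, 2)  # submitted to Replica ABC
reads_xyz, writes_xyz = buy_ticket(SNAPSHOT, 3, 6)  # submitted to Replica XYZ
```

Both transactions succeed against the snapshot because neither has written anything yet; the snapshot itself is left unchanged.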

Slide 13

Slide 13 text

Time Stamp  Ticket  Price  Stock
T3          1       100    12

Time Stamp  Customer ID  Credit
T1          2            200
T4          2            100
T2          6            300

Time Stamp  Ticket  Price  Stock
T2          3       80     2
T3          3       80     1

Time Stamp  Customer ID  Credit
T0          7            50

T0 T1 T2 T3 T4

Replica ABC: running transaction

Time Stamp  Ticket  Price  Stock
T3          3       80     1

Time Stamp  Customer ID  Credit
T4          2            100

Slide 14

Slide 14 text

Time Stamp  Ticket  Price  Stock
T3          1       100    12

Time Stamp  Customer ID  Credit
T1          2            200
T4          2            100
T2          6            300

Time Stamp  Ticket  Price  Stock
T2          3       80     2
T3          3       80     1

Time Stamp  Customer ID  Credit
T0          7            50

T0 T1 T2 T3 T4

Replica XYZ: running transaction

Time Stamp  Ticket  Price  Stock
T3          3       80     1

Time Stamp  Customer ID  Credit
T2          6            300

Slide 15

Slide 15 text

Time Stamp  Ticket  Price  Stock
T3          1       100    12

Time Stamp  Customer ID  Credit
T1          2            200
T4          2            100
T2          6            300

Time Stamp  Ticket  Price  Stock
T2          3       80     2
T3          3       80     1

Time Stamp  Customer ID  Credit
T0          7            50

T0 T1 T2 T3 T4

Replica ABC: coordinator’s buffer

Time Stamp  Ticket  Price  Stock
T3          3       80     1

Time Stamp  Customer ID  Credit
T4          2            100

Ticket  Price  Stock
3       80     0

Customer ID  Credit
2            20

Slide 16

Slide 16 text

Time Stamp  Ticket  Price  Stock
T3          1       100    12

Time Stamp  Customer ID  Credit
T1          2            200
T4          2            100
T2          6            300

Time Stamp  Ticket  Price  Stock
T2          3       80     2
T3          3       80     1

Time Stamp  Customer ID  Credit
T0          7            50

T0 T1 T2 T3 T4

Replica XYZ: coordinator’s buffer

Time Stamp  Ticket  Price  Stock
T3          3       80     1

Time Stamp  Customer ID  Credit
T2          6            300

Ticket  Price  Stock
3       80     0

Customer ID  Credit
6            220

Slide 17

Slide 17 text

Replica ABC: coordinator

Time Stamp  Ticket  Price  Stock
T3          3       80     1

Time Stamp  Customer ID  Credit
T4          2            100

Ticket  Price  Stock
3       80     0

Customer ID  Credit
2            20

Replica XYZ: coordinator

Time Stamp  Ticket  Price  Stock
T3          3       80     1

Time Stamp  Customer ID  Credit
T2          6            300

Ticket  Price  Stock
3       80     0

Customer ID  Credit
6            220

T0 T1 T2 T3 T4 T5 T6

Slide 18

Slide 18 text

Time Stamp  Ticket  Price  Stock
T3          1       100    12

Time Stamp  Customer ID  Credit
T1          2            200
T4          2            100
T2          6            300

Time Stamp  Ticket  Price  Stock
T2          3       80     2
T3          3       80     1

Time Stamp  Customer ID  Credit
T0          7            50

Time Stamp  Ticket  Price  Stock
T3          3       80     1

Time Stamp  Customer ID  Credit
T4          2            100

Ticket  Price  Stock
3       80     0

Customer ID  Credit
2            20

T0 T1 T2 T3 T4 T5 T6

Time Stamp  Ticket  Price  Stock
T3          3       80     1

Time Stamp  Customer ID  Credit
T4          2            100

Replica ABC: T5’s buffered writes

Slide 19

Slide 19 text

Time Stamp  Ticket  Price  Stock
T3          1       100    12

Time Stamp  Customer ID  Credit
T1          2            200
T4          2            100
T2          6            300

Time Stamp  Ticket  Price  Stock
T2          3       80     2
T3          3       80     1

Time Stamp  Customer ID  Credit
T0          7            50

Time Stamp  Ticket  Price  Stock
T3          3       80     1

Time Stamp  Customer ID  Credit
T4          2            100

Ticket  Price  Stock
3       80     0

Customer ID  Credit
2            20

T0 T1 T2 T3 T4 T5 T6

Time Stamp  Ticket  Price  Stock
T3          3       80     1

Time Stamp  Customer ID  Credit
T4          2            100

Replica XYZ: T5’s buffered writes

Slide 20

Slide 20 text

Time Stamp  Ticket  Price  Stock
T3          1       100    12

Time Stamp  Customer ID  Credit
T1          2            200
T5          2            20
T2          6            300

Time Stamp  Ticket  Price  Stock
T2          3       80     2
T5          3       80     0

Time Stamp  Customer ID  Credit
T0          7            50

Time Stamp  Ticket  Price  Stock
T3          3       80     1

Time Stamp  Customer ID  Credit
T4          2            100

Ticket  Price  Stock
3       80     0

Customer ID  Credit
2            20

T0 T1 T2 T3 T4 T5 T6

Time Stamp  Ticket  Price  Stock
T3          3       80     1

Time Stamp  Customer ID  Credit
T4          2            100

Replica ABC: Commit T5 (SAME)

Slide 21

Slide 21 text

Time Stamp  Ticket  Price  Stock
T3          1       100    12

Time Stamp  Customer ID  Credit
T1          2            200
T5          2            20
T2          6            300

Time Stamp  Ticket  Price  Stock
T2          3       80     2
T5          3       80     0

Time Stamp  Customer ID  Credit
T0          7            50

Time Stamp  Ticket  Price  Stock
T3          3       80     1

Time Stamp  Customer ID  Credit
T4          2            100

Ticket  Price  Stock
3       80     0

Customer ID  Credit
2            20

T0 T1 T2 T3 T4 T5 T6

Time Stamp  Ticket  Price  Stock
T3          3       80     1

Time Stamp  Customer ID  Credit
T4          2            100

Replica XYZ: Commit T5 (SAME)

Slide 22

Slide 22 text

Time Stamp  Ticket  Price  Stock
T3          1       100    12

Time Stamp  Customer ID  Credit
T1          2            200
T5          2            20
T2          6            300

Time Stamp  Ticket  Price  Stock
T2          3       80     2
T5          3       80     0

Time Stamp  Customer ID  Credit
T0          7            50

Time Stamp  Ticket  Price  Stock
T5          3       80     0

Time Stamp  Customer ID  Credit
T2          6            300

Ticket  Price  Stock
3       80     0

Customer ID  Credit
2            20

T0 T1 T2 T3 T4 T5 T6

Time Stamp  Ticket  Price  Stock
T3          3       80     1

Time Stamp  Customer ID  Credit
T2          6            300

Replica ABC: T5’s buffered writes

Slide 23

Slide 23 text

Time Stamp  Ticket  Price  Stock
T3          1       100    12

Time Stamp  Customer ID  Credit
T1          2            200
T5          2            20
T2          6            300

Time Stamp  Ticket  Price  Stock
T2          3       80     2
T5          3       80     0

Time Stamp  Customer ID  Credit
T0          7            50

Time Stamp  Ticket  Price  Stock
T5          3       80     0

Time Stamp  Customer ID  Credit
T2          6            300

Ticket  Price  Stock
3       80     0

Customer ID  Credit
2            20

T0 T1 T2 T3 T4 T5 T6

Time Stamp  Ticket  Price  Stock
T3          3       80     1

Time Stamp  Customer ID  Credit
T2          6            300

Replica XYZ: T5’s buffered writes

Slide 24

Slide 24 text

Time Stamp  Ticket  Price  Stock
T3          1       100    12

Time Stamp  Customer ID  Credit
T1          2            200
T5          2            20
T2          6            300

Time Stamp  Ticket  Price  Stock
T2          3       80     2
T5          3       80     0

Time Stamp  Customer ID  Credit
T0          7            50

Time Stamp  Ticket  Price  Stock
T5          3       80     0

Time Stamp  Customer ID  Credit
T2          6            300

Ticket  Price  Stock
3       80     0

Customer ID  Credit
2            20

T0 T1 T2 T3 T4 T5 T6

Time Stamp  Ticket  Price  Stock
T3          3       80     1

Time Stamp  Customer ID  Credit
T2          6            300

Replica ABC: ABORT (DIFFERENT)

Slide 25

Slide 25 text

Time Stamp  Ticket  Price  Stock
T3          1       100    12

Time Stamp  Customer ID  Credit
T1          2            200
T5          2            20
T2          6            300

Time Stamp  Ticket  Price  Stock
T2          3       80     2
T5          3       80     0

Time Stamp  Customer ID  Credit
T0          7            50

Time Stamp  Ticket  Price  Stock
T5          3       80     0

Time Stamp  Customer ID  Credit
T2          6            300

Ticket  Price  Stock
3       80     0

Customer ID  Credit
2            20

T0 T1 T2 T3 T4 T5 T6

Time Stamp  Ticket  Price  Stock
T3          3       80     1

Time Stamp  Customer ID  Credit
T2          6            300

Replica XYZ: ABORT (DIFFERENT)
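The commit and abort outcomes in this walkthrough can be reproduced with a small sketch. It assumes one reading of the slides: each replica processes the log in order, re-reads the records a transaction originally read, and applies the buffered writes only if those records are unchanged. The function `apply_log`, the string keys, and the comparison by value (the slides also show version time stamps) are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Store = Dict[str, object]
# (log position, original reads, buffered writes)
LogEntry = Tuple[str, Store, Store]


def apply_log(store: Store, log: List[LogEntry]) -> List[Tuple[str, str]]:
    """Process log entries in order; commit an entry only if its original
    reads still match the replica's current data."""
    outcomes = []
    for name, reads, writes in log:
        current = {key: store[key] for key in reads}
        if current == reads:            # SAME: make the writes permanent
            store.update(writes)
            outcomes.append((name, "commit"))
        else:                           # DIFFERENT: discard the writes
            outcomes.append((name, "abort"))
    return outcomes


# Ticket records are (price, stock); customer records are the credit.
INITIAL: Store = {"ticket 3": (80, 1), "customer 2": 100, "customer 6": 300}
LOG: List[LogEntry] = [
    ("T5", {"ticket 3": (80, 1), "customer 2": 100},
           {"ticket 3": (80, 0), "customer 2": 20}),
    ("T6", {"ticket 3": (80, 1), "customer 6": 300},
           {"ticket 3": (80, 0), "customer 6": 220}),
]

replica_abc = dict(INITIAL)
replica_xyz = dict(INITIAL)
outcomes_abc = apply_log(replica_abc, LOG)
outcomes_xyz = apply_log(replica_xyz, LOG)
```

Because both replicas see the same log and apply the same deterministic check, they reach the same decisions without exchanging any further messages.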

Slide 26

Slide 26 text

Multi-Region Global Replica Consistency

Once a transaction commits, it is guaranteed that any subsequent read-write transaction, no matter which replica is processing it, will read all data that was written by the earlier transaction.

Slide 27

Slide 27 text

Summary

1. Reads are performed as of a recent snapshot, and writes are buffered.
2. A consensus protocol (Raft) is used to insert the transaction into a distributed log. This is the only point at which global consensus is required.
3. Each replica is checked for potential violations of serializability guarantees.

Slide 28

Slide 28 text

Performance implications

- Transactions that update data only go through a single round of global consensus.
- FaunaDB does not require clock synchronization or bounds on clock skew uncertainty across machines in a deployment.
- FaunaDB has a global notion of "FaunaDB time" that is agreed upon by every node in the system.
- FaunaDB supports serializable snapshot reads with no consensus or locking, so they complete with local datacenter latency.

Slide 29

Slide 29 text

References

- https://fauna.com/faunadb
- http://jepsen.io/analyses/faunadb-2.5.4
- https://jepsen.io/consistency
- http://cs.yale.edu/homes/thomson/publications/calvin-sigmod12.pdf
- https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf
- http://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf
- http://web.cecs.pdx.edu/~len/sql1999.pdf
- http://pmg.csail.mit.edu/papers/adya-phd.pdf

Slide 30

Slide 30 text

Thank you! @unyagami