Don't Give Up on Serializability Just Yet

Neha
June 08, 2015

A short version of a talk on serializability and consistency, given at dotScale in Paris. Describes consistency in three contexts: database transactions, consistency models, and the CAP theorem.

Transcript

  1. Consistency and Candy Crush: Don’t give up on serializability just yet
    Neha Narula, @neha
    dotScale, June 8, 2015
  2. @neha
    •  PhD from MIT
    •  Formerly at Google
    •  Research in fast transactions for multi-core databases and distributed systems
  3. “… the most important person in my gang will be a systems programmer. A person who can debug a device driver or a distributed system is a person who can be trusted in a Hobbesian nightmare of breathtaking scope; a systems programmer has seen the terrors of the world and understood the intrinsic horror of existence.”
  4. Consistency models help us reason about our code and avoid subtle bugs
  5. Outline: Consistency as in ACID; Consistency models; Consistency as in CAP
  6. Outline: Consistency as in ACID (up next); Consistency models; Consistency as in CAP
  7. mysql> START TRANSACTION;
    mysql> UPDATE t SET x=x+1 WHERE y=2;
    mysql> UPDATE t SET y=y+1 WHERE z=3;
    mysql> COMMIT;
  8. ACID transactions
    •  Atomic: Whole thing happens or not
    •  Consistent: Application-defined correctness
    •  Isolated: Transactions don’t interfere with each other
    •  Durable: Database can recover correctly from a crash
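A minimal sketch of the "whole thing happens or not" property, using Python's built-in `sqlite3` module rather than the MySQL session shown on slide 7. The table `t(x, y, z)` mirrors the slide, but the row values and the `run_transaction` helper are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical table mirroring the slide's t(x, y, z); the row is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER, y INTEGER, z INTEGER)")
conn.execute("INSERT INTO t VALUES (0, 2, 3)")
conn.commit()


def run_transaction(conn, fail_midway):
    """Run both updates as one unit and return the row afterwards."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE t SET x = x + 1 WHERE y = 2")
            if fail_midway:
                raise RuntimeError("simulated failure between statements")
            conn.execute("UPDATE t SET y = y + 1 WHERE z = 3")
    except RuntimeError:
        pass  # the first UPDATE has been rolled back at this point
    return conn.execute("SELECT x, y, z FROM t").fetchone()


after_failure = run_transaction(conn, fail_midway=True)
after_success = run_transaction(conn, fail_midway=False)
print(after_failure)  # (0, 2, 3): neither update is visible
print(after_success)  # (1, 3, 3): both updates are visible
```

The failed run leaves the row untouched even though its first `UPDATE` had already executed, which is the atomicity half of ACID; isolation between concurrent transactions is a separate property that this single-connection sketch does not exercise.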
  9. What is serializability?
    The result of executing a set of transactions is equivalent to executing those transactions one at a time, in some serial order.
    If each transaction preserves correctness, the database will be in a correct state.
    We can pretend like there’s no concurrency!
  10. What is serializability? serializability != serial execution

  11. Serializable database transactions

    TXN1(k, j Key) (int, int) {
        a := GET(k)
        b := GET(j)
        return a, b
    }

    TXN2(k, j Key) {
        ADD(k,1)
        ADD(j,1)
    }

    Starting from k=0, j=0, the possible serial orders over time are TXN1 then TXN2, or TXN2 then TXN1. To the programmer, the valid return values for TXN1 are (0,0) or (1,1).
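The two valid return values can be checked by brute force. The sketch below is an illustration, not code from the talk: it models the database as a plain Python dict, rewrites TXN1 and TXN2 as the functions `txn1` and `txn2`, and uses a hypothetical helper `serial_results` to run them one at a time in every order.

```python
from itertools import permutations


def txn1(db):
    """Read both keys; mirrors TXN1 on the slide."""
    return (db["k"], db["j"])


def txn2(db):
    """Increment both keys; mirrors TXN2 on the slide."""
    db["k"] += 1
    db["j"] += 1
    return None


def serial_results(txns, initial):
    """Run the transactions one at a time in every possible order and
    collect what txn1 returns in each order."""
    results = set()
    for order in permutations(txns):
        db = dict(initial)  # fresh copy of the starting state
        for txn in order:
            out = txn(db)
            if txn is txn1:
                results.add(out)
    return results


valid = serial_results([txn1, txn2], {"k": 0, "j": 0})
print(sorted(valid))  # [(0, 0), (1, 1)]
```

A serializable database may run the two transactions however it likes internally, as long as what TXN1 observes is one of these two results.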
  12. Transactions can execute in parallel
    [Figure: timeline of an interleaved execution starting from k=0, j=0. TXN1 runs GET(k), GET(j); TXN2 runs ADD(k,1), ADD(j,1). TXN1 returns (1,1).]
  13. Non-serializable means incorrect interleavings
    [Figure: timeline of an interleaved execution starting from k=0, j=0. TXN1 runs GET(k), GET(j); TXN2 runs ADD(k,1), ADD(j,1). TXN1 returns (1,0)!]
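To see which interleavings are harmless and which are not, one can enumerate every way of merging the two transactions' steps. The sketch below is an assumption-laden illustration: the step encoding and the helpers `interleavings` and `run` are invented here, and an interleaving is classified only by comparing what TXN1 reads against the two serial results, which is enough for this example but is not a general serializability checker.

```python
from itertools import combinations

# Each transaction is a list of steps; a step is (operation, key).
TX1 = [("GET", "k"), ("GET", "j")]
TX2 = [("ADD", "k"), ("ADD", "j")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's own order."""
    n = len(a) + len(b)
    for a_slots in combinations(range(n), len(a)):
        a_iter, b_iter = iter(a), iter(b)
        yield [next(a_iter) if i in a_slots else next(b_iter)
               for i in range(n)]


def run(schedule):
    """Execute one schedule from k=0, j=0 and return what TXN1 read."""
    db = {"k": 0, "j": 0}
    reads = []
    for op, key in schedule:
        if op == "GET":
            reads.append(db[key])
        else:  # ADD(key, 1)
            db[key] += 1
    return tuple(reads)


SERIAL_RESULTS = {(0, 0), (1, 1)}  # what TXN1 returns in the two serial orders

outcomes = {}
for schedule in interleavings(TX1, TX2):
    outcomes.setdefault(run(schedule), []).append(schedule)

anomalies = sorted(r for r in outcomes if r not in SERIAL_RESULTS)
print(sorted(outcomes))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(anomalies)         # [(0, 1), (1, 0)]
```

Of the six possible interleavings, four produce a result that some serial order could also produce, so a database is free to allow them. The remaining two, including the (1,0) case on the slide, match no serial order and must be prevented.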
  14. Benefits of serializability
    •  Do not have to reason about interleavings
    •  Express invariants in one place: the code
  15. Outline: Consistency as in ACID; Consistency models (up next); Consistency as in CAP
  16. Eventual consistency: key/value stores
    •  Bigtable
    •  Dynamo

  17. Eventual consistency
    If no new updates are made to a key, eventually all accesses will return the last updated value.
  18. Eventual consistency
    If no new updates are made to a key, eventually all accesses will return the same value (rather than “the last updated value”).
    (What is last, really?) (And when do we stop writing?)
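The question "what is last, really?" can be made concrete with a toy model. The sketch below is not from the talk: it assumes a last-writer-wins rule that breaks timestamp ties by replica name, which is only one of several ways a store might resolve conflicts, and the `Replica` class and its methods are hypothetical.

```python
from dataclasses import dataclass


def newer(a, b):
    """Pick the version with the larger (timestamp, replica name) pair."""
    return a if a[:2] >= b[:2] else b


@dataclass
class Replica:
    """One copy of a single key, stored as (timestamp, writer, value)."""
    name: str
    version: tuple = (0, "", None)

    def write(self, timestamp, value):
        # Local write; the timestamp comes from this replica's own clock.
        self.version = newer((timestamp, self.name, value), self.version)

    def read(self):
        return self.version[2]

    def merge(self, other):
        # Assumed conflict rule: last writer wins, ties broken by name.
        self.version = newer(other.version, self.version)


a, b = Replica("A"), Replica("B")
a.write(10, "red")    # two clients write to different replicas
b.write(10, "blue")   # with the same clock reading
before = (a.read(), b.read())

a.merge(b)            # replicas exchange state once writes have stopped
b.merge(a)
after = (a.read(), b.read())

print(before)  # ('red', 'blue'): replicas disagree
print(after)   # ('blue', 'blue'): replicas agree
```

Both replicas end up returning the same value, which is all that eventual consistency promises here. Whether "blue" was really the last write is decided by the tie-break rule, not by anything the clients observed.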
  19. Strict consistency
    •  Reads and writes appear to have executed in a total order that matches time
    •  Single processor semantics
    •  Linearizability
  20. Different Consistency Models (from stronger to weaker)
    •  Strict consistency
    •  Sequential consistency
    •  Causal consistency
    •  PRAM consistency
    •  Read-your-writes consistency
    •  Eventual consistency
  21. Outline: Consistency as in ACID; Consistency models; Consistency as in CAP (up next)
  22. CAP theorem
    •  Brewer’s PODC talk in 2000: Consistency, Availability, Partition-tolerance: choose two
        –  Partition-tolerance is a failure model
        –  Choice: can you process reads and writes during a partition or not?
    •  FLP result in 1985: Impossibility of Distributed Consensus with One Faulty Process
        –  Asynchronous model; cannot tell the difference between message delay and failure
  23. What does this mean? Is it impossible to run a correct distributed database?
  24. NP-hard

  25. What does CAP mean?
    It is impossible to make progress and get the right answer 100% of the time if we can’t rely on synchronous messaging.
    We can make progress and get the right answer 100% of the time if partitions heal (we know the upper bound on message delays).
    We can still play Candy Crush.
  26. CAP: Consistency vs. performance
    Consistency requires communication and blocking. How do we reduce these costs while producing a correct ordering of reads and writes and handling failures?
  27. Spanner/F1
    “We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.”
    Corbett, James C., Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, Jeffrey John Furman, Sanjay Ghemawat et al. "Spanner: Google’s globally distributed database." ACM Transactions on Computer Systems (TOCS), 2013.
  28. Outline: Consistency as in ACID; Consistency models; Consistency as in CAP
  29. Takeaways
    •  Use well-tested, long-lived databases with SERIALIZABLE until you have a performance problem
    •  Be aware of what is changing when you move between systems with different consistency models
    •  Consciously decide what trade-offs to make
  30. Thanks!
    The Stata Center via emax: http://hip.cat/emax/
    narula@gmail.com
    http://nehanaru.la
    @neha