Slide 1

Slide 1 text

Don’t Give Up on Serializability Just Yet Neha Narula

Slide 2

Slide 2 text

Don’t Give Up on Serializability Just Yet
Neha Narula, MIT CSAIL
GOTO Chicago, May 2015
A journey into serializable systems

Slide 3

Slide 3 text

@neha
•  PhD candidate at MIT
•  Formerly at Google
•  Research in fast transactions for multi-core databases and distributed systems

Slide 4

Slide 4 text

“However, the most important person in my gang will be a systems programmer. A person who can debug a device driver or a distributed system is a person who can be trusted in a Hobbesian nightmare of breathtaking scope; a systems programmer has seen the terrors of the world and understood the intrinsic horror of existence.”

Slide 5

Slide 5 text

A journey into serializable systems

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

1M messages/sec 1/5 of all page views in the US 1M messages/sec from mobile devices

Slide 8

Slide 8 text

Databases are difficult to scale
Application servers are stateless; add more for more traffic
Database is stateful

Slide 9

Slide 9 text

Distributed databases
Partition data on multiple servers for more performance

Slide 10

Slide 10 text

Example partitioned database
Diagram: webservers in front of databases that each hold one widget_id range of the widgets table (0-99, 100-199, 200-299). One database is marked "?".
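The routing question in the diagram can be sketched as a range lookup. This is a minimal illustration, assuming three hypothetical shard names (`db-a`, `db-b`, `db-c`); only the widget_id ranges come from the slide.

```python
from bisect import bisect_right

# Lower bound of each widget_id range on the slide: 0-99, 100-199, 200-299.
RANGE_STARTS = [0, 100, 200]
# Hypothetical server names, one per range (not from the talk).
SHARDS = ["db-a", "db-b", "db-c"]
MAX_WIDGET_ID = 299


def shard_for(widget_id: int) -> str:
    """Return the database that owns this widget_id."""
    if not 0 <= widget_id <= MAX_WIDGET_ID:
        raise KeyError(f"widget_id {widget_id} is outside every partition")
    # bisect_right finds the first range start greater than widget_id;
    # the owning range is the one just before it.
    return SHARDS[bisect_right(RANGE_STARTS, widget_id) - 1]
```

A query for a single widget goes to one server; a transaction touching widgets in different ranges has to coordinate across servers, which is where the cost discussed later in the talk comes from.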

Slide 11

Slide 11 text

2007
•  MapReduce
•  Google File System
•  Bigtable

Slide 12

Slide 12 text

Pros/Cons
Pros:
•  In-memory
•  HIGHLY scalable
•  Transparently fault tolerant
•  Geo replication
Cons:
•  No schema
•  Require complex key/row/document design
•  No query language
•  No indexes
•  No transactions
•  No guarantees

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

mysql> BEGIN TRANSACTION
UPDATE …
COMMIT

Slide 16

Slide 16 text

Problem with dropping transactions
•  Difficult to reason about concurrent interleavings
•  Might result in incorrect, unrecoverable state

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

“The hacker discovered that multiple simultaneous withdrawals are processed essentially at the same time and that the system's software doesn't check quickly enough for a negative balance”
http://arstechnica.com/security/2014/03/yet-another-exchange-hacked-poloniex-loses-around-50000-in-bitcoin/
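The failure described in the quote can be replayed deterministically. The sketch below is an illustration only, not the exchange's code: the `Account` class and function names are hypothetical. It walks through the bad interleaving by hand (both requests check the balance before either writes), then shows a version where the check and the write happen as one atomic step.

```python
import threading


class Account:
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        """Safe version: the balance check and the update are one atomic step."""
        with self._lock:
            if self.balance < amount:
                return False
            self.balance -= amount
            return True


def interleaved_withdrawals(account, amount):
    """Unsafe version, replayed by hand: both requests read, then both write.

    Returns the total amount paid out.
    """
    seen_1 = account.balance          # request 1 checks the balance
    seen_2 = account.balance          # request 2 checks the same, now stale, value
    paid = 0
    if seen_1 >= amount:
        account.balance = seen_1 - amount
        paid += amount
    if seen_2 >= amount:
        account.balance = seen_2 - amount   # overwrites request 1's update
        paid += amount
    return paid
```

With a balance of 100 and two withdrawals of 100, the interleaved version pays out 200, while the atomic version accepts the first withdrawal and rejects the second.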

Slide 19

Slide 19 text

Consistency guarantees help us reason about our code and avoid subtle bugs

Slide 20

Slide 20 text

Consistency
A very misused word in systems!
•  C as in ACID
•  C as in CAP
•  C as in sequential, causal, eventual, strict consistency

Slide 21

Slide 21 text

ACID Transactions
Atomic:      Whole thing happens or not
Consistent:  Application-defined correctness
Isolated:    Other transactions do not interfere
Durable:     Can recover correctly from a crash
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
...
COMMIT
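As a small illustration of the "Atomic" and "Consistent" rows, the sketch below uses Python's built-in `sqlite3` module. The `accounts` table, the names in it, and the `transfer` helper are assumptions made for the example; they are not from the talk. A CHECK constraint stands in for application-defined correctness, and a failed transfer leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"   # the invariant lives in the schema
)
conn.execute("INSERT INTO accounts VALUES ('alice', 50), ('bob', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction; return False if the invariant would break."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

The credit to `dst` runs first on purpose: when the debit then violates the constraint, the rollback has to undo a statement that already succeeded, which is the "whole thing happens or not" property.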

Slide 22

Slide 22 text

What is Serializability?
Serializability != Serial

Slide 23

Slide 23 text

What is Serializability?
The result of executing a set of transactions is the same as if those transactions had executed one at a time, in some serial order.
If each transaction preserves correctness, the DB will be in a correct state.
We can pretend like there’s no concurrency!

Slide 24

Slide 24 text

Database transactions should be serializable
TXN1(k, j Key) (Value, Value) {
    a := GET(k)
    b := GET(j)
    return a, b
}
TXN2(k, j Key) {
    ADD(k,1)
    ADD(j,1)
}
Initially: k=0, j=0
Serial orders over time: TXN1 then TXN2, or TXN2 then TXN1
To the programmer: valid return values for TXN1 are (0,0) or (1,1)
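The claim about valid return values can be checked by brute force. The sketch below is an illustration written for this transcript, not code from the talk: it models TXN1 and TXN2 as lists of steps, runs every interleaving starting from k=0, j=0, and compares the results with the two serial orders.

```python
def interleavings(a, b):
    """Yield every merge of step lists a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute a schedule against k=0, j=0 and return what TXN1 read as (a, b)."""
    db = {"k": 0, "j": 0}
    reads = {}
    for op, key in schedule:
        if op == "GET":
            reads[key] = db[key]
        else:  # "ADD": increment by 1, as in TXN2
            db[key] += 1
    return reads["k"], reads["j"]


TXN1 = [("GET", "k"), ("GET", "j")]
TXN2 = [("ADD", "k"), ("ADD", "j")]

# Results of the two serial orders: TXN1 then TXN2, and TXN2 then TXN1.
serial_outcomes = {run(TXN1 + TXN2), run(TXN2 + TXN1)}
# Results of every possible interleaving of the four steps.
all_outcomes = {run(s) for s in interleavings(TXN1, TXN2)}
# Results that no serial order could have produced.
anomalies = all_outcomes - serial_outcomes
```

Without concurrency control, TXN1 can also observe (0,1) or (1,0), a state in which only one of TXN2's two increments appears to have happened. A serializable database rules those out.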

Slide 25

Slide 25 text

Benefits of Serializability
•  Do not have to reason about interleavings
•  Do not have to express invariants separately from the code!

Slide 26

Slide 26 text

Serializability Costs
•  On a multi-core database, serialization and cache line transfers
•  On a distributed database, serialization and network calls
Concurrency control: Locking and coordination

Slide 27

Slide 27 text

Eventual consistency If no new updates are made to the object, eventually all accesses will return the last updated value.

Slide 28

Slide 28 text

Eventual consistency
If no new updates are made to the object, eventually all accesses will return the same value (revised from “the last updated value”).
(What is last, really?)
(And when do we stop writing?)
(And what about multi-key transactions?)

Slide 29

Slide 29 text

Sequential consistency: cache coherence
Diagram: processors P1, P2, P3 sharing RAM

Slide 30

Slide 30 text

Timeline 1 (time runs left to right):
P1:  W(x)a
P2:          W(x)b
P3:                  R(x)a    R(x)b
Timeline 2 (time runs left to right):
P1:  W(x)a
P2:                  W(x)b
P3:          R(x)a            R(x)b

Slide 31

Slide 31 text

Timeline 1 (time runs left to right):
P1:  W(x)a
P2:          W(x)b
P3:                  R(x)b    R(x)a
Timeline 2 (time runs left to right):
P1:                  W(x)a
P2:  W(x)b
P3:          R(x)b            R(x)a

Slide 32

Slide 32 text

External Consistency
Everything that sequential consistency has
Except results actually match time.
An external observer

Slide 33

Slide 33 text

Not Externally Consistent (time runs left to right):
P1:  W(x)a
P2:          W(x)b
P3:                  R(x)b    R(x)a
P3: “The value of x is b! Then I read x=a?”
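The difference between the two models can be made concrete with a brute-force checker for a single register. This is a sketch under assumptions: the `Op` record, the function names, and the start/end timestamps are invented for illustration (the slides show only relative positions), and "external consistency" is modeled as sequential consistency plus respect for real-time order.

```python
from itertools import permutations
from typing import NamedTuple


class Op(NamedTuple):
    proc: str    # issuing process, e.g. "P1"
    kind: str    # "W" for write, "R" for read
    value: str   # value written, or value the read returned
    start: int   # assumed wall-clock start time
    end: int     # assumed wall-clock end time


def _legal(order):
    """Every read must return the most recent write before it in this order."""
    current = None
    for op in order:
        if op.kind == "W":
            current = op.value
        elif op.value != current:
            return False
    return True


def _keeps_program_order(order):
    """Operations of one process must appear in the order it issued them."""
    last_start = {}
    for op in order:
        if last_start.get(op.proc, -1) > op.start:
            return False
        last_start[op.proc] = op.start
    return True


def _keeps_real_time(order):
    """If one operation finished before another started, it must come first."""
    for i, later in enumerate(order):
        for earlier in order[:i]:
            if later.end < earlier.start:
                return False
    return True


def sequentially_consistent(history):
    return any(_legal(o) and _keeps_program_order(o)
               for o in permutations(history))


def externally_consistent(history):
    return any(_legal(o) and _keeps_program_order(o) and _keeps_real_time(o)
               for o in permutations(history))
```

For the history on this slide, a legal order exists if real time is ignored (W(x)b, R(x)b, W(x)a, R(x)a), so it is sequentially consistent. Once the order must also match the clock, the only candidate is W(x)a, W(x)b, R(x)b, R(x)a, in which the final read cannot return a. The search is exponential and only suitable for histories of a handful of operations.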

Slide 34

Slide 34 text

CAP Theorem
•  Brewer’s PODC talk: “Consistency, Availability, Partition-tolerance: choose two” in 2000
    –  Partition-tolerance is a failure model
    –  Choice: can you process reads and writes during a partition or not?
•  FLP result – “Impossibility of Distributed Consensus with One Faulty Process” in 1985
    –  Asynchronous model; cannot tell the difference between message delay and failure

Slide 35

Slide 35 text

What does this mean? It’s impossible to decide anything on the internet?

Slide 36

Slide 36 text

NP-hard

Slide 37

Slide 37 text

What does CAP mean?
It’s impossible to decide everything on the internet 100% of the time if we can’t rely on synchronous messaging
We can decide everything 100% of the time if partitions heal (we know the upper bound on message delays)
We can still play Candy Crush

Slide 38

Slide 38 text

CAP
Consistency vs. Performance
Consistency (like serializability) requires communication and blocking
How do we reduce these costs while:
•  Producing a correct ordering of reads and writes and
•  Handling failures and (eventually) making progress?

Slide 39

Slide 39 text

Improving Serializability Performance
Technique                           Systems
Atomic clocks to bound time skew    Spanner
Transaction chopping                Lynx, ROCOCO
Commutative locking                 Escrow transactions, abstract data types, Doppel
Deterministic ordering              Granola, Calvin

Slide 40

Slide 40 text

Goal: parallel performance
•  Different concurrency control schemes for popular, contended data
•  Commutative locking
•  Abstract datatypes
•  Per-core (or per-server) data and constraints

Slide 41

Slide 41 text

Operation Model
Developers write transactions as stored procedures which are composed of operations on keys and values:
Traditional key/value operations:
    value GET(k)
    void PUT(k,v)
Operations on numeric values which modify the existing value:
    void INCR(k,n)
    void MAX(k,n)
    void MULT(k,n)
Ordered PUT, insert to an ordered list, user-defined functions:
    void OPUT(k,v,o)
    void TOPK_INSERT(k,v,o)
    void UDF(k,v,a)
Annotations: Replicate for reads; Save last write; Replicate for commutative operations; Log operations
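The idea of replicating a contended key for commutative operations can be sketched as per-core slices that are merged later. This is an illustration under assumptions, not the implementation of Doppel or any other system named in the talk: the `SplitValue` class and its methods are hypothetical, and the merge function must be associative and commutative for the result to be independent of which core applied which update.

```python
import operator


class SplitValue:
    """A contended value split into per-core slices for one commutative operation."""

    def __init__(self, value, n_cores, op, identity):
        self.value = value            # the shared, reconciled value
        self.op = op                  # merge function, e.g. operator.add or max
        self.identity = identity      # neutral element for op
        self.slices = [identity] * n_cores

    def apply(self, core, n):
        # Each core touches only its own slice, so no coordination is needed here.
        self.slices[core] = self.op(self.slices[core], n)

    def reconcile(self):
        # Fold every slice into the shared value and reset the slices.
        for i, local in enumerate(self.slices):
            self.value = self.op(self.value, local)
            self.slices[i] = self.identity
        return self.value


# INCR(k, n): addition with identity 0.
counter = SplitValue(10, n_cores=4, op=operator.add, identity=0)
# MAX(k, n): max with identity negative infinity.
high_score = SplitValue(7, n_cores=2, op=max, identity=float("-inf"))
```

Reads of the value have to wait for `reconcile`, so a split like this pays off only while a key is mostly being updated rather than read.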

Slide 42

Slide 42 text

Spanner/F1 “We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.”

Slide 43

Slide 43 text

Takeaways
•  Use well-tested, long-lived database systems
•  Use SERIALIZABLE until it becomes a performance problem
•  Think about what is changing when you move to systems with different models

Slide 44

Slide 44 text

Thanks!
The Stata Center via emax: http://hip.cat/emax/
http://nehanaru.la
@neha

Slide 45

Slide 45 text

Questions? Please remember to evaluate via the GOTO Guide App