Slide 1

Slide 1 text

COORDINATION AND THE ART OF SCALING Peter Bailis • UC Berkeley • @pbailis CloudantCON 2014

Slide 2

Slide 2 text

A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable. —Leslie Lamport 2013 Turing Award Winner

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

THE NETWORK INCURS LATENCY

Slide 7

Slide 7 text

THE NETWORK INCURS LATENCY THE NETWORK IS UNRELIABLE

Slide 8

Slide 8 text

THE NETWORK INCURS LATENCY THE NETWORK IS UNRELIABLE SO HOW CAN WE BUILD ROBUST AND SCALABLE DISTRIBUTED SYSTEMS?

Slide 9

Slide 9 text

THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE

Slide 10

Slide 10 text

THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE (SERIALIZABILITY/LINEARIZABILITY)

Slide 11

Slide 11 text

THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE Impose a total order on events in the system

Slide 12

Slide 12 text

THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE TIME Impose a total order on events in the system

Slide 13

Slide 13 text

THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE TIME Impose a total order on events in the system Ask Amanda: “How’s the weather on the farm?” Amanda replies: “Let me check with the tractor.” Tractor replies: current temperature is 75°F. Amanda replies: “It’s a beautiful day!”

Slide 14

Slide 14 text

THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE Impose a total order on events in the system TIME Illusion created by a partially ordered protocol

Slide 15

Slide 15 text

THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE TIME Impose a total order on events in the system Illusion created by a partially ordered protocol Remarkably powerful abstraction core to ACID transactions

Slide 16

Slide 16 text

THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE TIME Impose a total order on events in the system Illusion created by a partially ordered protocol Remarkably powerful abstraction This is the way you’d want to program distributed systems, but… core to ACID transactions

Slide 17

Slide 17 text

THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE TIME Impose a total order on events in the system Illusion created by a partially ordered protocol COST:

Slide 18

Slide 18 text

THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE TIME Impose a total order on events in the system Illusion created by a partially ordered protocol COST: BLOCKING COMMUNICATION COORDINATION

Slide 19

Slide 19 text

COORDINATION (BLOCKING COMMUNICATION) Can I make progress without waiting?

Slide 20

Slide 20 text

COORDINATION (BLOCKING COMMUNICATION) Can I make progress without waiting? UNDER SINGLE SYSTEM IMAGE, MUST WAIT!

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

COORDINATION REQUIRED? Throughput: 1/delay

Slide 23

Slide 23 text

COORDINATION REQUIRED? | COORDINATION FREE?
Throughput: 1/delay | Limited by physical resources

Slide 24

Slide 24 text

SERIALIZABLE TRANSACTIONS ON EC2 IN-MEMORY LOCKING “Coordination-Avoiding Database Systems” arXiv:1402.2237

Slide 25

Slide 25 text

SERIALIZABLE TRANSACTIONS ON EC2, IN-MEMORY LOCKING [Plot: throughput (txns/s, LOG SCALE!) vs. number of items per transaction (1 to 7)] “Coordination-Avoiding Database Systems” arXiv:1402.2237

Slide 26

Slide 26 text

SERIALIZABLE TRANSACTIONS ON EC2, IN-MEMORY LOCKING [Plot: throughput (txns/s) vs. number of items per transaction (1 to 7); series: COORDINATED] “Coordination-Avoiding Database Systems” arXiv:1402.2237

Slide 27

Slide 27 text

SERIALIZABLE TRANSACTIONS ON EC2, IN-MEMORY LOCKING [Plot: throughput (txns/s) vs. number of items per transaction (1 to 7); series: COORDINATED, COORDINATION-FREE] “Coordination-Avoiding Database Systems” arXiv:1402.2237

Slide 28

Slide 28 text

SERIALIZABLE TRANSACTIONS ON EC2, IN-MEMORY LOCKING
SINGLE SERVER: 10x faster (multi-core parallelism)
MULTI-SERVER: ~1000x faster
[Plot: throughput (txns/s) vs. number of items per transaction (1 to 7); series: COORDINATED, COORDINATION-FREE]
“Coordination-Avoiding Database Systems” arXiv:1402.2237

Slide 29

Slide 29 text

do not support SSI/serializability HANA

Slide 30

Slide 30 text

8/18 databases surveyed did not support SSI/serializability
Database | Serializability supported?
Actian Ingres | YES
Aerospike | NO
Persistit | NO
Clustrix | NO
Greenplum | YES
IBM DB2 | YES
IBM Informix | YES
MySQL | YES
MemSQL | NO
MS SQL Server | YES
NuoDB | NO
Oracle 11G | NO
Oracle BDB | YES
Oracle BDB JE | YES
Postgres 9.2.2 | YES
SAP HANA | NO
ScaleDB | NO
VoltDB | YES
“Highly Available Transactions: Virtues and Limitations” VLDB 2014

Slide 31

Slide 31 text

8/18 databases surveyed did not support SSI/serializability
15/18 used weaker models by default
Database | Serializability supported?
Actian Ingres | YES
Aerospike | NO
Persistit | NO
Clustrix | NO
Greenplum | YES
IBM DB2 | YES
IBM Informix | YES
MySQL | YES
MemSQL | NO
MS SQL Server | YES
NuoDB | NO
Oracle 11G | NO
Oracle BDB | YES
Oracle BDB JE | YES
Postgres 9.2.2 | YES
SAP HANA | NO
ScaleDB | NO
VoltDB | YES
“Highly Available Transactions: Virtues and Limitations” VLDB 2014

Slide 33

Slide 33 text

COORDINATION REQUIRED? | COORDINATION FREE?
Throughput: 1/delay | Limited by physical resources

Slide 34

Slide 34 text

COORDINATION REQUIRED? | COORDINATION FREE?
Throughput: 1/delay | Limited by physical resources
Latency: 1+ RTT | Can return immediately

Slide 35

Slide 35 text

COORDINATION REQUIRED? | COORDINATION FREE?
Throughput: 1/delay | Limited by physical resources
Latency: 1+ RTT | Can return immediately
SINGLE DC: 0.5 ms on public cloud, 5 µs on InfiniBand

Slide 36

Slide 36 text

COORDINATION REQUIRED? | COORDINATION FREE?
Throughput: 1/delay | Limited by physical resources
Latency: 1+ RTT | Can return immediately
SINGLE DC: 0.5 ms on public cloud, 5 µs on InfiniBand
MULTI-DC?

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

133.7+ ms RTT

Slide 40

Slide 40 text

133.7+ ms RTT

Slide 41

Slide 41 text

133.7+ ms RTT

Slide 42

Slide 42 text

133.7+ ms RTT 85.1+ ms RTT
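
As a rough worked example of the "Throughput: 1/delay" bound, the sketch below plugs in the round-trip times quoted on these slides. It assumes each coordinated operation on a contended item waits for exactly one round trip, which is a simplification; the function name `max_coordinated_throughput` and the dictionary labels are introduced here for illustration only.

```python
def max_coordinated_throughput(rtt_seconds: float) -> float:
    """Upper bound on serial operations per second if each one waits one RTT."""
    return 1.0 / rtt_seconds

# Round-trip times as quoted on the slides (labels are illustrative).
rtts = {
    "single DC, InfiniBand (5 us)": 5e-6,
    "single DC, public cloud (0.5 ms)": 0.5e-3,
    "multi-DC (85.1 ms)": 85.1e-3,
    "multi-DC (133.7 ms)": 133.7e-3,
}

for name, rtt in rtts.items():
    print(f"{name}: at most {max_coordinated_throughput(rtt):,.1f} ops/s")
```

Under this assumption, a 0.5 ms round trip caps a contended item at about 2,000 operations per second, while a 133.7 ms round trip caps it at fewer than 8.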

Slide 43

Slide 43 text

THOSE LIGHT CONES_

Slide 44

Slide 44 text

COORDINATION REQUIRED? | COORDINATION FREE?
Throughput: 1/delay | Limited by physical resources
Latency: 1+ RTT | Can return immediately
Unavailable during failures | Progress despite failures

Slide 45

Slide 45 text

COORDINATION-FREE EXECUTION IS KEY TO INDEFINITE SCALABILITY

Slide 46

Slide 46 text

COORDINATION IS THE BANE OF SCALABLE SYSTEMS

Slide 47

Slide 47 text

COORDINATION REQUIRED? | COORDINATION FREE?
Throughput: 1/delay | Limited by physical resources
Latency: 1+ RTT | Can return immediately
Unavailable during failures | Progress despite failures
WHEN DO WE HAVE TO COORDINATE?

Slide 48

Slide 48 text

THAT SIMULTANEITY_

Slide 49

Slide 49 text

COORDINATION REQUIRED? | COORDINATION FREE?
Throughput: 1/delay | Limited by physical resources
Latency: 1+ RTT | Can return immediately
Unavailable during failures | Progress despite failures
WHEN DO WE HAVE TO COORDINATE?

Slide 50

Slide 50 text

COORDINATION REQUIRED? | COORDINATION FREE?
Throughput: 1/delay | Limited by physical resources
Latency: 1+ RTT | Can return immediately
Unavailable during failures | Progress despite failures
WHEN DO WE HAVE TO COORDINATE?
CAP Theorem (for recency guarantees)
FLP result (for consensus; e.g., Paxos)
Davidson result (for SSI)

Slide 51

Slide 51 text

COORDINATION REQUIRED? | COORDINATION FREE?
Throughput: 1/delay | Limited by physical resources
Latency: 1+ RTT | Can return immediately
Unavailable during failures | Progress despite failures
WHEN DO WE HAVE TO COORDINATE?
CAP Theorem (for recency guarantees)
FLP result (for consensus; e.g., Paxos)
Davidson result (for SSI)
BUT DO APPS ALWAYS HAVE TO COORDINATE?

Slide 52

Slide 52 text

No content

Slide 53

Slide 53 text

TICKET 241 TICKET 242 TICKET 243 TICKET 244

Slide 54

Slide 54 text

TICKET 241 TICKET 242 TICKET 243 TICKET 244

Slide 55

Slide 55 text

No content

Slide 56

Slide 56 text

INVARIANT: TICKET IDs SHOULD BE SEQUENTIAL

Slide 57

Slide 57 text

INVARIANT: TICKET IDs SHOULD BE SEQUENTIAL TICKET 241 TICKET 242 TICKET 243

Slide 58

Slide 58 text

INVARIANT: TICKET IDs SHOULD BE SEQUENTIAL TICKET 241 TICKET 241 COORDINATION REQUIRED!

Slide 59

Slide 59 text

INVARIANT: TICKET IDs SHOULD BE UNIQUE TICKET 241 TICKET 242 PRE-PARTITION ID SPACE (1,4,…) (2,5,…) (3,6,…)
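
The pre-partitioned ID space can be sketched as follows: each of three servers hands out IDs from its own residue class, so uniqueness holds without any communication. This is a minimal illustration, not code from the talk; `partitioned_ids` and its parameters are hypothetical names.

```python
from itertools import count, islice

def partitioned_ids(server_index: int, num_servers: int):
    """Yield server_index, server_index + num_servers, ... (1-based index)."""
    return count(start=server_index, step=num_servers)

# Three servers, each drawing four IDs with no coordination between them.
servers = [partitioned_ids(i, 3) for i in (1, 2, 3)]
issued = [list(islice(gen, 4)) for gen in servers]
print(issued)  # [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]

# The ID streams never overlap, so the uniqueness invariant holds.
all_ids = [i for ids in issued for i in ids]
assert len(all_ids) == len(set(all_ids))
```

The IDs are unique but not sequential in issue order, which is why the weaker uniqueness invariant can be met this way while the sequential invariant cannot.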

Slide 60

Slide 60 text

INVARIANT: TICKET IDs SHOULD BE NON-NEGATIVE TICKET 241 TICKET 242 COORDINATION-FREE!

Slide 61

Slide 61 text

INVARIANT: TICKET IDs SHOULD BE NON-NEGATIVE: COORDINATION-FREE!
INVARIANT: TICKET IDs SHOULD BE UNIQUE: PRE-PARTITION ID SPACE
INVARIANT: TICKET IDs SHOULD BE SEQUENTIAL: COORDINATION REQUIRED!

Slide 62

Slide 62 text

INVARIANT: TICKET IDs SHOULD BE NON-NEGATIVE: COORDINATION-FREE!
INVARIANT: TICKET IDs SHOULD BE UNIQUE: PRE-PARTITION ID SPACE
INVARIANT: TICKET IDs SHOULD BE SEQUENTIAL: COORDINATION REQUIRED!
WHEN DO WE HAVE TO COORDINATE? DEPENDS ON APPLICATION
SAFE ANSWER: ALWAYS COORDINATE

Slide 63

Slide 63 text

WHEN DO WE HAVE TO COORDINATE? SAFE ANSWER: ALWAYS COORDINATE

Slide 64

Slide 64 text

WHEN DO WE HAVE TO COORDINATE? SAFE ANSWER: ALWAYS COORDINATE BETTER ANSWER: (YOUR TAX DOLLARS AT WORK)

Slide 65

Slide 65 text

WHEN DO WE HAVE TO COORDINATE?
SAFE ANSWER: ALWAYS COORDINATE
BETTER ANSWER: COORDINATION AVOIDANCE
COORDINATE ONLY WHEN STRICTLY NECESSARY
MOVE COMMUNICATION TO BACKGROUND
“Coordination-Avoiding Database Systems” arXiv:1402.2237

Slide 66

Slide 66 text

No content

Slide 67

Slide 67 text

SAFETY: correctness always guaranteed
LIVENESS: database states agree (converge)

Slide 68

Slide 68 text

Invariant Confluence is necessary and sufficient for ensuring safety, convergence, availability, and coordination-free execution.
Invariant Confluence holds? A safe, c-free execution strategy exists.
Invariant Confluence fails? No safe, c-free mechanism exists.
“Coordination-Avoiding Database Systems” arXiv:1402.2237
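
One way to build intuition for the invariant confluence test is to check whether two replica states that are each valid on their own stay valid once merged. The sketch below is not the paper's formal definition. It assumes set union as the merge operator and uses the ticket example from earlier slides; all function names are hypothetical. A single failing pair shows that an invariant is not confluent, but a passing pair does not prove that it is.

```python
def merge(a: frozenset, b: frozenset) -> frozenset:
    """Merge two replica states by set union (an assumed merge operator)."""
    return a | b

def ids_unique(state) -> bool:
    ids = [ticket_id for ticket_id, _ in state]
    return len(ids) == len(set(ids))

def ids_non_negative(state) -> bool:
    return all(ticket_id >= 0 for ticket_id, _ in state)

def merge_preserves(invariant, a, b) -> bool:
    """True if two individually valid states merge into a valid state."""
    assert invariant(a) and invariant(b), "each replica must be locally valid"
    return invariant(merge(a, b))

# Two replicas each insert a ticket with a caller-specified ID of 241.
replica_1 = frozenset({(241, "alice")})
replica_2 = frozenset({(241, "bob")})

print(merge_preserves(ids_non_negative, replica_1, replica_2))  # True
print(merge_preserves(ids_unique, replica_1, replica_2))        # False
```

Both replicas are valid in isolation, yet the merged state holds two tickets numbered 241, so inserting caller-specified unique IDs needs coordination while the non-negativity invariant does not.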

Slide 69

Slide 69 text

Typical DB operations and invariants (SQL)
Invariant | Operation | C.F.
Equality, Inequality | Any | ???
Generate unique ID | Any | ???
Specify unique ID | Insert | ???
> | Increment | ???
> | Decrement | ???
< | Decrement | ???
< | Increment | ???
Foreign Key | Insert | ???
Foreign Key | Delete | ???
Secondary Indexing | Any | ???
Materialized Views | Any | ???
AUTO_INCREMENT | Insert | ???
“Coordination-Avoiding Database Systems” arXiv:1402.2237

Slide 70

Slide 70 text

Typical DB operations and invariants (SQL)
Invariant | Operation | C.F.
Equality, Inequality | Any | Y
Generate unique ID | Any | Y
Specify unique ID | Insert | N
> | Increment | Y
> | Decrement | N
< | Decrement | Y
< | Increment | N
Foreign Key | Insert | Y
Foreign Key | Delete | Y*
Secondary Indexing | Any | Y
Materialized Views | Any | Y
AUTO_INCREMENT | Insert | N
“Coordination-Avoiding Database Systems” arXiv:1402.2237
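
The two ">" rows of the table can be illustrated with a counter that must stay above zero. In this sketch, replicas apply their deltas locally and the merged value is the starting value plus every replica's deltas. That merge rule, the starting value of 100, and the helper names are assumptions made for illustration.

```python
def merged_value(start: int, deltas_per_replica: list[list[int]]) -> int:
    """Apply every replica's local deltas to the shared starting value."""
    return start + sum(sum(deltas) for deltas in deltas_per_replica)

def locally_valid(start: int, deltas: list[int], floor: int = 0) -> bool:
    """Check the invariant value > floor after each local step on one replica."""
    value = start
    for d in deltas:
        value += d
        if value <= floor:
            return False
    return True

start = 100
inc = [[+60], [+60]]  # two replicas, one increment each
dec = [[-60], [-60]]  # two replicas, one decrement each

# Every replica satisfies "value > 0" on its own (100 - 60 = 40 > 0).
assert all(locally_valid(start, d) for d in inc + dec)

print(merged_value(start, inc) > 0)  # True:  100 + 120 = 220
print(merged_value(start, dec) > 0)  # False: 100 - 120 = -20
```

Concurrent increments can never push the merged value below the floor, while concurrent decrements that each look safe locally can, which matches the Y and N entries for the ">" invariant.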

Slide 71

Slide 71 text

Test fails? Cannot avoid coordination
Typical DB operations and invariants (SQL)
Invariant | Operation | C.F.
Equality, Inequality | Any | Y
Generate unique ID | Any | Y
Specify unique ID | Insert | N
> | Increment | Y
> | Decrement | N
< | Decrement | Y
< | Increment | N
Foreign Key | Insert | Y
Foreign Key | Delete | Y*
Secondary Indexing | Any | Y
Materialized Views | Any | Y
AUTO_INCREMENT | Insert | N
“Coordination-Avoiding Database Systems” arXiv:1402.2237

Slide 72

Slide 72 text

Test fails? Cannot avoid coordination
MANY TRADITIONAL DB APPS OK
Typical DB operations and invariants (SQL)
Invariant | Operation | C.F.
Equality, Inequality | Any | Y
Generate unique ID | Any | Y
Specify unique ID | Insert | N
> | Increment | Y
> | Decrement | N
< | Decrement | Y
< | Increment | N
Foreign Key | Insert | Y
Foreign Key | Delete | Y*
Secondary Indexing | Any | Y
Materialized Views | Any | Y
AUTO_INCREMENT | Insert | N
“Coordination-Avoiding Database Systems” arXiv:1402.2237

Slide 74

Slide 74 text

FOREIGN KEY DEPENDENCIES “TAO: Facebook’s Distributed Data Store for the Social Graph” USENIX ATC 2013

Slide 75

Slide 75 text

FOREIGN KEY DEPENDENCIES “TAO: Facebook’s Distributed Data Store for the Social Graph” USENIX ATC 2013 FRIENDS FRIENDS

Slide 76

Slide 76 text

FOREIGN KEY DEPENDENCIES “TAO: Facebook’s Distributed Data Store for the Social Graph” USENIX ATC 2013 FRIENDS FRIENDS

Slide 77

Slide 77 text

FOREIGN KEY DEPENDENCIES “TAO: Facebook’s Distributed Data Store for the Social Graph” USENIX ATC 2013

Slide 78

Slide 78 text

FOREIGN KEY DEPENDENCIES “TAO: Facebook’s Distributed Data Store for the Social Graph” USENIX ATC 2013 Denormalized Friend List: Fast reads… …multi-entity updates

Slide 79

Slide 79 text

FOREIGN KEY DEPENDENCIES “TAO: Facebook’s Distributed Data Store for the Social Graph” USENIX ATC 2013 Denormalized Friend List: Fast reads… …multi-entity updates

Slide 80

Slide 80 text

FOREIGN KEY DEPENDENCIES “TAO: Facebook’s Distributed Data Store for the Social Graph” USENIX ATC 2013 Denormalized Friend List: Fast reads… …multi-entity updates

Slide 81

Slide 81 text

FOREIGN KEY DEPENDENCIES “TAO: Facebook’s Distributed Data Store for the Social Graph” USENIX ATC 2013 Denormalized Friend List: Fast reads… …multi-entity updates Not cleanly partitionable

Slide 82

Slide 82 text

NEED ATOMIC VISIBILITY FOREIGN KEY DEPENDENCIES “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 83

Slide 83 text

NEED ATOMIC VISIBILITY SEE ALL OF A TXN’S UPDATES, OR NONE OF THEM FOREIGN KEY DEPENDENCIES “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 84

Slide 84 text

NEED ATOMIC VISIBILITY SEE ALL OF A TXN’S UPDATES, OR NONE OF THEM FOREIGN KEY DEPENDENCIES SECONDARY INDEXING MATERIALIZED VIEWS “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 85

Slide 85 text

X=0 Y=0 HOW TO ACHIEVE ATOMIC VISIBILITY “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 86

Slide 86 text

STRAWMAN: LOCKING X=0 Y=0 “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 87

Slide 87 text

STRAWMAN: LOCKING X=0 Y=0 W(X=1) W(Y=1) “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 88

Slide 88 text

STRAWMAN: LOCKING X=0 Y=0 W(X=1) W(Y=1) “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 89

Slide 89 text

STRAWMAN: LOCKING X=1 Y=1 W(X=1) W(Y=1) “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 90

Slide 90 text

STRAWMAN: LOCKING X=1 Y=1 W(X=1) W(Y=1) “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 91

Slide 91 text

STRAWMAN: LOCKING X=1 Y=1 W(X=1) W(Y=1) “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 92

Slide 92 text

STRAWMAN: LOCKING X=1 Y=1 W(X=1) W(Y=1) “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 93

Slide 93 text

STRAWMAN: LOCKING X=1 Y=1 W(X=1) W(Y=1) R(X=1) “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 94

Slide 94 text

STRAWMAN: LOCKING X=1 Y=1 W(X=1) W(Y=1) R(X=1) R(Y=1) “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 95

Slide 95 text

Y=0 STRAWMAN: LOCKING X=1 W(X=1) W(Y=1) “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 96

Slide 96 text

Y=0 STRAWMAN: LOCKING X=1 W(X=1) W(Y=1) R(X=?) R(Y=?) “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 97

Slide 97 text

Y=0 STRAWMAN: LOCKING X=1 W(X=1) W(Y=1) R(X=?) R(Y=?) ATOMIC VISIBILITY COUPLED WITH MUTUAL EXCLUSION “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 98

Slide 98 text

STRAWMAN: LOCKING X=1 W(X=1) W(Y=1) Y=0 R(X=?) R(Y=?) ATOMIC VISIBILITY COUPLED WITH MUTUAL EXCLUSION SLOW unavailable “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 99

Slide 99 text

RAMP TRANSACTIONS: READ ATOMIC MULTI-PARTITION “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 100

Slide 100 text

RAMP TRANSACTIONS: READ ATOMIC MULTI-PARTITION “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 101

Slide 101 text

RAMP TRANSACTIONS DECOUPLE ATOMIC VISIBILITY MUTUAL EXCLUSION “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 102

Slide 102 text

RAMP TRANSACTIONS DECOUPLE ATOMIC VISIBILITY from MUTUAL EXCLUSION “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 103

Slide 103 text

BASIC IDEA W(X=1) W(Y=1) Y=0 R(X=?) R(Y=?) X=1 “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 104

Slide 104 text

BASIC IDEA W(X=1) W(Y=1) Y=0 R(X=?) R(Y=?) X=1 “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 105

Slide 105 text

BASIC IDEA W(X=1) W(Y=1) Y=0 R(X=?) R(Y=?) LET CLIENTS RACE, but HAVE READERS “CLEAN UP” X=1 “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 106

Slide 106 text

BASIC IDEA W(X=1) W(Y=1) Y=0 R(X=?) R(Y=?) LET CLIENTS RACE, but HAVE READERS “CLEAN UP” X=1 LIMITED MULTI-VERSIONING + METADATA “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 107

Slide 107 text

BASIC IDEA LET CLIENTS RACE, but HAVE READERS “CLEAN UP” LIMITED MULTI-VERSIONING + METADATA X=0 Y=0 W(X=1) W(Y=1) “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 108

Slide 108 text

BASIC IDEA LET CLIENTS RACE, but HAVE READERS “CLEAN UP” X=1 LIMITED MULTI-VERSIONING + METADATA X=0 Y=0 W(X=1) W(Y=1) “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 109

Slide 109 text

BASIC IDEA LET CLIENTS RACE, but HAVE READERS “CLEAN UP” X=1 LIMITED MULTI-VERSIONING + METADATA X=0 Y=1 Y=0 W(X=1) W(Y=1) “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 110

Slide 110 text

BASIC IDEA LET CLIENTS RACE, but HAVE READERS “CLEAN UP” X=1 LIMITED MULTI-VERSIONING + METADATA X=0 Y=1 Y=0 W(X=1) W(Y=1) “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 111

Slide 111 text

BASIC IDEA LET CLIENTS RACE, but HAVE READERS “CLEAN UP” X=1 LIMITED MULTI-VERSIONING + METADATA X=0 Y=1 Y=0 W(X=1) W(Y=1) “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 112

Slide 112 text

BASIC IDEA W(X=1) W(Y=1) R(X=?) R(Y=?) LET CLIENTS RACE, but HAVE READERS “CLEAN UP” X=1 [t=124, {Y}] LIMITED MULTI-VERSIONING + METADATA X=0 [t=0, {}] Y=1 [t=124, {X}] Y=0 [t=0, {}] R(X=1) “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 113

Slide 113 text

BASIC IDEA W(X=1) W(Y=1) R(X=?) R(Y=?) LET CLIENTS RACE, but HAVE READERS “CLEAN UP” X=1 [t=124, {Y}] LIMITED MULTI-VERSIONING + METADATA X=0 [t=0, {}] Y=1 [t=124, {X}] Y=0 [t=0, {}] R(Y=0) R(X=1) “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 114

Slide 114 text

BASIC IDEA W(X=1) W(Y=1) R(X=?) R(Y=?) LET CLIENTS RACE, but HAVE READERS “CLEAN UP” X=1 [t=124, {Y}] LIMITED MULTI-VERSIONING + METADATA X=0 [t=0, {}] Y=1 [t=124, {X}] Y=0 [t=0, {}] R(Y=0) R(X=1) “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 115

Slide 115 text

BASIC IDEA W(X=1) W(Y=1) R(X=?) R(Y=?) LET CLIENTS RACE, but HAVE READERS “CLEAN UP” X=1 [t=124, {Y}] LIMITED MULTI-VERSIONING + METADATA X=0 [t=0, {}] Y=1 [t=124, {X}] Y=0 [t=0, {}] R(Y=0) ITEM HIGHEST TS X 124 Y 124 R(X=1) “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

Slide 116

Slide 116 text

BASIC IDEA: LET CLIENTS RACE, but HAVE READERS “CLEAN UP”
LIMITED MULTI-VERSIONING + METADATA
Writes: W(X=1) W(Y=1)
Stored versions: X=1 [t=124, {Y}], X=0 [t=0, {}], Y=1 [t=124, {X}], Y=0 [t=0, {}]
Reads: R(X=?) R(Y=?) return R(X=1), R(Y=0)
ITEM / HIGHEST TS: X 124, Y 124
Repaired read: R(Y=1)
“Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014
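
The reader-side "clean up" shown on this slide can be sketched in a few lines. This is a simplified, single-process illustration built only from the values on the slide (versions tagged with a timestamp and the set of sibling items written by the same transaction). It is not the full RAMP protocol from the paper: the write path, partitioning and garbage collection are omitted, and `store` and `ramp_read` are hypothetical names.

```python
# Versions stored per item: timestamp -> (value, sibling items in the same txn).
store = {
    "X": {0: (0, set()), 124: (1, {"Y"})},
    "Y": {0: (0, set()), 124: (1, {"X"})},
}

def ramp_read(first_round: dict[str, int], store) -> dict[str, int]:
    """first_round maps item -> timestamp of the version the reader first saw."""
    # Highest timestamp each item must have, according to sibling metadata.
    required = dict(first_round)
    for item, ts in first_round.items():
        _, siblings = store[item][ts]
        for sib in siblings:
            if sib in required:
                required[sib] = max(required[sib], ts)
    # Second step: fetch the exact required version for any item that lagged.
    return {item: store[item][ts][0] for item, ts in required.items()}

# The racing reader saw X at t=124 but Y at t=0.
print(ramp_read({"X": 124, "Y": 0}, store))  # {'X': 1, 'Y': 1}
```

Because X's metadata names Y as a sibling at timestamp 124, the reader learns that its Y=0 is stale and fetches Y at timestamp 124, so it sees both of the transaction's writes without taking any lock.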

Slide 117

Slide 117 text

TPC-C: Combine foreign keys with sequence number insert on commit... 500K txns/s

Slide 118

Slide 118 text

New-Order Transactions/s
47,852: Serializable locking; bottlenecks on coordination over network
“Coordination-Avoiding Database Systems” arXiv:1402.2237

Slide 119

Slide 119 text

New-Order Transactions/s (EC2 cr1.8xlarge; here, 8 servers)
47,852: Serializable locking; bottlenecks on coordination over network
632,589: Coordination-avoiding implementation (RAMP with fast ID assignment); bottlenecks on CPU
“Coordination-Avoiding Database Systems” arXiv:1402.2237

Slide 120

Slide 120 text

[Plot: Total Throughput (txn/s), axis 2M to 14M, vs. Number of Servers, axis 0 to 200]

Slide 121

Slide 121 text

[Plot: Total Throughput (txn/s), axis 2M to 14M, vs. Number of Servers, axis 0 to 200] INDUSTRY-STANDARD TRANSACTIONAL WORKLOADS CAN SCALE JUST FINE*

Slide 122

Slide 122 text

MANY INDUSTRY-STANDARD TRANSACTIONAL WORKLOADS CAN SCALE JUST FINE* GIVEN THE RIGHT

Slide 123

Slide 123 text

MANY INDUSTRY-STANDARD TRANSACTIONAL WORKLOADS CAN SCALE JUST FINE* GIVEN THE RIGHT SYSTEM DESIGN, CONCURRENCY PRIMITIVES, ATTENTION TO SCALE

Slide 124

Slide 124 text

MANY INDUSTRY-STANDARD TRANSACTIONAL WORKLOADS CAN SCALE JUST FINE* GIVEN THE RIGHT SYSTEM DESIGN, CONCURRENCY PRIMITIVES, ATTENTION TO SCALE, LEVEL OF COORDINATION

Slide 125

Slide 125 text

THE NETWORK INCURS LATENCY THE NETWORK IS UNRELIABLE SO HOW CAN WE BUILD ROBUST AND SCALABLE DISTRIBUTED SYSTEMS?

Slide 126

Slide 126 text

THE NETWORK INCURS LATENCY THE NETWORK IS UNRELIABLE SO HOW CAN WE BUILD ROBUST AND SCALABLE DISTRIBUTED SYSTEMS? UNDERSTAND COORDINATION

Slide 127

Slide 127 text

COORDINATION AVOIDANCE UNDERSTAND IF/WHEN COORDINATION IS REQUIRED

Slide 128

Slide 128 text

COORDINATION AVOIDANCE: UNDERSTAND IF/WHEN COORDINATION IS REQUIRED
INVARIANT CONFLUENCE (arXiv 2014): necessary and sufficient condition for c-free operation
HIGHLY AVAILABLE TRANSACTIONS (CACM, VLDB 2014): what database isolation levels are coordination-free?
RAMP ATOMIC VISIBILITY (SIGMOD 2014): fast and intuitive multi-put, multi-get, indexing
BLOOM and BLAZES (ICDE 2014): language-level automated coordination analysis
CRDTS and BLOOM^L (SoCC 2013, USENIX ATC 2014): correct-by-design distributed data types
PBS INCONSISTENCY (VLDBJ 2014): how stale is data if we don’t coordinate?

Slide 129

Slide 129 text

Traditional distributed systems designs suffer from coordination bottlenecks
By understanding application requirements, we can avoid coordination
We can build systems that actually scale while providing correct behavior
Thanks!
[email protected]
@pbailis
http://bailis.org/
http://amplab.cs.berkeley.edu/

Slide 130

Slide 130 text

IMAGE/FONT CREDITS
Punk designed by my name is mud from the Noun Project, Creative Commons – Attribution (CC BY 3.0)
Queen designed by Bohdan Burmich from the Noun Project, Creative Commons – Attribution (CC BY 3.0)
Guy Fawkes designed by Anisha Varghese from the Noun Project, Creative Commons – Attribution (CC BY 3.0)
Emperor designed by Simon Child from the Noun Project, Creative Commons – Attribution (CC BY 3.0)
Database designed by Shmidt Sergey from the Noun Project, Creative Commons – Attribution (CC BY 3.0)
List designed by Nicholas Menghini from the Noun Project, Creative Commons – Attribution (CC BY 3.0)
Warehouse designed by Wilson Joseph from the Noun Project, Creative Commons – Attribution (CC BY 3.0)
User designed by JM Waideaswaran from the Noun Project, Creative Commons – Attribution (CC BY 3.0)
Thermostat designed by Michael Senkow from the Noun Project, Creative Commons – Attribution (CC BY 3.0)
Customer Service designed by Bybzee from the Noun Project, Creative Commons – Attribution (CC BY 3.0)
Punk Rocker designed by Simon Child from the Noun Project, Creative Commons – Attribution (CC BY 3.0)
Jackhammer designed by Jamie Dickinson from the Noun Project, Creative Commons – Attribution (CC BY 3.0)
Earth designed by Martin Vanco from the Noun Project, Creative Commons – Attribution (CC BY 3.0)
Smart-Phone designed by Emily Haasch from the Noun Project, Creative Commons – Attribution (CC BY 3.0)
Cloud designed by Piotrek Chuchla from the Noun Project, Creative Commons – Attribution (CC BY 3.0)
Server designed by Jaime Carrion from the Noun Project, Creative Commons – Attribution (CC BY 3.0)
Computer designed by Matthew Hawdon from the Noun Project, Creative Commons – Attribution (CC BY 3.0)
Computer designed by james zamyslianskyj from the Noun Project, Creative Commons – Attribution (CC BY 3.0)
Computer designed by Alyssa Mahlberg from the Noun Project, Creative Commons – Attribution (CC BY 3.0)
Lock designed by dylan voisard from the Noun Project, Creative Commons – Attribution (CC BY 3.0)
COCOGOOSE font by ZetaFonts, COMMON CREATIVE NON COMMERCIAL USE