Coordination and the Art of Scaling

pbailis
June 17, 2014

CloudantCON 2014
17 June 2014
http://www.cloudantcon.com/#schedule

For more information/details/nuance (!):
http://www.bailis.org/blog/
http://www.bailis.org/pubs.html
@pbailis

Transcript

  1. COORDINATION AND THE ART OF SCALING. Peter Bailis • UC Berkeley • @pbailis • CloudantCON 2014

  2. “A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” (Leslie Lamport, 2013 Turing Award winner)
  3. None
  4. None
  5. None
  6. THE NETWORK INCURS LATENCY.

  7. THE NETWORK INCURS LATENCY. THE NETWORK IS UNRELIABLE.

  8. THE NETWORK INCURS LATENCY. THE NETWORK IS UNRELIABLE. SO HOW CAN WE BUILD ROBUST AND SCALABLE DISTRIBUTED SYSTEMS?
  9. THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE

  10. THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE (SERIALIZABILITY/LINEARIZABILITY)

  11. THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE. Impose a total order on events in the system.

  12. THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE. Impose a total order on events in the system. (Diagram axis: TIME)

  13. THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE. Impose a total order on events in the system. (Diagram axis: TIME) Ask Amanda: “How's the weather on the farm?” Amanda replies: “Let me check with the tractor.” Amanda replies: “It's a beautiful day!” Tractor replies: current temperature is 75°F.

  14. THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE. Impose a total order on events in the system. (Diagram axis: TIME) Illusion created by a partially ordered protocol.

  15. THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE. Impose a total order on events in the system. (Diagram axis: TIME) Illusion created by a partially ordered protocol. Remarkably powerful abstraction, core to ACID transactions.

  16. THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE. Impose a total order on events in the system. (Diagram axis: TIME) Illusion created by a partially ordered protocol. Remarkably powerful abstraction, core to ACID transactions. This is the way you'd want to program distributed systems, but…

  17. THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE. Impose a total order on events in the system. (Diagram axis: TIME) Illusion created by a partially ordered protocol. COST:

  18. THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE. Impose a total order on events in the system. (Diagram axis: TIME) Illusion created by a partially ordered protocol. COST: BLOCKING COMMUNICATION (COORDINATION)
  19. COORDINATION (BLOCKING COMMUNICATION): Can I make progress without waiting?

  20. COORDINATION (BLOCKING COMMUNICATION): Can I make progress without waiting? UNDER SINGLE-SYSTEM IMAGE, MUST WAIT!
  21. None
  22. COORDINATION REQUIRED? Throughput: 1/delay.

  23. COORDINATION REQUIRED? Throughput: 1/delay. COORDINATION FREE? Throughput: limited by physical resources.

  24. SERIALIZABLE TRANSACTIONS ON EC2, IN-MEMORY LOCKING. “Coordination-Avoiding Database Systems” arXiv:1402.2237

  25. SERIALIZABLE TRANSACTIONS ON EC2, IN-MEMORY LOCKING. [Plot: Throughput (txns/s), LOG SCALE, vs. Number of Items per Transaction (1 to 7).] “Coordination-Avoiding Database Systems” arXiv:1402.2237

  26. SERIALIZABLE TRANSACTIONS ON EC2, IN-MEMORY LOCKING. [Same plot; series shown: COORDINATED.] “Coordination-Avoiding Database Systems” arXiv:1402.2237

  27. SERIALIZABLE TRANSACTIONS ON EC2, IN-MEMORY LOCKING. [Same plot; series shown: COORDINATED, COORDINATION-FREE.] “Coordination-Avoiding Database Systems” arXiv:1402.2237

  28. SERIALIZABLE TRANSACTIONS ON EC2, IN-MEMORY LOCKING. SINGLE SERVER: 10x faster (multi-core parallelism). MULTI-SERVER: ~1000x faster. [Same plot; series shown: COORDINATED, COORDINATION-FREE.] “Coordination-Avoiding Database Systems” arXiv:1402.2237
  29. do not support SSI/serializability; HANA

  30. 8/18 databases surveyed did not support SSI/serializability. “Highly Available Transactions: Virtues and Limitations” VLDB 2014

      Database          SSI/serializability?
      Actian Ingres     YES
      Aerospike         NO
      Persistit         NO
      Clustrix          NO
      Greenplum         YES
      IBM DB2           YES
      IBM Informix      YES
      MySQL             YES
      MemSQL            NO
      MS SQL Server     YES
      NuoDB             NO
      Oracle 11G        NO
      Oracle BDB        YES
      Oracle BDB JE     YES
      Postgres 9.2.2    YES
      SAP HANA          NO
      ScaleDB           NO
      VoltDB            YES

  31. Same table as slide 30: 8/18 databases surveyed did not support SSI/serializability, and 15/18 used weaker models by default. “Highly Available Transactions: Virtues and Limitations” VLDB 2014
  33. COORDINATION REQUIRED? Throughput: 1/delay. COORDINATION FREE? Throughput: limited by physical resources.

  34. COORDINATION REQUIRED? Throughput: 1/delay. Latency: 1+ RTT. COORDINATION FREE? Throughput: limited by physical resources. Latency: can return immediately.

  35. COORDINATION REQUIRED? Throughput: 1/delay. Latency: 1+ RTT. COORDINATION FREE? Throughput: limited by physical resources. Latency: can return immediately. SINGLE DC: 0.5 ms RTT on public cloud; 5 µs RTT on Infiniband.

  36. COORDINATION REQUIRED? Throughput: 1/delay. Latency: 1+ RTT. COORDINATION FREE? Throughput: limited by physical resources. Latency: can return immediately. SINGLE DC: 0.5 ms RTT on public cloud; 5 µs RTT on Infiniband. MULTI-DC?
  37. None
  38. None
  39. 133.7+ ms RTT

  42. 133.7+ ms RTT 85.1+ ms RTT
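To make the “Throughput: 1/delay” row concrete, here is a back-of-the-envelope sketch in Python using the round-trip times quoted on slides 35 and 39-42. It assumes that each conflicting operation on a single item must wait exactly one round trip and ignores processing time, so the numbers are optimistic upper bounds (the “+” on the multi-DC figures means real RTTs are at least that large). The dictionary labels and the function name are illustrative, not from the talk.

```python
# Assumption: every conflicting operation on one contended item blocks for
# (at least) one coordination round trip, so that item serves at most 1/RTT ops/s.
RTT_SECONDS = {
    "single DC, Infiniband": 5e-6,
    "single DC, public cloud": 0.5e-3,
    "multi-DC (85.1 ms)": 85.1e-3,
    "multi-DC (133.7 ms)": 133.7e-3,
}


def max_coordinated_throughput(rtt_seconds: float) -> float:
    """Upper bound on serialized operations per second for one contended item."""
    if rtt_seconds <= 0:
        raise ValueError("RTT must be positive")
    return 1.0 / rtt_seconds


for link, rtt in RTT_SECONDS.items():
    print(f"{link:24s} RTT {rtt * 1e3:8.3f} ms -> at most "
          f"{max_coordinated_throughput(rtt):10.1f} ops/s per item")
```

The bounds come out to roughly 200,000, 2,000, 11.8 and 7.5 operations per second per contended item, which is the gap that coordination-free execution (“limited by physical resources”) sidesteps.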

  43. THOSE LIGHT CONES_

  44. COORDINATION REQUIRED? Throughput: 1/delay. Latency: 1+ RTT. Unavailable during failures. COORDINATION FREE? Throughput: limited by physical resources. Latency: can return immediately. Progress despite failures.

  45. COORDINATION-FREE EXECUTION IS KEY TO INDEFINITE SCALABILITY

  46. COORDINATION IS THE BANE OF SCALABLE SYSTEMS

  47. COORDINATION REQUIRED? Throughput: 1/delay. Latency: 1+ RTT. Unavailable during failures. COORDINATION FREE? Throughput: limited by physical resources. Latency: can return immediately. Progress despite failures. WHEN DO WE HAVE TO COORDINATE?

  48. THAT SIMULTANEITY_

  49. COORDINATION REQUIRED? Throughput: 1/delay. Latency: 1+ RTT. Unavailable during failures. COORDINATION FREE? Throughput: limited by physical resources. Latency: can return immediately. Progress despite failures. WHEN DO WE HAVE TO COORDINATE?

  50. COORDINATION REQUIRED? Throughput: 1/delay. Latency: 1+ RTT. Unavailable during failures. COORDINATION FREE? Throughput: limited by physical resources. Latency: can return immediately. Progress despite failures. WHEN DO WE HAVE TO COORDINATE? CAP Theorem (for recency guarantees); FLP result (for consensus, e.g., Paxos); Davidson result (for SSI).

  51. COORDINATION REQUIRED? Throughput: 1/delay. Latency: 1+ RTT. Unavailable during failures. COORDINATION FREE? Throughput: limited by physical resources. Latency: can return immediately. Progress despite failures. WHEN DO WE HAVE TO COORDINATE? CAP Theorem (for recency guarantees); FLP result (for consensus, e.g., Paxos); Davidson result (for SSI). BUT DO APPS ALWAYS HAVE TO COORDINATE?
  52. None
  53. TICKET 241 TICKET 242 TICKET 243 TICKET 244

  55. None
  56. INVARIANT: TICKET IDs SHOULD BE SEQUENTIAL

  57. INVARIANT: TICKET IDs SHOULD BE SEQUENTIAL. TICKET 241, TICKET 242, TICKET 243.

  58. INVARIANT: TICKET IDs SHOULD BE SEQUENTIAL. TICKET 241, TICKET 241. COORDINATION REQUIRED!

  59. INVARIANT: TICKET IDs SHOULD BE UNIQUE. TICKET 241, TICKET 242. PRE-PARTITION ID SPACE: (1, 4, …), (2, 5, …), (3, 6, …).

  60. INVARIANT: TICKET IDs SHOULD BE NON-NEGATIVE. TICKET 241, TICKET 242. COORDINATION-FREE!

  61. INVARIANT: TICKET IDs SHOULD BE NON-NEGATIVE: COORDINATION-FREE! INVARIANT: TICKET IDs SHOULD BE UNIQUE: PRE-PARTITION ID SPACE. INVARIANT: TICKET IDs SHOULD BE SEQUENTIAL: COORDINATION REQUIRED!

  62. INVARIANT: TICKET IDs SHOULD BE NON-NEGATIVE: COORDINATION-FREE! INVARIANT: TICKET IDs SHOULD BE UNIQUE: PRE-PARTITION ID SPACE. INVARIANT: TICKET IDs SHOULD BE SEQUENTIAL: COORDINATION REQUIRED! WHEN DO WE HAVE TO COORDINATE? DEPENDS ON APPLICATION. SAFE ANSWER: ALWAYS COORDINATE.
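The ticket-ID invariants on slides 56-62 can be sketched in Python. The two classes below are hypothetical, single-process stand-ins for replicas that issue IDs without talking to each other: the first hands out “the next” sequential ID from its local copy, the second draws from a pre-partitioned slice of the ID space (replica k of n issues k, k+n, k+2n, …, matching the (1, 4, …) (2, 5, …) (3, 6, …) split on slide 59). This is an illustration of the idea, not code from the talk.

```python
from itertools import count


class SequentialReplica:
    """Hypothetical replica that issues 'the next ticket' from its local copy, uncoordinated."""

    def __init__(self, last_issued: int) -> None:
        self.last_issued = last_issued

    def next_ticket(self) -> int:
        self.last_issued += 1
        return self.last_issued


class PartitionedReplica:
    """Hypothetical replica k of n that draws IDs only from its own slice: k, k+n, k+2n, ..."""

    def __init__(self, k: int, n: int) -> None:
        if not 1 <= k <= n:
            raise ValueError("replica index k must be in 1..n")
        self._ids = count(start=k, step=n)

    def next_ticket(self) -> int:
        return next(self._ids)


# Sequential IDs without coordination: two replicas that both last saw ticket 240
# each hand out 241, so the SEQUENTIAL (and even the UNIQUE) invariant breaks.
a, b = SequentialReplica(240), SequentialReplica(240)
print(a.next_ticket(), b.next_ticket())  # 241 241

# Unique IDs without coordination: pre-partition the ID space across 3 replicas.
replicas = [PartitionedReplica(k, 3) for k in (1, 2, 3)]
issued = [r.next_ticket() for r in replicas for _ in range(2)]
print(issued)  # [1, 4, 2, 5, 3, 6]
assert len(set(issued)) == len(issued)  # unique
assert all(t >= 0 for t in issued)      # and non-negative
```

Pre-partitioning buys uniqueness (and, trivially, non-negativity) without coordination, but not sequential IDs: if one replica issues tickets faster than the others, IDs come out of order and with gaps, which is why the sequential invariant still requires coordination.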
  63. WHEN DO WE HAVE TO COORDINATE? SAFE ANSWER: ALWAYS COORDINATE.

  64. WHEN DO WE HAVE TO COORDINATE? SAFE ANSWER: ALWAYS COORDINATE. BETTER ANSWER: (YOUR TAX DOLLARS AT WORK)

  65. WHEN DO WE HAVE TO COORDINATE? SAFE ANSWER: ALWAYS COORDINATE. BETTER ANSWER: COORDINATION AVOIDANCE. COORDINATE ONLY WHEN STRICTLY NECESSARY; MOVE COMMUNICATION TO BACKGROUND. “Coordination-Avoiding Database Systems” arXiv:1402.2237
  66. None
  67. SAFETY: correctness always guaranteed. LIVENESS: database states agree (converge).

  68. Invariant Confluence is necessary and sufficient for ensuring safety, convergence, availability, and coordination-free execution. If Invariant Confluence holds, a safe, coordination-free (c-free) execution strategy exists. If Invariant Confluence fails, no safe, c-free mechanism exists. “Coordination-Avoiding Database Systems” arXiv:1402.2237
  69. Typical DB operations and invariants (SQL): for each invariant and operation, is execution coordination-free (C.F.)? (C.F. column shown as ??? for every row.) “Coordination-Avoiding Database Systems” arXiv:1402.2237

  70. Typical DB operations and invariants (SQL). “Coordination-Avoiding Database Systems” arXiv:1402.2237

      Invariant              Operation   C.F.?
      Equality, Inequality   Any         Y
      Generate unique ID     Any         Y
      Specify unique ID      Insert      N
      >                      Increment   Y
      >                      Decrement   N
      <                      Decrement   Y
      <                      Increment   N
      Foreign Key            Insert      Y
      Foreign Key            Delete      Y*
      Secondary Indexing     Any         Y
      Materialized Views     Any         Y
      AUTO_INCREMENT         Insert      N

  71. Same table as slide 70, annotated: Test fails? Cannot avoid coordination. “Coordination-Avoiding Database Systems” arXiv:1402.2237

  72. Same table as slide 70, annotated: Test fails? Cannot avoid coordination. MANY TRADITIONAL DB APPS OK. “Coordination-Avoiding Database Systems” arXiv:1402.2237
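Slides 68-72 describe the invariant confluence test informally: two replicas that each stay valid on their own must still be valid once their updates are merged. The following is a minimal Python sketch of that test for a single integer counter. It assumes replicas merge by adding up their changes since a common start state (a hypothetical `merge` helper, in the spirit of a CRDT counter), and it checks only specific executions: a failing case is a genuine counterexample, but a passing case does not prove the invariant confluent in general. The four calls reproduce the `>`/`<` rows of the table.

```python
from typing import Callable, Sequence

Invariant = Callable[[int], bool]


def run_replica(start: int, deltas: Sequence[int], inv: Invariant) -> int:
    """Apply deltas locally; a replica rejects any update that would violate inv in its own state."""
    state = start
    for d in deltas:
        if inv(state + d):
            state += d
    return state


def merge(start: int, a: int, b: int) -> int:
    """Merge two divergent counter replicas by combining their changes since start."""
    return start + (a - start) + (b - start)


def merge_preserves(start: int, ops_a: Sequence[int], ops_b: Sequence[int], inv: Invariant) -> bool:
    """True if two independently valid replicas still satisfy inv after merging."""
    a = run_replica(start, ops_a, inv)
    b = run_replica(start, ops_b, inv)
    assert inv(a) and inv(b)  # each replica is valid on its own (assumes inv(start))
    return inv(merge(start, a, b))


def positive(x: int) -> bool:
    return x > 0


def below_ten(x: int) -> bool:
    return x < 10


print(merge_preserves(2, [-1], [-1], positive))   # False: 2 - 1 - 1 = 0 violates "> 0"
print(merge_preserves(2, [+1], [+1], positive))   # True
print(merge_preserves(8, [+1], [+1], below_ten))  # False: 8 + 1 + 1 = 10 violates "< 10"
print(merge_preserves(8, [-1], [-1], below_ten))  # True
```

Each replica's local check passes in every case; only the merged state reveals whether the operation/invariant pair needs coordination.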
  74. FOREIGN KEY DEPENDENCIES. “TAO: Facebook’s Distributed Data Store for the Social Graph” USENIX ATC 2013

  75. FOREIGN KEY DEPENDENCIES. (Diagram labels: FRIENDS, FRIENDS.) “TAO: Facebook’s Distributed Data Store for the Social Graph” USENIX ATC 2013

  76. FOREIGN KEY DEPENDENCIES. (Diagram labels: FRIENDS, FRIENDS.) “TAO: Facebook’s Distributed Data Store for the Social Graph” USENIX ATC 2013

  77. FOREIGN KEY DEPENDENCIES. “TAO: Facebook’s Distributed Data Store for the Social Graph” USENIX ATC 2013

  78. FOREIGN KEY DEPENDENCIES. Denormalized Friend List: fast reads… …multi-entity updates. “TAO: Facebook’s Distributed Data Store for the Social Graph” USENIX ATC 2013

  79. FOREIGN KEY DEPENDENCIES. Denormalized Friend List: fast reads… …multi-entity updates. “TAO: Facebook’s Distributed Data Store for the Social Graph” USENIX ATC 2013
  81. FOREIGN KEY DEPENDENCIES. Denormalized Friend List: fast reads… …multi-entity updates. Not cleanly partitionable. “TAO: Facebook’s Distributed Data Store for the Social Graph” USENIX ATC 2013
  82. NEED ATOMIC VISIBILITY. FOREIGN KEY DEPENDENCIES. “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  83. NEED ATOMIC VISIBILITY: SEE ALL OF A TXN’S UPDATES, OR NONE OF THEM. FOREIGN KEY DEPENDENCIES. “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  84. NEED ATOMIC VISIBILITY: SEE ALL OF A TXN’S UPDATES, OR NONE OF THEM. FOREIGN KEY DEPENDENCIES, SECONDARY INDEXING, MATERIALIZED VIEWS. “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  85. HOW TO ACHIEVE ATOMIC VISIBILITY. X=0, Y=0. “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  86. STRAWMAN: LOCKING. X=0, Y=0. “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  87. STRAWMAN: LOCKING. X=0, Y=0. W(X=1), W(Y=1). “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014
  89. STRAWMAN: LOCKING. X=1, Y=1. W(X=1), W(Y=1). “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014
  93. STRAWMAN: LOCKING. X=1, Y=1. W(X=1), W(Y=1); R(X=1). “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  94. STRAWMAN: LOCKING. X=1, Y=1. W(X=1), W(Y=1); R(X=1), R(Y=1). “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  95. STRAWMAN: LOCKING. X=1, Y=0. W(X=1), W(Y=1). “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  96. STRAWMAN: LOCKING. X=1, Y=0. W(X=1), W(Y=1); R(X=?), R(Y=?). “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  97. STRAWMAN: LOCKING. X=1, Y=0. W(X=1), W(Y=1); R(X=?), R(Y=?). ATOMIC VISIBILITY COUPLED WITH MUTUAL EXCLUSION. “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  98. STRAWMAN: LOCKING. X=1, Y=0. W(X=1), W(Y=1); R(X=?), R(Y=?). ATOMIC VISIBILITY COUPLED WITH MUTUAL EXCLUSION: SLOW, unavailable. “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  99. RAMP TRANSACTIONS: Read Atomic Multi-Partition. “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014
  101. RAMP TRANSACTIONS. DECOUPLE: ATOMIC VISIBILITY; MUTUAL EXCLUSION. “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  102. RAMP TRANSACTIONS: DECOUPLE ATOMIC VISIBILITY FROM MUTUAL EXCLUSION. “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  103. BASIC IDEA. X=1, Y=0. W(X=1), W(Y=1); R(X=?), R(Y=?). “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014
  105. BASIC IDEA: LET CLIENTS RACE, but HAVE READERS “CLEAN UP”. X=1, Y=0. W(X=1), W(Y=1); R(X=?), R(Y=?). “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  106. BASIC IDEA: LET CLIENTS RACE, but HAVE READERS “CLEAN UP”. LIMITED MULTI-VERSIONING + METADATA. X=1, Y=0. W(X=1), W(Y=1); R(X=?), R(Y=?). “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  107. BASIC IDEA: LET CLIENTS RACE, but HAVE READERS “CLEAN UP”. LIMITED MULTI-VERSIONING + METADATA. X=0, Y=0. W(X=1), W(Y=1). “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  108. BASIC IDEA: LET CLIENTS RACE, but HAVE READERS “CLEAN UP”. LIMITED MULTI-VERSIONING + METADATA. Versions: X=0, X=1; Y=0. W(X=1), W(Y=1). “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  109. BASIC IDEA: LET CLIENTS RACE, but HAVE READERS “CLEAN UP”. LIMITED MULTI-VERSIONING + METADATA. Versions: X=0, X=1; Y=0, Y=1. W(X=1), W(Y=1). “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014
  112. BASIC IDEA: LET CLIENTS RACE, but HAVE READERS “CLEAN UP”. LIMITED MULTI-VERSIONING + METADATA. W(X=1), W(Y=1); R(X=?), R(Y=?). Versions with metadata [timestamp, other items written]: X=0 [t=0, {}], X=1 [t=124, {Y}]; Y=0 [t=0, {}], Y=1 [t=124, {X}]. Reader: R(X=1). “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  113. BASIC IDEA: LET CLIENTS RACE, but HAVE READERS “CLEAN UP”. LIMITED MULTI-VERSIONING + METADATA. W(X=1), W(Y=1); R(X=?), R(Y=?). Versions with metadata [timestamp, other items written]: X=0 [t=0, {}], X=1 [t=124, {Y}]; Y=0 [t=0, {}], Y=1 [t=124, {X}]. Reader: R(X=1), R(Y=0). “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014
  115. BASIC IDEA: LET CLIENTS RACE, but HAVE READERS “CLEAN UP”. LIMITED MULTI-VERSIONING + METADATA. W(X=1), W(Y=1); R(X=?), R(Y=?). Versions with metadata [timestamp, other items written]: X=0 [t=0, {}], X=1 [t=124, {Y}]; Y=0 [t=0, {}], Y=1 [t=124, {X}]. Reader: R(X=1), R(Y=0). Highest timestamp per item: X = 124, Y = 124. “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  116. BASIC IDEA: LET CLIENTS RACE, but HAVE READERS “CLEAN UP”. LIMITED MULTI-VERSIONING + METADATA. W(X=1), W(Y=1); R(X=?), R(Y=?). Versions with metadata [timestamp, other items written]: X=0 [t=0, {}], X=1 [t=124, {Y}]; Y=0 [t=0, {}], Y=1 [t=124, {X}]. Reader: R(X=1), R(Y=0). Highest timestamp per item: X = 124, Y = 124. Reader then fetches R(Y=1). “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014
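The “let clients race, but have readers clean up” idea on slides 103-116 can be sketched as a single-process Python simulation. It loosely follows the RAMP-Fast variant from the SIGMOD 2014 paper, under stated assumptions: partitions are in-memory objects, each version carries a timestamp plus the set of other items its transaction wrote, writes are installed (“prepared”) on every partition before any partition makes them visible, and networking, failures and garbage collection of old versions are omitted. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    value: int
    ts: int              # timestamp of the writing transaction
    siblings: frozenset  # metadata: other items written by that transaction


@dataclass
class Partition:
    versions: dict = field(default_factory=dict)     # item -> {ts: Version}
    last_commit: dict = field(default_factory=dict)  # item -> latest visible ts

    def prepare(self, item: str, v: Version) -> None:
        # Install the version everywhere first; it is not yet visible to readers.
        self.versions.setdefault(item, {})[v.ts] = v

    def commit(self, item: str, ts: int) -> None:
        self.last_commit[item] = max(self.last_commit.get(item, -1), ts)

    def get_latest(self, item: str) -> Version:
        return self.versions[item][self.last_commit[item]]

    def get_by_ts(self, item: str, ts: int) -> Version:
        return self.versions[item][ts]


def ramp_fast_read(parts: dict, items: list) -> dict:
    # Round 1: latest visible version of each item (possibly a fractured read).
    got = {i: parts[i].get_latest(i) for i in items}
    # Use the metadata to find the newest transaction each item must reflect.
    need = {i: got[i].ts for i in items}
    for v in got.values():
        for sib in v.siblings:
            if sib in need:
                need[sib] = max(need[sib], v.ts)
    # Round 2: "clean up" by fetching any missing version by exact timestamp.
    for i in items:
        if need[i] > got[i].ts:
            got[i] = parts[i].get_by_ts(i, need[i])
    return {i: v.value for i, v in got.items()}


# X and Y live on different partitions, both initially 0 at t=0.
parts = {"X": Partition(), "Y": Partition()}
for item in ("X", "Y"):
    parts[item].prepare(item, Version(0, 0, frozenset()))
    parts[item].commit(item, 0)

# Transaction t=124 writes X=1 and Y=1; it is visible on X but not yet on Y.
parts["X"].prepare("X", Version(1, 124, frozenset({"Y"})))
parts["Y"].prepare("Y", Version(1, 124, frozenset({"X"})))
parts["X"].commit("X", 124)

naive = {i: parts[i].get_latest(i).value for i in ("X", "Y")}
print(naive)                              # {'X': 1, 'Y': 0}  (fractured)
print(ramp_fast_read(parts, ["X", "Y"]))  # {'X': 1, 'Y': 1}
```

No reader ever waits on a lock: the writer and reader race, and the metadata on X=1 (“t=124 also wrote Y”) is enough for the reader to repair its view of Y in one extra round.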
  117. TPC-C: combine foreign keys with sequence-number insert on commit... 500K txns/s

  118. 47,852 New-Order transactions/s: serializable locking, bottlenecks on coordination over the network. “Coordination-Avoiding Database Systems” arXiv:1402.2237

  119. 47,852 New-Order transactions/s: serializable locking, bottlenecks on coordination over the network. 632,589 New-Order transactions/s: coordination-avoiding implementation (RAMP with fast ID assignment), bottlenecks on CPU. EC2 cr1.8xlarge here, 8 servers. “Coordination-Avoiding Database Systems” arXiv:1402.2237

  120. [Plot: Total Throughput (txn/s, up to 14M) vs. Number of Servers (0 to 200).]

  121. [Same plot.] INDUSTRY-STANDARD TRANSACTIONAL WORKLOADS CAN SCALE JUST FINE*

  122. INDUSTRY-STANDARD TRANSACTIONAL WORKLOADS CAN SCALE JUST FINE*, GIVEN THE RIGHT (*MANY)

  123. INDUSTRY-STANDARD TRANSACTIONAL WORKLOADS CAN SCALE JUST FINE*, GIVEN THE RIGHT SYSTEM DESIGN, CONCURRENCY PRIMITIVES, ATTENTION TO SCALE. (*MANY)

  124. INDUSTRY-STANDARD TRANSACTIONAL WORKLOADS CAN SCALE JUST FINE*, GIVEN THE RIGHT SYSTEM DESIGN, CONCURRENCY PRIMITIVES, ATTENTION TO SCALE, LEVEL OF COORDINATION. (*MANY)
  125. THE NETWORK INCURS LATENCY. THE NETWORK IS UNRELIABLE. SO HOW CAN WE BUILD ROBUST AND SCALABLE DISTRIBUTED SYSTEMS?

  126. THE NETWORK INCURS LATENCY. THE NETWORK IS UNRELIABLE. SO HOW CAN WE BUILD ROBUST AND SCALABLE DISTRIBUTED SYSTEMS? UNDERSTAND COORDINATION.

  127. COORDINATION AVOIDANCE: UNDERSTAND IF/WHEN COORDINATION IS REQUIRED

  128. COORDINATION AVOIDANCE: UNDERSTAND IF/WHEN COORDINATION IS REQUIRED.
      INVARIANT CONFLUENCE (arXiv 2014): necessary and sufficient condition for c-free operation
      HIGHLY AVAILABLE TRANSACTIONS (CACM, VLDB 2014): what database isolation levels are coordination-free?
      RAMP ATOMIC VISIBILITY (SIGMOD 2014): fast and intuitive multi-put, multi-get, indexing
      BLOOM and BLAZES (ICDE 2014): language-level automated coordination analysis
      CRDTS and BLOOM^L (SoCC 2013, USENIX ATC 2014): correct-by-design distributed data types
      PBS INCONSISTENCY (VLDBJ 2014): how stale is data if we don’t coordinate?
  129. Traditional distributed systems designs suffer from coordination bottlenecks. By understanding application requirements, we can avoid coordination. We can build systems that actually scale while providing correct behavior. Thanks! pbailis@cs.berkeley.edu · @pbailis · http://bailis.org/ · http://amplab.cs.berkeley.edu/
  130. IMAGE/FONT CREDITS. Icons from the Noun Project, each licensed Creative Commons Attribution (CC BY 3.0): Punk by my name is mud; Queen by Bohdan Burmich; Guy Fawkes by Anisha Varghese; Emperor by Simon Child; Database by Shmidt Sergey; List by Nicholas Menghini; Warehouse by Wilson Joseph; User by JM Waideaswaran; Thermostat by Michael Senkow; Customer Service by Bybzee; Punk Rocker by Simon Child; Jackhammer by Jamie Dickinson; Earth by Martin Vanco; Smart-Phone by Emily Haasch; Cloud by Piotrek Chuchla; Server by Jaime Carrion; Computer by Matthew Hawdon; Computer by james zamyslianskyj; Computer by Alyssa Mahlberg; Lock by dylan voisard. Font: COCOGOOSE by ZetaFonts (Creative Commons, non-commercial use).