A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable. —Leslie Lamport 2013 Turing Award Winner
THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE TIME Impose a total order on events in the system Ask Am anda: “how ’s the w eather on the farm ?” Am anda replies: “Let m e check w ith the tractor.” Am anda replies: “It’s a beautiful day!” Tractor replies: current tem perature is 75°F
THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE TIME Impose a total order on events in the system Illusion created by a partially ordered protocol Remarkably powerful abstraction core to ACID transactions
THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE TIME Impose a total order on events in the system Illusion created by a partially ordered protocol Remarkably powerful abstraction This is the way you’d want to program distributed systems, but… core to ACID transactions
THE SIMPLE ANSWER: SINGLE-SYSTEM IMAGE TIME Impose a total order on events in the system Illusion created by a partially ordered protocol COST: BLOCKING COMMUNICATION COORDINATION
do not support! SSI/serializability HANA Actian Ingres YES Aerospike NO! N Persistit NO! N Clustrix NO! N Greenplum YES IBM DB2 YES IBM Informix YES MySQL YES MemSQL NO! N MS SQL Server YES NuoDB NO! N Oracle 11G NO! N Oracle BDB YES Oracle BDB JE YES Postgres 9.2.2 YES SAP HANA NO! N ScaleDB NO! N VoltDB YES 8/18 databases! surveyed did not “Highly Available Transactions: Virtues and Limitations” VLDB 2014
do not support! SSI/serializability HANA Actian Ingres YES Aerospike NO! N Persistit NO! N Clustrix NO! N Greenplum YES IBM DB2 YES IBM Informix YES MySQL YES MemSQL NO! N MS SQL Server YES NuoDB NO! N Oracle 11G NO! N Oracle BDB YES Oracle BDB JE YES Postgres 9.2.2 YES SAP HANA NO! N ScaleDB NO! N VoltDB YES 8/18 databases! surveyed did not 15/18 used! weaker models! by default “Highly Available Transactions: Virtues and Limitations” VLDB 2014
do not support! SSI/serializability HANA Actian Ingres YES Aerospike NO! N Persistit NO! N Clustrix NO! N Greenplum YES IBM DB2 YES IBM Informix YES MySQL YES MemSQL NO! N MS SQL Server YES NuoDB NO! N Oracle 11G NO! N Oracle BDB YES Oracle BDB JE YES Postgres 9.2.2 YES SAP HANA NO! N ScaleDB NO! N VoltDB YES 8/18 databases! surveyed did not 15/18 used! weaker models! by default “Highly Available Transactions: Virtues and Limitations” VLDB 2014
COORDINATION REQUIRED? COORDINATION FREE? Throughput: 1/delay Limited by physical resources Latency: 1+ RTT Can return immediately SINGLE DC: .5 ms on public cloud 5 µs on Infiniband
COORDINATION REQUIRED? COORDINATION FREE? Throughput: 1/delay Limited by physical resources Latency: 1+ RTT Can return immediately SINGLE DC: .5 ms on public cloud 5 µs on Infiniband MULTI-DC?
COORDINATION REQUIRED? COORDINATION FREE? Throughput: 1/delay Limited by physical resources Latency: 1+ RTT Can return immediately Unavailable during failures Progress despite failures WHEN DO WE HAVE TO COORDINATE?
COORDINATION REQUIRED? COORDINATION FREE? Throughput: 1/delay Limited by physical resources Latency: 1+ RTT Can return immediately Unavailable during failures Progress despite failures WHEN DO WE HAVE TO COORDINATE?
COORDINATION REQUIRED? COORDINATION FREE? Throughput: 1/delay Limited by physical resources Latency: 1+ RTT Can return immediately Unavailable during failures Progress despite failures CAP Theorem (for recency guarantees) FLP result (for consensus; e.g., Paxos) WHEN DO WE HAVE TO COORDINATE? Davidson result (for SSI)
COORDINATION REQUIRED? COORDINATION FREE? Throughput: 1/delay Limited by physical resources Latency: 1+ RTT Can return immediately Unavailable during failures Progress despite failures CAP Theorem (for recency guarantees) FLP result (for consensus; e.g., Paxos) BUT DO APPS ALWAYS HAVE TO COORDINATE? WHEN DO WE HAVE TO COORDINATE? Davidson result (for SSI)
INVARIANT: TICKET IDs SHOULD BE NON-NEGATIVE COORDINATION-FREE! INVARIANT: TICKET IDs SHOULD BE UNIQUE PRE-PARTITION ID SPACE INVARIANT: TICKET IDs SHOULD BE SEQUENTIAL COORDINATION REQUIRED!
INVARIANT: TICKET IDs SHOULD BE NON-NEGATIVE COORDINATION-FREE! INVARIANT: TICKET IDs SHOULD BE UNIQUE PRE-PARTITION ID SPACE INVARIANT: TICKET IDs SHOULD BE SEQUENTIAL COORDINATION REQUIRED! WHEN DO WE HAVE TO COORDINATE? DEPENDS ON APPLICATION SAFE ANSWER: ALWAYS COORDINATE
WHEN DO WE HAVE TO COORDINATE? SAFE ANSWER: ALWAYS COORDINATE BETTER ANSWER: COORDINATION AVOIDANCE COORDINATE ONLY WHEN STRICTLY NECESSARY MOVE COMMUNICATION TO BACKGROUND “Coordination-Avoiding Database Systems” arXiv:1402.2237
Invariant Operation C.F. Equality, Inequality Any Y Generate unique ID Any Y Specify unique ID Insert N >! Increment Y >! Decrement N < Decrement Y < Increment N Foreign Key Insert Y Foreign Key Delete Y* Secondary Indexing Any Y Materialized Views Any Y! AUTO_INCREMENT Insert N Typical DB! operations and ! invariants! (SQL) “Coordination-Avoiding Database Systems” arXiv:1402.2237
Test fails? Cannot avoid coordination Invariant Operation C.F. Equality, Inequality Any Y Generate unique ID Any Y Specify unique ID Insert N >! Increment Y >! Decrement N < Decrement Y < Increment N Foreign Key Insert Y Foreign Key Delete Y* Secondary Indexing Any Y Materialized Views Any Y! AUTO_INCREMENT Insert N Typical DB! operations and ! invariants! (SQL) “Coordination-Avoiding Database Systems” arXiv:1402.2237
Test fails? Cannot avoid coordination Invariant Operation C.F. Equality, Inequality Any Y Generate unique ID Any Y Specify unique ID Insert N >! Increment Y >! Decrement N < Decrement Y < Increment N Foreign Key Insert Y Foreign Key Delete Y* Secondary Indexing Any Y Materialized Views Any Y! AUTO_INCREMENT Insert N MANY TRADITIONAL DB APPS OK Typical DB! operations and ! invariants! (SQL) “Coordination-Avoiding Database Systems” arXiv:1402.2237
Test fails? Cannot avoid coordination Invariant Operation C.F. Equality, Inequality Any Y Generate unique ID Any Y Specify unique ID Insert N >! Increment Y >! Decrement N < Decrement Y < Increment N Foreign Key Insert Y Foreign Key Delete Y* Secondary Indexing Any Y Materialized Views Any Y! AUTO_INCREMENT Insert N MANY TRADITIONAL DB APPS OK Typical DB! operations and ! invariants! (SQL) “Coordination-Avoiding Database Systems” arXiv:1402.2237
as FOREIGN KEY DEPENDENCIES “TAO: Facebook’s Distributed Data Store for the Social Graph” USENIX ATC 2013 s Denormalized Friend List Fast reads… …multi-entity updates
as FOREIGN KEY DEPENDENCIES “TAO: Facebook’s Distributed Data Store for the Social Graph” USENIX ATC 2013 s Denormalized Friend List Fast reads… …multi-entity updates s
as FOREIGN KEY DEPENDENCIES “TAO: Facebook’s Distributed Data Store for the Social Graph” USENIX ATC 2013 s Denormalized Friend List Fast reads… …multi-entity updates s
as FOREIGN KEY DEPENDENCIES “TAO: Facebook’s Distributed Data Store for the Social Graph” USENIX ATC 2013 s Denormalized Friend List Fast reads… …multi-entity updates Not cleanly partitionable s
NEED ATOMIC VISIBILITY SEE ALL OF A TXN’S UPDATES, OR NONE OF THEM FOREIGN KEY DEPENDENCIES “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014
NEED ATOMIC VISIBILITY SEE ALL OF A TXN’S UPDATES, OR NONE OF THEM FOREIGN KEY DEPENDENCIES SECONDARY INDEXING MATERIALIZED VIEWS “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014
BASIC IDEA W(X=1) W(Y=1) Y=0 R(X=?) R(Y=?) LET CLIENTS RACE, but HAVE READERS “CLEAN UP” X=1 “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014
47,852 Serializable locking bottlenecks on coordination over network 632,589 Coordination-avoiding implementation (RAMP with fast ID assignment) bottlenecks on CPU EC2 cr1.8xlarge here, 8 servers “Coordination-Avoiding Database Systems” arXiv:1402.2237 New-Order Transactions/s
INDUSTRY-STANDARD TRANSACTIONAL WORKLOADS CAN SCALE JUST FINE* GIVEN THE RIGHT SYSTEM DESIGN CONCURRENCY PRIMITIVES ATTENTION TO SCALE LEVEL OF COORDINATION MANY
COORDINATION AVOIDANCE UNDERSTAND IF/WHEN COORDINATION IS REQUIRED INVARIANT CONFLUENCE (arXiv 2014) necessary and sufficient condition for c-free operation HIGHLY AVAILABLE TRANSACTIONS (CACM, VLDB 2014) what database isolation levels are coordination-free? RAMP ATOMIC VISIBILITY (SIGMOD 2014) fast and intuitive multi-put, multi-get, indexing BLOOM and BLAZES (ICDE 2014) language-level automated coordination analysis CRDTS and BLOOM^L (SoCC 2013, USENIX ATC 2014) correct-by-design distributed data types PBS INCONSISTENCY (VLDBJ 2014) how stale is data if we don’t coordinate?
Traditional distributed systems designs! suffer from coordination bottlenecks By understanding application requirements,! we can avoid coordination We can build systems that actually scale! while providing correct behavior Thanks!! ! [email protected]! @pbailis! http://bailis.org/ http://amplab.cs.berkeley.edu/!
Punk designed by my name is mud from the Noun Project Creative Commons – Attribution (CC BY 3.0) Queen designed by Bohdan Burmich from the Noun Project Creative Commons – Attribution (CC BY 3.0) Guy Fawkes designed by Anisha Varghese from the Noun Project Creative Commons – Attribution (CC BY 3.0) Emperor designed by Simon Child from the Noun Project Creative Commons – Attribution (CC BY 3.0) Database designed by Shmidt Sergey from the Noun Project Creative Commons – Attribution (CC BY 3.0) List designed by Nicholas Menghini from the Noun Project Creative Commons – Attribution (CC BY 3.0) Warehouse designed by Wilson Joseph from the Noun Project Creative Commons – Attribution (CC BY 3.0) User designed by JM Waideaswaran from the Noun Project Creative Commons – Attribution (CC BY 3.0) Thermostat designed by Michael Senkow from the Noun Project Creative Commons – Attribution (CC BY 3.0) Customer Service designed by Bybzee from the Noun Project Creative Commons – Attribution (CC BY 3.0) Punk Rocker designed by Simon Child from the Noun Project Creative Commons – Attribution (CC BY 3.0) Jackhammer designed by Jamie Dickinson from the Noun Project Creative Commons – Attribution (CC BY 3.0) Earth designed by Martin Vanco from the Noun Project Creative Commons – Attribution (CC BY 3.0) Smart-Phone designed by Emily Haasch from the Noun Project Creative Commons – Attribution (CC BY 3.0) Cloud designed by Piotrek Chuchla from the Noun Project Creative Commons – Attribution (CC BY 3.0) Server designed by Jaime Carrion from the Noun Project Creative Commons – Attribution (CC BY 3.0) Computer designed by Matthew Hawdon from the Noun Project Creative Commons – Attribution (CC BY 3.0) Computer designed by james zamyslianskyj from the Noun Project Creative Commons – Attribution (CC BY 3.0) Computer designed by Alyssa Mahlberg from the Noun Project Creative Commons – Attribution (CC BY 3.0) Lock designed by dylan voisard from the Noun Project Creative Commons – Attribution (CC BY 3.0) ! COCOGOOSE font by ZetaFonts COMMON CREATIVE NON COMMERCIAL USE IMAGE/FONT CREDITs