at least one is a write, operations might conflict T2 T1 T3 ww(c) rw(a) rw(c) rw(b) T3 T1 T2 Under the hood: ACID (conflict serializability) reasons about low-level read/write traces
at least one is a write, operations might conflict T2 T1 T3 ww(c) rw(a) rw(c) rw(b) T3 T1 T2 Under the hood: ACID (conflict serializability) reasons about low-level read/write traces END RESULT:! A MYOPIC APPROACH TO CORRECTNESS
Persistit NO Clustrix NO Greenplum YES IBM DB2 YES IBM Informix YES MySQL YES MemSQL NO MS SQL Server YES NuoDB NO Oracle 11G NO Oracle BDB YES Oracle BDB JE YES Postgres 9.2.2 YES SAP Hana NO ScaleDB NO VoltDB YES 8/18 databases! surveyed did not 15/18 used! weaker models! by default “Highly Available Transactions: Virtues and Limitations,” VLDB 2014
Persistit NO Clustrix NO Greenplum YES IBM DB2 YES IBM Informix YES MySQL YES MemSQL NO MS SQL Server YES NuoDB NO Oracle 11G NO Oracle BDB YES Oracle BDB JE YES Postgres 9.2.2 YES SAP Hana NO ScaleDB NO VoltDB YES 8/18 databases! surveyed did not 15/18 used! weaker models! by default “Highly Available Transactions: Virtues and Limitations,” VLDB 2014
= low latency (no RTT) = synchronous coordination =! stalls during network partitions = RTT latency during operations = possible stall during concurrent access
= low latency (no RTT) = indefinite horizontal scaling (even for a single record; true scalability) synchronous coordination =! stalls during network partitions = RTT latency during operations = possible stall during concurrent access
= low latency (no RTT) = indefinite horizontal scaling (even for a single record; true scalability) synchronous coordination =! stalls during network partitions = RTT latency during operations = possible stall during concurrent access
= low latency (no RTT) = indefinite horizontal scaling (even for a single record; true scalability) benefits also apply to concurrent access in single-node systems synchronous coordination =! stalls during network partitions = RTT latency during operations = possible stall during concurrent access
= low latency (no RTT) = indefinite horizontal scaling (even for a single record; true scalability) benefits also apply to concurrent access in single-node systems synchronous coordination =! stalls during network partitions = RTT latency during operations = possible stall during concurrent access BUT OFTEN GIVE UP CORRECTNESS!!
SCALABILITY Our insight: serializability is sufficient for correctness! ! ! ! ! but is not necessary! ! ! ! Only coordinate when necessary Ask applications for invariants
SCALABILITY Our insight: serializability is sufficient for correctness! ! ! ! ! but is not necessary! ! ! ! Only coordinate when necessary Ask applications for invariants Invariants determine:! ! When is coordination needed?! ! How much coordination is required?! !
{} dept = {{“ops”:1}, {“dev”:2}} Invariant: each employee is in a department Operations: add employees d2 = dept.find(“dev”) employees.add({“Sue”:d2}) d1 = dept.find(“ops”) employees.add({“Harry”:d1})
{} dept = {{“ops”:1}, {“dev”:2}} Invariant: each employee is in a department Operations: add employees d2 = dept.find(“dev”) employees.add({“Sue”:d2}) d1 = dept.find(“ops”) employees.add({“Harry”:d1}) Invariant holds!
Yes Yes Yes State-based Commutativity Yes* No Depends To maintain consistency... I-confluence is necessary and sufficient for simultaneously maintaining application-level consistency, availability, convergence, and coordination-freedom Invariant confluence: formal characterization of safe, coordination-free execution
Yes Yes Yes State-based Commutativity Yes* No Depends To maintain consistency... I-confluence is necessary and sufficient for simultaneously maintaining application-level consistency, availability, convergence, and coordination-freedom Invariant confluence: formal characterization of safe, coordination-free execution
Yes Yes Yes State-based Commutativity Yes* No Depends To maintain consistency... I-confluence is necessary and sufficient for simultaneously maintaining application-level consistency, availability, convergence, and coordination-freedom Invariant confluence: formal characterization of safe, coordination-free execution
Yes Yes Yes State-based Commutativity Yes* No Depends To maintain consistency... I-confluence is necessary and sufficient for simultaneously maintaining application-level consistency, availability, convergence, and coordination-freedom Invariant confluence: formal characterization of safe, coordination-free execution
Yes Yes Yes State-based Commutativity Yes* No Depends To maintain consistency... I-confluence is necessary and sufficient for simultaneously maintaining application-level consistency, availability, convergence, and coordination-freedom Invariant confluence: formal characterization of safe, coordination-free execution
Yes Yes Yes State-based Commutativity Yes* No Depends To maintain consistency... I-confluence is necessary and sufficient for simultaneously maintaining application-level consistency, availability, convergence, and coordination-freedom Invariant confluence: formal characterization of safe, coordination-free execution
Yes Yes Yes State-based Commutativity Yes* No Depends To maintain consistency... I-confluence is necessary and sufficient for simultaneously maintaining application-level consistency, availability, convergence, and coordination-freedom Invariant confluence: formal characterization of safe, coordination-free execution
Yes Yes Yes State-based Commutativity Yes* No Depends To maintain consistency... I-confluence is necessary and sufficient for simultaneously maintaining application-level consistency, availability, convergence, and coordination-freedom Invariant confluence: formal characterization of safe, coordination-free execution
on combination of:! - expressiveness of operations! - strength of invariants STRENGTH OF INVARIANTS EXPRESSIVENESS OF OPERATIONS *Okay, so this is simplified, and there isn’t really a linear order on either axis (rather, it’s more about equivalence classes), but humor me here...
on combination of:! - expressiveness of operations! - strength of invariants STRENGTH OF INVARIANTS EXPRESSIVENESS OF OPERATIONS *Okay, so this is simplified, and there isn’t really a linear order on either axis (rather, it’s more about equivalence classes), but humor me here... COORDINATION! REQUIRED! COORDINATION-FREE
on combination of:! - expressiveness of operations! - strength of invariants STRENGTH OF INVARIANTS EXPRESSIVENESS OF OPERATIONS *Okay, so this is simplified, and there isn’t really a linear order on either axis (rather, it’s more about equivalence classes), but humor me here... COORDINATION! REQUIRED! COORDINATION-FREE
on combination of:! - expressiveness of operations! - strength of invariants STRENGTH OF INVARIANTS EXPRESSIVENESS OF OPERATIONS *Okay, so this is simplified, and there isn’t really a linear order on either axis (rather, it’s more about equivalence classes), but humor me here... COORDINATION! REQUIRED! COORDINATION-FREE
on combination of:! - expressiveness of operations! - strength of invariants STRENGTH OF INVARIANTS EXPRESSIVENESS OF OPERATIONS *Okay, so this is simplified, and there isn’t really a linear order on either axis (rather, it’s more about equivalence classes), but humor me here... COORDINATION! REQUIRED! COORDINATION-FREE
“Lamport”) NOT C-FREE Operation: insert record with specific ID INSERT INTO users (ID, firstname, lastname)! VALUES (1, “Leslie”, “Lamport”) Constraint: record IDs are unique DECLARE TABLE users (! ID int UNIQUE,! FirstName string,! LastName string )
UUID C-FREE! Operation: insert record INSERT INTO users (firstname, lastname)! VALUES (“Leslie”, “Lamport”) NOT C-FREE Operation: insert record with specific ID INSERT INTO users (ID, firstname, lastname)! VALUES (1, “Leslie”, “Lamport”) Constraint: record IDs are unique DECLARE TABLE users (! ID int UNIQUE,! FirstName string,! LastName string )
D_ID int! UserName string ! FOREIGN KEY (D_ID)! REFERENCES department(D_ID) ) DECLARE TABLE department (! D_ID int UNIQUE,! DeptName string ) NEW_D_ID = INSERT INTO department VALUES (“awesome division”);! INSERT INTO users (D_ID, UserName) VALUES (NEW_D_ID, “lamport”);! Anomaly (to avoid): “lamport” has no department! read lamport record; lookup lamport.D_ID returns NULL I-confluence insight:! cannot be violated by inserts! …but be careful about implementation! many ways to use coordination to enforce coordination-free semantics
(! U_ID int UNIQUE,! D_ID int! UserName string ! FOREIGN KEY (D_ID)! REFERENCES department(D_ID) ) DECLARE TABLE department (! D_ID int UNIQUE,! DeptName string ) NEW_D_ID = INSERT INTO department VALUES (“awesome division”);! INSERT INTO users (D_ID, UserName) VALUES (NEW_D_ID, “lamport”);! Visible to all readers Visible to all readers
(! U_ID int UNIQUE,! D_ID int! UserName string ! FOREIGN KEY (D_ID)! REFERENCES department(D_ID) ) DECLARE TABLE department (! D_ID int UNIQUE,! DeptName string ) NEW_D_ID = INSERT INTO department VALUES (“awesome division”);! INSERT INTO users (D_ID, UserName) VALUES (NEW_D_ID, “lamport”);! Visible to all readers Visible to all readers Not yet visible to all readers Not yet visible to all readers
(! U_ID int UNIQUE,! D_ID int! UserName string ! FOREIGN KEY (D_ID)! REFERENCES department(D_ID) ) DECLARE TABLE department (! D_ID int UNIQUE,! DeptName string ) NEW_D_ID = INSERT INTO department VALUES (“awesome division”);! INSERT INTO users (D_ID, UserName) VALUES (NEW_D_ID, “lamport”);! Visible to all readers Visible to all readers Not yet visible to all readers Not yet visible to all readers
DECLARE TABLE users (! U_ID int UNIQUE,! D_ID int! UserName string ! FOREIGN KEY (D_ID)! REFERENCES department(D_ID) ) DECLARE TABLE department (! D_ID int UNIQUE,! DeptName string ) NEW_D_ID = INSERT INTO department VALUES (“awesome division”);! INSERT INTO users (D_ID, UserName) VALUES (NEW_D_ID, “lamport”);! Visible to all readers Visible to all readers Not yet visible to all readers Not yet visible to all readers
DECLARE TABLE users (! U_ID int UNIQUE,! D_ID int! UserName string ! FOREIGN KEY (D_ID)! REFERENCES department(D_ID) ) DECLARE TABLE department (! D_ID int UNIQUE,! DeptName string ) NEW_D_ID = INSERT INTO department VALUES (“awesome division”);! INSERT INTO users (D_ID, UserName) VALUES (NEW_D_ID, “lamport”);! Visible to all readers Visible to all readers Not yet visible to all readers Not yet visible to all readers
DECLARE TABLE users (! U_ID int UNIQUE,! D_ID int! UserName string ! FOREIGN KEY (D_ID)! REFERENCES department(D_ID) ) DECLARE TABLE department (! D_ID int UNIQUE,! DeptName string ) NEW_D_ID = INSERT INTO department VALUES (“awesome division”);! INSERT INTO users (D_ID, UserName) VALUES (NEW_D_ID, “lamport”);! Visible to all readers Visible to all readers Not yet visible to all readers Not yet visible to all readers (???, 342, “lamport”)
DECLARE TABLE users (! U_ID int UNIQUE,! D_ID int! UserName string ! FOREIGN KEY (D_ID)! REFERENCES department(D_ID) ) DECLARE TABLE department (! D_ID int UNIQUE,! DeptName string ) NEW_D_ID = INSERT INTO department VALUES (“awesome division”);! INSERT INTO users (D_ID, UserName) VALUES (NEW_D_ID, “lamport”);! Visible to all readers Visible to all readers Not yet visible to all readers Not yet visible to all readers (402, 342, “lamport”)
DECLARE TABLE users (! U_ID int UNIQUE,! D_ID int! UserName string ! FOREIGN KEY (D_ID)! REFERENCES department(D_ID) ) DECLARE TABLE department (! D_ID int UNIQUE,! DeptName string ) NEW_D_ID = INSERT INTO department VALUES (“awesome division”);! INSERT INTO users (D_ID, UserName) VALUES (NEW_D_ID, “lamport”);! Visible to all readers Visible to all readers Not yet visible to all readers Not yet visible to all readers (402, 342, “lamport”)
DECLARE TABLE users (! U_ID int UNIQUE,! D_ID int! UserName string ! FOREIGN KEY (D_ID)! REFERENCES department(D_ID) ) DECLARE TABLE department (! D_ID int UNIQUE,! DeptName string ) NEW_D_ID = INSERT INTO department VALUES (“awesome division”);! INSERT INTO users (D_ID, UserName) VALUES (NEW_D_ID, “lamport”);! Visible to all readers Visible to all readers Not yet visible to all readers Not yet visible to all readers (402, 342, “lamport”)
DECLARE TABLE users (! U_ID int UNIQUE,! D_ID int! UserName string ! FOREIGN KEY (D_ID)! REFERENCES department(D_ID) ) DECLARE TABLE department (! D_ID int UNIQUE,! DeptName string ) NEW_D_ID = INSERT INTO department VALUES (“awesome division”);! INSERT INTO users (D_ID, UserName) VALUES (NEW_D_ID, “lamport”);! Visible to all readers Visible to all readers Not yet visible to all readers Not yet visible to all readers (402, 342, “lamport”)
DECLARE TABLE users (! U_ID int UNIQUE,! D_ID int! UserName string ! FOREIGN KEY (D_ID)! REFERENCES department(D_ID) ) DECLARE TABLE department (! D_ID int UNIQUE,! DeptName string ) NEW_D_ID = INSERT INTO department VALUES (“awesome division”);! INSERT INTO users (D_ID, UserName) VALUES (NEW_D_ID, “lamport”);! Visible to all readers Visible to all readers Not yet visible to all readers Not yet visible to all readers (402, 342, “lamport”)
DECLARE TABLE users (! U_ID int UNIQUE,! D_ID int! UserName string ! FOREIGN KEY (D_ID)! REFERENCES department(D_ID) ) DECLARE TABLE department (! D_ID int UNIQUE,! DeptName string ) NEW_D_ID = INSERT INTO department VALUES (“awesome division”);! INSERT INTO users (D_ID, UserName) VALUES (NEW_D_ID, “lamport”);! Visible to all readers Visible to all readers Not yet visible to all readers Not yet visible to all readers (402, 342, “lamport”) 2 RTT writes (prepare and make visible)! Between 1-2 RTTs for reads! Basic idea: store metadata to record sibling writes
DECLARE TABLE users (! U_ID int UNIQUE,! D_ID int! UserName string ! FOREIGN KEY (D_ID)! REFERENCES department(D_ID) ) DECLARE TABLE department (! D_ID int UNIQUE,! DeptName string ) NEW_D_ID = INSERT INTO department VALUES (“awesome division”);! INSERT INTO users (D_ID, UserName) VALUES (NEW_D_ID, “lamport”);! Visible to all readers Visible to all readers Not yet visible to all readers Not yet visible to all readers (402, 342, “lamport”) 2 RTT writes (prepare and make visible)! Between 1-2 RTTs for reads! Basic idea: store metadata to record sibling writes Read Atomic Multi-Partition ! Transactions, SIGMOD 2014
ID Any ??? Specify unique ID Insert ??? >! Increment ??? >! Decrement ??? < Decrement ??? < Increment ??? Foreign Key Insert ??? Foreign Key Delete ??? Secondary Indexing Any ??? Materialized Views Any ??? AUTO_INCREMENT Insert ??? Can apply the I-confluence test to! standard SQL for program analysis
ID Any Y Specify unique ID Insert N >! Increment Y >! Decrement N < Decrement Y < Increment N Foreign Key Insert Y Foreign Key Delete Y* Secondary Indexing Any Y Materialized Views Any Y! AUTO_INCREMENT Insert N Can apply the I-confluence test to! standard SQL for program analysis
Generate unique ID Any Y Specify unique ID Insert N >! Increment Y >! Decrement N < Decrement Y < Increment N Foreign Key Insert Y Foreign Key Delete Y* Secondary Indexing Any Y Materialized Views Any Y! AUTO_INCREMENT Insert N Can apply the I-confluence test to! standard SQL for program analysis
Generate unique ID Any Y Specify unique ID Insert N >! Increment Y >! Decrement N < Decrement Y < Increment N Foreign Key Insert Y Foreign Key Delete Y* Secondary Indexing Any Y Materialized Views Any Y! AUTO_INCREMENT Insert N Abstract data types Can apply the I-confluence test to! standard SQL for program analysis
Generate unique ID Any Y Specify unique ID Insert N >! Increment Y >! Decrement N < Decrement Y < Increment N Foreign Key Insert Y Foreign Key Delete Y* Secondary Indexing Any Y Materialized Views Any Y! AUTO_INCREMENT Insert N RAMP Transaction Abstract data types Can apply the I-confluence test to! standard SQL for program analysis
Equality, Inequality Any Y Generate unique ID Any Y Specify unique ID Insert N >! Increment Y >! Decrement N < Decrement Y < Increment N Foreign Key Insert Y Foreign Key Delete Y* Secondary Indexing Any Y Materialized Views Any Y! AUTO_INCREMENT Insert N RAMP Transaction Abstract data types Can apply the I-confluence test to! standard SQL for program analysis
12 TPC-E 4 of 4 AuctionMark all but 1 SEATS all but 1 JPAB all TATP all Still requires coordination-avoiding query plans!! • Appropriate merge (e.g., counter datatype) • Atomic multi-put (e.g., RAMP) • Nested atomic transactions (e.g., New-Order ID assignment) TRADITIONAL! OLTP! APPLICATIONS! ARE ACHIEVABLE! WITHOUT! (MUCH)! COORDINATION
aggregates (e.g., W_YTD=SUM(orders for warehouse)) Sequence number ID assignment (i.e., D_NEXT_O_ID) RAMP transaction on counter CRDT RAMP transaction across tables insert! O_ID warehouse district orders neworders insert! O_ID
aggregates (e.g., W_YTD=SUM(orders for warehouse)) Sequence number ID assignment (i.e., D_NEXT_O_ID) RAMP transaction on counter CRDT RAMP transaction across tables insert! O_ID warehouse district orders neworders insert! O_ID deferred atomic incrementAndGet() on commit! assign! new! O_ID tmp ID
aggregates (e.g., W_YTD=SUM(orders for warehouse)) Sequence number ID assignment (i.e., D_NEXT_O_ID) RAMP transaction on counter CRDT RAMP transaction across tables rewrite FK references to point to temp unique ID! create local index from temp unique ID to sequence ID insert! O_ID warehouse district orders neworders insert! O_ID deferred atomic incrementAndGet() on commit! assign! new! O_ID tmp ID
aggregates (e.g., W_YTD=SUM(orders for warehouse)) Sequence number ID assignment (i.e., D_NEXT_O_ID) RAMP transaction on counter CRDT RAMP transaction across tables rewrite FK references to point to temp unique ID! create local index from temp unique ID to sequence ID insert! O_ID warehouse district orders neworders insert! O_ID deferred atomic incrementAndGet() on commit! assign! new! O_ID tmp ID
aggregates (e.g., W_YTD=SUM(orders for warehouse)) Sequence number ID assignment (i.e., D_NEXT_O_ID) RAMP transaction on counter CRDT RAMP transaction across tables rewrite FK references to point to temp unique ID! create local index from temp unique ID to sequence ID insert! O_ID warehouse district orders neworders insert! O_ID deferred atomic incrementAndGet() on commit! assign! new! O_ID tmp ID
aggregates (e.g., W_YTD=SUM(orders for warehouse)) Sequence number ID assignment (i.e., D_NEXT_O_ID) RAMP transaction on counter CRDT RAMP transaction across tables rewrite FK references to point to temp unique ID! create local index from temp unique ID to sequence ID insert! O_ID warehouse district orders neworders insert! O_ID deferred atomic incrementAndGet() on commit! assign! new! O_ID tmp ID
aggregates (e.g., W_YTD=SUM(orders for warehouse)) Sequence number ID assignment (i.e., D_NEXT_O_ID) RAMP transaction on counter CRDT RAMP transaction across tables rewrite FK references to point to temp unique ID! create local index from temp unique ID to sequence ID insert! O_ID warehouse district orders neworders insert! O_ID deferred atomic incrementAndGet() on commit! assign! new! O_ID tmp ID O NLY SYNCH CO O RDINATIO N! REQ UIRED
bottleneck (if implemented in a coordination-free manner): UC Berkeley database prototype, 100 EC2 CC2.8xlarge instances (thank you AWS folks! currently poor single-node performance, but unimportant if you can scale out [for the time being]), linearizable masters, only blocking coordination: incrementAndGet for “district next order ID” key, CPU-bound on in-memory data; ~2500 lines Java; 120 clients/warehouse, 5 warehouses/machine, no THINK TIME (i.e., more contention than stock configuration)
bottleneck (if implemented in a coordination-free manner): UC Berkeley database prototype, 100 EC2 CC2.8xlarge instances (thank you AWS folks! currently poor single-node performance, but unimportant if you can scale out [for the time being]), linearizable masters, only blocking coordination: incrementAndGet for “district next order ID” key, CPU-bound on in-memory data; ~2500 lines Java; 120 clients/warehouse, 5 warehouses/machine, no THINK TIME (i.e., more contention than stock configuration)
requirements,! we can avoid coordination unless necessary We can build systems that actually scale! while providing correct behavior Thanks!! ! [email protected]! @pbailis! http://bailis.org/ http://amplab.cs.berkeley.edu/!
requirements,! we can avoid coordination unless necessary We can build systems that actually scale! while providing correct behavior Thanks!! ! [email protected]! @pbailis! http://bailis.org/ http://amplab.cs.berkeley.edu/! based on “Coordination-Avoiding Database Systems,”! Bailis, Fekete, Franklin, Ghodsi, Hellerstein, Stoica! arXiv:1402.2237 http://arxiv.org/abs/1402.2237