Availability, Consistency, and Horizontally Scalable Data Management (SF Bay Area ACM)

Availability, Consistency, and Horizontally Scalable Data Management
Peter Bailis, UC Berkeley, AMPLab, @pbailis
with Alan Fekete, Mike Franklin, Ali Ghodsi, Ion Stoica, Joe Hellerstein

THE END OF THE END OF SCALABLE AND CORRECT DISTRIBUTED DATABASES

A portrait of big services

- stateless, horizontally scalable
- concurrent, stateful, durable

Users care about the correctness of their applications

“usernames should be unique”
“each patient should have an attending doctor”
“account balances should be positive”

Classic answer: use ACID transactions

Equivalent Serial Execution
Isolation (the I in ACID) provides correctness

Under the hood: ACID (conflict serializability) reasons about low-level read/write traces
For any two operations on the same data item, if at least one is a write, the operations might conflict
[Figure: conflict graph over transactions T1, T2, T3 with edges labeled ww(c), rw(a), rw(c), rw(b)]
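The conflict rule above can be sketched as a small checker. This is a minimal illustration, not code from the talk: the schedule format `(txn, op, item)` and the function names are assumptions made for the example. It builds the conflict graph from a read/write trace and reports a schedule as conflict serializable when the graph has no cycle.

```python
from itertools import combinations

def conflict_graph(schedule):
    """schedule: list of (txn, op, item) in execution order, op is "r" or "w"."""
    edges = set()
    # combinations() keeps schedule order, so `first` always ran before `second`.
    for first, second in combinations(schedule, 2):
        (t1, op1, x1), (t2, op2, x2) = first, second
        if t1 != t2 and x1 == x2 and "w" in (op1, op2):
            edges.add((t1, t2))  # t1 must precede t2 in any equivalent serial order
    return edges

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # back edge: cycle found
        visiting.add(node)
        if any(visit(nxt) for nxt in graph[node]):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(node) for node in graph)

def is_conflict_serializable(schedule):
    return not has_cycle(conflict_graph(schedule))

# T1 finishes before T2 starts: a single edge T1 -> T2.
serial_like = [("T1", "r", "a"), ("T1", "w", "a"), ("T2", "r", "a"), ("T2", "w", "a")]
# Interleaved read-modify-write: edges in both directions, so a cycle.
lost_update = [("T1", "r", "a"), ("T2", "r", "a"), ("T1", "w", "a"), ("T2", "w", "a")]
```

Note that the check looks only at reads and writes; it knows nothing about what the application actually needs, which is the point the next slide makes.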

END RESULT: A MYOPIC APPROACH TO CORRECTNESS
COST: SERIALIZABILITY REQUIRES COORDINATION

synchronous coordination
= stalls during network partitions
= RTT latency during operations
= possible stall during concurrent access

THE ACID SCALABILITY WALL
[Figure: maximum throughput (txns/s) vs. number of servers in 2PC (2 to 20). LOCAL DATACENTER: max 1200 txn/s]
[Figure: maximum throughput (txn/s) vs. participating datacenters (+VA, +OR, +CA, +IR, +SP, +TO, +SI, +SY). MULTI-DATACENTER: max 12 txn/s]
decentralized (optimized) 2PC
SERIALIZABILITY REQUIRES COORDINATION

Database        Supports SSI/serializability?
Actian Ingres   YES
Aerospike       NO
Persistit       NO
Clustrix        NO
Greenplum       YES
IBM DB2         YES
IBM Informix    YES
MySQL           YES
MemSQL          NO
MS SQL Server   YES
NuoDB           NO
Oracle 11G      NO
Oracle BDB      YES
Oracle BDB JE   YES
Postgres 9.2.2  YES
SAP Hana        NO
ScaleDB         NO
VoltDB          YES

8/18 databases surveyed did not support SSI/serializability
15/18 used weaker models by default
“Highly Available Transactions: Virtues and Limitations,” VLDB 2014

synchronous coordination
= stalls during network partitions
= RTT latency during operations
= possible stall during concurrent access

no (or asynchronous) coordination
= Gilbert and Lynch “High Availability”
= low latency (no RTT)
= indefinite horizontal scaling (even for a single record; true scalability)
benefits also apply to concurrent access in single-node systems

BUT OFTEN GIVE UP CORRECTNESS!

CORRECTNESS vs. SCALABILITY
SERIALIZABILITY
Our insight: serializability is sufficient for correctness but is not necessary
Our solution: coordination avoidance
Only coordinate when necessary
CORRECTNESS and SCALABILITY

Ask applications for invariants
Invariants determine:
- When is coordination needed?
- How much coordination is required?

Invariant: user IDs are unique

Invariant: each employee is in a department
Operations: add employees

Anomaly (to avoid):
l_emp = employees.find(id="louise")
l_dept = dept.find(l_emp.dept)
ENORECORD

Initial state:
employees = {}
dept = {{"ops":1}, {"dev":2}}

Concurrent operations, one per replica:
d1 = dept.find("ops")
employees.add({"Harry":d1})

d2 = dept.find("dev")
employees.add({"Sue":d2})

After merge:
employees = {{"Harry":1}, {"Sue":2}}
dept = {{"ops":1}, {"dev":2}}
Invariant holds!
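The employee example can be replayed in a few lines. This is a sketch under stated assumptions: replica state is a plain dict, and `merge` is a key-wise union, which the slides imply but do not spell out. The helper names are made up for illustration.

```python
def invariant(db):
    """Every employee references a department ID that exists."""
    return all(d in db["dept"].values() for d in db["employees"].values())

def add_employee(db, name, dept_name):
    """Apply the operation on one replica, with no coordination."""
    dept_id = db["dept"][dept_name]
    return {"dept": dict(db["dept"]),
            "employees": {**db["employees"], name: dept_id}}

def merge(a, b):
    """Assumed merge: union of both replicas' records."""
    return {"dept": {**a["dept"], **b["dept"]},
            "employees": {**a["employees"], **b["employees"]}}

initial = {"dept": {"ops": 1, "dev": 2}, "employees": {}}
replica_1 = add_employee(initial, "Harry", "ops")  # runs on one replica
replica_2 = add_employee(initial, "Sue", "dev")    # runs concurrently on another
merged = merge(replica_1, replica_2)
```

Both replicas were valid on their own, and the merged state is valid too, so inserts need no coordination for this invariant.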

Invariant: only one ops on staff at a time
Operations: change staffing

Anomaly (to avoid):
on_duty = employees.find(staffed="T")
assert(len(on_duty) == 1)
ASSERTION FAILS

Initial state:
staff = {"Laura":T, "Harry":F, "Gary":F}

Concurrent operations, one per replica:
staff.set({"Laura":F}, {"Harry":T})
staff.set({"Laura":F}, {"Gary":T})

After merge:
staff = {"Laura":F, "Harry":T, "Gary":T}
Invariant violated!
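The staffing counterexample can be reproduced the same way. A minimal sketch, assuming a per-key merge that keeps whichever replica changed a key relative to the common ancestor; that merge rule and the function names are illustrative choices, not taken from the talk.

```python
def invariant(staff):
    """Exactly one person is on duty."""
    return sum(staff.values()) == 1

def change_staffing(staff, off, on):
    """Take `off` off duty and put `on` on duty, on one replica."""
    new = dict(staff)
    new[off] = False
    new[on] = True
    return new

def merge(ancestor, a, b):
    """Assumed merge: per key, keep the value from the replica that changed it."""
    merged = {}
    for key in ancestor:
        # If both replicas changed the key, `a` wins (arbitrary tie-break).
        merged[key] = a[key] if a[key] != ancestor[key] else b[key]
    return merged

initial = {"Laura": True, "Harry": False, "Gary": False}
replica_1 = change_staffing(initial, "Laura", "Harry")
replica_2 = change_staffing(initial, "Laura", "Gary")
merged = merge(initial, replica_1, replica_2)
```

Each replica satisfies the invariant locally, but the merged state has two people on duty. No merge rule can fix this without discarding one of the updates, so this operation needs coordination.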

SAFETY: correctness always guaranteed
LIVENESS: database states agree (converge)

Invariant confluence: formal characterization of safe, coordination-free execution
I-confluence is necessary and sufficient for simultaneously maintaining application-level consistency, availability, convergence, and coordination-freedom

To maintain consistency...
                           Sufficient?  Necessary?  App-Level?
Conflict Serializability   Yes          No          No
Invariant Confluence       Yes          Yes         Yes
State-based Commutativity  Yes*         No          Depends
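One way to get a feel for the I-confluence condition is a bounded brute-force search: enumerate states reachable through locally valid transactions, then check that every pairwise merge still satisfies the invariant. The sketch below is only an illustration under assumptions (states are frozensets, merge is set union, and all function names are hypothetical). A `True` result from a bounded search is evidence, not a proof.

```python
from itertools import product

def reachable(initial, txns, invariant, depth):
    """States reachable via at most `depth` transactions that each keep the invariant."""
    seen = {initial}
    frontier = {initial}
    for _ in range(depth):
        candidates = {txn(state) for state in frontier for txn in txns}
        frontier = {s for s in candidates if invariant(s)} - seen
        seen |= frontier
    return seen

def i_confluent_bounded(initial, txns, invariant, merge, depth=2):
    states = reachable(initial, txns, invariant, depth)
    return all(invariant(merge(a, b)) for a, b in product(states, repeat=2))

def union(a, b):
    return a | b

# Invariant 1: usernames are unique; two users claim the same name.
def usernames_unique(state):
    names = [name for name, _ in state]
    return len(names) == len(set(names))

def claim_by_user_1(state):
    return state | {("alice", "user-1")}

def claim_by_user_2(state):
    return state | {("alice", "user-2")}

# Invariant 2: every employee is in an existing department; inserts only.
def has_valid_dept(state):
    return all(dept in {"ops", "dev"} for _, dept in state)

def add_harry(state):
    return state | {("Harry", "ops")}

def add_sue(state):
    return state | {("Sue", "dev")}

unique_ok = i_confluent_bounded(
    frozenset(), [claim_by_user_1, claim_by_user_2], usernames_unique, union)
dept_ok = i_confluent_bounded(
    frozenset(), [add_harry, add_sue], has_valid_dept, union)
```

Here `unique_ok` comes out false because two locally valid claims merge into a duplicate, while `dept_ok` comes out true for the explored states.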

Formal framework for reasoning about application coordination requirements
Coordination depends on combination of:
- expressiveness of operations
- strength of invariants

[Figure: plane with axes STRENGTH OF INVARIANTS and EXPRESSIVENESS OF OPERATIONS, split into COORDINATION REQUIRED and COORDINATION-FREE regions]
*Okay, so this is simplified, and there isn’t really a linear order on either axis (rather, it’s more about equivalence classes), but humor me here...

Can apply the I-confluence test to standard SQL for program analysis

Invariant             Operation  C.F.?
Equality, Inequality  Any        ???
Generate unique ID    Any        ???
Specify unique ID     Insert     ???
>                     Increment  ???
>                     Decrement  ???
<                     Decrement  ???
<                     Increment  ???
Foreign Key           Insert     ???
Foreign Key           Delete     ???
Secondary Indexing    Any        ???
Materialized Views    Any        ???
AUTO_INCREMENT        Insert     ???

Constraint: record IDs are unique
DECLARE TABLE users (
  ID int UNIQUE,
  FirstName string,
  LastName string )

Anomaly (to avoid): two records sharing one ID

Operation: insert record with specific ID
INSERT INTO users (ID, firstname, lastname)
VALUES (1, "Leslie", "Lamport")
NOT C-FREE

Operation: insert record
INSERT INTO users (firstname, lastname)
VALUES ("Leslie", "Lamport")
C-FREE! Let the DB decide the ID; use node ID or UUID
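The “node ID or UUID” advice can be sketched as follows. The class and function names are hypothetical; the point is only that each server can mint IDs from local state, so two servers never need to talk to each other to stay unique.

```python
import itertools
import uuid

class NodeIdGenerator:
    """IDs of the form (node_id, local sequence number); unique if node IDs are."""

    def __init__(self, node_id):
        self.node_id = node_id
        self._counter = itertools.count(1)

    def next_id(self):
        return (self.node_id, next(self._counter))

def uuid_id():
    """Alternative: a random 128-bit ID; collisions are improbable, not impossible."""
    return str(uuid.uuid4())

node_a = NodeIdGenerator("node-a")
node_b = NodeIdGenerator("node-b")
ids_a = [node_a.next_id() for _ in range(3)]
ids_b = [node_b.next_id() for _ in range(3)]
```

The trade-off is that IDs are no longer dense or globally ordered, which is why AUTO_INCREMENT and client-chosen IDs land in the coordination-required column.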

Foreign key constraints
DECLARE TABLE users (
  U_ID int UNIQUE,
  D_ID int,
  UserName string,
  FOREIGN KEY (D_ID)
  REFERENCES department(D_ID) )
DECLARE TABLE department (
  D_ID int UNIQUE,
  DeptName string )

NEW_D_ID = INSERT INTO department VALUES ("awesome division");
INSERT INTO users (D_ID, UserName) VALUES (NEW_D_ID, "lamport");

Anomaly (to avoid): “lamport” has no department!
read lamport record; lookup lamport.D_ID returns NULL

I-confluence insight: cannot be violated by inserts!
…but be careful about implementation! many ways to use coordination to enforce coordination-free semantics

[Figure: users shard and department shard. The department shard receives (342, "awesome division") and the users shard receives (402, 342, "lamport"); each write moves from "Not yet visible to all readers" to "Visible to all readers"]
2 RTT writes (prepare and make visible)
Between 1-2 RTTs for reads
Basic idea: store metadata to record sibling writes
Read Atomic Multi-Partition Transactions, SIGMOD 2014
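The “store metadata to record sibling writes” idea can be sketched with a toy in-memory model. This is a simplified illustration in the spirit of the two-round write and one-to-two-round read described above, not the published algorithm or the prototype's code; the `Partition` class, key names, and timestamp handling are all assumptions.

```python
class Partition:
    """One shard; keeps every prepared version plus a last-committed pointer."""

    def __init__(self):
        self.versions = {}     # (key, ts) -> (value, sibling keys written in the same txn)
        self.last_commit = {}  # key -> timestamp of the newest committed version

    def prepare(self, key, ts, value, siblings):
        self.versions[(key, ts)] = (value, frozenset(siblings))

    def commit(self, key, ts):
        self.last_commit[key] = max(ts, self.last_commit.get(key, 0))

    def get(self, key, ts=None):
        """Read the last committed version, or a specific version by timestamp."""
        if ts is None:
            ts = self.last_commit.get(key, 0)
        value, siblings = self.versions.get((key, ts), (None, frozenset()))
        return ts, value, siblings

def get_all(shards, keys):
    first = {k: shards[k].get(k) for k in keys}  # read round 1
    required = {}
    for ts, _, siblings in first.values():
        for sib in siblings:
            required[sib] = max(required.get(sib, 0), ts)
    result = {}
    for k in keys:
        ts, value, _ = first[k]
        if required.get(k, 0) > ts:              # read round 2, only when needed
            ts, value, _ = shards[k].get(k, required[k])
        result[k] = value
    return result

shards = {"department/342": Partition(), "users/402": Partition()}
writes = {"department/342": "awesome division", "users/402": (342, "lamport")}
ts = 1  # assumed unique per transaction
for key, value in writes.items():            # write round 1: prepare on every shard
    shards[key].prepare(key, ts, value, writes.keys())
shards["users/402"].commit("users/402", ts)  # write round 2 has reached only one shard

snapshot = get_all(shards, ["users/402", "department/342"])
```

Even though the department write is not yet committed, the reader learns from the user record's sibling metadata that a matching department version exists and fetches it by timestamp, so it never sees “lamport” without a department.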

Can apply the I-confluence test to standard SQL for program analysis

Invariant             Operation  C.F.?
Equality, Inequality  Any        Y
Generate unique ID    Any        Y
Specify unique ID     Insert     N
>                     Increment  Y
>                     Decrement  N
<                     Decrement  Y
<                     Increment  N
Foreign Key           Insert     Y
Foreign Key           Delete     Y*
Secondary Indexing    Any        Y
Materialized Views    Any        Y
AUTO_INCREMENT        Insert     N

Slide annotations on the coordination-free rows: Eventual consistency, Abstract data types, RAMP Transaction
Remainder: Cannot avoid coordination
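The “>” rows of the table can be checked with plain arithmetic. A minimal sketch, assuming a counter whose merge applies both replicas' deltas to the common ancestor (the usual counter datatype behavior); the threshold and amounts are made-up numbers.

```python
def merge(ancestor, a, b):
    """Counter merge: apply both replicas' deltas to the common ancestor."""
    return ancestor + (a - ancestor) + (b - ancestor)

def invariant(balance):
    """A lower-bound invariant of the form balance > 0."""
    return balance > 0

initial = 100

# Increments: each replica only moves away from the bound.
inc_a, inc_b = initial + 30, initial + 50
merged_inc = merge(initial, inc_a, inc_b)

# Decrements: each replica checks the invariant locally and passes.
dec_a, dec_b = initial - 60, initial - 60
merged_dec = merge(initial, dec_a, dec_b)
```

Both decrements are valid in isolation (40 > 0), but together they push the merged value to -20. Increments under “>” and decrements under “<” can never cross the bound, which is why those rows are marked Y.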

Standard SQL with extensions and analysis

CREATE TABLE Orders
(
  O_ID int AUTO_INCREMENT,
  C_ID int,
  O_QTY int,
  DATE datetime NOT NULL,
  PRIMARY KEY (O_ID),
  FOREIGN KEY (C_ID) REFERENCES Customers(C_ID),
  CONSTRAINT [O_QTY > 0]
)

CREATE PROCEDURE CreateOrder(@C_ID int, @O_QTY int)
AS
INSERT INTO Orders (C_ID, O_QTY, DATE) VALUES
(@C_ID, @O_QTY, NOW());
GO

> WARNING: Orders.O_ID requires coordination! INSERT found in CreateOrder
> WARNING: CreateOrder requires remote check for @C_ID

[Figure: DDL with invariants (the CREATE TABLE Orders statement) feeds I-confluence analysis, which reports COORDINATION COST]

I-confluence in real programs?

Benchmark    I-confluent invariants
TPC-C        10 of 12
TPC-E        4 of 4
AuctionMark  all but 1
SEATS        all but 1
JPAB         all
TATP         all

TRADITIONAL OLTP APPLICATIONS ARE ACHIEVABLE WITHOUT (MUCH) COORDINATION

Still requires coordination-avoiding query plans!
• Appropriate merge (e.g., counter datatype)
• Atomic multi-put (e.g., RAMP)
• Nested atomic transactions (e.g., New-Order ID assignment)

[Figure: architecture diagram. Inputs: DDL with invariants (the CREATE TABLE Orders statement), application queries, STATISTICS. Components: I-confluence analysis (reporting COORDINATION COST), query planner, query executor, storage manager, lock manager]
ONGOING WORK
[Figure: later build of the same diagram, with STATIC PLAN (manual) in place of the query planner and STATISTICS no longer shown]

TPC-C New-Order
[Figure: tables warehouse, district, orders, neworders]

Pre-materialized aggregates (e.g., W_YTD=SUM(orders for warehouse))
- insert 100 into orders, +100 on the warehouse aggregate
- RAMP transaction on counter CRDT

Foreign key insert (e.g., NewOrder, Orders tables)
- insert O_ID into orders, insert O_ID into neworders
- RAMP transaction across tables

Sequence number ID assignment (i.e., D_NEXT_O_ID)
- assign new O_ID (tmp ID until commit)
- deferred atomic incrementAndGet() on commit
- rewrite FK references to point to temp unique ID
- create local index from temp unique ID to sequence ID
- ONLY SYNCH COORDINATION REQUIRED
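The deferred ID assignment can be sketched as follows. This is an illustrative model, not the prototype's code: `DistrictCounter`, `NewOrderTxn`, and the row layouts are assumptions, and a lock stands in for whatever atomic incrementAndGet the real system uses. The transaction body runs against a temporary unique ID, and the only synchronized step is the single counter increment at commit.

```python
import threading
import uuid

class DistrictCounter:
    """Stand-in for D_NEXT_O_ID; the one place that requires synchronization."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()

    def increment_and_get(self):
        with self._lock:
            self._value += 1
            return self._value

class NewOrderTxn:
    def __init__(self):
        self.tmp_id = uuid.uuid4().hex              # temp unique ID, no coordination
        self.order = {"O_ID": self.tmp_id}
        self.new_order = {"NO_O_ID": self.tmp_id}   # FK reference uses the temp ID
        self.lines = []

    def add_line(self, item, qty):
        self.lines.append({"OL_O_ID": self.tmp_id, "item": item, "qty": qty})

    def commit(self, counter, id_index):
        seq_id = counter.increment_and_get()        # deferred until commit
        id_index[self.tmp_id] = seq_id              # local index: temp ID -> sequence ID
        return seq_id

counter = DistrictCounter(start=3000)
id_index = {}
txn_1, txn_2 = NewOrderTxn(), NewOrderTxn()
txn_1.add_line("widget", 2)
txn_2.add_line("gadget", 1)
seq_1 = txn_1.commit(counter, id_index)
seq_2 = txn_2.commit(counter, id_index)
```

Readers that need the dense sequence number resolve it through `id_index`; everything else in the transaction refers to the temp ID and never waits on the counter.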

TPC-C: combine foreign keys with sequence number insert on commit...
500K txns/s

Linear Scaling via Coordination Avoidance
Coordination need not be a bottleneck (if implemented in a coordination-free manner):
- UC Berkeley database prototype, 100 EC2 CC2.8xlarge instances (thank you AWS folks! currently poor single-node performance, but unimportant if you can scale out [for the time being])
- linearizable masters
- only blocking coordination: incrementAndGet for “district next order ID” key
- CPU-bound on in-memory data; ~2500 lines Java
- 120 clients/warehouse, 5 warehouses/machine, no THINK TIME (i.e., more contention than stock configuration)

Traditional database systems suffer from coordination bottlenecks
By understanding application requirements, we can avoid coordination unless necessary
We can build systems that actually scale while providing correct behavior

Thanks!
@pbailis
http://bailis.org/
http://amplab.cs.berkeley.edu/
based on “Coordination-Avoiding Database Systems,” Bailis, Fekete, Franklin, Ghodsi, Hellerstein, Stoica, arXiv:1402.2237 http://arxiv.org/abs/1402.2237

http://pbs.cs.berkeley.edu/#demo
