THE CASE FOR
INVARIANT-BASED
CONCURRENCY CONTROL
Peter Bailis
UC Berkeley
with Alan Fekete, Mike Franklin, Ali Ghodsi, Ion Stoica, Joe Hellerstein
Slide 2
Slide 2 text
THE CASE FOR
INVARIANT-BASED
CONCURRENCY CONTROL
Peter Bailis
UC Berkeley
with Alan Fekete, Mike Franklin, Ali Ghodsi, Ion Stoica, Joe Hellerstein
CIDR 2015 Gong Show
5 January 2015, Pacific Grove, CA
Slide 5
Slide 5 text
Serializability is expensive
Slide 6
Slide 6 text
Serializability is expensive
Use weaker models instead
Slide 7
Slide 7 text
Serializability is expensive
Use weaker models instead
1975!
Slide 8
Slide 8 text
do not support
serializability
HANA
[VLDB 2014]
Slide 9
Slide 9 text
do not support
serializability
HANA
Serializability supported? [VLDB 2014]
Actian Ingres YES
Aerospike NO
Persistit NO
Clustrix NO
Greenplum YES
IBM DB2 YES
IBM Informix YES
MySQL YES
MemSQL NO
MS SQL Server YES
NuoDB NO
Oracle 11G NO
Oracle BDB YES
Oracle BDB JE YES
Postgres 9.2.2 YES*
SAP Hana NO
ScaleDB NO
VoltDB YES
Slide 10
Slide 10 text
do not support
serializability
HANA
Serializability supported? [VLDB 2014]
Actian Ingres YES
Aerospike NO
Persistit NO
Clustrix NO
Greenplum YES
IBM DB2 YES
IBM Informix YES
MySQL YES
MemSQL NO
MS SQL Server YES
NuoDB NO
Oracle 11G NO
Oracle BDB YES
Oracle BDB JE YES
Postgres 9.2.2 YES*
SAP Hana NO
ScaleDB NO
VoltDB YES
8/18 databases surveyed didn’t support serializability
15/18 used weak models by default
Slide 11
Slide 11 text
READ COMMITTED
Slide 12
Slide 12 text
READ COMMITTED
Slide 13
Slide 13 text
READ COMMITTED
G0: Write Cycles. A history H exhibits phenomenon G0 if
DSG(H) contains a directed cycle consisting entirely of
write-dependency edges.
G1a: Aborted Reads. A history H shows phenomenon G1a
if it contains an aborted transaction T1 and a committed
transaction T2 such that T2 has read some object (maybe
via a predicate) modified by T1.
G1b: Intermediate Reads. A history H shows phenomenon
G1b if it contains a committed transaction T2 that has read
a version of object x (maybe via a predicate) written by
transaction T1 that was not T1’s final modification of x.
G1c: Circular Information Flow. A history H exhibits
phenomenon G1c if DSG(H) contains a directed cycle
consisting entirely of dependency edges.
[Atul Adya’s Ph.D. thesis, 1999]
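The two cycle-based phenomena above (G0 and G1c) can be read as graph checks. The following is a minimal sketch, assuming the DSG is given as a list of (source, destination, kind) edges with kind "ww" for write-dependency, "wr" for read-dependency, and "rw" for anti-dependency. The edge encoding and the function names are illustrative assumptions, not part of the talk.

```python
from collections import defaultdict

def has_cycle(edges, allowed_kinds):
    """Return True if the graph restricted to the allowed edge kinds has a directed cycle."""
    graph = defaultdict(list)
    nodes = set()
    for src, dst, kind in edges:
        nodes.update((src, dst))
        if kind in allowed_kinds:
            graph[src].append(dst)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {n: WHITE for n in nodes}

    def visit(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:  # back edge: a cycle exists
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)

def exhibits_g0(edges):
    # G0: a cycle made only of write-dependency edges.
    return has_cycle(edges, {"ww"})

def exhibits_g1c(edges):
    # G1c: a cycle made only of dependency edges (write- and read-dependencies).
    return has_cycle(edges, {"ww", "wr"})

# Hypothetical history: T2 overwrites T1's write, and T1 reads from T2.
history = [("T1", "T2", "ww"), ("T2", "T1", "wr")]
print(exhibits_g0(history), exhibits_g1c(history))
```

Even this small check needs the full dependency graph of a history, which hints at why reasoning about these definitions by hand is difficult.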
Slide 14
Slide 14 text
READ COMMITTED
G0: Write Cycles. A history H exhibits phenomenon G0 if
DSG(H) contains a directed cycle consisting entirely of
write-dependency edges.
G1a: Aborted Reads. A history H shows phenomenon G1a
if it contains an aborted transaction T1 and a committed
transaction T2 such that T2 has read some object (maybe
via a predicate) modified by T1.
G1b: Intermediate Reads. A history H shows phenomenon
G1b if it contains a committed transaction T2 that has read
a version of object x (maybe via a predicate) written by
transaction T1 that was not T1’s final modification of x.
G1c: Circular Information Flow. A history H exhibits
phenomenon G1c if DSG(H) contains a directed cycle
consisting entirely of dependency edges.
[Atul Adya’s Ph.D. thesis, 1999]
Highly nuanced,
very technical,
sometimes
incomplete!
Slide 15
Slide 15 text
It is insane to assume users
can/should reason about
weak isolation…
Slide 16
Slide 16 text
It is insane to assume users
can/should reason about
weak isolation…
a fate worse than death
Slide 17
Slide 17 text
It is insane to assume users
can/should reason about
weak isolation…
a fate worse than death
…and yet they still use it!
Slide 19
Slide 19 text
Coordination
costs increase
with
distribution!
Slide 20
Slide 20 text
Coordination
costs increase
with
distribution!
Slide 21
Slide 21 text
Can we provide a more usable,
high-performance concurrency
control primitive?
Slide 22
Slide 22 text
Invariants:
Slide 23
Slide 23 text
Invariants:
“usernames should be unique”
Slide 24
Slide 24 text
Invariants:
“usernames should be unique”
“each patient should have an attending doctor”
Slide 25
Slide 25 text
Invariants:
“usernames should be unique”
“each patient should have an attending doctor”
“account balances should be positive”
Slide 26
Slide 26 text
Invariants:
“usernames should be unique”
“each patient should have an attending doctor”
“account balances should be positive”
1.) Are easier to reason about than weak isolation
Slide 27
Slide 27 text
Invariants:
“usernames should be unique”
“each patient should have an attending doctor”
“account balances should be positive”
1.) Are easier to reason about than weak isolation
2.) Are already specified in many applications
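The example invariants above can be read as simple predicates over database state. The sketch below is illustrative only: the dictionary layout, the field names, and the helper `violated` are assumptions made for this example, not anything specified in the talk.

```python
from collections import Counter

def usernames_unique(db):
    # "usernames should be unique"
    counts = Counter(user["username"] for user in db["users"])
    return all(count == 1 for count in counts.values())

def patients_have_attending(db):
    # "each patient should have an attending doctor"
    doctor_ids = {doctor["id"] for doctor in db["doctors"]}
    return all(p["attending_id"] in doctor_ids for p in db["patients"])

def balances_positive(db):
    # "account balances should be positive", read literally as > 0
    # (an assumption; an application might also allow a zero balance).
    return all(account["balance"] > 0 for account in db["accounts"])

INVARIANTS = {
    "usernames_unique": usernames_unique,
    "patients_have_attending": patients_have_attending,
    "balances_positive": balances_positive,
}

def violated(db):
    """Return the names of the invariants that the given state violates."""
    return [name for name, check in INVARIANTS.items() if not check(db)]

# Hypothetical database state with one duplicate username.
db = {
    "users": [{"username": "alice"}, {"username": "alice"}],
    "doctors": [{"id": 1}],
    "patients": [{"attending_id": 1}],
    "accounts": [{"balance": 10}],
}
print(violated(db))
```

Each predicate states what must hold in application terms, without reference to histories or dependency graphs.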
Slide 34
Slide 34 text
DB supported invariants today:
Foreign Key Constraints YES
Primary Key Constraints YES
Row-Level Check Constraints YES
Multi-Row Check Constraints NO
Generic ADT Invariants NO
UDF Invariants NO
Slide 35
Slide 35 text
DB supported invariants today:
Foreign Key Constraints YES
Primary Key Constraints YES
Row-Level Check Constraints YES
Multi-Row Check Constraints NO
Generic ADT Invariants NO
UDF Invariants NO
Slide 36
Slide 36 text
DB supported invariants today:
Foreign Key Constraints YES
Primary Key Constraints YES
Row-Level Check Constraints YES
Multi-Row Check Constraints NO
Generic ADT Invariants NO
UDF Invariants NO
and little support for distributing,
suggesting, or mining invariants
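As an illustration of the three supported kinds listed above, the following sketch declares a primary key, a foreign key, and a row-level check constraint using Python's built-in sqlite3 module. SQLite is chosen only for convenience here. The schema and the helper `rejected` are hypothetical and not taken from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE doctors (id INTEGER PRIMARY KEY);
    CREATE TABLE patients (
        id INTEGER PRIMARY KEY,
        attending_id INTEGER NOT NULL REFERENCES doctors(id)
    );
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance > 0)
    );
""")

def rejected(sql):
    """Run a statement and report whether a declared constraint rejected it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

conn.execute("INSERT INTO doctors (id) VALUES (1)")

results = {
    "duplicate primary key": rejected("INSERT INTO doctors (id) VALUES (1)"),
    "dangling foreign key": rejected("INSERT INTO patients (id, attending_id) VALUES (1, 99)"),
    "failed row-level check": rejected("INSERT INTO accounts (id, balance) VALUES (1, -5)"),
    "valid patient row": rejected("INSERT INTO patients (id, attending_id) VALUES (2, 1)"),
}
for case, was_rejected in results.items():
    print(case, "rejected" if was_rejected else "accepted")
```

An invariant that spans several rows, such as a bound on the sum of all balances, has no equally direct declaration in this sketch, which matches the NO entries in the list.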
Slide 37
Slide 37 text
Invariants:
1.) Are easier to reason about than weak isolation
2.) Are already specified in many applications
3.) Should be a first-class database primitive
4.) Enable more efficient systems design
Slide 38
Slide 38 text
Invariants:
1.) Are easier to reason about than weak isolation
2.) Are already specified in many applications
3.) Should be a first-class database primitive
4.) Enable more efficient systems design
Slide 39
Slide 39 text
Invariants:
1.) Are easier to reason about than weak isolation
2.) Are already specified in many applications
3.) Should be a first-class database primitive
4.) Enable more efficient systems design
Slide 41
Slide 41 text
TPC-C
Scale to over 25x prior best on New-Order
6-11x faster than ACID/serializability on New-Order
[Figure: total throughput (txn/s) and per-server throughput (txn/s/server) vs. number of servers (0 to 200)]
[Figure: throughput (txns/s) vs. number of warehouses (8 to 64); series: Coordination-Avoiding, Serializable (2PL)]
Slide 42
Slide 42 text
Invariants:
1.) Are easier to reason about than weak isolation
2.) Are already specified in many applications
3.) Should be a first-class database primitive
4.) Enable more efficient systems design
Slide 43
Slide 43 text
Invariants:
1.) Are easier to reason about than weak isolation
2.) Are already specified in many applications
3.) Should be a first-class database primitive
4.) Enable more efficient systems design
We can do so much better than weak isolation
Slide 44
Slide 44 text
Image Credits:
world by Wayne Tyler Sall
surprised by Julian Deveaux
database by Austin Condiff
man by Simon Child
from the Noun Project
Creative Commons Attribution (CC BY 3.0)