Sumedh Pathak, Co-Founder & VP Engineering, Citus Data
DataEngConf San Francisco 2018
Scaling the Relational
Database for the Cloud Age
@moss_toss | @citusdata
Slide 2
Slide 2 text
About Me
• Co-Founder & VP Engineering at Citus Data
• Amazon Shopping Cart
(former)
• Amazon Supply Chain & Order Fulfillment (former)
• Stanford Computer Science
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 3
Slide 3 text
No content
Slide 4
Slide 4 text
Why
RDBMS?
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 5
Slide 5 text
Streaming Storage
Map
Reduce
NoSQL
SQL
Database
SELECT ...
Application
Application
Sumedh Pathak | Citus Data | DataEngConf 2018
Because your architecture could be simpler
Slide 6
Slide 6 text
An RDBMS is a
general-purpose
data platform
Sumedh Pathak | Citus Data | DataEngConf 2018
Fast writes
Real-time & bulk
High throughput
High concurrency
Data consistency
Query optimizers
Slide 7
Slide 7 text
PostgreSQL
MySQL
MongoDB
SQL Server +
Oracle
Source: Hacker News, https://news.ycombinator.com
Startups Are Choosing Postgres
% database job posts mentioning each database, across 20K+ job posts
Slide 8
Slide 8 text
Data Trends in the “Cloud Age”
- Explosion of Data
- Higher Volume
- Higher Velocity
** SCALE **
Slide 9
Slide 9 text
but RDBMS’s
don’t scale!
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 10
Slide 10 text
but RDBMS’s don’t scale
RDBMS’s are hard to scale
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 11
Slide 11 text
What exactly needs to Scale?
- Tables (Data)
- Partitioning, Co-location, Reference Tables
- SQL (Reads)
- How do we express and optimize distributed SQL
- Transactions (Writes)
- Cross Shard updates/deletes, Global Atomic Transactions
Sumedh Pathak | Citus Data | DataEngConf 2018
1
2
3
Slide 12
Slide 12 text
Scaling
Tables
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 13
Slide 13 text
Data Partitioning
- Pick a column
- Date
- Id (customer_id, cart_id)
- Pick a method
- Hash
- Range
Slide 14
Slide 14 text
Partition data across nodes
R1
R2
R3
R4
R5
R6
R7
Coordinator
Node
Worker
Nodes
Shards
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 15
Slide 15 text
Worker → RDBMS, Shard → Table
Slide 16
Slide 16 text
Reference Tables
N1
N1
N1
N1
Copies of same table
Sumedh Pathak | Citus Data | DataEngConf 2018
Coordinator
Node
Worker
Nodes
Slide 17
Slide 17 text
Co-Location
R1
R2
R3
R4
S1
S2
S3
S4
Explicit Co-Location API.
E.g. Partition by Tenant
Sumedh Pathak | Citus Data | DataEngConf 2018
Coordinator
Node
Worker
Nodes
Slide 18
Slide 18 text
What about Foreign Keys?
Slide 19
Slide 19 text
The key to scaling tables...
- Use relational databases as a building block
- Understand semantics of application—to be smart about
partitioning
- Multi-tenant applications
Slide 20
Slide 20 text
Scaling
SQL
Slide 21
Slide 21 text
FROM table R
SELECT x Project
x
(R) ! R’
WHERE f(x) Filter
f(x)
(R) ! R’
… JOIN … R × S ! R’
SQL ↔ Relational Algebra
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 22
Slide 22 text
FROM sharded_table Collect(R
1
,R
2
,...) ! R
Distributed Relational Algebra
R1
R2
C
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 23
Slide 23 text
Commutativity
A + B = B + A
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 24
Slide 24 text
Projectx
(Collect(R1
,R2
,...)) = Collect(Projectx
(R1
), Projectx
(R2
)...)
Commutative property
R1
R2
C
R1
R2
C
Px
Px
Px
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 25
Slide 25 text
Distributivity
A*(B + C) = A*B + A*C
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 26
Slide 26 text
Collect(R1
,R2
,...) x Collect(S1
,S2
,...) = Collect(R1
× S1
,R2
× S2,
...)
Distributive property
R1
R2
C
×
C
S1
S2
R1
R2
C
×
S1
S2
×
X = Join Operator Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 27
Slide 27 text
Associativity
A + B + C =
(A + B) + C =
A + (B + C)
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 28
Slide 28 text
SUM(x)(Collect(R1
,R2
,...)) = SUM(Collect(SUM(R1
), SUM(R2
)...))
Associative property
R1
R2
C
R1
R2
C
Sumx
Sumx
Sumx
Sumedh Pathak | Citus Data | DataEngConf 2018
Sumx
Slide 29
Slide 29 text
SELECT sum(price) FROM orders,
nation
WHERE orders.nation = nation.name
AND
orders.date >= '2012-01-01'
AND
nation.region = 'Asia';
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 30
Slide 30 text
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 31
Slide 31 text
Sumedh Pathak | Citus Data | DataEngConf 2018
Volcano style processing
Data flows
from
bottom
to top
Slide 32
Slide 32 text
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 33
Slide 33 text
Sumedh Pathak | Citus Data | DataEngConf 2018
Parallelize
Aggregate
Push Joins & Filters
below collect. Run in
parallel across all
nodes
Filters & Projections
done before Join
Slide 34
Slide 34 text
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 35
Slide 35 text
SELECT sum(intermediate_col)
FROM ;
SELECT sum(price)
FROM orders_2 JOIN nation_2
ON (orders_2.name =
nation_2.name)
WHERE
orders_2.date >= '2017-01-01'
AND
nation_2.region = 'Asia';
SELECT sum(price)
FROM orders_2 JOIN nation_1
ON (orders_2.name =
nation_1.name)
WHERE
orders_2.date >= '2017-01-01'
AND
nation_2.region = 'Asia';
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 36
Slide 36 text
Executing Distributed SQL
SQL database
orders_1
nation_1
orders
nation
SELECT sum(price)
FROM orders_2 o JOIN nation_1
ON (o.name = n.name)
WHERE o.date >= '2017-01-01'
AND n.region = 'Asia';
SELECT sum(price)
FROM orders_1 o JOIN nation_1
ON (o.name = n.name)
WHERE o.date >= '2017-01-01'
AND n.region = 'Asia';
SELECT sum(price) FROM
;
SQL database
orders_2
nation_1
Slide 37
Slide 37 text
The key to scaling SQL...
- New relational algebra operators for
distributed processing
- Relational Algebra Properties to optimize
tree: Commutativity, Associativity, &
Distributivity
- Map / Reduce operators
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 38
Slide 38 text
Scaling
Transactions
Slide 39
Slide 39 text
Money Transfer, as an example
BEGIN;
UPDATE accounts SET balance =
balance - 100 WHERE id = ‘ALICE’;
UPDATE accounts SET balance =
balance + 100 WHERE id = ‘BOB’;
COMMIT;
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 40
Slide 40 text
A1
A2
BEGIN;
UPDATE accounts SET balance =
balance - 100 WHERE id = ‘ALICE’;
UPDATE accounts SET balance =
balance + 100 WHERE id = ‘BOB’;
COMMIT;
A3
A4
Coordinator
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 41
Slide 41 text
A1
A2
BEGIN;
UPDATE accounts SET balance =
balance - 100 WHERE id = ‘ALICE’;
UPDATE accounts SET balance =
balance + 100 WHERE id = ‘BOB’;
COMMIT;
A3
A4
BEGIN;
UPDATE accounts SET balance =
balance - 100 WHERE id = ‘ALICE’;
Sumedh Pathak | Citus Data | DataEngConf 2018
Coordinator
Slide 42
Slide 42 text
A1
A2
BEGIN;
UPDATE accounts SET balance =
balance - 100 WHERE id = ‘ALICE’;
UPDATE accounts SET balance =
balance + 100 WHERE id = ‘BOB’;
COMMIT;
A3
A4
BEGIN;
UPDATE accounts SET balance =
balance - 100 WHERE id = ‘ALICE’;
COMMIT;
Sumedh Pathak | Citus Data | DataEngConf 2018
Coordinator
Slide 43
Slide 43 text
A1
A2
BEGIN;
UPDATE accounts SET balance =
balance - 100 WHERE id = ‘ALICE’;
UPDATE accounts SET balance =
balance + 100 WHERE id = ‘BOB’;
COMMIT;
A3
A4
Coordinator
BEGIN;
UPDATE accounts SET balance =
balance + 100 WHERE id = ‘BOB’;
Sumedh Pathak | Citus Data | DataEngConf 2018
BEGIN;
UPDATE accounts SET balance =
balance - 100 WHERE id = ‘ALICE’;
Slide 44
Slide 44 text
A1
A2
BEGIN;
UPDATE accounts SET balance =
balance - 100 WHERE id = ‘ALICE’;
UPDATE accounts SET balance =
balance + 100 WHERE id = ‘BOB’;
COMMIT;
A3
A4
BEGIN;
UPDATE accounts SET balance =
balance + 100 WHERE id =
‘BOB’;
COMMIT
Sumedh Pathak | Citus Data | DataEngConf 2018
BEGIN;
UPDATE accounts SET balance =
balance - 100 WHERE id = ‘ALICE’;
Coordinator
Slide 45
Slide 45 text
A1
A2
BEGIN;
UPDATE accounts SET balance =
balance - 100 WHERE id = ‘ALICE’;
UPDATE accounts SET balance =
balance + 100 WHERE id = ‘BOB’;
COMMIT;
A3
A4
PREPARE TRANSACTION ‘citus_...98’; PREPARE TRANSACTION ‘citus_...98’;
Sumedh Pathak | Citus Data | DataEngConf 2018
Coordinator
Slide 46
Slide 46 text
What happens during PREPARE?
State of transaction stored
on a durable store
Locks are
maintained
&
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 47
Slide 47 text
A1
A2
BEGIN;
UPDATE accounts SET balance =
balance - 100 WHERE id = ‘ALICE’;
UPDATE accounts SET balance =
balance + 100 WHERE id = ‘BOB’;
COMMIT;
A3
A4
Coordinator
PREPARE TRANSACTION ‘citus_...98’;
ROLLBACK TRANSACTION ‘citus_...
98’;
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 48
Slide 48 text
A1
A2
BEGIN;
UPDATE accounts SET balance =
balance - 100 WHERE id = ‘ALICE’;
UPDATE accounts SET balance =
balance + 100 WHERE id = ‘BOB’;
COMMIT;
A3
A4
PREPARE TRANSACTION ‘citus_...98’; PREPARE TRANSACTION ‘citus_...98’;
Sumedh Pathak | Citus Data | DataEngConf 2018
Coordinator
Slide 49
Slide 49 text
A1
A2
BEGIN;
UPDATE accounts SET balance =
balance - 100 WHERE id = ‘ALICE’;
UPDATE accounts SET balance =
balance + 100 WHERE id = ‘BOB’;
COMMIT;
A3
A4
COMMIT PREPARED ‘citus_...
98’;
COMMIT PREPARED ‘citus_...
98’;
Sumedh Pathak | Citus Data | DataEngConf 2018
Coordinator
Slide 50
Slide 50 text
A1
A2
BEGIN;
UPDATE accounts SET balance =
balance - 100 WHERE id = ‘ALICE’;
UPDATE accounts SET balance =
balance + 100 WHERE id = ‘BOB’;
COMMIT;
A3
A4
COMMIT PREPARED ‘citus_...
98’;
COMMIT PREPARED ‘citus_...
98’;
Sumedh Pathak | Citus Data | DataEngConf 2018
Coordinator
Slide 51
Slide 51 text
A1
A2
BEGIN;
UPDATE accounts SET balance =
balance - 100 WHERE id = ‘ALICE’;
UPDATE accounts SET balance =
balance + 100 WHERE id = ‘BOB’;
COMMIT;
A3
A4
COMMIT PREPARED ‘citus_...
98’;
COMMIT PREPARED ‘citus_...
98’;
Sumedh Pathak | Citus Data | DataEngConf 2018
Coordinator
Slide 52
Slide 52 text
But....
Deadlocks!
Slide 53
Slide 53 text
Example
// SESSION 1
BEGIN;
UPDATE accounts SET
balance = balance - 100
WHERE id = ‘ALICE’;
(LOCK on ROW with id
‘ALICE’)
// SESSION 2
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 54
Slide 54 text
Example
// SESSION 1
BEGIN;
UPDATE accounts SET balance = balance - 100
WHERE id = ‘ALICE’;
(LOCK on ROW with id ‘ALICE’)
// SESSION 2
BEGIN;
UPDATE accounts SET
balance = balance + 100
WHERE id = ‘BOB’;
(LOCK on ROW with id
‘BOB’)
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 55
Slide 55 text
Example
// SESSION 1
BEGIN;
UPDATE accounts SET balance = balance - 100
WHERE id = ‘ALICE’;
(LOCK on ROW with id ‘ALICE’)
UPDATE accounts SET
balance = balance + 100
WHERE id = ‘BOB’;
// SESSION 2
BEGIN;
UPDATE accounts SET balance = balance + 100
WHERE id = ‘BOB’;
(LOCK on ROW with id ‘BOB’)
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 56
Slide 56 text
Example
// SESSION 1
BEGIN;
UPDATE accounts SET balance = balance - 100
WHERE id = ‘ALICE’;
(LOCK on ROW with id ‘ALICE’)
UPDATE accounts SET balance = balance +
100 WHERE id = ‘BOB’;
// SESSION 2
BEGIN;
UPDATE accounts SET balance = balance + 100
WHERE id = ‘BOB’;
(LOCK on ROW with id ‘BOB’)
UPDATE accounts SET
balance = balance - 100
WHERE id = ‘ALICE’
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 57
Slide 57 text
How do Relational DB’s solve this?
S1
S2
- Construct a
Directed Graph
- Each node is a
session/transaction
- Edge represents a
wait on a lock
Waiting
on ‘Bob’
Waiting
on ‘Alice’
Slide 58
Slide 58 text
S1
S2
S1 S2
S2 S1
S2 Waits on S1 S1 Waits on S2
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 59
Slide 59 text
S1
S2
S1 S2
S2 S1
S2 Waits on S1 S1 Waits on S2
Distributed
Deadlock
Detector
S1
S2 S2
S1
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 60
Slide 60 text
keys to scaling transactions
- 2PC to ensure atomic transactions across nodes
- Deadlock Detection—to scale complex & concurrent
transaction workloads
- MVCC
- Failure Handling
4
Sumedh Pathak | Citus Data | DataEngConf 2018
Slide 61
Slide 61 text
It’s 2018. Distributed can be Relational
- Scale tables—via sharding
- Scale SQL—via distributed relational algebra
- Scale transactions—via 2PC & Deadlock Detection
Sumedh Pathak | Citus Data | DataEngConf 2018
Now, how do we implement all of this?
Slide 62
Slide 62 text
Postgres
Sumedh Pathak | Citus Data | DataEngConf 2018