Slide 1

Slide 1 text

Sumedh Pathak, Co-Founder & VP Engineering, Citus Data DataEngConf San Francisco 2018 Scaling the Relational Database for the Cloud Age @moss_toss | @citusdata

Slide 2

Slide 2 text

About Me • Co-Founder & VP Engineering at Citus Data • Amazon Shopping Cart
 (former) • Amazon Supply Chain & Order Fulfillment (former) • Stanford Computer Science Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

Why RDBMS? Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 5

Slide 5 text

Streaming Storage Map Reduce NoSQL SQL Database SELECT ... Application Application Sumedh Pathak | Citus Data | DataEngConf 2018 Because your architecture could be simpler

Slide 6

Slide 6 text

An RDBMS is a general-purpose
 data platform Sumedh Pathak | Citus Data | DataEngConf 2018 Fast writes Real-time & bulk High throughput High concurrency Data consistency Query optimizers

Slide 7

Slide 7 text

PostgreSQL MySQL MongoDB SQL Server + Oracle Source: Hacker News, https://news.ycombinator.com
 Startups Are Choosing Postgres % database job posts mentioning each database, across 20K+ job posts

Slide 8

Slide 8 text

Data Trends in the “Cloud Age” - Explosion of Data - Higher Volume - Higher Velocity ** SCALE **

Slide 9

Slide 9 text

but RDBMS’s 
 don’t scale! Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 10

Slide 10 text

but RDBMS’s don’t scale RDBMS’s are hard to scale Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 11

Slide 11 text

What exactly needs to Scale? - Tables (Data) - Partitioning, Co-location, Reference Tables - SQL (Reads) - How do we express and optimize distributed SQL - Transactions (Writes) - Cross Shard updates/deletes, Global Atomic Transactions Sumedh Pathak | Citus Data | DataEngConf 2018 1 2 3

Slide 12

Slide 12 text

Scaling Tables Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 13

Slide 13 text

Data Partitioning - Pick a column - Date - Id (customer_id, cart_id) - Pick a method - Hash - Range

Slide 14

Slide 14 text

Partition data across nodes R1 R2 R3 R4 R5 R6 R7 Coordinator Node Worker Nodes Shards Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 15

Slide 15 text

Worker → RDBMS, Shard → Table

Slide 16

Slide 16 text

Reference Tables N1 N1 N1 N1 Copies of same table Sumedh Pathak | Citus Data | DataEngConf 2018 Coordinator Node Worker Nodes

Slide 17

Slide 17 text

Co-Location R1 R2 R3 R4 S1 S2 S3 S4 Explicit Co-Location API.
 E.g. Partition by Tenant Sumedh Pathak | Citus Data | DataEngConf 2018 Coordinator Node Worker Nodes

Slide 18

Slide 18 text

What about Foreign Keys?

Slide 19

Slide 19 text

The key to scaling tables... - Use relational databases as a building block - Understand semantics of application—to be smart about partitioning - Multi-tenant applications

Slide 20

Slide 20 text

Scaling SQL

Slide 21

Slide 21 text

FROM table R SELECT x Project x (R) ! R’ WHERE f(x) Filter f(x) (R) ! R’ … JOIN … R × S ! R’ SQL ↔ Relational Algebra Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 22

Slide 22 text

FROM sharded_table Collect(R 1 ,R 2 ,...) ! R Distributed Relational Algebra R1 R2 C Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 23

Slide 23 text

Commutativity A + B = B + A Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 24

Slide 24 text

Projectx (Collect(R1 ,R2 ,...)) = Collect(Projectx (R1 ), Projectx (R2 )...) Commutative property R1 R2 C R1 R2 C Px Px Px Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 25

Slide 25 text

Distributivity A*(B + C) = A*B + A*C Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 26

Slide 26 text

Collect(R1 ,R2 ,...) x Collect(S1 ,S2 ,...) = Collect(R1 × S1 ,R2 × S2, ...) Distributive property R1 R2 C × C S1 S2 R1 R2 C × S1 S2 × X = Join Operator Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 27

Slide 27 text

Associativity A + B + C = (A + B) + C = A + (B + C) Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 28

Slide 28 text

SUM(x)(Collect(R1 ,R2 ,...)) = SUM(Collect(SUM(R1 ), SUM(R2 )...)) Associative property R1 R2 C R1 R2 C Sumx Sumx Sumx Sumedh Pathak | Citus Data | DataEngConf 2018 Sumx

Slide 29

Slide 29 text

SELECT sum(price) FROM orders, nation WHERE orders.nation = nation.name AND orders.date >= '2012-01-01' AND nation.region = 'Asia'; Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 30

Slide 30 text

Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 31

Slide 31 text

Sumedh Pathak | Citus Data | DataEngConf 2018 Volcano style processing Data flows from
 bottom
 to top

Slide 32

Slide 32 text

Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 33

Slide 33 text

Sumedh Pathak | Citus Data | DataEngConf 2018 Parallelize Aggregate Push Joins & Filters below collect. Run in parallel across all nodes Filters & Projections done before Join

Slide 34

Slide 34 text

Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 35

Slide 35 text

SELECT sum(intermediate_col) 
 FROM ; SELECT sum(price) FROM orders_2 JOIN nation_2
 ON (orders_2.name = nation_2.name) WHERE orders_2.date >= '2017-01-01' AND nation_2.region = 'Asia'; SELECT sum(price) FROM orders_2 JOIN nation_1
 ON (orders_2.name = nation_1.name) WHERE orders_2.date >= '2017-01-01' AND nation_2.region = 'Asia'; Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 36

Slide 36 text

Executing Distributed SQL SQL database orders_1 nation_1 orders nation SELECT sum(price) FROM orders_2 o JOIN nation_1 ON (o.name = n.name) WHERE o.date >= '2017-01-01' AND n.region = 'Asia'; SELECT sum(price) FROM orders_1 o JOIN nation_1 ON (o.name = n.name) WHERE o.date >= '2017-01-01' AND n.region = 'Asia'; SELECT sum(price) FROM ; SQL database orders_2 nation_1

Slide 37

Slide 37 text

The key to scaling SQL... - New relational algebra operators for distributed processing - Relational Algebra Properties to optimize tree: Commutativity, Associativity, & Distributivity - Map / Reduce operators Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 38

Slide 38 text

Scaling Transactions

Slide 39

Slide 39 text

Money Transfer, as an example BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 40

Slide 40 text

A1 A2 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; A3 A4 Coordinator Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 41

Slide 41 text

A1 A2 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; A3 A4 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; Sumedh Pathak | Citus Data | DataEngConf 2018 Coordinator

Slide 42

Slide 42 text

A1 A2 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; A3 A4 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; COMMIT; Sumedh Pathak | Citus Data | DataEngConf 2018 Coordinator

Slide 43

Slide 43 text

A1 A2 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; A3 A4 Coordinator BEGIN; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; Sumedh Pathak | Citus Data | DataEngConf 2018 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’;

Slide 44

Slide 44 text

A1 A2 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; A3 A4 BEGIN; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT Sumedh Pathak | Citus Data | DataEngConf 2018 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; Coordinator

Slide 45

Slide 45 text

A1 A2 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; A3 A4 PREPARE TRANSACTION ‘citus_...98’; PREPARE TRANSACTION ‘citus_...98’; Sumedh Pathak | Citus Data | DataEngConf 2018 Coordinator

Slide 46

Slide 46 text

What happens during PREPARE? State of transaction stored on a durable store Locks are
 maintained & Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 47

Slide 47 text

A1 A2 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; A3 A4 Coordinator PREPARE TRANSACTION ‘citus_...98’; ROLLBACK TRANSACTION ‘citus_... 98’; Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 48

Slide 48 text

A1 A2 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; A3 A4 PREPARE TRANSACTION ‘citus_...98’; PREPARE TRANSACTION ‘citus_...98’; Sumedh Pathak | Citus Data | DataEngConf 2018 Coordinator

Slide 49

Slide 49 text

A1 A2 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; A3 A4 COMMIT PREPARED ‘citus_... 98’; COMMIT PREPARED ‘citus_... 98’; Sumedh Pathak | Citus Data | DataEngConf 2018 Coordinator

Slide 50

Slide 50 text

A1 A2 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; A3 A4 COMMIT PREPARED ‘citus_... 98’; COMMIT PREPARED ‘citus_... 98’; Sumedh Pathak | Citus Data | DataEngConf 2018 Coordinator

Slide 51

Slide 51 text

A1 A2 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; A3 A4 COMMIT PREPARED ‘citus_... 98’; COMMIT PREPARED ‘citus_... 98’; Sumedh Pathak | Citus Data | DataEngConf 2018 Coordinator

Slide 52

Slide 52 text

But.... Deadlocks!

Slide 53

Slide 53 text

Example // SESSION 1 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; (LOCK on ROW with id ‘ALICE’) // SESSION 2 Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 54

Slide 54 text

Example // SESSION 1 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; (LOCK on ROW with id ‘ALICE’) // SESSION 2 BEGIN; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; (LOCK on ROW with id ‘BOB’) Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 55

Slide 55 text

Example // SESSION 1 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; (LOCK on ROW with id ‘ALICE’) UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; // SESSION 2 BEGIN; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; (LOCK on ROW with id ‘BOB’) Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 56

Slide 56 text

Example // SESSION 1 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; (LOCK on ROW with id ‘ALICE’) UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; // SESSION 2 BEGIN; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; (LOCK on ROW with id ‘BOB’) UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’ Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 57

Slide 57 text

How do Relational DB’s solve this? S1 S2 - Construct a Directed Graph - Each node is a session/transaction - Edge represents a wait on a lock Waiting on ‘Bob’ Waiting on ‘Alice’

Slide 58

Slide 58 text

S1 S2 S1 S2 S2 S1 S2 Waits on S1 S1 Waits on S2 Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 59

Slide 59 text

S1 S2 S1 S2 S2 S1 S2 Waits on S1 S1 Waits on S2 Distributed
 Deadlock Detector S1 S2 S2 S1 Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 60

Slide 60 text

keys to scaling transactions - 2PC to ensure atomic transactions across nodes - Deadlock Detection—to scale complex & concurrent transaction workloads - MVCC - Failure Handling 4 Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 61

Slide 61 text

It’s 2018. Distributed can be Relational - Scale tables—via sharding - Scale SQL—via distributed relational algebra - Scale transactions—via 2PC & Deadlock Detection Sumedh Pathak | Citus Data | DataEngConf 2018 Now, how do we implement all of this?

Slide 62

Slide 62 text

Postgres Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 63

Slide 63 text

PostgreSQL Planner Executor Custom scan Commit / abort Extension (.so) Access methods Foreign tables Functions ... ... ... ... ... ... ... CREATE EXTENSION ... Sumedh Pathak | Citus Data | DataEngConf 2018

Slide 64

Slide 64 text

PostgreSQL PostgreSQL PostgreSQL shards shards shard shards shards shard SELECT … FROM distributed_table … SELECT … FROM shard… SELECT … FROM shard… Citus

Slide 65

Slide 65 text

PostgreSQL Citus PostGIS PL/Python JSONB 2PC Replication ... Sequences Indexes Full-Text Search Transactions … dblink Sumedh Pathak | Citus Data | DataEngConf 2018 Rich ecosystem of tooling, data types,
 & extensions in PostgreSQL

Slide 66

Slide 66 text

66 PostgreSQL can be
 extended into an all-purpose distributed RDBMS

Slide 67

Slide 67 text

67 I believe the future of distributed databases is relational.

Slide 68

Slide 68 text

Thank you! Sumedh Pathak [email protected] @moss_toss | @citusdata | citusdata.com DataEngConf 2018