Scaling the Relational Database for the Cloud Age | DataEngConf SF 2018 | Sumedh Pathak

Scaling the Relational Database for the Cloud Age | DataEngConf SF 2018 | Sumedh Pathak

Years ago when working at Amazon on shopping cart infrastructure and the precursor to DynamoDB, my co-founder and I realized that while distributed key value stores were useful for a few use-cases, we missed many of the benefits of relational databases: transactions, joins, and the power of the lingua franca of RDBMS’s: SQL. So we challenged ourselves to modernize the traditional relational database, to take a robust open source relational database and transform it into a distributed database.

This talk is about my team’s journey to create a more modern relational database. I’ll talk about the distributed systems problems we had to solve in order to scale out the Postgres open source database, in order to achieve parallelism and a concomitant increase in performance. I'll describe the architecture of the distributed query planner; how we extend traditional relational algebra operators to plan distributed queries and scale reads. I’ll also describe distributed deadlock detection, and how that enabled us to scale out transactions spanning multiple machines.

024d6a0dd14fb31c804969a57a06dfbe?s=128

Citus Data

April 17, 2018
Tweet

Transcript

  1. Sumedh Pathak, Co-Founder & VP Engineering, Citus Data DataEngConf San

    Francisco 2018 Scaling the Relational Database for the Cloud Age @moss_toss | @citusdata
  2. About Me • Co-Founder & VP Engineering at Citus Data

    • Amazon Shopping Cart
 (former) • Amazon Supply Chain & Order Fulfillment (former) • Stanford Computer Science Sumedh Pathak | Citus Data | DataEngConf 2018
  3. None
  4. Why RDBMS? Sumedh Pathak | Citus Data | DataEngConf 2018

  5. Streaming Storage Map Reduce NoSQL SQL Database SELECT ... Application

    Application Sumedh Pathak | Citus Data | DataEngConf 2018 Because your architecture could be simpler
  6. An RDBMS is a general-purpose
 data platform Sumedh Pathak |

    Citus Data | DataEngConf 2018 Fast writes Real-time & bulk High throughput High concurrency Data consistency Query optimizers
  7. PostgreSQL MySQL MongoDB SQL Server + Oracle Source: Hacker News,

    https://news.ycombinator.com
 Startups Are Choosing Postgres % database job posts mentioning each database, across 20K+ job posts
  8. Data Trends in the “Cloud Age” - Explosion of Data

    - Higher Volume - Higher Velocity ** SCALE **
  9. but RDBMS’s 
 don’t scale! Sumedh Pathak | Citus Data

    | DataEngConf 2018
  10. but RDBMS’s don’t scale RDBMS’s are hard to scale Sumedh

    Pathak | Citus Data | DataEngConf 2018
  11. What exactly needs to Scale? - Tables (Data) - Partitioning,

    Co-location, Reference Tables - SQL (Reads) - How do we express and optimize distributed SQL - Transactions (Writes) - Cross Shard updates/deletes, Global Atomic Transactions Sumedh Pathak | Citus Data | DataEngConf 2018 1 2 3
  12. Scaling Tables Sumedh Pathak | Citus Data | DataEngConf 2018

  13. Data Partitioning - Pick a column - Date - Id

    (customer_id, cart_id) - Pick a method - Hash - Range
  14. Partition data across nodes R1 R2 R3 R4 R5 R6

    R7 Coordinator Node Worker Nodes Shards Sumedh Pathak | Citus Data | DataEngConf 2018
  15. Worker → RDBMS, Shard → Table

  16. Reference Tables N1 N1 N1 N1 Copies of same table

    Sumedh Pathak | Citus Data | DataEngConf 2018 Coordinator Node Worker Nodes
  17. Co-Location R1 R2 R3 R4 S1 S2 S3 S4 Explicit

    Co-Location API.
 E.g. Partition by Tenant Sumedh Pathak | Citus Data | DataEngConf 2018 Coordinator Node Worker Nodes
  18. What about Foreign Keys?

  19. The key to scaling tables... - Use relational databases as

    a building block - Understand semantics of application—to be smart about partitioning - Multi-tenant applications
  20. Scaling SQL

  21. FROM table R SELECT x Project x (R) ! R’

    WHERE f(x) Filter f(x) (R) ! R’ … JOIN … R × S ! R’ SQL ↔ Relational Algebra Sumedh Pathak | Citus Data | DataEngConf 2018
  22. FROM sharded_table Collect(R 1 ,R 2 ,...) ! R Distributed

    Relational Algebra R1 R2 C Sumedh Pathak | Citus Data | DataEngConf 2018
  23. Commutativity A + B = B + A Sumedh Pathak

    | Citus Data | DataEngConf 2018
  24. Projectx (Collect(R1 ,R2 ,...)) = Collect(Projectx (R1 ), Projectx (R2

    )...) Commutative property R1 R2 C R1 R2 C Px Px Px Sumedh Pathak | Citus Data | DataEngConf 2018
  25. Distributivity A*(B + C) = A*B + A*C Sumedh Pathak

    | Citus Data | DataEngConf 2018
  26. Collect(R1 ,R2 ,...) x Collect(S1 ,S2 ,...) = Collect(R1 ×

    S1 ,R2 × S2, ...) Distributive property R1 R2 C × C S1 S2 R1 R2 C × S1 S2 × X = Join Operator Sumedh Pathak | Citus Data | DataEngConf 2018
  27. Associativity A + B + C = (A + B)

    + C = A + (B + C) Sumedh Pathak | Citus Data | DataEngConf 2018
  28. SUM(x)(Collect(R1 ,R2 ,...)) = SUM(Collect(SUM(R1 ), SUM(R2 )...)) Associative property

    R1 R2 C R1 R2 C Sumx Sumx Sumx Sumedh Pathak | Citus Data | DataEngConf 2018 Sumx
  29. SELECT sum(price) FROM orders, nation WHERE orders.nation = nation.name AND

    orders.date >= '2012-01-01' AND nation.region = 'Asia'; Sumedh Pathak | Citus Data | DataEngConf 2018
  30. Sumedh Pathak | Citus Data | DataEngConf 2018

  31. Sumedh Pathak | Citus Data | DataEngConf 2018 Volcano style

    processing Data flows from
 bottom
 to top
  32. Sumedh Pathak | Citus Data | DataEngConf 2018

  33. Sumedh Pathak | Citus Data | DataEngConf 2018 Parallelize Aggregate

    Push Joins & Filters below collect. Run in parallel across all nodes Filters & Projections done before Join
  34. Sumedh Pathak | Citus Data | DataEngConf 2018

  35. SELECT sum(intermediate_col) 
 FROM <concatenated results>; SELECT sum(price) FROM orders_2

    JOIN nation_2
 ON (orders_2.name = nation_2.name) WHERE orders_2.date >= '2017-01-01' AND nation_2.region = 'Asia'; SELECT sum(price) FROM orders_2 JOIN nation_1
 ON (orders_2.name = nation_1.name) WHERE orders_2.date >= '2017-01-01' AND nation_2.region = 'Asia'; Sumedh Pathak | Citus Data | DataEngConf 2018
  36. Executing Distributed SQL SQL database orders_1 nation_1 orders nation SELECT

    sum(price) FROM orders_2 o JOIN nation_1 ON (o.name = n.name) WHERE o.date >= '2017-01-01' AND n.region = 'Asia'; SELECT sum(price) FROM orders_1 o JOIN nation_1 ON (o.name = n.name) WHERE o.date >= '2017-01-01' AND n.region = 'Asia'; SELECT sum(price) FROM <results>; SQL database orders_2 nation_1
  37. The key to scaling SQL... - New relational algebra operators

    for distributed processing - Relational Algebra Properties to optimize tree: Commutativity, Associativity, & Distributivity - Map / Reduce operators Sumedh Pathak | Citus Data | DataEngConf 2018
  38. Scaling Transactions

  39. Money Transfer, as an example BEGIN; UPDATE accounts SET balance

    = balance - 100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; Sumedh Pathak | Citus Data | DataEngConf 2018
  40. A1 A2 BEGIN; UPDATE accounts SET balance = balance -

    100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; A3 A4 Coordinator Sumedh Pathak | Citus Data | DataEngConf 2018
  41. A1 A2 BEGIN; UPDATE accounts SET balance = balance -

    100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; A3 A4 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; Sumedh Pathak | Citus Data | DataEngConf 2018 Coordinator
  42. A1 A2 BEGIN; UPDATE accounts SET balance = balance -

    100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; A3 A4 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; COMMIT; Sumedh Pathak | Citus Data | DataEngConf 2018 Coordinator
  43. A1 A2 BEGIN; UPDATE accounts SET balance = balance -

    100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; A3 A4 Coordinator BEGIN; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; Sumedh Pathak | Citus Data | DataEngConf 2018 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’;
  44. A1 A2 BEGIN; UPDATE accounts SET balance = balance -

    100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; A3 A4 BEGIN; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT Sumedh Pathak | Citus Data | DataEngConf 2018 BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’; Coordinator
  45. A1 A2 BEGIN; UPDATE accounts SET balance = balance -

    100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; A3 A4 PREPARE TRANSACTION ‘citus_...98’; PREPARE TRANSACTION ‘citus_...98’; Sumedh Pathak | Citus Data | DataEngConf 2018 Coordinator
  46. What happens during PREPARE? State of transaction stored on a

    durable store Locks are
 maintained & Sumedh Pathak | Citus Data | DataEngConf 2018
  47. A1 A2 BEGIN; UPDATE accounts SET balance = balance -

    100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; A3 A4 Coordinator PREPARE TRANSACTION ‘citus_...98’; ROLLBACK TRANSACTION ‘citus_... 98’; Sumedh Pathak | Citus Data | DataEngConf 2018
  48. A1 A2 BEGIN; UPDATE accounts SET balance = balance -

    100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; A3 A4 PREPARE TRANSACTION ‘citus_...98’; PREPARE TRANSACTION ‘citus_...98’; Sumedh Pathak | Citus Data | DataEngConf 2018 Coordinator
  49. A1 A2 BEGIN; UPDATE accounts SET balance = balance -

    100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; A3 A4 COMMIT PREPARED ‘citus_... 98’; COMMIT PREPARED ‘citus_... 98’; Sumedh Pathak | Citus Data | DataEngConf 2018 Coordinator
  50. A1 A2 BEGIN; UPDATE accounts SET balance = balance -

    100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; A3 A4 COMMIT PREPARED ‘citus_... 98’; COMMIT PREPARED ‘citus_... 98’; Sumedh Pathak | Citus Data | DataEngConf 2018 Coordinator
  51. A1 A2 BEGIN; UPDATE accounts SET balance = balance -

    100 WHERE id = ‘ALICE’; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; COMMIT; A3 A4 COMMIT PREPARED ‘citus_... 98’; COMMIT PREPARED ‘citus_... 98’; Sumedh Pathak | Citus Data | DataEngConf 2018 Coordinator
  52. But.... Deadlocks!

  53. Example // SESSION 1 BEGIN; UPDATE accounts SET balance =

    balance - 100 WHERE id = ‘ALICE’; (LOCK on ROW with id ‘ALICE’) // SESSION 2 Sumedh Pathak | Citus Data | DataEngConf 2018
  54. Example // SESSION 1 BEGIN; UPDATE accounts SET balance =

    balance - 100 WHERE id = ‘ALICE’; (LOCK on ROW with id ‘ALICE’) // SESSION 2 BEGIN; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; (LOCK on ROW with id ‘BOB’) Sumedh Pathak | Citus Data | DataEngConf 2018
  55. Example // SESSION 1 BEGIN; UPDATE accounts SET balance =

    balance - 100 WHERE id = ‘ALICE’; (LOCK on ROW with id ‘ALICE’) UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; // SESSION 2 BEGIN; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; (LOCK on ROW with id ‘BOB’) Sumedh Pathak | Citus Data | DataEngConf 2018
  56. Example // SESSION 1 BEGIN; UPDATE accounts SET balance =

    balance - 100 WHERE id = ‘ALICE’; (LOCK on ROW with id ‘ALICE’) UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; // SESSION 2 BEGIN; UPDATE accounts SET balance = balance + 100 WHERE id = ‘BOB’; (LOCK on ROW with id ‘BOB’) UPDATE accounts SET balance = balance - 100 WHERE id = ‘ALICE’ Sumedh Pathak | Citus Data | DataEngConf 2018
  57. How do Relational DB’s solve this? S1 S2 - Construct

    a Directed Graph - Each node is a session/transaction - Edge represents a wait on a lock Waiting on ‘Bob’ Waiting on ‘Alice’
  58. S1 S2 S1 S2 S2 S1 S2 Waits on S1

    S1 Waits on S2 Sumedh Pathak | Citus Data | DataEngConf 2018
  59. S1 S2 S1 S2 S2 S1 S2 Waits on S1

    S1 Waits on S2 Distributed
 Deadlock Detector S1 S2 S2 S1 Sumedh Pathak | Citus Data | DataEngConf 2018
  60. keys to scaling transactions - 2PC to ensure atomic transactions

    across nodes - Deadlock Detection—to scale complex & concurrent transaction workloads - MVCC - Failure Handling 4 Sumedh Pathak | Citus Data | DataEngConf 2018
  61. It’s 2018. Distributed can be Relational - Scale tables—via sharding

    - Scale SQL—via distributed relational algebra - Scale transactions—via 2PC & Deadlock Detection Sumedh Pathak | Citus Data | DataEngConf 2018 Now, how do we implement all of this?
  62. Postgres Sumedh Pathak | Citus Data | DataEngConf 2018

  63. PostgreSQL Planner Executor Custom scan Commit / abort Extension (.so)

    Access methods Foreign tables Functions ... ... ... ... ... ... ... CREATE EXTENSION ... Sumedh Pathak | Citus Data | DataEngConf 2018
  64. PostgreSQL PostgreSQL PostgreSQL shards shards shard shards shards shard SELECT

    … FROM distributed_table … SELECT … FROM shard… SELECT … FROM shard… Citus
  65. PostgreSQL Citus PostGIS PL/Python JSONB 2PC Replication ... Sequences Indexes

    Full-Text Search Transactions … dblink Sumedh Pathak | Citus Data | DataEngConf 2018 Rich ecosystem of tooling, data types,
 & extensions in PostgreSQL
  66. 66 PostgreSQL can be
 extended into an all-purpose distributed RDBMS

  67. 67 I believe the future of distributed databases is relational.

  68. Thank you! Sumedh Pathak sumedh@citusdata.com @moss_toss | @citusdata | citusdata.com

    DataEngConf 2018