$30 off During Our Annual Pro Sale. View Details »

Scaling the Relational Database for the Cloud Age | DataEngConf SF 2018 | Sumedh Pathak

Scaling the Relational Database for the Cloud Age | DataEngConf SF 2018 | Sumedh Pathak

Years ago when working at Amazon on shopping cart infrastructure and the precursor to DynamoDB, my co-founder and I realized that while distributed key value stores were useful for a few use-cases, we missed many of the benefits of relational databases: transactions, joins, and the power of the lingua franca of RDBMS’s: SQL. So we challenged ourselves to modernize the traditional relational database, to take a robust open source relational database and transform it into a distributed database.

This talk is about my team’s journey to create a more modern relational database. I’ll talk about the distributed systems problems we had to solve in order to scale out the Postgres open source database, in order to achieve parallelism and a concomitant increase in performance. I'll describe the architecture of the distributed query planner; how we extend traditional relational algebra operators to plan distributed queries and scale reads. I’ll also describe distributed deadlock detection, and how that enabled us to scale out transactions spanning multiple machines.

Citus Data

April 17, 2018
Tweet

More Decks by Citus Data

Other Decks in Technology

Transcript

  1. Sumedh Pathak, Co-Founder & VP Engineering, Citus Data
    DataEngConf San Francisco 2018
    Scaling the Relational
    Database for the Cloud Age
    @moss_toss | @citusdata

    View Slide

  2. About Me
    • Co-Founder & VP Engineering at Citus Data
    • Amazon Shopping Cart

    (former)
    • Amazon Supply Chain & Order Fulfillment (former)
    • Stanford Computer Science
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  3. View Slide

  4. Why
    RDBMS?
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  5. Streaming Storage
    Map
    Reduce
    NoSQL
    SQL
    Database
    SELECT ...
    Application
    Application
    Sumedh Pathak | Citus Data | DataEngConf 2018
    Because your architecture could be simpler

    View Slide

  6. An RDBMS is a
    general-purpose

    data platform
    Sumedh Pathak | Citus Data | DataEngConf 2018
    Fast writes
    Real-time & bulk
    High throughput
    High concurrency
    Data consistency
    Query optimizers

    View Slide

  7. PostgreSQL
    MySQL
    MongoDB
    SQL Server +
    Oracle
    Source: Hacker News, https://news.ycombinator.com

    Startups Are Choosing Postgres
    % database job posts mentioning each database, across 20K+ job posts

    View Slide

  8. Data Trends in the “Cloud Age”
    - Explosion of Data
    - Higher Volume
    - Higher Velocity
    ** SCALE **

    View Slide

  9. but RDBMS’s 

    don’t scale!
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  10. but RDBMS’s don’t scale
    RDBMS’s are hard to scale
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  11. What exactly needs to Scale?
    - Tables (Data)
    - Partitioning, Co-location, Reference Tables
    - SQL (Reads)
    - How do we express and optimize distributed SQL
    - Transactions (Writes)
    - Cross Shard updates/deletes, Global Atomic Transactions
    Sumedh Pathak | Citus Data | DataEngConf 2018
    1
    2
    3

    View Slide

  12. Scaling
    Tables
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  13. Data Partitioning
    - Pick a column
    - Date
    - Id (customer_id, cart_id)
    - Pick a method
    - Hash
    - Range

    View Slide

  14. Partition data across nodes
    R1
    R2
    R3
    R4
    R5
    R6
    R7
    Coordinator
    Node
    Worker
    Nodes
    Shards
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  15. Worker → RDBMS, Shard → Table

    View Slide

  16. Reference Tables
    N1
    N1
    N1
    N1
    Copies of same table
    Sumedh Pathak | Citus Data | DataEngConf 2018
    Coordinator
    Node
    Worker
    Nodes

    View Slide

  17. Co-Location
    R1
    R2
    R3
    R4
    S1
    S2
    S3
    S4
    Explicit Co-Location API.

    E.g. Partition by Tenant
    Sumedh Pathak | Citus Data | DataEngConf 2018
    Coordinator
    Node
    Worker
    Nodes

    View Slide

  18. What about Foreign Keys?

    View Slide

  19. The key to scaling tables...
    - Use relational databases as a building block
    - Understand semantics of application—to be smart about
    partitioning
    - Multi-tenant applications

    View Slide

  20. Scaling
    SQL

    View Slide

  21. FROM table R
    SELECT x Project
    x
    (R) ! R’
    WHERE f(x) Filter
    f(x)
    (R) ! R’
    … JOIN … R × S ! R’
    SQL ↔ Relational Algebra
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  22. FROM sharded_table Collect(R
    1
    ,R
    2
    ,...) ! R
    Distributed Relational Algebra
    R1
    R2
    C
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  23. Commutativity
    A + B = B + A
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  24. Projectx
    (Collect(R1
    ,R2
    ,...)) = Collect(Projectx
    (R1
    ), Projectx
    (R2
    )...)
    Commutative property
    R1
    R2
    C
    R1
    R2
    C
    Px
    Px
    Px
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  25. Distributivity
    A*(B + C) = A*B + A*C
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  26. Collect(R1
    ,R2
    ,...) x Collect(S1
    ,S2
    ,...) = Collect(R1
    × S1
    ,R2
    × S2,
    ...)
    Distributive property
    R1
    R2
    C
    ×
    C
    S1
    S2
    R1
    R2
    C
    ×
    S1
    S2
    ×
    X = Join Operator Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  27. Associativity
    A + B + C =
    (A + B) + C =
    A + (B + C)
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  28. SUM(x)(Collect(R1
    ,R2
    ,...)) = SUM(Collect(SUM(R1
    ), SUM(R2
    )...))
    Associative property
    R1
    R2
    C
    R1
    R2
    C
    Sumx
    Sumx
    Sumx
    Sumedh Pathak | Citus Data | DataEngConf 2018
    Sumx

    View Slide

  29. SELECT sum(price) FROM orders,
    nation
    WHERE orders.nation = nation.name
    AND
    orders.date >= '2012-01-01'
    AND
    nation.region = 'Asia';
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  30. Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  31. Sumedh Pathak | Citus Data | DataEngConf 2018
    Volcano style processing
    Data flows
    from

    bottom

    to top

    View Slide

  32. Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  33. Sumedh Pathak | Citus Data | DataEngConf 2018
    Parallelize
    Aggregate
    Push Joins & Filters
    below collect. Run in
    parallel across all
    nodes
    Filters & Projections
    done before Join

    View Slide

  34. Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  35. SELECT sum(intermediate_col) 

    FROM ;
    SELECT sum(price)
    FROM orders_2 JOIN nation_2

    ON (orders_2.name =
    nation_2.name)
    WHERE
    orders_2.date >= '2017-01-01'
    AND
    nation_2.region = 'Asia';
    SELECT sum(price)
    FROM orders_2 JOIN nation_1

    ON (orders_2.name =
    nation_1.name)
    WHERE
    orders_2.date >= '2017-01-01'
    AND
    nation_2.region = 'Asia';
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  36. Executing Distributed SQL
    SQL database
    orders_1
    nation_1
    orders
    nation
    SELECT sum(price)
    FROM orders_2 o JOIN nation_1
    ON (o.name = n.name)
    WHERE o.date >= '2017-01-01'
    AND n.region = 'Asia';
    SELECT sum(price)
    FROM orders_1 o JOIN nation_1
    ON (o.name = n.name)
    WHERE o.date >= '2017-01-01'
    AND n.region = 'Asia';
    SELECT sum(price) FROM
    ;
    SQL database
    orders_2
    nation_1

    View Slide

  37. The key to scaling SQL...
    - New relational algebra operators for
    distributed processing
    - Relational Algebra Properties to optimize
    tree: Commutativity, Associativity, &
    Distributivity
    - Map / Reduce operators
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  38. Scaling
    Transactions

    View Slide

  39. Money Transfer, as an example
    BEGIN;
    UPDATE accounts SET balance =
    balance - 100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance =
    balance + 100 WHERE id = ‘BOB’;
    COMMIT;
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  40. A1
    A2
    BEGIN;
    UPDATE accounts SET balance =
    balance - 100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance =
    balance + 100 WHERE id = ‘BOB’;
    COMMIT;
    A3
    A4
    Coordinator
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  41. A1
    A2
    BEGIN;
    UPDATE accounts SET balance =
    balance - 100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance =
    balance + 100 WHERE id = ‘BOB’;
    COMMIT;
    A3
    A4
    BEGIN;
    UPDATE accounts SET balance =
    balance - 100 WHERE id = ‘ALICE’;
    Sumedh Pathak | Citus Data | DataEngConf 2018
    Coordinator

    View Slide

  42. A1
    A2
    BEGIN;
    UPDATE accounts SET balance =
    balance - 100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance =
    balance + 100 WHERE id = ‘BOB’;
    COMMIT;
    A3
    A4
    BEGIN;
    UPDATE accounts SET balance =
    balance - 100 WHERE id = ‘ALICE’;
    COMMIT;
    Sumedh Pathak | Citus Data | DataEngConf 2018
    Coordinator

    View Slide

  43. A1
    A2
    BEGIN;
    UPDATE accounts SET balance =
    balance - 100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance =
    balance + 100 WHERE id = ‘BOB’;
    COMMIT;
    A3
    A4
    Coordinator
    BEGIN;
    UPDATE accounts SET balance =
    balance + 100 WHERE id = ‘BOB’;
    Sumedh Pathak | Citus Data | DataEngConf 2018
    BEGIN;
    UPDATE accounts SET balance =
    balance - 100 WHERE id = ‘ALICE’;

    View Slide

  44. A1
    A2
    BEGIN;
    UPDATE accounts SET balance =
    balance - 100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance =
    balance + 100 WHERE id = ‘BOB’;
    COMMIT;
    A3
    A4
    BEGIN;
    UPDATE accounts SET balance =
    balance + 100 WHERE id =
    ‘BOB’;
    COMMIT
    Sumedh Pathak | Citus Data | DataEngConf 2018
    BEGIN;
    UPDATE accounts SET balance =
    balance - 100 WHERE id = ‘ALICE’;
    Coordinator

    View Slide

  45. A1
    A2
    BEGIN;
    UPDATE accounts SET balance =
    balance - 100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance =
    balance + 100 WHERE id = ‘BOB’;
    COMMIT;
    A3
    A4
    PREPARE TRANSACTION ‘citus_...98’; PREPARE TRANSACTION ‘citus_...98’;
    Sumedh Pathak | Citus Data | DataEngConf 2018
    Coordinator

    View Slide

  46. What happens during PREPARE?
    State of transaction stored
    on a durable store
    Locks are

    maintained
    &
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  47. A1
    A2
    BEGIN;
    UPDATE accounts SET balance =
    balance - 100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance =
    balance + 100 WHERE id = ‘BOB’;
    COMMIT;
    A3
    A4
    Coordinator
    PREPARE TRANSACTION ‘citus_...98’;
    ROLLBACK TRANSACTION ‘citus_...
    98’;
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  48. A1
    A2
    BEGIN;
    UPDATE accounts SET balance =
    balance - 100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance =
    balance + 100 WHERE id = ‘BOB’;
    COMMIT;
    A3
    A4
    PREPARE TRANSACTION ‘citus_...98’; PREPARE TRANSACTION ‘citus_...98’;
    Sumedh Pathak | Citus Data | DataEngConf 2018
    Coordinator

    View Slide

  49. A1
    A2
    BEGIN;
    UPDATE accounts SET balance =
    balance - 100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance =
    balance + 100 WHERE id = ‘BOB’;
    COMMIT;
    A3
    A4
    COMMIT PREPARED ‘citus_...
    98’;
    COMMIT PREPARED ‘citus_...
    98’;
    Sumedh Pathak | Citus Data | DataEngConf 2018
    Coordinator

    View Slide

  50. A1
    A2
    BEGIN;
    UPDATE accounts SET balance =
    balance - 100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance =
    balance + 100 WHERE id = ‘BOB’;
    COMMIT;
    A3
    A4
    COMMIT PREPARED ‘citus_...
    98’;
    COMMIT PREPARED ‘citus_...
    98’;
    Sumedh Pathak | Citus Data | DataEngConf 2018
    Coordinator

    View Slide

  51. A1
    A2
    BEGIN;
    UPDATE accounts SET balance =
    balance - 100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance =
    balance + 100 WHERE id = ‘BOB’;
    COMMIT;
    A3
    A4
    COMMIT PREPARED ‘citus_...
    98’;
    COMMIT PREPARED ‘citus_...
    98’;
    Sumedh Pathak | Citus Data | DataEngConf 2018
    Coordinator

    View Slide

  52. But....
    Deadlocks!

    View Slide

  53. Example
    // SESSION 1
    BEGIN;
    UPDATE accounts SET
    balance = balance - 100
    WHERE id = ‘ALICE’;
    (LOCK on ROW with id
    ‘ALICE’)
    // SESSION 2
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  54. Example
    // SESSION 1
    BEGIN;
    UPDATE accounts SET balance = balance - 100
    WHERE id = ‘ALICE’;
    (LOCK on ROW with id ‘ALICE’)
    // SESSION 2
    BEGIN;
    UPDATE accounts SET
    balance = balance + 100
    WHERE id = ‘BOB’;
    (LOCK on ROW with id
    ‘BOB’)
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  55. Example
    // SESSION 1
    BEGIN;
    UPDATE accounts SET balance = balance - 100
    WHERE id = ‘ALICE’;
    (LOCK on ROW with id ‘ALICE’)
    UPDATE accounts SET
    balance = balance + 100
    WHERE id = ‘BOB’;
    // SESSION 2
    BEGIN;
    UPDATE accounts SET balance = balance + 100
    WHERE id = ‘BOB’;
    (LOCK on ROW with id ‘BOB’)
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  56. Example
    // SESSION 1
    BEGIN;
    UPDATE accounts SET balance = balance - 100
    WHERE id = ‘ALICE’;
    (LOCK on ROW with id ‘ALICE’)
    UPDATE accounts SET balance = balance +
    100 WHERE id = ‘BOB’;
    // SESSION 2
    BEGIN;
    UPDATE accounts SET balance = balance + 100
    WHERE id = ‘BOB’;
    (LOCK on ROW with id ‘BOB’)
    UPDATE accounts SET
    balance = balance - 100
    WHERE id = ‘ALICE’
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  57. How do Relational DB’s solve this?
    S1
    S2
    - Construct a
    Directed Graph
    - Each node is a
    session/transaction
    - Edge represents a
    wait on a lock
    Waiting
    on ‘Bob’
    Waiting
    on ‘Alice’

    View Slide

  58. S1
    S2
    S1 S2
    S2 S1
    S2 Waits on S1 S1 Waits on S2
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  59. S1
    S2
    S1 S2
    S2 S1
    S2 Waits on S1 S1 Waits on S2
    Distributed

    Deadlock
    Detector
    S1
    S2 S2
    S1
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  60. keys to scaling transactions
    - 2PC to ensure atomic transactions across nodes
    - Deadlock Detection—to scale complex & concurrent
    transaction workloads
    - MVCC
    - Failure Handling
    4
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  61. It’s 2018. Distributed can be Relational
    - Scale tables—via sharding
    - Scale SQL—via distributed relational algebra
    - Scale transactions—via 2PC & Deadlock Detection
    Sumedh Pathak | Citus Data | DataEngConf 2018
    Now, how do we implement all of this?

    View Slide

  62. Postgres
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  63. PostgreSQL
    Planner
    Executor
    Custom scan
    Commit / abort
    Extension (.so)
    Access methods
    Foreign tables
    Functions
    ...
    ...
    ...
    ...
    ...
    ...
    ...
    CREATE EXTENSION ...
    Sumedh Pathak | Citus Data | DataEngConf 2018

    View Slide

  64. PostgreSQL PostgreSQL
    PostgreSQL
    shards
    shards
    shard
    shards
    shards
    shard
    SELECT … FROM distributed_table …
    SELECT … FROM
    shard…
    SELECT … FROM
    shard…
    Citus

    View Slide

  65. PostgreSQL
    Citus
    PostGIS PL/Python
    JSONB
    2PC
    Replication
    ...
    Sequences
    Indexes
    Full-Text Search
    Transactions

    dblink
    Sumedh Pathak | Citus Data | DataEngConf 2018
    Rich
    ecosystem
    of tooling,
    data types,

    & extensions
    in
    PostgreSQL

    View Slide

  66. 66
    PostgreSQL can be

    extended into an all-purpose
    distributed RDBMS

    View Slide

  67. 67
    I believe the future of distributed
    databases is relational.

    View Slide

  68. Thank you!
    Sumedh Pathak
    [email protected]
    @moss_toss | @citusdata | citusdata.com
    DataEngConf 2018

    View Slide