Upgrade to Pro — share decks privately, control downloads, hide ads and more …

The Future of Distributed Databases is Relational | QCon London 2018 | Sumedh Pathak

The Future of Distributed Databases is Relational | QCon London 2018 | Sumedh Pathak

Years ago when working at Amazon on shopping cart infrastructure and the precursor to DynamoDB, my co-founder and I realized that while distributed key value stores were useful for a few use-cases, we missed many of the benefits of relational databases: transactions, joins, and the power of the lingua franca of RDBMS’s: SQL. So we challenged ourselves to modernize the traditional relational database, to take a robust open source relational database and transform it into a distributed database.

This talk is about my team’s journey to create a more modern relational database. I’ll talk about the distributed systems problems we had to solve in order to scale out the Postgres open source database, in order to achieve parallelism and a concomitant increase in performance. I'll describe the architecture of the distributed query planner; how we extend traditional relational algebra operators to plan distributed queries and scale reads. I’ll also describe distributed deadlock detection, and how that enabled us to scale out transactions spanning multiple machines.

Citus Data

March 06, 2018
Tweet

More Decks by Citus Data

Other Decks in Technology

Transcript

  1. Sumedh Pathak, Co-Founder & VP Engineering, Citus Data
    QCon London 2018
    The Future of Distributed
    Databases is Relational
    @moss_toss | @citusdata

    View full-size slide

  2. About Me
    • Co-Founder & VP
    Engineering at Citus Data
    • Amazon Shopping Cart
    (former)
    • Amazon Supply Chain &
    Order Fulfillment (former)
    • Stanford Computer Science
    Citus Data co-founders, left-to-right
    Ozgun Erdogan, Sumedh Pathak, & Umur Cubukcu
    Photo cred: Willy Johnson, Monterey CA, Sep 2017
    Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  3. Why
    RDBMS?
    Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  4. Streaming Storage
    Map
    Reduce
    NoSQL
    SQL
    Database
    SELECT ...
    Application
    Application
    Sumedh Pathak | Citus Data | QCon London 2018
    Because your architecture could be simpler

    View full-size slide

  5. An RDBMS is a
    general-purpose
    data platform
    Sumedh Pathak | Citus Data | QCon London 2018
    Fast writes
    Real-time & bulk
    High throughput
    High concurrency
    Data consistency
    Query optimizers

    View full-size slide

  6. PostgreSQL
    MySQL
    MongoDB
    SQL Server +
    Oracle
    Source: Hacker News, https://news.ycombinator.com
    Startups Are Choosing Postgres
    % database job posts mentioning each database, across 20K+ job posts
    Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  7. but RDBMS’s
    don’t scale, right?
    Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  8. but RDBMS’s don’t scale
    RDBMS’s are hard to scale
    Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  9. What exactly needs to Scale?
    - Tables (Data)
    - Partitioning, Co-location, Reference Tables
    - SQL (Reads)
    - How do we express and optimize distributed SQL
    - Transactions (Writes)
    - Cross Shard updates/deletes, Global Atomic Transactions
    Sumedh Pathak | Citus Data | QCon London 2018
    1
    2
    3

    View full-size slide

  10. Scaling
    Tables
    Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  11. How to partition the data?
    - Pick a column
    - Date
    - Id (customer_id, cart_id)
    - Pick a method
    - Hash
    - Range

    View full-size slide

  12. Partition data across nodes
    R
    1
    R
    2
    R
    3
    R
    4
    R
    5
    R
    6
    R
    7
    Coordinator
    Node
    Worker
    Nodes
    Shards
    Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  13. Worker → RDBMS, Shard → Table

    View full-size slide

  14. Reference Tables
    N
    1
    N
    1
    N
    1
    N
    1
    Copies of same table
    Sumedh Pathak | Citus Data | QCon London 2018
    Coordinator
    Node
    Worker
    Nodes

    View full-size slide

  15. Co-Location
    R
    1
    R
    2
    R
    3
    R
    4
    S
    1
    S
    2
    S
    3
    S
    4
    Explicit Co-Location API.
    E.g. Partition by Tenant
    Sumedh Pathak | Citus Data | QCon London 2018
    Coordinator
    Node
    Worker
    Nodes

    View full-size slide

  16. What about Foreign Keys?

    View full-size slide

  17. The key to scaling tables...
    - Use relational databases as a building block
    - Understand semantics of application—to be smart about
    partitioning
    - Multi-tenant applications

    View full-size slide

  18. FROM table R
    SELECT x Project
    x
    (R) → R’
    WHERE f(x) Filter
    f(x)
    (R) → R’
    … JOIN … R × S → R’
    SQL ↔ Relational Algebra
    Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  19. FROM sharded_table Collect(R
    1
    ,R
    2
    ,...) → R
    Distributed Relational Algebra
    R
    1
    R
    2
    C
    Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  20. Project
    x
    (Collect(R
    1
    ,R
    2
    ,...)) = Collect(Project
    x
    (R
    1
    ), Project
    x
    (R
    2
    )...)
    Commutative property
    R
    1
    R
    2
    C
    R
    1
    R
    2
    C
    P
    x
    P
    x
    P
    x
    Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  21. Collect(R
    1
    ,R
    2
    ,...) x Collect(S
    1
    ,S
    2
    ,...) = Collect(R
    1
    × S
    1
    ,R
    2
    × S
    2,
    ...)
    Distributive property
    R
    1
    R
    2
    C
    ×
    C
    S
    1
    S
    2
    R
    1
    R
    2
    C
    ×
    S
    1
    S
    2
    ×
    X = Join Operator

    View full-size slide

  22. SUM(x)(Collect(R
    1
    ,R
    2
    ,...)) = SUM(Collect(SUM(R
    1
    ), SUM(R
    2
    )...))
    Associative property
    R
    1
    R
    2
    C
    R
    1
    R
    2
    C
    Sum
    x
    Sum
    x
    Sum
    x
    Sumedh Pathak | Citus Data | QCon London 2018
    Sum
    x

    View full-size slide

  23. SELECT sum(price) FROM orders, nation
    WHERE orders.nation = nation.name AND
    orders.date >= '2012-01-01' AND
    nation.region = 'Asia';
    Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  24. Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  25. Sumedh Pathak | Citus Data | QCon London 2018
    Volcano style processing
    Data flows
    from
    bottom
    to top

    View full-size slide

  26. Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  27. Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  28. Sumedh Pathak | Citus Data | QCon London 2018
    Parallelize
    Aggregate
    Push Joins & Filters
    below collect. Run in
    parallel across all
    nodes
    Filters & Projections
    done before Join

    View full-size slide

  29. SELECT sum(intermediate_col)
    FROM ;
    SELECT sum(price)
    FROM orders_2 JOIN nation_2
    ON (orders_2.name = nation_2.name)
    WHERE
    orders_2.date >= '2017-01-01'
    AND
    nation_2.region = 'Asia';
    SELECT sum(price)
    FROM orders_2 JOIN nation_1
    ON (orders_2.name = nation_1.name)
    WHERE
    orders_2.date >= '2017-01-01'
    AND
    nation_2.region = 'Asia';
    Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  30. Executing Distributed SQL
    SQL database
    orders_1
    nation_1
    orders
    nation
    SELECT sum(price)
    FROM orders_2 o JOIN nation_1
    ON (o.name = n.name)
    WHERE o.date >= '2017-01-01'
    AND n.region = 'Asia';
    SELECT sum(price)
    FROM orders_1 o JOIN nation_1
    ON (o.name = n.name)
    WHERE o.date >= '2017-01-01'
    AND n.region = 'Asia';
    SELECT sum(price) FROM ;
    SQL database
    orders_2
    nation_1

    View full-size slide

  31. The key to scaling SQL...
    - New relational algebra operators for
    distributed processing
    - Relational Algebra Properties to optimize
    tree: Commutativity, Associativity, &
    Distributivity
    - Map / Reduce operators

    View full-size slide

  32. Scaling
    Transactions

    View full-size slide

  33. Money Transfer, as an example
    BEGIN;
    UPDATE accounts SET balance = balance -
    100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance = balance +
    100 WHERE id = ‘BOB’;
    COMMIT;
    Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  34. A
    1
    A
    2
    BEGIN;
    UPDATE accounts SET balance = balance -
    100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance = balance +
    100 WHERE id = ‘BOB’;
    COMMIT;
    A
    3
    A
    4
    Coordinator
    Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  35. A
    1
    A
    2
    BEGIN;
    UPDATE accounts SET balance = balance -
    100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance = balance +
    100 WHERE id = ‘BOB’;
    COMMIT;
    A
    3
    A
    4
    BEGIN;
    UPDATE accounts SET balance = balance -
    100 WHERE id = ‘ALICE’;
    Sumedh Pathak | Citus Data | QCon London 2018
    Coordinator

    View full-size slide

  36. A
    1
    A
    2
    BEGIN;
    UPDATE accounts SET balance = balance -
    100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance = balance +
    100 WHERE id = ‘BOB’;
    COMMIT;
    A
    3
    A
    4
    BEGIN;
    UPDATE accounts SET balance = balance -
    100 WHERE id = ‘ALICE’;
    COMMIT;
    Sumedh Pathak | Citus Data | QCon London 2018
    Coordinator

    View full-size slide

  37. A
    1
    A
    2
    BEGIN;
    UPDATE accounts SET balance = balance -
    100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance = balance +
    100 WHERE id = ‘BOB’;
    COMMIT;
    A
    3
    A
    4
    Coordinator
    BEGIN;
    UPDATE accounts SET balance =
    balance + 100 WHERE id = ‘BOB’;
    Sumedh Pathak | Citus Data | QCon London 2018
    BEGIN;
    UPDATE accounts SET balance = balance
    - 100 WHERE id = ‘ALICE’;

    View full-size slide

  38. A
    1
    A
    2
    BEGIN;
    UPDATE accounts SET balance = balance -
    100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance = balance +
    100 WHERE id = ‘BOB’;
    COMMIT;
    A
    3
    A
    4
    BEGIN;
    UPDATE accounts SET balance =
    balance + 100 WHERE id = ‘BOB’;
    COMMIT
    Sumedh Pathak | Citus Data | QCon London 2018
    BEGIN;
    UPDATE accounts SET balance = balance
    - 100 WHERE id = ‘ALICE’;
    Coordinator

    View full-size slide

  39. A
    1
    A
    2
    BEGIN;
    UPDATE accounts SET balance = balance -
    100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance = balance +
    100 WHERE id = ‘BOB’;
    COMMIT;
    A
    3
    A
    4
    PREPARE TRANSACTION ‘citus_...98’; PREPARE TRANSACTION ‘citus_...98’;
    Sumedh Pathak | Citus Data | QCon London 2018
    Coordinator

    View full-size slide

  40. What happens during PREPARE?
    State of transaction stored
    on a durable store
    Locks are
    maintained
    &

    View full-size slide

  41. A
    1
    A
    2
    BEGIN;
    UPDATE accounts SET balance = balance -
    100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance = balance +
    100 WHERE id = ‘BOB’;
    COMMIT;
    A
    3
    A
    4
    Coordinator
    PREPARE TRANSACTION ‘citus_...98’; ROLLBACK TRANSACTION ‘citus_...98’;
    Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  42. A
    1
    A
    2
    BEGIN;
    UPDATE accounts SET balance = balance -
    100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance = balance +
    100 WHERE id = ‘BOB’;
    COMMIT;
    A
    3
    A
    4
    PREPARE TRANSACTION ‘citus_...98’; PREPARE TRANSACTION ‘citus_...98’;
    Sumedh Pathak | Citus Data | QCon London 2018
    Coordinator

    View full-size slide

  43. A
    1
    A
    2
    BEGIN;
    UPDATE accounts SET balance = balance -
    100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance = balance +
    100 WHERE id = ‘BOB’;
    COMMIT;
    A
    3
    A
    4
    COMMIT PREPARED ‘citus_...98’; COMMIT PREPARED ‘citus_...98’;
    Sumedh Pathak | Citus Data | QCon London 2018
    Coordinator

    View full-size slide

  44. A
    1
    A
    2
    BEGIN;
    UPDATE accounts SET balance = balance -
    100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance = balance +
    100 WHERE id = ‘BOB’;
    COMMIT;
    A
    3
    A
    4
    COMMIT PREPARED ‘citus_...98’; COMMIT PREPARED ‘citus_...98’;
    Sumedh Pathak | Citus Data | QCon London 2018
    Coordinator

    View full-size slide

  45. A
    1
    A
    2
    BEGIN;
    UPDATE accounts SET balance = balance -
    100 WHERE id = ‘ALICE’;
    UPDATE accounts SET balance = balance +
    100 WHERE id = ‘BOB’;
    COMMIT;
    A
    3
    A
    4
    COMMIT PREPARED ‘citus_...98’; COMMIT PREPARED ‘citus_...98’;
    Sumedh Pathak | Citus Data | QCon London 2018
    Coordinator

    View full-size slide

  46. But....
    Deadlocks!

    View full-size slide

  47. Example
    // SESSION 1
    BEGIN;
    UPDATE accounts SET balance
    = balance - 100 WHERE id =
    ‘ALICE’;
    (LOCK on ROW with id
    ‘ALICE’)
    // SESSION 2
    Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  48. Example
    // SESSION 1
    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE
    id = ‘ALICE’;
    (LOCK on ROW with id ‘ALICE’)
    // SESSION 2
    BEGIN;
    UPDATE accounts SET balance
    = balance + 100 WHERE id =
    ‘BOB’;
    (LOCK on ROW with id ‘BOB’)
    Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  49. Example
    // SESSION 1
    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE
    id = ‘ALICE’;
    (LOCK on ROW with id ‘ALICE’)
    UPDATE accounts SET balance
    = balance + 100 WHERE id =
    ‘BOB’;
    // SESSION 2
    BEGIN;
    UPDATE accounts SET balance = balance + 100 WHERE
    id = ‘BOB’;
    (LOCK on ROW with id ‘BOB’)
    Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  50. Example
    // SESSION 1
    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE
    id = ‘ALICE’;
    (LOCK on ROW with id ‘ALICE’)
    UPDATE accounts SET balance = balance + 100
    WHERE id = ‘BOB’;
    // SESSION 2
    BEGIN;
    UPDATE accounts SET balance = balance + 100 WHERE
    id = ‘BOB’;
    (LOCK on ROW with id ‘BOB’)
    UPDATE accounts SET balance
    = balance - 100 WHERE id =
    ‘ALICE’
    Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  51. How do Relational DB’s solve this?
    S1
    S2
    - Construct a
    Directed Graph
    - Each node is a
    session/transaction
    - Edge represents a
    wait on a lock
    Waiting
    on ‘Bob’
    Waiting on
    ‘Alice’

    View full-size slide

  52. S1
    S2
    S1 S2
    S2 S1
    S2 Waits on S1 S1 Waits on S2
    Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  53. S1
    S2
    S1 S2
    S2 S1
    S2 Waits on S1 S1 Waits on S2
    Distributed
    Deadlock
    Detector
    S1
    S2 S2
    S1
    Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  54. keys to scaling transactions
    - 2PC to ensure atomic transactions across nodes
    - Deadlock Detection—to scale complex & concurrent
    transaction workloads
    - MVCC
    - Failure Handling
    4

    View full-size slide

  55. It’s 2018. Distributed can be Relational
    - Scale tables—via sharding
    - Scale SQL—via distributed relational algebra
    - Scale transactions—via 2PC & Deadlock Detection
    Sumedh Pathak | Citus Data | QCon London 2018
    Now, how do we implement all of this?

    View full-size slide

  56. Postgres
    Sumedh Pathak | Citus Data | QCon London 2018

    View full-size slide

  57. PostgreSQL
    Planner
    Executor
    Custom scan
    Commit / abort
    Extension (.so)
    Access methods
    Foreign tables
    Functions
    ...
    ...
    ...
    ...
    ...
    ...
    ...
    CREATE EXTENSION ...

    View full-size slide

  58. PostgreSQL PostgreSQL
    PostgreSQL
    shards
    shards
    shard
    shards
    shards
    shard
    SELECT … FROM distributed_table …
    SELECT … FROM shard… SELECT … FROM shard…
    Citus

    View full-size slide

  59. PostgreSQL
    Citus
    PostGIS PL/Python
    JSONB
    2PC
    Replication
    ...
    Sequences
    Indexes
    Full-Text Search
    Transactions

    dblink
    Sumedh Pathak | Citus Data | QCon London 2018
    Rich
    ecosystem of
    tooling, data
    types,
    & extensions
    in
    PostgreSQL

    View full-size slide

  60. PostgreSQL can be
    extended into an all-purpose
    distributed RDBMS

    View full-size slide

  61. I believe the future of distributed
    databases is relational.

    View full-size slide

  62. Thank you!
    Sumedh Pathak
    [email protected]
    @moss_toss | @citusdata | citusdata.com
    QCon London 2018 | The Future of Distributed Databases is Relational

    View full-size slide