$30 off During Our Annual Pro Sale. View Details »

Scaling out (Postgre)SQL

Marco Slot
April 24, 2017

Scaling out (Postgre)SQL

Slides from my talk at dotScale in Paris on April 24th, 2017.

Marco Slot

April 24, 2017
Tweet

More Decks by Marco Slot

Other Decks in Technology

Transcript

  1. Scaling out (Postgre)SQL
    [email protected]

    View Slide

  2. Why SQL?

    View Slide

  3. Streaming Storage
    Map
    Reduce
    NoSQL
    SQL
    Database
    SELECT ...
    Application
    Application

    View Slide

  4. SQL databases are
    general-purpose data platforms

    View Slide

  5. but SQL doesn’t scale

    View Slide

  6. but SQL doesn’t scale
    SQL is hard to scale

    View Slide

  7. FROM table R
    SELECT x Project
    x
    (R) → R’
    WHERE f(x) Filter
    f(x)
    (R) → R’
    … JOIN … R × S → R’
    SQL ↔ Relational Algebra

    View Slide

  8. FROM sharded_table Collect(R
    1
    ,R
    2
    ,...) → R’
    Distributed Relational Algebra
    R
    1
    R
    2
    C

    View Slide

  9. Project
    x
    (Collect(R
    1
    ,R
    2
    ,...)) = Collect(Project
    x
    (R
    1
    ), Project
    x
    (R
    2
    )...)
    Commutative property
    R
    1
    R
    2
    C
    R
    1
    R
    2
    C
    P
    x
    P
    x
    P
    x

    View Slide

  10. Collect(R
    1
    ,R
    2
    ,...) × Collect(S
    1
    ,S
    2
    ,...) = Collect(R
    1
    × S
    1
    ,R
    2
    × S
    2,
    ...)
    Distributive property
    R
    1
    R
    2
    C
    ×
    C
    S
    1
    S
    2
    R
    1
    R
    2
    C
    ×
    S
    1
    S
    2
    ×

    View Slide

  11. SELECT sum(price) FROM orders, nation
    WHERE orders.nation = nation.name AND
    orders.date >= '2017-01-01' AND
    nation.region = 'Asia';

    View Slide

  12. View Slide

  13. View Slide

  14. View Slide

  15. SELECT sum(intermediate_col)
    FROM ;
    SELECT sum(price)
    FROM orders_2 JOIN nation_2
    ON (orders_2.name = nation_2.name)
    WHERE
    orders_2.date >= '2017-01-01'
    AND
    nation_2.region = 'Asia';
    SELECT sum(price)
    FROM orders_2 JOIN nation_1
    ON (orders_2.name = nation_1.name)
    WHERE
    orders_2.date >= '2017-01-01'
    AND
    nation_2.region = 'Asia';

    View Slide

  16. Executing Distributed SQL
    SQL database
    orders_1
    nation_1
    orders
    nation SELECT sum(price)
    FROM orders_2 o JOIN nation_1
    ON (o.name = n.name)
    WHERE o.date >= '2017-01-01'
    AND n.region = 'Asia';
    SELECT sum(price)
    FROM orders_1 o JOIN nation_1
    ON (o.name = n.name)
    WHERE o.date >= '2017-01-01'
    AND n.region = 'Asia';
    SELECT sum(price) FROM ;
    SQL database
    orders_2
    nation_1

    View Slide

  17. Why PostgreSQL?

    View Slide

  18. PostgreSQL
    Planner
    Executor
    Custom scan
    Commit / abort
    Extension (shared library)
    Access methods
    Foreign tables
    Functions
    ...
    ...
    ...
    ...
    ...
    ...
    ...
    SELECT ... CREATE EXTENSION ...

    View Slide

  19. PostgreSQL PostgreSQL
    PostgreSQL
    shards
    shards
    shard
    shards
    shards
    shard
    SELECT … FROM distributed_table …
    SELECT … FROM shard… SELECT … FROM shard…
    Citus

    View Slide

  20. PostgreSQL
    Citus
    PostGIS PL/Python
    JSONB
    2PC
    Replication
    ...
    Sequences
    Indexes
    Full-Text Search
    Transactions

    dblink

    View Slide

  21. PostgreSQL is becoming an extensible,
    scalable general-purpose data platform

    View Slide