Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Scaling out (Postgre)SQL

3a932587ce6bcaa5e118198dc4c05d3c?s=47 Marco Slot
April 24, 2017

Scaling out (Postgre)SQL

Slides from my talk at dotScale in Paris on April 24th, 2017.

3a932587ce6bcaa5e118198dc4c05d3c?s=128

Marco Slot

April 24, 2017
Tweet

Transcript

  1. Scaling out (Postgre)SQL marco@citusdata.com

  2. Why SQL?

  3. Streaming Storage Map Reduce NoSQL SQL Database SELECT ... Application

    Application
  4. SQL databases are general-purpose data platforms

  5. but SQL doesn’t scale

  6. but SQL doesn’t scale SQL is hard to scale

  7. FROM table R SELECT x Project x (R) → R’

    WHERE f(x) Filter f(x) (R) → R’ … JOIN … R × S → R’ SQL ↔ Relational Algebra
  8. FROM sharded_table Collect(R 1 ,R 2 ,...) → R’ Distributed

    Relational Algebra R 1 R 2 C
  9. Project x (Collect(R 1 ,R 2 ,...)) = Collect(Project x

    (R 1 ), Project x (R 2 )...) Commutative property R 1 R 2 C R 1 R 2 C P x P x P x
  10. Collect(R 1 ,R 2 ,...) × Collect(S 1 ,S 2

    ,...) = Collect(R 1 × S 1 ,R 2 × S 2, ...) Distributive property R 1 R 2 C × C S 1 S 2 R 1 R 2 C × S 1 S 2 ×
  11. SELECT sum(price) FROM orders, nation WHERE orders.nation = nation.name AND

    orders.date >= '2017-01-01' AND nation.region = 'Asia';
  12. None
  13. None
  14. None
  15. SELECT sum(intermediate_col) FROM <concatenated results>; SELECT sum(price) FROM orders_2 JOIN

    nation_2 ON (orders_2.name = nation_2.name) WHERE orders_2.date >= '2017-01-01' AND nation_2.region = 'Asia'; SELECT sum(price) FROM orders_2 JOIN nation_1 ON (orders_2.name = nation_1.name) WHERE orders_2.date >= '2017-01-01' AND nation_2.region = 'Asia';
  16. Executing Distributed SQL SQL database orders_1 nation_1 orders nation SELECT

    sum(price) FROM orders_2 o JOIN nation_1 ON (o.name = n.name) WHERE o.date >= '2017-01-01' AND n.region = 'Asia'; SELECT sum(price) FROM orders_1 o JOIN nation_1 ON (o.name = n.name) WHERE o.date >= '2017-01-01' AND n.region = 'Asia'; SELECT sum(price) FROM <results>; SQL database orders_2 nation_1
  17. Why PostgreSQL?

  18. PostgreSQL Planner Executor Custom scan Commit / abort Extension (shared

    library) Access methods Foreign tables Functions ... ... ... ... ... ... ... SELECT ... CREATE EXTENSION ...
  19. PostgreSQL PostgreSQL PostgreSQL shards shards shard shards shards shard SELECT

    … FROM distributed_table … SELECT … FROM shard… SELECT … FROM shard… Citus
  20. PostgreSQL Citus PostGIS PL/Python JSONB 2PC Replication ... Sequences Indexes

    Full-Text Search Transactions … dblink
  21. PostgreSQL is becoming an extensible, scalable general-purpose data platform