Slide 1

Slide 1 text

Scaling out (Postgre)SQL [email protected]

Slide 2

Slide 2 text

Why SQL?

Slide 3

Slide 3 text

Streaming Storage Map Reduce NoSQL SQL Database SELECT ... Application Application

Slide 4

Slide 4 text

SQL databases are general-purpose data platforms

Slide 5

Slide 5 text

but SQL doesn’t scale

Slide 6

Slide 6 text

but SQL doesn’t scale SQL is hard to scale

Slide 7

Slide 7 text

FROM table R SELECT x Project x (R) → R’ WHERE f(x) Filter f(x) (R) → R’ … JOIN … R × S → R’ SQL ↔ Relational Algebra

Slide 8

Slide 8 text

FROM sharded_table Collect(R 1 ,R 2 ,...) → R’ Distributed Relational Algebra R 1 R 2 C

Slide 9

Slide 9 text

Project x (Collect(R 1 ,R 2 ,...)) = Collect(Project x (R 1 ), Project x (R 2 )...) Commutative property R 1 R 2 C R 1 R 2 C P x P x P x

Slide 10

Slide 10 text

Collect(R 1 ,R 2 ,...) × Collect(S 1 ,S 2 ,...) = Collect(R 1 × S 1 ,R 2 × S 2, ...) Distributive property R 1 R 2 C × C S 1 S 2 R 1 R 2 C × S 1 S 2 ×

Slide 11

Slide 11 text

SELECT sum(price) FROM orders, nation WHERE orders.nation = nation.name AND orders.date >= '2017-01-01' AND nation.region = 'Asia';

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

SELECT sum(intermediate_col) FROM ; SELECT sum(price) FROM orders_2 JOIN nation_2 ON (orders_2.name = nation_2.name) WHERE orders_2.date >= '2017-01-01' AND nation_2.region = 'Asia'; SELECT sum(price) FROM orders_2 JOIN nation_1 ON (orders_2.name = nation_1.name) WHERE orders_2.date >= '2017-01-01' AND nation_2.region = 'Asia';

Slide 16

Slide 16 text

Executing Distributed SQL SQL database orders_1 nation_1 orders nation SELECT sum(price) FROM orders_2 o JOIN nation_1 ON (o.name = n.name) WHERE o.date >= '2017-01-01' AND n.region = 'Asia'; SELECT sum(price) FROM orders_1 o JOIN nation_1 ON (o.name = n.name) WHERE o.date >= '2017-01-01' AND n.region = 'Asia'; SELECT sum(price) FROM ; SQL database orders_2 nation_1

Slide 17

Slide 17 text

Why PostgreSQL?

Slide 18

Slide 18 text

PostgreSQL Planner Executor Custom scan Commit / abort Extension (shared library) Access methods Foreign tables Functions ... ... ... ... ... ... ... SELECT ... CREATE EXTENSION ...

Slide 19

Slide 19 text

PostgreSQL PostgreSQL PostgreSQL shards shards shard shards shards shard SELECT … FROM distributed_table … SELECT … FROM shard… SELECT … FROM shard… Citus

Slide 20

Slide 20 text

PostgreSQL Citus PostGIS PL/Python JSONB 2PC Replication ... Sequences Indexes Full-Text Search Transactions … dblink

Slide 21

Slide 21 text

PostgreSQL is becoming an extensible, scalable general-purpose data platform