Slide 1

Slide 1 text

Challenges of Distributing Postgres:
 A Citus Story Ozgun Erdogan QCon San Francisco | November 2017

Slide 2

Slide 2 text

Developers Love Postgres PostgreSQL MySQL MongoDB SQL Server + Oracle 0% 18% 35% 53% 70% Mar-11 Sept-11 Mar-12 Sept-12 Mar-13 Sept-13 Mar-14 Sept-14 Mar-15 Sept-15 Mar-16 RDBMS: PostgreSQL, MySQL, Microsoft SQL Server, Oracle Ozgun Erdogan | QCon San Francisco 2017

Slide 3

Slide 3 text

I love Postgres, too 3 Ozgun Erdogan | QCon San Francisco 2017 Ozgun Erdogan CTO of Citus Data Distributed Systems Distributed Databases Formerly of Amazon Love drinking margaritas

Slide 4

Slide 4 text

4

Slide 5

Slide 5 text

Our mission at Citus Data 5 Ozgun Erdogan | QCon San Francisco 2017 Make it so SaaS businesses
 never have to worry about scaling their database again

Slide 6

Slide 6 text

What is the Citus database? 1.Scales out PostgreSQL 2.Extension to PostgreSQL 3.Available in 3 Ways Ozgun Erdogan | QCon San Francisco 2017 • Using sharding & replication • Query engine parallelizes SQL queries across many nodes • Using PostgreSQL extension APIs

Slide 7

Slide 7 text

Citus, Packaged Three Ways Ozgun Erdogan | QCon San Francisco 2017 Open
 Source Enterprise
 Software Fully-Managed
 Database as a Service github.com/citusdata/citus

Slide 8

Slide 8 text

Simplified Citus Architecture

Slide 9

Slide 9 text

3 Challenges Distributing Postgres 1. PostgreSQL and High Availability 2. PostgreSQL is huge. How to keep up with it 3. Distributed transactions Ozgun Erdogan | QCon San Francisco 2017

Slide 10

Slide 10 text

PostgreSQL &
 High Availability (HA) Designing for a Cloud-native world 1

Slide 11

Slide 11 text

Why is High Availability hard? PostgreSQL replication uses one primary & multiple secondary nodes. Two challenges: 1. Most Postgres clients aren’t smart. When the primary fails, they retry the same IP. 2. Postgres replicates entire state. This makes it resource intensive to reconstruct new nodes from a primary. Ozgun Erdogan | QCon San Francisco 2017

Slide 12

Slide 12 text

Database Failures Should Be Transparent Ozgun Erdogan | QCon San Francisco 2017

Slide 13

Slide 13 text

Database Failures Shouldn’t Be a Big Deal 1. PostgreSQL streaming replication to replicate from primary to secondary. Back up to S3. 2. Volume level replication to replicate to secondary’s volume. Back up to S3. 3. Incremental backups to S3. Reconstruct secondary nodes from S3. Ozgun Erdogan | QCon San Francisco 2017 3 Methods for HA & Backups in Postgres

Slide 14

Slide 14 text

Postgres - Streaming Replication (1) Write-ahead logs (streaming repl.) Table foo Primary – PostgreSQL streaming repl. Table bar WAL logs Table foo Table bar WAL logs Secondary – PostgreSQL streaming repl. Monitoring Agents - streaming repl. setup & auto failover S3 / Blob Storage (Encrypted) Backup Process Ozgun Erdogan | QCon San Francisco 2017

Slide 15

Slide 15 text

Postgres – AWS RDS & Azure (2) Postgres Primary Monitoring Agents (Auto node failover) Persistent Volume Postgres Standby S3 / Blob Storage (Encrypted) Table foo Table bar WAL logs Table foo Table bar WAL logs Backup process Backup Process Persistent Volume Ozgun Erdogan | QCon San Francisco 2017

Slide 16

Slide 16 text

Postgres – Reconstruct from WAL (3) Postgres Primary Monitoring Agents (Auto node failover) Persistent Volume Postgres Secondary Backup Process S3 / Blob Storage (Encrypted) Table foo Table bar WAL logs Persistent Volume Table foo Table bar WAL logs Backup process Ozgun Erdogan | QCon San Francisco 2017

Slide 17

Slide 17 text

WHO DOES THIS? PRIMARY BENEFITS Streaming Replication (local / ephemeral disk) On-prem
 Manual EC2 Simple to set up Direct I/O: High I/O & large storage Disk Mirroring RDS
 Azure Preview Works for MySQL and PostgreSQL
 Data durability in cloud environments Reconstruct from WAL Heroku
 Citus Cloud Enables Fork and PITR Node reconstruction in background (Data durability in cloud environments) How do these approaches compare? 17 Ozgun Erdogan | QCon San Francisco 2017

Slide 18

Slide 18 text

Summary • In PostgreSQL, a database node’s state gets replicated in its entirety. The replication can be set up in three ways. • Reconstructing a secondary node from S3 makes bringing up or shooting down nodes easy. • When you shard your database, the state you need to replicate per node becomes smaller. Ozgun Erdogan | QCon San Francisco 2017

Slide 19

Slide 19 text

PostgreSQL is huge How do you scale all features? 2

Slide 20

Slide 20 text

3 ways to build a distributed database 1. Build a distributed database from scratch 2. Middleware sharding (mimic the parser) 3. Fork your favorite database (like PostgreSQL) Ozgun Erdogan | QCon San Francisco 2017

Slide 21

Slide 21 text

Example Transaction Block Ozgun Erdogan | QCon San Francisco 2017

Slide 22

Slide 22 text

Postgres Features, Tools & Frameworks • PostgreSQL manual (US Letter) • Clients for diff programming languages • ORMs, libraries, GUIs • Tools (dump, restore, analyze) • New features Ozgun Erdogan | QCon San Francisco 2017

Slide 23

Slide 23 text

At First, Forked PostgreSQL with Style Ozgun Erdogan | QCon San Francisco 2017

Slide 24

Slide 24 text

Two Stage Query Optimization 1. Plan to minimize network I/O 2. Nodes talk to each other using SQL over libpq 3. Learned to cooperate with planner / executor bit by bit (Volcano style executor) Ozgun Erdogan | QCon San Francisco 2017

Slide 25

Slide 25 text

Citus Architecture (Simplified) 25 SELECT avg(revenue) FROM sales Coordinator SELECT sum(revenue), count(revenue) FROM table_1001 SELECT sum … FROM table_1003 Worker node 1 Table metadata Table_1001 Table_1003 SELECT sum … FROM table_1002 SELECT sum … FROM table_1004 Worker node 2 Table_1002 Table_1004 Worker node N . . . . . . Each node PostgreSQL with Citus installed 1 shard = 1 PostgreSQL table Ozgun Erdogan | QCon San Francisco 2017

Slide 26

Slide 26 text

Unfork Citus using Extension APIs CREATE EXTENSION citus; • System catalogs – Distributed metadata • Planner hook – Insert, Update, Delete, Select • Executor hook – Insert, Update, Delete, Select • Utility hook – Alter Table, Create Index, Vacuum, etc. • Transaction & resources handling – file descriptors, etc. • Background worker process – Maintenance processes (distributed deadlock detection, task tracker, etc.) • Logical decoding – Online data migrations Ozgun Erdogan | QCon San Francisco 2017

Slide 27

Slide 27 text

PostgreSQL has transactions. How to handle distributed transactions 3

Slide 28

Slide 28 text

BEGIN INSERT UPDATE SELECT COMMIT ROLLBA CK

Slide 29

Slide 29 text

Consistency in Distributed Databases 1. 2PC: All participating nodes need to be up 2. Paxos: Achieves consensus with quorum 3. Raft: More understandable alternative to Paxos Ozgun Erdogan | QCon San Francisco 2017

Slide 30

Slide 30 text

Concurrency in Distributed Databases Ozgun Erdogan | QCon San Francisco 2017

Slide 31

Slide 31 text

Locks

Slide 32

Slide 32 text

What is a Lock? • Protects against concurrent modifications. • Locks are released at the end of a transaction.

Slide 33

Slide 33 text

Transactions Block on 1st Conflicting Lock

Slide 34

Slide 34 text

Transactions and Concurrency • Transactions that don’t modify the same row can run concurrently.

Slide 35

Slide 35 text

Transactions and Concurrency • Transactions that don’t modify the same row can run concurrently.

Slide 36

Slide 36 text

Transactions and Concurrency • Transactions that don’t modify the same row can run concurrently.

Slide 37

Slide 37 text

Transactions and Concurrency • Transactions that don’t modify the same row can run concurrently.

Slide 38

Slide 38 text

Transactions and Concurrency • Transactions that don’t modify the same row can run concurrently.

Slide 39

Slide 39 text

Transactions and Concurrency • Transactions that don’t modify the same row can run concurrently.

Slide 40

Slide 40 text

Transactions and Concurrency • Transactions that don’t modify the same row can run concurrently.

Slide 41

Slide 41 text

Transactions and Concurrency • Transactions that don’t modify the same row can run concurrently.

Slide 42

Slide 42 text

Transactions and Concurrency • Transactions that don’t modify the same row can run concurrently.

Slide 43

Slide 43 text

Distributed transactions are a complex topic • Most articles on distributed transactions focus on data consistency. • Data consistency is only one side of the coin. If you’re using a relational database, your application benefits from another key feature: deadlock detection. • https://www.citusdata.com/blog/2017/08/31/ databases-and-distributed-deadlocks-a-faq Ozgun Erdogan | QCon San Francisco 2017

Slide 44

Slide 44 text

So now what? We talked about 3 challenges distributing Postgres 1. PostgreSQL, Replication, High Availability 2. Tradeoffs in building a distributed database— and how we chose PostgreSQL’s extension APIs 3. Distributed deadlock detection & distributed transactions Ozgun Erdogan | QCon San Francisco 2017

Slide 45

Slide 45 text

45 Solving a very complex problem

Slide 46

Slide 46 text

46 “SQL is hard, not impossible, to scale”

Slide 47

Slide 47 text

© 2017 Citus Data. All right reserved. [email protected] Questions? @citusdata Ozgun Erdogan www.citusdata.com