The Challenges of Distributing Postgres: A Citus Story

Ozgun Erdogan
DataEngConf NYC | October 2017

Developers Love Postgres
[Chart: PostgreSQL, MySQL, MongoDB, SQL Server + Oracle. RDBMS: PostgreSQL, MySQL, Microsoft SQL Server, Oracle]

I love Postgres, too
Ozgun Erdogan
• CTO of Citus Data
• Distributed systems
• Distributed databases
• Formerly of Amazon
• Love drinking margaritas

Our mission at Citus Data
Make it so SaaS businesses never have to worry about scaling their database again

What is the Citus database?
1. Scales out PostgreSQL
   • Using sharding & replication
   • Query engine parallelizes SQL queries across many nodes
2. Extension to PostgreSQL
   • Using PostgreSQL extension APIs
3. Available in 3 ways
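As a rough illustration of what "sharding" means here, the sketch below routes rows to shards by hashing a distribution column. The shard count, the CRC32 hash, and the column name `tenant_id` are assumptions chosen for the example; the talk does not describe the hash function or placement logic Citus actually uses.

```python
import zlib

SHARD_COUNT = 4  # hypothetical value, chosen only for illustration


def shard_for(key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution-column value to a shard index with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


def route(rows, key_name):
    """Group rows by the shard their distribution-column value hashes to."""
    shards = {i: [] for i in range(SHARD_COUNT)}
    for row in rows:
        shards[shard_for(str(row[key_name]))].append(row)
    return shards


# Hypothetical rows; "tenant_id" plays the role of the distribution column.
rows = [{"tenant_id": t, "revenue": 10 * t} for t in range(1, 9)]
placement = route(rows, "tenant_id")
```

Because the hash is stable, every row for the same `tenant_id` lands on the same shard, which is what lets a query for one tenant be sent to a single node.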

Citus, Packaged Three Ways
• Open Source: github.com/citusdata/citus
• Enterprise Software
• Fully-Managed Database as a Service

Simplified Citus Architecture

3 Challenges Distributing Postgres
1. PostgreSQL and High Availability
2. To build a new distributed database, or to fork?
3. Distributed transactions

1. PostgreSQL & High Availability (HA)
Designing for a cloud-native world

Why is High Availability hard?
PostgreSQL replication uses one primary & multiple secondary nodes. Two challenges:
1. Most Postgres clients aren’t smart. When the primary fails, they retry the same IP.
2. Postgres replicates its entire state. This makes it resource intensive to reconstruct new nodes from a primary.
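The first challenge can be made concrete with a small sketch of what a "smart" client would do instead of retrying one IP: walk a list of candidate hosts and use whichever one currently reports itself as primary. The host addresses, the `cluster` dictionary, and the `probe` function are hypothetical stand-ins for a real connection attempt; on a real node the check could be something like `SELECT pg_is_in_recovery()`, which returns true on a standby.

```python
from typing import Callable, Iterable, Optional


def find_primary(hosts: Iterable[str],
                 is_primary: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable host that reports itself as primary."""
    for host in hosts:
        try:
            if is_primary(host):
                return host
        except ConnectionError:
            continue  # node is down; try the next candidate
    return None


# Simulated cluster state (hypothetical addresses):
# each node is "primary", "secondary", or "down".
cluster = {"10.0.0.1": "primary", "10.0.0.2": "secondary"}


def probe(host: str) -> bool:
    """Stand-in for a real connection attempt plus a role check."""
    state = cluster[host]
    if state == "down":
        raise ConnectionError(f"{host} is unreachable")
    return state == "primary"
```

After a failover (the old primary marked "down", the secondary promoted), calling `find_primary(cluster, probe)` again returns the new primary rather than failing against the old address.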

Database Failures Should Be Transparent

Database Failures Shouldn’t Be a Big Deal
3 Methods for HA & Backups in Postgres
1. PostgreSQL streaming replication to replicate from primary to secondary. Back up to S3.
2. Volume-level replication to replicate to secondary’s volume. Back up to S3.
3. Incremental backups to S3. Reconstruct secondary nodes from S3.

Postgres - Streaming Replication (1)
[Diagram: a primary and a secondary PostgreSQL node, each holding table foo, table bar, and WAL logs. Write-ahead logs stream from the primary to the secondary. Monitoring agents handle streaming replication setup and auto failover. A backup process writes to S3 / blob storage (encrypted).]

Postgres – AWS RDS & Azure (2)
[Diagram: a Postgres primary and a Postgres standby, each with a persistent volume holding table foo, table bar, and WAL logs. Monitoring agents provide auto node failover. Backup processes write to S3 / blob storage (encrypted).]

Postgres – Reconstruct from WAL (3)
[Diagram: a Postgres primary and a Postgres secondary, each with a persistent volume holding table foo, table bar, and WAL logs. Monitoring agents provide auto node failover. Backup processes write to S3 / blob storage (encrypted).]

How do these approaches compare?
1. Streaming Replication (local / ephemeral disk)
   • Who does this: On-prem, manual EC2
   • Primary benefits: Simple to set up. Direct I/O: high I/O & large storage
2. Disk Mirroring
   • Who does this: RDS, Azure Preview
   • Primary benefits: Works for MySQL and PostgreSQL. Data durability in cloud environments
3. Reconstruct from WAL
   • Who does this: Heroku, Citus Data
   • Primary benefits: Enables fork and PITR. Node reconstruction in background. (Data durability in cloud environments)

Summary
• In PostgreSQL, a database node’s state gets replicated in its entirety. The replication can be set up in three ways.
• Reconstructing a secondary node from S3 makes bringing up or shooting down nodes easy.
• When you shard your database, the state you need to replicate per node becomes smaller.

2. PostgreSQL has a huge ecosystem. How do you keep up with it?

3 ways to build a distributed database
1. Build a distributed database from scratch
2. Middleware sharding (mimic the parser)
3. Fork your favorite database (like PostgreSQL)

Example Transaction Block

Postgres Features, Tools & Frameworks
• PostgreSQL manual (US Letter)
• Clients for different programming languages
• ORMs, libraries, GUIs
• Tools (dump, restore, analyze)
• New features

At First, Forked PostgreSQL with Style

Two Stage Query Optimization
1. Plan to minimize network I/O
2. Nodes talk to each other using SQL over libpq
3. Learned to cooperate with planner / executor bit by bit (Volcano style executor)

Citus Architecture (Simplified)
[Diagram: the coordinator holds table metadata and receives SELECT avg(revenue) FROM sales. It sends SELECT sum(revenue), count(revenue) FROM table_1001 and SELECT sum … FROM table_1003 to worker node 1 (shards table_1001, table_1003), and SELECT sum … FROM table_1002 and SELECT sum … FROM table_1004 to worker node 2 (shards table_1002, table_1004), and so on up to worker node N.]
• Each node is PostgreSQL with Citus installed
• 1 shard = 1 PostgreSQL table
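The rewrite of avg(revenue) into per-shard sum and count can be sketched as follows. This is a toy model of the idea on the slide, not Citus code: the shard contents are invented, and the function names are hypothetical. It shows why the coordinator asks for sum and count rather than per-shard averages.

```python
def worker_partial(rows):
    """Per-shard work, like SELECT sum(revenue), count(revenue) FROM table_100x.

    Simplification: an empty shard yields (0, 0) here, whereas SQL's sum()
    over no rows is NULL.
    """
    values = [r for r in rows if r is not None]  # count(col) skips NULLs
    return sum(values), len(values)


def coordinator_avg(partials):
    """Combine the (sum, count) pairs from every shard into one global average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Hypothetical shard contents; the shard names follow the slide.
shards = {
    "table_1001": [10, 20],
    "table_1002": [30],
    "table_1003": [40, 50, 60],
    "table_1004": [],
}
partials = [worker_partial(rows) for rows in shards.values()]
global_avg = coordinator_avg(partials)
```

Averaging the per-shard averages would weight a one-row shard the same as a three-row shard and give a different answer; carrying sum and count keeps the result identical to running avg over the whole table.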

Unfork Citus using Extension APIs
CREATE EXTENSION citus;
• System catalogs – Distributed metadata
• Planner hook – Insert, Update, Delete, Select
• Executor hook – Insert, Update, Delete, Select
• Utility hook – Alter Table, Create Index, Vacuum, etc.
• Transaction & resources handling – file descriptors, etc.
• Background worker process – Maintenance processes (distributed deadlock detection, task tracker, etc.)
• Logical decoding – Online data migrations

3. PostgreSQL has transactions. How to handle distributed transactions?

BEGIN INSERT UPDATE SELECT COMMIT ROLLBACK

Consistency in Distributed Databases
1. 2PC: All participating nodes need to be up
2. Paxos: Achieves consensus with quorum
3. Raft: More understandable alternative to Paxos
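To make the 2PC point concrete, here is a minimal sketch of a two-phase commit coordinator over in-memory participants. The `Participant` class and its method names are hypothetical; in PostgreSQL the corresponding steps would be `PREPARE TRANSACTION`, `COMMIT PREPARED`, and `ROLLBACK PREPARED`. The sketch ignores failures during the second phase, which a real implementation has to recover from.

```python
class Participant:
    """Hypothetical stand-in for one database node taking part in a transaction."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.state = "idle"

    def prepare(self):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.state = "prepared"

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Phase 1: every node must prepare. Phase 2: commit everywhere."""
    prepared = []
    for node in participants:
        try:
            node.prepare()
            prepared.append(node)
        except ConnectionError:
            # One unreachable node is enough to abort the whole transaction.
            for done in prepared:
                done.rollback()
            return False
    for node in participants:
        node.commit()
    return True
```

With all nodes up, `two_phase_commit` returns True and every participant ends in the "committed" state. With a single node down it returns False and nothing commits, which is the availability cost named on the slide.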

Concurrency in Distributed Databases

Locks

What is a Lock?
• Protects against concurrent modifications.
• Locks are released at the end of a transaction.

Deadlocks

Transactions Block on 1st Conflicting Lock
Session 1:
    BEGIN;
    UPDATE data SET y = 2 WHERE x = 1;
    <obtained lock on rows with x = 1>
    COMMIT;
    <all locks released>
Session 2:
    BEGIN;
    UPDATE data SET y = 5 WHERE x = 1;
    <waiting for lock on rows with x = 1>
    <obtained lock on rows with x = 1>
    COMMIT;

Transactions and Concurrency
• Transactions that don’t modify the same row can run concurrently.
• Transactions block on the 1st lock that conflicts.
Session 1:
    BEGIN;
    UPDATE data SET y = y - 1 WHERE x = 1;
    COMMIT;
    <all locks released>
Session 2:
    BEGIN;
    UPDATE data SET y = y + 1 WHERE x = 2;
    UPDATE data SET y = y + 1 WHERE x = 1;
    <waiting for lock on rows with x = 1>
    <obtained lock on rows with x = 1>
    COMMIT;
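The blocking behavior on this slide can be modeled with a toy lock table. This is a sketch of the concept only, with hypothetical names; it tracks one lock per row key and does not reflect how PostgreSQL's lock manager is implemented.

```python
class LockTable:
    """Toy row-lock table: one owner per row, waiters block until commit."""

    def __init__(self):
        self.owner = {}    # row key -> transaction holding the lock
        self.waiting = {}  # transaction -> row key it is blocked on

    def acquire(self, txn, row):
        """Return True if txn now holds the lock, False if it must wait."""
        holder = self.owner.get(row)
        if holder is None or holder == txn:
            self.owner[row] = txn
            return True
        self.waiting[txn] = row  # blocks on the first conflicting lock
        return False

    def commit(self, txn):
        """Release all of txn's locks and hand each one to a waiter, if any."""
        released = [row for row, holder in self.owner.items() if holder == txn]
        for row in released:
            del self.owner[row]
        for row in released:
            for waiter, wanted in list(self.waiting.items()):
                if wanted == row:
                    self.owner[row] = waiter
                    del self.waiting[waiter]
                    break
        return released
```

Replaying the slide: session 1 locks x = 1, session 2 locks x = 2 without waiting, then session 2 blocks on x = 1 until session 1 commits, at which point the lock passes to session 2.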

(Distributed) deadlock!
But what if they start blocking each other?
Session 1:
    BEGIN;
    UPDATE data SET y = y - 1 WHERE x = 1;
    UPDATE data SET y = y + 1 WHERE x = 2;
Session 2:
    BEGIN;
    UPDATE data SET y = y - 1 WHERE x = 2;
    UPDATE data SET y = y + 1 WHERE x = 1;

Deadlock detection in PostgreSQL
Deadlock detection builds a graph of processes that are waiting for each other.

Deadlock detection in PostgreSQL
Transactions are cancelled until the cycle is gone.
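The two slides above describe a wait-for graph and cycle removal. A minimal sketch of that idea follows, assuming the graph is given as a dictionary from each process to the set of processes it waits for. The victim choice here is arbitrary and is not meant to match the rule PostgreSQL applies.

```python
def find_cycle(waits_for):
    """Return a list of processes that form a cycle, or None if there is none."""
    visiting, done = set(), set()

    def visit(node, path):
        if node in done:
            return None
        if node in visiting:
            return path[path.index(node):]  # the loop back to `node`
        visiting.add(node)
        path.append(node)
        for nxt in waits_for.get(node, ()):
            cycle = visit(nxt, path)
            if cycle:
                return cycle
        path.pop()
        visiting.discard(node)
        done.add(node)
        return None

    for start in list(waits_for):
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


def resolve_deadlocks(waits_for):
    """Cancel one process per cycle until the graph has no cycles left."""
    graph = {proc: set(targets) for proc, targets in waits_for.items()}
    cancelled = []
    while True:
        cycle = find_cycle(graph)
        if cycle is None:
            return cancelled
        victim = cycle[-1]  # arbitrary choice for this sketch
        cancelled.append(victim)
        graph.pop(victim, None)
        for targets in graph.values():
            targets.discard(victim)
```

For the two-session example, where each session waits for the other, `resolve_deadlocks` cancels exactly one of them and the survivor can proceed.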

Deadlocks in Citus
Citus delegates transactions to nodes.

Deadlocks in Citus
PostgreSQL’s deadlock detector still works.

Deadlocks in Citus
When deadlocks span across nodes, PostgreSQL cannot help us.

Deadlock detection in Citus 7
Citus 7 adds distributed deadlock detection.
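One way to picture distributed deadlock detection is to collect each node's local wait edges, merge them into a single graph keyed by a cluster-wide transaction identifier, and look for a cycle in the merged graph. The sketch below assumes that shape; the node names, the transaction labels, and the edge format are hypothetical, and the talk does not spell out how Citus 7 implements this internally.

```python
def merge_wait_graphs(per_node_edges):
    """Merge per-node (waiter, holder) edges into one cluster-wide graph."""
    merged = {}
    for edges in per_node_edges.values():
        for waiter, holder in edges:
            merged.setdefault(waiter, set()).add(holder)
            merged.setdefault(holder, set())
    return merged


def has_cycle(graph):
    """Repeatedly drop transactions that wait on nothing; leftovers imply a cycle."""
    remaining = {txn: set(targets) for txn, targets in graph.items()}
    changed = True
    while changed:
        changed = False
        for txn in [t for t, targets in remaining.items() if not targets]:
            del remaining[txn]
            for targets in remaining.values():
                targets.discard(txn)
            changed = True
    return bool(remaining)


# The two-session example, with x = 1 stored on one worker and x = 2 on another.
per_node = {
    "worker_a": [("txn2", "txn1")],  # txn2 waits for txn1's lock on x = 1
    "worker_b": [("txn1", "txn2")],  # txn1 waits for txn2's lock on x = 2
}
```

Neither worker sees a cycle on its own, which is why the local detector cannot help. Only the merged graph contains the loop between txn1 and txn2.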

Distributed transactions are… a complex topic
• Most articles on distributed transactions focus on data consistency.
• Data consistency is only one side of the coin. If you’re using a relational database, your application benefits from another key feature: deadlock detection.
• https://www.citusdata.com/blog/2017/08/31/databases-and-distributed-deadlocks-a-faq

So now what?
We talked about 3 challenges distributing Postgres…
1. PostgreSQL, Replication, High Availability
2. Tradeoffs in different approaches to building a distributed database, and how we chose PostgreSQL’s extension APIs
3. Distributed deadlock detection & distributed transactions

“SQL is hard, not impossible, to scale”
