Challenges of Distributing Postgres: A Citus Story | QCon San Francisco 2017 | Ozgun Erdogan

024d6a0dd14fb31c804969a57a06dfbe?s=47 Citus Data
November 13, 2017

Challenges of Distributing Postgres: A Citus Story | QCon San Francisco 2017 | Ozgun Erdogan

Set theory forms the basis for relational algebra and relational databases, and SQL is the lingua franca of modern RDBMS’s. Even with all the attention given to NoSQL in recent years, the lion share of database usage remains relational. But until recently, nearly all relational database solutions have been limited to the resources of a single node. Not anymore. This talk is about my team’s journey tackling the challenges of distributing SQL. Specifically in the context of my favorite (open source) database: Postgres. I believe that too many developers spend too much time worrying about scaling their databases. So at Citus Data, we created an extension to Postgres that enables developers to scale out compute, memory, and storage by distributing queries across a cluster of nodes. This talk describes the distributed systems challenges we faced at Citus in scaling out Postgres—and how we addressed them. I’ll talk about how we use PostgreSQL’s extension APIs to parallelize queries in a distributed cluster. I’ll cover the architecture of a distributed query planner and specifically how the join order planner has to choose between broadcast, co-located, and repartition joins in order to minimize network I/O. And if there’s time, I’ll walk through the dynamic executor logic that we built. The end result: a distributed database and a lot less time spent worrying about scale.

024d6a0dd14fb31c804969a57a06dfbe?s=128

Citus Data

November 13, 2017
Tweet

Transcript

  1. Challenges of Distributing Postgres:
 A Citus Story Ozgun Erdogan QCon

    San Francisco | November 2017
  2. Developers Love Postgres PostgreSQL MySQL MongoDB SQL Server + Oracle

    0% 18% 35% 53% 70% Mar-11 Sept-11 Mar-12 Sept-12 Mar-13 Sept-13 Mar-14 Sept-14 Mar-15 Sept-15 Mar-16 RDBMS: PostgreSQL, MySQL, Microsoft SQL Server, Oracle Ozgun Erdogan | QCon San Francisco 2017
  3. I love Postgres, too 3 Ozgun Erdogan | QCon San

    Francisco 2017 Ozgun Erdogan CTO of Citus Data Distributed Systems Distributed Databases Formerly of Amazon Love drinking margaritas
  4. 4

  5. Our mission at Citus Data 5 Ozgun Erdogan | QCon

    San Francisco 2017 Make it so SaaS businesses
 never have to worry about scaling their database again
  6. What is the Citus database? 1.Scales out PostgreSQL 2.Extension to

    PostgreSQL 3.Available in 3 Ways Ozgun Erdogan | QCon San Francisco 2017 • Using sharding & replication • Query engine parallelizes SQL queries across many nodes • Using PostgreSQL extension APIs
  7. Citus, Packaged Three Ways Ozgun Erdogan | QCon San Francisco

    2017 Open
 Source Enterprise
 Software Fully-Managed
 Database as a Service github.com/citusdata/citus
  8. Simplified Citus Architecture

  9. 3 Challenges Distributing Postgres 1. PostgreSQL and High Availability 2.

    PostgreSQL is huge. How to keep up with it 3. Distributed transactions Ozgun Erdogan | QCon San Francisco 2017
  10. PostgreSQL &
 High Availability (HA) Designing for a Cloud-native world

    1
  11. Why is High Availability hard? PostgreSQL replication uses one primary

    & multiple secondary nodes. Two challenges: 1. Most Postgres clients aren’t smart. When the primary fails, they retry the same IP. 2. Postgres replicates entire state. This makes it resource intensive to reconstruct new nodes from a primary. Ozgun Erdogan | QCon San Francisco 2017
  12. Database Failures Should Be Transparent Ozgun Erdogan | QCon San

    Francisco 2017
  13. Database Failures Shouldn’t Be a Big Deal 1. PostgreSQL streaming

    replication to replicate from primary to secondary. Back up to S3. 2. Volume level replication to replicate to secondary’s volume. Back up to S3. 3. Incremental backups to S3. Reconstruct secondary nodes from S3. Ozgun Erdogan | QCon San Francisco 2017 3 Methods for HA & Backups in Postgres
  14. Postgres - Streaming Replication (1) Write-ahead logs (streaming repl.) Table

    foo Primary – PostgreSQL streaming repl. Table bar WAL logs Table foo Table bar WAL logs Secondary – PostgreSQL streaming repl. Monitoring Agents - streaming repl. setup & auto failover S3 / Blob Storage (Encrypted) Backup Process Ozgun Erdogan | QCon San Francisco 2017
  15. Postgres – AWS RDS & Azure (2) Postgres Primary Monitoring

    Agents (Auto node failover) Persistent Volume Postgres Standby S3 / Blob Storage (Encrypted) Table foo Table bar WAL logs Table foo Table bar WAL logs Backup process Backup Process Persistent Volume Ozgun Erdogan | QCon San Francisco 2017
  16. Postgres – Reconstruct from WAL (3) Postgres Primary Monitoring Agents

    (Auto node failover) Persistent Volume Postgres Secondary Backup Process S3 / Blob Storage (Encrypted) Table foo Table bar WAL logs Persistent Volume Table foo Table bar WAL logs Backup process Ozgun Erdogan | QCon San Francisco 2017
  17. WHO DOES THIS? PRIMARY BENEFITS Streaming Replication (local / ephemeral

    disk) On-prem
 Manual EC2 Simple to set up Direct I/O: High I/O & large storage Disk Mirroring RDS
 Azure Preview Works for MySQL and PostgreSQL
 Data durability in cloud environments Reconstruct from WAL Heroku
 Citus Cloud Enables Fork and PITR Node reconstruction in background (Data durability in cloud environments) How do these approaches compare? 17 Ozgun Erdogan | QCon San Francisco 2017
  18. Summary • In PostgreSQL, a database node’s state gets replicated

    in its entirety. The replication can be set up in three ways. • Reconstructing a secondary node from S3 makes bringing up or shooting down nodes easy. • When you shard your database, the state you need to replicate per node becomes smaller. Ozgun Erdogan | QCon San Francisco 2017
  19. PostgreSQL is huge How do you scale all features? 2

  20. 3 ways to build a distributed database 1. Build a

    distributed database from scratch 2. Middleware sharding (mimic the parser) 3. Fork your favorite database (like PostgreSQL) Ozgun Erdogan | QCon San Francisco 2017
  21. Example Transaction Block Ozgun Erdogan | QCon San Francisco 2017

  22. Postgres Features, Tools & Frameworks • PostgreSQL manual (US Letter)

    • Clients for diff programming languages • ORMs, libraries, GUIs • Tools (dump, restore, analyze) • New features Ozgun Erdogan | QCon San Francisco 2017
  23. At First, Forked PostgreSQL with Style Ozgun Erdogan | QCon

    San Francisco 2017
  24. Two Stage Query Optimization 1. Plan to minimize network I/O

    2. Nodes talk to each other using SQL over libpq 3. Learned to cooperate with planner / executor bit by bit (Volcano style executor) Ozgun Erdogan | QCon San Francisco 2017
  25. Citus Architecture (Simplified) 25 SELECT avg(revenue) FROM sales Coordinator SELECT

    sum(revenue), count(revenue) FROM table_1001 SELECT sum … FROM table_1003 Worker node 1 Table metadata Table_1001 Table_1003 SELECT sum … FROM table_1002 SELECT sum … FROM table_1004 Worker node 2 Table_1002 Table_1004 Worker node N . . . . . . Each node PostgreSQL with Citus installed 1 shard = 1 PostgreSQL table Ozgun Erdogan | QCon San Francisco 2017
  26. Unfork Citus using Extension APIs CREATE EXTENSION citus; • System

    catalogs – Distributed metadata • Planner hook – Insert, Update, Delete, Select • Executor hook – Insert, Update, Delete, Select • Utility hook – Alter Table, Create Index, Vacuum, etc. • Transaction & resources handling – file descriptors, etc. • Background worker process – Maintenance processes (distributed deadlock detection, task tracker, etc.) • Logical decoding – Online data migrations Ozgun Erdogan | QCon San Francisco 2017
  27. PostgreSQL has transactions. How to handle distributed transactions 3

  28. BEGIN INSERT UPDATE SELECT COMMIT ROLLBA CK

  29. Consistency in Distributed Databases 1. 2PC: All participating nodes need

    to be up 2. Paxos: Achieves consensus with quorum 3. Raft: More understandable alternative to Paxos Ozgun Erdogan | QCon San Francisco 2017
  30. Concurrency in Distributed Databases Ozgun Erdogan | QCon San Francisco

    2017
  31. Locks

  32. What is a Lock? • Protects against concurrent modifications. •

    Locks are released at the end of a transaction.
  33. Transactions Block on 1st Conflicting Lock

  34. Transactions and Concurrency • Transactions that don’t modify the same

    row can run concurrently.
  35. Transactions and Concurrency • Transactions that don’t modify the same

    row can run concurrently.
  36. Transactions and Concurrency • Transactions that don’t modify the same

    row can run concurrently.
  37. Transactions and Concurrency • Transactions that don’t modify the same

    row can run concurrently.
  38. Transactions and Concurrency • Transactions that don’t modify the same

    row can run concurrently.
  39. Transactions and Concurrency • Transactions that don’t modify the same

    row can run concurrently.
  40. Transactions and Concurrency • Transactions that don’t modify the same

    row can run concurrently.
  41. Transactions and Concurrency • Transactions that don’t modify the same

    row can run concurrently.
  42. Transactions and Concurrency • Transactions that don’t modify the same

    row can run concurrently.
  43. Distributed transactions are a complex topic • Most articles on

    distributed transactions focus on data consistency. • Data consistency is only one side of the coin. If you’re using a relational database, your application benefits from another key feature: deadlock detection. • https://www.citusdata.com/blog/2017/08/31/ databases-and-distributed-deadlocks-a-faq Ozgun Erdogan | QCon San Francisco 2017
  44. So now what? We talked about 3 challenges distributing Postgres

    1. PostgreSQL, Replication, High Availability 2. Tradeoffs in building a distributed database— and how we chose PostgreSQL’s extension APIs 3. Distributed deadlock detection & distributed transactions Ozgun Erdogan | QCon San Francisco 2017
  45. 45 Solving a very complex problem

  46. 46 “SQL is hard, not impossible, to scale”

  47. © 2017 Citus Data. All right reserved. ozgun@citusdata.com Questions? @citusdata

    Ozgun Erdogan www.citusdata.com