KEYNOTE: The Distributed PostgreSQL Problem & How Citus Solves it | Citus Con 2023 | Marco Slot

Citus is a PostgreSQL extension that adds the ability to distribute and replicate PostgreSQL tables across a shared-nothing PostgreSQL cluster.
Citus open-source repo on GitHub: https://github.com/citusdata/citus
Citus is a core component of Azure Cosmos DB for PostgreSQL.

Many cloud-era (OLTP) applications have activity and data multipliers. This data intensity can overwhelm PostgreSQL running on a typical cloud VM with typical cloud storage.

General scalability challenges:

Why is it hard to build Distributed PostgreSQL?

Relational databases are relational: evaluating a relationship is a computation that requires knowledge of both sides.

Evaluating a relationship on a single machine: following and evaluating a relationship takes time.
(Diagram: rows and indexes, in memory and on disk.)

Evaluating long-distance relationships (across the network) takes a lot of time.

The PostgreSQL protocol is synchronous, so the maximum possible throughput is #connections / (average response time).
Many ORMs send long transaction blocks with multiple queries and subtransactions (e.g. a 100 ms query becomes a 1 s transaction).
A high number of concurrent connections is often impractical for applications.
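The throughput bound above can be made concrete with a little arithmetic. This is a minimal sketch, assuming each connection has exactly one transaction in flight at a time (which is what a synchronous protocol implies) and ignoring every other bottleneck; the function name `max_throughput` is hypothetical.

```python
def max_throughput(connections: int, avg_response_time_s: float) -> float:
    """Upper bound on transactions per second over a synchronous protocol.

    Each connection waits for one response before sending the next request,
    so it completes at most 1 / avg_response_time_s transactions per second.
    """
    if connections < 0 or avg_response_time_s <= 0:
        raise ValueError("need connections >= 0 and a positive response time")
    return connections / avg_response_time_s


# 100 connections running 100 ms transactions, versus the same work wrapped
# by an ORM into 1 s transaction blocks (the example from the slide).
fast = max_throughput(100, 0.1)
slow = max_throughput(100, 1.0)
print(f"{fast:.0f} tx/s vs {slow:.0f} tx/s")
```

Recovering the original throughput with 1 s transactions would take ten times as many concurrent connections, which is why long transaction blocks hurt so much.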

To make distributed relational databases fast, make operations non-distributed.

This requires certain workload patterns: finding a scaling dimension with relatively few relationships.
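One way to reason about a scaling dimension is to count how many relationships a candidate distribution column keeps on a single shard. The sketch below is a toy model: the `shard()` rule (value modulo a fixed shard count) is a hypothetical stand-in for Citus's hashing, and the measurement rows are made up.

```python
NUM_SHARDS = 4  # hypothetical shard count for illustration


def shard(value: int) -> int:
    # Toy placement rule: value modulo the shard count. Citus hashes the
    # distribution column instead; this only serves the illustration.
    return value % NUM_SHARDS


def local_fraction(measurements: list, measurement_key: str) -> float:
    """Fraction of measurement->device relationships that stay on one shard
    when measurements are sharded by `measurement_key` and devices by device_id."""
    local = sum(
        1
        for m in measurements
        if shard(m[measurement_key]) == shard(m["device_id"])
    )
    return local / len(measurements)


# 1,000 hypothetical measurements spread over 10 devices.
measurements = [{"measurement_id": i, "device_id": i % 10} for i in range(1000)]

# Sharding both tables by device_id keeps every relationship local (1.0).
by_device = local_fraction(measurements, "device_id")
# Sharding measurements by their own ID splits many relationships across shards.
by_measurement = local_fraction(measurements, "measurement_id")
print(by_device, by_measurement)
```

The column that keeps the most relationships local (here `device_id`) is the natural scaling dimension.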

Tables can be distributed and replicated according to data relationships.
Co-location: devices and measurements are both distributed tables, split along the same device ID ranges.
Reference tables: zones is a reference table, replicated to every node.
(Diagram: three nodes holding devices and measurements for device IDs 1-10, 11-20 and 21-30 respectively, each with its own copy of zones.)
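The placement in the diagram can be modeled in a few lines. This is a minimal sketch, assuming one shard per node and the device ID ranges drawn on the slide (Citus hashes the distribution column by default); the node names and the `place` helper are hypothetical.

```python
from typing import Dict, List

# Hypothetical placement mirroring the slide: one node per device ID range.
SHARD_RANGES = {"node1": range(1, 11), "node2": range(11, 21), "node3": range(21, 31)}


def node_for(device_id: int) -> str:
    for node, ids in SHARD_RANGES.items():
        if device_id in ids:
            return node
    raise KeyError(f"no shard covers device_id {device_id}")


def place(devices: List[dict], measurements: List[dict],
          zones: List[dict]) -> Dict[str, Dict[str, List[dict]]]:
    # Reference table: every node gets a full copy of zones.
    placement = {
        node: {"devices": [], "measurements": [], "zones": list(zones)}
        for node in SHARD_RANGES
    }
    # Distributed tables are split by the shared distribution column, so a
    # device and its measurements always land on the same node (co-location).
    for d in devices:
        placement[node_for(d["device_id"])]["devices"].append(d)
    for m in measurements:
        placement[node_for(m["device_id"])]["measurements"].append(m)
    return placement


zones = [{"zone_id": 1, "name": "north"}]
devices = [{"device_id": 22, "zone_id": 1}]
measurements = [{"device_id": 22, "value": 3.5}]
placement = place(devices, measurements, zones)
print(placement["node3"]["devices"], placement["node3"]["measurements"])
```

Because device 22, its measurements and a copy of its zone all live on the same node, relationships between them never cross the network.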

Queries can often be fully pushed down to the node that holds the data and its relationships:
    select * from measurements join devices using (device_id) join zones using (zone_id) where device_id = 22;
(Diagram: device_id = 22 falls in the 21-30 range, so the node holding devices (21-30), measurements (21-30) and its copy of zones can answer the whole query.)
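The routing decision for this query reduces to a lookup on the distribution column. This is a minimal sketch, assuming the slide's three device ID ranges map to three hypothetical nodes; `target_nodes` is an illustrative name, not a Citus function.

```python
from typing import List, Optional

# Hypothetical placement mirroring the slide: one node per device ID range.
SHARD_RANGES = {"node1": range(1, 11), "node2": range(11, 21), "node3": range(21, 31)}


def target_nodes(device_id: Optional[int]) -> List[str]:
    """Nodes that must run a query on the co-located tables.

    With an equality filter on the distribution column, only the node holding
    that shard is involved, and it can evaluate the joins locally because
    zones is a reference table present on every node. Without such a filter,
    every node runs its part of the query.
    """
    if device_id is None:
        return list(SHARD_RANGES)
    for node, ids in SHARD_RANGES.items():
        if device_id in ids:
            return [node]
    raise KeyError(f"no shard covers device_id {device_id}")


print(target_nodes(22))    # ['node3']
print(target_nodes(None))  # ['node1', 'node2', 'node3']
```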

Queries can often be fully pushed down to the node that holds the data and its relationships:
    insert into measurements values (22, …);
    update devices set active = true where device_id = 22;
    call stored_proc(device_id := 22, …);
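When every statement in a transaction block uses the same distribution column value, as in the three statements above, the whole block can be delegated to one node. This is a minimal sketch of that decision, assuming the slide's three device ID ranges and hypothetical node names; the multi-node case is only labeled here (Citus uses two-phase commit when a transaction writes on several nodes, which this sketch does not model).

```python
from typing import Iterable

# Hypothetical placement mirroring the slide: one node per device ID range.
SHARD_RANGES = {"node1": range(1, 11), "node2": range(11, 21), "node3": range(21, 31)}


def node_for(device_id: int) -> str:
    for node, ids in SHARD_RANGES.items():
        if device_id in ids:
            return node
    raise KeyError(f"no shard covers device_id {device_id}")


def plan_transaction(device_ids: Iterable[int]) -> str:
    """Decide how a transaction block could run, given the distribution
    column value that each of its statements touches."""
    nodes = sorted({node_for(d) for d in device_ids})
    if not nodes:
        return "nothing to route"
    if len(nodes) == 1:
        return f"delegate whole block to {nodes[0]}"
    return "coordinate across " + ", ".join(nodes)


# insert, update and stored procedure call, all for device 22
print(plan_transaction([22, 22, 22]))
# a block that touches devices on two different nodes
print(plan_transaction([5, 22]))
```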

Cross-shard joins can be efficiently pushed down when they join on a co-located shard key or with a reference table. Both joins and foreign keys work this way:
    select * from measurements join devices using (device_id) join zones using (zone_id);
Non-co-located joins have worse performance and some limitations.
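The reason co-located joins push down is that joining each node's local shards and concatenating the results gives the same rows as one global join. This is a minimal sketch that checks this on made-up rows, assuming the slide's three device ID ranges and hypothetical node names; it models the join in plain Python rather than SQL.

```python
from itertools import chain

# Hypothetical placement mirroring the slide: one node per device ID range.
SHARD_RANGES = {"node1": range(1, 11), "node2": range(11, 21), "node3": range(21, 31)}


def node_for(device_id: int) -> str:
    for node, ids in SHARD_RANGES.items():
        if device_id in ids:
            return node
    raise KeyError(f"no shard covers device_id {device_id}")


def join_rows(measurements, devices, zones):
    """measurements JOIN devices USING (device_id) JOIN zones USING (zone_id)."""
    devices_by_id = {d["device_id"]: d for d in devices}
    zones_by_id = {z["zone_id"]: z for z in zones}
    out = []
    for m in measurements:
        d = devices_by_id.get(m["device_id"])
        if d is None:
            continue
        z = zones_by_id.get(d["zone_id"])
        if z is None:
            continue
        out.append((m["device_id"], m["value"], d["zone_id"], z["name"]))
    return out


zones = [{"zone_id": 1, "name": "north"}, {"zone_id": 2, "name": "south"}]
devices = [{"device_id": i, "zone_id": 1 + i % 2} for i in range(1, 31)]
measurements = [{"device_id": i, "value": i * 10} for i in range(1, 31)]

# Each node joins only its own device/measurement shards plus its zones copy.
per_node = []
for node in SHARD_RANGES:
    local_devices = [d for d in devices if node_for(d["device_id"]) == node]
    local_measurements = [m for m in measurements if node_for(m["device_id"]) == node]
    per_node.append(join_rows(local_measurements, local_devices, zones))

merged = sorted(chain.from_iterable(per_node))
print(merged == sorted(join_rows(measurements, devices, zones)))
```

If measurements were instead sharded by a different column, a device and its measurements could sit on different nodes, and the per-node joins would miss rows; that is the non-co-located case the slide warns about.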

Do not take “distributed = fast” for granted.
(Chart: HammerDB TPROC-C with 1000 warehouses on 224 cores, NOPM (higher is better), comparing PostgreSQL (96 vcpus), Yugabyte Managed (224 vcpus), CockroachDB Dedicated* (224 vcpus) and Azure Cosmos DB for PostgreSQL (224 vcores); series: 1k warehouses, best result (20k warehouses).)
* HammerDB stored procedures are not supported on CockroachDB, so its built-in TPC-C implementation was used.

Microservices can scale their CRUD workloads (simple single-shard queries).
SaaS apps can co-locate by tenant ID (complex single-shard queries).
IoT apps can co-locate measurements and devices by device ID (parallel queries).
Geospatial apps can replicate the “map” to all nodes while keeping point data in distributed tables, and do fast spatial joins.

Any worker node can handle distributed queries and transactions.
(Diagram: SQL requests from real-time analytics (e.g. IoT, time series), high-throughput CRUD (e.g. microservices) and multi-tenant OLTP (e.g. software-as-a-service) workloads.)

(Slide image: the table of contents of the PostgreSQL documentation, from “SQL Syntax” through “Backup Manifest Format”.)

Most PostgreSQL features just work on Citus tables:
Joins; transaction blocks; subqueries & CTEs; sequences; expression indexes; partial indexes; custom types; prepared statements; stored procedures; time-partitioning; …

Distributed database superpowers with PostgreSQL-level efficiency:
Distributed & reference tables; co-location; scale OLTP throughput; fast co-located joins, foreign keys, ..; parallel, distributed queries; transactional ETL (INSERT..SELECT); fast data loading (COPY); online rebalancing; stored procedure call routing; columnar compression; …

Some gaps remain:
Schema-level sharding; DDL from any node; automatic shard splits; non-co-located foreign keys, triggers; unique constraints on non-distribution columns; cross-node snapshot isolation; geo-partitioning; database-level sharding; non-co-located correlated subqueries; vectorized execution; …

PostgreSQL is the best PostgreSQL implementation, so build a distributed database on top of it using extension APIs.
PostgreSQL: 1 release per year; community-driven OSS database engineering; 9+ active contributors at Microsoft.
Citus: 3-4 releases per year; Microsoft-driven OSS distributed systems engineering; 13 engineers.


https://aka.ms/open-source-discord
http://aka.ms/cituscon-ondemand
https://github.com/citusdata/citus


Citus Data

