Upgrade to Pro — share decks privately, control downloads, hide ads and more …

KEYNOTE: The Distributed PostgreSQL Problem & How Citus Solves it | Citus Con 2023 | Marco Slot

KEYNOTE: The Distributed PostgreSQL Problem & How Citus Solves it | Citus Con 2023 | Marco Slot

These are the slides from Marco Slot's keynote talk at Citus Con: An Event for Postgres 2023, a virtual event organized by Microsoft. Building distributed PostgreSQL is perhaps one of the most challenging software engineering projects imaginable. In this keynote, Marco explores the distributed PostgreSQL problem and how Citus solves it.

Early on, the Citus team decided to architect Citus as PostgreSQL extension. That way Citus remains part of the PostgreSQL ecosystem even as PostgreSQL keeps developing. Moreover, architecting Citus as an extension made distribution a feature that can simply be added to PostgreSQL without losing the versatile feature set of Postgres, nor its mature, efficient implementations.

The goal of the Citus database is to provide high PostgreSQL performance at any scale, but simply distributing data across machines is rarely sufficient to achieve that. Crisp distribution concepts and careful trade-offs are important to favor workload patterns that benefit from scaling out. There are also many complex engineering problems given the large PostgreSQL feature set, failures and concurrency in distributed systems, and mission-critical nature of databases.

Marco discusses the main engineering challenges faced over the past 10 years of developing the fastest, most mature, open-source Distributed PostgreSQL implementation: Citus.

Citus Data

May 24, 2023
Tweet

More Decks by Citus Data

Other Decks in Technology

Transcript

  1. Citus is a PostgreSQL extension that adds the ability to

    distribute and replicate PostgreSQL tables across a shared-nothing PostgreSQL cluster. Citus open-source repo on GitHub: https://github.com/citusdata/citus Citus is a core component of Azure Cosmos DB for PostgreSQL
  2. Many cloud era (OLTP) applications have activity & data multipliers:

    Data-intensity can overwhelm PostgreSQL on typical cloud VM with typical cloud storage.
  3. Evaluating a relationship is a computation that requires knowledge of

    both sides Relational databases are relational
  4. Evaluating a relationship in a single machine: Following and evaluating

    a relationship takes time C Row Row Memory Disk Index Index Index
  5. The PostgreSQL protocol is synchronous. Maximum possible throughput: #connections /

    (avg. response time) Many ORMs send long transaction blocks with multiple queries, subtransactions (e.g. 100ms query becomes 1s transaction) High number of concurrent connections often impractical for applications.
  6. Tables can be distributed & replicated according to data relationships.

    Co-location: distributed distributed Reference tables: reference devices (1-10) zones measurements (1-10) devices (11-20) zones measurements (11-20) devices (21-30) zones measurements (21-30)
  7. Queries can often be fully pushed down to node that

    holds data & relationships. devices (1-10) zones measurements (1-10) devices (11-20) zones measurements (11-20) devices (21-30) zones measurements (21-30) select * from measurements join devices using (device_id) join zones using (zone_id) where device_id = 22;
  8. Queries can often be fully pushed down to node that

    holds data & relationships. devices (1-10) zones measurements (1-10) devices (11-20) zones measurements (11-20) devices (21-30) zones measurements (21-30) insert into measurements values (22, …); update devices set active = true where device_id = 22; call stored_proc(device_id := 22, …);
  9. Cross-shard joins can be efficiently pushed down when they join

    on co-located shard key or with a reference table. devices (1-10) zones measurements (1-10) devices (11-20) zones measurements (11-20) devices (21-30) zones measurements (21-30) Joins Foreign keys select * from measurements join devices using (device_id) join zones using (zone_id); (non-co-located joins have worse perf, some limitations)
  10. Do not take “distributed = fast” for granted. * HammerDB

    stored procedures not supported on CockroachDB, used built-in TPC-C implementation 0 200000 400000 600000 800000 1000000 1200000 PostgreSQL (96 vcpus) Yugabyte Managed (224 vcpus) CockroachDB Dedicated* (224 vcpus) Azure Cosmos DB for PostgreSQL (224 vcores) NOPM (higher is better) HammerDB TPROC-C with 1000 warehouses on 224 cores 1k warehouses best result (20k warehuses)
  11. Microservices can scale their CRUD workloads (simple single shard queries)

    SaaS apps can co-locate by tenant ID (complex single shard queries) IoT apps can co-locate measurements & devices by device ID (parallel queries) Geospatial apps can replicate the “map” to all nodes, while keeping point data in distributed tables and do fast spatial joins.
  12. Any worker node can handle distributed queries & transactions SQL

    Requests Real-time analytics (e.g. IoT, time series) High throughput CRUD (e.g. microservices) Multi-tenant OLTP (e.g. Software-as-a-service)
  13. 4. SQL Syntax 5. Data Definition 6. Data Manipulation 7.

    Queries 8. Data Types 9. Functions and Operators 10. Type Conversion 11. Indexes 12. Full Text Search 13. Concurrency Control 14. Performance Tips 15. Parallel Query 19. Server Setup and Operation 20. Server Configuration 21. Client Authentication 22. Database Roles 23. Managing Databases 24. Localization 25. Routine Database Maintenance Tasks 26. Backup and Restore 27. High Availability, Load Balancing, and Replication 28. Monitoring Database Activity 29. Monitoring Disk Usage 30. Reliability and the Write-Ahead Log 31. Logical Replication 32. Just-in-Time Compilation (JIT) 33. Regression Tests 19. Server Setup and Operation 20. Server Configuration 21. Client Authentication 22. Database Roles 23. Managing Databases 24. Localization 25. Routine Database Maintenance Tasks 26. Backup and Restore 27. High Availability, Load Balancing, and Replication 28. Monitoring Database Activity 29. Monitoring Disk Usage 30. Reliability and the Write-Ahead Log 31. Logical Replication 32. Just-in-Time Compilation (JIT) 33. Regression Tests 38. Extending SQL 39. Triggers 40. Event Triggers 41. The Rule System 42. Procedural Languages 43. PL/pgSQL — SQL Procedural Language 44. PL/Tcl — Tcl Procedural Language 45. PL/Perl — Perl Procedural Language 46. PL/Python — Python Procedural Language 47. Server Programming Interface 48. Background Worker Processes 49. Logical Decoding 50. Replication Progress Tracking 51. Archive Modules 52. Overview of PostgreSQL Internals 53. System Catalogs 54. System Views 55. Frontend/Backend Protocol 56. PostgreSQL Coding Conventions 57. Native Language Support 58. Writing a Procedural Language Handler 59. Writing a Foreign Data Wrapper 60. Writing a Table Sampling Method 61. Writing a Custom Scan Provider 62. Genetic Query Optimizer 63. Table Access Method Interface Definition 64. Index Access Method Interface Definition 65. Generic WAL Records 66. Custom WAL Resource Managers 67. B-Tree Indexes 68. GiST Indexes 69. SP-GiST Indexes 70. GIN Indexes 71. BRIN Indexes 72. Hash Indexes 73. Database Physical Storage 74. System Catalog Declarations and Initial Contents 75. How the Planner Uses Statistics 76. Backup Manifest Format
  14. Joins Transaction blocks Subqueries & CTEs Sequences Expression indexes Partial

    indexes Custom types Prepared statements Stored procedures Time-partitioning … Schema-level sharding DDL from any node Automatic shard splits Non-co-located foreign keys, triggers Unique constraints on non-dist. column Cross-node snapshot isolation Geo-partitioning Database-level sharding Non-co-located correlated subqueries Vectorized execution … Distributed & reference tables Co-location Scale OLTP throughput Fast co-located joins, foreign keys, .. Parallel, distributed queries Transactional ETL (INSERT..SELECT) Fast data loading (COPY) Online rebalancing Stored procedure call routing Columnar compression … Most PostgreSQL features just work on Citus tables Distributed database superpowers with PostgreSQL-level efficiency Some gaps remain
  15. PostgreSQL is the best PostgreSQL implementation. Build a distributed database

    on top using extension APIs. PostgreSQL 1 release per year community-driven OSS database engineering 9+ active contributors at MS Citus 3-4 releases per year Microsoft-driven OSS distributed systems engineering 13 engineers