Upgrade to Pro — share decks privately, control downloads, hide ads and more …

KEYNOTE: The Distributed PostgreSQL Problem & How Citus Solves it | Citus Con 2023 | Marco Slot

KEYNOTE: The Distributed PostgreSQL Problem & How Citus Solves it | Citus Con 2023 | Marco Slot

These are the slides from Marco Slot's keynote talk at Citus Con: An Event for Postgres 2023, a virtual event organized by Microsoft. Building distributed PostgreSQL is perhaps one of the most challenging software engineering projects imaginable. In this keynote, Marco explores the distributed PostgreSQL problem and how Citus solves it.

Early on, the Citus team decided to architect Citus as PostgreSQL extension. That way Citus remains part of the PostgreSQL ecosystem even as PostgreSQL keeps developing. Moreover, architecting Citus as an extension made distribution a feature that can simply be added to PostgreSQL without losing the versatile feature set of Postgres, nor its mature, efficient implementations.

The goal of the Citus database is to provide high PostgreSQL performance at any scale, but simply distributing data across machines is rarely sufficient to achieve that. Crisp distribution concepts and careful trade-offs are important to favor workload patterns that benefit from scaling out. There are also many complex engineering problems given the large PostgreSQL feature set, failures and concurrency in distributed systems, and mission-critical nature of databases.

Marco discusses the main engineering challenges faced over the past 10 years of developing the fastest, most mature, open-source Distributed PostgreSQL implementation: Citus.

Citus Data

May 24, 2023
Tweet

More Decks by Citus Data

Other Decks in Technology

Transcript

  1. View Slide

  2. Citus is a PostgreSQL extension that adds the ability to distribute and replicate
    PostgreSQL tables across a shared-nothing PostgreSQL cluster.
    Citus open-source repo on GitHub: https://github.com/citusdata/citus
    Citus is a core component of Azure Cosmos DB for PostgreSQL

    View Slide

  3. View Slide

  4. View Slide

  5. View Slide

  6. Many cloud era (OLTP) applications have activity & data multipliers:
    Data-intensity can overwhelm PostgreSQL on typical cloud VM
    with typical cloud storage.

    View Slide

  7. General scalability challenges:

    View Slide

  8. View Slide

  9. View Slide

  10. Why is it hard to build Distributed PostgreSQL?

    View Slide

  11. View Slide

  12. View Slide

  13. Evaluating a relationship is a computation that requires knowledge of both sides
    Relational databases are relational

    View Slide

  14. Evaluating a relationship in a single machine:
    Following and evaluating a relationship takes time
    C
    Row Row
    Memory
    Disk
    Index
    Index
    Index

    View Slide

  15. View Slide

  16. Ro
    w
    Evaluating long-distance relationships takes a lot of time

    View Slide

  17. The PostgreSQL protocol is synchronous.
    Maximum possible throughput: #connections / (avg. response time)
    Many ORMs send long transaction blocks with multiple queries, subtransactions
    (e.g. 100ms query becomes 1s transaction)
    High number of concurrent connections often impractical for applications.

    View Slide

  18. Ro
    w
    To make distributed relational databases fast, make operations non-distributed
    Ro
    w

    View Slide

  19. Requires certain workload patterns, finding a scaling dimension
    with relatively few relationships

    View Slide

  20. Tables can be distributed & replicated according to data relationships.
    Co-location: distributed
    distributed
    Reference tables: reference
    devices
    (1-10)
    zones
    measurements
    (1-10)
    devices
    (11-20)
    zones
    measurements
    (11-20)
    devices
    (21-30)
    zones
    measurements
    (21-30)

    View Slide

  21. Queries can often be fully pushed down to node that holds data & relationships.
    devices
    (1-10)
    zones
    measurements
    (1-10)
    devices
    (11-20)
    zones
    measurements
    (11-20)
    devices
    (21-30)
    zones
    measurements
    (21-30)
    select * from measurements
    join devices using (device_id)
    join zones using (zone_id)
    where device_id = 22;

    View Slide

  22. Queries can often be fully pushed down to node that holds data & relationships.
    devices
    (1-10)
    zones
    measurements
    (1-10)
    devices
    (11-20)
    zones
    measurements
    (11-20)
    devices
    (21-30)
    zones
    measurements
    (21-30)
    insert into measurements values (22, …);
    update devices set active = true where device_id = 22;
    call stored_proc(device_id := 22, …);

    View Slide

  23. Cross-shard joins can be efficiently pushed down when they join on co-located
    shard key or with a reference table.
    devices
    (1-10)
    zones
    measurements
    (1-10)
    devices
    (11-20)
    zones
    measurements
    (11-20)
    devices
    (21-30)
    zones
    measurements
    (21-30)
    Joins
    Foreign keys
    select * from measurements
    join devices using (device_id)
    join zones using (zone_id);
    (non-co-located joins have
    worse perf, some limitations)

    View Slide

  24. Do not take “distributed = fast” for granted.
    * HammerDB stored procedures not supported on CockroachDB, used built-in TPC-C implementation
    0 200000 400000 600000 800000 1000000 1200000
    PostgreSQL (96 vcpus)
    Yugabyte Managed (224 vcpus)
    CockroachDB Dedicated* (224 vcpus)
    Azure Cosmos DB for PostgreSQL (224 vcores)
    NOPM (higher is better)
    HammerDB TPROC-C with 1000 warehouses on 224 cores
    1k warehouses best result (20k warehuses)

    View Slide

  25. View Slide

  26. Microservices can scale their CRUD workloads (simple single shard queries)
    SaaS apps can co-locate by tenant ID (complex single shard queries)
    IoT apps can co-locate measurements & devices by device ID (parallel queries)
    Geospatial apps can replicate the “map” to all nodes, while keeping point data in
    distributed tables and do fast spatial joins.

    View Slide

  27. Any worker node can handle distributed queries & transactions
    SQL Requests
    Real-time analytics
    (e.g. IoT, time series)
    High throughput CRUD
    (e.g. microservices)
    Multi-tenant OLTP
    (e.g. Software-as-a-service)

    View Slide

  28. View Slide

  29. View Slide

  30. 4. SQL Syntax
    5. Data Definition
    6. Data Manipulation
    7. Queries
    8. Data Types
    9. Functions and Operators
    10. Type Conversion
    11. Indexes
    12. Full Text Search
    13. Concurrency Control
    14. Performance Tips
    15. Parallel Query
    19. Server Setup and Operation
    20. Server Configuration
    21. Client Authentication
    22. Database Roles
    23. Managing Databases
    24. Localization
    25. Routine Database Maintenance Tasks
    26. Backup and Restore
    27. High Availability, Load Balancing, and Replication
    28. Monitoring Database Activity
    29. Monitoring Disk Usage
    30. Reliability and the Write-Ahead Log
    31. Logical Replication
    32. Just-in-Time Compilation (JIT)
    33. Regression Tests
    19. Server Setup and Operation
    20. Server Configuration
    21. Client Authentication
    22. Database Roles
    23. Managing Databases
    24. Localization
    25. Routine Database Maintenance Tasks
    26. Backup and Restore
    27. High Availability, Load Balancing, and Replication
    28. Monitoring Database Activity
    29. Monitoring Disk Usage
    30. Reliability and the Write-Ahead Log
    31. Logical Replication
    32. Just-in-Time Compilation (JIT)
    33. Regression Tests
    38. Extending SQL
    39. Triggers
    40. Event Triggers
    41. The Rule System
    42. Procedural Languages
    43. PL/pgSQL — SQL Procedural Language
    44. PL/Tcl — Tcl Procedural Language
    45. PL/Perl — Perl Procedural Language
    46. PL/Python — Python Procedural Language
    47. Server Programming Interface
    48. Background Worker Processes
    49. Logical Decoding
    50. Replication Progress Tracking
    51. Archive Modules
    52. Overview of PostgreSQL Internals
    53. System Catalogs
    54. System Views
    55. Frontend/Backend Protocol
    56. PostgreSQL Coding Conventions
    57. Native Language Support
    58. Writing a Procedural Language Handler
    59. Writing a Foreign Data Wrapper
    60. Writing a Table Sampling Method
    61. Writing a Custom Scan Provider
    62. Genetic Query Optimizer
    63. Table Access Method Interface Definition
    64. Index Access Method Interface Definition
    65. Generic WAL Records
    66. Custom WAL Resource Managers
    67. B-Tree Indexes
    68. GiST Indexes
    69. SP-GiST Indexes
    70. GIN Indexes
    71. BRIN Indexes
    72. Hash Indexes
    73. Database Physical Storage
    74. System Catalog Declarations and Initial Contents
    75. How the Planner Uses Statistics
    76. Backup Manifest Format

    View Slide

  31. Joins
    Transaction blocks
    Subqueries & CTEs
    Sequences
    Expression indexes
    Partial indexes
    Custom types
    Prepared statements
    Stored procedures
    Time-partitioning

    Schema-level sharding
    DDL from any node
    Automatic shard splits
    Non-co-located foreign keys, triggers
    Unique constraints on non-dist. column
    Cross-node snapshot isolation
    Geo-partitioning
    Database-level sharding
    Non-co-located correlated subqueries
    Vectorized execution

    Distributed & reference tables
    Co-location
    Scale OLTP throughput
    Fast co-located joins, foreign keys, ..
    Parallel, distributed queries
    Transactional ETL (INSERT..SELECT)
    Fast data loading (COPY)
    Online rebalancing
    Stored procedure call routing
    Columnar compression

    Most PostgreSQL features
    just work on Citus tables
    Distributed database superpowers
    with PostgreSQL-level efficiency
    Some gaps remain

    View Slide

  32. View Slide

  33. View Slide

  34. PostgreSQL is the best PostgreSQL implementation.
    Build a distributed database on top using extension APIs.
    PostgreSQL
    1 release per year
    community-driven OSS
    database engineering
    9+ active contributors at MS
    Citus
    3-4 releases per year
    Microsoft-driven OSS
    distributed systems engineering
    13 engineers

    View Slide

  35. View Slide

  36. [email protected] https://aka.ms/open-source-discord
    http://aka.ms/cituscon-ondemand
    https://github.com/citusdata/citus

    View Slide