Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

Citus is a PostgreSQL extension that adds the ability to distribute and replicate PostgreSQL tables across a shared-nothing PostgreSQL cluster. Citus open-source repo on GitHub: https://github.com/citusdata/citus Citus is a core component of Azure Cosmos DB for PostgreSQL

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Many cloud era (OLTP) applications have activity & data multipliers: Data-intensity can overwhelm PostgreSQL on typical cloud VM with typical cloud storage.

Slide 7

Slide 7 text

General scalability challenges:

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

Why is it hard to build Distributed PostgreSQL?

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

Evaluating a relationship is a computation that requires knowledge of both sides Relational databases are relational

Slide 14

Slide 14 text

Evaluating a relationship in a single machine: Following and evaluating a relationship takes time C Row Row Memory Disk Index Index Index

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

Ro w Evaluating long-distance relationships takes a lot of time

Slide 17

Slide 17 text

The PostgreSQL protocol is synchronous. Maximum possible throughput: #connections / (avg. response time) Many ORMs send long transaction blocks with multiple queries, subtransactions (e.g. 100ms query becomes 1s transaction) High number of concurrent connections often impractical for applications.

Slide 18

Slide 18 text

Ro w To make distributed relational databases fast, make operations non-distributed Ro w

Slide 19

Slide 19 text

Requires certain workload patterns, finding a scaling dimension with relatively few relationships

Slide 20

Slide 20 text

Tables can be distributed & replicated according to data relationships. Co-location: distributed distributed Reference tables: reference devices (1-10) zones measurements (1-10) devices (11-20) zones measurements (11-20) devices (21-30) zones measurements (21-30)

Slide 21

Slide 21 text

Queries can often be fully pushed down to node that holds data & relationships. devices (1-10) zones measurements (1-10) devices (11-20) zones measurements (11-20) devices (21-30) zones measurements (21-30) select * from measurements join devices using (device_id) join zones using (zone_id) where device_id = 22;

Slide 22

Slide 22 text

Queries can often be fully pushed down to node that holds data & relationships. devices (1-10) zones measurements (1-10) devices (11-20) zones measurements (11-20) devices (21-30) zones measurements (21-30) insert into measurements values (22, …); update devices set active = true where device_id = 22; call stored_proc(device_id := 22, …);

Slide 23

Slide 23 text

Cross-shard joins can be efficiently pushed down when they join on co-located shard key or with a reference table. devices (1-10) zones measurements (1-10) devices (11-20) zones measurements (11-20) devices (21-30) zones measurements (21-30) Joins Foreign keys select * from measurements join devices using (device_id) join zones using (zone_id); (non-co-located joins have worse perf, some limitations)

Slide 24

Slide 24 text

Do not take “distributed = fast” for granted. * HammerDB stored procedures not supported on CockroachDB, used built-in TPC-C implementation 0 200000 400000 600000 800000 1000000 1200000 PostgreSQL (96 vcpus) Yugabyte Managed (224 vcpus) CockroachDB Dedicated* (224 vcpus) Azure Cosmos DB for PostgreSQL (224 vcores) NOPM (higher is better) HammerDB TPROC-C with 1000 warehouses on 224 cores 1k warehouses best result (20k warehuses)

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

Microservices can scale their CRUD workloads (simple single shard queries) SaaS apps can co-locate by tenant ID (complex single shard queries) IoT apps can co-locate measurements & devices by device ID (parallel queries) Geospatial apps can replicate the “map” to all nodes, while keeping point data in distributed tables and do fast spatial joins.

Slide 27

Slide 27 text

Any worker node can handle distributed queries & transactions SQL Requests Real-time analytics (e.g. IoT, time series) High throughput CRUD (e.g. microservices) Multi-tenant OLTP (e.g. Software-as-a-service)

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

4. SQL Syntax 5. Data Definition 6. Data Manipulation 7. Queries 8. Data Types 9. Functions and Operators 10. Type Conversion 11. Indexes 12. Full Text Search 13. Concurrency Control 14. Performance Tips 15. Parallel Query 19. Server Setup and Operation 20. Server Configuration 21. Client Authentication 22. Database Roles 23. Managing Databases 24. Localization 25. Routine Database Maintenance Tasks 26. Backup and Restore 27. High Availability, Load Balancing, and Replication 28. Monitoring Database Activity 29. Monitoring Disk Usage 30. Reliability and the Write-Ahead Log 31. Logical Replication 32. Just-in-Time Compilation (JIT) 33. Regression Tests 19. Server Setup and Operation 20. Server Configuration 21. Client Authentication 22. Database Roles 23. Managing Databases 24. Localization 25. Routine Database Maintenance Tasks 26. Backup and Restore 27. High Availability, Load Balancing, and Replication 28. Monitoring Database Activity 29. Monitoring Disk Usage 30. Reliability and the Write-Ahead Log 31. Logical Replication 32. Just-in-Time Compilation (JIT) 33. Regression Tests 38. Extending SQL 39. Triggers 40. Event Triggers 41. The Rule System 42. Procedural Languages 43. PL/pgSQL — SQL Procedural Language 44. PL/Tcl — Tcl Procedural Language 45. PL/Perl — Perl Procedural Language 46. PL/Python — Python Procedural Language 47. Server Programming Interface 48. Background Worker Processes 49. Logical Decoding 50. Replication Progress Tracking 51. Archive Modules 52. Overview of PostgreSQL Internals 53. System Catalogs 54. System Views 55. Frontend/Backend Protocol 56. PostgreSQL Coding Conventions 57. Native Language Support 58. Writing a Procedural Language Handler 59. Writing a Foreign Data Wrapper 60. Writing a Table Sampling Method 61. Writing a Custom Scan Provider 62. Genetic Query Optimizer 63. Table Access Method Interface Definition 64. Index Access Method Interface Definition 65. Generic WAL Records 66. Custom WAL Resource Managers 67. B-Tree Indexes 68. GiST Indexes 69. SP-GiST Indexes 70. GIN Indexes 71. BRIN Indexes 72. Hash Indexes 73. Database Physical Storage 74. System Catalog Declarations and Initial Contents 75. How the Planner Uses Statistics 76. Backup Manifest Format

Slide 31

Slide 31 text

Joins Transaction blocks Subqueries & CTEs Sequences Expression indexes Partial indexes Custom types Prepared statements Stored procedures Time-partitioning … Schema-level sharding DDL from any node Automatic shard splits Non-co-located foreign keys, triggers Unique constraints on non-dist. column Cross-node snapshot isolation Geo-partitioning Database-level sharding Non-co-located correlated subqueries Vectorized execution … Distributed & reference tables Co-location Scale OLTP throughput Fast co-located joins, foreign keys, .. Parallel, distributed queries Transactional ETL (INSERT..SELECT) Fast data loading (COPY) Online rebalancing Stored procedure call routing Columnar compression … Most PostgreSQL features just work on Citus tables Distributed database superpowers with PostgreSQL-level efficiency Some gaps remain

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

PostgreSQL is the best PostgreSQL implementation. Build a distributed database on top using extension APIs. PostgreSQL 1 release per year community-driven OSS database engineering 9+ active contributors at MS Citus 3-4 releases per year Microsoft-driven OSS distributed systems engineering 13 engineers

Slide 35

Slide 35 text

marco.slot@microsoft.com

Slide 36

Slide 36 text

marco.slot@microsoft.com https://aka.ms/open-source-discord http://aka.ms/cituscon-ondemand https://github.com/citusdata/citus