Slide 1

Slide 1 text

Citus Marco Slot marco.slot@microsoft.com

Slide 2

Slide 2 text

What is Citus? Citus is an extension (plug-in) to PostgreSQL which adds: ž Distributed tables ž Reference tables Simplicity and flexibility of using PostgreSQL, at scale. Multi-purpose: ž Scale transactional workloads through routing / delegation ž Scale analytical workloads through parallelism and columnar storage

Slide 3

Slide 3 text

Why be an extension / not a fork? PostgreSQL is a core project and a vast ecosystem https://github.com/EfficiencyGeek/postgresql-ecosystem 13

Slide 4

Slide 4 text

Why Citus? Capacity / execution time issues: ž Working set does not fit in memory ž Reaching limits of network-attached storage (IOPS) / CPU ž Analytical query takes too long ž Data transformations are single-threaded (e.g. insert..select) ž Autovacuum cannot keep up with transactional workload ž … 1) PostgreSQL is limited to a single server

Slide 5

Slide 5 text

• Joins • Functions • Constraints • Indexes: B-tree, GIN, BRIN, & GiST • Partial Indexes • Other extensions • PostGIS • Rich datatypes • JSONB • Window functions • CTEs • Atomic update / delete • Partitioning • Interactive transactions • Open source • … Why Citus? 2) Systems that are not PostgreSQL may be lacking one of:

Slide 6

Slide 6 text

Citus workload patterns Workloads that benefit from PostgreSQL Workloads that benefit from scaling out Citus workloads

Slide 7

Slide 7 text

Citus workload patterns Multi-tenant Software-as-a-service Real-time Analytics Customer-facing dashboards High performance CRUD Microservices Data warehouse Analytical reports

Slide 8

Slide 8 text

Citus workload requirements Capability Multi-tenant Real-time Analytics High perf. CRUD Data warehouse Distributed tables Yes Yes Yes Yes Reference tables Yes Yes Yes Yes Co-location Yes Yes Yes Yes Parallel query Yes Yes Parallel DML Yes Query routing Yes Yes Yes Fast full table scans Yes Fast bulk loading Yes Yes Connection scaling Yes Foreign keys Yes …

Slide 9

Slide 9 text

How to transform PostgreSQL into a distributed database as an extension?

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

planner, executor, transactions Background workers foreign data wrappers

Slide 12

Slide 12 text

What is an extension? Extensions consist of: 1. SQL objects (tables, functions, types, …) 2. Shared library CREATE TABLE pg_dist_node (…); CREATE TABLE pg_dist_partition (…); CREATE FUNCTION citus_add_node(…) RETURNS void LANGUAGE c AS '$libdir/citus', $function$citus_add_node$function$; CREATE FUNCTION create_distributed_table(…) RETURNS void LANGUAGE c AS '$libdir/citus', $function$create_distributed_table$function$; #include "postgres.h" Datum citus_add_node(…) { … } Datum create_distributed_table(…) { … } citus.sql citus.c

Slide 13

Slide 13 text

COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn A Citus cluster consists of multiple PostgreSQL servers with the Citus extension. CREATE EXTENSION citus; SELECT citus_add_node(…); SELECT citus_add_node(…); SELECT citus_add_node(…); CREATE EXTENSION citus; CREATE EXTENSION citus; CREATE EXTENSION citus;

Slide 14

Slide 14 text

COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn Worker nodes can also act as coordinator Load balancer APPLICATION ADMINISTRATION

Slide 15

Slide 15 text

APPLICATION COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn Typical production deployment ADMINISTRATION Blob storage WAL archive

Slide 16

Slide 16 text

APPLICATION CREATE TABLE campaigns (…); SELECT create_distributed_table( 'campaigns', 'company_id'); METADATA COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn CREATE TABLE campaigns_102 CREATE TABLE campaigns_105 CREATE TABLE campaigns_101 CREATE TABLE campaigns_104 CREATE TABLE campaigns_103 CREATE TABLE campaigns_106 How Citus distributes tables across the database cluster

Slide 17

Slide 17 text

Citus Table Types Distributed tables with co-location: SELECT create_distributed_table( 'campaigns', 'company_id'); SELECT create_distributed_table( 'ads', 'company_id'); Reference table: SELECT create_reference_table( 'ad_types'); Items-4 Items-4 Items-4 campai_4 ad_types_ Items-4 Items-4 Items-4 ads_4 Items-4 Items-4 Items-4 campai_8 Items-4 Items-4 Items-4 ads_8 ad_types_ foreign keys Worker 1 Worker 2 Coordinator ad_types campaigns sessions ads

Slide 18

Slide 18 text

APPLICATION SELECT FROM GROUP BY campaign_id, avg(spend) AS avg_campaign_spend campaigns campaign_id; METADATA COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn SELECT company_id sum(spend), count(spend) … FROM campaigns_102 … SELECT company_id sum(spend), count(spend) … FROM campaigns_101 … SELECT company_id sum(spend), count(spend) … FROM campaigns_103 … How Citus distributes queries across the database cluster

Slide 19

Slide 19 text

Query planner hook PostgreSQL defines function pointers that are called at critical points. #include "postgres.h" void _PG_init(void) { … planner_hook = distributed_planner; … } PlannedStmt * distributed_planner(Query *parse, …) { … } citus.c postgres.c planner_hook_type planner_hook = NULL; PlannedStmt * planner(Query *parse, …) { PlannedStmt *result; if (planner_hook) result = (*planner_hook) (parse, …); else result = standard_planner(parse, …); return result; }

Slide 20

Slide 20 text

Executor hook: CustomScan Citus integrates into the PostgreSQL planner via a CustomScan HashAggregate CustomScan SELECT ad_id, count(*) FROM clicks GROUP BY ad_id; Adaptive Executor W1 W2 SELECT ad_id, count(*) FROM clicks_102 GROUP BY ad_id;

Slide 21

Slide 21 text

Citus query planners Queries go through several planning layers Fast path Router (CRUD operations) (Single tenant operations)

Slide 22

Slide 22 text

Citus query planners Query pushdown Join order Recursive planning (Real-time analytics) (Data warehouse) Logical planner

Slide 23

Slide 23 text

Query pushdown planning Complex queries can be embarrassingly parallel SELECT campaign_id, sum(clicks.count) FROM ads LEFT JOIN LATERAL ( SELECT campaign_id, count(*) FROM clicks JOIN sites USING (site_id) WHERE sites.language = 'nl' AND clicks.click_time >= now() – interval '1 hour' AND ads.campaign_id = clicks.campaign_id AND clicks.ad_id = ads.ad_id GROUP BY campaign_id ) clicks ON (true) JOIN campaigns USING (campaign_id) WHERE campaigns.customer_id = 4 GROUP BY 1 ORDER BY 2 DESC LIMIT 10;

Slide 24

Slide 24 text

Query pushdown planning Complex queries can be embarrassingly parallel Limit CustomScan Adaptive Executor Sort W1 W2 SELECT campaign_id, ad_id, sum(clicks.count) FROM ads_303 LEFT JOIN LATERAL ( SELECT campaign_id, count(*) FROM clicks_402 JOIN sites_1 USING (site_id) WHERE sites.language = 'nl' AND clicks.click_time >= now() – interval '1 hour' AND ads.campaign_id = clicks.campaign_id AND clicks.ad_id = ads.ad_id ) clicks ON (true) JOIN campaigns_102 USING (campaign_id) WHERE campaigns.customer_id = 4 GROUP BY 1, 2 ORDER BY 2 LIMIT 10;

Slide 25

Slide 25 text

Recursive planning Plan non-pushdownable subqueries separately SELECT ads.* FROM ads JOIN ( SELECT ad_id, click_count FROM click_counts ORDER BY 2 DESC LIMIT 10 ) top_ads USING (ad_id); SELECT ads.* FROM ads JOIN read_intermediate_result('top_ads') USING (ad_id); SELECT ad_id, click_count FROM click_counts ORDER BY 2 DESC LIMIT 10 Main query Subplan query

Slide 26

Slide 26 text

Subplan execution (broadcast) CustomScan Subplan (broadcast) W1 W2 Adaptive Executor … COPY "top_ads" FROM STDIN WITH (format 'result') SELECT ads.* FROM ads_102 JOIN read_intermediate_result('top_ads') USING (ad_id);

Slide 27

Slide 27 text

How Citus handles transactions in a multi-node cluster BEGIN; UPDATE SET WHERE COMMIT; campaigns start_date = '2018-03-17' company_id = 'Pat Co'; METADATA W1 W2 W3 … Wn BEGIN; UPDATE Campaigns_2012 SET …; COMMIT; APPLICATION COORDINATOR NODE WORKER NODES callbacks: - pre-commit - post-commit - abort

Slide 28

Slide 28 text

APPLICATION BEGIN; UPDATE SET WHERE UPDATE SET WHERE COMMIT; campaigns started = true campaign_id = 2; ads finished = true campaign_id = 1; METADATA W1 W2 W3 … Wn BEGIN … assign_distributed_ transaction_id … UPDATE campaigns_102 … PREPARE TRANSACTION… COMMIT PREPARED… BEGIN … assign_distributed_ transaction_id … UPDATE campaigns_203 … PREPARE TRANSACTION… COMMIT PREPARED… COORDINATOR NODE WORKER NODES How Citus distributes transactions in a multi-node cluster callbacks: - pre-commit - post-commit - abort

Slide 29

Slide 29 text

METADATA W1 W2 W3 … Wn BEGIN … assign_distributed_ transaction_id … UPDATE campaigns_102 … PREPARE TRANSACTION citus_0_2431; COMMIT PREPARED… BEGIN … assign_distributed_ transaction_id … UPDATE campaigns_203 … PREPARE TRANSACTION citus_0_2431; COMMIT PREPARED …; COORDINATOR NODE WORKER NODES worker Prepared xact W1 citus_0_2413 W2 citus_0_2413 SELECT gid FROM pg_prepared_xacts WHERE gid LIKE 'citus_%d_%' 2PC recovery Compare

Slide 30

Slide 30 text

METADATA W1 W2 W3 … Wn BEGIN … assign_distributed_ transaction_id … UPDATE campaigns_102 … BEGIN … assign_distributed_ transaction_id … UPDATE campaigns_203 … COORDINATOR NODE WORKER NODES SELECT * FROM local_wait_edges(); Deadlock detection Detect cycles in lock graph

Slide 31

Slide 31 text

APPLICATION COPY FROM clicks STDIN WITH CSV METADATA W1 W2 W3 … Wn -- Data loading COORDINATOR NODE WORKER NODES How Citus parallelizes bulk loading Bulk loading using COPY ProcessUtility_hook

Slide 32

Slide 32 text

Creating rollup tables with INSERT..SELECT INSERT INTO click_counts SELECT campaign_id, date_trunc('hour', click_time), count(*) FROM clicks WHERE ingest_time BETWEEN '2021-02-22 00:00' AND '2021-02-22 00:05’ GROUP BY 1, 2 ON CONFLICT (campaign_id, hour) DO UPDATE count = rollup.count + EXCLUDED.count; Distributed, transactional, parallel data transformation:

Slide 33

Slide 33 text

Demo

Slide 34

Slide 34 text

Lessons learned ž PostgreSQL is extensible enough to build a comprehensive distributed database system. ž Distribution choices are essential to scale, but difficult for users. ž Help customers adopt your database, get paid when they scale out. ž If you’re going for compatibility, look at what ORMs are doing. ž Solve adoption blockers. ž Be prescriptive about the problem/workload you’re solving.

Slide 35

Slide 35 text

Open challenges ž Hybrid local-distributed databases ž Picking good distribution columns automatically ž Creating good rollups automatically ž Distributed geospatial (PostGIS) join optimization ž Nested distributed transactions (e.g. in triggers) ž Arbitrary foreign keys ž Linear connection scaling

Slide 36

Slide 36 text

Questions? marco.slot@microsoft.com Citus GitHub github.com/citusdata/citus Citus Slack slack.citusdata.com Citus Docs docs.citusdata.com Azure - Hyperscale (Citus) Docs docs.microsoft.com/azure/postgresql/hyperscale/ PostgreSQL hooks documentation github.com/AmatanHead/psql-hooks