Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

https://github.com/citusdata/citus
https://azure.microsoft.com/en-us/services/postgresql/

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

[Diagram: APPLICATION → COORDINATOR NODE → WORKER NODES W1, W2, W3, …, Wn]

A Citus cluster consists of multiple PostgreSQL servers with the Citus extension.

Slide 6

Slide 6 text

SELECT create_distributed_table('table_name', 'distribution_column');

Slide 7

Slide 7 text

How Citus distributes tables across the database cluster

[Diagram: the application runs

  CREATE TABLE campaigns (…);
  SELECT create_distributed_table('campaigns', 'company_id');

on the coordinator node, which stores the distribution metadata and creates shard tables campaigns_101 … campaigns_106 across the worker nodes W1, W2, W3, …, Wn.]

Slide 8

Slide 8 text

Distributed tables

create_distributed_table('companies', 'company_id')

Data is hash-partitioned across worker nodes based on distribution column values.

Limitations:
- Unique constraints must include the distribution column
- Foreign key/join (performance) limitations (more on that later)
- Missing features: triggers, table inheritance (but you can use them…)

Slide 9

Slide 9 text

[Diagram: how Citus routes an INSERT]

The application runs:

  INSERT INTO campaigns (company_id, …) VALUES ('Microsoft', …);

The coordinator computes hashtext('Microsoft') = 76743714, looks up in its metadata which shard's hash range contains that value ([0, 134217727] → campaigns_102, [134217728, 268435455] → campaigns_103, …), and sends

  INSERT INTO campaigns_102 VALUES …;

to the worker node (W1 … Wn) that holds the shard.
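The hash-range routing on this slide can be sketched in a few lines of Python. This is a minimal sketch, not Citus code: `zlib.crc32` stands in for PostgreSQL's `hashtext()` (so concrete hash values differ from the slide's 76743714), and the 32-shard count is an assumption that matches the slide's 2^27-wide ranges such as [0, 134217727].

```python
import zlib

SHARD_COUNT = 32   # assumed default; configurable per cluster
HASH_MIN = -2**31  # hash values occupy the signed 32-bit integer range

def shard_ranges(shard_count=SHARD_COUNT):
    """Split the 32-bit hash space into equal contiguous ranges, one per shard."""
    span = 2**32 // shard_count
    return [(HASH_MIN + i * span, HASH_MIN + (i + 1) * span - 1)
            for i in range(shard_count)]

def route(value, ranges):
    """Return the index of the shard whose hash range contains hash(value).

    crc32 is a deterministic stand-in for PostgreSQL's hashtext();
    the hash values differ, but the range-lookup logic is the same.
    """
    h = zlib.crc32(value.encode())
    if h >= 2**31:  # fold unsigned crc32 into the signed 32-bit range
        h -= 2**32
    for i, (lo, hi) in enumerate(ranges):
        if lo <= h <= hi:
            return i
    raise AssertionError("the ranges cover the whole hash space; unreachable")

ranges = shard_ranges()
shard = route('Microsoft', ranges)
# Every INSERT with company_id = 'Microsoft' is routed to the same shard.
assert route('Microsoft', ranges) == shard
```

Note that one of the 32 ranges is exactly [0, 134217727], matching the shard boundaries shown on the slide.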

Slide 10

Slide 10 text

How Citus performs bulk loading across the database cluster

[Diagram: the application streams CSV rows

  Microsoft,2938,13,3.7,…
  Acme,386,13,12.8,…
  …

via

  COPY campaigns FROM STDIN WITH (format 'csv');

to the coordinator node, which splits the stream by shard and forwards it to the worker nodes W1 … Wn as

  COPY campaigns_101 FROM STDIN WITH (format 'binary')
  COPY campaigns_102 FROM STDIN WITH (format 'binary')
  COPY campaigns_103 FROM STDIN WITH (format 'binary')]

Slide 11

Slide 11 text

Citus COPY performance

[Chart: rows loaded per second using multi-threaded (10 threads) COPY into a table with a primary key, higher is better; compares Citus (160 cores), Citus (64 cores), and PostgreSQL (64 cores)]

Slide 12

Slide 12 text

How Citus distributes queries across the database cluster

[Diagram: the application runs

  SELECT company_id, sum(spend) AS campaign_spend
  FROM campaigns
  GROUP BY company_id;

on the coordinator node, which rewrites it into per-shard queries and runs them in parallel on the worker nodes W1 … Wn:

  SELECT company_id, sum(spend) FROM campaigns_101 …
  SELECT company_id, sum(spend) FROM campaigns_102 …
  SELECT company_id, sum(spend) FROM campaigns_103 …]
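The scatter/gather execution on this slide can be sketched as follows. The shard contents are fabricated for illustration; the point is that because the GROUP BY key is the distribution column, every group lives entirely on one shard, so the coordinator can merge the per-worker results by simple concatenation.

```python
from collections import defaultdict

# Fabricated shard contents: company_id is the distribution column, so all
# rows for a given company live on exactly one shard.
shards = {
    'campaigns_101': [('Acme', 40.0), ('Acme', 10.0)],
    'campaigns_102': [('Microsoft', 120.0), ('Microsoft', 80.0)],
    'campaigns_103': [('Initech', 5.0)],
}

def worker_aggregate(rows):
    """Each worker runs SELECT company_id, sum(spend) ... on its own shard."""
    totals = defaultdict(float)
    for company_id, spend in rows:
        totals[company_id] += spend
    return dict(totals)

# The coordinator runs the per-shard queries in parallel and, since groups
# never span shards here, simply concatenates the partial results.
partials = [worker_aggregate(rows) for rows in shards.values()]
campaign_spend = {k: v for partial in partials for k, v in partial.items()}
```

When the GROUP BY key is not the distribution column, the coordinator would instead have to merge overlapping partial aggregates; that extra merge step is why grouping by the distribution column is cheapest.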

Slide 13

Slide 13 text

Citus parallel SELECT performance

[Chart: query time in seconds for a simple analytical query on 100GB of data, lower is better; compares Citus (160 cores), Citus (64 cores), and PostgreSQL (64 cores)]

Slide 14

Slide 14 text

How Citus handles transactions in a multi-node cluster

[Diagram: the application runs

  BEGIN;
  UPDATE campaigns SET start_date = '2020-03-17' WHERE company_id = 'Microsoft';
  …
  COMMIT;

on the coordinator node, which opens a matching transaction on the worker node (W1 … Wn) holding the shard:

  BEGIN; UPDATE campaigns_102 SET …; … COMMIT;]

Slide 15

Slide 15 text

Citus HammerDB TPC-C performance

New Order Transactions Per Minute (NOPM), higher is better; HammerDB TPC-C with 1000 warehouses and 400 vusers:

- Citus (256 cores): 606k
- Citus (160 cores): 434k
- Citus (64 cores): 207k
- PostgreSQL (64 cores): 254k

Slide 16

Slide 16 text

Distributed tables can be co-located

create_distributed_table('campaigns', 'company_id', colocate_with := 'companies')

Distributed tables can be co-located with other distributed tables: the same hash range is always on the same worker node.

Co-location enables:
- full query push-down when filtering all tables by the distribution column
- foreign keys between distributed tables that include the distribution column
- efficient joins between distributed tables on the distribution column
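A rough sketch of why co-location makes joins on the distribution column cheap: both tables are distributed with the same hash-to-shard mapping, so matching rows always share a shard index and the join can run locally on each worker. The data is made up, `zlib.crc32` stands in for PostgreSQL's `hashtext()`, and the 4-shard count is chosen for the example.

```python
import zlib

SHARD_COUNT = 4  # small count for the example

def shard_of(value, shard_count=SHARD_COUNT):
    # crc32 stands in for hashtext(); co-located tables share the same
    # shard count and hash ranges, so they map values identically.
    h = zlib.crc32(value.encode())
    if h >= 2**31:
        h -= 2**32
    return (h + 2**31) // (2**32 // shard_count)

# Fabricated rows; company_id is the distribution column of both tables.
companies = [('Microsoft', 'Redmond'), ('Acme', 'Phoenix')]
campaigns = [('Microsoft', 'c1'), ('Microsoft', 'c2'), ('Acme', 'c3')]

def distribute(rows):
    """Place each row on a shard by hashing its distribution column."""
    shards = {i: [] for i in range(SHARD_COUNT)}
    for row in rows:
        shards[shard_of(row[0])].append(row)
    return shards

companies_shards = distribute(companies)
campaigns_shards = distribute(campaigns)

# A join on company_id runs shard-by-shard, with no cross-node data
# movement, because matching rows always land on the same shard index.
local_join = [
    (cid, city, name)
    for i in range(SHARD_COUNT)
    for cid, city in companies_shards[i]
    for cid2, name in campaigns_shards[i]
    if cid == cid2
]
```

The per-shard join produces exactly the rows a global join would, which is the full-push-down property the slide describes.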

Slide 17

Slide 17 text

Reference tables

create_reference_table('categories')

Reference tables are replicated to all worker nodes. This enables:
- foreign keys from distributed (or reference) tables on any column
- efficient joins with distributed (or reference) tables on any column

Slide 18

Slide 18 text

Co-located joins

Slide 19

Slide 19 text

Router queries

Slide 20

Slide 20 text

Router queries power multi-tenant applications

Slide 21

Slide 21 text

Benefits of Citus for multi-tenant applications

Slide 22

Slide 22 text

Citus helps ASB onboard customers 20x faster

“After migrating to Citus, we can onboard Vonto customers 20X faster, in 2 minutes vs. the 40+ minutes it used to take. And with the launch of Hyperscale (Citus) on Azure Database for PostgreSQL, we are excited to see what we can build next on Azure.”

- 100 GB+ data
- Multi-tenant SaaS application
- Milliseconds latency

Slide 23

Slide 23 text

Parallel operations power real-time analytics

https://www.citusdata.com/blog/2018/06/14/scalable-incremental-data-aggregation/

Slide 24

Slide 24 text

INSERT..SELECT transforms the data inside the database in parallel
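The parallel INSERT..SELECT idea can be sketched with fabricated event data: when the rollup table is co-located with the raw table, each worker aggregates its own shard into the matching rollup shard with no network transfer. The table names and rows here are hypothetical.

```python
from collections import defaultdict

# Fabricated per-shard raw event rows: (company_id, clicks). Because the
# rollup table is co-located with the raw events table on company_id,
# each worker can run its slice of
#   INSERT INTO rollup SELECT company_id, sum(clicks) ... GROUP BY company_id
# entirely locally.
raw_shards = {
    'events_101': [('Acme', 3), ('Acme', 2)],
    'events_102': [('Microsoft', 7)],
}

# One co-located rollup shard per raw shard, keyed by company_id.
rollup_shards = {shard: defaultdict(int) for shard in raw_shards}

for shard, rows in raw_shards.items():  # in Citus this runs in parallel, one worker per shard
    for company_id, clicks in rows:
        rollup_shards[shard][company_id] += clicks
```

Because every shard pair is processed independently, throughput scales with the number of workers, which is what makes incremental rollups fast enough for real-time dashboards.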

Slide 25

Slide 25 text

Benefits of Citus for real-time analytics

Operation                    Achievable performance
Bulk loading                 1M rows/sec
SELECT from raw data         10-100M rows/sec
INSERT..SELECT into rollup   1-100M rows/sec
SELECT on rollup table       1k-100k queries/sec

Slide 26

Slide 26 text

Microsoft Windows relies on Citus for mission-critical decisions

“Ship/no-ship decisions for Microsoft Windows are made using Hyperscale (Citus), where our team runs on-the-fly analytics on billions of JSON events with sub-second responses. Distributed PostgreSQL is a game changer.”

- 1 PB+ data
- 6M+ queries per day; 75% of queries completing in < 0.2 secs

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

https://github.com/citusdata/citus
https://docs.citusdata.com/