Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

aka.ms/citus

Slide 3

Slide 3 text

What is Citus? • Distributed tables • Reference tables • & more, as of Citus 10 Extension to Postgres (not a fork!) • Add nodes • Rebalance Simplicity & flexibility of using PostgreSQL, at scale • Scale transactional workloads • Scale analytical workloads • Mixed workloads too Multi-purpose:

Slide 4

Slide 4 text

Why

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

planner, executor, transactions Background workers foreign data wrappers published in 1986

Slide 7

Slide 7 text

Why be an extension to Postgres (and not a fork?) Vast ecosystem

Slide 8

Slide 8 text

Developers ❤ Postgres

Slide 9

Slide 9 text

Why Citus, Reason #1: Postgres limited to single node Capacity / execution time issues: § Working set does not fit in memory § Reaching limits of network-attached storage (IOPS) / CPU § Analytical query takes too long § Data transformations are single-threaded (e.g. insert..select) § Autovacuum cannot keep up with transactional workload § …

Slide 10

Slide 10 text

• Joins • Functions • Constraints • Indexes: B-tree, GIN, BRIN, & GiST • Partial Indexes • Other extensions • PostGIS • Rich datatypes • JSONB • Window functions • CTEs • Atomic update / delete • Partitioning • Interactive transactions • Open source • … Why Citus, Reason #2: Because Postgres includes:

Slide 11

Slide 11 text

COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn A Citus cluster consists of multiple Postgres nodes with the Citus extension. CREATE EXTENSION citus; SELECT citus_add_node(…); SELECT citus_add_node(…); SELECT citus_add_node(…); CREATE EXTENSION citus; CREATE EXTENSION citus; CREATE EXTENSION citus;

Slide 12

Slide 12 text

APPLICATION CREATE TABLE campaigns (…); SELECT create_distributed_table( 'campaigns', 'company_id'); METADATA COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn CREATE TABLE campaigns_102 CREATE TABLE campaigns_105 CREATE TABLE campaigns_101 CREATE TABLE campaigns_104 CREATE TABLE campaigns_103 CREATE TABLE campaigns_106 How Citus distributes tables across the database cluster

Slide 13

Slide 13 text

APPLICATION SELECT FROM GROUP BY campaign_id, avg(spend) AS avg_campaign_spend campaigns campaign_id; METADATA COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn SELECT company_id sum(spend), count(spend) … FROM campaigns_102 … SELECT company_id sum(spend), count(spend) … FROM campaigns_101 … SELECT company_id sum(spend), count(spend) … FROM campaigns_103 … How Citus distributes queries across the database cluster

Slide 14

Slide 14 text

easy # run PostgreSQL with Citus on port 5500 docker run = citusdata/citus

Slide 15

Slide 15 text

CREATE TABLE users( id bigserial primary key, name text); SELECT create_distributed_table( 'users', 'id’); SELECT count(*) FROM users; easy

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

aka.ms/citus10

Slide 20

Slide 20 text

Citus Coordinator Citus Workers Citus Coordinator Citus single node Distributed Citus cluster

Slide 21

Slide 21 text

slack.citusdata.com

Slide 22

Slide 22 text

Columnar Storage Row-based storage

Slide 23

Slide 23 text

CREATE TABLE events( ts timestamptz, i int, n numeric, s text); CREATE TABLE events_columnar( ts timestamptz, i int, n numeric, s text) USING columnar;

Slide 24

Slide 24 text

Citus Columnar && Range Partitioning in Postgres CREATE TABLE events( ts timestamptz, i int, n numeric, s text) PARTITION BY RANGE (ts); CREATE TABLE events_2021_jan PARTITION OF events FOR VALUES FROM ('2021-01-01') TO ('2021-02-01'); CREATE TABLE events_2021_feb PARTITION OF events FOR VALUES FROM ('2021-02-01') TO ('2021-03-01');

Slide 25

Slide 25 text

events table

Slide 26

Slide 26 text

Citus Columnar && Range Partitioning in Postgres SELECT alter_table_set_access_method( 'events_2021_jan', 'columnar');

Slide 27

Slide 27 text

events table …

Slide 28

Slide 28 text

events table …

Slide 29

Slide 29 text

events table

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

In Citus 10, we open sourced Citus Shard Rebalancer

Slide 33

Slide 33 text

Easy to rebalance shards after adding a new Citus node

Slide 34

Slide 34 text

What if shards get out-of-balance on existing nodes?

Slide 35

Slide 35 text

Rebalancing shards to optimize for performance, too

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

Min Wei, Principal Engineer at Microsoft Distributed PostgreSQL is a game changer." aka.ms/blog-petabyte-scale-analytics

Slide 38

Slide 38 text

aka.ms/azure-portal-postgres Try Citus on Azure

Slide 39

Slide 39 text

Citus Newsletter aka.ms/citus-newsletter

Slide 40

Slide 40 text

Questions? nils.dijk@microsoft.com claire.giordano@microsoft.com Citus repo on GitHub aka.ms/citus Citus Public Slack for open source Q&A slack.citusdata.com Citus Docs docs.citusdata.com Definitive Citus 10 blog post by Marco aka.ms/citus10 Download Citus open source citusdata.com/download/

Slide 41

Slide 41 text

If need to scale Postgres, learn more about Citus 10 As of Citus 10, now includes columnar compression We’ve open sourced the shard rebalancer too & Citus on a single node