What is Citus?
• Distributed tables
• Reference tables
• & more, as of Citus 10
Extension to Postgres (not a fork!)
• Add nodes
• Rebalance
Simplicity & flexibility of using PostgreSQL, at scale
• Scale transactional workloads
• Scale analytical workloads
• Mixed workloads too
Multi-purpose:
Slide 4
Slide 4 text
Why
Slide 5
Slide 5 text
No content
Slide 6
Slide 6 text
planner, executor,
transactions
Background workers
foreign data wrappers
published in 1986
Slide 7
Slide 7 text
Why be an extension to
Postgres (and not a fork?)
Vast ecosystem
Slide 8
Slide 8 text
Developers ❤ Postgres
Slide 9
Slide 9 text
Why Citus, Reason #1: Postgres limited to single node
Capacity / execution time issues:
§ Working set does not fit in memory
§ Reaching limits of network-attached storage (IOPS) / CPU
§ Analytical query takes too long
§ Data transformations are single-threaded (e.g. insert..select)
§ Autovacuum cannot keep up with transactional workload
§ …
APPLICATION
SELECT
FROM
GROUP BY
campaign_id,
avg(spend) AS avg_campaign_spend
campaigns
campaign_id;
METADATA
COORDINATOR
NODE
WORKER NODES
W1
W2
W3 …
Wn
SELECT company_id
sum(spend),
count(spend) …
FROM
campaigns_102 …
SELECT company_id
sum(spend),
count(spend) …
FROM
campaigns_101 …
SELECT company_id
sum(spend),
count(spend) …
FROM
campaigns_103 …
How Citus distributes queries across the database cluster
Slide 14
Slide 14 text
easy
# run PostgreSQL with Citus on port 5500
docker run
= citusdata/citus
Slide 15
Slide 15 text
CREATE TABLE users(
id bigserial primary key,
name text);
SELECT create_distributed_table(
'users', 'id’);
SELECT count(*) FROM users;
easy
Slide 16
Slide 16 text
No content
Slide 17
Slide 17 text
No content
Slide 18
Slide 18 text
No content
Slide 19
Slide 19 text
aka.ms/citus10
Slide 20
Slide 20 text
Citus
Coordinator
Citus
Workers
Citus
Coordinator
Citus single node Distributed Citus cluster
Slide 21
Slide 21 text
slack.citusdata.com
Slide 22
Slide 22 text
Columnar Storage Row-based storage
Slide 23
Slide 23 text
CREATE TABLE events(
ts timestamptz, i int,
n numeric, s text);
CREATE TABLE events_columnar(
ts timestamptz, i int,
n numeric, s text) USING columnar;
Slide 24
Slide 24 text
Citus Columnar && Range Partitioning in Postgres
CREATE TABLE events(
ts timestamptz, i int,
n numeric, s text)
PARTITION BY RANGE (ts);
CREATE TABLE events_2021_jan PARTITION OF events
FOR VALUES FROM ('2021-01-01') TO ('2021-02-01');
CREATE TABLE events_2021_feb PARTITION OF events
FOR VALUES FROM ('2021-02-01') TO ('2021-03-01');
Slide 25
Slide 25 text
events table
Slide 26
Slide 26 text
Citus Columnar && Range Partitioning in Postgres
SELECT alter_table_set_access_method(
'events_2021_jan', 'columnar');
Slide 27
Slide 27 text
events table
…
Slide 28
Slide 28 text
events table
…
Slide 29
Slide 29 text
events table
Slide 30
Slide 30 text
No content
Slide 31
Slide 31 text
No content
Slide 32
Slide 32 text
In Citus 10, we open sourced Citus Shard Rebalancer
Slide 33
Slide 33 text
Easy to rebalance shards after adding a new Citus node
Slide 34
Slide 34 text
What if shards get out-of-balance on existing nodes?
Slide 35
Slide 35 text
Rebalancing shards to optimize for performance, too
Slide 36
Slide 36 text
No content
Slide 37
Slide 37 text
Min Wei, Principal Engineer at Microsoft
Distributed PostgreSQL
is a game changer."
aka.ms/blog-petabyte-scale-analytics
Slide 38
Slide 38 text
aka.ms/azure-portal-postgres
Try Citus
on Azure
Slide 39
Slide 39 text
Citus Newsletter
aka.ms/citus-newsletter
Slide 40
Slide 40 text
Questions?
nils.dijk@microsoft.com
claire.giordano@microsoft.com
Citus repo on GitHub
aka.ms/citus
Citus Public Slack for open source Q&A
slack.citusdata.com
Citus Docs
docs.citusdata.com
Definitive Citus 10 blog post by Marco
aka.ms/citus10
Download Citus open source
citusdata.com/download/
Slide 41
Slide 41 text
If need to scale Postgres, learn more about Citus 10
As of Citus 10,
now includes
columnar
compression
We’ve open
sourced the
shard rebalancer
too
& Citus on a
single node