Slide 1

Slide 1 text

Citus from the Customer Eyes Sai Srirampur Engineering Manager, Customer Success for Postgres on Azure

Slide 2

Slide 2 text

This talk is for 1. SaaS companies looking to scale with distributed PostgreSQL. 100s 10s 1000s

Slide 3

Slide 3 text

This talk is for 2. Companies who want to build scalable Timeseries apps

Slide 4

Slide 4 text

Customer Journey - 4 steps 1. What is Citus? 2. When to use Citus? 3. Data-modeling with Citus 4. Running app in production with Citus

Slide 5

Slide 5 text

What is Citus?

Slide 6

Slide 6 text

Postgres At Any Scale

Slide 7

Slide 7 text

Superpower of ”distributed tables” to scale-out Postgres Start Building apps on Single node Seamlessly scale to multiple nodes by distributing tables

Slide 8

Slide 8 text

Packaged as extension, not a fork Postgres At Any Scale Open source & Fully managed

Slide 9

Slide 9 text

Packaged as extension, not a fork Postgres At Any Scale Open source & Fully managed Open source & fully-managed

Slide 10

Slide 10 text

Global replication Security Distributed Scale High Availability Compliance Enterprise-grade monitoring Easy ecosystem integration Extension support Extensions JSONB Full text search Geospatial support Rich indexing Powered by fully Open Source Citus Extension to Postgres Azure Cosmos DB for PostgreSQL Fully managed D-SQL offering built upon core benefits of PostgreSQL

Slide 11

Slide 11 text

When to use Citus? Multi-tenant SaaS Apps Timeseries/IoT Apps

Slide 12

Slide 12 text

Multi-tenant SaaS Apps – Scenarios • Expecting to outgrow single node soon. • Complex homegrown sharding to manage tenants • Incurring large tech debt • Inflexible to scale out • Inflexible to rebalance tenants

Slide 13

Slide 13 text

Scaling multi-tenant SaaS apps with Citus • Automatic sharding of tenants across nodes • Online scale-out with shard rebalancer • Tenant Isolation • Comprehensive tenant monitoring • Row Level Security per tenant

Slide 14

Slide 14 text

11K+ Customers 800ms response times for 95% of queries 100 requests per second 20 nodes 10B+ Total #rows 2 clusters in EU and US Reference

Slide 15

Slide 15 text

Timeseries / IoT apps - Scenarios • Apps providing real-time insights on telemetry data • Examples: device telemetry, vehicle telemetry, logs, event data etc. • Have implicit scale requirement • Affinity to PostgreSQL

Slide 16

Slide 16 text

Scaling timeseries apps with Citus • Horizontal scale out - Start small and scale out to multiple nodes • JSONB to store semi-structured data • Native timeseries APIs for partitioning, expiry and archival.

Slide 17

Slide 17 text

Reference

Slide 18

Slide 18 text

Data-modeling with Citus

Slide 19

Slide 19 text

5 tips for distributed data-modeling with Postgres

Slide 20

Slide 20 text

TIP 1: Column that adds natural dimension to data is a good shard key • Represents a central piece of application • Lives at heart of application • Examples: • SaaS apps – customer_id is a good candidate • IoT app – the device_id is a good candidate More details on Distribution column and how Citus distributes data and queries

Slide 21

Slide 21 text

github_events Event_id Event_type Event_hash Repo id User_id Repo id Payload Created_at org github_users User_id url login avatar Country_id gravatar Display login User_id is good candidate for distributing tables • Events generated by users themselves • Adds natural dimension & is at center of workload countries Country_id Name Continent Capital Example: App to analyze GitHub events

Slide 22

Slide 22 text

TIP 2: Co-locate large tables & make small tables reference

Slide 23

Slide 23 text

Example: App to analyze GitHub events github_events Event_id Event_type Event_hash Repo id User_id Repo id Payload Created_at org github_users User_id url login avatar Country_id gravatar Display login countries Country_id Name Continent Capital Distributed PostgreSQL tables with co-location: Events & users distributed on user_id SELECT create_distributed_table( 'github_users', 'user_id'); SELECT create_distributed_table( 'github_events', 'user_id'); Reference table: SELECT create_reference_table( 'countries');

Slide 24

Slide 24 text

Power of Co-locating tables Items-4 Items-4 Items-4 events_4 countries Items-4 Items-4 Items-4 users_4 Items-4 Items-4 Items-4 events_8 Items-4 Items-4 Items-4 users_8 countries SQL pushdown - joins/fkeys Citus Worker 1 Citus Worker 2 Citus Coordinator countries events users

Slide 25

Slide 25 text

github_events Event_id Event_type Event_hash Repo id User_id Repo id Payload (JSONB) Created_at org • Simplified schema – single table to store a variety of data • GIN indexing for fast JSONB access • NoSQL capabilities to Postgres world • Helpful in implementing workloads that need scale – Ex IoT, Events, Logs TIP 3: Use JSONB data-type to store unstructured data

Slide 26

Slide 26 text

TIP 4: Timeseries partitioning on distributed tables github_events Event_id Event_type User_id Repo id User_id Repo id Payload (JSONB) Created_at org Distribute on user_id Partition on created_at github_events_jan Event_id Event_type User_id … github_events_feb Event_id Event_type User_id … github_events_sept Event_id Event_type User_id … github_events_aug Event_id Event_type User_id … github_events_july Event_id Event_type User_id … . . . . . . . Archiving older partitions using columnar Most recent partitions using row storage – real time ingests • Helps filtered querying with timestamp filters • Efficient expiry – DROP TABLE instead of DELETE • Helps archiving older data through columnar • Common in implementing timeseries workloads

Slide 27

Slide 27 text

Timeseries APIs for partitioning, archival, & expiry • Automatic Data Partitioning – Easily Create partitions in past and future SELECT create_time_partitions( table_name:= 'time_series_events', partition_interval:= '1 day’, end_at:= '2021-10-30’, start_from:= '2021-10-10'); • Automatic Data Archival – Convert older partitions from row to columnar CALL alter_old_partitions_set_access_method('time_series_events’, '2021-10-20', 'columnar'); • Automatic Data Expiry - Expire data older than a given time period CALL drop_old_time_partitions( table_name:= 'time_series_events', older_than:= '2021-10-15');

Slide 28

Slide 28 text

TIP 5: Scope queries to shard key where possible • Ex: SaaS workloads – queries can be scoped on tenant_id UPDATE ads SET impressions_count = impressions_count+1 WHERE id = 42; TO UPDATE ads SET impressions_count = impressions_count+1 WHERE id = 42 AND company_id = 1; • Avoids unnecessary fan out & helps query latency. • More relevant in transactional workloads

Slide 29

Slide 29 text

5 tips for distributed data-modeling with Postgres • TIP 1: Column that adds natural dimension to data is a good shard key • TIP 2: Co-locate large tables, make small tables reference • TIP 3: Use JSONB data-type to store unstructured data • TIP 4: Timeseries partitioning on distributed tables • TIP 5: Scope queries to shard key where possible

Slide 30

Slide 30 text

Running app in production with Citus

Slide 31

Slide 31 text

Line item Open Source / Self managed Azure Cosmos DB for PostgreSQL App Migration Use 5-tips for data-modeling and modify app End-to-end migration guide Data Migration pg_dump/pg_restore, pgcopydb for online migration pg_azure_storage extension: Query and load data directly from blob. ADF, Azure Databricks, Azure Stream Analytics, any integration to PG. Deployment Docker images for dev/test Burstable Compute for dev/test ARM/BICEP templates High Availability Patroni HA for Citus Auto HA with SLAs Upgrades Open Source Docs Self-serve upgrades Disaster Recovery pg_dump/pg_restore pgbackrest, wal-e, wal-g Automatic Cross-AZ Backups + PITR Cross-region read replicas Infra Metrics Self-managed using any PG monitoring tool Azure Monitor Integration PG Bouncer Setup using community docs Pgbouncer support included DB tuning and troubleshooting Performance tuning 1, Performance tuning 2, Useful Diagnostic Queries Security Roles, RLS Roles, RLS, Azure Firewall, Azure Private Endpoint Scale Self-manage addition, removal of infra Use fully OSS shard rebalancer Out-of-box support for vertical and horizontal scale Checklist to run app in production with Citus

Slide 32

Slide 32 text

DEMO of Azure Cosmos DB for PostgreSQL (aka Citus on Azure)

Slide 33

Slide 33 text

@saisrirampur @AzureCosmosDB @citusdata