Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

Quick intro (or refresher) on Citus

Slide 4

Slide 4 text

Provides simple scale-out of operational workloads to execute on a cluster on machines Single connection – no code changes to the app Scale locally across a cluster, or globally using replication Traditional PostgreSQL Single Node Scale Distributed PostgreSQL Cluster Scale SQL Query SQL Query

Slide 5

Slide 5 text

Provides simple scale-out of operational workloads to execute on a cluster on machines Single connection – no code changes to the app Scale locally across a cluster, or globally using replication SQL Query

Slide 6

Slide 6 text

1990s 2000s 2010s 2020s

Slide 7

Slide 7 text

Dispelling the great distributed database myth

Slide 8

Slide 8 text

Easily the most common misconception about distributed databases is that they are only applicable to large data volumes The definition of “large” varies depending who you talk to However, Distributed databases are incredibly powerful even for small data volumes

Slide 9

Slide 9 text

There are several common database bottlenecks that single node systems encounter Infrastructure Database Operations CPU – Increase total amount of read/write compute Cache – Increase cache hit ratio IO – Shared nothing architecture adds IO with new nodes Queries – Reduce scans by isolating onto a shard Data Modification – parallelize heavy operations Maintenance – parallelize tasks such as backup

Slide 10

Slide 10 text

Sales Transactions Sales Transactions [Shard 1] Sales Transactions [Shard 2] Sales Transactions [Shard 5] Sales Transactions [Shard 3] Sales Transactions [Shard 4] Sales Transactions [Shard 32] (default) … CREATE TABLE SalesTransactions SELECT create_distributed_table

Slide 11

Slide 11 text

CREATE TABLE SalesTransactions SELECT create_distributed_table INSERT INTO SalesTransactions VALUES (1,1,10.23) INSERT INTO SalesTransactions VALUES (2,3,17.94) Sales Transactions Sales Transactions [Shard 1] Sales Transactions [Shard 2] Sales Transactions [Shard 5] Sales Transactions [Shard 3] Sales Transactions [Shard 4] Sales Transactions [Shard 32] (default) …

Slide 12

Slide 12 text

CREATE TABLE SalesTransactions SELECT create_distributed_table INSERT INTO SalesTransactions VALUES (1,1,10.23) INSERT INTO SalesTransactions VALUES (2,3,17.94) Sales Transactions Sales Transactions [Shard 1] Sales Transactions [Shard 2] Sales Transactions [Shard 5] Sales Transactions [Shard 3] Sales Transactions [Shard 4] Sales Transactions [Shard 32] (default) … Node 1 Node 2 Node 3

Slide 13

Slide 13 text

DEMO: Single node vs. Parallel Update

Slide 14

Slide 14 text

Single Node Citus

Slide 15

Slide 15 text

Relational database performance is highly sensitive to cache hit ratio Performance can “fall off a cliff” once cache hit ration drops Scale out can provide better than linear performance improvement where cache is under pressure • https://aka.ms/pg-hyperscale-perf 15,923 214,043 368 39 0 50 100 150 200 250 300 350 400 - 50,000 100,000 150,000 200,000 250,000 Single Node PostgreSQL 4 Node Citus (Cosmos DB) Millisecond Response Time Orders per minute Single Node vs. Cluster Workload Scalability

Slide 16

Slide 16 text

Building distributed databases

Slide 17

Slide 17 text

If built well, the scalability of distributed databases is magical • Virtually linear scalability • Well balanced across compute nodes • Maximize local execution on the compute nodes (more on the next slide …and if done wrong, they will slow crawl and waste an resource

Slide 18

Slide 18 text

1. Filter on shard column SQL Query 1 2 3 4 5

Slide 19

Slide 19 text

2. Primary/foreign key checks within compute node SQL Query 1 2 3 4 5

Slide 20

Slide 20 text

3. Unique constraints within compute node SQL Query 1 2 3 4 5

Slide 21

Slide 21 text

4. Joins within compute node SQL Query 1 2 3 4 5

Slide 22

Slide 22 text

2. Local/Global aggregates for HTAP SQL Query 1 2 3 4 5 Distributed COUNT(*) SUM (shard counts) # #

Slide 23

Slide 23 text

Multi-tenant SaaS IoT tenand_id tenand_id tenand_id device_id Complex models with common tenant_id shard key Simple model (often no relationasips) sharded on device_id

Slide 24

Slide 24 text

DEMO: Optimizing Distributed Data

Slide 25

Slide 25 text

SQL

Slide 26

Slide 26 text

Some options to fix it when things slow down

Slide 27

Slide 27 text

Citus Cluster Online rebalancing of shards to less utilized nodes Shard move or shard split policies supported

Slide 28

Slide 28 text

Azure Cosmos DB for PostgreSQL Cluster Enables online movement of tenants to new nodes Large or busy tenants can be isolated to a dedicated node to maximize performance No application code changes, or downtime required

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

Getting started

Slide 31

Slide 31 text

Download Citus today No Azure Subscription Required No Credit Card Required

Slide 32

Slide 32 text

Try Azure Cosmos DB for PostgreSQL free for 30 days No Azure Subscription Required No Credit Card Required

Slide 33

Slide 33 text

Managed Service

Slide 34

Slide 34 text

Open-source PostgreSQL and Citus extension for distributed queries are free downloads Empowers developers to develop and test locally, then deploy to the cloud Zero code changes for deployment

Slide 35

Slide 35 text

Optional to enable Synchronous replication provides zero data loss on failover No application changes required Primary Cluster Standby Cluster High Availability

Slide 36

Slide 36 text

Easy Measure Creation Pre-defined and configurable dashboards to monitor workload performance Configure alerts based on business specific thresholds Built-in Monitoring

Slide 37

Slide 37 text

Users Private Link Provides private connectivity from a virtual network to Azure Simplifies the network architecture Secures connections between Azure endpoints by eliminating data exposure to the public internet Private Link

Slide 38

Slide 38 text

aka.ms/trycosmosdb General Availability – October 2022

Slide 39

Slide 39 text

© Copyright Microsoft Corporation. All rights reserved.