Slide 1

Slide 1 text

© Copyright Microsoft Corporation. All rights reserved. Leandro Santana linkedin.com/in/leandroeredia/ @citusdata @AzureDBPostgres Citus Hyperscale Overview

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

Postgres

Slide 4

Slide 4 text

PostgreSQL is more popular than ever PostgreSQL at core of many digital transformations Open source Proven resilience & stability Rich feature set enterprise-ready • Zero data loss • Rich indexing and data types, including geospatial • Low TCO. No Lock-in • Innovate faster with broad ecosystem

Slide 5

Slide 5 text

Single Server Fully-managed, single-node PostgreSQL Hyperscale (Citus) High-performance Postgres for scale out Enterprise-ready, fully managed community PostgreSQL with built-in HA and multi-layered security

Slide 6

Slide 6 text

Monitor Azure Database for PostgreSQL with Azure Monitor Azure Database for PostgreSQL tracks performance metrics and logs telemetry data Collect, analyze, and take action on telemetry data gathered from your database using Azure Monitor Better understand your apps with deep insights into app and database behavior, view alerts, and build remediation plans Azure Monitor Visualize Analyze Respond Integrate Insights

Slide 7

Slide 7 text

Migrating to Azure Database for PostgreSQL Opportunity DBAs spending time maintaining their databases, not focused on their apps On premises PostgreSQL VM PostgreSQL Other Managed Services Azure Database Migration Services Azure Database for PostgreSQL Solution Azure Database for PostgreSQL takes care of patching, high availability, and performance management Provides added benefits of comprehensive, intelligent security and compliance, including threat alerts

Slide 8

Slide 8 text

Migration from Oracle to Azure Database for PostgreSQL Single instance of Oracle Opportunity Oracle license cost is expensive and additional features (such as partitioning and high availability) come at added cost Solution Migrating to Azure Database for PostgreSQL removes need for licenses as the customer can take advantage of open source PostgreSQL Azure Database for PostgreSQL Azure Database Migration Services

Slide 9

Slide 9 text

Citus extension to Postgres

Slide 10

Slide 10 text

What is Citus? • Distributed tables • Reference tables • & more, as of Citus 10 Extension to Postgres (not a fork!) • Add nodes • Rebalance Simplicity & flexibility of using PostgreSQL, at scale • Scale transactional workloads • Scale analytical workloads • Mixed workloads too Multi-purpose:

Slide 11

Slide 11 text

aka.ms/citus

Slide 12

Slide 12 text

Why

Slide 13

Slide 13 text

Citus

Slide 14

Slide 14 text

• Joins • Functions • Constraints • Indexes: B-tree, GIN, BRIN, & GiST • Partial Indexes • Other extensions • PostGIS • Rich datatypes • JSONB • Window functions • CTEs • Atomic update / delete • Partitioning • Interactive transactions • Open source • … Citus

Slide 15

Slide 15 text

Workloads that benefit from PostgreSQL CITUS WORKLOADS Workloads that benefit from scaling out

Slide 16

Slide 16 text

Multi-tenant SaaS Time series Analytics dashboards Mixed OLTP/OLAP

Slide 17

Slide 17 text

Citus Coordinator Hyperscale (Citus) Coordinator Hyperscale (Citus) Workers Hyperscale (Citus) Coordinator Standard Tier in Hyperscale (Citus) Basic Tier

Slide 18

Slide 18 text

SELECT create_distributed_table( 'table_name', 'distribution_column');

Slide 19

Slide 19 text

Citus Coordinator Choosing the distribution column for Multi-Tenant scenarios SELECT create_distributed_table('companies', 'id’); SELECT create_distributed_table('campaigns', 'company_id’); SELECT create_distributed_table('ads', 'company_id’); SELECT create_distributed_table('clicks', 'company_id’); SELECT create_distributed_table('impressions', 'company_id');

Slide 20

Slide 20 text

Citus Coordinator Choosing the distribution column for Real-Time scenarios SELECT create_distributed_table('github_users', 'user_id’); SELECT create_distributed_table('github_events', 'user_id');

Slide 21

Slide 21 text

Rest of the tables can be reference tables or local table SELECT create_reference_table( 'table_name')

Slide 22

Slide 22 text

APPLICATION CREATE TABLE campaigns (…); SELECT create_distributed_table( 'campaigns','company_id'); METADATA COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn CREATE TABLE campaigns_102 CREATE TABLE campaigns_105 CREATE TABLE campaigns_101 CREATE TABLE campaigns_104 CREATE TABLE campaigns_103 CREATE TABLE campaigns_106 How Citus distributes tables across the database cluster

Slide 23

Slide 23 text

APPLICATION SELECT FROM GROUP BY company_id, avg(spend) AS avg_campaign_spend campaigns company_id; METADATA COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn SELECT company_id sum(spend), count(spend) … FROM campaigns_2009 … SELECT company_id sum(spend), count(spend) … FROM campaigns_2001 … SELECT company_id sum(spend), count(spend) … FROM campaigns_2017 … Query across distribution columns.

Slide 24

Slide 24 text

Transactions routed to a single node BEGIN; UPDATE SET WHERE COMMIT; campaigns start_date = '2018-03-17' company_id = 'Pat Co'; METADATA W1 W2 W3 … Wn BEGIN; UPDATE Campaigns_2012 SET …; COMMIT; APPLICATION COORDINATOR NODE WORKER NODES

Slide 25

Slide 25 text

Node A Node B Node C 1 2 4 5 3 6

Slide 26

Slide 26 text

Node A Node B 1 2 4 5 3 6 Node C Node A Node B Node C 1 2 4 5 3 6

Slide 27

Slide 27 text

Node A Node B 1 2 3 4

Slide 28

Slide 28 text

1 4 3 2 Node A Node B

Slide 29

Slide 29 text

10M+ queries per day; 75% https://techcommunity.microsoft.com/t5/Azure-Database-for- PostgreSQL/Architecting-petabyte-scale-analytics-by-scaling-out-Postgres-on/ba- p/969685

Slide 30

Slide 30 text

Min Wei, Principal Engineer at Microsoft Distributed PostgreSQL is a game changer." aka.ms/blog-petabyte-scale-analytics

Slide 31

Slide 31 text

Storage efficiency IOPs Query speed Cost Savings Benefits of Citus Columnar

Slide 32

Slide 32 text

Row-based Storage Columnar storage

Slide 33

Slide 33 text

CREATE TABLE events_columnar( ts timestamptz, i int, n numeric, s text) USING columnar; It’s as simple as “USING columnar;”

Slide 34

Slide 34 text

Can change access method from heap à columnar SELECT alter_table_set_access_method( 'events_2021_jan', 'columnar');

Slide 35

Slide 35 text

Citus Columnar && Range Partitioning in Postgres CREATE TABLE events( ts timestamptz, i int, n numeric, s text) PARTITION BY RANGE (ts); CREATE TABLE events_2021_jan PARTITION OF events FOR VALUES FROM ('2021-01-01') TO ('2021-02-01'); CREATE TABLE events_2021_feb PARTITION OF events FOR VALUES FROM ('2021-02-01') TO ('2021-03-01');

Slide 36

Slide 36 text

events table …

Slide 37

Slide 37 text

events table …

Slide 38

Slide 38 text

events table

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

Multi-tenant (SaaS) tutorial aka.ms/hyperscale-citus-multi-tenant-tutorial

Slide 41

Slide 41 text

Tutorial: Real-time analytics dashboard aka.ms/hyperscale-citus-real-time-tutorial

Slide 42

Slide 42 text

© Copyright Microsoft Corporation. All rights reserved. Leandro Santana linkedin.com/in/leandroeredia/ @citusdata @AzureDBPostgres Thank you!