Slide 1

Slide 1 text

High performance HTAP with Postgres & Hyperscale (Citus) Claire Giordano Principal PM Manager, Postgres team at Microsoft | Marco Slot Principal Engineer & Lead, Citus open source project @clairegiordano / @marcoslot @azuredbpostgres / @citusdata

Slide 2

Slide 2 text

Agenda for today’s Interview with Marco Slot Postgres database Citus open source extension to Postgres What is HTAP, followed by Demo Interview throughout & Q&A at the end @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

Slide 3

Slide 3 text

What will you take away from this talk?

Slide 4

Slide 4 text

#1 – What an HTAP database is

Slide 5

Slide 5 text

#1 – What an HTAP database is #2 – Why Hyperscale (Citus) is so good for HTAP

Slide 6

Slide 6 text

Claire Giordano at FOSDEM 2020

Slide 7

Slide 7 text

Marco Slot

Slide 8

Slide 8 text

Postgres team at Microsoft

Slide 9

Slide 9 text

Back in the beginning of 2019… @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

Slide 10

Slide 10 text

Microsoft Azure Welcomes PostgreSQL Committers aka.ms/blog-postgres-committers

Slide 11

Slide 11 text

Marco during COVID quarantine

Slide 12

Slide 12 text

Based in the Netherlands

Slide 13

Slide 13 text

Citus open source team, some of Amsterdam crew

Slide 14

Slide 14 text

Vrije Universiteit Amsterdam— Masters degree in Distributed Systems source: Facebook, Vrije Universiteit Amsterdam. https://www.facebook.com/vuamsterdam/photos/ a.352972296481/10156489650536482/

Slide 15

Slide 15 text

Trinity College Dublin

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

ARE YOU READY TO SCALE OUT POSTGRES? BUT DON’T HAVE A PHD IN DISTRIBUTED SYSTEMS? Is your Django/Rails/Spring app using Postgres? Re-architecting database infrastructure to support growth can be painful. Which is why we created a database that extends Postgres and enables you to scale out—while continuing to use your familiar SQL toolset and leverage your existing Postgres expertise. Is your Postgres database bigger than 50 GB+? Many SaaS apps outgrow a single-node database at 50 GB, 100 GB, or 500 GB. At some point, single-node Postgres may no longer give you the performance you need. So we designed a Postgres database that scales out. analytics for billions of events? f events per day tell us the es, and

Slide 18

Slide 18 text

Source: GitHub. citusdata/citus repo. Jan 31, 2016 – May 8, 2020. https://aka.ms/citus

Slide 19

Slide 19 text

Postgres

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

Postgres is more popular than ever 30 years of data integrity & robustness built in Ranked 2017 & 2018 DBMS of the Year by DB-Engines One of most loved & wanted databases in Stack Overflow 2019 Developer Survey

Slide 22

Slide 22 text

One of world’s most popular extensions to Postgres— PostGIS. Yes, for geospatial use cases.

Slide 23

Slide 23 text

We are still dining out on the radical design decision by Michael Stonebraker in the design of Postgres—to have run-time extensions.” —Paul Ramsey, PGConfEU 2018 “

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

Citus extension to Postgres

Slide 27

Slide 27 text

aka.ms/citus

Slide 28

Slide 28 text

Not a fork

Slide 29

Slide 29 text

Citus transforms Postgres into a distributed database

Slide 30

Slide 30 text

Distributing your data & your queries

Slide 31

Slide 31 text

Parallelism, and All the cpu cores, memory, & disk of database cluster

Slide 32

Slide 32 text

Hyperscale (Citus) now available as part of Azure Database for PostgreSQL

Slide 33

Slide 33 text

Drumroll please…… Welcome Marco Slot

Slide 34

Slide 34 text

Why do you think Postgres is so popular among developers? @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

Slide 35

Slide 35 text

Postgres is there for you at every stage of your application lifecycle.” —Marco Slot “

Slide 36

Slide 36 text

Download

Slide 37

Slide 37 text

Download Linux MacOS Windows FreeBSD

Slide 38

Slide 38 text

Download Tooling

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

Download Experiment Tooling

Slide 41

Slide 41 text

Download Production Experiment Tooling

Slide 42

Slide 42 text

aka.ms/azure-postgres

Slide 43

Slide 43 text

Download Production Scale Experiment Tooling

Slide 44

Slide 44 text

Partitioning Partial indexes Scaling out with Hyperscale (Citus)

Slide 45

Slide 45 text

Download Production Scale Experiment Tooling

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

EASY TO GET STARTED GOOD TO PLAY WITH USEFUL FOR EVERY STAGE OF APP LIFECYCLE DEVELOPE RS O IT CREATE CLIENT LIBRARIES, TOOLS, EXTENSIONS WHOLE ECOSYSTEM GETS BETTER & BETTER

Slide 48

Slide 48 text

How does the Citus extension to Postgres work? @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

Slide 49

Slide 49 text

APPLICATION COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn A Citus cluster consists of multiple PostgreSQL servers with the Citus extension.

Slide 50

Slide 50 text

SELECT create_distributed_table( 'table_name', 'distribution_column'); @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

Slide 51

Slide 51 text

APPLICATION CREATE TABLE campaigns (…); SELECT create_distributed_table( 'campaigns','company_id'); METADATA COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn CREATE TABLE campaigns_102 CREATE TABLE campaigns_105 CREATE TABLE campaigns_101 CREATE TABLE campaigns_104 CREATE TABLE campaigns_103 CREATE TABLE campaigns_106 How Citus distributes tables across the database cluster

Slide 52

Slide 52 text

APPLICATION SELECT FROM GROUP BY company_id, avg(spend) AS avg_campaign_spend campaigns company_id; METADATA COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn SELECT company_id sum(spend), count(spend) … FROM campaigns_2009 … SELECT company_id sum(spend), count(spend) … FROM campaigns_2001 … SELECT company_id sum(spend), count(spend) … FROM campaigns_2017 … How Citus distributes queries across the database cluster

Slide 53

Slide 53 text

• JSONB • Joins • Functions • Constraints • Indexes: B-tree, GIN, BRIN, & GiST • Partial Indexes • Other extensions • PostGIS • Rich datatypes • Foreign data wrappers • Window functions • CTEs • Full text search • pg_stat_statements All the functions of Postgres available to Citus cluster

Slide 54

Slide 54 text

How Citus handles transactions in a multi-node cluster BEGIN; UPDATE SET WHERE COMMIT; campaigns start_date = '2018-03-17' company_id = 'Pat Co'; METADATA W1 W2 W3 … Wn BEGIN; UPDATE Campaigns_2012 SET …; COMMIT; APPLICATION COORDINATOR NODE WORKER NODES

Slide 55

Slide 55 text

What does HTAP stand for? Why are we even talking about HTAP databases today? @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

Slide 56

Slide 56 text

Hybrid Transactional Analytical Processing @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

Slide 57

Slide 57 text

OLTP = Online Transactional Processing @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

Slide 58

Slide 58 text

OLAP = Online Analytical Processing @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

Slide 59

Slide 59 text

Data sources Storage Transactional database Batch processing Queue Stream processing Analytical data store Cache Analytics and reporting

Slide 60

Slide 60 text

No content

Slide 61

Slide 61 text

Headaches managing separate databases Operational costs (!!!) Glue code Lag between events & analytics Updates & deletes @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

Slide 62

Slide 62 text

The promise of HTAP is that there is finally a database that can do both transactions & analytics—at scale.” —Marco Slot “

Slide 63

Slide 63 text

Can you tell us a bit about what you will demo today? What’s the anatomy of the demo? @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

Slide 64

Slide 64 text

Order Processing System for Warehouses

Slide 65

Slide 65 text

What you will see in today’s HTAP database demo All running on Azure Side-by-side performance compare: Hyperscale (Citus) vs. single node Millisecond analytics queries with rollups Retail: Order processing system for warehouses (using HammerDB)

Slide 66

Slide 66 text

What you will see in today’s HTAP database demo All running on Azure Side-by-side performance compare: Hyperscale (Citus) vs. single node Millisecond analytics queries with rollups Retail: Order processing system for warehouses (using HammerDB)

Slide 67

Slide 67 text

A bit about HammerDB (it’s NOT a database) hammerdb.com

Slide 68

Slide 68 text

What you will see in today’s HTAP database demo All running on Azure Side-by-side performance compare: Hyperscale (Citus) vs. single node Millisecond analytics queries with rollups Retail: Order processing system for warehouses (using HammerDB)

Slide 69

Slide 69 text

What you will see in today’s HTAP database demo All running on Azure Side-by-side performance compare: Hyperscale (Citus) vs. single node Millisecond analytics queries with rollups Retail: Order processing system for warehouses (using HammerDB)

Slide 70

Slide 70 text

What you will see in today’s HTAP database demo All running on Azure Side-by-side performance compare: Hyperscale (Citus) vs. single node Millisecond analytics queries with rollups Retail: Order processing system for warehouses (using HammerDB)

Slide 71

Slide 71 text

What you will see in today’s HTAP database demo All running on Azure Side-by-side performance compare: Hyperscale (Citus) v. single node Millisecond analytics queries with rollups Retail: Order processing system for warehouses (using HammerDB)

Slide 72

Slide 72 text

Demo: HTAP Database with Hyperscale (Citus) Marco Slot @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

Slide 73

Slide 73 text

Will all apps see the performance increase you just showed us? @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

Slide 74

Slide 74 text

No content

Slide 75

Slide 75 text

It’s important to find a good distribution column, something that is common to all large tables SELECT create_distributed_table( 'table_name', 'distribution_column'); @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

Slide 76

Slide 76 text

At the end of the demo, you called Citus an “almost anything” database. What did you mean? @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

Slide 77

Slide 77 text

As an extensible, relational database, Postgres is capable of so many things on a single server…

Slide 78

Slide 78 text

By transforming Postgres into a distributed database, Hyperscale (Citus) makes Postgres capable of almost anything

Slide 79

Slide 79 text

How best to get started with Hyperscale (Citus)? @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

Slide 80

Slide 80 text

Download Citus open source packages aka.ms/citus

Slide 81

Slide 81 text

Multi-tenant (SaaS) tutorial aka.ms/hyperscale-citus-multi-tenant-tutorial

Slide 82

Slide 82 text

Tutorial: Real-time analytics dashboard aka.ms/hyperscale-citus-real-time-tutorial

Slide 83

Slide 83 text

Do you have a favorite blog post? @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

Slide 84

Slide 84 text

Architecting petabyte-scale analytics by scaling out Postgres on Azure with the Citus extension aka.ms/blog-petabyte-scale-analytics

Slide 85

Slide 85 text

@clairegiordano / @marcoslot / @azuredbpostgres / @citusdata Petabyte-scale service architecture used by Windows

Slide 86

Slide 86 text

Min Wei, Principal Engineer at Microsoft Distributed PostgreSQL is a game changer." source: https://aka.ms/blog-petabyte-scale-analytics

Slide 87

Slide 87 text

Please don’t forget the Citus newsletter @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

Slide 88

Slide 88 text

Citus Newsletter aka.ms/citus-newsletter

Slide 89

Slide 89 text

So what should we take away from your HTAP demo? @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

Slide 90

Slide 90 text

HTAP database H·tap da·ta·base /eɪtʃ tæp ˈdeɪtəˌbeɪs/ noun 1. A database that can handle both transactions and analytics— at scale—obviating the need to manage and use multiple platforms to handle different aspects of your application. 2. A database that saves you time, reduces operational costs, and gets rid of many headaches. @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

Slide 91

Slide 91 text

Hyperscale (Citus) enables you to parallelize both your queries and your transactions across a cluster, while still keeping all the powerful Postgres features and rich ecosystem.

Slide 92

Slide 92 text

© Copyright Microsoft Corporation. All rights reserved. thank you merci grazie ध"यवाद ευχαριστώ gracias ధన#$ా&'ల) 谢谢 dank u merci danke schön thank you ευχαριστώ grazie gracias tack teşekkür ederim Claire Giordano & @clairegiordano @AzureDBPostgres @marcoslot @citusdata Marco Slot

Slide 93

Slide 93 text

Any questions?