High Performance HTAP with Postgres & Hyperscale (Citus) | European Virtual Open Source Summit 2020 | Marco Slot & Claire Giordano

High Performance HTAP with Postgres & Hyperscale (Citus) | European Virtual Open Source Summit 2020 | Marco Slot & Claire Giordano

Did you know you can scale out Postgres horizontally, transforming Postgres into a distributed database that can do almost anything - including support demanding HTAP applications? Join Marco Slot - principal engineer and lead for the Citus open source project at Microsoft and Citus open source advocate Claire Giordano - for a conversation about scaling out Postgres to serve HTAP workloads on Azure, with Hyperscale (Citus).

Citus is an open source extension to Postgres that changes the game for HTAP (hybrid transactional analytical processing) workloads. Because Citus is an extension to Postgres, Citus gives you all the transactional and indexing capabilities of Postgres. And because Citus distributes both your data and your queries across multiple servers, Citus gives you low-latency analytics even in the face of high concurrency and billions of rows of data.

In addition to the interview, Marco will also demo Hyperscale (Citus) on Azure Database for PostgreSQL, to shine a light on the extreme performance you can get when you scale out Postgres horizontally on Azure. Come learn why the combination of relational database semantics and parallel distributed operations in Hyperscale (Citus) can give your HTAP application such a high-performing and unique competitive edge.

143117954187136b825331f24da0e201?s=128

Azure Postgres

June 16, 2020
Tweet

Transcript

  1. High performance HTAP with Postgres & Hyperscale (Citus) Claire Giordano

    Principal PM Manager, Postgres team at Microsoft | Marco Slot Principal Engineer & Lead, Citus open source project @clairegiordano / @marcoslot @azuredbpostgres / @citusdata
  2. Agenda for today’s Interview with Marco Slot Postgres database Citus

    open source extension to Postgres What is HTAP, followed by Demo Interview throughout & Q&A at the end @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
  3. What will you take away from this talk?

  4. #1 – What an HTAP database is

  5. #1 – What an HTAP database is #2 – Why

    Hyperscale (Citus) is so good for HTAP
  6. Claire Giordano at FOSDEM 2020

  7. Marco Slot

  8. Postgres team at Microsoft

  9. Back in the beginning of 2019… @clairegiordano / @marcoslot /

    @azuredbpostgres / @citusdata
  10. Microsoft Azure Welcomes PostgreSQL Committers aka.ms/blog-postgres-committers

  11. Marco during COVID quarantine

  12. Based in the Netherlands

  13. Citus open source team, some of Amsterdam crew

  14. Vrije Universiteit Amsterdam— Masters degree in Distributed Systems source: Facebook,

    Vrije Universiteit Amsterdam. https://www.facebook.com/vuamsterdam/photos/ a.352972296481/10156489650536482/
  15. Trinity College Dublin

  16. None
  17. ARE YOU READY TO SCALE OUT POSTGRES? BUT DON’T HAVE

    A PHD IN DISTRIBUTED SYSTEMS? Is your Django/Rails/Spring app using Postgres? Re-architecting database infrastructure to support growth can be painful. Which is why we created a database that extends Postgres and enables you to scale out—while continuing to use your familiar SQL toolset and leverage your existing Postgres expertise. Is your Postgres database bigger than 50 GB+? Many SaaS apps outgrow a single-node database at 50 GB, 100 GB, or 500 GB. At some point, single-node Postgres may no longer give you the performance you need. So we designed a Postgres database that scales out. analytics for billions of events? f events per day tell us the es, and
  18. Source: GitHub. citusdata/citus repo. Jan 31, 2016 – May 8,

    2020. https://aka.ms/citus
  19. Postgres

  20. None
  21. Postgres is more popular than ever 30 years of data

    integrity & robustness built in Ranked 2017 & 2018 DBMS of the Year by DB-Engines One of most loved & wanted databases in Stack Overflow 2019 Developer Survey
  22. One of world’s most popular extensions to Postgres— PostGIS. Yes,

    for geospatial use cases.
  23. We are still dining out on the radical design decision

    by Michael Stonebraker in the design of Postgres—to have run-time extensions.” —Paul Ramsey, PGConfEU 2018 “
  24. None
  25. None
  26. Citus extension to Postgres

  27. aka.ms/citus

  28. Not a fork

  29. Citus transforms Postgres into a distributed database

  30. Distributing your data & your queries

  31. Parallelism, and All the cpu cores, memory, & disk of

    database cluster
  32. Hyperscale (Citus) now available as part of Azure Database for

    PostgreSQL
  33. Drumroll please…… Welcome Marco Slot

  34. Why do you think Postgres is so popular among developers?

    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
  35. Postgres is there for you at every stage of your

    application lifecycle.” —Marco Slot “
  36. Download

  37. Download Linux MacOS Windows FreeBSD

  38. Download Tooling

  39. None
  40. Download Experiment Tooling

  41. Download Production Experiment Tooling

  42. aka.ms/azure-postgres

  43. Download Production Scale Experiment Tooling

  44. Partitioning Partial indexes Scaling out with Hyperscale (Citus)

  45. Download Production Scale Experiment Tooling

  46. None
  47. EASY TO GET STARTED GOOD TO PLAY WITH USEFUL FOR

    EVERY STAGE OF APP LIFECYCLE DEVELOPE RS O IT CREATE CLIENT LIBRARIES, TOOLS, EXTENSIONS WHOLE ECOSYSTEM GETS BETTER & BETTER
  48. How does the Citus extension to Postgres work? @clairegiordano /

    @marcoslot / @azuredbpostgres / @citusdata
  49. APPLICATION COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn

    A Citus cluster consists of multiple PostgreSQL servers with the Citus extension.
  50. SELECT create_distributed_table( 'table_name', 'distribution_column'); @clairegiordano / @marcoslot / @azuredbpostgres /

    @citusdata
  51. APPLICATION CREATE TABLE campaigns (…); SELECT create_distributed_table( 'campaigns','company_id'); METADATA COORDINATOR

    NODE WORKER NODES W1 W2 W3 … Wn CREATE TABLE campaigns_102 CREATE TABLE campaigns_105 CREATE TABLE campaigns_101 CREATE TABLE campaigns_104 CREATE TABLE campaigns_103 CREATE TABLE campaigns_106 How Citus distributes tables across the database cluster
  52. APPLICATION SELECT FROM GROUP BY company_id, avg(spend) AS avg_campaign_spend campaigns

    company_id; METADATA COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn SELECT company_id sum(spend), count(spend) … FROM campaigns_2009 … SELECT company_id sum(spend), count(spend) … FROM campaigns_2001 … SELECT company_id sum(spend), count(spend) … FROM campaigns_2017 … How Citus distributes queries across the database cluster
  53. • JSONB • Joins • Functions • Constraints • Indexes:

    B-tree, GIN, BRIN, & GiST • Partial Indexes • Other extensions • PostGIS • Rich datatypes • Foreign data wrappers • Window functions • CTEs • Full text search • pg_stat_statements All the functions of Postgres available to Citus cluster
  54. How Citus handles transactions in a multi-node cluster BEGIN; UPDATE

    SET WHERE COMMIT; campaigns start_date = '2018-03-17' company_id = 'Pat Co'; METADATA W1 W2 W3 … Wn BEGIN; UPDATE Campaigns_2012 SET …; COMMIT; APPLICATION COORDINATOR NODE WORKER NODES
  55. What does HTAP stand for? Why are we even talking

    about HTAP databases today? @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
  56. Hybrid Transactional Analytical Processing @clairegiordano / @marcoslot / @azuredbpostgres /

    @citusdata
  57. OLTP = Online Transactional Processing @clairegiordano / @marcoslot / @azuredbpostgres

    / @citusdata
  58. OLAP = Online Analytical Processing @clairegiordano / @marcoslot / @azuredbpostgres

    / @citusdata
  59. Data sources Storage Transactional database Batch processing Queue Stream processing

    Analytical data store Cache Analytics and reporting
  60. None
  61. Headaches managing separate databases Operational costs (!!!) Glue code Lag

    between events & analytics Updates & deletes @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
  62. The promise of HTAP is that there is finally a

    database that can do both transactions & analytics—at scale.” —Marco Slot “
  63. Can you tell us a bit about what you will

    demo today? What’s the anatomy of the demo? @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
  64. Order Processing System for Warehouses

  65. What you will see in today’s HTAP database demo All

    running on Azure Side-by-side performance compare: Hyperscale (Citus) vs. single node Millisecond analytics queries with rollups Retail: Order processing system for warehouses (using HammerDB)
  66. What you will see in today’s HTAP database demo All

    running on Azure Side-by-side performance compare: Hyperscale (Citus) vs. single node Millisecond analytics queries with rollups Retail: Order processing system for warehouses (using HammerDB)
  67. A bit about HammerDB (it’s NOT a database) hammerdb.com

  68. What you will see in today’s HTAP database demo All

    running on Azure Side-by-side performance compare: Hyperscale (Citus) vs. single node Millisecond analytics queries with rollups Retail: Order processing system for warehouses (using HammerDB)
  69. What you will see in today’s HTAP database demo All

    running on Azure Side-by-side performance compare: Hyperscale (Citus) vs. single node Millisecond analytics queries with rollups Retail: Order processing system for warehouses (using HammerDB)
  70. What you will see in today’s HTAP database demo All

    running on Azure Side-by-side performance compare: Hyperscale (Citus) vs. single node Millisecond analytics queries with rollups Retail: Order processing system for warehouses (using HammerDB)
  71. What you will see in today’s HTAP database demo All

    running on Azure Side-by-side performance compare: Hyperscale (Citus) v. single node Millisecond analytics queries with rollups Retail: Order processing system for warehouses (using HammerDB)
  72. Demo: HTAP Database with Hyperscale (Citus) Marco Slot @clairegiordano /

    @marcoslot / @azuredbpostgres / @citusdata
  73. Will all apps see the performance increase you just showed

    us? @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
  74. None
  75. It’s important to find a good distribution column, something that

    is common to all large tables SELECT create_distributed_table( 'table_name', 'distribution_column'); @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
  76. At the end of the demo, you called Citus an

    “almost anything” database. What did you mean? @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
  77. As an extensible, relational database, Postgres is capable of so

    many things on a single server…
  78. By transforming Postgres into a distributed database, Hyperscale (Citus) makes

    Postgres capable of almost anything
  79. How best to get started with Hyperscale (Citus)? @clairegiordano /

    @marcoslot / @azuredbpostgres / @citusdata
  80. Download Citus open source packages aka.ms/citus

  81. Multi-tenant (SaaS) tutorial aka.ms/hyperscale-citus-multi-tenant-tutorial

  82. Tutorial: Real-time analytics dashboard aka.ms/hyperscale-citus-real-time-tutorial

  83. Do you have a favorite blog post? @clairegiordano / @marcoslot

    / @azuredbpostgres / @citusdata
  84. Architecting petabyte-scale analytics by scaling out Postgres on Azure with

    the Citus extension aka.ms/blog-petabyte-scale-analytics
  85. @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata Petabyte-scale service architecture

    used by Windows
  86. Min Wei, Principal Engineer at Microsoft Distributed PostgreSQL is a

    game changer." source: https://aka.ms/blog-petabyte-scale-analytics
  87. Please don’t forget the Citus newsletter @clairegiordano / @marcoslot /

    @azuredbpostgres / @citusdata
  88. Citus Newsletter aka.ms/citus-newsletter

  89. So what should we take away from your HTAP demo?

    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
  90. HTAP database H·tap da·ta·base /eɪtʃ tæp ˈdeɪtəˌbeɪs/ noun 1. A

    database that can handle both transactions and analytics— at scale—obviating the need to manage and use multiple platforms to handle different aspects of your application. 2. A database that saves you time, reduces operational costs, and gets rid of many headaches. @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
  91. Hyperscale (Citus) enables you to parallelize both your queries and

    your transactions across a cluster, while still keeping all the powerful Postgres features and rich ecosystem.
  92. © Copyright Microsoft Corporation. All rights reserved. thank you merci

    grazie ध"यवाद ευχαριστώ gracias ధన#$ా&'ల) 谢谢 dank u merci danke schön thank you ευχαριστώ grazie gracias tack teşekkür ederim Claire Giordano & @clairegiordano @AzureDBPostgres @marcoslot @citusdata Marco Slot
  93. Any questions?