DEMO - High performance HTAP with Postgres & Hyperscale (Citus) | ACM SIGMOD/PODS 2020 | Marco Slot & Claire Giordano

In this demo, we run a large-scale HTAP workload on Azure Database for PostgreSQL with the built-in Hyperscale (Citus) deployment option. Hyperscale (Citus) uses the open source Citus extension to Postgres to turn a cluster of PostgreSQL servers into a single distributed database that can shard or replicate Postgres tables across the cluster. Citus can simultaneously scale transaction throughput by routing transactions to the right server, and scale analytical queries and data transformations by parallelizing them across all of the servers in the database cluster. Combined with powerful Postgres features, such as its different index types and other PostgreSQL extensions, this enables Hyperscale (Citus) to run high-performance HTAP workloads at scale.
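
The two scaling paths described above can be illustrated with a small, purely conceptual Python sketch. It is not how Citus is implemented: `crc32` stands in for the hash function, the modulo mapping is a simplification (Citus assigns hash ranges to shards), and the names `worker_for`, `route_transaction`, and `total_sales` are hypothetical.

```python
from collections import defaultdict
from zlib import crc32

NUM_WORKERS = 10  # assumption: mirrors the 10-node cluster used in the demo


def worker_for(distribution_value: int, num_workers: int = NUM_WORKERS) -> int:
    """Pick a worker from a hash of the distribution column value.

    Illustrative only: crc32 and the modulo are stand-ins, not Citus internals.
    """
    return crc32(str(distribution_value).encode()) % num_workers


def route_transaction(workers, warehouse_id, amount):
    """Transactional path: touch only the worker that owns this warehouse."""
    target = worker_for(warehouse_id, len(workers))
    workers[target][warehouse_id] += amount
    return target


def total_sales(workers):
    """Analytical path: every worker computes a partial sum, then merge them."""
    partials = [sum(rows.values()) for rows in workers]  # parallel in a real cluster
    return sum(partials)


# Each worker holds only the warehouses that hash to it.
workers = [defaultdict(int) for _ in range(NUM_WORKERS)]
for warehouse_id, amount in [(1, 100), (2, 250), (1, 50), (7, 25)]:
    route_transaction(workers, warehouse_id, amount)

print(total_sales(workers))  # 425
```

A transaction for one warehouse touches one worker, so adding workers adds transaction capacity. The analytical query touches all workers, so each one only processes its share of the data.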

We will show a side-by-side comparison of Hyperscale (Citus) and a single PostgreSQL server running a transactional workload generated by HammerDB, while simultaneously running analytical queries, and show how you get further speedups by pre-aggregating the data in parallel (using rollups) on the same Postgres database.
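
As a rough illustration of why rollups speed up analytical queries, the following Python sketch pre-aggregates raw rows into a much smaller table keyed by warehouse and minute. The data, the per-minute granularity, and the function names are all hypothetical; in the demo the rollup is built inside the Postgres database, not in application code.

```python
from collections import defaultdict

# Hypothetical raw order events: (warehouse_id, minute, amount)
orders = [
    (1, 0, 10.0),
    (1, 0, 5.0),
    (2, 0, 7.5),
    (1, 1, 2.5),
]


def build_rollup(rows):
    """Pre-aggregate raw rows into one total per (warehouse, minute)."""
    rollup = defaultdict(float)
    for warehouse_id, minute, amount in rows:
        rollup[(warehouse_id, minute)] += amount
    return rollup


def sales_from_raw(rows, warehouse_id):
    """Answer the query by scanning every raw row."""
    return sum(amount for w, _, amount in rows if w == warehouse_id)


def sales_from_rollup(rollup, warehouse_id):
    """Answer the same query from the pre-aggregated rows."""
    return sum(total for (w, _), total in rollup.items() if w == warehouse_id)


rollup = build_rollup(orders)
print(sales_from_raw(orders, 1), sales_from_rollup(rollup, 1))  # 17.5 17.5
```

Both functions return the same answer, but the rollup version reads one row per warehouse per minute instead of one row per order. That gap grows with the transaction rate.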

Azure Postgres

June 18, 2020
Transcript

  1. DEMO: High performance HTAP with Postgres & Hyperscale (Citus). Marco Slot, Principal Engineer & Lead of the Citus Open Source project, with intro by Claire Giordano

  2. Marco Slot

  3. Hybrid Transactional Analytical Processing @clairegiordano / @marcoslot

  4. Postgres

  5. Hyperscale (Citus) now available as part of Azure Database for PostgreSQL

  7. Citus extension to Postgres

  8. aka.ms/citus

  12. What is Citus? github.com/citusdata/citus • Transforms Postgres into a distributed database • Distributes your data & queries • Parallelism • All the CPU, memory, & disk of the cluster

  13. Can you tell us a bit about what you will demo today? What’s the anatomy of the demo?

  14. Order Processing System for Warehouses

  15. What you will see in today’s HTAP database demo • All running on Azure • Side-by-side performance comparison: Hyperscale (Citus) vs. single node • Millisecond analytics queries with rollups • Retail: Order processing system for warehouses (using HammerDB)

  17. A bit about HammerDB (it’s NOT a database) hammerdb.com

  21. Demo: HTAP Database with Hyperscale (Citus). Marco Slot. @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

  22. Single Postgres server vs. Hyperscale (Citus) 10-node cluster. Transactions: ~40-50K transactions/min vs. ~900K transactions/min (20x faster). Analytical query: 53 minutes vs. 10 sec (300x faster). Analytical query with rollups: 20 milliseconds (~150,000x faster).

  23. Hyperscale (Citus) 10-node database cluster: Coordinator (metadata) and Citus worker nodes W1 to W10

  24. Power of HTAP with Hyperscale (Citus) on Azure Database for PostgreSQL

  25. Will all apps see the performance increase you just showed us?

  27. It’s important to find a good distribution column, something that is common to all large tables:

    SELECT create_distributed_table('table_name', 'distribution_column');

  28. At the end of the demo, you called Citus an “almost anything” database. What did you mean?

  29. As an extensible, relational database, Postgres is capable of so many things on a single server…

  30. By transforming Postgres into a distributed database, Hyperscale (Citus) makes Postgres capable of almost anything

  31. How best to get started with Hyperscale (Citus)?

  32. Download Citus open source packages aka.ms/citus

  33. Multi-tenant (SaaS) tutorial aka.ms/hyperscale-citus-multi-tenant-tutorial

  34. Tutorial: Real-time analytics dashboard aka.ms/hyperscale-citus-real-time-tutorial

  35. Do you have a favorite blog post?

  36. Architecting petabyte-scale analytics by scaling out Postgres on Azure with the Citus extension: aka.ms/blog-petabyte-scale-analytics

  37. Petabyte-scale service architecture used by Windows

  38. "Distributed PostgreSQL is a game changer." Min Wei, Principal Engineer at Microsoft. Source: https://aka.ms/blog-petabyte-scale-analytics

  39. Thank you! Marco Slot & Claire Giordano. @marcoslot @citusdata @clairegiordano @AzureDBPostgres. © Copyright Microsoft Corporation. All rights reserved.
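
The speedup factors quoted on slide 22 can be sanity-checked with simple arithmetic. The sketch below uses only the figures from that slide. The one assumption is that the single-server transaction rate is compared at the midpoint of the quoted ~40-50K range.

```python
# Figures from slide 22, converted to milliseconds so the divisions stay exact.
single_server_query_ms = 53 * 60 * 1000  # analytical query: 53 minutes
citus_query_ms = 10 * 1000               # same query on the 10-node cluster: 10 sec
citus_rollup_query_ms = 20               # query against rollups: 20 milliseconds

query_speedup = single_server_query_ms / citus_query_ms
rollup_speedup = single_server_query_ms / citus_rollup_query_ms

# Transactions per minute; 45K is an assumed midpoint of the ~40-50K range.
citus_tpm = 900_000
single_server_tpm_low, single_server_tpm_high = 40_000, 50_000
single_server_tpm_mid = (single_server_tpm_low + single_server_tpm_high) / 2
tpm_speedup = citus_tpm / single_server_tpm_mid

print(query_speedup)   # 318.0, quoted as "300x faster"
print(rollup_speedup)  # 159000.0, quoted as "~150,000x faster"
print(tpm_speedup)     # 20.0, quoted as "20x faster"
```

Across the full quoted range the transaction speedup falls between 18x and 22.5x, which is consistent with the rounded "20x" on the slide.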