DEMO - High performance HTAP with Postgres & Hyperscale (Citus) | ACM SIGMOD/PODS 2020 | Marco Slot & Claire Giordano

In this demo, we run a large-scale HTAP workload on Azure Database for PostgreSQL with the built-in Hyperscale (Citus) deployment option. Hyperscale (Citus) uses the open source Citus extension to Postgres to turn a cluster of PostgreSQL servers into a single distributed database that can shard or replicate Postgres tables across the cluster. Citus scales transaction throughput by routing each transaction to the right server, and at the same time scales analytical queries and data transformations by parallelizing them across all of the servers in the cluster. Combined with existing Postgres features such as its index types and other PostgreSQL extensions, this lets Hyperscale (Citus) run high-performance HTAP workloads at scale.
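
As a rough illustration of the routing and parallelism described above, the following Python sketch models a cluster as a set of in-memory lists. It is a toy model, not Citus code: the `worker_for` function, the CRC32 hash, and the `warehouse_id` column are assumptions made for illustration only.

```python
import zlib
from collections import defaultdict

NUM_WORKERS = 10  # mirrors the 10-node cluster in the demo; otherwise arbitrary


def worker_for(warehouse_id: int) -> int:
    """Map a distribution-column value to a worker (stand-in hash, not Citus's)."""
    return zlib.crc32(str(warehouse_id).encode()) % NUM_WORKERS


# Each worker stores only the rows whose warehouse_id hashes to it.
workers = defaultdict(list)


def insert_order(warehouse_id: int, amount: int) -> None:
    # Transactional path: a write for one warehouse touches exactly one worker.
    workers[worker_for(warehouse_id)].append((warehouse_id, amount))


def orders_for(warehouse_id: int) -> list:
    # Single-key read: routed to the one worker that owns this warehouse.
    rows = workers[worker_for(warehouse_id)]
    return [row for row in rows if row[0] == warehouse_id]


def total_sales() -> int:
    # Analytical path: every worker computes a partial sum over its own rows
    # (a real cluster would do this in parallel), then the partials are merged.
    partials = [sum(amount for _, amount in rows) for rows in workers.values()]
    return sum(partials)


for w in range(1, 101):
    insert_order(w, 10 * w)
```

Because every row for a given warehouse lands on the same worker, `orders_for` scans one worker only, while `total_sales` spreads its work over all of them.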

We will show a side-by-side comparison of Hyperscale (Citus) and a single PostgreSQL server running a transactional workload generated by HammerDB, while simultaneously running analytical queries, and show how you get further speedups by pre-aggregating the data in parallel (using rollups) on the same Postgres database.
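
The idea behind rollups can be sketched in a few lines of Python: raw rows are pre-aggregated once, and later queries read the small aggregated table instead of every raw row. The data layout, `build_rollup`, and `sales_for_minute` are hypothetical names chosen for this sketch; in the demo the rollup is built inside Postgres itself.

```python
from collections import Counter

# Hypothetical raw events: (warehouse_id, minute, order_amount)
raw_orders = [(1, 0, 20), (1, 0, 5), (2, 0, 7), (1, 1, 3)]


def build_rollup(rows):
    """Pre-aggregate raw rows into one total per (warehouse_id, minute)."""
    rollup = Counter()
    for warehouse_id, minute, amount in rows:
        rollup[(warehouse_id, minute)] += amount
    return rollup


def sales_for_minute(rollup, minute):
    # Reads one pre-aggregated entry per warehouse instead of every raw order.
    return sum(total for (_, m), total in rollup.items() if m == minute)


rollup = build_rollup(raw_orders)
```

The rollup here has three entries for four raw rows; with realistic order volumes the gap is far larger, which is where the query-time savings come from.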

Transcript

  1. DEMO: High performance HTAP with Postgres & Hyperscale (Citus). Marco Slot, Principal Engineer & Lead of Citus Open Source project, with intro by Claire Giordano
  2. What is Citus? github.com/citusdata/citus • Transforms Postgres into a distributed database • Distributes your data & queries
  3. What is Citus? github.com/citusdata/citus • Transforms Postgres into a distributed database • Distributes your data & queries • Parallelism
  4. What is Citus? github.com/citusdata/citus • Transforms Postgres into a distributed database • Distributes your data & queries • Parallelism • All the CPU, memory, & disk of the cluster
  5. Can you tell us a bit about what you will demo today? What’s the anatomy of the demo? @clairegiordano / @marcoslot
  6. What you will see in today’s HTAP database demo: • All running on Azure • Side-by-side performance comparison: Hyperscale (Citus) vs. single node • Millisecond analytics queries with rollups • Retail: order processing system for warehouses (using HammerDB)
  11. Single Postgres server vs. Hyperscale (Citus) 10-node cluster. Transactions: ~40-50K transactions/min vs. ~900K transactions/min (20x faster). Analytical query: 53 minutes vs. 10 sec (300x faster). Analytical query with rollups: 20 milliseconds (~150,000x faster).
  12. Hyperscale (Citus) 10-node database cluster: a coordinator holding the metadata, plus Citus worker nodes W1 through W10
  13. Will all apps see the performance increase you just showed us? @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
  14. It’s important to find a good distribution column, something that is common to all large tables: SELECT create_distributed_table('table_name', 'distribution_column');
  15. At the end of the demo, you called Citus an “almost anything” database. What did you mean?
  16. How best to get started with Hyperscale (Citus)?
  17. Architecting petabyte-scale analytics by scaling out Postgres on Azure with the Citus extension: aka.ms/blog-petabyte-scale-analytics
  18. "Distributed PostgreSQL is a game changer." Min Wei, Principal Engineer at Microsoft. Source: https://aka.ms/blog-petabyte-scale-analytics
  19. Thank you! Marco Slot (@marcoslot) & Claire Giordano (@clairegiordano); @citusdata, @AzureDBPostgres. © Copyright Microsoft Corporation. All rights reserved.
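
The speedup figures on slide 11 can be cross-checked with a little arithmetic. The sketch below assumes one particular reading of the slide: that 53 minutes and ~40-50K transactions/min belong to the single Postgres server, that 10 sec and ~900K transactions/min belong to the Hyperscale (Citus) cluster, and that the 20 millisecond rollup query is compared against the 53-minute single-server query. The variable names are hypothetical.

```python
# Figures as read from slide 11; the pairing of numbers to systems is an assumption.
single_tx_per_min = (40_000, 50_000)  # low and high end of the quoted range
citus_tx_per_min = 900_000

single_query_ms = 53 * 60 * 1000  # 53 minutes
citus_query_ms = 10 * 1000        # 10 seconds
rollup_query_ms = 20              # 20 milliseconds

# Transaction throughput ratio at both ends of the single-server range.
tx_speedup = [citus_tx_per_min / x for x in single_tx_per_min]

# Analytical query latency ratios relative to the single server.
query_speedup = single_query_ms / citus_query_ms
rollup_speedup = single_query_ms / rollup_query_ms

print(tx_speedup, query_speedup, rollup_speedup)
```

Under that reading the ratios come out at roughly 18x to 22x, 318x, and 159,000x, in line with the rounded 20x, 300x, and ~150,000x shown on the slide.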