DEMO - High performance HTAP with Postgres & Hyperscale (Citus) | ACM SIGMOD/PODS 2020 | Marco Slot & Claire Giordano

In this demo, we run a large-scale HTAP workload on Azure Database for PostgreSQL with the built-in Hyperscale (Citus) deployment option. Hyperscale (Citus) uses the open source Citus extension to Postgres to turn a cluster of PostgreSQL servers into a single distributed database that can shard or replicate Postgres tables across the cluster. Citus scales transaction throughput by routing each transaction to the right server, and at the same time scales analytical queries and data transformations by parallelizing them across all of the servers in the cluster. Combined with Postgres features such as its many index types and other PostgreSQL extensions, this lets Hyperscale (Citus) run high-performance HTAP workloads at scale.
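
As a rough illustration of the two behaviors described above, the sketch below assumes a hypothetical `orders` table sharded by a `warehouse_id` column and a small `items` table replicated to every node. The table and column names are invented for illustration and are not taken from the demo; `create_distributed_table` and `create_reference_table` are the Citus functions for sharding and replicating a table.

```sql
-- Hypothetical schema, not taken from the demo.
CREATE TABLE orders (
    warehouse_id int         NOT NULL,
    order_id     bigint      NOT NULL,
    item_id      int         NOT NULL,
    quantity     int         NOT NULL,
    ordered_at   timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (warehouse_id, order_id)
);

CREATE TABLE items (
    item_id int PRIMARY KEY,
    name    text NOT NULL
);

-- Shard orders across the worker nodes by warehouse_id.
SELECT create_distributed_table('orders', 'warehouse_id');

-- Replicate the small items table to every node.
SELECT create_reference_table('items');

-- Transactional: the filter on the distribution column lets Citus
-- route the statement to the node holding that warehouse's shard.
UPDATE orders
SET quantity = quantity + 1
WHERE warehouse_id = 42 AND order_id = 1001;

-- Analytical: no filter on the distribution column, so the query is
-- split into per-shard tasks that run in parallel across the nodes.
SELECT i.name, sum(o.quantity) AS units_sold
FROM orders o
JOIN items i USING (item_id)
GROUP BY i.name
ORDER BY units_sold DESC
LIMIT 10;
```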

We will show a side-by-side comparison of Hyperscale (Citus) and a single PostgreSQL server running a transactional workload generated by HammerDB, while simultaneously running analytical queries, and show how you get further speedups by pre-aggregating the data in parallel (using rollups) on the same Postgres database.
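
A rollup of the kind mentioned above might look like the following sketch. It assumes a hypothetical `order_lines` table distributed by `warehouse_id`; the table names, columns, and the daily granularity are illustrative assumptions, not the schema used in the demo. Because the rollup table is co-located with the source table and the `INSERT ... SELECT` groups by the distribution column, Citus can run the aggregation on each shard in parallel.

```sql
-- Hypothetical source table, distributed by warehouse_id.
CREATE TABLE order_lines (
    warehouse_id int         NOT NULL,
    line_id      bigint      NOT NULL,
    quantity     int         NOT NULL,
    ordered_at   timestamptz NOT NULL,
    PRIMARY KEY (warehouse_id, line_id)
);
SELECT create_distributed_table('order_lines', 'warehouse_id');

-- Hypothetical rollup table, co-located with the source table.
CREATE TABLE order_lines_daily (
    warehouse_id int    NOT NULL,
    day          date   NOT NULL,
    line_count   bigint NOT NULL,
    units        bigint NOT NULL,
    PRIMARY KEY (warehouse_id, day)
);
SELECT create_distributed_table('order_lines_daily', 'warehouse_id',
                                colocate_with => 'order_lines');

-- Pre-aggregate yesterday's rows; rerunning the statement overwrites
-- the same (warehouse_id, day) rows instead of duplicating them.
INSERT INTO order_lines_daily (warehouse_id, day, line_count, units)
SELECT warehouse_id, ordered_at::date, count(*), sum(quantity)
FROM order_lines
WHERE ordered_at >= current_date - 1
  AND ordered_at <  current_date
GROUP BY warehouse_id, ordered_at::date
ON CONFLICT (warehouse_id, day) DO UPDATE
SET line_count = EXCLUDED.line_count,
    units      = EXCLUDED.units;

-- Analytical queries then read the much smaller rollup table.
SELECT day, sum(units) AS units_sold
FROM order_lines_daily
GROUP BY day
ORDER BY day;
```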

Transcript

  1. DEMO
    High performance HTAP
    with Postgres &
    Hyperscale (Citus)
    Marco Slot, Principal Engineer & Lead of
    Citus Open Source project,
    with intro by Claire Giordano

  2. Marco Slot

  3. Hybrid
    Transactional
    Analytical
    Processing
    @clairegiordano / @marcoslot

  4. Postgres

  5. Hyperscale (Citus)
    now available as part of Azure
    Database for PostgreSQL

  6. Hyperscale (Citus)
    now available as part of Azure
    Database for PostgreSQL

  7. Citus
    extension
    to Postgres

  8. aka.ms/citus

  9. What is Citus? /// github.com/citusdata/citus
    • Transforms Postgres into a distributed database

  10. What is Citus? /// github.com/citusdata/citus
    • Transforms Postgres into a distributed database
    • Distributes your data & queries

  11. What is Citus? /// github.com/citusdata/citus
    • Transforms Postgres into a distributed database
    • Distributes your data & queries
    • Parallelism

  12. What is Citus? /// github.com/citusdata/citus
    • Transforms Postgres into a distributed database
    • Distributes your data & queries
    • Parallelism
    • All the CPU, memory, & disk of the cluster

  13. Can you tell us a bit about what you
    will demo today?
    What’s the anatomy of the demo?

  14. Order Processing
    System for
    Warehouses

  15. What you will see in today’s HTAP database demo
    All running on Azure
    Side-by-side performance compare: Hyperscale (Citus) vs. single node
    Millisecond analytics queries with rollups
    Retail: Order processing system for warehouses (using HammerDB)

  16. What you will see in today’s HTAP database demo
    All running on Azure
    Side-by-side performance compare: Hyperscale (Citus) vs. single node
    Millisecond analytics queries with rollups
    Retail: Order processing system for warehouses (using HammerDB)

  17. A bit about HammerDB (it’s NOT a database)
    hammerdb.com

  18. What you will see in today’s HTAP database demo
    All running on Azure
    Side-by-side performance compare: Hyperscale (Citus) vs. single node
    Millisecond analytics queries with rollups
    Retail: Order processing system for warehouses (using HammerDB)

  19. What you will see in today’s HTAP database demo
    All running on Azure
    Side-by-side performance compare: Hyperscale (Citus) vs. single node
    Millisecond analytics queries with rollups
    Retail: Order processing system for warehouses (using HammerDB)

  20. What you will see in today’s HTAP database demo
    All running on Azure
    Side-by-side performance compare: Hyperscale (Citus) vs. single node
    Millisecond analytics queries with rollups
    Retail: Order processing system for warehouses (using HammerDB)

  21. Demo: HTAP Database with Hyperscale (Citus)
    Marco Slot
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

  22. Single Postgres server vs. Hyperscale (Citus) 10-node cluster
    Transactions: ~40-50K transactions/min vs. ~900K transactions/min (20x faster)
    Analytical query: 53 minutes vs. 10 sec (300x faster)
    Analytical query with rollups: 20 milliseconds (~150,000x faster)
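
    The multipliers on this slide can be checked with quick arithmetic. The sketch below assumes ~45K transactions/min as the midpoint of the single-server range, and assumes both analytical speedups are measured against the single server's 53 minutes; the slide's figures then appear to be rounded.

```latex
\begin{align*}
\text{Transactions:}\quad
  & \frac{900{,}000\ \text{tx/min}}{45{,}000\ \text{tx/min}} = 20 \\[4pt]
\text{Analytical query:}\quad
  & \frac{53 \times 60\ \text{s}}{10\ \text{s}} = \frac{3180}{10} = 318 \approx 300 \\[4pt]
\text{With rollups:}\quad
  & \frac{3180\ \text{s}}{0.020\ \text{s}} = 159{,}000 \approx 150{,}000
\end{align*}
```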

  23. Hyperscale (Citus) 10-node database cluster
    Coordinator
    Metadata
    Citus worker nodes: W1, W2, W3, W4, W5, W6, W7, W8, W9, W10

  24. Power of HTAP with
    Hyperscale (Citus) on
    Azure Database for PostgreSQL

  25. Will all apps see the performance
    increase you just showed us?

  26. (slide has no transcribed text)

  27. It’s important to
    find a good
    distribution column,
    something that is
    common to all large
    tables
    SELECT create_distributed_table(
    'table_name',
    'distribution_column');
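
    For an order-processing schema like the one HammerDB generates, the warehouse ID is a natural candidate, since every large table carries it. The sketch below assumes the TPC-C-style tables that HammerDB creates (`warehouse`, `district`, `customer`, `orders`, `order_line`, `stock`, `item`) already exist with their usual column names. The exact schema used in the demo is not shown on the slides, so treat these names as assumptions.

```sql
-- Assumed HammerDB (TPC-C-style) table and column names; adjust to your schema.
-- Distribute every large table by its warehouse ID column so that rows
-- for the same warehouse are co-located on the same worker node.
SELECT create_distributed_table('warehouse',  'w_id');
SELECT create_distributed_table('district',   'd_w_id',  colocate_with => 'warehouse');
SELECT create_distributed_table('customer',   'c_w_id',  colocate_with => 'warehouse');
SELECT create_distributed_table('orders',     'o_w_id',  colocate_with => 'warehouse');
SELECT create_distributed_table('order_line', 'ol_w_id', colocate_with => 'warehouse');
SELECT create_distributed_table('stock',      's_w_id',  colocate_with => 'warehouse');

-- The item catalog has no warehouse ID; replicate it to all nodes instead.
SELECT create_reference_table('item');
```

    With a layout like this, a transaction for one warehouse can be served by a single node, while joins between these tables on the warehouse ID can run locally on each node.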

  28. At the end of the demo, you called
    Citus an “almost anything” database.
    What did you mean?

  29. As an extensible, relational database, Postgres is
    capable of so many things on a single server…

  30. By transforming Postgres
    into a distributed
    database, Hyperscale
    (Citus) makes Postgres
    capable of
    almost anything

  31. How best to get started with
    Hyperscale (Citus)?

  32. Download
    Citus open
    source
    packages
    aka.ms/citus

  33. Multi-tenant
    (SaaS)
    tutorial
    aka.ms/hyperscale-citus-multi-tenant-tutorial

  34. Tutorial:
    Real-time
    analytics
    dashboard
    aka.ms/hyperscale-citus-real-time-tutorial

  35. Do you have a favorite blog
    post?

  36. Architecting petabyte-scale analytics by scaling out
    Postgres on Azure with the Citus extension
    aka.ms/blog-petabyte-scale-analytics

  37. Petabyte-scale service architecture used by Windows

  38. "Distributed PostgreSQL
    is a game changer."
    Min Wei, Principal Engineer at Microsoft
    source:
    https://aka.ms/blog-petabyte-scale-analytics

  39. Thank you!
    Marco Slot & Claire Giordano
    @marcoslot
    @clairegiordano
    @citusdata
    @AzureDBPostgres
    © Copyright Microsoft Corporation. All rights reserved.
