Upgrade to Pro — share decks privately, control downloads, hide ads and more …

High Performance HTAP with Postgres & Hyperscale (Citus) | European Virtual Open Source Summit 2020 | Marco Slot & Claire Giordano

High Performance HTAP with Postgres & Hyperscale (Citus) | European Virtual Open Source Summit 2020 | Marco Slot & Claire Giordano

Did you know you can scale out Postgres horizontally, transforming Postgres into a distributed database that can do almost anything - including support demanding HTAP applications? Join Marco Slot - principal engineer and lead for the Citus open source project at Microsoft and Citus open source advocate Claire Giordano - for a conversation about scaling out Postgres to serve HTAP workloads on Azure, with Hyperscale (Citus).

Citus is an open source extension to Postgres that changes the game for HTAP (hybrid transactional analytical processing) workloads. Because Citus is an extension to Postgres, Citus gives you all the transactional and indexing capabilities of Postgres. And because Citus distributes both your data and your queries across multiple servers, Citus gives you low-latency analytics even in the face of high concurrency and billions of rows of data.

In addition to the interview, Marco will also demo Hyperscale (Citus) on Azure Database for PostgreSQL, to shine a light on the extreme performance you can get when you scale out Postgres horizontally on Azure. Come learn why the combination of relational database semantics and parallel distributed operations in Hyperscale (Citus) can give your HTAP application such a high-performing and unique competitive edge.

More Decks by Azure Database for PostgreSQL

Other Decks in Technology

Transcript

  1. High performance HTAP with
    Postgres & Hyperscale (Citus)
    Claire Giordano Principal PM Manager, Postgres team
    at Microsoft | Marco Slot Principal Engineer & Lead,
    Citus open source project
    @clairegiordano / @marcoslot
    @azuredbpostgres / @citusdata

    View full-size slide

  2. Agenda for today’s Interview with Marco Slot
    Postgres database Citus open source
    extension to
    Postgres
    What is HTAP,
    followed by
    Demo
    Interview
    throughout &
    Q&A at the end
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View full-size slide

  3. What will you take away
    from this talk?

    View full-size slide

  4. #1 – What an HTAP database is

    View full-size slide

  5. #1 – What an HTAP database is
    #2 – Why Hyperscale (Citus) is so good for HTAP

    View full-size slide

  6. Claire Giordano at FOSDEM 2020

    View full-size slide

  7. Postgres team at Microsoft

    View full-size slide

  8. Back in the beginning of 2019…
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View full-size slide

  9. Microsoft Azure Welcomes PostgreSQL Committers
    aka.ms/blog-postgres-committers

    View full-size slide

  10. Marco during COVID quarantine

    View full-size slide

  11. Based in the Netherlands

    View full-size slide

  12. Citus open source team, some of
    Amsterdam crew

    View full-size slide

  13. Vrije Universiteit
    Amsterdam—
    Masters degree in
    Distributed Systems
    source: Facebook, Vrije Universiteit Amsterdam.
    https://www.facebook.com/vuamsterdam/photos/
    a.352972296481/10156489650536482/

    View full-size slide

  14. Trinity College Dublin

    View full-size slide

  15. ARE YOU
    READY TO SCALE OUT
    POSTGRES?
    BUT DON’T HAVE A PHD IN DISTRIBUTED SYSTEMS?
    Is your Django/Rails/Spring app using Postgres?
    Re-architecting database infrastructure to support growth can be painful. Which
    is why we created a database that extends Postgres and enables you to scale
    out—while continuing to use your familiar SQL toolset and leverage your existing
    Postgres expertise.
    Is your Postgres database bigger than 50 GB+?
    Many SaaS apps outgrow a single-node database at 50 GB, 100 GB, or 500 GB.
    At some point, single-node Postgres may no longer give you the performance
    you need. So we designed a Postgres database that scales out.
    analytics for billions of events?
    f events per day tell us the
    es, and

    View full-size slide

  16. Source: GitHub. citusdata/citus repo. Jan 31, 2016 – May 8, 2020. https://aka.ms/citus

    View full-size slide

  17. Postgres is more popular than ever
    30 years of
    data integrity
    & robustness
    built in
    Ranked 2017 &
    2018 DBMS of
    the Year by
    DB-Engines
    One of most
    loved & wanted
    databases in Stack
    Overflow 2019
    Developer Survey

    View full-size slide

  18. One of world’s most popular extensions to Postgres—
    PostGIS. Yes, for geospatial use cases.

    View full-size slide

  19. We are still dining
    out on the radical
    design decision by
    Michael Stonebraker
    in the design of
    Postgres—to have
    run-time extensions.”
    —Paul Ramsey, PGConfEU 2018

    View full-size slide

  20. Citus
    extension
    to Postgres

    View full-size slide

  21. aka.ms/citus

    View full-size slide

  22. Citus transforms Postgres into a distributed database

    View full-size slide

  23. Distributing your data & your queries

    View full-size slide

  24. Parallelism, and
    All the cpu cores, memory, & disk of database cluster

    View full-size slide

  25. Hyperscale (Citus)
    now available as part of Azure
    Database for PostgreSQL

    View full-size slide

  26. Drumroll please……
    Welcome
    Marco Slot

    View full-size slide

  27. Why do you think Postgres
    is so popular
    among developers?
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View full-size slide

  28. Postgres is there for
    you at every stage
    of your application
    lifecycle.”
    —Marco Slot

    View full-size slide

  29. Download
    Linux
    MacOS
    Windows
    FreeBSD

    View full-size slide

  30. Download
    Tooling

    View full-size slide

  31. Download
    Experiment
    Tooling

    View full-size slide

  32. Download
    Production
    Experiment
    Tooling

    View full-size slide

  33. aka.ms/azure-postgres

    View full-size slide

  34. Download
    Production
    Scale
    Experiment
    Tooling

    View full-size slide

  35. Partitioning
    Partial indexes
    Scaling out with
    Hyperscale (Citus)

    View full-size slide

  36. Download
    Production
    Scale
    Experiment
    Tooling

    View full-size slide

  37. EASY TO GET
    STARTED
    GOOD TO
    PLAY WITH
    USEFUL FOR
    EVERY STAGE OF
    APP LIFECYCLE
    DEVELOPE
    RS O IT
    CREATE CLIENT
    LIBRARIES,
    TOOLS,
    EXTENSIONS
    WHOLE ECOSYSTEM
    GETS BETTER & BETTER

    View full-size slide

  38. How does the Citus extension to
    Postgres work?
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View full-size slide

  39. APPLICATION
    COORDINATOR
    NODE
    WORKER NODES
    W1
    W2
    W3 …
    Wn
    A Citus cluster consists of multiple PostgreSQL servers with the Citus extension.

    View full-size slide

  40. SELECT create_distributed_table(
    'table_name',
    'distribution_column');
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View full-size slide

  41. APPLICATION
    CREATE TABLE campaigns (…);
    SELECT create_distributed_table(
    'campaigns','company_id');
    METADATA
    COORDINATOR
    NODE
    WORKER NODES
    W1
    W2
    W3 …
    Wn
    CREATE TABLE
    campaigns_102
    CREATE TABLE
    campaigns_105
    CREATE TABLE
    campaigns_101
    CREATE TABLE
    campaigns_104
    CREATE TABLE
    campaigns_103
    CREATE TABLE
    campaigns_106
    How Citus distributes tables across the database cluster

    View full-size slide

  42. APPLICATION
    SELECT
    FROM
    GROUP BY
    company_id,
    avg(spend) AS avg_campaign_spend
    campaigns
    company_id;
    METADATA
    COORDINATOR
    NODE
    WORKER NODES
    W1
    W2
    W3 …
    Wn
    SELECT company_id
    sum(spend),
    count(spend) …
    FROM
    campaigns_2009 …
    SELECT company_id
    sum(spend),
    count(spend) …
    FROM
    campaigns_2001 …
    SELECT company_id
    sum(spend),
    count(spend) …
    FROM
    campaigns_2017 …
    How Citus distributes queries across the database cluster

    View full-size slide

  43. • JSONB
    • Joins
    • Functions
    • Constraints
    • Indexes: B-tree, GIN,
    BRIN, & GiST
    • Partial Indexes
    • Other extensions
    • PostGIS
    • Rich datatypes
    • Foreign data wrappers
    • Window functions
    • CTEs
    • Full text search
    • pg_stat_statements
    All the functions of Postgres available to Citus cluster

    View full-size slide

  44. How Citus handles transactions in a multi-node cluster
    BEGIN;
    UPDATE
    SET
    WHERE
    COMMIT;
    campaigns
    start_date = '2018-03-17'
    company_id = 'Pat Co';
    METADATA
    W1
    W2
    W3 …
    Wn
    BEGIN; UPDATE
    Campaigns_2012
    SET …;
    COMMIT;
    APPLICATION
    COORDINATOR
    NODE
    WORKER NODES

    View full-size slide

  45. What does HTAP stand for?
    Why are we even talking about HTAP
    databases today?
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View full-size slide

  46. Hybrid
    Transactional
    Analytical
    Processing
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View full-size slide

  47. OLTP =
    Online Transactional Processing
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View full-size slide

  48. OLAP =
    Online Analytical Processing
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View full-size slide

  49. Data
    sources
    Storage
    Transactional
    database
    Batch
    processing
    Queue
    Stream
    processing
    Analytical
    data
    store
    Cache
    Analytics and
    reporting

    View full-size slide

  50. Headaches
    managing
    separate
    databases
    Operational costs (!!!)
    Glue code
    Lag between events & analytics
    Updates & deletes
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View full-size slide

  51. The promise of HTAP
    is that there is finally
    a database that can
    do both transactions
    & analytics—at
    scale.”
    —Marco Slot

    View full-size slide

  52. Can you tell us a bit about what you
    will demo today?
    What’s the anatomy of the demo?
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View full-size slide

  53. Order Processing
    System for
    Warehouses

    View full-size slide

  54. What you will see in today’s HTAP database demo
    All running on Azure
    Side-by-side performance compare: Hyperscale (Citus) vs. single node
    Millisecond analytics queries with rollups
    Retail: Order processing system for warehouses (using HammerDB)

    View full-size slide

  55. What you will see in today’s HTAP database demo
    All running on Azure
    Side-by-side performance compare: Hyperscale (Citus) vs. single node
    Millisecond analytics queries with rollups
    Retail: Order processing system for warehouses (using HammerDB)

    View full-size slide

  56. A bit about HammerDB (it’s NOT a database)
    hammerdb.com

    View full-size slide

  57. What you will see in today’s HTAP database demo
    All running on Azure
    Side-by-side performance compare: Hyperscale (Citus) vs. single node
    Millisecond analytics queries with rollups
    Retail: Order processing system for warehouses (using HammerDB)

    View full-size slide

  58. What you will see in today’s HTAP database demo
    All running on Azure
    Side-by-side performance compare: Hyperscale (Citus) vs. single node
    Millisecond analytics queries with rollups
    Retail: Order processing system for warehouses (using HammerDB)

    View full-size slide

  59. What you will see in today’s HTAP database demo
    All running on Azure
    Side-by-side performance compare: Hyperscale (Citus) vs. single node
    Millisecond analytics queries with rollups
    Retail: Order processing system for warehouses (using HammerDB)

    View full-size slide

  60. What you will see in today’s HTAP database demo
    All running on Azure
    Side-by-side performance compare: Hyperscale (Citus) v. single node
    Millisecond analytics queries with rollups
    Retail: Order processing system for warehouses (using HammerDB)

    View full-size slide

  61. Demo: HTAP Database with Hyperscale (Citus)
    Marco Slot
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View full-size slide

  62. Will all apps see the performance
    increase you just showed us?
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View full-size slide

  63. It’s important to
    find a good
    distribution column,
    something that is
    common to all large
    tables
    SELECT create_distributed_table(
    'table_name',
    'distribution_column');
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View full-size slide

  64. At the end of the demo, you called
    Citus an “almost anything” database.
    What did you mean?
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View full-size slide

  65. As an extensible, relational database, Postgres is
    capable of so many things on a single server…

    View full-size slide

  66. By transforming Postgres
    into a distributed
    database, Hyperscale
    (Citus) makes Postgres
    capable of
    almost anything

    View full-size slide

  67. How best to get started with
    Hyperscale (Citus)?
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View full-size slide

  68. Download
    Citus open
    source
    packages
    aka.ms/citus

    View full-size slide

  69. Multi-tenant
    (SaaS)
    tutorial
    aka.ms/hyperscale-citus-multi-tenant-tutorial

    View full-size slide

  70. Tutorial:
    Real-time
    analytics
    dashboard
    aka.ms/hyperscale-citus-real-time-tutorial

    View full-size slide

  71. Do you have a favorite blog
    post?
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View full-size slide

  72. Architecting petabyte-scale analytics by scaling out
    Postgres on Azure with the Citus extension
    aka.ms/blog-petabyte-scale-analytics

    View full-size slide

  73. @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
    Petabyte-scale service architecture used by Windows

    View full-size slide

  74. Min Wei, Principal Engineer at Microsoft
    Distributed PostgreSQL
    is a game changer."
    source:
    https://aka.ms/blog-petabyte-scale-analytics

    View full-size slide

  75. Please don’t forget the
    Citus newsletter
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View full-size slide

  76. Citus Newsletter
    aka.ms/citus-newsletter

    View full-size slide

  77. So what should we take away
    from your HTAP demo?
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View full-size slide

  78. HTAP database
    H·tap da·ta·base /eɪtʃ tæp ˈdeɪtəˌbeɪs/
    noun
    1. A database that can handle both transactions and analytics—
    at scale—obviating the need to manage and use multiple
    platforms to handle different aspects of your application.
    2. A database that saves you time, reduces operational costs,
    and gets rid of many headaches.
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View full-size slide

  79. Hyperscale (Citus) enables
    you to parallelize both your
    queries and your
    transactions across a cluster,
    while still keeping all the
    powerful Postgres features
    and rich ecosystem.

    View full-size slide

  80. © Copyright Microsoft Corporation. All rights reserved.
    thank you
    merci
    grazie
    ध"यवाद
    ευχαριστώ
    gracias
    ధన#$ా&'ల)
    谢谢
    dank u
    merci
    danke schön
    thank you
    ευχαριστώ
    grazie
    gracias
    tack
    teşekkür ederim
    Claire Giordano &
    @clairegiordano
    @AzureDBPostgres
    @marcoslot
    @citusdata
    Marco Slot

    View full-size slide

  81. Any questions?

    View full-size slide