Upgrade to Pro — share decks privately, control downloads, hide ads and more …

High Performance HTAP with Postgres & Hyperscale (Citus) | European Virtual Open Source Summit 2020 | Marco Slot & Claire Giordano

High Performance HTAP with Postgres & Hyperscale (Citus) | European Virtual Open Source Summit 2020 | Marco Slot & Claire Giordano

Did you know you can scale out Postgres horizontally, transforming Postgres into a distributed database that can do almost anything - including support demanding HTAP applications? Join Marco Slot - principal engineer and lead for the Citus open source project at Microsoft and Citus open source advocate Claire Giordano - for a conversation about scaling out Postgres to serve HTAP workloads on Azure, with Hyperscale (Citus).

Citus is an open source extension to Postgres that changes the game for HTAP (hybrid transactional analytical processing) workloads. Because Citus is an extension to Postgres, Citus gives you all the transactional and indexing capabilities of Postgres. And because Citus distributes both your data and your queries across multiple servers, Citus gives you low-latency analytics even in the face of high concurrency and billions of rows of data.

In addition to the interview, Marco will also demo Hyperscale (Citus) on Azure Database for PostgreSQL, to shine a light on the extreme performance you can get when you scale out Postgres horizontally on Azure. Come learn why the combination of relational database semantics and parallel distributed operations in Hyperscale (Citus) can give your HTAP application such a high-performing and unique competitive edge.

More Decks by Azure Database for PostgreSQL

Other Decks in Technology

Transcript

  1. High performance HTAP with
    Postgres & Hyperscale (Citus)
    Claire Giordano Principal PM Manager, Postgres team
    at Microsoft | Marco Slot Principal Engineer & Lead,
    Citus open source project
    @clairegiordano / @marcoslot
    @azuredbpostgres / @citusdata

    View Slide

  2. Agenda for today’s Interview with Marco Slot
    Postgres database Citus open source
    extension to
    Postgres
    What is HTAP,
    followed by
    Demo
    Interview
    throughout &
    Q&A at the end
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View Slide

  3. What will you take away
    from this talk?

    View Slide

  4. #1 – What an HTAP database is

    View Slide

  5. #1 – What an HTAP database is
    #2 – Why Hyperscale (Citus) is so good for HTAP

    View Slide

  6. Claire Giordano at FOSDEM 2020

    View Slide

  7. Marco Slot

    View Slide

  8. Postgres team at Microsoft

    View Slide

  9. Back in the beginning of 2019…
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View Slide

  10. Microsoft Azure Welcomes PostgreSQL Committers
    aka.ms/blog-postgres-committers

    View Slide

  11. Marco during COVID quarantine

    View Slide

  12. Based in the Netherlands

    View Slide

  13. Citus open source team, some of
    Amsterdam crew

    View Slide

  14. Vrije Universiteit
    Amsterdam—
    Masters degree in
    Distributed Systems
    source: Facebook, Vrije Universiteit Amsterdam.
    https://www.facebook.com/vuamsterdam/photos/
    a.352972296481/10156489650536482/

    View Slide

  15. Trinity College Dublin

    View Slide

  16. View Slide

  17. ARE YOU
    READY TO SCALE OUT
    POSTGRES?
    BUT DON’T HAVE A PHD IN DISTRIBUTED SYSTEMS?
    Is your Django/Rails/Spring app using Postgres?
    Re-architecting database infrastructure to support growth can be painful. Which
    is why we created a database that extends Postgres and enables you to scale
    out—while continuing to use your familiar SQL toolset and leverage your existing
    Postgres expertise.
    Is your Postgres database bigger than 50 GB+?
    Many SaaS apps outgrow a single-node database at 50 GB, 100 GB, or 500 GB.
    At some point, single-node Postgres may no longer give you the performance
    you need. So we designed a Postgres database that scales out.
    analytics for billions of events?
    f events per day tell us the
    es, and

    View Slide

  18. Source: GitHub. citusdata/citus repo. Jan 31, 2016 – May 8, 2020. https://aka.ms/citus

    View Slide

  19. Postgres

    View Slide

  20. View Slide

  21. Postgres is more popular than ever
    30 years of
    data integrity
    & robustness
    built in
    Ranked 2017 &
    2018 DBMS of
    the Year by
    DB-Engines
    One of most
    loved & wanted
    databases in Stack
    Overflow 2019
    Developer Survey

    View Slide

  22. One of world’s most popular extensions to Postgres—
    PostGIS. Yes, for geospatial use cases.

    View Slide

  23. We are still dining
    out on the radical
    design decision by
    Michael Stonebraker
    in the design of
    Postgres—to have
    run-time extensions.”
    —Paul Ramsey, PGConfEU 2018

    View Slide

  24. View Slide

  25. View Slide

  26. Citus
    extension
    to Postgres

    View Slide

  27. aka.ms/citus

    View Slide

  28. Not a fork

    View Slide

  29. Citus transforms Postgres into a distributed database

    View Slide

  30. Distributing your data & your queries

    View Slide

  31. Parallelism, and
    All the cpu cores, memory, & disk of database cluster

    View Slide

  32. Hyperscale (Citus)
    now available as part of Azure
    Database for PostgreSQL

    View Slide

  33. Drumroll please……
    Welcome
    Marco Slot

    View Slide

  34. Why do you think Postgres
    is so popular
    among developers?
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View Slide

  35. Postgres is there for
    you at every stage
    of your application
    lifecycle.”
    —Marco Slot

    View Slide

  36. Download

    View Slide

  37. Download
    Linux
    MacOS
    Windows
    FreeBSD

    View Slide

  38. Download
    Tooling

    View Slide

  39. View Slide

  40. Download
    Experiment
    Tooling

    View Slide

  41. Download
    Production
    Experiment
    Tooling

    View Slide

  42. aka.ms/azure-postgres

    View Slide

  43. Download
    Production
    Scale
    Experiment
    Tooling

    View Slide

  44. Partitioning
    Partial indexes
    Scaling out with
    Hyperscale (Citus)

    View Slide

  45. Download
    Production
    Scale
    Experiment
    Tooling

    View Slide

  46. View Slide

  47. EASY TO GET
    STARTED
    GOOD TO
    PLAY WITH
    USEFUL FOR
    EVERY STAGE OF
    APP LIFECYCLE
    DEVELOPE
    RS O IT
    CREATE CLIENT
    LIBRARIES,
    TOOLS,
    EXTENSIONS
    WHOLE ECOSYSTEM
    GETS BETTER & BETTER

    View Slide

  48. How does the Citus extension to
    Postgres work?
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View Slide

  49. APPLICATION
    COORDINATOR
    NODE
    WORKER NODES
    W1
    W2
    W3 …
    Wn
    A Citus cluster consists of multiple PostgreSQL servers with the Citus extension.

    View Slide

  50. SELECT create_distributed_table(
    'table_name',
    'distribution_column');
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View Slide

  51. APPLICATION
    CREATE TABLE campaigns (…);
    SELECT create_distributed_table(
    'campaigns','company_id');
    METADATA
    COORDINATOR
    NODE
    WORKER NODES
    W1
    W2
    W3 …
    Wn
    CREATE TABLE
    campaigns_102
    CREATE TABLE
    campaigns_105
    CREATE TABLE
    campaigns_101
    CREATE TABLE
    campaigns_104
    CREATE TABLE
    campaigns_103
    CREATE TABLE
    campaigns_106
    How Citus distributes tables across the database cluster

    View Slide

  52. APPLICATION
    SELECT
    FROM
    GROUP BY
    company_id,
    avg(spend) AS avg_campaign_spend
    campaigns
    company_id;
    METADATA
    COORDINATOR
    NODE
    WORKER NODES
    W1
    W2
    W3 …
    Wn
    SELECT company_id
    sum(spend),
    count(spend) …
    FROM
    campaigns_2009 …
    SELECT company_id
    sum(spend),
    count(spend) …
    FROM
    campaigns_2001 …
    SELECT company_id
    sum(spend),
    count(spend) …
    FROM
    campaigns_2017 …
    How Citus distributes queries across the database cluster

    View Slide

  53. • JSONB
    • Joins
    • Functions
    • Constraints
    • Indexes: B-tree, GIN,
    BRIN, & GiST
    • Partial Indexes
    • Other extensions
    • PostGIS
    • Rich datatypes
    • Foreign data wrappers
    • Window functions
    • CTEs
    • Full text search
    • pg_stat_statements
    All the functions of Postgres available to Citus cluster

    View Slide

  54. How Citus handles transactions in a multi-node cluster
    BEGIN;
    UPDATE
    SET
    WHERE
    COMMIT;
    campaigns
    start_date = '2018-03-17'
    company_id = 'Pat Co';
    METADATA
    W1
    W2
    W3 …
    Wn
    BEGIN; UPDATE
    Campaigns_2012
    SET …;
    COMMIT;
    APPLICATION
    COORDINATOR
    NODE
    WORKER NODES

    View Slide

  55. What does HTAP stand for?
    Why are we even talking about HTAP
    databases today?
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View Slide

  56. Hybrid
    Transactional
    Analytical
    Processing
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View Slide

  57. OLTP =
    Online Transactional Processing
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View Slide

  58. OLAP =
    Online Analytical Processing
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View Slide

  59. Data
    sources
    Storage
    Transactional
    database
    Batch
    processing
    Queue
    Stream
    processing
    Analytical
    data
    store
    Cache
    Analytics and
    reporting

    View Slide

  60. View Slide

  61. Headaches
    managing
    separate
    databases
    Operational costs (!!!)
    Glue code
    Lag between events & analytics
    Updates & deletes
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View Slide

  62. The promise of HTAP
    is that there is finally
    a database that can
    do both transactions
    & analytics—at
    scale.”
    —Marco Slot

    View Slide

  63. Can you tell us a bit about what you
    will demo today?
    What’s the anatomy of the demo?
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View Slide

  64. Order Processing
    System for
    Warehouses

    View Slide

  65. What you will see in today’s HTAP database demo
    All running on Azure
    Side-by-side performance compare: Hyperscale (Citus) vs. single node
    Millisecond analytics queries with rollups
    Retail: Order processing system for warehouses (using HammerDB)

    View Slide

  66. What you will see in today’s HTAP database demo
    All running on Azure
    Side-by-side performance compare: Hyperscale (Citus) vs. single node
    Millisecond analytics queries with rollups
    Retail: Order processing system for warehouses (using HammerDB)

    View Slide

  67. A bit about HammerDB (it’s NOT a database)
    hammerdb.com

    View Slide

  68. What you will see in today’s HTAP database demo
    All running on Azure
    Side-by-side performance compare: Hyperscale (Citus) vs. single node
    Millisecond analytics queries with rollups
    Retail: Order processing system for warehouses (using HammerDB)

    View Slide

  69. What you will see in today’s HTAP database demo
    All running on Azure
    Side-by-side performance compare: Hyperscale (Citus) vs. single node
    Millisecond analytics queries with rollups
    Retail: Order processing system for warehouses (using HammerDB)

    View Slide

  70. What you will see in today’s HTAP database demo
    All running on Azure
    Side-by-side performance compare: Hyperscale (Citus) vs. single node
    Millisecond analytics queries with rollups
    Retail: Order processing system for warehouses (using HammerDB)

    View Slide

  71. What you will see in today’s HTAP database demo
    All running on Azure
    Side-by-side performance compare: Hyperscale (Citus) v. single node
    Millisecond analytics queries with rollups
    Retail: Order processing system for warehouses (using HammerDB)

    View Slide

  72. Demo: HTAP Database with Hyperscale (Citus)
    Marco Slot
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View Slide

  73. Will all apps see the performance
    increase you just showed us?
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View Slide

  74. View Slide

  75. It’s important to
    find a good
    distribution column,
    something that is
    common to all large
    tables
    SELECT create_distributed_table(
    'table_name',
    'distribution_column');
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View Slide

  76. At the end of the demo, you called
    Citus an “almost anything” database.
    What did you mean?
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View Slide

  77. As an extensible, relational database, Postgres is
    capable of so many things on a single server…

    View Slide

  78. By transforming Postgres
    into a distributed
    database, Hyperscale
    (Citus) makes Postgres
    capable of
    almost anything

    View Slide

  79. How best to get started with
    Hyperscale (Citus)?
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View Slide

  80. Download
    Citus open
    source
    packages
    aka.ms/citus

    View Slide

  81. Multi-tenant
    (SaaS)
    tutorial
    aka.ms/hyperscale-citus-multi-tenant-tutorial

    View Slide

  82. Tutorial:
    Real-time
    analytics
    dashboard
    aka.ms/hyperscale-citus-real-time-tutorial

    View Slide

  83. Do you have a favorite blog
    post?
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View Slide

  84. Architecting petabyte-scale analytics by scaling out
    Postgres on Azure with the Citus extension
    aka.ms/blog-petabyte-scale-analytics

    View Slide

  85. @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
    Petabyte-scale service architecture used by Windows

    View Slide

  86. Min Wei, Principal Engineer at Microsoft
    Distributed PostgreSQL
    is a game changer."
    source:
    https://aka.ms/blog-petabyte-scale-analytics

    View Slide

  87. Please don’t forget the
    Citus newsletter
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View Slide

  88. Citus Newsletter
    aka.ms/citus-newsletter

    View Slide

  89. So what should we take away
    from your HTAP demo?
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View Slide

  90. HTAP database
    H·tap da·ta·base /eɪtʃ tæp ˈdeɪtəˌbeɪs/
    noun
    1. A database that can handle both transactions and analytics—
    at scale—obviating the need to manage and use multiple
    platforms to handle different aspects of your application.
    2. A database that saves you time, reduces operational costs,
    and gets rid of many headaches.
    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata

    View Slide

  91. Hyperscale (Citus) enables
    you to parallelize both your
    queries and your
    transactions across a cluster,
    while still keeping all the
    powerful Postgres features
    and rich ecosystem.

    View Slide

  92. © Copyright Microsoft Corporation. All rights reserved.
    thank you
    merci
    grazie
    ध"यवाद
    ευχαριστώ
    gracias
    ధన#$ా&'ల)
    谢谢
    dank u
    merci
    danke schön
    thank you
    ευχαριστώ
    grazie
    gracias
    tack
    teşekkür ederim
    Claire Giordano &
    @clairegiordano
    @AzureDBPostgres
    @marcoslot
    @citusdata
    Marco Slot

    View Slide

  93. Any questions?

    View Slide