Upgrade to Pro — share decks privately, control downloads, hide ads and more …

High Performance HTAP with Postgres & Hyperscal...

High Performance HTAP with Postgres & Hyperscale (Citus) | European Virtual Open Source Summit 2020 | Marco Slot & Claire Giordano

Did you know you can scale out Postgres horizontally, transforming Postgres into a distributed database that can do almost anything - including support demanding HTAP applications? Join Marco Slot - principal engineer and lead for the Citus open source project at Microsoft and Citus open source advocate Claire Giordano - for a conversation about scaling out Postgres to serve HTAP workloads on Azure, with Hyperscale (Citus).

Citus is an open source extension to Postgres that changes the game for HTAP (hybrid transactional analytical processing) workloads. Because Citus is an extension to Postgres, Citus gives you all the transactional and indexing capabilities of Postgres. And because Citus distributes both your data and your queries across multiple servers, Citus gives you low-latency analytics even in the face of high concurrency and billions of rows of data.

In addition to the interview, Marco will also demo Hyperscale (Citus) on Azure Database for PostgreSQL, to shine a light on the extreme performance you can get when you scale out Postgres horizontally on Azure. Come learn why the combination of relational database semantics and parallel distributed operations in Hyperscale (Citus) can give your HTAP application such a high-performing and unique competitive edge.

More Decks by Azure Database for PostgreSQL

Other Decks in Technology

Transcript

  1. High performance HTAP with Postgres & Hyperscale (Citus) Claire Giordano

    Principal PM Manager, Postgres team at Microsoft | Marco Slot Principal Engineer & Lead, Citus open source project @clairegiordano / @marcoslot @azuredbpostgres / @citusdata
  2. Agenda for today’s Interview with Marco Slot Postgres database Citus

    open source extension to Postgres What is HTAP, followed by Demo Interview throughout & Q&A at the end @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
  3. #1 – What an HTAP database is #2 – Why

    Hyperscale (Citus) is so good for HTAP
  4. Vrije Universiteit Amsterdam— Masters degree in Distributed Systems source: Facebook,

    Vrije Universiteit Amsterdam. https://www.facebook.com/vuamsterdam/photos/ a.352972296481/10156489650536482/
  5. ARE YOU READY TO SCALE OUT POSTGRES? BUT DON’T HAVE

    A PHD IN DISTRIBUTED SYSTEMS? Is your Django/Rails/Spring app using Postgres? Re-architecting database infrastructure to support growth can be painful. Which is why we created a database that extends Postgres and enables you to scale out—while continuing to use your familiar SQL toolset and leverage your existing Postgres expertise. Is your Postgres database bigger than 50 GB+? Many SaaS apps outgrow a single-node database at 50 GB, 100 GB, or 500 GB. At some point, single-node Postgres may no longer give you the performance you need. So we designed a Postgres database that scales out. analytics for billions of events? f events per day tell us the es, and
  6. Postgres is more popular than ever 30 years of data

    integrity & robustness built in Ranked 2017 & 2018 DBMS of the Year by DB-Engines One of most loved & wanted databases in Stack Overflow 2019 Developer Survey
  7. We are still dining out on the radical design decision

    by Michael Stonebraker in the design of Postgres—to have run-time extensions.” —Paul Ramsey, PGConfEU 2018 “
  8. Why do you think Postgres is so popular among developers?

    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
  9. Postgres is there for you at every stage of your

    application lifecycle.” —Marco Slot “
  10. EASY TO GET STARTED GOOD TO PLAY WITH USEFUL FOR

    EVERY STAGE OF APP LIFECYCLE DEVELOPE RS O IT CREATE CLIENT LIBRARIES, TOOLS, EXTENSIONS WHOLE ECOSYSTEM GETS BETTER & BETTER
  11. How does the Citus extension to Postgres work? @clairegiordano /

    @marcoslot / @azuredbpostgres / @citusdata
  12. APPLICATION COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn

    A Citus cluster consists of multiple PostgreSQL servers with the Citus extension.
  13. APPLICATION CREATE TABLE campaigns (…); SELECT create_distributed_table( 'campaigns','company_id'); METADATA COORDINATOR

    NODE WORKER NODES W1 W2 W3 … Wn CREATE TABLE campaigns_102 CREATE TABLE campaigns_105 CREATE TABLE campaigns_101 CREATE TABLE campaigns_104 CREATE TABLE campaigns_103 CREATE TABLE campaigns_106 How Citus distributes tables across the database cluster
  14. APPLICATION SELECT FROM GROUP BY company_id, avg(spend) AS avg_campaign_spend campaigns

    company_id; METADATA COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn SELECT company_id sum(spend), count(spend) … FROM campaigns_2009 … SELECT company_id sum(spend), count(spend) … FROM campaigns_2001 … SELECT company_id sum(spend), count(spend) … FROM campaigns_2017 … How Citus distributes queries across the database cluster
  15. • JSONB • Joins • Functions • Constraints • Indexes:

    B-tree, GIN, BRIN, & GiST • Partial Indexes • Other extensions • PostGIS • Rich datatypes • Foreign data wrappers • Window functions • CTEs • Full text search • pg_stat_statements All the functions of Postgres available to Citus cluster
  16. How Citus handles transactions in a multi-node cluster BEGIN; UPDATE

    SET WHERE COMMIT; campaigns start_date = '2018-03-17' company_id = 'Pat Co'; METADATA W1 W2 W3 … Wn BEGIN; UPDATE Campaigns_2012 SET …; COMMIT; APPLICATION COORDINATOR NODE WORKER NODES
  17. What does HTAP stand for? Why are we even talking

    about HTAP databases today? @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
  18. Headaches managing separate databases Operational costs (!!!) Glue code Lag

    between events & analytics Updates & deletes @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
  19. The promise of HTAP is that there is finally a

    database that can do both transactions & analytics—at scale.” —Marco Slot “
  20. Can you tell us a bit about what you will

    demo today? What’s the anatomy of the demo? @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
  21. What you will see in today’s HTAP database demo All

    running on Azure Side-by-side performance compare: Hyperscale (Citus) vs. single node Millisecond analytics queries with rollups Retail: Order processing system for warehouses (using HammerDB)
  22. What you will see in today’s HTAP database demo All

    running on Azure Side-by-side performance compare: Hyperscale (Citus) vs. single node Millisecond analytics queries with rollups Retail: Order processing system for warehouses (using HammerDB)
  23. What you will see in today’s HTAP database demo All

    running on Azure Side-by-side performance compare: Hyperscale (Citus) vs. single node Millisecond analytics queries with rollups Retail: Order processing system for warehouses (using HammerDB)
  24. What you will see in today’s HTAP database demo All

    running on Azure Side-by-side performance compare: Hyperscale (Citus) vs. single node Millisecond analytics queries with rollups Retail: Order processing system for warehouses (using HammerDB)
  25. What you will see in today’s HTAP database demo All

    running on Azure Side-by-side performance compare: Hyperscale (Citus) vs. single node Millisecond analytics queries with rollups Retail: Order processing system for warehouses (using HammerDB)
  26. What you will see in today’s HTAP database demo All

    running on Azure Side-by-side performance compare: Hyperscale (Citus) v. single node Millisecond analytics queries with rollups Retail: Order processing system for warehouses (using HammerDB)
  27. Will all apps see the performance increase you just showed

    us? @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
  28. It’s important to find a good distribution column, something that

    is common to all large tables SELECT create_distributed_table( 'table_name', 'distribution_column'); @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
  29. At the end of the demo, you called Citus an

    “almost anything” database. What did you mean? @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
  30. How best to get started with Hyperscale (Citus)? @clairegiordano /

    @marcoslot / @azuredbpostgres / @citusdata
  31. Architecting petabyte-scale analytics by scaling out Postgres on Azure with

    the Citus extension aka.ms/blog-petabyte-scale-analytics
  32. Min Wei, Principal Engineer at Microsoft Distributed PostgreSQL is a

    game changer." source: https://aka.ms/blog-petabyte-scale-analytics
  33. So what should we take away from your HTAP demo?

    @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
  34. HTAP database H·tap da·ta·base /eɪtʃ tæp ˈdeɪtəˌbeɪs/ noun 1. A

    database that can handle both transactions and analytics— at scale—obviating the need to manage and use multiple platforms to handle different aspects of your application. 2. A database that saves you time, reduces operational costs, and gets rid of many headaches. @clairegiordano / @marcoslot / @azuredbpostgres / @citusdata
  35. Hyperscale (Citus) enables you to parallelize both your queries and

    your transactions across a cluster, while still keeping all the powerful Postgres features and rich ecosystem.
  36. © Copyright Microsoft Corporation. All rights reserved. thank you merci

    grazie ध"यवाद ευχαριστώ gracias ధన#$ా&'ల) 谢谢 dank u merci danke schön thank you ευχαριστώ grazie gracias tack teşekkür ederim Claire Giordano & @clairegiordano @AzureDBPostgres @marcoslot @citusdata Marco Slot