Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Citus Hyperscale Overview | CI&T Data Week | Leandro Santana

Citus Hyperscale Overview | CI&T Data Week | Leandro Santana

In this talk, Leandro explains how to scale Postgres on Azure by using the Citus extension. Citus is a powerful extension to transform PostgreSQL into a distributed database. He covers the shard rebalancer, columnar storage, and much more.

024d6a0dd14fb31c804969a57a06dfbe?s=128

Citus Data

July 23, 2021
Tweet

Transcript

  1. © Copyright Microsoft Corporation. All rights reserved. Leandro Santana linkedin.com/in/leandroeredia/

    @citusdata @AzureDBPostgres Citus Hyperscale Overview
  2. None
  3. Postgres

  4. PostgreSQL is more popular than ever PostgreSQL at core of

    many digital transformations Open source Proven resilience & stability Rich feature set enterprise-ready • Zero data loss • Rich indexing and data types, including geospatial • Low TCO. No Lock-in • Innovate faster with broad ecosystem
  5. Single Server Fully-managed, single-node PostgreSQL Hyperscale (Citus) High-performance Postgres for

    scale out Enterprise-ready, fully managed community PostgreSQL with built-in HA and multi-layered security
  6. Monitor Azure Database for PostgreSQL with Azure Monitor Azure Database

    for PostgreSQL tracks performance metrics and logs telemetry data Collect, analyze, and take action on telemetry data gathered from your database using Azure Monitor Better understand your apps with deep insights into app and database behavior, view alerts, and build remediation plans Azure Monitor Visualize Analyze Respond Integrate Insights
  7. Migrating to Azure Database for PostgreSQL Opportunity DBAs spending time

    maintaining their databases, not focused on their apps On premises PostgreSQL VM PostgreSQL Other Managed Services Azure Database Migration Services Azure Database for PostgreSQL Solution Azure Database for PostgreSQL takes care of patching, high availability, and performance management Provides added benefits of comprehensive, intelligent security and compliance, including threat alerts
  8. Migration from Oracle to Azure Database for PostgreSQL Single instance

    of Oracle Opportunity Oracle license cost is expensive and additional features (such as partitioning and high availability) come at added cost Solution Migrating to Azure Database for PostgreSQL removes need for licenses as the customer can take advantage of open source PostgreSQL Azure Database for PostgreSQL Azure Database Migration Services
  9. Citus extension to Postgres

  10. What is Citus? • Distributed tables • Reference tables •

    & more, as of Citus 10 Extension to Postgres (not a fork!) • Add nodes • Rebalance Simplicity & flexibility of using PostgreSQL, at scale • Scale transactional workloads • Scale analytical workloads • Mixed workloads too Multi-purpose:
  11. aka.ms/citus

  12. Why

  13. Citus

  14. • Joins • Functions • Constraints • Indexes: B-tree, GIN,

    BRIN, & GiST • Partial Indexes • Other extensions • PostGIS • Rich datatypes • JSONB • Window functions • CTEs • Atomic update / delete • Partitioning • Interactive transactions • Open source • … Citus
  15. Workloads that benefit from PostgreSQL CITUS WORKLOADS Workloads that benefit

    from scaling out
  16. Multi-tenant SaaS Time series Analytics dashboards Mixed OLTP/OLAP

  17. Citus Coordinator Hyperscale (Citus) Coordinator Hyperscale (Citus) Workers Hyperscale (Citus)

    Coordinator Standard Tier in Hyperscale (Citus) Basic Tier
  18. SELECT create_distributed_table( 'table_name', 'distribution_column');

  19. Citus Coordinator Choosing the distribution column for Multi-Tenant scenarios SELECT

    create_distributed_table('companies', 'id’); SELECT create_distributed_table('campaigns', 'company_id’); SELECT create_distributed_table('ads', 'company_id’); SELECT create_distributed_table('clicks', 'company_id’); SELECT create_distributed_table('impressions', 'company_id');
  20. Citus Coordinator Choosing the distribution column for Real-Time scenarios SELECT

    create_distributed_table('github_users', 'user_id’); SELECT create_distributed_table('github_events', 'user_id');
  21. Rest of the tables can be reference tables or local

    table SELECT create_reference_table( 'table_name')
  22. APPLICATION CREATE TABLE campaigns (…); SELECT create_distributed_table( 'campaigns','company_id'); METADATA COORDINATOR

    NODE WORKER NODES W1 W2 W3 … Wn CREATE TABLE campaigns_102 CREATE TABLE campaigns_105 CREATE TABLE campaigns_101 CREATE TABLE campaigns_104 CREATE TABLE campaigns_103 CREATE TABLE campaigns_106 How Citus distributes tables across the database cluster
  23. APPLICATION SELECT FROM GROUP BY company_id, avg(spend) AS avg_campaign_spend campaigns

    company_id; METADATA COORDINATOR NODE WORKER NODES W1 W2 W3 … Wn SELECT company_id sum(spend), count(spend) … FROM campaigns_2009 … SELECT company_id sum(spend), count(spend) … FROM campaigns_2001 … SELECT company_id sum(spend), count(spend) … FROM campaigns_2017 … Query across distribution columns.
  24. Transactions routed to a single node BEGIN; UPDATE SET WHERE

    COMMIT; campaigns start_date = '2018-03-17' company_id = 'Pat Co'; METADATA W1 W2 W3 … Wn BEGIN; UPDATE Campaigns_2012 SET …; COMMIT; APPLICATION COORDINATOR NODE WORKER NODES
  25. Node A Node B Node C 1 2 4 5

    3 6
  26. Node A Node B 1 2 4 5 3 6

    Node C Node A Node B Node C 1 2 4 5 3 6
  27. Node A Node B 1 2 3 4

  28. 1 4 3 2 Node A Node B

  29. 10M+ queries per day; 75% https://techcommunity.microsoft.com/t5/Azure-Database-for- PostgreSQL/Architecting-petabyte-scale-analytics-by-scaling-out-Postgres-on/ba- p/969685

  30. Min Wei, Principal Engineer at Microsoft Distributed PostgreSQL is a

    game changer." aka.ms/blog-petabyte-scale-analytics
  31. Storage efficiency IOPs Query speed Cost Savings Benefits of Citus

    Columnar
  32. Row-based Storage Columnar storage

  33. CREATE TABLE events_columnar( ts timestamptz, i int, n numeric, s

    text) USING columnar; It’s as simple as “USING columnar;”
  34. Can change access method from heap à columnar SELECT alter_table_set_access_method(

    'events_2021_jan', 'columnar');
  35. Citus Columnar && Range Partitioning in Postgres CREATE TABLE events(

    ts timestamptz, i int, n numeric, s text) PARTITION BY RANGE (ts); CREATE TABLE events_2021_jan PARTITION OF events FOR VALUES FROM ('2021-01-01') TO ('2021-02-01'); CREATE TABLE events_2021_feb PARTITION OF events FOR VALUES FROM ('2021-02-01') TO ('2021-03-01');
  36. events table …

  37. events table …

  38. events table

  39. None
  40. Multi-tenant (SaaS) tutorial aka.ms/hyperscale-citus-multi-tenant-tutorial

  41. Tutorial: Real-time analytics dashboard aka.ms/hyperscale-citus-real-time-tutorial

  42. © Copyright Microsoft Corporation. All rights reserved. Leandro Santana linkedin.com/in/leandroeredia/

    @citusdata @AzureDBPostgres Thank you!