$30 off During Our Annual Pro Sale. View Details »

Citus Hyperscale Overview | CI&T Data Week | Leandro Santana

Citus Hyperscale Overview | CI&T Data Week | Leandro Santana

In this talk, Leandro explains how to scale Postgres on Azure by using the Citus extension. Citus is a powerful extension to transform PostgreSQL into a distributed database. He covers the shard rebalancer, columnar storage, and much more.

Citus Data

July 23, 2021
Tweet

More Decks by Citus Data

Other Decks in Technology

Transcript

  1. © Copyright Microsoft Corporation. All rights reserved.
    Leandro Santana
    linkedin.com/in/leandroeredia/
    @citusdata
    @AzureDBPostgres
    Citus Hyperscale Overview

    View Slide

  2. View Slide

  3. Postgres

    View Slide

  4. PostgreSQL is more popular than ever
    PostgreSQL at core of many digital transformations
    Open source Proven resilience & stability Rich feature set
    enterprise-ready
    • Zero data loss • Rich indexing and data types,
    including geospatial
    • Low TCO. No Lock-in
    • Innovate faster with broad
    ecosystem

    View Slide

  5. Single Server
    Fully-managed, single-node
    PostgreSQL
    Hyperscale (Citus)
    High-performance Postgres for
    scale out
    Enterprise-ready, fully
    managed community
    PostgreSQL with built-in HA
    and multi-layered security

    View Slide

  6. Monitor Azure Database for PostgreSQL with Azure Monitor
    Azure Database for PostgreSQL tracks
    performance metrics and logs telemetry data
    Collect, analyze, and take action on telemetry
    data gathered from your database using
    Azure Monitor
    Better understand your apps with deep
    insights into app and database behavior,
    view alerts, and build remediation plans
    Azure Monitor
    Visualize
    Analyze
    Respond
    Integrate Insights

    View Slide

  7. Migrating to Azure Database for PostgreSQL
    Opportunity
    DBAs spending time maintaining their
    databases, not focused on their apps
    On premises PostgreSQL
    VM PostgreSQL
    Other Managed Services
    Azure Database
    Migration Services
    Azure Database
    for PostgreSQL
    Solution
    Azure Database for PostgreSQL takes care
    of patching, high availability, and
    performance management
    Provides added benefits of comprehensive,
    intelligent security and compliance,
    including threat alerts

    View Slide

  8. Migration from Oracle to Azure Database for PostgreSQL
    Single instance
    of Oracle
    Opportunity
    Oracle license cost is expensive and
    additional features (such as partitioning
    and high availability) come at added cost
    Solution
    Migrating to Azure Database for
    PostgreSQL removes need for licenses as
    the customer can take advantage of open
    source PostgreSQL
    Azure Database
    for PostgreSQL
    Azure Database
    Migration Services

    View Slide

  9. Citus
    extension
    to Postgres

    View Slide

  10. What is Citus?
    • Distributed tables
    • Reference tables
    • & more, as of Citus 10
    Extension to Postgres (not a fork!)
    • Add nodes
    • Rebalance
    Simplicity & flexibility of using PostgreSQL, at scale
    • Scale transactional workloads
    • Scale analytical workloads
    • Mixed workloads too
    Multi-purpose:

    View Slide

  11. aka.ms/citus

    View Slide

  12. Why

    View Slide

  13. Citus

    View Slide

  14. • Joins
    • Functions
    • Constraints
    • Indexes: B-tree, GIN,
    BRIN, & GiST
    • Partial Indexes
    • Other extensions
    • PostGIS
    • Rich datatypes
    • JSONB
    • Window functions
    • CTEs
    • Atomic update / delete
    • Partitioning
    • Interactive transactions
    • Open source
    • …
    Citus

    View Slide

  15. Workloads that benefit
    from PostgreSQL
    CITUS WORKLOADS Workloads that benefit
    from scaling out

    View Slide

  16. Multi-tenant
    SaaS
    Time series
    Analytics
    dashboards
    Mixed
    OLTP/OLAP

    View Slide

  17. Citus
    Coordinator
    Hyperscale (Citus)
    Coordinator
    Hyperscale (Citus)
    Workers
    Hyperscale (Citus)
    Coordinator
    Standard Tier in Hyperscale (Citus)
    Basic Tier

    View Slide

  18. SELECT create_distributed_table(
    'table_name',
    'distribution_column');

    View Slide

  19. Citus
    Coordinator
    Choosing the distribution column for Multi-Tenant scenarios
    SELECT create_distributed_table('companies', 'id’);
    SELECT create_distributed_table('campaigns', 'company_id’);
    SELECT create_distributed_table('ads', 'company_id’);
    SELECT create_distributed_table('clicks', 'company_id’);
    SELECT create_distributed_table('impressions', 'company_id');

    View Slide

  20. Citus
    Coordinator
    Choosing the distribution column for Real-Time scenarios
    SELECT create_distributed_table('github_users', 'user_id’);
    SELECT create_distributed_table('github_events', 'user_id');

    View Slide

  21. Rest of the tables
    can be reference
    tables or local table
    SELECT create_reference_table(
    'table_name')

    View Slide

  22. APPLICATION
    CREATE TABLE campaigns (…);
    SELECT create_distributed_table(
    'campaigns','company_id');
    METADATA
    COORDINATOR
    NODE
    WORKER NODES
    W1
    W2
    W3 …
    Wn
    CREATE TABLE
    campaigns_102
    CREATE TABLE
    campaigns_105
    CREATE TABLE
    campaigns_101
    CREATE TABLE
    campaigns_104
    CREATE TABLE
    campaigns_103
    CREATE TABLE
    campaigns_106
    How Citus distributes tables across the database cluster

    View Slide

  23. APPLICATION
    SELECT
    FROM
    GROUP BY
    company_id,
    avg(spend) AS avg_campaign_spend
    campaigns
    company_id;
    METADATA
    COORDINATOR
    NODE
    WORKER NODES
    W1
    W2
    W3 …
    Wn
    SELECT company_id
    sum(spend),
    count(spend) …
    FROM
    campaigns_2009 …
    SELECT company_id
    sum(spend),
    count(spend) …
    FROM
    campaigns_2001 …
    SELECT company_id
    sum(spend),
    count(spend) …
    FROM
    campaigns_2017 …
    Query across distribution columns.

    View Slide

  24. Transactions routed to a single node
    BEGIN;
    UPDATE
    SET
    WHERE
    COMMIT;
    campaigns
    start_date = '2018-03-17'
    company_id = 'Pat Co';
    METADATA
    W1
    W2
    W3 …
    Wn
    BEGIN; UPDATE
    Campaigns_2012
    SET …;
    COMMIT;
    APPLICATION
    COORDINATOR
    NODE
    WORKER NODES

    View Slide

  25. Node A Node B
    Node C
    1 2 4 5
    3 6

    View Slide

  26. Node A Node B
    1 2 4 5
    3 6
    Node C
    Node A Node B
    Node C
    1 2 4 5
    3 6

    View Slide

  27. Node A Node B
    1 2
    3 4

    View Slide

  28. 1
    4
    3
    2
    Node A Node B

    View Slide

  29. 10M+ queries per day;
    75%
    https://techcommunity.microsoft.com/t5/Azure-Database-for-
    PostgreSQL/Architecting-petabyte-scale-analytics-by-scaling-out-Postgres-on/ba-
    p/969685

    View Slide

  30. Min Wei, Principal Engineer at Microsoft
    Distributed PostgreSQL
    is a game changer."
    aka.ms/blog-petabyte-scale-analytics

    View Slide

  31. Storage efficiency
    IOPs
    Query speed
    Cost Savings
    Benefits
    of
    Citus
    Columnar

    View Slide

  32. Row-based Storage Columnar storage

    View Slide

  33. CREATE TABLE events_columnar(
    ts timestamptz, i int,
    n numeric, s text) USING columnar;
    It’s as simple as “USING columnar;”

    View Slide

  34. Can change access method from heap à columnar
    SELECT alter_table_set_access_method(
    'events_2021_jan', 'columnar');

    View Slide

  35. Citus Columnar && Range Partitioning in Postgres
    CREATE TABLE events(
    ts timestamptz, i int,
    n numeric, s text)
    PARTITION BY RANGE (ts);
    CREATE TABLE events_2021_jan PARTITION OF events
    FOR VALUES FROM ('2021-01-01') TO ('2021-02-01');
    CREATE TABLE events_2021_feb PARTITION OF events
    FOR VALUES FROM ('2021-02-01') TO ('2021-03-01');

    View Slide

  36. events table

    View Slide

  37. events table

    View Slide

  38. events table

    View Slide

  39. View Slide

  40. Multi-tenant
    (SaaS)
    tutorial
    aka.ms/hyperscale-citus-multi-tenant-tutorial

    View Slide

  41. Tutorial:
    Real-time
    analytics
    dashboard
    aka.ms/hyperscale-citus-real-time-tutorial

    View Slide

  42. © Copyright Microsoft Corporation. All rights reserved.
    Leandro Santana
    linkedin.com/in/leandroeredia/
    @citusdata
    @AzureDBPostgres
    Thank you!

    View Slide