Citus 10 Open Source & Columnar Storage for Postgres | contributing today | Claire Giordano & Nils Dijk

Citus 10 is out! A spectacular new release from our Citus open source team. Citus 10 gives you columnar storage for Postgres and Citus on a single node. Plus, we’ve open sourced the shard rebalancer. Come see a demo & learn how the Citus extension gives you Postgres at any scale, from a single node to a distributed cluster, and how easy it is to give Citus a try.

Citus Data

March 17, 2021

Transcript

  1. View Slide

  2. aka.ms/citus

  3. What is Citus?
    Extension to Postgres (not a fork!)
    • Distributed tables
    • Reference tables
    • & more, as of Citus 10
    Simplicity & flexibility of using PostgreSQL, at scale
    • Add nodes
    • Rebalance
    Multi-purpose:
    • Scale transactional workloads
    • Scale analytical workloads
    • Mixed workloads too

  4. Why

  5. View Slide

  6. planner, executor,
    transactions
    Background workers
    foreign data wrappers
    published in 1986

  7. Why be an extension to Postgres (and not a fork)?
    Vast ecosystem

  8. Developers ❤ Postgres

  9. Why Citus, Reason #1: Postgres limited to single node
    Capacity / execution time issues:
    • Working set does not fit in memory
    • Reaching limits of network-attached storage (IOPS) / CPU
    • Analytical query takes too long
    • Data transformations are single-threaded (e.g. insert..select)
    • Autovacuum cannot keep up with transactional workload
    • …

  10. Why Citus, Reason #2: Because Postgres includes:
    • Joins
    • Functions
    • Constraints
    • Indexes: B-tree, GIN, BRIN, & GiST
    • Partial Indexes
    • Other extensions
    • PostGIS
    • Rich datatypes
    • JSONB
    • Window functions
    • CTEs
    • Atomic update / delete
    • Partitioning
    • Interactive transactions
    • Open source
    • …

  11. A Citus cluster consists of multiple Postgres nodes with the Citus extension.
    COORDINATOR NODE
    CREATE EXTENSION citus;
    SELECT citus_add_node(…);
    SELECT citus_add_node(…);
    SELECT citus_add_node(…);
    WORKER NODES (W1, W2, W3 … Wn)
    CREATE EXTENSION citus;
    CREATE EXTENSION citus;
    CREATE EXTENSION citus;
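
    A minimal sketch of the coordinator-side setup on this slide, assuming three workers reachable at the hypothetical hostnames worker-1, worker-2, and worker-3 on port 5432 (the slide elides the actual arguments). It registers each worker with citus_add_node and then reads the pg_dist_node metadata table to check what was registered.

    ```sql
    -- Run on the coordinator, after CREATE EXTENSION citus has been run on every node.
    -- Hostnames and port are placeholders, not values from the talk.
    SELECT citus_add_node('worker-1', 5432);
    SELECT citus_add_node('worker-2', 5432);
    SELECT citus_add_node('worker-3', 5432);

    -- List the nodes the coordinator now knows about.
    SELECT nodename, nodeport, isactive
    FROM pg_dist_node
    ORDER BY nodeid;
    ```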

  12. How Citus distributes tables across the database cluster
    APPLICATION
    CREATE TABLE campaigns (…);
    SELECT create_distributed_table('campaigns', 'company_id');
    COORDINATOR NODE (METADATA)
    WORKER NODES (W1, W2, W3 … Wn)
    CREATE TABLE campaigns_101
    CREATE TABLE campaigns_102
    CREATE TABLE campaigns_103
    CREATE TABLE campaigns_104
    CREATE TABLE campaigns_105
    CREATE TABLE campaigns_106
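
    One way to see the resulting layout is to query the Citus metadata tables on the coordinator. The sketch below assumes the campaigns table from this slide has already been distributed. It joins pg_dist_shard, pg_dist_placement, and pg_dist_node to show which worker holds each shard. Shard tables such as campaigns_102 are named by appending the shardid to the table name.

    ```sql
    -- Run on the coordinator: one row per shard placement of the campaigns table.
    SELECT s.shardid,
           s.shardminvalue,   -- lower bound of the hash range covered by the shard
           s.shardmaxvalue,   -- upper bound of the hash range covered by the shard
           n.nodename,
           n.nodeport
    FROM pg_dist_shard s
    JOIN pg_dist_placement p USING (shardid)
    JOIN pg_dist_node n USING (groupid)
    WHERE s.logicalrelid = 'campaigns'::regclass
    ORDER BY s.shardid;
    ```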

  13. How Citus distributes queries across the database cluster
    APPLICATION
    SELECT campaign_id,
           avg(spend) AS avg_campaign_spend
    FROM campaigns
    GROUP BY campaign_id;
    COORDINATOR NODE (METADATA)
    WORKER NODES (W1, W2, W3 … Wn)
    SELECT company_id, sum(spend), count(spend) … FROM campaigns_101 …
    SELECT company_id, sum(spend), count(spend) … FROM campaigns_102 …
    SELECT company_id, sum(spend), count(spend) … FROM campaigns_103 …
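
    The worker queries on this slide return sum(spend) and count(spend) rather than avg(spend), because per-shard averages cannot simply be averaged again. The sketch below imitates the final merge step in plain Postgres. The rows in shard_results are made-up numbers standing in for per-shard results, and the query is only an illustration, not the query Citus generates internally.

    ```sql
    -- Hypothetical partial results: campaign 1 appears in two shards, campaign 2 in one.
    WITH shard_results (campaign_id, sum_spend, count_spend) AS (
        VALUES
            (1, 100.00, 4),
            (1,  50.00, 1),
            (2,  30.00, 3)
    )
    -- Merge step: total sum divided by total count gives the true average.
    -- For campaign 1 that is 150 / 5, not the mean of the two shard averages (25 and 50).
    SELECT campaign_id,
           sum(sum_spend) / sum(count_spend) AS avg_campaign_spend
    FROM shard_results
    GROUP BY campaign_id
    ORDER BY campaign_id;
    ```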

  14. easy
    # run PostgreSQL with Citus on port 5500
    docker run
    = citusdata/citus

  15. easy
    CREATE TABLE users(
    id bigserial primary key,
    name text);
    SELECT create_distributed_table(
    'users', 'id');
    SELECT count(*) FROM users;

  16. View Slide

  17. View Slide

  18. View Slide

  19. aka.ms/citus10

  20. Citus single node: Citus Coordinator
    Distributed Citus cluster: Citus Coordinator, Citus Workers

  21. slack.citusdata.com

  22. Columnar Storage Row-based storage

  23. CREATE TABLE events(
    ts timestamptz, i int,
    n numeric, s text);
    CREATE TABLE events_columnar(
    ts timestamptz, i int,
    n numeric, s text) USING columnar;

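
    A rough way to compare the two tables is to load the same rows into both and look at their on-disk size. The sketch below assumes the events and events_columnar tables from this slide exist and are empty. The generated rows are synthetic, so the resulting sizes say nothing about real workloads.

    ```sql
    -- Load 100,000 synthetic rows into the row-based table.
    INSERT INTO events (ts, i, n, s)
    SELECT now() - g * interval '1 second',  -- one event per second going back in time
           g,
           g * 1.5,
           'event ' || g
    FROM generate_series(1, 100000) AS g;

    -- Copy the same rows into the columnar table.
    INSERT INTO events_columnar SELECT * FROM events;

    -- Compare the storage used by each table.
    SELECT pg_size_pretty(pg_total_relation_size('events'))          AS row_based_size,
           pg_size_pretty(pg_total_relation_size('events_columnar')) AS columnar_size;
    ```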

  24. Citus Columnar && Range Partitioning in Postgres
    CREATE TABLE events(
    ts timestamptz, i int,
    n numeric, s text)
    PARTITION BY RANGE (ts);
    CREATE TABLE events_2021_jan PARTITION OF events
    FOR VALUES FROM ('2021-01-01') TO ('2021-02-01');
    CREATE TABLE events_2021_feb PARTITION OF events
    FOR VALUES FROM ('2021-02-01') TO ('2021-03-01');

  25. events table

  26. Citus Columnar && Range Partitioning in Postgres
    SELECT alter_table_set_access_method(
    'events_2021_jan', 'columnar');

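
    To check which partitions have been converted, the standard Postgres catalogs can be queried for each partition's table access method. The sketch below assumes the partitioned events table from the earlier slide. After the call above, it should list events_2021_jan as columnar while events_2021_feb stays on heap.

    ```sql
    -- List every partition of events together with its table access method.
    SELECT c.relname AS partition_name,
           am.amname AS access_method
    FROM pg_inherits i
    JOIN pg_class c ON c.oid = i.inhrelid
    JOIN pg_am am   ON am.oid = c.relam
    WHERE i.inhparent = 'events'::regclass
    ORDER BY c.relname;
    ```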

  27. events table

  28. events table

  29. events table

  30. View Slide

  31. View Slide

  32. In Citus 10, we open sourced Citus Shard Rebalancer

  33. Easy to rebalance shards after adding a new Citus node

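
    A minimal sketch of that workflow, assuming a new worker at the hypothetical address worker-4:5432 and the distributed campaigns table from the earlier slides. It previews the planned shard moves with get_rebalance_table_shards_plan before running rebalance_table_shards.

    ```sql
    -- Register the new worker on the coordinator (hostname and port are placeholders).
    SELECT citus_add_node('worker-4', 5432);

    -- Preview which shards would move, and between which nodes.
    SELECT * FROM get_rebalance_table_shards_plan('campaigns');

    -- Move the shards so they are spread evenly across all workers.
    SELECT rebalance_table_shards('campaigns');
    ```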

  34. What if shards get out-of-balance on existing nodes?

  35. Rebalancing shards to optimize for performance, too

  36. View Slide

  37. "Distributed PostgreSQL is a game changer."
    Min Wei, Principal Engineer at Microsoft
    aka.ms/blog-petabyte-scale-analytics

  38. Try Citus on Azure
    aka.ms/azure-portal-postgres

  39. Citus Newsletter
    aka.ms/citus-newsletter

  40. Questions?
    [email protected]
    [email protected]
    Citus repo on GitHub
    aka.ms/citus
    Citus Public Slack for open source Q&A
    slack.citusdata.com
    Citus Docs
    docs.citusdata.com
    Definitive Citus 10 blog post by Marco
    aka.ms/citus10
    Download Citus open source
    citusdata.com/download/

  41. If you need to scale Postgres, learn more about Citus 10
    • As of Citus 10, now includes columnar compression
    • We’ve open sourced the shard rebalancer too
    • & Citus on a single node