Scaling Postgres for Time Series Data with the Citus Database | A Citus Conversation | Claire Giordano & Marco Slot

Citus Data
November 14, 2018

Claire Giordano interviewed Marco Slot, Principal Engineer at Citus Data, and they explored how you can use Postgres and Citus to scale your time series data.

If you're working with time series data, you are likely dealing with large volumes—and you might not be able to control the amount of data people are sending you. Especially if you need to do fairly advanced analytics, join us to learn:

-- How you can use PostgreSQL extensions such as pg_partman and Citus to scale out Postgres for time series data
-- How pg_partman does auto partitioning

Citus is an extension to Postgres that transforms Postgres into a distributed database—popular among SaaS developers building multi-tenant apps and teams building real-time analytics dashboards that require sub-second latency. What you might not know is that Citus is a good fit for time series data, especially in combination with new Postgres extensions such as pg_partman.

Transcript

  1. Scaling Postgres for Time Series Data with Citus | Nov 15 2018 | Marco Slot | Claire Giordano
    Scaling Postgres for Time Series Data with Citus
    Marco Slot & Claire Giordano
    A Citus Conversation | Nov 15 2018

  2. Marco Slot
    PhD Distributed Systems
    Netherlands
    ex-Amazon
    Lead engineer on Citus
    Chocolate chip cookies
    Windy Beach at Den Helder, Netherlands

  3. Courtesy of pgDay Paris 2018. Marco Slot’s lightning talk on YouTube: https://youtu.be/_OT9l5N3lXs

  4. Train from Amsterdam to Paris
    Courtesy of OpenStreetMap

  5. (no slide text other than the deck footer)

  6. Q: What is a Postgres extension?

  7. Q: What is time series data?
    What does it look like?

  8. (no slide text other than the deck footer)

  9. (no slide text other than the deck footer)

  10. Q: Why talk about time series data?
    Why is it so special?

  11. Q: What are the challenges for databases when it comes to handling time series data?

  12. Q: Are all databases good with time series data?

  13. Why Postgres? TL;DR
    Open source
    Constraints
    Extensions
    PostGIS / Geospatial
    HLL, TopN, Citus
    Foreign data wrappers
    Rich SQL
    CTEs
    Window functions
    Full text search
    Datatypes
    JSONB

  14. Examples of PostgreSQL extensions
    PostGIS: Geospatial data
    pg_partman: Auto partitioning
    Citus: Sharding / distributing / parallelizing
    pg_cron: Periodic jobs
    cstore_fdw: Columnar storage
    timescale: Auto partitioning
    mysql_fdw, oracle_fdw, tds_fdw: Query other databases

  15. github.com/pgpartman/pg_partman

  16. Auto-partitioning using pg_partman in Postgres
    Disk
    Drop old data really fast
    Smaller indexes, faster insertion
    Faster SELECTs on recent data
    SELECT create_parent('trips', …)
    Optimise Postgres for time series data
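
The benefits listed on this slide follow from one idea: each row lands in a partition chosen by its timestamp, so retention can drop whole partitions instead of deleting rows one by one. The sketch below is a toy model of that idea only. It is not how pg_partman or Postgres work internally, and the class, the method names, and the `trips_pYYYY_MM_DD` naming pattern are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from datetime import date, datetime


def partition_name(table, ts):
    """Name of the daily partition a timestamp falls into (illustrative naming)."""
    return f"{table}_p{ts:%Y_%m_%d}"


class PartitionedTable:
    """Toy stand-in for a table partitioned by day; not how Postgres stores data."""

    def __init__(self, name):
        self.name = name
        self.partitions = defaultdict(list)  # partition name -> rows

    def insert(self, ts, row):
        # Each row goes to exactly one partition, chosen by its timestamp.
        self.partitions[partition_name(self.name, ts)].append((ts, row))

    def drop_older_than(self, cutoff):
        # Retention removes whole partitions rather than scanning rows.
        # Zero-padded dates make the names sort chronologically.
        cutoff_name = partition_name(
            self.name, datetime.combine(cutoff, datetime.min.time())
        )
        dropped = sorted(p for p in self.partitions if p < cutoff_name)
        for p in dropped:
            del self.partitions[p]
        return dropped
```

In this model, queries on recent data only need to look at the newest partitions, and each partition's index would stay small because it covers a single day.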

  17. (no slide text other than the deck footer)

  18. Sharding using Citus, transforming Postgres into a distributed database
    Nodes
    Always enough memory, CPU, storage, I/O throughput
    COPY with ingest parallelism
    Parallel rollups using INSERT...SELECT
    SELECT create_distributed_table('trips', …)
    Parallel SELECT

  19. Shard by ID (Citus) + Partition by time (pg_partman)
    Partitioning (pg_partman) [Disk] x Sharding (Citus) [Nodes] = Distributed Time Series Database
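
The combination on this slide can be pictured as a two-dimensional placement: the ID decides which shard (and therefore which node) holds a row, and the timestamp decides which time partition it falls into. The sketch below models only that placement. The shard count, the CRC32 hash, and the function names are assumptions for illustration and do not reflect the hash function or defaults that Citus actually uses.

```python
import zlib
from datetime import datetime
from typing import Tuple

SHARD_COUNT = 4  # hypothetical value, not a Citus default


def shard_for(trip_id: int, shard_count: int = SHARD_COUNT) -> int:
    """Pick a shard from a stable hash of the ID (CRC32 is a stand-in hash)."""
    return zlib.crc32(str(trip_id).encode()) % shard_count


def partition_for(ts: datetime) -> str:
    """Pick a daily time partition from the timestamp."""
    return ts.strftime("%Y_%m_%d")


def place(trip_id: int, ts: datetime) -> Tuple[int, str]:
    """A row's location is the pair (shard, time partition)."""
    return shard_for(trip_id), partition_for(ts)
```

Under these assumptions, rows with the same ID always map to the same shard regardless of date, while rows from different days land in different partitions within that shard.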

  20. Q: Can you show me how it works, this combination of Postgres, pg_partman, and Citus?

  21. (no slide text other than the deck footer)

  22. citusdata.com/customers/mixrank

  23. Min Wei of Microsoft

  24. @citusdata @clairegiordano
    https://slack.citusdata.com
    Thank you for your time!
    @marcoslot
