Scaling Postgres for Time Series Data with the ...

Citus Data
November 14, 2018

Scaling Postgres for Time Series Data with the Citus Database | A Citus Conversation | Claire Giordano & Marco Slot

Claire Giordano interviewed Marco Slot, Principal Engineer at Citus Data, and they explored how you can use Postgres and Citus to scale your time series data.

If you're working with time series data, you are likely dealing with large volumes—and you might not be able to control the amount of data people are sending you. Especially if you need to do fairly advanced analytics, join us to learn:

-- How you can use PostgreSQL extensions such as pg_partman and Citus to scale out Postgres for time series data
-- How pg_partman does auto partitioning

Citus is an extension to Postgres that transforms Postgres into a distributed database—popular among SaaS developers building multi-tenant apps and teams building real-time analytics dashboards that require sub-second latency. What you might not know is that Citus is a good fit for time series data, especially in combination with new Postgres extensions such as pg_partman.

Transcript

  1. Scaling Postgres for Time Series Data with Citus | Marco Slot & Claire Giordano | A Citus Conversation | Nov 15 2018
  2. Marco Slot | PhD Distributed Systems | Netherlands | ex-Amazon | Lead engineer on Citus | Chocolate chip cookies | Windy Beach at Den Helder, Netherlands
  3. Courtesy of pgDay Paris 2018. Marco Slot’s lightning talk on YouTube: https://youtu.be/_OT9l5N3lXs
  4. Train from Amsterdam to Paris | Courtesy of OpenStreetMap
  6. Q: What is a Postgres extension?
  7. Q: What is time series data? What does it look like?
  10. Q: Why talk about time series data? Why is it so special?
  11. Q: What are the challenges for databases when it comes to handling time series data?
  12. Q: Are all databases good with time series data?
  13. Why Postgres? TL;DR | Open source | Constraints | Extensions | PostGIS / Geospatial | HLL, TopN, Citus | Foreign data wrappers | Rich SQL | CTEs | Window functions | Full text search | Datatypes | JSONB
  14. Examples of PostgreSQL extensions | PostGIS: Geospatial data | pg_partman: Auto partitioning | Citus: Sharding / distributing / parallelizing | pg_cron: Periodic jobs | cstore_fdw: Columnar storage | timescale: Auto partitioning | mysql_fdw, oracle_fdw, tds_fdw: Query other databases
  15. github.com/pgpartman/pg_partman
  16. Auto-partitioning using pg_partman in Postgres | Disk | Drop old data really fast | Smaller indexes, faster insertion | Faster SELECTs on recent data | SELECT create_parent('trips', …) | Optimise Postgres for time series data
  18. Sharding using Citus: transforming Postgres into a distributed database | Nodes | Always enough memory, CPU, storage, I/O throughput | COPY with ingest parallelism | Parallel rollups using INSERT...SELECT | SELECT create_distributed_table('trips', …) | Parallel SELECT
  19. Shard by ID (Citus) + Partition by time (pg_partman) | Partitioning (pg_partman, Disk) x Sharding (Citus, Nodes) = Distributed Time Series Database
  20. Q: Can you show me how it works, this combination of Postgres, and pg_partman, and Citus?
  22. citusdata.com/customers/mixrank
  23. Min Wei of Microsoft
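
Slide 19 combines sharding by ID (Citus) with partitioning by time (pg_partman). As a rough illustration of that idea only, the Python sketch below models the resulting grid in memory: each row lands in one cell keyed by (shard, day). It is not how Citus or pg_partman are implemented. The names `ToyStore`, `shard_for`, `partition_for` and `SHARD_COUNT` are hypothetical, CRC32 is just a stand-in hash, and daily partitions are an assumption (the slides elide the arguments to `create_parent` and `create_distributed_table`).

```python
import zlib
from collections import defaultdict
from datetime import date, datetime

SHARD_COUNT = 4  # hypothetical number of shards, chosen for illustration


def shard_for(trip_id: int, shard_count: int = SHARD_COUNT) -> int:
    """Map an ID to a shard with a deterministic hash (stand-in, not Citus's hash)."""
    return zlib.crc32(str(trip_id).encode()) % shard_count


def partition_for(ts: datetime) -> date:
    """Map a timestamp to its daily partition key (daily interval is an assumption)."""
    return ts.date()


class ToyStore:
    """In-memory grid of cells keyed by (shard, day)."""

    def __init__(self) -> None:
        self.cells = defaultdict(list)

    def insert(self, trip_id: int, ts: datetime, payload: str) -> None:
        # Route the row by ID (shard) and by time (partition).
        key = (shard_for(trip_id), partition_for(ts))
        self.cells[key].append((trip_id, ts, payload))

    def drop_before(self, cutoff: date) -> int:
        """Drop every cell older than cutoff without scanning rows; return cells dropped."""
        old_keys = [key for key in self.cells if key[1] < cutoff]
        for key in old_keys:
            del self.cells[key]
        return len(old_keys)

    def rows_on(self, day: date) -> list:
        """Collect one day's rows from all shards, skipping cells for other days."""
        return [row for (_, d), rows in self.cells.items() if d == day for row in rows]


store = ToyStore()
store.insert(1, datetime(2018, 11, 14, 9, 0), "a")
store.insert(1, datetime(2018, 11, 15, 9, 0), "b")
store.insert(2, datetime(2018, 11, 15, 10, 0), "c")
dropped = store.drop_before(date(2018, 11, 15))
print(dropped, len(store.rows_on(date(2018, 11, 15))))  # prints: 1 2
```

In this toy model, dropping old data removes whole cells rather than deleting individual rows, which mirrors the "drop old data really fast" point on slide 16. A query for one day can fan out across shards, in the spirit of the "Parallel SELECT" point on slide 18.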