A Distributed Geospatial Time Series Database | pgDay Paris 2018 | Marco Slot

Citus Data

March 15, 2018

Transcript

  1. Postgres extensions

    Postgres has many third-party extensions:
    - PostGIS: Geospatial data
    - pg_partman: Auto-partitioning
    - pg_cron: Periodic jobs
    - Citus: Sharding
    - cstore_fdw: Columnar storage
    - mysql_fdw, oracle_fdw, tds_fdw: Query other databases
    - …
    Marco Slot | Citus Data | pgDay Paris 2018
  2. PostgreSQL 10 + PostGIS + pg_partman + Citus = ...
  3. PostgreSQL 10 + PostGIS + pg_partman + Citus =

    A pretty good database for real-time analytics on large volumes of geospatial sensor data...
  4. Geospatial queries using PostGIS

    CREATE TABLE trips ( … pickup geometry(Point,4326), dropoff geometry(Point,4326) );
    Query next Neighbourhoods WHERE ST_Within(pickup, …)
  5. Auto-partitioning using pg_partman

    Optimise your database for time series data:
    - Drop old data really fast
    - Smaller indexes, faster insertion
    - Faster SELECTs on recent data
    SELECT create_parent('trips', …)
    Diagram: Disk
  6. Sharding using Citus

    Distribute and parallelise all the things:
    - Always enough memory, CPU, storage, I/O throughput
    - COPY with ingest parallelism
    - Parallel rollups using INSERT...SELECT
    - Parallel SELECT
    SELECT create_distributed_table('trips', …)
    Diagram: Nodes
  7. Shard by ID, Partition by time

    Partitioning (Disk) x Sharding (Nodes) = Distributed time series database
  8. A distributed geospatial time series database

    Taxi data from: https://github.com/toddwschneider/nyc-taxi-data/
    Blog post: “Analyzing 1.1 Billion NYC Taxi and Uber Trips”
    Citus Cloud formation with 4 nodes, each with 60GB of memory and 4 cores.
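
The "shard by ID, partition by time" setup from the slides could be sketched roughly as follows. This is a minimal sketch, not the speaker's actual schema. It assumes a Citus cluster on PostgreSQL 10 with PostGIS installed and pg_partman installed in a schema named partman. The columns trip_id and pickup_datetime, the create_parent argument values, the date range, and the bounding box are all hypothetical, since the slides elide them.

```sql
-- Hypothetical schema: only pickup and dropoff appear on the slides.
-- trip_id and pickup_datetime are assumed names for the shard and partition keys.
CREATE TABLE trips (
    trip_id         bigint      NOT NULL,  -- assumed distribution column ("shard by ID")
    pickup_datetime timestamptz NOT NULL,  -- assumed partition column ("partition by time")
    pickup          geometry(Point,4326),
    dropoff         geometry(Point,4326)
) PARTITION BY RANGE (pickup_datetime);    -- PostgreSQL 10 declarative partitioning

-- Citus: hash-distribute the partitioned table across the worker nodes.
SELECT create_distributed_table('trips', 'trip_id');

-- pg_partman: create and maintain daily partitions of the parent table.
-- Assumption: 'native' and 'daily' are the type and interval values accepted by
-- pg_partman releases contemporary with PostgreSQL 10; other releases may differ.
SELECT partman.create_parent('public.trips', 'pickup_datetime', 'native', 'daily');

-- Geospatial query limited to a time range, so that only the matching
-- partitions need to be scanned while the work runs in parallel across shards.
-- The dates and the envelope coordinates are arbitrary illustration values.
SELECT count(*)
FROM trips
WHERE pickup_datetime >= '2016-06-01'
  AND pickup_datetime <  '2016-06-02'
  AND ST_Within(pickup, ST_MakeEnvelope(-74.02, 40.70, -73.93, 40.80, 4326));
```

Distributing the parent before creating partitions means each new time partition is sharded the same way, which appears to be the combination the "Partitioning x Sharding" slide describes.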