A Distributed Geospatial Time Series Database | pgDay Paris 2018 | Marco Slot

A Distributed Geospatial Time Series Database | pgDay Paris 2018 | Marco Slot

024d6a0dd14fb31c804969a57a06dfbe?s=128

Citus Data

March 15, 2018
Tweet

Transcript

  1. A Distributed Geospatial Time series Database Postgres Extension Synergy

  2. Postgres has many third-party extensions: - PostGIS: Geospatial data -

    pg_partman: Auto-partitioning - pg_cron: Periodic jobs - Citus: Sharding - cstore_fdw: Columnar storage - mysql_fdw, oracle_fdw, tds_fdw: Query other database - … Postgres extensions 2 Marco Slot | Citus Data | pgDay Paris 2018
  3. PostgreSQL 10 + PostGIS + pg_partman + Citus = ...

    3 Marco Slot | Citus Data | pgDay Paris 2018
  4. Postgres extensions 4

  5. PostgreSQL 10 + PostGIS + pg_partman + Citus = A

    pretty good database for real-time analytics on large volumes of geospatial sensor data... 5 Marco Slot | Citus Data | pgDay Paris 2018
  6. Geospatial queries using PostGIS 6 CREATE TABLE trips ( …

    pickup geometry(Point,4326), dropoff geometry(Point,4326) ); Query next Neighbourhoods WHERE ST_Within(pickup, …) Marco Slot | Citus Data | pgDay Paris 2018
  7. Auto-partitioning using pg_partman 7 Disk Drop old data really fast

    Smaller indexes, faster insertion Faster SELECTs on recent data SELECT create_parent('trips', …) Optimise your database for time series data Marco Slot | Citus Data | pgDay Paris 2018
  8. Sharding using Citus 8 Nodes Distribute and parallelise all the

    things Always enough memory, CPU, storage, I/O throughput COPY with ingest parallelism Parallel rollups using INSERT...SELECT SELECT create_distributed_table('trips', …) Parallel SELECT Marco Slot | Citus Data | pgDay Paris 2018
  9. Shard by ID, Partition by time 9 Partitioning Disk x

    = Nodes Sharding Distributed time series database Marco Slot | Citus Data | pgDay Paris 2018
  10. Taxi data from: https://github.com/toddwschneider/nyc-taxi-data/ Blog post: “Analyzing 1.1 Billion NYC

    Taxi and Uber Trips” Citus Cloud formation with 4 nodes, each 60GB of memory and 4 cores. A distributed geospatial time series database 10 Marco Slot | Citus Data | pgDay Paris 2018
  11. 11 PostGIS pg_partman Citus

  12. 12 Data loading tool does a lot of pre-processing using

    PostGIS functions
  13. 13 Distributed SELECT scans 30 million rows/sec

  14. marco@citusdata.com Thanks! 14