A Distributed Geospatial Time Series Database | pgDay Paris 2018 | Marco Slot

A Distributed Geospatial Time Series Database | pgDay Paris 2018 | Marco Slot

024d6a0dd14fb31c804969a57a06dfbe?s=128

Citus Data

March 15, 2018
Tweet

Transcript

  1. 2.

    Postgres has many third-party extensions: - PostGIS: Geospatial data -

    pg_partman: Auto-partitioning - pg_cron: Periodic jobs - Citus: Sharding - cstore_fdw: Columnar storage - mysql_fdw, oracle_fdw, tds_fdw: Query other database - … Postgres extensions 2 Marco Slot | Citus Data | pgDay Paris 2018
  2. 3.

    PostgreSQL 10 + PostGIS + pg_partman + Citus = ...

    3 Marco Slot | Citus Data | pgDay Paris 2018
  3. 5.

    PostgreSQL 10 + PostGIS + pg_partman + Citus = A

    pretty good database for real-time analytics on large volumes of geospatial sensor data... 5 Marco Slot | Citus Data | pgDay Paris 2018
  4. 6.

    Geospatial queries using PostGIS 6 CREATE TABLE trips ( …

    pickup geometry(Point,4326), dropoff geometry(Point,4326) ); Query next Neighbourhoods WHERE ST_Within(pickup, …) Marco Slot | Citus Data | pgDay Paris 2018
  5. 7.

    Auto-partitioning using pg_partman 7 Disk Drop old data really fast

    Smaller indexes, faster insertion Faster SELECTs on recent data SELECT create_parent('trips', …) Optimise your database for time series data Marco Slot | Citus Data | pgDay Paris 2018
  6. 8.

    Sharding using Citus 8 Nodes Distribute and parallelise all the

    things Always enough memory, CPU, storage, I/O throughput COPY with ingest parallelism Parallel rollups using INSERT...SELECT SELECT create_distributed_table('trips', …) Parallel SELECT Marco Slot | Citus Data | pgDay Paris 2018
  7. 9.

    Shard by ID, Partition by time 9 Partitioning Disk x

    = Nodes Sharding Distributed time series database Marco Slot | Citus Data | pgDay Paris 2018
  8. 10.

    Taxi data from: https://github.com/toddwschneider/nyc-taxi-data/ Blog post: “Analyzing 1.1 Billion NYC

    Taxi and Uber Trips” Citus Cloud formation with 4 nodes, each 60GB of memory and 4 cores. A distributed geospatial time series database 10 Marco Slot | Citus Data | pgDay Paris 2018