Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Time series data in a relational database. TimescaleDB and PipelineDB extensions for PostgreSQL

Time series data in a relational database. TimescaleDB and PipelineDB extensions for PostgreSQL

Extensions allow you to stay in the PostgreSQL ecosystem, use the usual means of backup, monitoring and other things, while getting functionality that is specific to temporal data and time series.

Ivan Muratov

June 25, 2019
Tweet

More Decks by Ivan Muratov

Other Decks in Programming

Transcript

  1. Введение Временные ряды (time series) — собранные в разные моменты

    времени данные о характеристиках исследуемого процесса. Временные ряды состоят из двух элементов: 1. временная метка; 2. характеристики, называемые уровнями ряда.
  2. Знакомство TimescaleDB is an open-source time-series database optimized for fast

    ingest and complex queries. https://www.timescale.com https://github.com/timescale/timescaledb PipelineDB is a high-performance extension built to run SQL queries continuously on time-series data. https://www.pipelinedb.com https://github.com/pipelinedb/pipelinedb
  3. Тезисы • Fast ingest and large data volume • Industry-leading

    performance • Highly available • Powered by PostgreSQL • Continuous aggregations • Efficient and sustainable • Joining streams on tables • Runs on PostgreSQL
  4. TimescaleDB — примеры запросов #1 SELECT time_bucket('10 seconds', time) AS

    period, container_id, avg(free_mem) FROM metrics WHERE time > now() - interval '10 minutes' GROUP BY period, container_id ORDER BY period DESC, container_id; period | container_id | avg ------------------------+--------------+--------- 2019-06-24 12:01:00+00 | 16 | 72202 2019-06-24 12:01:00+00 | 73 | 837725 2019-06-24 12:01:00+00 | 96 | 412237 2019-06-24 12:00:50+00 | 16 | 1173393 2019-06-24 12:00:50+00 | 73 | 90104 2019-06-24 12:00:50+00 | 96 | 784596
  5. TimescaleDB — примеры запросов #2 SELECT time_bucket('1 day', time) AS

    day, count(*) AS trucks_exiting, sum(weight) / 1000 AS tonnage FROM vehicles INNER JOIN cities ON cities.name = 'Krasnodar' WHERE ST_Within(last_location, ST_Polygon(cities.geom, 4326)) AND NOT ST_Within(current_location, ST_Polygon(cities.geom, 4326)) GROUP BY day ORDER BY day DESC LIMIT 3; day | trucks_exiting | tonnage ------------------------+----------------+--------- 2019-06-24 00:00:00+00 | 5326 | 113082 2019-06-23 00:00:00+00 | 6051 | 133123 2019-06-22 00:00:00+00 | 3741 | 74820
  6. TimescaleDB — примеры запросов #3 SELECT time_bucket('14 days', c.time) AS

    period, last(c.closing_price, c.time) AS closing_price_btc, last(c.closing_price, c.time) * last(b.closing_price, c.time) filter (WHERE b.currency_code = 'USD') AS closing_price_usd FROM crypto_prices c JOIN btc_prices b ON time_bucket('1 day', c.time) = time_bucket('1 day', b.time) WHERE c.currency_code = 'ETH' GROUP BY period ORDER BY period DESC;
  7. TimescaleDB — основные возможности - Партицирование (hypertables) - Интеграция в

    планировщик и исполнитель запросов - Честный SQL (JOINs, aggregations, window functions, CTEs, secondary indexes) - Time bucketing, histograms, gap filling, interpolation, locf - HA, replication, backup / restore и остальная экосистема
  8. - Prometheus - Telegraf - Grafana - Zabbix - ...

    TimescaleDB — интеграции
  9. TimescaleDB — лицензии и новости Версии: - Open Source (Apache

    2.0) - Community (TSL) - Enterprise (Commercial) - Cloud Последние новости: - Over 1 million downloads in less than 18 months - Timescale has now raised over $31 million to build the foundational database - Timescale and MS Azure join forces to power IoT and time-series workloads
  10. PipelineDB — примеры запросов #1 CREATE CONTINUOUS VIEW v AS

    SELECT url::text, count(*) AS total_count, count(DISTINCT cookie::text) AS uniques, percentile_cont(0.99) WITHIN GROUP (ORDER BY latency::integer) AS p99_latency FROM page_views GROUP BY url; url | total_count | uniques | p99_latency -------------+-------------+---------+------------- /some/url/0 | 633 | 51 | 178 /some/url/1 | 688 | 37 | 139 /some/url/2 | 508 | 88 | 121 /some/url/3 | 848 | 36 | 59 /some/url/4 | 126 | 64 | 159
  11. PipelineDB — примеры запросов #2 CREATE STREAM ab_event_stream ( name

    text, ab_group text, event_type varchar(1), cookie varchar(32) ); CREATE CONTINUOUS VIEW ab_test_monitor AS SELECT name, ab_group, sum(CASE WHEN event_type = 'v' THEN 1 ELSE 0 END) AS view_count, sum(CASE WHEN event_type = 'c' THEN 1 ELSE 0 END) AS conversion_count, count(DISTINCT cookie) AS uniques FROM ab_event_stream GROUP BY name, ab_group;
  12. PipelineDB — примеры запросов #2 (продолжение) INSERT INTO ab_event_stream (name,

    ab_group, event_type, cookie) SELECT round(random() * 2) AS name, round(random() * 4) AS ab_group, (CASE WHEN random() > 0.4 THEN 'v' ELSE 'c' END) AS event_type, md5(random()::text) AS cookie FROM generate_series(0, 100000); SELECT ab_group, uniques FROM ab_test_monitor; SELECT ab_group, view_count * 100 / (conversion_count + view_count) AS conversion_rate FROM ab_test_monitor;
  13. PipelineDB — основные возможности - Потоковая обработка данных в реальном

    времени (streams, continuous views, transforms, aggregates, triggers) - Интеграция в планировщик и исполнитель запросов - Честный SQL (JOINs, aggregations, window functions, CTEs) - Continuous aggregations, time windows, joining streams on tables, поддержка вероятностных структур данных и алгоритмов (Bloom filter, count-min sketch, filtered-space saving top-k, HyperLogLog, T-Digest) - HA, replication, backup / restore и остальная экосистема
  14. PipelineDB — лицензии и новости Версии: - Open Source (Apache

    2.0) - Support subscriptions (Developer, Pro, Enterprise) - Cluster (Commercial) - Stride (real-time analytics-as-a-service API) Последние новости: - PipelineDB is joining Confluent
  15. Использование PipelineDB и TimescaleDB GPS | ГЛОНАСС GPRS Waliot Platform

    Continuous Transform Continuous View Hypertable PostgreSQL Kafka Topic
  16. ПРОФИТ? - SQL + NoSQL (JSON) + Time Series +

    Continuous Computation - Экосистема PostgreSQL - Открытый исходный код - Свободное использование
  17. Полезняшки :) @binakot https://krd.dev https://t.me/krddevdays https://t.me/pgsql https://blog.timescale.com https://www.pipelinedb.com/blog https://github.com/timescale/tsbs https://github.com/timescale/timescaledb-tune

    https://gitlab.com/postgres-ai/postgres-checkup https://github.com/binakot/HighLoad-2019-Siberia-Demo https://github.com/binakot/PostgreSQL-TimescaleDB-PipelineDB