Getting Started with Time Series Data

Sebastian Cohnen
September 04, 2014

At the NoSQL Matters conference in Dublin in September 2014, I gave a talk on getting started with time series data, presenting some basics and showing how you can query and model your data using InfluxDB.


Transcript

  1. Getting Started with Time Series Data (with InfluxDB)
     Sebastian Cohnen, @tisba
     stormforger.com, @StormForgerApp
     NoSQL Matters 2014, Dublin, Sep. 2014
  2. About me
     • Sebastian Cohnen, M. Sc.
     • Developer & Founder from Cologne, Germany
     • (Web) Architectures, Performance & Scalability
     • Founder of StormForger.com; load testing for HTTP-based systems
  3. Time Series Data
     • Data, which has an inherent relation to time, e.g.:
     • Sensor data, log data
     • Stock prices
     • "Events" in general
  4. Use Cases
     • (Statistical) analysis
     • Summaries and aggregations
     • Visualization
     • Dashboards, monitoring & alerting
  5. Time Series Database (TSDB)
     • Makes use of the inherent relation to time
     • (Almost) everything builds around "ORDER BY time"
     • Examples: RRDTool, carbon (Graphite), OpenTSDB, … and InfluxDB
  6. Wish List for TSDBs
     • Flexible query language, e.g. SQL-like
     • Stream processing
     • Full control over downsampling & retention policies
  7. InfluxDB
     • Written in Go, open source since 04/2013
     • HTTP API (binary protocol planned)
     • SQL-like query language
     • Clustering support
  8. Writing Data

     POST http://localhost:8086/db/matters-dub-2014/series

     [{
       "name": "hd_used",
       "columns": ["time", "value", "host", "mount"],
       "points": [
         [1409665622000, 42, "a.example.com", "/mnt"],
         [1409137873000, 23, "b.example.com", "/mnt"]
       ]
     }]

     (slide annotations: "matters-dub-2014" is the database, "hd_used" the time series, and each row in "points" one point)
  9. [{ "name": "hd_used", "columns": [ "time", "sequence_number", "value", "host", "mount"

    ], "points": [ [ 1409137878266, 10001, 42, "a.example.com", "/mnt" ], … ] }] GET "http://localhost:8086/db/ matters-dub-2014/series?q=$QUERY" Querying Data SELECT * FROM hd_used
  10. Aggregation

      SELECT COUNT(duration) AS request_count,
             MEDIAN(duration) AS median,
             PERCENTILE(duration, 95.0) AS p95,
             PERCENTILE(duration, 99.0) AS p99
      FROM production.web.server01.requests
      WHERE time > "2014-09-03 00:00:00.000" AND time < "2014-09-05"
      GROUP BY time(1m)
  11. Regular Expressions

      SELECT COUNT(duration) AS request_count
      FROM production.web.server01.requests
      WHERE time > "2014-09-03 00:00:00.000" AND time < "2014-09-05"
        AND user_agent =~ /.*Chrome.*/
      GROUP BY time(1m)
  12. Selecting Multiple Series

      SELECT COUNT(duration) AS request_count
      FROM production.web.server01.requests,
           production.web.server02.requests
      WHERE time > "2014-09-03 00:00:00.000" AND time < "2014-09-05"
  13. …or via Regular Expressions

      SELECT COUNT(duration) AS request_count
      FROM /production\.web\.server\d{2}\.requests/
      WHERE time > "2014-09-03 00:00:00.000" AND time < "2014-09-05"
  14. Join

      SELECT errors_per_minute.value / page_views_per_minute.value
      FROM errors_per_minute
      INNER JOIN page_views_per_minute

      (diagram: points a1…a3 of one series and b1…b3 of the other, matched pairwise by their timestamps t1…t4)
  15. Continuous Queries
      • Idea: process data as it arrives, not (only) at query time
      • Continuous queries are defined via "SQL" and run continuously
      • Support for backfilling (e.g. when creating new queries on existing data); see the sketch below
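In the InfluxDB version of that time, a continuous query was created by sending the SELECT … INTO … statement itself through the normal query endpoint. A sketch with a hypothetical run_query helper; the LIST/DROP management statements are assumptions recalled from the 0.8-era docs, as are the u/p credentials:

    import urllib.parse
    import urllib.request

    def run_query(q):
        # u/p credentials are an assumption; adjust to your setup.
        params = urllib.parse.urlencode({"q": q, "u": "root", "p": "root"})
        url = "http://localhost:8086/db/matters-dub-2014/series?" + params
        with urllib.request.urlopen(url) as response:
            return response.read()

    # Define the fanout continuous query from the next slide; per the
    # slide, existing data can be backfilled when a new query is created.
    run_query("SELECT path, duration FROM http_requests "
              "GROUP BY http_status INTO response_times.[http_status]")

    # Management statements (assumed from 0.8-era docs):
    print(run_query("LIST CONTINUOUS QUERIES"))
    # run_query("DROP CONTINUOUS QUERY 1")  # drop by id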
  16. (diagram: a stream of input events on http_requests, each with time, path, duration and status, fanned out by the continuous query into one series per status code: response_times.200 … response_times.500)

      SELECT path, duration FROM http_requests
      GROUP BY http_status
      INTO response_times.[http_status]
  17. SELECT Across Series

      SELECT PERCENTILE(duration, 95.0) FROM /response_times\..*/

      {
        "response_times.200": [{ "percentile": 5.058, "time": 0 }],
        "response_times.500": [{ "percentile": 63.761, "time": 0 }]
      }
  18. Goals
      • Group by status code
      • Keep 1 data point per minute
      • Calculate 95th percentile & mean duration
      • Put results into response_times.1m.$status_code
  19. (diagram: http_requests events aggregated in 1-minute windows into response_times.1m.200 … response_times.1m.500)

      SELECT MEAN(duration) AS mean,
             PERCENTILE(duration, 95.0) AS p95
      FROM http_requests
      GROUP BY time(1m), http_status
      INTO response_times.1m.[http_status]
  20. SELECT time, p95, mean FROM /response_times\.1m\..*/

      SELECT time, p95, mean
      FROM response_times.1m.200
      WHERE time > "2014-09-03 00:00:00.000" AND time < "2014-09-05"
  21. :-)