InfluxDB intro for Data Driven NYC

Paul Dix
March 17, 2015

Talk I gave on 3/17/2015 at Data Driven NYC. Introduces the motivation behind creating a time series database and some of the basic features in 0.9.0.

Transcript

  1. InfluxDB - an open source time series database Paul Dix

    CEO @pauldix paul@influxdb.com
  2. What it’s for…

  3. Metrics

  4. Time Series

  5. Analytics

  6. Events

  7. Use Cases

  8. DevOps

  9. Real-time analytics (user & business)

  10. Sensor Data

  11. Can’t you just use a regular DB?

  12. order by time?

  13. Doesn’t Scale

  14. Example from metrics: 100 measurements per host * 10 hosts * 8,640 samples per day (once every 10s) * 365 days = 3,153,600,000 records per year
  15. Have fun with that table…

  16. But wait, we’ll just keep the summaries!

  17. 1h averages = 8,760,000 per year
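
The figures on slides 14 and 17 can be re-derived with a few lines of arithmetic. This is only a sketch of the calculation; the variable names are illustrative and do not come from the talk.

```python
# Re-derive the record counts quoted on slides 14 and 17.
MEASUREMENTS_PER_HOST = 100
HOSTS = 10
SAMPLES_PER_DAY = 24 * 60 * 60 // 10  # one sample every 10 s -> 8640
DAYS_PER_YEAR = 365

series_count = MEASUREMENTS_PER_HOST * HOSTS

# Raw data: every series sampled every 10 seconds for a year.
raw_records_per_year = series_count * SAMPLES_PER_DAY * DAYS_PER_YEAR

# Summaries only: one 1h average per series per hour.
hourly_summaries_per_year = series_count * 24 * DAYS_PER_YEAR

print(raw_records_per_year)       # 3153600000
print(hourly_summaries_per_year)  # 8760000
print(raw_records_per_year // hourly_summaries_per_year)  # 360
```

Keeping only hourly averages shrinks the table by a factor of 360, which is the detail that slide 18 says is lost.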

  18. Lose Detail and AdHoc Queryability

  19. So let’s use Cassandra, HBase, or Scaleasaurus!

  20. Too much application code and complexity

  21. Application logic and scripts to compute summaries

  22. Application level logic for balancing

  23. No data locality for AdHoc queries

  24. How to handle data retention?

  25. And then there’s more…

  26. Web services

  27. Libraries for web services

  28. Data collection

  29. Visualization

  30. “Building an application with an analytics component today is like building a web application in 1998. You spend months building infrastructure before getting to the actual thing you want to build.” (Paul Dix)
  31. Analytics and monitoring should be about analyzing and interpreting data, not the infrastructure to store and process it.
  32. None
  33. A time series database with no external dependencies

  34. Features Upcoming 0.9.0 release

  35. Data model • Databases

  36. Data model • Databases • Measurements • cpu_load, temperature, log_lines,

    click, etc.
  37. Data model • Databases • Measurements • cpu_load, temperature, log_lines,

    click, etc. • Tags • region=uswest, host=serverA, building=23, service=redis, etc.
  38. Data model • Databases • Measurements • cpu_load, temperature, log_lines,

    click, etc. • Tags • region=uswest, host=serverA, building=23, service=redis, etc. • Series - measurement + unique tagset
  39. Data model • Databases • Measurements • cpu_load, temperature, log_lines,

    click, etc. • Tags • region=uswest, host=serverA, building=23, service=redis, etc. • Series - measurement + unique tagset • Points • Fields - bool, int64, float64, string, []byte • Timestamp - nano epoch
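
The data model on slide 39 can be pictured with a small sketch. The `Point` class and the `series_key` helper below are hypothetical, written only to illustrate "series = measurement + unique tagset"; they are not InfluxDB's internal representation, and the key format is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Union

# Field types listed on the slide: bool, int64, float64, string, []byte.
FieldValue = Union[bool, int, float, str, bytes]


@dataclass
class Point:
    measurement: str               # e.g. "cpu_load"
    tags: Dict[str, str]           # e.g. {"region": "uswest", "host": "serverA"}
    fields: Dict[str, FieldValue]  # e.g. {"value": 0.64}
    timestamp_ns: int              # nanoseconds since the Unix epoch


def series_key(point: Point) -> str:
    """Hypothetical series identifier: measurement plus the sorted tagset."""
    tagset = ",".join(f"{k}={v}" for k, v in sorted(point.tags.items()))
    return f"{point.measurement},{tagset}" if tagset else point.measurement


a = Point("cpu_load", {"region": "uswest", "host": "serverA"}, {"value": 0.64}, 1257894000000000000)
b = Point("cpu_load", {"host": "serverA", "region": "uswest"}, {"value": 0.70}, 1257894010000000000)
c = Point("cpu_load", {"host": "serverB", "region": "uswest"}, {"value": 0.12}, 1257894000000000000)

print(series_key(a))  # cpu_load,host=serverA,region=uswest
print(series_key(a) == series_key(b))  # True: same measurement and tagset
print(series_key(a) == series_key(c))  # False: a different host is a different series
```

Sorting the tags makes the key independent of the order in which tags were supplied, so two points belong to the same series exactly when their measurement and tagset match.
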
  40. Writing Data curl -XPOST 'http://localhost:8086/write' -d '...'

  41. Writing Data { "database": "mydb", "retentionPolicy": "30d", "points": [ { "name": "cpu_load", "tags": { "host": "server01", "region": "us-west" }, "timestamp": "2009-11-10T23:00:00Z", "fields": { "value": 0.64 } } ] }

    Slide callouts: Measurement ("name"), Tags ("tags"), Fields ("fields")
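
The same write can be prepared from Python's standard library. This is a minimal sketch that mirrors the `/write` endpoint and JSON body shown on slides 40 and 41 for the upcoming 0.9.0 release; released versions may accept a different format. The `build_write_request` helper and the `Content-Type` header are assumptions, not part of the talk.

```python
import json
import urllib.request


def build_write_request(database, retention_policy, points,
                        base_url="http://localhost:8086"):
    """Build (but do not send) a POST to /write with the slide's JSON body."""
    body = json.dumps({
        "database": database,
        "retentionPolicy": retention_policy,
        "points": points,
    }).encode("utf-8")
    return urllib.request.Request(
        base_url + "/write",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


request = build_write_request(
    "mydb",
    "30d",
    [{
        "name": "cpu_load",                                 # measurement
        "tags": {"host": "server01", "region": "us-west"},  # tags
        "timestamp": "2009-11-10T23:00:00Z",
        "fields": {"value": 0.64},                          # fields
    }],
)

# Sending requires a running server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Separating request construction from sending keeps the payload easy to inspect before anything touches the network.
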
  42. Querying curl -G 'http://localhost:8086/query' --data-urlencode "q=..."

  43. SQL-ish query language

  44. Query: SELECT value FROM cpu WHERE host = 'serverA'

    Results: { "results":[ { "query": "SELECT value FROM cpu WHERE host='serverA'", "series": [ { "name": "cpu", "tags": { "host": "serverA" }, "columns": ["time", "value"], "values": [ ["2009-11-10T23:00:00Z", 22.1], ["2009-11-10T23:00:10Z", 25.2] ] } ] } ] }
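
A response shaped like the one on slide 44 pairs each series' `columns` with its `values`. The sketch below builds the query URL used on slide 42 and flattens such a response into one dictionary per row; both helper names are hypothetical, and the sample response is copied from the slide.

```python
import json
from urllib.parse import urlencode


def build_query_url(query, base_url="http://localhost:8086"):
    """URL-encode the query into the q parameter, like curl -G --data-urlencode."""
    return base_url + "/query?" + urlencode({"q": query})


def rows_from_response(response_text):
    """Flatten results -> series -> values into a list of row dictionaries."""
    rows = []
    for result in json.loads(response_text)["results"]:
        for series in result.get("series", []):
            for values in series["values"]:
                row = dict(zip(series["columns"], values))
                row["measurement"] = series["name"]
                row.update(series.get("tags", {}))
                rows.append(row)
    return rows


# Sample response taken from slide 44.
sample = """
{ "results": [ { "query": "SELECT value FROM cpu WHERE host='serverA'",
  "series": [ { "name": "cpu", "tags": { "host": "serverA" },
    "columns": ["time", "value"],
    "values": [ ["2009-11-10T23:00:00Z", 22.1], ["2009-11-10T23:00:10Z", 25.2] ] } ] } ] }
"""

url = build_query_url("SELECT value FROM cpu WHERE host = 'serverA'")
rows = rows_from_response(sample)
for row in rows:
    print(row["time"], row["host"], row["value"])
```

Each row carries its measurement name and tags, so results that span several series can be told apart after flattening.
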
  45. Query: SELECT value FROM cpu WHERE host = 'serverA' OR host = 'serverB'

    Series in result: { "series": [ { "name": "cpu", "tags": { "host": "serverA" }, "columns": ["time", "value"], "values": [] }, { "name": "cpu", "tags": { "host": "serverB" }, "columns": ["time", "value"], "values": [] } ] }
  46. Query: SELECT percentile(90, value) FROM cpu WHERE time > now() - 4h GROUP BY time(10m), region

    Series in result: [ { "name": "cpu", "tags": { "region": "us-west" }, "columns": ["time", "percentile"], "values": [] }, { "name": "cpu", "tags": { "region": "us-east" }, "columns": ["time", "percentile"], "values": [] } ]
  47. Multiple aggregates: SELECT mean(value), percentile(90, value), min(value), max(value) FROM cpu WHERE host='serverA' AND time > now() - 48h GROUP BY time(1h)
  48. Return every series in cpu: SELECT mean(value) FROM cpu WHERE time > now() - 48h GROUP BY time(1h), *
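
What `GROUP BY time(1h), *` computes can be sketched in plain Python: points are bucketed by hour and by full tagset, and each bucket is averaged. This is an illustration of the idea only, not InfluxDB's implementation. The function name is hypothetical, aligning buckets to multiples of one hour since the epoch is an assumption, and the `time > now() - 48h` filter is left out.

```python
from collections import defaultdict

HOUR_NS = 3600 * 10**9  # one hour in nanoseconds


def mean_by_hour_and_tags(points):
    """points: iterable of (timestamp_ns, tags_dict, value) tuples.

    Returns {(sorted_tag_items, bucket_start_ns): mean_value}.
    """
    buckets = defaultdict(list)
    for timestamp_ns, tags, value in points:
        # Assumed alignment: floor the timestamp to the start of its hour.
        bucket_start = timestamp_ns - timestamp_ns % HOUR_NS
        buckets[(tuple(sorted(tags.items())), bucket_start)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


points = [
    (0, {"host": "serverA"}, 1.0),
    (1800 * 10**9, {"host": "serverA"}, 3.0),  # same hour, same series
    (3600 * 10**9, {"host": "serverA"}, 5.0),  # next hour
    (10, {"host": "serverB"}, 7.0),            # different series
]

means = mean_by_hour_and_tags(points)
for (tags, bucket_start), mean in sorted(means.items()):
    print(dict(tags), bucket_start, mean)
```

Grouping by the whole tagset is what makes every series come back separately, one averaged value per series per hour.
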
  49. Discovery based on tags

  50. { "results":[ { "query": "SHOW MEASUREMENTS", "series": [ { "name": "measurements", "columns": ["name"], "values": [ ["cpu"], ["memory"], ["network"] ] } ] } ] }

  51. { "results":[ { "query": "SHOW SERIES", "series": [ { "name": "cpu", "columns": ["id", "region", "host"], "values": [ [1, "us-west", "serverA"], [2, "us-east", "serverB"] ] } ] } ] }
  52. { "query": "SHOW MEASUREMENTS WHERE service='redis'", "series": [ { "name": "measurements", "columns": ["measurement"], "values": [ ["key_count"], ["connections"] ] } ] }
  53. { "query": "SHOW TAG KEYS from cpu", "series": [ { "name": "keys", "columns": ["key"], "values": [ ["region"], ["host"] ] } ] }

  54. { "query": "SHOW TAG VALUES WITH KEY = service", "series": [ { "name": "series", "columns": ["service"], "values": [ ["redis"], ["apache"] ] } ] }

  55. { "query": "SHOW TAG VALUES FROM cpu WITH KEY = service", "series": [ { "name": "series", "columns": ["service"], "values": [ ["redis"], ["apache"] ] } ] }
  56. Much more • Retention policies • Automatic downsampling and aggregation • Clustering
  57. Grafana Dashboards

  58. Thank you! Paul Dix @pauldix paul@influxdb.com