InfluxDB - at NoVA MAMaL

Paul Dix
November 12, 2014

Presented at the DC area Monitoring, Alerting, Metrics, and Logging meetup.

Transcript

  1. InfluxDB - an open source distributed time series, metrics, and events database

    Paul Dix, paul@influxdb.com, @pauldix, @influxdb

  2. YC (W13). 4 people full time. Hiring more!

  3. What it’s for…

  4. Metrics

  5. Time Series

  6. Analytics

  7. Events

  8. Can’t you just use a regular DB?

  9. order by time?

  10. Doesn’t Scale

  11. Example from metrics:

    100 measurements per host * 10 hosts * 8,640 per day (once every 10s) * 365 days = 3,153,600,000 records per year

  12. Have fun with that table…

  13. But wait, we’ll just keep the summaries!

  14. 1h averages = 8,760,000 per year

  15. Lose Detail and Ad Hoc Queryability

  16. So let’s use Cassandra, HBase, or Scaleasaurus!

  17. Too much application code and complexity

  18. Application logic and scripts to compute summaries

  19. Application level logic for balancing

  20. No data locality for ad hoc queries

  21. And then there’s more…

  22. Web services

  23. Libraries for web services

  24. Data collection

  25. Visualization

  26. “Building an application with an analytics component today is like building a web application in 1998. You spend months building infrastructure before getting to the actual thing you want to build.” –Paul Dix

  27. Analytics should be about analyzing and interpreting data, not the infrastructure to store and process it.

  29. HTTP API: Web services built in

  30. HTTP API (writes):

    curl -X POST \
      'http://localhost:8086/db/mydb/series?u=paul&p=pass' \
      -d '[{"name":"foo", "columns":["val"], "points": [[3]]}]'

  31. Data (with timestamp):

    [
      {
        "name": "cpu",
        "columns": ["time", "value", "host"],
        "points": [
          [1395168540, 56.7, "foo.influxdb.com"],
          [1395168540, 43.9, "bar.influxdb.com"]
        ]
      }
    ]

  32. HTTP API (queries):

    curl 'http://localhost:8086/db/mydb/series?u=paul&p=pass&q=.'

  33. SQL-ish:

    select * from events where time > now() - 1h

  34. SQL-ish:

    select * from "series with weird chars ()*@#0982#$" where time > now() - 1h

  35. Where Regex:

    select line from application_logs where line =~ /.*ERROR.*/ and time > "2014-03-01" and time < "2014-03-03"

  36. Only scans the time range: series and time are the primary index

  37. Aggregate on the fly…

  38. Aggregates:

    select percentile(90, value) from response_times group by time(10m) where time > now() - 1d

  39. Continuous Aggregation…

  40. Continuous queries (summaries):

    select count(page_id) from events group by time(1h), page_id into events.[page_id]

  41. Series per page id:

    select count from events.67 where time > now() - 7d

  42. Work with many series…

  43. Select from Regex:

    select * from /stats\.cpu\..*/ limit 1

  44. Continuous queries (regex aggregating):

    select percentile(value, 90) as value from /stats\.*/ group by time(5m) into percentile.90.:series_name

  45. Merge with Regex:

    select percentile(90, value) from merge(/stats\.cpu_load\..*/) group by time(10m) where time > now() - 4h

  46. Percentile series per host:

    select value from percentile.90.stats.cpu.host1 where time > now() - 4h

  47. Denormalization for performance

  48. Range scans: all user events for the last hour

    select * from events where user_id = 3 and time > now() - 1h

  49. Continuous queries (fan out):

    select * from events into events.[user_id]

  50. Series per user id:

    select * from events.3 where time > now() - 1h

  51. Distributed: Scale out, data locality, high availability

  52. Raft for metadata (Goraft now, streaming soon)

  53. Protobuf + TCP for queries, writes

  54. Scalable: Hundreds of thousands of series (soon millions)

  55. Libraries: Go, Ruby, JavaScript, Python, Node.js, Clojure, Java, Perl, Haskell, R, Scala, CLI (Ruby and Node)

  56. Visualization

  57. Built-in UI

  58. Grafana

  59. JavaScript library + D3, Highcharts, Rickshaw, NVD3, etc. Definitely more to do here!

  60. Data Collection: collectd, StatsD backend, Carbon ingestion

  61. Coming Soon

  62. New Clustering Implementation

  63. Two Parts

  64. Broker

  65. Data Node

  66. How writes work

  67. Diagram: a write arrives at any server.

  68. Diagram: the write goes from any server to three brokers, which form a streaming Raft cluster.

  69. Writes are CP

  70. Diagram: the brokers pass the write on to a data node.

  71. Diagram: if replication factor = 2, the brokers pass the write on to two data nodes.

  72. Diagram: the same write path in a larger cluster with more data nodes.

  73. How Queries Work

  74. Diagram: a query arrives at any server in a cluster of six data nodes:

    select mean(cpu_load)
    where data_center = 'us-west'
      and host = 'serverA'
      and time > now() - 24h
    group by time(10m)

  75. Diagram (same query): each data node computes locally.

  76. Diagram (same query): the data nodes send summary ticks back.

  77. Clustering Goal: 1-2M values per second

  78. Potential Cluster Size: 3-5 Brokers, 50 Data Nodes

  79. Binary Protocol

  80. Pubsub:

    select * from some_series where host = "serverA" into subscription()
    select percentile(90, value) from some_series group by time(1m) into subscription()

  81. Custom Functions:

    select myFunc(value) from some_series

  82. Column Indexes

  83. Dictionaries

  84. Rack aware sharding and querying

  85. Multi-datacenter replication: Push and bi-directional

  86. Need help? support@influxdb.com

    Thanks! paul@influxdb.com @pauldix
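The storage arithmetic on slides 11 and 14 can be checked directly. This is plain arithmetic on the slides' own inputs, and the variable names are only for illustration.

```python
# Inputs from slide 11 (variable names are illustrative).
MEASUREMENTS_PER_HOST = 100
HOSTS = 10
SECONDS_PER_DAY = 24 * 60 * 60
SAMPLE_INTERVAL_S = 10  # one sample every 10 seconds
DAYS_PER_YEAR = 365

samples_per_day = SECONDS_PER_DAY // SAMPLE_INTERVAL_S  # 8,640
raw_per_year = MEASUREMENTS_PER_HOST * HOSTS * samples_per_day * DAYS_PER_YEAR

# Slide 14: keep one hourly average per measurement instead of every sample.
hourly_per_year = MEASUREMENTS_PER_HOST * HOSTS * 24 * DAYS_PER_YEAR

print(f"raw:    {raw_per_year:,} records per year")     # raw:    3,153,600,000 records per year
print(f"hourly: {hourly_per_year:,} records per year")  # hourly: 8,760,000 records per year
```

The hourly rollup is 360 times smaller (8,640 samples per day versus 24). That factor is exactly the detail slide 15 says is lost.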
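Slide 15's point, that summaries lose detail, is easy to see with made-up numbers. In the sketch below, a one-minute CPU spike nearly vanishes in an hourly average. The readings are invented for illustration and are not from the talk.

```python
from statistics import mean

# Made-up CPU readings for one host over one hour at 10 s intervals (360 samples):
# idle at 5% except for a one-minute spike to 99% (6 samples).
readings = [5.0] * 354 + [99.0] * 6

hourly_average = mean(readings)
print(f"hourly average: {hourly_average:.2f}%")  # hourly average: 6.57%
print(f"raw peak:       {max(readings):.2f}%")   # raw peak:       99.00%
```

Once only the 6.57% average is stored, no later ad hoc query can recover the 99% spike.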
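The write on slide 30 can be reproduced with Python's standard library. This sketch builds the request without sending it. It assumes the endpoint shape shown on the slide: `/db/<db>/series`, with credentials in the `u` and `p` query parameters. `build_write_request` is a hypothetical helper, not part of any InfluxDB client library.

```python
import json
import urllib.parse
import urllib.request

def build_write_request(host, db, user, password, series):
    """Hypothetical helper: build (but do not send) a POST shaped like slide 30's curl call."""
    query = urllib.parse.urlencode({"u": user, "p": password})
    url = f"http://{host}/db/{urllib.parse.quote(db)}/series?{query}"
    body = json.dumps(series).encode("utf-8")  # JSON array of series, as on the slide
    return urllib.request.Request(url, data=body, method="POST")

req = build_write_request(
    "localhost:8086", "mydb", "paul", "pass",
    [{"name": "foo", "columns": ["val"], "points": [[3]]}],
)
print(req.get_method(), req.full_url)
# POST http://localhost:8086/db/mydb/series?u=paul&p=pass
```

Passing `req` to `urllib.request.urlopen` would send it, assuming a server is listening on localhost:8086. Like `curl -d`, urllib sends the body as `application/x-www-form-urlencoded` unless a Content-Type header is set explicitly.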
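Slide 31 uses a columnar layout: column names appear once per series, followed by rows of values. The helper below packs per-reading dicts into that shape. `to_series` is a hypothetical name used for illustration, not an InfluxDB API.

```python
import json

def to_series(name, columns, readings):
    """Hypothetical helper: pack dicts into the name/columns/points layout of slide 31.

    Every reading must supply every column; points keep the input order.
    """
    points = [[r[c] for c in columns] for r in readings]
    return {"name": name, "columns": list(columns), "points": points}

readings = [
    {"time": 1395168540, "value": 56.7, "host": "foo.influxdb.com"},
    {"time": 1395168540, "value": 43.9, "host": "bar.influxdb.com"},
]
payload = [to_series("cpu", ["time", "value", "host"], readings)]
print(json.dumps(payload))
```

Listing column names once per series, rather than once per point, keeps the payload compact when many points share the same columns.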
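Slide 32 elides the query text in `q`, but any query has to be URL-encoded before it goes into that parameter. The sketch below uses slide 33's query. It assumes the endpoint shape from the slide, and `build_query_url` is a hypothetical helper.

```python
import urllib.parse

def build_query_url(host, db, user, password, query):
    """Hypothetical helper: a GET URL shaped like slide 32, with the query encoded into q."""
    params = urllib.parse.urlencode({"u": user, "p": password, "q": query})
    return f"http://{host}/db/{urllib.parse.quote(db)}/series?{params}"

url = build_query_url(
    "localhost:8086", "mydb", "paul", "pass",
    "select * from events where time > now() - 1h",
)
print(url)  # spaces become '+', and characters such as '>' and '(' are percent-encoded
```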
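Slide 35 combines a regex match on a column with a time range. The sketch below models only the predicate in plain Python, not how InfluxDB evaluates it. Python's `re` module stands in for the server's regex engine. The sketch also assumes that the slide's date strings mean midnight at the start of each day. The log points are invented.

```python
import re
from datetime import datetime

# Made-up log points: (timestamp, line).
logs = [
    (datetime(2014, 2, 28, 23, 59), "ERROR disk full"),
    (datetime(2014, 3, 1, 9, 0), "INFO started"),
    (datetime(2014, 3, 1, 9, 5), "ERROR connection refused"),
    (datetime(2014, 3, 2, 12, 0), "WARN slow request"),
    (datetime(2014, 3, 3, 0, 0), "ERROR timeout"),
]

pattern = re.compile(r".*ERROR.*")
start, end = datetime(2014, 3, 1), datetime(2014, 3, 3)  # assumed: midnight on each date

# Both bounds are exclusive, matching "time > ... and time < ...".
matches = [
    line for ts, line in logs
    if start < ts < end and pattern.search(line)
]
print(matches)  # ['ERROR connection refused']
```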
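Slide 36 says series and time form the primary index, so a query reads only its time range. The toy in-memory model below illustrates the idea with binary search over per-series sorted timestamps. It is a sketch of the concept, not InfluxDB's storage engine, and `SeriesIndex` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right

class SeriesIndex:
    """Toy index: each series keeps its points sorted by time.

    A range query finds its bounds by binary search and touches only the
    points inside the range, instead of scanning the whole series.
    """

    def __init__(self):
        self.times = {}   # series name -> sorted timestamps
        self.values = {}  # series name -> values aligned with self.times

    def insert(self, series, ts, value):
        times = self.times.setdefault(series, [])
        values = self.values.setdefault(series, [])
        i = bisect_right(times, ts)  # keep timestamps sorted
        times.insert(i, ts)
        values.insert(i, value)

    def range(self, series, start, end):
        """Return (ts, value) pairs with start <= ts < end."""
        if series not in self.times:
            return []
        times = self.times[series]
        lo = bisect_left(times, start)
        hi = bisect_left(times, end)
        return list(zip(times[lo:hi], self.values[series][lo:hi]))

idx = SeriesIndex()
for ts in range(0, 100, 10):
    idx.insert("cpu", ts, ts / 10)
idx.insert("mem", 15, 512)
print(idx.range("cpu", 30, 60))  # [(30, 3.0), (40, 4.0), (50, 5.0)]
```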
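Slide 38's query puts points into fixed time windows and computes one aggregate per window. The sketch below uses the nearest-rank definition of percentile. That choice is an assumption, because the slide doesn't say how InfluxDB computes percentiles. The function names and the data are made up.

```python
import math
from collections import defaultdict

def nearest_rank_percentile(values, p):
    """Nearest-rank percentile for 0 < p <= 100 (one of several common definitions)."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def percentile_by_bucket(points, p, bucket_seconds):
    """Group (timestamp, value) points into fixed windows, as group by time(...)
    does, and compute one percentile per window, keyed by window start."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)
    return {start: nearest_rank_percentile(vals, p)
            for start, vals in sorted(buckets.items())}

# Made-up response times (ms): 20 points in each of two 10-minute windows.
points = [(i * 30, 100 + i) for i in range(20)]
points += [(600 + i * 30, 200 + i) for i in range(20)]

print(percentile_by_bucket(points, 90, 600))  # {0: 117, 600: 217}
```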
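A continuous query like slide 40's keeps a per-page rollup current. Slide 41 can then read `events.67` directly instead of scanning raw events. The batch function below only computes what that rollup would contain for a given set of events. It does not model how InfluxDB runs continuous queries incrementally, and `hourly_counts_by_page` is a hypothetical name.

```python
from collections import Counter

HOUR = 3600

def hourly_counts_by_page(events):
    """Mimic 'select count(page_id) from events group by time(1h), page_id
    into events.[page_id]': one output series per page_id, one point per hour.

    events: iterable of (timestamp_seconds, page_id) tuples.
    Returns {series_name: [(hour_start, count), ...]}.
    """
    counts = Counter((ts - ts % HOUR, page_id) for ts, page_id in events)
    series = {}
    for (hour_start, page_id), n in sorted(counts.items()):
        series.setdefault(f"events.{page_id}", []).append((hour_start, n))
    return series

events = [(10, 67), (20, 67), (3700, 67), (50, 12), (4000, 12), (4100, 12)]
result = hourly_counts_by_page(events)
print(result)
# {'events.12': [(0, 1), (3600, 2)], 'events.67': [(0, 2), (3600, 1)]}
```

The fan-out on slide 49 follows the same naming pattern, but copies raw points into `events.[user_id]` instead of counting them.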
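Slides 43 and 45 select every series whose name matches a regex. Slide 45 additionally merges the matches into one time-ordered stream before aggregating. The sketch below models both steps over in-memory data. It assumes unanchored regex matching (`re.search`), and the series names, data, and function names are hypothetical.

```python
import heapq
import re

# Made-up series: name -> time-ordered list of (timestamp, value).
series = {
    "stats.cpu_load.host1": [(0, 0.5), (20, 0.7)],
    "stats.cpu_load.host2": [(10, 0.9), (30, 0.2)],
    "stats.mem.host1": [(0, 512)],
}

def select_series(pattern):
    """Names of all series matching a regex, like a regex FROM clause."""
    rx = re.compile(pattern)
    return sorted(name for name in series if rx.search(name))

def merge_series(pattern):
    """Interleave points from every matching series by timestamp, like merge(),
    so they can be aggregated as a single stream."""
    streams = (series[name] for name in select_series(pattern))
    return list(heapq.merge(*streams, key=lambda point: point[0]))

print(select_series(r"stats\.cpu_load\..*"))
# ['stats.cpu_load.host1', 'stats.cpu_load.host2']
print(merge_series(r"stats\.cpu_load\..*"))
# [(0, 0.5), (10, 0.9), (20, 0.7), (30, 0.2)]
```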
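Slide 44's `into` clause writes one output series per input series, and slide 46 reads one of them back. The name expansion can be sketched as plain string substitution. This assumes `:series_name` is replaced by the full source name, which is consistent with slide 46's `percentile.90.stats.cpu.host1` but not spelled out on the slides. `target_series` is a hypothetical helper.

```python
import re

def target_series(template, source_name):
    """Hypothetical helper: expand the ':series_name' placeholder of an into clause."""
    return template.replace(":series_name", source_name)

sources = ["stats.cpu.host1", "stats.cpu.host2", "app.requests"]
matching = [s for s in sources if re.search(r"stats\.*", s)]  # slide 44's regex
print([target_series("percentile.90.:series_name", s) for s in matching])
# ['percentile.90.stats.cpu.host1', 'percentile.90.stats.cpu.host2']
```

As written, `/stats\.*/` matches any name that contains "stats", because `\.*` also allows zero dots. `/stats\..*/` is the stricter form.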
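Slides 74-76 push computation to the data nodes, which send back only summary ticks. For `mean()`, one way to keep the result exact is for each node to return a (sum, count) pair per time window. The slides don't say what a summary tick contains, so the pair is an assumption. The function names and data are hypothetical.

```python
from collections import defaultdict

BUCKET = 600  # group by time(10m)

def local_summary(points):
    """Run on each data node: per 10-minute window, return (sum, count).

    Sending (sum, count) rather than a local mean lets the coordinator compute
    an exact overall mean even when nodes hold different numbers of points.
    """
    acc = defaultdict(lambda: [0.0, 0])
    for ts, value in points:
        bucket = acc[ts - ts % BUCKET]
        bucket[0] += value
        bucket[1] += 1
    return {start: (s, n) for start, (s, n) in acc.items()}

def combine(summaries):
    """Run on the server that received the query: merge the partial sums."""
    totals = defaultdict(lambda: [0.0, 0])
    for summary in summaries:
        for start, (s, n) in summary.items():
            totals[start][0] += s
            totals[start][1] += n
    return {start: s / n for start, (s, n) in sorted(totals.items())}

# Made-up cpu_load points for serverA, split across two data nodes.
node_a = [(0, 0.25), (60, 0.5), (600, 1.0)]
node_b = [(120, 0.75)]
print(combine([local_summary(node_a), local_summary(node_b)]))  # {0: 0.5, 600: 1.0}
```

Averaging the two nodes' local means for window 0 would give (0.375 + 0.75) / 2 = 0.5625 instead of the correct 0.5. That is why the partial sums and counts travel separately.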