Time Series Data Models and the future of InfluxDB's Query Language

39b7a68b6cbc43ec7683ad0bcc4c9570?s=47 Paul Dix
April 20, 2017

Time Series Data Models and the future of InfluxDB's Query Language

A talk I gave at the Austin Monitoring Meetup about different open source projects' time series data models and some ideas for a new InfluxDB query language.

39b7a68b6cbc43ec7683ad0bcc4c9570?s=128

Paul Dix

April 20, 2017
Tweet

Transcript

  1. Time Series Data Models & the Query Languages That Love

    Them Paul Dix paul@influxdb.com @pauldix
  2. Future of InfluxDB’s Data Model & Query Language

  3. CTO & co-founder makers of

  4. Founder of NYC Machine Learning Meetup

  5. None
  6. Editor Addison Wesley’s Data & Analytics

  7. Recovering Rubyist

  8. Data Models

  9. Graphite apps.backend.server_01.counters.requests.count

  10. Graphite apps.backend.server_01.counters.requests.count Hierarchy

  11. Graphite apps.backend.server_01.counters.requests.count Hierarchy

  12. Graphite apps.backend.server_01.counters.requests.count Value - double precision float Time - second

    precision epoch
  13. Graphite apps.backend.server_01.counters.requests.count Value - double precision float Time - second

    precision epoch regular series only!
  14. Regular time series t0 t1 t2 t3 t4 t6 t7

    Samples at regular intervals
  15. Irregular time series t0 t1 t2 t3 t4 t6 t7

    Events whenever they come in
  16. OpenTSDB sys.cpu.user host=webserver01,cpu=3 1356998400 1.2

  17. OpenTSDB sys.cpu.user host=webserver01,cpu=3 1356998400 1.2 Metric

  18. OpenTSDB sys.cpu.user host=webserver01,cpu=3 1356998400 1.2 Tags (string key/value pairs)

  19. OpenTSDB sys.cpu.user host=webserver01,cpu=3 1356998400 1.2 millisecond precision epoch

  20. OpenTSDB sys.cpu.user host=webserver01,cpu=3 1356998400 1.2 Value int64 or float64 (2.4)

  21. Prometheus http_requests_total{method="post",code="200"} 1027 1395066363000

  22. Prometheus http_requests_total{method="post",code="200"} 1027 1395066363000 Metric

  23. Prometheus http_requests_total{method="post",code="200"} 1027 1395066363000 Labels (string key/value pairs)

  24. Prometheus http_requests_total{method="post",code="200"} 1027 1395066363000 Value - float64

  25. Prometheus http_requests_total{method="post",code="200"} 1027 1395066363000 millisecond precision epoch

  26. InfluxDB 1.x cpu,host=serverA,region=west user=23.2,system=54.1 1465839830100400200

  27. InfluxDB 1.x cpu,host=serverA,region=west user=23.2,system=54.1 1465839830100400200 Measurement

  28. InfluxDB 1.x cpu,host=serverA,region=west user=23.2,system=54.1 1465839830100400200 Tags (string key/value pairs)

  29. InfluxDB 1.x cpu,host=serverA,region=west user=23.2,system=54.1 1465839830100400200 Fields (key/value pairs)

  30. InfluxDB 1.x cpu,host=serverA,region=west user=23.2,system=54.1 1465839830100400200 float64 value

  31. InfluxDB 1.x cpu,host=serverA,region=west foo=23i 1465839830100400200 int64 value

  32. InfluxDB 1.x cpu,host=serverA,region=west bar=t 1465839830100400200 bool value

  33. InfluxDB 1.x cpu,host=serverA,region=west line=“some text here” 1465839830100400200 string value

  34. InfluxDB 1.x cpu,host=serverA,region=west user=23.2,system=54.1 1465839830100400200 nanosecond precision epoch

  35. Differences organization data types precision Graphite hierarchical float64 seconds OpenTSDB

    metric, tags float64 milliseconds Prometheus metric, tags float64 milliseconds InfluxDB 1.x metric, tags, fields float64, int64, bool, string nanoseconds
  36. Querying

  37. Data Exploration what series do I have

  38. Retrieval & Computation raw data, transforms, materialized series, aggregates, samples

  39. Organization matters with thousands of series or more

  40. Hierarchy Tree!

  41. Lookup metrics/measurements OpenTSDB /api/search/lookup?query= (all series) Prometheus {__name__=~“.+”} (all series?)

    InfluxDB SHOW MEASUREMENTS
  42. Lookup tag/label keys OpenTSDB /api/search/lookup?query= (all series) Prometheus {__name__=~“.+”} (all

    series?) InfluxDB SHOW TAG KEYS
  43. Lookup tag/label values OpenTSDB /api/search/lookup?query={host=*} (all series with host?) Prometheus

    {host=~“.+”} (all series with host?) InfluxDB SHOW TAG VALUES WITH KEY = “host” SHOW TAG VAVUES FROM “cpu” with KEY = “host”
  44. Drill Down OpenTSDB /api/search/lookup?query={host=*,service=mysql} (all metrics on the hosts) Prometheus

    {host=~“.+”,service=mysql} (all metrics on the hosts) InfluxDB SHOW TAG VALUES WITH KEY = “host” WHERE “service” = ‘mysql’
  45. Why to care about drill down (faceted search)

  46. Why to care about drill down (faceted search)

  47. Why to care about drill down (faceted search) select nonstop,

    LGA, JFK
  48. Facets __name__ host service region group …

  49. Facets __name__ host service region group … go_goroutines go_memstats_alloc_bytes go_memstats_alloc_bytes_total

    go_memstats_gc_sys_bytes go_memstats_other_sys_bytes …
  50. Facets __name__ host service region group … go_goroutines go_memstats_alloc_bytes go_memstats_alloc_bytes_total

    go_memstats_gc_sys_bytes go_memstats_other_sys_bytes … host service region group …
  51. Facets __name__ host service region group … go_goroutines go_memstats_alloc_bytes go_memstats_alloc_bytes_total

    go_memstats_gc_sys_bytes go_memstats_other_sys_bytes … host service region group … dynamic hierarchy! (name already selected)
  52. Labels/Tags > Hierarchy

  53. Up Front Design

  54. Powerful Discovery

  55. Slicing, dicing, grouping

  56. Query Languages Query Language Example Graphite functional target, from, until

    sumSeries(summarize(water.level.h2o.feet.*, '1hour', 'max')) OpenTSDB http params startTime, endTime, metric, aggregationFunction, filter, functions, expressions Prometheus functional-ish increase(http_requests_total{job=“prometheus”}[5m]) InfluxDB 1.x SQL-ish select mean(system) from cpu where time > now() - 6h group by time(10m)
  57. Functional > SQL or API

  58. Time series are streams

  59. Apply Functions!

  60. Selection what series (streams) are we working with?

  61. Timing what time range are we interested in?

  62. Merging multiple streams into 1

  63. Joining /, *, +, -, &, |, ^, filter

  64. Partitioning do we slice the stream into blocks of time?

  65. Sampling first, last, min, max, filters

  66. Transforming time shift, derivative, rate, interpolate

  67. Summarizing count, percentile, mean, median, mode, histogram

  68. Future InfluxDB!

  69. Subject to Change! *disclaimer

  70. Requirements • Support InfluxDB 1.x Data Model • Support InfluxDB

    1.x QL • Support Prometheus Data Model • Functional Query Language • Rich Query Builder UI • Query Completion CLI • PromQL?
  71. InfluxDB 2.0 Data Model • Tags • non-string values? •

    Value • int64 • uint64 • float64 • bool • string • bytes • Timestamp (nanosecond)
  72. No More Measurement!

  73. No more fields?! yep, but remember joining and merging!

  74. SIMPLE

  75. InfluxDB 2.0 Line Protocol system:cpu,region:cpu,host:a,metric:user_idle 23.2 1491675816

  76. InfluxDB 2.0 Line Protocol system:cpu,region:cpu,host:a,metric:user_idle 23.2 1491675816 Tags

  77. InfluxDB 2.0 Line Protocol system:cpu,region:cpu,host:a,metric:user_idle 23.2 1491675816 Key

  78. InfluxDB 2.0 Line Protocol system:cpu,region:cpu,host:a,metric:user_idle 23.2 1491675816 Value

  79. InfluxDB 2.0 Line Protocol system:cpu,region:cpu,host:a,metric:user_idle 23.2 1491675816 Separators

  80. InfluxDB 2.0 Line Protocol system:cpu,region:cpu,host:a,metric:user_idle 23.2 1491675816 spaces, /, and

    : must be escaped
  81. InfluxDB 2.0 Line Protocol system:cpu,region:cpu,host:a,metric:user_idle 23.2 1491675816 float64 value

  82. InfluxDB 2.0 Line Protocol system:cpu,region:cpu,host:a,metric:user_idle 23 1491675816 float64 value

  83. InfluxDB 2.0 Line Protocol system:cpu,region:cpu,host:a,metric:user_idle 23 1491675816 time (precision assumed

    closest to now)
  84. InfluxDB 2.0 Line Protocol system:cpu,region:cpu,host:a,metric:user_idle 23 1491675816000000us time (precision specified)

  85. InfluxDB 2.0 Line Protocol system:cpu,region:cpu,host:a,metric:user_idle 23.2 2017-04-08T14:23:54Z time (RFC3339Nano)

  86. InfluxDB 2.0 Line Protocol name:foo 2i 1491675816 int64 value

  87. InfluxDB 2.0 Line Protocol name:foo 2u 1491675816 uint64 value

  88. InfluxDB 2.0 Line Protocol name:foo [234, 21, 9, 23, 87,

    90, 11, 54] 1491675816 bytes value
  89. InfluxDB 2.0 Line Protocol name:foo “it’s a string, yo!” 2017-04-18T14:58:00Z

    string value
  90. InfluxDB 2.0 Line Protocol name:foo f 2017-04-18T14:58:00Z bool value

  91. InfluxQL 2.0 functional!

  92. f1(f2(f3(f4(streams)))) Lisp?

  93. None
  94. None
  95. Paul Graham, Rich Hickey

  96. D3 d3.select("body") .selectAll("p") .data([4, 8, 15, 16, 23, 42]) .enter().append("p")

    .text(function(d) { return "I’m number " + d + "!"; });
  97. Function Chaining!

  98. Series { "id": 24, "meta": { "dataType": "float64", "metricType": "gauge"

    }, "tagset": { "host": "A", "region": "B" }, "vector": [ {"value":23.2, "epoch":1491499253}, {"value":78.1, "epoch":1491499263, "tagset":{"host":"B"}} ] }
  99. Matrix [ { "tagset": { "host": "A", }, "vector": [{"value":23.1,

    "epoch":1491499253}, {"value":56.2, "epoch":1491499263}] }, { "tagset": { "host": "B" }, "vector": [{"value":23.1, "epoch":1491499253}, {"value":56.2, "epoch":1491499263}] } ]
  100. Example database(name:"testdb") .select(criteria:`"host" = 'A' and "system" = 'cpu'`) .range(startOffset:"-1h")

  101. Named Parameters database(name:"testdb") .select(criteria:`"host" = 'A' and "system" = 'cpu'`)

    .range(startOffset:"-1h") Named parameters!
  102. Example Queries database(name:"testdb") .select(criteria:`"host" = 'A' and "system" = 'cpu'`)

    .range(startOffset:"-1h") Wrap strings in back ticks to avoid pesky escaping
  103. Database returns Matrix database(name:"testdb") [ { "tagset": { "host": "A",

    }, "vector": [ {"value":23.1, “epoch":1491499253}, {"value":56.2, "epoch":1491499263}] }, { "tagset": { "host": "B" }, "vector": [ {"value":23.1, “epoch":1491499253}, {"value":56.2, "epoch":1491499263}] } ]
  104. Select filters vectors database(name:"testdb") .select(criteria:`"host" = 'A' and "system" =

    'cpu'`)
  105. Select filters vectors database(name:"testdb") .select(criteria:`"host" = 'A' and "system" =

    'cpu'`) Tag keys
  106. Select filters vectors database(name:"testdb") .select(criteria:`"host" = 'A' and "system" =

    'cpu'`) Tag values
  107. Complex Criteria database(name:"testdb") .select(criteria:`”t1” = ‘foo’ AND (“t2” = ‘bar’

    OR “t3” = ‘asdf’)`)
  108. Criteria Operators • = • != • =~ • !~

    • < • > • startsWith • in • notIn
  109. What hosts do we have? database(name:"testdb") .values(key:"host") .sort() .limit(n:20)

  110. How many hosts? database(name:"testdb") .values(key:”host") .count()

  111. // get the cpu load of hosts that have mysql

    running var db = database(name:"testdb") db.select( criteria:`"system" = 'cpu' and "metric" = 'load' and "host" in #{ db.select(`"service" = 'mysql'`).values(key:"host") }`) .range(startOffset:"-4h")
  112. // get the cpu load of hosts that have mysql

    running var db = database(name:"testdb") db.select( criteria:`"system" = 'cpu' and "metric" = 'load' and "host" in #{ db.select(`"service" = 'mysql'`).values(key:"host") }`) .range(startOffset:"-4h") Variables
  113. // get the cpu load of hosts that have mysql

    running var db = database(name:"testdb") db.select( criteria:`"system" = 'cpu' and "metric" = 'load' and "host" in #{ db.select(`"service" = 'mysql'`).values(key:"host") }`) .range(startOffset:"-4h") string interpolation
  114. // get the count in 10m periods in the last

    24h from an event stream // and filter that to only include those periods that were 2 sigma above the average var m = database(name:”testdb”).select(criteria:"\"event\" = 'pageview'") .range(startOffset:"-24h") .merge() .window(func:count(),duration:"10m")
  115. // get the count in 10m periods in the last

    24h from an event stream // and filter that to only include those periods that were 2 sigma above the average var m = database(name:”testdb”).select(criteria:”\”event\” = 'pageview'") .range(startOffset:"-24h") .merge() .window(func:count(),duration:"10m") // this is shorthand for m.stddev.join(op:"*", right:2) var sigma = m.stddev() * 2
  116. // get the count in 10m periods in the last

    24h from an event stream // and filter that to only include those periods that were 2 sigma above the average var m = database(name:”testdb”).select(criteria:”\”event\” = 'pageview'") .range(startOffset:"-24h") .merge() .window(func:count(),duration:"10m") // this is shorthand for m.stddev.join(op:"*", right:2) var sigma = m.stddev() * 2 // return only the counts 1 sigma above m.filter(exp:"$ > #{sigma}")
  117. // return the last hour of time series of the

    top 10 host cpu utilizations by // their average load over last 10 minutes var topTen = db.select(criteria:”\”metric\” = 'load' and system = 'cpu'") .range(startOffset:"-10m") .mean() .sort(func:first()) .slice(end:10) .values(key:"host") db.select(criteria:”\"metric\" = 'load' and system = 'cpu' and host in #{topTen}") .range(startOffset:"-1h")
  118. Functions • interpolate • join • merge • timeShift •

    window • rate • first, last, min, max, mean, percentile, etc.
  119. Public docs PR in two weeks! please to give feedback

    :)
  120. Thank you. Paul Dix paul@influxdb.com @pauldix