Slide 1

Slide 1 text

Time Series Data Models & the Query Languages That Love Them Paul Dix paul@influxdb.com @pauldix

Slide 2

Slide 2 text

Future of InfluxDB’s Data Model & Query Language

Slide 3

Slide 3 text

CTO & co-founder makers of

Slide 4

Slide 4 text

Founder of NYC Machine Learning Meetup

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Editor Addison Wesley’s Data & Analytics

Slide 7

Slide 7 text

Recovering Rubyist

Slide 8

Slide 8 text

Data Models

Slide 9

Slide 9 text

Graphite apps.backend.server_01.counters.requests.count

Slide 10

Slide 10 text

Graphite apps.backend.server_01.counters.requests.count Hierarchy

Slide 11

Slide 11 text

Graphite apps.backend.server_01.counters.requests.count Hierarchy

Slide 12

Slide 12 text

Graphite apps.backend.server_01.counters.requests.count Value - double precision float Time - second precision epoch

Slide 13

Slide 13 text

Graphite apps.backend.server_01.counters.requests.count Value - double precision float Time - second precision epoch regular series only!

Slide 14

Slide 14 text

Regular time series t0 t1 t2 t3 t4 t6 t7 Samples at regular intervals

Slide 15

Slide 15 text

Irregular time series t0 t1 t2 t3 t4 t6 t7 Events whenever they come in

Slide 16

Slide 16 text

OpenTSDB sys.cpu.user host=webserver01,cpu=3 1356998400 1.2

Slide 17

Slide 17 text

OpenTSDB sys.cpu.user host=webserver01,cpu=3 1356998400 1.2 Metric

Slide 18

Slide 18 text

OpenTSDB sys.cpu.user host=webserver01,cpu=3 1356998400 1.2 Tags (string key/value pairs)

Slide 19

Slide 19 text

OpenTSDB sys.cpu.user host=webserver01,cpu=3 1356998400 1.2 millisecond precision epoch

Slide 20

Slide 20 text

OpenTSDB sys.cpu.user host=webserver01,cpu=3 1356998400 1.2 Value int64 or float64 (2.4)

Slide 21

Slide 21 text

Prometheus http_requests_total{method="post",code="200"} 1027 1395066363000

Slide 22

Slide 22 text

Prometheus http_requests_total{method="post",code="200"} 1027 1395066363000 Metric

Slide 23

Slide 23 text

Prometheus http_requests_total{method="post",code="200"} 1027 1395066363000 Labels (string key/value pairs)

Slide 24

Slide 24 text

Prometheus http_requests_total{method="post",code="200"} 1027 1395066363000 Value - float64

Slide 25

Slide 25 text

Prometheus http_requests_total{method="post",code="200"} 1027 1395066363000 millisecond precision epoch

Slide 26

Slide 26 text

InfluxDB 1.x cpu,host=serverA,region=west user=23.2,system=54.1 1465839830100400200

Slide 27

Slide 27 text

InfluxDB 1.x cpu,host=serverA,region=west user=23.2,system=54.1 1465839830100400200 Measurement

Slide 28

Slide 28 text

InfluxDB 1.x cpu,host=serverA,region=west user=23.2,system=54.1 1465839830100400200 Tags (string key/value pairs)

Slide 29

Slide 29 text

InfluxDB 1.x cpu,host=serverA,region=west user=23.2,system=54.1 1465839830100400200 Fields (key/value pairs)

Slide 30

Slide 30 text

InfluxDB 1.x cpu,host=serverA,region=west user=23.2,system=54.1 1465839830100400200 float64 value

Slide 31

Slide 31 text

InfluxDB 1.x cpu,host=serverA,region=west foo=23i 1465839830100400200 int64 value

Slide 32

Slide 32 text

InfluxDB 1.x cpu,host=serverA,region=west bar=t 1465839830100400200 bool value

Slide 33

Slide 33 text

InfluxDB 1.x cpu,host=serverA,region=west line=“some text here” 1465839830100400200 string value

Slide 34

Slide 34 text

InfluxDB 1.x cpu,host=serverA,region=west user=23.2,system=54.1 1465839830100400200 nanosecond precision epoch

Slide 35

Slide 35 text

Differences organization data types precision Graphite hierarchical float64 seconds OpenTSDB metric, tags float64 milliseconds Prometheus metric, tags float64 milliseconds InfluxDB 1.x metric, tags, fields float64, int64, bool, string nanoseconds

Slide 36

Slide 36 text

Querying

Slide 37

Slide 37 text

Data Exploration what series do I have

Slide 38

Slide 38 text

Retrieval & Computation raw data, transforms, materialized series, aggregates, samples

Slide 39

Slide 39 text

Organization matters with thousands of series or more

Slide 40

Slide 40 text

Hierarchy Tree!

Slide 41

Slide 41 text

Lookup metrics/measurements OpenTSDB /api/search/lookup?query= (all series) Prometheus {__name__=~“.+”} (all series?) InfluxDB SHOW MEASUREMENTS

Slide 42

Slide 42 text

Lookup tag/label keys OpenTSDB /api/search/lookup?query= (all series) Prometheus {__name__=~“.+”} (all series?) InfluxDB SHOW TAG KEYS

Slide 43

Slide 43 text

Lookup tag/label values OpenTSDB /api/search/lookup?query={host=*} (all series with host?) Prometheus {host=~“.+”} (all series with host?) InfluxDB SHOW TAG VALUES WITH KEY = “host” SHOW TAG VAVUES FROM “cpu” with KEY = “host”

Slide 44

Slide 44 text

Drill Down OpenTSDB /api/search/lookup?query={host=*,service=mysql} (all metrics on the hosts) Prometheus {host=~“.+”,service=mysql} (all metrics on the hosts) InfluxDB SHOW TAG VALUES WITH KEY = “host” WHERE “service” = ‘mysql’

Slide 45

Slide 45 text

Why to care about drill down (faceted search)

Slide 46

Slide 46 text

Why to care about drill down (faceted search)

Slide 47

Slide 47 text

Why to care about drill down (faceted search) select nonstop, LGA, JFK

Slide 48

Slide 48 text

Facets __name__ host service region group …

Slide 49

Slide 49 text

Facets __name__ host service region group … go_goroutines go_memstats_alloc_bytes go_memstats_alloc_bytes_total go_memstats_gc_sys_bytes go_memstats_other_sys_bytes …

Slide 50

Slide 50 text

Facets __name__ host service region group … go_goroutines go_memstats_alloc_bytes go_memstats_alloc_bytes_total go_memstats_gc_sys_bytes go_memstats_other_sys_bytes … host service region group …

Slide 51

Slide 51 text

Facets __name__ host service region group … go_goroutines go_memstats_alloc_bytes go_memstats_alloc_bytes_total go_memstats_gc_sys_bytes go_memstats_other_sys_bytes … host service region group … dynamic hierarchy! (name already selected)

Slide 52

Slide 52 text

Labels/Tags > Hierarchy

Slide 53

Slide 53 text

Up Front Design

Slide 54

Slide 54 text

Powerful Discovery

Slide 55

Slide 55 text

Slicing, dicing, grouping

Slide 56

Slide 56 text

Query Languages Query Language Example Graphite functional target, from, until sumSeries(summarize(water.level.h2o.feet.*, '1hour', 'max')) OpenTSDB http params startTime, endTime, metric, aggregationFunction, filter, functions, expressions Prometheus functional-ish increase(http_requests_total{job=“prometheus”}[5m]) InfluxDB 1.x SQL-ish select mean(system) from cpu where time > now() - 6h group by time(10m)

Slide 57

Slide 57 text

Functional > SQL or API

Slide 58

Slide 58 text

Time series are streams

Slide 59

Slide 59 text

Apply Functions!

Slide 60

Slide 60 text

Selection what series (streams) are we working with?

Slide 61

Slide 61 text

Timing what time range are we interested in?

Slide 62

Slide 62 text

Merging multiple streams into 1

Slide 63

Slide 63 text

Joining /, *, +, -, &, |, ^, filter

Slide 64

Slide 64 text

Partitioning do we slice the stream into blocks of time?

Slide 65

Slide 65 text

Sampling first, last, min, max, filters

Slide 66

Slide 66 text

Transforming time shift, derivative, rate, interpolate

Slide 67

Slide 67 text

Summarizing count, percentile, mean, median, mode, histogram

Slide 68

Slide 68 text

Future InfluxDB!

Slide 69

Slide 69 text

Subject to Change! *disclaimer

Slide 70

Slide 70 text

Requirements • Support InfluxDB 1.x Data Model • Support InfluxDB 1.x QL • Support Prometheus Data Model • Functional Query Language • Rich Query Builder UI • Query Completion CLI • PromQL?

Slide 71

Slide 71 text

InfluxDB 2.0 Data Model • Tags • non-string values? • Value • int64 • uint64 • float64 • bool • string • bytes • Timestamp (nanosecond)

Slide 72

Slide 72 text

No More Measurement!

Slide 73

Slide 73 text

No more fields?! yep, but remember joining and merging!

Slide 74

Slide 74 text

SIMPLE

Slide 75

Slide 75 text

InfluxDB 2.0 Line Protocol system:cpu,region:cpu,host:a,metric:user_idle 23.2 1491675816

Slide 76

Slide 76 text

InfluxDB 2.0 Line Protocol system:cpu,region:cpu,host:a,metric:user_idle 23.2 1491675816 Tags

Slide 77

Slide 77 text

InfluxDB 2.0 Line Protocol system:cpu,region:cpu,host:a,metric:user_idle 23.2 1491675816 Key

Slide 78

Slide 78 text

InfluxDB 2.0 Line Protocol system:cpu,region:cpu,host:a,metric:user_idle 23.2 1491675816 Value

Slide 79

Slide 79 text

InfluxDB 2.0 Line Protocol system:cpu,region:cpu,host:a,metric:user_idle 23.2 1491675816 Separators

Slide 80

Slide 80 text

InfluxDB 2.0 Line Protocol system:cpu,region:cpu,host:a,metric:user_idle 23.2 1491675816 spaces, /, and : must be escaped

Slide 81

Slide 81 text

InfluxDB 2.0 Line Protocol system:cpu,region:cpu,host:a,metric:user_idle 23.2 1491675816 float64 value

Slide 82

Slide 82 text

InfluxDB 2.0 Line Protocol system:cpu,region:cpu,host:a,metric:user_idle 23 1491675816 float64 value

Slide 83

Slide 83 text

InfluxDB 2.0 Line Protocol system:cpu,region:cpu,host:a,metric:user_idle 23 1491675816 time (precision assumed closest to now)

Slide 84

Slide 84 text

InfluxDB 2.0 Line Protocol system:cpu,region:cpu,host:a,metric:user_idle 23 1491675816000000us time (precision specified)

Slide 85

Slide 85 text

InfluxDB 2.0 Line Protocol system:cpu,region:cpu,host:a,metric:user_idle 23.2 2017-04-08T14:23:54Z time (RFC3339Nano)

Slide 86

Slide 86 text

InfluxDB 2.0 Line Protocol name:foo 2i 1491675816 int64 value

Slide 87

Slide 87 text

InfluxDB 2.0 Line Protocol name:foo 2u 1491675816 uint64 value

Slide 88

Slide 88 text

InfluxDB 2.0 Line Protocol name:foo [234, 21, 9, 23, 87, 90, 11, 54] 1491675816 bytes value

Slide 89

Slide 89 text

InfluxDB 2.0 Line Protocol name:foo “it’s a string, yo!” 2017-04-18T14:58:00Z string value

Slide 90

Slide 90 text

InfluxDB 2.0 Line Protocol name:foo f 2017-04-18T14:58:00Z bool value

Slide 91

Slide 91 text

InfluxQL 2.0 functional!

Slide 92

Slide 92 text

f1(f2(f3(f4(streams)))) Lisp?

Slide 93

Slide 93 text

No content

Slide 94

Slide 94 text

No content

Slide 95

Slide 95 text

Paul Graham, Rich Hickey

Slide 96

Slide 96 text

D3 d3.select("body") .selectAll("p") .data([4, 8, 15, 16, 23, 42]) .enter().append("p") .text(function(d) { return "I’m number " + d + "!"; });

Slide 97

Slide 97 text

Function Chaining!

Slide 98

Slide 98 text

Series { "id": 24, "meta": { "dataType": "float64", "metricType": "gauge" }, "tagset": { "host": "A", "region": "B" }, "vector": [ {"value":23.2, "epoch":1491499253}, {"value":78.1, "epoch":1491499263, "tagset":{"host":"B"}} ] }

Slide 99

Slide 99 text

Matrix [ { "tagset": { "host": "A", }, "vector": [{"value":23.1, "epoch":1491499253}, {"value":56.2, "epoch":1491499263}] }, { "tagset": { "host": "B" }, "vector": [{"value":23.1, "epoch":1491499253}, {"value":56.2, "epoch":1491499263}] } ]

Slide 100

Slide 100 text

Example database(name:"testdb") .select(criteria:`"host" = 'A' and "system" = 'cpu'`) .range(startOffset:"-1h")

Slide 101

Slide 101 text

Named Parameters database(name:"testdb") .select(criteria:`"host" = 'A' and "system" = 'cpu'`) .range(startOffset:"-1h") Named parameters!

Slide 102

Slide 102 text

Example Queries database(name:"testdb") .select(criteria:`"host" = 'A' and "system" = 'cpu'`) .range(startOffset:"-1h") Wrap strings in back ticks to avoid pesky escaping

Slide 103

Slide 103 text

Database returns Matrix database(name:"testdb") [ { "tagset": { "host": "A", }, "vector": [ {"value":23.1, “epoch":1491499253}, {"value":56.2, "epoch":1491499263}] }, { "tagset": { "host": "B" }, "vector": [ {"value":23.1, “epoch":1491499253}, {"value":56.2, "epoch":1491499263}] } ]

Slide 104

Slide 104 text

Select filters vectors database(name:"testdb") .select(criteria:`"host" = 'A' and "system" = 'cpu'`)

Slide 105

Slide 105 text

Select filters vectors database(name:"testdb") .select(criteria:`"host" = 'A' and "system" = 'cpu'`) Tag keys

Slide 106

Slide 106 text

Select filters vectors database(name:"testdb") .select(criteria:`"host" = 'A' and "system" = 'cpu'`) Tag values

Slide 107

Slide 107 text

Complex Criteria database(name:"testdb") .select(criteria:`”t1” = ‘foo’ AND (“t2” = ‘bar’ OR “t3” = ‘asdf’)`)

Slide 108

Slide 108 text

Criteria Operators • = • != • =~ • !~ • < • > • startsWith • in • notIn

Slide 109

Slide 109 text

What hosts do we have? database(name:"testdb") .values(key:"host") .sort() .limit(n:20)

Slide 110

Slide 110 text

How many hosts? database(name:"testdb") .values(key:”host") .count()

Slide 111

Slide 111 text

// get the cpu load of hosts that have mysql running var db = database(name:"testdb") db.select( criteria:`"system" = 'cpu' and "metric" = 'load' and "host" in #{ db.select(`"service" = 'mysql'`).values(key:"host") }`) .range(startOffset:"-4h")

Slide 112

Slide 112 text

// get the cpu load of hosts that have mysql running var db = database(name:"testdb") db.select( criteria:`"system" = 'cpu' and "metric" = 'load' and "host" in #{ db.select(`"service" = 'mysql'`).values(key:"host") }`) .range(startOffset:"-4h") Variables

Slide 113

Slide 113 text

// get the cpu load of hosts that have mysql running var db = database(name:"testdb") db.select( criteria:`"system" = 'cpu' and "metric" = 'load' and "host" in #{ db.select(`"service" = 'mysql'`).values(key:"host") }`) .range(startOffset:"-4h") string interpolation

Slide 114

Slide 114 text

// get the count in 10m periods in the last 24h from an event stream // and filter that to only include those periods that were 2 sigma above the average var m = database(name:”testdb”).select(criteria:"\"event\" = 'pageview'") .range(startOffset:"-24h") .merge() .window(func:count(),duration:"10m")

Slide 115

Slide 115 text

// get the count in 10m periods in the last 24h from an event stream // and filter that to only include those periods that were 2 sigma above the average var m = database(name:”testdb”).select(criteria:”\”event\” = 'pageview'") .range(startOffset:"-24h") .merge() .window(func:count(),duration:"10m") // this is shorthand for m.stddev.join(op:"*", right:2) var sigma = m.stddev() * 2

Slide 116

Slide 116 text

// get the count in 10m periods in the last 24h from an event stream // and filter that to only include those periods that were 2 sigma above the average var m = database(name:”testdb”).select(criteria:”\”event\” = 'pageview'") .range(startOffset:"-24h") .merge() .window(func:count(),duration:"10m") // this is shorthand for m.stddev.join(op:"*", right:2) var sigma = m.stddev() * 2 // return only the counts 1 sigma above m.filter(exp:"$ > #{sigma}")

Slide 117

Slide 117 text

// return the last hour of time series of the top 10 host cpu utilizations by // their average load over last 10 minutes var topTen = db.select(criteria:”\”metric\” = 'load' and system = 'cpu'") .range(startOffset:"-10m") .mean() .sort(func:first()) .slice(end:10) .values(key:"host") db.select(criteria:”\"metric\" = 'load' and system = 'cpu' and host in #{topTen}") .range(startOffset:"-1h")

Slide 118

Slide 118 text

Functions • interpolate • join • merge • timeShift • window • rate • first, last, min, max, mean, percentile, etc.

Slide 119

Slide 119 text

Public docs PR in two weeks! please to give feedback :)

Slide 120

Slide 120 text

Thank you. Paul Dix paul@influxdb.com @pauldix