$30 off During Our Annual Pro Sale. View details »

Flux (#fluxlang): a new (time series) data scripting language

Paul Dix
June 14, 2018

Flux (#fluxlang): a new (time series) data scripting language

Talk from InfluxDays London on Flux, the new scripting language being built by InfluxData.

Paul Dix

June 14, 2018
Tweet

More Decks by Paul Dix

Other Decks in Technology

Transcript

  1. Flux (#fluxlang): a new (time series) data scripting language Paul

    Dix @pauldix paul@pauldix.net
  2. IFQL -> Flux

  3. Data scripting language?

  4. MIT License Language & Engine written in Go

  5. Talk Structure • Why Flux? • Design & Structure •

    Motivating Examples
  6. Why not SQL?

  7. Relational Algebra

  8. SQL isn’t the only interpretation!

  9. QUEL & POSTGRESQUEL range of E is EMPLOYEE retrieve into

    W (COMP = E.Salary / (E.Age - 18)) where E.Name = "Jones" select (e.salary / (e.age - 18)) as comp from employee as e where e.name = "Jones" SQL
  10. None
  11. Inertia

  12. Additions & Semantics

  13. Functional FTW!

  14. Rethink Programmer Productivity

  15. Language > Query

  16. Change Reality

  17. Existing Language?

  18. Haskell or Lisp!

  19. Flux Design Principles

  20. Useable

  21. Make Everyone a Data Programmer!

  22. Readable

  23. Flexible

  24. Composable

  25. Testable

  26. Contributable

  27. Shareable

  28. Beginning Examples

  29. None
  30. None
  31. showMeasurements(db: "telegraf")

  32. showMeasurements(db: "telegraf") Function

  33. showMeasurements(db: "telegraf") Named Argument

  34. showMeasurements(db: "telegraf") String Literal

  35. showTagKeys(db: "telegraf", measurement: "cpu")

  36. showTagKeys(db: "telegraf", measurement: "cpu") Named Arguments

  37. showTagKeys(db: "telegraf", measurements: ["redis", "mysql"])

  38. showTagKeys(db: "telegraf", measurements: ["redis", "mysql"]) Passing an array

  39. showTagValues(db: "telegraf", tag: "host")

  40. showFieldKeys(db:"telegraf", measurement:"cpu")

  41. // get all data from the telegraf db from(db:"telegraf") //

    filter that by the last hour |> range(start:-1h) // filter further by series with a specific measurement and field |> filter(fn: r => r._measurement == "cpu" and r._field == "usage_system")
  42. // get all data from the telegraf db from(db:"telegraf") //

    filter that by the last hour |> range(start:-1h) // filter further by series with a specific measurement and field |> filter(fn: r => r._measurement == "cpu" and r._field == "usage_system") Comments
  43. // get all data from the telegraf db from(db:"telegraf") //

    filter that by the last hour |> range(start:-1h) // filter further by series with a specific measurement and field |> filter(fn: r => r._measurement == "cpu" and r._field == "usage_system") Duration Literal
  44. // get all data from the telegraf db from(db:"telegraf") //

    filter that by the last hour |> range(start:-1h) // filter further by series with a specific measurement and field |> filter(fn: r => r._measurement == "cpu" and r._field == "usage_system") Pipe forward operator
  45. // get all data from the telegraf db from(db:"telegraf") //

    filter that by the last hour |> range(start:-1h) // filter further by series with a specific measurement and field |> filter(fn: r => r._measurement == "cpu" and r._field == "usage_system") Anonymous Function
  46. Operators + == != ( ) - < !~ [

    ] * > =~ { } / <= = , : % >= <- . |>
  47. Types • int • uint • float64 • string •

    duration • time • regex • array • object • function • namespace
  48. Functions Overview

  49. Inputs from, fromKafka, fromFile, fromS3, fromPrometheus, fromMySQL, etc.

  50. Outputs to, toKafka, toFile, toS3, toPrometheus, toMySQL, etc.

  51. Functions • count • covariance • cumulativeSum • derivative •

    difference • distinct • filter • first • from • group • integral • mean • min • percentile • range • sample • set • shift • skew • sort • spread • stateTracking • limit • map • max • window • yield • cov • highestMax • highestAverage • highestCurrent • lowestMin • join • last • stddev • sum • lowestAverage • lowestCurrent • pearsonR • stateCount • stateDuration • top • bottom
  52. Flux ⊇ Graphite

  53. Data Model

  54. Example Series _measurement=mem,host=A,region=west,_field=free _measurement=mem,host=B,region=west,_field=free _measurement=cpu,host=A,region=west,_field=usage_system _measurement=cpu,host=A,region=west,_field=usage_user

  55. Example Series _measurement=mem,host=A,region=west,_field=free _measurement=mem,host=B,region=west,_field=free _measurement=cpu,host=A,region=west,_field=usage_system _measurement=cpu,host=A,region=west,_field=usage_user Measurement

  56. Example Series _measurement=mem,host=A,region=west,_field=free _measurement=mem,host=B,region=west,_field=free _measurement=cpu,host=A,region=west,_field=usage_system _measurement=cpu,host=A,region=west,_field=usage_user Field

  57. Table _measurement host region _field _time _value mem A west

    free 2018-06-14T09:15:00 10 mem A west free 2018-06-14T09:14:50 10
  58. _measurement host region _field _time _value mem A west free

    2018-06-14T09:15:00 10 mem A west free 2018-06-14T09:14:50 10 Column
  59. _measurement host region _field _time _value mem A west free

    2018-06-14T09:15:00 10 mem A west free 2018-06-14T09:14:50 10 Record
  60. _measurement host region _field _time _value mem A west free

    2018-06-14T09:15:00 10 mem A west free 2018-06-14T09:14:50 10 Group Key _measurement=mem,host=A,region=west,_field=free
  61. _measurement host region _field _time _value mem A west free

    2018-06-14T09:15:00 10 mem A west free 2018-06-14T09:14:50 10 Every record has the same value! _measurement=mem,host=A,region=west,_field=free
  62. Table Per Series _measurement host region _field _time _value mem

    A west free 2018-06-14T09:15:00 10 mem A west free 2018-06-14T09:14:50 11 _measurement host region _field _time _value mem B west free 2018-06-14T09:15:00 20 mem B west free 2018-06-14T09:14:50 22 _measurement host region _field _time _value cpu A west usage_user 2018-06-14T09:15:00 45 cpu A west usage_user 2018-06-14T09:14:50 49 _measurement host region _field _time _value cpu A west usage_system 2018-06-14T09:15:00 35 cpu A west usage_system 2018-06-14T09:14:50 38
  63. input tables -> function -> output tables

  64. input tables -> function -> output tables // example query

    from(db:"telegraf") |> range(start:2018-06-14T09:14:50, start:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> sum()
  65. input tables -> function -> output tables DateTime Literal //

    example query from(db:"telegraf") |> range(start:2018-06-14T09:14:50, start:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> sum()
  66. input tables -> function -> output tables What to sum

    on? // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:50, start:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> sum()
  67. input tables -> function -> output tables Default columns argument

    // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:50, start:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> sum(columns: [“_value”])
  68. input tables -> function -> output tables _meas ureme host

    region _field _time _valu e mem A west free 2018-06- 14T09:1 10 mem A west free 2018-06- 14T09:1 11 _meas ureme host region _field _time _valu e mem B west free 2018-06- 14T09:15 20 mem B west free 2018-06- 14T09:14 22 Input in table form // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:50, start:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> sum()
  69. input tables -> function -> output tables _meas ureme host

    region _field _time _valu e mem A west free 2018-06- 14T09:1 10 mem A west free 2018-06- 14T09:1 11 _meas ureme host region _field _time _valu e mem B west free 2018-06- 14T09:15 20 mem B west free 2018-06- 14T09:14 22 sum() // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:50, start:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> sum()
  70. input tables -> function -> output tables // example query

    from(db:"telegraf") |> range(start:2018-06-14T09:14:50, start:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> sum() _meas ureme host region _field _time _valu e mem A west free 2018-06- 14T09:1 10 mem A west free 2018-06- 14T09:1 11 _meas ureme host region _field _time _valu e mem B west free 2018-06- 14T09:15 20 mem B west free 2018-06- 14T09:14 22 sum() _meas ureme host region _field _time _valu e mem A west free 2018-06- 14T09:1 21 _meas ureme host region _field _time _valu e mem B west free 2018-06- 14T09:15 42
  71. N to N table mapping (1 to 1 mapping)

  72. N to M table mapping

  73. window // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:30, end:2018-06-14T09:15:01) |> filter(fn:

    r => r._measurement == “mem" and r._field == “free”) |> window(every:20s) 30s of data (4 samples)
  74. window // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:30, end:2018-06-14T09:15:01) |> filter(fn:

    r => r._measurement == “mem" and r._field == “free”) |> window(every:20s) split into 20s windows
  75. window _meas host region _field _time _valu mem A west

    free …14:30 10 mem A west free …14:40 11 mem A west free …14:50 12 mem A west free …15:00 13 _meas host region _field _time _valu mem B west free …14:30 20 mem B west free …14:40 22 mem B west free …14:50 23 mem B west free …15:00 24 // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:30, end:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> window(every:20s) Input
  76. window _meas host region _field _time _valu mem A west

    free …14:30 10 mem A west free …14:40 11 mem A west free …14:50 12 mem A west free …15:00 13 _meas host region _field _time _valu mem B west free …14:30 20 mem B west free …14:40 22 mem B west free …14:50 23 mem B west free …15:00 24 window( every:20s) // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:30, end:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> window(every:20s)
  77. window _meas host region _field _time _valu mem A west

    free …14:30 10 mem A west free …14:40 11 mem A west free …14:50 12 mem A west free …15:00 13 _meas host region _field _time _valu mem B west free …14:30 20 mem B west free …14:40 22 mem B west free …14:50 23 mem B west free …15:00 24 window( every:20s) // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:30, end:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> window(every:20s) _meas ureme host region _field _time _valu e mem A west free …14:30 10 mem A west free …14:40 11 _meas ureme host region _field _time _valu e mem B west free …14:50 23 mem B west free …15:00 24 _meas ureme host region _field _time _valu e mem B west free …14:30 20 mem B west free …14:40 22 _meas ureme host region _field _time _valu e mem A west free …14:50 12 mem A west free …15:00 13
  78. window _meas host region _field _time _valu mem A west

    free …14:30 10 mem A west free …14:40 11 mem A west free …14:50 12 mem A west free …15:00 13 _meas host region _field _time _valu mem B west free …14:30 20 mem B west free …14:40 22 mem B west free …14:50 23 mem B west free …15:00 24 window( every:20s) // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:30, end:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> window(every:20s) _meas ureme host region _field _time _valu e mem A west free …14:30 10 mem A west free …14:40 11 _meas ureme host region _field _time _valu e mem B west free …14:50 23 mem B west free …15:00 24 _meas ureme host region _field _time _valu e mem B west free …14:30 20 mem B west free …14:40 22 _meas ureme host region _field _time _valu e mem A west free …14:50 12 mem A west free …15:00 13 N to M tables
  79. Window based on time _start and _stop columns

  80. group // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:30, end:2018-06-14T09:15:01) |> filter(fn:

    r => r._measurement == “mem" and r._field == “free”) |> group(keys:[“region"])
  81. group // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:30, end:2018-06-14T09:15:01) |> filter(fn:

    r => r._measurement == “mem" and r._field == “free”) |> group(keys:[“region"]) new partition key
  82. group _meas host region _field _time _valu mem A west

    free …14:30 10 mem A west free …14:40 11 mem A west free …14:50 12 mem A west free …15:00 13 _meas host region _field _time _valu mem B west free …14:30 20 mem B west free …14:40 22 mem B west free …14:50 23 mem B west free …15:00 24 // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:30, end:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> group(keys:[“region"])
  83. group _meas host region _field _time _valu mem A west

    free …14:30 10 mem A west free …14:40 11 mem A west free …14:50 12 mem A west free …15:00 13 _meas host region _field _time _valu mem B west free …14:30 20 mem B west free …14:40 22 mem B west free …14:50 23 mem B west free …15:00 24 group( keys: [“region”]) // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:30, end:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> group(keys:[“region"]) _meas ureme host region _field _time _valu e mem A west free …14:30 10 mem B west free …14:30 20 mem A west free …14:40 11 mem B west free …14:40 21 mem A west free …14:50 12 mem B west free …14:50 22 mem B west free …15:00 13 mem B west free …15:00 23 N to M tables M == cardinality(group keys)
  84. Group based on columns

  85. Composable & Flexible

  86. showTagValues(db: "telegraf", tag: "host")

  87. New argument, same function definition showTagValues(db: "telegraf", tag: "host", startTime:

    2018-06-14T09:15:00)
  88. showTagValues = (db, tag, start=-1h, stop=now(), predicate=(r) => true) =>

    from(db:db) |> range(start:start, stop:stop) |> filter(fn: predicate) |> group(by:[tag]) // get the distinct values for the tag |> distinct(column:tag) // collapse all tables into one |> group(none:true) // drop all columns except _value |> keep(columns: ["_value"])
  89. showTagValues = (db, tag, start=-1h, stop=now(), predicate=(r) => true) =>

    from(db:db) |> range(start:start, stop:stop) |> filter(fn: predicate) |> group(by:[tag]) // get the distinct values for the tag |> distinct(column:tag) // collapse all tables into one |> group(none:true) // drop all columns except _value |> keep(columns: ["_value"]) Assign function to variable
  90. showTagValues = (db, tag, start=-1h, stop=now(), predicate=(r) => true) =>

    from(db:db) |> range(start:start, stop:stop) |> filter(fn: predicate) |> group(by:[tag]) // get the distinct values for the tag |> distinct(column:tag) // collapse all tables into one |> group(none:true) // drop all columns except _value |> keep(columns: ["_value"]) Specify default argument value to make optional
  91. showTagValues = (db, tag, start=-1h, stop=now(), predicate=(r) => true) =>

    from(db:db) |> range(start:start, stop:stop) |> filter(fn: predicate) |> group(by:[tag]) // get the distinct values for the tag |> distinct(column:tag) // collapse all tables into one |> group(none:true) // drop all columns except _value |> keep(columns: ["_value"]) now function
  92. showTagValues = (db, tag, start=-1h, stop=now(), predicate=(r) => true) =>

    from(db:db) |> range(start:start, stop:stop) |> filter(fn: predicate) |> group(by:[tag]) // get the distinct values for the tag |> distinct(column:tag) // collapse all tables into one |> group(none:true) // drop all columns except _value |> keep(columns: ["_value"]) pass function as argument
  93. showTagValues = (db, tag, start=-1h, stop=now(), predicate=(r) => true) =>

    from(db:db) |> range(start:start, stop:stop) |> filter(fn: predicate) |> group(by:[tag]) // get the distinct values for the tag |> distinct(column:tag) // collapse all tables into one |> group(none:true) // drop all columns except _value |> keep(columns: ["_value"])
  94. showTagValues( db:"telegraf", tag:"host", predicate: (r) => r._measurement == "redis")

  95. Defining functions that take inputs // convert all values into

    floats from(db:"telegraf") |> range(start:-1h) |> filter(fn: (r) => r._measurement == "foo") |> map(fn: (r) => float(v:r._value))
  96. Defining functions that take inputs // convert all values into

    floats from(db:"telegraf") |> range(start:-1h) |> filter(fn: (r) => r._measurement == "foo") |> map(fn: (r) => float(v:r._value)) map function
  97. Defining functions that take inputs // convert all values into

    floats from(db:"telegraf") |> range(start:-1h) |> filter(fn: (r) => r._measurement == "foo") |> map(fn: (r) => float(v:r._value)) float function
  98. Defining functions that take inputs // convert all values into

    floats from(db:"telegraf") |> range(start:-1h) |> filter(fn: (r) => r._measurement == "foo") |> map(fn: (r) => float(v:r._value)) only named arguments!
  99. Defining functions that take inputs // convert all values into

    floats from(db:"telegraf") |> range(start:-1h) |> filter(fn: (r) => r._measurement == "foo") |> map(fn: (r) => float(v:r._value)) make this a function?
  100. Defining functions that take inputs castToFloat = (table=<-) { return

    table |> map(fn: (r) => float(v:r._value)) } user defined pipe forwardable function
  101. Defining functions that take inputs // calling it from(db:"telegraf") |>

    range(start:-1h) |> filter(fn: (r) => r._measurement == "foo") |> castToFloat()
  102. Defining functions that take inputs // convert all values into

    floats from(db:"telegraf") |> range(start:-1h) |> filter(fn: (r) => r._measurement == "foo") |> map(fn: (r) => float(v:r._value)) castToFloat = (table=<-) { return table |> map(fn: (r) => float(v:r._value)) } from(db:"telegraf") |> range(start:-1h) |> filter(fn: (r) => r._measurement == "foo") |> castToFloat()
  103. Any pipe forward function can use arguments min(table: range(start: -1h,

    table: filter(fn: (r) => r.host == "server0", table: from(db: "mydb"))))
  104. Make you a Lisp

  105. New Query Functionality finally getting to those feature requests!

  106. None
  107. Math across measurements foo = from(db: "mydb") |> filter(fn: (r)

    => r._measurement == "foo") |> range(start: -1h) bar = from(db: "mydb") |> filter(fn: (r) => r._measurement == "bar") |> range(start: -1h) join( tables: {foo:foo, bar:bar}, on: [“foobar”, “_time”], fn: (t) => t.foo._value + t.bar._value, ) |> yield(name: "foobar")
  108. None
  109. Having Query from(db:"mydb") |> filter(fn: (r) => r._measurement == "cpu"

    and r._field == “usage_system”) |> range(start:-1h) |> window(every:10m) |> mean() // this is the having part |> filter(fn: (r) => r._value > 90)
  110. Shareable

  111. Imports and Namespaces import "math" from(db:"mydb") |> filter(fn: (r) =>

    r.host == "server0") |> range(start: -1h) |> math.square()
  112. Imports and Namespaces import "math" from(db:"mydb") |> filter(fn: (r) =>

    r.host == "server0") |> range(start: -1h) |> math.square() namespace
  113. Package Manager

  114. Imports and Namespaces import “pauldix/math" from(db:"mydb") |> filter(fn: (r) =>

    r.host == "server0") |> range(start: -1h) |> math.square() Username like RubyGems
  115. Public Package Repository (like RubyGems, npm, etc.)

  116. Imports and Namespaces import “github.com/pauldix/math” from(db:"mydb") |> filter(fn: (r) =>

    r.host == "server0") |> range(start: -1h) |> math.square() Or from Github
  117. Difficult SQL Queries

  118. Exponential Moving Average from(db:"telegraf") |> range(start:-1h) |> filter(fn: (r) =>

    r._measurement == "foo") |> exponentialMovingAverage(size:-10s)
  119. SQL rolling average select id, temp, avg(temp) over (partition by

    group_nr order by time_read) as rolling_avg from ( select id, temp, time_read, interval_group, id - row_number() over (partition by interval_group order by time_read) as group_nr from ( select id, time_read, 'epoch'::timestamp + '900 seconds'::interval * (extract(epoch from time_read)::int4 / 900) as interval_group, temp from readings ) t1 ) t2 order by time_read;
  120. Alerting import “alert” from(db:"telegraf") |> range(start: -1m) |> filter(fn: (r)

    => r._measurement == "work_queue" and r._field == "depth") |> mean() |> alert.track( warn: (r) => r._value > 200, crit: (r) => r._value > 500) |> alert.limit(duration:1m) |> toSlack(config: loadConfig(key: "slack"))
  121. Wrap up

  122. Get the nightlies! InfluxDB, Flux, Chronograf http://influxdata.com/download

  123. Get the code, file issues! https://github.com/influxdata/platform

  124. SQL is a great thing

  125. But it’s not the only thing

  126. Thank you Paul Dix @pauldix paul@influxdata.com