Flux (#fluxlang): a new (time series) data scripting language

39b7a68b6cbc43ec7683ad0bcc4c9570?s=47 Paul Dix
June 14, 2018

Flux (#fluxlang): a new (time series) data scripting language

Talk from InfluxDays London on Flux, the new scripting language being built by InfluxData.

39b7a68b6cbc43ec7683ad0bcc4c9570?s=128

Paul Dix

June 14, 2018
Tweet

Transcript

  1. Flux (#fluxlang): a new (time series) data scripting language Paul

    Dix @pauldix paul@pauldix.net
  2. IFQL -> Flux

  3. Data scripting language?

  4. MIT License Language & Engine written in Go

  5. Talk Structure • Why Flux? • Design & Structure •

    Motivating Examples
  6. Why not SQL?

  7. Relational Algebra

  8. SQL isn’t the only interpretation!

  9. QUEL & POSTGRESQUEL range of E is EMPLOYEE retrieve into

    W (COMP = E.Salary / (E.Age - 18)) where E.Name = "Jones" select (e.salary / (e.age - 18)) as comp from employee as e where e.name = "Jones" SQL
  10. None
  11. Inertia

  12. Additions & Semantics

  13. Functional FTW!

  14. Rethink Programmer Productivity

  15. Language > Query

  16. Change Reality

  17. Existing Language?

  18. Haskell or Lisp!

  19. Flux Design Principles

  20. Useable

  21. Make Everyone a Data Programmer!

  22. Readable

  23. Flexible

  24. Composable

  25. Testable

  26. Contributable

  27. Shareable

  28. Beginning Examples

  29. None
  30. None
  31. showMeasurements(db: "telegraf")

  32. showMeasurements(db: "telegraf") Function

  33. showMeasurements(db: "telegraf") Named Argument

  34. showMeasurements(db: "telegraf") String Literal

  35. showTagKeys(db: "telegraf", measurement: "cpu")

  36. showTagKeys(db: "telegraf", measurement: "cpu") Named Arguments

  37. showTagKeys(db: "telegraf", measurements: ["redis", "mysql"])

  38. showTagKeys(db: "telegraf", measurements: ["redis", "mysql"]) Passing an array

  39. showTagValues(db: "telegraf", tag: "host")

  40. showFieldKeys(db:"telegraf", measurement:"cpu")

  41. // get all data from the telegraf db from(db:"telegraf") //

    filter that by the last hour |> range(start:-1h) // filter further by series with a specific measurement and field |> filter(fn: r => r._measurement == "cpu" and r._field == "usage_system")
  42. // get all data from the telegraf db from(db:"telegraf") //

    filter that by the last hour |> range(start:-1h) // filter further by series with a specific measurement and field |> filter(fn: r => r._measurement == "cpu" and r._field == "usage_system") Comments
  43. // get all data from the telegraf db from(db:"telegraf") //

    filter that by the last hour |> range(start:-1h) // filter further by series with a specific measurement and field |> filter(fn: r => r._measurement == "cpu" and r._field == "usage_system") Duration Literal
  44. // get all data from the telegraf db from(db:"telegraf") //

    filter that by the last hour |> range(start:-1h) // filter further by series with a specific measurement and field |> filter(fn: r => r._measurement == "cpu" and r._field == "usage_system") Pipe forward operator
  45. // get all data from the telegraf db from(db:"telegraf") //

    filter that by the last hour |> range(start:-1h) // filter further by series with a specific measurement and field |> filter(fn: r => r._measurement == "cpu" and r._field == "usage_system") Anonymous Function
  46. Operators + == != ( ) - < !~ [

    ] * > =~ { } / <= = , : % >= <- . |>
  47. Types • int • uint • float64 • string •

    duration • time • regex • array • object • function • namespace
  48. Functions Overview

  49. Inputs from, fromKafka, fromFile, fromS3, fromPrometheus, fromMySQL, etc.

  50. Outputs to, toKafka, toFile, toS3, toPrometheus, toMySQL, etc.

  51. Functions • count • covariance • cumulativeSum • derivative •

    difference • distinct • filter • first • from • group • integral • mean • min • percentile • range • sample • set • shift • skew • sort • spread • stateTracking • limit • map • max • window • yield • cov • highestMax • highestAverage • highestCurrent • lowestMin • join • last • stddev • sum • lowestAverage • lowestCurrent • pearsonR • stateCount • stateDuration • top • bottom
  52. Flux ⊇ Graphite

  53. Data Model

  54. Example Series _measurement=mem,host=A,region=west,_field=free _measurement=mem,host=B,region=west,_field=free _measurement=cpu,host=A,region=west,_field=usage_system _measurement=cpu,host=A,region=west,_field=usage_user

  55. Example Series _measurement=mem,host=A,region=west,_field=free _measurement=mem,host=B,region=west,_field=free _measurement=cpu,host=A,region=west,_field=usage_system _measurement=cpu,host=A,region=west,_field=usage_user Measurement

  56. Example Series _measurement=mem,host=A,region=west,_field=free _measurement=mem,host=B,region=west,_field=free _measurement=cpu,host=A,region=west,_field=usage_system _measurement=cpu,host=A,region=west,_field=usage_user Field

  57. Table _measurement host region _field _time _value mem A west

    free 2018-06-14T09:15:00 10 mem A west free 2018-06-14T09:14:50 10
  58. _measurement host region _field _time _value mem A west free

    2018-06-14T09:15:00 10 mem A west free 2018-06-14T09:14:50 10 Column
  59. _measurement host region _field _time _value mem A west free

    2018-06-14T09:15:00 10 mem A west free 2018-06-14T09:14:50 10 Record
  60. _measurement host region _field _time _value mem A west free

    2018-06-14T09:15:00 10 mem A west free 2018-06-14T09:14:50 10 Group Key _measurement=mem,host=A,region=west,_field=free
  61. _measurement host region _field _time _value mem A west free

    2018-06-14T09:15:00 10 mem A west free 2018-06-14T09:14:50 10 Every record has the same value! _measurement=mem,host=A,region=west,_field=free
  62. Table Per Series _measurement host region _field _time _value mem

    A west free 2018-06-14T09:15:00 10 mem A west free 2018-06-14T09:14:50 11 _measurement host region _field _time _value mem B west free 2018-06-14T09:15:00 20 mem B west free 2018-06-14T09:14:50 22 _measurement host region _field _time _value cpu A west usage_user 2018-06-14T09:15:00 45 cpu A west usage_user 2018-06-14T09:14:50 49 _measurement host region _field _time _value cpu A west usage_system 2018-06-14T09:15:00 35 cpu A west usage_system 2018-06-14T09:14:50 38
  63. input tables -> function -> output tables

  64. input tables -> function -> output tables // example query

    from(db:"telegraf") |> range(start:2018-06-14T09:14:50, start:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> sum()
  65. input tables -> function -> output tables DateTime Literal //

    example query from(db:"telegraf") |> range(start:2018-06-14T09:14:50, start:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> sum()
  66. input tables -> function -> output tables What to sum

    on? // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:50, start:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> sum()
  67. input tables -> function -> output tables Default columns argument

    // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:50, start:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> sum(columns: [“_value”])
  68. input tables -> function -> output tables _meas ureme host

    region _field _time _valu e mem A west free 2018-06- 14T09:1 10 mem A west free 2018-06- 14T09:1 11 _meas ureme host region _field _time _valu e mem B west free 2018-06- 14T09:15 20 mem B west free 2018-06- 14T09:14 22 Input in table form // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:50, start:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> sum()
  69. input tables -> function -> output tables _meas ureme host

    region _field _time _valu e mem A west free 2018-06- 14T09:1 10 mem A west free 2018-06- 14T09:1 11 _meas ureme host region _field _time _valu e mem B west free 2018-06- 14T09:15 20 mem B west free 2018-06- 14T09:14 22 sum() // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:50, start:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> sum()
  70. input tables -> function -> output tables // example query

    from(db:"telegraf") |> range(start:2018-06-14T09:14:50, start:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> sum() _meas ureme host region _field _time _valu e mem A west free 2018-06- 14T09:1 10 mem A west free 2018-06- 14T09:1 11 _meas ureme host region _field _time _valu e mem B west free 2018-06- 14T09:15 20 mem B west free 2018-06- 14T09:14 22 sum() _meas ureme host region _field _time _valu e mem A west free 2018-06- 14T09:1 21 _meas ureme host region _field _time _valu e mem B west free 2018-06- 14T09:15 42
  71. N to N table mapping (1 to 1 mapping)

  72. N to M table mapping

  73. window // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:30, end:2018-06-14T09:15:01) |> filter(fn:

    r => r._measurement == “mem" and r._field == “free”) |> window(every:20s) 30s of data (4 samples)
  74. window // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:30, end:2018-06-14T09:15:01) |> filter(fn:

    r => r._measurement == “mem" and r._field == “free”) |> window(every:20s) split into 20s windows
  75. window _meas host region _field _time _valu mem A west

    free …14:30 10 mem A west free …14:40 11 mem A west free …14:50 12 mem A west free …15:00 13 _meas host region _field _time _valu mem B west free …14:30 20 mem B west free …14:40 22 mem B west free …14:50 23 mem B west free …15:00 24 // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:30, end:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> window(every:20s) Input
  76. window _meas host region _field _time _valu mem A west

    free …14:30 10 mem A west free …14:40 11 mem A west free …14:50 12 mem A west free …15:00 13 _meas host region _field _time _valu mem B west free …14:30 20 mem B west free …14:40 22 mem B west free …14:50 23 mem B west free …15:00 24 window( every:20s) // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:30, end:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> window(every:20s)
  77. window _meas host region _field _time _valu mem A west

    free …14:30 10 mem A west free …14:40 11 mem A west free …14:50 12 mem A west free …15:00 13 _meas host region _field _time _valu mem B west free …14:30 20 mem B west free …14:40 22 mem B west free …14:50 23 mem B west free …15:00 24 window( every:20s) // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:30, end:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> window(every:20s) _meas ureme host region _field _time _valu e mem A west free …14:30 10 mem A west free …14:40 11 _meas ureme host region _field _time _valu e mem B west free …14:50 23 mem B west free …15:00 24 _meas ureme host region _field _time _valu e mem B west free …14:30 20 mem B west free …14:40 22 _meas ureme host region _field _time _valu e mem A west free …14:50 12 mem A west free …15:00 13
  78. window _meas host region _field _time _valu mem A west

    free …14:30 10 mem A west free …14:40 11 mem A west free …14:50 12 mem A west free …15:00 13 _meas host region _field _time _valu mem B west free …14:30 20 mem B west free …14:40 22 mem B west free …14:50 23 mem B west free …15:00 24 window( every:20s) // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:30, end:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> window(every:20s) _meas ureme host region _field _time _valu e mem A west free …14:30 10 mem A west free …14:40 11 _meas ureme host region _field _time _valu e mem B west free …14:50 23 mem B west free …15:00 24 _meas ureme host region _field _time _valu e mem B west free …14:30 20 mem B west free …14:40 22 _meas ureme host region _field _time _valu e mem A west free …14:50 12 mem A west free …15:00 13 N to M tables
  79. Window based on time _start and _stop columns

  80. group // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:30, end:2018-06-14T09:15:01) |> filter(fn:

    r => r._measurement == “mem" and r._field == “free”) |> group(keys:[“region"])
  81. group // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:30, end:2018-06-14T09:15:01) |> filter(fn:

    r => r._measurement == “mem" and r._field == “free”) |> group(keys:[“region"]) new partition key
  82. group _meas host region _field _time _valu mem A west

    free …14:30 10 mem A west free …14:40 11 mem A west free …14:50 12 mem A west free …15:00 13 _meas host region _field _time _valu mem B west free …14:30 20 mem B west free …14:40 22 mem B west free …14:50 23 mem B west free …15:00 24 // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:30, end:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> group(keys:[“region"])
  83. group _meas host region _field _time _valu mem A west

    free …14:30 10 mem A west free …14:40 11 mem A west free …14:50 12 mem A west free …15:00 13 _meas host region _field _time _valu mem B west free …14:30 20 mem B west free …14:40 22 mem B west free …14:50 23 mem B west free …15:00 24 group( keys: [“region”]) // example query from(db:"telegraf") |> range(start:2018-06-14T09:14:30, end:2018-06-14T09:15:01) |> filter(fn: r => r._measurement == “mem" and r._field == “free”) |> group(keys:[“region"]) _meas ureme host region _field _time _valu e mem A west free …14:30 10 mem B west free …14:30 20 mem A west free …14:40 11 mem B west free …14:40 21 mem A west free …14:50 12 mem B west free …14:50 22 mem B west free …15:00 13 mem B west free …15:00 23 N to M tables M == cardinality(group keys)
  84. Group based on columns

  85. Composable & Flexible

  86. showTagValues(db: "telegraf", tag: "host")

  87. New argument, same function definition showTagValues(db: "telegraf", tag: "host", startTime:

    2018-06-14T09:15:00)
  88. showTagValues = (db, tag, start=-1h, stop=now(), predicate=(r) => true) =>

    from(db:db) |> range(start:start, stop:stop) |> filter(fn: predicate) |> group(by:[tag]) // get the distinct values for the tag |> distinct(column:tag) // collapse all tables into one |> group(none:true) // drop all columns except _value |> keep(columns: ["_value"])
  89. showTagValues = (db, tag, start=-1h, stop=now(), predicate=(r) => true) =>

    from(db:db) |> range(start:start, stop:stop) |> filter(fn: predicate) |> group(by:[tag]) // get the distinct values for the tag |> distinct(column:tag) // collapse all tables into one |> group(none:true) // drop all columns except _value |> keep(columns: ["_value"]) Assign function to variable
  90. showTagValues = (db, tag, start=-1h, stop=now(), predicate=(r) => true) =>

    from(db:db) |> range(start:start, stop:stop) |> filter(fn: predicate) |> group(by:[tag]) // get the distinct values for the tag |> distinct(column:tag) // collapse all tables into one |> group(none:true) // drop all columns except _value |> keep(columns: ["_value"]) Specify default argument value to make optional
  91. showTagValues = (db, tag, start=-1h, stop=now(), predicate=(r) => true) =>

    from(db:db) |> range(start:start, stop:stop) |> filter(fn: predicate) |> group(by:[tag]) // get the distinct values for the tag |> distinct(column:tag) // collapse all tables into one |> group(none:true) // drop all columns except _value |> keep(columns: ["_value"]) now function
  92. showTagValues = (db, tag, start=-1h, stop=now(), predicate=(r) => true) =>

    from(db:db) |> range(start:start, stop:stop) |> filter(fn: predicate) |> group(by:[tag]) // get the distinct values for the tag |> distinct(column:tag) // collapse all tables into one |> group(none:true) // drop all columns except _value |> keep(columns: ["_value"]) pass function as argument
  93. showTagValues = (db, tag, start=-1h, stop=now(), predicate=(r) => true) =>

    from(db:db) |> range(start:start, stop:stop) |> filter(fn: predicate) |> group(by:[tag]) // get the distinct values for the tag |> distinct(column:tag) // collapse all tables into one |> group(none:true) // drop all columns except _value |> keep(columns: ["_value"])
  94. showTagValues( db:"telegraf", tag:"host", predicate: (r) => r._measurement == "redis")

  95. Defining functions that take inputs // convert all values into

    floats from(db:"telegraf") |> range(start:-1h) |> filter(fn: (r) => r._measurement == "foo") |> map(fn: (r) => float(v:r._value))
  96. Defining functions that take inputs // convert all values into

    floats from(db:"telegraf") |> range(start:-1h) |> filter(fn: (r) => r._measurement == "foo") |> map(fn: (r) => float(v:r._value)) map function
  97. Defining functions that take inputs // convert all values into

    floats from(db:"telegraf") |> range(start:-1h) |> filter(fn: (r) => r._measurement == "foo") |> map(fn: (r) => float(v:r._value)) float function
  98. Defining functions that take inputs // convert all values into

    floats from(db:"telegraf") |> range(start:-1h) |> filter(fn: (r) => r._measurement == "foo") |> map(fn: (r) => float(v:r._value)) only named arguments!
  99. Defining functions that take inputs // convert all values into

    floats from(db:"telegraf") |> range(start:-1h) |> filter(fn: (r) => r._measurement == "foo") |> map(fn: (r) => float(v:r._value)) make this a function?
  100. Defining functions that take inputs castToFloat = (table=<-) { return

    table |> map(fn: (r) => float(v:r._value)) } user defined pipe forwardable function
  101. Defining functions that take inputs // calling it from(db:"telegraf") |>

    range(start:-1h) |> filter(fn: (r) => r._measurement == "foo") |> castToFloat()
  102. Defining functions that take inputs // convert all values into

    floats from(db:"telegraf") |> range(start:-1h) |> filter(fn: (r) => r._measurement == "foo") |> map(fn: (r) => float(v:r._value)) castToFloat = (table=<-) { return table |> map(fn: (r) => float(v:r._value)) } from(db:"telegraf") |> range(start:-1h) |> filter(fn: (r) => r._measurement == "foo") |> castToFloat()
  103. Any pipe forward function can use arguments min(table: range(start: -1h,

    table: filter(fn: (r) => r.host == "server0", table: from(db: "mydb"))))
  104. Make you a Lisp

  105. New Query Functionality finally getting to those feature requests!

  106. None
  107. Math across measurements foo = from(db: "mydb") |> filter(fn: (r)

    => r._measurement == "foo") |> range(start: -1h) bar = from(db: "mydb") |> filter(fn: (r) => r._measurement == "bar") |> range(start: -1h) join( tables: {foo:foo, bar:bar}, on: [“foobar”, “_time”], fn: (t) => t.foo._value + t.bar._value, ) |> yield(name: "foobar")
  108. None
  109. Having Query from(db:"mydb") |> filter(fn: (r) => r._measurement == "cpu"

    and r._field == “usage_system”) |> range(start:-1h) |> window(every:10m) |> mean() // this is the having part |> filter(fn: (r) => r._value > 90)
  110. Shareable

  111. Imports and Namespaces import "math" from(db:"mydb") |> filter(fn: (r) =>

    r.host == "server0") |> range(start: -1h) |> math.square()
  112. Imports and Namespaces import "math" from(db:"mydb") |> filter(fn: (r) =>

    r.host == "server0") |> range(start: -1h) |> math.square() namespace
  113. Package Manager

  114. Imports and Namespaces import “pauldix/math" from(db:"mydb") |> filter(fn: (r) =>

    r.host == "server0") |> range(start: -1h) |> math.square() Username like RubyGems
  115. Public Package Repository (like RubyGems, npm, etc.)

  116. Imports and Namespaces import “github.com/pauldix/math” from(db:"mydb") |> filter(fn: (r) =>

    r.host == "server0") |> range(start: -1h) |> math.square() Or from Github
  117. Difficult SQL Queries

  118. Exponential Moving Average from(db:"telegraf") |> range(start:-1h) |> filter(fn: (r) =>

    r._measurement == "foo") |> exponentialMovingAverage(size:-10s)
  119. SQL rolling average select id, temp, avg(temp) over (partition by

    group_nr order by time_read) as rolling_avg from ( select id, temp, time_read, interval_group, id - row_number() over (partition by interval_group order by time_read) as group_nr from ( select id, time_read, 'epoch'::timestamp + '900 seconds'::interval * (extract(epoch from time_read)::int4 / 900) as interval_group, temp from readings ) t1 ) t2 order by time_read;
  120. Alerting import “alert” from(db:"telegraf") |> range(start: -1m) |> filter(fn: (r)

    => r._measurement == "work_queue" and r._field == "depth") |> mean() |> alert.track( warn: (r) => r._value > 200, crit: (r) => r._value > 500) |> alert.limit(duration:1m) |> toSlack(config: loadConfig(key: "slack"))
  121. Wrap up

  122. Get the nightlies! InfluxDB, Flux, Chronograf http://influxdata.com/download

  123. Get the code, file issues! https://github.com/influxdata/platform

  124. SQL is a great thing

  125. But it’s not the only thing

  126. Thank you Paul Dix @pauldix paul@influxdata.com