Autour des requêtes des TSDB (About TSDB queries)

Presented by Aurélien Hébert at SysadminDays #8 (https://sysadmindays.fr)

Renaud Chaput

October 20, 2018

Transcript

  1. 5.

    Human data classification ❖ Relational

    ID | Name  | Country | Job
    1  | Peter | Ireland | Bookkeeper
    2  | Paolo | Italy   | Sales
  2. 6.

    And ❖ Key/Value ❖ Document ❖ Graphs ❖ ... (diagrams: Key → Value; Node — edge — Node)
  3. 9.

    Time series are well known: stock market, analytics, economic forecasting
  4. 11.

    Many open source TSDBs out there ❖ Steven Acreman (Outlier) ❖ Top 10 Time Series Databases
  5. 13.

    Server data: CPU, disk, disk I/O, load, kernel, memory, network I/O, temperature, system, swap, ...
  6. 15.

    1. Human data viz ❖ Raw ❖ Sampling ❖ Grouping
  7. 16.

    2. Data analysis ❖ Metrics functions ❖ Operations across metrics
  8. 19.

    Using a server data subset: memory available, CPU usage, disk I/O
  9. 24.

    OpenTSDB api/query:

    {
      "start": 1535752800000,
      "end": 1535839199999,
      "queries": [
        { "metric": "mem.available", "aggregator": "none" }
      ]
    }
  10. 28.

    Reduce data points per series ❖ Keep only one point every 2 minutes
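The down-sampling the slide describes can be sketched in a few lines of plain Python: bucket timestamps into 2-minute windows and keep one averaged point per bucket. This is a hypothetical helper for illustration, not code from any of the TSDBs discussed.

```python
# Down-sampling sketch: keep one averaged point per 2-minute bucket.
# Hypothetical helper, not taken from any TSDB's code base.
from collections import defaultdict

def downsample(points, bucket_ms=120_000):
    """points: list of (timestamp_ms, value); returns one (bucket_start, avg) per bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_ms].append(value)
    return sorted((ts, sum(vs) / len(vs)) for ts, vs in buckets.items())

series = [(0, 1.0), (60_000, 3.0), (120_000, 5.0), (180_000, 7.0)]
print(downsample(series))  # [(0, 2.0), (120000, 6.0)]
```

Swapping the `sum(vs) / len(vs)` expression for `min`, `max`, `len`, etc. gives the other down-samplers listed on the following slides.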
  11. 33.

    OpenTSDB > Main down-samplers are: avg, count, dev, first, last, percentiles, min, max and sum
  12. 35.

    PromQL > Only a last down-sampler; interpolation of missing values is computed using last too
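The "use the last value" behaviour described above amounts to a forward fill: a missing sample takes the most recent known value. A minimal sketch, assuming `None` marks a missing sample (hypothetical helper):

```python
# Forward-fill sketch: each missing sample (None) takes the most recent
# known value, mirroring the "last" behaviour described above.
def fill_last(values):
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

print(fill_last([1.0, None, None, 4.0, None]))  # [1.0, 1.0, 1.0, 4.0, 4.0]
```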
  13. 36.

    Graphite > At configuration time, using aggregation-rules: cpu.usage_system (120) = avg cpu.usage_system. Main down-samplers are: sum, avg, min, max, percentiles and count
  14. 38.

    Group CPU data ❖ Sampling synchronised timestamps ➢ Compute max aggregation
  15. 39.

    OpenTSDB:

    "queries": [{
      "metric": "cpu.usage_system",
      "aggregator": "max",
      "downsample": "2m-avg",
      "filters": [{
        "type": "regexp",
        "tagk": "cpu",
        "filter": "cpu[0-9]+",
        "groupBy": false
      }]
    }]
  16. 40.

    OpenTSDB > Main aggregators are: avg, count, dev, percentiles, min, max, mimmin, mimmax, sum, none (raw data) and zimsum (the difference between mimmin and min is missing-value interpolation; likewise for mimmax vs max and zimsum vs sum)
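The sum-vs-zimsum distinction the slide mentions can be sketched in plain Python: when one series misses a sample, a zimsum-style aggregation substitutes zero, while a sum-style aggregation fills the gap by linear interpolation from its neighbours first. Hypothetical helpers for illustration (interior gaps only, for brevity), not OpenTSDB code:

```python
# Missing-value handling sketch: zimsum-style (missing = 0) versus
# sum-style (missing linearly interpolated from neighbours).
def zimsum_at(i, *series):
    """Sum of all series at index i, treating missing samples (None) as zero."""
    return sum(s[i] if s[i] is not None else 0.0 for s in series)

def interp(series, i):
    """Linearly interpolate an interior missing sample from its nearest neighbours."""
    if series[i] is not None:
        return series[i]
    lo = max(j for j in range(i) if series[j] is not None)
    hi = min(j for j in range(i + 1, len(series)) if series[j] is not None)
    return series[lo] + (series[hi] - series[lo]) * (i - lo) / (hi - lo)

def sum_at(i, *series):
    """Sum of all series at index i, interpolating missing samples first."""
    return sum(interp(s, i) for s in series)

a = [1.0, 2.0, 3.0]
b = [10.0, None, 30.0]     # missing sample in the middle
print(zimsum_at(1, a, b))  # 2.0  (missing treated as zero)
print(sum_at(1, a, b))     # 22.0 (missing interpolated to 20.0)
```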
  17. 42.

    PromQL > Grouping operator can be one of: sum, avg, min, max, stddev, stdvar, count, topk, bottomk and quantile
  18. 44.

    Graphite > Main aggregators are: avg, median, sum, min, max, diff, stddev, count, range, multiply and last
  19. 49.

    OpenTSDB:

    "queries": [{
      "metric": "diskio.writes",
      "aggregator": "sum",
      "downsample": "2m-avg",
      "rateOptions": { "counter": true, "dropResets": true },
      "tags": { "name": "*" }
    }]
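The `rateOptions` in the query above turn a monotonically increasing counter into a per-second rate, and `dropResets` discards intervals where the counter went backwards (a restart or wrap) instead of reporting a large negative spike. A minimal sketch of that logic in plain Python (hypothetical helper):

```python
# Counter-rate sketch: rate = delta / dt, with counter resets either
# dropped (dropResets-style) or treated as a restart from zero.
def counter_rate(points, drop_resets=True):
    """points: list of (timestamp_s, counter_value); returns (timestamp_s, rate) pairs."""
    rates = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        delta = v1 - v0
        if delta < 0:        # counter wrapped or the process restarted
            if drop_resets:
                continue     # skip the bogus interval entirely
            delta = v1       # otherwise assume a restart from zero
        rates.append((t1, delta / (t1 - t0)))
    return rates

writes = [(0, 100.0), (60, 160.0), (120, 10.0), (180, 70.0)]  # reset at t=120
print(counter_rate(writes))  # [(60, 1.0), (180, 1.0)]
```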
  20. 52.

    PromQL functions: around 50 functions, e.g. avg_over_time, max_over_time, delta, rate, sqrt, topk, sort
  21. 54.

    Graphite functions: more than 100 functions, e.g. timeSlice, timeShift, integral, interpolate, derivative, unique, sort, linearRegression, exponential smoothing, pieAverage, legendValue, movingMean, movingMax
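The moving-window functions named above (and the `.mvavg(10)` seen later in the Timelion example) all follow the same pattern: each output point aggregates the last N input points. A minimal moving-mean sketch (hypothetical helper, not Graphite's implementation):

```python
# Moving-mean sketch: each output point is the mean of the last
# `window` input points (shorter at the start of the series).
def moving_mean(values, window):
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i + 1 - window):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

print(moving_mean([1.0, 2.0, 3.0, 4.0], 2))  # [1.0, 1.5, 2.5, 3.5]
```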
  22. 57.

    Series operators

    Prometheus: rate(diskio.write_time[2m]) / on(name) rate(diskio.writes[2m])

    Graphite: divideSeriesList(derivative(diskio.write_time), derivative(diskio.writes))
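Conceptually, both queries above match series from two metrics by a label ("name", i.e. the disk) and then divide them point by point. A sketch of that matching in plain Python, assuming a hypothetical layout of `{label_value: [v0, v1, ...]}` with aligned timestamps:

```python
# Label-matched division sketch: series are paired by their label value
# (e.g. the disk name), then divided point by point.
def divide_on(numerators, denominators):
    return {
        name: [n / d for n, d in zip(num, denominators[name])]
        for name, num in numerators.items()
        if name in denominators
    }

write_time = {"sda": [10.0, 20.0], "sdb": [4.0, 8.0]}
writes     = {"sda": [5.0, 5.0],  "sdb": [2.0, 2.0]}
print(divide_on(write_time, writes))  # {'sda': [2.0, 4.0], 'sdb': [2.0, 4.0]}
```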
  23. 59.

    Common usage ❖ Succinct time series queries ❖ Same functionality ❖ Analytics
  24. 60.

    Data model structure ❖ Labels as a key/value map attached to each metric in Prometheus ❖ Names with dot-separated components in Graphite
  25. 61.

    Languages review. PromQL: ❖ Structured ❖ Easier to compute operations on multiple series ❖ Less control. Graphite: ❖ More time series functions ➢ stats ➢ maths ➢ graphs ❖ Less control
  26. 63.

    Warp 10 api/v0/exec:

    [ "token" "cpu.average" { "cpu" "~cpu[0-9]+" } 1535818770 10 h ] FETCH
    [ SWAP bucketizer.mean 0 2 m 0 ] BUCKETIZE
    [ SWAP [ "host" ] reducer.max ] REDUCE
  27. 67.

    Warp 10 review ❖ Dedicated language ❖ A time series workflow ❖ Query complexity ❖ Abstraction needed for end users
  28. 69.

    The Elastic Time Series stack: .es(index=test*, metric=min:mem.available).mvavg(10)
  29. 70.

    Does the job ❖ Mix of visualization ❖ Multiple series ❖ Lots of functions ❖ Less control on data ❖ Needs a graphical tool (Timelion on Kibana) ❖ Lower query performance
  30. 71.

    M3 TSDB: POST /query

    {
      "namespace": "test",
      "query": { "regexp": { "field": "city", "regexp": ".*" } },
      "rangeStart": 0,
      "rangeEnd": '"$(date +"%s")"'
    }
  31. 74.

    Different use cases, different TSDBs... Wait, we are missing one, aren't we?
  32. 75.

    Different use cases, different TSDBs... Wait, we are missing one, aren't we?
  33. 76.

    From InfluxQL:

    SELECT max("usage_system") FROM "telegraf".."cpu"
    WHERE "host" = 'ahe-XPS-13-9360' AND time > now() - 12h
    GROUP BY time(10m)

    ❖ First iteration ❖ Database queries ❖ Familiar to SQL users
  34. 77.

    InfluxQL drawback: time series data are NOT relational; InfluxQL had limitations for advanced use cases
  35. 78.

    To IFQL:

    select(db:"telegraf")
      .where(exp:{"_measurement"=="cpu" AND "_field"=="usage_system"})
      .range(start:-12h)
      .window(every:10m)
      .max()

    ❖ Time series API ❖ Functional paradigm ❖ Consistent semantics
  36. 79.

    And Flux: POST query=

    from(bucket:"telegraf")
      |> filter(fn: (r) => r._measurement == "cpu" AND r._field == "usage_system")
      |> range(start:-12h)
      |> group(by: ["host"])
      |> window(every: 10m)
      |> max()

    ❖ Data language ❖ Lots of native functions ❖ User-defined functions ❖ A usable language
  37. 80.

    A time series query language: working on data locally is more powerful
  38. 81.

    What do we want? ❖ Quick access to the data ❖ Time series native features ❖ Back-end agnostic ❖ Simplified user experience
  39. 82.

    Alternative: TSQL spec

    select("cpu.usage_system")
      .where("cpu~cpu[0-7]*")
      .last(12h)
      .sampleBy(5m, max)
      .groupBy(mean)
      .rate()

    ❖ Time Series Query Language ❖ Simplify time series computation
  40. 83.

    HEW use case with TSQL:

    sample = select('sap.flux')
      .where('KEPLERID=6541920')
      .from("2009-05-02T00:56:10.000000Z", to="2013-05-11T12:02:06.000000Z")
      .timesplit(6h, 100, "record")
      .filterByLabels('record~[2-5]')
      .sampleBy(2h, min, false, "none")

    trend = sample.window(mean, 5, 5)

    sub(sample, trend)
      .on('KEPLERID', 'record')
      .lessThan(-20.0)

    ❖ Support complex use cases