Autour des requêtes des TSDB (About TSDB queries)

Presented by Aurélien Hébert at SysadminDays #8 (https://sysadmindays.fr)


Renaud Chaput

October 20, 2018

Transcript

  1. @sysadmindays Autour des requêtes des TSDB

  2. Aurélien Hébert @AurrelH95, Software Engineer and data lover

  3. A connected world

  4. Producing more daily data

  5. Human data classification ❖ Relational (ID, Name, Country, Job): 1, Peter, Ireland, Bookkeeper; 2, Paolo, Italy, Sales

  6. And ❖ Key/Value ❖ Document ❖ Graphs ❖ ...

  7. Server and application data

  8. Metrics: a series of data points indexed by time

  9. Time series are well known: stock market, analytics, economic forecasting

  10. Time series Database

  11. Many open source out there ❖ Steven Acreman (Outlier) ❖ Top 10 Time Series Databases

  12. A monitoring use case

  13. Server data: CPU, disk, disk I/O, load, kernel, memory, network I/O, temperature, system, swap, ...
  14. What can we do?

  15. 1. Human data viz ❖ Raw ❖ Sampling ❖ Grouping

  16. 2. Data analysis ❖ Metrics functions ❖ Operations across metrics

  17. 3. More complex analytics

  18. An example is worth 1000 words

  19. Using a server data subset: memory available, CPU usage, disk I/O
  20. Data collection: agent

  21. Raw memory data: Name, Meta, List<time, value>

  22. Focus on a TSDB subset (grafana)

  23. 1. Human data viz

  24. OpenTSDB api/query: { "start":1535752800000, "end":1535839199999, "queries": [ { "metric":"mem.available", "aggregator":"none" } ] }

  25. PromQL api/v1/query_range? query=mem.available& start=1535797890& end=1535818770& step=30

  26. Graphite /render? target=mem.available& from=1535797842& until=1535818822&
  27. CPU monitoring

  28. Reduce data points per series ❖ Keep only one point every 2 minutes
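The "one point every 2 minutes" reduction above can be sketched in plain Python (an illustrative helper, not code from the talk; the `downsample` name and the average default are assumptions):

```python
from collections import defaultdict

def downsample(points, bucket_s=120, agg=lambda vs: sum(vs) / len(vs)):
    """Keep one aggregated point per bucket_s-second bucket (average by default)."""
    buckets = defaultdict(list)
    for ts, value in points:                 # points: iterable of (unix_ts, value)
        buckets[ts - ts % bucket_s].append(value)
    return sorted((ts, agg(vs)) for ts, vs in buckets.items())

# Four raw points collapse to one point per 2-minute bucket:
raw = [(0, 1.0), (60, 3.0), (120, 5.0), (180, 7.0)]
print(downsample(raw))  # [(0, 2.0), (120, 6.0)]
```

Swapping `agg` for `min`, `max` or `len` mirrors the other down-samplers the TSDBs expose.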
  29. Sampling

  30. Sampling

  31. Sampling

  32. OpenTSDB "queries": [{ "metric":"cpu.usage_system", "aggregator":"sum", "downsample":"2m-avg", "tags": { "cpu":"*" } }]

  33. OpenTSDB > Main down-samplers are: avg, count, dev, first, last, percentiles, min, max and sum

  34. PromQL api/v1/query_range? query=cpu.usage_system{cpu=~"cpu[0-7]*"}& start=1535797890& end=1535818770& step=2m

  35. PromQL: only the last down-sampler; interpolation of missing values is computed using last too

  36. Graphite: at configuration time, using aggregation-rules: cpu.usage_system (120) = avg cpu.usage_system. Main down-samplers are: sum, avg, min, max, percentiles and count

  37. Reduce CPU series

  38. Group CPU data ❖ Sample onto synchronised timestamps ➢ Compute a max aggregation
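The grouping step just described (timestamps already aligned, then one value kept across all series) might look like this in Python (`aggregate_series` is a made-up helper for illustration, not an API of any of these TSDBs):

```python
def aggregate_series(series_by_label, agg=max):
    """Combine several already-aligned series into one by applying `agg`
    across all series values sharing the same timestamp."""
    merged = {}
    for points in series_by_label.values():
        for ts, value in points:
            merged.setdefault(ts, []).append(value)
    return sorted((ts, agg(vs)) for ts, vs in merged.items())

cpus = {
    "cpu0": [(0, 10.0), (120, 30.0)],
    "cpu1": [(0, 20.0), (120, 25.0)],
}
print(aggregate_series(cpus))  # [(0, 20.0), (120, 30.0)]
```

This is why the sampling step matters: without synchronised timestamps there is nothing to aggregate across series.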
  39. OpenTSDB "queries": [{ "metric":"cpu.usage_system", "aggregator":"max", "downsample":"2m-avg", "filters":[{ "type":"regexp", "tagk":"cpu", "filter":"cpu[0-9]+", "groupBy":false }] }]

  40. OpenTSDB > Main aggregators are: avg, count, dev, percentiles, min, max, mimmin, mimmax, sum, none (raw data) and zimsum (the difference between mimmin and min is missing-value interpolation; same for mimmax vs max and zimsum vs sum)

  41. PromQL api/v1/query_range? query=max(cpu.usage_system{cpu=~"cpu[0-7]*"})& start=1535797890& end=1535818770& step=2m

  42. PromQL > The grouping operator can be one of: sum, avg, min, max, stddev, stdvar, count, topk, bottomk and quantile

  43. Graphite /render? target=aggregate(cpu.usage_system,'max')& from=1535797842& until=1535818822&

  44. Graphite > Main aggregators are: avg, median, sum, min, max, diff, stddev, count, range, multiply and last
  45. Be able to see the data

  46. Disk I/O monitoring

  47. 2. Data analysis

  48. Compute a rate: from bytes to bytes per second
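Turning a monotonically increasing counter into a per-second rate, including the counter-reset handling that OpenTSDB exposes as dropResets, can be sketched as (hypothetical helper, invented sample data):

```python
def counter_rate(points, drop_resets=True):
    """Per-second rate from a monotonically increasing counter.
    A counter reset (value going down, e.g. after a reboot) is dropped
    instead of producing a bogus negative rate."""
    rates = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if v1 < v0 and drop_resets:
            continue                     # counter restarted: skip this interval
        rates.append((t1, (v1 - v0) / (t1 - t0)))
    return rates

# diskio.writes in bytes, with a reset after t=120:
writes = [(0, 0), (60, 600), (120, 1800), (180, 300)]
print(counter_rate(writes))  # [(60, 10.0), (120, 20.0)]
```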
  49. OpenTSDB "queries": [{ "metric":"diskio.writes", "aggregator":"sum", "downsample":"2m-avg", "rateOptions": { "counter":true, "dropResets":true }, "tags": { "name":"*" } }]

  50. OpenTSDB functions: only the rate operation

  51. PromQL api/v1/query_range? query=rate(diskio.writes[2m])& start=1535797890& end=1535818770& step=2m

  52. PromQL functions: around 50 functions, e.g. avg_over_time, max_over_time, delta, rate, sqrt, topk, sort

  53. Graphite /render? target=divideSeries(derivative(diskio.writes), 60)& from=1535797842& until=1535818822&

  54. Graphite functions: more than 100 functions, e.g. timeSlice, timeShift, integral, interpolate, derivative, unique, sort, linearRegression, exponential smoothing, pieAverage, legendValue, movingMean, movingMax
  55. Disk I/O write times

  56. Series operators: Disk I/O time / Disk I/O writes

  57. Series operators. Prometheus: rate(diskio.write_time[2m]) / on(name) rate(diskio.writes[2m]). Graphite: divideSeriesLists(derivative(diskio.write_time), derivative(diskio.writes))
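Pairing two groups of series on a shared label and dividing them point by point, the way PromQL's / on(name) operator does, can be sketched as (illustrative helper; names and sample values are invented, and timestamps are assumed to be already aligned):

```python
def divide_on(num_by_label, den_by_label):
    """Divide two groups of series, pairing them on their shared label,
    keeping only timestamps present (and non-zero) on both sides."""
    out = {}
    for label, num in num_by_label.items():
        den = dict(den_by_label.get(label, []))
        out[label] = [(ts, v / den[ts]) for ts, v in num if ts in den and den[ts]]
    return out

write_time = {"sda": [(0, 50.0), (60, 90.0)]}   # ms spent writing
writes     = {"sda": [(0, 10.0), (60, 30.0)]}   # write operations
print(divide_on(write_time, writes))  # {'sda': [(0, 5.0), (60, 3.0)]}
```

The result is the average time per write, per disk, which is exactly what slide 56's "Disk I/O time / Disk I/O writes" ratio is after.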
  58. Graphite and PromQL review

  59. Common usage ❖ Succinct time series queries ❖ Same functionality ❖ Analytics

  60. Data model structure ❖ Prometheus: labels, a key/value map attached to each metric ❖ Graphite: names with dot-separated components

  61. Languages review. PromQL: ❖ Structured ❖ Easier to compute operations on multiple series ❖ Less control. Graphite: ❖ More time series functions ➢ stats ➢ maths ➢ graphs ❖ Less control
  62. 3. Complex analytics

  63. Warp 10 api/v0/exec: [ "token" "cpu.average" { "cpu" "~cpu[0-9]+" } 1535818770 10 h ] FETCH [ SWAP bucketizer.mean 0 2 m 0 ] BUCKETIZE [ SWAP [ "host" ] reducer.max ] REDUCE
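The FETCH, BUCKETIZE, REDUCE stages of the WarpScript query above are a data workflow: each stage consumes the previous stage's set of series. That composition can be mimicked with two plain Python functions over in-memory series (a conceptual sketch of the workflow, not Warp 10 code):

```python
def bucketize(series, span, agg):
    """One aggregated value per `span`-second bucket, computed per series."""
    out = {}
    for label, points in series.items():
        buckets = {}
        for ts, v in points:
            buckets.setdefault(ts - ts % span, []).append(v)
        out[label] = sorted((ts, agg(vs)) for ts, vs in buckets.items())
    return out

def reduce_series(series, agg):
    """Collapse all series into one by aggregating values per timestamp."""
    merged = {}
    for points in series.values():
        for ts, v in points:
            merged.setdefault(ts, []).append(v)
    return sorted((ts, agg(vs)) for ts, vs in merged.items())

# FETCH result stand-in, then mean per 2-minute bucket, then max across hosts:
fetched = {"cpu0": [(0, 1.0), (130, 3.0)], "cpu1": [(10, 5.0), (140, 1.0)]}
bucketized = bucketize(fetched, 120, lambda vs: sum(vs) / len(vs))
print(reduce_series(bucketized, max))  # [(0, 5.0), (120, 3.0)]
```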
  64. Hello Exo World use case

  65. Warp 10: hands on

  66. Hello Exo World result

  67. Warp 10 review ❖ Dedicated language ❖ A time series workflow ❖ Query complexity ❖ Abstraction needed for end users

  68. And the Elastic Stack?

  69. The Elastic time series stack: .es(index=test*, metric=min:mem.available).mvavg(10)

  70. Does the job ❖ Mix of visualizations ❖ Multiple series ❖ Lots of functions ❖ Less control on data ❖ Needs a graphical tool (Timelion on Kibana) ❖ Lower query performance
  71. M3 TSDB POST /query { "namespace": "test", "query": { "regexp": { "field": "city", "regexp": ".*" } }, "rangeStart": 0, "rangeEnd": '"$(date +"%s")"' }

  72. Nobody is perfect

  73. Different use cases, different TSDBs...

  74. Different use cases, different TSDBs... Wait, we are missing one, aren't we?

  75. Different use cases, different TSDBs... Wait, we are missing one, aren't we?
  76. From InfluxQL: SELECT max("usage_system") FROM "telegraf".."cpu" WHERE "host" = 'ahe-XPS-13-9360' AND time > now() - 12h GROUP BY time(10m) ❖ First iteration ❖ Database queries ❖ Familiar to SQL users

  77. InfluxQL drawback: time series data are NOT relational; InfluxQL had limitations for advanced use cases

  78. To IFQL: select(db:"telegraf") .where(exp:{"_measurement"=="cpu" AND "_field"=="usage_system"}) .range(start:-12h) .window(every:10m) .max() ❖ Time series API ❖ Functional paradigm ❖ Consistent semantics

  79. And Flux: POST query= from(bucket:"telegraf") |> filter(fn: (r) => r._measurement == "cpu" AND r._field == "usage_system") |> range(start:-12h) |> group(by: ["host"]) |> window(every: 10m) |> max() ❖ Data language ❖ Lots of native functions ❖ User-defined functions ❖ A usable language

  80. A time series query language: working on data locally is more powerful
  81. What do we want? ❖ Quick access to the data ❖ Time series native features ❖ Back-end agnostic ❖ Simplified user experience

  82. Alternative: TSQL spec select("cpu.usage_system") .where("cpu~cpu[0-7]*") .last(12h) .sampleBy(5m,max) .groupBy(mean) .rate() ❖ Time Series Query Language ❖ Simplify time series computation

  83. HEW use case with TSQL: sample = select('sap.flux') .where('KEPLERID=6541920') .from("2009-05-02T00:56:10.000000Z", to="2013-05-11T12:02:06.000000Z") .timesplit(6h,100,"record") .filterByLabels('record~[2-5]') .sampleBy(2h, min, false, "none") trend = sample.window(mean, 5, 5) sub(sample,trend) .on('KEPLERID','record') .lessThan(-20.0) ❖ Supports complex use cases

  84. Thanks!