Around TSDB queries (Autour des requêtes des TSDB)

Presented by Aurélien Hébert at SysadminDays #8 (https://sysadmindays.fr)

Renaud Chaput

October 20, 2018

Transcript

  1. @sysadmindays
    @ :
    / #
    A T B qu e t
    wi m i r zo

  2. Aurélien Hébert
    @AurrelH95
    Software Engineer and data lover

  3. A connected world

  4. Producing more daily data

  5. Human data classification
    ❖ Relational
    ID  Name   Country  Job
    1   Peter  Ireland  Bookkeeper
    2   Paolo  Italy    Sales

  6. And
    ❖ Key/Value
    ❖ Document
    ❖ Graphs
    ❖ ...
    (Diagrams: a key mapped to a value; two graph nodes joined by an edge)

  7. Server and application data

  8. Metrics
    A series of data points indexed by time

  9. Time series are well known
    Stock market Analytics
    Economic Forecasting

  10. Time series database

  11. Many open-source options out there
    ❖ Steven Acreman (Outlyer)
    ❖ Top 10 Time Series Databases

  12. A monitoring use case

  13. Server data
    CPU
    disk
    disk I/O
    load
    kernel
    memory
    network I/O
    temperature
    system
    swap
    ...

  14. What can we do?

  15. 1. Human data viz
    ❖ Raw
    ❖ Sampling
    ❖ Grouping

  16. 2. Data analysis
    ❖ Metrics functions
    ❖ Operation across metrics

  17. 3. More complex analytics

  18. An example is worth 1000 words

  19. Using a server data subset
    Memory available
    CPU usage
    Disk I/O

  20. Data collection
    Agent

  21. Raw memory data
    Name
    Meta
    List

  22. Focus on a TSDB subset
    Grafana

  23. 1. Human data viz

  24. OpenTSDB
    api/query
    {
      "start": 1535752800000,
      "end": 1535839199999,
      "queries": [
        {
          "metric": "mem.available",
          "aggregator": "none"
        }
      ]
    }
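The request body on this slide can also be assembled programmatically. A minimal sketch, assuming only the fields shown on the slide; the helper name `build_raw_query` is hypothetical and nothing is sent over the network.

```python
import json
from datetime import datetime, timezone


def build_raw_query(metric: str, start: datetime, end: datetime) -> dict:
    """Build an OpenTSDB /api/query body that returns raw points.

    Timestamps are expressed as Unix milliseconds, as on the slide.
    """
    return {
        "start": int(start.timestamp() * 1000),
        "end": int(end.timestamp() * 1000),
        # "none" asks for every series untouched, without aggregation.
        "queries": [{"metric": metric, "aggregator": "none"}],
    }


payload = build_raw_query(
    "mem.available",
    datetime(2018, 9, 1, tzinfo=timezone.utc),
    datetime(2018, 9, 2, tzinfo=timezone.utc),
)
print(json.dumps(payload, indent=2))
```

The printed JSON has the same shape as the slide's request, with a one-day window expressed in milliseconds.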

  25. PromQL
    api/v1/query_range?
    query=mem.available&
    start=1535797890&
    end=1535818770&
    step=30

  26. Graphite
    /render?
    target=mem.available&
    from=1535797842&
    until=1535818822&

  27. CPU monitoring

  28. Reduce data points per series
    ❖ Keep only one point every 2 minutes
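Keeping one point every 2 minutes amounts to grouping samples into fixed time buckets and reducing each bucket to a single value. A minimal sketch of that idea, assuming samples are `(timestamp_in_seconds, value)` pairs; the function name and the sample values are illustrative and not taken from any TSDB.

```python
from statistics import mean


def downsample(points, bucket_seconds=120, reducer=mean):
    """Reduce (timestamp, value) pairs to one point per time bucket.

    Each bucket is labelled with its start timestamp; `reducer`
    collapses the values that fall into it (mean, max, min...).
    """
    buckets = {}
    for ts, value in sorted(points):
        start = ts - ts % bucket_seconds
        buckets.setdefault(start, []).append(value)
    return [(start, reducer(values)) for start, values in buckets.items()]


# One sample every 30 s over 4 minutes (hypothetical CPU values).
raw = [(0, 10.0), (30, 20.0), (60, 30.0), (90, 40.0),
       (120, 50.0), (150, 70.0), (180, 50.0), (210, 70.0)]
print(downsample(raw))               # [(0, 25.0), (120, 60.0)]
print(downsample(raw, reducer=max))  # [(0, 40.0), (120, 70.0)]
```

Swapping the reducer is what the later slides call choosing a down-sampler.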

  29. Sampling

  30. Sampling

  31. Sampling

  32. OpenTSDB
    "queries":
    [{
      "metric": "cpu.usage_system",
      "aggregator": "sum",
      "downsample": "2m-avg",
      "tags": {
        "cpu": "*"
      }
    }]

  33. OpenTSDB
    > Main down-samplers are: avg, count, dev, first, last, percentiles, min, max and sum

  34. PromQL
    api/v1/query_range?
    query=cpu.usage_system{cpu=~"cpu[0-7]*"}&
    start=1535797890&
    end=1535818770&
    step=2m
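A selector with braces, quotes and brackets has to be URL-encoded before it is sent. A minimal sketch using only the standard library; the server address is hypothetical, and the metric name is written with underscores here because classic PromQL identifiers do not accept dots (the slide keeps the same name across all TSDBs for readability).

```python
from urllib.parse import urlencode


def query_range_url(base, query, start, end, step):
    """Build a Prometheus /api/v1/query_range URL with an encoded query."""
    params = {"query": query, "start": start, "end": end, "step": step}
    return f"{base}/api/v1/query_range?{urlencode(params)}"


url = query_range_url(
    "http://localhost:9090",               # hypothetical server address
    'cpu_usage_system{cpu=~"cpu[0-7]*"}',  # assumed underscore spelling of the metric
    1535797890,
    1535818770,
    "2m",
)
print(url)
```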

  35. PromQL
    Only the last down-sampler
    Missing values are also filled using last
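Sampling with "last" means that each step of a range query takes the most recent sample at or before that instant. A rough sketch of that behaviour, assuming time-sorted `(timestamp, value)` pairs; the 300 s lookback mirrors Prometheus's default of five minutes, and the rest is an illustration rather than Prometheus's actual implementation.

```python
from bisect import bisect_right


def sample_last(points, start, end, step, lookback=300):
    """Evaluate a series at start, start+step, ... using the latest sample.

    `points` is a time-sorted list of (timestamp, value). A step with no
    sample in the previous `lookback` seconds yields no point.
    """
    timestamps = [ts for ts, _ in points]
    out = []
    for t in range(start, end + 1, step):
        i = bisect_right(timestamps, t)  # number of samples with ts <= t
        if i and t - timestamps[i - 1] <= lookback:
            out.append((t, points[i - 1][1]))
    return out


series = [(0, 1.0), (30, 2.0), (50, 3.0), (700, 9.0)]
print(sample_last(series, start=0, end=720, step=120))
# [(0, 1.0), (120, 3.0), (240, 3.0), (720, 9.0)]
```

The value 3.0 is repeated at 120 and 240 because no newer sample exists, and the steps from 360 to 600 are empty because the last sample is older than the lookback.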

  36. Graphite
    At configuration time, using aggregation-rules
    cpu.usage_system (120) = avg cpu.usage_system
    Main down-samplers are: sum, avg, min, max, percentiles and count

  37. Reduce CPU series

  38. Group CPU data
    ❖ Sampling synchronised timestamps
    ➢ Compute max aggregation
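Once every CPU series has been sampled onto the same timestamps, grouping reduces to combining the values that share a timestamp. A minimal sketch under that assumption; the function name and the per-CPU values are illustrative.

```python
def aggregate(series_by_label, reducer=max):
    """Combine several series that share the same timestamps.

    `series_by_label` maps a label value (e.g. "cpu0") to a list of
    (timestamp, value) pairs already sampled on a common grid.
    """
    by_ts = {}
    for points in series_by_label.values():
        for ts, value in points:
            by_ts.setdefault(ts, []).append(value)
    return [(ts, reducer(values)) for ts, values in sorted(by_ts.items())]


cpus = {
    "cpu0": [(0, 12.0), (120, 30.0)],
    "cpu1": [(0, 48.0), (120, 7.0)],
}
print(aggregate(cpus))       # [(0, 48.0), (120, 30.0)]
print(aggregate(cpus, sum))  # [(0, 60.0), (120, 37.0)]
```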

  39. OpenTSDB
    "queries":
    [{
      "metric": "cpu.usage_system",
      "aggregator": "max",
      "downsample": "2m-avg",
      "filters": [{
        "type": "regexp",
        "tagk": "cpu",
        "filter": "cpu[0-9]+",
        "groupBy": false
      }]
    }]

  40. OpenTSDB
    > Main aggregators are: avg, count, dev, percentiles, min, max, mimmin, mimmax, sum, none (raw data) and zimsum
    (mimmin differs from min in how missing values are interpolated; the same holds for mimmax vs. max and zimsum vs. sum)
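The difference between interpolation policies only shows up when series do not share timestamps. The sketch below contrasts linear interpolation (the policy of `sum`) with "zero if missing" (the policy of `zimsum`) on two tiny series; it is a simplified illustration, not OpenTSDB's code, and the data is made up.

```python
def lerp(points, t):
    """Linearly interpolate a time-sorted series at time t (None outside it)."""
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return None


def aggregate_sum(a, b, missing="lerp"):
    """Sum two series on the union of their timestamps.

    missing="lerp": fill a gap by linear interpolation (like `sum`).
    missing="zero": count a missing point as 0 (like `zimsum`).
    """
    out = []
    for t in sorted({ts for ts, _ in a} | {ts for ts, _ in b}):
        total = 0.0
        for series in (a, b):
            value = dict(series).get(t)
            if value is None and missing == "lerp":
                value = lerp(series, t)
            total += value or 0.0  # still missing: contributes nothing
        out.append((t, total))
    return out


a = [(0, 10.0), (20, 30.0)]
b = [(10, 5.0)]
print(aggregate_sum(a, b, "lerp"))  # [(0, 10.0), (10, 25.0), (20, 30.0)]
print(aggregate_sum(a, b, "zero"))  # [(0, 10.0), (10, 5.0), (20, 30.0)]
```

At timestamp 10 series `a` has no point: interpolation estimates 20.0 for it, while the zero policy ignores it.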

  41. PromQL
    api/v1/query_range?
    query=max(cpu.usage_system{cpu=~"cpu[0-7]*"})&
    start=1535797890&
    end=1535818770&
    step=2m

  42. PromQL
    > Grouping operator can be one of: sum, avg, min, max, stddev, stdvar, count, topk, bottomk and quantile

  43. Graphite
    /render?
    target=aggregate(cpu.usage_system,'max')&
    from=1535797842&
    until=1535818822&

  44. Graphite
    > Main aggregators are avg, median, sum, min, max, diff, stddev, count, range, multiply and last

  45. Be able to see data

  46. Disk I/O monitoring

  47. 2. Data analysis

  48. Compute a rate
    From bytes to bytes per second
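Going from a cumulative byte counter to bytes per second means dividing the increase between two consecutive samples by the elapsed time. A minimal sketch, assuming a counter that only decreases when it is reset; the function name and the sample values are illustrative.

```python
def rate(points, drop_resets=True):
    """Turn an increasing counter into a per-second rate.

    A decrease between two samples is treated as a counter reset; with
    `drop_resets` the affected interval is skipped.
    """
    out = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        delta = v1 - v0
        if delta < 0:
            if drop_resets:
                continue
            delta = v1  # assume the counter restarted from zero
        out.append((t1, delta / (t1 - t0)))
    return out


written = [(0, 1000), (60, 7000), (120, 13000), (180, 600), (240, 6600)]
print(rate(written))  # [(60, 100.0), (120, 100.0), (240, 100.0)]
```

The drop from 13000 to 600 is interpreted as a reset, so no rate is emitted for timestamp 180.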

  49. OpenTSDB
    "queries":
    [{
      "metric": "diskio.writes",
      "aggregator": "sum",
      "downsample": "2m-avg",
      "rate": true,
      "rateOptions": {
        "counter": true,
        "dropResets": true
      },
      "tags": {
        "name": "*"
      }
    }]

  50. OpenTSDB functions
    Only rate operation

  51. PromQL
    api/v1/query_range?
    query=rate(diskio.writes[2m])&
    start=1535797890&
    end=1535818770&
    step=2m

  52. PromQL functions
    Around 50 functions
    avg_over_time, max_over_time
    delta, rate, sqrt
    topk, sort

  53. Graphite
    /render?
    target=divideSeries(derivative(diskio.writes),60)&
    from=1535797842&
    until=1535818822&

  54. Graphite functions
    More than 100 functions
    TimeSlice, TimeShift
    Integral, Interpolate, derivative
    Unique, sort
    LinearRegression, exponential smoothing
    PieAverage, legendValue
    MovingMean, MovingMax

  55. Disk I/O write times

  56. Series operators
    Disk I/O time ÷ Disk I/O writes

  57. Series operators
    Prometheus:
    rate(diskio.write_time[2m])
    / on(name)
    rate(diskio.writes[2m])
    Graphite:
    divideSeriesLists(derivative(diskio.write_time), derivative(diskio.writes))
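Both expressions divide one set of series by another, pairing them by disk. A minimal sketch of that pairing, assuming each side maps a disk name to `(timestamp, value)` points; the helper name and the numbers are illustrative, and skipping zero divisors is a simplification (PromQL would return NaN or Inf instead).

```python
def divide_on(numerators, denominators):
    """Divide series pairwise, matching them on a shared label value.

    Both arguments map a label value (here the disk name) to a list of
    (timestamp, value) pairs. Points are joined on equal timestamps and
    zero or missing divisors are skipped.
    """
    result = {}
    for name, num_points in numerators.items():
        den = dict(denominators.get(name, []))
        result[name] = [
            (ts, value / den[ts])
            for ts, value in num_points
            if den.get(ts)
        ]
    return result


write_time_rate = {"sda": [(120, 40.0), (240, 90.0)], "sdb": [(120, 5.0)]}
writes_rate = {"sda": [(120, 20.0), (240, 30.0)], "sdb": [(120, 0.0)]}
print(divide_on(write_time_rate, writes_rate))
# {'sda': [(120, 2.0), (240, 3.0)], 'sdb': []}
```

The result reads as the average time spent per write on each disk.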

  58. Graphite and PromQL review

  59. Common usage
    ❖ Succinct time series queries
    ❖ Same functionality
    ❖ Analytics

  60. Data model structure
    ❖ Prometheus: a key/value map of labels attached to each metric
    ❖ Graphite: a name made of dot-separated components
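The two models carry the same information in different shapes. A small sketch that maps a dot-separated name onto a metric name plus labels; the path layout (`dc.host.cpu.<metric>`) is a hypothetical convention, since Graphite itself attaches no meaning to the position of a component.

```python
def graphite_to_labels(path, keys):
    """Map a dot-separated Graphite name onto a metric name plus labels.

    `keys` names each leading component; whatever is left becomes the
    metric name, joined with underscores.
    """
    parts = path.split(".")
    labels = dict(zip(keys, parts))
    metric = "_".join(parts[len(keys):])
    return metric, labels


print(graphite_to_labels("paris.web01.cpu0.cpu.usage_system",
                         ["dc", "host", "cpu"]))
# ('cpu_usage_system', {'dc': 'paris', 'host': 'web01', 'cpu': 'cpu0'})
```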

  61. Languages review
    PromQL:
    ❖ Structured
    ❖ Easier to compute operations on multiple series
    ❖ Less control
    Graphite:
    ❖ More time series functions
    ➢ stats
    ➢ maths
    ➢ graphs
    ❖ Less control

  62. 3. Complex analytics

  63. Warp 10
    api/v0/exec:
    [
    "token" "cpu.average" { "cpu" "~cpu[0-9]+" } 1535818770 10 h
    ] FETCH
    [ SWAP bucketizer.mean 0 2 m 0 ] BUCKETIZE
    [ SWAP [ "host" ] reducer.max ] REDUCE

  64. Hello Exo World use case

  65. Warp 10 - hands on

  66. Hello Exo World result

  67. Warp 10 review
    ❖ Dedicated language
    ❖ A time series workflow
    ❖ Query complexity
    ❖ Abstraction needed for end users

  68. And the Elastic Stack?

  69. The Elastic Time Series stack
    .es(index=test*, metric=min:mem.available).mvavg(10)

  70. Does the job
    ❖ Mix of visualization
    ❖ Multiple series
    ❖ Lots of functions
    ❖ Less control on data
    ❖ Need a graphical tool (Timelion on Kibana)
    ❖ Lower query performance

  71. M3 TSDB
    POST /query
    {
      "namespace": "test",
      "query": {
        "regexp": {
          "field": "city",
          "regexp": ".*"
        }
      },
      "rangeStart": 0,
      "rangeEnd": '"$(date +"%s")"'
    }

  72. Nobody’s perfect

  73. Different use cases, different TSDBs...

  74. Different use cases, different TSDBs...
    Wait, we are missing one, aren't we?

  75. Different use cases, different TSDBs...
    Wait, we are missing one, aren't we?

  76. From InfluxQL
    SELECT max("usage_system")
    FROM "telegraf".."cpu"
    WHERE "host" = 'ahe-XPS-13-9360'
    AND time > now() - 12h
    GROUP BY time(10m)
    ❖ First iteration
    ❖ Database queries
    ❖ Familiar to SQL users

  77. InfluxQL drawback
    Time series data are NOT relational
    InfluxQL had limitations for advanced use cases

  78. To IFQL
    select(db:"telegraf")
    .where(exp:{"_measurement"=="cpu" AND "_field"=="usage_system"})
    .range(start:-12h)
    .window(every:10m)
    .max()
    ❖ Time series API
    ❖ Functional paradigm
    ❖ Consistent semantics

  79. And Flux
    POST query=
    from(bucket:"telegraf")
    |> filter(fn: (r) =>
         r._measurement == "cpu"
         AND r._field == "usage_system")
    |> range(start:-12h)
    |> group(by: ["host"])
    |> window(every: 10m)
    |> max()
    ❖ Data language
    ❖ Lots of native functions
    ❖ User-defined functions
    ❖ A usable language
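Read top to bottom, the pipeline filters records, restricts the time range, groups by host, cuts each group into 10-minute windows and keeps the maximum of each window. A plain Python sketch of those same steps, assuming records are dicts whose keys mirror the Flux record fields; it illustrates the data flow only and is not how Flux executes a query.

```python
from collections import defaultdict


def window_max(rows, start, every=600):
    """Filter rows, group them by host and keep the max of each window.

    `rows` are dicts with keys _measurement, _field, host,
    _time (Unix seconds) and _value.
    """
    groups = defaultdict(dict)
    for r in rows:
        if r["_measurement"] != "cpu" or r["_field"] != "usage_system":
            continue  # filter()
        if r["_time"] < start:
            continue  # range()
        bucket = r["_time"] - r["_time"] % every  # window()
        best = groups[r["host"]].get(bucket)      # group()
        if best is None or r["_value"] > best:
            groups[r["host"]][bucket] = r["_value"]  # max()
    return {host: sorted(w.items()) for host, w in groups.items()}


def row(field, host, time, value):
    return {"_measurement": "cpu", "_field": field,
            "host": host, "_time": time, "_value": value}


rows = [
    row("usage_system", "a", 0, 1.5),
    row("usage_system", "a", 300, 4.0),
    row("usage_system", "a", 700, 2.0),
    row("usage_user", "a", 100, 99.0),
    row("usage_system", "b", 650, 3.0),
]
print(window_max(rows, start=0))
# {'a': [(0, 4.0), (600, 2.0)], 'b': [(600, 3.0)]}
```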

  80. A time series query language
    Working on data locally is more powerful

  81. What do we want?
    ❖ Quick access to the data
    ❖ Time series native features
    ❖ Back-end agnostic
    ❖ Simplify user experience

  82. Alternative: TSQL spec
    select("cpu.usage_system")
    .where("cpu~cpu[0-7]*")
    .last(12h)
    .sampleBy(5m,max)
    .groupBy(mean)
    .rate()
    ❖ Time Series Query Language
    ❖ Simplify time series computation

  83. HEW use case with TSQL
    sample = select('sap.flux')
    .where('KEPLERID=6541920')
    .from("2009-05-02T00:56:10.000000Z", to="2013-05-11T12:02:06.000000Z")
    .timesplit(6h,100,"record")
    .filterByLabels('record~[2-5]')
    .sampleBy(2h, min, false, "none")
    trend = sample.window(mean, 5, 5)
    sub(sample,trend)
    .on('KEPLERID','record')
    .lessThan(-20.0)
    ❖ Supports complex use cases
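The core of this query is a detrending step: subtract a moving mean from the sampled flux and keep the points that fall more than 20 below it. A minimal sketch of that step alone, assuming `window(mean, 5, 5)` means five points before and five after the current one; the flux values are made up and the helper names are hypothetical.

```python
def moving_mean(values, before=5, after=5):
    """Centered moving mean; the window shrinks at both ends of the list."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - before): i + after + 1]
        out.append(sum(window) / len(window))
    return out


def dips(values, threshold=-20.0, before=5, after=5):
    """Return (index, residual) where value - trend falls below threshold."""
    trend = moving_mean(values, before, after)
    return [(i, v - t) for i, (v, t) in enumerate(zip(values, trend))
            if v - t < threshold]


flux = [100.0] * 20
flux[10] = 60.0  # hypothetical transit-like drop
print(dips(flux))  # a single dip, at index 10
```

Only the point at index 10 is reported: its residual is about -36, while its neighbours sit slightly above the lowered trend.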

  84. Thanks!
