Organizing Metrics: Hierarchical or Tagged?

Paul Dix
October 30, 2014

Talk given at the London DevOps Exchange about how to organize hundreds of thousands of metric time series.

Transcript

  1. Organizing Metrics:
    Hierarchical or Tagged?
    Paul Dix
    CEO of InfluxDB
    @pauldix

  2. Organizing Metrics?

  3. Necessary when you
    have thousands, tens of thousands,
    hundreds of thousands, or millions
    of series

  4. Discovery
    What metrics do I have?

  5. Merging and
    Aggregating
    Combine these and give me a result

  6. (image only)

  7. Hierarchy

  8. Artifact of Whisper’s
    implementation

  9. Series are round-robin
    files on disk organized in
    directories (hierarchy)

  10. Metadata encoded in
    series name
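
A minimal Python sketch of the two points above, assuming Graphite-style dotted names stored as Whisper files: each dot becomes a directory level, and metadata can only be recovered by position. The storage root and the alternating `key.value` naming convention (borrowed from the `app.foo.dc.uswest.host.servera.cpu_wait` names shown later in the deck) are assumptions; `whisper_path` and `decode_name` are hypothetical helpers, not Graphite APIs.

```python
import posixpath


def whisper_path(storage_root: str, series_name: str) -> str:
    """Map a dotted series name to a Whisper-style file path.

    Every dot-separated segment except the last becomes a directory;
    the last segment becomes the file name with a .wsp extension.
    """
    *dirs, leaf = series_name.split(".")
    return posixpath.join(storage_root, *dirs, leaf + ".wsp")


def decode_name(series_name: str) -> dict[str, str]:
    """Recover metadata that was encoded positionally in the name.

    Assumes a hypothetical convention of alternating key.value segments
    followed by the sensor name, e.g. app.foo.dc.uswest.host.servera.cpu_wait.
    """
    *pairs, sensor = series_name.split(".")
    if len(pairs) % 2 != 0:
        raise ValueError(f"{series_name!r} does not follow the key.value convention")
    meta = dict(zip(pairs[0::2], pairs[1::2]))
    meta["sensor"] = sensor
    return meta
```

Any name that breaks the assumed convention cannot be decoded, which is the fragility of keeping metadata only in the name.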

  11. (image only)

  12. Tagged?

  13. OpenTSDB Metrics
    mysql.bytes_received \
    1287333217 327810227706 \
    schema=foo host=db1

  14. mysql.bytes_received \
    1287333217 327810227706 \
    schema=foo host=db1
    Name: mysql.bytes_received

  15. mysql.bytes_received \
    1287333217 327810227706 \
    schema=foo host=db1
    Tags: schema=foo host=db1
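
A rough sketch of how the line above splits into a name and tags, assuming the backslashes on the slide are only line continuations and the whole record is one whitespace-separated line (optionally prefixed by `put`, as when sent over OpenTSDB's telnet interface). `parse_line` and `DataPoint` are hypothetical names for illustration; this does not reproduce OpenTSDB's own validation rules.

```python
from typing import NamedTuple


class DataPoint(NamedTuple):
    metric: str            # single-level name, e.g. mysql.bytes_received
    timestamp: int         # epoch timestamp as written on the line
    value: float
    tags: dict[str, str]   # key=value pairs, e.g. {"host": "db1"}


def parse_line(line: str) -> DataPoint:
    """Split an OpenTSDB-style line into name, timestamp, value and tags."""
    fields = line.split()
    if fields and fields[0] == "put":  # tolerate the telnet-style prefix
        fields = fields[1:]
    if len(fields) < 4:
        raise ValueError("expected: metric timestamp value tagk=tagv [...]")
    metric, timestamp, value, *tag_fields = fields
    tags: dict[str, str] = {}
    for field in tag_fields:
        key, sep, val = field.partition("=")
        if not sep or not key or not val:
            raise ValueError(f"malformed tag {field!r}")
        tags[key] = val
    return DataPoint(metric, int(timestamp), float(value), tags)
```

The name stays a single level, while everything describing where the value came from lives in `tags`.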

  16. Single Level Hierarchy
    + Tags

  17. Hierarchy: names

  18. Tags

  19. Metadata encoded in
    series name and tags

  20. (image only)

  21. Data
    [
      {
        "name": "cpu",
        "columns": ["time", "value", "host"],
        "points": [
          [1395168540, 56.7, "foo.influxdb.com"],
          [1395168540, 43.9, "bar.influxdb.com"]
        ]
      }
    ]
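
The payload above can be flattened into one record per point to show what "flat" means here: `host` is just another column next to `time` and `value`, not part of the series name. A small Python sketch, assuming only the `name`/`columns`/`points` shape visible on the slide; `rows` is a hypothetical helper.

```python
import json

# The payload from the slide, verbatim apart from indentation.
PAYLOAD = """
[
  {
    "name": "cpu",
    "columns": ["time", "value", "host"],
    "points": [
      [1395168540, 56.7, "foo.influxdb.com"],
      [1395168540, 43.9, "bar.influxdb.com"]
    ]
  }
]
"""


def rows(series_list):
    """Yield one flat dict per point: the series name plus every column."""
    for series in series_list:
        columns = series["columns"]
        for point in series["points"]:
            row = {"name": series["name"]}
            row.update(zip(columns, point))
            yield row


data = list(rows(json.loads(PAYLOAD)))
hosts = sorted({row["host"] for row in data})
```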

  22. Flat

  23. list series

  24. list series /.*dc\.USWest/

  25. select percentile(90, value)
    from merge(/cpu_wait.*dc\.USWest.*/)
    group by time(10m)
    where time > now() - 4h
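
A toy Python model of slides 24 and 25, assuming an in-memory dictionary of invented series names shaped like the slides' regexes: pick series by regex, merge their points, bucket them into 10-minute windows, and take a 90th percentile per window. The nearest-rank percentile is one common definition chosen for illustration; InfluxDB's own `percentile` may compute it differently, and the `where time > now() - 4h` filter is omitted.

```python
import math
import re
from collections import defaultdict

# Hypothetical store: series name -> [(unix_seconds, value), ...].
# The names are invented; only their shape follows the slides' patterns.
SERIES = {
    "cpu_wait.app.foo.dc.USWest.host.a": [(0, 1.0), (60, 3.0), (660, 5.0)],
    "cpu_wait.app.foo.dc.USWest.host.b": [(30, 2.0), (90, 4.0)],
    "cpu_wait.app.foo.dc.USEast.host.c": [(0, 100.0)],
}


def list_series(pattern: str) -> list[str]:
    """Rough analogue of `list series /pattern/`: names the regex matches anywhere."""
    regex = re.compile(pattern)
    return sorted(name for name in SERIES if regex.search(name))


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of values <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def merged_percentile(pattern: str, p: float, bucket_seconds: int = 600) -> dict[int, float]:
    """Merge every matching series, bucket points by time, take a percentile per bucket."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for name in list_series(pattern):
        for ts, value in SERIES[name]:
            buckets[ts - ts % bucket_seconds].append(value)
    return {start: percentile(vals, p) for start, vals in sorted(buckets.items())}
```

Because `list_series` must run the regex against every stored name, discovery cost grows with the total series count, which is the scaling problem the next slide raises.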

  26. Doesn’t scale well to
    millions of series!

  27. select percentile(90, value)
    from cpu_wait
    group by time(10m)
    where time > now() - 4h and
    dataCenter = 'USWest'

  28. Doesn’t scale well to
    thousands of hosts!

  29. We have to pick a
    method

  30. Hierarchy vs. Tags?

  31. Religious Debate?

  32. Emacs vs. Vim

  33. i.e. debates that can’t
    be solved

  34. Scientific Debate?

  35. acceleration due to
    gravity

  36. Things that have a
    clear testable answer

  37. Hierarchy vs. Tags: a
    bit of both?

  38. Tags are vastly
    superior to hierarchies

  39. What questions can
    you ask?

  40. What sensors do I
    have?
    CPU Idle, network in bytes, memory used, redis key
    count, etc.

  41. OpenTSDB
    names

  42. cpu_wait
    network_in_bytes
    network_out_bytes

  43. Graphite
    traverse hierarchy

  44. app.foo.dc.uswest.host.servera.cpu_wait
    app.foo.dc.uswest.host.servera.network_in_bytes
    app.foo.dc.uswest.host.servera.network_out_bytes

  45. app.foo.dc.uswest.host.servera.cpu_wait
    app.foo.dc.uswest.host.servera.network_in_bytes
    app.foo.dc.uswest.host.servera.network_out_bytes

    Sensor at the end
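
To answer "what sensors do I have?" from a hierarchy, something has to walk the tree down to its leaves. A minimal Python sketch, assuming dotted names like the ones above; `build_tree`, `children` and `sensors` are hypothetical helpers, not Graphite's API.

```python
def build_tree(names: list[str]) -> dict:
    """Nest dotted names into a dict-of-dicts, one level per segment."""
    tree: dict = {}
    for name in names:
        node = tree
        for segment in name.split("."):
            node = node.setdefault(segment, {})
    return tree


def children(tree: dict, path: str) -> list[str]:
    """List the segments directly under a dotted prefix ("" means the root)."""
    node = tree
    for segment in filter(None, path.split(".")):
        node = node[segment]
    return sorted(node)


def sensors(tree: dict) -> list[str]:
    """Collect leaf segments; with this naming style the sensor is the last part."""
    found: set[str] = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        for segment, child in node.items():
            if child:
                stack.append(child)
            else:
                found.add(segment)
    return sorted(found)


NAMES = [
    "app.foo.dc.uswest.host.servera.cpu_wait",
    "app.foo.dc.uswest.host.servera.network_in_bytes",
    "app.foo.dc.uswest.host.servera.network_out_bytes",
]
TREE = build_tree(NAMES)
```

`children` mirrors browsing one level at a time; `sensors` only works because these names happen to put the sensor last.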

  46. What values do I have
    on dimension X?
    hosts, data centers, services, applications

  47. OpenTSDB
    traverse one level and tags

  48. redis_connections
    response_times.90

  49. app.foo.dc.uswest.host.servera.cpu_wait
    app.foo.dc.uswest.host.servera.network_in_bytes
    app.foo.dc.uswest.host.servera.network_out_bytes
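
The contrast between slides 47 and 49 can be sketched in Python: with hierarchical names, finding every host means knowing where the host segment sits, whereas with tags the value is read directly by key. Both helpers are hypothetical, and `values_by_position` assumes the same alternating `key.value` convention as the names above.

```python
def values_by_position(names: list[str], key: str) -> list[str]:
    """Hierarchy: take the segment after `key` in every name (key.value convention)."""
    found: set[str] = set()
    for name in names:
        segments = name.split(".")
        # Skip the last segment: it is the sensor and has no value after it.
        for i, segment in enumerate(segments[:-1]):
            if segment == key:
                found.add(segments[i + 1])
    return sorted(found)


def values_by_tag(tagged_series: list[dict[str, str]], key: str) -> list[str]:
    """Tags: read the value straight off each series' tag set."""
    return sorted({tags[key] for tags in tagged_series if key in tags})
```

If a value ever equals a key name (say, a hypothetical application called `host`), `values_by_position` returns wrong answers; the tag version has no such ambiguity.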

  50. Show me all time
    series for X
    dashboard for MySQL, dashboard for host

  51. Computations
    percentiles across sets of hosts, data centers, services

  52. Pure tagging

  53. {
    "Name": "CPU Wait",
    "Host": "serverA.influxdb.com",
    "Data Center": "US West"
    }

  54. {
    "Name": "CPU Wait",
    "Host": "serverA.influxdb.com",
    "Data Center": "US West"
    }
    Readable Names

  55. {
    "Name": "Redis Connections",
    "Host": "serverA.influxdb.com",
    "Data Center": "US West"
    }
    Queryable
    Which hosts have redis connections?

  56. {
    "Name": "Errors",
    "Host": "serverA.influxdb.com",
    "Data Center": "US West",
    "Application": "My super rad app"
    }
    Queryable
    What names (sensors) do I have for My super rad app?

  57. Queryable
    What names (sensors) do I have at 1h precision?
    {
    "Name": "Errors",
    "Host": "serverA.influxdb.com",
    "Data Center": "US West",
    "Application": "My super rad app",
    "Precision": "1h"
    }
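
One way to make the pure-tag series on slides 53 to 57 queryable is an inverted index from each `(key, value)` pair to the series carrying it. A minimal Python sketch with invented series modeled on the slides; `INDEX`, `matching` and `values_of` are hypothetical names.

```python
from collections import defaultdict

# Invented series modeled on the slides; each is described only by its tags.
SERIES = [
    {"Name": "CPU Wait", "Host": "serverA.influxdb.com", "Data Center": "US West"},
    {"Name": "Redis Connections", "Host": "serverA.influxdb.com", "Data Center": "US West"},
    {"Name": "Redis Connections", "Host": "serverB.influxdb.com", "Data Center": "US West"},
    {"Name": "Errors", "Host": "serverA.influxdb.com", "Data Center": "US West",
     "Application": "My super rad app", "Precision": "1h"},
]

# Inverted index: (tag key, tag value) -> ids of the series carrying that pair.
INDEX: dict[tuple[str, str], set[int]] = defaultdict(set)
for series_id, tags in enumerate(SERIES):
    for pair in tags.items():
        INDEX[pair].add(series_id)


def matching(predicate: dict[str, str]) -> set[int]:
    """Ids of series whose tags include every key/value pair in `predicate`."""
    ids = set(range(len(SERIES)))
    for pair in predicate.items():
        ids &= INDEX.get(pair, set())
    return ids


def values_of(key: str, predicate: dict[str, str]) -> list[str]:
    """Distinct values of `key` across the series matching `predicate`."""
    return sorted({SERIES[i][key] for i in matching(predicate) if key in SERIES[i]})
```

`values_of("Host", {"Name": "Redis Connections"})` answers "which hosts have redis connections?", and `values_of("Name", {"Precision": "1h"})` answers the precision question.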

  58. Computation
    select percentile(90)
    from ("Name": "CPU Wait", "Data Center": "US West")
    group by time(10m)
    where time > now() - 6h
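
Slide 58's query selects series by tag predicate instead of by regex over names. A Python sketch of that selection feeding a 10-minute, 90th-percentile rollup, assuming invented in-memory series and a nearest-rank percentile; the slide's syntax appears to be a proposal, and `select_percentile` is a hypothetical stand-in for it. The `now() - 6h` filter is omitted.

```python
import math
from collections import defaultdict

# Hypothetical series: (tag set, [(unix_seconds, value), ...]).
SERIES = [
    ({"Name": "CPU Wait", "Host": "a", "Data Center": "US West"}, [(0, 1.0), (60, 3.0), (700, 8.0)]),
    ({"Name": "CPU Wait", "Host": "b", "Data Center": "US West"}, [(30, 2.0), (90, 4.0)]),
    ({"Name": "CPU Wait", "Host": "c", "Data Center": "US East"}, [(0, 99.0)]),
]


def nearest_rank(values: list[float], p: float) -> float:
    """Smallest value with at least p% of the values <= it."""
    ordered = sorted(values)
    return ordered[max(1, math.ceil(p / 100 * len(ordered))) - 1]


def select_percentile(predicate: dict[str, str], p: float, bucket_seconds: int = 600) -> dict[int, float]:
    buckets: dict[int, list[float]] = defaultdict(list)
    for tags, points in SERIES:
        # Selected when the tags contain every predicate pair; no name parsing.
        if all(tags.get(k) == want for k, want in predicate.items()):
            for ts, value in points:
                buckets[ts - ts % bucket_seconds].append(value)
    return {start: nearest_rank(vals, p) for start, vals in sorted(buckets.items())}
```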

  59. Hierarchy on the fly
    What tags co-occur with a given tag?

  60. Faceted Search

  61. Given “Host” and “Data
    Center” what other tags
    are there?

  62. "Data Center" = "US West"
    "Name": 2153
    "Host": 256
    "Service": 20
    "Precision": 10
    "Application": 4
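
Faceted search over tags amounts to: filter series by the given tags, then report which other tag keys co-occur and how many distinct values each has. A Python sketch, assuming slide 62's numbers are distinct-value counts per co-occurring key (they could also be series counts); `facet_counts` and the sample series are hypothetical.

```python
from collections import defaultdict


def facet_counts(series_tags: list[dict[str, str]], predicate: dict[str, str]) -> dict[str, int]:
    """For series matching `predicate`, count distinct values of each other tag key."""
    seen: dict[str, set[str]] = defaultdict(set)
    for tags in series_tags:
        if all(tags.get(k) == v for k, v in predicate.items()):
            for key, value in tags.items():
                if key not in predicate:
                    seen[key].add(value)
    # Largest facets first, ties broken alphabetically.
    ranked = sorted(seen.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return {key: len(values) for key, values in ranked}


SAMPLE = [
    {"Data Center": "US West", "Name": "CPU Wait", "Host": "a"},
    {"Data Center": "US West", "Name": "Redis Connections", "Host": "a", "Service": "redis"},
    {"Data Center": "US West", "Name": "CPU Wait", "Host": "b"},
    {"Data Center": "US East", "Name": "Errors", "Host": "c"},
]
FACETS = facet_counts(SAMPLE, {"Data Center": "US West"})
```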

  63. Need to be able to
    add dimensions/tags

  64. Need to support a
    large number of tags
    Both for a single data point and overall

  65. Pure tagging gives you
    much more power than
    hierarchies

  66. Can be combinatorial
    OpenTSDB hot spots, etc.
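
The combinatorial risk is plain arithmetic: the number of possible series is the product of each tag key's cardinality. A small Python sketch with hypothetical cardinalities, plus a count of the combinations that actually occur.

```python
import math


def max_series(cardinalities: dict[str, int]) -> int:
    """Upper bound on distinct series if every tag-value combination occurs."""
    return math.prod(cardinalities.values())


def actual_series(tag_sets: list[dict[str, str]]) -> int:
    """Series that really exist: distinct tag combinations actually written."""
    return len({frozenset(tags.items()) for tags in tag_sets})


# Hypothetical cardinalities, for illustration only.
BOUND = max_series({"Host": 1000, "Name": 200, "Precision": 5})
```

With 1,000 hosts, 200 names and 5 precisions, the bound is already 1,000,000 series.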

  67. Need to be able to define
    indexing behavior for
    tags
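
The slide does not say what "indexing behavior" would look like. One hypothetical shape, sketched in Python: a per-key switch, where configured keys get an inverted-index entry and all other tags are stored but found only by scanning. `TagIndex` and its `indexed_keys` setting are invented for illustration.

```python
from collections import defaultdict


class TagIndex:
    """Inverted index that only indexes the tag keys it is configured for."""

    def __init__(self, indexed_keys: list[str]):
        self.indexed_keys = set(indexed_keys)
        self.series: list[dict[str, str]] = []                  # series id -> full tag set
        self.postings: dict[tuple[str, str], set[int]] = defaultdict(set)

    def add(self, tags: dict[str, str]) -> int:
        series_id = len(self.series)
        self.series.append(dict(tags))
        for key, value in tags.items():
            if key in self.indexed_keys:
                self.postings[(key, value)].add(series_id)
        return series_id

    def lookup(self, key: str, value: str) -> set[int]:
        if key in self.indexed_keys:
            # Indexed key: answer straight from the postings.
            return set(self.postings.get((key, value), set()))
        # Unindexed key: fall back to scanning every series.
        return {i for i, tags in enumerate(self.series) if tags.get(key) == value}
```

A high-cardinality tag such as a request ID (hypothetical) could be left unindexed so it does not bloat the index, at the cost of slow lookups on that key.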

  68. Can it work?

  69. This is part of what
    InfluxDB is working on
    Feedback welcome!
    http://influxdb.com/community.html
    @InfluxDB
