Organizing Metrics: Hierarchical or Tagged?

Paul Dix
October 30, 2014

Talk given at the London DevOps Exchange about how to organize hundreds of thousands of metric time series.

Transcript

  1. Organizing Metrics: Hierarchical or Tagged? Paul Dix, CEO of InfluxDB, paul@influxdb.com, @pauldix

  2. Organizing Metrics?

  3. Necessary when you have thousands, tens or hundreds of thousands, or millions of series

  4. Discovery: What metrics do I have?

  5. Merging and Aggregating: Combine these and give me a result

  6. None
  7. Hierarchy

  8. Artifact of Whisper’s implementation

  9. Series are round-robin files on disk, organized in directories (hierarchy)

  10. Metadata encoded in the series name

  11. None
  12. Tagged?

  13. OpenTSDB Metrics: mysql.bytes_received 1287333217 327810227706 schema=foo host=db1

  14. mysql.bytes_received 1287333217 327810227706 schema=foo host=db1 (label: Name)

  15. mysql.bytes_received 1287333217 327810227706 schema=foo host=db1 (label: Tags)
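
The OpenTSDB-style line on slides 13 to 15 reads as a metric name, a Unix timestamp, a value, and a list of key=value tags. A minimal Python sketch of that split, assuming whitespace-separated fields in that order (`parse_metric_line` is a hypothetical helper, not part of OpenTSDB):

```python
def parse_metric_line(line):
    """Split 'name timestamp value key=val ...' into its parts.

    Assumes whitespace-separated fields in exactly this order.
    """
    name, timestamp, value, *tag_pairs = line.split()
    # Each remaining field is one tag; split only on the first '='.
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    return {
        "name": name,
        "timestamp": int(timestamp),
        "value": float(value),
        "tags": tags,
    }


point = parse_metric_line(
    "mysql.bytes_received 1287333217 327810227706 schema=foo host=db1"
)
print(point["name"], point["tags"])
```

The name carries a single level of hierarchy (the sensor), while everything else about the series lives in the tags.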

  16. Single Level Hierarchy + Tags

  17. Hierarchy: names

  18. Tags

  19. Metadata encoded in series name and tags

  20. None
  21. Data: [ { "name": "cpu", "columns": ["time", "value", "host"], "points": [ [1395168540, 56.7, "foo.influxdb.com"], [1395168540, 43.9, "bar.influxdb.com"] ] } ]
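
In the payload on slide 21, each entry in "points" lines up positionally with "columns", so the host travels with the data as a column value rather than as part of the series name. A small Python sketch of reading it that way (`iter_rows` is a hypothetical helper; the payload is copied from the slide):

```python
import json

PAYLOAD = """
[ { "name": "cpu",
    "columns": ["time", "value", "host"],
    "points": [ [1395168540, 56.7, "foo.influxdb.com"],
                [1395168540, 43.9, "bar.influxdb.com"] ] } ]
"""


def iter_rows(series_list):
    """Yield (series name, row dict), pairing each point with the column names."""
    for series in series_list:
        for point in series["points"]:
            yield series["name"], dict(zip(series["columns"], point))


rows = list(iter_rows(json.loads(PAYLOAD)))
for name, row in rows:
    print(name, row)
```
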
  22. Flat

  23. list series

  24. list series /.*dc\.USWest/

  25. select percentile(90, value) from merge(/cpu_wait.*dc\.USWest.*/) group by time(10m) where time > now() - 4h

  26. Doesn’t scale well to millions of series!
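
One plausible reason the regex approach on slides 24 and 25 struggles at millions of series: with metadata packed into flat names, the matcher has to test every series name on each query. A Python sketch of that scan; the series names are made up for illustration, and this is not InfluxDB's actual implementation:

```python
import re

# Hypothetical flat series names with metadata packed into the name.
SERIES_NAMES = [
    "cpu_wait.dc.USWest.host.serverA",
    "cpu_wait.dc.USEast.host.serverB",
    "network_in_bytes.dc.USWest.host.serverA",
]


def matching_series(names, pattern):
    """Return the names the regex matches; every name is tested once."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


matched = matching_series(SERIES_NAMES, r"cpu_wait.*dc\.USWest.*")
print(matched)
```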

  27. select percentile(90, value) from cpu_wait group by time(10m) where time > now() - 4h and dataCenter = 'USWest'

  28. Doesn’t scale well to thousands of hosts!

  29. We have to pick a method

  30. Hierarchy vs. Tags?

  31. Religious Debate?

  32. Emacs vs. Vim

  33. i.e. debates that can’t be solved

  34. Scientific Debate?

  35. acceleration due to gravity

  36. Things that have a clear testable answer

  37. Hierarchy vs. Tags: a bit of both?

  38. Tags are vastly superior to hierarchies

  39. What questions can you ask?

  40. What sensors do I have? CPU idle, network in bytes, memory used, Redis key count, etc.

  41. OpenTSDB: names

  42. cpu_wait network_in_bytes network_out_bytes …

  43. Graphite: traverse the hierarchy

  44. app.foo.dc.uswest.host.servera.cpu_wait app.foo.dc.uswest.host.servera.network_in_bytes app.foo.dc.uswest.host.servera.network_out_bytes …

  45. app.foo.dc.uswest.host.servera.cpu_wait app.foo.dc.uswest.host.servera.network_in_bytes app.foo.dc.uswest.host.servera.network_out_bytes … (label: Sensor at the end)
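
With the sensor at the end, the rest of a Graphite-style name can be read as alternating dimension and value segments. A Python sketch of decoding a name that way, assuming every name follows the key.value pattern seen on the slide (`decode_hierarchical_name` is a hypothetical helper):

```python
def decode_hierarchical_name(name):
    """Split 'k1.v1.k2.v2.sensor' into (sensor, {k1: v1, k2: v2}).

    Assumes the last segment is the sensor and the preceding
    segments alternate between dimension name and value.
    """
    *segments, sensor = name.split(".")
    if len(segments) % 2 != 0:
        raise ValueError(f"expected key/value pairs before the sensor: {name!r}")
    tags = dict(zip(segments[0::2], segments[1::2]))
    return sensor, tags


sensor, tags = decode_hierarchical_name("app.foo.dc.uswest.host.servera.cpu_wait")
print(sensor, tags)
```

The decoding depends entirely on the naming convention: a name with an odd number of prefix segments has no clear reading, so the sketch rejects it.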

  46. What values do I have on dimension X? Hosts, data centers, services, applications

  47. OpenTSDB: traverse one level and tags

  48. redis_connections response_times.90 …

  49. app.foo.dc.uswest.host.servera.cpu_wait app.foo.dc.uswest.host.servera.network_in_bytes app.foo.dc.uswest.host.servera.network_out_bytes …

  50. Show me all time series for X: dashboard for MySQL, dashboard for host

  51. Computations: percentiles across sets of hosts, data centers, services

  52. Pure tagging

  53. { "Name": "CPU Wait", "Host": "serverA.influxdb.com", "Data Center": "US West" }

  54. { "Name": "CPU Wait", "Host": "serverA.influxdb.com", "Data Center": "US West" } Readable names

  55. { "Name": "Redis Connections", "Host": "serverA.influxdb.com", "Data Center": "US West" } Queryable: Which hosts have Redis connections?

  56. { "Name": "Errors", "Host": "serverA.influxdb.com", "Data Center": "US West", "Application": "My super rad app" } Queryable: What names (sensors) do I have for My super rad app?

  57. Queryable: What names (sensors) do I have at 1h precision? { "Name": "Errors", "Host": "serverA.influxdb.com", "Data Center": "US West", "Application": "My super rad app", "Precision": "1h" }
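
The questions on slides 55 to 57 all have the same shape: filter series by some tag values, then list the distinct values of another tag. A Python sketch over an in-memory list of tag sets; `distinct_values` is a hypothetical helper, the serverB entry is invented for illustration, and a real store would presumably use an index rather than a scan:

```python
SERIES = [
    {"Name": "CPU Wait", "Host": "serverA.influxdb.com", "Data Center": "US West"},
    {"Name": "Redis Connections", "Host": "serverA.influxdb.com", "Data Center": "US West"},
    {"Name": "Errors", "Host": "serverA.influxdb.com", "Data Center": "US West",
     "Application": "My super rad app", "Precision": "1h"},
    # Invented entry so the first query returns more than one host.
    {"Name": "Redis Connections", "Host": "serverB.influxdb.com", "Data Center": "US East"},
]


def distinct_values(series, key, where):
    """Sorted distinct values of `key` among series whose tags match `where`."""
    return sorted({
        tags[key]
        for tags in series
        if key in tags and all(tags.get(k) == v for k, v in where.items())
    })


# Which hosts have Redis connections?
redis_hosts = distinct_values(SERIES, "Host", {"Name": "Redis Connections"})
# What names (sensors) do I have at 1h precision?
hourly_names = distinct_values(SERIES, "Name", {"Precision": "1h"})
print(redis_hosts, hourly_names)
```
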
  58. Computation: select percentile(90) from ("Name": "CPU Wait", "Data Center": "US West") group by time(10m) where time > now() - 6h
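
The computation on slide 58 can be sketched in two steps: bucket the points of the selected series into fixed time windows, then take a percentile per window. A Python sketch, assuming (timestamp, value) pairs with timestamps in seconds and the nearest-rank percentile definition; the slides do not say which definition InfluxDB uses:

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of values <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def percentile_by_window(points, pct, window_seconds):
    """Group (timestamp, value) pairs into fixed windows, one percentile each."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        window_start = timestamp - timestamp % window_seconds
        buckets[window_start].append(value)
    return {start: percentile(vals, pct) for start, vals in sorted(buckets.items())}


# Ten points in the first 10-minute window, one in the second.
points = [(i * 60, float(i + 1)) for i in range(10)] + [(600, 50.0)]
result = percentile_by_window(points, 90, 600)
print(result)
```
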
  59. Hierarchy on the fly: What tags co-occur with a given tag?

  60. Faceted Search

  61. Given “Host” and “Data Center” what other tags are there?
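
One way to answer this kind of question: select the series carrying a given tag value, then count the distinct values of every other tag key on them. Reading the facet numbers in the talk as distinct-value counts is an assumption, and the data below is invented for illustration:

```python
from collections import defaultdict

# Invented tag sets, one dict per series.
SERIES = [
    {"Name": "CPU Wait", "Host": "serverA", "Data Center": "US West"},
    {"Name": "CPU Wait", "Host": "serverB", "Data Center": "US West"},
    {"Name": "Errors", "Host": "serverA", "Data Center": "US West",
     "Application": "My super rad app"},
    {"Name": "CPU Wait", "Host": "serverC", "Data Center": "US East"},
]


def facet_counts(series, key, value):
    """For series tagged key=value, count distinct values of each other tag key."""
    values_by_key = defaultdict(set)
    for tags in series:
        if tags.get(key) != value:
            continue
        for other_key, other_value in tags.items():
            if other_key != key:
                values_by_key[other_key].add(other_value)
    return {k: len(vs) for k, vs in values_by_key.items()}


facets = facet_counts(SERIES, "Data Center", "US West")
print(facets)
```

Each facet is itself a tag key that can be drilled into next, which is how a hierarchy can be built on the fly from any starting tag.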

  62. "Data Center" = "US West": "Name": 2153, "Host": 256, "Service": 20, "Precision": 10, "Application": 4

  63. Need to be able to add dimensions/tags

  64. Need to support a large number of tags, both for a single data point and overall

  65. Pure tagging gives you much more power than hierarchies

  66. Can be combinatorial: OpenTSDB hot spots, etc.

  67. Need to be able to define indexing behavior for tags

  68. Can it work?

  69. This is part of what InfluxDB is working on. Feedback welcome! http://influxdb.com/community.html @InfluxDB