5 Years of Metrics & Monitoring

Video of this talk from DevOpsDays Ghent: http://www.ustream.tv/recorded/54694069


5 years ago, monitoring was just beginning to emerge from the dark ages.

Since then there's been a Cambrian explosion of tools, a rough formalisation of how the tools should be strung together, the emergence of the #monitoringsucks meme, the transformation of #monitoringsucks into #monitoringlove, and the rise of a sister community around Monitorama.

Alert fatigue has become a concept that's entered the devops consciousness, and more advanced shops along the monitoring continuum are analysing their alerting data to help humans and machines work better together.

But Nagios is still the dominant check executor. Plenty of sites still use RRDtool. And plenty of people are still chained to their pagers, with no relief in sight.

What's holding us back? What will the next 5 years look like? Will we still be using Nagios? Have we misjudged our audience? What are our biggest challenges?

### Sources ###

Font: http://www.fontsquirrel.com/fonts/sketchetica
The Gospel of Graphs, according to Cleveland: http://www.amazon.com/Elements-Graphing-Data-William-Cleveland/dp/0963488414

Lindsay Holmwood

October 27, 2014


  1. Key retrospective questions
    • What did we do well?
    • What did we learn?
    • What should we do differently next time?
    • What still puzzles us?
  2. [Figure: bounding box with x + y axes labels]
  3. 8%

  4. “This allows us to see very clearly that the pie chart judgements are less accurate than the bar chart judgements.”
    – William S. Cleveland, p.86 Principles of Graphing Data
  5. When I hear people say “I'm not using Sensu because it's too complex” I think “and Nagios isn't hiding the same complexity from you?”
  6. 1. checkout 2. build 3. test 4. notify
    can I see my app?
    Continuous Integration
    Monitoring
  7. Last 5 years
    • Building new tools
    • Formalising relationships
    • Search for parallels in other industries
    • Measuring the human impact
  8. Next
    • Stabilisation of tools
    • Emerging standards
    • Exploiting parallels
    • Mitigating the human impact
  9. {
      server: dfs1
      what: diskspace
      mountpoint: srv/node/dfs10
      unit: B
      type: used
      metric_type: gauge
    }
    meta: {
      agent: diamond,
      processed_by: statsd2
    }
  10. • Aggregation &
    • Grouping &
    • Unit conversions &
    • Scaling &
    • Axes labelling &
    • …
  11. Cultural
    • Coach on what makes a good check
    • Coach on what is good alert design
    • Listen to the needs of the end-user
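
Slides 9 and 10 suggest that a metric carrying its own tags (server, what, unit, type) lets tooling do grouping, aggregation, unit conversion and axis labelling without hand-written rules. The following is a minimal Python sketch of that idea, not the speaker's implementation. The tag names and the first mountpoint come from slide 9; the dictionary layout, the numeric values, the other two metrics, and the helper names `make_metric`, `group_and_sum`, `to_gib` and `axis_label` are all assumptions made for illustration.

```python
from collections import defaultdict

BYTES_PER_GIB = 1024 ** 3


def make_metric(server, mountpoint, value):
    """Build a self-describing metric. Tag names follow slide 9;
    the dict layout itself is an assumption for this sketch."""
    return {
        "tags": {
            "server": server,
            "what": "diskspace",
            "mountpoint": mountpoint,
            "unit": "B",
            "type": "used",
            "metric_type": "gauge",
        },
        "meta": {"agent": "diamond", "processed_by": "statsd2"},
        "value": value,
    }


# Only dfs1 and srv/node/dfs10 appear on the slide; every value and the
# other server/mountpoint names are invented sample data.
METRICS = [
    make_metric("dfs1", "srv/node/dfs10", 3 * BYTES_PER_GIB),
    make_metric("dfs1", "srv/node/dfs11", 2 * BYTES_PER_GIB),
    make_metric("dfs2", "srv/node/dfs20", 1 * BYTES_PER_GIB),
]


def group_and_sum(metrics, tag):
    """Grouping and aggregation: sum metric values per distinct tag value."""
    totals = defaultdict(int)
    for metric in metrics:
        totals[metric["tags"][tag]] += metric["value"]
    return dict(totals)


def to_gib(value, unit):
    """Unit conversion driven by the metric's own 'unit' tag."""
    if unit != "B":
        raise ValueError(f"unsupported unit: {unit}")
    return value / BYTES_PER_GIB


def axis_label(tags, display_unit):
    """Axes labelling derived from tags rather than typed in by hand."""
    return f"{tags['what']} {tags['type']} ({display_unit})"


if __name__ == "__main__":
    per_server = group_and_sum(METRICS, "server")
    for server in sorted(per_server):
        print(server, to_gib(per_server[server], "B"), "GiB")
    print(axis_label(METRICS[0]["tags"], "GiB"))
```

Because the unit and the "what" travel with each data point, the same three helpers would work unchanged for any other metric tagged the same way, which is the appeal of the approach.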