Self Service Alerting for Everyone

Dan
April 22, 2014

Short Talk given at the Boston Devops Meetup
http://www.meetup.com/Boston-Devops/events/174763722/

Transcript

  1. Definitions

    • Everyone
      1. Any user with a browser (within your organization).
      2. Yes, everyone!
    • Self Service Alerting
      1. What it is not:
        • Committing and pushing Nagios configs, and then restarting Nagios.
        • Needing a Masters in Zabbix to set up a trigger, action, email template, etc.
      2. What it is:
        • The ability for everyone to create a basic alert and get notified by it.
    • Key Performance Indicator
      1. A metric that is a proxy of success for
        • a company
        • a particular department
        • a system
  2. KPIs

    • One person's KPIs are another's noise.
    • WebOps = Web Site Performance / Availability
    • ServerEng = CPU / Mem / Disk
    • NetEng = Bandwidth / packets per sec
    • OMS = Order Count
    • Communications = Max concurrent calls
    • Customer Service = Calls answered in under X seconds
    • Finance = Time it takes to close an accounting period

    Sometimes you’re lucky. Sometimes you’re not.
  3. Alerts

    What should I alert on? Déjà vu? Isn’t this the same/similar exercise as KPIs? Reminder: Don’t ask me, ask SMEs.
  4. Alerts

    • Who should alerts be created by?
      • Everyone
    • Why?
      • Alerting should be tuned by the people that care about it most.
      • They can adjust it to a reasonable threshold that they are happy with.
      • They should be able to see how it performs over time so they can improve it.
  5. Alerts

    This is a DevOps talk, no throwing things over the fence! We aren’t saying to start sending them all straight to the beeper. The alert creator should also initially receive and react to it. This helps with awareness and tuning.
  6. Analogy

    Email Routing and Filtering as an Analogy
    • Who sets up Outlook rules? (answer: Everyone)
    • Who sets up Exchange / Postfix rules? (answer: Fewer people than Everyone)
  7. Promotion of Alerts

    Great, now we have all these user-created alerts. Now what? The alert history can be reviewed for signal-to-noise ratio. After an alert has proven itself useful to a user or team, promote it! Promoting it might mean migrating the alert to your normal alerting system, or it could be as simple as expanding the subscription of the current alert.
  8. Rainbows and Unicorns

    Great, another talk about rainbows and unicorns. Actually it’s not: you can and should start doing this now!
  9. Metrics

    One prerequisite for this is having a lot of good metrics (or at least the ability to easily add them). We are DevOps! So we're already graphing / logging all the things (right?)
  10. Pick a tool

    Back in my day… you had to write your own self service alerting tool: https://github.com/wayfair/Graphite-Tattle
  11. Pick a tool

    These days there is a list of tools in this category. Spin the wheel or try a few and pick the one you like.
  12. Done?

    Great, now we’re managing alerting like a superstar. Pro Tip: Set up alerts and dashboards for the number of alerts you have and the number of alerts that are being sent out. Someone has to watch the watchers.
  13. Next?

    Where do we go from here?
    • Infrastructure and systems have more than just Graphite data to share.
    • Expand beyond Graphite to alert on other systems with Self Service tools.
  14. Example New Frontier

    Example of a new frontier: Logstash + Elasticsearch + Kibana = Logging done well. It's similar to how Statsd + Graphite = Metrics done well. But it has a similar gap to the one the Metrics stack had early on.
  15. Elasticsearch Alert UI

    We needed a Tattle for Elasticsearch. We are starting to develop/test one currently.
  16. Elasticsearch Alert UI

    And it’s an easy page that Everyone (in our organization) can use.
  17. Image credits in order of appearance:

    • http://commons.wikimedia.org/wiki/File:Who_is_responsible_not_me.jpg
    • http://commons.wikimedia.org/wiki/File:Venn_diagram_ABC_BW.png
    • http://en.wikipedia.org/wiki/File:Mond-vergleich.svg
    • https://www.flickr.com/photos/gedankenstuecke/108894568
    • http://commons.wikimedia.org/wiki/File:Pager_1.jpg
    • http://www.jungleredwriters.com/2011/01/all-good-things-come-in-threes.html
    • http://ocw.mit.edu/courses/special-programs/sp-2322-unicorns-and-rainbows-a-seminar-fall-2014/
    • http://en.wikipedia.org/wiki/File:WheelUK2001Round1.jpg
    • https://www.flickr.com/photos/rileyroxx/151985627/
    • http://en.wikipedia.org/wiki/File:Compass_align.jpg
    • http://www.ebay.com/bhp/unicorn-sign
    • http://ih2.redbubble.net/image.9886230.9887/sticker,375x360.png
    • http://en.wikipedia.org/wiki/File:Ap_16_view_of_Earth_during_TLC.jpg
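
The Alerts and Promotion of Alerts slides describe two mechanics: a basic threshold that the alert's creator tunes, and a review of the alert's history for signal versus noise before promoting it. The sketch below is one possible way to model that in Python. It is not how Graphite-Tattle or any other tool works; the `Alert` class, the metric name, and the promotion cutoffs (`min_firings`, `min_ratio`) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Alert:
    """A hypothetical user-created threshold alert on a single metric."""
    name: str
    metric: str        # illustrative metric path, not a real one
    threshold: float   # tuned by the person who cares about the alert
    owner: str         # the creator initially receives the alert
    # One entry per firing: True if the owner judged it actionable (signal),
    # False if it was noise.
    history: List[bool] = field(default_factory=list)

    def should_fire(self, value: float) -> bool:
        """Fire when the latest metric value exceeds the threshold."""
        return value > self.threshold

    def record(self, actionable: bool) -> None:
        """Store the owner's verdict on one firing."""
        self.history.append(actionable)

    def signal_ratio(self) -> float:
        """Fraction of past firings that were actionable (0.0 if none yet)."""
        if not self.history:
            return 0.0
        return sum(self.history) / len(self.history)


def ready_to_promote(alert: Alert, min_firings: int = 5,
                     min_ratio: float = 0.8) -> bool:
    """Assumed promotion rule: enough history, and mostly signal."""
    return (len(alert.history) >= min_firings
            and alert.signal_ratio() >= min_ratio)
```

For example, an alert created as `Alert("High concurrent calls", "comms.calls.concurrent", 500.0, "communications")` would fire on a value of 501 but not on 500, and after four actionable firings out of five it would pass the assumed promotion rule. In practice the cutoffs would be whatever the owning team agrees on.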