Self Service Alerting for Everyone

Dan
April 22, 2014

Short Talk given at the Boston Devops Meetup
http://www.meetup.com/Boston-Devops/events/174763722/

Transcript

  1. Self Service Alerting for Everyone
    Dan Rowe, Wayfair.com, April 22, 2014
  2. DRowe's Definitions:
  3. Definitions
    • Everyone
      1. Any user with a browser (within your organization).
      2. Yes, everyone!
    • Self Service Alerting
      1. What it is not:
        • Committing and pushing Nagios configs, and then restarting Nagios.
        • Needing a Masters in Zabbix to set up a trigger, action, email template, etc.
      2. What it is:
        • The ability for everyone to create a basic alert and get notified by it.
    • Key Performance Indicator
      1. A metric that is a proxy of success for:
        • a company
        • a particular department
        • a system
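To make "a basic alert" concrete, here is a minimal Python sketch. It assumes an alert is nothing more than a metric name, a comparison, a threshold, and a list of recipients. The `Alert` class, its field names, and the `notify` callback are hypothetical illustrations and are not taken from any particular tool.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical alert definition: roughly what a user might fill in on a simple web form.
@dataclass
class Alert:
    name: str
    metric: str
    comparison: str  # one of ">", "<", ">=", "<="
    threshold: float
    recipients: List[str] = field(default_factory=list)

_COMPARATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}

def is_firing(alert: Alert, value: float) -> bool:
    """Return True when the current metric value crosses the alert threshold."""
    return _COMPARATORS[alert.comparison](value, alert.threshold)

def check(alert: Alert, value: float, notify: Callable[[str, str], None]) -> int:
    """Notify every recipient if the alert fires; return how many notifications were sent."""
    if not is_firing(alert, value):
        return 0
    message = (
        f"{alert.name}: {alert.metric} is {value} "
        f"(alerts when {alert.comparison} {alert.threshold})"
    )
    for recipient in alert.recipients:
        notify(recipient, message)
    return len(alert.recipients)
```

In this sketch the alert creator would list themselves in `recipients`, and `notify` could be anything that delivers a message, such as a function that sends email.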
  4. KPIs
    What KPIs should I track? Don’t ask me, ask SMEs.
  5. KPIs
    • One person's KPIs are another's noise.
    • WebOps = Web site performance / availability
    • ServerEng = CPU / memory / disk
    • NetEng = Bandwidth / packets per second
    • OMS = Order count
    • Communications = Max concurrent calls
    • Customer Service = Calls answered in under X seconds
    • Finance = Time it takes to close an accounting period
    Sometimes you’re lucky. Sometimes you’re not.
  6. Alerts
    What should I alert on? Déjà vu? Isn’t this the same (or a similar) exercise as KPIs? Reminder: Don’t ask me, ask SMEs.
  7. Alerts
    • Who should alerts be created by?
      • Everyone
    • Why?
      • Alerting should be tuned by the people who care about it most.
      • They can adjust it to a threshold that they are happy with.
      • They should be able to see how it performs over time so they can improve it.
  8. Alerts
    This is a DevOps talk, so no throwing things over the fence! We aren’t saying to start sending every alert straight to the beeper. The alert creator should also initially receive and react to the alert. This helps with awareness and tuning.
  9. Analogy
    Email routing and filtering as an analogy. Who sets up Outlook rules? (Answer: everyone.) Who sets up Exchange / Postfix rules? (Answer: fewer people than everyone.)
  10. Promotion of Alerts
    Great, now we have all these user-created alerts. Now what? The alert history can be reviewed for signal-to-noise ratio. After an alert has proven itself useful to a user or team, promote it! Promoting it might mean migrating the alert to your normal alerting system, or it could be as simple as expanding the subscription of the current alert.
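The review step could be sketched roughly as follows. This assumes the alert history is available as a list of `(alert_name, was_actionable)` pairs, where `was_actionable` records whether the recipient considered that firing useful. The record shape, the function names, and the `min_ratio` / `min_firings` cutoffs are all hypothetical; the talk does not prescribe specific numbers.

```python
from collections import defaultdict

def summarize(history):
    """Count firings and actionable firings per alert name."""
    stats = defaultdict(lambda: {"fired": 0, "actionable": 0})
    for name, was_actionable in history:
        stats[name]["fired"] += 1
        if was_actionable:
            stats[name]["actionable"] += 1
    return dict(stats)

def promotion_candidates(history, min_ratio=0.8, min_firings=5):
    """Return alerts that fired often enough and were mostly signal, not noise."""
    candidates = []
    for name, s in summarize(history).items():
        ratio = s["actionable"] / s["fired"]
        if s["fired"] >= min_firings and ratio >= min_ratio:
            candidates.append(name)
    return sorted(candidates)
```

Alerts that fall well below the cutoff are candidates for retuning by their creator rather than for promotion.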
  11. Rainbows and Unicorns
    Great, another talk about rainbows and unicorns. Actually it’s not: you can and should start doing this now!
  12. Metrics
    One prerequisite for this is having a lot of good metrics (or at least the ability to easily add them). We are DevOps! So we're already graphing / logging all the things (right?)
  13. Pick a tool
    Back in my day… you had to write your own self service alerting tool:
    https://github.com/wayfair/Graphite-Tattle
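For a sense of what such a tool has to do with Graphite data, here is a small Python sketch. It is not a description of how Graphite-Tattle is implemented. It assumes the JSON shape returned by Graphite's render API with `format=json` (a list of series, each with a `target` and `datapoints` given as `[value, timestamp]` pairs, where missing values are `null`). The function name `latest_value` and the sample metric name are hypothetical.

```python
import json

def latest_value(render_json, target):
    """Return the most recent non-null value for `target`, or None if there is none.

    `render_json` is the text of a Graphite render response requested with format=json.
    """
    for series in json.loads(render_json):
        if series["target"] != target:
            continue
        # Datapoints are [value, timestamp] pairs; the newest ones may still be null.
        for value, _timestamp in reversed(series["datapoints"]):
            if value is not None:
                return value
    return None
```

The returned value can then be compared against whatever threshold the alert's creator chose.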
  14. Pick a tool
    These days there is a list of tools in this category. Spin the wheel, or try a few and pick the one you like.
  15. Pick a tool
    Examples are:
    https://github.com/scobal/seyren
    and
    https://github.com/arachnys/cabot
    Somewhere in between:
    https://github.com/livingsocial/rearview/
  16. Done?
    Great, now we’re managing alerting like a superstar. Pro tip: Set up alerts and dashboards for the number of alerts you have and the number of alerts that are being sent out. Someone has to watch the watchers.
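One way to watch the watchers is to publish the alert counts as metrics of their own, so they can be graphed and alerted on like anything else. The Python sketch below only builds lines in Graphite's plaintext format (`path value timestamp`); sending them to a Graphite server is left out. The function name, the `alerting` prefix, and the metric names are hypothetical.

```python
import time

def alerting_meta_metrics(alert_names, sent_log, now=None, prefix="alerting"):
    """Build Graphite plaintext lines for the number of alerts defined and sent.

    `alert_names` is the collection of configured alerts and `sent_log` holds one
    entry per notification sent during the reporting interval.
    """
    timestamp = int(now if now is not None else time.time())
    return [
        f"{prefix}.alerts_defined {len(alert_names)} {timestamp}",
        f"{prefix}.notifications_sent {len(sent_log)} {timestamp}",
    ]
```

A threshold alert on the notifications metric then catches a noisy alert before it buries everyone's inbox.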
  17. Next?
    Where do we go from here?
    • Infrastructure and systems have more than just Graphite data to share.
    • Expand beyond Graphite to alert on other systems with Self Service tools.
  18. Example New Frontier
    Example of a new frontier: Logstash + Elasticsearch + Kibana = logging done well. It's similar to Statsd + Graphite = metrics done well. But it has a gap similar to the one the metrics stack had early on.
  19. Caution: Rainbows and Unicorns ahead
  20. Elasticsearch Alert UI
    We needed a Tattle for Elasticsearch. We are currently starting to develop and test one.
  21. Elasticsearch Alert UI
    And it’s an easy page that Everyone (in our organization) can use.
  22. Yes, we are hiring: http://www.wayfair.com/careers
    Tell them DRowe sent you.
  23. Image credits in order of appearance:
    • http://commons.wikimedia.org/wiki/File:Who_is_responsible_not_me.jpg
    • http://commons.wikimedia.org/wiki/File:Venn_diagram_ABC_BW.png
    • http://en.wikipedia.org/wiki/File:Mond-vergleich.svg
    • https://www.flickr.com/photos/gedankenstuecke/108894568
    • http://commons.wikimedia.org/wiki/File:Pager_1.jpg
    • http://www.jungleredwriters.com/2011/01/all-good-things-come-in-threes.html
    • http://ocw.mit.edu/courses/special-programs/sp-2322-unicorns-and-rainbows-a-seminar-fall-2014/
    • http://en.wikipedia.org/wiki/File:WheelUK2001Round1.jpg
    • https://www.flickr.com/photos/rileyroxx/151985627/
    • http://en.wikipedia.org/wiki/File:Compass_align.jpg
    • http://www.ebay.com/bhp/unicorn-sign
    • http://ih2.redbubble.net/image.9886230.9887/sticker,375x360.png
    • http://en.wikipedia.org/wiki/File:Ap_16_view_of_Earth_during_TLC.jpg