Data driven alerting with Flapjack + Puppet + Hiera

Working in operations in 2014 is hard.

The infrastructures we manage are growing rapidly, and responsibility is being divided up across multiple teams.

Then something breaks. Your on-call engineer receives 900 SMS in 30 seconds. Her phone melts. You can’t distinguish the signal from the noise. It takes an hour to fix the problem.

Enter Flapjack: an event processing & monitoring alert routing system. Flapjack sits at the end of your monitoring pipeline and sends alerts to the right person.

You should be interested in Flapjack if:

- You want to automatically configure how your on-call engineers are notified from within Puppet.
- You want to identify failures faster by rolling up your alerts across multiple monitoring systems.
- You monitor infrastructures that have multiple teams responsible for keeping them up.

In this talk you will learn how to set up Flapjack with Puppet, how to drive Flapjack configuration from data in Hiera, and how to use Puppet's metadata to route alerts to the people who solve problems.

## Credits

http://www.flickr.com/photos/lizadaly/4373330774
http://www.flickr.com/photos/meltwater/420749031
http://www.flickr.com/photos/whatknot/8642836187
http://www.flickr.com/photos/jonmould/5393395335
http://vmfarms.com/static/img/logos/ruby-logo.png
http://www.flickr.com/photos/l1v32r1d3bmx/3985457584
http://www.flickr.com/photos/thomasforsyth/4313764488
http://www.flickr.com/photos/rubodewig/5161937181
http://www.flickr.com/photos/ronwls/7001551988
http://www.flickr.com/photos/sparktography/83217827
http://www.flickr.com/photos/sdphotography/1570906849
http://tosbourn.com/wp-content/uploads/2013/12/redis-logo.png?e0df77
http://www.flickr.com/photos/derekskey/9530097369
http://giphy.com/gifs/yeUxljCJjH1rW
http://en.wikipedia.org/wiki/Broadcast_delay
http://www.flickr.com/photos/karen_d/8448507872
http://www.flickr.com/photos/buzzhoffman/4127280540
http://i.imgur.com/2UduUZ5.gif

Lindsay Holmwood

February 10, 2014

Transcript

  1. Developed + used in production at:

     Developers: Ali Graham, Jesse Reynolds

     Project manager: Lindsay Holmwood
  2. +

  3. [Diagram labels] Contact, Media, Notification Rules, Entities, Checks, History (maintenance, acks, state changes)
  4. Event filters: find failing events

     Notification:
     - map: find people interested in entity

         [ alice, bob, carol ]
  5. Event filters: find failing events

     Notification:
     - map: find people interested in entity
     - map: find media owned by people

         [ [ alice, email ],
           [ alice, sms ],
           [ bob, email ],
           [ bob, sms ],
           [ carol, sms ],
         ]
  6. Event filters: find failing events

     Notification:
     - map: find people interested in entity
     - map: find media owned by people
     - reduce: delete media based on tags, severity, time of day

         [ [ alice, email ],
           [ alice, sms ],
           [ bob, sms ],
         ]
  7. Event filters: find failing events

     Notification:
     - map: find people interested in entity
     - map: find media owned by people
     - reduce: delete media based on tags, severity, time of day
     - reduce: delete media based on blackholes

         [ [ alice, sms ],
           [ bob, sms ],
         ]
  8. Event filters: find failing events

     Notification:
     - map: find people interested in entity
     - map: find media owned by people
     - reduce: delete media based on tags, severity, time of day
     - reduce: delete media based on blackholes
     - reduce: delete media based on notification intervals

         [ [ alice, sms ],
           [ bob, sms ],
         ]
  9. Event filters: find failing events

     Notification:
     - map: find people interested in entity
     - map: find media owned by people
     - reduce: delete media based on tags, severity, time of day
     - reduce: delete media based on blackholes
     - reduce: delete media based on notification intervals
     - alert, alert

         [ [ alice, sms ],
           [ bob, sms ],
         ]
  10.     flapjack_contact { '[email protected]':
            ensure     => present,
            first_name => 'Ada',
            last_name  => 'Lovelace',
            timezone   => 'Europe/London',
            sms_media  => {
              address          => '+61412345678',
              interval         => '120',
              rollup_threshold => '5',
            },
          }
  11.     flapjack_contact { '[email protected]':
            ensure      => present,
            first_name  => 'Ada',
            last_name   => 'Lovelace',
            timezone    => 'Europe/London',
            sms_media   => {
              address          => '+61412345678',
              interval         => '120',
              rollup_threshold => '5',
            },
            email_media => {
              address  => '[email protected]',
              interval => '1800',
            }
          }
  12.     flapjack_notification_rule { 'ada app-01':
            contact_id     => '[email protected]',
            entities       => [ 'app-01.example.com' ],
            warning_media  => [ 'sms' ],
            critical_media => [ 'sms' ],
          }

          flapjack_notification_rule { 'ada catchall':
            contact_id     => '[email protected]',
            warning_media  => [ 'email' ],
            critical_media => [ 'sms' ],
          }
  13.     flapjack_notification_rule { 'ada db':
            contact_id     => '[email protected]',
            entity_tags    => [ 'db' ],
            warning_media  => [ 'email' ],
            critical_media => [ ],
          }

          flapjack_notification_rule { 'ada app-01':
            contact_id     => '[email protected]',
            entities       => [ 'app-01.example.com' ],
            warning_media  => [ 'sms' ],
            critical_media => [ 'sms' ],
          }

          flapjack_notification_rule { 'ada catchall':
            contact_id     => '[email protected]',
            warning_media  => [ 'email' ],
            critical_media => [ 'sms' ],
          }
  14.     resources:
            flapjack_contact:
              '[email protected]':
                ensure: present
                first_name: John
                last_name: Doe
                timezone: 'Australia/Sydney'
                sms_media:
                  address: '+61431261000'
                  interval: 120
                  rollup_threshold: 5
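
The notification pipeline walked through on slides 4 to 9 can be sketched in Python. This is a minimal illustration, not Flapjack's implementation: the function names (`contacts_for`, `media_for`, `drop`, `route`) and the lookup tables are hypothetical, and the table contents are chosen only so that each stage reproduces the lists shown on the slides.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (contact, medium)

# Hypothetical sample data, picked to match the lists on the slides.
INTERESTED: Dict[str, List[str]] = {
    "app-01.example.com": ["alice", "bob", "carol"],
}
MEDIA: Dict[str, List[str]] = {
    "alice": ["email", "sms"],
    "bob": ["email", "sms"],
    "carol": ["sms"],
}
# Pairs rejected by rules on tags, severity, or time of day (assumed).
RULE_REJECTS: Set[Pair] = {("bob", "email"), ("carol", "sms")}
# Pairs swallowed by blackholes (assumed).
BLACKHOLED: Set[Pair] = {("alice", "email")}
# Pairs notified too recently for their notification interval (assumed empty).
RECENTLY_NOTIFIED: Set[Pair] = set()


def contacts_for(entity: str) -> List[str]:
    """map: find people interested in the entity."""
    return list(INTERESTED.get(entity, []))


def media_for(contacts: List[str]) -> List[Pair]:
    """map: find media owned by those people."""
    return [(c, m) for c in contacts for m in MEDIA.get(c, [])]


def drop(pairs: List[Pair], rejected: Set[Pair]) -> List[Pair]:
    """reduce: delete every (contact, medium) pair found in `rejected`."""
    return [p for p in pairs if p not in rejected]


def route(entity: str) -> List[Pair]:
    """Run the stages in slide order and return the pairs left to alert."""
    pairs = media_for(contacts_for(entity))
    pairs = drop(pairs, RULE_REJECTS)       # tags, severity, time of day
    pairs = drop(pairs, BLACKHOLED)         # blackholes
    pairs = drop(pairs, RECENTLY_NOTIFIED)  # notification intervals
    return pairs
```

With this sample data, `route("app-01.example.com")` leaves `[("alice", "sms"), ("bob", "sms")]`, the two pairs that receive an alert on slide 9.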