Data-driven alerting with Flapjack + Puppet + Hiera

What is flapjack?

Monitoring alert routing system

Composable

Rollup

Alert routing

• event
• ↪ notify?
• ↪ who?
• ↪ how?

API driven

No restarts required

Developed + used in production at:

Developers: Ali Graham, Jesse Reynolds
Project manager: Lindsay Holmwood

Designed for humans

Why flapjack?

Specific use cases

Multi-tenant

Segregated responsibility

Check engine independence

Killer features

Self-checking

oobetet

[Diagram: event producers, flapjack, oobetet]

flapper

[Diagram: event producers, flapjack, jabber room, flapper, oobetet, PagerDuty]

Rollup (alert summarisation)

Per-media thresholds

• Contact
• has many
• Media
• has one
• Summary Threshold
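A minimal sketch of the per-media threshold idea, assuming a medium switches from individual alerts to a single summary once the number of failing checks reaches its threshold. The names `Medium` and `choose_notification` are hypothetical and are not Flapjack's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Medium:
    """One way of reaching a contact (hypothetical model, not Flapjack's)."""
    kind: str                               # e.g. "sms" or "email"
    address: str
    rollup_threshold: Optional[int] = None  # None means never roll up


def choose_notification(medium: Medium, failing_checks: list[str]) -> str:
    """Return one summary message, or per-check alerts, for this medium."""
    threshold = medium.rollup_threshold
    if threshold is not None and len(failing_checks) >= threshold:
        # Assumption: at or above the threshold, send a single summary.
        return f"summary: {len(failing_checks)} checks failing"
    return "; ".join(f"alert: {check}" for check in failing_checks)
```

Because the threshold lives on the medium rather than on the contact, the same person could get a summary by SMS while still receiving individual emails.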

Tagging

How does it work?

Data model

[Diagram: Contact, Media, Notification Rules, Entities, Checks, History (maintenance, acks, state changes)]

Architecture

[Diagram: event producers, processors, gateways]

Event producers: Icinga, Sensu, Cron, Nagios

[Diagram: Icinga, flapjackfeeder, Sensu, jestin's thing; processor, notifier, gateways]

How are alerts routed?

event, filters, notification, alert

• filter: Find failing events
• map: Find people interested in entity
  [ alice, bob, carol ]
• map: Find media owned by people
  [ [ alice, email ], [ alice, sms ], [ bob, email ], [ bob, sms ], [ carol, sms ] ]
• reduce: Delete media based on tags, severity, time of day
  [ [ alice, email ], [ alice, sms ], [ bob, sms ] ]
• reduce: Delete media based on blackholes
  [ [ alice, sms ], [ bob, sms ] ]
• reduce: Delete media based on notification intervals
  [ [ alice, sms ], [ bob, sms ] ]
• alert, alert
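The map and reduce steps of the routing slides can be sketched as a chain of list operations. This is an illustration only: `Contact`, `route`, and the three precomputed exclusion sets are hypothetical stand-ins for whatever Flapjack does internally, and the entity name is made up. The sample data reproduces the alice, bob, and carol example.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    name: str
    media: list[str]
    entities: set[str] = field(default_factory=set)


def route(entity, contacts, dropped_by_rules, blackholed, recently_notified):
    """Return the (contact, medium) pairs that should receive an alert."""
    # map: find people interested in the entity
    people = [c for c in contacts if entity in c.entities]
    # map: find media owned by those people
    pairs = [(c.name, m) for c in people for m in c.media]
    # reduce: drop media excluded by tags, severity, or time of day
    pairs = [p for p in pairs if p not in dropped_by_rules]
    # reduce: drop media covered by a blackhole
    pairs = [p for p in pairs if p not in blackholed]
    # reduce: drop media already notified within their interval
    pairs = [p for p in pairs if p not in recently_notified]
    return pairs


entity = "app-01.example.com"  # hypothetical entity name
contacts = [
    Contact("alice", ["email", "sms"], {entity}),
    Contact("bob", ["email", "sms"], {entity}),
    Contact("carol", ["sms"], {entity}),
]
alerts = route(
    entity,
    contacts,
    dropped_by_rules={("bob", "email"), ("carol", "sms")},
    blackholed={("alice", "email")},
    recently_notified=set(),
)
print(alerts)
```

Each reduce step only ever removes pairs, so the order of the three filters does not change the final result in this sketch.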

[Diagram: Icinga, flapjackfeeder, Sensu, jestin's thing; processor, notifier; gateways: Email, SMS, Jabber, PagerDuty, Web, API]

Things that may surprise you

Constant heartbeat

No one-off events

How long has a check been failing?

NOT "How many times has the check failed?"
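A small sketch of the difference between tracking duration and counting failures, assuming a steady stream of events per check. `CheckState`, `should_notify`, and the 30-second delay are illustrative choices, not Flapjack's implementation or documented defaults.

```python
from typing import Optional


class CheckState:
    """Tracks when a check started failing, not how often it has failed."""

    def __init__(self) -> None:
        self.failing_since: Optional[float] = None

    def record(self, ok: bool, timestamp: float) -> None:
        """Record one event; only the first failure in a run starts the clock."""
        if ok:
            self.failing_since = None
        elif self.failing_since is None:
            self.failing_since = timestamp

    def failing_for(self, now: float) -> float:
        """Seconds the check has been continuously failing (0 if healthy)."""
        if self.failing_since is None:
            return 0.0
        return now - self.failing_since


def should_notify(state: CheckState, now: float, delay: float = 30.0) -> bool:
    """Notify once a failure has lasted at least `delay` seconds (assumed value)."""
    return state.failing_since is not None and state.failing_for(now) >= delay
```

With this model, a check reported every 10 seconds and one reported every 60 seconds are treated the same: what matters is the time since the first failure, so no retry counter is needed.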

No HARD/SOFT states

Broadcast delay

Alert summarisation (Rollup)

Integrating

Configure Flapjack with Puppet

Puppet as external source of truth

[Diagram: Contact, Media, Notification Rules, Entities, Checks]

[Diagram: Puppet, API, flapjack, events, notifications]

Puppet type + provider for flapjack

Bootstrapping

    git clone https://github.com/flpjck/vagrant-flapjack.git
    cd vagrant-flapjack
    vagrant up

manifests/site.pp

    node default {
      class { 'icinga': }
      -> class { 'nagios': }
      -> class { 'flapjack': }
    }

Entities

    flapjack_contact { '[email protected]':
      ensure      => present,
      first_name  => 'Ada',
      last_name   => 'Lovelace',
      timezone    => 'Europe/London',
      sms_media   => {
        address          => '+61412345678',
        interval         => '120',
        rollup_threshold => '5',
      },
      email_media => {
        address  => '[email protected]',
        interval => '1800',
      },
    }

Entities

    flapjack_notification_rule { 'ada db':
      contact_id     => '[email protected]',
      entity_tags    => [ 'db' ],
      warning_media  => [ 'email' ],
      critical_media => [ ],
    }

    flapjack_notification_rule { 'ada app-01':
      contact_id     => '[email protected]',
      entities       => [ 'app-01.example.com' ],
      warning_media  => [ 'sms' ],
      critical_media => [ 'sms' ],
    }

    flapjack_notification_rule { 'ada catchall':
      contact_id     => '[email protected]',
      warning_media  => [ 'email' ],
      critical_media => [ 'sms' ],
    }
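To show how the three rules above might interact, here is a hedged sketch of rule selection. It assumes that rules naming entities or tags ("specific" rules) take precedence over a catch-all rule, that a tag rule matches when all of its tags are present on the entity, and that media from all chosen rules are combined. These semantics, along with the names `Rule` and `media_for` and the `db-01` and `web-01` hosts, are assumptions for illustration and should be checked against the Flapjack documentation.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    name: str
    warning_media: list[str]
    critical_media: list[str]
    entities: list[str] = field(default_factory=list)
    entity_tags: list[str] = field(default_factory=list)

    def is_specific(self) -> bool:
        return bool(self.entities or self.entity_tags)

    def matches(self, entity: str, tags: set[str]) -> bool:
        if not self.is_specific():
            return True  # a catch-all rule matches every entity
        by_entity = entity in self.entities
        by_tags = bool(self.entity_tags) and set(self.entity_tags) <= tags
        return by_entity or by_tags


def media_for(rules, entity, tags, severity):
    """Media for one event; severity is "warning" or "critical"."""
    matching = [r for r in rules if r.matches(entity, tags)]
    specific = [r for r in matching if r.is_specific()]
    chosen = specific or matching  # assumed: specific rules win
    media = set()
    for rule in chosen:
        media.update(getattr(rule, f"{severity}_media"))
    return sorted(media)


rules = [
    Rule("ada db", ["email"], [], entity_tags=["db"]),
    Rule("ada app-01", ["sms"], ["sms"], entities=["app-01.example.com"]),
    Rule("ada catchall", ["email"], ["sms"]),
]
print(media_for(rules, "db-01.example.com", {"db"}, "critical"))
```

Under these assumptions, a critical event on a `db`-tagged host yields no media at all, because the empty `critical_media` list on the `ada db` rule overrides the catch-all.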

    hiera_resources(['resources'])

    node default {
      # ...
    }

    resources:
      flapjack_contact:
        '[email protected]':
          ensure: present
          first_name: John
          last_name: Doe
          timezone: 'Australia/Sydney'
          sms_media:
            address: '+61431261000'
            interval: 120
            rollup_threshold: 5
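The data-driven idea here is that a nested hash of the shape `{type: {title: parameters}}` is turned into one resource declaration per entry. A rough sketch of that expansion follows, with the Hiera data written as a Python dict. `expand_resources` and the contact title are hypothetical, and this is not the code behind `hiera_resources`.

```python
def expand_resources(resources: dict) -> list[tuple[str, str, dict]]:
    """Flatten {type: {title: params}} into (type, title, params) tuples."""
    declared = []
    for resource_type, instances in resources.items():
        for title, params in instances.items():
            declared.append((resource_type, title, dict(params)))
    return declared


hiera_data = {
    "resources": {
        "flapjack_contact": {
            "jdoe@example.com": {  # hypothetical title
                "ensure": "present",
                "first_name": "John",
                "last_name": "Doe",
                "timezone": "Australia/Sydney",
                "sms_media": {
                    "address": "+61431261000",
                    "interval": 120,
                    "rollup_threshold": 5,
                },
            },
        },
    },
}

for resource_type, title, params in expand_resources(hiera_data["resources"]):
    print(f"{resource_type} {{ '{title}': ... }}  ({len(params)} parameters)")
```

Adding a contact then means adding a key to the data file rather than editing a manifest.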

Open Source

• github.com/flpjck/flapjack

Quality documentation: github.com/flpjck/flapjack/wiki

Bad documentation? BUG

Bad first experience? BUG

Thank you!
Liked the talk? Let @auxesis know!
flapjack.io
github.com/flpjck/flapjack
github.com/flpjck/flapjack-vagrant

Credits:
http://www.flickr.com/photos/lizadaly/4373330774
http://www.flickr.com/photos/meltwater/420749031
http://www.flickr.com/photos/whatknot/8642836187
http://www.flickr.com/photos/jonmould/5393395335
http://vmfarms.com/static/img/logos/ruby-logo.png
http://www.flickr.com/photos/l1v32r1d3bmx/3985457584
http://www.flickr.com/photos/thomasforsyth/4313764488
http://www.flickr.com/photos/rubodewig/5161937181
http://www.flickr.com/photos/ronwls/7001551988
http://www.flickr.com/photos/sparktography/83217827
http://www.flickr.com/photos/sdphotography/1570906849
http://tosbourn.com/wp-content/uploads/2013/12/redis-logo.png?e0df77
http://www.flickr.com/photos/derekskey/9530097369
http://giphy.com/gifs/yeUxljCJjH1rW
http://en.wikipedia.org/wiki/Broadcast_delay
http://www.flickr.com/photos/karen_d/8448507872
http://www.flickr.com/photos/buzzhoffman/4127280540
http://i.imgur.com/2UduUZ5.gif
