Data Driven Monitoring

Data Driven Monitoring Daniel Schauenberg [email protected] @mrtazz

Item by TheBackPackShoppe

http://www.flickr.com/photos/brianglanz/1095706242

How comfortable are you deploying a change right now?

“If this is your first day at Etsy, you deploy the site”

Ganglia
• System level metrics
• Instance per DC/environment
• > 220k RRD files
• Fully configured through Chef role attributes

Rainbow Graphs!

StatsD
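
For readers unfamiliar with StatsD: it accepts metrics as small plain-text UDP datagrams of the form `name:value|type`. The sketch below is a minimal illustration in Python, not code from the talk. The metric names and the `127.0.0.1:8125` address are assumptions (8125 is StatsD's conventional default port).

```python
import socket


def format_counter(name: str, value: int = 1) -> bytes:
    """Build a StatsD counter datagram: <name>:<value>|c"""
    return f"{name}:{value}|c".encode("ascii")


def format_timer(name: str, millis: int) -> bytes:
    """Build a StatsD timer datagram: <name>:<milliseconds>|ms"""
    return f"{name}:{millis}|ms".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget send over UDP; host and port are assumed defaults."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

Calling `send_metric(format_counter("deploys.web"))` would increment a hypothetical `deploys.web` counter. Because the transport is UDP, the application is not blocked when the daemon is unreachable.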

Graphite
• Application level metrics
• 96G RAM, 20 Cores, 7.3T SSD RAID 10
• 525k metrics/minute
• Mirrored Primary/Primary Setup
• Functionally sharded relays

nagios

<3 nagios

Nagios
• 2 instances in each DC/environment
• Fully Chef generated configuration
• Service checks and contacts in git
• Notifications via email->SMS gateway
• ~75% ops on-call
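
As background for the service checks mentioned above: a Nagios check is a program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The sketch below shows that convention with a simple threshold check. The function names, metric name, and thresholds are hypothetical and not taken from Etsy's checks.

```python
from typing import Optional, Tuple

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value: Optional[float], warn: float, crit: float) -> int:
    """Map a measured value to a Nagios status (higher is worse)."""
    if value is None:
        return UNKNOWN  # the measurement could not be taken
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def run_check(name: str, value: Optional[float], warn: float, crit: float) -> Tuple[int, str]:
    """Return (exit_code, status_line); perfdata follows the '|' separator."""
    status = evaluate(value, warn, crit)
    if value is None:
        return status, f"{LABELS[status]} - {name} could not be measured"
    line = f"{LABELS[status]} - {name} is {value} | {name}={value};{warn};{crit}"
    return status, line
```

A wrapper script would print the returned line and pass the status to `sys.exit()`, which is how Nagios learns the check result.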

github.com/lozzd/nagdash

Much more…
• Syslog-ng
• Logstash
• Logster
• Supergrep
• Eventinator

Information Overload
Image by http://jasoncasteel.deviantart.com/

Alert Fatigue

We have the data
We can make it better
Item by PicksFromThePast

nagios-herald

[Diagram: Failed Check -> nagios-herald (Formatter; Helpers: Graphite, Ganglia, Logstash) -> Message]
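
The flow on this slide (a failed check passes through a formatter that pulls context from helpers before a message goes out) can be sketched roughly as below. This is an illustration in Python, not nagios-herald's actual API. Every function name, alert field, and the `graphite.example.com` host is a hypothetical stand-in.

```python
from typing import Callable, Dict, List

Alert = Dict[str, str]
Helper = Callable[[Alert], str]  # a helper turns an alert into one line of context


def graphite_helper(alert: Alert) -> str:
    """Hypothetical helper: link to a graph of the alerting metric."""
    return f"Graph: https://graphite.example.com/render?target={alert['metric']}"


def logstash_helper(alert: Alert) -> str:
    """Hypothetical helper: suggest a log search for the affected host."""
    return f"Logs: search for host:{alert['host']}"


def format_message(alert: Alert, helpers: List[Helper]) -> str:
    """Combine the raw check result with whatever context the helpers provide."""
    lines = [
        f"{alert['state']}: {alert['service']} on {alert['host']}",
        alert["output"],
    ]
    for helper in helpers:
        try:
            lines.append(helper(alert))
        except Exception as exc:
            # A broken helper should never prevent the alert from being sent.
            lines.append(f"(context unavailable: {exc})")
    return "\n".join(lines)
```

The point of the design is that the person being paged receives the surrounding data together with the alert, instead of having to look it up after waking.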

github.com/etsy/nagios-herald

opsweekly

Opsweekly

Alert categorization

Wearables!
Item by JennysTrinketShoppe

Sleep tracking

github.com/etsy/opsweekly

Summary
• Set of trusted tools for monitoring
• Always experiment
• Always learn
• Always improve
• Use the data, Luke

Shout out to @lozzd and @Ryan_Frantz

• codeascraft.com
• etsy.com/codeascraft/talks
• etsy.github.com
• etsy.com/careers

Questions?
