Slide 1
It’s not in production unless it’s monitored.
Wednesday, April 25, 2012
Slide 2
‣ @josephruscio
‣ Co-Founder/CTO Librato
‣ I <3 graphs
Slide 4
SaaS 2002
‣ Seed Round: $1.5M USD
‣ Infrastructure: CAPEX
‣ Dedicated Ops Team
‣ Custom Software Stack
Slide 5
SaaS 2012
‣ Seed Round: $20K USD
‣ Infrastructure: OPEX
‣ <=1 Ops Person
‣ OSS, External Services
Slide 6
‣ agile infrastructure
‣ ephemeral infrastructure
‣ more change, worse tools!
Slide 7
Continuous Deployment
‣ continuous integration
‣ one-click deploy
‣ feature-flagging
‣ monitoring
‣ alerting
Slide 9
Chunky Bacon!!
Slide 10
Graphite, StatsD, OpenTSDB, Cube, d3.js
Slide 11
Anti-Pattern (diagram: separate stacks, each pairing its own metrics with its own storage)
• Custom Stats • MySQL threads • VMstat • ... → Storage
• CPU • Interface • Memory • Ping • Battery charge • ... → Storage
• Ping • CPU • Memory • Disks • SNMP Service • ... → Storage
...
Tools shown: Nagios, Ganglia, RRD/Cacti
Slide 12
#monitoringsucks
Slide 13
We need a better model
Slide 14
Metrics
‣ business drivers
‣ application performance
‣ system resources
‣ network
Slide 15
‣ Collection
‣ Storage
‣ Aggregation
‣ Analysis
Slide 16
Separation of Concerns
Slide 17
Collection
Slide 18
Logging
‣ etsy/logster
‣ logstash/logstash
‣ Papertrail et al.
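Log-based collectors such as etsy/logster work by tailing a log file and turning matching lines into metrics. Below is a minimal Python sketch of that idea, assuming common-log-format access-log input; the regex, the `count_status_codes` helper, and the `http.status.*` metric names are hypothetical illustrations, not logster's parser API.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical pattern: capture the 3-digit HTTP status that follows the
# first quoted field (the request line) in a common-log-format entry.
STATUS_RE = re.compile(r'"[^"]*" (\d{3})\b')


def count_status_codes(lines: Iterable[str]) -> Counter:
    """Count responses per status class (2xx, 4xx, ...) under metric-style names."""
    counts: Counter = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match is None:
            continue  # skip lines that don't look like access-log entries
        status_class = match.group(1)[0] + "xx"
        counts[f"http.status.{status_class}"] += 1
    return counts
```

Run once per interval over the newly appended lines, the resulting counts can be shipped to any of the storage backends later in this talk.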
Slide 19
AS::Notifications
‣ pub/sub instrumentation
‣ mattmatt/lograge
‣ twinturbo/harness
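ActiveSupport::Notifications decouples instrumented code from what gets recorded: code publishes named, timed events, and subscribers (such as lograge) decide what to do with them. The Python sketch below imitates that pub/sub shape. `Notifier` is a hypothetical class; its `instrument` and `subscribe` borrow the Rails method names, but the signatures are simplified and the subscriber arguments only loosely follow the Rails ones.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Any, Callable, DefaultDict, Dict, Iterator, List

# Subscriber receives (event name, start, finish, payload); a simplified
# version of what Rails passes to its notification subscribers.
Subscriber = Callable[[str, float, float, Dict[str, Any]], None]


class Notifier:
    """Tiny pub/sub bus: instrumented code publishes, subscribers decide what to record."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Subscriber]] = defaultdict(list)

    def subscribe(self, name: str, callback: Subscriber) -> None:
        self._subscribers[name].append(callback)

    @contextmanager
    def instrument(self, name: str, payload: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
        """Time the wrapped block and publish the event, even if the block raises."""
        start = time.monotonic()
        try:
            yield payload  # the instrumented block may add fields to the payload
        finally:
            finish = time.monotonic()
            for callback in self._subscribers.get(name, []):
                callback(name, start, finish, payload)
```

The instrumented code never imports a metrics library; swapping a log formatter for a StatsD reporter only changes which subscribers are registered.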
Slide 20
eric/metriks
‣ Ruby instrumentation
‣ counters, meters, timers
‣ multiple reporters
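eric/metriks is a Ruby library; the Python sketch below only illustrates the three metric kinds and the "multiple reporters" idea listed on the slide. All class and method names are hypothetical, and real meters usually also track exponentially weighted moving rates, which this sketch omits.

```python
import time
from contextlib import contextmanager
from statistics import mean
from typing import Callable, Dict, Iterator, List

Reporter = Callable[[Dict[str, float]], None]


class Counter:
    """A value you increment, e.g. jobs processed."""

    def __init__(self) -> None:
        self.count = 0

    def increment(self, n: int = 1) -> None:
        self.count += n


class Meter:
    """Counts events and reports the mean rate per second since creation."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._start = clock()
        self.count = 0

    def mark(self, n: int = 1) -> None:
        self.count += n

    def mean_rate(self) -> float:
        elapsed = self._clock() - self._start
        return self.count / elapsed if elapsed > 0 else 0.0


class Timer:
    """Records durations in seconds."""

    def __init__(self) -> None:
        self.durations: List[float] = []

    def update(self, seconds: float) -> None:
        self.durations.append(seconds)

    @contextmanager
    def measure(self) -> Iterator[None]:
        start = time.monotonic()
        try:
            yield
        finally:
            self.update(time.monotonic() - start)


class Registry:
    """Holds named metrics and pushes one snapshot to every registered reporter."""

    def __init__(self) -> None:
        self.counters: Dict[str, Counter] = {}
        self.meters: Dict[str, Meter] = {}
        self.timers: Dict[str, Timer] = {}
        self.reporters: List[Reporter] = []

    def counter(self, name: str) -> Counter:
        if name not in self.counters:
            self.counters[name] = Counter()
        return self.counters[name]

    def meter(self, name: str) -> Meter:
        if name not in self.meters:
            self.meters[name] = Meter()
        return self.meters[name]

    def timer(self, name: str) -> Timer:
        if name not in self.timers:
            self.timers[name] = Timer()
        return self.timers[name]

    def snapshot(self) -> Dict[str, float]:
        snap: Dict[str, float] = {}
        for name, c in self.counters.items():
            snap[f"{name}.count"] = c.count
        for name, m in self.meters.items():
            snap[f"{name}.count"] = m.count
            snap[f"{name}.mean_rate"] = m.mean_rate()
        for name, t in self.timers.items():
            snap[f"{name}.count"] = len(t.durations)
            snap[f"{name}.mean"] = mean(t.durations) if t.durations else 0.0
            snap[f"{name}.max"] = max(t.durations, default=0.0)
        return snap

    def report(self) -> None:
        snap = self.snapshot()
        for reporter in self.reporters:
            reporter(snap)
```

Because reporters only see a plain snapshot dict, sending the same numbers to a log, to Graphite, and to a SaaS service is a matter of appending more callables.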
Slide 21
Aggregation
Slide 22
etsy/statsd
‣ ~319 SLOC Node.js
‣ counters, timers, gauges
‣ UDP
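StatsD's wire format is one plain-text line per metric, `<name>:<value>|<type>`, with an optional `|@<rate>` suffix for sampled data, sent over UDP. A minimal Python client sketch, assuming a StatsD daemon on its conventional port 8125; `format_metric` and `StatsdClient` are hypothetical names, not the API of any particular client library.

```python
import random
import socket

VALID_TYPES = {"c", "ms", "g"}  # counter, timer (milliseconds), gauge


def format_metric(name: str, value: float, metric_type: str, sample_rate: float = 1.0) -> str:
    """Build one StatsD line: <name>:<value>|<type>, plus |@<rate> when sampled."""
    if metric_type not in VALID_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type!r}")
    line = f"{name}:{value}|{metric_type}"
    if sample_rate < 1.0:
        line += f"|@{sample_rate}"
    return line


class StatsdClient:
    """Fire-and-forget UDP client; a lost packet just means a slightly low count."""

    def __init__(self, host: str = "127.0.0.1", port: int = 8125) -> None:
        self._addr = (host, port)
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _send(self, line: str, sample_rate: float) -> None:
        # Client-side sampling: send only a fraction of events. The |@rate
        # suffix tells the server how much to scale counters back up.
        if sample_rate < 1.0 and random.random() >= sample_rate:
            return
        try:
            self._sock.sendto(line.encode("utf-8"), self._addr)
        except OSError:
            pass  # never let a metrics failure break the request path

    def increment(self, name: str, count: int = 1, sample_rate: float = 1.0) -> None:
        self._send(format_metric(name, count, "c", sample_rate), sample_rate)

    def timing(self, name: str, ms: float, sample_rate: float = 1.0) -> None:
        self._send(format_metric(name, ms, "ms", sample_rate), sample_rate)

    def gauge(self, name: str, value: float) -> None:
        self._send(format_metric(name, value, "g"), 1.0)
```

UDP keeps the application side non-blocking: if the daemon is down, packets are dropped instead of slowing requests.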
Slide 23
(diagram) Nginx, Unicorn, StatsD, Front-End
Slide 24
StatsD Clients
‣ zebrafishlabs/nginx-statsd
‣ github/rack-statsd
‣ shopify/statsd-instrument
Slide 25
StatsD Servers
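A StatsD server accumulates incoming lines between flushes and then emits summary values to a storage backend. Below is a simplified Python sketch of that aggregation step. The output key names are modeled loosely on the Graphite-style names etsy/statsd emits and should be treated as assumptions, as should the nearest-rank 90th percentile; `Aggregator` is a hypothetical class, not etsy/statsd's code.

```python
import math
from collections import defaultdict
from typing import DefaultDict, Dict, List


class Aggregator:
    """Accumulates StatsD lines between flushes (a sketch, not etsy/statsd's code)."""

    def __init__(self) -> None:
        self.counters: DefaultDict[str, float] = defaultdict(float)
        self.timers: DefaultDict[str, List[float]] = defaultdict(list)
        self.gauges: Dict[str, float] = {}

    def ingest(self, line: str) -> None:
        """Parse '<name>:<value>|<type>[|@<rate>]' and fold it into the current interval."""
        name, _, rest = line.strip().partition(":")
        fields = rest.split("|")
        if not name or len(fields) < 2:
            return  # malformed: drop silently, the UDP sender can't be told anyway
        try:
            value = float(fields[0])
            has_rate = len(fields) > 2 and fields[2].startswith("@")
            rate = float(fields[2][1:]) if has_rate else 1.0
        except ValueError:
            return
        if rate <= 0:
            return
        metric_type = fields[1]
        if metric_type == "c":
            self.counters[name] += value / rate  # undo client-side sampling
        elif metric_type == "ms":
            self.timers[name].append(value)
        elif metric_type == "g":
            self.gauges[name] = value

    def flush(self, interval_s: float) -> Dict[str, float]:
        """Summarize the interval, then reset counters and timers."""
        out: Dict[str, float] = {}
        for name, total in self.counters.items():
            out[f"stats.{name}"] = total / interval_s  # per-second rate
            out[f"stats_counts.{name}"] = total
        for name, values in self.timers.items():
            values.sort()
            n = len(values)
            k = max(1, math.ceil(0.9 * n))  # nearest-rank 90th percentile
            out[f"stats.timers.{name}.count"] = n
            out[f"stats.timers.{name}.mean"] = sum(values) / n
            out[f"stats.timers.{name}.lower"] = values[0]
            out[f"stats.timers.{name}.upper"] = values[-1]
            out[f"stats.timers.{name}.upper_90"] = values[k - 1]
        for name, value in self.gauges.items():
            out[f"stats.gauges.{name}"] = value
        self.counters.clear()
        self.timers.clear()  # gauges deliberately keep their last value
        return out
```

Aggregating before storage is what keeps write volume to the backend constant, regardless of how many requests per second the application serves.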
Slide 26
Storage
Slide 27
RRDTool
‣ Round-Robin Database Tool
‣ constant storage size
‣ rollups
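The constant storage size comes from fixed-length circular archives: new points overwrite the oldest, and coarser archives hold consolidated rollups. A minimal Python sketch, assuming count-based consolidation with an average; real RRDTool consolidates on time steps (with heartbeats and an xff setting) and also supports MIN, MAX, and LAST, none of which is modeled here. The class names are hypothetical.

```python
from typing import List


class RoundRobinArchive:
    """A fixed number of slots; once full, each new point overwrites the oldest."""

    def __init__(self, slots: int) -> None:
        if slots < 1:
            raise ValueError("need at least one slot")
        self._data: List[float] = [0.0] * slots
        self._next = 0    # index the next point will be written to
        self._filled = 0  # how many slots hold real data so far

    def append(self, value: float) -> None:
        self._data[self._next] = value
        self._next = (self._next + 1) % len(self._data)
        self._filled = min(self._filled + 1, len(self._data))

    def values(self) -> List[float]:
        """Stored points from oldest to newest."""
        n = len(self._data)
        start = (self._next - self._filled) % n
        return [self._data[(start + i) % n] for i in range(self._filled)]


class RoundRobinDatabase:
    """Raw archive plus one rollup archive holding the average of every `factor` points."""

    def __init__(self, raw_slots: int, rollup_slots: int, factor: int) -> None:
        self.raw = RoundRobinArchive(raw_slots)
        self.rollup = RoundRobinArchive(rollup_slots)
        self._factor = factor
        self._pending: List[float] = []

    def update(self, value: float) -> None:
        self.raw.append(value)
        self._pending.append(value)
        if len(self._pending) == self._factor:
            # Consolidation function: AVERAGE (RRDTool also offers MIN, MAX, LAST).
            self.rollup.append(sum(self._pending) / self._factor)
            self._pending.clear()
```

Storage per metric stays at `raw_slots + rollup_slots` values forever; the trade-off is that old data survives only in consolidated form.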
Slide 28
Graphite
‣ Whisper RRD
‣ flat.hierarchical.namespace
‣ HTTP queries
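Graphite metrics are dotted paths, sent to carbon over a simple plaintext protocol (by default on TCP port 2003) and queried with globbed targets such as `servers.*.cpu.user`. The Python sketch below formats a plaintext line and approximates Graphite's segment-wise globbing with `fnmatch`; `plaintext_line` and `match_paths` are hypothetical helpers, and Graphite's `{a,b}` alternation is not handled.

```python
import fnmatch
import time
from typing import Iterable, List, Optional


def plaintext_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """One newline-terminated plaintext-protocol line: '<path> <value> <timestamp>'."""
    ts = int(time.time()) if timestamp is None else timestamp
    return f"{path} {value} {ts}\n"


def match_paths(pattern: str, paths: Iterable[str]) -> List[str]:
    """Match dotted paths segment by segment, so a wildcard never crosses a dot."""
    wanted = pattern.split(".")
    matches = []
    for path in paths:
        parts = path.split(".")
        if len(parts) == len(wanted) and all(
            fnmatch.fnmatchcase(part, pat) for part, pat in zip(parts, wanted)
        ):
            matches.append(path)
    return matches
```

Sending is a plain write of that line to a TCP socket; reading back goes through the HTTP render API, e.g. `/render?target=servers.*.cpu.user&format=json`.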
Slide 29
OpenTSDB
‣ HBase
‣ multiple dimensions
‣ HTTP queries
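Where Graphite encodes dimensions as positions in a dotted path, OpenTSDB attaches `key=value` tags to each data point, and queries can filter and aggregate across them. The Python sketch below formats a telnet-style `put` command and performs a filter-and-sum over tags in memory; `put_line` and `sum_by` are hypothetical helpers, not OpenTSDB's query engine.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Point = Tuple[str, int, float, Dict[str, str]]  # (metric, timestamp, value, tags)


def put_line(metric: str, timestamp: int, value: float, tags: Dict[str, str]) -> str:
    """Telnet-style 'put <metric> <timestamp> <value> <tagk=tagv ...>' command."""
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))  # sorted for stable output
    return f"put {metric} {timestamp} {value} {tag_str}"


def sum_by(points: List[Point], metric: str, group_tag: str, where: Dict[str, str]) -> Dict[str, float]:
    """Sum one metric's values, filtered by tag equality and grouped by another tag."""
    totals: DefaultDict[str, float] = defaultdict(float)
    for name, _ts, value, tags in points:
        if name != metric:
            continue
        if any(tags.get(k) != v for k, v in where.items()):
            continue
        totals[tags.get(group_tag, "<none>")] += value
    return dict(totals)
```

The same stored points answer "requests per data center" and "requests per host in one data center" without renaming any metric.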
Slide 30
SaaS
‣ Librato Metrics et al.
‣ JSON over HTTP
‣ rollups
‣ interactive front-ends
Slide 31
Visualization
Slide 32
Correlation
‣ metrics
‣ annotations
‣ arbitrary combinations
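"Arbitrary combinations" means deriving new series from existing ones, and annotations put events such as deploys on the same time axis. A minimal Python sketch, assuming series stored as timestamp-to-value dicts with aligned timestamps; `combine` and `annotations_near` are hypothetical helpers.

```python
from typing import Callable, Dict, List, Tuple

Series = Dict[int, float]     # timestamp (seconds) -> value
Annotation = Tuple[int, str]  # (timestamp, label), e.g. a deploy


def combine(a: Series, b: Series, fn: Callable[[float, float], float]) -> Series:
    """Derive a new series by applying fn at every timestamp both inputs share."""
    return {ts: fn(a[ts], b[ts]) for ts in sorted(a.keys() & b.keys())}


def annotations_near(annotations: List[Annotation], ts: int, window: int) -> List[str]:
    """Labels of annotations within +/- window seconds of ts."""
    return [label for at, label in annotations if abs(at - ts) <= window]
```

For example, dividing an error-count series by a request-count series yields an error rate, and looking up annotations around its peak shows whether a deploy landed just before the spike.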
Slide 35
Dashboards
‣ shared understanding
‣ aberration detection
‣ fire-fighting manual
Slide 38
Alerting
Slide 39
Tuning Alerts
‣ trigger threshold
‣ cancel threshold
‣ re-arm window
‣ function
‣ window
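Together, these knobs describe an alert with hysteresis. A summary function over a window of samples must exceed the trigger threshold to fire and must fall below a separate cancel threshold to clear, and the re-arm window limits how often it can notify. A Python sketch of one way to combine them; the `Alert` class is hypothetical, and counting the window in samples rather than seconds is a simplifying assumption.

```python
from collections import deque
from statistics import mean
from typing import Callable, Deque, Optional, Sequence


class Alert:
    def __init__(self, trigger: float, cancel: float, rearm_s: float, window: int,
                 fn: Callable[[Sequence[float]], float] = mean) -> None:
        if cancel > trigger or window < 1:
            raise ValueError("need cancel <= trigger and window >= 1")
        self.trigger = trigger  # fire when fn(window) rises above this
        self.cancel = cancel    # clear only once fn(window) drops below this
        self.rearm_s = rearm_s  # minimum seconds between notifications
        self.fn = fn            # summary function, e.g. mean or max
        self._samples: Deque[float] = deque(maxlen=window)
        self.active = False
        self._last_fired: Optional[float] = None

    def observe(self, ts: float, value: float) -> bool:
        """Feed one sample; return True if a notification should be sent now."""
        self._samples.append(value)
        if len(self._samples) < self._samples.maxlen:
            return False  # not enough data to evaluate the window yet
        level = self.fn(list(self._samples))
        if self.active and level < self.cancel:
            self.active = False  # condition cleared
            return False
        if level > self.trigger:
            self.active = True
            if self._last_fired is None or ts - self._last_fired >= self.rearm_s:
                self._last_fired = ts
                return True
        return False
```

The gap between trigger and cancel keeps a metric hovering near one threshold from flapping, while the re-arm window stops a sustained problem from paging every sample.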
Slide 41
Aberrant Behavior
Slide 42
‣ separation of concerns
‣ monitoring == tests
‣ arbitrary correlations
‣ dashboards
‣ living alerts
Slide 43
fin