monitoring: it gets better

monitoring [still kinda] sucks: it gets better • Adam Horwich, Systems Engineer, MetaBroadcast

who is my target audience? • Simple incidents can have disastrous consequences

old skool • Limited suite of free applications: Nagios, Cacti, Ganglia • Static, inflexible, single-purpose • Commercial applications try to achieve everything (badly) • Limited scope to enhance!

hashtag monitoringsucks • A couple of years ago, sysadmins despaired • https://github.com/monitoringsucks • The notion that monitoring products are ill-suited to DevOps needs • And that most are outdated and misaligned with Cloud architectures • The "one tool to solve all problems" model is a con

my first year • My history is with large, static infrastructure in rented/owned datacentres • Started at MetaBroadcast and spent my time hacking Nagios to play well with AWS • We knew what we needed in terms of quality monitoring, but it was an uphill struggle • Automated infra changes • Metrics gathering and graphing • I cry

cloud surfing • Basically, it’s because of IaaS Clouds and DevOps that we need to rethink our models • No longer static hardware with sentimental names and birthdays • Virtual infrastructure is abstracted: harder to monitor ‘switches’ and ‘routers’ • VPC simulates the DC model, but is still virtualised • More flexibility but less accessible data

inspiration • Obfuscurity - Tasseo creator • https://speakerdeck.com/obfuscurity/the-state-of-open-source-monitoring • Finding the components to build a better way of life • Modular, open-source, framework-led design

monitoring manifesto 1. Proactive not reactive metrics and alerts 2. Focus on instrumentation not thresholds 3. Flexible architecture, built for a flexible infrastructure 4. Tie with configuration management tools (Puppet)

be sensu my beating heart • Router, Queue, Scheduler • Connects everything together • Designed for the Cloud • High Availability Model

sensu and pagerduty • Sensu PagerDuty handlers
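
To make the handler idea concrete, here is a minimal sketch of the decision a paging handler has to make. It assumes a Sensu Core style event (a JSON document with `client` and `check` objects, where a check `status` of 0 means OK) arriving on STDIN of a pipe handler. The function name `pagerduty_action` and the output field names are illustrative assumptions, loosely modelled on a trigger/resolve style events API, and not the actual handler used in the talk.

```python
import json
import sys


def pagerduty_action(event):
    """Decide whether a Sensu-style event should open or close an incident.

    Assumes event["client"]["name"] and event["check"]["name"/"status"/"output"]
    exist, as in Sensu Core events. The returned field names are hypothetical.
    """
    client = event["client"]["name"]
    check = event["check"]
    # Status 0 is OK; anything else (warning, critical, unknown) should page.
    action = "resolve" if check.get("status", 3) == 0 else "trigger"
    return {
        "event_type": action,
        # A stable key lets repeated events update one incident instead of opening many.
        "incident_key": f"{client}/{check['name']}",
        "description": f"{client} {check['name']}: {check.get('output', '').strip()}",
    }


def main():
    # A pipe handler would read the event from STDIN; sending is left out of this sketch.
    print(json.dumps(pagerduty_action(json.load(sys.stdin))))
```

The stable `incident_key` is the design choice worth copying: a flapping check then updates a single incident rather than paging once per event.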

sensu and graphite • Sensu Graphite handlers • Tasseo frontend
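
For anyone who has not fed Graphite before, the wire format is pleasantly simple: Carbon's plaintext listener accepts one `path value timestamp` line per datapoint, conventionally on TCP port 2003. The sketch below formats and sends such lines. The metric path and the host name `graphite.example.internal` are made-up placeholders, and in the setup described here the relaying is done by Sensu's Graphite handlers, not by a hand-rolled script like this.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one datapoint in Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = time.time()
    # Carbon expects whole seconds since the epoch and a trailing newline.
    return f"{path} {value} {int(timestamp)}\n"


def send_to_graphite(lines, host="graphite.example.internal", port=2003):
    """Send pre-formatted lines to a Carbon plaintext listener (hypothetical host)."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

For example, `graphite_line("stats.web01.load.one", 0.42, 1700000000)` produces the line `stats.web01.load.one 0.42 1700000000`.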

syslog-ng and logstalgia • Not Sensu integrated

syslog-ng and sensu and kairosdb • Integrating KairosDB with Syslog-NG
• Delivers metrics to Graphite via Sensu

sensu and cloudwatch • Collecting metrics from AWS CloudWatch •
Pushing Graphite metrics to CloudWatch
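
As a rough illustration of the "push Graphite metrics to CloudWatch" direction, the sketch below reshapes a Graphite-style datapoint into a dict with the keys CloudWatch uses for a metric datum (`MetricName`, `Dimensions`, `Timestamp`, `Value`, `Unit`). The `host.metric.name` path layout and the `Host` dimension are assumptions made for this example, not the naming scheme from the talk, and the actual upload call is deliberately left out.

```python
from datetime import datetime, timezone


def to_cloudwatch_datum(path, value, timestamp, unit="None"):
    """Reshape a Graphite-style datapoint into a CloudWatch-style metric datum.

    Assumes the first path segment is the host name (hypothetical convention).
    """
    host, _, metric = path.partition(".")
    return {
        "MetricName": metric,
        "Dimensions": [{"Name": "Host", "Value": host}],
        # CloudWatch timestamps are absolute times; keep them timezone-aware in UTC.
        "Timestamp": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "Value": float(value),
        "Unit": unit,
    }
```

A list of these dicts is shaped for the `MetricData` argument of boto3's `put_metric_data`, together with a `Namespace` of your choosing.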

kairosdb • Originally we used Nimrod, but as a developer-focused tool it couldn’t cope with the volume of logs • Proof that architecture principles make components easily interchangeable • Apache performance monitoring • APDEX, Aggregation, and Comparison
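
Since APDEX gets a name-check above, here is a small sketch of the standard Apdex calculation: requests at or under a target time T count as satisfied, those up to 4T count half as tolerating, and the rest count as frustrated. The function name and the sample numbers are invented for illustration; how the scores were actually computed in this stack is not something the slides spell out.

```python
def apdex(response_times, threshold):
    """Apdex score for a batch of response times (same unit as threshold).

    score = (satisfied + tolerating / 2) / total, where satisfied means
    t <= T and tolerating means T < t <= 4T.
    """
    if not response_times:
        raise ValueError("need at least one response time")
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(response_times)
```

With a 0.5 s target, the samples `[0.1, 0.3, 0.6, 1.5, 3.0]` give two satisfied, two tolerating and one frustrated request, so the score is (2 + 1) / 5 = 0.6.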

graphite and tasseo • Taking data and making it accessible and pretty and stuff • Dashboards, comparisons, oh my

logstalgia • Realtime API response time visualisation • Good for
SHTF moments

silver linings • AWS CloudWatch: Amazon’s metrics engine • We collect ‘privileged’ and AWS-specific metrics into our Graphite for comparison and persistence • Spot prices! Billing! All can be monitored in AWS or with your own services • Can push metrics to AWS too. Great for Auto Scaling (spot price ;) ) • CloudWatch isn’t an adequate component in itself (API Speed, Cost, Flexibility, Integration)

monitoring still kinda sucks • Sensu is great but has a poor dashboard, limited visibility, and little historical access • Alerts are still based on thresholds rather than trends and patterns • Monitoring the monitoring with more monitoring • Actively developed and improving every day • Can’t escape from that list of RED services. But it’s next to do!
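
To show what "trends rather than thresholds" could look like in practice, the sketch below fits a least-squares line to recent samples and projects how long until a limit would be reached, so an alert can fire on "disk full in an hour" instead of "disk at 90%". This is one simple approach chosen for illustration, not something the tools above provide; the function names and the sample data are invented.

```python
def slope(samples):
    """Least-squares slope of (timestamp, value) pairs, in value units per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one timestamp")
    return num / den


def seconds_until(samples, limit):
    """Seconds until the latest value reaches limit at the fitted rate.

    Returns 0.0 if the limit is already reached and None if the trend
    is flat or moving away from the limit.
    """
    rate = slope(samples)
    _, last_value = samples[-1]
    if last_value >= limit:
        return 0.0
    if rate <= 0:
        return None
    return (limit - last_value) / rate
```

For disk usage samples `[(0, 50.0), (60, 52.0), (120, 54.0), (180, 56.0)]` the fitted rate is 2% per minute, so `seconds_until(samples, 90)` comes out at roughly 1020 seconds.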

hashtag monitoringlove • But, monitoring is as hard as you
want to make it • By choosing the right components, they can seamlessly interact • Focus on what you want to achieve, not what you want to monitor • Don’t be afraid of building zords

questions? • [email protected] • @mmmkayness
