monitoring: it gets better

Slide 1

Slide 1 text

monitoring [still kinda] sucks: it gets better • Adam Horwich, Systems Engineer, MetaBroadcast

Slide 2

Slide 2 text

who is my target audience? • Simple incidents can have disastrous consequences

Slide 3

Slide 3 text

old skool • Limited suite of free applications: Nagios, Cacti, Ganglia • Static, inflexible, single-purpose • Commercial applications try to achieve everything (badly) • Limited scope to enhance!

Slide 4

Slide 4 text

hashtag monitoringsucks • A couple of years ago, sysadmins despaired • https://github.com/monitoringsucks • The notion that monitoring products are ill-suited to DevOps needs • And that most are outdated and misaligned with Cloud architectures • The ‘one tool to solve all problems’ model is a con

Slide 5

Slide 5 text

my first year • My history is with large, static infrastructure in a rented/owned datacentre • Started at MetaBroadcast and spent my time hacking Nagios to play well with AWS • We knew what we needed in terms of quality monitoring, but it was an uphill struggle • Automated infra changes • Metrics gathering and graphing • I cry

Slide 6

Slide 6 text

cloud surfing • Basically, it’s because of IaaS Clouds and DevOps that we need to rethink our models • No longer static hardware with sentimental names and birthdays • Virtual infrastructure is abstracted: harder to monitor ‘switches’ and ‘routers’ • VPC simulates the DC model, but is still virtualised • More flexibility but less accessible data

Slide 7

Slide 7 text

inspiration • Obfuscurity, creator of Tasseo • https://speakerdeck.com/obfuscurity/the-state-of-open-source-monitoring • Finding the components to build a better way of life • Modular, open-source, framework-led design

Slide 8

Slide 8 text

monitoring manifesto 1. Proactive, not reactive, metrics and alerts 2. Focus on instrumentation, not thresholds 3. Flexible architecture, built for a flexible infrastructure 4. Tie in with configuration management tools (Puppet)
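To make "instrumentation, not thresholds" concrete, here is a minimal sketch of timing a block of application code and emitting the result in Graphite's plaintext format (`path value timestamp`). The host name and metric paths are hypothetical, and the `timed` helper is an illustration rather than anything from the talk.

```python
import socket
import time
from contextlib import contextmanager


def graphite_line(path, value, timestamp=None):
    """Format one metric as a newline-terminated 'path value timestamp' line."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def send_to_graphite(line, host="graphite.example.internal", port=2003):
    # Hypothetical host; 2003 is Carbon's default plaintext listener port.
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(line.encode("ascii"))


@contextmanager
def timed(path, emit=send_to_graphite):
    """Time the wrapped block and emit its duration in milliseconds."""
    start = time.monotonic()
    try:
        yield
    finally:
        elapsed_ms = (time.monotonic() - start) * 1000.0
        emit(graphite_line(path, round(elapsed_ms, 3)))
```

Wrapping a request handler in `with timed("app.api.search.latency_ms"):` would then record every call, so graphs and alerts can be built on the data later instead of being decided up front.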

Slide 9

Slide 9 text

be sensu my beating heart • Router, Queue, Scheduler • Connects everything together • Designed for the Cloud • High Availability Model
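Sensu schedules checks that follow the Nagios plugin convention: print one line of output and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). A minimal sketch of such a check follows; the disk-usage logic, the thresholds and the `CheckDisk` label are assumptions chosen for illustration.

```python
import shutil
import sys

# Exit codes shared by Nagios plugins and Sensu checks.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_disk(used_pct, warn=80.0, crit=90.0):
    """Map a disk usage percentage to (exit status, one-line message)."""
    if used_pct is None or not 0.0 <= used_pct <= 100.0:
        return UNKNOWN, "CheckDisk UNKNOWN: could not read disk usage"
    if used_pct >= crit:
        return CRITICAL, f"CheckDisk CRITICAL: {used_pct:.1f}% used"
    if used_pct >= warn:
        return WARNING, f"CheckDisk WARNING: {used_pct:.1f}% used"
    return OK, f"CheckDisk OK: {used_pct:.1f}% used"


def main(path="/"):
    """Entry point a check script would call; exits with the check status."""
    usage = shutil.disk_usage(path)
    status, message = evaluate_disk(100.0 * usage.used / usage.total)
    print(message)
    sys.exit(status)
```

A script that ends with a call to `main()` can be referenced from a Sensu check definition; the exit status determines whether an event is created and routed to handlers.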

Slide 10

Slide 10 text

sensu and pagerduty • Sensu PagerDuty handlers

Slide 11

Slide 11 text

sensu and graphite • Sensu Graphite handlers • Tasseo frontend
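Sensu pipe handlers receive the event as JSON on standard input, with the check result under the `check` key. The sketch below shows how a handler might pull Graphite-formatted lines out of a metric check's output before relaying them. It is an assumption-laden illustration, not the community Graphite handler, and the function name is hypothetical.

```python
import json


def metric_lines_from_event(raw_event):
    """Return the well-formed 'path value timestamp' lines in a Sensu event."""
    event = json.loads(raw_event)
    output = event["check"]["output"]
    lines = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # not a 'path value timestamp' triple
        path, value, timestamp = parts
        try:
            float(value)
            int(timestamp)
        except ValueError:
            continue  # skip stray text mixed into the check output
        lines.append(f"{path} {value} {timestamp}")
    return lines
```

A handler script would call `metric_lines_from_event(sys.stdin.read())` and write the resulting lines to Carbon's plaintext port.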

Slide 12

Slide 12 text

syslog-ng and logstalgia • Not Sensu integrated

Slide 13

Slide 13 text

syslog-ng and sensu and kairosdb • Integrating KairosDB with Syslog-NG • Delivers metrics to Graphite via Sensu

Slide 14

Slide 14 text

sensu and cloudwatch • Collecting metrics from AWS CloudWatch • Pushing Graphite metrics to CloudWatch
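For the Graphite-to-CloudWatch direction, each metric has to be reshaped into the dictionary that CloudWatch's `PutMetricData` call expects (`MetricName`, `Timestamp`, `Value`, `Unit`). A minimal sketch of that conversion, assuming metrics arrive as Graphite plaintext lines; the function name is hypothetical and no AWS call is made here.

```python
from datetime import datetime, timezone


def to_metric_datum(graphite_line, unit="None"):
    """Convert 'path value timestamp' into a CloudWatch MetricData entry."""
    path, value, timestamp = graphite_line.split()
    return {
        "MetricName": path,
        # CloudWatch expects a timezone-aware timestamp; Graphite uses epoch seconds.
        "Timestamp": datetime.fromtimestamp(int(timestamp), tz=timezone.utc),
        "Value": float(value),
        "Unit": unit,  # "None" is CloudWatch's unit for dimensionless values
    }
```

With boto3, a list of such entries could be passed as `MetricData` to `put_metric_data` alongside a namespace of your choosing, and an Auto Scaling alarm could then be defined on the resulting metric.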

Slide 15

Slide 15 text

kairosdb • Originally used Nimrod, but as a developer-focused tool it couldn’t cope with the volume of logs • Proof that architecture principles make components easily interchangeable • Apache performance monitoring • APDEX, Aggregation, and Comparison
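The Apdex score mentioned above is simple to compute from response times: requests at or under a target threshold T count as satisfied, those between T and 4T as tolerating (half weight), and the rest as frustrated. A minimal sketch, with the threshold and sample values chosen purely for illustration:

```python
def apdex(response_times, threshold):
    """Apdex = (satisfied + tolerating / 2) / total samples."""
    if not response_times:
        raise ValueError("need at least one response time")
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(response_times)


# Five Apache response times in seconds, scored against a 0.5 s target.
score = apdex([0.1, 0.2, 0.6, 1.0, 3.0], threshold=0.5)
```

Here two requests are satisfied, two are tolerating and one is frustrated, giving (2 + 2/2) / 5 = 0.6.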

Slide 16

Slide 16 text

graphite and tasseo • Taking data and making it accessible and pretty and stuff • Dashboards, comparisons, oh my

Slide 17

Slide 17 text

logstalgia • Realtime API response time visualisation • Good for SHTF moments

Slide 18

Slide 18 text

silver linings • AWS CloudWatch: Amazon’s metrics engine • We collect ‘privileged’ and AWS-specific metrics into our Graphite for comparison and persistence • Spot prices! Billing! All can be monitored in AWS or with your own services • Can push metrics to AWS too. Great for Auto Scaling (spot price ;) ) • CloudWatch isn’t an adequate component in itself (API Speed, Cost, Flexibility, Integration)

Slide 19

Slide 19 text

monitoring still kinda sucks • Sensu is great but has a poor dashboard, poor visibility, and poor historical access • Alerts are still based on thresholds rather than trends and patterns • Monitoring the monitoring with more monitoring • Actively developed and improving every day • Can’t escape from that list of RED services. But it’s next to do!
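As an illustration of what trend-based alerting could look like (not something the stack above provides), the sketch below fits a least-squares slope to recent samples and estimates how long until a limit is reached. All names and numbers are hypothetical.

```python
def slope(points):
    """Least-squares slope of (timestamp, value) samples, in value per second."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    denom = sum((t - mean_t) ** 2 for t, _ in points)
    if denom == 0:
        raise ValueError("samples must span more than one timestamp")
    return sum((t - mean_t) * (v - mean_v) for t, v in points) / denom


def seconds_until(points, limit):
    """Estimate seconds until the latest value reaches `limit` at the fitted rate.

    Returns 0.0 if the limit is already reached and None if the trend is
    flat or falling.
    """
    rate = slope(points)
    _, latest_value = points[-1]
    if latest_value >= limit:
        return 0.0
    if rate <= 0:
        return None
    return (limit - latest_value) / rate


# Disk usage in percent, sampled once a minute and growing by 1% per minute.
samples = [(0, 70.0), (60, 71.0), (120, 72.0), (180, 73.0)]
eta = seconds_until(samples, limit=90.0)
```

A fixed threshold at 90% would stay silent for these samples, whereas the estimate (about 17 minutes to the limit) gives something to act on beforehand.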

Slide 20

Slide 20 text

hashtag monitoringlove • But, monitoring is as hard as you want to make it • By choosing the right components, they can seamlessly interact • Focus on what you want to achieve, not what you want to monitor • Don’t be afraid of building zords

Slide 21

Slide 21 text

questions? • [email protected] • @mmmkayness