DevOpsDays Cuba 2016: Modern Monitoring

Slide 1

Slide 1 text

@bridgetkromhout Monitoring

Slide 2

Slide 2 text

@bridgetkromhout lives: Minneapolis, Minnesota works: Pivotal podcasts: Arrested DevOps organizes: devopsdays Bridget Kromhout

Slide 3

Slide 3 text

@bridgetkromhout “…measuring value, throughput, and performance… revenue rather than cost” The Art of Monitoring (2016) James Turnbull artofmonitoring.com

Slide 4

Slide 4 text

@bridgetkromhout Why monitor? The Art of Monitoring (2016) James Turnbull artofmonitoring.com

Slide 5

Slide 5 text

@bridgetkromhout Why monitor? Two customers of monitoring with different needs. The Art of Monitoring (2016) James Turnbull artofmonitoring.com

Slide 6

Slide 6 text

@bridgetkromhout Why monitor? Two customers of monitoring with different needs. The business: • UX data for product & engineering • Measure value delivered The Art of Monitoring (2016) James Turnbull artofmonitoring.com

Slide 7

Slide 7 text

@bridgetkromhout Why monitor? Two customers of monitoring with different needs. The business: • UX data for product & engineering • Measure value delivered Information Technology: • Visibility into state and failures • Product & engineering decisions • Measure success of projects The Art of Monitoring (2016) James Turnbull artofmonitoring.com

Slide 8

Slide 8 text

@bridgetkromhout Monitoring Maturity Model
Manual. Method: checklists, simple scripts. Type: “tribal knowledge” of things broken in the past. Focus: minimizing downtime, managing assets.
Reactive. Method: disk, CPU, memory checks. Type: thresholds, alerting; updated after incidents. Focus: availability, assets, some customer experience.
Proactive. Method: automatic; required for deployment. Type: alerting includes context, automated remediation. Focus: application performance, business outcomes.
The Art of Monitoring (2016) - James Turnbull - artofmonitoring.com

Slide 9

Slide 9 text

@bridgetkromhout Monitoring Maturity Model The Art of Monitoring (2016) James Turnbull artofmonitoring.com

Slide 10

Slide 10 text

@bridgetkromhout Typical reactive notification The Art of Monitoring (2016) - James Turnbull - artofmonitoring.com

Slide 11

Slide 11 text

@bridgetkromhout Better notifications (in the brave new cloudy-with-a-chance-of-containers world) • Actionable • Provide necessary context • Prevent alert fatigue The Art of Monitoring (2016) - James Turnbull - artofmonitoring.com
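The three properties on this slide can be illustrated with a small sketch. It is not taken from the talk or the book: the `Alert` and `Notifier` names, the runbook URL field, and the five-minute quiet period are all assumptions chosen for illustration. The message carries context (recent values and a pointer to remediation steps), and repeats of the same alert are suppressed for a while to limit alert fatigue.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Alert:
    host: str
    check: str
    value: float
    threshold: float
    runbook: str  # hypothetical link to remediation steps
    recent_values: list = field(default_factory=list)

    def message(self) -> str:
        # Context: what fired, where, how far over, and what to do next.
        trend = ", ".join(f"{v:.0f}" for v in self.recent_values)
        return (
            f"{self.check} on {self.host} is {self.value:.0f} "
            f"(threshold {self.threshold:.0f}); recent values: {trend}; "
            f"runbook: {self.runbook}"
        )


class Notifier:
    """Suppresses repeats of the same (host, check) inside a quiet period."""

    def __init__(self, quiet_seconds: float = 300.0, clock=time.monotonic):
        self.quiet_seconds = quiet_seconds
        self.clock = clock
        self._last_sent = {}

    def notify(self, alert: Alert):
        key = (alert.host, alert.check)
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.quiet_seconds:
            return None  # suppressed: same alert was sent recently
        self._last_sent[key] = now
        return alert.message()
```

A real notifier would hand the message to a pager or chat integration; returning the string keeps the sketch easy to inspect.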

Slide 12

Slide 12 text

@bridgetkromhout “Nothing’s the same anymore.” Jeffrey Sinclair, Babylon 5

Slide 13

Slide 13 text

@bridgetkromhout containers: dawn of the third age?

Slide 14

Slide 14 text

@bridgetkromhout A Brief History of Containers, Part 1 (before docker era) 1979 (Version 7), 1982 (BSD) chroot 2000 FreeBSD jails 2004 Solaris Zones 2008 LXC

Slide 15

Slide 15 text

@bridgetkromhout A Brief History of Containers, Part 2 (docker common era) 2011 Cloud Foundry 2013 Docker 2014 Rocket (later rkt) 2015 Open Container Initiative

Slide 16

Slide 16 text

@bridgetkromhout containing dramas

Slide 17

Slide 17 text

@bridgetkromhout (autoscaling EC2 instances up & down over time) ephemeral infrastructure

Slide 18

Slide 18 text

@bridgetkromhout “cattle, not pets” (even adorable Attack Kittens)

Slide 19

Slide 19 text

@bridgetkromhout “cattle, not pets” • cloud-based infrastructure • static checks or thresholds no longer scale • manual configuration no longer scales
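One possible alternative to a fixed threshold is to compare each reading against a rolling baseline of recent readings. The sketch below is an illustration only and is not prescribed by the slide; the `RollingBaseline` name, the window of 30 samples, the minimum of 5 samples of history, and the 3-sigma rule are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class RollingBaseline:
    """Flags a value that sits far from the recent window of values."""

    def __init__(self, window: int = 30, sigmas: float = 3.0):
        self.values = deque(maxlen=window)
        self.sigmas = sigmas

    def is_anomalous(self, value: float) -> bool:
        """Compare value to the window seen so far, then record it."""
        anomalous = False
        if len(self.values) >= 5:  # wait for some history before judging
            mu = mean(self.values)
            sd = pstdev(self.values)
            # Guard against a zero deviation when the history is constant.
            anomalous = abs(value - mu) > self.sigmas * max(sd, 1e-9)
        self.values.append(value)
        return anomalous
```

Because the baseline follows each host's own recent behavior, the same code can run on short-lived instances without anyone setting a per-host limit by hand. With a perfectly constant history, any change at all is flagged, so a production version would likely want a minimum tolerance.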

Slide 20

Slide 20 text

@bridgetkromhout automation: solution? or problem? (it depends)

Slide 21

Slide 21 text

@bridgetkromhout architectural considerations

Slide 22

Slide 22 text

@bridgetkromhout

Slide 23

Slide 23 text

@bridgetkromhout Why containers? • consistent development • repeatable deployment

Slide 24

Slide 24 text

@bridgetkromhout Monitoring containers The Art of Monitoring (2016) James Turnbull artofmonitoring.com

Slide 25

Slide 25 text

@bridgetkromhout • Events - changes in your environment • Logs - diagnosis & troubleshooting • Metrics - seeing state in real time for anomaly detection & pattern analysis

Slide 26

Slide 26 text

@bridgetkromhout Open source & SaaS • choose TCP over UDP • configurable granularity • “push” vs “pull” The Art of Monitoring (2016) James Turnbull artofmonitoring.com
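The “push” vs “pull” distinction comes down to who initiates the transfer. The in-process sketch below is a simplification with no network involved, and the `Collector` class and its method names are hypothetical: with push, the emitter calls the collector whenever it has a value; with pull, the collector holds a list of targets and asks each one on its own schedule.

```python
from typing import Callable, Dict


class Collector:
    """Stores the latest value per metric name, however it arrived."""

    def __init__(self):
        self.latest: Dict[str, float] = {}
        self._targets: Dict[str, Callable[[], float]] = {}

    # Push: the emitter decides when to send.
    def receive(self, name: str, value: float) -> None:
        self.latest[name] = value

    # Pull: the collector decides when to ask.
    def register(self, name: str, read: Callable[[], float]) -> None:
        self._targets[name] = read

    def scrape(self) -> None:
        for name, read in self._targets.items():
            self.latest[name] = read()
```

In the push model the emitter needs to know where the collector lives; in the pull model the collector needs to discover its targets, which matters when instances come and go.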

Slide 27

Slide 27 text

@bridgetkromhout “Almost every task run under Borg contains a built-in HTTP server that publishes information about the health of the task and thousands of performance metrics” Large-scale cluster management at Google with Borg - Verma et al. 2015
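The pattern in the quote, a task that serves its own health and metrics over HTTP, can be sketched with the Python standard library. This is not Borg's implementation: the `/healthz` and `/metrics` paths, the JSON format, and the `requests_total` metric are assumptions made for the example.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

METRICS = {"requests_total": 0}  # hypothetical metric name
METRICS_LOCK = threading.Lock()


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            body = {"status": "ok"}
        elif self.path == "/metrics":
            with METRICS_LOCK:
                body = dict(METRICS)  # copy so we serialize a stable snapshot
        else:
            self.send_error(404)
            return
        payload = json.dumps(body).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real task would log requests


def start_status_server(port: int = 0) -> HTTPServer:
    """Serve status on localhost in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The application would call `start_status_server()` at startup and update `METRICS` under the lock as it works; a collector can then read `server.server_address[1]` to learn the port, or the port can be fixed by configuration.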

Slide 28

Slide 28 text

@bridgetkromhout Image credit: James Ernest

Slide 29

Slide 29 text

@bridgetkromhout Security Pros & Cons • Containers limit attack surface • Emitters don’t need ports open but… • Microservices move IPC to network transactions • Complexity is distributed