DevOpsDays Cuba 2016: Modern Monitoring

Author: Bridget Kromhout

DevOpsDays Cuba

October 19, 2016

Transcript

  1. @bridgetkromhout Monitoring

  2. @bridgetkromhout Bridget Kromhout
    lives: Minneapolis, Minnesota
    works: Pivotal
    podcasts: Arrested DevOps
    organizes: devopsdays
  3. @bridgetkromhout “…measuring value, throughput, and performance… revenue rather than cost”
    The Art of Monitoring (2016), James Turnbull, artofmonitoring.com
  4. @bridgetkromhout Why monitor?
    The Art of Monitoring (2016), James Turnbull, artofmonitoring.com
  5. @bridgetkromhout Why monitor? Two customers of monitoring with different needs.
    The Art of Monitoring (2016), James Turnbull, artofmonitoring.com
  6. @bridgetkromhout Why monitor? Two customers of monitoring with different needs.
    The business: UX data for product & engineering; measure value delivered
    The Art of Monitoring (2016), James Turnbull, artofmonitoring.com
  7. @bridgetkromhout Why monitor? Two customers of monitoring with different needs.
    The business: UX data for product & engineering; measure value delivered
    Information Technology: visibility into state and failures; product & engineering decisions; measure success of projects
    The Art of Monitoring (2016), James Turnbull, artofmonitoring.com
  8. @bridgetkromhout Monitoring Maturity Model
    Manual. Method: checklists, simple scripts. Type: “tribal knowledge” of things broken in the past. Focus: minimizing downtime, managing assets.
    Reactive. Method: disk, CPU, memory checks. Type: thresholds, alerting; updated after incidents. Focus: availability, assets, some customer experience.
    Proactive. Method: automatic; required for deployment. Type: alerting includes context, automated remediation. Focus: application performance, business outcomes.
    The Art of Monitoring (2016), James Turnbull, artofmonitoring.com
  9. @bridgetkromhout Monitoring Maturity Model
    The Art of Monitoring (2016), James Turnbull, artofmonitoring.com
  10. @bridgetkromhout Typical reactive notification
    The Art of Monitoring (2016), James Turnbull, artofmonitoring.com
  11. @bridgetkromhout Better notifications
    • Actionable
    • Provide necessary context
    • Prevent alert fatigue
    The Art of Monitoring (2016), James Turnbull, artofmonitoring.com
    (in the brave new cloudy-with-a-chance-of-containers world)
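
The three properties of a better notification can be illustrated with a minimal Python sketch. It is not taken from the talk or the book: the `Alert` fields, the `render` function, the `Deduplicator` class, and the runbook URL are all hypothetical names chosen for illustration. The message carries context (host, value, threshold, trend), points at an action (a runbook), and repeats are suppressed inside a quiet period to limit alert fatigue.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    host: str
    metric: str
    value: float
    threshold: float
    recent: list   # recent samples, oldest first (hypothetical field)
    runbook: str   # hypothetical link to remediation steps


def render(alert: Alert) -> str:
    """Build a message that carries context and a pointer to an action."""
    rising = bool(alert.recent) and alert.value > alert.recent[0]
    trend = "rising" if rising else "flat or falling"
    return (
        f"{alert.metric} on {alert.host} is {alert.value:.0f}% "
        f"(threshold {alert.threshold:.0f}%), {trend} over the last "
        f"{len(alert.recent)} samples. Runbook: {alert.runbook}"
    )


class Deduplicator:
    """Suppress repeats of the same (host, metric) alert inside a quiet period."""

    def __init__(self, quiet_seconds: float):
        self.quiet_seconds = quiet_seconds
        self._last_sent = {}

    def should_send(self, alert: Alert, now: float) -> bool:
        key = (alert.host, alert.metric)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.quiet_seconds:
            return False  # still inside the quiet period: stay silent
        self._last_sent[key] = now
        return True
```

A caller would check `should_send(alert, time.time())` before delivering `render(alert)` to whatever paging or chat system is in use.
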
  12. @bridgetkromhout “Nothing’s the same anymore.” Jeffrey Sinclair Babylon 5

  13. @bridgetkromhout containers: dawn of the third age?

  14. @bridgetkromhout A Brief History of Containers, Part 1 (before docker era)
    1979 (Version 7), 1982 (BSD): chroot
    2000: FreeBSD jails
    2004: Solaris Zones
    2008: LXC
  15. @bridgetkromhout A Brief History of Containers, Part 2 (docker common era)
    2011: Cloud Foundry
    2013: Docker
    2014: Rocket (later rkt)
    2015: Open Container Initiative
  16. @bridgetkromhout containing dramas

  17. @bridgetkromhout ephemeral infrastructure (autoscaling EC2 instances up & down over time)
  18. @bridgetkromhout “cattle, not pets” (even adorable Attack Kittens)

  19. @bridgetkromhout “cattle, not pets”
    • cloud-based infrastructure
    • static checks or thresholds no longer scale
    • manual configuration no longer scales
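
One possible alternative to a static threshold, sketched here in Python, is to compare each sample with a rolling baseline of recent samples for the same series. This is an illustration only and not a technique named in the talk: the `RollingBaseline` class, the window size, the warm-up count of five samples, and the three-sigma rule are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class RollingBaseline:
    """Flag a sample that is far from the recent average of its own series."""

    def __init__(self, window: int = 30, sigmas: float = 3.0):
        self.samples = deque(maxlen=window)  # most recent samples only
        self.sigmas = sigmas

    def is_anomalous(self, value: float) -> bool:
        anomalous = False
        if len(self.samples) >= 5:  # assumed warm-up before judging anything
            mu = mean(self.samples)
            sd = pstdev(self.samples)
            # Guard against a zero deviation when all samples are identical.
            anomalous = abs(value - mu) > self.sigmas * max(sd, 1e-9)
        self.samples.append(value)
        return anomalous
```

Because the baseline is learned per series, the same code can watch a host that idles at 10% CPU and one that normally runs at 70% without hand-tuned limits for each.
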
  20. @bridgetkromhout automation: solution? or problem? (it depends)

  21. @bridgetkromhout architectural considerations

  22. @bridgetkromhout

  23. @bridgetkromhout Why containers? consistent development, repeatable deployment

  24. @bridgetkromhout Monitoring containers
    The Art of Monitoring (2016), James Turnbull, artofmonitoring.com
  25. @bridgetkromhout
    Events: changes in your environment
    Logs: diagnosis & troubleshooting
    Metrics: seeing state in real time for anomaly detection & pattern analysis
  26. @bridgetkromhout Open source & SaaS
    • choose TCP over UDP
    • configurable granularity
    • “push” vs “pull”
    The Art of Monitoring (2016), James Turnbull, artofmonitoring.com
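
As a rough illustration of the “push” model over TCP, the Python sketch below has the emitter open a connection to a collector and send one sample. The function names, host, port, and metric path are hypothetical; the line format (`path value timestamp`) is modeled on Graphite's plaintext protocol, which is an assumption about the collector and not something the slide specifies. In a “pull” model the direction is reversed: the collector connects to the service and reads its current values.

```python
import socket


def format_line(path: str, value, timestamp: float) -> str:
    """One sample per line: '<path> <value> <unix timestamp>'."""
    return f"{path} {value} {int(timestamp)}\n"


def push_metric(host: str, port: int, path: str, value, timestamp: float,
                timeout: float = 2.0) -> None:
    """Push one sample over TCP.

    Unlike UDP, a failed connection or send raises OSError, so the
    emitter finds out when the collector is unreachable.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(format_line(path, value, timestamp).encode("ascii"))
```

A call such as `push_metric("collector.example.com", 2003, "web.requests", 42, time.time())` would send a single sample; a real emitter would usually batch samples and reuse the connection.
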
  27. @bridgetkromhout “Almost every task run under Borg contains a built-in HTTP server that publishes information about the health of the task and thousands of performance metrics”
    Large-scale cluster management at Google with Borg, Verma et al. 2015
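
The pattern described in the quote, a task that serves its own health and metrics over HTTP, can be sketched with Python's standard library. This is not Borg's implementation: the `/healthz` and `/metrics` paths, the JSON payloads, and the `METRICS` dictionary are assumptions made for illustration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-process counters that the application would update.
METRICS = {"requests_total": 0}
LOCK = threading.Lock()


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            self._reply(200, {"status": "ok"})
        elif self.path == "/metrics":
            with LOCK:
                snapshot = dict(METRICS)  # copy while holding the lock
            self._reply(200, snapshot)
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, code, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def start_status_server(port: int = 0) -> ThreadingHTTPServer:
    """Serve status in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A collector using the “pull” model would then poll these endpoints on each running task; `server.server_address[1]` gives the port that was actually bound.
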
  28. @bridgetkromhout Image credit: James Ernest

  29. @bridgetkromhout Security Pros & Cons
    Containers limit attack surface & emitters don’t need ports open
    but…
    Microservices move IPC to network transactions & complexity is distributed
  30. @bridgetkromhout Information radiators: removed, restored

  31. @bridgetkromhout monitoring: the old way

  32. @bridgetkromhout monitoring: the new way

  33. @bridgetkromhout

  34. @bridgetkromhout Thanks!