DevOpsDays Cuba 2016: Modern Monitoring

Author: Bridget Kromhout

DevOpsDays Cuba

October 19, 2016

Transcript

  1. @bridgetkromhout
    Monitoring

  2. @bridgetkromhout
    lives:
    Minneapolis,
    Minnesota
    works:
    Pivotal
    podcasts:
    Arrested
    DevOps
    organizes:
    devopsdays
    Bridget Kromhout

  3. @bridgetkromhout
    “…measuring value, throughput,
    and performance…
    revenue rather than cost”
    The Art of Monitoring (2016)
    James Turnbull
    artofmonitoring.com

  4. @bridgetkromhout
    Why monitor?
    The Art of Monitoring (2016)
    James Turnbull
    artofmonitoring.com

  5. @bridgetkromhout
    Why monitor?
    Two customers of monitoring with different needs.
    The Art of Monitoring (2016)
    James Turnbull
    artofmonitoring.com

  6. @bridgetkromhout
    Why monitor?
    The business:
    UX data for product &
    engineering
    Measure value delivered
    Two customers of monitoring with different needs.
    The Art of Monitoring (2016)
    James Turnbull
    artofmonitoring.com

  7. @bridgetkromhout
    Why monitor?
    The business:
    UX data for product &
    engineering
    Measure value delivered
    Information Technology:
    Visibility into state and failures
    Product & engineering decisions
    Measure success of projects
    Two customers of monitoring with different needs.
    The Art of Monitoring (2016)
    James Turnbull
    artofmonitoring.com

  8. @bridgetkromhout
    Monitoring Maturity Model
    Manual
      Method: Checklists, simple scripts
      Type: “Tribal knowledge” of things broken in the past
      Focus: Minimizing downtime, managing assets
    Reactive
      Method: Disk, CPU, memory checks
      Type: Thresholds, alerting; updated after incidents
      Focus: Availability, assets, some customer experience
    Proactive
      Method: Automatic; required for deployment
      Type: Alerting includes context, automated remediation
      Focus: Application performance, business outcomes
    The Art of Monitoring (2016) - James Turnbull - artofmonitoring.com

  9. @bridgetkromhout
    Monitoring
    Maturity
    Model
    The Art of Monitoring (2016)
    James Turnbull
    artofmonitoring.com

  10. @bridgetkromhout
    Typical reactive notification
    The Art of Monitoring (2016) - James Turnbull - artofmonitoring.com

  11. @bridgetkromhout
    Better notifications
    Actionable
    Provide necessary context
    Prevent alert fatigue
    The Art of Monitoring (2016) - James Turnbull - artofmonitoring.com
    (in the brave new cloudy-with-a-chance-of-containers world)
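
The three properties on this slide can be sketched in a few lines of Python. This is only an illustration, not anything from the talk or the book: the `Notification` and `Notifier` names, the fields, the runbook URL, and the 300-second quiet period are all assumptions made for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Notification:
    """An alert that carries enough context to act on (fields are illustrative)."""
    check: str        # what fired, e.g. "disk_free_percent"
    host: str         # where it fired
    value: float      # observed value
    threshold: float  # limit that was crossed
    runbook: str      # hypothetical link to remediation steps

    def render(self) -> str:
        # Actionable: say what happened, where, and what to do next.
        return (f"{self.check} on {self.host}: observed {self.value}, "
                f"threshold {self.threshold}. Next step: {self.runbook}")


@dataclass
class Notifier:
    """Suppresses repeats of the same alert inside a quiet period."""
    quiet_seconds: float = 300.0
    _last_sent: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def should_send(self, n: Notification, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        key = (n.check, n.host)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.quiet_seconds:
            return False  # same alert sent recently: skip it to limit fatigue
        self._last_sent[key] = now
        return True
```

A caller would check `should_send` before delivering `render()` to a pager or chat channel, so a flapping check produces one message per quiet period instead of one per evaluation.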

  12. @bridgetkromhout
    “Nothing’s the same anymore.”
    Jeffrey Sinclair
    Babylon 5

  13. @bridgetkromhout
    containers:
    dawn of the third age?

  14. @bridgetkromhout
    A Brief History of Containers, Part 1
    (before docker era)
    1979 (Version 7), 1982 (BSD)
    chroot
    2000
    FreeBSD jails
    2004
    Solaris Zones
    2008
    LXC

  15. @bridgetkromhout
    A Brief History of Containers, Part 2
    (docker common era)
    2011
    Cloud Foundry
    2013
    Docker
    2014
    Rocket (later rkt)
    2015
    Open Container
    Initiative

  16. @bridgetkromhout
    containing dramas

  17. @bridgetkromhout
    ephemeral infrastructure
    (autoscaling EC2 instances up & down over time)

  18. @bridgetkromhout
    “cattle,
    not pets”
    (even
    adorable
    Attack
    Kittens)

  19. @bridgetkromhout
    • cloud-based infrastructure
    • static checks or thresholds no longer scale
    • manual configuration no longer scales
    “cattle, not pets”

  20. @bridgetkromhout
    automation: solution? or problem?
    (it depends)

  21. @bridgetkromhout
    architectural considerations

  22. @bridgetkromhout

  23. @bridgetkromhout
    Why containers?
    consistent development
    repeatable deployment

  24. @bridgetkromhout
    Monitoring containers
    The Art of Monitoring (2016)
    James Turnbull
    artofmonitoring.com

  25. @bridgetkromhout
    Events - changes in your environment
    Logs - diagnosis & troubleshooting
    Metrics - seeing state in real time for
    anomaly detection & pattern analysis
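
As a rough illustration of using metrics for anomaly detection rather than a fixed threshold, the sketch below flags a sample that sits far from the recent rolling mean. The class name, the window size, and the three-sigma rule are assumptions chosen for the example; the slide does not prescribe a method.

```python
from collections import deque
from statistics import mean, stdev


class RollingAnomalyDetector:
    """Flags values that deviate strongly from a rolling window of recent samples."""

    def __init__(self, window: int = 30, sigmas: float = 3.0):
        self.samples = deque(maxlen=window)  # oldest samples fall off automatically
        self.sigmas = sigmas

    def observe(self, value: float) -> bool:
        """Record a sample; return True if it looks anomalous against the window."""
        anomalous = False
        if len(self.samples) >= 2:  # stdev needs at least two points
            mu = mean(self.samples)
            sd = stdev(self.samples)
            anomalous = sd > 0 and abs(value - mu) > self.sigmas * sd
        self.samples.append(value)
        return anomalous
```

Because the baseline moves with the data, the same detector can be attached to short-lived instances or containers without hand-tuning a limit for each one.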

  26. @bridgetkromhout
    The Art of Monitoring (2016)
    James Turnbull
    artofmonitoring.com
    Open source & SaaS
    • choose TCP over UDP
    • configurable granularity
    • “push” vs “pull”
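
A minimal sketch of the "push" side over TCP follows. It uses the Graphite plaintext line format (`path value timestamp`) only as a familiar example; the deck does not name a backend, and the function names, metric path, host, and port are assumptions.

```python
import socket
import time
from typing import Optional


def format_metric(path: str, value: float, timestamp: Optional[int] = None) -> bytes:
    """Encode one sample as a Graphite-style plaintext line ending in a newline."""
    ts = int(time.time()) if timestamp is None else timestamp
    return f"{path} {value} {ts}\n".encode("ascii")


def push_metric(host: str, port: int, path: str, value: float,
                timestamp: Optional[int] = None, timeout: float = 2.0) -> None:
    """Push one sample over TCP.

    With TCP, an unreachable collector raises an exception the emitter can
    handle; a UDP datagram to the same address could be lost without notice.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(format_metric(path, value, timestamp))
```

In the "pull" model the direction is reversed: the emitter exposes its current values and the collector connects to fetch them on its own schedule.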

  27. @bridgetkromhout
    “Almost every task run
    under Borg contains a
    built-in HTTP server that
    publishes information
    about the health of the
    task and thousands of
    performance metrics”
    Large-scale cluster management at Google with Borg - Verma et al. 2015
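
The idea in this quote, a task that serves its own health and metrics over HTTP for a collector to pull, can be sketched with Python's standard library. This is not Borg's interface: the `/healthz` and `/metrics` paths, the JSON shape, and the `METRICS` dictionary are assumptions made for the example.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START = time.time()
METRICS = {"requests_total": 0}  # illustrative counter the task would update itself


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":      # hypothetical path name
            body = {"status": "ok",
                    "uptime_seconds": round(time.time() - START, 3)}
        elif self.path == "/metrics":    # hypothetical path name
            body = dict(METRICS)
        else:
            self.send_error(404)
            return
        payload = json.dumps(body).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real task would log requests


def start_health_server(port: int = 0) -> HTTPServer:
    """Serve health and metrics on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A collector would read `server.server_address` (or a service registry entry) to find the port, then poll both paths at whatever granularity it needs.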

  28. @bridgetkromhout
    Image credit: James Ernest

  29. @bridgetkromhout
    Security Pros & Cons
    Containers limit attack
    surface & emitters don’t
    need ports open
    but…
    Microservices move IPC
    to network transactions &
    complexity is distributed

  30. @bridgetkromhout
    removed restored
    Information radiators

  31. @bridgetkromhout
    monitoring: the old way

  32. @bridgetkromhout
    monitoring: the new way

  33. @bridgetkromhout

  34. @bridgetkromhout
    Thanks!
