
Monitoring Containers Correctly

Michael
October 01, 2018


Michael Kehoe walks you through building a small monitoring utility for cgroup containers to illustrate best practices in container monitoring. You'll explore various cgroup constraints and learn how to specifically monitor for each of them to ensure that your application is behaving as expected. Along the way, Michael shares tricks and tips about monitoring containerized applications.


Transcript

  1. Monitoring Containers Correctly
    Michael Kehoe
    Staff Site Reliability Engineer
    https://github.com/michael-kehoe/container-monitoring-workshop


  2. Getting Started
    • Set up your workshop platform:
    • https://app.strigo.io/event/QXDpmTiRAufQ4LBis
    • Token: F7C7
    • Background slides: https://bit.ly/2NcEBQN
    • Code repo: https://github.com/michael-kehoe/container-monitoring-workshop
    • Please let me know ASAP if you’re having problems


  3. Today’s agenda
    1 Introductions
    2 Container Primitives
    3 What we’ll monitor
    4 Cgroup interface file formats
    5 Exercises


  4. Today’s agenda: Exercises
    100 CPU Basics
    101 CPU Enhanced
    102 CPU Advanced
    200 Memory Basics
    201 Memory Enhanced
    300 IO Basics
    400 PID


  5. Michael Kehoe
    $ WHOAMI
    • Staff Site Reliability Engineer @ LinkedIn
    • Production-SRE Team
    • Funny accent = Australian + 4 years American
    • Worked on:
      • Networks
      • Micro-services
      • Traffic Engineering
      • Databases


  6. Production-SRE Team @ LinkedIn
    $ WHOAMI
    • Disaster Recovery – Planning & Automation
    • Incident Response – Process & Automation
    • Visibility Engineering – Making use of operational data
    • Reliability Principles – Defining best practice & automating it


  7. Container Primitives


  8. Containers
    • cgroups: Limiting the resources that can be used by a process/set of processes
    • Namespaces: Isolating filesystem resources
    • Copy on Write: Implicit sharing or shadowing
    • Linux Security Modules: Locking down container privileges


  9. Cgroup
    • Abbreviation for ‘Control Groups’
    • Provides:
      • Resource Limiting
      • Prioritization
      • Accounting
      • Control


  10. What we’ll monitor


  11. What we’ll monitor: CPU
    • 100: Basic cgroup CPU utilization
    • 101: Enhanced cgroup CPU utilization (with percentiles)
    • 102: cgroup throttles


  12. What we’ll monitor: Memory
    • 200: Memory Basics
      • Cgroup utilization
    • 201: Enhanced Memory Metrics


  13. What we’ll monitor: Disk/Network
    • 300: Disk IO Monitoring


  14. What we’ll monitor: PID
    • 400: PID Utilization


  15. Cgroup interface file formats


  16. Cgroup interface file formats
    https://www.kernel.org/doc/Documentation/cgroup-v2.txt
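The cgroup v2 documentation linked above describes a small set of interface file layouts, including flat keyed files (one `KEY VALUE` pair per line, as in `cpu.stat`) and nested keyed files (`KEY SUB=VAL SUB=VAL ...` per line, as in `io.stat`). The following is a minimal sketch of parsers for those two layouts. The function names are hypothetical and are not taken from the workshop repo; the flat keyed parser assumes integer values.

```python
def parse_flat_keyed(text):
    """Parse 'KEY VALUE' lines into a dict of ints (assumes integer values)."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split(None, 1)
        result[key] = int(value)
    return result


def parse_nested_keyed(text):
    """Parse 'KEY SUB=VAL SUB=VAL' lines into a dict of dicts.

    Sub-values are kept as strings because some files use non-numeric
    values such as 'max'.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        key, pairs = fields[0], fields[1:]
        result[key] = dict(pair.split("=", 1) for pair in pairs)
    return result
```

Keeping the parsers separate from file access makes them easy to exercise with captured file contents before pointing them at a live cgroup directory.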


  17. 100: CPU Monitoring
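A rough sketch of basic CPU utilization, assuming a cgroup v2 hierarchy where `cpu.stat` exposes a cumulative `usage_usec` counter. Utilization is the counter's growth between two reads divided by the elapsed wall-clock time. The function names and the `cpu_stat_path` argument are illustrative assumptions, not the workshop's own code.

```python
import time

USEC_PER_SEC = 1_000_000


def read_usage_usec(cpu_stat_text):
    """Return the cumulative usage_usec counter from cpu.stat contents."""
    for line in cpu_stat_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "usage_usec":
            return int(value)
    raise ValueError("usage_usec not found in cpu.stat")


def cpu_percent(prev_usec, cur_usec, interval_sec):
    """CPU time used over the interval, as a percentage of one CPU."""
    return (cur_usec - prev_usec) / (interval_sec * USEC_PER_SEC) * 100.0


def sample_cpu_percent(cpu_stat_path, interval_sec=1.0):
    """Read cpu.stat twice and return utilization over the measured interval."""
    with open(cpu_stat_path) as f:
        before = read_usage_usec(f.read())
    start = time.monotonic()
    time.sleep(interval_sec)
    with open(cpu_stat_path) as f:
        after = read_usage_usec(f.read())
    elapsed = time.monotonic() - start  # sleep() is not exact, so measure it
    return cpu_percent(before, after, elapsed)
```

Because the result is relative to a single CPU, a cgroup running on several cores can legitimately report more than 100.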


  18. 101: Enhanced CPU Monitoring


  19. Enhanced CPU Monitoring
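The agenda describes this exercise as CPU utilization "with percentiles". One possible reading, assumed here, is that utilization samples are collected over time and then summarized by percentile. The sketch below uses the nearest-rank method; the `percentile` helper is hypothetical and the choice of method is an assumption.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numeric samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Nearest rank: smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Example: ten utilization samples (percent of one CPU).
samples = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
summary = {p: percentile(samples, p) for p in (50, 90, 99)}
```

High percentiles tend to surface short bursts that an average over the same window would smooth out.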


  20. 102: CPU Advanced Monitoring


  21. Advanced CPU Monitoring
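The agenda lists cgroup throttles for this exercise. As a sketch, and assuming cgroup v2 with the cpu controller enabled, `cpu.stat` carries the cumulative counters `nr_periods`, `nr_throttled` and `throttled_usec`. Comparing two reads shows how often the cgroup hit its CPU quota in between. The function names are hypothetical.

```python
def parse_cpu_stat(text):
    """Parse cpu.stat 'KEY VALUE' lines into a dict of ints."""
    return {
        key: int(value)
        for key, value in (line.split() for line in text.splitlines() if line.strip())
    }


def throttle_delta(prev_text, cur_text):
    """Summarize throttling between two cpu.stat snapshots."""
    prev, cur = parse_cpu_stat(prev_text), parse_cpu_stat(cur_text)
    periods = cur["nr_periods"] - prev["nr_periods"]
    throttled = cur["nr_throttled"] - prev["nr_throttled"]
    return {
        "periods": periods,
        "throttled_periods": throttled,
        # Fraction of enforcement periods in which the cgroup was throttled.
        "throttled_ratio": throttled / periods if periods else 0.0,
        "throttled_usec": cur["throttled_usec"] - prev["throttled_usec"],
    }
```

A non-zero ratio while overall utilization looks modest can indicate bursty work running into the quota.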


  22. 200: Memory Basics
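A minimal sketch of cgroup memory utilization, assuming the cgroup v2 files `memory.current` (bytes in use) and `memory.max` (the hard limit, or the literal string `max` when unlimited). The function names are illustrative only.

```python
def parse_limit(text):
    """Return the limit in bytes, or None when the file contains 'max'."""
    value = text.strip()
    return None if value == "max" else int(value)


def memory_utilization(current_text, max_text):
    """Fraction of the memory limit in use, or None if there is no limit."""
    limit = parse_limit(max_text)
    if not limit:  # None (unlimited) or 0
        return None
    return int(current_text.strip()) / limit
```

Returning `None` for an unlimited cgroup avoids reporting a misleading ratio against a limit that does not exist.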


  23. 201: Memory Enhanced
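The slides do not say which metrics the enhanced exercise covers. One plausible candidate, assumed here, is the cgroup v2 `memory.events` file, whose cumulative counters (`low`, `high`, `max`, `oom`, `oom_kill`) record memory pressure events. The sketch below reports which counters increased between two reads; the function names are hypothetical.

```python
def parse_events(text):
    """Parse memory.events 'KEY VALUE' lines into a dict of ints."""
    events = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split()
            events[key] = int(value)
    return events


def new_memory_events(prev_text, cur_text):
    """Return only the counters that increased, with the size of the increase."""
    prev, cur = parse_events(prev_text), parse_events(cur_text)
    return {
        key: cur[key] - prev.get(key, 0)
        for key in cur
        if cur[key] > prev.get(key, 0)
    }
```

An increase in `oom_kill` means a process in the cgroup was killed by the OOM killer during the interval, which is usually worth alerting on.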


  24. 300: Disk IO Basics
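A sketch of per-device IO rates, assuming the cgroup v2 `io.stat` file, where each line starts with a `major:minor` device number followed by cumulative counters such as `rbytes`, `wbytes`, `rios` and `wios`. Only those four counters are read here; any other fields are ignored. The function names are hypothetical.

```python
COUNTERS = ("rbytes", "wbytes", "rios", "wios")


def parse_io_stat(text):
    """Parse io.stat into {device: {counter: int}} for the known counters."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        pairs = dict(field.split("=", 1) for field in fields[1:])
        stats[fields[0]] = {name: int(pairs[name]) for name in COUNTERS if name in pairs}
    return stats


def io_rates(prev_text, cur_text, interval_sec):
    """Per-second rate of each counter between two io.stat snapshots."""
    prev, cur = parse_io_stat(prev_text), parse_io_stat(cur_text)
    rates = {}
    for device, counters in cur.items():
        before = prev.get(device, {})
        rates[device] = {
            name: (value - before.get(name, 0)) / interval_sec
            for name, value in counters.items()
        }
    return rates
```

A device that first appears in the second snapshot is treated as starting from zero, so its first reported rate may be overstated.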


  25. 400: PID Monitoring
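A minimal sketch of PID utilization using the pids controller files `pids.current` (number of processes and threads in the cgroup) and `pids.max` (the limit, or `max` when unlimited). The `pid_usage` helper is hypothetical.

```python
def pid_usage(current_text, max_text):
    """Return current count, limit, and utilization (None when unlimited)."""
    current = int(current_text.strip())
    raw_limit = max_text.strip()
    if raw_limit == "max":
        return {"current": current, "limit": None, "utilization": None}
    limit = int(raw_limit)
    return {
        "current": current,
        "limit": limit,
        "utilization": current / limit if limit else None,
    }
```

Once a cgroup reaches its PID limit, further `fork()` or `clone()` calls fail, so tracking utilization gives early warning before the application starts failing to spawn workers.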
