Monitoring Containers Correctly

Michael
October 01, 2018

Michael Kehoe walks you through building a small monitoring utility for cgroup containers to illustrate best practices in container monitoring. You'll explore various cgroup constraints and learn how to specifically monitor for each of them to ensure that your application is behaving as expected. Along the way, Michael shares tricks and tips about monitoring containerized applications.

Transcript

  1. Monitoring Containers Correctly • Michael Kehoe, Staff Site Reliability Engineer • https://github.com/michael-kehoe/container-monitoring-workshop

  2. Getting Started
    • Set up your workshop platform:
      • https://app.strigo.io/event/QXDpmTiRAuf Q4LBis
    • Token: F7C7
    • Background slides: https://bit.ly/2NcEBQN
    • Code repo: https://github.com/michael-kehoe/container-monitoring-workshop
    • Please let me know ASAP if you’re having problems

  3. Today’s agenda
    1 Introductions
    2 Container Primitives
    3 What we’ll monitor
    4 Cgroup interface file formats
    5 Exercises

  4. Today’s agenda: Exercises
    100 CPU Basics
    101 CPU Enhanced
    102 CPU Advanced
    200 Memory Basics
    201 Memory Enhanced
    300 IO Basics
    400 PID

  5. $ WHOAMI: Michael Kehoe
    • Staff Site Reliability Engineer @ LinkedIn
    • Production-SRE Team
    • Funny accent = Australian + 4 years American
    • Worked on:
      • Networks
      • Micro-services
      • Traffic Engineering
      • Databases

  6. $ WHOAMI: Production-SRE Team @ LinkedIn
    • Disaster Recovery - Planning & Automation
    • Incident Response - Process & Automation
    • Visibility Engineering - Making use of operational data
    • Reliability Principles - Defining best practice & automating it

  7. Container Primitives

  8. Containers
    • cgroups: Limiting the resources that can be used by a process/set of processes
    • Namespaces: Isolating filesystem resources
    • Copy on Write: Implicit sharing or shadowing
    • Linux Security Modules: Locking down container privileges

  9. Cgroup
    • Abbreviation for ‘Control Groups’
    • Provides:
      • Resource Limiting
      • Prioritization
      • Accounting
      • Control

  10. What we’ll monitor

  11. What we’ll monitor: CPU
    • 100: Basic cgroup CPU utilization
    • 101: Enhanced cgroup CPU utilization (with percentiles)
    • 102: cgroup throttles

  12. What we’ll monitor: MEMORY
    • 200: Memory Basics
      • Cgroup utilization
    • 201: Enhanced Memory Metrics

  13. What we’ll monitor: DISK/NETWORK
    • 300: Disk IO Monitoring

  14. What we’ll monitor: PID
    • 400: PID Utilization

  15. Cgroup interface file formats

  16. Cgroup interface file formats https://www.kernel.org/doc/Documentation/cgroup-v2.txt
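
The kernel document linked on this slide describes a handful of interface file formats, including new-line separated values, space separated values, flat keyed, and nested keyed. As a minimal sketch (not taken from the workshop repo; the function names are hypothetical), the first three can be parsed from the raw file text like this:

```python
def parse_values(text):
    """Parse new-line or space separated values (e.g. the PIDs in cgroup.procs)."""
    return text.split()


def parse_flat_keyed(text):
    """Parse flat keyed 'KEY VALUE' lines (e.g. cpu.stat) into a dict of ints.

    Assumption: every value is an integer counter, which holds for the
    files used in this sketch but is not guaranteed for every cgroup file.
    """
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate a trailing blank line
        key, value = line.split(None, 1)
        result[key] = int(value)
    return result
```

Both helpers take text rather than a path, so they can be exercised without a cgroup filesystem; in practice the text would come from reading a file under the cgroup mount point.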

  17. Exercises

  18. 100: CPU Monitoring
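
The slide does not show the exercise code, so the following is only a sketch of one common approach, assuming a cgroup v2 hierarchy where `cpu.stat` exposes a cumulative `usage_usec` counter (on cgroup v1 the closest counterpart is `cpuacct.usage`, in nanoseconds). Utilization is the counter delta between two samples divided by the elapsed wall-clock time. The function names are hypothetical.

```python
def cpu_usage_usec(cpu_stat_text):
    """Extract the cumulative usage_usec counter from cpu.stat text."""
    for line in cpu_stat_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "usage_usec":
            return int(value)
    raise KeyError("usage_usec not found in cpu.stat")


def cpu_utilization(prev_usec, curr_usec, interval_sec):
    """Return CPU use in cores over the interval (1.0 == one full core)."""
    if interval_sec <= 0:
        raise ValueError("interval_sec must be positive")
    return (curr_usec - prev_usec) / (interval_sec * 1_000_000)
```

A monitor would read `cpu.stat` twice, a known interval apart, and pass both counters to `cpu_utilization`.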

  19. 101: Enhanced CPU Monitoring

  20. Enhanced CPU Monitoring
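
The agenda describes this exercise as CPU utilization "with percentiles". One way to get there, sketched under the assumption that utilization samples are collected into a list at a fixed interval, is a nearest-rank percentile. The helper name is hypothetical and the workshop may use a different method.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples (0 < pct <= 100)."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]
```

Reporting a high percentile such as p95 or p99 alongside the mean helps surface short bursts that an average over the same window would hide.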

  21. 102: CPU Advanced Monitoring

  22. Advanced CPU Monitoring
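
The agenda lists this exercise as "cgroup throttles". When a CPU limit is set, `cpu.stat` reports `nr_periods` and `nr_throttled` counters (plus throttled time, `throttled_usec` on cgroup v2 or `throttled_time` on v1). A minimal sketch of a throttle ratio between two samples, assuming each sample has already been parsed into a dict of ints (the function name is hypothetical):

```python
def throttle_ratio(prev, curr):
    """Fraction of enforcement periods in which the cgroup was throttled.

    prev and curr are dicts holding the nr_periods and nr_throttled
    counters from two successive reads of cpu.stat.
    """
    periods = curr["nr_periods"] - prev["nr_periods"]
    if periods <= 0:
        return 0.0  # no full period elapsed (or counters were reset)
    throttled = curr["nr_throttled"] - prev["nr_throttled"]
    return throttled / periods
```

A ratio that stays well above zero suggests the CPU limit is too tight for the workload, even when average utilization looks modest.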

  23. 200: Memory Basics
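
For basic memory utilization, a reasonable sketch (assuming cgroup v2, where usage is in `memory.current` and the limit is in `memory.max`; cgroup v1 uses `memory.usage_in_bytes` and `memory.limit_in_bytes`) divides usage by the limit. The function name is hypothetical.

```python
def memory_utilization(current_text, max_text):
    """Return usage / limit as a fraction, or None when no limit is set.

    On cgroup v2, memory.max contains the literal string "max" for an
    unlimited cgroup, so that case has to be handled before int().
    """
    current = int(current_text.strip())
    limit = max_text.strip()
    if limit == "max":
        return None
    return current / int(limit)
```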

  24. 201: Memory Enhanced
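
The slide does not say which enhanced metrics the exercise covers. As an illustration only, two that are often useful on cgroup v2 are a working-set estimate (usage minus `inactive_file` from `memory.stat`, a convention used by some container monitoring tools) and the number of new OOM kills (`oom_kill` in `memory.events`). The helper names below are hypothetical.

```python
def parse_counters(text):
    """Turn flat keyed 'KEY VALUE' lines into a dict of integer counters."""
    return {
        key: int(value)
        for key, value in (line.split() for line in text.splitlines() if line.strip())
    }


def working_set_bytes(current_bytes, memory_stat_text):
    """Estimate memory that cannot be cheaply reclaimed: usage minus inactive file cache."""
    stats = parse_counters(memory_stat_text)
    return max(0, current_bytes - stats.get("inactive_file", 0))


def new_oom_kills(prev_events_text, curr_events_text):
    """Number of OOM kills that happened between two reads of memory.events."""
    prev = parse_counters(prev_events_text)
    curr = parse_counters(curr_events_text)
    return curr.get("oom_kill", 0) - prev.get("oom_kill", 0)
```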

  25. 300: Disk IO Basics
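
On cgroup v2, disk IO counters live in `io.stat`, a nested keyed file with one line per device such as `8:0 rbytes=1024 wbytes=2048 rios=4 wios=8`. The sketch below assumes that layout and uses hypothetical function names; it parses the file and turns two samples into a bytes-per-second rate.

```python
def parse_io_stat(text):
    """Parse io.stat into {device: {key: value_string}}.

    Values are kept as strings because some kernels add non-integer
    fields; callers convert only the counters they need.
    """
    devices = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        devices[fields[0]] = dict(f.split("=", 1) for f in fields[1:])
    return devices


def bytes_per_second(prev, curr, device, key, interval_sec):
    """Rate of change of one counter (e.g. 'rbytes') for one device."""
    before = int(prev.get(device, {}).get(key, 0))
    after = int(curr[device][key])
    return (after - before) / interval_sec
```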

  26. 400: PID Monitoring
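
For PID utilization, the pids controller exposes `pids.current` and `pids.max`, where `pids.max` holds the string `max` when no limit is configured. A minimal sketch of a headroom check, with a hypothetical function name:

```python
def pids_remaining(current_text, max_text):
    """Return how many more tasks the cgroup may create, or None if unlimited."""
    limit = max_text.strip()
    if limit == "max":
        return None
    # Clamp at zero: current can briefly exceed a limit that was lowered later.
    return max(0, int(limit) - int(current_text.strip()))
```

Alerting when the remaining headroom approaches zero gives warning before process or thread creation starts failing inside the container.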
