Efficient Monitoring in modern environments

Container environments make it easy to deploy hundreds of microservices in today’s infrastructures. Monitoring thousands of metrics efficiently raises new challenges: keeping insight, avoiding alert fatigue, and maintaining a high development velocity. In this talk I’ll present an overview of important metrics, including the four golden signals, discuss strategies to organize alerting efficiently, give insight into SoundCloud’s monitoring history, and highlight a few success and failure stories.

Tobias Schmidt

June 28, 2016

Transcript

  1. Introduction. About myself: Production Engineer for 5+ years. Container orchestration (in-house, Kubernetes), service discovery, monitoring (Prometheus), production readiness.
  2. Monitoring: "Collecting, processing, aggregating, and displaying real-time quantitative data about a system, such as query counts and types, processing times, and server lifetimes." (Site Reliability Engineering, O’Reilly 2016)
  3. Monitoring: Why monitor? Enable automatic alerting. Analysis of long-term trends. Validate new features/experiments/implementations. Debugging.
  4. Monitoring: Blackbox vs. Whitebox. Blackbox: externally observed; what the user sees. Whitebox: data exposed by the system; allows acting on imminent issues.
  5. Metrics: Instrument everything. Host (CPU, memory, I/O, network, filesystem, …). Container (CPU, memory, restarts, OOM, throttling, …). Applications (throughput, latency, queues, …).
  6. Four golden signals: Traffic. Demand placed on a system (HTTP requests, network throughput, transactions, …).
  7. Alerting: Use symptom-based alerting. Monitor for your users. Four golden signals (traffic is tricky). Only page if something needs immediate human intervention.
  8. Alerting: Provide runbooks (playbooks). Keep them concise. Explanation, hints, links. Dynamic: include recent observations. Discuss with non-experts.
  9. Thank you. May the queries flow, and your pagers be quiet. Tobias Schmidt - ContainerDays Hamburg 2016. @dagrobie - github.com/grobie
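
The symptom-based alerting idea from slides 6 to 8 can be sketched in a few lines. This is only an illustration, not code from the talk: the `Request` record, the `summarize` and `should_page` functions, and all thresholds are hypothetical names and values chosen for the example. It reduces a window of observed requests to three user-facing signals (traffic, error ratio, tail latency) and pages only when users are likely affected. Traffic is reported but deliberately not alerted on, since the slides call it tricky.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    """One observed request (hypothetical record, not from the talk)."""
    duration_s: float
    failed: bool


def quantile(values: List[float], q: float) -> float:
    """Nearest-rank quantile of a non-empty list, for 0 < q <= 1."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def summarize(requests: List[Request], window_s: float) -> Dict[str, float]:
    """Reduce one observation window to traffic, error ratio and latency."""
    total = len(requests)
    if total == 0:
        # No requests observed: report zeros instead of dividing by zero.
        return {"traffic_rps": 0.0, "error_ratio": 0.0, "latency_p99_s": 0.0}
    errors = sum(1 for r in requests if r.failed)
    return {
        "traffic_rps": total / window_s,
        "error_ratio": errors / total,
        "latency_p99_s": quantile([r.duration_s for r in requests], 0.99),
    }


def should_page(
    summary: Dict[str, float],
    max_error_ratio: float = 0.05,   # assumed threshold, tune per service
    max_latency_p99_s: float = 1.0,  # assumed threshold, tune per service
) -> bool:
    """Page only on symptoms users would notice: errors or slow responses."""
    return (
        summary["error_ratio"] > max_error_ratio
        or summary["latency_p99_s"] > max_latency_p99_s
    )
```

In a setup like the one the talk describes, such conditions would more likely live in the monitoring system's alerting rules than in application code; the sketch only shows the decision logic. A call such as `should_page(summarize(window, window_s=60.0))` would then decide whether a human needs to be woken up.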