Monitoring In Motion
Challenges in Monitoring Kubernetes & Containers
Cloud Native SF Meetup
Feb 25, 2016
Ilan Rabinovitch
Director, Community
Datadog
Slide 2
About Me
● Long time Datadog user.
● Prior to Datadog, built automation and monitoring tooling at Ooyala and Edmunds.com
● SCALE and TXLF Co-Founder
Ilan Rabinovitch
Datadog
[email protected]
@irabinovitch
Slide 3
Agenda
• Monitoring 101 - Crash Course
• Challenges in Monitoring Dynamic Infrastructure
• Demo Time
• Questions?
Slide 4
Monitoring Everything
Slide 6
@honest_update on Twitter
Slide 7
Quick Overview of Datadog
• Monitoring for modern applications.
• Time series storage of metrics and events.
• Trending, alerting and anomaly detection.
• Hundreds of integrations out of the box.
Slide 8
Monitoring 101: Categorization
More at: http://goo.gl/t1Rgcg
Slide 10
Monitoring 101: Focus on symptoms
Slide 11
Recurse until you find root cause.
Slide 12
Container Monitoring Challenges
Slide 13
https://www.datadoghq.com/docker-adoption/
Slide 16
Operational Complexity
• Average containers per host: N (N=4 as of 10/2015)
• N times as many “hosts” to manage
• Affects everything
• Use tags, labels, etc. on your hosts and metrics.
• Pull in existing labels from your infrastructure (Region, Docker Images, K8S Tags, …)
Query-Based Monitoring
Query by tags, and monitoring auto-adapts!
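A minimal sketch of why tag-based queries adapt on their own, assuming a toy in-memory list of points. The `Point` type, the `query` helper, the metric name, and the tag values are all hypothetical and are not Datadog's API. The query names a tag, not a host or container, so a newly started container is picked up without touching the query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """One metric sample with its tags (hypothetical structure)."""
    metric: str
    value: float
    tags: frozenset  # e.g. frozenset({"app:web", "container:c1"})


def query(points, metric, required_tags):
    """Sum every sample of `metric` whose tags include all required tags."""
    required = frozenset(required_tags)
    return sum(p.value for p in points
               if p.metric == metric and required <= p.tags)


points = [
    Point("requests.per_s", 120.0, frozenset({"app:web", "container:c1"})),
    Point("requests.per_s", 80.0, frozenset({"app:web", "container:c2"})),
    Point("requests.per_s", 50.0, frozenset({"app:api", "container:c3"})),
]
total_before = query(points, "requests.per_s", ["app:web"])

# A new container starts and reports with the same app tag.
points.append(Point("requests.per_s", 40.0,
                    frozenset({"app:web", "container:c4"})))
total_after = query(points, "requests.per_s", ["app:web"])
```

The query text is identical before and after the new container appears; only the set of matching samples changes.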
Slide 25
Where is my application running?
What’s the total throughput of App X?
What’s its response time per tag? (pod, version, DC)
What’s the distribution of 5xx from Nginx per pod?
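Questions like these reduce to grouping a metric by a tag key. The sketch below is one way to picture that, assuming samples are plain `(metric, value, tags)` tuples with `key:value` tags. The metric names, tag values, and helper names are hypothetical, not a real monitoring API.

```python
from collections import defaultdict


def tag_value(tags, key):
    """Return the value of the first 'key:value' tag matching `key`, else None."""
    prefix = key + ":"
    for tag in tags:
        if tag.startswith(prefix):
            return tag[len(prefix):]
    return None


def sum_by(samples, metric, group_key):
    """Total `metric` per distinct value of the `group_key` tag."""
    totals = defaultdict(float)
    for name, value, tags in samples:
        if name != metric:
            continue
        group = tag_value(tags, group_key)
        if group is not None:
            totals[group] += value
    return dict(totals)


samples = [
    ("nginx.5xx", 3, ["pod:web-1", "version:1.2", "dc:sfo"]),
    ("nginx.5xx", 1, ["pod:web-2", "version:1.2", "dc:sfo"]),
    ("nginx.5xx", 7, ["pod:web-1", "version:1.2", "dc:sfo"]),
    ("nginx.2xx", 900, ["pod:web-1", "version:1.2", "dc:sfo"]),
]

per_pod = sum_by(samples, "nginx.5xx", "pod")
per_dc = sum_by(samples, "nginx.5xx", "dc")
```

Swapping the group key (`pod`, `version`, `dc`) answers a different question from the same samples.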
Slide 26
Auto Discovery
Slide 27
[Diagram: auto discovery architecture]
Monitoring Agent Container
• Docker API: containers list, metadata
• Kubelet API: additional metadata (pod names, RC, …)
• Config Backend: integration configurations
• Host-level metrics
Legend: A = application container; O = off-the-shelf application (Redis, PostgreSQL, …)
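The auto discovery flow can be pictured roughly as follows. This is an illustrative sketch and not the agent's actual implementation: the `containers` list stands in for what the Docker and Kubelet APIs would return, `CONFIG_TEMPLATES` stands in for the config backend, and the `{host}` / `{port}` placeholders and all field names are hypothetical.

```python
# Hypothetical config backend: image name -> integration config template.
CONFIG_TEMPLATES = {
    "redis": {"check": "redis",
              "instance": {"host": "{host}", "port": "{port}"}},
    "postgres": {"check": "postgres",
                 "instance": {"host": "{host}", "port": "{port}"}},
}


def image_name(image):
    """Strip registry path, tag, and digest: 'reg:5000/redis:3.0' -> 'redis'."""
    last = image.rsplit("/", 1)[-1]
    return last.split("@", 1)[0].split(":", 1)[0]


def discover(containers, templates):
    """Build one integration config per container whose image has a template."""
    configs = []
    for c in containers:
        template = templates.get(image_name(c["image"]))
        if template is None:
            continue  # application container with no off-the-shelf integration
        instance = {key: value.format(host=c["ip"], port=c["port"])
                    for key, value in template["instance"].items()}
        configs.append({
            "check": template["check"],
            "instance": instance,
            "tags": ["pod:" + c["pod"]],  # metadata carried over as tags
        })
    return configs


# Stand-in for the containers list plus pod metadata.
containers = [
    {"image": "redis:3.0", "ip": "10.0.0.5", "port": 6379, "pod": "cache-1"},
    {"image": "example/my-app:42", "ip": "10.0.0.6", "port": 8080, "pod": "web-1"},
]
configs = discover(containers, CONFIG_TEMPLATES)
```

Only the Redis container yields a config here; the application container is skipped because no template matches its image.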
Slide 28
Some Pictures
Dashboards and Metrics Alerts
Sharing