
Kubernetes monitoring 101

In this talk, I describe some common issues in a Kubernetes cluster and the metrics you should monitor to troubleshoot them.

Sergio Moya

July 16, 2018
Transcript

  1. ©2008–18 New Relic, Inc. All rights reserved. Kubernetes Monitoring 101: Contain the Complexity of Kubernetes. Sergio Moya, Senior Software Engineer @ New Relic
  2. Agenda: • Why monitoring is a must • What needs to be monitored in Kubernetes • Metric sources • How to monitor • Q&A
  3. Why monitoring is a must: Ephemerality
  4. What needs to be monitored in Kubernetes? Cluster, Nodes, Applications, Pods/Deployments, Containers, and more...
  5. Cluster (Cluster Admin): • What is the size of my Kubernetes cluster? • How many nodes, namespaces, deployments, pods, and containers do I have running in my cluster?
  6. Cluster. MONITORING FOR: Cluster overview (Cluster Admin). • What is the size of my Kubernetes cluster? • How many nodes, namespaces, deployments, pods, and containers do I have running in my cluster? WHAT: • A snapshot of which objects are included in a cluster. WHY: • Kubernetes is managed by various teams (SREs, SysAdmins, developers), so it can be difficult to keep track of the current state of a cluster.
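Such a snapshot can be sketched as a simple summary over the cluster's object inventory. This is a minimal illustration, not New Relic's implementation: the `objects` list here is made-up sample data standing in for what you would actually fetch from the Kubernetes API.

```python
from collections import Counter

# Made-up sample data; in practice you would list these objects
# through the Kubernetes API or a client library.
objects = [
    {"kind": "Node", "namespace": None},
    {"kind": "Node", "namespace": None},
    {"kind": "Deployment", "namespace": "default"},
    {"kind": "Pod", "namespace": "default"},
    {"kind": "Pod", "namespace": "kube-system"},
]

def cluster_overview(objects):
    """Snapshot of how many objects of each kind the cluster contains,
    and which namespaces are in use."""
    kinds = Counter(obj["kind"] for obj in objects)
    namespaces = {obj["namespace"] for obj in objects if obj["namespace"]}
    return {"kinds": dict(kinds), "namespaces": sorted(namespaces)}

print(cluster_overview(objects))
```

A dashboard built on any of the metric sources discussed later essentially maintains this kind of count continuously.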
  7. Node (Operations): • Do we have enough nodes in our cluster? • Are the resource requirements of the deployed applications overbooked against the existing nodes?
  8. Node. MONITORING FOR: Node resource consumption (Operations). • Do we have enough nodes in our cluster? • Are the resource requirements of the deployed applications overbooked against the existing nodes? WHAT: • Resource consumption (used cores, used memory) for each Kubernetes node • Total memory vs. used. WHY: • Ensure that your cluster remains healthy • Ensure new deployments will succeed and are not blocked by a lack of resources.
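The overbooking check above boils down to arithmetic on Kubernetes resource quantities. A minimal sketch, assuming the quantity strings come from node `allocatable` and pod `requests` fields (the helper names are my own; real quantity parsing handles more suffixes than this):

```python
def parse_cpu(q):
    """Parse a Kubernetes CPU quantity: '500m' -> 0.5 cores, '2' -> 2.0."""
    return float(q[:-1]) / 1000 if q.endswith("m") else float(q)

def parse_memory(q):
    """Parse a memory quantity into bytes (Ki/Mi/Gi suffixes only)."""
    units = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
    for suffix, factor in units.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)

def node_has_room(allocatable_cpu, requested_cpus):
    """True if the summed CPU requests still fit on the node."""
    return sum(parse_cpu(r) for r in requested_cpus) <= parse_cpu(allocatable_cpu)

# A node with 2 allocatable cores and three pods requesting 500m each
# still has headroom for further scheduling.
print(node_has_room("2", ["500m", "500m", "500m"]))
```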
  9. Pods (Operations): • Are things working the way I expect them to? • Are my apps running and healthy?
  10. Pods/Deployments. MONITORING FOR: Pods not running (Operations). • Are things working the way I expect them to? • Are my apps running and healthy? WHAT: • The number of current pods in a Deployment should match the desired count. WHY: • Missing pods may indicate: insufficient resources to schedule a pod; unhealthy pods (failing liveness or readiness probes); or other causes.
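The desired-vs-current comparison is the whole alert condition. A sketch of that check, with field names loosely following a Deployment's `status` block (the data here is made up):

```python
def missing_pods(deployment_status):
    """Return how many pods a Deployment is short of its desired count.

    `deployment_status` loosely mimics the status block the Kubernetes
    API returns for a Deployment; the values here are illustrative.
    """
    desired = deployment_status.get("replicas", 0)
    available = deployment_status.get("availableReplicas", 0)
    return max(desired - available, 0)

status = {"replicas": 3, "availableReplicas": 1}
if missing_pods(status):
    print(f"alert: {missing_pods(status)} pod(s) not running")
```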
  11. Containers (DevOps): • Are my containers hitting their resource limits and affecting application performance? • Are there spikes in resource consumption? • Are there any containers in a restart loop? • How many container restarts have there been in X amount of time?
  12. Containers. MONITORING FOR: Container resource usage (DevOps). • Are my containers hitting their resource limits and affecting application performance? • Are there spikes in resource consumption? WHAT: • Resource request: the minimum amount of a resource, guaranteed by the scheduler • Resource limit: the maximum amount of a resource that the container is allowed to consume. WHY: • If a container hits its CPU limit, the application's performance will be affected • If a container hits its memory limit, Kubernetes may terminate or restart it.
  13. Containers. MONITORING FOR: Container restarts (DevOps). • Are there any containers in a restart loop? • How many container restarts have there been in X amount of time? WHAT: • A container can be restarted when it crashes or when its memory usage reaches the defined limit. WHY: • Under normal conditions, container restarts should not happen • A restart indicates an issue either with the container itself or with the underlying host.
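"Restarts in X amount of time" can be detected by sampling the cumulative restart count a pod reports and looking at its growth over a window. A sketch under that assumption (function name and thresholds are my own):

```python
def in_restart_loop(restart_samples, window=5, threshold=3):
    """Flag a container whose cumulative restart count grew by
    `threshold` or more over the last `window` samples (e.g. one
    sample per minute). Counts are oldest-first, as you might
    collect them from a pod's containerStatuses over time."""
    recent = restart_samples[-window:]
    return len(recent) >= 2 and recent[-1] - recent[0] >= threshold

# Restart count climbing fast across five samples: looks like a loop.
print(in_restart_loop([0, 1, 3, 5, 6]))
```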
  14. Others (You): • What services does my cluster have, and how many? • What is the current status of my Horizontal Pod Autoscalers? • Are my Persistent Volumes well provisioned? • Etc.
  15. Metric sources
  16. Metric sources: • Kubernetes API • kube-state-metrics • Heapster (deprecated) • Metrics Server • Kubelet and cAdvisor
  17. K8s API. Pros: • No third party • Up to date. Cons: • Bottleneck • Missing critical data (e.g. pod resources).
  18. kube-state-metrics. Pros: • Tons of metrics • Well supported • Prometheus format. Cons: • No data about not-yet-scheduled pods • Only state, no resources.
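The "Prometheus format" pro means kube-state-metrics exposes plain text that is trivial to scrape and parse. As a rough illustration, here is a shortened, hypothetical scrape (the metric and label names are real kube-state-metrics ones; the values are made up) and a tiny parser:

```python
import re

# Shortened, hypothetical sample of a kube-state-metrics /metrics scrape.
SCRAPE = """\
# HELP kube_deployment_status_replicas_available The number of available replicas per deployment.
# TYPE kube_deployment_status_replicas_available gauge
kube_deployment_status_replicas_available{namespace="default",deployment="web"} 3
kube_deployment_status_replicas_available{namespace="default",deployment="api"} 1
"""

def parse_gauge(text, metric):
    """Extract {label-string: value} pairs for one metric from
    Prometheus text exposition format."""
    pattern = re.compile(rf'^{metric}\{{(.*?)\}} (\S+)$', re.MULTILINE)
    return {labels: float(value) for labels, value in pattern.findall(text)}

print(parse_gauge(SCRAPE, "kube_deployment_status_replicas_available"))
```

Note that, as the cons say, these are state gauges only; no CPU or memory figures come from this source.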
  19. Heapster. Pros: • Tons of metrics • Different backends (sinks) • Exposes Prometheus format • Plug & play. Cons: • No Prometheus backend (sink) • Resource consumption • Some sinks are not maintained • Deprecated (k8s >= v1.13.0).
  20. Metrics Server. Pros: • Implements the K8s Metrics API standard • Official. Cons: • Only a few metrics (CPU and memory) • Early stage (incubator).
  21. Kubelet + cAdvisor. Pros: • No third party • All data regarding node, pod, and container resources • Distributed by nature. Cons: • Only data about nodes, pods, and containers • Some data inconsistency between the API and the Kubelet.
  22. Summary:
     • K8s API: Pros: no third party; up to date. Cons: bottleneck; missing critical data (e.g. pod resources).
     • kube-state-metrics: Pros: tons of metrics; well supported; Prometheus format. Cons: no data about not-yet-scheduled pods; only state, no resources.
     • Heapster: Pros: tons of metrics; different backends (sinks); exposes Prometheus format; plug & play. Cons: no Prometheus backend (sink); resource consumption; some sinks not maintained; deprecated (k8s >= v1.13.0).
     • Metrics Server: Pros: implements the K8s Metrics API standard; official. Cons: only a few metrics (CPU and memory); early stage (incubator).
     • Kubelet + cAdvisor: Pros: no third party; all node, pod, and container resource data; distributed by nature. Cons: only data about nodes, pods, and containers; some data inconsistency between the API and the Kubelet.
  23. How to monitor
  24. Heapster + InfluxDB + Grafana (source: blog.couchbase.com)
  25. Custom solutions: • A Deployment of pods fetching metrics from any of the sources • A DaemonSet fetching metrics from the Kubelet + cAdvisor on each node • A combination of both • Others?
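The DaemonSet option works because a DaemonSet schedules one agent pod per node, so each agent can scrape its local Kubelet/cAdvisor. A rough, hypothetical manifest sketch (the agent name and image are placeholders, not a real product):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: metrics-agent                     # hypothetical agent name
spec:
  selector:
    matchLabels:
      app: metrics-agent
  template:
    metadata:
      labels:
        app: metrics-agent
    spec:
      containers:
      - name: agent
        image: example/metrics-agent:1.0  # hypothetical image
        env:
        - name: NODE_NAME                 # lets the agent address its local Kubelet
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
```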
  26. APM solutions
  27. How does the New Relic Kubernetes integration work under the hood? That topic deserves another talk.
  28. Q&A
  29. Thank you