Kubernetes monitoring 101

In this talk, I describe some common issues in a Kubernetes cluster and the metrics you should monitor to troubleshoot them.

Sergio Moya

July 16, 2018

Transcript

  1. Kubernetes Monitoring 101
    Contain the Complexity of Kubernetes
    Sergio Moya - Senior Software Engineer @ New Relic

  2. Agenda
    ● Why monitoring is a must
    ● What needs to be monitored in Kubernetes
    ● Metric sources
    ● How to monitor
    ● Q&A

  3. Why monitoring is a must
    Ephemerality

  4. What Needs to be Monitored in Kubernetes?
    ● Kubernetes Cluster
    ● Node
    ● Applications
    ● Pod/Deployments
    ● Containers
    ● And more...

  5. Cluster
    Cluster Admin:
    • What is the size of my Kubernetes cluster?
    • How many nodes, namespaces, deployments, pods, and containers do I have running in my cluster?

  6. Cluster
    MONITORING FOR: Cluster overview
    Cluster Admin:
    • What is the size of my Kubernetes cluster?
    • How many nodes, namespaces, deployments, pods, and containers do I have running in my cluster?
    WHAT
    • A snapshot of which objects are included in a cluster
    WHY
    • Kubernetes is managed by various teams (SREs, sysadmins, developers), so it can be difficult to keep track of the current state of a cluster
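
    As a rough illustration of the cluster-overview questions above, here is a minimal sketch that counts cluster objects through the Kubernetes API. It assumes the official `kubernetes` Python client and a working kubeconfig; kubectl or any monitoring agent would give you the same numbers.

      # Minimal sketch: counting cluster objects via the Kubernetes API.
      from kubernetes import client, config

      config.load_kube_config()  # use config.load_incluster_config() when running inside a pod
      core = client.CoreV1Api()
      apps = client.AppsV1Api()

      nodes = core.list_node().items
      namespaces = core.list_namespace().items
      deployments = apps.list_deployment_for_all_namespaces().items
      pods = core.list_pod_for_all_namespaces().items
      containers = sum(len(p.spec.containers) for p in pods)

      print(f"nodes={len(nodes)} namespaces={len(namespaces)} "
            f"deployments={len(deployments)} pods={len(pods)} containers={containers}")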

  7. Node
    Operations:
    • Do we have enough nodes in our cluster?
    • Are the resource requirements of the deployed applications overbooking the existing nodes?

  8. Node
    MONITORING FOR: Node resource consumption
    Operations:
    • Do we have enough nodes in our cluster?
    • Are the resource requirements of the deployed applications overbooking the existing nodes?
    WHAT
    • Resource consumption (used cores, used memory) for each Kubernetes node
    • Total memory vs. used memory
    WHY
    • Ensure that your cluster remains healthy
    • Ensure new deployments will succeed and are not blocked by a lack of resources
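
    One simple way to check whether the deployed applications are overbooking the nodes is to compare the CPU each node can allocate against the CPU requested by the pods scheduled on it. A minimal sketch, assuming the `kubernetes` Python client and ignoring the less common resource-quantity formats:

      from collections import defaultdict
      from kubernetes import client, config

      def cpu_to_millicores(value: str) -> int:
          # "500m" -> 500, "2" -> 2000 (other suffixes are not handled here)
          return int(value[:-1]) if value.endswith("m") else int(float(value) * 1000)

      config.load_kube_config()
      core = client.CoreV1Api()

      # Sum the CPU requests of the pods scheduled on each node.
      requested = defaultdict(int)
      for pod in core.list_pod_for_all_namespaces().items:
          if pod.spec.node_name:
              for c in pod.spec.containers:
                  reqs = (c.resources and c.resources.requests) or {}
                  requested[pod.spec.node_name] += cpu_to_millicores(reqs.get("cpu", "0"))

      # Compare against what each node can actually allocate.
      for node in core.list_node().items:
          allocatable = cpu_to_millicores(node.status.allocatable["cpu"])
          used = requested[node.metadata.name]
          print(f"{node.metadata.name}: {used}/{allocatable} millicores requested")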

  9. Pods
    Operations:
    • Are things working the way I expect them to?
    • Are my apps running and healthy?

  10. Pods/Deployment
    MONITORING FOR: Pods not running
    Operations:
    • Are things working the way I expect them to?
    • Are my apps running and healthy?
    WHAT
    • The number of current pods in a Deployment should be the same as the desired number
    WHY
    • Missing pods may indicate:
      ○ Insufficient resources to schedule a pod
      ○ Unhealthy pods: failing liveness or readiness probes, etc.
      ○ Others
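
    A minimal sketch of the "current vs. desired" check above, assuming the `kubernetes` Python client: list every Deployment and flag the ones whose available replicas fall short of the desired count.

      from kubernetes import client, config

      config.load_kube_config()
      apps = client.AppsV1Api()

      for d in apps.list_deployment_for_all_namespaces().items:
          desired = d.spec.replicas or 0
          available = d.status.available_replicas or 0
          if available < desired:
              print(f"{d.metadata.namespace}/{d.metadata.name}: "
                    f"{available}/{desired} pods available")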

  11. Containers
    DevOps:
    • Are my containers hitting their resource limits and affecting application performance?
    • Are there spikes in resource consumption?
    • Are there any containers in a restart loop?
    • How many container restarts have there been in X amount of time?

  12. Containers
    MONITORING FOR: Container resource usage
    DevOps:
    • Are my containers hitting their resource limits and affecting application performance?
    • Are there spikes in resource consumption?
    WHAT
    • Resource request: the minimum amount of a resource that the scheduler guarantees to the container
    • Resource limit: the maximum amount of the resource that the container is allowed to consume
    WHY
    • If a container hits its CPU limit, the application's performance will be affected
    • If a container hits its memory limit, K8s might terminate it or restart it
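
    For reference, this is roughly how requests and limits are declared on a container. The sketch below builds the spec with the `kubernetes` Python client; the image name and the values are illustrative, not a recommendation.

      from kubernetes import client

      container = client.V1Container(
          name="web",
          image="nginx:1.15",  # illustrative image
          resources=client.V1ResourceRequirements(
              # request: the minimum the scheduler guarantees to this container
              requests={"cpu": "250m", "memory": "128Mi"},
              # limit: the ceiling; CPU beyond it is throttled, memory beyond it
              # gets the container killed and restarted
              limits={"cpu": "500m", "memory": "256Mi"},
          ),
      )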

  13. Containers
    MONITORING FOR: Container restarts
    DevOps:
    • Are there any containers in a restart loop?
    • How many container restarts have there been in X amount of time?
    WHAT
    • A container can be restarted when it crashes or when its memory usage reaches the defined limit
    WHY
    • Under normal conditions, container restarts should not happen
    • A restart indicates an issue with either the container itself or the underlying host
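
    A minimal sketch for spotting containers in a restart loop, assuming the `kubernetes` Python client; the threshold of 3 restarts is arbitrary and would normally be an alert condition over a time window.

      from kubernetes import client, config

      config.load_kube_config()
      core = client.CoreV1Api()

      for pod in core.list_pod_for_all_namespaces().items:
          for status in (pod.status.container_statuses or []):
              if status.restart_count > 3:  # arbitrary threshold
                  print(f"{pod.metadata.namespace}/{pod.metadata.name}/{status.name}: "
                        f"{status.restart_count} restarts")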

  14. Others
    You:
    • What and how many services does my cluster have?
    • What is the current status of my Horizontal Pod Autoscalers?
    • Are my Persistent Volumes well provisioned?
    • Etc.

  15. Metric sources

  16. Metric sources
    ● Kubernetes API
    ● kube-state-metrics
    ● Heapster (deprecated)
    ● Metrics Server
    ● Kubelet and cAdvisor

  17. K8s API
    Pros:
    ● No third party
    ● Up to date
    Cons:
    ● Bottleneck
    ● Missing critical data, e.g. pod resources

  18. kube-state-metrics
    Pros:
    ● Tons of metrics
    ● Well supported
    ● Prometheus format
    Cons:
    ● No data about pods that have not been scheduled yet
    ● Only state, no resources
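
    kube-state-metrics exposes everything as plain Prometheus text, so it is easy to inspect without a full Prometheus setup. A minimal sketch, assuming the service has been port-forwarded locally (for example with `kubectl port-forward svc/kube-state-metrics 8080:8080`):

      import urllib.request

      KSM_URL = "http://localhost:8080/metrics"  # assumed port-forwarded endpoint

      with urllib.request.urlopen(KSM_URL) as resp:
          for line in resp.read().decode().splitlines():
              # e.g. keep only the available-replica gauges per Deployment
              if line.startswith("kube_deployment_status_replicas_available"):
                  print(line)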

  19. Heapster
    Pros:
    ● Tons of metrics
    ● Different backends (sinks)
    ● Exposes Prometheus format
    ● Plug & play
    Cons:
    ● No Prometheus backend (sink)
    ● Resource consumption
    ● Some sinks are not maintained
    ● Deprecated (k8s >= v1.13.0)

  20. Metrics Server
    Pros:
    ● Implements the K8s Metrics API standard
    ● Official
    Cons:
    ● Only a few metrics (CPU and memory)
    ● Early stage (incubator)
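
    Because Metrics Server implements the standard Metrics API (metrics.k8s.io), its data can be read back through the API server itself, which is what `kubectl top` does. A minimal sketch with the `kubernetes` Python client, assuming Metrics Server is installed in the cluster:

      from kubernetes import client, config

      config.load_kube_config()
      custom = client.CustomObjectsApi()

      node_metrics = custom.list_cluster_custom_object(
          group="metrics.k8s.io", version="v1beta1", plural="nodes")
      for item in node_metrics["items"]:
          usage = item["usage"]
          print(f'{item["metadata"]["name"]}: cpu={usage["cpu"]} memory={usage["memory"]}')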

  21. Kubelet + cAdvisor
    Pros:
    ● No third party
    ● All data regarding node, pod, and container resources
    ● Distributed by nature
    Cons:
    ● Only data about nodes, pods, and containers
    ● Some data inconsistency between the API and the Kubelet
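
    The Kubelet (which embeds cAdvisor) serves per-node, per-pod, and per-container usage on its Summary API. A minimal sketch that reads it through the API server's node proxy, assuming the `kubernetes` Python client and permission to proxy to nodes; depending on the client version the response body may need slightly different handling.

      import json
      from kubernetes import client, config

      config.load_kube_config()
      core = client.CoreV1Api()

      for node in core.list_node().items:
          # GET /api/v1/nodes/<name>/proxy/stats/summary
          raw = core.connect_get_node_proxy_with_path(
              name=node.metadata.name, path="stats/summary")
          summary = json.loads(raw) if isinstance(raw, str) else raw
          stats = summary["node"]
          print(f'{node.metadata.name}: '
                f'cpu={stats["cpu"]["usageNanoCores"]}n '
                f'memory={stats["memory"]["workingSetBytes"]}B')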

  22. Metric sources compared
    K8s API
      Pros: no third party; up to date
      Cons: bottleneck; missing critical data, e.g. pod resources
    kube-state-metrics
      Pros: tons of metrics; well supported; Prometheus format
      Cons: no data about pods that have not been scheduled yet; only state, no resources
    Heapster
      Pros: tons of metrics; different backends (sinks); exposes Prometheus format; plug & play
      Cons: no Prometheus backend (sink); resource consumption; some sinks are not maintained; deprecated (k8s >= v1.13.0)
    Metrics Server
      Pros: implements the K8s Metrics API standard; official
      Cons: only a few metrics (CPU and memory); early stage (incubator)
    Kubelet + cAdvisor
      Pros: no third party; all data regarding node, pod, and container resources; distributed by nature
      Cons: only data about nodes, pods, and containers; some data inconsistency between the API and the Kubelet

  23. How to monitor

  24. Heapster + InfluxDB + Grafana
    (architecture diagram; source: blog.couchbase.com)

  25. Custom solutions
    ● A Deployment of pods fetching metrics from any of the sources
    ● A DaemonSet fetching metrics from the Kubelet + cAdvisor on each node
    ● A combination of both
    ● Others?

  26. APM solutions

  27. How does the New Relic Kubernetes integration work under the hood?
    That is a topic for another talk.

  28. Q&A

  29. Thank you