Slide 1

Kubernetes Monitoring 101: Contain the Complexity of Kubernetes
Sergio Moya, Senior Software Engineer @ New Relic
©2008–18 New Relic, Inc. All rights reserved

Slide 2

Agenda
● Why monitoring is a must
● What needs to be monitored in Kubernetes
● Metric sources
● How to monitor
● Q&A

Slide 3

Why monitoring is a must: Ephemerality

Slide 4

What needs to be monitored in Kubernetes?
● Cluster
● Node
● Applications
● Pods/Deployments
● Containers
● And more...

Slide 5

Cluster (audience: Cluster Admin)
• What is the size of my Kubernetes cluster?
• How many nodes, namespaces, deployments, pods, and containers do I have running in my cluster?

Slide 6

Cluster

MONITORING FOR: Cluster overview (audience: Cluster Admin)

• What is the size of my Kubernetes cluster?
• How many nodes, namespaces, deployments, pods, and containers do I have running in my cluster?

WHAT
• A snapshot of what objects are included in a cluster

WHY
• Kubernetes is managed by various teams (SREs, SysAdmins, developers), so it can be difficult to keep track of the current state of a cluster
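A cluster-overview snapshot boils down to counting objects by kind. A minimal sketch, assuming object listings shaped like Kubernetes API responses (the `sample` data here is invented for illustration, not real API output):

```python
from collections import Counter

def cluster_inventory(objects):
    """Count Kubernetes objects by kind to get a cluster-size snapshot."""
    return Counter(obj["kind"] for obj in objects)

# Hypothetical objects, shaped like items returned by the Kubernetes API.
sample = [
    {"kind": "Node", "metadata": {"name": "node-1"}},
    {"kind": "Node", "metadata": {"name": "node-2"}},
    {"kind": "Namespace", "metadata": {"name": "default"}},
    {"kind": "Deployment", "metadata": {"name": "web"}},
    {"kind": "Pod", "metadata": {"name": "web-abc"}},
    {"kind": "Pod", "metadata": {"name": "web-def"}},
]

inventory = cluster_inventory(sample)
print(inventory["Node"], inventory["Pod"])  # 2 2
```

In a real setup the `objects` list would come from periodic API list calls; the counting itself stays this simple.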

Slide 7

Node (audience: Operations)
• Do we have enough nodes in our cluster?
• Do the resource requirements of the deployed applications overbook the existing nodes?

Slide 8

Node

MONITORING FOR: Node resource consumption (audience: Operations)

• Do we have enough nodes in our cluster?
• Do the resource requirements of the deployed applications overbook the existing nodes?

WHAT
• Resource consumption (used cores, used memory) for each Kubernetes node
• Total memory vs. used memory

WHY
• Ensure that your cluster remains healthy
• Ensure new deployments will succeed and not be blocked by a lack of resources
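The "total memory vs. used" check above can be sketched as a simple threshold alert. The node data and the 90% threshold are illustrative assumptions; real figures would come from one of the metric sources covered later:

```python
def memory_pressure(nodes, threshold=0.9):
    """Return names of nodes whose used/total memory ratio exceeds threshold."""
    return [
        n["name"] for n in nodes
        if n["memory_used_bytes"] / n["memory_total_bytes"] > threshold
    ]

# Hypothetical per-node figures for illustration.
nodes = [
    {"name": "node-1", "memory_used_bytes": 7.5e9, "memory_total_bytes": 8e9},
    {"name": "node-2", "memory_used_bytes": 2.0e9, "memory_total_bytes": 8e9},
]
print(memory_pressure(nodes))  # ['node-1']
```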

Slide 9

Pods (audience: Operations)
• Are things working the way I expect them to?
• Are my apps running and healthy?

Slide 10

Pods/Deployments

MONITORING FOR: Pods not running (audience: Operations)

• Are things working the way I expect them to?
• Are my apps running and healthy?

WHAT
• The number of current pods in a Deployment should match the desired number.

WHY
• Missing pods may indicate:
  ○ Insufficient resources to schedule a pod
  ○ Unhealthy pods: failing liveness or readiness probes, etc.
  ○ Others
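The current-vs-desired check is just a comparison per Deployment. A minimal sketch, assuming replica counts shaped like the ones kube-state-metrics or the API would report (the deployment names are invented):

```python
def missing_pods(deployments):
    """Map each under-replicated Deployment to how many pods it is missing."""
    return {
        d["name"]: d["desired"] - d["current"]
        for d in deployments
        if d["current"] < d["desired"]
    }

# Hypothetical Deployment states for illustration.
deployments = [
    {"name": "web", "desired": 3, "current": 3},
    {"name": "api", "desired": 5, "current": 3},
]
print(missing_pods(deployments))  # {'api': 2}
```

An alert on a non-empty result catches both scheduling failures and pods killed by failing probes.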

Slide 11

Containers (audience: DevOps)
• Are my containers hitting their resource limits and affecting application performance?
• Are there spikes in resource consumption?
• Are there any containers in a restart loop?
• How many container restarts have there been in X amount of time?

Slide 12

Containers

MONITORING FOR: Container resource usage (audience: DevOps)

• Are my containers hitting their resource limits and affecting application performance?
• Are there spikes in resource consumption?

WHAT
• Resource request: the minimum amount of a resource guaranteed by the scheduler
• Resource limit: the maximum amount of a resource the container is allowed to consume

WHY
• If a container hits its CPU limit, the application's performance will suffer
• If a container hits its memory limit, Kubernetes may terminate or restart it
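Detecting containers near their limits means comparing usage against the configured limit per resource. A sketch under assumed data shapes (field names like `cpu_usage` and the 80% threshold are illustrative choices, not a real API):

```python
def near_limit(containers, threshold=0.8):
    """List (container, resource, ratio) for usage close to the limit."""
    alerts = []
    for c in containers:
        for resource in ("cpu", "memory"):
            usage = c[f"{resource}_usage"]
            limit = c[f"{resource}_limit"]
            if limit and usage / limit >= threshold:
                alerts.append((c["name"], resource, round(usage / limit, 2)))
    return alerts

# Hypothetical container metrics: CPU in cores, memory in bytes.
containers = [
    {"name": "web", "cpu_usage": 0.9, "cpu_limit": 1.0,
     "memory_usage": 200e6, "memory_limit": 512e6},
    {"name": "api", "cpu_usage": 0.1, "cpu_limit": 0.5,
     "memory_usage": 50e6, "memory_limit": 256e6},
]
print(near_limit(containers))  # [('web', 'cpu', 0.9)]
```

Alerting below 100% matters because a container at its CPU limit is throttled, and one at its memory limit gets killed.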

Slide 13

Containers

MONITORING FOR: Container restarts (audience: DevOps)

• Are there any containers in a restart loop?
• How many container restarts have there been in X amount of time?

WHAT
• A container can be restarted when it crashes or when its memory usage reaches its defined limit

WHY
• Under normal conditions, container restarts should not happen
• A restart indicates an issue either with the container itself or with the underlying host
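Since Kubernetes only exposes a cumulative restart counter per container, "restarts in X amount of time" means diffing two snapshots of that counter. A sketch with invented snapshot data and an assumed threshold of 3 restarts per window:

```python
def restart_loops(before, after, max_restarts=3):
    """Containers whose restart count grew by >= max_restarts in the window.

    `before` and `after` map container name -> cumulative restart count,
    taken at the start and end of the observation window.
    """
    return [
        name for name, count in after.items()
        if count - before.get(name, 0) >= max_restarts
    ]

# Hypothetical snapshots taken, say, 10 minutes apart.
before = {"web": 1, "api": 0}
after = {"web": 2, "api": 5}
print(restart_loops(before, after))  # ['api']
```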

Slide 14

Others (audience: you)
• What services does my cluster have, and how many?
• What is the current status of my Horizontal Pod Autoscalers?
• Are my Persistent Volumes well provisioned?
• Etc.

Slide 15

Metric sources

Slide 16

Metric sources
● Kubernetes API
● kube-state-metrics
● Heapster (deprecated)
● Metrics Server
● Kubelet and cAdvisor

Slide 17

K8s API

Pros
● No third party
● Up to date

Cons
● Bottleneck
● Missing critical data (e.g. pod resources)

Slide 18

kube-state-metrics

Pros
● Tons of metrics
● Well supported
● Prometheus format

Cons
● No data about pods that have not been scheduled yet
● Only state, no resource usage
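The "Prometheus format" pro refers to the text exposition format that kube-state-metrics serves on its `/metrics` endpoint: one `name{labels} value` line per sample. A minimal parser sketch for simple lines of that shape (it ignores edge cases like commas or escapes inside quoted label values; the sample payload is invented but mimics real kube-state-metrics metric names):

```python
import re

LINE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')

def parse_metrics(text):
    """Parse simple Prometheus text-format lines into (name, labels, value)."""
    metrics = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        m = LINE.match(line)
        if m:
            name, raw_labels, value = m.groups()
            labels = {
                k: v.strip('"')
                for k, v in (part.split("=", 1)
                             for part in raw_labels.split(",") if part)
            }
            metrics.append((name, labels, float(value)))
    return metrics

sample = """\
# HELP kube_deployment_status_replicas_available Available replicas.
kube_deployment_status_replicas_available{namespace="default",deployment="web"} 3
kube_deployment_spec_replicas{namespace="default",deployment="web"} 3
"""
parsed = parse_metrics(sample)
print(parsed[0])
```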

Slide 19

Heapster

Pros
● Tons of metrics
● Different backends (sinks)
● Exposes Prometheus format
● Plug & play

Cons
● No Prometheus backend (sink)
● Resource consumption
● Some sinks are not maintained
● Deprecated (k8s >= v1.13.0)

Slide 20

Metrics Server

Pros
● Implements the K8s Metrics API standard
● Official

Cons
● Only a few metrics (CPU & memory)
● Early stage (incubator)

Slide 21

Kubelet + cAdvisor

Pros
● No third party
● All data about node, pod, and container resources
● Distributed by nature

Cons
● Only data about nodes, pods, and containers
● Some data inconsistencies between the API and the Kubelet

Slide 22

Summary of metric sources

K8s API
  Pros: no third party; up to date
  Cons: bottleneck; missing critical data (e.g. pod resources)

kube-state-metrics
  Pros: tons of metrics; well supported; Prometheus format
  Cons: no data about pods not yet scheduled; only state, no resource usage

Heapster
  Pros: tons of metrics; different backends (sinks); exposes Prometheus format; plug & play
  Cons: no Prometheus backend (sink); resource consumption; some sinks are not maintained; deprecated (k8s >= v1.13.0)

Metrics Server
  Pros: implements the K8s Metrics API standard; official
  Cons: only a few metrics (CPU & memory); early stage (incubator)

Kubelet + cAdvisor
  Pros: no third party; all node, pod, and container resource data; distributed by nature
  Cons: only data about nodes, pods, and containers; some data inconsistencies between the API and the Kubelet

Slide 23

How to monitor

Slide 24

Heapster + InfluxDB + Grafana
(Source: blog.couchbase.com)

Slide 25

Custom solutions
● A Deployment of pods fetching metrics from any of the sources
● A DaemonSet fetching metrics from the Kubelet + cAdvisor on each node
● A combination of both
● Others?

Slide 26

APM solutions

Slide 27

How does the New Relic Kubernetes integration work under the hood? That topic deserves another talk.

Slide 28

Q&A

Slide 29

Thank you