Slide 1

Slide 1 text

Monitoring Kubernetes Clusters with Prometheus

Fabian Reinartz
Software Engineer at CoreOS
@fabxc

Slide 2

Slide 2 text

Monitoring Challenges

● A lot of targets to monitor
● Targets constantly change
● Need high-level overview (by namespace, service, …)
● Need drill-down for investigation (down to pod and below)

Slide 3

Slide 3 text

Prometheus Recap

● Pull-based monitoring system
● Multi-dimensional data model
● Handles millions of time series per instance

    http_requests_total{path="/home", status="200", method="GET"} 9523
    http_requests_total{path="/home", status="500", method="GET"} 233
    http_requests_total{path="/settings", status="200", method="GET"} 512
    http_requests_total{path="/settings", status="200", method="POST"} 68
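The multi-dimensional data model means each sample is identified by a metric name plus a set of label pairs, so series can be filtered and aggregated along any label. A minimal Python sketch of that idea, using the four sample series from this slide. The helper names `select` and `sum_by` are hypothetical, and this only illustrates the model, not how Prometheus stores or queries data internally:

```python
from collections import defaultdict

# The four http_requests_total samples from the slide, as (labels, value) pairs.
SAMPLES = [
    ({"path": "/home", "status": "200", "method": "GET"}, 9523),
    ({"path": "/home", "status": "500", "method": "GET"}, 233),
    ({"path": "/settings", "status": "200", "method": "GET"}, 512),
    ({"path": "/settings", "status": "200", "method": "POST"}, 68),
]


def select(samples, **matchers):
    """Keep samples whose labels equal every matcher, like {path="/home"}."""
    return [
        (labels, value)
        for labels, value in samples
        if all(labels.get(key) == wanted for key, wanted in matchers.items())
    ]


def sum_by(samples, *keys):
    """Sum values grouped by the given label names, like sum by(path) (...)."""
    totals = defaultdict(int)
    for labels, value in samples:
        totals[tuple(labels.get(key) for key in keys)] += value
    return dict(totals)


home = select(SAMPLES, path="/home")      # drill down to one path
per_path = sum_by(SAMPLES, "path")        # high-level view per path
per_status = sum_by(SAMPLES, "status")    # same data, sliced by status
```

The same four series answer both the overview question (totals per path) and the drill-down question (only `/home`), without defining separate metrics for each view.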

Slide 4

Slide 4 text

Powerful Querying

99th percentile latency on API server operations per resource?

    histogram_quantile(0.99,
      sum by(path, le) (rate(request_latency_seconds_bucket[5m]))
    )

    {path="/status"} 0.012
    {path="/"} 0.43
    {path="/api/v1/topics/:topic"} 1.31
    {path="/api/v1/topics"} 0.192
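To make the query above less of a black box: `histogram_quantile` works on cumulative buckets (one count per `le` upper bound) and interpolates linearly inside the bucket that contains the requested rank. The following Python function is a simplified sketch of that estimation for a single group of buckets. The bucket values are made up for illustration, and the rate and `sum by(path, le)` steps are assumed to have happened already:

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative buckets [(upper_bound, count), ...].

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls into the open-ended bucket: report the highest finite bound.
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            if count == prev_count:
                return bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical latency buckets in seconds: le=0.1, le=0.5, le=1.0, le=+Inf.
LATENCY_BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
p75 = histogram_quantile(0.75, LATENCY_BUCKETS)
```

Because the result is interpolated, its accuracy depends on how well the bucket boundaries match the real latency distribution.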

Slide 5

Slide 5 text

Powerful Querying

Will any disk fill up within 4 hours?

    ALERT DiskWillFillIn4Hours
      IF predict_linear(node_filesystem_free[1h], 4*3600) < 0

[Plot: time axis marked -1h, now, and +4h; value axis marked 0]
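The alert rule above relies on fitting a straight line through the last hour of samples and extending it four hours ahead. A minimal Python sketch of that idea using ordinary least squares follows. It simplifies by treating the last sample's timestamp as "now", and the disk numbers are invented for illustration:

```python
def predict_linear(samples, seconds_ahead):
    """Fit a least-squares line through (timestamp, value) samples and
    evaluate it seconds_ahead after the last sample's timestamp."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    intercept = mean_v - slope * mean_t
    target = samples[-1][0] + seconds_ahead
    return slope * target + intercept


# One hour of samples every 60s: 10 GB free, shrinking by 1 MB per second.
filling = [(t, 10_000_000_000 - 1_000_000 * t) for t in range(0, 3601, 60)]
# Same disk, shrinking by only 1 kB per second.
steady = [(t, 10_000_000_000 - 1_000 * t) for t in range(0, 3601, 60)]

filling_alert = predict_linear(filling, 4 * 3600) < 0
steady_alert = predict_linear(steady, 4 * 3600) < 0
```

The first disk still has 6.4 GB free at the end of the hour, yet the extrapolated value is negative, so the alert would fire long before a static threshold on free space would.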

Slide 6

Slide 6 text

Kubernetes Integration

● Always sync monitoring targets with Kubernetes API
● Use meta information to enrich metrics

[Diagram labels: API Server, "sync monitoring targets", "scrape metrics"]
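The two bullets can be illustrated with a small Python sketch: derive the scrape target set from the current pod listing, attach meta information (namespace, pod name, pod labels) as labels, and diff against the previous target set. The pod dictionaries below are a hypothetical, heavily simplified stand-in for what the Kubernetes API returns, and the function names are made up for this example:

```python
def build_targets(pods, port_name="metrics"):
    """Map 'ip:port' addresses to the labels attached to their scraped series."""
    targets = {}
    for pod in pods:
        port = pod["ports"].get(port_name)
        if pod["phase"] != "Running" or port is None:
            continue  # nothing to scrape yet
        address = f'{pod["ip"]}:{port}'
        # Meta information becomes labels on every series from this target.
        targets[address] = {
            "namespace": pod["namespace"],
            "pod": pod["name"],
            **pod["labels"],
        }
    return targets


def diff_targets(current, desired):
    """Return (added, removed) addresses needed to reach the desired target set."""
    added = sorted(set(desired) - set(current))
    removed = sorted(set(current) - set(desired))
    return added, removed


def make_pod(name, ip, phase="Running"):
    """Hypothetical, simplified pod description."""
    return {"namespace": "shop", "name": name, "ip": ip, "phase": phase,
            "labels": {"app": "web"}, "ports": {"metrics": 8080}}


pods_before = [make_pod("web-1", "10.0.0.1"), make_pod("web-2", "10.0.0.2")]
pods_after = [make_pod("web-2", "10.0.0.2"), make_pod("web-3", "10.0.0.3"),
              make_pod("web-4", None, phase="Pending")]

current = build_targets(pods_before)
desired = build_targets(pods_after)
added, removed = diff_targets(current, desired)
```

Because the labels travel with the target, queries can later aggregate by namespace or drill down to a single pod without any per-application configuration.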

Slide 7

Slide 7 text

Metrics of Kubernetes

[Diagram of metric sources: Node, node exporter, cAdvisor, API server, Kubelet, etcd0, etcd1, etcd2, ..., pods, kube-state-metrics]

Slide 13

Slide 13 text

Monitoring Challenges

● A lot of targets to monitor ✓
● Targets constantly change ✓
● Need high-level overview (by namespace, service, …) ✓
● Need drill-down for investigation (down to pod and below) ✓
● AND: Make monitoring trivial to deploy & operate

Slide 14

Slide 14 text

Managed Deployments

    apiVersion: prometheus.coreos.com/v1alpha1
    kind: Prometheus
    metadata:
      name: prometheus-k8s
    spec:
      replicas: 2
      version: v1.3.0

Prometheus TPR defines desired Prometheus setup
Operator deploys and manages Prometheus instances

[Diagram labels: Operator, "watch", "deploy & manage", Prometheus Server]
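The operator pattern on this slide boils down to comparing a declared desired state with what is actually running and acting on the difference. The Python sketch below shows that comparison for the `replicas` and `version` fields from the resource above. It is an illustration only, not the Prometheus Operator's actual code; the `plan` function, the action tuples, and the instance naming scheme are all hypothetical:

```python
def plan(desired, observed):
    """Return the actions that move observed instances toward the desired spec.

    desired:  {"name": str, "replicas": int, "version": str}
    observed: list of {"name": str, "version": str}, one per running instance
    """
    name, replicas, version = desired["name"], desired["replicas"], desired["version"]
    actions = []
    # Instances that stay but run the wrong version get upgraded.
    for instance in observed[:replicas]:
        if instance["version"] != version:
            actions.append(("upgrade", instance["name"], version))
    # Too few instances: create the missing ones.
    for index in range(len(observed), replicas):
        actions.append(("create", f"{name}-{index}", version))
    # Too many instances: delete the surplus.
    for instance in observed[replicas:]:
        actions.append(("delete", instance["name"]))
    return actions


DESIRED = {"name": "prometheus-k8s", "replicas": 2, "version": "v1.3.0"}

fresh_cluster = plan(DESIRED, [])
outdated = plan(DESIRED, [{"name": "prometheus-k8s-0", "version": "v1.2.0"}])
converged = plan(DESIRED, [
    {"name": "prometheus-k8s-0", "version": "v1.3.0"},
    {"name": "prometheus-k8s-1", "version": "v1.3.0"},
])
```

Running the same planning step every time the watched resource or the running instances change is what keeps the deployment matching its declaration; once they agree, the plan is empty.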

Slide 15

Slide 15 text

github.com/coreos/kube-prometheus

    $ cluster-monitoring/deploy

Slide 16

Slide 16 text

Metrics of Kubernetes

[Diagram of metric sources: Node, node exporter, cAdvisor, API server, Kubelet, etcd0, etcd1, etcd2, kube-state-metrics, pods]

Slide 17

Slide 17 text

Monitoring as a Cluster Feature

Slide 18

Slide 18 text

Operator (continued)

    apiVersion: monitoring.coreos.com/v1alpha1
    kind: ServiceMonitor
    metadata:
      name: frontend
      labels:
        tier: frontend
    spec:
      selector:
        matchLabels:
          tier: frontend
      endpoints:
      - port: web
        path: /metrics
        interval: 30s

Declarative definition of how to monitor a group of services
Loosely coupled via labels
Part of your cluster’s API

Slide 19

Slide 19 text

Operator (continued)

    apiVersion: monitoring.coreos.com/v1alpha1
    kind: ServiceMonitor
    metadata:
      name: frontend
      labels:
        tier: frontend
    spec:
      selector:
        matchLabels:
          tier: frontend
      endpoints:
      - port: web
        path: /metrics
        interval: 30s

Select applicable services by their labels

Slide 20

Slide 20 text

Operator (continued)

    apiVersion: monitoring.coreos.com/v1alpha1
    kind: ServiceMonitor
    metadata:
      name: frontend
      labels:
        tier: frontend
    spec:
      selector:
        matchLabels:
          tier: frontend
      endpoints:
      - port: web
        path: /metrics
        interval: 30s

Declare where these services expose metrics

Slide 21

Slide 21 text

Operator (continued)

    apiVersion: monitoring.coreos.com/v1alpha1
    kind: ServiceMonitor
    metadata:
      name: frontend
      labels:
        tier: frontend
    spec:
      selector:
        matchLabels:
          tier: frontend
      endpoints:
      - port: web
        path: /metrics
        interval: 30s

Prometheus deployments include ServiceMonitors by their labels

Slide 22

Slide 22 text

Operator (continued)

    apiVersion: monitoring.coreos.com/v1alpha1
    kind: Prometheus
    metadata:
      name: prometheus-frontend
    spec:
      version: v1.3.0
      serviceMonitors:
      - selector:
          matchLabels:
            tier: frontend

Prometheus deployments include ServiceMonitors by their labels

Slide 23

Slide 23 text

[Diagram: Service 1 to Service 5, ServiceMonitor 1, ServiceMonitor 2, and the Prometheus Operator with arrows labeled "watch" and "deploy & manage" leading to the Prometheus Server]
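The loose coupling in this picture is two rounds of label matching: a Prometheus resource selects ServiceMonitors by their labels, and each ServiceMonitor selects Services by theirs. A short Python sketch of that resolution, mirroring the `matchLabels` fields from the earlier slides. The function names and the example services are hypothetical, and only equality-based `matchLabels` is modeled:

```python
def matches(match_labels, labels):
    """True if every key/value pair in match_labels is present in labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def scraped_services(prometheus, service_monitors, services):
    """Resolve Prometheus -> ServiceMonitors -> Services via label selectors."""
    selected = set()
    for entry in prometheus["serviceMonitors"]:
        for monitor in service_monitors:
            if not matches(entry["selector"]["matchLabels"], monitor["labels"]):
                continue
            for service in services:
                if matches(monitor["selector"]["matchLabels"], service["labels"]):
                    selected.add(service["name"])
    return sorted(selected)


PROMETHEUS = {
    "name": "prometheus-frontend",
    "serviceMonitors": [{"selector": {"matchLabels": {"tier": "frontend"}}}],
}
SERVICE_MONITORS = [
    {"name": "frontend", "labels": {"tier": "frontend"},
     "selector": {"matchLabels": {"tier": "frontend"}}},
    {"name": "backend", "labels": {"tier": "backend"},
     "selector": {"matchLabels": {"tier": "backend"}}},
]
SERVICES = [
    {"name": "web", "labels": {"tier": "frontend", "app": "web"}},
    {"name": "checkout", "labels": {"tier": "frontend"}},
    {"name": "db", "labels": {"tier": "backend"}},
]

frontend_services = scraped_services(PROMETHEUS, SERVICE_MONITORS, SERVICES)
```

Adding a new frontend service then only requires giving it the `tier: frontend` label; neither the ServiceMonitor nor the Prometheus resource has to change.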

Slide 24

Slide 24 text

Try it out!

github.com/coreos/prometheus-operator
github.com/coreos/kube-prometheus

Slide 25

Slide 25 text

Thanks!

QUESTIONS?
@fabxc

LONGER CHAT?
Let’s talk! #prometheus on Freenode
More events: coreos.com/community

We’re hiring: coreos.com/careers
also in Berlin!