Improve Monitoring and Observability of Kubernetes with OSS tools

@nileshgule Improve Monitoring and Observability for Kubernetes with OSS tools

$whoami
{
  "name" : "Nilesh Gule",
  "website" : "https://www.HandsOnArchitect.com",
  "github" : "https://GitHub.com/NileshGule",
  "twitter" : "@nileshgule",
  "linkedin" : "https://www.linkedin.com/in/nileshgule",
  "likes" : "Technical Evangelism, Cricket",
  "co-organizer" : "Azure Singapore UG"
}

Pre-requisites

Docker
❖ Self contained application with all its dependencies

Kubernetes
❖ Orchestrates containers
❖ Self healing
❖ Service discovery
❖ Scaling

Cloud Native Applications
❖ Scalable apps in dynamic environments (public / private / hybrid clouds)
❖ Exemplified by containers, service meshes, microservices, immutable infrastructure & declarative APIs
❖ Loosely coupled systems, resilient, observable & manageable
❖ Robust automation

CNCF cloud trail https://github.com/cncf/trailmap

CNCF Observability landscape https://landscape.cncf.io

CNCF Observability Radar https://radar.cncf.io/2020-09-observability

3 Pillars of Observability: Logs, Metrics, Traces

Centralized Logging

Why centralized logging

Container based workloads
❑ Application specific
  ❖ Long term log retention for compliance reasons
  ❖ Workloads scheduled on different nodes during application restarts / updates
  ❖ Autoscaling workloads
❑ Kubernetes upgrades
  ❖ Auto healing can reschedule workloads
  ❖ Underlying nodes added / deleted during cluster scaling
  ❖ Underlying nodes replaced during cluster upgrades

PaaS / Serverless services
❖ Not much control over underlying infra
❖ Relies on cloud provider specific logging and monitoring solution

Tech Talks EFK integration
❖ Log collector
❖ Log storage
❖ Log search, visualise, dashboards
rabbitmq-producer-service, rabbitmq-consumer-deployment
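A log collector has an easier job when each container writes one structured record per line to stdout. The sketch below shows that idea with the Python standard library only. It is not the Tech Talks code from the deck: the logger name `techtalks.demo` and the `service` field are assumptions made for illustration.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # "service" is a hypothetical field passed through `extra=`.
            "service": getattr(record, "service", "unknown"),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("techtalks.demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # Containers normally log to stdout; the node-level collector picks it up.
    log = build_logger(sys.stdout)
    log.info("published %d messages", 5,
             extra={"service": "rabbitmq-producer-service"})
```

Writing to stdout keeps the application unaware of where logs end up, so the collector and storage can be swapped without touching the workload.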

Demo 1 – Log Aggregation with EFK

Monitoring and Alerting

Why Monitoring & Alerting

Container based workloads
• Application specific
  • Monitor resource usage
  • Monitor scaling needs
  • Monitor anomalies / outliers
• Kubernetes platform level
  • Monitor cluster resources (CPU / RAM)
  • API health
  • Autoscaling

PaaS / Serverless services
• Monitor resource usage
• Scaling
• Bottlenecks

Prometheus Architecture

Demo 2 – Metrics using Prometheus & Grafana

Spring Boot Conference App integration
https://github.com/NileshGule/spring-boot-conference-app/tree/mssql-server
conference-demo-service-monitor
conference-demo-service
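Prometheus collects metrics by scraping an HTTP endpoint that returns plain text in its exposition format. The sketch below renders a labelled counter in that format using only the Python standard library. It is a simplified illustration, not the conference app's code: a real service would normally use a metrics library, the metric name `conference_requests_total` is hypothetical, and label value escaping is left out.

```python
from collections import defaultdict


class Counter:
    """A monotonically increasing counter with optional labels."""

    def __init__(self, name: str, help_text: str) -> None:
        self.name = name
        self.help_text = help_text
        self._values = defaultdict(float)

    def inc(self, amount: float = 1.0, **labels: str) -> None:
        """Add `amount` to the series identified by `labels`."""
        if amount < 0:
            raise ValueError("counters can only increase")
        key = tuple(sorted(labels.items()))
        self._values[key] += amount

    def render(self) -> str:
        """Return the counter in Prometheus text exposition format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            label_text = ",".join(f'{k}="{v}"' for k, v in key)
            series = f"{self.name}{{{label_text}}}" if label_text else self.name
            lines.append(f"{series} {value}")
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    requests_total = Counter("conference_requests_total", "Total HTTP requests")
    requests_total.inc(path="/sessions")
    requests_total.inc(path="/speakers")
    print(requests_total.render(), end="")
```

The service monitor's role is then only to tell Prometheus which services expose such an endpoint and how often to scrape it.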

Exception Handling

Sentry Architecture https://develop.sentry.dev/architecture/

Spring Boot Sentry integration
conference-demo-service
Managed Kubernetes cluster
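Exception aggregation means that repeated occurrences of the same failure are grouped into one issue with a count, instead of appearing as separate log lines. The sketch below illustrates that grouping idea with the Python standard library. It is not Sentry's grouping algorithm or SDK: the fingerprint rule (exception type plus innermost function name) and the `load_session` handler are assumptions chosen to keep the example small.

```python
import traceback
from collections import Counter


def fingerprint(exc: BaseException) -> str:
    """Build a grouping key from the exception type and innermost frame."""
    frames = traceback.extract_tb(exc.__traceback__)
    location = frames[-1].name if frames else "<no traceback>"
    return f"{type(exc).__name__}@{location}"


class ExceptionAggregator:
    """Count captured exceptions per fingerprint."""

    def __init__(self) -> None:
        self.counts = Counter()

    def capture(self, exc: BaseException) -> str:
        """Record one occurrence and return the group it was assigned to."""
        key = fingerprint(exc)
        self.counts[key] += 1
        return key


def load_session(session_id: int) -> dict:
    # Hypothetical handler that always fails, to produce repeated errors.
    raise KeyError(session_id)


if __name__ == "__main__":
    aggregator = ExceptionAggregator()
    for session_id in range(3):
        try:
            load_session(session_id)
        except KeyError as exc:
            aggregator.capture(exc)
    for group, count in aggregator.counts.items():
        print(group, count)
```

Three failures with different session IDs land in a single group, which is what makes an aggregated view easier to triage than raw logs.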

Demo 3 – Exception aggregation using Sentry

End to End Observability

Observability challenges
➢ Too many telemetry agents
➢ Instrumentation of Apps
➢ Dynamic & small units in Cloud Native Applications
➢ Right retention period for each type of metric and usage
➢ Minimize vendor or feature lock-in
➢ Buy vs Build
➢ Transition from Monitoring to Observability
➢ Single pane of glass for consuming different information
➢ Correlation of signals
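Correlating signals usually comes down to stamping every log line, metric and trace that belongs to one request with a shared identifier. The sketch below shows the idea for logs only, using the Python standard library. It stands in for the trace context that a tracing library would propagate and is not an OpenTelemetry API: `trace_id_var`, `handle_request` and the logger name are hypothetical.

```python
import contextvars
import logging
import sys
import uuid

# Holds the ID of the request currently being handled; "-" means none.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copy the current trace ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = trace_id_var.get()
        return True


def build_logger(stream) -> logging.Logger:
    """Return a logger whose lines are prefixed with the active trace ID."""
    logger = logging.getLogger("conference.demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceIdFilter())
    handler.setFormatter(logging.Formatter("trace_id=%(trace_id)s %(message)s"))
    logger.addHandler(handler)
    return logger


def handle_request(logger: logging.Logger, work) -> str:
    """Run `work` under a fresh trace ID and return that ID."""
    token = trace_id_var.set(uuid.uuid4().hex)
    try:
        logger.info("request started")
        work()
        logger.info("request finished")
        return trace_id_var.get()
    finally:
        # Restore the previous value so IDs never leak across requests.
        trace_id_var.reset(token)


if __name__ == "__main__":
    log = build_logger(sys.stdout)
    handle_request(log, lambda: log.info("querying sessions"))
```

With the same ID present on every line, a dashboard can jump from one log entry to all other entries for that request.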

Analogy - Use the right tool for the right purpose

Summary
✓ Use best-in-class for the given use case
✓ Rely on open standards (e.g. OpenTelemetry)
✓ Build portable observability systems (e.g. hybrid cloud migration)

Log Aggregation
✓ EFK stack helps in centralized logging
✓ Kibana is used to visualize logs and build dashboards

Monitoring & Alerting
✓ Prometheus provides easy to use metrics for platforms and applications
✓ Grafana provides visualization capabilities to build intuitive dashboards

Exception Aggregation
✓ Sentry provides Exception Aggregation capabilities
✓ Excellent telemetry data captured by Sentry to help diagnose problems

Some Recommendations

Challenges ∞ Tools
♣ Too many agents ∞ OpenTelemetry collector
♣ Instrumentation, vendor lock-in ∞ OpenTelemetry, OpenMetrics
♣ Cloud native logs ∞ Fluent Bit / Fluentd, OpenSearch, Loki
♣ Cloud native metrics ∞ Prometheus, Cortex, Thanos
♣ Cloud native traces ∞ OpenTelemetry, Jaeger, Grafana
♣ Single pane of glass, correlation ∞ Grafana

References

Log Aggregation
❖ Elastic stack
❖ Kibana
❖ Fluentbit

Monitoring & Alerting
❖ Prometheus
❖ Grafana
❖ Kube Prometheus stack
❖ Dynatrace – Monitoring vs Observability
❖ Houssem Dellai – Prometheus & Grafana for monitoring Kubernetes

Sentry
❖ Sentry docs

Source Code & slide deck
Tech Talks: https://github.com/NileshGule/pd-tech-fest-2019
Observability & Monitoring markdown
Conference app: https://github.com/NileshGule/spring-boot-conference-app/tree/mssql-server
https://speakerdeck.com/nileshgule/
https://www.slideshare.net/nileshgule/

Nilesh Gule
ARCHITECT | MICROSOFT MVP
“Code with Passion and Strive for Excellence”
nileshgule @nileshgule Nilesh Gule NileshGule
www.handsonarchitect.com
https://bit.ly/youtube-nileshgule

Q&A