Improve Monitoring and Observability of Kubernetes with OSS tools

Slide deck from the ASEAN Cloud Summit meetup on 27 January 2022. The session covered the following topics:
1 - Centralized Logging with Elasticsearch, Fluent Bit and Kibana
2 - Monitoring and Alerting with Prometheus and Grafana
3 - Exception aggregation with Sentry
The live demo showcased these aspects using Azure Kubernetes Service (AKS).


Nilesh Gule

January 27, 2022


  1. @nileshgule Improve Monitoring and Observability for Kubernetes with OSS tools

  2. $whoami
    {
      "name": "Nilesh Gule",
      "website": "https://www.HandsOnArchitect.com",
      "github": "https://GitHub.com/NileshGule",
      "twitter": "@nileshgule",
      "linkedin": "https://www.linkedin.com/in/nileshgule",
      "likes": "Technical Evangelism, Cricket",
      "co-organizer": "Azure Singapore UG"
    }
  3. @nileshgule Pre-requisites
    Docker
      Self contained application with all its dependencies
    Kubernetes
      ❖ Orchestrates containers
      ❖ Self healing
      ❖ Service discovery
      ❖ Scaling
    Cloud Native Applications
      ❖ Scalable apps in dynamic environments (public / private / hybrid clouds)
      ❖ Exemplified by Containers, service meshes, microservices, immutable infrastructure & declarative APIs
      ❖ Loosely coupled systems, resilient, observable & manageable
      ❖ Robust automation
  4. @nileshgule

  5. @nileshgule CNCF cloud trail https://github.com/cncf/trailmap

  6. @nileshgule CNCF Observability landscape https://landscape.cncf.io

  7. @nileshgule CNCF Observability Radar https://radar.cncf.io/2020-09-observability

  8. @nileshgule CNCF Observability Radar https://radar.cncf.io/2020-09-observability

  9. @nileshgule 3 Pillars of Observability Logs Metrics Traces

  10. @nileshgule Centralized Logging

  11. @nileshgule Why centralized logging
    Container based workloads
      ❑ Application specific
        ❖ Long term log retention for compliance reasons
        ❖ Workloads scheduled on different nodes during application restarts / updates
        ❖ Autoscaling workloads
      ❑ Kubernetes upgrades
        ❖ Auto healing can reschedule workloads
        ❖ Underlying nodes added / deleted during cluster scaling
        ❖ Underlying nodes replaced during cluster upgrades
    PaaS / Serverless services
      ❖ Not much control over underlying infra
      ❖ Relies on cloud provider specific logging and monitoring solution
  12. @nileshgule Tech Talks EFK integration
    Log collector
    Log storage
    Log search, visualise, dashboards
    rabbitmq-producer-service
    rabbitmq-consumer-deployment
  13. @nileshgule Demo 1 – Log Aggregation with EFK
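
A log collector such as Fluent Bit typically picks up whatever containers write to stdout / stderr, so applications feeding an EFK pipeline often emit one JSON object per line. The following is a minimal sketch of that idea using only the Python standard library. It is not code from the talk's demo; the field names and the `build_logger` helper are assumptions chosen for illustration.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        # Field names here are illustrative, not a schema required by any collector.
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to stdout (or to the given stream)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # The logger name reuses a workload name from the slides purely as an example.
    log = build_logger("rabbitmq-producer-service")
    log.info("published %d messages", 5)
```

Writing structured lines to stdout keeps the application unaware of where logs end up; the collector, storage and dashboard layers can change without touching application code.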

  14. @nileshgule Monitoring and Alerting

  15. @nileshgule Why Monitoring & Alerting
    Container based workloads
      • Application specific
        • Monitor resource usage
        • Monitor scaling needs
        • Monitor anomalies / outliers
      • Kubernetes platform level
        • Monitor cluster resources (CPU / RAM)
        • API health
        • Autoscaling
    PaaS / Serverless services
      • Monitor resource usage
      • Scaling
      • Bottlenecks
  16. @nileshgule Prometheus Architecture

  17. @nileshgule Demo 2 – Metrics using Prometheus & Grafana

  18. @nileshgule Spring Boot Conference App integration
    https://github.com/NileshGule/spring-boot-conference-app/tree/mssql-server
    conference-demo-service-monitor
    conference-demo-service
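
Prometheus collects metrics by scraping an HTTP endpoint that serves them in a plain-text exposition format. The sketch below renders a labelled counter in that format using only the Python standard library, to show roughly what a scrape target returns. It is an illustration under stated assumptions: `SimpleCounter` and the metric name are hypothetical, label values are not escaped, and it is not the conference app's code. A real service would normally rely on an official client library instead.

```python
from collections import defaultdict


class SimpleCounter:
    """A tiny in-memory counter with labels; illustrative only."""

    def __init__(self, name: str, help_text: str):
        self.name = name
        self.help_text = help_text
        self._values = defaultdict(float)

    def inc(self, amount: float = 1.0, **labels: str) -> None:
        # Sort label pairs so the same label set always maps to the same series.
        key = tuple(sorted(labels.items()))
        self._values[key] += amount

    def render(self) -> str:
        """Return the counter in Prometheus text exposition format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            # Note: label values are not escaped in this sketch.
            label_text = ",".join(f'{k}="{v}"' for k, v in key)
            suffix = f"{{{label_text}}}" if label_text else ""
            lines.append(f"{self.name}{suffix} {value}")
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    requests_total = SimpleCounter("http_requests_total", "Total HTTP requests handled.")
    requests_total.inc(method="GET", status="200")
    requests_total.inc(method="POST", status="500")
    print(requests_total.render(), end="")
```

Each distinct combination of label values becomes its own time series, which is why label sets are normalised before being used as a key.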

  19. @nileshgule Exception Handling

  20. @nileshgule Sentry Architecture https://develop.sentry.dev/architecture/

  21. @nileshgule Spring Boot Sentry integration
    conference-demo-service
    Managed Kubernetes cluster

  22. @nileshgule Demo 3 – Exception aggregation using Sentry
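
Exception aggregation means grouping many occurrences of the same failure into a single issue instead of listing every stack trace separately. The sketch below illustrates the general idea with a simple fingerprint built from the exception type and the function and line that raised it. It is a conceptual example only: it does not reproduce Sentry's grouping algorithm or use its SDK, and `fingerprint` and `ExceptionAggregator` are hypothetical names.

```python
import traceback
from collections import defaultdict


def fingerprint(exc: BaseException) -> str:
    """Build a grouping key from the exception type and the raising frame."""
    frames = traceback.extract_tb(exc.__traceback__)
    if frames:
        last = frames[-1]  # innermost Python frame, where the exception was raised
        location = f"{last.name}:{last.lineno}"
    else:
        location = "unknown"
    return f"{type(exc).__name__}@{location}"


class ExceptionAggregator:
    """Collect exceptions and count occurrences per fingerprint."""

    def __init__(self):
        self._groups = defaultdict(list)

    def capture(self, exc: BaseException) -> str:
        key = fingerprint(exc)
        self._groups[key].append(str(exc))
        return key

    def summary(self) -> dict:
        """Map each fingerprint to the number of captured occurrences."""
        return {key: len(messages) for key, messages in self._groups.items()}


def load_session(session_id: int):
    # Hypothetical failing lookup used to produce repeated errors.
    raise KeyError(session_id)


if __name__ == "__main__":
    aggregator = ExceptionAggregator()
    for session_id in (1, 2, 3):
        try:
            load_session(session_id)
        except KeyError as err:
            aggregator.capture(err)
    print(aggregator.summary())
```

The three failures differ in their message but share one fingerprint, so they are reported as one group with a count rather than as three unrelated errors.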

  23. @nileshgule End to End Observability

  24. @nileshgule Observability challenges
    ➢ Too many telemetry agents
    ➢ Instrumentation of Apps
    ➢ Dynamic & small units in Cloud Native Applications
    ➢ Right retention period for each type of metric and usage
    ➢ Minimize vendor or feature lock-in
    ➢ Buy vs Build
    ➢ Transition from Monitoring to Observability
    ➢ Single pane of glass for consuming different information
    ➢ Correlation of signals
  25. @nileshgule Analogy - Use right tool for right purpose

  26. @nileshgule Summary
    ✓ Use the best-in-class tool for a given use case
    ✓ Rely on open standards (e.g. OpenTelemetry)
    ✓ Build portable observability systems (e.g. hybrid cloud migration)
    Log Aggregation
      ✓ EFK stack helps in centralized logging
      ✓ Kibana is used to visualize logs and build dashboards
    Monitoring & Alerting
      ✓ Prometheus provides easy to use metrics for platforms and applications
      ✓ Grafana provides visualization capabilities to build intuitive dashboards
    Exception Aggregation
      ✓ Sentry provides Exception Aggregation capabilities
      ✓ Excellent telemetry data captured by Sentry to help diagnose problems
  27. @nileshgule Some Recommendations
    Challenges                              Tools
    ♣ Too many agents                       ∞ OpenTelemetry collector
    ♣ Instrumentation, vendor lock-in       ∞ OpenTelemetry, OpenMetrics
    ♣ Cloud native logs                     ∞ Fluent Bit / Fluentd, OpenSearch, Loki
    ♣ Cloud native metrics                  ∞ Prometheus, Cortex, Thanos
    ♣ Cloud native traces                   ∞ OpenTelemetry, Jaeger, Grafana
    ♣ Single pane of glass, correlation     ∞ Grafana
  28. @nileshgule References
    Log Aggregation
      ❖ Elastic stack
      ❖ Kibana
      ❖ Fluent Bit
    Monitoring & Alerting
      ❖ Prometheus
      ❖ Grafana
      ❖ Kube Prometheus stack
      ❖ Dynatrace – Monitoring vs Observability
      ❖ Houssem Dellai – Prometheus & Grafana for monitoring Kubernetes
    Sentry
      ❖ Sentry docs
  29. @nileshgule Source Code & slide deck
    Tech Talks
      https://github.com/NileshGule/pd-tech-fest-2019
    Observability & Monitoring markdown
    Conference app
      https://github.com/NileshGule/spring-boot-conference-app/tree/mssql-server
    https://speakerdeck.com/nileshgule/
    https://www.slideshare.net/nileshgule/
  30. Nilesh Gule
    ARCHITECT | MICROSOFT MVP
    “Code with Passion and Strive for Excellence”
    nileshgule @nileshgule Nilesh Gule NileshGule
    www.handsonarchitect.com
    https://bit.ly/youtube-nileshgule
  31. Q&A