
Improve Monitoring and Observability for Kubernetes with OSS tools

Nilesh Gule
December 12, 2023

Slide deck for the presentation at the KubeDay Singapore event. The session covered the three pillars of observability and how to use Jaeger for distributed tracing, Loki for log aggregation, and Prometheus and Grafana for metrics in a distributed application. An Azure Kubernetes Service (AKS) cluster was used for the live demo.




  1. Nilesh Gule

    ARCHITECT | MICROSOFT MVP | First Docker Captain in Singapore
    “Code with Passion and Strive for Excellence”
    nileshgule | @nileshgule | Nilesh Gule | NileshGule | www.handsonarchitect.com | https://www.youtube.com/@nilesh-gule
  2. Why centralized logging

    Container based workloads
    ❑ Application specific
      ❖ Long-term log retention for compliance reasons
      ❖ Workloads scheduled on different nodes during application restarts / updates
      ❖ Autoscaling workloads
    ❑ Kubernetes upgrades
      ❖ Auto healing can reschedule workloads
      ❖ Underlying nodes added / deleted during cluster scaling
      ❖ Underlying nodes replaced during cluster upgrades
    PaaS / Serverless services
      ❖ Not much control over the underlying infrastructure
      ❖ Relies on the cloud provider's specific logging and monitoring solution
  3. Financial Services App: Loki integration

    Services: backend-service, account-service, authentication-service, forex-service, transaction-service
    Components: log collector, log storage, log search / visualise / dashboards
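To make the Loki integration concrete, here is a minimal Python sketch of the payload a log collector pushes to Loki. It assumes Loki's JSON push endpoint `/loki/api/v1/push`. On that endpoint, each stream is a label set plus a list of `[nanosecond timestamp as string, log line]` pairs.

The deck does not say which collector the demo used, so the function names and the example labels (`app`) are hypothetical. In a cluster, an agent such as Promtail or Fluent Bit would normally do this for you.

```python
from __future__ import annotations

import json
import time
import urllib.request


def build_loki_payload(labels: dict[str, str], lines: list[str], ts_ns: int | None = None) -> dict:
    """Build a request body for Loki's JSON push API (hypothetical helper).

    One stream = one label set; each entry is [timestamp in ns as a string, log line].
    """
    if ts_ns is None:
        ts_ns = time.time_ns()
    # Offset each line by 1 ns so the entries in this batch keep their order.
    values = [[str(ts_ns + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


def push_to_loki(base_url: str, payload: dict) -> int:
    """POST the payload to Loki's push endpoint and return the HTTP status code."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/loki/api/v1/push",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Usage example:

- `build_loki_payload({"app": "forex-service"}, ["GET /rates 200"])` produces a single stream.
- `push_to_loki("http://loki:3100", payload)` would send it to an in-cluster Loki service at a hypothetical address. Port 3100 is Loki's default HTTP port.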
  4. Why Metrics

    Container based workloads
    • Application specific
      • Monitor resource usage
      • Monitor scaling needs
      • Monitor anomalies / outliers
    • Kubernetes platform level
      • Monitor cluster resources (CPU / RAM)
      • API health
      • Autoscaling
    PaaS / Serverless services
      • Monitor resource usage
      • Scaling
      • Bottlenecks
  5. Financial Services App: Prometheus integration

    Services: backend-service, account-service, authentication-service, forex-service, transaction-service
    Components: service-monitor, scrape metrics, metrics storage, visualise / dashboards
  6. Why Distributed Tracing

    • Distributed Tracing
      • Understanding complex systems
      • Performance monitoring and optimizations
      • Debugging and problem resolution
  7. Financial Services App: Jaeger integration

    Services: backend-service, account-service, authentication-service, forex-service, transaction-service
    Components: Jaeger Operator, distributed traces, visualise traces
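Distributed tracing works by passing a trace context from service to service, so that Jaeger can stitch the spans into one trace. The deck recommends open standards such as OpenTelemetry, whose default propagator uses the W3C Trace Context `traceparent` header.

Here is a minimal standard-library sketch of that header. It covers three steps:

- starting a trace
- parsing an incoming header
- deriving the context for a new span

The helper names are hypothetical, and only version `00` of the header is handled. In practice an OpenTelemetry SDK does this for you and exports the spans to Jaeger.

```python
from __future__ import annotations

import re
import secrets
from dataclasses import dataclass

# Only version 00 of the W3C traceparent header is handled in this sketch.
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # 32 lowercase hex chars, shared by all spans of one request
    span_id: str   # 16 lowercase hex chars, unique per span
    sampled: bool

    def to_traceparent(self) -> str:
        return f"00-{self.trace_id}-{self.span_id}-{'01' if self.sampled else '00'}"


def _random_hex(nbytes: int) -> str:
    # All-zero ids are invalid in W3C Trace Context, so draw again in that (unlikely) case.
    while True:
        value = secrets.token_hex(nbytes)
        if int(value, 16) != 0:
            return value


def new_root(sampled: bool = True) -> SpanContext:
    """Start a new trace, e.g. in the first service that receives a request."""
    return SpanContext(_random_hex(16), _random_hex(8), sampled)


def parse_traceparent(header: str) -> SpanContext | None:
    """Parse an incoming traceparent header; return None if it is malformed."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    if int(trace_id, 16) == 0 or int(span_id, 16) == 0:
        return None
    return SpanContext(trace_id, span_id, sampled=bool(int(flags, 16) & 0x01))


def child_of(parent: SpanContext) -> SpanContext:
    """New span in this service: same trace id, fresh span id, same sampling decision.

    Its to_traceparent() value is what this service sends on outgoing calls.
    """
    return SpanContext(parent.trace_id, _random_hex(8), parent.sampled)
```

For example, suppose `forex-service` receives the valid header `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01`. Then `child_of(parse_traceparent(header))` keeps the trace id and mints a new span id for the work done in that service.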
  8. Summary

    Modern-day cloud native applications need new ways to address observability & monitoring
    ✓ Use best-in-class tools for each use case
    ✓ Rely on open standards (e.g. OpenTelemetry)
    ✓ Build portable observability systems (e.g. hybrid cloud migration)
    Log Aggregation
    ✓ Loki helps in centralized logging
    ✓ Grafana is used to visualize logs and build dashboards
    Metrics
    ✓ Prometheus provides easy-to-use metrics for platforms and applications
    ✓ Grafana provides visualization capabilities to build intuitive dashboards
    Distributed Tracing
    ✓ Jaeger provides distributed tracing capabilities
  9. Some Recommendations

    Challenges | Tools
    Too many agents | OpenTelemetry collector
    Instrumentation, vendor lock-in | OpenTelemetry, OpenMetrics
    Cloud native logs | Fluent Bit / Fluentd, OpenSearch, Loki
    Cloud native metrics | Prometheus, Cortex, Thanos
    Cloud native traces | OpenTelemetry, Jaeger, Grafana
    Single pane of glass, correlation | Grafana
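The "single pane of glass, correlation" challenge is easier to meet when every log line carries the current trace id. Logs in Loki can then be linked to traces in Jaeger from Grafana, for example via derived fields on the Loki data source.

Here is a minimal sketch with Python's standard `logging` module. It assumes the trace and span ids are passed per call through `extra=`. The formatter, the helper and the JSON field names are illustrative and not part of the demo.

```python
from __future__ import annotations

import json
import logging


class JsonTraceFormatter(logging.Formatter):
    """Emit one JSON object per log record, including trace/span ids when present."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Supplied per call via `extra=`; None when the record is not part of a trace.
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        return json.dumps(entry)


def make_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr (hypothetical helper)."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()  # defaults to sys.stderr
        handler.setFormatter(JsonTraceFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

For example, `make_logger("forex-service").info("rate lookup", extra={"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"})` writes one JSON line to stderr. A collector such as Fluent Bit or Promtail can then ship that line from the container to Loki.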
  10. References

    Log Aggregation
    ❖ Grafana Loki
    Monitoring & Alerting
    ❖ Prometheus
    ❖ Grafana
    ❖ Kube Prometheus stack
    ❖ Houssem Dellai: Prometheus & Grafana for monitoring Kubernetes
    Distributed Tracing
    ❖ Jaeger Tracing
  11. Source Code & slide deck

    Financial Services Demo: https://github.com/infofractionalservices/microservices/tree/docker_build_fixes
    https://speakerdeck.com/nileshgule/
    https://www.slideshare.net/nileshgule/
  12. Q&A