Improve Monitoring and Observability for Kubernetes with OSS tools

Nilesh Gule
December 12, 2023

Slide deck for the presentation at the KubeDay Singapore event. The session covered the three pillars of observability and how to use Jaeger for distributed tracing, Loki for log aggregation, and Prometheus and Grafana for metrics in a distributed application. An Azure Kubernetes Service (AKS) cluster was used for the live demo.


  1. Nilesh Gule ARCHITECT | MICROSOFT MVP | First Docker Captain

  2. @nileshgule Why centralized logging

    Container based workloads
    ❑ Application specific
      ❖ Long-term log retention for compliance reasons
      ❖ Workloads scheduled on different nodes during application restarts / updates
      ❖ Autoscaling workloads
    ❑ Kubernetes upgrades
      ❖ Auto healing can reschedule workloads
      ❖ Underlying nodes added / deleted during cluster scaling
      ❖ Underlying nodes replaced during cluster upgrades

    PaaS / Serverless services
      ❖ Not much control over the underlying infrastructure
      ❖ Relies on cloud provider specific logging and monitoring solutions
  3. Financial Services App: Loki integration

    Services: backend-service, account-service, authentication-service, forex-service, transaction-service
    Components: log collector, log storage, log search / visualise / dashboards
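Loki stores log lines grouped into streams, each identified by a set of labels. As a rough sketch of what a log collector sends, the Python function below builds the JSON body for Loki's HTTP push endpoint (`/loki/api/v1/push`), where each value is a pair of a nanosecond Unix timestamp (as a string) and the log line. The label names (`app`, `namespace`) and the `finance` namespace are illustrative assumptions, not taken from the demo; in the demo a collector agent would typically do this rather than the application itself.

```python
import json
import time


def build_loki_push_body(labels, lines, ts_ns=None):
    """Build a Loki push API body containing one stream.

    labels: dict of stream labels, e.g. {"app": "account-service"}
    lines:  list of log line strings
    ts_ns:  optional fixed timestamp in nanoseconds (defaults to now)
    """
    base = time.time_ns() if ts_ns is None else ts_ns
    values = [
        # Loki expects [timestamp-in-ns-as-string, line]; adding i keeps
        # entries ordered and distinct within the stream
        [str(base + i), line]
        for i, line in enumerate(lines)
    ]
    return {"streams": [{"stream": dict(labels), "values": values}]}


if __name__ == "__main__":
    body = build_loki_push_body(
        {"app": "account-service", "namespace": "finance"},  # hypothetical labels
        ["account created id=42", "balance updated id=42"],
        ts_ns=1_700_000_000_000_000_000,
    )
    # This JSON would be POSTed to http://<loki-host>:3100/loki/api/v1/push
    # with the header Content-Type: application/json.
    print(json.dumps(body, indent=2))
```

Keeping labels few and low-cardinality (service name, namespace) and leaving IDs in the log line itself matches how Loki indexes labels rather than full text.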
  4. Why Metrics

    Container based workloads
    • Application specific
      • Monitor resource usage
      • Monitor scaling needs
      • Monitor anomalies / outliers
    • Kubernetes platform level
      • Monitor cluster resources (CPU / RAM)
      • API health
      • Autoscaling

    PaaS / Serverless services
      • Monitor resource usage
      • Scaling
      • Bottlenecks
  5. Financial Services App: Prometheus integration

    Services: backend-service, account-service, authentication-service, forex-service, transaction-service
    Components: service-monitor, scrape metrics, metrics storage, visualise / dashboards
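Prometheus pulls metrics by scraping an HTTP endpoint (conventionally `/metrics`) on each target; with the Prometheus Operator, a ServiceMonitor resource tells Prometheus which Kubernetes Services to scrape. As a minimal sketch of what a scraped endpoint returns, the stdlib-only Python below renders a counter in the Prometheus text exposition format. The metric name `transactions_processed_total` and its `service` / `status` labels are hypothetical; a real service would normally use a client library such as `prometheus_client` instead of hand-rolling the format.

```python
def escape_label_value(value):
    # The exposition format requires escaping backslash, double quote and newline
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name, help_text, samples):
    """Render one counter metric family in Prometheus text format.

    help_text is assumed to be a single line without backslashes.
    samples: list of (labels_dict, value) tuples
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        if labels:
            # Sort label names so the output is deterministic
            label_str = ",".join(
                f'{key}="{escape_label_value(str(val))}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    # The format is line-oriented and ends with a newline
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_counter(
        "transactions_processed_total",  # hypothetical metric name
        "Number of transactions processed.",
        [({"service": "transaction-service", "status": "ok"}, 128),
         ({"service": "transaction-service", "status": "failed"}, 3)],
    ), end="")
```

Because Prometheus pulls rather than receives metrics, the services only need to expose this text; storage, retention and dashboards stay in Prometheus and Grafana.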
  6. Why Distributed Tracing

    • Understanding complex systems
    • Performance monitoring and optimizations
    • Debugging and problem resolution
  7. Financial Services App: Jaeger integration

    Services: backend-service, account-service, authentication-service, forex-service, transaction-service
    Components: Jaeger Operator, distributed traces, visualise traces
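Jaeger can only stitch spans from backend-service, account-service and the other services into one trace if every outgoing request carries the trace context. The sketch below assumes the W3C Trace Context `traceparent` header (the default propagator in OpenTelemetry SDKs; Jaeger also has its own `uber-trace-id` format) and shows how a service could parse an incoming header and derive the header for a downstream call: the trace ID is kept and a fresh span ID is generated. The function names are hypothetical; in practice an OpenTelemetry SDK or instrumentation library handles this.

```python
import re
import secrets

# Only version "00": 32 hex trace-id, 16 hex parent-id, 2 hex trace-flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled=True):
    """Start a new trace, as the first service in a request chain would."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header):
    """Return (trace_id, parent_span_id, flags), or None if the header is invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if not match:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero trace or span IDs are invalid per the W3C spec
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return trace_id, span_id, flags


def child_traceparent(incoming):
    """Header for a downstream call: same trace ID, new span ID, same flags."""
    parsed = parse_traceparent(incoming)
    if parsed is None:
        return new_traceparent()  # no valid context: start a new trace
    trace_id, _parent_span_id, flags = parsed
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


if __name__ == "__main__":
    root = new_traceparent()              # e.g. created in backend-service
    downstream = child_traceparent(root)  # e.g. sent on to account-service
    print(root)
    print(downstream)
```

Sharing the trace ID while issuing a new span ID per hop is what lets the Jaeger UI render the whole request as one tree of spans.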
  8. Summary

    Modern cloud native applications need new approaches to observability & monitoring
    ✓ Use best-in-class tools for the given use case
    ✓ Rely on open standards (e.g. OpenTelemetry)
    ✓ Build portable observability systems (e.g. for hybrid cloud migration)

    Log Aggregation
    ✓ Loki helps with centralized logging
    ✓ Grafana is used to visualize logs and build dashboards

    Metrics
    ✓ Prometheus provides easy-to-use metrics for platforms and applications
    ✓ Grafana provides visualization capabilities to build intuitive dashboards

    Distributed Tracing
    ✓ Jaeger provides distributed tracing capabilities
  9. Some Recommendations

    Challenges and tools
    ♣ Too many agents: OpenTelemetry Collector
    ♣ Instrumentation, vendor lock-in: OpenTelemetry, OpenMetrics
    ♣ Cloud native logs: Fluent Bit / Fluentd, OpenSearch, Loki
    ♣ Cloud native metrics: Prometheus, Cortex, Thanos
    ♣ Cloud native traces: OpenTelemetry, Jaeger, Grafana
    ♣ Single pane of glass, correlation: Grafana
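For the "single pane of glass, correlation" challenge, one common approach is to write structured log lines that include the trace ID, so that Grafana can link a log line in Loki to the matching trace in Jaeger (for example through a derived field configured on the Loki data source). A minimal stdlib sketch of such a JSON log formatter follows; the field names (`service`, `trace_id`) and passing the trace ID through `extra` are assumptions for illustration, not the demo's code.

```python
import json
import logging


class JsonTraceFormatter(logging.Formatter):
    """Format log records as one JSON object per line, including a trace ID."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        entry = {
            "level": record.levelname,
            "service": self.service,
            "message": record.getMessage(),
            # trace_id is supplied via `extra`; empty when there is no trace context
            "trace_id": getattr(record, "trace_id", ""),
        }
        return json.dumps(entry)


def make_logger(service):
    """Create a logger for a (hypothetical) service that writes JSON to stderr."""
    logger = logging.getLogger(service)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonTraceFormatter(service))
    logger.handlers = [handler]  # avoid duplicate handlers on repeated calls
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = make_logger("forex-service")
    # The trace ID would normally come from the incoming request's trace context
    log.info("rate lookup USD/SGD",
             extra={"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"})
```

Logging to stdout/stderr in JSON keeps the application unaware of Loki; the log collector ships the lines, and the shared `trace_id` field is what makes the logs and traces joinable in Grafana.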
  10. References

    Log Aggregation
    ❖ Grafana Loki

    Monitoring & Alerting
    ❖ Prometheus
    ❖ Grafana
    ❖ Kube Prometheus stack
    ❖ Houssem Dellai: Prometheus & Grafana for monitoring Kubernetes

    Distributed Tracing
    ❖ Jaeger Tracing
  11. Source Code & slide deck

    Financial Services Demo: https://github.com/infofractionalservices/microservices/tree/docker_build_fixes
    Slide deck: https://speakerdeck.com/nileshgule/ and https://www.slideshare.net/nileshgule/
  12. Q&A