
Kubernetes Monitoring Introduction

Kyohei Mizumoto

February 27, 2019

Transcript

  1. 2019/3/28 Kubernetes Monitoring Introduction 127.0.0.1:5500/index.html#2 6/31 Monitoring

    Problem Detection: dashboards and alerts. Problem Resolution: root-cause identification and troubleshooting. Continuous Improvement: capacity and cost optimization.
  2. Observability (7/31)

    A measure of how well internal states of a system can be inferred from knowledge of its external outputs.
  3. Architecture (17/31)

    [Diagram: the Prometheus server (Retrieval, Storage on the node's HDD / SSD, PromQL) pulls metrics from jobs / exporters and from the Pushgateway, which holds metrics pushed by short-lived jobs. Service discovery (DNS, Kubernetes, Consul, ..., custom integration) finds the targets. The server pushes alerts to Alertmanager, which notifies PagerDuty, Email, and so on. The Web UI, Grafana, and API clients read data through PromQL.]
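
The "pull metrics" arrow in this architecture means the Prometheus server fetches metrics over HTTP from each target. The following is a minimal sketch of such a target, not the official client library: it uses only the Python standard library, and the metric names (`sample_jobs_processed_total`, `sample_queue_length`) and port 8000 are hypothetical values chosen for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory metrics; a real exporter would read these
# from the system it monitors.
METRICS = {"sample_jobs_processed_total": 42, "sample_queue_length": 3}


def render_metrics(metrics):
    """Render name -> value pairs as 'name value' lines (Prometheus text format)."""
    lines = [f"{name} {value}" for name, value in sorted(metrics.items())]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers GET /metrics; the scraper decides when to ask."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve metrics until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` blocks and exposes the metrics at `/metrics`; a Prometheus server configured with this address as a scrape target would then collect them on its own schedule.
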
  4. Architecture (18/31)

    Prometheus Server: collects metrics (pull model) and stores them. Alert Manager. Exporter: returns metrics on request. Push Gateway: holds metrics that jobs push to it in advance.
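
For the Push Gateway entry above, a short-lived job pushes its metrics before it exits so that the server can scrape them later. The sketch below builds such a push with the Python standard library. It assumes a Pushgateway reachable at `http://localhost:9091` and a hypothetical job name `nightly_backup` with no `/` in it; both are illustrative and not taken from the slides.

```python
import urllib.request
from urllib.parse import quote


def build_push_request(gateway_url, job, metrics_text):
    """Build a PUT to /metrics/job/<job>, which replaces that job's metric group."""
    url = f"{gateway_url.rstrip('/')}/metrics/job/{quote(job, safe='')}"
    return urllib.request.Request(
        url,
        data=metrics_text.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


def push(gateway_url, job, metrics_text, timeout=5):
    """Send the request and return the HTTP status code."""
    request = build_push_request(gateway_url, job, metrics_text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A job would call something like `push("http://localhost:9091", "nightly_backup", "backup_last_success_timestamp_seconds 1551225600\n")` as its last step. The body must end with a newline, as the text format requires.
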
  5. Use Helm (20/31)

    Use Helm to generate the manifests. Official site: https://helm.sh/
    # Download the chart
    $ helm fetch stable/prometheus --version 8.8.0
    # Generate the manifests from the chart
    $ helm template --name sample-prometheus \
        prometheus-8.8.0.tgz \
        > sample-prometheus.yaml
  6. PromQL (26/31)

    Prometheus Query Language
    # Example
    # Total HTTP requests for the apiserver job
    http_requests_total{job="apiserver"}
    # Per-second rate of HTTP requests over 5-minute windows, evaluated across the last 30 minutes at 1-minute resolution
    rate(http_requests_total[5m])[30m:1m]
    # Change in free memory over the last hour
    delta(node_memory_MemFree_bytes[1h])
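
To make the difference between `rate` and `delta` above concrete, here is a simplified sketch of the arithmetic in Python. It is an approximation under stated assumptions, not Prometheus's implementation: Prometheus also extrapolates to the edges of the time window, which this version omits. The function names `simple_rate` and `simple_delta` are hypothetical.

```python
def simple_rate(samples):
    """Per-second increase of a counter over (timestamp_seconds, value) samples.

    A value lower than its predecessor is treated as a counter reset,
    so the counter is assumed to have restarted from zero.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


def simple_delta(samples):
    """Last value minus first value of a gauge; the result may be negative."""
    if len(samples) < 2:
        return None
    return samples[-1][1] - samples[0][1]
```

`rate` suits counters such as `http_requests_total`, which only go up apart from resets, while `delta` suits gauges such as `node_memory_MemFree_bytes`, which move in both directions.
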
  7. Links (30/31)

    Monitoring and Observability: https://thenewstack.io/monitoring-and-observability-whats-the-difference-and-why-does-it-matter/
    Datadog: https://www.datadoghq.com/
    Prometheus: https://prometheus.io/
    Grafana: https://grafana.com/