Deploy the Cluster Autoscaler with Terraform and Helmfile, Monitor with Prometheus

Slide 1

Slide 1 text

Deploy the Cluster Autoscaler with Terraform and Helmfile, and monitor it with Prometheus
Kubernetes Meetup Tokyo #25
Hidetake Iwata at NTT DATA (@int128)

Slide 2

Slide 2 text

Who are you?
Software Engineer at NTT DATA, working on DevOps and Cloud Native Technology R&D.
Author of kubectl plugins (kubelogin, kauthproxy).

Slide 3

Slide 3 text

What I will cover today
● Deploying the Cluster Autoscaler with Terraform and Helmfile (CI/CD)
● Monitoring the Cluster Autoscaler with Prometheus and Grafana (Observability)
What I will not cover
● The finer details of how the Cluster Autoscaler behaves

Slide 4

Slide 4 text

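What is the Cluster Autoscaler?
A tool that automatically increases or decreases the number of nodes according to the resources the cluster needs (CPU requests and memory requests).
https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler

● Core: calculates the resources the cluster needs. Example: it decides that a node must be added because a new Pod cannot start due to insufficient memory.
● Cloud Provider: performs the cloud-specific scaling operation. Example: on AWS, it increases the Desired Capacity of the Auto Scaling Group.

Diagram: Kubernetes Cluster, Worker Nodes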
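The Core's job described above can be illustrated with a small sketch. This is not the Cluster Autoscaler's actual algorithm; it is a simplified first-fit packing of pending Pod requests onto identical empty nodes, and the names `Requests` and `estimate_nodes_needed` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Requests:
    """CPU and memory requested by a Pod, or still free on a node."""
    cpu_millicores: int
    memory_mib: int


def estimate_nodes_needed(pending, node_capacity):
    """Estimate how many new nodes are needed for the pending Pods.

    Simplified first-fit packing: each Pod goes onto the first new node
    with enough free CPU and memory, otherwise another node is added.
    """
    free = []  # remaining capacity of each node added so far
    for pod in pending:
        if (pod.cpu_millicores > node_capacity.cpu_millicores
                or pod.memory_mib > node_capacity.memory_mib):
            raise ValueError("Pod does not fit on an empty node")
        for slot in free:
            if (slot.cpu_millicores >= pod.cpu_millicores
                    and slot.memory_mib >= pod.memory_mib):
                slot.cpu_millicores -= pod.cpu_millicores
                slot.memory_mib -= pod.memory_mib
                break
        else:
            # No existing new node has room, so add one more.
            free.append(Requests(
                node_capacity.cpu_millicores - pod.cpu_millicores,
                node_capacity.memory_mib - pod.memory_mib,
            ))
    return len(free)
```

For example, two Pods requesting 1500m CPU each plus one requesting 500m need two nodes of 2000m CPU, because the small Pod fits into the space left on the first node.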

Slide 5

Slide 5 text

Deploying the Cluster Autoscaler
The official Helm Chart makes it easy to deploy the Cluster Autoscaler. (On GCP and Azure, it can be configured through the managed service.)
https://github.com/helm/charts/tree/master/stable/cluster-autoscaler

Diagram: Helm Chart (stable/cluster-autoscaler) → Helm Release (Deployment, Cluster Role, Service Account, ...)

Slide 6

Slide 6 text

Deploying the Cluster Autoscaler on AWS
On AWS, an IAM Role must be assigned to the Cluster Autoscaler so that it can operate the Auto Scaling Group.
https://github.com/jtblin/kube2iam

● The Cluster Autoscaler accesses the AWS API.
● kube2iam obtains temporary credentials.

Diagram: stable/cluster-autoscaler (Deployment), stable/kube2iam (DaemonSet), Auto Scaling Group, IAM Role (Cluster Autoscaler), IAM Role (Worker)
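As a rough illustration of what the IAM Role for the Cluster Autoscaler might allow, the sketch below builds a policy document and the Pod annotation that kube2iam reads. The slide does not show the policy; the action list is an assumption based on the permissions commonly listed for the Cluster Autoscaler on AWS and should be checked against the documentation of the version in use. The role name is hypothetical.

```python
import json

# Hypothetical role name, used only for illustration.
ROLE_NAME = "cluster-autoscaler"


def cluster_autoscaler_policy() -> dict:
    """Build an IAM policy document for operating Auto Scaling Groups.

    Assumption: this action list is not taken from the slides.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "autoscaling:DescribeAutoScalingGroups",
                    "autoscaling:DescribeAutoScalingInstances",
                    "autoscaling:DescribeLaunchConfigurations",
                    "autoscaling:DescribeTags",
                    "autoscaling:SetDesiredCapacity",
                    "autoscaling:TerminateInstanceInAutoScalingGroup",
                ],
                "Resource": "*",
            }
        ],
    }


def pod_annotations(role_name: str) -> dict:
    """Return the Pod annotation kube2iam uses to pick the role to assume."""
    return {"iam.amazonaws.com/role": role_name}


if __name__ == "__main__":
    print(json.dumps(cluster_autoscaler_policy(), indent=2))
    print(pod_annotations(ROLE_NAME))
```

`autoscaling:SetDesiredCapacity` is the call behind the scale-up described earlier (increasing the Desired Capacity of the Auto Scaling Group).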

Slide 7

Slide 7 text

What is Helmfile?
A tool for declaratively managing the Helm Releases deployed to a cluster.
https://github.com/roboll/helmfile

● A set of Helm Releases can be declared in YAML
● Values can be given inline or in external files
● Templates can reference environment variables

    # helmfile.yaml
    releases:
      - name: cluster-autoscaler
        namespace: kube-system
        chart: stable/cluster-autoscaler
        values:
          - cloudProvider: aws
            awsRegion: {{ env "AWS_REGION" }}
      - name: kube2iam
        namespace: kube-system

To deploy everything:

    $ helmfile sync

To show the difference between the YAML and the cluster:

    $ helmfile diff

Slide 8

Slide 8 text

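Deploying with Helmfile and Terraform
Helm Releases are deployed with Helmfile, and AWS resources are deployed with Terraform. (Helm Releases can also be managed with Terraform, but I recommend Helmfile. *)
* This is my personal opinion.

Diagram:
● helmfile.yaml → Helmfile → stable/cluster-autoscaler, stable/kube2iam
● *.tf → Terraform → Auto Scaling Group, IAM Role (Worker), IAM Role (CA)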

Slide 9

Slide 9 text

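Deployment pipeline for the Cluster Autoscaler on AWS
CI Ops: the Git Repository holds helmfile.yaml and *.tf, which are applied by Helmfile and Terraform respectively.
Apparently GitOps is also possible with Helmfile (not verified).

Diagram:
● Git Repository (helmfile.yaml) → Helmfile → stable/cluster-autoscaler, stable/kube2iam
● Git Repository (*.tf) → Terraform → Auto Scaling Group, IAM Role (Worker), IAM Role (CA)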

Slide 10

Slide 10 text

Verifying the Cluster Autoscaler (1/2)
Deploying a Pod with a large CPU request causes a node to be added.

    I0927 11:50:35.158353 1 scale_up.go:263] Pod echoserver/echoserver-74fd7d865f-vkzqb is unschedulable
    I0927 11:50:35.158391 1 scale_up.go:300] Upcoming 0 nodes
    I0927 11:50:35.158521 1 scale_up.go:423] Best option to resize: ASG_NAME
    I0927 11:50:35.158540 1 scale_up.go:427] Estimated 1 nodes needed in ASG_NAME
    I0927 11:50:35.158556 1 scale_up.go:529] Final scale-up plan: [{ASG_NAME 4->5 (max: 8)}]
    I0927 11:50:35.158572 1 scale_up.go:694] Scale-up: setting group ASG_NAME size to 5
    I0927 11:52:36.144782 1 clusterstate.go:194] Scale up in group ASG_NAME finished successfully in 2m0.794268739s
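When checking such logs by script, the "Final scale-up plan" line carries the key numbers. The following is a minimal sketch that extracts them, assuming the log format is exactly as shown on the slide (it may differ between Cluster Autoscaler versions) and that the plan contains a single group. The function name `parse_scale_up_plan` is hypothetical.

```python
import re

# Matches e.g. "Final scale-up plan: [{ASG_NAME 4->5 (max: 8)}]"
PLAN_RE = re.compile(
    r"Final scale-up plan: \[\{(?P<group>\S+) "
    r"(?P<current>\d+)->(?P<target>\d+) \(max: (?P<max>\d+)\)\}\]"
)


def parse_scale_up_plan(line: str):
    """Return the group name and sizes from a scale-up plan log line.

    Returns None when the line is not a single-group scale-up plan.
    """
    match = PLAN_RE.search(line)
    if match is None:
        return None
    return {
        "group": match["group"],
        "current": int(match["current"]),
        "target": int(match["target"]),
        "max": int(match["max"]),
    }
```

Applied to the log line above, this yields the group ASG_NAME growing from 4 to 5 nodes with a maximum of 8.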

Slide 11

Slide 11 text

Verifying the Cluster Autoscaler (2/2)
By default, a node is removed 10 minutes after the Cluster Autoscaler determines that it is no longer needed.

    I0927 11:57:07.790306 1 scale_down.go:407] Node ip-172-19-67-52.ap-northeast-1.compute.internal - utilization 0.055000
    I0927 11:57:07.790634 1 static_autoscaler.go:359] ip-172-19-67-52.ap-northeast-1.compute.internal is unneeded since 2019-09-27 11:57:07.773690521 +0000 UTC m=+2997.491422805 duration 0s
    I0927 12:07:12.161679 1 static_autoscaler.go:359] ip-172-19-67-52.ap-northeast-1.compute.internal is unneeded since 2019-09-27 11:57:07.773690521 +0000 UTC m=+2997.491422805 duration 10m4.367847963s
    I0927 12:07:12.391908 1 auto_scaling_groups.go:269] Terminating EC2 instance: i-066bc60549f083e38
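The 10-minute rule can be expressed as a small check. This is only a sketch of the timing condition, assuming the default of 10 minutes mentioned above (the `--scale-down-unneeded-time` setting); the real Cluster Autoscaler evaluates further conditions before removing a node. The function name `is_removable` is hypothetical.

```python
from datetime import datetime, timedelta

# Default waiting time before an unneeded node is removed, per the slide.
SCALE_DOWN_UNNEEDED_TIME = timedelta(minutes=10)


def is_removable(unneeded_since: datetime, now: datetime,
                 threshold: timedelta = SCALE_DOWN_UNNEEDED_TIME) -> bool:
    """Return True once the node has been unneeded for at least the threshold."""
    return now - unneeded_since >= threshold


if __name__ == "__main__":
    # Timestamps taken from the log lines above (microsecond precision).
    since = datetime(2019, 9, 27, 11, 57, 7, 773690)
    print(is_removable(since, datetime(2019, 9, 27, 11, 57, 7, 790634)))
    print(is_removable(since, datetime(2019, 9, 27, 12, 7, 12, 161679)))
```

With the timestamps from the log, the node is not removable at 11:57:07 but is removable at 12:07:12, which is when the EC2 instance is terminated.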

Slide 12

Slide 12 text

Monitoring the Cluster Autoscaler
The Cluster Autoscaler can be monitored in the following ways:
● Scrape its metrics with Prometheus (covered in these slides)
● Read the Pod logs
● Read the status stored in a ConfigMap
● Subscribe to Events
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/metrics.md

Slide 13

Slide 13 text

What is a Prometheus ServiceMonitor?
The ServiceMonitor resource of the Prometheus Operator links a Service to be monitored with Prometheus.
https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/running-exporters.md

Diagram: Grafana, Prometheus, ServiceMonitor, Service, Pod (exporter)
Note in the diagram: must be placed in the same Namespace.

Slide 14

Slide 14 text

ServiceMonitor for the Cluster Autoscaler
The Helm Chart of the Cluster Autoscaler includes a ServiceMonitor.

    # helmfile.yaml
    releases:
      - name: cluster-autoscaler
        namespace: kube-system
        chart: stable/cluster-autoscaler
        values:
          - serviceMonitor:
              enabled: true
              namespace: monitoring
              selector:
                release: prometheus-operator

    # Manifest that is actually generated
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        prometheus: kube-prometheus
        release: prometheus-operator
      name: cluster-autoscaler-aws-cluster-autoscaler
      namespace: monitoring

Note on the labels: the ServiceMonitor is registered with the Prometheus that has this label.
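How the labels connect the ServiceMonitor to a Prometheus instance can be sketched as an equality-based label match. The labels below are copied from the generated manifest; the selector on the Prometheus side is not shown on the slide, so the value used here is an assumption, and the function name `selector_matches` is hypothetical.

```python
def selector_matches(match_labels: dict, resource_labels: dict) -> bool:
    """Return True when every key/value in the selector is present on the resource.

    Mirrors equality-based label matching: all pairs must match, and an
    empty selector matches any resource.
    """
    return all(resource_labels.get(key) == value
               for key, value in match_labels.items())


# Labels from the generated ServiceMonitor manifest.
SERVICE_MONITOR_LABELS = {
    "prometheus": "kube-prometheus",
    "release": "prometheus-operator",
}

# Assumed selector on the Prometheus side (not shown in the slides).
ASSUMED_PROMETHEUS_SELECTOR = {"release": "prometheus-operator"}

if __name__ == "__main__":
    print(selector_matches(ASSUMED_PROMETHEUS_SELECTOR, SERVICE_MONITOR_LABELS))
```

If the labels on the ServiceMonitor did not satisfy the selector, Prometheus would not pick it up, which is why the label values are set explicitly in helmfile.yaml.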

Slide 15

Slide 15 text

Grafana dashboard for the Cluster Autoscaler
https://grafana.com/grafana/dashboards/3831

Slide 16

Slide 16 text

"Deploy the Cluster Autoscaler with Terraform and Helmfile, and monitor it with Prometheus"
Maybe this could become a series??

Slide 17

Slide 17 text

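Summary
The Cluster Autoscaler automatically increases or decreases the number of nodes according to the resources the cluster needs.
This talk covered deploying the Cluster Autoscaler with Terraform and Helmfile, and monitoring it with Prometheus and Grafana.

* Company names, product names, and service names mentioned here are registered trademarks or trademarks of their respective owners.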