HPA autoscaling/v2beta2 の機能解説と Datadog を利用した HPA External Metrics の活用事例 / Introduction to the feature of HPA autoscaling/v2beta2 and examples of using HPA External Metrics with Datadog

Introduction to the feature of HPA autoscaling/v2beta2 and examples of using HPA External Metrics with Datadog
Takeshi Kondo / @chaspy
2021/01/21 Kubernetes Meetup Tokyo #38

#k8sjp

Who am I
Takeshi Kondo (chaspy / chaspy_)
Lead Software Engineer, Site Reliability at Quipper

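About today's talk
• Audience
  • People who do not know Kubernetes HPA at all
  • People who use Kubernetes HPA autoscaling/v1 but do not know v2
  • People who want to drive Kubernetes HPA with Datadog metrics
• Goals
  • Learn the basic features of HPA v1/v2
  • Take away hints for your own company from a case study that combines HPA with Datadog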

Agenda
• Kubernetes HPA feature walkthrough
• Case study: using Datadog custom metrics as HPA External metrics

What is HPA?

Horizontal Pod Autoscaler

How does the Horizontal Pod Autoscaler work? https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

Check blog post https://quipper.hatenablog.com/entry/2020/04/10/hpa

Two api versions are available
• autoscaling/v1
• autoscaling/v2beta2
The fields have changed between v1 and v2. Let's check out v1 first to learn the basic algorithm.
Note: autoscaling/v2beta1 is deprecated in v1.19 https://v1-19.docs.kubernetes.io/docs/setup/release/notes/#deprecation

autoscaling/v1

autoscaling/v1: scaleTargetRef <Object> -required-
Reference to scaled resource; horizontal pod autoscaler will learn the current resource consumption and will set the desired number of pods by using its Scale subresource.

autoscaling/v1: minReplicas <integer>
minReplicas is the lower limit for the number of replicas to which the autoscaler can scale down. It defaults to 1 pod. minReplicas is allowed to be 0 if the alpha feature gate HPAScaleToZero is enabled and at least one Object or External metric is configured. Scaling is active as long as at least one metric value is available.
https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
The HPAScaleToZero feature gate is available from 1.16.

autoscaling/v1: maxReplicas <integer> -required-
Upper limit for the number of pods that can be set by the autoscaler; cannot be smaller than MinReplicas.

autoscaling/v1: targetCPUUtilizationPercentage <integer>
Target average CPU utilization (represented as a percentage of requested CPU) over all the pods; if not specified the default autoscaling policy will be used.

Algorithm Details
desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
Ref: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details
Example: Pod = 3, targetCPUUtilizationPercentage = 100
• Pod1 CPUUtilization 100%
• Pod2 CPUUtilization 150%
• Pod3 CPUUtilization 200%
currentMetricValue = 150 ((100 + 150 + 200) / 3)
desiredMetricValue = 100
currentReplicas = 3
desiredReplicas = ceil(3 * (150 / 100)) = ceil(4.5) = 5
Of course, the same calculation applies when scaling down.
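
The formula above can be tried out with a minimal Python sketch. The function name `desired_replicas` is made up for illustration, and the sketch deliberately ignores details of the real controller such as the tolerance band, pod readiness, and the minReplicas/maxReplicas clamp.

```python
import math


def desired_replicas(current_replicas, current_metric, desired_metric):
    """Basic HPA formula: ceil(currentReplicas * currentMetricValue / desiredMetricValue)."""
    return math.ceil(current_replicas * (current_metric / desired_metric))


# Values from the slide: three pods at 100%, 150% and 200% CPU utilization.
pod_cpu_utilization = [100, 150, 200]
current_metric = sum(pod_cpu_utilization) / len(pod_cpu_utilization)  # average = 150.0

print(desired_replicas(len(pod_cpu_utilization), current_metric, 100))  # 5

# The same formula also scales down: an average of 50% against a target of 100%.
print(desired_replicas(3, 50, 100))  # 2
```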

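autoscaling/v1: targetCPUUtilizationPercentage
If the CPU request is 100m and targetCPUUtilizationPercentage is 100, HPA adds or removes replicas so that the average CPU usage per Pod approaches 100m.
(targetCPUUtilizationPercentage is a percentage of the request.)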

Two api versions are available: autoscaling/v1 and autoscaling/v2beta2. New features are available in v2.

New features in v2
• Support for multiple metrics
• Support for custom metrics
• Support for configurable scaling behavior

Support for multiple metrics
• The Horizontal Pod Autoscaler controller will evaluate each metric, and propose a new scale based on that metric. The largest of the proposed scales will be used as the new scale.
Ref: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-multiple-metrics

Support for multiple metrics
Ref: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-multiple-metrics
Current Pod = 3
• resource.name = cpu, target.averageUtilization = 100
• resource.name = memory, target.averageUtilization = 100
Desired replicas
• Based on CPU: 5
• Based on memory: 2
HPA adopts 5 (the larger value wins).
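
A minimal Python sketch of the "largest proposal wins" rule. The slide only gives the resulting proposals (5 and 2), so the current utilization figures used here (150% CPU, 50% memory) are assumed values chosen to reproduce them, and the function names are hypothetical.

```python
import math


def propose(current_replicas, current_value, target_value):
    """Per-metric proposal using the basic HPA formula."""
    return math.ceil(current_replicas * current_value / target_value)


def desired_for_metrics(current_replicas, metrics):
    """metrics maps a metric name to (current average, target). Returns (chosen, proposals)."""
    proposals = {
        name: propose(current_replicas, current, target)
        for name, (current, target) in metrics.items()
    }
    # Each metric proposes a scale; the largest proposal is used.
    return max(proposals.values()), proposals


# Assumed current values: 150% CPU and 50% memory, both against a target of 100.
chosen, proposals = desired_for_metrics(3, {"cpu": (150, 100), "memory": (50, 100)})
print(proposals)  # {'cpu': 5, 'memory': 2}
print(chosen)     # 5
```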

Support for custom metrics
• Kubernetes then queries the new custom metrics API to fetch the values of the appropriate custom metrics.
• By default, the HorizontalPodAutoscaler controller retrieves metrics from a series of APIs. In order for it to access these APIs, cluster administrators must ensure that:
  • The API aggregation layer is enabled.
  • The corresponding APIs are registered:
    • For resource metrics, this is the metrics.k8s.io API, generally provided by metrics-server...
    • For custom metrics, this is the custom.metrics.k8s.io API. It's provided by "adapter" API servers provided by metrics solution vendors...
    • For external metrics, this is the external.metrics.k8s.io API....
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-custom-metrics
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-metrics-apis

How to get the resource metrics?
[Diagram: HPA Controller, API Server, Metrics Server. Request: "Tell me cpu utilization of the pod"]

$ kubectl get apiservice v1beta1.metrics.k8s.io -o yaml

    apiVersion: apiregistration.k8s.io/v1
    kind: APIService
    metadata:
      labels:
        app.kubernetes.io/instance: metrics-server
        k8s-app: metrics-server
      name: v1beta1.metrics.k8s.io
    spec:
      group: metrics.k8s.io
      groupPriorityMinimum: 100
      insecureSkipTLSVerify: true
      service:
        name: metrics-server
        namespace: kube-system
        port: 443
      version: v1beta1
      versionPriority: 100

How to get the custom/external metrics?
[Diagram: HPA Controller, API Server (Aggregation Layer), Custom Metrics Server, External Metrics Server. Request: "Tell me the value of your custom/external metric"]

    apiVersion: apiregistration.k8s.io/v1
    kind: APIService
    metadata:
      labels:
        app.kubernetes.io/instance: datadog
        helm.sh/chart: datadog
      name: v1beta1.external.metrics.k8s.io
    spec:
      group: external.metrics.k8s.io
      groupPriorityMinimum: 100
      insecureSkipTLSVerify: true
      service:
        name: datadog-custom-metrics-server
        namespace: monitor
        port: 443
      version: v1beta1
      versionPriority: 100

Support for configurable scaling behavior (v1.18~)
• Scaling Policies
  • The amount of change when scaling up or down can be limited
  • Multiple policies can be defined; selectPolicy decides which one is applied
  • The default is Max

    behavior:
      scaleDown:
        policies:
        - type: Pods
          value: 4
          periodSeconds: 60
        - type: Percent
          value: 10
          periodSeconds: 60

  • When the number of pods is more than 40 the second policy will be used for scaling down. For instance, if there are 80 replicas and the target has to be scaled down to 10 replicas, then during the first step 8 replicas will be reduced.
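
To see why 40 pods is the crossover point, here is a simplified Python model of the two scaleDown policies above. The function `max_scale_down` is hypothetical and only computes the allowed change for one period; it does not model how the real controller tracks changes over periodSeconds. Rounding the Percent policy up is an assumption of this sketch, chosen to match the worked example in the Kubernetes documentation.

```python
import math

# The two scaleDown policies from the slide.
POLICIES = [
    {"type": "Pods", "value": 4, "periodSeconds": 60},
    {"type": "Percent", "value": 10, "periodSeconds": 60},
]


def max_scale_down(current_replicas, policies, select_policy="Max"):
    """Largest (or smallest, for Min) number of pods that may be removed in one period."""
    allowed = []
    for policy in policies:
        if policy["type"] == "Pods":
            allowed.append(policy["value"])
        elif policy["type"] == "Percent":
            # Assumption: a fractional pod count is rounded up.
            allowed.append(math.ceil(current_replicas * policy["value"] / 100))
    pick = max if select_policy == "Max" else min
    return pick(allowed)


print(max_scale_down(80, POLICIES))  # 8: 10% of 80 beats the fixed 4 pods
print(max_scale_down(30, POLICIES))  # 4: below 40 pods the Pods policy wins
```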

Support for configurable scaling behavior (v1.18~)
• Stabilization Window
  • The stabilization window is used to restrict the flapping of replicas when the metrics used for scaling keep fluctuating. The stabilization window is used by the autoscaling algorithm to consider the computed desired state from the past to prevent scaling

    scaleDown:
      stabilizationWindowSeconds: 300
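
A rough Python sketch of the scale-down stabilization window: among the desired-replica values computed during the window, the highest one is used, so a brief dip in the metric does not immediately remove pods. The function name and the `(timestamp, replicas)` history format are assumptions for illustration only.

```python
def stabilized_scale_down(recommendations, now, window_seconds=300):
    """recommendations: list of (timestamp_seconds, desired_replicas), including the latest one.

    Returns the highest recommendation seen within the stabilization window.
    """
    recent = [replicas for ts, replicas in recommendations if now - ts <= window_seconds]
    return max(recent)


# Hypothetical history: the metric dips at t=400, but a higher value was
# recommended at t=120, which is still inside the 300-second window.
history = [(0, 10), (60, 6), (120, 8), (400, 5)]
print(stabilized_scale_down(history, now=400))  # 8
```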

Default behavior

    behavior:
      scaleDown:
        stabilizationWindowSeconds: 300
        policies:
        - type: Percent
          value: 100
          periodSeconds: 15
      scaleUp:
        stabilizationWindowSeconds: 0
        policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
        selectPolicy: Max

Two api versions are available: autoscaling/v1 and autoscaling/v2beta2. The fields have changed between v1 and v2.

autoscaling/v2beta2

    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: api
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: api
      minReplicas: 40
      maxReplicas: 1000
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 60
      - type: External
        external:
          metric:
            name: datadogmetric@production:timed-exam
          target:
            type: AverageValue
            averageValue: 1

Notes on the manifest
• scaleTargetRef, minReplicas, maxReplicas: same as autoscaling/v1
• metrics: multiple metrics!

metrics[].type: these 4 types are available: resource, pods, object, external
• resource: cpu or memory. Provided by metrics-server.
• pods: metrics about pods. Provided by a custom metrics server.
• object: metrics about objects other than Pods. Provided by a custom metrics server.
• external: metrics from outside the cluster. Provided by an external metrics server.

These 3 fields are available in metrics.<metrics type>.target
• averageUtilization
• averageValue
• value

averageUtilization
averageUtilization is the target value of the average of the resource metric across all relevant pods, represented as a percentage of the requested value of the resource for the pods. Currently only valid for Resource metric source type.
In short: only usable with the Resource type. The target is the average Pod usage expressed as a ratio of the request.

averageValue
averageValue is the target value of the average of the metric across all relevant pods (as a quantity).
In short: the target is the obtained value divided by the number of Pods.

value <string>
value is the target value of the metric (as a quantity).

Agenda
• Kubernetes HPA feature walkthrough
• Case study: using Datadog custom metrics as HPA External metrics

Check blog post https://quipper.hatenablog.com/entry/2020/11/30/scheduled-scaling-with-hpa

Background
• Term exams in the Philippines were held on Quipper
• A burst of simultaneous access around exam start time took the service down
Teachers uploaded exams and registered the class and time slot. Some students were able to take an exam, but others were not.

Why did it happen?
• HPA alone cannot react in time
• Scaling out means adding Pods first and then adding Nodes
• Autoscaling is weak against spikes in traffic

For the time being
• Got through it by running more Pods during the daytime

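What to do?
• Cover it with operations
  • Ask teachers to create the "Exam" data (write the exam questions) at least 24 hours before the exam date
  • Limit the number of simultaneous examinees
  • Ask schools to stagger exams within the school as much as possible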

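What to do?
• Solve it with engineering
  • If exams are created in advance, the start time and number of examinees of each exam are known beforehand
  • Why not scale the servers based on those numbers?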

The number of examinees and the time exist in the database (teacher uploaded exams)

    Time slot      School     Examinees
    10:00 - 11:00  A school   18
    11:00 - 12:00  B school   30
    13:00 - 14:00  C school   500

We want to scale the servers based on time and number of examinees

    Time slot      School     Examinees
    10:00 - 11:00  A school   18
    11:00 - 12:00  B school   30
    13:00 - 14:00  C school   500

Scheduled-Scaling: desired replicas

    Time    #pods
    10:00   90
    11:00   270
    13:00   2500

How to solve?
1. Fetch the number of examinees and the times from the database
2. Send the number of examinees per time slot to Datadog as custom metrics
3. Use them from HPA as external metrics

Architecture
[Diagram: ConfigMap, Pod: timed-exam-schedule-exporter, HPA Controller, Deployment, Datadog, Cluster-agent, daemonset: datadog-agent; steps ①, ②, ③]

1. Fetch the number of examinees and the times from the database
[Diagram: ConfigMap, Pod: timed-exam-schedule-exporter, HPA Controller, Deployment, Datadog, Cluster-agent, daemonset: datadog-agent]

2020-01-20.tsv:

    12:00 229
    12:15 54
    12:45 67
    13:00 3684
    13:15 91
    13:30 4821
    13:45 37
    14:00 138

The WebDev team quickly wrote a Ruby batch job for this.

2. Send the time-series data to Datadog as custom metrics
[Diagram: ConfigMap (2020-01-20.tsv), file mount, Pod: timed-exam-schedule-exporter, HPA Controller, Deployment, Datadog, Cluster-agent, daemonset: datadog-agent]
• timed-exam-schedule-exporter is a newly developed component, written in Go.
• In an infinite loop it reads the tsv and exports the value for 15 minutes after the current time in Prometheus format.
• Datadog's Kubernetes Integration Autodiscovery picks up the metrics.
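
The lookup the exporter performs can be sketched roughly as follows. The real component is written in Go and its source is not shown in the slides, so this Python version is only an illustration: the 15-minute slot rounding, the fallback to 0 for missing slots, and all function names are assumptions. The metric name is inferred from the Datadog query used later in the talk.

```python
from datetime import datetime, timedelta

# Part of the tsv from the slide (time, number of examinees).
SAMPLE_TSV = "12:00\t229\n12:15\t54\n12:45\t67\n13:00\t3684\n13:15\t91\n13:30\t4821\n"


def parse_schedule(text):
    """Parse 'HH:MM<whitespace>count' lines into a dict; other lines are skipped."""
    schedule = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            schedule[parts[0]] = int(parts[1])
    return schedule


def scheduled_value(schedule, now, lead=timedelta(minutes=15), slot_minutes=15):
    """Value scheduled for `lead` after `now`, rounded down to the slot start (assumed)."""
    target = now + lead
    slot = target.replace(minute=target.minute - target.minute % slot_minutes,
                          second=0, microsecond=0)
    return schedule.get(slot.strftime("%H:%M"), 0)  # assumption: missing slot means 0


def render_prometheus(value, name="timed_exam_scheduled_scaling_desired_replicas"):
    """Render a single gauge in the Prometheus text exposition format."""
    return f"# TYPE {name} gauge\n{name} {value}\n"


schedule = parse_schedule(SAMPLE_TSV)
value = scheduled_value(schedule, datetime(2020, 1, 20, 12, 50))  # looks at the 13:00 slot
print(render_prometheus(value), end="")
```

Exporting the value 15 minutes ahead gives the Pods (and, if needed, new Nodes) time to start before the exam begins.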

Kubernetes Integration Autodiscovery

    annotations:
      ad.datadoghq.com/timed-exam-schedule-exporter.check_names: |
        ["prometheus"]
      ad.datadoghq.com/timed-exam-schedule-exporter.init_configs: |
        [{}]
      ad.datadoghq.com/timed-exam-schedule-exporter.instances: |
        [
          {
            "prometheus_url": "http://%%host%%:8080/metrics",
            "namespace": "timed_exam",
            "metrics": ["*"]
          }
        ]

https://docs.datadoghq.com/agent/kubernetes/integrations/?tab=kubernetes

3. Use it from HPA as external metrics
[Diagram: ConfigMap, Pod: timed-exam-schedule-exporter, HPA Controller, Deployment, Datadog, Cluster-agent, daemonset: datadog-agent]
• Use the DatadogMetric custom resource
• Reference it from HPA autoscaling/v2beta2 with the External type

DatadogMetric

    apiVersion: datadoghq.com/v1alpha1
    kind: DatadogMetric
    metadata:
      name: timed-exam
    spec:
      # throughput: 10 = 5000 / 500. 500 pods accept 5000 users.
      # ref: https://github.com/quipper/quipper/issues/26054
      query: ceil(max:timed_exam.timed_exam_scheduled_scaling_desired_replicas{environment:production}/10)

• Datadog queries can be used as they are. Writing less code is great.
• This query converts the number of examinees into desired replicas (it divides by a factor of 10).
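
The arithmetic done by the query can be checked with a short Python sketch. The function name is hypothetical; the factor 10 comes from the throughput comment in the manifest (500 pods accept 5000 users), and the sample values are the examinee counts from the 2020-01-20.tsv slide.

```python
import math


def examinees_to_replicas(examinees, users_per_pod=10):
    """Mirror of the query's ceil(<examinees> / 10) step."""
    return math.ceil(examinees / users_per_pod)


# A few examinee counts from the tsv shown earlier in the talk.
for time_slot, examinees in [("12:00", 229), ("12:15", 54), ("13:00", 3684), ("13:30", 4821)]:
    print(time_slot, examinees, examinees_to_replicas(examinees))
```

Note that large slots such as 13:30 produce a value above the HPA's maxReplicas of 1000, so the final replica count is still capped by the HPA itself.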

HorizontalPodAutoscaler

    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: api
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: api
      minReplicas: 40
      maxReplicas: 1000
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 60 # want 570 mcore of cpu usage. 570 / 950(requests) = 0.6
      - type: External
        external:
          metric:
            name: datadogmetric@production:timed-exam
          target:
            type: AverageValue
            averageValue: 1

Notes on the manifest
• type: Resource (cpu): the HPA setting based on CPU utilization that already existed
• type: External: the part added this time; it specifies the DatadogMetric as an External metric
• averageValue: HPA aims to bring the target metric divided by the total number of Pods close to this value. If the number obtained from the DatadogMetric is 100, HPA tries to grow to 100 Pods. In other words, the metric expresses the desired replicas.
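
Putting the two metrics together, the behavior of this HPA can be approximated with the following Python sketch. It is a simplified model, not the controller's actual code: the function names are hypothetical, and tolerance, stabilization, and scaling policies are ignored. The defaults (target 60, averageValue 1, minReplicas 40, maxReplicas 1000) are taken from the manifest.

```python
import math


def cpu_proposal(current_replicas, avg_cpu_utilization, target_utilization=60):
    """Resource metric: scale so that average utilization approaches the target."""
    return math.ceil(current_replicas * avg_cpu_utilization / target_utilization)


def external_proposal(metric_value, average_value=1):
    """External metric with AverageValue: metric / replicas should approach averageValue."""
    return math.ceil(metric_value / average_value)


def desired(current_replicas, avg_cpu_utilization, datadog_metric_value,
            min_replicas=40, max_replicas=1000):
    """Largest proposal wins, then the result is clamped to [minReplicas, maxReplicas]."""
    proposal = max(cpu_proposal(current_replicas, avg_cpu_utilization),
                   external_proposal(datadog_metric_value))
    return min(max(proposal, min_replicas), max_replicas)


print(desired(40, 30, 100))   # 100: the schedule-based metric asks for 100 Pods
print(desired(100, 90, 23))   # 150: outside exam time, CPU drives the scale
```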

Result
The yellow line shows scaling that follows the exam schedule; the rest of the time, CPU-based scaling is working!

Result
The difference between the purple and blue areas is the cost that was saved. The estimate is a reduction of about $3150 per month.

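Summary
• HPA autoscaling/v2beta2 supports multiple metrics
• With multiple metrics, several conditions are evaluated and the safer (higher) value is adopted
• HPA autoscaling/v2beta2 supports external metrics
• Being able to use the metrics and queries already in Datadog is convenient
• By sending custom metrics to Datadog, you can scale to any size at any time, which is convenient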

Thank you!
Takeshi Kondo (chaspy / chaspy_)
Lead Software Engineer, Site Reliability at Quipper
