Slide 1

Slide 1 text

Introduction to the features of HPA autoscaling/v2beta2 and examples of using HPA External Metrics with Datadog Takeshi Kondo / @chaspy 2021/01/21 Kubernetes Meetup Tokyo #38

Slide 2

Slide 2 text

#k8sjp

Slide 3

Slide 3 text

Who am I chaspy chaspy_ Lead Software Engineer Site Reliability at Quipper Takeshi Kondo

Slide 4

Slide 4 text

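About today's talk • Audience • People who know nothing about Kubernetes HPA • People who use Kubernetes HPA autoscaling/v1 but do not know v2 • People who want to drive Kubernetes HPA with Datadog metrics • Goals • Learn the basic features of HPA v1/v2 • Get hints for using HPA at your own company from a case study that combines it with Datadog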

Slide 5

Slide 5 text

Agenda • Kubernetes HPA feature overview • Case study: using Datadog Custom Metrics as HPA External metrics

Slide 6

Slide 6 text

Agenda • Kubernetes HPA feature overview • Case study: using Datadog Custom Metrics as HPA External metrics

Slide 7

Slide 7 text

What is HPA?

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

Horizontal Pod Autoscaler

Slide 10

Slide 10 text

How does the Horizontal Pod Autoscaler work? https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

Slide 11

Slide 11 text

Check blog post https://quipper.hatenablog.com/entry/2020/04/10/hpa

Slide 12

Slide 12 text

Two api versions are available autoscaling/v1 autoscaling/v2beta2

Slide 13

Slide 13 text

Two api versions are available autoscaling/v1 autoscaling/v2beta2 The fields have changed between v1 and v2

Slide 14

Slide 14 text

Two api versions are available autoscaling/v1 autoscaling/v2beta2 Let's check out v1 first to learn the basic algorithm Note: autoscaling/v2beta1 is deprecated in v1.19 https://v1-19.docs.kubernetes.io/docs/setup/release/notes/#deprecation

Slide 15

Slide 15 text

autoscaling/v1

Slide 16

Slide 16 text

autoscaling/v1 scaleTargetRef -required reference to scaled resource; horizontal pod autoscaler will learn the current resource consumption and will set the desired number of pods by using its Scale subresource.

Slide 17

Slide 17 text

autoscaling/v1 minReplicas minReplicas is the lower limit for the number of replicas to which the autoscaler can scale down. It defaults to 1 pod. minReplicas is allowed to be 0 if the alpha feature gate HPAScaleToZero is enabled and at least one Object or External metric is configured. Scaling is active as long as at least one metric value is available. https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/ The HPAScaleToZero feature gate is available from 1.16.

Slide 18

Slide 18 text

autoscaling/v1 maxReplicas -required- upper limit for the number of pods that can be set by the autoscaler; cannot be smaller than MinReplicas.

Slide 19

Slide 19 text

autoscaling/v1 targetCPUUtilizationPercentage target average CPU utilization (represented as a percentage of requested CPU) over all the pods; if not specified the default autoscaling policy will be used.

Slide 20

Slide 20 text

Algorithm Details desiredReplicas = ceil [currentReplicas * ( currentMetricValue / desiredMetricValue )] Ref: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details Pod = 3, targetCPUUtilizationPercentage = 100 Pod1 CPUUtilization 100% Pod2 CPUUtilization 150% Pod3 CPUUtilization 200%

Slide 21

Slide 21 text

Algorithm Details desiredReplicas = ceil [currentReplicas * ( currentMetricValue / desiredMetricValue )] Ref: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details Pod = 3, targetCPUUtilizationPercentage = 100 Pod1 CPUUtilization 100% Pod2 CPUUtilization 150% Pod3 CPUUtilization 200% currentMetricValue = 150 ((100 + 150 + 200) / 3) desiredMetricValue = 100 Current Replicas = 3 desiredReplicas = ceil(3 * (150 / 100)) = ceil(4.5) = 5

Slide 22

Slide 22 text

Algorithm Details desiredReplicas = ceil [currentReplicas * ( currentMetricValue / desiredMetricValue )] Ref: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details Pod = 3, targetCPUUtilizationPercentage = 100 Pod1 CPUUtilization 100% Pod2 CPUUtilization 150% Pod3 CPUUtilization 200% currentMetricValue = 150 ((100 + 150 + 200) / 3) desiredMetricValue = 100 Current Replicas = 3 desiredReplicas = ceil(3 * (150 / 100)) = ceil(4.5) = 5 Of course, the same calculation applies when scaling down.
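The formula on this slide can be checked with a few lines of Python. This is only a sketch of the arithmetic, not the controller's code: the function name is hypothetical, and the real HPA controller additionally skips scaling when the ratio is within a tolerance (0.1 by default) and clamps the result to minReplicas/maxReplicas.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, desired_metric: float) -> int:
    """desiredReplicas = ceil(currentReplicas * (currentMetricValue / desiredMetricValue))."""
    return math.ceil(current_replicas * (current_metric / desired_metric))


# Values from the slide: three pods at 100%, 150% and 200% CPU utilization.
cpu_utilization = [100, 150, 200]
current_metric = sum(cpu_utilization) / len(cpu_utilization)

scale_up = desired_replicas(3, current_metric, 100)
# Hypothetical scale-down case: the same three pods averaging 50% utilization.
scale_down = desired_replicas(3, 50, 100)
print(scale_up, scale_down)
```

The same function covers both directions: a ratio above 1 adds replicas and a ratio below 1 removes them.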

Slide 23

Slide 23 text

autoscaling/v1 targetCPUUtilizationPercentage target average CPU utilization (represented as a percentage of requested CPU) over all the pods; if not specified the default autoscaling policy will be used.

Slide 24

Slide 24 text

autoscaling/v1 If the CPU request is 100m core, the HPA increases or decreases replicas so that the average CPU usage of the Pods becomes 100m core, assuming a target of 100 (targetCPUUtilizationPercentage is a percentage of the request)
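To make "percentage of the request" concrete, here is a minimal sketch of how a utilization figure follows from usage and request. The function name and the 150m usage value are illustrative assumptions, not taken from the talk.

```python
def cpu_utilization_percent(usage_millicores: float, request_millicores: float) -> float:
    """Utilization as the HPA sees it: usage relative to the pod's CPU request."""
    return usage_millicores / request_millicores * 100


# A pod requesting 100m that actually uses 150m is at 150% utilization,
# even though it may be far below the node's capacity.
over_request = cpu_utilization_percent(150, 100)
at_request = cpu_utilization_percent(100, 100)
print(over_request, at_request)
```

This is why the request value matters for autoscaling: changing the request changes the utilization the HPA computes for the same absolute usage.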

Slide 25

Slide 25 text

Two api versions are available autoscaling/v1 autoscaling/v2beta2 New features are available in v2 Note: autoscaling/v2beta1 is deprecated in v1.19 https://v1-19.docs.kubernetes.io/docs/setup/release/notes/#deprecation

Slide 26

Slide 26 text

New features in v2 • Support for multiple metrics • Support for custom metrics • Support for configurable scaling behavior

Slide 27

Slide 27 text

New features in v2 • Support for multiple metrics • Support for custom metrics • Support for configurable scaling behavior

Slide 28

Slide 28 text

Support for multiple metrics • the Horizontal Pod Autoscaler controller will evaluate each metric, and propose a new scale based on that metric. The largest of the proposed scales will be used as the new scale. Ref: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-multiple-metrics

Slide 29

Slide 29 text

Support for multiple metrics Ref: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-multiple-metrics Current Pod = 3 resource.name = cpu target.averageUtilization = 100 resource.name = memory target.averageUtilization = 100 Desired replicas: 5 based on CPU, 2 based on memory. HPA adopts 5 (the larger value is adopted).
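A small sketch of the "largest proposal wins" rule. The current utilization values (150% CPU, 60% memory) are hypothetical, chosen so that the per-metric proposals come out as the 5 and 2 shown on the slide.

```python
import math


def propose(current_replicas: int, current_value: float, target_value: float) -> int:
    """Per-metric proposal using the basic HPA formula."""
    return math.ceil(current_replicas * (current_value / target_value))


current_replicas = 3
proposals = {
    "cpu": propose(current_replicas, 150, 100),    # hypothetical average CPU utilization
    "memory": propose(current_replicas, 60, 100),  # hypothetical average memory utilization
}

# The HPA evaluates every metric and uses the largest proposed scale.
chosen = max(proposals.values())
print(proposals, chosen)
```

Taking the maximum means no single metric can pull the replica count below what another metric needs.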

Slide 30

Slide 30 text

New features in v2 • Support for multiple metrics • Support for custom metrics • Support for configurable scaling behavior

Slide 31

Slide 31 text

Support for custom metrics
• Kubernetes then queries the new custom metrics API to fetch the values of the appropriate custom metrics.
• By default, the HorizontalPodAutoscaler controller retrieves metrics from a series of APIs. In order for it to access these APIs, cluster administrators must ensure that:
  • The API aggregation layer is enabled.
  • The corresponding APIs are registered:
    • For resource metrics, this is the metrics.k8s.io API, generally provided by metrics-server...
    • For custom metrics, this is the custom.metrics.k8s.io API. It's provided by "adapter" API servers provided by metrics solution vendors...
    • For external metrics, this is the external.metrics.k8s.io API....
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-custom-metrics
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-metrics-apis

Slide 32

Slide 32 text

How to get the resource metrics? HPA Controller API Server Metrics Server Tell me cpu utilization of the pod

Slide 33

Slide 33 text

How to get the resource metrics? HPA Controller API Server Metrics Server Tell me cpu utilization of the pod
$ kubectl get apiservice v1beta1.metrics.k8s.io -o yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    app.kubernetes.io/instance: metrics-server
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
    port: 443
  version: v1beta1
  versionPriority: 100

Slide 34

Slide 34 text

How to get the custom/external metrics? HPA Controller API Server Custom Metrics Server Tell me the value of your custom/external metric Aggregation Layer External Metrics Server

Slide 35

Slide 35 text

How to get the custom/external metrics? HPA Controller API Server Custom Metrics Server Tell me the value of your custom/external metric Aggregation Layer External Metrics Server
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    app.kubernetes.io/instance: datadog
    helm.sh/chart: datadog
  name: v1beta1.external.metrics.k8s.io
spec:
  group: external.metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: datadog-custom-metrics-server
    namespace: monitor
    port: 443
  version: v1beta1
  versionPriority: 100

Slide 36

Slide 36 text

New features in v2 • Support for multiple metrics • Support for custom metrics • Support for configurable scaling behavior

Slide 37

Slide 37 text

Support for configurable scaling behavior (v1.18~) • Scaling Policies • The amount of change on scale-up and scale-down can be limited • Multiple policies can be defined, and one is chosen according to selectPolicy • The default is Max
behavior:
  scaleDown:
    policies:
    - type: Pods
      value: 4
      periodSeconds: 60
    - type: Percent
      value: 10
      periodSeconds: 60

Slide 38

Slide 38 text

Support for configurable scaling behavior (v1.18~) • Scaling Policies • When the number of pods is more than 40 the second policy will be used for scaling down. For instance if there are 80 replicas and the target has to be scaled down to 10 replicas then during the first step 8 replicas will be reduced
behavior:
  scaleDown:
    policies:
    - type: Pods
      value: 4
      periodSeconds: 60
    - type: Percent
      value: 10
      periodSeconds: 60
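The interaction of the two policies can be sketched as follows. This is a simplified model under stated assumptions: the function name is hypothetical, the percentage is rounded up, and the real controller also tracks how many replicas were already removed within each periodSeconds window.

```python
import math

# The two scaleDown policies from the slide.
POLICIES = [
    {"type": "Pods", "value": 4, "periodSeconds": 60},
    {"type": "Percent", "value": 10, "periodSeconds": 60},
]


def max_scale_down(current_replicas: int, policies, select_policy: str = "Max") -> int:
    """How many replicas may be removed in one period (simplified)."""
    allowed = []
    for policy in policies:
        if policy["type"] == "Pods":
            allowed.append(policy["value"])
        elif policy["type"] == "Percent":
            allowed.append(math.ceil(current_replicas * policy["value"] / 100))
    # selectPolicy: Max picks the policy allowing the largest change, Min the smallest.
    return max(allowed) if select_policy == "Max" else min(allowed)


first_step = max_scale_down(80, POLICIES)   # 10% of 80 beats the 4-pod policy
small_fleet = max_scale_down(30, POLICIES)  # 10% of 30 is 3, so the 4-pod policy wins
print(first_step, small_fleet)
```

With selectPolicy Max, the Percent policy dominates above 40 replicas and the Pods policy dominates below, which matches the "more than 40" remark on the slide.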

Slide 39

Slide 39 text

Support for configurable scaling behavior (v1.18~) • Stabilization Window • The stabilization window is used to restrict the flapping of replicas when the metrics used for scaling keep fluctuating. The stabilization window is used by the autoscaling algorithm to consider the computed desired state from the past to prevent scaling
scaleDown:
  stabilizationWindowSeconds: 300
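A minimal sketch of the idea behind a scale-down stabilization window: remember recent recommendations and act on the highest one still inside the window, so a brief dip in the metric does not remove pods immediately. The class and method names are hypothetical, and the real controller handles more cases (for example the scale-up window).

```python
from collections import deque


class ScaleDownStabilizer:
    """Keeps recent recommendations and returns the highest one in the window."""

    def __init__(self, window_seconds: int = 300):
        self.window_seconds = window_seconds
        self.history = deque()  # (timestamp_seconds, recommended_replicas)

    def stabilize(self, now: float, recommendation: int) -> int:
        self.history.append((now, recommendation))
        # Drop recommendations that are older than the window.
        while self.history and self.history[0][0] < now - self.window_seconds:
            self.history.popleft()
        return max(replicas for _, replicas in self.history)


stabilizer = ScaleDownStabilizer(window_seconds=300)
at_start = stabilizer.stabilize(0, 10)      # metric is high: 10 replicas
during_dip = stabilizer.stabilize(60, 4)    # dip to 4, but 10 is still in the window
after_window = stabilizer.stabilize(301, 4) # the old 10 has expired
print(at_start, during_dip, after_window)
```

Setting stabilizationWindowSeconds to 0 in this model makes every recommendation take effect immediately.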

Slide 40

Slide 40 text

Default behavior
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
    - type: Pods
      value: 4
      periodSeconds: 15
    selectPolicy: Max

Slide 41

Slide 41 text

Two api versions are available autoscaling/v1 autoscaling/v2beta2 The fields have changed between v1 and v2

Slide 42

Slide 42 text

autoscaling/v2beta2
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 40
  maxReplicas: 1000
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: External
    external:
      metric:
        name: datadogmetric@production:timed-exam
      target:
        type: AverageValue
        averageValue: 1

Slide 43

Slide 43 text

autoscaling/v2beta2 (the same manifest as on Slide 42) Same as autoscaling/v1

Slide 44

Slide 44 text

autoscaling/v2beta2 (the same manifest as on Slide 42) Multiple metrics!

Slide 45

Slide 45 text

autoscaling/v2beta2 (the same manifest as on Slide 42) These 4 types are available: resource, pods, object, external

Slide 46

Slide 46 text

autoscaling/v2beta2 (the same manifest as on Slide 42)
resource: cpu or memory. Provided by metrics-server.
pods: metrics about pods. Provided by a custom metrics server.
object: metrics about objects other than Pods. Provided by a custom metrics server.
external: metrics from outside the cluster. Provided by an external metrics server.

Slide 47

Slide 47 text

autoscaling/v2beta2 (the same manifest as on Slide 42) These 3 fields are available in metrics..target • averageUtilization • averageValue • value

Slide 48

Slide 48 text

autoscaling/v2beta2 (the same manifest as on Slide 42) averageUtilization: averageUtilization is the target value of the average of the resource metric across all relevant pods, represented as a percentage of the requested value of the resource for the pods. Currently only valid for Resource metric source type

Slide 49

Slide 49 text

autoscaling/v2beta2 (the same manifest as on Slide 42) averageUtilization: can only be used with the Resource type. The target is the average across Pods of the usage as a ratio of the request.

Slide 50

Slide 50 text

autoscaling/v2beta2 (the same manifest as on Slide 42) averageValue: averageValue is the target value of the average of the metric across all relevant pods (as a quantity)

Slide 51

Slide 51 text

autoscaling/v2beta2 (the same manifest as on Slide 42) averageValue: the obtained metric value divided by the number of Pods is compared against this target.

Slide 52

Slide 52 text

autoscaling/v2beta2 (the same manifest as on Slide 42) value: value is the target value of the metric (as a quantity).

Slide 53

Slide 53 text

Agenda • Kubernetes HPA feature overview • Case study: using Datadog Custom Metrics as HPA External metrics

Slide 54

Slide 54 text

Check blog post https://quipper.hatenablog.com/entry/2020/11/30/scheduled-scaling-with-hpa

Slide 55

Slide 55 text

Background • Term exams in the Philippines were held on Quipper • A burst of simultaneous access around the start of the exams brought the service down. Teachers uploaded exams and registered the class and time slot. Some students were able to take an exam but others were not.

Slide 56

Slide 56 text

Why did it happen? • HPA cannot react in time • Scaling out requires adding Pods and then adding Nodes • Autoscaling is weak against access spikes

Slide 57

Slide 57 text

For the time being • We got by with more Pods during the daytime

Slide 58

Slide 58 text

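What to do • Cover it operationally • Ask teachers to create the "Exam" data (the exam questions) no later than 24 hours before the exam day • Limit the number of concurrent exam takers • Ask schools to stagger exam times within the school as much as possible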

Slide 59

Slide 59 text

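What to do • Solve it with engineering • If exams are created in advance, the start time and the number of takers for each exam are known in advance • Why not scale the servers based on those numbers?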

Slide 60

Slide 60 text

The number of examinees and the exam times exist in the database. Teacher uploaded exams:
10:00 - 11:00  A school  18
11:00 - 12:00  B school  30
13:00 - 14:00  C school  500

Slide 61

Slide 61 text

We want to scale the servers based on the time and the number of examinees
10:00 - 11:00  A school  18
11:00 - 12:00  B school  30
13:00 - 14:00  C school  500
Scheduled-Scaling: Desired replicas
Time,#pods
10:00  90
11:00  270
13:00  2500

Slide 62

Slide 62 text

How?

Slide 63

Slide 63 text

How to solve? 1. Fetch the number of examinees and the exam times from the database 2. Send the number of examinees per time slot to Datadog as custom metrics 3. Use them from the HPA as external metrics

Slide 64

Slide 64 text

Architecture Configmap Pod: timed-exam-schedule-exporter HPA Controller Deployment Datadog Cluster-agent daemonset: datadog-agent

Slide 65

Slide 65 text

Architecture Configmap Pod: timed-exam-schedule-exporter HPA Controller Deployment Datadog Cluster-agent daemonset: datadog-agent ① ② ③

Slide 66

Slide 66 text

1. Fetch the number of examinees and the exam times from the database Configmap Pod: timed-exam-schedule-exporter HPA Controller Deployment Datadog Cluster-agent daemonset: datadog-agent
2020-01-20.tsv:
---
12:00 229
12:15 54
12:45 67
13:00 3684
13:15 91
13:30 4821
13:45 37
14:00 138
The WebDev team quickly wrote a Ruby batch job for this.

Slide 67

Slide 67 text

2. Send the time-series data to Datadog as custom metrics Configmap Pod: timed-exam-schedule-exporter HPA Controller Deployment Datadog Cluster-agent daemonset: datadog-agent
2020-01-20.tsv:
---
12:00 229
12:15 54
12:45 67
13:00 3684
13:15 91
13:30 4821
13:45 37
14:00 138
File mount. A newly developed component, written in Go. It reads the tsv in an infinite loop and exports the value for 15 minutes after the current time in Prometheus format. The metrics are ingested through Kubernetes Integration Autodiscovery.
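The exporter's core logic can be sketched as below. This is not the author's Go component: the function names, the treatment of time slots missing from the file (0 here), and the Prometheus metric name (inferred from the Datadog query later in the deck) are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# A few rows in the same shape as the mounted tsv file.
SAMPLE_TSV = "---\n12:00\t229\n12:15\t54\n12:45\t67\n13:00\t3684\n"

# Assumed metric name, inferred from the Datadog query shown later.
METRIC_NAME = "timed_exam_scheduled_scaling_desired_replicas"


def parse_schedule(tsv_text: str) -> dict:
    """Parse 'HH:MM<tab>count' rows into a dict; other lines are skipped."""
    schedule = {}
    for line in tsv_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # blank or separator lines such as '---'
        schedule[parts[0]] = int(parts[1])
    return schedule


def value_in_15_minutes(schedule: dict, now: datetime) -> int:
    """Look up the slot 15 minutes ahead; assume 0 when no exam is scheduled."""
    slot = (now + timedelta(minutes=15)).strftime("%H:%M")
    return schedule.get(slot, 0)


def render_prometheus(value: int) -> str:
    """Render one gauge sample in the Prometheus text exposition format."""
    return f"# TYPE {METRIC_NAME} gauge\n{METRIC_NAME} {value}\n"


schedule = parse_schedule(SAMPLE_TSV)
upcoming = value_in_15_minutes(schedule, datetime(2020, 1, 20, 12, 45))
print(render_prometheus(upcoming))
```

Looking 15 minutes ahead gives the HPA, and any node scale-out it triggers, time to finish before the exam starts.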

Slide 68

Slide 68 text

Kubernetes Integration Autodiscovery
annotations:
  ad.datadoghq.com/timed-exam-schedule-exporter.check_names: |
    ["prometheus"]
  ad.datadoghq.com/timed-exam-schedule-exporter.init_configs: |
    [{}]
  ad.datadoghq.com/timed-exam-schedule-exporter.instances: |
    [
      {
        "prometheus_url": "http://%%host%%:8080/metrics",
        "namespace": "timed_exam",
        "metrics": ["*"]
      }
    ]
https://docs.datadoghq.com/agent/kubernetes/integrations/?tab=kubernetes

Slide 69

Slide 69 text

3. Use them from the HPA as external metrics Configmap Pod: timed-exam-schedule-exporter HPA Controller Deployment Datadog Cluster-agent daemonset: datadog-agent Use the Custom Resource called DatadogMetric, and reference it from HPA autoscaling/v2beta2 with the External type

Slide 70

Slide 70 text

Datadogmetric
apiVersion: datadoghq.com/v1alpha1
kind: DatadogMetric
metadata:
  name: timed-exam
spec:
  # throughput: 10 = 5000 / 500. 500 pods accept 5000 users.
  # ref: https://github.com/quipper/quipper/issues/26054
  query: ceil(max:timed_exam.timed_exam_scheduled_scaling_desired_replicas{environment:production}/10)
Datadog queries can be used as-is, which is great because it reduces the amount of code to write. This query converts the number of examinees into desired replicas (dividing by a factor of 10).
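The conversion done by the query is a single division rounded up. A short sketch of that arithmetic, using the throughput factor from the manifest comment (10 users per pod) and examinee counts from the tsv shown earlier; the function and constant names are hypothetical.

```python
import math

USERS_PER_POD = 10  # from the comment in the manifest: 500 pods accept 5000 users


def replicas_for(examinees: int) -> int:
    """Mirror of ceil(<examinees> / 10) in the DatadogMetric query."""
    return math.ceil(examinees / USERS_PER_POD)


peak = replicas_for(4821)   # the 13:30 slot in the sample tsv
quiet = replicas_for(229)   # the 12:00 slot
print(peak, quiet)
```

Keeping this factor in the query means the capacity assumption can be tuned without redeploying the exporter.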

Slide 71

Slide 71 text

HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 40
  maxReplicas: 1000
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60 # want 570 mcore of cpu usage. 570 / 950(requests) = 0.6
  - type: External
    external:
      metric:
        name: datadogmetric@production:timed-exam
      target:
        type: AverageValue
        averageValue: 1

Slide 72

Slide 72 text

HorizontalPodAutoscaler (the same manifest as on Slide 71) The Resource metric is the pre-existing HPA configuration based on CPU utilization

Slide 73

Slide 73 text

HorizontalPodAutoscaler (the same manifest as on Slide 71) The part added this time: the datadogmetric is specified as an External metric

Slide 74

Slide 74 text

HorizontalPodAutoscaler (the same manifest as on Slide 71) averageValue: the HPA aims for the target metric divided by the total number of Pods to approach this value. If the number obtained from the DatadogMetric is 100, the HPA tries to increase the Pods to 100, i.e. the metric itself expresses the desired replicas.
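Why averageValue: 1 turns the metric into a replica count can be seen from a small calculation. This is a sketch under assumptions: the function names and the CPU-based proposal of 55 are hypothetical, while the min/max bounds are the ones in the manifest.

```python
import math


def external_average_value_replicas(metric_value: float, target_average_value: float) -> int:
    """For an External metric with an AverageValue target: ceil(metric / target per pod)."""
    return math.ceil(metric_value / target_average_value)


def final_replicas(proposals, min_replicas: int, max_replicas: int) -> int:
    """Take the largest proposal, then clamp to minReplicas/maxReplicas."""
    return max(min_replicas, min(max_replicas, max(proposals)))


# DatadogMetric returns 100 and averageValue is 1, so the proposal is 100 pods.
exam_proposal = external_average_value_replicas(100, 1)
cpu_proposal = 55  # hypothetical result of the CPU-based calculation

during_exam = final_replicas([cpu_proposal, exam_proposal], 40, 1000)
no_exam = final_replicas([cpu_proposal, 0], 40, 1000)
print(during_exam, no_exam)
```

When no exam is scheduled the external proposal is small, so the CPU metric (or minReplicas) decides the scale; during exams the scheduled value takes over.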

Slide 75

Slide 75 text

Result: the yellow line shows scaling aligned with the exams, and outside of that, scaling by CPU works!

Slide 76

Slide 76 text

Result: the difference between the purple and blue areas is the cost saved. The estimate is a saving of $3150 per month.

Slide 77

Slide 77 text

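Summary • HPA autoscaling/v2beta2 supports multiple metrics • With multiple metrics, the HPA looks at several conditions and adopts the safer (higher) value • HPA autoscaling/v2beta2 supports external metrics • It is convenient that the metrics / queries already in Datadog can be used • By sending custom metrics to Datadog, you can conveniently scale to any size at any time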

Slide 78

Slide 78 text

Thank you! chaspy chaspy_ Lead Software Engineer Site Reliability at Quipper Takeshi Kondo