HPA autoscaling/v2beta2 の機能解説と Datadog を利用した HPA External Metrics の活用事例 / Introduction to the feature of HPA autoscaling/v2beta2 and examples of using HPA External Metrics with Datadog

Takeshi Kondo
January 21, 2021

Kubernetes meetup tokyo#38

Transcript

  1. Introduction to the feature of HPA autoscaling/v2beta2 and examples of using HPA External Metrics with Datadog

    Takeshi Kondo / @chaspy 2021/01/21 Kubernetes Meetup Tokyo #38
  2. #k8sjp

  3. Who am I chaspy chaspy_ Lead Software Engineer Site Reliability

    at Quipper Takeshi Kondo
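  4. About today's talk • Audience • People who do not know Kubernetes HPA at all • People who use Kubernetes HPA

    autoscaling/v1 but do not know v2 • People who want to drive Kubernetes HPA with Datadog metrics • Goals • Learn the basic features of HPA v1/v2 • Take away hints for your own company from a case study that combines HPA with Datadog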
  5. Agenda • Kubernetes HPA feature overview • Case study: using Datadog Custom Metrics

    as HPA External metrics
  6. Agenda • Kubernetes HPA feature overview • Case study: using Datadog Custom Metrics

    as HPA External metrics
  7. What is HPA?

  8. None
  9. Horizontal Pod Autoscaler

  10. How does the Horizontal Pod Autoscaler work? https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

  11. Check blog post https://quipper.hatenablog.com/entry/2020/04/10/hpa

  12. Two api versions are available autoscaling/v1 autoscaling/v2beta2

  13. Two api versions are available autoscaling/v1 autoscaling/v2beta2 The fields have

    changed between v1 and v2
  14. Two api versions are available autoscaling/v1 autoscaling/v2beta2 Let's check out

    v1 first to learn the basic algorithm Note: autoscaling/v2beta1 is deprecated in v1.19 https://v1-19.docs.kubernetes.io/docs/setup/release/notes/#deprecation
  15. autoscaling/v1

  16. autoscaling/v1 scaleTargetRef <Object> -required reference to scaled resource; horizontal pod

    autoscaler will learn the current resource consumption and will set the desired number of pods by using its Scale subresource.
  17. autoscaling/v1 minReplicas <integer> minReplicas is the lower limit for the

    number of replicas to which the autoscaler can scale down. It defaults to 1 pod. minReplicas is allowed to be 0 if the alpha feature gate HPAScaleToZero is enabled and at least one Object or External metric is configured. Scaling is active as long as at least one metric value is available. https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/ The HPAScaleToZero feature gate is available from 1.16
  18. autoscaling/v1 maxReplicas <integer> -required- upper limit for the number of

    pods that can be set by the autoscaler; cannot be smaller than MinReplicas.
  19. autoscaling/v1 targetCPUUtilizationPercentage <integer> target average CPU utilization (represented as a

    percentage of requested CPU) over all the pods; if not specified the default autoscaling policy will be used.
  20. Algorithm Details

    desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
    Ref: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details
    Pod = 3, targetCPUUtilizationPercentage = 100
    Pod1 CPUUtilization 100%, Pod2 CPUUtilization 150%, Pod3 CPUUtilization 200%
  21. Algorithm Details (same formula and pods as on slide 20)

    currentMetricValue = 150 ((100 + 150 + 200) / 3), desiredMetricValue = 100, currentReplicas = 3
    desiredReplicas = ceil(3 * (150 / 100)) = 5
  22. Algorithm Details (same formula and example as on slide 21)

    Of course, the same calculation applies when the replica count decreases.
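
The calculation on slides 20 to 22 can be reproduced with a few lines of Python. This is only a sketch of the documented formula; the function name `desired_replicas` is made up for illustration, and the real controller also applies a tolerance and handles pods that are not ready, which is left out here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric_value: float,
                     desired_metric_value: float) -> int:
    """desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]"""
    return math.ceil(current_replicas * (current_metric_value / desired_metric_value))


# Slide example: three pods at 100%, 150% and 200% CPU utilization, target 100%.
pod_utilization = [100, 150, 200]
current_metric_value = sum(pod_utilization) / len(pod_utilization)  # average = 150.0
scale_up = desired_replicas(len(pod_utilization), current_metric_value, 100)

# Hypothetical scale-down case: 5 pods averaging 50% against the same 100% target.
scale_down = desired_replicas(5, 50, 100)

print(scale_up, scale_down)
```

The same function covers both directions: a ratio above 1 grows the replica count and a ratio below 1 shrinks it.
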
  23. autoscaling/v1 targetCPUUtilizationPercentage <integer> target average CPU utilization (represented as a

    percentage of requested CPU) over all the pods; if not specified the default autoscaling policy will be used.
  24. autoscaling/v1 If the CPU request is 100m core, HPA increases or decreases replicas

    so that the average CPU usage of the Pods becomes 100m core (targetCPUUtilizationPercentage is a percentage of the request)
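
To make "a percentage of the request" concrete, the sketch below converts between absolute CPU usage in millicores and utilization relative to the request. The helper names are hypothetical; the 950m request and 60% target come from the comment in the HorizontalPodAutoscaler manifest later in the deck (570 / 950 = 0.6).

```python
def cpu_utilization_percent(usage_millicores: int, request_millicores: int) -> float:
    """CPU usage expressed as a percentage of the pod's CPU request."""
    return usage_millicores * 100 / request_millicores


def target_usage_millicores(request_millicores: int, target_percent: int) -> float:
    """Average per-pod CPU usage that HPA steers towards for a given target percentage."""
    return request_millicores * target_percent / 100


# Slide 24: request 100m with a 100% target means HPA aims for 100m average usage.
print(target_usage_millicores(100, 100))

# Later manifest: request 950m with a 60% target means roughly 570m average usage.
print(target_usage_millicores(950, 60))
print(cpu_utilization_percent(570, 950))
```
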
  25. Two api versions are available autoscaling/v1 autoscaling/v2beta2 New features are

    available in v2 Note: autoscaling/v2beta1 is deprecated in v1.19 https://v1-19.docs.kubernetes.io/docs/setup/release/notes/#deprecation
  26. New features in v2 • Support for multiple metrics •

    Support for custom metrics • Support for configurable scaling behavior
  27. New features in v2 • Support for multiple metrics •

    Support for custom metrics • Support for configurable scaling behavior
  28. Support for multiple metrics • the Horizontal Pod Autoscaler controller

    will evaluate each metric, and propose a new scale based on that metric. The largest of the proposed scales will be used as the new scale. Ref: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-multiple-metrics
  29. Support for multiple metrics Ref: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-multiple-metrics Current Pod = 3

    resource.name = cpu target.averageUtilization = 100 resource.name = memory target.averageUtilization = 100 Desired replicas: 5 based on CPU, 2 based on memory. HPA adopts 5 (the larger value is adopted)
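
A minimal sketch of the "largest proposal wins" rule follows. The CPU average of 150% matches the earlier algorithm example; the memory average of 60% is an assumed value chosen only so that the memory proposal comes out as 2, as on the slide. Function names are illustrative.

```python
import math


def desired_for_metric(current_replicas: int, current_value: float, target_value: float) -> int:
    """Per-metric proposal using the basic HPA formula."""
    return math.ceil(current_replicas * (current_value / target_value))


def desired_replicas_multi(current_replicas: int, metrics) -> int:
    """metrics is a list of (current_value, target_value); the largest proposal is used."""
    return max(desired_for_metric(current_replicas, cur, tgt) for cur, tgt in metrics)


cpu = (150, 100)     # average CPU utilization 150%, target 100%
memory = (60, 100)   # assumed average memory utilization 60%, target 100%

proposals = [desired_for_metric(3, *cpu), desired_for_metric(3, *memory)]
chosen = desired_replicas_multi(3, [cpu, memory])
print(proposals, chosen)
```
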
  30. New features in v2 • Support for multiple metrics •

    Support for custom metrics • Support for configurable scaling behavior
  31. Support for custom metrics

    • Kubernetes then queries the new custom metrics API to fetch the values of the appropriate custom metrics.
    • By default, the HorizontalPodAutoscaler controller retrieves metrics from a series of APIs. In order for it to access these APIs, cluster administrators must ensure that:
      • The API aggregation layer is enabled.
      • The corresponding APIs are registered:
        • For resource metrics, this is the metrics.k8s.io API, generally provided by metrics-server...
        • For custom metrics, this is the custom.metrics.k8s.io API. It's provided by "adapter" API servers provided by metrics solution vendors...
        • For external metrics, this is the external.metrics.k8s.io API....
    https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-custom-metrics
    https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-metrics-apis
  32. How to get the resource metrics? HPA Controller API Server

    Metrics Server Tell me cpu utilization of the pod
  33. How to get the resource metrics? HPA Controller API Server

    Metrics Server Tell me cpu utilization of the pod
    $ kubectl get apiservice v1beta1.metrics.k8s.io -o yaml
        apiVersion: apiregistration.k8s.io/v1
        kind: APIService
        metadata:
          labels:
            app.kubernetes.io/instance: metrics-server
            k8s-app: metrics-server
          name: v1beta1.metrics.k8s.io
        spec:
          group: metrics.k8s.io
          groupPriorityMinimum: 100
          insecureSkipTLSVerify: true
          service:
            name: metrics-server
            namespace: kube-system
            port: 443
          version: v1beta1
          versionPriority: 100
  34. How to get the custom/external metrics? HPA Controller API Server

    Custom Metrics Server Tell me the value of your custom/external metric Aggregation Layer External Metrics Server
  35. How to get the custom/external metrics? HPA Controller API Server

    Custom Metrics Server Tell me the value of your custom/external metric Aggregation Layer External Metrics Server
        apiVersion: apiregistration.k8s.io/v1
        kind: APIService
        metadata:
          labels:
            app.kubernetes.io/instance: datadog
            helm.sh/chart: datadog
          name: v1beta1.external.metrics.k8s.io
        spec:
          group: external.metrics.k8s.io
          groupPriorityMinimum: 100
          insecureSkipTLSVerify: true
          service:
            name: datadog-custom-metrics-server
            namespace: monitor
            port: 443
          version: v1beta1
          versionPriority: 100
  36. New features in v2 • Support for multiple metrics •

    Support for custom metrics • Support for configurable scaling behavior
  37. Support for configurable scaling behavior (v1.18~) • Scaling Policies • The amount of change when scaling up or down can be limited

    • Multiple policies can be defined, and one is adopted according to selectPolicy • The default is Max
        behavior:
          scaleDown:
            policies:
            - type: Pods
              value: 4
              periodSeconds: 60
            - type: Percent
              value: 10
              periodSeconds: 60
  38. Support for configurable scaling behavior (v1.18~) • Scaling Policies • When

    the number of pods is more than 40 the second policy will be used for scaling down. For instance if there are 80 replicas and the target has to be scaled down to 10 replicas then during the first step 8 replicas will be reduced (same behavior snippet as on slide 37)
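
The arithmetic behind "more than 40" and "8 replicas" can be sketched as below. This is a simplified model, not the controller's code: it assumes both policies share the same 60 second period, rounds the percentage up, and only models the `Max` and `Min` values of selectPolicy. The function name is hypothetical.

```python
import math

# The two scaleDown policies from the slide.
POLICIES = [
    {"type": "Pods", "value": 4, "periodSeconds": 60},
    {"type": "Percent", "value": 10, "periodSeconds": 60},
]


def max_scale_down(current_replicas: int, policies, select_policy: str = "Max") -> int:
    """Largest number of replicas that may be removed within one period."""
    allowed = []
    for policy in policies:
        if policy["type"] == "Pods":
            allowed.append(policy["value"])
        elif policy["type"] == "Percent":
            allowed.append(math.ceil(current_replicas * policy["value"] / 100))
        else:
            raise ValueError(f"unknown policy type: {policy['type']}")
    if select_policy == "Max":
        return max(allowed)
    if select_policy == "Min":
        return min(allowed)
    raise ValueError(f"unsupported selectPolicy in this sketch: {select_policy}")


for replicas in (80, 72, 40, 30):
    print(replicas, max_scale_down(replicas, POLICIES))
```

With 40 replicas or fewer, 10% is at most 4, so the Pods policy is the one that allows the larger change; above 40 the Percent policy takes over.
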
  39. Support for configurable scaling behavior (v1.18~) • Stabilization Window • The

    stabilization window is used to restrict the flapping of replicas when the metrics used for scaling keep fluctuating. The stabilization window is used by the autoscaling algorithm to consider the computed desired state from the past to prevent scaling
        scaleDown:
          stabilizationWindowSeconds: 300
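
One way to picture the scale-down stabilization window: keep the recent recommendations and act on the highest one inside the window, so a short dip in the metric does not remove pods immediately. The sketch below assumes that model; the function name, timestamps and history values are invented for illustration.

```python
def stabilized_scale_down(recommendations, now: int, window_seconds: int = 300) -> int:
    """Pick the highest desired-replica recommendation seen within the window.

    recommendations is a list of (timestamp_seconds, desired_replicas).
    """
    recent = [replicas for ts, replicas in recommendations
              if 0 <= now - ts <= window_seconds]
    if not recent:
        raise ValueError("no recommendation inside the stabilization window")
    return max(recent)


# Hypothetical history: the raw recommendation flaps between high and low values.
history = [(0, 10), (60, 4), (120, 9), (180, 3)]

print(stabilized_scale_down(history, now=180))  # the 10 from t=0 is still in the window
print(stabilized_scale_down(history, now=400))  # only t=120 and t=180 remain
print(stabilized_scale_down(history, now=480))  # only t=180 remains
```
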
  40. Default behavior

        behavior:
          scaleDown:
            stabilizationWindowSeconds: 300
            policies:
            - type: Percent
              value: 100
              periodSeconds: 15
          scaleUp:
            stabilizationWindowSeconds: 0
            policies:
            - type: Percent
              value: 100
              periodSeconds: 15
            - type: Pods
              value: 4
              periodSeconds: 15
            selectPolicy: Max
  41. Two api versions are available autoscaling/v1 autoscaling/v2beta2 The fields have

    changed between v1 and v2
  42. autoscaling/v2beta2

        apiVersion: autoscaling/v2beta2
        kind: HorizontalPodAutoscaler
        metadata:
          name: api
        spec:
          scaleTargetRef:
            apiVersion: apps/v1
            kind: Deployment
            name: api
          minReplicas: 40
          maxReplicas: 1000
          metrics:
          - type: Resource
            resource:
              name: cpu
              target:
                type: Utilization
                averageUtilization: 60
          - type: External
            external:
              metric:
                name: datadogmetric@production:timed-exam
              target:
                type: AverageValue
                averageValue: 1
  43. autoscaling/v2beta2 (manifest as on slide 42)

    Same as autoscaling/v1
  44. autoscaling/v2beta2 (manifest as on slide 42)

    Multiple metrics!
  45. autoscaling/v2beta2 (manifest as on slide 42)

    These 4 types are available: resource, pods, object, external
  46. autoscaling/v2beta2 (manifest as on slide 42)

    resource: cpu or memory; provided by metrics-server. pods: metrics about Pods; provided by a custom metrics server. object: metrics about objects other than Pods; provided by a custom metrics server. external: metrics from outside the cluster; provided by an external metrics server.
  47. autoscaling/v2beta2 (manifest as on slide 42)

    These 3 fields are available in metrics.<metrics type>.target • averageUtilization • averageValue • value
  48. autoscaling/v2beta2 (manifest as on slide 42)

    averageUtilization: averageUtilization is the target value of the average of the resource metric across all relevant pods, represented as a percentage of the requested value of the resource for the pods. Currently only valid for Resource metric source type
  49. autoscaling/v2beta2 (manifest as on slide 42)

    averageUtilization: usable only with the Resource type. The target is the ratio of the Pods' average usage to their request.
  50. autoscaling/v2beta2 (manifest as on slide 42)

    averageValue: averageValue is the target value of the average of the metric across all relevant pods (as a quantity)
  51. autoscaling/v2beta2 (manifest as on slide 42)

    averageValue: the target is the obtained value divided by the number of Pods.
  52. autoscaling/v2beta2 (manifest as on slide 42)

    value <string>: value is the target value of the metric (as a quantity).
  53. Agenda • Kubernetes HPA feature overview • Case study: using Datadog Custom Metrics

    as HPA External metrics
  54. Check blog post https://quipper.hatenablog.com/entry/2020/11/30/scheduled-scaling-with-hpa

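  55. Background • Periodic exams are held on Quipper in the Philippines • Simultaneous access around the start of an exam brought the service down Teacher uploaded exams

    and registered the class and time slot Some students were able to take an exam but others were not
  56. Why did it happen? • HPA cannot react in time • Scaling out requires adding Pods

    and then adding Nodes • Autoscaling is weak against access spikes
  57. For the time being • We got by with increasing Pods during the daytime

  58. What to do • Cover it with operations • Ask teachers to finish creating the "Exam" data (writing the questions) up to 24 hours ahead of the exam date • Limit the number of concurrent examinees •

    Ask schools to stagger exam times within the school where possible
  59. What to do • Solve it with engineering • If the exams are created in advance, the start time and number of examinees for each exam are known in advance • Couldn't we scale the servers based on these numbers?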

  60. The number of examinees and the time exist in the database. Teacher uploaded exams:

    Time slot      School    Examinees
    10:00 - 11:00  A school  18
    11:00 - 12:00  B school  30
    13:00 - 14:00  C school  500
  61. We want to scale the servers based on the time and the number of examinees

    10:00 - 11:00 A school 18, 11:00 - 12:00 B school 30, 13:00 - 14:00 C school 500
    Scheduled-Scaling Desired replicas:
    Time   #pods
    10:00  90
    11:00  270
    13:00  2500
  62. How?

  63. How to solve? 1. Fetch the number of examinees and the time from the database 2. Send the number of examinees per time slot to Datadog

    as custom metrics 3. Use them from HPA as external metrics
  64. Architecture Configmap Pod: timed-exam-schedule-exporter HPA Controller Deployment Datadog Cluster-agent daemonset:

    datadog-agent
  65. Architecture Configmap Pod: timed-exam-schedule-exporter HPA Controller Deployment Datadog Cluster-agent daemonset:

    datadog-agent ① ② ③
  66. 1. Fetch the number of examinees and the time from the database Configmap Pod: timed-exam-schedule-exporter HPA Controller Deployment Datadog Cluster-agent

    daemonset: datadog-agent
    2020-01-20.tsv:
        ---
        12:00 229
        12:15 54
        12:45 67
        13:00 3684
        13:15 91
        13:30 4821
        13:45 37
        14:00 138
    The WebDev team quickly wrote a Ruby batch job for this
  67. 2. Send the time-series data to Datadog as custom metrics (same architecture diagram and 2020-01-20.tsv as on slide 66)

    File mount. A newly developed component, written in Go. It reads the tsv in an infinite loop and exports the value for 15 minutes after the current time in Prometheus format. The metrics are collected through Kubernetes Integration Autodiscovery
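
The exporter itself is written in Go and its source is not shown in the deck, so the following Python sketch only illustrates the lookup it is described as doing: read the schedule, take the value for 15 minutes after the current time, and render it as a Prometheus text line. Several details are assumptions: the time is floored to a 15 minute slot, a missing slot yields 0, and the metric name is inferred from the Datadog query shown later in the deck.

```python
from datetime import datetime, timedelta

# Schedule copied from the slide (whitespace-separated "HH:MM count" rows).
SCHEDULE_TSV = """\
12:00 229
12:15 54
12:45 67
13:00 3684
13:15 91
13:30 4821
13:45 37
14:00 138
"""


def parse_schedule(tsv_text: str) -> dict:
    """Map "HH:MM" to the number of examinees, skipping malformed rows."""
    schedule = {}
    for line in tsv_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1].isdigit():
            schedule[fields[0]] = int(fields[1])
    return schedule


def scheduled_value(schedule: dict, now: datetime,
                    lead: timedelta = timedelta(minutes=15)) -> int:
    """Value for the 15 minute slot that contains now + lead (0 if the slot is absent)."""
    target = now + lead
    slot = target.replace(minute=target.minute - target.minute % 15,
                          second=0, microsecond=0)
    return schedule.get(slot.strftime("%H:%M"), 0)


def prometheus_line(name: str, value: int) -> str:
    """Render one sample in the Prometheus text exposition format."""
    return f"{name} {value}"


schedule = parse_schedule(SCHEDULE_TSV)
value = scheduled_value(schedule, datetime(2020, 1, 20, 12, 45))
print(prometheus_line("timed_exam_scheduled_scaling_desired_replicas", value))
```

Looking 15 minutes ahead gives the cluster time to add Pods and Nodes before the exam traffic arrives, which is the gap that reactive autoscaling could not cover.
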
  68. Kubernetes Integration Autodiscovery

        annotations:
          ad.datadoghq.com/timed-exam-schedule-exporter.check_names: |
            ["prometheus"]
          ad.datadoghq.com/timed-exam-schedule-exporter.init_configs: |
            [{}]
          ad.datadoghq.com/timed-exam-schedule-exporter.instances: |
            [
              {
                "prometheus_url": "http://%%host%%:8080/metrics",
                "namespace": "timed_exam",
                "metrics": ["*"]
              }
            ]
    https://docs.datadoghq.com/agent/kubernetes/integrations/?tab=kubernetes
  69. 3. Use them from HPA as external metrics Configmap Pod: timed-exam-schedule-exporter HPA Controller

    Deployment Datadog Cluster-agent daemonset: datadog-agent Use a Custom Resource called DatadogMetric. Specify it with the External type in HPA autoscaling/v2beta2
  70. DatadogMetric

        apiVersion: datadoghq.com/v1alpha1
        kind: DatadogMetric
        metadata:
          name: timed-exam
        spec:
          # throughput: 10 = 5000 / 500. 500 pods accept 5000 users.
          # ref: https://github.com/quipper/quipper/issues/26054
          query: ceil(max:timed_exam.timed_exam_scheduled_scaling_desired_replicas{environment:production}/10)
    Datadog queries can be used as they are, which is great because it reduces the amount of code to write. This query converts the number of examinees into desired replicas (dividing by a factor of 10)
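
The conversion done by the query can be mirrored in Python to check the numbers. This is a sketch of the arithmetic only, not of how Datadog evaluates queries; the function name and the idea of passing the series values as a list are illustrative assumptions. The factor 10 comes from the manifest comment (500 pods accept 5000 users).

```python
import math

USERS_PER_POD = 10  # 5000 users / 500 pods, per the comment in the DatadogMetric


def replicas_for_examinees(series_values, users_per_pod: int = USERS_PER_POD) -> int:
    """Mirror of ceil(max:<metric>/10): take the max across series, divide, round up."""
    return math.ceil(max(series_values) / users_per_pod)


# Values taken from the 2020-01-20.tsv schedule on the earlier slide.
for examinees in (229, 3684, 4821):
    print(examinees, replicas_for_examinees([examinees]))
```
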
  71. HorizontalPodAutoscaler

        apiVersion: autoscaling/v2beta2
        kind: HorizontalPodAutoscaler
        metadata:
          name: api
        spec:
          scaleTargetRef:
            apiVersion: apps/v1
            kind: Deployment
            name: api
          minReplicas: 40
          maxReplicas: 1000
          metrics:
          - type: Resource
            resource:
              name: cpu
              target:
                type: Utilization
                averageUtilization: 60 # want 570 mcore of cpu usage. 570 / 950(requests) = 0.6
          - type: External
            external:
              metric:
                name: datadogmetric@production:timed-exam
              target:
                type: AverageValue
                averageValue: 1
  72. HorizontalPodAutoscaler (manifest as on slide 71)

    The Resource metric is the HPA setting based on CPU utilization that already existed
  73. HorizontalPodAutoscaler (manifest as on slide 71)

    The External metric is the part added this time; it specifies the datadogmetric
  74. HorizontalPodAutoscaler (manifest as on slide 71)

    averageValue: HPA aims to bring the target metric divided by the total number of Pods close to this value. If the number obtained from the DatadogMetric is 100, HPA tries to increase the Pods to 100, so the metric directly expresses the desired replicas
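
Why `averageValue: 1` turns the metric into a replica count can be seen from a small sketch. It assumes the usual behavior for an External metric with an AverageValue target (desired replicas is the metric divided by the per-pod target, rounded up), then combines it with a CPU-based proposal and the min/max bounds from the manifest. The function names and the example CPU proposals are hypothetical, and controller details such as tolerance are omitted.

```python
import math

MIN_REPLICAS = 40
MAX_REPLICAS = 1000


def desired_from_external_average(metric_value: float, target_average_value: float) -> int:
    """External metric with an AverageValue target: total value / per-pod target."""
    return math.ceil(metric_value / target_average_value)


def final_replicas(cpu_proposal: int, external_metric_value: float,
                   target_average_value: float = 1) -> int:
    """Take the larger of the two proposals, then clamp to minReplicas/maxReplicas."""
    external_proposal = desired_from_external_average(external_metric_value,
                                                      target_average_value)
    desired = max(cpu_proposal, external_proposal)
    return min(max(desired, MIN_REPLICAS), MAX_REPLICAS)


print(desired_from_external_average(100, 1))  # metric 100 means 100 pods
print(final_replicas(cpu_proposal=45, external_metric_value=369))   # schedule wins
print(final_replicas(cpu_proposal=30, external_metric_value=20))    # minReplicas wins
print(final_replicas(cpu_proposal=45, external_metric_value=2500))  # capped by maxReplicas
```
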
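  75. Result: the yellow line shows scaling matched to the exams; the rest of the time, scaling by CPU works!

  76. Result: the difference between the purple and blue areas is the cost saved. The estimate is a reduction of $3150 per month.

  77. Summary • multiple metrics can be used with HPA autoscaling/v2beta2 • Multiple

    metrics looks at several conditions and adopts the safer (higher) value • external metrics can be used with HPA autoscaling/v2beta2 • It is convenient that the metrics / queries already in Datadog can be used • By sending custom metrics to Datadog, any scale can be applied at any time, which is convenient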
  78. Thank you! chaspy chaspy_ Lead Software Engineer Site Reliability at

    Quipper Takeshi Kondo