Introduction to the feature of HPA autoscaling/v2beta2 and examples of using HPA External Metrics with Datadog

Takeshi Kondo
January 21, 2021

Kubernetes meetup tokyo#38

Transcript

  1. Introduction to the feature of HPA autoscaling/v2beta2
    and examples of using HPA External Metrics with Datadog
    Takeshi Kondo / @chaspy
    2021/01/21
    Kubernetes Meetup Tokyo #38

  2. #k8sjp

  3. Who am I
    chaspy chaspy_
    Lead Software Engineer

    Site Reliability at Quipper
    Takeshi Kondo

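  4. About today's talk
    • Target audience
      • People who know nothing about Kubernetes HPA
      • People who use Kubernetes HPA autoscaling/v1 but do not know v2
      • People who want to drive Kubernetes HPA with Datadog metrics
    • Goals
      • Learn the basic features of HPA v1/v2
      • Pick up hints for using HPA at your own company from a case study that combines it with Datadog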

  5. Agenda
    • Kubernetes HPA feature overview
    • Case study: using Datadog custom metrics as HPA external metrics

  6. Agenda
    • Kubernetes HPA feature overview
    • Case study: using Datadog custom metrics as HPA external metrics

  7. What is HPA?

  9. Horizontal Pod Autoscaler

  10. How does the Horizontal Pod Autoscaler work?
    https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

  11. Check blog post
    https://quipper.hatenablog.com/entry/2020/04/10/hpa

  12. Two api versions are available

    autoscaling/v1
    autoscaling/v2beta2

  13. Two api versions are available

    autoscaling/v1
    autoscaling/v2beta2
    The fields have changed between v1 and v2

  14. Two api versions are available

    autoscaling/v1
    autoscaling/v2beta2
    Let's check out v1 first to
    learn the basic algorithm
    Note: autoscaling/v2beta1 is deprecated in v1.19 https://v1-19.docs.kubernetes.io/docs/setup/release/notes/#deprecation

  15. autoscaling/v1

  16. autoscaling/v1
    scaleTargetRef -required
    reference to scaled resource; horizontal
    pod autoscaler will learn the current
    resource consumption and will set the
    desired number of pods by using its
    Scale subresource.

  17. autoscaling/v1
    minReplicas
    minReplicas is the lower limit for the
    number of replicas to which the
    autoscaler can scale down. It defaults to 1
    pod. minReplicas is allowed to be 0 if the
    alpha feature gate HPAScaleToZero is
    enabled and at least one Object or
    External metric is configured. Scaling is
    active as long as at least one metric value
    is available.
    The HPAScaleToZero feature gate is available from v1.16: https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/

  18. autoscaling/v1
    maxReplicas -required-
    upper limit for the number of pods that
    can be set by the autoscaler; cannot be
    smaller than MinReplicas.

  19. autoscaling/v1
    targetCPUUtilizationPercentage
    target average CPU utilization
    (represented as a percentage of
    requested CPU) over all the pods; if
    not specified the default autoscaling policy
    will be used.

  20. Algorithm Details
    desiredReplicas
    = ceil [currentReplicas * ( currentMetricValue / desiredMetricValue )]
    Ref: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details
    Pod = 3, targetCPUUtilizationPercentage = 100
    Pod1 CPUUtilization 100%
    Pod2 CPUUtilization 150%
    Pod3 CPUUtilization 200%

  21. Algorithm Details
    desiredReplicas
    = ceil [currentReplicas * ( currentMetricValue / desiredMetricValue )]
    Ref: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details
    Pod = 3, targetCPUUtilizationPercentage = 100
    Pod1 CPUUtilization 100%
    Pod2 CPUUtilization 150%
    Pod3 CPUUtilization 200%
    currentMetricValue = 150 ((100 + 150 + 200) / 3)
    desiredMetricValue = 100
    currentReplicas = 3
    desiredReplicas = ceil(3 * (150 / 100)) = 5

  22. Algorithm Details
    desiredReplicas
    = ceil [currentReplicas * ( currentMetricValue / desiredMetricValue )]
    Ref: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details
    Pod = 3, targetCPUUtilizationPercentage = 100
    Pod1 CPUUtilization 100%
    Pod2 CPUUtilization 150%
    Pod3 CPUUtilization 200%
    currentMetricValue = 150 ((100 + 150 + 200) / 3)
    desiredMetricValue = 100
    currentReplicas = 3
    desiredReplicas = ceil(3 * (150 / 100)) = 5
    Scaling down uses the same calculation, of course
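
The formula above can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide: the real controller also applies a tolerance and special handling for pods that are not ready, which are left out here. The scale-down numbers (4 replicas at 50% utilization) are hypothetical and not from the talk.

```python
import math


def desired_replicas(current_replicas, current_metric, desired_metric):
    """desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]"""
    return math.ceil(current_replicas * (current_metric / desired_metric))


# Example from the slide: 3 pods at 100%, 150% and 200% CPU utilization.
pod_cpu_utilization = [100, 150, 200]
current_metric = sum(pod_cpu_utilization) / len(pod_cpu_utilization)  # 150.0

scale_up = desired_replicas(3, current_metric, 100)

# Hypothetical scale-down case: 4 pods averaging 50% against a 100% target.
scale_down = desired_replicas(4, 50, 100)

print(scale_up, scale_down)
```

The same function covers both directions: a ratio above 1 adds replicas and a ratio below 1 removes them.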

  23. autoscaling/v1
    targetCPUUtilizationPercentage

    target average CPU utilization
    (represented as a percentage of
    requested CPU) over all the pods; if
    not specified the default autoscaling
    policy will be used.

  24. autoscaling/v1
    If the CPU request is 100m and targetCPUUtilizationPercentage is 100, the HPA
    increases or decreases replicas so that the average CPU usage per Pod
    becomes 100m
    (targetCPUUtilizationPercentage is a percentage of the request)
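
To make "percentage of the request" concrete, here is a small sketch. It assumes utilization is computed as total usage over total requests across the pods, which is my reading of the controller's behavior; the function name and the sample values are illustrative only.

```python
def cpu_utilization_percent(usage_millicores, request_millicores):
    """Utilization as a percentage of requested CPU, summed over all pods."""
    return (sum(usage_millicores) * 100) // sum(request_millicores)


# Three pods that each request 100m and use 100m, 150m and 200m.
three_pods = cpu_utilization_percent([100, 150, 200], [100, 100, 100])

# One pod that requests 950m and uses 570m.
one_pod = cpu_utilization_percent([570], [950])

print(three_pods, one_pod)
```

A value above 100 simply means the pods are using more CPU than they requested, which is possible as long as no lower limit is enforced.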

  25. Two api versions are available

    autoscaling/v1
    autoscaling/v2beta2
    New features are available in v2
    Note: autoscaling/v2beta1 is deprecated in v1.19 https://v1-19.docs.kubernetes.io/docs/setup/release/notes/#deprecation

  26. New features in v2
    • Support for multiple metrics
    • Support for custom metrics
    • Support for configurable scaling behavior

  27. New features in v2
    • Support for multiple metrics
    • Support for custom metrics
    • Support for configurable scaling behavior

  28. Support for multiple metrics
    • the Horizontal Pod Autoscaler controller will evaluate each
    metric, and propose a new scale based on that metric. The
    largest of the proposed scales will be used as the new
    scale.
    Ref: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-multiple-metrics

  29. Support for multiple metrics
    Ref: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-multiple-metrics
    Current Pods = 3
    Metric 1: resource.name = cpu, target.averageUtilization = 100
    Metric 2: resource.name = memory, target.averageUtilization = 100
    Desired replicas
    Based on CPU: 5
    Based on memory: 2
    The HPA adopts 5 (the larger value is used)
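
The "largest proposal wins" rule can be sketched as follows. The observed utilization values (150% CPU, 50% memory) are assumptions chosen so that the proposals match the 5 and 2 on the slide; the slide itself does not state them.

```python
import math


def proposal(current_replicas, current_utilization, target_utilization):
    """Replica count proposed by a single metric."""
    return math.ceil(current_replicas * current_utilization / target_utilization)


current_replicas = 3
targets = {"cpu": 100, "memory": 100}   # target.averageUtilization per metric
observed = {"cpu": 150, "memory": 50}   # assumed current utilization

proposals = {
    name: proposal(current_replicas, observed[name], targets[name])
    for name in targets
}
desired = max(proposals.values())  # the largest proposal is used

print(proposals, desired)
```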

  30. New features in v2
    • Support for multiple metrics
    • Support for custom metrics
    • Support for configurable scaling behavior

  31. Support for custom metrics
    • Kubernetes then queries the new custom metrics API to fetch the values of the appropriate
      custom metrics.
    • By default, the HorizontalPodAutoscaler controller retrieves metrics from a series of APIs. In
      order for it to access these APIs, cluster administrators must ensure that:
      • The API aggregation layer is enabled.
      • The corresponding APIs are registered:
        • For resource metrics, this is the metrics.k8s.io API, generally provided by metrics-server...
        • For custom metrics, this is the custom.metrics.k8s.io API. It's provided by "adapter" API
          servers provided by metrics solution vendors...
        • For external metrics, this is the external.metrics.k8s.io API...
    https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-custom-metrics
    https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-metrics-apis

  32. How to get the resource metrics?
    HPA Controller -> API Server -> Metrics Server: "Tell me the CPU utilization of the pod"

  33. How to get the resource metrics?
    HPA Controller -> API Server -> Metrics Server: "Tell me the CPU utilization of the pod"
    $ kubectl get apiservice v1beta1.metrics.k8s.io -o yaml
    apiVersion: apiregistration.k8s.io/v1
    kind: APIService
    metadata:
      labels:
        app.kubernetes.io/instance: metrics-server
        k8s-app: metrics-server
      name: v1beta1.metrics.k8s.io
    spec:
      group: metrics.k8s.io
      groupPriorityMinimum: 100
      insecureSkipTLSVerify: true
      service:
        name: metrics-server
        namespace: kube-system
        port: 443
      version: v1beta1
      versionPriority: 100

  34. How to get the custom/external metrics?
    HPA Controller -> API Server (Aggregation Layer) -> Custom Metrics Server / External Metrics Server:
    "Tell me the value of your custom/external metric"

  35. How to get the custom/external metrics?
    HPA Controller -> API Server (Aggregation Layer) -> Custom Metrics Server / External Metrics Server:
    "Tell me the value of your custom/external metric"
    apiVersion: apiregistration.k8s.io/v1
    kind: APIService
    metadata:
      labels:
        app.kubernetes.io/instance: datadog
        helm.sh/chart: datadog
      name: v1beta1.external.metrics.k8s.io
    spec:
      group: external.metrics.k8s.io
      groupPriorityMinimum: 100
      insecureSkipTLSVerify: true
      service:
        name: datadog-custom-metrics-server
        namespace: monitor
        port: 443
      version: v1beta1
      versionPriority: 100

  36. New features in v2
    • Support for multiple metrics
    • Support for custom metrics
    • Support for configurable scaling behavior

  37. Support for configurable scaling behavior(v1.18~)
    • Scaling Policies
      • You can limit how fast replicas change when scaling up or down
      • Multiple policies can be defined; the one to apply is chosen according to selectPolicy
      • The default selectPolicy is Max
    behavior:
      scaleDown:
        policies:
        - type: Pods
          value: 4
          periodSeconds: 60
        - type: Percent
          value: 10
          periodSeconds: 60

  38. Support for configurable scaling behavior(v1.18~)
    • Scaling Policies
      • When the number of pods is more than 40 the second policy will be
        used for scaling down. For instance, if there are 80 replicas and the
        target has to be scaled down to 10 replicas, then during the first step
        8 replicas will be reduced
    behavior:
      scaleDown:
        policies:
        - type: Pods
          value: 4
          periodSeconds: 60
        - type: Percent
          value: 10
          periodSeconds: 60
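
The policy selection described above can be sketched like this. It assumes the default selectPolicy of Max and rounds the Percent policy up, which is how the Kubernetes documentation describes the 72-replica step (7.2 is rounded up to 8). Tracking of periodSeconds is omitted because both policies use the same 60-second period; the function name is illustrative.

```python
import math

POLICIES = [
    {"type": "Pods", "value": 4, "periodSeconds": 60},
    {"type": "Percent", "value": 10, "periodSeconds": 60},
]


def max_scale_down(current_replicas, policies=POLICIES):
    """Largest number of replicas that may be removed in one period (selectPolicy: Max)."""
    allowed = []
    for policy in policies:
        if policy["type"] == "Pods":
            allowed.append(policy["value"])
        else:  # Percent of the current replica count, rounded up
            allowed.append(math.ceil(current_replicas * policy["value"] / 100))
    return max(allowed)


for replicas in (80, 72, 30):
    print(replicas, "->", replicas - max_scale_down(replicas))
```

With 40 replicas or fewer, 10% is at most 4, so the Pods policy decides; above 40 the Percent policy allows the larger step.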

  39. Support for configurable scaling behavior(v1.18~)
    • Stabilization Window
      • The stabilization window is used to restrict the flapping of replicas
        when the metrics used for scaling keep fluctuating. The stabilization
        window is used by the autoscaling algorithm to consider the
        computed desired state from the past to prevent scaling
    scaleDown:
      stabilizationWindowSeconds: 300
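
A rough way to picture the scale-down stabilization window: the controller remembers recent recommendations and acts on the highest one inside the window, so a short dip in the metric does not remove pods immediately. The sketch below is a simplification under that reading; the data layout (a list of timestamp and replica-count pairs) and the sample history are hypothetical.

```python
def stabilized_scale_down(recommendations, now, window_seconds=300):
    """Return the highest recommendation made within the last window_seconds.

    recommendations: list of (timestamp_seconds, desired_replicas) tuples.
    """
    recent = [replicas for ts, replicas in recommendations
              if now - ts <= window_seconds]
    return max(recent)


history = [(0, 10), (100, 6), (200, 4), (400, 3)]

with_window = stabilized_scale_down(history, now=400, window_seconds=300)
without_window = stabilized_scale_down(history, now=400, window_seconds=0)

print(with_window, without_window)
```

With a 300-second window the recommendation of 6 made at t=100 still counts, so the workload stays at 6 replicas; with a window of 0 only the latest value (3) is considered.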

  40. Default behavior
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 300
        policies:
        - type: Percent
          value: 100
          periodSeconds: 15
      scaleUp:
        stabilizationWindowSeconds: 0
        policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
        selectPolicy: Max

  41. Two api versions are available

    autoscaling/v1
    autoscaling/v2beta2
    The fields have changed between v1 and v2

  42. autoscaling/v2beta2
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: api
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: api
      minReplicas: 40
      maxReplicas: 1000
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 60
      - type: External
        external:
          metric:
            name: datadogmetric@<namespace>:timed-exam
          target:
            type: AverageValue
            averageValue: 1

  43. autoscaling/v2beta2
    scaleTargetRef, minReplicas and maxReplicas are the same as in autoscaling/v1

  44. autoscaling/v2beta2
    metrics is a list, so multiple metrics can be specified (here a Resource metric and an External metric)

  45. autoscaling/v2beta2
    These 4 metric types are available:
    Resource, Pods, Object, External

  46. autoscaling/v2beta2
    Resource:
      CPU or memory. Provided by metrics-server.
    Pods:
      Metrics about Pods. Provided by a custom metrics server.
    Object:
      Metrics about objects other than Pods. Provided by a custom metrics server.
    External:
      Metrics from outside the cluster. Provided by an external metrics server.

  47. autoscaling/v2beta2
    These 3 fields are available in
    metrics.<type>.target
    • averageUtilization
    • averageValue
    • value

  48. autoscaling/v2beta2
    averageUtilization
    averageUtilization is the target value
    of the average of the resource metric
    across all relevant pods, represented
    as a percentage of the requested
    value of the resource for the pods.
    Currently only valid for Resource
    metric source type

  49. autoscaling/v2beta2
    averageUtilization
    Can only be used with the Resource type
    The target is the average across Pods, expressed as a ratio of the request

  50. autoscaling/v2beta2
    averageValue
    averageValue is the target value of the
    average of the metric across all
    relevant pods (as a quantity)

  51. autoscaling/v2beta2
    averageValue
    The target applies to the obtained metric value divided by the number of Pods

  52. autoscaling/v2beta2
    value
    value is the target value of
    the metric (as a quantity).

  53. Agenda
    • Kubernetes HPA feature overview
    • Case study: using Datadog custom metrics as HPA external metrics

  54. Check blog post
    https://quipper.hatenablog.com/entry/2020/11/30/scheduled-scaling-with-hpa

  55. Background
    • Periodic exams in the Philippines are held on Quipper
    • Simultaneous access just before and after an exam starts took the service down
    Teachers uploaded exams and registered the class and time slot
    Some students were able to take an exam, but others were not

  56. Why did it happen?
    • HPA cannot react in time
      • Scaling out means adding Pods and then adding Nodes as well
      • Autoscaling is weak against sudden spikes in access

  57. For the time being
    • We got by by running more Pods during the daytime

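  58. What to do?
    • Cover it with operations
      • Ask teachers to create the "Exam" data (write the exam questions) by 24 hours before the exam day
      • Limit the number of students taking exams at the same time
      • Ask schools to stagger exam times internally where possible, and similar requests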

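  59. What to do?
    • Solve it with engineering
      • If exams are created in advance, the start time and number of examinees for each exam are known ahead of time
      • Why not scale the servers based on these numbers?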

  60. The database already holds the number of examinees and the exam times
    Teachers uploaded exams
    Time           School     Examinees
    10:00 - 11:00  A school   18
    11:00 - 12:00  B school   30
    13:00 - 14:00  C school   500

  61. We want to scale the servers based on the times and the number of examinees
    Time           School     Examinees
    10:00 - 11:00  A school   18
    11:00 - 12:00  B school   30
    13:00 - 14:00  C school   500
    Scheduled-Scaling: desired replicas
    Time   #pods
    10:00  90
    11:00  270
    13:00  2500

  62. How?

  63. How to solve?
    1. Fetch the number of examinees and the exam times from the database
    2. Send the number of examinees per time slot to Datadog as a custom metric
    3. Use that metric from the HPA as an external metric

  64. Architecture
    Configmap
    Pod: timed-exam-schedule-exporter
    HPA Controller
    Deployment Datadog Cluster-agent
    daemonset: datadog-agent

  65. Architecture
    Configmap
    Pod: timed-exam-schedule-exporter
    HPA Controller
    Deployment Datadog Cluster-agent
    daemonset: datadog-agent



  66. 1. Fetch the number of examinees and the exam times from the database
    Configmap
    Pod: timed-exam-schedule-exporter
    HPA Controller
    Deployment Datadog Cluster-agent
    daemonset: datadog-agent
    2020-01-20.tsv:
    ---
    12:00 229
    12:15 54
    12:45 67
    13:00 3684
    13:15 91
    13:30 4821
    13:45 37
    14:00 138
    The web developers quickly wrote a Ruby batch job for this

  67. 2. Send the time-series data to Datadog as custom metrics
    Configmap
    Pod: timed-exam-schedule-exporter
    HPA Controller
    Deployment Datadog Cluster-agent
    daemonset: datadog-agent
    2020-01-20.tsv:
    ---
    12:00 229
    12:15 54
    12:45 67
    13:00 3684
    13:15 91
    13:30 4821
    13:45 37
    14:00 138
    File mount
    A newly developed component, written in Go.
    It reads the TSV in an infinite loop and exports the value for 15 minutes
    after the current time in Prometheus format.
    The metrics are ingested through Kubernetes Integration Autodiscovery
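
The exporter itself is written in Go and its source is not shown in the talk. The Python sketch below only illustrates the idea of "look up the value 15 minutes ahead and render it in Prometheus text format". Several details are assumptions: the metric name is inferred from the Datadog query used later in the talk, the TSV is assumed to be whitespace-separated, slots are assumed to be 15 minutes wide, and a missing slot is assumed to mean 0. The infinite loop and the HTTP endpoint are left out.

```python
from datetime import datetime, timedelta

# Assumed metric name, inferred from the Datadog query shown in the talk.
METRIC_NAME = "timed_exam_scheduled_scaling_desired_replicas"
LOOKAHEAD = timedelta(minutes=15)

TSV = """\
12:00\t229
12:15\t54
12:45\t67
13:00\t3684
"""


def parse_schedule(tsv_text):
    """Parse 'HH:MM<whitespace>count' lines into a dict."""
    schedule = {}
    for line in tsv_text.splitlines():
        if not line.strip() or line.startswith("---"):
            continue
        slot, count = line.split()
        schedule[slot] = int(count)
    return schedule


def value_for(schedule, now):
    """Value of the 15-minute slot that contains now + 15 minutes (0 if absent)."""
    target = now + LOOKAHEAD
    slot_minute = target.minute - target.minute % 15
    return schedule.get(f"{target.hour:02d}:{slot_minute:02d}", 0)


def render_prometheus(value):
    """Render one gauge sample in the Prometheus text exposition format."""
    return f"# TYPE {METRIC_NAME} gauge\n{METRIC_NAME} {value}\n"


schedule = parse_schedule(TSV)
print(render_prometheus(value_for(schedule, datetime(2020, 1, 20, 12, 47))))
```

Exporting the value ahead of time gives the HPA and the cluster a head start, so Pods and Nodes are already in place when the exam begins.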

  68. Kubernetes Integration Autodiscovery
    annotations:
      ad.datadoghq.com/timed-exam-schedule-exporter.check_names: |
        ["prometheus"]
      ad.datadoghq.com/timed-exam-schedule-exporter.init_configs: |
        [{}]
      ad.datadoghq.com/timed-exam-schedule-exporter.instances: |
        [
          {
            "prometheus_url": "http://%%host%%:8080/metrics",
            "namespace": "timed_exam",
            "metrics": ["*"]
          }
        ]
    https://docs.datadoghq.com/agent/kubernetes/integrations/?tab=kubernetes

  69. 3. Use the metric from the HPA as an external metric
    Configmap
    Pod: timed-exam-schedule-exporter
    HPA Controller
    Deployment Datadog Cluster-agent
    daemonset: datadog-agent
    Use a custom resource called DatadogMetric
    Reference it from an External-type metric in HPA autoscaling/v2beta2

  70. DatadogMetric
    apiVersion: datadoghq.com/v1alpha1
    kind: DatadogMetric
    metadata:
      name: timed-exam
    spec:
      # throughput: 10 = 5000 / 500. 500 pods accept 5000 users.
      # ref: https://github.com/quipper/quipper/issues/26054
      query: ceil(max:timed_exam.timed_exam_scheduled_scaling_desired_replicas{environment:production}/10)
    Datadog queries can be used as-is, which is great because it reduces the amount of code to write
    This query converts the number of examinees into desired replicas (by dividing by a factor of 10)
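
The arithmetic inside the query is small enough to check by hand. The sketch below mirrors it in Python, using the factor of 10 from the comment in the manifest (500 pods accept 5000 users) and examinee counts taken from the sample TSV; the function name is made up for illustration.

```python
import math

USERS_PER_POD = 10  # 5000 users / 500 pods, per the comment in the manifest


def replicas_from_examinees(examinees):
    """Mirror of ceil(<examinees> / 10) in the DatadogMetric query."""
    return math.ceil(examinees / USERS_PER_POD)


for examinees in (229, 3684, 4821):
    print(examinees, "->", replicas_from_examinees(examinees))
```

Rounding up errs on the side of one extra Pod rather than one too few.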

  71. HorizontalPodAutoscaler
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: api
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: api
      minReplicas: 40
      maxReplicas: 1000
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 60 # want 570 mcore of cpu usage. 570 / 950(requests) = 0.6
      - type: External
        external:
          metric:
            name: datadogmetric@<namespace>:timed-exam
          target:
            type: AverageValue
            averageValue: 1

  72. HorizontalPodAutoscaler
    The Resource metric (cpu, averageUtilization: 60) is the HPA setting based on CPU utilization that already existed

  73. HorizontalPodAutoscaler
    The External metric is the part added this time; it references the DatadogMetric

  74. HorizontalPodAutoscaler
    averageValue: the HPA aims to bring the target metric divided by the total number of Pods
    close to this value
    If the value obtained from the DatadogMetric is 100, the HPA tries to scale to 100 Pods
    = the metric expresses the desired replicas
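
Why averageValue: 1 turns the metric into a replica count can be seen from a short sketch. It assumes that, for an External metric with an AverageValue target, the proposal is the metric value divided by averageValue, rounded up, and then clamped to minReplicas and maxReplicas from the manifest (40 and 1000). Tolerance handling and the CPU metric are left out, and the function name is illustrative.

```python
import math


def external_average_value_replicas(metric_value, average_value=1,
                                    min_replicas=40, max_replicas=1000):
    """Replica proposal for an External metric with an AverageValue target."""
    proposed = math.ceil(metric_value / average_value)
    return max(min_replicas, min(max_replicas, proposed))


for metric_value in (23, 100, 2500):
    print(metric_value, "->", external_average_value_replicas(metric_value))
```

Outside exam hours the metric is small, so minReplicas and the CPU metric decide the size; during exams the external metric dominates, up to maxReplicas.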

  75. Results
    The yellow line shows scaling that follows the exam schedule;
    outside of that, CPU-based scaling is working!

  76. Results
    The difference in area between purple and blue is the cost that was saved.
    The estimate is a reduction of about $3150 per month.

  77. Summary
    • HPA autoscaling/v2beta2 supports multiple metrics
      • With multiple metrics, the HPA looks at several conditions and adopts the safer (higher) value
    • HPA autoscaling/v2beta2 supports external metrics
      • Metrics and queries that already exist in Datadog can be used, which is convenient
      • By sending custom metrics to Datadog, you can scale to any size at any time

  78. Thank you!
    chaspy chaspy_
    Lead Software Engineer

    Site Reliability at Quipper
    Takeshi Kondo
