
Cost Optimization with Cluster Autoscaler

Takeshi Kondo
September 30, 2019

2019/09/30 Lightning Talks

Transcript

  1. How to solve the problem?
     • Cluster Autoscaler
     • Horizontal Pod Autoscaler
     • Vertical Pod Autoscaler
  2. How to solve the problem?
     • Cluster Autoscaler
     • Horizontal Pod Autoscaler (tried and reverted because it was "JaJa Uma", i.e. unmanageable)
     • Vertical Pod Autoscaler (not trying)
  3. Agenda
     • Introduction / Background
     • Cluster Autoscaler
       • How to scale in/out
       • Check the code
     • More topics
       • (Production) Cluster Autoscaler works when releasing
       • (Production) PodDisruptionBudget prevents scale-in
       • (Production) Making scale-in zero-downtime
       • (Staging) Remove stale pull-request namespaces
     • Achievement
     • Conclusion
  4. Agenda
     • Introduction / Background
     • Cluster Autoscaler
       • How to scale in/out
       • Check the code
     • More topics
       • (Production) Cluster Autoscaler works when releasing
       • (Production) PodDisruptionBudget prevents scale-in
       • (Production) Making scale-in zero-downtime
       • (Staging) Remove stale pull-request namespaces
     • Achievement
     • Conclusion
  5. Cluster Autoscaler *1
     • Scale-up
       • When any unschedulable pods exist
     • Scale-in (all of the conditions below must hold)
       • No scale-up is needed
       • The sum of CPU and memory requests of all pods running on the node is smaller than 50% of the node's allocatable
       • All pods running on the node can be moved to other nodes
         • For example, a PodDisruptionBudget can prevent this; see *2 for details
       • The node doesn't have the scale-down-disabled annotation
         • "cluster-autoscaler.kubernetes.io/scale-down-disabled": "true"

     *1 https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md
     *2 https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node
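
     As a minimal sketch of the last condition: the opt-out is an annotation on the Node object (the node name below is hypothetical), and it can also be applied with kubectl annotate. The 50% figure corresponds to the autoscaler's --scale-down-utilization-threshold flag (0.5 by default), so it can be tuned per cluster.

        apiVersion: v1
        kind: Node
        metadata:
          name: ip-10-0-1-23.ap-northeast-1.compute.internal  # hypothetical node name
          annotations:
            # Cluster Autoscaler never scales down a node carrying this annotation
            cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"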
  6. Agenda
     • Introduction / Background
     • Cluster Autoscaler
       • How to scale in/out
       • Check the code
     • More topics
       • (Production) Cluster Autoscaler works when releasing
       • (Production) PodDisruptionBudget prevents scale-in
       • (Production) Making scale-in zero-downtime
       • (Staging) Remove stale pull-request namespaces
     • Achievement
     • Conclusion
  7. More topics
     • (Production) Cluster Autoscaler works when releasing
     • (Production) PodDisruptionBudget prevents scale-in
     • (Production) Making scale-in zero-downtime
     • (Staging) Remove stale pull-request namespaces
  8. More topics
     • (Production) Cluster Autoscaler works when releasing
     • (Production) PodDisruptionBudget prevents scale-in
     • (Production) Making scale-in zero-downtime
     • (Staging) Remove stale pull-request namespaces
  9. Cluster Autoscaler works when releasing
     • During a rolling update, up to "max surge" extra pods are created, so there can temporarily be unschedulable pods.
     [Chart: number of running pods vs. desired capacity of the ASG over time]
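
     A sketch of where that surge comes from, assuming a standard Deployment (all names and numbers hypothetical):

        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: api                # hypothetical
        spec:
          replicas: 10
          strategy:
            type: RollingUpdate
            rollingUpdate:
              maxSurge: 25%        # up to ceil(10 * 25%) = 3 extra pods during a rollout
              maxUnavailable: 0    # never drop below the desired replica count
          selector:
            matchLabels:
              app: api
          template:
            metadata:
              labels:
                app: api
            spec:
              containers:
                - name: api
                  image: example/api:v2   # hypothetical
                  resources:
                    requests:
                      cpu: 500m
                      memory: 512Mi

     With maxSurge above zero, a rollout briefly needs room for the desired replicas plus the surge pods; if the nodes are already full, the surge pods sit Pending (unschedulable), which is exactly the signal Cluster Autoscaler scales up on, raising the ASG's desired capacity as in the chart.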
  10. More topics
      • (Production) Cluster Autoscaler works when releasing
      • (Production) PodDisruptionBudget prevents scale-in
      • (Production) Making scale-in zero-downtime
      • (Staging) Remove stale pull-request namespaces
  11. More topics
      • (Production) Cluster Autoscaler works when releasing
      • (Production) PodDisruptionBudget prevents scale-in
      • (Production) Making scale-in zero-downtime
      • (Staging) Remove stale pull-request namespaces
  12. More topics
      • (Production) Cluster Autoscaler works when releasing
      • (Production) PodDisruptionBudget prevents scale-in
      • (Production) Making scale-in zero-downtime (a general sketch follows this slide)
      • (Staging) Remove stale pull-request namespaces
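
      The transcript does not capture the deck's zero-downtime recipe, but one common general technique (a sketch only, not necessarily what was used here) is to give pods a graceful shutdown window, so that draining a node during scale-in does not drop in-flight requests:

         # Pod template fragment: delay shutdown so load balancers can
         # deregister the pod before the container exits.
         spec:
           terminationGracePeriodSeconds: 60
           containers:
             - name: api                  # hypothetical
               image: example/api:v2      # hypothetical
               lifecycle:
                 preStop:
                   exec:
                     command: ["sh", "-c", "sleep 10"]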
  13. Agenda
      • Introduction / Background
      • Cluster Autoscaler
        • How to scale in/out
        • Check the code
      • More topics
        • (Production) Cluster Autoscaler works when releasing
        • (Production) PodDisruptionBudget prevents scale-in
        • (Production) Making scale-in zero-downtime
        • (Staging) Remove stale pull-request namespaces
      • Achievement
      • Conclusion
  14. Achievement (Staging)

      Cluster/Group  | Instance Class | Node Count (Aug.) | Node Count (Sep.) | Metric | Usage/Capacity (Aug.) | Usage/Capacity (Sep.)
      Japan/default  | r5.2xlarge     | 20.56             | 17.65             | Memory | 0.53                  | 0.59
      Global/default | r5.2xlarge     | 12                | 12                | Memory | 0.51                  | 0.56

      $443.84 (monthly) * (20.56 - 17.65) = $1,291.57 saved
  15. Achievement (Production)

      Cluster/Group  | Instance Class | Node Count (Aug.) | Node Count (Sep.) | Metric | Usage/Capacity (Aug.) | Usage/Capacity (Sep.)
      Japan/default  | m5.xlarge      | 6.38              | 8.26              | Memory | 0.68                  | 0.58
      Japan/api      | m5.2xlarge     | 17.1              | 13.42             | CPU    | 0.17                  | 0.14
      Global/default | m5.xlarge      | 6                 | 6                 | Memory | 0.51                  | 0.51
      Global/api     | m5.2xlarge     | 6                 | 6                 | CPU    | 0.15                  | 0.17

      Japan/default: $181.04 (monthly) * (6.38 - 8.26) = -$340.36 (a cost increase)
      Japan/api: $362.08 (monthly) * (17.1 - 13.42) = $1,332.45 saved
  16. Achievement (Production)

      Cluster/Group  | Instance Class | Node Count (Aug.) | Node Count (Sep.) | Metric | Usage/Capacity (Aug.) | Usage/Capacity (Sep.)
      Japan/default  | m5.xlarge      | 6.38              | 8.26              | Memory | 0.68                  | 0.58
      Japan/api      | m5.2xlarge     | 17.1              | 13.42             | CPU    | 0.17                  | 0.14
      Global/default | m5.xlarge      | 6                 | 6                 | Memory | 0.51                  | 0.51
      Global/api     | m5.2xlarge     | 6                 | 6                 | CPU    | 0.15                  | 0.17

      Japan/default: $181.04 (monthly) * (6.38 - 8.26) = -$340.36 (a cost increase)
      Japan/api: $362.08 (monthly) * (17.1 - 13.42) = $1,332.45 saved

      Due to PDB: the Japan/default increase is attributed to a PodDisruptionBudget preventing scale-in.
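
      A minimal sketch of the kind of PodDisruptionBudget behind the "Due to PDB" note, assuming a hypothetical api Deployment (policy/v1beta1 was the current API in 2019):

         apiVersion: policy/v1beta1
         kind: PodDisruptionBudget
         metadata:
           name: api-pdb          # hypothetical
         spec:
           minAvailable: 2        # evictions are refused if fewer than 2 pods would remain
           selector:
             matchLabels:
               app: api

      If the budget is as strict as the replica count, Cluster Autoscaler can never evict the pods, so the node is never removed; that is how a PDB keeps node counts from shrinking.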
  17. Agenda
      • Introduction / Background
      • Cluster Autoscaler
        • How to scale in/out
        • Check the code
      • More topics
        • (Production) Cluster Autoscaler works when releasing
        • (Production) PodDisruptionBudget prevents scale-in
        • (Production) Making scale-in zero-downtime
        • (Staging) Remove stale pull-request namespaces
      • Achievement
      • Conclusion
  18. Conclusion
      • Cluster Autoscaler saves both operational and infrastructure cost
        • (Staging) together with deleting stale namespaces
        • (Production) together with zero-downtime scale-in
      • Pod-level (Horizontal / Vertical) autoscaling is not introduced yet
        • This means we (SRE) still have to add pods/nodes manually under high load
      • Read the code
  19. Special Thanks
      • @yuya-takeyama / SRE: thanks for reviewing the PR
      • @rbmrclo / SRE: thanks for reviewing the PR
      • @hiroki-iwasaki / People & Culture: thanks for organizing the "Lightning Talks"