
To Infinity and Beyond: Seamless autoscaling with in-place resource resize for Kubernetes Pods


Aya (Igarashi) Ozawa

March 23, 2024

Transcript

1. To Infinity and Beyond: Seamless autoscaling with in-place resource resize for Kubernetes Pods. Aya Ozawa (CloudNatix), Kohei Ota (Apple)
2. Self introduction. Kohei Ota: Senior Field Engineer at Apple, CNCF Ambassador, Chair of Cloud Native Community Japan, owner of the SIG-Docs Japanese localization. Twitter: @inductor__, GitHub: @inductor. Aya Ozawa: Member of Technical Staff at CloudNatix, co-organizer of Kubernetes Meetup Tokyo. Twitter: @Ladicle, GitHub: @Ladicle.
3. Resource Requests & Limits. Kubernetes converts a container's Requests & Limits into its runtime configuration: Requests become cpu.shares, Limits become cpu.max and memory.max, and the Pod QoS class determines oom_score_adj.
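A rough sketch of this conversion in Python. The formulas approximate the kubelet's internal helpers; the cgroup v2 weight mapping in particular is an assumption based on a reading of the kubelet source, not an official API:

```python
# Sketch of how the kubelet turns resource requests/limits into cgroup
# values (simplified; the real conversion lives in the kubelet's cm package).

CFS_PERIOD_US = 100_000  # default CFS quota period in microseconds

def milli_cpu_to_shares(milli_cpu: int) -> int:
    """requests.cpu -> cgroup v1 cpu.shares (minimum 2)."""
    return max(2, (milli_cpu * 1024) // 1000)

def shares_to_cgroupv2_weight(shares: int) -> int:
    """cgroup v1 cpu.shares -> cgroup v2 cpu.weight (range 1..10000)."""
    return 1 + ((shares - 2) * 9999) // 262142

def milli_cpu_to_cpu_max(milli_cpu: int) -> str:
    """limits.cpu -> cgroup v2 cpu.max, formatted as '<quota> <period>'."""
    quota = (milli_cpu * CFS_PERIOD_US) // 1000
    return f"{quota} {CFS_PERIOD_US}"

# A container with requests.cpu=500m, limits.cpu=1, limits.memory=256Mi:
shares = milli_cpu_to_shares(500)             # 512
weight = shares_to_cgroupv2_weight(shares)    # 20 on cgroup v2
cpu_max = milli_cpu_to_cpu_max(1000)          # "100000 100000"
memory_max = 256 * 1024 * 1024                # memory.max in bytes
```

The key point the slide makes survives the simplification: requests only shape relative CPU weight, while limits produce hard caps (cpu.max, memory.max).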
4. Does the Pod fit the Node? Only Requests are used for scheduling, so CPU and memory can be overcommitted.
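The fit check itself is just arithmetic over requests; a minimal sketch (the function name and the numbers are made up for illustration):

```python
# Minimal sketch of the scheduler's fit check: a Pod fits a Node if the sum
# of requests already placed on the node plus the new Pod's request does not
# exceed the node's allocatable capacity. Limits are NOT checked here, which
# is exactly why CPU/memory can be overcommitted.

def fits(node_allocatable_m: int, existing_requests_m: list[int], pod_request_m: int) -> bool:
    return sum(existing_requests_m) + pod_request_m <= node_allocatable_m

# Node with 4000m CPU allocatable, pods already requesting 1500m and 2000m:
fits(4000, [1500, 2000], 500)   # True: allocatable exactly filled
fits(4000, [1500, 2000], 600)   # False: requests would exceed allocatable
```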
5. Hitting the resource limits degrades performance: exceeding the memory limit gets the container OOM-killed, and hitting the CPU limit gets it throttled. Overcommitted nodes can end up in the same situation even before a container reaches its limit.
6. Pod Quality of Service (QoS) class & OOM score. Guaranteed: requests equal limits for all containers. Burstable: at least one container sets a request or limit. BestEffort: no requests or limits for all containers. On node OOM, the OOM Killer picks victims based on each container's oom_score_adj, which is derived from the QoS class.
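A sketch of how the kubelet derives oom_score_adj from the QoS class. The constants and the Burstable formula approximate the kubelet's OOM-score policy; treat them as illustrative rather than authoritative:

```python
# Simplified model of the kubelet's QoS -> oom_score_adj mapping.
# Higher scores are killed first when the node runs out of memory.

GUARANTEED_ADJ = -997    # Guaranteed pods are killed last
BEST_EFFORT_ADJ = 1000   # BestEffort pods are killed first

def oom_score_adj(qos: str, mem_request_bytes: int = 0, node_capacity_bytes: int = 1) -> int:
    if qos == "Guaranteed":
        return GUARANTEED_ADJ
    if qos == "BestEffort":
        return BEST_EFFORT_ADJ
    # Burstable: the larger the memory request relative to node capacity,
    # the lower the score (less likely to be killed), clamped to [2, 999].
    adj = 1000 - (1000 * mem_request_bytes) // node_capacity_bytes
    return min(max(adj, 2), 999)

oom_score_adj("Guaranteed")                    # -997
oom_score_adj("Burstable", 4 << 30, 16 << 30)  # 750: requests 4Gi of a 16Gi node
oom_score_adj("BestEffort")                    # 1000
```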
7. What goes wrong? Requests that are too high waste cost; requests that are too low degrade performance and availability.
8. What goes wrong? Requests that are too high waste cost; requests that are too low degrade performance and availability. On top of that, if requests are not configured correctly, the cluster-autoscaler cannot scale properly.
9. Recommended Requests. lowerBound: running with fewer resources than this is likely to damage performance/availability. target: the recommended amount of resources. uncappedTarget: the original target before being capped by the resource policy. upperBound: allocating more than this is likely wasted. Limits are kept at the same ratio to requests as originally configured.
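The "same ratio" rule can be sketched as simple proportional scaling. This is a simplification of VPA's limit scaling that ignores rounding and per-container resource policies:

```python
# Sketch of VPA's proportional limit scaling: when the recommender changes a
# container's request, the limit is scaled so the original limit/request
# ratio is preserved (simplified; integer millicore arithmetic).

def scale_limit(orig_request_m: int, orig_limit_m: int, new_request_m: int) -> int:
    return (new_request_m * orig_limit_m) // orig_request_m

# Original: request 200m, limit 400m (a 2x ratio). New target request 500m:
scale_limit(200, 400, 500)  # 1000m, keeping the 2x limit/request ratio
```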
10. How are targets estimated? 1. Fetch metrics: by default, CPU samples are the peak usage in the last 5 minutes and memory samples are the last usage; on OOM, the current usage is always added as a sample. 2. Add the samples to histograms with a 24h half-life decay, so newer samples matter more than older ones. 3. Find the P90 values (the default percentile).
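The decayed estimate can be sketched as an exponentially weighted percentile. The real recommender uses bucketized histograms; this flat-list version is only illustrative:

```python
# Sketch of the recommender's percentile estimate: each usage sample is
# weighted by an exponential decay with a 24h half-life, then the weighted
# percentile (P90 by default) is taken over the sorted values.

HALF_LIFE_HOURS = 24.0

def decayed_percentile(samples, now_h, percentile=0.90):
    """samples: list of (value, timestamp_in_hours). Weighted percentile."""
    weighted = sorted(
        (value, 0.5 ** ((now_h - t) / HALF_LIFE_HOURS)) for value, t in samples
    )
    total = sum(w for _, w in weighted)
    acc = 0.0
    for value, w in weighted:
        acc += w
        if acc >= percentile * total:
            return value
    return weighted[-1][0]

# With equal timestamps this is an ordinary weighted P90:
decayed_percentile([(100, 96), (200, 96), (300, 96)], now_h=96)   # 300
# A 4-day-old spike (weight 0.5^4) is dominated by a fresh low sample:
decayed_percentile([(1000, 0), (100, 96)], now_h=96)              # 100
```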
11. HPA vs. VPA. VPA cannot be used together with HPA on CPU or memory; that is a scenario where VPA and HPA do not work well together. Use HPA on custom/external metrics (e.g., RPS, requests per second) instead.
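For reference, HPA's core scaling rule on such a metric is desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue); a minimal sketch with made-up numbers:

```python
# Sketch of the HPA scaling rule applied to a custom per-pod metric such as
# RPS: scale replicas so the per-pod average moves toward the target.
import math

def desired_replicas(current_replicas: int, current_rps_per_pod: float,
                     target_rps_per_pod: float) -> int:
    return math.ceil(current_replicas * (current_rps_per_pod / target_rps_per_pod))

# 4 replicas averaging 300 RPS each against a 200 RPS-per-pod target:
desired_replicas(4, 300, 200)  # 6 replicas
```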
12. 4 Update Modes (subject to change). The VPA components perform three actions: 1. update recommendations, 2. evict Pods, 3. set resources. Off: just calculate the recommended values (action 1 only). Initial: set resources only at Pod creation (actions 1 & 3). Recreate: evict the Pod and recreate it if its resources are not as recommended (all actions). Auto: currently equivalent to Recreate.
13. Restarting is a serious disruption for some Pods. A Pod Disruption Budget (PDB) mitigates the availability impact of Pod eviction, but non-optimal scenarios remain: workloads with high usage only at startup never get their resources optimized, and evicting a long-running Job (e.g., ML workloads) incurs a high running cost.
14. Setting resource limits by level: the scheduler detects a Pod that needs to be assigned to a Node; the kubelet sets the resources in the OCI spec; the container runtime passes the limits down; and finally the cgroups are configured (e.g., memory.max).
15. What has (had) to be implemented: API spec, Scheduler, Kubelet, Runtime (cgroups v2), Tests.
16. What has (had) to be implemented: API spec, Scheduler, Kubelet, Runtime (cgroups v2), Tests. These are kubelet/API changes only! The VPA implementation is a whole different story ☠
17. VPA will support in-place mode. There is still a lot to consider in the AEP (Autoscaler Enhancement Proposal): https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler/enhancements/4016-in-place-updates-support. The current implementation depends on the runtime/kubelet decision on restarting containers, but in the proposed design that decision should happen in VPA. More use cases and feedback are wanted! We really need to be careful about scaling memory down. More discussion happens in #sig-node-inplace-pod-resize on the Kubernetes Slack.
18. Key Takeaways! • Resource management is important for running the cluster efficiently and reliably. • VPA helps you set the right amount of resources. • Current (GA) resource changes require restarting Pods, and there are corner cases where a restart is unacceptable. • In-place resource resize solves this problem. • VPA + in-place resource resize will bring even more autonomous resource management to your cluster, though there are still LOTS of todos to make it happen.
19. Please give us feedback 🤞 https://sched.co/1YePO. Kohei Ota: Email [email protected], GitHub @inductor, Twitter @inductor__. Aya Ozawa: Email [email protected], GitHub @Ladicle, Twitter @Ladicle.