
To Infinity and Beyond: Seamless autoscaling with in-place resource resize for Kubernetes Pods


Aya (Igarashi) Ozawa

March 23, 2024

Transcript

1. To Infinity and Beyond: Seamless autoscaling with in-place resource resize for Kubernetes Pods. Aya Ozawa (CloudNatix), Kohei Ota (Apple)
2. Self introduction. Kohei Ota: Senior Field Engineer at Apple, CNCF Ambassador, Chair of Cloud Native Community Japan, owner of the SIG-Docs Japanese localization. Twitter: @inductor__, GitHub: @inductor. Aya Ozawa: Member of Technical Staff at CloudNatix, co-organizer of Kubernetes Meetup Tokyo. Twitter: @Ladicle, GitHub: @Ladicle.
3. Resource Requests & Limits. Kubernetes converts a container's Requests & Limits into its runtime configuration: Requests become cpu.shares, Limits become cpu.max and memory.max, and the Pod QoS class determines oom_score_adj.
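A rough sketch of this conversion in Python. The formulas approximate the kubelet's internal helpers; the cgroup v2 weight mapping in particular is an assumption based on a reading of the kubelet source, not an official API:

```python
# Sketch of how the kubelet turns resource requests/limits into cgroup
# values (simplified; the real conversion lives in the kubelet's cm package).

CFS_PERIOD_US = 100_000  # default CFS quota period in microseconds

def milli_cpu_to_shares(milli_cpu: int) -> int:
    """requests.cpu -> cgroup v1 cpu.shares (minimum 2)."""
    return max(2, (milli_cpu * 1024) // 1000)

def shares_to_cgroupv2_weight(shares: int) -> int:
    """cgroup v1 cpu.shares -> cgroup v2 cpu.weight (range 1..10000)."""
    return 1 + ((shares - 2) * 9999) // 262142

def milli_cpu_to_cpu_max(milli_cpu: int) -> str:
    """limits.cpu -> cgroup v2 cpu.max, formatted as '<quota> <period>'."""
    quota = (milli_cpu * CFS_PERIOD_US) // 1000
    return f"{quota} {CFS_PERIOD_US}"

# A container with requests.cpu=500m, limits.cpu=1, limits.memory=256Mi:
shares = milli_cpu_to_shares(500)             # 512
weight = shares_to_cgroupv2_weight(shares)    # 20 on cgroup v2
cpu_max = milli_cpu_to_cpu_max(1000)          # "100000 100000"
memory_max = 256 * 1024 * 1024                # memory.max in bytes
```

The key point the slide makes survives the simplification: requests only shape relative CPU weight, while limits produce hard caps (cpu.max, memory.max).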
4. Does the Pod fit the Node? Only Requests are used for scheduling, so CPU and memory can be overcommitted.
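The fit check itself is just arithmetic over requests; a minimal sketch (the function name and the numbers are made up for illustration):

```python
# Minimal sketch of the scheduler's fit check: a Pod fits a Node if the sum
# of requests already placed on the node plus the new Pod's request does not
# exceed the node's allocatable capacity. Limits are NOT checked here, which
# is exactly why CPU/memory can be overcommitted.

def fits(node_allocatable_m: int, existing_requests_m: list[int], pod_request_m: int) -> bool:
    return sum(existing_requests_m) + pod_request_m <= node_allocatable_m

# Node with 4000m CPU allocatable, pods already requesting 1500m and 2000m:
fits(4000, [1500, 2000], 500)   # True: allocatable exactly filled
fits(4000, [1500, 2000], 600)   # False: requests would exceed allocatable
```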
5. Hitting the resource limits degrades performance: exceeding the memory limit gets the container OOM-killed, and hitting the CPU limit gets it throttled. Overcommitted nodes can end up in the same situation even before a container reaches its limit.
6. Pod Quality of Service (QoS) class & OOM score. Guaranteed: requests equal limits for all containers. Burstable: at least one container sets a request or limit. BestEffort: no requests or limits for all containers. On node OOM, the OOM Killer picks victims based on each container's oom_score_adj, which is derived from the QoS class.
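A sketch of how the kubelet derives oom_score_adj from the QoS class. The constants and the Burstable formula approximate the kubelet's OOM-score policy; treat them as illustrative rather than authoritative:

```python
# Simplified model of the kubelet's QoS -> oom_score_adj mapping.
# Higher scores are killed first when the node runs out of memory.

GUARANTEED_ADJ = -997    # Guaranteed pods are killed last
BEST_EFFORT_ADJ = 1000   # BestEffort pods are killed first

def oom_score_adj(qos: str, mem_request_bytes: int = 0, node_capacity_bytes: int = 1) -> int:
    if qos == "Guaranteed":
        return GUARANTEED_ADJ
    if qos == "BestEffort":
        return BEST_EFFORT_ADJ
    # Burstable: the larger the memory request relative to node capacity,
    # the lower the score (less likely to be killed), clamped to [2, 999].
    adj = 1000 - (1000 * mem_request_bytes) // node_capacity_bytes
    return min(max(adj, 2), 999)

oom_score_adj("Guaranteed")                    # -997
oom_score_adj("Burstable", 4 << 30, 16 << 30)  # 750: requests 4Gi of a 16Gi node
oom_score_adj("BestEffort")                    # 1000
```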
7. What goes wrong? Requests that are too high waste cost; requests that are too low degrade performance and availability.
8. What goes wrong? Requests that are too high waste cost; requests that are too low degrade performance and availability. On top of that, if requests are not configured correctly, the cluster-autoscaler cannot scale properly.
9. Recommended Requests. lowerBound: running with fewer resources than this is likely to damage performance/availability. target: the recommended amount of resources. uncappedTarget: the original target before being capped by the resource policy. upperBound: allocating more than this is likely wasted. Limits are kept at the same ratio to requests as originally configured.
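The "same ratio" rule can be sketched as simple proportional scaling. This is a simplification of VPA's limit scaling that ignores rounding and per-container resource policies:

```python
# Sketch of VPA's proportional limit scaling: when the recommender changes a
# container's request, the limit is scaled so the original limit/request
# ratio is preserved (simplified; integer millicore arithmetic).

def scale_limit(orig_request_m: int, orig_limit_m: int, new_request_m: int) -> int:
    return (new_request_m * orig_limit_m) // orig_request_m

# Original: request 200m, limit 400m (a 2x ratio). New target request 500m:
scale_limit(200, 400, 500)  # 1000m, keeping the 2x limit/request ratio
```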
10. How are targets estimated? 1. Fetch metrics: by default, CPU samples are the peak usage in the last 5 minutes and memory samples are the last usage; on OOM, the current usage is always added as a sample. 2. Add the samples to histograms with a 24h half-life decay, so newer samples matter more than older ones. 3. Find the P90 values (the default percentile).
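The decayed estimate can be sketched as an exponentially weighted percentile. The real recommender uses bucketized histograms; this flat-list version is only illustrative:

```python
# Sketch of the recommender's percentile estimate: each usage sample is
# weighted by an exponential decay with a 24h half-life, then the weighted
# percentile (P90 by default) is taken over the sorted values.

HALF_LIFE_HOURS = 24.0

def decayed_percentile(samples, now_h, percentile=0.90):
    """samples: list of (value, timestamp_in_hours). Weighted percentile."""
    weighted = sorted(
        (value, 0.5 ** ((now_h - t) / HALF_LIFE_HOURS)) for value, t in samples
    )
    total = sum(w for _, w in weighted)
    acc = 0.0
    for value, w in weighted:
        acc += w
        if acc >= percentile * total:
            return value
    return weighted[-1][0]

# With equal timestamps this is an ordinary weighted P90:
decayed_percentile([(100, 96), (200, 96), (300, 96)], now_h=96)   # 300
# A 4-day-old spike (weight 0.5^4) is dominated by a fresh low sample:
decayed_percentile([(1000, 0), (100, 96)], now_h=96)              # 100
```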
11. HPA vs. VPA. VPA cannot be used together with HPA on CPU or memory; that is a scenario where VPA and HPA do not work well together. Use HPA on custom/external metrics (e.g., RPS, requests per second) instead.
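For reference, HPA's core scaling rule on such a metric is desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue); a minimal sketch with made-up numbers:

```python
# Sketch of the HPA scaling rule applied to a custom per-pod metric such as
# RPS: scale replicas so the per-pod average moves toward the target.
import math

def desired_replicas(current_replicas: int, current_rps_per_pod: float,
                     target_rps_per_pod: float) -> int:
    return math.ceil(current_replicas * (current_rps_per_pod / target_rps_per_pod))

# 4 replicas averaging 300 RPS each against a 200 RPS-per-pod target:
desired_replicas(4, 300, 200)  # 6 replicas
```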
12. 4 Update Modes (subject to change). The VPA components perform three actions: 1. update recommendations, 2. evict Pods, 3. set resources. Off: just calculate the recommended values (action 1 only). Initial: set resources only at Pod creation (actions 1 & 3). Recreate: evict the Pod and recreate it if its resources are not as recommended (all actions). Auto: currently equivalent to Recreate.
13. Restarting is a serious disruption for some Pods. A Pod Disruption Budget (PDB) mitigates the availability impact of Pod eviction, but non-optimal scenarios remain: workloads with high usage only at startup never get their resources optimized, and evicting a long-running Job (e.g., ML workloads) incurs a high running cost.
14. Setting resource limits by level: the scheduler detects a Pod that needs to be assigned to a Node; the kubelet sets the resources in the OCI spec; the container runtime passes the limits down; and finally the cgroups are configured (e.g., memory.max).
15. What has (had) to be implemented: API spec, Scheduler, Kubelet, Runtime (cgroups v2), Tests.
16. What has (had) to be implemented: API spec, Scheduler, Kubelet, Runtime (cgroups v2), Tests. These are kubelet/API changes only! The VPA implementation is a whole different story ☠
17. VPA will support in-place mode. There is still a lot to consider in the AEP (Autoscaler Enhancement Proposal): https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler/enhancements/4016-in-place-updates-support. The current implementation depends on the runtime/kubelet decision on restarting containers, but in the proposed design that decision should happen in VPA. More use cases and feedback are wanted! We really need to be careful about scaling memory down. More discussion happens in #sig-node-inplace-pod-resize on the Kubernetes Slack.
18. Key Takeaways! • Resource management is important for running the cluster efficiently and reliably. • VPA helps you set the right amount of resources. • Current (GA) resource changes require restarting Pods, and there are corner cases where a restart is unacceptable. • In-place resource resize solves this problem. • VPA + in-place resource resize will bring even more autonomous resource management to your cluster, though there are still LOTS of todos to make it happen.
19. Please give us feedback 🤞 https://sched.co/1YePO. Kohei Ota: Email [email protected], GitHub @inductor, Twitter @inductor__. Aya Ozawa: Email [email protected], GitHub @Ladicle, Twitter @Ladicle.