Slide 1

To Infinity and Beyond: Seamless autoscaling with in-place resource resize for Kubernetes Pods
Aya Ozawa, CloudNatix / Kohei Ota, Apple

Slide 2

Self introduction
Kohei Ota: Senior Field Engineer at Apple; CNCF Ambassador; Chair of Cloud Native Community Japan; Owner of SIG-Docs Japanese localization. Twitter: @inductor__ / GitHub: @inductor
Aya Ozawa: Member of Technical Staff at CloudNatix; Co-organizer of Kubernetes Meetup Tokyo. Twitter: @Ladicle / GitHub: @Ladicle

Slide 3

Resource management is key to smooth sailing!

Slide 4

Resource Requests & Limits
Requests: minimum guaranteed amount of a resource
Limits: maximum amount of a resource
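
A minimal sketch of how these are declared on a container (hypothetical names and values):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                 # hypothetical Pod
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:           # minimum guaranteed amount; used for scheduling
          cpu: 500m
          memory: 128Mi
        limits:             # maximum amount the container may consume
          cpu: "1"
          memory: 256Mi
```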

Slide 5

Resource Requests & Limits
The kubelet checks Requests and converts Requests & Limits into the container config:
Requests: cpu.shares
Limits: cpu.max, memory.max
Pod QoS: oom_score_adj
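
As a hedged sketch with illustrative numbers, the Pod above would translate into cgroup settings roughly as follows:

```yaml
# requests.cpu: 500m   -> cpu.shares = 500 * 1024 / 1000 = 512
#                         (cgroup v2 derives cpu.weight from the same value)
# limits.cpu: "1"      -> cpu.max = "100000 100000"
#                         (quota of 100ms per 100ms period = 1 full CPU)
# limits.memory: 256Mi -> memory.max = 268435456 (bytes)
# Pod QoS              -> oom_score_adj (e.g., -997 for a Guaranteed Pod)
```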

Slide 6

Does the Pod fit the Node?
Requests are used for scheduling; CPU/Memory can be overcommitted (the sum of limits may exceed what the node has).
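
A small worked example of the fit check, which looks only at requests (hypothetical node and Pods):

```yaml
# Node allocatable:         cpu: 4
# Pod A: requests cpu: 2,   limits cpu: 4
# Pod B: requests cpu: 1.5, limits cpu: 4
# Fit check: 2 + 1.5 = 3.5 <= 4  -> both Pods can be scheduled onto the node
# Limits:    4 + 4   = 8   >  4  -> the node is overcommitted on CPU
```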

Slide 7

Degraded performance: hitting the Resource Limits!
Memory Limits → OOM Killed
CPU Limits → Throttled
Overcommitted nodes can end up in the same situation even before the limits are reached.
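
What this looks like in practice, as a hedged sketch: an OOM-killed container is reported in the Pod status with exit code 137, and throttling shows up in the cgroup v2 cpu.stat counters.

```yaml
# Fragment of a Pod status after a memory-limit kill (illustrative):
status:
  containerStatuses:
    - name: app
      lastState:
        terminated:
          reason: OOMKilled
          exitCode: 137
# CPU throttling counters from cgroup v2 cpu.stat (illustrative values):
#   nr_throttled 42
#   throttled_usec 1234567
```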

Slide 8

Pod Quality of Service (QoS) Class & OOM Score
Guaranteed: requests == limits for all containers
Burstable: at least one container has requests set
BestEffort: no requests or limits for all containers
Node OOM! The OOM Killer picks its victims based on oom_score_adj.
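
Minimal per-class sketches (container resources fragments, hypothetical values):

```yaml
# Guaranteed: every container has requests == limits
resources:
  requests: { cpu: 500m, memory: 256Mi }
  limits:   { cpu: 500m, memory: 256Mi }
---
# Burstable: at least one container sets requests (limits absent or higher)
resources:
  requests: { cpu: 100m }
---
# BestEffort: no requests and no limits on any container
resources: {}
```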

Slide 9

Eviction Manager: memory pressure & Requests
Under memory pressure, Pods are evicted in order of: usage exceeding requests, then Pod Priority (PP), then the gap between requests and usage.

Slide 10

What goes wrong?
Requests too low → performance/availability degradation
Requests too high → wasted cost

Slide 11

What goes wrong?
Requests too low → performance/availability degradation
Requests too high → wasted cost
If requests are not configured correctly, the cluster-autoscaler cannot scale properly either.

Slide 12

How do you set the right value? Observe → Review → Scale

Slide 13

Vertical Pod Autoscaler

Slide 14

What's the VPA?
It automates the Observe → Review → Scale loop for scalable objects (e.g., Deployments, StatefulSets).
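
A minimal VPA object, assuming the VPA components are installed in the cluster (hypothetical target):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa               # hypothetical name
spec:
  targetRef:                  # the scalable object to observe and scale
    apiVersion: apps/v1
    kind: Deployment
    name: app
```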

Slide 15

Recommended Requests
lowerBound: running with less resources is likely to damage performance/availability
target: recommended amount of resources
uncappedTarget: original target before being capped by the resource policy
upperBound: allocating beyond these resources is likely wasted
Limits keep the same ratio as the original requests and limits.
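
These fields appear in the VPA status; a sketch with illustrative values:

```yaml
status:
  recommendation:
    containerRecommendations:
      - containerName: app
        lowerBound:     { cpu: 250m, memory: 200Mi }
        target:         { cpu: 500m, memory: 300Mi }
        uncappedTarget: { cpu: 600m, memory: 300Mi }
        upperBound:     { cpu: "2",  memory: 1Gi }
```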

Slide 16

How are targets estimated?
1. Fetch metrics. CPU: always the current usage. Memory: peak usage in 5m by default; last usage on OOM.
2. Add samples to 24h half-decayed histograms. Newer samples are more important than older ones.
3. Find the P90 (percentile) values.
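
The half-life decay in step 2 can be written as a per-sample weight (a simplified sketch of the decay factor only):

```latex
w(t_i) = 2^{(t_i - t_{\mathrm{ref}})/\tau_{1/2}}, \qquad \tau_{1/2} = 24\,\mathrm{h}
```

That is, a sample taken 24 hours earlier than the reference time counts half as much as one taken at the reference time.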

Slide 17

HPA vs. VPA
VPA cannot be used with HPA on CPU or memory; this is a scenario where VPA and HPA do not work well together. Use HPA on custom/external metrics (e.g., RPS, requests per second) instead.
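
A hedged sketch of an HPA on a per-Pod RPS metric, assuming a metrics adapter exposes it (hypothetical metric name):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa                          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second # hypothetical adapter-provided metric
        target:
          type: AverageValue
          averageValue: "100"
```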

Slide 18

4 Update Modes (subject to change)
Off: just calculate the recommended values
Initial: set resources only at Pod creation
Recreate: evict the Pod and recreate it if resources are not as recommended
Auto: currently equivalent to Recreate
The VPA components perform three actions: 1. Update recommendation, 2. Evict, 3. Set resources. Off uses only 1; Initial uses 1 & 3; Recreate and Auto use all.
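
The mode is set in the VPA's updatePolicy; a minimal sketch:

```yaml
spec:
  updatePolicy:
    updateMode: "Off"   # one of "Off", "Initial", "Recreate", "Auto"
                        # note: quote "Off", or YAML parses it as a boolean
```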

Slide 19

Restarting is a serious disruption for some Pods
A Pod Disruption Budget (PDB) mitigates the availability impact of Pod eviction!
Successful scenario: eviction-based resizing works as intended.
Non-optimal scenarios: Pods with high usage only at startup (resources never get optimized); long-running Jobs, e.g., ML workloads (restarting means a high running cost).

Slide 20

In-place Pod Resizing (Alpha)

Slide 21

In-place Pod Resizing (Alpha)

Slide 22

Make resource settings mutable

Slide 23

Make resource settings mutable
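
Concretely, with the InPlacePodVerticalScaling feature gate enabled (alpha), a running Pod's resources can be patched without recreating it; a hedged sketch with hypothetical names:

```yaml
# Per-resource resize policy on the container (alpha API):
spec:
  containers:
    - name: app
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired       # resize CPU without any restart
        - resourceName: memory
          restartPolicy: RestartContainer  # memory resize restarts this container
# Then mutate the live Pod, e.g.:
# kubectl patch pod web --patch \
#   '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"800m"}}}]}}'
```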

Slide 24

Demo time! (wish us luck)

Slide 25

Demo time! (wish us luck) Succeeded? 👏👏👏 …Failed? It happens

Slide 26

Demo: Plan B (in case of failure)

Slide 27

Setting resource limits, level by level:
Scheduler: detects a Pod that needs to be assigned to a Node
Kubelet: passes the limits down to the runtime
Runtime: sets them in the OCI spec
cgroups: the kernel interface is written (e.g., memory.max)
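
At the runtime level, the limits land in the OCI runtime config (config.json; JSON in reality, rendered as YAML here for readability, illustrative values):

```yaml
linux:
  resources:
    memory:
      limit: 268435456   # bytes; written through to cgroup memory.max
    cpu:
      shares: 512        # derived from requests.cpu
      quota: 100000      # derived from limits.cpu
      period: 100000     # quota/period becomes cgroup cpu.max
```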

Slide 28

What has (had) to be implemented
● API spec
● Scheduler
● Kubelet
● Runtime
○ cgroups v2
● Test

Slide 29

What has (had) to be implemented
- API spec
- Scheduler
- Kubelet
- Runtime
  - cgroups v2
- Test
These are kubelet/API changes only! The VPA implementation is a whole different story ☠

Slide 30

In-place resizing will be Beta soon (?) 🤷

Slide 31

VPA will support in-place mode
● Still lots to consider in the AEP (Autoscaler Enhancement Proposal):
○ https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler/enhancements/4016-in-place-updates-support
○ The current implementation depends on the runtime/kubelet decision on restarting containers; in the proposed design, that decision should happen in VPA
○ More use cases and feedback are wanted!
○ We really need to be careful about memory scale-down
● More discussion in #sig-node-inplace-pod-resize on the Kubernetes Slack
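
If the AEP lands, selecting in-place updates might look roughly like this; the mode name below is a hypothetical placeholder, see AEP-4016 for the actual proposal:

```yaml
spec:
  updatePolicy:
    updateMode: "InPlace"   # hypothetical value for illustration; subject to the AEP
```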

Slide 32

Key Takeaways!
● Resource management is important for managing the cluster efficiently and reliably.
● VPA helps you set the right amount of resources.
● Current (GA) resource changes require restarting Pods
○ There are corner cases where a restart is exactly what should not happen
● In-place resource resize solves this problem
● VPA + in-place resource resize will bring even more autonomous resource management to your cluster
○ There are still LOTS of todos to make it happen

Slide 33

In-place resizing will make your voyage manageable

Slide 34

Please provide us with feedback 🤞 https://sched.co/1YePO
Kohei Ota: Email: [email protected] / GitHub: @inductor / Twitter: @inductor__
Aya Ozawa: Email: [email protected] / GitHub: @Ladicle / Twitter: @Ladicle

Slide 35

Thank you!