Slide 1

Slide 1 text

Kubernetes and Swap: A Love-Hate Relationship HungWei Chiu(邱宏瑋) 10/24/2024

Slide 2

Slide 2 text

K8s Installation • We have been told to disable swap when installing Kubernetes; otherwise we see one of the following errors • Running with swap on is not supported, please disable swap! Or set --fail-swap-on flag to false • [ERROR Swap]: running with swap on is not supported. Please disable swap
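Before installing, it can help to confirm whether swap is active on a node. The following is a minimal sketch, assuming a Linux host that exposes `/proc/meminfo`; the helper names `swap_total_kib` and `swap_enabled` are made up for illustration, and this is not the check the kubelet itself performs.

```python
def swap_total_kib(meminfo_text: str) -> int:
    """Return the SwapTotal value (in KiB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Typical line: "SwapTotal:       2097148 kB"
            return int(line.split()[1])
    # No SwapTotal line found: treat as no swap.
    return 0


def swap_enabled(meminfo_text: str) -> bool:
    """True when the node reports a non-zero amount of swap."""
    return swap_total_kib(meminfo_text) > 0


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            print("swap enabled:", swap_enabled(f.read()))
    except OSError:
        print("/proc/meminfo is not readable on this system")
```

If this reports swap as enabled, a kubelet left at its default `--fail-swap-on=true` would be expected to refuse to start with the error quoted above.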

Slide 3

Slide 3 text

Swap • A memory management technique • Move less frequently used memory pages to disk storage, freeing up physical memory for other applications.

Slide 4

Slide 4 text

Swap • Pros: • Enhances memory utilization • Provides an additional buffer for running more applications when physical memory is insufficient • Increases system stability • Helps prevent system crashes due to memory shortages (OOM)

Slide 5

Slide 5 text

Swap • Cons • Performance degradation • Disk I/O is slower than memory access, leading to potential slowdowns • Higher latency • Accessing swapped-out memory pages incurs additional delays, affecting application responsiveness.

Slide 6

Slide 6 text

Kubernetes & Swap • Original Issue • https://github.com/kubernetes/kubernetes/issues/53533

Slide 7

Slide 7 text

Kubernetes & Swap • The swap support issue has been open since K8s 1.8 (2017) • Roughly 150 comments of discussion • Concerns • QoS definition and isolation • Design principles

Slide 8

Slide 8 text

Kubernetes & Swap • Concerns • Need to control which Pods can use swap • Performance penalty for Guaranteed Pods • Guaranteed Pods should have consistent performance

Slide 9

Slide 9 text

Kubernetes & Swap • Concerns • No storage isolation • Disk (I/O) access performance may be affected by swap • Guaranteed Pods should have consistent performance.

Slide 11

Slide 11 text

Kubernetes & Swap • Concerns • How to determine the total amount of memory? • If we set the memory limit to 2G, how much swap is allowed? • Should it be specified by the user or calculated by the kubelet?

Slide 12

Slide 12 text

Kubernetes & Swap • Why Swap? • Cluster administrators • Node-level performance tuning and stability for noisy-neighbor issues. • Application developers • Applications that would benefit from using swap memory

Slide 13

Slide 13 text

Case 1 • Improved Node Stability • Cgroup v2 enables improved memory management tools, such as oomd, which strongly recommends the use of swap. • Having a small amount of swap can improve resource pressure handling and recovery.

Slide 14

Slide 14 text

Case 1 • oomd (systemd-oomd) • From Facebook • User-space process • Flexible policy to control OOM behavior • Needs swap to work reliably

Slide 15

Slide 15 text

Case 2 • Long-running applications • Java and Node runtimes rely on swap for optimal performance. • Initialization logic of applications can be safely swapped out without affecting long-running application resource usage.

Slide 16

Slide 16 text

Case 2 • Alternative Solutions • Memory • Request: • Limit: • Lower resource utilization if we have to use Guaranteed QoS

Slide 17

Slide 17 text

Case 2 • Alternative Solutions • In-place resource update (alpha since 1.27) • Issues • Pod restart • GitOps • Loop • Update YAML + PR + merge

Slide 18

Slide 18 text

Kubernetes & Swap (User Stories) • Low-footprint systems • Edge devices with limited memory • Edge compute systems/devices with small memory (<2G) • Clusters with nodes that have <4Gi memory

Slide 19

Slide 19 text

Kubernetes & Swap (User Stories) • Virtualization management overhead • Virtualized K8s workloads, such as VMs launched by KubeVirt • VMs come with management-related overhead • Swap helps avoid requesting much more memory just to cover that overhead. • Live migration to keep the system available instead of going down

Slide 21

Slide 21 text

Kubernetes & Swap (User Stories) • Live migration to keep the system available instead of going down • When the target server doesn’t have enough memory

Slide 22

Slide 22 text

Kubernetes & Swap • KEP-2400: Node System Swap Support • Phase • 1.22: Alpha • 1.28: Beta1 • 1.30: Beta2 • Number of cases that would benefit from swap. • Design principles?

Slide 23

Slide 23 text

Kubernetes & Swap • Scenarios • Swap is enabled on a node’s host system, but the kubelet does not permit K8s workloads to use swap -> case 1 • Kubernetes can permit K8s workloads scheduled on the node to use some quantity of swap, depending on the configuration -> case 2

Slide 24

Slide 24 text

Kubernetes & Swap • Proposal • QoS Issue • Enable swap support only for Burstable QoS Pods • Not for Guaranteed Pods • Performance penalty. • Not for Best-Effort Pods • Low-priority Pods that are the first to be killed under node pressure.
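The QoS rule above can be sketched in code. This is a simplified illustration of how a Pod's QoS class follows from its containers' CPU and memory requests and limits, and of the "Burstable only" swap rule; the function names `qos_class` and `may_use_swap` and the dict layout are assumptions made for this example, not the kubelet's API. Quantities are compared as plain strings, so "1Gi" and "1024Mi" would be treated as different values here.

```python
from typing import Dict, List

# Hypothetical container shape: {"requests": {...}, "limits": {...}}
Container = Dict[str, Dict[str, str]]


def qos_class(containers: List[Container]) -> str:
    """Classify a Pod as Guaranteed, Burstable, or BestEffort (simplified)."""
    anything_set = False
    guaranteed = True
    for c in containers:
        requests = c.get("requests", {})
        limits = c.get("limits", {})
        if requests or limits:
            anything_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not anything_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


def may_use_swap(containers: List[Container]) -> bool:
    """Under the proposal, only Burstable Pods are allowed to use swap."""
    return qos_class(containers) == "Burstable"


if __name__ == "__main__":
    burstable = [{"requests": {"memory": "1Gi"}}]
    print(qos_class(burstable), may_use_swap(burstable))
```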

Slide 25

Slide 25 text

Kubernetes & Swap • Proposal • Memory limit configuration • Define a formula to calculate it • Performance penalty • Handle it yourself :)

Slide 26

Slide 26 text

Kubernetes & Swap • According to the proposal • TotalPodsSwapAvailable • Total - system reserved • ContainerMemoryProportion • Request/Total • Example • Proportion = 0.5, 0.25 • Swap Limit is • 0.5 * 38 = 19 GB • 0.25 * 38 = 9.5 GB

Slide 27

Slide 27 text

Kubernetes & Swap • According to the code (master branch as of 10/24/2024) • TotalPodsSwapAvailable • Total • ContainerMemoryProportion • Request/Total • Example • Proportion = 0.5, 0.25 • Swap Limit is • 0.5 * 38 = 19 GB • 0.25 * 38 = 9.5 GB
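The swap limit arithmetic from the proposal and from the code can be written as one small function. This is a sketch only: the function and parameter names are invented for illustration, and the 40 GB node memory with 20 GB and 10 GB requests are assumed values chosen to reproduce the 0.5 and 0.25 proportions in the example. Leaving `system_reserved_swap` at 0 mirrors the behavior described for the code; passing a non-zero value mirrors the proposal.

```python
def container_swap_limit(
    container_memory_request: float,
    node_memory_capacity: float,
    node_swap_capacity: float,
    system_reserved_swap: float = 0.0,
) -> float:
    """Swap limit = ContainerMemoryProportion * TotalPodsSwapAvailable."""
    # Proposal: Total - system reserved. Code (per the slide): Total only.
    total_pods_swap_available = node_swap_capacity - system_reserved_swap
    # Request / Total: the container's share of node memory.
    container_memory_proportion = container_memory_request / node_memory_capacity
    return container_memory_proportion * total_pods_swap_available


if __name__ == "__main__":
    # Assumed node: 40 GB memory, 38 GB swap available to Pods.
    print(container_swap_limit(20, 40, 38))  # proportion 0.5
    print(container_swap_limit(10, 40, 38))  # proportion 0.25
```

With these assumed inputs the two calls give 19.0 and 9.5, matching the GB figures in the example.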

Slide 28

Slide 28 text

Kubernetes & Swap kubernetes/pkg/kubelet/kuberuntime/kuberuntime_container_linux.go

Slide 29

Slide 29 text

Kubernetes & Swap google/cadvisor/machine/machine.go

Slide 30

Slide 30 text

Kubernetes & Swap • Proposal is proposal • Code will tell you the truth

Slide 31

Slide 31 text

Kubernetes & Swap • Goals • Kubelet can start up with swap on • Case 1 or 2 • Kubelet can configure swap utilization for K8s workloads, defaulting to no swap. • Use swap with cgroup v2 • I/O isolation support

Slide 32

Slide 32 text

Kubernetes & Swap • Non-Goals • Supporting non-Linux OSes • Provisioning swap from K8s configuration. • Setting swappiness from K8s • Allocating swap on a per-workload basis with accounting. • Supporting cgroup v1

Slide 33

Slide 33 text

Kubernetes & Swap (Best Practices) • Disable swap for system-critical daemons (kubelet) • Configure the cgroup for the system slice to avoid swap. • cgroup v2 • memory.swap.max = 0 • Set up io.latency • cgroup v2 • Use io.latency to adjust I/O priority; system daemons should get higher priority than workloads
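Setting `memory.swap.max = 0` amounts to writing `0` into that control file inside the target cgroup's directory. Below is a minimal sketch; the helper name `disable_swap_for_cgroup` is made up, and the demo writes into a temporary directory instead of a real cgroup. On a real node the directory would be something like `/sys/fs/cgroup/system.slice` (an assumption about the host's systemd layout) and the write would need root privileges.

```python
import tempfile
from pathlib import Path


def disable_swap_for_cgroup(cgroup_dir: str) -> Path:
    """Write 0 to memory.swap.max so processes in this cgroup cannot swap."""
    control_file = Path(cgroup_dir) / "memory.swap.max"
    control_file.write_text("0\n")
    return control_file


if __name__ == "__main__":
    # Demo against a throwaway directory standing in for a cgroup v2 path.
    with tempfile.TemporaryDirectory() as fake_cgroup:
        path = disable_swap_for_cgroup(fake_cgroup)
        print(path.name, "=", path.read_text().strip())
```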

Slide 34

Slide 34 text

Kubernetes & Swap(Best Practices) • Control Plane Swap • Only enable swap for worker nodes. • Control plane contains mostly guaranteed QoS Pods • Performance concern

Slide 35

Slide 35 text

Kubernetes & Swap (Best Practices) • Dedicated disk for swap • Use a separate disk for your swap partition. • Workload I/O will interfere with swap performance if both share the same disk. • Same as the earlier storage isolation concern

Slide 36

Slide 36 text

Kubernetes & Swap (Best Practices) • Swap as the default. • Enabling swap on nodes is an advanced configuration that requires an understanding of the Linux system. • --fail-swap-on=true is still the first choice for most K8s use cases

Slide 37

Slide 37 text

Kubernetes Swap • In 1.28 • Kubelet can’t run on a node with swap on by default. • Swap behavior is UnlimitedSwap by default if we enable it. • Two options • LimitedSwap • Only Burstable Pods can use it • UnlimitedSwap • Allowed for all Pods.

Slide 38

Slide 38 text

Kubernetes Swap • In 1.30 • Kubelet supports running on a node with swap on. • Swap behavior is NoSwap by default. • NoSwap • No K8s workload can use swap, so workloads behave as if swap were off, but the system can still use swap for other purposes. • LimitedSwap • Only Burstable Pods can use it.
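As an illustration of the 1.30 options, the sketch below builds a kubelet configuration as a Python dict and prints it as JSON, which the kubelet config file format accepts alongside YAML. The field names `failSwapOn`, `memorySwap.swapBehavior`, and the `NodeSwap` feature gate come from the KubeletConfiguration API; the helper `kubelet_swap_config` is hypothetical, and whether the feature gate must be set explicitly depends on the cluster version.

```python
import json

# Swap behaviors listed for 1.30 on this slide.
ALLOWED_BEHAVIORS = ("NoSwap", "LimitedSwap")


def kubelet_swap_config(swap_behavior: str = "NoSwap") -> dict:
    """Build a KubeletConfiguration fragment that lets the kubelet run with swap on."""
    if swap_behavior not in ALLOWED_BEHAVIORS:
        raise ValueError(f"unsupported swapBehavior: {swap_behavior}")
    return {
        "apiVersion": "kubelet.config.k8s.io/v1beta1",
        "kind": "KubeletConfiguration",
        # Let the kubelet start even though the node has swap enabled.
        "failSwapOn": False,
        "featureGates": {"NodeSwap": True},
        "memorySwap": {"swapBehavior": swap_behavior},
    }


if __name__ == "__main__":
    print(json.dumps(kubelet_swap_config("LimitedSwap"), indent=2))
```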

Slide 39

Slide 39 text

Summary • Swap and other features (CPU Manager, etc.) • Kubernetes supports these features • Controlled by the application itself • Who would give them up if we can benefit from them? • I only care about my application

Slide 40

Slide 40 text

Summary • More and more features are being developed to improve performance and stability. • These are natural properties of a virtual machine environment • Containers’ lightweight isolation makes us redo that work. • Containers still cannot replace every VM environment, unless you know nothing about OS tuning