Slide 1

Slide 1 text

Kubernetes Resource and Eviction Management
Ader Fu

Slide 2

Slide 2 text

About Me
• I am Ader Fu
• Job:
  ➢ DevOps Engineer
• Email:
  ➢ [email protected]

Slide 3

Slide 3 text

Outline
⚫ Resource types of Pod
⚫ Resource requests and limits
⚫ QoS classes
⚫ Node Behavior:
  ➢ Eviction Policy & oom_killer
⚫ Experience Sharing

Slide 4

Slide 4 text

1. Resource types of Pod

Slide 5

Slide 5 text

Resource types
• The Kubernetes scheduler uses resource types to figure out where to run your Pods.
• CPU and memory are collectively referred to as compute resources. Compute resources are measurable quantities that can be requested, allocated, and consumed.
• CPU is specified in units of cores; fractions of a core are commonly written in millicores.
• Memory is specified in units of bytes.
https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-types

Slide 6

Slide 6 text

Compressible and incompressible resources
• Compressible resource
  ➢ Holds no state
  ➢ Can be taken away very quickly
  ➢ “Merely” causes slowness when revoked
  ➢ e.g. CPU, disk time
• Incompressible resource
  ➢ Holds state
  ➢ Is slower to be taken away
  ➢ Can fail to be revoked
  ➢ e.g. Memory, disk space
https://www.slideshare.net/damianigbe/kubernetes-scheduling-and-qos

Slide 7

Slide 7 text

Resource types: CPU
• CPU resources are measured in cpu units.
• One CPU, in Kubernetes, is equivalent to:
  ➢ 1 AWS vCPU
  ➢ 1 GCP Core
  ➢ 1 Azure vCore
  ➢ 1 IBM vCPU
  ➢ 1 Hyperthread on a bare-metal Intel processor with Hyperthreading
• Unit form: fractional values are allowed. 0.1 is equivalent to 100m (one hundred millicpu), and the form 100m might be preferred.
• CPU is considered a “compressible” resource.
https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-types
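As a minimal sketch of the two CPU unit forms, the helper below normalizes a CPU quantity to millicores. The function name `cpu_to_millicores` is hypothetical and only covers the plain and `m`-suffixed forms mentioned on this slide.

```python
from decimal import Decimal


def cpu_to_millicores(quantity: str) -> int:
    """Normalize a CPU quantity such as '0.1', '2' or '100m' to millicores.

    Hypothetical helper for illustration; it is not part of any Kubernetes API.
    """
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # Already expressed in millicores, e.g. "100m".
        return int(quantity[:-1])
    # Whole or fractional cores, e.g. "2" or "0.1"; Decimal avoids float rounding.
    return int(Decimal(quantity) * 1000)
```

With this helper, `cpu_to_millicores("0.1")` and `cpu_to_millicores("100m")` both evaluate to 100, which is why the two spellings are interchangeable.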

Slide 8

Slide 8 text

Resource types: Memory
• Memory resources are measured in bytes.
• Unit form:
  ➢ A plain integer or a fixed-point number using one of these suffixes: E, P, T, G, M, k.
  ➢ The power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki.
• Memory is considered an “incompressible” resource.
https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-types

Slide 9

Slide 9 text

Unit of Resource types

Decimal (Metric)
Value    Symbol  Name
1000     KB      kilobyte
1000^2   MB      megabyte
1000^3   GB      gigabyte
1000^4   TB      terabyte
1000^5   PB      petabyte
1000^6   EB      exabyte
1000^7   ZB      zettabyte
1000^8   YB      yottabyte

Binary (IEC)
Value    Symbol  Name
1024     KiB     kibibyte
1024^2   MiB     mebibyte
1024^3   GiB     gibibyte
1024^4   TiB     tebibyte
1024^5   PiB     pebibyte
1024^6   EiB     exbibyte
1024^7   ZiB     zebibyte
1024^8   YiB     yobibyte

https://en.wikipedia.org/wiki/Mebibyte
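To make the difference between the decimal and binary suffixes concrete, here is a small sketch that converts a memory quantity to bytes. The function name `memory_to_bytes` is hypothetical, and the parser is deliberately simplified: it handles only the suffixes from the memory slide, not every quantity form Kubernetes accepts.

```python
from decimal import Decimal

# Power-of-two (IEC) suffixes and their byte multipliers.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
                   "Ti": 1024**4, "Pi": 1024**5, "Ei": 1024**6}
# Decimal suffixes; Kubernetes spells the kilo suffix with a lowercase "k".
DECIMAL_SUFFIXES = {"k": 1000, "M": 1000**2, "G": 1000**3,
                    "T": 1000**4, "P": 1000**5, "E": 1000**6}


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '200Mi' or '1.5G' to whole bytes."""
    quantity = quantity.strip()
    # Binary suffixes are checked first so that "Mi" is never mistaken for "M".
    for suffix, factor in {**BINARY_SUFFIXES, **DECIMAL_SUFFIXES}.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    # No suffix: the value is already a plain number of bytes.
    return int(Decimal(quantity))
```

For example, `memory_to_bytes("200Mi")` is 209715200 while `memory_to_bytes("200M")` is 200000000, a gap of almost 5% that is easy to overlook when sizing limits.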

Slide 10

Slide 10 text

2. Resource requests and limits

Slide 11

Slide 11 text

Resource requests and limits
• Cgroups are used to map Pod CPU and Memory Resources

Layer                       CPU                                      Memory
Kubernetes (Pod Manifest)   CPU Requests, CPU Limits                 MEM Requests, MEM Limits
OS (Linux Kernel)           CPU Shares, CPU Quota, CPU Period        OOM Score Adj., MEM Limits
ESXi (Host)                 CPU Shares, CPU Reservation, CPU Limit   MEM Shares, MEM Reservation, MEM Limit

https://schd.ws/hosted_files/kccnceu18/33/Inside%20Kubernetes%20QoS%20M.%20Gasch%20KubeCon%20EU%20FINAL.pdf
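The mapping from CPU requests and limits to CPU Shares, CPU Quota and CPU Period can be sketched as follows. This assumes cgroup v1 and the conversion constants the kubelet has conventionally used (1024 shares per CPU, a 100 ms CFS period); neither the constants nor the function names come from the slide, so treat them as assumptions.

```python
SHARES_PER_CPU = 1024      # assumed: cpu.shares granted per full CPU
MIN_SHARES = 2             # assumed: smallest cpu.shares value
CFS_PERIOD_US = 100_000    # assumed: default CFS period (100 ms)
MIN_QUOTA_US = 1_000       # assumed: smallest quota (1 ms)
NO_QUOTA = -1              # cgroup v1 value of cpu.cfs_quota_us meaning "unlimited"


def cpu_shares(request_millicores: int) -> int:
    """Translate a CPU request into a relative cpu.shares weight."""
    if request_millicores == 0:
        return MIN_SHARES
    return max(MIN_SHARES, request_millicores * SHARES_PER_CPU // 1000)


def cfs_quota_us(limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """Translate a CPU limit into a CFS quota per period, in microseconds."""
    if limit_millicores == 0:
        return NO_QUOTA  # no CPU limit set, so no throttling ceiling
    return max(MIN_QUOTA_US, limit_millicores * period_us // 1000)
```

Under these assumptions a container with `cpu: "700m"` for both request and limit gets 716 shares and a quota of 70000 µs per 100000 µs period. The request only sets a weight that matters under contention, while the limit is a hard throttling ceiling.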

Slide 12

Slide 12 text

How QoS is enforced at the Node
[Figure: Kubelet view of a Pod with cpu = 1 and memory = 200Mi]
https://schd.ws/hosted_files/kccnceu18/33/Inside%20Kubernetes%20QoS%20M.%20Gasch%20KubeCon%20EU%20FINAL.pdf

Slide 13

Slide 13 text

How Pods with resources are scheduled and run
➢ How are Pods with resource requests scheduled?
  • When you create a Pod, the Kubernetes scheduler selects a node for the Pod to run on. Each node has a maximum capacity for each of the resource types: the amount of CPU and memory it can provide for Pods. The scheduler ensures that, for each resource type, the sum of the resource requests of the scheduled Containers is less than the capacity of the node.
➢ How are Pods with resource limits run?
  • When the kubelet starts a Container of a Pod, it passes the CPU and memory limits to the container runtime.
https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#how-pods-with-resource-requests-are-scheduled

Slide 14

Slide 14 text

Note
• Node: It is important to remember that you cannot set requests that are larger than the resources provided by your nodes. For example, if you have a cluster of dual-core machines, a Pod with a request of 2.5 cores will never be scheduled!
• Pod: Each container in the Pod can set its own requests and limits, and these are all additive.
https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-resource-requests-and-limits
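Both points in this note can be illustrated with a simplified fit check: container requests are summed per Pod, and the Pod only fits if that sum, plus what is already requested on the node, stays within the node's resources. The names `pod_requests` and `fits` and the resource keys are hypothetical; the real scheduler considers many more factors.

```python
from typing import Dict, List

# Resource amounts keyed by name, e.g. {"cpu_m": 500, "memory_bytes": 134217728}.
Resources = Dict[str, int]


def pod_requests(containers: List[Resources]) -> Resources:
    """Sum the requests of all containers in a Pod (requests are additive)."""
    total: Resources = {}
    for container in containers:
        for name, amount in container.items():
            total[name] = total.get(name, 0) + amount
    return total


def fits(node_capacity: Resources, already_requested: Resources,
         pod: Resources) -> bool:
    """Return True if the Pod's requests fit into what the node has left."""
    return all(
        already_requested.get(name, 0) + amount <= node_capacity.get(name, 0)
        for name, amount in pod.items()
    )
```

On a dual-core node (`{"cpu_m": 2000}`), a Pod whose two containers request 1500m and 1000m adds up to 2500m, so `fits` returns `False` even when the node is otherwise empty.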

Slide 15

Slide 15 text

3. QoS classes

Slide 16

Slide 16 text

QoS classes
• Guaranteed
  + Predictable SLA and highest Priority (Eviction)
  - Lower Efficiency (Resources capped, no Overcommit)
• Burstable
  + Increase Overcommit Level, use idle Resources*
  - Medium Priority (Eviction), unbounded Resources*
• Best Effort
  + High Resource Efficiency & Utilization
  - Resource Starvation and Eviction very likely
https://schd.ws/hosted_files/kccnceu18/33/Inside%20Kubernetes%20QoS%20M.%20Gasch%20KubeCon%20EU%20FINAL.pdf

Slide 17

Slide 17 text

QoS classes
[Table: CPU and memory R(equests) and L(imits) settings for each QoS class, per Container and per Pod]
https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#qos-classes

Slide 18

Slide 18 text

Assigned a QoS class of Guaranteed
• For a Pod to be given a QoS class of Guaranteed:
  ➢ Every Container in the Pod must have a memory limit and a memory request, and they must be the same.
  ➢ Every Container in the Pod must have a CPU limit and a CPU request, and they must be the same.
https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed

Slide 19

Slide 19 text

Specification: Guaranteed Pod

apiVersion: v1
kind: Pod
metadata:
  name: qos-demo
  namespace: qos-example
spec:
  containers:
  - name: qos-demo-ctr
    image: nginx
    resources:
      limits:
        memory: "200Mi"
        cpu: "700m"
      requests:
        memory: "200Mi"
        cpu: "700m"

https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed

Slide 20

Slide 20 text

Note: Guaranteed Pod
• If a Container specifies its own memory limit, but does not specify a memory request, Kubernetes automatically assigns a memory request that matches the limit.
• If a Container specifies its own CPU limit, but does not specify a CPU request, Kubernetes automatically assigns a CPU request that matches the limit.
https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed

Slide 21

Slide 21 text

Assigned a QoS class of Burstable
• A Pod is given a QoS class of Burstable if:
  ➢ The Pod does not meet the criteria for QoS class Guaranteed.
  ➢ At least one Container in the Pod has a memory or CPU request.
https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-burstable

Slide 22

Slide 22 text

Specification: Burstable Pod

apiVersion: v1
kind: Pod
metadata:
  name: qos-demo-2
  namespace: qos-example
spec:
  containers:
  - name: qos-demo-2-ctr
    image: nginx
    resources:
      limits:
        memory: "200Mi"
      requests:
        memory: "100Mi"

https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-burstable

Slide 23

Slide 23 text

Assigned a QoS class of BestEffort
• For a Pod to be given a QoS class of BestEffort:
  ➢ The Containers in the Pod must not have any memory or CPU limits or requests.
https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-besteffort

Slide 24

Slide 24 text

Specification: BestEffort Pod

pods/qos/qos-pod-3.yaml

apiVersion: v1
kind: Pod
metadata:
  name: qos-demo-3
  namespace: qos-example
spec:
  containers:
  - name: qos-demo-3-ctr
    image: nginx

https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-besteffort

Slide 25

Slide 25 text

Summary: QoS classes setting

Resource types (set up)
limits  requests  QoS class
O       O         Guaranteed
O       X         Guaranteed
X       O         Burstable
X       X         Best Effort

Resource types (value)
Condition         QoS class
limits=requests   Guaranteed
limits>requests   Burstable
limits
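The classification rules from the preceding slides can be condensed into a short sketch. The function names and the dictionary layout are hypothetical, and values are compared as plain strings, so the sketch assumes requests and limits are written in the same notation (for example "700m" on both sides).

```python
from typing import Dict, List

# A container is modelled as {"requests": {...}, "limits": {...}}; both keys optional.
Container = Dict[str, Dict[str, str]]


def effective_requests(container: Container) -> Dict[str, str]:
    """Apply the defaulting rule: a missing request takes the value of the limit."""
    requests = dict(container.get("requests", {}))
    for resource, value in container.get("limits", {}).items():
        requests.setdefault(resource, value)
    return requests


def qos_class(containers: List[Container]) -> str:
    """Classify a Pod as Guaranteed, Burstable or BestEffort."""
    anything_set = False
    guaranteed = True
    for container in containers:
        limits = container.get("limits", {})
        requests = effective_requests(container)
        if limits or requests:
            anything_set = True
        # Guaranteed needs CPU and memory limits equal to the requests everywhere.
        for resource in ("cpu", "memory"):
            if resource not in limits or requests.get(resource) != limits[resource]:
                guaranteed = False
    if not anything_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

Fed with the three example manifests from the slides, this returns Guaranteed for `qos-demo`, Burstable for `qos-demo-2` (no CPU limit, memory request below limit) and BestEffort for `qos-demo-3`.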

Slide 26

Slide 26 text

4. Node Behavior: Eviction Policy & oom_killer

Slide 27

Slide 27 text

QoS Classes and Node Behavior
• Resources are either compressible (CPU) or incompressible (Memory, Storage)
• Compressible = Throttling (Weight: cpu.shares)
• Incompressible = Evict (Kubelet) or OOM_kill (“OutOfMemory Killer” by Kernel)
• Kubelet Eviction Thresholds can be “hard” (instantly) and “soft” (allow Pod Termination Grace Period)
• Note:
  ➢ If Kubelet cannot react fast enough, e.g. Memory Spike, Kernel OOM kills Container
• There’s no Coordination between Eviction and OOM Killer (Race Condition possible)
https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#qos-classes

Slide 28

Slide 28 text

Eviction Policy
• The kubelet needs to preserve node stability when available compute resources are low. This is especially important when dealing with incompressible compute resources, such as memory or disk space.

Eviction Signal
• memory.available
• nodefs.available
• nodefs.inodesFree
• imagefs.available
• imagefs.inodesFree

Default hard eviction threshold
• memory.available<100Mi
• nodefs.available<10%
• nodefs.inodesFree<5%
• imagefs.available<15%

https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#eviction-policy
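A rough sketch of how the default hard thresholds could be evaluated against observed signals is shown below. The data layout and the name `crossed_thresholds` are assumptions made for illustration; this is not the kubelet's implementation.

```python
from typing import Dict, List, Tuple

MI = 1024**2

# Default hard thresholds from the slide: ("abs", bytes) or ("pct", percent of capacity).
DEFAULT_HARD_THRESHOLDS: Dict[str, Tuple[str, int]] = {
    "memory.available": ("abs", 100 * MI),
    "nodefs.available": ("pct", 10),
    "nodefs.inodesFree": ("pct", 5),
    "imagefs.available": ("pct", 15),
}


def crossed_thresholds(
    observed: Dict[str, Tuple[int, int]],
    thresholds: Dict[str, Tuple[str, int]] = DEFAULT_HARD_THRESHOLDS,
) -> List[str]:
    """Return the signals whose available amount fell below the threshold.

    `observed` maps a signal name to (available, capacity) in the same unit.
    """
    crossed = []
    for signal, (kind, value) in thresholds.items():
        if signal not in observed:
            continue  # signal not reported on this node
        available, capacity = observed[signal]
        limit = value if kind == "abs" else capacity * value / 100
        if available < limit:
            crossed.append(signal)
    return crossed
```

A node reporting 80Mi of available memory crosses `memory.available<100Mi` no matter how much memory it has in total, whereas the disk signals scale with the size of the filesystem.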

Slide 29

Slide 29 text

Soft Eviction Thresholds
• A soft eviction threshold pairs an eviction threshold with a required administrator-specified grace period. No action is taken by the kubelet to reclaim resources associated with the eviction signal until that grace period has been exceeded.
• The following flags configure soft eviction thresholds:
  ◆ eviction-soft
  ◆ eviction-soft-grace-period
  ◆ eviction-max-pod-grace-period
https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#soft-eviction-thresholds
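The grace-period behaviour of a soft threshold can be sketched as a small state machine: the threshold must stay crossed for longer than the grace period before any action is taken, and the timer resets as soon as the signal recovers. The class `SoftThreshold` is hypothetical and only models the timing rule described above.

```python
from typing import Optional


class SoftThreshold:
    """Track how long a soft eviction threshold has been continuously crossed."""

    def __init__(self, grace_period_s: float) -> None:
        self.grace_period_s = grace_period_s
        self.crossed_since: Optional[float] = None

    def observe(self, crossed: bool, now_s: float) -> bool:
        """Record one observation; return True once eviction should start."""
        if not crossed:
            # Signal recovered: forget the earlier crossing and reset the timer.
            self.crossed_since = None
            return False
        if self.crossed_since is None:
            self.crossed_since = now_s
        # Act only after the grace period has been exceeded.
        return now_s - self.crossed_since > self.grace_period_s
```

With a 90 second grace period, observations at 0 s and 60 s return `False`, and one at 91 s returns `True`. A hard threshold corresponds to acting on the first crossed observation with no waiting.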

Slide 30

Slide 30 text

Hard Eviction Thresholds
• A hard eviction threshold has no grace period, and if observed, the kubelet will take immediate action to reclaim the associated starved resource. If a hard eviction threshold is met, the kubelet kills the Pod immediately with no graceful termination.
• The following flag configures hard eviction thresholds:
  ◆ eviction-hard
https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#hard-eviction-thresholds

Slide 31

Slide 31 text

Evicting end-user Pods
• If the kubelet is unable to reclaim sufficient resource on the node, kubelet begins evicting Pods.
• kubelet ranks and evicts Pods in the following order:
  1. BestEffort
  2. Burstable
  3. Guaranteed
https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#evicting-end-user-pods
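The ordering above can be sketched as a sort over candidate Pods. The tie-break inside one QoS class (largest memory usage above the request first) is an assumption added for illustration, and the kubelet's actual ranking also takes factors such as Pod priority into account, so read this as a simplification of the slide rather than the real algorithm.

```python
from typing import List, NamedTuple


class PodUsage(NamedTuple):
    name: str
    qos: str              # "BestEffort", "Burstable" or "Guaranteed"
    memory_request: int   # bytes
    memory_usage: int     # bytes

# Lower rank is evicted earlier, following the order on the slide.
EVICTION_RANK = {"BestEffort": 0, "Burstable": 1, "Guaranteed": 2}


def eviction_order(pods: List[PodUsage]) -> List[PodUsage]:
    """Sort Pods so that the first element is the first eviction candidate."""
    return sorted(
        pods,
        # Assumed tie-break: the Pod furthest above its request goes first.
        key=lambda p: (EVICTION_RANK[p.qos], -(p.memory_usage - p.memory_request)),
    )
```

The sort is stable and purely illustrative; it does not model how much resource each eviction is expected to free.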

Slide 32

Slide 32 text

Node OOM Behavior
• The kubelet sets an oom_score_adj value for each container based on the quality of service for the Pod.

Quality of Service   oom_score_adj
Guaranteed           -998
BestEffort           1000
Burstable            min(max(2, 1000 - (1000 * memoryRequestBytes) / machineMemoryCapacityBytes), 999)

https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#node-oom-behavior
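The table translates directly into a small function. Integer division is assumed for the Burstable formula, and the function name and parameters are hypothetical; the constants are the ones shown on this slide.

```python
def oom_score_adj(qos: str, memory_request_bytes: int = 0,
                  machine_memory_capacity_bytes: int = 1) -> int:
    """Compute oom_score_adj for a container from its Pod's QoS class."""
    if qos == "Guaranteed":
        return -998
    if qos == "BestEffort":
        return 1000
    # Burstable: the larger the request relative to the node, the lower the score,
    # clamped to the range [2, 999] so it stays between the other two classes.
    adj = 1000 - (1000 * memory_request_bytes) // machine_memory_capacity_bytes
    return min(max(2, adj), 999)
```

For a Burstable container requesting 4Gi on a 16Gi node the result is 750. A Burstable container with no memory request ends up at 999, just below BestEffort, so the kernel's OOM killer prefers BestEffort containers first and Guaranteed ones last.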

Slide 33

Slide 33 text

Node OOM Behavior
• The kubelet may not observe memory pressure right away.
• The kubelet currently polls cAdvisor to collect memory usage stats at a regular interval. If memory usage increases within that window rapidly, the kubelet may not observe MemoryPressure fast enough, and the OOMKiller will still be invoked.
• Viable workaround: set eviction thresholds at approximately 75% capacity.

Slide 34

Slide 34 text

5. Experience Sharing

Slide 35

Slide 35 text

Experience Sharing: cpuset
• The static CPU Manager policy allows containers in Guaranteed pods with integer CPU requests access to exclusive CPUs on the node.

spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      limits:
        memory: "200Mi"
        cpu: "2"
      requests:
        memory: "200Mi"
        cpu: "2"

https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#static-policy
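The eligibility rule for exclusive CPUs can be written as a one-function sketch. The name `exclusive_cpus` is hypothetical, and the function only encodes the two conditions stated above (Guaranteed QoS and an integer CPU request).

```python
def exclusive_cpus(qos: str, cpu_request_millicores: int) -> int:
    """Return how many exclusive CPUs a container would get under the static policy.

    0 means the container runs in the shared CPU pool instead.
    """
    if qos != "Guaranteed":
        return 0
    # Only whole CPUs qualify: 2000m does, 700m or 1500m do not.
    if cpu_request_millicores <= 0 or cpu_request_millicores % 1000 != 0:
        return 0
    return cpu_request_millicores // 1000
```

The manifest on this slide (`cpu: "2"` for both request and limit, equal memory values) gives `exclusive_cpus("Guaranteed", 2000) == 2`, whereas the earlier Guaranteed example with `cpu: "700m"` stays in the shared pool.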

Slide 36

Slide 36 text

Experience Sharing: DaemonSet
• Protect critical (System) Pods (DaemonSets, Controllers, Master Components).
• It is never desired for kubelet to evict a DaemonSet Pod, since the Pod is immediately recreated and rescheduled back to the same node.
• Instead, a DaemonSet should ideally launch Guaranteed Pods.
https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#daemonset

Slide 37

Slide 37 text

Q&A

Slide 38

Slide 38 text

Ref.
➢ Inside Kubernetes Resource Management (QoS) – Mechanics and Lessons from the Field - Michael Gasch
  • https://www.youtube.com/watch?v=8-apJyr2gi0
  • https://schd.ws/hosted_files/kccnceu18/33/Inside%20Kubernetes%20QoS%20M.%20Gasch%20KubeCon%20EU%20FINAL.pdf