Kubernetes Resource and Eviction Management

1. Resource types of Pod
2. Resource of requests and limits
3. QoS classes
4. Node Behavior : Eviction Policy & oom_killer
5. Experience Sharing

ydFu(Ader Fu)

May 21, 2020

Transcript

  1. Kubernetes Resource and Eviction Management
    Ader Fu

  2. About Me
    • I am Ader Fu
    • Job:
    ➢ DevOps Engineer
    • Email:
    [email protected]

  3. Outline
    ⚫ Resource types of Pod
    ⚫ Resource of requests and limits
    ⚫ QoS classes
    ⚫ Node Behavior :
    ➢ Eviction Policy & oom_killer
    ⚫ Experience Sharing

  4. Section 1: Resource types of Pod

  5. Resource types
    • The Kubernetes scheduler uses Resource types to
    figure out where to run your pods.
    • CPU and memory are collectively referred to
    as compute resources. Compute resources are
    measurable quantities that can be requested,
    allocated, and consumed.
    • CPU is specified in units of cores (fractions are
    commonly written in millicores, e.g. 100m).
    • Memory is specified in units of bytes.
    https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-types

  6. Compressible and incompressible resources
    • Compressible resources
      • Hold no state
      • Can be taken away very quickly
      • “Merely” cause slowness when revoked
      • e.g. CPU, disk time
    • Incompressible resources
      • Hold state
      • Are slower to be taken away
      • Can fail to be revoked
      • e.g. memory, disk space
    https://www.slideshare.net/damianigbe/kubernetes-scheduling-and-qos

  7. Resource types: CPU
    • CPU resources are measured in CPU units.
    • One CPU in Kubernetes is equivalent to:
      • 1 AWS vCPU
      • 1 GCP Core
      • 1 Azure vCore
      • 1 IBM vCPU
      • 1 Hyperthread on a bare-metal Intel processor with Hyperthreading
    • Unit form: fractional values are allowed (0.1 CPU is the
    same as 100m); the form 100m might be preferred.
    • CPU is considered a “compressible” resource.
    https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-types
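The relation between whole CPUs and the 100m form can be sketched in a few lines of Python. This is only an illustration: the helper name cpu_to_millicores is hypothetical, and it handles just the plain-number and "m" forms shown on the slide.

```python
from fractions import Fraction


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "2", "0.1" or "100m" to millicores.

    Hypothetical helper for illustration; it covers only plain numbers
    and the "m" (milli) suffix.
    """
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # Already expressed in millicores, e.g. "700m" -> 700.
        return int(quantity[:-1])
    # Fraction avoids float rounding surprises for values like "0.1".
    return int(Fraction(quantity) * 1000)
```

For example, cpu_to_millicores("0.1") and cpu_to_millicores("100m") both give 100, which is why the two spellings are interchangeable in a manifest.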

  8. Resource types : Memory
    • Memory resources are measured in bytes.
    • Unit form:
      • a plain integer or a fixed-point number using one of
      these suffixes: E, P, T, G, M, k.
      • the power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki.
    • Memory is considered an “incompressible” resource.
    https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-types
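To make the difference between decimal and power-of-two suffixes concrete, here is a small Python sketch that converts a memory quantity to bytes. The function name memory_to_bytes is hypothetical, and the sketch assumes the kilo suffix is written as a lowercase k, as Kubernetes quantities do.

```python
from fractions import Fraction

# Multipliers for the suffixes listed above (decimal and power-of-two).
_SUFFIXES = {
    "k": 1000, "M": 1000**2, "G": 1000**3,
    "T": 1000**4, "P": 1000**5, "E": 1000**6,
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
    "Ti": 1024**4, "Pi": 1024**5, "Ei": 1024**6,
}


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "200Mi" or "129M" to bytes."""
    quantity = quantity.strip()
    # Try two-letter suffixes first so "Mi" is not mistaken for "M".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            number = Fraction(quantity[: -len(suffix)])
            return int(number * _SUFFIXES[suffix])
    # No suffix: the value is already in bytes.
    return int(Fraction(quantity))
```

Note that "129M" and "123Mi" differ only slightly: 129,000,000 bytes versus 128,974,848 bytes.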

  9. Units of resource types
    Decimal (metric)
    Value    Unit
    1000     kB   kilobyte
    1000^2   MB   megabyte
    1000^3   GB   gigabyte
    1000^4   TB   terabyte
    1000^5   PB   petabyte
    1000^6   EB   exabyte
    1000^7   ZB   zettabyte
    1000^8   YB   yottabyte
    Binary (IEC)
    Value    Unit
    1024     KiB  kibibyte
    1024^2   MiB  mebibyte
    1024^3   GiB  gibibyte
    1024^4   TiB  tebibyte
    1024^5   PiB  pebibyte
    1024^6   EiB  exbibyte
    1024^7   ZiB  zebibyte
    1024^8   YiB  yobibyte
    https://en.wikipedia.org/wiki/Mebibyte

  10. Section 2: Resource requests and limits

  11. Resource requests and limits
    • Cgroups are used to map Pod CPU and memory resources.
    Layer                      CPU                                     Memory
    Kubernetes (Pod manifest)  CPU requests, CPU limits                MEM requests, MEM limits
    OS (Linux kernel)          CPU shares, CPU quota, CPU period       OOM score adj., MEM limits
    ESXi (host)                CPU shares, CPU reservation, CPU limit  MEM shares, MEM reservation, MEM limit
    https://schd.ws/hosted_files/kccnceu18/33/Inside%20Kubernetes%20QoS%20M.%20Gasch%20KubeCon%20EU%20FINAL.pdf

  12. How QoS is enforced at the Node
    Kubelet view: cpu=1, memory=200Mi
    https://schd.ws/hosted_files/kccnceu18/33/Inside%20Kubernetes%20QoS%20M.%20Gasch%20KubeCon%20EU%20FINAL.pdf

  13. How Pods with resource requests and limits are scheduled and run
    ➢How Pods with resource requests are scheduled?
    • When you create a Pod, the Kubernetes scheduler
    selects a node for the Pod to run on. Each node has
    a maximum capacity for each of the resource types:
    the amount of CPU and memory it can provide for
    Pods.
    ➢How Pods with resource limits are run?
    • When the kubelet starts a Container of a Pod, it
    passes the CPU and memory limits to the container
    runtime.
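As a rough illustration of what happens to those values on a Linux node, the sketch below converts CPU requests and limits (in millicores) into cgroup CPU shares and a CFS quota. It is an assumption-laden sketch, not the kubelet's code: it assumes cgroup v1, 1024 shares per CPU, a default CFS period of 100 ms, and the function names are hypothetical.

```python
SHARES_PER_CPU = 1024      # assumed cgroup v1 weight for one full CPU
MIN_SHARES = 2             # assumed lower bound for cpu.shares
CFS_PERIOD_US = 100_000    # assumed default CFS period (100 ms)
MIN_QUOTA_US = 1_000       # assumed lower bound for the CFS quota


def cpu_shares(request_millicores: int) -> int:
    """Map a CPU request to a relative weight (cpu.shares)."""
    if request_millicores <= 0:
        return MIN_SHARES
    return max(MIN_SHARES, request_millicores * SHARES_PER_CPU // 1000)


def cfs_quota_us(limit_millicores: int, period_us: int = CFS_PERIOD_US):
    """Map a CPU limit to a CFS quota; None means no limit is enforced."""
    if limit_millicores <= 0:
        return None
    return max(MIN_QUOTA_US, limit_millicores * period_us // 1000)
```

Under these assumptions a container with a request and limit of 700m would get 716 shares and a quota of 70,000 µs per 100,000 µs period. Requests become a weight that only matters under contention, while limits become a hard cap.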
    https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#how-pods-with-resource-requests-are-scheduled

  14. Note
    • Node: you cannot set requests that are larger than the
    resources your nodes provide. For example, in a cluster of
    dual-core machines, a Pod with a request of 2.5 cores will
    never be scheduled!
    • Pod: each container in the Pod can set its own requests
    and limits, and these are all additive.
    https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-resource-requests-and-limits
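The two notes above can be combined into a short Python sketch: container requests are summed per Pod, and the total has to fit on a node. The function names and the dictionary layout are hypothetical, and a real scheduler also subtracts what other Pods on the node have already requested.

```python
def pod_cpu_request(containers):
    """Sum the CPU requests (millicores) of all containers in a Pod."""
    return sum(c.get("cpu_request_m", 0) for c in containers)


def can_ever_fit(containers, node_cpu_m):
    """True if the Pod's total CPU request fits on an empty node."""
    return pod_cpu_request(containers) <= node_cpu_m


# Two containers that together request 2.5 cores.
pod = [{"cpu_request_m": 1500}, {"cpu_request_m": 1000}]
dual_core_node_m = 2000

total = pod_cpu_request(pod)
fits = can_ever_fit(pod, dual_core_node_m)
```

Here total is 2500 millicores, so fits is False on a dual-core node even though each container on its own would fit.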

  15. Section 3: QoS classes

  16. QoS classes
    Guaranteed
    + Predictable SLA and highest priority (eviction)
    - Lower efficiency (resources capped, no overcommit)
    Burstable
    + Increases overcommit level, uses idle resources*
    - Medium priority (eviction), unbounded resources*
    Best Effort
    + High resource efficiency & utilization
    - Resource starvation and eviction very likely
    https://schd.ws/hosted_files/kccnceu18/33/Inside%20Kubernetes%20QoS%20M.%20Gasch%20KubeCon%20EU%20FINAL.pdf

  17. QoS classes: QoS examples
    • Classes are calculated from the CPU and memory
    resource specifications (requests/limits).
    Figure: CPU and memory requests (R) and limits (L) per container.
    • BestEffort Pod (1 container): 0 = R = L (all containers)
    • Guaranteed Pod (1 container): R = L, both set (all containers)
    • Burstable Pod (2 containers): neither of the above; a
    request is set for at least one container
    https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#qos-classes

  18. Assigned a QoS class of Guaranteed
    • For a Pod to be given a QoS class of Guaranteed:
    • Every Container in the Pod must have a memory limit
    and a memory request, and they must be the same.
    • Every Container in the Pod must have a CPU limit and a
    CPU request, and they must be the same
    https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed

  19. Specification: Guaranteed Pod
    apiVersion: v1
    kind: Pod
    metadata:
      name: qos-demo
      namespace: qos-example
    spec:
      containers:
      - name: qos-demo-ctr
        image: nginx
        resources:
          limits:
            memory: "200Mi"
            cpu: "700m"
          requests:
            memory: "200Mi"
            cpu: "700m"
    https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed

  20. Note : Guaranteed Pod
    • If a Container specifies its own memory limit, but
    does not specify a memory request, Kubernetes
    automatically assigns a memory request that
    matches the limit
    • If a Container specifies its own CPU limit, but does
    not specify a CPU request, Kubernetes
    automatically assigns a CPU request that matches
    the limit.
    https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed

  21. Assigned a QoS class of Burstable
    • A Pod is given a QoS class of Burstable if:
    • The Pod does not meet the criteria for QoS class
    Guaranteed.
    • At least one Container in the Pod has a memory or CPU
    request.
    https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-burstable

  22. Specification: Burstable Pod
    apiVersion: v1
    kind: Pod
    metadata:
      name: qos-demo-2
      namespace: qos-example
    spec:
      containers:
      - name: qos-demo-2-ctr
        image: nginx
        resources:
          limits:
            memory: "200Mi"
          requests:
            memory: "100Mi"
    https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed

  23. Assigned a QoS class of BestEffort
    • For a Pod to be given a QoS class of BestEffort
    • the Containers in the Pod must not have any memory or
    CPU limits or requests.
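The three sets of criteria can be condensed into one small function. The Python sketch below is an illustration of the rules as stated on these slides, not a port of the Kubernetes source; the name qos_class and the dictionary layout are hypothetical, and quantities are assumed to be already converted to integers (millicores and bytes).

```python
RESOURCES = ("cpu", "memory")


def qos_class(containers):
    """Classify a Pod from its containers' requests and limits.

    Each container is a dict like
    {"requests": {"cpu": 700}, "limits": {"cpu": 700, "memory": 209715200}}.
    """
    anything_set = False
    guaranteed = True
    for container in containers:
        limits = {r: v for r, v in container.get("limits", {}).items()
                  if r in RESOURCES and v > 0}
        requests = {r: v for r, v in container.get("requests", {}).items()
                    if r in RESOURCES and v > 0}
        # A missing request defaults to the limit of the same resource.
        for resource, value in limits.items():
            requests.setdefault(resource, value)
        if limits or requests:
            anything_set = True
        # Guaranteed needs request == limit for CPU and memory alike.
        for resource in RESOURCES:
            if resource not in limits or requests.get(resource) != limits[resource]:
                guaranteed = False
    if not anything_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

With this sketch, a container that sets only limits for both CPU and memory still yields Guaranteed, because the requests are defaulted to the limits before the comparison.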
    https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-besteffort

  24. Specification: BestEffort Pod (pods/qos/qos-pod-3.yaml)
    apiVersion: v1
    kind: Pod
    metadata:
      name: qos-demo-3
      namespace: qos-example
    spec:
      containers:
      - name: qos-demo-3-ctr
        image: nginx
    https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed

  25. Summary: QoS class settings
    By which fields are set (O = set, X = not set):
    limits  requests  QoS class
    O       O         depends on the values (second table)
    O       X         Guaranteed
    X       O         Burstable
    X       X         Best Effort
    By value, when both are set:
    limits=requests   Guaranteed
    limits>requests   Burstable
    limits<requests   (invalid: a request cannot exceed its limit)
    • QoS classes source code:
    kubernetes/pkg/apis/core/v1/helper/qos/qos.go
    https://github.com/kubernetes/kubernetes/blob/5713c22eecff4610026643fbd3d37c33a43c168d/pkg/apis/core/v1/helper/qos/qos.go

  26. Section 4: Node Behavior: Eviction Policy & oom_killer

  27. QoS Classes and Node Behavior
    • Resources are either compressible (CPU) or
    incompressible (memory, storage)
      • Compressible = throttling (weight: cpu.shares)
      • Incompressible = evict (kubelet) or OOM kill
      (“Out Of Memory Killer” by the kernel)
    • Kubelet eviction thresholds can be “hard”
    (instantly) or “soft” (allow Pod termination grace
    period)
    • Note:
      – If the kubelet cannot react fast enough, e.g. on a memory
      spike, the kernel OOM killer kills the container
      – There is no coordination between eviction and the OOM
      killer (race condition possible)
    https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#qos-classes

  28. Eviction Policy
    • The kubelet needs to preserve node stability when
    available compute resources are low. This is
    especially important when dealing with
    incompressible compute resources, such as
    memory or disk space.
    Eviction signals:
      • memory.available
      • nodefs.available
      • nodefs.inodesFree
      • imagefs.available
      • imagefs.inodesFree
    Default hard eviction thresholds:
      • memory.available<100Mi
      • nodefs.available<10%
      • nodefs.inodesFree<5%
      • imagefs.available<15%
    https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#eviction-policy
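How the default hard thresholds are read can be sketched as follows. This is only an illustration in Python: the function name and the (available, capacity) input format are hypothetical, and the real kubelet gathers these signals itself.

```python
MI = 1024 ** 2

# Default hard eviction thresholds from the slide:
# ("abs", n) is an absolute amount, ("pct", n) a percentage of capacity.
DEFAULT_HARD_THRESHOLDS = {
    "memory.available": ("abs", 100 * MI),
    "nodefs.available": ("pct", 10),
    "nodefs.inodesFree": ("pct", 5),
    "imagefs.available": ("pct", 15),
}


def crossed_thresholds(observed):
    """Return the signals whose available amount is below the threshold.

    observed maps a signal name to a tuple (available, capacity).
    """
    crossed = []
    for signal, (kind, value) in DEFAULT_HARD_THRESHOLDS.items():
        if signal not in observed:
            continue
        available, capacity = observed[signal]
        limit = value if kind == "abs" else capacity * value / 100
        if available < limit:
            crossed.append(signal)
    return crossed
```

A node with 80Mi of free memory crosses memory.available<100Mi no matter how large it is, whereas the filesystem signals scale with capacity.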

  29. Soft Eviction Thresholds
    • A soft eviction threshold pairs an eviction threshold
    with a required administrator-specified grace
    period. No action is taken by the kubelet to reclaim
    resources associated with the eviction signal until
    that grace period has been exceeded.
    • The following flags configure soft eviction thresholds:
    ◆eviction-soft
    ◆eviction-soft-grace-period
    ◆eviction-max-pod-grace-period
    https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#soft-eviction-thresholds
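The role of the grace period can be shown with a minimal Python sketch. It assumes timestamps in seconds and a hypothetical function name; it models only the rule stated above, that nothing is reclaimed until the threshold has stayed crossed for longer than the grace period.

```python
def soft_threshold_action(crossed_since, now, grace_period_s):
    """Decide what to do about a soft eviction threshold.

    crossed_since is the time (seconds) at which the threshold was first
    observed as crossed, or None if it is not currently crossed.
    """
    if crossed_since is None:
        return "none"
    if now - crossed_since > grace_period_s:
        return "evict"
    return "wait"
```

For a grace period of 90 seconds, a threshold crossed 30 seconds ago yields "wait" and one crossed 120 seconds ago yields "evict".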

  30. Hard Eviction Thresholds
    • A hard eviction threshold has no grace period, and
    if observed, the kubelet will take immediate action
    to reclaim the associated starved resource. If a hard
    eviction threshold is met, the kubelet kills the Pod
    immediately with no graceful termination.
    • The following flag configures hard eviction thresholds:
    ◆eviction-hard
    https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#hard-eviction-thresholds

  31. Evicting end-user Pods
    • If the kubelet is unable to reclaim sufficient
    resource on the node, kubelet begins evicting Pods.
    • kubelet ranks and evicts Pods in the following order:
    1. BestEffort
    2. Burstable
    3. Guaranteed
    https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#evicting-end-user-pods
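The class ordering listed above can be expressed as a simple sort. This Python sketch is a deliberate simplification with hypothetical names: the kubelet's actual ranking also takes into account how far a Pod's usage exceeds its requests and the Pod's priority, which this example ignores.

```python
# Lower rank means evicted earlier, following the order on the slide.
EVICTION_RANK = {"BestEffort": 0, "Burstable": 1, "Guaranteed": 2}


def eviction_order(pods):
    """Return Pods sorted so that the first entries are evicted first.

    Each Pod is a dict with at least a "qos" key. sorted() is stable,
    so Pods of the same class keep their original relative order.
    """
    return sorted(pods, key=lambda pod: EVICTION_RANK[pod["qos"]])


pods = [
    {"name": "db", "qos": "Guaranteed"},
    {"name": "batch", "qos": "BestEffort"},
    {"name": "web", "qos": "Burstable"},
]
ordered_names = [pod["name"] for pod in eviction_order(pods)]
```

Here ordered_names is ["batch", "web", "db"]: the BestEffort Pod goes first and the Guaranteed Pod last.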

  32. Node OOM Behavior
    • The kubelet sets an oom_score_adj value for each
    container based on the quality of service for the
    Pod.
    https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#node-oom-behavior
    Quality of Service: oom_score_adj
    Guaranteed: -998
    BestEffort: 1000
    Burstable:
    min(max(2, 1000 - (1000 * memoryRequestBytes) /
    machineMemoryCapacityBytes), 999)
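A worked version of this table in Python may help to see how the Burstable formula behaves. The function name is hypothetical, and integer division is an assumption about how the fraction is rounded.

```python
def oom_score_adj(qos, memory_request_bytes=0, machine_memory_capacity_bytes=1):
    """Compute oom_score_adj for a container, following the table above."""
    if qos == "Guaranteed":
        return -998
    if qos == "BestEffort":
        return 1000
    # Burstable: the larger the memory request relative to the node,
    # the lower (safer) the score, clamped to the range 2..999.
    adj = 1000 - (1000 * memory_request_bytes) // machine_memory_capacity_bytes
    return min(max(2, adj), 999)
```

On a 16Gi node, a Burstable container requesting 4Gi gets 750, while one requesting only 100Mi gets 994 and is therefore a much more likely OOM-kill target.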

  33. Node OOM Behavior
    • kubelet may not observe memory pressure right
    away
    • The kubelet currently polls cAdvisor to collect
    memory usage stats at a regular interval. If memory
    usage increases within that window rapidly, the
    kubelet may not observe MemoryPressure fast
    enough, and the OOMKiller will still be invoked.
    • Viable workaround: set eviction thresholds at
    approximately 75% capacity

  34. Section 5: Experience Sharing

  35. Experience Sharing: cpuset
    • The static policy allows containers in Guaranteed
    pods with integer CPU requests access to exclusive
    CPUs on the node.
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          limits:
            memory: "200Mi"
            cpu: "2"
          requests:
            memory: "200Mi"
            cpu: "2"
    https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#static-policy
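The eligibility rule can be written as a one-line check. This Python sketch uses a hypothetical function name and assumes the CPU request is given in millicores; it also assumes the kubelet is actually running with the static CPU manager policy, which the check itself cannot know.

```python
def gets_exclusive_cpus(qos, cpu_request_millicores):
    """True if a container qualifies for exclusive CPUs under the static policy.

    Requires a Guaranteed Pod and a whole number of CPUs (at least one).
    """
    return (
        qos == "Guaranteed"
        and cpu_request_millicores >= 1000
        and cpu_request_millicores % 1000 == 0
    )
```

The manifest above (Guaranteed, cpu: "2") qualifies, while the earlier Guaranteed example with cpu: "700m" does not and keeps running in the shared CPU pool.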

  36. Experience Sharing: DaemonSet
    • Protect critical (System) Pods (DaemonSets,
    Controllers, Master Components)
    • It is never desired for kubelet to evict a DaemonSet
    Pod, since the Pod is immediately recreated and
    rescheduled back to the same node.
    • Instead, DaemonSets should ideally launch
    Guaranteed Pods.
    https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#daemonset

  37. Q&A

  38. Ref.
    ➢ Inside Kubernetes Resource Management (QoS) –
    Mechanics and Lessons from the Field - Michael
    Gasch
    • https://www.youtube.com/watch?v=8-apJyr2gi0
    • https://schd.ws/hosted_files/kccnceu18/33/Inside%20Kubernetes%20QoS%20M.%20Gasch%20KubeCon%20EU%20FINAL.pdf