SIG-Node 2020-05-12, experiences of advanced resource management in Kubernetes

Screencast used in demo:
https://asciinema.org/a/327044

Alexander D. Kanevskiy

May 12, 2020

Transcript

  1. Agenda
    § Why?
    § Demo
    § What do we know about
      – Hardware in general
      – CPUs
      – Memory
      – UX
  2. 5

  3. System devices topology
    [Diagram: two NUMA nodes (Node 0, Node 1) and two sockets (Socket 0, Socket 1); each socket shows Core 0 to Core 11, a Memory Controller, PCIe and UPI.]
  4. System topology in real world
    [Diagram: six NUMA nodes (Node 0 to Node 5) and two packages (Package 0, Package 1); each package shows Core 0 to Core 9, Memory Controllers, PCIe and UPI links. Also shown: DMI links, Chipset, three QAT x16 devices, I/O Hub, 4x10G NIC.]
  5. CPU: Things to keep in mind for CPUs
    § CPU cores vs. threads
    § CPU core frequencies: base, turbo, throttling
    § CPU usage: Shared, Exclusive, Isolated
    § Additional CPU resources: Cache, Memory Bandwidth
    § Interconnect is an expensive resource
    § “C” in “NUMA” stands for “CPU”
      – Vendors have BIOS-configurable settings to redefine what NUMA means (SNC, NPS, …)
    § Workload migration cost: low to very low
    § CPUs as seen by Kubelet might not mean the same thing inside VM-based runtimes
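The "cores vs. threads" point can be illustrated with a minimal sketch. On Linux, each logical CPU reports the hardware threads sharing its core in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list. The SIBLINGS values below are made up for a 2-core, 4-thread machine, and both helper names are hypothetical; only the simple "a-b,c" list form is handled.

```python
def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-2,8" into [0, 1, 2, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.extend(range(int(first), int(last or first) + 1))
    return cpus


# Assumed thread_siblings_list contents, keyed by logical CPU number.
SIBLINGS = {0: "0,2", 1: "1,3", 2: "0,2", 3: "1,3"}


def physical_cores(siblings):
    """Collapse logical CPUs that share a core into one tuple per core."""
    return sorted({tuple(parse_cpu_list(s)) for s in siblings.values()})


cores = physical_cores(SIBLINGS)
print(len(SIBLINGS), "logical CPUs on", len(cores), "physical cores:", cores)
```

A policy that counts four "CPUs" here is really counting two cores, which matters once a workload asks for exclusive use.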
  6. Topology-Aware CPU policy
    [Diagram: System root
      Socket 0: Die 0 (CPU 0, CPU 1), Die 1 (CPU 2, CPU 3)
      Socket 1: Die 0 (CPU 4, CPU 5), Die 1 (CPU 6, CPU 7)
      Labels: Group …, physical_package_id, die_id, core_id]
    For each leaf node
    § Groups of CPU+Memory
    § Dynamic pools
      – Shared
      – Exclusive
      – Isolated
      – “Throttled”
      – …
    Parent nodes
    § Sum of subtree resources
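A minimal sketch of the tree on this slide, assuming the hierarchy is built from per-CPU (physical_package_id, die_id) pairs. The CPUS table copies the 2-socket, 2-die, 8-CPU layout from the diagram; build_tree and capacity are hypothetical names, and counting CPUs stands in for whatever resources a real policy tracks.

```python
from collections import defaultdict

# One (physical_package_id, die_id, cpu) tuple per logical CPU, as in the diagram.
CPUS = [
    (0, 0, 0), (0, 0, 1), (0, 1, 2), (0, 1, 3),
    (1, 0, 4), (1, 0, 5), (1, 1, 6), (1, 1, 7),
]


def build_tree(cpus):
    """Group CPUs into a socket -> die -> [cpu ids] mapping (the leaf pools)."""
    tree = defaultdict(lambda: defaultdict(list))
    for package_id, die_id, cpu in cpus:
        tree[package_id][die_id].append(cpu)
    return tree


def capacity(node):
    """A leaf counts its own CPUs; a parent is the sum of its subtree."""
    if isinstance(node, list):
        return len(node)
    return sum(capacity(child) for child in node.values())


tree = build_tree(CPUS)
root_capacity = capacity(tree)
socket0_capacity = capacity(tree[0])
die_capacity = capacity(tree[1][1])
print(root_capacity, socket0_capacity, die_capacity)
```

With this shape, a request that does not fit in one die can be tried against the parent socket, and then against the system root.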
  7. Memory: Things to keep in mind for Memory
    § Memory types
      – DRAM
      – “Persistent”, in “volatile, system RAM mode” (PMEM)
      – High Bandwidth (HBM)
    § Kernel’s “Normal” vs. “Movable”
    § NUMA
      – Distances
      – Have CPU
      – Have “normal” memory
    § Workload migration costs: medium to HIGH
  8. MemTier Topology-Aware policy
    [Diagram: System root, Socket 0 …, Die 0 …; NUMA A, NUMA B, NUMA C; each node has IO and Memory (DRAM, PMEM, HBM) and CPUs (Core, Thread)]
    Each Node
    § CPU
      – CPU-less NUMA nodes are linked to nodes with CPUs
    § All memory types
      – DRAM
      – PMEM
      – HBM
    § Placement Cost calculated based on
      – Requested memory type(s) and amount of available memory
      – Later: BW, WSS
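The slide gives the inputs to the placement cost (requested memory types, available memory) but not the formula. The sketch below is therefore an assumed scoring, not the policy's actual calculation: a node fits if the requested types together have enough free memory, and the cost is the fraction of that free memory the request would consume. Node, placement_cost, pick_node and the GiB figures are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free: dict = field(default_factory=dict)  # memory type -> free GiB (assumed unit)


def placement_cost(node, wanted_types, amount):
    """Return a cost (lower is better), or None if the request cannot fit."""
    available = sum(node.free.get(t, 0) for t in wanted_types)
    if available < amount or available == 0:
        return None
    return amount / available  # assumed scoring: share of free memory consumed


def pick_node(nodes, wanted_types, amount):
    """Pick the fitting node with the lowest cost, or None if nothing fits."""
    scored = [(placement_cost(n, wanted_types, amount), n.name) for n in nodes]
    fitting = [(cost, name) for cost, name in scored if cost is not None]
    return min(fitting)[1] if fitting else None


nodes = [
    Node("NUMA A", {"DRAM": 16, "PMEM": 128}),
    Node("NUMA B", {"DRAM": 64}),
    Node("NUMA C", {"HBM": 8}),
]
dram_choice = pick_node(nodes, ["DRAM"], 32)
tiered_choice = pick_node(nodes, ["DRAM", "PMEM"], 32)
too_big = pick_node(nodes, ["HBM"], 32)
print(dram_choice, tiered_choice, too_big)
```

Allowing PMEM alongside DRAM moves the same 32 GiB request from NUMA B to NUMA A, which is the kind of trade-off the requested memory type(s) are meant to express.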
  9. Linux OS Memory Tiering
    Principle of operation
    § Memory pages promoted from PMEM to DRAM when capacity available
    § Cold pages in DRAM demoted to PMEM
    [Diagram labels: Page Promotions, Page Demotions, DDR]
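The promote/demote principle can be sketched as a toy model. This does not reflect how the kernel tracks page hotness; the access counters, the cold_threshold parameter and the rebalance function are assumptions made for illustration.

```python
def rebalance(dram, pmem, dram_capacity, cold_threshold):
    """One tiering pass over {page_id: access_count} maps; returns (dram, pmem)."""
    dram, pmem = dict(dram), dict(pmem)  # work on copies
    # Demote cold DRAM pages to PMEM.
    for page, hits in list(dram.items()):
        if hits < cold_threshold:
            pmem[page] = dram.pop(page)
    # Promote the hottest PMEM pages while DRAM has capacity left.
    for page in sorted(pmem, key=pmem.get, reverse=True):
        if len(dram) >= dram_capacity:
            break
        if pmem[page] >= cold_threshold:
            dram[page] = pmem.pop(page)
    return dram, pmem


new_dram, new_pmem = rebalance(
    dram={"a": 10, "b": 0},
    pmem={"c": 7, "d": 1},
    dram_capacity=2,
    cold_threshold=2,
)
print(new_dram, new_pmem)
```

Here the cold page "b" is demoted first, which frees the DRAM slot that the hot page "c" is then promoted into.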
  10. User-friendly resource controls
    § Good UX that spans public cloud, VMs, bare metal is hard
      – Especially for non-trivial resources (block I/O, caches, …)
    § Placing workloads might lead to situations where it can’t be done
      – Reject?
      – Rebalance?
    § Rebalancing of the running workloads can also be hard
      – Assigned devices
      – Memory migrations
      – Priorities
    The story of jar, rocks, pebbles and sand…
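The jar story maps onto a simple packing sketch: placing workloads in arrival order can strand a large request that would have fit had it gone in first. The first_fit helper, the workload sizes and the two 8-unit nodes are made up for illustration; real placement also has to weigh devices, memory migration and priorities, as the slide notes.

```python
def first_fit(requests, capacities):
    """Place each request on the first node with room; return (placement, rejected)."""
    free = list(capacities)
    placement, rejected = {}, []
    for name, size in requests:
        for node, room in enumerate(free):
            if size <= room:
                free[node] -= size
                placement[name] = node
                break
        else:  # no node had room for this request
            rejected.append(name)
    return placement, rejected


requests = [("sand", 2), ("sand2", 2), ("pebble", 3), ("pebble2", 3), ("rock", 6)]
capacities = [8, 8]

# Arrival order: the small items fragment both nodes and the rock is rejected.
placement_arrival, rejected_arrival = first_fit(requests, capacities)

# Largest first ("rocks before sand"): everything fits.
by_size = sorted(requests, key=lambda r: r[1], reverse=True)
placement_sorted, rejected_sorted = first_fit(by_size, capacities)
print(rejected_arrival, rejected_sorted)
```

For workloads that are already running, reaching the second layout from the first is the rebalancing problem: "sand" or "pebble" would have to be migrated to make room.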
  11. What next?
    UX is the key in our opinion
    § User should expect “it just works great” by default
    § Advanced users should be able to utilize good patterns on resource groups
      – Affinity/anti-affinity pattern
      – Device pipelines
    § Solutions that we do now should be aligned with where hardware is evolving to
    § Maybe not in Kubelet…?
  12. 15

  13. CRI Resource Manager
    https://github.com/intel/cri-resource-manager
    Description of the functionality
    • is a Container Runtime Interface proxy
    • sits between CRI Clients and the CRI Runtime
    • applies (hardware) resource policies to containers: CPU, Memory, Cache, Memory Bandwidth, Block I/O, …
    • policies are applied by
      • modifying proxied container requests, or
      • generating container update requests, or
      • triggering extra policy-specific actions during request processing
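The "modifying proxied container requests" path can be sketched in a few lines. This is not the CRI gRPC API or CRI Resource Manager's code: requests are plain dicts, and FakeRuntime, PinningPolicy, ResourceProxy and the cpu_request / cpuset_cpus keys are all hypothetical stand-ins.

```python
class FakeRuntime:
    """Stand-in for a CRI runtime; it only records what it receives."""

    def __init__(self):
        self.received = []

    def create_container(self, request):
        self.received.append(request)
        return {"container_id": f"c{len(self.received)}"}


class PinningPolicy:
    """Assumed policy: hand out CPUs from a fixed pool, one set per container."""

    def __init__(self, cpus):
        self.free = list(cpus)

    def apply(self, request):
        wanted = request.get("cpu_request", 0)
        if wanted > len(self.free):
            raise RuntimeError("not enough free CPUs")
        assigned, self.free = self.free[:wanted], self.free[wanted:]
        # Return a modified copy; the client's original request is untouched.
        return {**request, "cpuset_cpus": ",".join(map(str, assigned))}


class ResourceProxy:
    """Sits between the client and the runtime and rewrites requests in flight."""

    def __init__(self, runtime, policy):
        self.runtime, self.policy = runtime, policy

    def create_container(self, request):
        return self.runtime.create_container(self.policy.apply(request))


runtime = FakeRuntime()
proxy = ResourceProxy(runtime, PinningPolicy(cpus=range(4)))
reply = proxy.create_container({"name": "db", "cpu_request": 2})
forwarded = runtime.received[0]
print(reply, forwarded)
```

The client talks to the proxy exactly as it would to the runtime, so in this sketch the policy can change without touching either side.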
  14. Legal notices and disclaimers
    § Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation.
    § Performance varies depending on system configuration.
    § No computer system can be absolutely secure.
    § Check with your system manufacturer or retailer or learn more at www.intel.com.
    § Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
    § *Other names and brands may be claimed as the property of others.
    § © Intel Corporation