Reading: Mind the Gap: Broken Promises of CPU Reservations in Containerized Multi-tenant Clouds

wkb8s

September 08, 2025

Transcript

  1. Mind the Gap: Broken Promises of CPU Reservations in Containerized Multi-tenant Clouds
     Li Liu¹, Haoliang Wang², An Wang³, Mengbai Xiao⁴, Yue Cheng¹, Songqing Chen¹
     (¹George Mason University, ²Adobe Research, ³Case Western Reserve University, ⁴Shandong University)
     SoCC '21
     Presented by Daiki Wakabayashi, Kono Laboratory, Keio University
  2. Container orchestration systems
     ▪ Users can easily develop & deploy containerized applications
     ▪ Kubernetes (k8s) automates container deployment
       ▪ Users submit requests to k8s
       ▪ k8s deploys and manages containers on machines (nodes)
     [Figure: a user sends a request to Kubernetes, which deploys Container0 on Machine0 (Node0) and Container1/Container2 on Machine1 (Node1)]
  3. Single-tenant node vs. multi-tenant node
     ▪ Different types of containers may share the same node
     ▪ Sharing the same machine may lead to performance degradation
     [Figure: a single-tenant node running only App A containers vs. a multi-tenant node running App A and App B containers on the same physical machine]
  4. Preliminary experiment 1: Impact of neighbor containers
     ▪ Goal
       ▪ Evaluate performance degradation in a multi-tenant environment
     ▪ Setup
       ▪ Dell PowerEdge R420
         ◆ 2 Intel Xeon E5-2420 CPUs (6 cores / 12 threads each, 24 logical CPUs total)
           – 22 CPUs available for hosting containers
           – 2 CPUs for Kubernetes and OS system services
         ◆ 24 GB RAM
         ◆ Debian 9.12 with Linux kernel 5.10
         ◆ Docker 18.09.7
         ◆ Kubernetes 1.17.3
  5. Preliminary experiment 1: Impact of neighbor containers
     ▪ Workloads
       ▪ Target container
         ◆ Batch application: PARSEC or SPLASH-2
         ◆ Interactive application: Memcached + YCSB
       ▪ Neighbor container
         ◆ Application: stress-ng
     [Figure: a 24-core machine hosting the target container and the neighbor container; metrics are measured on the target]
  6. Performance: single- vs. multi-tenancy
     ▪ The performance degradation is severe under multi-tenancy (up to 5x)
     ▪ This makes it difficult to predict performance
     [Figure: performance comparison of single-tenancy vs. multi-tenancy (lower is better)]
  7. Commonly believed cause: HW contention
     ▪ Noisy neighbors can abuse shared resources
       ▪ scheduling latency, flushing CPU caches, generating memory traffic, …
     [Figure: a multi-tenant node in which the target container and neighbor containers may share physical cores and other resources]
  8. Preliminary experiment 2: Impact of HW contention
     ▪ Goal
       ▪ Evaluate the effect of HW contention by adjusting the degree of contention
     ▪ Workloads
       ▪ Prepare 4 options for the neighboring application: {capped, burstable} x {CPU-intensive, memory-intensive}
         ◆ Capped: the container cannot use more CPU than it reserves
         ◆ Burstable: the container is allowed to use more CPU when idle CPU exists
         ◆ Both are specified via the k8s YAML (sketched below)
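To make the capped/burstable distinction concrete, the snippet below shows how the two neighbor flavors could be declared in the pod spec the slide refers to, written as Python dicts mirroring the k8s YAML; the CPU values are illustrative, not the paper's exact configuration.

```python
# Hedged illustration of the two neighbor flavors as pod resource specs
# (Python dicts standing in for the k8s YAML; values are examples only).

capped_neighbor = {
    "resources": {
        "requests": {"cpu": "10"},
        "limits":   {"cpu": "10"},   # requests == limits: the container is
    }                                # throttled once it uses up its reservation
}

burstable_neighbor = {
    "resources": {
        "requests": {"cpu": "10"},
        # no "limits": the container may use idle CPU beyond its request
        # (Kubernetes puts such pods in the Burstable QoS class)
    }
}
```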
  9. HW contention is not the sole cause
     ▪ Even the least noisy neighbor (capped & CPU-intensive) degrades the target
       ▪ completion time: 2x longer
       ▪ CPU utilization of the target app: drops to 0.6x
         ◆ CPU utilization = (CPU time spent by the target app) / (CPU requests of the target app)
     [Figure: performance and CPU utilization of the target container when running alone vs. with neighboring apps (lower is better)]
  10. Why does the target's CPU usage degrade?
      ▪ With a capped & CPU-intensive neighbor,
        ▪ no CPU is over-committed on the node
        ▪ capped neighbors cannot use more CPU than they reserve
      [Figure: of the 24 available CPUs, the target app reserves R cores (capped), the neighbor app reserves 22 - R cores (capped), and the system uses 2 cores; yet the target's reservation is not fully utilized]
  11. Goal
      ▪ Demonstrate the performance degradation in multi-tenant nodes
      ▪ Point out that the cause of this problem lies in scheduling
      ▪ Propose a makeshift solution: rKube
  12. Goal
      ▪ Demonstrate the performance degradation in multi-tenant nodes
      ▪ Point out that the cause of this problem lies in scheduling
      ▪ Propose a makeshift solution: rKube
  13. Container CPU allocation
      ▪ Kubernetes translates resource requests into cgroup values (see the sketch below)
        ▪ .yaml requests: cpu: X  →  cpu.shares = X * 1024 (relative weight used by the Linux CFS)
        ▪ .yaml limits: cpu: Y   →  cpu.cfs_quota_us = Y * cpu.cfs_period_us (threads are throttled if they exceed this CPU-time limit)
      ▪ cpu.shares is only a relative weight, which may break the CPU-reservation promise
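A minimal sketch of the translation on this slide, with X and Y given in whole CPUs as in the figure. This mirrors the slide's arrows rather than Kubernetes' actual source (real kubelets work in milli-CPUs, but the arithmetic is the same):

```python
# Translate a container's CPU request/limit (in whole CPUs) into the
# cgroup v1 values shown on the slide.

CFS_PERIOD_US = 100_000  # default cpu.cfs_period_us

def to_cgroup_values(request_cpu: float, limit_cpu: float) -> dict[str, int]:
    return {
        # relative weight consumed by the Linux CFS; only meaningful when
        # several cgroups compete for the same CPUs
        "cpu.shares": int(request_cpu * 1024),
        # hard cap: threads are throttled once they burn limit_cpu * period
        # of CPU time within one period
        "cpu.cfs_period_us": CFS_PERIOD_US,
        "cpu.cfs_quota_us": int(limit_cpu * CFS_PERIOD_US),
    }

# requests: cpu: 4 / limits: cpu: 4
# -> {'cpu.shares': 4096, 'cpu.cfs_period_us': 100000, 'cpu.cfs_quota_us': 400000}
print(to_cgroup_values(4, 4))
```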
  14. Root cause of performance degradation
      ▪ Linux CFS fails to fulfill CPU reservations because of:
        ▪ 1. Forced Runqueue Sharing
          ◆ A container may be forced to share a runqueue with neighboring containers
        ▪ 2. Phantom CPU Time
          ◆ Kubernetes throttles a container's threads once it has used up its reserved CPU
          ◆ However, the target application fails to utilize its reserved CPU because it has too few runnable threads
  15. Mechanism of performance degradation
      ▪ Case 1A: target is running a batch application with capped neighbors
      [Figure: one scheduling period; where the target thread T runs alone, its task weight has no effect because there are no other threads in the runqueue; where T shares a runqueue with a neighbor thread, the weight is only a relative value, so T receives 1024 / (1024 + 683) ≈ 0.6 of that CPU's time (the arithmetic is sketched below)]
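The 0.6 on this slide is just CFS's proportional-share rule: a runnable thread receives CPU time in proportion to its weight relative to the total weight on its runqueue. A small sketch reproducing the figure's numbers (weights 1024 and 683):

```python
# CFS grants each runnable thread CPU time proportional to its weight
# relative to the total weight on the runqueue it sits in.

def cfs_share(own_weight: int, other_weights: list[int]) -> float:
    return own_weight / (own_weight + sum(other_weights))

# Target thread alone on a CPU: the weight is irrelevant, it gets the whole CPU.
print(cfs_share(1024, []))                 # 1.0

# Target thread forced to share a runqueue with a neighbor thread of weight 683.
print(round(cfs_share(1024, [683]), 2))    # 0.6 -> only ~60% of that CPU
```

This is why the reservation breaks down: cpu.shares only matters relative to whatever threads happen to share the runqueue, so the same value can yield anywhere from 60% to 100% of a CPU.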
  16. Mechanism of performance degradation
      ▪ Case 1A: target is running a batch application with capped neighbors
      [Figure: one scheduling period; neighbor threads N1-N3 are throttled after using up their reserved CPU, but T cannot utilize the now-idle CPU1 and CPU2 (phantom CPU time)]
  17. Mechanism of performance degradation
      ▪ Case 1B: target is running a batch application with burstable neighbors
      [Figure: N2 and N3 use the idle CPUs because T cannot use CPU1 and CPU2]
  18. Mechanism of performance degradation
      ▪ Case 2: target is running an interactive application
        ▪ A: with capped neighbors
        ▪ B: with burstable neighbors
      [Figure: in both cases, T cannot wake up in time because of the enforcement of CFS's minimum scheduling granularity (the relevant kernel knobs are shown below)]
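The "minimum granularity" mentioned here is a CFS tunable. On the kernel used in the experiments (Linux 5.10) the relevant knobs are exposed as sysctls, so they can be inspected as below; the paths assume a pre-5.13 kernel (they later moved to debugfs), and this is only a way to look at the knobs, not part of the paper's methodology.

```python
# Read the CFS granularity knobs that influence how quickly a newly woken
# task (e.g. a Memcached worker) can preempt a neighbor thread on its CPU.
# Paths assume Linux <= 5.12, where these sysctls live under /proc/sys/kernel.

def read_ns(path: str) -> int:
    with open(path) as f:
        return int(f.read())

for name in ("sched_min_granularity_ns",
             "sched_wakeup_granularity_ns",
             "sched_latency_ns"):
    value = read_ns(f"/proc/sys/kernel/{name}")
    print(f"{name:28} = {value:>9} ns ({value / 1e6:.1f} ms)")
```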
  19. Makeshift solution: rKube
      ▪ Goal: proof of concept
        ▪ Demonstrate that an enforced CPU reservation significantly improves performance in multi-tenant nodes
      ▪ Solution
        ▪ Set CPU affinity for individual containers
        ▪ Enforce the CPU reservation using cpuset.cpus of the Linux cgroup (a minimal sketch of the mechanism follows)
          ◆ e.g., cpuset.cpus=0-1,3: only CPU0, CPU1, and CPU3 are available
      ▪ Implementation
        ▪ Build the rKube prototype on top of Kubernetes v1.17.3
        ▪ Add a new field named "policy" to the Kubernetes template
          ◆ standard policy: use cpu.shares
          ◆ strict policy (rKube): use cpuset.cpus
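For reference, the cpuset mechanism that rKube's strict policy relies on boils down to writing a CPU list into the container's cpuset cgroup. The sketch below illustrates that mechanism only (it is not rKube's implementation); the cgroup v1 mount point and container ID are assumptions.

```python
# Hypothetical illustration of pinning a container via the cpuset cgroup
# (cgroup v1 assumed at /sys/fs/cgroup; the container ID is a placeholder).

from pathlib import Path

def pin_container(container_cgroup: Path, cpus: str) -> None:
    """Restrict the container's threads to the given CPU list, e.g. '0-1,3'."""
    # cpuset also requires a memory-node list; keep node 0 here for simplicity.
    (container_cgroup / "cpuset.mems").write_text("0")
    # Only the listed CPUs remain available to the container.
    (container_cgroup / "cpuset.cpus").write_text(cpus)

if __name__ == "__main__":
    cgroup = Path("/sys/fs/cgroup/cpuset/docker/<container-id>")
    pin_container(cgroup, "0-1,3")   # only CPU0, CPU1, CPU3 are available
```

With the reserved CPUs fenced off this way, neighbor threads can no longer be placed on them, which is the enforcement that the cpu.shares-based standard policy lacks.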
  20. Evaluation
      ▪ Evaluate the effectiveness of rKube in reducing neighbor interference
        ▪ rKube vs. resource request scaling
        ▪ rKube vs. resource under-commitment
      ▪ Setup & workload
        ▪ Basically the same as the preliminary experiments
        ▪ Compared policies:
          ◆ rKube (strict policy)
          ◆ standard: default scheduling policy
  21. CPU utilization of the target container
      ▪ rKube reduces neighbor interference in multi-tenant nodes
        ▪ For streamcluster, rKube increases CPU utilization from 27% to 93%
      [Figure: CPU utilization of batch applications under rKube vs. the standard policy (higher is better)]
  22. Overall host CPU utilization
      ▪ rKube slightly decreases overall host CPU utilization
        ▪ CPU utilization decreases from 100% to 91% in the burstable case
      [Figure: composition of overall host CPU utilization (CPU allocated to the target vs. neighbors) under rKube and the standard policy, for the batch app (streamcluster) and Memcached (higher is better)]
  23. Performance of the target container
      ▪ rKube reduces neighbor interference in multi-tenant nodes
        ▪ The speedup in completion time ranges from 2.1x to 5.6x
      [Figure: completion time of batch applications under rKube vs. the standard policy (lower is better)]
  24. Performance of the target container
      ▪ The improvement holds across all situations
        ▪ The reduction in tail latency ranges from 12.9x to 13.7x
      [Figure: Memcached tail latency under rKube vs. the standard policy (lower is better)]
  25. Resource request scaling vs. rKube
      ▪ rKube requires no additional CPU but still reduces completion time
      [Figure: streamcluster performance under vertical scaling and rKube, with capped & mem-intensive and with burstable & mem-intensive neighbors (lower is better); scaling variants: default, increasing CPU requests only (keeping 12 threads), and increasing both the number of threads and CPU requests]
  26. Resource under-commitment vs. rKube
      ▪ rKube outperforms the under-commitment strategy
      [Figure: application performance under the under-commitment strategy and rKube, for the batch application (streamcluster, fixed 12 threads; lower is better) and Memcached (read operations, fixed 6 threads; higher is better), while varying the ratio of CPUs assigned to target and neighbor relative to the total allocatable CPUs]
  27. Related work
      ▪ Performance isolation in multi-tenant clouds
        ▪ CPI2 [Zhang+, EuroSys '13]
          ◆ Detects offender jobs by monitoring cycles-per-instruction (CPI) and throttles them
          ◆ Less predictable performance, because jobs are throttled only after the damage has been suffered
      ▪ Issues in task scheduling
        ▪ The Linux scheduler: a decade of wasted cores [Lozi+, EuroSys '16]
          ◆ Demonstrates work-conservation bugs in the Linux kernel
        ▪ The Battle of the Schedulers: FreeBSD ULE vs. Linux CFS [Bouron+, USENIX ATC '18]
          ◆ Replaces the Linux kernel's scheduler with the FreeBSD ULE scheduler
          ◆ Compares performance under the different scheduling policies
  28. Conclusion
      ▪ Observed up to 5x performance degradation in multi- vs. single-tenant nodes
        ▪ Degraded CPU utilization is a major contributor
      ▪ Found that Linux CFS fails to fulfill CPU reservations
        ▪ Forced Runqueue Sharing
        ▪ Phantom CPU Time
      ▪ Implemented a makeshift solution: rKube
        ▪ Demonstrated that enforcing CPU reservations significantly improves performance in multi-tenant nodes