Reading: Mind the Gap: Broken Promises of CPU Reservations in Containerized Multi-tenant Clouds

wkb8s

September 08, 2025

Transcript

  1. Mind the Gap: Broken Promises of CPU Reservations in Containerized Multi-tenant Clouds
     Li Liu¹, Haoliang Wang², An Wang³, Mengbai Xiao⁴, Yue Cheng¹, Songqing Chen¹
     (¹George Mason University, ²Adobe Research, ³Case Western Reserve University, ⁴Shandong University)
     SoCC '21
     Presented by Daiki Wakabayashi, Kono Laboratory, Keio University
  2. Container orchestration systems
     ▪ Users can easily develop & deploy containerized applications
     ▪ Kubernetes (k8s) automates container deployment
       ▪ Users submit requests to k8s
       ▪ k8s deploys and manages containers on machines (nodes)
     [Figure: a user sends a request to Kubernetes, which deploys Container0 on Machine0 (Node0) and Container1/Container2 on Machine1 (Node1)]
  3. Single-tenant node vs. multi-tenant node
     ▪ Different types of containers may share the same node
     ▪ Sharing the same machine may lead to performance degradation
     [Figure: a single-tenant node running only App A containers vs. a multi-tenant node running App A and App B containers on the same physical machine]
  4. Preliminary experiment 1: Impact of neighbor containers
     ▪ Goal
       ▪ Evaluate performance degradation in a multi-tenant environment
     ▪ Setup
       ▪ Dell PowerEdge R420
         ◆ 2 Intel Xeon E5-2420 CPUs (6 cores / 12 threads each, 24 logical CPUs total)
           – 22 CPUs available for hosting containers
           – 2 CPUs for Kubernetes and OS system services
         ◆ 24 GB RAM
         ◆ Debian 9.12 with Linux kernel 5.10
         ◆ Docker 18.09.7
         ◆ Kubernetes 1.17.3
  5. Preliminary experiment 1: Impact of neighbor containers
     ▪ Workloads
       ▪ Target container
         ◆ Batch application: PARSEC or SPLASH-2
         ◆ Interactive application: Memcached + YCSB
       ▪ Neighbor container
         ◆ Application: stress-ng
     [Figure: a 24-core machine hosting the target container and the neighbor container; metrics are measured on the target]
  6. Performance: single- vs. multi-tenancy
     ▪ The performance degradation is severe under multi-tenancy (up to 5x)
     ▪ This makes it difficult to predict performance
     [Figure: performance comparison of single-tenancy vs. multi-tenancy (lower is better)]
  7. Commonly believed cause: HW contention
     ▪ Noisy neighbors can abuse shared resources
       ▪ scheduling latency, flushing CPU caches, generating memory traffic, …
     [Figure: a multi-tenant node in which the target container and neighbor containers may share physical cores and other resources]
  8. Preliminary experiment 2: Impact of HW contention
     ▪ Goal
       ▪ Evaluate the effect of HW contention by adjusting the degree of contention
     ▪ Workloads
       ▪ Prepare 4 options for the neighboring application: {capped, burstable} x {CPU-intensive, memory-intensive}
         ◆ Capped: the container cannot use more CPU than it reserves
         ◆ Burstable: the container is allowed to use more CPU when idle CPU exists
         ◆ Both are specified via the k8s YAML (sketched below)
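To make the capped/burstable distinction concrete, the snippet below shows how the two neighbor flavors could be declared in the pod spec the slide refers to, written as Python dicts mirroring the k8s YAML; the CPU values are illustrative, not the paper's exact configuration.

```python
# Hedged illustration of the two neighbor flavors as pod resource specs
# (Python dicts standing in for the k8s YAML; values are examples only).

capped_neighbor = {
    "resources": {
        "requests": {"cpu": "10"},
        "limits":   {"cpu": "10"},   # requests == limits: the container is
    }                                # throttled once it uses up its reservation
}

burstable_neighbor = {
    "resources": {
        "requests": {"cpu": "10"},
        # no "limits": the container may use idle CPU beyond its request
        # (Kubernetes puts such pods in the Burstable QoS class)
    }
}
```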
  9. HW contention is not the sole cause
     ▪ Even the least noisy neighbor (capped & CPU-intensive) degrades the target
       ▪ completion time: 2x longer
       ▪ CPU utilization of the target app: drops to 0.6x
         ◆ CPU utilization = (CPU time spent by the target app) / (CPU requests of the target app)
     [Figure: performance and CPU utilization of the target container when running alone vs. with neighboring apps (lower is better)]
  10. Why does the target's CPU usage degrade?
      ▪ With a capped & CPU-intensive neighbor,
        ▪ no CPU is over-committed on the node
        ▪ capped neighbors cannot use more CPU than they reserve
      [Figure: of the 24 available CPUs, the target app reserves R cores (capped), the neighbor app reserves 22 - R cores (capped), and the system uses 2 cores; yet the target's reservation is not fully utilized]
  11. Goal
      ▪ Demonstrate the performance degradation in multi-tenant nodes
      ▪ Point out that the cause of this problem lies in scheduling
      ▪ Propose a makeshift solution: rKube
  12. Goal
      ▪ Demonstrate the performance degradation in multi-tenant nodes
      ▪ Point out that the cause of this problem lies in scheduling
      ▪ Propose a makeshift solution: rKube
  13. Container CPU allocation
      ▪ Kubernetes translates resource requests into cgroup values (see the sketch below)
        ▪ .yaml requests: cpu: X  →  cpu.shares = X * 1024 (relative weight used by the Linux CFS)
        ▪ .yaml limits: cpu: Y   →  cpu.cfs_quota_us = Y * cpu.cfs_period_us (threads are throttled if they exceed this CPU-time limit)
      ▪ cpu.shares is only a relative weight, which may break the CPU-reservation promise
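A minimal sketch of the translation on this slide, with X and Y given in whole CPUs as in the figure. This mirrors the slide's arrows rather than Kubernetes' actual source (real kubelets work in milli-CPUs, but the arithmetic is the same):

```python
# Translate a container's CPU request/limit (in whole CPUs) into the
# cgroup v1 values shown on the slide.

CFS_PERIOD_US = 100_000  # default cpu.cfs_period_us

def to_cgroup_values(request_cpu: float, limit_cpu: float) -> dict[str, int]:
    return {
        # relative weight consumed by the Linux CFS; only meaningful when
        # several cgroups compete for the same CPUs
        "cpu.shares": int(request_cpu * 1024),
        # hard cap: threads are throttled once they burn limit_cpu * period
        # of CPU time within one period
        "cpu.cfs_period_us": CFS_PERIOD_US,
        "cpu.cfs_quota_us": int(limit_cpu * CFS_PERIOD_US),
    }

# requests: cpu: 4 / limits: cpu: 4
# -> {'cpu.shares': 4096, 'cpu.cfs_period_us': 100000, 'cpu.cfs_quota_us': 400000}
print(to_cgroup_values(4, 4))
```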
  14. Root cause of performance degradation
      ▪ Linux CFS fails to fulfill CPU reservations because of:
        ▪ 1. Forced Runqueue Sharing
          ◆ A container may be forced to share a runqueue with neighboring containers
        ▪ 2. Phantom CPU Time
          ◆ Kubernetes throttles a container's threads once it has used up its reserved CPU
          ◆ However, the target application fails to utilize its reserved CPU because it has too few runnable threads
  15. Mechanism of performance degradation
      ▪ Case 1A: target is running a batch application with capped neighbors
      [Figure: one scheduling period; where the target thread T runs alone, its task weight has no effect because there are no other threads in the runqueue; where T shares a runqueue with a neighbor thread, the weight is only a relative value, so T receives 1024 / (1024 + 683) ≈ 0.6 of that CPU's time (the arithmetic is sketched below)]
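The 0.6 on this slide is just CFS's proportional-share rule: a runnable thread receives CPU time in proportion to its weight relative to the total weight on its runqueue. A small sketch reproducing the figure's numbers (weights 1024 and 683):

```python
# CFS grants each runnable thread CPU time proportional to its weight
# relative to the total weight on the runqueue it sits in.

def cfs_share(own_weight: int, other_weights: list[int]) -> float:
    return own_weight / (own_weight + sum(other_weights))

# Target thread alone on a CPU: the weight is irrelevant, it gets the whole CPU.
print(cfs_share(1024, []))                 # 1.0

# Target thread forced to share a runqueue with a neighbor thread of weight 683.
print(round(cfs_share(1024, [683]), 2))    # 0.6 -> only ~60% of that CPU
```

This is why the reservation breaks down: cpu.shares only matters relative to whatever threads happen to share the runqueue, so the same value can yield anywhere from 60% to 100% of a CPU.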
  16. Mechanism of performance degradation
      ▪ Case 1A: target is running a batch application with capped neighbors
      [Figure: one scheduling period; neighbor threads N1-N3 are throttled after using up their reserved CPU, but T cannot utilize the now-idle CPU1 and CPU2 (phantom CPU time)]
  17. Mechanism of performance degradation
      ▪ Case 1B: target is running a batch application with burstable neighbors
      [Figure: N2 and N3 use the idle CPUs because T cannot use CPU1 and CPU2]
  18. Mechanism of performance degradation
      ▪ Case 2: target is running an interactive application
        ▪ A: with capped neighbors
        ▪ B: with burstable neighbors
      [Figure: in both cases, T cannot wake up in time because of the enforcement of CFS's minimum scheduling granularity (the relevant kernel knobs are shown below)]
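The "minimum granularity" mentioned here is a CFS tunable. On the kernel used in the experiments (Linux 5.10) the relevant knobs are exposed as sysctls, so they can be inspected as below; the paths assume a pre-5.13 kernel (they later moved to debugfs), and this is only a way to look at the knobs, not part of the paper's methodology.

```python
# Read the CFS granularity knobs that influence how quickly a newly woken
# task (e.g. a Memcached worker) can preempt a neighbor thread on its CPU.
# Paths assume Linux <= 5.12, where these sysctls live under /proc/sys/kernel.

def read_ns(path: str) -> int:
    with open(path) as f:
        return int(f.read())

for name in ("sched_min_granularity_ns",
             "sched_wakeup_granularity_ns",
             "sched_latency_ns"):
    value = read_ns(f"/proc/sys/kernel/{name}")
    print(f"{name:28} = {value:>9} ns ({value / 1e6:.1f} ms)")
```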
  19. Makeshift solution: rKube
      ▪ Goal: proof of concept
        ▪ Demonstrate that an enforced CPU reservation significantly improves performance in multi-tenant nodes
      ▪ Solution
        ▪ Set CPU affinity for individual containers
        ▪ Enforce the CPU reservation using cpuset.cpus of the Linux cgroup (a minimal sketch of the mechanism follows)
          ◆ e.g., cpuset.cpus=0-1,3: only CPU0, CPU1, and CPU3 are available
      ▪ Implementation
        ▪ Build the rKube prototype on top of Kubernetes v1.17.3
        ▪ Add a new field named "policy" to the Kubernetes template
          ◆ standard policy: use cpu.shares
          ◆ strict policy (rKube): use cpuset.cpus
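For reference, the cpuset mechanism that rKube's strict policy relies on boils down to writing a CPU list into the container's cpuset cgroup. The sketch below illustrates that mechanism only (it is not rKube's implementation); the cgroup v1 mount point and container ID are assumptions.

```python
# Hypothetical illustration of pinning a container via the cpuset cgroup
# (cgroup v1 assumed at /sys/fs/cgroup; the container ID is a placeholder).

from pathlib import Path

def pin_container(container_cgroup: Path, cpus: str) -> None:
    """Restrict the container's threads to the given CPU list, e.g. '0-1,3'."""
    # cpuset also requires a memory-node list; keep node 0 here for simplicity.
    (container_cgroup / "cpuset.mems").write_text("0")
    # Only the listed CPUs remain available to the container.
    (container_cgroup / "cpuset.cpus").write_text(cpus)

if __name__ == "__main__":
    cgroup = Path("/sys/fs/cgroup/cpuset/docker/<container-id>")
    pin_container(cgroup, "0-1,3")   # only CPU0, CPU1, CPU3 are available
```

With the reserved CPUs fenced off this way, neighbor threads can no longer be placed on them, which is the enforcement that the cpu.shares-based standard policy lacks.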
  20. Evaluation
      ▪ Evaluate the effectiveness of rKube in reducing neighbor interference
        ▪ rKube vs. resource request scaling
        ▪ rKube vs. resource under-commitment
      ▪ Setup & workload
        ▪ Basically the same as the preliminary experiments
        ▪ Compared policies:
          ◆ rKube (strict policy)
          ◆ standard: default scheduling policy
  21. CPU utilization of the target container
      ▪ rKube reduces neighbor interference in multi-tenant nodes
        ▪ For streamcluster, rKube increases CPU utilization from 27% to 93%
      [Figure: CPU utilization of batch applications under rKube vs. the standard policy (higher is better)]
  22. Overall host CPU utilization
      ▪ rKube slightly decreases overall host CPU utilization
        ▪ CPU utilization decreases from 100% to 91% in the burstable case
      [Figure: composition of overall host CPU utilization (CPU allocated to the target vs. neighbors) under rKube and the standard policy, for the batch app (streamcluster) and Memcached (higher is better)]
  23. Performance of the target container
      ▪ rKube reduces neighbor interference in multi-tenant nodes
        ▪ The speedup in completion time ranges from 2.1x to 5.6x
      [Figure: completion time of batch applications under rKube vs. the standard policy (lower is better)]
  24. Performance of the target container
      ▪ The improvement holds across all situations
        ▪ The reduction in tail latency ranges from 12.9x to 13.7x
      [Figure: Memcached tail latency under rKube vs. the standard policy (lower is better)]
  25. Resource request scaling vs. rKube
      ▪ rKube requires no additional CPU but still reduces completion time
      [Figure: streamcluster performance under vertical scaling and rKube, with capped & mem-intensive and with burstable & mem-intensive neighbors (lower is better); scaling variants: default, increasing CPU requests only (keeping 12 threads), and increasing both the number of threads and CPU requests]
  26. Resource under-commitment vs. rKube
      ▪ rKube outperforms the under-commitment strategy
      [Figure: application performance under the under-commitment strategy and rKube, for the batch application (streamcluster, fixed 12 threads; lower is better) and Memcached (read operations, fixed 6 threads; higher is better), while varying the ratio of CPUs assigned to target and neighbor relative to the total allocatable CPUs]
  27. Related work
      ▪ Performance isolation in multi-tenant clouds
        ▪ CPI2 [Zhang+, EuroSys '13]
          ◆ Detects offender jobs by monitoring cycles-per-instruction (CPI) and throttles them
          ◆ Less predictable performance, because jobs are throttled only after the damage has been suffered
      ▪ Issues in task scheduling
        ▪ The Linux scheduler: a decade of wasted cores [Lozi+, EuroSys '16]
          ◆ Demonstrates work-conservation bugs in the Linux kernel
        ▪ The Battle of the Schedulers: FreeBSD ULE vs. Linux CFS [Bouron+, USENIX ATC '18]
          ◆ Replaces the Linux kernel's scheduler with the FreeBSD ULE scheduler
          ◆ Compares performance under the different scheduling policies
  28. Conclusion
      ▪ Observed up to 5x performance degradation in multi- vs. single-tenant nodes
        ▪ Degraded CPU utilization is a major contributor
      ▪ Found that Linux CFS fails to fulfill CPU reservations
        ▪ Forced Runqueue Sharing
        ▪ Phantom CPU Time
      ▪ Implemented a makeshift solution: rKube
        ▪ Demonstrated that enforcing CPU reservations significantly improves performance in multi-tenant nodes