Slide 1

Slide 1 text

Embedded Container Runtime Updates
The container performance issues caused by the cgroup
Container Runtime Meetup #3
Kenta Tada
R&D Center, Sony Corporation
Copyright 2021 Sony Corporation

Slide 2

Slide 2 text

About me
⚫ Kenta Tada
⚫ Software Engineer, Sony

Slide 3

Slide 3 text

Agenda
⚫ The problem caused by the cgroup hierarchy
⚫ CPU controller with flat runqueue

Slide 4

Slide 4 text

The problem caused by the cgroup hierarchy

Slide 5

Slide 5 text

The performance degradation
⚫ When we ran UnixBench, we noticed that performance dropped under runc when there were many context switches inside the container.
 • ~6% overhead when we use runc
⚫ crun did not show this degradation.

Environment
 Machine:   x86-64
 OS:        Ubuntu 20.04 (5.4.0-56-generic)
 Runtime:   runc, crun
 CPU:       AMD Ryzen 9 3900X 12-Core Processor
 Memory:    32GB
 Benchmark: unixbench, mbw, iperf
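One rough way to observe context-switch activity for a process is to read the counters the kernel exposes in /proc/<pid>/status. The sketch below is only an illustration and is not the measurement method used for the numbers on this slide; the helper name `parse_ctxt_switches` is hypothetical.

```python
def parse_ctxt_switches(status_text):
    """Extract context-switch counters from the text of /proc/<pid>/status."""
    wanted = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")
    counters = {}
    for line in status_text.splitlines():
        # Each line looks like "key:<tab>value".
        key, sep, value = line.partition(":")
        if sep and key in wanted:
            counters[key] = int(value.strip())
    return counters
```

On Linux, passing the contents of /proc/self/status (or the status file of a process inside the container) before and after a benchmark run gives the difference in voluntary and involuntary context switches.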

Slide 6

Slide 6 text

The difference between runc and crun
⚫ The cgroup hierarchy is different
 • runc: /sys/fs/cgroup/cpu/user.slice/{container-id}
   – It depends on runc's settings.
 • crun: /sys/fs/cgroup/cpu/{container-id} by default
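To check which cgroup a container process actually lands in, the CPU controller path can be read from /proc/<pid>/cgroup on the host. The following is a minimal sketch that assumes the cgroup v1 line format `hierarchy-ID:controller-list:cgroup-path`; the function name `cpu_cgroup_depth` and the sample container name are hypothetical.

```python
def cpu_cgroup_depth(proc_cgroup_text):
    """Return (path, depth) for the cgroup v1 cpu controller, or None if absent."""
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        _hierarchy_id, controllers, path = parts
        if "cpu" in controllers.split(","):
            # Depth = number of non-empty path components below the root.
            depth = len([p for p in path.split("/") if p])
            return path, depth
    return None
```

With the paths on this slide, a runc container under user.slice would report depth 2 and a crun container directly under the root would report depth 1.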

Slide 7

Slide 7 text

Inside kernel
⚫ The number of calls to enqueue_entity() and dequeue_entity() in the kernel is different.
 • runc: 3 times
 • bare-metal and crun: 2 times
⚫ Summary of call flow
 • enqueue_entity: pipe_write → __wake_up_sync_key → activate_task → enqueue_task_fair → enqueue_entity
 • dequeue_entity: pipe_read → pipe_wait → deactivate_task → dequeue_task_fair → dequeue_entity
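The relation between hierarchy depth and the call counts above can be illustrated with a toy model. It assumes that the woken task is the only runnable entity at every level of its cgroup path, so each group's scheduling entity has to be enqueued (or dequeued) together with the task; the real scheduler stops walking upward once it reaches a parent that is already on the runqueue. The function name `entities_walked` is hypothetical, and this is not kernel code.

```python
def entities_walked(cgroup_path):
    """Toy count of scheduling entities touched per wakeup (or sleep).

    Assumes no other runnable task shares any group on the path, so every
    group level needs its own enqueue_entity()/dequeue_entity() call.
    """
    groups = [p for p in cgroup_path.split("/") if p]
    # One entity per group level, plus one for the task itself.
    return len(groups) + 1
```

Under this assumption, a path such as /user.slice/{container-id} gives 3 and /{container-id} gives 2, which is consistent with the counts reported on this slide.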

Slide 8

Slide 8 text

Root cause analysis
1. The difference in cgroup hierarchy depth
 • The difference in CPU resource allocation degrades the performance.
2. The overhead of walking the cgroup hierarchy
 • The number of enqueue_entity() and dequeue_entity() calls impacts the performance.

Slide 9

Slide 9 text

Evaluate the overhead of the cgroup hierarchy walking
⚫ Even with the same resource allocation and usage, the performance is degraded.
[Figure: cgroup hierarchy comparison. Labels: "No tasks directly under the group" (on three groups), "~9% overhead", "~16% overhead"]

Slide 10

Slide 10 text

CPU controller with flat runqueue

Slide 11

Slide 11 text

The current design of the cgroup CPU controller
⚫ The Linux kernel community has already identified this issue.
⚫ Problems
 • Build up entire hierarchy on wakeup
 • Tear it back down when task sleeps
 • Do vruntime accounting at each level at every reschedule
 • Preemption decisions re-evaluated at every level
 • load_avg calculated periodically
https://www.linuxplumbersconf.org/event/4/contributions/288/

Slide 12

Slide 12 text

The new idea
⚫ The kernel community is trying to merge the new design into the mainline.
⚫ Basic design
 • All tasks in root cfs_rq
 • Groups not placed on root cfs_rq
 • Rate limit hierarchy walks as much as possible
 • Use hierarchical load & weight for task priority
 • Scale vruntime with hierarchical task weight
 • Slight variation on vruntime formula
https://www.linuxplumbersconf.org/event/7/contributions/762/
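The idea of a hierarchical task weight can be sketched as follows. This is a simplified illustration, not the formula from the proposed patches (the slide only says the vruntime formula is a slight variation): it assumes a task's overall share is the product of its share at each level of the hierarchy, and that vruntime advances inversely to that share. The function names and the sample weights are hypothetical.

```python
from fractions import Fraction


def hierarchical_share(levels):
    """Overall CPU share of a task as a product of per-level shares.

    levels: list of (entity_weight, total_weight_on_that_runqueue),
    ordered from the root down to the task.
    """
    share = Fraction(1)
    for weight, total in levels:
        share *= Fraction(weight, total)
    return share


def scaled_vruntime_delta(delta_exec_ns, share):
    """Toy vruntime step: a smaller share makes vruntime advance faster."""
    return delta_exec_ns / share
```

For example, a group that holds half of the root weight and a task that holds a quarter of its group's weight yield an overall share of 1/8, so all tasks can be compared on a single root runqueue without walking the hierarchy on each decision.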

Slide 13

Slide 13 text

Key takeaways
⚫ Walking the cgroup hierarchy causes overhead inside the kernel.
⚫ Flattening the CPU controller runqueues may improve container performance.

Slide 14

Slide 14 text

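SONY is a registered trademark or trademark of Sony Corporation. The names of Sony products and services are registered trademarks or trademarks of Sony Corporation or its group companies. Other product and company names are the trade names, registered trademarks, or trademarks of their respective owners.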