Slide 1

Slide 1 text

CPU shielding on Docker/Kubernetes
13th Container Technology Information Exchange Meetup @ Online (第13回 コンテナ技術の情報交換会@オンライン)
Kenta Tada
R&D Center, Sony Corporation
Copyright 2020 Sony Corporation

Slide 2

Slide 2 text

About me
⚫ Kenta Tada
⚫ Software Engineer, Sony
⚫ CloudNative Days Tokyo 2020
  • https://speakerdeck.com/kentatada/embedded-container-runtime-using-linux-capabilities-seccomp-cgroups

Slide 3

Slide 3 text

Agenda
⚫ Overview of CPU shielding
⚫ CPU shielding on Docker
⚫ CPU shielding on Kubernetes

Slide 4

Slide 4 text

Background
⚫ We want to run real-time (RT) processes in our embedded container environment.
⚫ There are many things to think about when RT processes run in a container environment.
  • Integrating tools for RT into Kubernetes (today's topic)
  • Security: https://blogs.oracle.com/linux/dealing-with-realtime-processes-in-linux-user-namespaces
[Figure: CPU isolation on a four-core Linux host (Core 0 to Core 3). Labels: Inside Container, RT process, Non-RT process, dockerd, kernel thread, Interrupt.]

Slide 5

Slide 5 text

What is an RT process?
⚫ First in my words, real-time is not about the lowest possible latency or the maximum possible throughput. Real-time is deterministic execution time. Deterministic execution time means performing tasks within a certain time, this not being affected by any external process.
⚫ Making a process real-time involves many tasks, and containerization makes them more difficult.
⚫ Today, I cover only the issues of CPU shielding in a container environment.
https://www.redhat.com/en/blog/going-full-deterministic-using-real-time-openstack

Slide 6

Slide 6 text

Overview of CPU shielding

Slide 7

Slide 7 text

What is CPU shielding?
⚫ CPU shielding is a practice where, on a multiprocessor system or on a CPU with multiple cores, real-time tasks can run on one CPU or core while non-real-time tasks run on another.
⚫ Use cases
  • Isolating RT processes
  • Thermal throttling
    – This use case uses only cpuset.
    – Reduce power consumption by pinning background threads that are not performance-critical to LITTLE CPUs.
  • NFV (Network Functions Virtualization)
    – Improve NFV performance and prevent spurious packet loss.
https://en.wikipedia.org/wiki/CPU_shielding

Slide 8

Slide 8 text

CPU shielding
⚫ User processes
  • Isolate the specified core to launch RT processes
⚫ Kernel threads
  • Move kernel threads off the isolated core
    – Ex. Use cset. The "isolcpus" kernel boot option cannot isolate kernel threads. Since kernel version 5.9, "nohz_full" can, except for kernel threads bound to a specific CPU.
  • Set dynamic tickless behaviour
    – Ex. Set up the "nohz_full" kernel boot option
  • Stop RCU callbacks on the isolated core
    – Ex. Set up the "rcu_nocbs" kernel boot option
  • Set CPU affinity for workqueues
    – Ex. Modify cpumasks in /sys/devices/virtual/workqueue
  and so on…
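
The boot options named above all take a CPU list. A minimal sketch, assuming Python is available on the build host, of how the fragment for the kernel command line could be generated from a set of shielded CPU ids. The helper names format_cpulist and shield_boot_params are hypothetical and only serve this illustration; whether you also need "isolcpus" or other options depends on your use case.

```python
def format_cpulist(cpus):
    """Render CPU ids in the kernel's cpulist syntax, e.g. [0, 1, 2, 5] -> '0-2,5'."""
    ordered = sorted(set(cpus))
    ranges = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the current range while the next id is consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        ranges.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(ranges)


def shield_boot_params(shielded_cpus):
    """Build the nohz_full / rcu_nocbs fragment for the shielded CPUs (hypothetical helper)."""
    cpulist = format_cpulist(shielded_cpus)
    return f"nohz_full={cpulist} rcu_nocbs={cpulist}"


# Example: shield CPUs 2 and 3 on a four-core machine.
print(shield_boot_params({2, 3}))
```

The resulting string would be appended to the kernel command line by whatever bootloader configuration your system uses.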

Slide 9

Slide 9 text

CPU shielding
⚫ Interrupts
  • Set CPU affinity for interrupts
    – Ex. Modify files under /proc/irq
  • Change the interrupt handler from IRQ context to a kernel thread
    – Ex. Set up the "threadirqs" kernel boot option
  and so on…
You should adjust these settings to your use case.
OK! What about CPU shielding inside a container?
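
As an illustration of the /proc/irq step, here is a hedged dry-run sketch that computes the hex mask for the non-shielded ("housekeeping") CPUs and lists the smp_affinity files that would receive it. The function names are hypothetical, nothing is written, and the plain hex mask assumes a machine with at most 32 CPUs. On a real system some IRQs refuse affinity changes, so each write would need its own error handling.

```python
import os


def cpu_mask(cpus):
    """Hex bitmask with bit N set for each CPU N, the format used by smp_affinity."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def plan_irq_affinity(housekeeping_cpus, irq_root="/proc/irq"):
    """Return (path, value) pairs describing the intended writes; nothing is written."""
    value = cpu_mask(housekeeping_cpus)
    # Only numeric entries are IRQ directories; skip files such as default_smp_affinity.
    irqs = sorted((n for n in os.listdir(irq_root) if n.isdigit()), key=int)
    plan = []
    for irq in irqs:
        path = os.path.join(irq_root, irq, "smp_affinity")
        if os.path.exists(path):
            plan.append((path, value))
    return plan


# Example: keep interrupts on CPUs 1-3 so that CPU 0 stays shielded.
print(cpu_mask({1, 2, 3}))
```

Printing the plan first makes it easy to review which IRQs would move before applying anything as root.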

Slide 10

Slide 10 text

CPU shielding on Docker

Slide 11

Slide 11 text

Isolate the specified core to launch RT processes inside a container
⚫ cpuset alone is incomplete for CPU shielding.
  • When the --cpuset-cpus argument is used, Docker can set the CPU affinity of the container.
  • But it cannot keep anything other than user processes off those CPUs.
[Figure: four-core Linux host (Core 0 to Core 3). Labels: dockerd, RT process, Non-RT process, kernel thread, Interrupt, "Outside the scope of this presentation", "How to move??".]
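
To see what --cpuset-cpus actually gives you, one option is to read the Cpus_allowed_list field of /proc/self/status from inside the container. A minimal sketch of that check follows; the function names are hypothetical, and the sample status text is made up for the example rather than captured from a real container.

```python
def parse_cpulist(text):
    """Parse the kernel's cpulist syntax, e.g. '0-2,4' -> {0, 1, 2, 4}."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def allowed_cpus(status_text):
    """Extract the allowed CPU set from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Cpus_allowed_list:"):
            return parse_cpulist(line.split(":", 1)[1])
    raise ValueError("Cpus_allowed_list not found")


# Illustrative input; inside a container you would read /proc/self/status instead.
sample = "Name:\trt-task\nCpus_allowed_list:\t0\n"
print(allowed_cpus(sample))
```

This only confirms the affinity of the user process. It says nothing about kernel threads or interrupts on the same core, which is the gap the bullets above describe.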

Slide 12

Slide 12 text

Move kernel threads from the isolated core using cset
⚫ cset is a tool to manipulate cpusets.
  • cset can isolate both user processes and kernel threads, except for kernel threads bound to a specific CPU.
⚫ How to isolate
  • cset creates the 'system' and 'user' directories at the root of the cpuset controller.
  • The 'system' cpuset contains the CPUs used for unimportant tasks.
  • The 'user' cpuset contains the CPUs used for important tasks.
    – The 'user' cpuset is the shield.
https://github.com/lpechacek/cpuset/blob/v1.6/doc/tutorial.txt
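
The 'system' / 'user' split can be pictured as a simple partition of the online CPUs. The sketch below is a model of that idea only, not cset's implementation; plan_shield is a hypothetical name.

```python
def plan_shield(online_cpus, shielded_cpus):
    """Split online CPUs into the 'user' (shield) and 'system' cpusets."""
    online = set(online_cpus)
    shield = set(shielded_cpus)
    if not shield <= online:
        raise ValueError("shielded CPUs must be online")
    if shield == online:
        raise ValueError("at least one CPU must remain for the 'system' cpuset")
    return {"user": sorted(shield), "system": sorted(online - shield)}


# Example: shield Core 0 on a four-core machine.
print(plan_shield(range(4), [0]))
```

Every CPU ends up in exactly one of the two sets, which is what lets the 'user' set act as a shield.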

Slide 13

Slide 13 text

The problem of Docker shielding with cset
⚫ Docker cannot launch a container on a core isolated by cset.
⚫ What happened?
  • cset creates the 'system' and 'user' directories.
  • Docker launches the container with the --cpuset-cpus argument.
    – Docker (runc) also creates its own cpuset directory (Ex. /sys/fs/cgroup/cpuset/docker) and tries to launch the container from it.
    – But cset has already made its cpusets exclusive by default.
    – # echo 1 > cpuset.cpu_exclusive
    – So Docker fails to launch the container.
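
The conflict can be modeled in a few lines. This is a simplified sketch of the cgroup v1 sibling rule (sibling cpusets may not share a CPU if either of them is exclusive), not kernel code. It assumes, as the slides suggest, that both cset cpusets are exclusive; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpuset:
    name: str
    cpus: frozenset
    cpu_exclusive: bool = False


def can_add_sibling(existing, new):
    """Return True if 'new' may sit next to the existing sibling cpusets."""
    for sibling in existing:
        overlap = sibling.cpus & new.cpus
        # An overlap is rejected as soon as one side is exclusive.
        if overlap and (sibling.cpu_exclusive or new.cpu_exclusive):
            return False
    return True


# cset has already created exclusive 'user' (Core 0) and 'system' (Cores 1-3).
siblings = [
    Cpuset("user", frozenset({0}), cpu_exclusive=True),
    Cpuset("system", frozenset({1, 2, 3}), cpu_exclusive=True),
]
# Docker then tries to create its own sibling cpuset that uses Core 0.
docker = Cpuset("docker", frozenset({0}))
print(can_add_sibling(siblings, docker))
```

In this model the 'docker' cpuset is rejected because Core 0 already belongs to the exclusive 'user' cpuset, which matches the failure described above.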

Slide 14

Slide 14 text

The problem of Docker shielding with cset
[Figure: cpuset hierarchy under /sys/fs/cgroup/cpuset on a four-core host (Core 0 to Core 3). Labels: /user/cpuset.cpu_exclusive, /system/cpuset.cpu_exclusive, /docker/cpuset.cpu_exclusive, "Created by cset", "Created by Docker", "Shielded by cset", "exclusive".]

Slide 15

Slide 15 text

How to integrate cset into Docker
⚫ How to fix this when you use the cgroupfs driver
  1. Create the isolated cpuset as 'docker'
     # cset shield --userset=docker -c 0 -k on
  2. Launch your Docker container
  3. If needed, move the processes of an unimportant container to the non-isolated cpuset when you launch it
⚫ It is difficult to maintain cpusets…
  • Using the systemd driver
  • Using KVM
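
Step 3 can be done through the cgroup v1 filesystem directly, by writing the PID into the cgroup.procs file of the target cpuset. A minimal sketch, assuming the cpuset controller is mounted at /sys/fs/cgroup/cpuset and that the non-isolated cpuset is named 'system' (cset's default name). move_to_cpuset is a hypothetical helper, and it needs root on a real system.

```python
import os


def move_to_cpuset(pid, cpuset_name, cpuset_root="/sys/fs/cgroup/cpuset"):
    """Move a process into the named cpuset by writing its PID to cgroup.procs."""
    procs_path = os.path.join(cpuset_root, cpuset_name, "cgroup.procs")
    with open(procs_path, "w") as procs:
        procs.write(f"{pid}\n")
    return procs_path


# Usage (as root, on a host where cset has created the 'system' cpuset):
#   move_to_cpuset(container_pid, "system")
# container_pid is a placeholder for the PID of the unimportant container's process.
```

Because the container was launched under the shielded 'docker' cpuset, this move has to be repeated for every unimportant container, which is part of why the slide calls the setup difficult to maintain.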

Slide 16

Slide 16 text

Users launch the container on the isolated core
[Figure: cpuset hierarchy under /sys/fs/cgroup/cpuset on a four-core host (Core 0 to Core 3). Labels: /docker/cpuset.cpu_exclusive, /system/cpuset.cpu_exclusive, "Created by cset", "Created by Docker", "Shielded by cset", "exclusive".]

Slide 17

Slide 17 text

CPU shielding on Kubernetes

Slide 18

Slide 18 text

Explicitly Reserved CPU List
⚫ What about CPU shielding and cpuset in Kubernetes?
⚫ Kubernetes has supported an explicitly reserved CPU list since v1.17.
  • A new kubelet flag defines an explicit CPU set for OS system daemons and Kubernetes system daemons.
  • This option is specifically designed for Telco/NFV.
  • Moving the system daemons, Kubernetes daemons and interrupts/timers is out of scope.
    – In CentOS, you can do this using the tuned toolset.
https://v1-18.docs.kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#explicitly-reserved-cpu-list
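
The kubelet flag described on the linked page is --reserved-cpus. As a sketch, the reserved list can be derived as the complement of the shielded CPUs, so that system and Kubernetes daemons are accounted to the non-shielded cores. reserved_cpus_flag is a hypothetical helper, and the four-core layout is the one used throughout these slides.

```python
def reserved_cpus_flag(online_cpus, shielded_cpus):
    """Build the kubelet --reserved-cpus flag from the non-shielded CPUs."""
    reserved = sorted(set(online_cpus) - set(shielded_cpus))
    if not reserved:
        raise ValueError("no CPU left to reserve for system daemons")
    return "--reserved-cpus=" + ",".join(str(cpu) for cpu in reserved)


# Example: Core 0 is shielded for RT work, Cores 1-3 are reserved for daemons.
print(reserved_cpus_flag(range(4), [0]))
```

As the slide notes, the flag only declares the reservation. Actually moving daemons, interrupts and timers onto those CPUs remains a separate task.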

Slide 19

Slide 19 text

Goal
⚫ Make the architecture simple to reduce maintenance costs
  • Try new kernel features to reduce the number of necessary tools
⚫ Integrate tools for RT into Kubernetes
[Figure: CPU isolation on a four-core Linux host (Core 0 to Core 3). Labels: Inside Container, RT process, Non-RT process, dockerd, kubelet, kernel thread, Interrupt, "Reserved CPU list", "isolcpus, nohz_full and so on".]

Slide 20

Slide 20 text

Key takeaways
⚫ There are many caveats when isolating CPU cores.
  • Processes
  • Kernel threads
  • Interrupts
⚫ Containerization makes CPU shielding more difficult.
  • Integrate tools for RT into Kubernetes
  • Consider security
⚫ Diversity is important for OSS.
  • To get patches into mainline, we need to understand different use cases.

Slide 21

Slide 21 text

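SONY is a registered trademark or trademark of Sony Corporation. The names of Sony products and services are registered trademarks or trademarks of Sony Corporation or its group companies. Other product and company names are the trade names, registered trademarks or trademarks of their respective owners.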