cgroupv2 : What's new with the containers?

Laurent Grangeau

June 04, 2021

Transcript

  1. They spoke in 2018. What is Paris Container Day?

    Who am I? Laurent Grangeau, Cloud Solution Architect @ Sogeti, @laurentgrangeau
  2. A huge thanks!

    Akihiro Suda, Software Engineer @ NTT Corporation, @_akihirosuda_
  3. Why talk about cgroupv2?

    Fedora 31, released on 29 October 2019, was the first major distro to adopt cgroupv2, aka the unified hierarchy. This talk was initially planned for ParisContainerDay 2020, but we will see that things have evolved a lot and that adoption is much wider than in 2019. Stay tuned, it will be fun 🙂
  4. What are cgroups (v1)?

    cgroups (abbreviated from control groups) is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes. It provides:
    - Resource limiting
    - Prioritization
    - Accounting
    - Control

    Work started at Google in 2006 under the name "process containers". It was renamed "control groups" to avoid confusion caused by the multiple meanings of the term "container" in the Linux kernel context. The control groups functionality was merged into the Linux kernel mainline in version 2.6.24, released in January 2008.
  5. What are cgroups (v1)?

    There are 13 controllers in cgroupv1, each responsible for one aspect of the system:
    - cpu for managing user / system CPU time and usage.
    - cpuacct for accounting CPU usage by group.
    - hugetlb for accounting usage of huge pages by process group.
    - cpuset for binding a group to specific CPUs. Useful for real-time applications and NUMA systems with localized memory per CPU.
    - freezer for freezing a group. Useful for cluster batch scheduling, process migration and debugging without affecting ptrace.
    - net_cls, net_prio for tagging network traffic for traffic control.
    - devices for controlling read / write access to devices.
    - pids for controlling the number of processes.
    - perf_event for per-cgroup perf monitoring.
    - rdma for distribution and accounting of RDMA resources.
    - memory for managing accounting, limits and notifications.
    - blkio for measuring and limiting the amount of block I/O by group.
  6. What are cgroups (v1)?

    You can control access to different parts of the system to set up virtual OSes (or containers ;-)). Each cgroup can only use as much of a resource as its limits allow.
  7. What are cgroups (v1)?

    You don't need to slice across an entire system. You can have a cgroup that deals only with CPU, or only with memory.
  8. What are cgroups (v1)?

    You can isolate processes inside containers with limited access to the system.
  9. How to use cgroupv1?

    You can easily create a cgroup from within your shell. Here we mount the memory controller to limit a process's memory.
    - Files whose names start with cgroup are common interfaces that allow interaction with the group.
    - Files whose names start with memory are interfaces specific to the controller that has been activated.
    - Directories named *.slice are other cgroups that were created automatically.
  10. How to use cgroupv1?

    Creating a cgroup is as easy as creating a directory. Here we limit the memory used by the processes in this cgroup to 100 KiB.
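The two steps on this slide (create a directory, then write the limit) can be sketched in Python. This is a minimal illustration, not the commands shown on the slide: the helper names `create_memory_cgroup` and `add_process` are hypothetical, and the demo runs against a temporary directory so it needs no privileges. On a real host the root would be the mount point of the v1 memory controller (commonly `/sys/fs/cgroup/memory`, which is an assumption about the host layout) and root privileges would be required.

```python
import tempfile
from pathlib import Path


def create_memory_cgroup(root: Path, name: str, limit_bytes: int) -> Path:
    """Create a v1 memory cgroup under `root` and set its memory limit."""
    group = root / name
    # On a real cgroup filesystem, creating the directory creates the cgroup
    # and the kernel populates it with the interface files.
    group.mkdir(parents=True, exist_ok=True)
    # memory.limit_in_bytes is the v1 memory controller's hard limit file.
    (group / "memory.limit_in_bytes").write_text(str(limit_bytes))
    return group


def add_process(group: Path, pid: int) -> None:
    """Move a process into the cgroup by writing its PID to cgroup.procs."""
    (group / "cgroup.procs").write_text(str(pid))


if __name__ == "__main__":
    # Hypothetical demo against a scratch directory instead of a real cgroupfs.
    with tempfile.TemporaryDirectory() as scratch:
        group = create_memory_cgroup(Path(scratch), "demo", 100 * 1024)
        print(group.name, (group / "memory.limit_in_bytes").read_text())
```

Pointing `root` at a scratch directory only exercises the file layout; the limit is enforced only when the files live on a mounted cgroup filesystem.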
  11. Problems with cgroupv1

    cgroupv1 has a drawback: each controller has an independent tree.
    - A process can join independent cgroups, for example cgroup foo for CPU and cgroup bar for memory. This was designed to provide flexibility, but it did not prove useful in practice.
    - Utility controllers (e.g., freezer) that might be useful in all hierarchies could be used in only one.
    - Allowing thread granularity for cgroup membership proved problematic (e.g., for the memory controller, since threads share memory).
  12. A new model: unified hierarchy

    cgroupv2 focuses on simplicity.
    - cgroupv2 uses a single hierarchy for all controllers. When you create a new cgroup such as newcgroup, all controllers enabled for newcgroup take control of its processes.
    - cgroupv2 allows only process-granularity membership.
    - cgroupv2 has consistent names and values for interface files, and consistent inheritance rules for all controllers.
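The difference between the two models is visible in `/proc/<pid>/cgroup`, where each line has the form `hierarchy-ID:controller-list:cgroup-path`. The sketch below parses that format; the function name `parse_proc_cgroup` and the two sample strings are illustrative assumptions (the samples reuse the foo, bar and newcgroup names from the slides), not output captured from a real system.

```python
from typing import Dict


def parse_proc_cgroup(text: str) -> Dict[str, str]:
    """Map each controller (or "unified" for the v2 entry) to its cgroup path."""
    membership: Dict[str, str] = {}
    for line in text.strip().splitlines():
        # Split on the first two colons only: the path may itself contain ':'.
        _hierarchy_id, controllers, path = line.split(":", 2)
        if controllers == "":
            # A v2 entry has an empty controller list, e.g. "0::/newcgroup".
            membership["unified"] = path
        else:
            # A v1 entry may list several co-mounted controllers, e.g. "cpu,cpuacct".
            for name in controllers.split(","):
                membership[name] = path
    return membership


# Illustrative v1 membership: a different cgroup per controller tree.
V1_SAMPLE = "5:memory:/bar\n4:cpu,cpuacct:/foo\n"
# Illustrative v2 membership: a single path for all controllers.
V2_SAMPLE = "0::/newcgroup\n"

if __name__ == "__main__":
    print(parse_proc_cgroup(V1_SAMPLE))
    print(parse_proc_cgroup(V2_SAMPLE))
```

On a live system the same function can be fed the contents of `/proc/self/cgroup`.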
  13. A new model: unified hierarchy

    cgroupv2 also has controllers to manage processes:
    - cpu -> successor to v1 cpu and cpuacct controllers
    - cpuacct
    - hugetlb -> successor to v1 hugetlb controller
    - cpuset -> successor to v1 cpuset controller
    - freezer
    - net_cls, net_prio -> no direct equivalent
    - devices -> successor to v1 devices controller
    - pids -> exactly the same as v1 controller
    - perf_event -> same as v1 controller
    - rdma -> same as v1 controller
    - memory -> successor to v1 memory controller
    - blkio
    - io -> successor to v1 blkio controller
    - misc -> new cgroup controller
  14. A new model: unified hierarchy

    Contrary to cgroupv1, all controllers are automatically available in cgroupv2, in a unified hierarchy.
    - There is no need to explicitly bind controllers to a mount point.
    - Each v2 cgroup has a (read-only) cgroup.controllers file, which lists the controllers this cgroup can enable.
    - Controllers are enabled or disabled by writing some subset of the available controllers to cgroup.subtree_control.
  15. A new model: unified hierarchy

    To enable a controller (e.g., pids), write to cgroup.subtree_control.
    - This allows the resource to be controlled in child cgroups.
    - It creates controller-specific attribute files in each child directory.
    - If a controller is disabled in a cgroup (i.e., not written to cgroup.subtree_control in the parent cgroup), it cannot be enabled in any descendant of that cgroup.
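The read-then-write sequence described on these two slides can be sketched as follows. The helper names `available_controllers` and `enable_controllers` are hypothetical, and the demo uses a temporary directory with a hand-written cgroup.controllers file as a stand-in for a real cgroupv2 directory, so it shows the file protocol rather than a real kernel interaction. The `+name` request syntax is the one the kernel's cgroup v2 interface accepts.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def available_controllers(cgroup: Path) -> List[str]:
    """Return the controllers listed in this cgroup's cgroup.controllers file."""
    return (cgroup / "cgroup.controllers").read_text().split()


def enable_controllers(cgroup: Path, wanted: Iterable[str]) -> str:
    """Enable `wanted` controllers for the children of `cgroup`.

    Returns the request string written to cgroup.subtree_control.
    """
    wanted = list(wanted)
    available = set(available_controllers(cgroup))
    missing = [name for name in wanted if name not in available]
    if missing:
        # A controller that the parent did not pass down cannot be enabled here.
        raise ValueError(f"not available in {cgroup}: {', '.join(missing)}")
    # "+name" enables a controller; "-name" would disable it.
    request = " ".join(f"+{name}" for name in wanted)
    (cgroup / "cgroup.subtree_control").write_text(request)
    return request


if __name__ == "__main__":
    # Hypothetical stand-in for a cgroupv2 directory.
    with tempfile.TemporaryDirectory() as scratch:
        cgroup = Path(scratch)
        (cgroup / "cgroup.controllers").write_text("cpu io memory pids\n")
        print(enable_controllers(cgroup, ["pids"]))
```

Checking cgroup.controllers first mirrors the inheritance rule above: a child can only enable what its parent has made available.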
  16. cgroupv2: eBPF oriented

    In cgroupv1, device access control is implemented by writing static configuration. In cgroupv2, device access control is implemented by attaching an eBPF program. The slide shows the same configuration in cilium-flavored assembler syntax. eBPF is a kernel technology that can, among other things, analyze network traffic.
  17. cgroupv2: rootless containers

    The most impactful change is rootless containers. Rootless containers refers to the ability of an unprivileged user to create, run and otherwise manage containers. When we say rootless containers, we mean running the entire container runtime, as well as the containers, without root privileges. Allowing a non-root user to access /var/run/docker.sock by adding the user to the docker group (sudo usermod -aG docker somebody) is NOT an example of a rootless container.
  18. Enabling cgroupv2

    Enable cgroupv2 in GRUB and disable all cgroupv1 controllers. By default, a non-root user can only get the memory and pids controllers delegated. Create a new file, /etc/systemd/system/user@.service.d/delegate.conf, to delegate cpu and io.
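After changing the boot configuration, it can help to check which mode the host actually ended up in. A minimal sketch, assuming the usual systemd layout: a pure cgroupv2 host exposes cgroup.controllers directly under `/sys/fs/cgroup`, while a hybrid host mounts the v2 hierarchy under `/sys/fs/cgroup/unified`. The function name `cgroup_mode` and its return labels are hypothetical.

```python
from pathlib import Path


def cgroup_mode(root: Path = Path("/sys/fs/cgroup")) -> str:
    """Guess the cgroup setup from the layout under `root`.

    Returns "v2", "hybrid", "v1" or "none". The layout checks assume the
    conventional systemd mount points.
    """
    if (root / "cgroup.controllers").is_file():
        # The unified hierarchy is mounted at the root: pure cgroupv2.
        return "v2"
    if (root / "unified" / "cgroup.controllers").is_file():
        # v1 controller trees plus a v2 hierarchy mounted alongside them.
        return "hybrid"
    if root.is_dir() and any(root.iterdir()):
        # Per-controller directories only (memory/, cpu/, ...): cgroupv1.
        return "v1"
    return "none"


if __name__ == "__main__":
    # Read-only check of the current host.
    print(cgroup_mode())
```

The same idea can be applied to a user's delegated cgroup: reading its cgroup.controllers file shows whether cpu and io were delegated in addition to memory and pids.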
  19. Adoption status

    Development of cgroupv2 started in 2014, and it was released with kernel 4.5 in March 2016. However, it wasn't considered useful for containers until the release of kernel 5.2 (July 7, 2019), because it lacked support for the device controller and the freezer. With the introduction of the cgroupv2 device controller in kernel 4.15 (Jan 28, 2018) and the cgroupv2 freezer in kernel 5.2, cgroupv2 is now considered ready for containers. Although there is a "hybrid" configuration that allows mounting both the v1 hierarchy and the cgroupv2 hierarchy, the "hybrid" mode is rarely used for containers, because you can't enable cgroupv2 controllers that are already enabled for cgroupv1.
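The version threshold on this slide (kernel 5.2, when the freezer landed) can be turned into a quick check. A minimal sketch: the function names `kernel_version` and `cgroup2_ready_for_containers` are hypothetical, and the check only compares the major and minor numbers of a release string such as the one returned by `platform.release()`. Distribution kernels may backport features, so this is a heuristic rather than a guarantee.

```python
import platform
import re
from typing import Tuple


def kernel_version(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as "5.10.0-8-amd64"."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def cgroup2_ready_for_containers(release: str) -> bool:
    """True if the kernel is at least 5.2, the threshold given in the talk."""
    return kernel_version(release) >= (5, 2)


if __name__ == "__main__":
    release = platform.release()
    print(release, cgroup2_ready_for_containers(release))
```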
  20. Adoption status: low-level runtimes

    runc: the PR was merged on 5 Sep 2019: https://github.com/opencontainers/runc/pull/2113
    User namespaces must be compiled in and enabled in your kernel. Confirm that CONFIG_USER_NS=y is set in your kernel configuration.
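Checking for CONFIG_USER_NS=y can be scripted. The sketch below is one way to do it, under the assumption that the kernel configuration is readable as text: many distributions ship it as `/boot/config-<release>`, and some kernels expose it as `/proc/config.gz`, but neither location is guaranteed. The function names `option_enabled` and `read_kernel_config` are hypothetical.

```python
import gzip
from pathlib import Path


def option_enabled(config_text: str, option: str = "CONFIG_USER_NS") -> bool:
    """True if `option` is built in (set to "y") in the given kernel config text."""
    target = f"{option}=y"
    # Disabled options appear as "# CONFIG_X is not set", which never matches.
    return any(line.strip() == target for line in config_text.splitlines())


def read_kernel_config(path: Path) -> str:
    """Read a kernel config file, transparently handling gzip compression."""
    if path.suffix == ".gz":
        with gzip.open(path, "rt") as handle:
            return handle.read()
    return path.read_text()


if __name__ == "__main__":
    # Assumed location; adjust for the host being inspected.
    candidate = Path("/proc/config.gz")
    if candidate.exists():
        print(option_enabled(read_kernel_config(candidate)))
    else:
        print("kernel config not found at", candidate)
```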
  21. Adoption status: low-level runtimes

    crun (Red Hat): crun is another implementation of the OCI Runtime Spec. It is written in C, is much smaller (around 300 KB vs 15 MB) and is twice as fast as runc. It has supported cgroupv2 since late 2019 and became the default runtime in Fedora 31.
  22. Adoption status: high-level runtimes

    containerd: the PR was merged on 12 Dec 2019: https://github.com/containerd/containerd/pull/3799
    - You can start containerd as a user with containerd-rootless-setuptool.sh
    - Don't forget to install the CNI plugins inside /opt/cni/bin
    - To start/stop the daemon: systemctl --user <start|stop> containerd
    - Enabling resource limitations: nerdctl run --cpus | --memory | --blkio-weight | --pids-limit
  23. Adoption status: high-level runtimes

    Docker / Moby: Moby supports cgroupv2:
    - https://github.com/moby/moby/pull/40174
    - https://github.com/moby/moby/pull/40657
    - https://github.com/moby/moby/pull/40662

    Docker 20.10 adds support for cgroupv2. It is now out of experimental: https://github.com/moby/moby/pull/42263
    The slide then shows how to install Docker in rootless mode.
  24. Adoption status: high-level runtimes

    Podman: Podman has supported cgroupv2 since 1.5 and added support for multi-container networking in 2.1.
    - Enabling resource limitations: podman run --cpus | --memory | --blkio-weight | --pids-limit
    - To use the CPU controller, you need to add it to your configuration:

        [Service]
        # default: Delegate=pids memory
        Delegate=pids memory cpu
  25. Adoption status: high-level runtimes

    BuildKit: it runs on cgroupv2 perfectly fine. It uses RootlessKit to launch the daemon.
  26. Adoption status: Kubernetes

    - usernetes: reference distribution of Kubernetes that can be installed under a user's home directory. https://github.com/rootless-containers/usernetes
    - k3s: it supports rootless mode using usernetes: k3s server --rootless https://rancher.com/docs/k3s/latest/en/advanced/#running-k3s-with-rootlesskit-experimental
    - kubernetes: the PR was merged on 20 May 2021! https://github.com/kubernetes/enhancements/pull/1371 It should be available in 1.22.
  27. Key takeaways

    - We see massive adoption of cgroupv2 at every level of the container stack (from OCI, to runtimes, to Kubernetes).
    - cgroupv2 brings a new unified hierarchy with consistent naming and development conventions.
    - Rootless containers increase security by letting a non-root user run containers.
  28. Questions?

    Laurent Grangeau, Cloud Solution Architect @ Sogeti, @laurentgrangeau