
Unprivileged Image Builds: What are the Challenges and Where are we Today?

A recording of the talk is available at https://youtu.be/62p6v_A4KTM

Most popular container image build tools require extensive privileges to perform their intricate task. This makes it challenging to run them in container-based CI systems, on Kubernetes, or even in rootless environments. At the same time, CI systems are an attractive target for attacks and privileged image builds pose a well-known risk.

In many ways, the problem boils down to "running containers within containers". It received substantial attention around 2018, when various roadblocks were identified and patches were under review. Six years later, most container images are still built in privileged environments, but the fundamentals have improved and real-world solutions are now available!

The talk will first give an overview of the technical obstacles and what has changed in recent years. It will then spotlight real-world tools and their underlying approaches. Finally, some practical guidance will be provided to engineers eager to adopt unprivileged image builds.

Felix Dreissig

September 04, 2024

Transcript

  1. Unprivileged Image Builds: What are the Challenges and Where are we Today?

    Felix Dreissig, 4th September 2024, ContainerDays
  2. Building Container Images

    [Diagram with three variants of CI Server > CI Job > Image Build: the plain nesting; the Image Build using a Container Runtime via /var/run/docker.sock; the Image Build running privileged]
  3. The Problem

    Why can’t we just run docker build within a container?
    • Jessie Frazelle: Building Container Images Securely on Kubernetes
      https://blog.jessfraz.com/post/building-container-images-securely-on-kubernetes/
    • Alban Crequy: Towards unprivileged container builds, ContainerDays 2018
      https://youtu.be/yarJuToHHxY, https://kinvolk.io/blog/2018/04/towards-unprivileged-container-builds/
    • Andrew Martin: Rootless, Reproducible & Hermetic: Secure Container Build Showdown, All Systems Go 2019
      https://media.ccc.de/v/ASG2019-146-rootless-reproducible-hermetic-secure-container-build-showdown
  6. Challenge 1: Elevated Privileges

    [Diagram: the Initial User Namespace (Host) contains a User Namespace with its own Mount Namespace, PID Namespace, …; the Unprivileged User on the host is root inside the User Namespace. The CI Server runs the CI Job in a User Namespace, and the CI Job runs the Image Build.]
    Enabled for regular users? CONFIG_USER_NS=y, sysctl kernel.unprivileged_userns_clone=1 (or similar)
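The "enabled for regular users?" question can be checked from a shell. The following is a minimal sketch, not a complete test: it assumes Linux 4.9 or newer (where /proc/sys/user/max_user_namespaces exists), treats kernel.unprivileged_userns_clone as an optional Debian-style switch, and does not cover other distribution-specific restrictions. The function name and its optional root-directory argument are made up for this example.

```sh
#!/bin/sh
# Report whether unprivileged user namespaces look usable.
# $1 is an optional alternative root directory (hypothetical helper argument,
# handy for trying the function against a fake /proc tree).
userns_status() {
    us_max="${1:-}/proc/sys/user/max_user_namespaces"
    us_clone="${1:-}/proc/sys/kernel/unprivileged_userns_clone"

    # Kernels built without CONFIG_USER_NS=y do not expose this file.
    if [ ! -r "$us_max" ]; then
        echo "unavailable: no user namespace support (CONFIG_USER_NS)"
        return 1
    fi
    # A limit of 0 disables the creation of user namespaces altogether.
    if [ "$(cat "$us_max")" -eq 0 ]; then
        echo "disabled: user.max_user_namespaces=0"
        return 1
    fi
    # Debian-style switch; the file is absent on kernels without that patch.
    if [ -r "$us_clone" ] && [ "$(cat "$us_clone")" -eq 0 ]; then
        echo "disabled: kernel.unprivileged_userns_clone=0"
        return 1
    fi
    echo "enabled"
}

# Check the running system; a non-zero status only means "not usable".
userns_status || true
```

A result of "enabled" only says that the kernel switches checked here allow it; seccomp or MAC policies may still block the relevant syscalls.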
  9. Challenge 2: Mounting the Root File System

    user_namespaces Manpage: Holding CAP_SYS_ADMIN within the user namespace that owns a process's mount namespace allows that process to create bind mounts and mount the following types of filesystems:
    • /proc (since Linux 3.8)
    • /sys (since Linux 3.8)
    • devpts (since Linux 3.9)
    • tmpfs(5) (since Linux 3.9)
    • ramfs (since Linux 3.9)
    • mqueue (since Linux 3.9)
    • bpf (since Linux 4.4)
    • overlayfs (since Linux 5.11)
    + FUSE (since Linux 4.18, distro-specific earlier)
    ✓
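The version thresholds above can be turned into a rough pre-flight check. This is a sketch under the assumption that the upstream version numbers apply; as the slide notes, distributions may have enabled a feature earlier, so the result is only a heuristic. The function name kernel_at_least is made up for this example.

```sh
#!/bin/sh
# Succeed if kernel release $1 (e.g. "6.1.0-23-amd64") is at least $2.$3.
kernel_at_least() {
    kv_major=${1%%.*}                  # "6.1.0-23-amd64" -> "6"
    kv_rest=${1#*.}                    # "6.1.0-23-amd64" -> "1.0-23-amd64"
    kv_minor=${kv_rest%%[!0-9]*}       # "1.0-23-amd64"   -> "1"
    [ "$kv_major" -gt "$2" ] ||
        { [ "$kv_major" -eq "$2" ] && [ "$kv_minor" -ge "$3" ]; }
}

release=$(uname -r)
if kernel_at_least "$release" 5 11; then
    echo "$release: overlayfs should be mountable inside a user namespace"
elif kernel_at_least "$release" 4 18; then
    echo "$release: no unprivileged overlayfs upstream, FUSE is an option"
else
    echo "$release: neither overlayfs nor FUSE in user namespaces upstream"
fi
```

For the Debian 12 kernel shown later in the deck (6.1.0-23-amd64), the first branch applies.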
  12. Challenge 3: procfs Masking

    proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
    proc on /proc/bus type proc (ro,nosuid,nodev,noexec,relatime)
    proc on /proc/fs type proc (ro,nosuid,nodev,noexec,relatime)
    proc on /proc/irq type proc (ro,nosuid,nodev,noexec,relatime)
    proc on /proc/sys type proc (ro,nosuid,nodev,noexec,relatime)
    proc on /proc/sysrq-trigger type proc (ro,nosuid,nodev,noexec,relatime)
    tmpfs on /proc/acpi type tmpfs (ro,relatime,uid=200000,gid=200000,inode64)
    udev on /proc/kcore type devtmpfs (rw,nosuid,relatime,size=8148648k,nr_inodes=2037162,mode=755,inode64)
    udev on /proc/keys type devtmpfs (rw,nosuid,relatime,size=8148648k,nr_inodes=2037162,mode=755,inode64)
    udev on /proc/timer_list type devtmpfs (rw,nosuid,relatime,size=8148648k,nr_inodes=2037162,mode=755,inode64)
    ⇒ Cannot mount new procfs (mount_too_revealing())
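Whether the current container has a masked procfs can be estimated by looking for mounts below /proc, as in the listing above. The sketch below is a heuristic and not a reimplementation of the kernel's check: it assumes `mount`-style output and mount points without spaces, and it skips /proc/sys/fs/binfmt_misc on the assumption that this mount is also common on regular hosts. The function name is made up for this example.

```sh
#!/bin/sh
# Print every mount point below /proc from `mount`-style lines on stdin,
# i.e. lines of the form "<source> on <mount point> type <fstype> (<opts>)".
masked_proc_paths() {
    awk '$2 == "on" && $4 == "type" &&
         index($3, "/proc/") == 1 &&
         $3 != "/proc/sys/fs/binfmt_misc" { print $3 }'
}

if command -v mount >/dev/null 2>&1; then
    masked=$(mount | masked_proc_paths)
    if [ -n "$masked" ]; then
        echo "procfs appears to be partially covered by:"
        echo "$masked"
    else
        echo "no mounts found below /proc"
    fi
fi
```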
  13. procfs Masking: Disable it?

    • Should be fine if we have user (and process) namespaces
    • Docker: --security-opt systempaths=unconfined
    • GitLab Docker Runner: Unsupported (security_opt won’t work, see https://gitlab.com/gitlab-org/gitlab-runner/-/issues/36810)
    • Podman: --security-opt "unmask=/proc/*"
    • Kubernetes: securityContext: procMount: Unmasked (behind ProcMountType flag)
    • GitLab Kubernetes Runner: proc_mount (undocumented, see https://gitlab.com/gitlab-org/gitlab-runner/-/merge_requests/3546)
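For the Kubernetes option, a pod manifest might look like the one printed by the sketch below. The pod and container names, the image choice and the sleep command are assumptions made for this example. It also sets hostUsers: false, on the assumption that the cluster supports user namespaces for pods and only accepts an unmasked /proc in combination with them.

```sh
#!/bin/sh
# Print a Pod manifest that requests an unmasked /proc.
# Assumptions: names are hypothetical; the cluster has the ProcMountType
# feature gate and user namespace support for pods enabled.
print_build_pod() {
    cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: unprivileged-build-demo
spec:
  hostUsers: false
  containers:
    - name: build
      image: quay.io/buildah/stable
      command: ["sleep", "infinity"]
      securityContext:
        procMount: Unmasked
EOF
}

# Print the manifest; to try it on a cluster, pipe it into
# `kubectl apply -f -` instead.
print_build_pod
```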
  15. Other Potential Roadblocks

    • seccomp (syscall blocking)
      Docker / Podman: --security-opt seccomp=unconfined (or use custom profile)
    • AppArmor (Mandatory Access Control)
      Docker / Podman: --security-opt apparmor=unconfined (or use custom profile)
    • SELinux (Mandatory Access Control)
      Podman: --security-opt label=disable (or use custom labels)
    ✓
  16. A Word on Rootlessness

    "Rootless containers refers to the ability for an unprivileged user to create, run and otherwise manage containers. This term also includes the variety of tooling around containers that can also be run as an unprivileged user. [...] When we say Rootless Containers, it means running the entire container runtime as well as the containers without the root privileges."
    https://rootlesscontaine.rs/
  17. Showtime

    > docker info | grep -A 7 'Security Options:'
     Security Options:
      apparmor
      seccomp
       Profile: default
      userns
      cgroupns
     Kernel Version: 6.1.0-23-amd64
     Operating System: Debian GNU/Linux 12 (bookworm)

    https://docs.docker.com/engine/security/userns-remap/
  18. BuildKit

    https://github.com/moby/buildkit/blob/master/docs/rootless.md

    > ls
    Dockerfile
    > docker run \
        --name buildkitd -d \
        --security-opt systempaths=unconfined --security-opt seccomp=unconfined \
        --security-opt apparmor=unconfined -v "$(pwd):/build" \
        moby/buildkit:rootless
    > docker exec -it --workdir /build buildkitd sh
    $ buildctl build --frontend=dockerfile.v0 --local context=. --local dockerfile=. \
        --output type=docker,name=demo > demo.tar
  19. Buildah

    > docker run -it \
        --security-opt systempaths=unconfined \
        --security-opt seccomp=unconfined \
        --security-opt apparmor=unconfined \
        -v "$(pwd):/build" --workdir /build \
        quay.io/buildah/stable
    $ sed -i 's/^mount_program =/#&/' /etc/containers/storage.conf
    $ sed -i 's/^mountopt =/#&/' /etc/containers/storage.conf
    $ buildah build --isolation rootless -t demo:latest
    $ buildah push demo:latest docker-archive:demo.tar:demo:latest
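The two sed calls comment out the mount_program and mountopt settings in storage.conf, presumably so that Buildah uses the kernel's native overlayfs instead of a FUSE-based mount helper. The sketch below applies the same expressions to a temporary file; the sample file contents and the function name are assumptions for illustration, not a copy of the file shipped in the image.

```sh
#!/bin/sh
# Comment out the two overlay settings in the given storage.conf.
# In the replacement, '&' stands for the matched text, so each matching
# line simply gains a leading '#'.
comment_out_fuse_options() {
    sed -i 's/^mount_program =/#&/' "$1"
    sed -i 's/^mountopt =/#&/' "$1"
}

# Illustrative sample file (assumed contents).
conf=$(mktemp)
cat > "$conf" <<'EOF'
[storage]
driver = "overlay"

[storage.options.overlay]
mount_program = "/usr/bin/fuse-overlayfs"
mountopt = "nodev,fsync=0"
EOF

comment_out_fuse_options "$conf"
cat "$conf"
rm -f "$conf"
```

After the call, the mount_program and mountopt lines start with "#" and all other lines are unchanged.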
  20. Isolation Trade-Offs

    Reuse the parent’s PID namespace
    ⇒ No need for mounting procfs, no --security-opt systempaths=unconfined
    • BuildKit: buildkitd --oci-worker-no-process-sandbox
    • Buildah: buildah build --isolation chroot (also reuses other namespaces)
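The trade-off can be made concrete by comparing the two ways of starting the rootless BuildKit container. The sketch below only prints the commands and does not run Docker. It is abbreviated from the BuildKit slide (no volume mount), the function name and its argument values are made up, and it assumes that --oci-worker-no-process-sandbox is passed to buildkitd as an argument after the image name, as in the BuildKit rootless documentation.

```sh
#!/bin/sh
# Print the `docker run` command for a rootless buildkitd container.
# $1 = "sandbox"    -> build steps get their own PID namespace
# $1 = "no-sandbox" -> build steps reuse the container's PID namespace
buildkitd_command() {
    bk_mode=${1:-sandbox}
    set -- docker run --name buildkitd -d \
        --security-opt seccomp=unconfined --security-opt apparmor=unconfined
    if [ "$bk_mode" = sandbox ]; then
        # A new PID namespace needs a fresh procfs mount, hence unmasked /proc.
        set -- "$@" --security-opt systempaths=unconfined moby/buildkit:rootless
    else
        # No new procfs mount, so the default /proc masking can stay in place.
        set -- "$@" moby/buildkit:rootless --oci-worker-no-process-sandbox
    fi
    echo "$*"
}

buildkitd_command sandbox
buildkitd_command no-sandbox
```

The second variant keeps the container's /proc masking, at the price of build steps being able to see the other processes in the container.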
  21. Buildah in GitLab CI

    Runner config.toml:

        [runners.docker]
          # ...
          security_opt = ["apparmor:unconfined", "seccomp:unconfined"]

    .gitlab-ci.yml:

        build_image:
          image: quay.io/buildah/stable
          before_script:
            - sed -i 's/^mount_program =/#&/' /etc/containers/storage.conf
            - sed -i 's/^mountopt =/#&/' /etc/containers/storage.conf
          script:
            - buildah build --isolation chroot -t demo:latest
  24. Notable Mentions (1)

    • img (https://github.com/genuinetools/img)
        • Based on BuildKit
        • Early showcase
        • Active development from 2017 to 2021
    • orca-build (https://github.com/cyphar/orca-build)
        • Not https://orca-build.io/
        • Based on umoci and runc
        • More of a tech demo
        • Active development from 2017 to 2018
    • Stacker (https://github.com/project-stacker/stacker)
        • "Declarative" alternative to Dockerfiles
        • Based on LXC and user namespaces
        • Under active development
  26. Notable Mentions (2)

    • apko / melange (https://github.com/chainguard-dev)
        • Declarative, reproducible image build
        • Delegate non-declarative parts through basic package management
        • No support for unprivileged builds: https://github.com/chainguard-dev/melange/issues/285
    • Kaniko (https://github.com/GoogleContainerTools/kaniko)
        • Well-established
        • No isolation to build container
        • Weird bugs (https://github.com/GoogleContainerTools/kaniko/issues/1921, https://github.com/GoogleContainerTools/kaniko/issues/2136)