
Beyond Kaniko: Navigating Unprivileged Container Image Creation

Most container images today are built in one of two ways: either in privileged containers, or with a highly specialized tool such as Kaniko. Privileged builds are common, but they carry a well-known risk and leave the build environment inherently insecure. Kaniko has its own set of challenges, and its recent discontinuation by Google has led many teams to look for viable alternatives.

This naturally raises the question: why can't we just use any regular container build tool for unprivileged operations? The answer comes down to the complexities of "running containers within containers", a core technical challenge that has historically presented various roadblocks.

This talk will explore the current state of unprivileged container image builds. We'll delve into the underlying technical challenges that have historically constrained these efforts, and how continuous advancements over recent years are shaping what's possible. This evolution creates opportunities for more secure and efficient build pipelines.

We'll examine the different approaches available today, assessing their capabilities and limitations. You'll learn how these modern approaches enhance the security of your build pipeline, increase reliability by reducing reliance on elevated privileges, and simplify your overall build processes. By the end of this session, you'll have a clear overview of the landscape, enabling you to make informed decisions and adopt the right build solution for your environment, achieving stronger security and streamlined operations.

Felix Dreissig

July 02, 2025

Transcript

  1. Beyond Kaniko: Navigating Unprivileged Container Image Creation

    Felix Dreissig
    2nd July 2025
    Kubernetes, Cloud Native & Platform Engineering Meetup Munich
  2. State of the Kaniko

    "Chainguard is going to keep this fork updated, patched, and maintained. We do not plan any major feature work, but bug fixes and other minor contributions are welcome! We don’t plan on publishing built release artifacts (container images, etc.) publicly, but they are available to Chainguard customers. You’re welcome to build these yourself from this repository if you are not a Chainguard customer."
    https://github.com/chainguard-dev/kaniko/blob/4ef13c4/README.md
    https://www.chainguard.dev/unchained/fork-yeah-were-bringing-kaniko-back
  3. Building Container Images (2)

    [Diagram: CI Server / CI Job / Image Build shown three times; the second variant adds a Container Runtime reached through /var/run/docker.sock, the third is marked privileged]
  5. The Problem

    Why can’t we just run docker build within a container?
    [Diagram: CI Server, CI Job, Image Build]
    • Jessie Frazelle: Building Container Images Securely on Kubernetes
      https://blog.jessfraz.com/post/building-container-images-securely-on-kubernetes/
    • Alban Crequy: Towards unprivileged container builds, ContainerDays 2018
      https://youtu.be/yarJuToHHxY, https://kinvolk.io/blog/2018/04/towards-unprivileged-container-builds/
    • Andrew Martin: Rootless, Reproducible & Hermetic: Secure Container Build Showdown, All Systems Go 2019
      https://media.ccc.de/v/ASG2019-146-rootless-reproducible-hermetic-secure-container-build-showdown
  8. Challenge 1: Elevated Privileges

    [Diagram: Initial User Namespace (Host) with an Unprivileged User; inside it a User Namespace with root, a Mount Namespace, a PID Namespace, …; CI Server, CI Job in User Namespace, Image Build]
    Enabled for regular users? CONFIG_USER_NS=y, sysctl kernel.unprivileged_userns_clone=1 (or similar)
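
The question on this slide ("Enabled for regular users?") can be probed from a shell. The following is a minimal sketch, not a definitive check: the function name `userns_status` and its optional directory argument are assumptions made for illustration, and it only looks at two sysctl files. `user/max_user_namespaces` is the upstream limit (Linux 4.9 and later), while `kernel/unprivileged_userns_clone` is the Debian/Ubuntu-style switch named on the slide. Other distributions may use different knobs (the slide's "or similar"), which this sketch does not cover.

```shell
# Report whether unprivileged user namespaces look usable.
# $1 (optional, hypothetical parameter): sysctl root, defaults to /proc/sys.
userns_status() {
    sysctl_root=${1:-/proc/sys}

    # Upstream limit; the file is missing on kernels without user namespaces
    # (or older than 4.9), in which case no verdict is given.
    max_file="$sysctl_root/user/max_user_namespaces"
    if [ ! -r "$max_file" ]; then
        echo "unknown"
        return 2
    fi
    if [ "$(cat "$max_file")" -eq 0 ]; then
        echo "disabled"
        return 1
    fi

    # Debian/Ubuntu-style switch; absent on many other distributions.
    clone_file="$sysctl_root/kernel/unprivileged_userns_clone"
    if [ -r "$clone_file" ] && [ "$(cat "$clone_file")" -eq 0 ]; then
        echo "disabled"
        return 1
    fi

    echo "enabled"
}
```

Calling `userns_status` without arguments inspects the running kernel; passing a directory is only meant for trying the logic against prepared files.
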
  11. Challenge 2: Mounting the Root File System

    user_namespaces manpage: "Holding CAP_SYS_ADMIN within the user namespace that owns a process's mount namespace allows that process to create bind mounts and mount the following types of filesystems:"
    • /proc (since Linux 3.8)
    • /sys (since Linux 3.8)
    • devpts (since Linux 3.9)
    • tmpfs(5) (since Linux 3.9)
    • ramfs (since Linux 3.9)
    • mqueue (since Linux 3.9)
    • bpf (since Linux 4.4)
    • overlayfs (since Linux 5.11)
    + FUSE (since Linux 4.18, distro-specific earlier)
    ✓
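
The version thresholds on this slide can be turned into a rough pre-flight check. This is a sketch under stated assumptions: the function names `kernel_at_least` and `rootfs_mount_option` are made up for illustration, and comparing release strings is only a heuristic, since distributions backport features (the slide's "distro-specific earlier").

```shell
# Succeed if a kernel release string is at least MAJOR.MINOR.
# Usage: kernel_at_least "6.1.0-23-amd64" 5 11
kernel_at_least() {
    kal_major=${1%%.*}               # text before the first dot
    kal_rest=${1#*.}                 # text after the first dot
    kal_minor=${kal_rest%%[!0-9]*}   # leading digits of the remainder
    [ "$kal_major" -gt "$2" ] ||
        { [ "$kal_major" -eq "$2" ] && [ "$kal_minor" -ge "$3" ]; }
}

# Print which root file system mount an unprivileged build could try,
# based only on the thresholds listed on the slide.
rootfs_mount_option() {
    if kernel_at_least "$1" 5 11; then
        echo "overlayfs"
    elif kernel_at_least "$1" 4 18; then
        echo "fuse"
    else
        echo "none"
    fi
}
```

A typical call would be `rootfs_mount_option "$(uname -r)"`.
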
  14. Challenge 3: procfs Masking

    proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
    proc on /proc/bus type proc (ro,nosuid,nodev,noexec,relatime)
    proc on /proc/fs type proc (ro,nosuid,nodev,noexec,relatime)
    proc on /proc/irq type proc (ro,nosuid,nodev,noexec,relatime)
    proc on /proc/sys type proc (ro,nosuid,nodev,noexec,relatime)
    proc on /proc/sysrq-trigger type proc (ro,nosuid,nodev,noexec,relatime)
    tmpfs on /proc/acpi type tmpfs (ro,relatime,uid=200000,gid=200000,inode64)
    udev on /proc/kcore type devtmpfs (rw,nosuid,relatime,size=8148648k,nr_inodes=2037162,mode=755,inode64)
    udev on /proc/keys type devtmpfs (rw,nosuid,relatime,size=8148648k,nr_inodes=2037162,mode=755,inode64)
    udev on /proc/timer_list type devtmpfs (rw,nosuid,relatime,size=8148648k,nr_inodes=2037162,mode=755,inode64)
    ⇒ Cannot mount new procfs (mount_too_revealing())
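
Whether a container is in the situation shown on this slide can be read from its mount table. The sketch below assumes the `mount` output format seen on the slide (`SOURCE on MOUNTPOINT type ...`); the function name `proc_overmounts` is hypothetical, and mount points containing whitespace are not handled.

```shell
# Read `mount` output on stdin and print every mount point below /proc.
# Any output means parts of procfs are covered by other mounts, which is
# the condition the slide associates with mount_too_revealing().
proc_overmounts() {
    awk '$2 == "on" && index($3, "/proc/") == 1 { print $3 }'
}
```

Inside a container, `mount | proc_overmounts` lists the masked paths; empty output suggests procfs is not masked.
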
  15. procfs Masking: Disable it?

    • Should be fine if we have user (and process) namespaces
    • Docker: --security-opt systempaths=unconfined
    • GitLab Docker Runner: Unsupported (security_opt won’t work, see https://gitlab.com/gitlab-org/gitlab-runner/-/issues/36810)
    • Podman: --security-opt "unmask=/proc/*"
    • Kubernetes: securityContext: procMount: Unmasked (behind ProcMountType flag)
    • GitLab Kubernetes Runner: proc_mount (undocumented, see https://gitlab.com/gitlab-org/gitlab-runner/-/merge_requests/3546)
  17. Other Potential Roadblocks

    • seccomp (syscall blocking)
      Docker / Podman: --security-opt seccomp=unconfined (or use custom profile)
    • AppArmor (Mandatory Access Control)
      Docker / Podman: --security-opt apparmor=unconfined (or use custom profile)
    • SELinux (Mandatory Access Control)
      Podman: --security-opt label=disable (or use custom labels)
    ✓
  18. A Word on Rootlessness

    "Rootless containers refers to the ability for an unprivileged user to create, run and otherwise manage containers. This term also includes the variety of tooling around containers that can also be run as an unprivileged user. [...] When we say Rootless Containers, it means running the entire container runtime as well as the containers without the root privileges."
    https://rootlesscontaine.rs/
  19. Showtime

    > docker info | grep -A 7 'Security Options:'
     Security Options:
      apparmor
      seccomp
       Profile: default
      userns
      cgroupns
     Kernel Version: 6.1.0-23-amd64
     Operating System: Debian GNU/Linux 12 (bookworm)
    https://docs.docker.com/engine/security/userns-remap/
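
The demo host lists `userns` among Docker's security options, which together with the linked userns-remap page suggests that user namespace remapping is enabled on the daemon. A script could check for the same precondition. This is only a sketch: the function name `docker_userns_enabled` is hypothetical, and it assumes the indented layout of `docker info` output in which options are indented deeper than the `Security Options:` heading.

```shell
# Read `docker info` output on stdin; succeed if "userns" is listed
# as an entry under "Security Options:".
docker_userns_enabled() {
    awk '
        function indent(s) { match(s, /^ */); return RLENGTH }
        # Remember how deep the heading is indented, then scan its entries.
        /Security Options:/ { depth = indent($0); in_opts = 1; next }
        # A line indented no deeper than the heading ends the list.
        in_opts && indent($0) <= depth { in_opts = 0 }
        in_opts && NF == 1 && $1 == "userns" { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

It would be used as `docker info | docker_userns_enabled && echo "userns-remap active"`.
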
  20. BuildKit

    https://github.com/moby/buildkit/blob/master/docs/rootless.md
    > ls
    Dockerfile
    > docker run \
        --name buildkitd -d \
        --security-opt systempaths=unconfined --security-opt seccomp=unconfined \
        --security-opt apparmor=unconfined -v "$(pwd):/build" \
        moby/buildkit:rootless
    > docker exec -it --workdir /build buildkitd sh
    $ buildctl build --frontend=dockerfile.v0 --local context=. --local dockerfile=. \
        --output type=docker,name=demo > demo.tar
  21. Buildah

    > docker run -it \
        --security-opt systempaths=unconfined \
        --security-opt seccomp=unconfined \
        --security-opt apparmor=unconfined \
        -v "$(pwd):/build" --workdir /build \
        quay.io/buildah/stable
    $ sed -i 's/^mount_program =/#&/' /etc/containers/storage.conf
    $ sed -i 's/^mountopt =/#&/' /etc/containers/storage.conf
    $ buildah build --isolation rootless -t demo:latest
    $ buildah push demo:latest docker-archive:demo.tar:demo:latest
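
The two `sed` calls on this slide comment out the `mount_program` and `mountopt` settings in `storage.conf`, presumably so that Buildah uses the kernel's own overlayfs rather than a helper program. Their effect is easiest to see on a small sample. In this sketch the function name `disable_overlay_mount_overrides` is hypothetical and the sample values are assumptions for illustration, not necessarily the contents of the real image's file.

```shell
# Comment out the mount_program and mountopt settings of a storage.conf
# read from stdin, using the same sed expressions as on the slide.
disable_overlay_mount_overrides() {
    sed -e 's/^mount_program =/#&/' -e 's/^mountopt =/#&/'
}

# Sample input; the two values are assumed, for illustration only.
printf '%s\n' \
    '[storage.options.overlay]' \
    'mount_program = "/usr/bin/fuse-overlayfs"' \
    'mountopt = "nodev,fsync=0"' |
    disable_overlay_mount_overrides
```

The section header passes through unchanged and the two settings come out prefixed with `#`. Lines that are already commented are left alone, because the patterns are anchored at the start of the line.
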
  22. Isolation Trade-Offs

    Reuse the parent’s PID namespace
    ⇒ No need for mounting procfs, no --security-opt systempaths=unconfined
    ⇒ Still more isolation than Kaniko
    • BuildKit: buildkitd --oci-worker-no-process-sandbox
    • Buildah: buildah build --isolation chroot (also reuses other namespaces)
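
The trade-off on this slide can be expressed as a small decision helper for Buildah: use full rootless isolation when procfs is not masked, and fall back to chroot isolation when it is. This is a heuristic sketch, not Buildah's own logic. The function name `pick_buildah_isolation` is hypothetical, and it assumes the usual `mount` output format (`SOURCE on MOUNTPOINT type ...`).

```shell
# Read `mount` output on stdin and print a Buildah isolation mode:
# "chroot" if anything is mounted below /proc (procfs is masked),
# "rootless" otherwise.
pick_buildah_isolation() {
    if awk '$2 == "on" && index($3, "/proc/") == 1 { hit = 1 }
            END { exit (hit ? 0 : 1) }'; then
        echo "chroot"
    else
        echo "rootless"
    fi
}
```

Inside the build container this could be used as `buildah build --isolation "$(mount | pick_buildah_isolation)" -t demo:latest`.
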
  23. Buildah in GitLab CI

    Runner config.toml:
      [runners.docker]
        # ...
        security_opt = ["apparmor:unconfined", "seccomp:unconfined"]
    .gitlab-ci.yml:
      build_image:
        image: quay.io/buildah/stable
        before_script:
          - sed -i 's/^mount_program =/#&/' /etc/containers/storage.conf
          - sed -i 's/^mountopt =/#&/' /etc/containers/storage.conf
        script:
          - buildah build --isolation chroot -t demo:latest
  25. Notable Mentions (1)

    • img (https://github.com/genuinetools/img)
      • Based on BuildKit
      • Early showcase
      • Active development from 2017 to 2021
    • orca-build (https://github.com/cyphar/orca-build)
      • Not https://orca-build.io/
      • Based on umoci and runc
      • More of a tech demo
      • Active development from 2017 to 2018
  27. Notable Mentions (2)

    • apko / melange (https://github.com/chainguard-dev)
      • Declarative, reproducible image build
      • Delegate non-declarative parts through basic package management
      • No support for unprivileged builds: https://github.com/chainguard-dev/melange/issues/285
    • Stacker (https://github.com/project-stacker/stacker)
      • "Declarative" alternative to Dockerfiles
      • Based on LXC and user namespaces
      • Under active development