CoreOS Fest 2017 - Containers from Scratch

Eric Chiang
May 31, 2017

Transcript

  1. Containers from Scratch
    Eric Chiang, Senior Engineer, CoreOS
    twitter.com/erchiang, github.com/ericchiang

  2. None
  3. None
  4. None
  5. Today’s agenda
    Build a container without using a container runtime (docker, rkt, etc.)
  6. Step 1: The container image
    What are you shipping around? TL;DR - It’s a tarball. Images contain:
    • App metadata (how to run your app)
    • Filesystem (your app + an operating system?)
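
As a rough illustration of the "it's a tarball" point, the sketch below packs a metadata file and a filesystem tree into one tar archive and lists the result. The directory layout, the `config.json` file and its contents are invented for this example and do not follow any real image specification.

```sh
# Build a toy "image": a filesystem tree plus a metadata file, packed with tar.
# All names here (demo-image, config.json, hello.sh) are hypothetical.
workdir=$(mktemp -d)
mkdir -p "$workdir/demo-image/rootfs/bin"

# Filesystem: a stand-in for "your app".
printf '#!/bin/sh\necho hello\n' > "$workdir/demo-image/rootfs/bin/hello.sh"
chmod +x "$workdir/demo-image/rootfs/bin/hello.sh"

# App metadata: how to run the app (made-up format).
printf '{"cmd": ["/bin/hello.sh"]}\n' > "$workdir/demo-image/config.json"

# Ship it around as a single tarball, then inspect what is inside.
tar -C "$workdir" -cf "$workdir/demo-image.tar" demo-image
image_listing=$(tar -tf "$workdir/demo-image.tar")
echo "$image_listing"
```

Real image formats add layering and richer metadata on top, but the basic shape (metadata plus an archived filesystem) is the same idea.
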
  7. Step 1: The container image
    Container filesystem: something that looks like an OS. No kernel, no init system.
  8. $ mkdir rootfs
    $ sudo dnf -y \
        --installroot=$PWD/rootfs \
        --releasever=24 install \
        @development-tools \
        procps-ng \
        python3 \
        which \
        iproute \
        net-tools
    $ ls rootfs
  9. Step 2: chroot
    Next step is to execute a process in our filesystem. chroot(2)
  10. $ sudo chroot rootfs

  11. Step 3: namespaces
    The chroots of other systems.
    • Process trees.
    • Network interfaces.
    • Mounted volumes.
    clone(2) and unshare(2)
  12. $ sudo unshare -p -f \
        --mount-proc=$PWD/rootfs/proc \
        chroot rootfs /bin/bash
  13. Step 4: entering namespaces
    Namespaces are composable. Kubernetes pod:
    • Multiple processes with different chroots.
    • Same network and mount namespace.
    setns(2)
  14. # PID=1234
    # ls /proc/$PID/ns
    cgroup ipc mnt net pid user uts
  15. # PID=1234
    # ls /proc/$PID/ns
    cgroup ipc mnt net pid user uts
    # nsenter \
        --pid=/proc/$PID/ns/pid \
        --mount=/proc/$PID/ns/mnt \
        chroot $PWD/rootfs /bin/bash
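
One way to check whether two processes ended up in the same namespace is to compare the symlink targets under `/proc/PID/ns`. The sketch below assumes a Linux host with `/proc` mounted; the helper names `ns_id` and `same_ns` are hypothetical and not part of any tool used in the talk.

```sh
# Print the namespace identifier of a process, e.g. "pid:[4026531836]".
# Usage: ns_id PID TYPE   (TYPE is one of the entries under /proc/PID/ns)
ns_id() {
    readlink "/proc/$1/ns/$2"
}

# Succeed if two processes are in the same namespace of the given type.
# Returns 2 if either namespace link cannot be read.
# Usage: same_ns PID_A PID_B TYPE
same_ns() {
    a=$(ns_id "$1" "$3") || return 2
    b=$(ns_id "$2" "$3") || return 2
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# "self" resolves to the readlink child process, which inherits the shell's
# namespaces, so on a normal Linux host this is expected to report "shared".
if same_ns "$$" self pid; then
    echo "shared"
else
    echo "different or unreadable"
fi
```

After the `nsenter` command above, running `same_ns $$ 1234 mnt` from the new shell would be one way to confirm the mount namespace was joined (1234 being the placeholder PID from the slide). Reading another user's `/proc/PID/ns` entries usually requires root.
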
  16. Step 5: volume mounts
    Let’s inject files into our chroot. How do docker’s -v flag and Kubernetes host mounts work?
  17. # nsenter --mount=/proc/$PID/ns/mnt \
        mount --bind -o ro \
        $PWD/readonlyfiles \
        $PWD/rootfs/var/readonlyfiles
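
To see whether a bind mount like the one above is really read-only, one option is to look it up in the mount table. This is a minimal sketch; `mount_opts` and `is_readonly` are hypothetical helper names, and the optional second argument (an alternative mounts file) exists only so the functions can be tried without root.

```sh
# Print the mount options for a mount point, read from a mounts table.
# Usage: mount_opts MOUNTPOINT [MOUNTS_FILE]
# Each line of /proc/self/mounts is: device mountpoint fstype options dump pass
# If the mount point appears more than once, the last (topmost) entry wins.
mount_opts() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' \
        "${2:-/proc/self/mounts}"
}

# Succeed if the mount point is mounted read-only ("ro" is among its options).
# Usage: is_readonly MOUNTPOINT [MOUNTS_FILE]
is_readonly() {
    case ",$(mount_opts "$@")," in
        *,ro,*) return 0 ;;
        *) return 1 ;;
    esac
}
```

From a shell inside the target mount namespace, `is_readonly "$PWD/rootfs/var/readonlyfiles"` would be expected to succeed once the bind mount is in place. The path to pass depends on how the mount point appears to the calling process.
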
  18. Step 6: cgroups
    cgroups: resource restrictions for processes.

  19. # ls /sys/fs/cgroup/
    # mkdir /sys/fs/cgroup/memory/demo
    # echo $$ > /sys/fs/cgroup/memory/demo/tasks
    # cat /proc/self/cgroup
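
The output of `cat /proc/self/cgroup` lists one line per hierarchy in the form `hierarchy-ID:controller-list:path`. As a small sketch of how to pull out the path for a single cgroup v1 controller, the hypothetical helper `cgroup_path` below parses that format; the optional second argument is an assumption added so it can be run against a sample file.

```sh
# Print the cgroup path of a process for one cgroup v1 controller.
# Usage: cgroup_path CONTROLLER [CGROUP_FILE]
# Each line of /proc/PID/cgroup is: hierarchy-ID:controller-list:path
# The controller list may name several controllers, separated by commas.
cgroup_path() {
    awk -F: -v ctrl="$1" '
        {
            n = split($2, names, ",")
            for (i = 1; i <= n; i++)
                if (names[i] == ctrl) {
                    # Strip the first two fields; keep the rest as the path.
                    sub(/^[^:]*:[^:]*:/, "")
                    print
                    exit
                }
        }' "${2:-/proc/self/cgroup}"
}
```

In a shell that has been moved into the `demo` memory cgroup, `cgroup_path memory` would be expected to print `/demo`, since child processes inherit the shell's cgroup.
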
  20. # CGROUP=/sys/fs/cgroup/memory/demo/
    # echo "100000000" > $CGROUP/memory.limit_in_bytes
    # echo "0" > $CGROUP/memory.swappiness
    # python3 hungry.py
  21. Step 7: cgroup namespaces
    Q: How do you restrict a process from re-assigning its own cgroup?
    A: More namespaces!
  22. $ sudo unshare -C -p -f \
        --mount-proc=rootfs/proc \
        chroot rootfs /bin/bash
    # cat /proc/self/cgroup
    # mkdir -p /sys/fs/cgroup
    # mount -t tmpfs cgroup_root /sys/fs/cgroup
    # mkdir -p /sys/fs/cgroup/memory
    # mount -t cgroup memory -omemory \
        /sys/fs/cgroup/memory
  23. # echo "How to remove a cgroup"
    # echo "Reassign each task, remove the dir"
    # echo $$ > /sys/fs/cgroup/memory/tasks
    # rmdir /sys/fs/cgroup/memory/demo
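
The slide moves only the current shell out of the cgroup. When a cgroup holds several tasks, the same "reassign each task, remove the dir" recipe can be written as a loop. The function name `remove_cgroup` is hypothetical, and this sketch assumes a cgroup v1 hierarchy where the parent directory has a writable `tasks` file; on a real host it needs root.

```sh
# Move every task out of a cgroup v1 directory into its parent, then remove it.
# Usage: remove_cgroup /sys/fs/cgroup/memory/demo
remove_cgroup() {
    cg=${1%/}
    parent=$(dirname "$cg")
    [ -d "$cg" ] || { echo "no such cgroup: $cg" >&2; return 1; }

    # Each write to the parent's tasks file moves exactly one task.
    # A task may exit before it is moved, so a failed write is only reported.
    while read -r task; do
        echo "$task" > "$parent/tasks" 2>/dev/null ||
            echo "could not move task $task" >&2
    done < "$cg/tasks"

    # rmdir fails if any task is still attached to the cgroup.
    rmdir "$cg"
}
```
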
  24. Step 8: capabilities
    “I have a co-worker who said: ‘Docker is about running random code downloaded from the Internet and running it as root.’” - Dan Walsh (Red Hat)
  25. Step 8: capabilities
    This section probably should have covered:
    - SELinux
    - seccomp
    - AppArmor
    Those are hard to demo, so we’ll be covering capabilities.
  26. $ go build -o /tmp/listen listen.go
    $ sudo setcap cap_net_bind_service=+ep \
        /tmp/listen
    $ getcap /tmp/listen
  27. $ sudo capsh --print
    $ sudo capsh --drop=cap_chown --
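
Besides `capsh --print`, the kernel exposes a process's capability sets as hex bitmasks in `/proc/PID/status` (for example the `CapEff` line). The sketch below tests a single bit in such a mask. The helper names `has_cap` and `effective_caps` are hypothetical; the bit numbers for `CAP_CHOWN` (0) and `CAP_NET_BIND_SERVICE` (10) come from `linux/capability.h`.

```sh
# Capability numbers from linux/capability.h.
CAP_CHOWN=0
CAP_NET_BIND_SERVICE=10

# Succeed if bit CAP_NUMBER is set in a hex capability mask such as the
# CapEff value in /proc/PID/status (given without a 0x prefix).
# Usage: has_cap HEX_MASK CAP_NUMBER
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Print the effective capability mask (hex, no 0x prefix) of a process.
# Usage: effective_caps [PID]
effective_caps() {
    awk '$1 == "CapEff:" { print $2 }' "/proc/${1:-self}/status"
}
```

In the shell started by `capsh --drop=cap_chown --`, a check such as `has_cap "$(effective_caps)" "$CAP_CHOWN"` would be expected to fail, while the other capability bits stay set.
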

  28. Step 9: network namespaces

  29. $ sudo unshare -n chroot rootfs
    # ip addr
    # ip link set dev lo up
  30. ip link add veth0 type veth peer name veth1
    ip link set veth1 netns $PID
    ifconfig veth0 10.1.1.2/24 up
    # (inside namespace)
    ifconfig veth1 10.1.1.1/24 up
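
The two ends of the veth pair can talk to each other directly because both addresses sit in the same /24 network. As a small worked check of that arithmetic, the sketch below compares the network portions of two CIDR addresses. The helper names are hypothetical, and the comparison uses each address's own prefix length.

```sh
# Convert a dotted-quad IPv4 address to an integer.
# Usage: ip_to_int 10.1.1.1
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print the network address (as an integer) of ADDRESS/PREFIX.
# Usage: network_of 10.1.1.2/24
network_of() {
    addr=${1%/*}
    prefix=${1#*/}
    mask=$(( (0xffffffff << (32 - $prefix)) & 0xffffffff ))
    echo $(( $(ip_to_int "$addr") & $mask ))
}

# Succeed if two CIDR addresses fall in the same network.
# Usage: same_subnet 10.1.1.2/24 10.1.1.1/24
same_subnet() {
    [ "$(network_of "$1")" = "$(network_of "$2")" ]
}
```

With the addresses from the slide, `same_subnet 10.1.1.2/24 10.1.1.1/24` succeeds, so no extra route is needed between `veth0` and `veth1`.
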
  31. Step 10: user namespaces
    Mapping of UIDs/GIDs from the host to the container. Container thinks it’s root when it’s not.
  32. $ unshare --map-root-user
    # cat /proc/self/uid_map
    # capsh --print
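
Each line of `/proc/PID/uid_map` has three fields: the first UID inside the namespace, the first UID outside it, and the length of the mapped range. The sketch below applies that mapping to a single UID. The helper name `map_uid` is hypothetical, and the optional file argument is an assumption added so the function can be tried on sample data.

```sh
# Translate a UID as seen inside a user namespace to the UID outside it.
# Exits non-zero if the UID is not covered by any mapping line.
# Usage: map_uid INSIDE_UID [UID_MAP_FILE]
# Each line of /proc/PID/uid_map is: inside-start outside-start length
map_uid() {
    awk -v uid="$1" '
        uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); found = 1; exit }
        END { if (!found) exit 1 }' "${2:-/proc/self/uid_map}"
}
```

Inside the shell started by `unshare --map-root-user`, the map is expected to hold a single line that maps UID 0 to the invoking user's UID, so `map_uid 0` would print that host UID.
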

  33. Step 10: user namespaces
    Still need a lot of permissions on the host.
    • Unpacking images (device files).
    • Dealing with cgroups.
  34. Conclusion

  35. Conclusion
    “Containers” are a bunch of technologies provided by the Linux kernel. Container runtimes are opinionated wrappers around these technologies.
  36. Links
    • Namespaces in operation, Michael Kerrisk: https://lwn.net/Articles/531114
    • Building minimal containers, Brian Redbeard: https://github.com/brianredbeard/minimal_containers
    • cgroups V1, Paul Menage: https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
    • Getting Towards Real Sandbox Containers, Jessie Frazelle: https://blog.jessfraz.com/post/getting-towards-real-sandbox-containers/
    (Also lots of Linux man pages)
  37. eric.chiang@coreos.com, twitter.com/erchiang, github.com/ericchiang
    QUESTIONS? Thanks!
    We’re hiring for my team! coreos.com/careers
    LONGER CHAT? Let’s talk! More events: coreos.com/community