CoreOS Fest 2017 - Containers from Scratch

Eric Chiang
May 31, 2017

Transcript

  1. Step 1: The container image

    What are you shipping around? TL;DR - It’s a tarball. Images contain:
    • App metadata (how to run your app)
    • Filesystem (your app + an operating system?)
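
To make the "it's a tarball" point concrete, here is a minimal sketch that packs a toy image by hand. The `metadata` file, its `entrypoint=` format, and the `image.tar.gz` name are invented for illustration and do not follow any real image specification.

```shell
# Hypothetical layout: one metadata file plus a root filesystem directory.
img_dir=$(mktemp -d)
mkdir -p "$img_dir/rootfs/bin"

# The filesystem part: here just a single tiny "app".
printf '#!/bin/sh\necho hello from the app\n' > "$img_dir/rootfs/bin/app"
chmod +x "$img_dir/rootfs/bin/app"

# The app metadata part: how to run the app (format made up for this sketch).
printf 'entrypoint=/bin/app\n' > "$img_dir/metadata"

# The "image" is these files packed into a tarball.
tar -C "$img_dir" -czf "$img_dir/image.tar.gz" metadata rootfs
image_listing=$(tar -tzf "$img_dir/image.tar.gz")
echo "$image_listing"
```

Unpacking the archive elsewhere gives back both parts: something to chroot into and a note on what to run inside it.
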
  2. $ mkdir rootfs
    $ sudo dnf -y \
        --installroot=$PWD/rootfs \
        --releasever=24 install \
        @development-tools \
        procps-ng \
        python3 \
        which \
        iproute \
        net-tools
    $ ls rootfs
  3. Step 3: namespaces

    The chroots of other systems.
    • Process trees.
    • Network interfaces.
    • Mounted volumes.
    clone(2) and unshare(2)
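
Before creating new namespaces, it can help to see which ones a process already lives in. The sketch below, which assumes a Linux host with /proc mounted and needs no root, lists the namespace links of the current shell; the variable name `shell_pid_ns` is only for illustration.

```shell
# Each entry under /proc/PID/ns is a symlink such as "pid:[4026531836]".
# Processes that share a namespace report the same inode number.
for ns in /proc/$$/ns/*; do
    printf '%s -> %s\n' "${ns##*/}" "$(readlink "$ns")"
done

shell_pid_ns=$(readlink "/proc/$$/ns/pid")
echo "this shell's PID namespace: $shell_pid_ns"
```

A process started in a new namespace, for example through clone(2) or unshare(2), would report a different inode number for that namespace type.
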
  4. Step 4: entering namespaces

    Namespaces are composable. Kubernetes pod:
    • Multiple processes with different chroots.
    • Same network and mount namespace.
    setns(2)
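
Because namespaces are composable, two processes can share some namespaces and not others. As a rough sketch, the hypothetical helper `same_namespace` below compares the /proc links of two PIDs; the background `sleep` is just a stand-in for a second process in the same "pod".

```shell
# same_namespace PID_A PID_B TYPE
# Succeeds if both PIDs are in the same namespace of the given type.
same_namespace() {
    a=$(readlink "/proc/$1/ns/$3") || return 1
    b=$(readlink "/proc/$2/ns/$3") || return 1
    [ "$a" = "$b" ]
}

sleep 30 &          # stand-in for a second process
peer=$!
for type in net mnt pid; do
    if same_namespace "$$" "$peer" "$type"; then
        echo "$type: shared with PID $peer"
    else
        echo "$type: separate from PID $peer"
    fi
done
kill "$peer"
```

A child started this way inherits every namespace from the shell, so all three lines should report "shared". A process that joined only some of them through setns(2) would show a mix.
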
  5. # PID=1234
    # ls /proc/$PID/ns
    cgroup ipc mnt net pid user uts
    # nsenter \
        --pid=/proc/$PID/ns/pid \
        --mount=/proc/$PID/ns/mnt \
        chroot $PWD/rootfs /bin/bash
  6. Step 5: volume mounts

    Let’s inject files into our chroot. How do docker’s -v flag and Kubernetes host mounts work?
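
The transcript leaves this question open. One common way to inject host files into a chroot on Linux is a bind mount, so the sketch below assumes that approach. The function name `bind_into_rootfs` and the example paths are hypothetical, and because `mount` needs root the function is only defined here, not called.

```shell
# bind_into_rootfs SRC ROOTFS DST
# Makes the host directory SRC visible at DST inside the chroot at ROOTFS.
bind_into_rootfs() {
    src=$1
    rootfs=$2
    dst=$3
    mkdir -p "$rootfs$dst"              # the mount point must exist first
    mount --bind "$src" "$rootfs$dst"   # requires root
}

# Example (as root):
#   bind_into_rootfs "$PWD/shared" "$PWD/rootfs" /var/shared
```

After the bind mount, a process chrooted into the rootfs sees the host directory at the destination path.
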
  7. # ls /sys/fs/cgroup/
    # mkdir /sys/fs/cgroup/memory/demo
    # echo $$ > /sys/fs/cgroup/memory/demo/tasks
    # cat /proc/self/cgroup
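
The last command on this slide prints one line per cgroup hierarchy in the form `hierarchy-ID:controller-list:path`. As a small sketch, the hypothetical helper `describe_cgroups` below splits those fields so the membership is easier to read. It needs no root and only reads /proc.

```shell
# describe_cgroups: read /proc/PID/cgroup lines on stdin and label the fields.
describe_cgroups() {
    while IFS=: read -r id controllers path; do
        # On cgroup v2 the controller list is empty ("0::/path").
        printf 'hierarchy=%s controllers=%s path=%s\n' \
            "$id" "${controllers:-none}" "$path"
    done
}

describe_cgroups < /proc/self/cgroup
```

After writing the shell's PID into the demo cgroup as on the slide, the memory line would be expected to show a path ending in `/demo`.
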
  8. Step 7: cgroup namespaces

    Q: How do you restrict a process from re-assigning its own cgroup?
    A: More namespaces!
  9. $ sudo unshare -C -p -f \
        --mount-proc=rootfs/proc \
        chroot rootfs /bin/bash
    cat /proc/self/cgroup
    mkdir -p /sys/fs/cgroup
    mount -t tmpfs cgroup_root /sys/fs/cgroup
    mkdir -p /sys/fs/cgroup/memory
    mount -t cgroup memory -omemory \
        /sys/fs/cgroup/memory
  10. # echo "How to remove a cgroup"
    # echo "Reassign each task, remove the dir"
    # echo $$ > /sys/fs/cgroup/memory/tasks
    # rmdir /sys/fs/cgroup/memory/demo
  11. Step 8: capabilities

    “I have a co-worker who said: ‘Docker is about running random code downloaded from the Internet and running it as root.’” - Dan Walsh (Red Hat)
  12. Step 8: capabilities

    This section probably should have covered:
    - SELinux
    - seccomp
    - AppArmor
    Those are hard to demo, so we’ll be covering capabilities.
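
Capabilities split root's powers into individual bits, and the kernel exposes a process's effective set as a hex bitmask in /proc/PID/status. The sketch below reads that mask for the current shell and tests one bit. The helper name `has_cap` is hypothetical, and bit 21 is CAP_SYS_ADMIN as defined in linux/capability.h.

```shell
# CapEff is a hex bitmask of the capabilities currently in effect.
cap_eff=$(awk '/^CapEff:/ {print $2}' "/proc/$$/status")
echo "effective capabilities: 0x$cap_eff"

# has_cap BIT: succeeds if that capability bit is set in cap_eff.
has_cap() {
    [ $(( (0x$cap_eff >> $1) & 1 )) -eq 1 ]
}

# Bit 21 is CAP_SYS_ADMIN.
if has_cap 21; then
    echo "CAP_SYS_ADMIN is in the effective set"
else
    echo "CAP_SYS_ADMIN is not in the effective set"
fi
```

An unprivileged shell typically reports an all-zero mask, while a root shell reports most or all bits set. A container runtime would usually drop some of those bits before starting the app.
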
  13. ip link add veth0 type veth peer name veth1
    ip link set veth1 netns $PID
    ifconfig veth0 10.1.1.2/24 up
    # (inside namespace)
    ifconfig veth1 10.1.1.1/24 up
  14. Step 10: user namespaces

    Mapping of UIDs/GIDs from the host to the container. Container thinks it’s root when it’s not.
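
The UID mapping of a process can be read from /proc/PID/uid_map, where each line holds three numbers: the first ID inside the namespace, the first ID outside it, and the length of the range. As a sketch, the hypothetical helper `describe_id_map` below turns those lines into readable ranges.

```shell
# describe_id_map: read uid_map or gid_map lines on stdin and print the ranges.
describe_id_map() {
    while read -r inside outside count; do
        printf 'IDs %s..%s in this namespace map to %s..%s outside\n' \
            "$inside" "$((inside + count - 1))" \
            "$outside" "$((outside + count - 1))"
    done
}

describe_id_map < /proc/self/uid_map
```

On a host without user namespaces in play the map is usually the identity over the whole ID range. A line such as `0 1000 1` would instead mean that UID 0 inside the container is really UID 1000 outside it.
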
  15. Step 10: user namespaces

    Still need a lot of permissions on the host.
    • Unpacking images (device files).
    • Dealing with cgroups.
  16. Conclusion

    “Containers” are a bunch of technologies provided by the Linux kernel. Container runtimes are opinionated wrappers around these technologies.
  17. Links

    Namespaces in operation, Michael Kerrisk https://lwn.net/Articles/531114
    Building minimal containers, Brian Redbeard https://github.com/brianredbeard/minimal_containers
    cgroups V1, Paul Menage https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
    Getting Towards Real Sandbox Containers, Jessie Frazelle https://blog.jessfraz.com/post/getting-towards-real-sandbox-containers/
    (Also lots of Linux man pages)
  18. [email protected] twitter.com/erchiang github.com/ericchiang

    QUESTIONS? Thanks!
    We’re hiring for my team! coreos.com/careers
    Let’s talk! More events: coreos.com/community
    LONGER CHAT?