Linux Container Primitives: cgroups, namespaces, and more! (LinuxFest Northwest 2020)

In this session, we’ll explore the Linux primitives that underlie container runtimes like Docker, including cgroups, namespaces, and union filesystems. We’ll see how Docker uses these primitives, and how the OCI standard makes it possible to customize how your containers run. We’ll also discuss alternative container runtimes like CRI-O, rkt, and systemd-nspawn and what makes them different. This will be an interactive session with a live demo and open questions.

This session is a repeat of the session from last year.

Samuel Karp

May 08, 2020
Transcript

  1. © 2019, Amazon Web Services, Inc. or its Affiliates. All

    rights reserved. Samuel Karp, Amazon Web Services – @samuelkarp LinuxFest Northwest 2020 – Online Linux Container Primitives: cgroups, namespaces, and more!
  2. LinuxFest Northwest – Online!
    Q&A at https://discuss.lfnw.org
    Also available on Twitter – @samuelkarp
  3. Agenda
    • Container primitives overview
    • Control groups (cgroups)
    • Namespaces
    • Union filesystems
    • Runtimes
  4. Agenda
    • Container primitives overview
    • Control groups (cgroups)
    • Namespaces
    • Union filesystems
    • Capabilities
    • Runtimes
  5. Linux container primitives
  6. Containers are an abstraction over several different Linux technologies
  7. [Diagram labels: Linux Kernel; Container runtime; Containers 1 through 6; Namespaces; Control groups; Union filesystem]
  8. Control groups
  9. What do control groups (cgroups) do?
    • Organize all processes in the system
    • Account for resource usage and gather utilization data
    • Limit or prioritize resource utilization
  10. Subsystems
    • Control group system is an abstract framework
    • Subsystems are concrete implementations
    • Different subsystems can organize processes separately
    • Most subsystems are resource controllers
    Examples of subsystems:
    • Memory
    • CPU time
    • Block I/O
    • Number of discrete processes (pids)
    • CPU & memory pinning
    • Freezer (used by docker pause)
    • Devices
    • Network priority
  11. Hierarchical representation
    • Independent subsystem hierarchies
    • Every pid is represented exactly once in each subsystem
    • New processes inherit cgroups from their parents
    ├── blkio
    │   └── docker
    │       └── b211c37
    ├── cpu,cpuacct
    │   └── docker
    │       └── b211c37
    ├── cpuset
    │   └── docker
    │       └── b211c37
    ├── devices
    │   └── docker
    │       └── b211c37
    ├── freezer
    │   └── docker
    │       └── b211c37
    ├── hugetlb
    │   └── docker
    │       └── b211c37
    ├── memory
    │   └── docker
    │       └── b211c37
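
The per-subsystem membership described on this slide can also be read from /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. The following is a minimal sketch, assuming a cgroup v1 host; the hierarchy IDs in SAMPLE are made up for illustration, and the container ID is the shortened one from the slide.

```python
from typing import Dict


def parse_proc_cgroup(text: str) -> Dict[str, str]:
    """Map each controller name to the cgroup path a process belongs to."""
    membership: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "hierarchy-ID:controller-list:cgroup-path".
        _hierarchy_id, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            # An empty controller list denotes the cgroup v2 unified hierarchy.
            membership[controller or "(unified)"] = path
    return membership


# Hypothetical sample; on a real host, read /proc/self/cgroup instead.
SAMPLE = """\
7:memory:/docker/b211c37
4:cpu,cpuacct:/docker/b211c37
2:freezer:/docker/b211c37
"""

if __name__ == "__main__":
    for controller, path in sorted(parse_proc_cgroup(SAMPLE).items()):
        print(f"{controller:10} {path}")
```

Because each subsystem hierarchy is independent, the same pid could in principle show a different path on every line.
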
  12. cgroup virtual filesystem
    • Typically mounted at /sys/fs/cgroup
    • tasks virtual file holds all pids in the cgroup
    • Other files have settings and utilization data
    ├── cgroup.clone_children
    ├── cgroup.procs
    ├── cgroup.sane_behavior
    ├── cpuacct.stat
    ├── cpuacct.usage
    ├── cpuacct.usage_all
    ├── cpuacct.usage_percpu
    ├── cpuacct.usage_percpu_sys
    ├── cpuacct.usage_percpu_user
    ├── cpuacct.usage_sys
    ├── cpuacct.usage_user
    ├── cpu.cfs_period_us
    ├── cpu.cfs_quota_us
    ├── cpu.rt_period_us
    ├── cpu.rt_runtime_us
    ├── cpu.shares
    ├── cpu.stat
    ├── notify_on_release
    ├── release_agent
    └── tasks
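
Since the settings are plain files, they can be read with ordinary file I/O. As a rough sketch (assuming the cgroup v1 cpu controller files listed on this slide), the CPU limit of a cgroup is the ratio of cpu.cfs_quota_us to cpu.cfs_period_us, and a quota of -1 means no limit. The helper names here are illustrative, not part of any library.

```python
from pathlib import Path
from typing import Optional


def read_int(cgroup_dir: Path, name: str) -> int:
    """Read a single-integer cgroup settings file."""
    return int((cgroup_dir / name).read_text().strip())


def cpu_limit(cgroup_dir: Path) -> Optional[float]:
    """Return the CFS bandwidth limit in CPUs, or None when unlimited."""
    quota = read_int(cgroup_dir, "cpu.cfs_quota_us")
    period = read_int(cgroup_dir, "cpu.cfs_period_us")
    if quota < 0:  # -1 means the cgroup has no bandwidth limit
        return None
    return quota / period
```

On a host like the one in the slides, the directory passed in would be something like /sys/fs/cgroup/cpu,cpuacct/docker/<container id>; that exact location is an assumption and varies by distribution and runtime.
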
  13. Demo
  14. What can you use cgroups for?
    • cgroups can be used independently of containers
    • cgroups control resource limits for processes
    • Monitor processes and organize them
    • Be careful not to break any assumptions your container runtime or orchestrator might have
  15. Further reading
    • Linux: Documentation/cgroup-v1
  16. Namespaces
  17. What do namespaces do?
    • Isolation mechanism for resources
    • Changes to resources within namespace can be invisible outside the namespace
    • Resource mapping with permission changes
  18. What namespaces are available?
    • Network
    • Filesystem (mounts)
    • Processes (pid)
    • Inter-process communication (ipc)
    • Hostname and domain name (uts)
    • User and group IDs
    • cgroup
  19. Namespace sharing
    [Diagram labels: Process A, Process B, Process C, Process D; pid:[1], pid:[2], pid:[3]; net:[4], net:[5], net:[6]; mount:[7], mount:[8]]
  20. Network namespace
    • Frequently used in containers
    • veth devices can connect different namespaces
    • docker run uses a separate network namespace per container
    • Multiple containers can share a network namespace
      • Kubernetes pods
      • Amazon ECS tasks with the awsvpc networking mode
  21. Mount namespace
    • Used for giving containers their own filesystem
    • Container image is mounted as the root filesystem
    • “Volumes” to share data between containers or the host
    • More about filesystems to come!
    bash-4.2# mount
    overlay on / type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/Q5EBZ7CIJYELLG2MBKZIRRFWW6:/var/lib/docker/overlay2/l/PKATP76T57BQZ5D44JXYFIB26E,upperdir=/var/lib/docker/overlay2/88816f9510a9ff38b31eaaceccbef6ffc9cc3c06bcc451f9684850db5ee1b152/diff,workdir=/var/lib/docker/overlay2/88816f9510a9ff38b31eaaceccbef6ffc9cc3c06bcc451f9684850db5ee1b152/work)
    proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
    tmpfs on /dev type tmpfs (rw,nosuid,size=65536k,mode=755)
    devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
    sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime)
    tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,relatime,mode=755)
  22. procfs virtual filesystem
    • Namespaces are visible in /proc
    • Files are symbolic links to the namespace
    • The link contains the namespace type and inode number to identify the namespace
    $ readlink /proc/$$/ns/*
    cgroup:[4026531835]
    ipc:[4026531839]
    mnt:[4026531840]
    net:[4026531993]
    pid:[4026531836]
    user:[4026531837]
    uts:[4026531838]
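
Because the link target encodes both the namespace type and its inode number, two processes share a namespace exactly when their links match. The sketch below parses a link target like the ones shown on this slide and compares two processes; the function names are illustrative, and same_namespace only works on Linux with permission to read the other process's /proc entries.

```python
import os
import re
from typing import Tuple

# Matches link targets such as "net:[4026531993]".
NS_LINK = re.compile(r"^(?P<type>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target: str) -> Tuple[str, int]:
    """Split a namespace link target into (type, inode number)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group("type"), int(match.group("inode"))


def same_namespace(pid_a: int, pid_b: int, ns_type: str) -> bool:
    """True if both processes are in the same namespace of the given type."""
    link_a = os.readlink(f"/proc/{pid_a}/ns/{ns_type}")
    link_b = os.readlink(f"/proc/{pid_b}/ns/{ns_type}")
    return parse_ns_link(link_a) == parse_ns_link(link_b)
```

For example, same_namespace(os.getpid(), 1, "net") would report whether the current process shares a network namespace with pid 1, assuming it is allowed to read /proc/1/ns.
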
  23. Creating namespaces
    • clone(2) and unshare(2)
    • CLONE_NEW* flags to specify which namespaces
    • clone(2) is for new processes to create new namespaces
    • unshare(2) is for existing processes to create new namespaces
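
To make the CLONE_NEW* flags concrete, here is a small sketch that combines them into a bitmask and passes it to unshare(2) through ctypes. The flag values are the ones defined in <linux/sched.h>; the dictionary keys and function names are chosen for this example only. Calling unshare() usually requires root (CAP_SYS_ADMIN) for everything except a user namespace.

```python
import ctypes
import os

# Flag values from <linux/sched.h>; the keys are labels for this sketch.
CLONE_NEW_FLAGS = {
    "mount": 0x00020000,    # CLONE_NEWNS
    "cgroup": 0x02000000,   # CLONE_NEWCGROUP
    "uts": 0x04000000,      # CLONE_NEWUTS
    "ipc": 0x08000000,      # CLONE_NEWIPC
    "user": 0x10000000,     # CLONE_NEWUSER
    "pid": 0x20000000,      # CLONE_NEWPID
    "network": 0x40000000,  # CLONE_NEWNET
}


def clone_flags(*namespaces: str) -> int:
    """OR together the CLONE_NEW* flags for the requested namespaces."""
    flags = 0
    for name in namespaces:
        flags |= CLONE_NEW_FLAGS[name]
    return flags


def unshare(*namespaces: str) -> None:
    """Move the calling process into new namespaces via unshare(2)."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(clone_flags(*namespaces)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

A call such as unshare("uts") would give the process its own hostname namespace, so a later sethostname would not be visible to the rest of the system.
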
  24. Persisting namespaces
    • The kernel automatically garbage-collects namespaces by reference-counting
    • New namespace remains open as long as
      • a process runs or
      • a mount is open
    • Bind-mount a file in /proc/$$/ns to another place on the filesystem
    $ mount \
        --bind /proc/$$/ns/net \
        /var/run/netns/lfnw
  25. Entering namespaces
    • Open a file from /proc/$$/ns (or a bind-mount)
    • Pass to setns(2) to enter the existing namespace
    • Namespace remains open as long as the process is running, even if the original file goes away
    • nsenter(1) is a command for doing this interactively
    • ip-netns(8) works specifically for network namespaces
  26. Demo
  27. How can you leverage this?
    • Use nsenter or ip netns to troubleshoot container networking
    • Monitor containers by entering the pid namespace
    • Access binaries in your containers with the mount namespace
  28. Further reading
    • man 7 namespaces
    • man 7 pid_namespaces
    • man 7 user_namespaces
  29. Images, layers, and union filesystems
  30. Filesystem images
    • Images are representations of a filesystem
    • Images are popular for virtualization and container systems
    • Docker helped popularize the concept of layers
  31. How Docker layers work
    • A copy-on-write view of your files
    • New files exist only in the top layer
    • When a file is modified, it is copied up to the top layer
    • Unmodified files exist in whatever layer they were added/modified
    • Deleted files are hidden, but still exist
    [Diagram labels: Top layer (read-write); Intermediate layer (read-only); Base layer (read-only)]
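
The copy-on-write rules on this slide can be modelled in a few lines. This is a toy in-memory model, not how Docker stores layers: each layer is a dictionary from path to content, layers are ordered from the top (read-write) layer down to the base, and a WHITEOUT sentinel stands in for a deleted file.

```python
from typing import Dict, List, Optional

WHITEOUT = object()  # marks a path as deleted in a layer
Layer = Dict[str, object]


def lookup(layers: List[Layer], path: str) -> Optional[object]:
    """Resolve a path through layers ordered from top (index 0) to base."""
    for layer in layers:
        if path in layer:
            entry = layer[path]
            # A whiteout hides the file even though lower layers still hold it.
            return None if entry is WHITEOUT else entry
    return None


def write(layers: List[Layer], path: str, content: object) -> None:
    """New and modified files land only in the top layer."""
    layers[0][path] = content


def delete(layers: List[Layer], path: str) -> None:
    """Deleting hides the file; lower layers are left untouched."""
    layers[0][path] = WHITEOUT
```

Writing to a path that exists in a lower layer leaves the lower copy in place, which is why deleting files in a later layer does not shrink an image.
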
  32. Union filesystems
    • Unified view of two (or more) filesystems
    • Popular in container runtimes (like Docker) to implement layers
    • Efficient use of storage when making minor modifications to images
    • Efficient use of storage when starting multiple containers with identical images
  33. Overlay filesystem
    • Joins two directories (upper and lower) to form a union
    • Uses file name to describe the files
    • When writing to the overlay
      • lowerdir is not modified, all changes go to upperdir
      • Existing files are copied-up to the upperdir for modification
      • Whole file is copied, not just blocks
    • “Deleting” a file in the upperdir creates a whiteout
      • Files: character devices with 0/0 device number
      • Directories: xattr “trusted.overlay.opaque” set to “y”
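
The two markers described on this slide can be checked from user space when inspecting an upperdir. The following is a sketch under those stated rules: a whiteout is a character device whose device number is 0/0, and a marked directory carries the trusted.overlay.opaque xattr set to y. Reading trusted.* xattrs generally requires root, so is_opaque_dir treats any error as "not marked"; both function names are made up for this example.

```python
import os
import stat


def is_whiteout(path: str) -> bool:
    """True if path is a character device with device number 0/0."""
    st = os.lstat(path)
    return stat.S_ISCHR(st.st_mode) and st.st_rdev == 0


def is_opaque_dir(path: str) -> bool:
    """True if the directory carries trusted.overlay.opaque=y (Linux only)."""
    try:
        return os.getxattr(path, "trusted.overlay.opaque") == b"y"
    except OSError:
        # Missing attribute, unsupported filesystem, or insufficient privilege.
        return False
```

Walking an upperdir with os.walk and applying these checks would list what a container has deleted relative to its image.
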
  34. Overlay filesystem (continued)
    • An upperdir can have multiple lowerdirs
    • Overlay filesystems can be created with mount(2)
    • You can examine the mounts with
      • mount(8)
      • /proc/mounts
      • /proc/$$/mountinfo
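
When examining mounts, the layer directories are encoded in the mount options. Here is a minimal sketch that pulls lowerdir, upperdir, and workdir out of one overlay line in the /proc/mounts format (device, mount point, type, options). It assumes paths contain no commas or colons, which real tools would need to handle through escaping.

```python
from typing import Dict, List, Union


def parse_overlay_mount(line: str) -> Dict[str, Union[str, List[str]]]:
    """Extract the layer directories from one overlay line of /proc/mounts."""
    _device, mountpoint, fstype, options = line.split()[:4]
    if fstype != "overlay":
        raise ValueError(f"not an overlay mount: {fstype}")
    info: Dict[str, Union[str, List[str]]] = {
        "mountpoint": mountpoint,
        "lowerdir": [],
    }
    for option in options.split(","):
        key, _sep, value = option.partition("=")
        if key == "lowerdir":
            # Multiple lower directories are separated by colons.
            info["lowerdir"] = value.split(":")
        elif key in ("upperdir", "workdir"):
            info[key] = value
    return info
```

Feeding it each line of /proc/mounts whose third field is overlay gives a quick map from container root filesystems to their directories on the host.
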
  35. Docker’s overlay driver
    • Docker’s default layer storage uses the overlay filesystem
    • upperdir, lowerdir, and diff directories are in /var/lib/docker/overlay2
  36. Demo
  37. How can you leverage this?
    • Locate files in your layers
    • Examine which files and layers contribute to your disk usage
    • Understand the impact of writable files in your containers
    # du -h . | sort -hr
    753M .
    211M ./e33f37/diff
    211M ./e33f37
    204M ./e33f37/diff/usr
    169M ./f87973/diff
    …
    # ls ./f87973
    diff link                 (Base layer!)
    # ls ./e33f37
    diff link lower work      (Intermediate layer!)
  38. Further reading
    • Linux: Documentation/filesystems/overlay.txt
  39. Capabilities
  40. Traditional UNIX permissions
    • Privileged operations restricted to UID 0 (root)
    • Non-privileged operations available to all users
    • Privileged processes bypass all permission checks
    • Unprivileged processes are subject to permission checks (UID/GID)
  41. Increased granularity
    • Grant some permissions without root
    • Deny permissions even to root processes
    • 38 distinct capabilities
    • Varying degrees of granularity
      • CAP_SYS_ADMIN is very broad
      • CAP_SYS_TIME is comparatively narrow
    • Capabilities set on threads and files
  42. Thread capabilities
    • Capabilities are set on threads; different threads of the same process can have different capabilities
    • Threads can raise or lower privileges at runtime
    • Effective – used by the kernel for permission checks
    • Permitted – limiting superset of effective capabilities
    • Inheritable – persist across execve(2) for root
    • Ambient – persist across execve(2) for non-root
    • Bounding – limits permissions across execve(2)
  43. File capabilities
    • File capabilities + thread capabilities determine capabilities after execve(2)
    • Permitted – automatically permitted, regardless of inheritable
    • Inheritable – ANDed with thread inheritable set to determine which capabilities are enabled after execve(2)
    • Effective – whether permitted capabilities are automatically enabled
  44. Transforming capabilities with execve
    P’(ambient)     = (file is privileged) ? 0 : P(ambient)
    P’(permitted)   = (P(inheritable) & F(inheritable)) | (F(permitted) & P(bounding)) | P’(ambient)
    P’(effective)   = F(effective) ? P’(permitted) : P’(ambient)
    P’(inheritable) = P(inheritable) [i.e., unchanged]
    P’(bounding)    = P(bounding) [i.e., unchanged]
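
The transformation rules can be written as a small function, which makes it easier to experiment with different combinations of thread and file capabilities. This sketch follows the five formulas on the slide term by term, using Python sets for the capability sets. The class and parameter names are assumptions for the example, and it does not model the special treatment of root.

```python
from dataclasses import dataclass
from typing import FrozenSet

Caps = FrozenSet[str]


@dataclass(frozen=True)
class ThreadCaps:
    permitted: Caps = frozenset()
    inheritable: Caps = frozenset()
    effective: Caps = frozenset()
    bounding: Caps = frozenset()
    ambient: Caps = frozenset()


@dataclass(frozen=True)
class FileCaps:
    permitted: Caps = frozenset()
    inheritable: Caps = frozenset()
    effective: bool = False  # a single bit on the file, not a set


def after_execve(p: ThreadCaps, f: FileCaps, file_is_privileged: bool) -> ThreadCaps:
    """Apply the execve(2) rules: P is the thread before, F is the file."""
    ambient = frozenset() if file_is_privileged else p.ambient
    permitted = (p.inheritable & f.inheritable) | (f.permitted & p.bounding) | ambient
    effective = permitted if f.effective else ambient
    return ThreadCaps(
        permitted=permitted,
        inheritable=p.inheritable,  # unchanged
        effective=effective,
        bounding=p.bounding,        # unchanged
        ambient=ambient,
    )
```

Note how a capability in the file's permitted set only survives if it is also in the thread's bounding set, which is what lets a runtime cap what any binary inside a container can obtain.
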
  61. Special treatment for root
    • Preserve traditional UNIX semantics
    • setuid root with file capabilities
    • Still bound by the Bounding set
  62. Syscalls
    • prctl(2): control ambient and bounding capabilities, “no new privileges”, “keep capabilities”, etc.
    • capget(2)/cap_get_proc(3) & capset(2)/cap_set_proc(3): control effective, permitted, and inheritable capability sets
  63. Tools
    • capsh(1): run processes with specified capabilities
    • getcap(8)/setcap(8): get/set file capabilities
  64. Demo
  65. Challenges with the capability system
    • Capabilities were added to Linux much later, and are not as widely used
    • Very complex, interactions between thread and file capabilities are hard to reason about
    • Broad capabilities make it hard to effectively restrict
    • Some capabilities can be used to escalate arbitrarily
  66. How can you leverage this?
    • Reduce the need for setuid/setgid binaries
    • Understand how Docker uses bounded capabilities to restrict permissions
    "capabilities": {
      "bounding": [
        "CAP_CHOWN",
        "CAP_DAC_OVERRIDE",
        "CAP_FSETID",
        "CAP_FOWNER",
        "CAP_MKNOD",
        "CAP_NET_RAW",
        "CAP_SETGID",
        "CAP_SETUID",
        "CAP_SETFCAP",
        "CAP_SETPCAP",
        "CAP_NET_BIND_SERVICE",
        "CAP_SYS_CHROOT",
        "CAP_KILL",
        "CAP_AUDIT_WRITE"
      ],
      "effective": [
        "CAP_CHOWN",
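
One way to see these sets on a running process is the Cap* lines in /proc/<pid>/status, which hold each set as a hexadecimal bitmask. The sketch below decodes such a mask into names, using the 38 capability numbers from <linux/capability.h> at the time of the talk; bits beyond that table are reported by number. The mask 0xa80425fb in the usage note is computed here from the 14 capabilities in the bounding set above and is not taken from the slides.

```python
from typing import Dict, Set

# Capability names indexed by bit number (0 through 37).
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
    "CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE",
    "CAP_NET_BROADCAST", "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK",
    "CAP_IPC_OWNER", "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT",
    "CAP_SYS_PTRACE", "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT",
    "CAP_SYS_NICE", "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG",
    "CAP_MKNOD", "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL",
    "CAP_SETFCAP", "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG",
    "CAP_WAKE_ALARM", "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ",
]


def decode_caps(mask: int) -> Set[str]:
    """Turn a capability bitmask into a set of capability names."""
    names = set()
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            # Newer kernels define bits this table does not know about.
            names.add(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"cap_{bit}")
        bit += 1
    return names


def parse_status_caps(status_text: str) -> Dict[str, int]:
    """Collect the CapInh/CapPrm/CapEff/CapBnd/CapAmb masks from a status file."""
    masks = {}
    for line in status_text.splitlines():
        if line.startswith("Cap"):
            key, _sep, value = line.partition(":")
            masks[key] = int(value.strip(), 16)
    return masks
```

For instance, decode_caps(0xa80425fb) yields exactly the 14 names in the bounding list on this slide, and parse_status_caps(open("/proc/self/status").read()) shows what the current process holds.
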
  67. Further reading
    • capabilities(7)
    • capsh(1)
    • setcap(8)
  68. Runtimes
  69. What is a container runtime?
    • A software tool that configures Linux primitives to create and run containers on a host
    • Examples include:
      • Docker
      • containerd
      • runc
      • CRI-O
      • systemd-nspawn
    • Open Container Initiative (OCI) aims to standardize container runtimes, image format, and distribution
    • The OCI reference implementation (runc) powers Docker, containerd, and CRI-O
  70. OCI runtime spec
    • Containers are “bundles”
      • Filesystem
      • JSON document
    • Filesystem can be a union
    • JSON document describes
      • cgroups
      • Namespaces
      • Additional mounts
      • Linux capabilities
      • Linux security modules
      • And more
    • Hooks can modify the bundle
    {
      "ociVersion": "1.0.1",
      ⋮
      "root": {
        "path": "/var/lib/docker/overlay2/03004c/merged"
      },
      ⋮
      "hooks": {
        "prestart": [{"path": "/proc/9306/exe"}]
      },
      "linux": {
        "resources": {
          "cpu": {"shares": 0},
          "pids": {"limit": 0},
          ⋮
        },
        "cgroupsPath": "/docker/bd5cebc8950c",
        "namespaces": [
          {"type": "mount"},
          {"type": "network"},
          ⋮
        ],
        ⋮
      }
    }
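
Since the bundle's JSON document ties together the primitives covered earlier, a short script can summarize which ones a container uses. The sketch below reads the fields shown on this slide (root.path, hooks.prestart, linux.cgroupsPath, linux.namespaces) plus process.capabilities.bounding from the capabilities slide. The function name and the shape of the returned summary are choices made for this example.

```python
import json
from typing import Any, Dict


def summarize_bundle_config(config_json: str) -> Dict[str, Any]:
    """Pull the container primitives out of an OCI runtime config document."""
    config = json.loads(config_json)
    linux = config.get("linux", {})
    hooks = config.get("hooks") or {}
    capabilities = config.get("process", {}).get("capabilities", {})
    return {
        "rootfs": config.get("root", {}).get("path"),
        "cgroups_path": linux.get("cgroupsPath"),
        "namespaces": [ns["type"] for ns in linux.get("namespaces", [])],
        "bounding_caps": capabilities.get("bounding", []),
        "prestart_hooks": [hook["path"] for hook in hooks.get("prestart", [])],
    }
```

Running it against the config.json of a live container (the file's location depends on the runtime) shows at a glance which namespaces are new and which cgroup the container lands in.
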
  71. OCI runtime hooks
    • Hooks run
      • Before a container starts
      • After a container starts
      • After a container stops
    • Hooks can modify the filesystem, modify the JSON file, or take other actions
    • Hooks run sequentially, in an order defined in the JSON file
    • Docker generates a bundle without hooks
    • Docker does let you specify your own runtime
      • Your runtime could inject hooks, then execute the real runtime
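
The last bullet, a wrapper runtime that injects hooks, comes down to rewriting the bundle's JSON before handing off to the real runtime. Here is a sketch of the rewriting step only, assuming the config has already been loaded into a dictionary; the hook path in the usage note is hypothetical, and the hand-off to the real runtime is left out.

```python
import copy
from typing import Any, Dict, Optional, Sequence


def inject_prestart_hook(
    config: Dict[str, Any],
    hook_path: str,
    args: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Return a copy of an OCI config with one more prestart hook appended."""
    updated = copy.deepcopy(config)
    hooks = updated.setdefault("hooks", {})
    entry: Dict[str, Any] = {"path": hook_path}
    if args:
        entry["args"] = list(args)
    # Hooks run in the order listed, so appending runs this one last.
    hooks.setdefault("prestart", []).append(entry)
    return updated
```

A wrapper would load config.json from the bundle directory, call something like inject_prestart_hook(config, "/usr/local/bin/my-hook"), write the result back, and then execute the real runtime with the original arguments.
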
  72. Demo
  73. A brief note before we finish
    Feedback provides valuable information to speakers!
    Feedback that is very helpful:
    • Topics you were excited to learn about
    • Suggestions for improving understanding and clarity
    Feedback that is extremely unhelpful:
    • Comments unrelated to talk content (please refer to the LinuxFest Northwest Code of Conduct)
    Reach out for Q&A (https://discuss.lfnw.org, @samuelkarp)
    For support, use the AWS Forums or contact AWS Support
  74. Questions?
    Q&A at https://discuss.lfnw.org
    Also available on Twitter – @samuelkarp
  75. Thank you!
    Samuel Karp
    @samuelkarp