Slide 1

© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Linux Container Primitives: cgroups, namespaces, and more!
Samuel Karp, Amazon Web Services – @samuelkarp
LinuxFest Northwest 2020 – Online

Slide 2

LinuxFest Northwest – Online!
Q&A at https://discuss.lfnw.org
Also available on Twitter – @samuelkarp

Slide 3

Agenda
• Container primitives overview
• Control groups (cgroups)
• Namespaces
• Union filesystems
• Runtimes

Slide 4

Agenda
• Container primitives overview
• Control groups (cgroups)
• Namespaces
• Union filesystems
• Capabilities
• Runtimes

Slide 5

Linux container primitives

Slide 6

Containers are an abstraction over several different Linux technologies

Slide 7

[Diagram: Containers 1 through 6 run on a container runtime, which builds them from Linux kernel features: namespaces, control groups, and a union filesystem.]

Slide 8

Control groups

Slide 9

What do control groups (cgroups) do?
• Organize all processes in the system
• Account for resource usage and gather utilization data
• Limit or prioritize resource utilization

Slide 10

Subsystems
• The control group system is an abstract framework
• Subsystems are concrete implementations
• Different subsystems can organize processes separately
• Most subsystems are resource controllers
Examples of subsystems:
• Memory
• CPU time
• Block I/O
• Number of discrete processes (pids)
• CPU & memory pinning (cpuset)
• Freezer (used by docker pause)
• Devices
• Network priority

Slide 11

Hierarchical representation
• Independent subsystem hierarchies
• Every process (pid) belongs to exactly one cgroup in each subsystem hierarchy
• New processes inherit cgroups from their parents

├── blkio
│   └── docker
│       └── b211c37
├── cpu,cpuacct
│   └── docker
│       └── b211c37
├── cpuset
│   └── docker
│       └── b211c37
├── devices
│   └── docker
│       └── b211c37
├── freezer
│   └── docker
│       └── b211c37
├── hugetlb
│   └── docker
│       └── b211c37
├── memory
│   └── docker
│       └── b211c37
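The same hierarchy can be seen from a single process's point of view: /proc/<pid>/cgroup has one line per mounted hierarchy, in the form hierarchy-ID:controller-list:cgroup-path. A minimal Python sketch that parses it; `CgroupEntry` and `parse_proc_cgroup` are illustrative names, not part of any library.

```python
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: List[str]  # empty for the cgroup v2 unified hierarchy ("0::/...")
    path: str               # path relative to the hierarchy's mount point


def parse_proc_cgroup(text: str) -> List[CgroupEntry]:
    """Parse /proc/<pid>/cgroup: one 'hierarchy-ID:controller-list:path' line per hierarchy."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # maxsplit=2 keeps any ':' that appears inside the cgroup path itself.
        hid, controllers, path = line.split(":", 2)
        entries.append(CgroupEntry(int(hid), [c for c in controllers.split(",") if c], path))
    return entries


if __name__ == "__main__":
    with open("/proc/self/cgroup") as f:
        for entry in parse_proc_cgroup(f.read()):
            print(entry.hierarchy_id, ",".join(entry.controllers) or "(unified)", entry.path)
```

For a process inside the container shown above, each v1 hierarchy would be expected to report a path ending in /docker/ followed by the container ID.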

Slide 12

cgroup virtual filesystem
• Typically mounted at /sys/fs/cgroup
• The tasks virtual file lists the IDs of all threads (tasks) in the cgroup; cgroup.procs lists the process (thread group) IDs
• Other files hold settings and utilization data

├── cgroup.clone_children
├── cgroup.procs
├── cgroup.sane_behavior
├── cpuacct.stat
├── cpuacct.usage
├── cpuacct.usage_all
├── cpuacct.usage_percpu
├── cpuacct.usage_percpu_sys
├── cpuacct.usage_percpu_user
├── cpuacct.usage_sys
├── cpuacct.usage_user
├── cpu.cfs_period_us
├── cpu.cfs_quota_us
├── cpu.rt_period_us
├── cpu.rt_runtime_us
├── cpu.shares
├── cpu.stat
├── notify_on_release
├── release_agent
└── tasks
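The cpu.cfs_quota_us and cpu.cfs_period_us files listed above work as a pair: the quota is how many microseconds of CPU time the cgroup may use in each period, and a quota of -1 means no limit. Docker documents `--cpus=1.5` as equivalent to a period of 100000 and a quota of 150000. A small Python sketch (function names are illustrative) that turns the two files into a CPU count:

```python
from pathlib import Path
from typing import Optional, Union


def cpu_limit(quota_us: int, period_us: int) -> Optional[float]:
    """CPUs a cgroup may use per period; a quota of -1 means no CFS bandwidth limit."""
    if quota_us < 0:
        return None
    return quota_us / period_us


def read_cpu_limit(cgroup_dir: Union[str, Path]) -> Optional[float]:
    """Read cpu.cfs_quota_us and cpu.cfs_period_us from a cgroup v1 cpu directory."""
    d = Path(cgroup_dir)
    quota = int((d / "cpu.cfs_quota_us").read_text())  # int() ignores the trailing newline
    period = int((d / "cpu.cfs_period_us").read_text())
    return cpu_limit(quota, period)
```

On a cgroup v1 host running Docker, the directory would be something like /sys/fs/cgroup/cpu,cpuacct/docker/ followed by the full container ID (the exact path depends on the cgroup driver in use).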

Slide 13

Demo

Slide 14

What can you use cgroups for?
• cgroups can be used independently of containers
• Control resource limits for processes
• Monitor processes and organize them
• Be careful not to break any assumptions your container runtime or orchestrator might have
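Using cgroups outside of a container comes down to ordinary file operations: make a directory, write a limit, write a PID. A sketch for the cgroup v1 memory controller, assuming it is mounted at /sys/fs/cgroup/memory; the cgroup name is your choice, and creating your own cgroup (rather than editing the ones a runtime manages) avoids breaking the runtime's assumptions.

```python
from pathlib import Path
from typing import Union


def limit_memory(hierarchy: Union[str, Path], name: str, limit_bytes: int, pid: int) -> Path:
    """Create a cgroup v1 memory cgroup, set its limit, and move a process into it.

    On a real cgroup filesystem this requires root, and the kernel creates
    the control files as soon as the directory is made.
    """
    cgroup = Path(hierarchy) / name
    cgroup.mkdir(exist_ok=True)  # mkdir in cgroupfs creates a child cgroup
    (cgroup / "memory.limit_in_bytes").write_text(str(limit_bytes))
    # cgroup.procs takes a process ID and moves all of that process's threads.
    (cgroup / "cgroup.procs").write_text(str(pid))
    return cgroup
```

For example, `limit_memory("/sys/fs/cgroup/memory", "lfnw-demo", 256 * 1024 * 1024, os.getpid())` (with a hypothetical cgroup name) would cap the calling process at 256 MiB, and any children it starts afterwards inherit the same cgroup.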

Slide 15

Further reading
• Linux: Documentation/cgroup-v1

Slide 16

Namespaces

Slide 17

What do namespaces do?
• Isolation mechanism for resources
• Changes to resources within a namespace can be invisible outside the namespace
• Resources can be mapped with different permissions inside a namespace (for example, UID/GID mappings in user namespaces)

Slide 18

What namespaces are available?
• Network (net)
• Filesystem mounts (mnt)
• Processes (pid)
• Inter-process communication (ipc)
• Hostname and domain name (uts)
• User and group IDs (user)
• cgroup

Slide 19

Namespace sharing
[Diagram: Processes A, B, C, and D, each a member of a pid, a net, and a mount namespace. Only three pid namespaces (pid:[1] to pid:[3]), three network namespaces (net:[4] to net:[6]), and two mount namespaces (mount:[7], mount:[8]) exist, so some processes share namespaces of each type.]

Slide 20

Network namespace
• Frequently used in containers
• veth devices can connect different namespaces
• docker run uses a separate network namespace per container by default
• Multiple containers can share a network namespace
  • Kubernetes pods
  • Amazon ECS tasks with the awsvpc networking mode

Slide 21

Mount namespace
• Used for giving containers their own filesystem
• Container image is mounted as the root filesystem
• “Volumes” share data between containers or with the host
• More about filesystems to come!

bash-4.2# mount
overlay on / type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/Q5EBZ7CIJYELLG2MBKZIRRFWW6:/var/lib/docker/overlay2/l/PKATP76T57BQZ5D44JXYFIB26E,upperdir=/var/lib/docker/overlay2/88816f9510a9ff38b31eaaceccbef6ffc9cc3c06bcc451f9684850db5ee1b152/diff,workdir=/var/lib/docker/overlay2/88816f9510a9ff38b31eaaceccbef6ffc9cc3c06bcc451f9684850db5ee1b152/work)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev type tmpfs (rw,nosuid,size=65536k,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,relatime,mode=755)

Slide 22

procfs virtual filesystem
• Namespaces are visible in /proc
• Files are symbolic links to the namespace
• The link contains the namespace type and inode number, which identify the namespace

$ readlink /proc/$$/ns/*
cgroup:[4026531835]
ipc:[4026531839]
mnt:[4026531840]
net:[4026531993]
pid:[4026531836]
user:[4026531837]
uts:[4026531838]
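The readlink output above can be collected programmatically, which makes it easy to check whether two processes share a namespace (strictly, namespaces(7) says to compare the device ID as well as the inode number). A Python sketch; the helper names are illustrative.

```python
import os
import re
from typing import Dict, Tuple

_NS_LINK = re.compile(r"(\w+):\[(\d+)\]")


def parse_ns_link(target: str) -> Tuple[str, int]:
    """Split a link target such as 'net:[4026531993]' into ('net', 4026531993)."""
    m = _NS_LINK.fullmatch(target)
    if m is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return m.group(1), int(m.group(2))


def namespaces(pid: str = "self") -> Dict[str, int]:
    """Map each entry of /proc/<pid>/ns to the inode number of its namespace."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue  # e.g. a *_for_children link that is not populated yet
        result[name] = inode
    return result
```

Comparing `namespaces()` with `namespaces("<container pid>")` (reading another process's links generally needs root) shows which namespaces the container shares with the host.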

Slide 23

Creating namespaces
• clone(2) and unshare(2)
• CLONE_NEW* flags specify which namespaces to create
• clone(2) creates a new process in new namespaces
• unshare(2) moves the calling (existing) process into new namespaces (for a new pid namespace, only its future children)
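Python's standard library only exposes unshare(2) directly from version 3.12 (`os.unshare`), so this sketch calls the C library through ctypes. The flag values are the CLONE_NEW* constants from the Linux headers; clone(2) itself is not wrapped here because forking an interpreter that way is unsafe.

```python
import ctypes
import os

# Namespace flags from <linux/sched.h>.
CLONE_NEWNS = 0x00020000      # mount
CLONE_NEWCGROUP = 0x02000000  # cgroup
CLONE_NEWUTS = 0x04000000     # hostname and NIS domain name
CLONE_NEWIPC = 0x08000000     # System V IPC and POSIX message queues
CLONE_NEWUSER = 0x10000000    # user and group IDs
CLONE_NEWPID = 0x20000000     # process IDs (applies to children of the caller)
CLONE_NEWNET = 0x40000000     # network

# CDLL(None) resolves symbols already loaded into the interpreter, including libc.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.unshare.argtypes = [ctypes.c_int]
_libc.unshare.restype = ctypes.c_int


def unshare(flags: int) -> None:
    """Move the calling process into new namespaces, like unshare(2)."""
    if _libc.unshare(flags) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

On hosts that allow unprivileged user namespaces, `unshare(CLONE_NEWUSER | CLONE_NEWUTS)` followed by `socket.sethostname("lfnw")` should change the hostname only inside the new UTS namespace; `readlink /proc/<pid>/ns/uts` then shows a new inode number.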

Slide 24

Persisting namespaces
• The kernel automatically garbage-collects namespaces by reference counting
• A new namespace remains alive as long as
  • a process runs in it, or
  • a bind mount (or an open file descriptor) refers to it
• Bind-mount a file in /proc/$$/ns to another place on the filesystem

$ mount \
    --bind /proc/$$/ns/net \
    /var/run/netns/lfnw

Slide 25

Entering namespaces
• Open a file from /proc/$$/ns (or a bind mount of one)
• Pass the file descriptor to setns(2) to enter the existing namespace
• The namespace remains alive as long as the process is running in it, even if the original file goes away
• nsenter(1) is a command for doing this interactively
• ip-netns(8) works specifically for network namespaces
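The open-then-setns sequence above, sketched in Python via ctypes (Python 3.12 and later also offer `os.setns`). The function names are illustrative, and joining most namespaces requires CAP_SYS_ADMIN.

```python
import ctypes
import os

_libc = ctypes.CDLL(None, use_errno=True)
_libc.setns.argtypes = [ctypes.c_int, ctypes.c_int]
_libc.setns.restype = ctypes.c_int


def namespace_inode(path: str) -> int:
    """stat(2) follows the ns link and reports the namespace's inode number."""
    return os.stat(path).st_ino


def enter_namespace(path: str, nstype: int = 0) -> None:
    """Join the namespace behind path, e.g. /proc/<pid>/ns/net, via setns(2).

    nstype=0 accepts any namespace type.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        if _libc.setns(fd, nstype) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err), path)
    finally:
        # Safe to close: after a successful setns(2), this process's membership
        # keeps the namespace alive even if path goes away.
        os.close(fd)
```

Run as root with a container process's PID, `enter_namespace(f"/proc/{pid}/ns/net")` puts the caller into that container's network namespace, roughly what `nsenter --net=/proc/<pid>/ns/net` does before starting a command.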

Slide 26

Demo

Slide 27

How can you leverage this?
• Use nsenter or ip netns to troubleshoot container networking
• Monitor containers by entering the pid namespace
• Access binaries in your containers through the mount namespace

Slide 28

Further reading
• man 7 namespaces
• man 7 pid_namespaces
• man 7 user_namespaces

Slide 29

Images, layers, and union filesystems

Slide 30

Filesystem images
• Images are representations of a filesystem
• Images are popular for virtualization and container systems
• Docker helped popularize the concept of layers

Slide 31

[Diagram: a read-write top layer stacked on a read-only intermediate layer, stacked on a read-only base layer]
How Docker layers work
• A copy-on-write view of your files
• New files exist only in the top layer
• When a file is modified, it is copied up to the top layer
• Unmodified files remain in the layer where they were added or last modified
• Deleted files are hidden, but still exist in the lower layers
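The copy-on-write rules above can be modeled in a few lines. This is a toy model, not Docker's or overlayfs's code: each layer is a dict from path to contents, ordered top (read-write) first, and a hypothetical `WHITEOUT` marker stands in for the whiteout file a real union filesystem creates.

```python
from typing import Dict, List, Optional

WHITEOUT = object()  # stands in for overlayfs's 0/0 character-device whiteout

Layer = Dict[str, object]  # path -> file contents (bytes) or WHITEOUT


def lookup(layers: List[Layer], path: str) -> Optional[bytes]:
    """Search layers from the top (index 0, read-write) down to the base."""
    for layer in layers:
        if path in layer:
            entry = layer[path]
            return None if entry is WHITEOUT else entry
    return None


def write(layers: List[Layer], path: str, data: bytes) -> None:
    """Copy-on-write: the whole new file lands in the top layer; lower copies are untouched."""
    layers[0][path] = data


def delete(layers: List[Layer], path: str) -> None:
    """Hide lower copies with a whiteout; a file that exists only on top is simply removed."""
    if any(path in layer for layer in layers[1:]):
        layers[0][path] = WHITEOUT
    else:
        layers[0].pop(path, None)
```

Because lower layers are never modified, several containers can safely share the same read-only layers while each gets its own top layer.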

Slide 32

Union filesystems
• Unified view of two (or more) filesystems
• Popular in container runtimes (like Docker) to implement layers
• Efficient use of storage when making minor modifications to images
• Efficient use of storage when starting multiple containers with identical images

Slide 33

Overlay filesystem
• Joins two directories (upper and lower) to form a union
• Works at the level of file names: a file in upperdir hides the file with the same path in lowerdir
• When writing to the overlay
  • lowerdir is not modified; all changes go to upperdir
  • Existing files are copied up to upperdir for modification
  • The whole file is copied, not just the changed blocks
• “Deleting” a file that exists in lowerdir creates a whiteout in upperdir
  • Files: character devices with device number 0/0
  • Opaque directories (which hide the contents of the lower directory): xattr “trusted.overlay.opaque” set to “y”

Slide 34

Overlay filesystem (continued)
• An overlay can stack multiple lowerdirs under one upperdir
• Overlay filesystems can be created with mount(2)
• You can examine the mounts with
  • mount(8)
  • /proc/mounts
  • /proc/$$/mountinfo
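An overlay with several lowerdirs is described entirely by its mount options. A Python sketch that builds the option string passed to mount(2) (or `mount -o`); the /tmp paths are hypothetical. Per the kernel's overlayfs documentation, the colon-separated lower directories are stacked from the rightmost (bottom) to the leftmost (top), and workdir must be an empty directory on the same filesystem as upperdir.

```python
from typing import List


def overlay_options(lowerdirs: List[str], upperdir: str, workdir: str) -> str:
    """Build the mount data string for an overlay filesystem.

    lowerdirs is ordered top-most first, matching overlayfs's colon-separated
    lowerdir option (rightmost entry is the bottom layer).
    """
    if not lowerdirs:
        raise ValueError("an overlay needs at least one lowerdir")
    return f"lowerdir={':'.join(lowerdirs)},upperdir={upperdir},workdir={workdir}"


def mount_command(options: str, target: str) -> List[str]:
    """The equivalent mount(8) invocation; performing the mount requires root."""
    return ["mount", "-t", "overlay", "overlay", "-o", options, target]


if __name__ == "__main__":
    opts = overlay_options(["/tmp/lower2", "/tmp/lower1"], "/tmp/upper", "/tmp/work")
    print(" ".join(mount_command(opts, "/tmp/merged")))
```

The result has the same shape as the options on the overlay line in the container's `mount` output shown earlier.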

Slide 35

Docker’s overlay driver
• Docker’s default layer storage uses the overlay filesystem
• upperdir, lowerdir, and diff directories are in /var/lib/docker/overlay2

Slide 36

Demo

Slide 37

How can you leverage this?
• Locate files in your layers
• Examine which files and layers contribute to your disk usage
• Understand the impact of writable files in your containers

# du -h . | sort -hr
753M	.
211M	./e33f37/diff
211M	./e33f37
204M	./e33f37/diff/usr
169M	./f87973/diff
…
# ls ./f87973
diff  link                  <- Base layer!
# ls ./e33f37
diff  link  lower  work     <- Intermediate layer!
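The du and ls steps above can be combined into one per-layer report. A Python sketch, assuming Docker's overlay2 layout in which each layer directory has a diff subdirectory and, unless it is a base layer, a lower file (as in the ls output above). It reports apparent sizes, similar to `du --apparent-size`, and counts hard links each time they appear.

```python
import os
import stat
from typing import List, Tuple


def tree_size(root: str) -> int:
    """Apparent size in bytes of the regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            # Count regular files only; skip symlinks and whiteouts (character devices).
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def layer_usage(overlay2: str = "/var/lib/docker/overlay2") -> List[Tuple[str, int, bool]]:
    """(layer, bytes in diff/, is_base) for every layer directory, largest first.

    A layer without a 'lower' file has no parent layers, i.e. it is a base layer.
    """
    rows = []
    for layer in os.listdir(overlay2):
        diff = os.path.join(overlay2, layer, "diff")
        if not os.path.isdir(diff):
            continue  # e.g. the 'l' directory of shortened symlinks
        is_base = not os.path.exists(os.path.join(overlay2, layer, "lower"))
        rows.append((layer, tree_size(diff), is_base))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Reading /var/lib/docker normally requires root.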

Slide 38

Further reading
• Linux: Documentation/filesystems/overlay.txt

Slide 39

Capabilities

Slide 40

Traditional UNIX permissions
• Privileged operations restricted to UID 0 (root)
• Non-privileged operations available to all users
• Privileged processes bypass all kernel permission checks
• Unprivileged processes are subject to permission checks based on their credentials (UID/GID)

Slide 41

Increased granularity
• Grant some permissions without root
• Deny permissions even to root processes
• 38 distinct capabilities (at the time of this talk; later kernels add more)
• Varying degrees of granularity
  • CAP_SYS_ADMIN is very broad
  • CAP_SYS_TIME is comparatively narrow
• Capabilities are set on threads and files

Slide 42

Thread capabilities
• Capabilities are set on threads; different threads of the same process can have different capabilities
• Threads can raise or lower privileges at runtime
• Effective – used by the kernel for permission checks
• Permitted – limiting superset of the effective capabilities
• Inheritable – preserved across execve(2); they become permitted only if also in the file's inheritable set (or for root)
• Ambient – preserved across execve(2) of a non-privileged program, for non-root
• Bounding – limits the capabilities that can be gained across execve(2)
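Each of these sets is visible as a hex bitmask in /proc/<pid>/status (CapInh, CapPrm, CapEff, CapBnd, CapAmb), with bit N standing for capability number N. A Python sketch that decodes them; the name table covers capability numbers 0 to 37 from <linux/capability.h>, and unknown bits are printed as numbers.

```python
from typing import Dict, List

CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ",
]


def thread_capabilities(status_text: str) -> Dict[str, int]:
    """Extract the Cap* bitmasks from the text of /proc/<pid>/status."""
    sets = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key.startswith("Cap"):
            sets[key] = int(value.strip(), 16)
    return sets


def decode(mask: int) -> List[str]:
    """Names of the capabilities whose bits are set in mask."""
    return [CAP_NAMES[i] if i < len(CAP_NAMES) else f"cap_{i}"
            for i in range(64) if (mask >> i) & 1]


if __name__ == "__main__":
    with open("/proc/self/status") as f:
        for name, mask in thread_capabilities(f.read()).items():
            print(name, decode(mask))
```

/proc/<pid>/status reflects the main thread; individual threads have their own files under /proc/<pid>/task/<tid>/status.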

Slide 43

File capabilities
• File capabilities + thread capabilities determine capabilities after execve(2)
• Permitted – automatically permitted, regardless of the thread's inheritable set (still limited by the bounding set)
• Inheritable – ANDed with the thread's inheritable set to determine which capabilities are enabled after execve(2)
• Effective – a single bit: whether the new permitted capabilities are automatically enabled (made effective)

Slide 44

Transforming capabilities with execve

P'(ambient)     = (file is privileged) ? 0 : P(ambient)
P'(permitted)   = (P(inheritable) & F(inheritable)) |
                  (F(permitted) & P(bounding)) | P'(ambient)
P'(effective)   = F(effective) ? P'(permitted) : P'(ambient)
P'(inheritable) = P(inheritable)    [i.e., unchanged]
P'(bounding)    = P(bounding)       [i.e., unchanged]

where P is the thread's capability sets before execve(2), P' the sets after it, and F the file's capability sets; a file is privileged if it has capabilities or its set-user-ID or set-group-ID bit is set.
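The transformation is plain bitmask arithmetic, so it can be checked directly. A Python sketch in which each capability set is an int with one bit per capability number; the class and field names are illustrative, and the special treatment for root (covered separately) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadCaps:
    permitted: int
    effective: int
    inheritable: int
    bounding: int
    ambient: int


@dataclass(frozen=True)
class FileCaps:
    permitted: int
    inheritable: int
    effective: bool   # a single bit, not a set
    privileged: bool  # has file capabilities, or is set-user-ID/set-group-ID


def execve_caps(p: ThreadCaps, f: FileCaps) -> ThreadCaps:
    """Apply the execve(2) capability rules from capabilities(7), ignoring root."""
    ambient = 0 if f.privileged else p.ambient
    permitted = (p.inheritable & f.inheritable) | (f.permitted & p.bounding) | ambient
    effective = permitted if f.effective else ambient
    # Inheritable and bounding sets pass through unchanged.
    return ThreadCaps(permitted, effective, p.inheritable, p.bounding, ambient)
```

For instance, a non-root thread holding CAP_NET_BIND_SERVICE (bit 10) in its ambient set keeps it when executing an ordinary binary, but loses it when executing a binary that carries its own file capabilities.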

Slide 61

Special treatment for root
• Preserves traditional UNIX semantics: for UID 0, the file permitted and inheritable sets are treated as all ones
• A set-user-ID-root program with file capabilities, run by a non-root user, gains only its file capabilities
• Still bound by the bounding set

Slide 62

Syscalls
• prctl(2): control ambient and bounding capabilities, “no new privileges”, “keep capabilities”, etc.
• capget(2)/cap_get_proc(3) and capset(2)/cap_set_proc(3): control the effective, permitted, and inheritable capability sets
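Querying the bounding and ambient sets goes through prctl(2). A Python sketch via ctypes; the PR_* option values are those defined in <linux/prctl.h>, and the wrapper names are illustrative. PR_CAP_AMBIENT requires Linux 4.3 or later.

```python
import ctypes
import os

# Option values from <linux/prctl.h>.
PR_CAPBSET_READ = 23
PR_GET_NO_NEW_PRIVS = 39
PR_CAP_AMBIENT = 47
PR_CAP_AMBIENT_IS_SET = 1

_libc = ctypes.CDLL(None, use_errno=True)
_libc.prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4
_libc.prctl.restype = ctypes.c_int


def _prctl(option: int, arg2: int = 0, arg3: int = 0) -> int:
    # Several options require the unused trailing arguments to be zero.
    ret = _libc.prctl(option, arg2, arg3, 0, 0)
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def in_bounding_set(cap: int) -> bool:
    """True if capability number cap is in the calling thread's bounding set."""
    return _prctl(PR_CAPBSET_READ, cap) == 1


def in_ambient_set(cap: int) -> bool:
    """True if capability number cap is in the calling thread's ambient set."""
    return _prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_IS_SET, cap) == 1


def no_new_privs() -> bool:
    """True once "no new privileges" is set; execve(2) can then no longer add privileges."""
    return _prctl(PR_GET_NO_NEW_PRIVS) == 1
```

The answers from `in_bounding_set` should agree with the CapBnd bitmask in /proc/self/status.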

Slide 63

Tools
• capsh(1): run processes with specified capabilities
• getcap(8)/setcap(8): get/set file capabilities

Slide 64

Demo

Slide 65

Challenges with the capability system
• Capabilities were added to Linux much later than the traditional UNIX permission model, and are not as widely used
• Very complex: interactions between thread and file capabilities are hard to reason about
• Broad capabilities (like CAP_SYS_ADMIN) make it hard to restrict privileges effectively
• Some capabilities can be used to escalate to full root privileges

Slide 66

How can you leverage this?
• Reduce the need for setuid/setgid binaries
• Understand how Docker uses the bounding set to restrict permissions

"capabilities": {
  "bounding": [
    "CAP_CHOWN",
    "CAP_DAC_OVERRIDE",
    "CAP_FSETID",
    "CAP_FOWNER",
    "CAP_MKNOD",
    "CAP_NET_RAW",
    "CAP_SETGID",
    "CAP_SETUID",
    "CAP_SETFCAP",
    "CAP_SETPCAP",
    "CAP_NET_BIND_SERVICE",
    "CAP_SYS_CHROOT",
    "CAP_KILL",
    "CAP_AUDIT_WRITE"
  ],
  "effective": [
    "CAP_CHOWN",
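The bounding list above can be converted into the hex mask the kernel reports on the CapBnd line of /proc/<pid>/status, which is a quick way to check what a running container is actually allowed. A Python sketch; the table holds only the capability numbers (from <linux/capability.h>) needed for this list.

```python
from typing import Iterable

# Bit numbers from <linux/capability.h> for the capabilities in the list above.
CAP_NUMBERS = {
    "CAP_CHOWN": 0, "CAP_DAC_OVERRIDE": 1, "CAP_FOWNER": 3, "CAP_FSETID": 4,
    "CAP_KILL": 5, "CAP_SETGID": 6, "CAP_SETUID": 7, "CAP_SETPCAP": 8,
    "CAP_NET_BIND_SERVICE": 10, "CAP_NET_RAW": 13, "CAP_SYS_CHROOT": 18,
    "CAP_MKNOD": 27, "CAP_AUDIT_WRITE": 29, "CAP_SETFCAP": 31,
}

DOCKER_BOUNDING = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_FSETID", "CAP_FOWNER", "CAP_MKNOD",
    "CAP_NET_RAW", "CAP_SETGID", "CAP_SETUID", "CAP_SETFCAP", "CAP_SETPCAP",
    "CAP_NET_BIND_SERVICE", "CAP_SYS_CHROOT", "CAP_KILL", "CAP_AUDIT_WRITE",
]


def caps_to_mask(names: Iterable[str]) -> int:
    """OR together one bit per capability, as the kernel stores capability sets."""
    mask = 0
    for name in names:
        mask |= 1 << CAP_NUMBERS[name]
    return mask


if __name__ == "__main__":
    # Same format as the CapBnd line in /proc/<pid>/status.
    print(f"CapBnd:\t{caps_to_mask(DOCKER_BOUNDING):016x}")
```

This prints `CapBnd:	00000000a80425fb`; inside a container started with Docker's default capabilities, `grep CapBnd /proc/self/status` should show the same value.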

Slide 67

Further reading
• capabilities(7)
• capsh(1)
• setcap(8)

Slide 68

Runtimes

Slide 69

What is a container runtime?
• A software tool that configures Linux primitives to create and run containers on a host
• Examples include:
  • Docker
  • containerd
  • runc
  • CRI-O
  • systemd-nspawn
• The Open Container Initiative (OCI) aims to standardize container runtimes, image format, and distribution
• The OCI reference implementation (runc) powers Docker, containerd, and CRI-O

Slide 70

OCI runtime spec
• Containers are “bundles”
  • Filesystem
  • JSON document
• Filesystem can be a union
• JSON document describes
  • cgroups
  • Namespaces
  • Additional mounts
  • Linux capabilities
  • Linux security modules
  • And more
• Hooks can modify the bundle

{
  "ociVersion": "1.0.1",
  ⋮
  "root": {
    "path": "/var/lib/docker/overlay2/03004c/merged"
  },
  ⋮
  "hooks": {
    "prestart": [{"path": "/proc/9306/exe"}]
  },
  "linux": {
    "resources": {
      "cpu": {"shares": 0},
      "pids": {"limit": 0},
      ⋮
    },
    "cgroupsPath": "/docker/bd5cebc8950c",
    "namespaces": [
      {"type": "mount"},
      {"type": "network"},
      ⋮
    ],
    ⋮
  }
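A bundle's JSON document is just data, so it can be produced by any program. A Python sketch of a pared-down config.json; the values (hostname, cgroups path, pids limit) are illustrative, and such a minimal file is likely not enough to start a working container (for example, it has no mounts for /proc). `runc spec` generates a complete template to start from.

```python
import json
from typing import List


def minimal_config(rootfs: str, args: List[str], cgroups_path: str) -> dict:
    """A pared-down OCI runtime config: process, root filesystem, and Linux settings."""
    return {
        "ociVersion": "1.0.1",
        "process": {"cwd": "/", "args": args, "user": {"uid": 0, "gid": 0}},
        "root": {"path": rootfs},  # a relative path is resolved against the bundle
        "hostname": "lfnw",
        "linux": {
            "cgroupsPath": cgroups_path,
            "resources": {"pids": {"limit": 64}},
            # One entry per namespace to create; adding a "path" would join an existing one.
            "namespaces": [{"type": t} for t in ("pid", "network", "ipc", "uts", "mount")],
        },
    }


if __name__ == "__main__":
    print(json.dumps(minimal_config("rootfs", ["/bin/sh"], "/lfnw/demo"), indent=2))
```

Saved as config.json next to a rootfs directory, this forms the two halves of a bundle described above.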

Slide 71

OCI runtime hooks
• Hooks run
  • Before a container starts
  • After a container starts
  • After a container stops
• Hooks can modify the filesystem, modify the JSON file, or take other actions
• Hooks run sequentially, in the order defined in the JSON file
• Docker generates the bundle itself and does not let you add your own hooks
• Docker does let you specify your own runtime
• Your runtime could inject hooks, then execute the real runtime
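The wrapper-runtime idea above can be sketched in Python: rewrite config.json in the bundle, then exec the real runtime with the original arguments. Everything here is an assumption for illustration: the real runtime is taken to be runc at /usr/bin/runc, the hook executable path is hypothetical, and detecting `create`/`run` by scanning the arguments is deliberately crude.

```python
import json
import os
import sys
from typing import List

REAL_RUNTIME = "/usr/bin/runc"          # assumption: location of the real runtime
HOOK_PATH = "/usr/local/bin/lfnw-hook"  # hypothetical hook executable


def inject_prestart_hook(config: dict, path: str, args: List[str]) -> dict:
    """Append a prestart hook; the runtime runs hooks in list order."""
    hooks = config.setdefault("hooks", {})
    hooks.setdefault("prestart", []).append({"path": path, "args": args})
    return config


def bundle_dir(argv: List[str]) -> str:
    """Find the bundle passed as --bundle/-b; runc defaults to the current directory."""
    for i, arg in enumerate(argv):
        if arg in ("--bundle", "-b") and i + 1 < len(argv):
            return argv[i + 1]
        if arg.startswith("--bundle="):
            return arg.split("=", 1)[1]
    return os.getcwd()


def main(argv: List[str]) -> None:
    # Crude check: only rewrite the bundle when a container is being created.
    if "create" in argv or "run" in argv:
        config_path = os.path.join(bundle_dir(argv), "config.json")
        with open(config_path) as f:
            config = json.load(f)
        inject_prestart_hook(config, HOOK_PATH, [os.path.basename(HOOK_PATH)])
        with open(config_path, "w") as f:
            json.dump(config, f)
    # Replace this process with the real runtime, passing all arguments through.
    os.execv(REAL_RUNTIME, [REAL_RUNTIME] + argv)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Docker can be pointed at such a wrapper by registering it under `"runtimes"` in /etc/docker/daemon.json (for example `"lfnw": {"path": "/usr/local/bin/lfnw-runtime"}`, a hypothetical install path) and starting containers with `docker run --runtime=lfnw`.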

Slide 72

Demo

Slide 73

A brief note before we finish: feedback provides valuable information to speakers!
Feedback that is very helpful:
• Topics you were excited to learn about
• Suggestions for improving understanding and clarity
Feedback that is extremely unhelpful:
• Comments unrelated to talk content (please refer to the LinuxFest Northwest Code of Conduct)
Reach out for Q&A (https://discuss.lfnw.org, @samuelkarp)
For support, use the AWS Forums or contact AWS Support

Slide 74

Questions?
Q&A at https://discuss.lfnw.org
Also available on Twitter – @samuelkarp

Slide 75

Thank you!
Samuel Karp
@samuelkarp