Slide 1

CONTAINERIZATION PRIMITIVES Sam Kottler @samkottler

Slide 2

ABOUT ME • Work at DigitalOcean as a systems engineer • Formerly of Red Hat, Venmo, Acquia • Committer/core for Puppet, Ansible, Fedora, CentOS, RubyGems, Bundler

Slide 3

WE’RE GONNA BE TALKING ABOUT LINUX

Slide 4

GOOD TO KNOW • What a syscall is • A basic understanding of Linux networking • Containers vs. virtualization

Slide 5

WHY DO WE CARE ABOUT ANY OF THIS?

Slide 6

CONTAINERS ARE THE PAST*, PRESENT, AND FUTURE • *Most of the Linux ideas are poached from other OSes

Slide 7

VIRTUALIZATION HAS BECOME MASSIVELY POPULAR BECAUSE OF ITS ECONOMICS

Slide 8

CONTAINERS ARE BECOMING MASSIVELY POPULAR BECAUSE THEY ALLOW LOGICAL SEPARATION

Slide 9

APPLICATION VS. FULL CONTAINERS

Slide 10

NETWORKS, USERS, AND PROCESSES

Slide 11

NAMESPACES • mnt: mount points (filesystem) • pid: process IDs • net: network stack • ipc: SysV IPC • uts: hostname • user: UIDs and GIDs

Slide 12

THE BASICS • Namespaces do not have names • Six entries, one per namespace type, exist under /proc//ns • Each namespace is identified by a unique inode number
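
A quick way to see these inodes is to read the symlinks in /proc/self/ns, which is the calling process's own view. The Python sketch below is illustrative only. The helper names are hypothetical, and it assumes each link target has the form `type:[inode]`. Newer kernels also list cgroup and time entries (plus `*_for_children` variants), so expect more than six.

```python
import os
import re
from typing import Dict, Tuple

# Each /proc/self/ns entry is a symlink whose target looks like "net:[4026531992]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target: str) -> Tuple[str, int]:
    """Split a namespace link target into (namespace type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link target: {target!r}")
    return match.group(1), int(match.group(2))


def namespace_inodes(ns_dir: str = "/proc/self/ns") -> Dict[str, int]:
    """Map each entry in ns_dir (e.g. "net", "pid") to its namespace inode."""
    inodes = {}
    for entry in sorted(os.listdir(ns_dir)):
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        except (OSError, ValueError):
            continue  # unreadable or unexpected entry; skip it
        inodes[entry] = inode
    return inodes


if __name__ == "__main__":
    for name, inode in namespace_inodes().items():
        print(f"{name:<20} {inode}")
```

Two processes share a namespace exactly when these inode numbers match.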

Slide 13

USERSPACE TOOLING • iproute2 (ip netns) • util-linux (unshare, nsenter) • systemd (systemd-nspawn)

Slide 14

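NAMESPACE SYSCALLS • unshare() • moves the calling process into one or more new namespaces (for PID namespaces, only its future children) • clone() • creates a new process in one or more new namespaces • setns() • moves the calling thread into an existing namespace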
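
Here is a minimal sketch of unshare() in action. It is written in Python and reaches libc's unshare() through ctypes. Python's os module has no clone(), so os.fork() followed by unshare() stands in for it. The sketch assumes unprivileged user namespaces are enabled on the host. Pairing CLONE_NEWUSER with CLONE_NEWUTS lets an ordinary user change the hostname, but only inside the new namespace. The function names are hypothetical.

```python
import ctypes
import os
import socket
from typing import Optional

# Flag values from <sched.h> (Linux).
CLONE_NEWUTS = 0x04000000
CLONE_NEWUSER = 0x10000000

# CDLL(None) exposes symbols already loaded into the process, including libc.
_libc = ctypes.CDLL(None, use_errno=True)


def unshare(flags: int) -> None:
    """Call unshare(2), raising OSError on failure."""
    if _libc.unshare(flags) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def hostname_in_new_uts_ns(name: str) -> Optional[str]:
    """Fork a child that unshares user + UTS namespaces and sets a hostname.

    Returns the hostname the child observed, or None if the kernel refused.
    The parent's namespaces are never modified.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        try:
            os.close(read_fd)
            # The new user namespace gives the child CAP_SYS_ADMIN over the
            # new UTS namespace, so sethostname() works without real root.
            unshare(CLONE_NEWUSER | CLONE_NEWUTS)
            socket.sethostname(name)
            os.write(write_fd, socket.gethostname().encode())
        finally:
            os._exit(0)  # errors (e.g. EPERM) just leave the pipe empty
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as r:
        observed = r.read().decode()
    os.waitpid(pid, 0)
    return observed or None
```

Afterwards, the parent's `socket.gethostname()` is unchanged, which is the isolation the slide describes.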

Slide 15

NETWORK ISOLATION • Each network device belongs to exactly one namespace • A single default namespace, init_net • Every network namespace gets its own lo device

Slide 16

NETWORK NAMESPACES IN PRACTICE • ip netns add testns1 • creates /var/run/netns/testns1 • Routes are managed per namespace • Bonds cannot span namespaces • setns(int fd, int nstype) • rejects an FD whose namespace type doesn't match nstype (0 accepts any type)
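
Joining the namespace that `ip netns add` creates can be sketched the same way. Open /var/run/netns/testns1 and pass the descriptor to setns() with CLONE_NEWNET. This Python sketch assumes root (or CAP_SYS_ADMIN). It joins in a forked child so the caller's own namespace never changes. The helper name is hypothetical.

```python
import ctypes
import os
import socket
from typing import List, Optional

CLONE_NEWNET = 0x40000000  # from <sched.h>
_libc = ctypes.CDLL(None, use_errno=True)


def interfaces_in_netns(ns_path: str) -> Optional[List[str]]:
    """Return interface names visible inside the network namespace at ns_path.

    Runs setns(2) in a forked child so the caller's namespace is untouched.
    Returns None if the namespace cannot be opened or joined.
    """
    try:
        ns_fd = os.open(ns_path, os.O_RDONLY)
    except OSError:
        return None  # e.g. the namespace file does not exist
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        try:
            os.close(read_fd)
            # nstype=CLONE_NEWNET makes the kernel reject (EINVAL) an fd
            # that refers to some other kind of namespace.
            if _libc.setns(ns_fd, CLONE_NEWNET) == 0:
                names = [name for _, name in socket.if_nameindex()]
                os.write(write_fd, "\n".join(names).encode())
        finally:
            os._exit(0)  # never return into the parent's code path
    os.close(write_fd)
    os.close(ns_fd)
    with os.fdopen(read_fd, "rb") as r:
        data = r.read().decode()
    os.waitpid(pid, 0)
    return data.split("\n") if data else None
```

After `ip netns add testns1`, calling `interfaces_in_netns("/var/run/netns/testns1")` as root should list only `lo`, because a fresh network namespace contains nothing else.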

Slide 17

SOCKET ISOLATION • Every socket belongs to exactly one network namespace • sk_net is a field of struct sock • sock_net()/sock_net_set() are its getter/setter

Slide 18

SOCKET ACTIVATION • Listen on a socket with no service running behind it • When a request arrives, the service is spun up and responds • Enables 10k+ low-usage services on a single VM
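
Socket activation relies on an environment-variable handoff. systemd sets LISTEN_PID and LISTEN_FDS, then passes the listening sockets as consecutive descriptors starting at 3. This is the convention libsystemd's sd_listen_fds() follows. The Python sketch below reimplements just that check without libsystemd. The fallback port 8080 and the function names are assumptions. It also skips extras such as LISTEN_FDNAMES.

```python
import os
import socket
from typing import List, Mapping, Optional

SD_LISTEN_FDS_START = 3  # first descriptor systemd uses for passed sockets


def listen_fds(env: Optional[Mapping[str, str]] = None,
               pid: Optional[int] = None) -> List[int]:
    """Return the descriptors systemd handed to this process, if any.

    LISTEN_PID must equal our PID; LISTEN_FDS says how many consecutive
    descriptors, starting at fd 3, were passed.
    """
    env = os.environ if env is None else env
    pid = os.getpid() if pid is None else pid
    if env.get("LISTEN_PID") != str(pid):
        return []  # variables absent or meant for another process
    try:
        count = int(env.get("LISTEN_FDS", "0"))
    except ValueError:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def get_listener(port: int = 8080) -> socket.socket:
    """Use an activated socket when present, otherwise bind one ourselves."""
    fds = listen_fds()
    if fds:
        # Family and type are detected from the existing descriptor.
        return socket.socket(fileno=fds[0])
    return socket.create_server(("127.0.0.1", port))
```

With this handoff, the service does not need to bind at all when started on demand. It simply adopts whatever descriptor systemd passed.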

Slide 19

USER ISOLATION • Allows unprivileged use of namespaces • Often created first, at the start of a chain of namespaces • Unmapped UIDs show up as the overflow UID (usually 65534)
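
The overflow behaviour is easy to observe. A process in a fresh user namespace with no UID mapping sees itself as the overflow UID. After writing a one-line /proc/self/uid_map, it sees itself as root (0), but only inside the namespace. This Python sketch assumes unprivileged user namespaces are enabled and uses hypothetical helper names.

```python
import ctypes
import os
from typing import Optional

CLONE_NEWUSER = 0x10000000  # from <sched.h>
_libc = ctypes.CDLL(None, use_errno=True)


def overflow_uid() -> int:
    """UID the kernel reports for IDs with no mapping (usually 65534)."""
    try:
        with open("/proc/sys/kernel/overflowuid") as f:
            return int(f.read())
    except OSError:
        return 65534  # common default if the sysctl is unavailable


def uid_in_new_userns(map_to_root: bool) -> Optional[int]:
    """Return geteuid() as seen by a child that entered a new user namespace.

    With map_to_root=True the child maps its outer UID to 0 inside the
    namespace; otherwise it stays unmapped. None means the kernel refused.
    """
    outer_uid = os.geteuid()
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        try:
            os.close(read_fd)
            if _libc.unshare(CLONE_NEWUSER) == 0:
                if map_to_root:
                    # Format: inside-ID outside-ID count.
                    with open("/proc/self/uid_map", "w") as f:
                        f.write(f"0 {outer_uid} 1\n")
                os.write(write_fd, str(os.geteuid()).encode())
        finally:
            os._exit(0)  # failures just leave the pipe empty
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as r:
        data = r.read().decode()
    os.waitpid(pid, 0)
    return int(data) if data else None
```

The "root" inside the namespace has no extra privileges over the host, which is why user namespaces are a common first link in a container's namespace chain.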

Slide 20

CGROUPS • Resource management • Around since 2006/2007 • Widely used by userspace management tools

Slide 21

CGROUPS + NAMESPACES • “This PID can only see part of the filesystem” • “This PID can only see part of the filesystem, use 384 MB of memory, and run on a single CPU.”

Slide 22

CGROUP IMPLEMENTATION • Hooks into fork() and exit() • A new virtual filesystem type called “cgroup” • Extra cgroup bookkeeping in task_struct • Procfs entry in /proc//cgroup • All actions take place through the filesystem
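
Each line of /proc/self/cgroup has the form `hierarchy-ID:controller-list:cgroup-path`, so the procfs view is simple to parse. The Python sketch below assumes that format (cgroup v2 shows up as a single `0::` line) and uses hypothetical names.

```python
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: List[str]  # empty for the cgroup v2 unified hierarchy
    path: str               # relative to the hierarchy's mount point


def parse_proc_cgroup(text: str) -> List[CgroupEntry]:
    """Parse text in the /proc/self/cgroup format."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        # maxsplit=2 keeps any ':' that appears inside the path itself.
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append(
            CgroupEntry(int(hierarchy_id), [c for c in controllers.split(",") if c], path)
        )
    return entries


def current_cgroups() -> List[CgroupEntry]:
    """Cgroup membership of the calling process."""
    with open("/proc/self/cgroup") as f:
        return parse_proc_cgroup(f.read())


if __name__ == "__main__":
    for entry in current_cgroups():
        print(entry)
```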

Slide 23

CGROUP MANAGEMENT • 4 files per cgroup • tasks • cgroup.procs • cgroup.event_control • notify_on_release
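
Because every cgroup action is a file operation, confining a process takes one mkdir and a couple of writes. This sketch makes several assumptions:

* A cgroup v1 memory hierarchy is mounted at /sys/fs/cgroup/memory. The path varies by distro, and cgroup v2 uses memory.max and has no tasks file.
* The caller has root privileges.
* The function names are hypothetical.

It reuses the 384 MB figure from the earlier slide, and its `__main__` block only does a dry run against a scratch directory.

```python
import os
import tempfile

# Assumed mount point of a cgroup v1 "memory" hierarchy; varies by distro.
DEFAULT_ROOT = "/sys/fs/cgroup/memory"


def write_control(cgroup_dir: str, filename: str, value: str) -> None:
    """Every cgroup action is a plain write to a file in the cgroup's directory."""
    with open(os.path.join(cgroup_dir, filename), "w") as f:
        f.write(value)


def confine(pid: int, name: str, memory_bytes: int, root: str = DEFAULT_ROOT) -> str:
    """Create cgroup `name`, cap its memory, and move `pid` into it."""
    cgroup_dir = os.path.join(root, name)
    # mkdir is all it takes; on a real cgroupfs the kernel fills in the files.
    os.makedirs(cgroup_dir, exist_ok=True)
    write_control(cgroup_dir, "memory.limit_in_bytes", str(memory_bytes))
    # cgroup.procs moves every thread of the process; "tasks" moves one thread.
    write_control(cgroup_dir, "cgroup.procs", str(pid))
    return cgroup_dir


if __name__ == "__main__":
    # Dry run against a scratch directory instead of the real cgroupfs.
    with tempfile.TemporaryDirectory() as scratch:
        path = confine(os.getpid(), "demo", 384 * 1024 * 1024, root=scratch)
        print(sorted(os.listdir(path)))
```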

Slide 24

CPU • Split into “shares” • Default is 1024 shares • CPU time is divided in proportion to shares
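
Shares are relative weights, so when every group is busy the split is plain arithmetic: each cgroup gets its shares divided by the total. The worked sketch below assumes two saturated cgroup v1 groups. It ignores hierarchy, CFS quotas, and idle groups (whose unused time goes to the others). The group names are hypothetical.

```python
from fractions import Fraction
from typing import Dict

DEFAULT_SHARES = 1024  # cgroup v1 default for cpu.shares


def cpu_split(shares: Dict[str, int]) -> Dict[str, Fraction]:
    """Fraction of total CPU time each cgroup gets when all of them are busy."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one cgroup needs a positive share count")
    return {name: Fraction(count, total) for name, count in shares.items()}


if __name__ == "__main__":
    split = cpu_split({"web": DEFAULT_SHARES, "batch": 2 * DEFAULT_SHARES})
    for name, fraction in split.items():
        print(f"{name}: {fraction} ({float(fraction):.0%})")
```

With 1024 shares for web and 2048 for batch, batch gets twice web's CPU time under contention, or 2/3 of the total.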

Slide 25

MEMORY • Exposes most of the memory subsystem • NUMA management • Most complex type of cgroup

Slide 26

LET’S TALK ABOUT SECURITY…

Slide 27

SHARING A KERNEL IS INHERENTLY LESS SECURE

Slide 28

KERNEL VULNERABILITIES AROUND BREAKOUT ARE USUALLY MITIGATED BY RUNNING SERVICES NON-PRIVILEGED

Slide 29

THANKS! • @samkottler • https://github.com/skottler