CONTAINERIZATION
PRIMITIVES
Sam Kottler
@samkottler
Slide 2
ABOUT ME
• Work at DigitalOcean as a systems engineer
• Formerly of Red Hat, Venmo, Acquia
• Committer/core for Puppet, Ansible, Fedora,
CentOS, RubyGems, Bundler
Slide 3
WE’RE GONNA BE TALKING
ABOUT LINUX
Slide 4
GOOD TO KNOW
• What a syscall is
• Basic understanding of Linux networking
• Containers vs. virtualization
Slide 5
WHY DO WE CARE ABOUT
ANY OF THIS?
Slide 6
CONTAINERS ARE THE PAST *,
PRESENT, AND FUTURE
* Most of the Linux ideas are poached from other OSes
Slide 7
VIRTUALIZATION HAS
BECOME MASSIVELY POPULAR
BECAUSE OF ITS ECONOMICS
Slide 8
CONTAINERS ARE BECOMING
MASSIVELY POPULAR
BECAUSE THEY ALLOW
LOGICAL SEPARATION
NAMESPACE SYSCALLS
• unshare()
• moves the calling process into a new namespace
• clone()
• creates a new process, optionally in new namespaces
• setns()
• moves the calling process into an existing namespace
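One way to watch these syscalls at work is through the symlinks under /proc/<pid>/ns. Two processes share a namespace when their links carry the same inode number; unshare() or clone() with a CLONE_NEW* flag produces a new number, and setns() adopts an existing one. The sketch below only reads those links. The helper names parse_ns_link and list_namespaces are made up for illustration, and it assumes a Linux /proc (on other systems it returns an empty dict).

```python
import os
import re

# Link targets under /proc/<pid>/ns look like "net:[4026531992]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (kind, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def list_namespaces(pid="self"):
    """Return {entry name: inode} for a process, or {} if /proc is unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    if not os.path.isdir(ns_dir):
        return result
    for entry in sorted(os.listdir(ns_dir)):
        try:
            _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        except (OSError, ValueError):
            continue  # unreadable or unexpected entry: skip it
        result[entry] = inode
    return result


if __name__ == "__main__":
    for name, inode in list_namespaces().items():
        print(f"{name:20s} {inode}")
```

Running it before and after an unshare in another shell should show which inode numbers changed.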
Slide 15
NETWORK ISOLATION
• Each network device belongs to exactly one namespace
• Single default (initial) namespace, init_net
• A lo device is included in every network namespace
Slide 16
NETWORK NAMESPACES IN
PRACTICE
• ip netns add testns1
• creates /var/run/netns/testns1
• route management per-NS
• prevents cross-NS bonds
• setns(int fd, int nstype)
• validates namespace type vs. FD
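The type check in setns() can be modelled in a few lines. In the real call, fd would come from opening a file such as /var/run/netns/testns1, and a mismatch between the namespace behind fd and a non-zero nstype fails with EINVAL. The sketch below is only a userspace model of that rule, not the kernel code; setns_would_accept is a hypothetical helper, while the flag values are the CLONE_NEW* constants from the Linux headers.

```python
# CLONE_NEW* flag values, keyed by the kind shown in /proc/<pid>/ns links.
CLONE_FLAGS = {
    "mnt": 0x00020000,     # CLONE_NEWNS
    "cgroup": 0x02000000,  # CLONE_NEWCGROUP
    "uts": 0x04000000,     # CLONE_NEWUTS
    "ipc": 0x08000000,     # CLONE_NEWIPC
    "user": 0x10000000,    # CLONE_NEWUSER
    "pid": 0x20000000,     # CLONE_NEWPID
    "net": 0x40000000,     # CLONE_NEWNET
}


def setns_would_accept(fd_kind, nstype):
    """Model of the nstype check: 0 accepts any kind, otherwise it must match.

    fd_kind is the namespace kind the file descriptor refers to ("net", ...).
    """
    if nstype == 0:
        return True
    return CLONE_FLAGS.get(fd_kind) == nstype


if __name__ == "__main__":
    # An fd for a network namespace, joined with the matching and a wrong type.
    print(setns_would_accept("net", CLONE_FLAGS["net"]))
    print(setns_would_accept("net", CLONE_FLAGS["pid"]))
```

Passing nstype=0 is convenient when the caller already knows what the fd refers to; passing an explicit type guards against being handed the wrong fd.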
Slide 17
SOCKET ISOLATION
• Sockets are mapped into network namespaces
• Each socket belongs to exactly one network namespace
• sk_net is part of the sock struct
• sock_net()/sock_net_set() getter/setter
Slide 18
SOCKET ACTIVATION
• Listen on a socket, but have no services behind it
• Request arrives, service is spun up, responds
• Enabling 10k+ low-usage services on a VM
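The listen-first, start-later flow can be sketched in a single process. This is a toy model only: real socket activation is normally done by a supervisor (inetd or systemd style) that hands the listening socket to a freshly started service. LazyService, activate_on_demand, and demo are hypothetical names, and the "service" here is just an object that echoes its input.

```python
import socket
import threading


class LazyService:
    """Toy stand-in for a service that is not running until needed."""

    def __init__(self):
        self.started = False

    def start(self):
        self.started = True

    def handle(self, data):
        return b"echo:" + data


def activate_on_demand(listener, service, max_requests=1):
    """Accept connections; start the service only when the first one arrives."""
    for _ in range(max_requests):
        conn, _addr = listener.accept()
        with conn:
            if not service.started:
                service.start()
            conn.sendall(service.handle(conn.recv(1024)))


def demo():
    """Return (started before request, started after request, reply)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen()
    port = listener.getsockname()[1]

    service = LazyService()
    started_before = service.started
    worker = threading.Thread(target=activate_on_demand, args=(listener, service))
    worker.start()

    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(b"ping")
        reply = client.recv(1024)

    worker.join(timeout=5)
    listener.close()
    return started_before, service.started, reply


if __name__ == "__main__":
    print(demo())
```

The socket exists the whole time, so clients never see a refused connection; only the work behind it is deferred, which is what makes many idle services cheap.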
Slide 19
USER ISOLATION
• Allows unprivileged users to create namespaces
• Often used as the start of a namespace chain
• Unmapped UIDs appear as the overflow UID (65534 by default)
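The overflow behaviour follows from how /proc/<pid>/uid_map works: each line reads "ID-inside ID-outside length", and any outside UID not covered by a line shows up inside the namespace as the overflow UID (65534 unless /proc/sys/kernel/overflowuid was changed). A minimal model of that lookup, with hypothetical helper names and made-up sample maps:

```python
OVERFLOW_UID = 65534  # default value of /proc/sys/kernel/overflowuid


def parse_uid_map(text):
    """Parse uid_map content into (inside, outside, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore blank or malformed lines
        inside, outside, length = (int(field) for field in fields)
        entries.append((inside, outside, length))
    return entries


def uid_seen_inside(outside_uid, uid_map, overflow=OVERFLOW_UID):
    """Translate a UID from the parent namespace into the child's view."""
    for inside, outside, length in uid_map:
        if outside <= outside_uid < outside + length:
            return inside + (outside_uid - outside)
    return overflow  # unmapped: the namespace sees the overflow UID


if __name__ == "__main__":
    # Sample map: outside UID 1000 becomes root (0) inside the namespace.
    mapping = parse_uid_map("0 1000 1\n")
    print(uid_seen_inside(1000, mapping))
    print(uid_seen_inside(0, mapping))
```

With that sample map, the user who created the namespace is root inside it, while files owned by the real root appear to belong to the overflow UID.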
Slide 20
CGROUPS
• Resource management
• Around since 2006/2007; merged into mainline in Linux 2.6.24
• Widely used by userspace management tools
Slide 21
CGROUPS + NAMESPACES
• “This PID can only see part of the filesystem”
• “This PID can only see part of the filesystem, use
384 MB of memory, and utilize a single CPU.”
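The resource half of that sentence comes down to a few values written into cgroup control files. The sketch below only computes them and writes nothing. It assumes the cgroup v1 file names memory.limit_in_bytes and cpuset.cpus (cgroup v2 uses memory.max for the memory limit), and limits_for is a hypothetical helper.

```python
def limits_for(memory_mb, cpu_index):
    """Return {(controller, control file): value} for a memory and CPU limit.

    Assumes cgroup v1 naming. Nothing is written to the filesystem here.
    """
    return {
        ("memory", "memory.limit_in_bytes"): str(memory_mb * 1024 * 1024),
        ("cpuset", "cpuset.cpus"): str(cpu_index),
    }


if __name__ == "__main__":
    # "384 MB of memory and a single CPU" (CPU 0 chosen arbitrarily).
    for (controller, name), value in limits_for(384, 0).items():
        print(f"{controller}: {name} = {value}")
```

Applying the limits would mean writing each value into the matching file of a cgroup directory and then adding the PID to that cgroup; the filesystem view is restricted separately, by a mount namespace.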
Slide 22
CGROUP IMPLEMENTATION
• Hooks into fork() and exit()
• A new virtual filesystem type called “cgroup”
• More complex descriptors for task_struct
• Procfs entry in /proc/<pid>/cgroup
• All actions take place through the filesystem
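Because everything goes through the filesystem, inspecting a task's cgroup membership is just reading a file. Each line of /proc/<pid>/cgroup has the form "hierarchy-ID:controller-list:cgroup-path". A small parser, with a hypothetical function name and made-up sample input:

```python
def parse_proc_cgroup(text):
    """Parse /proc/<pid>/cgroup content into (hierarchy id, controllers, path)."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at most twice: the path itself may contain ':'.
        hierarchy, controllers, path = line.split(":", 2)
        names = controllers.split(",") if controllers else []
        entries.append((int(hierarchy), names, path))
    return entries


if __name__ == "__main__":
    # Illustrative sample, not captured from a real system.
    sample = "4:memory:/demo\n3:cpu,cpuacct:/demo\n0::/user.slice\n"
    for hierarchy, names, path in parse_proc_cgroup(sample):
        print(hierarchy, names, path)
```

On a Linux host the same function can be fed the contents of /proc/self/cgroup.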