Containerization primitives

Sam Kottler
November 05, 2014

Transcript

  1. ABOUT ME
    • Work at DigitalOcean as a systems engineer
    • Formerly of Red Hat, Venmo, Acquia
    • Committer/core for Puppet, Ansible, Fedora, CentOS, RubyGems, Bundler
  2. GOOD TO KNOWS
    • What is a syscall
    • Basic understanding of Linux networking
    • Containers vs. virtualization
  3. CONTAINERS ARE THE PAST*, PRESENT, AND FUTURE
    * Most of the Linux ideas are poached from other OSes
  4. NAMESPACES
    • mnt: filesystem
    • pid: process
    • net: network
    • ipc: SysV IPC
    • uts: hostname
    • user: UID
  5. THE BASICS
    • Namespaces do not have names
    • Six inodes exist under /proc/<pid>/ns
    • Each namespace has a unique inode
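
The inode identity described on this slide can be inspected from user space. The following is a minimal sketch, assuming a Linux host with /proc mounted; the helper names (`parse_ns_link`, `list_namespaces`, `same_namespace`) are hypothetical and chosen for illustration. Newer kernels expose more than six entries in this directory, so the sketch does not hard-code the count.

```python
import os
import re

# Link targets under /proc/<pid>/ns look like "net:[4026531992]".
_NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (kind, inode)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def list_namespaces(pid="self"):
    """Return {entry_name: inode} for one process, or {} if /proc is unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        entries = sorted(os.listdir(ns_dir))
    except OSError:
        return result
    for name in entries:
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue  # entry vanished or is not readable by this user
        result[name] = inode
    return result


def same_namespace(pid_a, pid_b, kind):
    """Two processes share a namespace when the inode numbers match."""
    a = list_namespaces(pid_a).get(kind)
    b = list_namespaces(pid_b).get(kind)
    return a is not None and a == b
```

Calling `list_namespaces()` with no argument inspects the current process; comparing two PIDs with `same_namespace(pid_a, pid_b, "net")` is one way to tell whether they share a network namespace.
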
  6. NAMESPACE SYSCALLS
    • unshare(): moves the calling process into a new namespace
    • clone(): creates a new process in a new namespace
    • setns(): joins an existing namespace
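
As a rough illustration of unshare(), the sketch below forks a child and has it try to leave the parent's UTS namespace, then checks whether the namespace inode changed. It assumes Python 3.12 or later (where `os.unshare` and `os.CLONE_NEWUTS` are available) and a Linux host; without CAP_SYS_ADMIN the call is expected to fail, which the sketch reports rather than treats as an error. The function names are hypothetical. Python does not expose clone() with namespace flags directly, so fork followed by unshare is used here as an approximation.

```python
import os


def uts_inode():
    """Inode number identifying the current UTS (hostname) namespace."""
    return os.stat("/proc/self/ns/uts").st_ino


def try_unshare_uts():
    """Fork a child, unshare its UTS namespace, and report what happened.

    Returns "unsupported", "denied", or "new-namespace".
    """
    if not hasattr(os, "unshare") or not os.path.exists("/proc/self/ns/uts"):
        return "unsupported"
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: only this process changes namespace; the parent is untouched.
        verdict = "denied"
        try:
            os.close(read_fd)
            before = uts_inode()
            os.unshare(os.CLONE_NEWUTS)
            if uts_inode() != before:
                verdict = "new-namespace"
        except OSError:
            pass  # typically EPERM when the process lacks CAP_SYS_ADMIN
        finally:
            try:
                os.write(write_fd, verdict.encode())
            finally:
                os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        verdict = pipe.read()
    os.waitpid(pid, 0)
    return verdict
```

Running the unshare in a child keeps the calling process in its original namespace, which makes the sketch safe to try on a shared machine.
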
  7. NETWORK ISOLATION
    • Each network device belongs to exactly one namespace
    • Single default namespace, init_net(*nets)
    • A lo device is included in every ns_net
  8. NETWORK NAMESPACES IN PRACTICE
    • ip netns add testns1: creates /var/run/netns/testns1
    • route management per-NS
    • prevents cross-NS bonds
    • setns(int fd, int nstype): validates namespace type vs. FD
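
The setns(int fd, int nstype) call on this slide can be sketched from Python, assuming Python 3.12 or later (for `os.setns` and `os.CLONE_NEWNET`), root privileges, and a namespace previously created with `ip netns add testns1`. The helper names are hypothetical. Passing `os.CLONE_NEWNET` as the type asks the kernel to reject the descriptor if it does not refer to a network namespace.

```python
import os

NETNS_DIR = "/var/run/netns"  # where `ip netns add` pins named namespaces


def netns_path(name):
    """Path of the file that `ip netns add <name>` creates."""
    return os.path.join(NETNS_DIR, name)


def enter_netns(name):
    """Try to join a named network namespace; return True on success."""
    if not hasattr(os, "setns"):
        return False  # Python older than 3.12 or a non-Linux platform
    try:
        fd = os.open(netns_path(name), os.O_RDONLY)
    except OSError:
        return False  # namespace does not exist or is not readable
    try:
        # The second argument makes the kernel check the namespace type.
        os.setns(fd, os.CLONE_NEWNET)
        return True
    except OSError:
        return False  # typically EPERM or EINVAL
    finally:
        os.close(fd)
```

After a successful `enter_netns("testns1")`, sockets opened by the calling thread would belong to that namespace.
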
  9. SOCKET ISOLATION
    • Sockets are mapped into network namespaces
    • Each socket is part of a single network namespace
    • sk_net is part of the sock struct
    • sock_net()/sock_net_set() getter/setter
  10. SOCKET ACTIVATION
    • Listen on a socket, but have no services behind it
    • Request arrives, service is spun up, responds
    • Enabling 10k+ low-usage services on a VM
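
The listen-first, start-later idea can be shown in a single process. This is a simplified sketch, not how an init system implements it: the `LazyService` class and the function names are hypothetical, and a real setup would hand the listening socket to a separately started service process.

```python
import socket
import threading


class LazyService:
    """Hypothetical stand-in for an expensive service that starts on demand."""

    def __init__(self):
        self.started = False

    def start(self):
        self.started = True

    def handle(self, request):
        return b"echo: " + request


def activate_on_first_request(listener, service):
    """Accept one connection, start the service only then, and reply."""
    conn, _ = listener.accept()
    with conn:
        if not service.started:
            service.start()  # spun up because a request arrived
        conn.sendall(service.handle(conn.recv(1024)))


def demo():
    service = LazyService()
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    started_before = service.started  # socket is listening, service is not up

    worker = threading.Thread(
        target=activate_on_first_request, args=(listener, service)
    )
    worker.start()
    with socket.create_connection(listener.getsockname(), timeout=5) as client:
        client.sendall(b"ping")
        reply = client.recv(1024)
    worker.join()
    listener.close()
    return started_before, service.started, reply
```

The point of the design is that an idle service costs only a listening socket, which is what makes thousands of low-usage services on one VM plausible.
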
  11. USER ISOLATION
    • Allows non-privileged usage
    • Often used as the start of a namespace chain
    • UIDs come from the overflow rules
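
One reading of the overflow rule is that any UID without a mapping in the user namespace shows up as the overflow UID (65534 by default, readable from /proc/sys/kernel/overflowuid). The sketch below models that lookup against the three-column format of /proc/<pid>/uid_map; the function names are hypothetical and the example mapping is made up for illustration.

```python
def parse_id_map(text):
    """Parse uid_map content into (inside_start, outside_start, length) tuples."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            ranges.append(tuple(int(field) for field in fields))
    return ranges


def map_uid(uid_outside, ranges, overflow_uid=65534):
    """Translate a parent-namespace UID into the namespace's own view."""
    for inside, outside, length in ranges:
        if outside <= uid_outside < outside + length:
            return inside + (uid_outside - outside)
    return overflow_uid  # unmapped IDs appear as the overflow UID


def read_overflow_uid(default=65534):
    """Read the kernel's overflow UID, falling back to the usual default."""
    try:
        with open("/proc/sys/kernel/overflowuid") as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return default
```

With a mapping of `0 1000 1`, the unprivileged user 1000 appears as root inside the namespace while every other UID collapses to the overflow value.
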
  12. CGROUPS + NAMESPACES
    • “This PID can only see part of the filesystem”
    • “This PID can only see part of the filesystem, use 384 MB of memory, and utilize a single CPU.”
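
The resource half of that sentence maps to a handful of file writes. The sketch below only builds the list of writes and leaves applying them as a separate step, since that needs root. It assumes a cgroup v1 layout mounted at /sys/fs/cgroup with the memory and cpuset controllers; the group name `demo` and the PID are placeholders, and on cgroup v2 hosts the file names differ.

```python
import os

CGROUP_ROOT = "/sys/fs/cgroup"  # assumed cgroup v1 mount point


def plan_limits(group, pid, memory_mb=384, cpu="0"):
    """Return the (path, value) writes that would confine `pid`; writes nothing."""
    mem_dir = os.path.join(CGROUP_ROOT, "memory", group)
    cpu_dir = os.path.join(CGROUP_ROOT, "cpuset", group)
    return [
        (os.path.join(mem_dir, "memory.limit_in_bytes"), str(memory_mb * 1024 * 1024)),
        (os.path.join(cpu_dir, "cpuset.cpus"), cpu),
        # cpuset requires a memory node to be set before tasks can join.
        (os.path.join(cpu_dir, "cpuset.mems"), "0"),
        (os.path.join(mem_dir, "tasks"), str(pid)),
        (os.path.join(cpu_dir, "tasks"), str(pid)),
    ]


def apply_plan(plan):
    """Perform the writes; needs root and a mounted cgroup v1 hierarchy."""
    for path, value in plan:
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as handle:
            handle.write(value)
```

Printing `plan_limits("demo", 1234)` shows the five writes without touching the system.
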
  13. CGROUP IMPLEMENTATION
    • Hooks into fork() and exit()
    • VFS of a new type called “cgroup”
    • More complex descriptors for task_struct
    • Procfs entry in /proc/<pid>/cgroup
    • All actions take place on the FS
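
The procfs entry mentioned on this slide is a plain text file with one `hierarchy-ID:controller-list:path` line per hierarchy. A minimal parser might look like the following; the function names are hypothetical and the sample lines in the usage note are illustrative, not taken from a real host.

```python
def parse_proc_cgroup(text):
    """Parse lines of the form 'hierarchy-ID:controller-list:path'."""
    entries = []
    for line in text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        hierarchy, controllers, path = parts
        entries.append({
            "hierarchy": int(hierarchy),
            "controllers": controllers.split(",") if controllers else [],
            "path": path,
        })
    return entries


def read_own_cgroups():
    """Return the parsed cgroup membership of the current process."""
    try:
        with open("/proc/self/cgroup") as handle:
            return parse_proc_cgroup(handle.read())
    except (OSError, ValueError):
        return []  # no procfs, or an unexpected line format
```

For example, `parse_proc_cgroup("4:memory:/user.slice\n")` yields one entry with hierarchy 4, the `memory` controller, and the path `/user.slice`.
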
  14. CGROUP MANAGEMENT
    • 4 files per cgroup
    • tasks
    • cgroup.procs
    • cgroup.event_control
    • notify_on_release
  15. MEMORY
    • Exposes most of the memory subsystem
    • NUMA management
    • Most complex type of cgroup