Slide 1

Slide 1 text

Agenda • Virtualization and Containers • Brief History of Containers • Linux Container Internals (namespaces, cgroups, capabilities) • Union Filesystem • Demo

Slide 2

Slide 2 text

What is Virtualization? It is a layer of abstraction that emulates or simulates various resources. Why? Isolation, scalability, utilization, reducing costs, compatibility…

Slide 3

Slide 3 text

• Hardware virtualization (Virtual machine, virtual memory?) • Application virtualization (JVM) • Operating system level virtualization (Containers) • Network virtualization • …

Slide 4

Slide 4 text

What is a Virtual Machine? Creating a computer within a computer.

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

What is a Container? A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another.

Slide 7

Slide 7 text

Containers are just regular processes with some isolation and security features.
[Diagram layers: Infrastructure / Hardware; Operating System (kernel, libraries, system programs, configurations etc…); processes: CHROME (PID, env, address space…), SLACK, CONTAINER, CONTAINER]

Slide 8

Slide 8 text

Why are containers more lightweight than virtual machines?

Slide 9

Slide 9 text

Brief history of containers: 1979 chroot • 2000 FreeBSD Jails • 2001 Linux VServer • 2004 Solaris Zones • OpenVZ • 2008 LXC • 2013 Docker • rkt • 2018 Kata Containers

Slide 10

Slide 10 text

Okay, but how does it actually work?

Slide 11

Slide 11 text

Containers are a combination of many different technologies: Linux namespaces, cgroups, Linux capabilities, bridge networks, union filesystems…

Slide 12

Slide 12 text

1) Namespaces: They isolate global system resources between independent processes. History: Linux namespaces originated in 2002 in the 2.4.19 kernel with work on the mount namespace. “What happens in a namespace stays in the namespace”

Slide 13

Slide 13 text

Types of Namespaces • Mount: isolates the set of filesystem mount points seen by a group of processes • UTS: isolates the domain name and hostname • IPC: isolates certain interprocess communication resources (semaphores, message queues…) • PID: isolates the PID number space • Network: isolates network-related system resources (network devices, IP addresses, ports…) • User: isolates user and group ID number spaces • Cgroup: hides the identity of the control group of which the process is a member

Slide 14

Slide 14 text

• When a Linux system boots, the kernel creates a default namespace of each type, shared by all processes. • Processes can create additional namespaces with the unshare command or by passing CLONE_NEW* flags to the clone syscall. • The nsenter command can be used to enter a namespace. P.S. Google Chrome makes use of namespaces to isolate its own processes, which are at risk of attack from the internet.
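A minimal sketch of what the unshare route looks like from a program, assuming Linux and glibc. The flag values are the CLONE_NEW* constants from the kernel headers; the helper names `namespace_flags` and `unshare` are made up for this illustration and are not part of any library.

```python
import ctypes
import os

# CLONE_NEW* flag values as defined in <linux/sched.h>.
CLONE_FLAGS = {
    "mnt": 0x00020000,     # CLONE_NEWNS
    "cgroup": 0x02000000,  # CLONE_NEWCGROUP
    "uts": 0x04000000,     # CLONE_NEWUTS
    "ipc": 0x08000000,     # CLONE_NEWIPC
    "user": 0x10000000,    # CLONE_NEWUSER
    "pid": 0x20000000,     # CLONE_NEWPID
    "net": 0x40000000,     # CLONE_NEWNET
}


def namespace_flags(*kinds):
    """Combine namespace names into one flag mask (hypothetical helper)."""
    mask = 0
    for kind in kinds:
        mask |= CLONE_FLAGS[kind]
    return mask


def unshare(mask):
    """Call libc's unshare(2) wrapper for the calling process."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(mask) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

Calling `unshare(namespace_flags("uts"))` would move the current process into a fresh UTS namespace. It usually needs root (or an accompanying user namespace), so expect a permission error otherwise.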

Slide 15

Slide 15 text

[Figure: Parent PID Namespace and Child PID Namespace. Panels: PID tree view of parent; PID tree view of child PID namespace]
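One way to observe the two views of the same process is the NSpid line in a process's status file under /proc, which lists its PID in each nested PID namespace, outermost first. The sketch below only parses that line. The function name and the sample text are invented for illustration.

```python
def pids_across_namespaces(status_text):
    """Return the PIDs on the NSpid line, outermost namespace first."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            # Fields after the label are one PID per nested namespace.
            return [int(field) for field in line.split()[1:]]
    return []


# Invented sample: a process that is PID 6 in the parent namespace
# and PID 1 inside the child namespace.
sample = "Name:\tsh\nPid:\t6\nNSpid:\t6\t1\n"
print(pids_across_namespaces(sample))
```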

Slide 16

Slide 16 text

User namespace
Users on Host OS:
UID   GID
0     0
1     1
…     …
503   20
509   509
Users on Container:
UID   GID
0     0
…     …
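The kernel records the translation between the two tables in per-process uid_map and gid_map files, where each line holds three numbers: the first ID inside the namespace, the first ID outside, and the length of the range. A rough sketch of the lookup follows. The function names are hypothetical, and the mapping line `0 503 1` is an invented example that reuses UID 503 from the table.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map content into (inside, outside, length) tuples."""
    ranges = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            ranges.append((inside, outside, length))
    return ranges


def to_host_id(container_id, ranges):
    """Translate an ID seen inside the namespace to the ID outside it."""
    for inside, outside, length in ranges:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    return None  # this ID is not mapped


# Invented mapping: root in the container is UID 503 on the host.
ranges = parse_id_map("0 503 1\n")
print(to_host_id(0, ranges))
```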

Slide 17

Slide 17 text

• The kernel assigns each process a symbolic link per namespace type in /proc/<pid>/ns
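A small sketch for inspecting those links, assuming Linux. Each link target has the form `type:[inode]`, and two processes share a namespace when the inode numbers match. The function names are made up for this example.

```python
import os
import re

NS_LINK = re.compile(r"^(?P<kind>\w+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a link target such as 'pid:[4026531836]' into (kind, inode)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def list_namespaces(pid="self"):
    """Return {entry name: (kind, inode)} for one process."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        target = os.readlink(os.path.join(ns_dir, name))
        result[name] = parse_ns_link(target)
    return result
```

Comparing `list_namespaces()` with `list_namespaces(1)` shows which namespaces the current process shares with PID 1. Reading another process's links may require extra privileges.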

Slide 18

Slide 18 text

2) CGROUPS: Control groups (cgroups) are a Linux kernel feature that allows processes to be organized into hierarchical groups whose usage of various types of resources can then be limited and monitored. History: cgroups were originally developed by Google and merged into the Linux kernel in 2008.

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

Cgroups allow you to allocate resources (such as CPU time, system memory, network bandwidth, storage I/O, or combinations of these) among processes (or threads) running on a system. In other words:

Slide 21

Slide 21 text

Resource limiting: Limit the memory usage of a process to 100 MB
Prioritisation: Some groups may get a larger share of CPU utilization
Accounting: Measure a group's resource usage
Control: Stop, freeze or restart a group of processes
Group Profile 1 • 60% CPU • 5 GB Memory • 90% Network • 70% blkio
Applications • NGINX, postgresql, httpd…
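A hedged sketch of how the memory and CPU limits could be set, assuming the cgroup v2 (unified) interface, where a group is a directory and limits are plain files: `memory.max` takes a byte count, `cpu.max` takes a quota and period in microseconds, and `cgroup.procs` takes PIDs. The function names and the directory are hypothetical. On a real system the directory would live under the cgroup mount (commonly /sys/fs/cgroup) and writing there needs appropriate privileges.

```python
import os


def apply_limits(cgroup_dir, memory_bytes, cpu_percent, period_us=100_000):
    """Write memory and CPU limits into a cgroup v2 directory."""
    os.makedirs(cgroup_dir, exist_ok=True)
    # memory.max takes a limit in bytes.
    with open(os.path.join(cgroup_dir, "memory.max"), "w") as f:
        f.write(f"{memory_bytes}\n")
    # cpu.max takes "<quota> <period>", both in microseconds.
    quota_us = period_us * cpu_percent // 100
    with open(os.path.join(cgroup_dir, "cpu.max"), "w") as f:
        f.write(f"{quota_us} {period_us}\n")


def add_process(cgroup_dir, pid):
    """Move a process into the group by writing its PID to cgroup.procs."""
    with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as f:
        f.write(f"{pid}\n")
```

For example, `apply_limits(path, 100 * 1024 * 1024, 60)` corresponds to the 100 MB memory limit and the 60% CPU share above (treating 100 MB as 100 MiB and 60% as 60% of one CPU, both assumptions). The controllers must also be enabled in the parent group for the files to exist.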

Slide 22

Slide 22 text

cpu
├── 1
├── 100
├── 2
├── browsers
│   ├── 44
│   ├── 45
│   └── 47
└── important
    ├── 60
    ├── 61
    ├── containers
    │   ├── 200
    │   ├── 202
    │   └── 204
    └── scripts
        ├── 604
        └── 800
• The process with PID 1 belongs to the root cpu cgroup • PID 60 belongs to the important cgroup • PID 202 belongs to the important/containers cgroup
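Membership like this can be read back from a process's cgroup file under /proc, where each line has the form `hierarchy-ID:controller-list:cgroup-path`. A minimal parsing sketch follows. The function name and the sample line are invented, with the path borrowed from the tree above.

```python
def parse_cgroup_file(text):
    """Map each controller name to the cgroup path the process belongs to."""
    membership = {}
    for line in text.splitlines():
        if not line:
            continue
        # Format: hierarchy-ID:controller-list:cgroup-path
        _hierarchy_id, controllers, path = line.split(":", 2)
        # cgroup v2 entries have an empty controller list, stored under "".
        for controller in controllers.split(","):
            membership[controller] = path
    return membership


# Invented sample for a process such as PID 202 in the tree above.
sample = "4:cpu,cpuacct:/important/containers\n2:memory:/\n"
print(parse_cgroup_file(sample)["cpu"])
```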

Slide 23

Slide 23 text

• cgroup hierarchy (cgroups v1): each resource controller (cpu, memory, blkio…) has its own hierarchy • Each resource can have more than one cgroup

Slide 24

Slide 24 text

Container Security • Traditional UNIX has a very simple permission check • Processes are either privileged (root) or unprivileged (non-root users) • The root user (UID 0) is too powerful and therefore dangerous • Other users have very restricted access (can’t open raw sockets, load kernel modules, etc.)

Slide 25

Slide 25 text

So how can you run the ping command as a non-root user?

Slide 26

Slide 26 text

3) Linux Capabilities • Break up root privileges into distinct units, known as capabilities • They can be assigned to processes independently • Parent processes might pass capabilities to child processes • There are around 40 capabilities in current Linux kernels

Slide 27

Slide 27 text

Examples of Linux Capabilities • CAP_CHOWN: Make arbitrary changes to file UIDs and GIDs • CAP_KILL: Bypass permission checks for sending signals • CAP_NET_RAW: Use RAW and PACKET sockets • CAP_SYS_BOOT: Use reboot P.S. The child process created by clone() with the CLONE_NEWUSER flag starts out with a complete set of capabilities in the new user namespace
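A process's capability sets are bitmasks, visible as hexadecimal values (for example the CapEff line) in its status file under /proc. The sketch below decodes such a mask for the four capabilities listed above, using their bit numbers from the kernel's capability header. The function names and the sample status text are made up for illustration.

```python
# Bit numbers from <linux/capability.h>; only the four capabilities above.
CAPABILITIES = {
    0: "CAP_CHOWN",
    5: "CAP_KILL",
    13: "CAP_NET_RAW",
    22: "CAP_SYS_BOOT",
}


def decode_capabilities(mask):
    """Return the names of the known capabilities set in a bitmask."""
    return [name for bit, name in sorted(CAPABILITIES.items())
            if mask & (1 << bit)]


def effective_mask(status_text):
    """Extract the effective capability mask from status file content."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


# Invented sample: a process holding only CAP_NET_RAW (bit 13).
sample = "Name:\tping\nCapEff:\t0000000000002000\n"
print(decode_capabilities(effective_mask(sample)))
```

A mask like this is one way a program such as ping can open raw sockets without running as full root, assuming the binary has been granted only CAP_NET_RAW.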

Slide 28

Slide 28 text

Other security features • Seccomp • AppArmor/SELinux • TOMOYO • Nested Containers • Hardware assisted containerization • …

Slide 29

Slide 29 text

4) Union File System • Combines multiple file systems together to create a single unified filesystem • Docker uses it to layer images
$ docker pull python
Using default tag: latest
latest: Pulling from library/python
b6f892c0043b: Pull complete
55010f332b04: Pull complete
2955fb827c94: Pull complete
3deef3fcbd30: Pull complete
cf9722e506aa: Pull complete
Digest: sha256:382452f82a8bbd34443b2c727650af46aced0f94a44463c62a9848133ecb1aa8
Status: Downloaded newer image for python:latest

Slide 30

Slide 30 text

What do you see when you run the ls command in a Docker container?

Slide 31

Slide 31 text

Base Layer (debian:jessie): /bin /lib /dev /etc …
Python Layer: /bin/python /bin/pip /lib/libc.so.6 /lib/… …
Resulting File System: /bin /lib /dev /etc /usr /tmp …
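The layering idea can be illustrated with a toy model in which each layer is a dictionary of paths and upper layers shadow lower ones. This is only a sketch of the lookup rule, not how OverlayFS or Docker is implemented. The file contents, and the paths /bin/sh and /etc/hostname, are invented placeholders.

```python
def merge_layers(layers):
    """Merge layers given lowest first; upper entries shadow lower ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Toy layers: values record which layer provided the file.
base_layer = {
    "/bin/sh": "base",
    "/etc/hostname": "base",
    "/lib/libc.so.6": "base",
}
python_layer = {
    "/bin/python": "python",
    "/bin/pip": "python",
    "/lib/libc.so.6": "python",  # shadows the copy in the base layer
}

view = merge_layers([base_layer, python_layer])
for path in sorted(view):
    print(path, "from", view[path], "layer")
```

The resulting view contains every path from both layers, and for paths present in both, the copy from the upper layer is the one that is seen.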

Slide 32

Slide 32 text

[Figure: files FILE1 to FILE5 distributed across Layer 3, Layer 2 and Layer 1, merged into the single view seen by the Container]

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

DEMO https://github.com/zeyneloz/simple-container-with-go