What is a Cgroup?
- A mechanism for aggregating or partitioning sets of tasks, and all their future children, into hierarchical groups with specialized behaviour.
- First-class citizens.
- A hierarchical model, like processes, but:
  - Multiple parallel hierarchies coexist.
  - Each hierarchy is attached to one or more subsystems.
What is a subsystem?
- Represents a single resource: cpu, memory, blkio, cpuacct, cpuset, devices, freezer, ns, etc.
- Loosely, something that does something to a group of tasks: typically a controller that accounts for or limits the resource.
The kernel maintains hierarchical constraints on limits: if the devices cgroup /child1 cannot access a disk drive, then /child1/child2 cannot give itself those rights.
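One way to picture the constraint is as a minimum taken along the path from the root: a child may ask for a looser value, but the tightest ancestor still wins. The sketch below is a toy model in shell, not kernel code; the function name `effective_limit` and the sample numbers are made up for illustration.

```shell
#!/bin/sh
# Toy model of hierarchical limits: the effective limit of a cgroup is the
# smallest limit found on the path from the root down to that cgroup.
# Arguments: the limit of each cgroup on the path, root first (integers).
effective_limit() {
    min=$1
    shift
    for limit in "$@"; do
        if [ "$limit" -lt "$min" ]; then
            min=$limit
        fi
    done
    echo "$min"
}

# Root allows 2048, /child1 allows 512, /child1/child2 asks for 1024:
# child2 is still bound by the 512 set on its parent.
effective_limit 2048 512 1024
```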
Implications
- A task belongs to exactly one cgroup per hierarchy, so there is only one way it can be limited or affected by any single subsystem.
- You can group several subsystems together so that they affect all tasks in a single hierarchy.
- If cgroups in that hierarchy have different parameters set, their tasks are affected differently.
- Constant refactoring of the groups is required for the best packing (a knapsack problem).
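Because a task sits in exactly one cgroup per hierarchy, `/proc/<pid>/cgroup` carries one line per hierarchy in the form `hierarchy-ID:controller-list:cgroup-path`. A minimal sketch of a lookup over that format follows; `cgroup_of` is a hypothetical helper name, and the sample input is invented rather than captured from a real system.

```shell
#!/bin/sh
# Print the cgroup path that governs a given controller, reading
# /proc/<pid>/cgroup-formatted text ("ID:controllers:path") from stdin.
# Co-mounted controllers share one line, e.g. "4:cpu,cpuacct:/batch".
cgroup_of() {
    awk -F: -v want="$1" '
        {
            n = split($2, names, ",")
            for (i = 1; i <= n; i++) {
                if (names[i] == want) {
                    # Everything after the second colon is the path.
                    print substr($0, length($1) + length($2) + 3)
                    exit
                }
            }
        }'
}

# Invented sample; on a live system try: cgroup_of memory < /proc/self/cgroup
sample='4:cpu,cpuacct:/batch
3:memory:/batch/job1
2:devices:/'
printf '%s\n' "$sample" | cgroup_of cpuacct
```

Here `cpu` and `cpuacct` resolve to the same path because they are attached to the same hierarchy, while `memory` lives in a separate one with its own grouping.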
Manual
[email protected]:~/workspace$ cgcreate -h
Usage: cgcreate [-h] [-f mode] [-d mode] [-s mode] [-t :] [-a :] -g : [-g ...]
Create control group(s)
  -a :                Owner of the group and all its files
  -d, --dperm=mode    Group directory permissions
  -f, --fperm=mode    Group file permissions
  -g :                Control group which should be added
  -h, --help          Display this help
  -s, --tperm=mode    Tasks file permissions
  -t :                Owner of the tasks file
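A typical session with the libcgroup tools pairs `cgcreate` with `cgset`, `cgexec`, and `cgdelete`. The sketch below only prints such a session rather than running it, since the real commands need root. The helper name `cg_plan`, the group name `demo`, and the 256 MiB limit are placeholders, and `memory.limit_in_bytes` assumes a cgroup v1 memory controller.

```shell
#!/bin/sh
# Print (do not run) a typical libcgroup session for one group.
# $1 = controller list, $2 = group path, $3 = memory limit in bytes.
# Assumes cgroup v1 parameter names such as memory.limit_in_bytes.
cg_plan() {
    controllers=$1
    group=$2
    limit=$3
    echo "cgcreate -g ${controllers}:${group}"
    echo "cgset -r memory.limit_in_bytes=${limit} ${group}"
    echo "cgexec -g ${controllers}:${group} sleep 60"
    echo "cgdelete -g ${controllers}:${group}"
}

# "demo" and the 256 MiB limit are placeholder values.
cg_plan cpu,memory demo 268435456
```

Review the printed lines, then run them as root one at a time to create the group, set the limit, start a task inside it, and clean up.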
nsenter
[email protected]:~$ nsenter --help
Options:
  -a, --all               enter all namespaces
  -t, --target            target process to get namespaces from
  -m, --mount[=]          enter mount namespace
  -u, --uts[=]            enter UTS namespace (hostname etc)
  -i, --ipc[=]            enter System V IPC namespace
  -n, --net[=]            enter network namespace
  -p, --pid[=]            enter pid namespace
  -C, --cgroup[=]         enter cgroup namespace
  -U, --user[=]           enter user namespace
  -S, --setuid            set uid in entered namespace
  -G, --setgid            set gid in entered namespace
      --preserve-credentials do not touch uids or gids
  -r, --root[=]           set the root directory
  -w, --wd[=]             set the working directory
  -F, --no-fork           do not fork before exec'ing
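Before entering another process's namespace it can help to check whether you already share it. Each process exposes its namespaces as symlinks under `/proc/<pid>/ns/`, and two processes are in the same namespace when the link targets match. The sketch below assumes a Linux `/proc`; `same_namespace` is a hypothetical helper name and the PID in the closing comment is a placeholder.

```shell
#!/bin/sh
# Succeed if two processes share the namespace of the given type
# (net, pid, mnt, uts, ipc, user, cgroup). Each /proc/<pid>/ns/<type>
# symlink reads like "net:[4026531992]"; equal targets mean same namespace.
# Returns 2 if either link cannot be read.
same_namespace() {
    a=$(readlink "/proc/$2/ns/$1") || return 2
    b=$(readlink "/proc/$3/ns/$1") || return 2
    [ "$a" = "$b" ]
}

# A process trivially shares every namespace with itself.
if same_namespace net self self; then
    echo "same net namespace"
fi

# With a real target PID (placeholder 1234), entering its network
# namespace would look like:
#   sudo nsenter -t 1234 -n ip link
```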
net namespace
[email protected]:~/workspace/meson10/linuxlab$ ip link
1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: wlp58s0: mtu 1500 qdisc mq state UP mode DORMANT group default qlen 1000
    link/ether 18:5e:0f:ee:d9:32 brd ff:ff:ff:ff:ff:ff
9: bridge0: mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
net namespace
[email protected]:~/workspace/meson10/linuxlab# ip link
1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
net namespace
[email protected]:~$ sudo unshare -n /bin/bash
[sudo] password for meson10:
[email protected]:~/workspace/meson10/linuxlab# ip addr
1: lo: mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
[email protected]:~/workspace/meson10/linuxlab# ping localhost
connect: Network is unreachable
[email protected]:~/workspace/meson10/linuxlab# ip link set dev lo up && ping localhost
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.033 ms
net namespace implications
- A network device can belong to only one namespace at a time.
- If eth0 stays in the root namespace, a newly created namespace has no internet access.
net namespace solution
$: sudo ip link add host type veth peer name guest
$: sudo ip link set guest netns
$: sudo ip addr add 192.168.0.2/24 dev host
$: sudo ip link set host up
$ns: ip addr add 192.168.0.1/24 dev guest
$ns: ip link set guest up

$: brctl addbr bridge0
$: ip addr add 192.168.1.2/24 dev bridge0
$: ip link set dev bridge0 up
$: brctl addif bridge0 host
$: ip link set host up
$ns: ip addr add 192.168.1.1/24 dev guest
$ns: ip link set guest up
$ns: ip route add default via 192.168.1.2
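The veth sequence above can be parameterised by the PID of any process living in the target namespace, since `ip link set DEV netns PID` accepts a PID. The sketch below only prints the commands so it can be reviewed without root. The helper name `veth_plan` and the PID 4242 are placeholders; the interface names and addresses are the ones from the first sequence.

```shell
#!/bin/sh
# Print (do not run) the veth-pair setup for a namespace identified by the
# PID of any process living in it. Lines prefixed "host:" are meant for the
# root namespace, lines prefixed "ns:" for a shell inside the new namespace.
veth_plan() {
    pid=$1
    echo "host: ip link add host type veth peer name guest"
    echo "host: ip link set guest netns ${pid}"
    echo "host: ip addr add 192.168.0.2/24 dev host"
    echo "host: ip link set host up"
    echo "ns:   ip addr add 192.168.0.1/24 dev guest"
    echo "ns:   ip link set guest up"
}

# 4242 is a placeholder PID, e.g. the bash started by "unshare -n".
veth_plan 4242
```

Once both ends are up, pinging 192.168.0.2 from inside the namespace is a quick way to confirm the pair is connected.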
pid namespace
$: sudo unshare -p /bin/bash
- The child process enters a new PID namespace and gets PID 1.
- A forked process gets a PID inside the namespace and a global PID.
- Signals:
  - PID 1 only receives signals for which it has registered explicit handlers.
  - This is why Ctrl-C doesn't work in Docker when PID 1 has no SIGINT handler.
- When a child dies, its children (the grandchildren) are reparented to PID 1.
- If PID 1 dies:
  - its children get SIGKILL, recursively;
  - the namespace is deleted.
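Registering an explicit handler is what makes a PID 1 process react to a signal. The sketch below shows the registration step with the shell's `trap` builtin. It runs as an ordinary child process, so it demonstrates the handler itself rather than the PID 1 behaviour; `handled_signal_demo` is a hypothetical name.

```shell
#!/bin/sh
# Run a child shell that installs a SIGTERM handler and then signals itself.
# The handler runs once the kill builtin returns and exits the child shell.
handled_signal_demo() {
    sh -c '
        trap "echo caught SIGTERM; exit 0" TERM
        kill -TERM $$
        echo "still running"   # skipped: the handler exits first
    '
}

handled_signal_demo
```

The same `trap` line in an entrypoint script that runs as PID 1 of a PID namespace is what lets a SIGTERM sent from inside that namespace take effect; without it the signal would be ignored.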
mnt namespace
- An isolated list of mount points.
- unshare copies the parent's mount points.
- Mount events may propagate conditionally between namespaces.
- Private by default.
- If the unshared namespace's user differs from the parent namespace's user, it is less privileged.
- In a less privileged namespace, shared mounts become slaves.
- Mount flags cannot be altered on mounts inherited by a less privileged namespace.
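Propagation state is visible per mount in `/proc/<pid>/mountinfo`: the optional fields before the `-` separator carry tags such as `shared:N` and `master:N`. A minimal sketch of a classifier for one such line follows; `propagation_of` is a hypothetical helper name and the sample line is illustrative, not taken from a live system.

```shell
#!/bin/sh
# Classify the propagation type of one /proc/<pid>/mountinfo line.
# Optional fields sit between field 6 (mount options) and the "-" separator:
#   shared:N    mount is in peer group N
#   master:N    mount is a slave of peer group N
#   unbindable  mount cannot be bind-mounted
# No optional field means the mount is private.
propagation_of() {
    printf '%s\n' "$1" | awk '
        function add(k) { kind = (kind == "" ? k : kind "+" k) }
        {
            kind = ""
            for (i = 7; i <= NF && $i != "-"; i++) {
                if ($i ~ /^shared:/)         add("shared")
                else if ($i ~ /^master:/)    add("slave")
                else if ($i == "unbindable") add("unbindable")
            }
            print (kind == "" ? "private" : kind)
        }'
}

# Illustrative line: a mount that receives events from peer group 1.
propagation_of '36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue'
```

Feeding it each line of `/proc/self/mountinfo` from inside and outside an unshared namespace shows which mounts changed from shared to slave or private.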