- suspends or resumes tasks
  ◦ Enables process or container migration to another node, using memory dumps
• perf_event - allows monitoring with perf
• hugetlb - limits the use of huge pages
creates veth pairs, with one end acting as eth0 inside the container
  ◦ Each container gets its own IP address
  ◦ All veths are bridged to docker0
• Docker handles routing
  ◦ Container-to-container links are possible through Docker
NET Namespace
host to container
  ◦ The container’s “root” is not actually root on the host
  ◦ IDs 0-10000 in a container can map to 10000-20000 on the host
• Landed in Docker 1.10
  ◦ https://integratedcode.us/2016/02/05/docker-1-10-security-userns/
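The ID translation described above can be sketched in a few lines. This is a minimal illustration, not Docker's implementation: it assumes the three-column layout of `/proc/<pid>/uid_map` (first ID inside the namespace, first ID on the host, range length), and the helper names `parse_uid_map` and `to_host_id` are hypothetical. The sample mapping is chosen only to match the numbers on the slide.

```python
from typing import List, Optional, Tuple

# Each entry mirrors one line of /proc/<pid>/uid_map:
# (first ID inside the namespace, first ID on the host, length of the range).
UidMap = List[Tuple[int, int, int]]


def parse_uid_map(text: str) -> UidMap:
    """Parse the contents of a uid_map (or gid_map) file."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            entries.append((inside, outside, length))
    return entries


def to_host_id(container_id: int, uid_map: UidMap) -> Optional[int]:
    """Translate an ID seen inside the container to the ID the host sees."""
    for inside, outside, length in uid_map:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    return None  # unmapped IDs have no host equivalent


# Assumed mapping: 10000 container IDs starting at host ID 10000.
example = parse_uid_map("0 10000 10000\n")
print(to_host_id(0, example))     # container root -> 10000
print(to_host_id(1000, example))  # -> 11000
```

With this mapping, a process that is root inside the container runs as the unprivileged ID 10000 on the host, which is the point of user namespaces.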
  ◦ If PID 1 dies, all other processes in the namespace are killed
• Container PID 1 is mapped to a different PID on the host
  ◦ The host can see all processes running inside containers
• PID namespaces can be nested
  ◦ There’s a PID-ception
• Shared namespaces are supported as of Docker 1.12
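One way to observe the PID mapping from the host is the `NSpid` line of `/proc/<pid>/status`, which lists the PID of a process in each nested namespace, outermost first. The sketch below assumes a kernel that reports this field (Linux 4.1 or later, to the best of my knowledge); the function name `nspid_chain` and the sample status text are made up for illustration.

```python
from typing import List


def nspid_chain(status_text: str) -> List[int]:
    """Return the PIDs from the NSpid line, outermost namespace first."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(pid) for pid in line.split()[1:]]
    return []  # the field is missing on kernels that do not report it


# Made-up excerpt of /proc/<pid>/status as read from the host.
sample = "Name:\tnginx\nPid:\t4242\nNSpid:\t4242\t1\n"
chain = nspid_chain(sample)
print(chain[0], chain[-1])  # host PID, then PID inside the innermost namespace
```

In this made-up sample the host sees the process as PID 4242, while inside its container it is PID 1. A longer chain would indicate nested PID namespaces.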
your host
  ◦ Can do pretty much anything if compromised
Solutions
• Restrict access to the daemon to the users and processes that really need it
• Don’t expose the daemon to the outside world
  ◦ If you must, put it behind a secure proxy, such as NGINX
• Don’t make it easy to SSH in as a user that has access to the daemon
Containers might have access to system resources they shouldn’t
  ◦ e.g. broad volume mounts
Solutions
• Mount only the volumes you need
  ◦ Mount them read-only if the container should not write to them
• Don’t use the --privileged flag
  ◦ Use the --cap-add flag, and only for the capabilities you really need
• If you can, don’t run containers as root
  ◦ Or use user remapping
Loopholes in container config
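As a rough sketch, the solutions above translate into a `docker run` invocation like the one this helper assembles. The function name, the image name, the paths and the default user are all hypothetical. Dropping every capability with `--cap-drop ALL` before adding some back is an extra step not mentioned on the slide; it is included here as one common way to keep the capability set minimal.

```python
from typing import Dict, List, Sequence


def hardened_run_args(
    image: str,
    readonly_mounts: Dict[str, str],
    capabilities: Sequence[str] = (),
    user: str = "1000:1000",  # assumed non-root UID:GID
) -> List[str]:
    """Build a `docker run` argument list following the checklist above."""
    # No --privileged; start from zero capabilities (an addition to the slide).
    args = ["docker", "run", "--rm", "--user", user, "--cap-drop", "ALL"]
    for cap in capabilities:
        args += ["--cap-add", cap]  # only what the workload really needs
    for host_path, container_path in readonly_mounts.items():
        args += ["-v", f"{host_path}:{container_path}:ro"]  # read-only mount
    return args + [image]


command = hardened_run_args(
    "example/web:latest",            # hypothetical image
    {"/srv/site": "/var/www/html"},  # hypothetical paths
    capabilities=["NET_BIND_SERVICE"],
)
print(" ".join(command))
```

The helper only builds the argument list, so it can be inspected or logged before being handed to something like `subprocess.run`.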
for containers and spot malicious ones. Also, having information that spans multiple containers of the same origin can enhance your tracking mechanisms.
  ◦ Identify the user making the request
• Authorize requests for users
  ◦ Check whether the user is allowed to make the given request
  ◦ Reject requests that exceed the user’s permissions
Authentication and authorization plugins
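The decision step of such a plugin might look like the sketch below. It assumes the JSON shape used by Docker's authorization plugin protocol as I understand it: a request carrying `User`, `RequestMethod` and a request URI field, answered with `Allow` and `Msg`. The `POLICY` table and the `authorize` function are hypothetical, and the HTTP server a real plugin needs is left out. Because the exact spelling of the URI key is an assumption, the code accepts both `RequestUri` and `RequestURI`.

```python
import json
from typing import Dict, Set

# Hypothetical policy: which HTTP methods each user may send to the daemon.
POLICY: Dict[str, Set[str]] = {
    "alice": {"GET", "POST", "DELETE"},
    "bob": {"GET"},  # read-only user
}


def authorize(request_json: str) -> str:
    """Answer an authorization request with an Allow/Msg JSON document."""
    request = json.loads(request_json)
    user = request.get("User", "")
    method = request.get("RequestMethod", "")
    # The key spelling is an assumption, so both variants are accepted.
    uri = request.get("RequestUri") or request.get("RequestURI", "")
    allowed = method in POLICY.get(user, set())
    msg = "" if allowed else f"user {user!r} may not {method} {uri}"
    return json.dumps({"Allow": allowed, "Msg": msg})


listing = authorize('{"User": "bob", "RequestMethod": "GET", "RequestUri": "/containers/json"}')
creating = authorize('{"User": "bob", "RequestMethod": "POST", "RequestUri": "/containers/create"}')
print(listing)
print(creating)
```

Unknown users fall through to an empty permission set, so the sketch denies by default.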
spawned
  ◦ Switch to the container’s network namespace (using setns)
  ◦ Use tc and iptables to impose limits
• Distributed network initialization
  ◦ Create a network initialization container
  ◦ Initialize the network stack with the imposed limits
  ◦ Make other containers use the same network stack
  ◦ CAP_NET_ADMIN to the rescue, since configuring the network is not allowed by default
Use case: Impose network limits
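The first approach can be sketched by building the commands to run inside the container's network namespace. As an assumption, this uses the `nsenter` command-line tool in place of calling `setns` directly. The function names, the rate and the port are illustrative, and the `tbf` parameters follow the usual token bucket example rather than anything from the slide.

```python
from typing import List


def netns_prefix(container_pid: int) -> List[str]:
    """Prefix that runs a command in the network namespace of container_pid."""
    return ["nsenter", "--target", str(container_pid), "--net"]


def rate_limit_cmd(container_pid: int, dev: str = "eth0", rate: str = "1mbit") -> List[str]:
    """tc command that caps egress bandwidth with a token bucket filter."""
    return netns_prefix(container_pid) + [
        "tc", "qdisc", "add", "dev", dev, "root",
        "tbf", "rate", rate, "burst", "32kbit", "latency", "400ms",
    ]


def block_port_cmd(container_pid: int, port: int) -> List[str]:
    """iptables command that drops outgoing TCP traffic to the given port."""
    return netns_prefix(container_pid) + [
        "iptables", "-A", "OUTPUT", "-p", "tcp", "--dport", str(port), "-j", "DROP",
    ]


# 4242 stands in for the container's PID as seen from the host.
print(" ".join(rate_limit_cmd(4242)))
print(" ".join(block_port_cmd(4242, 25)))
```

The host PID of a running container can be read with `docker inspect --format '{{.State.Pid}}' <container>`. Executing the resulting commands requires root on the host, so the sketch only prints them.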
and watch the size of each container
• Use device mapper and enforce a maximum size per container
• Make the root filesystem read-only
  ◦ Combine with the --tmpfs flag, available since Docker 1.10, for directories that the container should write to
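For the read-only option, a minimal sketch of the resulting command line follows. The helper name, the image and the writable paths are hypothetical; `--read-only` and `--tmpfs` are the `docker run` flags referred to above.

```python
from typing import List, Sequence


def readonly_run_args(image: str, writable_paths: Sequence[str]) -> List[str]:
    """Build a `docker run` command with a read-only root and tmpfs scratch dirs."""
    args = ["docker", "run", "--read-only"]
    for path in writable_paths:
        args += ["--tmpfs", path]  # in-memory mount, contents vanish with the container
    return args + [image]


readonly_cmd = readonly_run_args("example/app:latest", ["/tmp", "/run"])
print(" ".join(readonly_cmd))
```

Anything the container writes outside the listed paths fails, which keeps its on-disk footprint constant.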
and watch the size of each directory
• Use filesystem quotas on the mounted directories
• Use loopback devices backed by sparse files
• Use a logical volume manager (LVM)
  ◦ ZFS, etc.
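The sparse-file option starts with a file whose apparent size is the desired cap but which occupies almost no disk space until data is written. The sketch below shows only that first step, assuming a filesystem that supports sparse files; the helper name and the 100 MiB cap are illustrative.

```python
import os
import tempfile


def create_sparse_file(path: str, size_bytes: int) -> None:
    """Create a file of the given apparent size without writing any data."""
    with open(path, "wb") as handle:
        handle.truncate(size_bytes)  # extends the file; no content is written


with tempfile.TemporaryDirectory() as workdir:
    image_path = os.path.join(workdir, "volume.img")
    create_sparse_file(image_path, 100 * 1024 * 1024)  # 100 MiB cap
    apparent_size = os.stat(image_path).st_size

print(apparent_size)  # 104857600
```

The file would then typically be formatted with `mkfs`, attached as a loopback device with `mount -o loop`, and bind-mounted into the container as a volume. Those steps need root and are left out here.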
Control (MAC) system, a kernel (LSM) enhancement that confines programs to a limited set of resources. AppArmor's security model binds access control attributes to programs rather than to users.
  ◦ Bane to the rescue: https://github.com/jfrazelle/bane
• SELinux
  ◦ Security-Enhanced Linux (SELinux) is a Linux kernel security module that provides a mechanism for supporting access control security policies, including United States Department of Defense-style mandatory access controls (MAC).