• Linux kernel namespaces provide the isolation (hence “container”) in which we place one or more processes.
• Linux kernel cgroups (“control groups”) provide resource limiting and accounting (CPU, memory, I/O bandwidth, etc.).
• Namespaces: IPC, net, user, mount, pid, uts
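To make the namespace list above concrete, here is a minimal sketch that reads the namespace identifiers of a process from /proc. It assumes a Linux host with /proc mounted; the function name list_namespaces is hypothetical and not part of any Docker tooling.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_name: identifier} for a process, read from /proc."""
    ns_dir = f"/proc/{pid}/ns"
    namespaces = {}
    if not os.path.isdir(ns_dir):
        # Not on Linux, or /proc is not mounted: nothing to report.
        return namespaces
    for name in sorted(os.listdir(ns_dir)):
        try:
            # Each entry is a symlink whose target names the namespace
            # type and an inode number identifying the namespace.
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Entry not readable with the current privileges; skip it.
            continue
    return namespaces


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(f"{name:20s} {ident}")
```

Two processes that share a namespace report the same identifier for it, so running this on the host and inside a container is one way to see which namespaces actually differ.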
• A shared kernel across all containers on a single host
• Unique filesystem that could look like a Linux distro.
  • In Docker, this is a layered model where, using CoW (copy-on-write) on a union filesystem, other containers can share a set of underlying read-only content (writes happen on an unshared “top” layer)
• Linux namespaces are shareable (see the Kubernetes “pod” concept), so containers do not have to have explicit 1-to-1 boundaries
• Each container gets its own network stack.
• A container doesn’t get privileged access to the sockets or interfaces of another container.
all?
• Kernel keyring
  • Containers running with a user of the same UID will have access to the same keys if they are handled by the kernel keyring
• /proc
  • /proc is a source of information leaks and a large attack surface.
  • It includes files that contain configuration information of the kernel & host system
• Time
  • Changing it requires CAP_SYS_TIME (not granted in Docker’s defaults)
• Kernel modules
  • Loaded modules become available across all containers and the host
• Hardware/devices
  • Disk drives, sound cards, GPU, etc.
Container Compromise: illegitimate data access, or interference with the control flow of instructions
• Manipulation of control flow
• Data theft
Denial of Service (DoS): disturb normal operation of the host or other containers
• Fork bomb
• DoS of the host
• Noisy neighbor
Privilege Escalation: obtain a privilege which was not originally granted to the container
• Access host/private data
• Access other containers’ data
• Docker admin access
• Docker API access
• Kernel modification/modules
-- THREAT MITIGATION --
• Reliance on Linux kernel features to properly isolate (namespaces) and control (cgroups)
• The attack surface of the Docker daemon
• Loopholes in the container configuration profile (default or customized)
• The “hardening” security features of the kernel
• Trusted Docker images
• cgroups: control/limit container access to CPU, memory, swap, block I/O (rates), and network
• LSMs: AppArmor and SELinux are both supported in the Docker engine (via runc); a default profile is applied for the engine and containers
• Capabilities: Docker by default only allows 14 of the 37 Linux capability groups; more can be dropped or added as required
• seccomp: fine-grained per-syscall control is available via seccomp; a default profile limiting many syscalls is already applied
• userns: user-namespaced processes remap root to an unprivileged ID on the host; Docker supports a global uid/gid mapping
• Other controls:
  • --pids-limit for controlling PID limits per container (fork bomb prevention)
  • --security-opt no-new-privileges to prevent privilege escalation
  • --read-only filesystem for an immutable container image
  • DOCKER_CONTENT_TRUST=1 for Notary/signed image provenance
  • Authz plugins (Twistlock)
  • TLS certificate-based API endpoint configuration
  • Storage quotas for specific Docker storage backends (btrfs, zfs in 1.12; devicemapper already available)
• The Docker REST API has no built-in authentication mechanism!
• With Docker listening on a TCP socket, any Docker client can connect.
• Docker allows you to share a directory between the Docker host and a guest container
• Only trusted users should be allowed to control your Docker daemon
• Always configure a TLS-enabled daemon and client, and enable server verification.
  • Enable TLS by specifying the tlsverify flag and pointing Docker’s tlscacert flag to a trusted CA certificate.
• Use user namespaces.
  • --userns-remap=<user-name>
  • Makes it harder to perform privilege elevation through the file system.
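The daemon-side settings above can also be expressed in the daemon configuration file. The following is a minimal sketch that generates such a file; the certificate paths, the listen address, and the function name daemon_config are assumptions for illustration, and the value "default" for userns-remap is one possible choice rather than a recommendation from the slides.

```python
import json


def daemon_config(
    ca="/etc/docker/ca.pem",             # assumed path to the trusted CA certificate
    cert="/etc/docker/server-cert.pem",  # assumed path to the daemon certificate
    key="/etc/docker/server-key.pem",    # assumed path to the daemon private key
    listen="tcp://0.0.0.0:2376",         # assumed listen address
    userns_remap="default",              # assumed remap user
):
    """Build a daemon.json dictionary with TLS verification and userns remap."""
    return {
        "hosts": [listen],
        "tlsverify": True,   # require and verify client certificates
        "tlscacert": ca,
        "tlscert": cert,
        "tlskey": key,
        "userns-remap": userns_remap,
    }


if __name__ == "__main__":
    print(json.dumps(daemon_config(), indent=2))
```

The printed JSON is meant to be reviewed and adapted before being placed where the daemon reads its configuration; clients then need their own certificate signed by the same CA to connect.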
• Secure hash:
  • Content-addressable storage.
  • In Docker, it’s called a digest: a SHA-256 hash of a filesystem layer or manifest [since Docker v1.10]
• Secure signing and verification infrastructure:
  • Data could be changed or copied if it travels over insecure channels (e.g. HTTP), so we need to ensure we are publishing and accessing content using secure protocols.
  • e.g. the Notary project, which compares a checksum for a downloaded file with the checksum in Notary’s trusted collection for the file source
• Dockerfile:
  • The same Dockerfile can produce different images over time, so as time goes on it’s hard to be sure what is in your images.
  • Always specify a tag in the FROM instruction, and use a digest to pull exactly the same image each time
  • The tag must not be latest!
  • Verify any software or data downloaded from the internet by using checksums or cryptographic signatures.
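Both ideas above, checking downloaded content against a known SHA-256 digest and avoiding floating tags in FROM, can be sketched in a few lines. This is an illustration only: the function names and the three-way "floating / tagged / pinned" classification are assumptions, not Docker or Notary behavior.

```python
import hashlib
import hmac
import re

# A digest reference ends with "@sha256:" followed by 64 hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def sha256_digest(data: bytes) -> str:
    """Return the digest of data in the "sha256:<hex>" notation."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def matches_digest(data: bytes, expected: str) -> bool:
    """Compare the computed digest with the expected one in constant time."""
    return hmac.compare_digest(sha256_digest(data), expected)


def classify_from(line: str):
    """Classify a Dockerfile line; returns None if it is not a FROM line."""
    parts = line.split()
    if not parts or parts[0].upper() != "FROM":
        return None
    # Skip option flags such as --platform=... that may precede the image.
    refs = [p for p in parts[1:] if not p.startswith("--")]
    if not refs:
        return None
    image = refs[0]
    if DIGEST_RE.search(image):
        return "pinned"      # exact content, same image on every pull
    name = image.rsplit("/", 1)[-1]  # ignore a registry host:port prefix
    if ":" not in name or name.endswith(":latest"):
        return "floating"    # no tag, or the latest tag
    return "tagged"          # a tag, which can still be moved later
```

For example, classify_from("FROM ubuntu") and classify_from("FROM ubuntu:latest") both report "floating", while a reference that carries a full @sha256: digest reports "pinned".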
• Container configuration.
• Cgroups
  • --kernel-memory, --memory, --memory-swap, --cpu-period, --cpu-quota, --cpu-shares, --cpuset-cpus, --cpuset-mems, --device-read-bps, --device-read-iops, --device-write-bps, --device-write-iops, --blkio-weight, --blkio-weight-device, --cpus (since v1.13)
  • Use these flags with docker run/create
  • Think along these lines:
    • Limits should suit the application in the container.
    • No container should be able to consume all of any resource on its own
      • e.g. CPU, memory, I/O bandwidth, network bandwidth.
    • Protects from DoS attacks
• Capabilities
  • Use --cap-add and --cap-drop with docker run/create
  • Drop all capabilities which are not required.
  • Use CAP_SYS_ADMIN with caution! (not granted by default)
  • Make the attack surface very narrow
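As a rough illustration of combining the cgroup and capability flags above, the sketch below builds a docker run argument list with resource limits and all capabilities dropped except those explicitly kept. The function name and the default limit values are assumptions chosen for the example; suitable values depend on the application.

```python
def limited_run_args(image, memory="256m", cpu_shares=512,
                     pids_limit=100, keep_caps=()):
    """Build a `docker run` argument list with limits and minimal capabilities."""
    args = [
        "docker", "run",
        "--memory", memory,
        # Setting --memory-swap equal to --memory leaves the container no swap.
        "--memory-swap", memory,
        "--cpu-shares", str(cpu_shares),
        "--pids-limit", str(pids_limit),
        # Start from zero capabilities, then add back only what is needed.
        "--cap-drop", "ALL",
    ]
    for cap in keep_caps:
        args += ["--cap-add", cap]
    args.append(image)
    return args


if __name__ == "__main__":
    print(" ".join(limited_run_args("hello-world",
                                    keep_caps=["NET_BIND_SERVICE"])))
```

The list can be passed to subprocess.run as-is, which avoids shell quoting issues; printing it first makes it easy to review what will actually be executed.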
• SELinux/AppArmor (docker-default)
  • Helps in protecting resources with fine-grained policies.
  • Restricts usage of file systems, directories, or even single files.
  • Controls process initialization and program execution.
  • Restricts sockets, messages, and network interfaces
  • docker run --security-opt apparmor=<profile-name> hello-world
  • docker run --security-opt label=type:<selinux-type> hello-world
  • e.g.
    • Prevents any operation on /proc/, /sys/*, /dev/*, etc.
    • Solves the keyring issue in containers.
    • Denies kernel module insertion
• seccomp (docker-default)
  • Seccomp restricts the kernel calls that containers can make.
  • The default profile only disables around 44 system calls out of 300+
  • docker run --security-opt seccomp=/path/to/seccomp/profile.json hello-world
  • e.g.
    • Disable mount in the container
    • Disable changing the time inside the container.
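The two seccomp examples above (no mount, no time changes) could be written as a custom profile. The sketch below generates a deny-list profile in the JSON layout that Docker accepts; the choice of syscalls and the function name are assumptions for illustration. Note that a custom profile replaces Docker’s default one, and an allow-everything default is weaker than the default profile, so treat this as a teaching example rather than a hardened profile.

```python
import json

# Assumed selection: syscalls that mount filesystems or set the clock.
DENIED_SYSCALLS = [
    "mount", "umount2",
    "settimeofday", "clock_settime", "adjtimex", "stime",
]


def profile_json(denied=DENIED_SYSCALLS):
    """Return a seccomp profile (JSON text) that fails the given syscalls."""
    profile = {
        # Everything not listed below is allowed (deny-list approach).
        "defaultAction": "SCMP_ACT_ALLOW",
        "syscalls": [
            # Listed syscalls return an error instead of executing.
            {"names": sorted(denied), "action": "SCMP_ACT_ERRNO"},
        ],
    }
    return json.dumps(profile, indent=2)


if __name__ == "__main__":
    print(profile_json())
```

Saving the output to a file and passing its path with --security-opt seccomp=... applies it to a single container.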
Privilege”
• Do not run processes in a container as root, so that an attacker who compromises a process does not gain root access.
• Enable user namespaces (disabled by default).
• Run filesystems as read-only (wherever applicable) so that attackers cannot overwrite data or save malicious scripts to a file.
• Cut down the kernel calls that a container can make to reduce the potential attack surface.
• Limit the resources that a container can access (SELinux/AppArmor)
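A checklist like the one above can be turned into a quick review aid. The sketch below inspects a docker run argument list and reports which of these practices are not visible in it; the function name and the wording of the findings are hypothetical, and the check is deliberately naive (it does not distinguish Docker options from arguments passed to the container’s own command).

```python
def least_privilege_gaps(run_args):
    """Return a list of least-privilege practices missing from run_args."""
    # Normalize "--flag=value" to "--flag" so both spellings are recognized.
    flags = [arg.split("=", 1)[0] for arg in run_args]
    gaps = []
    if "--user" not in flags and "-u" not in flags:
        gaps.append("no --user: process may run as root")
    if "--read-only" not in flags:
        gaps.append("no --read-only: root filesystem is writable")
    if "--cap-drop" not in flags:
        gaps.append("no --cap-drop: default capabilities are kept")
    if "--privileged" in flags:
        gaps.append("--privileged: container has extended privileges")
    return gaps


if __name__ == "__main__":
    for gap in least_privilege_gaps(["docker", "run", "hello-world"]):
        print(gap)
```

An empty result only means the flags are present, not that their values are appropriate; the values still need a human review.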