Docker Security: A Deep Dive


Antonis Kalipetis
CTO @ SourceLair
Docker Captain and big fan
Python enthusiast
Coffee lover
@akalipetis

Agenda
• Docker internals
  ◦ What is a container?
  ◦ Possible Docker attack vectors
  ◦ Controlling resources
  ◦ Isolating processes
• Custom security policies
  ◦ Docker metrics
  ◦ Authentication/Authorization plugins
  ◦ Examples
  ◦ Other tools

What is a container after all?

Containers are the combined use of kernel tools and features
to jail and limit a process according to our needs and wants.

Controlling resources with cgroups

cgroups (abbreviated from control groups) is a Linux kernel feature
that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, network, etc.) of a collection of processes.
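To see which cgroups a process belongs to, the kernel exposes /proc/<pid>/cgroup, one line per hierarchy in the form hierarchy-ID:controller-list:cgroup-path. A minimal sketch of reading it, assuming a Linux host; the sample path "/docker/abc123" used in the comment is made up for illustration.

```python
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: List[str]
    path: str


def parse_cgroup_line(line: str) -> CgroupEntry:
    """Parse one line of /proc/<pid>/cgroup.

    Example (illustrative path): "4:memory:/docker/abc123"
    On cgroup v2 the controller list is empty: "0::/some/path"
    """
    hierarchy_id, controllers, path = line.rstrip("\n").split(":", 2)
    names = [c for c in controllers.split(",") if c]
    return CgroupEntry(int(hierarchy_id), names, path)


def read_cgroups(pid: str = "self") -> List[CgroupEntry]:
    """Read and parse the cgroup membership of a process (Linux only)."""
    with open(f"/proc/{pid}/cgroup") as handle:
        return [parse_cgroup_line(line) for line in handle if line.strip()]
```

Calling read_cgroups() inside a container and on the host shows different paths for the same controllers, which is an easy way to confirm the process is actually confined.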

Useful cgroups
• memory
• cpu/cpuset
• devices
• blkio
• network*
*network is not a real cgroup, but it can be used for metering

memory cgroup
• Tracks memory pages used by processes
• Soft limit
  ◦ Reclaim under high memory usage
• Hard limit
  ◦ OOM killed
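With Docker, the hard limit corresponds to the --memory flag of docker run and the soft limit to --memory-reservation. A small sketch of building those flags, assuming sizes are written with Docker's b/k/m/g suffixes; the helper names are hypothetical and the check that the soft limit stays below the hard one is a sanity check added here.

```python
# Multipliers for the size suffixes accepted by docker run.
UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def to_bytes(size: str) -> int:
    """Convert a size such as '512m' to bytes; a bare number is taken as bytes."""
    size = size.strip().lower()
    if size[-1] in UNITS:
        return int(size[:-1]) * UNITS[size[-1]]
    return int(size)


def memory_flags(hard: str, soft: str) -> list:
    """Return docker run flags for a hard and a soft memory limit."""
    if to_bytes(soft) >= to_bytes(hard):
        raise ValueError("soft limit should be below the hard limit")
    return ["--memory", hard, "--memory-reservation", soft]
```

For example, memory_flags("512m", "256m") gives the flags for a container that is reclaimed from above 256 MiB under memory pressure and OOM killed above 512 MiB.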

cpu/cpuset cgroups
• cpuset limits processes to specific cores
• cpu tracks CPU time used by processes
  ◦ Imposes weights, not limits
  ◦ A process can consume all the available CPU, if no other process uses it
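The "weights, not limits" point can be made concrete with a little arithmetic: under contention, each busy group gets CPU time in proportion to its weight among the groups that are actually competing. A simplified model, assuming every busy group wants as much CPU as it can get; the container names and weight values below are made up.

```python
def cpu_fractions(shares: dict, busy: set) -> dict:
    """Split CPU time among busy groups in proportion to their weights.

    shares maps a group name to its weight; busy is the set of groups
    currently competing for CPU. Idle groups get 0.0.
    """
    total = sum(shares[name] for name in busy)
    return {
        name: (shares[name] / total if name in busy else 0.0)
        for name in shares
    }


# Hypothetical containers: "web" has twice the weight of "batch".
weights = {"web": 1024, "batch": 512}
both_busy = cpu_fractions(weights, {"web", "batch"})  # web 2/3, batch 1/3
batch_alone = cpu_fractions(weights, {"batch"})       # batch gets everything
```

The second call is the case from the slide: with nobody else competing, the low-weight container still gets all the available CPU.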

devices cgroup
• Allows read/write/mknod to certain devices
• Typically defaults to only allow tty, zero, random, null
• Can give access to other devices, if this is required by the application
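The devices whitelist can be thought of as a list of rules in the cgroup v1 devices.allow format: "type major:minor access", where type is c, b or a, and access is any combination of r (read), w (write) and m (mknod). A rough sketch of how such a whitelist is evaluated, assuming that format and the conventional Linux device numbers; this is an illustration, not the kernel's implementation.

```python
def parse_rule(rule: str):
    """Split a rule such as 'c 1:3 rwm' into (type, major, minor, access set)."""
    dev_type, numbers, access = rule.split()
    major, minor = numbers.split(":")
    return dev_type, major, minor, set(access)


def is_allowed(rules, dev_type: str, major: int, minor: int, access: str) -> bool:
    """Return True if some whitelist rule grants every requested access bit."""
    for rule in rules:
        r_type, r_major, r_minor, r_access = parse_rule(rule)
        if r_type not in ("a", dev_type):
            continue
        if r_major not in ("*", str(major)):
            continue
        if r_minor not in ("*", str(minor)):
            continue
        if set(access) <= r_access:
            return True
    return False


# Whitelist for the devices named on the slide (conventional numbers).
DEFAULT_RULES = [
    "c 1:3 rwm",  # /dev/null
    "c 1:5 rwm",  # /dev/zero
    "c 1:8 rwm",  # /dev/random
    "c 5:0 rwm",  # /dev/tty
]
```

With these rules, is_allowed(DEFAULT_RULES, "c", 1, 3, "rw") is granted, while a block device such as 8:0 (conventionally the first SCSI/SATA disk) is refused.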

blkio and network cgroups
• They meter network and I/O usage
• Can be used for throttling usage, or identifying malicious containers

Other cgroups
• cpuacct - reports CPU usage
• freezer - suspends or resumes tasks
  ◦ Processes or container migration to another node, using memory dumps
• perf_event - allows monitoring using perf
• hugetlb - controls the amount of large pages

Questions? Don’t control your questions, shoot ‘em!

Isolating containers with namespaces

A namespace wraps a global system resource in an abstraction
that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource.

Namespaces
• net
• mnt
• uts
• user
• pid
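On Linux, the namespaces of a process show up as symlinks under /proc/<pid>/ns, with targets such as "net:[4026531992]". Two processes share a namespace when the inode number in the brackets is the same. A minimal sketch of reading them, assuming a Linux host with /proc mounted; the inode number in the comment is only an example.

```python
import os
import re

# Link targets look like "net:[4026531992]" (the number is an inode).
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target: str):
    """Split a namespace link target into (namespace type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespaces_of(pid="self") -> dict:
    """Map each entry in /proc/<pid>/ns to its namespace inode (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in os.listdir(ns_dir):
        _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        result[name] = inode
    return result


def share_namespace(pid_a, pid_b, kind: str) -> bool:
    """True if both processes are in the same namespace of the given kind."""
    return namespaces_of(pid_a).get(kind) == namespaces_of(pid_b).get(kind)
```

Comparing a containerized process against PID 1 of the host with share_namespace(pid, 1, "net") is a quick check that the container really has its own network stack (reading another user's /proc entries usually needs root).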

NET Namespace
• Each container gets its own network stack
  ◦ Docker creates veth pairs - acting as eth0 inside the container
  ◦ Each container gets its own IP address
  ◦ All veths are bridged to docker0
• Docker handles routing
  ◦ Container to container links possible through Docker

MNT Namespace
• Each container gets its own root filesystem
• Host directories bound privately to the container
• Together with CoW filesystems allow for ultra-fast boots
  ◦ AUFS, overlay
  ◦ Device mapper
  ◦ BTRFS, ZFS

USER namespace
• Allows remapping of users and groups from host to container
  ◦ Container’s “root” is not actually root
  ◦ 0-10000 in a container can be 10000-20000 in host
• Landed in Docker 1.10
  ◦ https://integratedcode.us/2016/02/05/docker-1-10-security-userns/
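The remapping is plain range arithmetic. The kernel describes it in /proc/<pid>/uid_map as lines of three numbers: first ID inside the namespace, first ID outside, and the length of the range. A short sketch of the lookup, using the 0-10000 to 10000-20000 example from the slide; the function names are hypothetical.

```python
def parse_uid_map(text: str) -> list:
    """Parse uid_map content into (inside_start, outside_start, length) tuples."""
    return [
        tuple(int(field) for field in line.split())
        for line in text.splitlines()
        if line.strip()
    ]


def map_to_host(container_id: int, uid_map: list):
    """Translate an ID inside the container to the host ID, or None if unmapped."""
    for inside_start, outside_start, length in uid_map:
        if inside_start <= container_id < inside_start + length:
            return outside_start + (container_id - inside_start)
    return None


# The mapping from the slide: 10000 IDs starting at 0 map to 10000 on the host.
slide_map = parse_uid_map("0 10000 10000\n")
host_uid_of_root = map_to_host(0, slide_map)  # container root is host UID 10000
```

So a process that is "root" in the container is an unprivileged user with UID 10000 as far as the host is concerned, and IDs outside the range have no host identity at all.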

PID namespace
• Every container has its own “PID 1”
  ◦ If PID 1 dies, all other processes get killed
• Container PID 1 is mapped to another PID in the host
  ◦ Host can see all processes running inside containers
• PID namespaces can be nested
  ◦ There’s a PID-ception
• Shared namespaces supported in Docker 1.12
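The host-to-container PID mapping can be read from the host. On kernels that expose it, /proc/<pid>/status has an NSpid line listing the PID of the process in each nested PID namespace, outermost first. A small sketch of parsing it; the sample status text and the PID 12345 are made up.

```python
def parse_nspid(status_text: str) -> list:
    """Return the PIDs from the NSpid line, outermost namespace first.

    Returns an empty list when the kernel does not expose the field.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(value) for value in line.split()[1:]]
    return []


def read_nspid(pid) -> list:
    """Read the NSpid line of a process from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_nspid(handle.read())


# Illustrative excerpt: host PID 12345 is PID 1 inside its container.
sample = "Name:\tnginx\nPid:\t12345\nNSpid:\t12345\t1\n"
pids = parse_nspid(sample)
```

A deeper list, with three or more entries, is the "PID-ception" case of nested PID namespaces.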

Other Namespaces
• uts namespace - allows for custom domain/hostname
  ◦ sethostname/gethostname
• ipc namespace - allows interprocess communication
  ◦ semaphores, message queues, shared memory

Questions? Don’t isolate yourself

Possible Docker attack vectors

Intrinsic kernel security
• Namespaces and cgroups support
• Zero day vulnerabilities
• Vulnerabilities of cgroups / namespaces
Solutions
• Make use of recent kernels
• Be informed
• Take additional measures

The Docker daemon
• The daemon runs as root in your host
  ◦ Can do pretty much anything if compromised
Solutions
• Restrict access to the daemon only to the ones really needing it (users, processes etc)
• Don’t expose the daemon to the outside world
  ◦ If you do so, make sure you have put this behind a secure proxy, like NGINX
• Don’t make it easy to SSH with the users that have access to the daemon

Loopholes in container config
• Containers might have elevated privileges, allowing container escaping
• Containers might have access to system resources they shouldn’t
  ◦ e.g. broad volume mounts
Solutions
• Mount only the volumes you need to
  ◦ Try to mount them as read-only if the container should not write
• Don’t use the --privileged flag
  ◦ Use the --cap-add flag, only for the capabilities that you really need
• If you can, don’t run containers as root
  ◦ Or use user remapping
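The solutions above translate fairly directly into docker run flags. A sketch of assembling such a command, assuming the standard --cap-add, --user and -v ...:ro options; dropping all capabilities first with --cap-drop ALL is an extra step added here, and the image name, user and paths are placeholders.

```python
def hardened_run_args(image, user="1000:1000", capabilities=(), readonly_mounts=()):
    """Build a docker run command line that avoids --privileged.

    capabilities: only the capabilities the application really needs.
    readonly_mounts: (host_path, container_path) pairs mounted read-only.
    """
    args = ["docker", "run", "--cap-drop", "ALL"]
    for capability in capabilities:
        args += ["--cap-add", capability]
    args += ["--user", user]  # do not run as root inside the container
    for host_path, container_path in readonly_mounts:
        args += ["-v", f"{host_path}:{container_path}:ro"]
    args.append(image)
    return args


# Placeholder values for illustration only.
example = hardened_run_args(
    "example/web",
    capabilities=["NET_BIND_SERVICE"],
    readonly_mounts=[("/srv/config", "/etc/app")],
)
```

The resulting list can be handed to subprocess.run, which avoids shell quoting issues with paths.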

Questions? Feel secure to ask

Custom security policies

Using Docker metrics
• CPU
• Memory
• Network
• Disk I/O
• Process names and trees
  ◦ Container processes are visible from the host, or from containers sharing the host PID namespace

Using and combining Docker metrics allows you to create profiles
for containers and spot malicious ones. Also, having information that spans multiple containers of the same origin can enhance your tracking mechanisms.
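One simple way to turn a metric history into a profile is to flag samples that fall far from the container's usual values. A minimal sketch using a z-score, assuming a list of past readings for a single metric; the threshold of 3 standard deviations and the sample numbers are arbitrary choices for illustration, not a recommendation from the talk.

```python
from statistics import mean, pstdev


def is_anomalous(history, sample, threshold=3.0) -> bool:
    """Flag a sample that deviates strongly from the container's history.

    history: past readings of one metric (e.g. CPU percent).
    threshold: how many standard deviations count as suspicious.
    """
    if len(history) < 2:
        return False  # not enough data to build a profile
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return sample != mu
    return abs(sample - mu) / sigma > threshold


# Made-up CPU readings for one container.
cpu_history = [10, 12, 11, 9, 10, 11]
normal = is_anomalous(cpu_history, 11)
spike = is_anomalous(cpu_history, 95)
```

Pooling the histories of all containers started from the same image gives a larger baseline, which is the "same origin" idea from the slide.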

Authentication and authorization plugins
• Authenticate requests to Docker daemon
  ◦ Reject unauthenticated requests
  ◦ Identify the user that is doing the request
• Authorize requests for users
  ◦ Check if the user is allowed to make the given request
  ◦ Reject requests that don’t comply with the user’s allowance
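A Docker authorization plugin receives a JSON description of each API request (with fields such as User, RequestMethod and RequestUri) and answers with a JSON object containing Allow and an optional Msg. A sketch of the decision logic only, assuming those field names; the per-user policy table and the user names are hypothetical, and the HTTP plumbing of a real plugin is left out.

```python
# Hypothetical policy: which HTTP methods each user may send to the daemon.
POLICY = {
    "alice": {"GET", "POST", "DELETE"},
    "bob": {"GET"},  # read-only access
}


def authorize(request: dict, policy: dict = POLICY) -> dict:
    """Decide whether a daemon API request should be allowed."""
    user = request.get("User", "")
    method = request.get("RequestMethod", "")
    uri = request.get("RequestUri", "")
    if not user:
        # Reject unauthenticated requests.
        return {"Allow": False, "Msg": "unauthenticated request"}
    if method not in policy.get(user, set()):
        # Reject requests that exceed the user's allowance.
        return {"Allow": False, "Msg": f"{user} may not {method} {uri}"}
    return {"Allow": True}
```

With this table, bob can list containers (GET) but a POST to /containers/create is refused, while a request with no identified user is always rejected.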

Image scanning
• You can scan your images for known vulnerabilities
• There are tools for that, like Docker Nautilus and CoreOS Clair
• Find known vulnerable binaries

Use case: Impose network limits
• Externally imposed limits
  ◦ Listen for new containers being spawned
  ◦ Switch to the container network namespace (using setns)
  ◦ Use tc and iptables to impose limits
• Distributed network initialization
  ◦ Create a network initialization container
  ◦ Initialize the network stack using imposed limits
  ◦ Make other containers use the same network stack
  ◦ CAP_NET_ADMIN to the rescue, since such action is not allowed by default

Use case: Impose storage limits (RootFS)
• Create a watcher and watch the size of each container
• Use device mapper and ensure max size of a container
• Make the root filesystem read-only
  ◦ In combination with --tmpfs flag, available in Docker 1.10 for volumes that the container should write to

Use case: Impose storage limits (Volumes)
• Create a watcher and watch the size of each directory
• Use filesystem quotas on the mounted directories
• Use loopback devices from sparse files
• Use a logical volume manager (LVM)
  ◦ ZFS, etc
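The watcher approach is the easiest to prototype. A minimal sketch that sums the file sizes under a volume directory and compares them with a limit; it measures apparent file size rather than blocks used on disk, and what to do when the limit is exceeded is left to the caller.

```python
import os


def directory_size(path: str) -> int:
    """Return the total apparent size in bytes of regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):  # skip symlinks, count real files
                total += os.path.getsize(file_path)
    return total


def over_quota(path: str, limit_bytes: int) -> bool:
    """True if the directory currently holds more than limit_bytes."""
    return directory_size(path) > limit_bytes
```

A watcher would call over_quota periodically for each mounted directory. Unlike filesystem quotas, this only detects the overrun after the fact.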

Other tools
• AppArmor
  ◦ AppArmor is a Mandatory Access Control (MAC) system which is a kernel (LSM) enhancement to confine programs to a limited set of resources. AppArmor's security model is to bind access control attributes to programs rather than to users.
  ◦ Bane to the rescue: https://github.com/jfrazelle/bane
• SELinux
  ◦ Security-Enhanced Linux (SELinux) is a Linux kernel security module that provides a mechanism for supporting access control security policies, including United States Department of Defense-style mandatory access controls (MAC).

Questions? There’s no policy here

Some resources
• Jérôme Petazzoni - http://www.slideshare.net/jpetazzo/cgroups-namespaces-and-beyond-what-are-containers-made-from-dockercon-europe-2015
• Dan Walsh - http://www.projectatomic.io/blog/2015/12/making-docker-images-write-only-in-production/
• Linux Advanced Routing & Traffic Control - http://lartc.org/howto/lartc.ratelimit.single.html
• Michael Crosby - http://crosbymichael.com/creating-containers-part-1.html
• Jessie Frazelle - https://blog.jessfraz.com/post/getting-towards-real-sandbox-containers/

Thanks! Antonis Kalipetis @akalipetis https://www.sourcelair.com
