Slide 1

Slide 1 text

A modern services SDK for LinuxKit
Type safety, container-native daemons, minimum privilege, easy development, unikernel protection, ...
hacked to you by Thomas Gazagnaire, Thomas Leonard, Martin Lucina, Anil Madhavapeddy, Mindy Preston
7th June 2017 - Moby Security SIG #2

Slide 2

Slide 2 text

Disclaimer: this is active work in progress, and we're showing it early to the Moby Security SIG community to get feedback on the work. Interruptions and feedback are welcome. And patches :)

Slide 3

Slide 3 text

Motivation
• Base daemons in LinuxKit are typically wrapped versions of existing system software (e.g. dhcpcd, ntpd).
• Often written in C, different configuration mechanisms, no structured logging, require lots of privilege for system operations.
• Want to make these less monolithic and more container-native, and fit with LinuxKit philosophy of a lean, secure container runtime.
• This project provides us with a vehicle to deploy more advanced security protections in LinuxKit in a practical way.

Slide 4

Slide 4 text

Approach
• LinuxKit has a single build-time yaml file and everything except init runs in a container namespace.
• We build privilege-separated applications that use this architecture to avoid common security vulnerabilities by:
  1. Specifying the process layout for an application in yaml
  2. Enforcing isolated, minimal privileges per process
  3. Separating every process in a container namespace
  4. Coordinating the containers with standard RPC tooling
Developer experience matters: containerisation complexity is hidden inside the SDK tooling and not the application.

Slide 5

Slide 5 text

Approach
• First daemon being developed is a DHCP client.
• This is a difficult daemon to privilege separate due to the deep (and non-portable) system hooks required for handling IP and routing tables (e.g. netlink).
• Implementation flushes out a lot of architectural questions and makes subsequent protocol implementations such as HTTPS or NTP more straightforward.
https://github.com/linuxkit/linuxkit/tree/master/projects/miragesdk
https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=dhcp

Slide 6

Slide 6 text

[Diagram: dhcp-network, dhcp-actuator, dhcp-engine, eth0, kernel/netlink]
• Three processes, each with very minimal privileges.
• dhcp-network can only access eth0 for networking.
• dhcp-engine can see nothing except the other two processes.
• dhcp-actuator can manipulate routing tables but cannot see the network.
• Each process can be written in the safe language best suited to the task (OCaml, Rust in this case).

Slide 7

Slide 7 text

- name: dhcp-network
  capabilities:
    - CAP_NET_ADMIN  # bring eth0 up
    - CAP_NET_RAW    # read /dev/eth0

[Diagram: dhcp-network, dhcp-actuator, dhcp-engine, eth0, kernel/netlink]

Slide 8

Slide 8 text

- name: dhcp-network
  capabilities:
    - CAP_NET_ADMIN  # bring eth0 up
    - CAP_NET_RAW    # read /dev/eth0
- name: dhcp-actuator
  image:
  capabilities:
    - CAP_NET_ADMIN  # for netlink
  binds:
    - /state         # to write resolv.conf

[Diagram: dhcp-network, dhcp-actuator, dhcp-engine, eth0, kernel/netlink]

Slide 9

Slide 9 text

- name: dhcp-network
  capabilities:
    - CAP_NET_ADMIN  # bring eth0 up
    - CAP_NET_RAW    # read /dev/eth0
- name: dhcp-actuator
  image:
  capabilities:
    - CAP_NET_ADMIN  # for netlink
  binds:
    - /state         # to write resolv.conf
- name: dhcp-engine
  image:
  rpc:
    - dhcp-network
    - dhcp-actuator

[Diagram: dhcp-network, dhcp-actuator, dhcp-engine, eth0, kernel/netlink]

Slide 10

Slide 10 text

- name: dhcp-network
  capabilities:
    - CAP_NET_ADMIN  # bring eth0 up
    - CAP_NET_RAW    # read /dev/eth0
- name: dhcp-actuator
  image:
  capabilities:
    - CAP_NET_ADMIN  # for netlink
  binds:
    - /state         # to write resolv.conf
- name: dhcp-engine
  image:
  rpc:
    - dhcp-network
    - dhcp-actuator
- name: dhcp-init
  image:
  files:
    - path: /var/run/dhcp-client/README
      contents: 'data for dhcp-client'

[Diagram: dhcp-network, dhcp-actuator, dhcp-engine, eth0, kernel/netlink]
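A layout like this lends itself to mechanical checking before anything is started. The following is a minimal sketch, not part of the LinuxKit SDK: `LAYOUT`, `check_layout`, and the policy that an RPC coordinator holds no capabilities or bind mounts are all assumptions made for illustration, and the dictionary keys simply mirror the YAML fields shown on the slides.

```python
# Hypothetical in-memory copy of the process layout; keys mirror the YAML
# on the slides. check_layout is an illustration, not an SDK function.
LAYOUT = [
    {"name": "dhcp-network", "capabilities": ["CAP_NET_ADMIN", "CAP_NET_RAW"]},
    {"name": "dhcp-actuator", "capabilities": ["CAP_NET_ADMIN"], "binds": ["/state"]},
    {"name": "dhcp-engine", "rpc": ["dhcp-network", "dhcp-actuator"]},
]


def check_layout(layout):
    """Return a list of human-readable problems; empty means the layout passes."""
    names = [proc["name"] for proc in layout]
    problems = []
    for name in sorted(set(names)):
        if names.count(name) > 1:
            problems.append(f"{name}: duplicate process name")
    for proc in layout:
        for peer in proc.get("rpc", []):
            if peer not in names:
                problems.append(f"{proc['name']}: unknown rpc peer {peer}")
        # Assumed policy: a process that coordinates others over RPC
        # (like dhcp-engine) holds no capabilities and no bind mounts.
        if proc.get("rpc") and (proc.get("capabilities") or proc.get("binds")):
            problems.append(f"{proc['name']}: rpc coordinator must be unprivileged")
    return problems
```

Calling `check_layout(LAYOUT)` returns an empty list for the layout above. Adding a process that names an undeclared RPC peer, or that combines `rpc` with `capabilities`, produces one message per violation.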

Slide 11

Slide 11 text

[Diagram: dhcp-network, dhcp-actuator, dhcp-engine, eth0, kernel/netlink]

@0xb224be3ea8450819;

struct DhcpNetworkRequest {
  id @0 :Int32;
  path @1 :List(Text);
  union {
    write @2 :Data;
    read @3 :Void;
    delete @4 :Void;
  }
}

struct DhcpNetworkResponse {
  id @0 :Int32;
  union {
    ok @1 :Data;
    error @2 :Data;
  }
}

struct DhcpActuatorRequest {
  id @0 :Int32;
  interface @1 :Text;
  ipv4Addr @2 :List(Text);
  resolvConf @3 :List(Text);
}

struct DhcpActuatorResponse {
  id @0 :Int32;
  union {
    ok @1 :Data;
    error @2 :Data;
  }
}

• RPC via Capnp transport layer.
• Capnp provides the RPC layer and makes it easy to generate bindings for many languages. https://github.com/capnproto
• LinuxKit SDK takes care of starting the containers with an initial config and connecting the file descriptors.
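To make the request/response shape concrete, here is a rough sketch of an engine-to-actuator exchange over a pair of connected file descriptors. It is an illustration under stated assumptions, not the SDK's implementation: it does not use Cap'n Proto encoding (length-prefixed JSON stands in for it), `socket.socketpair()` stands in for the descriptors the SDK would connect, and `send_msg`, `recv_msg`, `actuator_handle` and `round_trip` are hypothetical names. Only the field names come from the `DhcpActuatorRequest` and `DhcpActuatorResponse` structs.

```python
import json
import socket
import struct
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class DhcpActuatorRequest:
    # Field names follow the schema on the slide.
    id: int
    interface: str
    ipv4Addr: List[str]
    resolvConf: List[str]


def send_msg(sock, obj):
    """Send one message as a 4-byte big-endian length plus JSON (stand-in for Capnp)."""
    payload = json.dumps(obj).encode()
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_msg(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length))


def actuator_handle(req):
    """Hypothetical actuator: answers with the 'ok' or 'error' arm of the union."""
    if not req["interface"]:
        return {"id": req["id"], "error": "no interface given"}
    return {"id": req["id"], "ok": "configured " + req["interface"]}


def round_trip(request):
    """Send a request from the engine end and return the actuator's reply."""
    engine_end, actuator_end = socket.socketpair()
    try:
        send_msg(engine_end, asdict(request))
        send_msg(actuator_end, actuator_handle(recv_msg(actuator_end)))
        return recv_msg(engine_end)
    finally:
        engine_end.close()
        actuator_end.close()
```

Both ends run in one process here only to keep the sketch short; in the architecture described on the slides each end would live in its own container, and the `id` field lets the engine match a reply to its request.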

Slide 12

Slide 12 text

Demo: Capnp RPC
• Capnp has an interface file and stub code generator for many languages.
• The binary format is very simple to parse (e.g. no HTTP2 dependency), so it is a small enough attack surface for privileged components to depend on.
• The CLI checks your interface specs and makes it relatively easy to glue pieces together.
• Here is an example of an HTTPS server built like this:
 https://github.com/talex5/linuxkit/tree/https-unikernel/projects/https-unikernel
 (see https://github.com/linuxkit/linuxkit/pull/1981)

Slide 13

Slide 13 text

Going deeper for security
• Need protections at all levels of the stack for defence in depth:
  • application level: static type safety when parsing network traffic (via OCaml, Rust logic) and secure RPC (via Capnp)
  • protocol state machine: fuzz testing for rapid state space exploration (via American Fuzzy Lop aka AFL)
  • runtime process: container namespacing and KVM hardware protection if available (via unikernel Solo5)
  • kernel interface: eBPF sandboxing for fine-grained access to syscalls
  • implementation diversity: the container/rpc approach lets many languages/runtimes work together without tight coupling
• What else? LinuxKit lets us patch the kernel and use the new facility directly in the base daemons, just like a BSD distro. SGX, TrustZone, etc...

Slide 14

Slide 14 text

Demo: ukvm service
• For service isolation, we can further protect processes against exploit by using /dev/kvm.
• This is a unikernel (standalone specialised VM) running as a normal Linux process in a container.
• External channel setup will be handled by the RPC layer, but for now is just a tap device.
• Demo: here is a DNS service running as a KVM process on Linux and serving network traffic.

Slide 15

Slide 15 text

Demo: fuzz testing
• Fuzz testing: throw a lot of random input at a program, see where it breaks, fix it, repeat.
• AFL is helpful as it can figure out an effective fuzz path quickly, and minimise test cases. Comes with a CLI, afl-fuzz: http://lcamtuf.coredump.cx
• We are writing adapters from AFL to the LinuxKit SDK (which uses file descriptors) to make fuzzing easier to start.
• Demo: afl-fuzz working on the DHCP state machine.
 Details at 
 Asciicast: https://asciinema.org/a/3ljccmn19m25uj02kve678xp6
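The "throw random input, see where it breaks, fix it, repeat" loop can be sketched in a few lines. This is a deliberately naive illustration and not how AFL works: AFL is coverage-guided, whereas this loop generates blind random bytes. The toy code/length/value option parsers and the `fuzz` helper are hypothetical and are not taken from the DHCP implementation.

```python
import random


def parse_options_buggy(data):
    """Toy code/length/value parser with a bug: no bounds check on the header."""
    opts, i = {}, 0
    while i < len(data):
        code = data[i]
        length = data[i + 1]  # IndexError when only one byte is left
        opts[code] = data[i + 2 : i + 2 + length]
        i += 2 + length
    return opts


def parse_options(data):
    """Fixed parser: malformed input is rejected with ValueError."""
    opts, i = {}, 0
    while i < len(data):
        if i + 2 > len(data):
            raise ValueError("truncated option header")
        code, length = data[i], data[i + 1]
        end = i + 2 + length
        if end > len(data):
            raise ValueError("truncated option body")
        opts[code] = data[i + 2 : end]
        i = end
    return opts


def fuzz(parser, iterations=1000, seed=0, max_len=16):
    """Feed random byte strings to parser and collect inputs that crash it.

    ValueError counts as a clean rejection; any other exception is a finding.
    """
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        size = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            parser(data)
        except ValueError:
            pass
        except Exception:
            crashes.append(data)
    return crashes
```

Running `fuzz(parse_options_buggy)` collects inputs that end in a dangling header byte, while `fuzz(parse_options)` comes back empty because the fixed parser rejects them cleanly. A coverage-guided tool reaches deeper states far faster than this loop, which is the reason for using AFL on the protocol state machine.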

Slide 16

Slide 16 text

Putting it all together
• WIP: wanted to explain the architecture early to the Security SIG community. Another update at the Moby Summit in a few weeks to show the frontend tooling.
• DHCP, DNS, HTTPS are our first targets to have safe system services by default. Anything else to focus on?
• Config interface is as similar to existing daemons as possible so they can be swapped easily.
• @mato is working on integrating Solo5 so that isolated services (e.g. dhcp-engine) can be unikernel-protected if hardware virt is available, and fall back to eBPF/seccomp sandboxing if not.
• @talex5 is working on the RPC substrate.
• @samoht @avsm are building the system daemons and CLI frontend.
• @yomimono is hacking on fuzz testing all the things with AFL.

Slide 17

Slide 17 text

Where it's going
• Initially it is very LinuxKit specific since we depend on a specific containerd featureset, but everything is intended to be portable (including to FreeBSD jails, OpenBSD pledge, ...).
• The Moby CLI should be able to package up as debs or rpms though, so it can be deployed more widely.
• We want to take a structured approach to classifying CVEs for common system services to determine what to fuzz on. Memory safety, logic bugs, container breakouts, ...
• Support more languages, build an ecosystem for practical correct-by-construction services.
• https://github.com/linuxkit/linuxkit (projects/miragesdk)