Slide 1

Slide 1 text

Linux Container with Alternate Linux Kernel (Library)
Hajime Tazaki (田崎 創)
IIJ Research Laboratory
Container Runtime Meetup #2, August 2020
@thehajime

Slide 2

Slide 2 text

History of Containers
- Started as an OS virtualization method
- Tweet by @nixcraft (The Best Linux Blog In the Unixverse), 11:57 PM, Aug 10, 2018:
  History of containers on unix like system:
  1. chroot 1982
  2. Freebsd jails 2000
  3. Linux vserver 2001
  4. Solaris zones 2004
  5. OpenVZ 2005
  6. LXC 2008
  7. Systemd-nspawn 2010
  8. Docker 2013

Slide 3

Slide 3 text

Containers (cont'd)
- As a sandbox (one-shot image)
- As package management (program/runtime, decoupled config)
- With an orchestration framework to host a bunch of applications

Slide 4

Slide 4 text

All Satisfying?
- The kernel is shared across instances
- But what if you want to use an alternate kernel?
  - no extensibility (latest kernel, out-of-tree module)
  - wish to avoid a kernel crash caused by your app?
- VM?
ref: https://www.redhat.com/cms/managed-files/virtualization-vs-containers.png

Slide 5

Slide 5 text

Alternate Options
- Lightweight VMs
  - Kata Containers
  - Docker Desktop
- Userspace kernel
  - UML (User-mode Linux)
  - gVisor
- Library OS
  - Graphene (a libOS)
  - Nabla Container/Rumprun unikernel
  - Drawbridge (WSL1)

Slide 6

Slide 6 text

Kata Containers
- Use a lightweight VMM under the container infrastructure
  - qemu (lightweight version), or firecracker
  - OCI runtime: runv
- Run an independent kernel in each container instance (isolation)
- Run a (small) guest Linux kernel (compatibility)
- Still a slight overhead
- Compatibility: ++, Portability: ++, Lightweight: -
ref: https://katacontainers.io/

Slide 7

Slide 7 text

Docker Desktop
- Run Linux containers on a foreign platform (Windows/macOS)
  - Small Linux VM
  - Run (most of) the components in Linux
- Goal: transparent usage from the host OS
- Useful for development environments
- Compatibility: ++, Portability: ++, Lightweight: -
ref: https://docs.docker.com/docker-for-mac/images/docker-for-mac-install.png

Slide 8

Slide 8 text

User-mode Linux (UML)
- Run the Linux kernel as a userspace process
- Upstream (since kernel 2.2.x?)
- Supports i386/x86_64 Linux hosts
  - experimental ppc/Linux and Windows hosts (not maintained)
- ptrace-based syscall interposition
  - less portable
- Compatibility: ++, Portability: -, Lightweight: +/-

Slide 9

Slide 9 text

Library OSs/Unikernels
- Graphene
  - a LibOS (Drawbridge-inspired) with syscall translation
  - Linux ABI-level compatibility
  - Compatibility: +, Portability: +, Lightweight: +
- Rumprun unikernel/Nabla container
  - Bind the NetBSD (rump) kernel to Linux programs
  - API-level compatibility
  - Compatibility: +/-, Portability: +, Lightweight: ++
ref: https://grapheneproject.io https://nabla-containers.github.io

Slide 10

Slide 10 text

Summary of Alternatives
- Approaches
  - Compatibility-centric: more like a VM
  - Portability-centric: less compatibility
- Goal: VM-level compatibility while staying as lightweight as a container

Slide 11

Slide 11 text

µKontainer & Linux Kernel Library

Slide 12

Slide 12 text

Linux Kernel Library (LKL)
- a library (liblkl.{so,a})
- run Linux code in various ways with a reusable library
- h/w dependent layer
  - on Linux/Windows/FreeBSD/macOS/Android userspace, unikernel, on UEFI, network simulator (ns-3)
- code size
  - 2.4KLoC (h/w independent)
  - 6.6KLoC (h/w dependent)

Slide 13

Slide 13 text

Design options
- Why modify the Linux kernel? (even though it is hard to upstream)
  - almost 30 years old (since 1991)
  - still growing at a rapid pace (new features + bug fixes)
  - we don't want to rewrite it from scratch
- Reuse instead of Rewrite (NetBSD rump kernel)

Slide 14

Slide 14 text

Container integration (µKontainer)
- Components
  - OCI runtime: runu
  - (containerd/dockerd port (macOS))
- Types of images
  - runu-private image (statically linked LKL application)
  - public image (e.g., alpine:latest) (libc replacement)

Slide 15

Slide 15 text

runu internals
- Run LKL programs under docker/k8s
  - communicate w/ containerd/kubelet
  - set up (virtual) devices as exposed file descriptors (fds) (tap, veth, disk image, virtio 9pfs)
  - (optionally) replace libc.so
- Usage
  - Docker: docker run --runtime=runu runu-python:latest
  - k8s: add a runtimeClassName line

    apiVersion: apps/v1
    kind: Deployment
    spec:
      template:
        spec:
          runtimeClassName: ukontainer
          containers:
          - name: runu-python
            image: thehajime/runu-python:3.0
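The k8s usage above can be sketched in Python as plain dictionaries. This is a minimal illustration, not runu's documented setup: the handler name `runu` is an assumption (it has to match whatever name the node's container runtime configuration registers), the `node.k8s.io/v1` API version applies to recent Kubernetes releases, and the Deployment is the same fragment shown on the slide, so it still lacks the metadata and selector a real manifest needs.

```python
import copy
import json

# Assumption: the CRI handler name registered for runu on the node.
RUNU_HANDLER = "runu"


def runtime_class(name, handler):
    """Build a RuntimeClass object mapping a class name to a CRI handler."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": handler,
    }


def with_runtime_class(deployment, name):
    """Return a copy of a Deployment whose pod template selects the class."""
    patched = copy.deepcopy(deployment)
    patched["spec"]["template"]["spec"]["runtimeClassName"] = name
    return patched


# The Deployment fragment from the slide, before the runtimeClassName line.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "runu-python", "image": "thehajime/runu-python:3.0"}
                ]
            }
        }
    },
}

if __name__ == "__main__":
    objects = [
        runtime_class("ukontainer", RUNU_HANDLER),
        with_runtime_class(deployment, "ukontainer"),
    ]
    print(json.dumps(objects, indent=2))
```

The only change to the workload itself is the single `runtimeClassName` field; the RuntimeClass object is what ties that name to a runtime on the node.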

Slide 16

Slide 16 text

Use case: Docker for mac+
- Run docker images without Hypervisor.framework
  - as Mach-O (user space) programs
  - Programs other than the container image are Mach-O binaries
  - syscalls are invoked inside LKLed programs
- Benefits
  - native experience while doing Linux
- Currently only x86_64 works (both mac and container image)
  - may work on Apple Silicon with a slight effort

Slide 17

Slide 17 text

Demo: alpine linux on macOS
https://asciinema.org/a/347292

Slide 18

Slide 18 text

Docker for mac+ : How it works
0. (Mach-O) Run LKL as the init process
1. (Mach-O) (v)fork/execve the Linux ELF binary
2. (ELF) the interpreter (musl+) loads the (downloaded) ELF program
3. (ELF) call the main() function
4. (ELF) syscall => LKL syscall (libc replacement)
5. (Mach-O) handle the LKL syscall from ELF
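The steps above alternate between host-side Mach-O code and ELF code from the container image. As a small aid for telling the two apart, the sketch below classifies a file by its leading magic bytes. It is only an illustration of the two formats and makes no claim about how runu itself detects or loads binaries; the function names are hypothetical.

```python
ELF_MAGIC = b"\x7fELF"

# Mach-O magic numbers as they appear in the first four bytes of a file.
MACHO_MAGICS = {
    b"\xcf\xfa\xed\xfe",  # 64-bit, little-endian (e.g. x86_64, arm64)
    b"\xce\xfa\xed\xfe",  # 32-bit, little-endian
    b"\xfe\xed\xfa\xcf",  # 64-bit, big-endian
    b"\xfe\xed\xfa\xce",  # 32-bit, big-endian
    b"\xca\xfe\xba\xbe",  # universal (fat) binary; Java class files share this value
}


def binary_format(header):
    """Classify a binary from its first bytes: 'ELF', 'Mach-O' or 'unknown'."""
    magic = header[:4]
    if magic == ELF_MAGIC:
        return "ELF"
    if magic in MACHO_MAGICS:
        return "Mach-O"
    return "unknown"


def file_format(path):
    """Read the first four bytes of a file and classify it."""
    with open(path, "rb") as f:
        return binary_format(f.read(4))
```

On a macOS host, `file_format("/bin/ls")` would be expected to report a Mach-O binary, while a program extracted from an alpine image would report ELF.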

Slide 19

Slide 19 text

Limitations
- vfork (nommu) still has bugs
  - has to block the parent process until children exit
- no glibc-based image support (will work on)
- libc replacement doesn't work with static binaries
- no x86_64 (will work on)

Slide 20

Slide 20 text

Compatibility Study
- Network stack protocol conformance test
- Ixia IxANVL
  - Test ARP/IPv4/ICMP implementations
- Measurement of conformance (to standards/RFCs)

Slide 21

Slide 21 text

Software/Devices Under Test (DUT)

stack (year)    lang    how            API     features                original (if any)
lwip (2001)     C       src-embedded   custom  v4,v6,ipfwd,tcp         scratch
Seastar (2014)  C++17   static lib     custom  v4,tcp,dpdk             scratch
OSv (2013)      C++/C   static lib     POSIX   v4,tcp                  (freebsd)
gVisor (2018)   golang  go pkg         custom  v4,v6,tcp               scratch (netstack)
mTCP (2014)     C       static lib     custom  v4,tcp,dpdk             scratch
rump (2007)     C,asm   static/sh lib  POSIX   v4,v6,ipfwd,tcp         NetBSD
Linux (1991)    C,asm   (kernel)       POSIX   v4,v6,ipfwd,tcp,xdp?    Linux
LKL (2007?)     C,asm   static/sh lib  POSIX   v4,v6,ipfwd,tcp,dpdk    Linux

Slide 22

Slide 22 text

Linux Compatibility: network stack

         ARP    IPv4   ICMPv4
lwip     31/52  27/68  14/32
Seastar  32/52  12/27  10/22
OSv      20/52  26/68  17/32
gVisor   31/52  21/68  11/32
mTCP     16/52  15/68  12/32
rump     31/52  17/31  19/32
Linux    51/52  61/68  25/32
LKL      51/52  61/68  25/32

- 7 network stacks used by container runtimes
Results
- Seastar lacks IP forwarding (lower result)
- Without Quagga/Zebra, the results become worse (lack of config options)
- LKL (µKontainer) == Linux kernel: identical at the behavior level
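To make the passed/total pairs above easier to compare, the following sketch converts them into pass rates. The numbers are copied from the slide as-is, including the smaller totals reported for Seastar and rump; the helper names are illustrative only.

```python
# (passed, total) per protocol, copied from the results table.
RESULTS = {
    "lwip":    {"ARP": (31, 52), "IPv4": (27, 68), "ICMPv4": (14, 32)},
    "Seastar": {"ARP": (32, 52), "IPv4": (12, 27), "ICMPv4": (10, 22)},
    "OSv":     {"ARP": (20, 52), "IPv4": (26, 68), "ICMPv4": (17, 32)},
    "gVisor":  {"ARP": (31, 52), "IPv4": (21, 68), "ICMPv4": (11, 32)},
    "mTCP":    {"ARP": (16, 52), "IPv4": (15, 68), "ICMPv4": (12, 32)},
    "rump":    {"ARP": (31, 52), "IPv4": (17, 31), "ICMPv4": (19, 32)},
    "Linux":   {"ARP": (51, 52), "IPv4": (61, 68), "ICMPv4": (25, 32)},
    "LKL":     {"ARP": (51, 52), "IPv4": (61, 68), "ICMPv4": (25, 32)},
}


def pass_rate(passed, total):
    """Percentage of conformance tests passed."""
    return 100.0 * passed / total


def overall_rate(stack):
    """Pass rate over all three protocols combined for one stack."""
    passed = sum(p for p, _ in RESULTS[stack].values())
    total = sum(t for _, t in RESULTS[stack].values())
    return pass_rate(passed, total)


if __name__ == "__main__":
    for stack, protocols in RESULTS.items():
        cells = "  ".join(
            f"{name} {pass_rate(p, t):5.1f}%" for name, (p, t) in protocols.items()
        )
        print(f"{stack:8s} {cells}  overall {overall_rate(stack):5.1f}%")
```

Linux and LKL share every cell, so both come out at 137 of 152 tests, roughly 90.1% overall.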

Slide 23

Slide 23 text

Startup time
- Duration of a (simple) python program (until exit)
  - socket(2)/listen(2), to be ready for accepting HTTP requests
  - time docker run --runtime=XXX python-hello
- native < µKontainer < docker/runc < gvisor < nabla < kata
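A measurement like the one above can be approximated with a small harness that times a command from launch to exit. This is a sketch under assumptions, not the script used for the talk: `python-hello` is the image name from the slide, and the runtime names passed in must match whatever is registered with the local Docker daemon.

```python
import subprocess
import sys
import time
from typing import List


def docker_argv(runtime: str, image: str = "python-hello") -> List[str]:
    """Build the command line from the slide for a given runtime name."""
    return ["docker", "run", f"--runtime={runtime}", image]


def time_command(argv: List[str]) -> float:
    """Run a command to completion and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(
        argv,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


def best_of(argv: List[str], runs: int = 5) -> float:
    """Repeat the measurement and keep the fastest run to reduce noise."""
    return min(time_command(argv) for _ in range(runs))
```

For example, `best_of(docker_argv("runu"))` could be compared with `best_of(docker_argv("runc"))`, and `best_of([sys.executable, "hello.py"])` would give a native baseline for a hypothetical `hello.py`.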

Slide 24

Slide 24 text

Benchmark: Network I/O
- netperf (TCP_STREAM), 10GEther (p-t-p)
- native (host kernel) == runu
- Factors for better performance of runu
  - low syscall overhead (µKontainer)
  - help of offload features (TSO, checksum)

Slide 25

Slide 25 text

LKL Upstreaming
- Initial patches on LKML (2008)
- Proposed on LKML (2015)
- Recently restarted (Oct. 2019)
  - as a mode of UML (UMMODE=library)
  - 1st step: eliminate duplicated features (devices)
  - still ongoing
  - latest: v5 patch (July 2020)

Slide 26

Slide 26 text

Summary
- Containers can use an alternate kernel
- We don't have to reimplement the Linux kernel from scratch
- Good kernel-level compatibility

Slide 27

Slide 27 text

Resources/LKL/runu/µKontainer
- runu: https://github.com/ukontainer/runu
- LKL: https://github.com/lkl/linux
- LKL upstream (v5): https://lwn.net/Articles/825100/