Linux Container with Alternate Linux Kernel (Library)/container-runtime-meetup-202008-lkl


Hajime Tazaki

August 22, 2020


  1. Linux Container with Alternate Linux Kernel (Library)

    Hajime Tazaki (田崎 創), IIJ Research Laboratory. Container Runtime Meetup #2, August 2020. @thehajime
  2. History of Containers

    Started as an OS virtualization method.

    Tweet by The Best Linux Blog In the Unixverse (@nixcraft), 11:57 PM, Aug 10, 2018:
    "History of containers on unix like system: 1. chroot 1982 2. Freebsd jails 2000 3. Linux vserver 2001 4. Solaris zones 2004 5. OpenVZ 2005 6. LXC 2008 7. Systemd-nspawn 2010 8. Docker 2013 #sysadmin #linux #unix #macos #devops"
  3. Containers (cont'd)

    - As a sandbox (one-shot image)
    - As package management (program/runtime, decoupled config)
    - With an orchestration framework to host a bunch of applications
  4. All Satisfying?

    The kernel is shared across instances.
    - But what if you want to use an alternate kernel? There is no extensibility (latest kernel, out-of-tree modules).
    - What if you wish to avoid a kernel crash caused by your app?
    - Use a VM?
    ref: https://www.redhat.com/cms/managed-files/virtualization-vs-containers.png
  5. Alternate Options

    - Lightweight VMs: Kata Containers, Docker Desktop
    - Userspace kernel: UML (User-mode Linux), gVisor
    - Library OS: Graphene (a libOS), Nabla Container/Rumprun unikernel, Drawbridge (WSL1)
  6. Kata Containers

    - Uses a lightweight VMM under the container infrastructure: qemu (lightweight version) or firecracker
    - OCI runtime: runv
    - Runs an independent kernel in each container instance (isolation)
    - Runs a (small) guest Linux kernel (compatibility)
    - Still has a slight overhead
    Compatibility: ++, Portability: ++, Lightweight: -
    ref: https://katacontainers.io/
  7. Docker Desktop

    - Runs Linux containers on a foreign platform (Windows/macOS)
    - Small Linux VM
    - Runs (most of) the components in Linux
    - Goal: transparent usage from the host OS
    - Useful for development environments
    Compatibility: ++, Portability: ++, Lightweight: -
    ref: https://docs.docker.com/docker-for-mac/images/docker-for-mac-install.png
  8. User-mode Linux (UML)

    - Runs the Linux kernel as a userspace process
    - Upstream (since kernel 2.2.x?)
    - Supports i386/x86_64 Linux hosts
    - Experimental ppc/Linux and Windows hosts (not maintained)
    - ptrace-based syscall interposition: less portable
    Compatibility: ++, Portability: -, Lightweight: +/-
  9. Library OSs/Unikernels

    Graphene
    - A libOS (Drawbridge inspired) with syscall translation
    - Linux ABI-level compatibility
    - Compatibility: +, Portability: +, Lightweight: +
    Rumprun unikernel/nabla container
    - Binds the NetBSD (rump) kernel to Linux programs
    - API-level compatibility
    - Compatibility: +/-, Portability: +, Lightweight: ++
    ref: https://grapheneproject.io https://nabla-containers.github.io
  10. Summary of Alternatives

    Approaches
    - Compatibility-centric: more like a VM
    - Portability-centric: less compatibility
    Goal: VM-level compatibility with container-level lightweightness
  11. µKontainer & Linux Kernel Library
  12. Linux Kernel Library (LKL)

    - A library (liblkl.{so,a})
    - Runs Linux code in various ways with a reusable library
    - h/w dependent layer for Linux/Windows/FreeBSD/macOS/Android userspace, unikernel, UEFI, and a network simulator (ns-3)
    - Code: 2.4 KLoC (h/w independent), 6.6 KLoC (h/w dependent)
  13. Design options

    Why modify the Linux kernel (even though it is hard to upstream)?
    - Almost 30 years old (since 1991)
    - Still growing at a rapid pace (new features + bug fixes)
    - We don't want to rewrite it from scratch
    Reuse instead of rewrite (NetBSD rump kernel)
  14. Container integration (µKontainer)

    Components
    - OCI runtime: runu
    - (containerd/dockerd port (macOS))
    Types of images
    - runu-private image (statically-linked LKL application)
    - public image (e.g., alpine:latest) (libc replacement)
  15. runu internals

    - Runs LKL programs under docker/k8s
    - Communicates w/ containerd/kubelet
    - Sets up (virtual) devices as exposed file descriptors (fds): tap, veth, disk image, virtio 9pfs
    - (Optionally) replaces libc.so

    Usage
    Docker:
        docker run --runtime=runu runu-python:latest
    k8s: add a runtimeClassName line
        apiVersion: apps/v1
        kind: Deployment
        spec:
          template:
            spec:
              runtimeClassName: ukontainer
              containers:
              - name: runu-python
                image: thehajime/runu-python:3.0
  16. Use case: Docker for mac+

    - Runs docker images without Hypervisor.framework, as Mach-O (user space) programs
    - Programs other than the container image are Mach-O binaries
    - syscalls are invoked inside LKLed programs
    Benefits
    - Native experience while doing Linux
    Currently only x86_64 works (both mac and container image); Apple Silicon may work w/ a slight effort
  17. Demo: alpine linux on macOS

    https://asciinema.org/a/347292
  18. Docker for mac+: How it works

    0. (Mach-O) Run LKL as init process
    1. (Mach-O) (v)fork/execve Linux ELF binary
    2. (ELF) interpreter (musl+) loads (downloaded) ELF program
    3. (ELF) call main() function
    4. (ELF) syscall => LKL syscall (libc replacement)
    5. (Mach-O) handle lkl syscall from ELF
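Steps 3 to 5 of slide 18 can be illustrated with a toy model. This is only a sketch of the redirection idea, not LKL's or runu's actual API: the class names `ToyLibraryKernel` and `ToyLibc`, the in-memory file table, and the three supported calls are all hypothetical. The point is that the application calls ordinary libc-style wrappers, and the replaced libc forwards each call to a kernel living in the same process instead of trapping into the host kernel.

```python
# Toy model of syscall redirection through a libc replacement.
# Every name below is hypothetical; none of this is LKL's or runu's real API.

ENOSYS = 38  # Linux errno value: function not implemented
EBADF = 9    # Linux errno value: bad file descriptor


class ToyLibraryKernel:
    """Stands in for the kernel linked into the process (step 0)."""

    def __init__(self):
        self.files = {}      # path -> bytearray, a pretend in-memory filesystem
        self.fd_table = {}   # fd -> path
        self.next_fd = 3     # 0-2 are conventionally stdin/stdout/stderr
        self.handlers = {
            "open": self._open,
            "write": self._write,
            "close": self._close,
        }

    def syscall(self, name, *args):
        """Step 5: dispatch a syscall request coming from the ELF side."""
        handler = self.handlers.get(name)
        if handler is None:
            return -ENOSYS
        return handler(*args)

    def _open(self, path):
        self.files.setdefault(path, bytearray())
        fd = self.next_fd
        self.next_fd += 1
        self.fd_table[fd] = path
        return fd

    def _write(self, fd, data):
        path = self.fd_table.get(fd)
        if path is None:
            return -EBADF
        self.files[path] += data
        return len(data)

    def _close(self, fd):
        return 0 if self.fd_table.pop(fd, None) is not None else -EBADF


class ToyLibc:
    """Step 4: the libc replacement routes each wrapper to the library kernel."""

    def __init__(self, kernel):
        self.kernel = kernel

    def open(self, path):
        return self.kernel.syscall("open", path)

    def write(self, fd, data):
        return self.kernel.syscall("write", fd, data)

    def close(self, fd):
        return self.kernel.syscall("close", fd)


def main(libc):
    """Step 3: the application's main(), unaware of where its syscalls go."""
    fd = libc.open("/hello.txt")
    written = libc.write(fd, b"hello from ELF\n")
    libc.close(fd)
    return written
```

Calling `main(ToyLibc(ToyLibraryKernel()))` exercises the whole path; a call the toy kernel does not implement comes back as a negative errno, mirroring how a kernel reports an unsupported syscall.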
  19. Limitations

    - vfork (nommu) still has bugs; it has to block the parent process until children exit
    - No glibc-based image support (will work on)
    - libc-replacement doesn't work with static binaries
    - no x86_64 (will work on)
  20. Compatibility Study

    Network stack protocol conformance test
    - Ixia IxAnvl
    - Tests ARP/IPv4/ICMP implementations
    - Measures conformance (to standards/RFCs)
  21. Software/Devices Under Test (DUT)

    Stack              Year     Lang    How            API     Features                  Original (if any)
    lwip               2001     C       src-embedded   custom  v4,v6,ipfwd,tcp           scratch
    Seastar            2014     C++17   static lib     custom  v4,tcp,dpdk               scratch
    OSv                2013     C++/C   static lib     POSIX   v4,tcp                    (freebsd)
    gVisor (netstack)  2018     golang  go pkg         custom  v4,v6,tcp                 scratch
    mTCP               2014     C       static lib     custom  v4,tcp,dpdk               scratch
    rump               2007     C,asm   static/sh lib  POSIX   v4,v6,ipfwd,tcp           NetBSD
    Linux              1991     C,asm   (kernel)       POSIX   v4,v6,ipfwd,tcp,xdp?      Linux
    LKL                2007?    C,asm   static/sh lib  POSIX   v4,v6,ipfwd,tcp,dpdk      Linux
  22. Linux Compatibility: network stack

    Stack    ARP    IPv4   ICMPv4
    lwip     31/52  27/68  14/32
    Seastar  32/52  12/27  10/22
    OSv      20/52  26/68  17/32
    gVisor   31/52  21/68  11/32
    mTCP     16/52  15/68  12/32
    rump     31/52  17/31  19/32
    Linux    51/52  61/68  25/32
    LKL      51/52  61/68  25/32

    7 network stacks used by container runtimes

    Results
    - Seastar lacks IP forwarding (lower result)
    - Without Quagga/Zebra, the results become worse (lack of config options)
    - LKL (µKontainer) == Linux kernel: identical at the behavior level
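To compare the stacks in slide 22 at a glance, the passed/total pairs can be turned into pass rates. The sketch below copies the numbers exactly as they appear on the slide (including the smaller totals reported for Seastar and rump); the names `RESULTS`, `pass_rate`, and `rates_for` are illustrative choices, not part of any tool mentioned in the talk.

```python
# Pass rates for the conformance results on slide 22.
# Each entry is (tests passed, tests run), copied from the slide as-is.
RESULTS = {
    "lwip":    {"ARP": (31, 52), "IPv4": (27, 68), "ICMPv4": (14, 32)},
    "Seastar": {"ARP": (32, 52), "IPv4": (12, 27), "ICMPv4": (10, 22)},
    "OSv":     {"ARP": (20, 52), "IPv4": (26, 68), "ICMPv4": (17, 32)},
    "gVisor":  {"ARP": (31, 52), "IPv4": (21, 68), "ICMPv4": (11, 32)},
    "mTCP":    {"ARP": (16, 52), "IPv4": (15, 68), "ICMPv4": (12, 32)},
    "rump":    {"ARP": (31, 52), "IPv4": (17, 31), "ICMPv4": (19, 32)},
    "Linux":   {"ARP": (51, 52), "IPv4": (61, 68), "ICMPv4": (25, 32)},
    "LKL":     {"ARP": (51, 52), "IPv4": (61, 68), "ICMPv4": (25, 32)},
}


def pass_rate(passed, total):
    """Fraction of conformance tests passed, between 0.0 and 1.0."""
    return passed / total


def rates_for(stack):
    """Per-protocol pass rates for one stack, e.g. {'ARP': 0.98, ...}."""
    return {proto: pass_rate(*pair) for proto, pair in RESULTS[stack].items()}


def print_table():
    """Print one line per stack with percentages for each protocol."""
    for stack in RESULTS:
        cells = "  ".join(
            f"{proto} {rate:6.1%}" for proto, rate in rates_for(stack).items()
        )
        print(f"{stack:8s} {cells}")
```

Because the LKL and Linux rows are identical, `rates_for("LKL") == rates_for("Linux")` holds, which is the slide's point about behavior-level equivalence.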
  23. Startup time

    - Duration of a (simple) python program (until exit)
    - socket(2)/listen(2), to be ready for accepting HTTP requests
    - time docker run --runtime=XXX python-hello
    native < µKontainer < docker/runc < gvisor < nabla < kata
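The measurement on slide 23 wraps `docker run --runtime=XXX python-hello` in `time`. A small harness along the same lines might look like the sketch below. It is an assumption about how one could repeat the measurement, not the script used for the talk: the helper names `time_command` and `docker_argv` are made up here, the `--rm` flag is an addition so repeated runs do not leave stopped containers behind, and the runtime names must match whatever is registered with the local Docker daemon.

```python
import statistics
import subprocess
import time


def time_command(argv, runs=3):
    """Run argv `runs` times and return the wall-clock duration of each run."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            check=True,                    # raise if the command fails
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def docker_argv(runtime, image="python-hello"):
    """Build the command line from the slide for a given OCI runtime name."""
    return ["docker", "run", "--rm", f"--runtime={runtime}", image]


def compare(runtimes, runs=3):
    """Return {runtime: median seconds} for each runtime name given."""
    return {
        name: statistics.median(time_command(docker_argv(name), runs))
        for name in runtimes
    }
```

For example, `compare(["runc", "runu"])` would time the two runtimes named on the slides, provided both are configured in Docker and a `python-hello` image exists locally.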
  24. Benchmark: Network I/O

    netperf (TCP_STREAM), 10GEther (p-t-p)
    native (host kernel) == runu
    Factors for better performance of runu
    - low syscall overhead (µKontainer)
    - help of offload features (TSO, checksum)
  25. LKL Upstreaming

    - Initial patches on LKML (2008)
    - Proposed on LKML (2015)
    - Recently restarted (Oct. 2019) as a mode of UML (UMMODE=library)
    - 1st step: eliminate duplicated features (devices); still ongoing
    - Latest: v5 patch (July 2020)
  26. Summary

    - A container can use an alternate kernel
    - We don't have to reimplement the Linux kernel from scratch
    - Good kernel-level compatibility
  27. Resources/LKL/runu/µKontainer

    runu: https://github.com/ukontainer/runu
    LKL: https://github.com/lkl/linux
    LKL upstream (v5): https://lwn.net/Articles/825100/