Linux Container with Alternate Linux Kernel (Library)

Hajime Tazaki

August 22, 2020

Transcript

  1. Linux Container with Alternate Linux Kernel (Library)

    Hajime Tazaki (田崎 創), IIJ Research Laboratory
    Container Runtime Meetup #2, August 2020
    @thehajime
  2. History of Containers

    Started as an OS virtualization method
    Tweet by @nixcraft (The Best Linux Blog In the Unixverse), 11:57 PM, Aug 10, 2018:
    History of containers on unix like system:
      1. chroot 1982
      2. Freebsd jails 2000
      3. Linux vserver 2001
      4. Solaris zones 2004
      5. OpenVZ 2005
      6. LXC 2008
      7. Systemd-nspawn 2010
      8. Docker 2013
    #sysadmin #linux #unix #macos #devops
  3. Containers (cont'd)

    As a sandbox (one-shot image)
    As package management (program/runtime, decoupled config)
    With an orchestration framework to host a bunch of applications
  4. All Satisfying?

    Kernel is shared across instances
    But what if you want to use an alternate kernel?
      no extensibility (latest kernel, out-of-tree module)
      wish to avoid a kernel crash caused by your app?
    VM?
    ref: https://www.redhat.com/cms/managed-files/virtualization-vs-containers.png
  5. Alternate Options

    Lightweight VMs
      Kata containers
      Docker Desktop
    Userspace kernel
      UML (User-mode Linux)
      gVisor
    Library OS
      Graphene (a libOS)
      Nabla Container/Rumprun unikernel
      Drawbridge (WSL1)
  6. Kata Containers

    Use lightweight VMM under container infrastructure
      qemu (lightweight version), or firecracker
      OCI runtime: runv
    Run independent kernel in a container instance (isolation)
    Run (small) guest Linux kernel (compatibility)
    Still slight overhead
    Compatibility: ++, Portability: ++, Lightweight: -
    ref: https://katacontainers.io/
  7. Docker Desktop

    Run Linux container on foreign platform (Windows/macOS)
      Small Linux VM
      Run (most of) components in Linux
    Goal: Transparent usage from host OS
      Useful for development environment
    Compatibility: ++, Portability: ++, Lightweight: -
    ref: https://docs.docker.com/docker-for-mac/images/docker-for-mac-install.png
  8. User-mode Linux (UML)

    Runs the Linux kernel as a userspace process
      upstream (since kernel 2.2.x?)
    Supports i386/x86_64 Linux hosts
      experimental ppc/Linux and Windows hosts (not maintained)
    ptrace-based syscall interposition
      less portable
    Compatibility: ++, Portability: -, Lightweight: +/-
  9. Library OSs/Unikernels

    Graphene
      a LibOS (Drawbridge inspired) with syscall translation
      Linux ABI-level compatibility
      Compatibility: +, Portability: +, Lightweight: +
    Rumprun unikernel/nabla container
      Bind NetBSD (rump) kernel to Linux programs
      API-level compatibility
      Compatibility: +/-, Portability: +, Lightweight: ++
    ref: https://grapheneproject.io https://nabla-containers.github.io
  10. Summary of Alternatives

    Approaches
      Compatibility-centric: more like a VM
      Portability-centric: less compatibility
    Goal: VM-level compatibility with container-level lightweightness
  11. Linux Kernel Library (LKL)

    a library (liblkl.{so,a})
    runs Linux code in various ways with a reusable library
      h/w dependent layer on Linux/Windows/FreeBSD/macOS/Android userspace, unikernel, on UEFI
      network simulator (ns-3)
    code
      2.4KLoC (h/w independent)
      6.6KLoC (h/w dependent)
  12. Design options

    Why modify the Linux kernel? (even though it's hard to upstream)
      almost 30 years old (since 1991)
      still growing at a rapid pace (new features + bug fixes)
      we don't want to rewrite it from scratch
    Reuse instead of Rewrite (NetBSD rump kernel)
  13. Container integration (µKontainer)

    Components
      OCI runtime: runu
      (containerd/dockerd port (macOS))
    Type of Images
      runu-private image (statically-linked LKL application)
      public image (e.g., alpine:latest) (libc replacement)
  14. runu internals

    Run LKL programs under docker/k8s
      communicate w/ containerd/kubelet
      set up (virtual) devices as exposed file descriptors (fds) (tap, veth, disk image, virtio 9pfs)
      (optionally) replace libc.so
    usage
      Docker:
          docker run --runtime=runu runu-python:latest
      k8s: add a runtimeClassName line
          apiVersion: apps/v1
          kind: Deployment
          spec:
            template:
              spec:
                runtimeClassName: ukontainer
                containers:
                - name: runu-python
                  image: thehajime/runu-python:3.0
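The k8s usage on this slide can also be generated programmatically. The following is a minimal sketch using only the Python standard library. The function name `runu_deployment`, the metadata name, the `app` label, the selector, and the replica count are hypothetical values added here so that the manifest is complete; the slide itself shows only `apiVersion`, `kind`, `runtimeClassName`, and the container name and image.

```python
import json


def runu_deployment(name="runu-python",
                    image="thehajime/runu-python:3.0",
                    runtime_class="ukontainer"):
    """Build a Deployment manifest that selects an alternate runtime.

    The labels, selector and replica count are assumptions for
    illustration; only runtimeClassName, name and image come from the slide.
    """
    labels = {"app": name}  # hypothetical label
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # The one line the slide asks you to add.
                    "runtimeClassName": runtime_class,
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(runu_deployment(), indent=2))
```

The printed JSON could be piped to `kubectl apply -f -`, assuming a RuntimeClass named `ukontainer` has already been registered in the cluster.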
  15. Use case: Docker for mac+

    Run docker images without Hypervisor.framework
      as Mach-O (user space) programs
      Programs except the container image are Mach-O binaries
      syscalls are invoked inside LKLed programs
    Benefits
      native experience while doing Linux
    Currently only x86_64 works (both mac and container image)
      Apple Silicon may work w/ a slight effort
  16. Demo: alpine linux on macOS

    https://asciinema.org/a/347292
  17. Docker for mac+: How it works

    0. (Mach-O) Run LKL as init process
    1. (Mach-O) (v)fork/execve Linux ELF binary
    2. (ELF) interpreter (musl+) loads (downloaded) ELF program
    3. (ELF) call main() function
    4. (ELF) syscall => LKL syscall (libc replacement)
    5. (Mach-O) handle lkl syscall from ELF
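Steps 4 and 5 describe a redirect: the replaced libc forwards each syscall to the library kernel inside the same process instead of trapping into the host kernel. The following is a toy model of that idea in Python, not LKL's actual interface. `ToyLibraryKernel` and `ReplacedLibc` are hypothetical names, and the in-memory file table stands in for a real kernel.

```python
class ToyLibraryKernel:
    """Hypothetical stand-in for a kernel linked into the process as a library."""

    def __init__(self):
        self.files = {}    # path -> bytearray holding file contents
        self.fds = {}      # file descriptor -> path
        self.next_fd = 3   # 0-2 are conventionally stdin/stdout/stderr

    def syscall(self, name, *args):
        """Dispatch a syscall by name; unknown syscalls return -ENOSYS."""
        handler = getattr(self, "sys_" + name, None)
        if handler is None:
            return -38     # -ENOSYS on Linux
        return handler(*args)

    def sys_open(self, path):
        self.files.setdefault(path, bytearray())
        fd = self.next_fd
        self.next_fd += 1
        self.fds[fd] = path
        return fd

    def sys_write(self, fd, data):
        if fd not in self.fds:
            return -9      # -EBADF on Linux
        self.files[self.fds[fd]] += data
        return len(data)


class ReplacedLibc:
    """Toy 'libc replacement': wrappers call the library kernel directly."""

    def __init__(self, kernel):
        self.kernel = kernel

    def open(self, path):
        return self.kernel.syscall("open", path)

    def write(self, fd, data):
        return self.kernel.syscall("write", fd, data)


if __name__ == "__main__":
    kernel = ToyLibraryKernel()
    libc = ReplacedLibc(kernel)
    fd = libc.open("/hello.txt")
    libc.write(fd, b"hello from the guest program\n")
    print(bytes(kernel.files["/hello.txt"]))
```

The application only ever sees the libc wrappers, which is why swapping the library underneath it can change which kernel services the call.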
  18. Limitations

    vfork (nommu) is still buggy
      has to block the parent process until children exit
    no glibc-based image support (will work on)
      libc replacement doesn't work with static binaries
    no x86_64 (will work on)
  19. Compatibility Study

    Network stack protocol conformance test
      Ixia IxAnvl
      Test ARP/IPv4/ICMP implementation
    Measurement of conformance (to standards/RFCs)
  20. Software/Devices Under Test (DUT)

    Stack              Year   Lang    How            API     Features               Original (if any)
    lwip               2001   C       src-embedded   custom  v4,v6,ipfwd,tcp        scratch
    Seastar            2014   C++17   static lib     custom  v4,tcp,dpdk            scratch
    OSv                2013   C++/C   static lib     POSIX   v4,tcp                 (freebsd)
    gVisor (netstack)  2018   golang  go pkg         custom  v4,v6,tcp              scratch
    mTCP               2014   C       static lib     custom  v4,tcp,dpdk            scratch
    rump               2007   C,asm   static/sh lib  POSIX   v4,v6,ipfwd,tcp        NetBSD
    Linux              1991   C,asm   (kernel)       POSIX   v4,v6,ipfwd,tcp,xdp?   Linux
    LKL                2007?  C,asm   static/sh lib  POSIX   v4,v6,ipfwd,tcp,dpdk   Linux
  21. Linux Compatibility: network stack

    Stack    ARP    IPv4   ICMPv4
    lwip     31/52  27/68  14/32
    Seastar  32/52  12/27  10/22
    OSv      20/52  26/68  17/32
    gVisor   31/52  21/68  11/32
    mTCP     16/52  15/68  12/32
    rump     31/52  17/31  19/32
    Linux    51/52  61/68  25/32
    LKL      51/52  61/68  25/32

    7 network stacks used by container runtimes
    Results
      Seastar lacks IP forwarding (lower result)
      Without Quagga/Zebra, the results become worse (lack of config options)
      LKL (µKontainer) == Linux kernel, identical at the behavior level
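The pass counts on this slide are easier to compare as percentages, because the denominators differ for some stacks. The following is a small sketch that takes the figures from the slide as given; the helper names `pass_rates` and `same_results` are introduced here for illustration only.

```python
# (passed, total) per protocol suite, copied from the slide.
RESULTS = {
    "lwip":    {"ARP": (31, 52), "IPv4": (27, 68), "ICMPv4": (14, 32)},
    "Seastar": {"ARP": (32, 52), "IPv4": (12, 27), "ICMPv4": (10, 22)},
    "OSv":     {"ARP": (20, 52), "IPv4": (26, 68), "ICMPv4": (17, 32)},
    "gVisor":  {"ARP": (31, 52), "IPv4": (21, 68), "ICMPv4": (11, 32)},
    "mTCP":    {"ARP": (16, 52), "IPv4": (15, 68), "ICMPv4": (12, 32)},
    "rump":    {"ARP": (31, 52), "IPv4": (17, 31), "ICMPv4": (19, 32)},
    "Linux":   {"ARP": (51, 52), "IPv4": (61, 68), "ICMPv4": (25, 32)},
    "LKL":     {"ARP": (51, 52), "IPv4": (61, 68), "ICMPv4": (25, 32)},
}


def pass_rates(results):
    """Convert (passed, total) pairs into pass percentages."""
    return {
        stack: {proto: 100.0 * passed / total
                for proto, (passed, total) in row.items()}
        for stack, row in results.items()
    }


def same_results(results, a, b):
    """True when two stacks have identical counts in every suite."""
    return results[a] == results[b]


if __name__ == "__main__":
    for stack, row in pass_rates(RESULTS).items():
        cells = "  ".join(f"{proto} {rate:5.1f}%" for proto, rate in row.items())
        print(f"{stack:8s} {cells}")
    print("LKL identical to Linux:", same_results(RESULTS, "LKL", "Linux"))
```

Matching counts only show that the totals agree; the slide's stronger claim of identical behavior rests on the authors' test runs.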
  22. Startup time

    Duration of a (simple) python program (until exit)
      socket(2)/listen(2), to be ready for accepting HTTP requests
      time docker run --runtime=XXX python-hello
    native < µKontainer < docker/runc < gvisor < nabla < kata
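The measurement on this slide is `time docker run --runtime=XXX python-hello` repeated per runtime. The following is a minimal sketch of such a harness in Python. The helper names `docker_cmd` and `time_command` are hypothetical, and the sketch assumes the `python-hello` image and the chosen runtime are already installed; it does not reproduce the authors' measurement setup.

```python
import subprocess
import sys
import time


def docker_cmd(runtime, image="python-hello"):
    """Build the command line from the slide for a given OCI runtime."""
    return ["docker", "run", f"--runtime={runtime}", image]


def time_command(cmd):
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Sanity check that needs no container runtime: time the local interpreter.
    print(f"native python: {time_command([sys.executable, '-c', 'pass']):.3f}s")
    # With docker and the runu runtime installed (an assumption), one could run:
    #   print(time_command(docker_cmd("runu")))
```

Repeating each measurement several times and comparing medians would reduce noise from image caching and scheduler effects.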
  23. Benchmark: Network I/O

    netperf (TCP_STREAM), 10GEther (p-t-p)
    native (host kernel) == runu
    Factors for better performance of runu
      low syscall overhead (µKontainer)
      help of offload features (TSO, checksum)
  24. LKL Upstreaming

    Initial patches on LKML (2008)
    Proposed on LKML (2015)
    Recently restarted (Oct. 2019)
      as a mode of UML (UMMODE=library)
      1st step: eliminate duplicated features (devices)
    still ongoing
      latest: v5 patch (July 2020)
  25. Summary

    Containers can use an alternate kernel
    We don't have to reimplement the Linux kernel from scratch
    Good kernel-level compatibility