Slide 1

Slide 1 text

How to Design a Library OS for Practical Containers?
Hajime Tazaki (IIJ Research Laboratory)
Akira Moroo (Ricerca Security, Inc.)
Yohei Kuga (The University of Tokyo)
Ryo Nakamura (The University of Tokyo)
ACM VEE 2021, Virtual, 2021

Slide 2

Slide 2 text

Container?
- Distribution technology (1950's): to reduce loading time to ships
- Distribution technology (2000's): in computer systems, to reduce headaches?

Image credits:
- Malcom McLean (Containerization) https://en.wikipedia.org/wiki/Malcom_McLean
- https://www.techradar.com/news/what-is-container-technology

Slide 3

Slide 3 text

Containers? (cont'd)
- Appeared in the early 2000's (Jail/Vserver/Solaris Zone)
- Popularized by Docker (2013?)
- Standardized (Open Containers Initiative = OCI)
- Various runtime extensions: gVisor, Kata Containers, Nabla Containers, Graphene LibOS

nixCraft (@nixcraft), 11:57 PM · Aug 10, 2018:
History of containers on unix like system: 1. chroot 1982 2. Freebsd jails 2000 3. Linux vserver 2001 4. Solaris zones 2004 5. OpenVZ 2005 6. LXC 2008 7. Systemd-nspawn 2010 8. Docker 2013 #sysadmin #linux #unix #macos #devops

Slide 4

Slide 4 text

Runtime Extensions
- Rebuild programs and use an alternate kernel (OSv[31], rumprun[30]/nabla[60])
- Intercept syscalls and replace them with an external implementation (UML[11], gVisor[16], Noah[48])
- Replace the standard library on the fly (HermiTux[40])
- Binary translation (X-container[50], HermiTux[40])
- Non-Linux platforms (Kata[52], Docker Desktop, Graphene[55])

Library OSs (libOS) as the kernel of a Linux container

Slide 5

Slide 5 text

How hard is it to implement Linux?

Slide 6

Slide 6 text

The level of compatibility
- API-level compatibility: programs are runnable if source code is available and rebuilt
- ABI-level compatibility: programs are runnable without modifications, but the kernel behavior does not need to be identical
- Kernel-level compatibility (ABI compat w/ Linux personality): programs are runnable with behavior identical to the original kernel (bug-for-bug compatibility)

Slide 7

Slide 7 text

Examples of dropped compatibility
- TCP cc algorithm (gVisor)
- Lack of CMSG handling (Graphene)

Slide 8

Slide 8 text

Huge Linux Codebase
- Number of config options (≈ # of features): keeps increasing, except around v4.17
- Number of commits: 80,000 commits/year
- Number of bug fixes: 10,000 commits/year (2019, 2020)

Slide 9

Slide 9 text

"Rome was not buylt in one day" - (Erasmus’s Proverbs) https://commons.wikimedia.org/wiki/File:Colosseum_exterior,_inner_and_outer_wall_AvL.jpg 9

Slide 10

Slide 10 text

"Rome was not buylt in one day" - (Erasmus’s Proverbs) "Linux was not built in one day" - (???) https://commons.wikimedia.org/wiki/File:Colosseum_exterior,_inner_and_outer_wall_AvL.jpg 9

Slide 11

Slide 11 text

The level of compatibility (cont'd)
Approaches:
- Compatibility-centric: more like a VM
- Portability-centric: less compatibility
Goal: VM-level compatibility combined with the lightweight property of containers

Slide 12

Slide 12 text

µKontainer: a practical container runtime

Slide 13

Slide 13 text

µKontainer & runu runtime
- Run container programs with an alternate container kernel (LKL)
- Integrate with the container ecosystem
  - OCI runtime: runu
  - Userspace execution of LKL
- Types of images
  - runu-private image (statically-linked LKL application)
  - public image (e.g., alpine:latest) (libc replacement)

Slide 14

Slide 14 text

Linux Kernel Library (LKL)
- h/w independent architecture (arch/lkl)
- a library (liblkl.{so,a})
- run Linux code in various ways, as reusable library code
- 2.4KLoC (h/w independent), 6.6KLoC (h/w dep)
Extended for µKontainer:
- libc/libc++ port
- vfork(2) implementation
- macOS host port

Slide 15

Slide 15 text

Evaluations
- Q1: What degree of compatibility is achieved?
- Q2: Does it execute on a non-Linux platform?
- Q3: How much overhead is introduced?

Slide 16

Slide 16 text

Linux compatibility tests
- Network protocol conformance tests (Ixia IxANVL)
- Blackbox tests (ARP/IPv4/ICMP implementation)
  - Send packets to the DUT
  - Observe the response from the DUT
  - Validate the response against IETF standards (RFCs)
- See how much each implementation differs from the Linux kernel
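The send/observe/validate loop above can be illustrated with a minimal Python sketch for a single ICMP echo check. This is not how IxANVL works internally; the function names (`internet_checksum`, `build_icmp_echo`, `check_echo_reply`) and the pass/fail verdict strings are hypothetical, and the sketch only operates on byte strings rather than on a real network interface.

```python
import struct

ICMP_ECHO_REQUEST = 8
ICMP_ECHO_REPLY = 0


def internet_checksum(data: bytes) -> int:
    """One's-complement sum over 16-bit words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_icmp_echo(icmp_type: int, ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMP echo message (type, code=0, checksum, id, seq, payload)."""
    header = struct.pack("!BBHHH", icmp_type, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", icmp_type, 0, csum, ident, seq) + payload


def check_echo_reply(request: bytes, reply: bytes) -> str:
    """Hypothetical verdict for one blackbox test case: 'pass' or 'fail'."""
    if len(reply) < 8:
        return "fail"
    r_type, r_code, _, r_id, r_seq = struct.unpack("!BBHHH", reply[:8])
    _, _, _, q_id, q_seq = struct.unpack("!BBHHH", request[:8])
    if r_type != ICMP_ECHO_REPLY or r_code != 0:
        return "fail"
    if internet_checksum(reply) != 0:  # a valid message sums to zero
        return "fail"
    if (r_id, r_seq) != (q_id, q_seq) or reply[8:] != request[8:]:
        return "fail"  # identifier, sequence and data must be echoed back
    return "pass"
```

In a real test bed the request would be sent to the DUT on the wire and the reply captured from it; here both are constructed locally so the validation step can be read in isolation.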

Slide 17

Slide 17 text

DUT: Various implementations

name      year     lang    how            API     features               origin an
lwip      (2001)   C       src- embedded  custom  v4,v6,ipfwd,tcp        scra
Seastar   (2014)   C++17   static lib     custom  v4,tcp,dpdk            scra
OSv       (2013)   C++/C   static lib     POSIX   v4,tcp                 (free
gVisor    (2018)   golang  go pkg         custom  v4,v6,tcp              scra
mTCP      (2014)   C       static lib     custom  v4,tcp,dpdk            scra
rump      (2007)   C,asm   static/sh lib  POSIX   v4,v6,ipfwd,tcp        Net
Graphene  (2014)   C,asm   static/sh lib  POSIX   v4,v6,tcp              scratch
Linux     (1991)   C,asm   (kernel)       POSIX   v4,v6,ipfwd,tcp,xdp?   Lin
LKL       (2007?)  C,asm   static/sh lib  POSIX   v4,v6,ipfwd,tcp,dpdk   Lin

9 (userspace) network stacks used by container runtimes

Slide 18

Slide 18 text

Linux compatibility (cont'd)

[Figure: per-test results (Pass / Fail / Fail (Inconclusive) / Error / No Test) for the ARP, IP and ICMP suites across lwip, seastar, osv, gvisor, mtcp, rump, graphene, linux, lkl and lkl-osx]

Passed tests / total tests:

            ARP     IPv4    ICMPv4
lwip        31/52   27/68   14/32
Seastar     32/52   12/27   10/22
OSv         20/52   26/68   17/32
gVisor      31/52   21/68   11/32
mTCP        16/52   15/68   12/32
rump        31/52   17/31   19/32
Graphene    45/52   51/68   21/32
Linux       51/52   61/68   25/32
LKL         51/52   61/68   25/32
LKL (osx)   51/52   61/68   25/32

- IP forwarding tests require multi-NIC support
- Without Quagga/Zebra, the results become worse (lack of config options)
- LKL (µKontainer) == Linux kernel, identical at the behavior level (details are in the paper)
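As a small worked example of reading the table, the following Python sketch compares each stack's pass counts with those of Linux. The numbers are copied from the table above; the helper names (`overall_pass_rate`, `matches_reference`) are hypothetical. Note that equal counts alone do not prove that the same individual tests passed; the per-test comparison is the one described in the paper.

```python
from fractions import Fraction

# (passed, total) per suite, copied from the results table on this slide.
RESULTS = {
    "lwip":      {"ARP": (31, 52), "IPv4": (27, 68), "ICMPv4": (14, 32)},
    "Seastar":   {"ARP": (32, 52), "IPv4": (12, 27), "ICMPv4": (10, 22)},
    "OSv":       {"ARP": (20, 52), "IPv4": (26, 68), "ICMPv4": (17, 32)},
    "gVisor":    {"ARP": (31, 52), "IPv4": (21, 68), "ICMPv4": (11, 32)},
    "mTCP":      {"ARP": (16, 52), "IPv4": (15, 68), "ICMPv4": (12, 32)},
    "rump":      {"ARP": (31, 52), "IPv4": (17, 31), "ICMPv4": (19, 32)},
    "Graphene":  {"ARP": (45, 52), "IPv4": (51, 68), "ICMPv4": (21, 32)},
    "Linux":     {"ARP": (51, 52), "IPv4": (61, 68), "ICMPv4": (25, 32)},
    "LKL":       {"ARP": (51, 52), "IPv4": (61, 68), "ICMPv4": (25, 32)},
    "LKL (osx)": {"ARP": (51, 52), "IPv4": (61, 68), "ICMPv4": (25, 32)},
}


def overall_pass_rate(name: str) -> Fraction:
    """Passed tests divided by executed tests, summed over all three suites."""
    passed = sum(p for p, _ in RESULTS[name].values())
    total = sum(t for _, t in RESULTS[name].values())
    return Fraction(passed, total)


def matches_reference(name: str, reference: str = "Linux") -> bool:
    """True if the per-suite counts are equal to those of the reference stack."""
    return RESULTS[name] == RESULTS[reference]


if __name__ == "__main__":
    for name in RESULTS:
        rate = float(overall_pass_rate(name))
        marker = "same counts as Linux" if matches_reference(name) else ""
        print(f"{name:10s} {rate:6.1%} {marker}")
```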

Slide 19

Slide 19 text

Platform portability: Alpine Linux on macOS
https://asciinema.org/a/347292

Slide 20

Slide 20 text

Startup duration (cold-start)
- Duration of a (simple) Python program (until exit)
  - socket(2)/listen(2), to be ready for accepting HTTP requests
- time docker run --runtime=XXX python-hello
- native < µKontainer < docker/runc < gvisor < nabla < kata
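A measurement like the `time docker run` command above could be scripted roughly as follows. This is only a sketch of a wall-clock timing harness, not the script used for the talk: the helper names are hypothetical, the `--rm` flag is an added assumption so that repeated runs do not leave stopped containers behind, and nothing here resets caches between runs, which a true cold-start measurement would need.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=3):
    """Run argv `repeats` times and return the wall-clock duration of each run."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def docker_argv(runtime, image="python-hello"):
    """Command line from the slide; --rm is an assumption added for cleanup."""
    return ["docker", "run", "--rm", "--runtime=" + runtime, image]


if __name__ == "__main__":
    # Baseline that needs no container runtime: start and exit a Python process.
    print("native:", time_command([sys.executable, "-c", "pass"]))
    # With Docker and the runtimes installed, one would instead call e.g.
    # time_command(docker_argv("runu")), where "runu" is the runtime name
    # registered with the Docker daemon (assumed here).
```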

Slide 21

Slide 21 text

Netperf benchmark (Goodput, Latency)
- netperf (TCP_STREAM/TCP_MAERTS), 10GEther (p-t-p)
- native (host kernel) ≈ µKontainer
- Factors for better performance of µKontainer
  - low syscall overhead (µKontainer)
  - help of offload features (TSO, checksum)

Slide 22

Slide 22 text

Summary
- Reimplementing the Linux kernel is not a weekend task
  - Basic network behavior of IETF standards
  - Covering a broad range of rich features (TSO, CC algorithm)
- ABI compatibility is not enough
- Morphing the Linux kernel is possible, with low syscall overhead as a bonus

Slide 23

Slide 23 text

µKontainer
https://github.com/ukontainer/