How to Design a Library OS for Practical Containers?

Slides of our VEE 2021 paper (https://dl.acm.org/doi/10.1145/3453933.3454011).

Hajime Tazaki

April 17, 2021

Transcript

  1. How to Design a Library OS for Practical Containers?

    Hajime Tazaki (IIJ Research Laboratory)
    Akira Moroo (Ricerca Security, Inc.)
    Yohei Kuga (The University of Tokyo)
    Ryo Nakamura (The University of Tokyo)
    ACM VEE 2021, Virtual, 2021
  2. Container?

    Distribution technology (1950s) to reduce the time needed to load ships
      - Malcom McLean (Containerization), https://en.wikipedia.org/wiki/Malcom_McLean
    Distribution technology (2000s) in computer systems, to reduce headaches?
      https://www.techradar.com/news/what-is-container-technology
  3. Containers? (cont'd)

    Appeared in the early 2000s (Jail/Vserver/Solaris Zone)
    Popularized by Docker (2013?)
    Standardized (Open Container Initiative = OCI)
    Various runtime extensions: gVisor, Kata Containers, Nabla Containers, Graphene LibOS

    nixCraft (@nixcraft), Aug 10, 2018: "History of containers on unix like system: 1. chroot 1982 2. Freebsd jails 2000 3. Linux vserver 2001 4. Solaris zones 2004 5. OpenVZ 2005 6. LXC 2008 7. Systemd-nspawn 2010 8. Docker 2013 #sysadmin #linux #unix #macos #devops"
  4. Runtime Extensions

    Rebuild programs and use an alternate kernel (OSv[31], rumprun[30]/nabla[60])
    Intercept syscalls and redirect them to an external implementation (UML[11], gVisor[16], Noah[48])
    Replace the standard library on-the-fly (HermiTux[40])
    Binary translation (X-container[50], HermiTux[40])
    Non-Linux platforms (Kata[52], Docker Desktop, Graphene[55])

    Library OSs (libOS) as the kernel of a Linux container
  5. The level of compatibility

    API-level compatibility
      programs are runnable if source code is available and rebuilt
    ABI-level compatibility
      programs are runnable without modifications, but the kernel behavior does not need to be identical
    Kernel-level compatibility (ABI compat w/ Linux personality)
      programs are runnable with behavior identical to the original kernel
      bug-for-bug compatibility
  6. Examples of dropped compatibility

    TCP congestion control (cc) algorithm (gVisor)
    Lack of CMSG handling (Graphene)
  7. Huge Linux Codebase

    Number of config options (≈ # of features): keeps increasing, except around v4.17
    Number of commits: 80,000 commits/year
    Number of bug fixes: 10,000 commits/year (2019, 2020)
  8. "Rome was not buylt in one day" - (Erasmus’s Proverbs)

    https://commons.wikimedia.org/wiki/File:Colosseum_exterior,_inner_and_outer_wall_AvL.jpg
  9. "Rome was not buylt in one day" - (Erasmus’s Proverbs)

    "Linux was not built in one day" - (???)
    https://commons.wikimedia.org/wiki/File:Colosseum_exterior,_inner_and_outer_wall_AvL.jpg
  10. The level of compatibility (cont'd)

    Approaches
      Compatibility-centric: more like a VM
      Portability-centric: less compatibility
    Goal: VM-level compatibility with container-level lightweightness
  11. µKontainer & runu runtime

    Run container programs with an alternate container kernel (LKL)
    Integrate with the container ecosystem
      OCI runtime: runu
      Userspace execution of LKL
    Types of images
      runu-private image (statically-linked LKL application)
      public image (e.g., alpine:latest) (libc replacement)
  12. Linux Kernel Library (LKL)

    h/w independent architecture (arch/lkl)
    a library (liblkl.{so,a})
    run Linux code in various ways as a reusable library
    code: 2.4KLoC (h/w independent), 6.6KLoC (h/w dep)
    Extended for µKontainer
      libc/libc++ port
      vfork(2) implementation
      macOS host port
  13. Evaluations

    Q1: What degree of compatibility is achieved?
    Q2: Does it execute on non-Linux platforms?
    Q3: How much overhead is introduced?
  14. Linux compatibility tests

    Network protocol conformance tests (Ixia IxANVL)
    Blackbox tests (ARP/IPv4/ICMP implementation)
      Send packets to the device under test (DUT)
      Observe the response from the DUT
      Validate the response against IETF standards (RFCs)
    See how much each implementation differs from the Linux kernel
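The blackbox procedure on this slide (send a packet, observe the response, validate it against the RFC) can be illustrated with a small sketch. This is not IxANVL and not the test suite used in the paper; it is a minimal Python example for one case, an ICMPv4 echo exchange checked against RFC 792 with the RFC 1071 checksum. The function names and `simulated_dut` are hypothetical stand-ins, and no real packets are sent.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_icmp(msg_type: int, ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMPv4 echo message (type 8 = request, type 0 = reply)."""
    header = struct.pack("!BBHHH", msg_type, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", msg_type, 0, csum, ident, seq) + payload


def validate_echo_reply(request: bytes, reply: bytes) -> list:
    """Return a list of violations found in the reply; empty means pass."""
    if len(reply) < 8:
        return ["reply shorter than the ICMP header"]
    problems = []
    r_type, r_code, _, r_id, r_seq = struct.unpack("!BBHHH", reply[:8])
    _, _, _, q_id, q_seq = struct.unpack("!BBHHH", request[:8])
    if (r_type, r_code) != (0, 0):
        problems.append("type/code is not Echo Reply (0/0)")
    if internet_checksum(reply) != 0:
        problems.append("checksum does not verify")
    if (r_id, r_seq) != (q_id, q_seq):
        problems.append("identifier/sequence not echoed")
    if reply[8:] != request[8:]:
        problems.append("payload not echoed")
    return problems


def simulated_dut(request: bytes) -> bytes:
    """Hypothetical DUT stand-in that answers an echo request correctly."""
    _, _, _, ident, seq = struct.unpack("!BBHHH", request[:8])
    return build_icmp(0, ident, seq, request[8:])


if __name__ == "__main__":
    req = build_icmp(8, 0x1234, 1, b"ping")
    print(validate_echo_reply(req, simulated_dut(req)) or "pass")
```

In a real run the request would go out on the wire to the stack under test, and each violation would count as a failed test case for that stack.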
  15. DUT: Various implementations

    9 (userspace) network stacks used by container runtimes

    name (year)      lang    how            API     features                  origin an
    lwip (2001)      C       src-embedded   custom  v4,v6,ipfwd,tcp           scra
    Seastar (2014)   C++17   static lib     custom  v4,tcp,dpdk               scra
    OSv (2013)       C++/C   static lib     POSIX   v4,tcp                    (free
    gVisor (2018)    golang  go pkg         custom  v4,v6,tcp                 scra
    mTCP (2014)      C       static lib     custom  v4,tcp,dpdk               scra
    rump (2007)      C,asm   static/sh lib  POSIX   v4,v6,ipfwd,tcp           Net
    Graphene (2014)  C,asm   static/sh lib  POSIX   v4,v6,tcp                 scratch
    Linux (1991)     C,asm   (kernel)       POSIX   v4,v6,ipfwd,tcp,xdp?      Lin
    LKL (2007?)      C,asm   static/sh lib  POSIX   v4,v6,ipfwd,tcp,dpdk      Lin
  16. Linux compatibility (cont'd)

    [Figure: test results per stack (lwip, seastar, osv, gvisor, mtcp, rump, graphene, linux, lkl, lkl-osx) for arp, ip, and icmp; legend: Pass, Fail, Fail (Inconclusive), Error, No Test]

                ARP     IPv4    ICMPv4
    lwip        31/52   27/68   14/32
    Seastar     32/52   12/27   10/22
    OSv         20/52   26/68   17/32
    gVisor      31/52   21/68   11/32
    mTCP        16/52   15/68   12/32
    rump        31/52   17/31   19/32
    Graphene    45/52   51/68   21/32
    Linux       51/52   61/68   25/32
    LKL         51/52   61/68   25/32
    LKL (osx)   51/52   61/68   25/32

    IP forwarding tests require multi-NIC support
    Without Quagga/Zebra, the results become worse (lack of config options)
    LKL (µKontainer) == Linux kernel, identical at the behavior level (details are in the paper)
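To make the table easier to compare, the pass counts can be turned into rates and checked against the Linux row. The sketch below is a reading aid only: the numbers are copied from the table on this slide, while the dictionary layout and helper names are assumptions made for illustration. Equal counts are a necessary condition for identical behavior, not a proof of it; the per-test comparison is in the paper.

```python
# (passed, total) per test suite, copied from the table on this slide.
RESULTS = {
    "lwip":      {"ARP": (31, 52), "IPv4": (27, 68), "ICMPv4": (14, 32)},
    "Seastar":   {"ARP": (32, 52), "IPv4": (12, 27), "ICMPv4": (10, 22)},
    "OSv":       {"ARP": (20, 52), "IPv4": (26, 68), "ICMPv4": (17, 32)},
    "gVisor":    {"ARP": (31, 52), "IPv4": (21, 68), "ICMPv4": (11, 32)},
    "mTCP":      {"ARP": (16, 52), "IPv4": (15, 68), "ICMPv4": (12, 32)},
    "rump":      {"ARP": (31, 52), "IPv4": (17, 31), "ICMPv4": (19, 32)},
    "Graphene":  {"ARP": (45, 52), "IPv4": (51, 68), "ICMPv4": (21, 32)},
    "Linux":     {"ARP": (51, 52), "IPv4": (61, 68), "ICMPv4": (25, 32)},
    "LKL":       {"ARP": (51, 52), "IPv4": (61, 68), "ICMPv4": (25, 32)},
    "LKL (osx)": {"ARP": (51, 52), "IPv4": (61, 68), "ICMPv4": (25, 32)},
}


def pass_rate(stack: str, suite: str) -> float:
    """Fraction of tests passed by one stack in one suite."""
    passed, total = RESULTS[stack][suite]
    return passed / total


def matches_linux(stack: str) -> bool:
    """True if a stack has the same pass counts as Linux in every suite."""
    return RESULTS[stack] == RESULTS["Linux"]


if __name__ == "__main__":
    for name in RESULTS:
        rates = ", ".join(
            f"{suite} {pass_rate(name, suite):.0%}" for suite in RESULTS[name]
        )
        flag = " (same counts as Linux)" if matches_linux(name) else ""
        print(f"{name:10s} {rates}{flag}")
```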
  17. Startup duration (cold-start)

    Duration of a (simple) Python program (until exit)
      socket(2)/listen(2), to be ready for accepting HTTP requests
      time docker run --runtime=XXX python-hello
    native < µKontainer < docker/runc < gvisor < nabla < kata
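The contents of the python-hello image are not shown on the slide. As a rough idea of what such a workload could look like, here is a minimal sketch that performs the socket(2)/listen(2) steps named above and then exits. The function name `hello`, the loopback address, and the use of an ephemeral port are assumptions made for illustration, not the program used in the evaluation.

```python
import socket


def hello(host: str = "127.0.0.1", port: int = 0, backlog: int = 1) -> int:
    """Open a TCP socket, bind, listen, and return the bound port.

    Reaching listen() is the point at which a server would be ready to
    accept HTTP requests; the socket is closed again on return.
    Port 0 asks the kernel for an ephemeral port (an assumption made so
    the sketch never collides with a port already in use).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, port))
        sock.listen(backlog)
        return sock.getsockname()[1]


if __name__ == "__main__":
    # The program exits right after becoming ready, so the wall-clock time
    # of the whole run approximates the cold-start cost of the runtime.
    print("ready on port", hello())
```

Wrapping the container invocation with `time`, as on the slide, then measures runtime startup plus interpreter startup plus these few calls, which is why the ordering between runtimes reflects mostly their cold-start cost.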
  18. Netperf benchmark (Goodput, Latency)

    netperf (TCP_STREAM/TCP_MAERTS), 10GEther (p-t-p)
    native (host kernel) ≈ µKontainer
    Factors for better performance of µKontainer
      low syscall overhead (µKontainer)
      help of offload features (TSO, checksum)
  19. Summary

    Reimplementing the Linux kernel is not a weekend task
      Basic network behavior of IETF standards
      To cover a broad range of rich features (TSO, CC algorithm)
    ABI compatibility is not enough
    Morphing the Linux kernel is possible
      w/ the bonus of low syscall overhead