Slide 1

Slide 1 text

Linux Kernel Library
A Library Version of Linux Kernel
Hajime Tazaki (@thehajime)
IIJ Research Laboratory
FOSDEM 2020: February 2020

Slide 2

Slide 2 text

Motivations
- Plenty of Linux kernel-like projects
  - WSL (Windows Subsystem for Linux)
  - gVisor
  - Graphene
  - Noah
- We don't wish to rewrite the Linux kernel

Slide 3

Slide 3 text

Motivations (cont'd)
- The Linux kernel is written in C
- Our programs may also be written in C
- Why not use the kernel code as a library?
- Quite similar motivation to the NetBSD rump kernel

Slide 4

Slide 4 text

Linux Kernel Library (LKL)
- a library (liblkl.{so,a})
- out-of-tree architecture (h/w independent)
- run Linux code in various ways with a reusable library
- h/w dependent layer
  - on Linux/Windows/FreeBSD/Android userspace, unikernel, on UEFI
  - network simulator (ns-3)
- code
  - 2.4 KLoC (h/w independent)
  - 6.6 KLoC (h/w dependent)

Slide 5

Slide 5 text

Alternatives (userspace network stacks)

project   year    lang     how             API      features                  original (if any)
lwip      2001    C        src-embedded    custom   v4,v6,ipfwd,tcp           scratch
Seastar   2014    C++17    static lib      custom   v4,tcp,dpdk               scratch
OSv       2013    C++/C    static lib      POSIX    v4,tcp                    (freebsd)
gVisor    2018    golang   go pkg          custom   v4,v6,tcp                 scratch
mTCP      2014    C        static lib      custom   v4,tcp,dpdk               scratch
rump      2007    C,asm    static/sh lib   POSIX    v4,v6,ipfwd,tcp           NetBSD
Linux     1991    C,asm    (kernel)        POSIX    v4,v6,ipfwd,tcp,xdp?      Linux
LKL       2007?   C,asm    static/sh lib   POSIX    v4,v6,ipfwd,tcp,dpdk      Linux

Slide 6

Slide 6 text

LKL: internals
- core design
  - outsource machine-dependent code
  - keep application and kernel code untouched
- components
  1. host backend (host_ops)
  2. CPU-independent arch. (arch/lkl)
  3. application interface

Slide 7

Slide 7 text

1. host backend
- environment-dependent part
- unify an interface across different platforms (rump-hypercall like)
- device interface with Virtio
  - block device <=> disk image
  - networking <=> TAP, raw socket, DPDK, VDE
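The idea of a unified host interface can be sketched as a table of function pointers that each platform fills in. This is only an illustration: `struct demo_host_ops`, its two members, and the buffer-based backend are hypothetical names invented here, and the real host_ops structure in LKL has many more members.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced host-operations table (an assumption,
 * not the real LKL structure). */
struct demo_host_ops {
    /* emit a kernel log message on the host */
    void (*print)(const char *msg, size_t len);
    /* monotonic time in nanoseconds */
    unsigned long long (*time_ns)(void);
};

/* One possible backend: capture output in a buffer and use a fake clock. */
char demo_log[128];
size_t demo_log_len;

void demo_buffer_print(const char *msg, size_t len)
{
    /* truncate instead of overflowing the capture buffer */
    if (len > sizeof(demo_log) - demo_log_len)
        len = sizeof(demo_log) - demo_log_len;
    memcpy(demo_log + demo_log_len, msg, len);
    demo_log_len += len;
}

unsigned long long demo_fake_time_ns(void)
{
    static unsigned long long now;

    now += 1000; /* advance 1 microsecond per call */
    return now;
}

const struct demo_host_ops demo_buffer_backend = {
    .print = demo_buffer_print,
    .time_ns = demo_fake_time_ns,
};

/* "Kernel-side" code only talks to the table, never to the host directly. */
unsigned long long demo_kernel_log(const struct demo_host_ops *ops,
                                   const char *msg)
{
    ops->print(msg, strlen(msg));
    return ops->time_ns();
}
```

Porting to another environment would then mean supplying a different table (for example one backed by POSIX or Win32 calls) while `demo_kernel_log()` stays unchanged.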

Slide 8

Slide 8 text

2. CPU independent architecture
- architecture (arch/lkl)
  - transparent architecture bind (as CPU arch)
  - requires no modification to the other implementation
- thread information (struct thread_info)
- irq, timer, syscall handler
- access to underlying layer by host_ops

Slide 9

Slide 9 text

3. Application interface
1. use exposed API (LKL syscall)
2. use host libc (LD_PRELOAD)
3. extend (alternative) libc

Slide 10

Slide 10 text

API 1: use exposed API (LKL syscall)
- call entry points of the LKL kernel
  - lkl_sys_open(), lkl_sys_socket()
- almost the same as ordinary syscalls
  - return value and errno notification are different
- can use LKL syscalls and host syscalls simultaneously
  - read an ext4 file by lkl_sys_read() => write into the host (Windows) by write()
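The difference in return value and errno handling can be sketched as follows, assuming the kernel-style convention in which a call returns a negative error number instead of returning -1 and setting errno. `demo_lkl_sys_open()` is a hypothetical stand-in written for this example and is not the real `lkl_sys_open()`; `demo_to_libc_result()` is likewise an invented helper.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an LKL entry point such as lkl_sys_open().
 * Assumed convention: non-negative result on success, -ERRNUM on failure. */
long demo_lkl_sys_open(const char *path)
{
    if (path == NULL || path[0] == '\0')
        return -ENOENT;
    return 3; /* pretend file descriptor */
}

/* Convert a kernel-style result into the libc convention:
 * return -1 and store the error number in errno. */
long demo_to_libc_result(long ret)
{
    if (ret < 0) {
        errno = (int)-ret;
        return -1;
    }
    return ret;
}
```

A caller that mixes both worlds would check `ret < 0` after an LKL-style call, but check for -1 and then inspect errno after a host libc call. On a non-Linux host the numeric error values may also differ from the host's own, so a real wrapper might need a translation step that this sketch leaves out.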

Slide 11

Slide 11 text

API 2: hijack host standard library
- dynamically replace symbols of host syscalls (of libc)
  - LD_PRELOAD
  - socket() => lkl_sys_socket()
- can use host binary (executable) as-is
- limitation of replaceable symbols
- needs syscall translation on non-Linux hosts
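Symbol replacement of this kind can be sketched as a shared object that defines `socket()` itself and forwards to the library kernel. `demo_lkl_sys_socket()` below is a hypothetical stand-in (not the real `lkl_sys_socket()`), and the negative-error return convention is an assumption; the actual hijack library handles far more calls and cases.

```c
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical stand-in for lkl_sys_socket(); NOT the real LKL function.
 * Assumed convention: descriptor on success, -ERRNUM on failure. */
long demo_lkl_sys_socket(int domain, int type, int protocol)
{
    (void)type;
    (void)protocol;
    if (domain != AF_INET && domain != AF_INET6)
        return -EAFNOSUPPORT;
    return 3; /* pretend descriptor */
}

/* Same name and signature as libc's socket(). When this object is loaded
 * ahead of libc (e.g. through LD_PRELOAD), the dynamic linker resolves the
 * application's socket() calls to this definition instead. */
int socket(int domain, int type, int protocol)
{
    long ret = demo_lkl_sys_socket(domain, type, protocol);

    if (ret < 0) {
        errno = (int)-ret; /* translate to the libc error convention */
        return -1;
    }
    return (int)ret;
}
```

Built as a shared object (for instance with `cc -shared -fPIC`), such a file could be named in LD_PRELOAD when starting an unmodified binary. Only calls that go through replaceable libc symbols are caught this way, which is the limitation the slide mentions.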

Slide 12

Slide 12 text

API 3: extend (alternative) libc
- only call LKL syscalls with our own libc
  - also introduced as a virtual CPU architecture
- a program can link this instead of the host libc
  - can't access (underlying) host resources directly via this LKL syscall
- as a patch for musl libc

Slide 13

Slide 13 text

Showcases

Slide 14

Slide 14 text

mount a disk image w/o root privilege
- mount/modify a disk image
  - mount as loopback devices (may not work on a foreign OS)
  - or use a VM
- LKL is for you
  - modifying a btrfs image as a non-root user

Slide 15

Slide 15 text

out of tree network protocols on Android

Slide 16

Slide 16 text

UNIX pipe as a NIC
- access control by grep
- port mirroring by tee
- service function chaining, huh?
https://github.com/thehajime/blog/issues/3

Slide 17

Slide 17 text

Linux kernel on web browser
- compile the kernel with clang/llvm
- generate JS code by emscripten
- run on a browser

Slide 18

Slide 18 text

Convert Linux kernel (C) to JS

C source (excerpt, truncated on the slide):

    asmlinkage __visible void __init start_kernel(void)
    {
        char *command_line;
        char *after_dashes;

        set_task_stack_end_magic(&init_task);
        smp_setup_processor_id();
        debug_objects_early_init();

        cgroup_init_early();

        local_irq_disable();
        early_boot_irqs_disabled = true;

        /*
         * Interrupts are still disabled. Do necessary setups, then

Generated JS (excerpt, truncated on the slide):

    function _start_kernel() {
     var $0 = 0, $1 = 0, $10 = 0, $11 = 0, $12 = 0, $13 = 0, $14 = 0, $15 = 0, $16 = 0, $17 = 0, $18 = 0, $19 = 0, $2 = 0, $20 = 0, $21 = 0, $22 = 0, $23 = 0, $24 = 0, $25 = 0, $26 = 0;
     var $27 = 0, $28 = 0, $29 = 0, $3 = 0, $30 = 0, $4 = 0, $5 = 0, $6 = 0, $7 = 0, $8 = 0, $9 = 0, $spec$select$i = 0, $vararg_buffer = 0, $vararg_buffer1 = 0, $vararg_buffer4 = 0, $vararg_buffer6 = 0, $vararg_buffer8 = 0, $vararg_ptr11 = 0, label = 0, sp = 0;
     sp = STACKTOP;
     STACKTOP = STACKTOP + 48|0; if ((STACKTOP|0) >= (STACK_MAX|0)) abortStackOverflow(48|0);
     $vararg_buffer8 = sp + 32|0;
     $vararg_buffer6 = sp + 24|0;
     $vararg_buffer4 = sp + 16|0;
     $vararg_buffer1 = sp + 8|0;
     $vararg_buffer = sp;

Slide 19

Slide 19 text

Running Linux container on non-Linux host
- port LKL to macOS
- docker integration (OCI runtime)
  - w/ modified dockerd
  - w/o Hypervisor.framework

Slide 20

Slide 20 text

Fuzz testing with LKL
- syscall fuzzer by LKL
- focus on filesystem fuzzing tests
- found a number of previously unknown bugs (ext4, btrfs, f2fs, etc.)
Xu et al., Fuzzing File Systems via Two-Dimensional Input Space Exploration, IEEE S&P 2019

Slide 21

Slide 21 text

Network simulation
- What?
  - network simulation (ns-3) with the Linux network stack
- Why?
  - less abstraction
  - more realistic
  - fully reproducible

Slide 22

Slide 22 text

Future Directions
- broader (underlying) environments (solo5)
- Upstreaming?

Slide 23

Slide 23 text

LKL Upstreaming
- Initial patches on LKML (2008)
- Proposed on LKML (2015)
- Recently restarted (Oct. 2019)
  - as a mode of UML (UMMODE=library)
  - 1st step: eliminate duplicated features (devices)
  - still ongoing

Slide 24

Slide 24 text

Linux Kernel Library
cc -llkl hello.c
https://lkl.github.io/