Slide 1

Slide 1 text

Rust-based, Secure and Lightweight Container Runtime for Embedded Systems
Manabu Sugimoto, R&D Center, Sony Group Corporation
Cloud Native Rust Day 2021
Copyright 2021 Sony Group Corporation

Slide 2

Slide 2 text

About Me
- Manabu Sugimoto
- System Software Engineer, Sony R&D Center
- Interests
  - Containers, Unikernels and Linux Kernel

Slide 3

Slide 3 text

Outline of the Talk
- Introduction
  - Container virtualization, container runtime
- Motivation
  - Containers on embedded systems, problems of existing container runtimes
- Proposal
  - Rust-based, secure and lightweight container runtime
- Evaluation
  - How well does our runtime perform compared to the existing runtimes?
- Summary
  - Future work, conclusion

Slide 4

Slide 4 text

Introduction

Slide 5

Slide 5 text

Container Virtualization
- Containers provide isolation and containment for applications
  - The mechanism can prevent attacks from untrusted applications
- Containers are increasingly used in embedded systems
  - Their light weight makes them attractive for resource-constrained systems
[Figure: a container platform (containers, container runtime, operating system, hardware). Access by trusted applications in containers is allowed (✔); a DoS attack and illegal access by an untrusted application in a container are blocked (✖).]

Slide 6

Slide 6 text

What is a Container Runtime?
- A container runtime spawns and runs containers
  - Complies with the OCI (Open Container Initiative) runtime specification
  - Sometimes referred to as a "low-level" runtime
  - Sets up cgroups, namespaces, capabilities, seccomp, etc.
[Figure: the container runtime configures Linux kernel features: namespaces, capabilities, cgroups, seccomp, ...]

Slide 7

Slide 7 text

Container Runtime Stack
- Client: kubelet
- (CRI)
- High-level runtime: containerd, CRI-O
- (OCI)
- Low-level runtime (today's topic): runc, runsc, Singularity
Note: From the next slide onwards, the term "container runtime" refers to the low-level runtime.

Slide 8

Slide 8 text

Motivation

Slide 9

Slide 9 text

Requirements of Embedded Systems
- Embedded systems have more restrictions than server systems
  - Resource-constrained systems
    - Small memory size
    - Low-capacity storage
    - Low-spec CPU
  - Mission-critical systems
    - Real-time applications
    - Critical functionality
    - Longer life cycle
[Figure: requirements of embedded systems: low resource utilization, security, high response, high dependability]

Slide 10

Slide 10 text

Containers on Embedded Systems
- Using Kubernetes or Docker on embedded systems is difficult
  - They add performance overhead and high resource usage
  - Write operations by the daemon process shorten the lifespan of eMMC*
- We run the low-level container runtime alone on the systems
*Embedded Multimedia Card
[Figure: general-purpose systems run a daemon process alongside the container runtime; embedded systems run only the lightweight container runtime. Both stacks consist of containers with trusted applications, the operating system, and hardware with eMMC.]

Slide 11

Slide 11 text

Problems of the Existing Runtimes
- The existing runtimes are not optimized for embedded systems
  - Security
    - Linux capabilities do not provide fine-grained access control
      - e.g., both ping and ARP spoofing need CAP_NET_RAW
    - Rootless containers based on user namespaces are too restrictive for these systems
      - A rootless container cannot emulate all system calls
      - However, some embedded apps need to access devices via mount(2), mknod(2), etc.
  - Lightweight
    - Container startup time is not fast enough for real-time systems
    - The Go-based runtimes are not suitable for resource-constrained systems
      - The application binary size is large
      - Garbage collection (GC) causes high CPU utilization

Slide 12

Slide 12 text

Proposal

Slide 13

Slide 13 text

Rust-based Container Runtime
- Secure and Lightweight container runtime (SL runtime)
  - SL runtime is implemented fully in Rust with modern crates
  - Minimal container runtime for embedded systems
  - OCI-compatible runtime
- Secure mechanism
  - Isolation by container
  - Fine-grained access control
  - Memory safety
- Lightweight mechanism
  - Low memory usage
  - Small binary size
  - Fast startup mechanism
  - Real-time support
- Embedded-system requirements covered (✔): low resource utilization, security, high response, high dependability

Slide 14

Slide 14 text

Comparison with the Existing Runtimes
Comparison table from the perspective of embedded systems

Runtime                 Language    Binary size*   GC overhead
SL runtime              Rust        2.63 MB        Not included
runsc (v20201208.0)     Go          23.6 MB        Included
runc (v1.0.0-rc93)      Go and C    14.0 MB        Included
singularity (v3.1.0)    Go and C    18.0 MB        Included
crun (v0.18)            C           0.43 MB        Not included
railcar (v1.0.4)        Rust        1.68 MB        Not included

*All binary files are stripped
[Rows whose per-runtime check marks are not recoverable here: OCI compatibility, Memory safety, Fine-grained access control, Fast startup, Real-time support (SL runtime: WIP)]

Slide 15

Slide 15 text

Why Rust?
- Rust is a great fit for embedded systems
  - Performance is equivalent to C/C++
  - Memory safety without GC
  - Small application binary size
  - Awesome crates for developing the container runtime
  - FFI (Foreign Function Interface) to bind the Linux API
- Go is also a good language but has some limitations
  - Problems interacting with namespaces caused by the Go runtime
  - The application binary size is large compared to Rust
  - GC overhead

Slide 16

Slide 16 text

Crates for the Container Runtime
- Many awesome crates for developing the container runtime
Creating Container
- capability: https://crates.io/crates/caps
- rlimit: https://crates.io/crates/rlimit
- cgroups: https://crates.io/crates/cgroups-rs
- seccomp: https://crates.io/crates/seccomp-sys
- passfd: https://crates.io/crates/passfd
  - This crate is used for the fine-grained access control
- core_affinity: https://crates.io/crates/core_affinity
  - This crate is used for the real-time support
- etc.
Developing Runtime
- clap: https://crates.io/crates/clap
- serde_json: https://crates.io/crates/serde_json
- anyhow: https://crates.io/crates/anyhow
- etc.

Slide 17

Slide 17 text

Architecture Overview
[Figure: our container platform on top of the operating system and resource-constrained hardware.
- SL runtime creates a secure container using namespaces, seccomp, capabilities, AppArmor and CPU affinity
- Fast startup: SL runtime starts a created container with an arbitrary execution process, so it becomes a running container
- Rootless containers by user namespace: SL runtime launches a secure container; when the running container executes a system call (e.g., mount), the Fine-Grained Access Control (FGAC) server captures the system call and performs it on behalf of the container]

Slide 18

Slide 18 text

Lightweight Mechanism

Slide 19

Slide 19 text

Architecture Overview
[Figure: the architecture diagram from Slide 17, repeated: SL runtime, fast startup of a created container, and the FGAC server serving rootless containers]

Slide 20

Slide 20 text

Fast Startup
- Launch a container quickly by leveraging a pre-created container
  - Omit the time for initializing the runtime and creating the container
- Replace only the execution process inside the container at startup
  - Reuse all other configuration except for the execution process
[Figure: elapsed time. Normal run: init runtime, create container, start container. Fast startup: start container only (reduced time).]
[Figure: fast startup. SL runtime replaces the dummy process inside the created container with the real-time app, which yields a running container.]

Slide 21

Slide 21 text

Real-Time (RT) Support
- RT support enables the runtime to set CPU affinity at fast startup
  - Ensures RT performance for embedded systems
  - Allows the runtime to set CPU affinity depending on the load at startup
[Figure: RT support. At fast startup, SL runtime replaces the dummy process with the real-time app and sets its CPU affinity among CPU 1 to CPU 4.]
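As a rough illustration of "set CPU affinity depending on the load at startup", the sketch below picks the least-loaded CPU and builds the single-CPU bitmask that an affinity call would take. The function names `pick_cpu` and `affinity_mask`, the load figures, and the least-loaded policy are assumptions made for this example; the talk only states that the core_affinity crate is used. CPU indices here are 0-based, whereas the slide labels them CPU 1 to CPU 4.

```rust
/// Pick the index of the least-loaded CPU (hypothetical policy).
/// `loads[i]` is an assumed utilization figure for CPU i, in percent.
/// Returns None when no CPUs are given; ties go to the lowest index.
fn pick_cpu(loads: &[u32]) -> Option<usize> {
    loads
        .iter()
        .enumerate()
        .min_by_key(|&(_, load)| *load)
        .map(|(idx, _)| idx)
}

/// Build a single-CPU affinity bitmask: bit i set means "may run on CPU i".
fn affinity_mask(cpu: usize) -> u64 {
    assert!(cpu < 64, "this sketch only handles up to 64 CPUs");
    1u64 << cpu
}
```

With loads of 80, 35, 90 and 10 percent, `pick_cpu` selects index 3, and `affinity_mask(3)` has only bit 3 set.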

Slide 22

Slide 22 text

Design of Fast Startup and RT Support
1. Create a container
2. Initialize the container based on the config.json (e.g., namespaces, seccomp, etc.)
3. Event loop: the created container, running the dummy process, waits in select(fd of exec.fifo, ...)
4. Fast startup
5. Write the contents of the fast-startup.json
6. Start the container: execvp("RT Process")
[Figure: SL runtime and the created container communicate through /var/run/exec.fifo; the dummy process is replaced by the RT process, and the container becomes running.]

config.json (OCI runtime spec):
    "ociVersion": "1.0.1-dev",
    "process": {
        ...
        "args": [ "dummy" ],
        ...
    },
    ...

fast-startup.json:
    { "args": [ "RT process" ], "cpu": 3 }
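The flow above can be modelled in a few lines. This is a simulation under stated assumptions, not the SL runtime's code: a std channel stands in for /var/run/exec.fifo, a thread stands in for the created container, and the types `FastStartup` and `ExecPlan` and the functions `wait_for_start` and `run` are hypothetical names. The behaviour for a start request that carries no fast-startup data is also an assumption.

```rust
use std::sync::mpsc;
use std::thread;

/// Contents of fast-startup.json as shown on the slide: new args plus a CPU.
#[derive(Debug, Clone, PartialEq)]
struct FastStartup {
    args: Vec<String>,
    cpu: usize,
}

/// What the created container would pass to execvp once it is started.
#[derive(Debug, PartialEq)]
struct ExecPlan {
    args: Vec<String>,
    cpu: Option<usize>,
}

/// Stand-in for the created container (step 3). It blocks until a request
/// arrives, then replaces the dummy args from config.json (step 6).
fn wait_for_start(dummy_args: Vec<String>, rx: mpsc::Receiver<Option<FastStartup>>) -> ExecPlan {
    match rx.recv() {
        // Fast startup: swap in the requested process and its CPU.
        Ok(Some(req)) => ExecPlan { args: req.args, cpu: Some(req.cpu) },
        // Assumed fallback: run what config.json specified, with no pinning.
        _ => ExecPlan { args: dummy_args, cpu: None },
    }
}

/// Stand-in for the runtime: create the container, then send the request
/// (step 5) and collect what the container decided to execute.
fn run(request: Option<FastStartup>) -> ExecPlan {
    let (tx, rx) = mpsc::channel();
    let container = thread::spawn(move || wait_for_start(vec!["dummy".to_string()], rx));
    tx.send(request).expect("container thread is waiting");
    container.join().expect("container thread panicked")
}
```

Calling `run` with the slide's request (args "RT process", cpu 3) yields a plan that executes "RT process" on CPU 3, while everything set up in step 2 is left untouched because only the process arguments are replaced.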

Slide 23

Slide 23 text

Secure Mechanism

Slide 24

Slide 24 text

Architecture Overview
[Figure: the architecture diagram from Slide 17, repeated: SL runtime, fast startup of a created container, and the FGAC server serving rootless containers]

Slide 25

Slide 25 text

Fine-Grained Access Control (FGAC)
- FGAC enables rootless containers (by user namespace) to execute system calls safely
  - The FGAC server emulates the system call in userspace on behalf of the container
  - The rootless container can access devices safely via mount(2), mknod(2), etc.
- The FGAC mechanism is implemented using the new seccomp notify feature
[Figure: the FGAC server holds the policy "A: Allow mount tmpfs", "B: Deny mount tmpfs". Containers A and B, both launched by SL runtime, each issue mount tmpfs. The server performs the mount on behalf of container A (✔) and rejects it for container B (✖).]
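The per-container policy in the figure can be sketched as a lookup table. The `Policy` type, its methods, and the default-deny behaviour for unlisted requests are assumptions made for this example; the talk does not describe the FGAC server's policy format.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Allow,
    Deny,
}

/// Hypothetical policy table keyed by (container, syscall, filesystem type).
struct Policy {
    rules: HashMap<(String, String, String), Verdict>,
}

impl Policy {
    fn new() -> Self {
        Policy { rules: HashMap::new() }
    }

    /// Add one rule and return the policy, so that rules can be chained.
    fn rule(mut self, container: &str, syscall: &str, fstype: &str, verdict: Verdict) -> Self {
        self.rules
            .insert((container.into(), syscall.into(), fstype.into()), verdict);
        self
    }

    /// Assumed default-deny: a request without a matching rule is rejected.
    fn check(&self, container: &str, syscall: &str, fstype: &str) -> Verdict {
        let key = (container.to_string(), syscall.to_string(), fstype.to_string());
        self.rules.get(&key).copied().unwrap_or(Verdict::Deny)
    }
}
```

Built with the two rules from the figure, such a policy allows "mount tmpfs" for container A and denies it for container B.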

Slide 26

Slide 26 text

Seccomp Notify Feature
- Provides a way to handle a particular system call in userspace
  - Introduced in Linux 5.0
Flow between the container process, seccomp in the kernel (cBPF program) and the seccomp agent in userspace:
1. The process issues a system call, e.g., mount()
2. Seccomp executes the filter
3. The filter returns "notify"
4. The agent is told that the container wants to run the system call: ioctl(fd, SECCOMP_IOCTL_NOTIF_RECV, req)
5. The agent reads the system call arguments from /proc/$pid/mem
6. The agent validates the system call; if OK, go to 7a; if NG, go to 7b
7a. Perform the system call on behalf of the process
7b. Reject the system call
8a. Set the return value to 0 (success)
8b. Set the return value to an error code (failure)
    The agent then sends the response: ioctl(fd, SECCOMP_IOCTL_NOTIF_SEND, req)
9a. The process receives 0 (success)
9b. The process receives the error code (failure)
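Steps 6 to 8 of this flow amount to turning one notification into one response. The sketch below models only that decision; it performs no ioctl calls. `Notification` is a simplified, hypothetical view of the request, `Response` mirrors the fields of the kernel's `struct seccomp_notif_resp` (id, val, error, flags), and reporting failures as a negative errno follows the kernel convention as I understand it. The `handle` function and its closure parameters are assumptions for illustration.

```rust
/// Simplified view of a notification read with SECCOMP_IOCTL_NOTIF_RECV.
struct Notification {
    id: u64,         // cookie that ties the response to the request
    pid: u32,        // process that issued the system call
    syscall: String, // name resolved from the syscall number (illustrative)
}

/// Mirrors the fields of the kernel's `struct seccomp_notif_resp`.
#[derive(Debug, PartialEq)]
struct Response {
    id: u64,
    val: i64,   // return value reported to the process on success
    error: i32, // 0 on success, negative errno on failure
    flags: u32,
}

const EPERM: i32 = 1;

/// Steps 6 to 8: validate the request, perform it on behalf of the process
/// if allowed, and build the response for SECCOMP_IOCTL_NOTIF_SEND.
/// `perform` returns the syscall result, or a positive errno on failure.
fn handle<V, P>(req: &Notification, validate: V, perform: P) -> Response
where
    V: Fn(&Notification) -> bool,
    P: FnOnce(&Notification) -> Result<i64, i32>,
{
    if !validate(req) {
        // 7b / 8b: rejected by policy.
        return Response { id: req.id, val: 0, error: -EPERM, flags: 0 };
    }
    match perform(req) {
        // 7a / 8a: the agent ran the call successfully.
        Ok(val) => Response { id: req.id, val, error: 0, flags: 0 },
        // The call was allowed but failed when the agent ran it.
        Err(errno) => Response { id: req.id, val: 0, error: -errno, flags: 0 },
    }
}
```

Keeping validation and execution as separate closures means the policy check can be exercised without performing any privileged operation.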

Slide 27

Slide 27 text

Design of FGAC
- Launch an FGAC server before starting a container
  - The server is launched as root, and only by a system administrator
- Run the container using a config.json that describes seccomp notify
  - The OCI runtime specification already supports seccomp notify [1]
Steps:
1. The admin launches the FGAC server with a security policy
2. The config.json is given to SL runtime
3. SL runtime initializes a container
4. A seccomp notify fd is created
5. The notify fd is passed to the FGAC server via SCM_RIGHTS (notify.sock)

config.json excerpt:
    "seccomp": {
        "defaultAction": "SCMP_ACT_ALLOW",
        "listenerPath": "/var/run/notify.sock",
        "architectures": [ "SCMP_ARCH_X86" ],
        "syscalls": [
            {
                "names": [ "mount" ],
                "action": "SCMP_ACT_NOTIFY"
            }
        ]
    }

[1] https://github.com/opencontainers/runtime-spec/pull/1074

Slide 28

Slide 28 text

Evaluation

Slide 29

Slide 29 text

Evaluation
- Goals
  - Start time of the containers: normal run and fast startup
  - Memory consumption of the container runtimes
- Environment
  - Host: AMD Ryzen 9 3900X 12-Core (Ubuntu 20.04)
  - Evaluated runtimes: SL runtime, runsc, singularity, runc, crun and railcar
- Experimental setup
  - All the runtimes use the same config.json
    - The cgroups configuration is removed because SL runtime does not support it yet
  - Run the container runtimes alone without any client tools
  - Execute /usr/bin/true inside the containers

Slide 30

Slide 30 text

Results: Start Time
- SL runtime is the fastest among the evaluated runtimes
  - Normal run achieves a 7.4x speed-up compared to runc
  - Fast startup achieves a 1.5x speed-up compared to the normal run
[Figure: chart of normal run time and fast startup time for each runtime]
Versions of the evaluated runtimes: runsc v20201208.0, singularity v3.1.0, runc v1.0.0-rc93, crun v0.18, railcar v1.0.4

Slide 31

Slide 31 text

Results: Memory Usage
- SL runtime memory usage is equivalent to that of crun, which is written in C
  - Rust is a great fit for resource-constrained systems
[Figure: chart of memory usage for each runtime]
Versions of the evaluated runtimes: runsc v20201208.0, singularity v3.1.0, runc v1.0.0-rc93, crun v0.18, railcar v1.0.4

Slide 32

Slide 32 text

Summary

Slide 33

Slide 33 text

Future Work
- Full compliance with the OCI runtime specification
  - SL runtime is a research prototype
  - Support remaining features such as cgroups
- Enable Kubernetes to use SL runtime
  - RuntimeClass == SL Runtime
- We plan to integrate SL runtime into Kata Containers
  - Kata Containers has already developed a container runtime in Rust

Slide 34

Slide 34 text

Conclusion
- Rust is a great fit for embedded systems
  - Small memory footprint and binary size for resource-constrained systems
  - Memory safety without any overhead for mission-critical systems
- Rust-based container runtime optimized for embedded systems
  - Fast startup that launches a container quickly from a pre-created container
  - Fine-grained access control for the rootless container
- The results show that our runtime is suitable for embedded systems
  - Runs the container 7.4x faster than runc
  - The runtime memory usage is equivalent to that of crun, which is written in C
