Rust-based, Secure and Lightweight Container Runtime for Embedded Systems

Presentation slide at Cloud Native Rust Day 2021

Manabu Sugimoto

May 03, 2021

Transcript

  1. Rust-based, Secure and Lightweight Container Runtime for Embedded Systems
     Manabu Sugimoto, R&D Center, Sony Group Corporation
     Cloud Native Rust Day 2021
     Copyright 2021 Sony Group Corporation
  2. About Me
     - Manabu Sugimoto
       - System Software Engineer, Sony R&D Center
     - Interests
       - Containers, Unikernels and Linux Kernel
  3. Outline of the Talk
     - Introduction: container virtualization, container runtime
     - Motivation: containers on embedded systems, problems of existing container runtimes
     - Proposal: Rust-based, secure and lightweight container runtime
     - Evaluation: how well does our runtime perform compared to the existing runtimes?
     - Summary: future work, conclusion
  4. Container Virtualization
     - Containers provide isolation and containment for applications
       - The mechanism can prevent attacks from untrusted applications
     - Containers are increasingly used in embedded systems
       - Their light weight makes them attractive for resource-constrained systems
     [Figure: trusted and untrusted applications run in containers on a container platform (container runtime, operating system, hardware). Access by the trusted applications is marked ✔; a DoS attack and illegal access by the untrusted application are marked ✖.]
  5. What is a Container Runtime?
     - A container runtime spawns and runs containers
       - Complies with the OCI (Open Container Initiative) runtime specification
       - Sometimes referred to as a "low-level" runtime
       - Sets up cgroups, namespaces, capabilities, seccomp, etc.
     [Figure: the container runtime sits on top of Linux kernel features: namespaces, capabilities, cgroups, seccomp, ...]
  6. Container Runtime Stack
     - Client: kubelet
     - High-level runtime (reached via CRI): containerd, CRI-O
     - Low-level runtime (reached via OCI): runc, runsc, Singularity (today's topic)
     Note: from the next slide onwards, the term "container runtime" refers to the low-level runtime.
  7. Requirements of Embedded Systems
     - Embedded systems have more restrictions than server systems
       - Resource-constrained systems
         - Small memory size
         - Low-capacity storage
         - Low-spec CPU
       - Mission-critical systems
         - Real-time applications
         - Critical functionality
         - Longer life cycle
     [Figure: embedded systems require low resource utilization, security, high responsiveness, and high dependability]
  8. Containers on Embedded Systems
     - Using Kubernetes or Docker on embedded systems is difficult
       - They add performance overhead and high resource usage
       - Write operations by the daemon process shorten the lifespan of eMMC (Embedded Multimedia Card) storage
     - We run the low-level container runtime alone on these systems
     [Figure: general-purpose systems run containers through a daemon process and a container runtime; embedded systems run lightweight containers through the container runtime only. Both stacks sit on an operating system and hardware with eMMC.]
  9. Problems of the Existing Runtimes
     - The existing runtimes are not optimized for embedded systems
     - Security
       - Linux capabilities do not provide fine-grained access control
         - e.g., both ping and ARP spoofing need CAP_NET_RAW
       - Rootless containers based on user namespaces are too restrictive for these systems
         - A rootless container cannot emulate all system calls
         - However, some embedded apps need to access devices via mount(2), mknod(2), etc.
     - Lightweight
       - Container startup time is not fast enough for real-time systems
       - Go-based runtimes are not suitable for resource-constrained systems
         - The application binary size is large
         - Garbage collection (GC) incurs high CPU utilization
  10. Rust-based Container Runtime
      - Secure and Lightweight container runtime (SL runtime)
        - SL runtime is implemented fully in Rust with modern crates
        - Minimal container runtime for embedded systems
        - OCI-compatible runtime
      - Secure mechanism
        - Isolation by container
        - Fine-grained access control
        - Memory safety
      - Lightweight mechanism
        - Low memory usage
        - Small binary size
        - Fast startup mechanism
        - Real-time support
      [Figure: the two mechanisms together cover the requirements of embedded systems: low resource utilization, security, high responsiveness, and high dependability]
  11. Comparison with the Existing Runtimes
      Comparison table from the perspective of embedded systems:

      |              | SL runtime   | runsc (v20201208.0) | runc (v1.0.0-rc93) | singularity (v3.1.0) | crun (v0.18) | railcar (v1.0.4) |
      | Language     | Rust         | Go                  | Go and C           | Go and C             | C            | Rust             |
      | Binary size* | 2.63 MB      | 23.6 MB             | 14.0 MB            | 18.0 MB              | 0.43 MB      | 1.68 MB          |
      | GC overhead  | Not included | Included            | Included           | Included             | Not included | Not included     |

      *All binary files are stripped.
      The table also has rows for OCI compatibility, memory safety, fine-grained access control, fast startup, and real-time support (WIP for SL runtime); their check and cross marks are not preserved in this transcript.
  12. Why Rust?
      - Rust is a great fit for embedded systems
        - Performance is equivalent to C/C++
        - Memory safety without GC
        - Small application binary size
        - Awesome crates for developing the container runtime
        - FFI (Foreign Function Interface) to bind the Linux API
      - Go is also a good language but has some limitations
        - Problems interacting with namespaces caused by the Go runtime
        - The application binary size is large compared to Rust
        - GC overhead
  13. Crates for the Container Runtime
      - Many awesome crates for developing the container runtime
      - Creating container
        - capability: https://crates.io/crates/caps
        - rlimit: https://crates.io/crates/rlimit
        - cgroups: https://crates.io/crates/cgroups-rs
        - seccomp: https://crates.io/crates/seccomp-sys
        - passfd: https://crates.io/crates/passfd
          - This crate is used for the fine-grained access control
        - core_affinity: https://crates.io/crates/core_affinity
          - This crate is used for the real-time support
        - etc.
      - Developing runtime
        - clap: https://crates.io/crates/clap
        - serde_json: https://crates.io/crates/serde_json
        - anyhow: https://crates.io/crates/anyhow
        - etc.
  14. Architecture Overview
      [Figure: our container platform, running on an operating system and hardware (resource-constrained system)]
      - SL runtime
        - Creates a secure container using namespaces, seccomp, capabilities, AppArmor, and CPU affinity
        - Launches a secure container
        - Fast startup: starts the container with an arbitrary execution process (container "created" becomes "running")
      - Rootless containers by user namespace
        - A running container executes a system call, e.g., mount
      - Fine-Grained Access Control (FGAC) server
        - Captures the system call
        - Performs the system call on behalf of the container
  15. Architecture Overview
      [Figure: the same architecture diagram as in item 14, shown again before the fast startup slides]
  16. Fast Startup
      - Launch a container quickly by leveraging a pre-created container
        - Omits the time for initializing the runtime and creating the container
        - Replaces only the execution process inside the container at startup
        - Reuses the rest of the configuration as-is
      [Figure: elapsed time. A normal run consists of initializing the runtime, creating the container, and starting the container; fast startup consists only of starting the container. The difference is the reduced time.]
      [Figure: SL runtime replaces the dummy process inside the created container with the real-time app, and the container becomes running.]
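The "replace only the execution process, reuse everything else" idea can be sketched in a few lines of Rust. This is a minimal illustration, not the SL runtime's code: `ContainerConfig`, its fields, `fast_startup`, and `dummy_container` are hypothetical names introduced here.

```rust
/// Hypothetical, simplified view of a created container's settings.
#[derive(Clone, Debug, PartialEq)]
struct ContainerConfig {
    /// Execution process (argv) that runs inside the container.
    args: Vec<String>,
    /// Namespaces the container was created with, e.g. "pid", "mount".
    namespaces: Vec<String>,
    /// Name of the seccomp profile applied at creation time.
    seccomp_profile: String,
}

/// Returns the configuration used at fast startup: only `args` is replaced,
/// every other field is reused from the pre-created container.
fn fast_startup(created: &ContainerConfig, new_args: &[&str]) -> ContainerConfig {
    ContainerConfig {
        args: new_args.iter().map(|a| a.to_string()).collect(),
        ..created.clone()
    }
}

/// A pre-created container that runs a placeholder ("dummy") process.
fn dummy_container() -> ContainerConfig {
    ContainerConfig {
        args: vec!["dummy".to_string()],
        namespaces: vec!["pid".to_string(), "mount".to_string(), "user".to_string()],
        seccomp_profile: "default".to_string(),
    }
}
```

Calling `fast_startup(&dummy_container(), &["rt-app"])` yields a configuration whose `args` is `["rt-app"]` while the namespaces and seccomp profile are unchanged, which is why the initialization cost is paid only once.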
  17. Real-Time (RT) Support
      - RT support enables the runtime to set CPU affinity at fast startup
        - Ensures RT performance for embedded systems
        - Allows the runtime to set CPU affinity depending on the load at startup
      [Figure: at fast startup, SL runtime replaces the dummy process in the created container with the real-time app and sets its CPU affinity to one of CPU 1 to CPU 4.]
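One way to choose a CPU "depending on the load at startup" is to pick the least-loaded core and pin the process to it. The sketch below assumes that policy; the slides do not say which policy the SL runtime uses. `pick_cpu` and `affinity_mask` are hypothetical helpers, and the per-CPU load values are assumed to be supplied by the caller. Applying the mask would be done with sched_setaffinity(2) or the core_affinity crate, which is not shown here.

```rust
/// Picks the index of the least-loaded CPU; ties go to the lowest index.
/// `loads` holds one utilization percentage per CPU (hypothetical input).
fn pick_cpu(loads: &[u8]) -> Option<usize> {
    loads
        .iter()
        .enumerate()
        .min_by_key(|&(_, &load)| load)
        .map(|(cpu, _)| cpu)
}

/// Builds a single-CPU affinity bitmask where bit N stands for CPU N.
/// Returns `None` if the CPU index does not fit into a 64-bit mask.
fn affinity_mask(cpu: usize) -> Option<u64> {
    if cpu < 64 {
        Some(1u64 << cpu)
    } else {
        None
    }
}
```

With four CPUs at 80%, 35%, 10% and 60% load, `pick_cpu` selects index 2, and `affinity_mask(2)` produces the mask with only bit 2 set.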
  18. Design of Fast Startup and RT Support
      1. Create a container
      2. Initialize the container based on the config.json, e.g., namespaces, seccomp, etc.
      3. Event loop: select(fd of exec.fifo, ...)
      4. Fast startup
      5. Write the contents of the fast-startup.json
      6. Start the container: execvp("RT Process")

      config.json (OCI runtime spec):
          "ociVersion": "1.0.1-dev",
          "process": {
            ...
            "args": [ "dummy" ],
            ...
          },
          ...

      fast-startup.json:
          { "args": [ "RT process" ], "cpu": 3 }

      [Figure: the created container is set up to run "dummy" and waits on /var/run/exec.fifo; after fast startup the running container executes "RT Process".]
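Steps 3 to 6 can be approximated with the Rust standard library alone. The sketch below makes two simplifying assumptions that differ from the slide: it blocks on a plain read of the FIFO instead of a select() event loop, and it expects a whitespace-separated command line instead of the JSON in fast-startup.json. The function names are hypothetical.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::os::unix::process::CommandExt;
use std::path::Path;
use std::process::Command;

/// Step 3 (simplified): block until a writer sends a request through the FIFO.
/// Opening a FIFO for reading blocks until a writer opens it, and
/// `read_to_string` returns once the writer closes its end.
fn wait_for_request(fifo: &Path) -> io::Result<String> {
    let mut payload = String::new();
    File::open(fifo)?.read_to_string(&mut payload)?;
    Ok(payload)
}

/// Step 5 (simplified): turn the payload into argv.
/// Assumption: whitespace-separated arguments rather than JSON.
fn parse_args(payload: &str) -> Vec<String> {
    payload.split_whitespace().map(str::to_owned).collect()
}

/// Step 6: replace the current (dummy) process with the requested program.
/// `exec` only returns if the replacement failed, so the result is an error.
fn start(args: &[String]) -> io::Error {
    match args.split_first() {
        Some((program, rest)) => Command::new(program).args(rest).exec(),
        None => io::Error::new(io::ErrorKind::InvalidInput, "empty argument list"),
    }
}
```

Because `exec` replaces the process image in place, namespaces, seccomp filters and other settings applied during initialization stay in effect for the new program.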
  19. Architecture Overview
      [Figure: the same architecture diagram as in item 14, shown again before the fine-grained access control slides]
  20. Fine-Grained Access Control (FGAC)
      - FGAC enables rootless containers (by user namespace) to execute system calls safely
        - The FGAC server emulates the system call in userspace on behalf of the container
        - The rootless container can access devices safely via mount(2), mknod(2), etc.
      - The FGAC mechanism is implemented using the new seccomp notify feature
      [Figure: the FGAC server holds the policy "A: allow mount tmpfs, B: deny mount tmpfs". Containers A and B both issue "mount tmpfs" through the Linux kernel; the server performs the mount on behalf of container A (✔) and rejects it for container B (✖).]
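The per-container policy in the figure can be modeled as a lookup that denies by default. This is a minimal sketch of such a check, not the FGAC server's actual data model: `Policy`, `MountRequest`, `Verdict` and `slide_policy` are hypothetical names.

```rust
use std::collections::HashMap;

/// Decision taken by the policy check.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Verdict {
    Allow,
    Deny,
}

/// The part of a mount request that this sketch inspects.
struct MountRequest<'a> {
    container: &'a str,
    fstype: &'a str,
}

/// Maps a container name to the filesystem types it may mount.
struct Policy {
    allowed: HashMap<String, Vec<String>>,
}

impl Policy {
    /// Allows the request only if the container is known and the
    /// filesystem type is explicitly listed; everything else is denied.
    fn check(&self, req: &MountRequest) -> Verdict {
        match self.allowed.get(req.container) {
            Some(types) if types.iter().any(|t| t.as_str() == req.fstype) => Verdict::Allow,
            _ => Verdict::Deny,
        }
    }
}

/// The policy from the slide: A may mount tmpfs, B may not.
fn slide_policy() -> Policy {
    let mut allowed = HashMap::new();
    allowed.insert("A".to_string(), vec!["tmpfs".to_string()]);
    allowed.insert("B".to_string(), Vec::new());
    Policy { allowed }
}
```

Unlike a capability such as CAP_NET_RAW, which grants a whole class of operations, a check of this kind can distinguish individual system call arguments per container.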
  21. Seccomp Notify Feature
      - Provides a way to handle a particular system call in userspace
      - Introduced in Linux 5.0
      Flow between the container process, the kernel (seccomp cBPF program), and the userspace seccomp agent:
      1. The process issues a system call, e.g., mount()
      2. The kernel executes the seccomp filter
      3. The filter returns "notify"
      4. The agent learns that the container wants to run the system call: ioctl(fd, SECCOMP_IOCTL_NOTIF_RECV, req)
      5. The agent reads the system call arguments from /proc/$pid/mem
      6. The agent validates the system call; if OK, go to 7a, otherwise go to 7b
      7a. Perform the system call on behalf of the process
      7b. Reject the system call
      8a. Set the return value to 0 (success)
      8b. Set the return value to an error code (failure)
          The agent then replies: ioctl(fd, SECCOMP_IOCTL_NOTIF_SEND, req)
      9a. The process receives 0 (success)
      9b. The process receives the error code (failure)
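Steps 8a and 8b amount to filling in the reply that the agent sends back with SECCOMP_IOCTL_NOTIF_SEND. The sketch below mirrors the field layout of the kernel's `struct seccomp_notif_resp` (id, val, error, flags) in plain Rust and shows only how the reply is built; the ioctl calls themselves need a binding such as the libc crate and are left out. `build_response` is a hypothetical helper, and the `EPERM` value is the Linux one.

```rust
/// Mirrors the layout of the kernel's `struct seccomp_notif_resp`.
#[repr(C)]
#[derive(Debug, Default, PartialEq)]
struct SeccompNotifResp {
    /// Must echo the `id` of the notification being answered.
    id: u64,
    /// Return value the container sees on success.
    val: i64,
    /// 0 on success, or a negated errno on failure.
    error: i32,
    /// Response flags; 0 in this sketch.
    flags: u32,
}

/// Linux errno for "operation not permitted".
const EPERM: i32 = 1;

/// Builds the reply for steps 8a/8b: `Ok(val)` after the agent performed
/// the call on behalf of the process, `Err(errno)` when it rejected it.
fn build_response(id: u64, outcome: Result<i64, i32>) -> SeccompNotifResp {
    match outcome {
        Ok(val) => SeccompNotifResp { id, val, error: 0, flags: 0 },
        Err(errno) => SeccompNotifResp { id, val: 0, error: -errno, flags: 0 },
    }
}
```

For a rejected mount, `build_response(id, Err(EPERM))` makes the container's mount() call fail with EPERM, while `build_response(id, Ok(0))` reports success after the agent has done the work.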
  22. Design of FGAC
      - Launch an FGAC server before starting a container
        - The server is launched as root, and only by a system administrator
      - Run the container using a config.json that describes seccomp notify
        - The OCI runtime specification already supports seccomp notify [1]
      1. The admin launches the server with a security policy
      2. Input the config.json to SL runtime
      3. SL runtime initializes a container
      4. Create a seccomp notify fd
      5. Pass the notify fd to the FGAC server via SCM_RIGHTS (notify.sock)

      config.json excerpt:
          "seccomp": {
            "defaultAction": "SCMP_ACT_ALLOW",
            "listenerPath": "/var/run/notify.sock",
            "architectures": [ "SCMP_ARCH_X86" ],
            "syscalls": [
              {
                "names": [ "mount" ],
                "action": "SCMP_ACT_NOTIFY"

      [1] https://github.com/opencontainers/runtime-spec/pull/1074
  23. Evaluation
      - Goals
        - Start time of the containers: normal run and fast startup
        - Memory consumption of the container runtimes
      - Environment
        - Host: AMD Ryzen 9 3900X 12-Core (Ubuntu 20.04)
        - Evaluated runtimes: SL runtime, runsc, singularity, runc, crun and railcar
      - Experimental setup
        - All the runtimes use the same config.json
          - The cgroups configuration is removed because SL runtime does not support it yet
        - The container runtimes run alone, without any client tools
        - /usr/bin/true is executed inside the containers
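A start-time measurement of this kind can be approximated by timing a command from launch to exit and averaging over several runs. The sketch below is an assumed harness, not the one used for the talk: `time_run` and `mean` are hypothetical helpers, and the command line of each runtime is left to the caller.

```rust
use std::io;
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

/// Runs `program` with `args` and returns the wall-clock time until it exits.
/// A non-zero exit status is reported as an error so failed runs are not timed.
fn time_run(program: &str, args: &[&str]) -> io::Result<Duration> {
    let start = Instant::now();
    let status = Command::new(program)
        .args(args)
        .stdout(Stdio::null())
        .status()?;
    let elapsed = start.elapsed();
    if status.success() {
        Ok(elapsed)
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{program} exited with {status}"),
        ))
    }
}

/// Arithmetic mean of the collected samples, or `None` if there are none.
fn mean(samples: &[Duration]) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    Some(samples.iter().sum::<Duration>() / samples.len() as u32)
}
```

Because the workload inside the container is /usr/bin/true, which exits immediately, the measured time is dominated by the runtime's own setup and teardown.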
  24. Results: Start Time
      - SL runtime is faster than all of the existing runtimes evaluated
        - Normal run achieves a 7.4x speed-up compared to runc
        - Fast startup achieves a 1.5x speed-up compared to the normal run
      [Figures: normal run time; fast startup time]
      [Versions of the evaluated runtimes] runsc: v20201208.0, singularity: v3.1.0, runc: v1.0.0-rc93, crun: v0.18, railcar: v1.0.4
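If both figures are read as ratios of start times measured under the same setup, they compose multiplicatively. The combined value below is derived here from the two reported ratios and is not a number stated on the slide; the symbols $T_{\text{runc}}$, $T_{\text{normal}}$ and $T_{\text{fast}}$ (start time of runc, of the SL runtime's normal run, and of its fast startup) are introduced for illustration.

```latex
\frac{T_{\text{runc}}}{T_{\text{fast}}}
  = \frac{T_{\text{runc}}}{T_{\text{normal}}}
    \times \frac{T_{\text{normal}}}{T_{\text{fast}}}
  \approx 7.4 \times 1.5
  \approx 11.1
```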
  25. Results: Memory Usage
      - SL runtime's memory usage is equivalent to that of crun, which is written in C
      - Rust is a great fit for resource-constrained systems
      [Figure: memory usage of the evaluated runtimes]
  26. Future Work
      - Full compliance with the OCI runtime specification
        - SL runtime is a research prototype
        - Support missing features such as cgroups
      - Enable Kubernetes to use SL runtime
        - RuntimeClass == SL runtime
      - We plan to integrate SL runtime into Kata Containers
        - Kata Containers has already developed a container runtime in Rust
  27. Conclusion
      - Rust is a great fit for embedded systems
        - Small memory footprint and binary size for resource-constrained systems
        - Memory safety without GC overhead for mission-critical systems
      - Rust-based container runtime optimized for embedded systems
        - Fast startup that launches a container quickly from a pre-created container
        - Fine-grained access control for rootless containers
      - The results show that our runtime is suitable for embedded systems
        - Runs the container 7.4x faster than runc
        - The runtime's memory usage is equivalent to that of crun, which is written in C