sometimes breaks container isolation. In 2024, CVE-2024-21626 (a file descriptor leak for /sys/fs/cgroup in runc) was reported.
Rootless containers: running container runtimes without root.
→ They mitigate attacks against container runtimes and the host.
[Figure: rootful container vs. rootless container. In both cases an attacker process breaks the isolation between the process and the container runtime. In the rootful container (root) the attacker can perform malicious operations with root; in the rootless container (normal user) only with normal-user privileges, which is more secure.]
It creates a socket on the host and switches the sockets in a container to it.
• The switched socket is not slowed down by rootless components.
→ It provides significant performance improvements.
[Figure: bypassing the bottleneck]
requirements.
• Fully rootless: Every component should run without root privileges.
• No dedicated kernel modules: Implementing a secure and stable kernel module is too hard. Dedicated kernel modules increase maintenance costs.
• No application modifications: Rootless containers aim to work the same as ordinary rootful containers.
• It switches the target of a file descriptor to another open file description, such as a socket.
[Figure: a process in a rootless container issues a target syscall on a socket’s file descriptor; Seccomp Notify passes it to the socket switching module of bypass4netns.]
1. Notify
2. Allocate a socket (a socket allocated on the host)
3. Socket switching (SECCOMP_IOCTL_NOTIF_ADDFD)
4. Socket switched (the descriptor no longer refers to the socket in the container)
5. Execution
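The switching step (3) can be sketched in C as follows. This is a minimal illustration, not bypass4netns's actual code: the function names `make_switch_request` and `switch_socket` are hypothetical, and the sketch assumes Linux 5.9 or later headers, where `struct seccomp_notif_addfd` and `SECCOMP_ADDFD_FLAG_SETFD` are defined. With that flag, the kernel installs the supervisor's descriptor at a chosen number in the target process, so an existing socket descriptor can be made to refer to a host socket.

```c
#include <linux/seccomp.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Build an ADDFD request that installs the supervisor's `host_fd` at the
 * descriptor number `container_fd` inside the notifying process.
 * (Hypothetical helper, for illustration only.) */
static struct seccomp_notif_addfd
make_switch_request(uint64_t notif_id, int host_fd, int container_fd)
{
    struct seccomp_notif_addfd req;
    memset(&req, 0, sizeof(req));
    req.id = notif_id;                    /* id taken from the received notification */
    req.flags = SECCOMP_ADDFD_FLAG_SETFD; /* replace a specific fd number */
    req.srcfd = (uint32_t)host_fd;        /* socket allocated on the host */
    req.newfd = (uint32_t)container_fd;   /* fd number used by the container process */
    req.newfd_flags = 0;
    return req;
}

/* Ask the kernel to perform the switch. On success the ioctl returns the fd
 * number installed in the target process; on failure it returns -1. */
static int switch_socket(int notify_fd, uint64_t notif_id,
                         int host_fd, int container_fd)
{
    struct seccomp_notif_addfd req =
        make_switch_request(notif_id, host_fd, container_fd);
    return ioctl(notify_fd, SECCOMP_IOCTL_NOTIF_ADDFD, &req);
}
```

In this sketch `notify_fd` is the seccomp notification descriptor held by the supervisor, and `notif_id` comes from the notification received in step 1.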
syscalls in iperf3, wget, nginx, httpd.
bypass4netns hooks the following syscalls.
• (2) Configuration: records socket configuration.
• (3) Connection: checks the endpoint and performs switching.
• (4) Status: writes dummy values when needed.
• (7) Close: cleans up the socket’s information.
bypass4netns does not hook communication syscalls such as recv(2) and send(2).
→ The overhead of Seccomp Notify does not affect communication performance.
destination is external endpoints.
External endpoints: neither other containers nor loopback addresses.

    memset((char *)&end_addr, 0, sizeof(end_addr));
    end_addr.sin_family = AF_INET;
    end_addr.sin_port = htons(80);
    end_addr.sin_addr.s_addr = inet_addr("133.3.254.6");

→ The endpoint is external.
[Figure: flow of switching in bypass4netns]
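The external-endpoint check described above could look roughly like the following. This is a sketch under stated assumptions, not the project's implementation: `in_subnet` and `is_external_endpoint` are hypothetical names, only IPv4 is handled, and the container network is passed in by the caller as a base address and prefix length.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if `addr` lies inside the IPv4 network `net`/`prefix_len`. */
static bool in_subnet(struct in_addr addr, const char *net, int prefix_len)
{
    struct in_addr base;
    if (prefix_len < 0 || prefix_len > 32 ||
        inet_pton(AF_INET, net, &base) != 1)
        return false;
    /* Build the netmask in network byte order; /0 matches everything. */
    uint32_t mask = prefix_len == 0
                        ? 0
                        : htonl(0xFFFFFFFFu << (32 - prefix_len));
    return (addr.s_addr & mask) == (base.s_addr & mask);
}

/* An endpoint is treated as external when it is neither a loopback address
 * nor an address inside the (assumed) container network. */
static bool is_external_endpoint(const struct sockaddr_in *dst,
                                 const char *container_net, int prefix_len)
{
    if (dst->sin_family != AF_INET)
        return false; /* this sketch only covers IPv4 */
    if (in_subnet(dst->sin_addr, "127.0.0.0", 8))
        return false; /* loopback */
    if (in_subnet(dst->sin_addr, container_net, prefix_len))
        return false; /* another container */
    return true;
}
```

For the address on the slide, `is_external_endpoint` returns true for 133.3.254.6 when the container network is, for example, 10.4.0.0/24 (an illustrative value).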
performance issue.
• Creating a VXLAN interface on a host requires root privileges.
• VXLAN interfaces inside rootless containers are used instead.
→ slirp4netns and RootlessKit process the VXLAN packets, which causes overhead.
[Figure: causes performance issue]
1. A published port is bound to the host’s address.
2. The address and port are shared among hosts via a KVS.
3. bypass4netns hooks the connection to the socket.
4. bypass4netns rewrites the destination of connect(2) from the container’s address to the host’s address.
[Figure: container C1 runs on HostA and container C2 on HostB, each with bypass4netns. In C2, the socket binding on 80/tcp is bound on HostB:12345/tcp. The KVS holds the mapping Container C2:80/tcp → Host HostB:12345/tcp. In C1, the socket connecting to C2:80/tcp becomes a socket connecting to HostB:12345/tcp.]
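Steps 2 to 4 can be illustrated with a small lookup-and-rewrite routine. This is only a sketch: the in-memory `kvs_entry` table stands in for the shared KVS, and all names and addresses here are hypothetical. The slides do not specify the KVS or its schema.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One mapping shared among hosts: container endpoint -> host endpoint.
 * (Stand-in for an entry stored in the KVS.) */
struct kvs_entry {
    const char *container_ip;
    uint16_t container_port;
    const char *host_ip;
    uint16_t host_port;
};

/* If `dst` matches a published container endpoint, rewrite it in place to
 * the corresponding host endpoint and return true. */
static bool rewrite_destination(struct sockaddr_in *dst,
                                const struct kvs_entry *table, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct in_addr c, h;
        if (inet_pton(AF_INET, table[i].container_ip, &c) != 1 ||
            inet_pton(AF_INET, table[i].host_ip, &h) != 1)
            continue; /* skip malformed entries */
        if (dst->sin_addr.s_addr == c.s_addr &&
            ntohs(dst->sin_port) == table[i].container_port) {
            dst->sin_addr = h;
            dst->sin_port = htons(table[i].host_port);
            return true;
        }
    }
    return false; /* not a published endpoint: leave the address untouched */
}
```

With an entry such as 10.4.1.2:80 → 192.168.0.20:12345 (illustrative values for C2 and HostB), a connect(2) destined for the container address would be redirected to the host address and port.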
applications?
• Does it provide faster communications and better performance?
Evaluations were performed with 2 virtual machines.
• CPU: AMD EPYC 7452 (2.35 GHz base clock, 8 cores assigned)
• Memory: 16 GB
• OS: Ubuntu 22.04
Evaluations show:
• Most of the tested applications worked.
→ Our approach works with real applications without modifications.
• It also worked with statically linked applications.
→ bypass4netns works with applications that LD_PRELOAD cannot handle.
A failing case exists:
• Seccomp Notify sometimes dropped syscalls silently (with the Go 1.21.3 HTTP client).
→ This happens when running more than 2 HTTP clients with goroutines.
• b4ns-pfd achieved 30x faster throughput (19.7 Gbps) than rootless-pfd (570 Mbps).
• b4ns-pfd achieved better performance than rootful-pfd (17.9 Gbps).
→ b4ns-pfd bypasses components such as the bridge, which results in better performance.
• b4ns-multinode achieved the best performance in multi-node communications.
achieved 2x and b4ns-multinode achieved 3x better performance.
• Rootless with bypass4netns achieved better performance than rootful.
→ We attribute this to the same cause as in the iperf3 results.
• bypass4netns provides better performance in a real-world application.
No significant performance improvements were observed.
• The latency was slightly reduced.
• Rootless without bypass4netns provided almost the same performance as rootful.
→ Network performance does not seem to be the dominant factor in OLTP with an RDB.
rootless containers have communication performance issues.
Solution: bypass4netns, a fully rootless approach based on socket switching.
• It switches sockets in rootless containers to the host’s sockets.
• We also proposed an approach to accelerate multi-node communication.
Evaluation: 30x faster throughput and performance improvements in applications.
bypass4netns removes the performance tradeoff and makes container security better.
The upstream implementation is available at https://github.com/rootless-containers/bypass4netns
• Fully rootless: Every component should run without root privileges.
• No dedicated kernel modules: Implementing a secure and stable kernel module is too hard. Dedicated kernel modules lack compatibility and increase maintenance costs.
• No application modifications: Rootless containers aim to work the same as ordinary rootful containers.
a host’s root privilege.
→ Rootless network components are used instead of veth interfaces.
• RootlessKit: handles incoming connections for published ports.
• slirp4netns: handles outgoing connections to external endpoints.
degradation
• RootlessKit: relays payloads between the host and a container.
• slirp4netns: extracts payloads from packets and sends them via the host’s socket.
the socket exists in the host’s NetNS.
→ Other containers cannot connect to the socket using the container’s address.
bypass4netns rewrites the address to that of the switched socket when hooking connect(2).
• It rewrites the destination address via /proc/PID/mem.
• It returns the container’s address when getpeername(2) is called.
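Rewriting an address through /proc/PID/mem can be sketched as below. This is an illustrative assumption about how such a write might be done in C, not the project's code: `proc_mem_path` and `write_remote_sockaddr` are hypothetical helpers, `remote_addr` is the address of the sockaddr argument inside the target process (taken from the hooked syscall's arguments), and the caller needs permission to write the target's memory. A real seccomp supervisor would also confirm that the notification is still valid (SECCOMP_IOCTL_NOTIF_ID_VALID) before trusting the written memory.

```c
#include <fcntl.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Format the path of the target's memory file; returns 0 on success. */
static int proc_mem_path(pid_t pid, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/proc/%d/mem", (int)pid);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Overwrite the sockaddr_in located at `remote_addr` in process `pid`
 * with `sa`. Returns 0 on success and -1 on any failure. */
static int write_remote_sockaddr(pid_t pid, uint64_t remote_addr,
                                 const struct sockaddr_in *sa)
{
    char path[64];
    if (proc_mem_path(pid, path, sizeof(path)) != 0)
        return -1;
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    /* The file offset in /proc/PID/mem is the virtual address in the target. */
    ssize_t written = pwrite(fd, sa, sizeof(*sa), (off_t)remote_addr);
    close(fd);
    return written == (ssize_t)sizeof(*sa) ? 0 : -1;
}
```

Reading the original destination for the getpeername(2) case would follow the same pattern with `pread` on the same file.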
Go HTTP Client
• The client binary was statically linked.
→ The result shows that bypass4netns can handle statically linked binaries correctly.
• Containers with bypass4netns provide almost the same performance as rootful-pfd.
→ Access control with iptables does not work.
bypass4netns provides a dynamic connectivity tracing mechanism.
• Tracing agents check connectivity between containers dynamically.
• bypass4netns checks the status before socket switching.
be used as a versatile socket switching method.
• LD_PRELOAD: cannot handle statically linked binaries.
• Seccomp Notify: can handle any binary but causes huge overhead.
→ bypass4netns avoids this overhead by handling only the required syscalls.
bypass4netns provides the ability to switch TCP sockets to other sockets.
• No application modifications are required.
• It provides dynamic switching depending on the peer address.
e.g.) Switching connections to UNIX domain sockets or RDMA sockets.
→ Faster and lighter communication
approach allocates all sockets on the host
→ This may lead to port starvation or unintended connections.
• Another approach is to use a sidecar proxy to relay communications.
→ The proxy can cause some performance degradation, but it is reliable.
[Figure: our approach (needs many sockets, unintended connections) vs. another approach (the proxy validates and relays connections; only the proxy’s socket is switched)]