Slide 1

Slide 1 text

Copyright © 2024 TOYOTA MOTOR CORPORATION All rights reserved.
eBPF technologies with container
February 21, 2024
Kenta Tada
Toyota Motor Corporation, Information and Communication Planning Division, InfoTech-IS and Open Source Program Group, Project Manager
Container Runtime Meetup #5

Slide 2

Slide 2 text

Introduction
⚫ eBPF is a powerful and revolutionary technology, but its internals are difficult to understand.
⚫ Using eBPF programs in a container environment is especially complicated.
⚫ This session helps you operate eBPF programs in your container-based production system.

Slide 3

Slide 3 text

Agenda
⚫ About me
⚫ What is eBPF
⚫ Challenges with eBPF
⚫ Prepare for operating eBPF programs in prod

Slide 4

Slide 4 text

About me
⚫ Kenta Tada
⚫ Project Manager @ Toyota Motor Corporation
⚫ I research and develop both server-side and automotive systems.
  ✓ In particular, I am working to integrate eBPF technologies into our systems.
⚫ I'm a member of our open source program office.
⚫ Recent activities
  ⚫ Reviewer of 入門 eBPF
  ⚫ Cloud Native Community Japan organizer
    ✓ CNCF Cloud Native Community Japan

Slide 5

Slide 5 text

What is eBPF
⚫ eBPF is used to safely and efficiently extend the capabilities of the kernel without requiring changes to kernel source code or loading kernel modules.
Source: What is eBPF? – eBPF

Slide 6

Slide 6 text

What is possible
⚫ Networking
  ⚫ Speed packet processing without leaving kernel space. Add additional protocol parsers and easily program any forwarding logic to meet changing requirements.
⚫ Observability
  ⚫ Collection and in-kernel aggregation of custom metrics with generation of visibility events and data structures from a wide range of possible sources without having to export samples.
⚫ Tracing & Profiling
  ⚫ Attach eBPF programs to trace points as well as kernel and user application probe points giving powerful introspection abilities and unique insights to troubleshoot system performance problems.
⚫ Security
  ⚫ Combine seeing and understanding all system calls with a packet and socket-level view of all networking to create security systems operating on more context with a better level of control.
Source: eBPF - Introduction, Tutorials & Community Resources

Slide 7

Slide 7 text

Challenges with eBPF
⚫ Security
  ⚫ If one is going to run code in the kernel space, it's going to have access to a lot of capabilities that normal programs on computers don't get.
⚫ Performance tradeoffs
  ⚫ Doing too many things with eBPF may end up eating the gains.
⚫ Co-existence
  ⚫ eBPF tools will have to work in combination with other software.
⚫ Deep kernel expertise
  ⚫ Programming eBPF effectively requires deep kernel expertise.
⚫ Too much data
⚫ Interoperability
Source: The_State_of_eBPF.pdf (linuxfoundation.org)

Slide 8

Slide 8 text

Prepare for operating eBPF programs in prod
⚫ Confirm kernel facilities for eBPF
  ⚫ The facilities available to eBPF depend on the kernel version and the architecture.
  ⚫ For example, eBPF tracing programs (fentry/fexit/fmod_ret/lsm) were not supported on arm64 before ftrace direct call support was introduced (v6.4).
    ✓ https://lore.kernel.org/bpf/20230405180250.2046566-1-revest@chromium.org/
⚫ Observe eBPF utilization in prod
  ⚫ If you load eBPF programs in prod, you should observe not only the applications but also the eBPF programs themselves.
⚫ Understand Linux Kernel internals for eBPF
  ⚫ Ex1. Memory leak in bpffs
  ⚫ Ex2. The behavior of bpf_send_signal
  ⚫ Ex3. uprobes in a separated mount namespace
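The arm64 example above can be turned into a rough pre-flight check. The following is a minimal sketch, not a definitive test: the function names are hypothetical, it only encodes the single v6.4 fact from the slide, and it ignores distribution backports, so a proper feature probe on the target kernel remains the more reliable check.

```python
import platform
import re
from typing import Optional


def parse_kernel_release(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '6.5.0-14-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def fentry_likely_supported(machine: str, release: str) -> Optional[bool]:
    """Heuristic built only from the arm64 example on the slide.

    Assumption: on arm64 (reported by Linux as 'aarch64'), fentry/fexit/
    fmod_ret/lsm programs need ftrace direct call support from v6.4.
    Returns None for architectures this sketch knows nothing about.
    """
    if machine != "aarch64":
        return None
    return parse_kernel_release(release) >= (6, 4)


if __name__ == "__main__":
    # Inspect the machine this script runs on.
    print(fentry_likely_supported(platform.machine(), platform.release()))
```

Returning None instead of True for other architectures is deliberate: the slide gives no information about them, so the sketch does not guess.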

Slide 9

Slide 9 text

Confirm kernel facilities for eBPF
⚫ Kernel Configuration for eBPF Features
  ⚫ bcc/docs/kernel_config.md at master · iovisor/bcc · GitHub
⚫ The list of program types supported in the kernel
  ⚫ bcc/docs/kernel-versions.md at master · iovisor/bcc · GitHub
⚫ The list of program types and supported helper functions
  ⚫ bcc/docs/kernel-versions.md at master · iovisor/bcc · GitHub
⚫ How to inspect eBPF programs in your system on the fly
  ⚫ Use bpftool
  ⚫ In particular, bpftool-feature shows the eBPF-related parameters of the running kernel.
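Besides bpftool, the kernel build configuration itself can be checked against the options a tool needs. The sketch below assumes the configuration is available as text (for example from /boot/config-<release>, or from /proc/config.gz on kernels that expose it); the function names and the sample input are illustrative only.

```python
def parse_kernel_config(text: str) -> dict[str, str]:
    """Parse 'CONFIG_X=value' lines; comment lines such as
    '# CONFIG_X is not set' are skipped, so unset options are absent."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            options[name] = value
    return options


def missing_options(options: dict[str, str], required: list[str]) -> list[str]:
    """Return the required options that are neither built in (y) nor modules (m)."""
    return [name for name in required if options.get(name) not in ("y", "m")]


if __name__ == "__main__":
    # Hypothetical sample input, not taken from a real system.
    sample = "CONFIG_BPF_SYSCALL=y\n# CONFIG_CGROUP_BPF is not set\n"
    print(missing_options(parse_kernel_config(sample),
                          ["CONFIG_BPF_SYSCALL", "CONFIG_CGROUP_BPF"]))
```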

Slide 10

Slide 10 text

Security tradeoffs
⚫ Decide which kernel facilities are actually needed for eBPF
  ⚫ Modify the kernel parameters
    ✓ Ex. /proc/sys/net/core/bpf_jit_harden
  ⚫ Check bpf_override_return() (CONFIG_BPF_KPROBE_OVERRIDE)
    ✓ Use case: chaos engineering tools
  ⚫ Restrict bpf_probe_write_user() using LSM Lockdown
    ✓ bpf_probe_write_user() can overwrite user memory.
⚫ We probably cannot disable the configurations that most eBPF-based tools (especially systemd) depend on.
  ⚫ CONFIG_BPF_SYSCALL
  ⚫ CONFIG_CGROUP_BPF
⚫ If a facility is experimental, we can disable it.
  ⚫ CONFIG_BPFILTER
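As a small illustration of auditing one of these parameters, the sketch below reads bpf_jit_harden and maps its value to the meaning given in the kernel's sysctl documentation (0, 1, 2). The helper names are hypothetical, and reading the file may require elevated privileges, in which case the sketch returns None rather than failing.

```python
from pathlib import Path
from typing import Optional

JIT_HARDEN_PATH = Path("/proc/sys/net/core/bpf_jit_harden")

JIT_HARDEN_MEANING = {
    0: "JIT hardening disabled",
    1: "JIT hardening enabled for unprivileged users only",
    2: "JIT hardening enabled for all users",
}


def describe_jit_harden(raw: str) -> str:
    """Translate the raw sysctl content into a human-readable description."""
    value = int(raw.strip())
    return JIT_HARDEN_MEANING.get(value, f"unknown value {value}")


def read_jit_harden(path: Path = JIT_HARDEN_PATH) -> Optional[str]:
    """Return the description, or None if the file is missing or unreadable."""
    try:
        return describe_jit_harden(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print(read_jit_harden())
```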

Slide 11

Slide 11 text

Observe eBPF utilization in prod (1/3)
⚫ bpftool is useful to inspect your system and eBPF programs.
  ⚫ If your system runs systemd, you can already see some eBPF programs.
⚫ List eBPF programs attached to tracing facilities
  ⚫ # bpftool perf
⚫ List eBPF programs attached to all cgroups
  ⚫ # bpftool cgroup tree

    /sys/fs/cgroup/system.slice/systemd-oomd.service
        13 cgroup_inet_ingress multi
        12 cgroup_inet_egress multi
        11 cgroup_device multi
    /sys/fs/cgroup/system.slice/systemd-resolved.service
        14 cgroup_device multi
    /sys/fs/cgroup/system.slice/systemd-timesyncd.service
        15 cgroup_device multi
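For monitoring, output like the cgroup tree above can be turned into a data structure. This is a minimal sketch that assumes the plain-text layout shown on the slide (a cgroup path line followed by "ID AttachType AttachFlags" lines); the function name is hypothetical, and bpftool's JSON output (-j) is likely a more robust input if the column layout differs on your version.

```python
def parse_cgroup_tree(output: str) -> dict[str, list[tuple[int, str, str]]]:
    """Map each cgroup path to its (program ID, attach type, attach flags) entries."""
    tree: dict[str, list[tuple[int, str, str]]] = {}
    current = None
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("/"):
            # A new cgroup path starts a new group of programs.
            current = fields[0]
            tree[current] = []
        elif fields[0].isdigit() and current is not None and len(fields) >= 3:
            tree[current].append((int(fields[0]), fields[1], fields[2]))
        # Anything else (for example a column header) is ignored.
    return tree


if __name__ == "__main__":
    sample = (
        "/sys/fs/cgroup/system.slice/systemd-resolved.service\n"
        "    14 cgroup_device multi\n"
    )
    print(parse_cgroup_tree(sample))
```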

Slide 12

Slide 12 text

Observe eBPF utilization in prod (2/3)
⚫ Q. What tools are running?

    519: sched_cls  name cil_to_host  tag 2aa6812762b4536b  gpl
            loaded_at 2024-02-14T22:51:05+0900  uid 0
            xlated 352B  jited 194B  memlock 4096B  map_ids 73
            btf_id 222
    525: sched_cls  name tail_handle_ipv4_from_netdev  tag 6a33aa4c8f330faf  gpl
            loaded_at 2024-02-14T22:51:05+0900  uid 0
            xlated 936B  jited 596B  memlock 4096B  map_ids 73,90
            btf_id 229
    528: sched_cls  name cil_from_host  tag ece73a7f3e04c10f  gpl
            loaded_at 2024-02-14T22:51:05+0900  uid 0
            xlated 2016B  jited 1297B  memlock 4096B  map_ids 73,72,90
            btf_id 232
    530: sched_cls  name __send_drop_notify  tag bb5bcebce88430e5  gpl
            loaded_at 2024-02-14T22:51:05+0900  uid 0
            xlated 376B  jited 217B  memlock 4096B  map_ids 70
            btf_id 234

Slide 13

Slide 13 text

Observe eBPF utilization in prod (3/3)
⚫ A. Cilium
⚫ Some tools name their BPF programs with a common prefix.
  ✓ Ex. Cilium: cil_
⚫ bpftool is useful, but we still need more information about each eBPF program.
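The prefix convention can be applied mechanically to program listings. The sketch below is a hedged illustration: the function name and the prefix table are hypothetical (only the cil_ prefix comes from the slide), and it assumes each program's header line has the form "ID: type name NAME ...". Programs without a known prefix come back as "unknown", which shows why names alone are not enough.

```python
import re

# Only the Cilium prefix is taken from the slide; extend at your own risk.
KNOWN_PREFIXES = {"cil_": "Cilium"}

# Matches the first line of each program entry: "<id>: <type> name <name> ...".
PROG_LINE = re.compile(r"^(\d+):\s+(\S+)\s+name\s+(\S+)")


def guess_owners(prog_show_output: str, prefixes=KNOWN_PREFIXES) -> dict[str, str]:
    """Map each program name to the tool suggested by its prefix, or 'unknown'."""
    owners = {}
    for line in prog_show_output.splitlines():
        match = PROG_LINE.match(line.strip())
        if match is None:
            continue  # continuation lines (loaded_at, xlated, ...) are skipped
        name = match.group(3)
        owners[name] = next(
            (tool for prefix, tool in prefixes.items() if name.startswith(prefix)),
            "unknown",
        )
    return owners


if __name__ == "__main__":
    sample = "519: sched_cls  name cil_to_host  tag 2aa6812762b4536b  gpl\n"
    print(guess_owners(sample))
```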

Slide 14

Slide 14 text

Memory leak in BPFFS
⚫ BPFFS: BPF File System
  ⚫ A user space process can pin a BPF program or map in BPFFS.
⚫ We ran into the following BPFFS memory leak issue when we tried OpenTelemetry Auto Instrumentation using eBPF.
  ⚫ Call the Cleanup method of bpffs to remove the bpf fs after instrumen… by RonFed · Pull Request #347 · open-telemetry/opentelemetry-go-instrumentation · GitHub
⚫ You can show the pinned paths in BPFFS.
  ✓ # bpftool prog show --bpffs
⚫ But how can we detect a BPFFS memory leak in other BPFFS instances?
  ⚫ Ex1. Dedicated BPFFS instance
    ✓ See https://lpc.events/event/11/contributions/933/
  ⚫ Ex2. BPF token will use BPFFS inside each mount namespace.
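One possible starting point for the open question above, offered as an assumption rather than the speaker's method, is to look for BPFFS mounts in every process's mount table. The sketch below parses the /proc/<pid>/mountinfo format and reports mount points whose filesystem type is bpf. The function names are hypothetical; the sketch only locates instances, it does not measure leaked memory, and each reported path is as seen inside that process's mount namespace.

```python
from pathlib import Path


def bpffs_mount_points(mountinfo_text: str) -> list[str]:
    """Return mount points whose filesystem type is 'bpf'.

    In mountinfo, the fifth field is the mount point and the filesystem
    type is the first field after the ' - ' separator.
    """
    points = []
    for line in mountinfo_text.splitlines():
        left, sep, right = line.partition(" - ")
        if not sep:
            continue
        left_fields = left.split()
        right_fields = right.split()
        if len(left_fields) >= 5 and right_fields and right_fields[0] == "bpf":
            points.append(left_fields[4])
    return points


def scan_all_processes(proc: Path = Path("/proc")) -> dict[int, list[str]]:
    """Collect BPFFS mount points per PID; unreadable processes are skipped."""
    found = {}
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            text = (entry / "mountinfo").read_text()
        except OSError:
            continue
        points = bpffs_mount_points(text)
        if points:
            found[int(entry.name)] = points
    return found


if __name__ == "__main__":
    if Path("/proc").is_dir():
        print(scan_all_processes())
```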

Slide 15

Slide 15 text

The behavior of bpf_send_signal (1/2)
⚫ bpf_send_signal(), one of the BPF helper functions, sends signals from kernel space.
⚫ This function is used for security observability.
  ⚫ For example, Tetragon tries to kill malicious processes by sending a SIGKILL using bpf_send_signal() synchronously.
[Figure: a malicious process attacks the kernel; your eBPF program calls bpf_send_signal() to send SIGKILL to the malicious process.]

Slide 16

Slide 16 text

The behavior of bpf_send_signal (2/2)
⚫ Q. When I tried to stop linkat(2) with bpf_send_signal(), the process was killed, but the new link file was still created. Why?
⚫ A. The kernel checks for pending signals before returning to user space.
  ⚫ Some kernel components check for signals earlier with fatal_signal_pending().
    ✓ For example, generic_perform_write() calls fatal_signal_pending() while it copies data into the page cache.
  ⚫ But this depends on the kernel-side implementation.
  ⚫ In the linkat(2) case, the process is killed only after linkat(2) is done.
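The ordering described in the answer can be illustrated with a toy model. This is plain Python, not kernel code: the class and method names are hypothetical, and it models only a syscall path that performs no fatal_signal_pending() check of its own, as observed for linkat(2) above.

```python
class ToyTask:
    """Toy model of one process; it only illustrates the order of events."""

    def __init__(self):
        self.pending_sigkill = False
        self.alive = True
        self.links = set()

    def bpf_send_signal_sigkill(self):
        # In this model the helper only marks the signal as pending.
        self.pending_sigkill = True

    def return_to_user_space(self):
        # Pending signals are acted on here, after the syscall body has run.
        if self.pending_sigkill:
            self.alive = False

    def linkat(self, new_path, on_entry=None):
        if on_entry is not None:
            on_entry(self)        # stands in for an eBPF program at syscall entry
        self.links.add(new_path)  # the syscall body completes regardless
        self.return_to_user_space()


if __name__ == "__main__":
    task = ToyTask()
    task.linkat("/tmp/new-link", on_entry=lambda t: t.bpf_send_signal_sigkill())
    print(task.alive, sorted(task.links))
```

In the model, the task ends up dead and the link still exists, which matches the behavior reported in the question.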

Slide 17

Slide 17 text

uprobes in a separated mount namespace
⚫ Some libbpf-based tools could not register uprobes correctly in a container environment.
  ⚫ libbpf-tools/gethostlatency: Resolve the path of libc for different namespaces by KentaTada · Pull Request #4785 · iovisor/bcc · GitHub
  ⚫ libbpf-tools: support to find symbols in different mount namespace by ethercflow · Pull Request #4854 · iovisor/bcc · GitHub
⚫ When you try to register uprobes, you need
  ⚫ Inode of the target binary file
  ⚫ Offset in the target binary file
⚫ Because the path to the target file differs among mount namespaces, a path that is valid inside the container may not resolve to the right file from the tool's namespace, and the uprobe cannot be registered in the kernel.
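One common way to reach a file in another mount namespace from the host is to go through /proc/<pid>/root, which exposes the root directory as seen by that process. The sketch below shows the idea only; the function names are hypothetical, it is not claimed to be what the pull requests above implement, and accessing another process's root typically requires sufficient privileges.

```python
import os


def path_via_proc_root(pid: int, path_in_container: str) -> str:
    """Build a host-side path to a file as seen by process `pid`."""
    # Strip the leading '/' so os.path.join does not discard the prefix.
    return os.path.join("/proc", str(pid), "root", path_in_container.lstrip("/"))


def file_identity(path: str) -> tuple[int, int]:
    """Return (device, inode), which identifies the file independent of its path."""
    st = os.stat(path)
    return st.st_dev, st.st_ino


if __name__ == "__main__":
    # PID 1234 and the libc path are made-up values for illustration.
    print(path_via_proc_root(1234, "/usr/lib/libc.so.6"))
```

Comparing file_identity() of the resolved path with that of a host-side path is a simple way to tell whether two paths refer to the same binary.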

Slide 18

Slide 18 text

Other challenges for us
⚫ First of all, we want to know what eBPF is capable of for our use cases.
⚫ From the perspective of our systems, we should consider
  ⚫ arm64 support
  ⚫ Security
  ⚫ Deployment
  ⚫ Licensing
  ⚫ Without Kubernetes …

Slide 19

Slide 19 text

Key takeaways
⚫ Deep kernel knowledge is important to detect and prevent problems.
⚫ To integrate eBPF-based technologies into existing systems, we need a lot of knowledge about not only kernel space but also user space.
⚫ Collaboration among diverse companies is essential to improve eBPF technologies.