eBPF_technologies_with_container

Kenta Tada
February 21, 2024

Transcript

  1. eBPF technologies with container

    Copyright © 2024 TOYOTA MOTOR CORPORATION All rights reserved.
    February 21, 2024
    Kenta Tada, Project Manager, InfoTech-IS and Open Source Program Group, Information and Communication Planning Division, Toyota Motor Corporation
    Container Runtime Meetup #5
  2. Introduction

    ⚫ eBPF is a powerful, revolutionary technology, but its internals are difficult to understand.
    ⚫ Using eBPF programs in a container environment makes things even more complicated.
    ⚫ This session helps you operate eBPF programs in your container-based production system.
  3. Agenda

    ⚫ About me
    ⚫ What is eBPF
    ⚫ Challenges with eBPF
    ⚫ Prepare for operating eBPF programs in prod
  4. About me

    ⚫ Kenta Tada
    ⚫ Project Manager @ Toyota Motor Corporation
    ⚫ I research and develop both server-side and automotive systems.
      ✓ In particular, I am trying to integrate eBPF technologies into our systems.
    ⚫ I am a member of our open source program office.
    ⚫ Recent activities
      ⚫ Reviewer of 入門 eBPF
      ⚫ Cloud Native Community Japan organizer
        ✓ CNCF Cloud Native Community Japan
  5. What is eBPF

    ⚫ eBPF is used to safely and efficiently extend the capabilities of the kernel without requiring changes to kernel source code or loading kernel modules.

    Source: What is eBPF? – eBPF
  6. What is possible

    ⚫ Networking
      ⚫ Speed packet processing without leaving kernel space. Add additional protocol parsers and easily program any forwarding logic to meet changing requirements.
    ⚫ Observability
      ⚫ Collection and in-kernel aggregation of custom metrics with generation of visibility events and data structures from a wide range of possible sources without having to export samples.
    ⚫ Tracing & Profiling
      ⚫ Attach eBPF programs to trace points as well as kernel and user application probe points giving powerful introspection abilities and unique insights to troubleshoot system performance problems.
    ⚫ Security
      ⚫ Combine seeing and understanding all system calls with a packet and socket-level view of all networking to create security systems operating on more context with a better level of control.

    Source: eBPF - Introduction, Tutorials & Community Resources
  7. Challenges with eBPF

    ⚫ Security
      ⚫ If one is going to run code in the kernel space, it’s going to have access to a lot of capabilities that normal programs on computers don’t get.
    ⚫ Performance tradeoffs
      ⚫ Doing too many things with eBPF may end up eating the gains.
    ⚫ Co-existence
      ⚫ eBPF tools will have to work in combination with other software.
    ⚫ Deep kernel expertise
      ⚫ Programming eBPF effectively requires deep kernel expertise.
    ⚫ Too much data
    ⚫ Interoperability

    Source: The_State_of_eBPF.pdf (linuxfoundation.org)
  8. Prepare for operating eBPF programs in prod

    ⚫ Confirm kernel facilities for eBPF
      ⚫ The facilities available to eBPF depend on the kernel version and architecture.
      ⚫ For example, eBPF tracing programs (fentry/fexit/fmod_ret/lsm) were not supported on arm64 until ftrace direct call support was introduced (v6.4).
        ✓ https://lore.kernel.org/bpf/[email protected]/
    ⚫ Observe eBPF utilization in prod
      ⚫ If you load eBPF programs in prod, you should observe not only the applications but also the eBPF programs themselves.
    ⚫ Understand Linux kernel internals for eBPF
      ⚫ Ex1. Memory leak in bpffs
      ⚫ Ex2. The behavior of bpf_send_signal
      ⚫ Ex3. uprobes in a separated mount namespace
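The arm64 example above can be turned into a quick pre-flight check. The following is a minimal sketch, assuming the only input is the kernel release string; the function names are hypothetical, and a version comparison is only a heuristic because distributions may backport or disable features.

```python
import platform
import re
from typing import Tuple


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '6.5.0-14-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def arm64_fentry_expected(release: str) -> bool:
    """Heuristic: on arm64, expect fentry/fexit/fmod_ret/lsm from v6.4 onward.

    This only compares version numbers; it does not probe the kernel.
    """
    return parse_kernel_release(release) >= (6, 4)


if __name__ == "__main__":
    uname = platform.uname()
    if uname.machine in ("aarch64", "arm64"):
        print(uname.release, "fentry expected:", arm64_fentry_expected(uname.release))
    else:
        print(uname.machine, "is not arm64; this check does not apply")
```

A real deployment would confirm the result by probing the running kernel (for example with bpftool) rather than trusting the version alone.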
  9. Confirm kernel facilities for eBPF

    ⚫ Kernel configuration for eBPF features
      ⚫ bcc/docs/kernel_config.md at master · iovisor/bcc · GitHub
    ⚫ The list of program types supported in the kernel
      ⚫ bcc/docs/kernel-versions.md at master · iovisor/bcc · GitHub
    ⚫ The list of program types and supported helper functions
      ⚫ bcc/docs/kernel-versions.md at master · iovisor/bcc · GitHub
    ⚫ How to inspect eBPF programs in your system on the fly
      ⚫ Use bpftool
      ⚫ In particular, bpftool-feature shows the eBPF-related parameters of the running kernel.
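Alongside bpftool, the kernel configuration itself can be checked with a few lines of script. This is a minimal sketch, assuming the config text is available (for example from /proc/config.gz, which requires CONFIG_IKCONFIG_PROC); the option list is illustrative and the function names are hypothetical.

```python
import gzip
from pathlib import Path
from typing import Dict, Iterable

# Illustrative subset of options; extend it for the tools you actually run.
WANTED = ("CONFIG_BPF", "CONFIG_BPF_SYSCALL", "CONFIG_BPF_JIT", "CONFIG_BPF_EVENTS")


def parse_kernel_config(text: str) -> Dict[str, str]:
    """Map CONFIG_* names to values; '# CONFIG_FOO is not set' becomes 'n'."""
    options: Dict[str, str] = {}
    suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def summarize(options: Dict[str, str], wanted: Iterable[str] = WANTED) -> Dict[str, str]:
    """Report each wanted option, or 'missing' if the config does not mention it."""
    return {name: options.get(name, "missing") for name in wanted}


if __name__ == "__main__":
    config_path = Path("/proc/config.gz")  # assumption: exposed by this kernel
    if config_path.exists():
        text = gzip.decompress(config_path.read_bytes()).decode()
        for name, value in summarize(parse_kernel_config(text)).items():
            print(f"{name}={value}")
    else:
        print("/proc/config.gz not available; try the config file under /boot")
```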
  10. Security tradeoffs

    ⚫ Decide which kernel facilities are actually needed for eBPF
      ⚫ Modify the kernel parameters
        ✓ Ex. /proc/sys/net/core/bpf_jit_harden
      ⚫ Check bpf_override_return() (CONFIG_BPF_KPROBE_OVERRIDE)
        ✓ Use case: chaos engineering tools
      ⚫ Restrict bpf_probe_write_user() using LSM Lockdown
        ✓ bpf_probe_write_user() can overwrite user memory.
    ⚫ We probably cannot disable the configurations that most eBPF-based tools (especially systemd) depend on.
      ⚫ CONFIG_BPF_SYSCALL
      ⚫ CONFIG_CGROUP_BPF
    ⚫ If a facility is experimental, we can disable it.
      ⚫ CONFIG_BPFILTER
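As an illustration of checking one such kernel parameter, the sketch below reads /proc/sys/net/core/bpf_jit_harden and explains its value. The meanings follow the kernel sysctl documentation (0, 1, 2); the function name is hypothetical, and reading the file usually requires root.

```python
from pathlib import Path

JIT_HARDEN_PATH = Path("/proc/sys/net/core/bpf_jit_harden")

# Values as described in the kernel's net sysctl documentation.
MEANINGS = {
    0: "JIT hardening disabled",
    1: "JIT hardening enabled for unprivileged users only",
    2: "JIT hardening enabled for all users",
}


def describe_jit_harden(raw: str) -> str:
    """Translate the raw sysctl content into a human-readable description."""
    value = int(raw.strip())
    return MEANINGS.get(value, f"unknown value {value}")


if __name__ == "__main__":
    try:
        print(describe_jit_harden(JIT_HARDEN_PATH.read_text()))
    except FileNotFoundError:
        print("bpf_jit_harden is not present (is the BPF JIT built in?)")
    except PermissionError:
        print("reading bpf_jit_harden requires elevated privileges")
```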
  11. Observe eBPF utilization in prod (1/3)

    ⚫ bpftool is useful for inspecting your system and its eBPF programs.
    ⚫ If you use systemd, you will already see some eBPF programs.
    ⚫ List eBPF programs attached to tracing facilities
      ⚫ # bpftool perf
    ⚫ List eBPF programs attached to all cgroups
      ⚫ # bpftool cgroup tree

    /sys/fs/cgroup/system.slice/systemd-oomd.service
        13  cgroup_inet_ingress  multi
        12  cgroup_inet_egress   multi
        11  cgroup_device        multi
    /sys/fs/cgroup/system.slice/systemd-resolved.service
        14  cgroup_device        multi
    /sys/fs/cgroup/system.slice/systemd-timesyncd.service
        15  cgroup_device        multi
  12. Observe eBPF utilization in prod (2/3)

    ⚫ Q. What tools are running?

    519: sched_cls  name cil_to_host  tag 2aa6812762b4536b  gpl
        loaded_at 2024-02-14T22:51:05+0900  uid 0
        xlated 352B  jited 194B  memlock 4096B  map_ids 73
        btf_id 222
    525: sched_cls  name tail_handle_ipv4_from_netdev  tag 6a33aa4c8f330faf  gpl
        loaded_at 2024-02-14T22:51:05+0900  uid 0
        xlated 936B  jited 596B  memlock 4096B  map_ids 73,90
        btf_id 229
    528: sched_cls  name cil_from_host  tag ece73a7f3e04c10f  gpl
        loaded_at 2024-02-14T22:51:05+0900  uid 0
        xlated 2016B  jited 1297B  memlock 4096B  map_ids 73,72,90
        btf_id 232
    530: sched_cls  name __send_drop_notify  tag bb5bcebce88430e5  gpl
        loaded_at 2024-02-14T22:51:05+0900  uid 0
        xlated 376B  jited 217B  memlock 4096B  map_ids 70
        btf_id 234
  13. Observe eBPF utilization in prod (3/3)

    ⚫ A. Cilium
    ⚫ Some tools name their BPF programs with a common prefix.
      ✓ Ex. Cilium: cil_
    ⚫ bpftool is genuinely useful, but we need more information about each eBPF program.
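The prefix idea can be automated. Here is a minimal sketch that groups programs by a known name prefix, assuming the input is the JSON array printed by `bpftool -j prog show` with at least `id` and `name` fields per program. Only the `cil_` entry comes from the slides; the function name and the "unknown" bucket are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Prefix -> owning tool. Only the Cilium entry is taken from the slides.
KNOWN_PREFIXES = {"cil_": "Cilium"}


def group_by_owner(bpftool_json: str,
                   prefixes: Dict[str, str] = KNOWN_PREFIXES) -> Dict[str, List[str]]:
    """Group 'id:name' strings by the tool whose prefix matches the program name."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for prog in json.loads(bpftool_json):
        name = prog.get("name", "")
        owner = next(
            (tool for prefix, tool in prefixes.items() if name.startswith(prefix)),
            "unknown",
        )
        owners[owner].append(f"{prog['id']}:{name}")
    return dict(owners)


# Typical use (needs root and bpftool on PATH):
#   import subprocess
#   out = subprocess.run(["bpftool", "-j", "prog", "show"],
#                        capture_output=True, text=True, check=True).stdout
#   print(group_by_owner(out))
```

Programs such as tail_handle_ipv4_from_netdev carry no prefix and land in "unknown", which is exactly why name matching alone is not enough.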
  14. Memory leak in BPFFS

    ⚫ BPFFS: BPF File System
      ⚫ A user space process can pin a BPF program or map in BPFFS.
    ⚫ We ran into the following BPFFS memory leak when we tried OpenTelemetry Auto Instrumentation using eBPF.
      ⚫ Call the Cleanup method of bpffs to remove the bpf fs after instrumen… by RonFed · Pull Request #347 · open-telemetry/opentelemetry-go-instrumentation · GitHub
    ⚫ You can show the pinned paths in BPFFS.
      ✓ # bpftool prog show --bpffs
    ⚫ But how can we detect a BPFFS memory leak in other BPFFS instances?
      ⚫ Ex1. Dedicated BPFFS instance
        ✓ See https://lpc.events/event/11/contributions/933/
      ⚫ Ex2. BPF token will use BPFFS inside each mount namespace.
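A first step toward that question is simply finding every BPFFS instance. The sketch below lists mount points of type `bpf` from mountinfo text; reading `/proc/<pid>/mountinfo` shows the mounts seen by a process in another mount namespace. It only locates instances and does not measure leaked memory; the function names are hypothetical.

```python
from pathlib import Path
from typing import List


def bpffs_mount_points(mountinfo_text: str) -> List[str]:
    """Return mount points whose filesystem type is 'bpf'.

    mountinfo lines look like:
      ID PARENT MAJ:MIN ROOT MOUNTPOINT OPTIONS [optional...] - FSTYPE SOURCE SUPEROPTS
    """
    points: List[str] = []
    for line in mountinfo_text.splitlines():
        before, separator, after = line.partition(" - ")
        after_fields = after.split()
        before_fields = before.split()
        if not separator or not after_fields or len(before_fields) < 5:
            continue  # skip malformed lines
        if after_fields[0] == "bpf":
            points.append(before_fields[4])
    return points


def read_mountinfo(pid: str = "self") -> str:
    """Read the mount table as seen by the given process (default: ourselves)."""
    return Path(f"/proc/{pid}/mountinfo").read_text()


if __name__ == "__main__":
    print(bpffs_mount_points(read_mountinfo()))
```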
  15. The behavior of bpf_send_signal (1/2)

    ⚫ bpf_send_signal(), one of the BPF helper functions, sends signals from kernel space.
    ⚫ This function is used for security observability.
    ⚫ For example, Tetragon tries to kill malicious processes by sending SIGKILL synchronously with bpf_send_signal().

    [Figure: Malicious Process and Kernel, with arrows labeled "Attack" and "SIGKILL"; caption: bpf_send_signal() from your eBPF program]
  16. The behavior of bpf_send_signal (2/2)

    ⚫ Q. When I tried to stop linkat(2) with bpf_send_signal(), the process was killed but the new link file was still created. Why?
    ⚫ A. The kernel checks the pending-signal flag before returning to user space.
      ⚫ Some kernel components check signals with fatal_signal_pending().
        ✓ For example, fatal_signal_pending() is executed when the page cache is written back to storage in generic_perform_write().
      ⚫ But this depends on the kernel-side implementation.
      ⚫ The process is killed only after linkat(2) has completed.
  17. uprobes in a separated mount namespace

    ⚫ Some libbpf-based tools could not register uprobes correctly in a container environment.
      ⚫ libbpf-tools/gethostlatency: Resolve the path of libc for different namespaces by KentaTada · Pull Request #4785 · iovisor/bcc · GitHub
      ⚫ libbpf-tools: support to find symbols in different mount namespace by ethercflow · Pull Request #4854 · iovisor/bcc · GitHub
    ⚫ To register a uprobe, you need
      ⚫ the inode of the target binary file
      ⚫ the offset in the target binary file
    ⚫ Because the path to the target file differs among mount namespaces, a tool that looks it up by path may fail to register the uprobe in the kernel.
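One common way to reach a file as another mount namespace sees it is to go through `/proc/<pid>/root`. The sketch below builds such a path and reads the inode; it is an illustration under that assumption and not necessarily how the pull requests above implement it. The function names are hypothetical, and symlink handling inside the container is not addressed.

```python
import os
from pathlib import PurePosixPath


def path_via_proc_root(pid: int, path_in_container: str) -> str:
    """Translate an absolute in-container path into one reachable from the host.

    Raises ValueError if the given path is not absolute.
    """
    relative = PurePosixPath(path_in_container).relative_to("/")
    return str(PurePosixPath("/proc") / str(pid) / "root" / relative)


def inode_of(path: str) -> int:
    """Return the inode number that identifies the target file."""
    return os.stat(path).st_ino


if __name__ == "__main__":
    # Assumption: the current process stands in for a containerized target.
    candidate = path_via_proc_root(os.getpid(), "/proc/self/exe")
    print(candidate)
```

With the host-visible path in hand, a tool can open the right file, compute symbol offsets from it, and pass that path to the uprobe attach call.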
  18. Other challenges for us

    ⚫ First of all, we want to know what eBPF can do for our use cases.
    ⚫ From the perspective of our systems, we should consider
      ⚫ arm64 support
      ⚫ Security
      ⚫ Deployment
      ⚫ Licensing
      ⚫ Without Kubernetes …
  19. Key takeaways

    ⚫ Deep kernel knowledge is important for detecting and preventing problems.
    ⚫ To integrate eBPF-based technologies into existing systems, we need extensive knowledge of not only kernel space but also user space.
    ⚫ Collaboration among diverse companies is essential to improve eBPF technologies.