Slide 2

Slide 2 text

Kenta Tada, CNCJ Organizer, CNCF End User TAB
Beyond the Limits of eBPF
A Journey Through OS Innovations

Slide 3

Slide 3 text

Introduction
● Can't read a string with eBPF?
● A bug? A limitation? Or kernel design?
● Today, we'll dive into the OS internals behind this mystery.
bpf_probe_read_user_str helper

Slide 4

Slide 4 text

What is bpf_probe_read_user_str()?
● The standard eBPF helper for reading NUL-terminated strings from user space
● Safe, but with a condition:
  ○ The page must be resident in physical memory

Slide 5

Slide 5 text

bpf_probe_read_user_str() returns -EFAULT
● Reproduction conditions:
  ○ Areas mapped by an application but never accessed
  ○ Pages swapped out
● Result:
  ○ bpf_probe_read_user_str() returns -EFAULT
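The first reproduction condition (mapped but never accessed) can be observed from user space alone. The following is a minimal sketch, assuming Linux with glibc; `page_is_resident()` and `map_untouched_page()` are hypothetical helper names introduced only for this illustration and are not part of the talk's demo code.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes mincore() and MAP_ANONYMOUS in glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the page containing addr is resident
 * in physical memory, 0 if it is not, and -1 if mincore() fails. */
static int page_is_resident(const void *addr)
{
    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);

    /* mincore() requires a page-aligned start address. */
    uintptr_t start = (uintptr_t)addr & ~((uintptr_t)page_size - 1);
    unsigned char vec = 0;

    if (mincore((void *)start, (size_t)page_size, &vec) != 0)
        return -1;
    return vec & 1; /* only the least significant bit is defined */
}

/* Hypothetical helper: map one anonymous page and do NOT touch it.
 * The mapping is valid, but no physical page backs it yet. */
static char *map_untouched_page(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (char *)p;
}
```

On a typical Linux kernel, `page_is_resident()` is expected to report 0 for the pointer returned by `map_untouched_page()`, and 1 once the program has written a byte to that page. A BPF read of the same address would be expected to fail in the first state and succeed in the second.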

Slide 6

Slide 6 text

What is Page-Out?
● Mechanism of virtual memory
  ○ Not all pages are always in physical memory
● Unused pages are evicted to disk (swap)
● A user-space pointer might "exist," but its data may not be physically present
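To make "not all pages are always in physical memory" concrete, here is a small sketch that counts how many pages of a mapping are resident. It assumes Linux with glibc; `count_resident_pages()` and `resident_after_touching()` are hypothetical names used only for this illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes mincore() and MAP_ANONYMOUS in glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: number of resident pages in [addr, addr + len),
 * or -1 on error. addr must be page-aligned. */
static long count_resident_pages(void *addr, size_t len)
{
    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);

    /* mincore() fills one status byte per page in the range. */
    size_t npages = (len + (size_t)page_size - 1) / (size_t)page_size;
    unsigned char *vec = malloc(npages);
    long resident = 0;

    if (vec == NULL)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    for (size_t i = 0; i < npages; i++)
        resident += vec[i] & 1;
    free(vec);
    return resident;
}

/* Hypothetical demo: map npages anonymous pages, write to the first
 * `touched` of them, and report how many pages are resident. */
static long resident_after_touching(size_t npages, size_t touched)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page_size;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    for (size_t i = 0; i < touched && i < npages; i++)
        p[i * page_size] = 1; /* the first write faults the page in */

    long resident = count_resident_pages(p, len);
    munmap(p, len);
    return resident;
}
```

For example, `resident_after_touching(8, 3)` would typically report 3 on Linux, although the kernel is free to keep more pages resident than the program touched. Every pointer into the 8-page range is valid, yet only some of the data is physically present.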

Slide 7

Slide 7 text

Constraints of Non-Sleepable Context
● Most BPF hooks (e.g., kprobe, tracepoint) run in non-sleepable contexts
● Page faults (which require sleeping) are not allowed
● Therefore, bpf_probe_read_user_str() cannot fault in a page, resulting in -EFAULT

Slide 8

Slide 8 text

Leveraging mincore() and madvise()
● mincore()
  ○ Checks whether a given virtual memory page is resident in physical memory
  ○ In our demo, we use it to visualize “whether the data is here now”
● madvise()
  ○ A way to give the kernel hints about memory usage patterns
  ○ With MADV_DONTNEED, we can explicitly ask the kernel to drop pages from physical memory
● These syscalls enable experimental control over page-in and page-out behavior
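The combination of the two syscalls can be sketched as follows. This is an illustrative sketch assuming Linux with glibc, not the code from the demo; `struct residency_trace`, `resident()`, and `trace_dontneed()` are hypothetical names. Note that for private anonymous memory, MADV_DONTNEED discards the page contents instead of writing them to swap, so it stands in for eviction here rather than reproducing a real swap-out.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes mincore(), madvise(), MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical record of what mincore() reported at each step. */
struct residency_trace {
    int after_touch;                /* after writing a string to the page */
    int after_dontneed;             /* after madvise(MADV_DONTNEED) */
    int after_refault;              /* after reading the page again */
    char first_byte_after_dontneed; /* contents seen by that read */
};

/* Returns 1 if the page-aligned page is resident, 0 if not, -1 on error. */
static int resident(void *page, size_t page_size)
{
    unsigned char vec = 0;
    if (mincore(page, page_size, &vec) != 0)
        return -1;
    return vec & 1;
}

/* Touch a page, drop it with MADV_DONTNEED, then touch it again,
 * recording residency at each step. Returns 0 on success, -1 on error. */
static int trace_dontneed(struct residency_trace *out)
{
    assert(out != NULL);
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    strcpy(p, "hello"); /* the first write faults the page in */
    out->after_touch = resident(p, page_size);

    if (madvise(p, page_size, MADV_DONTNEED) != 0) {
        munmap(p, page_size);
        return -1;
    }
    out->after_dontneed = resident(p, page_size);

    /* Reading faults a zero-filled page back in at the same address. */
    out->first_byte_after_dontneed = p[0];
    out->after_refault = resident(p, page_size);

    munmap(p, page_size);
    return 0;
}
```

On a typical Linux kernel the trace is expected to read resident, not resident, resident, which is the window in which a non-sleepable BPF read of the same address would be expected to fail. The string itself is gone after the re-fault, which is the difference from a genuine swap-out.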

Slide 9

Slide 9 text

Demo
● Use mincore() to confirm page-in status
● After touching a page to bring it into memory, BPF read succeeds
● Same address, but outcome changes based on timing
● Whether data is "there right now" is the deciding factor

Slide 10

Slide 10 text

Summary
● Behind strange eBPF errors lie kernel-level mechanisms
● The real lesson is not that a read failed, but why it failed