 Beyond_the_Limits_of_eBPF__A_Journey_Through_OS_Technologies.pdf

Kenta Tada

June 16, 2025

Transcript

  1. Beyond the Limits of eBPF: A Journey Through OS Innovations
     Kenta Tada, CNCJ Organizer, CNCF End User TAB
  2. Introduction
     • Can't read a string with eBPF?
     • A bug? A limitation? Or kernel design?
     • Today, we'll dive into the OS internals behind this mystery.
     • Subject: the bpf_probe_read_user_str helper
  3. What is bpf_probe_read_user_str()?
     • Standard method in eBPF to read strings from user space
     • Safe, but with conditions:
       ◦ The page must be resident in physical memory
  4. bpf_probe_read_user_str() returns -EFAULT
     • Reproduction conditions:
       ◦ Areas mapped by an application but never accessed
       ◦ Pages swapped out
     • Result:
       ◦ bpf_probe_read_user_str() returns -EFAULT
  5. What is Page-Out?
     • Mechanism of virtual memory
       ◦ Not all pages are always in physical memory
     • Unused pages are evicted to disk (swap)
     • A user-space pointer might "exist," but its data may not be physically present
  6. Constraints of Non-Sleepable Context
     • Most BPF hooks (e.g., kprobe, tracepoint) run in non-sleepable contexts
     • Handling a page fault may require sleeping (for example, waiting on disk I/O), so page faults are not allowed there
     • Therefore, bpf_probe_read_user_str() cannot fault in a page, and the read returns -EFAULT
  7. Leveraging mincore() and madvise()
     • mincore()
       ◦ Allows checking whether a given virtual memory page is resident in physical memory
       ◦ In our demo, we use it to visualize "whether the data is here now"
     • madvise()
       ◦ A way to give hints to the kernel about memory usage patterns
       ◦ With MADV_DONTNEED, we can explicitly trigger page eviction (on Linux, private anonymous pages are discarded rather than written to swap)
     • These syscalls enable experimental control over page-in and page-out behavior
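
A minimal sketch of how the two syscalls above could be wrapped, assuming Linux and a private anonymous mapping. The helper names page_is_resident() and drop_page() are hypothetical and are not taken from the talk's demo code.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Rounds addr down to the start of its page (hypothetical helper). */
static void *page_start(const void *addr, size_t page_size)
{
    return (void *)((uintptr_t)addr & ~((uintptr_t)page_size - 1));
}

/* Returns 1 if the page containing addr is resident in physical memory,
 * 0 if it is not, and -1 if mincore() or sysconf() fails. */
static int page_is_resident(const void *addr)
{
    assert(addr != NULL);
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;

    unsigned char vec = 0;
    /* mincore() requires a page-aligned start address and writes
     * one status byte per page into vec. */
    if (mincore(page_start(addr, (size_t)ps), (size_t)ps, &vec) != 0)
        return -1;

    /* Only the least significant bit of each byte is defined. */
    return vec & 1;
}

/* Asks the kernel to drop the page containing addr.
 * Returns 0 on success and -1 on failure (errno is set by madvise()).
 * Assumption: for private anonymous memory the old contents are lost,
 * and the next access sees a zero-filled page. */
static int drop_page(void *addr)
{
    assert(addr != NULL);
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;
    return madvise(page_start(addr, (size_t)ps), (size_t)ps, MADV_DONTNEED);
}
```

Calling page_is_resident() before and after touching a buffer, and again after drop_page(), gives a rough view of when a non-faulting read of that address could be expected to succeed.
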
  8. Demo
     • Use mincore() to confirm page-in status
     • After touching a page to bring it into memory, BPF read succeeds
     • Same address, but outcome changes based on timing
     • Whether data is "there right now" is the deciding factor
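
The user-space side of such a demo might look roughly like the sketch below. It is an illustration under assumptions, not the speaker's actual demo: the function run_residency_demo(), the step names, and the sample string are all hypothetical, and the final MADV_DONTNEED step is added here to show the page leaving memory again.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

enum { BEFORE_TOUCH, AFTER_TOUCH, AFTER_DROP, NUM_STEPS };

/* 1 if the page is resident, 0 if not, -1 if mincore() fails. */
static int resident(void *page_addr, size_t page_size)
{
    unsigned char vec = 0;
    if (mincore(page_addr, page_size, &vec) != 0)
        return -1;
    return vec & 1;
}

/* Records the residency of one page at three points in time.
 * Returns 0 on success, -1 if the page could not be mapped. */
static int run_residency_demo(int states[NUM_STEPS])
{
    assert(states != NULL);
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;
    size_t page_size = (size_t)ps;

    char *buf = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* Step 1: mapped but never accessed. The pointer is valid, yet a
     * non-faulting BPF read of it would be expected to get -EFAULT. */
    states[BEFORE_TOUCH] = resident(buf, page_size);

    /* Step 2: writing the string faults the page in, so the same
     * address should now be readable from BPF. */
    strcpy(buf, "hello from user space");
    states[AFTER_TOUCH] = resident(buf, page_size);

    /* Step 3: drop the page again; the string is discarded with it. */
    if (madvise(buf, page_size, MADV_DONTNEED) == 0)
        states[AFTER_DROP] = resident(buf, page_size);
    else
        states[AFTER_DROP] = -1;

    munmap(buf, page_size);
    return 0;
}
```

On a typical Linux system the three recorded states would be not resident, resident, and not resident again, which mirrors the point that the same address gives different results depending on timing.
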
  9. Summary
     • Behind strange eBPF errors lie kernel-level mechanisms
     • The real lesson is not that a read failed, but why it failed