PremDay
April 07, 2025

PremDay #2 - Lightning Talk - BPF and low-level errors

Vincent Minet presents an approach using BPF applications to detect and report low-level errors like broken block devices.

Transcript

  1. Monitoring

    Monitoring is hard
    • More data sources = more information
    • In-band and out-of-band monitoring are complementary
    In-band monitoring
    • The kernel is well positioned to know when hardware fails
    • It has a ton of contextual information
    Interfacing
    • But logging on the console has limited usability
    • Is there an API to get this context?
  2. Tracepoints

    Low-level observability
    • Tracepoints are probe points placed at strategic positions in the kernel
    • Log structured information with very low overhead
    • Ftrace API
    Example
    Hardware errors
    • Many kernel error code paths have tracepoints (e.g. block_rq_error)
    • rasdaemon
  3. BPF

    Flexibility
    • Tracepoint arguments are fixed
    • What if you want more context?
    • Do you need to recompile the kernel?
    Observability superpowers
    • Low-overhead in-kernel virtual machine
    • BPF programs can be attached to tracepoints / error functions
    • Can be used to create our own context
    Flight recorder pattern
    • Record context in kernel space
    • Output to user space on error
    • User space exfiltrates to a centralized location for analysis
  4. Overview

    [Architecture diagram; labels: FLR daemon, BPF program, BPF ringbuf, User space, Kernel space, Prometheus, Off device, FAA daemon]
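
The Tracepoints slide names the Ftrace API and the block_rq_error tracepoint. As a minimal sketch of what that interface looks like from user space: tracepoints appear as directories under tracefs, and writing 1 to an event's enable file switches it on. The helper names below are hypothetical, and the /sys/kernel/tracing mount point is an assumption (some systems expose it under /sys/kernel/debug/tracing). Writing to the real tracefs requires root and a kernel that provides block_rq_error.

```python
from pathlib import Path

# Assumed tracefs mount point; adjust for systems that mount it elsewhere.
DEFAULT_TRACEFS = Path("/sys/kernel/tracing")


def tracepoint_enable_path(category, name, tracefs=DEFAULT_TRACEFS):
    """Return the 'enable' control file for the tracepoint category:name."""
    return Path(tracefs) / "events" / category / name / "enable"


def set_tracepoint(category, name, enabled, tracefs=DEFAULT_TRACEFS):
    """Write '1' or '0' to the tracepoint's enable file (hypothetical helper)."""
    path = tracepoint_enable_path(category, name, tracefs)
    path.write_text("1" if enabled else "0")
    return path
```

Called by root as `set_tracepoint("block", "block_rq_error", True)`, this would make matching events show up in the `trace` and `trace_pipe` files under the same mount.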
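
The flight recorder pattern from the BPF slide can be illustrated without any kernel code. What follows is a plain Python simulation, not a BPF program and not the implementation from the talk: a bounded buffer keeps the most recent events, and only an error causes them to be emitted together with the error itself. The class name, event fields, and capacity are all hypothetical.

```python
from collections import deque


class FlightRecorder:
    """Keep the last `capacity` events; emit them only when an error occurs."""

    def __init__(self, capacity=4):
        # A deque with maxlen silently drops the oldest entry when full.
        self._recent = deque(maxlen=capacity)

    def record(self, event):
        """Store routine context cheaply; nothing is emitted here."""
        self._recent.append(event)

    def on_error(self, error):
        """Return the error plus the buffered context, then reset the buffer."""
        report = {"error": error, "context": list(self._recent)}
        self._recent.clear()
        return report


recorder = FlightRecorder(capacity=2)
for sector in (100, 200, 300):
    recorder.record({"op": "read", "sector": sector})

report = recorder.on_error({"dev": "sda", "status": "I/O error"})
print(report["context"])  # the two most recent events: sectors 200 and 300
```

In the design described on the slide, the recording step runs in kernel space and the report is handed to user space, which forwards it to a central location. Here both halves live in one process purely for illustration.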