eBPF Superpowers

Liz Rice
April 30, 2019

As seen at DockerCon SF 2019.

Find the code at https://gist.github.com/lizrice/47ad44a15cce912502f8667a403f5649

Transcript

  1. eBPF

    “Superpowers have finally come to Linux” - Brendan Gregg, Netflix
    “eBPF does to Linux what JavaScript does to HTML”
    @lizrice
  2. man bpf

    The bpf() system call performs a range of operations related to extended Berkeley Packet Filters. Extended BPF (or eBPF) is similar to the original ("classic") BPF (cBPF) used to filter network packets. For both cBPF and eBPF programs, the kernel statically analyzes the programs before loading them, in order to ensure that they cannot harm the running system. eBPF extends cBPF in multiple ways, including the ability to call a fixed set of in-kernel helper functions and access shared data structures such as eBPF maps.
  3. man bpf

    eBPF programs can be written in a restricted C that is compiled (using the clang compiler) into eBPF bytecode. Various features are omitted from this restricted C, such as loops, global variables, variadic functions, floating-point numbers, and passing structures as function arguments.

    Diagram: (limited) C → eBPF bytecode
  4. clang & LLVM

    “The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Despite its name, LLVM has little to do with traditional virtual machines. The name "LLVM" itself is not an acronym; it is the full name of the project.”

    “Clang is an ‘LLVM native’ C/C++/Objective-C compiler, which aims to deliver amazingly fast compiles”

    Source: llvm.org
  5. man bpf

    The kernel contains a just-in-time (JIT) compiler that translates eBPF bytecode into native machine code for better performance.

    Diagram: (limited) C → eBPF bytecode → machine code
  6. bcc

    “BCC makes BPF programs easier to write, with kernel instrumentation in C (and includes a C wrapper around LLVM), and front-ends in Python and lua.” - github.com/iovisor/bcc
  7. bcc

    Diagram labels: bcc, made up of llvm (compiles eBPF program), bpf() (wrapper for bpf() syscalls), and python, lua and C++ (easy coding, language support).
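  8. Code (Python, using bcc):

        #!/usr/bin/python
        from bcc import BPF

        # Restricted-C program that bcc compiles to eBPF bytecode.
        prog = """
        int my_prog(void *ctx) {
            bpf_trace_printk("Hello world\\n");
            return 0;
        }
        """

        b = BPF(text=prog)
        # On newer kernels the symbol may carry an arch prefix;
        # b.get_syscall_fnname("clone") resolves the right name.
        b.attach_kprobe(event="sys_clone", fn_name="my_prog")
        b.trace_print()  # stream the kernel trace pipe to stdout

    Note: use strace to see the system calls.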
  9. Triggering eBPF programs

    eBPF programs can be attached to different events.
    • Kprobes
    • Uprobes
    • Tracepoints
    • Network packets
    • Perf events
    • etc...
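  10. bcc function names

        from bcc import BPF

        # bcc auto-attaches a function named kprobe__<kernel function>
        # as a kprobe, so no explicit attach_kprobe() call is needed.
        b = BPF(text="""
        int kprobe__sys_clone(void *ctx) {
            bpf_trace_printk("Hello, DockerCon!\\n");
            return 0;
        }
        """)
        b.trace_print()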
  11. eBPF maps

    Maps are a generic data structure for storage of different types of data. They allow sharing of data between eBPF kernel programs, and also between kernel and user-space applications.

    Each map type has the following attributes:
    * type
    * maximum number of elements
    * key size in bytes
    * value size in bytes

    Map types:
    BPF_MAP_TYPE_UNSPEC
    BPF_MAP_TYPE_HASH
    BPF_MAP_TYPE_ARRAY
    BPF_MAP_TYPE_PROG_ARRAY
    BPF_MAP_TYPE_PERF_EVENT_ARRAY
    BPF_MAP_TYPE_PERCPU_HASH
    BPF_MAP_TYPE_PERCPU_ARRAY
    BPF_MAP_TYPE_STACK_TRACE
    BPF_MAP_TYPE_CGROUP_ARRAY
    BPF_MAP_TYPE_LRU_HASH
    BPF_MAP_TYPE_LRU_PERCPU_HASH
    BPF_MAP_TYPE_LPM_TRIE
    BPF_MAP_TYPE_ARRAY_OF_MAPS
    BPF_MAP_TYPE_HASH_OF_MAPS
    BPF_MAP_TYPE_DEVMAP
    BPF_MAP_TYPE_SOCKMAP
    BPF_MAP_TYPE_CPUMAP
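
The four map attributes listed above line up with the first four fields of `union bpf_attr` that the kernel reads for `BPF_MAP_CREATE`. As a rough sketch only (pure Python, no syscall is made), the snippet below models those attributes and packs them in that order. `MapType`, `MapSpec` and `pack_create_attr` are hypothetical names for illustration; the sizes used in the example are assumed to mirror bcc's `BPF_HASH` defaults.

```python
import struct
from dataclasses import dataclass
from enum import IntEnum


class MapType(IntEnum):
    """Numeric values follow enum bpf_map_type, in the order listed on the slide."""
    UNSPEC = 0
    HASH = 1
    ARRAY = 2
    PROG_ARRAY = 3
    PERF_EVENT_ARRAY = 4
    PERCPU_HASH = 5
    PERCPU_ARRAY = 6
    STACK_TRACE = 7
    CGROUP_ARRAY = 8
    LRU_HASH = 9
    LRU_PERCPU_HASH = 10
    LPM_TRIE = 11
    ARRAY_OF_MAPS = 12
    HASH_OF_MAPS = 13
    DEVMAP = 14
    SOCKMAP = 15
    CPUMAP = 16


@dataclass(frozen=True)
class MapSpec:
    """The four attributes every map has (hypothetical helper class)."""
    map_type: MapType
    key_size: int      # bytes
    value_size: int    # bytes
    max_entries: int

    def pack_create_attr(self) -> bytes:
        # Four consecutive 32-bit unsigned fields:
        # map_type, key_size, value_size, max_entries.
        return struct.pack(
            "=IIII",
            int(self.map_type),
            self.key_size,
            self.value_size,
            self.max_entries,
        )


# Assumed example values: a hash map with 8-byte keys and values.
spec = MapSpec(MapType.HASH, key_size=8, value_size=8, max_entries=10240)
attr = spec.pack_create_attr()
print(spec.map_type.name, len(attr))
```

A real loader would pass a buffer like this (padded to the full `bpf_attr` size) to the `bpf()` system call; bcc does that work on the caller's behalf.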
  12. Diagram: (limited) C → ELF object file
    ◦ eBPF opcodes
    ◦ eBPF maps

        clang -O2 -emit-llvm -c bpf.c -o - | llc -march=bpf -filetype=obj -o bpf.o
  13. Diagram:

    User space: ELF object file
    ◦ eBPF opcodes
    ◦ eBPF maps
    Kernel: verifier, BPF vm, maps
    Between user space and kernel: bpf() system calls
  14. Diagram:

    User space: ELF object file
    ◦ eBPF opcodes
    ◦ eBPF maps
    Kernel: verifier, BPF vm, maps
    Between user space and kernel: bpf() system calls (BPF_PROG_LOAD, BPF_MAP_CREATE)
  15. Diagram:

    User space: ELF object file
    ◦ eBPF opcodes
    ◦ eBPF maps
    Kernel: verifier, BPF vm, maps
    Between user space and kernel: bpf() system calls (BPF_PROG_LOAD, BPF_MAP_CREATE)
    Attach BPF program to event
  16. Diagram:

    User space: ELF object file
    ◦ eBPF opcodes
    ◦ eBPF maps
    Kernel: verifier, BPF vm, maps
    Between user space and kernel: bpf() system calls (BPF_PROG_LOAD, BPF_MAP_CREATE)
    Attach BPF program to event
    Read / write maps: BPF_MAP_GET_NEXT_KEY, BPF_MAP_LOOKUP_ELEM, BPF_MAP_UPDATE_ELEM, BPF_MAP_DELETE_ELEM
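
To make the read / write step more concrete, here is a minimal pure-Python model of how user space can walk a map with the four element commands named on the slide. It makes no system calls: `ToyMap` and `dump` are hypothetical stand-ins, and the behaviour (a missing key restarts iteration at the first key, the end of the map is reported as "no next key") is a simplified assumption based on the bpf(2) description.

```python
from typing import Dict, Iterator, Optional, Tuple


class ToyMap:
    """In-memory stand-in for an eBPF hash map (illustration only)."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self._data: Dict[int, int] = {}

    def lookup_elem(self, key: int) -> Optional[int]:
        # Models BPF_MAP_LOOKUP_ELEM: None stands for "not found".
        return self._data.get(key)

    def update_elem(self, key: int, value: int) -> None:
        # Models BPF_MAP_UPDATE_ELEM: create or overwrite, bounded by max_entries.
        if key not in self._data and len(self._data) >= self.max_entries:
            raise OSError("map is full")
        self._data[key] = value

    def delete_elem(self, key: int) -> None:
        # Models BPF_MAP_DELETE_ELEM: raises KeyError if the key is absent.
        del self._data[key]

    def get_next_key(self, key: Optional[int]) -> Optional[int]:
        # Models BPF_MAP_GET_NEXT_KEY: an unknown key yields the first key,
        # the last key yields None (the kernel reports an error instead).
        keys = list(self._data)
        if key is None or key not in self._data:
            return keys[0] if keys else None
        idx = keys.index(key) + 1
        return keys[idx] if idx < len(keys) else None


def dump(m: ToyMap) -> Iterator[Tuple[int, int]]:
    """Walk the whole map the way a user-space reader would."""
    key = m.get_next_key(None)
    while key is not None:
        value = m.lookup_elem(key)
        if value is not None:
            yield key, value
        key = m.get_next_key(key)


m = ToyMap(max_entries=4)
m.update_elem(56, 1)
m.update_elem(57, 3)
print(list(dump(m)))
```

Libraries such as bcc wrap this loop, which is why a Python front-end can simply iterate over a map's items.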
  17. Code (Python, using bcc):

        #!/usr/bin/python
        from bcc import BPF
        from time import sleep

        program = """
        BPF_HASH(syscalls);

        int hello(void *ctx) {
            u64 counter = 0;
            u64 key = 56;
            u64 *p;

            // Read the current count, if this key has been seen before.
            p = syscalls.lookup(&key);
            if (p != 0) {
                counter = *p;
            }
            counter++;
            syscalls.update(&key, &counter);
            return 0;
        }
        """

        b = BPF(text=program)
        b.attach_kprobe(event="sys_clone", fn_name="hello")

        # Poll the map from user space every 3 seconds.
        while True:
            sleep(3)
            for k, v in b["syscalls"].items():
                # Keys and values come back as ctypes integers.
                print(k.value, v.value)
  18. eBPF helper functions

    These helpers are used by eBPF programs to interact with the system, or with the context in which they work. For instance, they can be used to print debugging messages, to get the time since the system was booted, to interact with eBPF maps, or to manipulate network packets.

    bpf_trace_printk()
    bpf_map_*_elem()
    bpf_get_current_pid_tgid()
    ...

    github.com/iovisor/bpf-docs/blob/master/bpf_helpers.rst
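
As a small illustration of consuming a helper's result, `bpf_get_current_pid_tgid()` returns a single 64-bit value with the thread group ID (what user space calls the process ID) in the upper 32 bits and the kernel's per-thread PID in the lower 32 bits. The sketch below shows the split in plain Python; `split_pid_tgid` and the sample value are hypothetical.

```python
from typing import Tuple


def split_pid_tgid(pid_tgid: int) -> Tuple[int, int]:
    """Split the 64-bit helper result into (tgid, pid)."""
    tgid = pid_tgid >> 32         # upper 32 bits: thread group (process) ID
    pid = pid_tgid & 0xFFFFFFFF   # lower 32 bits: thread ID
    return tgid, pid


# Made-up sample: process 1234, thread 1240.
sample = (1234 << 32) | 1240
print(split_pid_tgid(sample))
```

Inside an eBPF program the same split is done in C with a shift and a mask before the value is used as a map key.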
  19. Verifier

    Each eBPF program is a set of instructions that is safe to run until its completion. An in-kernel verifier statically determines that the eBPF program terminates and is safe to execute.
    • No loops
    • No bad pointer dereferences
    • Restricted program size
    • Always exits

    Note: see what happens if you try to dereference a pointer without checking that it’s not NULL.
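
The following toy checker gives a feel for three of the rules above (no loops, restricted size, always exits) on a made-up instruction format. It is a sketch under heavy assumptions, not how the kernel verifier works: the real verifier simulates every path and tracks register and pointer state, which this ignores. `toy_verify` and the `(opcode, offset)` tuples are hypothetical, and `MAX_INSNS = 4096` reflects the per-program limit in kernels of that period.

```python
from typing import List, Tuple

MAX_INSNS = 4096  # assumed size limit for this sketch

# (opcode, jump offset relative to the following instruction)
Insn = Tuple[str, int]


def toy_verify(prog: List[Insn]) -> List[str]:
    """Return a list of rule violations; an empty list means 'accepted'."""
    if not prog:
        return ["empty program"]
    errors: List[str] = []
    if len(prog) > MAX_INSNS:
        errors.append("program too large")
    if prog[-1][0] != "exit":
        errors.append("last instruction is not exit")
    for i, (op, off) in enumerate(prog):
        if op != "jmp":
            continue
        target = i + 1 + off
        if off < 0:
            # A backward jump could form a loop, so reject it outright.
            errors.append(f"insn {i}: back-edge (loop)")
        elif target >= len(prog):
            errors.append(f"insn {i}: jump out of range")
    return errors


ok_prog = [("mov", 0), ("jmp", 1), ("mov", 0), ("exit", 0)]
loop_prog = [("mov", 0), ("jmp", -2), ("exit", 0)]
print(toy_verify(ok_prog))
print(toy_verify(loop_prog))
```

Rejecting every backward jump is the crudest way to guarantee termination; it is used here only because it keeps the example short.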
  20. “The eBPF validator’s muse is a fickle miscreant with a very short attention span” - Jeff Dileo & Andy Olsen, NCC Group
  21. What an eBPF program can’t do

    A packet filter can drop packets, but you can’t drop / fail a function call.
  22. seccomp-bpf

    Blacklisting / whitelisting system calls, e.g. Docker’s default seccomp profile.

    Uses (classic) BPF:
    • Can’t dereference syscall arguments
    • No eBPF maps to communicate with userspace
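
For a sense of what whitelisting looks like in practice, the sketch below builds a tiny profile in the JSON layout Docker's seccomp profiles use (`defaultAction` plus a list of `syscalls` rules) and evaluates it in plain Python. `whitelist_profile` and `is_allowed` are hypothetical helpers, and the three-syscall list is far too small for a real container; it is there only to show the shape.

```python
import json
from typing import Any, Dict, Iterable


def whitelist_profile(allowed: Iterable[str]) -> Dict[str, Any]:
    """Deny everything by default, then allow the named syscalls."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [
            {"names": sorted(set(allowed)), "action": "SCMP_ACT_ALLOW"},
        ],
    }


def is_allowed(profile: Dict[str, Any], syscall: str) -> bool:
    """Decide by syscall name only, mirroring the limits on the slide:
    arguments behind pointers cannot be inspected."""
    for rule in profile["syscalls"]:
        if syscall in rule["names"]:
            return rule["action"] == "SCMP_ACT_ALLOW"
    return profile["defaultAction"] == "SCMP_ACT_ALLOW"


profile = whitelist_profile(["read", "write", "exit_group"])
print(json.dumps(profile, indent=2))
print(is_allowed(profile, "read"), is_allowed(profile, "clone"))
```

A profile written to a file this way can be handed to Docker with `--security-opt seccomp=<file>`.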
  23. Landlock

    In-development Linux Security Module. Like seccomp, but using eBPF.
    • Unprivileged process can set up its own sandbox (~ seccomp rules++)
    • Configure on the fly using eBPF maps
    • Cgroup aware
    • Access to kernel objects, so eBPF code can make more granular decisions

    landlock.io
  24. A few references

    • IO Visor Project - iovisor.org
    • Brendan Gregg - brendangregg.com
    • O’Reilly book “Linux Observability with BPF” - David Calavera and Lorenzo Fontana