Go eBPF superpowers

GoLab 2019 - Florence, Italy.

It has been three years since eBPF was described as "Superpowers for Linux".
Since then, eBPF has evolved a lot, and its ecosystem has grown so much that many tools, libraries, and frameworks for working with eBPF have been created, Go libraries included.
This talk first explains what eBPF is (basically, a way to program the kernel without risking breaking it and without having to recompile it), then shows how to use eBPF in Go: with Go, or against Go programs.
The gobpf library from the IOVisor project (a Linux Foundation project), for example, provides low-level routines to load and use eBPF programs from ELF files, and also provides bindings to BCC (BPF Compiler Collection), another framework from the IOVisor organization that simplifies working with eBPF.

Leonardo Di Donato

October 21, 2019

Transcript

  1. Go eBPF superpowers
    Leonardo Di Donato, Open Source Software Engineer @ Sysdig
    2019.10.21 - GoLab - Florence, Italy
  2. whoami
    Leonardo Di Donato. Maintainer of Falco. Creator of kubectl-trace, kube-bpf, kubectl-dig, and go-syslog.
    Reach out to me @leodido on Twitter & GitHub.
  3. In the beginning ...
    1. 1992 - The BSD Packet Filter: A New Architecture for User-level Packet Capture (S. McCanne & V. Jacobson)
       a. A VM designed for register-based CPUs (accumulators)
       b. 20x faster than the state of the art at the time
    2. 1997 - Ported to Linux
    3. Jan. 2014 - Alexei Starovoitov extended the BPF implementation
       a. 10 64-bit (general-purpose) registers + 1 stack register
       b. 512-byte stack
       c. 4x faster than the previous implementation
       d. Still restricted to kernel space
    4. Jun. 2014 - Exposed to user space
       a. A top-level kernel subsystem
       b. No longer limited to the networking stack
       c. Emphasis on safety and security
  4. «eBPF does to Linux what JavaScript does to HTML[1]» @leodido

    [1]: http://www.brendangregg.com/blog/2019-01-01/learn-ebpf-tracing.html
  5. eBPF ~= V8
    Coding directly against these two is incredibly hard. Wanna try?
    Disclaimer: simpler-to-use frameworks exist!
    iovisor/gobpf - dropbox/goebpf - iovisor/bpftrace - iovisor/bcc
  6. eBPF
    To summarize: run code safely in the kernel without having to write a kernel module.
  7. What eBPF is
    A core part of the Linux kernel. Even more amazing than cBPF: "extended" because it's not just packets anymore.
    You can now write mini programs that run on events, like disk I/O, inside a safe, register-based VM in the kernel that uses a custom 64-bit RISC instruction set.
    The in-kernel verifier refuses to load eBPF programs with:
    • invalid or bad pointer dereferences
    • a call stack exceeding the maximum
    • loops without an upper bound
    eBPF imposes a stable Application Binary Interface (ABI).
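The "custom 64-bit RISC instruction set" uses fixed-size 8-byte instructions. The Go sketch below mirrors the layout of `struct bpf_insn` from the kernel UAPI header `<linux/bpf.h>`, assuming a little-endian host; `insn` and `encode` are illustrative names, not a library API. The sample opcodes are the ones that also show up in the verifier log of slide 24.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// insn mirrors struct bpf_insn from <linux/bpf.h>: an 8-bit opcode, two
// 4-bit register numbers, a signed 16-bit offset and a signed 32-bit
// immediate, 8 bytes in total.
type insn struct {
	Code uint8
	Dst  uint8 // destination register r0..r10
	Src  uint8 // source register r0..r10
	Off  int16
	Imm  int32
}

// encode lays an instruction out as on a little-endian host, where dst_reg
// occupies the low nibble of byte 1 and src_reg the high nibble.
func encode(i insn) [8]byte {
	var b [8]byte
	b[0] = i.Code
	b[1] = (i.Src&0x0f)<<4 | i.Dst&0x0f
	binary.LittleEndian.PutUint16(b[2:4], uint16(i.Off))
	binary.LittleEndian.PutUint32(b[4:8], uint32(i.Imm))
	return b
}

func main() {
	prog := []insn{
		{Code: 0xbf, Dst: 6, Src: 1},  // r6 = r1
		{Code: 0x07, Dst: 6, Imm: -4}, // r6 += -4
		{Code: 0x85, Imm: 1},          // call helper #1 (bpf_map_lookup_elem)
		{Code: 0x95},                  // exit
	}
	for _, i := range prog {
		b := encode(i)
		fmt.Printf("% x\n", b[:])
	}
}
```

The first instruction encodes as `bf 16 00 00 00 00 00 00`: opcode 0xbf, then destination r6 in the low nibble and source r1 in the high nibble of the second byte.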
  8. How does eBPF work?
    [Diagram] In user space, BPF source is compiled into a BPF ELF file and loaded through the bpf() syscall (BPF_PROG_LOAD, BPF_MAP_CREATE). In the kernel, the verifier checks the eBPF opcodes, which are then attached to hooks: kprobes, uprobes, static tracepoints, perf events, XDP (net driver), socket filters, cgroups, and TC (traffic control), covering networking and tracing/monitoring. eBPF maps carry data between the kernel and user space.
    Program types shown: BPF_PROG_TYPE_SOCKET_FILTER, BPF_PROG_TYPE_KPROBE, BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_RAW_TRACEPOINT, BPF_PROG_TYPE_XDP, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_CGROUP_SKB, BPF_PROG_TYPE_CGROUP_SOCK, BPF_PROG_TYPE_SOCK_OPS, BPF_PROG_TYPE_SK_SKB, BPF_PROG_TYPE_SK_MSG, BPF_PROG_TYPE_SCHED_CLS, BPF_PROG_TYPE_SCHED_ACT.
    See bit.ly/bpf_prog_types, man 2 bpf, man 8 tc-bpf.
  9. Program types (enum bpf_prog_type)
    BPF_PROG_TYPE_SOCKET_FILTER, // Packet filtering
    BPF_PROG_TYPE_KPROBE, // Tracing (any function)
    BPF_PROG_TYPE_SCHED_CLS, // Packet filtering (TC)
    BPF_PROG_TYPE_SCHED_ACT, // Packet filtering (TC)
    BPF_PROG_TYPE_TRACEPOINT, // Tracing (stable tracepoints)
    BPF_PROG_TYPE_XDP, // Packet filtering (driver level)
    BPF_PROG_TYPE_PERF_EVENT, // Tracing (Performance Monitoring Unit events)
    BPF_PROG_TYPE_CGROUP_SKB, // Access control (IP ingress/egress)
    BPF_PROG_TYPE_CGROUP_SOCK, // Access control (socket creation/ops/…)
    BPF_PROG_TYPE_LWT_IN, // Network tunnels
    BPF_PROG_TYPE_LWT_OUT, // Network tunnels
    BPF_PROG_TYPE_LWT_XMIT, // Network tunnels
    BPF_PROG_TYPE_SOCK_OPS, // Update socket options
    BPF_PROG_TYPE_SK_SKB, // Socket redirection
    BPF_PROG_TYPE_CGROUP_DEVICE, // Access control (device)
    BPF_PROG_TYPE_SK_MSG, // Data stream filtering
    BPF_PROG_TYPE_RAW_TRACEPOINT, // Tracing
    BPF_PROG_TYPE_CGROUP_SOCK_ADDR, // Access control (socket binding)
    BPF_PROG_TYPE_LWT_SEG6LOCAL, // Network tunnels
    BPF_PROG_TYPE_LIRC_MODE2, // Infra-red remote control protocols
    BPF_PROG_TYPE_SK_REUSEPORT, // Select socket to use
    BPF_PROG_TYPE_FLOW_DISSECTOR, // Network processing
    BPF_PROG_TYPE_CGROUP_SYSCTL, // Access control (procfs)
    BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, // Tracing
    ... // ...
  10. Wait, maps?
    An (async) in-kernel key-value store: a generic data structure for storing different types of data.
    Maps share state between eBPF kernel programs, but especially between the kernel and user-space applications.
    Each map has:
    • a type
    • a max number of elements
    • a key size (bytes)
    • a value size (bytes)
    Types (bit.ly/bpf_map_types):
    BPF_MAP_TYPE_HASH // Hash map
    BPF_MAP_TYPE_ARRAY // Array
    BPF_MAP_TYPE_PROG_ARRAY // BPF tail calls
    BPF_MAP_TYPE_PERF_EVENT_ARRAY // Stream info
    BPF_MAP_TYPE_PERCPU_HASH // Per-CPU hash map
    BPF_MAP_TYPE_PERCPU_ARRAY // Per-CPU array
    BPF_MAP_TYPE_STACK_TRACE // Stack info for tracing
    BPF_MAP_TYPE_CGROUP_ARRAY // Store refs to cgroups
    BPF_MAP_TYPE_LRU_HASH // Least recently used cache
    BPF_MAP_TYPE_LRU_PERCPU_HASH // Per-CPU LRU cache
    BPF_MAP_TYPE_LPM_TRIE // Longest prefix match
    BPF_MAP_TYPE_ARRAY_OF_MAPS // Array of eBPF maps
    BPF_MAP_TYPE_HASH_OF_MAPS // Hash map of eBPF maps
    BPF_MAP_TYPE_DEVMAP // Redirect packet to device
    BPF_MAP_TYPE_SOCKMAP // Redirect packet to socket
    BPF_MAP_TYPE_CPUMAP // Redirect packet to CPU
    BPF_MAP_TYPE_XSKMAP // Redirect packet to AF_XDP socket
    BPF_MAP_TYPE_SOCKHASH // Redirect packet to socket
    BPF_MAP_TYPE_CGROUP_STORAGE // Store data per cgroup
    BPF_MAP_TYPE_REUSEPORT_SOCKARRAY // Socket for packet
    BPF_MAP_TYPE_QUEUE // FIFO
    BPF_MAP_TYPE_STACK // LIFO
    BPF_MAP_TYPE_SK_STORAGE // Store data per socket
    … …
    Operations (bit.ly/bpf_map_commands): BPF_MAP_CREATE, BPF_MAP_LOOKUP_ELEM, BPF_MAP_UPDATE_ELEM, BPF_MAP_DELETE_ELEM, BPF_MAP_GET_NEXT_KEY
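The four attributes above and the map commands are easier to grasp with a model. The Go sketch below is a toy, user-space stand-in for a BPF_MAP_TYPE_HASH: the flag values match BPF_ANY, BPF_NOEXIST and BPF_EXIST, and `nextKey` follows the documented BPF_MAP_GET_NEXT_KEY behaviour, but the `hashMap` type, its method names, and the error strings are hypothetical, and nothing here talks to the kernel.

```go
package main

import (
	"errors"
	"fmt"
)

// Update flags; the values match BPF_ANY, BPF_NOEXIST and BPF_EXIST in <linux/bpf.h>.
const (
	flagAny     = 0 // create a new element or update an existing one
	flagNoExist = 1 // create a new element only if it does not exist yet
	flagExist   = 2 // update an existing element only
)

// hashMap is a toy, user-space model of a BPF hash map: the key size, value
// size and maximum number of entries are fixed when the map is created.
type hashMap struct {
	keySize, valueSize, maxEntries int
	data                           map[string][]byte
	order                          []string // insertion order; the kernel's order is unspecified
}

func newHashMap(keySize, valueSize, maxEntries int) *hashMap {
	return &hashMap{keySize: keySize, valueSize: valueSize, maxEntries: maxEntries, data: map[string][]byte{}}
}

// update models BPF_MAP_UPDATE_ELEM.
func (m *hashMap) update(key, value []byte, flags int) error {
	if len(key) != m.keySize || len(value) != m.valueSize {
		return errors.New("invalid key or value size")
	}
	_, exists := m.data[string(key)]
	switch {
	case flags == flagNoExist && exists:
		return errors.New("element already exists")
	case flags == flagExist && !exists:
		return errors.New("element not found")
	case !exists && len(m.data) >= m.maxEntries:
		return errors.New("map is full") // the kernel returns E2BIG here
	}
	if !exists {
		m.order = append(m.order, string(key))
	}
	m.data[string(key)] = append([]byte(nil), value...) // store a copy
	return nil
}

// lookup models BPF_MAP_LOOKUP_ELEM.
func (m *hashMap) lookup(key []byte) ([]byte, bool) {
	v, ok := m.data[string(key)]
	return v, ok
}

// remove models BPF_MAP_DELETE_ELEM.
func (m *hashMap) remove(key []byte) bool {
	k := string(key)
	if _, ok := m.data[k]; !ok {
		return false
	}
	delete(m.data, k)
	for i, o := range m.order {
		if o == k {
			m.order = append(m.order[:i], m.order[i+1:]...)
			break
		}
	}
	return true
}

// nextKey models BPF_MAP_GET_NEXT_KEY: a nil or unknown key yields the first
// key, and the last key yields false (ENOENT in the kernel).
func (m *hashMap) nextKey(key []byte) ([]byte, bool) {
	if key != nil {
		for i, o := range m.order {
			if o == string(key) {
				if i+1 < len(m.order) {
					return []byte(m.order[i+1]), true
				}
				return nil, false
			}
		}
	}
	if len(m.order) == 0 {
		return nil, false
	}
	return []byte(m.order[0]), true
}

func main() {
	m := newHashMap(4, 8, 2) // 4-byte keys (e.g. a protocol number), 8-byte counters, at most 2 entries
	for _, proto := range []byte{6, 17, 1} {
		if err := m.update([]byte{proto, 0, 0, 0}, make([]byte, 8), flagAny); err != nil {
			fmt.Printf("update key %d: %v\n", proto, err)
		}
	}
	for k, ok := m.nextKey(nil); ok; k, ok = m.nextKey(k) {
		fmt.Println("key:", k)
	}
}
```

The third update fails because the map was declared with room for two entries: like the real thing, capacity is part of the map definition, not something that grows on demand.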
  11. eBPF helper functions
    A "standard library" (tools/testing/selftests/bpf/bpf_helpers.h): ~100 functions and counting ✌
    • Print debugging messages
    • Interact with eBPF maps
    • Find out about the current context
    • Macros (SEC, …)
    • ...
    • bpf_trace_printk() -> /sys/kernel/debug/tracing/trace_pipe (bit.ly/bpf_print_helper)
    • bpf_map_{lookup,delete,update,push,pop,peek}_elem() (bit.ly/bpf_map_helpers)
    • bpf_get_current_{pid_tgid,uid_gid,cgroup_id,task}() (bit.ly/bpf_context_helpers)
    • SEC("...") (bit.ly/bpf_sec_helper)
    • ...
    Warning: bpf_helpers.h is not distributed with the kernel headers; copy it from your distro's Linux source package!
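Two of the context helpers pack a pair of 32-bit values into a single u64. A Go sketch of the unpacking, following the kernel's documented layout (`bpf_get_current_pid_tgid()` returns tgid << 32 | pid, `bpf_get_current_uid_gid()` returns gid << 32 | uid); the function names are illustrative. Assuming a program prints both halves, this split would explain why a trace line can report a process ID that differs from its thread ID.

```go
package main

import "fmt"

// splitPidTgid unpacks bpf_get_current_pid_tgid(): the upper 32 bits hold
// the tgid (what user space calls the PID), the lower 32 bits hold the
// kernel pid (the thread ID).
func splitPidTgid(v uint64) (tgid, tid uint32) {
	return uint32(v >> 32), uint32(v)
}

// splitUidGid unpacks bpf_get_current_uid_gid(): gid in the upper 32 bits,
// uid in the lower 32 bits.
func splitUidGid(v uint64) (uid, gid uint32) {
	return uint32(v), uint32(v >> 32)
}

func main() {
	tgid, tid := splitPidTgid(uint64(2101)<<32 | 31676)
	uid, gid := splitUidGid(uint64(1000)<<32 | 1000) // gid 1000 is an assumed value
	fmt.Println(tgid, tid, uid, gid)                  // 2101 31676 1000 1000
}
```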
  12. Write eBPF!
    1. Write some (restricted ➡ safe) C:
       a. max 4096 instructions (up to 1 million for root)
       b. no unbounded loops
       c. no global variables
       d. no variadic functions
       e. no passing structs as function arguments
       f. no out-of-range jumps
       g. no unreachable code
       h. no reads of uninitialised registers/memory
       i. no out-of-bounds/random memory access
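Some of these restrictions are purely structural and can be checked without running anything. The Go sketch below is a toy checker, not the kernel verifier (which also tracks register types and value ranges): it enforces the 4096-instruction limit (BPF_MAXINSNS), rejects out-of-range jumps, and conservatively rejects every backward jump to rule out loops. The `jump` type is a hypothetical simplification of a decoded instruction.

```go
package main

import "fmt"

// maxInsns is BPF_MAXINSNS from <linux/bpf_common.h>.
const maxInsns = 4096

// jump is a minimal stand-in for a decoded instruction: only whether it is a
// jump and its 16-bit offset matter here. An eBPF jump lands on pc + off + 1.
type jump struct {
	isJump bool
	off    int16
}

// toyCheck is NOT the kernel verifier. It enforces three of the slide's rules
// in the crudest way: a size limit, no out-of-range jumps, and no backward
// jumps at all (a conservative way to forbid unbounded loops; kernels before
// 5.3 rejected every back-edge in the control-flow graph).
func toyCheck(prog []jump) error {
	if len(prog) > maxInsns {
		return fmt.Errorf("program too large: %d > %d instructions", len(prog), maxInsns)
	}
	for pc, in := range prog {
		if !in.isJump {
			continue
		}
		target := pc + int(in.off) + 1
		switch {
		case target < 0 || target >= len(prog):
			return fmt.Errorf("insn %d: jump out of range to %d", pc, target)
		case target <= pc:
			return fmt.Errorf("insn %d: back-edge to insn %d", pc, target)
		}
	}
	return nil
}

func main() {
	loop := []jump{{}, {isJump: true, off: -2}, {}} // insn 1 jumps back to insn 0
	fmt.Println(toyCheck(loop))
}
```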
  13. eBPF life
    (restricted) C ➡ eBPF bytecode ➡ machine code
    1. Compile with clang to eBPF bytecode
       a. A standard ELF file containing the BPF opcodes and the BPF maps
    2. Load it with the bpf() syscall (or with a framework)
       a. This gives a file descriptor for the program
    3. Attach the program to a hook/event using that file descriptor
    4. The kernel JIT-compiles it into native machine code for performance
       a. ARM{32,64}, MIPS, RISC-V, SPARC64, s390, x86_{32,64}
    5. It is automatically removed when it is detached / its file descriptor is closed
       a. Pin the program to the /sys/fs/bpf virtual file system to keep it loaded
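Pinning (step 5a) only works where the BPF virtual file system is mounted. A small Go sketch, assuming the standard `/proc/mounts` format, that lists bpffs mount points; `bpffsMounts` is an illustrative name, and escaped characters in mount paths (such as `\040` for spaces) are not handled.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// bpffsMounts returns the mount points whose filesystem type is "bpf",
// given text in the /proc/mounts format:
// "<device> <mountpoint> <fstype> <options> <dump> <pass>".
func bpffsMounts(mounts string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(mounts))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 3 && fields[2] == "bpf" {
			out = append(out, fields[1])
		}
	}
	return out
}

func main() {
	data, err := os.ReadFile("/proc/mounts")
	if err != nil {
		fmt.Println("cannot read /proc/mounts:", err)
		return
	}
	if mps := bpffsMounts(string(data)); len(mps) > 0 {
		fmt.Println("BPF filesystem mounted at:", mps)
	} else {
		fmt.Println("no BPF filesystem found; pinning needs e.g. `mount -t bpf bpf /sys/fs/bpf`")
	}
}
```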
  14. What eBPF can do
    1. Examine the arguments of a function
    2. Examine its context
       a. PID b. parent c. UID d. stack e. etc.
    3. Examine the function's return value ({u,k}retprobe)
    4. Collect statistics
    5. Aggregate and process all of these
    6. Modify the behaviour of the function
    7. Modify the content of function variables
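Collecting and aggregating statistics in the kernel usually means keeping a few counters per key instead of shipping every event to user space. As an illustration, the Go sketch below buckets values by powers of two, similar in spirit to the log2 histograms produced by bpftrace's `hist()`; it runs in user space, and the function names and sample values are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// log2Bucket places v in a power-of-two bucket: 0 -> 0, 1 -> 1, 2..3 -> 2,
// 4..7 -> 3, and so on.
func log2Bucket(v uint64) int {
	return bits.Len64(v)
}

// histogram aggregates observations into log2 buckets: the kind of compact
// summary an eBPF program would keep in a map.
func histogram(values []uint64) map[int]uint64 {
	h := make(map[int]uint64)
	for _, v := range values {
		h[log2Bucket(v)]++
	}
	return h
}

func main() {
	// Hypothetical read sizes in bytes.
	h := histogram([]uint64{0, 1, 3, 512, 700, 1024, 4096})
	for bucket, n := range h {
		fmt.Printf("bucket %d: %d\n", bucket, n)
	}
}
```

Seven observations collapse into six counters here; with realistic traffic the saving is what makes in-kernel aggregation cheap.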
  15. Count packets by protocol (pkts.c)
    [Code slide] The SEC macro generates sections inside the object file (later interpreted by the ELF BPF loader); the program uses the eBPF helpers.
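The program's source is not in this transcript, but the verifier log in slide 24 shows it loading the byte at `skb[23]`: in an Ethernet frame carrying IPv4, that is the protocol field, at ETH_HLEN (14) + 9. The Go sketch below reproduces the counting logic in user space over raw frames; all names are hypothetical, and the EtherType check is an addition of this sketch.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	ethHeaderLen  = 14     // ETH_HLEN
	ipProtoOffset = 9      // offsetof(struct iphdr, protocol)
	etherTypeIPv4 = 0x0800 // ETH_P_IP
)

// ipProtocol returns the IPv4 protocol number of an Ethernet frame, i.e.
// the byte at offset ETH_HLEN + 9 = 23.
func ipProtocol(frame []byte) (uint8, bool) {
	if len(frame) <= ethHeaderLen+ipProtoOffset {
		return 0, false
	}
	if binary.BigEndian.Uint16(frame[12:14]) != etherTypeIPv4 {
		return 0, false // not IPv4 (this check is an addition of the sketch)
	}
	return frame[ethHeaderLen+ipProtoOffset], true
}

// countByProtocol plays the role of the per-protocol counters map.
func countByProtocol(frames [][]byte) map[uint8]uint64 {
	counts := make(map[uint8]uint64)
	for _, f := range frames {
		if p, ok := ipProtocol(f); ok {
			counts[p]++
		}
	}
	return counts
}

// fakeFrame builds a minimal Ethernet + IPv4 header with the given protocol.
func fakeFrame(proto uint8) []byte {
	f := make([]byte, ethHeaderLen+20)
	binary.BigEndian.PutUint16(f[12:14], etherTypeIPv4)
	f[ethHeaderLen] = 0x45 // IPv4, 20-byte header
	f[ethHeaderLen+ipProtoOffset] = proto
	return f
}

func main() {
	fmt.Println(countByProtocol([][]byte{fakeFrame(6), fakeFrame(6), fakeFrame(17)}))
}
```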
  16. Compile and inspect ELF
    [Code slide: compiling the program and inspecting its ELF sections, including the eBPF maps]
    The version section must be set to the currently running kernel version!
    Licenses are a tricky and controversial legal matter ... The bpf_prog_load() wrapper also has a license parameter that states the license applying to the eBPF program being loaded. Not a GPL-compatible license? The kernel won't load some eBPF programs (e.g. those calling GPL-only helpers)! Exceptions apply...
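The `version` and `license` sections both hold simple values. A minimal Go sketch, assuming a `uname -r`-style release string: `kernelVersion` reproduces the `KERNEL_VERSION(a, b, c)` encoding, and `gplCompatible` mirrors the license strings accepted by the kernel's `license_is_gpl_compatible()` (include/linux/license.h) in kernels of this era. The function names are illustrative, not gobpf's.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// kernelVersion encodes a.b.c like KERNEL_VERSION(a, b, c) from
// <linux/version.h>: (a << 16) + (b << 8) + c.
func kernelVersion(a, b, c uint32) uint32 {
	return a<<16 + b<<8 + c
}

// parseRelease extracts the numeric a.b.c prefix of a `uname -r` string
// such as "5.3.0-19-generic" and encodes it.
func parseRelease(release string) (uint32, error) {
	var nums [3]uint32
	for i, part := range strings.SplitN(release, ".", 3) {
		// Keep only the leading digits: "0-19-generic" -> "0".
		if j := strings.IndexFunc(part, func(r rune) bool { return r < '0' || r > '9' }); j >= 0 {
			part = part[:j]
		}
		n, err := strconv.ParseUint(part, 10, 32)
		if err != nil {
			return 0, fmt.Errorf("bad release %q: %w", release, err)
		}
		nums[i] = uint32(n)
	}
	return kernelVersion(nums[0], nums[1], nums[2]), nil
}

// gplCompatible mirrors the strings accepted by license_is_gpl_compatible().
func gplCompatible(license string) bool {
	switch license {
	case "GPL", "GPL v2", "GPL and additional rights",
		"Dual BSD/GPL", "Dual MIT/GPL", "Dual MPL/GPL":
		return true
	}
	return false
}

func main() {
	v, err := parseRelease("5.3.0-19-generic") // hypothetical `uname -r` output
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("version section: %d (0x%06x)\n", v, v)
	fmt.Println(gplCompatible("GPL"), gplCompatible("Dual BSD/GPL"), gplCompatible("MIT"))
}
```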
  17. iovisor/gobpf/elf
    • elf/elf.go#556-565
    • elf/elf.go#278-287 (elfReadLicense)
    • elf/elf.go#289-302 (elfReadVersion)
    • elf/elf.go#361-409 (elfReadMaps)
    • elf/module.go#94-108 (Module struct)
    • func (b *Module) Load(...) error
    Section conventions, SEC("..."):
    • kprobe/… • kretprobe/… • cgroup/skb… • cgroup/sock… • maps/… • socket… • tracepoint/… • uprobe/… • uretprobe/… • sched_cls/… • sched_act/… • version • license
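A loader following these conventions decides what to do with each ELF section from its name alone. The toy Go classifier below works over the prefixes listed above; it is not gobpf's implementation, the kind labels are descriptive, and the section names in the example other than `socket/countpackets` (which appears in the error of slide 24) are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// sectionPrefixes maps SEC("...") name prefixes to what a loader would
// create from the section. No prefix here is a prefix of another entry,
// so the order of the checks does not matter.
var sectionPrefixes = []struct{ prefix, kind string }{
	{"kprobe/", "kprobe"},
	{"kretprobe/", "kretprobe"},
	{"uprobe/", "uprobe"},
	{"uretprobe/", "uretprobe"},
	{"tracepoint/", "tracepoint"},
	{"socket", "socket filter"},
	{"cgroup/skb", "cgroup skb"},
	{"cgroup/sock", "cgroup sock"},
	{"sched_cls/", "TC classifier"},
	{"sched_act/", "TC action"},
	{"maps/", "map"},
}

// classify maps an ELF section name to the kind of object it describes.
func classify(section string) string {
	switch section {
	case "version", "license":
		return section
	}
	for _, p := range sectionPrefixes {
		if strings.HasPrefix(section, p.prefix) {
			return p.kind
		}
	}
	return "unknown"
}

func main() {
	for _, s := range []string{"socket/countpackets", "maps/counters", "kretprobe/do_sys_open", "license", ".text"} {
		fmt.Printf("%-22s -> %s\n", s, classify(s))
	}
}
```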
  18. Let's eBPF with Go: ELFs to the rescue!
    godoc.org/github.com/iovisor/gobpf/elf
    [Code slide]
    • Instantiate a new module from the object (ELF) file
    • Load the eBPF sections from it
    • Retrieve our eBPF map by section name
    • Retrieve our socket filter by section name
    • Attach the socket filter to all network interfaces via a socket file descriptor
  19. Let's eBPF with Go: poll eBPF map data!
    [Code slide] Polling eBPF map data every second for 10 seconds.
    Method `m.LookupNextElement()` looks up the next element in the `data` map given the key `k`. The next key and the value are stored in the `unsafe.Pointer` parameters. It returns `false` when there are no more keys in the map.
    Complete example @ github.com/leodido/go-ebpf-examples
  20. So, how many packets?
    $ sudo ./bin/countpackets
    0 25
    6 25
    -----
    …
    -----
    0 8
    1 202
    6 202
    -----
    0 16
    1 319
    6 319
    -----
    …
    -----
    0 20
    1 4
    17 392
    6 392
    -----
    quit
    Each block lists <IP protocol number> <packet count> pairs. List of IP protocol numbers:
    • 0: HOPOPT
    • 1: ICMP
    • 6: TCP
    • 17: UDP
    Grab it @ github.com/leodido/go-ebpf-examples
  21. Let's eBPF in Go: hello clone! (iovisor/gobpf/bcc)
    BCC is a framework for BPF tools, mostly a set of Python wrappers. The iovisor/gobpf project provides low-level routines to load and use eBPF programs from ELF files, as well as Go bindings for BCC (via cgo, golang.org/cmd/cgo).
    Include the eBPF code inside your Go file! No SEC macro?
  22. Let's eBPF in Go: helloworld.go (github.com/leodido/go-ebpf-examples)
    [Code slide] BCC under the hood! (iovisor/gobpf/bcc)
    Left as a homework exercise: do the same using a tracepoint. Hint: sys_enter_clone.
  23. Let's eBPF in Go: execution & output
    Complete example @ github.com/leodido/go-ebpf-examples
    $ sudo ./bin/helloworld
    Say hello at each "__x64_sys_clone" syscall ...
    $ sudo cat /sys/kernel/debug/tracing/trace_pipe
    zsh-16435 [005] ...3 1911.783126: 0: pid<16435> uid<1000> tid<16435> hello clone
    <...>-31662 [005] ...3 10682.395852: 0: pid<31662> uid<1000> tid<31662> hello clone
    vsls-agent-31676 [002] ...3 10732.644700: 0: pid<2101> uid<1000> tid<31676> hello clone
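The lines above are read from `trace_pipe`, where `bpf_trace_printk()` output ends up. To post-process them from Go, one option is to match the message part with a regular expression; the pattern below is inferred from this output only, `parseHello` and `cloneEvent` are hypothetical names, and reading `/sys/kernel/debug/tracing/trace_pipe` itself requires root.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// helloRe matches the message printed by the hello clone program, e.g.
// "pid<16435> uid<1000> tid<16435> hello clone".
var helloRe = regexp.MustCompile(`pid<(\d+)> uid<(\d+)> tid<(\d+)> hello clone`)

type cloneEvent struct {
	PID, UID, TID uint32
}

// parseHello extracts the three numbers from one trace_pipe line.
func parseHello(line string) (cloneEvent, bool) {
	m := helloRe.FindStringSubmatch(line)
	if m == nil {
		return cloneEvent{}, false
	}
	var nums [3]uint32
	for i := range nums {
		n, err := strconv.ParseUint(m[i+1], 10, 32)
		if err != nil {
			return cloneEvent{}, false
		}
		nums[i] = uint32(n)
	}
	return cloneEvent{PID: nums[0], UID: nums[1], TID: nums[2]}, true
}

func main() {
	line := "vsls-agent-31676 [002] ...3 10732.644700: 0: pid<2101> uid<1000> tid<31676> hello clone"
	if ev, ok := parseHello(line); ok {
		fmt.Printf("%+v\n", ev)
	}
}
```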
  24. Verify the verifier
    Let's remove the pointer's check ...
    $ sudo ./bin/countpackets
    error while loading "socket/countpackets" (permission denied):
    0: (bf) r6 = r1
    1: (30) r0 = *(u8 *)skb[23]
    2: (63) *(u32 *)(r10 -4) = r0
    3: (bf) r6 = r10
    4: (07) r6 += -4
    5: (18) r1 = 0xffff9e29aa6e8c00
    7: (bf) r2 = r6
    8: (85) call bpf_map_lookup_elem#1
    9: (61) r1 = *(u32 *)(r0 +0)
    R0 invalid mem access 'map_value_or_null'
    Why? bpf_map_lookup_elem() returns either a pointer to the map value or NULL, so R0 is typed 'map_value_or_null'; instruction 9 dereferences it without a NULL check, and the verifier rejects the program.
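To read a log like this one, it helps to decode the hexadecimal opcode in parentheses. The Go sketch below splits an opcode into class, operation or mode, and source or size, following the bit layout of the eBPF instruction set; the operation tables are deliberately partial and all names are illustrative. Decoding `0x18` as LD|IMM|DW also explains the gap from instruction 5 to 7: a 64-bit immediate load occupies two 8-byte instruction slots.

```go
package main

import "fmt"

var (
	classes = [8]string{"LD", "LDX", "ST", "STX", "ALU", "JMP", "JMP32", "ALU64"}
	sizes   = [4]string{"W", "H", "B", "DW"}
	modes   = map[uint8]string{0x00: "IMM", 0x20: "ABS", 0x40: "IND", 0x60: "MEM"}
	// Only the operations appearing in the log; both tables are partial.
	aluOps = map[uint8]string{0x00: "ADD", 0xb0: "MOV"}
	jmpOps = map[uint8]string{0x80: "CALL", 0x90: "EXIT"}
)

// name looks v up in table, falling back to its hex value.
func name(table map[uint8]string, v uint8) string {
	if s, ok := table[v]; ok {
		return s
	}
	return fmt.Sprintf("0x%02x", v)
}

// source reports bit 3: K = 32-bit immediate, X = source register.
func source(op uint8) string {
	if op&0x08 != 0 {
		return "X"
	}
	return "K"
}

// describe decodes an 8-bit eBPF opcode. The low 3 bits are the class. For
// ALU/JMP classes the high 4 bits are the operation; for load/store classes
// the high 3 bits are the mode and bits 3-4 the access size.
func describe(op uint8) string {
	class := op & 0x07
	switch class {
	case 0x04, 0x07: // ALU, ALU64
		return fmt.Sprintf("%s|%s|%s", classes[class], name(aluOps, op&0xf0), source(op))
	case 0x05, 0x06: // JMP, JMP32
		return fmt.Sprintf("%s|%s|%s", classes[class], name(jmpOps, op&0xf0), source(op))
	default: // LD, LDX, ST, STX
		return fmt.Sprintf("%s|%s|%s", classes[class], name(modes, op&0xe0), sizes[(op&0x18)>>3])
	}
}

func main() {
	// Opcodes from the verifier log, in order of first appearance.
	for _, op := range []uint8{0xbf, 0x30, 0x63, 0x07, 0x18, 0x85, 0x61} {
		fmt.Printf("0x%02x = %s\n", op, describe(op))
	}
}
```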
  25. Let's eBPF over Go
    iovisor/bpftrace: a language that abstracts on top of eBPF restricted C
    • Sits on top of BCC
    • Embeds built-in functions and variables
    • PID
    • One-liners!
    • Ships ready-to-use scripts
    • Better documentation
    Thanks to bpftrace's expressivity:
    # Syscall count by program
    bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
    # Read bytes by process
    bpftrace -e 'tracepoint:syscalls:sys_exit_read /args->ret/ { @[comm] = sum(args->ret); }'
    # Count page faults by process
    bpftrace -e 'software:faults:1 { @[comm] = count(); }'
    # Profile user-level stacks at 99 Hertz, for PID 189
    bpftrace -e 'profile:hz:99 /pid == 189/ { @[ustack] = count(); }'
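bpftrace is a standalone CLI, so one way to drive these one-liners from a Go program is `os/exec`. A minimal sketch, assuming bpftrace is installed and the program runs as root; `bpftraceCmd` and `runFor` are hypothetical helpers. Stopping with SIGINT rather than killing the process matters, because bpftrace prints its maps (such as the `@[comm]` counts) when it exits.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// bpftraceCmd builds, without starting it, a command that runs a bpftrace
// one-liner. The program travels as a single argv element, so the single
// quotes a shell needs are not needed here.
func bpftraceCmd(program string) *exec.Cmd {
	return exec.Command("bpftrace", "-e", program)
}

// runFor lets the one-liner trace for d, then sends SIGINT so that bpftrace
// prints its maps before exiting, and returns everything it wrote.
func runFor(cmd *exec.Cmd, d time.Duration) ([]byte, error) {
	var out bytes.Buffer
	cmd.Stdout = &out
	cmd.Stderr = &out
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	time.Sleep(d)
	_ = cmd.Process.Signal(os.Interrupt) // typically fails only if it already exited
	err := cmd.Wait()
	return out.Bytes(), err
}

func main() {
	cmd := bpftraceCmd("tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }")
	fmt.Println(cmd.Args)
	if os.Geteuid() != 0 {
		fmt.Println("not root: skipping the actual run")
		return
	}
	out, err := runFor(cmd, 5*time.Second)
	fmt.Printf("%s(err: %v)\n", out, err)
}
```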
  26. Key takeaways
    Pros:
    • Makes the kernel programmable again
    • In-kernel async key-value store
    • Traces everything
    • Negligible overhead
    • Avoids user-space allocations
    • Performance ⚡
    • Event driven
    • Growing ecosystem
    • Load from ELF
    • Compile on the fly
    • Various ready-to-run scripts ♻
    Cons:
    • Linux only
    • Requires recent kernels
    • Still missing tools
    • Libraries need love
    • Frameworks need love
    • eBPF alone can be complex to use
  27. Acronyms & abbreviations (in case you wonder)
    • ABI: Application Binary Interface
    • BPF: Berkeley Packet Filter
    • cBPF: classic Berkeley Packet Filter
    • eBPF: extended Berkeley Packet Filter
    • ELF: Executable and Linkable Format
    • RISC: Reduced Instruction Set Computer
    • VM: Virtual Machine
  28. Wait wait wait wait! There's a book!
    By Lorenzo Fontana and David Calavera, with a foreword by Jessie Frazelle.
    It covers everything BPF, and most of the code examples are in Go.
  29. Thanks.
    Reach out to me @leodido on Twitter & GitHub!
    See y'all around at KubeCon NA 2019!