
Go eBPF superpowers


GoLab 2019 - Florence, Italy.

It has been three years since eBPF was described as "superpowers for Linux".
Since then eBPF has evolved a lot, and its ecosystem has grown so much that tools, libraries, and frameworks for working with eBPF have been created, including Go libraries.
This talk first explains what eBPF is (basically a way to program the kernel without risking breaking it and without having to recompile it), then shows how to use eBPF in Go, with Go, or against Go programs.
The gobpf library from IOVisor (a Linux Foundation project), for example, provides low-level routines to load and use eBPF programs from ELF files, and also provides bindings to BCC (BPF Compiler Collection), another framework by the IOVisor organization that simplifies working with eBPF.

Leonardo Di Donato

October 21, 2019


Transcript

  1. Go eBPF superpowers
    Leonardo Di Donato. Open Source Software Engineer @ Sysdig.
    2019.10.21 - GoLab - Florence, Italy

  2. whoami
    Leonardo Di Donato.
    Maintainer of Falco.
    Creator of kubectl-trace, kube-bpf, kubectl-dig, and go-syslog.
    Reach out to me @leodido on Twitter & GitHub.

  3. In the beginning ...
    1. 1992 - The BSD Packet Filter: A New Architecture for User-level Packet Capture (S. McCanne & V. Jacobson)
    a. VM designed to work with register-based (accumulator) CPUs
    b. 20x faster than the state of the art at the time
    2. 1997 - Port to Linux
    3. Jan. 2014 - Alexei Starovoitov extended the BPF implementation
    a. 10 64-bit (general purpose) registers + 1 stack register
    b. 512-byte stack
    c. 4x faster than the previous implementation
    d. Still restricted to kernel space
    4. Jun. 2014 - Exposed to user-space
    a. top-level kernel subsystem
    b. no longer limited to the networking stack
    c. emphasis on safety and security

  4. «eBPF does to Linux what
    JavaScript does to HTML[1]»
    @leodido
    [1]: http://www.brendangregg.com/blog/2019-01-01/learn-ebpf-tracing.html

  5. eBPF superpowers
    eBPF ~= V8: coding directly in these two is incredibly hard, wanna try?
    Disclaimer: it is simpler to use frameworks!
    iovisor/gobpf - dropbox/goebpf - iovisor/bpftrace - iovisor/bcc
    @leodido

  6. eBPF
    To summarize: run code safely in the kernel without having to write a kernel module.

  7. What eBPF is
    eBPF: "extended" because it’s not just packets anymore. Even more amazing than cBPF.
    A core part of the Linux kernel.
    You can now write mini programs that run on events like disk I/O, which are run in a safe register-based VM using a custom 64-bit RISC instruction set in the kernel.
    The in-kernel verifier refuses to load eBPF programs with:
    • invalid or bad pointer dereferences
    • a call stack exceeding the maximum size
    • loops without an upper bound
    Imposes a stable Application Binary Interface (ABI).
    @leodido

  8. How does eBPF work?
    [Diagram. User-space: BPF source -> (compile) -> BPF ELF -> (load) -> bpf(), with BPF_PROG_LOAD for the eBPF opcodes and BPF_MAP_CREATE for the eBPF maps. Kernel: verifier -> BPF program, attached to networking hooks (socket filter, XDP (net driver), TC (traffic control), cgroups) or tracing/monitoring hooks (kprobe, uprobe, static tracepoint, perf event). Maps carry data between kernel and user-space.]
    Program types shown:
    BPF_PROG_TYPE_SOCKET_FILTER
    BPF_PROG_TYPE_KPROBE
    BPF_PROG_TYPE_TRACEPOINT
    BPF_PROG_TYPE_RAW_TRACEPOINT
    BPF_PROG_TYPE_XDP
    BPF_PROG_TYPE_PERF_EVENT
    BPF_PROG_TYPE_CGROUP_SKB
    BPF_PROG_TYPE_CGROUP_SOCK
    BPF_PROG_TYPE_SOCK_OPS
    BPF_PROG_TYPE_SK_SKB
    BPF_PROG_TYPE_SK_MSG
    BPF_PROG_TYPE_SCHED_CLS
    BPF_PROG_TYPE_SCHED_ACT
    References: bit.ly/bpf_prog_types, man 2 bpf, man 8 tc-bpf
    @leodido

  9. BPF_PROG_TYPE_SOCKET_FILTER, // Packet filtering
    BPF_PROG_TYPE_KPROBE, // Tracing (any function)
    BPF_PROG_TYPE_SCHED_CLS, // Packet filtering (TC)
    BPF_PROG_TYPE_SCHED_ACT, // Packet filtering (TC)
    BPF_PROG_TYPE_TRACEPOINT, // Tracing (stable tracepoints)
    BPF_PROG_TYPE_XDP, // Packet filtering (driver level)
    BPF_PROG_TYPE_PERF_EVENT, // Tracing (Performance Monitoring Unit events)
    BPF_PROG_TYPE_CGROUP_SKB, // Access control (IP ingress/egress)
    BPF_PROG_TYPE_CGROUP_SOCK, // Access control (socket crea/ops/…)
    BPF_PROG_TYPE_LWT_IN, // Network tunnels
    BPF_PROG_TYPE_LWT_OUT, // Network tunnels
    BPF_PROG_TYPE_LWT_XMIT, // Network tunnels
    BPF_PROG_TYPE_SOCK_OPS, // Update socket options
    BPF_PROG_TYPE_SK_SKB, // Socket redirection
    BPF_PROG_TYPE_CGROUP_DEVICE, // Access control (device)
    BPF_PROG_TYPE_SK_MSG, // Data stream filtering
    BPF_PROG_TYPE_RAW_TRACEPOINT, // Tracing
    BPF_PROG_TYPE_CGROUP_SOCK_ADDR, // Access control (socket binding)
    BPF_PROG_TYPE_LWT_SEG6LOCAL, // Network tunnels
    BPF_PROG_TYPE_LIRC_MODE2, // Infra-red remote control protocols
    BPF_PROG_TYPE_SK_REUSEPORT, // Select socket to use
    BPF_PROG_TYPE_FLOW_DISSECTOR, // Network processing
    BPF_PROG_TYPE_CGROUP_SYSCTL, // Access control (procfs)
    BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE, // Tracing
    ... // ...
    @leodido

  10. Wait, maps?
    (async) in-kernel key-value store
    Generic data structure for storage of different types of data.
    Used for sharing state between eBPF kernel programs, but especially between kernel and user-space applications.
    Each map has:
    • a type
    • a max number of elements
    • key size (bytes)
    • value size (bytes)
    @leodido
    Types:
    BPF_MAP_TYPE_HASH // Hash map
    BPF_MAP_TYPE_ARRAY // Array
    BPF_MAP_TYPE_PROG_ARRAY // BPF tail calls
    BPF_MAP_TYPE_PERF_EVENT_ARRAY // Stream info
    BPF_MAP_TYPE_PERCPU_HASH // Per-CPU hash map
    BPF_MAP_TYPE_PERCPU_ARRAY // Per-CPU array
    BPF_MAP_TYPE_STACK_TRACE // Stack info for tracing
    BPF_MAP_TYPE_CGROUP_ARRAY // Store refs to cgroups
    BPF_MAP_TYPE_LRU_HASH // Least recently used cache
    BPF_MAP_TYPE_LRU_PERCPU_HASH // Per-CPU LRU cache
    BPF_MAP_TYPE_LPM_TRIE // Longest prefix match
    BPF_MAP_TYPE_ARRAY_OF_MAPS // Array of eBPF maps
    BPF_MAP_TYPE_HASH_OF_MAPS // Hash map of eBPF maps
    BPF_MAP_TYPE_DEVMAP // Redirect packet to device
    BPF_MAP_TYPE_SOCKMAP // Redirect packet to socket
    BPF_MAP_TYPE_CPUMAP // Redirect packet to CPU
    BPF_MAP_TYPE_XSKMAP // Redirect packet to AF_XDP socket
    BPF_MAP_TYPE_SOCKHASH // Redirect packet to socket
    BPF_MAP_TYPE_CGROUP_STORAGE // Store data per cgroup
    BPF_MAP_TYPE_REUSEPORT_SOCKARRAY // Socket for packet
    BPF_MAP_TYPE_QUEUE // FIFO
    BPF_MAP_TYPE_STACK // LIFO
    BPF_MAP_TYPE_SK_STORAGE // Store data per socket


    bit.ly/bpf_map_types
    Operations:
    BPF_MAP_CREATE
    BPF_MAP_LOOKUP_ELEM
    BPF_MAP_UPDATE_ELEM
    BPF_MAP_DELETE_ELEM
    BPF_MAP_GET_NEXT_KEY
    bit.ly/bpf_map_commands
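
    To make the map attributes above concrete, here is a minimal user-space model in Go. It is only a sketch: nothing in it talks to the kernel, and the names `spec`, `kvMap`, `create`, `update`, `lookup`, `remove`, `errSize` and `errFull` are hypothetical, chosen to loosely mirror BPF_MAP_CREATE, BPF_MAP_UPDATE_ELEM, BPF_MAP_LOOKUP_ELEM and BPF_MAP_DELETE_ELEM. It shows how fixed key/value sizes and a maximum number of elements constrain every operation.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Hypothetical errors for this model; the kernel reports errno values instead.
var (
	errSize = errors.New("key or value has the wrong size")
	errFull = errors.New("map is full")
)

// spec holds the attributes fixed at creation time.
type spec struct {
	keySize, valueSize, maxEntries int
}

// kvMap is a user-space stand-in for a hash-like eBPF map.
type kvMap struct {
	cfg  spec
	data map[string][]byte
}

// create plays the role of BPF_MAP_CREATE in this model.
func create(s spec) *kvMap {
	return &kvMap{cfg: s, data: make(map[string][]byte)}
}

// update inserts or overwrites an element, enforcing sizes and capacity.
func (m *kvMap) update(k, v []byte) error {
	if len(k) != m.cfg.keySize || len(v) != m.cfg.valueSize {
		return errSize
	}
	if _, ok := m.data[string(k)]; !ok && len(m.data) >= m.cfg.maxEntries {
		return errFull
	}
	m.data[string(k)] = append([]byte(nil), v...)
	return nil
}

// lookup returns the value stored under k, if any.
func (m *kvMap) lookup(k []byte) ([]byte, bool) {
	v, ok := m.data[string(k)]
	return v, ok
}

// remove deletes k and reports whether it was present.
func (m *kvMap) remove(k []byte) bool {
	_, ok := m.data[string(k)]
	delete(m.data, string(k))
	return ok
}

func main() {
	// 4-byte keys (protocol number), 8-byte values (packet count).
	m := create(spec{keySize: 4, valueSize: 8, maxEntries: 256})
	key := make([]byte, 4)
	val := make([]byte, 8)
	binary.LittleEndian.PutUint32(key, 6)
	binary.LittleEndian.PutUint64(val, 25)
	if err := m.update(key, val); err != nil {
		fmt.Println("update:", err)
		return
	}
	if v, ok := m.lookup(key); ok {
		fmt.Println(binary.LittleEndian.Uint32(key), binary.LittleEndian.Uint64(v))
	}
}
```

    Keys and values are raw byte slices on purpose: the kernel only knows sizes, not Go types, so the encoding is up to the two sides sharing the map.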

  11. eBPF helper functions
    “standard library” [tools/testing/selftests/bpf/bpf_helpers.h] ~100 functions and counting ✌
    • Print debugging messages: bpf_trace_printk() -> /sys/kernel/debug/tracing/trace_pipe (bit.ly/bpf_print_helper)
    • Interact with eBPF maps: bpf_map_{lookup,delete,update,push,pop,peek}_elem() (bit.ly/bpf_map_helpers)
    • Find out about the current context: bpf_get_current_{pid_tgid,uid_gid,cgroup_id,task}() (bit.ly/bpf_context_helpers)
    • Macros: SEC("...") (bit.ly/bpf_sec_helper)
    • ...
    @leodido
    Warning: bpf_helpers.h is not distributed with the kernel headers; copy it from your distro’s Linux source package!
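
    The context helpers pack two 32-bit values into one 64-bit return value: bpf_get_current_pid_tgid() returns tgid << 32 | pid (the kernel's "pid" is what user-space calls the thread ID), and bpf_get_current_uid_gid() returns gid << 32 | uid. When such a value reaches a Go program, for example through a map, it has to be split again. A small sketch, with hypothetical function names `splitPidTgid` and `splitUidGid`:

```go
package main

import "fmt"

// splitPidTgid splits a value shaped like the result of
// bpf_get_current_pid_tgid(): the upper 32 bits hold the tgid (the
// process ID as seen from user-space), the lower 32 bits the thread ID.
func splitPidTgid(v uint64) (tgid, tid uint32) {
	return uint32(v >> 32), uint32(v)
}

// splitUidGid splits a value shaped like the result of
// bpf_get_current_uid_gid(): upper 32 bits gid, lower 32 bits uid.
func splitUidGid(v uint64) (gid, uid uint32) {
	return uint32(v >> 32), uint32(v)
}

func main() {
	// Sample value built by hand for illustration.
	raw := uint64(2101)<<32 | 31676
	tgid, tid := splitPidTgid(raw)
	fmt.Printf("pid<%d> tid<%d>\n", tgid, tid) // prints "pid<2101> tid<31676>"

	gid, uid := splitUidGid(uint64(1000)<<32 | 1000)
	fmt.Printf("uid<%d> gid<%d>\n", uid, gid)
}
```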

  12. Write eBPF!
    1. Write some (restricted ➡ safe) C:
    a. max 4096 instructions (up to 1 million for root)
    b. no unbounded loops
    c. no global variables
    d. no variadic functions
    e. no passing structs as function arguments
    f. no out-of-range jumps
    g. no unreachable code
    h. no reads of uninitialised registers/memory
    i. no out-of-bounds/random memory access
    @leodido

  13. eBPF life
    (restricted) C ➡ eBPF bytecode ➡ machine code
    1. Compile with clang to convert it to eBPF bytecode
    a. Standard ELF format file (BPF opcodes, BPF maps)
    2. Load with bpf() (or with a framework)
    a. Gives a file descriptor to the program
    3. Attach the program to a hook/event using the file descriptor
    4. The kernel JIT compiles it into native machine code instructions for performance
    a. ARM{32,64}, MIPS, RISC-V, Sparc64, S390, x86_{32,64}
    5. Automatically removed when its instances are detached / the file descriptor is closed
    a. Pin the program to the /sys/fs/bpf virtual file system to keep it loaded
    @leodido

  14. What eBPF can do
    1. Examine the arguments of a function
    2. Examine its context
    a. PID
    b. parent
    c. UID
    d. stack
    e. etc.
    3. Examine the function’s return value ( {u,k}retprobe )
    4. Collect statistics
    5. Aggregate and process all of these
    6. Modify the behaviour of the function
    7. Modify the content of function variables
    @leodido

  15. Count packets by protocol
    pkts.c
    [Code listing not captured in the transcript. Annotations on the slide: "eBPF helpers"; "macro to generate sections inside the object file (later interpreted by the ELF BPF loader)"]
    @leodido

  16. @leodido
    Compile and inspect ELF
    This tells the loader to set the currently running kernel version!
    Tricky and controversial legal thing about licenses ...
    The bpf_prog_load() wrapper also has a license parameter to provide the license that applies to the eBPF program being loaded.
    Not a GPL-compatible license? The kernel won’t load some eBPF programs!
    Exceptions apply...
    eBPF
    Maps

  17. iovisor/gobpf/elf
    elf/elf.go#556-565
    elf/elf.go#278-287 (elfReadLicense)
    elf/elf.go#289-302 (elfReadVersion)
    elf/elf.go#361-409 (elfReadMaps)
    ● kprobe/…
    ● kretprobe/…
    ● cgroup/skb…
    ● cgroup/sock…
    ● maps/…
    ● socket…
    ● tracepoint/…
    ● uprobe/…
    ● uretprobe/…
    ● sched_cls/…
    ● sched_act/…
    ● version
    ● license
    Section conventions: SEC(“...”)
    elf/module.go#94-108
    Module struct
    func (b *Module) Load(...) error
    @leodido
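
    To get a feeling for these section conventions, one can list the sections of a compiled object and sort them by prefix. The sketch below uses only Go's standard debug/elf package; it is not how gobpf does it (that lives in the elf/elf.go lines referenced above). The function name `sectionKind` and the "program" / "map" / "metadata" labels are made up for this illustration.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
	"strings"
)

// programPrefixes follows the section name conventions listed on the slide.
var programPrefixes = []string{
	"kprobe/", "kretprobe/", "uprobe/", "uretprobe/", "tracepoint/",
	"cgroup/skb", "cgroup/sock", "socket", "sched_cls/", "sched_act/",
}

// sectionKind classifies an ELF section name by its prefix.
func sectionKind(name string) string {
	switch {
	case name == "license" || name == "version":
		return "metadata"
	case strings.HasPrefix(name, "maps/"):
		return "map"
	}
	for _, p := range programPrefixes {
		if strings.HasPrefix(name, p) {
			return "program"
		}
	}
	return "other"
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: sections <object file>")
		return
	}
	f, err := elf.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()
	// Print every section together with its guessed role.
	for _, s := range f.Sections {
		fmt.Printf("%-10s %s\n", sectionKind(s.Name), s.Name)
	}
}
```

    Run it against any object compiled with clang for the BPF target; sections such as .text or .strtab simply show up as "other".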

  18. Let’s eBPF with Go
    ELFs to the rescue!
    godoc.org/github.com/iovisor/gobpf/elf
    Instantiate new module from object (ELF) file
    Load eBPF sections from it
    Retrieve our eBPF map by section name
    Retrieve our socket filter by section name
    Attach socket filter to all network interfaces by socket file descriptor
    @leodido

  19. Let’s eBPF with Go
    Poll eBPF map data!
    @leodido
    Method `m.LookupNextElement()` looks up the next element in the `data` map using the given key `k`.
    The next key and the value are stored in the `unsafe.Pointer` parameters.
    It returns `false` when there are no other keys in the map.
    Polling eBPF map data every second for 10 seconds.
    Complete example @ github.com/leodido/go-ebpf-examples
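
    The iteration pattern described above ("give me the key after this one, stop when there is none") can be tried without a kernel. The following sketch replaces the real map with an in-memory fake: `nextFunc`, `fakeMap`, `walk` and `noKey` are hypothetical names, not gobpf API, and the fake assumes the BPF_MAP_GET_NEXT_KEY behaviour of returning the first key when the given key is not in the map. It polls a few times with a short interval instead of every second for 10 seconds.

```go
package main

import (
	"fmt"
	"time"
)

// noKey is a start key that never appears among protocol numbers.
const noKey = ^uint32(0)

// nextFunc has the shape of a "look up next element" call: given a key it
// yields the following key and its value, or ok == false at the end.
type nextFunc func(key uint32) (next uint32, value uint64, ok bool)

// fakeMap builds an in-memory stand-in for the eBPF map.
func fakeMap(keys []uint32, values []uint64) nextFunc {
	return func(key uint32) (uint32, uint64, bool) {
		for i, k := range keys {
			if k == key {
				if i+1 == len(keys) {
					return 0, 0, false // key was the last one
				}
				return keys[i+1], values[i+1], true
			}
		}
		if len(keys) == 0 {
			return 0, 0, false
		}
		return keys[0], values[0], true // unknown key: start from the first
	}
}

// walk visits every element, feeding each returned key back in.
func walk(start uint32, next nextFunc, visit func(k uint32, v uint64)) {
	key := start
	for {
		nk, v, ok := next(key)
		if !ok {
			return
		}
		visit(nk, v)
		key = nk
	}
}

func main() {
	next := fakeMap([]uint32{0, 1, 6}, []uint64{8, 202, 202})
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for i := 0; i < 3; i++ {
		<-ticker.C
		walk(noKey, next, func(k uint32, v uint64) { fmt.Println(k, v) })
		fmt.Println("-----")
	}
}
```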

  20. So, how many packets?
    $ sudo ./bin/countpackets
    0 25
    6 25
    -----

    -----
    0 8
    1 202
    6 202
    -----
    0 16
    1 319
    6 319
    -----

    -----
    0 20
    1 4
    17 392
    6 392
    -----
    quit
    List of IP protocol numbers
    ● 0: HOPOPT
    ● 1: ICMP
    ● 6: TCP
    ● 17: UDP
    @leodido
    Grab it @ github.com/leodido/go-ebpf-examples
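
    The raw output is easier to read with protocol names instead of numbers. A small sketch of that post-processing step, covering only the four protocol numbers listed above; `protoName` and `report` are hypothetical helpers and not part of the countpackets example.

```go
package main

import (
	"fmt"
	"sort"
)

// protoNames covers only the protocol numbers listed on the slide.
var protoNames = map[uint32]string{
	0:  "HOPOPT",
	1:  "ICMP",
	6:  "TCP",
	17: "UDP",
}

// protoName returns a readable name, falling back to "proto-N".
func protoName(n uint32) string {
	if s, ok := protoNames[n]; ok {
		return s
	}
	return fmt.Sprintf("proto-%d", n)
}

// report renders one "NAME COUNT" line per protocol, ordered by number.
func report(counts map[uint32]uint64) []string {
	keys := make([]uint32, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	lines := make([]string, 0, len(keys))
	for _, k := range keys {
		lines = append(lines, fmt.Sprintf("%s %d", protoName(k), counts[k]))
	}
	return lines
}

func main() {
	// Counts taken from the last polling round shown above.
	for _, l := range report(map[uint32]uint64{0: 20, 1: 4, 17: 392, 6: 392}) {
		fmt.Println(l)
	}
}
```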

  21. Let’s eBPF in Go
    Hello clone!
    iovisor/gobpf/bcc
    BCC is a framework for BPF tools.
    Mostly a set of Python wrappers.
    The iovisor/gobpf project provides low-level
    routines to load and use eBPF programs from
    ELF files as well as Go bindings for BCC.
    Include eBPF code inside your Go file!
    no SEC macro?
    @leodido
    golang.org/cmd/cgo

  22. Let’s eBPF in Go
    helloworld.go
    (github.com/leodido/go-ebpf-examples )
    Left as a home exercise:
    Do the same using a tracepoint.
    Suggestion: sys_enter_clone.
    BCC under the hood!
    iovisor/gobpf/bcc
    @leodido

  23. Let’s eBPF in Go
    Complete example @ github.com/leodido/go-ebpf-examples
    $ sudo ./bin/helloworld
    Say hello at each "__x64_sys_clone" syscall ...
    $ sudo cat /sys/kernel/debug/tracing/trace_pipe
    zsh-16435 [005] ...3 1911.783126: 0: pid<16435> uid<1000> tid<16435> hello clone
    <...>-31662 [005] ...3 10682.395852: 0: pid<31662> uid<1000> tid<31662> hello clone
    vsls-agent-31676 [002] ...3 10732.644700: 0: pid<2101> uid<1000> tid<31676> hello clone
    Execution & output
    @leodido
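
    The trace_pipe lines above are plain text, so a Go program can consume them too. This is a sketch of a parser for exactly the line shape shown on this slide (task name and ID, CPU in brackets, flags, timestamp, then the pid<>/uid<>/tid<> message printed by the example). The `event` type, `traceRe` and `parseLine` are hypothetical names, and the regular expression is an assumption fitted to these three sample lines, not a general trace_pipe grammar.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// traceRe matches: COMM-TASKID [CPU] FLAGS TIMESTAMP: N: pid<..> uid<..> tid<..> MSG
var traceRe = regexp.MustCompile(
	`^\s*(.+)-(\d+)\s+\[(\d+)\]\s+\S+\s+[0-9.]+:\s+\S+\s+pid<(\d+)> uid<(\d+)> tid<(\d+)> (.*)$`)

// event holds the fields extracted from one line.
type event struct {
	Comm          string
	CPU           int
	PID, UID, TID int
	Msg           string
}

// parseLine returns ok == false for lines that do not match the shape.
func parseLine(line string) (event, bool) {
	m := traceRe.FindStringSubmatch(line)
	if m == nil {
		return event{}, false
	}
	// The pattern only captures digits, so Atoi cannot fail on syntax here.
	num := func(s string) int { n, _ := strconv.Atoi(s); return n }
	return event{
		Comm: m[1],
		CPU:  num(m[3]),
		PID:  num(m[4]),
		UID:  num(m[5]),
		TID:  num(m[6]),
		Msg:  m[7],
	}, true
}

func main() {
	line := `vsls-agent-31676 [002] ...3 10732.644700: 0: pid<2101> uid<1000> tid<31676> hello clone`
	if ev, ok := parseLine(line); ok {
		fmt.Printf("%s (pid %d, tid %d) on CPU %d: %s\n", ev.Comm, ev.PID, ev.TID, ev.CPU, ev.Msg)
	}
}
```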

  24. Verify the verifier
    $ sudo ./bin/countpackets
    error while loading "socket/countpackets" (permission denied):
    0: (bf) r6 = r1
    1: (30) r0 = *(u8 *)skb[23]
    2: (63) *(u32 *)(r10 -4) = r0
    3: (bf) r6 = r10
    4: (07) r6 += -4
    5: (18) r1 = 0xffff9e29aa6e8c00
    7: (bf) r2 = r6
    8: (85) call bpf_map_lookup_elem#1
    9: (61) r1 = *(u32 *)(r0 +0)
    R0 invalid mem access 'map_value_or_null'

    Let’s remove the pointer check ...
    @leodido
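
    The hex number in parentheses in the verifier log is the opcode byte of each instruction. An eBPF instruction is 8 bytes: opcode, destination and source register packed into one byte, a 16-bit offset and a 32-bit immediate. The sketch below decodes that layout in Go, assuming the little-endian encoding (as on x86_64, where the destination register is the low nibble); `insn`, `decode` and `class` are hypothetical names for this illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// insn mirrors the fields of one 8-byte eBPF instruction.
type insn struct {
	Op  uint8 // opcode, e.g. 0xbf
	Dst uint8 // destination register number
	Src uint8 // source register number
	Off int16 // signed offset
	Imm int32 // signed immediate
}

// classNames is indexed by the low 3 bits of the opcode.
var classNames = [8]string{"LD", "LDX", "ST", "STX", "ALU", "JMP", "JMP32", "ALU64"}

// decode unpacks a little-endian encoded instruction.
func decode(b [8]byte) insn {
	return insn{
		Op:  b[0],
		Dst: b[1] & 0x0f,
		Src: b[1] >> 4,
		Off: int16(binary.LittleEndian.Uint16(b[2:4])),
		Imm: int32(binary.LittleEndian.Uint32(b[4:8])),
	}
}

// class returns the instruction class name.
func (i insn) class() string { return classNames[i.Op&0x07] }

func main() {
	prog := [][8]byte{
		{0xbf, 0x16, 0, 0, 0, 0, 0, 0},             // r6 = r1
		{0x07, 0x06, 0, 0, 0xfc, 0xff, 0xff, 0xff}, // r6 += -4
		{0x85, 0x00, 0, 0, 0x01, 0, 0, 0},          // call helper #1
	}
	for _, raw := range prog {
		in := decode(raw)
		fmt.Printf("(%02x) %-5s dst=r%d src=r%d off=%d imm=%d\n",
			in.Op, in.class(), in.Dst, in.Src, in.Off, in.Imm)
	}
}
```

    The three sample encodings correspond to instructions 0, 4 and 8 of the log above: a register move, an add of -4, and the call to bpf_map_lookup_elem (helper number 1).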

  25. Let’s eBPF over Go
    • Sits on top of BCC
    • Embeds built-in functions and variables
    • PID
    • One-liners!
    • Ships ready-to-use scripts
    • Better documentation
    # Syscall count by program
    bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
    # Read bytes by process
    bpftrace -e 'tracepoint:syscalls:sys_exit_read /args->ret/ { @[comm] = sum(args->ret); }'
    # Count page faults by process
    bpftrace -e 'software:faults:1 { @[comm] = count(); }'
    # Profile user-level stacks at 99 Hertz, for PID 189
    bpftrace -e 'profile:hz:99 /pid == 189/ { @[ustack] = count(); }'
    Thanks to iovisor/bpftrace expressivity!
    A language that abstracts on top of eBPF restricted C
    @leodido

  26. @leodido
    • Makes kernel programmable again
    • In-kernel async key-value store

    • Traces everything
    • Negligible overhead
    • Avoid user-space allocations
    • Performance ⚡
    • Event driven
    • Ecosystem growing
    • Load from ELF
    • Compile on the fly
    • Various ready-to-run scripts ♻
    Key takeaways
    • Linux only
    • Requires recent kernels
    • Still missing tools
    • libraries need love
    • frameworks need love
    • eBPF alone can be complex to use

  27. Acronyms & Abbreviations
    In case you wonder
    ABI Application Binary Interface
    BPF Berkeley Packet Filter
    cBPF classic Berkeley Packet Filter
    eBPF extended Berkeley Packet Filter
    ELF Executable and Linkable Format
    RISC Reduced Instruction Set Computer
    VM Virtual Machine
    @leodido

  28. There’s a book!
    Wait wait wait wait!
    From Lorenzo Fontana and David Calavera
    It contains everything BPF
    Most of the code examples are in Go
    Foreword by Jessie Frazelle

  29. Thanks.
    Reach out to me @leodido on Twitter & GitHub!
    SEE Y’ALL AROUND AT KUBECON NA 2019
    Slides here.
