Prometheus as exposition format for eBPF programs running on Kubernetes

Nowadays many applications expose their metrics via an HTTP endpoint that Prometheus can scrape. The exposition format has recently been folded into the CNCF's OpenMetrics standard. Nevertheless, this very common pattern by definition only exposes metrics about the specific application being observed.

This talk presents the idea, along with a reference implementation, of a slightly different use case: using eBPF programs as the source of information, so that kernel and application probes can be exposed and collected via a Prometheus endpoint.

Leonardo Di Donato

September 19, 2019

Transcript

  1. Prometheus as exposition
    format for eBPF programs
    running on Kubernetes
    Leonardo Di Donato. Open Source Software Engineer @ Sysdig.
    2019.09.19 - DevOpsDays - Istanbul, Turkey

  2. whoami
    Leonardo Di Donato.
    Maintainer of Falco.
    Creator of kubectl-trace, kube-bpf, kubectl-dig, and go-syslog.
    Reach out to me @leodido on Twitter & GitHub.

  3. Aggregate events at kernel level
    Deal with just a few instead of thousands of them.

  4. eBPF superpowers
    What if we could exploit Prometheus (or OpenMetrics) awesomeness
    without having to instrument, one by one, the applications
    we want to monitor?
    Can we avoid clogging our applications
    thanks to eBPF superpowers?

  5. What eBPF is
    extended because it’s not just packets anymore
    You can now write mini programs that run on events like
    disk I/O. They run in a safe register-based VM in the
    kernel, using a custom 64-bit RISC instruction set.
    The in-kernel verifier refuses to load eBPF programs with
    invalid pointer dereferences, that exceed the maximum call
    stack, or that contain loops without an upper bound.
    Imposes a stable Application Binary Interface (ABI).
    Even more amazing than BPF 🚀
    A core part of the Linux kernel.

  6. How does eBPF work?
    Diagram, user-space: BPF source -> compile -> BPF ELF -> load -> bpf()
    Diagram, kernel: verifier -> BPF
    Diagram, maps: Maps (data)
    Diagram, attach points: kprobe, uprobe, static tracepoint, perf events, XDP, socket filter
    Map commands:
    BPF_MAP_CREATE
    BPF_MAP_LOOKUP_ELEM
    BPF_MAP_UPDATE_ELEM
    BPF_MAP_DELETE_ELEM
    BPF_MAP_GET_NEXT_KEY
    📎 http://bit.ly/bpf_map_types
    Program types:
    BPF_PROG_TYPE_SOCKET_FILTER
    BPF_PROG_TYPE_KPROBE
    BPF_PROG_TYPE_TRACEPOINT
    BPF_PROG_TYPE_RAW_TRACEPOINT
    BPF_PROG_TYPE_XDP
    BPF_PROG_TYPE_PERF_EVENT
    BPF_PROG_TYPE_CGROUP_SKB
    BPF_PROG_TYPE_CGROUP_SOCK
    BPF_PROG_TYPE_SOCK_OPS
    BPF_PROG_TYPE_SK_SKB
    BPF_PROG_TYPE_SK_MSG
    BPF_PROG_TYPE_SCHED_CLS
    BPF_PROG_TYPE_SCHED_ACT
    📎 http://bit.ly/bpf_prog_types
    man 2 bpf
    man 8 tc-bpf
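
To make the map commands listed on this slide more concrete, here is a minimal userspace sketch that mimics their semantics with a plain dictionary. It is a toy model only: the class `ToyBpfMap` and the helper `dump` are hypothetical names, nothing here calls the real bpf() syscall, and real maps have fixed key/value sizes that this sketch ignores.

```python
class ToyBpfMap:
    """Userspace toy mimicking the bpf() map commands (not the real syscall)."""

    def __init__(self, max_entries):
        # Mirrors the size limit chosen at BPF_MAP_CREATE time.
        self.max_entries = max_entries
        self._data = {}

    def update_elem(self, key, value):
        # Mirrors BPF_MAP_UPDATE_ELEM: create or overwrite an entry.
        if key not in self._data and len(self._data) >= self.max_entries:
            raise OverflowError("map is full")
        self._data[key] = value

    def lookup_elem(self, key):
        # Mirrors BPF_MAP_LOOKUP_ELEM: None stands in for "not found".
        return self._data.get(key)

    def delete_elem(self, key):
        # Mirrors BPF_MAP_DELETE_ELEM.
        if key not in self._data:
            raise KeyError(key)
        del self._data[key]

    def get_next_key(self, key=None):
        # Mirrors BPF_MAP_GET_NEXT_KEY: an unknown key yields the first key,
        # and None stands in for "no more keys".
        keys = list(self._data)
        if key not in self._data:
            return keys[0] if keys else None
        idx = keys.index(key) + 1
        return keys[idx] if idx < len(keys) else None


def dump(bpf_map):
    """Walk the whole map the way a userspace reader would."""
    out = {}
    key = bpf_map.get_next_key()
    while key is not None:
        out[key] = bpf_map.lookup_elem(key)
        key = bpf_map.get_next_key(key)
    return out
```

A reader process would typically call something like `dump` periodically and turn the result into metrics, which is the pattern the rest of the talk builds on.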

  7. Why use eBPF at all to trace userspace processes?
    Advantages
    • fully programmable
    • event driven
    • can trace everything in a system
    • not limited to a specific application
    • unified tracing interface for both kernel and
    userspace
    • {k,u}probes, (dtrace) tracepoints and so on
    are also used by other tools
    • minimal (negligible) performance impact
    • attaches JIT-compiled native instrumentation code
    • no long suspensions of execution
    Disadvantages
    • requires a fairly recent kernel
    • definitely not for debugging
    • no knowledge of the calling higher level
    language implementation
    • not fully running in user space
    • kernel-user context switch (usually negligible)
    when eBPF instruments a user process
    • still not as portable as other tracers
    • VM primarily developed in the Linux kernel
    (work-in-progress ports btw)

  8. Count packets by protocol
    Count sys_enter_write by process ID
    macro to generate sections inside the object file (later interpreted by the ELF BPF loader)
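
The eBPF sources shown on this slide are not part of the transcript. As a stand-in, the sketch below models only the idea behind the first example, counting packets by protocol, in plain userspace code. It assumes raw IPv4 headers as input; `ipv4_header` and `count_by_protocol` are hypothetical helper names, and this is not how the in-kernel program is written.

```python
import socket
import struct
from collections import Counter


def ipv4_header(protocol, src="10.0.0.1", dst="10.0.0.2"):
    """Build a minimal 20-byte IPv4 header (checksum left at zero)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45,      # version 4, header length 5 * 4 bytes
        0,         # DSCP/ECN
        20,        # total length (header only)
        0,         # identification
        0,         # flags and fragment offset
        64,        # TTL
        protocol,  # protocol number, at byte offset 9
        0,         # checksum (not computed in this sketch)
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )


def count_by_protocol(packets):
    """Aggregate packets into one counter per IP protocol number."""
    counts = Counter()
    for pkt in packets:
        # Skip anything that is not a complete IPv4 header.
        if len(pkt) < 20 or pkt[0] >> 4 != 4:
            continue
        counts[pkt[9]] += 1
    return counts
```

The point of the aggregation is the same as on the earlier slide: the consumer sees a handful of counters keyed by protocol instead of one event per packet.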

  9. Why not instrument eBPF
    programs for Kubernetes?

  10. Just use a sidecar container
    • A sidecar container sharing the process namespace
    • Image with eBPF loader + eBPF program in it
    • Not a very generic approach, but it does the job! 🤔

  11. Something more generic?
    Like loading whatever eBPF program from
    its ELF using a Kubernetes CRD? 🤯
    Grab metrics via eBPF and expose them
    using a Prometheus endpoint.
    🔗 github.com/bpftools/kube-bpf

  12. BPF operator for Kubernetes
    Why don’t we make eBPF programs look
    more YAML ✌✌✌

  13. Customize all the things
    Custom resources (📎 http://bit.ly/k8s_crd)
    An extension of the K8S API that lets you
    store and retrieve structured data.
    Shared informers (📎 http://bit.ly/k8s_shared_informers)
    The actual control loop that watches the
    shared state using the workqueue.
    Controllers (📎 http://bit.ly/k8s_custom_controllers)
    It declares and specifies the desired state
    of your resource, continuously trying to
    match it with the actual state.
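
The controller idea on this slide, continuously matching the actual state with the desired state, can be sketched as a small reconcile function. This is only an illustration with plain dictionaries: `reconcile` is a hypothetical name, and nothing here uses the Kubernetes API or a real client library.

```python
def reconcile(desired, actual):
    """Mutate `actual` until it matches `desired`; return the actions taken."""
    actions = []
    # Create what is missing and update what has drifted.
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
            actual[name] = spec
        elif actual[name] != spec:
            actions.append(("update", name))
            actual[name] = spec
    # Delete whatever is no longer declared.
    for name in sorted(set(actual) - set(desired)):
        actions.append(("delete", name))
        del actual[name]
    return actions
```

Running it a second time with the same inputs returns no actions, which is the property a control loop relies on: it can be invoked on every change notification without causing extra work once the states match.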

  14. Here’s the evil plan
    Diagram: a BPF CRD on top of two identical stacks. In each stack a BPF runner (user-space) uses the bpf() syscall to reach the eBPF programs and the eBPF map (kernel), and serves :9387/metrics.

  15. @leodido as seen on https://yaml.engineering
    Did y’all say
    Y’AML?!
    let’s put some ELF magic
    in it...
    🤯

  16. @leodido
    Compile and inspect
    This is important because it tells the loader to set the
    currently running kernel version!
    Tricky and controversial legal thing about
    licenses ...
    The bpf_prog_load() wrapper also has a license
    parameter to provide the license that applies to
    the eBPF program being loaded.
    Not a GPL-compatible license?
    The kernel won’t load your eBPF!
    Exceptions apply...
    eBPF
    Maps

  17. @leodido
    encoded ELF
    👀
    BPF custom
    resource
    👀
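
The slide shows an ELF object embedded, in encoded form, inside a BPF custom resource; the actual manifest is not in this transcript. The sketch below illustrates the general idea under explicit assumptions: base64 is assumed as the encoding, and the `apiVersion`, `kind`, and `spec.program` field names are invented for illustration and are not the kube-bpf schema.

```python
import base64

# Every ELF file starts with these four bytes.
ELF_MAGIC = b"\x7fELF"


def encode_elf(elf_bytes):
    """Return the object file as text that can sit inside a manifest."""
    if not elf_bytes.startswith(ELF_MAGIC):
        raise ValueError("not an ELF object")
    return base64.b64encode(elf_bytes).decode("ascii")


def build_resource(name, elf_bytes):
    """Build a manifest-like dict; all field names here are hypothetical."""
    return {
        "apiVersion": "example.invalid/v1",  # placeholder, not a real group
        "kind": "BPF",
        "metadata": {"name": name},
        "spec": {"program": encode_elf(elf_bytes)},
    }


def decode_program(resource):
    """What a runner would do before handing the bytes to a loader."""
    return base64.b64decode(resource["spec"]["program"])
```

Encoding the binary as text is what lets a compiled program travel through an API that stores structured data, at the cost of a larger manifest.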

  18. @leodido
    encoded ELF
    👀
    BPF custom
    resource
    👀

  19. @leodido
    Demo time
    Doing all the eBPF things, with YAML 💦

  20. @leodido

  21. @leodido
    ip-10-12-0-136.ec2.internal:9387/metrics
    # HELP test_packets No. of packets per protocol (key), node
    # TYPE test_packets counter
    test_packets{key="00001",node="127.0.0.1"} 8        # <- ICMP
    test_packets{key="00002",node="127.0.0.1"} 1        # <- IGMP
    test_packets{key="00006",node="127.0.0.1"} 551      # <- TCP
    test_packets{key="00008",node="127.0.0.1"} 1        # <- EGP
    test_packets{key="00017",node="127.0.0.1"} 15930    # <- UDP
    test_packets{key="00089",node="127.0.0.1"} 9        # <- OSPF
    test_packets{key="00233",node="127.0.0.1"} 1        # <- ?
    # EOF
    Check the protocol numbers 🔗
    It is a WIP project but already open source! 🎺
    Check it out @ gh:bpftools/kube-bpf 🔗
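
To illustrate how map contents could become the exposition text shown on this slide, here is a minimal rendering sketch. It reproduces the layout seen above (a zero-padded `key` label, a `node` label, a trailing `# EOF`), but `render_counter` is a hypothetical helper and not the kube-bpf implementation.

```python
def render_counter(name, help_text, samples, node):
    """Render {int key: int count} as exposition-format text."""
    lines = [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} counter",
    ]
    for key in sorted(samples):
        # Keys are zero-padded to five digits, as in the slide output.
        lines.append(f'{name}{{key="{key:05d}",node="{node}"}} {samples[key]}')
    lines.append("# EOF")
    return "\n".join(lines) + "\n"
```

For example, `render_counter("test_packets", "No. of packets per protocol (key), node", {6: 551, 17: 15930}, "127.0.0.1")` yields one sample line per protocol between the two comment lines and the `# EOF` marker.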

  22. @leodido
    # HELP test_dummy No. sys_enter_write calls per PID (key), node
    # TYPE test_dummy counter
    test_dummy{key="00001",node="127.0.0.1"} ...
    test_dummy{key="00001",node="127.0.0.1"} 8
    test_dummy{key="00295",node="127.0.0.1"} 1
    test_dummy{key="01278",node="127.0.0.1"} 1158
    test_dummy{key="04690",node="127.0.0.1"} 209
    test_dummy{key="04691",node="127.0.0.1"} 889
    # EOF
    It is a WIP project but already open source! 🎺
    Check it out @ gh:bpftools/kube-bpf 🔗
    ip-10-12-0-122.ec2.internal:9387/metrics
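
On the consuming side, output like the one on this slide can be read back with a few lines of code. The parser below is a deliberately simplified sketch: it handles only `name{labels} value` lines, skips comments and anything it cannot parse (such as the elided `...` sample), and ignores label escaping and timestamps. `parse_samples` is a hypothetical helper; a real setup would let Prometheus do the scraping.

```python
import re

# Matches lines such as: test_dummy{key="00295",node="127.0.0.1"} 1
SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric name, labels dict, float value)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines, HELP/TYPE comments, EOF marker
        match = SAMPLE.match(line)
        if match is None:
            continue
        try:
            value = float(match.group("value"))
        except ValueError:
            continue  # for instance an elided value
        labels = dict(LABEL.findall(match.group("labels")))
        samples.append((match.group("name"), labels, value))
    return samples
```

With the samples in hand, finding the process that issued the most write calls is a one-liner: `max(parse_samples(text), key=lambda s: s[2])`.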

  23. Packets everywhere

  24. View Slide

  25. @leodido
    It is a WIP project but already open source! 🎺
    Contributions are welcome! 🎊
    Check it out @ gh:bpftools/kube-bpf 🔗

  26. kubectl-trace
    More eBPF + Kubernetes?
    Run bpftrace program (from file)
    Ctrl-C tells the program to plot the results using hist()
    The output histogram
    Maps
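
bpftrace's hist() groups values into power-of-two buckets. The sketch below shows that bucketing in plain userspace code so the shape of the output histogram is easier to picture. The function names are hypothetical and the rendering is simplified; it does not reproduce bpftrace's exact output format.

```python
from collections import Counter


def log2_bucket(value):
    """Return (low, high) with low <= value < high and high a power of two."""
    if value < 0:
        raise ValueError("negative values are not handled in this sketch")
    if value == 0:
        return (0, 1)
    low = 1 << (value.bit_length() - 1)
    return (low, low * 2)


def hist(values):
    """Count how many values fall into each power-of-two bucket."""
    return Counter(log2_bucket(v) for v in values)


def render(buckets, width=20):
    """Draw one text bar per bucket, scaled to the fullest bucket."""
    if not buckets:
        return ""
    peak = max(buckets.values())
    lines = []
    for low, high in sorted(buckets):
        count = buckets[(low, high)]
        bar = "@" * max(1, count * width // peak)
        lines.append(f"[{low}, {high}) {count:>6} |{bar}")
    return "\n".join(lines)
```

Because the bucket boundaries double each time, a handful of buckets is enough to cover values that span several orders of magnitude.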

  27. Falco
    More eBPF + Kubernetes?

  28. Key takeaways
    • Prometheus exposition format is here to stay given how simple it is 📊
    • OpenMetrics will introduce improvements on such giant shoulders 📈
    • We cannot monitor and observe everything from inside our applications 🎯
    • We might want to have a look at the orchestrator (context) our apps live
    and die in 🕸
    • Kubernetes can be extended to achieve such levels of integration 🔌
    • ELF is cool 🧝
    • We look for better tools (eBPF) for grabbing our metrics and even more 🔮
    • Almost nullify the footprint ⚡
    • Enable a wider range of available data 🌊
    • Do not touch our applications directly 👻
    • There is a PoC doing some magic at github.com/bpftools/kube-bpf 🧞

  29. Acronyms & Abbreviations
    In case you wonder
    ABI: Application Binary Interface
    BPF: Berkeley Packet Filter
    CRD: Custom Resource Definition (Kubernetes)
    eBPF: extended Berkeley Packet Filter
    ELF: Executable and Linkable Format
    RISC: Reduced Instruction Set Computer
    VM: Virtual Machine

  30. Thanks.
    Reach out to me @leodido on Twitter & GitHub!
    SEE Y’ALL AROUND AT KUBECON
    Slides here.
