
Prometheus as exposition format for eBPF programs running on Kubernetes


The kernel knows more than our programs. Stop bloating our applications with copy-and-paste instrumentation code for metrics. Let's go look under the hood!

Nowadays every application exposes its metrics via an HTTP endpoint that Prometheus can scrape. Nevertheless, this very common pattern, by definition, only exposes metrics about the specific application being observed.

This talk, and its companion slides, presents the idea, and a reference implementation (https://github.com/bpftools/kube-bpf), of using eBPF programs to collect and automatically expose application and kernel metrics via a Prometheus endpoint.

It walks through the architecture of the proposed reference implementation - a Kubernetes operator with a custom resource for eBPF programs - and finally links to a simple demo showing how to use it to grab and present some metrics without touching any application running on the demo cluster.

---

Talk given at Cloud_Native Rejekts EU - Barcelona, Spain - on May 18th, 2019

Leonardo Di Donato

May 18, 2019



Transcript

  1. Prometheus as exposition
    format for eBPF programs
    running on k8s
    Leonardo Di Donato. Open Source Software Engineer @ Sysdig.
    2019.05.18 - Cloud_Native Rejekts EU - Barcelona, Spain


  2. whoami
    Leonardo Di Donato.
    Maintainer of Falco.
    Creator of kubectl-trace and go-syslog.
    Reach out to me @leodido.


  3. @leodido
    The missing buzzwords
    Monitoring
    • Old buzzword.
    • Is this SNMP?
    • Focus on collecting, persisting, and alerting on just any data!
    • It might also become simply garbage.
    • Data lake.
    • Doing it well requires a strategy.
    • Uninformed monitoring equals hope.
    Observability
    • The ability of a system to give humans insights.
    • Humans can observe, understand, and act on the presented state of an observable system.
    • The ability to make deductions about internal state only by looking at the boundaries (inputs vs outputs).
    • Never truly achieved: an ongoing process and mindset.
    • Avoid black-box data. Extract fine-grained and meaningful data.
    Wait, another really cool buzzword is Tracing!


  4. @leodido
    The journey so far
    Before Prometheus
    • Monitoring landscape very fragmented
    • Many solutions, often with ancient tech
    • Proprietary data formats
    • often not completely implemented, or undocumented, or ...
    • Hierarchical data models
    • Metrics? W00t?
    But there’s a thing ...
    After Prometheus
    • De-facto standard
    • Cloud-native metric monitoring
    • Ease of use
    • Explosion of /metrics endpoints


  5. What if we could exploit the Prometheus
    (or OpenMetrics) exposition format’s
    awesomeness without having to manually
    instrument every single application?
    Can we avoid clogging our applications
    by using eBPF superpowers?
    eBPF superpowers
    @leodido


  6. What eBPF is
    You can now write mini programs that run on events like disk I/O,
    executed in a safe virtual machine inside the kernel.
    The in-kernel verifier refuses to load eBPF programs with invalid
    pointer dereferences, calls exceeding the maximum stack depth, or
    loops without an upper bound.
    It imposes a stable Application Binary Interface (ABI).
    BPF on steroids.
    A core part of the Linux kernel.
    @leodido
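    To make "mini programs that run on events" concrete, here is a minimal sketch of the
    kind of program the demo later exposes as test_dummy: it counts sys_enter_write calls
    per process ID into an eBPF map. It is not taken from the kube-bpf sources; the SEC()
    section names, the hand-rolled struct bpf_map_def layout, and the identifiers
    (writes, count_writes) are assumptions following gobpf-style ELF loader conventions.

    // count_writes.c — hedged sketch of a per-PID sys_enter_write counter.
    // Build with: clang -O2 -target bpf -c count_writes.c -o count_writes.o
    #include <linux/bpf.h>

    #define SEC(NAME) __attribute__((section(NAME), used))

    // Helper "stubs": the numeric helper IDs are resolved by the kernel at load time.
    static void *(*bpf_map_lookup_elem)(void *map, const void *key) =
        (void *)BPF_FUNC_map_lookup_elem;
    static long (*bpf_map_update_elem)(void *map, const void *key,
                                       const void *value, unsigned long long flags) =
        (void *)BPF_FUNC_map_update_elem;
    static unsigned long long (*bpf_get_current_pid_tgid)(void) =
        (void *)BPF_FUNC_get_current_pid_tgid;

    // Map definition layout expected by gobpf-style ELF loaders (assumption).
    struct bpf_map_def {
        unsigned int type;
        unsigned int key_size;
        unsigned int value_size;
        unsigned int max_entries;
        unsigned int map_flags;
    };

    // Hash map: key = process ID, value = number of sys_enter_write events seen.
    SEC("maps/writes")
    struct bpf_map_def writes = {
        .type        = BPF_MAP_TYPE_HASH,
        .key_size    = sizeof(unsigned int),
        .value_size  = sizeof(unsigned long long),
        .max_entries = 1024,
    };

    // Attached by the loader to the syscalls:sys_enter_write tracepoint.
    SEC("tracepoint/syscalls/sys_enter_write")
    int count_writes(void *ctx)
    {
        unsigned int pid = bpf_get_current_pid_tgid() >> 32; // upper 32 bits: tgid (process ID)
        unsigned long long one = 1, *count;

        count = bpf_map_lookup_elem(&writes, &pid);
        if (count)
            __sync_fetch_and_add(count, 1); // atomic in-place increment (BPF_XADD)
        else
            bpf_map_update_elem(&writes, &pid, &one, BPF_ANY);
        return 0;
    }

    char _license[] SEC("license") = "GPL";

    The verifier-relevant part is what the program does not do: no unbounded loops, no raw
    pointer arithmetic outside map values, and every map lookup is checked before use.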


  7. @leodido
    How does eBPF work?
    A userspace program drives everything through the bpf() syscall: it loads eBPF
    programs into the kernel and creates, reads, and updates the eBPF maps shared
    across the user-space/kernel boundary (see the user-space sketch right after this slide).
    eBPF map commands:
    BPF_MAP_CREATE
    BPF_MAP_LOOKUP_ELEM
    BPF_MAP_UPDATE_ELEM
    BPF_MAP_DELETE_ELEM
    BPF_MAP_GET_NEXT_KEY
    http://bit.ly/bpf_map_types
    eBPF program types:
    BPF_PROG_TYPE_SOCKET_FILTER
    BPF_PROG_TYPE_KPROBE
    BPF_PROG_TYPE_TRACEPOINT
    BPF_PROG_TYPE_RAW_TRACEPOINT
    BPF_PROG_TYPE_XDP
    BPF_PROG_TYPE_PERF_EVENT
    BPF_PROG_TYPE_CGROUP_SKB
    BPF_PROG_TYPE_CGROUP_SOCK
    BPF_PROG_TYPE_SOCK_OPS
    BPF_PROG_TYPE_SK_SKB
    BPF_PROG_TYPE_SK_MSG
    BPF_PROG_TYPE_SCHED_CLS
    BPF_PROG_TYPE_SCHED_ACT
    http://bit.ly/bpf_prog_types
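    All of the map commands above (and program loading too) are multiplexed through the
    single bpf(2) syscall. Below is a minimal user-space sketch, not taken from kube-bpf,
    that creates a hash map and exercises BPF_MAP_UPDATE_ELEM and BPF_MAP_LOOKUP_ELEM; it
    needs root (or CAP_SYS_ADMIN) to succeed.

    // bpf_map_demo.c — drive BPF_MAP_CREATE / UPDATE / LOOKUP from user-space.
    #include <linux/bpf.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    // glibc ships no wrapper for bpf(2), so go through syscall(2) directly.
    static long bpf(int cmd, union bpf_attr *attr, unsigned int size)
    {
        return syscall(__NR_bpf, cmd, attr, size);
    }

    int main(void)
    {
        // BPF_MAP_CREATE: hash map keyed by u32 (e.g. an IP protocol number), u64 counters.
        union bpf_attr create = {
            .map_type    = BPF_MAP_TYPE_HASH,
            .key_size    = sizeof(uint32_t),
            .value_size  = sizeof(uint64_t),
            .max_entries = 256,
        };
        int map_fd = bpf(BPF_MAP_CREATE, &create, sizeof(create));
        if (map_fd < 0) {
            perror("BPF_MAP_CREATE");
            return 1;
        }

        // BPF_MAP_UPDATE_ELEM: store value 42 under key 6 (TCP).
        uint32_t key = 6;
        uint64_t val = 42, out = 0;
        union bpf_attr update = {
            .map_fd = (uint32_t)map_fd,
            .key    = (uint64_t)(unsigned long)&key,
            .value  = (uint64_t)(unsigned long)&val,
            .flags  = BPF_ANY,
        };
        if (bpf(BPF_MAP_UPDATE_ELEM, &update, sizeof(update)) < 0)
            perror("BPF_MAP_UPDATE_ELEM");

        // BPF_MAP_LOOKUP_ELEM: the kernel copies the value back into `out`.
        union bpf_attr lookup = {
            .map_fd = (uint32_t)map_fd,
            .key    = (uint64_t)(unsigned long)&key,
            .value  = (uint64_t)(unsigned long)&out,
        };
        if (bpf(BPF_MAP_LOOKUP_ELEM, &lookup, sizeof(lookup)) == 0)
            printf("key=%u value=%llu\n", key, (unsigned long long)out);

        return 0;
    }

    An exporter like the BPF runner in this deck can periodically walk such a map with
    BPF_MAP_GET_NEXT_KEY plus BPF_MAP_LOOKUP_ELEM and turn each key/value pair into a
    Prometheus sample.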


  8. Why use eBPF at all to trace userspace processes?
    Advantages
    • fully programmable
    • can trace everything in a system
    • not limited to a specific application
    • unified tracing interface for both kernel and userspace
    • [k,u]probes, (dtrace)tracepoints and so on are also used by other tools
    • minimal (negligible) performance impact
    • attaches JIT-compiled native instrumentation code
    • no long suspensions of execution
    Disadvantages
    • requires a fairly recent kernel
    • definitely not for debugging
    • no knowledge of the calling higher-level language implementation
    • not fully running in user space
    • kernel-user context switch (usually negligible) when eBPF instruments a user process
    • still not as portable as other tracers
    • the VM is primarily developed in the Linux kernel (work-in-progress ports, btw)


  9. @leodido
    BPF operator for
    Kubernetes
    Why don’t we make eBPF programs look
    more like YAML? ✌✌✌


  10. Customize all the things
    Custom resources
    An extension of the K8S API that lets you store and retrieve structured data.
    http://bit.ly/k8s_crd
    Shared informers
    The actual control loop that watches the shared state using the workqueue.
    http://bit.ly/k8s_shared_informers
    Controllers
    Declare and specify the desired state of your resource, continuously trying to reconcile it with the actual state.
    http://bit.ly/k8s_custom_controllers


  11. @leodido
    Here’s the evil plan
    [Architecture diagram] A BPF custom resource (CRD) is picked up by a BPF runner on
    every node; each runner loads the eBPF programs into the kernel through the bpf()
    syscall, reads the resulting eBPF maps from user-space, and exposes their contents
    on :9387/metrics on each node.


  12. @leodido
    Did y’all say
    Y’AML?!
    let’s put some ELF magic
    in it...


  13. @leodido
    Two example programs: count packets by protocol, and count sys_enter_write calls by process ID.
    A macro generates the sections inside the object file (later interpreted by the ELF BPF loader).
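    The code screenshots did not survive the transcript, so here is a hedged reconstruction
    of the first example ("count packets by protocol") in the same spirit: restricted C,
    with the SEC() macro placing the map and the program into dedicated ELF sections that
    the ELF BPF loader walks later. Section names, the struct bpf_map_def layout, and the
    identifiers (packets, count_packets) are assumptions in gobpf style, not the actual
    kube-bpf example source.

    // count_packets.c — sketch of a socket-filter program counting packets per IP protocol.
    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/ip.h>
    #include <stddef.h>

    // The macro the slide refers to: it pins a symbol into a named ELF section.
    #define SEC(NAME) __attribute__((section(NAME), used))

    static void *(*bpf_map_lookup_elem)(void *map, const void *key) =
        (void *)BPF_FUNC_map_lookup_elem;
    static long (*bpf_map_update_elem)(void *map, const void *key,
                                       const void *value, unsigned long long flags) =
        (void *)BPF_FUNC_map_update_elem;
    // LLVM builtin emitting the classic BPF "load byte from packet" instruction.
    static unsigned long long (*load_byte)(void *skb, unsigned long long off)
        asm("llvm.bpf.load.byte");

    struct bpf_map_def {
        unsigned int type, key_size, value_size, max_entries, map_flags;
    };

    // Hash map: key = IP protocol number, value = packet count (becomes test_packets).
    SEC("maps/packets")
    struct bpf_map_def packets = {
        .type        = BPF_MAP_TYPE_HASH,
        .key_size    = sizeof(unsigned int),
        .value_size  = sizeof(unsigned long long),
        .max_entries = 256,
    };

    SEC("socket/count_packets")
    int count_packets(struct __sk_buff *skb)
    {
        // Protocol byte of the IPv4 header, right after the Ethernet header.
        unsigned int proto = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
        unsigned long long one = 1, *count;

        count = bpf_map_lookup_elem(&packets, &proto);
        if (count)
            __sync_fetch_and_add(count, 1);
        else
            bpf_map_update_elem(&packets, &proto, &one, BPF_ANY);

        return 0; // a socket filter returns how many bytes to pass to the socket; keep none
    }

    char _license[] SEC("license") = "GPL";

    The second example follows the same pattern, keyed by the process ID from
    bpf_get_current_pid_tgid() on the sys_enter_write tracepoint (see the sketch after slide 6).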


  14. @leodido
    Compile and inspect
    This is important because it communicates the currently running kernel version!
    Tricky and controversial legal thing about licenses ...
    The bpf_prog_load() wrapper also has a license parameter to provide the license that applies to the eBPF program being loaded.
    Not a GPL-compatible license? The kernel won’t load your eBPF!
    Exceptions apply ...
    eBPF maps
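    As a concrete illustration of the two special ELF sections this slide is about, the
    snippet below shows the license and version entries, with the build-and-inspect
    commands as comments. Treat the 0xFFFFFFFE value as an assumption: it is a convention
    used by some ELF loaders (gobpf-style) meaning "substitute the currently running kernel
    version", and may not apply to every loader.

    // Appended to the eBPF C source; compile and inspect the resulting object:
    //   clang -O2 -target bpf -c program.c -o program.o
    //   readelf -S program.o        # look for the maps/…, license, and version sections
    #define SEC(NAME) __attribute__((section(NAME), used))

    // License: helpers marked GPL-only are refused to programs without a GPL-compatible license.
    char _license[] SEC("license") = "GPL";

    // Kernel version: older kernels require kprobe programs to declare the version they target.
    unsigned int _version SEC("version") = 0xFFFFFFFE;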


  15. @leodido


  16. @leodido


  17. @leodido
    Demo time
    Doing all the BPF things, with YAML


  18. @leodido
    asciinema


  19. @leodido
    ip-10-12-0-136.ec2.internal:9387/metrics
    # HELP test_packets No. of packets per protocol (key), node
    # TYPE test_packets counter
    test_packets{key="00001",node="127.0.0.1"} 8      # <- ICMP
    test_packets{key="00002",node="127.0.0.1"} 1      # <- IGMP
    test_packets{key="00006",node="127.0.0.1"} 551    # <- TCP
    test_packets{key="00008",node="127.0.0.1"} 1      # <- EGP
    test_packets{key="00017",node="127.0.0.1"} 15930  # <- UDP
    test_packets{key="00089",node="127.0.0.1"} 9      # <- OSPF
    test_packets{key="00233",node="127.0.0.1"} 1      # <- ?
    # EOF
    It is a WIP project but already open source!
    Check it out @ gh:bpftools/kube-bpf


  20. @leodido
    ip-10-12-0-122.ec2.internal:9387/metrics
    # HELP test_dummy No. sys_enter_write calls per PID (key), node
    # TYPE test_dummy counter
    test_dummy{key="00001",node="127.0.0.1"} 8
    test_dummy{key="00295",node="127.0.0.1"} 1
    test_dummy{key="01278",node="127.0.0.1"} 1158
    test_dummy{key="04690",node="127.0.0.1"} 209
    test_dummy{key="04691",node="127.0.0.1"} 889
    # EOF
    It is a WIP project but already open source!
    Check it out @ gh:bpftools/kube-bpf


  21. @leodido
    It is a WIP project but already open source!
    Check it out @ gh:bpftools/kube-bpf


  22. @leodido
    kubectl-trace
    More eBPF + k8s
    Run a bpftrace program (from a file).
    Ctrl-C tells the program to plot the results using hist().
    The output histogram.
    Maps


  23. @leodido
    Key takeaways
    • The Prometheus exposition format is here to stay, given how simple it is
    • OpenMetrics will build improvements on those giant shoulders
    • We cannot monitor and observe everything from inside our applications
    • We might want to have a look at the orchestrator (the context our apps live and die in)
    • Kubernetes can be extended to achieve this level of integration
    • ELF is cool
    • We look for better tools (eBPF) to grab our metrics, and even more:
    • almost zero footprint ⚡
    • a wider range of available data
    • no need to touch our applications directly
    • There is a PoC doing some magic at gh:bpftools/kube-bpf


  24. Thanks.
    Reach out to me @leodido on Twitter & GitHub!
    SEE Y’ALL AROUND AT KUBECON
    http://bit.ly/prometheus_ebpf_k8s
