Prometheus as exposition format for eBPF programs running on Kubernetes

The kernel knows more than our programs. Stop bloating our applications with copy-and-pasted instrumentation code for metrics. Let's go look under the hood!

Nowadays almost every application exposes its metrics via an HTTP endpoint scrapable by Prometheus. Nevertheless, this very common pattern by definition only exposes metrics about the specific application being observed.

This talk, and its companion slides, presents the idea, and a reference implementation (https://github.com/bpftools/kube-bpf), of using eBPF programs to collect and automatically expose application and kernel metrics via a Prometheus endpoint.

It walks through the architecture of the proposed reference implementation - a Kubernetes operator with a custom resource for eBPF programs - and finally links to a simple demo showing how to use it to grab and present some metrics without touching any application running on the demo cluster.

---

Talk given at Cloud_Native Rejekts EU - Barcelona, Spain - on May 18th, 2019

Leonardo Di Donato

May 18, 2019

Transcript

  1. Prometheus as exposition format for eBPF programs running on k8s

    Leonardo Di Donato. Open Source Software Engineer @ Sysdig. 2019.05.18 - Cloud_Native Rejekts EU - Barcelona, Spain
  2. @leodido Monitoring & Observability: the missing buzzwords

    Monitoring • Old buzzword. • Is this SNMP? • Focus on collecting, persisting, and alerting on just any data! • It might also become simply garbage. • Data lake. • Doing it well requires a strategy. • Uninformed monitoring equals hope. Wait, another really cool buzzword is Tracing! Observability • The ability of a system to give humans insights. • Humans can observe, understand, and act on the presented state of an observable system. • The ability to make deductions about the internal state only by looking at the boundaries (inputs vs outputs). • Never truly achieved. An ongoing process and mindset. • Avoid black-box data. Extract fine-grained and meaningful data.
  3. @leodido The journey so far

    Before Prometheus • Monitoring landscape very fragmented • Many solutions with ancient tech • Proprietary data formats, often not completely implemented, or undocumented, or ... • Hierarchical data models • Metrics? W00t? But there's a thing ... After Prometheus • De-facto standard • Cloud-native metric monitoring • Ease of use • Explosion of /metrics endpoints
  4. eBPF superpowers @leodido

    What if we could exploit the Prometheus (or OpenMetrics) exposition format's awesomeness without having to instrument every application one by one? Can we avoid clogging our applications, thanks to eBPF superpowers?
  5. What eBPF is @leodido

    A core part of the Linux kernel; BPF on steroids. You can now write mini programs that run on events like disk I/O, executed in a safe virtual machine in the kernel. The in-kernel verifier refuses to load eBPF programs with invalid pointer dereferences, programs exceeding the maximum call stack, or programs with a loop without an upper bound. It imposes a stable Application Binary Interface (ABI).
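
A minimal sketch of how those verifier constraints surface when writing an eBPF program in restricted C (the probe point, macro definition, and loop bound below are illustrative, not from the talk):

```c
/* Illustrative sketch of the verifier constraints mentioned above. */
#include <linux/ptrace.h>

/* Conventional macro placing a symbol in a named ELF section. */
#define SEC(name) __attribute__((section(name), used))

#define MAX_ITER 16 /* hypothetical bound */

SEC("kprobe/do_sys_open")
int sketch(struct pt_regs *ctx)
{
	int i, n = 0;

	/* Bounded (and, on kernels of that era, fully unrolled) loops
	 * pass the verifier... */
#pragma unroll
	for (i = 0; i < MAX_ITER; i++)
		n += i;

	/* ...while `while (1) {}` or dereferencing an unchecked pointer
	 * makes the in-kernel verifier refuse to load the program. */
	return n > 0;
}
```
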
  6. @leodido How does eBPF work?

    A userspace program drives everything through the bpf() syscall: it creates and reads the eBPF maps (BPF_MAP_CREATE, BPF_MAP_LOOKUP_ELEM, BPF_MAP_UPDATE_ELEM, BPF_MAP_DELETE_ELEM, BPF_MAP_GET_NEXT_KEY - http://bit.ly/bpf_map_types) shared with the eBPF programs running kernel-side. Program types include BPF_PROG_TYPE_SOCKET_FILTER, BPF_PROG_TYPE_KPROBE, BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_RAW_TRACEPOINT, BPF_PROG_TYPE_XDP, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_CGROUP_SKB, BPF_PROG_TYPE_CGROUP_SOCK, BPF_PROG_TYPE_SOCK_OPS, BPF_PROG_TYPE_SK_SKB, BPF_PROG_TYPE_SK_MSG, BPF_PROG_TYPE_SCHED_CLS, and BPF_PROG_TYPE_SCHED_ACT - http://bit.ly/bpf_prog_types
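
The user-space half of that picture boils down to the bpf(2) syscall; a minimal sketch of driving a map with it (glibc exposes no wrapper, so it goes through syscall(2); the helper names are mine):

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Thin wrapper: glibc exposes no bpf() function, only the syscall number. */
static long sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr, unsigned int size)
{
	return syscall(__NR_bpf, cmd, attr, size);
}

/* BPF_MAP_LOOKUP_ELEM: copy the value stored under *key in map `fd`
 * into *value. The other map commands follow the same pattern. */
static int bpf_map_lookup(int fd, const void *key, void *value)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = fd;
	attr.key    = (uint64_t)(unsigned long)key;
	attr.value  = (uint64_t)(unsigned long)value;

	return sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
}
```
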
  7. @leodido Why use eBPF at all to trace userspace processes?

    Advantages • fully programmable • can trace everything in a system • not limited to a specific application • unified tracing interface for both kernel and userspace • [k,u]probes, (dtrace) tracepoints, and so on are also used by other tools • minimal (negligible) performance impact • attaches JIT-compiled native instrumentation code • no long suspensions of execution. Disadvantages • requires a fairly recent kernel • definitely not for debugging • no knowledge of the calling higher-level language implementation • not fully running in user space: a (usually negligible) kernel-user context switch happens when eBPF instruments a user process • still not as portable as other tracers • VM primarily developed in the Linux kernel (work-in-progress ports, btw)
  8. Customize all the things

    Custom resources: an extension of the K8S API that lets you store and retrieve structured data (http://bit.ly/k8s_crd). Shared informers: the actual control loop that watches the shared state using the workqueue (http://bit.ly/k8s_shared_informers). Controllers: declare and specify the desired state of your resource, continuously trying to match it with the actual state (http://bit.ly/k8s_custom_controllers).
  9. @leodido Here's the evil plan

    On each node, a BPF runner loads the eBPF programs declared by a BPF CRD through the bpf() syscall, reads their eBPF maps back from the kernel, and serves the values at :9387/metrics. A sketch of such a custom resource follows.
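
A hypothetical manifest for such a custom resource might look like this (the API group, kind, and field names below are illustrative, not the exact kube-bpf schema; see gh:bpftools/kube-bpf for the real one):

```yaml
# Hypothetical sketch of a BPF custom resource.
apiVersion: bpf.example.com/v1alpha1
kind: BPF
metadata:
  name: packet-counter
spec:
  # Pre-compiled eBPF ELF object containing programs, maps, license, version.
  object: https://example.com/bpf/packets.o
```
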
  10. @leodido Count packets by protocol. Count sys_enter_write by process ID.

    A macro generates the sections inside the object file (later interpreted by the ELF BPF loader); see the sketch below.
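
A sketch of the packet-counter program in the style of the kernel's classic socket-filter samples (the map layout, names, and section strings are illustrative; SEC() is the conventional form of the macro the slide refers to):

```c
#include <stddef.h>
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>

/* The macro from the slide: it pins each symbol into its own ELF section,
 * which the ELF BPF loader later walks to find maps and programs. */
#define SEC(name) __attribute__((section(name), used))

/* Loader-side map definition, as used by the kernel samples. */
struct bpf_map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

struct bpf_map_def SEC("maps") packets = {
	.type        = BPF_MAP_TYPE_ARRAY,
	.key_size    = sizeof(__u32),
	.value_size  = sizeof(long),
	.max_entries = 256,          /* one slot per IP protocol number */
};

/* eBPF helpers are called by well-known ID. */
static void *(*bpf_map_lookup_elem)(void *map, const void *key) =
	(void *)BPF_FUNC_map_lookup_elem;

/* LLVM intrinsic for the classic BPF-style absolute packet load. */
unsigned long long load_byte(void *skb, unsigned long long off)
	asm("llvm.bpf.load.byte");

SEC("socket")
int count_packets(struct __sk_buff *skb)
{
	__u32 proto = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
	long *value = bpf_map_lookup_elem(&packets, &proto);

	if (value)
		__sync_fetch_and_add(value, 1); /* per-protocol packet count */
	return 0;
}
```
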
  11. @leodido Compile and inspect

    This is important because it communicates the current running kernel version! And there is a tricky and controversial legal thing about licenses ... The bpf_prog_load() wrapper also has a license parameter to provide the license that applies to the eBPF program being loaded. Not a GPL-compatible license? The kernel won't load your eBPF! Exceptions apply ... eBPF Maps
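
In the conventional kernel-samples form, those two details are just two more ELF sections in the object file (a sketch; the section names follow the loader convention):

```c
#include <linux/types.h>
#include <linux/version.h>

#define SEC(name) __attribute__((section(name), used))

/* Must match the running kernel for kprobe-type programs: this is the
 * "communicates the current running kernel version" bit above. */
__u32 _version SEC("version") = LINUX_VERSION_CODE;

/* The license string the kernel checks: a non-GPL-compatible license
 * keeps the program from using GPL-only helpers (with some exceptions). */
char _license[] SEC("license") = "GPL";
```
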
  12. @leodido ip-10-12-0-136.ec2.internal:9387/metrics

    # HELP test_packets No. of packets per protocol (key), node
    # TYPE test_packets counter
    test_packets{key="00001",node="127.0.0.1"} 8      # <- ICMP
    test_packets{key="00002",node="127.0.0.1"} 1      # <- IGMP
    test_packets{key="00006",node="127.0.0.1"} 551    # <- TCP
    test_packets{key="00008",node="127.0.0.1"} 1      # <- EGP
    test_packets{key="00017",node="127.0.0.1"} 15930  # <- UDP
    test_packets{key="00089",node="127.0.0.1"} 9      # <- OSPF
    test_packets{key="00233",node="127.0.0.1"} 1      # <- ?
    # EOF
    It is a WIP project but already open source! Check it out @ gh:bpftools/kube-bpf
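
A sketch of how a runner could produce exactly that output by walking the map with BPF_MAP_GET_NEXT_KEY (the function name is mine; the metric name and label mirror the sample above, and the map fd is assumed to be already at hand):

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

static long sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr)
{
	return syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/* Walk every key in the map and print one Prometheus sample per entry. */
void dump_test_packets(int map_fd, const char *node)
{
	__u32 key = (__u32)-1, next_key; /* a key not in the map -> first key */
	long value;
	union bpf_attr attr;

	printf("# HELP test_packets No. of packets per protocol (key), node\n");
	printf("# TYPE test_packets counter\n");

	for (;;) {
		memset(&attr, 0, sizeof(attr));
		attr.map_fd   = map_fd;
		attr.key      = (__u64)(unsigned long)&key;
		attr.next_key = (__u64)(unsigned long)&next_key;
		if (sys_bpf(BPF_MAP_GET_NEXT_KEY, &attr))
			break; /* no more keys */

		memset(&attr, 0, sizeof(attr));
		attr.map_fd = map_fd;
		attr.key    = (__u64)(unsigned long)&next_key;
		attr.value  = (__u64)(unsigned long)&value;
		if (!sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr) && value != 0)
			printf("test_packets{key=\"%05u\",node=\"%s\"} %ld\n",
			       next_key, node, value);

		key = next_key;
	}
	printf("# EOF\n");
}
```
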
  13. @leodido ip-10-12-0-122.ec2.internal:9387/metrics

    # HELP test_dummy No. sys_enter_write calls per PID (key), node
    # TYPE test_dummy counter
    test_dummy{key="00001",node="127.0.0.1"} 8
    test_dummy{key="00295",node="127.0.0.1"} 1
    test_dummy{key="01278",node="127.0.0.1"} 1158
    test_dummy{key="04690",node="127.0.0.1"} 209
    test_dummy{key="04691",node="127.0.0.1"} 889
    # EOF
    It is a WIP project but already open source! Check it out @ gh:bpftools/kube-bpf
  14. @leodido It is a WIP project but already open source!

    Check it out @ gh:bpftools/kube-bpf
  15. @leodido More eBPF + k8s: kubectl-trace

    Run a bpftrace program (from a file); Ctrl-C tells the program to plot the results using hist(); the output histogram; Maps. A usage sketch follows.
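
For instance, a one-liner form of that flow (the node name and probe are illustrative; the demo in the slides runs its bpftrace program from a file instead):

```sh
kubectl trace run ip-10-12-0-136.ec2.internal \
    -e 'tracepoint:syscalls:sys_enter_read { @bytes = hist(args->count); }'
# Ctrl-C stops the trace and makes hist() print its histogram.
```
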
  16. @leodido Key takeaways

    • The Prometheus exposition format is here to stay, given how simple it is • OpenMetrics will introduce improvements on such giant shoulders • We cannot monitor and observe everything from inside our applications • We might want to have a look at the orchestrator (the context) our apps live and die in • Kubernetes can be extended to achieve such levels of integration • ELF is cool • We look for better tools (eBPF) for grabbing our metrics and even more: • almost nullify the footprint ⚡ • enable a wider range of available data • do not touch our applications directly • There is a PoC doing some magic at gh:bpftools/kube-bpf
  17. Thanks. Reach out to me @leodido on Twitter & GitHub!

    SEE Y'ALL AROUND AT KUBECON http://bit.ly/prometheus_ebpf_k8s