Prometheus as exposition format for eBPF programs running on Kubernetes

The kernel knows more than our programs. Stop bloating our applications with copy-and-pasted instrumentation code for metrics. Let's go look under the hood!

Nowadays almost every application exposes its metrics via an HTTP endpoint scrapable by Prometheus. Nevertheless, this very common pattern by definition only exposes metrics about the specific application being observed.

This talk, and its companion slides, presents the idea, and a reference implementation (https://github.com/bpftools/kube-bpf), of using eBPF programs to collect and automatically expose application and kernel metrics via a Prometheus endpoint.

It walks through the architecture of the proposed reference implementation - a Kubernetes operator with a custom resource for eBPF programs - and finally links to a simple demo showing how to use it to grab and present some metrics without touching any application running on the demo cluster.

---

Talk given at Cloud_Native Rejekts EU - Barcelona, Spain - on May 18th, 2019

Leonardo Di Donato

May 18, 2019

Transcript

  1. Prometheus as exposition format for eBPF programs running on k8s

    Leonardo Di Donato. Open Source Software Engineer @ Sysdig. 2019.05.18 - Cloud_Native Rejekts EU - Barcelona, Spain
  2. @leodido Monitoring & Observability: the missing buzzwords

    Monitoring • Old buzzword. • Is this SNMP? • Focus on collecting, persisting, and alerting on just any data! • It might also become simply garbage. • Data lake. • Doing it well requires a strategy. • Uninformed monitoring equals hope. Wait, another really cool buzzword is Tracing! Observability • The ability of a system to give humans insights. • Humans can observe, understand, and act on the presented state of an observable system. • The ability to make deductions about the internal state only by looking at the boundaries (inputs vs outputs). • Never truly achieved. An ongoing process and mindset. • Avoid black-box data. Extract fine-grained and meaningful data.
  3. @leodido The journey so far

    Before Prometheus • Monitoring landscape very fragmented • Many solutions with ancient tech • Proprietary data formats, often not completely implemented, or undocumented, or ... • Hierarchical data models • Metrics? W00t? But there's a thing ... After Prometheus • De-facto standard • Cloud-native metric monitoring • Ease of use • Explosion of /metrics endpoints
  4. eBPF superpowers @leodido

    What if we could exploit the Prometheus (or OpenMetrics) exposition format's awesomeness without having to instrument every application one by one? Can we avoid clogging our applications, thanks to eBPF superpowers?
  5. What eBPF is @leodido

    A core part of the Linux kernel; BPF on steroids. You can now write mini programs that run on events like disk I/O, executed in a safe virtual machine in the kernel. The in-kernel verifier refuses to load eBPF programs with invalid pointer dereferences, programs exceeding the maximum call stack, or programs with a loop without an upper bound. It imposes a stable Application Binary Interface (ABI).
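
A minimal sketch of how those verifier constraints surface when writing an eBPF program in restricted C (the probe point, macro definition, and loop bound below are illustrative, not from the talk):

```c
/* Illustrative sketch of the verifier constraints mentioned above. */
#include <linux/ptrace.h>

/* Conventional macro placing a symbol in a named ELF section. */
#define SEC(name) __attribute__((section(name), used))

#define MAX_ITER 16 /* hypothetical bound */

SEC("kprobe/do_sys_open")
int sketch(struct pt_regs *ctx)
{
	int i, n = 0;

	/* Bounded (and, on kernels of that era, fully unrolled) loops
	 * pass the verifier... */
#pragma unroll
	for (i = 0; i < MAX_ITER; i++)
		n += i;

	/* ...while `while (1) {}` or dereferencing an unchecked pointer
	 * makes the in-kernel verifier refuse to load the program. */
	return n > 0;
}
```
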
  6. @leodido How does eBPF work?

    A userspace program drives everything through the bpf() syscall: it creates and reads the eBPF maps (BPF_MAP_CREATE, BPF_MAP_LOOKUP_ELEM, BPF_MAP_UPDATE_ELEM, BPF_MAP_DELETE_ELEM, BPF_MAP_GET_NEXT_KEY - http://bit.ly/bpf_map_types) shared with the eBPF programs running kernel-side. Program types include BPF_PROG_TYPE_SOCKET_FILTER, BPF_PROG_TYPE_KPROBE, BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_RAW_TRACEPOINT, BPF_PROG_TYPE_XDP, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_CGROUP_SKB, BPF_PROG_TYPE_CGROUP_SOCK, BPF_PROG_TYPE_SOCK_OPS, BPF_PROG_TYPE_SK_SKB, BPF_PROG_TYPE_SK_MSG, BPF_PROG_TYPE_SCHED_CLS, and BPF_PROG_TYPE_SCHED_ACT - http://bit.ly/bpf_prog_types
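
The user-space half of that picture boils down to the bpf(2) syscall; a minimal sketch of driving a map with it (glibc exposes no wrapper, so it goes through syscall(2); the helper names are mine):

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Thin wrapper: glibc exposes no bpf() function, only the syscall number. */
static long sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr, unsigned int size)
{
	return syscall(__NR_bpf, cmd, attr, size);
}

/* BPF_MAP_LOOKUP_ELEM: copy the value stored under *key in map `fd`
 * into *value. The other map commands follow the same pattern. */
static int bpf_map_lookup(int fd, const void *key, void *value)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = fd;
	attr.key    = (uint64_t)(unsigned long)key;
	attr.value  = (uint64_t)(unsigned long)value;

	return sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
}
```
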
  7. @leodido Why use eBPF at all to trace userspace processes?

    Advantages • fully programmable • can trace everything in a system • not limited to a specific application • unified tracing interface for both kernel and userspace • [k,u]probes, (dtrace) tracepoints, and so on are also used by other tools • minimal (negligible) performance impact • attaches JIT-compiled native instrumentation code • no long suspensions of execution. Disadvantages • requires a fairly recent kernel • definitely not for debugging • no knowledge of the calling higher-level language implementation • not fully running in user space: a (usually negligible) kernel-user context switch happens when eBPF instruments a user process • still not as portable as other tracers • VM primarily developed in the Linux kernel (work-in-progress ports, btw)
  8. Customize all the things

    Custom resources: an extension of the K8S API that lets you store and retrieve structured data (http://bit.ly/k8s_crd). Shared informers: the actual control loop that watches the shared state using the workqueue (http://bit.ly/k8s_shared_informers). Controllers: declare and specify the desired state of your resource, continuously trying to match it with the actual state (http://bit.ly/k8s_custom_controllers).
  9. @leodido Here's the evil plan

    On each node, a BPF runner loads the eBPF programs declared by a BPF CRD through the bpf() syscall, reads their eBPF maps back from the kernel, and serves the values at :9387/metrics. A sketch of such a custom resource follows.
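
A hypothetical manifest for such a custom resource might look like this (the API group, kind, and field names below are illustrative, not the exact kube-bpf schema; see gh:bpftools/kube-bpf for the real one):

```yaml
# Hypothetical sketch of a BPF custom resource.
apiVersion: bpf.example.com/v1alpha1
kind: BPF
metadata:
  name: packet-counter
spec:
  # Pre-compiled eBPF ELF object containing programs, maps, license, version.
  object: https://example.com/bpf/packets.o
```
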
  10. @leodido Count packets by protocol. Count sys_enter_write by process ID.

    A macro generates the sections inside the object file (later interpreted by the ELF BPF loader); see the sketch below.
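
A sketch of the packet-counter program in the style of the kernel's classic socket-filter samples (the map layout, names, and section strings are illustrative; SEC() is the conventional form of the macro the slide refers to):

```c
#include <stddef.h>
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>

/* The macro from the slide: it pins each symbol into its own ELF section,
 * which the ELF BPF loader later walks to find maps and programs. */
#define SEC(name) __attribute__((section(name), used))

/* Loader-side map definition, as used by the kernel samples. */
struct bpf_map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

struct bpf_map_def SEC("maps") packets = {
	.type        = BPF_MAP_TYPE_ARRAY,
	.key_size    = sizeof(__u32),
	.value_size  = sizeof(long),
	.max_entries = 256,          /* one slot per IP protocol number */
};

/* eBPF helpers are called by well-known ID. */
static void *(*bpf_map_lookup_elem)(void *map, const void *key) =
	(void *)BPF_FUNC_map_lookup_elem;

/* LLVM intrinsic for the classic BPF-style absolute packet load. */
unsigned long long load_byte(void *skb, unsigned long long off)
	asm("llvm.bpf.load.byte");

SEC("socket")
int count_packets(struct __sk_buff *skb)
{
	__u32 proto = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
	long *value = bpf_map_lookup_elem(&packets, &proto);

	if (value)
		__sync_fetch_and_add(value, 1); /* per-protocol packet count */
	return 0;
}
```
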
  11. @leodido Compile and inspect

    This is important because it communicates the current running kernel version! And there is a tricky and controversial legal thing about licenses ... The bpf_prog_load() wrapper also has a license parameter to provide the license that applies to the eBPF program being loaded. Not a GPL-compatible license? The kernel won't load your eBPF! Exceptions apply ... eBPF Maps
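
In the conventional kernel-samples form, those two details are just two more ELF sections in the object file (a sketch; the section names follow the loader convention):

```c
#include <linux/types.h>
#include <linux/version.h>

#define SEC(name) __attribute__((section(name), used))

/* Must match the running kernel for kprobe-type programs: this is the
 * "communicates the current running kernel version" bit above. */
__u32 _version SEC("version") = LINUX_VERSION_CODE;

/* The license string the kernel checks: a non-GPL-compatible license
 * keeps the program from using GPL-only helpers (with some exceptions). */
char _license[] SEC("license") = "GPL";
```
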
  12. @leodido ip-10-12-0-136.ec2.internal:9387/metrics

    # HELP test_packets No. of packets per protocol (key), node
    # TYPE test_packets counter
    test_packets{key="00001",node="127.0.0.1"} 8      # <- ICMP
    test_packets{key="00002",node="127.0.0.1"} 1      # <- IGMP
    test_packets{key="00006",node="127.0.0.1"} 551    # <- TCP
    test_packets{key="00008",node="127.0.0.1"} 1      # <- EGP
    test_packets{key="00017",node="127.0.0.1"} 15930  # <- UDP
    test_packets{key="00089",node="127.0.0.1"} 9      # <- OSPF
    test_packets{key="00233",node="127.0.0.1"} 1      # <- ?
    # EOF
    It is a WIP project but already open source! Check it out @ gh:bpftools/kube-bpf
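
A sketch of how a runner could produce exactly that output by walking the map with BPF_MAP_GET_NEXT_KEY (the function name is mine; the metric name and label mirror the sample above, and the map fd is assumed to be already at hand):

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

static long sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr)
{
	return syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/* Walk every key in the map and print one Prometheus sample per entry. */
void dump_test_packets(int map_fd, const char *node)
{
	__u32 key = (__u32)-1, next_key; /* a key not in the map -> first key */
	long value;
	union bpf_attr attr;

	printf("# HELP test_packets No. of packets per protocol (key), node\n");
	printf("# TYPE test_packets counter\n");

	for (;;) {
		memset(&attr, 0, sizeof(attr));
		attr.map_fd   = map_fd;
		attr.key      = (__u64)(unsigned long)&key;
		attr.next_key = (__u64)(unsigned long)&next_key;
		if (sys_bpf(BPF_MAP_GET_NEXT_KEY, &attr))
			break; /* no more keys */

		memset(&attr, 0, sizeof(attr));
		attr.map_fd = map_fd;
		attr.key    = (__u64)(unsigned long)&next_key;
		attr.value  = (__u64)(unsigned long)&value;
		if (!sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr) && value != 0)
			printf("test_packets{key=\"%05u\",node=\"%s\"} %ld\n",
			       next_key, node, value);

		key = next_key;
	}
	printf("# EOF\n");
}
```
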
  13. @leodido ip-10-12-0-122.ec2.internal:9387/metrics

    # HELP test_dummy No. sys_enter_write calls per PID (key), node
    # TYPE test_dummy counter
    test_dummy{key="00001",node="127.0.0.1"} 8
    test_dummy{key="00295",node="127.0.0.1"} 1
    test_dummy{key="01278",node="127.0.0.1"} 1158
    test_dummy{key="04690",node="127.0.0.1"} 209
    test_dummy{key="04691",node="127.0.0.1"} 889
    # EOF
    It is a WIP project but already open source! Check it out @ gh:bpftools/kube-bpf
  14. @leodido It is a WIP project but already open source!

    Check it out @ gh:bpftools/kube-bpf
  15. @leodido More eBPF + k8s: kubectl-trace

    Run a bpftrace program (from a file); Ctrl-C tells the program to plot the results using hist(); the output histogram; Maps. A usage sketch follows.
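
For instance, a one-liner form of that flow (the node name and probe are illustrative; the demo in the slides runs its bpftrace program from a file instead):

```sh
kubectl trace run ip-10-12-0-136.ec2.internal \
    -e 'tracepoint:syscalls:sys_enter_read { @bytes = hist(args->count); }'
# Ctrl-C stops the trace and makes hist() print its histogram.
```
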
  16. @leodido Key takeaways

    • The Prometheus exposition format is here to stay, given how simple it is • OpenMetrics will introduce improvements on such giant shoulders • We cannot monitor and observe everything from inside our applications • We might want to have a look at the orchestrator (the context) our apps live and die in • Kubernetes can be extended to achieve such levels of integration • ELF is cool • We look for better tools (eBPF) for grabbing our metrics and even more: • almost nullify the footprint ⚡ • enable a wider range of available data • do not touch our applications directly • There is a PoC doing some magic at gh:bpftools/kube-bpf
  17. Thanks. Reach out to me @leodido on Twitter & GitHub!

    SEE Y'ALL AROUND AT KUBECON http://bit.ly/prometheus_ebpf_k8s