Slide 1

Slide 1 text

Prometheus as exposition format for eBPF programs running on Kubernetes
Leonardo Di Donato. Open Source Software Engineer @ Sysdig.
2019.09.19 - DevOpsDays - Istanbul, Turkey

Slide 2

Slide 2 text

whoami
Leonardo Di Donato. Maintainer of Falco. Creator of kubectl-trace, kube-bpf, kubectl-dig, and go-syslog.
Reach out to me @leodido on Twitter & GitHub.

Slide 3

Slide 3 text

Aggregate events at kernel level
Deal with just a few instead of thousands of them.
@leodido

Slide 4

Slide 4 text

eBPF superpowers
What if we could exploit the awesomeness of Prometheus (or OpenMetrics) without having to instrument each application we want to monitor? Can we use eBPF superpowers to avoid clogging our applications?

Slide 5

Slide 5 text

What eBPF is
extended because it's not just packets anymore
• You can now write mini programs that run on events like disk I/O. They run in a safe register-based VM inside the kernel, using a custom 64-bit RISC instruction set.
• The in-kernel verifier refuses to load eBPF programs that contain invalid pointer dereferences, exceed the maximum call stack, or contain a loop without an upper bound.
• Imposes a stable Application Binary Interface (ABI).
• Even more amazing than BPF 🚀
• A core part of the Linux kernel.

Slide 6

Slide 6 text

How does eBPF work?
Diagram: in user-space, the BPF source is compiled into a BPF ELF object and loaded through bpf(); in the kernel, the verifier checks the program before it attaches to a kprobe, uprobe, static tracepoint, perf event, XDP hook, or socket filter; BPF maps carry the maps data back to user-space.
Map commands: BPF_MAP_CREATE, BPF_MAP_LOOKUP_ELEM, BPF_MAP_UPDATE_ELEM, BPF_MAP_DELETE_ELEM, BPF_MAP_GET_NEXT_KEY
📎 http://bit.ly/bpf_map_types
Program types: BPF_PROG_TYPE_SOCKET_FILTER, BPF_PROG_TYPE_KPROBE, BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_RAW_TRACEPOINT, BPF_PROG_TYPE_XDP, BPF_PROG_TYPE_PERF_EVENT, BPF_PROG_TYPE_CGROUP_SKB, BPF_PROG_TYPE_CGROUP_SOCK, BPF_PROG_TYPE_SOCK_OPS, BPF_PROG_TYPE_SK_SKB, BPF_PROG_TYPE_SK_MSG, BPF_PROG_TYPE_SCHED_CLS, BPF_PROG_TYPE_SCHED_ACT
📎 http://bit.ly/bpf_prog_types
Manual pages: man 2 bpf, man 8 tc-bpf

Slide 7

Slide 7 text

Why use eBPF at all to trace userspace processes?
Advantages
• fully programmable
• event driven
• can trace everything in a system
• not limited to a specific application
• unified tracing interface for both kernel and userspace
• {k,u}probes, (dtrace) tracepoints and so on are also used by other tools
• minimal (negligible) performance impact
• attaches JIT-compiled native instrumentation code
• no long suspensions of execution
Disadvantages
• requires a fairly recent kernel
• definitely not for debugging
• no knowledge of the calling higher-level language implementation
• not fully running in user space
• kernel-user context switch (usually negligible) when eBPF instruments a user process
• still not as portable as other tracers
• VM primarily developed in the Linux kernel (work-in-progress ports, by the way)

Slide 8

Slide 8 text

Count packets by protocol
Count sys_enter_write by process ID
Annotation: a macro generates sections inside the object file (later interpreted by the ELF BPF loader)

Slide 9

Slide 9 text

Why not instrument eBPF programs for Kubernetes?

Slide 10

Slide 10 text

Just use a sidecar container
• A sidecar container sharing the process namespace
• Image with eBPF loader + eBPF program in it
• Not a very generic approach, but it does the job! 🤔

Slide 11

Slide 11 text

Something more generic?
Like loading any eBPF program from its ELF using a Kubernetes CRD? 🤯
Grab metrics via eBPF and expose them through a Prometheus endpoint.
🔗 github.com/bpftools/kube-bpf

Slide 12

Slide 12 text

BPF operator for Kubernetes
Why don't we make eBPF programs look more YAML? ✌✌✌

Slide 13

Slide 13 text

Customize all the things
• Custom resources: an extension of the K8S API that lets you store and retrieve structured data. 📎 http://bit.ly/k8s_crd
• Shared informers: the actual control loop that watches the shared state using the workqueue. 📎 http://bit.ly/k8s_shared_informers
• Controllers: declare and specify the desired state of your resource, continuously trying to match it with the actual state. 📎 http://bit.ly/k8s_custom_controllers

Slide 14

Slide 14 text

Here's the evil plan
Diagram (the same layout is drawn twice, side by side): a BPF CRD feeds a BPF runner in user-space; the runner uses the bpf() syscall to load eBPF programs into the kernel, where they share an eBPF map; each BPF runner exposes :9387/metrics.
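The user-space half of this plan can be sketched in a few lines. This is a minimal illustration, not the kube-bpf implementation: `render_metrics`, `make_handler`, `serve`, and the `read_map` callback are hypothetical names, and a plain Python dict stands in for the eBPF map that a real runner would read through the bpf() syscall. The port and the `/metrics` path come from the slide; the label layout follows the sample output shown later in the deck.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(name, help_text, samples, node="127.0.0.1"):
    """Render {integer key: count} pairs as Prometheus text exposition lines."""
    lines = [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} counter",
    ]
    for key in sorted(samples):
        # Zero-padded keys mirror the style of the output shown in the talk.
        lines.append(f'{name}{{key="{key:05d}",node="{node}"}} {samples[key]}')
    # The slides end the output with "# EOF"; the Prometheus text format
    # treats it as an ordinary comment line.
    lines.append("# EOF")
    return "\n".join(lines) + "\n"


def make_handler(read_map):
    """Build a request handler; read_map is a hypothetical callback that
    returns the current map contents as a dict (assumption: a real runner
    would iterate the eBPF map instead)."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = render_metrics(
                "test_packets",
                "No. of packets per protocol (key), node",
                read_map(),
            ).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return MetricsHandler


def serve(read_map, port=9387):
    """Block forever, serving /metrics on the port used in the slides."""
    HTTPServer(("", port), make_handler(read_map)).serve_forever()
```

Calling `serve(lambda: {6: 551, 17: 15930})` would expose two samples at `:9387/metrics`. Rendering on every scrape keeps the runner stateless: the counters live only in the map.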

Slide 15

Slide 15 text

Did y'all say Y'AML?!
Let's put some ELF magic in it... 🤯
as seen on https://yaml.engineering

Slide 16

Slide 16 text

Compile and inspect
• This is important because it tells the loader to set the version of the currently running kernel!
• Tricky and controversial legal thing about licenses ... The bpf_prog_load() wrapper also has a license parameter to provide the license that applies to the eBPF program being loaded. Not a GPL-compatible license? The kernel won't load your eBPF! Exceptions apply...
• eBPF Maps

Slide 17

Slide 17 text

BPF custom resource 👀
encoded ELF 👀
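To make the idea of an ELF embedded in a custom resource concrete, here is a rough sketch. Everything about the resource layout is an assumption for illustration: the `apiVersion`, the `kind`, the `spec.program.encodedElf` field, and the choice of base64 are hypothetical and are not taken from the kube-bpf CRD schema. The manifest is emitted as JSON, which YAML parsers also accept, so the sketch needs only the standard library.

```python
import base64
import json

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def build_bpf_resource(name, elf_bytes):
    """Wrap a compiled eBPF object in a custom-resource-shaped dict.

    All field names below are hypothetical placeholders.
    """
    if elf_bytes[:4] != ELF_MAGIC:
        raise ValueError("not an ELF object")
    return {
        "apiVersion": "example.invalid/v1alpha1",  # hypothetical group/version
        "kind": "BPF",  # hypothetical kind name
        "metadata": {"name": name},
        "spec": {
            "program": {
                # Assumption: the ELF travels as base64 text inside the resource.
                "encodedElf": base64.b64encode(elf_bytes).decode("ascii"),
            }
        },
    }


def to_manifest(resource):
    """Serialize the resource as JSON (a valid YAML document as well)."""
    return json.dumps(resource, indent=2)
```

A runner on the receiving side would reverse the step with `base64.b64decode` before handing the bytes to its ELF loader.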

Slide 18

Slide 18 text

BPF custom resource 👀
encoded ELF 👀

Slide 19

Slide 19 text

Demo time
Doing all the eBPF things, with YAML 💦

Slide 20

Slide 20 text

@leodido

Slide 21

Slide 21 text

ip-10-12-0-136.ec2.internal:9387/metrics
# HELP test_packets No. of packets per protocol (key), node
# TYPE test_packets counter
test_packets{key="00001",node="127.0.0.1"} 8 # <- ICMP
test_packets{key="00002",node="127.0.0.1"} 1 # <- IGMP
test_packets{key="00006",node="127.0.0.1"} 551 # <- TCP
test_packets{key="00008",node="127.0.0.1"} 1 # <- EGP
test_packets{key="00017",node="127.0.0.1"} 15930 # <- UDP
test_packets{key="00089",node="127.0.0.1"} 9 # <- OSPF
test_packets{key="00233",node="127.0.0.1"} 1 # <- ?
# EOF
Check the protocol numbers 🔗
It is a WIP project but already open source! 🎺
Check it out @ gh:bpftools/kube-bpf 🔗
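The protocol lookup hinted at on this slide can be automated on the consumer side. The sketch below assumes plain sample lines in exactly the `key`/`node` label layout shown above, without the slide's arrow annotations; it is not a general exposition-format parser. `packets_by_protocol` and the `PROTOCOLS` table are hypothetical names, and the table holds only the protocol numbers named on the slide.

```python
import re

# Protocol numbers named on the slide; anything else is reported as unknown.
PROTOCOLS = {1: "ICMP", 2: "IGMP", 6: "TCP", 8: "EGP", 17: "UDP", 89: "OSPF"}

# Matches lines such as: test_packets{key="00006",node="127.0.0.1"} 551
SAMPLE = re.compile(
    r'^(?P<name>\w+)\{key="(?P<key>\d+)",node="(?P<node>[^"]*)"\}\s+(?P<value>\d+)$'
)


def packets_by_protocol(text):
    """Return {protocol label: packet count} from exposition text."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP, TYPE, EOF and other comment lines
        match = SAMPLE.match(line)
        if match is None:
            continue  # ignore anything that is not in the expected layout
        number = int(match.group("key"))
        label = PROTOCOLS.get(number, f"unknown({number})")
        counts[label] = counts.get(label, 0) + int(match.group("value"))
    return counts
```

Keys missing from the table, such as 233 in the slide, come back labeled `unknown(233)` instead of being dropped, which keeps surprising traffic visible.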

Slide 22

Slide 22 text

ip-10-12-0-122.ec2.internal:9387/metrics
# HELP test_dummy No. sys_enter_write calls per PID (key), node
# TYPE test_dummy counter
test_dummy{key="00001",node="127.0.0.1"} ...
test_dummy{key="00001",node="127.0.0.1"} 8
test_dummy{key="00295",node="127.0.0.1"} 1
test_dummy{key="01278",node="127.0.0.1"} 1158
test_dummy{key="04690",node="127.0.0.1"} 209
test_dummy{key="04691",node="127.0.0.1"} 889
# EOF
It is a WIP project but already open source! 🎺
Check it out @ gh:bpftools/kube-bpf 🔗

Slide 23

Slide 23 text

Packets everywhere

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

It is a WIP project but already open source! 🎺
Contributions are welcome! 🎊
Check it out @ gh:bpftools/kube-bpf 🔗

Slide 26

Slide 26 text

More eBPF + Kubernetes?
kubectl-trace
• Run bpftrace program (from file)
• Ctrl-C tells the program to plot the results using hist()
• The output histogram
• Maps

Slide 27

Slide 27 text

More eBPF + Kubernetes?
Falco

Slide 28

Slide 28 text

Key takeaways
• Prometheus exposition format is here to stay given how simple it is 📊
• OpenMetrics will introduce improvements on such giant shoulders 📈
• We cannot monitor and observe everything from inside our applications 🎯
• We might want to have a look at the orchestrator (context) our apps live and die in 🕸
• Kubernetes can be extended to achieve such levels of integration 🔌
• ELF is cool 🧝
• We look for better tools (eBPF) for grabbing our metrics and even more 🔮
  • Almost nullify footprint ⚡
  • Enable a wider range of available data 🌊
  • Do not touch our applications directly 👻
• There is a PoC doing some magic at github.com/bpftools/kube-bpf 🧞

Slide 29

Slide 29 text

Acronyms & Abbreviations
In case you wonder
• ABI: Application Binary Interface
• BPF: Berkeley Packet Filter
• CRD: Custom Resource Definition (Kubernetes)
• eBPF: extended Berkeley Packet Filter
• ELF: Executable and Linkable Format
• RISC: Reduced Instruction Set Computer
• VM: Virtual Machine

Slide 30

Slide 30 text

Thanks. Reach out to me @leodido on Twitter & GitHub!
SEE Y'ALL AROUND AT KUBECON
Slides here.