Slide 1

Slide 1 text

Prometheus as exposition format for eBPF programs running on k8s
Leonardo Di Donato. Open Source Software Engineer @ Sysdig.
2019.05.18 - Cloud_Native Rejekts EU - Barcelona, Spain

Slide 2

Slide 2 text

whoami
Leonardo Di Donato. Maintainer of Falco. Creator of kubectl-trace and go-syslog. Reach out to me @leodido.

Slide 3

Slide 3 text

@leodido
The missing buzzwords
Monitoring
• Old buzzword.
• Is this SNMP?
• Focus on collecting, persisting, and alerting on just any data!
• It might also become simply garbage.
• Data lake.
• Doing it well requires a strategy.
• Uninformed monitoring equals hope.
Observability
• Ability of a system to give humans insights.
• Humans can observe, understand, and act on the presented state of an observable system.
• Ability to make deductions about internal state by looking only at the boundaries (inputs vs outputs).
• Never truly achieved. Ongoing process and mindset.
• Avoid black box data. Extract fine-grained and meaningful data.
Wait, another really cool buzzword is Tracing!

Slide 4

Slide 4 text

The journey so far
Before Prometheus
• Monitoring landscape very fragmented
• Many solutions
• with ancient tech
• Proprietary data formats
• often not completely impl. or undocumented or ...
• Hierarchical data models
• Metrics? W00t?
But there’s a thing ...
After Prometheus
• De-facto standard
• Cloud-native metric monitoring
• Ease of use
• Explosion of /metrics endpoints

Slide 5

Slide 5 text

eBPF superpowers
What if we could exploit the awesomeness of the Prometheus (or OpenMetrics) exposition format without having to instrument every application individually? Can we avoid clogging our applications by using eBPF superpowers?

Slide 6

Slide 6 text

What eBPF is
BPF on steroids. A core part of the Linux kernel.
You can now write mini programs that run on events like disk I/O and are executed in a safe virtual machine in the kernel.
The in-kernel verifier refuses to load eBPF programs with invalid pointer dereferences, programs that exceed the maximum call stack, or programs with a loop that has no upper bound.
It imposes a stable Application Binary Interface (ABI).

Slide 7

Slide 7 text

How does eBPF work?
[Diagram: a userspace program, the bpf() syscall, eBPF programs, and an eBPF map, laid out across the user-space/kernel boundary]
eBPF map (map types: http://bit.ly/bpf_map_types)
• BPF_MAP_CREATE
• BPF_MAP_LOOKUP_ELEM
• BPF_MAP_UPDATE_ELEM
• BPF_MAP_DELETE_ELEM
• BPF_MAP_GET_NEXT_KEY
eBPF program (program types: http://bit.ly/bpf_prog_types)
• BPF_PROG_TYPE_SOCKET_FILTER
• BPF_PROG_TYPE_KPROBE
• BPF_PROG_TYPE_TRACEPOINT
• BPF_PROG_TYPE_RAW_TRACEPOINT
• BPF_PROG_TYPE_XDP
• BPF_PROG_TYPE_PERF_EVENT
• BPF_PROG_TYPE_CGROUP_SKB
• BPF_PROG_TYPE_CGROUP_SOCK
• BPF_PROG_TYPE_SOCK_OPS
• BPF_PROG_TYPE_SK_SKB
• BPF_PROG_TYPE_SK_MSG
• BPF_PROG_TYPE_SCHED_CLS
• BPF_PROG_TYPE_SCHED_ACT
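The five map commands listed on this slide can be illustrated with a toy model. The sketch below is only a dict-backed stand-in written for illustration: the class name ToyBpfMap and the helper dump are hypothetical, nothing here calls the real bpf() syscall, and the error types are assumptions rather than the kernel's errno values. It shows the usual userspace pattern of walking a map with "get next key" followed by "lookup".

```python
class ToyBpfMap:
    """Dict-backed stand-in for an eBPF hash map (hypothetical, no syscalls)."""

    def __init__(self, max_entries):
        # Mirrors the idea of BPF_MAP_CREATE: a map has a fixed capacity.
        self.max_entries = max_entries
        self._data = {}

    def lookup_elem(self, key):
        # Mirrors BPF_MAP_LOOKUP_ELEM: None stands in for "no such entry".
        return self._data.get(key)

    def update_elem(self, key, value):
        # Mirrors BPF_MAP_UPDATE_ELEM: inserting into a full map fails.
        if key not in self._data and len(self._data) >= self.max_entries:
            raise OverflowError("map is full")
        self._data[key] = value

    def delete_elem(self, key):
        # Mirrors BPF_MAP_DELETE_ELEM: deleting a missing key is an error.
        if key not in self._data:
            raise KeyError(key)
        del self._data[key]

    def get_next_key(self, key=None):
        # Mirrors BPF_MAP_GET_NEXT_KEY: an unknown key restarts from the
        # first entry; None stands in for "no more keys".
        keys = list(self._data)
        if key is None or key not in self._data:
            return keys[0] if keys else None
        idx = keys.index(key) + 1
        return keys[idx] if idx < len(keys) else None


def dump(bpf_map):
    """Walk the whole map the way a userspace reader would."""
    out = {}
    key = bpf_map.get_next_key()
    while key is not None:
        out[key] = bpf_map.lookup_elem(key)
        key = bpf_map.get_next_key(key)
    return out
```

A reader that exports metrics would run something like dump() on every scrape and turn each key/value pair into one sample.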

Slide 8

Slide 8 text

Why use eBPF at all to trace userspace processes?
Advantages
• fully programmable
• can trace everything in a system
• not limited to a specific application
• unified tracing interface for both kernel and userspace
• [k,u]probes, (dtrace)tracepoints and so on are also used by other tools
• minimal (negligible) performance impact
• attaches JIT-compiled native instrumentation code
• no long suspensions of execution
Disadvantages
• requires a fairly recent kernel
• definitely not for debugging
• no knowledge of the calling higher-level language implementation
• not fully running in user space
• kernel-user context switch (usually negligible) when eBPF instruments a user process
• still not as portable as other tracers
• VM primarily developed in the Linux kernel (ports are a work in progress, by the way)

Slide 9

Slide 9 text

BPF operator for Kubernetes
Why don’t we make eBPF programs look more YAML? ✌✌✌

Slide 10

Slide 10 text

Customize all the things
Custom resources (http://bit.ly/k8s_crd)
An extension of the K8S API that lets you store and retrieve structured data.
Shared informers (http://bit.ly/k8s_shared_informers)
The actual control loop that watches the shared state using the workqueue.
Controllers (http://bit.ly/k8s_custom_controllers)
It declares and specifies the desired state of your resource, continuously trying to match it with the actual state.
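The "desired state vs actual state" idea behind a controller can be sketched in a few lines. This is a minimal illustration, not Kubernetes client code: the functions reconcile and apply_actions, the action names, and the program names in the usage note are all hypothetical, and both states are modelled as plain dicts mapping a program name to its spec.

```python
def reconcile(desired, actual):
    """Return the actions needed to make `actual` match `desired`."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("load", name))      # declared but not running
        elif actual[name] != spec:
            actions.append(("reload", name))    # running with a stale spec
    for name in sorted(actual):
        if name not in desired:
            actions.append(("unload", name))    # running but no longer declared
    return actions


def apply_actions(actions, desired, actual):
    """Apply the actions to a copy of `actual` and return the new state."""
    new_state = dict(actual)
    for verb, name in actions:
        if verb == "unload":
            del new_state[name]
        else:
            new_state[name] = desired[name]
    return new_state
```

With desired = {"pkts": "v2", "dummy": "v1"} and actual = {"pkts": "v1", "old": "v1"}, one pass loads "dummy", reloads "pkts", and unloads "old"; a second pass over the resulting state finds nothing left to do, which is what makes the loop safe to run continuously.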

Slide 11

Slide 11 text

Here’s the evil plan
[Diagram: a BPF CRD and two BPF runners; each runner uses the bpf() syscall to reach eBPF programs and an eBPF map across the user-space/kernel boundary, and each exposes :9387/metrics]

Slide 12

Slide 12 text

Did y’all say Y’AML?! Let’s put some ELF magic in it...

Slide 13

Slide 13 text

• Count packets by protocol
• Count sys_enter_write by process ID
• Macro to generate sections inside the object file (later interpreted by the ELF BPF loader)
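The code for these two programs is not part of the slide text. As a rough userspace model of the first one, "count packets by protocol", the sketch below reads the protocol field of an IPv4 header (byte offset 9) and increments a per-protocol counter. It is plain Python rather than an eBPF program; count_by_protocol and fake_ipv4_header are hypothetical names, and the packets are assumed to start at the IPv4 header.

```python
from collections import Counter

# The protocol field sits at byte offset 9 of an IPv4 header.
IPV4_PROTOCOL_OFFSET = 9


def count_by_protocol(ipv4_packets):
    """Count packets per IP protocol number, skipping truncated ones."""
    counts = Counter()
    for packet in ipv4_packets:
        if len(packet) <= IPV4_PROTOCOL_OFFSET:
            continue  # too short to contain the protocol byte
        counts[packet[IPV4_PROTOCOL_OFFSET]] += 1
    return counts


def fake_ipv4_header(protocol):
    """Build a 20-byte header with only the fields this sketch reads."""
    header = bytearray(20)
    header[0] = 0x45  # version 4, header length of 5 words
    header[IPV4_PROTOCOL_OFFSET] = protocol
    return bytes(header)
```

In the in-kernel version the counter would live in an eBPF map keyed by protocol number, so that userspace can read it later.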

Slide 14

Slide 14 text

Compile and inspect
• This is important because it communicates the need to set the currently running kernel version!
• Tricky and controversial legal thing about licenses ... The bpf_prog_load() wrapper also has a license parameter to provide the license that applies to the eBPF program being loaded. Not a GPL-compatible license? The kernel won’t load your eBPF! Exceptions apply...
• eBPF Maps

Slide 15

Slide 15 text

@leodido

Slide 16

Slide 16 text

@leodido

Slide 17

Slide 17 text

Demo time
Doing all the BPF things, with YAML

Slide 18

Slide 18 text

@leodido asciinema

Slide 19

Slide 19 text

ip-10-12-0-136.ec2.internal:9387/metrics
# HELP test_packets No. of packets per protocol (key), node
# TYPE test_packets counter
test_packets{key="00001",node="127.0.0.1"} 8  # <- ICMP
test_packets{key="00002",node="127.0.0.1"} 1  # <- IGMP
test_packets{key="00006",node="127.0.0.1"} 551  # <- TCP
test_packets{key="00008",node="127.0.0.1"} 1  # <- EGP
test_packets{key="00017",node="127.0.0.1"} 15930  # <- UDP
test_packets{key="00089",node="127.0.0.1"} 9  # <- OSPF
test_packets{key="00233",node="127.0.0.1"} 1  # <- ?
# EOF
It is a WIP project but already open source! Check it out @ gh:bpftools/kube-bpf
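Producing text like the output on this slide from a map of counters takes very little code. The sketch below is an illustration under assumptions, not the kube-bpf implementation: render_counter and protocol_name are hypothetical names, the zero-padded five-digit key simply mimics the slide, and the protocol table lists only the numbers annotated on the slide.

```python
# IP protocol numbers annotated on the slide; anything else is shown as "?".
PROTOCOL_NAMES = {1: "ICMP", 2: "IGMP", 6: "TCP", 8: "EGP", 17: "UDP", 89: "OSPF"}


def protocol_name(key):
    """Translate a protocol number into a short name, "?" if unknown here."""
    return PROTOCOL_NAMES.get(key, "?")


def render_counter(name, help_text, samples, node):
    """Render {integer key: count} as counter lines in the slide's layout."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for key in sorted(samples):
        # Doubled braces produce the literal { and } around the labels.
        lines.append(f'{name}{{key="{key:05d}",node="{node}"}} {samples[key]}')
    lines.append("# EOF")
    return "\n".join(lines) + "\n"
```

Calling render_counter("test_packets", "No. of packets per protocol (key), node", {6: 551, 1: 8}, "127.0.0.1") yields the HELP and TYPE lines, one sample line per key in ascending order, and the closing "# EOF" marker.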

Slide 20

Slide 20 text

ip-10-12-0-122.ec2.internal:9387/metrics
# HELP test_dummy No. sys_enter_write calls per PID (key), node
# TYPE test_dummy counter
test_dummy{key="00001",node="127.0.0.1"} ...
test_dummy{key="00001",node="127.0.0.1"} 8
test_dummy{key="00295",node="127.0.0.1"} 1
test_dummy{key="01278",node="127.0.0.1"} 1158
test_dummy{key="04690",node="127.0.0.1"} 209
test_dummy{key="04691",node="127.0.0.1"} 889
# EOF
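Reading such an endpoint back is just as simple for output of this shape. The sketch below is a deliberately small parser written for illustration: parse_samples is a hypothetical name, and it only handles lines of the form name{label="value",...} number, with no label escaping, timestamps, or unlabelled samples. For real scraping, an existing Prometheus client library is the better choice.

```python
import re

# One sample line: metric name, a {label="value",...} block, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric name, labels dict, float value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks, HELP/TYPE comments, and the EOF marker
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not in the simple shape this sketch understands
        try:
            value = float(match.group("value"))
        except ValueError:
            continue  # value is not numeric, e.g. an elided "..."
        labels = dict(LABEL_RE.findall(match.group("labels")))
        samples.append((match.group("name"), labels, value))
    return samples
```

Here the key label carries the PID, so summing the values grouped by labels["key"] gives the number of sys_enter_write calls per process.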

Slide 21

Slide 21 text

It is a WIP project but already open source! Check it out @ gh:bpftools/kube-bpf

Slide 22

Slide 22 text

kubectl-trace
More eBPF + k8s
• Run bpftrace program (from file)
• Ctrl-C tells the program to plot the results using hist()
• The output histogram
• Maps

Slide 23

Slide 23 text

Key takeaways
• Prometheus exposition format is here to stay given how simple it is
• OpenMetrics will introduce improvements on such giant shoulders
• We cannot monitor and observe everything from inside our applications
• We might want to have a look at the orchestrator (context) our apps live and die in
• Kubernetes can be extended to achieve such levels of integration
• ELF is cool
• We look for better tools (eBPF) for grabbing our metrics and even more
• Almost nullify footprint ⚡
• Enable a wider range of available data
• Do not touch our applications directly
• There is a PoC doing some magic at gh:bpftools/kube-bpf

Slide 24

Slide 24 text

Thanks. Reach out to me @leodido on Twitter & GitHub!
SEE Y’ALL AROUND AT KUBECON
http://bit.ly/prometheus_ebpf_k8s