Slide 1

Slide 1 text

Cilium & cgroup eBPF: cgroup eBPF applications
Ruian Huang @ Dcard / Cloud Native TW Meetup #33, 2020/10/26

Slide 2

Slide 2 text

Hi, I am Ruian
• Graduated from NCTU CSCC
• Dcard Backend Engineer
• https://medium.com/@ruian
• http://github.com/rueian
Previous Sharing
• https://speakerdeck.com/rueian

Slide 3

Slide 3 text

Outline
• Why Cilium? Why eBPF?
• What is cgroup eBPF?
  • commit history
  • connect syscall example
• How does Cilium use cgroup eBPF?
  • How does Cilium agent prepare the eBPF map?
  • How does Cilium eBPF program utilize the map?

Slide 4

Slide 4 text

What is Cilium? Source: https://cilium.io

Slide 5

Slide 5 text

Why Cilium? Source: https://cilium.io/blog/2020/08/19/google-chooses-cilium-for-gke-networking/

Slide 6

Slide 6 text

Why eBPF?
• High Performance
  • Able to skip a large amount of execution in the kernel.
• Full Control
  • Able to change kernel/application behavior on the fly.
Other eBPF Applications
• [XDP] https://blog.cloudflare.com/how-to-drop-10-million-packets/
• [XDP] https://blog.cloudflare.com/unimog-cloudflares-edge-load-balancer/
• [XDP] https://engineering.fb.com/open-source/open-sourcing-katran-a-scalable-network-load-balancer
• https://github.com/zoidbergwill/awesome-ebpf

Slide 7

Slide 7 text

Where is XDP eBPF? Source: KubeCon2020 - Hubble eBPF Based Observability for Kubernetes By Sebastian Wicki, Isovalent

Slide 8

Slide 8 text

Where is XDP eBPF? Source: KubeCon2020 - Hubble eBPF Based Observability for Kubernetes By Sebastian Wicki, Isovalent

Slide 9

Slide 9 text

Where is cgroup eBPF? Source: KubeCon2020 - Hubble eBPF Based Observability for Kubernetes By Sebastian Wicki, Isovalent

Slide 10

Slide 10 text

Where is cgroup eBPF? Source: KubeCon2020 - Hubble eBPF Based Observability for Kubernetes By Sebastian Wicki, Isovalent

Slide 11

Slide 11 text

cgroup eBPF - First 2016 Source: https://github.com/torvalds/linux/commit/ca89fa77b4488ecf2e3f72096386e8f3a58fe2fc

Slide 12

Slide 12 text

cgroup eBPF - First 2016 Source: https://github.com/torvalds/linux/commit/ca89fa77b4488ecf2e3f72096386e8f3a58fe2fc

Slide 13

Slide 13 text

cgroup eBPF - First 2016 Source: https://github.com/torvalds/linux/commit/ca89fa77b4488ecf2e3f72096386e8f3a58fe2fc

Slide 14

Slide 14 text

cgroup eBPF - First 2016 Source: https://github.com/torvalds/linux/commit/ca89fa77b4488ecf2e3f72096386e8f3a58fe2fc

Slide 15

Slide 15 text

cgroup eBPF - First Source: https://github.com/torvalds/linux/commit/ca89fa77b4488ecf2e3f72096386e8f3a58fe2fc

Slide 16

Slide 16 text

cgroup eBPF - 2017 Source: https://github.com/torvalds/linux/commit/324bda9e6c5add86ba2e1066476481c48132aca0

Slide 17

Slide 17 text

cgroup eBPF - Now

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

Linux Kernel BPF_CGROUP_INET4_CONNECT Source: https://github.com/torvalds/linux/commit/d74bad4e74ee373787a9ae24197c17b7cdc428d5?branch=d74bad4e74ee373787a9ae24197c17b7cdc428d5

Slide 21

Slide 21 text

Linux Kernel BPF_CGROUP_INET4_CONNECT Source: https://github.com/torvalds/linux/commit/d74bad4e74ee373787a9ae24197c17b7cdc428d5?branch=d74bad4e74ee373787a9ae24197c17b7cdc428d5

Slide 22

Slide 22 text

Linux Kernel BPF_CGROUP_INET4_CONNECT Source: https://github.com/torvalds/linux/commit/d74bad4e74ee373787a9ae24197c17b7cdc428d5?branch=d74bad4e74ee373787a9ae24197c17b7cdc428d5

Slide 23

Slide 23 text

Linux Kernel BPF_CGROUP_INET4_CONNECT Source: https://github.com/torvalds/linux/commit/d74bad4e74ee373787a9ae24197c17b7cdc428d5?branch=d74bad4e74ee373787a9ae24197c17b7cdc428d5

Slide 24

Slide 24 text

Linux Kernel BPF_CGROUP_INET4_CONNECT Source: https://github.com/torvalds/linux/commit/d74bad4e74ee373787a9ae24197c17b7cdc428d5?branch=d74bad4e74ee373787a9ae24197c17b7cdc428d5
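The BPF_CGROUP_INET4_CONNECT hook can be pictured with a small user-space model. This is only a sketch in Python, not kernel code: the `SockAddr`, `Cgroup`, `redirect_hook` and `deny_hook` names and the addresses are made up for illustration. What it borrows from the kernel is the contract of the hook: the program sees the destination (`user_ip4`, `user_port` in `struct bpf_sock_addr`), may rewrite it, and returns 1 to let the connect proceed or 0 to reject it, in which case connect() fails with EPERM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SockAddr:
    """Stand-in for the user_ip4 / user_port fields of struct bpf_sock_addr."""
    user_ip4: str
    user_port: int


# A hook receives the mutable address and returns 1 (allow) or 0 (reject).
Hook = Callable[[SockAddr], int]


class Cgroup:
    """Hypothetical model of a cgroup with connect4 programs attached."""

    def __init__(self) -> None:
        self.connect4_hooks: List[Hook] = []

    def attach(self, hook: Hook) -> None:
        self.connect4_hooks.append(hook)

    def connect(self, ip: str, port: int) -> SockAddr:
        """Model of connect(): run every hook before the connection is made."""
        addr = SockAddr(ip, port)
        for hook in self.connect4_hooks:
            if hook(addr) != 1:
                # The real syscall would fail with EPERM here.
                raise PermissionError("connect rejected by cgroup hook")
        return addr  # the address the socket actually connects to


def redirect_hook(addr: SockAddr) -> int:
    """Rewrite one illustrative destination, leave everything else alone."""
    if (addr.user_ip4, addr.user_port) == ("10.10.10.10", 80):
        addr.user_ip4, addr.user_port = "192.168.0.2", 8080
    return 1


def deny_hook(addr: SockAddr) -> int:
    """Reject connections to one illustrative port."""
    return 0 if addr.user_port == 23 else 1
```

Because the rewrite happens once, inside the connect call, the application keeps using the address it asked for while the socket is connected to the rewritten one.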

Slide 25

Slide 25 text

Other cgroup BPF commit history
• BPF_CGROUP_UDP4_SENDMSG: https://github.com/torvalds/linux/commit/1cedee13d25ab118d325f95588c1a084e9317229
• BPF_CGROUP_UDP4_RECVMSG: https://github.com/torvalds/linux/commit/983695fa676568fc0fe5ddd995c7267aabc24632
• BPF_CGROUP_INET4_GETPEERNAME: https://github.com/torvalds/linux/commit/1b66d253610c7f8f257103808a9460223a087469
• https://github.com/torvalds/linux/commits/master/include/linux/bpf-cgroup.h

Slide 26

Slide 26 text

How does Cilium use cgroup eBPF?

Slide 27

Slide 27 text

How does Cilium use cgroup eBPF? Source: https://cilium.io/blog/2019/08/20/cilium-16/

Slide 28

Slide 28 text

Kube-proxy iptables rules for service backend selection
Kube-proxy iptables rules for backend DNAT

Slide 29

Slide 29 text

Source: https://cilium.io/blog/2019/08/20/cilium-16/ https://www.hwchiu.com/tags/Kubernetes

Slide 30

Slide 30 text

Cilium Overview (diagram)
Components: kube-api-server; k8s Node; Cilium Agent (daemon_main.go); BPF Maps; BPFs; Kernel; Processes
Arrow labels: Watch k8s services, endpoints …; update; /bpf/init.sh; lookup; update; syscall

Slide 31

Slide 31 text

Cilium Overview (diagram)
Components: kube-api-server; k8s Node; Cilium Agent (daemon_main.go); BPF Maps; BPFs; Kernel; Processes
Arrow labels: Watch k8s services, endpoints …; update; /bpf/init.sh; lookup; update; syscall

Slide 32

Slide 32 text

Cilium Agent UpsertService https://github.com/cilium/cilium/blob/945a852cfe62d1ea865e52c53aab3a4bee2de75a/pkg/maps/lbmap/lbmap.go

Slide 33

Slide 33 text

Cilium Agent UpsertService https://github.com/cilium/cilium/blob/945a852cfe62d1ea865e52c53aab3a4bee2de75a/pkg/maps/lbmap/lbmap.go

Slide 34

Slide 34 text

Cilium Agent UpsertService https://github.com/cilium/cilium/blob/945a852cfe62d1ea865e52c53aab3a4bee2de75a/pkg/maps/lbmap/lbmap.go

Slide 35

Slide 35 text

Cilium Agent UpsertService https://github.com/cilium/cilium/blob/945a852cfe62d1ea865e52c53aab3a4bee2de75a/pkg/maps/lbmap/lbmap.go

Slide 36

Slide 36 text

Cilium Agent UpsertService https://github.com/cilium/cilium/blob/945a852cfe62d1ea865e52c53aab3a4bee2de75a/pkg/maps/lbmap/lbmap.go

Slide 37

Slide 37 text

Cilium BPF Map LB4_SERVICES_MAP_V2 Source: https://github.com/cilium/cilium/blob/master/bpf/lib/lb.h

Slide 38

Slide 38 text

Cilium BPF Map LB4_SERVICES_MAP_V2 Source: https://github.com/cilium/cilium/blob/master/bpf/lib/lb.h

Slide 39

Slide 39 text

Cilium BPF Map LB4_SERVICES_MAP_V2 Source: https://github.com/cilium/cilium/blob/master/bpf/lib/lb.h

Example of a k8s service with 3 backend pods in the LB4_SERVICES_MAP_V2

key                                      | value
address      dport  backend_slot  scope  | backend_id / affinity_timeout  count  rev_nat index  flags  flags2
10.10.10.10  80     0             0      | 0                              3      1
10.10.10.10  80     1             0      | 1                              0      1
10.10.10.10  80     2             0      | 2                              0      1
10.10.10.10  80     3             0      | 3                              0      1
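The layout in this example can be reproduced with a plain dictionary. The following is a minimal Python sketch, assuming only the fields named on the slide; `Key`, `Value`, `upsert_service` and `select_backend` are illustrative names, not the Cilium agent's Go API or the BPF helpers, and the `rand % count + 1` slot choice is an assumption about how a backend slot could be picked. Slot 0 is the master entry that carries the backend count, and slots 1..count each point at one backend ID.

```python
from typing import Dict, List, NamedTuple, Optional


class Key(NamedTuple):
    address: str
    dport: int
    backend_slot: int
    scope: int = 0


class Value(NamedTuple):
    backend_id: int      # meaningful for backend slots (1..count)
    count: int           # meaningful for the master slot (0)
    rev_nat_index: int


ServiceMap = Dict[Key, Value]


def upsert_service(svc_map: ServiceMap, address: str, dport: int,
                   backend_ids: List[int], rev_nat_index: int) -> None:
    """Write one entry per backend slot, then the master entry at slot 0."""
    for slot, backend_id in enumerate(backend_ids, start=1):
        svc_map[Key(address, dport, slot)] = Value(backend_id, 0, rev_nat_index)
    svc_map[Key(address, dport, 0)] = Value(0, len(backend_ids), rev_nat_index)


def select_backend(svc_map: ServiceMap, address: str, dport: int,
                   rand: int) -> Optional[int]:
    """Look up the master entry, pick a slot, and return its backend ID."""
    master = svc_map.get(Key(address, dport, 0))
    if master is None or master.count == 0:
        return None  # not a known service, or no backends
    slot = rand % master.count + 1  # assumed selection rule for this sketch
    return svc_map[Key(address, dport, slot)].backend_id


if __name__ == "__main__":
    svc_map: ServiceMap = {}
    upsert_service(svc_map, "10.10.10.10", 80, [1, 2, 3], rev_nat_index=1)
    for key in sorted(svc_map):
        print(key, svc_map[key])
```

Two lookups are enough to resolve a service: one for slot 0 to learn the count, and one for the chosen backend slot.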

Slide 40

Slide 40 text

Cilium Overview (diagram)
Components: kube-api-server; k8s Node; Cilium Agent (daemon_main.go); BPF Maps; BPFs; Kernel; Processes
Arrow labels: Watch k8s services, endpoints …; update; /bpf/init.sh; lookup; update; syscall

Slide 41

Slide 41 text

How does Cilium use cgroup eBPF?

Slide 42

Slide 42 text

How does Cilium replace kube-proxy? Source: Virtual bpfconf 2020 - Alexei Starovoitov, Daniel Borkmann (LSF/MM/BPF 2020) https://docs.google.com/presentation/d/1w2zlpGWV7JUhHYd37El_AUZzyUNSvDfktrF5MJ5G8Bs/edit#slide=id.g746fc02b5b_3_33

Slide 43

Slide 43 text

How does Cilium replace kube-proxy? Source: Virtual bpfconf 2020 - Alexei Starovoitov, Daniel Borkmann (LSF/MM/BPF 2020) https://docs.google.com/presentation/d/1w2zlpGWV7JUhHYd37El_AUZzyUNSvDfktrF5MJ5G8Bs/edit#slide=id.g746fc02b5b_3_33

Slide 44

Slide 44 text

Cilium __sock4_xlate_fwd & __sock4_xlate_rev Source: https://github.com/cilium/cilium/blob/2864f4844e9c0eea0994cbfee15f4c10b81f1e30/bpf/bpf_sock.c

Slide 45

Slide 45 text

Cilium __sock4_xlate_fwd & __sock4_xlate_rev Source: https://github.com/cilium/cilium/blob/2864f4844e9c0eea0994cbfee15f4c10b81f1e30/bpf/bpf_sock.c

Slide 46

Slide 46 text

Cilium __sock4_xlate_fwd Source: https://github.com/cilium/cilium/blob/2864f4844e9c0eea0994cbfee15f4c10b81f1e30/bpf/bpf_sock.c

Slide 47

Slide 47 text

Cilium __sock4_xlate_fwd Source: https://github.com/cilium/cilium/blob/2864f4844e9c0eea0994cbfee15f4c10b81f1e30/bpf/bpf_sock.c

Slide 48

Slide 48 text

Cilium lb4_lookup_service Source: https://github.com/cilium/cilium/blob/master/bpf/lib/lb.h

Slide 49

Slide 49 text

Cilium __lb4_lookup_backend Source: https://github.com/cilium/cilium/blob/master/bpf/lib/lb.h

Slide 50

Slide 50 text

Cilium __sock4_xlate_fwd & __sock4_xlate_rev Source: https://github.com/cilium/cilium/blob/2864f4844e9c0eea0994cbfee15f4c10b81f1e30/bpf/bpf_sock.c

Slide 51

Slide 51 text

Cilium __sock4_xlate_rev Source: https://github.com/cilium/cilium/blob/2864f4844e9c0eea0994cbfee15f4c10b81f1e30/bpf/bpf_sock.c

Slide 52

Slide 52 text

Cilium LB4_REVERSE_NAT_SK_MAP Source: https://github.com/cilium/cilium/blob/2864f4844e9c0eea0994cbfee15f4c10b81f1e30/bpf/bpf_sock.c
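The pairing of forward and reverse translation can be sketched in user space. The Python below is a simplified model under stated assumptions, not the code in bpf_sock.c: `xlate_fwd` and `xlate_rev` are illustrative stand-ins for `__sock4_xlate_fwd` and `__sock4_xlate_rev`, the service table is a plain dictionary instead of LB4_SERVICES_MAP_V2, and the reverse NAT key is assumed to be (socket cookie, backend address, backend port). The forward path runs on connect/sendmsg and records the original service address; the reverse path runs on getpeername/recvmsg and restores it.

```python
from typing import Dict, List, Tuple

Addr = Tuple[str, int]              # (IPv4 address, port)
RevNatKey = Tuple[int, str, int]    # (socket cookie, backend address, backend port)


def xlate_fwd(services: Dict[Addr, List[Addr]],
              revnat: Dict[RevNatKey, Addr],
              cookie: int, dst: Addr, rand: int) -> Addr:
    """Forward path (connect/sendmsg): service address -> backend address."""
    backends = services.get(dst)
    if not backends:
        return dst  # not a service address: leave the destination untouched
    backend = backends[rand % len(backends)]
    # Remember which service this socket originally asked for.
    revnat[(cookie, backend[0], backend[1])] = dst
    return backend


def xlate_rev(revnat: Dict[RevNatKey, Addr],
              cookie: int, peer: Addr) -> Addr:
    """Reverse path (getpeername/recvmsg): backend address -> service address."""
    return revnat.get((cookie, peer[0], peer[1]), peer)


if __name__ == "__main__":
    services = {("10.10.10.10", 80): [("192.168.0.2", 8080), ("192.168.0.3", 8080)]}
    revnat: Dict[RevNatKey, Addr] = {}
    backend = xlate_fwd(services, revnat, cookie=42, dst=("10.10.10.10", 80), rand=1)
    print("connected to", backend)
    print("application sees", xlate_rev(revnat, cookie=42, peer=backend))
```

Keying the reverse entry by socket keeps the translation private to the connection that created it, so another socket talking directly to the same backend is not rewritten.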

Slide 53

Slide 53 text

Recap
• cgroup eBPF Introduction
  • commit history
  • connect syscall example
• Cilium Agent Overview
  • LB4_SERVICES_MAP_V2 preparation
• Cilium kube-proxy replacement (application side)
  • NAT per connect/getpeername/sendmsg/recvmsg syscall, not per packet

Slide 54

Slide 54 text

No content

Slide 55

Slide 55 text

Recent Changes - Linux 5.8

Slide 56

Slide 56 text

Recent Changes - Linux 5.9

Slide 57

Slide 57 text

No content

Slide 58

Slide 58 text

No content