Cilium & cgroup eBPF

Rueian
October 26, 2020

Tracing the Linux kernel commit history to understand what cgroup eBPF is and how Cilium uses it to perform NAT at the system-call level, replacing kube-proxy's iptables rules.

Transcript

  1. Cilium & cgroup eBPF

    cgroup eBPF applications
    Ruian Huang @ Dcard / Cloud Native TW Meetup #33, 2020/10/26
  2. Hi, I am Ruian

    • Graduated from NCTU CSCC
    • Dcard Backend Engineer
    • https://medium.com/@ruian
    • http://github.com/rueian
    • https://speakerdeck.com/rueian

    Previous Sharing
  3. Outline

    • Why Cilium? Why eBPF?
    • What is cgroup eBPF?
    • commit history
    • connect syscall example
    • How does Cilium use cgroup eBPF?
    • How does Cilium agent prepare the eBPF map?
    • How does Cilium eBPF program utilize the map?
  4. What is Cilium? Source: https://cilium.io

  5. Why Cilium? Source: https://cilium.io/blog/2020/08/19/google-chooses-cilium-for-gke-networking/

  6. Why eBPF?

    • High Performance: able to skip a large amount of execution in the kernel.
    • Full Control: able to change kernel/application behavior on the fly.

    Other eBPF Applications
    • [XDP] https://blog.cloudflare.com/how-to-drop-10-million-packets/
    • [XDP] https://blog.cloudflare.com/unimog-cloudflares-edge-load-balancer/
    • [XDP] https://engineering.fb.com/open-source/open-sourcing-katran-a-scalable-network-load-balancer
    • https://github.com/zoidbergwill/awesome-ebpf
  7. Where is XDP eBPF? Source: KubeCon2020 - Hubble eBPF Based Observability for Kubernetes by Sebastian Wicki, Isovalent
  8. Where is XDP eBPF? Source: KubeCon2020 - Hubble eBPF Based Observability for Kubernetes by Sebastian Wicki, Isovalent
  9. Where is cgroup eBPF? Source: KubeCon2020 - Hubble eBPF Based Observability for Kubernetes by Sebastian Wicki, Isovalent
  10. Where is cgroup eBPF? Source: KubeCon2020 - Hubble eBPF Based Observability for Kubernetes by Sebastian Wicki, Isovalent
  11. cgroup eBPF - First 2016 Source: https://github.com/torvalds/linux/commit/ca89fa77b4488ecf2e3f72096386e8f3a58fe2fc

  12. cgroup eBPF - First 2016 Source: https://github.com/torvalds/linux/commit/ca89fa77b4488ecf2e3f72096386e8f3a58fe2fc

  13. cgroup eBPF - First 2016 Source: https://github.com/torvalds/linux/commit/ca89fa77b4488ecf2e3f72096386e8f3a58fe2fc

  14. cgroup eBPF - First 2016 Source: https://github.com/torvalds/linux/commit/ca89fa77b4488ecf2e3f72096386e8f3a58fe2fc

  15. cgroup eBPF - First Source: https://github.com/torvalds/linux/commit/ca89fa77b4488ecf2e3f72096386e8f3a58fe2fc

  16. cgroup eBPF - 2017 Source: https://github.com/torvalds/linux/commit/324bda9e6c5add86ba2e1066476481c48132aca0

  17. cgroup eBPF - Now

  18. None
  19. None
  20. Linux Kernel BPF_CGROUP_INET4_CONNECT Source: https://github.com/torvalds/linux/commit/d74bad4e74ee373787a9ae24197c17b7cdc428d5?branch=d74bad4e74ee373787a9ae24197c17b7cdc428d5

  21. Linux Kernel BPF_CGROUP_INET4_CONNECT Source: https://github.com/torvalds/linux/commit/d74bad4e74ee373787a9ae24197c17b7cdc428d5?branch=d74bad4e74ee373787a9ae24197c17b7cdc428d5

  22. Linux Kernel BPF_CGROUP_INET4_CONNECT Source: https://github.com/torvalds/linux/commit/d74bad4e74ee373787a9ae24197c17b7cdc428d5?branch=d74bad4e74ee373787a9ae24197c17b7cdc428d5

  23. Linux Kernel BPF_CGROUP_INET4_CONNECT Source: https://github.com/torvalds/linux/commit/d74bad4e74ee373787a9ae24197c17b7cdc428d5?branch=d74bad4e74ee373787a9ae24197c17b7cdc428d5

  24. Linux Kernel BPF_CGROUP_INET4_CONNECT Source: https://github.com/torvalds/linux/commit/d74bad4e74ee373787a9ae24197c17b7cdc428d5?branch=d74bad4e74ee373787a9ae24197c17b7cdc428d5
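
The BPF_CGROUP_INET4_CONNECT attach type lets a program attached to a cgroup run inside connect(2) for the processes in that cgroup, where it can rewrite the destination address or reject the call. The sketch below is a plain-Python model of that control flow, not kernel or BPF code. `SockAddr`, `Cgroup`, `connect`, and `redirect_service` are hypothetical names for illustration; only the field names `user_ip4` and `user_port` are borrowed from the kernel's `struct bpf_sock_addr`, and the backend address is made up.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SockAddr:
    """Model of the context a connect hook sees (cf. struct bpf_sock_addr)."""
    user_ip4: str
    user_port: int


# A hook returns 1 to let connect() proceed and 0 to reject it.
Hook = Callable[[SockAddr], int]


@dataclass
class Cgroup:
    """Hypothetical stand-in for a cgroup with attached connect4 programs."""
    connect4_hooks: List[Hook] = field(default_factory=list)

    def attach(self, hook: Hook) -> None:
        self.connect4_hooks.append(hook)


def connect(cgroup: Cgroup, ip: str, port: int) -> Tuple[str, int]:
    """Model of connect(2): run the cgroup hooks, then use the final address."""
    ctx = SockAddr(ip, port)
    for hook in cgroup.connect4_hooks:
        if hook(ctx) == 0:
            # The real syscall fails with EPERM in this case.
            raise PermissionError("connect rejected by cgroup program")
    # The kernel would now connect to whatever address the hooks left in ctx.
    return ctx.user_ip4, ctx.user_port


def redirect_service(ctx: SockAddr) -> int:
    """Example hook: rewrite one service address to a made-up backend."""
    if (ctx.user_ip4, ctx.user_port) == ("10.10.10.10", 80):
        ctx.user_ip4, ctx.user_port = "192.168.1.5", 8080
    return 1
```

With `redirect_service` attached, `connect(cg, "10.10.10.10", 80)` returns the backend address while any other destination passes through unchanged. The application never sees the rewrite, because it happens once at connect time rather than on every packet.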

  25. BPF_CGROUP_UDP4_SENDMSG
    • https://github.com/torvalds/linux/commit/1cedee13d25ab118d325f95588c1a084e9317229

    BPF_CGROUP_UDP4_RECVMSG
    • https://github.com/torvalds/linux/commit/983695fa676568fc0fe5ddd995c7267aabc24632

    BPF_CGROUP_INET4_GETPEERNAME
    • https://github.com/torvalds/linux/commit/1b66d253610c7f8f257103808a9460223a087469

    Other cgroup BPF commit history
    • https://github.com/torvalds/linux/commits/master/include/linux/bpf-cgroup.h
  26. How does Cilium use cgroup eBPF?

  27. How does Cilium use cgroup eBPF? Source: https://cilium.io/blog/2019/08/20/cilium-16/

  28. kube-proxy iptables rules for service backend selection

    kube-proxy iptables rules for backend DNAT
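
As a worked example of the backend-selection rules: in iptables mode, kube-proxy typically emits one rule per backend using the `statistic` match in random mode, and the rules are tried in order. The sketch below assumes that usual layout (probabilities 1/n, 1/(n-1), ..., 1) and checks the arithmetic that makes the selection uniform. The function names are hypothetical.

```python
from fractions import Fraction
from typing import List


def rule_probabilities(n: int) -> List[Fraction]:
    """Per-rule match probability for n backends: 1/n, 1/(n-1), ..., 1."""
    return [Fraction(1, n - i) for i in range(n)]


def effective_probabilities(n: int) -> List[Fraction]:
    """Chance that each backend is chosen when the rules are tried in order."""
    result = []
    not_matched_yet = Fraction(1)
    for p in rule_probabilities(n):
        # A rule is only reached if every earlier rule failed to match.
        result.append(not_matched_yet * p)
        not_matched_yet *= 1 - p
    return result


print(rule_probabilities(3))       # per-rule probabilities 1/3, 1/2, 1
print(effective_probabilities(3))  # every backend ends up with 1/3
```

The chain is walked rule by rule for each new connection, so the cost of selecting a backend grows with the number of backends. A map lookup keyed by service address avoids that linear walk.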
  29. Source: https://cilium.io/blog/2019/08/20/cilium-16/ https://www.hwchiu.com/tags/Kubernetes

  30. Cilium Overview [diagram labels: kube-api-server, Cilium Agent (daemon_main.go), BPF Maps, /bpf/init.sh, BPFs, Kernel, Processes, k8s Node; arrow labels: Watch k8s services, endpoints …; update; lookup; update; syscall]
  31. Cilium Overview [diagram labels: kube-api-server, Cilium Agent (daemon_main.go), BPF Maps, /bpf/init.sh, BPFs, Kernel, Processes, k8s Node; arrow labels: Watch k8s services, endpoints …; update; lookup; update; syscall]
  32. Cilium Agent UpsertService https://github.com/cilium/cilium/blob/945a852cfe62d1ea865e52c53aab3a4bee2de75a/pkg/maps/lbmap/lbmap.go

  33. Cilium Agent UpsertService https://github.com/cilium/cilium/blob/945a852cfe62d1ea865e52c53aab3a4bee2de75a/pkg/maps/lbmap/lbmap.go

  34. Cilium Agent UpsertService https://github.com/cilium/cilium/blob/945a852cfe62d1ea865e52c53aab3a4bee2de75a/pkg/maps/lbmap/lbmap.go

  35. Cilium Agent UpsertService https://github.com/cilium/cilium/blob/945a852cfe62d1ea865e52c53aab3a4bee2de75a/pkg/maps/lbmap/lbmap.go

  36. Cilium Agent UpsertService https://github.com/cilium/cilium/blob/945a852cfe62d1ea865e52c53aab3a4bee2de75a/pkg/maps/lbmap/lbmap.go

  37. Cilium BPF Map LB4_SERVICES_MAP_V2 Source: https://github.com/cilium/cilium/blob/master/bpf/lib/lb.h

  38. Cilium BPF Map LB4_SERVICES_MAP_V2 Source: https://github.com/cilium/cilium/blob/master/bpf/lib/lb.h

  39. Cilium BPF Map LB4_SERVICES_MAP_V2 Source: https://github.com/cilium/cilium/blob/master/bpf/lib/lb.h

    Example of a k8s service with 3 backend pods in the LB4_SERVICES_MAP_V2:

    key                                      | value
    address      dport  backend_slot  scope  | backend_id / affinity_timeout  count  rev_nat index  flags  flags2
    10.10.10.10  80     0             0      | 0                              3      1
    10.10.10.10  80     1             0      | 1                              0      1
    10.10.10.10  80     2             0      | 2                              0      1
    10.10.10.10  80     3             0      | 3                              0      1
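
The layout of the service map (one entry at backend_slot 0 that carries the backend count, plus one entry per backend in slots 1..N) can be sketched as a plain-Python model. This is not the agent's Go code: `ServiceKey`, `ServiceValue`, and `upsert_service` are hypothetical names, the `flags` fields are left out, and a dict stands in for the BPF map.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ServiceKey:
    address: str
    dport: int
    backend_slot: int  # 0 = entry holding the count, 1..N = one per backend
    scope: int = 0


@dataclass
class ServiceValue:
    backend_id: int
    count: int
    rev_nat_index: int


def upsert_service(
    services: Dict[ServiceKey, ServiceValue],
    address: str,
    dport: int,
    backend_ids: List[int],
    rev_nat_index: int,
) -> None:
    """Write the slot-0 entry and one slot entry per backend."""
    services[ServiceKey(address, dport, 0)] = ServiceValue(
        backend_id=0, count=len(backend_ids), rev_nat_index=rev_nat_index
    )
    for slot, backend_id in enumerate(backend_ids, start=1):
        services[ServiceKey(address, dport, slot)] = ServiceValue(
            backend_id=backend_id, count=0, rev_nat_index=rev_nat_index
        )


services: Dict[ServiceKey, ServiceValue] = {}
upsert_service(services, "10.10.10.10", 80, backend_ids=[1, 2, 3], rev_nat_index=1)
for key, value in services.items():
    print(key, value)
```

Calling it for 10.10.10.10:80 with backend IDs 1, 2, 3 produces four entries with the same key and value fields as the example rows on this slide. The sketch does not remove stale slots when a service shrinks, which a real update path would also have to handle.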
  40. Cilium Overview [diagram labels: kube-api-server, Cilium Agent (daemon_main.go), BPF Maps, /bpf/init.sh, BPFs, Kernel, Processes, k8s Node; arrow labels: Watch k8s services, endpoints …; update; lookup; update; syscall]
  41. How does Cilium use cgroup eBPF?

  42. How does Cilium replace kube-proxy? Source: Virtual bpfconf 2020 - Alexei Starovoitov, Daniel Borkmann (LSF/MM/BPF 2020) https://docs.google.com/presentation/d/1w2zlpGWV7JUhHYd37El_AUZzyUNSvDfktrF5MJ5G8Bs/edit#slide=id.g746fc02b5b_3_33
  43. How does Cilium replace kube-proxy? Source: Virtual bpfconf 2020 - Alexei Starovoitov, Daniel Borkmann (LSF/MM/BPF 2020) https://docs.google.com/presentation/d/1w2zlpGWV7JUhHYd37El_AUZzyUNSvDfktrF5MJ5G8Bs/edit#slide=id.g746fc02b5b_3_33
  44. Cilium __sock4_xlate_fwd & __sock4_xlate_rev Source: https://github.com/cilium/cilium/blob/2864f4844e9c0eea0994cbfee15f4c10b81f1e30/bpf/bpf_sock.c

  45. Cilium __sock4_xlate_fwd & __sock4_xlate_rev Source: https://github.com/cilium/cilium/blob/2864f4844e9c0eea0994cbfee15f4c10b81f1e30/bpf/bpf_sock.c

  46. Cilium __sock4_xlate_fwd Source: https://github.com/cilium/cilium/blob/2864f4844e9c0eea0994cbfee15f4c10b81f1e30/bpf/bpf_sock.c

  47. Cilium __sock4_xlate_fwd Source: https://github.com/cilium/cilium/blob/2864f4844e9c0eea0994cbfee15f4c10b81f1e30/bpf/bpf_sock.c

  48. Cilium lb4_lookup_service Source: https://github.com/cilium/cilium/blob/master/bpf/lib/lb.h

  49. Cilium __lb4_lookup_backend Source: https://github.com/cilium/cilium/blob/master/bpf/lib/lb.h
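
The forward translation path (look up the service, pick a backend slot, resolve the backend, rewrite the destination) can be modeled in a few lines of plain Python. This is a simplified sketch, not the code in bpf_sock.c: all names are hypothetical, dicts stand in for the BPF maps, and uniform random slot selection is an assumption.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

ServiceKey = Tuple[str, int, int]  # (address, dport, backend_slot)
Backend = Tuple[str, int]          # (address, port)


@dataclass
class ServiceValue:
    backend_id: int
    count: int
    rev_nat_index: int


@dataclass
class SockAddr:
    user_ip4: str
    user_port: int


def xlate_fwd(
    ctx: SockAddr,
    services: Dict[ServiceKey, ServiceValue],
    backends: Dict[int, Backend],
    rng: Optional[random.Random] = None,
) -> bool:
    """Rewrite ctx to a backend if it targets a known service."""
    rng = rng or random.Random()
    # Step 1: slot 0 tells us whether this is a service and how many backends it has.
    svc = services.get((ctx.user_ip4, ctx.user_port, 0))
    if svc is None or svc.count == 0:
        return False
    # Step 2: pick one of the slots 1..count (assumed uniform here).
    slot = rng.randrange(svc.count) + 1
    slot_entry = services.get((ctx.user_ip4, ctx.user_port, slot))
    if slot_entry is None:
        return False
    # Step 3: resolve the backend ID to an address and port.
    backend = backends.get(slot_entry.backend_id)
    if backend is None:
        return False
    # Step 4: rewrite the destination in place, as a connect hook would.
    ctx.user_ip4, ctx.user_port = backend
    return True
```

A destination that is not in the service map is left untouched and the function returns False, so non-service traffic pays only for one failed lookup.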

  50. Cilium __sock4_xlate_fwd & __sock4_xlate_rev Source: https://github.com/cilium/cilium/blob/2864f4844e9c0eea0994cbfee15f4c10b81f1e30/bpf/bpf_sock.c

  51. Cilium __sock4_xlate_rev Source: https://github.com/cilium/cilium/blob/2864f4844e9c0eea0994cbfee15f4c10b81f1e30/bpf/bpf_sock.c

  52. Cilium LB4_REVERSE_NAT_SK_MAP Source: https://github.com/cilium/cilium/blob/2864f4844e9c0eea0994cbfee15f4c10b81f1e30/bpf/bpf_sock.c
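
The reverse direction needs state. Once the destination has been rewritten at connect or sendmsg time, a later getpeername or recvmsg would otherwise expose the backend address to the application. The sketch below models a reverse NAT map that restores the service address. It assumes the key is the socket cookie plus the backend address and port; the names are hypothetical, a dict stands in for the BPF map, and the `rev_nat_index` field of the real value is omitted.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

RevNatKey = Tuple[int, str, int]  # (socket cookie, backend address, backend port)
RevNatValue = Tuple[str, int]     # (service address, service port)


@dataclass
class SockAddr:
    user_ip4: str
    user_port: int


def record_revnat(
    revnat: Dict[RevNatKey, RevNatValue],
    cookie: int,
    backend: Tuple[str, int],
    service: Tuple[str, int],
) -> None:
    """Called on the forward path: remember which service this socket asked for."""
    revnat[(cookie, backend[0], backend[1])] = service


def xlate_rev(
    ctx: SockAddr, cookie: int, revnat: Dict[RevNatKey, RevNatValue]
) -> bool:
    """Called on getpeername/recvmsg: map the backend address back to the service."""
    service = revnat.get((cookie, ctx.user_ip4, ctx.user_port))
    if service is None:
        return False
    ctx.user_ip4, ctx.user_port = service
    return True
```

Keying on the socket cookie keeps the mapping per socket, so two sockets that reach the same backend through different services each get their own service address back.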

  53. Recap

    • cgroup eBPF Introduction
    • commit history
    • connect syscall example
    • Cilium Agent Overview
    • LB4_SERVICES_MAP_V2 preparation
    • Cilium kube-proxy replacement (application side)
    • NAT per connect/getpeername/sendmsg/recvmsg syscall, not per packet
  54. None
  55. Recent Changes - Linux 5.8

  56. Recent Changes - Linux 5.9

  57. None
  58. None