Tracing the Containers (mainly about eBPF)

KONDO Uchio
November 28, 2019

Presented @ CNDK 2019

Transcript

  1. audit, falco, ... and eBPF! Uchio Kondo @ GMO Pepabo, Inc. #CNDK2019 Tracing the Containers. Image from pixabay: https://pixabay.com/images/id-984050/
  2. Señor-Principal Engineer @ GMO Pepabo, Inc. Uchio Kondo https://blog.udzura.jp/ @udzura
    •Technical department, Dev Productivity/R&D Team
    •Chair on CNDJ at Fukuoka, 2019.04
    •Systems programmer wannabe
    •Duolingo freak (Emerald League)
  3. Interested in:
    •Container features in the Linux kernel (namespace, cgroup, capability, ...)
    •System calls
    •Kernel programming interfaces
    •eBPF (<= New!!)
    •My favorite struct: struct task_struct
  4. ToC
    •Rough overview of container tracing (5m~)
    •Introduction to eBPF
    •Comparison to existing tracers
    •Kernel events (~ 5m)
    •Use cases with some DEMO (~ 10m)
  5. Falco as an audit tool
    •Audits a wide variety of things on a rule basis.
    •File operations, processes, syslog...
    •ref: Wazuh/OSSec https://wazuh.com/
    •Audit rules specialized for containers
    •trusted_images, falco_sensitive_mount_images, ... https://github.com/falcosecurity/falco/blob/dev/rules/falco_rules.yaml
  6. Falco internals
    •The source of the audited information is, broadly speaking, a kernel module.
    •sysdig(~0.6), falco-probe(0.6~)
    •> The kernel modules are actually built from the same source code
    •eBPF can now also be used internally
    •https://sysdig.com/blog/sysdig-and-falco-now-powered-by-ebpf/
  7. “Berkeley Packet Filter”
    •Originally a paper on a packet filtering technique (classic BPF, 1993)
    •Widely used as the engine inside tcpdump
    •Beyond packet filtering: it also came to be used by seccomp
    •Major changes from Linux 3.14 (2014) brought it close to its current form (extended BPF)
    “Introduction to Berkeley Packet Filter (BPF) (1)” https://www.atmarkit.co.jp/ait/articles/1811/21/news010.html
    http://www.tcpdump.org/papers/bpf-usenix93.pdf
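  8. Existing Linux tracers
    Tool         | Ability                                                    | Key sys call        | Invasivity
    gdb          | Step execution of programs; stopping on signals, etc.      | ptrace(2)           | Large
    strace       | Tracing system calls                                       | ptrace(2)           | Large
    perf         | Aggregating and visualizing performance counters, etc.     | perf_event_open(2)  | Medium
    bpftrace/BCC | Aggregating and visualizing all kinds of kernel events     | bpf(2)              | Smaller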
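  9. Comparison to perf
    •perf can obtain much of the same information that eBPF can, such as tracepoints, in a similar way
    •On the other hand, aggregation has non-negligible overhead: for example, calling perf_event_open(2) per probe and aggregating in userland (the “observer effect”)
    •eBPF can filter and aggregate (eBPF maps) inside the kernel. This is close to DTrace.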
  10. “Raw” usage of tracefs
    •Kernel tracing is possible through tracefs, even without eBPF (the same as what is visible via debugfs, but exposing only a more limited set of features)
    echo "p:myprobe1 $sym" >> /sys/kernel/debug/tracing/kprobe_events
    “Kernel tracing for myself, part 1” https://udzura.hatenablog.jp/entry/2019/09/02/174801
    “Preparing for in-container debugging with ftrace” https://speakerdeck.com/kentatada/container-debug-using-ftrace
  11. eBPF use cases
    •Debugging the HOST Linux itself
    •Syscalls or kernel functions around containers
    •Runtime performance
    •bpftrace results to Prometheus for monitoring
    •Tracing events per container
    •Cgroup v2 with eBPF
    •Tracee by Aqua Security
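  12. Tracing kernel on containers
    •Containers use a variety of kernel features, and eBPF can debug or measure those kernel features themselves.
    •For example: `ip netns add/del`
    •Internally these call the kernel functions copy_net_ns/cleanup_net
    •Depending on the kernel version, these functions take locks internally, so we want to examine the performance impact → use eBPF!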
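One way to measure the functions named on slide 12 might be a kprobe/kretprobe pair per function that records a latency histogram. The following is only a sketch, not the speaker's demo: it generates bpftrace source from Python, and the map names (`@start_...`, `@latency_ns`) are hypothetical. Whether `copy_net_ns` and `cleanup_net` can actually be probed depends on the running kernel.

```python
# Kernel functions named on the slide; whether they can be kprobed
# depends on the kernel build (assumption, not verified here).
TARGETS = ["copy_net_ns", "cleanup_net"]


def latency_probe(func: str) -> str:
    """Return bpftrace source that histograms time spent in one kernel function."""
    return (
        # Entry probe: remember the start timestamp for this thread.
        f"kprobe:{func} {{ @start_{func}[tid] = nsecs; }}\n"
        # Return probe: only fire if we saw the matching entry.
        f"kretprobe:{func} /@start_{func}[tid]/ {{\n"
        f'  @latency_ns["{func}"] = hist(nsecs - @start_{func}[tid]);\n'
        f"  delete(@start_{func}[tid]);\n"
        f"}}\n"
    )


def build_program(funcs) -> str:
    """Join one probe pair per function into a single bpftrace program."""
    return "\n".join(latency_probe(f) for f in funcs)


if __name__ == "__main__":
    print(build_program(TARGETS))
```

Assuming the script is saved under a hypothetical name such as `netns_latency.py`, the output could be passed to `bpftrace -e` as root while `ip netns add/del` runs in another terminal; the histograms are printed when bpftrace exits.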
  13. bpftrace → Prometheus
    •I wrote a tool called bt2prom.
    •It converts the JSON format emitted by bpftrace into a Prometheus-compatible format.
    •Put the output as-is into the Textfile exporter directory and it can be plotted
    •Intended to be run periodically via cron or similar (think of it like sar)
    “Format bpftrace JSON into prometheus-compat textfile” https://github.com/udzura/mruby-bin-bt2prom
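  14. CGroup v2 x eBPF
    •BPF helper functions dedicated to cgroups: they tell you which cgroup the running thread belongs to. BPF_FUNC_get_current_cgroup_id and others
    •They require a very recent kernel... but they are handy
    •Makes it easy to trace, per container, things like which files get opened
    •e.g. snooping the files an Apache HTTPD container opens on each request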
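BPF_FUNC_get_current_cgroup_id answers "which cgroup does this thread belong to" inside the kernel. To attribute traced events to a container from user space, the same membership can be read from `/proc/<pid>/cgroup`, where the cgroup v2 entry has the form `0::<path>`. The sketch below only shows that user-space side and is not taken from the talk; the sample paths in the comments are hypothetical.

```python
def cgroup_v2_path(proc_cgroup_text: str):
    """Return the cgroup v2 path from the contents of /proc/<pid>/cgroup.

    Each line is "hierarchy-ID:controller-list:path". The unified (v2)
    hierarchy uses ID 0 and an empty controller list, e.g.
    "0::/system.slice/docker-<id>.scope" (hypothetical path).
    Returns None when the process has no v2 entry.
    """
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip malformed lines
        hierarchy_id, controllers, path = parts
        if hierarchy_id == "0" and controllers == "":
            return path
    return None


def cgroup_v2_path_of(pid: int):
    """Read /proc/<pid>/cgroup and return the process's cgroup v2 path."""
    with open(f"/proc/{pid}/cgroup") as f:
        return cgroup_v2_path(f.read())
```

A tracer could group events by this path, for example to separate the files opened by one HTTPD container from everything else on the host.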
  15. We’re moving to cgroup v2
    •Moby's P/R for cgroup v2 support (WIP)
    •systemd makes v2 the default (from 243)
  16. What is new in cgroup v2 (Reprise)
    •Unified Hierarchy
    •CGroup-aware OOM Killer
    •nsdelegate and better cgroup namespace
    •PSI - Pressure Stall Information
    •BPF helper for cgroup v2 (such as BPF_FUNC_get_current_cgroup_id, ...)
  17. It should be “per-container”
    Host-wide: •Load Average •Memory usage •psutils, top, vmstat... •netstat, iostat •syslog, auditd •perf
    Per-Container: •Cgroup stat •PSI (especially) •eBPF (per container) •USDT, syscalls... •sysdig/falco •perf --cgroup
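PSI is listed as the per-container counterpart of load average. With cgroup v2, each cgroup directory exposes `cpu.pressure`, `memory.pressure` and `io.pressure` files whose lines look like `some avg10=0.00 avg60=0.00 avg300=0.00 total=0`. A minimal parsing sketch, assuming that line format; the cgroup path in the usage comment is hypothetical:

```python
def parse_psi(text: str) -> dict:
    """Parse the contents of a PSI file into {"some": {...}, "full": {...}}.

    avg10/avg60/avg300 are percentages of time stalled over the last
    10, 60 and 300 seconds; total is the cumulative stall time in microseconds.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        values = {}
        for pair in pairs:
            key, _, value = pair.partition("=")
            values[key] = int(value) if key == "total" else float(value)
        result[kind] = values
    return result


def read_cgroup_pressure(cgroup_dir: str, resource: str = "memory") -> dict:
    """Read <cgroup_dir>/<resource>.pressure for one container's cgroup."""
    with open(f"{cgroup_dir}/{resource}.pressure") as f:
        return parse_psi(f.read())


# Example (hypothetical cgroup path):
# read_cgroup_pressure("/sys/fs/cgroup/system.slice/docker-abc.scope", "io")
```

Sampling these values per container gives a stall-time signal scoped to that cgroup, which a host-wide load average cannot provide.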