Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Tracing the Containers (mainly about eBPF)
Search
KONDO Uchio
November 28, 2019
Technology
980
6
Share
Tracing the Containers (mainly about eBPF)
Presented @ CNDK 2019
KONDO Uchio
November 28, 2019
More Decks by KONDO Uchio
See All by KONDO Uchio
大規模レガシーテストを 倒すための CI基盤の作り方 / #CICD2023
udzura
5
2.6k
Ruby x BPF in Action / RubyKaigi 2022
udzura
0
310
Narrative of Ruby & Rust
udzura
0
270
開発者生産性指標の可視化 / pepabo-four-keys
udzura
3
1.8k
Talk of RBS
udzura
0
500
Re: みなさん最近どうですか? / FGN tech meetup in 2021
udzura
0
860
Dockerとやわらかい仮想化 - ProSec-IT/SECKUN 2021 edition -
udzura
2
810
Device access filtering in cgroup v2
udzura
1
1k
"Story of Rucy" on RubyKaigi takeout 2021
udzura
0
920
Other Decks in Technology
See All in Technology
【Gen-AX】20260530開催_JJUG CCC 2026 Spring
genax
0
420
JJUG CCC 2026 Spring AI時代の開発こそ標準化を武器に! ― 方式・プロセス・プラットフォームの標準化
s27watanabe
2
720
GoとSIMDとWasmの今。
askua
3
510
速さだけじゃない! VoidZero ツールが移行先に選ばれる理由
mizdra
PRO
6
750
ポケモンの型をTypeScriptの型システムで表現してみた
subroh0508
0
320
AI Adaptable なテストを整える工夫 / Ways to Make Your Tests AI-Adaptable
bitkey
PRO
3
210
生成 AI × MCP で切り拓く次世代 SRE!自律型運用への挑戦と開発者体験の進化
_awache
0
150
React、まだ楽しくて草
uhyo
7
4.1k
Platform Engineering as a Product: Criteria for Improvement and Multi-Tenant Design
kumorn5s
0
500
AI Engineering Summit Tokyo 2026 AIの前に、やることがある 〜医療データ企業の4フェーズ〜
dtaniwaki
0
1.8k
個人の発見を、組織の知恵に 〜生成AI活用を"探索"から"組織の仕組み"へ〜
kintotechdev
2
960
LLMと共に進化するプロセスを目指して
ymatsuwitter
10
2.9k
Featured
See All Featured
技術選定の審美眼(2025年版) / Understanding the Spiral of Technologies 2025 edition
twada
PRO
118
120k
How to Align SEO within the Product Triangle To Get Buy-In & Support - #RIMC
aleyda
2
1.5k
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.7k
Technical Leadership for Architectural Decision Making
baasie
3
400
16th Malabo Montpellier Forum Presentation
akademiya2063
PRO
0
140
Visualization
eitanlees
152
17k
Getting science done with accelerated Python computing platforms
jacobtomlinson
2
220
Writing Fast Ruby
sferik
630
63k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
287
14k
Done Done
chrislema
186
16k
Lessons Learnt from Crawling 1000+ Websites
charlesmeaden
PRO
1
1.3k
Deep Space Network (abreviated)
tonyrice
0
160
Transcript
audit, falco, ... and eBPF! Uchio Kondo @ GMO Pepabo,
Inc. #CNDK2019 Tracing the Containers Image from pixabay: https://pixabay.com/images/id-984050/
Señor-Principal Engineer @ GMO Pepabo, Inc. Uchio Kondo https://blog.udzura.jp/ @udzura
Technical department, Dev Productivity/R&D Team Chair on CNDJ at Fukuoka, 2019.04 Systems programmer wannabe Duolingo freak (Emerald League)
JapanContainerDays 2018.12 •CRIU
CNDF 2019 Spring
CNDT 2019 summer •cgroup v2 & PSI
Intertested: •Container features in Linux Kernel (namespace, cgroup, capability, ...)
•System calls •Kernel programming interfaces •eBPF (<= New!!) •The most favorite struct: struct task_struct
Today
ToC •Rough overview of Container tracing (5m~) •Introducing to eBPF
•Comparison to existing tracers •Kernel events (~ 5m) •Use cases with some DEMO (~ 10m)
Tracing Your containers
Why tracing? •τϨʔεʹҎԼͷΑ͏ͳత͕͋Δ •ϩΪϯά: ෳࡶͳΞϓϦέʔγϣϯͰԿ͕͓͖͍ͯΔ͔Ѳ •ࠪɾηΩϡϦςΟ: ඞཁͳτϨʔεϩάΛग़͢͜ͱͰɺෆଌͷࣄଶ ͕͋ͬͨ߹ʹޙ͔Βௐ͕ࠪͰ͖Δɻ·ͨɺෆਖ਼ͳΞΫηεΛݕ Ͱ͖Δ͜ͱ͋Δ •σόοάɾύϑΥʔϚϯε:
୯७ͳΞϓϦέʔγϣϯϩάͰΘ͔Βͳ ͍༰Λ୳Δ
What to trace? Kubernetes/ API Host Linux Per-Container Apps (Networking)
Methodology
Kubernetes audit - orchestrator
Falco / sysdig - host, containers
Falco as a audit tool •ϧʔϧϕʔεͰ༷ʑͳͷΛࠪɻ •ϑΝΠϧૢ࡞ɺϓϩηεɺsyslog... •ref: Wazuh/OSSec https://wazuh.com/
•ίϯςφʹಛԽͨࠪ͠ϧʔϧ •trusted_images, falco_sensitive_mount_images, ... https://github.com/falcosecurity/falco/blob/dev/rules/falco_rules.yaml
Falco internal •ࠪ͢Δใͷιʔεେ͖͘ΧʔωϧϞδϡʔϧɻ •sysdig(~0.6), falco-probe(0.6~) •> The kernel modules are
actually built from the same source code •eBPF෦Ͱ͑ΔΑ͏ʹͳ͍ͬͯΔ • https://sysdig.com/blog/sysdig-and-falco-now-powered-by-ebpf/
None
eBPF?
“Berkley Packet Filter” •ݩʑύέοτϑΟϧλͷख๏ͷจ (classic BPF, 1993) •Tcpdump ͷதͱͯ͠׆༂ •ύέοτϑΟϧλҎ֎:
Seccomp ͰΘΕΔΑ͏ʹͳΔ •Linux 3.14 (2014)͔Βେ͖ͳมߋɺࠓͷܗʹۙͮ͘ (extended BPF) ʮBerkeley Packet FilterʢBPFʣೖʢ1ʣʯ https://www.atmarkit.co.jp/ait/articles/1811/21/news010.html http://www.tcpdump.org/papers/bpf-usenix93.pdf
eBPF overview •BPFόΠτίʔυΛͭ͘Δ ʢ৭ʑͳํ๏Ͱ࡞Δʣ •ΧʔωϧͰݕࠪ͞ΕɺඞཁʹԠ͡JIT •ΧʔωϧͷΠϕϯτΛϓϩάϥϜ͕ऩू •BPF map ͱ͍͏໊લͷ Χʔωϧूੵମ͕͋Δʢͱͬͯߴʣ
From: https://www.atmarkit.co.jp/ait/articles/1811/21/news010_2.html
Tools •bpftrace(8) - ෦ͰeBPFΛ͏൚༻తτϨʔαʔ •DTraceݴޠͦͬ͘ΓͷεΫϦϓτͰτϨʔε༰Λهड़ •BCC - eBPF ͷػೳΛϥοϓͨ͠ϓϩάϥϜΛ࡞ΔͨΊͷϥΠϒϥϦ •Python,
Lua, C++ •Ruby ࣮ - RbBCC (࡞)
Existing Linux tracers Tool Ability Key sys call Invasivity gdb
ϓϩάϥϜͷεςοϓ࣮ߦɺ γάφϧͳͲͰͷఀࢭ ptrace(2) Large strace γεςϜίʔϧͷ ptrace(2) Large perf ύϑΥʔϚϯεΧϯλͳͲͷ ूܭͱՄࢹԽ perf_event_open(2) Medium bpftrace/BCC ͋ΒΏΔΧʔωϧΠϕϯτͷ ूܭͱՄࢹԽ bpf(2) Smaller
Comparison to gdb/strace •gdb/strace ྆ํͱ伴ͱͳΔγεςϜίʔϧ ptrace(2) •Έ্ɺҰϓϩάϥϜΛࢭΊΔඞཁ͕͋Δ •ࢭΊ͍ͯΔ͔Βͦ͜ྫ͑ϨδελΛߋ৽ͨ͠ΓɺΑΓϓϩάϥϜͷ ڍಈʹ౿ΈࠐΜͩૢ࡞͕ՄೳͰ͋Δ ʮptraceγεςϜίʔϧೖʯ
https://itchyny.hatenablog.com/entry/2017/07/31/090000
Comparison to perf •perf tracepoint ͳͲɺ eBPF ͕औಘͰ͖ΔΑ͏ͳใͷଟ͘Λಉ͡ Α͏ʹऔಘͰ͖Δ
•Ұํɺूܭɺྫ͑ϓϩʔϒ͝ͱʹ perf_event_open(2) ͯ͠ɺ ϢʔβϥϯυͰूܭ͢ΔͳͲΦʔόϔου͕ແࢹͰ͖ͳ͍ ʮ؍ଌऀޮՌʯ •eBPFΧʔωϧͰϑΟϧλɺूܭ(eBPF map)͕Ͱ͖Δɻ DTrace ʹ͍ۙɻ
None
eBPF and Kernel events
eBPF event source http://www.brendangregg.com/blog/2019-07-15/bpf-performance-tools-book.html
Important source for tracing •perf, ftrace, eBPF Ͱಉ͡ιʔεΛ͏ ʮperf, ftraceͷ͘͠Έʯ
http://mmi.hatenablog.com/entry/2018/03/04/052249
tracepoint •LinuxΧʔωϧʹɺ෦Ͱى͜Δ༷ʑͳΠϕϯτΛ τϨʔε͢ΔͨΊͷϑοΫϙΠϯτ͕Έࠐ·Ε͍ͯΔɻ •ͦΕΒΛ tracepoint ͱݺͿɻΧʔωϧͷཚػೳΛͬͨ࣌ͷΠϕϯ τͷྫ
kprobe •tracepointجຊతʹ͋Β͔͡ΊΧʔωϧ։ൃऀ͕༻ҙͨ͠ ϑοΫϙΠϯτ͔͠τϨʔεͰ͖ͳ͍ɻ •ࣗͰɺಛఆͷΧʔωϧؔͷݺͼग़͠ΛτϨʔε͍ͨ͠߹ kprobe Λ͏ɻόʔδϣϯɺΞʔΩςΫνϟͰҟͳΔ͜ͱʹҙ͢Δ
uprobe •ϢʔβۭؒͷϓϩάϥϜͷڍಈΛɺΧʔωϧଆͰ͍͔͚ΒΕΔ •uprobe ɺόΠφϦ୯Ґʢਖ਼֬ʹͦͷ࣮ߦϑΝΠϧͷinode୯Ґͱ ͷ͜ͱʣͰΠϕϯτΛొ͢Δඞཁ͕͋Δɻ •ྫ͑ɺόΠφϦͰݟ͍͑ͯΔؔΛొ͢Δ
USDT •User Statically Defined Tracepoint •ϢʔβϓϩάϥϜͷҙͷՕॴʹprobeΛֻ͚ɺΦʔόʔϔουগ ͳ͘ར༻͢Δ͜ͱ͕Ͱ͖Δɻʢதͱͯ͠uprobeʹͳΔ༷ʣ
Others •perfͰ͏Α͏ͳϋʔυΣΞιϑτΣΞΧϯλͳͲeBPF͔ Βѻ͑Δɻ •bpftrace ͷϚχϡΞϧͰɺhardwareϓϩόΠμɺ softwareϓϩόΠ μɺϝϞϦͷwatchpointϓϩόΠμ͕ଘࡏ͢Δ
“Raw” usage of tracefs •tracefs Λܦ༝ͯ͠ɺeBPFͳ͠ͰΧʔωϧτϨʔεՄೳ (debugfs͔Βݟ͑Δͷͱಉ͡ɺΑΓݶఆతͳػೳ͔͠ݟͤͳ͍) ʮࣗͷͨΊͷΧʔωϧτϨʔγϯάɺͦͷ1ʯ https://udzura.hatenablog.jp/entry/2019/09/02/174801 echo
"p:myprobe1 $sym" >> \ /sys/kernel/debug/tracing/kprobe_events ʮftrace Λͬͨίϯςφσόοάͷ४උʯ https://speakerdeck.com/kentatada/container-debug-using-ftrace
ping͕connectΛଧͭτϨʔε
OK, what is good with containers?
eBPF use case •Debugging HOST Linux itself •Syscalls or kernel
functions around containers •Runtime performance •bpftrace result to Prometheus for monitoring •Tracing events per container •Cgroup v2 with eBPF •Tracee by AquaSeciruty
Tracing kernel on containers •ίϯςφ༷ʑͳΧʔωϧػೳΛ͏ͷͰɺͦͷΧʔωϧػೳࣗମΛ σόοάͨ͠Γܭଌͨ͠Γ͢Δ͜ͱ͕eBPFͰͰ͖Δɻ •ྫ͑: `ip netns add/del`
•෦Ͱ copy_net_ns/cleanup_net ͱ͍͏ΧʔωϧؔΛݺͿ •͜ΕΒ͞Βʹ෦ͰΧʔωϧͷόʔδϣϯʹΑΓϩοΫΛऔΔͷ ͰɺύϑΥʔϚϯεӨڹͳͲΛௐ͍ͨˠ eBPF Ͱʂ
Demo (1)
Reference •ʮLinux Kernel: rtnl_mutex Λ࣌ؒ ϩοΫͯͬͨ͠͞ঢ়ଶΛ؍͢Δʯ •https://hiboma.hatenadiary.jp/entry/2019/10/29/123455 •ʢ༨ஊͰ͕͢hiboma͞Μͷ͓͔͛Ͱ /proc/$pid/stack
wchan ͷ͍ํΛ Ѳ͠·ͨ͠ʣ
Tracing Runtime •ʢ࡞ίϯςφHaconiwaͰʣҎԼΛܭଌͯ͠Έͨ •ίϯςφϥϯλΠϜͷىಈʙexecve͢Δ·Ͱͷ࣌ؒ •ίϯςφϥϯλΠϜͷىಈʙίϯςφ͕listen͢Δ·Ͱͷ࣌ؒ •USDTͱtracepointͷ Έ߹Θͤ
bpftrace script
bpftrace → Prometheus •bt2prom ͱ͍͏πʔϧΛॻ͍ͨɻ •bpftraceͷు͖ग़͢JSONϑΥʔϚοτΛɺPrometheusՄͷϑΥʔ Ϛοτʹมɻ •ͦͷ·· Textfile exporter
ͷσΟϨΫτϦʹஔ͍ͨΒϓϩοτՄೳ •Cron ͳͲͰʢsarΈ͍ͨͳΠϝʔδͰʣఆظ࣮ߦ͢ΔͷΛఆ “Format bpftrace JSON into prometheus-compat textfile” https://github.com/udzura/mruby-bin-bt2prom
ࡶʹ vfs_read ΛτϥοΫͨ͠ྫ
CGroup v2 x eBPF •BPFͷcgroupઐ༻ؔ - ࣮ߦ͞ΕͨεϨου͕ॴଐ͢Δcgroup͕Θ͔ Δɻ BPF_FUNC_get_current_cgroup_id ΄͔
•Χʔωϧ͕ΊͪΌ৽͘͠ͳ͍ͱ͑ͳ͍... ͕ɺศར •ίϯςφ୯ҐͰɺͲͷΑ͏ͳϑΝΠϧ͕Φʔϓϯ͞ΕΔ͔ͷτϨʔε ͳͲ͕༰қʹͰ͖Δ •e.g. Apache HTTPDίϯςφ͕ϦΫΤετຖʹ։͘ϑΝΠϧͷsnoop
Demo (2) ͕࣌ؒͳ͍Ͱ͢ɺੋඇ͓͕͚Λʂ
Tracee •eBPFΛશ໘తʹ͏ίϯςφτϨʔα࣮ •෦ͰPID → NamespaceΛղܾͳͲ •bpftrace/BCC൚༻తͳͷͰɺ ಛԽͨ͠ػೳʹظ https://blog.aquasec.com/ebpf-tracing-containers
Conclusion
Happy publishing!
We’re moving to cgroup v2 •Moby ͷ cgroup v2 ରԠP/R
(WIP) •Systemd ͷ v2 default Խ (from 243)
What is new in cgroup v2 (Reprise) •Unified Hierarchy •CGroup-aware
OOM Killer •nsdelegate and better cgroup namespace •PSI - Pressure Stall Information •BPF helper for cgroup v2 (such as BPF_FUNC_get_current_cgroup_id, ...)
It should be “per-container” •Load Avarage •Memory usage •psutils, top,
vmstat... •netstat, iostat •syslog, auditd •perf Host-wide Per-Container •Cgroup stat •PSI(especially) •eBPF (per container) •USDT, syscalls... •sysdig/falco •perf --cgroup
Understand new feature to use new tools in a better
way