Tracing the Containers (mainly about eBPF)
KONDO Uchio
November 28, 2019
Presented @ CNDK 2019
Transcript
Tracing the Containers: audit, falco, ... and eBPF!
Uchio Kondo @ GMO Pepabo, Inc. #CNDK2019
Image from pixabay: https://pixabay.com/images/id-984050/
Señor-Principal Engineer @ GMO Pepabo, Inc. Uchio Kondo https://blog.udzura.jp/ @udzura
Technical department, Dev Productivity/R&D Team Chair on CNDJ at Fukuoka, 2019.04 Systems programmer wannabe Duolingo freak (Emerald League)
JapanContainerDays 2018.12 •CRIU
CNDF 2019 Spring
CNDT 2019 summer •cgroup v2 & PSI
Interested in:
•Container features in the Linux kernel (namespace, cgroup, capability, ...)
•System calls
•Kernel programming interfaces
•eBPF (<= New!!)
•My favorite struct: struct task_struct
Today
ToC
•Rough overview of container tracing (5m~)
•Introduction to eBPF
•Comparison to existing tracers
•Kernel events (~ 5m)
•Use cases with some DEMO (~ 10m)
Tracing Your containers
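Why tracing?
•Tracing serves purposes such as the following:
•Logging: understand what is happening inside a complex application
•Audit and security: emitting the necessary trace logs lets you investigate after an unexpected incident, and can sometimes detect unauthorized access
•Debugging and performance: dig into things that plain application logs cannot tell you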
What to trace? Kubernetes/ API Host Linux Per-Container Apps (Networking)
Methodology
Kubernetes audit - orchestrator
Falco / sysdig - host, containers
Falco as an audit tool
•Audits many kinds of things based on rules
•File operations, processes, syslog...
•ref: Wazuh/OSSec https://wazuh.com/
•Audit rules specific to containers
•trusted_images, falco_sensitive_mount_images, ... https://github.com/falcosecurity/falco/blob/dev/rules/falco_rules.yaml
Falco internal
•The main source of the information it audits is a kernel module
•sysdig(~0.6), falco-probe(0.6~)
•> The kernel modules are actually built from the same source code
•eBPF can now also be used internally
•https://sysdig.com/blog/sysdig-and-falco-now-powered-by-ebpf/
eBPF?
“Berkeley Packet Filter”
•Originally a paper on a packet-filtering technique (classic BPF, 1993)
•Widely used as the internals of tcpdump
•Beyond packet filtering: also came to be used by seccomp
•Major changes from Linux 3.14 (2014) brought it close to its current form (extended BPF)
"Introduction to Berkeley Packet Filter (BPF) (1)" https://www.atmarkit.co.jp/ait/articles/1811/21/news010.html
http://www.tcpdump.org/papers/bpf-usenix93.pdf
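eBPF overview
•Build BPF bytecode (it can be produced in various ways)
•The kernel verifies it and JIT-compiles it as needed
•The program collects kernel events
•There is an in-kernel aggregation structure called a BPF map (very fast)
From: https://www.atmarkit.co.jp/ait/articles/1811/21/news010_2.html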
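Tools
•bpftrace(8) - a general-purpose tracer that uses eBPF internally
•Describe what to trace in a script that closely resembles the DTrace language
•BCC - a library for building programs that wrap eBPF features
•Python, Lua, C++
•Ruby implementation - RbBCC (written by the speaker)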
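Existing Linux tracers
Tool | Ability | Key sys call | Invasiveness
gdb | Step execution of a program, stopping it with signals, etc. | ptrace(2) | Large
strace | Tracing system calls | ptrace(2) | Large
perf | Aggregating and visualizing performance counters, etc. | perf_event_open(2) | Medium
bpftrace/BCC | Aggregating and visualizing any kernel event | bpf(2) | Smaller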
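Comparison to gdb/strace
•For both gdb and strace, the key system call is ptrace(2)
•By design, the program has to be stopped once
•Precisely because it is stopped, operations that reach deeper into the program's behavior, such as updating registers, become possible
"Introduction to the ptrace system call" https://itchyny.hatenablog.com/entry/2017/07/31/090000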
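Comparison to perf
•perf can obtain much of the same information that eBPF can, such as tracepoints
•On the other hand, aggregation carries a non-negligible overhead, for example calling perf_event_open(2) per probe and aggregating in userland: the "observer effect"
•eBPF can filter and aggregate (eBPF map) inside the kernel, which is close to DTrace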
eBPF and Kernel events
eBPF event source http://www.brendangregg.com/blog/2019-07-15/bpf-performance-tools-book.html
Important source for tracing
•perf, ftrace and eBPF use the same event sources
"How perf and ftrace work" http://mmi.hatenablog.com/entry/2018/03/04/052249
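tracepoint
•The Linux kernel has built-in hook points for tracing the various events that occur inside it
•These are called tracepoints. Example: the events emitted when the kernel's random-number facility is used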
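kprobe
•With tracepoints you can basically trace only the hook points that kernel developers prepared in advance
•To trace calls to a specific kernel function of your own choosing, use kprobe. Note that the functions differ between kernel versions and architectures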
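uprobe
•Lets you follow the behavior of a user-space program from the kernel side
•A uprobe has to be registered per binary (more precisely, reportedly per inode of the executable file)
•For example, you register a function that is visible in the binary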
USDT
•User Statically Defined Tracepoint
•Lets you place probes at arbitrary points in a user program and use them with little overhead (internally it appears to end up as a uprobe)
Others
•Hardware and software counters such as those used by perf can also be handled from eBPF
•According to the bpftrace manual, there are a hardware provider, a software provider and a memory watchpoint provider
“Raw” usage of tracefs
•Kernel tracing is possible through tracefs even without eBPF (it is the same as what is visible through debugfs, but exposes only a more limited set of features)
"Kernel tracing for myself, part 1" https://udzura.hatenablog.jp/entry/2019/09/02/174801
echo "p:myprobe1 $sym" >> \
  /sys/kernel/debug/tracing/kprobe_events
"Preparing for container debugging with ftrace" https://speakerdeck.com/kentatada/container-debug-using-ftrace
A trace of ping calling connect
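The echo command above can also be written as a small script. The following is a minimal sketch, not the speaker's tooling: the helper names `kprobe_definition` and `register_kprobe` are hypothetical, the symbol `tcp_v4_connect` in the usage note is only an example, and the tracefs path is the one from the slide (it may differ on your system, and writing to it requires root).

```python
from pathlib import Path

# Path taken from the slide; assumption: tracefs is reachable via debugfs here.
KPROBE_EVENTS = Path("/sys/kernel/debug/tracing/kprobe_events")


def kprobe_definition(event: str, symbol: str) -> str:
    """Build a 'p:<event> <symbol>' line, the same shape as the echo above."""
    if not event.isidentifier():
        raise ValueError(f"invalid event name: {event!r}")
    if not symbol or any(c.isspace() for c in symbol):
        raise ValueError(f"invalid symbol: {symbol!r}")
    return f"p:{event} {symbol}"


def register_kprobe(event: str, symbol: str, events_file: Path = KPROBE_EVENTS) -> str:
    """Append the definition to kprobe_events (needs root on a real system)."""
    line = kprobe_definition(event, symbol)
    with open(events_file, "a") as f:  # append mode, like '>>' in the shell
        f.write(line + "\n")
    return line
```

For instance, `register_kprobe("myprobe1", "tcp_v4_connect")` would append `p:myprobe1 tcp_v4_connect`; passing a different `events_file` makes it possible to try the function without touching the real tracefs.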
OK, what is good with containers?
eBPF use case
•Debugging HOST Linux itself
•Syscalls or kernel functions around containers
•Runtime performance
•bpftrace result to Prometheus for monitoring
•Tracing events per container
•Cgroup v2 with eBPF
•Tracee by Aqua Security
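Tracing kernel on containers
•Containers use many kernel features, and eBPF lets you debug and measure those kernel features themselves
•For example: `ip netns add/del`
•Internally this calls the kernel functions copy_net_ns/cleanup_net
•Depending on the kernel version, these take a lock further inside, so you want to investigate the performance impact and so on → do it with eBPF!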
Demo (1)
Reference
•"Linux Kernel: observing the stuck state after holding rtnl_mutex locked for a long time"
•https://hiboma.hatenadiary.jp/entry/2019/10/29/123455
•(As an aside, thanks to hiboma I learned how to use /proc/$pid/stack and wchan)
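Tracing Runtime
•Measured the following (with Haconiwa, the speaker's own container runtime):
•Time from container runtime start until execve
•Time from container runtime start until the container starts to listen
•A combination of USDT and tracepoints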
bpftrace script
bpftrace → Prometheus
•Wrote a tool called bt2prom
•It converts the JSON format that bpftrace emits into a format Prometheus can read
•Drop the output as-is into the Textfile exporter directory and it can be plotted
•Intended to be run periodically from cron or similar (think of sar)
“Format bpftrace JSON into prometheus-compat textfile” https://github.com/udzura/mruby-bin-bt2prom
A rough example of tracking vfs_read
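To make the conversion idea concrete, here is a minimal sketch of turning bpftrace JSON output into Prometheus text-format lines. It is not the bt2prom implementation. The function name `bpftrace_json_to_prom`, the `bpftrace_` metric prefix and the `key` label are hypothetical choices, and the input shape (one JSON object per line, with `"type": "map"` records holding map contents) is an assumption about `bpftrace -f json` output; histograms and other nested values are simply skipped.

```python
import json
import re


def _escape(value: str) -> str:
    """Escape a label value for the Prometheus text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def _is_number(value) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def bpftrace_json_to_prom(lines, prefix="bpftrace_"):
    """Convert bpftrace JSON output lines into Prometheus text-format lines."""
    out = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        record = json.loads(raw)
        if record.get("type") != "map":
            continue  # skip non-map records such as attached_probes
        for map_name, content in record["data"].items():
            # "@reads" -> "bpftrace_reads"; a bare "@" becomes "bpftrace_map"
            base = re.sub(r"[^a-zA-Z0-9_]", "_", map_name.lstrip("@")) or "map"
            metric = prefix + base
            if isinstance(content, dict):
                for key, value in content.items():
                    if _is_number(value):  # nested values (histograms) are skipped
                        out.append(f'{metric}{{key="{_escape(str(key))}"}} {value}')
            elif _is_number(content):
                out.append(f"{metric} {content}")
    return out
```

Writing the returned lines to a `.prom` file in the Textfile exporter directory, for example from a cron job, would match the workflow described on the slide.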
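CGroup v2 x eBPF
•cgroup-specific BPF functions tell you which cgroup the running thread belongs to: BPF_FUNC_get_current_cgroup_id and others
•They only work on a very recent kernel... but they are handy
•They make it easy to trace, per container, things like which files get opened
•e.g. snooping the files an Apache HTTPD container opens on each request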
Demo (2)
No time left, so please come and talk to me!
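Tracee
•A container tracer implementation that uses eBPF throughout
•Internally it resolves PID → Namespace, and so on
•bpftrace/BCC are general-purpose, so its specialized features look promising
https://blog.aquasec.com/ebpf-tracing-containers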
Conclusion
Happy publishing!
We’re moving to cgroup v2
•Moby's P/R for cgroup v2 support (WIP)
•systemd making v2 the default (from 243)
What is new in cgroup v2 (Reprise)
•Unified Hierarchy
•CGroup-aware OOM Killer
•nsdelegate and better cgroup namespace
•PSI - Pressure Stall Information
•BPF helper for cgroup v2 (such as BPF_FUNC_get_current_cgroup_id, ...)
It should be “per-container”
Host-wide:
•Load Average
•Memory usage
•psutils, top, vmstat...
•netstat, iostat
•syslog, auditd
•perf
Per-Container:
•Cgroup stat
•PSI (especially)
•eBPF (per container)
•USDT, syscalls...
•sysdig/falco
•perf --cgroup
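As one illustration of a per-container metric from the list above, PSI can be read per cgroup from the cgroup v2 pressure files. This is a minimal sketch that assumes a PSI-enabled kernel and a cgroup v2 mount; the helper names `parse_psi` and `read_cgroup_pressure` are hypothetical, and the cgroup directory in the usage note is a placeholder.

```python
def parse_psi(text: str) -> dict:
    """Parse PSI text such as 'some avg10=0.00 avg60=0.00 avg300=0.00 total=0'."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]  # kind is 'some' or 'full'
        values = {}
        for pair in pairs:
            name, _, value = pair.partition("=")
            # 'total' is a cumulative counter; the avg* fields are percentages
            values[name] = int(value) if name == "total" else float(value)
        result[kind] = values
    return result


def read_cgroup_pressure(cgroup_dir: str, resource: str = "memory") -> dict:
    """Read <cgroup_dir>/<resource>.pressure (resource: cpu, memory or io)."""
    with open(f"{cgroup_dir}/{resource}.pressure") as f:
        return parse_psi(f.read())
```

Calling `read_cgroup_pressure("/sys/fs/cgroup/<your-container-cgroup>")` would return the `some` and `full` figures for that container alone, rather than the host-wide values in /proc/pressure.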
Understand new features to use new tools in a better way