Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Tracing the Containers (mainly about eBPF)
Search
KONDO Uchio
November 28, 2019
Technology
6
930
Tracing the Containers (mainly about eBPF)
Presented @ CNDK 2019
KONDO Uchio
November 28, 2019
Tweet
Share
More Decks by KONDO Uchio
See All by KONDO Uchio
大規模レガシーテストを 倒すための CI基盤の作り方 / #CICD2023
udzura
5
2.4k
Ruby x BPF in Action / RubyKaigi 2022
udzura
0
250
Narrative of Ruby & Rust
udzura
0
220
開発者生産性指標の可視化 / pepabo-four-keys
udzura
3
1.7k
Talk of RBS
udzura
0
450
Re: みなさん最近どうですか? / FGN tech meetup in 2021
udzura
0
780
Dockerとやわらかい仮想化 - ProSec-IT/SECKUN 2021 edition -
udzura
2
730
Device access filtering in cgroup v2
udzura
1
910
"Story of Rucy" on RubyKaigi takeout 2021
udzura
0
830
Other Decks in Technology
See All in Technology
AWS テクニカルサポートとエンドカスタマーの中間地点から見えるより良いサポートの活用方法
kazzpapa3
2
610
GitHub Copilot の概要
tomokusaba
1
150
Model Mondays S2E03: SLMs & Reasoning
nitya
0
240
LangSmith×Webhook連携で実現するプロンプトドリブンCI/CD
sergicalsix
1
150
KubeCon + CloudNativeCon Japan 2025 に行ってきた! & containerd の新機能紹介
honahuku
0
120
登壇ネタの見つけ方 / How to find talk topics
pinkumohikan
5
590
250627 関西Ruby会議08 前夜祭 RejectKaigi「DJ on Ruby Ver.0.1」
msykd
PRO
2
370
GeminiとNotebookLMによる金融実務の業務革新
abenben
0
240
【5分でわかる】セーフィー エンジニア向け会社紹介
safie_recruit
0
26k
なぜ私はいま、ここにいるのか? #もがく中堅デザイナー #プロダクトデザイナー
bengo4com
0
1.3k
「Chatwork」の認証基盤の移行とログ活用によるプロダクト改善
kubell_hr
1
240
Amazon S3標準/ S3 Tables/S3 Express One Zoneを使ったログ分析
shigeruoda
5
590
Featured
See All Featured
Code Reviewing Like a Champion
maltzj
524
40k
Embracing the Ebb and Flow
colly
86
4.7k
BBQ
matthewcrist
89
9.7k
Mobile First: as difficult as doing things right
swwweet
223
9.7k
Build your cross-platform service in a week with App Engine
jlugia
231
18k
Building Adaptive Systems
keathley
43
2.6k
How to train your dragon (web standard)
notwaldorf
94
6.1k
What's in a price? How to price your products and services
michaelherold
246
12k
Building Flexible Design Systems
yeseniaperezcruz
328
39k
How STYLIGHT went responsive
nonsquared
100
5.6k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
34
3.1k
Unsuck your backbone
ammeep
671
58k
Transcript
audit, falco, ... and eBPF! Uchio Kondo @ GMO Pepabo,
Inc. #CNDK2019 Tracing the Containers Image from pixabay: https://pixabay.com/images/id-984050/
Señor-Principal Engineer @ GMO Pepabo, Inc. Uchio Kondo https://blog.udzura.jp/ @udzura
Technical department, Dev Productivity/R&D Team Chair on CNDJ at Fukuoka, 2019.04 Systems programmer wannabe Duolingo freak (Emerald League)
JapanContainerDays 2018.12 •CRIU
CNDF 2019 Spring
CNDT 2019 summer •cgroup v2 & PSI
Intertested: •Container features in Linux Kernel (namespace, cgroup, capability, ...)
•System calls •Kernel programming interfaces •eBPF (<= New!!) •The most favorite struct: struct task_struct
Today
ToC •Rough overview of Container tracing (5m~) •Introducing to eBPF
•Comparison to existing tracers •Kernel events (~ 5m) •Use cases with some DEMO (~ 10m)
Tracing Your containers
Why tracing? •τϨʔεʹҎԼͷΑ͏ͳత͕͋Δ •ϩΪϯά: ෳࡶͳΞϓϦέʔγϣϯͰԿ͕͓͖͍ͯΔ͔Ѳ •ࠪɾηΩϡϦςΟ: ඞཁͳτϨʔεϩάΛग़͢͜ͱͰɺෆଌͷࣄଶ ͕͋ͬͨ߹ʹޙ͔Βௐ͕ࠪͰ͖Δɻ·ͨɺෆਖ਼ͳΞΫηεΛݕ Ͱ͖Δ͜ͱ͋Δ •σόοάɾύϑΥʔϚϯε:
୯७ͳΞϓϦέʔγϣϯϩάͰΘ͔Βͳ ͍༰Λ୳Δ
What to trace? Kubernetes/ API Host Linux Per-Container Apps (Networking)
Methodology
Kubernetes audit - orchestrator
Falco / sysdig - host, containers
Falco as a audit tool •ϧʔϧϕʔεͰ༷ʑͳͷΛࠪɻ •ϑΝΠϧૢ࡞ɺϓϩηεɺsyslog... •ref: Wazuh/OSSec https://wazuh.com/
•ίϯςφʹಛԽͨࠪ͠ϧʔϧ •trusted_images, falco_sensitive_mount_images, ... https://github.com/falcosecurity/falco/blob/dev/rules/falco_rules.yaml
Falco internal •ࠪ͢Δใͷιʔεେ͖͘ΧʔωϧϞδϡʔϧɻ •sysdig(~0.6), falco-probe(0.6~) •> The kernel modules are
actually built from the same source code •eBPF෦Ͱ͑ΔΑ͏ʹͳ͍ͬͯΔ • https://sysdig.com/blog/sysdig-and-falco-now-powered-by-ebpf/
None
eBPF?
“Berkley Packet Filter” •ݩʑύέοτϑΟϧλͷख๏ͷจ (classic BPF, 1993) •Tcpdump ͷதͱͯ͠׆༂ •ύέοτϑΟϧλҎ֎:
Seccomp ͰΘΕΔΑ͏ʹͳΔ •Linux 3.14 (2014)͔Βେ͖ͳมߋɺࠓͷܗʹۙͮ͘ (extended BPF) ʮBerkeley Packet FilterʢBPFʣೖʢ1ʣʯ https://www.atmarkit.co.jp/ait/articles/1811/21/news010.html http://www.tcpdump.org/papers/bpf-usenix93.pdf
eBPF overview •BPFόΠτίʔυΛͭ͘Δ ʢ৭ʑͳํ๏Ͱ࡞Δʣ •ΧʔωϧͰݕࠪ͞ΕɺඞཁʹԠ͡JIT •ΧʔωϧͷΠϕϯτΛϓϩάϥϜ͕ऩू •BPF map ͱ͍͏໊લͷ Χʔωϧूੵମ͕͋Δʢͱͬͯߴʣ
From: https://www.atmarkit.co.jp/ait/articles/1811/21/news010_2.html
Tools •bpftrace(8) - ෦ͰeBPFΛ͏൚༻తτϨʔαʔ •DTraceݴޠͦͬ͘ΓͷεΫϦϓτͰτϨʔε༰Λهड़ •BCC - eBPF ͷػೳΛϥοϓͨ͠ϓϩάϥϜΛ࡞ΔͨΊͷϥΠϒϥϦ •Python,
Lua, C++ •Ruby ࣮ - RbBCC (࡞)
Existing Linux tracers Tool Ability Key sys call Invasivity gdb
ϓϩάϥϜͷεςοϓ࣮ߦɺ γάφϧͳͲͰͷఀࢭ ptrace(2) Large strace γεςϜίʔϧͷ ptrace(2) Large perf ύϑΥʔϚϯεΧϯλͳͲͷ ूܭͱՄࢹԽ perf_event_open(2) Medium bpftrace/BCC ͋ΒΏΔΧʔωϧΠϕϯτͷ ूܭͱՄࢹԽ bpf(2) Smaller
Comparison to gdb/strace •gdb/strace ྆ํͱ伴ͱͳΔγεςϜίʔϧ ptrace(2) •Έ্ɺҰϓϩάϥϜΛࢭΊΔඞཁ͕͋Δ •ࢭΊ͍ͯΔ͔Βͦ͜ྫ͑ϨδελΛߋ৽ͨ͠ΓɺΑΓϓϩάϥϜͷ ڍಈʹ౿ΈࠐΜͩૢ࡞͕ՄೳͰ͋Δ ʮptraceγεςϜίʔϧೖʯ
https://itchyny.hatenablog.com/entry/2017/07/31/090000
Comparison to perf •perf tracepoint ͳͲɺ eBPF ͕औಘͰ͖ΔΑ͏ͳใͷଟ͘Λಉ͡ Α͏ʹऔಘͰ͖Δ
•Ұํɺूܭɺྫ͑ϓϩʔϒ͝ͱʹ perf_event_open(2) ͯ͠ɺ ϢʔβϥϯυͰूܭ͢ΔͳͲΦʔόϔου͕ແࢹͰ͖ͳ͍ ʮ؍ଌऀޮՌʯ •eBPFΧʔωϧͰϑΟϧλɺूܭ(eBPF map)͕Ͱ͖Δɻ DTrace ʹ͍ۙɻ
None
eBPF and Kernel events
eBPF event source http://www.brendangregg.com/blog/2019-07-15/bpf-performance-tools-book.html
Important source for tracing •perf, ftrace, eBPF Ͱಉ͡ιʔεΛ͏ ʮperf, ftraceͷ͘͠Έʯ
http://mmi.hatenablog.com/entry/2018/03/04/052249
tracepoint •LinuxΧʔωϧʹɺ෦Ͱى͜Δ༷ʑͳΠϕϯτΛ τϨʔε͢ΔͨΊͷϑοΫϙΠϯτ͕Έࠐ·Ε͍ͯΔɻ •ͦΕΒΛ tracepoint ͱݺͿɻΧʔωϧͷཚػೳΛͬͨ࣌ͷΠϕϯ τͷྫ
kprobe •tracepointجຊతʹ͋Β͔͡ΊΧʔωϧ։ൃऀ͕༻ҙͨ͠ ϑοΫϙΠϯτ͔͠τϨʔεͰ͖ͳ͍ɻ •ࣗͰɺಛఆͷΧʔωϧؔͷݺͼग़͠ΛτϨʔε͍ͨ͠߹ kprobe Λ͏ɻόʔδϣϯɺΞʔΩςΫνϟͰҟͳΔ͜ͱʹҙ͢Δ
uprobe •ϢʔβۭؒͷϓϩάϥϜͷڍಈΛɺΧʔωϧଆͰ͍͔͚ΒΕΔ •uprobe ɺόΠφϦ୯Ґʢਖ਼֬ʹͦͷ࣮ߦϑΝΠϧͷinode୯Ґͱ ͷ͜ͱʣͰΠϕϯτΛొ͢Δඞཁ͕͋Δɻ •ྫ͑ɺόΠφϦͰݟ͍͑ͯΔؔΛొ͢Δ
USDT •User Statically Defined Tracepoint •ϢʔβϓϩάϥϜͷҙͷՕॴʹprobeΛֻ͚ɺΦʔόʔϔουগ ͳ͘ར༻͢Δ͜ͱ͕Ͱ͖Δɻʢதͱͯ͠uprobeʹͳΔ༷ʣ
Others •perfͰ͏Α͏ͳϋʔυΣΞιϑτΣΞΧϯλͳͲeBPF͔ Βѻ͑Δɻ •bpftrace ͷϚχϡΞϧͰɺhardwareϓϩόΠμɺ softwareϓϩόΠ μɺϝϞϦͷwatchpointϓϩόΠμ͕ଘࡏ͢Δ
“Raw” usage of tracefs •tracefs Λܦ༝ͯ͠ɺeBPFͳ͠ͰΧʔωϧτϨʔεՄೳ (debugfs͔Βݟ͑Δͷͱಉ͡ɺΑΓݶఆతͳػೳ͔͠ݟͤͳ͍) ʮࣗͷͨΊͷΧʔωϧτϨʔγϯάɺͦͷ1ʯ https://udzura.hatenablog.jp/entry/2019/09/02/174801 echo
"p:myprobe1 $sym" >> \ /sys/kernel/debug/tracing/kprobe_events ʮftrace Λͬͨίϯςφσόοάͷ४උʯ https://speakerdeck.com/kentatada/container-debug-using-ftrace
ping͕connectΛଧͭτϨʔε
OK, what is good with containers?
eBPF use case •Debugging HOST Linux itself •Syscalls or kernel
functions around containers •Runtime performance •bpftrace result to Prometheus for monitoring •Tracing events per container •Cgroup v2 with eBPF •Tracee by AquaSeciruty
Tracing kernel on containers •ίϯςφ༷ʑͳΧʔωϧػೳΛ͏ͷͰɺͦͷΧʔωϧػೳࣗମΛ σόοάͨ͠Γܭଌͨ͠Γ͢Δ͜ͱ͕eBPFͰͰ͖Δɻ •ྫ͑: `ip netns add/del`
•෦Ͱ copy_net_ns/cleanup_net ͱ͍͏ΧʔωϧؔΛݺͿ •͜ΕΒ͞Βʹ෦ͰΧʔωϧͷόʔδϣϯʹΑΓϩοΫΛऔΔͷ ͰɺύϑΥʔϚϯεӨڹͳͲΛௐ͍ͨˠ eBPF Ͱʂ
Demo (1)
Reference •ʮLinux Kernel: rtnl_mutex Λ࣌ؒ ϩοΫͯͬͨ͠͞ঢ়ଶΛ؍͢Δʯ •https://hiboma.hatenadiary.jp/entry/2019/10/29/123455 •ʢ༨ஊͰ͕͢hiboma͞Μͷ͓͔͛Ͱ /proc/$pid/stack
wchan ͷ͍ํΛ Ѳ͠·ͨ͠ʣ
Tracing Runtime •ʢ࡞ίϯςφHaconiwaͰʣҎԼΛܭଌͯ͠Έͨ •ίϯςφϥϯλΠϜͷىಈʙexecve͢Δ·Ͱͷ࣌ؒ •ίϯςφϥϯλΠϜͷىಈʙίϯςφ͕listen͢Δ·Ͱͷ࣌ؒ •USDTͱtracepointͷ Έ߹Θͤ
bpftrace script
bpftrace → Prometheus •bt2prom ͱ͍͏πʔϧΛॻ͍ͨɻ •bpftraceͷు͖ग़͢JSONϑΥʔϚοτΛɺPrometheusՄͷϑΥʔ Ϛοτʹมɻ •ͦͷ·· Textfile exporter
ͷσΟϨΫτϦʹஔ͍ͨΒϓϩοτՄೳ •Cron ͳͲͰʢsarΈ͍ͨͳΠϝʔδͰʣఆظ࣮ߦ͢ΔͷΛఆ “Format bpftrace JSON into prometheus-compat textfile” https://github.com/udzura/mruby-bin-bt2prom
ࡶʹ vfs_read ΛτϥοΫͨ͠ྫ
CGroup v2 x eBPF •BPFͷcgroupઐ༻ؔ - ࣮ߦ͞ΕͨεϨου͕ॴଐ͢Δcgroup͕Θ͔ Δɻ BPF_FUNC_get_current_cgroup_id ΄͔
•Χʔωϧ͕ΊͪΌ৽͘͠ͳ͍ͱ͑ͳ͍... ͕ɺศར •ίϯςφ୯ҐͰɺͲͷΑ͏ͳϑΝΠϧ͕Φʔϓϯ͞ΕΔ͔ͷτϨʔε ͳͲ͕༰қʹͰ͖Δ •e.g. Apache HTTPDίϯςφ͕ϦΫΤετຖʹ։͘ϑΝΠϧͷsnoop
Demo (2) ͕࣌ؒͳ͍Ͱ͢ɺੋඇ͓͕͚Λʂ
Tracee •eBPFΛશ໘తʹ͏ίϯςφτϨʔα࣮ •෦ͰPID → NamespaceΛղܾͳͲ •bpftrace/BCC൚༻తͳͷͰɺ ಛԽͨ͠ػೳʹظ https://blog.aquasec.com/ebpf-tracing-containers
Conclusion
Happy publishing!
We’re moving to cgroup v2 •Moby ͷ cgroup v2 ରԠP/R
(WIP) •Systemd ͷ v2 default Խ (from 243)
What is new in cgroup v2 (Reprise) •Unified Hierarchy •CGroup-aware
OOM Killer •nsdelegate and better cgroup namespace •PSI - Pressure Stall Information •BPF helper for cgroup v2 (such as BPF_FUNC_get_current_cgroup_id, ...)
It should be “per-container” •Load Avarage •Memory usage •psutils, top,
vmstat... •netstat, iostat •syslog, auditd •perf Host-wide Per-Container •Cgroup stat •PSI(especially) •eBPF (per container) •USDT, syscalls... •sysdig/falco •perf --cgroup
Understand new feature to use new tools in a better
way