Slide 1

Slide 1 text

Introduction to Tracing with BPF/bcc (tentative) 2018/11/5 OSS

Slide 2

Slide 2 text

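About this talk
This talk mainly explains the internal workings of tracing with BPF.
It says little about how to use specific tools, or about tracing theory and established practice.
In these slides, BPF = eBPF.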

Slide 3

Slide 3 text

Agenda
01 Overview of Linux Tracing
02 Tracing with BPF
03 Tracing with bcc

Slide 4

Slide 4 text

1 Linux Tracing System

Slide 5

Slide 5 text

Introduction
Tracing with BPF is:
× a completely new tracing framework
○ something that complements the existing tracing frameworks

Slide 6

Slide 6 text

Linux Tracing System Component
Data source: Performance Counter (PMU), Tracepoint (Static Tracing), Kprobe (Dynamic Tracing), Mcount (gprof)
In-Kernel Framework: perf_event, ftrace, Lttng, SystemTap
Userland Tool: perf, tracefs (debugfs), trace-cmd, SystemTap, Lttng

Slide 7

Slide 7 text

Linux Tracing System Component
Data source: Performance Counter (PMU), Tracepoint (Static Tracing), Kprobe (Dynamic Tracing), Mcount (gprof)
In-Kernel Framework: perf_event, ftrace, Lttng, SystemTap
Userland Tool: perf, tracefs (debugfs), trace-cmd, SystemTap, Lttng

Slide 8

Slide 8 text

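Data sources
1 Performance Counter (PMU)
・CPU-specific feature
・Information is obtained via MSRs
・IPC, cache hit rate, …
2 Tracepoint
・Static Tracing
・Embedded in the kernel
・Callback functions can be registered
3 Kprobe
・Dynamic Tracing
・Dynamic hooks based on breakpoints
・Callback functions can be registered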

Slide 9

Slide 9 text

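Performance Counter (PMU)
・CPU-specific feature
・Cycle count, IPC, cache hit rate, branch prediction hit rate, …
・On Intel
 ・Values are read from MSRs (Model Specific Registers)
 ・The number of counters depends on the architecture
 ・Which information to collect is configured through the MSRs
 ・An interrupt can be raised when a counter reaches a specified value
 ・To collect more kinds of information than there are MSRs, the counters have to be time-multiplexed carefully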
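The time-multiplexing point can be made concrete with the extrapolation that perf-style tools apply: an event is only counted while it occupies a hardware counter, so the raw count is scaled by the ratio of the time the event was enabled to the time it was actually running. The following is a minimal sketch, assuming the two times were read through perf_event_open(2) with PERF_FORMAT_TOTAL_TIME_ENABLED and PERF_FORMAT_TOTAL_TIME_RUNNING; the function name and the sample numbers are hypothetical.

```python
def scale_multiplexed_count(raw_count: int, time_enabled_ns: int, time_running_ns: int) -> float:
    """Estimate the full-period count of an event that shared a counter."""
    if time_running_ns == 0:
        # The event never got a counter, so there is nothing to extrapolate.
        return 0.0
    # The event was observed for time_running_ns out of time_enabled_ns.
    return raw_count * time_enabled_ns / time_running_ns


# Hypothetical numbers: the event was on a counter for 1 s out of 4 s.
estimate = scale_multiplexed_count(1_000_000, 4_000_000_000, 1_000_000_000)
print(estimate)
```

The result is an estimate, not a measurement: it assumes the event rate was similar while the event was not scheduled on a counter.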

Slide 10

Slide 10 text

Tracepoint
・Defined directly in the kernel source
・A definition whose name starts with trace_ is usually a tracepoint
・The interface is (supposed to be) compatible across kernel versions
https://github.com/torvalds/linux/blob/v4.18/fs/exec.c#L1697

Slide 11

Slide 11 text

Kprobe
・Dynamic hooks that use breakpoints
・Most of the kernel can be hooked
・Depends on the kernel version
[Diagram labels: Insn, Break point, pre handler, post handler, Insn]

Slide 12

Slide 12 text

Perf
・The profiler that ships as standard with Linux
 ・In-kernel framework
 ・User-space tool
・What perf can do
 ・Count how many times events occur
  ・Hardware Event: Performance Counter
  ・Tracepoint Event: Tracepoint, Kprobe
  ・Software Event: perf's own events
 ・Sampling
  ・Sampling driven by PMU interrupts, generally using the cycle count
cf. "How perf and ftrace work" http://mmi.hatenablog.com/entry/…
[Diagram: kprobe, Performance Counter, and tracepoint feed perf_event (Hardware, Tracepoint, Software); perf reaches it through perf_event_open(2) and an mmaped ring buffer]

Slide 13

Slide 13 text

2 Tracing with BPF

Slide 14

Slide 14 text

Tracing with BPF
Event (Tracepoint, Kprobe, Perf software event, Perf hardware event) → Call → BPF Program
BPF Program → Helper Function (pid, uid, …)
BPF Program → eBPF Map, perf buffer

Slide 15

Slide 15 text

Create BPF map
Userland: User Program → bpf(2) system call
Kernel: BPF map

Slide 16

Slide 16 text

Load BPF Program
Userland: C source → LLVM/Clang → BPF bytecode; User Program → bpf(2) system call
Kernel: Verifier → JIT (Optional) → BPF Program; BPF map
Attach: BPF Program → Event (Tracepoint, Kprobe, Performance counter)

Slide 17

Slide 17 text

Userland: C source → LLVM/Clang → BPF bytecode; User Program → bpf(2) system call → Load BPF Program
Kernel: Event (Tracepoint, Kprobe, Performance counter) → Call → BPF Program → Return value
Kernel: BPF Program → Call → Helper Function → Return value
Kernel: BPF Program → Access → BPF map

Slide 18

Slide 18 text

Read BPF map
Userland: User Program → bpf(2) system call → Read BPF map
Kernel: BPF map, BPF Program, Event (Tracepoint, Kprobe, Performance counter)

Slide 19

Slide 19 text

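Examples of BPF programs
Counting how many times an event occurs
・Get the count so far from the eBPF map
・Increment it and write it back to the map
Measuring latency
・Find a pair of functions (e.g. alloc/free)
・In the prologue function, get the timestamp and store it in the map
・In the epilogue function, compute the difference from the timestamp stored in the map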
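The two patterns above can be modeled in plain Python before looking at real BPF code. This is only a sketch of the logic: ordinary dicts stand in for eBPF maps, the handler names are hypothetical, and timestamps are passed in by hand, whereas a real BPF program would be C attached to kprobes and would take the time from a helper such as bpf_ktime_get_ns().

```python
event_counts = {}  # stands in for an eBPF map: event name -> count
start_times = {}   # stands in for an eBPF map: request id -> start timestamp (ns)


def on_event(name):
    """Counting: read the current count from the map, add one, write it back."""
    event_counts[name] = event_counts.get(name, 0) + 1


def on_entry(request_id, now_ns):
    """Latency, prologue side: remember when the operation started."""
    start_times[request_id] = now_ns


def on_exit(request_id, now_ns):
    """Latency, epilogue side: subtract the stored timestamp and drop the entry."""
    start = start_times.pop(request_id, None)
    if start is None:
        return None  # the matching entry was never observed
    return now_ns - start


on_event("alloc")
on_event("alloc")
on_entry(42, 1_000)
latency = on_exit(42, 1_750)
print(event_counts["alloc"], latency)
```

Removing the entry in on_exit keeps the map from growing without bound, which matters in a real eBPF map because its size is fixed when it is created.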

Slide 20

Slide 20 text

BPF program example
https://github.com/torvalds/linux/blob/v4.18/samples/bpf/tracex3_kern.c
・Measuring block I/O latency

Slide 21

Slide 21 text

Kernel support status
Table columns: Feature | Linux Version | BPF Program Type
Features listed: Kprobe / Uprobe, Tracepoint, Perf software / hardware event
https://github.com/iovisor/bcc/blob/master/docs/kernel-versions.md

Slide 22

Slide 22 text

How to actually use BPF
Linux Sample https://github.com/torvalds/linux/tree/master/samples/bpf
bpf(2) http://man7.org/linux/man-pages/man2/bpf.2.html
perf_event_open(2) http://man7.org/linux/man-pages/man2/perf_event_open.2.html

Slide 23

Slide 23 text

3 Tracing with bcc

Slide 24

Slide 24 text

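What makes writing BPF programs hard
• Documentation is scarce, and the system calls are hard to understand
 • bpf(2) and perf_event_open(2) are formidable opponents
• There are many C-language pitfalls to watch for
 • e.g. string constants must be placed on the stack
• Handling BPF maps
 • The file descriptor of the BPF map has to be embedded in the BPF program
 • However, a BPF program built with Clang is an ELF binary
 • A loader that loads the ELF binary appropriately is required
 • One exists in the Linux samples, but it is hard to use from a general application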

Slide 25

Slide 25 text

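bcc (BPF Compiler Collection)
・https://github.com/iovisor/bcc
・A library that supports writing BPF programs (note: not limited to tracing)
・A modified-C compiler and loader for BPF
・Bindings for other languages (Lua, Python, Go) ※ the tracing code itself is written in C
・A collection of tracing tools built with bcc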

Slide 26

Slide 26 text

Program example with bcc
Attaches a kprobe to finish_task_switch() and reads the map from Python.
https://github.com/iovisor/bcc/blob/master/examples/tracing/task_switch.py
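A program of this shape can be sketched as follows. It is a minimal sketch modeled on the linked example rather than a copy of it, and it makes several assumptions: bcc's Python binding is installed, the script runs as root, and the kernel exposes a probe-able symbol named finish_task_switch (some newer kernels rename it). The names BPF_TEXT and run are chosen here for illustration. The bcc import sits inside run() so that the file can be loaded on a machine without bcc.

```python
# BPF side, written in bcc's modified C and compiled by bcc at run time.
BPF_TEXT = r"""
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>

struct key_t {
    u32 prev_pid;
    u32 curr_pid;
};

// Hash map shared with user space: (prev_pid, curr_pid) -> switch count.
BPF_HASH(stats, struct key_t, u64);

int count_sched(struct pt_regs *ctx, struct task_struct *prev) {
    struct key_t key = {};
    key.curr_pid = bpf_get_current_pid_tgid();
    key.prev_pid = prev->pid;
    stats.increment(key);
    return 0;
}
"""


def run(duration_s=1.0):
    """Attach the probe, wait, then dump the map from Python (needs root and bcc)."""
    from bcc import BPF  # imported lazily: requires the bcc Python binding
    from time import sleep

    b = BPF(text=BPF_TEXT)
    b.attach_kprobe(event="finish_task_switch", fn_name="count_sched")
    sleep(duration_s)
    for key, value in b["stats"].items():
        print("task_switch[%5d->%5d]=%u" % (key.prev_pid, key.curr_pid, value.value))
```

Calling run() as root prints one line per (previous pid, current pid) pair seen during the interval. The map creation, program load, and attach steps are all hidden behind the BPF(...) constructor and attach_kprobe().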

Slide 27

Slide 27 text

Modified C
https://github.com/iovisor/bcc/blob/master/examples/tracing/vfsreadlat.c
[Bullet text not recoverable; legible terms: bcc, BPF, modified C, BPF map, Clang, AST, eBPF map]
https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md

Slide 28

Slide 28 text

bcc compiler

Slide 29

Slide 29 text

bcc as a tool
The bcc repository includes tracing tools built with bcc.
https://github.com/iovisor/bcc/tree/master/tools
Packages exist for the major distros:
・Ubuntu
・Fedora
・Arch
・Gentoo
・openSUSE
・RHEL
https://github.com/iovisor/bcc/blob/master/INSTALL.md

Slide 30

Slide 30 text

bcc as a tool

Slide 31

Slide 31 text

bcc as a tool

Slide 32

Slide 32 text

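Drawbacks of bcc (by design)
Compilation happens on every run.
That said, BPF basically has no concept of arguments, so there are many situations where dynamic compilation is needed anyway.
Dependencies increase.
Python 3 support is currently lacking.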
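The remark about arguments can be illustrated with a pattern that appears in several of the tools in the bcc repository: because a value such as a target PID cannot be passed to an already-compiled BPF program as a normal argument, the tool edits the C source text and compiles the result. The sketch below shows only the text-substitution step; the template, the FILTER placeholder, and the function name are hypothetical, and the rendered text would then be handed to bcc for compilation.

```python
# BPF program template in bcc's modified C. FILTER is a placeholder that is
# replaced with C code (or with nothing) before the program is compiled.
BPF_TEMPLATE = """
int trace_entry(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    FILTER
    bpf_trace_printk("hit\\n");
    return 0;
}
"""


def render_program(target_pid=None):
    """Return C source with the PID filter baked in as a constant."""
    if target_pid is None:
        filter_code = ""  # no filtering: trace every process
    else:
        # int() guards against anything other than a number reaching the C text.
        filter_code = "if (pid != %d) { return 0; }" % int(target_pid)
    return BPF_TEMPLATE.replace("FILTER", filter_code)


program = render_program(1234)
print(program)
```

Baking the value into the source keeps the BPF program simple at the cost of recompiling for each run; passing the value through a BPF map is an alternative that avoids editing the source.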

Slide 33

Slide 33 text

Complementary

Slide 34

Slide 34 text

Still want to trace in something other than C
bpftrace (Dtrace-like ⇨ LLVM ⇨ eBPF) https://github.com/iovisor/bpftrace
ply (Dtrace-like ⇨ eBPF) https://github.com/iovisor/ply
py2bpf (Python byte code ⇨ eBPF) https://github.com/facebookresearch/py2bpf

Slide 35

Slide 35 text

Using BPF from Go
gobpf https://github.com/iovisor/gobpf
github.com/iovisor/gobpf/bcc: bcc binding (requires libbcc)
github.com/iovisor/gobpf/elf: elf loader (you compile the ELF binary yourself)

Slide 36

Slide 36 text

Not covered today
uprobe (⇔ kprobe)
USDT (⇔ tracepoint)
ftrace

Slide 37

Slide 37 text

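Summary
BPF lets you use the tracing mechanisms that have long existed in Linux in a programmable way.
bcc makes writing BPF trace programs much easier.
With bcc you can easily run BPF-based tracing (even without knowing anything about BPF).
Let's try!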

Slide 38

Slide 38 text
