Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Rubyのパフォーマンスプロファイリングの改善 / Enhancing performance...

osyoyu
July 19, 2024

Rubyのパフォーマンスプロファイリングの改善 / Enhancing performance profiling for Ruby

osyoyu

July 19, 2024
Tweet

More Decks by osyoyu

Other Decks in Programming

Transcript

  1. RubyϓϩϑΝΠϥͷݱگ (2023) StackProf Sampling In-Process rb_pro fi le_frames() ௿Φʔόʔϔου ΞΫςΟϒͳεϨουͷΈΛ؍࡯Մೳ

    ruby-prof Tracing In-Process TracePoint ͢΂ͯͷϝιουݺͼग़͠Λਖ਼֬ʹ௥੻Մ ߴΦʔόʔϔου rbspy Sampling Out-of-Process process_vm_readv(2) ϓϩάϥϜͷมߋͳ͠ͰಋೖՄೳ RubyͷόʔδϣϯʹରԠͨ͠΋ͷ͕ඞཁ • ޿͘࢖ΘΕΔϓϩϑΝΠϥ͕͍͔ͭ͘ଘࡏ͢Δ • ͦΕͧΕͷಛ௃͕͋ΓɺͲΕ͕࠷΋͙͢Ε͍ͯΔͱ͍͏΋ͷͰ͸ͳ͍ • (ࠓճͷϓϩδΣΫτͰ͸CPUϓϩϑΝΠϥʹߜͬͯௐࠪ)
  2. RubyϓϩϑΝΠϥͷݱگ (2023) Sampling Tracing ҰఆͷपظͰαϯϓϧΛऔಘ͢Δɺ౷ܭతͳख๏ 👍 Low overhead / 👎

    ৴པੑ͸ྼΔ શϝιουΤϯτϦʔͱϦλʔϯΛه࿥͢Δ 👍 ৴པੑ͸࣮֬ / 👎 High overhead In-Process Out-of-Process Rubyϓϩηε಺ʹϓϩϑΝΠϥΛઃஔ͢Δख๏ 👍 ॲཧܥʹؔ͢Δଟ༷ͳ৘ใΛऔಘՄ 👎 ಋೖ͕ෳࡶʹͳΓ͕ͪ ϓϩηε֎͔ΒϝϞϦಡΈऔΓ౳Ͱ৘ใΛऔಘ 👍 ಋೖɾ࢖͍ํ͕γϯϓϧ 👎 ࢓૊Έͱͯ͠ෳࡶɺॲཧܥ΁ͷ௥ै͕େม StackProf Sampling In-Process rb_pro fi le_frames() ௿Φʔόʔϔου ΞΫςΟϒͳεϨουͷΈΛ؍࡯Մೳ ruby-prof Tracing In-Process TracePoint ͢΂ͯͷϝιουݺͼग़͠Λਖ਼֬ʹ௥੻Մ ߴΦʔόʔϔου rbspy Sampling Out-of-Process process_vm_readv(2) ϓϩάϥϜͷมߋͳ͠ͰಋೖՄೳ RubyͷόʔδϣϯʹରԠͨ͠΋ͷ͕ඞཁ
  3. ϓϩδΣΫτͷΰʔϧઃఆ RubyҎ֎ͷݴޠॲཧܥͷϓϩϑΝΠϥΛ؍࡯ • Go (pprof) ΍ Java (JFR, async-pro fi

    le, ...) ͷΤίγεςϜ͕ඇৗʹॆ࣮ • ݴޠػೳʹ͋Θͤͨػೳ͕ॆ࣮͍ͯ͠Δ • ಛʹ pprof (Go) ͸಄ͻͱͭൈ͚͍ͯΔ • Xcode, Android Studio ૊ΈࠐΈͷϓϩϑΝΠϥ΋ۃΊͯڧྗ • CPU, Memory, Network, Battery, Events, શ෦·ͱΊͯݟΒΕΔ → Ruby ͷಛੑΛΑΓ౿·͑ͨϓϩϑΝΠϦϯάΛՄೳʹ͍ͨ͠
  4. ϓϩδΣΫτͷΰʔϧઃఆ RubyͷಛੑΛ౿·͑ͨϓϩϑΝΠϧΛͰ͖Δ͜ͱ - RubyεϨου୯ҐͰͷ݁ՌΛग़ྗͰ͖Δ - RubyҎ֎ͷ෦෼ (C) ΛϒϥοΫϘοΫεʹ͠ͳ͍ - Global

    VM Lock (GVL) ͷӨڹΛܭଌͰ͖Δ ͙͢ΕͨϏδϡΞϥΠζΛ௨ͨ͠෼ੳΛՄೳʹ͢Δ͜ͱ - ෼ੳʹ༗༻ͳࢹ఺Λఏڙ͢Δ ҎԼͷੑ࣭Λͦͳ࣮͑ͨݧతͳϓϩϑΝΠϥΛ࣮૷͢Δɻ ·ͨɺ͜ΕΒͷ੒ՌΛRubyΤίγεςϜʹؐݩ͢Δɻ
  5. Pf2 ϓϩϑΝΠϥ • Rubyͷ "۱ʑ·Ͱ" "ਖ਼֬ʹ" ଌఆ Ͱ͖Δ͜ͱΛ໨ࢦ͢ • RubyϑϨʔϜ

    • C (Native) ϑϨʔϜ • GVLͷ઎༗ঢ়گ • Sampling, In-process ͳϓϩϑΝΠϥ ͱͯ͠ઃܭ • ͙͢ΕͨϏδϡΞϥΠζ΋ͦͳ͑Δ
  6. ࢖͍͔ͨ % bundle exec pf2 run -- ruby main.rb [Pf2]

    Listening on localhost:51502. [Pf2] Open https://profiler.firefox.com/from-url/ http%3A%2F%2Flocalhost%3A51502%2Fprofile for visualization. Pf2.profile(mode: :wall) { # code to be profiled fib(200) } ίϚϯυϥΠϯ͔Β ruby શମΛϓϩϑΝΠϧ ϓϩάϥϜͷҰ෦͚ͩΛϓϩϑΝΠϧ
  7. Firefox Profilerͷར༻ • ϏδϡΞϥΠζ͸ॏཁͳػೳ͕ͩɺͦΕࣗମͰ1ϓϩδΣΫτʹͳΔ΄Ͳେ͖͍ • ຊϓϩδΣΫτͰ͸ɺ׬੒౓ͷߴ͍طଘͷϏδϡΞϥΠβ = Firefox Pro fi

    lerʹ RubyͷཁૉΛϚοϐϯά͢Δ͜ͱͰରԠ • Firefox Pro fi lerͷϓϩϑΝΠϥػೳ͸·ͬͨ͘࢖͍ͬͯͳ͍ ϓϩϑΝΠϧσʔλࣗମ͸ Firefox Pro fi ler ʹඇґଘɻ ଞͷϏδϡΞϥΠβʢgo tool pprofʣʹ΋ରԠ
  8. [σϞ] Optcarrot % pf2 run -- bin/optcarrot 'examples/Lan Master.zip' [Pf2]

    Listening on localhost:51502. [Pf2] Open https://profiler.firefox.com/from-url/ http%3A%2F%2Flocalhost%3A51502%2Fprofile for visualization.
  9. [σϞ] Multi-threaded Mandelbrot "Mandelbrot set" by Wolfgang Beyer. Licensed under

    CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0/ • Ϛϯσϧϒϩू߹ = ਺ࣜʹΑͬͯఆٛ͞ΕΔ • Ϛϯσϧϒϩू߹Λܭࢉ͠ɺࠨਤΛϓϩοτ ͢ΔϓϩάϥϜΛϓϩϑΝΠϧ • ฒྻԽ͕Α͘ޮ͘΋ͷͱͯ͠஌ΒΕΔ • ͜͜Ͱ͸16 threadsͰܭࢉ
  10. Pf2ͷ࢓૊Έͷུ֓ αϯϓϧͷऩू ҰఆपظͰϓϩάϥϜͷ࣮ߦঢ়گΛऔಘ ूܭ ؔ਺໊ͷ෮ݩͳͲ ϏδϡΞϥΠζ Call tree΍ fl amegraphͳͲͷ෼͔Γ΍͍͢දݱ΁

    • Ұఆ࣌ؒ͝ͱʹ • RubyͷελοΫͱ • CͷελοΫΛऩू • ϝϞϦ্ʹอଘ • σϑΥϧτͷपظ͸9ms
  11. Pf2ͷ࢓૊Έͷུ֓ αϯϓϧͷऩू ҰఆपظͰϓϩάϥϜͷ࣮ߦঢ়گΛऔಘ ूܭ ؔ਺໊ͷ෮ݩͳͲ ϏδϡΞϥΠζ Call tree΍ fl amegraphͳͲͷ෼͔Γ΍͍͢දݱ΁

    • ऩूͨ͠αϯϓϧΛूܭ͠ɺ ग़ݱճ਺ͷଟ͍ϝιου౳Λܭࢉ • ϝιου໊ͷ෮ݩ΋͜ͷλΠϛϯά Ͱߦ͏ • αϯϓϧऩू࣌͸ϙΠϯλ͚ͩΛ อ͍࣋ͯ͠Δ
  12. Thread୯ҐͰͷߴ౓ͳϓϩϑΝΠϦϯά • pthread ୯ҐͰͷγάφϧઃఆ & Ruby 3.3 Ͱ௥Ճͨ͠ rb_pro fi

    le_thread_frames() Λར༻ • ؔ৺ͷ͋ΔεϨουΛ໌ࣔతʹࢦఆ͢Δ͜ͱ΋Մೳʹͳͬͨ 0ms 10ms 20ms 30ms Ruby Thread 1 Ruby Thread 2 SIGPROF SIGPROF Using CPU Using CPU Using CPU γάφϧϋϯυϥ rb_pro fi le_thread_frames() Λ࢖͍ελοΫΛऔಘ γάφϧϋϯυϥ
  13. C (Native) ίʔυͷϓϩϑΝΠϦϯά • RubyͰ͸CͰ࣮૷͞Εͨػೳ͕ͨ͘͞Μ͋Δ • C֦ுɺRuby಺෦ͷॲཧ • RubyͷϓϩϑΝΠϦϯάػೳ rb_pro

    fi le_frames() Ͱ͸͜ΕΛ؍࡯Ͱ͖ͳ͍ • libbacktrace Λར༻͠ɺ࣮ߦதͷΞυϨεʢPCʣΛอଘ͢Δ͜ͱͰରԠ 0ms 10ms 20ms 30ms Ruby Thread 1 SIGPROF Using CPU Using CPU γάφϧϋϯυϥ rb_pro fi le_thread_frames() + PCͷอଘ
  14. ʮݟ͑ͳ͍ʯCίʔυͷՄࢹԽ a.k.a. Native Stack Consolidation <main> Integer#times block in <main>

    Kernel#foo block (2 levels) in <main> main() ruby_run_node() rb_ec_exec_node() rb_vm_exec() vm_exec_core() rb_yield() rb_foo() vm_call_cfunc_with_cfra main() ruby_run_node() rb_ec_exec_node() rb_vm_exec() <main> Integer#times block in <main> Kernel#my_cfunc vm_callc_cfunc_with_cframe() vm_exec_core() rb_yield() Kernel#foo block (2 levels) in <main> Ruby/CͰผʑʹऩूͨ͠ ελοΫΛ "Ϛʔδ" ͢Δ ෦෼ʹ՝୊͕͋Δ
  15. ʮݟ͑ͳ͍ʯCίʔυͷՄࢹԽ <main> Integer#times block in <main> Kernel#my_cfunc block (2 levels)

    in <main> Kernel#foo block (2 levels) in <main> main() ruby_run_node() rb_ec_exec_node() rb_vm_exec() vm_exec_core() rb_yield() rb_foo() main() ruby_run_node() rb_ec_exec_node() rb_vm_exec() ωΠςΟϒίʔυͷ main() ͔ΒߏஙΛ։࢝ rb_vm_exec() Λൃݟͨ͠ Β Ruby ελοΫʹ͏ͭΔ
  16. ʮݟ͑ͳ͍ʯCίʔυͷՄࢹԽ <main> Integer#times block in <main> Kernel#my_cfunc block (2 levels)

    in <main> Kernel#foo block (2 levels) in <main> main() ruby_run_node() rb_ec_exec_node() rb_vm_exec() vm_exec_core() rb_yield() rb_foo() main() ruby_run_node() rb_ec_exec_node() rb_vm_exec() <main> Integer#times block in <main> Kernel#my_cfunc cfunc Λൃݟͨ͠Β C ͷε λοΫʹ໭Δ
  17. ʮݟ͑ͳ͍ʯCίʔυͷՄࢹԽ <main> Integer#times block in <main> Kernel#my_cfunc block (2 levels)

    in <main> Kernel#foo block (2 levels) in <main> main() ... ... vm_callc_cfunc_with_cframe() my_cfunc() vm_exec_core() ... 0x7cabca (non-cfunc) (non-cfunc) 0x7cabca 0x7c0009 0x7c0010 #[repr(C)] pub struct rb_callable_method_entry_struct { pub flags: VALUE, pub def: *mut rb_method_definition_struct, // ... } #[repr(C)] pub struct rb_method_definition_struct { pub type_: c_int, _padding: [c_char; 4], pub cfunc: rb_method_cfunc_struct, } #[repr(C)] pub struct rb_method_cfunc_struct { pub func: *mut c_void, } ͜ͷ࢓૊ΈͷͨΊʹଟ਺ͷ struct Λ CRuby ͔Βίϐʔ͍ͯ͠Δ... rb_pro fi le_frames() ͷฦΓ஋͔ΒͨͲΕΔ cfunc ͷϙΠϯλͱҰக͢ΔΞυϨεΛϚονϯά
  18. ʮݟ͑ͳ͍ʯCίʔυͷՄࢹԽ <main> Integer#times block in <main> Kernel#my_cfunc block (2 levels)

    in <main> Kernel#foo block (2 levels) in <main> main() ruby_run_node() rb_ec_exec_node() rb_vm_exec() vm_exec_core() rb_yield() rb_foo() main() ruby_run_node() rb_ec_exec_node() rb_vm_exec() <main> Integer#times block in <main> Kernel#my_cfunc vm_callc_cfunc_with_cframe() vm_exec_core() rb_yield() Kernel#foo block (2 levels) in <main> શͯফඅͨ͠Β׬ྃ
  19. ߴ౓ͳεέδϡʔϦϯά • εέδϡʔϦϯά = αϯϓϦϯάपظͷௐ੔ • Stackprof ͸ CPU time

    ʹରԠ͍ͯ͠Δ • ϓϩηε͕Ұఆͷ CPU ࣌ؒΛফඅͨ͠ΒαϯϓϦϯάΛ૸ΒͤΔ࢓૊Έ • ͔͠͠ɺεϨουؒͰॲཧʹภΓ͕͋Δ࣌ɺ݁Ռ͕͓͔͘͠ͳΔ • Linux ݶఆͰ͸͋Δ͕ɺtimer_create(2) ͷ CLOCK_THREAD_CPUTIME_ID ΦϓγϣϯΛ࢖͏͜ͱͰɺεϨου୯ҐͷܭଌΛ࣮ݱ͍ͯ͠Δ
  20. ߴ౓ͳεέδϡʔϦϯά job() 0ms 10ms 20ms 30ms job() job() • σϑΥϧτͷपظ͸

    9ms ͱͨ͠ • Lockstep sampling (पظੑͷ͋Δ࣮ߦΛݟಀͯ͠͠·͏໰୊) ରࡦ ݟಀ͢ ݟಀ͢ ݟಀ͢
  21. ͓·͚: RustʹΑΔ࣮૷ͷײ૝ unsafe { • Pf2ͷC֦ு෦෼͸΄ͱΜͲRustͰ࣮૷ͨ͠ • αϯϓϧͷετϨʔδ౳Λ࣮૷͢Δʹ͋ͨͬͯ͸ྑ͔ͬͨ • Ruby

    C API΍libcΛݺͼग़͢෦෼Ͱ͸΄΅ C ͷݟͨ໨ (unsafe) • RustͰgemΛͭ͘ΔΤίγεςϜ͸·ͩ·ͩൃల్্ • CRuby͕ఏڙ͢ΔϚΫϩ (INT2FIX౳) ͸࢖͑ͳ͔ͬͨΓ • ruby masterͰϏϧυͰ͖ͳ͔ͬͨΓ }
  22. ϓϩϑΝΠϥͷΦʔόʔϔουͷධՁ ࣮ߦ࣌ؒ • optcarrot (ϑΝϛίϯΤϛϡ Ϩʔλ) ͰϕϯνϚʔΫ • CPU-bound ͸

    CPU ϓϩϑ ΝΠϥʹͱͬͯݫ͍͠৚݅ • Φʔόʔϔου͸ 1-2% Ͱ ͋Δ͜ͱ͕ࣔ͞Εͨ
  23. ϓϩϑΝΠϥͷΦʔόʔϔουͷධՁ ফඅϝϞϦ • ऩूͨ͠αϯϓϧΛϝϞϦʹอଘ͢Δ౎߹্ɺϝϞϦফඅ͸૿େ͢Δ • 3.01ඵͷ optcarrot ࣮ߦͰͷϝϞϦফඅ (time -v

    ʹΑΔ maximum RSS) • Pf2ແޮ ... 58.9 MB • Pf2༗ޮ ... 69.3 MB • ελοΫͷେ͖͞ͱ࣌ؒʹରͯ͠ઢܗʹ৳ͼΔ • RailsΞϓϦέʔγϣϯͰ͸ΑΓݦஶʹͳΔ
  24. ϝϞϦফඅͷվળ΁ • αϯϓϧ1ͭ͋ͨΓ14KBͷ͸ͣ • 111 samples/ඵ = 1.5 MB/ඵ •

    ·ͨɺframesΛ࣋ͭ͜ͱͰCRubyͷ ϑϨʔϜ (Callable Method Entry) ͕ GC͞Εͳ͘ͳΓɺϝϞϦফඅ͕૿େ • ਖ਼֬ͳௐࠪʹ͸ϓϩϑΝΠϥͷ ϓϩϑΝΠϥ͕ඞཁ……ʂ // 14038 bytes (14 KB) pub struct Sample { pub ruby_thread: VALUE, // 8 bytes pub timestamp: Instant, // 16 bytes pub line_count: i32, // 4 bytes // frames = rb_profile_frames() pub frames: [VALUE; 500], // 4000 bytes pub linenos: [i32; 500], // 2000 bytes // 8008 bytes pub c_backtrace_pcs: [usize; 1001], }
  25. ॲཧܥͱϓϩϑΝΠϥͷؔ܎ͷมԽ • C API ΍ॲཧܥͷ಺෦࣮૷Λ΋؍࡯ ͠Α͏ͱࢥ͏ͱɺॲཧܥͷ಺෦Ҏ֎ ʹஔ͖৔ॴ͕ͳ͍ • M:N ͕ਐΉͱɺ1

    Thread = 1 pthread ͷԾఆ͕յΕɺݱࡏͷख๏͕ద༻͠ ͮΒ͘ͳΔ • → CRuby ʹػೳΛ௥Ճ͍ͯ͘͠ํ๏ Λݕ౼ current Pf2