Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Vernier: A next generation CRuby profiler - Rub...

Vernier: A next generation CRuby profiler - RubyKaigi 2024

John Hawthorn

May 15, 2024
Tweet

More Decks by John Hawthorn

Other Decks in Programming

Transcript

  1. bundle update $ vernier run -- Fetching gem metadata from

    https://rubygems.org/... Resolving dependencies... Bundle updated! #<Vernier::Result 0.6187 seconds, 1086 samples> written to ./bundle_update.vernier.json.gz
  2. Firefox CRuby • C++ • Javascript • JIT compilers •

    multithreaded • C • Ruby • JIT compiler • multithreaded
  3. name = line[0, line.index(" ")] checksum = line[line.rindex(" "), line.size]

    name, _, checksum = line.split(" ", 3) checksums[name] = checksum # name, _, checksum = line.split(" ", 3)
  4. $stack_table = Vernier::StackTable.new def bar = $stack_table.current_stack def foo =

    bar p foo() #=> 2 p 3.times.map { foo() } #=> [5, 5, 5] puts $stack_table.backtrace(2) # test.rb:3:in 'Object#bar' # test.rb:4:in 'Object#foo' # test.rb:5:in '<main>'
  5. $stack_table = Vernier::StackTable.new def bar = $stack_table.current_stack def foo =

    bar p foo() #=> 2 p 3.times.map { foo() } #=> [5, 5, 5] Never allocates Ruby objects
  6. Tracing pro fi ler times = [] TracePoint.new(:call) do |tp|

    times << Time.now end TracePoint.new(:return) do |tp| puts "#{tp.method} took #{times.pop - Time.now}" end
  7. $ time ruby fib.rb 0.107s $ time ruby -rprofile fib.rb

    52.105s $ time ruby-prof fib.rb 2.888s 28x slower 500x slower
  8. Thread.new do loop do sleep 0.001 end end Sampling pro

    fi ler $samples = [] [ ["test.rb:2:in `foo'", "test.rb:8:in `<main>'"], ["test.rb:5:in `bar'", "test.rb:9:in `<main>'"], ... $samples << Thread.main.backtrace
  9. running program (idle) sleep(0.001) sleep(0 sleep(0.001) start = Time.now while

    Time.now < start + 1 # loop for 1s end record sample record sample pure-Ruby pro fi ler thread
  10. stalled sleep(0.001) running program (Ruby GVL) GVL contention start =

    Time.now while Time.now < start + 1 # loop for 1s end
  11. #define native_thread_yield() ((void)0) #endif /* 100ms. 10ms is too small

    for user level * thread scheduling on recent Linux * (tested on 2.6.35) */ #define TIME_QUANTUM_MSEC (100) #define TIME_QUANTUM_USEC (TIME_QUANTUM_MSEC * 100 #define TIME_QUANTUM_NSEC (TIME_QUANTUM_USEC * 100 static void native_thread_dedicated_inc(rb_vm_t *v struct rb_native_thread *nt);
  12. Thread GVL states • Running - Holding the GVL. Executing

    Ruby code • Stalled - Blocked acquiring the GVL • Suspended - GVL released (I/O, sleep, network, etc.) Records backtrace!
  13. rb_pro fi le_frames save sample UNIX Signal queue job "safepoint"

    running program (Ruby GVL) Stackprof (< 0.2.18) RUBY_VM_CHECK_INTS
  14. def empty_method end def slow_method x = "h" + "e"

    + "l" + "l" + "o" + "," x += "w" + "o" + "r" + "l" + "d" empty_method end 1000.times do slow_method end
  15. def empty_method end def slow_method x = "h" + "e"

    + "l" + "l" + "o" + "," x += "w" + "o" + "r" + "l" + "d" empty_method end 1000.times do slow_method end
  16. TOTAL (pct) SAMPLES (pct) FRAME 56 (100.0%) 56 (100.0%) empty_method

    56 (100.0%) 0 (0.0%) slow_method 56 (100.0%) 0 (0.0%) Integer#times 56 (100.0%) 0 (0.0%) <main> def empty_method end def slow_method x = "h" + "e" + "l" + "l" + "o" + "," x += "w" + "o" + "r" + "l" + "d" empty_method end 1000.times do slow_method end 100%
  17. rb_pro fi le_frames save sample UNIX Signal queue job running

    program (Ruby GVL) Stackprof (>= 0.2.18) "safepoint" RUBY_VM_CHECK_INTS
  18. rb_pro fi le_frames save sample UNIX Signal queue job running

    program (Ruby GVL) Stackprof (>= 0.2.18) "safepoint" RUBY_VM_CHECK_INTS
  19. $ time \ ruby --yjit test.rb 2.149 total $ stackprof

    run -- \ ruby --yjit test.rb 7.902 total $ time vernier run -- \ ruby --yjit test.rb 2.351 total def test(n) while n > 0 n -= 1 end end test(100_000_000)
  20. rb_pro fi le_frames save sample UNIX Signal queue job running

    program (Ruby GVL) Stackprof (>= 0.2.18) "safepoint" RUBY_VM_CHECK_INTS
  21. rb_pro fi le_frames UNIX Signal queue job Stackprof (>= 0.2.18)

    running program (C GVL) GVLTest.sleep_holding_gvl(1)
  22. rb_pro fi le_frames UNIX Signal queue job running program (C

    GVL) rb_pro fi le_frames UNIX Signal queue job "Missed sample" GVLTest.sleep_holding_gvl(1)
  23. Vernier rb_pro fi le_frames save sample Vernier thread running program

    (C GVL) 📏 Less overhead! More accurate! GVLTest.sleep_holding_gvl(1)
  24. $ bundle exec \ vernier run \ --hooks=rails \ --output

    /tmp/puma.profile.json \ --signal CONT --start-paused -- \ bin/rails s -p 3000 $ pkill -CONT -f puma Toggle pro fi ling
  25. • Option 1: Use separate processes • Option 2: Make

    "running" section faster • Option 3: Short lived Ractor
  26. Vernier • GVL aware pro fi ling • Pro fi

    les all Ractors and Threads (M:N supported) • Low overhead, no postponed jobs • GC tracing, GVL tracing, user tracing, allocation tracing • Retained memory tracing (bonus feature)