John Hawthorn
Sta
ff
Engineer @ GitHub
Ruby Committer
Rails Core
@jhawthorn
@[email protected]
john.hawthorn.website
Slide 3
Slide 3 text
No content
Slide 4
Slide 4 text
bundle update
$ vernier run --
Fetching gem metadata from https://rubygems.org/...
Resolving dependencies...
Bundle updated!
#
written to ./bundle_update.vernier.json.gz
Slide 5
Slide 5 text
No content
Slide 6
Slide 6 text
No content
Slide 7
Slide 7 text
No content
Slide 8
Slide 8 text
Firefox
CRuby
• C++
• Javascript
• JIT compilers
• multithreaded
• C
• Ruby
• JIT compiler
• multithreaded
Tracing pro
fi
ler
times = []
TracePoint.new(:call) do |tp|
times << Time.now
end
TracePoint.new(:return) do |tp|
puts "#{tp.method} took #{times.pop - Time.now}"
end
Slide 35
Slide 35 text
$ time ruby fib.rb
0.107s
$ time ruby -rprofile fib.rb
52.105s
$ time ruby-prof fib.rb
2.888s 28x slower
500x slower
Slide 36
Slide 36 text
Ruby
Native code
Idle
Slide 37
Slide 37 text
Ruby
Native code
Idle
with 30x overhead
Slide 38
Slide 38 text
Tracing: User timings
Slide 39
Slide 39 text
GC instrumentation
Slide 40
Slide 40 text
Thread.new do
loop do
sleep 0.001
end
end
Sampling pro
fi
ler
$samples = []
[
["test.rb:2:in `foo'", "test.rb:8:in `'"],
["test.rb:5:in `bar'", "test.rb:9:in `'"],
...
$samples << Thread.main.backtrace
Slide 41
Slide 41 text
sleep(1)
Slide 42
Slide 42 text
running program (idle)
sleep(1)
Slide 43
Slide 43 text
running program (idle)
sleep(1)
pure-Ruby pro
fi
ler thread
Slide 44
Slide 44 text
record sample
record sample
running program (idle)
sleep(0.001) sleep(0
sleep(0.001)
sleep(1)
pure-Ruby pro
fi
ler thread
Slide 45
Slide 45 text
running program (idle)
sleep(0.001) sleep(0
sleep(0.001)
start = Time.now
while Time.now < start + 1
# loop for 1s
end
record sample
record sample
pure-Ruby pro
fi
ler thread
Slide 46
Slide 46 text
stalled
sleep(0.001)
running program (Ruby GVL)
GVL contention
start = Time.now
while Time.now < start + 1
# loop for 1s
end
#define native_thread_yield() ((void)0)
#endif
/* 100ms. 10ms is too small for user level
* thread scheduling on recent Linux
* (tested on 2.6.35)
*/
#define TIME_QUANTUM_MSEC (100)
#define TIME_QUANTUM_USEC (TIME_QUANTUM_MSEC * 100
#define TIME_QUANTUM_NSEC (TIME_QUANTUM_USEC * 100
static void native_thread_dedicated_inc(rb_vm_t *v
struct rb_native_thread *nt);
Slide 52
Slide 52 text
Thread GVL states
• Running - Holding the GVL. Executing Ruby code
• Stalled - Blocked acquiring the GVL
• Suspended - GVL released (I/O, sleep, network, etc.)
Records backtrace!
Slide 53
Slide 53 text
No content
Slide 54
Slide 54 text
No content
Slide 55
Slide 55 text
No content
Slide 56
Slide 56 text
rb_pro
fi
le_frames
save sample
UNIX
Signal
queue job
"safepoint"
running program (Ruby GVL)
Stackprof (< 0.2.18)
RUBY_VM_CHECK_INTS
Slide 57
Slide 57 text
def empty_method
end
def slow_method
x = "h" + "e" + "l" + "l" + "o" + ","
x += "w" + "o" + "r" + "l" + "d"
empty_method
end
1000.times do
slow_method
end
Slide 58
Slide 58 text
def empty_method
end
def slow_method
x = "h" + "e" + "l" + "l" + "o" + ","
x += "w" + "o" + "r" + "l" + "d"
empty_method
end
1000.times do
slow_method
end
Slide 59
Slide 59 text
TOTAL (pct) SAMPLES (pct) FRAME
56 (100.0%) 56 (100.0%) empty_method
56 (100.0%) 0 (0.0%) slow_method
56 (100.0%) 0 (0.0%) Integer#times
56 (100.0%) 0 (0.0%)
def empty_method
end
def slow_method
x = "h" + "e" + "l" + "l" + "o" + ","
x += "w" + "o" + "r" + "l" + "d"
empty_method
end
1000.times do
slow_method
end
100%
Slide 60
Slide 60 text
rb_pro
fi
le_frames
save sample
UNIX
Signal
queue job
running program (Ruby GVL)
Stackprof (>= 0.2.18)
"safepoint"
RUBY_VM_CHECK_INTS
Slide 61
Slide 61 text
rb_pro
fi
le_frames
save sample
UNIX
Signal
queue job
running program (Ruby GVL)
Stackprof (>= 0.2.18)
"safepoint"
RUBY_VM_CHECK_INTS
Slide 62
Slide 62 text
async-signal-safe?
Is rb_profile_frames
Of
fi
cially? No.
In the VM? Since Ruby 3.0.0
On YJIT? Since Ruby 3.2.1
Slide 63
Slide 63 text
$ time \
ruby --yjit test.rb
2.149 total
$ stackprof run -- \
ruby --yjit test.rb
7.902 total
$ time vernier run -- \
ruby --yjit test.rb
2.351 total
def test(n)
while n > 0
n -= 1
end
end
test(100_000_000)
Slide 64
Slide 64 text
rb_pro
fi
le_frames
save sample
UNIX
Signal
queue job
running program (Ruby GVL)
Stackprof (>= 0.2.18)
"safepoint"
RUBY_VM_CHECK_INTS
Slide 65
Slide 65 text
rb_pro
fi
le_frames
UNIX
Signal
queue job
Stackprof (>= 0.2.18)
running program (C GVL)
GVLTest.sleep_holding_gvl(1)
Slide 66
Slide 66 text
rb_pro
fi
le_frames
UNIX
Signal
queue job
running program (C GVL)
rb_pro
fi
le_frames
UNIX
Signal
queue job
"Missed sample"
GVLTest.sleep_holding_gvl(1)
Slide 67
Slide 67 text
Vernier
rb_pro
fi
le_frames
save sample
Vernier thread
running program (C GVL)
📏
Less overhead!
More accurate!
GVLTest.sleep_holding_gvl(1)
Slide 68
Slide 68 text
main thread
puma worker thread
Stackprof + threads
Slide 69
Slide 69 text
main thread
puma worker thread
Stackprof + threads
$ bundle exec \
vernier run \
--hooks=rails \
--output /tmp/puma.profile.json \
--signal CONT --start-paused -- \
bin/rails s -p 3000
$ pkill -CONT -f puma Toggle pro
fi
ling
Slide 75
Slide 75 text
No content
Slide 76
Slide 76 text
No content
Slide 77
Slide 77 text
• Option 1: Use separate processes
• Option 2: Make "running" section faster
• Option 3: Short lived Ractor
Slide 78
Slide 78 text
No content
Slide 79
Slide 79 text
Use Ractor or make faster
Slide 80
Slide 80 text
Vernier
• GVL aware pro
fi
ling
• Pro
fi
les all Ractors and Threads (M:N supported)
• Low overhead, no postponed jobs
• GC tracing, GVL tracing, user tracing, allocation tracing
• Retained memory tracing (bonus feature)