Slide 1

Slide 1 text

Chasing Real-Time Observability for CRuby
STORES, Inc.
RubyKaigi 2026, 2026/04/23

Slide 2

Slide 2 text

Hello!
Shintaro Otsuka
Software Engineer at STORES, Inc.
@White-Green on GitHub
@White_Green2525 on 𝕏

Slide 3

Slide 3 text

Background: It is difficult to understand program behavior
● It is difficult to grasp a program's behavior accurately from its source text alone (especially in Ruby).
● We can only observe the results of execution, not the execution itself.
● How many times is a method called?
● How many threads are running?
● How often does GC run?
● How do these change when a button is clicked?

Slide 4

Slide 4 text

Background: Tools such as profilers exist
● Convenient for performance analysis
● Focused on performance analysis, so they show only statistical information
● Not real-time
● Hard to use with interactive applications

Slide 5

Slide 5 text

Background: We should be able to do better today
● Development machines have many (10 or more) cores.
● All cores other than the single core Ruby uses can be devoted to analysis.
● The z-axis (depth), which we can perceive, goes unused.
● CRuby is well-suited for building such an observation tool.
● Wouldn't it be fun to see CRuby's behavior in real time?

Slide 6

Slide 6 text

demo

Slide 7

Slide 7 text

rrtrace
● Real-time 3D visualization of CRuby behavior: Time × Thread × Stack
● https://github.com/White-Green/rrtrace

Slide 8

Slide 8 text

internal: Overview
● A separate Visualizer Process runs alongside CRuby.
● Events such as those from TracePoint and INTERNAL_THREAD_EVENT are sent to the Visualizer Process for handling.
[Diagram: CRuby (YARV, C-ext) → TracePoint / INTERNAL_THREAD_EVENT → Event → Visualizer Process (Event Processor) → Window]

Slide 9

Slide 9 text

internal: Capturing events in CRuby
● Use the observation APIs available in CRuby
● TracePoint API
○ CALL, RETURN
○ INTERNAL_GC_ENTER, INTERNAL_GC_EXIT ← C-ext only
● INTERNAL_THREAD_EVENT
○ STARTED, READY, EXITED
○ SUSPENDED, RESUMED
○ unavailable on Windows…
● Convert each event, along with a timestamp, into a struct with a unified format
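As a Ruby-level illustration of the TracePoint part only, the sketch below records CALL/RETURN events together with a monotonic nanosecond timestamp into one uniform record type. rrtrace hooks its events from a C extension, so this is not its implementation; the `Event` struct and the `capture_events` helper are hypothetical names, and GC and thread events are left out because they are only reachable from C.

```ruby
# Hypothetical record type: one uniform shape for every captured event.
Event = Struct.new(:type, :timestamp_ns, :method_id)

# Hypothetical helper: trace Ruby-level method calls/returns while the block runs.
def capture_events
  events = []
  tp = TracePoint.new(:call, :return) do |trace|
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC, :nanosecond)
    events << Event.new(trace.event, now, trace.method_id)
  end
  tp.enable { yield } # tracing is active only inside this block
  events
end

def add(a, b)
  a + b
end

events = capture_events { add(1, 2) }
events.each { |e| puts "#{e.type} #{e.method_id} @ #{e.timestamp_ns}" }
```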

Slide 10

Slide 10 text

internal: Capturing events in CRuby

    def add(a, b)
      a + b
    end
    add(1, 2)

Events: CALL :add, RETURN :add

    t1 = Thread.new { heavy_process }
    t2 = Thread.new { heavy_process }
    t1.join
    t2.join

Thread events: STARTED :t1, READY :t1, RESUMED :t1, STARTED :t2, READY :t2, SUSPENDED :t1, READY :t1, RESUMED :t2, SUSPENDED :t2, …, EXITED :t1

Slide 11

Slide 11 text

internal: Capturing events in CRuby
● Convert each event into a common data structure along with a timestamp.
Layout: [timestamp (60 bit) + event type (4 bit)] [method id / thread id (64 bit)]
Event types: 0 = CALL, 1 = RETURN, 2 = GC_START, …
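A minimal sketch of packing one event into two little-endian 64-bit words with `Array#pack`. The slide gives the field widths but not the exact bit positions, so putting the event type in the top 4 bits of the first word is an assumption, as are the helper names `pack_event` and `unpack_event`.

```ruby
# Event type codes shown on the slide (further codes are elided there).
EVENT_TYPES = { call: 0, return: 1, gc_start: 2 }.freeze
TIMESTAMP_BITS = 60
TIMESTAMP_MASK = (1 << TIMESTAMP_BITS) - 1

# Assumed 16-byte layout:
#   word 0: event type in bits 60..63, timestamp in bits 0..59
#   word 1: method id or thread id
def pack_event(type, timestamp, id)
  header = (EVENT_TYPES.fetch(type) << TIMESTAMP_BITS) | (timestamp & TIMESTAMP_MASK)
  [header, id].pack("Q<Q<")
end

def unpack_event(bytes)
  header, id = bytes.unpack("Q<Q<")
  [EVENT_TYPES.key(header >> TIMESTAMP_BITS), header & TIMESTAMP_MASK, id]
end

record = pack_event(:call, 1_234_567_890, 42)
p record.bytesize      # => 16
p unpack_event(record) # => [:call, 1234567890, 42]
```

A fixed 16-byte record keeps every slot in the ring buffer the same size, which makes index arithmetic trivial.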

Slide 12

Slide 12 text

internal: Sending events to the visualizer process
● Create OS-managed shared memory that both processes can access
● Build a ring buffer on the shared memory
○ Memory space for data + read and write indices
● Most transmissions involve 2 atomic rcw + 1 atomic addition
● Occasional reads that incur cache misses
● If the Visualizer's processing stalls, the CRuby side waits
○ All events obtained from TracePoint etc. must be processed without omission
○ The Visualizer implementation must therefore be efficient enough to prevent this
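The ring buffer bookkeeping can be sketched in Ruby as a single-producer/single-consumer model, under stated assumptions: a plain `Array` stands in for the shared-memory region, the indices are ordinary Integers that only grow (masked on access), and no atomic operations or memory ordering are modeled, whereas the real cross-process version needs atomics on shared memory. The class and method names are hypothetical.

```ruby
# Hypothetical SPSC ring buffer (in-process model: no real shared memory or atomics).
class RingBuffer
  def initialize(capacity)
    unless capacity.positive? && (capacity & (capacity - 1)).zero?
      raise ArgumentError, "capacity must be a power of two"
    end
    @slots = Array.new(capacity)
    @mask = capacity - 1
    @read = 0  # written only by the consumer
    @write = 0 # written only by the producer
  end

  # Producer: returns false instead of overwriting when the buffer is full.
  def try_push(event)
    return false if @write - @read == @slots.size
    @slots[@write & @mask] = event
    @write += 1 # publish the slot only after it has been filled
    true
  end

  # Producer: wait (never drop) until the consumer frees a slot.
  def push(event)
    Thread.pass until try_push(event)
  end

  # Consumer: returns nil when the buffer is empty.
  def try_pop
    return nil if @read == @write
    event = @slots[@read & @mask]
    @read += 1 # hand the slot back to the producer
    event
  end
end
```

For example, a producer thread can `push` 1,000 events into a 16-slot buffer while a consumer drains it with `try_pop`; because `push` waits instead of discarding, every event arrives, in order.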

Slide 13

Slide 13 text

internal: Receiving events in the visualizer process
● Treat the shared memory as a ring buffer with the same memory layout
● Collect events in batches and pass them to an internal processing queue
● Prioritize freeing ring buffer space so as not to stall the CRuby process
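Batching on the receiving side can be sketched as: copy out everything between the read index and the producer's write index in one pass, publish the new read index once so CRuby gets the whole span back, and only then hand the batch to slower processing. Same assumptions as a plain in-process model (an `Array` for the shared memory, power-of-two capacity); `drain_batch` is a hypothetical name and `Thread::Queue` stands in for the internal processing queue.

```ruby
# Copy every pending slot out of the ring in one pass.
# Returns the batch and the new read index (everything up to write_index is consumed).
def drain_batch(slots, read_index, write_index)
  mask = slots.size - 1 # capacity assumed to be a power of two
  batch = (read_index...write_index).map { |i| slots[i & mask] }
  [batch, write_index]
end

work_queue = Thread::Queue.new
worker = Thread.new do
  # Heavier work (stack simulation, drawing) happens here, away from the ring.
  while (pending = work_queue.pop)
    puts "processing #{pending.size} events: #{pending.inspect}"
  end
end

# Example ring state: 4 slots, indices 1...5 pending (index 4 wraps to slot 0).
slots = [:e4, :e1, :e2, :e3]
batch, read_index = drain_batch(slots, 1, 5)
# In the real system, read_index would be published to shared memory here.
work_queue << batch
work_queue.close # no more batches in this example
worker.join
```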

Slide 14

Slide 14 text

internal: Visualize
● Traverse events to simulate the stack for each thread.

    stack = []
    events.each do |event|
      case event
      in [:call, timestamp, method_id]
        stack << [timestamp, method_id]
      in [:return, end_timestamp, _method_id]
        start_timestamp, method_id = stack.pop
        # x: start..end, y: thread_id, z: stack.size, color: method_id
      end
    end

Slide 15

Slide 15 text

internal: Visualize
● Traverse events to simulate the stack for each thread
● Thread-related events switch the active stack
● Draw with the GPU
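Extending the stack simulation to several threads: a CALL/RETURN record carries a method id rather than a thread id, so the simulator can track which thread is running from the thread events and route CALL/RETURN to that thread's stack. This sketch assumes `[type, timestamp, id]` tuples, treats only RESUMED as a switch (other thread events are ignored), and emits boxes as plain hashes instead of GPU draw calls; `simulate` is a hypothetical name.

```ruby
# Events: [type, timestamp, id]; id is a method id for :call/:return
# and a thread id for thread events.
def simulate(events)
  stacks = Hash.new { |h, thread_id| h[thread_id] = [] } # one stack per thread
  current = nil # thread whose stack receives CALL/RETURN
  boxes = []
  events.each do |type, timestamp, id|
    case type
    when :resumed
      current = id # switch the active stack
    when :call
      stacks[current] << [timestamp, id]
    when :return
      start_timestamp, method_id = stacks[current].pop
      boxes << { x: start_timestamp..timestamp, y: current,
                 z: stacks[current].size, color: method_id }
    end
  end
  boxes
end

events = [
  [:resumed, 0, :t1], [:call, 1, :a],
  [:resumed, 2, :t2], [:call, 3, :b], [:return, 4, :b],
  [:resumed, 5, :t1], [:return, 6, :a],
]
simulate(events).each { |box| p box }
```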

Slide 16

Slide 16 text

internal: Parallel visualize
● Visualization must keep up with the rate of CRuby's method calls
● Simulating the stack for one event is slower than a call to Integer#+
● Parallelization is the solution
○ Visualization does not need to run on a single thread

Slide 17

Slide 17 text

internal: Parallel visualize
● Stack depth depends on the result of processing all previous events
● So the simulation cannot simply be parallelized
● “Method information that hasn't been pushed cannot be popped”
[Figure: a stack built from the events CALL 1, CALL 2, RETURN 2, RETURN 1, CALL 3]

Slide 18

Slide 18 text

internal: Parallel visualize
● For each block of several events, aggregate “what couldn't be popped” and “what is finally left on the stack”
○ Parallelizable per block
● Merge the per-block results.
● Using this aggregate result together with the events within each block again, the stack state at any point can be determined
○ Parallelizable per block
[Figure: events CALL 1, CALL 2, RETURN 2, RETURN 1, CALL 3, CALL 4 split into blocks; block summaries "pop: [] stack: 1" and "pop: [1] stack: 3 4" merge into "pop: [] stack: 3 4"]
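The per-block aggregate behaves like the standard bracket-matching summary: unmatched RETURNs ("pop") plus unmatched CALLs ("stack"). Its merge is associative, which is what lets blocks be summarized independently and combined in any grouping. A Ruby sketch with events reduced to `[type, method_id]` pairs; the format and helper names are hypothetical, and the split into `block_a`/`block_b` is read off the slide's figure.

```ruby
# Summary of a block: [pops, stack]
#   pops  - method ids returned from frames opened before the block, in pop order
#   stack - method ids still open at the end of the block, bottom to top
def summarize(block)
  pops = []
  stack = []
  block.each do |type, method_id|
    if type == :call
      stack << method_id
    elsif stack.empty?
      pops << method_id # "couldn't be popped" inside this block
    else
      stack.pop         # matched by a CALL earlier in the same block
    end
  end
  [pops, stack]
end

# Merge the summaries of two adjacent blocks (left happens first).
# The right block's pops consume the left block's leftover stack from the top;
# any remaining pops refer to frames opened before both blocks.
def merge(left, right)
  left_pops, left_stack = left
  right_pops, right_stack = right
  matched = [right_pops.size, left_stack.size].min
  [left_pops + right_pops.drop(matched),
   left_stack.take(left_stack.size - matched) + right_stack]
end

block_a = [[:call, 1]]
block_b = [[:call, 2], [:return, 2], [:return, 1], [:call, 3], [:call, 4]]
p summarize(block_a)                            # => [[], [1]]
p summarize(block_b)                            # => [[1], [3, 4]]
p merge(summarize(block_a), summarize(block_b)) # => [[], [3, 4]]
```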

Slide 19

Slide 19 text

internal: Parallel scan algorithm
[Figure: the events CALL 1, CALL 2, RETURN 2, RETURN 1, CALL 3 and their stack, processed block by block; one step is annotated “Only serial processing is needed here”]
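Put together, the phases form a three-phase scan: summarize every block independently, walk the summaries serially to find the stack each block starts with, then replay every block independently from its starting stack. This is a Ruby model under stated assumptions: events are `[type, timestamp, method_id]` tuples for a single thread, the helper names are hypothetical, and `Thread` only marks where the independent work lies, since under CRuby's GVL these threads do not run Ruby code in parallel.

```ruby
# Phase 1 (independent per block): how many frames the block pops from
# earlier blocks, and which frames it leaves open.
def summarize(block)
  pop_count = 0
  stack = [] # [timestamp, method_id] frames still open at the end of the block
  block.each do |type, timestamp, method_id|
    if type == :call
      stack << [timestamp, method_id]
    elsif stack.empty?
      pop_count += 1 # returns from a frame opened in an earlier block
    else
      stack.pop
    end
  end
  [pop_count, stack]
end

# Phase 2 (serial): the stack each block starts with. Assumes well-formed input.
def starting_stacks(summaries)
  current = []
  summaries.map do |pop_count, stack|
    start = current
    current = current.take(current.size - pop_count) + stack
    start
  end
end

# Phase 3 (independent per block): replay from the starting stack and
# emit one box per RETURN (x: time span, z: depth).
def replay(block, start)
  stack = start.dup
  boxes = []
  block.each do |type, timestamp, method_id|
    if type == :call
      stack << [timestamp, method_id]
    else
      start_timestamp, id = stack.pop
      boxes << { color: id, x: start_timestamp..timestamp, z: stack.size }
    end
  end
  boxes
end

def parallel_boxes(events, block_size)
  blocks = events.each_slice(block_size).to_a
  summaries = blocks.map { |b| Thread.new { summarize(b) } }.map(&:value)
  starts = starting_stacks(summaries)
  blocks.zip(starts).map { |b, s| Thread.new { replay(b, s) } }.flat_map(&:value)
end

events = [
  [:call, 0, :a], [:call, 1, :b], [:return, 2, :b], [:return, 3, :a],
  [:call, 4, :c], [:call, 5, :d], [:return, 6, :d], [:return, 7, :c],
]
p parallel_boxes(events, 3) == replay(events, []) # blocked and serial runs agree
```

Only `starting_stacks` walks the blocks in order, and it touches one summary per block rather than one entry per event.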

Slide 20

Slide 20 text

Performance Benchmark

                                | function calls/s    | rails server (req/s)
plain CRuby                     | 73,417,127 (x1.00)  | 203.19 (x1.00)
empty TracePoint handler        | 26,094,059 (x0.36)  | 153.94 (x0.76)
rrtrace without sending event   | 13,866,260 (x0.19)  | 134.30 (x0.66)
rrtrace                         | 12,760,131 (x0.17)  | 110.84 (x0.55)

Overhead sources annotated on the slide: TracePoint, Timestamp, Ring buffer
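To read the function-call column as a per-call cost, the throughputs can be converted to nanoseconds per call, with the difference between consecutive rows attributed to the component each step adds (the TracePoint, Timestamp, and Ring buffer annotations). This assumes per-call costs simply add up in that benchmark; under that reading a call goes from roughly 14 ns to roughly 78 ns, and the step annotated Timestamp is the largest single increment.

```ruby
# Function-call throughput (calls/s) copied from the benchmark table.
CALLS_PER_SEC = {
  "plain CRuby" => 73_417_127,
  "empty TracePoint handler" => 26_094_059,
  "rrtrace without sending event" => 13_866_260,
  "rrtrace" => 12_760_131,
}.freeze

# Nanoseconds per call, assuming the per-call costs add up linearly.
NS_PER_CALL = CALLS_PER_SEC.transform_values { |rate| 1e9 / rate }.freeze

NS_PER_CALL.each_cons(2) do |(prev_name, prev_ns), (name, ns)|
  printf("%-30s %6.2f ns/call (+%5.2f ns over %s)\n", name, ns, ns - prev_ns, prev_name)
end
```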

Slide 21

Slide 21 text

Conclusion
● 3D real-time visualization
○ https://github.com/White-Green/rrtrace
● Real-time visualization of CRuby internals is possible with modern hardware resources
○ albeit with non-negligible performance overhead...
● Open problems
○ Multi-process/Ractor support
○ GUI design