Method-based JIT compilation by transpiling to Julia
I will demonstrate a new approach to the method-based Just-In-Time compilation for Ruby. This is used Julia language as an infrastructure of the JIT compilation. I will describe the characteristics of this approach and show you some example results.
ffi ciently in Ruby by using Numo::NArray and Red Arrow libraries • However, numerical algorithms purely written in Ruby with these libraries are not very fast even if MJIT or YJIT is available • The reason is that these JIT compilers preserve all the semantics of Ruby
can be rede fi ned with other implementations anytime • The rede fi nition immediately a ff ect the code execution • Even in the middle of a while loop s = (1 .. 10).inject {|a, x| a + x } Numo::DFloat.new(1000, 2).rand_norm.cumsum(axis: 1) ↑ ↑ ↑ ↑ ↑
ffi ciently in Ruby by using Numo::NArray and Red Arrow libraries • However, numerical algorithms purely written in Ruby with these libraries does not run fast even if MJIT or YJIT is available • The reason is that these JIT compilers preserve all the semantics of Ruby • Ruby’s semantics is mostly meaningless in numerical algorithms so it is better to be able to optimize the computation by ignoring them • Today we must rewrite the algorithms into C extension libraries for such optimizations
type-specialization • Compiler stages: 1. Analyze CPython’s byte code 2. Generate Numba IR and rewrite it 3. Infer types of Numba IR and rewrite it into typed IR 4. Perform automatic parallelization 5. Generate LLVM IR from typed IR 6. Compile LLVM IR to native code (LLVM’s role)
• Numba has two modes: 1. Object mode — Preserve the full semantics of CPython by using the C API of CPython interpreter 2. nopython mode — Generate small and e ff i cient native code that specialized to the speci fi c native data types such as fl oat64 and numpy arrays • Object mode is required if the code to be JIT-ed uses the features that depends on CPython’s semantics • e.g. Attribute access of objects, using types unsupported by Numba • nopython mode can be applied if the code does not rely on such the features
mode def add(x, y): return x + y # Invoking JIT compile by calling the function add(10.0, 20.0) # Signatures of type-specific compiled functions print(add.signatures) #=> [(float64, float64)] # Assembly sources of the 1st type-specific compiled function print(add.inspect_asm(add.signatures[0])) #=> # (snip) # movabsq $PyNumber_Add, %rax # movq %r12, %rdi # movq %r14, %rsi # callq *%rax # (snip) PyNumber_Add function is used instead of the native add instruction
(powered by yadriggy.gem) function inject_sum(range) RbCall.inject(range) do a, x a + x end end [Def [Name “inject_sum”] [Params “range”] [Exprs [Call [Name “inject”] [Receiver [Name “range”]] [Block [Params “a” “x”] [Exprs [Call [Name “+”] [Receiver [Name “a”]] [Params [Name “x”]]]]]]]] The helper function to emulate the behavior of Ruby’s inject method
end Activating project at `~/src/github.com/mrkn/ruby-julia/RbCall.jl` function inject_sum_add(range) RbCall.inject(range) do a, x add(a, x) end end function add(x, y) x + y end inject_sum_op_rb Range (min … max): 4.670 ms … 5.644 ms Time (median): 4.725 ms Time (mean ± σ): 4.739 ms ± 78.906 μs █▄ ███▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 4.67ms Histogram: log(frequency) by time 5.062 ms< inject_sum_op_jl Range (min … max): 29.000 μs … 8.850 ms Time (median): 30.000 μs Time (mean ± σ): 30.615 μs ± 34.759 μs █ ▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 29μs Histogram: frequency by time 40μs< inject_sum_add_rb Range (min … max): 6.141 ms … 6.597 ms Time (median): 6.163 ms Time (mean ± σ): 6.189 ms ± 67.294 μs █▃ ▇██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▇ 6.141 ms Histogram: log(frequency) by time 6.432 ms< inject_sum_add_jl Range (min … max): 29.000 μs … 2.593 ms Time (median): 30.000 μs Time (mean ± σ): 30.499 μs ± 12.872 μs █ ▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 29μs Histogram: frequency by time 42μs<
region: Experiment def mandel(z) c = z maxiter = 80 (1 ... maxiter).each do |n| return n - 1 if z.abs2 > 4 z = z**2 + c end maxiter end def mandelbrot ary = [] (-1 .. 1.0).step(0.1) do |i| (-2 .. 0.5).step(0.1) do |r| ary << mandel(Complex(r, i)) end end ary end <latexit sha1_base64="QWzl+pFesLOlOuzv+S+NsWXVER4=">AAACFHicbVDLSsNAFJ34rPUVdelmsAhCa0lKfSwL3bisYB+QhDKZTtuhk0mYmUhD7Ee48VfcuFDErQt3/o3TNgttPXDhcM693HuPHzEqlWV9Gyura+sbm7mt/PbO7t6+eXDYkmEsMGnikIWi4yNJGOWkqahipBMJggKfkbY/qk/99j0Rkob8TiUR8QI04LRPMVJa6ppFNx0XE+rCBxeOoUs5dM4rJWiVL7wSTDLBLkHbcydds2CVrRngMrEzUgAZGl3zy+2FOA4IV5ghKR3bipSXIqEoZmSSd2NJIoRHaEAcTTkKiPTS2VMTeKqVHuyHQhdXcKb+nkhRIGUS+LozQGooF72p+J/nxKp/7aWUR7EiHM8X9WMGVQinCcEeFQQrlmiCsKD6VoiHSCCsdI55HYK9+PIyaVXK9mW5elst1OpZHDlwDE7AGbDBFaiBG9AATYDBI3gGr+DNeDJejHfjY966YmQzR+APjM8fh3+auA==</latexit> {x + yi | x 2 [ 2, 0.5], y 2 [ 1, 1]} The image created by Wolfgang Beyer, CC BY-SA 3.0 Re Im
1.0) for r in RbCall.RubyRange(-2, 0.1, 0.5) append!(ary, mandel(Complex(r, i))) end end ary end function mandel(z) c = z maxiter = 80 for n in RbCall.RubyRange(1, 1, maxiter, true) if abs2(z) > 4 return n - 1 end z = z ^ 2 + c end maxiter end def mandel(z) c = z maxiter = 80 (1 ... maxiter).each do |n| return n - 1 if z.abs2 > 4 z = z**2 + c end maxiter end def mandelbrot ary = [] (-1 .. 1.0).step(0.1) do |i| (-2 .. 0.5).step(0.1) do |r| ary << mandel(Complex(r, i)) end end ary end
1.0) for r in RbCall.RubyRange(-2, 0.1, 0.5) append!(ary, mandel(Complex(r, i))) end end ary end function mandel(z) c = z maxiter = 80 for n in RbCall.RubyRange(1, 1, maxiter, true) if abs2(z) > 4 return n - 1 end z = z ^ 2 + c end maxiter end Activating project at `~/src/github.com/mrkn/ruby-julia/RbCall.jl` mandelbrot_rb Range (min … max): 3.221 ms … 3.655 ms Time (median): 3.323 ms Time (mean ± σ): 3.326 ms ± 48.710 μs ▄▆ ▂▄ ▃ ▅█▄▂▄ ▅ ▆██▆██▆█▁█████▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 3.221 ms Histogram: log(frequency) by time 3.431 ms< mandelbrot_jl Range (min … max): 164.000 μs … 8.291 ms Time (median): 167.000 μs Time (mean ± σ): 171.667 μs ± 99.903 μs ▁ █ █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 164μs Histogram: frequency by time 201μs<
π by Monte Carlo method Experiment def monte_carlo_pi(n_samples) acc = 0 (0 ... n_samples).each do |i| x = rand y = rand acc += 1 if (x**2 + y**2) < 1.0 end return 4.0 * acc / n_samples end
n_samples, true) x = RbCall.rand(Float64) y = RbCall.rand(Float64) if (x ^ 2 + y ^ 2) < 1.0 acc += 1 end end return 4.0 * acc / n_samples end def monte_carlo_pi(n_samples) acc = 0 (0 ... n_samples).each do |i| x = rand y = rand acc += 1 if (x**2 + y**2) < 1.0 end return 4.0 * acc / n_samples end
n_samples, true) x = RbCall.rand(Float64) y = RbCall.rand(Float64) if (x ^ 2 + y ^ 2) < 1.0 acc += 1 end end return 4.0 * acc / n_samples end Activating project at `~/src/github.com/mrkn/ruby-julia/RbCall.jl` {:rb=>3.143348} {:jl=>3.143044} monte_carlo_pi_rb; N=1000000 Range (min … max): 105.851 ms … 108.312 ms Time (median): 106.178 ms Time (mean ± σ): 106.369 ms ± 602.130 μs █ ▇▇▇▁▇▁█▇▇▇▇▁▇▇▁▇▁▁▁▁▇▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁ 105.851 ms Histogram: frequency by time 108.312 ms< monte_carlo_pi_jl; N=1000000 Range (min … max): 8.753 ms … 9.478 ms Time (median): 8.837 ms Time (mean ± σ): 8.851 ms ± 72.243 μs ▅▅█ ▅▅▇▁▅▁███▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▇ 8.753 ms Histogram: log(frequency) by time 9.237 ms<
the divide-and-conquer method Experiment def partition(ary, lo, hi) pi = (lo + hi).div(2) pv = ary[pi] ary[pi], ary[hi] = ary[hi], ary[pi] j = lo JLTrans.inbounds do (lo ... hi).each do |i| if ary[i] <= pv ary[i], ary[j] = ary[j], ary[i] j += 1 end end end ary[j], ary[hi] = ary[hi], ary[j] j end def quicksort!(ary) quicksort0!(ary, 0, ary.length - 1) end def quicksort0!(ary, lo, hi) if lo < hi pi = partition(ary, lo, hi) quicksort0!(ary, lo, pi - 1) quicksort0!(ary, pi + 1, hi) end ary end
lo, hi) if lo < hi piv = partition(ary, lo, hi) quicksort0!(ary, lo, piv - 1) quicksort0!(ary, piv + 1, hi) end ary end function partition(ary, lo, hi) piv = div((lo + hi), 2) v = ary[piv + 1] ary[piv + 1], ary[hi + 1] = ary[hi + 1], ary[piv + 1] j = lo @inbounds for i in RbCall.RubyRange(lo, 1, hi, true) if ary[i + 1] <= v ary[i + 1], ary[j + 1] = ary[j + 1], ary[i + 1] j += 1 end end ary[j + 1], ary[hi + 1] = ary[hi + 1], ary[j + 1] j end Activating project at `~/src/github.com/mrkn/ruby-julia/RbCall.jl` quicksort_rb!; N=10000 Range (min … max): 7.413 ms … 8.341 ms Time (median): 7.539 ms Time (mean ± σ): 7.551 ms ± 90.993 μs ▂▅▄▄▂▆█▇▆ ▆██████████▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 7.413 ms Histogram: log(frequency) by time 7.919 ms< quicksort_jl!; N=10000 Range (min … max): 1.790 ms … 42.982 ms Time (median): 1.823 ms Time (mean ± σ): 1.937 ms ± 1.764 ms █▇ ▄ ▅██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 1.79ms Histogram: log(frequency) by time 2.196 ms<
• If two arrays have the same length n, the complexity of calculation is O(n2) Experiment def convolve(x, y) m, n = x.length, y.length z = Array.new(m+n-1, JLTrans.zero(x[0])) (0 ... z.length).each do |k| zz = z[k] i0 = [k-n+1, 0].max (i0 ... m).each do |i| zz += x[i]*y[k-i] break if k-i == 0 end z[k] = zz end z end z[0] = x[0]*y[0] z[1] = x[0]*y[1] + x[1]*y[0] z[2] = x[0]*y[2] + x[1]*y[1] + x[2]*y[0] : <latexit sha1_base64="wTtf3Pnnv1+xMKE6GbtC1JAgDf4=">AAACEnicbVC7TsMwFHV4lvIKMLJYVEgwtEpQBSyVKnVhLBJ9SE2IHNdprTpOZDuIEOUbWPgVFgYQYmVi429wHwO0HOnqHp1zr+x7/JhRqSzr21haXlldWy9sFDe3tnd2zb39towSgUkLRywSXR9JwignLUUVI91YEBT6jHT8UWPsd+6IkDTiNyqNiRuiAacBxUhpyTNPH7wRrEFHJqGX0VrZoTxQaX477fDeozD1slGZ5p5ZsirWBHCR2DNSAjM0PfPL6Uc4CQlXmCEpe7YVKzdDQlHMSF50EklihEdoQHqachQS6WaTk3J4rJU+DCKhiys4UX9vZCiUMg19PRkiNZTz3lj8z+slKrh0M8rjRBGOpw8FCYMqguN8YJ8KghVLNUFYUP1XiIdIIKx0ikUdgj1/8iJpn1Xs80r1ulqqN2ZxFMAhOAInwAYXoA6uQBO0AAaP4Bm8gjfjyXgx3o2P6eiSMds5AH9gfP4A5fedpg==</latexit> zk = 1 X i= 1 xiyk i
fill(zero(x[1]), m + n - 1) for k in RbCall.RubyRange(0, 1, length(z), true) zz = z[k + 1] i0 = max(k - n + 1, 0) for i in RbCall.RubyRange(i0, 1, m, true) zz += x[i + 1] * y[k - i + 1] if k - i == 0 break end end z[k + 1] = zz end z end def convolve(x, y) m, n = x.length, y.length z = Array.new(m+n-1, JLTrans.zero(x[0])) (0 ... z.length).each do |k| zz = z[k] i0 = [k-n+1, 0].max (i0 ... m).each do |i| zz += x[i]*y[k-i] break if k-i == 0 end z[k] = zz end z end
fill(zero(x[1]), m + n - 1) for k in RbCall.RubyRange(0, 1, length(z), true) zz = z[k + 1] i0 = max(k - n + 1, 0) for i in RbCall.RubyRange(i0, 1, m, true) zz += x[i + 1] * y[k - i + 1] if k - i == 0 break end end z[k + 1] = zz end z end Activating project at `~/src/github.com/mrkn/ruby-julia/RbCall.jl` convolve_jl; N=10000, T=Any[] Range (min … max): 92.246 ms … 92.973 ms Time (median): 92.702 ms Time (mean ± σ): 92.671 ms ± 204.331 μs ▃ ▃▃ ▃ ▃█ ▃ ▃ ▃ ▃ ▃ ▃▃█ █ █ █▁▁▁▁▁▁▁▁██▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁██▁█▁█▁▁█▁▁█▁█▁▁▁███▁█▁▁▁▁▁▁▁▁▁▁█ ▁ 92.246 ms Histogram: log(frequency) by time 92.973 ms< convolve_jl; N=10000, T=Float64[] Range (min … max): 88.514 ms … 89.272 ms Time (median): 88.779 ms Time (mean ± σ): 88.820 ms ± 216.830 μs ▁ ▁▅ █▅ ▁ ▁ ▁▁ ▁ ▅ ▁▅ ▅ █▁▁▁▁██▁▁▁▁▁██▁█▁▁▁▁█▁▁██▁▁▁█▁▁▁▁▁▁█▁▁▁▁██▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁ ▁ 88.514 ms Histogram: log(frequency) by time 89.272 ms< convolve_rb; N=1000 Range (min … max): 49.549 ms … 51.262 ms Time (median): 50.046 ms Time (mean ± σ): 50.108 ms ± 396.058 μs ▁ ▃ ▁ █ ▄▁▄▁▄▁▁▁█▄█▄▁▄█▁▄▇█▁▁▁▄▇▄▇▄▁▁▁▁▁▁▁▄▄▁▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▄▁▁▁ ▁ 49.549 ms Histogram: frequency by time 51.262 ms< convolve_jl; N=1000, T=Any[] Range (min … max): 1.239 ms … 23.259 ms Time (median): 1.291 ms Time (mean ± σ): 1.316 ms ± 591.804 μs █▆ ▃ ▃▃▃▁▁▁▃▃▁▁▃▆██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 1.239 ms Histogram: frequency by time 1.376 ms< convolve_jl; N=1000, T=Float64[] Range (min … max): 855.000 μs … 44.333 ms Time (median): 894.000 μs Time (mean ± σ): 916.545 μs ± 930.027 μs ██ ▇ ▅▇▅▇▅▁▅▁▁▅▅▁▅▁▇▇██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 855μs Histogram: log(frequency) by time 948μs<
fill(zero(x[1]), m + n - 1) [email protected] for k in RbCall.RubyRange(0, 1, length(z), true) zz = z[k + 1] i0 = max(k - n + 1, 0) for i in RbCall.RubyRange(i0, 1, m, true) zz += x[i + 1] * y[k - i + 1] if k - i == 0 break end end z[k + 1] = zz end z end Activating project at `~/src/github.com/mrkn/ruby-julia/RbCall.jl` convolve_jl; N=10000, T=Any[] Range (min … max): 38.072 ms … 39.816 ms Time (median): 38.226 ms Time (mean ± σ): 38.265 ms ± 237.234 μs ▁▂▂▇▇█▄▂▆▁ ▁ ▆██████████▆▁█▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁ 38.072 ms Histogram: log(frequency) by time 39.816 ms< convolve_jl; N=10000, T=Float64[] Range (min … max): 34.383 ms … 34.723 ms Time (median): 34.497 ms Time (mean ± σ): 34.513 ms ± 92.663 μs ▂█▂ ▂ ▂ ▂ ▂ ▅▁█▁█▁█████▁█▁▅▁█▅▁▁▅▁▁▅▅█▅▅▅▅▁▅▁█▁█▁▁█▅▅▅▅▁▅█▁▁▁▅▁▁▁▅▁▁▁▁▅ ▁ 34.383 ms Histogram: frequency by time 34.723 ms< convolve_rb; N=1000 Range (min … max): 49.622 ms … 51.365 ms Time (median): 49.766 ms Time (mean ± σ): 49.884 ms ± 356.988 μs █▆█▃ ▁ ▄▄▇████▄▄█▄▄▇▇▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁ 49.622 ms Histogram: frequency by time 51.365 ms< convolve_jl; N=1000, T=Any[] Range (min … max): 749.000 μs … 42.637 ms Time (median): 772.000 μs Time (mean ± σ): 817.622 μs ± 1.168 ms █ ▃ ▃▃▃▅▆█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 749μs Histogram: frequency by time 822μs< convolve_jl; N=1000, T=Float64[] Range (min … max): 360.000 μs … 2.154 ms Time (median): 385.000 μs Time (mean ± σ): 386.711 μs ± 35.166 μs █ ▄▁▆▇▇▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 360μs Histogram: frequency by time 416μs<
The complexity is O(n) Experiment: bad case def dot_product(x, y) n = x.length JLTrans.assert { n == y.length } z = JLTrans.zero(x[0]) (0 ... n).each do |i| z += x[i] * y[i] end z end <latexit sha1_base64="DakWZm7lJu990yFHGbcFHo6hJbk=">AAACA3icbVDLSsNAFJ34rPUVdaebwSK4KokUdVModOOygn1AG8NkOmmHzkzCzESMIeDGX3HjQhG3/oQ7/8bpY6GtBy4czrmXe+8JYkaVdpxva2l5ZXVtvbBR3Nza3tm19/ZbKkokJk0csUh2AqQIo4I0NdWMdGJJEA8YaQej+thv3xGpaCRudBoTj6OBoCHFSBvJtw8fYBX2VML9jFad/DYTObz3KUx96tslp+xMABeJOyMlMEPDt796/QgnnAiNGVKq6zqx9jIkNcWM5MVeokiM8AgNSNdQgThRXjb5IYcnRunDMJKmhIYT9fdEhrhSKQ9MJ0d6qOa9sfif1010eOllVMSJJgJPF4UJgzqC40Bgn0qCNUsNQVhScyvEQyQR1ia2ognBnX95kbTOyu55uXJdKdXqszgK4Agcg1PgggtQA1egAZoAg0fwDF7Bm/VkvVjv1se0dcmazRyAP7A+fwDn/JcT</latexit> z = n X i=0 xiyi
z = zero(x[1]) for i in RbCall.RubyRange(0, 1, n, true) z += x[i + 1] * y[i + 1] end z end def dot_product(x, y) n = x.length JLTrans.assert { n == y.length } z = JLTrans.zero(x[0]) (0 ... n).each do |i| z += x[i] * y[i] end z end
z = zero(x[1]) for i in RbCall.RubyRange(0, 1, n, true) z += x[i + 1] * y[i + 1] end z end Activating project at `~/src/github.com/mrkn/ruby-julia/RbCall.jl` {:jl=>47.43176456562391, :rb=>47.43176456562391} dot_product_rb; N=10000 Range (min … max): 458.000 μs … 583.000 μs Time (median): 465.000 μs Time (mean ± σ): 468.608 μs ± 9.411 μs █ ▄ ▂▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 458μs Histogram: frequency by time 512μs< dot_product_jl; N=10000 Range (min … max): 3.636 ms … 11.786 ms Time (median): 3.704 ms Time (mean ± σ): 3.759 ms ± 428.098 μs ▃█ ▄██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 3.636 ms Histogram: log(frequency) by time 4.386 ms< dot_product_jl; N=10000, T=Float64 Range (min … max): 9.000 μs … 2.608 ms Time (median): 10.000 μs Time (mean ± σ): 10.651 μs ± 7.985 μs █ ▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 9μs Histogram: frequency by time 19μs<
get faster by transpiling to Julia • Julia can generate optimized native code for function arguments • Array conversion is heavy overhead • Algorithms with O(n) time complexity get slower when they take array arguments
… • by Ruby-to-Julia transpiler so that the optimized native code is generated • when we can ignore Ruby’s dynamic features • This JIT compiler requires Julia … • but we can utilize Julia’s library features in Ruby • If you are interested in this method, please ask red-data-tools project to join the development
Non-copy wrapper of Numo::NArray in Julia • Stability improvement around of GC • Support of USE_RVARGC in Ruby 3.2 • CRuby byte code to Julia conversion • Adding trampoline support of AARCH64 in Julia and LLVM