Method-based JIT compilation by transpiling to Julia

Kenta Murata Xica Co., Ltd. 2022.09.09 RubyKaigi 2022 at Mie
Center for the Arts, Mie, Japan Method-based JIT compilation by transpiling to Julia

Talk Summary • I examined a new way of method-based
JIT compilation • It uses Julia language as the backend • Some results of experiments will be demonstrated

Talk Outline • self.introduction • Background and Motivation • Method
• Experiments and Results • Conclusion • Future plan

self.introduction • Kenta Murata • Chief Research O ff i
cer at Xica Co., Ltd. • Committer of Apache Arrow and Ruby • Gems: • pycall, enumerable-statistics, unicode_plot, charty, bigdecimal, etc. • red-arrow, red-datasets, numo-narray, etc.

Background and Motivation

Background • We can handle a large numerical arrays e
ffi ciently in Ruby by using Numo::NArray and Red Arrow libraries • However, numerical algorithms purely written in Ruby with these libraries are not very fast even if MJIT or YJIT is available • The reason is that these JIT compilers preserve all the semantics of Ruby

Ruby’s semantics == dynamic method dispatch • All the methods
can be rede fi ned with other implementations anytime • The rede fi nition immediately a ff ect the code execution • Even in the middle of a while loop s = (1 .. 10).inject {|a, x| a + x } Numo::DFloat.new(1000, 2).rand_norm.cumsum(axis: 1) ↑ ↑ ↑ ↑ ↑

Background • We can handle a large numerical arrays e
ffi ciently in Ruby by using Numo::NArray and Red Arrow libraries • However, numerical algorithms purely written in Ruby with these libraries does not run fast even if MJIT or YJIT is available • The reason is that these JIT compilers preserve all the semantics of Ruby • Ruby’s semantics is mostly meaningless in numerical algorithms so it is better to be able to optimize the computation by ignoring them • Today we must rewrite the algorithms into C extension libraries for such optimizations

How we make numerical algorithms written in Ruby faster without
rewriting them into C extension libraries?

Numba: Solution in the Python world • Numba is a
JIT compiler for CPython • Numba translates a subset of Python and NumPy code into fast machine code • Let’s see the example

import numba @numba.jit(nopython=True) def add(x, y): return x + y
# Invoking JIT compile by calling the function add(10.0, 20.0) # Signatures of type-specific compiled functions print(add.signatures) #=> [(float64, float64)] # Assembly sources of the 1st type-specific compiled function print(add.inspect_asm(add.signatures[0])) #=> # (snip) # vaddsd %xmm1, %xmm0, %xmm0 # xorl %eax, %eax # vmovsd %xmm0, (%rdi) # retq # (snip)

What Numba does? • Numba compiles CPython’s bydecode with optional
type-specialization • Compiler stages: 1. Analyze CPython’s byte code 2. Generate Numba IR and rewrite it 3. Infer types of Numba IR and rewrite it into typed IR 4. Perform automatic parallelization 5. Generate LLVM IR from typed IR 6. Compile LLVM IR to native code (LLVM’s role)

Numba’s two JIT modes How to deal with CPython’s semantics
• Numba has two modes: 1. Object mode — Preserve the full semantics of CPython by using the C API of CPython interpreter 2. nopython mode — Generate small and e ff i cient native code that specialized to the speci fi c native data types such as fl oat64 and numpy arrays • Object mode is required if the code to be JIT-ed uses the features that depends on CPython’s semantics • e.g. Attribute access of objects, using types unsupported by Numba • nopython mode can be applied if the code does not rely on such the features

import numba @numba.jit(nopython=True) def add(x, y): return x + y
# Invoking JIT compile by calling the function add(10.0, 20.0) # Signatures of type-specific compiled functions print(add.signatures) #=> [(float64, float64)] # Assembly sources of the 1st type-specific compiled function print(add.inspect_asm(add.signatures[0])) #=> # (snip) # vaddsd %xmm1, %xmm0, %xmm0 # xorl %eax, %eax # vmovsd %xmm0, (%rdi) # retq # (snip) Native add instruction for the fl oat64 type is generated!!

import numba @numba.jit(forceobj=True) # Force to compile in the object
mode def add(x, y): return x + y # Invoking JIT compile by calling the function add(10.0, 20.0) # Signatures of type-specific compiled functions print(add.signatures) #=> [(float64, float64)] # Assembly sources of the 1st type-specific compiled function print(add.inspect_asm(add.signatures[0])) #=> # (snip) # movabsq $PyNumber_Add, %rax # movq %r12, %rdi # movq %r14, %rsi # callq *%rax # (snip) PyNumber_Add function is used instead of the native add instruction

I need nopython-mode like JIT compilation feature in Ruby for
the purpose

Method

Method CRuby byte code Something typed IR LLVM IR Native
code NUMBA-like JIT compile process Method CPython byte code NUMBA typed IR LLVM IR Native code

Method CRuby byte code Something typed IR LLVM IR Native
code NUMBA-like JIT compile process Generating AST Optimization Generating byte code Generating IR Type inference Optimization

JIT compile by transpiling to Julia Method CRuby byte code
Something typed IR LLVM IR Native code Julia code Julia AST Julia typed IR LLVM IR Native code Generating AST Type inference Generating IR

೉ͦ͠͏ Julia

How to transpile Ruby to Julia

Transpile Ruby to Julia Ruby AST Ruby code Julia code
→ → 1 + 2 1 + 2 + 1 2 → → a...b … a b RbCall.RubyRange( a, 1, b, true) → → a:b

Transpile Ruby to Julia (1) Ruby code → AST (powered
by yadriggy.gem) def inject_sum(range) range.inject {|a, x| a + x } end [Def [Name “inject_sum”] [Params “range”] [Exprs [Call [Name “inject”] [Receiver [Name “range”]] [Block [Params “a” “x”] [Exprs [Call [Name “+”] [Receiver [Name “a”]] [Params [Name “x”]]]]]]]]

Transpile Ruby to Julia (2) Generate Julia code from AST
(powered by yadriggy.gem) function inject_sum(range) RbCall.inject(range) do a, x a + x end end [Def [Name “inject_sum”] [Params “range”] [Exprs [Call [Name “inject”] [Receiver [Name “range”]] [Block [Params “a” “x”] [Exprs [Call [Name “+”] [Receiver [Name “a”]] [Params [Name “x”]]]]]]]] The helper function to emulate the behavior of Ruby’s inject method

Transpile Ruby to Julia Before and After def inject_sum(range) range.inject
{|a, x| a + x } end function inject_sum(range) RbCall.inject(range) do a, x a + x end end

function inject_sum_op(range) RbCall.inject(range) do a, x a + x end
end Activating project at `~/src/github.com/mrkn/ruby-julia/RbCall.jl` function inject_sum_add(range) RbCall.inject(range) do a, x add(a, x) end end function add(x, y) x + y end inject_sum_op_rb Range (min … max): 4.670 ms … 5.644 ms Time (median): 4.725 ms Time (mean ± σ): 4.739 ms ± 78.906 μs █▄ ███▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 4.67ms Histogram: log(frequency) by time 5.062 ms< inject_sum_op_jl Range (min … max): 29.000 μs … 8.850 ms Time (median): 30.000 μs Time (mean ± σ): 30.615 μs ± 34.759 μs █ ▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 29μs Histogram: frequency by time 40μs< inject_sum_add_rb Range (min … max): 6.141 ms … 6.597 ms Time (median): 6.163 ms Time (mean ± σ): 6.189 ms ± 67.294 μs █▃ ▇██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▇ 6.141 ms Histogram: log(frequency) by time 6.432 ms< inject_sum_add_jl Range (min … max): 29.000 μs … 2.593 ms Time (median): 30.000 μs Time (mean ± σ): 30.499 μs ± 12.872 μs █ ▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 29μs Histogram: frequency by time 42μs<

Experiments and Results

Mandelbrot set • Calculating Mandelbrot set in the following domain
region: Experiment def mandel(z) c = z maxiter = 80 (1 ... maxiter).each do |n| return n - 1 if z.abs2 > 4 z = z**2 + c end maxiter end def mandelbrot ary = [] (-1 .. 1.0).step(0.1) do |i| (-2 .. 0.5).step(0.1) do |r| ary << mandel(Complex(r, i)) end end ary end <latexit sha1_base64="QWzl+pFesLOlOuzv+S+NsWXVER4=">AAACFHicbVDLSsNAFJ34rPUVdelmsAhCa0lKfSwL3bisYB+QhDKZTtuhk0mYmUhD7Ee48VfcuFDErQt3/o3TNgttPXDhcM693HuPHzEqlWV9Gyura+sbm7mt/PbO7t6+eXDYkmEsMGnikIWi4yNJGOWkqahipBMJggKfkbY/qk/99j0Rkob8TiUR8QI04LRPMVJa6ppFNx0XE+rCBxeOoUs5dM4rJWiVL7wSTDLBLkHbcydds2CVrRngMrEzUgAZGl3zy+2FOA4IV5ghKR3bipSXIqEoZmSSd2NJIoRHaEAcTTkKiPTS2VMTeKqVHuyHQhdXcKb+nkhRIGUS+LozQGooF72p+J/nxKp/7aWUR7EiHM8X9WMGVQinCcEeFQQrlmiCsKD6VoiHSCCsdI55HYK9+PIyaVXK9mW5elst1OpZHDlwDE7AGbDBFaiBG9AATYDBI3gGr+DNeDJejHfjY966YmQzR+APjM8fh3+auA==</latexit> {x + yi | x 2 [ 2, 0.5], y 2 [ 1, 1]} The image created by Wolfgang Beyer, CC BY-SA 3.0 Re Im

function mandelbrot() ary = [] for i in RbCall.RubyRange(-1, 0.1,
1.0) for r in RbCall.RubyRange(-2, 0.1, 0.5) append!(ary, mandel(Complex(r, i))) end end ary end function mandel(z) c = z maxiter = 80 for n in RbCall.RubyRange(1, 1, maxiter, true) if abs2(z) > 4 return n - 1 end z = z ^ 2 + c end maxiter end def mandel(z) c = z maxiter = 80 (1 ... maxiter).each do |n| return n - 1 if z.abs2 > 4 z = z**2 + c end maxiter end def mandelbrot ary = [] (-1 .. 1.0).step(0.1) do |i| (-2 .. 0.5).step(0.1) do |r| ary << mandel(Complex(r, i)) end end ary end

function mandelbrot() ary = [] for i in RbCall.RubyRange(-1, 0.1,
1.0) for r in RbCall.RubyRange(-2, 0.1, 0.5) append!(ary, mandel(Complex(r, i))) end end ary end function mandel(z) c = z maxiter = 80 for n in RbCall.RubyRange(1, 1, maxiter, true) if abs2(z) > 4 return n - 1 end z = z ^ 2 + c end maxiter end Activating project at `~/src/github.com/mrkn/ruby-julia/RbCall.jl` mandelbrot_rb Range (min … max): 3.221 ms … 3.655 ms Time (median): 3.323 ms Time (mean ± σ): 3.326 ms ± 48.710 μs ▄▆ ▂▄ ▃ ▅█▄▂▄ ▅ ▆██▆██▆█▁█████▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 3.221 ms Histogram: log(frequency) by time 3.431 ms< mandelbrot_jl Range (min … max): 164.000 μs … 8.291 ms Time (median): 167.000 μs Time (mean ± σ): 171.667 μs ± 99.903 μs ▁ █ █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 164μs Histogram: frequency by time 201μs<

π estimation by Monte Carlo • Estimating the approximation of
π by Monte Carlo method Experiment def monte_carlo_pi(n_samples) acc = 0 (0 ... n_samples).each do |i| x = rand y = rand acc += 1 if (x**2 + y**2) < 1.0 end return 4.0 * acc / n_samples end

function monte_carlo_pi(n_samples) acc = 0 for i in RbCall.RubyRange(0, 1,
n_samples, true) x = RbCall.rand(Float64) y = RbCall.rand(Float64) if (x ^ 2 + y ^ 2) < 1.0 acc += 1 end end return 4.0 * acc / n_samples end def monte_carlo_pi(n_samples) acc = 0 (0 ... n_samples).each do |i| x = rand y = rand acc += 1 if (x**2 + y**2) < 1.0 end return 4.0 * acc / n_samples end

function monte_carlo_pi(n_samples) acc = 0 for i in RbCall.RubyRange(0, 1,
n_samples, true) x = RbCall.rand(Float64) y = RbCall.rand(Float64) if (x ^ 2 + y ^ 2) < 1.0 acc += 1 end end return 4.0 * acc / n_samples end Activating project at `~/src/github.com/mrkn/ruby-julia/RbCall.jl` {:rb=>3.143348} {:jl=>3.143044} monte_carlo_pi_rb; N=1000000 Range (min … max): 105.851 ms … 108.312 ms Time (median): 106.178 ms Time (mean ± σ): 106.369 ms ± 602.130 μs █ ▇▇▇▁▇▁█▇▇▇▇▁▇▇▁▇▁▁▁▁▇▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁ 105.851 ms Histogram: frequency by time 108.312 ms< monte_carlo_pi_jl; N=1000000 Range (min … max): 8.753 ms … 9.478 ms Time (median): 8.837 ms Time (mean ± σ): 8.851 ms ± 72.243 μs ▅▅█ ▅▅▇▁▅▁███▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▇ 8.753 ms Histogram: log(frequency) by time 9.237 ms<

Quicksort • The famous algorithm to sort an array by
the divide-and-conquer method Experiment def partition(ary, lo, hi) pi = (lo + hi).div(2) pv = ary[pi] ary[pi], ary[hi] = ary[hi], ary[pi] j = lo JLTrans.inbounds do (lo ... hi).each do |i| if ary[i] <= pv ary[i], ary[j] = ary[j], ary[i] j += 1 end end end ary[j], ary[hi] = ary[hi], ary[j] j end def quicksort!(ary) quicksort0!(ary, 0, ary.length - 1) end def quicksort0!(ary, lo, hi) if lo < hi pi = partition(ary, lo, hi) quicksort0!(ary, lo, pi - 1) quicksort0!(ary, pi + 1, hi) end ary end

function quicksort!(ary) quicksort0!(ary, 0, length(ary) - 1) end function quicksort0!(ary,
lo, hi) if lo < hi piv = partition(ary, lo, hi) quicksort0!(ary, lo, piv - 1) quicksort0!(ary, piv + 1, hi) end ary end function partition(ary, lo, hi) piv = div((lo + hi), 2) v = ary[piv + 1] ary[piv + 1], ary[hi + 1] = ary[hi + 1], ary[piv + 1] j = lo @inbounds for i in RbCall.RubyRange(lo, 1, hi, true) if ary[i + 1] <= v ary[i + 1], ary[j + 1] = ary[j + 1], ary[i + 1] j += 1 end end ary[j + 1], ary[hi + 1] = ary[hi + 1], ary[j + 1] j end def quicksort!(ary) quicksort0!(ary, 0, ary.length - 1) end def quicksort0!(ary, lo, hi) if lo < hi piv = partition(ary, lo, hi) quicksort0!(ary, lo, piv - 1) quicksort0!(ary, piv + 1, hi) end ary end def partition(ary, lo, hi) piv = (lo + hi).div(2) v = ary[piv] ary[piv], ary[hi] = ary[hi], ary[piv] j = lo JLTrans.inbounds do (lo ... hi).each do |i| if ary[i] <= v ary[i], ary[j] = ary[j], ary[i] j += 1 end end end ary[j], ary[hi] = ary[hi], ary[j] j end

function quicksort!(ary) quicksort0!(ary, 0, length(ary) - 1) end function quicksort0!(ary,
lo, hi) if lo < hi piv = partition(ary, lo, hi) quicksort0!(ary, lo, piv - 1) quicksort0!(ary, piv + 1, hi) end ary end function partition(ary, lo, hi) piv = div((lo + hi), 2) v = ary[piv + 1] ary[piv + 1], ary[hi + 1] = ary[hi + 1], ary[piv + 1] j = lo @inbounds for i in RbCall.RubyRange(lo, 1, hi, true) if ary[i + 1] <= v ary[i + 1], ary[j + 1] = ary[j + 1], ary[i + 1] j += 1 end end ary[j + 1], ary[hi + 1] = ary[hi + 1], ary[j + 1] j end Activating project at `~/src/github.com/mrkn/ruby-julia/RbCall.jl` quicksort_rb!; N=10000 Range (min … max): 7.413 ms … 8.341 ms Time (median): 7.539 ms Time (mean ± σ): 7.551 ms ± 90.993 μs ▂▅▄▄▂▆█▇▆ ▆██████████▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 7.413 ms Histogram: log(frequency) by time 7.919 ms< quicksort_jl!; N=10000 Range (min … max): 1.790 ms … 42.982 ms Time (median): 1.823 ms Time (mean ± σ): 1.937 ms ± 1.764 ms █▇ ▄ ▅██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 1.79ms Histogram: log(frequency) by time 2.196 ms<

convolve • Calculating convolution of two arrays x and y
• If two arrays have the same length n, the complexity of calculation is O(n2) Experiment def convolve(x, y) m, n = x.length, y.length z = Array.new(m+n-1, JLTrans.zero(x[0])) (0 ... z.length).each do |k| zz = z[k] i0 = [k-n+1, 0].max (i0 ... m).each do |i| zz += x[i]*y[k-i] break if k-i == 0 end z[k] = zz end z end z[0] = x[0]*y[0]  z[1] = x[0]*y[1] + x[1]*y[0]  z[2] = x[0]*y[2] + x[1]*y[1] + x[2]*y[0]  : <latexit sha1_base64="wTtf3Pnnv1+xMKE6GbtC1JAgDf4=">AAACEnicbVC7TsMwFHV4lvIKMLJYVEgwtEpQBSyVKnVhLBJ9SE2IHNdprTpOZDuIEOUbWPgVFgYQYmVi429wHwO0HOnqHp1zr+x7/JhRqSzr21haXlldWy9sFDe3tnd2zb39towSgUkLRywSXR9JwignLUUVI91YEBT6jHT8UWPsd+6IkDTiNyqNiRuiAacBxUhpyTNPH7wRrEFHJqGX0VrZoTxQaX477fDeozD1slGZ5p5ZsirWBHCR2DNSAjM0PfPL6Uc4CQlXmCEpe7YVKzdDQlHMSF50EklihEdoQHqachQS6WaTk3J4rJU+DCKhiys4UX9vZCiUMg19PRkiNZTz3lj8z+slKrh0M8rjRBGOpw8FCYMqguN8YJ8KghVLNUFYUP1XiIdIIKx0ikUdgj1/8iJpn1Xs80r1ulqqN2ZxFMAhOAInwAYXoA6uQBO0AAaP4Bm8gjfjyXgx3o2P6eiSMds5AH9gfP4A5fedpg==</latexit> zk = 1 X i= 1 xiyk i

function convolve(x, y) m, n = length(x), length(y) z =
fill(zero(x[1]), m + n - 1) for k in RbCall.RubyRange(0, 1, length(z), true) zz = z[k + 1] i0 = max(k - n + 1, 0) for i in RbCall.RubyRange(i0, 1, m, true) zz += x[i + 1] * y[k - i + 1] if k - i == 0 break end end z[k + 1] = zz end z end def convolve(x, y) m, n = x.length, y.length z = Array.new(m+n-1, JLTrans.zero(x[0])) (0 ... z.length).each do |k| zz = z[k] i0 = [k-n+1, 0].max (i0 ... m).each do |i| zz += x[i]*y[k-i] break if k-i == 0 end z[k] = zz end z end

fill(zero(x[1]), m + n - 1) for k in RbCall.RubyRange(0, 1, length(z), true) zz = z[k + 1] i0 = max(k - n + 1, 0) for i in RbCall.RubyRange(i0, 1, m, true) zz += x[i + 1] * y[k - i + 1] if k - i == 0 break end end z[k + 1] = zz end z end Activating project at `~/src/github.com/mrkn/ruby-julia/RbCall.jl` convolve_jl; N=10000, T=Any[] Range (min … max): 92.246 ms … 92.973 ms Time (median): 92.702 ms Time (mean ± σ): 92.671 ms ± 204.331 μs ▃ ▃▃ ▃ ▃█ ▃ ▃ ▃ ▃ ▃ ▃▃█ █ █ █▁▁▁▁▁▁▁▁██▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁██▁█▁█▁▁█▁▁█▁█▁▁▁███▁█▁▁▁▁▁▁▁▁▁▁█ ▁ 92.246 ms Histogram: log(frequency) by time 92.973 ms< convolve_jl; N=10000, T=Float64[] Range (min … max): 88.514 ms … 89.272 ms Time (median): 88.779 ms Time (mean ± σ): 88.820 ms ± 216.830 μs ▁ ▁▅ █▅ ▁ ▁ ▁▁ ▁ ▅ ▁▅ ▅ █▁▁▁▁██▁▁▁▁▁██▁█▁▁▁▁█▁▁██▁▁▁█▁▁▁▁▁▁█▁▁▁▁██▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁ ▁ 88.514 ms Histogram: log(frequency) by time 89.272 ms< convolve_rb; N=1000 Range (min … max): 49.549 ms … 51.262 ms Time (median): 50.046 ms Time (mean ± σ): 50.108 ms ± 396.058 μs ▁ ▃ ▁ █ ▄▁▄▁▄▁▁▁█▄█▄▁▄█▁▄▇█▁▁▁▄▇▄▇▄▁▁▁▁▁▁▁▄▄▁▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▄▁▁▁ ▁ 49.549 ms Histogram: frequency by time 51.262 ms< convolve_jl; N=1000, T=Any[] Range (min … max): 1.239 ms … 23.259 ms Time (median): 1.291 ms Time (mean ± σ): 1.316 ms ± 591.804 μs █▆ ▃ ▃▃▃▁▁▁▃▃▁▁▃▆██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 1.239 ms Histogram: frequency by time 1.376 ms< convolve_jl; N=1000, T=Float64[] Range (min … max): 855.000 μs … 44.333 ms Time (median): 894.000 μs Time (mean ± σ): 916.545 μs ± 930.027 μs ██ ▇ ▅▇▅▇▅▁▅▁▁▅▅▁▅▁▇▇██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 855μs Histogram: log(frequency) by time 948μs<

fill(zero(x[1]), m + n - 1) Threads.@threads for k in RbCall.RubyRange(0, 1, length(z), true) zz = z[k + 1] i0 = max(k - n + 1, 0) for i in RbCall.RubyRange(i0, 1, m, true) zz += x[i + 1] * y[k - i + 1] if k - i == 0 break end end z[k + 1] = zz end z end Activating project at `~/src/github.com/mrkn/ruby-julia/RbCall.jl` convolve_jl; N=10000, T=Any[] Range (min … max): 38.072 ms … 39.816 ms Time (median): 38.226 ms Time (mean ± σ): 38.265 ms ± 237.234 μs ▁▂▂▇▇█▄▂▆▁ ▁ ▆██████████▆▁█▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁ 38.072 ms Histogram: log(frequency) by time 39.816 ms< convolve_jl; N=10000, T=Float64[] Range (min … max): 34.383 ms … 34.723 ms Time (median): 34.497 ms Time (mean ± σ): 34.513 ms ± 92.663 μs ▂█▂ ▂ ▂ ▂ ▂ ▅▁█▁█▁█████▁█▁▅▁█▅▁▁▅▁▁▅▅█▅▅▅▅▁▅▁█▁█▁▁█▅▅▅▅▁▅█▁▁▁▅▁▁▁▅▁▁▁▁▅ ▁ 34.383 ms Histogram: frequency by time 34.723 ms< convolve_rb; N=1000 Range (min … max): 49.622 ms … 51.365 ms Time (median): 49.766 ms Time (mean ± σ): 49.884 ms ± 356.988 μs █▆█▃ ▁ ▄▄▇████▄▄█▄▄▇▇▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁ 49.622 ms Histogram: frequency by time 51.365 ms< convolve_jl; N=1000, T=Any[] Range (min … max): 749.000 μs … 42.637 ms Time (median): 772.000 μs Time (mean ± σ): 817.622 μs ± 1.168 ms █ ▃ ▃▃▃▅▆█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 749μs Histogram: frequency by time 822μs< convolve_jl; N=1000, T=Float64[] Range (min … max): 360.000 μs … 2.154 ms Time (median): 385.000 μs Time (mean ± σ): 386.711 μs ± 35.166 μs █ ▄▁▆▇▇▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 360μs Histogram: frequency by time 416μs<

Dot-product • Calculating the dot product of two arrays •
The complexity is O(n) Experiment: bad case def dot_product(x, y) n = x.length JLTrans.assert { n == y.length } z = JLTrans.zero(x[0]) (0 ... n).each do |i| z += x[i] * y[i] end z end <latexit sha1_base64="DakWZm7lJu990yFHGbcFHo6hJbk=">AAACA3icbVDLSsNAFJ34rPUVdaebwSK4KokUdVModOOygn1AG8NkOmmHzkzCzESMIeDGX3HjQhG3/oQ7/8bpY6GtBy4czrmXe+8JYkaVdpxva2l5ZXVtvbBR3Nza3tm19/ZbKkokJk0csUh2AqQIo4I0NdWMdGJJEA8YaQej+thv3xGpaCRudBoTj6OBoCHFSBvJtw8fYBX2VML9jFad/DYTObz3KUx96tslp+xMABeJOyMlMEPDt796/QgnnAiNGVKq6zqx9jIkNcWM5MVeokiM8AgNSNdQgThRXjb5IYcnRunDMJKmhIYT9fdEhrhSKQ9MJ0d6qOa9sfif1010eOllVMSJJgJPF4UJgzqC40Bgn0qCNUsNQVhScyvEQyQR1ia2ognBnX95kbTOyu55uXJdKdXqszgK4Agcg1PgggtQA1egAZoAg0fwDF7Bm/VkvVjv1se0dcmazRyAP7A+fwDn/JcT</latexit> z = n X i=0 xiyi

function dot_product(x, y) n = length(x) @assert n == length(y)
z = zero(x[1]) for i in RbCall.RubyRange(0, 1, n, true) z += x[i + 1] * y[i + 1] end z end def dot_product(x, y) n = x.length JLTrans.assert { n == y.length } z = JLTrans.zero(x[0]) (0 ... n).each do |i| z += x[i] * y[i] end z end

function dot_product(x, y) n = length(x) @assert n == length(y)
z = zero(x[1]) for i in RbCall.RubyRange(0, 1, n, true) z += x[i + 1] * y[i + 1] end z end Activating project at `~/src/github.com/mrkn/ruby-julia/RbCall.jl` {:jl=>47.43176456562391, :rb=>47.43176456562391} dot_product_rb; N=10000 Range (min … max): 458.000 μs … 583.000 μs Time (median): 465.000 μs Time (mean ± σ): 468.608 μs ± 9.411 μs █ ▄ ▂▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 458μs Histogram: frequency by time 512μs< dot_product_jl; N=10000 Range (min … max): 3.636 ms … 11.786 ms Time (median): 3.704 ms Time (mean ± σ): 3.759 ms ± 428.098 μs ▃█ ▄██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 3.636 ms Histogram: log(frequency) by time 4.386 ms< dot_product_jl; N=10000, T=Float64 Range (min … max): 9.000 μs … 2.608 ms Time (median): 10.000 μs Time (mean ± σ): 10.651 μs ± 7.985 μs █ ▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █ 9μs Histogram: frequency by time 19μs<

Summary of Experiments • Algorithms that have large time complexity
get faster by transpiling to Julia • Julia can generate optimized native code for function arguments • Array conversion is heavy overhead • Algorithms with O(n) time complexity get slower when they take array arguments

Conclusion

Conclusion • We can make a pure-Ruby method super fast
… • by Ruby-to-Julia transpiler so that the optimized native code is generated • when we can ignore Ruby’s dynamic features • This JIT compiler requires Julia … • but we can utilize Julia’s library features in Ruby • If you are interested in this method, please ask red-data-tools project to join the development

Future plan

Future plan • Coverage improvement of CRuby AST patterns •
Non-copy wrapper of Numo::NArray in Julia • Stability improvement around of GC • Support of USE_RVARGC in Ruby 3.2 • CRuby byte code to Julia conversion • Adding trampoline support of AARCH64 in Julia and LLVM

Method-based JIT compilation by transpiling to ...

Method-based JIT compilation by transpiling to Julia

More Decks by Kenta Murata

Other Decks in Technology

Featured

Transcript