$30 off During Our Annual Pro Sale. View Details »

Method-based JIT compilation by transpiling to Julia

Kenta Murata
September 09, 2022

Method-based JIT compilation by transpiling to Julia

I will demonstrate a new approach to the method-based Just-In-Time compilation for Ruby. This is used Julia language as an infrastructure of the JIT compilation. I will describe the characteristics of this approach and show you some example results.

Kenta Murata

September 09, 2022
Tweet

More Decks by Kenta Murata

Other Decks in Technology

Transcript

  1. Kenta Murata
    Xica Co., Ltd.
    2022.09.09 RubyKaigi 2022 at Mie Center for the Arts, Mie, Japan
    Method-based JIT compilation
    by transpiling to Julia

    View Slide

  2. Talk Summary
    • I examined a new way of method-based JIT compilation

    • It uses Julia language as the backend

    • Some results of experiments will be demonstrated

    View Slide

  3. Talk Outline
    • self.introduction

    • Background and Motivation

    • Method

    • Experiments and Results

    • Conclusion

    • Future plan

    View Slide

  4. self.introduction
    • Kenta Murata

    • Chief Research O
    ff i
    cer at Xica Co., Ltd.

    • Committer of Apache Arrow and Ruby

    • Gems:

    • pycall, enumerable-statistics, unicode_plot, charty, bigdecimal, etc.

    • red-arrow, red-datasets, numo-narray, etc.

    View Slide

  5. Background and Motivation

    View Slide

  6. Background
    • We can handle a large numerical arrays e
    ffi
    ciently in Ruby by using
    Numo::NArray and Red Arrow libraries

    • However, numerical algorithms purely written in Ruby with these libraries are
    not very fast even if MJIT or YJIT is available

    • The reason is that these JIT compilers preserve all the semantics of Ruby

    View Slide

  7. Ruby’s semantics == dynamic method dispatch
    • All the methods can be rede
    fi
    ned with other implementations anytime

    • The rede
    fi
    nition immediately a
    ff
    ect the code execution

    • Even in the middle of a while loop

    s = (1 .. 10).inject {|a, x| a + x }

    Numo::DFloat.new(1000, 2).rand_norm.cumsum(axis: 1)
    ↑ ↑



    View Slide

  8. Background
    • We can handle a large numerical arrays e
    ffi
    ciently in Ruby by using
    Numo::NArray and Red Arrow libraries

    • However, numerical algorithms purely written in Ruby with these libraries does
    not run fast even if MJIT or YJIT is available

    • The reason is that these JIT compilers preserve all the semantics of Ruby

    • Ruby’s semantics is mostly meaningless in numerical algorithms so it is better
    to be able to optimize the computation by ignoring them

    • Today we must rewrite the algorithms into C extension libraries for such
    optimizations

    View Slide

  9. How we make numerical algorithms written in
    Ruby faster without rewriting them into C
    extension libraries?

    View Slide

  10. Numba: Solution in the Python world
    • Numba is a JIT compiler for CPython

    • Numba translates a subset of Python and NumPy code into fast machine
    code

    • Let’s see the example

    View Slide

  11. import numba


    @numba.jit(nopython=True)


    def add(x, y):


    return x + y


    # Invoking JIT compile by calling the function


    add(10.0, 20.0)


    # Signatures of type-specific compiled functions


    print(add.signatures)


    #=> [(float64, float64)]


    # Assembly sources of the 1st type-specific compiled function


    print(add.inspect_asm(add.signatures[0]))


    #=>


    # (snip)


    # vaddsd %xmm1, %xmm0, %xmm0


    # xorl %eax, %eax


    # vmovsd %xmm0, (%rdi)


    # retq


    # (snip)

    View Slide

  12. What Numba does?
    • Numba compiles CPython’s bydecode with optional type-specialization

    • Compiler stages:

    1. Analyze CPython’s byte code

    2. Generate Numba IR and rewrite it

    3. Infer types of Numba IR and rewrite it into typed IR

    4. Perform automatic parallelization

    5. Generate LLVM IR from typed IR

    6. Compile LLVM IR to native code (LLVM’s role)

    View Slide

  13. Numba’s two JIT modes
    How to deal with CPython’s semantics
    • Numba has two modes:

    1. Object mode — Preserve the full semantics of CPython by using the C API of
    CPython interpreter

    2. nopython mode — Generate small and e
    ff i
    cient native code that specialized to
    the speci
    fi
    c native data types such as
    fl
    oat64 and numpy arrays

    • Object mode is required if the code to be JIT-ed uses the features that depends on
    CPython’s semantics

    • e.g. Attribute access of objects, using types unsupported by Numba

    • nopython mode can be applied if the code does not rely on such the features

    View Slide

  14. import numba


    @numba.jit(nopython=True)


    def add(x, y):


    return x + y


    # Invoking JIT compile by calling the function


    add(10.0, 20.0)


    # Signatures of type-specific compiled functions


    print(add.signatures)


    #=> [(float64, float64)]


    # Assembly sources of the 1st type-specific compiled function


    print(add.inspect_asm(add.signatures[0]))


    #=>


    # (snip)


    # vaddsd %xmm1, %xmm0, %xmm0


    # xorl %eax, %eax


    # vmovsd %xmm0, (%rdi)


    # retq


    # (snip)
    Native add instruction for the
    fl
    oat64 type is generated!!

    View Slide

  15. import numba


    @numba.jit(forceobj=True) # Force to compile in the object mode


    def add(x, y):


    return x + y


    # Invoking JIT compile by calling the function


    add(10.0, 20.0)


    # Signatures of type-specific compiled functions


    print(add.signatures)


    #=> [(float64, float64)]


    # Assembly sources of the 1st type-specific compiled function


    print(add.inspect_asm(add.signatures[0]))


    #=>


    # (snip)


    # movabsq $PyNumber_Add, %rax


    # movq %r12, %rdi


    # movq %r14, %rsi


    # callq *%rax


    # (snip)
    PyNumber_Add function is used instead of the native
    add instruction

    View Slide

  16. I need nopython-mode like JIT
    compilation feature in Ruby for the purpose

    View Slide

  17. Method

    View Slide

  18. Method
    CRuby byte code
    Something typed IR
    LLVM IR
    Native code
    NUMBA-like JIT compile process
    Method
    CPython byte code
    NUMBA typed IR
    LLVM IR
    Native code

    View Slide

  19. Method
    CRuby byte code
    Something typed IR
    LLVM IR
    Native code
    NUMBA-like JIT compile process
    Generating AST
    Optimization
    Generating byte code
    Generating IR
    Type inference
    Optimization

    View Slide

  20. JIT compile by transpiling to Julia
    Method
    CRuby byte code
    Something typed IR
    LLVM IR
    Native code
    Julia code
    Julia AST
    Julia typed IR
    LLVM IR
    Native code
    Generating AST
    Type inference
    Generating IR

    View Slide

  21. ೉ͦ͠͏
    Julia

    View Slide

  22. How to transpile Ruby to Julia

    View Slide

  23. Transpile Ruby to Julia
    Ruby AST
    Ruby code Julia code
    → →
    1 + 2 1 + 2
    +
    1 2
    → →
    a...b

    a b
    RbCall.RubyRange(


    a, 1, b, true)
    → → a:b

    View Slide

  24. View Slide

  25. Transpile Ruby to Julia
    (1) Ruby code → AST (powered by yadriggy.gem)
    def inject_sum(range)


    range.inject {|a, x| a + x }


    end
    [Def


    [Name “inject_sum”]


    [Params “range”]


    [Exprs


    [Call


    [Name “inject”]


    [Receiver [Name “range”]]


    [Block


    [Params “a” “x”]


    [Exprs


    [Call


    [Name “+”]


    [Receiver [Name “a”]]


    [Params [Name “x”]]]]]]]]

    View Slide

  26. Transpile Ruby to Julia
    (2) Generate Julia code from AST (powered by yadriggy.gem)
    function inject_sum(range)


    RbCall.inject(range) do a, x


    a + x


    end


    end
    [Def


    [Name “inject_sum”]


    [Params “range”]


    [Exprs


    [Call


    [Name “inject”]


    [Receiver [Name “range”]]


    [Block


    [Params “a” “x”]


    [Exprs


    [Call


    [Name “+”]


    [Receiver [Name “a”]]


    [Params [Name “x”]]]]]]]]
    The helper function to emulate the
    behavior of Ruby’s inject method

    View Slide

  27. Transpile Ruby to Julia
    Before and After
    def inject_sum(range)


    range.inject {|a, x| a + x }


    end
    function inject_sum(range)


    RbCall.inject(range) do a, x


    a + x


    end


    end

    View Slide

  28. Demo

    View Slide

  29. function inject_sum_op(range)


    RbCall.inject(range) do a, x


    a + x


    end


    end


    Activating project at `~/src/github.com/mrkn/ruby-julia/RbCall.jl`


    function inject_sum_add(range)


    RbCall.inject(range) do a, x


    add(a, x)


    end


    end


    function add(x, y)


    x + y


    end


    inject_sum_op_rb


    Range (min … max): 4.670 ms … 5.644 ms


    Time (median): 4.725 ms


    Time (mean ± σ): 4.739 ms ± 78.906 μs


    █▄


    ███▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █


    4.67ms Histogram: log(frequency) by time 5.062 ms<


    inject_sum_op_jl


    Range (min … max): 29.000 μs … 8.850 ms


    Time (median): 30.000 μs


    Time (mean ± σ): 30.615 μs ± 34.759 μs





    ▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █


    29μs Histogram: frequency by time 40μs<


    inject_sum_add_rb


    Range (min … max): 6.141 ms … 6.597 ms


    Time (median): 6.163 ms


    Time (mean ± σ): 6.189 ms ± 67.294 μs


    █▃


    ▇██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▇


    6.141 ms Histogram: log(frequency) by time 6.432 ms<


    inject_sum_add_jl


    Range (min … max): 29.000 μs … 2.593 ms


    Time (median): 30.000 μs


    Time (mean ± σ): 30.499 μs ± 12.872 μs





    ▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █


    29μs Histogram: frequency by time 42μs<


    View Slide

  30. Experiments and Results

    View Slide

  31. Mandelbrot set
    • Calculating Mandelbrot set in the
    following domain region:
    Experiment
    def mandel(z)


    c = z


    maxiter = 80


    (1 ... maxiter).each do |n|


    return n - 1 if z.abs2 > 4


    z = z**2 + c


    end


    maxiter


    end


    def mandelbrot


    ary = []


    (-1 .. 1.0).step(0.1) do |i|


    (-2 .. 0.5).step(0.1) do |r|


    ary << mandel(Complex(r, i))


    end


    end


    ary


    end
    AAACFHicbVDLSsNAFJ34rPUVdelmsAhCa0lKfSwL3bisYB+QhDKZTtuhk0mYmUhD7Ee48VfcuFDErQt3/o3TNgttPXDhcM693HuPHzEqlWV9Gyura+sbm7mt/PbO7t6+eXDYkmEsMGnikIWi4yNJGOWkqahipBMJggKfkbY/qk/99j0Rkob8TiUR8QI04LRPMVJa6ppFNx0XE+rCBxeOoUs5dM4rJWiVL7wSTDLBLkHbcydds2CVrRngMrEzUgAZGl3zy+2FOA4IV5ghKR3bipSXIqEoZmSSd2NJIoRHaEAcTTkKiPTS2VMTeKqVHuyHQhdXcKb+nkhRIGUS+LozQGooF72p+J/nxKp/7aWUR7EiHM8X9WMGVQinCcEeFQQrlmiCsKD6VoiHSCCsdI55HYK9+PIyaVXK9mW5elst1OpZHDlwDE7AGbDBFaiBG9AATYDBI3gGr+DNeDJejHfjY966YmQzR+APjM8fh3+auA==
    {x + yi | x 2 [ 2, 0.5], y 2 [ 1, 1]}
    The image created by Wolfgang Beyer, CC BY-SA 3.0
    Re
    Im

    View Slide

  32. function mandelbrot()


    ary = []


    for i in RbCall.RubyRange(-1, 0.1, 1.0)


    for r in RbCall.RubyRange(-2, 0.1, 0.5)


    append!(ary, mandel(Complex(r, i)))


    end


    end


    ary


    end


    function mandel(z)


    c = z


    maxiter = 80


    for n in RbCall.RubyRange(1, 1, maxiter, true)


    if abs2(z) > 4


    return n - 1


    end


    z = z ^ 2 + c


    end


    maxiter


    end
    def mandel(z)


    c = z


    maxiter = 80


    (1 ... maxiter).each do |n|


    return n - 1 if z.abs2 > 4


    z = z**2 + c


    end


    maxiter


    end


    def mandelbrot


    ary = []


    (-1 .. 1.0).step(0.1) do |i|


    (-2 .. 0.5).step(0.1) do |r|


    ary << mandel(Complex(r, i))


    end


    end


    ary


    end

    View Slide

  33. function mandelbrot()


    ary = []


    for i in RbCall.RubyRange(-1, 0.1, 1.0)


    for r in RbCall.RubyRange(-2, 0.1, 0.5)


    append!(ary, mandel(Complex(r, i)))


    end


    end


    ary


    end


    function mandel(z)


    c = z


    maxiter = 80


    for n in RbCall.RubyRange(1, 1, maxiter, true)


    if abs2(z) > 4


    return n - 1


    end


    z = z ^ 2 + c


    end


    maxiter


    end


    Activating project at `~/src/github.com/mrkn/ruby-julia/RbCall.jl`


    mandelbrot_rb


    Range (min … max): 3.221 ms … 3.655 ms


    Time (median): 3.323 ms


    Time (mean ± σ): 3.326 ms ± 48.710 μs


    ▄▆ ▂▄ ▃ ▅█▄▂▄ ▅


    ▆██▆██▆█▁█████▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █


    3.221 ms Histogram: log(frequency) by time 3.431 ms<


    mandelbrot_jl


    Range (min … max): 164.000 μs … 8.291 ms


    Time (median): 167.000 μs


    Time (mean ± σ): 171.667 μs ± 99.903 μs


    ▁ █


    █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █


    164μs Histogram: frequency by time 201μs<

    View Slide

  34. π estimation by Monte Carlo
    • Estimating the approximation of
    π by Monte Carlo method
    Experiment
    def monte_carlo_pi(n_samples)


    acc = 0


    (0 ... n_samples).each do |i|


    x = rand


    y = rand


    acc += 1 if (x**2 + y**2) < 1.0


    end


    return 4.0 * acc / n_samples


    end

    View Slide

  35. function monte_carlo_pi(n_samples)


    acc = 0


    for i in RbCall.RubyRange(0, 1, n_samples, true)


    x = RbCall.rand(Float64)


    y = RbCall.rand(Float64)


    if (x ^ 2 + y ^ 2) < 1.0


    acc += 1


    end


    end


    return 4.0 * acc / n_samples


    end
    def monte_carlo_pi(n_samples)


    acc = 0


    (0 ... n_samples).each do |i|


    x = rand


    y = rand


    acc += 1 if (x**2 + y**2) < 1.0


    end


    return 4.0 * acc / n_samples


    end

    View Slide

  36. function monte_carlo_pi(n_samples)


    acc = 0


    for i in RbCall.RubyRange(0, 1, n_samples, true)


    x = RbCall.rand(Float64)


    y = RbCall.rand(Float64)


    if (x ^ 2 + y ^ 2) < 1.0


    acc += 1


    end


    end


    return 4.0 * acc / n_samples


    end


    Activating project at `~/src/github.com/mrkn/ruby-julia/RbCall.jl`


    {:rb=>3.143348}


    {:jl=>3.143044}


    monte_carlo_pi_rb; N=1000000


    Range (min … max): 105.851 ms … 108.312 ms


    Time (median): 106.178 ms


    Time (mean ± σ): 106.369 ms ± 602.130 μs





    ▇▇▇▁▇▁█▇▇▇▇▁▇▇▁▇▁▁▁▁▇▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁


    105.851 ms Histogram: frequency by time 108.312 ms<


    monte_carlo_pi_jl; N=1000000


    Range (min … max): 8.753 ms … 9.478 ms


    Time (median): 8.837 ms


    Time (mean ± σ): 8.851 ms ± 72.243 μs


    ▅▅█


    ▅▅▇▁▅▁███▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▇


    8.753 ms Histogram: log(frequency) by time 9.237 ms<

    View Slide

  37. Quicksort
    • The famous algorithm to sort an
    array by the divide-and-conquer
    method
    Experiment
    def partition(ary, lo, hi)


    pi = (lo + hi).div(2)


    pv = ary[pi]


    ary[pi], ary[hi] = ary[hi], ary[pi]


    j = lo


    JLTrans.inbounds do


    (lo ... hi).each do |i|


    if ary[i] <= pv


    ary[i], ary[j] = ary[j], ary[i]


    j += 1


    end


    end


    end


    ary[j], ary[hi] = ary[hi], ary[j]


    j


    end
    def quicksort!(ary)


    quicksort0!(ary, 0, ary.length - 1)


    end


    def quicksort0!(ary, lo, hi)


    if lo < hi


    pi = partition(ary, lo, hi)


    quicksort0!(ary, lo, pi - 1)


    quicksort0!(ary, pi + 1, hi)


    end


    ary


    end

    View Slide

  38. function quicksort!(ary)


    quicksort0!(ary, 0, length(ary) - 1)


    end


    function quicksort0!(ary, lo, hi)


    if lo < hi


    piv = partition(ary, lo, hi)


    quicksort0!(ary, lo, piv - 1)


    quicksort0!(ary, piv + 1, hi)


    end


    ary


    end


    function partition(ary, lo, hi)


    piv = div((lo + hi), 2)


    v = ary[piv + 1]


    ary[piv + 1], ary[hi + 1] = ary[hi + 1], ary[piv + 1]


    j = lo


    @inbounds for i in RbCall.RubyRange(lo, 1, hi, true)


    if ary[i + 1] <= v


    ary[i + 1], ary[j + 1] = ary[j + 1], ary[i + 1]


    j += 1


    end


    end


    ary[j + 1], ary[hi + 1] = ary[hi + 1], ary[j + 1]


    j


    end
    def quicksort!(ary)


    quicksort0!(ary, 0, ary.length - 1)


    end


    def quicksort0!(ary, lo, hi)


    if lo < hi


    piv = partition(ary, lo, hi)


    quicksort0!(ary, lo, piv - 1)


    quicksort0!(ary, piv + 1, hi)


    end


    ary


    end


    def partition(ary, lo, hi)


    piv = (lo + hi).div(2)


    v = ary[piv]


    ary[piv], ary[hi] = ary[hi], ary[piv]


    j = lo


    JLTrans.inbounds do


    (lo ... hi).each do |i|


    if ary[i] <= v


    ary[i], ary[j] = ary[j], ary[i]


    j += 1


    end


    end


    end


    ary[j], ary[hi] = ary[hi], ary[j]


    j


    end

    View Slide

  39. function quicksort!(ary)


    quicksort0!(ary, 0, length(ary) - 1)


    end


    function quicksort0!(ary, lo, hi)


    if lo < hi


    piv = partition(ary, lo, hi)


    quicksort0!(ary, lo, piv - 1)


    quicksort0!(ary, piv + 1, hi)


    end


    ary


    end


    function partition(ary, lo, hi)


    piv = div((lo + hi), 2)


    v = ary[piv + 1]


    ary[piv + 1], ary[hi + 1] = ary[hi + 1], ary[piv + 1]


    j = lo


    @inbounds for i in RbCall.RubyRange(lo, 1, hi, true)


    if ary[i + 1] <= v


    ary[i + 1], ary[j + 1] = ary[j + 1], ary[i + 1]


    j += 1


    end


    end


    ary[j + 1], ary[hi + 1] = ary[hi + 1], ary[j + 1]


    j


    end


    Activating project at `~/src/github.com/mrkn/ruby-julia/RbCall.jl`


    quicksort_rb!; N=10000


    Range (min … max): 7.413 ms … 8.341 ms


    Time (median): 7.539 ms


    Time (mean ± σ): 7.551 ms ± 90.993 μs


    ▂▅▄▄▂▆█▇▆


    ▆██████████▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █


    7.413 ms Histogram: log(frequency) by time 7.919 ms<


    quicksort_jl!; N=10000


    Range (min … max): 1.790 ms … 42.982 ms


    Time (median): 1.823 ms


    Time (mean ± σ): 1.937 ms ± 1.764 ms


    █▇ ▄


    ▅██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █


    1.79ms Histogram: log(frequency) by time 2.196 ms<

    View Slide

  40. convolve
    • Calculating convolution of two
    arrays x and y

    • If two arrays have the same
    length n, the complexity of
    calculation is O(n2)
    Experiment
    def convolve(x, y)


    m, n = x.length, y.length


    z = Array.new(m+n-1, JLTrans.zero(x[0]))


    (0 ... z.length).each do |k|


    zz = z[k]


    i0 = [k-n+1, 0].max


    (i0 ... m).each do |i|


    zz += x[i]*y[k-i]


    break if k-i == 0


    end


    z[k] = zz


    end


    z


    end
    z[0] = x[0]*y[0]

    z[1] = x[0]*y[1] + x[1]*y[0]

    z[2] = x[0]*y[2] + x[1]*y[1] + x[2]*y[0]

    :
    AAACEnicbVC7TsMwFHV4lvIKMLJYVEgwtEpQBSyVKnVhLBJ9SE2IHNdprTpOZDuIEOUbWPgVFgYQYmVi429wHwO0HOnqHp1zr+x7/JhRqSzr21haXlldWy9sFDe3tnd2zb39towSgUkLRywSXR9JwignLUUVI91YEBT6jHT8UWPsd+6IkDTiNyqNiRuiAacBxUhpyTNPH7wRrEFHJqGX0VrZoTxQaX477fDeozD1slGZ5p5ZsirWBHCR2DNSAjM0PfPL6Uc4CQlXmCEpe7YVKzdDQlHMSF50EklihEdoQHqachQS6WaTk3J4rJU+DCKhiys4UX9vZCiUMg19PRkiNZTz3lj8z+slKrh0M8rjRBGOpw8FCYMqguN8YJ8KghVLNUFYUP1XiIdIIKx0ikUdgj1/8iJpn1Xs80r1ulqqN2ZxFMAhOAInwAYXoA6uQBO0AAaP4Bm8gjfjyXgx3o2P6eiSMds5AH9gfP4A5fedpg==
    zk =
    1
    X
    i= 1
    xiyk i

    View Slide

  41. function convolve(x, y)


    m, n = length(x), length(y)


    z = fill(zero(x[1]), m + n - 1)


    for k in RbCall.RubyRange(0, 1, length(z), true)


    zz = z[k + 1]


    i0 = max(k - n + 1, 0)


    for i in RbCall.RubyRange(i0, 1, m, true)


    zz += x[i + 1] * y[k - i + 1]


    if k - i == 0


    break


    end


    end


    z[k + 1] = zz


    end


    z


    end
    def convolve(x, y)


    m, n = x.length, y.length


    z = Array.new(m+n-1, JLTrans.zero(x[0]))


    (0 ... z.length).each do |k|


    zz = z[k]


    i0 = [k-n+1, 0].max


    (i0 ... m).each do |i|


    zz += x[i]*y[k-i]


    break if k-i == 0


    end


    z[k] = zz


    end


    z


    end

    View Slide

  42. function convolve(x, y)


    m, n = length(x), length(y)


    z = fill(zero(x[1]), m + n - 1)


    for k in RbCall.RubyRange(0, 1, length(z), true)


    zz = z[k + 1]


    i0 = max(k - n + 1, 0)


    for i in RbCall.RubyRange(i0, 1, m, true)


    zz += x[i + 1] * y[k - i + 1]


    if k - i == 0


    break


    end


    end


    z[k + 1] = zz


    end


    z


    end


    Activating project at `~/src/github.com/mrkn/ruby-julia/RbCall.jl`


    convolve_jl; N=10000, T=Any[]


    Range (min … max): 92.246 ms … 92.973 ms


    Time (median): 92.702 ms


    Time (mean ± σ): 92.671 ms ± 204.331 μs


    ▃ ▃▃ ▃ ▃█ ▃ ▃ ▃ ▃ ▃ ▃▃█ █ █


    █▁▁▁▁▁▁▁▁██▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁██▁█▁█▁▁█▁▁█▁█▁▁▁███▁█▁▁▁▁▁▁▁▁▁▁█ ▁


    92.246 ms Histogram: log(frequency) by time 92.973 ms<


    convolve_jl; N=10000, T=Float64[]


    Range (min … max): 88.514 ms … 89.272 ms


    Time (median): 88.779 ms


    Time (mean ± σ): 88.820 ms ± 216.830 μs


    ▁ ▁▅ █▅ ▁ ▁ ▁▁ ▁ ▅ ▁▅ ▅


    █▁▁▁▁██▁▁▁▁▁██▁█▁▁▁▁█▁▁██▁▁▁█▁▁▁▁▁▁█▁▁▁▁██▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁ ▁


    88.514 ms Histogram: log(frequency) by time 89.272 ms<


    convolve_rb; N=1000


    Range (min … max): 49.549 ms … 51.262 ms


    Time (median): 50.046 ms


    Time (mean ± σ): 50.108 ms ± 396.058 μs


    ▁ ▃ ▁ █


    ▄▁▄▁▄▁▁▁█▄█▄▁▄█▁▄▇█▁▁▁▄▇▄▇▄▁▁▁▁▁▁▁▄▄▁▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▄▁▁▁ ▁


    49.549 ms Histogram: frequency by time 51.262 ms<


    convolve_jl; N=1000, T=Any[]


    Range (min … max): 1.239 ms … 23.259 ms


    Time (median): 1.291 ms


    Time (mean ± σ): 1.316 ms ± 591.804 μs


    █▆ ▃


    ▃▃▃▁▁▁▃▃▁▁▃▆██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █


    1.239 ms Histogram: frequency by time 1.376 ms<


    convolve_jl; N=1000, T=Float64[]


    Range (min … max): 855.000 μs … 44.333 ms


    Time (median): 894.000 μs


    Time (mean ± σ): 916.545 μs ± 930.027 μs


    ██ ▇


    ▅▇▅▇▅▁▅▁▁▅▅▁▅▁▇▇██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █


    855μs Histogram: log(frequency) by time 948μs<

    View Slide

  43. function convolve(x, y)


    m, n = length(x), length(y)


    z = fill(zero(x[1]), m + n - 1)


    Threads.@threads for k in RbCall.RubyRange(0, 1, length(z), true)


    zz = z[k + 1]


    i0 = max(k - n + 1, 0)


    for i in RbCall.RubyRange(i0, 1, m, true)


    zz += x[i + 1] * y[k - i + 1]


    if k - i == 0


    break


    end


    end


    z[k + 1] = zz


    end


    z


    end


    Activating project at `~/src/github.com/mrkn/ruby-julia/RbCall.jl`


    convolve_jl; N=10000, T=Any[]


    Range (min … max): 38.072 ms … 39.816 ms


    Time (median): 38.226 ms


    Time (mean ± σ): 38.265 ms ± 237.234 μs


    ▁▂▂▇▇█▄▂▆▁ ▁


    ▆██████████▆▁█▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁


    38.072 ms Histogram: log(frequency) by time 39.816 ms<


    convolve_jl; N=10000, T=Float64[]


    Range (min … max): 34.383 ms … 34.723 ms


    Time (median): 34.497 ms


    Time (mean ± σ): 34.513 ms ± 92.663 μs


    ▂█▂ ▂ ▂ ▂ ▂


    ▅▁█▁█▁█████▁█▁▅▁█▅▁▁▅▁▁▅▅█▅▅▅▅▁▅▁█▁█▁▁█▅▅▅▅▁▅█▁▁▁▅▁▁▁▅▁▁▁▁▅ ▁


    34.383 ms Histogram: frequency by time 34.723 ms<


    convolve_rb; N=1000


    Range (min … max): 49.622 ms … 51.365 ms


    Time (median): 49.766 ms


    Time (mean ± σ): 49.884 ms ± 356.988 μs


    █▆█▃ ▁


    ▄▄▇████▄▄█▄▄▇▇▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁


    49.622 ms Histogram: frequency by time 51.365 ms<


    convolve_jl; N=1000, T=Any[]


    Range (min … max): 749.000 μs … 42.637 ms


    Time (median): 772.000 μs


    Time (mean ± σ): 817.622 μs ± 1.168 ms


    █ ▃


    ▃▃▃▅▆█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █


    749μs Histogram: frequency by time 822μs<


    convolve_jl; N=1000, T=Float64[]


    Range (min … max): 360.000 μs … 2.154 ms


    Time (median): 385.000 μs


    Time (mean ± σ): 386.711 μs ± 35.166 μs





    ▄▁▆▇▇▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █


    360μs Histogram: frequency by time 416μs<

    View Slide

  44. Dot-product
    • Calculating the dot product of
    two arrays

    • The complexity is O(n)
    Experiment: bad case
    def dot_product(x, y)


    n = x.length


    JLTrans.assert { n == y.length }


    z = JLTrans.zero(x[0])


    (0 ... n).each do |i|


    z += x[i] * y[i]


    end


    z


    end
    AAACA3icbVDLSsNAFJ34rPUVdaebwSK4KokUdVModOOygn1AG8NkOmmHzkzCzESMIeDGX3HjQhG3/oQ7/8bpY6GtBy4czrmXe+8JYkaVdpxva2l5ZXVtvbBR3Nza3tm19/ZbKkokJk0csUh2AqQIo4I0NdWMdGJJEA8YaQej+thv3xGpaCRudBoTj6OBoCHFSBvJtw8fYBX2VML9jFad/DYTObz3KUx96tslp+xMABeJOyMlMEPDt796/QgnnAiNGVKq6zqx9jIkNcWM5MVeokiM8AgNSNdQgThRXjb5IYcnRunDMJKmhIYT9fdEhrhSKQ9MJ0d6qOa9sfif1010eOllVMSJJgJPF4UJgzqC40Bgn0qCNUsNQVhScyvEQyQR1ia2ognBnX95kbTOyu55uXJdKdXqszgK4Agcg1PgggtQA1egAZoAg0fwDF7Bm/VkvVjv1se0dcmazRyAP7A+fwDn/JcT
    z =
    n
    X
    i=0
    xiyi

    View Slide

  45. function dot_product(x, y)


    n = length(x)


    @assert n == length(y)


    z = zero(x[1])


    for i in RbCall.RubyRange(0, 1, n, true)


    z += x[i + 1] * y[i + 1]


    end


    z


    end


    def dot_product(x, y)


    n = x.length


    JLTrans.assert { n == y.length }


    z = JLTrans.zero(x[0])


    (0 ... n).each do |i|


    z += x[i] * y[i]


    end


    z


    end

    View Slide

  46. function dot_product(x, y)


    n = length(x)


    @assert n == length(y)


    z = zero(x[1])


    for i in RbCall.RubyRange(0, 1, n, true)


    z += x[i + 1] * y[i + 1]


    end


    z


    end


    Activating project at `~/src/github.com/mrkn/ruby-julia/RbCall.jl`


    {:jl=>47.43176456562391, :rb=>47.43176456562391}


    dot_product_rb; N=10000


    Range (min … max): 458.000 μs … 583.000 μs


    Time (median): 465.000 μs


    Time (mean ± σ): 468.608 μs ± 9.411 μs


    █ ▄


    ▂▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █


    458μs Histogram: frequency by time 512μs<


    dot_product_jl; N=10000


    Range (min … max): 3.636 ms … 11.786 ms


    Time (median): 3.704 ms


    Time (mean ± σ): 3.759 ms ± 428.098 μs


    ▃█


    ▄██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █


    3.636 ms Histogram: log(frequency) by time 4.386 ms<


    dot_product_jl; N=10000, T=Float64


    Range (min … max): 9.000 μs … 2.608 ms


    Time (median): 10.000 μs


    Time (mean ± σ): 10.651 μs ± 7.985 μs





    ▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ █


    9μs Histogram: frequency by time 19μs<

    View Slide

  47. Summary of Experiments
    • Algorithms that have large time complexity get faster by transpiling to Julia

    • Julia can generate optimized native code for function arguments

    • Array conversion is heavy overhead

    • Algorithms with O(n) time complexity get slower when they take array
    arguments

    View Slide

  48. Conclusion

    View Slide

  49. Conclusion
    • We can make a pure-Ruby method super fast …

    • by Ruby-to-Julia transpiler so that the optimized native code is generated

    • when we can ignore Ruby’s dynamic features

    • This JIT compiler requires Julia …

    • but we can utilize Julia’s library features in Ruby

    • If you are interested in this method, please ask red-data-tools project to join
    the development

    View Slide

  50. Future plan

    View Slide

  51. Future plan
    • Coverage improvement of CRuby AST patterns

    • Non-copy wrapper of Numo::NArray in Julia

    • Stability improvement around of GC

    • Support of USE_RVARGC in Ruby 3.2

    • CRuby byte code to Julia conversion

    • Adding trampoline support of AARCH64 in Julia and LLVM

    View Slide