How I made a pure-Ruby word2vec program more than 3x faster

remore
December 02, 2016

Slides for my talk at RubyConf Taiwan 2016
https://2016.rubyconf.tw/#Kei Sawada

Transcript

  1. How I made a pure-Ruby word2vec
    program more than 3x faster
    RubyConf Taiwan 2016
    @remore

  2. Who Am I
    Kei Sawada
    @remore
    A rubyist from Tokyo
    A weekend contrabassist
    Engineering Manager at Recruit
    Holdings Co.,Ltd., VP of
    Engineering at NIJIBOX Co.,Ltd.

  3. Me And Taiwan
    A Taiwanese coworker who is working
    at NIJIBOX
    Many #rubyfriends in Taiwan
    Eddie, Ryudo, Chao, Yu-Cheng, lulalala,
    Lin Yu Hsiang and many others
    Super glad to be here today!

  4. This Talk Is Mainly For Rubyists
    Who are interested in Ruby's performance:
    micro-benchmarking results,
    YARV, ISeq and profiling tools⏱
    Who may be interested in RPC (IPC) with
    Python and Julia from Ruby

  5. Table Of Contents
    A Reality
    3x Challenge
    Why Slow

  6. Chapter 1
    A Reality
    A reality of Ruby’s performance
    for large-scale computation

  7. "x=2.5; 1.upto(N){|i| x=x+i}; p x"
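The one-liner above can also be timed from inside Ruby. The following is a minimal sketch, not the measurement method used in the talk: the method name `sample_loop` and the list of loop counts are chosen here for illustration, and `Benchmark.realtime` measures only the loop, whereas `time ruby` on the following slides also includes interpreter start-up.

```ruby
require "benchmark"

# Same loop as on the slide, wrapped in a method so N can vary.
# (sample_loop is a name chosen for this sketch.)
def sample_loop(n)
  x = 2.5
  1.upto(n) { |i| x = x + i }
  x
end

[10, 100, 1_000, 10_000, 100_000, 1_000_000].each do |n|
  result = nil
  # Wall-clock seconds spent in the loop only (no interpreter boot time).
  elapsed = Benchmark.realtime { result = sample_loop(n) }
  printf("N=%-9d result=%-16s %.4f sec\n", n, result, elapsed)
end
```

Because start-up time is excluded, the small-N rows will look much faster than the roughly 0.1 sec shown by `time ruby`.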

  8. > echo "x=2.5; 1.upto(10){|i| x=x+i}; p x" | time ruby
    57.5
    0.13 real 0.06 user 0.05 sys
    [Chart: elapsed time, 0 to 0.2 sec, vs. number of loops, 10; series: Ruby]

  9. > echo "x=2.5; 1.upto(100){|i| x=x+i}; p x" | time ruby
    5052.5
    0.11 real 0.06 user 0.04 sys
    [Chart: elapsed time, 0.1 to 0.13 sec, vs. number of loops, 10 to 100; series: Ruby]

  10. > echo "x=2.5; 1.upto(1000){|i| x=x+i}; p x" | time ruby
    500502.5
    0.15 real 0.07 user 0.05 sys
    [Chart: elapsed time, 0 to 0.15 sec, vs. number of loops, 10 to 1000; series: Ruby]

  11. > echo "x=2.5; 1.upto(10000){|i| x=x+i}; p x" | time ruby
    50005002.5
    0.11 real 0.06 user 0.04 sys
    [Chart: elapsed time, 0 to 0.15 sec, vs. number of loops, 10 to 10000; series: Ruby]

  12. > echo "x=2.5; 1.upto(1e5){|i| x=x+i}; p x" | time ruby
    5000050002.5
    0.14 real 0.08 user 0.05 sys
    [Chart: elapsed time, 0 to 0.15 sec, vs. number of loops, 10 to 1e5; series: Ruby]

  13. Looks good enough for
    smaller number of loops!

  14. > echo "x=2.5; 1.upto(1e6){|i| x=x+i}; p x" | time ruby
    500000500002.5
    0.25 real 0.20 user 0.04 sys
    [Chart: elapsed time, 0 to 0.26 sec, vs. number of loops, 10 to 1e6; series: Ruby]

  15. > echo "x=2.5; 1.upto(1e7){|i| x=x+i}; p x" | time ruby
    50000005000002.5
    1.58 real 1.52 user 0.05 sys
    [Chart: elapsed time, 0 to 1.6 sec, vs. number of loops, 10 to 1e7; series: Ruby]

  16. > echo "x=2.5; 1.upto(1e8){|i| x=x+i}; p x" | time ruby
    5.000000050000003e+15
    14.56 real 14.37 user 0.09 sys
    [Chart: elapsed time, 0 to 16 sec, vs. number of loops, 10 to 1e8; series: Ruby]

  17. > echo "x=2.5; 1.upto(1e9){|i| x=x+i}; p x" | time ruby
    5.00000000067109e+17
    157.27 real 150.16 user 1.30 sys
    [Chart: elapsed time, 0 to 160 sec, vs. number of loops, 10 to 1e9; series: Ruby]

  18. Damn, apparently Ruby is Slow
    ⌛ for huge number of loops

  19. How About Python?
    > PY=$(cat << EOS
    "n=2.5
    for i in range(1,int(\$N)+1):
      n=i+n;
    print(n)"
    EOS
    )
    > N=1e3 && eval echo "$PY" | time python
    500502.5
    0.10 real 0.01 user 0.01 sys

  20. > N=1e5 && eval echo "$PY" | time python
    5000050002.5
    0.13 real 0.03 user 0.01 sys
    [Chart: elapsed time, 0 to 0.15 sec, vs. number of loops, 10 to 1e5; series: Ruby, Python]

  21. Both Ruby And Python are good
    enough for smaller number of loops!

  22. > N=1e6 && eval echo "$PY" | time python
    5.00000500002e+11
    0.38 real 0.23 user 0.02 sys
    [Chart: elapsed time, 0 to 0.4 sec, vs. number of loops, 10 to 1e6; series: Ruby, Python]

  23. > N=1e7 && eval echo "$PY" | time python
    5.0000005e+13
    2.66 real 2.35 user 0.17 sys
    [Chart: elapsed time, 0 to 3 sec, vs. number of loops, 10 to 1e7; series: Ruby, Python]

  24. > N=1e8 && eval echo "$PY" | time python
    5.00000005e+15
    48.27 real 25.87 user 10.67 sys
    [Chart: elapsed time, 0 to 50 sec, vs. number of loops, 10 to 1e8; series: Ruby, Python]

  25. > N=1e9 && eval echo "$PY" | time python
    5.00000005e+15
    48.27 real 25.87 user 10.67 sys
    [Chart: elapsed time, 0 to 600 sec, vs. number of loops, 10 to 1e9; series: Ruby, Python]

  26. > N=1e9 && eval echo "$PY" | time python
    5.00000005e+15
    48.27 real 25.87 user 10.67 sys
    [Chart: elapsed time, 0 to 600 sec, vs. number of loops, 10 to 1e9; series: Ruby, Python]
    Attention Please
    BTW, note that this micro-benchmark
    was run on my MacBook Pro (2015)
    with Ruby 2.3.0 and Python 2.7.
    In my environment Python looks pretty
    slow, but this is by no means a fair comparison.
    Please do not take these measurements
    seriously; just use them to get
    a feel for the order of magnitude of each
    programming environment's speed!

  27. Sadly BOTH Ruby and Python are
    Slow⌛ for huge number of loops

  28. What About C?
    > SRC=$(cat << EOS
    "#include \"stdio.h\"
    int main(){
      double n=2.5;
      for(int i=1;i<=\$N;i++){
        n=i+n;
      }
      printf(\"%lf\", n);
    }"
    EOS
    )

  29. > N=1e5 && eval echo "$SRC" > main.c; gcc main.c; time ./a.out
    5000050002.500000
    real 0m0.006s
    user 0m0.001s
    sys 0m0.002s
    [Chart: elapsed time, 0 to 0.16 sec, vs. number of loops, 10 to 1e5; series: Ruby, Python, C]

  30. C is …… Fast!

  31. [Chart: elapsed time, 0 to 0.4 sec, vs. number of loops, 10 to 1e6; series: Ruby, Python, C]
    > N=1e6 && eval echo "$SRC" > main.c; gcc main.c; time ./a.out
    500000500002.500000
    real 0m0.009s
    user 0m0.004s
    sys 0m0.002s

  32. [Chart: elapsed time, 0 to 3 sec, vs. number of loops, 10 to 1e7; series: Ruby, Python, C]
    > N=1e7 && eval echo "$SRC" > main.c; gcc main.c; time ./a.out
    50000005000002.500000
    real 0m0.033s
    user 0m0.029s
    sys 0m0.002s

  33. [Chart: elapsed time, 0 to 50 sec, vs. number of loops, 10 to 1e8; series: Ruby, Python, C]
    > N=1e8 && eval echo "$SRC" > main.c; gcc main.c; time ./a.out
    5000000050000003.000000
    real 0m0.287s
    user 0m0.281s
    sys 0m0.003s

  34. [Chart: elapsed time, 0 to 600 sec, vs. number of loops, 10 to 1e9; series: Ruby, Python, C]
    > N=1e9 && eval echo "$SRC" > main.c; gcc main.c; time ./a.out
    500000000067108992.000000
    real 0m2.815s
    user 0m2.799s
    sys 0m0.008s

  35. C is ridiculously Fast

  36. Introducing Julia
    Julia is
    A dynamic programming language
    4 years old (open sourced in 2012)
    Designed for scientific computing
    Fast

  37. How About Julia?
    > JL=$(cat << EOS
    "function sample_loop(n)
      for i in 1:\$N
        n = i+n
      end
      n
    end
    println(sample_loop(2.5))"
    EOS
    )

  38. > N=1e5 && eval echo "$JL" | time julia
    sample_loop (generic function with 1 method)
    5.0000500025e9
    0.91 real 0.48 user 0.14 sys
    [Chart: elapsed time, 0 to 0.5 sec, vs. number of loops, 10 to 1e5; series: Ruby, Python, C, Julia]

  39. Julia is the slowest(⁉) for
    smaller number of loops

  40. However

  41. [Chart: elapsed time, 0 to 0.5 sec, vs. number of loops, 10 to 1e6; series: Ruby, Python, C, Julia]
    > N=1e6 && eval echo "$JL" | time julia
    sample_loop (generic function with 1 method)
    5.000005000025e11
    0.45 real 0.44 user 0.08 sys

  42. [Chart: elapsed time, 0 to 3 sec, vs. number of loops, 10 to 1e7; series: Ruby, Python, C, Julia]
    > N=1e7 && eval echo "$JL" | time julia
    sample_loop (generic function with 1 method)
    5.00000050000025e13
    0.50 real 0.47 user 0.09 sys

  43. [Chart: elapsed time, 0 to 50 sec, vs. number of loops, 10 to 1e8; series: Ruby, Python, C, Julia]
    > N=1e8 && eval echo "$JL" | time julia
    sample_loop (generic function with 1 method)
    5.000000050000003e15
    1.82 real 0.76 user 0.09 sys

  44. [Chart: elapsed time, 0 to 600 sec, vs. number of loops, 10 to 1e9; series: Ruby, Python, C, Julia]
    > N=1e9 && eval echo "$JL" | time julia
    sample_loop (generic function with 1 method)
    5.00000000067109e17
    1.71 real 1.70 user 0.08 sys

  45. Julia is as fast as C for
    bigger number of loops

  46. Findings
    Ruby works reasonably fast for a small
    number of loops, but for a huge number of loops
    it is advisable to consider switching languages
    The primary option would be C
    Julia is also a dynamic language, but it can be
    FAST

  47. Chapter 2
    3x Challenge
    An experiment to work around this
    performance issue using the Julia programming
    language along with a ruby2julia transpiler

  48. Given That Performance Issue,
    Which Option Is The Best Workaround?
    Give up and use other languages anyway?
    Make Ruby itself faster?
    Make a gem to boost my ruby program?

  49. Idea: Transpiler
    What if we can run arbitrary ruby
    code on a Julia process? It may
    look something like
    `some_ruby_code.to_other_lang`

  50. Is it really possible to convert
    Ruby code to Julia code?

  51. Sometimes it's hopeful
    # ruby
    for i in 1..N
      n = i+n
    end
    # julia
    for i in 1:N
      n = i+n
    end
    Ruby's `..` and Julia's `:` are both range operators that create a range object
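For this one construct the conversion is almost mechanical. The sketch below is a toy illustration only and is not how Julializer is implemented: the method name `toy_ruby2julia` and the regular expression are assumptions made up for this example, and it handles nothing beyond a simple `for ... in a..b` loop header.

```ruby
# Toy illustration only: NOT Julializer's implementation.
# Rewrites the header of a simple Ruby `for i in a..b` loop into Julia's `for i in a:b`.
FOR_RANGE = /\bfor\s+(\w+)\s+in\s+(\w+)\.\.(\w+)/

def toy_ruby2julia(src)
  src.gsub(FOR_RANGE) { "for #{$1} in #{$2}:#{$3}" }
end

ruby_src = <<~RUBY
  for i in 1..N
    n = i+n
  end
RUBY

puts toy_ruby2julia(ruby_src)
```

The loop body and `end` pass through untouched because the two languages happen to agree there. Anything involving classes or reflection has no such one-to-one mapping.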

  52. But Sometimes it's NOT
    # ruby
    class Sample
      def context
        return self
      end
    end
    p Sample.new.instance_eval{context}
    # julia
    # too many gaps to be filled, such as OOP things, methods for reflection, etc.

  53. I tried it anyway, although
    I wasn't sure it was promising

  54. A ruby2julia Transpiler
    Implementation: Julializer
    github.com/remore/julializer
    Very limited syntax is supported as of v0.1.2
    TrueClass, FalseClass, Fixnum, Float, Integer, Numeric, Random
    Array, Range and Hash are also partially supported (only a very few
    methods as of now)
    TBH it still needs huge improvements, including developing an error
    checking tool and writing documentation
    "But I'm going to do it anyway"

  55. Examples
    $ echo "-1.6.to_i" | julializer
    trunc(Int64,parse(string((-1.6))));
    $ cat sample.rb
    for i in 0..list.size-1
      list[i] = (i-list.size/2).abs
    end
    $ julializer sample.rb
    for i::Int64 = 0:size(list)[1]-1;list[i+1]=abs((i-size(list)[1]/2));;end;;

  56. $ ruby -r julializer -e "p Julializer.ruby2julia(File.read('calc.rb'))"
    "const max_exp=6;;const exp_table_size=1000;;const max_sentence_length=1000;;function init_unigram_table(table_size,
    vocab);train_words_pow=0.0;;power=0.75;;table=fill(0, table_size);;for a::Int64 = 0:size(vocab)[1]-1;train_words_pow+=vocab[a+1]
    [0+1]^power;;end;;i=0;;d1=(vocab[i+1][0+1]^power)/train_words_pow;;for a::Int64 = 0:table_size-1;table[a+1]=i;if a/float(table_size)>d1;i+=1;;d1+=(vocab[i+1]
    [0+1]^power)/train_words_pow;;;end;if i>=size(vocab)[1];i=size(vocab)[1]-1;;end;;end;;return table;;;end;;function addop(size, list, base, target);for i::Int64 =
    0:size-1;list[i+base+1]+=target[i+1];;end;;list;;end;;function addop2(size, list, base, coefficient, target, base2);for i::Int64 = 0:size-1;list[i+base
    +1]+=coefficient*target[i+base2+1];;end;;list;;end;;function addop3(size, f, coefficient, target, base);for i::Int64 = 0:size-1;f+=coefficient[i+1]*target[i+base
    +1];;end;;f;;end;;function addop4(size, list, target, base);for i::Int64 = 0:size-1;list[i+1]+=target[i+base+1];;end;;list;;end;;myrandom=0;;function
    next_random();global myrandom;myrandom=abs((myrandom*25214903917+11));;return myrandom;;;end;;function exptable(num);num=exp((num/
    float(exp_table_size)*2-1)*max_exp);;num/(num+1);;end;;function bsearch_index(list, target);a=0;;z=size(list)[1]-1;;while (true);current_entry=list[a+1:z+1]
    [floor(Int64,((z-a)/2))+1];if current_entry=target)||z-a<=1;return round(Int64,(a+(z-
    a)/2+1));;;else;a=round(Int64,(a+(z-a)/2));;;end;;;;else;if a>=target||z-a<=1;return a;;end;;z=round(Int64,(z-(z-a)/2));;;end;;;end;;;end;;function calc_vec(iter,
    original_text, sample, train_words, debug_mode, __vocab_index_hash, vocab, syn0, syn1neg, negative, alpha, __cum_table, table_size, layer1_size,
    window);sentence_position=0;;sentence_length=0;;word_count=0;;word_count_actual=0;;last_word_count=0;;sen=[];;local_iter=iter;;neu1=[];;neu1e=[];;backup=copy(
    original_text);;__denominator=trunc(Int64,parse(string((exp_table_size/max_exp/
    2))));;__sample_train_words=sample*train_words;;table_size=trunc(Int64,parse(string(1e8)));;table=init_unigram_table(table_size,vocab);;starting_alpha=alpha;;
    while true;if sentence_position%500==0&&debug_mode>1;print(@sprintf(\"%d %d / \",word_count,last_word_count));;end;if word_count-
    last_word_count>10000;word_count_actual+=word_count-last_word_count;;last_word_count=word_count;;if debug_mode>1;print(string(\"\\r Alpha:
    \",@sprintf(\"%f\",alpha),\" Progress: \",@sprintf(\"%.2f\",(word_count_actual/float((iter*train_words+1))*100)),\"%\"));;end;;alpha=starting_alpha*(1-
    word_count_actual/float((iter*train_words+1)));;if alphasentence_length==0;skipped=0;;sen=[];;___state = start(original_text);while !done(original_text, ___state);___i, ___state = next(original_text, ___state);e =
    ___i;if haskey(__vocab_index_hash, e);word=__vocab_index_hash[string(e)];;;else;skipped+=1;;continue;;;end;;;word_count+=1;;if word==0;break;;end;;if
    sample>0;ran=(sqrt(vocab[word+1][0+1]/__sample_train_words)+1)*__sample_train_words/vocab[word+1][0+1];;if ran<(next_random()&(0xFFFF+0))/
    65536.0;continue;;end;;;end;;push!(sen, word);sentence_length+=1;;if sentence_length>=max_sentence_length;break;;end;;;end;;if max_sentence_length
    +skipped<=length(original_text)-1;splice!(original_text, 0+1:0+0+max_sentence_length+skipped+1);;else;original_text=[];;;end;;;sentence_position=0;;;end;if
    size(original_text)[1]==0||word_count>train_words;word_count_actual+=word_count-last_word_count;;local_iter-=1;;if debug_mode>1;print(local_iter);;end;;if
    local_iter==0;break;;end;;word_count=0;;last_word_count=0;;sentence_length=0;;original_text=copy(backup);;sen=[];;continue;;;end;if sentence_position>=size(sen)
    [1];continue;;end;word=sen[sentence_position+1];neu1=fill(0.0, layer1_size);neu1e=fill(0.0, layer1_size);b=next_random()%window;cw=0;for j::Int64 = b:window*2-
    b;if j!=window;k=sentence_position-window+j;;if k<0||k>=sentence_length;continue;;end;;if k>=size(sen)[1];continue;;end;;last_word=sen[k
    +1];;neu1=addop4(layer1_size,neu1,syn0,last_word*layer1_size);;cw+=1;;;end;;end;if cw!=0;for j::Int64 = 0:layer1_size-1;neu1[j+1]/=cw;;end;;if negative>0;for
    j::Int64 = 0:negative;if j==0;target=word;;label=1;;;else;nr=next_random();;target=table[(nr>>16)%table_size+1];;if target==0;target=nr%(size(vocab)
    [1]-1)+1;;end;;if target==word;continue;;end;;label=0;;;end;;l2=target*layer1_size;f=0.0;f=addop3(layer1_size,f,neu1,syn1neg,l2);if
    f>max_exp;g=(label-1)*alpha;;;elseif f<(-max_exp);g=label*alpha;;;else;g=(label-exptable(trunc(Int64,parse(string(((f
    +max_exp)*__denominator))))))*alpha;;;end;;;neu1e=addop2(layer1_size,neu1e,0,g,syn1neg,l2);syn1neg=addop2(layer1_size,syn1neg,l2,g,neu1,0);;end;;;end;;for
    j::Int64 = b:window*2-b;if j!=window;c=sentence_position-window+j;;if c<0||c>=sentence_length;continue;;end;;if c>=size(sen)[1];continue;;end;;last_word=sen[c
    +1];;syn0=addop(layer1_size,syn0,last_word*layer1_size,neu1e);;;end;;end;;;end;sentence_position+=1;if
    sentence_position>=sentence_length;sentence_length=0;;end;;end;;[syn0,syn1neg];;end;;"
    You can convert word2vec.rb

  57. Next Problem: How To Run
    a Julia Program from Ruby
    For example, run the external program like
    this?
    Process.spawn("echo 'p 123' |
    julializer | julia", :out=>"STDOUT")
    Obviously not a good solution (you need to
    marshal data manually, and the Julia language VM
    must be booted up at every single function call)
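The cost of that approach can be felt with a small stand-in experiment. This is a sketch under stated assumptions: the child process is another Ruby interpreter rather than `julializer | julia` (so it runs without Julia installed), data is marshaled by hand as JSON, and the names `CHILD_SCRIPT` and `sum_in_child` are hypothetical.

```ruby
require "json"
require "open3"
require "rbconfig"

# Stand-in for the external program: a one-shot Ruby child that sums a JSON array.
# (Assumption for this sketch; the slide pipes into julializer and julia instead.)
CHILD_SCRIPT = 'require "json"; puts JSON.generate("sum" => JSON.parse(STDIN.read).sum)'

# Every call boots a fresh interpreter and marshals arguments and result manually.
def sum_in_child(numbers)
  out, status = Open3.capture2(RbConfig.ruby, "-e", CHILD_SCRIPT,
                               stdin_data: JSON.generate(numbers))
  raise "child process failed" unless status.success?
  JSON.parse(out)["sum"]
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
3.times { |i| p sum_in_child([i, 2.5]) }
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
printf("3 calls, 3 process boots: %.3f sec\n", elapsed)
```

Each call pays the full start-up price of the child runtime, which is the overhead the slide objects to.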

  58. Idea: IPC With Julia
    What if we could pass arbitrary Ruby
    values to a running Julia background
    process through a Module, via IPC?

  59. Introducing virtual_module
    github.com/remore/virtual_module
    An IPC module generator
    Julia and Python are supported as
    a background process
    Marshaling with msgpack
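The general shape of such an IPC module can be sketched with the standard library alone. This is not virtual_module's code: the worker here is a Ruby child process instead of Julia or Python, JSON lines stand in for msgpack, and `WORKER_SCRIPT` and `ToyBackgroundProcess` are hypothetical names. The point is only that the background process is booted once and then reused for every call.

```ruby
require "json"
require "rbconfig"

# Hypothetical worker: reads one JSON request per line, answers with one JSON line.
WORKER_SCRIPT = <<~'WORKER'
  require "json"
  STDOUT.sync = true
  STDIN.each_line do |line|
    req = JSON.parse(line)
    result = req["fn"] == "sum" ? req["args"].sum : nil
    puts JSON.generate("result" => result)
  end
WORKER

class ToyBackgroundProcess
  def initialize
    # Boot the background process once; it idles until a request arrives.
    @io = IO.popen([RbConfig.ruby, "-e", WORKER_SCRIPT], "r+")
    @io.sync = true
  end

  # Marshal the call, send it over the pipe, and unmarshal the reply.
  def call(fn, *args)
    @io.puts(JSON.generate("fn" => fn, "args" => args))
    JSON.parse(@io.gets)["result"]
  end

  def close
    @io.close
  end
end

worker = ToyBackgroundProcess.new
p worker.call("sum", 1, 2, 3)
p worker.call("sum", 2.5, 4)
worker.close
```

Only the first call pays for process start-up; later calls cost one round trip over the pipe plus serialization.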

  60. Sample Usage(1):
    Calling Julia from Ruby
    jl = VirtualModule.new(:julia=>["Clustering"])
    include jl
    r = Clustering.kmeans(jl.rand(5, 1000), 20, maxiter:200, display: :iter)
    p Clustering.assignments(r) # [3, 13, 2, 7, 15, 12, 10, ... 13, 1, 8]

  61. jl = VirtualModule.new(:julia=>["Clustering"])
    include jl
    r = Clustering.kmeans(jl.rand(5, 1000), 20, maxiter:200, display: :iter)
    p Clustering.assignments(r) # [3, 13, 2, 7, 15, 12, 10, ... 13, 1, 8]
    Sample Usage(1):
    Calling Julia from Ruby
    By calling the VirtualModule.new method, a Julia
    background process is booted up and starts to
    idle

  62. jl = VirtualModule.new(:julia=>["Clustering"])
    include jl
    r = Clustering.kmeans(jl.rand(5, 1000), 20, maxiter:200, display: :iter)
    p Clustering.assignments(r) # [3, 13, 2, 7, 15, 12, 10, ... 13, 1, 8]
    Sample Usage(1):
    Calling Julia from Ruby
    The VirtualModule.new method gives you back an
    instance of the Module class

  63. jl = VirtualModule.new(:julia=>["Clustering"])
    include jl
    r = Clustering.kmeans(jl.rand(5, 1000), 20, maxiter:200, display: :iter)
    p Clustering.assignments(r) # [3, 13, 2, 7, 15, 12, 10, ... 13, 1, 8]
    Sample Usage(1):
    Calling Julia from Ruby
    Since it’s an instance of Module class,
    you can #include it

  64. jl = VirtualModule.new(:julia=>["Clustering"])
    include jl
    r = Clustering.kmeans(jl.rand(5, 1000), 20, maxiter:200, display: :iter)
    p Clustering.assignments(r) # [3, 13, 2, 7, 15, 12, 10, ... 13, 1, 8]
    Sample Usage(1):
    Calling Julia from Ruby
    Now is the time to call an arbitrary function in Julia.
    Every parameter passed from Ruby's world is
    converted to a Julia value via msgpack

  65. jl = VirtualModule.new(:julia=>["Clustering"])
    include jl
    r = Clustering.kmeans(jl.rand(5, 1000), 20, maxiter:200, display: :iter)
    p Clustering.assignments(r) # [3, 13, 2, 7, 15, 12, 10, ... 13, 1, 8]
    Sample Usage(1):
    Calling Julia from Ruby
    msgpack converts only basic data types (such as Integer, String,
    Array, etc.). In this case, since the kmeans function returns a value of
    type `Clustering.KmeansResult{Float64}`, r is still an instance of the
    Module class, which keeps a pointer to the
    `Clustering.KmeansResult{Float64}` value held in the background process.
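The "basic values travel, everything else stays behind a pointer" idea can be modelled in a few lines. This is an in-process toy, not virtual_module's mechanism: `ToyBackend`, `Handle`, `BASIC_TYPES` and `KmeansLikeResult` are all hypothetical names, and no real serialization or second process is involved.

```ruby
# Toy in-process model of the idea; all names here are hypothetical.
class ToyBackend
  Handle = Struct.new(:id)
  BASIC_TYPES = [Integer, Float, String, Symbol, Array, Hash,
                 TrueClass, FalseClass, NilClass].freeze

  def initialize
    @objects = {} # id => value that stays on the "background" side
  end

  # Basic values are returned as-is; anything else is stored and replaced by a handle.
  def export(value)
    return value if BASIC_TYPES.any? { |type| value.is_a?(type) }
    id = @objects.size + 1
    @objects[id] = value
    Handle.new(id)
  end

  # Call a method on a stored object and export the result by the same rule.
  def call(handle, method_name, *args)
    export(@objects.fetch(handle.id).public_send(method_name, *args))
  end
end

# Stand-in for a result type that cannot be serialized directly.
KmeansLikeResult = Struct.new(:assignments)

backend = ToyBackend.new
r = backend.export(KmeansLikeResult.new([3, 13, 2]))
p r.class                       # a handle, not the result itself
p backend.call(r, :assignments) # a plain Array comes back by value
```

In this model `r` plays the role of the value returned by `kmeans`, and `backend.call(r, :assignments)` plays the role of `Clustering.assignments(r)`.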

  66. jl = VirtualModule.new(:julia=>["Clustering"])
    include jl
    r = Clustering.kmeans(jl.rand(5, 1000), 20, maxiter:200, display: :iter)
    p Clustering.assignments(r) # [3, 13, 2, 7, 15, 12, 10, ... 13, 1, 8]
    Sample Usage(1):
    Calling Julia from Ruby
    Since Clustering.assignments function returns basic
    data type which can be converted to Ruby’s Array,
    finally we’ve got the clustering result!

  67. Sample Usage(2):
    Calling Python(sklearn) from Ruby
    skl = VirtualModule.new(
    :lang=>:python, :pkgs=>["sklearn"=>["datasets", "svm", "grid_search", "cross_validation"]]
    )
    include skl
    iris = datasets.load_iris(:_)
    clf = grid_search.GridSearchCV(
    svm.LinearSVC(:_), {'C':[1, 3, 5],'loss':['hinge', 'squared_hinge']}, verbose:0
    )
    clf.fit(iris.data, iris.target)
    p "Best Params: #{best_params = clf.best_params_}" #"Best Params: {\"loss\"=>\"squared_hinge\", \"C\"=>1}"
    score = cross_validation.cross_val_score(
    svm.LinearSVC(loss:'squared_hinge', C:1), iris.data, iris.target, cv:5
    )
    p "Scores: #{[:mean,:min,:max,:std].map{|e| e.to_s + '=' + score.send(e, :_).to_s }.join(',')}" # "Scores:
    mean=0.9666666666666668,min=0.9,max=1.0,std=0.04216370213557838"

  68. Sample Usage(3):
    Defining Custom Methods With Julializer
    vm = VirtualModule.new(methods:<->(s)
    {Julializer.ruby2julia(s)})
    def init_table(list)
    for i in 0..list.size-1
    list[i]+=Random.rand
    end
    list
    end
    EOS
    p vm.init_table([1,20]) # [1.3066601775641218, 20.17001189249985]

  69. More Details Can Be Found At
    Sample Snippets:
    remore/virtual_module/blob/master/example/calc.rb
    remore/virtual_module/blob/master/example/scipy.rb
    remore/virtual_module/blob/master/example/word2vec.rb
    Corresponding Blog:
    rimuru.lunanet.gr.jp/blog/calling-python-and-julia-libraries-from-ruby

  70. $ SRC=$(cat << EOS
    "p VirtualModule.new(methods:<def sample_loop(n)
    for i in 1..\$N
    n = i+n
    end
    n
    end
    METHOD"
    EOS
    )
    Let’s Run The Simple Huge Loop

  71. > N=1e5 && eval echo "$SRC" | time ruby -r virtual_module
    5000050002.5
    2.68 real 1.58 user 0.36 sys
    [Chart: elapsed time, 0 to 1.8 sec, vs. number of loops, 10 to 1e5; series: Ruby, Python, C, Julia, VirtualModule]

  72. Is it …… slow?

  73. [Chart: elapsed time, 0 to 3 sec, vs. number of loops, 10 to 1e6; series: Ruby, Python, C, Julia, VirtualModule]
    > N=1e6 && eval echo "$SRC" | time ruby -r virtual_module
    500000500002.5
    2.20 real 1.46 user 0.25 sys

  74. [Chart: elapsed time, 0 to 3 sec, vs. number of loops, 10 to 1e7; series: Ruby, Python, C, Julia, VirtualModule]
    > N=1e7 && eval echo "$SRC" | time ruby -r virtual_module
    50000005000002.5
    1.68 real 1.51 user 0.21 sys

  75. [Chart: elapsed time, 0 to 50 sec, vs. number of loops, 10 to 1e8; series: Ruby, Python, C, Julia, VirtualModule]
    > N=1e8 && eval echo "$SRC" | time ruby -r virtual_module
    5.000000050000003e+15
    1.95 real 1.75 user 0.21 sys

  76. [Chart: elapsed time, 0 to 600 sec, vs. number of loops, 10 to 1e9; series: Ruby, Python, C, Julia, VirtualModule]
    > N=1e9 && eval echo "$SRC" | time ruby -r virtual_module
    5.00000000067109e+17
    4.50 real 4.29 user 0.21 sys

  77. Yay! virtual_module works
    as fast as C and Julia!

  78. Benchmarking Using Pure-Ruby
    word2vec implementation
    $ cd example
    $ ruby word2vec.rb --output /tmp/vectors.bin --train ../doc/benchmark_word2vec/training_data/10mb.txt --size 20 --window 10 --negative 5 --sample 1e-4 --binary 1 --iter 3 --debug 0 > /dev/null 2>&1
    $ python
    Python 2.7.12 (default, Jul 1 2016, 15:12:24)
    [GCC 5.4.0 20160609] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import gensim
    >>> model = gensim.models.Word2Vec.load_word2vec_format('/tmp/vectors.bin', binary=True)
    >>> model.most_similar("japan")
    [(u'netherlands', 0.9741939902305603), (u'china', 0.9712631702423096), (u'county',
    0.9686408042907715), (u'spaniards', 0.9669440388679504), (u'vienna', 0.9614173769950867),
    (u'abu', 0.9587018489837646), (u'korea', 0.9565504789352417), (u'canberra',
    0.954473614692688), (u'erupts', 0.9540712833404541), (u'prefecture', 0.9534248113632202)]

  79. Benchmarking Program Can Be
    Found At:
    github.com/remore/virtual_module/blob/master/doc/benchmark_word2vec/compare_three_types_of_word2vec_implementation.sh

  80. Benchmarking Result

  81. 2.4x 3x 5.6x 297x
    Looks Almost 3x Faster!
    Done!?

  82. Yes done, in a sense of
    making pure-Ruby word2vec
    program more than 3x faster
    But…

  83. But Still Problematic
    If the size of the text gets
    seriously huge, there is still
    a big gap compared to C…

  84. Chapter 3
    Why Slow
    An ongoing profiling attempt to find out
    which part of the program causes this
    performance issue

  85. Which part of the source
    code runs slowly?

  86. ruby-prof
    $ cat profile.rb
    RubyProf.start
    x=2.5
    1.upto(1e4){|i| x=x+i};
    p x
    result = RubyProf.stop
    RubyProf::FlatPrinter.new(result).print(STDOUT)
    $ ruby -r ruby-prof profile.rb

  87. $ ruby -r ruby-prof profile.rb
    50005002.5
    Measure Mode: wall_time
    Thread ID: 70204763724260
    Fiber ID: 70204768000900
    Total: 0.009597
    Sort by: self_time

     %self   total    self    wait   child   calls  name
     52.19   0.010   0.005   0.000   0.005       1  Integer#upto
     16.92   0.002   0.002   0.000   0.000   10001  Fixnum#>
     15.79   0.002   0.002   0.000   0.000   10000  Fixnum#+
     14.60   0.001   0.001   0.000   0.000   10000  Float#+
      0.22   0.010   0.000   0.000   0.010       1  Global#[No method]
      0.19   0.000   0.000   0.000   0.000       1  Kernel#p
      0.09   0.000   0.000   0.000   0.000       1  Float#inspect

    #upto is the slowest.
    #> is also slow.

  88. What is happening under
    the hood?

  89. InstructionSequence
    $ ruby -e "printf RubyVM::InstructionSequence.compile('x=2.5; 1.upto(1e4){ |i| x = x+i }').disasm"
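The same disassembly can be inspected from a script instead of a shell one-liner. A minimal sketch, assuming CRuby (the `RubyVM` constant is not available on other implementations); the regular expression that pulls instruction names out of the listing is an assumption about the text layout, and the exact listing differs between Ruby versions.

```ruby
# Compile the benchmark one-liner and look at the YARV instructions it produces.
src = "x=2.5; 1.upto(1e4){ |i| x = x+i }"
iseq = RubyVM::InstructionSequence.compile(src)
disasm = iseq.disasm

puts disasm

# Collect the distinct instruction names (lines start with a 4-digit offset).
insns = disasm.scan(/^\d{4} (\w+)/).flatten.uniq
p insns
```

Scanning the names makes it easy to spot specialized instructions such as `opt_plus`, which the block body uses for `x+i`.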

  90. $ ruby -e "printf RubyVM::InstructionSequence.compile('x=2.5; 1.upto(1e4){ |i| x = x+i }').disasm"
    == disasm: #<ISeq:<compiled>@<compiled>>================================
    == catch table
    | catch type: break st: 0006 ed: 0013 sp: 0000 cont: 0013
    |------------------------------------------------------------------------
    local table (size: 2, argc: 0 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
    [ 2] x
    0000 trace 1 ( 1)
    0002 putobject 2.5
    0004 setlocal_OP__WC__0 2
    0006 putobject_OP_INT2FIX_O_1_C_
    0007 putobject 10000.0
    0009 send <callinfo!mid:upto, argc:1>, <callcache>, block in <compiled>
    0013 leave
    == disasm: #<ISeq:block in <compiled>@<compiled>>=======================
    == catch table
    | catch type: redo st: 0002 ed: 0014 sp: 0000 cont: 0002
    | catch type: next st: 0002 ed: 0014 sp: 0000 cont: 0014
    |------------------------------------------------------------------------
    local table (size: 2, argc: 1 [opts: 0, rest: -1, post: 0, block: -1, kw: -1@-1, kwrest: -1])
    [ 2] i
    0000 trace 256 ( 1)
    0002 trace 1
    0004 getlocal_OP__WC__1 2
    0006 getlocal_OP__WC__0 2
    0008 opt_plus <callinfo!mid:+, argc:1, ARGS_SIMPLE>, <callcache>
    0011 dup
    0012 setlocal_OP__WC__1 2
    0014 trace 512
    0016 leave
    will help us understand
    what’s happening internally

  91. But which instruction
    on earth is really slow?

  92. What EXACTLY is
    happening under the hood?

  93. Introducing yarv-prof
    github.com/remore/yarv-prof
    A tiny DTrace-Based YARV profiler
    Instrumented profiling with walltime or cputime
    Only basic datasets are provided so far. Still
    under development

  94. yarv-prof
    require 'yarv-prof'
    YarvProf.start(clock: :cpu, out:'~/log/')
    x=2.5
    1.upto(N){|i|
      x=x+i
    }
    p x
    YarvProf.end

  95. opt_plus is more than 50% slower than other insn

  96. (For more metrics and features, still
    under development, to be continued)

  97. Chapter 4
    Your Turn!
    Why not take on the challenge yourself,
    toward Ruby 3x3?
    Or even "5xRuby"?

  98. Thanks!
    @remore
