Progress of Ruby-Numo: Numerical Computing for Ruby

Masahiro Tanaka 田中昌宏

September 19, 2017

Transcript

  1. Progress of Ruby/Numo: Numerical Computing for Ruby
    Masahiro TANAKA - 田中昌宏
    RubyKaigi 2017 @ Hiroshima
    Nov 19, 2017 RubyKaigi2017@Hiroshima
  2. Masahiro Tanaka
    ▶ Research Fellow at Center for Computational Sciences, University of Tsukuba
    ▶ RubyKaigi Speaker
      ◦ RubyKaigi 2010 in Tsukuba
        • Topic: NArray
        • http://rubykaigi.org/2010/ja/events/83
      ◦ RubyKaigi 2016 in Kyoto
        • Topic: Pwrake
        • http://rubykaigi.org/2016/presentations/masa16tanaka.html
      ◦ RubyKaigi 2017 in Hiroshima
        • Topic: Numo::NArray
  3. Data Science
    ▶ Artificial Intelligence (AI)
    ▶ Machine Learning (ML)
    ▶ Deep Learning (DL)
    ▶ Big Data
  4. Popular Programming Languages for Data Science
    ▶ R
      ◦ Language for Statistical Computing
    ▶ Python
      ◦ Versatile Language + Libraries for Scientific Computing
      ◦ Supported by popular Deep Learning frameworks
  5. SciPy Stack
    [Diagram] Components: Python, NumPy, Cython, SciPy lib, Sympy, Pandas, matplotlib, Scikit-learn, Jupyter (IPython); layers: Scientific Computing, Data Science, Machine Learning
  6. Two Approaches to Data Science with Ruby
    ▶ Call existing frameworks from Ruby
      ◦ PyCall
    ▶ Build libraries in Ruby
      ◦ Data science libraries require scientific libraries
      ◦ Scientific libraries require a NumPy-like library
      ◦ Is there any library like NumPy in Ruby?
  7. History of NArray
    ▶ First design
      ◦ 1999: NArray ver. 0.3.0 (start)
      ◦ 2000: NArray ver. 0.5.0 (re-design)
      ◦ 2016: NArray ver. 0.6.1.2 (current)
    ▶ Second design
      ◦ 2007: NArray ver. 0.7 (start)
      ◦ 2011: NArray ver. 0.9 (re-design)
      ◦ 2016: Numo::NArray ver. 0.9.0.1 (gem release)
      ◦ 2017: Numo::NArray ver. 0.9.0.8 (current)
  8. Ruby/Numo
    ▶ Numo::NArray
      ◦ N-dimensional numerical array
    ▶ Numo::Gnuplot
      ◦ Wrapper to Gnuplot
    ▶ Numo::GSL
      ◦ Wrapper to GNU Scientific Library
    ▶ Numo::Linalg
      ◦ Linear algebra
      ◦ Wrapper to BLAS/LAPACK
    ▶ Numo::FFTW, Numo::FFTE
      ◦ Wrappers to FFT libraries
    https://github.com/ruby-numo
  9. NArray coverage
    ▶ 363 NumPy functions in total
    ▶ 217 covered
    ▶ 91 to-do
    ▶ 55 no plan
      ◦ NumPy-specific functions, financial functions, etc.
    https://github.com/ruby-numo/narray/wiki/Numo-vs-numpy
  10. Data Type = Subclass of Numo::NArray
    ▶ Bit, Boolean
      – Numo::Bit
    ▶ Signed Integer
      – Numo::Int8, Numo::Int16, Numo::Int32, Numo::Int64
    ▶ Unsigned Integer
      – Numo::UInt8, Numo::UInt16, Numo::UInt32, Numo::UInt64
    ▶ Floating point real number
      – Numo::DFloat (Float64), Numo::SFloat (Float32)
    ▶ Floating point complex number
      – Numo::DComplex (Complex128), Numo::SComplex (Complex64)
    ▶ Ruby Object
      – Numo::RObject
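Each numeric subclass stores its elements in a fixed-width buffer. The plain-Ruby sketch below (Numo is not loaded) reproduces the per-element byte widths with `Array#pack`; the pack directive assigned to each class name, and the names `PACK_DIRECTIVE` and `element_bytes`, are assumptions made for illustration, not taken from Numo's source. Numo::Bit and Numo::RObject are left out because neither maps to a single pack directive.

```ruby
# Plain-Ruby sketch (Numo is not loaded): per-element storage width of the
# numeric Numo types, reproduced with Array#pack. The directive chosen for
# each class name is an assumption made for illustration.
PACK_DIRECTIVE = {
  "Numo::Int8"     => "c",  "Numo::UInt8"    => "C",
  "Numo::Int16"    => "s",  "Numo::UInt16"   => "S",
  "Numo::Int32"    => "l",  "Numo::UInt32"   => "L",
  "Numo::Int64"    => "q",  "Numo::UInt64"   => "Q",
  "Numo::SFloat"   => "f",  "Numo::DFloat"   => "d",
  # A complex element is stored as a (real, imaginary) pair of floats.
  "Numo::SComplex" => "f2", "Numo::DComplex" => "d2"
}.freeze

# Number of bytes one element of the named type occupies.
def element_bytes(type_name)
  directive = PACK_DIRECTIVE.fetch(type_name)
  float = directive.start_with?("f", "d")
  count = directive.end_with?("2") ? 2 : 1
  Array.new(count, float ? 0.0 : 0).pack(directive).bytesize
end

PACK_DIRECTIVE.each_key do |name|
  puts format("%-15s %2d bytes", name, element_bytes(name))
end

# A fixed width also means out-of-range values wrap around when stored:
# 200 does not fit in a signed 8-bit integer.
wrapped = [200].pack("c").unpack1("c")
p wrapped   # => -56
```

The numbers in the type names (Int8, Float64, Complex128) are bit widths, so the printed byte counts are those numbers divided by 8.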
  11. Shape = Array of sizes along dimensions
    ▶ shape = [4]: 1-dimensional array (elements a[0] … a[3])
    ▶ shape = [4,4]: 2-dimensional array (elements a[0,0] … a[3,3])
    ▶ shape = [4,4,4]: 3-dimensional array (elements a[0,0,0] … a[3,3,3])
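A shape only fixes how many elements each axis has; the elements themselves live in one flat buffer. The plain-Ruby sketch below shows one common way to map an index such as a[1,2] to a buffer position through per-axis strides. It assumes row-major (C) order, where the last index varies fastest, as in NumPy's default layout; the helper names `strides_for` and `flat_offset` are hypothetical.

```ruby
# Plain-Ruby sketch (Numo is not loaded): how a shape maps an N-dimensional
# index to a position in one flat buffer. Assumes row-major (C) order, i.e.
# the last index varies fastest, as in NumPy's default layout.

# Stride of each axis = number of elements to skip when that index grows by 1.
def strides_for(shape)
  strides = Array.new(shape.size, 1)
  (shape.size - 2).downto(0) do |d|
    strides[d] = strides[d + 1] * shape[d + 1]
  end
  strides
end

def flat_offset(shape, index)
  raise ArgumentError, "index rank must match shape rank" unless shape.size == index.size
  index.each_with_index do |i, d|
    raise IndexError, "index #{i} out of range on axis #{d}" unless (0...shape[d]).cover?(i)
  end
  strides_for(shape).zip(index).sum { |stride, i| stride * i }
end

shape = [4, 4, 4]
p shape.reduce(1, :*)             # => 64 elements in total
p strides_for(shape)              # => [16, 4, 1]
p flat_offset([4, 4], [1, 2])     # => 6, the position of a[1,2]
p flat_offset(shape, [0, 3, 3])   # => 15, the position of a[0,3,3]
```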
  12. NArray Slice (Indexing), on a 5-element array a[0] … a[4]
    ▶ a[1]
      ◦ returns an element
    ▶ a[1..2]
      ◦ returns NArray (a[1], a[2])
    ▶ a[(1..-1).step(2)]
      ◦ returns NArray (a[1], a[3])
    ▶ a[[1,2,4]]
      ◦ returns NArray (a[1], a[2], a[4])
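The four index forms differ only in which positions they pick. The plain-Ruby sketch below resolves each selector from the slide into positions of a 1-D array, treating negative numbers as counting from the end the way Ruby's `Array#[]` does. It is not Numo's implementation; the helpers `normalize`, `range_positions` and `positions` are hypothetical, and the code assumes Ruby 2.6 or later, where `Range#step` returns an `Enumerator::ArithmeticSequence`.

```ruby
# Plain-Ruby sketch, not Numo's implementation: resolve the index forms from
# the slide into element positions of a 1-D array of length n.
# Negative numbers count from the end, as with Ruby's Array#[].
# Requires Ruby >= 2.6 for Enumerator::ArithmeticSequence.

def normalize(i, n)
  i.negative? ? i + n : i
end

def range_positions(first, last, exclusive, step, n)
  first = normalize(first || 0, n)
  last = last.nil? ? n - 1 : normalize(last, n) - (exclusive ? 1 : 0)
  (first..last).step(step).to_a
end

def positions(selector, n)
  case selector
  when Integer then [normalize(selector, n)]
  when Enumerator::ArithmeticSequence   # e.g. (1..-1).step(2)
    range_positions(selector.begin, selector.end, selector.exclude_end?, selector.step, n)
  when Range                            # e.g. 1..2
    range_positions(selector.begin, selector.end, selector.exclude_end?, 1, n)
  when Array then selector.map { |i| normalize(i, n) }  # e.g. [1,2,4]
  else raise ArgumentError, "unsupported selector: #{selector.inspect}"
  end
end

data = [10, 20, 30, 40, 50]
[1, 1..2, (1..-1).step(2), [1, 2, 4]].each do |sel|
  idx = positions(sel, data.size)
  puts "#{sel.inspect}: positions #{idx.inspect}, values #{idx.map { |i| data[i] }.inspect}"
end
```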
  13. Notation and Speed of Element-wise Operation
    ▶ Ruby Array
      a = [1,2,3,4,5]
      b = [10,20,30,40,50]
      p c = a.zip(b).map{|x,y| x + y}
      #=> [11, 22, 33, 44, 55]
      require "benchmark"
      a = (1..10000).to_a
      b = (10..100000).step(10).to_a
      Benchmark.bm do |r|
        r.report do
          1000.times{ c = a.zip(b).map{|x,y| x+y} }
        end
      end
      #     user     system      total        real
      # 1.910000   0.010000   1.920000 (  1.912284)
    ▶ Numo::NArray: simple notation
      require "numo/narray"
      a = Numo::NArray[1,2,3,4,5]
      b = Numo::NArray[10,20,30,40,50]
      p c = a + b
      #=> Numo::Int32#shape=[5]
      #   [11, 22, 33, 44, 55]
      require "benchmark"
      a = Numo::Int32.new(10000).seq(1)
      b = Numo::Int32.new(10000).seq(10,10)
      Benchmark.bm do |r|
        r.report do
          100000.times{ c = a + b }
        end
      end
      #     user     system      total        real
      # 0.750000   0.080000   0.830000 (  0.830775)
    ▶ About 230 times faster per iteration (1,000 iterations take 1.91 s with Array; 100,000 iterations take 0.83 s with Numo::NArray)
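The "230 times faster" figure compares time per iteration, since the two benchmarks run different iteration counts (1,000 for the Array version, 100,000 for Numo::NArray). Using the real times reported on the slide:

```latex
\frac{1.912284\,\mathrm{s} / 1000}{0.830775\,\mathrm{s} / 100000}
  = \frac{1.912 \times 10^{-3}\,\mathrm{s}}{8.308 \times 10^{-6}\,\mathrm{s}}
  \approx 230
```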
  14. NArray methods (Excerpt from Numo::DFloat)
    ▶ Arithmetic
      ◦ + - * / % ** -@ abs divmod reciprocal poly sign square
    ▶ Statistics
      ◦ clip cumprod cumsum diff kahan_sum kron max max_index mean median min min_index minmax mulsum ptp prod rms sum stddev var sort sort_index
    ▶ Random Number (Mersenne Twister)
      ◦ rand rand_norm
    ▶ Comparison
      ◦ eq ge gt le lt ne nearly_eq
    ▶ Numo::NMath module functions
      ◦ acos acosh asin asinh atan atan2 atanh cbrt cos cosh erf erfc exp exp10 exp2 expm1 frexp hypot ldexp log log10 log1p log2 sin sinc sinh sqrt tan tanh
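`kahan_sum` appears among the statistics methods. The textbook Kahan (compensated) summation it is named after carries a correction term that recovers the low-order bits lost in each floating-point addition. Below is a plain-Ruby sketch of that textbook algorithm; we have not checked that Numo's C implementation follows it step for step, and the names `kahan_sum` and `naive_sum` here are plain Ruby methods, not Numo calls. Ruby's own `Array#sum` already applies compensated (Kahan-Babuska) summation to Floats, so the naive baseline uses `inject` instead.

```ruby
# Textbook Kahan summation in plain Ruby (a sketch, not Numo's kahan_sum).
def kahan_sum(values)
  sum = 0.0
  c = 0.0                # running compensation for lost low-order bits
  values.each do |x|
    y = x - c            # apply the correction from the previous step
    t = sum + y          # low-order bits of y may be lost here...
    c = (t - sum) - y    # ...and are recovered algebraically
    sum = t
  end
  sum
end

# Naive left-to-right addition. Array#sum is avoided on purpose because it
# already compensates for rounding error on Float arrays.
def naive_sum(values)
  values.inject(0.0) { |acc, x| acc + x }
end

values = [1.0] + [1.0e-16] * 10   # exact sum is 1 + 1e-15
p naive_sum(values)               # => 1.0 (each 1e-16 is rounded away)
p kahan_sum(values)               # close to 1.000000000000001
```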
  15. Important features of Numo::NArray (and NumPy)
    1. Slice View
    2. Broadcasting
    3. Masking
  16. 1. Create View on Slice
      a = Numo::DFloat[1..5]
      => Numo::DFloat#shape=[5]
      [1, 2, 3, 4, 5]
      b = a[2..3]
      => Numo::DFloat(view)#shape=[2]
      [3, 4]
      b[0..1] = 0
      a
      => Numo::DFloat#shape=[5]
      [1, 2, 0, 0, 5]
    ▶ b is a view of a: writing to b changes a
    ▶ Saves memory and copy cost
    ▶ Slice view was introduced in Numo::NArray
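The essence of a slice view is that it keeps a reference to the original data plus an offset and a length instead of copying elements. The plain-Ruby sketch below (a hypothetical `SliceView` class over an ordinary Array, not Numo's code) reproduces the slide's session: assigning through the view changes the base array.

```ruby
# Plain-Ruby sketch of the idea behind a slice view (not Numo's code):
# the view stores a reference to the base array plus an offset and a size,
# so reading or writing through the view touches the base array directly.
class SliceView
  attr_reader :size

  # range: non-negative bounds, e.g. 2..3 (negative indices are not handled)
  def initialize(base, range)
    last = range.exclude_end? ? range.end - 1 : range.end
    @base = base
    @offset = range.begin
    @size = last - range.begin + 1
  end

  def [](i)
    check(i)
    @base[@offset + i]
  end

  # Accepts an Integer or a Range of view-relative indices, like b[0..1] = 0.
  def []=(selector, value)
    indices = selector.is_a?(Range) ? selector.to_a : [selector]
    indices.each { |i| check(i) }                 # validate before writing
    indices.each { |i| @base[@offset + i] = value }
  end

  # Copies the viewed elements into a new Array (for display only).
  def to_a
    @base[@offset, @size]
  end

  private

  def check(i)
    raise IndexError, "index #{i} outside view of size #{@size}" unless (0...@size).cover?(i)
  end
end

a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = SliceView.new(a, 2..3)   # no elements are copied
p b.to_a                     # => [3.0, 4.0]
b[0..1] = 0.0
p a                          # => [1.0, 2.0, 0.0, 0.0, 5.0]
```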
  17. 2. Broadcasting
    ▶ Apply an element repeatedly along an axis with length = 1
      x = Numo::DFloat[[1,2,3]]
      => Numo::DFloat#shape=[1,3]
      [[1, 2, 3]]
      y = Numo::DFloat[[2],[10]]
      => Numo::DFloat#shape=[2,1]
      [[2], [10]]
      x*y
      => Numo::DFloat#shape=[2,3]
      [[2, 4, 6], [10, 20, 30]]
    ▶ Each result element is (x*y)[i,j] = x[0,j] * y[i,0]
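The rule can be spelled out for the 2-D case: two axis sizes are compatible when they are equal or one of them is 1, and an axis of length 1 is reused for every index along that axis. The plain-Ruby sketch below applies that rule to nested Arrays; `broadcast_dim` and `broadcast_mul` are hypothetical names, and unlike NumPy or Numo the sketch does not handle arrays of different rank.

```ruby
# Plain-Ruby sketch of the broadcasting rule for 2-D nested arrays
# (not Numo's implementation). Two axis sizes are compatible when they are
# equal or when one of them is 1; a length-1 axis is reused for every index.

def broadcast_dim(m, n)
  return [m, n].max if m == n || m == 1 || n == 1
  raise ArgumentError, "cannot broadcast axis sizes #{m} and #{n}"
end

def broadcast_mul(x, y)
  rows = broadcast_dim(x.size, y.size)
  cols = broadcast_dim(x.first.size, y.first.size)
  Array.new(rows) do |i|
    Array.new(cols) do |j|
      # Index 0 is reused along any axis whose length is 1.
      xv = x[x.size == 1 ? 0 : i][x.first.size == 1 ? 0 : j]
      yv = y[y.size == 1 ? 0 : i][y.first.size == 1 ? 0 : j]
      xv * yv
    end
  end
end

x = [[1, 2, 3]]          # shape [1,3]
y = [[2], [10]]          # shape [2,1]
p broadcast_mul(x, y)    # => [[2, 4, 6], [10, 20, 30]]
```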
  18. 3. Masking
      a = Numo::DFloat[1,-2,3,-4,-5]
      => Numo::DFloat#shape=[5]
      [1, -2, 3, -4, -5]
      a < 0
      => Numo::Bit#shape=[5]
      [0, 1, 0, 1, 1]
      a[a<0] = 0
      a
      => Numo::DFloat#shape=[5]
      [1, 0, 3, 0, 0]
    ▶ a < 0 returns a Boolean mask as a bit array (Numo::Bit)
    ▶ a[a<0] = 0 replaces negative elements with zeros
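Masking is a two-step operation: a comparison produces a 0/1 mask, and assignment then touches only the positions where the mask is 1. The plain-Ruby sketch below mirrors the slide with ordinary Arrays; the helpers `mask_where` and `assign_masked!` are hypothetical, and the mask here is an Array of Integers rather than a packed Numo::Bit.

```ruby
# Plain-Ruby sketch of masking (not Numo's implementation): build a 0/1 mask
# from a comparison, then assign only where the mask is 1.
def mask_where(values)
  values.map { |v| yield(v) ? 1 : 0 }
end

def assign_masked!(values, mask, replacement)
  raise ArgumentError, "mask length differs" unless values.size == mask.size
  values.each_index { |i| values[i] = replacement if mask[i] == 1 }
  values
end

a = [1.0, -2.0, 3.0, -4.0, -5.0]
mask = mask_where(a) { |v| v < 0 }   # like a < 0
p mask                               # => [0, 1, 0, 1, 1]
assign_masked!(a, mask, 0.0)         # like a[a<0] = 0
p a                                  # => [1.0, 0.0, 3.0, 0.0, 0.0]
```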
  19. NMatrix
    ▶ Main product of SciRuby
    ▶ Supports
      ◦ Multi-dimensional numerical array
      ◦ Dense matrix operations (wrapper to BLAS/LAPACK)
      ◦ Sparse matrix
    ▶ IMO, NMatrix cannot be an alternative to NumPy.
  20. Feature Comparison
    Columns: NumPy | NMatrix | First NArray | Numo::NArray
    ▶ View on Slice: ✔ ✔ ✔
    ▶ Broadcasting: ✔ ✔ ✔
    ▶ Masking: ✔ ✔ ✔
    ▶ coerce: ✔ ✔ ✔
  21. Numo::Linalg
    ▶ Module for linear algebra
    ▶ Wrapper to BLAS/LAPACK
    ▶ Ruby Association Grant 2016
      ◦ Supported Kishimoto-san to increase coverage.
    ▶ Modules (similar to scipy.linalg):
      ◦ Numo::Linalg::Blas, Numo::Linalg::Lapack
        • Direct wrappers to BLAS/LAPACK
      ◦ Numo::Linalg
        • Matrix product, linear solver, decompositions, eigenvalue problems, etc.
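Numo::Linalg's linear solver hands the work to LAPACK. To show what such a solver computes, here is a plain-Ruby sketch of Gaussian elimination with partial pivoting, the same idea as the LU factorization with partial pivoting used by LAPACK's `gesv` drivers, but with none of their blocking or tuning. The method name `solve` is our own and is not Numo::Linalg's API.

```ruby
# Gaussian elimination with partial pivoting in plain Ruby: solves A x = b.
# A sketch of the idea behind LAPACK's LU-based solvers, not Numo::Linalg code.
def solve(a, b)
  n = b.size
  raise ArgumentError, "A must be n x n" unless a.size == n && a.all? { |row| row.size == n }
  # Augmented matrix [A | b] as Floats; the caller's arrays are not modified.
  m = a.each_with_index.map { |row, i| row.map(&:to_f) + [b[i].to_f] }
  n.times do |k|
    # Partial pivoting: move the row with the largest |m[i][k]| into row k.
    pivot = (k...n).max_by { |i| m[i][k].abs }
    raise ArgumentError, "matrix is singular" if m[pivot][k].zero?
    m[k], m[pivot] = m[pivot], m[k]
    # Eliminate column k below the pivot.
    (k + 1...n).each do |i|
      factor = m[i][k] / m[k][k]
      (k..n).each { |j| m[i][j] -= factor * m[k][j] }
    end
  end
  # Back substitution on the upper-triangular system.
  x = Array.new(n, 0.0)
  (n - 1).downto(0) do |i|
    s = (i + 1...n).sum { |j| m[i][j] * x[j] }
    x[i] = (m[i][n] - s) / m[i][i]
  end
  x
end

p solve([[2, 1], [1, 3]], [3, 5])   # approximately [0.8, 1.4]
p solve([[0, 1], [1, 0]], [2, 3])   # needs a row swap: [3.0, 2.0]
```

Pivoting matters in the second call: without the row swap the first step would divide by zero.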
  22. Numo::Linalg coverage
    ▶ Most of BLAS
    ▶ LAPACK
      ◦ Covered: non-symmetric and symmetric functions
      ◦ Not covered: triangular and banded functions
  23. Backends for Numo::Linalg
    ▶ Original BLAS/LAPACK
    ▶ ATLAS
    ▶ OpenBLAS
    ▶ Intel MKL
    ▶ Numo::Linalg uses dlopen to load the LAPACK backend
      ◦ The backend is easy to replace, unlike SciPy.
  24. Numo::GSL
    ▶ Wrapper to GSL (GNU Scientific Library)
      ◦ GSL is a collection of numerical routines (> 1000 functions in total).
      ◦ Corresponds to the SciPy library
      ◦ Operates efficiently on Numo::NArray
      ◦ Wrapper code is automatically generated from texinfo.
    ▶ Why not use Ruby/GSL?
      ◦ Its wrapper code is written by hand, which makes it hard to maintain.
  25. Numo::GSL coverage
    ▶ Covered by NArray, Linalg, FFTW
      ◦ Complex Numbers, Vectors and Matrices, Sorting, BLAS Support, Linear Algebra, Eigensystems, Fast Fourier Transforms
    ▶ To do (20)
      ◦ Least-Squares Fitting (multi-parameter), Nonlinear Least-Squares Fitting, Basis Splines, Chebyshev Approximations, Series Acceleration, Discrete Hankel Transforms, Quasi-Random Sequences, Permutations, Combinations, Multisets, Numerical Integration, N-tuples, Monte Carlo Integration, Simulated Annealing, Ordinary Differential Equations, Numerical Differentiation, One dimensional Root-Finding, One dimensional Minimization, Multidimensional Root-Finding, Multidimensional Minimization
    ▶ Covered (15)
      ◦ Mathematical Functions, Special Functions, Physical Constants, Random Number Generation, Random Number Distributions, Statistics, Running Statistics, Histograms, Interpolation, Wavelet Transforms, Least-Squares Fitting, Sparse Matrices, Sparse BLAS Support, Sparse Linear Algebra, Polynomials
  26. Numo::FFTW, Numo::FFTE
    ▶ Wrappers to FFT (Fast Fourier Transform) libraries
    ▶ FFTW
      ◦ Widely used DFT (Discrete Fourier Transform) library
      ◦ Provides complex FFT methods
      ◦ http://www.fftw.org/
    ▶ FFTE
      ◦ Developed by Prof. Takahashi (University of Tsukuba)
      ◦ http://www.ffte.jp/
      ◦ Radix 2, 3 and 5 (transform lengths of the form 2^p 3^q 5^r)
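Both libraries compute the discrete Fourier transform X_k = Σ_t x_t e^(-2πi·kt/n) quickly. As a reference for what that quantity is, the plain-Ruby sketch below evaluates the definition directly in O(n²) time, using the unnormalized, negative-exponent convention of FFTW's forward transform. It is only a cross-check for small inputs, not a wrapper for either library; `dft` and `show` are hypothetical names.

```ruby
# Direct O(n^2) discrete Fourier transform in plain Ruby (a reference sketch;
# FFT libraries compute the same values in O(n log n)).
# Forward transform, exponent sign -1, no normalization.
def dft(x)
  n = x.size
  Array.new(n) do |k|
    (0...n).sum(Complex(0.0, 0.0)) do |t|
      x[t] * Complex.polar(1.0, -2.0 * Math::PI * k * t / n)
    end
  end
end

# Round tiny floating-point residue for display.
def show(z)
  z.map { |c| Complex(c.real.round(12), c.imaginary.round(12)) }
end

p show(dft([1, 0, 0, 0]))   # an impulse has a flat spectrum
p show(dft([1, 1, 1, 1]))   # a constant signal has only the k = 0 term
```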
  27. Numo::Gnuplot
    ▶ One of many wrappers to Gnuplot
    ▶ Features:
      ◦ Simple interface similar to the Gnuplot command line
      ◦ No class for data handling (use Array or NArray)
    ▶ Example:
      require "numo/gnuplot"
      x = (0..100).map{|i| i*0.1}
      y = x.map{|i| Math.sin(i)}
      Numo.gnuplot do
        set title:"X-Y data plot"
        plot x,y, w:'lines', t:'sin(x)'
      end
    ▶ Converted Gnuplot script:
      set title "X-Y data plot"
      plot '-' w lines t "sin(x)"
      0 0
      ...
      e
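The conversion on this slide sends the data inline: `plot '-'` tells Gnuplot to read x-y rows from the script itself until a line containing `e`. The plain-Ruby function below (`gnuplot_script`, a hypothetical name) builds a script of the same shape to make that conversion concrete. It is not Numo::Gnuplot's code, and its number formatting (`0.0 0.0`) differs from the slide's.

```ruby
# Build a gnuplot script that sends x-y data inline after "plot '-'".
# Hypothetical helper for illustration; not part of Numo::Gnuplot.
def gnuplot_script(x, y, title:, with:, label:)
  raise ArgumentError, "x and y must have the same length" unless x.size == y.size
  lines = []
  lines << %(set title "#{title}")
  lines << %(plot '-' w #{with} t "#{label}")
  x.zip(y) { |xi, yi| lines << "#{xi} #{yi}" }
  lines << "e"   # marks the end of the inline data
  lines.join("\n") + "\n"
end

x = (0..100).map { |i| i * 0.1 }
y = x.map { |v| Math.sin(v) }
script = gnuplot_script(x, y, title: "X-Y data plot", with: "lines", label: "sin(x)")
puts script.lines.first(3)
```

With gnuplot installed, such a script could be piped to it, for example with `IO.popen(["gnuplot", "-persist"], "w") { |io| io.write(script) }`.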
  28. Numo::Gnuplot example
      require "numo/narray"
      require "numo/gnuplot"
      # x has shape [1,21] and y has shape [21,1], both holding -10..10;
      # expressions combining them broadcast to shape [21,21].
      x = Numo::DFloat.new(1,21).seq-10
      y = Numo::DFloat.new(21,1).seq-10
      f = Numo::NMath.sin(-Numo::NMath.sqrt((x+5)**2+(y-7)**2)*0.5)
      Numo.gnuplot do
        set term:"pngcairo"
        set output:"hidden.png"      # write the plot to a PNG file
        set isosamples:[25,25]
        set :xyplane, at:0
        unset :key
        set :palette, rgbformulae:[31,-11,32]
        set :style, fill_solid:0.5
        set cbrange:-1..1
        set :hidden3d, :front
        splot [f, with:"pm3d"], [x**2-y**2, with:"lines", lc_rgb:"black"]
      end
    https://github.com/ruby-numo/gnuplot-demo
  29. Summary of Progress
    ▶ Numo::NArray
      ◦ NumPy-like array is complete.
    ▶ Numo::Linalg
      ◦ BLAS routines are complete.
      ◦ Most of the non-symmetric and symmetric routines are complete.
    ▶ Numo::GSL
      ◦ 15 modules are complete.
    ▶ Numo::FFTW
      ◦ Complex DFT is complete.
    ▶ Numo::Gnuplot
      ◦ Almost complete.
  30. SciPy Stack
    [Diagram] Components: Python, NumPy, Cython, SciPy lib, Sympy, Pandas, matplotlib, Scikit-learn, Jupyter (IPython); layers: Scientific Computing, Data Science, Machine Learning
  31. Numo Stack
    [Diagram] Components: Ruby, Numo::NArray, Rubex?, Numo::Linalg, Numo::GSL, Numo::FFTW, No Sympy?, daru?, Numo::Gnuplot, ML libs?, Jupyter (IRuby); layers: Scientific Computing, Data Science, Machine Learning
  32. Efforts on ML with Ruby
    https://github.com/arbox/machine-learning-with-ruby
  33. Numo issues
    ▶ API design
    ▶ Complete the to-do list
    ▶ Check correctness
    ▶ Refactoring
    ▶ Improve performance
    ▶ Write tests
    ▶ Write documentation
    ▶ Collaborations
      – GPU (CUDA etc.) support
      – Data science libraries
    ▶ I cannot maintain too many projects...
  34. Deep Learning From Scratch
    ▶ Published by O'Reilly Japan
      ◦ Implements DL with NumPy
      ◦ https://www.oreilly.co.jp/books/9784873117584/
    ▶ Blog post by Akanuma-san
      ◦ Translation into Ruby/Numo
      ◦ Elapsed time:
        • Ruby: 281 sec
        • Python: 39 sec
      ◦ http://blog.akanumahiroaki.com/entry/2017/04/15/160000
  35. Measurement on My Laptop
      ruby train_neuralnet.rb         # 372 sec
      python3.6 train_neuralnet.py    # 32 sec
    ▶ Depends on the performance of the dot method (matrix product)
      ◦ NArray: naïve implementation on a single core
      ◦ NumPy: ATLAS or OpenBLAS with 4 cores
    ▶ Use OpenBLAS with 4 cores by loading Linalg:
      ruby -r numo/linalg/use/openblas train_neuralnet.rb    # 54 sec
      ◦ NArray has room for more speed-up.
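The gap between 372 s and 54 s comes down to how `dot` is computed. A naïve matrix product is a triple loop performing m·k·n multiply-adds on one core; tuned BLAS libraries such as OpenBLAS compute the same result but add cache blocking, SIMD instructions and multithreading. The plain-Ruby sketch below shows only the naïve loop structure (`naive_dot` is a hypothetical name); it is far slower than NArray's C loop and is not NArray's code.

```ruby
# Naive matrix product C = A * B in plain Ruby: three nested loops,
# no blocking, single thread. A is m x k, B is k x n, C is m x n.
def naive_dot(a, b)
  m, k = a.size, a.first.size
  raise ArgumentError, "inner dimensions differ" unless b.size == k
  n = b.first.size
  c = Array.new(m) { Array.new(n, 0) }
  m.times do |i|
    k.times do |l|
      ail = a[i][l]
      # i-l-j order: the innermost loop runs along a row of B and a row of C.
      n.times { |j| c[i][j] += ail * b[l][j] }
    end
  end
  c
end

p naive_dot([[1, 2], [3, 4]], [[5, 6], [7, 8]])  # => [[19, 22], [43, 50]]
```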
  36. Convert Deep Learning code from Python to Ruby
    ▶ Talk at RejectKaigi 2017 by Naitoh-san
      ◦ Another way to "get on the shoulder of the giant" for machine learning with Ruby
      ◦ https://www.slideshare.net/naitoh1/reject-kaigi2017-naitoh
    ▶ Purpose:
      – Translate Chainer code into Ruby
    ▶ py2rb.py
      – Converter from Python to Ruby
      – Replaces NumPy with Numo::NArray
  37. Incompatibility between NumPy and NArray
    NumPy                    Numo::NArray                  Why
    if condition:            if condition != 0             zero is truthy in Ruby
    a[:]                     a[0..-1] or a[true]           Range feature
    a[b:e]                   a[b..e-1]                     Range feature
    b = a.view(); b += 1     b = a.view; b.inplace + 1     b += 1 is syntax sugar for b = b + 1
    a[i]  (a.ndim >= 2)      a[i,false]                    NArray feature
    a[[0,1],[1,0]]           a.at([0,1],[1,0])             NArray feature
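The first rows of the table come from Ruby semantics rather than from NArray itself. The plain-Ruby sketch below demonstrates them with ordinary Arrays; it does not call Numo, so it does not show `inplace`, which is the NArray-side remedy for the `b += 1` row.

```ruby
# Plain-Ruby demonstrations of the language differences behind the table.
# No Numo is needed; plain Arrays stand in for NArrays.

# 1. Only nil and false are falsy in Ruby, so 0 takes the "true" branch.
condition = 0
branch_plain = condition ? :taken : :skipped          # => :taken
branch_explicit = condition != 0 ? :taken : :skipped  # => :skipped

# 2. Python's half-open a[b:e] corresponds to a[b..e-1] (or a[b...e]).
a = [10, 20, 30, 40, 50]
b, e = 1, 3
slice_inclusive = a[b..e - 1]   # => [20, 30]
slice_exclusive = a[b...e]      # => [20, 30]

# 3. x += y is sugar for x = x + y: it builds a new object and rebinds x,
#    so another reference to the old object does not see the change.
original = [1, 2]
alias_ref = original
alias_ref += [3]          # rebinding: original is untouched
p original                # => [1, 2]
alias_ref = original
alias_ref.concat([3])     # in-place: original changes too
p original                # => [1, 2, 3]
```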
  38. Summary
    ▶ Ruby/Numo
      ◦ N-dimensional numerical array (Numo::NArray)
      ◦ Numerical algorithms (Numo::Linalg|GSL|FFTW|FFTE)
      ◦ Data visualization (Numo::Gnuplot)
      ◦ Needs further effort (coverage, performance, tests, documentation)
    ▶ Experimental cases for Deep Learning