Speeding up NMatrix by 100x

With the growing need for fast numerical computing tools, Ruby needs a library that performs at the level of NumPy in Python and offers an equally rich API. In this talk, we'll explore how NMatrix is being re-implemented to meet this need and how it ended up being renamed NumRuby. We'll then look at the progress so far on NumRuby and potential future work.

We'll also explore how to make the best use of Ruby C extensions for fast number crunching without messing things up.

Udit Gulati

November 19, 2019

Transcript

  1. 2.

    2 About me
    • Computer Science senior at IIIT Una
    • SciRuby Contributor
    • Google Summer of Code 2019
    • Projects
      – NumRuby
      – Ruby-Sparse
  2. 3.

    3 SciRuby
    • SciRuby has been trying to push Ruby for scientific computing.
    • Popular gems:
      – NMatrix
      – Daru
      – iRuby notebooks
      – Rubyplot
      – RbCuda
  3. 4.

    4 NMatrix
    • NMatrix is SciRuby’s numerical matrix core, implementing dense matrices as well as linked-list-based and CSR sparse matrices.
    • It relies on ATLAS/CBLAS/CLAPACK and standard LAPACK for its linear algebra operations.
  4. 5.

    5 Daru
    • Daru (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data in Ruby.
    • Daru makes it easy to process data through two data structures:
      – Daru::DataFrame
      – Daru::Vector
  5. 6.

    6 iRuby notebooks
    • iRuby is a Ruby kernel for the Jupyter project.
    • Browser-based Ruby REPL for interactive computing.
  6. 7.

    7 Rubyplot
    • An advanced plotting library for Ruby.
    • Aims to allow you to visualize anything, anywhere with a flexible, extensible, Ruby-like API.
    • Supports GR and RMagick (ImageMagick) backends.
  7. 8.

    8 RbCuda
    • CUDA bindings for Ruby.
    • Ready-made on-GPU linear algebra, reduction and scan using the cuBLAS, cuMath and cuSolver libraries.
    • CUDA profiler for Ruby.
  8. 10.

    10

  9. 11.

    11

  10. 12.

    12 What went wrong?
    • NMatrix (old) is much slower than Numo-NArray.
    • Different parts of the backend are implemented differently, resulting in slower interfaces between them.
    • Development was not primarily focused on speed improvements.
  11. 13.

    13 NumRuby - NMatrix reimplementation
    • NMatrix (old) is slow due to implementation overheads.
    • NumRuby removes these overheads and makes its performance comparable to NumPy and Numo-NArray.
  12. 14.

    14 How NumRuby is the solution
    • NumRuby is being written as clean and simple code.
    • Easy to improve and extend.
    • Speed matches Numo-NArray and NumPy.
    • Supports indexing, iteration and slicing.
  13. 20.

    20 How NumRuby works
    • All operations are done in C.
    • Implemented using Ruby C extensions.
    • An N-dimensional matrix is stored as a row-major one-dimensional array.
    • Uses switch statements for polymorphism in C.
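The storage scheme on this slide can be sketched in plain Ruby. This is only an illustration under stated assumptions: `DenseSketch`, its `dtype:` argument and the `:int32` / `:float64` symbols are hypothetical names, not NumRuby's API, and the `case` expression merely stands in for the kind of C `switch` the slide mentions (assumed here to dispatch on element type).

```ruby
# Hypothetical illustration; DenseSketch is not part of NumRuby.
class DenseSketch
  attr_reader :shape, :dtype, :elements

  def initialize(shape, nested, dtype: :float64)
    @shape = shape
    @dtype = dtype
    # Row-major layout: flatten the nested rows so the last index varies fastest.
    @elements = nested.flatten.map { |value| cast(value) }
    expected = shape.inject(1, :*)
    raise ArgumentError, "shape does not match data" unless @elements.size == expected
  end

  private

  # Stand-in for a C-level switch over the element type (an assumption).
  def cast(value)
    case @dtype
    when :int32   then Integer(value)
    when :float64 then Float(value)
    else raise ArgumentError, "unsupported dtype: #{@dtype}"
    end
  end
end

m = DenseSketch.new([2, 3], [[1, 2, 3], [4, 5, 6]])
m.elements  # => [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

Keeping the data in one flat array is what lets the C side hand a contiguous buffer straight to numerical routines.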
  14. 22.

    22 Elementwise operations
    • Supports elementwise addition, subtraction, product, division, log, exp, trigonometric operations and more.
    • Support for user-defined operations to be added soon.
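As a rough picture of what "elementwise" means for matrices stored as flat arrays, here is a minimal plain-Ruby sketch. The helper names `elementwise` and `map_unary` are hypothetical and are not NumRuby methods; NumRuby performs the equivalent loops in C.

```ruby
# Hypothetical helpers, not the NumRuby API.

# Apply a binary operation to matching elements of two equally sized arrays.
def elementwise(a, b)
  raise ArgumentError, "size mismatch" unless a.size == b.size
  a.zip(b).map { |x, y| yield(x, y) }
end

# Apply a unary operation (log, exp, sin, ...) to every element.
def map_unary(a, &op)
  a.map(&op)
end

a = [1.0, 2.0, 3.0, 4.0]       # a 2x2 matrix in row-major order
b = [10.0, 20.0, 30.0, 40.0]   # another 2x2 matrix

sum     = elementwise(a, b) { |x, y| x + y }  # => [11.0, 22.0, 33.0, 44.0]
product = elementwise(a, b) { |x, y| x * y }  # => [10.0, 40.0, 90.0, 160.0]
logs    = map_unary(a) { |x| Math.log(x) }    # natural log of each element
```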
  15. 23.

    23 Indexing and iteration
    • An N-dimensional index tuple is converted to a one-dimensional array index using strides.
    • Iteration elementwise, or along a row, column or any other dimension.
    • Iteration with indices.
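The iteration modes listed on this slide can be illustrated on a small 2x3 matrix held as a flat row-major array. The helpers `row`, `column` and `each_with_indices` are hypothetical names chosen for this sketch, not NumRuby methods.

```ruby
data    = [1, 2, 3, 4, 5, 6]  # 2x3 matrix, row-major
shape   = [2, 3]
strides = [3, 1]              # elements to skip per step along each dimension

# Walk along dimension 1 (a row) with the row index i held fixed.
def row(data, shape, strides, i)
  (0...shape[1]).map { |j| data[i * strides[0] + j * strides[1]] }
end

# Walk along dimension 0 (a column) with the column index j held fixed.
def column(data, shape, strides, j)
  (0...shape[0]).map { |i| data[i * strides[0] + j * strides[1]] }
end

# Elementwise iteration that also yields the N-dimensional index tuple.
def each_with_indices(data, shape)
  data.each_with_index do |value, flat|
    index = []
    shape.reverse_each do |dim|
      index.unshift(flat % dim)
      flat /= dim
    end
    yield value, index
  end
end

row(data, shape, strides, 1)     # => [4, 5, 6]
column(data, shape, strides, 2)  # => [3, 6]

pairs = []
each_with_indices(data, shape) { |value, index| pairs << [index, value] }
pairs.first(4)  # => [[[0, 0], 1], [[0, 1], 2], [[0, 2], 3], [[1, 0], 4]]
```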
  16. 24.

    24 NumRuby uses strides for indexing
    • A stride stores, for each dimension, how many elements to skip in the one-dimensional array when that dimension's index increases by one.
    • For an N-dimensional matrix, the strides form a tuple of size N.
    • Multiply the stride tuple pairwise with the index tuple and sum the products to get the index in the one-dimensional array.
  17. 25.

    25 Stride calculation example
    • Shape: [2, 3, 4]
    • Strides: [3*4, 4, 1] → [12, 4, 1]
    • Index: [0, 1, 2]
    • Index in 1-d list: (0*12)+(1*4)+(2*1) = 6
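The calculation on this slide can be reproduced with a few lines of Ruby. `row_major_strides` and `flat_index` are hypothetical helper names used only for this sketch; NumRuby does the equivalent arithmetic in C.

```ruby
# Stride of a dimension = product of the sizes of all later dimensions.
def row_major_strides(shape)
  strides = Array.new(shape.size, 1)
  (shape.size - 2).downto(0) do |i|
    strides[i] = strides[i + 1] * shape[i + 1]
  end
  strides
end

# Pairwise multiply the index tuple with the stride tuple and sum the products.
def flat_index(index, strides)
  index.zip(strides).sum { |i, s| i * s }
end

strides = row_major_strides([2, 3, 4])  # => [12, 4, 1]
flat_index([0, 1, 2], strides)          # => 6
```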
  18. 26.

    26 Broadcasting
    • Broadcasting applies when two matrices have unequal shapes but become compatible once the smaller one is stretched across the larger one.
    • Ex: a 1-d vector can be applied to a 2-d matrix when the vector's length equals the matrix's last dimension.
  19. 27.

    27 Broadcasting example
    • N: [2, 3] → [2, 3]
    • M: [3] → [1, 3] → [2, 3]
    • N: [5, 3, 4] → [5, 3, 4]
    • M: [3, 4] → [1, 3, 4] → [5, 3, 4]
    • N: [2, 3]
    • M: [4]
    • Incompatible
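The three cases above can be checked with a small shape calculator. This sketch assumes NumPy-style rules (pad the shorter shape with leading 1s, then let a size of 1 stretch to match), which is consistent with the examples on the slide but is not confirmed as NumRuby's exact behaviour. `broadcast_shape` is a hypothetical name.

```ruby
# Returns the broadcast shape, or nil when the shapes are incompatible.
# Assumes NumPy-style broadcasting rules.
def broadcast_shape(a, b)
  rank = [a.size, b.size].max
  a = Array.new(rank - a.size, 1) + a  # pad with leading 1s, e.g. [3] -> [1, 3]
  b = Array.new(rank - b.size, 1) + b
  a.zip(b).map do |x, y|
    return nil unless x == y || x == 1 || y == 1
    [x, y].max
  end
end

broadcast_shape([2, 3], [3])        # => [2, 3]
broadcast_shape([5, 3, 4], [3, 4])  # => [5, 3, 4]
broadcast_shape([2, 3], [4])        # => nil (incompatible)
```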
  20. 28.

    28 Linear Algebra
    • Implemented using BLAS and LAPACK.
    • These are FORTRAN libraries which are highly optimized for speed.
    • NumRuby::Lapack is the wrapper to LAPACK and is used by NumRuby::Linalg.
  21. 29.
  22. 35.

    35 Ruby-Sparse
    • Library for sparse matrices.
    • Supports 4 sparse formats:
      – CSR
      – CSC
      – COO
      – DIA
    • Easy conversion to and from dense matrices.
  23. 36.

    36 Sparse matrices
    • A matrix is sparse when most of its elements are zero.
    • Only non-zero elements are stored.
    • Uses less memory and is also fast for some operations.
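To make "only non-zero elements are stored" concrete, here is a plain-Ruby sketch that converts a small dense matrix into two of the formats named earlier, COO and CSR. The helpers `to_coo` and `to_csr` and the hash keys `data`, `indices` and `indptr` are naming choices for this illustration, not Ruby-Sparse's API.

```ruby
dense = [
  [0, 0, 3],
  [4, 0, 0],
  [0, 5, 6]
]

# COO: one [row, column, value] triple per non-zero element.
def to_coo(dense)
  triples = []
  dense.each_with_index do |row, i|
    row.each_with_index do |value, j|
      triples << [i, j, value] unless value.zero?
    end
  end
  triples
end

# CSR: non-zero values, their column indices, and for each row
# the offset into those arrays where the row starts.
def to_csr(dense)
  data, indices, indptr = [], [], [0]
  dense.each do |row|
    row.each_with_index do |value, col|
      next if value.zero?
      data << value
      indices << col
    end
    indptr << data.size
  end
  { data: data, indices: indices, indptr: indptr }
end

to_coo(dense)  # => [[0, 2, 3], [1, 0, 4], [2, 1, 5], [2, 2, 6]]
to_csr(dense)  # => {data: [3, 4, 5, 6], indices: [2, 0, 1, 2], indptr: [0, 1, 2, 4]}
```

Here 4 stored values replace 9 dense entries; the saving grows with the share of zeros.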
  24. 38.

    38 Objectives of Ruby-Sparse
    • To be a complete library for sparse matrices.
    • Easy interfacing with popular dense matrix libraries.
    • Fast.
    • Easy-to-use API.
  25. 43.

    43 Graph module
    • Implements graph algorithms with the graph stored in a sparse matrix.
    • Key algorithms:
      – DFS, BFS
      – Shortest path
      – Maximum flow
      – Minimum spanning tree
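As an illustration of running a graph algorithm directly on sparse storage, the following sketch performs BFS over an adjacency matrix held in CSR form, where the neighbours of a node are the column indices of its row. `bfs_order` and the sample graph are hypothetical; this is not the Ruby-Sparse graph module's API.

```ruby
# Breadth-first traversal over a CSR adjacency matrix.
# indptr[n]...indptr[n + 1] is the slice of `indices` holding node n's neighbours.
def bfs_order(indptr, indices, start)
  visited = Array.new(indptr.size - 1, false)
  visited[start] = true
  queue = [start]
  order = []
  until queue.empty?
    node = queue.shift
    order << node
    indices[indptr[node]...indptr[node + 1]].each do |neighbour|
      next if visited[neighbour]
      visited[neighbour] = true
      queue << neighbour
    end
  end
  order
end

# Directed edges: 0->1, 0->2, 1->3, 2->3 (node 3 has no outgoing edges).
indptr  = [0, 2, 3, 4, 4]
indices = [1, 2, 3, 3]

bfs_order(indptr, indices, 0)  # => [0, 1, 2, 3]
```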
  26. 44.

    44 Linear algebra module
    • Linear algebra for sparse matrices.
    • Unlike for dense matrices, there isn’t a standard FORTRAN library for sparse linear algebra.
  27. 45.

    45 Future work
    • Improve iRuby support.
    • Complete NumRuby::Linalg.
    • Add image processing support.
    • Serialization (marshalling) support.