Deep Learning Programming on Ruby


Presented by @mrkn and @hatappi at RubyKaigi 2018


Kenta Murata

May 31, 2018

Transcript

  1. Deep Learning Programming on Ruby
    Kenta Murata, Yusaku Hatanaka
    RubyKaigi 2018 on 31 May 2018
    https://www.flickr.com/photos/53416677@N08/4972916707/
  2. Contents
    1. About us
    2. Introduction of this session
    3. Deep learning programming on Ruby
      3.1. mxnet.rb
      3.2. Red Chainer
    4. Overview of Ruby’s current data science support
    5. Summary of this talk
  3. About us (1)
    • Kenta Murata (@mrkn)
    • Full-time CRuby committer at Speee, Inc.
    • bigdecimal, enumerable-statistics, pycall.rb, mxnet.rb, etc.
    • Ruby, C/C++, Python, Julia, etc.
    • neovim, vscode (neovim client)
  4. About us (2)
    • Yusaku Hatanaka (@hatappi)
    • Speee, Inc.
    • Red Data Tools member
    • Ruby, Go, TypeScript, etc.
    • I love soybeans
  5. Session Introduction

  6. Deep Learning in Ruby

  7. There are several approaches
    ‣ mxnet.rb: https://github.com/mrkn/mxnet.rb
    ‣ Red Chainer: https://github.com/red-data-tools/red-chainer
    ‣ Tensorflow.rb: https://github.com/somaticio/tensorflow.rb
    ‣ TensorStream: https://github.com/jedld/tensor_stream
  8. Topics in this session
    ‣ mxnet.rb: https://github.com/mrkn/mxnet.rb
    ‣ Red Chainer: https://github.com/red-data-tools/red-chainer
  9. mxnet.rb

  10. What is mxnet.rb?
    ‣ Ruby binding library of Apache MXNet
    ‣ Since Nov 2017
    ‣ You can write deep learning programs in Ruby by using mxnet.rb and the MXNet runtime library
    ‣ It doesn’t depend on the Python runtime
    ‣ You need only Ruby
    ‣ But `pip install mxnet` is currently the easiest way to install the MXNet runtime library
  11. Why I write mxnet.rb
    ‣ I want to write deep learning programs in Ruby
    ‣ Without dependency on Python (pycall.rb)
    ‣ There is Tensorflow.rb, but I don’t want to use the Tensorflow C API
    ‣ I think Apache MXNet must be best for Ruby
  12. Why is MXNet best for Ruby?
    ‣ It already supports multiple languages
    ‣ Many stakeholders support MXNet development
    ‣ There are some good features that can compete with other frameworks
  13. Multi-language support
    ‣ Not only Python and C/C++
    ‣ But also Julia, R, JavaScript, Perl, Matlab, Scala, Go
    ‣ Ruby will be supported soon (I’m working on it)
  14. Companies that support Apache MXNet https://mxnet.incubator.apache.org/community/powered_by.html

  15. Academic organizations that support Apache MXNet

  16. Good Features
    ‣ Multiple programming paradigms for deep learning
    ‣ Lower memory consumption than other frameworks
    ‣ Efficient multi-GPU computation
    ‣ Multi-node computation
    ‣ An Apache incubator project
    ‣ ONNX support
  18. Multiple programming paradigms for deep learning
    ‣ Imperative style
    ‣ Symbolic style
    ‣ Hybrid style
  19. Imperative style

        # Computation is executed step by step
        a = MXNet::NDArray.ones([10])
        b = MXNet::NDArray.ones([10]) * 2
        c = b * a
        d = c + 1
  20. Symbolic style

        # first generate computation graphs
        a = MXNet::Symbol.var(:a)
        b = MXNet::Symbol.var(:b)
        c = b * a   # generate a computation graph
        d = c + 1   # ditto

        # execute the computation graph
        d.eval(a: MXNet::NDArray.ones([10]),
               b: MXNet::NDArray.ones([10]) * 2)

    [Figure: computation graph in which a and b feed `*` to produce c, and c and 1 feed `+` to produce d]
  21. Imperative vs Symbolic
    • Imperative programs tend to be more flexible
      • They let us write loops directly in the syntax of the programming language, e.g. while, until, loop { … }, each { … }, etc.
    • Symbolic programs tend to be more efficient
      • The framework can optimize memory usage automatically
      • The framework can optimize the order of computations
  22. Computational graph optimization example
    [Figure: the graph c = b * a, d = c + 1 is rewritten, by removing c, into a single node d = op(a, b) with op = a * b + 1]
    This optimization reduces both computation steps and memory consumption.
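The rewrite on this slide can be sketched in plain Ruby. This is a toy illustration only: `Var`, `Const`, `Op`, `evaluate`, and `fuse` are hypothetical names invented here, not part of mxnet.rb or MXNet, and the real optimizer is far more general. The sketch builds the graph for `d = b * a + 1`, fuses the multiply and add into one node (which removes the intermediate `c`), and counts how many operations each version executes.

```ruby
# Toy symbolic graph; all names here are hypothetical, not mxnet.rb API.
Var = Struct.new(:name)
Const = Struct.new(:value)
Op = Struct.new(:kind, :inputs) # kind is :mul, :add, or :fma

# Evaluate a node against variable bindings, counting executed operations.
def evaluate(node, env, counter)
  case node
  when Var then env.fetch(node.name)
  when Const then node.value
  when Op
    args = node.inputs.map { |input| evaluate(input, env, counter) }
    counter[:ops] += 1
    case node.kind
    when :mul then args[0] * args[1]
    when :add then args[0] + args[1]
    when :fma then args[0] * args[1] + args[2]
    end
  end
end

# Rewrite add(mul(x, y), z) into a single fused node fma(x, y, z).
def fuse(node)
  return node unless node.is_a?(Op)

  inputs = node.inputs.map { |input| fuse(input) }
  lhs = inputs[0]
  if node.kind == :add && lhs.is_a?(Op) && lhs.kind == :mul
    Op.new(:fma, [*lhs.inputs, inputs[1]])
  else
    Op.new(node.kind, inputs)
  end
end

a = Var.new(:a)
b = Var.new(:b)
c = Op.new(:mul, [b, a])
d = Op.new(:add, [c, Const.new(1)])

env = { a: 1.0, b: 2.0 }
unfused_count = { ops: 0 }
fused_count = { ops: 0 }
unfused_result = evaluate(d, env, unfused_count)
fused_result = evaluate(fuse(d), env, fused_count)

puts "unfused: #{unfused_result} in #{unfused_count[:ops]} ops"
puts "fused:   #{fused_result} in #{fused_count[:ops]} ops"
```

Both versions give the same value, but the fused graph runs one operation instead of two and never materializes `c`, which is the saving in steps and memory that the slide describes.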
  23. Hybrid style
    • Mix both imperative and symbolic styles
    • In deep learning programming
      • Imperative style is helpful for writing parameter update routines
      • Gradient calculation should be performed symbolically
    • In MXNet, the Gluon API supports hybrid style programming
  25. MXNet vs Tensorflow
    • Investigated by Julien Simon: https://medium.com/@julsimon/keras-shoot-out-tensorflow-vs-mxnet-51ae2b30a9c0
    • Using Keras to compare them
      • Both MXNet and Tensorflow can be used as backends of Keras
    • Three metrics
      • Precision, Speed, and Memory consumption
  27. Multi-node computation
    • You can use MXNet as a framework for distributed scientific computation
    • It uses a key-value store to exchange parameters among the threads on each machine
    • For example:
      • Distributed model training: https://mxnet.incubator.apache.org/versions/master/faq/distributed_training.html
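The idea of exchanging parameters through a key-value store can be sketched in plain Ruby as follows. This is a single-process toy under stated assumptions: `ToyKVStore` and its methods are hypothetical names made up for illustration, not the MXNet KVStore API, and there is no networking here. Workers push local gradients for a key, the store averages them and applies one SGD step, and workers pull the updated value.

```ruby
# Hypothetical in-process stand-in for a parameter key-value store.
class ToyKVStore
  def initialize(learning_rate:)
    @learning_rate = learning_rate
    @values = {}
    @pending = Hash.new { |hash, key| hash[key] = [] }
  end

  # Register the initial value of a parameter.
  def init(key, value)
    @values[key] = value.dup
  end

  # A worker pushes its local gradient for a parameter.
  def push(key, gradient)
    @pending[key] << gradient
  end

  # Average the pushed gradients, apply one SGD step, and clear the queue.
  def apply_updates
    @pending.each do |key, gradients|
      mean = gradients.transpose.map { |column| column.sum / gradients.size }
      @values[key] = @values[key].zip(mean).map { |w, g| w - @learning_rate * g }
    end
    @pending.clear
  end

  # A worker pulls the latest value of a parameter.
  def pull(key)
    @values.fetch(key).dup
  end
end

store = ToyKVStore.new(learning_rate: 0.5)
store.init(:w, [1.0, 2.0])
store.push(:w, [0.5, 1.0]) # gradient computed by worker 0
store.push(:w, [1.5, 0.0]) # gradient computed by worker 1
store.apply_updates
updated = store.pull(:w)
p updated
```

Every worker that pulls `:w` after the update sees the same averaged step, which is what keeps replicas in sync during data-parallel training.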
  28. MXNet is an Apache incubator project
    • There are a lot of tools for data science under the Apache Software Foundation
      • Arrow
      • Hadoop
      • Kudu
      • Spark
      • etc.
  29. ONNX
    • Open Neural Network Exchange Format
    • Founded by Microsoft and Facebook
    • We can exchange trained models between different frameworks via ONNX
      • e.g. We can use Python and Keras for experiments, and Ruby and MXNet for production
  30. Frameworks that support model interchange via ONNX
    • MXNet
    • PyTorch
    • Chainer
    • Caffe2
    • Tensorflow
    • etc.
  31. Current project status of mxnet.rb
    2 developers
    ‣ Me (@mrkn)
      • Conference-driven development
      • Currently focusing on the Gluon API
    ‣ Laurent Julliard (@ljulliar)
      • Currently focusing on the coverage of the NDArray API
    Future plan
    ‣ I want to achieve 100% feature coverage
  32. We want more developers
    ‣ Your pull requests are welcome
    ‣ I’ll make feature tables and some milestones so that you can find opportunities to contribute more easily
  33. DEMO: the MLPScratch example (full source on slide 34)
  34. Source code of the demo:

        require 'mxnet'

        module MLPScratch
          ND = MXNet::NDArray

          class MLP
            def initialize(num_inputs: 784, num_outputs: 10,
                           num_hidden_units: [256, 128, 64], ctx: nil)
              @layer_dims = [num_inputs, *num_hidden_units, num_outputs]
              @weight_scale = 0.01
              @ctx = ctx || MXNet::Context.default
              @all_parameters = init_parameters
            end

            attr_reader :ctx, :all_parameters, :layer_dims

            # Draw an array of the given shape from a scaled normal distribution.
            private def rnorm(shape)
              ND.random_normal(shape: shape, scale: @weight_scale, ctx: @ctx)
            end

            # Create one weight matrix and one bias vector per pair of adjacent layers.
            private def init_parameters
              @weights = []
              @biases = []
              @layer_dims.each_cons(2) do |dims|
                @weights << rnorm(dims)
                @biases << rnorm([dims[1]])
              end
              [*@weights, *@biases].each(&:attach_grad)
            end

            private def relu(x)
              ND.maximum(x, ND.zeros_like(x))
            end

            # Hidden layers use ReLU; the output layer stays linear.
            def forward(x)
              h = x
              n = @layer_dims.length
              (n - 2).times do |i|
                h_linear = ND.dot(h, @weights[i]) + @biases[i]
                h = relu(h_linear)
              end
              y_hat_linear = ND.dot(h, @weights[-1]) + @biases[-1]
            end

            private def softmax_cross_entropy(y_hat_linear, t)
              -ND.nansum(t * ND.log_softmax(y_hat_linear), axis: 0, exclude: true)
            end

            def loss(y_hat_linear, t)
              softmax_cross_entropy(y_hat_linear, t)
            end

            def predict(x)
              y_hat_linear = forward(x)
              ND.argmax(y_hat_linear, axis: 1)
            end
          end

          module_function

          # Plain stochastic gradient descent, updating each parameter in place.
          def SGD(params, lr)
            params.each do |param|
              param[0..-1] = param - lr * param.grad
            end
          end

          def evaluate_accuracy(data_iter, model)
            num, den = 0.0, 0.0
            data_iter.each_with_index do |batch, i|
              data = batch.data[0].as_in_context(model.ctx)
              data = data.reshape([-1, model.layer_dims[0]])
              label = batch.label[0].as_in_context(model.ctx)
              predictions = model.predict(data)
              num += ND.sum(predictions == label)
              den += data.shape[0]
            end
            (num / den).as_scalar
          end

          def learning_loop(train_iter, test_iter, model,
                            epochs: 10, learning_rate: 0.001, smoothing_constant: 0.01)
            epochs.times do |e|
              start = Time.now
              cumloss = 0.0
              num_batches = 0
              train_iter.each_with_index do |batch, i|
                data = batch.data[0].as_in_context(model.ctx)
                data = data.reshape([-1, model.layer_dims[0]])
                label = batch.label[0].as_in_context(model.ctx)
                label_one_hot = ND.one_hot(label, depth: model.layer_dims[-1])
                # Record the forward pass so that gradients can be computed.
                loss = MXNet::Autograd.record do
                  y = model.forward(data)
                  model.loss(y, label_one_hot)
                end
                loss.backward
                SGD(model.all_parameters, learning_rate)
                # Accumulate the summed loss over all batches of the epoch.
                cumloss += ND.sum(loss).as_scalar
                num_batches += 1
              end
              test_acc = evaluate_accuracy(test_iter, model)
              train_acc = evaluate_accuracy(train_iter, model)
              duration = Time.now - start
              puts "Epoch #{e}. Loss: #{cumloss / (train_iter.batch_size * num_batches)}, " +
                   "train-acc: #{train_acc}, test-acc: #{test_acc} (#{duration} sec)"
            end
          end
        end
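To make the loss and the update rule in the demo concrete, here is a small sketch in plain Ruby arrays, without mxnet.rb. It handles a single sample rather than a batch, and the helper names (`log_softmax`, `softmax_cross_entropy`, `sgd!`) are chosen here for illustration; they mirror the roles of the demo's functions but are not the library calls. The gradient is written out by hand using the standard result that the derivative of softmax cross-entropy with respect to the scores is softmax(scores) minus the one-hot target.

```ruby
# Numerically stable log-softmax of one row of unnormalized scores.
def log_softmax(scores)
  max = scores.max
  log_sum = Math.log(scores.sum { |s| Math.exp(s - max) })
  scores.map { |s| s - max - log_sum }
end

# Cross-entropy between a one-hot target and unnormalized scores.
def softmax_cross_entropy(scores, one_hot)
  total = one_hot.zip(log_softmax(scores)).sum { |t, lp| t * lp }
  -total
end

# In-place SGD step: param <- param - lr * grad.
def sgd!(params, grads, lr)
  params.each_index { |i| params[i] -= lr * grads[i] }
  params
end

scores = [0.0, 0.0, 0.0]
target = [0, 1, 0]
loss_before = softmax_cross_entropy(scores, target)

# Gradient of the loss with respect to the scores: softmax(scores) - target.
probs = log_softmax(scores).map { |lp| Math.exp(lp) }
grads = probs.zip(target).map { |prob, t| prob - t }
sgd!(scores, grads, 0.1)
loss_after = softmax_cross_entropy(scores, target)

puts "loss before: #{loss_before}"
puts "loss after:  #{loss_after}"
```

With uniform scores over three classes the initial loss equals log 3, and one SGD step raises the score of the correct class, so the loss goes down. In the demo, `MXNet::Autograd.record` and `loss.backward` compute the gradient automatically instead of by hand.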
  35. Summary of mxnet.rb
    ‣ MXNet is a deep learning framework that is well suited to being supported in Ruby
    ‣ mxnet.rb is under development, but some APIs are already usable
    ‣ Contact me if you want to join the development
  36. Red Chainer

  37. Red Chainer
    • A deep learning framework: a port of Python's Chainer to Ruby
    • Uses Numo::NArray for holding and computing matrices
    • One of the projects under development in Red Data Tools
  38. Red Data Tools
    • A project providing data processing tools for Ruby
    • Launched by @ktou in February 2017
    • red-arrow, red-datasets, csv gem maintenance, etc.
  39. Red Data Tools’s Policy
    1. Collaborate across the Ruby community
    2. Acting rather than blaming
    3. Continuous, iterative progress rather than a short, big project
    4. The current lack of knowledge doesn't matter
    5. Ignore criticism from outsiders
    6. Fun!
  40. Features of Red Chainer
    1. Define-by-Run
    2. Provide high level API
    3. Can be constructed like Ruby
    4. OSS Project
  41. Define-by-Run
    • Define and Run
      • Build a calculation graph first, then run data through it
    • Define by Run
      • Build a calculation graph while data flows through it
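A minimal sketch of the define-by-run idea in plain Ruby follows. `ToyVariable` is a hypothetical class written for this illustration and is not the Red Chainer API. Each arithmetic operation computes its value immediately and records which variables produced it, so the graph is built as the data flows, and ordinary Ruby control flow (here a `times` loop) decides its shape.

```ruby
# Hypothetical minimal define-by-run variable; not the Red Chainer API.
class ToyVariable
  attr_reader :value, :parents
  attr_accessor :grad

  def initialize(value, parents = [])
    @value = value
    @parents = parents # pairs of [parent variable, local derivative]
    @grad = 0.0
  end

  # Each operation computes its result now and records how it was made.
  def *(other)
    ToyVariable.new(value * other.value, [[self, other.value], [other, value]])
  end

  def +(other)
    ToyVariable.new(value + other.value, [[self, 1.0], [other, 1.0]])
  end

  # Walk the recorded graph backwards, accumulating gradients by the chain rule.
  def backward(upstream = 1.0)
    @grad += upstream
    parents.each { |parent, local| parent.backward(upstream * local) }
  end
end

x = ToyVariable.new(3.0)
y = ToyVariable.new(1.0)

# The loop runs real computations; the graph is whatever was executed.
3.times { y = y * x }
z = y + x # z = x**3 + x

z.backward
puts z.value
puts x.grad
```

Because the graph is just a record of what ran, changing the loop count changes the graph without any separate graph-definition step. The recursive `backward` is kept simple for readability and revisits shared nodes once per path, so it is only suitable for tiny graphs like this one.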
  42. Provide high level API
    • 2D Convolution
    • BatchNormalization
    • Linear
    • ReLU
    • Sigmoid
    • Softmax
    • Dropout
    • etc…
  43. Can be constructed like Ruby

  44. OSS Project
    red-data-tools/red-chainer
    • You can see the source code at any time
    • You can join development wherever there is something you want to modify or an API you want to add
  45. By having Red Chainer
    [Figure: Application, Deep Learning, Red Chainer]

  46. DEMO
    • Identify CIFAR-10 (a dataset of 32x32 images) with Red Chainer using a CNN
    • Visualize the accuracy of each epoch with Rails, using a graph and the identified images
  47. Future of Red Chainer
    • GPU compatible: sonots/cumo
      • Fast Numerical Computing and Deep Learning in Ruby with Cumo: http://rubykaigi.org/2018/presentations/sonots.html#may31
    • Support Apache Arrow
    • Develop around Red Chainer
      • red-datasets: provides common datasets
      • red-arrow: Apache Arrow Ruby binding
  48. Summary
    • Introduced Red Chainer, a deep learning framework written in Ruby
    • Interested in Red Data Tools or Red Chainer?
      • online
        • en: https://gitter.im/red-data-tools/en
        • ja: https://gitter.im/red-data-tools/ja
      • offline
        • A meetup is held every month at Speee, Inc. in Tokyo
        • https://speee.connpass.com/
    • I’m at the Speee booth at RubyKaigi 2018
  49. Overview of the current status of Ruby’s data science support

  50. The current status of Ruby’s data science support
    ‣ Red Arrow
    ‣ CRuby’s updates for data science
    ‣ SciRuby GSoC
    ‣ RubyData Workshop in RubyKaigi 2018
  51. Red Arrow
    ‣ Ruby binding of Apache Arrow
    ‣ It has become an official Ruby binding of Apache Arrow
    ‣ https://github.com/apache/arrow/tree/master/ruby
  52. 2 updates in CRuby for data science
    ‣ Enumerator::ArithmeticSequence was accepted
    ‣ Range#% was accepted
  53. Enumerator::ArithmeticSequence
    ‣ We will have an object that works like a slice object in Python
    ‣ Integer#step and Range#step return such an object
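A short illustration of the object described on this slide, assuming Ruby 2.6 or later, the release in which Enumerator::ArithmeticSequence shipped. Unlike a plain enumerator, it exposes its begin, end, and step, which is what makes it usable as a slice description.

```ruby
# Integer#step with a limit and a step returns an ArithmeticSequence.
seq = 1.step(10, 3)
puts seq.class
p seq.to_a

# Range#step on a numeric range returns one as well, and the
# components of the sequence can be read back from the object.
range_seq = (0..20).step(5)
parts = [range_seq.begin, range_seq.end, range_seq.step]
p parts
p range_seq.exclude_end?
```

A library such as an array or data frame implementation can read `begin`, `end`, and `step` directly instead of enumerating every index.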
  54. Range#%
    ‣ An alias to Range#step
    ‣ A range with step can be written as (1...10) % 2
    ‣ It may be very useful in Numo::NArray, NMatrix, Daru::DataFrame, Arrow::Table, etc.
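The notation can be tried as follows, again assuming Ruby 2.6 or later. The `slice` helper is a hypothetical stand-in written for this sketch to show how a container might accept such an object as an index; it is not a method of Numo::NArray or any of the other libraries named on the slide.

```ruby
# Range#% builds a stepped range, equivalent to Range#step.
evens = ((0..10) % 2).to_a
odds = ((1...10) % 2).to_a
p evens
p odds

# Hypothetical helper: pick the elements addressed by a stepped range.
def slice(array, seq)
  seq.map { |i| array[i] }
end

picked = slice(%w[a b c d e f], (0...6) % 2)
p picked
```

The parentheses around the range are required because `%` binds more tightly than `..` and `...`.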
  55. SciRuby GSoC
    In GSoC 2018, SciRuby accepted 5 students, and the following 4 projects are running:
    • Business Intelligence with daru
    • Advanced features in daru-views
    • NetworkX.rb: Ruby version of NetworkX
    • Ruby version of matplotlib
    The discussions are being held on RubyData’s discourse: https://discourse.ruby-data.org/c/gsoc/gsoc2018
  56. RubyData Workshop in RubyKaigi 2018
    ‣ 3:50pm tomorrow in Room Shirakashi
    ‣ After afternoon break
    ‣ Contents
      • Data analysis with Ruby’s data tools
      • Data analysis with pycall and Python data tools
      • Introduction of Red Data Tools project
  57. Talk Summary

  58. Talk summary
    ‣ The development of high-level deep learning frameworks in Ruby is progressing day by day
    ‣ You will be able to do not only deep learning, but also GPGPU and distributed computation with these frameworks
    ‣ The development of tools for generic data science is also progressing day by day
    ‣ You can join these development projects
  59. Links
    mxnet.rb
    ‣ https://github.com/mrkn/mxnet.rb
    Red Chainer
    ‣ https://github.com/red-data-tools/red-chainer
    Red Data Tools
    ‣ http://red-data-tools.github.io/
    SciRuby GSoC
    ‣ https://discourse.ruby-data.org/c/gsoc/gsoc2018