Deep Learning Programming on Ruby

Kenta Murata, Yusaku Hatanaka
RubyKaigi 2018, 31 May 2018
https://www.flickr.com/photos/53416677@N08/4972916707/

Contents
1. About us
2. Introduction of this session
3. Deep learning programming on Ruby
   3.1. mxnet.rb
   3.2. Red Chainer
4. Overview of Ruby’s current data science support
5. Summary of this talk

About us (1)
• Kenta Murata (@mrkn)
• Full-time CRuby committer at Speee, Inc.
• bigdecimal, enumerable-statistics, pycall.rb, mxnet.rb, etc.
• Ruby, C/C++, Python, Julia, etc.
• neovim, vscode (neovim client)

About us (2)
• Yusaku Hatanaka (@hatappi)
• Speee, Inc.
• Red Data Tools member
• Ruby, Go, TypeScript, etc.
• I love soybeans

Session Introduction

Deep Learning  in Ruby

There are several approaches
‣ mxnet.rb  https://github.com/mrkn/mxnet.rb
‣ Red Chainer  https://github.com/red-data-tools/red-chainer
‣ Tensorflow.rb  https://github.com/somaticio/tensorflow.rb
‣ TensorStream  https://github.com/jedld/tensor_stream

Topics in this session
‣ mxnet.rb  https://github.com/mrkn/mxnet.rb
‣ Red Chainer  https://github.com/red-data-tools/red-chainer
‣ Tensorflow.rb  https://github.com/somaticio/tensorflow.rb
‣ TensorStream  https://github.com/jedld/tensor_stream

mxnet.rb

What is mxnet.rb?
‣ A Ruby binding library for Apache MXNet
‣ In development since Nov 2017
‣ You can write deep learning programs in Ruby by using mxnet.rb and the MXNet runtime library
‣ It doesn’t depend on the Python runtime
‣ You need only Ruby
‣ But `pip install mxnet` is currently the easiest way to install the MXNet runtime library

Why I am writing mxnet.rb
‣ I want to write deep learning programs in Ruby
‣ Without depending on Python (pycall.rb)
‣ There is Tensorflow.rb, but I don’t want to use the Tensorflow C API
‣ I think Apache MXNet is the best fit for Ruby

Why is MXNet the best fit for Ruby?
‣ It already supports multiple languages
‣ Many stakeholders support MXNet development
‣ It has some good features that can compete with other frameworks

Multi-language support
‣ Not only Python and C/C++
‣ But also Julia, R, JavaScript, Perl, Matlab, Scala, Go
‣ Ruby will be supported soon (I’m working on it)

Companies that support Apache MXNet https://mxnet.incubator.apache.org/community/powered_by.html

Academic organizations that support Apache MXNet

Good Features
‣ Multiple programming paradigms for deep learning
‣ Lower memory consumption than other frameworks
‣ Efficient multi-GPU computation
‣ Multi-node computation
‣ An Apache incubator project
‣ ONNX support

Multiple programming paradigms for deep learning
‣ Imperative style
‣ Symbolic style
‣ Hybrid style

Imperative style
# Computation is executed step by step
a = MXNet::NDArray.ones([10])
b = MXNet::NDArray.ones([10]) * 2
c = b * a
d = c + 1

Symbolic style
# First generate computation graphs
a = MXNet::Symbol.var(:a)
b = MXNet::Symbol.var(:b)
c = b * a   # generate a computation graph
d = c + 1   # ditto
# Execute the computation graph
d.eval(a: MXNet::NDArray.ones([10]),
       b: MXNet::NDArray.ones([10]) * 2)
[Figure: computation graph in which a and b feed * to give c, and c and 1 feed + to give d]

Imperative vs Symbolic
• Imperative programs tend to be more flexible
  • We can write loops directly in the syntax of the programming language, e.g. while, until, loop { … }, each { … }, etc.
• Symbolic programs tend to be more efficient
  • The framework can optimize memory usage automatically
  • The framework can optimize the order of computation
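
The contrast can be sketched in plain Ruby without MXNet. This is only a toy under stated assumptions: the `Sym` class and its `evaluate` method are hypothetical names invented for this illustration and are not part of mxnet.rb. The imperative half computes each value immediately, so a Ruby loop can depend on the running result. The symbolic half only records operations and computes nothing until inputs are bound.

```ruby
# Toy illustration only: Sym is a hypothetical class, not part of mxnet.rb.

# Imperative style: every statement computes a value immediately, so
# ordinary Ruby control flow can inspect intermediate results.
x = 1.0
x *= 2 while x < 100 # the loop condition depends on the running value
imperative_result = x

# Symbolic style: operators only build a graph; nothing is computed
# until evaluate is called with concrete inputs.
class Sym
  attr_reader :op, :args

  def initialize(op, *args)
    @op = op
    @args = args
  end

  def self.var(name)
    new(:var, name)
  end

  # Wrap plain Ruby numbers as constant nodes.
  def self.wrap(value)
    value.is_a?(Sym) ? value : new(:const, value)
  end

  def *(other)
    Sym.new(:mul, self, Sym.wrap(other))
  end

  def +(other)
    Sym.new(:add, self, Sym.wrap(other))
  end

  # Walk the graph recursively, substituting the bound inputs.
  def evaluate(bindings)
    case op
    when :var   then bindings.fetch(args[0])
    when :const then args[0]
    when :mul   then args[0].evaluate(bindings) * args[1].evaluate(bindings)
    when :add   then args[0].evaluate(bindings) + args[1].evaluate(bindings)
    end
  end
end

a = Sym.var(:a)
b = Sym.var(:b)
d = b * a + 1 # builds a graph, computes nothing yet
symbolic_result = d.evaluate(a: 1.0, b: 2.0)
```

Because the whole expression exists as data before it runs, a framework is free to rewrite it before execution, which is where the efficiency of the symbolic style comes from.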

Computational graph optimization example
[Figure: the graph c = a * b, d = c + 1 is rewritten into a single operator d = op(a, b), where op = a * b + 1; the intermediate node c is removed]
This optimization reduces both computation steps and memory consumption.
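
The effect of removing the intermediate node can be imitated with plain Ruby arrays. This is a sketch only: a real framework performs the rewrite on its internal graph, and the method names `unfused` and `fused` are made up for this example.

```ruby
# Toy illustration with plain Ruby arrays, not MXNet code.

a = Array.new(10, 1.0)
b = Array.new(10, 2.0)

# Unfused: two steps, and the intermediate array c is materialized.
def unfused(a, b)
  c = a.zip(b).map { |x, y| x * y } # c = a * b
  c.map { |v| v + 1 }               # d = c + 1
end

# Fused: one pass computes a * b + 1 per element, so c never exists.
def fused(a, b)
  a.each_with_index.map { |x, i| x * b[i] + 1 }
end

d_unfused = unfused(a, b)
d_fused = fused(a, b)
```

Both methods return the same values; the fused version simply skips one full-size temporary and one pass over the data.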

Hybrid style
• Mixes both imperative and symbolic styles
• In deep learning programming
  • Imperative style is helpful for writing parameter update routines
  • Gradient calculation should be performed symbolically
• In MXNet, the Gluon API supports hybrid style programming

MXNet vs Tensorflow
• Investigated by Julien Simon  https://medium.com/@julsimon/keras-shoot-out-tensorflow-vs-mxnet-51ae2b30a9c0
• Using Keras to compare them
  • Both MXNet and Tensorflow can be used as backends of Keras
• Three metrics
  • Precision, Speed, and Memory consumption

Multi-node computation
• You can use MXNet as a framework for distributed scientific computation
• It uses a Key-Value Store to exchange parameters among the threads on each machine
• For example:
  • Distributed model training  https://mxnet.incubator.apache.org/versions/master/faq/distributed_training.html
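
The idea of exchanging parameters through a key-value store can be sketched in plain Ruby. Everything here is an assumption made for illustration: `ToyKVStore` and its `init`, `push`, and `pull` methods are hypothetical and run inside one process, with threads standing in for workers. It is not the MXNet KVStore API.

```ruby
# Toy in-process sketch; ToyKVStore is hypothetical, not the MXNet API.
class ToyKVStore
  def initialize(learning_rate:)
    @learning_rate = learning_rate
    @params = {}
    @lock = Mutex.new
  end

  # Register the initial value of a parameter under a key.
  def init(key, value)
    @lock.synchronize { @params[key] = value.dup }
  end

  # A worker pushes its local gradient; the store applies one SGD step.
  def push(key, grad)
    @lock.synchronize do
      @params[key] = @params[key].zip(grad).map { |w, g| w - @learning_rate * g }
    end
  end

  # A worker pulls the latest shared value.
  def pull(key)
    @lock.synchronize { @params[key].dup }
  end
end

store = ToyKVStore.new(learning_rate: 0.5)
store.init(:w, [1.0, 1.0])

# Four threads stand in for workers on different machines.
workers = 4.times.map do
  Thread.new { store.push(:w, [0.5, 1.0]) }
end
workers.each(&:join)

shared = store.pull(:w)
```

The mutex makes each update atomic, so the final value does not depend on the order in which the workers happen to push.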

MXNet is an Apache incubator project
• There are a lot of tools for data science under the Apache Software Foundation
  • Arrow
  • Hadoop
  • Kudu
  • Spark
  • etc.

ONNX
• Open Neural Network Exchange Format
• Founded by Microsoft and Facebook
• We can exchange trained models between different frameworks via ONNX
  • e.g. we can use Python and Keras for experiments, and Ruby and MXNet for production

Frameworks that support exchanging models via ONNX
• MXNet
• PyTorch
• Chainer
• Caffe2
• Tensorflow
• etc.

Current project status of mxnet.rb
2 developers
‣ Me (@mrkn)
  • Conference-driven development
  • Currently focusing on the Gluon API
‣ Laurent Julliard (@ljulliar)
  • Currently focusing on the coverage of the NDArray API
Future plan
‣ I want to achieve 100% feature coverage

We want more developers
‣ Pull requests are welcome
‣ I’ll make feature tables and some milestones so that you can find something to contribute more easily

DEMO

require 'mxnet'

module MLPScratch
  ND = MXNet::NDArray

  # Multilayer perceptron written from scratch on top of NDArray.
  class MLP
    def initialize(num_inputs: 784, num_outputs: 10,
                   num_hidden_units: [256, 128, 64], ctx: nil)
      @layer_dims = [num_inputs, *num_hidden_units, num_outputs]
      @weight_scale = 0.01
      @ctx = ctx || MXNet::Context.default
      @all_parameters = init_parameters
    end

    attr_reader :ctx, :all_parameters, :layer_dims

    # Draw an array of the given shape from a scaled normal distribution.
    private def rnorm(shape)
      ND.random_normal(shape: shape, scale: @weight_scale, ctx: @ctx)
    end

    # Create one weight matrix and one bias vector per layer and
    # attach gradient buffers to all of them.
    private def init_parameters
      @weights = []
      @biases = []
      @layer_dims.each_cons(2) do |dims|
        @weights << rnorm(dims)
        @biases << rnorm([dims[1]])
      end
      [*@weights, *@biases].each(&:attach_grad)
    end

    private def relu(x)
      ND.maximum(x, ND.zeros_like(x))
    end

    # Returns the output of the last linear layer (before softmax).
    def forward(x)
      h = x
      n = @layer_dims.length
      (n - 2).times do |i|
        h_linear = ND.dot(h, @weights[i]) + @biases[i]
        h = relu(h_linear)
      end
      ND.dot(h, @weights[-1]) + @biases[-1]
    end

    private def softmax_cross_entropy(y_hat_linear, t)
      -ND.nansum(t * ND.log_softmax(y_hat_linear), axis: 0, exclude: true)
    end

    def loss(y_hat_linear, t)
      softmax_cross_entropy(y_hat_linear, t)
    end

    def predict(x)
      y_hat_linear = forward(x)
      ND.argmax(y_hat_linear, axis: 1)
    end
  end

  module_function

  # Plain stochastic gradient descent, updating each parameter in place.
  def SGD(params, lr)
    params.each do |param|
      param[0..-1] = param - lr * param.grad
    end
  end

  def evaluate_accuracy(data_iter, model)
    num, den = 0.0, 0.0
    data_iter.each_with_index do |batch, i|
      data = batch.data[0].as_in_context(model.ctx)
      data = data.reshape([-1, model.layer_dims[0]])
      label = batch.label[0].as_in_context(model.ctx)
      predictions = model.predict(data)
      num += ND.sum(predictions == label)
      den += data.shape[0]
    end
    (num / den).as_scalar
  end

  def learning_loop(train_iter, test_iter, model,
                    epochs: 10, learning_rate: 0.001, smoothing_constant: 0.01)
    epochs.times do |e|
      start = Time.now
      cumloss = 0.0
      num_batches = 0
      train_iter.each_with_index do |batch, i|
        data = batch.data[0].as_in_context(model.ctx)
        data = data.reshape([-1, model.layer_dims[0]])
        label = batch.label[0].as_in_context(model.ctx)
        label_one_hot = ND.one_hot(label, depth: model.layer_dims[-1])
        # Record the forward pass so that gradients can be computed.
        loss = MXNet::Autograd.record do
          y = model.forward(data)
          model.loss(y, label_one_hot)
        end
        loss.backward
        SGD(model.all_parameters, learning_rate)
        # Accumulate the loss over all batches of the epoch.
        cumloss += ND.sum(loss).as_scalar
        num_batches += 1
      end
      test_acc = evaluate_accuracy(test_iter, model)
      train_acc = evaluate_accuracy(train_iter, model)
      duration = Time.now - start
      puts "Epoch #{e}. Loss: #{cumloss / (train_iter.batch_size * num_batches)}, " +
           "train-acc: #{train_acc}, test-acc: #{test_acc} (#{duration} sec)"
    end
  end
end
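
The loss used in this demo, softmax cross-entropy, can be worked through for a single sample in plain Ruby. This is a sketch for illustration, not mxnet.rb code: the helper names below are chosen here, and the input values are arbitrary.

```ruby
# Plain-Ruby illustration of softmax cross-entropy for one sample.

# Log of the softmax probabilities, computed stably by subtracting the max.
def log_softmax(logits)
  max = logits.max
  log_sum = Math.log(logits.sum { |z| Math.exp(z - max) })
  logits.map { |z| z - max - log_sum }
end

# Cross-entropy against a one-hot target: minus the log-probability
# assigned to the correct class.
def softmax_cross_entropy(logits, one_hot)
  -one_hot.zip(log_softmax(logits)).sum { |t, lp| t * lp }
end

logits  = [2.0, 1.0, 0.1] # arbitrary network outputs for three classes
one_hot = [1.0, 0.0, 0.0] # the correct class is class 0
sample_loss = softmax_cross_entropy(logits, one_hot)
```

For these inputs the loss is roughly 0.417, the negative log of the probability (about 0.659) that softmax assigns to class 0.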

Summary of mxnet.rb
‣ MXNet is a deep learning framework that is well suited to being supported in Ruby
‣ mxnet.rb is under development, but some APIs are already usable
‣ Contact me if you want to join the development

Red Chainer

Red Chainer
• A deep learning framework: a port of Python's Chainer to Ruby
• Uses Numo::NArray for holding and computing matrices
• One of the projects under development in Red Data Tools

Red Data Tools
• A project that provides data processing tools for Ruby
• Launched by @ktou in February 2017
• red-arrow, red-datasets, csv gem maintenance, etc.

Red Data Tools’s Policy
1. Collaborate across the Ruby community
2. Acting rather than blaming
3. Continuous, iterative progress rather than a short, big project
4. The current lack of knowledge doesn't matter
5. Ignore criticism from outsiders
6. Fun!

Features of Red Chainer
1. Define-by-Run
2. Provides a high-level API
3. Can be constructed like Ruby
4. OSS project

Define-by-Run
• Define and Run
  • Build the computation graph first, then run data through it
• Define by Run
  • Build the computation graph while the data flows through it
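
The define-by-run idea can be sketched in a few lines of plain Ruby. The `Var` class below is hypothetical and written only for this illustration; it is not Red Chainer's API. Each arithmetic operation records its inputs while it runs, so the graph is whatever the Ruby code happened to execute, loops included.

```ruby
# Toy define-by-run autograd; Var is hypothetical, not Red Chainer's API.
class Var
  attr_reader :value, :parents
  attr_accessor :grad

  def initialize(value, parents = [])
    @value = value
    @parents = parents # list of [parent_var, local_gradient] pairs
    @grad = 0.0
  end

  def *(other)
    Var.new(value * other.value, [[self, other.value], [other, value]])
  end

  def +(other)
    Var.new(value + other.value, [[self, 1.0], [other, 1.0]])
  end

  # Propagate gradients back along the graph recorded in the forward run.
  def backward(upstream = 1.0)
    @grad += upstream
    parents.each { |parent, local| parent.backward(upstream * local) }
  end
end

x = Var.new(3.0)
y = x
# Ordinary Ruby control flow decides the shape of the graph at run time.
2.times { y = y * x } # y = x**3
y = y + x             # y = x**3 + x
y.backward            # x.grad now holds dy/dx = 3 * x**2 + 1
```

No separate graph definition step exists here: changing the loop count changes the recorded graph, which is the flexibility that define-by-run frameworks offer.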

Provides a high-level API
• 2D Convolution
• BatchNormalization
• Linear
• ReLU
• Sigmoid
• Softmax
• Dropout
• etc…

Can be constructed like Ruby

OSS Project red-data-tools/red-chainer
• You can see the source code at any time
• You can join the development wherever there is something you want to modify or an API you want to add

By having Red Chainer
[Figure with labels: Application, Deep Learning, Red Chainer]

DEMO
• Classify CIFAR-10 (a dataset of 32x32 images) with a CNN in Red Chainer
• Visualize the accuracy of each epoch and the classified images with graphs in a Rails app

Future of Red Chainer
• GPU support: sonots/cumo
  • Fast Numerical Computing and Deep Learning in Ruby with Cumo  http://rubykaigi.org/2018/presentations/sonots.html#may31
• Support Apache Arrow
• Develop the tools around Red Chainer
  • red-datasets: provides common datasets
  • red-arrow: Apache Arrow Ruby binding

Summary
• Introduced Red Chainer, a deep learning framework written in Ruby
• If you are interested in Red Data Tools or Red Chainer:
  • online
    • en: https://gitter.im/red-data-tools/en
    • ja: https://gitter.im/red-data-tools/ja
  • offline
    • A meetup is held every month at Speee, Inc. in Tokyo
    • https://speee.connpass.com/
• I’m at the Speee booth at RubyKaigi 2018

Overview of the current status of Ruby’s data science support

The current status of Ruby’s data science support
‣ Red Arrow
‣ CRuby’s updates for data science
‣ SciRuby GSoC
‣ RubyData Workshop in RubyKaigi 2018

Red Arrow
‣ Ruby binding of Apache Arrow
‣ It has become an official Ruby binding of Apache Arrow
‣ https://github.com/apache/arrow/tree/master/ruby

2 updates in CRuby for data science
‣ Enumerator::ArithmeticSequence was accepted
‣ Range#% was accepted

Enumerator::ArithmeticSequence
‣ We will have an object that works like a slice object in Python
‣ Integer#step and Range#step return such an object
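
A short sketch of what this looks like in practice, assuming Ruby 2.6 or later (the first release that ships Enumerator::ArithmeticSequence). The variable names are arbitrary.

```ruby
# Requires Ruby 2.6 or later.

seq = 1.step(10, 3)  # Integer#step without a block
rng = (0..8).step(2) # Range#step without a block

# Like a Python slice, the object exposes its start, stop, and stride.
bounds = [seq.begin, seq.end, seq.step]

# It is still an Enumerator, so it can be expanded into concrete values.
seq_values = seq.to_a
rng_values = rng.to_a
```

A library such as an array or data frame implementation can read `begin`, `end`, and `step` directly instead of enumerating the values, which is what makes the object usable as a slice description.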

Range#%
‣ An alias to Range#step
‣ A range with a step can be written as (1...10) % 2
‣ It may be very useful in Numo::NArray, NMatrix, Daru::DataFrame, Arrow::Table, etc.
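
A small example of the operator, assuming Ruby 2.6 or later. The index-picking line at the end only imitates, with a plain Array, the kind of strided access that a numerical library could offer; it is not the API of any of the libraries named above.

```ruby
# Requires Ruby 2.6 or later.

stepped = (1...10) % 2 # same as (1...10).step(2)
indices = stepped.to_a

# Imitate strided indexing on a plain Array by mapping over the indices.
data = (0...20).map { |i| i * 10 }
picked = stepped.map { |i| data[i] }
```

Here `stepped` is an Enumerator::ArithmeticSequence, so a library that accepts it as an index can read the stride without expanding the range.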

SciRuby GSoC
In GSoC 2018, SciRuby accepted 5 students, and the following 4 projects are running:
• Business Intelligence with daru
• Advanced features in daru-views
• NetworkX.rb: Ruby version of NetworkX
• Ruby version of matplotlib
The discussions are being held on RubyData’s discourse  https://discourse.ruby-data.org/c/gsoc/gsoc2018

RubyData Workshop in RubyKaigi 2018
‣ 3:50pm tomorrow in Room Shirakashi
‣ After the afternoon break
‣ Contents
  • Data analysis with Ruby’s data tools
  • Data analysis with pycall and Python data tools
  • Introduction of Red Data Tools project

Talk Summary

Talk summary
‣ The development of high-level deep learning frameworks in Ruby is progressing day by day
‣ With these frameworks you will be able to do not only deep learning, but also GPGPU and distributed computation
‣ The development of tools for general data science is also progressing day by day
‣ You can join these development projects

Links
mxnet.rb ‣ https://github.com/mrkn/mxnet.rb
Red Chainer ‣ https://github.com/red-data-tools/red-chainer
Red Data Tools ‣ http://red-data-tools.github.io/
SciRuby GSoC ‣ https://discourse.ruby-data.org/c/gsoc/gsoc2018
