Slide 1

Deep Learning Programming on Ruby
Kenta Murata, Yusaku Hatanaka
RubyKaigi 2018, 31 May 2018
(Cover photo: https://www.flickr.com/photos/53416677@N08/4972916707/)

Slide 2

Contents
1. About us
2. Introduction of this session
3. Deep learning programming on Ruby
   3.1. mxnet.rb
   3.2. Red Chainer
4. Overview of Ruby's current data science support
5. Summary of this talk

Slide 3

About us (1)
• Kenta Murata (@mrkn)
• Full-time CRuby committer at Speee, Inc.
• bigdecimal, enumerable-statistics, pycall.rb, mxnet.rb, etc.
• Ruby, C/C++, Python, Julia, etc.
• neovim, vscode (neovim client)

Slide 4

About us (2)
• Yusaku Hatanaka (@hatappi)
• Speee, Inc.
• Red Data Tools member
• Ruby, Go, TypeScript, etc.
• I love soybeans

Slide 5

Session Introduction

Slide 6

Deep Learning in Ruby

Slide 7

There are several approaches
‣ mxnet.rb: https://github.com/mrkn/mxnet.rb
‣ Red Chainer: https://github.com/red-data-tools/red-chainer
‣ Tensorflow.rb: https://github.com/somaticio/tensorflow.rb
‣ TensorStream: https://github.com/jedld/tensor_stream

Slide 8

Topics in this session
‣ mxnet.rb: https://github.com/mrkn/mxnet.rb
‣ Red Chainer: https://github.com/red-data-tools/red-chainer
‣ Tensorflow.rb: https://github.com/somaticio/tensorflow.rb
‣ TensorStream: https://github.com/jedld/tensor_stream

Slide 9

mxnet.rb

Slide 10

What is mxnet.rb?
‣ A Ruby binding library for Apache MXNet
‣ In development since Nov 2017
‣ You can write deep learning programs in Ruby using mxnet.rb and the MXNet runtime library
‣ It does not depend on a Python runtime; you need only Ruby
‣ However, `pip install mxnet` is currently the easiest way to install the MXNet runtime library (see the sketch below)
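For illustration, a minimal smoke test once the runtime library and the gem are available. It uses only the NDArray calls that appear later in this talk; the installation comments are assumptions for illustration, not official instructions.

# Assumed setup (for illustration only):
#   $ pip install mxnet      # provides the libmxnet shared library
#   $ gem install mxnet      # or install mxnet.rb from https://github.com/mrkn/mxnet.rb
require 'mxnet'

a = MXNet::NDArray.ones([2, 3])        # a 2x3 array filled with 1.0
b = a * 2 + 1                          # element-wise arithmetic runs eagerly
puts MXNet::NDArray.sum(b).as_scalar   # => 18.0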

Slide 11

Why I wrote mxnet.rb
‣ I want to write deep learning programs in Ruby
‣ Without depending on Python (even via pycall.rb)
‣ Tensorflow.rb exists, but I don't want to use the Tensorflow C API
‣ I think Apache MXNet is the best fit for Ruby

Slide 12

Why is MXNet the best fit for Ruby?
‣ It already supports multiple languages
‣ Many stakeholders support MXNet development
‣ It has good features that compete with other frameworks

Slide 13

Multi-language support
‣ Not only Python and C/C++
‣ But also Julia, R, JavaScript, Perl, Matlab, Scala, and Go
‣ Ruby will be supported soon (I'm working on it)

Slide 14

Companies that support Apache MXNet https://mxnet.incubator.apache.org/community/powered_by.html

Slide 15

Academic organizations that support Apache MXNet

Slide 16

Good Features
‣ Multiple programming paradigms for deep learning
‣ Lower memory consumption than other frameworks
‣ Efficient multi-GPU computation
‣ Multi-node computation
‣ An Apache incubator project
‣ ONNX support

Slide 17

Good Features
‣ Multiple programming paradigms for deep learning
‣ Lower memory consumption than other frameworks
‣ Efficient multi-GPU computation
‣ Multi-node computation
‣ An Apache incubator project
‣ ONNX support

Slide 18

Multiple programming paradigms for deep learning
‣ Imperative style
‣ Symbolic style
‣ Hybrid style

Slide 19

Imperative style

# Computation is executed step by step
a = MXNet::NDArray.ones([10])
b = MXNet::NDArray.ones([10]) * 2
c = b * a
d = c + 1

Slide 20

Symbolic style

# first build the computation graph
a = MXNet::Symbol.var(:a)
b = MXNet::Symbol.var(:b)
c = b * a  # generates a computation graph
d = c + 1  # ditto
# then execute the computation graph
d.eval(a: MXNet::NDArray.ones([10]),
       b: MXNet::NDArray.ones([10]) * 2)

(Diagram: graph with inputs a and b, node c = b * a, node d = c + 1)

Slide 21

Imperative vs. Symbolic
• Imperative programs tend to be more flexible
  • They let us write loops directly in the syntax of the programming language, e.g. while, until, loop { … }, each { … }, etc. (see the sketch below)
• Symbolic programs tend to be more efficient
  • They can optimize memory usage automatically
  • They can optimize the computation order
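As a small illustration of that flexibility, here is a minimal sketch that drives NDArray computation from ordinary Ruby control flow; it only uses the NDArray operations shown on the previous slides.

require 'mxnet'

x = MXNet::NDArray.ones([10])
5.times do
  x = x * 2 + 1        # each statement executes immediately, so plain Ruby loops just work
end
puts MXNet::NDArray.sum(x).as_scalar   # => 630.0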

Slide 22

Computational graph optimization example

(Diagram: the graph a, b → c = b * a → d = c + 1 is fused into a single op d = a * b + 1, removing the intermediate node c.)

This optimization reduces both computation steps and memory consumption.

Slide 23

Hybrid style
• Mixes the imperative and symbolic styles
• In deep learning programming:
  • The imperative style is helpful for writing parameter update routines
  • Gradient calculation should be performed symbolically
• In MXNet, the Gluon API supports hybrid-style programming (see the sketch below)
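A hypothetical sketch of what hybrid style could look like from Ruby. mxnet.rb's Gluon API was still being ported at the time of this talk, so the names below (MXNet::Gluon::NN::HybridSequential, Dense, #hybridize) are assumptions modeled on MXNet's Python Gluon API, not confirmed mxnet.rb API.

require 'mxnet'

# Assumed names mirroring Python's mxnet.gluon.nn; treat this as pseudocode.
net = MXNet::Gluon::NN::HybridSequential.new
net.with_name_scope do
  net.add(MXNet::Gluon::NN::Dense.new(256, activation: :relu))
  net.add(MXNet::Gluon::NN::Dense.new(10))
end
net.init
net.hybridize                    # switch from eager execution to a compiled symbolic graph

x = MXNet::NDArray.ones([1, 784])
y = net.call(x)                  # the same call site works before and after hybridize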

Slide 24

Good Features
‣ Multiple programming paradigms for deep learning
‣ Lower memory consumption than other frameworks
‣ Efficient multi-GPU computation
‣ Multi-node computation
‣ An Apache incubator project
‣ ONNX support

Slide 25

MXNet vs. Tensorflow
• Investigated by Julien Simon: https://medium.com/@julsimon/keras-shoot-out-tensorflow-vs-mxnet-51ae2b30a9c0
• Uses Keras to compare them; both MXNet and Tensorflow can be used as Keras backends
• Three metrics: precision, speed, and memory consumption

Slide 26

Good Features
‣ Multiple programming paradigms for deep learning
‣ Lower memory consumption than other frameworks
‣ Efficient multi-GPU computation
‣ Multi-node computation
‣ An Apache incubator project
‣ ONNX support

Slide 27

Multi-node computation
• You can use MXNet as a framework for distributed scientific computation
• It uses a key-value store to exchange parameters among the worker threads on each machine
• For example: distributed model training
  https://mxnet.incubator.apache.org/versions/master/faq/distributed_training.html

Slide 28

MXNet is an Apache incubator project
• There are many data science tools under the Apache Software Foundation:
  • Arrow
  • Hadoop
  • Kudu
  • Spark
  • etc.

Slide 29

ONNX
• Open Neural Network Exchange format
• Founded by Microsoft and Facebook
• ONNX lets us interchange trained models between different frameworks
• e.g. we can use Python and Keras for experiments, and Ruby and MXNet in production

Slide 30

Frameworks that can interchange models via ONNX
• MXNet
• PyTorch
• Chainer
• Caffe2
• Tensorflow
• etc.

Slide 31

Current project status of mxnet.rb

2 developers
‣ Me (@mrkn)
  • Conference-driven development
  • Currently focusing on the Gluon API
‣ Laurent Julliard (@ljulliar)
  • Currently focusing on the coverage of the NDArray API

Future plan
‣ I want to achieve 100% feature coverage

Slide 32

We want more developers
‣ We welcome your pull requests
‣ I'll prepare feature tables and some milestones so that you can find opportunities to contribute more easily

Slide 33

DEMO

(Live demo: the code shown here is the first half of the MLPScratch listing; the full listing appears on the next slide.)

Slide 34

require 'mxnet'

module MLPScratch
  ND = MXNet::NDArray

  class MLP
    def initialize(num_inputs: 784, num_outputs: 10,
                   num_hidden_units: [256, 128, 64], ctx: nil)
      @layer_dims = [num_inputs, *num_hidden_units, num_outputs]
      @weight_scale = 0.01
      @ctx = ctx || MXNet::Context.default
      @all_parameters = init_parameters
    end

    attr_reader :ctx, :all_parameters, :layer_dims

    private def rnorm(shape)
      ND.random_normal(shape: shape, scale: @weight_scale, ctx: @ctx)
    end

    private def init_parameters
      @weights = []
      @biases = []
      @layer_dims.each_cons(2) do |dims|
        @weights << rnorm(dims)
        @biases << rnorm([dims[1]])
      end
      [*@weights, *@biases].each(&:attach_grad)
    end

    private def relu(x)
      ND.maximum(x, ND.zeros_like(x))
    end

    def forward(x)
      h = x
      n = @layer_dims.length
      (n - 2).times do |i|
        h_linear = ND.dot(h, @weights[i]) + @biases[i]
        h = relu(h_linear)
      end
      y_hat_linear = ND.dot(h, @weights[-1]) + @biases[-1]
    end

    private def softmax_cross_entropy(y_hat_linear, t)
      -ND.nansum(t * ND.log_softmax(y_hat_linear), axis: 0, exclude: true)
    end

    def loss(y_hat_linear, t)
      softmax_cross_entropy(y_hat_linear, t)
    end

    def predict(x)
      y_hat_linear = forward(x)
      ND.argmax(y_hat_linear, axis: 1)
    end
  end

  module_function

  def SGD(params, lr)
    params.each do |param|
      param[0..-1] = param - lr * param.grad
    end
  end

  def evaluate_accuracy(data_iter, model)
    num, den = 0.0, 0.0
    data_iter.each_with_index do |batch, i|
      data = batch.data[0].as_in_context(model.ctx)
      data = data.reshape([-1, model.layer_dims[0]])
      label = batch.label[0].as_in_context(model.ctx)
      predictions = model.predict(data)
      num += ND.sum(predictions == label)
      den += data.shape[0]
    end
    (num / den).as_scalar
  end

  def learning_loop(train_iter, test_iter, model, epochs: 10,
                    learning_rate: 0.001, smoothing_constant: 0.01)
    epochs.times do |e|
      start = Time.now
      cumloss = 0.0
      num_batches = 0
      train_iter.each_with_index do |batch, i|
        data = batch.data[0].as_in_context(model.ctx)
        data = data.reshape([-1, model.layer_dims[0]])
        label = batch.label[0].as_in_context(model.ctx)
        label_one_hot = ND.one_hot(label, depth: model.layer_dims[-1])
        loss = MXNet::Autograd.record do
          y = model.forward(data)
          model.loss(y, label_one_hot)
        end
        loss.backward
        SGD(model.all_parameters, learning_rate)
        cumloss += ND.sum(loss).as_scalar
        num_batches += 1
      end
      test_acc = evaluate_accuracy(test_iter, model)
      train_acc = evaluate_accuracy(train_iter, model)
      duration = Time.now - start
      puts "Epoch #{e}. Loss: #{cumloss / (train_iter.batch_size * num_batches)}, " +
           "train-acc: #{train_acc}, test-acc: #{test_acc} (#{duration} sec)"
    end
  end
end
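A short usage sketch for the listing above, using only methods defined there; the input is a dummy batch rather than real MNIST data, since the slide does not show how the data iterators were constructed.

# Assumes the MLPScratch module above has been loaded.
model = MLPScratch::MLP.new(num_inputs: 784, num_outputs: 10,
                            num_hidden_units: [256, 128, 64])

x = MXNet::NDArray.ones([2, 784])   # a dummy mini-batch of two flattened 28x28 images
logits = model.forward(x)           # NDArray of shape [2, 10]
p model.predict(x)                  # NDArray of predicted class indices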

Slide 35

Summary of mxnet.rb
‣ MXNet is the deep learning framework best suited to supporting Ruby
‣ mxnet.rb is still under development, but some APIs are already usable
‣ Contact me if you want to join the development

Slide 36

Red Chainer

Slide 37

Red Chainer
• A deep learning framework: a port of Python's Chainer to Ruby
• Uses Numo::NArray to hold matrices and compute on them (see the example below)
• One of the projects under development in Red Data Tools
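For reference, a tiny Numo::NArray example of the kind of array Red Chainer builds on; it uses only the standard numo-narray gem API.

require 'numo/narray'

x = Numo::SFloat.new(2, 3).seq   # 2x3 single-precision matrix: [[0,1,2],[3,4,5]]
y = x * 2.0 + 1.0                # element-wise arithmetic
p y.sum                          # => 36.0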

Slide 38

Red Data Tools
• A project providing data processing tools for Ruby
• Launched by @ktou in February 2017
• red-arrow, red-datasets, csv gem maintenance, etc.

Slide 39

Red Data Tools’s Policy
1. Collaborate across the Ruby community
2. Acting rather than blaming
3. Continuous, iterative progress rather than a short, big project
4. The current lack of knowledge doesn't matter
5. Ignore criticism from outsiders
6. Fun!

Slide 40

Features of Red Chainer
1. Define-by-Run
2. Provides a high-level API
3. Models can be constructed in plain Ruby
4. OSS project

Slide 41

Define-by-Run
• Define-and-Run: build the computation graph first, then run data through it
• Define-by-Run: the computation graph is built as the data flows through the computation (see the sketch below)
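A minimal sketch of the Define-by-Run idea with Red Chainer, based on the API used in its MNIST example (Chainer::Variable, Chainer::Functions::Activation::Relu); exact method signatures in the version current at the time may differ.

require 'chainer'
require 'numo/narray'

x = Chainer::Variable.new(Numo::SFloat[[-1.0, 2.0], [3.0, -4.0]])
y = Chainer::Functions::Activation::Relu.relu(x)  # the graph node is recorded as this line executes
y.grad = Numo::SFloat.ones(2, 2)
y.backward                                        # gradients flow back through whatever graph was built
p x.grad                                          # 0 where the input was negative, 1 elsewhere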

Slide 42

Provides a high-level API
• 2D Convolution
• BatchNormalization
• Linear
• ReLU
• Sigmoid
• Softmax
• Dropout
• etc.

Slide 43

Can be constructed like Ruby
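For example, a network can be written as an ordinary Ruby class. The sketch below is adapted from Red Chainer's MNIST example; the exact constructor signatures (e.g. out_size:) are assumptions based on that example and may differ between versions.

require 'chainer'

class MLP < Chainer::Chain
  L = Chainer::Links::Connection::Linear
  F = Chainer::Functions::Activation::Relu

  def initialize(n_units, n_out)
    super()
    init_scope do
      @l1 = L.new(nil, out_size: n_units)   # input size is inferred on the first call
      @l2 = L.new(nil, out_size: n_units)
      @l3 = L.new(nil, out_size: n_out)
    end
  end

  def call(x)
    h1 = F.relu(@l1.(x))
    h2 = F.relu(@l2.(h1))
    @l3.(h2)
  end
end

model = MLP.new(100, 10)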

Slide 44

OSS Project: red-data-tools/red-chainer
• You can see the source code at any time
• You can join the development wherever you want to modify something or add an API

Slide 45

By having Red Chainer

(Diagram: Application, Deep Learning, and Red Chainer layers.)

Slide 46

DEMO
• Identify CIFAR-10 (32x32 image dataset) images with Red Chainer using a CNN
• Visualize the accuracy of each epoch, along with the identified images, in a graph with Rails

Slide 47

Future of Red Chainer
• GPU support: sonots/cumo
  • "Fast Numerical Computing and Deep Learning in Ruby with Cumo": http://rubykaigi.org/2018/presentations/sonots.html#may31
• Support Apache Arrow
• Development around Red Chainer:
  • red-datasets: provides common datasets
  • red-arrow: Apache Arrow Ruby binding

Slide 48

Summary
• Introduced Red Chainer, a deep learning framework created in Ruby
• Interested in Red Data Tools or Red Chainer?
  • Online:
    • en: https://gitter.im/red-data-tools/en
    • ja: https://gitter.im/red-data-tools/ja
  • Offline:
    • We hold a meetup every month at Speee, Inc. in Tokyo
    • https://speee.connpass.com/
• I'm at the Speee booth at RubyKaigi 2018

Slide 49

Overview of the current status of Ruby's data science support

Slide 50

The current status of Ruby's data science support
‣ Red Arrow
‣ CRuby's updates for data science
‣ SciRuby GSoC
‣ RubyData Workshop at RubyKaigi 2018

Slide 51

Slide 51 text

Red Arrow
‣ The Ruby binding of Apache Arrow
‣ It has become the official Ruby binding of Apache Arrow
‣ https://github.com/apache/arrow/tree/master/ruby

Slide 52

Slide 52 text

2 updates in CRuby for data science
‣ Enumerator::ArithmeticSequence was accepted
‣ Range#% was accepted

Slide 53

Enumerator::ArithmeticSequence
‣ We will have an object that works like a slice object in Python
‣ Integer#step and Range#step return such an object (example below)
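For example, in Ruby 2.6 (where this feature landed), calling #step without a block returns the sequence object:

seq = 1.step(10, 2)        # => (1.step(10, 2))
seq.class                  # => Enumerator::ArithmeticSequence
seq.to_a                   # => [1, 3, 5, 7, 9]
(1..10).step(3).to_a       # => [1, 4, 7, 10]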

Slide 54

Range#%
‣ An alias of Range#step
‣ A range with a step can be written as (1...10) % 2
‣ It may be very useful for Numo::NArray, NMatrix, Daru::DataFrame, Arrow::Table, etc. (example below)
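A short example of the new syntax (Ruby 2.6); the last line only illustrates the intended use as an index, and whether a given library accepts it is up to that library.

((1...10) % 2).to_a        # => [1, 3, 5, 7, 9]
# Intended use: pass the sequence as a slice-like index, e.g. (hypothetically)
#   narray[(1...10) % 2]   # every second element of a Numo::NArray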

Slide 55

SciRuby GSoC
In GSoC 2018, SciRuby accepted 5 students, and the following 4 projects are running:
• Business Intelligence with daru
• Advanced features in daru-view
• NetworkX.rb: a Ruby version of NetworkX
• A Ruby version of matplotlib
The discussions are being held on RubyData's Discourse: https://discourse.ruby-data.org/c/gsoc/gsoc2018

Slide 56

Slide 56 text

RubyData Workshop at RubyKaigi 2018
‣ 3:50pm tomorrow in Room Shirakashi (after the afternoon break)
‣ Contents:
  • Data analysis with Ruby's data tools
  • Data analysis with pycall and Python data tools
  • Introduction of the Red Data Tools project

Slide 57

Talk Summary

Slide 58

Talk summary
‣ The development of high-level deep learning frameworks in Ruby is progressing day by day
‣ With these frameworks you will be able to do not only deep learning, but also GPGPU and distributed computation
‣ The development of tools for general data science is also progressing day by day
‣ You can join these development projects

Slide 59

Links
mxnet.rb
‣ https://github.com/mrkn/mxnet.rb
Red Chainer
‣ https://github.com/red-data-tools/red-chainer
Red Data Tools
‣ http://red-data-tools.github.io/
SciRuby GSoC
‣ https://discourse.ruby-data.org/c/gsoc/gsoc2018