Slide 1

Slide 1 text

SciRuby Machine Learning: Current Status and Future
Kenta Murata
2016.09.09, Kyoto, Japan

Slide 2

Slide 2 text

self.introduce

Slide 3

Slide 3 text

@mrkn ✓ Kenta Murata ✓ CRuby committer ✓ Started contributing to SciRuby last year ✓ Recruit Holdings Co., Ltd., Media Technology Lab.

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

my gems ✓ bigdecimal ✓ daru-td ✓ iruby-rails ✓ enumerable-statistics

Slide 6

Slide 6 text

enumerable-statistics.gem ✓ Compute statistical summaries as fast and precisely as possible ‣Array#sum, Enumerable#sum (for Ruby < 2.4) ‣Array#mean, Enumerable#mean ‣Array#variance, Enumerable#variance ‣etc.
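For instance, a minimal usage sketch (the values in the comments follow from the input; the exact ddof default for variance is an assumption to verify against the gem's docs):

require 'enumerable/statistics'

data = [1, 2, 3, 4]
p data.sum      # => 10 (provided by the gem even on Ruby < 2.4)
p data.mean     # => 2.5
p data.variance # sample variance; the exact ddof default is an assumption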

Slide 7

Slide 7 text

enumerable-statistics.gem

Slide 8

Slide 8 text

Agenda ✓ Introduction ✓ Machine Learning ✓ SciRuby's Current Status ✓ Scikit-learn ✓ Future

Slide 9

Slide 9 text

Introduction

Slide 10

Slide 10 text

I want to ✓ Do machine learning with Ruby

Slide 11

Slide 11 text

Machine Learning w/ Ruby ✓ What does it mean?

Slide 12

Slide 12 text

NG ✓ Writing machine learning algorithms in Ruby

Slide 13

Slide 13 text

OK ✓ Performing data science work with Ruby

Slide 14

Slide 14 text

Data Science Workflow ✓ Collecting data ✓ Exploratory data analysis ✓ Cleansing data ✓ Integrating multiple data sources ✓ Preprocessing ✓ Making machine learning model ✓ Applying to real world

Slide 15

Slide 15 text

Machine Learning Related Processes ✓ Collecting data ✓ Exploratory data analysis ✓ Cleansing data ✓ Integrating multiple data sources ✓ Preprocessing ✓ Making machine learning model ✓ Applying to real world

Slide 16

Slide 16 text

How many things can be done with Ruby? ✓ Exploratory data analysis ✓ Cleansing data ✓ Integrating multiple data sources ✓ Preprocessing ✓ Making machine learning model

Slide 17

Slide 17 text

How many things can be done with Ruby? ✓ Exploratory data analysis ✓ Cleansing data ✓ Integrating multiple data sources ✓ Preprocessing ✓ Making machine learning model

Slide 18

Slide 18 text

How many things can be done in Ruby? ✓ Almost nothing ✓ Python can handle the whole workflow ✓ That's why everyone uses Python

Slide 19

Slide 19 text

Change the Current Situation ✓ Make Ruby usable for data science ✓ What is wrong now? ✓ I'll make that clear in this talk

Slide 20

Slide 20 text

First, the Most Important Thing in This Talk

Slide 21

Slide 21 text

Help!! ✓ Join SciRuby development ✓ A lot of issues are waiting for your contribution ✓ Discuss in Slack ✓ https://sciruby-slack.herokuapp.com

Slide 22

Slide 22 text

Machine Learning

Slide 23

Slide 23 text

Why do we use machine learning? ✓ We want to make business decisions from real data ✓ The use of machine learning algorithms is optional ✓ We need machine learning to drive our business with "big data"

Slide 24

Slide 24 text

Machine Learning can do ✓ Tasks impossible for humans ✓ Tasks whose solutions are difficult to program by hand ✓ Tasks where the way to solve them is unknown

Slide 25

Slide 25 text

For example ✓ Recommendation ✓ Outlier detection ✓ Sentiment analysis ✓ etc.

Slide 26

Slide 26 text

Machine Learning Problems ✓ Supervised learning ✓ Unsupervised learning ✓ Reinforcement learning

Slide 27

Slide 27 text

Supervised learning ✓ To learn a general rule that maps inputs to outputs from given example input-output pairs ✓ Two types of problems: ‣Classification - e.g. predicting tomorrow's weather ‣Regression - e.g. estimating tomorrow's expected highest temperature ✓ Example use cases: ‣Recommender systems ‣Sentiment analysis

Slide 28

Slide 28 text

Unsupervised learning ✓ To extract the structural features of the input data distribution ✓ Typical problem types: ‣Clustering ‣Density estimation ‣Dimensionality reduction ✓ Example use cases: ‣Exploratory data analysis ‣Outlier detection

Slide 29

Slide 29 text

Reinforcement learning ✓ To learn rules of decision making in a dynamic environment ✓ Typical problem types: ‣Multi-armed bandit problem ‣Adaptive scheduling ‣Automatic control ✓ Example use cases: ‣Shogi AI ‣Autonomous driving

Slide 30

Slide 30 text

Machine Learning Problems ✓ Supervised learning ✓ Unsupervised learning ✓ Reinforcement learning

Slide 31

Slide 31 text

This talk focuses on ✓ Supervised learning ✓ Unsupervised learning ✓ Reinforcement learning

Slide 32

Slide 32 text

SciRuby Machine Learning: Current Status

Slide 33

Slide 33 text

Existing Gems for Machine Learning ✓ liblinear-ruby.gem ✓ rb-libsvm.gem ✓ decisiontree.gem ✓ etc.

Slide 34

Slide 34 text

liblinear-ruby.gem example

require 'liblinear'

# model parameters
parameters = { solver_type: Liblinear::L2R_LR }

# labels of training data
labels = [-1, -1, 1, 1]

# training data
examples = [[-2, -2], [-1, -1], [1, 1], [2, 2]]

# train
model = Liblinear.train(parameters, labels, examples)

# predict (the result will be 1)
puts Liblinear.predict(model, [0.5, 0.5])

Slide 35

Slide 35 text

liblinear-ruby.gem features ✓ Just a wrapper of liblinear ✓ Logistic regression ‣Classification ✓ Linear SVC ‣Classification ✓ Linear SVR ‣Regression ✓ Cross validation

Slide 36

Slide 36 text

liblinear-ruby.gem example

require 'liblinear'
require 'enumerable/statistics' # provides Array#sum on Ruby < 2.4

# model parameters
parameters = { solver_type: Liblinear::L2R_LR }

# labels of training data
labels = [-1, -1, 1, 1]

# training data
examples = [[-2, -2], [-1, -1], [1, 1], [2, 2]]

# train
model = Liblinear.train(parameters, labels, examples)

# predict (the result will be 1)
puts Liblinear.predict(model, [0.5, 0.5])

# cross validation
fold = 5 # means 5-fold cross validation
results = Liblinear.cross_validation(fold, parameters, labels, examples)
accuracy = results.zip(labels).map {|a, b| a == b ? 1.0 : 0.0 }.sum / labels.length
puts "Cross validation accuracy: #{accuracy}"

Slide 37

Slide 37 text

rb-libsvm.gem features ✓ Just a wrapper of libsvm ✓ C-SVC, nu-SVC ‣Classification ✓ epsilon-SVR, nu-SVR ‣Regression ✓ One-class SVM ‣Unsupervised outlier detection ✓ Cross validation

Slide 38

Slide 38 text

rb-libsvm.gem example

require 'libsvm'
require 'enumerable/statistics'

# model parameters
parameter = Libsvm::SvmParameter.new
parameter.svm_type = Libsvm::SvmType::C_SVC
parameter.kernel_type = Libsvm::KernelType::RBF
parameter.cache_size = 1 # in megabytes
parameter.eps = 0.001
parameter.c = 10

# labels of training data
labels = [1, -1]

# training data
examples = [[1, 0, 1], [-1, 0, -1]].map {|xs| Libsvm::Node.features(xs) }

# train model
problem = Libsvm::Problem.new
problem.set_examples(labels, examples)
model = Libsvm::Model.train(problem, parameter)
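The slide's example stops after training; a prediction step, following the gem's Model#predict API (the expected label here is an assumption based on the training data), would look roughly like this:

# predict the label of a new point (close to the first training example)
puts model.predict(Libsvm::Node.features([0.5, 0, 0.5])) # expected: 1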

Slide 39

Slide 39 text

decisiontree.gem features ✓ ID3 decision tree ‣Classification only ✓ No parameter configuration ‣e.g. criterion, minimum samples per leaf, etc. ✓ No cross validation ✓ Pure Ruby implementation

Slide 40

Slide 40 text

decisiontree.gem usage

require 'decisiontree'

# feature names (each row below is [hunger, color, label])
feature_names = ['hunger', 'color']

# training data (last items are labels)
examples = [
  [8, 'red',  'angry'],
  [6, 'red',  'angry'],
  [7, 'red',  'angry'],
  [7, 'blue', 'not angry'],
  [2, 'red',  'not angry'],
  [3, 'blue', 'not angry'],
  [2, 'blue', 'not angry'],
  [1, 'red',  'not angry']
]

# train model ('not angry' is the default label)
tree = DecisionTree::ID3Tree.new(
  feature_names, examples, 'not angry',
  hunger: :continuous, color: :discrete
)
tree.train

# prediction (the gem takes a full row; the trailing label slot is ignored)
pred = tree.predict([7, 'red', 'angry'])
puts "Predicted: #{pred} (expected: angry)"

Slide 41

Slide 41 text

Etc. ✓ ai4r.gem ✓ classifier-reborn.gem ✓ data_mining.gem ✓ etc.

Slide 42

Slide 42 text

With Existing Gems ✓ Several machine learning algorithms are provided for classification, regression, clustering, etc. ✓ We must use each algorithm through its library-specific API, because the libraries all differ

Slide 43

Slide 43 text

Issues of Existing Gems ✓ Different ways to specify model parameters ✓ Different formats and handling of training data ✓ Many gems don't support cross validation ✓ Not for practical use because of their toy implementations

Slide 44

Slide 44 text

Real World Machine Learning

Slide 45

Slide 45 text

Real World Data ✓ Large amount of data ✓ High-dimensional features ✓ A lot of missing values

Slide 46

Slide 46 text

Machine Learning in the Real World ✓ We can't look at the whole data ✓ We can't know in advance which algorithms suit the given data ✓ We must try, compare, and combine as many algorithms as possible

Slide 47

Slide 47 text

Try, Compare, and Combine Multiple Algorithms ✓ Need to unify data formats ✓ Need to apply cross validation to all algorithms ✓ Need to unify the interfaces of algorithms to search for optimal hyperparameters and to combine algorithms (see the sketch below)
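For illustration, the kind of unified interface this calls for could look like the duck-typed Ruby sketch below; everything here (the NearestCentroid class, the fit/predict contract) is hypothetical, not an existing gem API:

# Hypothetical sketch: every model exposes the same fit/predict contract,
# so models become interchangeable in cross validation, grid search,
# and pipelines.
class NearestCentroid
  # a deliberately tiny "model", used only to illustrate the interface
  def fit(examples, labels)
    @centroids = examples.zip(labels).group_by(&:last).map do |label, pairs|
      columns = pairs.map(&:first).transpose
      [label, columns.map {|col| col.inject(:+).fdiv(col.length) }]
    end
    self
  end

  def predict(examples)
    examples.map do |x|
      @centroids.min_by {|_, c|
        c.zip(x).inject(0.0) {|s, (ci, xi)| s + (ci - xi)**2 }
      }.first
    end
  end
end

# any object responding to fit and predict would work here
model = NearestCentroid.new.fit([[-2, -2], [-1, -1], [1, 1], [2, 2]], [-1, -1, 1, 1])
p model.predict([[0.5, 0.5]]) # => [1]

With such a shared contract, one cross-validation or grid-search routine would work for every model, which is exactly what the existing gems' divergent APIs make impossible.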

Slide 48

Slide 48 text

In Current SciRuby ✓ We can't build practical machine learning systems with SciRuby ✓ Python can, with scikit-learn

Slide 49

Slide 49 text

Scikit-learn

Slide 50

Slide 50 text

What is scikit-learn? ✓ A machine learning framework for the SciPy stack

Slide 51

Slide 51 text

SciPy stack ✓ NumPy ‣Dense tensors ✓ SciPy ‣Scientific functions ‣Sparse matrices ✓ pandas ‣Data frames ✓ Matplotlib ‣Visualization infrastructure ✓ Jupyter Notebook ✓ Etc.

Slide 52

Slide 52 text

What is scikit-learn? ✓ A machine learning framework for the SciPy stack ✓ Python's machine learning standard

Slide 53

Slide 53 text

Scikit-learn is elegant ✓ Input data is a feature matrix and a label vector for all algorithms ✓ Input data can be any object compatible with NumPy's ndarray ✓ Machine learning models follow a unified interface

Slide 54

Slide 54 text

Logistic regression

from sklearn.linear_model import LogisticRegression
from sklearn.cross_validation import cross_val_score

# labels of training data
labels = [-1, -1, 1, 1]

# training data
examples = [[-2, -2], [-1, -1], [1, 1], [2, 2]]

# learning
classifier = LogisticRegression(penalty="l2")
classifier.fit(examples, labels)

# prediction
print(classifier.predict([[0.5, 0.5]]))

# 5-fold cross validation, scored by ROC AUC
# (a real dataset needs at least 5 samples per class for cv=5)
classifier = LogisticRegression(penalty="l2")
scores = cross_val_score(classifier, examples, labels, cv=5, scoring='roc_auc')
print("ROC AUC: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

Slide 55

Slide 55 text

dmlc/xgboost

import xgboost as xgb
from sklearn.cross_validation import cross_val_score

# labels of training data
labels = [-1, -1, 1, 1]

# training data
examples = [[-2, -2], [-1, -1], [1, 1], [2, 2]]

# learning
classifier = xgb.XGBClassifier()
classifier.fit(examples, labels)

# prediction
print(classifier.predict([[0.5, 0.5]]))

# 5-fold cross validation, scored by ROC AUC
classifier = xgb.XGBClassifier()
scores = cross_val_score(classifier, examples, labels, cv=5, scoring='roc_auc')
print("ROC AUC: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

Slide 56

Slide 56 text

Grid search

import numpy
from sklearn.linear_model import LogisticRegression
from sklearn.grid_search import GridSearchCV

# labels of training data
labels = [-1, -1, 1, 1]

# training data
examples = [[-2, -2], [-1, -1], [1, 1], [2, 2]]

# finding the best parameter set by grid search
parameters = {
    'penalty': ['l2', 'l1'],
    'C': numpy.logspace(-4, 4, 10)
}
classifier = GridSearchCV(LogisticRegression(), parameters, cv=5)
classifier.fit(examples, labels)

# report best parameters
best_params = classifier.best_estimator_.get_params()
print('Best parameters = {}'.format(best_params))

Slide 57

Slide 57 text

Combination with Pipeline

import numpy
from sklearn import svm
from sklearn.decomposition import PCA
from sklearn.grid_search import GridSearchCV
from sklearn.pipeline import Pipeline

# combine PCA and SVC by pipeline (Input -> PCA -> SVC -> Output)
pipeline = Pipeline([
    ('pca', PCA()),
    ('svc', svm.SVC())
])

# finding the best parameter set by grid search
parameters = {
    'pca__n_components': range(2, 6),
    'svc__kernel': ['linear', 'rbf'],
    'svc__C': numpy.logspace(-4, 4, 10),
    'svc__gamma': numpy.logspace(-4, 4, 10)
}
classifier = GridSearchCV(pipeline, parameters, cv=5, n_jobs=-1)
classifier.fit(features, labels)  # features/labels prepared beforehand

# report best parameter set
best_params = classifier.best_estimator_.get_params()
print('Best parameters = {}'.format(best_params))

Slide 58

Slide 58 text

With scikit-learn ✓ We can prepare training data in a common format ‣vectors for labels and matrices for features ✓ We can use all algorithms through the same interface ✓ We can build combined models using pipelines ✓ We can grid-search to optimize hyperparameters

Slide 59

Slide 59 text

Scikit-learn is a standard ✓ Several libraries provide a scikit-learn-compatible interface ‣xgboost ‣tensorflow

Slide 60

Slide 60 text

Scikit-learn is an ideal framework for machine learning

Slide 61

Slide 61 text

The Future of SciRuby in Machine Learning

Slide 62

Slide 62 text

Key Point ✓ Make a scikit-learn-like framework available to Ruby programs

Slide 63

Slide 63 text

Two ways ✓ Make scikit-learn itself available from Ruby ✓ Build our own scikit-learn-like libraries written in Ruby

Slide 64

Slide 64 text

Use scikit-learn itself ✓ Learn from PyCall.jl and ScikitLearn.jl ✓ PyCall.jl ‣Calls Python code from Julia ✓ ScikitLearn.jl ‣A binding to scikit-learn via PyCall.jl ✓ Make pycall.gem and scikit-learn.gem (sketch below)
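Since pycall.gem is only proposed at this point, the following is a hypothetical sketch of what such a binding might look like; the module and method names are assumptions modeled on PyCall.jl:

require 'pycall' # hypothetical gem, does not exist yet

# import a Python module through the assumed bridge
linear_model = PyCall.import_module('sklearn.linear_model')

# drive a Python estimator from Ruby with the usual fit/predict workflow
classifier = linear_model.LogisticRegression.new(penalty: 'l2')
classifier.fit([[-2, -2], [-1, -1], [1, 1], [2, 2]], [-1, -1, 1, 1])
puts classifier.predict([[0.5, 0.5]]) # expected output: [1]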

Slide 65

Slide 65 text

Make scikit-learn-like libraries ✓ Very hard work ✓ Need a Cython-like system to make writing extension libraries easy ‣rubex, planned by v0dro ✓ Numerical arrays

Slide 66

Slide 66 text

Numerical array issues ✓ NMatrix ✓ Numo::NArray ✓ NumBuffer

Slide 67

Slide 67 text

NMatrix ✓ Slow implementation ✓ Lack of linear algebra operations for sparse matrices ✓ Installation issues

Slide 68

Slide 68 text

Numo::NArray ✓ Lack of sparse matrix features ✓ Too few supported libraries

Slide 69

Slide 69 text

NumBuffer ✓ What is it? ‣Supports exchanging numerical array data among different libraries ✓ I'm the only developer ✓ Need more contributors

Slide 70

Slide 70 text

Benchmark

$ ruby -r benchmark/ips -r nmatrix -r numo/narray -e '
  Benchmark.ips do |x|
    ar = Array.new(100*100) { rand }
    nm = NMatrix.random [100*100]
    na = Numo::DFloat.new(100*100).rand
    x.report("ar") { Array.new(ar.length) {|i| ar[i] + ar[i] } }
    x.report("nm") { nm + nm }
    x.report("na") { na + na }
  end
'

Warming up --------------------------------------
  ar    111.000 i/100ms
  nm     59.000 i/100ms
  na      3.133k i/100ms
Calculating -------------------------------------
  ar      1.068k (±12.3%) i/s -   5.328k in 5.078079s
  nm    618.334  (±10.0%) i/s -   3.068k in 5.021136s
  na     34.110k (±19.0%) i/s - 166.049k in 5.028910s

Slide 71

Slide 71 text

Benchmark

$ ruby -r benchmark/ips -r nmatrix -r numo/narray -e '
  Benchmark.ips do |x|
    nm = NMatrix.random [100, 100]
    na = Numo::DFloat.new(100, 100).rand
    x.report("nm") { nm.dot nm }
    x.report("na") { na.inplace.dot na }
  end
'

Warming up --------------------------------------
  nm    189.000 i/100ms
  na     60.000 i/100ms
Calculating -------------------------------------
  nm      2.083k (± 8.0%) i/s - 10.395k in 5.022906s
  na    658.759  (± 7.4%) i/s -  3.300k in 5.039515s

Slide 72

Slide 72 text

NMatrix and NArray compatibility ✓ Which is best? ‣Neither of them is best right now ✓ Interface and feature incompatibility ‣NumBuffer can't resolve this issue ✓ I want them to be unified ‣NMatrix is good for sparse matrices ‣NArray is good for dense arrays

Slide 73

Slide 73 text

SciRuby JP ✓ SciRuby developer community in Japan ✓ Performed survey studies this summer

Slide 74

Slide 74 text

Some Achievements ✓ Tutorials ‣100 narray exercises (by masa16 & kozo2) ‣10 minutes to daru (by kozo2) ‣pandas cookbook with daru (by kozo2) ‣Rewrite of the pandas docs with daru (by chart-linux) ✓ Installation ‣IRuby on Windows (by kimura) ‣ZeroMQ-related things (by kozo2 & mrkn)

Slide 75

Slide 75 text

Some Achievements ✓ NLP ‣Survey (by himkt) ✓ Machine Learning ‣Survey (by mrkn) ✓ Visualization ‣New plotly binding (by y4ashida) ✓ Other Languages ‣Ruby support in runr (by y4ashida)

Slide 76

Slide 76 text

Let's go forward ✓ Join SciRuby development ‣English is preferred, but Japanese is OK ✓ A lot of issues are waiting for your contribution ‣Not only for machine learning ✓ Discuss in Slack ‣https://sciruby-slack.herokuapp.com

Slide 77

Slide 77 text

No content
