Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed. (Arthur Samuel, 1959)

Tom Mitchell (1998): A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
Classification: given input x, predict a discrete label y (y ∈ {1, ..., C}). Example: Given a piece of email text, predict whether it is spam or not. x = [1 0 0 1 … 0 1 1], where each entry indicates whether a given word appeared in the text. Y = {spam, not_spam}, |Y| = 2, so C = 2.
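To make the encoding concrete, here is a minimal sketch of the binary bag-of-words feature vector in Python; the vocabulary and example email are hypothetical, chosen only for illustration:

```python
# A minimal sketch of the binary bag-of-words encoding described above.
# The vocabulary and example email are hypothetical.
vocabulary = ["free", "meeting", "winner", "lunch", "prize"]

def bag_of_words(text, vocab):
    """Return a 0/1 vector: entry i is 1 iff vocab[i] appears in the text."""
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in vocab]

x = bag_of_words("You are a WINNER claim your free prize now", vocabulary)
print(x)  # [1, 0, 1, 0, 1]
```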
- X is an (n x m) matrix containing the input features.
  - n is the number of training examples.
  - m is the number of features.
- y is an (n x 1) vector containing the targets.
- Find a good function h: X → Y, where h ∈ H.
  - H is the hypothesis space, and h is a single hypothesis.
- We will be using a model called linear regression.
  - Find ŷ = Xw, where w is an (m x 1) vector containing the weights of the model (equivalently, ŷᵢ = wᵀxᵢ for each training example xᵢ).
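As a concrete illustration, here is a minimal sketch of fitting such a model with ordinary least squares in NumPy; the synthetic data and the use of np.linalg.lstsq are illustrative choices, not prescribed by the slides:

```python
# A minimal sketch of fitting the linear model above with
# ordinary least squares (the data here is synthetic).
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 3                       # n training examples, m features
X = rng.normal(size=(n, m))         # (n x m) feature matrix
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=n)  # targets with a little noise

# Least-squares solution to min_w ||Xw - y||^2
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)          # close to [2.0, -1.0, 0.5]
y_hat = X @ w     # predictions: y_hat = Xw
```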
Split the data (roughly 80/20, at random):
- 80% will be used as the training set.
- 20% will be used as the validation set.
- For each candidate model:
  - Train the model on the training set.
  - Validate the model on the validation set by checking its accuracy/loss.
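A rough sketch of this split in Python follows; the function name and the fit/score interface of the candidate models are assumptions for illustration, not part of the slides:

```python
# A rough sketch of the random 80/20 train/validation split above.
import numpy as np

def train_val_split(X, y, val_frac=0.2, seed=0):
    """Shuffle the indices, then hold out the last val_frac of them."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(len(y) * (1 - val_frac))
    tr, va = idx[:cut], idx[cut:]
    return X[tr], y[tr], X[va], y[va]

X = np.arange(20).reshape(10, 2).astype(float)
y = np.arange(10).astype(float)
X_train, y_train, X_val, y_val = train_val_split(X, y)
print(len(y_train), len(y_val))  # 8 2

# For each candidate model (fit/score interface assumed):
# for model in candidate_models:
#     model.fit(X_train, y_train)
#     score = model.score(X_val, y_val)
```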
Split the training data into 5 folds:
- Train on folds 1, 2, 3, 4; validate on fold 5.
- Train on folds 1, 2, 3, 5; validate on fold 4.
- ...
- Take the average of the errors (optionally, also the standard deviation, variance, ...).
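A minimal sketch of this k-fold loop, assuming a hypothetical make_model factory and a fit/error interface that the slides do not specify:

```python
# A minimal sketch of k-fold cross-validation as described above.
# `make_model` and the fit/error interface are assumptions.
import numpy as np

def k_fold_errors(X, y, make_model, k=5, seed=0):
    """Train on k-1 folds, validate on the held-out fold; average the errors."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = make_model()
        model.fit(X[train_idx], y[train_idx])
        errors.append(model.error(X[val_idx], y[val_idx]))
    return np.mean(errors), np.std(errors)
```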
- Data Collection
  - Extract reviews from Rotten Tomatoes, IMDb, etc.
  - Label the data.
- Data Cleaning
  - Check that the data is correctly labeled.
  - Check that the data is valid and not a bunch of leftover HTML markup.
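A rough sketch of these cleaning checks; the (text, label) record format and the positive/negative label set are assumptions for illustration:

```python
# A rough sketch of the cleaning checks above: strip leftover HTML tags
# and drop rows whose label is not one of the expected values.
# The record format and label set are assumptions, not from the slides.
import re

VALID_LABELS = {"positive", "negative"}

def clean(records):
    cleaned = []
    for text, label in records:
        text = re.sub(r"<[^>]+>", " ", text)  # remove HTML tags
        text = " ".join(text.split())          # normalize whitespace
        if text and label in VALID_LABELS:     # keep only valid, labeled rows
            cleaned.append((text, label))
    return cleaned

print(clean([("<p>Great movie!</p>", "positive"), ("<div></div>", "oops")]))
# [('Great movie!', 'positive')]
```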
- What kind of features?
  - Bag of words? Word counts?
  - K best words?
- Any preprocessing?
  - Stopwords?
- How to find the best model?
  - How to construct the validation set?
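One possible combination of answers, sketched in Python: word-count features restricted to the K most frequent non-stopword terms. The stopword list and the choice of K are illustrative, not prescribed by the slides:

```python
# Word-count features over the K most frequent non-stopword terms.
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "it", "and", "of", "to"}

def top_k_vocab(texts, k=1000):
    """Count non-stopword terms across all texts; keep the k most frequent."""
    counts = Counter()
    for t in texts:
        counts.update(w for w in t.lower().split() if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]

def count_features(text, vocab):
    """Map a text to a vector of per-word counts over the fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

docs = ["the movie is great", "great plot and great acting"]
vocab = top_k_vocab(docs, k=3)
print(vocab)                           # e.g. ['great', 'movie', 'plot']
print(count_features(docs[1], vocab))  # [2, 0, 1]
```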
- Start with simple features and models first.
- Move on to more sophisticated features and models later.
- Occam's Razor: among competing hypotheses, the one with the fewest assumptions should be selected.
=> Simple models often work surprisingly well, and more sophisticated models may not yield much additional gain. This is especially the case for deep features (images, audio) versus shallow features (text).
• Online Courses
  ◦ Machine Learning (by Andrew Ng, Stanford, Coursera)
  ◦ Learning From Data (by Yaser S. Abu-Mostafa, Caltech)
  ◦ Neural Networks for Machine Learning (by Geoffrey Hinton, University of Toronto)
• Books
  ◦ Machine Learning: A Probabilistic Perspective, by Kevin Murphy
  ◦ The Elements of Statistical Learning, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman