
Machine Learning In Practice for IEEE TENCON 2015

log0
November 06, 2015

A quick overview on basics of machine learning with working code.

Transcript

  1. What is Machine Learning? Arthur Samuel (1959): Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed. Tom Mitchell (1998): A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
  2. Why do Machine Learning? • Computer Vision • Natural Language Processing • Speech Recognition • Computational Advertising • Recommendation Systems • Algorithmic Trading • Physics • ...
  3. Example : Speech to Text / Text to Speech Sources: http://www.pcworld.com/article/2148940/windows-phone-os/ask-cortana-anything-sassy-answers-to-58-burning-questions.html ; http://www.cultofmac.com/285528/siri-may-helped-2-year-old-girl-just-save-mothers-life/ ; http://www.mytechbits.com/google-now-update-will-let-android-users-to-send-messages-using-google-voice-commands-via-whatsapp/9812861/
  4. Goals, and the Types of Learning Types • Supervised learning • Unsupervised learning • Semi-supervised learning Goals • Classification • Regression
  5. Classification Given an input feature vector x (x[i] ∈ R), predict y (y ∈ {1,...,C}). Example: Given a piece of email text, predict if it is spam or not. x = [1 0 0 1 … 0 1 1], where each entry represents whether a word appeared or not. y ∈ {spam, not_spam}, so C = 2.
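The spam setup above can be sketched with scikit-learn. The vocabulary, the binary word-occurrence vectors, and the labels below are all made up for illustration, and logistic regression is just one common classifier choice, not one the deck prescribes:

```python
# Toy version of the slide's spam setup: each x is a binary vector of
# word occurrences, y is in {spam, not_spam} (C = 2). All data is made up.
from sklearn.linear_model import LogisticRegression

# Rows: emails; columns: whether each vocabulary word appeared (1) or not (0).
X = [
    [1, 0, 0, 1, 0, 1, 1],
    [1, 1, 0, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0, 0],
]
y = ["spam", "spam", "not_spam", "not_spam"]

clf = LogisticRegression()
clf.fit(X, y)
print(clf.predict([[1, 0, 0, 1, 0, 1, 1]]))
```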
  6. Regression Given an input feature vector x (x[i] ∈ R), predict y (y ∈ R). Example: Given the location and size of a house, predict its price.
  7. Model Training - Given a dataset D = (X, y) - X is an (n x m) matrix containing the input features. - n is the number of training examples. - m is the number of features. - y is an (n x 1) vector containing the targets. - Find a good function h: X → Y, where h ∈ H. - H is the hypothesis space, and h is a single hypothesis.
  8. Model Training - Given a dataset D = (X, y) - X is an (n x m) matrix containing the input features. - n is the number of training examples. - m is the number of features. - y is an (n x 1) vector containing the targets. - Find a good function h: X → Y, where h ∈ H. - H is the hypothesis space, and h is a single hypothesis. We will be using a model called linear regression. - Fit y = Xw^T, where w is a (1 x m) vector containing the weights of the model.
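As a sketch of what "finding w" means here, the least-squares fit can be computed directly with NumPy. The single feature, the value of the true weight, and the absence of an intercept are all assumptions made only for this illustration:

```python
# Minimal least-squares sketch of y = Xw^T with one feature and no intercept.
# true_w and the feature values are synthetic.
import numpy as np

true_w = 0.15
X = np.array([[150.0], [400.0], [900.0], [1500.0], [2500.0]])  # (n x m), m = 1
y = X[:, 0] * true_w                                           # (n,) targets

# Solve min_w ||Xw - y||^2; w has one entry per feature.
w, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(w)  # recovers [0.15], since y here is noise-free
```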
  9. Housing Price Prediction Size in feet² vs. Price ($) in 1000's X = [[150], [400], [700], [720], [900], [1350], [1500], [1700], [2100], [2300], [2500]] y = [80, 100, 180, 250, 210, 330, 400, 385, 395, 390, 420]
  10. Housing Price Prediction Size in feet² vs. Price ($) in 1000's X = [[150], [400], [700], [720], [900], [1350], [1500], [1700], [2100], [2300], [2500]] y = [80, 100, 180, 250, 210, 330, 400, 385, 395, 390, 420] Fitted weight: w = 0.1492165
  11. Model Training

      # python scikit-learn models
      from sklearn.linear_model import LinearRegression

      X = [[150], [400], [700], [720], [900], [1350], [1500], [1700], [2100], [2300], [2500]]
      y = [80, 100, 180, 250, 210, 330, 400, 385, 395, 390, 420]

      # Train model on (X, y).
      model = LinearRegression()
      model.fit(X, y)
  12. Model Prediction

      # python scikit-learn models
      from sklearn.linear_model import LinearRegression

      X = [[150], [400], [700], [720], [900], [1350], [1500], [1700], [2100], [2300], [2500]]
      y = [80, 100, 180, 250, 210, 330, 400, 385, 395, 390, 420]

      # Train model on (X, y).
      model = LinearRegression()
      model.fit(X, y)

      # Use model to predict housing price of (X_test).
      X_test = [[1200]]
      y_test_pred = model.predict(X_test)
  13. Model Evaluation But, how do we pick a model out of all possible models? - Different parameters - Randomness in algorithms - Variance in the dataset - ...
  14. Model Evaluation Validation Set - Split the dataset randomly into roughly 80% : 20%. - The 80% will be used as the training set. - The 20% will be used as the validation set. - For each candidate model: - Train the model on the training set. - Validate the model on the validation set by checking the accuracy/loss.
  15. Model Evaluation

      from sklearn.linear_model import LinearRegression
      from sklearn.metrics import mean_squared_error

      cutoff = len(y) * 4 // 5
      X_train = X[:cutoff]
      y_train = y[:cutoff]
      X_cv = X[cutoff:]
      y_cv = y[cutoff:]

      model = LinearRegression()
      model.fit(X_train, y_train)

      y_cv_pred = model.predict(X_cv)
      error = mean_squared_error(y_cv, y_cv_pred)
  16. Model Evaluation

      from sklearn.linear_model import LinearRegression
      from sklearn.metrics import mean_squared_error

      cutoff = len(y) * 4 // 5
      X_train = X[:cutoff]
      y_train = y[:cutoff]
      X_cv = X[cutoff:]
      y_cv = y[cutoff:]

      model = LinearRegression()
      model.fit(X_train, y_train)

      y_cv_pred = model.predict(X_cv)
      error = mean_squared_error(y_cv, y_cv_pred)

      => train_test_split(X, y, test_size=0.2)
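The fixed-cutoff split above is deterministic; scikit-learn's train_test_split shuffles before splitting, which is what the slide's arrow points to. A sketch using the housing data from the earlier slides; note that in current scikit-learn versions the helper lives in sklearn.model_selection (in 2015 it was sklearn.cross_validation), and random_state is added here only for reproducibility:

```python
# Random 80/20 split with train_test_split on the housing data.
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X = [[150], [400], [700], [720], [900], [1350], [1500], [1700], [2100], [2300], [2500]]
y = [80, 100, 180, 250, 210, 330, 400, 385, 395, 390, 420]

# Shuffled split: 8 training examples, 3 validation examples.
X_train, X_cv, y_train, y_cv = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)
error = mean_squared_error(y_cv, model.predict(X_cv))
```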
  18. Cross Validation - Split the dataset into 5 folds - Train on folds 1, 2, 3, 4, validate on fold 5 - Train on folds 1, 2, 3, 5, validate on fold 4 - ... - Take the average of the errors (optionally, also the standard deviation, variance, ...)
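The loop over folds is built into scikit-learn. One way to sketch it on the housing data from the earlier slides is cross_val_score; the "neg_mean_squared_error" scoring string negates the MSE because scikit-learn always maximizes scores:

```python
# 5-fold cross-validation of linear regression on the housing data.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X = [[150], [400], [700], [720], [900], [1350], [1500], [1700], [2100], [2300], [2500]]
y = [80, 100, 180, 250, 210, 330, 400, 385, 395, 390, 420]

scores = cross_val_score(LinearRegression(), X, y,
                         scoring="neg_mean_squared_error", cv=5)
mse_per_fold = -scores                      # undo the negation
print(mse_per_fold.mean(), mse_per_fold.std())  # average error and its spread
```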
  19. Training Error, Test Error I got zero training error! Did I get a perfect model? What now?
  20. Overfitting and Underfitting Generalization is a model's ability to perform well on new, unseen data. underfit | appropriate fit | overfit
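One way to see why zero training error does not mean a perfect model: fitting polynomials of increasing degree to noisy data (all numbers below are synthetic) always drives the training error down, even when the extra capacity is only fitting the noise:

```python
# Under- vs over-fitting sketch: training error never increases with degree.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = 3 * x**2 + rng.normal(scale=0.1, size=x.size)  # true curve is quadratic

def train_mse(degree):
    coeffs = np.polyfit(x, y, degree)              # least-squares polynomial fit
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Degree 1 underfits, degree 2 matches the truth, degree 9 chases the noise.
print(train_mse(1), train_mse(2), train_mse(9))
```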
  21. Step 1 : Define goals and metrics - Goals => Classification - Classify a piece of text as positive/negative - Metrics => Zero-One Loss
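Zero-one loss is simply the fraction of misclassified examples, and scikit-learn exposes it directly; the labels below are made up for illustration:

```python
# Zero-one loss: fraction of labels the classifier got wrong.
from sklearn.metrics import zero_one_loss

y_true = ["pos", "pos", "neg", "neg"]
y_pred = ["pos", "neg", "neg", "neg"]

print(zero_one_loss(y_true, y_pred))  # 0.25: one of four labels is wrong
```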
  22. Step 2 : Data collection and cleaning - Data Collection - Extract from Rotten Tomatoes, IMDB, etc. - Label the data - Data Cleaning - Check that the data is correctly labeled - Check that the data is valid and not a bunch of invalid HTML
  23. Step 3 : Data exploration - Data Exploration - How much positive and negative data? - What language? - ...
  24. Step 4 : Evaluate approach - Linear or non-linear? - What kind of features? - Bag of words? Word counts? - K best words? - Any preprocessing? - Stopwords? - How to find the best model? - How to construct the validation set?
  25. Step 6 : Evaluate results - Check the score, and run it on test data - Does it work? If not, why?
  26. Iterate Quickly - Build the minimal viable product. - Get a running system ASAP. - Iterate from there.
  27. No need to use all the data - Law of Diminishing Returns
  28. Learn Your Data - Is the data even correct? => Models can be error-resistant given a lot more correct data, but there is no point in learning the wrong thing. (Slide image: a mislabeled training example with target "Dog".)
  29. The Curse of Dimensionality - Some features are noise - Scalability Source: http://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/
  30. The Curse of Dimensionality - Some features are noise - Scalability => Principal Component Analysis Source: http://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/
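A minimal PCA sketch, assuming synthetic data in which one of three features is nearly a copy of another, so two dimensions capture almost all of the variance:

```python
# PCA: project 3 correlated features down to 2 with almost no variance lost.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
redundant = base[:, 0] + rng.normal(scale=0.01, size=100)  # noisy copy of feature 0
X = np.column_stack([base, redundant])                     # (100 x 3)

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (100, 2)
print(pca.explained_variance_ratio_.sum())  # close to 1.0: little is lost
```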
  31. Feature Engineering Sometimes models can't learn complicated relationships from the raw features: Source: http://blog.bigml.com/2013/02/21/everything-you-wanted-to-know-about-machine-learning-but-were-too-afraid-to-ask-part-two/
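The linked post's classic illustration: points inside a circle vs. points outside it are not linearly separable in raw (x, y) coordinates, but the single engineered feature r = sqrt(x² + y²) makes them so. A synthetic sketch (all data generated here; logistic regression is just one linear classifier choice):

```python
# Feature engineering: the engineered radius feature beats raw coordinates.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
angles = rng.uniform(0.0, 2.0 * np.pi, n)
inside = np.arange(n) < n // 2
# First half of the points lies inside the circle, second half outside.
radii = np.where(inside, rng.uniform(0.0, 0.8, n), rng.uniform(1.2, 2.0, n))
X_raw = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
y = inside.astype(int)

r = np.hypot(X_raw[:, 0], X_raw[:, 1]).reshape(-1, 1)  # engineered feature

acc_raw = LogisticRegression().fit(X_raw, y).score(X_raw, y)
acc_eng = LogisticRegression().fit(r, y).score(r, y)
print(acc_raw, acc_eng)  # the engineered feature fits far better
```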
  33. 70:20:10 Rule - 70% features - 20% models - 10% fine-tuning - The Unreasonable Effectiveness of Data
  34. 70:20:10 Rule - 70% features - 20% models - 10% fine-tuning - The Unreasonable Effectiveness of Data => Any kind of model will need data. Ample, good data.
  35. Try Simple Things First - Start with simple features and models first. - Move on to more sophisticated features and models later. - Occam's Razor - Among competing hypotheses, the one with the fewest assumptions should be selected.
  36. Try Simple Things First - Start with simple features and models first. - Move on to more sophisticated features and models later. - Occam's Razor - Among competing hypotheses, the one with the fewest assumptions should be selected. => Sometimes simple models work remarkably well, and more sophisticated models may not yield much better gains. This is especially the case with deep features (images, audio) versus shallow features (text).
  37. Ensembling is Your Friend - Bagging, boosting, and stacking work remarkably well in practice - It is harder for all models to fail together
  38. Ensembling is Your Friend - Bagging, boosting, and stacking work remarkably well in practice - It is harder for all models to fail together => Still, try simple things first; this is worth trying afterwards.
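A minimal bagging sketch on the housing data from the earlier slides. BaggingRegressor's default base estimator is a decision tree, and n_estimators and random_state are arbitrary choices made only for this example:

```python
# Bagging: average many decision trees, each fit on a bootstrap resample.
from sklearn.ensemble import BaggingRegressor

X = [[150], [400], [700], [720], [900], [1350], [1500], [1700], [2100], [2300], [2500]]
y = [80, 100, 180, 250, 210, 330, 400, 385, 395, 390, 420]

# Default base estimator is a decision tree regressor.
ensemble = BaggingRegressor(n_estimators=50, random_state=0)
ensemble.fit(X, y)
pred = ensemble.predict([[1200]])
print(pred)  # somewhere between the lowest and highest observed price
```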
  39. Libraries • Scikit-Learn (Python) • R (lm, gbm, glm, caret, etc...) • Lasagne (Python) • Pylearn2 (Python) • Keras (Python) • Caffe (C++, Python) • ...
  40. Machine Learning Resources • Courses ◦ Machine Learning (by Andrew Ng, Stanford, Coursera) ◦ Learning From Data (by Yaser S. Abu-Mostafa, Caltech) ◦ Neural Networks for Machine Learning (by Geoff Hinton, University of Toronto) • Books ◦ Machine Learning: A Probabilistic Perspective, by Kevin Murphy ◦ The Elements of Statistical Learning, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman