
Machine Learning

Machine Learning is the general study of programs that learn from data. Machine Learning algorithms can be used to write software that we don't know how to write directly (e.g., spam filters, image classification, handwriting recognition, etc.).

This is a huge and broad topic. My goal is to give an introduction. Regardless of what type of software you write, chances are there are ways to employ some type of machine learning or data analysis algorithm to do something cool.

Atomic Object

May 06, 2014

Transcript

1. Machine Learning Introduction – Matt Nedrich. Motivation: enum ImageType { Tree, Dog, Car }; ImageType getImageType(int[][][] inputImage) { ... }
2. Motivation • Learning from data – Representation and generalization • Write programs we can’t write directly • Math applied in a creative way. Agenda • Data Representation – Feature Extraction • Machine Learning Problems – Regression – Classification – Clustering • Machine Learning Algorithms – Gradient Descent – KNN – Neural Networks – K-means
3. Data Representation • Critical first step in many learning problems • How to represent real-world objects or concepts – Extract numerical information (features) from them. Feature Extraction – Suppose we want to: • Cluster houses • Predict home values • Classify houses – Need to represent houses as a collection of features
4. Feature Extraction • Size (square feet) • Age (years) • Lot size (acres) • Bedrooms (count) • Distance to landmark (miles). Feature Space – example houses:
   House | Size (sq ft) | Age (years) | Lot Size (acres) | Bedrooms
   1     | 1400         | 15          | 0.5              | 2
   2     | 800          | 4           | 0.25             | 1
   3     | 2300         | 35          | 0.2              | 4
   4     | 1700         | 8           | 0.5              | 2
   …
5. Feature Space – the houses from the table above plotted by Size (square feet), then by Size vs. Age (years)
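As a concrete sketch of what embedding these houses in a feature space looks like, the rows of the table above can be stored as numeric feature vectors. The array layout and the use of NumPy are my own illustrative choices, not part of the deck:

```python
import numpy as np

# One row per house, one column per feature:
# [size (sq ft), age (years), lot size (acres), bedrooms]
houses = np.array([
    [1400, 15, 0.50, 2],
    [ 800,  4, 0.25, 1],
    [2300, 35, 0.20, 4],
    [1700,  8, 0.50, 2],
], dtype=float)

print(houses.shape)  # (4, 4): four houses embedded in a 4-dimensional feature space
```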
6. Feature Extraction – Suppose we want to: Find similar regions in an image – Called image segmentation – Primitive for higher-level learning – Cluster pixels (image from https://flic.kr/p/ahLKUz). Feature Extraction – For each pixel: Red value (0-255), Green value (0-255), Blue value (0-255), x-coordinate, y-coordinate
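A minimal sketch of the per-pixel feature extraction described above, assuming the image is available as an H×W×3 RGB array; the function name and the NumPy usage are illustrative assumptions, not from the deck:

```python
import numpy as np

def pixel_features(image):
    """Build one feature vector per pixel: [red, green, blue, x, y]."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]                    # pixel coordinates
    rgb = image.reshape(-1, 3).astype(float)       # flatten colors to (h*w, 3)
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    return np.hstack([rgb, coords])                # shape (h*w, 5)

# Example: a tiny 2x2 "image" with random 0-255 color values
image = np.random.randint(0, 256, size=(2, 2, 3))
print(pixel_features(image))
```

These five-dimensional pixel vectors are what a clustering algorithm would group to segment the image.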
7. Feature Extraction – pixels plotted in Red/Green/Blue space. Data Representation Recap • Feature selection is critical • Some preprocessing usually required – Scale features – Reduce dimensionality (e.g., PCA) • Once we have data in feature space, we can apply ML algorithms
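A small sketch of the preprocessing steps mentioned in the recap, using scikit-learn (the Python library listed on the Resources slide). The choice of StandardScaler, two PCA components, and the toy house data are illustrative assumptions:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = np.array([[1400, 15, 0.50, 2],
              [ 800,  4, 0.25, 1],
              [2300, 35, 0.20, 4],
              [1700,  8, 0.50, 2]], dtype=float)

X_scaled = StandardScaler().fit_transform(X)              # scale each feature to zero mean, unit variance
X_reduced = PCA(n_components=2).fit_transform(X_scaled)   # project onto two principal components
print(X_reduced.shape)                                    # (4, 2)
```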
8. Machine Learning Problems • Regression – Fit a model (e.g., a function) to existing data – Several input variables, one response (output) variable – Predict stock prices, home values, etc. • Classification – Place items into one of N bins/classes – Document/text classification (e.g., spam vs. not spam, positive tweet vs. negative tweet) • Clustering – Group common items (e.g., documents, tweets, images, people) together – Product recommendation. Regression (house size in sqft vs. value) • Select features • Embed data in feature space
9. Regression (house size in sqft vs. value) • Select features • Embed data in feature space • Predict house values • Run regression algorithm
10. Regression (house size in sqft vs. value) • Select features • Embed data in feature space • Predict house values • Run regression algorithm • Predict values. Machine Learning Problems (recap of slide 8: Regression, Classification, Clustering)
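To make the regression workflow above concrete, here is a hedged sketch using scikit-learn's LinearRegression to fit house size against value and predict a new one; the numbers are invented for illustration and do not come from the deck:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: house size (sqft) -> sale value
sizes = np.array([[800], [1400], [1700], [2300]], dtype=float)
values = np.array([110_000, 185_000, 215_000, 310_000], dtype=float)

model = LinearRegression().fit(sizes, values)    # run regression algorithm
print(model.predict(np.array([[2000.0]])))       # predict value of a 2000 sqft house
```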
11. Classification • Sentiment classification • Face detection • Medical diagnosis • Spam detection. Classification (Feature A vs. Feature B) • Select features • Embed data in feature space
12. Classification 2D Example (Feature A vs. Feature B) • Select features • Embed data in feature space • Label data – E.g., spam vs. not spam • Classify new observations (the "?" points in the plot)
13. Machine Learning Problems (recap of slide 8: Regression, Classification, Clustering). Clustering • Group common items – documents, tweets, images, people • Customer segmentation
14. Clustering (Feature A vs. Feature B) 1. Select features 2. Embed data in feature space 3. Apply clustering algorithm
15. Clustering (Feature A vs. Feature B) 1. Select features 2. Embed data in feature space 3. Apply clustering algorithm 4. Can be used to classify new observations (the "?" points) 5. Many other applications
16. Algorithms • Regression – Gradient Descent • Classification – K-Nearest Neighbors – Neural Networks • Clustering – K-means. Gradient Descent for Linear Regression
17. Regression Example – line space (slope m, y-intercept b) vs. X-Y space (y = mx + b): each candidate line in X-Y space corresponds to a single point (m, b) in line space
18. Regression Example – more candidate lines shown in X-Y space along with their (m, b) points in line space
19. Regression Example – line space vs. X-Y space • Need to score candidate lines ((m, b) pairs) • Choose the best one
20. Regression Example – Error: error = Σ_{i=1}^{N} (y_i − (m·x_i + b))² • Compute the gradient • Search for a solution
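The deck then searches line space with gradient descent. Below is a minimal from-scratch sketch of gradient descent for this line-fitting problem; the learning rate, iteration count, 1/N averaging of the gradient, and the toy data are illustrative assumptions rather than values from the slides:

```python
import numpy as np

def gradient_descent(x, y, learning_rate=0.01, iterations=1000):
    """Fit y = m*x + b by minimizing error = sum_i (y_i - (m*x_i + b))^2."""
    m, b = 0.0, 0.0
    n = len(x)
    for _ in range(iterations):
        predictions = m * x + b
        # Partial derivatives of the (averaged) squared error with respect to m and b
        grad_m = (-2.0 / n) * np.sum(x * (y - predictions))
        grad_b = (-2.0 / n) * np.sum(y - predictions)
        m -= learning_rate * grad_m   # step against the gradient (the "compass")
        b -= learning_rate * grad_b
    return m, b

# Toy data lying roughly along y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
print(gradient_descent(x, y))  # approaches m ≈ 2, b ≈ 1
```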
21. Regression Example • Gradient – Derivative of error function – Acts as compass to search line space
22. Regression Example – error surface plotted over slope (m) and y-intercept (b)
23. Regression Example – error surface over slope (m) and y-intercept (b). Gradient Descent Recap • Express problem in terms of something to optimize – E.g., minimize error • Use gradient to search space for solution • Used in much more than regression
24. Gradient Descent Notes • Search space may have several minima (error plotted over slope and y-intercept; images from http://en.wikipedia.org/wiki/Gradient_descent)
25. Gradient Descent Notes • Search space may have several minima (image from http://rs.io/2014/02/16/simulated-annealing-intuition.html). Algorithms • Regression – Gradient Descent • Classification – K-Nearest Neighbors – Neural Networks • Clustering – K-means
26. Classification 2D Example (Feature A vs. Feature B) • Have some labeled data • Need to classify new observations (the "?" points) • Could use the K-Nearest Neighbors (KNN) approach
27. Classification 2D Example • Could use KNN • For each new observation, search for its K nearest neighbors – Assign based on majority • Suppose we use K = 3
28. Classification 2D Example • KNN with K = 3 – each "?" point is assigned the majority class of its 3 nearest labeled neighbors
29. Classification 2D Example • KNN with K = 3 – remaining "?" points classified the same way
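A compact sketch of the KNN classification walked through above, written from scratch so the distance search and majority vote are explicit; the toy points and class labels are invented for illustration:

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = np.linalg.norm(train_X - query, axis=1)  # Euclidean distance to every labeled point
    nearest = np.argsort(distances)[:k]                  # indices of the k closest points
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy labeled data in the (Feature A, Feature B) plane
train_X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # class "red"
                    [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]])  # class "blue"
train_y = np.array(["red", "red", "red", "blue", "blue", "blue"])

print(knn_predict(train_X, train_y, np.array([1.1, 0.9]), k=3))  # -> "red"
```

As the recap slide notes, every new observation requires a fresh search over all labeled points, which is what makes plain KNN slow on large data sets.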
30. KNN Recap • Need to choose K • Need to search for nearest neighbors each time – Slow. KNN Alternative • Train a classifier – Neural Network – Support Vector Machine – Logistic Regression
31. Neural Networks • Train classifier that splits the different classes • Network of connected nodes – Connections are weighted – Solution requires learning the weights • Output is the result of an input being propagated through the network. Training: take an observation from the set of data observations, propagate it through the Neural Network, and check its label – if correct, continue
32. Training (continued): if the label is incorrect, adjust the weights – repeat until the Neural Network is trained
33. Back To Our Example (Feature A vs. Feature B) • Have some labeled data • Need to classify new observations (the "?" points). Neural Networks – network diagram with inputs A and B (Feature A, Feature B) and weights w0–w8
34. Neural Networks – network diagram (inputs A, B; weights w0–w8) alongside the Feature A / Feature B plot
35. Neural Networks – network diagram (inputs A, B; weights w0–w8) alongside the Feature A / Feature B plot
36. Neural Networks – network diagram (inputs A, B; weights w0–w8) – Weight adjustment via backpropagation
37. Neural Networks – network diagram (inputs A, B; weights w0–w8) – Weight adjustment via backpropagation
38. Neural Networks – Weight adjustment via backpropagation. Classification 2D Example (Feature A vs. Feature B) • Neural Network defines some decision boundary
39. Classification 2D Example (Feature A vs. Feature B) • Neural Network defines some decision boundary • New observations fed through NN for classification • If training data was representative this will work well
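As a hedged illustration of training a small neural network classifier and then feeding new observations through it, here is a sketch using scikit-learn's MLPClassifier; the deck does not prescribe a library, and the layer size, solver, and toy data below are my own choices:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Toy labeled data in the (Feature A, Feature B) plane
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Training adjusts the connection weights (gradients computed by backpropagation)
clf = MLPClassifier(hidden_layer_sizes=(4,), solver="lbfgs",
                    max_iter=2000, random_state=0).fit(X, y)

# New observations propagated through the trained network for classification
print(clf.predict(np.array([[1.1, 0.9], [2.9, 3.1]])))  # expected: [0 1]
```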
40. Algorithms • Regression – Gradient Descent • Classification – K-Nearest Neighbors – Neural Networks • Clustering – K-means. k-means Clustering • Choose number of clusters k
41. k-means Clustering • Choose number of clusters k • Choose k initial cluster centers • Assign each point to nearest center
42. k-means Clustering • Assign each point to nearest center • Update cluster centers
43. k-means Clustering • Re-assign each point to nearest center • Update cluster centers
44. k-means Clustering • Assignment and update steps repeated. k-means on an image – pixels clustered in Red/Green/Blue space
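A from-scratch sketch of the k-means loop these slides step through (choose k centers, assign each point to its nearest center, update the centers, repeat until no assignment changes); the random initialization scheme and toy data are illustrative assumptions:

```python
import numpy as np

def kmeans(points, k, seed=0):
    """Cluster an (n, d) array of points into k groups (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()  # k initial centers
    assignments = None
    while True:
        # Assign each point to its nearest center
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_assignments = distances.argmin(axis=1)
        if assignments is not None and np.array_equal(new_assignments, assignments):
            break                                    # no cluster assignment changed: done
        assignments = new_assignments
        # Update each center to the mean of the points assigned to it
        for j in range(k):
            if np.any(assignments == j):
                centers[j] = points[assignments == j].mean(axis=0)
    return centers, assignments

points = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.2],
                   [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]])
print(kmeans(points, k=2))
```

Running the same routine on the five-dimensional pixel features from earlier (R, G, B, x, y) is one way to get the segmentation shown on the "k-means on an image" slide.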
45. k-means Clustering • Repeat until no cluster assignment changes • Simple algorithm, easy to implement • Need to choose k – No great way to do this automatically – A general challenge in clustering. Resources • Coursera Machine Learning course – https://www.coursera.org/course/ml • Libraries (many exist) – Python – http://scikit-learn.org/dev/index.html – Java – http://www.cs.waikato.ac.nz/ml/weka/ – Too many others to mention
46. Thanks! Neural Networks – truth table for inputs A and B (the AND function):
    A | B | Output
    0 | 0 | 0
    0 | 1 | 0
    1 | 0 | 0
    1 | 1 | 1
    Plotted in the Feature A / Feature B plane, only the point (1, 1) has output 1.
47. Neural Networks – a single unit h with inputs A and B (each 0 or 1) plus a bias input fixed at 1, and weights wA, wB, wBias:
    h = A·wA + B·wB + 1·wBias, output = 1 if h > 0, otherwise 0
    With wA = 0.75, wB = 0.75, wBias = −1 the unit outputs 1 only when A = 1 and B = 1, reproducing the table above.
48. Neural Networks – weights for simple logic gates using the same threshold unit: AND – wA = 0.75, wB = 0.75, wBias = −1 • OR – wA = 1, wB = 1, wBias = −0.5 • NOT (single input A) – wA = −1, wBias = 0.5
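A tiny sketch of the threshold unit from the last two slides, evaluated with the AND, OR, and NOT weights shown above; the function name and the strict `h > 0` threshold are my reading of the slides:

```python
def threshold_unit(inputs, weights, bias_weight):
    """Weighted sum plus a bias input fixed at 1, then a hard threshold."""
    h = sum(x * w for x, w in zip(inputs, weights)) + 1 * bias_weight
    return 1 if h > 0 else 0

AND = lambda a, b: threshold_unit([a, b], [0.75, 0.75], -1)
OR  = lambda a, b: threshold_unit([a, b], [1, 1], -0.5)
NOT = lambda a:    threshold_unit([a], [-1], 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
print("NOT 0:", NOT(0), "NOT 1:", NOT(1))
```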