Slide 1

Machine Learning Introduction
Matt Nedrich

Motivation

enum ImageType { Tree, Dog, Car }

ImageType getImageType( int[][][] inputImage ) {
  . . .
}

Slide 2

Motivation
• Learning from data
  – Representation and generalization
• Write programs we can’t write directly
• Math applied in a creative way

Agenda
• Data Representation
  – Feature Extraction
• Machine Learning Problems
  – Regression
  – Classification
  – Clustering
• Machine Learning Algorithms
  – Gradient Descent
  – KNN
  – Neural Networks
  – K-means

Slide 3

Data Representation
• Critical first step in many learning problems
• How to represent real-world objects or concepts
  – Extract numerical information (features) from them

Feature Extraction
Suppose we want to:
• Cluster houses
• Predict home values
• Classify houses
We need to represent each house as a collection of features.

Slide 4

Feature Extraction
• Size (square feet)
• Age (years)
• Lot size (acres)
• Bedrooms (count)
• Distance to landmark (miles)

Feature Space

House | Size | Age | Lot Size | # BR
1     | 1400 | 15  | 0.5      | 2
2     |  800 |  4  | 0.25     | 1
3     | 2300 | 35  | 0.2      | 4
4     | 1700 |  8  | 0.5      | 2
…

(Plot: the houses placed along a Size (square feet) axis.)
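To make this concrete, here is a minimal sketch (not from the slides) of the same table as a numeric feature matrix in NumPy; the column order (size, age, lot size, bedrooms) mirrors the table above.

import numpy as np

# One row per house: [size_sqft, age_years, lot_acres, bedrooms]
houses = np.array([
    [1400, 15, 0.50, 2],
    [ 800,  4, 0.25, 1],
    [2300, 35, 0.20, 4],
    [1700,  8, 0.50, 2],
], dtype=float)

print(houses.shape)  # (4, 4): 4 houses, 4 features each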

Slide 5

Feature Space
(Plots: the house table above embedded in a 2D feature space, Size (square feet) vs. Age (years).)

Slide 6

Feature Extraction
Suppose we want to:
• Find similar regions in an image
  – Called image segmentation
  – Primitive for higher level learning
  – Cluster pixels
Image from https://flic.kr/p/ahLKUz

Feature Extraction
For each pixel:
• Red value (0-255)
• Green value (0-255)
• Blue value (0-255)
• x-coordinate
• y-coordinate
Image from https://flic.kr/p/ahLKUz
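A sketch of this per-pixel feature extraction (assuming the image is already loaded as an H x W x 3 RGB array; how it gets loaded is left out):

import numpy as np

def pixel_features(image):
    # image: H x W x 3 array of RGB values (0-255)
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]          # row (y) and column (x) index of every pixel
    r = image[:, :, 0].ravel()
    g = image[:, :, 1].ravel()
    b = image[:, :, 2].ravel()
    # One row per pixel: [R, G, B, x, y]
    return np.column_stack([r, g, b, xs.ravel(), ys.ravel()])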

Slide 7

Feature Extraction
(Plot: the image's pixels embedded in Red / Green / Blue feature space.)

Data Representation Recap
• Feature selection is critical
• Some preprocessing usually required
  – Scale features
  – Reduce dimensionality (e.g., PCA)
• Once we have data in feature space, we can apply ML algorithms
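A minimal preprocessing sketch with scikit-learn (the Python library pointed to in the Resources slide): standardize each feature, then optionally reduce dimensionality with PCA. X here is the house feature matrix from the earlier sketch; any feature matrix works the same way.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = np.array([[1400, 15, 0.50, 2],
              [ 800,  4, 0.25, 1],
              [2300, 35, 0.20, 4],
              [1700,  8, 0.50, 2]], dtype=float)

X_scaled = StandardScaler().fit_transform(X)              # zero mean, unit variance per feature
X_reduced = PCA(n_components=2).fit_transform(X_scaled)   # project to 2 dimensions
print(X_reduced.shape)                                    # (4, 2)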

Slide 8

Machine Learning Problems
• Regression
  – Fit model (e.g., function) to existing data
  – Several input variables, one response (e.g., output) variable
  – Predict stock prices, home values, etc.
• Classification
  – Place items into one of N bins/classes
  – Document / text classification (e.g., spam vs. not spam, positive tweet vs. negative tweet, etc.)
• Clustering
  – Group common items (e.g., documents, tweets, images, people) together
  – Product recommendation

Regression
• Select features
• Embed data in feature space
(Plot: House size (sqft) vs. Value)

Slide 9

Regression
• Select features
• Embed data in feature space
• Predict house values
• Run regression algorithm
(Plots: House size (sqft) vs. Value, built up over two slides)

Slide 10

Regression
• Select features
• Embed data in feature space
• Predict house values
• Run regression algorithm
• Predict values
(Plot: House size (sqft) vs. Value)

Machine Learning Problems
(Recap of the Regression / Classification / Clustering overview above.)
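As a rough sketch of this workflow, with made-up numbers and scikit-learn's LinearRegression standing in for the regression algorithm:

import numpy as np
from sklearn.linear_model import LinearRegression

sizes = np.array([[800], [1400], [1700], [2300]])     # house size (sqft), one feature
values = np.array([120000, 180000, 210000, 280000])   # sale value (assumed data)

model = LinearRegression().fit(sizes, values)          # run regression algorithm
print(model.predict(np.array([[2000]])))               # predict the value of a 2000 sqft house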

Slide 11

Classification
• Sentiment classification
• Face detection
• Medical diagnosis
• Spam detection

Classification
• Select features
• Embed data in feature space
(Plot: data in Feature A vs. Feature B space)

Slide 12

Classification 2D Example
• Select features
• Embed data in feature space
• Label data
  – E.g., spam vs. not spam
• Classify new observations
(Plots: labeled points in Feature A vs. Feature B space, with new observations marked “?”)

Slide 13

Machine Learning Problems
(Recap of the Regression / Classification / Clustering overview above.)

Clustering
• Group common items
  – documents, tweets, images, people
• Customer segmentation

Slide 14

Clustering
1. Select features
2. Embed data in feature space
3. Apply clustering algorithm
(Plots: points in Feature A vs. Feature B space, built up over two slides)

Slide 15

Clustering
1. Select features
2. Embed data in feature space
3. Apply clustering algorithm
4. Can be used to classify new observations
5. Many other applications
(Plots: clustered points in Feature A vs. Feature B space, with new observations marked “?”)

Slide 16

Algorithms
• Regression
  – Gradient Descent
• Classification
  – K-Nearest Neighbors
  – Neural Networks
• Clustering
  – K-means

Gradient Descent for Linear Regression

Slide 17

Gradient Descent for Linear Regression

Regression Example
y = mx + b

Slide 18

Regression Example
(Figure, shown across several slides: points in Line Space, with axes Slope (m) and y-intercept (b), and the corresponding lines in X-Y Space, y = mx + b.)

Slide 19

Regression Example
(Line Space / X-Y Space figure, continued.)

Slide 20

Regression Example
(Line Space / X-Y Space figure, continued.)

Regression Example
• Need to score candidate lines (m, b) pairs
• Choose the best one

Slide 21

Regression Example - Error
(figure slides)

Slide 22

Regression Example - Error
(figure)

Regression Example - Error
y = mx + b
error = \sum_{i=1}^{N} ( y_i - (m x_i + b) )^2
Compute Gradient
Search for solution
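In code, that error and its gradient with respect to m and b look roughly like this (a NumPy sketch, not code from the talk):

import numpy as np

def error(m, b, x, y):
    # sum over i of (y_i - (m*x_i + b))^2
    return np.sum((y - (m * x + b)) ** 2)

def gradient(m, b, x, y):
    residuals = y - (m * x + b)
    d_m = -2.0 * np.sum(x * residuals)   # d(error)/dm
    d_b = -2.0 * np.sum(residuals)       # d(error)/db
    return d_m, d_b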

Slide 23

Regression Example
• Gradient
  – Derivative of error function
  – Acts as compass to search line space

Regression Example
(figure)

Slide 24

Regression Example
(figure slides)

Slide 25

Regression Example
(figure)

Regression Error Surface
(Figure: the error plotted as a surface over slope and y-intercept.)

Slide 26

Regression Example
(Figures: the error surface over Slope (m) and y-intercept (b).)

Slide 27

Regression Example
(Figure: the error surface over Slope (m) and y-intercept (b).)

Gradient Descent Recap
• Express problem in terms of something to optimize
  – E.g., minimize error
• Use gradient to search space for solution
• Used in much more than regression
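Putting the pieces together, a minimal gradient descent loop for the line-fitting example (reusing the gradient sketch above; the learning rate and iteration count are arbitrary choices, not values from the talk):

import numpy as np

def gradient_descent(x, y, learning_rate=0.01, iterations=1000):
    m, b = 0.0, 0.0                        # start somewhere in line space
    for _ in range(iterations):
        residuals = y - (m * x + b)
        d_m = -2.0 * np.sum(x * residuals)
        d_b = -2.0 * np.sum(residuals)
        m -= learning_rate * d_m           # step against the gradient
        b -= learning_rate * d_b
    return m, b

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])         # roughly y = 2x + 1
print(gradient_descent(x, y))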

Slide 28

Gradient Descent Notes
• Search space may have several minimums
(Figures: error surfaces with more than one minimum over slope and y-intercept.)
Images from http://en.wikipedia.org/wiki/Gradient_descent

Slide 29

Gradient Descent Notes
• Search space may have several minimums
Image from http://rs.io/2014/02/16/simulated-annealing-intuition.html

Algorithms
• Regression
  – Gradient Descent
• Classification
  – K-Nearest Neighbors
  – Neural Networks
• Clustering
  – K-means

Slide 30

Classification 2D Example
• Have some labeled data
• Need to classify new observations
(Plot: labeled points in Feature A vs. Feature B space, new observations marked “?”)

Classification 2D Example
• Could use K-Nearest Neighbors approach (KNN)

Slide 31

Classification 2D Example
• Could use K-Nearest Neighbors approach (KNN)
• For each new observation, search for K nearest neighbors
  – Assign based on majority
• Suppose we use K=3
(Plots: Feature A vs. Feature B space with new observations marked “?”)

Slide 32

Classification 2D Example
(Build slides: same bullets as above, shown with the Feature A vs. Feature B plot.)

Slide 33

Classification 2D Example
(Build slides: same bullets as above, shown with the Feature A vs. Feature B plot.)
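A brute-force KNN sketch matching the approach on these slides (NumPy only, Euclidean distance, majority vote, K = 3; the toy data is made up):

import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, point, k=3):
    # Distance from the new observation to every labeled point
    distances = np.sqrt(np.sum((train_X - point) ** 2, axis=1))
    nearest = np.argsort(distances)[:k]      # indices of the K nearest neighbors
    votes = Counter(train_y[nearest])        # majority vote among their labels
    return votes.most_common(1)[0][0]

train_X = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]])
train_y = np.array(["red", "red", "blue", "blue"])
print(knn_classify(train_X, train_y, np.array([1.2, 1.5])))   # -> red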

Slide 34

KNN Recap
• Need to choose K
• Need to search for nearest neighbors each time
  – Slow

KNN Alternative
• Train a classifier
  – Neural Network
  – Support Vector Machine
  – Logistic Regression

Slide 35

Neural Networks
• Train classifier that splits the different classes
• Network of connected nodes
  – Connections are weighted
  – Solution requires learning weights
• Output is result of an input being propagated through the network

Neural Networks
(Diagram: an observation from the set of data observations is fed through the Neural Network; the output label is compared with the true label. Here the label is correct.)

Slide 36

Neural Networks
(Diagram: an observation is fed through the network; the output label is incorrect, so the weights are adjusted.)

Neural Networks
(Diagram: repeat until the Neural Network is trained.)
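The loop on these slides can be sketched in code. As an assumption for the sketch, a single-node perceptron stands in for the network (the talk's network is larger and adjusts its weights via backpropagation); the shape of the loop is the same: feed an observation through, compare the output with its label, and adjust the weights when it is wrong.

import numpy as np

def train_perceptron(X, y, learning_rate=0.1, epochs=50):
    w = np.zeros(X.shape[1] + 1)              # one weight per feature, plus a bias weight
    for _ in range(epochs):                   # "repeat until trained" (fixed epoch count here)
        for features, label in zip(X, y):
            x = np.append(features, 1.0)      # constant bias input
            output = 1 if np.dot(w, x) >= 0 else 0
            if output != label:               # incorrect -> weight adjustment
                w += learning_rate * (label - output) * x
    return w

# Assumed toy data: two separable classes in (Feature A, Feature B) space
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
y = np.array([0, 0, 1, 1])
print(train_perceptron(X, y))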

Slide 37

Back To Our Example
• Have some labeled data
• Need to classify new observations
(Plot: Feature A vs. Feature B space with new observations marked “?”)

Neural Networks
(Diagram: a small network with inputs A and B, i.e., Feature A and Feature B, and weights w0 through w8.)

Slide 38

Neural Networks
(The A/B network diagram with weights w0-w8, build continued over two slides.)

Slide 39

Neural Networks
(The A/B network diagram, build continued.)

Slide 40

Neural Networks
(The A/B network diagram, build continued.)

Neural Networks
(The A/B network diagram.)
Weight Adjustment via backpropagation

Slide 41

Neural Networks
(The A/B network diagram, build continued over two slides.)
Weight Adjustment via backpropagation

Slide 42

Neural Networks
(The A/B network diagram.)
Weight Adjustment via backpropagation

Classification 2D Example
• Neural Network defines some decision boundary
(Plot: a decision boundary in Feature A vs. Feature B space)

Slide 43

Classification 2D Example
• Neural Network defines some decision boundary
• New observations are fed through the NN for classification
• If the training data was representative, this will work well
(Plots: the decision boundary in Feature A vs. Feature B space, with new observations marked “?”)

Slide 44

Algorithms
• Regression
  – Gradient Descent
• Classification
  – K-Nearest Neighbors
  – Neural Networks
• Clustering
  – K-means

k-means Clustering
• Choose number of clusters k

Slide 45

k-means Clustering
• Choose number of clusters k
• Choose k initial cluster centers
• Assign each point to nearest center
(Build slides with plots of the points and cluster centers)

Slide 46

k-means Clustering
• Choose number of clusters k
• Choose k initial cluster centers
• Assign each point to nearest center
• Update cluster centers
(Build continued over two slides)

Slide 47

k-means Clustering
(Build continued: assign each point to nearest center, then update cluster centers again.)

Slide 48

k-means Clustering
(Build continued: update cluster centers.)

k-means on Image
(Plot: the image's pixels in Red / Green / Blue feature space)

Slide 49

k-means on Image (k=7)
(Plot: clustered pixels in Red / Green / Blue space)

k-means on Image (k=14)
(Plot: clustered pixels in Red / Green / Blue space)
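A sketch of the k-means loop from these slides, applied to pixel colors (NumPy only; random initial centers and a fixed iteration count instead of the "no assignment changes" stopping test described on the next slide):

import numpy as np

def kmeans(points, k, iterations=20, seed=0):
    rng = np.random.default_rng(seed)
    # Choose k initial cluster centers (random points from the data)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        # Assign each point to its nearest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Update each cluster center to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

# Assumed usage: an N x 3 array of RGB values (see the earlier pixel-feature sketch)
pixels = np.random.default_rng(1).integers(0, 256, size=(1000, 3)).astype(float)
centers, labels = kmeans(pixels, k=7)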

Slide 50

k-means Clustering
• Repeat until no cluster assignment changes
• Easy and simple algorithm
• Need to choose k
  – No great way to automatically do this
  – General challenge in clustering

Resources
• Coursera Machine Learning course
  – https://www.coursera.org/course/ml
• Libraries (many exist)
  – Python - http://scikit-learn.org/dev/index.html
  – Java - http://www.cs.waikato.ac.nz/ml/weka/
  – Too many others to mention

Slide 51

Thanks!

Neural Networks
A  B | Output
0  0 | 0
0  1 | 0
1  0 | 0
1  1 | 1
(Plot: the four (A, B) points in Feature A vs. Feature B space with their 0/1 outputs)

Slide 52

Neural Networks
(Diagram: a single node h with inputs A, B (each 0 or 1) and a constant input 1, weighted by wA, wB, and wBias.)

Neural Networks
(The same diagram with wA = 0.75, wB = 0.75, wBias = -1.)
h = A·wA + B·wB + wBias
output = 1 if h ≥ 0, otherwise 0

Slide 53

Neural Networks
Three single-node networks:
AND: wA = 0.75, wB = 0.75, wBias = -1
OR:  wA = 1, wB = 1, wBias = -0.5
NOT: wA = -1, wBias = 0.5
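These weights can be checked directly: with the threshold-at-zero output rule reconstructed above, they reproduce the AND, OR, and NOT truth tables (a small verification sketch, not code from the talk):

def node(inputs, weights):
    # Weighted sum of the inputs plus a constant bias input of 1
    h = sum(i * w for i, w in zip(inputs, weights[:-1])) + weights[-1]
    return 1 if h >= 0 else 0

AND_w = (0.75, 0.75, -1)   # wA, wB, wBias
OR_w = (1, 1, -0.5)
NOT_w = (-1, 0.5)          # wA, wBias

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", node((a, b), AND_w), "OR:", node((a, b), OR_w))
for a in (0, 1):
    print(a, "NOT:", node((a,), NOT_w))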