Slide 1

Introduction to Classification & Regression
Emaad Ahmed Manzoor
November 2017

Slide 2

Classification

Slide 3

No content

Slide 4

No content

Slide 5

How would you tag images with descriptive text?

Slide 6

Image Tagging as Classification

Slide 7

Image Tagging as Classification
Images x1, x2, …, xn:
https://static.pexels.com/photos/126407/pexels-photo-126407.jpeg
https://static.pexels.com/photos/54632/cat-animal-eyes-grey-54632.jpeg
https://c1.staticflickr.com/4/3645/3523440998_474d43ddc6_b.jpg

Slide 8

Image Tagging as Classification
Each image x1, x2, …, xn is represented as a vector of pixel values, e.g. [200, 255, 101, 205, 100, …]
https://static.pexels.com/photos/126407/pexels-photo-126407.jpeg
https://static.pexels.com/photos/54632/cat-animal-eyes-grey-54632.jpeg
https://c1.staticflickr.com/4/3645/3523440998_474d43ddc6_b.jpg

Slide 9

Image Tagging as Classification
Pixel vectors x1, x2, …, xn paired with labels y1 = cat, y2 = cat, …, yn = spiderman
https://static.pexels.com/photos/126407/pexels-photo-126407.jpeg
https://static.pexels.com/photos/54632/cat-animal-eyes-grey-54632.jpeg
https://c1.staticflickr.com/4/3645/3523440998_474d43ddc6_b.jpg

Slide 10

Image Tagging as Classification
Given xn+1, predict ŷn+1 = cat

Slide 11

Classification Essentials Labeled data { (x1, y1), (x2, y2), … }

Slide 12

Classification Essentials
Labeled data { (x1, y1), (x2, y2), … }
Types of labels:
• Binary: yi ∈ { 0, 1 }
• Multi-class: yi ∈ { cat, spiderman, … }
• Multi-label: yi ∈ { {cat, feline}, {…} }

Slide 13

Classification Essentials Labeled data { (x1, y1), (x2, y2), … } Classification Model

Slide 14

Classification Essentials
Labeled data { (x1, y1), (x2, y2), … }
Classification Model:
• Linear function
• Tree-based
• Nearest-neighbor
• …

Slide 15

A Simple Classifier — kNN

Slide 16

A Simple Classifier — kNN Intuition: Similar points have similar labels

Slide 17

A Simple Classifier — kNN Training Data { (x1, y1), (x2, y2), … }

Slide 18

A Simple Classifier — kNN
Test point xi, yi = ?

Slide 19

A Simple Classifier — kNN
k = 1: k-NN majority vote
Test point xi, yi = ?

Slide 20

A Simple Classifier — kNN
k = 2: k-NN majority vote; break ties randomly
Test point xi, yi = ?

Slide 21

A Simple Classifier — kNN
k = 3: k-NN majority vote; break ties randomly
Test point xi, yi = ?
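
A minimal sketch of this procedure, assuming Euclidean distance and a numpy-based implementation; `knn_predict` and its argument names are illustrative, not from the slides:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3, rng=np.random.default_rng(0)):
    """Predict the label of x_test by majority vote among its k nearest neighbors."""
    # Euclidean distance from the test point to every training point
    dists = np.linalg.norm(X_train - x_test, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # Majority vote over the neighbors' labels; break ties randomly
    votes = Counter(y_train[i] for i in nearest)
    top = max(votes.values())
    tied = [label for label, count in votes.items() if count == top]
    return rng.choice(tied)
```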

Slide 22

A Simple Classifier — kNN
Classifier decision boundary
If we wish to minimize the probability of misclassification, this is done by assigning the test point x to the class having the largest posterior probability, corresponding to the largest value of Kk/K. Thus to classify a new point, we identify the K nearest points from the training data set and then assign the new point to the class having the largest number of representatives amongst this set. Ties can be broken at random. The particular case of K = 1 is called the nearest-neighbour rule, because a test point is simply assigned to the same class as the nearest point from the training set. These concepts are illustrated in Figure 2.27. In Figure 2.28, we show the results of applying the K-nearest-neighbour algorithm to the oil flow data, introduced in Chapter 1, for various values of K. As expected, we see that K controls the degree of smoothing, so that small K produces many small regions of each class, whereas large K leads to fewer larger regions.
[Three panels over axes x6, x7: K = 1, K = 3, K = 31]
Figure 2.28: Plot of 200 data points from the oil data set showing values of x6 plotted against x7, where the red, green, and blue points correspond to the ‘laminar’, ‘annular’, and ‘homogeneous’ classes, respectively. Also shown are the classifications of the input space given by the K-nearest-neighbour algorithm for various values of K.

Slide 23

A Simple Classifier — kNN
Classifier decision boundary (Bishop excerpt and Figure 2.28 repeated from the previous slide)
Non-linear decision boundary

Slide 24

A Simple Classifier — kNN
If the optimal classifier has error e, kNN with k = 1 has (asymptotic) error < 2e (Cover and Hart, 1967)

Slide 25

A Linear Classifier — SVM

Slide 26

A Linear Classifier — SVM
Assumption: Data is linearly separable
{ (x1, t1), (x2, t2), … }, ti ∈ { +1, -1 }

Slide 27

A Linear Classifier — SVM
[Scatter plots over features w1, w2: “Linearly Separable” vs. “Not Linearly Separable”]

Slide 28

A Linear Classifier — SVM Goal: Find the best linear separator

Slide 29

A Linear Classifier — SVM
[Scatter plot over features w1, w2]

Slide 30

A Linear Classifier — SVM
Which linear separator is better?
[Scatter plot over features w1, w2]

Slide 31

A Linear Classifier — SVM
Margin: the perpendicular distance between the separator and the nearest data point

Slide 32

A Linear Classifier — SVM
Goal: Find the maximum margin linear separator

Slide 33

A Linear Classifier — SVM
Goal: Find the maximum margin linear separator
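
A hedged sketch of fitting a maximum-margin linear classifier with scikit-learn; the toy points are made up for illustration, and the large C approximates a hard margin on separable data:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.5]])
t = np.array([-1, -1, +1, +1])      # labels ti ∈ { +1, -1 }

clf = SVC(kernel="linear", C=1e6)   # large C ≈ hard-margin SVM
clf.fit(X, t)

print(clf.coef_, clf.intercept_)    # parameters of the separating hyperplane
print(clf.support_vectors_)         # the points that determine the margin
print(clf.predict([[2.0, 2.5]]))
```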

Slide 34

A Linear Classifier — SVM How do we handle non-linearly separable data?

Slide 35

Kernels

Slide 36

Kernels Not Linearly Separable

Slide 37

Kernels
ϕ([ x, y ]) = [ x, y, x², y² ]

Slide 38

Kernels
After applying ϕ: linearly separable in 4-D

Slide 39

Kernels
After applying ϕ: linearly separable in 4-D, which corresponds to a non-linear decision boundary in 2-D
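
A hedged sketch of this idea with the explicit mapping ϕ([x, y]) = [x, y, x², y²]: points inside vs. outside a circle are not linearly separable in 2-D but become separable after the mapping. The data and names are illustrative:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1).astype(int)  # label: inside the unit circle?

phi = np.hstack([X, X ** 2])                       # ϕ([x, y]) = [x, y, x², y²]

linear_2d = SVC(kernel="linear").fit(X, y)
linear_4d = SVC(kernel="linear").fit(phi, y)
print(linear_2d.score(X, y), linear_4d.score(phi, y))  # 4-D accuracy should be ~1.0
```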

Slide 40

Kernels https://en.wikipedia.org/wiki/Polynomial_kernel

Slide 41

Kernels Problem: The non-linear mapping ϕ can be extremely large

Slide 42

Kernels Solution: The Kernel Trick Problem: The non-linear mapping ϕ can be extremely large oneweirdkerneltrick.com

Slide 43

The Kernel Trick An explicit mapping is not required!

Slide 44

The Kernel Trick A valid kernel function K(x, y) implicitly defines a feature mapping Much easier to define K(x, y)

Slide 45

Examples of Valid Kernels
Radial basis function: K(x, x′) = exp( −‖x − x′‖² / 2σ² )
Polynomial: K(x, x′) = (xᵀx′ + c)^d
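
A hedged numerical check of the kernel trick for the degree-2 polynomial kernel K(x, x′) = (xᵀx′)² (c = 0, d = 2): it equals an inner product under the explicit map ϕ([a, b]) = [a², √2·ab, b²], so ϕ never has to be computed:

```python
import numpy as np

def phi(v):
    a, b = v
    return np.array([a * a, np.sqrt(2) * a * b, b * b])

x  = np.array([1.0, 2.0])
xp = np.array([3.0, 0.5])

implicit = np.dot(x, xp) ** 2        # kernel evaluated in the original 2-D space
explicit = np.dot(phi(x), phi(xp))   # inner product in the mapped 3-D space
print(implicit, explicit)            # both print 16.0
```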

Slide 46

Generative Models

Slide 47

Generative Models
Model the unknown data-generating process
Learn the parameters of the model from the data

Slide 48

Example: Finding Progression Stages Yang, J., McAuley, J., Leskovec, J., LePendu, P., and Shah, N. Finding progression stages in time-evolving event sequences. WWW 2014. Data — Beer ratings by users on RateBeer.com

Slide 49

Example: Finding Progression Stages Yang, J., McAuley, J., Leskovec, J., LePendu, P., and Shah, N. Finding progression stages in time-evolving event sequences. WWW 2014. Question — How do beer drinkers “progress” over time?

Slide 50

Example: Finding Progression Stages
Yang, J., McAuley, J., Leskovec, J., LePendu, P., and Shah, N. Finding progression stages in time-evolving event sequences. WWW 2014.
A generative model (maximize the data likelihood):
xij ∼ Multinomial(Θ(ci, sj))
Θ(ci, sj) ∼ Dirichlet(·)

Slide 51

Example: Finding Progression Stages Yang, J., McAuley, J., Leskovec, J., LePendu, P., and Shah, N. Finding progression stages in time-evolving event sequences. WWW 2014.

Slide 52

Naive Bayes

Slide 53

Naive Bayes Documents x1 x2 xn

Slide 54

Naive Bayes
Documents x1, x2, …, xn
Each document xi = { w1, w2, …, wm } is a set of words
[Word cloud of words from a spam email]

Slide 55

Naive Bayes
Documents x1, x2, …, xn with labels y1, y2, …, yn ∈ { ham, spam }
Each document xi = { w1, w2, …, wm } is a set of words
[Word cloud of words from a spam email]

Slide 56

Naive Bayes Assume there are M possible words in the vocabulary

Slide 57

Naive Bayes Generating a document xi:

Slide 58

Naive Bayes Generating a document xi: θ = P(spam)

Slide 59

Naive Bayes Generating a document xi: 1. Pick a label yi from {ham, spam} with probability θ θ = P(spam)

Slide 60

Naive Bayes Generating a document xi: 1. Pick a label yi from {ham, spam} with probability θ θ = P(spam) P(wj | ham) P(wj | spam)

Slide 61

Naive Bayes
Generating a document xi:
1. Pick a label yi from {ham, spam} with probability θ
θ = P(spam)
P(wj | ham), P(wj | spam)
[Word clouds illustrating the ham and spam vocabularies]

Slide 62

Naive Bayes
Generating a document xi:
1. Pick a label yi from {ham, spam} with probability θ, where θ = P(spam)
2. For each possible word w1 — wM, include it in the document with probability P(wj | yi), i.e. P(wj | ham) or P(wj | spam)
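
A hedged sketch of this generative process (a Bernoulli naive Bayes generator); the vocabulary and probabilities are toy values, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["million", "winner", "meeting", "lunch"]   # M = 4 possible words
theta = 0.4                                          # θ = P(spam)
p_word = {
    "spam": np.array([0.8, 0.7, 0.1, 0.1]),          # P(wj | spam)
    "ham":  np.array([0.1, 0.1, 0.6, 0.5]),          # P(wj | ham)
}

def generate_document():
    # 1. Pick a label with probability θ
    label = "spam" if rng.random() < theta else "ham"
    # 2. Include each word wj independently with probability P(wj | label)
    words = [w for w, p in zip(vocab, p_word[label]) if rng.random() < p]
    return label, words

print(generate_document())
```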

Slide 63

Naive Bayes
How many coins? One for θ = P(spam), plus one per word wj for P(wj | ham) and one for P(wj | spam): 2M + 1 in total

Slide 64

Naive Bayes — Important Assumption

Slide 65

Naive Bayes — Important Assumption
P(w1, w2, …, wM | label) = P(w1 | label)…P(wM | label)

Slide 66

Naive Bayes — Important Assumption
P(w1, w2, …, wM | label) = P(w1 | label)…P(wM | label)
The presence of each word within a document is conditionally independent of the other words, given the label

Slide 67

Conditional Independence SPAM “MILLION” “WINNER”

Slide 68

I observe the word “million” in the document Conditional Independence SPAM “MILLION” “WINNER”

Slide 69

I observe the word “million” in the document Conditional Independence SPAM “MILLION” “WINNER” How do the chances of observing the word “winner” change?

Slide 70

Conditional Independence SPAM “MILLION” “WINNER” Knowing that “million” is observed changes our degree of uncertainty about observing “winner”

Slide 71

Conditional Independence SPAM “MILLION” “WINNER” Knowing that “million” is observed changes our degree of uncertainty about observing “winner” The events are not independent

Slide 72

Conditional Independence SPAM “MILLION” “WINNER” I know that the document is not spam

Slide 73

Conditional Independence SPAM “MILLION” “WINNER” I know that the document is not spam I observe the word “million” in the document

Slide 74

Conditional Independence SPAM “MILLION” “WINNER” I know that the document is not spam I observe the word “million” in the document Does that affect the chances of observing “winner”?

Slide 75

Naive Bayes — Fitting Parameters
P(yi = spam) = θ = (number of spam documents) / (total number of documents)
P(wj | spam) = (number of times word wj occurs in spam) / (total number of words labeled spam)

Slide 76

Naive Bayes — Prediction
“My father was a very wealthy cocoa merchant in Abidjan, the economic capital of Ivory Coast before he was poisoned to death by his business associates on one of their outing to discus on a business deal.”
http://www.hoax-slayer.net/wumi-abdul-advance-fee-scam/
P(y = spam) ∝ P(“my” | spam)…P(“deal” | spam)P(spam)
P(y = ham) ∝ P(“my” | ham)…P(“deal” | ham)P(ham)

Slide 77

Naive Bayes — Smoothing
What if we observe a word in a document that we never saw during training?
P(“Ivory” | spam) = 0.0
P(y = spam) ∝ P(“my” | spam)…P(“Ivory” | spam)P(spam) = 0.0

Slide 78

Naive Bayes — Smoothing
Smooth word counts:
P(wj | spam) = (number of times word wj occurs in spam + 1) / (total number of words labeled spam + |V|)
|V| = number of unique words in the training data
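
A hedged sketch putting the fitting, smoothing, and prediction formulas together; the documents are toy data and all names are illustrative. Prediction works in log space to avoid numerical underflow when multiplying many small probabilities:

```python
import math
from collections import Counter

docs = [(["million", "winner", "deposit"], "spam"),
        (["meeting", "lunch"], "ham"),
        (["winner", "funds", "bank"], "spam"),
        (["lunch", "report", "meeting"], "ham")]

vocab = {w for words, _ in docs for w in words}
labels = ["ham", "spam"]

# θ and per-label word counts
prior = {y: sum(1 for _, yi in docs if yi == y) / len(docs) for y in labels}
counts = {y: Counter(w for words, yi in docs if yi == y for w in words) for y in labels}
totals = {y: sum(counts[y].values()) for y in labels}

def p_word(w, y):
    # (count of wj in label y + 1) / (total words labeled y + |V|)
    return (counts[y][w] + 1) / (totals[y] + len(vocab))

def predict(words):
    # argmax over labels of log P(y) + Σj log P(wj | y)
    scores = {y: math.log(prior[y]) + sum(math.log(p_word(w, y)) for w in words)
              for y in labels}
    return max(scores, key=scores.get)

print(predict(["million", "funds"]))   # → spam
```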

Slide 79

Method Evaluation

Slide 80

Evaluation Goals

Slide 81

Evaluation Goals
Model selection — finding the best-performing model (kNN vs. NB)

Slide 82

Evaluation Goals
Model selection — finding the best-performing model (kNN vs. NB)
“Hyperparameter” selection — finding the best hyperparameters for a given model (k = 2 vs. k = 3)

Slide 83

Evaluation Goals
Model selection — finding the best-performing model (kNN vs. NB)
“Hyperparameter” selection — finding the best hyperparameters for a given model (k = 2 vs. k = 3)
Goal: Minimize the error on future, unobserved data

Slide 84

Method I — Train, Validation, Test
• Split data randomly into train, validation and test
• Split can be stratified based on the label
• See sklearn.model_selection.train_test_split
Data → Train | Val | Test
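
A hedged sketch of this split with scikit-learn; X and y are placeholders and the 60/20/20 proportions are illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)
y = np.array([0, 1] * 25)

# Carve out a stratified test set first...
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
# ...then split the remainder into train and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=0)

print(len(X_train), len(X_val), len(X_test))   # 30 10 10
```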

Slide 85

Method I — Train, Validation, Test
Can we do this without wasting valuable training data?

Slide 86

Method II — k-Fold Cross-Validation
• Split data randomly into train and test
• Split train data randomly into k equal “folds”
• Train on k-1 folds, validate on the remaining fold
• Average the k metrics from each fold
• See sklearn.model_selection.KFold
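
A hedged sketch of 5-fold cross-validation with scikit-learn's KFold, averaging a kNN classifier's validation accuracy across folds; the data and model choice are illustrative:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(X[train_idx], y[train_idx])               # train on k-1 folds
    scores.append(model.score(X[val_idx], y[val_idx]))  # validate on the held-out fold

print(np.mean(scores))                                  # average of the k fold metrics
```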

Slide 87

Method II — k-Fold Cross-Validation
(Shuffled) Training Data split into Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5, with a different fold held out for validation in each of the 5 rounds
Test Data held out separately

Slide 88

Hyperparameter Selection
• Grid-search: Search a “well-spaced grid” of hyperparameter values
• Performance metrics averaged over the k validation folds
• Select the hyperparameters with the best performance
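
A hedged sketch of this procedure using scikit-learn's GridSearchCV, one common implementation of grid search with k-fold cross-validation; the data and grid are illustrative:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

grid = {"n_neighbors": [1, 3, 5, 7, 9]}   # a well-spaced grid for k in kNN
search = GridSearchCV(KNeighborsClassifier(), grid, cv=5)  # averaged over 5 folds
search.fit(X, y)

print(search.best_params_, search.best_score_)  # best hyperparameters and mean score
```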

Slide 89

Performance Metrics
Factors driving the choice of a performance metric:
• Data — balanced vs. skewed
• Task — ranking, classification, clustering
• Real-world use-case

Slide 90

No content

Slide 91

Performance Metrics
Skewed Data: Dangerous = +1
Labels: +1, +1, …, +1, -1, …, -1

Slide 92

Performance Metrics
Skewed Data: Dangerous = +1
Labels: +1, +1, …, +1 (90%), -1, …, -1 (10%)

Slide 93

Performance Metrics
Skewed Data: Dangerous = +1
Labels: +1, +1, …, +1 (90%), -1, …, -1 (10%)
Accuracy of “always guess +1” = 90%!

Slide 94

Performance Metrics — Confusion Matrix
            | Actual P       | Actual F
Predicted P | True Positive  | False Positive
Predicted F | False Negative | True Negative

Slide 95

Performance Metrics — TPR, FPR
True Positive Rate = TP / (TP + FN). Example: Percentage of dangerous objects correctly identified as such.
False Positive Rate = FP / (FP + TN). Example: Percentage of safe objects incorrectly identified as dangerous.
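
A hedged sketch computing both rates from the confusion-matrix counts defined above; the labels are toy values:

```python
def tpr_fpr(actual, predicted):
    tp = sum(a == +1 and p == +1 for a, p in zip(actual, predicted))
    fn = sum(a == +1 and p == -1 for a, p in zip(actual, predicted))
    fp = sum(a == -1 and p == +1 for a, p in zip(actual, predicted))
    tn = sum(a == -1 and p == -1 for a, p in zip(actual, predicted))
    return tp / (tp + fn), fp / (fp + tn)

actual    = [-1, +1, -1, +1]
predicted = [-1, -1, +1, +1]
print(tpr_fpr(actual, predicted))   # (0.5, 0.5)
```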

Slide 96

Performance Metrics — Thresholds
Actual: -1, +1, -1, +1
[ROC axes: TPR vs. FPR, 0.0 to 1.0]

Slide 97

Performance Metrics — Thresholds
Actual: -1, +1, -1, +1
Prob.:  0.1, 0.3, 0.4, 0.8
[ROC axes: TPR vs. FPR]

Slide 98

Performance Metrics — Thresholds
Actual: -1, +1, -1, +1
Prob.:  0.1, 0.3, 0.4, 0.8
Thresh. | Pred. | FPR/TPR
[ROC axes: TPR vs. FPR]

Slide 99

Performance Metrics — Thresholds
Actual: -1, +1, -1, +1
Prob.:  0.1, 0.3, 0.4, 0.8
Thresh. | Pred.          | FPR/TPR
0.8     | -1, -1, -1, +1 | 0.00 / 0.50
[ROC axes: TPR vs. FPR]

Slide 100

Performance Metrics — Thresholds
Actual: -1, +1, -1, +1
Prob.:  0.1, 0.3, 0.4, 0.8
Thresh. | Pred.          | FPR/TPR
0.8     | -1, -1, -1, +1 | 0.00 / 0.50
0.4     | -1, -1, +1, +1 | 0.50 / 0.50
[ROC axes: TPR vs. FPR]

Slide 101

Performance Metrics — Thresholds
Actual: -1, +1, -1, +1
Prob.:  0.1, 0.3, 0.4, 0.8
Thresh. | Pred.          | FPR/TPR
0.8     | -1, -1, -1, +1 | 0.00 / 0.50
0.4     | -1, -1, +1, +1 | 0.50 / 0.50
0.3     | -1, +1, +1, +1 | 0.50 / 1.00
[ROC axes: TPR vs. FPR]

Slide 102

Performance Metrics — Thresholds
Actual: -1, +1, -1, +1
Prob.:  0.1, 0.3, 0.4, 0.8
Thresh. | Pred.          | FPR/TPR
0.8     | -1, -1, -1, +1 | 0.00 / 0.50
0.4     | -1, -1, +1, +1 | 0.50 / 0.50
0.3     | -1, +1, +1, +1 | 0.50 / 1.00
0.1     | +1, +1, +1, +1 | 1.00 / 1.00
[ROC axes: TPR vs. FPR]

Slide 103

Performance Metrics — Thresholds
(Table repeated from the previous slide, now plotted on the TPR-vs-FPR axes.)
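
A hedged check of the table with scikit-learn's roc_curve, which sweeps the decision threshold over the predicted probabilities:

```python
from sklearn.metrics import roc_curve

actual = [-1, +1, -1, +1]
probs  = [0.1, 0.3, 0.4, 0.8]

fpr, tpr, thresholds = roc_curve(actual, probs)   # pos_label=1 is inferred
for t, f, r in zip(thresholds, fpr, tpr):
    print(f"threshold {t}: FPR/TPR = {f:.2f} / {r:.2f}")
```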

Slide 104

Performance Metrics — Precision/Recall
Precision = TP / (TP + FP). Example: Percentage of objects identified as dangerous that were actually dangerous.
Recall = TP / (TP + FN). Example: Percentage of dangerous objects correctly identified as such.
F1-Score = 2 (P × R) / (P + R)

Slide 105

Performance Metrics — Precision/Recall
Actual: -1, +1, -1, +1
Prob.:  0.1, 0.3, 0.4, 0.8
Thresh. | Pred.          | P / R
0.8     | -1, -1, -1, +1 | 1.00 / 0.50
0.4     | -1, -1, +1, +1 | 0.50 / 0.50
0.3     | -1, +1, +1, +1 | 0.66 / 1.00
0.1     | +1, +1, +1, +1 | 0.50 / 1.00
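
A hedged check of the precision/recall table with scikit-learn:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

actual = [-1, +1, -1, +1]
probs  = [0.1, 0.3, 0.4, 0.8]

for thresh in [0.8, 0.4, 0.3, 0.1]:
    pred = [+1 if p >= thresh else -1 for p in probs]
    p = precision_score(actual, pred, pos_label=1)
    r = recall_score(actual, pred, pos_label=1)
    f1 = f1_score(actual, pred, pos_label=1)
    print(f"threshold {thresh}: P/R = {p:.2f} / {r:.2f}, F1 = {f1:.2f}")
```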

Slide 106

Model Evaluation Demo
Notebook available at:
https://gist.github.com/emaadmanzoor/0ba78a2920ea0858b54942eff8b08820