Introduction to Classification

Emaad Manzoor
November 14, 2017


Lecture for the 95865 Unstructured Data Analysis course in Fall 2017.

Demo notebook: https://gist.github.com/emaadmanzoor/0ba78a2920ea0858b54942eff8b08820


Transcript

  1. Introduction to Classification & Regression Emaad Ahmed Manzoor November, 2017

  2. Classification

  3. (Image-only slide)
  4. (Image-only slide)
  5. How would you tag images with descriptive text?

  6. Image Tagging as Classification

  7. Image Tagging as Classification. Images x1, x2, …, xn: https://static.pexels.com/photos/126407/pexels-photo-126407.jpeg, https://static.pexels.com/photos/54632/cat-animal-eyes-grey-54632.jpeg, https://c1.staticflickr.com/4/3645/3523440998_474d43ddc6_b.jpg

  8. Image Tagging as Classification. Each image xi is represented as a vector of raw pixel values (e.g., 200, 255, 101, 205, 100, …).

  9. Image Tagging as Classification. Each feature vector is paired with a label: y1 = cat, y2 = cat, …, yn = spiderman.

  10. Image Tagging as Classification. Given a new image xn+1, predict its label ŷn+1 (e.g., cat).
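
    A minimal sketch of turning an image into a feature vector of raw pixel values, as in the slides above (the toy pixel values and array shape are illustrative):

        import numpy as np

        # A toy 2x2 RGB "image" of raw pixel values (a real image would be much larger).
        image = np.array([[[200, 255, 101], [205, 100, 200]],
                          [[255, 101, 205], [100, 200, 255]]], dtype=np.uint8)

        # Flatten the pixel grid into a single feature vector x_i.
        x_i = image.reshape(-1)
        print(x_i.shape, x_i[:6])   # (12,) [200 255 101 205 100 200]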

  11. Classification Essentials. Labeled data: { (x1, y1), (x2, y2), … }

  12. Classification Essentials. Labeled data: { (x1, y1), (x2, y2), … }. Types of labels: • Binary: yi ∈ { 0, 1 } • Multi-class: yi ∈ { cat, spiderman, … } • Multi-label: yi ∈ { {cat, feline}, {…} }

  13. Classification Essentials. Labeled data: { (x1, y1), (x2, y2), … } + Classification Model

  14. Classification Essentials. Labeled data: { (x1, y1), (x2, y2), … } + Classification Model: • Linear function • Tree-based • Nearest-neighbor • …
  15. A Simple Classifier — kNN

  16. A Simple Classifier — kNN. Intuition: similar points have similar labels.

  17. A Simple Classifier — kNN. Training data: { (x1, y1), (x2, y2), … }

  18. A Simple Classifier — kNN. Test point: xi, yi = ?

  19. A Simple Classifier — kNN. k = 1: k-NN majority vote. Test point: xi, yi = ?

  20. A Simple Classifier — kNN. k = 2: k-NN majority vote; break ties randomly. Test point: xi, yi = ?

  21. A Simple Classifier — kNN. k = 3: k-NN majority vote; break ties randomly. Test point: xi, yi = ?
  22. A Simple Classifier — kNN. Classifier decision boundary. (Excerpt from Bishop, Pattern Recognition and Machine Learning: to classify a new point, identify the K nearest points in the training set and assign it to the class with the most representatives among them, breaking ties at random; K = 1 is the nearest-neighbour rule. Figure 2.28 shows K-nearest-neighbour classification of the oil flow data, x6 vs. x7, for K = 1, 3, 31: K controls the degree of smoothing, with small K producing many small regions of each class and large K fewer, larger regions.)

  23. A Simple Classifier — kNN. Classifier decision boundary (same excerpt and figure). Note the non-linear decision boundary.
  24. A Simple Classifier — kNN. If the optimal (Bayes) classifier has error e, then 1-NN (k = 1) has asymptotic error < 2e (Cover and Hart, 1967).
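
    A minimal sketch of kNN classification with scikit-learn (the toy data, labels, and choice of k are illustrative, not taken from the lecture demo):

        import numpy as np
        from sklearn.neighbors import KNeighborsClassifier

        # Toy labeled data: feature vectors x_i and labels y_i.
        X_train = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [3.8, 4.0]])
        y_train = np.array(["cat", "cat", "spiderman", "spiderman"])

        # k = 3 nearest neighbours, majority vote over their labels.
        knn = KNeighborsClassifier(n_neighbors=3)
        knn.fit(X_train, y_train)

        # Predict the label of a new test point x_{n+1}.
        print(knn.predict([[1.1, 0.9]]))  # -> ['cat']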
  25. A Linear Classifier — SVM

  26. A Linear Classifier — SVM. Assumption: data is linearly separable. { (X1, t1), (X2, t2), … }, ti ∈ { +1, -1 }

  27. A Linear Classifier — SVM. Linearly separable vs. not linearly separable (two scatter plots on axes w1, w2).

  28. A Linear Classifier — SVM. Goal: find the best linear separator.

  29. A Linear Classifier — SVM. (Figure: data and a candidate separator on axes w1, w2.)

  30. A Linear Classifier — SVM. Which linear separator is better? (axes w1, w2)

  31. A Linear Classifier — SVM. Margin: the perpendicular distance between the separator and the nearest data point.

  32. A Linear Classifier — SVM. Goal: find the maximum-margin linear separator.

  33. A Linear Classifier — SVM. Goal: find the maximum-margin linear separator (the margin is shown in the figure).
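
    A minimal sketch of fitting a maximum-margin linear separator with scikit-learn's SVC (the toy data is illustrative; a large C approximates the hard-margin case on linearly separable data):

        import numpy as np
        from sklearn.svm import SVC

        # Linearly separable toy data with labels t_i in {+1, -1}.
        X = np.array([[0.0, 0.0], [0.5, 0.5], [3.0, 3.0], [3.5, 2.5]])
        t = np.array([-1, -1, +1, +1])

        # Maximize the margin subject to classifying every training point correctly.
        clf = SVC(kernel="linear", C=1e6)
        clf.fit(X, t)

        print(clf.coef_, clf.intercept_)   # the learned separator (w, b)
        print(clf.predict([[2.5, 2.5]]))   # -> [1]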
  34. A Linear Classifier — SVM. How do we handle non-linearly separable data?
  35. Kernels

  36. Kernels Not Linearly Separable

  37. Kernels. ϕ maps [ x, y ] to [ x, y, x², y² ].

  38. Kernels. After applying ϕ, the data is linearly separable in 4-D.

  39. Kernels. Linearly separable in 4-D; non-linear decision boundary in 2-D.
  40. Kernels https://en.wikipedia.org/wiki/Polynomial_kernel

  41. Kernels. Problem: the non-linear mapping ϕ can be extremely high-dimensional.

  42. Kernels. Problem: the non-linear mapping ϕ can be extremely high-dimensional. Solution: the kernel trick (oneweirdkerneltrick.com).
  43. The Kernel Trick An explicit mapping is not required!

  44. The Kernel Trick. A valid kernel function K(x, y) implicitly defines a feature mapping. It is much easier to define K(x, y).

  45. Examples of Valid Kernels. Radial basis function: K(x, x′) = exp(−‖x − x′‖² / (2σ²)). Polynomial: K(x, x′) = (xᵀx′ + c)^d.
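
    A minimal sketch of the kernel trick with scikit-learn: the same SVC, but with an RBF or polynomial kernel, so the mapping ϕ is never computed explicitly (the toy data and kernel parameters are illustrative):

        import numpy as np
        from sklearn.svm import SVC

        # Toy data that is not linearly separable: class +1 sits inside a ring of -1.
        rng = np.random.RandomState(0)
        angles = rng.uniform(0, 2 * np.pi, 40)
        X_outer = np.c_[2.0 * np.cos(angles), 2.0 * np.sin(angles)]   # ring, label -1
        X_inner = rng.normal(scale=0.3, size=(40, 2))                 # blob, label +1
        X = np.vstack([X_outer, X_inner])
        y = np.array([-1] * 40 + [+1] * 40)

        # RBF kernel K(x, x') = exp(-||x - x'||^2 / (2 sigma^2)); here gamma = 1 / (2 sigma^2).
        clf_rbf = SVC(kernel="rbf", gamma=0.5).fit(X, y)

        # Polynomial kernel K(x, x') = (x^T x' + c)^d.
        clf_poly = SVC(kernel="poly", degree=2, coef0=1.0).fit(X, y)

        print(clf_rbf.predict([[0.0, 0.0], [2.0, 0.0]]))   # -> [ 1 -1]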
  46. Generative Models

  47. Generative Models. Model the unknown data-generating process; learn the parameters of the model from the data.

  48. Example: Finding Progression Stages. Yang, J., McAuley, J., Leskovec, J., LePendu, P., and Shah, N. Finding progression stages in time-evolving event sequences. WWW 2014. Data: beer ratings by users on RateBeer.com.

  49. Example: Finding Progression Stages. Yang, J., McAuley, J., Leskovec, J., LePendu, P., and Shah, N. Finding progression stages in time-evolving event sequences. WWW 2014. Question: how do beer drinkers "progress" over time?

  50. Example: Finding Progression Stages. Yang, J., McAuley, J., Leskovec, J., LePendu, P., and Shah, N. Finding progression stages in time-evolving event sequences. WWW 2014. A generative model: xij ∼ Multinomial(Θ(ci, sj)), Θ(ci, sj) ∼ Dirichlet(·). Maximize the data likelihood.
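
    A toy sketch of the Dirichlet-Multinomial building block stated above: draw an event-category distribution Θ from a Dirichlet prior, then draw observed events from a Multinomial. This is only the sampling step, not the full progression-stage model from the paper, and the hyperparameter alpha and counts are illustrative:

        import numpy as np

        rng = np.random.default_rng(0)

        # Dirichlet prior over K = 5 event categories (alpha is an illustrative choice).
        alpha = np.ones(5)
        theta = rng.dirichlet(alpha)              # Θ(c_i, s_j) ~ Dirichlet(alpha)

        # Generate a user's events at this stage: x_ij ~ Multinomial(Θ(c_i, s_j)).
        x_ij = rng.multinomial(n=20, pvals=theta)
        print(theta, x_ij)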
  51. Example: Finding Progression Stages. Yang, J., McAuley, J., Leskovec, J., LePendu, P., and Shah, N. Finding progression stages in time-evolving event sequences. WWW 2014.
  52. Naive Bayes

  53. Naive Bayes Documents x1 x2 xn

  54. Naive Bayes. Documents x1, x2, …, xn. Each document xi = { w1, w2, …, wm } is a set of words. (Word cloud of example document words.)
  55. Naive Bayes. Documents x1, x2, …, xn with labels y1, y2, …, yn ∈ { ham, spam }. Each document xi = { w1, w2, …, wm } is a set of words. (Word cloud of example document words.)
  56. Naive Bayes. Assume there are M possible words in the vocabulary.
  57. Naive Bayes Generating a document xi:

  58. Naive Bayes Generating a document xi: θ = P(spam)

  59. Naive Bayes. Generating a document xi: 1. Pick a label yi from {ham, spam} with probability θ. θ = P(spam)

  60. Naive Bayes. Generating a document xi: 1. Pick a label yi from {ham, spam} with probability θ. θ = P(spam); P(wj | ham), P(wj | spam)
  61. Naive Bayes. Generating a document xi: 1. Pick a label yi from {ham, spam} with probability θ. θ = P(spam); P(wj | ham), P(wj | spam). (Word clouds of the ham and spam vocabularies.)
  62. Naive Bayes. Generating a document xi: 1. Pick a label yi from {ham, spam} with probability θ. 2. For each possible word w1 … wM, include it in the document with probability P(wj | yi). θ = P(spam); P(wj | ham), P(wj | spam)
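
    A toy sketch of this generative process (a Bernoulli-style naive Bayes generator; the vocabulary, θ, and per-word probabilities are made-up values for illustration):

        import numpy as np

        rng = np.random.default_rng(1)

        vocab = ["money", "winner", "meeting", "lunch"]          # M = 4 possible words
        theta = 0.4                                              # θ = P(spam), illustrative
        p_word = {                                               # P(w_j | label), illustrative
            "spam": [0.8, 0.7, 0.1, 0.1],
            "ham":  [0.1, 0.05, 0.6, 0.5],
        }

        def generate_document():
            # 1. Pick a label y_i from {ham, spam} with probability θ.
            label = "spam" if rng.random() < theta else "ham"
            # 2. Include each word w_j independently with probability P(w_j | label).
            words = [w for w, p in zip(vocab, p_word[label]) if rng.random() < p]
            return label, words

        print(generate_document())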
  63. Naive Bayes. How many coins? θ = P(spam); P(wj | ham) and P(wj | spam) for each word wj.
  64. Naive Bayes — Important Assumption

  65. Naive Bayes — Important Assumption. P(w1, w2, …, wM | label) = P(w1 | label) … P(wM | label)

  66. Naive Bayes — Important Assumption. P(w1, w2, …, wM | label) = P(w1 | label) … P(wM | label). The presence of each word within a document is conditionally independent of the other words, given the label.
  67. Conditional Independence SPAM “MILLION” “WINNER”

  68. Conditional Independence (SPAM, "MILLION", "WINNER"). I observe the word "million" in the document.

  69. Conditional Independence (SPAM, "MILLION", "WINNER"). I observe the word "million" in the document. How do the chances of observing the word "winner" change?

  70. Conditional Independence (SPAM, "MILLION", "WINNER"). Knowing that "million" is observed changes our degree of uncertainty about observing "winner".

  71. Conditional Independence (SPAM, "MILLION", "WINNER"). Knowing that "million" is observed changes our degree of uncertainty about observing "winner". The events are not independent.

  72. Conditional Independence (SPAM, "MILLION", "WINNER"). I know that the document is not spam.

  73. Conditional Independence (SPAM, "MILLION", "WINNER"). I know that the document is not spam. I observe the word "million" in the document.

  74. Conditional Independence (SPAM, "MILLION", "WINNER"). I know that the document is not spam. I observe the word "million" in the document. Does that affect the chances of observing "winner"?
  75. Naive Bayes — Fitting Parameters. P(yi = spam) = θ = (number of spam documents) / (total number of documents). P(wj | spam) = (number of times word wj occurs in spam) / (total number of words labeled spam).
  76. Naive Bayes — Prediction. "My father was a very wealthy cocoa merchant in Abidjan, the economic capital of Ivory Coast before he was poisoned to death by his business associates on one of their outing to discus on a business deal." (http://www.hoax-slayer.net/wumi-abdul-advance-fee-scam/) P(y = spam) ∝ P("my" | spam) … P("deal" | spam) P(spam). P(y = ham) ∝ P("my" | ham) … P("deal" | ham) P(ham).
  77. Naive Bayes — Smoothing. What if we observe a word in a document that we never saw during training? P("Ivory" | spam) = 0.0, so P(y = spam) ∝ P("my" | spam) … P("Ivory" | spam) P(spam) = 0.0.
  78. Naive Bayes — Smoothing. Smooth word counts: P(wj | spam) = (number of times word wj occurs in spam + 1) / (total number of words labeled spam + |V|), where |V| is the number of unique words in the training data.
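
    A minimal sketch of fitting and applying these estimates with add-one (Laplace) smoothing, on made-up toy documents; scikit-learn's MultinomialNB implements the same idea:

        import math
        from collections import Counter

        # Toy training data: (document words, label); illustrative only.
        train = [
            (["million", "winner", "money"], "spam"),
            (["money", "transfer", "account"], "spam"),
            (["lunch", "meeting", "tomorrow"], "ham"),
            (["meeting", "notes", "attached"], "ham"),
        ]

        labels = {"spam", "ham"}
        vocab = {w for doc, _ in train for w in doc}             # unique words, |V|
        doc_counts = Counter(y for _, y in train)
        word_counts = {y: Counter() for y in labels}
        for doc, y in train:
            word_counts[y].update(doc)

        def log_posterior(doc, y):
            # log P(y) + sum_j log P(w_j | y), with add-one smoothing.
            log_p = math.log(doc_counts[y] / len(train))
            total = sum(word_counts[y].values())
            for w in doc:
                log_p += math.log((word_counts[y][w] + 1) / (total + len(vocab)))
            return log_p

        test_doc = ["million", "money", "meeting"]
        print(max(labels, key=lambda y: log_posterior(test_doc, y)))  # -> 'spam'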
  79. Method Evaluation

  80. Evaluation Goals

  81. Evaluation Goals. Model selection — finding the best-performing model (e.g., kNN vs. NB).
  82. Evaluation Goals. Model selection — finding the best-performing model (kNN, NB). "Hyperparameter" selection — finding the best hyperparameters for a given model (e.g., k = 2 vs. k = 3).
  83. Evaluation Goals. Goal: minimize the error on future, unobserved data. Model selection — finding the best-performing model (kNN, NB). "Hyperparameter" selection — finding the best hyperparameters for a given model (e.g., k = 2 vs. k = 3).
  84. Method I — Train, Validation, Test. • Split data randomly into train, validation and test • Split can be stratified based on the label • See sklearn.model_selection.train_test_split (Diagram: Data split into Train, Val, Test.)
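
    A minimal sketch of this split with scikit-learn's train_test_split (the toy data, split fractions, and random_state are illustrative):

        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split

        # Toy labeled data, in place of real features and labels.
        X, y = make_classification(n_samples=1000, random_state=0)

        # First carve out a held-out test set, then split the rest into train/validation.
        X_rest, X_test, y_rest, y_test = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=0)
        X_train, X_val, y_train, y_val = train_test_split(
            X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=0)

        print(len(X_train), len(X_val), len(X_test))  # 600 200 200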
  85. Method I — Train and Test Data. Can we do this without wasting valuable training data?
  86. Method II — k-Fold Cross-Validation. • Split data randomly into train and test • Split train data randomly into k equal "folds" • Train on k-1 folds, validate on the remaining • Average the k metrics from each fold • See sklearn.model_selection.KFold
  87. Method II — k-Fold Cross-Validation. (Diagram: the shuffled training data is split into 5 folds; in each of 5 rounds a different fold serves as the validation set while the rest are used for training; the test data is held out throughout.)
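
    A minimal sketch of k-fold cross-validation with scikit-learn's KFold (k = 5; the model and toy data are illustrative):

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.model_selection import KFold
        from sklearn.neighbors import KNeighborsClassifier

        X, y = make_classification(n_samples=500, random_state=0)   # toy data
        scores = []

        # Train on k-1 folds, validate on the remaining fold, repeat k times.
        for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
            model = KNeighborsClassifier(n_neighbors=3)
            model.fit(X[train_idx], y[train_idx])
            scores.append(model.score(X[val_idx], y[val_idx]))

        print(np.mean(scores))   # average the k validation metrics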
  88. Hyperparameter Selection. • Grid-search: search a "well-spaced grid" of hyperparameter values • Performance metrics averaged over the k validation folds • Select the hyperparameters with the best performance
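
    A minimal sketch of grid search with scikit-learn's GridSearchCV, which averages each setting's performance over k validation folds (the model and grid values are illustrative):

        from sklearn.datasets import make_classification
        from sklearn.model_selection import GridSearchCV
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=500, random_state=0)   # toy data

        # A well-spaced grid of hyperparameter values.
        param_grid = {"C": [0.01, 0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}

        search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)   # 5-fold CV per setting
        search.fit(X, y)

        print(search.best_params_, search.best_score_)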
  89. Performance Metrics. Factors driving the choice of a performance metric: • Data — balanced vs. skewed • Task — ranking, classification, clustering • Real-world use-case
  90. (Image-only slide)
  91. Performance Metrics — Skewed Data. Dangerous = +1, +1, …, +1, -1, …, -1.

  92. Performance Metrics — Skewed Data. Dangerous = +1, +1, …, +1 (90%), -1, …, -1 (10%).

  93. Performance Metrics — Skewed Data. Dangerous = +1, +1, …, +1 (90%), -1, …, -1 (10%). Accuracy of "always guess +1" = 90%!
  94. Performance Metrics — Confusion Matrix.
                      Actual P           Actual F
      Predicted P     True Positive      False Positive
      Predicted F     False Negative     True Negative
  95. Performance Metrics — TPR, FPR. True Positive Rate = TP / (TP + FN). Example: percentage of dangerous objects correctly identified as such. False Positive Rate = FP / (FP + TN). Example: percentage of safe objects incorrectly identified as dangerous.
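
    A minimal sketch of computing the confusion matrix, TPR, and FPR from predictions (the toy labels are illustrative):

        import numpy as np
        from sklearn.metrics import confusion_matrix

        y_true = np.array([+1, -1, +1, -1, +1, -1, -1, -1])   # toy actual labels
        y_pred = np.array([+1, -1, -1, -1, +1, +1, -1, -1])   # toy predicted labels

        # With labels ordered [+1, -1], rows are actual and columns are predicted.
        cm = confusion_matrix(y_true, y_pred, labels=[+1, -1])
        tp, fn, fp, tn = cm[0, 0], cm[0, 1], cm[1, 0], cm[1, 1]

        print("TPR =", tp / (tp + fn))   # dangerous objects correctly flagged
        print("FPR =", fp / (fp + tn))   # safe objects incorrectly flagged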
  96. Performance Metrics — Thresholds. Actual labels: -1, +1, -1, +1. (Axes: FPR and TPR, each from 0.0 to 1.0.)

  97. Performance Metrics — Thresholds. Actual: -1, +1, -1, +1. Predicted probabilities: 0.1, 0.3, 0.4, 0.8.

  98. Performance Metrics — Thresholds. Actual: -1, +1, -1, +1. Prob.: 0.1, 0.3, 0.4, 0.8. For each threshold, compute the predictions and the resulting FPR/TPR.

  99. Performance Metrics — Thresholds. Threshold 0.8 → predictions -1, -1, -1, +1 → FPR/TPR = 0.00 / 0.50.

  100. Performance Metrics — Thresholds. Threshold 0.4 → predictions -1, -1, +1, +1 → FPR/TPR = 0.50 / 0.50.

  101. Performance Metrics — Thresholds. Threshold 0.3 → predictions -1, +1, +1, +1 → FPR/TPR = 0.50 / 1.00.

  102. Performance Metrics — Thresholds. Actual: -1, +1, -1, +1. Prob.: 0.1, 0.3, 0.4, 0.8. Full table:
       Thresh.   Pred.              FPR / TPR
       0.8       -1, -1, -1, +1     0.00 / 0.50
       0.4       -1, -1, +1, +1     0.50 / 0.50
       0.3       -1, +1, +1, +1     0.50 / 1.00
       0.1       +1, +1, +1, +1     1.00 / 1.00

  103. Performance Metrics — Thresholds. (Same table as above, with the (FPR, TPR) points plotted on the FPR/TPR axes, one point per threshold.)
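
    A minimal sketch of sweeping thresholds to get (FPR, TPR) pairs for the four example points above; scikit-learn's roc_curve performs the same sweep:

        import numpy as np
        from sklearn.metrics import roc_curve

        y_true = np.array([-1, +1, -1, +1])       # actual labels from the slide
        scores = np.array([0.1, 0.3, 0.4, 0.8])   # predicted probabilities of +1

        fpr, tpr, thresholds = roc_curve(y_true, scores, pos_label=+1)
        for f, t, thr in zip(fpr, tpr, thresholds):
            print(f"threshold {thr:.1f}: FPR={f:.2f}, TPR={t:.2f}")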
  104. Performance Metrics — Precision/Recall. Precision = TP / (TP + FP). Example: percentage of objects identified as dangerous that were actually dangerous. Recall = TP / (TP + FN). Example: percentage of dangerous objects correctly identified as such. F1-Score = 2PR / (P + R).
  105. Performance Metrics — Precision/Recall. Actual: -1, +1, -1, +1. Prob.: 0.1, 0.3, 0.4, 0.8.
       Thresh.   Pred.              P / R
       0.8       -1, -1, -1, +1     1.00 / 0.50
       0.4       -1, -1, +1, +1     0.50 / 0.50
       0.3       -1, +1, +1, +1     0.66 / 1.00
       0.1       +1, +1, +1, +1     0.50 / 1.00
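
    A minimal sketch of computing precision, recall, and F1 at a fixed threshold for the same four example points (the threshold 0.4 is one illustrative choice from the table above):

        import numpy as np
        from sklearn.metrics import precision_score, recall_score, f1_score

        y_true = np.array([-1, +1, -1, +1])        # actual labels from the slide
        scores = np.array([0.1, 0.3, 0.4, 0.8])    # predicted probabilities of +1

        y_pred = np.where(scores >= 0.4, +1, -1)   # threshold 0.4 -> predictions -1, -1, +1, +1

        print(precision_score(y_true, y_pred, pos_label=+1))  # TP / (TP + FP) = 0.5
        print(recall_score(y_true, y_pred, pos_label=+1))     # TP / (TP + FN) = 0.5
        print(f1_score(y_true, y_pred, pos_label=+1))         # 2PR / (P + R) = 0.5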
  106. Model Evaluation Demo. Notebook available at: https://gist.github.com/emaadmanzoor/0ba78a2920ea0858b54942eff8b08820