JGS594 Lecture 21

Software Engineering for Machine Learning
Naive Bayes
(202204)

Javier Gonzalez

April 20, 2022
Transcript

  1. jgs SER 594 Software Engineering for Machine Learning Lecture 21:

    Classification Dr. Javier Gonzalez-Sanchez javiergs@asu.edu javiergs.engineering.asu.edu | javiergs.com PERALTA 230U Office Hours: By appointment
  2. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 2

    jgs Machine Learning
  3.

    jgs Definition
    § Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs.
    § Classification – approximating a mapping function (f) from input variables (X) to discrete output variables (y).
    § Regression – approximating a mapping function (f) from input variables (X) to a continuous output variable (y).
    § Discretization – converting a regression problem into a classification problem.
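The discretization bullet can be sketched with a toy example; the thresholds and class names below are made up purely for illustration, mapping a continuous output (a temperature a regression model would predict) onto discrete classes a classifier could predict instead.

```java
public class Discretize {
    // Hypothetical thresholds: turn a continuous temperature (a regression
    // target) into one of three discrete classes (a classification target).
    static String label(double temperature) {
        if (temperature < 15.0) return "cool";
        if (temperature < 25.0) return "mild";
        return "hot";
    }

    public static void main(String[] args) {
        System.out.println(label(10.0)); // cool
        System.out.println(label(20.0)); // mild
        System.out.println(label(30.0)); // hot
    }
}
```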
  4.

    jgs But, how does a classifier work?
  5.

    jgs Previously
    // yes, we are still using Mallet.
    // Nothing else.
  6. jgs Classification Naïve Bayes

  7.

    jgs Naïve Bayes
    § It is one of the most popular and simplest machine learning classification algorithms
    § It is based on Bayes' Theorem for calculating probabilities and conditional probabilities
    § It can be extremely fast relative to other classification algorithms
    § It is easy to build and particularly useful for very large data sets
    § The name "naive" is used because the algorithm assumes that the features that go into the model are independent of each other. That is, changing the value of one feature does not directly influence or change the value of any of the other features used in the algorithm.
  8.

    jgs Bayes
    [Frequency table and likelihood table for X and Y]
    Bayes Rule: P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
    Example: P(Yes | Sunny) = (3/9) * (9/14) / (5/14) = 0.60
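The slide's arithmetic can be checked in a few lines; the three probabilities below are the ones read off the slide's frequency table (9 of 14 days are Yes, 5 of 14 are Sunny, and 3 of the 9 Yes days are Sunny).

```java
public class BayesSunny {
    // Bayes Rule: P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
    static double pYesGivenSunny() {
        double pSunnyGivenYes = 3.0 / 9.0;  // 3 of the 9 "Yes" days are Sunny
        double pYes = 9.0 / 14.0;           // prior: 9 "Yes" days out of 14
        double pSunny = 5.0 / 14.0;         // evidence: 5 Sunny days out of 14
        return pSunnyGivenYes * pYes / pSunny;
    }

    public static void main(String[] args) {
        System.out.println(pYesGivenSunny()); // ≈ 0.60, as on the slide
    }
}
```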
  9.

    jgs Frequency Table
        Outlook   Temperature  Humidity  Windy  Play Golf
    0   Rainy     Hot          High      False  No
    1   Rainy     Hot          High      True   No
    2   Overcast  Hot          High      False  Yes
    3   Sunny     Mild         High      False  Yes
    4   Sunny     Cool         Normal    False  Yes
    5   Sunny     Cool         Normal    True   No
    6   Overcast  Cool         Normal    True   Yes
    7   Rainy     Mild         High      False  No
    8   Rainy     Cool         Normal    False  Yes
    9   Sunny     Mild         Normal    False  Yes
    10  Rainy     Mild         Normal    True   Yes
    11  Overcast  Mild         High      True   Yes
    12  Overcast  Hot          Normal    False  Yes
    13  Sunny     Mild         High      True   No
    (X1 = Outlook, X2 = Temperature, X3 = Humidity, X4 = Windy, Y = Play Golf)
  10.

    jgs Naïve Bayes
    Bayes Rule: [formula]
    Naïve Bayes Rule: [formula]
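The two formulas on this slide were images and did not survive the transcript. In standard notation, with class y and features x_1, …, x_n, they are (the approximate form with a product of feature marginals in the denominator is the one the worked example on the next slide uses):

```latex
% Bayes Rule
P(y \mid X) = \frac{P(X \mid y)\, P(y)}{P(X)}

% Naive Bayes Rule: the features are assumed conditionally
% independent given the class y
P(y \mid x_1, \ldots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)

% As computed on the slides, with the evidence approximated by
% the product of the individual feature marginals:
P(y \mid x_1, \ldots, x_n) \approx
    \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{\prod_{i=1}^{n} P(x_i)}
```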
  11.

    jgs Naïve Bayes
    Example (for the instance Sunny, Hot, High, False):
    P(Yes | Sunny, Hot, High, False)
      = P(Sunny | Yes) * P(Hot | Yes) * P(High | Yes) * P(False | Yes) * P(Yes) / (P(Sunny) * P(Hot) * P(High) * P(False))
      = (3/9 * 2/9 * 3/9 * 6/9 * 9/14) / (5/14 * 4/14 * 7/14 * 8/14)
      = 0.01646 * 0.6428 / 0.0291 = 0.3629
    P(No | Sunny, Hot, High, False)
      = P(Sunny | No) * P(Hot | No) * P(High | No) * P(False | No) * P(No) / (P(Sunny) * P(Hot) * P(High) * P(False))
      = (2/5 * 2/5 * 4/5 * 2/5 * 5/14) / (5/14 * 4/14 * 7/14 * 8/14)
      = 0.0512 * 0.3571 / 0.0291 = 0.6282
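This arithmetic can be reproduced from the 14-row table; the conditional probabilities in the products (3/9, 2/9, 3/9, 6/9 for Yes; 2/5, 2/5, 4/5, 2/5 for No) are the table's counts for the instance (Outlook=Sunny, Temperature=Hot, Humidity=High, Windy=False). A sketch with those counts hard-coded:

```java
public class NaiveBayesGolf {
    // Marginals for Sunny, Hot, High, False out of 14 days
    static final double[] MARGINALS = {5.0/14, 4.0/14, 7.0/14, 8.0/14};

    // Naive Bayes score: prior times the per-feature conditionals,
    // divided by the product of the feature marginals (as on the slide)
    static double posterior(double[] condGivenClass, double prior) {
        double num = prior;
        for (double p : condGivenClass) num *= p;
        double den = 1.0;
        for (double p : MARGINALS) den *= p;
        return num / den;
    }

    static double pYes() { // 9 "Yes" days: Sunny 3, Hot 2, High 3, False 6
        return posterior(new double[]{3.0/9, 2.0/9, 3.0/9, 6.0/9}, 9.0/14);
    }

    static double pNo() {  // 5 "No" days: Sunny 2, Hot 2, High 4, False 2
        return posterior(new double[]{2.0/5, 2.0/5, 4.0/5, 2.0/5}, 5.0/14);
    }

    public static void main(String[] args) {
        System.out.printf("P(Yes | Sunny, Hot, High, False) = %.4f%n", pYes()); // ~0.3630
        System.out.printf("P(No  | Sunny, Hot, High, False) = %.4f%n", pNo());  // ~0.6272
    }
}
```

The No posterior comes out as roughly 0.627 here; the slide's 0.6282 differs only because the slide rounds its intermediate values before dividing.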
  12.

    jgs Weka | NaiveBayes | CSV file
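The code on this slide was an image and is not in the transcript. A minimal sketch of the likely content, loading a CSV file and training Weka's NaiveBayes; this assumes the Weka 3 Java API is on the classpath, and the file path is a placeholder:

```java
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.CSVLoader;

import java.io.File;

public class TrainFromCsv {
    public static void main(String[] args) throws Exception {
        // Load the dataset from a CSV file (placeholder path)
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("data/weather.nominal.csv"));
        Instances dataset = loader.getDataSet();

        // By convention the class attribute is the last column
        dataset.setClassIndex(dataset.numAttributes() - 1);

        // Build the Naive Bayes classifier and print its model
        NaiveBayes classifier = new NaiveBayes();
        classifier.buildClassifier(dataset);
        System.out.println(classifier);
    }
}
```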
  13.

    jgs Note: ARFF Files
    String path = "data/weather.nominal.arff";
    BufferedReader bufferedReader = new BufferedReader(new FileReader(path));
    Instances datasetInstances = new Instances(bufferedReader);
  14.

    jgs Weka | NaiveBayes | Evaluation
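The evaluation code on this slide is also missing from the transcript. A minimal sketch of the usual pattern, running 10-fold cross-validation with Weka's Evaluation class (assumes Weka 3 on the classpath; the path is a placeholder):

```java
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

import java.util.Random;

public class EvaluateNaiveBayes {
    public static void main(String[] args) throws Exception {
        // Load the dataset (placeholder path) and mark the class column
        Instances dataset = DataSource.read("data/weather.nominal.arff");
        dataset.setClassIndex(dataset.numAttributes() - 1);

        // 10-fold cross-validation of Naive Bayes, fixed random seed
        Evaluation evaluation = new Evaluation(dataset);
        evaluation.crossValidateModel(new NaiveBayes(), dataset, 10, new Random(1));

        // Accuracy, error rates, and the confusion matrix
        System.out.println(evaluation.toSummaryString());
        System.out.println(evaluation.toMatrixString());
    }
}
```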
  15. jgs Classification Decision Tree

  16.

    jgs Decision Tree
    § Also known as Classification And Regression Trees (CART)
    § Learns the answers to a hierarchy of if/else questions leading to a decision
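The "hierarchy of if/else questions" can be made concrete with a hand-written tree that is consistent with the 14-row Play Golf table above; the splits are illustrative, not necessarily the tree a learner would induce.

```java
public class PlayGolfTree {
    // Each node asks one if/else question about a feature;
    // the leaves are the Yes/No decision.
    static String playGolf(String outlook, String humidity, boolean windy) {
        if (outlook.equals("Overcast")) return "Yes"; // all Overcast days are Yes
        if (outlook.equals("Rainy")) {
            // Rainy days: humidity decides
            return humidity.equals("High") ? "No" : "Yes";
        }
        // Sunny days: wind decides
        return windy ? "No" : "Yes";
    }

    public static void main(String[] args) {
        System.out.println(playGolf("Rainy", "High", false));    // No  (row 0)
        System.out.println(playGolf("Overcast", "High", false)); // Yes (row 2)
        System.out.println(playGolf("Sunny", "Normal", true));   // No  (row 5)
    }
}
```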
  17.

    jgs Example
  18.

    jgs Example: Which attribute should be the ROOT?
  19.

    jgs Data
  20.

    jgs Entropy
    Entropy is a measure of the degree of randomness or disorder within a system.
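For a discrete class variable this is the Shannon entropy, H = -Σ p_i log2(p_i). As a quick check, the Play Golf column of the table above (9 Yes, 5 No) has entropy of about 0.940 bits:

```java
public class Entropy {
    // Shannon entropy (in bits) of a distribution given as raw counts
    static double entropy(int... counts) {
        double total = 0;
        for (int c : counts) total += c;
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;        // 0 * log(0) is taken as 0
            double p = c / total;
            h -= p * (Math.log(p) / Math.log(2)); // log base 2
        }
        return h;
    }

    public static void main(String[] args) {
        // Play Golf column: 9 Yes, 5 No
        System.out.printf("%.3f%n", entropy(9, 5)); // ~0.940
    }
}
```

A 50/50 split gives the maximum of 1 bit, and a pure column (all one class) gives 0; decision-tree learners use this to pick the attribute whose split reduces entropy the most.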
  21. jgs To be continued…

  22.

    jgs Questions
  23. jgs SER 594 Software Engineering for Machine Learning Javier Gonzalez-Sanchez,

    Ph.D. javiergs@asu.edu Spring 2022
    Copyright: These slides can only be used as study material for the class SER 594 at Arizona State University. They cannot be distributed or used for any other purpose.