
JGS594 Lecture 21


Software Engineering for Machine Learning
Naive Bayes
(202204)

Javier Gonzalez-Sanchez

April 20, 2022

Transcript

  1. jgs
    SER 594
    Software Engineering for
    Machine Learning
    Lecture 21: Classification
    Dr. Javier Gonzalez-Sanchez
    [email protected]
    javiergs.engineering.asu.edu | javiergs.com
    PERALTA 230U
    Office Hours: By appointment


  2. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 2
    Machine Learning


  3.
    Definition
    § Supervised learning is the machine learning task of learning a function that
    maps an input to an output based on example input-output pairs.
    § Classification – approximating a mapping function (f) from input variables (X)
    to discrete output variables (y).
    § Regression – approximating a mapping function (f) from input variables (X)
    to a continuous output variable (y).
    § Discretization – converting a regression problem into a classification problem.
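Discretization is easy to sketch in code. The helper below bins a continuous temperature into the nominal labels used later in the lecture; the class name, method name, and cut-points are my own illustration, not from the slides:

```java
public class Discretize {
    // Map a continuous temperature (in °F) onto the nominal labels
    // used in the play-golf data set; cut-points chosen for illustration.
    static String label(double temperatureF) {
        if (temperatureF >= 80) return "Hot";
        if (temperatureF >= 65) return "Mild";
        return "Cool";
    }

    public static void main(String[] args) {
        System.out.println(label(85.0)); // Hot
        System.out.println(label(70.0)); // Mild
        System.out.println(label(60.0)); // Cool
    }
}
```

With a binning like this, a regressor's continuous target becomes a class label a classifier can predict.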


  4.
    But,
    How does a Classifier work?


  5.
    Previously
    // yes, we are still using Mallet.
    // Nothing else.


  6.
    Classification
    Naïve Bayes


  7.
    Naïve Bayes
    § One of the most popular and simplest machine learning classification
    algorithms
    § Based on Bayes' Theorem for calculating probabilities and conditional
    probabilities
    § Can be extremely fast relative to other classification algorithms
    § Easy to build and particularly useful for very large data sets
    § The name "naive" is used because the model assumes that the features
    are independent of each other. That is, changing the value of one
    feature does not directly influence or change the value of any other
    feature used in the algorithm.


  8.
    Bayes
    Bayes Rule: P(A | B) = P(B | A) * P(A) / P(B)
    (frequency table and likelihood table built from the X, Y data)
    Example:
    P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
    P(Yes | Sunny) = (3/9 * 9/14) / (5/14) = 0.60
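The arithmetic for P(Yes | Sunny) can be checked in a few lines. A minimal sketch, using the counts from the slides (9 "Yes" days out of 14; Sunny occurs 5 times, 3 of them "Yes"); the class and method names are mine:

```java
public class Bayes {
    // Bayes rule: P(A | B) = P(B | A) * P(A) / P(B)
    static double posterior(double likelihood, double prior, double evidence) {
        return likelihood * prior / evidence;
    }

    public static void main(String[] args) {
        double pSunnyGivenYes = 3.0 / 9;  // P(Sunny | Yes)
        double pYes = 9.0 / 14;           // P(Yes)
        double pSunny = 5.0 / 14;         // P(Sunny)
        double pYesGivenSunny = posterior(pSunnyGivenYes, pYes, pSunny);
        System.out.printf("P(Yes | Sunny) = %.2f%n", pYesGivenSunny); // 0.60
    }
}
```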


  9.
    Data set (X1 = Outlook, X2 = Temperature, X3 = Humidity, X4 = Windy, Y = Play Golf):

     #  Outlook   Temperature  Humidity  Windy  Play Golf
     0  Rainy     Hot          High      False  No
     1  Rainy     Hot          High      True   No
     2  Overcast  Hot          High      False  Yes
     3  Sunny     Mild         High      False  Yes
     4  Sunny     Cool         Normal    False  Yes
     5  Sunny     Cool         Normal    True   No
     6  Overcast  Cool         Normal    True   Yes
     7  Rainy     Mild         High      False  No
     8  Rainy     Cool         Normal    False  Yes
     9  Sunny     Mild         Normal    False  Yes
    10  Rainy     Mild         Normal    True   Yes
    11  Overcast  Mild         High      True   Yes
    12  Overcast  Hot          Normal    False  Yes
    13  Sunny     Mild         High      True   No

    (the frequency table is built from these counts)


  10.
    Naïve Bayes
    Bayes Rule: P(y | X) = P(X | y) * P(y) / P(X)
    Naïve Bayes Rule: P(y | x1, …, xn) ∝ P(y) * P(x1 | y) * … * P(xn | y)
    (the features x1 … xn are assumed conditionally independent given y)


  11.
    Naïve Bayes
    Example (classifying a Sunny, Hot, High-humidity, not-Windy day):
    P(Yes | Sunny, Hot, High, False)
      = P(Sunny | Yes) * P(Hot | Yes) * P(High | Yes) * P(False | Yes) * P(Yes)
        / [P(Sunny) * P(Hot) * P(High) * P(False)]
      = (3/9 * 2/9 * 3/9 * 6/9) * (9/14) / (5/14 * 4/14 * 7/14 * 8/14)
      = 0.01646 * 0.6428 / 0.02915 ≈ 0.363
    P(No | Sunny, Hot, High, False)
      = P(Sunny | No) * P(Hot | No) * P(High | No) * P(False | No) * P(No)
        / [P(Sunny) * P(Hot) * P(High) * P(False)]
      = (2/5 * 2/5 * 4/5 * 2/5) * (5/14) / (5/14 * 4/14 * 7/14 * 8/14)
      = 0.0512 * 0.3571 / 0.02915 ≈ 0.627
    Since 0.627 > 0.363, the classifier predicts No.
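The two scores above can be reproduced programmatically. A small sketch using the counts from the play-golf table (class and method names are mine, not from the slides):

```java
public class NaiveBayesExample {
    // Naive Bayes score: prior * product of likelihoods,
    // divided by the product of the feature marginals.
    static double score(double[] likelihoods, double prior, double[] evidence) {
        double num = prior, den = 1.0;
        for (double l : likelihoods) num *= l;
        for (double e : evidence) den *= e;
        return num / den;
    }

    public static void main(String[] args) {
        // P(Sunny), P(Hot), P(High), P(False)
        double[] evidence = {5.0 / 14, 4.0 / 14, 7.0 / 14, 8.0 / 14};
        double yes = score(new double[]{3.0 / 9, 2.0 / 9, 3.0 / 9, 6.0 / 9},
                           9.0 / 14, evidence);
        double no  = score(new double[]{2.0 / 5, 2.0 / 5, 4.0 / 5, 2.0 / 5},
                           5.0 / 14, evidence);
        System.out.printf("Yes: %.3f  No: %.3f%n", yes, no); // Yes: 0.363  No: 0.627
        System.out.println(yes > no ? "Play" : "Don't play"); // Don't play
    }
}
```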


  12.
    Weka | NaiveBayes
    CSV file


  13.
    Note: ARFF Files
    import java.io.BufferedReader;
    import java.io.FileReader;
    import weka.core.Instances;

    String path = "data/weather.nominal.arff";
    BufferedReader bufferedReader = new BufferedReader(
        new FileReader(path)
    );
    Instances datasetInstances = new Instances(bufferedReader);
    bufferedReader.close();
    // the class attribute is the last one in this file
    datasetInstances.setClassIndex(datasetInstances.numAttributes() - 1);


  14.
    Weka | NaiveBayes | Evaluation
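A minimal sketch of what this evaluation step might look like with Weka's `NaiveBayes` and `Evaluation` classes; the file path and the choice of 10-fold cross-validation are assumptions, not taken from the slides:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;

public class EvaluateNaiveBayes {
    public static void main(String[] args) throws Exception {
        // Load the ARFF data set as on the previous slide
        Instances data = new Instances(
            new BufferedReader(new FileReader("data/weather.nominal.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        NaiveBayes classifier = new NaiveBayes();

        // 10-fold cross-validation; Evaluation trains fresh copies internally
        Evaluation evaluation = new Evaluation(data);
        evaluation.crossValidateModel(classifier, data, 10, new Random(1));

        System.out.println(evaluation.toSummaryString());
        System.out.println(evaluation.toMatrixString()); // confusion matrix
    }
}
```

Requires the Weka jar on the classpath and the ARFF file on disk.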


  15.
    Classification
    Decision Tree


  16.
    Decision Tree
    § Also known as Classification and Regression Trees (CART)
    § Learns the answers to a hierarchy of if/else questions that lead to a decision
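The "hierarchy of if/else questions" can be pictured directly as code. A hypothetical hand-written tree for the play-golf data; the attribute order and the particular questions here are illustrative, not learned by any algorithm:

```java
public class TinyDecisionTree {
    // Each nested if/else is one internal node of the tree;
    // each return is a leaf carrying a class label.
    static String playGolf(String outlook, String humidity, boolean windy) {
        if (outlook.equals("Overcast")) {
            return "Yes";                    // overcast days: always play
        } else if (outlook.equals("Sunny")) {
            return windy ? "No" : "Yes";     // sunny days: decided by wind
        } else {                             // Rainy
            return humidity.equals("High") ? "No" : "Yes";
        }
    }

    public static void main(String[] args) {
        System.out.println(playGolf("Overcast", "High", true));  // Yes
        System.out.println(playGolf("Rainy", "High", false));    // No
    }
}
```

A tree learner's job is to pick these questions, and their order, automatically from the data.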


  17.
    Example


  18.
    Example
    Which Attribute Should be the ROOT?


  19.
    Data


  20.
    Entropy
    Entropy is a measure of the degree of randomness, or disorder, within a system.
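For a labeled data set, the usual formula is Shannon entropy, H = −Σ p_i * log2(p_i). A small sketch that computes it for the play-golf labels (9 Yes, 5 No), which gives the standard textbook value of about 0.940 bits; the class and method names are mine:

```java
public class Entropy {
    // Shannon entropy (in bits) of a two-class set with positive counts a and b.
    static double entropy(int a, int b) {
        double total = a + b;
        double pA = a / total, pB = b / total;
        return -(pA * log2(pA) + pB * log2(pB));
    }

    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    public static void main(String[] args) {
        // 9 "Yes" and 5 "No" examples in the play-golf data set
        System.out.printf("H = %.3f bits%n", entropy(9, 5)); // H = 0.940 bits
    }
}
```

Entropy is 0 when all labels agree and maximal (1 bit, for two classes) at a 50/50 split; a tree learner picks the attribute whose split reduces it the most.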


  21.
    To be continued…


  22.
    Questions


  23.
    SER 594 Software Engineering for Machine Learning
    Javier Gonzalez-Sanchez, Ph.D.
    [email protected]
    Spring 2022
    Copyright. These slides can only be used as study material for the class CSE205 at Arizona State University.
    They cannot be distributed or used for another purpose.
