# JGS594 Lecture 21

Software Engineering for Machine Learning
Naive Bayes
(202204)

April 20, 2022

## Transcript

1. jgs
SER 594
Software Engineering for
Machine Learning
Lecture 21: Classification
Dr. Javier Gonzalez-Sanchez
[email protected]
javiergs.engineering.asu.edu | javiergs.com
PERALTA 230U
Office Hours: By appointment

2. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 2
Machine Learning

3. Definition
§ Supervised learning is the machine learning task of learning a function that
maps an input to an output based on example input-output pairs.
§ Classification – approximating a mapping function (f) from input variables (X)
to discrete output variables (y).
§ Regression – approximating a mapping function (f) from input variables (X)
to a continuous output variable (y).
§ Discretization – converting a regression problem into a classification problem by binning the continuous output into discrete intervals
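One common way to discretize a continuous output is equal-width binning. The observed range is split into k intervals of the same width, and each value is replaced by the index of its interval. The sketch below assumes that scheme; equal-frequency binning is an equally valid alternative. The class `EqualWidthBinning`, its `discretize` method, and the sample values are hypothetical illustrations, not part of the lecture.

```java
import java.util.Arrays;

// Hypothetical helper: turns a continuous target y into k discrete classes
// using equal-width bins, so a regression problem can be treated as classification.
class EqualWidthBinning {

    // Returns, for each value, the index (0..k-1) of the bin it falls into.
    static int[] discretize(double[] values, int k) {
        if (k < 1) throw new IllegalArgumentException("k must be >= 1");
        double min = Arrays.stream(values).min().orElse(0.0);
        double max = Arrays.stream(values).max().orElse(0.0);
        double width = (max - min) / k;
        int[] bins = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // A zero width means all values are equal: put everything in bin 0.
            int bin = width == 0 ? 0 : (int) ((values[i] - min) / width);
            // The maximum value would land in bin k; clamp it into the last bin.
            bins[i] = Math.min(bin, k - 1);
        }
        return bins;
    }

    public static void main(String[] args) {
        // Illustrative continuous outputs (e.g., temperatures in degrees)
        double[] y = {64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85};
        System.out.println(Arrays.toString(discretize(y, 3)));
    }
}
```

With k = 3 the range 64 to 85 is cut into three intervals of width 7, so the twelve values map to classes 0, 1, and 2.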

4. But, how does a classifier work?

5. Previously
§ Yes, we are still using Mallet.
§ Nothing else.

6. Classification
Naïve Bayes

7. Naïve Bayes
§ It is one of the most popular and simplest machine learning classification algorithms
§ It is based on Bayes' theorem for calculating probabilities and conditional probabilities
§ It can be extremely fast relative to other classification algorithms, because training needs only one pass over the data to count how often each feature value occurs with each class
§ It is easy to build and particularly useful for very large data sets
§ It is called naive because it assumes that the features that go into the model are independent of each other given the class. That is, changing the value of one feature does not directly influence or change the value of any of the other features used in the algorithm.

8. Bayes
(Figure: X and Y columns, Frequency Table, Likelihood Table)
Example:
P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
P(Yes | Sunny) = (3/9 * 9/14) / (5/14) = 0.60
Bayes Rule:
$$P(y \mid X) = \frac{P(X \mid y)\,P(y)}{P(X)}$$
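The Sunny example can be reproduced by counting rows of the Play Golf data table. P(Sunny | Yes) is the share of Yes rows that are Sunny, P(Yes) is the share of Yes rows, and P(Sunny) is the share of Sunny rows. This is a minimal sketch that assumes only the Outlook and Play Golf columns are needed. `BayesSunny` and `posterior` are hypothetical names.

```java
import java.util.Locale;

// Sketch: Bayes' rule P(y | x) = P(x | y) * P(y) / P(x) for one feature,
// with every probability estimated by counting rows of the Play Golf table.
class BayesSunny {

    // Outlook (X1) and Play Golf (Y) columns, rows 0..13 of the data table
    static final String[] OUTLOOK = {
        "Rainy", "Rainy", "Overcast", "Sunny", "Sunny", "Sunny", "Overcast",
        "Rainy", "Rainy", "Sunny", "Rainy", "Overcast", "Overcast", "Sunny"};
    static final String[] PLAY = {
        "No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"};

    // P(y | x) for a single Outlook value x and class y
    static double posterior(String y, String x) {
        int n = PLAY.length, countY = 0, countX = 0, countXandY = 0;
        for (int i = 0; i < n; i++) {
            boolean isY = PLAY[i].equals(y);
            boolean isX = OUTLOOK[i].equals(x);
            if (isY) countY++;
            if (isX) countX++;
            if (isX && isY) countXandY++;
        }
        double likelihood = (double) countXandY / countY; // P(x | y)
        double prior = (double) countY / n;                // P(y)
        double evidence = (double) countX / n;             // P(x)
        return likelihood * prior / evidence;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "P(Yes | Sunny) = %.2f%n", posterior("Yes", "Sunny"));
    }
}
```

Running it prints `P(Yes | Sunny) = 0.60`, matching the hand calculation (3 of 9 Yes rows are Sunny, 9 of 14 rows are Yes, 5 of 14 rows are Sunny).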

9. Data (features X1 to X4, class Y)

| | Outlook (X1) | Temperature (X2) | Humidity (X3) | Windy (X4) | Play Golf (Y) |
|---|---|---|---|---|---|
| 0 | Rainy | Hot | High | False | No |
| 1 | Rainy | Hot | High | True | No |
| 2 | Overcast | Hot | High | False | Yes |
| 3 | Sunny | Mild | High | False | Yes |
| 4 | Sunny | Cool | Normal | False | Yes |
| 5 | Sunny | Cool | Normal | True | No |
| 6 | Overcast | Cool | Normal | True | Yes |
| 7 | Rainy | Mild | High | False | No |
| 8 | Rainy | Cool | Normal | False | Yes |
| 9 | Sunny | Mild | Normal | False | Yes |
| 10 | Rainy | Mild | Normal | True | Yes |
| 11 | Overcast | Mild | High | True | Yes |
| 12 | Overcast | Hot | Normal | False | Yes |
| 13 | Sunny | Mild | High | True | No |

(Figure: Frequency Table)
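A frequency table is built in a single pass: for every feature value, count how often it appears together with each class. Dividing each count by its class total (9 Yes rows, 5 No rows) turns it into the likelihood table used by Bayes' rule. This sketch assumes the data is held in memory as strings. `FrequencyTable` and `build` are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: frequency table (counts of each feature value per class)
// for one column of the Play Golf data.
class FrequencyTable {

    // Columns: Outlook, Temperature, Humidity, Windy, Play Golf (rows 0..13)
    static final String[][] DATA = {
        {"Rainy", "Hot", "High", "False", "No"},
        {"Rainy", "Hot", "High", "True", "No"},
        {"Overcast", "Hot", "High", "False", "Yes"},
        {"Sunny", "Mild", "High", "False", "Yes"},
        {"Sunny", "Cool", "Normal", "False", "Yes"},
        {"Sunny", "Cool", "Normal", "True", "No"},
        {"Overcast", "Cool", "Normal", "True", "Yes"},
        {"Rainy", "Mild", "High", "False", "No"},
        {"Rainy", "Cool", "Normal", "False", "Yes"},
        {"Sunny", "Mild", "Normal", "False", "Yes"},
        {"Rainy", "Mild", "Normal", "True", "Yes"},
        {"Overcast", "Mild", "High", "True", "Yes"},
        {"Overcast", "Hot", "Normal", "False", "Yes"},
        {"Sunny", "Mild", "High", "True", "No"}};

    // Returns value -> {count with Yes, count with No} for feature column 0..3
    static Map<String, int[]> build(int column) {
        Map<String, int[]> table = new TreeMap<>();
        for (String[] row : DATA) {
            int[] counts = table.computeIfAbsent(row[column], k -> new int[2]);
            if (row[4].equals("Yes")) counts[0]++; else counts[1]++;
        }
        return table;
    }

    public static void main(String[] args) {
        String[] names = {"Outlook", "Temperature", "Humidity", "Windy"};
        for (int c = 0; c < names.length; c++) {
            System.out.println(names[c]);
            for (Map.Entry<String, int[]> e : build(c).entrySet()) {
                System.out.printf("  %-9s Yes=%d No=%d%n", e.getKey(), e.getValue()[0], e.getValue()[1]);
            }
        }
    }
}
```

For Outlook, for example, the counts are Overcast 4/0, Rainy 2/3, and Sunny 3/2 (Yes/No).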

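10. Naïve Bayes
Bayes Rule:
$$P(y \mid X) = \frac{P(X \mid y)\,P(y)}{P(X)}$$
Naïve Bayes Rule, for $X = (x_1, \dots, x_n)$ with the features treated as independent:
$$P(y \mid x_1, \dots, x_n) = \frac{P(x_1 \mid y)\,P(x_2 \mid y) \cdots P(x_n \mid y)\,P(y)}{P(x_1)\,P(x_2) \cdots P(x_n)}$$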
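Under the naive independence assumption, the score for each class y is P(y) times the product of the per-feature likelihoods P(x_i | y), divided by a denominator that is the same for every class. A classifier can therefore drop the denominator and pick the class with the largest numerator. This is a standard restatement, not taken from the slides:

$$\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

With many features, this product of small probabilities can underflow, so implementations commonly compare sums of logarithms instead:

$$\hat{y} = \arg\max_{y} \left[ \log P(y) + \sum_{i=1}^{n} \log P(x_i \mid y) \right]$$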

11. Naïve Bayes
Example (counts taken from the data table: Rainy appears in 2 of the 9 Yes rows and in 3 of the 5 No rows):
P(Yes | Rainy, Hot, High, False) =
P(Rainy | Yes) * P(Hot | Yes) * P(High | Yes) * P(False | Yes) * P(Yes) / (P(Rainy) * P(Hot) * P(High) * P(False)) =
(2/9 * 2/9 * 3/9 * 6/9 * 9/14) / (5/14 * 4/14 * 7/14 * 8/14) =
0.01097 * 0.6429 / 0.02915 = 0.2420
P(No | Rainy, Hot, High, False) =
P(Rainy | No) * P(Hot | No) * P(High | No) * P(False | No) * P(No) / (P(Rainy) * P(Hot) * P(High) * P(False)) =
(3/5 * 2/5 * 4/5 * 2/5 * 5/14) / (5/14 * 4/14 * 7/14 * 8/14) =
0.0768 * 0.3571 / 0.02915 = 0.9408
Since 0.9408 > 0.2420, the predicted class is No. The two scores are not normalized probabilities (they need not sum to 1); only their comparison matters.
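The whole Naïve Bayes computation can be scripted the same way. Every probability is counted from the data table without smoothing, and the result is divided by the product of the feature marginals, as in the example. Counting from the table, Rainy appears in 2 of the 9 Yes rows and in 3 of the 5 No rows. `NaiveBayesWeather`, `fraction`, `score`, and `predict` are hypothetical names. This is a teaching sketch, not how Weka or Mallet implement the classifier.

```java
import java.util.Locale;

// Sketch of the Naive Bayes example for the query (Rainy, Hot, High, False):
// every probability is a count ratio from the Play Golf table (no smoothing).
class NaiveBayesWeather {

    // Columns: Outlook, Temperature, Humidity, Windy, Play Golf (rows 0..13)
    static final String[][] DATA = {
        {"Rainy", "Hot", "High", "False", "No"},
        {"Rainy", "Hot", "High", "True", "No"},
        {"Overcast", "Hot", "High", "False", "Yes"},
        {"Sunny", "Mild", "High", "False", "Yes"},
        {"Sunny", "Cool", "Normal", "False", "Yes"},
        {"Sunny", "Cool", "Normal", "True", "No"},
        {"Overcast", "Cool", "Normal", "True", "Yes"},
        {"Rainy", "Mild", "High", "False", "No"},
        {"Rainy", "Cool", "Normal", "False", "Yes"},
        {"Sunny", "Mild", "Normal", "False", "Yes"},
        {"Rainy", "Mild", "Normal", "True", "Yes"},
        {"Overcast", "Mild", "High", "True", "Yes"},
        {"Overcast", "Hot", "Normal", "False", "Yes"},
        {"Sunny", "Mild", "High", "True", "No"}};

    // Share of rows whose column c equals value; if y != null, only rows of class y count
    static double fraction(int c, String value, String y) {
        int match = 0, total = 0;
        for (String[] row : DATA) {
            if (y != null && !row[4].equals(y)) continue;
            total++;
            if (row[c].equals(value)) match++;
        }
        return (double) match / total;
    }

    // P(y) * prod P(x_c | y) / prod P(x_c), the form used in the example
    static double score(String y, String[] x) {
        double numerator = fraction(4, y, null); // prior P(y)
        double denominator = 1.0;
        for (int c = 0; c < x.length; c++) {
            numerator *= fraction(c, x[c], y);      // likelihood P(x_c | y)
            denominator *= fraction(c, x[c], null); // marginal P(x_c)
        }
        return numerator / denominator;
    }

    // Picks the class with the larger score (ties go to Yes)
    static String predict(String[] x) {
        return score("Yes", x) >= score("No", x) ? "Yes" : "No";
    }

    public static void main(String[] args) {
        String[] query = {"Rainy", "Hot", "High", "False"};
        System.out.printf(Locale.ROOT, "Yes: %.4f%n", score("Yes", query));
        System.out.printf(Locale.ROOT, "No:  %.4f%n", score("No", query));
        System.out.println("Prediction: " + predict(query));
    }
}
```

With these counts, the sketch prints `Yes: 0.2420` and `No:  0.9408` and predicts No. This agrees with row 0 of the table (Rainy, Hot, High, False, No).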

12. Weka | NaiveBayes
CSV file
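Weka reads the CSV file for you. For classification, the class attribute still has to be chosen; when the label is the last column, this is commonly `data.setClassIndex(data.numAttributes() - 1)`. The following plain-Java sketch shows what that loading step amounts to without a Weka dependency. A header line gives the attribute names, each later line is one example, and the last column is taken as the class. `CsvDataset` is a hypothetical class, not part of Weka, and it does not handle quoted fields. The inline CSV text is assumed to mirror the first rows of the Play Golf table.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java loader (not Weka's API): a header line with the
// attribute names, then one comma-separated example per line; the last
// column is treated as the class label. Quoted fields are not supported.
class CsvDataset {
    List<String> attributes;
    final List<String[]> features = new ArrayList<>();
    final List<String> labels = new ArrayList<>();

    CsvDataset(Reader source) throws IOException {
        try (BufferedReader in = new BufferedReader(source)) {
            String header = in.readLine();
            if (header == null) throw new IOException("empty CSV input");
            attributes = Arrays.asList(header.split(","));
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) continue; // skip blank lines
                String[] cells = line.split(",");
                if (cells.length != attributes.size())
                    throw new IOException("unexpected number of columns: " + line);
                features.add(Arrays.copyOf(cells, cells.length - 1)); // inputs X
                labels.add(cells[cells.length - 1]);                   // class y
            }
        }
    }

    // Name of the class attribute (the last column)
    String classAttribute() {
        return attributes.get(attributes.size() - 1);
    }

    public static void main(String[] args) throws IOException {
        String csv = "Outlook,Temperature,Humidity,Windy,Play Golf\n"
                + "Rainy,Hot,High,False,No\n"
                + "Rainy,Hot,High,True,No\n"
                + "Overcast,Hot,High,False,Yes\n";
        CsvDataset data = new CsvDataset(new StringReader(csv));
        System.out.println(data.features.size() + " examples, class attribute = " + data.classAttribute());
    }
}
```

Running it prints `3 examples, class attribute = Play Golf`.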

13. Note: ARFF Files
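```java
String path = "data/weather.nominal.arff";
// Load the ARFF file into Weka's Instances (weka.core.converters.ConverterUtils.DataSource)
Instances data = ConverterUtils.DataSource.read(path);
```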

14. Weka | NaiveBayes | Evaluation
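Weka's `Evaluation` class reports accuracy and a confusion matrix for a classifier. The code below is a dependency-free sketch of the same idea. It runs leave-one-out evaluation of an unsmoothed Naïve Bayes on the 14-row Play Golf table: each row is classified by a model counted from the other 13 rows, and every (actual, predicted) pair is tallied. `NaiveBayesEvaluation` and its methods are hypothetical. Leave-one-out is only one option among several, and Weka also offers k-fold cross-validation. Without smoothing, a feature value that is unseen in the training rows drives that class's score to zero.

```java
import java.util.Arrays;
import java.util.Locale;

// Sketch: leave-one-out evaluation of an unsmoothed categorical Naive Bayes
// on the Play Golf table, reporting a confusion matrix and accuracy.
class NaiveBayesEvaluation {

    // Columns: Outlook, Temperature, Humidity, Windy, Play Golf
    static final String[][] DATA = {
        {"Rainy", "Hot", "High", "False", "No"},
        {"Rainy", "Hot", "High", "True", "No"},
        {"Overcast", "Hot", "High", "False", "Yes"},
        {"Sunny", "Mild", "High", "False", "Yes"},
        {"Sunny", "Cool", "Normal", "False", "Yes"},
        {"Sunny", "Cool", "Normal", "True", "No"},
        {"Overcast", "Cool", "Normal", "True", "Yes"},
        {"Rainy", "Mild", "High", "False", "No"},
        {"Rainy", "Cool", "Normal", "False", "Yes"},
        {"Sunny", "Mild", "Normal", "False", "Yes"},
        {"Rainy", "Mild", "Normal", "True", "Yes"},
        {"Overcast", "Mild", "High", "True", "Yes"},
        {"Overcast", "Hot", "Normal", "False", "Yes"},
        {"Sunny", "Mild", "High", "True", "No"}};

    // P(y) * prod P(x_c | y), counted from every row except 'skip'
    static double numerator(String y, String[] x, int skip) {
        int total = 0, classCount = 0;
        int[] match = new int[x.length];
        for (int r = 0; r < DATA.length; r++) {
            if (r == skip) continue;
            total++;
            if (!DATA[r][4].equals(y)) continue;
            classCount++;
            for (int c = 0; c < x.length; c++)
                if (DATA[r][c].equals(x[c])) match[c]++;
        }
        if (classCount == 0) return 0.0;
        double score = (double) classCount / total;
        for (int c = 0; c < x.length; c++) score *= (double) match[c] / classCount;
        return score;
    }

    // matrix[actual][predicted], index 0 = Yes, 1 = No
    static int[][] leaveOneOut() {
        int[][] matrix = new int[2][2];
        for (int r = 0; r < DATA.length; r++) {
            String[] x = Arrays.copyOf(DATA[r], 4); // the four features only
            int predicted = numerator("Yes", x, r) >= numerator("No", x, r) ? 0 : 1;
            int actual = DATA[r][4].equals("Yes") ? 0 : 1;
            matrix[actual][predicted]++;
        }
        return matrix;
    }

    // Share of examples on the diagonal (actual == predicted)
    static double accuracy(int[][] m) {
        int correct = 0, total = 0;
        for (int a = 0; a < m.length; a++)
            for (int p = 0; p < m[a].length; p++) {
                total += m[a][p];
                if (a == p) correct += m[a][p];
            }
        return (double) correct / total;
    }

    public static void main(String[] args) {
        int[][] m = leaveOneOut();
        System.out.println("            pred Yes  pred No");
        System.out.printf(Locale.ROOT, "actual Yes  %8d  %7d%n", m[0][0], m[0][1]);
        System.out.printf(Locale.ROOT, "actual No   %8d  %7d%n", m[1][0], m[1][1]);
        System.out.printf(Locale.ROOT, "accuracy = %.3f%n", accuracy(m));
    }
}
```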

15. Classification
Decision Tree

16. Decision Tree
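§ Also known as Classification And Regression Trees (CART), since the same kind of tree can predict either a class or a continuous value
§ The model learns a hierarchy of if/else questions about the input features; answering them from the root down to a leaf leads to a decision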
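The if/else hierarchy can be written out by hand. In the sketch below, the tree shape, including Outlook as the root, is an assumption chosen so that the tree fits all 14 rows of the Play Golf table. How to choose the root in a principled way is the question raised on the following slides. `PlayGolfTree` and `classify` are hypothetical names.

```java
// Hypothetical hand-written tree for the Play Golf data: each internal node
// asks one if/else question about a feature, and each leaf returns a class.
class PlayGolfTree {

    static String classify(String outlook, String humidity, String windy) {
        if (outlook.equals("Overcast")) {
            return "Yes";                                  // leaf
        } else if (outlook.equals("Sunny")) {
            return windy.equals("True") ? "No" : "Yes";    // ask about Windy
        } else { // Rainy
            return humidity.equals("High") ? "No" : "Yes"; // ask about Humidity
        }
    }

    // Columns: Outlook, Temperature, Humidity, Windy, Play Golf
    static final String[][] DATA = {
        {"Rainy", "Hot", "High", "False", "No"},
        {"Rainy", "Hot", "High", "True", "No"},
        {"Overcast", "Hot", "High", "False", "Yes"},
        {"Sunny", "Mild", "High", "False", "Yes"},
        {"Sunny", "Cool", "Normal", "False", "Yes"},
        {"Sunny", "Cool", "Normal", "True", "No"},
        {"Overcast", "Cool", "Normal", "True", "Yes"},
        {"Rainy", "Mild", "High", "False", "No"},
        {"Rainy", "Cool", "Normal", "False", "Yes"},
        {"Sunny", "Mild", "Normal", "False", "Yes"},
        {"Rainy", "Mild", "Normal", "True", "Yes"},
        {"Overcast", "Mild", "High", "True", "Yes"},
        {"Overcast", "Hot", "Normal", "False", "Yes"},
        {"Sunny", "Mild", "High", "True", "No"}};

    // Counts how many rows the tree classifies correctly (Temperature is never asked)
    static int correct() {
        int ok = 0;
        for (String[] row : DATA)
            if (classify(row[0], row[2], row[3]).equals(row[4])) ok++;
        return ok;
    }

    public static void main(String[] args) {
        System.out.println(classify("Sunny", "High", "False"));
        System.out.println(correct() + " of " + DATA.length + " rows correct");
    }
}
```

This hand-built tree reproduces all 14 labels. A learned tree would arrive at its questions from the data, for example by using entropy.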

17. Example

18. Example
Which Attribute Should be the ROOT?

19. Data

20. Entropy
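Entropy is a measure of the degree of randomness, or disorder, within a system.
For a set of training examples, it measures how mixed the class labels are: it is 0 when all examples share one class and highest when all classes are equally frequent.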
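For a set S of examples whose classes occur with proportions p_1, …, p_k, the usual (Shannon) entropy is H(S) = -Σ p_i log2 p_i, measured in bits, with 0 · log2 0 taken as 0. The sketch below computes it from class counts. `Entropy` and `entropy` are hypothetical names.

```java
import java.util.Locale;

// Sketch: Shannon entropy (in bits) of a set of examples, given its class counts.
class Entropy {

    static double entropy(int... classCounts) {
        int total = 0;
        for (int c : classCounts) total += c;
        double h = 0.0;
        for (int c : classCounts) {
            if (c == 0) continue;               // 0 * log(0) is taken as 0
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2); // log base 2
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "H(Play Golf) = %.3f bits%n", entropy(9, 5)); // mixed set
        System.out.printf(Locale.ROOT, "H(pure set)  = %.3f bits%n", entropy(4, 0)); // e.g., all Yes
        System.out.printf(Locale.ROOT, "H(50/50)     = %.3f bits%n", entropy(7, 7)); // maximally mixed
    }
}
```

For the Play Golf labels (9 Yes, 5 No), it gives about 0.940 bits. A pure set gives 0, and an even 7/7 split gives 1.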

21. To be continued…

22. Questions

23. SER 594 Software Engineering for Machine Learning
Javier Gonzalez-Sanchez, Ph.D.
[email protected]
Spring 2022
Copyright. These slides can only be used as study material for the class CSE205 at Arizona State University.
They cannot be distributed or used for another purpose.