JGS594 Lecture 16

Software Engineering for Machine Learning
Clustering III
(202203)

Javier Gonzalez-Sanchez

March 31, 2022

Transcript

  1. SER 594 Software Engineering for Machine Learning

    Lecture 16: Clustering III
    Dr. Javier Gonzalez-Sanchez
    [email protected]
    javiergs.engineering.asu.edu | javiergs.com
    PERALTA 230U
    Office Hours: By appointment
  2. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 4

    Algorithms
    § K-Means: based on the distance between points and cluster centroids. Minimizes the squared-error criterion.
    § DBSCAN (Density-Based Spatial Clustering of Applications with Noise): based on the distance between nearest points.
    § Simple EM (Expectation Maximization): finds the likelihood (probability) of an observation belonging to a cluster. Maximizes the log-likelihood criterion.
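To make the squared-error criterion concrete, here is a minimal sketch of K-Means in plain Python. It is not the implementation used in class (the assignment datasets are ARFF files, so a toolkit is presumably expected); the function names, the random initialization, and the tiny sample points are assumptions made only for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points]


def update(points, labels, centroids):
    """Update step: move each centroid to the mean of its members."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # An empty cluster keeps its previous centroid (a simplification).
            new_centroids.append(old)
            continue
        new_centroids.append(tuple(sum(m[d] for m in members) / len(members)
                                   for d in range(len(old))))
    return new_centroids


def sse(points, labels, centroids):
    """Squared-error criterion: total squared distance to assigned centroids."""
    return sum(squared_distance(p, centroids[label])
               for p, label in zip(points, labels))


def k_means(points, k, iterations=100, seed=0):
    """Alternate assignment and update steps until the labels stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k random points
    labels = assign(points, centroids)
    for _ in range(iterations):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids, sse(points, labels, centroids)


if __name__ == "__main__":
    # Hypothetical 2-D points forming two obvious groups.
    sample = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(k_means(sample, k=2))
```

Each pass can only lower or keep the squared error, which is why the loop stops once the labels no longer change. The result still depends on the starting centroids, so running with several seeds and keeping the lowest error is a common precaution.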
  3. Assignment | Part 1

    § https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/iris.arff
    § Iris flowers classification
    § Number of Instances: 150
    § @attribute 'class' {Iris-setosa,Iris-versicolor,Iris-virginica}
    § K-means (3)
    § DBSCAN
    § EM
    § Evaluation: Likelihood Values
    § Confusion Matrix, Accuracy
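If you want to inspect an ARFF file outside a toolkit, the sketch below reads the attribute names and data rows with plain Python. It is deliberately simplified and is an assumption about what is sufficient for a file shaped like iris.arff: it ignores sparse rows and quoted values that contain commas or spaces. The `SAMPLE` text is a hand-written excerpt for illustration, not the full file, and the helper names are hypothetical.

```python
def parse_arff(text):
    """Return (attribute_names, rows) from ARFF text.

    Simplified reader: skips blank lines and '%' comments, collects
    @attribute names, and splits every line after @data on commas.
    """
    attributes, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue
        lower = line.lower()
        if in_data:
            rows.append([v.strip().strip("'\"") for v in line.split(",")])
        elif lower.startswith("@attribute"):
            # Second token is the name; quotes around it are optional.
            attributes.append(line.split(None, 2)[1].strip("'\""))
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows


def split_features_labels(rows):
    """Treat the last column as the class and the rest as numeric features."""
    features = [tuple(float(v) for v in row[:-1]) for row in rows]
    labels = [row[-1] for row in rows]
    return features, labels


# Hand-written excerpt for illustration only (not the complete dataset).
SAMPLE = """\
% illustrative excerpt
@relation iris
@attribute sepallength numeric
@attribute sepalwidth numeric
@attribute petallength numeric
@attribute petalwidth numeric
@attribute 'class' {Iris-setosa,Iris-versicolor,Iris-virginica}
@data
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
"""

if __name__ == "__main__":
    names, data = parse_arff(SAMPLE)
    print(names)
    print(split_features_labels(data))
```

The class column is kept apart from the features on purpose: the clustering algorithms should only see the numeric features, and the labels are used afterwards for the confusion matrix. Datasets with nominal features or missing values (written as `?` in ARFF) would need extra handling before the `float` conversion.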
  4. Assignment | Part 2

    § https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/labor.arff
    § The data was used to learn the description of an acceptable and unacceptable contract.
    § Number of Instances: 57
    § @attribute 'class' {'bad','good'}
    § K-means (2)
    § DBSCAN
    § EM
    § Evaluation: Likelihood Values
    § Confusion Matrix, Accuracy
  5. Assignment | Part 3

    § Students Grades Dataset (From Previous Quiz)
    § Students' grades
    § Number of Instances: ~150
    § @attribute 'class' unknown
    § K-means (?)
    § DBSCAN
    § EM
    § Evaluation: Likelihood Values
    § Confusion Matrix, Accuracy
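When the class is unknown, the likelihood value is the evaluation that needs no labels. The sketch below shows how a log-likelihood can be computed for a one-dimensional Gaussian mixture, which is the quantity EM tries to maximize. The mixture parameters and the grade values are made up for illustration; in practice they would come from the fitted model and the real dataset.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def responsibilities(x, weights, means, stds):
    """Probability that x belongs to each cluster (EM's expectation step)."""
    parts = [w * gaussian_pdf(x, m, s)
             for w, m, s in zip(weights, means, stds)]
    total = sum(parts)
    return [p / total for p in parts]


def log_likelihood(values, weights, means, stds):
    """Sum over observations of the log of the mixture density."""
    total = 0.0
    for x in values:
        density = sum(w * gaussian_pdf(x, m, s)
                      for w, m, s in zip(weights, means, stds))
        total += math.log(density)
    return total


if __name__ == "__main__":
    # Hypothetical grades and hand-picked parameters, for illustration only.
    grades = [55.0, 58.0, 60.0, 88.0, 90.0, 93.0]
    one_cluster = log_likelihood(grades, [1.0], [74.0], [16.5])
    two_clusters = log_likelihood(grades, [0.5, 0.5], [58.0, 90.0], [3.0, 3.0])
    print(one_cluster, two_clusters)
    print(responsibilities(60.0, [0.5, 0.5], [58.0, 90.0], [3.0, 3.0]))
```

A higher (less negative) log-likelihood means the model explains the observations better, so comparing the value across different numbers of clusters is one way to approach the "K-means (?)" question. Keep in mind that adding clusters rarely lowers the likelihood on the training data, so the comparison is more meaningful on held-out data.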
  6. Notes

    § Do not forget to separate the Training and Testing datasets
    § Use your programming skills to calculate the Confusion Matrix and Accuracy
    § As usual, submit a paper including: A) Source Code B) Results C) An explanation of your findings and conclusions
    § Academic Integrity 👀
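The first two notes can be sketched in a few lines of plain Python. This is one possible approach, not the required solution: the 70/30 split, the majority-vote mapping from cluster ids to class names, and all function names are assumptions chosen for illustration.

```python
import random
from collections import Counter


def train_test_split(items, test_fraction=0.3, seed=0):
    """Shuffle a copy of the items and cut it into training and testing parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1.0 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


def map_clusters_to_classes(cluster_ids, true_labels):
    """Name each cluster after the most frequent true class among its members."""
    votes = {}
    for cluster, label in zip(cluster_ids, true_labels):
        votes.setdefault(cluster, Counter())[label] += 1
    return {cluster: counts.most_common(1)[0][0]
            for cluster, counts in votes.items()}


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, predicted in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[predicted]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of instances on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical labels and cluster assignments, for illustration only.
    truth = ["good", "good", "bad", "bad", "bad"]
    clusters = [0, 0, 1, 1, 0]
    mapping = map_clusters_to_classes(clusters, truth)
    predicted = [mapping[c] for c in clusters]
    matrix = confusion_matrix(truth, predicted, ["bad", "good"])
    print(matrix, accuracy(matrix))
```

To respect the training/testing separation, build the cluster-to-class mapping from the training instances only, then apply it to the clusters assigned to the testing instances before filling the confusion matrix.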
  7. SER 594 Software Engineering for Machine Learning

    Javier Gonzalez-Sanchez, Ph.D.
    [email protected]
    Spring 2022
    Copyright. These slides can only be used as study material for the class CSE205 at Arizona State University. They cannot be distributed or used for another purpose.