JGS594 Lecture 16

Software Engineering for Machine Learning
Clustering III
(202203)

Javier Gonzalez-Sanchez

March 31, 2022

Transcript

  1. jgs
    SER 594
    Software Engineering for
    Machine Learning
    Lecture 16: Clustering III
    Dr. Javier Gonzalez-Sanchez
    [email protected]
    javiergs.engineering.asu.edu | javiergs.com
    PERALTA 230U
    Office Hours: By appointment

  2. jgs
    Previously …
    Unsupervised Learning

  3. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 3
    jgs
    Machine Learning

  4. jgs
    Algorithms
    § K-Means: groups points by their distance to cluster centroids.
    Minimizes the squared-error criterion.
    § DBSCAN (Density-Based Spatial Clustering of Applications with
    Noise): groups points by the distance between neighboring points (density).
    § Simple EM (Expectation Maximization): estimates the probability that an
    observation belongs to each cluster. Maximizes the log-likelihood criterion.
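The K-means bullet above can be illustrated with a minimal sketch, assuming Euclidean distance and Lloyd's iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points). The function names, the `seed` parameter, and the toy `points` are made up for illustration; they are not taken from the course source code, which is not shown in this transcript.

```python
import math
import random


def squared_error(points, centroids, labels):
    """Squared-error criterion: sum of squared distances to the assigned centroid."""
    return sum(math.dist(p, centroids[c]) ** 2 for p, c in zip(points, labels))


def kmeans(points, k, max_iterations=100, seed=0):
    """Lloyd's iteration; returns (centroids, labels). Illustrative sketch only."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = None
    for _ in range(max_iterations):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Toy data (made up): two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels = kmeans(points, k=2)
print(labels, squared_error(points, centroids, labels))
```

Each iteration can only lower or keep the squared error, which is why the loop stops once the assignments no longer change. The result may still depend on the initial centroids.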

  5. jgs
    Clustering Algorithms

  6. jgs
    Test Yourselves

  7. jgs
    DataSet

  8. jgs
    K-means| K=2

  9. jgs
    K-means| K=3

  10. jgs
    K-means| K=4

  11. jgs
    K-means| K=5

  12. jgs
    DBSCAN

  13. jgs
    EM

  14. jgs
    DataSet

  15. jgs
    DataSet

  16. jgs
    K-means | 2 Clusters

  17. jgs
    K-means | 2 Clusters

  18. jgs
    EM | 3 Clusters

  19. jgs
    One More Thing
    Connecting the Dots

  20. jgs
    One More Thing

  21. jgs
    Assignment

  22. jgs
    Source Code | Kmeans

  23. jgs
    Source Code | General

  24. jgs
    Assignment | Part 1
    § https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/iris.arff
    § Iris flowers classification
    § Number of Instances: 150
    § @attribute 'class' {Iris-setosa,Iris-versicolor,Iris-virginica}
    § K-means (3)
    § DBSCAN
    § EM
    § Evaluation: Likelihood Values
    § Confusion Matrix, Accuracy
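Before clustering, the class column has to be separated from the numeric attributes so that it is used only for evaluation. The following is a minimal sketch of reading a dense ARFF file with the Python standard library. It assumes unquoted or simply quoted attribute names without spaces and no commas inside values. `parse_arff` and the miniature `SAMPLE` text are hypothetical and only imitate the layout of the file linked above.

```python
def parse_arff(text):
    """Parse a simple dense ARFF document; returns (attribute_names, rows)."""
    attributes, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([value.strip().strip("'\"") for value in line.split(",")])
        elif lower.startswith("@attribute"):
            # The second token is the attribute name, possibly quoted.
            attributes.append(line.split(None, 2)[1].strip("'\""))
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows


# Made-up miniature file in the same layout (not the real dataset).
SAMPLE = """% illustrative excerpt
@relation iris
@attribute sepallength numeric
@attribute sepalwidth numeric
@attribute 'class' {Iris-setosa,Iris-versicolor,Iris-virginica}
@data
5.1,3.5,Iris-setosa
7.0,3.2,Iris-versicolor
"""

attributes, rows = parse_arff(SAMPLE)
features = [[float(v) for v in row[:-1]] for row in rows]  # input to the clusterer
classes = [row[-1] for row in rows]                        # held back for evaluation
print(attributes, features, classes)
```

Missing values (written as `?` in ARFF) would make the `float` conversion fail, so a file such as labor.arff would need extra handling that this sketch leaves out.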

  25. jgs
    Assignment | Part 2
    § https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/labor.arff
    § The data was used to learn the description of an acceptable and
    unacceptable contract.
    § Number of Instances: 57
    § @attribute 'class' {'bad', 'good'}
    § K-means (2)
    § DBSCAN
    § EM
    § Evaluation: Likelihood Values
    § Confusion Matrix, Accuracy

  26. jgs
    Assignment | Part 3
    § Students Grades Dataset (From Previous Quiz)
    § Students' grades
    § Number of Instances: ~150
    § @attribute 'class' unknown
    § K-means (?)
    § DBSCAN
    § EM
    § Evaluation: Likelihood Values
    § Confusion Matrix, Accuracy

  27. jgs
    Notes
    § Do not forget to separate Training and Testing datasets
    § Use your programming skills to calculate Confusion Matrix and Accuracy
    § As usual, submit a paper including:
    A) Source Code
    B) Results
    C) An explanation of your findings and conclusions
    § Academic Integrity 👀
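Computing a confusion matrix and accuracy for a clustering result needs one extra step, because cluster numbers are arbitrary and must first be matched to class names. The sketch below assumes a simple majority vote per cluster, which is one possible convention and not necessarily the one expected in the assignment. All function names and the toy labels are made up for illustration.

```python
from collections import Counter


def map_clusters_to_classes(cluster_ids, true_labels):
    """Assign each cluster the class that occurs most often among its members."""
    votes = {}
    for cluster, label in zip(cluster_ids, true_labels):
        votes.setdefault(cluster, Counter())[label] += 1
    return {cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()}


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, predicted in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[predicted]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of instances on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy example (made up): six instances, two clusters.
true_labels = ["bad", "bad", "bad", "good", "good", "good"]
cluster_ids = [1, 1, 0, 0, 0, 0]

mapping = map_clusters_to_classes(cluster_ids, true_labels)
predicted = [mapping[c] for c in cluster_ids]
matrix = confusion_matrix(true_labels, predicted, classes=["bad", "good"])
print(mapping, matrix, accuracy(matrix))
```

To respect the training/testing separation mentioned in the notes, the cluster-to-class mapping could be derived from the training split and then applied unchanged to the cluster assignments of the testing split.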

  28. jgs
    Questions

  29. jgs
    SER 594 Software Engineering for Machine Learning
    Javier Gonzalez-Sanchez, Ph.D.
    [email protected]
    Spring 2022
    Copyright. These slides can only be used as study material for the class SER 594 at Arizona State University.
    They cannot be distributed or used for another purpose.
