JGS594 Lecture 14

Software Engineering for Machine Learning
Clustering
(202203)

Javier Gonzalez-Sanchez

March 24, 2022

Transcript

  1. jgs
    SER 594
    Software Engineering for
    Machine Learning
    Lecture 14: Clustering
    Dr. Javier Gonzalez-Sanchez
    [email protected]
    javiergs.engineering.asu.edu | javiergs.com
    PERALTA 230U
    Office Hours: By appointment

  2. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 2
    jgs
    Machine Learning

  3. jgs
    Clustering
    Unsupervised Learning

  4. Definition
    § Clustering is an unsupervised learning task.
    § It divides a population (data points) into a number of groups such that
    data points in the same group are similar to one another and dissimilar
    to data points in other groups.

  5. Similarity
    § Similarity is measured as a distance between feature vectors: the smaller
    the distance, the more similar the two data points.
    § One of the simplest choices is the Euclidean distance.
    § Other options: Minkowski distance, Manhattan distance, Hamming
    distance, cosine distance, …
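
A minimal Java sketch of two of the distances named on this slide, using only the standard library. The class and method names (`Distances`, `euclidean`, `manhattan`) are illustrative assumptions and do not come from the course repository.

```java
// Illustrative sketch; names are hypothetical and not taken from the course code.
final class Distances {
    private Distances() {}

    /** Euclidean distance: square root of the sum of squared differences. */
    static double euclidean(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Manhattan distance: sum of absolute differences. */
    static double manhattan(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    private static void requireSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("feature vectors must have the same length");
        }
    }

    public static void main(String[] args) {
        double[] p = {0, 0};
        double[] q = {3, 4};
        System.out.println(euclidean(p, q)); // prints 5.0
        System.out.println(manhattan(p, q)); // prints 7.0
    }
}
```

Both functions return 0 for identical vectors and grow as the vectors move apart, so a smaller value means "more similar" in either case.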

  6. Algorithm: K-Means

  7. Algorithm: K-Means
    § K-Means begins with k randomly placed centroids. Centroids are the
    center points of the clusters.
    § Iteration:
    o Assign each data point to its nearest centroid.
    o Move each centroid to the average location of the points assigned to it.
    § Repeat the iteration until the assignments stop changing between
    consecutive iterations.
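
The steps above can be sketched in plain Java as follows. This is not the code from the course repository: the class name `KMeansSketch` and its methods are hypothetical. The sketch also makes three assumptions the slide does not state: "randomly placed" is implemented by picking k distinct data points, distance is (squared) Euclidean, and an empty cluster keeps its previous centroid.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative sketch; names are hypothetical and not taken from the course code.
final class KMeansSketch {
    private KMeansSketch() {}

    /** Picks k distinct data points at random as starting centroids (assumed strategy). */
    static double[][] randomCentroids(double[][] points, int k, long seed) {
        if (k < 1 || k > points.length) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        int[] indices = new int[points.length];
        for (int i = 0; i < indices.length; i++) indices[i] = i;
        Random random = new Random(seed);
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            // Partial Fisher-Yates shuffle: swap a random remaining index into slot c.
            int j = c + random.nextInt(indices.length - c);
            int tmp = indices[c];
            indices[c] = indices[j];
            indices[j] = tmp;
            centroids[c] = points[indices[c]].clone();
        }
        return centroids;
    }

    /** Runs the assign/move iteration; centroids are updated in place. Returns each point's cluster index. */
    static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1); // -1 means "not assigned yet"
        for (int iteration = 0; iteration < maxIterations; iteration++) {
            boolean changed = false;
            // Step 1: assign each point to its nearest centroid.
            for (int p = 0; p < points.length; p++) {
                int nearest = nearestCentroid(points[p], centroids);
                if (nearest != assignment[p]) {
                    assignment[p] = nearest;
                    changed = true;
                }
            }
            if (!changed) break; // assignments are stable: converged
            // Step 2: move each centroid to the mean of its assigned points.
            moveCentroids(points, assignment, centroids);
        }
        return assignment;
    }

    private static int nearestCentroid(double[] point, double[][] centroids) {
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double distance = 0.0; // squared Euclidean distance; same ordering as Euclidean
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centroids[c][d];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = c;
            }
        }
        return best;
    }

    private static void moveCentroids(double[][] points, int[] assignment, double[][] centroids) {
        int dims = centroids[0].length;
        double[][] sums = new double[centroids.length][dims];
        int[] counts = new int[centroids.length];
        for (int p = 0; p < points.length; p++) {
            counts[assignment[p]]++;
            for (int d = 0; d < dims; d++) sums[assignment[p]][d] += points[p][d];
        }
        for (int c = 0; c < centroids.length; c++) {
            if (counts[c] == 0) continue; // empty cluster: leave the centroid where it is
            for (int d = 0; d < dims; d++) centroids[c][d] = sums[c][d] / counts[c];
        }
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1, 2}, {2, 1}, {8, 8}, {9, 8}, {8, 9}};
        double[][] centroids = randomCentroids(points, 2, 42L);
        int[] labels = cluster(points, centroids, 100);
        System.out.println("labels    = " + Arrays.toString(labels));
        System.out.println("centroids = " + Arrays.deepToString(centroids));
    }
}
```

Passing the starting centroids in explicitly keeps the iteration itself deterministic, which makes it easier to test. Only `randomCentroids` depends on the seed.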

  8. Complexity
    The time required for K-Means is O(I·K·m·n), where:
    a) I is the number of iterations required for convergence
    b) K is the number of clusters being formed
    c) m is the number of attributes
    d) n is the number of observations
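
The bound follows from the work done in one iteration: the assignment step computes a distance from each of the n observations to each of the K centroids, and each distance touches all m attributes. A worked instance, where the values I = 10, K = 3, m = 4, and n = 1000 are assumptions chosen only to make the arithmetic concrete:

```latex
I \cdot K \cdot m \cdot n \;=\; 10 \cdot 3 \cdot 4 \cdot 1000 \;=\; 120\,000
```

The count is a growth rate rather than an exact running time: doubling any one of the four factors roughly doubles the work.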

  9. jgs
    Coding
    K-means

  10. Code: Record
    https://github.com/javiergs/Medium/tree/main/Clustering

  11. Code: DataSet
    https://github.com/javiergs/Medium/tree/main/Clustering

  12. Code: K-means
    https://github.com/javiergs/Medium/tree/main/Clustering

  13. Let’s Work

  14. jgs
    Weka
    Java Framework

  15. Weka
    § Waikato Environment for Knowledge Analysis (WEKA) is a machine learning
    library developed at the University of Waikato, New Zealand.
    § It is a well-known Java library.
    § It is a general-purpose library that can solve a wide variety of machine
    learning tasks, such as classification, regression, and clustering.
    § It features a rich graphical user interface, a command-line interface,
    and a Java API.
    § http://www.cs.waikato.ac.nz/ml/weka/

  16. Weka Clustering
    § weka.clusterers: This package contains clustering algorithms, including
    K-means, CLOPE, Cobweb, DBSCAN, hierarchical clustering, and FarthestFirst.

  17. Weka GUI

  18. Weka GUI

  19. Weka GUI

  20. Weka GUI

  21. Weka GUI

  22. Questions

  23. jgs
    SER 594 Software Engineering for Machine Learning
    Javier Gonzalez-Sanchez, Ph.D.
    [email protected]
    Spring 2022
    Copyright. These slides can only be used as study material for the class CSE205 at Arizona State University.
    They cannot be distributed or used for another purpose.
