JGS594 Lecture 14

Software Engineering for Machine Learning
Clustering
(202203)

Javier Gonzalez-Sanchez

March 24, 2022

Transcript

  1. jgs SER 594 Software Engineering for Machine Learning Lecture 14: Clustering

    Dr. Javier Gonzalez-Sanchez javiergs@asu.edu javiergs.engineering.asu.edu | javiergs.com PERALTA 230U Office Hours: By appointment
  2. Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 2

    jgs Machine Learning
  3. jgs Clustering Unsupervised Learning

  4. jgs Definition

    § Unsupervised Learning
    § Clustering is the task of dividing a population (data points) into a number of groups such that data points in the same group are more similar to each other than to data points in other groups.
  5. jgs Similarity

    § One of the simplest ways to measure how similar two feature vectors are is to compute the Euclidean distance between them; a smaller distance means greater similarity.
    § Other options: Minkowski distance, Manhattan distance, Hamming distance, Cosine distance, …
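    To make the distance measures above concrete, here is a minimal Java sketch of the Euclidean and Manhattan distances between two feature vectors. The class and method names (`Distances`, `euclidean`, `manhattan`) are hypothetical and are not taken from the lecture's repository.

```java
// Illustrative distance helpers; the class and method names are hypothetical.
final class Distances {

    private Distances() { }

    // Euclidean distance: square root of the sum of squared differences.
    static double euclidean(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Manhattan distance: sum of absolute differences.
    static double manhattan(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // Both measures are only defined for vectors with the same number of features.
    private static void requireSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Feature vectors must have the same length");
        }
    }
}
```

    For the vectors (0, 0) and (3, 4), `Distances.euclidean` gives 5.0 and `Distances.manhattan` gives 7.0.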
  6. jgs Algorithm: K-Means
  7. jgs Algorithm: K-Means

    § K-Means begins with k randomly placed centroids. Centroids are the center points of the clusters.
    § Iteration:
      o Assign each data point to its nearest centroid.
      o Move each centroid to the average location of the points assigned to it.
    § Repeat iterations until the assignments stop changing between consecutive iterations.
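    The steps above can be sketched in Java roughly as follows. This is a minimal illustration, not the code from the lecture's repository: the class and method names are hypothetical, and two details are assumptions of this sketch: the initial centroids are k distinct data points picked at random, and a cluster that ends up empty keeps its previous centroid.

```java
import java.util.Arrays;
import java.util.Random;

// A minimal K-Means sketch; names and initialization strategy are assumptions.
final class KMeansSketch {

    // Returns the cluster index (0..k-1) assigned to each point.
    static int[] cluster(double[][] points, int k, int maxIterations, long seed) {
        int n = points.length;
        int m = points[0].length;
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }

        // Start with k centroids placed on randomly chosen, distinct data points.
        Random random = new Random(seed);
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        for (int i = n - 1; i > 0; i--) { // Fisher-Yates shuffle
            int j = random.nextInt(i + 1);
            int tmp = order[i];
            order[i] = order[j];
            order[j] = tmp;
        }
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[order[c]].clone();

        int[] assignment = new int[n];
        Arrays.fill(assignment, -1); // -1 means "not assigned yet"

        for (int iter = 0; iter < maxIterations; iter++) {
            // Step 1: assign each point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int nearest = nearestCentroid(points[i], centroids);
                if (nearest != assignment[i]) {
                    assignment[i] = nearest;
                    changed = true;
                }
            }
            if (!changed) break; // assignments are stable, so stop

            // Step 2: move each centroid to the mean of its assigned points.
            double[][] sums = new double[k][m];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < m; d++) sums[assignment[i]][d] += points[i][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // empty cluster: leave centroid in place
                for (int d = 0; d < m; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return assignment;
    }

    // Index of the centroid closest to p (squared Euclidean distance suffices for comparison).
    static int nearestCentroid(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < p.length; d++) {
                double diff = p[d] - centroids[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }
}
```

    Calling `KMeansSketch.cluster(points, 2, 100, 42L)` on two well-separated groups of points should give every point in one group the same label and the other group the other label; which group receives label 0 depends on the random initialization.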
  8. jgs Complexity

    The time required for K-Means is O(I·K·m·n), where:
    a) I is the number of iterations required for convergence
    b) K is the number of clusters we're forming
    c) m is the number of attributes
    d) n is the number of observations
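    As a rough illustration of this bound, take hypothetical values (not from the lecture) of I = 10 iterations, K = 3 clusters, m = 4 attributes, and n = 1000 observations. Ignoring constant factors, the operation count grows as:

```latex
\[
I \cdot K \cdot m \cdot n = 10 \cdot 3 \cdot 4 \cdot 1000 = 120{,}000
\]
```

    Doubling n to 2000 doubles this to 240,000, assuming the number of iterations I stays the same; the bound is linear in each of the four factors.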
  9. jgs Coding K-means

  10. jgs Code: Record https://github.com/javiergs/Medium/tree/main/Clustering
  11. jgs Code: DataSet https://github.com/javiergs/Medium/tree/main/Clustering
  12. jgs Code: K-means https://github.com/javiergs/Medium/tree/main/Clustering
  13. jgs Let’s Work
  14. jgs Weka Java Framework

  15. jgs Weka

    § Waikato Environment for Knowledge Analysis (WEKA) is a machine learning library developed at the University of Waikato, New Zealand.
    § A well-known Java library.
    § It is a general-purpose library that can solve a wide variety of machine learning tasks, such as classification, regression, and clustering.
    § It features a rich graphical user interface, command-line interface, and Java API.
    § http://www.cs.waikato.ac.nz/ml/weka/
  16. jgs Weka Clustering

    § weka.clusterers: These are clustering algorithms, including K-means, CLOPE, Cobweb, DBSCAN, hierarchical clustering, and FarthestFirst.
  17. jgs Weka GUI
  18. jgs Weka GUI
  19. jgs Weka GUI
  20. jgs Weka GUI
  21. jgs Weka GUI
  22. jgs Questions
  23. jgs SER 594 Software Engineering for Machine Learning

    Javier Gonzalez-Sanchez, Ph.D. javiergs@asu.edu Spring 2022 Copyright. These slides can only be used as study material for the class CSE205 at Arizona State University. They cannot be distributed or used for another purpose.