JGS594 Lecture 14

Software Engineering for Machine Learning
Clustering
(202203)

Javier Gonzalez-Sanchez

March 24, 2022

Transcript

  1. jgs SER 594 Software Engineering for Machine Learning, Lecture 14: Clustering

    Dr. Javier Gonzalez-Sanchez | [email protected] | javiergs.engineering.asu.edu | javiergs.com | PERALTA 230U | Office Hours: By appointment
  2. Javier Gonzalez-Sanchez | SER 594 | Spring 2022

    Definition
    - Unsupervised learning.
    - Clustering is the task of dividing a population (data points) into a number of groups such that data points in the same group are more similar to each other than to data points in other groups.
  3. Similarity

    - One of the simplest ways to calculate the distance between two feature vectors is to use Euclidean distance.
    - Other options: Minkowski distance, Manhattan distance, Hamming distance, Cosine distance, …
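The two most common distances named above can be sketched in a few lines of Java. This is a minimal illustration, not the lecture's code: the class name `DistanceSketch` and its method names are hypothetical, and feature vectors are assumed to be plain `double[]` arrays of equal length.

```java
class DistanceSketch {

    // Euclidean distance: square root of the sum of squared differences.
    static double euclidean(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Manhattan distance: sum of absolute differences.
    static double manhattan(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // Both distances are only defined for vectors with the same number of features.
    private static void requireSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
    }

    public static void main(String[] args) {
        double[] p = {0.0, 0.0};
        double[] q = {3.0, 4.0};
        System.out.println(euclidean(p, q)); // prints 5.0
        System.out.println(manhattan(p, q)); // prints 7.0
    }
}
```

For the same pair of points, the Manhattan distance is never smaller than the Euclidean one, so the choice of metric can change which centroid counts as "nearest".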
  4. Algorithm: K-means

    - K-Means begins with k randomly placed centroids. Centroids are the center points of the clusters.
    - Iteration:
        - Assign each data point to its nearest centroid.
        - Move each centroid to the average location of the points assigned to it.
    - Repeat iterations until the assignments stop changing between consecutive iterations.
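The steps above can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the code from the linked repository: the class `KMeansSketch` and all method names are hypothetical, points are `double[]` arrays, "randomly placed" is interpreted as picking k distinct data points with a seeded `java.util.Random`, nearness uses squared Euclidean distance, and a centroid that ends up with no points is simply left where it is.

```java
import java.util.Arrays;
import java.util.Random;

class KMeansSketch {

    // Picks k distinct data points at random as the initial centroids (assumes k <= points.length).
    static double[][] randomCentroids(double[][] points, int k, long seed) {
        Random random = new Random(seed);
        int[] indices = new int[points.length];
        for (int i = 0; i < indices.length; i++) indices[i] = i;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            // Partial Fisher-Yates shuffle: swap a random remaining index into slot c.
            int pick = c + random.nextInt(points.length - c);
            int tmp = indices[c];
            indices[c] = indices[pick];
            indices[pick] = tmp;
            centroids[c] = points[indices[c]].clone();
        }
        return centroids;
    }

    // Runs K-means; updates centroids in place and returns each point's cluster index.
    static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1); // -1 means "not assigned yet"
        for (int iteration = 0; iteration < maxIterations; iteration++) {
            boolean changed = false;
            // Step 1: assign each point to its nearest centroid.
            for (int p = 0; p < points.length; p++) {
                int nearest = nearestCentroid(points[p], centroids);
                if (nearest != assignment[p]) {
                    assignment[p] = nearest;
                    changed = true;
                }
            }
            if (!changed) break; // assignments are stable: stop
            // Step 2: move each centroid to the mean of its assigned points.
            moveCentroids(points, assignment, centroids);
        }
        return assignment;
    }

    // Returns the index of the centroid with the smallest squared Euclidean distance.
    static int nearestCentroid(double[] point, double[][] centroids) {
        int best = 0;
        double bestDistance = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double distance = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centroids[c][d];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = c;
            }
        }
        return best;
    }

    static void moveCentroids(double[][] points, int[] assignment, double[][] centroids) {
        int dims = centroids[0].length;
        double[][] sums = new double[centroids.length][dims];
        int[] counts = new int[centroids.length];
        for (int p = 0; p < points.length; p++) {
            counts[assignment[p]]++;
            for (int d = 0; d < dims; d++) sums[assignment[p]][d] += points[p][d];
        }
        for (int c = 0; c < centroids.length; c++) {
            if (counts[c] == 0) continue; // empty cluster: keep the old centroid (assumption)
            for (int d = 0; d < dims; d++) centroids[c][d] = sums[c][d] / counts[c];
        }
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1, 2}, {2, 1}, {8, 8}, {9, 8}, {8, 9}};
        double[][] centroids = randomCentroids(points, 2, 42L);
        int[] assignment = cluster(points, centroids, 100);
        System.out.println(Arrays.toString(assignment));
        System.out.println(Arrays.deepToString(centroids));
    }
}
```

The `maxIterations` cap is a safety net added for this sketch; the stopping rule from the slide is the `changed` flag. Because the starting centroids are random, which cluster receives index 0 or 1 may differ from one seed to another.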
  5. Complexity

    The time required for K-Means is O(I·K·m·n), where:
    a) I is the number of iterations required for convergence
    b) K is the number of clusters we're forming
    c) m is the number of attributes
    d) n is the number of observations
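As a rough illustration of how the four factors multiply, take some made-up values that are not from the lecture: I = 10 iterations, K = 3 clusters, m = 2 attributes, and n = 1000 observations.

```latex
I \cdot K \cdot m \cdot n = 10 \cdot 3 \cdot 2 \cdot 1000 = 60{,}000
```

Under these assumptions the work is on the order of 60,000 basic operations. Doubling n to 2000 doubles that figure to 120,000 only if I stays the same, which is not guaranteed, since the number of iterations needed for convergence can itself depend on the data.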
  6. Code: Record

    https://github.com/javiergs/Medium/tree/main/Clustering
  7. Code: DataSet

    https://github.com/javiergs/Medium/tree/main/Clustering
  8. Code: K-means

    https://github.com/javiergs/Medium/tree/main/Clustering
  9. Weka

    - Waikato Environment for Knowledge Analysis (WEKA) is a machine learning library that was developed at the University of Waikato, New Zealand.
    - A well-known Java library.
    - It is a general-purpose library that can solve a wide variety of machine learning tasks, such as classification, regression, and clustering.
    - It features a rich graphical user interface, a command-line interface, and a Java API.
    - http://www.cs.waikato.ac.nz/ml/weka/
  10. Weka Clustering

    - weka.clusterers: These are clustering algorithms, including K-means, CLOPE, Cobweb, DBSCAN, hierarchical clustering, and FarthestFirst.
  11. SER 594 Software Engineering for Machine Learning

    Javier Gonzalez-Sanchez, Ph.D. | [email protected] | Spring 2022
    Copyright. These slides can only be used as study material for the class CSE205 at Arizona State University. They cannot be distributed or used for another purpose.