# JGS594 Lecture 14

Software Engineering for Machine Learning
Clustering
(202203)

March 24, 2022

## Transcript

1. ### jgs SER 594 Software Engineering for Machine Learning Lecture 14: Clustering

Dr. Javier Gonzalez-Sanchez | javiergs.engineering.asu.edu | javiergs.com | PERALTA 230U | Office Hours: By appointment
2. ### Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 2

jgs Machine Learning

4. ### Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 4

jgs Definition

- Unsupervised learning
- Clustering is the task of dividing a population (data points) into a number of groups such that data points in the same group are more similar to each other than to data points in other groups.
5. ### Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 5

jgs Similarity

- One of the simplest ways to calculate the distance between two feature vectors is to use Euclidean distance.
- Other options: Minkowski distance, Manhattan distance, Hamming distance, Cosine distance, …
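
The distance measures named on this slide can be sketched in a few lines of Java. This is a minimal illustration, not the lecture's code: the class name `DistanceSketch` and its method names are hypothetical, and the cosine version assumes non-zero vectors.

```java
/** Illustrative distance helpers; class and method names are hypothetical, not from the lecture code. */
class DistanceSketch {

    /** Euclidean distance: square root of the sum of squared differences. */
    static double euclidean(double[] a, double[] b) {
        requireSameLength(a.length, b.length);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Manhattan distance: sum of absolute differences. */
    static double manhattan(double[] a, double[] b) {
        requireSameLength(a.length, b.length);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    /** Minkowski distance of order p (p >= 1); p = 1 gives Manhattan, p = 2 gives Euclidean. */
    static double minkowski(double[] a, double[] b, double p) {
        requireSameLength(a.length, b.length);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.pow(Math.abs(a[i] - b[i]), p);
        }
        return Math.pow(sum, 1.0 / p);
    }

    /** Hamming distance: number of positions at which two equal-length vectors differ. */
    static int hamming(int[] a, int[] b) {
        requireSameLength(a.length, b.length);
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                count++;
            }
        }
        return count;
    }

    /** Cosine distance: 1 minus the cosine of the angle between two non-zero vectors. */
    static double cosine(double[] a, double[] b) {
        requireSameLength(a.length, b.length);
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    private static void requireSameLength(int x, int y) {
        if (x != y) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
    }

    public static void main(String[] args) {
        double[] a = {0.0, 0.0};
        double[] b = {3.0, 4.0};
        System.out.println(euclidean(a, b)); // 5.0
        System.out.println(manhattan(a, b)); // 7.0
    }
}
```

Which measure fits best depends on the data: Hamming distance suits categorical or binary attributes, while cosine distance ignores vector length and compares direction only.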
6. ### Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 6

jgs Algorithm: K-Means
7. ### Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 7

jgs Algorithm: K-means

- K-Means begins with k randomly placed centroids. Centroids are the center points of the clusters.
- Iteration:
  - Assign each data point to its nearest centroid.
  - Move each centroid to the average location of the points assigned to it.
- Repeat iterations until the assignments stop changing between consecutive iterations.
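
The steps on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the linked repository: the class `KMeansSketch` and its members are hypothetical names, the initial centroids are copies of randomly chosen data points (the slide only says "randomly placed"), squared Euclidean distance is used to find the nearest centroid, and `maxIterations` is an added safeguard.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative K-Means sketch; class and member names are hypothetical. */
class KMeansSketch {
    final double[][] centroids; // k centroids, each with m attributes
    final int[] assignments;    // cluster index for each of the n points
    final int iterations;       // iterations actually performed

    private KMeansSketch(double[][] centroids, int[] assignments, int iterations) {
        this.centroids = centroids;
        this.assignments = assignments;
        this.iterations = iterations;
    }

    static KMeansSketch fit(double[][] points, int k, long seed, int maxIterations) {
        int n = points.length;
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        int m = points[0].length;
        Random random = new Random(seed);

        // Assumption: "randomly placed" centroids start as k distinct data points,
        // chosen with a partial Fisher-Yates shuffle of the point indices.
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            int j = c + random.nextInt(n - c);
            int tmp = order[c]; order[c] = order[j]; order[j] = tmp;
            centroids[c] = points[order[c]].clone();
        }

        int[] assignments = new int[n];
        Arrays.fill(assignments, -1);
        int iterations = 0;
        boolean changed = true;
        while (changed && iterations < maxIterations) {
            changed = false;
            iterations++;

            // Step 1: assign each point to its nearest centroid.
            for (int i = 0; i < n; i++) {
                int nearest = nearestCentroid(points[i], centroids);
                if (nearest != assignments[i]) {
                    assignments[i] = nearest;
                    changed = true;
                }
            }

            // Step 2: move each centroid to the mean of its assigned points.
            double[][] sums = new double[k][m];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignments[i]]++;
                for (int d = 0; d < m; d++) sums[assignments[i]][d] += points[i][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // an empty cluster keeps its previous centroid
                for (int d = 0; d < m; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return new KMeansSketch(centroids, assignments, iterations);
    }

    /** Index of the centroid with the smallest squared Euclidean distance to the point. */
    static int nearestCentroid(double[] point, double[][] centroids) {
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double distance = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centroids[c][d];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1.5, 2}, {1, 0.5}, {8, 8}, {9, 9.5}, {8.5, 9}};
        KMeansSketch result = fit(points, 2, 42L, 100);
        System.out.println(Arrays.toString(result.assignments));
        System.out.println(Arrays.deepToString(result.centroids));
    }
}
```

The loop stops as soon as one full pass leaves every assignment unchanged, which is the convergence test described on the slide. Cluster labels (0 or 1) depend on the random start, so only the grouping, not the label, is meaningful.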
8. ### Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 8

jgs Complexity

The time required for K-Means is O(I·K·m·n), where:

- I is the number of iterations required for convergence
- K is the number of clusters we're forming
- m is the number of attributes
- n is the number of observations
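
One way to read this bound: in every iteration, each of the n observations is compared with each of the K centroids, and each comparison touches all m attributes. A rough worked example, using made-up sizes that are not from the lecture (I = 10, K = 3, m = 4, n = 150):

```latex
\[
I \cdot K \cdot m \cdot n = 10 \cdot 3 \cdot 4 \cdot 150 = 18\,000
\]
```

That is on the order of 18,000 attribute-level operations. Because the bound is linear in each factor, doubling the number of observations (or clusters, or attributes) roughly doubles the running time, provided the number of iterations stays the same.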

10. ### Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 10

jgs Code: Record https://github.com/javiergs/Medium/tree/main/Clustering
11. ### Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 11

jgs Code: DataSet https://github.com/javiergs/Medium/tree/main/Clustering
12. ### Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 12

jgs Code: K-means https://github.com/javiergs/Medium/tree/main/Clustering
13. ### Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 13

jgs Let’s Work

15. ### Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 15

jgs Weka

- Waikato Environment for Knowledge Analysis (WEKA) is a machine learning library that was developed at the University of Waikato, New Zealand.
- A well-known Java library.
- It is a general-purpose library that can solve a wide variety of machine learning tasks, such as classification, regression, and clustering.
- It features a rich graphical user interface, command-line interface, and Java API.
- http://www.cs.waikato.ac.nz/ml/weka/
16. ### Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 16

jgs Weka Clustering

- `weka.clusterers`: This package contains clustering algorithms, including K-means, CLOPE, Cobweb, DBSCAN, hierarchical clustering, and FarthestFirst.

jgs Weka GUI

22. ### Javier Gonzalez-Sanchez | SER 594 | Spring 2022 | 22

jgs Questions
23. ### jgs SER 594 Software Engineering for Machine Learning

Javier Gonzalez-Sanchez, Ph.D. | Spring 2022

Copyright. These slides can only be used as study material for the class CSE205 at Arizona State University. They cannot be distributed or used for another purpose.