jgs Definition
§ Unsupervised learning
§ Clustering is the task of dividing a population (data points) into groups such that data points in the same group are more similar to each other than to data points in other groups.
Similarity
§ Similarity is usually measured as a distance between two feature vectors; one of the simplest choices is the Euclidean distance.
§ Other options: Minkowski distance, Manhattan distance, Hamming distance, cosine distance, …
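As an illustration of the distance measures named above, here is a minimal Java sketch of the Euclidean and Manhattan distances between two feature vectors. The class name `Distances` and its method names are hypothetical, chosen only for this example.

```java
final class Distances {

    // Euclidean (L2) distance: square root of the sum of squared differences.
    static double euclidean(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Manhattan (L1) distance: sum of absolute differences.
    static double manhattan(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // Both vectors must describe the same attributes, so their lengths must match.
    private static void requireSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
    }

    public static void main(String[] args) {
        double[] p = {0.0, 0.0};
        double[] q = {3.0, 4.0};
        System.out.println(euclidean(p, q)); // prints 5.0
        System.out.println(manhattan(p, q)); // prints 7.0
    }
}
```

For the same pair of points the Manhattan distance is never smaller than the Euclidean distance, so the choice of measure can change which centroid counts as "nearest".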
Algorithm: K-means
§ K-means begins with k randomly placed centroids. A centroid is the center point of a cluster.
§ Iteration:
  o Assign each data point to its nearest centroid.
  o Move each centroid to the average location of the points assigned to it.
§ Repeat the iteration until the assignments stop changing between consecutive iterations.
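The steps above can be sketched in plain Java as follows. This is a minimal illustration, not a reference implementation: the class `KMeansSketch` and its methods are hypothetical names, the initial centroids are assumed to be k distinct data points chosen at random, distance is assumed to be Euclidean, and an empty cluster is assumed to keep its previous centroid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class KMeansSketch {

    // Picks k distinct data points at random as the initial centroids (an assumed strategy).
    static double[][] randomCentroids(double[][] points, int k, Random rng) {
        if (k < 1 || k > points.length) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, rng);
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[indices.get(c)].clone();
        }
        return centroids;
    }

    // Returns the cluster index of each point; the centroids array is updated in place.
    static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1); // -1 means "not assigned yet"
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Step 1: assign each point to its nearest centroid.
            for (int p = 0; p < points.length; p++) {
                int nearest = nearestCentroid(points[p], centroids);
                if (nearest != assignment[p]) {
                    assignment[p] = nearest;
                    changed = true;
                }
            }
            if (!changed) {
                break; // assignments are stable, so the algorithm has converged
            }
            // Step 2: move each centroid to the mean of its assigned points.
            moveCentroids(points, assignment, centroids);
        }
        return assignment;
    }

    // Compares squared Euclidean distances; the square root does not change the ordering.
    private static int nearestCentroid(double[] point, double[][] centroids) {
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double distance = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centroids[c][d];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = c;
            }
        }
        return best;
    }

    private static void moveCentroids(double[][] points, int[] assignment, double[][] centroids) {
        int dims = centroids[0].length;
        double[][] sums = new double[centroids.length][dims];
        int[] counts = new int[centroids.length];
        for (int p = 0; p < points.length; p++) {
            counts[assignment[p]]++;
            for (int d = 0; d < dims; d++) {
                sums[assignment[p]][d] += points[p][d];
            }
        }
        for (int c = 0; c < centroids.length; c++) {
            if (counts[c] == 0) {
                continue; // an empty cluster keeps its previous centroid
            }
            for (int d = 0; d < dims; d++) {
                centroids[c][d] = sums[c][d] / counts[c];
            }
        }
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10}};
        double[][] centroids = randomCentroids(points, 2, new Random(42));
        int[] assignment = cluster(points, centroids, 100);
        System.out.println(Arrays.toString(assignment));
        System.out.println(Arrays.deepToString(centroids));
    }
}
```

The `maxIterations` cap is a safeguard added for this sketch; the slide's stopping rule is the `changed` flag. Which cluster receives index 0 depends on the random starting centroids, so only the grouping, not the labels, is meaningful.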
Complexity
The time required for K-means is O(I·K·m·n), where:
a) I is the number of iterations required for convergence
b) K is the number of clusters being formed
c) m is the number of attributes
d) n is the number of observations
Each iteration compares all n observations against all K centroids, and each distance computation covers m attributes.
Weka
§ Waikato Environment for Knowledge Analysis (WEKA) is a machine learning library developed at the University of Waikato, New Zealand.
§ It is a well-known Java library.
§ It is a general-purpose library that can solve a wide variety of machine learning tasks, such as classification, regression, and clustering.
§ It features a rich graphical user interface, a command-line interface, and a Java API.
§ http://www.cs.waikato.ac.nz/ml/weka/
Weka Clustering
§ weka.clusterers: clustering algorithms, including K-means, CLOPE, Cobweb, DBSCAN, hierarchical clustering, and FarthestFirst.
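As a rough sketch of how the K-means clusterer in `weka.clusterers` might be called from Java, the example below uses `SimpleKMeans`. It assumes the Weka JAR is on the classpath. The file path `data/points.arff`, the class name `WekaKMeansExample`, the choice of 3 clusters, and the seed value are all placeholders for illustration.

```java
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class WekaKMeansExample {

    public static void main(String[] args) throws Exception {
        // Hypothetical dataset path; replace it with a real ARFF or CSV file.
        Instances data = DataSource.read("data/points.arff");

        SimpleKMeans kMeans = new SimpleKMeans();
        kMeans.setNumClusters(3); // k, chosen arbitrarily for this sketch
        kMeans.setSeed(10);       // fixes the random initial centroids
        kMeans.buildClusterer(data);

        // Print the final centroid of each cluster.
        Instances centroids = kMeans.getClusterCentroids();
        for (int i = 0; i < centroids.numInstances(); i++) {
            System.out.println("Centroid " + i + ": " + centroids.instance(i));
        }

        // Print the cluster index assigned to each data point.
        for (int i = 0; i < data.numInstances(); i++) {
            int cluster = kMeans.clusterInstance(data.instance(i));
            System.out.println(data.instance(i) + " -> cluster " + cluster);
        }
    }
}
```

Clusterers generally expect data without a class attribute set, which is why the sketch never calls `setClassIndex` on the loaded data.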
Ph.D. [email protected] Spring 2022 Copyright. These slides can only be used as study material for the class CSE205 at Arizona State University. They cannot be distributed or used for another purpose.