
K-Means Clustering in Machine Learning


This presentation introduces K-Means clustering, covering its core concepts, step-by-step process, real-world applications, and advantages and limitations. Designed for aspiring members of the Tencent Innovation Club, it highlights how K-Means can drive practical machine learning solutions and data-driven innovation.


Félix Charotte

October 06, 2025

Transcript

  1. MACHINE LEARNING

     Definition: "Algorithms whose performance improves as they are exposed to more data over time." Supervised: learn from labels (X1, X2, X3 -> Y). Regression: linear regression, decision trees. Classification: logistic regression, KNN. Unsupervised: learn without labels (no target Y). Clustering: K-means. Dimension reduction: PCA (feature selection). Reinforcement: learn from experience. Agent + environment: maximize the reward by learning a policy.
  2. How would you design an algorithm for finding the three

     clusters in this case? Raw data with no target. A notion of distance or similarity (Euclidean or cosine). Different groups of objects. CLEAR CLUSTER DATA
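The two similarity notions named on the slide can be sketched directly in NumPy (a minimal illustration; the function names are mine, not the deck's):

```python
import numpy as np

def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return np.sqrt(np.sum((np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(euclidean_distance([0, 0], [3, 4]))   # 5.0
print(cosine_similarity([1, 0], [1, 0]))    # 1.0
```

Standard K-means uses Euclidean distance; cosine similarity is the usual choice when only the direction of the vectors matters, e.g. text embeddings.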
  3. K-means is a method that divides data into K clusters.

     Each cluster has a centroid (aka center point or mean), and data points are grouped with the nearest centroid. The aim is to minimize the distance between data points and their cluster's centroid. Used for grouping and segmenting data, e.g. customer segmentation. WHAT IS K-MEANS?
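The objective described on the slide, the total squared distance from each point to its cluster's centroid, is usually called the inertia or within-cluster sum of squares. A small sketch of that quantity (my own helper, not from the deck):

```python
import numpy as np

def inertia(points, labels, centroids):
    """Within-cluster sum of squared distances: the quantity K-means minimizes."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    return float(sum(((p - centroids[l]) ** 2).sum() for p, l in zip(points, labels)))

# Four points in two clusters; each centroid sits midway between its two points,
# so every point contributes a squared distance of 1.
pts = [[0, 0], [0, 2], [10, 0], [10, 2]]
print(inertia(pts, [0, 0, 1, 1], [[0, 1], [10, 1]]))   # 4.0
```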
  4. Choose K (or use the elbow method). Initialize centroids. Assign points: each

     data point is assigned to the closest centroid. Update centroids: compute the mean of each cluster's data points and reassign the centroids. Repeat until convergence. K-MEANS PROCESS. K-means example (k=2). PS: In Python, use kmeans.fit() from the sklearn library.
  5. SKLEARN PYTHON. Plot the raw data with matplotlib. Find the best value

     for K using the elbow method (k=2). Use kmeans.fit(data) to cluster.
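The sklearn workflow the slide outlines might look like this (a sketch on toy data of my own; the slide's actual dataset is not shown in the transcript):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-in for the slide's raw data: two well-separated blobs
rng = np.random.default_rng(42)
data = np.vstack([rng.normal(0, 0.5, (50, 2)),
                  rng.normal(5, 0.5, (50, 2))])

# Elbow method: fit K-means for several values of k and record the inertia
# (within-cluster sum of squared distances); plotting k vs. inertia with
# matplotlib, you pick the k where the curve bends.
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(data).inertia_
            for k in range(1, 6)}

# Cluster with the chosen k (the bend is at k=2 for this data)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)
```

The inertia always decreases as k grows, so the criterion is not the minimum but the point where further increases in k stop paying off.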
  6. PROS & CONS (Source: Hewlett Packard) ADVANTAGES: Simple

     and easy to implement: the algorithm is straightforward and efficient for clustering. Scalable for large datasets: it can handle large datasets efficiently, making it suitable for big data applications. DISADVANTAGES: Sensitive to initial centroid placement: if the initial centroids are poorly chosen, K-means can end up with suboptimal clusters. Assumes spherical clusters: it struggles with data that forms non-circular or varied-size clusters (it expects roughly circular clusters of similar size).
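The sensitivity to initialization can be observed directly with sklearn (my own toy data; the `init` and `n_init` parameters are real sklearn options, and k-means++ seeding with several restarts is how the library mitigates the problem by default):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three spherical, well-separated clusters
data = np.vstack([rng.normal(c, 0.3, (30, 2)) for c in ((0, 0), (4, 0), (2, 3))])

# A single run with purely random starting centroids can get stuck in a
# poor local optimum, so the final inertia varies from seed to seed.
random_runs = [KMeans(n_clusters=3, init="random", n_init=1, random_state=s)
               .fit(data).inertia_ for s in range(20)]

# k-means++ seeding plus multiple restarts keeps the best of several runs.
best = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(data).inertia_
print(min(random_runs), max(random_runs), best)
```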
  7. SCOPE OF APPLICATION. Image segmentation: grouping pixels by similarity. Market

     segmentation: dividing customers based on behavior or preferences. Anomaly detection: identifying outliers in datasets, e.g. fraud detection, cybersecurity.