
K-Means Clustering in Machine Learning


This presentation introduces K-Means clustering, covering its core concepts, step-by-step process, real-world applications, and advantages and limitations. Designed for aspiring members of the Tencent Innovation Club, it highlights how K-Means can drive practical machine learning solutions and data-driven innovation.


Félix Charotte

October 06, 2025

Transcript

  1. MACHINE LEARNING

     Definition: "Algorithms whose performance improves as they are exposed to more data over time." Supervised: learn from labels (X1, X2, X3 -> Y). Regression: linear regression, decision trees. Classification: logistic regression, KNN. Unsupervised: learn without labels (no target Y). Clustering: K-means. Dimension reduction: PCA (feature selection). Reinforcement: learn from experience. Agent + environment: maximize the reward by learning a policy.
  2. How would you design an algorithm for finding the three

     clusters in this case? Raw data with no target. A notion of distance or similarity (Euclidean or cosine). Different groups of objects. CLEAR CLUSTER DATA
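The two similarity notions named on the slide can be sketched directly in NumPy (a minimal illustration; the function names are mine, not the deck's):

```python
import numpy as np

def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return np.sqrt(np.sum((np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(euclidean_distance([0, 0], [3, 4]))   # 5.0
print(cosine_similarity([1, 0], [1, 0]))    # 1.0
```

Standard K-means uses Euclidean distance; cosine similarity is the usual choice when only the direction of the vectors matters, e.g. text embeddings.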
  3. K-means is a method that divides data into K clusters.

     Each cluster has a centroid (aka center point or mean), and data points are grouped with the nearest centroid. The aim is to minimize the distance between data points and their cluster's centroid. Used for grouping and segmenting data, e.g. customer segmentation. WHAT IS K-MEANS?
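The objective described on the slide, the total squared distance from each point to its cluster's centroid, is usually called the inertia or within-cluster sum of squares. A small sketch of that quantity (my own helper, not from the deck):

```python
import numpy as np

def inertia(points, labels, centroids):
    """Within-cluster sum of squared distances: the quantity K-means minimizes."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    return float(sum(((p - centroids[l]) ** 2).sum() for p, l in zip(points, labels)))

# Four points in two clusters; each centroid sits midway between its two points,
# so every point contributes a squared distance of 1.
pts = [[0, 0], [0, 2], [10, 0], [10, 2]]
print(inertia(pts, [0, 0, 1, 1], [[0, 1], [10, 1]]))   # 4.0
```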
  4. Choose K (or use the elbow method). Initialize centroids. Assign points: each

     data point is assigned to the closest centroid. Update centroids: compute the mean of each cluster's data points and reassign the centroids. Repeat until convergence. K-MEANS PROCESS. K-means example (k=2). PS: In Python, use kmeans.fit() from the sklearn library.
  5. SKLEARN PYTHON. Plot the raw data with matplotlib. Find the best value

     for K using the elbow method (k=2). Use kmeans.fit(data) to cluster.
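The sklearn workflow the slide outlines might look like this (a sketch on toy data of my own; the slide's actual dataset is not shown in the transcript):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-in for the slide's raw data: two well-separated blobs
rng = np.random.default_rng(42)
data = np.vstack([rng.normal(0, 0.5, (50, 2)),
                  rng.normal(5, 0.5, (50, 2))])

# Elbow method: fit K-means for several values of k and record the inertia
# (within-cluster sum of squared distances); plotting k vs. inertia with
# matplotlib, you pick the k where the curve bends.
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(data).inertia_
            for k in range(1, 6)}

# Cluster with the chosen k (the bend is at k=2 for this data)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)
```

The inertia always decreases as k grows, so the criterion is not the minimum but the point where further increases in k stop paying off.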
  6. PROS & CONS (Source: Hewlett Packard) ADVANTAGES: Simple

     and easy to implement: the algorithm is straightforward and efficient for clustering. Scalable for large datasets: it can handle large datasets efficiently, making it suitable for big data applications. DISADVANTAGES: Sensitive to initial centroid placement: if the initial centroids are poorly chosen, K-means can end up with suboptimal clusters. Assumes spherical clusters: it struggles with data that forms non-circular or varied-size clusters (it expects roughly circular clusters of similar size).
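The sensitivity to initialization can be observed directly with sklearn (my own toy data; the `init` and `n_init` parameters are real sklearn options, and k-means++ seeding with several restarts is how the library mitigates the problem by default):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three spherical, well-separated clusters
data = np.vstack([rng.normal(c, 0.3, (30, 2)) for c in ((0, 0), (4, 0), (2, 3))])

# A single run with purely random starting centroids can get stuck in a
# poor local optimum, so the final inertia varies from seed to seed.
random_runs = [KMeans(n_clusters=3, init="random", n_init=1, random_state=s)
               .fit(data).inertia_ for s in range(20)]

# k-means++ seeding plus multiple restarts keeps the best of several runs.
best = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(data).inertia_
print(min(random_runs), max(random_runs), best)
```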
  7. SCOPE OF APPLICATION. Image segmentation: grouping pixels by similarity. Market

     segmentation: dividing customers based on behavior or preferences. Anomaly detection: identifying outliers in datasets, e.g. fraud detection, cybersecurity.