What is k-means Clustering?

2 min read · Sep 30, 2024

Disclaimer: These are my study notes, shared to reinforce my own learning. Proceed at your own risk!

Credit: Scikit Learn, https://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_iris.html

k-means clustering is an unsupervised learning method used for, that’s right, clustering. The value k is a predefined number of groups to split the data into. The algorithm finds a representative point for each cluster, known as the centroid, and assigns every data point to its nearest centroid, eventually forming clusters. Because centroids are computed as means and assignments rely on distances, k-means only works on numerical data.

In summary, the k-means algorithm works as follows:

1. Place k centroids (c₁ … cₖ) at random locations.
2. For each data point, find the nearest centroid (using Euclidean distance) and assign the point to that centroid's cluster.
3. Recompute the centroid of each cluster as the mean of the points assigned to it.
4. Repeat Steps 2–3 until no data point changes cluster.
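
To make the steps above concrete for myself, here is a minimal pure-Python sketch. It is not a reference implementation: the function name `kmeans`, the `max_iters` and `seed` parameters, and the choice to start from k randomly picked data points (rather than arbitrary random locations) are all my own assumptions for illustration.

```python
import math
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Minimal k-means sketch on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1 (assumption): use k distinct data points as the starting centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Step 2: assign each point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 4: stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

# Two small, well-separated groups of 2-D points (made-up data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

On this toy data the first three points should land in one cluster and the last three in the other; which group gets label 0 depends on the random start. For real work, scikit-learn's `KMeans` class is the more sensible choice.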

Next, there are two common ways to evaluate the clusters and choose k: the Elbow Curve Method and Silhouette Analysis. In general, the aim is to have the shortest average distance between points within a cluster and the largest average distance between clusters.

The Elbow Curve Method runs k-means for a range of k values and calculates the WCSS (Within-Cluster Sum of Squares) for each one, which is the sum of the squared distances between each data point and the centroid of its cluster. WCSS generally decreases as k grows, so when it is plotted against k, the point to look for is where the curve bends and further increases in k yield only small reductions. This is the Elbow point.
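
As a small worked example of the WCSS calculation, the sketch below computes it for three cluster assignments of the same four points. The helper name `wcss`, the toy data, and the assignments are my own assumptions: the assignments are hand-picked for k = 1, 2, 3 rather than produced by running k-means.

```python
def wcss(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        # The centroid is the coordinate-wise mean of the cluster's points.
        centroid = [sum(c) / len(members) for c in zip(*members)]
        total += sum(
            sum((x - m) ** 2 for x, m in zip(p, centroid)) for p in members
        )
    return total

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
# Hand-picked assignments for k = 1, 2, 3 (illustrative, not k-means output).
assignments = {
    1: [0, 0, 0, 0],
    2: [0, 0, 1, 1],
    3: [0, 0, 1, 2],
}
wcss_by_k = {k: wcss(points, labels) for k, labels in assignments.items()}
print(wcss_by_k)  # {1: 104.0, 2: 4.0, 3: 2.0}
```

Going from k = 1 to k = 2 cuts WCSS from 104 to 4, while going to k = 3 only saves another 2, so the bend (the elbow) sits at k = 2 for this data. In scikit-learn, a fitted `KMeans` model exposes a comparable quantity as its `inertia_` attribute.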

Silhouette Analysis, on the other hand, computes the Silhouette Score, which measures how well separated the clusters are and how dense each one is. The score ranges from -1 to 1, with higher values indicating denser, better-separated clusters.
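
To see how separation and density combine into one number, here is a rough sketch of the usual silhouette definition: for each point, a is the mean distance to the other points in its own cluster, b is the mean distance to the points of the nearest other cluster, and the point's score is (b - a) / max(a, b). The function name `mean_silhouette` and the toy data are my own assumptions, and the code expects at least two clusters.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette score; assumes at least two distinct cluster labels."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for single-point clusters
            continue
        # a: mean distance to the other points in the same cluster (density).
        # The distance from p to itself is 0, so summing over the whole cluster is safe.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the points of the nearest other cluster (separation).
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = mean_silhouette(points, [0, 0, 1, 1])  # tight, well-separated clusters
bad = mean_silhouette(points, [0, 1, 0, 1])   # clusters that cut across the gap
print(round(good, 2), round(bad, 2))
```

The first assignment scores clearly positive and the second comes out negative, which matches the intuition that a higher score means denser, better-separated clusters. scikit-learn provides `sklearn.metrics.silhouette_score` for the same purpose.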

References:

Victor Lavrenko. (2014) K-means clustering: how it works. https://youtu.be/_aWzGGNrcic?si=KbStWpVxHqIzhSAj

Scikit Learn. https://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_iris.html

Written by Weng Kee Teh
