K-Means

When working with a large dataset, one way of simplifying our analysis is to group similar observations together. K-means clustering is a straightforward, iterative process for separating a dataset into clusters.

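Based on the principle that observations with similar features should be grouped together, K-means partitions the data into K clusters so as to reduce the variation within each cluster. The algorithm is only guaranteed to converge to a local minimum of this variation, and which minimum it reaches depends on the starting centroids, so it occasionally gives misleading results. Nevertheless, it is a powerful tool in a great number of cases.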
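To make the iterative process concrete, here is a minimal sketch of the standard K-means loop (often called Lloyd's algorithm) in plain Python. It is an illustration under stated assumptions, not the code used in the notebooks: the function names (k_means, within_cluster_variation) are hypothetical, points are assumed to be tuples of numbers, and the starting centroids are simply sampled at random from the data.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def within_cluster_variation(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))


def k_means(points, k, max_iterations=100, seed=0):
    """Illustrative K-means; returns (labels, centroids).

    Assumption: initial centroids are k distinct points drawn at random
    from the data. Other initialisation schemes exist.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    return labels, centroids


# Usage: two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(data, k=2)
variation = within_cluster_variation(data, labels, centroids)
```

Each pass can only lower the within-cluster variation or leave it unchanged, which is why the loop stops. The result can still depend on the starting centroids, so in practice it is common to run the procedure from several different seeds and keep the run with the lowest variation.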

Online resources

A blog post on K-means written by Itai Muzhingi
A really cool visualisation tool made by Naftali Harris

Click the links below to access the Jupyter Notebooks for K-means:

K-means - Empty [Online notebook | .ipynb file]
K-means - Redacted [Online notebook | .ipynb file]
K-means - Complete [Online notebook | .ipynb file | HTML file]