K-Means
When working with a large dataset, one way of simplifying our analysis is to group similar observations together. K-means clustering is a straightforward, iterative process for separating a dataset into clusters.
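Based on the principle that observations with similar features should be grouped together, K-means alternates between assigning each observation to its nearest cluster centre and moving each centre to the mean of the observations assigned to it. Each step reduces (or at least never increases) the variation within the clusters, so the process always converges. However, it is only guaranteed to reach a local minimum, which depends on the starting centres, rather than the best possible grouping. Whilst this occasionally leads to misleading results, it is a powerful tool in a great number of cases.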
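To make the iterative process concrete, here is a minimal sketch in plain Python. It is not taken from the notebooks below: the function names (`k_means`, `squared_distance`, `within_cluster_variation`), the toy data and the choice of random starting centres are all assumptions made for illustration, and the within-cluster variation is taken to be the sum of squared Euclidean distances to each cluster centre.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iter=100, seed=0):
    """Illustrative K-means: returns (centroids, labels).

    Starting centres are k observations drawn at random (an assumption;
    other initialisation schemes exist).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each observation to its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centre to the mean of its observations.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centre where it is
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def within_cluster_variation(points, centroids, labels):
    """Sum of squared distances from each observation to its cluster centre."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Toy data (made up for this example): two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(labels)
print(within_cluster_variation(points, centroids, labels))
```

Because the result can depend on the starting centres, a common precaution is to run the procedure several times with different values of `seed` and keep the run with the lowest within-cluster variation.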
Online resources
- A blog post on K-means written by Itai Muzhingi
- A really cool visualisation tool made by Naftali Harris
Click the links below to access the Jupyter Notebooks for K-means:
- K-means - Empty [Online notebook | .ipynb file]
- K-means - Redacted [Online notebook | .ipynb file]
- K-means - Complete [Online notebook | .ipynb file | HTML file]