K-Means

When working with a large dataset, one way of simplifying our analysis is to group similar observations together. K-means clustering is a straightforward, iterative process for separating a dataset into a chosen number, K, of clusters.

Based on the principle that observations with similar features should be grouped together, K-means seeks a grouping that minimises the variation within each cluster. The algorithm is only guaranteed to converge to a local minimum, which depends on the initial choice of cluster centres, so it can occasionally give misleading results. Even so, it is a powerful tool in a great number of cases.
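
To make the iterative process concrete, here is a minimal sketch in plain Python. It is not taken from the notebooks. It assumes the standard assign-then-update scheme (often called Lloyd's algorithm), squared Euclidean distance as the measure of similarity, and initial centres drawn at random from the data. The names kmeans, squared_distance, n_iter and seed are illustrative choices only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Illustrative K-means sketch; returns (labels, centroids, wcss).

    Assumption: initial centroids are k points sampled at random
    from the data, so the result can depend on the seed.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: give each point the label of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the process has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Within-cluster sum of squares: the quantity K-means tries to reduce.
    wcss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wcss


# Two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids, wcss = kmeans(points, k=2)
print(labels, wcss)
```

Because the outcome can depend on the starting centres, a common precaution is to run the procedure with several different seeds and keep the result with the lowest within-cluster sum of squares.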

Online resources

Click the links below to access the Jupyter Notebooks for K-means