Bagging
Bagging is a portmanteau of 'Bootstrap Aggregation', and the name describes how the algorithm works: 'aggregation' means that we train many different models and combine their predictions in order to improve our predictive accuracy, while 'bootstrap' refers to how we generate the training data for each of those models.
Training lots of models using a finite dataset presents us with a dilemma: either we train each model using the same dataset, in which case the models (and their errors) end up very similar; or we split the dataset and train each model on a disjoint subset, risking not giving the constituent models enough data.
Fortunately, we can use the statistician's trick of bootstrapping: sampling from the dataset with replacement to create a 'new' dataset of the same size as the original. In this way, we can train as many different models as we want, each on an appropriately sized dataset.
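As a minimal sketch of the trick (the variable names and the toy data are ours, not from the notebooks): a bootstrap sample draws n points with replacement from an n-point dataset, so it has the same size as the original but typically repeats some points and omits others.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.array([2.1, 3.5, 4.0, 5.2, 6.8])

# A bootstrap sample: draw n indices *with replacement* from the original n points.
indices = rng.integers(0, len(data), size=len(data))
sample = data[indices]

# Same size as the original dataset, built only from points that appear in it.
assert len(sample) == len(data)
assert set(sample).issubset(set(data))
```

Because each draw is independent, every call produces a different resampled dataset, which is exactly what lets us train many different models from one finite dataset.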
In the notebooks below, we implement a Bagged model using Decision Trees as building blocks.
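The following is a hedged, pure-NumPy sketch of the idea, not the notebooks' implementation: we use depth-1 regression trees ('stumps') as a stand-in for the notebooks' decision trees, fit one stump per bootstrap sample, and aggregate by averaging their predictions. All function names (`fit_stump`, `bagged_fit`) and the toy step-function data are our own illustrative choices.

```python
import numpy as np

def fit_stump(x, y):
    """Fit a depth-1 regression tree: one threshold, two leaf means."""
    vals = np.unique(x)
    if len(vals) < 2:  # degenerate bootstrap sample: fall back to a constant
        m = y.mean()
        return lambda q: np.full_like(q, m, dtype=float)
    best = None
    for t in vals[:-1]:  # candidate thresholds leaving both sides non-empty
        left, right = y[x <= t], y[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lo, hi = best
    return lambda q: np.where(q <= t, lo, hi)

def bagged_fit(x, y, n_models=50, seed=0):
    """Train n_models stumps, each on its own bootstrap sample; predict the mean."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), size=len(x))  # bootstrap resample
        models.append(fit_stump(x[idx], y[idx]))
    return lambda q: np.mean([m(q) for m in models], axis=0)

# Toy data: a noisy step function. A single stump fits it crudely;
# averaging many stumps trained on different resamples smooths the fit.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 80)
y = (x > 0.5).astype(float) + rng.normal(0, 0.1, size=x.shape)
predict = bagged_fit(x, y)
```

Averaging the stumps reduces the variance of the ensemble relative to any individual stump, which is the payoff bagging aims for; the notebooks develop the same idea with full decision trees.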
Online resources
- An article on bootstrapping by Lorna Yen;
- A series of blog posts on ensemble methods and the bias-variance decomposition;
Click the links below to access the Jupyter notebooks for Bagging:
- Bagging - Empty [Online notebook | .ipynb file]
- Bagging - Redacted [Online notebook | .ipynb file]
- Bagging - Complete [Online notebook | .ipynb file | HTML file]