Bagging
Bagging is a portmanteau of 'Bootstrap Aggregation', and the name describes how the algorithm works: 'aggregation' means that we train many different models and combine their predictions in order to improve our predictive accuracy, while 'bootstrap' refers to how we generate the training data for each of those models.
Training lots of models using a finite dataset presents us with a dilemma: either we train each model using the same dataset, in which case the models (and their errors) end up very similar; or we split the dataset and train each model on a disjoint subset, risking not giving the constituent models enough data.
Fortunately, we can use the statistician's trick of bootstrapping: sampling from the dataset with replacement to create a 'new' dataset of the same size as the original. In this way, we can train as many different models as we want, each on an appropriately sized dataset.
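As a minimal sketch of the trick (the variable names and the toy data are ours, not from the notebooks): a bootstrap sample draws n points with replacement from an n-point dataset, so it has the same size as the original but typically repeats some points and omits others.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.array([2.1, 3.5, 4.0, 5.2, 6.8])

# A bootstrap sample: draw n indices *with replacement* from the original n points.
indices = rng.integers(0, len(data), size=len(data))
sample = data[indices]

# Same size as the original dataset, built only from points that appear in it.
assert len(sample) == len(data)
assert set(sample).issubset(set(data))
```

Because each draw is independent, every call produces a different resampled dataset, which is exactly what lets us train many different models from one finite dataset.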
In the notebooks below, we implement a Bagged model using Decision Trees as building blocks.
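The following is a hedged, pure-NumPy sketch of the idea, not the notebooks' implementation: we use depth-1 regression trees ('stumps') as a stand-in for the notebooks' decision trees, fit one stump per bootstrap sample, and aggregate by averaging their predictions. All function names (`fit_stump`, `bagged_fit`) and the toy step-function data are our own illustrative choices.

```python
import numpy as np

def fit_stump(x, y):
    """Fit a depth-1 regression tree: one threshold, two leaf means."""
    vals = np.unique(x)
    if len(vals) < 2:  # degenerate bootstrap sample: fall back to a constant
        m = y.mean()
        return lambda q: np.full_like(q, m, dtype=float)
    best = None
    for t in vals[:-1]:  # candidate thresholds leaving both sides non-empty
        left, right = y[x <= t], y[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lo, hi = best
    return lambda q: np.where(q <= t, lo, hi)

def bagged_fit(x, y, n_models=50, seed=0):
    """Train n_models stumps, each on its own bootstrap sample; predict the mean."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), size=len(x))  # bootstrap resample
        models.append(fit_stump(x[idx], y[idx]))
    return lambda q: np.mean([m(q) for m in models], axis=0)

# Toy data: a noisy step function. A single stump fits it crudely;
# averaging many stumps trained on different resamples smooths the fit.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 80)
y = (x > 0.5).astype(float) + rng.normal(0, 0.1, size=x.shape)
predict = bagged_fit(x, y)
```

Averaging the stumps reduces the variance of the ensemble relative to any individual stump, which is the payoff bagging aims for; the notebooks develop the same idea with full decision trees.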
Online resources
- An article on bootstrapping by Lorna Yen;
- A series of blog posts on ensemble methods and the bias-variance decomposition;
Click the links below to access the Jupyter notebooks for Bagging:
- Bagging - Empty [Online notebook | .ipynb file]
- Bagging - Redacted [Online notebook | .ipynb file]
- Bagging - Complete [Online notebook | .ipynb file | HTML file]