Bagging

Bagging is a portmanteau of 'Bootstrap Aggregating', a name that describes how the algorithm works: aggregation means that we train lots of different models and combine their results in order to improve our predictive accuracy, while bootstrapping is the trick we use to get enough data to train them all.

Training lots of models using a finite dataset presents us with a dilemma: either we train each model using the same dataset, leading to similar results for each model; or we split the dataset and train each model using a disjoint subset, risking not giving the constituent models enough data.

Fortunately we can use the statistician's trick of bootstrapping: sampling from the dataset with replacement to create a 'new' dataset of the same size as the original. In this way, we can train as many different models as we want, each using an appropriately sized dataset.
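As a minimal sketch of the idea (the function name here is illustrative, not from the notebooks), a bootstrap sample is just the original data drawn with replacement:

```python
import random

def bootstrap_sample(data, rng=None):
    """Draw a 'new' dataset of the same size by sampling with replacement."""
    rng = rng or random.Random(0)
    return [rng.choice(data) for _ in range(len(data))]

data = list(range(10))
resampled = bootstrap_sample(data)
print(len(resampled))  # same size as the original; some points repeat, some are missing
```

Because each draw is independent, some original points appear several times in the resample and others not at all, which is what makes the 'new' datasets differ from one another.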

In the notebooks below, we implement a Bagged model using Decision Trees as building blocks.
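The overall recipe can be sketched in a few lines of pure Python. This is not the notebooks' implementation: for brevity it uses one-feature decision stumps rather than full decision trees as the building blocks, and every name below is illustrative.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """One-feature decision stump: pick the threshold/labels that best split the data."""
    best = None
    for t in sorted(set(xs)):
        for left in (0, 1):  # try labelling x <= t as 0 or as 1
            right = 1 - left
            err = sum((y != left if x <= t else y != right) for x, y in zip(xs, ys))
            if best is None or err < best[0]:
                best = (err, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right

def bagged_stumps(xs, ys, n_models=25, seed=0):
    """Train n_models stumps, each on its own bootstrap sample, and vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]  # bootstrap sample
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    def predict(x):
        votes = Counter(m(x) for m in models)  # aggregate by majority vote
        return votes.most_common(1)[0][0]
    return predict

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
predict = bagged_stumps(xs, ys)
print(predict(2), predict(7))
```

Even if an individual bootstrap sample happens to produce a poor stump, the majority vote across all the models washes that error out, which is the point of the aggregation step.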

Online resources

Click the links below to access the Jupyter Notebooks for Bagging.