Random Forests

Lots of trees constitute a forest, whilst lots of decision trees constitute a random forest.

Random forests are closely related to bagged models; both use bootstrapping and model aggregation to generate predictions. The key distinction is that each decision tree in a random forest is constructed using only a randomly selected subset of the available features (typically drawn afresh at each candidate split). Randomly subsetting the features in this way produces a more diverse set of decision trees, reducing the risk of overfitting, and also allows us to gauge the influence of individual features on the model's predictions.
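As a minimal sketch of these ideas, the snippet below fits a random forest and ranks features by importance. It assumes scikit-learn is available and uses one of its built-in datasets; the dataset and hyperparameters are illustrative, not prescriptive.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative dataset: 30 numeric features, binary target.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

forest = RandomForestClassifier(
    n_estimators=100,     # number of bootstrapped decision trees
    max_features="sqrt",  # size of the random feature subset per split
    random_state=42,
)
forest.fit(X_train, y_train)

print(f"Test accuracy: {forest.score(X_test, y_test):.3f}")

# Mean decrease in impurity, averaged over trees: one score per feature.
ranked = sorted(
    zip(X.columns, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked[:5]:
    print(f"{name}: {score:.3f}")
```

The `max_features` argument is what separates a random forest from plain bagging: it restricts each split to a random subset of features, decorrelating the trees before their predictions are aggregated.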

Given that random forests are among the most widely used ML algorithms, they are well worth becoming acquainted with.

Online resources

  • Check out our page on decision trees, the building blocks of random forests;
  • A blog post by Niklas Dongeson on understanding random forests and feature importance.

Click the links below to access the Jupyter Notebooks for Random Forests.