Linear Discriminant Analysis
Linear regression is an example of a Discriminative model: For a set of features, x, it predicts the most likely value that a related target variable, y, will take. Discriminative models often exhibit high predictive accuracy, but they are unable to make predictions when one of the features is missing, a situation that occurs regularly when working with real-world data.
Generative models are an alternative to Discriminative models: Instead of modelling the probability that y will take a specific value conditional on certain x values, Generative models model the probability that y will take a specific value, y0, AND x will take a specific value, x0. Described using mathematical notation, we model P(y, x) instead of P(y|x).
This might sound like a minor change, but it makes all the difference when dealing with missing data: Using Bayes' rule, we can calculate P(y|x) = P(y, x) / P(x), where P(x) is obtained by summing P(y, x) over every possible value of y, meaning that we can make predictions in the same way that Discriminative models do. But we can also calculate P(y|x*), where x* contains only the features we actually observed, by marginalising the missing features out of P(y, x). We can therefore use the same model to make predictions about the target variable when we have missing data.
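To make this concrete, here is a minimal sketch (not the notebook implementation) of a Generative classifier that can predict from the full feature vector or from a subset of it. The class name, the `observed` argument, and the toy data are all illustrative; the sketch assumes Gaussian class-conditional densities with a covariance matrix shared across classes (the LDA assumption), under which marginalising out a missing feature simply means dropping the corresponding rows and columns of the mean and covariance.

```python
import numpy as np
from scipy.stats import multivariate_normal


class GaussianGenerativeClassifier:
    """Models P(y, x) = P(y) * N(x | mu_y, Sigma), with Sigma shared
    across classes (the LDA assumption)."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])  # P(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Pooled within-class covariance, shared by every class
        centred = np.vstack([X[y == c] - X[y == c].mean(axis=0) for c in self.classes_])
        self.cov_ = centred.T @ centred / (len(X) - len(self.classes_))
        return self

    def predict_proba(self, x, observed=None):
        """Return P(y | x*), where x* is any subset of the features.
        `observed` gives the indices of the features present in x;
        None means every feature is observed."""
        d = self.means_.shape[1]
        observed = np.arange(d) if observed is None else np.asarray(observed)
        # Marginalising a Gaussian keeps only the observed rows/columns
        sub_cov = self.cov_[np.ix_(observed, observed)]
        likelihoods = np.array([
            multivariate_normal.pdf(x, mean=mu[observed], cov=sub_cov)
            for mu in self.means_
        ])
        joint = self.priors_ * likelihoods  # P(y, x*)
        return joint / joint.sum()          # Bayes' rule: P(y | x*)


# Toy usage: two Gaussian classes in two dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1, (50, 2)), rng.normal([2, 2], 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
model = GaussianGenerativeClassifier().fit(X, y)
print(model.predict_proba([1.5, 1.5]))           # both features observed
print(model.predict_proba([1.5], observed=[0]))  # second feature missing
```

With complete data this reproduces the usual prediction from the joint model; with the second feature missing, the model simply conditions on the first one, which a purely Discriminative model has no principled way of doing.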
In the notebooks below, we implement the Linear Discriminant Analysis (LDA) classifier, a Generative model first described by Sir Ronald Fisher in 1936.
Online resources
- Some slides on Bayes Classifiers from a course at Georgia Tech, covering LDA and Naive Bayes
- A decidedly less rigorous exposition: A blog post on LDA and Gaussian Mixture Models I wrote for my research group's blog (*Shameless self-promotion klaxon*)
- A nice explanation on the scikit-learn webpage.
Click the links below to access the Jupyter Notebooks for Linear Discriminant Analysis:
- LDA - Empty [Online notebook | .ipynb file]
- LDA - Redacted [Online notebook | .ipynb file]
- LDA - Complete [Online notebook | .ipynb file | HTML file]