Linear Regression
Regression, in its simplest form, is a matter of finding the line-of-best-fit for data that plot one or more independent variables against a continuous dependent variable. The question is: what makes one candidate line better than another?
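To make the question concrete, we can score any candidate line by how far its predictions fall from the observed points. The sketch below (the data and the two candidate lines are made up purely for illustration) compares two lines by their sum of squared errors:

```python
import numpy as np

# Toy data (illustrative values only): one independent variable x, one dependent y.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])

def sum_squared_error(slope, intercept):
    """Sum of squared vertical distances between the line and each point."""
    return float(np.sum((y - (slope * x + intercept)) ** 2))

# Two candidate lines: a rough guess, and one closer to the trend in the data.
print(sum_squared_error(1.0, 1.0))  # poorer fit -> larger score
print(sum_squared_error(2.0, 0.0))  # better fit -> smaller score
```

Under this scoring rule, the "best" line is simply the one with the smallest score, which is exactly the criterion linear regression adopts.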
Linear regression takes the line-of-best-fit to be the line for which the sum of the squared vertical distances (the residuals) between the line and each point in the training set is smallest. It has a number of properties that many other models lack:
- There is a unique, closed-form solution: to find the line-of-best-fit, we don't need to test lots of potential candidates. A simple formula gives us the line-of-best-fit every time.
- Linear regression is very interpretable. It allows us to understand the relationships between variables easily, so that we can answer questions such as: 'If we change this independent variable by 10%, how would we expect the dependent variable to change as a result?'
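Both properties can be seen in a few lines of NumPy. This is a minimal sketch with made-up data: the closed-form (normal equation) solution is computed directly, with no search over candidate lines, and the fitted coefficients can be read off and interpreted:

```python
import numpy as np

# Toy data: y is roughly 2*x + 1 plus a little noise (illustrative values only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix with an intercept column, so the model is y ≈ w0 + w1*x.
X = np.column_stack([np.ones_like(x), x])

# Closed-form solution via the normal equations: solve (X^T X) w = X^T y.
# No iterative search is needed -- one formula, one answer.
w = np.linalg.solve(X.T @ X, X.T @ y)

# Sum of squared residuals: the quantity the fitted line minimises.
residuals = y - X @ w
sse = float(residuals @ residuals)

# Interpretability: w[1] is the expected change in y per unit increase in x.
print(w)   # [intercept, slope]
print(sse)
```

The same solve generalises to multiple independent variables by adding more columns to `X`, which is the setting covered in the multiple-variables notebooks below.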
Online resources
- This short series of articles on linear regression by Ridley Leisy;
- Section 3.1 of Bishop's Pattern Recognition and Machine Learning;
- A guide to interpreting the output of a linear regression.
Click the links below to access the Jupyter Notebooks for linear regression:
- Single Variable - Empty [Online notebook | .ipynb file]
- Single Variable - Redacted [Online notebook | .ipynb file]
- Single Variable - Complete [Online notebook | .ipynb file | HTML file]
- Multiple Variables - Empty [Online notebook | .ipynb file]
- Multiple Variables - Redacted [Online notebook | .ipynb file]
- Multiple Variables - Complete [Online notebook | .ipynb file | HTML file]