Linear Regression
Regression, in its simplest form, is a matter of finding the line-of-best-fit for data that plot one or more independent variables against a continuous dependent variable. The question is: what makes one candidate line better than another?
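To make the question concrete, we can score any candidate line by how far its predictions fall from the observed points. The sketch below (the data and the two candidate lines are made up purely for illustration) compares two lines by their sum of squared errors:

```python
import numpy as np

# Toy data (illustrative values only): one independent variable x, one dependent y.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])

def sum_squared_error(slope, intercept):
    """Sum of squared vertical distances between the line and each point."""
    return float(np.sum((y - (slope * x + intercept)) ** 2))

# Two candidate lines: a rough guess, and one closer to the trend in the data.
print(sum_squared_error(1.0, 1.0))  # poorer fit -> larger score
print(sum_squared_error(2.0, 0.0))  # better fit -> smaller score
```

Under this scoring rule, the "best" line is simply the one with the smallest score, which is exactly the criterion linear regression adopts.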
Linear regression takes the line-of-best-fit to be the line for which the sum of the squared vertical distances (the residuals) between the line and each point in the training set is smallest. It has a number of properties that many other models lack:
- There is a unique, closed-form solution: to find the line-of-best-fit, we don't need to test lots of potential candidates. A simple formula gives us the line-of-best-fit every time.
- Linear regression is very interpretable. It allows us to understand the relationships between variables easily, so that we can answer questions such as: 'If we change this independent variable by 10%, how would we expect the dependent variable to change as a result?'
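Both properties can be seen in a few lines of NumPy. This is a minimal sketch with made-up data: the closed-form (normal equation) solution is computed directly, with no search over candidate lines, and the fitted coefficients can be read off and interpreted:

```python
import numpy as np

# Toy data: y is roughly 2*x + 1 plus a little noise (illustrative values only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix with an intercept column, so the model is y ≈ w0 + w1*x.
X = np.column_stack([np.ones_like(x), x])

# Closed-form solution via the normal equations: solve (X^T X) w = X^T y.
# No iterative search is needed -- one formula, one answer.
w = np.linalg.solve(X.T @ X, X.T @ y)

# Sum of squared residuals: the quantity the fitted line minimises.
residuals = y - X @ w
sse = float(residuals @ residuals)

# Interpretability: w[1] is the expected change in y per unit increase in x.
print(w)   # [intercept, slope]
print(sse)
```

The same solve generalises to multiple independent variables by adding more columns to `X`, which is the setting covered in the multiple-variables notebooks below.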
Online resources
- This short series of articles on linear regression by Ridley Leisy;
- Section 3.1 of Bishop's Pattern Recognition and Machine Learning;
- A guide to interpreting the output of a linear regression.
Click the links below to access the Jupyter Notebooks for linear regression:
- Single Variable - Empty [Online notebook | .ipynb file]
- Single Variable - Redacted [Online notebook | .ipynb file]
- Single Variable - Complete [Online notebook | .ipynb file | HTML file]
- Multiple Variables - Empty [Online notebook | .ipynb file]
- Multiple Variables - Redacted [Online notebook | .ipynb file]
- Multiple Variables - Complete [Online notebook | .ipynb file | HTML file]