by Prof Tim Dodwell
General Linear Models
Controlling ML Models - Regularisation in Practice
In this walkthrough we are going to look at regularisation, a central technique for preventing overfitting in machine learning models. You will learn how to build a linear model and apply a regularisation method (LASSO), and we will clearly see the benefits of this approach.
Adding Libraries
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
import matplotlib.pyplot as plt
Loading the dataset, scaling and transforming into features
With the libraries loaded, we now collect the Boston housing data set.
In this example we will consider a single input and a single output for our regression challenge.
- the input is LSTAT, a measure of the proportion of low-income families in each area
- the output is the median (local) house price over the data set
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv"
df = pd.read_csv(url, header = None)
data = df.values
X = data[0:50, 12].reshape(-1, 1) # Single input data - LSTAT
y = data[0:50, 13].reshape(-1, 1) # Target Variable - Median House Price
Nice and easy here: read the data in and load it into a dataframe.
We can now get the input and output data into good shape. The call reshape(-1, 1) ensures each array is a column vector, so its shape is (n, 1) rather than just (n,).
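As a small aside, a quick sketch of what reshape(-1, 1) does to the shape of an array:
v = np.array([1.0, 2.0, 3.0])
print(v.shape)                 # (3,) - a flat vector
print(v.reshape(-1, 1).shape)  # (3, 1) - a column vector, the shape sklearn expects for a single feature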
We then move on to a sklearn pipeline. Here we first apply a MinMaxScaler()
to scale the input between 0 and 1. Instead of looking at a single feature, we then expand the input representation to a polynomial feature space of order 10, i.e. [1, x, x^2, ..., x^10].
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
pipe = Pipeline(
[
("minmax", MinMaxScaler()),
("feature", PolynomialFeatures(degree=10)),
]
)
X_poly = pipe.fit_transform(X)
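As a quick sanity check (not part of the original walkthrough), the transformed array should have one column per polynomial term, the constant plus the powers x^1 to x^10:
print(X_poly.shape)   # expected (50, 11): bias column plus 10 polynomial terms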
Test / train split of the data
We then do the usual train / test split of the data set.
X_train, X_test, y_train, y_test = train_test_split(X_poly, y, test_size=0.3, random_state=123)
Looking at the data
fig, ax = plt.subplots(1, 1)
ax.plot(X_train[:,1], y_train, 'ob', alpha=0.4)
ax.plot(X_test[:,1], y_test, 'og', alpha=0.4)
plt.ylabel('Output - Median House Value')
plt.xlabel('Input - LSTAT (scaled)')
plt.show()
Fitting a linear model
Ok so now it is time to fit the linear model.
lr = LinearRegression().fit(X_train, y_train)
Let us now plot the fitted model to see how we did over the range of the data set.
# X_plot spans the scaled [0, 1] input range, so only the polynomial
# transform is needed here (the MinMaxScaler step maps the raw data to this range)
X_plot = np.linspace(0.0, 1.0, 100).reshape(-1,1)
X_plot_poly = PolynomialFeatures(degree=10).fit_transform(X_plot)
y_plot = lr.predict(X_plot_poly)
fig, ax = plt.subplots(1, 1)
ax.plot(X_train[:,1], y_train, 'ob', alpha=0.4, label='Training Data')
ax.plot(X_test[:,1], y_test, 'og', alpha=0.4, label='Testing Data')
ax.plot(X_plot, y_plot, '-',color='lightcoral', label='Polynomial Fit')
plt.ylabel('Output - Median House Value - Thousand Dollars')
plt.xlabel('Input - LSTAT (scaled) ')
plt.ylim([0.0, 50.])
plt.xlim([0.0, 1.])
plt.show()
We see this doesn't look good: particularly towards the right-hand side, between input values of 0.7 and 1.0, the predicted model does not generalise well.
The model is too expressive and is clearly overfitting the data.
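One way to put a number on this (a quick check, not part of the walkthrough above) is to compare training and testing scores for the unregularised fit; a high training score alongside a much lower testing score is the usual signature of overfitting.
print('Train R^2:', lr.score(X_train, y_train))  # typically high - the model follows the training points closely
print('Test R^2:', lr.score(X_test, y_test))     # typically much lower (possibly negative) - poor generalisation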
Applying Regularisation
Let us follow what we discussed in the previous explainer on regularisation, and fit a LASSO model to the data.
We aren't sure what the best regularisation strength (the penalty weight introduced in the notes) should be. We therefore fit lots of models over a range of values between 0.001 and 0.2. Here we use a simple loop, but we could have used sklearn's grid search functions (see the sketch after the loop below).
from sklearn.linear_model import Lasso
alpha = np.linspace(0.001, 0.2, 1000)
training = []
testing = []
for a in alpha:
lass = Lasso(alpha=a).fit(X_train, y_train)
training.append(lass.score(X_train, y_train))
testing.append(lass.score(X_test, y_test))
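As mentioned above, the same sweep could be done with sklearn's grid search tools. A rough sketch, assuming GridSearchCV with 5-fold cross-validation over a similar grid of alpha values:
from sklearn.model_selection import GridSearchCV

# Cross-validated search over the regularisation strength, as an
# alternative to scoring each alpha once on the held-out test set
param_grid = {"alpha": np.linspace(0.001, 0.2, 50)}
search = GridSearchCV(Lasso(max_iter=10000), param_grid, cv=5)
search.fit(X_train, y_train.ravel())
print(search.best_params_)   # the alpha with the best mean cross-validated score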
We can now plot the training and testing scores against the regularisation strength.
plt.plot(alpha, training, '-b', label='Training')
plt.plot(alpha, testing, '-g', label='Testing')
plt.ylabel('R^2 Score')
plt.xlabel('Regularisation Strength')
plt.legend()
plt.show()
The function score() returns the coefficient of determination, R^2, of the prediction.
The coefficient of determination is given by
R^2 = 1 - SS_res / SS_tot
where SS_res is the residual sum of squares, ((y_true - y_pred) ** 2).sum(), and SS_tot is the total sum of squares, ((y_true - y_true.mean()) ** 2).sum().
The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a score of 0.0.
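To make the definition concrete, here is a quick sketch (using the earlier unregularised fit) that computes the coefficient of determination by hand and checks it against score():
y_pred = lr.predict(X_test)
ss_res = ((y_test.ravel() - y_pred.ravel()) ** 2).sum()  # residual sum of squares
ss_tot = ((y_test.ravel() - y_test.mean()) ** 2).sum()   # total sum of squares
print(1.0 - ss_res / ss_tot)       # R^2 computed from its definition
print(lr.score(X_test, y_test))    # should agree with the line above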
Ok, let us pick the regularisation strength where we do best over our testing data, so a value of about 0.025.
lass = Lasso(alpha=0.025).fit(X_train, y_train)
y_plot = lass.predict(X_plot_poly)
fig, ax = plt.subplots(1, 1)
ax.plot(X_train[:,1], y_train, 'ob', alpha=0.4, label='Training Data')
ax.plot(X_test[:,1], y_test, 'og', alpha=0.4, label='Testing Data')
ax.plot(X_plot, y_plot, '-',color='lightcoral', label='Polynomial Fit')
plt.ylabel('Output - Median House Value - Thousand Dollars')
plt.xlabel('Input - LSTAT (scaled)')
plt.ylim([0.0, 50])
plt.show()
This looks a lot better. There are no oscillations that try to model the noise in the data, and the model generalises well, particularly between input values of 0.7 and 1.0 where the other model performed very badly.
We can now look at what the final coefficients of this model are
print(lass.intercept_)
print(lass.coef_)
array([ 31.27 ])
array([ 0. , -33.77058217, 0. , 17.24305027,
0. , 0. , -0. , -0. ,
-0. , -0. , -0. ])
So this gives us a nice simple model as the prediction: y ≈ 31.27 - 33.77 x + 17.24 x^3, where x is the scaled LSTAT input.
We go through this result in more detail in the explainer.
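As a small check (a sketch, not part of the original notes), the prediction can be rebuilt by hand from the few surviving terms:
# Only the x and x^3 coefficients are non-zero, so the fitted model reduces
# to roughly 31.27 - 33.77 x + 17.24 x^3, with x the scaled LSTAT input
x = X_plot.ravel()
y_sparse = np.ravel(lass.intercept_)[0] + lass.coef_[1] * x + lass.coef_[3] * x ** 3
# y_sparse should match lass.predict(X_plot_poly) up to rounding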
Now you have fitted a LASSO model, why not try fitting a Ridge regression model and compare the outputs of the functions you fit.
Hint: you need to import the following
from sklearn.linear_model import Ridge
which can be used in the same way as Lasso. A starting point is sketched below.
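A minimal sketch to get going (the alpha value here is just a placeholder, not a tuned choice):
ridge = Ridge(alpha=0.025).fit(X_train, y_train)
print(ridge.score(X_test, y_test))
print(ridge.coef_)   # Ridge shrinks coefficients but rarely sets them exactly to zero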