by Dr Andy Corbett
The Kernel Trick
6. Applying the Kernel Trick in the Wild
Download the resources for this lesson here.
- Applying the Kernel Ridge Regression model with scikit-learn
- Exploring hyperparameters and kernel choices
- Visualising output
In this video, we'll get stuck straight into our problem: predicting the compressive strength of concrete from a few easy-to-measure variables. This is the kind of end-to-end implementation of kernel ridge regression that you'll need when deploying in the wild.
Inspect the data
Loading the data and inspecting the columns, we print out the following table:
Target: Concrete compressive strength(MPa, megapascals)
Features: Index(['Cement (component 1)(kg in a m^3 mixture)',
'Blast Furnace Slag (component 2)(kg in a m^3 mixture)',
'Fly Ash (component 3)(kg in a m^3 mixture)',
'Water (component 4)(kg in a m^3 mixture)',
'Superplasticizer (component 5)(kg in a m^3 mixture)',
'Coarse Aggregate (component 6)(kg in a m^3 mixture)',
'Fine Aggregate (component 7)(kg in a m^3 mixture)', 'Age (day)'],
dtype='object')
X shape: (1030, 8)
y shape: (1030, 1)
This tells us that there are 8 input features from which we want to predict a single target: the compressive strength of concrete.
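For reference, the loading step might look something like the sketch below; the file name Concrete_Data.xls, the Excel format, and the target sitting in the final column are assumptions, so adjust them to match the file in the lesson resources.
import pandas as pd
# Load the raw spreadsheet of concrete mixtures (file name assumed)
data = pd.read_excel('Concrete_Data.xls')
# The compressive strength sits in the final column; the rest are the inputs
X = data.iloc[:, :-1]
y = data.iloc[:, -1:]
print('Target: ', y.columns[0])
print('Features: ', X.columns)
print('X shape: ', X.shape)
print('y shape: ', y.shape)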
Preprocessing
We'll need to scale and centre the data set so it is clean for machine learning. We'll also split it into training and test sets.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=31)
print('X_train shape: ', X_train.shape)
print('X_test shape: ', X_test.shape)
print('y_train shape: ', y_train.shape)
print('y_test shape: ', y_test.shape)
scaler = StandardScaler()
scaler.fit(X_train)
# Transform both X matrices with these values
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
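As a quick sanity check, something like the following confirms that the transformed training columns are centred with unit variance:
import numpy as np
# Each scaled training column should now have mean ~0 and standard deviation ~1
print('Column means: ', np.round(X_train.mean(axis=0), 3))
print('Column stds: ', np.round(X_train.std(axis=0), 3))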
Try a baseline linear model
So that we can benchmark any improvement, we'll first try a linear model.
from sklearn.linear_model import LinearRegression
# Fit a linear model
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)
The results aren't awful, but an R² score of 0.58 indicates there is much more to be done.
Figure 1. A plot of the predictions vs. the ground truth for a baseline linear fit.
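The score quoted above and the comparison in Figure 1 can be reproduced with a sketch along these lines (the plotting details are illustrative, not the exact code behind Figure 1):
import numpy as np
import matplotlib.pyplot as plt
# R^2 of the linear baseline on the held-out test set
print('Linear R^2: ', linear_model.score(X_test, y_test))
# Scatter predictions against the ground truth, with the ideal y = x line for reference
y_true = np.asarray(y_test).ravel()
y_pred = linear_model.predict(X_test).ravel()
lims = [y_true.min(), y_true.max()]
plt.scatter(y_true, y_pred, alpha=0.5)
plt.plot(lims, lims, 'k--')
plt.xlabel('Ground truth (MPa)')
plt.ylabel('Prediction (MPa)')
plt.show()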
The Kernel Ridge Regressor
Our assumption is that a non-parametric model can learn the non-linear behaviour found in the residuals of the linear model. This is where subtle feature extraction can improve the fit.
from sklearn.kernel_ridge import KernelRidge
# Fit a KRR model
krr = KernelRidge(kernel='polynomial', degree=4, alpha=1.0)
krr.fit(X_train, y_train)
y_pred_linear = linear_model.predict(X_test)
y_pred_poly = krr.predict(X_test)
The results give us a much-improved model.
Figure 2. Kernel Ridge Regression solution.
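To quantify the improvement, here is a short sketch comparing the two models on the held-out test set, using the predictions computed above and reporting both R² and the mean absolute error:
from sklearn.metrics import mean_absolute_error
# Compare both models on the held-out test set
print('Linear R^2: ', linear_model.score(X_test, y_test))
print('KRR R^2: ', krr.score(X_test, y_test))
print('Linear MAE (MPa): ', mean_absolute_error(y_test, y_pred_linear))
print('KRR MAE (MPa): ', mean_absolute_error(y_test, y_pred_poly))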
Tune the model over parameter space
In the implementation, we specified a number of hyperparameters: kernel, degree and alpha. To ensure that we make the best possible choice, we can automate a search through the options and assess the outcome with K-fold cross-validation.
First of all, let's specify the parameters that we shall search through.
param_grid = [
    {'degree': [2, 3, 4, 5, 6],
     'alpha': [1e-1, 1.0, 10],
     'kernel': ['polynomial'],
     },
    {'gamma': [1e-2, 1e-1, 1, 10],
     'alpha': [1e-1, 1.0, 10],
     'kernel': ['rbf'],
     },
]
Then we can perform the grid search.
from sklearn.model_selection import GridSearchCV, KFold
krr = KernelRidge()
# Stratified splits are for classification targets; plain K-fold suits this regression problem
cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    estimator=krr, param_grid=param_grid, cv=cv,
)
search.fit(X_train, y_train)
This object now contains a search.best_estimator_, from which we can compute the test score and recall the chosen kernel.
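For instance, a minimal sketch (the printed values will depend on the search results):
# Best hyperparameter combination found during the search
print('Best parameters: ', search.best_params_)
# The best estimator is refitted on the full training set by default (refit=True)
print('Test R^2: ', search.best_estimator_.score(X_test, y_test))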
In fact, if we want to call upon more outputs of the search, we can return the values as a pandas.DataFrame:
import pandas as pd
df = pd.DataFrame(search.cv_results_)
df[(df['param_kernel'] == 'polynomial') & (df['param_alpha'] == 0.1)]
This gives a complete overview of the K-fold cross-validation statistics to report on. Don't forget the df.to_markdown() method!
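For example, the filtered table below can be printed in markdown form with a couple of lines (a sketch; to_markdown relies on the optional tabulate package):
# Print the filtered grid-search results as a markdown table
poly_results = df[(df['param_kernel'] == 'polynomial') & (df['param_alpha'] == 0.1)]
print(poly_results.to_markdown())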
| | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_alpha | param_degree | param_kernel | param_gamma | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.313852 | 0.154708 | 0.11445 | 0.0619005 | 0.1 | 2 | polynomial | nan | {'alpha': 0.1, 'degree': 2, 'kernel': 'polynomial'} | 0.768445 | 0.799914 | 0.799908 | 0.742548 | 0.809902 | 0.784143 | 0.0250491 | 12 |
| 1 | 0.323135 | 0.0222913 | 0.0726411 | 0.0193653 | 0.1 | 3 | polynomial | nan | {'alpha': 0.1, 'degree': 3, 'kernel': 'polynomial'} | 0.864129 | 0.895091 | 0.874832 | 0.841756 | 0.878514 | 0.870864 | 0.0176288 | 3 |
| 2 | 0.342396 | 0.0471995 | 0.0740518 | 0.0387501 | 0.1 | 4 | polynomial | nan | {'alpha': 0.1, 'degree': 4, 'kernel': 'polynomial'} | 0.874275 | 0.899858 | 0.913486 | 0.777022 | 0.875907 | 0.868109 | 0.0478807 | 4 |
| 3 | 0.32883 | 0.0436676 | 0.056971 | 0.0127754 | 0.1 | 5 | polynomial | nan | {'alpha': 0.1, 'degree': 5, 'kernel': 'polynomial'} | 0.840566 | 0.784525 | 0.686616 | 0.673168 | 0.698032 | 0.736581 | 0.0649849 | 17 |
| 4 | 0.305537 | 0.00571628 | 0.076536 | 0.0288526 | 0.1 | 6 | polynomial | nan | {'alpha': 0.1, 'degree': 6, 'kernel': 'polynomial'} | 0.792378 | 0.663244 | -0.0384043 | 0.462756 | 0.664408 | 0.508876 | 0.29327 | 19 |