by Dr Andy Corbett

Lesson

The Kernel Trick

5. A Code Walkthrough for Kernel Ridge Regression

📂 Resources

Download the resources for this lesson here.

📑 Learning Objectives
  • Applying the Kernel Ridge Regression model with scikit-learn
  • Exploring hyperparameters and kernel choices
  • Visualising output

Linear models are too coarse for subtle data trends


We've chewed over the theory. Now let's see how this technique may be applied to a visual regression problem. Let's begin by generating some data.

import numpy as np
import matplotlib.pyplot as plt

RANDOM_SEED = 31
NOISE = 2
SCALE = 20

# Fix the seed for reproducibility
np.random.seed(RANDOM_SEED)

# Define the ground truth function
x = np.linspace(0, SCALE, 101)
y_true = 1.5 * (x - 7 + 2*np.sin(x) + np.sin(2*x))

# Perturb the ground truth with Gaussian noise
ep = NOISE * np.random.normal(size=(len(x),))
y = y_true + ep

# Reshape into column vectors for scikit-learn
x = x[:, np.newaxis]
y = y[:, np.newaxis]

Here we consider the data-generating (ground truth) function

$$y(x) = \frac{3}{2} \left( x - 7 + 2\sin(x) + \sin(2x) \right)$$

which is perturbed with noise, observed as $y(x) + \varepsilon$, with normally distributed noise $\varepsilon \sim \mathcal{N}(0, 2)$. To see this data, let's fit a linear regression through the data points and plot the results.

from sklearn.linear_model import LinearRegression
linear_model = LinearRegression()
linear_model.fit(x, y)

print('Score: ', linear_model.score(x, y))
print('Coeff: ', linear_model.coef_)
print('Intercept: ', linear_model.intercept_)

x_test = np.linspace(0, SCALE, 1001)
x_test = x_test[:, np.newaxis]
y_pred = linear_model.predict(x_test)
y_gt = 1.5 * (x_test - 7 + 2*np.sin(x_test) + np.sin(2*x_test))

scat = plt.scatter(x, y, marker='.')
line, = plt.plot(x_test, y_pred, 'g', linewidth=2)
true, = plt.plot(x_test, y_gt, '--r')
plt.legend([scat, line, true], ['Data', 'Prediction', 'Ground Truth'])

The linear fit produces a coefficient of determination (the $R^2$ value) of 0.90. Not bad, but given the non-linear form of the ground truth function, we have a strong prior belief that the model has not captured all of the features.
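To see what the straight line is missing, here is a minimal sketch (assuming the x, y and linear_model objects defined above) that plots the residuals of the linear fit.

# Residuals of the linear fit against the training data
residuals = y - linear_model.predict(x)

plt.figure(figsize=(8, 3))
plt.scatter(x, residuals, marker='.')
plt.axhline(0, color='r', linestyle='--')
plt.xlabel('x')
plt.ylabel('Residual')
plt.title('Residuals of the linear fit')

If the residuals oscillate with x rather than scattering randomly about zero, that is the tell-tale sign of unmodelled non-linear structure.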

Kernel regression can identify more features via high-dimensional projections


Now let us try a variety of kernel models to detect these features.

from sklearn.kernel_ridge import KernelRidge
# Polynomial kernel
krr_poly_model = KernelRidge(kernel='polynomial', degree=5)
krr_poly_model.fit(x, y)

# Radial Basis Function kernel
krr_rbf_model = KernelRidge(kernel='rbf')
krr_rbf_model.fit(x, y)

# Laplacian kernel
krr_laplacian_model = KernelRidge(kernel='laplacian', gamma=1/SCALE)
krr_laplacian_model.fit(x, y)

# Linear kernel
krr_linear_model = KernelRidge(kernel='linear')
krr_linear_model.fit(x, y)

# Test points
x_test = np.linspace(0, SCALE, 1001)
x_test = x_test[:, np.newaxis]

# Predictions
y_poly = krr_poly_model.predict(x_test)
y_rbf = krr_rbf_model.predict(x_test)
y_laplacian = krr_laplacian_model.predict(x_test)
y_linear = krr_linear_model.predict(x_test)

# Ground truth
y_gt = 1.5 * (x_test - 7 + 2*np.sin(x_test) + np.sin(2*x_test))

# Plot results
fig, ax = plt.subplots(2, 2, figsize=(10, 10))

# Linear kernel
scat = ax[0, 0].scatter(x, y, marker='.')
line, = ax[0, 0].plot(x_test, y_linear, 'g', linewidth=2)
true, = ax[0, 0].plot(x_test, y_gt, '--r')
ax[0, 0].legend([scat, line, true], ['Data', 'Prediction', 'Ground Truth'])
ax[0, 0].set_title('Linear Kernel')

# Polynomial kernel
scat = ax[0, 1].scatter(x, y, marker='.')
line, = ax[0, 1].plot(x_test, y_poly, 'g', linewidth=2)
true, = ax[0, 1].plot(x_test, y_gt, '--r')
ax[0, 1].legend([scat, line, true], ['Data', 'Prediction', 'Ground Truth'])
ax[0, 1].set_title('Degree 5 Polynomial Kernel')


# Laplacian kernel
scat = ax[1, 0].scatter(x, y, marker='.')
line, = ax[1, 0].plot(x_test, y_laplacian, 'g', linewidth=2)
true, = ax[1, 0].plot(x_test, y_gt, '--r')
ax[1, 0].legend([scat, line, true], ['Data', 'Prediction', 'Ground Truth'])
ax[1, 0].set_title('Laplacian Kernel')

# RBF kernel
scat = ax[1, 1].scatter(x, y, marker='.')
line, = ax[1, 1].plot(x_test, y_rbf, 'g', linewidth=2)
true, = ax[1, 1].plot(x_test, y_gt, '--r')
ax[1, 1].legend([scat, line, true], ['Data', 'Prediction', 'Ground Truth'])
ax[1, 1].set_title('Radial Basis Function Kernel')


A few observations:

  • The linear kernel underperforms compared to our first linear model, as the kernel model has no built-in intercept.
  • The degree-5 polynomial kernel is restricted to a finite-dimensional feature space and is therefore not expressive enough to capture the mixed periodicity in our ground truth data.
  • The Laplacian kernel, whilst allowing an infinite-dimensional feature representation, penalises higher-order features.
  • In practice, you will most likely turn to the Radial Basis Function (RBF), or Gaussian, kernel, which here captures the different scales of periodicity in the data trend; a hyperparameter-tuning sketch follows below.
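The remaining lever is the hyperparameters: the regularisation strength alpha and, for the RBF and Laplacian kernels, the width parameter gamma. As a minimal sketch (the parameter grid below is purely illustrative, not a recommended range), a cross-validated search over the RBF model might look like this:

from sklearn.model_selection import GridSearchCV

# Illustrative grid -- these values are assumptions, not tuned recommendations
param_grid = {
    'alpha': [1e-2, 1e-1, 1.0],
    'gamma': [0.1, 1.0, 10.0],
}

search = GridSearchCV(
    KernelRidge(kernel='rbf'),
    param_grid,
    cv=5,
    scoring='r2',
)
search.fit(x, y.ravel())

print('Best parameters: ', search.best_params_)
print('Best CV R^2: ', search.best_score_)

Since GridSearchCV refits the best estimator on the full data by default, search.best_estimator_ can be dropped straight into the plotting code above in place of krr_rbf_model.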