Dr Andy Corbett

Lesson

The Kernel Trick

4. Choosing Between Kernel Functions

📑 Learning Objectives
  • Three key types of kernel function:
    • Polynomial kernels
    • Squared-exponential kernels
    • Laplacian kernels
  • How varying the hyperparameters alters the prediction
  • Kernel interpretation of the motivating example projecting into 3D

Constructing kernels can be close to an art form, and a very important one when it comes to understanding trends in data.

The ground zero example is the kernel formed by the inner product of the original data features, $k(\mathbf{x}, \mathbf{x}') = \mathbf{x} \cdot \mathbf{x}'$. Other kernels may be found through sums, products, and positive multiples of existing kernels. Three commonly used prototypes are:

  • $k(\mathbf{x}, \mathbf{x}') = (\mathbf{x} \cdot \mathbf{x}' + c)^d$; a polynomial kernel with constant $c$ and degree $d$.
  • $k(\mathbf{x}, \mathbf{x}') = \exp(-\gamma\|\mathbf{x} - \mathbf{x}'\|^{2})$; the squared-exponential kernel, with 'inverse length-scale' $\gamma$.
  • $k(\mathbf{x}, \mathbf{x}') = \exp(-\gamma\|\mathbf{x} - \mathbf{x}'\|_{1})$; the Laplacian kernel, based on the taxi-cab norm $\| \mathbf{v} \|_{1}$.
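The three prototypes above can be written down directly in NumPy. The following is a minimal sketch, not a reference implementation: the function names, the default hyperparameter values and the random toy data are all assumptions made for illustration. Each function returns the Gram matrix of kernel values between every pair of rows in its two inputs.

```python
import numpy as np


def polynomial_kernel(X, Y, c=1.0, d=3):
    """(x . x' + c)^d for every pair of rows in X and Y."""
    return (X @ Y.T + c) ** d


def squared_exponential_kernel(X, Y, gamma=1.0):
    """exp(-gamma * ||x - x'||^2) for every pair of rows in X and Y."""
    diff = X[:, None, :] - Y[None, :, :]
    return np.exp(-gamma * np.sum(diff ** 2, axis=-1))


def laplacian_kernel(X, Y, gamma=1.0):
    """exp(-gamma * ||x - x'||_1) for every pair of rows in X and Y."""
    diff = X[:, None, :] - Y[None, :, :]
    return np.exp(-gamma * np.sum(np.abs(diff), axis=-1))


# Toy data (an assumption for illustration): five points in 2D.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))

K_poly = polynomial_kernel(X, X, c=1.0, d=2)
K_se = squared_exponential_kernel(X, X, gamma=0.5)
K_lap = laplacian_kernel(X, X, gamma=0.5)

# A valid kernel gives a symmetric Gram matrix with no negative eigenvalues
# (up to floating-point round-off).
for name, K in [("polynomial", K_poly), ("sq-exp", K_se), ("laplacian", K_lap)]:
    print(name, "min eigenvalue:", np.linalg.eigvalsh(K).min())
```

The squared-exponential and Laplacian kernels differ only in the norm used inside the exponential, which is why the two functions share almost all of their code.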

In this video we shall explore how the various hyperparameters impact the prediction, and how the different choices of kernel can improve model fitting.

Choosing Kernels

Figure 1. Diagram of various kernels used to fit data using the Kernel Ridge Regression model.
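To get a feel for how a hyperparameter changes the prediction, here is a small Kernel Ridge Regression sketch in plain NumPy using the squared-exponential kernel. It is not the code behind the figure: the noisy sine data, the regularisation strength `lam`, the helper names and the three values of $\gamma$ are all assumptions chosen for illustration.

```python
import numpy as np


def se_kernel(a, b, gamma):
    """Squared-exponential kernel between two 1D arrays of inputs."""
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)


def krr_fit_predict(x_train, y_train, x_test, gamma, lam=1e-3):
    """Kernel ridge regression: solve (K + lam * I) alpha = y, then predict."""
    K = se_kernel(x_train, x_train, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)
    return se_kernel(x_test, x_train, gamma) @ alpha


# Hypothetical toy data: noisy samples of sin(x).
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(-3, 3, size=30))
y_train = np.sin(x_train) + 0.1 * rng.normal(size=30)
x_test = np.linspace(-3, 3, 200)

# Fit the same model with three different inverse length-scales.
predictions = {
    g: krr_fit_predict(x_train, y_train, x_test, gamma=g)
    for g in (0.01, 1.0, 100.0)
}

for g, y_hat in predictions.items():
    rmse = np.sqrt(np.mean((y_hat - np.sin(x_test)) ** 2))
    print(f"gamma={g:>6}: RMSE against sin(x) = {rmse:.3f}")
```

A small $\gamma$ corresponds to a long length-scale and a very smooth fit, while a large $\gamma$ makes each training point influence only its immediate neighbourhood, so the prediction tends to collapse towards zero between samples.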

Back to our example


Recall our running dataset from the previous example.

Projection Diagram

Figure 2. A dataset projection to produce linearly separable data.

The kernel that makes the projection from 2D to 3D in our two-dimensional problem would correspond to the product

$$\boldsymbol{\phi}(x,y) \cdot \boldsymbol{\phi}(x',y') = xx' + yy' + (\cos(\pi x) - y^2)(\cos(\pi x') - y'^2).$$

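This is the sum of a linear kernel, $xx' + yy'$, and a second kernel built from the non-linear mapping $\mathbb{R}^2 \rightarrow \mathbb{R}$, $(x, y) \mapsto \cos(\pi x) - y^2$, evaluated at each of the two points and multiplied together.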
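The identity above can be checked numerically. The sketch below assumes the feature map is $\boldsymbol{\phi}(x, y) = (x, y, \cos(\pi x) - y^2)$, which is read off from the right-hand side of the equation; the function names and the two test points are arbitrary choices for illustration.

```python
import numpy as np


def phi(x, y):
    """Assumed 2D -> 3D feature map: (x, y, cos(pi * x) - y^2)."""
    return np.array([x, y, np.cos(np.pi * x) - y ** 2])


def projection_kernel(p, q):
    """Kernel written out directly, without forming the 3D features."""
    (x, y), (xp, yp) = p, q
    linear_part = x * xp + y * yp
    nonlinear_part = (np.cos(np.pi * x) - y ** 2) * (np.cos(np.pi * xp) - yp ** 2)
    return linear_part + nonlinear_part


# Two arbitrary points in the plane.
p, q = (0.3, -1.2), (-0.7, 0.5)

lhs = phi(*p) @ phi(*q)       # inner product in the projected 3D space
rhs = projection_kernel(p, q)  # kernel evaluated in the original 2D space

print("phi(p) . phi(q) =", lhs)
print("k(p, q)         =", rhs)
print("agree:", np.isclose(lhs, rhs))
```

The point of the comparison is that the kernel gives the 3D inner product while only ever touching the original 2D coordinates.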

Remember, in machine learning, it is not the task of the user to derive these projections by hand. This is the part where the algorithm itself will find the optimal projection.