Dr Andy Corbett

<Callout type="success" heading={"📑 Learning Objectives"}>

- Three key types of kernel function:
  - Polynomial kernels
  - Squared-exponential kernels
  - Laplacian kernels
- How varying the hyperparameters alters the prediction
- Kernel interpretation of the motivating example projecting into 3D

</Callout>

Constructing kernels can be close to an art form, and a very important one when it comes to understanding trends in data.

The ground-zero example is the kernel formed by the inner product of the original data features, $k(\mathbf{x}, \mathbf{x}') = \mathbf{x} \cdot \mathbf{x}'$. Other kernels may be built from existing kernels through sums, products, and positive multiples. Three commonly used prototypes are:

- $k(\mathbf{x}, \mathbf{x}') = (\mathbf{x} \cdot \mathbf{x}' + c)^d$; a polynomial kernel with constant $c$ and degree $d$.
- $k(\mathbf{x}, \mathbf{x}') = \exp(-\gamma\|\mathbf{x} - \mathbf{x}'\|^{2})$; the squared-exponential kernel, with 'inverse length-scale' $\gamma$.
- $k(\mathbf{x}, \mathbf{x}') = \exp(-\gamma\|\mathbf{x} - \mathbf{x}'\|_{1})$; the Laplacian kernel, based on the taxi-cab norm $\| \mathbf{v} \|_{1}$.
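
The three prototypes translate almost directly into code. The following is a minimal NumPy sketch rather than a library implementation; the function names and the default values for `c`, `d` and `gamma` are illustrative assumptions, not part of the lesson.

```python
import numpy as np


def polynomial_kernel(x, x_prime, c=1.0, d=2):
    # (x . x' + c)^d, with constant c and degree d.
    return (np.dot(x, x_prime) + c) ** d


def squared_exponential_kernel(x, x_prime, gamma=1.0):
    # exp(-gamma * ||x - x'||^2), using the squared Euclidean norm.
    diff = np.asarray(x) - np.asarray(x_prime)
    return np.exp(-gamma * np.dot(diff, diff))


def laplacian_kernel(x, x_prime, gamma=1.0):
    # exp(-gamma * ||x - x'||_1), using the taxi-cab (L1) norm.
    diff = np.asarray(x) - np.asarray(x_prime)
    return np.exp(-gamma * np.sum(np.abs(diff)))


x = np.array([1.0, 2.0])
x_prime = np.array([0.0, 1.0])

print(polynomial_kernel(x, x_prime))           # (2 + 1)^2 = 9
print(squared_exponential_kernel(x, x_prime))  # exp(-2), since ||x - x'||^2 = 2
print(laplacian_kernel(x, x_prime))            # exp(-2), since ||x - x'||_1 = 2
```

For this particular pair of points the squared-exponential and Laplacian kernels happen to agree, because the squared Euclidean distance and the taxi-cab distance are both 2. In general they differ.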

In this video we shall explore how the various hyperparameters impact the prediction, and how the different choices of kernel can improve model fitting.
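
To get a feel for how a hyperparameter changes the fit, here is a rough sketch of Kernel Ridge Regression with the squared-exponential kernel, written directly in NumPy. The toy sine dataset, the ridge value `1e-3`, the three values of `gamma` and the helper names `rbf_gram` and `fit_predict` are all assumptions chosen for illustration.

```python
import numpy as np


def rbf_gram(A, B, gamma):
    # Gram matrix of the squared-exponential kernel between rows of A and B.
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)


def fit_predict(X_train, y_train, X_test, gamma, ridge=1e-3):
    # Solve (K + ridge * I) alpha = y, then predict with k(X_test, X_train) @ alpha.
    K = rbf_gram(X_train, X_train, gamma)
    alpha = np.linalg.solve(K + ridge * np.eye(len(X_train)), y_train)
    return rbf_gram(X_test, X_train, gamma) @ alpha


# Toy data (assumed): a noisy sine wave sampled at 20 points.
rng = np.random.default_rng(0)
X_train = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y_train = np.sin(2 * np.pi * X_train[:, 0]) + 0.1 * rng.standard_normal(20)

train_errors = {}
for gamma in (0.01, 10.0, 1000.0):
    pred = fit_predict(X_train, y_train, X_train, gamma)
    train_errors[gamma] = float(np.mean((pred - y_train) ** 2))
    print(f"gamma={gamma:>7}: training MSE = {train_errors[gamma]:.6f}")
```

A very small `gamma` gives a long length-scale, so the model is too stiff to follow the sine wave. A very large `gamma` gives a short length-scale, and the model reproduces the training points, noise included, almost exactly. A low training error on its own therefore says little about how well the model predicts new data.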

<Image
	src={
		"/images/courses/machine-learning-models-for-professionals/choosing-between-kernel-functions/img1.png"
	}
	alt={"Choosing Kernels"}
	width={750}
	height={438}
	caption={
		"Figure 1. Diagram of various kernels used to fit data using the Kernel Ridge Regression model."
	}
/>

## Back to our example

Recall our running dataset from the previous example.

<Image
	src={
		"/images/courses/machine-learning-models-for-professionals/choosing-between-kernel-functions/img2.png"
	}
	alt={"Projection Diagram"}
	width={750}
	height={438}
	caption={"Figure 2. A dataset projection to produce linearly separable data."}
/>

The kernel that makes the projection from 2D to 3D in our two-dimensional problem would correspond to the product

$$
\boldsymbol{\phi}(x,y) \cdot \boldsymbol{\phi}(x',y') = xx' + yy' + (\cos(\pi x) - y^2)(\cos(\pi x') - y'^2).
$$

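This is the sum of a linear kernel, $xx' + yy'$, and a second kernel given by the product of a non-linear mapping $\mathbb{R}^2\rightarrow \mathbb{R}$, $(x, y) \mapsto \cos(\pi x) - y^2$, evaluated at each of the two points.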
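
The identity can be checked numerically. In the sketch below, the feature map `phi` is read off from the product above (keep $x$ and $y$, append $\cos(\pi x) - y^2$). The function names and the two test points are arbitrary choices for illustration.

```python
import numpy as np


def phi(x, y):
    # Feature map R^2 -> R^3: keep (x, y) and append cos(pi * x) - y^2.
    return np.array([x, y, np.cos(np.pi * x) - y ** 2])


def kernel(p, q):
    # The same quantity computed without ever forming the 3D vectors.
    (x, y), (xp, yp) = p, q
    linear = x * xp + y * yp
    nonlinear = (np.cos(np.pi * x) - y ** 2) * (np.cos(np.pi * xp) - yp ** 2)
    return linear + nonlinear


p, q = (0.3, -1.2), (1.5, 0.4)
lhs = np.dot(phi(*p), phi(*q))  # inner product in the projected 3D space
rhs = kernel(p, q)              # kernel evaluated on the original 2D points
print(np.isclose(lhs, rhs))
```

Both routes give the same number. The kernel simply evaluates the inner product of the projected points from their original 2D coordinates.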

> Remember, in machine learning, it is not the task of the user to derive these projections by hand. This is the part where the algorithm itself will find the optimal projection.

