by Dr Andy Corbett
Support Vector Machines
10. Overlapping Classes and Kernel SVMs
- Unpack more advanced properties of SVMs.
- Know how to handle overlapping class boundaries.
- Apply the kernel trick to obtain a non-parametric SVM.
- Identify regularisation parameters in the model.
Overlapping classes
In real life, we are unlikely to be dealt such an easily separable hand of data as in the previous video; data points will likely not have a clear margin between them. However, if we are willing to accept a degree of misclassification, we can consider relaxing the constraint at the boundary to require only

$$y_i(\mathbf{w}^\top \mathbf{x}_i + b) \geq 1 - \xi_i$$

for small slack parameters $\xi_i \geq 0$. Since we would rather the $\xi_i$ be small, the problem now involves minimising a new objective

$$\frac{1}{2}\lVert\mathbf{w}\rVert^2 + C \sum_{i=1}^{N} \xi_i.$$

The constant hyper-parameter $C$ is left for the user to adjust: small values of $C$ give freedom to the $\xi_i$, so that many points may cross the data boundary. But large values will force the data points to adhere to a hard boundary $y_i(\mathbf{w}^\top \mathbf{x}_i + b) \geq 1$.
Figure 1. Taking our foot off the gas, we allow some vectors both inside and outside of the margin and on the wrong side of the boundary. The SVM in this figure is optimised with a fixed choice of the hyper-parameter $C$.
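To see this trade-off in code, here is a minimal sketch using scikit-learn's `SVC`, which exposes the penalty through its `C` argument. The toy data and the values of `C` below are illustrative assumptions, not the data or settings behind Figure 1.

```python
import numpy as np
from sklearn.svm import SVC

# Two overlapping Gaussian blobs: no hard margin can separate these points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-1.0, scale=1.0, size=(50, 2)),
               rng.normal(loc=+1.0, scale=1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Small C lets many xi_i grow (more margin violations, more support
    # vectors); large C pushes the fit towards a hard margin.
    print(f"C={C:>6}: {clf.n_support_.sum()} support vectors, "
          f"training accuracy {clf.score(X, y):.2f}")
```

Note that in scikit-learn a larger `C` corresponds to the harder boundary described above; the slack variables $\xi_i$ are solved for internally and never set by hand.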
Non-linear decision boundaries: re-enter the kernel trick
In our linear solution we encountered the data dot products $\mathbf{x}_i^\top \mathbf{x}_j$. Recalling the kernel trick from the previous tutorial, we can swap this term artificially for more general kernels $k(\mathbf{x}_i, \mathbf{x}_j)$ and consider non-linear models of the form

$$f(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i y_i\, k(\mathbf{x}_i, \mathbf{x}) + b$$

to replace the linear model $f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$. Why would we want to apply the kernel trick here? In our linear SVM, we only considered splitting the input space into two. But by visual inspection it would seem that the data contain more structure: the clusters are arranged into circles. Let us reclassify the points with a non-linear kernel and compare the outputs.
Figure 2. Comparing different kernels to form non-linear decision boundaries.
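As a sketch of how such a comparison might be set up (the exact kernels and parameters behind Figure 2 are not specified here), scikit-learn's `make_circles` produces concentric clusters like those described above:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: no straight line can separate the two classes.
X, y = make_circles(n_samples=200, factor=0.4, noise=0.1, random_state=0)

for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel, C=1.0).fit(X, y)
    print(f"{kernel:>6} kernel: training accuracy {clf.score(X, y):.2f}, "
          f"{clf.n_support_.sum()} support vectors")
```

The linear kernel should struggle here, while the polynomial and radial-basis-function (RBF) kernels can wrap a boundary around the inner circle.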
Notice that in the non-linear classifier, we identify support vectors on the far boundaries of the data clusters. Although these vectors are very far from the boundary in the input space, they are close when projected into higher dimensions. Thus we gain more insight into the shape of the data than with a linear classifier.
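To inspect these support vectors yourself, the fitted model exposes their coordinates in the original input space. A short sketch, again using the assumed `make_circles` data rather than the figure's own dataset:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.4, noise=0.1, random_state=0)
rbf = SVC(kernel="rbf", C=1.0).fit(X, y)

# Input-space coordinates of the support vectors; plotting these against the
# full dataset shows which points the kernel classifier leans on.
print(f"{rbf.support_vectors_.shape[0]} support vectors out of {X.shape[0]} points")
print(rbf.support_vectors_[:5])
```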