by Dr Andy Corbett

Support Vector Machines

10. Overlapping Classes and Kernel SVMs

📑 Learning Objectives
  • Unpack more advanced properties of SVMs.
  • Know how to handle overlapping class boundaries.
  • Apply the kernel trick to obtain a non-parametric SVM.
  • Identify regularisation parameters in the model.

Overlapping classes


In real life, we are unlikely to be dealt such an easily separable hand of data as in the previous video; data points will likely not have a clear margin between them. However, if we are willing to accept a degree of misclassification, we can consider relaxing the constraint at the boundary $y = \pm 1$ to require only

$$y_n f(\mathbf{x}_n) \geq 1 - \varepsilon_{n}$$

for slack parameters $\varepsilon_n \geq 0$; values $\varepsilon_n > 1$ correspond to points landing on the wrong side of the boundary. Since we would rather the $\varepsilon_n$ be small, the problem now involves minimising a new objective

$$C\sum_{n=1}^{N}\varepsilon_{n} + \frac{1}{2}\|\mathbf{w}\|^2.$$

The constant hyper-parameter $C > 0$ is left for the user to adjust: small values of $C$ give freedom to the $\varepsilon_n$, so that many points may cross the boundary. But large values, $C \rightarrow \infty$, will force the data points to adhere to a hard boundary, $\varepsilon_n \rightarrow 0$.
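To make this concrete, here is a minimal sketch, assuming scikit-learn's `SVC` and a small synthetic dataset (not the lesson's own data), of how the choice of $C$ changes the fitted soft-margin classifier:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two overlapping Gaussian clusters labelled -1 and +1.
X = np.vstack([
    rng.normal(loc=-1.0, scale=1.0, size=(50, 2)),
    rng.normal(loc=+1.0, scale=1.0, size=(50, 2)),
])
y = np.array([-1] * 50 + [+1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # A softer margin (small C) typically keeps more support vectors,
    # because more points are allowed inside or across the margin.
    print(f"C={C:>6}: {len(clf.support_)} support vectors, "
          f"training accuracy {clf.score(X, y):.2f}")
```

With a small $C$ the optimiser is content to leave many points inside the margin; pushing $C$ up shrinks the set of support vectors towards the hard-margin solution.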

Overlapping Classes
Figure 1. Taking our foot off the gas, we allow some vectors both inside and outside of the margin and on the wrong side of the boundary. The SVM in this figure is optimised with hyper-parameter $C = 10$.

Non-linear decision boundaries: re-enter the kernel trick


In our linear solution we encountered the data dot products $\mathbf{x} \cdot \mathbf{x}_{n}$. Recalling the kernel trick from the previous tutorial, we can swap this term artificially for more general kernels and consider non-linear models of the form

$$f(\mathbf{x}) = \sum_{n=1}^{N} a_n \, y_n \, k(\mathbf{x}, \mathbf{x}_{n}) + b$$

which replaces each dot product $\mathbf{x} \cdot \mathbf{x}_{n}$ with the kernel $k(\mathbf{x}, \mathbf{x}_{n})$. Why would we want to apply the kernel trick here? In our linear SVM, we only considered splitting the input space into two. But by visual inspection it would seem that the data contain more structure: the clusters are arranged into circles. Let us reclassify the points with a non-linear kernel and compare the outputs.

Non-Linear SVM Classifier

Figure 2. Comparing different kernels to form non-linear decision boundaries.
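As a hedged illustration of the kind of comparison in Figure 2 (assuming scikit-learn, and using `make_circles` as a stand-in for ring-shaped data like ours), we can fit the same classifier with different kernels and see how each copes:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Ring-shaped data: one class encircles the other, so no straight line
# can separate them in the input plane.
X, y = make_circles(n_samples=200, factor=0.4, noise=0.1, random_state=0)

for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel, C=10.0).fit(X, y)
    print(f"{kernel:>6} kernel: training accuracy {clf.score(X, y):.2f}, "
          f"{len(clf.support_)} support vectors")
```

The linear kernel cannot do better than chance on such rings, while the non-linear kernels bend the decision boundary around the inner cluster.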

👀 Notice that in the non-linear classifier, we identify support vectors on the far boundaries of the data clusters. Although these vectors are far from the decision boundary in the input space, they lie close to it when projected into higher dimensions. Thus we gain more insight into the shape of the data compared with a linear classifier.
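One way to see this, sketched below under the same scikit-learn and `make_circles` assumptions as above, is to read the support vectors off a fitted RBF model and check that they sit on both the inner and outer rings, far from any straight line through the input plane:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.4, noise=0.1, random_state=0)
clf = SVC(kernel="rbf", C=10.0).fit(X, y)

# Input-space radius of each support vector: both small radii (inner ring)
# and large radii (outer ring) appear, even though those outer points would
# be nowhere near a linear decision boundary.
radii = np.linalg.norm(clf.support_vectors_, axis=1)
print(f"Support-vector radii span {radii.min():.2f} to {radii.max():.2f}")
```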