by Dr Andy Corbett
Support Vector Machines
10. Overlapping Classes and Kernel SVMs
- Unpack more advanced properties of SVMs.
- Know how to handle overlapping class boundaries.
- Apply the kernel trick to obtain a non-parametric SVM.
- Identify regularisation parameters in the model.
Overlapping classes
In real life, we are unlikely to be dealt such an easily separable hand of data as in the previous video; data points will likely not have a clear margin between them. However, if we are willing to accept a degree of misclassification, we can consider relaxing the constraint at the boundary to require only

$$y_i(\mathbf{w}^\top \mathbf{x}_i + b) \geq 1 - \xi_i$$

for small slack parameters $\xi_i \geq 0$. Since we would rather the $\xi_i$ be small, the problem now involves minimising a new objective

$$\frac{1}{2}\lVert\mathbf{w}\rVert^2 + C \sum_{i=1}^{N} \xi_i.$$

The constant hyper-parameter $C$ is left for the user to adjust: small values of $C$ give freedom to the $\xi_i$, so that many points may cross the data boundary. But large values will force the data points to adhere to a hard boundary $y_i(\mathbf{w}^\top \mathbf{x}_i + b) \geq 1$.
Figure 1. Taking our foot off the gas, we allow some vectors both inside and outside of the margin and on the wrong side of the boundary. The SVM in this figure is optimised with a fixed choice of the hyper-parameter $C$.
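To see this trade-off in code, here is a minimal sketch using scikit-learn's `SVC`, which exposes the penalty through its `C` argument. The toy data and the values of `C` below are illustrative assumptions, not the data or settings behind Figure 1.

```python
import numpy as np
from sklearn.svm import SVC

# Two overlapping Gaussian blobs: no hard margin can separate these points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-1.0, scale=1.0, size=(50, 2)),
               rng.normal(loc=+1.0, scale=1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Small C lets many xi_i grow (more margin violations, more support
    # vectors); large C pushes the fit towards a hard margin.
    print(f"C={C:>6}: {clf.n_support_.sum()} support vectors, "
          f"training accuracy {clf.score(X, y):.2f}")
```

Note that in scikit-learn a larger `C` corresponds to the harder boundary described above; the slack variables $\xi_i$ are solved for internally and never set by hand.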
Non-linear decision boundaries: re-enter the kernel trick
In our linear solution we encountered the data dot products $\mathbf{x}_i^\top \mathbf{x}_j$. Recalling the kernel trick from the previous tutorial, we can swap this term artificially for more general kernels $k(\mathbf{x}_i, \mathbf{x}_j)$ and consider non-linear models of the form

$$f(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i y_i\, k(\mathbf{x}_i, \mathbf{x}) + b$$

to replace the linear model $f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$. Why would we want to apply the kernel trick here? In our linear SVM, we only considered splitting the input space into two. But by visual inspection it would seem that the data contain more structure: the clusters are arranged into circles. Let us reclassify the points with a non-linear kernel and compare the outputs.
Figure 2. Comparing different kernels to form non-linear decision boundaries.
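As a sketch of how such a comparison might be set up (the exact kernels and parameters behind Figure 2 are not specified here), scikit-learn's `make_circles` produces concentric clusters like those described above:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: no straight line can separate the two classes.
X, y = make_circles(n_samples=200, factor=0.4, noise=0.1, random_state=0)

for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel, C=1.0).fit(X, y)
    print(f"{kernel:>6} kernel: training accuracy {clf.score(X, y):.2f}, "
          f"{clf.n_support_.sum()} support vectors")
```

The linear kernel should struggle here, while the polynomial and radial-basis-function (RBF) kernels can wrap a boundary around the inner circle.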
Notice that in the non-linear classifier, we identify support vectors on the far boundaries of the data clusters. Although these vectors are very far from the boundary in the input space, they are close when projected into higher dimensions. Thus we gain more insight into the shape of the data than with a linear classifier.
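To inspect these support vectors yourself, the fitted model exposes their coordinates in the original input space. A short sketch, again using the assumed `make_circles` data rather than the figure's own dataset:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.4, noise=0.1, random_state=0)
rbf = SVC(kernel="rbf", C=1.0).fit(X, y)

# Input-space coordinates of the support vectors; plotting these against the
# full dataset shows which points the kernel classifier leans on.
print(f"{rbf.support_vectors_.shape[0]} support vectors out of {X.shape[0]} points")
print(rbf.support_vectors_[:5])
```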