by Dr Andy Corbett
Support Vector Machines
12. Using Kernel SVMs for Non-Linear Predictions
In this video we shall use the SVM classifier as a non-parametric model by applying the kernel trick. Implementing this introduces further hyperparameter choices, which we walk through here.
- Identify the kernel approach to SVM models in `scikit-learn`.
- Implement a support vector machine with different choices of kernel functions.
- Visualise a non-linear decision surface.
- Compare support vectors in the linear and non-linear case.
Kernel parameter selection
For kernel selection, the `scikit-learn` package offers a few options:
- Linear kernel: this is the 'no action' option. The model expressions containing $\langle x, x' \rangle$ remain the same.
- Polynomial kernel: permits a quantifiable amount of non-linearity, dependent on the degree chosen. The form of this kernel is $k(x, x') = (\gamma \langle x, x' \rangle + c_0)^d$, where the degree $d$ and coefficient $c_0$ are specified through the arguments `degree` and `coef0`, respectively.
- Squared-exponential, or Radial Basis Function (RBF), kernel: $k(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$ projects data into an infinite-dimensional space whilst promoting smoothness when the data points $x$ and $x'$ are close. The single parameter `gamma` is a positive real number and can be thought of as an inverse length scale, $\gamma = 1/(2\ell^2)$. Thinking of $k(x, x')$ as a measure of correlation between these data points, the length scale $\ell$ is the standard deviation between the two points. The length scale should be set on the order of $\lVert x - x' \rVert$ and can be found with a simple grid search, as sketched below.
- Sigmoid kernel: $k(x, x') = \tanh(\gamma \langle x, x' \rangle + c_0)$ closely resembles the non-linear activations occurring in deep neural networks. Whilst uncommon in the literature, one can think of them as being used to express binary features in the optimisation procedure.
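To make these choices concrete, here is a minimal sketch (an addition, not part of the original lesson) of how each kernel is selected through `svm.SVC`, and how `gamma` might be tuned with `GridSearchCV`; the specific parameter values and grid are illustrative assumptions, not recommendations.
from sklearn import svm
from sklearn.model_selection import GridSearchCV
import numpy as np
# Each kernel is chosen via the `kernel` argument of svm.SVC;
# the parameter values below are placeholders.
linear_clf = svm.SVC(kernel='linear', C=1.0)
poly_clf = svm.SVC(kernel='poly', degree=3, coef0=1.0, C=1.0)
rbf_clf = svm.SVC(kernel='rbf', gamma=0.5, C=1.0)
sigmoid_clf = svm.SVC(kernel='sigmoid', gamma=0.5, coef0=0.0, C=1.0)
# A simple grid search over the inverse length scale gamma,
# using an illustrative logarithmic grid.
param_grid = {'gamma': np.logspace(-2, 2, 9)}
search = GridSearchCV(svm.SVC(kernel='rbf', C=1.0), param_grid, cv=5)
# After search.fit(X, y), the tuned value sits in search.best_params_['gamma'].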
Implementing a non-linear kernel SVM
Let's re-generate our data, built from two blobs.
import matplotlib.pyplot as plt
import numpy as np
from sklearn import svm
from sklearn.utils import shuffle
%matplotlib inline
# Parameters
NUM_DATA = 200
SPREAD = 0.13
REDUCTION = 1
CPARAM = 10
SEED = 4
np.random.seed(SEED)
def get_blobs(num_samples, std_dev):
    """Generate two 2D normal distributions in the NE and SW quadrants."""
    cov = np.asarray([[std_dev, 0], [0, std_dev]])
    mean_ne = np.asarray(2 * [2.5])
    mean_sw = np.asarray(2 * [1.5])
    ne = np.random.multivariate_normal(mean=mean_ne, cov=cov, size=num_samples)
    sw = np.random.multivariate_normal(mean=mean_sw, cov=cov, size=num_samples)
    return ne, sw
ne, sw = get_blobs(NUM_DATA, SPREAD * REDUCTION)
# Organise the data
X = np.concatenate((ne, sw))
y = np.asarray(len(ne)*[1,] + len(sw)*[-1,])
# Randomly order the data, for good measure
X, y = shuffle(X, y, random_state=SEED)
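As a quick sanity check (an added step, not in the original lesson), we can confirm the shapes and class balance before fitting:
# With NUM_DATA = 200 points per blob, we expect X.shape == (400, 2)
print(X.shape, y.shape)
# Two balanced classes, labelled +1 and -1
print(np.unique(y, return_counts=True))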
Now we can implement our kernel SVM and plot the contours of the output predictor, contrasting it against the linear kernel from before.
# Set up axes
SPACE = 0.05
AX_MIN = 0.25
AX_MAX = 3.75
LINE_MIN = 0.5
LINE_MAX = 3.5
fig, ax0 = plt.subplots(1, 2, figsize=[16, 8])
plt.subplots_adjust(wspace=SPACE, hspace=SPACE)
# Pick two kernels
kernels = ['linear', 'rbf']
titles = ['Linear kernel: ', 'RBF kernel: ']
for ii, ax in enumerate(ax0):
    # Fit the Support Vector Classifier
    clf = svm.SVC(kernel=kernels[ii], C=CPARAM)
    clf.fit(X, y)
    # Grids
    x0 = np.linspace(LINE_MIN, LINE_MAX, 1000)
    x1 = np.linspace(AX_MIN, AX_MAX, 1000)
    xx, yy = np.meshgrid(x1, x1)
    # Evaluate the decision function over the grid
    f = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
    f = f.reshape(xx.shape)
    ax.tick_params(direction='in')
    ax.set_xlim(AX_MIN, AX_MAX)
    ax.set_ylim(AX_MIN, AX_MAX)
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # Plot data and ground truth
    ax.scatter(sw[:, 0], sw[:, 1], s=15, color='goldenrod')
    ax.scatter(ne[:, 0], ne[:, 1], s=15, color='navy')
    #ax.plot(x0, -x0 + 4, color='r', linestyle='--', linewidth=2)
    # Ring the support vectors of the fitted model
    svs = ax.scatter(
        clf.support_vectors_[:, 0],
        clf.support_vectors_[:, 1],
        s=80,
        facecolors='none',
        zorder=10,
        edgecolors='fuchsia',
    )
    # Put the result into a contour plot
    ax.contourf(xx, yy, f, cmap='magma_r', alpha=0.5)
    ax.set_title(
        titles[ii] + f'{sum(clf.n_support_)} support vectors', fontsize=18,
    )
plt.show()
Figure 1. Contour plots of the linear SVM vs. the non-linear (kernel) SVM. The non-linear solution captures the shape of the clusters, rather than a straight split. Support vectors are indicated, differing in position between methods.
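To back up the visual comparison numerically, here is a short sketch (an addition, using only standard `SVC` attributes) that counts and compares the support vectors of the two fitted models; the exact counts depend on the random seed set above.
linear_clf = svm.SVC(kernel='linear', C=CPARAM).fit(X, y)
rbf_clf = svm.SVC(kernel='rbf', C=CPARAM).fit(X, y)
# n_support_ holds the number of support vectors per class
print('Linear:', linear_clf.n_support_, '-> total', sum(linear_clf.n_support_))
print('RBF:   ', rbf_clf.n_support_, '-> total', sum(rbf_clf.n_support_))
# support_ holds the indices of the training points used as support vectors
shared = np.intersect1d(linear_clf.support_, rbf_clf.support_)
print(f'{len(shared)} support vectors are common to both models')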