by Dr Andy Corbett

Lesson

Support Vector Machines

12. Using Kernel SVMs for Non-Linear Predictions

📂 Resources

Download the resources for this lesson here.

In this video we use the SVM classifier as a non-parametric model by applying the kernel trick. This introduces additional hyperparameter choices, which we walk through here.

📑 Learning Objectives
  • Identify the kernel approach to SVM models in scikit-learn.
  • Implement a support vector machine with different choices of kernel functions.
  • Visualise a non-linear decision surface.
  • Compare support vectors in the linear and non-linear case.

Kernel parameter selection


For kernel selection, the scikit-learn package offers a few options (a short sketch of how these map onto svm.SVC arguments follows the list):

  • Linear kernel: This is the 'no action' option. The model expressions containing $k(\mathbf{x}, \mathbf{x}') = \mathbf{x}^{T}\mathbf{x}'$ remain the same.

  • Polynomial kernel: Permits a quantifiable amount of non-linearity, dependent on the degree chosen. The form of this kernel is $k(\mathbf{x}, \mathbf{x}') = (\gamma\mathbf{x}^{T}\mathbf{x}' + r)^{d}$, where the degree $d$ and coefficient $r$ are specified through the arguments degree and coef0, respectively.

  • Squared-exponential, or Radial Basis Function (RBF), kernel: $k(\mathbf{x}, \mathbf{x}') = \exp(-\gamma \| \mathbf{x} - \mathbf{x}'\|^{2})$ projects data into an infinite-dimensional space whilst promoting smoothness when the data points $\mathbf{x}$ and $\mathbf{x}'$ are close. The single parameter gamma is a positive real number and can be thought of as an inverse length scale. Thinking of $k(\mathbf{x}, \mathbf{x}')$ as a measure of correlation between these data points, the length scale $1/\sqrt{2\gamma}$ is the standard deviation between the two points. The length scale should be set on the order of $\|\mathbf{x} - \mathbf{x}'\|$ and can be found with a simple grid search (see the sketch after this list).

  • Sigmoid kernel: $k(\mathbf{x}, \mathbf{x}') = \tanh(\gamma\mathbf{x}^{T}\mathbf{x}' + r)$ closely resembles the non-linear activations occurring in deep neural networks. Whilst uncommon in the literature, one can think of it as expressing binary features in the optimisation procedure.
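
The bullet points above translate into svm.SVC arguments roughly as follows. This is a minimal, illustrative sketch rather than part of the lesson's worked example: the make_blobs toy data and the gamma grid are assumptions chosen purely for demonstration.

import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV

# Illustrative toy data (the lesson's own data set is generated below)
X_toy, y_toy = make_blobs(n_samples=200, centers=2, random_state=0)

# One classifier per kernel, with the hyperparameters discussed above
linear_clf = svm.SVC(kernel='linear', C=10)
poly_clf = svm.SVC(kernel='poly', degree=3, coef0=1, C=10)       # degree d and coefficient r
rbf_clf = svm.SVC(kernel='rbf', gamma=1.0, C=10)                 # gamma is the inverse length scale
sigmoid_clf = svm.SVC(kernel='sigmoid', gamma=1.0, coef0=0.0, C=10)

# A simple grid search over gamma for the RBF kernel
grid = GridSearchCV(
    svm.SVC(kernel='rbf', C=10),
    param_grid={'gamma': np.logspace(-2, 2, 9)},
    cv=5,
)
grid.fit(X_toy, y_toy)
print(grid.best_params_)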

Implementing a non-linear kernel SVM


Let's re-generate our data built from two blobs.

import matplotlib.pyplot as plt
import numpy as np
from sklearn import svm
from sklearn.utils import shuffle
%matplotlib inline

# Parameters
NUM_DATA = 200
SPREAD = 0.13
REDUCTION = 1
CPARAM = 10
SEED = 4
np.random.seed(SEED)

def get_blobs(num_samples, std_dev):
    """Generate two 2D Gaussian blobs, one to the north-east and one to the south-west."""
    # Isotropic covariance; std_dev sets the variance of each coordinate
    cov = np.asarray([[std_dev, 0], [0, std_dev]])
    mean_ne = np.asarray(2*[2.5,])
    mean_sw = np.asarray(2*[1.5,])
    ne = np.random.multivariate_normal(mean=mean_ne, cov=cov, size=num_samples)
    sw = np.random.multivariate_normal(mean=mean_sw, cov=cov, size=num_samples)
    return ne, sw

ne, sw = get_blobs(NUM_DATA, SPREAD * REDUCTION)

# Organise the data
X = np.concatenate((ne, sw))
y = np.asarray(len(ne)*[1,] + len(sw)*[-1,])

# Randomly order the data, for good measure
X, y = shuffle(X, y, random_state=SEED)

Now we can implement our kernel SVM and plot the contours of the resulting decision function. We contrast this against the linear kernel from before.


# Set up axes
SPACE = 0.05
AX_MIN = 0.25
AX_MAX = 3.75
LINE_MIN = 0.5
LINE_MAX = 3.5

fig, ax0 = plt.subplots(1, 2, figsize=[16, 8])
plt.subplots_adjust(wspace=SPACE, hspace=SPACE)

# Pick two kernels
kernels = ['linear', 'rbf']
titles = ['Linear kernel: ', 'RBF kernel: ']

for ii, ax in enumerate(ax0):

    # Fit the Support Vector Classifier
    clf = svm.SVC(kernel=kernels[ii], C=CPARAM)
    clf.fit(X, y)

    # Grids
    x0 = np.linspace(LINE_MIN, LINE_MAX, 1000)
    x1 = np.linspace(AX_MIN, AX_MAX, 1000)
    xx, yy = np.meshgrid(x1, x1)
    f = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
    f = f.reshape(xx.shape)

    ax.tick_params(direction='in')
    ax.set_xlim(AX_MIN, AX_MAX)
    ax.set_ylim(AX_MIN, AX_MAX)
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Plot data and ground truth
    ax.scatter(sw[:, 0], sw[:, 1], s=15, color='goldenrod')
    ax.scatter(ne[:, 0], ne[:, 1], s=15, color='navy')
    #ax.plot(x0, -x0 + 4, color='r', linestyle='--', linewidth=2)
    svs = ax.scatter(
                clf.support_vectors_[:, 0],
                clf.support_vectors_[:, 1],
                s=80,
                facecolors='none',
                zorder=10,
                edgecolors='fuchsia',
    )

    # Put the result into a contour plot
    ax.contourf(
        xx, yy, f, cmap=cm.get_cmap("magma_r"), alpha=0.5, linestyles=["-"],
    )

    ax.set_title(
        titles[ii] + f'{sum(clf.n_support_)} support vectors', fontsize=18,
    )

plt.show()
Non-Linear SVM Classifier

Figure 1. Contour plots of the linear SVM vs. the non-linear (kernel) SVM. The non-linear solution captures the shape of the clusters, rather than a straight split. Support vectors are indicated, differing in position between methods.
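
To compare the support vectors in the two cases beyond the figure, the short sketch below (assuming X, y and CPARAM from the cells above) prints the number of support vectors retained by each kernel and uses the fitted RBF model to make non-linear predictions at a few arbitrary new points.

import numpy as np
from sklearn import svm

# Refit both models as in the plotting loop above
clf_linear = svm.SVC(kernel='linear', C=CPARAM).fit(X, y)
clf_rbf = svm.SVC(kernel='rbf', C=CPARAM).fit(X, y)

for name, clf in [('linear', clf_linear), ('rbf', clf_rbf)]:
    # n_support_ counts support vectors per class, in sorted class order (-1, then +1)
    print(f"{name}: {sum(clf.n_support_)} support vectors "
          f"({clf.n_support_[0]} in class -1, {clf.n_support_[1]} in class +1)")

# Non-linear predictions at new (illustrative) points
new_points = np.asarray([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])
print(clf_rbf.predict(new_points))            # predicted class labels
print(clf_rbf.decision_function(new_points))  # signed distances to the decision surface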