by Dr Andy Corbett

Lesson

Support Vector Machines

12. Using Kernel SVMs for Non-Linear Predictions

📂 Resources

Download the resources for this lesson here.

In this video we use the SVM classifier as a non-parametric model by applying the kernel trick. This introduces additional hyperparameter choices, which we walk through here.

📑 Learning Objectives
  • Identify the kernel approach to SVM models in scikit-learn.
  • Implement a support vector machine with different choices of kernel functions.
  • Visualise a non-linear decision surface.
  • Compare support vectors in the linear and non-linear case.

Kernel parameter selection


For kernel selection, the scikit-learn package offers a few options (a short sketch of how these map onto svm.SVC arguments follows the list):

  • Linear kernel: This is the 'no action' option. The model expressions containing $k(\mathbf{x}, \mathbf{x}') = \mathbf{x}^{T}\mathbf{x}'$ remain the same.

  • Polynomial kernel: Permits a quantifiable amount of non-linearity, dependent on the degree chosen. The form of this kernel is $k(\mathbf{x}, \mathbf{x}') = (\gamma\mathbf{x}^{T}\mathbf{x}' + r)^{d}$, where the degree $d$ and coefficient $r$ are specified through the arguments degree and coef0, respectively.

  • Squared-exponential, or Radial Basis Function (RBF), kernel: $k(\mathbf{x}, \mathbf{x}') = \exp(-\gamma \| \mathbf{x} - \mathbf{x}'\|^{2})$ projects data into an infinite-dimensional space whilst promoting smoothness when the data points $\mathbf{x}$ and $\mathbf{x}'$ are close. The single parameter gamma is a positive real number and can be thought of as an inverse length scale. Thinking of $k(\mathbf{x}, \mathbf{x}')$ as a measure of correlation between these data points, the length scale $1/\sqrt{2\gamma}$ is the standard deviation between the two points. The length scale should be set on the order of $\|\mathbf{x} - \mathbf{x}'\|$ and can be found with a simple grid search (see the sketch after this list).

  • Sigmoid kernel: $k(\mathbf{x}, \mathbf{x}') = \tanh(\gamma\mathbf{x}^{T}\mathbf{x}' + r)$ closely resembles the non-linear activations occurring in deep neural networks. Whilst uncommon in the literature, one can think of it as expressing binary features in the optimisation procedure.
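
The bullet points above translate into svm.SVC arguments roughly as follows. This is a minimal, illustrative sketch rather than part of the lesson's worked example: the make_blobs toy data and the gamma grid are assumptions chosen purely for demonstration.

import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV

# Illustrative toy data (the lesson's own data set is generated below)
X_toy, y_toy = make_blobs(n_samples=200, centers=2, random_state=0)

# One classifier per kernel, with the hyperparameters discussed above
linear_clf = svm.SVC(kernel='linear', C=10)
poly_clf = svm.SVC(kernel='poly', degree=3, coef0=1, C=10)       # degree d and coefficient r
rbf_clf = svm.SVC(kernel='rbf', gamma=1.0, C=10)                 # gamma is the inverse length scale
sigmoid_clf = svm.SVC(kernel='sigmoid', gamma=1.0, coef0=0.0, C=10)

# A simple grid search over gamma for the RBF kernel
grid = GridSearchCV(
    svm.SVC(kernel='rbf', C=10),
    param_grid={'gamma': np.logspace(-2, 2, 9)},
    cv=5,
)
grid.fit(X_toy, y_toy)
print(grid.best_params_)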

Implementing a non-linear kernel SVM


Let's re-generate our data built from two blobs.

import matplotlib.pyplot as plt
import numpy as np
from sklearn import svm
from sklearn.utils import shuffle
%matplotlib inline

# Parameters
NUM_DATA = 200
SPREAD = 0.13
REDUCTION = 1
CPARAM = 10
SEED = 4
np.random.seed(SEED)

def get_blobs(num_samples, std_dev):
    """Generate two 2D Gaussian blobs, one to the north-east and one to the south-west."""
    # Isotropic covariance; std_dev sets the variance of each coordinate
    cov = np.asarray([[std_dev, 0], [0, std_dev]])
    mean_ne = np.asarray(2*[2.5,])
    mean_sw = np.asarray(2*[1.5,])
    ne = np.random.multivariate_normal(mean=mean_ne, cov=cov, size=num_samples)
    sw = np.random.multivariate_normal(mean=mean_sw, cov=cov, size=num_samples)
    return ne, sw

ne, sw = get_blobs(NUM_DATA, SPREAD * REDUCTION)

# Organise the data
X = np.concatenate((ne, sw))
y = np.asarray(len(ne)*[1,] + len(sw)*[-1,])

# Randomly order the data, for good measure
X, y = shuffle(X, y, random_state=SEED)

Now we can implement our kernel SVM and plot the contours of the resulting decision function. We contrast this against the linear kernel from before.


# Set up axes
SPACE = 0.05
AX_MIN = 0.25
AX_MAX = 3.75
LINE_MIN = 0.5
LINE_MAX = 3.5

fig, ax0 = plt.subplots(1, 2, figsize=[16, 8])
plt.subplots_adjust(wspace=SPACE, hspace=SPACE)

# Pick two kernels
kernels = ['linear', 'rbf']
titles = ['Linear kernel: ', 'RBF kernel: ']

for ii, ax in enumerate(ax0):

    # Fit the Support Vector Classifier
    clf = svm.SVC(kernel=kernels[ii], C=CPARAM)
    clf.fit(X, y)

    # Grids
    x0 = np.linspace(LINE_MIN, LINE_MAX, 1000)
    x1 = np.linspace(AX_MIN, AX_MAX, 1000)
    xx, yy = np.meshgrid(x1, x1)
    f = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
    f = f.reshape(xx.shape)

    ax.tick_params(direction='in')
    ax.set_xlim(AX_MIN, AX_MAX)
    ax.set_ylim(AX_MIN, AX_MAX)
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Plot data and ground truth
    ax.scatter(sw[:, 0], sw[:, 1], s=15, color='goldenrod')
    ax.scatter(ne[:, 0], ne[:, 1], s=15, color='navy')
    #ax.plot(x0, -x0 + 4, color='r', linestyle='--', linewidth=2)
    svs = ax.scatter(
                clf.support_vectors_[:, 0],
                clf.support_vectors_[:, 1],
                s=80,
                facecolors='none',
                zorder=10,
                edgecolors='fuchsia',
    )

    # Put the result into a contour plot
    ax.contourf(
        xx, yy, f, cmap=cm.get_cmap("magma_r"), alpha=0.5, linestyles=["-"],
    )

    ax.set_title(
        titles[ii] + f'{sum(clf.n_support_)} support vectors', fontsize=18,
    )

plt.show()
Non-Linear SVM Classifier

Figure 1. Contour plots of the linear SVM vs. the non-linear (kernel) SVM. The non-linear solution captures the shape of the clusters, rather than a straight split. Support vectors are indicated, differing in position between methods.
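
To compare the support vectors in the two cases beyond the figure, the short sketch below (assuming X, y and CPARAM from the cells above) prints the number of support vectors retained by each kernel and uses the fitted RBF model to make non-linear predictions at a few arbitrary new points.

import numpy as np
from sklearn import svm

# Refit both models as in the plotting loop above
clf_linear = svm.SVC(kernel='linear', C=CPARAM).fit(X, y)
clf_rbf = svm.SVC(kernel='rbf', C=CPARAM).fit(X, y)

for name, clf in [('linear', clf_linear), ('rbf', clf_rbf)]:
    # n_support_ counts support vectors per class, in sorted class order (-1, then +1)
    print(f"{name}: {sum(clf.n_support_)} support vectors "
          f"({clf.n_support_[0]} in class -1, {clf.n_support_[1]} in class +1)")

# Non-linear predictions at new (illustrative) points
new_points = np.asarray([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]])
print(clf_rbf.predict(new_points))            # predicted class labels
print(clf_rbf.decision_function(new_points))  # signed distances to the decision surface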