by Dr Andy Corbett
5. Introducing PyTorch
Download the resources for this lesson here.
- ✅ Build a neural network with PyTorch.
- ✅ Subclass PyTorch's `nn.Module`.
- ✅ Construct a PyTorch `Dataset` and `DataLoader`.
- ✅ Implement a training loop.
We'll revisit our classification problem from before, but this time we shall construct and train our neural network with PyTorch. We'll use the tools provided in PyTorch's code base:
- Activation functions
- Loss functions
- Training algorithm
We'll see that we can move quickly using this package and still maintain a high degree of flexibility in model design.
Let's start at the beginning.
The Dataset Class
With your data in the scikit-learn format, where we have two `np.array`s, the inputs `X` and outputs `y`, we need to convert this data into a format that PyTorch understands.
import torch
from torch.utils.data import Dataset

# Seed your RNGs (SEED can be any fixed integer)
SEED = 42
torch.manual_seed(SEED)


class Data(Dataset):
    def __init__(self, X, y):
        """Load data into torch tensors of dtype float32."""
        self.X = torch.from_numpy(X).type(torch.float)
        self.y = torch.from_numpy(y).type(torch.float).reshape(-1, 1)
        self.len = self.X.shape[0]

    def __getitem__(self, idx):
        """Return data sample as (input, target) pair."""
        return self.X[idx], self.y[idx]

    def __len__(self):
        """Return number of samples."""
        return self.len


# Split into training, validation and test sets
n_train = int(0.8 * len(X))
n_val = int(0.1 * len(X))
n_test = len(X) - n_train - n_val

data_train, data_val, data_test = torch.utils.data.random_split(
    Data(X, y), [n_train, n_val, n_test],
)
We do this by subclassing the `Dataset` class (above). This tells PyTorch how to load and possibly pre-process the data (it could, for instance, load straight from file if the data are too large to fit in memory).
There are two methods we have to fill in ourselves:
- The `__getitem__` method: this tells us how to pick a given sample, where the index `idx` ranges over the length of the data (see the short sketch below).
- The `__len__` method: this returns the length, which parameterises the size of the dataset.
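With the class defined, indexing and `len` behave as you would expect. Here is a minimal sketch using small made-up arrays (the toy variable names are illustrative only):

    import numpy as np

    # Hypothetical toy data: 4 samples, 2 features each
    X_toy = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
    y_toy = np.array([0.0, 1.0, 0.0, 1.0])

    data_toy = Data(X_toy, y_toy)
    print(len(data_toy))       # 4, via __len__
    x0, t0 = data_toy[0]       # (input, target) pair, via __getitem__
    print(x0.shape, t0.shape)  # torch.Size([2]) torch.Size([1])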
We also make use of PyTorch's own data splitting functionality, `random_split`, which, unlike scikit-learn, permits us to split into more than two sets. Don't forget to seed your script to maintain reproducibility!
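If you prefer to make the split itself reproducible explicitly, rather than relying on the global seed, `random_split` also accepts its own `generator` argument; a small variation on the call above:

    data_train, data_val, data_test = torch.utils.data.random_split(
        Data(X, y), [n_train, n_val, n_test],
        generator=torch.Generator().manual_seed(SEED),
    )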
Data Loaders
The Dataset class describes where to find the data and how it should be retrieved. A data loader, on the other hand, is used to split our set into batches, shuffle the data, and set the number of worker processes the loading can be split across. This makes training much more efficient when using GPU acceleration.
We can construct a data loader from the dataset very simply:
from torch.utils.data import DataLoader

BATCH_SIZE = 20

# Instantiate loaders
loader_train = DataLoader(
    data_train, batch_size=BATCH_SIZE, shuffle=True,
)
loader_val = DataLoader(data_val, batch_size=n_val)
loader_test = DataLoader(data_test, batch_size=n_test)
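Iterating over a loader yields one batch at a time as an `(inputs, targets)` pair. A quick sketch to peek at the first training batch (the second dimension depends on the number of input features in your data):

    inputs, targets = next(iter(loader_train))
    print(inputs.shape)   # (BATCH_SIZE, number of features)
    print(targets.shape)  # (BATCH_SIZE, 1)

    # For larger datasets, num_workers spreads loading over several processes:
    # DataLoader(data_train, batch_size=BATCH_SIZE, shuffle=True, num_workers=2)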
Construct a deep neural network
Now we get to the fun bit. Let's build a model in PyTorch. We shall, as is necessary, subclass the `nn.Module` class. This contains the functionality to compute and store gradients and eventually backpropagate them with just a single method call.
We shall build an MLP with two hidden layers, so that we transform the input via
Input > Hidden Layer 1 > Hidden Layer 2 > Output
We'll activate with the logistic function, as before, and add batch normalisation after each hidden layer.
import torch.nn as nn

DIM_INPUT = data_train[0][0].shape[0]  # The size of the input vector
DIM_OUTPUT = 1
WIDTH = 5  # The number of nodes in each layer


# The multi-layer perceptron
class MLP(nn.Module):
    """Our PyTorch model for an MLP."""

    def __init__(self):
        super(MLP, self).__init__()
        # Construct linear connections
        self.layer1 = nn.Linear(DIM_INPUT, WIDTH)
        self.layer2 = nn.Linear(WIDTH, WIDTH)
        self.layer3 = nn.Linear(WIDTH, DIM_OUTPUT)
        # Pick activations
        self.act1 = nn.Sigmoid()
        self.act2 = nn.Sigmoid()
        self.act3 = nn.Sigmoid()
        # Batch-normalise layers
        self.bn1 = nn.BatchNorm1d(WIDTH)
        self.bn2 = nn.BatchNorm1d(WIDTH)

    def forward(self, x):
        """The journey forward."""
        x = self.layer1(x)
        x = self.bn1(x)
        x = self.act1(x)
        x = self.layer2(x)
        x = self.bn2(x)
        x = self.act2(x)
        x = self.layer3(x)
        x = self.act3(x)
        return x
The `forward` pass is the important method, which needs to be implemented in every `nn.Module`. It is called when we call the model; that is,
model = MLP()
y1 = model(x)
y2 = model.forward(x)
print(y1 == y2)
returns `True`. The `forward` method defines how the network acts on an input, whereas the model's components are defined in the `__init__` call.
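Because the layers were registered as attributes in `__init__`, PyTorch can also report the model's structure and count its trainable parameters. A quick way to inspect the model instantiated above:

    print(model)  # Lists the registered sub-modules (layers, activations, batch norms)
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(n_params)  # Total number of trainable weights and biases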
Train the network
Let's walk through a classic training routine in PyTorch. First of all, we shall need to instantiate (1) the model itself, (2) a loss function to measure success, and (3) an optimisation routine with which to update the parameters.
model = MLP()  # Instantiate the network
loss_func = nn.BCELoss()  # Loss function
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)  # Optimisation method
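With a sigmoid on the output layer, `nn.BCELoss` computes the binary cross-entropy of those probabilities against the 0/1 targets, averaged over the batch by default. A quick sanity check with made-up numbers:

    p = torch.tensor([0.9, 0.2])  # hypothetical predicted probabilities
    t = torch.tensor([1.0, 0.0])  # hypothetical 0/1 targets
    manual = -(t * torch.log(p) + (1 - t) * torch.log(1 - p)).mean()
    print(torch.isclose(loss_func(p, t), manual))  # tensor(True)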
These in hand, we can begin training. This shall be executed as a loop, with each step in the loop referred to as an epoch.
NUM_EPOCHS = 4000

# To store loss statistics
loss_train, loss_val = list(), list()

for ep in range(NUM_EPOCHS):
    running_loss = 0.0
    for ii, batch in enumerate(loader_train):
        inputs, targets = batch
        optimiser.zero_grad()  # Zero gradients in network
        preds = model(inputs)  # Make a forward pass
        loss = loss_func(preds, targets)  # Compute loss
        loss.backward()  # Backpropagate loss (compute gradients)
        optimiser.step()  # Update parameters
        # Record loss at this batch
        running_loss += loss.item()
    # Update epoch-level training loss
    loss_train.append(running_loss / n_train)

    # Test on validation set (no gradients needed here)
    with torch.no_grad():
        for ii, batch in enumerate(loader_val):
            inputs, targets = batch
            preds = model(inputs)  # Make a forward pass
            loss = loss_func(preds, targets)  # Compute loss
            loss_val.append(loss.item())
At each epoch we:
1. Get the data: `inputs, targets = batch`
2. Clean any past recorded gradients: `optimiser.zero_grad()`
3. Make a prediction: `preds = model(inputs)`
4. Compute the loss: `loss = loss_func(preds, targets)`
5. Backpropagate the gradients: `loss.backward()` (illustrated below)
6. Update the parameters via SGD: `optimiser.step()`
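Step 5 is worth pausing on: `loss.backward()` asks PyTorch's autograd engine to compute the gradient of the loss with respect to every model parameter. A tiny, self-contained illustration (unrelated to our model):

    w = torch.tensor(2.0, requires_grad=True)
    loss = w ** 2    # a toy "loss" as a function of w
    loss.backward()  # autograd computes d(loss)/dw
    print(w.grad)    # tensor(4.), since d(w^2)/dw = 2w = 4 at w = 2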
Computing such gradients by hand for a full network would be a highly technical feat, but PyTorch automates it with that single call. As an output, we can plot the loss over the training session.
Fig. 1. Loss plot for our training session.
A spacing between the training and validation loss curves such as this indicates a degree of overfitting. In our example, this is likely due to the presence of noise in the overlapping classes.
Now that our network is trained, we can inspect the output. Quite brilliantly, our model has learnt an accurate representation of the data.
Fig. 2. Output of neural network solution.
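Finally, the held-out test set played no part in training, so it gives an unbiased estimate of performance. A minimal sketch of how one might score it, assuming we threshold the sigmoid output at 0.5 (this evaluation snippet is illustrative, not part of the lesson resources):

    model.eval()  # Use running statistics in the batch-norm layers
    with torch.no_grad():
        inputs, targets = next(iter(loader_test))  # One batch covers the whole test set here
        preds = model(inputs)
        accuracy = ((preds > 0.5).float() == targets).float().mean()
        print(accuracy.item())
    model.train()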