by Dr Andy Corbett
5. Introducing PyTorch
Download the resources for this lesson here.
- ✅ Build a neural network with PyTorch.
- ✅ Subclass PyTorch's `nn.Module`.
- ✅ Construct a PyTorch `Dataset` and `DataLoader`.
- ✅ Implement a training loop.
We'll revisit our classification problem from before, but this time we shall construct and train our neural network with PyTorch. We'll use the tools provided in PyTorch's code base:
- Activation functions
- Loss functions
- Training algorithm
We'll see that we can move quickly using this package and still maintain a high degree of flexibility in model design.
Let's start at the beginning.
The Dataset Class
With your data in the scikit-learn format, where we have two `np.array`s, the inputs `X` and outputs `y`, we need to convert this data into a format that PyTorch understands.
import torch
from torch.utils.data import Dataset

# Seed your RNGs (SEED can be any fixed integer)
SEED = 42
torch.manual_seed(SEED)


class Data(Dataset):
    def __init__(self, X, y):
        """Load data into torch tensors of dtype float32."""
        self.X = torch.from_numpy(X).type(torch.float)
        self.y = torch.from_numpy(y).type(torch.float).reshape(-1, 1)
        self.len = self.X.shape[0]

    def __getitem__(self, idx):
        """Return data sample as (input, target) pair."""
        return self.X[idx], self.y[idx]

    def __len__(self):
        """Return number of samples."""
        return self.len


# Split into training, validation and test sets
n_train = int(0.8 * len(X))
n_val = int(0.1 * len(X))
n_test = len(X) - n_train - n_val

data_train, data_val, data_test = torch.utils.data.random_split(
    Data(X, y), [n_train, n_val, n_test],
)
We do this by subclassing the `Dataset` class (above). This tells PyTorch how to load and possibly pre-process the data (it could, for instance, load straight from file if the data are too large to fit in memory).
There are two methods we have to fill in ourselves:
- The `__getitem__` method: this tells us how to pick a given sample, where the index `idx` ranges over the length of the data (see the short sketch below).
- The `__len__` method: this returns the length, which parameterises the size of the dataset.
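With the class defined, indexing and `len` behave as you would expect. Here is a minimal sketch using small made-up arrays (the toy variable names are illustrative only):

    import numpy as np

    # Hypothetical toy data: 4 samples, 2 features each
    X_toy = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
    y_toy = np.array([0.0, 1.0, 0.0, 1.0])

    data_toy = Data(X_toy, y_toy)
    print(len(data_toy))       # 4, via __len__
    x0, t0 = data_toy[0]       # (input, target) pair, via __getitem__
    print(x0.shape, t0.shape)  # torch.Size([2]) torch.Size([1])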
We also make use of PyTorch's own data splitting functionality, `random_split`, which, unlike scikit-learn, permits us to split into more than two sets. Don't forget to seed your script to maintain reproducibility!
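If you prefer to make the split itself reproducible explicitly, rather than relying on the global seed, `random_split` also accepts its own `generator` argument; a small variation on the call above:

    data_train, data_val, data_test = torch.utils.data.random_split(
        Data(X, y), [n_train, n_val, n_test],
        generator=torch.Generator().manual_seed(SEED),
    )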
Data Loaders
The Dataset class describes where to find the data and how it should be retrieved. A data loader, on the other hand, is used to split our set into batches, shuffle the data, and set the number of worker processes the loading can be split across. This makes training much more efficient when using GPU acceleration.
We can construct a data loader from the dataset very simply:
from torch.utils.data import DataLoader

BATCH_SIZE = 20

# Instantiate loaders
loader_train = DataLoader(
    data_train, batch_size=BATCH_SIZE, shuffle=True,
)
loader_val = DataLoader(data_val, batch_size=n_val)
loader_test = DataLoader(data_test, batch_size=n_test)
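Iterating over a loader yields one batch at a time as an `(inputs, targets)` pair. A quick sketch to peek at the first training batch (the second dimension depends on the number of input features in your data):

    inputs, targets = next(iter(loader_train))
    print(inputs.shape)   # (BATCH_SIZE, number of features)
    print(targets.shape)  # (BATCH_SIZE, 1)

    # For larger datasets, num_workers spreads loading over several processes:
    # DataLoader(data_train, batch_size=BATCH_SIZE, shuffle=True, num_workers=2)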
Construct a deep neural network
Now we get to the fun bit. Let's build a model in PyTorch. We shall, as is necessary, subclass the `nn.Module` class. This contains the functionality to compute and store gradients and eventually backpropagate them with just a single method call.
We shall build an MLP with two hidden layers, so that we transform the input via
Input > Hidden Layer 1 > Hidden Layer 2 > Output
We'll activate with the logistic function, as before, and add batch normalisation after each hidden layer.
import torch.nn as nn

DIM_INPUT = data_train[0][0].shape[0]  # The size of the input vector
DIM_OUTPUT = 1
WIDTH = 5  # The number of nodes in each layer


# The multi-layer perceptron
class MLP(nn.Module):
    """Our PyTorch model for an MLP."""

    def __init__(self):
        super(MLP, self).__init__()
        # Construct linear connections
        self.layer1 = nn.Linear(DIM_INPUT, WIDTH)
        self.layer2 = nn.Linear(WIDTH, WIDTH)
        self.layer3 = nn.Linear(WIDTH, DIM_OUTPUT)
        # Pick activations
        self.act1 = nn.Sigmoid()
        self.act2 = nn.Sigmoid()
        self.act3 = nn.Sigmoid()
        # Batch-normalise layers
        self.bn1 = nn.BatchNorm1d(WIDTH)
        self.bn2 = nn.BatchNorm1d(WIDTH)

    def forward(self, x):
        """The journey forward."""
        x = self.layer1(x)
        x = self.bn1(x)
        x = self.act1(x)
        x = self.layer2(x)
        x = self.bn2(x)
        x = self.act2(x)
        x = self.layer3(x)
        x = self.act3(x)
        return x
The `forward` pass is the important method, which needs to be implemented in every `nn.Module`. It is called when we call the model; that is,
model = MLP()
y1 = model(x)
y2 = model.forward(x)
print(y1 == y2)
returns `True`. The `forward` method defines how the network acts on an input, whereas the model's components are defined in the `__init__` call.
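Because the layers were registered as attributes in `__init__`, PyTorch can also report the model's structure and count its trainable parameters. A quick way to inspect the model instantiated above:

    print(model)  # Lists the registered sub-modules (layers, activations, batch norms)
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(n_params)  # Total number of trainable weights and biases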
Train the network
Let's walk through a classic training routine in PyTorch. First of all, we shall need to instantiate (1) the model itself, (2) a loss function to measure success, and (3) an optimisation routine with which to update the parameters.
model = MLP()  # Instantiate the network
loss_func = nn.BCELoss()  # Loss function
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)  # Optimisation method
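With a sigmoid on the output layer, `nn.BCELoss` computes the binary cross-entropy of those probabilities against the 0/1 targets, averaged over the batch by default. A quick sanity check with made-up numbers:

    p = torch.tensor([0.9, 0.2])  # hypothetical predicted probabilities
    t = torch.tensor([1.0, 0.0])  # hypothetical 0/1 targets
    manual = -(t * torch.log(p) + (1 - t) * torch.log(1 - p)).mean()
    print(torch.isclose(loss_func(p, t), manual))  # tensor(True)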
These in hand, we can begin training. This shall be executed as a loop, with each step in the loop referred to as an epoch.
NUM_EPOCHS = 4000

# To store loss statistics
loss_train, loss_val = list(), list()

for ep in range(NUM_EPOCHS):
    running_loss = 0.0
    for ii, batch in enumerate(loader_train):
        inputs, targets = batch
        optimiser.zero_grad()  # Zero gradients in network
        preds = model(inputs)  # Make a forward pass
        loss = loss_func(preds, targets)  # Compute loss
        loss.backward()  # Backpropagate loss (compute gradients)
        optimiser.step()  # Update parameters
        # Record loss at this batch
        running_loss += loss.item()
    # Update epoch-level training loss
    loss_train.append(running_loss / n_train)

    # Test on validation set (no gradients needed here)
    with torch.no_grad():
        for ii, batch in enumerate(loader_val):
            inputs, targets = batch
            preds = model(inputs)  # Make a forward pass
            loss = loss_func(preds, targets)  # Compute loss
            loss_val.append(loss.item())
At each epoch we:
1. Get the data: `inputs, targets = batch`
2. Clean any past recorded gradients: `optimiser.zero_grad()`
3. Make a prediction: `preds = model(inputs)`
4. Compute the loss: `loss = loss_func(preds, targets)`
5. Backpropagate the gradients: `loss.backward()` (illustrated below)
6. Update the parameters via SGD: `optimiser.step()`
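Step 5 is worth pausing on: `loss.backward()` asks PyTorch's autograd engine to compute the gradient of the loss with respect to every model parameter. A tiny, self-contained illustration (unrelated to our model):

    w = torch.tensor(2.0, requires_grad=True)
    loss = w ** 2    # a toy "loss" as a function of w
    loss.backward()  # autograd computes d(loss)/dw
    print(w.grad)    # tensor(4.), since d(w^2)/dw = 2w = 4 at w = 2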
Computing such gradients by hand for a full network would be a highly technical feat, but PyTorch automates it with that single call. As an output, we can plot the loss over the training session.
Fig. 1. Loss plot for our training session.
A spacing between the training and validation loss curves such as this indicates a degree of overfitting. In our example, this is likely due to the presence of noise in the overlapping classes.
Now that our network is trained, we can inspect the output. Quite brilliantly, our model has learnt an accurate representation of the data.
Fig. 2. Output of neural network solution.
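Finally, the held-out test set played no part in training, so it gives an unbiased estimate of performance. A minimal sketch of how one might score it, assuming we threshold the sigmoid output at 0.5 (this evaluation snippet is illustrative, not part of the lesson resources):

    model.eval()  # Use running statistics in the batch-norm layers
    with torch.no_grad():
        inputs, targets = next(iter(loader_test))  # One batch covers the whole test set here
        preds = model(inputs)
        accuracy = ((preds > 0.5).float() == targets).float().mean()
        print(accuracy.item())
    model.train()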