
by Dr Andy Corbett

Lesson

10. DNNs in the Wild for Classification

📂 Resources

Download the resources for this lesson here.

In this video you will...
  • ✅ Import and process image data.
  • ✅ Build a simple neural network to classify the images.
  • ✅ Inspect the weights to visually observe the feature-patterns learnt.

Image classification is big business, and here we are stepping onto one of the big applications of deep neural networks. But we are just dipping our toes in lightly; our goals are twofold:

  1. To use PyTorch Lightning on an image-data application.
  2. To visually unpack the network weights themselves.

Acquiring the data


Learning how to use PyTorch's built-in data sources can be very useful for quick development and testing of new models. One such benchmark is the MNIST dataset, which contains a series of handwritten digits from 0 to 9; the task is to identify the correct class for each image.

Fig. 1. An example of the handwritten digit data provided.

It is a simple problem, hence we can get away with using a simple model, without throwing the full weight of computer vision at the task.

We can acquire the data through PyTorch directly:

import torch
from torchvision import datasets, transforms


transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))  # Mean/std of MNIST data.
])

train_kwargs = {'batch_size': 64}
test_kwargs = {'batch_size': 32}

# Torch datasets
train = datasets.MNIST('data/', train=True, transform=transform, download=True)
test = datasets.MNIST('data/', train=False, transform=transform)

# Torch dataloaders
loader_train = torch.utils.data.DataLoader(train, **train_kwargs)
loader_test = torch.utils.data.DataLoader(test, **test_kwargs)

data_dict = {
    'training_loader': loader_train,
    'validation_loader': loader_test,
}
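
Before building the model, a quick sanity check (our own addition, assuming the loaders above) pulls one batch and confirms the tensor shapes we will be feeding the network:

import matplotlib.pyplot as plt

# Pull a single batch from the training loader
x, y = next(iter(loader_train))
print(x.shape)  # Expect torch.Size([64, 1, 28, 28]): greyscale 28x28 images
print(y.shape)  # Expect torch.Size([64]): integer class labels 0-9

# Display the first digit in the batch, as in Fig. 1
# (shading is shifted by the normalisation above)
plt.imshow(x[0, 0], cmap='gray')
plt.title(f'Label: {y[0].item()}')
plt.show()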

Constructing a Deep Neural Network


As mentioned, we wish to use Lightning to train the model, but this does not preclude us from first writing our nn.Module.

import numpy as np
import torch.nn as nn


DIM_INPUT = np.prod(train[0][0].shape)  # The size of the input vector (28 * 28 = 784)
DIM_OUTPUT = 10  # One output per digit class
WIDTH = 20  # The number of nodes in each layer

# The multi-layer perceptron
class MLP(nn.Module):
    """Our PyTorch model for an MLP."""

    def __init__(self):
        super(MLP, self).__init__()

        # Construct linear connections
        self.layer1 = nn.Linear(DIM_INPUT, WIDTH, bias=False)
        self.layer2 = nn.Linear(WIDTH, DIM_OUTPUT, bias=False)

        # Pick activations
        self.act = nn.Sigmoid()
        # self.dropout = nn.Dropout(0.2)

        # Batch-normalise layers
        self.bn1 = nn.BatchNorm1d(WIDTH)

    def forward(self, x):
        """The journey forward."""

        x = self.layer1(x.flatten(start_dim=1))
        x = self.bn1(x)
        x = self.act(x)
        # x = self.dropout(x)

        x = self.layer2(x)

        # Return log-probabilities: nn.functional.nll_loss (used in training below)
        # expects log-softmax outputs, not raw softmax probabilities.
        return nn.functional.log_softmax(x, dim=1)

This is a relatively small model, simply downscaling the input images to a single hidden layer of dimension 20. With this assumption, we are postulating that there are 20 features in the images which we want to extract.

Task: Augment this architecture. Try increasing/decreasing these nodes and see how performance varies. Also try adding more layers whilst keeping the number of hidden neurons (20) constant; one possible starting point is sketched below.
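
As a starting point for the task, here is a hypothetical deeper variant: it stacks a second hidden layer whilst keeping every hidden layer at WIDTH = 20 nodes. The class name and layout are our own illustration, not part of the lesson's code.

class DeeperMLP(nn.Module):
    """A hypothetical variant for the task: two hidden layers of WIDTH nodes each."""

    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(DIM_INPUT, WIDTH, bias=False)
        self.layer2 = nn.Linear(WIDTH, WIDTH, bias=False)  # Extra hidden layer
        self.layer3 = nn.Linear(WIDTH, DIM_OUTPUT, bias=False)
        self.act = nn.Sigmoid()
        self.bn1 = nn.BatchNorm1d(WIDTH)
        self.bn2 = nn.BatchNorm1d(WIDTH)

    def forward(self, x):
        x = self.act(self.bn1(self.layer1(x.flatten(start_dim=1))))
        x = self.act(self.bn2(self.layer2(x)))
        return nn.functional.log_softmax(self.layer3(x), dim=1)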

Train with Lightning


We specify our training routine as before by subclassing pl.LightningModule.

import lightning.pytorch as pl


class PLModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = MLP()

    def training_step(self, batch, batch_idx):
        """This defines a step in the training loop."""
        x, y = batch  # The batch comes from a torch dataloader
        y_hat = self.model(x)
        loss = nn.functional.nll_loss(y_hat, y, reduction='sum')
        y_pred = y_hat.argmax(dim=1, keepdim=True)  # get number index
        correct = y_pred.eq(y.view_as(y_pred)).sum().item()
        self.log('train_loss', loss, prog_bar=True)
        self.log('train_acc', correct / x.shape[0])  # Divide by the actual batch size
        return loss

    def train_dataloader(self):
        return loader_train

    def validation_step(self, batch, batch_idx):
        """This defines a validation step."""
        x, y = batch
        y_hat = self.model(x)
        loss = nn.functional.nll_loss(y_hat, y, reduction='sum')
        y_pred = y_hat.argmax(dim=1, keepdim=True)  # get number index
        correct = y_pred.eq(y.view_as(y_pred)).sum().item()
        self.log('val_loss', loss)
        self.log('val_acc', correct / x.shape[0])  # Divide by the actual batch size
        return loss

    def val_dataloader(self):
        return loader_test

    def configure_optimizers(self):
        optimiser = torch.optim.SGD(self.parameters(), lr=0.01)
        return optimiser

# Initialise the Lightning model
model = PLModel()
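
Before launching a long run, an optional smoke test (our own addition) pushes one batch through the untrained network to confirm the shapes line up:

# Optional smoke test: one batch through the underlying MLP
x, y = next(iter(loader_train))
with torch.no_grad():
    log_probs = model.model(x)
print(log_probs.shape)  # Expect torch.Size([64, 10]): one log-probability per class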

Training can then be executed in a few lines of code.

Go and make yourself a hot drink. This may take some time! ⏲️

from lightning.pytorch.callbacks import ModelCheckpoint

# Keep the two best checkpoints, ranked by validation loss
ckpt = ModelCheckpoint(save_top_k=2, monitor='val_loss')

trainer = pl.Trainer(
    check_val_every_n_epoch=5,  # Run validation every fifth epoch
    max_epochs=200,
    callbacks=[ckpt],
)

trainer.fit(model=model)
pth = trainer.checkpoint_callback.best_model_path
print('Best model path:\n', pth)
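
Should you want the best weights back later (a step not shown in the lesson), Lightning can restore the model directly from that checkpoint path:

# Restore the best-performing model from its checkpoint
best_model = PLModel.load_from_checkpoint(pth)
best_model.eval()  # Evaluation mode: freezes the batch-norm statistics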

Inspection of the model weights


To glean some intuition about this simple model, we shall plot the matrix of weights applied to the incoming data at each hidden node. This means we shall have 20 matrices, each of whose parameters corresponds to a single pixel in the image data.

First of all, the parameters of our neural network are stored here:

params = list(model.model.parameters())
sizes = [p.shape for p in params]
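
Printing sizes is an optional check; given the model definition above, we would expect the list to read:

print(sizes)
# Anticipated output, in parameter-registration order:
# [torch.Size([20, 784]), torch.Size([10, 20]), torch.Size([20]), torch.Size([20])]
# i.e. the layer1 and layer2 weight matrices, then the batch-norm weight and bias.

So params[0] is the (20, 784) weight matrix of the first layer: one 784-vector, reshapeable to a 28 × 28 image, per hidden node.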

We can illustrate the plots with the following code:

import matplotlib.pyplot as plt

# One subplot per hidden node: 10 x 2 = 20 panels
fig, axes = plt.subplots(10, 2, figsize=(5, 10))

# Shared colour scale across all panels
scale_max, scale_min = params[0].max().item(), params[0].min().item()

for param, ax in zip(params[0], axes.ravel()):
    ax.matshow(param.reshape(28, 28).detach().cpu().numpy(),
               cmap=plt.cm.magma,
               vmin=scale_min,
               vmax=scale_max)
    ax.set_xticks(())
    ax.set_yticks(())

plt.show()

Fig. 2. A depiction of the model weights transforming the inputs into the 20 hidden feature nodes.

From the images, it is apparent that much of the network is formed by noise: outside the circular region in the centre there is little differentiation of the weights. This makes sense, as the digits sit in the middle of the frame, so the border pixels carry almost no signal for the network to exploit. However, there are some notable bright and dark areas which indicate features being extracted from the input. Remarkable!
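
As a final optional check (our own addition), we can score the trained network over the whole held-out test set:

# Score the trained model on the test set
model.model.eval()  # Evaluation mode: use the learned batch-norm statistics
correct = 0
with torch.no_grad():
    for x, y in loader_test:
        y_pred = model.model(x).argmax(dim=1)
        correct += (y_pred == y).sum().item()
print(f'Test accuracy: {correct / len(test):.4f}')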