by Dr Andy Corbett
9. DNNs in the Wild for Drug Discovery
Download the resources for this lesson here.
- ✅ Train and deploy a PyTorch model to solve a real-world problem.
- ✅ Analyse data which may occur in experimental drug prediction.
- ✅ Minimise a PyTorch model to predict optimal solutions outside the experimental range.
Time to get our hands dirty. In this video we'll be setting up a drug trial experiment and we want to use machine learning to model our data and inform our actions. Perfect time to pull out our newly devised neural network.
Experimental data
First port of call is to load in our experimental readings and see what we're dealing with. We'll do this with pandas:
import pandas as pd
df = pd.read_csv('discovery.csv')
df.info()
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   soluble1  5000 non-null   float64
 1   soluble2  5000 non-null   float64
 2   soluble3  5000 non-null   float64
 3   soluble4  5000 non-null   float64
 4   output    5000 non-null   float64
Our data table contains 5 columns, each with 5000 samples. There are four solubles, which constitute the input parameters to the experiment, and our measure of success is stored in the output: 'how much bacteria remains after applying our new concoction?'
Our goal, as scientists, is to minimise the output by choosing the correct solubles.
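For a quick feel of the ranges involved, we can also print summary statistics; a minimal sketch (the column selection is just a choice):
print(df.describe().T[['mean', 'std', 'min', 'max']])  # Ranges of each soluble and the output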
How can we solve this with neural networks? We'll do two things:
- We shall emulate the experimental findings of this table.
- We shall then use the package autograd-minimize to find the minimum of our emulator.
Inspecting the data
Before diving straight in, let's determine what we are working with. First of all, plotting histograms of the data variables will indicate whether there is any skew or other anomalous behaviour.
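Such a histogram can be produced in a couple of lines (a minimal sketch; the bin count is arbitrary):
import matplotlib.pyplot as plt
# Histogram of the output variable (cf. Fig. 1)
df['output'].hist(bins=50)
plt.xlabel('output')
plt.ylabel('count')
plt.show()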
Fig. 1. A histogram of the _output_ variable showing well distributed results.
Let's also assess if there are any linear correlations between our variables.
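Pandas computes the correlation matrix directly; here is one way to look at it (a sketch, with the heatmap styling purely a choice):
import matplotlib.pyplot as plt
# Pairwise Pearson correlations between all five columns (cf. Fig. 2)
corr = df.corr()
print(corr.round(2))
# Optional heatmap of the same matrix
plt.imshow(corr, vmin=-1, vmax=1, cmap='coolwarm')
plt.xticks(range(len(corr.columns)), corr.columns, rotation=45)
plt.yticks(range(len(corr.columns)), corr.columns)
plt.colorbar()
plt.show()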
Fig. 2. Correlation matrix between the variables.
Two things pop out at us from this table. Firstly, soluble 4 seems to be somewhat correlated with soluble 1. This tells us that we have not conducted our experiments so far in a completely uniform way. Hey-ho, we'll have to live with that. More interestingly, there seems to be a high negative correlation between soluble 4 (and hence 1) and the output.
Let's investigate this further by finding the line of least-squares best fit between them.
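One way to fit and draw this line, assuming the relationship of interest is between soluble4 and the output as Fig. 2 suggests (a sketch using numpy's polyfit):
import numpy as np
import matplotlib.pyplot as plt
# Least-squares straight line through (soluble4, output)
slope, intercept = np.polyfit(df['soluble4'], df['output'], deg=1)
plt.scatter(df['soluble4'], df['output'], s=2, alpha=0.3)
xs = np.linspace(df['soluble4'].min(), df['soluble4'].max(), 100)
plt.plot(xs, slope * xs + intercept, color='red', label='least-squares fit')
plt.xlabel('soluble4')
plt.ylabel('output')
plt.legend()
plt.show()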
Fig. 3. The linear relationship between the output and solution 4.
By eye, it seems that our model will get a long way with this dependency alone, so we expect it to be picked up by the network. One option we have is to de-trend the data, subtracting this red line from the output column, and learn the residuals. However, we expect that our neural network can handle this feature.
Building our model
Step one is to rescale our data. The purpose of this is to put each feature on a level pegging with the randomly initialised weights. The scaling shall become the first step in our model pipeline.
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()
X = df.drop(columns='output').to_numpy().copy()
X = scale.fit_transform(X)
y = df['output'].to_numpy().copy()
print('Means: ', scale.mean_)
print('Vars: ', scale.var_)
Means: [ 55.21510051 242.114565 23.30306534 41.5352574 ]
Vars: [1.65841110e+02 2.31802953e+03 1.84469320e-02 2.44658012e+02]
Build Dataset and DataLoader classes
Just as before, we construct Python data objects. We shuffle the data and take train, validation and test sets in the ratio 80:10:10.
import torch
from torch.utils.data import DataLoader, Dataset


class Data(Dataset):

    def __init__(self, X, y):
        """Load data into torch tensors of dtype float32."""
        self.X = torch.from_numpy(X).type(torch.float)
        self.y = torch.from_numpy(y).type(torch.float).reshape(-1, 1)
        self.len = self.X.shape[0]

    def __getitem__(self, idx):
        """Return data sample as (input, target) pair."""
        return self.X[idx], self.y[idx]

    def __len__(self):
        """Return number of samples."""
        return self.len
n_train = int(0.8 * len(X))
n_val = int(0.1 * len(X))
n_test = len(X) - n_train - n_val

data_train, data_val, data_test = torch.utils.data.random_split(
    Data(X, y), [n_train, n_val, n_test],
)

BATCH_SIZE = 10

# Instantiate loaders
loader_train = DataLoader(data_train, batch_size=BATCH_SIZE, shuffle=True)
loader_val = DataLoader(data_val, batch_size=n_val)
loader_test = DataLoader(data_test, batch_size=n_test)
Neural network architecture
Using the MLP architecture, we choose a very simple network for this problem: a single hidden layer of width 4.
Experiment at home: With the provided notebook, tinker with the architecture. Add more layers or increase the width and see what effect this has on model performance; one possible variant is sketched after the model definition below.
import torch.nn as nn

DIM_INPUT = data_train[0][0].shape[0]  # The size of the input vector
DIM_OUTPUT = 1
WIDTH = 4  # The number of nodes in each layer


# The multi-layer perceptron
class MLP(nn.Module):
    """Our PyTorch model for an MLP."""

    def __init__(self):
        super(MLP, self).__init__()
        # Construct linear connections
        self.layer_hid = nn.Linear(DIM_INPUT, WIDTH)
        self.layer_out = nn.Linear(WIDTH, DIM_OUTPUT)
        # Pick activations
        self.act = nn.Sigmoid()
        # Batch-normalise layers
        self.bn = nn.BatchNorm1d(WIDTH)

    def forward(self, x):
        """The journey forward."""
        x = self.layer_hid(x)
        x = self.bn(x)
        x = self.act(x)
        x = self.layer_out(x)
        return x
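As a starting point for the experiment-at-home above, here is one possible wider and deeper variant; the layer count and width here are arbitrary choices for you to play with, not the lesson's reference model:
# A wider, deeper MLP to experiment with (sizes purely illustrative)
class DeeperMLP(nn.Module):
    """Two hidden layers instead of one."""

    def __init__(self, width=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(DIM_INPUT, width),
            nn.BatchNorm1d(width),
            nn.Sigmoid(),
            nn.Linear(width, width),
            nn.BatchNorm1d(width),
            nn.Sigmoid(),
            nn.Linear(width, DIM_OUTPUT),
        )

    def forward(self, x):
        return self.net(x)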
Train the model
We use our training code from last time. However, we include an important new feature: regularisation. This is set via the weight_decay parameter, and it helps to temper the growth of the parameters and avoid divergence.
from tqdm import trange

model = MLP()  # Instantiate the network
loss_func = nn.MSELoss()  # Loss function
optimiser = torch.optim.SGD(model.parameters(), lr=0.001, weight_decay=1e-3)

NUM_EPOCHS = 50

# To store loss statistics
loss_train, loss_val = list(), list()

model.train()
for ep in trange(NUM_EPOCHS):
    running_loss = 0.0
    for ii, batch in enumerate(loader_train):
        inputs, targets = batch
        optimiser.zero_grad()  # Zero gradients in network
        preds = model(inputs)  # Make a forward pass
        loss = loss_func(preds, targets)  # Compute loss
        loss.backward()  # Backpropagate loss (compute gradients)
        optimiser.step()  # Update parameters
        # Record loss at this batch
        running_loss += loss.item()
    # Update epoch-level training loss
    loss_train.append(running_loss / n_train)

    # Test on validation set
    for ii, batch in enumerate(loader_val):
        inputs, targets = batch
        preds = model(inputs)  # Make a forward pass
        loss = loss_func(preds, targets)  # Compute loss
        loss_val.append(loss.item())
Training converges quickly with little overfitting.
Fig. 4. The loss plot for our training session.
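A plot like Fig. 4 can be reproduced from the lists we recorded during training (a minimal matplotlib sketch):
import matplotlib.pyplot as plt
# Training and validation loss per epoch (cf. Fig. 4)
plt.plot(loss_train, label='train')
plt.plot(loss_val, label='validation')
plt.xlabel('epoch')
plt.ylabel('MSE loss')
plt.legend()
plt.show()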
Visualise the results
With high-dimensional data (anything beyond one dimension!) we are constrained in our visualisation tools. But a one-size-fits-all plot is the predicted-vs.-truth plot, which compares the model's predictions with the corresponding ground-truth values. This is easy to read, as a perfect model sits on the diagonal, and we are able to measure any deviation with the score.
Fig. 5. Predicted-vs.-truth plot
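One way to produce such a plot on the held-out test set (a sketch; we switch the model to eval mode so that batch-norm uses its running statistics):
import matplotlib.pyplot as plt
# Predicted-vs.-truth on the test set (cf. Fig. 5)
model.eval()
with torch.no_grad():
    inputs, targets = next(iter(loader_test))
    preds = model(inputs)

plt.scatter(targets.numpy(), preds.numpy(), s=5, alpha=0.5)
lims = [targets.min().item(), targets.max().item()]
plt.plot(lims, lims, color='red')  # A perfect model sits on this diagonal
plt.xlabel('true output')
plt.ylabel('predicted output')
plt.show()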
Task 1: Improve this outcome! With better regularisation, a wider or deeper network, and longer training, apply your new techniques to improve the score.
Task 2: Can you extract the parameters (using model.parameters()) associated with soluble 4 on the input layer, to assess the weighting given to the linear relationship we found in fig. 3?
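As a hint for Task 2, the hidden-layer weights can be pulled out directly from the first linear layer; a sketch (the column index simply follows the order of the solubles in X):
# Hidden-layer weight matrix: shape (WIDTH, DIM_INPUT), one column per input
W_hidden = model.layer_hid.weight.detach()
print(W_hidden)

# Column 3 (0-indexed) holds the weights attached to soluble4
print(W_hidden[:, 3])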
Application: Optimise the emulator
Our goal was to build a function, which we called model, that can emulate the results of our experiment. Now we want to ask: which set of parameters minimises the output variable?
There is a great package called autograd-minimize which wraps SciPy's minimize routine but makes use of the gradient information carried by torch tensors.
Let's perform this optimisation now.
from autograd_minimize import minimize

X_init = list(loader_val)[0][0][0]

res = minimize(
    model,
    X_init.numpy().reshape(1, -1),
    backend='torch',
    bounds=(0, None),
    tol=1e-8,
)
The output from this routine gives the results of the minimisation:
message: CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
success: True
status: 0
fun: 82.90983581542969
x: [[ 7.159e-08 1.261e-08 1.923e+01 3.175e+01]]
nit: 4
jac: [-6.233e-06 -3.515e-06 1.863e-06 -2.502e-05]
nfev: 19
njev: 19
hess_inv: <4x4 LbfgsInvHessProduct with dtype=float64>
And we can pick out the optimal parameters as follows (taking care to back-project through our scale transformation):
scale.inverse_transform(res.x)
array([[ 55.21510143, 242.11456561, 25.9146459 , 538.17894934]])
Lo and behold, the soluble4 parameter has indeed been picked at the base of the linear slope given in fig. 3. But we also have values for the other parameters, which were not apparent from this one-dimensional relationship.
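As a quick sanity check (a sketch), pushing the optimal scaled point back through the emulator should return a value close to res.fun:
# Evaluate the emulator at the optimum found by autograd-minimize
x_opt = torch.tensor(res.x, dtype=torch.float)
model.eval()
with torch.no_grad():
    print(model(x_opt))  # Should be close to res.fun (~82.9 here)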