by Dr Andy Corbett
9. DNNs in the Wild for Drug Discovery
Download the resources for this lesson here.
- ✅ Train and deploy a PyTorch model to solve a real-world problem.
- ✅ Analyse data which may occur in experimental drug prediction.
- ✅ Minimise a PyTorch model to predict optimal solutions outside the experimental range.
Time to get our hands dirty. In this video we'll be setting up a drug trial experiment and we want to use machine learning to model our data and inform our actions. Perfect time to pull out our newly devised neural network.
Experimental data
First port of call is to load in our experimental readings and see what we're dealing with. We'll do this with pandas:
import pandas as pd
df = pd.read_csv('discovery.csv')
df.info()
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   soluble1  5000 non-null   float64
 1   soluble2  5000 non-null   float64
 2   soluble3  5000 non-null   float64
 3   soluble4  5000 non-null   float64
 4   output    5000 non-null   float64
Our data table contains 5 columns, each with 5000 samples. There are four solubles, which constitute the input parameters to the experiment, and our measure of success is stored in the output: 'how much bacteria remains after applying our new concoction?'
Our goal, as scientists, is to minimise the output by choosing the correct solubles.
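For a quick feel of the ranges involved, we can also print summary statistics; a minimal sketch (the column selection is just a choice):
print(df.describe().T[['mean', 'std', 'min', 'max']])  # Ranges of each soluble and the output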
How can we solve this with neural networks? We'll do two things:
- We shall emulate the experimental findings of this table.
- We shall then use the package autograd-minimize to find the minimum of our emulator.
Inspecting the data
Before diving straight in, let's determine what we are working with. First of all, plotting histograms of the data variables will indicate whether there is any skew or other anomalous behaviour.
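Such a histogram can be produced in a couple of lines (a minimal sketch; the bin count is arbitrary):
import matplotlib.pyplot as plt
# Histogram of the output variable (cf. Fig. 1)
df['output'].hist(bins=50)
plt.xlabel('output')
plt.ylabel('count')
plt.show()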
Fig. 1. A histogram of the _output_ variable showing well distributed results.
Let's also assess if there are any linear correlations between our variables.
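Pandas computes the correlation matrix directly; here is one way to look at it (a sketch, with the heatmap styling purely a choice):
import matplotlib.pyplot as plt
# Pairwise Pearson correlations between all five columns (cf. Fig. 2)
corr = df.corr()
print(corr.round(2))
# Optional heatmap of the same matrix
plt.imshow(corr, vmin=-1, vmax=1, cmap='coolwarm')
plt.xticks(range(len(corr.columns)), corr.columns, rotation=45)
plt.yticks(range(len(corr.columns)), corr.columns)
plt.colorbar()
plt.show()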
Fig. 2. Correlation matrix between the variables.
Two things pop out at us from this table. Firstly, soluble 4 seems to be somewhat correlated with soluble 1. This tells us that we have not conducted our experiments so far in a completely uniform way. Hey-ho, we'll have to live with that. More interestingly, there seems to be a high negative correlation between soluble 4 (and hence 1) and the output.
Let's investigate this further by finding the line of least-squares best fit between them.
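One way to fit and draw this line, assuming the relationship of interest is between soluble4 and the output as Fig. 2 suggests (a sketch using numpy's polyfit):
import numpy as np
import matplotlib.pyplot as plt
# Least-squares straight line through (soluble4, output)
slope, intercept = np.polyfit(df['soluble4'], df['output'], deg=1)
plt.scatter(df['soluble4'], df['output'], s=2, alpha=0.3)
xs = np.linspace(df['soluble4'].min(), df['soluble4'].max(), 100)
plt.plot(xs, slope * xs + intercept, color='red', label='least-squares fit')
plt.xlabel('soluble4')
plt.ylabel('output')
plt.legend()
plt.show()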
Fig. 3. The linear relationship between the output and solution 4.
By eye, it seems that our model will get a long way with this dependency alone, so we expect it to be picked up by the network. One option we have is to de-trend the data, subtracting this red line from the output column, and learn the residuals. However, we expect that our neural network can handle this feature.
Building our model
Step one is to rescale our data. The purpose of this is to put each feature on a level pegging with the randomly initialised weights. The scaling shall become the first step in our model pipeline.
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()
X = df.drop(columns='output').to_numpy().copy()
X = scale.fit_transform(X)
y = df['output'].to_numpy().copy()
print('Means: ', scale.mean_)
print('Vars: ', scale.var_)
Means: [ 55.21510051 242.114565 23.30306534 41.5352574 ]
Vars: [1.65841110e+02 2.31802953e+03 1.84469320e-02 2.44658012e+02]
Build Dataset and DataLoader classes
Just as before, we construct Python data objects. We shuffle the data and take train, validation and test sets in the ratio 80:10:10.
import torch
from torch.utils.data import DataLoader, Dataset


class Data(Dataset):

    def __init__(self, X, y):
        """Load data into torch tensors of dtype float32."""
        self.X = torch.from_numpy(X).type(torch.float)
        self.y = torch.from_numpy(y).type(torch.float).reshape(-1, 1)
        self.len = self.X.shape[0]

    def __getitem__(self, idx):
        """Return data sample as (input, target) pair."""
        return self.X[idx], self.y[idx]

    def __len__(self):
        """Return number of samples."""
        return self.len
n_train = int(0.8 * len(X))
n_val = int(0.1 * len(X))
n_test = len(X) - n_train - n_val

data_train, data_val, data_test = torch.utils.data.random_split(
    Data(X, y), [n_train, n_val, n_test],
)

BATCH_SIZE = 10

# Instantiate loaders
loader_train = DataLoader(data_train, batch_size=BATCH_SIZE, shuffle=True)
loader_val = DataLoader(data_val, batch_size=n_val)
loader_test = DataLoader(data_test, batch_size=n_test)
Neural network architecture
Using the MLP architecture, we choose a very simple network for this problem: a single hidden layer of width 4.
Experiment at home: With the provided notebook, tinker with the architecture. Add more layers or increase the width and see what effect this has on model performance; one possible variant is sketched after the model definition below.
import torch.nn as nn

DIM_INPUT = data_train[0][0].shape[0]  # The size of the input vector
DIM_OUTPUT = 1
WIDTH = 4  # The number of nodes in each layer


# The multi-layer perceptron
class MLP(nn.Module):
    """Our PyTorch model for an MLP."""

    def __init__(self):
        super(MLP, self).__init__()
        # Construct linear connections
        self.layer_hid = nn.Linear(DIM_INPUT, WIDTH)
        self.layer_out = nn.Linear(WIDTH, DIM_OUTPUT)
        # Pick activations
        self.act = nn.Sigmoid()
        # Batch-normalise layers
        self.bn = nn.BatchNorm1d(WIDTH)

    def forward(self, x):
        """The journey forward."""
        x = self.layer_hid(x)
        x = self.bn(x)
        x = self.act(x)
        x = self.layer_out(x)
        return x
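As a starting point for the experiment-at-home above, here is one possible wider and deeper variant; the layer count and width here are arbitrary choices for you to play with, not the lesson's reference model:
# A wider, deeper MLP to experiment with (sizes purely illustrative)
class DeeperMLP(nn.Module):
    """Two hidden layers instead of one."""

    def __init__(self, width=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(DIM_INPUT, width),
            nn.BatchNorm1d(width),
            nn.Sigmoid(),
            nn.Linear(width, width),
            nn.BatchNorm1d(width),
            nn.Sigmoid(),
            nn.Linear(width, DIM_OUTPUT),
        )

    def forward(self, x):
        return self.net(x)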
Train the model
We use our training code from last time. However, we include an important new feature: regularisation. This is set via the weight_decay parameter, and it helps to temper the growth of the parameters and avoid divergence.
from tqdm import trange

model = MLP()  # Instantiate the network
loss_func = nn.MSELoss()  # Loss function
optimiser = torch.optim.SGD(model.parameters(), lr=0.001, weight_decay=1e-3)

NUM_EPOCHS = 50

# To store loss statistics
loss_train, loss_val = list(), list()

model.train()
for ep in trange(NUM_EPOCHS):
    running_loss = 0.0
    for ii, batch in enumerate(loader_train):
        inputs, targets = batch
        optimiser.zero_grad()  # Zero gradients in network
        preds = model(inputs)  # Make a forward pass
        loss = loss_func(preds, targets)  # Compute loss
        loss.backward()  # Backpropagate loss (compute gradients)
        optimiser.step()  # Update parameters
        # Record loss at this batch
        running_loss += loss.item()
    # Update epoch-level training loss
    loss_train.append(running_loss / n_train)

    # Test on validation set
    for ii, batch in enumerate(loader_val):
        inputs, targets = batch
        preds = model(inputs)  # Make a forward pass
        loss = loss_func(preds, targets)  # Compute loss
        loss_val.append(loss.item())
Training converges quickly with little overfitting.
Fig. 4. The loss plot for our training session.
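A plot like Fig. 4 can be reproduced from the lists we recorded during training (a minimal matplotlib sketch):
import matplotlib.pyplot as plt
# Training and validation loss per epoch (cf. Fig. 4)
plt.plot(loss_train, label='train')
plt.plot(loss_val, label='validation')
plt.xlabel('epoch')
plt.ylabel('MSE loss')
plt.legend()
plt.show()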
Visualise the results
With high-dimensional data (anything beyond one dimension!) we are constrained in our visualisation tools. But a one-size-fits-all plot is the predicted-vs.-truth plot, which compares the model's predictions with the corresponding ground-truth values. This is easy to read, as a perfect model sits on the diagonal, and we are able to measure any deviation with the score.
Fig. 5. Predicted-vs.-truth plot
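One way to produce such a plot on the held-out test set (a sketch; we switch the model to eval mode so that batch-norm uses its running statistics):
import matplotlib.pyplot as plt
# Predicted-vs.-truth on the test set (cf. Fig. 5)
model.eval()
with torch.no_grad():
    inputs, targets = next(iter(loader_test))
    preds = model(inputs)

plt.scatter(targets.numpy(), preds.numpy(), s=5, alpha=0.5)
lims = [targets.min().item(), targets.max().item()]
plt.plot(lims, lims, color='red')  # A perfect model sits on this diagonal
plt.xlabel('true output')
plt.ylabel('predicted output')
plt.show()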
Task 1: Improve this outcome! With better regularisation, a wider or deeper network, and longer training, apply your new techniques to improve the score.
Task 2: Can you extract the parameters (using model.parameters()) associated with soluble 4 on the input layer, to assess the weighting given to the linear relationship we found in fig. 3?
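As a hint for Task 2, the hidden-layer weights can be pulled out directly from the first linear layer; a sketch (the column index simply follows the order of the solubles in X):
# Hidden-layer weight matrix: shape (WIDTH, DIM_INPUT), one column per input
W_hidden = model.layer_hid.weight.detach()
print(W_hidden)

# Column 3 (0-indexed) holds the weights attached to soluble4
print(W_hidden[:, 3])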
Application: Optimise the emulator
Our goal was to build a function, which we called model, that can emulate the results of our experiment. Now we want to ask: which set of parameters minimises the output variable?
There is a great package called autograd-minimize which wraps SciPy's minimize routine but makes use of the gradient information carried by torch tensors.
Let's perform this optimisation now.
from autograd_minimize import minimize

X_init = list(loader_val)[0][0][0]

res = minimize(
    model,
    X_init.numpy().reshape(1, -1),
    backend='torch',
    bounds=(0, None),
    tol=1e-8,
)
The output from this routine gives the results of the minimisation:
message: CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
success: True
status: 0
fun: 82.90983581542969
x: [[ 7.159e-08 1.261e-08 1.923e+01 3.175e+01]]
nit: 4
jac: [-6.233e-06 -3.515e-06 1.863e-06 -2.502e-05]
nfev: 19
njev: 19
hess_inv: <4x4 LbfgsInvHessProduct with dtype=float64>
And we can pick out the optimal parameters as follows (taking care to back-project through our scale transformation):
scale.inverse_transform(res.x)
array([[ 55.21510143, 242.11456561, 25.9146459 , 538.17894934]])
Lo and behold, the soluble4 parameter has indeed been picked at the base of the linear slope given in fig. 3. But we also have values for the other parameters, which were not apparent from this one-dimensional relationship.
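As a quick sanity check (a sketch), pushing the optimal scaled point back through the emulator should return a value close to res.fun:
# Evaluate the emulator at the optimum found by autograd-minimize
x_opt = torch.tensor(res.x, dtype=torch.float)
model.eval()
with torch.no_grad():
    print(model(x_opt))  # Should be close to res.fun (~82.9 here)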