
by Dr Andy Corbett

Lesson

3. Coding Neurons from Scratch

📂 Resources

Download the resources for this lesson here.

In this video you will...
  • ✅ Learn how to code up neurons by hand.
  • ✅ Implement gradients and update parameters.
  • ✅ See the linear restriction of single neurons.
  • ✅ Toy with how larger MLPs are implemented.

The best way to explore a new idea is by doing. So let's do it! We know what a neural network is, so let's write some first-principles code to act it out. That's right: we'll code up a neural network using nothing but numpy.

The task


Here's a basic problem: a bakery has invested in a new machine to make its currant buns. It was working well: the mixture went in and the dough came out well proportioned, as a good bun should, until one day someone fiddled with the knobs. Since then, some of the buns have come out too tall and too wide.

Thankfully, the bakery recorded the height and width of each bun the machine produced, before and after.

They would like to recalibrate the controls on the machine, and they would like you to build a detector that checks whether its outputs are well proportioned, as they were 'before', or not, as they were 'after'.

Generate the data

Here's our codification of the problem:

import numpy as np
import matplotlib.pyplot as plt

SEED = 42  # value assumed here; any fixed integer gives reproducible data

# Generate a table of width and height data
np.random.seed(SEED)
X = np.random.uniform(low=[3.5, 2.0], high=[7.5, 5.0], size=[200, 2])

# Assign class based on thresholds
y = np.ones([len(X),])
y[(X[:, 0] > 5.5) & (X[:, 1] > 3.5)] = 0.  # Bun too tall and wide

Fig. 1. Our 2D data points spread over the plane. Our goal is to predict the target colour.
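For reference, a scatter plot like Fig. 1 can be produced with a few lines of matplotlib using the data generated above; this is a sketch, not necessarily the lesson's own plotting code.

good = plt.scatter(X[y == 1, 0], X[y == 1, 1], color='goldenrod')
bad = plt.scatter(X[y == 0, 0], X[y == 0, 1], color='tab:blue')
plt.xlabel('Bun Width (cm)')
plt.ylabel('Bun Height (cm)')
plt.legend([good, bad], ['Accept', 'Reject'])
plt.show()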

Step one: encode a single neuron


Our activation on each neuron is given by the logistic function:

def logistic(alpha):
    """Our activation function."""
    return 1 / (1 + np.exp(-alpha))

which looks as follows:

Fig. 2. The logistic activation function
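A curve like Fig. 2 can be reproduced by evaluating logistic on a grid of inputs; a quick sketch:

alpha = np.linspace(-6., 6., 200)
plt.plot(alpha, logistic(alpha))
plt.xlabel('alpha')
plt.ylabel('logistic(alpha)')
plt.show()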

Now we can define a class to control a single neuron.

class SingleNeuron:
    """A single percepton neuron."""

    def __init__(self, weights, bias, activation_func=logistic):
        self.weights = weights
        self.bias = bias
        self.activation = activation_func

    def forward(self, x):
        linear_sum = np.dot(x, self.weights) + self.bias
        return self.activation(linear_sum)

As a sanity check, let's run our problem through this model and see what it predicts.
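The code below assumes a neuron instance has been created first; the initial weights and bias here are illustrative (arbitrary values are fine for an untrained sanity check):

# Instantiate an (untrained) neuron with arbitrary initial parameters
neuron = SingleNeuron(weights=np.random.normal(size=[2,]), bias=0.)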

# Predictions
x_axis = np.linspace(3.5, 7.5, 1000)
y_axis = np.linspace(2.0, 5.0, 1000)
xx, yy = np.meshgrid(x_axis, y_axis)
zz = neuron.forward(np.stack((xx.ravel(), yy.ravel()), axis=1))
zz = zz.reshape(xx.shape)

# Contours
contours = plt.contour(xx, yy, zz, levels=100)
plt.colorbar(contours)

# Data
good = plt.scatter(X[y==1, 0], X[y==1, 1], color='goldenrod', zorder=2)
bad = plt.scatter(X[y==0, 0], X[y==0, 1], color='tab:blue', zorder=2)

plt.title('Bakery bun data')
plt.ylabel('Bun Height (cm)')
plt.xlabel('Bun Width (cm)')
plt.legend([good, bad], ['Accept', 'Reject'], loc='upper right', framealpha=1.)
plt.show()

Fig. 3. An untrained neuron predicting the output.

Step two: train the neuron


Now things begin to get tricky with our stubborn 'by hand' approach. To assess performance, we must define what we mean by 'good'. To keep things simple, we shall apply a mean-squared loss function between the output and the target. (Not really appropriate for a classification problem, but we'll look into this in more depth later.)

def mse(y_true, y_pred):
    """Mean-squared error between targets and predictions."""
    return ((y_true - y_pred) ** 2).mean()

For our backpropagation routine, we shall need the derivatives of both this function and the logistic function we defined earlier.

def logistic_derivative(x):
    """Derivative of the logistic function"""
    sigma = logistic(x)
    return sigma * (1 - sigma)

def mse_derivative(y_true, y_pred):
    """Derivative of the squared error with respect to the prediction."""
    return -2 * (y_true - y_pred)
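It is worth sanity-checking hand-written derivatives against a finite-difference approximation; this quick check is an optional extra, not part of the lesson code:

# Compare the analytic derivative with a central finite difference
alpha, eps = 0.3, 1e-6
numeric = (logistic(alpha + eps) - logistic(alpha - eps)) / (2 * eps)
print(np.isclose(numeric, logistic_derivative(alpha)))  # expect True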

Finally, we can add new methods to our SingleNeuron class so that we can train the parameters on the dataset. We shall do this with gradient descent over the full dataset (sampling mini-batches instead would give stochastic gradient descent), but other algorithms are available--the important point is that we update the parameters on each epoch.

class SingleNeuronGrads:
    """A single percepton neuron with gradients."""

    def __init__(self, weights, bias, activation_func=logistic):
        self.weights = weights
        self.bias = bias
        self.activation = activation_func

    def forward(self, x):
        linear_sum = np.dot(x, self.weights) + self.bias
        return self.activation(linear_sum)

    def get_gradients(self, x, y_true):
        """Computes gradient of MSE loss with respect to weights and bias."""
        y_pred = self.forward(x)
        dL_dy = mse_derivative(y_true, y_pred)
        dy_db = logistic_derivative(np.dot(x, self.weights) + self.bias)
        dy_dw = np.matmul(
            dy_db[:, np.newaxis, np.newaxis],
            x[:, np.newaxis, :],
        )
        dL_dw = np.mean(dL_dy[:, np.newaxis, np.newaxis] @ dy_dw, axis=0)
        dL_db = np.mean(dL_dy * dy_db)
        return dL_dw.squeeze(), dL_db

    def train(self, X_train, y_train, num_epochs=100, learning_rate=0.01):
        """Optimise parameters for training data."""
        loss_record = list()
        for ep in range(num_epochs):
            dL_dw, dL_db = self.get_gradients(X_train, y_train)
            self.weights -= learning_rate * dL_dw
            self.bias -= learning_rate * dL_db
            rmse = np.sqrt(mse(y_train, self.forward(X_train)))
            loss_record.append(rmse)
            print(f'Epoch {ep}: RMSE = {rmse:2.4f}')
        return loss_record
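As before, we need an instance of the trainable class; a minimal sketch, assuming random initial weights and zero bias:

# Instantiate the trainable neuron (initial values are illustrative)
neuron = SingleNeuronGrads(weights=np.random.normal(size=[2,]), bias=0.)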

Training is then as simple as

loss = neuron.train(X_train=X, y_train=y, num_epochs=100000)

Result

Well, how did it do? The problem we posed in Fig. 1 is almost linearly separable, so it should be easy to solve. But can our linear model find the answer?

Fig. 4. A trained neuron (i.e. a logistic regressor) solving the problem.

This could certainly be better. But at the very least, the parameters have re-oriented to point to the correct solution. To improve further, we shall need to extend the complexity of the model by adding more neurons. Let's do it!

Adding more neurons


Let's use our single-neuron class to code a complete multi-layer perceptron with two hidden layers, containing four and two neurons respectively, followed by a single output neuron.

class MultiLayerPerceptron:
    def __init__(self):
        # First hidden layer: four neurons, each taking the two input features
        self.n1 = SingleNeuron(np.random.normal(size=[2,]), 0.)
        self.n2 = SingleNeuron(np.random.normal(size=[2,]), 0.)
        self.n3 = SingleNeuron(np.random.normal(size=[2,]), 0.)
        self.n4 = SingleNeuron(np.random.normal(size=[2,]), 0.)

        # Second hidden layer: two neurons, each taking the four outputs above
        self.n5 = SingleNeuron(np.random.normal(size=[4,]), 0.)
        self.n6 = SingleNeuron(np.random.normal(size=[4,]), 0.)

        # Output layer: a single neuron taking the two outputs of the second layer
        self.n7 = SingleNeuron(np.random.normal(size=[2,]), 0.)

    def forward(self, x):

        # First hidden layer of width 4
        node11 = self.n1.forward(x)[:, np.newaxis]
        node12 = self.n2.forward(x)[:, np.newaxis]
        node13 = self.n3.forward(x)[:, np.newaxis]
        node14 = self.n4.forward(x)[:, np.newaxis]
        z1 = np.concatenate((node11, node12, node13, node14), axis=-1)

        # Second hidden layer of width 2
        node21 = self.n5.forward(z1)[:, np.newaxis]
        node22 = self.n6.forward(z1)[:, np.newaxis]
        z2 = np.concatenate((node21, node22), axis=-1)

        # Output layer
        return self.n7.forward(z2)

Now, deriving the gradients for this model is far more intricate. We shan't implement this here. But why would we? PyTorch has very efficient routines for implementing these algorithms. We'll explore this in the next videos.
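For a flavour of what is coming, the same architecture could be written in PyTorch roughly as below; this is only a sketch to show the shape of things, and the next videos cover the proper approach.

import torch.nn as nn

mlp_torch = nn.Sequential(
    nn.Linear(2, 4), nn.Sigmoid(),  # first hidden layer of width 4
    nn.Linear(4, 2), nn.Sigmoid(),  # second hidden layer of width 2
    nn.Linear(2, 1), nn.Sigmoid(),  # output layer
)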

In the meantime, let's take a look at the impact of constructing a deep MLP on the linearity of the model.
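Fig. 5 can be generated the same way as Fig. 3, swapping the single neuron for the MLP; a sketch, assuming the meshgrid arrays xx and yy defined earlier:

mlp = MultiLayerPerceptron()
zz = mlp.forward(np.stack((xx.ravel(), yy.ravel()), axis=1)).reshape(xx.shape)
contours = plt.contour(xx, yy, zz, levels=100)
plt.colorbar(contours)
plt.show()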

Fig. 5. An untrained multi-layer perceptron exhibiting non-linear behaviour.