by Dr Mikkel Lykkegaard

Lesson

Bayesian Linear Regression and Parameter Estimation

8. Bayesian Parameter Estimation: Walkthrough with COVID Example

📂 Resources

Download the resources for this lesson here.

📑 Learning Objectives
  • Use ordinary least squares regression on a log-transformed variable to get an exponential model.
  • Define the forward operator of a Bayesian parameter estimation problem.
  • Set up Bayesian prior distributions for discrete and continuous variables and set up a likelihood function.
  • Write a simple MCMC sampler that samples from the posterior distribution.

Bayesian Parameter Estimation: Walkthrough with COVID Example


In this example, we are going to model COVID cases in the UK in the period 1-15 March 2020, just after the first cases were detected in the country. Let's first load the data and do a few transformations.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
from tqdm import tqdm

# read the data from the CSV and convert the date into a datetime
data = pd.read_csv('covid.csv')
data['date'] = pd.to_datetime(data['date'])

# create a new variable, which is days since the first measurement
data['day'] = (data['date'] - data['date'].iloc[0]).dt.days

We are now going to perform ordinary least-squares (OLS) regression on the log-transformed output variable, the number of cases, i.e. $\log y = \beta_0 + \beta_1 t$. We are going to transform the OLS coefficients back into the original space so that $y(t) = e^{\beta_0} e^{\beta_1 t} = y_0 e^{r t}$. Then we will plot the model along with the datapoints.

# find the least squares estimate for the log-transformed output variable
lr = np.linalg.lstsq(np.c_[np.ones(len(data)), data['day']], np.log(data['cases']), rcond=None)
print(lr)
(array([3.59852482, 0.26155652]), array([0.19268671]), 2, array([32.02976831,  2.02334923]))

# convert the coefficients back into the original space
y0 = np.exp(lr[0][0])
r = lr[0][1]

# compute the response of the model according to the least squares estimate
t = np.linspace(0,14)
y = y0 * np.exp(r*t)

# plot the model response and the data
data.plot.scatter('day', 'cases', c='r')
plt.plot(t, y)
plt.show()
[Plot: exponential OLS fit overlaid on the reported daily cases]
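One convenient way to interpret the fitted rate is as a doubling time: since $y(t) = y_0 e^{r t}$, the case count doubles every $\ln(2)/r$ days. A quick check using the OLS estimate of r from above:

# the case count doubles every ln(2)/r days; with r ≈ 0.26 this comes out to roughly 2.7 days
print(np.log(2) / r)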

The fit looks pretty good. However, this is just the OLS estimate and, as such, a point estimate. Not all the datapoints are well described by the model. Using the Bayesian formalism that we looked at in the previous lesson, we can instead find the posterior distribution of the model parameters and use it to evaluate all the possible scenarios. We will first extract the data from the dataframe, then define the forward model $\mathcal F(\theta)$, and then set up some priors for our parameters $\theta = (y_0, r)$.

# extract the data as NumPy arrays
x_data = data.day.values
y_data = data.cases.values

# set up the model
def model(y0, r):
    return (y0*np.exp(r*x_data)).astype(int)

# define the priors
prior_y0 = stats.poisson(36)              # poisson prior for the initial value
prior_r = stats.lognorm(s=0.5, scale=0.2) # lognorm prior for the rate

# plot the priors
fig, ax = plt.subplots(figsize=(12,4), nrows=1, ncols=2)

x = np.arange(0,60)
p = prior_y0.pmf(x)
ax[0].set_xlabel('$y_0$')
ax[0].set_ylabel('$p(y_0)$')
ax[0].scatter(x, p)

x = np.linspace(0, 1)
p = prior_r.pdf(x)
ax[1].set_xlabel('$r$')
ax[1].set_ylabel('$p(r)$')
ax[1].plot(x, p)

plt.show()
[Plots: Poisson prior pmf for y0 (left) and log-normal prior pdf for r (right)]

These are the priors, and they encode the soft knowledge we have about the parameters before conditioning on the data. Now let's try and draw some samples from the prior, pass them through our forward model and plot the result against the data.

# plot some draws from the prior and run them through the model
for i in range(1000):
    y0 = prior_y0.rvs()
    r = prior_r.rvs()
    y = model(y0, r)
    plt.plot(x_data, y, c='k', alpha=0.01)

# plot the data along with the prior draws
plt.scatter(x_data, y_data, c='r')
plt.ylim(0,2000)
plt.show()
[Plot: 1,000 prior draws passed through the model, overlaid on the data]

That looks pretty good. It is clear that all the samples follow the exponential model, but they are more dispersed than the data would suggest. We could maybe allow for more flexibility in the $y_0$ prior, but let's keep it like this for simplicity. Now let's define the joint prior and the likelihood function.

# define the joint prior (on the log scale), assuming y0 and r are independent a priori
def prior(x):
    return sum([prior_y0.logpmf(x[0]), prior_r.logpdf(x[1])])

# define the likelihood (on the log scale)
def likelihood(x):
    # we are using a negative binomial likelihood since it is a bit more flexible than
    # the Poisson distribution: it also allows us to model the variance independently of
    # the mean. It is typically used as a kind of overdispersed Poisson.
    return stats.nbinom.logpmf(k=0, n=np.abs(model(x[0], x[1]) - y_data)+1, p=0.99).sum()
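To see why this works as an error model, note that for k = 0 the negative binomial pmf reduces to $p^n$, so with $n = |\epsilon| + 1$ each data point contributes $0.99^{|\epsilon| + 1}$ to the likelihood, i.e. a geometric decay in the absolute miscount $\epsilon$. A quick numerical sketch of that identity, using the same scipy.stats import as above:

# at k = 0 the negative binomial pmf is simply p**n, i.e. a geometric decay in n
for n in [1, 11, 101]:
    print(stats.nbinom.pmf(k=0, n=n, p=0.99), 0.99**n)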

Let's also plot the likelihood function for a range of inputs.

# plot the likelihood function
x = np.arange(-300, 300)
p = stats.nbinom.pmf(k=0, n=np.abs(x)+1, p=0.99)
plt.scatter(x, p, s=1)
plt.show()

[Plot: negative binomial likelihood as a function of the (absolute) miscount]

What this shows us is that the likelihood is highest for $\epsilon = 0$, that is, zero miscounts. It then decays slowly as we increase the number of (absolute) miscounts. It is still pretty likely to have e.g. 100 miscounts on any given day of the timeseries that we are modelling. Let's plot the joint prior and the likelihood function evaluated on the actual data.

# evaluate the joint prior and likelihood for our problem.
y0_range = np.arange(10,60)
r_range = np.linspace(0.1, 0.5)
y0_grid, r_grid = np.meshgrid(y0_range, r_range)

prior_eval = np.zeros(y0_grid.shape)
likelihood_eval = np.zeros(y0_grid.shape)

for i in range(r_range.size):
    for j in range(y0_range.size):
        prior_eval[i,j] = prior([y0_grid[i,j], r_grid[i,j]])
        likelihood_eval[i,j] = likelihood([y0_grid[i,j], r_grid[i,j]])

# plot the joint prior and the likelihood function.
fig, ax = plt.subplots(figsize=(12,4), nrows=1, ncols=2)

ax[0].set_title('Prior')
y0_plot = ax[0].imshow(np.exp(prior_eval), extent=(10, 60, 0.1, 0.5), origin='lower', aspect='auto')
ax[0].set_xlabel('$y_0$')
ax[0].set_ylabel('$r$')
plt.colorbar(y0_plot, ax=ax[0])

ax[1].set_title('Likelihood')
r_plot = ax[1].imshow(np.exp(likelihood_eval), extent=(10, 60, 0.1, 0.5), origin='lower', aspect='auto')
ax[1].set_xlabel('$y_0$')
ax[1].set_ylabel('$r$')
plt.colorbar(r_plot, ax=ax[1])

plt.show()
[Plots: joint prior (left) and likelihood (right) evaluated over the (y0, r) grid]

Here we see that the joint prior is pretty diffuse and permits most values across this range of parameters. The likelihood, however, is very concentrated along a narrow ridge in this parameter space. Let's set up an MCMC sampler and draw some samples from the posterior distribution. We are then going to plot the posterior samples.

# set up a proposal
def proposal(x):

    # set the step mechanism for each parameter
    y0_dx = int(stats.norm(0, 3).rvs()) # zero-centered normal cast as int for y0
    r_dx = stats.norm(0, 0.01).rvs()    # zero-centered normal for r

    # create a vector of the steps
    dx = np.array([y0_dx, r_dx])

    # compute the proposal
    p = x + dx

    # return the proposal
    return p

# set an empty list for the samples
samples = []

# draw the first MCMC state from the prior
y0_0 = prior_y0.rvs()
r_0 = prior_r.rvs()
samples.append(np.array([y0_0, r_0]))

# iterate through the MCMC
for i in tqdm(range(12000)):

    # create a proposal from the current state
    p = proposal(samples[-1])

    # compute the acceptance probability (the proposal is symmetric, so the prior and likelihood ratios suffice)
    alpha = np.exp(prior(p) + likelihood(p) - prior(samples[-1]) - likelihood(samples[-1]))

    # accept or reject the proposal
    if np.random.uniform() < alpha:
        samples.append(p)
    else:
        samples.append(samples[-1])

# create numpy array and remove burnin
samples = np.array(samples)[2000:,:]

# plot the posterior samples
plt.scatter(samples[:,0], samples[:,1], c='k', alpha=0.01)
plt.show()
[Plot: posterior samples of y0 and r]

As we can see, the posterior samples are fairly close to the likelihood that we plotted earlier. This is not so surprising, since the prior is relatively diffuse and so the likelihood function dominates the posterior. Here, we can also clearly see the bands of $y_0$ on the x-axis, which occur because we model that variable as an integer. Finally, let's run some posterior samples through the model and do some forecasting.
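Before doing that, it can be useful to summarise the marginal posteriors numerically. The snippet below is a minimal sketch, assuming the samples array produced by the MCMC loop above:

# posterior means and 95% credible intervals for y0 and r
for name, values in zip(['y0', 'r'], samples.T):
    lower, upper = np.percentile(values, [2.5, 97.5])
    print(f'{name}: mean = {values.mean():.3f}, 95% interval = [{lower:.3f}, {upper:.3f}]')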

# create a new time vector to do some forecasting
t = np.linspace(0, 20)

# draw 1000 samples from the posterior
for i in range(1000):
    idx = np.random.randint(0, samples.shape[0])
    y = samples[idx,0]*np.exp(samples[idx,1]*t)

    # plot each posterior sample
    plt.plot(t, y, c='k', alpha=0.01, zorder=0)

# plot the data along with the posterior samples
plt.scatter(x_data, y_data, c='r')
plt.show()
[Plot: posterior forecasts up to day 20, overlaid on the data]

Here, we have cranked the time horizon up to day 20, so we predict 6 days into the future. Since we are using a probabilistic (Bayesian) model, we also get the uncertainty of the forecast. This tells us that the uncertainty increases the further we try to predict, and after only 6 days into the future, there could be anything between 4000 and 8000 cases. This is a really important takeaway, since it allows decision makers such as hospitals to make preparations for the worst-case scenario. Using the Bayesian formalism, obtaining such worst-case estimates is straightforward and relatively painless.
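To put numbers on that forecast uncertainty, we can evaluate the model at a single future day for every posterior sample and compute a credible interval. The snippet below is a minimal sketch, assuming the samples array from the MCMC above; the exact interval will vary from run to run:

# evaluate the model at day 20 for every posterior sample
day = 20
forecast = samples[:, 0] * np.exp(samples[:, 1] * day)

# summarise the forecast with the median and a 95% credible interval
lower, upper = np.percentile(forecast, [2.5, 97.5])
print(f'day {day}: median = {np.median(forecast):.0f} cases, 95% interval = [{lower:.0f}, {upper:.0f}]')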