by Dr Mikkel Lykkegaard
Bayesian Linear Regression and Parameter Estimation
8. Bayesian Parameter Estimation: Walkthrough with COVID Example
Download the resources for this lesson here.
- Use ordinary least squares regression on a log-transformed variable to get an exponential model.
- Define the forward operator of a Bayesian parameter estimation problem.
- Set up Bayesian prior distributions for discrete and continuous variables and set up a likelihood function.
- Write a simple MCMC sampler that samples from the posterior distribution.
In this example, we are going to model COVID cases in the UK in the period 1-15 March 2020, just after the first cases were detected in the country. Let's first load the data and do a few transformations.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
from tqdm import tqdm
# read the data from the CSV and convert the date into a datetime
data = pd.read_csv('covid.csv')
data['date'] = pd.to_datetime(data['date'])
# create a new variable, which is days since the first measurement
data['day'] = (data['date'] - data['date'].iloc[0]).dt.days
We are now going to perform ordinary least-squares (OLS) regression on the log-transformed output variable, the number of cases, i.e. $\log(y) = \log(y_0) + r t$. We are going to transform the OLS coefficients back into the original space so that $y = y_0 e^{rt}$. Then we will plot the model along with the datapoints.
# find the least squares estimate for the log-transformed output variable
lr = np.linalg.lstsq(np.c_[np.ones(len(data)), data['day']], np.log(data['cases']), rcond=None)
print(lr)
(array([3.59852482, 0.26155652]), array([0.19268671]), 2, array([32.02976831, 2.02334923]))
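To make the output concrete: the first array returned by np.linalg.lstsq holds the intercept and slope of the log-linear fit, so the back-transformed parameters are approximately $y_0 = e^{3.5985} \approx 36.5$ and $r \approx 0.262$, which is exactly what the next snippet computes.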
# convert the coefficients back into the original space
y0 = np.exp(lr[0][0])
r = lr[0][1]
# compute the response of the model according to the least squares estimate
t = np.linspace(0,14)
y = y0 * np.exp(r*t)
# plot the model response and the data
data.plot.scatter('day', 'cases', c='r')
plt.plot(t, y)
plt.show()
That looks pretty good. However, this is just the OLS estimate and it is, as such, a point estimate. Not all the datapoints are well-described by the model. But using the Bayesian formalism that we looked at in the previous lesson, we can instead find the posterior distribution of the model parameters, and use that to evaluate all the possible scenarios. We will first extract the data from the dataframe, then we will define the forward model, and then we will set up some priors for our parameters $y_0$ and $r$.
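As a quick reminder of the formalism from the previous lesson: the forward model maps the parameters to predicted daily case counts, $F(y_0, r)_i = y_0 e^{r t_i}$ (cast to an integer), and, assuming independent priors on $y_0$ and $r$, the posterior we are after is

$p(y_0, r \mid \mathbf{y}) \propto p(\mathbf{y} \mid y_0, r) \, p(y_0) \, p(r),$

where $p(\mathbf{y} \mid y_0, r)$ is the likelihood of the observed counts $\mathbf{y}$, and $p(y_0)$ and $p(r)$ are the priors we define below.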
# extract the data as NumPy arrays
x_data = data.day.values
y_data = data.cases.values
# set up the model
def model(y0, r):
    return (y0*np.exp(r*x_data)).astype(int)
# define the priors
prior_y0 = stats.poisson(36) # poisson prior for the initial value
prior_r = stats.lognorm(s=0.5, scale=0.2) # lognorm prior for the rate
# plot the priors
fig, ax = plt.subplots(figsize=(12,4), nrows=1, ncols=2)
x = np.arange(0,60)
p = prior_y0.pmf(x)
ax[0].set_xlabel('$y_0$')
ax[0].set_ylabel('$p(y_0)$')
ax[0].scatter(x, p)
x = np.linspace(0, 1)
p = prior_r.pdf(x)
ax[1].set_xlabel('$r$')
ax[1].set_ylabel('$p(r)$')
ax[1].plot(x, p)
plt.show()
These are the priors, and they encode the soft knowledge we have about the parameters before conditioning on the data. Now let's try and draw some samples from the prior, pass them through our forward model and plot the result against the data.
# plot some draws from the prior and run them through the model
for i in range(1000):
    y0 = prior_y0.rvs()
    r = prior_r.rvs()
    y = model(y0, r)
    plt.plot(x_data, y, c='k', alpha=0.01)
# plot the data along with the prior draws
plt.scatter(x_data, y_data, c='r')
plt.ylim(0,2000)
plt.show()
That looks pretty good. It is clear that all the samples follow the exponential model, but they are more dispersed than the data would suggest. We could perhaps refine the prior, but let's keep it like this for simplicity. Now let's define the joint prior and the likelihood function.
# define the joint prior (this returns the log-density)
def prior(x):
    return sum([prior_y0.logpmf(x[0]), prior_r.logpdf(x[1])])
# define the likelihood (this returns the log-likelihood)
def likelihood(x):
    # we are using a negative binomial likelihood since it is a bit more flexible than
    # the Poisson distribution: it allows us to model the variance independently of the
    # mean. It is typically used as a kind of overdispersed Poisson.
    return stats.nbinom.logpmf(k=0, n=np.abs(model(x[0], x[1]) - y_data)+1, p=0.99).sum()
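To see what this likelihood is doing, note that the negative binomial pmf evaluated at $k = 0$ reduces to $p^n$. With $n = |\hat{y}_i - y_i| + 1$ and $p = 0.99$, each data point therefore contributes

$\log p(y_i \mid y_0, r) = (|\hat{y}_i - y_i| + 1) \log(0.99),$

where $\hat{y}_i$ is the model prediction for day $i$, so the log-likelihood penalises the absolute miscount linearly.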
Let's also plot the likelihood function for a range of inputs.
# plot the likelihood function
x = np.arange(-300, 300)
p = stats.nbinom.pmf(k=0, n=np.abs(x)+1, p=0.99)
plt.scatter(x, p, s=1)
plt.show()
What this shows us is that the likelihood is highest at zero miscounts, i.e. when the model output matches the data exactly. It then decays slowly as we increase the number of (absolute) miscounts. It is still pretty likely to have e.g. 100 miscounts on any given day of the timeseries that we are modelling. Let's plot the joint prior and the likelihood function evaluated on the actual data.
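As a quick sanity check of that claim, we can compare the likelihood of a perfect match on a single day with that of 100 miscounts (the commented values are rounded):

# likelihood of a perfect match vs. 100 miscounts on a single day
print(stats.nbinom.pmf(k=0, n=1, p=0.99))    # ~0.99
print(stats.nbinom.pmf(k=0, n=101, p=0.99))  # ~0.36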
# evaluate the joint prior and likelihood for our problem.
y0_range = np.arange(10,60)
r_range = np.linspace(0.1, 0.5)
y0_grid, r_grid = np.meshgrid(y0_range, r_range)
prior_eval = np.zeros(y0_grid.shape)
likelihood_eval = np.zeros(y0_grid.shape)
for i in range(r_range.size):
    for j in range(y0_range.size):
        prior_eval[i,j] = prior([y0_grid[i,j], r_grid[i,j]])
        likelihood_eval[i,j] = likelihood([y0_grid[i,j], r_grid[i,j]])
# plot the joint prior and the likelihood function.
fig, ax = plt.subplots(figsize=(12,4), nrows=1, ncols=2)
ax[0].set_title('Prior')
y0_plot = ax[0].imshow(np.exp(prior_eval), extent=(y0_range.min(), y0_range.max(), r_range.min(), r_range.max()), origin='lower', aspect='auto')
ax[0].set_xlabel('$y_0$')
ax[0].set_ylabel('$r$')
plt.colorbar(y0_plot, ax=ax[0])
ax[1].set_title('Likelihood')
r_plot = ax[1].imshow(np.exp(likelihood_eval), extent=(y0_range.min(), y0_range.max(), r_range.min(), r_range.max()), origin='lower', aspect='auto')
ax[1].set_xlabel('$y_0$')
ax[1].set_ylabel('$r$')
plt.colorbar(r_plot, ax=ax[1])
plt.show()
Here we see that the joint prior is pretty diffuse and permits most values across this range of parameters. The likelihood, however, is very concentrated along a narrow ridge in this parameter space. Let's set up an MCMC sampler and draw some samples from the posterior distribution. We are then going to plot the posterior samples.
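The sampler below is a plain Metropolis algorithm: since the random-walk proposal is symmetric, a proposed state $x'$ is accepted over the current state $x$ with probability

$\alpha = \min\left(1, \frac{p(x') L(x')}{p(x) L(x)}\right),$

where $p$ is the prior and $L$ the likelihood. This is exactly the ratio the loop computes in log space before exponentiating; the min is implicit, since any $\alpha \geq 1$ is always accepted.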
# set up a proposal
def proposal(x):
    # set the step mechanism for each parameter
    y0_dx = int(stats.norm(0, 3).rvs())  # zero-centered normal cast as int for y0
    r_dx = stats.norm(0, 0.01).rvs()  # zero-centered normal for r
    # create a vector of the steps
    dx = np.array([y0_dx, r_dx])
    # compute the proposal
    p = x + dx
    # return the proposal
    return p
# set an empty list for the samples
samples = []
# draw the first MCMC state from the prior
y0_0 = prior_y0.rvs()
r_0 = prior_r.rvs()
samples.append(np.array([y0_0, r_0]))
# iterate through the MCMC
for i in tqdm(range(12000)):
    # create a proposal from the current state
    p = proposal(samples[-1])
    # compute the acceptance probability
    alpha = np.exp(prior(p) + likelihood(p) - prior(samples[-1]) - likelihood(samples[-1]))
    # accept or reject the proposal
    if np.random.uniform() < alpha:
        samples.append(p)
    else:
        samples.append(samples[-1])
# create numpy array and remove burnin
samples = np.array(samples)[2000:,:]
# plot the posterior samples
plt.scatter(samples[:,0], samples[:,1], c='k', alpha=0.01)
plt.show()
As we can see, the posterior samples lie fairly close to the likelihood ridge that we plotted earlier. This is not so surprising, since the prior is relatively diffuse and so the likelihood function dominates the posterior. Here, we can also clearly see the bands of $y_0$ values on the x-axis, which occur because we model that variable as an integer. Finally, let's run some posterior samples through the model and do some forecasting.
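Before we do, it is worth summarising the posterior numerically as well, for example with posterior means and 95% credible intervals. A minimal sketch using NumPy percentiles:

# posterior means and 95% credible intervals for y0 and r
for name, column in zip(['y0', 'r'], samples.T):
    low, high = np.percentile(column, [2.5, 97.5])
    print(f'{name}: mean = {column.mean():.3f}, 95% interval = [{low:.3f}, {high:.3f}]')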
# create a new time vector to do some forecasting
t = np.linspace(0, 20)
# draw 1000 samples from the posterior
for i in range(1000):
    idx = np.random.randint(0, samples.shape[0])
    y = samples[idx,0]*np.exp(samples[idx,1]*t)
    # plot each posterior sample
    plt.plot(t, y, c='k', alpha=0.01, zorder=0)
# plot the data along with the posterior samples
plt.scatter(x_data, y_data, c='r')
plt.show()
Here, we have cranked the time horizon up to 20, so we predict 6 days into the future. Since we are using a probabilistic (Bayesian) model, we also get the uncertainty of the forecast. This tells us that the uncertainty increases the further we try to predict, and after only 6 days there could be anything between 4000 and 8000 cases. This is a really important takeaway, since it allows decision makers such as hospitals to prepare for the worst-case scenario. Using the Bayesian formalism, obtaining such worst-case estimates is straightforward and relatively painless.
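If we want to put a number on that spread rather than reading it off the plot, we can push all the posterior samples through the model at the forecast horizon and compute quantiles; the upper quantile is exactly the kind of worst-case figure a hospital might plan for. A minimal sketch, using day 20 as the horizon:

# forecast distribution at the end of the horizon (day 20)
y_day20 = samples[:,0] * np.exp(samples[:,1] * 20)
print(np.percentile(y_day20, [2.5, 50, 97.5]))  # lower bound, median, upper bound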