by Dr Mikkel Lykkegaard
Intro to Bayes
4. Prior and Posterior - Coin Flipping Walkthrough
Download the resources for this lesson here.
- Use Bayes' Theorem to solve a coin flipping parameter estimation problem.
- Identify different prior distributions and use them.
- Use simple numerical methods to find the evidence in Bayes' Theorem.
Using Bayes' Theorem to perform inference on a coin flipping example.
We have had a look at the interpretation of the prior, the likelihood and the posterior. Now let's see how they might be used in practice. We start by defining a function that returns the prior distribution, likelihood function and posterior distribution for a given number of successes k out of n trials, with several different options for the prior distribution, namely the Jeffreys prior, the conjugate (Beta) prior, a normal prior and the uniform prior.
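To fix notation before we look at the code, this is just Bayes' Theorem written out for a coin with bias p, observed to come up heads k times in n flips; the function below evaluates exactly these three ingredients:

$$\mathrm{posterior}(p) = \frac{\mathrm{likelihood}(p)\,\mathrm{prior}(p)}{\mathrm{evidence}}, \qquad \mathrm{likelihood}(p) = \binom{n}{k} p^{k} (1-p)^{n-k}, \qquad \mathrm{evidence} = \int_0^1 \mathrm{likelihood}(q)\,\mathrm{prior}(q)\,\mathrm{d}q$$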
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta, norm, uniform, binom
from scipy.integrate import quad

def bernoulli_bayes(k, n, prior='jeffreys', prior_parameters=None):
    # define the Jeffreys prior for a binomial likelihood.
    if prior == 'jeffreys':
        prior_pdf = lambda p: 1 / np.sqrt(p * (1-p))
    # define the conjugate (Beta) prior for a binomial likelihood.
    elif prior == 'beta':
        # extract the prior parameters if they were given.
        if prior_parameters is not None:
            a, b = prior_parameters
        # otherwise set the default a = b = 0.5, corresponding to the Jeffreys prior.
        else:
            a, b = 0.5, 0.5
        prior_pdf = lambda p: beta(a=a, b=b).pdf(p)
    # define a normal prior.
    elif prior == 'normal':
        # extract the prior parameters if they were given.
        if prior_parameters is not None:
            loc, scale = prior_parameters
        # otherwise set it to a reasonably narrow normal centered at 0.5 (unbiased coin).
        else:
            loc, scale = 0.5, 0.1
        prior_pdf = lambda p: norm(loc=loc, scale=scale).pdf(p)
    # define a uniform prior.
    elif prior == 'uniform':
        prior_pdf = lambda p: uniform().pdf(p)

    # define the likelihood. this must be a function of the distribution parameter p,
    # rather than the support of the distribution.
    likelihood = lambda p: binom(n=n, p=p).pmf(k)

    # set the evaluation range (some priors are undefined at 0 and 1).
    p_range = (1e-3, 1 - 1e-3)
    p = np.linspace(*p_range, 100)

    # evaluate the prior and likelihood.
    prior_eval = prior_pdf(p)
    likelihood_eval = np.array([likelihood(_p) for _p in p])

    # if we use the conjugate prior (Beta), the posterior has a closed-form solution.
    if prior == 'beta':
        posterior = beta(a=a+k, b=b+(n-k))
        posterior_eval = posterior.pdf(p)
    # otherwise, we use quadrature to evaluate the evidence.
    else:
        evidence = quad(lambda _p: prior_pdf(_p)*likelihood(_p), 0, 1)[0]
        posterior_eval = prior_eval*likelihood_eval / evidence

    # return prior, likelihood and posterior.
    return prior_eval, likelihood_eval, posterior_eval
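As a quick sanity check (not part of the lesson code, and assuming the same 100-point grid used inside the function), we can verify that the numerically normalised posterior integrates to approximately one, which confirms that the quadrature-based evidence is doing its job:

# minimal sketch of a sanity check: the posterior returned by bernoulli_bayes
# should integrate to roughly 1 over the evaluation grid (up to discretisation error).
p = np.linspace(1e-3, 1 - 1e-3, 100)
_, _, posterior_eval = bernoulli_bayes(2, 3, prior='jeffreys')
dp = p[1] - p[0]
print((posterior_eval * dp).sum())  # should print a value close to 1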
Let's also define a plotting function that can visualise our different distributions.
def plot_distributions(prior_eval, likelihood_eval, posterior_eval):
    # set the range
    p_range = (1e-3, 1 - 1e-3)
    p = np.linspace(*p_range, 100)

    fig, ax = plt.subplots(figsize=(16,4), ncols=3, nrows=1)

    # plot the prior.
    ax[0].set_title('Prior')
    ax[0].plot(p, prior_eval)
    ax[0].set_xlabel(r'$\gamma$')
    ax[0].set_ylabel(r'$\rho$')

    # plot the likelihood.
    ax[1].set_title('Likelihood')
    ax[1].plot(p, likelihood_eval)
    ax[1].set_xlabel(r'$\gamma$')
    ax[1].set_ylabel(r'$\rho$')

    # plot the posterior.
    ax[2].set_title('Posterior')
    ax[2].plot(p, posterior_eval)
    ax[2].set_xlabel(r'$\gamma$')
    ax[2].set_ylabel(r'$\rho$')

    plt.show()
Let's use it to perform inference for an example where we got 2 successes out of 3 trials:
successes = 2
trials = 3

for prior in ['jeffreys', 'beta', 'normal', 'uniform']:
    print('{} prior'.format(prior))
    plot_distributions(*bernoulli_bayes(successes, trials, prior))
As you can see, the posteriors for the Jeffreys prior and the Beta prior with a = b = 0.5 are identical, since the priors are identical up to a constant. For the informed (Normal) prior, the posterior is near-identical to the prior, since for such a small dataset with only 3 trials, there is simply not enough evidence to suggest that the coin is any different from what we expected. For the Uniform prior, the posterior is identical to the likelihood function up to a constant, since the likelihood, viewed as a function of p, is not itself a normalised probability distribution.
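To make the conjugacy point concrete, here is a small sketch (my own addition, simply restating the closed-form branch of bernoulli_bayes with the default a = b = 0.5) of the Beta posterior for 2 successes in 3 trials; its mean and a central 95% credible interval can be read straight off scipy's Beta distribution:

# sketch: closed-form conjugate posterior Beta(a + k, b + n - k) for the
# default a = b = 0.5 prior, with k = 2 successes in n = 3 trials.
from scipy.stats import beta

a, b = 0.5, 0.5
k, n = 2, 3
posterior = beta(a=a + k, b=b + (n - k))
print(posterior.mean())          # posterior mean of the coin bias
print(posterior.interval(0.95))  # central 95% credible interval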
Now let's see what happens when we observe more data. Let's preserve (roughly) the same ratio of successes to trials, so that successes = 66 and trials = 100.
successes = 66
trials = 100

for prior in ['jeffreys', 'beta', 'normal', 'uniform']:
    print('{} prior'.format(prior))
    plot_distributions(*bernoulli_bayes(successes, trials, prior))
With a larger number of trials, we see that the posterior contracts around (what we now think is) the true bias of the coin, namely p ≈ 0.66, for all four of the different prior distributions. Since the posterior distributions are all more or less identical, it is clear that the prior does not have much influence on the posterior when more data is available.
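A rough way to quantify this contraction (again my own addition, reusing the closed-form conjugate posterior with the default a = b = 0.5) is to compare the posterior standard deviation of the coin bias for the small and the large dataset:

# sketch: posterior spread under the conjugate Beta(0.5, 0.5) prior,
# for the small (2/3) and large (66/100) datasets.
from scipy.stats import beta

for k, n in [(2, 3), (66, 100)]:
    posterior = beta(a=0.5 + k, b=0.5 + (n - k))
    print(n, posterior.std())  # the spread shrinks as n grows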