Dr Mikkel Lykkegaard

Intro to Bayes

2. The Importance of Being Bayes: Introducing Bayes' Theorem

📑 Learning Objectives
  • Identify the various components of Bayes' formula: the posterior, prior, likelihood and evidence.
  • Use Bayes' formula to compute simple conditional probabilities for events.

Bayes' Theorem


Here is a fun fact: the inventor of Bayes' Theorem, Thomas Bayes, was not himself a Bayesian as we understand the term today. He simply saw his theorem as a way to calculate conditional probabilities. So let's first have a look at Bayes' Theorem the way its creator intended it.

For events

For events $A$ and $B$, we can write Bayes' Theorem as:

$$P(A|B) = \frac{P(A)P(B|A)}{P(B)}$$

where

  • $P(A)$ is the prior probability of $A$.
  • $P(B|A)$ is the likelihood of $B$ given $A$.
  • $P(B)$ is the marginal probability, or evidence, of $B$, and
  • $P(A|B)$ is the posterior probability of $A$ given $B$.

The evidence, also called the marginal likelihood, can be expanded by summing over a set of mutually exclusive and exhaustive events $A_i \in \mathcal{A}$ (for example, $A$ and its complement):

$$P(B) = \sum_i P(A_i)P(B|A_i)$$
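The formula and the expansion of the evidence translate almost line by line into code. Below is a minimal Python sketch, assuming the events $A_i$ form a partition (mutually exclusive and exhaustive). The function name `bayes_posterior` is hypothetical and does not come from any library.

```python
def bayes_posterior(priors, likelihoods):
    """Return the posteriors P(A_i | B) for every event A_i in a partition.

    priors[i]      -- P(A_i); the A_i are assumed mutually exclusive and
                      exhaustive, so the priors sum to 1
    likelihoods[i] -- P(B | A_i)
    """
    if len(priors) != len(likelihoods):
        raise ValueError("need exactly one likelihood per prior")
    # Numerators of Bayes' Theorem: P(A_i) * P(B | A_i)
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Evidence P(B), expanded by summing over the partition
    evidence = sum(joint)
    if evidence == 0:
        raise ValueError("P(B) is zero, so the posterior is undefined")
    # Dividing by the evidence normalises the posteriors so they sum to 1
    return [j / evidence for j in joint]


# Two-event example: A and its complement, with P(A) = 0.5,
# P(B | A) = 0.8 and P(B | not A) = 0.2
print(bayes_posterior([0.5, 0.5], [0.8, 0.2]))
```

For the COVID-19 example that follows, `bayes_posterior([0.1, 0.9], [0.768, 0.003])` returns the posteriors of being COVID-positive and COVID-negative given a positive test; the first entry is about 0.966.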

I appreciate that there is a lot to unpack here, and it doesn't get any simpler as we move from events to continuous random variables, and from conditional probabilities to Bayesian inference. So let's take a moment to work through a real-world example of using Bayes' Theorem to compute a conditional probability for events.

Example

Suppose that you have tested positive for COVID-19 using a lateral flow test, and you want to calculate the posterior probability that you actually have the disease. Let's introduce some notation first. The test can be either positive or negative, so we will call the event that the test is positive $t_+$ and, conversely, the event that it is negative $t_-$. Similarly, you can be either COVID-positive or COVID-negative, and we'll call those events $c_+$ and $c_-$, respectively:

  • $t_+$: Positive test
  • $t_-$: Negative test
  • $c_+$: COVID-positive
  • $c_-$: COVID-negative

In terms of Bayes' Theorem, we want to find the posterior probability of having COVID-19 given that you just tested positive, $P(c_+|t_+)$:

$$P(c_+|t_+) = \frac{P(c_+)P(t_+|c_+)}{P(t_+)}$$

Let's unpack this. $P(c_+)$ is the prior probability of being COVID-positive. In the absence of any other information, let's assume that 10% of the population has COVID-19, so that $P(c_+) = 0.1$.

The likelihood is a bit more involved: we need the actual sensitivity and specificity of the lateral flow test. We have to take those numbers from somewhere, and this news article reports them as 76.8% and 99.7%, respectively. The sensitivity is also called the true positive rate, i.e. $P(t_+|c_+) = 0.768$. The specificity is also called the true negative rate, i.e. $P(t_-|c_-) = 0.997$.

The marginal likelihood, or evidence, is the probability of observing the data (here, a positive test), taking into account all possible values of the variable of interest, in this case your COVID status. It can be written as:

$$P(t_+) = P(c_+)P(t_+|c_+) + P(c_-)P(t_+|c_-)$$

To calculate this, we need two final ingredients, namely the prior probability of being COVID-negative, $P(c_-)$, and the false positive rate of the lateral flow test, $P(t_+|c_-)$. Both follow from the complement rule and the numbers we already have:

$$\begin{aligned} P(c_-) = 1 - P(c_+) = 1 - 0.1 &= 0.9 \\ P(t_+|c_-) = 1 - P(t_-|c_-) = 1 - 0.997 &= 0.003 \end{aligned}$$
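As a quick check of the arithmetic so far, here is a minimal Python sketch that derives the two missing ingredients with the complement rule and then evaluates the evidence $P(t_+)$. The variable names are our own; the numbers are the prior, sensitivity and specificity quoted above.

```python
# Inputs quoted in the text
p_covid = 0.1          # prior P(c+)
sensitivity = 0.768    # P(t+ | c+), true positive rate
specificity = 0.997    # P(t- | c-), true negative rate

# Complement rule: the probabilities of an event and its complement sum to 1
p_no_covid = 1 - p_covid                # P(c-)
false_positive_rate = 1 - specificity   # P(t+ | c-)

# Evidence P(t+): sum over both possible COVID statuses
p_positive = p_covid * sensitivity + p_no_covid * false_positive_rate

print(f"P(c-)    = {p_no_covid:.1f}")
print(f"P(t+|c-) = {false_positive_rate:.3f}")
print(f"P(t+)    = {p_positive:.4f}")
```

The formatted output rounds away the small floating-point errors in `1 - 0.997` and similar expressions.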

Putting it all together, we get:

$$P(c_+|t_+) = \frac{P(c_+)P(t_+|c_+)}{P(c_+)P(t_+|c_+) + P(c_-)P(t_+|c_-)} = \frac{0.1 \cdot 0.768}{0.1 \cdot 0.768 + 0.9 \cdot 0.003} \approx 0.966$$

or about 96.6%. Now let's flip the calculation around and find the probability of being negative given that you get a negative test, $P(c_-|t_-)$. This also needs the false negative rate of the test, $P(t_-|c_+) = 1 - P(t_+|c_+) = 1 - 0.768 = 0.232$:

$$P(c_-|t_-) = \frac{P(c_-)P(t_-|c_-)}{P(c_+)P(t_-|c_+) + P(c_-)P(t_-|c_-)} = \frac{0.9 \cdot 0.997}{0.1 \cdot 0.232 + 0.9 \cdot 0.997} \approx 0.975$$

or about 97.5%. This means that if you test negative using a lateral flow testing kit, you are almost certainly negative, and you can get on with your life.
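Another way to sanity-check both results is to count people instead of multiplying probabilities. The sketch below assumes a hypothetical cohort of 100,000 people. The size is arbitrary and is chosen only so the counts come out as whole numbers. It uses the same prior, sensitivity and specificity as above.

```python
population = 100_000   # hypothetical cohort size, chosen only for illustration
p_covid = 0.1          # prior P(c+)
sensitivity = 0.768    # P(t+ | c+)
specificity = 0.997    # P(t- | c-)

infected = round(population * p_covid)   # people who are COVID-positive
healthy = population - infected          # people who are COVID-negative

# round() only absorbs floating-point noise; these products are whole numbers
true_positives = round(infected * sensitivity)        # infected, test positive
false_negatives = infected - true_positives           # infected, test negative
false_positives = round(healthy * (1 - specificity))  # healthy, test positive
true_negatives = healthy - false_positives            # healthy, test negative

# Among everyone who tests positive, the share who are infected is P(c+ | t+)
p_covid_given_positive = true_positives / (true_positives + false_positives)
# Among everyone who tests negative, the share who are healthy is P(c- | t-)
p_healthy_given_negative = true_negatives / (true_negatives + false_negatives)

print(f"Positive tests: {true_positives} true, {false_positives} false")
print(f"Negative tests: {true_negatives} true, {false_negatives} false")
print(f"P(c+|t+) = {p_covid_given_positive:.3f}")
print(f"P(c-|t-) = {p_healthy_given_negative:.3f}")
```

With these inputs there are 7,680 true and 270 false positives, and 89,730 true and 2,320 false negatives. Dividing these counts reproduces the two posteriors computed above.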