by Dr Mikkel Lykkegaard
Intro to Bayes
2. The Importance of Being Bayes: Introducing Bayes' Theorem
- Identify the various components of Bayes' formula: the posterior, prior, likelihood and evidence.
- Use Bayes' formula to compute simple conditional probabilities for events.
Bayes' Theorem
Here is a fun fact: the inventor of Bayes' Theorem (Thomas Bayes) was not himself a Bayesian, as we understand the term today. He simply saw his theorem as a way to calculate conditional probabilities. So let's first have a look at Bayes' Theorem the way its creator intended it.
For events
For events $A$ and $B$, we can write Bayes' Theorem as:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
where
- $P(A)$ is the prior probability of $A$,
- $P(B \mid A)$ is the likelihood of $B$ given $A$,
- $P(B)$ is the marginal probability, or evidence, and
- $P(A \mid B)$ is the posterior probability of $A$ given $B$.
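The marginal likelihood $P(B)$ can be expanded by summing over a set of mutually exclusive and exhaustive events $A_i$, one of which is $A$ itself:

$$P(B) = \sum_i P(B \mid A_i)\,P(A_i)$$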
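To make the two formulas concrete, here is a minimal Python sketch. It assumes the events $A_i$ are mutually exclusive and exhaustive, and that their priors and likelihoods are supplied as lists in matching order. The function name `posterior` and its arguments are our own illustrative choices, not part of any library.

```python
def posterior(priors, likelihoods):
    """Return P(A_i | B) for every event A_i in a partition (illustrative helper).

    priors      -- list of P(A_i); assumed mutually exclusive and exhaustive, so they sum to 1
    likelihoods -- list of P(B | A_i), in the same order as priors
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1")
    # Evidence P(B) by summing P(B | A_i) * P(A_i) over the partition
    evidence = sum(lik * pri for lik, pri in zip(likelihoods, priors))
    # Bayes' Theorem for each A_i: P(B | A_i) * P(A_i) / P(B)
    return [lik * pri / evidence for lik, pri in zip(likelihoods, priors)]
```

For example, `posterior([0.5, 0.5], [0.2, 0.6])` returns posteriors of approximately 0.25 and 0.75. The evidence is $0.2 \times 0.5 + 0.6 \times 0.5 = 0.4$, and each numerator is divided by it.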
I appreciate that there is a lot to unpack here, and it doesn't get any simpler as we move from events to continuous random variables, and from conditional probabilities to Bayesian inference. So let's take a moment to work through a real-world example of using Bayes' Theorem to compute a conditional probability for events.
Example
Suppose that you have tested positive for COVID-19 using a lateral flow test, and you want to calculate the posterior probability that you actually have the disease. Let's introduce some notation first. The test can be either positive or negative, so we will call the event that the test is positive $T^+$ and, conversely, the event that it is negative $T^-$. Similarly, you can be either COVID-positive or COVID-negative, and we'll call those events $C^+$ and $C^-$, respectively:
- $T^+$: Positive test
- $T^-$: Negative test
- $C^+$: COVID-positive
- $C^-$: COVID-negative
Plugging this notation into Bayes' Theorem, we want the posterior probability of having COVID-19 given that you just tested positive, $P(C^+ \mid T^+)$:

$$P(C^+ \mid T^+) = \frac{P(T^+ \mid C^+)\,P(C^+)}{P(T^+)}$$
Let's unpack this. $P(C^+)$ is the prior probability of being COVID-positive. In the absence of other information, let's assume that 10% of the population has COVID-19, so that $P(C^+) = 0.1$.
The likelihood $P(T^+ \mid C^+)$ is a bit more involved, because it depends on the sensitivity and specificity of the lateral flow test. We have to take those numbers from somewhere, and this news article tells us that they are 76.8% and 99.7%, respectively. Sensitivity is also called the true positive rate, i.e. $P(T^+ \mid C^+) = 0.768$. Specificity is also called the true negative rate, i.e. $P(T^- \mid C^-) = 0.997$.
The marginal likelihood $P(T^+)$ is the probability of observing the data (here, a positive test) over all possible values of the variable of interest, in this case your COVID status. It can be written as:

$$P(T^+) = P(T^+ \mid C^+)\,P(C^+) + P(T^+ \mid C^-)\,P(C^-)$$
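To calculate this, we need two final ingredients: the prior probability of being COVID-negative, $P(C^-)$, and the false positive rate of the lateral flow test, $P(T^+ \mid C^-)$. Both follow directly from the numbers we already have:

$$P(C^-) = 1 - P(C^+) = 1 - 0.1 = 0.9$$

$$P(T^+ \mid C^-) = 1 - P(T^- \mid C^-) = 1 - 0.997 = 0.003$$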
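Putting it all together, we get:

$$P(C^+ \mid T^+) = \frac{0.768 \times 0.1}{0.768 \times 0.1 + 0.003 \times 0.9} = \frac{0.0768}{0.0795} \approx 0.966$$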
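The same arithmetic can be checked in a few lines of Python. This is a sketch that hard-codes the prevalence, sensitivity and specificity quoted above. The variable names are our own.

```python
# Inputs quoted in the text
prior_covid = 0.10   # P(C+): assumed prevalence
sensitivity = 0.768  # P(T+ | C+): true positive rate
specificity = 0.997  # P(T- | C-): true negative rate

# Complements needed for the evidence term
prior_no_covid = 1 - prior_covid       # P(C-)
false_positive_rate = 1 - specificity  # P(T+ | C-)

# Evidence P(T+), summed over both COVID states
p_positive = sensitivity * prior_covid + false_positive_rate * prior_no_covid

# Posterior P(C+ | T+) via Bayes' Theorem
p_covid_given_positive = sensitivity * prior_covid / p_positive
print(f"P(C+ | T+) = {p_covid_given_positive:.3f}")  # prints P(C+ | T+) = 0.966
```

Changing `prior_covid` is a quick way to see how strongly the posterior depends on the assumed prevalence.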
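That is, 96.6%. Let's flip the calculation around and find the probability of being negative given a negative test, $P(C^- \mid T^-)$. This time the evidence also needs the false negative rate, $P(T^- \mid C^+) = 1 - 0.768 = 0.232$:

$$P(C^- \mid T^-) = \frac{P(T^- \mid C^-)\,P(C^-)}{P(T^- \mid C^-)\,P(C^-) + P(T^- \mid C^+)\,P(C^+)} = \frac{0.997 \times 0.9}{0.997 \times 0.9 + 0.232 \times 0.1} = \frac{0.8973}{0.9205}$$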
That is, approximately 0.975, or 97.5%. This means that if you test negative using a lateral flow testing kit, you are very likely negative (there remains roughly a 2.5% chance that you have COVID-19), and you can get on with your life.
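The arithmetic for the negative-test case can be checked with a similar sketch, again using the numbers from the text and our own variable names. It also prints the complementary probability $P(C^+ \mid T^-)$, the chance of being infected despite a negative test.

```python
# Inputs quoted in the text
prior_covid = 0.10   # P(C+): assumed prevalence
sensitivity = 0.768  # P(T+ | C+): true positive rate
specificity = 0.997  # P(T- | C-): true negative rate

# Complements needed for the evidence term
prior_no_covid = 1 - prior_covid       # P(C-)
false_negative_rate = 1 - sensitivity  # P(T- | C+)

# Evidence P(T-), summed over both COVID states
p_negative = specificity * prior_no_covid + false_negative_rate * prior_covid

# Posterior P(C- | T-) via Bayes' Theorem, and its complement P(C+ | T-)
p_no_covid_given_negative = specificity * prior_no_covid / p_negative
p_covid_given_negative = 1 - p_no_covid_given_negative

print(f"P(C- | T-) = {p_no_covid_given_negative:.3f}")  # prints P(C- | T-) = 0.975
print(f"P(C+ | T-) = {p_covid_given_negative:.3f}")     # prints P(C+ | T-) = 0.025
```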