library(here)
library(tidyverse)
library(ggplot2)
This notebook gives a simple example of Bayesian inference.
Suppose that you work in a doctor’s office and you meet a woman called Jen who is sitting in the waiting room. You start thinking about what condition she might have. To keep things simple, let’s assume that there are only three possible hypotheses \(h\): either she has a cold
, or she has emphysema
, or she has a stomach upset
. Your prior distribution \(P(h)\) over these hypotheses captures your expectations about which hypothesis is true before you have gathered any additional evidence. Let’s assume that the prior plotted below captures your beliefs. As set up initially, the prior captures the idea that cold
and stomach upset
are both more likely than emphysema
.
h <- c('cold', 'emphysema', 'stomach upset')
p_h <- c(0.46, 0.04, 0.5)
prior <- tibble(h, val=p_h, dist='prior P(h)')
plotdiseasechart <- function(d) {
pic <- d %>%
ggplot(aes(x=h, y = val)) +
scale_y_continuous(lim=c(0,1)) +
geom_col() +
facet_grid(dist ~ .) +
xlab("hypothesis")
plot(pic)
}
plotdiseasechart(prior)
Now you notice that Jen has a cough. This observation is our data set \(D\). Your likelihood function \(P(D|h)\)indicates how probable the data would be if each of the hypotheses were true. The vertical-bar notation represents a conditional probability — the probability of \(D\) given \(h\). The function \(P(D|h)\) plotted below indicates that coughing is fairly probable if Jen has a cold or if she has emphsyema, but not very probable if she has a stomach upset.
p_d_given_h <- c(0.4, 0.4, 0.05)
likelihood <- tibble(h, val=p_d_given_h, dist='likelihood P(D|h)')
plotdiseasechart(likelihood)
After observing the data \(D\) you update your prior beliefs \(P(h)\) and these updated beliefs are captured by a posterior distribution \(P(h|D)\). The notation here again represents a conditional probability — the probability of \(h\) given \(D\). The normative way to combine the prior and likelihood to arrive at the posterior is captured by Bayes rule: \[\begin{equation} P(h|D) \propto P(D|h) P(h) \end{equation}\]
So we can calculate the posterior by multiplying the likelihood and the prior then renormalizing (ie dividing by a constant so that the posterior distribution sums to 1 over the hypothesis space).
# update prior by multiplying by likelihood
p_h_given_d <- p_d_given_h * p_h
# "normalise" the posterior so that it sums to 1
p_h_given_d <- p_h_given_d / sum(p_h_given_d)
posterior <- tibble(h, val=p_h_given_d, dist='posterior P(h|D)')
plotdiseasechart(posterior)
Here the posterior indicates that cold
is the most likely diagnosis. Of the three hypotheses it is the only one which has a fairly high prior \(P(h)\) AND which makes the data probable (ie has a high likelihood \(P(D|h)\)).
The prior and posterior distributions sum to 1 but the likelihood function does not. Should it? Why or why not?
Change the code so that the prior is uniform. Generate the plots again — does the posterior match your intuitions about what should happen if the prior were truly random?
Change the code to use the original prior but adjust the likelihood so that coughing is equally probable given each hypothesis (e.g. P(coughing|cold) = P(coughing|emphysema) = P(coughing|stomach upset) = 0.4). Generate the plots again — does the posterior match your intuitions about what should happen if the likelihood were truly random?