library(here)
library(tidyverse)
library(ggplot2)

This notebook gives a simple example of Bayesian inference.

Coughing Patient

Suppose that you work in a doctor’s office and you meet a woman called Jen who is sitting in the waiting room. You start thinking about what condition she might have. To keep things simple, let’s assume that there are only three possible hypotheses \(h\): either she has a cold, or she has emphysema, or she has a stomach upset. Your prior distribution \(P(h)\) over these hypotheses captures your expectations about which hypothesis is true before you have gathered any additional evidence. Let’s assume that your beliefs match the prior plotted below. As set up initially, this prior says that cold and stomach upset are both far more likely than emphysema.

# three candidate hypotheses and the prior probability of each
h   <- c('cold', 'emphysema', 'stomach upset')
p_h <- c(0.46, 0.04, 0.5)
prior <- tibble(h, val = p_h, dist = 'prior P(h)')

plotdiseasechart <- function(d) {
  # bar chart of a probability distribution over the three hypotheses
  pic <- d %>%
    ggplot(aes(x = h, y = val)) +
    scale_y_continuous(limits = c(0, 1)) +
    geom_col() +
    facet_grid(dist ~ .) +
    xlab("hypothesis")
  print(pic)
}

plotdiseasechart(prior)

Now you notice that Jen has a cough. This observation is our data set \(D\). Your likelihood function \(P(D|h)\) indicates how probable the data would be if each of the hypotheses were true. The vertical-bar notation represents a conditional probability — the probability of \(D\) given \(h\). The function \(P(D|h)\) plotted below indicates that coughing is fairly probable if Jen has a cold or if she has emphysema, but not very probable if she has a stomach upset.

# probability of observing a cough under each hypothesis
p_d_given_h <- c(0.4, 0.4, 0.05)
likelihood  <- tibble(h, val = p_d_given_h, dist = 'likelihood P(D|h)')
plotdiseasechart(likelihood)

After observing the data \(D\) you update your prior beliefs \(P(h)\) and these updated beliefs are captured by a posterior distribution \(P(h|D)\). The notation here again represents a conditional probability — the probability of \(h\) given \(D\). The normative way to combine the prior and likelihood to arrive at the posterior is captured by Bayes rule: \[\begin{equation} P(h|D) \propto P(D|h) P(h) \end{equation}\]
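Written out in full, the proportionality is resolved by dividing by the probability of the data \(P(D)\), which is itself obtained by summing the numerator over the whole hypothesis space: \[\begin{equation} P(h|D) = \frac{P(D|h)\,P(h)}{\sum_{h'} P(D|h')\,P(h')} \end{equation}\]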

So we can calculate the posterior by multiplying the likelihood and the prior and then renormalizing (i.e. dividing by a constant so that the posterior distribution sums to 1 over the hypothesis space).

# update prior by multiplying by likelihood
p_h_given_d <- p_d_given_h * p_h
# "normalise" the posterior so that it sums to 1
p_h_given_d <- p_h_given_d / sum(p_h_given_d)
posterior  <- tibble(h, val=p_h_given_d, dist='posterior P(h|D)')
plotdiseasechart(posterior)

Here the posterior indicates that cold is the most likely diagnosis. Of the three hypotheses it is the only one which has a fairly high prior \(P(h)\) AND which makes the data probable (i.e. has a high likelihood \(P(D|h)\)).
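As a quick numerical check, the short sketch below (which just reuses the vectors defined above) prints the posterior probabilities and picks out the maximum a posteriori hypothesis.

# posterior probabilities, rounded for readability
round(p_h_given_d, 3)
# the maximum a posteriori (MAP) hypothesis
h[which.max(p_h_given_d)]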

Exercises

  1. The prior and posterior distributions sum to 1 but the likelihood function does not. Should it? Why or why not?

  2. Change the code so that the prior is uniform. Generate the plots again — does the posterior match your intuitions about what should happen when the prior favours no hypothesis over the others? (A starter sketch for this exercise and the next appears after the list.)

  3. Change the code to use the original prior but adjust the likelihood so that coughing is equally probable given each hypothesis (e.g. P(coughing|cold) = P(coughing|emphysema) = P(coughing|stomach upset) = 0.4). Generate the plots again — does the posterior match your intuitions about what should happen when the likelihood carries no information about which hypothesis is true?
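If you want a starting point for exercises 2 and 3, the sketch below simply reruns the pipeline above with the values suggested in the exercises; the variable names (p_h_uniform, p_d_given_h_flat, and so on) are illustrative rather than part of the original code.

# Exercise 2: uniform prior over the three hypotheses
p_h_uniform <- rep(1/3, 3)
plotdiseasechart(tibble(h, val = p_h_uniform, dist = 'prior P(h)'))
p_h_given_d_unif <- p_d_given_h * p_h_uniform
p_h_given_d_unif <- p_h_given_d_unif / sum(p_h_given_d_unif)
plotdiseasechart(tibble(h, val = p_h_given_d_unif, dist = 'posterior P(h|D)'))

# Exercise 3: original prior, coughing equally probable (0.4) under every hypothesis
p_d_given_h_flat <- rep(0.4, 3)
plotdiseasechart(tibble(h, val = p_d_given_h_flat, dist = 'likelihood P(D|h)'))
p_h_given_d_flat <- p_d_given_h_flat * p_h
p_h_given_d_flat <- p_h_given_d_flat / sum(p_h_given_d_flat)
plotdiseasechart(tibble(h, val = p_h_given_d_flat, dist = 'posterior P(h|D)'))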