This document gives an introduction to mixed effects models. It follows Chapter 9 of R for Psych by Glenn Williams very closely, and the toy data set we’ll use is from the same resource.

sleep_groups <-read_csv(here("data", "sleep_study_with_sim_groups.csv")) 
## Parsed with column specification:
## cols(
##   Reaction = col_double(),
##   Days = col_double(),
##   Subject = col_double(),
##   Group = col_character()
## )
if (1) { # shuffle subjects between groups
# make frame showing Group for each Subject
SubjGroups <-  sleep_groups %>%  
  group_by(Subject, Group) %>%  
  summarize()

nsubject <- max(sleep_groups$Subject)
subjperm <- sample(nsubject)
orig_groups <- SubjGroups$Group
sleep_groups <- sleep_groups %>% 
  mutate(Group= orig_groups[subjperm[Subject]])
}

  
sleep_groups <- sleep_groups %>% 
  mutate(Subject=factor(Subject))
head(sleep_groups)
## # A tibble: 6 x 4
##   Reaction  Days Subject Group  
##      <dbl> <dbl> <fct>   <chr>  
## 1     250.     0 1       group_1
## 2     259.     1 1       group_1
## 3     251.     2 1       group_1
## 4     321.     3 1       group_1
## 5     357.     4 1       group_1
## 6     415.     5 1       group_1

This data set is for an (imaginary) study that looks at the effect of days without sleep on reaction times. Suppose that there are three groups of people— for example, group_1 could be teenagers, group_2 twenty-year olds and group_3 thirty-year olds.

Linear model

Let’s first fit and plot a linear model:

lm(Reaction ~ Days, data = sleep_groups)
## 
## Call:
## lm(formula = Reaction ~ Days, data = sleep_groups)
## 
## Coefficients:
## (Intercept)         Days  
##       371.7         15.4
ggplot(data = sleep_groups, mapping = aes(x = Days, y = Reaction)) +
  geom_point(na.rm = T, aes(col = Group), alpha = 0.5) +
  geom_smooth(method = "lm", na.rm = T, col = "black", se = F) +
  scale_y_continuous(limits = c(180, 1020)) +
  scale_x_continuous(breaks = seq(1:10) - 1) +
  theme(legend.position = "top")

You can see that the three groups look to have different intercepts and probably different slopes — for example, the blue points (group 3) are higher than the rest, and seem to rise more steeply. So let’s try adding random effects to capture the statistical dependency between samples from the same group.

Random intercepts for each group

me_intercepts <- lmer(Reaction ~ Days + (1 | Group), data = sleep_groups) 
me_intercepts
## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ Days + (1 | Group)
##    Data: sleep_groups
## REML criterion at convergence: 6968.668
## Random effects:
##  Groups   Name        Std.Dev.
##  Group    (Intercept)  17.99  
##  Residual             154.71  
## Number of obs: 540, groups:  Group, 3
## Fixed Effects:
## (Intercept)         Days  
##       371.7         15.4
# see group coefficients
model_coefs <- coef(me_intercepts)$Group %>% 
  rename(Intercept = `(Intercept)`, Slope = Days) %>% 
  rownames_to_column("Group")

model_coefs
##     Group Intercept    Slope
## 1 group_1  359.0791 15.40385
## 2 group_2  388.4861 15.40385
## 3 group_3  367.5083 15.40385
sleep_groups_rani <- left_join(sleep_groups, model_coefs, by = "Group")

model_coef_plot <- ggplot(data = sleep_groups_rani, 
       mapping = aes(x = Days, 
                     y = Reaction, 
                     colour = Group)
       ) +
  geom_point(na.rm = T, alpha = 0.5) +
  geom_abline(aes(intercept = Intercept, 
                  slope = Slope,
                  colour = Group
                  ),
              size = 1.5
              ) +
  scale_y_continuous(limits = c(180, 1020)) +
  scale_x_continuous(breaks = seq(1:10) - 1) +
  theme(legend.position = "top")

# see the plot
model_coef_plot

Looks better but still not ideal. For example the line for the blue group should probably have a greater slope than the line for the red group.

Random slopes for each group

me_slopes <- lmer(Reaction ~ Days + (0 + Days | Group), data = sleep_groups)

# see group coefficients
model_coefs <- coef(me_slopes)$Group %>% 
  rename(Intercept = `(Intercept)`, Slope = Days) %>% 
  rownames_to_column("Group")

# see coefficients
model_coefs
##     Group Intercept    Slope
## 1 group_1  371.6912 12.74239
## 2 group_2  371.6912 18.60794
## 3 group_3  371.6912 14.86121
sleep_groups_rans <- left_join(sleep_groups, model_coefs, by = "Group")

model_coef_plot %+% sleep_groups_rans

Slopes look better but intercepts seem off – the intercept for the blue group should be higher than for the other two groups.

Random intercepts and slopes

me_interceptslopes <- lmer(Reaction ~ Days + (1 + Days | Group), data = sleep_groups)
## singular fit
# see group coefficients
model_coefs <- coef(me_interceptslopes)$Group %>% 
  rename(Intercept = `(Intercept)`, Slope = Days) %>% 
  rownames_to_column("Group")

# see coefficients
model_coefs
##     Group Intercept    Slope
## 1 group_1  365.3002 13.77405
## 2 group_2  379.7115 17.44916
## 3 group_3  370.0618 14.98834
sleep_groups_ranis <- left_join(sleep_groups, model_coefs, by = "Group")

model_coef_plot %+% sleep_groups_ranis

Now the regression lines for each group look better.

Additional random effects?

There’s another dependency that we haven’t yet acknowledged. All of the data points for a given individual will be coupled in a way that data points for different individuals are not. So we could perhaps add a random intercept for Subject.

set.seed(1)
me_subjectintercept <- lmer(Reaction ~ Days + (1 + Days | Group) + (1|Subject), data = sleep_groups)
## singular fit

Exercise 1: Add a random slope for Subject

Use lmer to fit a model that is similar to `me_subjectintercept’ but also includes a random slope for Subject.

me_subjectinterceptslope <- lmer(Reaction ~ Days + (1 + Days | Group) + (1 + Days|Subject), data = sleep_groups) 
## singular fit

Exercise 2: Model comparison

Compare 5 models using an ANOVA: me_intercepts, me_slopes, me_interceptslopes, me_subjectintercept, me_subjectinterceptslope. Look at the BIC values in the output to see which model fits best.

anova(me_intercepts, me_slopes, me_interceptslopes, me_subjectintercept, me_subjectinterceptslope)
## refitting model(s) with ML (instead of REML)
## Data: sleep_groups
## Models:
## me_intercepts: Reaction ~ Days + (1 | Group)
## me_slopes: Reaction ~ Days + (0 + Days | Group)
## me_interceptslopes: Reaction ~ Days + (1 + Days | Group)
## me_subjectintercept: Reaction ~ Days + (1 + Days | Group) + (1 | Subject)
## me_subjectinterceptslope: Reaction ~ Days + (1 + Days | Group) + (1 + Days | Subject)
##                          Df    AIC    BIC  logLik deviance    Chisq Chi Df
## me_intercepts             4 6986.8 7004.0 -3489.4   6978.8                
## me_slopes                 4 6986.6 7003.8 -3489.3   6978.6   0.2073      0
## me_interceptslopes        6 6990.4 7016.1 -3489.2   6978.4   0.2526      2
## me_subjectintercept       7 6031.6 6061.6 -3008.8   6017.6 960.8011      1
## me_subjectinterceptslope  9 5841.2 5879.8 -2911.6   5823.2 194.3582      2
##                          Pr(>Chisq)    
## me_intercepts                          
## me_slopes                    <2e-16 ***
## me_interceptslopes           0.8814    
## me_subjectintercept          <2e-16 ***
## me_subjectinterceptslope     <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Exercise 3: Model comparison

There’s a chunk at the top of this notebook called data that includes an if (0) statement. Set this to if (1) to shuffle Subjects between groups, and Knit the notebook again. Take a look at the plots in the notebook – how have they changed? Which of the 5 models is preferred now?