This document gives an introduction to mixed effects models. It follows Chapter 9 of R for Psych by Glenn Williams very closely, and the toy data set we’ll use is from the same resource.
sleep_groups <-read_csv(here("data", "sleep_study_with_sim_groups.csv"))
## Parsed with column specification:
## cols(
## Reaction = col_double(),
## Days = col_double(),
## Subject = col_double(),
## Group = col_character()
## )
if (1) { # shuffle subjects between groups
# make frame showing Group for each Subject
SubjGroups <- sleep_groups %>%
group_by(Subject, Group) %>%
summarize()
nsubject <- max(sleep_groups$Subject)
subjperm <- sample(nsubject)
orig_groups <- SubjGroups$Group
sleep_groups <- sleep_groups %>%
mutate(Group= orig_groups[subjperm[Subject]])
}
sleep_groups <- sleep_groups %>%
mutate(Subject=factor(Subject))
head(sleep_groups)
## # A tibble: 6 x 4
## Reaction Days Subject Group
## <dbl> <dbl> <fct> <chr>
## 1 250. 0 1 group_1
## 2 259. 1 1 group_1
## 3 251. 2 1 group_1
## 4 321. 3 1 group_1
## 5 357. 4 1 group_1
## 6 415. 5 1 group_1
This data set is for an (imaginary) study that looks at the effect of days without sleep on reaction times. Suppose that there are three groups of people— for example, group_1
could be teenagers, group_2
twenty-year olds and group_3
thirty-year olds.
Let’s first fit and plot a linear model:
lm(Reaction ~ Days, data = sleep_groups)
##
## Call:
## lm(formula = Reaction ~ Days, data = sleep_groups)
##
## Coefficients:
## (Intercept) Days
## 371.7 15.4
ggplot(data = sleep_groups, mapping = aes(x = Days, y = Reaction)) +
geom_point(na.rm = T, aes(col = Group), alpha = 0.5) +
geom_smooth(method = "lm", na.rm = T, col = "black", se = F) +
scale_y_continuous(limits = c(180, 1020)) +
scale_x_continuous(breaks = seq(1:10) - 1) +
theme(legend.position = "top")
You can see that the three groups look to have different intercepts and probably different slopes — for example, the blue points (group 3) are higher than the rest, and seem to rise more steeply. So let’s try adding random effects to capture the statistical dependency between samples from the same group.
me_intercepts <- lmer(Reaction ~ Days + (1 | Group), data = sleep_groups)
me_intercepts
## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ Days + (1 | Group)
## Data: sleep_groups
## REML criterion at convergence: 6968.668
## Random effects:
## Groups Name Std.Dev.
## Group (Intercept) 17.99
## Residual 154.71
## Number of obs: 540, groups: Group, 3
## Fixed Effects:
## (Intercept) Days
## 371.7 15.4
# see group coefficients
model_coefs <- coef(me_intercepts)$Group %>%
rename(Intercept = `(Intercept)`, Slope = Days) %>%
rownames_to_column("Group")
model_coefs
## Group Intercept Slope
## 1 group_1 359.0791 15.40385
## 2 group_2 388.4861 15.40385
## 3 group_3 367.5083 15.40385
sleep_groups_rani <- left_join(sleep_groups, model_coefs, by = "Group")
model_coef_plot <- ggplot(data = sleep_groups_rani,
mapping = aes(x = Days,
y = Reaction,
colour = Group)
) +
geom_point(na.rm = T, alpha = 0.5) +
geom_abline(aes(intercept = Intercept,
slope = Slope,
colour = Group
),
size = 1.5
) +
scale_y_continuous(limits = c(180, 1020)) +
scale_x_continuous(breaks = seq(1:10) - 1) +
theme(legend.position = "top")
# see the plot
model_coef_plot
Looks better but still not ideal. For example the line for the blue group should probably have a greater slope than the line for the red group.
me_slopes <- lmer(Reaction ~ Days + (0 + Days | Group), data = sleep_groups)
# see group coefficients
model_coefs <- coef(me_slopes)$Group %>%
rename(Intercept = `(Intercept)`, Slope = Days) %>%
rownames_to_column("Group")
# see coefficients
model_coefs
## Group Intercept Slope
## 1 group_1 371.6912 12.74239
## 2 group_2 371.6912 18.60794
## 3 group_3 371.6912 14.86121
sleep_groups_rans <- left_join(sleep_groups, model_coefs, by = "Group")
model_coef_plot %+% sleep_groups_rans
Slopes look better but intercepts seem off – the intercept for the blue group should be higher than for the other two groups.
me_interceptslopes <- lmer(Reaction ~ Days + (1 + Days | Group), data = sleep_groups)
## singular fit
# see group coefficients
model_coefs <- coef(me_interceptslopes)$Group %>%
rename(Intercept = `(Intercept)`, Slope = Days) %>%
rownames_to_column("Group")
# see coefficients
model_coefs
## Group Intercept Slope
## 1 group_1 365.3002 13.77405
## 2 group_2 379.7115 17.44916
## 3 group_3 370.0618 14.98834
sleep_groups_ranis <- left_join(sleep_groups, model_coefs, by = "Group")
model_coef_plot %+% sleep_groups_ranis
Now the regression lines for each group look better.
There’s another dependency that we haven’t yet acknowledged. All of the data points for a given individual will be coupled in a way that data points for different individuals are not. So we could perhaps add a random intercept for Subject.
set.seed(1)
me_subjectintercept <- lmer(Reaction ~ Days + (1 + Days | Group) + (1|Subject), data = sleep_groups)
## singular fit
Use lmer
to fit a model that is similar to `me_subjectintercept’ but also includes a random slope for Subject.
me_subjectinterceptslope <- lmer(Reaction ~ Days + (1 + Days | Group) + (1 + Days|Subject), data = sleep_groups)
## singular fit
Compare 5 models using an ANOVA: me_intercepts
, me_slopes
, me_interceptslopes
, me_subjectintercept
, me_subjectinterceptslope
. Look at the BIC values in the output to see which model fits best.
anova(me_intercepts, me_slopes, me_interceptslopes, me_subjectintercept, me_subjectinterceptslope)
## refitting model(s) with ML (instead of REML)
## Data: sleep_groups
## Models:
## me_intercepts: Reaction ~ Days + (1 | Group)
## me_slopes: Reaction ~ Days + (0 + Days | Group)
## me_interceptslopes: Reaction ~ Days + (1 + Days | Group)
## me_subjectintercept: Reaction ~ Days + (1 + Days | Group) + (1 | Subject)
## me_subjectinterceptslope: Reaction ~ Days + (1 + Days | Group) + (1 + Days | Subject)
## Df AIC BIC logLik deviance Chisq Chi Df
## me_intercepts 4 6986.8 7004.0 -3489.4 6978.8
## me_slopes 4 6986.6 7003.8 -3489.3 6978.6 0.2073 0
## me_interceptslopes 6 6990.4 7016.1 -3489.2 6978.4 0.2526 2
## me_subjectintercept 7 6031.6 6061.6 -3008.8 6017.6 960.8011 1
## me_subjectinterceptslope 9 5841.2 5879.8 -2911.6 5823.2 194.3582 2
## Pr(>Chisq)
## me_intercepts
## me_slopes <2e-16 ***
## me_interceptslopes 0.8814
## me_subjectintercept <2e-16 ***
## me_subjectinterceptslope <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There’s a chunk at the top of this notebook called data
that includes an if (0)
statement. Set this to if (1)
to shuffle Subjects between groups, and Knit the notebook again. Take a look at the plots in the notebook – how have they changed? Which of the 5 models is preferred now?