frames <- read_csv(here("data", "data_samplesize.csv"))

If you’d done the sampling frames experiment, which analyses would you actually report in a paper? Here we’ll give a frequentist approach and two Bayesian approaches.

Load and plot data

fullframes <- frames %>% 
  mutate(generalisation = (response + .1) / 9.2) %>%  # rescaled response, used as the outcome
  mutate(id = factor(id)) %>% 
  mutate(sample_size = factor(sample_size, levels = c("small", "medium", "large")))
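The rescaling step is there because the models below use a logit link, which is only finite for values strictly between 0 and 1. A minimal sketch of what the transform does, assuming raw responses are integers from 0 to 9 (the response scale is an assumption here, not something stated above):

```r
# Assumption: raw responses are integers from 0 to 9
response <- 0:9
generalisation <- (response + 0.1) / 9.2

# The endpoints are pulled slightly inside the unit interval
range(generalisation)

# So the logit of every rescaled response is finite
logit_scale <- qlogis(generalisation)
all(is.finite(logit_scale))

# Without the offset, responses at the ends of the scale would map to 0 or 1
qlogis(c(0, 1))  # -Inf and Inf
```

Under that assumption the transform is symmetric: the lowest and highest rescaled values sum to 1, so the midpoint of the scale maps to 0.5.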

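fullframes_avg <- fullframes %>%
  group_by(test_item, condition, sample_size) %>%
  summarise(
    n = n(),
    sd = sd(generalisation),   # computed before generalisation is overwritten
    se = sd / sqrt(n),         # standard error of the mean, used for the error bars
    generalisation = mean(generalisation)
  ) %>%
  ungroup()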

expsummary <- fullframes_avg %>%
  ggplot(aes(x = test_item, y = generalisation, colour = condition)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = generalisation - se, ymax = generalisation + se)) +
  facet_wrap(~ sample_size)  # one panel per sample size

plot(expsummary)

Eyeballing the data, it seems clear that:

  1. responses are higher overall for the category than the property condition
  2. responses decrease as test_item increases
  3. there is an interaction between n_obs and test item (as n_obs increases, difference between small and large test items increases)
  4. there is an interaction between test item and condition (as test_item increases, difference between category and property sampling increases)
  5. there is an interaction between n_obs and condition (as n_obs increases, difference between category and property sampling increases)
  6. there is a three-way interaction between n_obs, test item and condition (as n_obs increases, the interaction between test item and condition becomes more pronounced)

On the other hand, it’s not clear whether

  1. the average response increases or decreases as n_obs increases
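To make "interaction" in the lists above concrete: a two-way interaction is a difference of differences between cell means. The sketch below uses made-up cell means (hypothetical numbers, not values from the experiment) to show the arithmetic for an n_obs by condition interaction.

```r
# Hypothetical cell means (NOT from the experiment): rows are sampling
# conditions, columns are the smallest and largest sample sizes
cell_means <- matrix(
  c(0.60, 0.65,   # category: small, large
    0.55, 0.40),  # property: small, large
  nrow = 2, byrow = TRUE,
  dimnames = list(condition = c("category", "property"),
                  sample_size = c("small", "large"))
)

# Simple effect of condition at each sample size (category minus property)
condition_gap <- cell_means["category", ] - cell_means["property", ]

# Interaction contrast: how much that gap changes from small to large
interaction_contrast <- condition_gap[["large"]] - condition_gap[["small"]]
interaction_contrast
```

A contrast of zero would mean the category/property gap is the same at both sample sizes; here the gap grows, which is the pattern described in point 5.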

Frequentist and Bayesian methods can both be used to test these qualitative impressions.

Frequentist approach

The day4/statistics.Rmd notebook ended up finding that a logit-link mixed-effects regression model with the formula

generalisation ~ condition * test_item * n_obs + (1 + test_item * n_obs|id) 

was the best at capturing the responses of individual participants. Here, however, we’ll use a model whose only random effect is a per-participant intercept:

generalisation ~ condition * test_item * n_obs + (1|id) 

The more elaborate version of the model would be a better choice under some circumstances. For example, if the project focused on individual differences and aimed to develop a cognitive model that accounted for data at an individual level, it would probably make sense to use the more complex version of the model for the data analysis. Here we use the simpler version because it will be easier for readers to understand, and because it parallels the Bayes factor approach that the authors actually used in the published paper.
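One way to see how much simpler the intercept-only version is: count the variance and covariance parameters each random-effects structure asks the model to estimate. The sketch below does this in base R, assuming test_item and n_obs enter as numeric predictors; the design grid values are placeholders, and n_cov_params is a helper defined only for this illustration.

```r
# Placeholder design grid; assumes test_item and n_obs are numeric
design <- expand.grid(test_item = 1:7, n_obs = c(2, 6, 12))

# Random-effects terms implied by each structure
full_terms   <- colnames(model.matrix(~ 1 + test_item * n_obs, design))
simple_terms <- colnames(model.matrix(~ 1, design))

# A k-term random effect has k variances and k(k-1)/2 covariances
n_cov_params <- function(k) k * (k + 1) / 2

n_cov_params(length(full_terms))    # 10
n_cov_params(length(simple_terms))  # 1
```

Under these assumptions the elaborate structure estimates ten random-effect parameters per grouping factor against one for the random intercept, which is part of why the simpler model is easier to fit and to explain.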

First we fit the model:

library(lme4)  # provides glmer()

logitmod <- glmer(
  formula = generalisation ~ condition * test_item * n_obs + (1|id), 
  family = gaussian(link = "logit"),  # Gaussian errors with a logit link
  data = fullframes)

If following this approach, the paper would include a write-up of this single model along with plots showing posterior distributions on all coefficients.
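The family argument in the model above pairs Gaussian errors with a logit link. As a rough illustration of what that combination means, the following sketch fits the same kind of model without random effects, using base R's glm() on simulated data. The data, the variable names x and y, and the coefficient values are all made up for the example.

```r
set.seed(1)

# Simulated data (hypothetical, not the sampling frames data):
# the mean of y follows a logistic curve in x, with Gaussian noise around it
n <- 200
x <- runif(n, min = -1, max = 1)
true_intercept <- 0.5
true_slope <- 1
y <- plogis(true_intercept + true_slope * x) + rnorm(n, sd = 0.02)

# Gaussian family with a logit link: coefficients live on the logit scale
fit <- glm(y ~ x, family = gaussian(link = "logit"))
coef(fit)

# Predictions come back on the response scale via the inverse link
head(predict(fit, type = "response"))
```

The estimated coefficients should land close to the simulated intercept and slope, and the fitted values stay inside (0, 1) because they pass through the inverse logit.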