frames <- read_csv(here("samplingframes", "data", "data_samplesize.csv"))
set.seed(1) # for replicability

What this section is not

Because R is a statistical programming language, it comes with a lot of hypothesis tests and tools built in, and of course there is an overwhelming number of packages out there that extend them. It is impossible to cover everything in one tutorial, so I’m going to be a little picky. For example, I’m going to skip over the most commonly used classical tests, because they’re comparatively easy to learn and covering them is not the best use of our time! For future reference though:
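As a rough illustration of what "built in" means here, the sketch below calls three classical tests that ship with base R: `t.test()`, `cor.test()` and `chisq.test()`. Which tests count as "most commonly used" is my assumption, and the data are simulated purely for the example (the names `group_a`, `group_b` and `counts` are made up and have nothing to do with the sampling frames data).

```r
# Simulated data, purely illustrative (not the sampling frames data)
set.seed(1)
group_a <- rnorm(30, mean = 0)
group_b <- rnorm(30, mean = 0.5)

# Independent-samples t-test (R uses the Welch version by default)
t_out <- t.test(group_a, group_b)

# Pearson correlation test between the two vectors
cor_out <- cor.test(group_a, group_b)

# Chi-square test of independence on a small 2 x 2 table of counts
counts <- matrix(c(20, 10, 15, 25), nrow = 2)
chi_out <- chisq.test(counts)

# Each call returns an "htest" object; printing it gives a readable
# summary, and individual pieces can be pulled out by name
t_out$p.value
cor_out$estimate
chi_out$statistic
```

All three functions return the same kind of object, so once you can read the output of one, the others should look familiar.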

Of course, there are many, many others! What we’re going to focus on here is:

One of our goals is to get to the point where we can analyze the sampling frames data from yesterday. After exploring and visualizing these data yesterday, you have some idea of what they look like. Below is a summary plot that shows how generalization curves varied across both conditions (category, property) and sample sizes (small, medium, large).

frames_avg <- frames %>%
  mutate(sample_size = factor(sample_size, levels = c("small","medium","large"))) %>% 
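  group_by(test_item, condition, sample_size) %>%
  summarise(
    n = n(),
    se = sd(response) / sqrt(n),  # standard error of the mean, used for the error bars
    response = mean(response)     # computed last so sd() above sees the raw responses
  ) %>%
  ungroup()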

frames_avg %>%
  ggplot(aes(x = test_item, y = response, colour = condition)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = response - se, ymax = response + se)) +