W14: The Problem of Multiple Comparisons: A Role-playing Activity


By Jonathan Wells


Information

The 2015 revised GAISE report encourages instructors to incorporate multivariable thinking in their college-level statistics courses. While multivariate data allow students to explore interactions and confounding factors, they also tempt students to draw erroneous conclusions based on multiple comparisons. In this Beyond demonstration, we describe an activity appropriate for an introductory or intermediate statistics class of 10-30 students that provides salient and memorable exposure to the problem of multiple comparisons. During the activity, groups of students take the role of teams of data scientists. Each team is provided with a different subset of a multivariate data set that includes a response variable and twenty predictors, and is tasked with identifying the variable most strongly associated with the response. After performing their analysis, groups share their results with the class and are often surprised to find that each group has identified a different significant variable. At this point, the instructor reveals the true nature of the data: none of the variables is related to the response, because all were generated independently, and any observed correlations in the data set are due to chance alone. The class concludes with a discussion of the problem of multiple comparisons, as well as possible solutions. At the conclusion of this demonstration, participants will have access to a ready-to-implement activity on multiple comparisons and the replication crisis, appropriate for small- to moderate-sized introductory and intermediate statistics classes.
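To make the surprise concrete, the following minimal Python sketch simulates the situation described above: a response and twenty predictors, all generated independently, tested one at a time at the 5% level. It is an illustration under stated assumptions, not the activity's actual data or materials. The sample size of 50 observations per group, the standard normal distribution, the Fisher z approximation for p-values, and all function names are assumptions introduced here. The Bonferroni threshold at the end is one common correction, shown only as an example of a possible solution.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, using the Fisher z approximation
    (an approximation chosen to keep the sketch free of extra libraries)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_group(rng, n_obs=50, n_predictors=20):
    """One group's data: a response and predictors that are all independent
    noise. n_obs=50 is an assumed sample size. Returns one p-value per
    predictor."""
    response = [rng.gauss(0, 1) for _ in range(n_obs)]
    p_values = []
    for _ in range(n_predictors):
        predictor = [rng.gauss(0, 1) for _ in range(n_obs)]
        p_values.append(approx_p_value(pearson_r(predictor, response), n_obs))
    return p_values


def share_with_false_positive(n_groups=1000, alpha=0.05, seed=1):
    """Fraction of simulated groups whose smallest p-value falls below alpha."""
    rng = random.Random(seed)
    hits = sum(min(simulate_group(rng)) < alpha for _ in range(n_groups))
    return hits / n_groups


# Chance of at least one "significant" predictor among 20 independent tests
# when no predictor is truly related to the response.
analytic = 1 - (1 - 0.05) ** 20
simulated = share_with_false_positive()

# Bonferroni: test each predictor at alpha / (number of tests) instead.
bonferroni_alpha = 0.05 / 20

print(f"analytic family-wise error rate: {analytic:.3f}")
print(f"simulated share of groups with a false positive: {simulated:.3f}")
print(f"Bonferroni per-test threshold: {bonferroni_alpha}")
```

With twenty independent tests at the 5% level, the analytic chance of at least one spurious finding is 1 - 0.95^20, roughly 64%, which is why most groups in a class can expect to find a "significant" variable in pure noise.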