Artificial data sets are often used to demonstrate statistical methods in applied statistics courses and textbooks. We believe that this practice removes much of the intrinsic interest in learning to do good data analysis and contributes to the myth that statistics is dry and dull. In this article, we argue that artificial data sets should be eliminated from the curriculum and replaced with real data sets. Real data, supplemented by suitable background material, enable students to acquire analytic skills in an authentic research context and enable instructors to demonstrate how statistical analysis is used to model real data. To help instructors incorporate real data into applied statistics curricula, we identify seven characteristics that make data sets particularly good for instructional use and present an annotated bibliography of more than 100 primary and secondary data sources.