Data Sharing and the Development of the Cleveland Clinic Statistical Education Dataset Repository


Authors: Amy S. Nowacki
Year: 2013
URL: http://ww2.amstat.org/publications/jse/v21n1/nowacki.pdf
Abstract: 

Examples are highly sought by both students and teachers. This is particularly true as many
statistical instructors aim to engage their students and increase active participation. While
simulated datasets are functional, they lack real perspective and the intricacies of actual data. In
order to obtain real datasets, the principal investigator of a study must be willing to share the
data. Understanding investigators’ opinions regarding data sharing would thus help elucidate the
general lack of data sharing currently exhibited. Presented are the results of a survey designed to
gather information regarding the proportion of researchers willing to share their data, conditions,
formats, primary motivation, concerns and current availability of data for sharing. With 76%
(56/74) responding favorably to the idea of sharing their published data, the creation of a new
statistical educational resource was prompted. Thus, additionally described is a web-based
dataset repository that can be used as a resource by both educators and students of statistics.
This growing repository presents raw data from real medical studies and offers (a) a vignette
summarizing the study, research question and study design; (b) a data dictionary with clear
documentation of variables and codes; (c) a complete citation for the associated study
publication; and (d) a variety of data formats compatible with the majority of statistical
packages. The repository went online on 12/18/12 at the URL
http://www.lerner.ccf.org/qhs/datasets/.

