Hartono Tjoe (Pennsylvania State University), Leigh Harrell-Williams (University of Memphis), Stephanie Casey (Eastern Michigan University), Charlotte Bolch (Midwestern University), Taylor Mulé (University of Memphis)
Abstract
Background. This paper focuses on the process and results of the work completed by the Statistics Education Synthesis group within the NSF-funded Validity Evidence for Measurement in Mathematics Education (VM²Ed) project (PIs: Krupa & Bostic; https://sites.ced.ncsu.edu/mathedmeasures/). The project goal is to identify and compile mathematics and statistics education assessments/instruments through multiple rounds of literature searches, with the end product being a searchable repository of instruments. This work is valuable to the field because the application of modern measurement theory and practice is limited and not well integrated into research efforts that support instrument interpretation.

Methods. We performed three rounds of work to identify validity evidence for statistics education instruments to be added to the project’s searchable repository. In the first round, we compiled a list of statistics education instruments noted in publications between 2000 and 2020. The second round focused on a detailed literature search to identify publications citing each of those instruments. We are currently completing the third round, which encompasses identifying and classifying pieces of validity evidence for each instrument using the framework of the 2014 AERA/APA/NCME Standards for Educational and Psychological Testing.

Findings. The findings include a discussion of the types and sources of validity evidence for statistics education instruments, with a focus on single-use instruments. Overall, a high percentage of statistics education instruments are single use: 76% of the 107 instruments on our list, 77% of the 36 student attitude instruments, 71% of the 33 student knowledge instruments, and 76% of the 12 teacher knowledge and attitudes instruments. These instruments tend to have a minimal body of validity evidence. In contrast, we highlight instruments with broad bodies of validity evidence with respect to the five sources of validity evidence outlined in the 2014 Standards.

Implications. Based on our work, we recommend that the statistics education field work to: (1) use modern measurement language to frame validity evidence and argumentation, (2) reduce the prevalence of single-use instruments, (3) reduce questionable measurement practices (Flake & Fried, 2020), such as combining items from existing instruments into a new instrument without addressing its validity evidence, and (4) treat validation as a program of research (Bandalos, 2018).