By Susan Lloyd & Matthew Beckman (The Pennsylvania State University)
"Effective formative assessment (FA) is indispensable for instructors to monitor students’ learning. Research has linked “write-to-learn” tasks to improved learning outcomes in science and mathematics, yet constructed-response methods of FA become unwieldy for instructors with large enrollment classes.
During a previous study, six short-answer tasks were completed by a combined sample of 1,935 undergraduate statistics students from 15 distinct institutions. Responses were divided among three trained evaluators, with enough overlap in the assigned responses to measure inter- and intra-rater agreement. A natural language processing (NLP) algorithm scored a subset of student responses to serve as an automated rater. A rigorous evaluation of rater agreement has inherent value for instructors of classes of any size.
Reliable classification (e.g., incorrect / partial / correct) and successful NLP clustering of similar responses are necessary prerequisites for AI-assisted FA for short-answer tasks. Future work will iteratively refine the clustering so that an instructor may interpret meanings and misconceptions common within response clusters, resulting in useful, scalable FA feedback to hundreds or thousands of students. Success in these efforts creates an opportunity to study formative assessment interventions and mechanisms associated with desired learning outcomes, with implications for smaller and intermediate class sizes as well."
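
The abstract does not name the agreement statistic that was used. As a minimal sketch, assuming Cohen's kappa is an acceptable measure for two raters assigning incorrect / partial / correct labels, agreement between a human evaluator and an automated rater could be computed as below. The function name `cohen_kappa` and the example scores are hypothetical illustrations, not data or code from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same responses."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of responses")
    n = len(rater_a)
    # Observed agreement: share of responses given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: computed from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical label for every response
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical scores for eight responses (NOT data from the study).
human = ["correct", "partial", "incorrect", "correct",
         "partial", "correct", "incorrect", "partial"]
nlp = ["correct", "partial", "incorrect", "partial",
       "partial", "correct", "incorrect", "correct"]

kappa = cohen_kappa(human, nlp)
print(f"Cohen's kappa: {kappa:.3f}")
```

Kappa corrects raw percent agreement for the agreement expected by chance, which matters when one category (for example, "correct") dominates. The same function applies to intra-rater agreement if the two lists hold one evaluator's scores from two separate passes over the same responses.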