Coding Code: Investigating Students’ Data Science Skills with Qualitative Methods | Room 107


Allison Theobold (Cal Poly, San Luis Obispo)


Abstract

Background. Over the last 15 years, qualitative methods have been integrated throughout statistics education research (e.g., grounded theory, phenomenology, archaeology). However, we have yet to see researchers integrate these methodologies into research on the teaching and learning of data science. Despite the elevated importance of data science skills, there is limited research investigating how students learn the computing concepts and skills necessary for carrying out data science tasks. Computer science education researchers have investigated how students debug their own code and reason through unfamiliar code. Although these studies illuminate aspects of students’ conceptual understanding or programming behavior, they do not shed light on students’ learning processes. This type of inquiry necessitates qualitative methods: describing the computing skills, organizing these descriptions into themes, and comparing the emergent themes across students or across time. In this presentation I share how to conceptualize and carry out this qualitative coding process with students’ computing code. Drawing on the Block Model (Schulte, 2008) to frame my analysis, I explore the R code produced by one graduate student throughout their research. Using qualitative methods to map the development of this student’s computing skills over time, I outline our discipline’s first learning trajectory of data science skills.

Methods. In this presentation, I propose that researchers consider students’ code as an artifact of their learning. I will walk through the three phases required for all qualitative research (describing, organizing descriptions into themes, and comparing the emergent themes). These phases are explored in the context of the R code produced by one graduate student, Alicia, throughout her graduate research. I detail how the Block Model (Schulte, 2008) can be used to frame this type of investigation, supporting the analysis of a variety of aspects of a computer program. Specifically, I analyze computing code from two perspectives: the level and the dimension of the program.
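
To make the two perspectives concrete, the following is a minimal Python sketch, not the analysis presented in the talk. It assumes the Block Model's four levels (atoms, blocks, relations, macro structure) and three dimensions (text surface, program execution, function). The script labels, R excerpts, and the codes assigned to them are hypothetical examples chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Block Model categories (Schulte, 2008), as assumed for this sketch.
LEVELS = ("atoms", "blocks", "relations", "macro structure")
DIMENSIONS = ("text surface", "program execution", "function")


@dataclass(frozen=True)
class CodedSegment:
    """One excerpt of student code with its qualitative codes."""
    script: str     # hypothetical label for when the code was written
    excerpt: str    # hypothetical R excerpt being described
    level: str      # Block Model level assigned by the researcher
    dimension: str  # Block Model dimension assigned by the researcher

    def __post_init__(self):
        # Reject codes that fall outside the chosen framework.
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level}")
        if self.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {self.dimension}")


def tally_by_script(segments):
    """Count (level, dimension) codes per script, for comparison over time."""
    tallies = {}
    for seg in segments:
        counter = tallies.setdefault(seg.script, Counter())
        counter[(seg.level, seg.dimension)] += 1
    return tallies


# Hypothetical coded excerpts; the code assignments are illustrative only.
segments = [
    CodedSegment("script_1", "x <- c(1, 2, 3)", "atoms", "text surface"),
    CodedSegment("script_1", "mean(x)", "atoms", "program execution"),
    CodedSegment("script_2", "for (i in 1:10) { ... }", "blocks", "program execution"),
    CodedSegment("script_2", "clean <- function(df) { ... }", "macro structure", "function"),
]

tallies = tally_by_script(segments)
for script in sorted(tallies):
    print(script, dict(tallies[script]))
```

Tallies like these support only the comparison phase; the descriptions and themes still come from the researcher's reading of the code.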

Findings. The intention of this presentation is twofold: (1) to outline methods for qualitatively analyzing students’ code, and (2) to use these methods to outline a preliminary learning trajectory for data science concepts. Using the R code produced by one graduate student throughout her graduate research, I outline how these methods shed light on the development of students’ computing skills.

Implications. The field of data science education is emerging as its own discipline of research, facing a multitude of open questions surrounding the teaching and learning of data science. I posit that the next horizon of research in data science education is to critically inspect student learning from the perspective of the learner, paying specific attention to students’ computing code as an artifact of their learning. Furthermore, I believe qualitative research will play a dominant role in the future of data science education research, and I hope the methodology outlined in this presentation inspires researchers to begin this important work.