Keynote #2 (CAUSE Research): Classification, decision trees, and machine learning as part of secondary statistics education - from societal, conceptual, and pedagogical analysis to research on classroom experiments


Rolf Biehler (Paderborn University)


Abstract

Data science and statistics education at the secondary level is facing a new challenge: how to contribute to students' understanding of new machine learning methods and their applications in science and society.

In the Project Data Science and Big Data at School (www.prodabi.de/en), we have developed teaching materials and professional development courses on this topic. Our courses are based on societal, conceptual, and pedagogical analyses, and we empirically study their implementation in the classroom.

Decision trees were chosen as an introductory ML method because of their simplicity and transparency. This approach allows students to start with manual tree construction before moving on to semi-automatic and automatic methods, and highlights the differences between human- and computer-generated decision trees. It provides an easier introduction to multivariate methods than multivariate regression models and illustrates algorithmic models.
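To make the contrast between a hand-built and a computer-generated tree concrete, here is a minimal Python sketch. It is not taken from the project's materials: the food data, the feature names, and the helper functions `stump_errors` and `best_stump` are all hypothetical, and the "tree" is limited to a single split. A rule chosen by hand is compared with a rule found by trying every feature and every observed value as a threshold.

```python
# Hypothetical toy data, invented for illustration only:
# (sugar in g per 100 g, fat in g per 100 g, label)
foods = [
    (2.0, 0.5, "recommended"),
    (5.0, 3.0, "recommended"),
    (8.0, 1.0, "recommended"),
    (4.0, 25.0, "not recommended"),
    (30.0, 10.0, "not recommended"),
    (45.0, 20.0, "not recommended"),
    (12.0, 2.0, "recommended"),
    (25.0, 1.0, "not recommended"),
]
FEATURES = {"sugar": 0, "fat": 1}  # feature name -> column index


def stump_errors(data, feature, threshold):
    """Count misclassifications of the one-split rule
    'value <= threshold -> recommended, otherwise not recommended'."""
    idx = FEATURES[feature]
    errors = 0
    for row in data:
        predicted = "recommended" if row[idx] <= threshold else "not recommended"
        if predicted != row[2]:
            errors += 1
    return errors


def best_stump(data):
    """Try every feature and every observed value as a threshold and
    return the first (feature, threshold, errors) with the fewest errors."""
    best = None
    for feature, idx in FEATURES.items():
        for threshold in sorted({row[idx] for row in data}):
            errors = stump_errors(data, feature, threshold)
            if best is None or errors < best[2]:
                best = (feature, threshold, errors)
    return best


# A rule a student might propose by hand, versus the exhaustive search.
manual_errors = stump_errors(foods, "sugar", 10.0)
auto_rule = best_stump(foods)
print("manual rule (sugar <= 10):", manual_errors, "errors")
print("searched rule:", auto_rule)
```

The search minimizes the raw misclassification count because that is the easiest criterion to check by hand. Professional libraries typically use other split criteria and grow trees with several levels.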

The project uses decision trees to introduce classification methods that are not commonly taught at the secondary level, despite their widespread use in the real world. It links classification to bivariate data analysis and to Bayesian decision making, with its two typical types of error, and extends to the evaluation of ML classifiers through the concept of the confusion matrix.
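The two types of error and the confusion matrix can be illustrated with a short Python sketch. The labels below and the helper `confusion_matrix` are hypothetical and serve only as an example. They are not data or code from the project.

```python
def confusion_matrix(actual, predicted, positive):
    """Tally the four cells of a 2x2 confusion matrix for a binary classifier.
    FP and FN are the two types of error."""
    cells = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == positive:
            cells["TP" if p == positive else "FN"] += 1
        else:
            cells["FP" if p == positive else "TN"] += 1
    return cells


# Hypothetical true labels and classifier predictions, invented for illustration.
actual = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
predicted = ["yes", "no", "yes", "no", "yes", "no", "no", "yes"]

cm = confusion_matrix(actual, predicted, positive="yes")
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
false_positive_rate = cm["FP"] / (cm["FP"] + cm["TN"])  # share of actual "no" cases flagged "yes"
false_negative_rate = cm["FN"] / (cm["FN"] + cm["TP"])  # share of actual "yes" cases missed

print(cm)
print("accuracy:", accuracy)
print("false positive rate:", false_positive_rate)
print("false negative rate:", false_negative_rate)
```

Reporting the two error rates separately, rather than accuracy alone, shows which kind of mistake a classifier tends to make.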

We have developed materials for different age groups. We use unplugged data cards for grades 5 and 6, and move to semi-automatic tree generation with CODAP and the ARBOR plug-in in grades 8 to 10. We have developed special Jupyter notebooks, mainly for grade 12, which appear menu-based to students (but teachers can change the code if they wish). They provide a tool for interactive experimentation with trees and for using professional libraries to create optimal trees.

The courses incorporate data on nutrition and adolescents' media use, encouraging students to explore and analyze the data before applying decision trees.

The presentation will discuss the project's materials, pedagogical foundations, and examples of classroom applications and related research.

 

Presenter Bio:

Dr. Rolf Biehler is professor emeritus for didactics of mathematics at Paderborn University, Germany. His research interests include probability, statistics and data science education with digital tools, university mathematics education and the professional development of mathematics teachers (https://www.researchgate.net/profile/Rolf-Biehler). Rolf Biehler is currently co-directing the Project Data Science and Big Data at School (www.prodabi.de/en/) at Paderborn University, a collaborative project with computer science educators. The project develops curriculum material and professional development courses for teachers and investigates which aspects of data-based machine learning can and should be taught at primary, middle and high school level. He was co-founder and co-director of the Centre for Research in University Mathematics Education (www.khdm.de) and co-founder of the German Centre for Mathematics Teacher Education (www.dzlm.de). He has served and continues to serve on editorial boards of several international and national mathematics education journals and book series. He currently chairs the advisory board of the Statistics Education Research Journal and the German Association for the Promotion of Stochastics Teaching in Schools.

