P4-09: Real Data Are Messy! Cleaning, Organizing, and Drawing Meaning from Raw Data on Maple Trees


By Nicole Williams & Jeffrey McLean (St. Lawrence University)


Abstract

As statistics coursework expands at all age levels, more students are being exposed to the statistical investigative process of problem solving and decision-making. Working with raw, messy data, cleaning it, organizing it, and then drawing meaning from it should be a focus of this investigative process. Statistics curricula often use real-world examples, yet the data tend to be ready to process and perfectly fitted to the example. A recent survey by CrowdFlower found that cleaning and organizing data is the task on which data scientists spend the most time. With the boom of data scientist positions in academia and industry, it is crucial that we include this important part of the investigative process in our curriculum. Working with a local community-based organization, Nature Up North, we developed an activity for introductory statistics students to experience messy data.

Since 2013, Nature Up North has worked with local residents of all ages to observe phenology, the timing of seasonal changes, by collecting data on all types of maple trees. The data were collected over the last five years during the fall and spring seasons and comprise thousands of cases of maple trees with more than two dozen variables. Because the data were gathered by volunteers of all ages and form a large, complex, and messy data set, they are not in an optimal format for analysis. Our session will discuss an activity in which our students act as consultants to create a method for cleaning, organizing, and exploring the gathered data.


Recording

eCOTS 2018 - P4-09 - Real Data Are Messy! Cleaning, Organizing, and Drawing Meaning from Raw Data on Maple Trees.pdf