In this paper, I report preliminary results from two ongoing studies, with the aim of identifying the problem areas and difficulties that students encounter in elementary data analysis. The two projects share the same general design. Students took a course in data analysis in which they learned a software tool, used it throughout the course, and carried out a data analysis project with it at the end of the course. The course covered elementary concepts and tools of data analysis: variables and variable types, box plots, frequency tables and graphs, two-way frequency tables, summary measures (median, mean, quartiles, interquartile range, range), scatterplots, and line plots. An important idea in studying the dependence between two variables was to group the data and compare the distributions in the subgroups defined by a grouping variable. The methods for analyzing dependencies differed according to the types of the variables: for example, scatterplots were used for two numerical variables, and two-way frequency tables and related visualizations were used for two categorical variables.
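To make the link between variable types and methods concrete, the following is a minimal sketch in plain Python rather than in the software tool used in the courses. The data set and the variable names (`gender`, `has_job`, `hours`) are hypothetical and are not taken from either study. The quartile rule is also an assumption: textbooks and software packages define quartiles in slightly different ways, and this sketch uses the "inclusive" method of Python's `statistics.quantiles`.

```python
from collections import Counter, defaultdict
from statistics import mean, median, quantiles


def summarize(values):
    """Compute the summary measures listed in the course outline."""
    # Assumption: "inclusive" quartiles; other definitions give other values.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "median": median(values),
        "mean": mean(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "range": max(values) - min(values),
    }


def summarize_by_group(records, group_key, value_key):
    """Categorical x numerical: compare distributions across subgroups."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {group: summarize(values) for group, values in groups.items()}


def two_way_table(records, row_key, col_key):
    """Categorical x categorical: count each combination of categories."""
    return Counter((record[row_key], record[col_key]) for record in records)


# Hypothetical survey data, invented for illustration only.
rows = [
    ("f", "yes", 2), ("f", "no", 4), ("f", "no", 6), ("f", "yes", 8),
    ("f", "no", 10), ("m", "yes", 1), ("m", "yes", 3), ("m", "no", 5),
    ("m", "yes", 7), ("m", "no", 9),
]
records = [dict(zip(("gender", "has_job", "hours"), row)) for row in rows]

by_group = summarize_by_group(records, "gender", "hours")
table = two_way_table(records, "gender", "has_job")
```

Here `by_group` holds one set of summary measures per subgroup, which is the numerical counterpart of comparing box plots side by side, and `table` holds the cell counts of a two-way frequency table. The case of two numerical variables is left out because it calls for a scatterplot, which requires a plotting library.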
The CAUSE Research Group is supported in part by a member initiative grant from the American Statistical Association’s Section on Statistics and Data Science Education.