On Enthusing Students About Big Data and Social Media Visualization and Analysis Using R, RStudio, and RMarkdown

Julian Stander & Luciana Dalla Valle

We discuss the learning goals, content, and delivery of MATH1608PP Understanding Big Data from Social Networks, an intensive four-week module at the University of Plymouth that introduces students to a broad range of techniques used in modern Data Science. The module made use of R, accessed through RStudio, and some popular R packages. After describing the initial examples used to fire student enthusiasm, we explain our approach to teaching data visualization using the ggplot2 package. We then discuss other module topics, including basic statistical inference, data manipulation with dplyr and tidyr, databases and SQL, social media sentiment analysis, Likert-type data, reproducible research using RMarkdown, dimension reduction and clustering, and parallel R. We present four lesson outlines and describe the module assessment. Finally, we mention some of the problems encountered when teaching the module, and present student feedback and our plans for next year.

The CAUSE Research Group is supported in part by a member initiative grant from the American Statistical Association’s Section on Statistics and Data Science Education.