# eCOTS 2014 - Breakout Session #3

### "How Introductory Applied Statistics Course Instructors Can Introduce Big Data through Four of its Vs" John McKenzie, Babson College

#### Abstract

Today it is hard to ignore Big Data because of its presence in all aspects of 21st-century life, including business, medicine, national security, science, and sports. It was the theme of the 2012 Math Awareness Month. In 2011 the McKinsey Global Institute reported that the United States will have a shortage of more than 1.5 million people with Big Data knowledge by 2018. From its name, most people believe it deals only with enormous sets of data readily available from the Internet and sensors. A more appropriate definition is one that denotes data sets whose size is beyond the ability of commonly used software to capture, store, manage, and analyze. The processing of such data involves statistics, computer science, and operations research. Numerous institutions in the United States have created or are developing courses to analyze Big Data. Among the names for these cross-disciplinary courses are Business Analytics and Data Science. There are now many related academic degree and certificate programs.

All undergraduate students who take an introductory applied statistics course (and secondary-school students who take the AP statistics course) should be introduced to Big Data. Then they will be more informed citizens who may decide to continue the study of statistics in preparation for future employment. But how can one introduce such information into a course that has very little "wiggle room"? The first 10 minutes of this 30-minute webinar will illustrate how this can easily be done by presenting some examples through four of Big Data's Vs. In 2001 consultant Doug Laney characterized Big Data by three Vs: Volume, Velocity, and Variety (in structure). Among the other Vs that have been added to his list are Variety (in source) and Veracity (maintaining and assuring the accuracy and consistency of data over time). Another V associated with Big Data is Visualization: the graphic visual representations necessary for the analysis of such data and for the communication of results. The examples will come from the following list of ways to introduce Big Data, drawn from Laney's original Vs and Visualization:

• Volume: show how p-values of an analysis can be changed by increasing its sample size(s); include some much larger data sets than usually found in textbooks
• Velocity: include some time series data sets and process data sets, not just cross-sectional data sets
• Variety: show that there are two possible structures for two samples of data; include some data sets with missing data, text data, and date/time data; ask students to clean a messy data set
• Visualization: show how freeware word clouds and n-grams can be used to analyze text data; present static network diagrams, heat maps, and other infographics; present examples of dynamic displays
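
The Volume item can be illustrated with a few lines of code. The following is only a minimal sketch, not taken from the presenter's slides: it assumes a one-sample z-test with a known standard deviation, and the function name `two_sided_p_value`, the observed difference of 0.1, and the standard deviation of 1.0 are all hypothetical values chosen for illustration. It holds the observed difference fixed and shows how the p-value shrinks as the sample size grows.

```python
import math


def two_sided_p_value(mean_diff, sd, n):
    """Two-sided p-value of a one-sample z-test (hypothetical helper).

    mean_diff: observed sample mean minus the hypothesized mean
    sd:        standard deviation, assumed known
    n:         sample size
    """
    z = mean_diff / (sd / math.sqrt(n))
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Same observed difference (0.1) and spread (1.0); only the sample size changes.
for n in (25, 100, 400, 1600, 10000):
    p = two_sided_p_value(0.1, 1.0, n)
    print(f"n = {n:>6}  p-value = {p:.4g}")
```

Under these assumptions, a difference that is far from significant at n = 25 falls below 0.05 at n = 400, which can open a classroom discussion of statistical versus practical significance in very large data sets.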

Attached is a set of PowerPoint slides containing possible examples. These slides were presented to an audience of instructors of the introductory business statistics course at the 2013 Decision Sciences Institute Annual Meeting.

The remaining minutes of the session will start by asking the audience to suggest additional ways that Big Data can be introduced into Stat 101. Then the audience will be polled to determine 1) how many have previously introduced their students to Big Data and 2) how many expect to increase their coverage of Big Data. The session will conclude with a discussion of the challenges in introducing Big Data into the introductory applied statistics course.

It is the hope of the presenter that audience members will incorporate some of the ways to introduce Big Data from the presentation into their courses. As mentioned above, this is an important and doable addition to Stat 101.
