  • Data Science for Undergraduates: Opportunities and Options

    Nicholas J. Horton (Amherst College)
    Tuesday, July 10, 2018 - 2:00pm ET
    As our economy, society, and daily life become increasingly dependent on data, work across nearly all fields is becoming more data driven, affecting both the jobs that are available and the skills that are required. The National Science Foundation asked the National Academies of Sciences, Engineering, and Medicine to set forth a vision for the emerging discipline of data science at the undergraduate level. The study committee considered the core principles and skills undergraduates should learn and discussed the pedagogical issues that must be addressed to build effective data science education programs. The report underscores the importance of preparing undergraduates for a data-enabled world and recommends that academic institutions and other stakeholders take steps to meet the evolving data science needs of students. In this webinar, implications, opportunities, and challenges for statistics educators will be discussed along with the study findings. Resources: http://nas.edu/EnvisioningDS
  • "Tame" data principles and the fivethirtyeight R package

    Albert Y. Kim (Amherst/Smith College)
    Tuesday, June 12, 2018 - 2:00pm ET
    FiveThirtyEight.com is a data journalism website founded by Nate Silver that makes many of the datasets used for their articles openly available on GitHub.com. The fivethirtyeight R package acts as an intermediary to make all this data, its documentation, and links to the original articles easily accessible to R users. Furthermore, the package "tames" the data: the data is pre-processed enough so that the biggest barriers to data exploration faced by novice R users are eliminated, but not so much that the true nature of the data as it exists "in the wild" is completely betrayed. In this webinar, I will present the corresponding set of "tame" data principles, discuss the pedagogical thinking behind them, and present illustrative examples involving datasets from articles on FiveThirtyEight.com.
  • Interactive R tutorials with learnr

    Mine Çetinkaya-Rundel
    Tuesday, May 8, 2018 - 2:00pm ET
    The learnr R package provides a new multimedia approach for teaching statistics and programming with R. Building on R Markdown, this package allows teachers to create interactive tutorials containing narrative, figures, illustrations, and equations; code exercises (R code chunks that users can edit and execute directly); multiple choice quiz questions; videos; and interactive Shiny components. Tutorials built with this tool can be used for checking and reinforcing students' understanding, and they have the benefit of being self-paced and providing instant feedback. In this webinar we will demonstrate how to use the learnr package to build interactive R tutorials and discuss best practices for using them.
  • A study on the current state of the use of online/flipped classroom pedagogy in statistics/biostatistics

    Todd Schwartz and Jane Monaco (University of North Carolina)
    Tuesday, April 10, 2018 - 2:00pm ET
    Online courses and 'flipped' classrooms are becoming more common in statistics/biostatistics. The literature lacks a systematic study of the instructors of these types of (bio)statistics courses. We conducted a survey to elicit these instructors' responses in terms of implementation, ratings, recommendations, and opinions, and we report on n=46 such instructors. In this webinar, we describe characteristics of these respondents' courses and summarize their responses on various aspects. Results are given both overall and for different subgroups of interest. Our findings should be useful to statistics educators who might be considering adopting these formats.
  • Statistics knowledge among health sciences faculty

    Matt Hayat, Michael Jiroutek, MyoungJin Kim, and Todd Schwartz
    Tuesday, March 27, 2018 - 2:00pm ET
    Healthcare professionals and faculty depend on the health and medical literature to keep current with clinical information and best evidence-based practices. Yet little is known about their knowledge of, and comfort level with, statistics. We conducted a research study on health sciences faculty to assess their knowledge about statistics. A probability sample of schools of dentistry, nursing, medicine, pharmacy, and public health was selected, and faculty were invited to complete a brief online survey that included 9 demographic-related questions and a 10-question statistics knowledge instrument. In this webinar we will present study results, including aggregated findings for the 708 respondents, as well as interesting discipline-specific findings. Implications for statistics educators will be discussed, and time will be allotted for questions from the audience.
  • What should we teach in data science courses?

    Dennis Sun (Cal Poly and Google)
    Tuesday, February 13, 2018 - 2:00pm ET
    Over the last few years, there has been a consensus that data science students should be involved in all stages of the data analysis process, from data preparation and wrangling to presentation and visualization. But data science courses have varied widely in their implementation. Some courses go into great depth about statistical models and machine learning, while others focus on tools like XML, SQL, and web scraping. While there is no question that a budding data scientist must acquire these skills eventually, what should be covered in a course on data science? I suggest that data science courses be organized around three core concepts: paradigms for representing data, paradigms for manipulating data, and paradigms for visualization. These are topics of genuine intellectual merit that are underrepresented elsewhere in the statistics and computer science curriculum. The tools are secondary, and I suggest how such a course could be taught with R examples based on the tidyverse or with Python examples.
  • Symbulate: Simulation in the Language of Probability

    Kevin Ross (Cal Poly)
    Thursday, January 11, 2018 - 2:00pm ET
    Simulation is an effective tool for analyzing probability models as well as for facilitating understanding of concepts in probability and statistics. Unfortunately, implementing a simulation from scratch often requires users to think about programming issues that are not relevant to the simulation itself. We have developed a Python package called Symbulate (https://github.com/dlsun/symbulate) which provides a user-friendly framework for conducting simulations involving probability models. The syntax of Symbulate reflects the "language of probability" and makes it intuitive to specify, run, analyze, and visualize the results of a simulation. Moreover, Symbulate's consistency with the mathematics of probability reinforces understanding of probabilistic concepts. This webinar will demonstrate Symbulate's use with a variety of probability concepts and problems, including: probability spaces; events; discrete and continuous random variables; joint, conditional, and marginal distributions; stochastic processes; and more.
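    To make the "from scratch" point concrete, here is a minimal sketch of a hand-written simulation using only the Python standard library. It does not use Symbulate or attempt to reproduce its API; the function names, the two-dice example, the trial count, and the seed are all assumptions chosen for illustration. Note how much of the code concerns loops, counting, and normalizing rather than the probability model itself.

```python
import random
from collections import Counter


def simulate_dice_sums(n_trials=10_000, seed=1):
    """Simulate the sum of two fair six-sided dice, n_trials times.

    Hypothetical helper for illustration; not part of Symbulate.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_trials)]


def relative_frequencies(outcomes):
    """Return {value: proportion of trials with that value}, sorted by value."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {value: counts[value] / total for value in sorted(counts)}


sums = simulate_dice_sums()
freqs = relative_frequencies(sums)

# The exact probability that two fair dice sum to 7 is 6/36 = 1/6.
print(f"estimated P(X = 7): {freqs[7]:.3f}  (exact value is 1/6)")
```

    The probability model here is one line (the sum of two uniform draws); everything else is bookkeeping, which is the kind of overhead the abstract describes.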
  • The Hows and Whys of Reasoning with the ASA Ethical Guidelines

    Rochelle Tractenberg (Georgetown University)
    Tuesday, January 9, 2018 - 2:00pm ET
    Since data analysis is becoming important across disciplines, the ASA Ethical Guidelines for Statistical Practice, which were updated in 2016, can serve to introduce all students in quantitative disciplines to critical concepts of responsible data analysis, interpretation, and reporting. The Guidelines contain elements that are suitable, and important, components of training for undergraduates, whether or not they are statistics majors, to prepare them for ethical quantitative work. The Guideline principles interact, and sometimes must be prioritized. Therefore, neither simple distribution of the Guidelines nor encouragement to memorize them can promote the necessary level of awareness. This presentation will introduce ethical reasoning as a learnable, improvable skill set that can provide an entry point to working with the 2016 revised ASA Ethical Guidelines.
  • Data scraping, ingestion, and modeling: bringing data from cars.com into the intro stats class

    Nicholas Horton (Amherst College)
    Tuesday, November 21, 2017 - 2:00pm ET
    In this webinar, I will describe a classroom activity where pairs of students hand-scrape data from cars.com, ingest these data into R, then carry out analyses of the relationships between price, mileage, and model year for a selected type of car. This early-in-the-semester activity can help illustrate the statistical problem solving process. The "Less Volume, More Creativity" approach utilized by the mosaic package facilitates the analysis with a minimal amount of syntax. Key concepts that are introduced and reinforced include data ingestion, multivariate thinking through graphical visualizations, and regression modeling. Extensions and additional use of the dataset will be discussed along with potential pitfalls. Project Files: https://github.com/Amherst-Statistics/Cars-Scraping-Webinar
  • Regression to the Mean/The regression effect

    Jeff Witmer (Oberlin College)
    Tuesday, October 17, 2017 - 2:00pm ET
    Regression to the mean, also known as "the regression effect," is an important but sometimes overlooked topic in introductory statistics. We will discuss the regression effect and how to teach it. We will also consider a number of examples of the "regression fallacy," in which people who are ignorant of the regression effect make up ad hoc (and sometimes very misleading) explanations for what they see in data.
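    The regression effect can be seen in a small simulation. The sketch below is not taken from the webinar; it assumes a simple model in which each observed score is a stable "ability" plus independent "luck", both standard normal, and the function names, sample size, and seed are hypothetical. Under that assumption, the people who score highest on a first test score lower on average on a second test, while still scoring above the overall mean, with no causal explanation needed.

```python
import random
import statistics


def simulate_test_retest(n=10_000, seed=0):
    """Return n (first, second) score pairs: shared ability plus fresh luck each time."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    pairs = []
    for _ in range(n):
        ability = rng.gauss(0, 1)          # stable component, same on both tests
        first = ability + rng.gauss(0, 1)  # luck on the first test
        second = ability + rng.gauss(0, 1) # independent luck on the second test
        pairs.append((first, second))
    return pairs


def top_group_means(pairs, fraction=0.10):
    """Mean first and second scores for the top `fraction` of first-test scorers."""
    ranked = sorted(pairs, key=lambda p: p[0], reverse=True)
    top = ranked[: max(1, int(len(ranked) * fraction))]
    return (statistics.mean(p[0] for p in top),
            statistics.mean(p[1] for p in top))


pairs = simulate_test_retest()
first_mean, second_mean = top_group_means(pairs)
print(f"top 10% on test 1: mean first score = {first_mean:.2f}, "
      f"mean second score = {second_mean:.2f}")
```

    Attributing the drop in the top group's second score to complacency or pressure would be an instance of the regression fallacy: in this simulated data the drop arises purely from selecting on a noisy measurement.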