Jeff Leek, Department of Biostatistics, Johns Hopkins
Abstract
Data science is the process of formulating a quantitative question that can be answered with data, collecting and cleaning the data, analyzing the data, and communicating the answer to the relevant audience. I'll discuss the common (math, programming) and less common (question formulation, data pipelines) components of data science. I will use examples from our experience creating and maintaining more than 20 data science courses, which have enrolled more than 4 million students over the last 4 years, to illustrate why the future of data science courses is based on plain text documents.
Jeff Leek is a professor of Biostatistics and Oncology at the Johns Hopkins Bloomberg School of Public Health. His research focuses on public health genomics, data science as a science, and research on the scientific literature. He is also co-creator of the Johns Hopkins Data Science Specialization on Coursera (https://www.coursera.org/specializations/jhu-data-science), which has enrolled over 4 million students. He is a co-editor of the journal Biostatistics. He writes a blog at Simply Statistics and is the author of the book "The Elements of Data Analytic Style".