This online course explains big data analysis and introduces the user to the tools Hadoop and MapReduce, which provide the parallel computing needed to analyze large amounts of data.
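The MapReduce idea behind these tools can be sketched in a few lines. The following is a minimal, single-machine Python illustration of the classic word-count example. It assumes the input text has already been divided into chunks ("splits"), and it uses a thread pool to stand in for the map tasks that Hadoop would distribute across machines. It does not use Hadoop's actual API, and the function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk: str) -> list[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input split.
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    # Shuffle: group all intermediate values by key, as the framework does
    # between the map and reduce phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # Map tasks are independent, so they can run in parallel
    # (threads here; separate machines in a real Hadoop cluster).
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, splits))
    groups = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(word_count(["big data big tools", "data tools for big data"]))
    # → {'big': 3, 'data': 3, 'tools': 2, 'for': 1}
```

Because each map task sees only its own split, adding more workers (or machines) speeds up the map phase without changing the result; only the shuffle step needs to bring matching keys together.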
A joke to start a discussion on joint probability distributions. The joke was written in 2018 by Larry Lesser from The University of Texas at El Paso.
"There are a lot of small data problems that occur in big data. They don't disappear because you've got lots of stuff. They get worse." is a quote from British biostatistician David J. Spiegelhalter (1953 - ). It appears in "Big data: are we making a big mistake?", an article by Tim Harford published in the Financial Times on March 28, 2014.
A joke to aid in discussing probability density functions for continuous random variables. The joke was written in 2016 by Judah Lesser, an AP Statistics student from El Paso, Texas.
A joke for discussing how transformations can bring data closer to normality and stabilize variances across groups with different means (here, the square root transformation for Poisson data). The joke was written in 2016 by Larry Lesser from The University of Texas at El Paso.
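A short simulation can illustrate the variance-stabilizing effect of the square root transformation. This sketch uses only Python's standard library; the Poisson sampler is Knuth's multiplication method, written out because the `random` module has no Poisson generator, and the group means (5, 20, 80), sample size, and seed are arbitrary illustrative choices. On the raw scale each group's variance tracks its mean, while after taking square roots each variance should land near 1/4, since by the delta method Var(√X) ≈ (1/(2√λ))² · λ = 1/4.

```python
import math
import random
import statistics


def poisson_sample(lam: float, rng: random.Random) -> int:
    # Knuth's multiplication method: multiply uniform draws until the
    # running product drops to e^(-lam) or below; the number of extra
    # draws needed is a Poisson(lam) count.
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def compare_variances(means, n=10_000, seed=0):
    # For each group mean, return (mean, variance of raw counts,
    # variance of square-root-transformed counts).
    rng = random.Random(seed)
    rows = []
    for lam in means:
        draws = [poisson_sample(lam, rng) for _ in range(n)]
        raw_var = statistics.variance(draws)
        sqrt_var = statistics.variance([math.sqrt(x) for x in draws])
        rows.append((lam, raw_var, sqrt_var))
    return rows


if __name__ == "__main__":
    for lam, raw_var, sqrt_var in compare_variances([5, 20, 80]):
        print(f"mean={lam:>3}  var(raw)={raw_var:6.2f}  var(sqrt)={sqrt_var:.3f}")
```

The approximation is weaker for very small means, which is one reason variants such as √(x + 3/8) are sometimes used instead of the plain square root.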
A joke for teaching about randomization in experiments or about the Pearson correlation coefficient. The idea for the joke came from Lawrence Mark Lesser of The University of Texas at El Paso in 2012.
A pun for discussing regression to the mean. The pun was co-authored in 2017 by Larry Lesser (The University of Texas at El Paso) and Dennis Pearl (Penn State University).
This chapter on data wrangling is excerpted from the data science book “Modern Data Science with R” by Benjamin J. Baumer, Daniel T. Kaplan, and Nicholas J. Horton. It contains the R code needed for basic data tasks such as sorting (arranging) and summarizing data.
This chapter on ethics is excerpted from the data science book “Modern Data Science with R” by Benjamin J. Baumer, Daniel T. Kaplan, and Nicholas J. Horton. The chapter presents several ethical dilemmas, introduces a framework for evaluating ethical issues, and then revisits the dilemmas to resolve them.
This site offers lessons on using SQL. The first lesson starts with a simple SELECT query: the user must type the correct command to select certain columns from a database table. After completing it, the user can continue to more complex lessons.
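To try the same kind of query outside the site, the following sketch uses Python's built-in `sqlite3` module with an in-memory database and a hypothetical `students` table (not one from the lessons). The SELECT statement names only the two columns wanted; the `ORDER BY` clause is added here only so the output order is predictable.

```python
import sqlite3


def select_columns() -> list[tuple[str, str]]:
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table for illustration; the lessons' own tables may differ.
        conn.execute(
            "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, major TEXT, gpa REAL)"
        )
        conn.executemany(
            "INSERT INTO students (name, major, gpa) VALUES (?, ?, ?)",
            [("Ana", "Statistics", 3.8), ("Ben", "Biology", 3.1), ("Cy", "Statistics", 3.4)],
        )
        # SELECT lists only the columns we want; FROM names the table.
        return conn.execute("SELECT name, major FROM students ORDER BY name").fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(select_columns())
    # → [('Ana', 'Statistics'), ('Ben', 'Biology'), ('Cy', 'Statistics')]
```

Replacing `name, major` with `*` would return every column, which is usually where such lessons go next.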