Abstract: The 1969-2000 Major League Baseball Attendance dataset contains Runs Scored, Runs Allowed, Wins, Losses, Number of Games Behind the Division Leader, and Home Game Attendance of each major league franchise for the 1969 through 2000 seasons. Also included for each franchise are its location, league affiliation (National or American), and division affiliation (East, Central, or West). These data have been used in a project-based modeling course to instruct students in basic data management, the use of exploratory data analysis to "clean" data, and construction of regression models. The dataset, which is both cross-sectional and time-series, is of a manageable size and easily understood. Furthermore, it provides a useful, interesting, and realistic classroom example for discussing many important statistical concepts.