Sources of Data
Cleaned and Vetted Data
- Journal of Statistics Education: (https://www.tandfonline.com/loi/ujse20) is a free, online, international journal focusing on the teaching and learning of statistics. This site also contains links to several statistical education organizations, newsletters, discussion groups and the JSE Dataset Archives.
- JSE Data Archive: (http://jse.amstat.org/datasets/)
- The Data and Story Library (DASL): (http://dasl.datadesk.com/) is a collection of data sets and related documentation (stories) that may be searched by data subject or statistical technique. A great place to visit for class examples.
- OzDASL - Australasian Data and Story Library: (http://www.statsci.org/data/)
- Statistical Science Web: A portal for statistical science (http://www.statsci.org/)
- Larry Winner's Miscellaneous Datasets: (http://www.stat.ufl.edu/~winner/datasets.html)
- CSV format data for several R packages: (http://vincentarelbundock.github.io/Rdatasets/datasets.html)
Useful Sites
- The Chance Project: (https://www.causeweb.org/wiki/chance/index.php/Main_Page) has resources that emphasize using current news media as motivation for studying issues in probability and statistics. A highlight of the web site is the monthly Chance News (which now functions as a wiki), abstracting recent articles from newspapers and magazines with suggestions for pedagogical uses.
- CAUSEweb: (https://www.causeweb.org/) the Consortium for the Advancement of Undergraduate Statistics Education, has helpful resources for teaching an introductory statistics course, including class examples, labs, homework assignments, data sets, cartoons, songs, jokes, and quotes. This site also houses information about the biennial U.S. Conference on Teaching Statistics and the Electronic Conference on Teaching Statistics.
- iNZight: (https://www.stat.auckland.ac.nz/~wild/iNZight/index.php)
- Significance: (http://onlinelibrary.wiley.com/doi/10.1111/sign.2015.12.issue-4/issuetoc)
- Statland: (http://statland.org/statland.htm)
- StatSci: (http://www.statsci.org)
Large Data Sets
- General Machine Learning Repository: (https://archive.ics.uci.edu/ml/index.php)
- Kaggle competitions: (http://www.kaggle.com/)
- College ScoreCard Data: (https://collegescorecard.ed.gov/data/)
- UMass Common Data Sets: (https://www.umass.edu/oir/accountability/common-data-set-cds)
- iHAPSS (Internet based Health and Air Pollution Surveillance System): (http://www.biostat.jhsph.edu/IHAPSS/index.htm)
- ICPSR: (https://www.icpsr.umich.edu/web/pages/ICPSR/index.html)
- Global Terrorism: (http://www.start.umd.edu/gtd/)
- Stop and Frisk data: (http://www.nyclu.org/content/stop-and-frisk-data)
- Stop and Frisk data: (http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/21660)
- UCI Machine Learning Repository Spambase Data Set: (http://archive.ics.uci.edu/ml/datasets/Spambase)
- kernlab package in R: (http://search.r-project.org/library/kernlab/html/spam.html)
- Baltimore Open Data: (https://data.baltimorecity.gov/)
- Air Quality Data: (http://rpubs.com/rdpeng/13396)
- Azure Machine Learning
- Global Land and Ocean: (http://berkeleyearth.org/land-and-ocean-data)
- Gridded Data: (http://berkeleyearth.org/data)
- Inter-university Consortium for Political and Social Research: (https://www.icpsr.umich.edu/web/pages/)
- National Bureau of Economic Research: (http://www.nber.org/data/)
- American Economic Association: (http://rfe.org/showCat.php?cat_id=2)
- Nation Master: (http://www.nationmaster.com/index.php)
- Statistical Abstract of the U.S.: (https://www.census.gov/library/publications/time-series/statistical_abstracts.html)
- Panel Study of Income Dynamics: (http://psidonline.isr.umich.edu/)
- Free Lunch: (http://www.economy.com/freelunch/default.asp)
- Bureau of Labor Statistics: (http://www.bls.gov/data/)
- County and City Data Book: (https://www.census.gov/library/publications/2010/compendia/databooks/ccdb07.html)
- NC Division of Marine Fisheries: (http://portal.ncdenr.org/web/mf/marine-fisheries-catch-statistics)
- Marine Recreational Fisheries Statistics (NOAA): (http://www.st.nmfs.noaa.gov/st1/recreational/queries/index.html)
- National Hurricane Center Data Archive: (http://www.nhc.noaa.gov/pastall.shtml)
- Statland: (http://statland.org/statland.htm)
- Statista, Commercial Clearinghouse for Data: (http://www.statista.com/)
- Kenan Fellows Program: (http://kenanfellows.org/kfp-cp-sites/cp19/cp19/lesson-overview/index.html)
- An Interesting NYT Article on Some Problems with Big Data: (http://www.nytimes.com/2014/04/07/opinion/eight-no-nine-problems-with-big-data.html?_r=0)
- Quandl: (https://www.quandl.com/)
- FBI Crime Data Explorer: (https://crime-data-explorer.app.cloud.gov/)
- Global Monitoring Laboratory Data Finder: (https://www.esrl.noaa.gov/gmd/dv/data/)
- Google Public Data Explorer: (https://www.google.com/publicdata/directory)
- UNdata: (http://data.un.org/)
- USAID Global Reading Network: (https://www.globalreadingnetwork.net/resources)
- DryadLab, a collection of free, openly sourced, high-quality data modules: (https://datadryad.org/stash/)
- Data.world: (https://data.world/)
- Mendeley Data: (https://data.mendeley.com/)
- Enigma: (https://enigma.com/)
Education
- College ScoreCard Data: (https://collegescorecard.ed.gov/data/)
- National Center for Education Statistics (NCES) Database: (https://nces.ed.gov/datatools/)
- The College Board: (http://www.collegeboard.com) Administers the SAT, PSAT, AP Tests, etc.
- ACT Inc: (http://www.act.org) Administers the ACT.
Sports
- CBS Sportsline: (http://cbs.sportsline.com) Provides data on many sports, including Major League Baseball statistics.
- Sports Illustrated: (https://www.si.com/)
- Baseball Reference: (http://www.baseball-reference.com)
- NFL Statistics: (http://www.nfl.com/stats)
- NCAA Football Statistics: (http://www.ncaa.org/about/resources/research)
- Baseball Prospectus: (https://legacy.baseballprospectus.com/sortable/)
Automobiles
- Autoweb: (http://www.autoweb.com) Information about new and used cars. Includes prices, horsepower, mileage, etc.
- Cars.com: (http://www.cars.com) Information about new and used cars. Includes prices, horsepower, mileage, etc.
Consumer Information
- Consumer Reports Online: (http://www.consumerreports.org) An online version of the popular magazine and buying guides.
- Foreign Currency Exchange Rates: (http://www.x-rates.com)
- The Consumer Price Index (CPI): (http://www.bls.gov/cpi/) from the Bureau of Labor Statistics.
Societal & Economic Data
- CIA World Fact Book: (https://www.cia.gov/library/publications/the-world-factbook/rankorder/rankorderguide.html)
- The US Census Bureau: (http://www.census.gov)
- The National Center for Health Statistics: (http://www.cdc.gov/nchs/) The U.S. government’s principal health statistics agency.
- Data.gov: (https://www.data.gov/) has more than 10,000 data sets in CSV format (plus others in other formats).
- PublicData.eu: (https://publicdata.eu/) has more than 10,000 data sets in CSV format (plus others in other formats).
- World Bank Data: (https://data.worldbank.org/)
- Datahub.io: (https://datahub.io/search)
- General Social Survey: (https://gss.norc.org/)
- Statistics Canada: (http://www.statcan.ca)
- Bureau of Labor Statistics: (https://www.bls.gov/)
- Federal Reserve Economic Data: (https://fred.stlouisfed.org/)
- US Historic Financial Asset Returns: (http://pages.stern.nyu.edu/~adamodar/New_Home_Page/datafile/histretSP.html)
Health & Environmental
- Berkeley Land and Ocean Data: (http://berkeleyearth.org/land-and-ocean-data)
- Berkeley Gridded Data: (http://berkeleyearth.org/data)
- Datasets from Early (and Late) Phases of Drug Research, Thomas E. Bradstreet, Ph.D.: (http://webserv.jcu.edu/math/faculty/TShort/Bradstreet/index.html)
- National Hurricane Center: (https://www.nhc.noaa.gov/data/)
- CDC Health Datasets: (https://www.cdc.gov/nchs/)
- Vanderbilt Biostatistics Data: (https://hbiostat.org/data/)
- Global Climate Time Series Data: (https://www.metoffice.gov.uk/hadobs/hadcrut4/)
- Alaskan River Ice Breakup Data: (https://nsidc.org/data/nsidc-0064#)
Miscellaneous
- 538 github datasets: (https://github.com/fivethirtyeight/data)
- Encyclopedia.com: (http://www.encyclopedia.com)
- The Internet Movie Database (IMDB): (http://us.imdb.com) Information about movies including lists of top grossing movies and top rentals in the U.S. and U.K.
- UCLA Statistics Case Studies: (http://statistics.ucla.edu/)
- Rice Virtual Lab in Statistics: (http://onlinestatbook.com/rvls.html) houses an online statistics textbook, as well as Java simulations and interesting case studies.
- Gapminder Data Sets: (https://www.gapminder.org/data/)
- Misc. Data from Pomona: (http://pages.pomona.edu/~jsh04747/otherdata.htm)
- Misc. Data from Amy Hogan’s Blog: (http://alittlestats.blogspot.com/p/data-sources.html)
- Public Profiles of San Francisco OkCupid Users: (https://github.com/wetchler/okcupid)
- Misc. Racial Disparities Data:
  - Original Data: (https://openpolicing.stanford.edu/data/)
  - Wrangled Data: (https://github.com/bakuninpr/traffic-stops-and-racial-disparity)
- Misc. Data Repository: (https://vincentarelbundock.github.io/Rdatasets/datasets.html)
- Journal of Cultural Analytics Data: (https://culturalanalytics.org/section/1579-data-sets)
Newspapers and Magazines
- New York Times: (http://www.nytimes.com)
- USA Today: (http://www.usatoday.com)
Politics and Polls
- The Gallup Organization: (http://www.gallup.com) The Gallup Organization site gives results from many of the polls it conducts. This site is good if you are looking for data dealing with proportions.
- CNN: (http://www.cnn.com)
- FiveThirtyEight: Famous pollster Nate Silver’s website, dedicated to opinion poll analysis, politics, economics, and sometimes sports blogging.
  - Site URL: (https://fivethirtyeight.com/)
  - Data Portal: (https://data.fivethirtyeight.com/)
  - R Package: (https://cran.r-project.org/web/packages/fivethirtyeight/vignettes/fivethirtyeight.html) contains pre-processed data and data dictionaries in a conveniently accessible R package.
  - GitHub Data Repository: (https://github.com/fivethirtyeight/data)
- Pew Research: (http://www.pewresearch.org/)
- Grinnell College National Poll: (https://dasil.sites.grinnell.edu/political-science/grinnell-college-national-poll/)
- Iowa Secretary of State Voter Registration Data: (https://sos.iowa.gov/elections/voterreg/county.html)
- FEC Contributions Data: (https://github.com/hadley/fec-dplyr)
Stats Blogs
- http://simplystatistics.org
- http://www.statsblogs.com
- http://r-bloggers.com
- http://fivethirtyeight.com
- https://www.statschat.org.nz/
- http://r4stats.com/
- https://tinyletter.com/data-is-plural/archive
- http://flowingdata.com/2009/05/06/37-data-ish-blogs-you-should-know-about/
Qualitative Data Sources
- HathiTrust: (https://analytics.hathitrust.org/)
- Gutenberg.org: (https://www.gutenberg.org/)
- Chronicling America: (http://chroniclingamerica.loc.gov/)
- JSTOR for Data Research: (https://www.jstor.org/dfr/)
- Internet Data Sources for Social Scientists: (https://ciser.cornell.edu/data/)
- Digital Humanities Resources for Project Building: (http://dhresourcesforprojectbuilding.pbworks.com/w/page/69244469/Data%20Collections%20and%20Datasets)
- ICPSR, the Inter-university Consortium for Political and Social Research: (https://www.icpsr.umich.edu/web/pages/)
Images, Audio, Video
- Europeana: (http://www.europeana.eu/portal/)
- Digital Public Library of America: (https://dp.la/)
- Internet Archive: (https://archive.org/)
Examples of Past Student Projects with Datasets
(../StudentProjects.html)
Additional Datasets to Consider
- Ebay Auction Data for Mario Kart Game: (https://www.openintro.org/data/index.php?data=mariokart)
- Professor ratings and beauty: (https://www.openintro.org/data/index.php?data=prof_evals)
- 50 variables for each of the 3143 US counties: (https://www.openintro.org/data/index.php?data=county_complete)
- US Conference on Teaching Statistics: (https://www.causeweb.org/cause/uscots/uscots21)
- Undergraduate Class Project Competition: (https://www.causeweb.org/usproc/usclap)