Data Sources
A collection of links to repositories and reliable, free data resources.
Automobiles
- Autoweb | Provides specifications on new and used cars including price, horsepower, fuel efficiency, and features. Useful for consumer research and market comparisons.
- Cars.com | Includes listings and data for both new and used vehicles. Allows filtering by model, year, performance stats, and price, making it a resource for car buyers and researchers.
Consumer Information
- Consumer Reports Online | An online extension of the Consumer Reports magazine. Offers product reviews, reliability ratings, and testing data for a wide range of consumer goods.
- Foreign Currency Exchange Rates | Provides up-to-date foreign exchange rates for major world currencies. Useful for international finance, trade analysis, and economic studies.
- The Consumer Price Index (CPI) | Offers historical and current data on U.S. inflation and consumer prices. Maintained by the Bureau of Labor Statistics, this site supports economic and cost-of-living research.
Education
- ACT Inc | Administers the ACT exam and provides statistics and research on college admissions, test scores, and student readiness.
- College Scorecard | Provides downloadable data on U.S. colleges and universities, including enrollment, costs, graduation rates, student debt, and post-graduation earnings. Supports educational and policy-related statistical analysis.
- National Center for Education Statistics (NCES) Database | Offers access to tools and datasets related to U.S. education statistics, including elementary to postsecondary education. Includes surveys, assessments, and reports maintained by the National Center for Education Statistics.
- The College Board | Administers standardized exams such as the SAT, PSAT, and AP tests. Offers data, reports, and trends on test performance and college readiness metrics.
General
- Azure & AWS Machine Learning | Online platforms by Microsoft (Azure) and Amazon (AWS) for building, deploying, and analyzing machine learning models at scale. Useful for exploring cloud computing tools in data science.
- CSV Format Data for R Packages | Online, international collection of over 1,000 datasets distributed with R packages. Each dataset is in CSV format and includes metadata, source package, variable types, and documentation to support the teaching and learning of statistics.
- ICPSR | Archive of social and behavioral science research data, including longitudinal surveys and studies in political science, education, public policy, and criminal justice.
- iNZight | Home of iNZight, an easy-to-use data analysis tool designed for the teaching and learning of statistics. The tool emphasizes graphical summaries and drag-and-drop variable selection and comes in both a desktop version and a lightweight web (Lite) version. The site also provides official user guides, downloadable example exercises, and direct links to the Wild About Statistics video series, which includes tutorials on using iNZight and on broader statistical concepts.
- Journal of Statistics Education | Online, international journal focusing on the teaching and learning of statistics. This site also contains links to several statistical education organizations, newsletters, discussion groups, and the JSE Dataset Archives.
- JSE Data Archive | Online, international archive hosted by the Journal of Statistics Education (JSE). It contains peer-reviewed datasets with instructional documentation and contextual narratives. It also links to contributor guidelines, classroom modules, and dataset documentation templates.
- Kaggle Competitions | Offers open datasets across domains (e.g., education, health, finance), an interactive web-based notebook environment for exploring and analyzing data, competitions for applying statistical and machine learning models, and extensive community-contributed documentation and tutorials.
- Larry Winner's Miscellaneous Datasets | Online, international collection of real-world datasets for the teaching and learning of statistics. Topics include sports, demographics, chemistry, and engineering. Datasets are downloadable and categorized by method, such as regression, ANOVA, or experimental design.
- OzDASL - Australasian Data and Story Library | Online, international portal providing datasets for the teaching and learning of statistics. Primarily featuring the Australasian Data and Story Library (OzDASL), the site also links to statistical games, visualization tools, and other educational materials (some links may be outdated).
- Statistical Science Web: A Portal for Statistical Science | Online, international resource for the teaching and learning of statistics. It provides access to OzDASL and offers links to university departments, conferences, GLM and bioinformatics resources, and archives related to S and MATLAB (some links may be outdated).
- The Data and Story Library (DASL) | Online, international library of datasets for the teaching and learning of statistics. Each dataset includes a “story” for context and is linked to tools and filters by method. Datasets are available for download.
- TidyTuesday | Weekly social data project for R learners, hosted on GitHub. Provides curated, cleaned datasets with documentation, along with examples from the R community, designed for practicing data wrangling, visualization, and reproducible workflows.
- Tuva Dataset | Online dataset library offering a wide collection of real-world, classroom-ready datasets. Resources are designed for teaching and learning data literacy and statistics, with filters by grade level, subject, and topic.
- UCI Machine Learning Repository | Online, international repository for the teaching and learning of statistics (and machine learning). It contains datasets organized by domain (biology, social science, physical sciences, business, games) and by task type (classification, regression, clustering), each with standardized documentation and download formats.
- Significance Magazine | Table of contents for Volume 12, Issue 4 (2015) of Significance magazine, which features accessible articles on how statistics applies to everyday life, public policy, and current events. The site links to abstracts and full articles; full text requires a subscription or institutional access. Offers useful examples and insights for the teaching and learning of statistics.
Government
- Baltimore Open Data | An open data portal providing access to data on housing, crime, 311 service requests, business licenses, and other city functions in Baltimore.
Health & Environment
- Air Quality Data (RPubs) | Provides code and visualizations for air quality measurements in various U.S. cities. Useful for environmental health and public policy analysis.
- Alaskan Ice River Breakup Data | Data on seasonal ice breakup dates of Alaskan rivers. Useful for environmental monitoring and climate studies.
- Berkeley Gridded Data | Provides high-resolution gridded climate data for environmental research and modeling.
- Berkeley Land and Ocean Data | Historical global temperature data for land and ocean surfaces compiled by Berkeley Earth. Useful for climate analysis.
- CDC Health Datasets | Extensive collection of datasets covering health statistics in the U.S., including vital records and survey data.
- Drug Research Datasets (Bradstreet) | Curated datasets from early and late-stage drug trials. Useful for biostatistics and pharmaceutical research education.
- Global Climate Time Series Data | Provides historical global climate temperature series used in climate change research. Hosted by the UK Met Office.
- National Hurricane Center Data | Offers storm tracking data, forecasts, and hurricane-related data products. Operated by NOAA.
- Teaching of Statistics in the Health Sciences (TSHS) | Peer-reviewed data sets and teaching materials, centrally archived on a public, easy-to-navigate website.
- Vanderbilt Biostatistics Data | Biostatistics data curated by Vanderbilt University. Includes datasets for modeling, survival analysis, and regression.
Images, Audio, Video
- Digital Public Library of America | Provides access to digital artifacts including photos, manuscripts, and videos from U.S. libraries and museums.
- Europeana | European cultural heritage archive offering access to digitized texts, images, audio, and video from European institutions.
- Internet Archive | Massive digital archive of historical websites, books, music, and film. Ideal for historical research and multimedia projects.
Miscellaneous
- Amy Hogan’s Blog – Miscellaneous Data | A curated collection of datasets shared on a blog focused on introductory statistics education.
- County-Level U.S. Data (3143 counties) | Comprehensive dataset with 50+ variables per U.S. county. Suitable for exploratory data analysis and modeling.
- eBay Auction Data – Mario Kart | Data from online Mario Kart auctions on eBay. Useful for regression, market behavior, and listing analysis.
- Encyclopedia.com | An online encyclopedia providing summaries and articles across a range of disciplines. Can be used as a background resource.
- FiveThirtyEight GitHub Datasets | A GitHub repository of datasets used in FiveThirtyEight articles. Covers politics, sports, science, and more with accompanying context.
- Gapminder Data Sets | Global data visualizations and downloadable datasets focused on income, health, and development.
- Internet Movie Database (IMDb) | Database of movies, TV shows, and celebrities. Includes ratings, box office data, and detailed metadata on media content.
- Journal of Cultural Analytics Data | Datasets accompanying research published in the Journal of Cultural Analytics, focused on text and cultural data analysis.
- Miscellaneous Data from Pomona College | A collection of datasets used in academic courses at Pomona College. Topics include economics, biology, and social science.
- Miscellaneous Data Repository (R Datasets) | Repository of over 1,000 datasets included with R packages. Well documented and available in CSV format.
- Open Policing Project – Original Data | Raw data on police traffic and pedestrian stops collected by the Stanford Open Policing Project. Useful for studying racial disparities.
- Open Policing Project – Wrangled Data | Cleaned and reformatted version of Stanford's Open Policing data, ready for analysis.
- Professor Ratings and Beauty | Data relating professor ratings to perceived attractiveness. Commonly used for correlation and bias analysis.
- Rice Virtual Lab in Statistics | A virtual lab for teaching statistics through simulations, case studies, and an online textbook.
- San Francisco OkCupid Users Public Profile | Public dataset of OkCupid user profiles from San Francisco, including survey responses and demographics.
- UCLA Statistics Case Studies | Educational case studies in statistics from UCLA’s statistics department. Useful for learning applied data analysis.
- Undergraduate Class Project Competition | National-level student project competition offering sample data submissions, topics, and judging rubrics.
- USCOTS: Teaching Statistics | Conference site that shares presentations and resources related to statistics education and pedagogy.
Newspapers & Magazines
- New York Times | Major U.S. newspaper featuring data journalism, infographics, and political reporting. Useful for case studies and contextual data.
- USA Today | National U.S. daily newspaper offering charts, visualizations, and public trend coverage useful for media literacy and data analysis.
Politics & Polling
- ANES – American National Election Studies | Provides U.S. election survey datasets with information on voting behavior, public opinion, and group ratings. Includes variables such as respondent evaluations of movements (e.g., #MeToo) and institutions (e.g., college professors). Datasets are large and sometimes require cleaning.
- CNN | News outlet with political polling, election coverage, and public sentiment summaries. Offers charts and analysis, not raw datasets.
- FEC Contributions GitHub Repository | GitHub repository with code and cleaned FEC contribution data. Created for R use, useful in campaign finance analysis.
- FiveThirtyEight Data Portal | Portal to access datasets featured in FiveThirtyEight articles. Data is cleaned and ready for public use.
- FiveThirtyEight GitHub Repository | FiveThirtyEight's GitHub repository of raw datasets used in articles. Includes metadata and organized folders by topic.
- FiveThirtyEight R Package | An R package with pre-processed FiveThirtyEight datasets and data dictionaries, designed for education and reproducible analysis.
- FiveThirtyEight Website | Homepage of FiveThirtyEight, known for data-driven articles on politics, sports, and economics. Founded by Nate Silver.
- Iowa Secretary of State Voter Registration Data | Official source for county-level voter registration statistics in Iowa. Useful for political and geographic analysis.
- Pew Research Center | Nonpartisan research organization conducting polls, demographic studies, and social trend analysis. Often includes downloadable reports.
- The Gallup Organization | Presents U.S. and global public opinion polls. Useful for studying proportions, survey methodology, and social trends.
Qualitative Data
- Chronicling America | Searchable archive of historic U.S. newspapers dating back to the 1700s. Great for historical or qualitative research.
- Digital Humanities Resources | Wiki-based collection of data sources for digital humanities projects including corpora, texts, and cultural datasets.
- HathiTrust Research Center | Portal for accessing large-scale literary text collections and tools for textual analysis. Focused on digital humanities research.
- ICPSR - Inter-university Consortium for Political and Social Research | Repository of political and social science datasets for research and teaching. Includes surveys, longitudinal studies, and more.
- Internet Data Sources for Social Scientists | Guide to data resources for social scientists, hosted by Cornell. Includes both qualitative and quantitative sources.
- JSTOR Data for Research | Provides access to academic papers' metadata and citation datasets for social science and humanities research.
- Project Gutenberg | Free digital library offering access to over 60,000 classic public domain books. Useful for textual and literary analysis.
Social & Economic Data
- American Time Use Survey (ATUS) | Nationally representative U.S. dataset on how people spend their time daily. Includes variables on work, leisure, childcare, education, and more, making it useful for regression, time-series, and demographic analysis.
- Bureau of Labor Statistics | Provides official U.S. labor statistics including employment, wages, consumer prices, and productivity.
- CIA World Fact Book | Provides rankings and comparative data for countries around the world, including demographics, economics, military, and infrastructure.
- Data.gov | A central portal for accessing public U.S. government data, covering a wide range of domains including energy, education, climate, and health.
- Datahub.io | Searchable repository of datasets published by various organizations and governments. Covers economics, health, environment, and more.
- Federal Reserve Economic Data (FRED) | A comprehensive source for U.S. economic indicators including inflation, employment, GDP, and interest rates. Hosted by the Federal Reserve Bank of St. Louis.
- General Social Survey | Social science survey dataset capturing American opinions on a wide range of topics including politics, religion, and demographics.
- National Center for Health Statistics (NCHS) | The CDC’s statistical agency. Offers datasets on health status, birth and death records, disease prevalence, and healthcare usage.
- PublicData.eu | An EU-based portal for accessing open data from European governments. Datasets cover a range of sectors and are available in various formats.
- Statistics Canada | Canada’s national statistical agency. Offers datasets and reports on population, health, economy, and more.
- US Census Bureau | The U.S. government’s principal source for population, housing, economic, and geographic data. Offers a wide variety of downloadable data tools and APIs.
- US Historic Financial Asset Returns | Historical returns on stocks, bonds, and other financial assets compiled by NYU professor Aswath Damodaran.
- World Bank Data | Provides extensive global development data, including GDP, poverty rates, health outcomes, and environmental indicators.
Sports
- Baseball Prospectus | Offers sortable baseball data and performance metrics. Known for its advanced analytics and sabermetrics, especially useful for in-depth player evaluation.
- Baseball Reference | An extensive historical database for baseball statistics including player records, team stats, leaderboards, and advanced analytics.
- CBS Sportsline | Provides extensive data on professional and college sports. The statistics section includes up-to-date and historical information on Major League Baseball and other sports.
- NCAA Football Statistics | Provides research and statistics on NCAA football including graduation rates, academic performance, and game results.
- NFL Statistics | Offers statistics on NFL team and player performance including yards, touchdowns, and defensive metrics. Includes sortable and season-specific data.
- Sports Illustrated | An online magazine offering commentary, rankings, and occasional sports data. Provides narrative context and statistics for various sports leagues and events.
Statistics Blogs
- Data is Plural Archive | Newsletter archive highlighting unique and open-access datasets across domains. Great for discovering new project ideas.
- FlowingData – Data Visualization Blog Collection | Collection of links to various influential data blogs curated by FlowingData. Focuses on visualization and storytelling.
- R4Stats | Reviews and comparisons of statistical software like R, SAS, SPSS, and Python. Ideal for methods and tools discussions.
- R-Bloggers | A widely used aggregator of R-related blog content. Ideal for finding tutorials, software updates, and applied examples.
- Simply Statistics Blog | Blog by biostatistics professors commenting on research, methodology, and reproducibility. Frequently links to new datasets.
- StatsBlogs Aggregator | Aggregator of blog posts from across the statistics and data science community. Updated regularly with news and commentary.
- StatsChat New Zealand | New Zealand-based blog focusing on statistical communication and media analysis. Critiques misleading use of statistics.
Transportation
- Boston Bluebikes Data | Provides trip-level data from Boston’s Bluebikes bikeshare system. Includes start and end times, stations, trip durations, and rider categories, useful for time-series, mapping, and urban mobility analysis.