Data Sources

A collection of links to repositories and reliable, free data resources.


Automobiles

  • Autoweb | Provides specifications on new and used cars including price, horsepower, fuel efficiency, and features. Useful for consumer research and market comparisons.
  • Cars.com | Includes listings and data for both new and used vehicles. Allows filtering by model, year, performance stats, and price, making it a useful resource for car buyers and researchers.

Consumer Information

  • Consumer Reports Online | An online extension of the Consumer Reports magazine. Offers product reviews, reliability ratings, and testing data for a wide range of consumer goods.
  • Foreign Currency Exchange Rates | Provides up-to-date foreign exchange rates for major world currencies. Useful for international finance, trade analysis, and economic studies.
  • The Consumer Price Index (CPI) | Offers historical and current data on U.S. inflation and consumer prices. Maintained by the Bureau of Labor Statistics, this site supports economic and cost-of-living research.

Education

  • ACT Inc | Administers the ACT exam and provides statistics and research on college admissions, test scores, and student readiness.
  • College Scorecard | Provides downloadable data on U.S. colleges and universities, including enrollment, costs, graduation rates, student debt, and post-graduation earnings. Supports educational and policy-related statistical analysis.
  • National Center for Education Statistics (NCES) Database | Offers access to tools and datasets related to U.S. education statistics, including elementary to postsecondary education. Includes surveys, assessments, and reports maintained by the National Center for Education Statistics.
  • The College Board | Administers standardized exams such as the SAT, PSAT, and AP tests. Offers data, reports, and trends on test performance and college readiness metrics.

General

  • Azure & AWS Machine Learning | Online platforms by Microsoft (Azure) and Amazon (AWS) for building, deploying, and analyzing machine learning models at scale. Useful for exploring cloud computing tools in data science.
  • CSV Format Data for R Packages | Online, international collection of over 1,000 datasets distributed with R packages. Each dataset is in CSV format and includes metadata, source package, variable types, and documentation to support the teaching and learning of statistics.
  • ICPSR | Archive of social and behavioral science research data, including longitudinal surveys and studies in political science, education, public policy, and criminal justice.
  • iNZight | Home of iNZight, an easy-to-use data analysis tool designed for the teaching and learning of statistics. The tool emphasizes graphical summaries and drag‑and‑drop variable selection and comes in both a desktop version and a lightweight web (Lite) version. The site also provides official user guides, downloadable example exercises, and direct links to the Wild About Statistics video series, which includes tutorials on using iNZight and on broader statistical concepts.
  • Journal of Statistics Education | Online, international journal focusing on the teaching and learning of statistics. This site also contains links to several statistical education organizations, newsletters, discussion groups, and the JSE Dataset Archives.
  • JSE Data Archive | Online, international archive hosted by the Journal of Statistics Education (JSE). It contains peer-reviewed datasets with instructional documentation and contextual narratives. It also links to contributor guidelines, classroom modules, and dataset documentation templates.
  • Kaggle Competitions | Offers open datasets across domains (e.g., education, health, finance), an interactive web-based environment for exploring and analyzing data via notebooks, competitions for applying statistical and machine learning models, and extensive community-contributed documentation and tutorials.
  • Larry Winner's Miscellaneous Datasets | Online, international collection of real-world datasets for the teaching and learning of statistics. Topics include sports, demographics, chemistry, and engineering. Datasets are downloadable and categorized by method, such as regression, ANOVA, or experimental design.
  • OzDASL - Australasian Data and Story Library | Online, international portal providing datasets for the teaching and learning of statistics. Primarily featuring the Australasian Data and Story Library (OzDASL), the site includes links to statistical games, visualization tools, and other educational materials (some links may be outdated).
  • Statistical Science Web: A Portal for Statistical Science | Online, international resource for the teaching and learning of statistics. It provides access to OzDASL and offers links to university departments, conferences, GLM and bioinformatics resources, and archives related to S and MATLAB (some links may be outdated).
  • The Data and Story Library (DASL) | Online, international library of datasets for the teaching and learning of statistics. Each dataset includes a “story” for context and can be filtered by statistical method. Datasets are available for download.
  • TidyTuesday | Weekly social data project for R learners, hosted on GitHub. Provides curated, cleaned datasets with documentation, along with examples from the R community, designed for practicing data wrangling, visualization, and reproducible workflows.
  • Tuva Dataset | Online dataset library offering a wide collection of real-world, classroom-ready datasets. Resources are designed for teaching and learning data literacy and statistics, with filters by grade level, subject, and topic.
  • UCI Machine Learning Repository | Online, international repository for the teaching and learning of statistics (and machine learning). It contains datasets organized by domain (biology, social science, physical sciences, business, games) and by task type (classification, regression, clustering), each with standardized documentation and download formats.
  • Significance Magazine | Table of contents for Volume 12, Issue 4 (2015) of Significance magazine, which features accessible articles on how statistics applies to everyday life, public policy, and current events. The site links to abstracts and full articles (full text requires a subscription or institutional access), offering useful examples and insights for the teaching and learning of statistics.
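
Several of the collections above distribute their datasets as CSV files. As a minimal sketch of a first look at such a file, the example below uses Python's standard csv module to count rows, list column names, and tally missing values per column. The sample data and the helper name summarize_csv are hypothetical and invented for illustration; they do not come from any of the listed sites.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a downloaded CSV file; the column
# names and values are invented for illustration only.
SAMPLE_CSV = """name,group,score
a,control,4.5
b,treatment,6.0
c,treatment,
d,control,5.5
"""


def summarize_csv(text):
    """Return the row count, column names, and missing-value counts per column."""
    reader = csv.DictReader(io.StringIO(text))
    missing = Counter()
    n_rows = 0
    for row in reader:
        n_rows += 1
        for column, value in row.items():
            # Treat absent or whitespace-only cells as missing.
            if value is None or str(value).strip() == "":
                missing[column] += 1
    return {
        "rows": n_rows,
        "columns": reader.fieldnames,
        "missing": dict(missing),
    }


summary = summarize_csv(SAMPLE_CSV)
print(summary)
```

For a real download, the same function could be given the contents of the file read from disk; which columns matter and how missing values are coded will depend on the dataset's own documentation.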

Government

  • Baltimore Open Data | An open data portal providing access to data on housing, crime, 311 service requests, business licenses, and other city functions in Baltimore.

Health & Environment

Images, Audio, Video

  • Digital Public Library of America | Provides access to digital artifacts including photos, manuscripts, and videos from U.S. libraries and museums.
  • Europeana | European cultural heritage archive offering access to digitized texts, images, audio, and video from European institutions.
  • Internet Archive | Massive digital archive of historical websites, books, music, and film. Ideal for historical research and multimedia projects.

Miscellaneous

Newspapers & Magazines

  • New York Times | Major U.S. newspaper featuring data journalism, infographics, and political reporting. Useful for case studies and contextual data.
  • USA Today | National U.S. daily newspaper offering charts, visualizations, and public trend coverage useful for media literacy and data analysis.

Politics & Polling

  • ANES – American National Election Studies | Provides U.S. election survey datasets with information on voting behavior, public opinion, and group ratings. Includes variables such as respondent evaluations of movements (e.g., #MeToo) and institutions (e.g., college professors). Datasets are large and sometimes require cleaning.
  • CNN | News outlet with political polling, election coverage, and public sentiment summaries. Offers charts and analysis, not raw datasets.
  • FEC Contributions GitHub Repository | GitHub repository with code and cleaned FEC contribution data. Created for R use, useful in campaign finance analysis.
  • FiveThirtyEight Data Portal | Portal to access datasets featured in FiveThirtyEight articles. Data is cleaned and ready for public use.
  • FiveThirtyEight GitHub Repository | FiveThirtyEight's GitHub repository of raw datasets used in articles. Includes metadata and organized folders by topic.
  • FiveThirtyEight R Package | An R package with pre-processed FiveThirtyEight datasets and data dictionaries, designed for education and reproducible analysis.
  • FiveThirtyEight Website | Homepage of FiveThirtyEight, known for data-driven articles on politics, sports, and economics. Founded by Nate Silver.
  • Iowa Secretary of State Voter Registration Data | Official source for county-level voter registration statistics in Iowa. Useful for political and geographic analysis.
  • Pew Research Center | Nonpartisan research organization conducting polls, demographic studies, and social trend analysis. Often includes downloadable reports.
  • The Gallup Organization | Presents U.S. and global public opinion polls. Useful for studying proportions, survey methodology, and social trends.

Qualitative Data

Social & Economic Data

  • American Time Use Survey (ATUS) | Nationally representative U.S. dataset on how people spend their time daily. Includes variables on work, leisure, childcare, education, and more, making it useful for regression, time-series, and demographic analysis.
  • Bureau of Labor Statistics | Provides official U.S. labor statistics including employment, wages, consumer prices, and productivity.
  • CIA World Factbook | Provides rankings and comparative data for countries around the world, including demographics, economics, military, and infrastructure.
  • Data.gov | A central portal for accessing public U.S. government data, covering a wide range of domains including energy, education, climate, and health.
  • Datahub.io | Searchable repository of datasets published by various organizations and governments. Covers economics, health, environment, and more.
  • Federal Reserve Economic Data (FRED) | A comprehensive source for U.S. economic indicators including inflation, employment, GDP, and interest rates. Hosted by the Federal Reserve Bank of St. Louis.
  • General Social Survey | Social science survey dataset capturing American opinions on a wide range of topics including politics, religion, and demographics.
  • National Center for Health Statistics (NCHS) | The CDC’s statistical agency. Offers datasets on health status, birth and death records, disease prevalence, and healthcare usage.
  • PublicData.eu | An EU-based portal for accessing open data from European governments. Datasets span a range of sectors and are available in various formats.
  • Statistics Canada | Canada’s national statistical agency. Offers datasets and reports on population, health, economy, and more.
  • US Census Bureau | The U.S. government’s principal source for population, housing, economic, and geographic data. Offers a wide variety of downloadable data tools and APIs.
  • US Historic Financial Asset Returns | Historical returns on stocks, bonds, and other financial assets compiled by NYU professor Aswath Damodaran.
  • World Bank Data | Provides extensive global development data, including GDP, poverty rates, health outcomes, and environmental indicators.

Sports

  • Baseball Prospectus | Offers sortable baseball data and performance metrics. Known for its advanced analytics and sabermetrics, especially useful for in-depth player evaluation.
  • Baseball Reference | An extensive historical database for baseball statistics including player records, team stats, leaderboards, and advanced analytics.
  • CBS Sportsline | Provides extensive data on professional and college sports. The statistics section includes up-to-date and historical information on Major League Baseball and other sports.
  • NCAA Football Statistics | Provides research and statistics on NCAA football including graduation rates, academic performance, and game results.
  • NFL Statistics | Offers statistics on NFL team and player performance including yards, touchdowns, and defensive metrics. Includes sortable and season-specific data.
  • Sports Illustrated | An online magazine offering commentary, rankings, and occasional sports data. Provides narrative context and statistics for various sports leagues and events.

Statistics Blogs

  • Data is Plural Archive | Newsletter archive highlighting unique and open-access datasets across domains. Great for discovering new project ideas.
  • FlowingData – Data Visualization Blog Collection | Collection of links to various influential data blogs curated by FlowingData. Focuses on visualization and storytelling.
  • R4Stats | Reviews and comparisons of statistical software like R, SAS, SPSS, and Python. Ideal for methods and tools discussions.
  • R-Bloggers | A widely used aggregator of R-related blog content. Ideal for finding tutorials, software updates, and applied examples.
  • Simply Statistics Blog | Blog by biostatistics professors commenting on research, methodology, and reproducibility. Frequently links to new datasets.
  • StatsBlogs Aggregator | Aggregator of blog posts from across the statistics and data science community. Updated regularly with news and commentary.
  • StatsChat New Zealand | New Zealand-based blog focusing on statistical communication and media analysis. Critiques misleading use of statistics.

Transportation

  • Boston Bluebikes Data | Provides trip-level data from Boston’s Bluebikes bikeshare system. Includes start and end times, stations, trip durations, and rider categories, useful for time-series, mapping, and urban mobility analysis.
