Data Management & Organization

  • These cheat sheets make it easy to learn about and use some of RStudio's most popular packages.

  • RStudio is a set of integrated tools designed to help you be more productive with R. It includes a console, a syntax-highlighting editor that supports direct code execution, and a variety of robust tools for plotting, viewing history, debugging, and managing your workspace.

  • Which is more robust against outliers: mean or median?  This app demonstrates the (in)stability of these descriptive statistics as the value of an outlier and the number of data points change.
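The comparison this app demonstrates can be sketched in a few lines of Python. This is only an illustration and not the app's own code: the helper `summarize` and the sample values (the integers 1 to n plus a single outlier) are assumptions made for the example.

```python
from statistics import mean, median

def summarize(n_points, outlier):
    """Return (mean, median) of the values 1..n_points plus one outlier.

    Hypothetical helper for illustration; the sample is made up.
    """
    sample = list(range(1, n_points + 1)) + [outlier]
    return mean(sample), median(sample)

# Vary both the number of data points and the size of the outlier.
for n_points in (5, 50):
    for outlier in (10, 1000):
        avg, mid = summarize(n_points, outlier)
        print(f"n={n_points:>2} outlier={outlier:>4} "
              f"mean={avg:8.2f} median={mid:6.2f}")
```

In this sketch the mean follows the outlier, and less so as the number of points grows, while the median barely moves.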

  • This app lets you compare the trade-offs among various outlier/anomaly detection algorithms. Outliers are marked with a star and cluster centers with an X.

  • This app allows you to derive an approximation to the difference in the Bayesian information criterion (BIC), and to the probabilities of the null and alternative hypotheses, from the sums of squares obtained in an ANOVA.

    Required input

    • Number of participants
    • Df ... degrees of freedom of the effect of interest
    • Whether the effect is between or within participants
    • SSEffect ... sum of squares of the effect of interest
    • SSError ... sum of squares of the error associated with this effect (for within-participant factors, the by-subject error)
    • SSTotal ... total sum of squares, only required for within-participant designs when using effective sample size (strongly recommended, Nathoo & Masson, 2007)
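As a rough sketch of how these inputs can be combined, the Python below uses a commonly used BIC approximation to the Bayes factor. Whether the app computes exactly this is an assumption, as are the function name `bic_summary`, the equal prior odds for the two hypotheses, and the example numbers. The effective-sample-size adjustment that relies on SSTotal is not implemented; `n` is taken as given.

```python
import math

def bic_summary(n, df_effect, ss_effect, ss_error):
    """Approximate delta BIC and hypothesis probabilities from ANOVA sums of squares.

    Hypothetical helper (not the app's code). Assumes equal prior odds.
    delta_bic = BIC(alternative) - BIC(null); negative values favor the effect.
    """
    if n <= 0 or ss_error <= 0 or ss_effect < 0:
        raise ValueError("n and ss_error must be positive, ss_effect non-negative")
    # Under the null, the effect's variance is counted as unexplained error.
    unexplained_ratio = ss_error / (ss_error + ss_effect)
    delta_bic = n * math.log(unexplained_ratio) + df_effect * math.log(n)
    bf01 = math.exp(delta_bic / 2)   # approximate Bayes factor for the null
    p_null = bf01 / (1 + bf01)       # probability of the null given the data
    return delta_bic, p_null, 1 - p_null

# Made-up example values, for illustration only.
delta_bic, p_null, p_alt = bic_summary(n=40, df_effect=1,
                                       ss_effect=100.0, ss_error=900.0)
print(f"delta BIC = {delta_bic:.3f}, p(H0|D) = {p_null:.3f}, p(H1|D) = {p_alt:.3f}")
```

The penalty term `df_effect * log(n)` is what lets a small effect still favor the null: with no effect at all, delta BIC is positive and the null is preferred.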
  • This resource is designed to give users new to R, RStudio, and R Markdown the introductory steps needed to begin their own reproducible research. Many screenshots and screencasts (with no audio) will be included, but if further clarification is needed on these or any other aspect of the book, please create a GitHub issue here or email me with a reference to the error/area where more guidance is necessary. It is recommended that you have R version 3.3.0 or later, RStudio Desktop version 1.0 or higher, and rmarkdown R package version 1.0 or higher.

  • These handouts and links give a foundational understanding of how to set up and use R.

  • This page presents a series of tutorials and interdisciplinary case studies that can be used in a variety of blended as well as brick-and-mortar courses. The materials can be used in introductory-level data science courses as well as more advanced data science or statistics courses. These materials assume that students have basic prior knowledge of R or RStudio.

  • The Global Terrorism Database (GTD) contains information about more than 140,000 terrorist incidents occurring between 1970 and 2014. The data in the GTD are compiled from multiple news sources (LaFree, Dugan, & Miller, 2015). In this activity, we will study the extent to which chemical, biological, radiological, and nuclear (CBRN) weapons have been used so far. We analyze whether or not their past use fits with our perceptions. Have CBRN weapons been used successfully in the past? Which weapons have historically been more dangerous (more fatalities, injuries) in the hands of terrorists? What are the implications of past usage of CBRN weapons compared to other weapons in determining our priorities in counter-terrorism policies?

  • The NYPD lab uses interactive, online graphs to better understand patterns in stop and arrest data for the New York Police Department. These data were originally collected by New York Police Department officers and record information gathered as a result of stop, question, and frisk (SQF) encounters during 2006. These data were used in a study carried out, under contract to the New York City Police Foundation, by the Rand Corporation's Center on Quality Policing. The release of the study, "Analysis of Racial Disparities in the New York Police Department's Stop, Question, and Frisk Practices" (Rand Document TR-534-NYCPF, 2007), generated interest in making the data available for secondary analysis. This data collection contains information on the officer's reasons for initiating a stop, whether the stop led to a summons or arrest, demographic information for the person stopped, and the suspected criminal behavior.

