Even statisticians are not immune to misinterpretations of Null Hypothesis Significance Tests


Authors: 
Lecoutre, B., Lecoutre, M.-P., & Poitevineau, J.
Volume: 
38(1)
Pages: 
online
Year: 
2003
Publisher: 
International Journal of Psychology
URL: 
http://pdfserve.informaworld.com/796678_731197548_741935381.pdf
Abstract: 

We investigated the way experienced users interpret Null Hypothesis Significance Testing (NHST) outcomes. An empirical study was designed to compare the reactions of two populations of NHST users, psychological researchers and professional applied statisticians, when faced with contradictory situations. The subjects were presented with the results of an experiment designed to test the efficacy of a drug by comparing two groups (treatment/placebo). Four situations were constructed by combining the outcome of the t test (significant vs. nonsignificant) and the observed difference between the two means D (large vs. small). Two of these situations appeared as conflicting (t significant/D small and t nonsignificant/D large). Three fundamental aspects of statistical inference were investigated by means of open questions: drawing inductive conclusions about the magnitude of the true difference from the data in hand, making predictions for future data, and making decisions about stopping the experiment. The subjects were 25 statisticians from pharmaceutical companies in France, subjects well versed in statistics, and 20 psychological researchers from various laboratories in France, all with experience in processing and analyzing experimental data. On the whole, statisticians and psychologists reacted in a similar way and were very impressed by significant results. It must be outlined that professional applied statisticians were not immune to misinterpretations, especially in the case of nonsignificance. However, the interpretations that accustomed users attach to the outcome of NHST can vary from one individual to another, and it is hard to conceive that there could be a consensus in the face of seemingly conflicting situations. In fact, beyond the superficial report of "erroneous" interpretations, it can be seen in the misuses of NHST intuitive judgmental "adjustments" that try to overcome its inherent shortcomings. These findings encourage the many recent attempts to improve the habitual ways of analyzing and reporting experimental data.
