BOF-Th10: P-values provide evidence for associations in the population, assuming random sampling from this population. Discussing the value of regularly reminding students of this fact, including by referring to compatibility. Thursday


Hilary Watt and Andrew Pua


Information

Title: P-values provide evidence for associations in the population, assuming random sampling from this population. Discussing the value of regularly reminding students of this fact, including by referring to compatibility. 

Background: McShane and Gal (2015; 2017) [1, 2] carried out surveys amongst researchers (including statisticians) and students (who had and had not been taught p-values). They found that the majority of researchers' responses (including those of around half of statistician researchers) implied that p>0.10 evidences lack of association amongst participants/observations, despite the evidence from p-values applying only to the population (from which participants/observations are assumed to be drawn at random). When presented with ‘p=0.01’, most responded correctly. Such responses may reflect genuine lack of understanding or carelessness, resulting from the common focus on population inferences. Students who had not learnt p-values were not influenced by the p-value in their responses. 

Our survey results: We have survey data from 60 medical statistics educators. Nearly all (51/56) survey responders include "population" (or "real" or "true") within their confidence interval interpretation, yet only 24/58 include this in their p-value interpretation. A further survey question presented the mean in each of two groups; when presented with ‘p=0.27’ for this mean comparison, many educators’ responses denied the existence of a mean difference amongst the participants. We find a statistical association between responders erroneously responding in this manner and their omitting "population" (or "real" or "true") from their p-value interpretation (odds ratio 9.8, CI 1.8-130). This erroneous response reflects either carelessness or ignorance of the fact that p-values evidence associations in the population, not amongst participants. Within statistics education, stating distinct means in two groups and then reporting "no evidence for a difference in means" may feel contradictory unless we clarify that this evidence applies to the population (assuming random sampling from this population). We encourage such clarity within p-value interpretations. 
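For teaching purposes, a minimal sketch of an odds-ratio calculation with a Woolf (logit) 95% confidence interval is given below. The 2x2 counts are hypothetical and chosen purely for illustration; they are not our survey data, and our reported interval (which is very wide) may have been derived by a different method, such as an exact one.

# Hedged sketch: odds ratio with a Woolf (logit) 95% CI from a 2x2 table.
# The counts below are hypothetical, for illustration only.
import math

# rows: omitted "population" from p-value interpretation (yes/no)
# columns: denied a mean difference amongst participants at p=0.27 (yes/no)
a, b = 20, 4    # hypothetical counts
c, d = 10, 22   # hypothetical counts

odds_ratio = (a * d) / (b * c)
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)   # standard error of log odds ratio
z = 1.96                                       # approximate 97.5th Normal centile
lower = math.exp(math.log(odds_ratio) - z * se_log_or)
upper = math.exp(math.log(odds_ratio) + z * se_log_or)
print(f"OR = {odds_ratio:.1f}, 95% CI {lower:.1f} to {upper:.1f}")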

Suggested wording for p-value interpretations: "These data are highly compatible (p=0.34) with random selection from a population in which there is no such association." Or: "These data have negligible compatibility (p<0.0001) with random selection from a population in which there is no such association, provided that the assumptions of the statistical model are correct." 
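As a teaching illustration of this wording in use, the sketch below runs a Welch two-sample t-test on hypothetical data and reports the resulting p-value in compatibility terms. The observations, and hence the p-value, are invented for illustration only.

# Hedged sketch: report a two-group comparison using compatibility wording.
from scipy import stats

group_a = [4.1, 5.0, 4.6, 3.9, 4.8, 4.3]   # hypothetical observations
group_b = [5.2, 4.4, 5.9, 5.1, 4.7, 5.5]   # hypothetical observations

result = stats.ttest_ind(group_a, group_b, equal_var=False)  # Welch t-test
print(f"mean A = {sum(group_a)/len(group_a):.2f}, "
      f"mean B = {sum(group_b)/len(group_b):.2f} (amongst these participants)")
print(f"p = {result.pvalue:.2f}: the compatibility of these data with random "
      "selection from a population in which there is no difference in means, "
      "assuming the statistical model's assumptions hold.")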

Novel graph illustrating the continuous nature of p-values (Watt 2020 [3]; https://academic.oup.com/ije/article/49/6/2083/5876177): Distributions of sampling statistics are notoriously hard to understand. Hence standard graphs showing these distributions (perhaps Normal or t-distributions), with a shaded area representing a p-value, are not easy to understand. Furthermore, such graphs show the distribution only under the assumption that the null hypothesis is true. By contrast, it is possible to draw a graph that shows p-values directly against z-values, by using an inverted log scale for the p-values. The t-value/z-value can be described as the difference in means amongst observations/participants, relative to its precision of estimation (standard error). This precision (standard error) reflects how precisely the difference in means amongst participants estimates the difference in means in the population, assuming random sampling of participants from that population. Such a graph may help students to appreciate the continuous nature of p-values, which may support more skilful p-value and wider data interpretation. This is intended to address concerns over excessive focus on whether p<0.05. 
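A minimal sketch of this kind of graph is given below: two-sided p-values plotted directly against z-values, with p on an inverted log scale. This is an illustrative reconstruction under a stated assumption (a Normal reference distribution), not a reproduction of the published figure.

# Hedged sketch: p-values plotted directly against z-values,
# with an inverted log scale for the p-values.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

z = np.linspace(0, 5, 500)        # |difference in means| / standard error
p = 2 * stats.norm.sf(z)          # two-sided p-value under a Normal reference

fig, ax = plt.subplots()
ax.plot(z, p)
ax.set_yscale("log")              # log scale for p-values
ax.invert_yaxis()                 # inverted, so smaller p-values sit higher
ax.axhline(0.05, linestyle=":", label="p = 0.05")
ax.set_xlabel("z-value (difference in means / standard error)")
ax.set_ylabel("two-sided p-value (inverted log scale)")
ax.legend()
plt.show()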

For Discussion: 
1) Does it matter that many respond erroneously when presented with ‘p=0.27’ and asked specifically about differences in means amongst participants (but that they rarely do so when presented with ‘p=0.01’)? 
2) Suppose we say mean=4.5 in group A and mean=5.7 in group B, but report no evidence for a difference between these two means. Should we be concerned about the apparent contradiction with the fact that these means are clearly different? Should we instead say "no evidence for a difference between these two means in the population, assuming that participants/observations are sampled at random from this population"? 
3) How can we best encourage interpretation of p-values on a continuous scale? 

Authors: Hilary C Watt (Imperial College, UK), Kay Leedham-Green (Imperial College, UK), Kate Honeyford (Institute of Cancer Research, UK), Damian Farnell (Cardiff University, UK), Mintu Nath (University of Aberdeen, UK), Renata Medeiros Mirra (Cardiff University, UK). 
References 
1. McShane B, Gal D. Blinding Us to the Obvious? The Effect of Statistical Training on the Evaluation of Evidence. Management Science 2015;62(6):1707-18. doi: 10.1287/mnsc.2015.2212 
2. McShane BB, Gal D. Statistical Significance and the Dichotomization of Evidence. Journal of the American Statistical Association 2017;112(519):885-95. doi: 10.1080/01621459.2017.1289846 
3. Watt HC. Reflection on modern methods: Statistics education beyond ‘significance’: novel plain English interpretations to deepen understanding of statistics and to steer away from misinterpretations. International Journal of Epidemiology 2020;49(6):2083-88. doi: 10.1093/ije/dyaa080

