- Misinterpreted – the most common misinterpretation is that the p-value gives the “probability that the studied hypothesis is true”. For example, a p-value of 0.02 is wrongly taken to mean that the null hypothesis has a 2% chance of being true and the alternative a 98% chance of being correct
- Overtrusted – this becomes a problem when it is forgotten that proper inference requires full reporting and transparency; small p-values do not guarantee either.
- Misused – business and policy decisions are often made based on a p-value passing a specific threshold, even though a p-value is not a measure of effect size.
Dr. Goodman explained what the p-value is: mathematically, if there is no effect (i.e., the null hypothesis is true), it is the chance of seeing the results you observed, or something even more extreme. He also noted several things the p-value is not, including (i) the probability that the null hypothesis is true given the data, (ii) the probability of the data under H0 (i.e., if only chance were operating), (iii) the probability that the data were observed by chance, and (iv) the probability that a non-null association is “real” given the data.
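This definition can be illustrated with a small simulation (the observed statistic here is an illustrative number, not from the talk): under a true null hypothesis, the p-value is simply the fraction of hypothetical replications whose test statistic comes out at least as extreme as the one observed.

```python
import random
from statistics import NormalDist

random.seed(0)

# Hypothetical observed z statistic from a study (illustrative value)
z_obs = 2.33

# Simulate a world where the null is true: z statistics are standard normal
null_draws = [random.gauss(0, 1) for _ in range(200_000)]

# Two-sided p-value: chance, under the null, of a result at least this extreme
p_sim = sum(abs(z) >= abs(z_obs) for z in null_draws) / len(null_draws)

# Exact two-sided p-value from the normal distribution, for comparison
p_exact = 2 * (1 - NormalDist().cdf(abs(z_obs)))

print(f"simulated p = {p_sim:.4f}, exact p = {p_exact:.4f}")
```

The simulated fraction converges on the exact tail probability, which makes the point concrete: the p-value conditions on the null being true, so it cannot by itself tell you how probable the null is.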
A potential alternative to the p-value is Bayes’ theorem, which works with the degree of certainty regarding your hypothesis. You start with a degree of certainty (the prior), obtain the statistical evidence, and this brings you to a new level of certainty (the posterior).
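The update Dr. Goodman describes is conventionally written in odds form (a standard statement of the theorem, not a slide from the talk):

```latex
\underbrace{\frac{P(H_1 \mid \text{data})}{P(H_0 \mid \text{data})}}_{\text{posterior odds}}
\;=\;
\underbrace{\frac{P(\text{data} \mid H_1)}{P(\text{data} \mid H_0)}}_{\text{Bayes factor}}
\;\times\;
\underbrace{\frac{P(H_1)}{P(H_0)}}_{\text{prior odds}}
```

The Bayes factor plays the role of the “statistical evidence”: it scales your prior odds up or down into posterior odds.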
So, based on this theorem, what does a p-value of 0.05 mean? According to Dr. Goodman, if you start at a 25% level of certainty, a p-value of 0.05 means that at most you are 45%-69% certain that your finding is true. Comparatively, starting from the same 25% level of certainty, a p-value of 0.01 means that at most you are 73%-88% certain that your finding is true.
Dr. Goodman quoted R.A. Fisher, who popularized the p-value, writing in 1926: “personally, the writer prefers to set a low standard of significance at the 5% point. A scientific fact should be regarded as experimentally established, only if a properly designed experiment rarely fails to give this level of significance.” Dr. Goodman stated that “this is not about a one-off, it is about repeatability – it means it’s worth doing the experiment again.” Dr. Goodman also quoted a 1919 paper by psychologist Dr. Edwin Boring, entitled “Mathematical vs. Scientific Significance,” stating that “mathematical measures of the difference may need to be discounted when arriving at a scientific conclusion.”
Dr. Goodman concluded with several important take-home messages, noting that uncertainty is misrepresented in many ways:
- Reliance on single studies to establish major claims
- Failure to consider the full range of the confidence interval
- Selective reporting of experiments that “work”
- Failure to account adequately for massive multiplicity and high “research degrees of freedom”
- Poor reporting and handling of missing data
- Strong publication bias in high tier journals
- Failure to conduct sensitivity (aka robustness) analyses
- Inadequate internal or external validation
- Use of rigid significance verdicts and minimal reporting of precision
Presented by: Steven Goodman, MD, MHS, PhD, Stanford School of Medicine, Palo Alto, CA
Written By: Zachary Klaassen, MD, MSc – Assistant Professor of Urology, Georgia Cancer Center, Augusta University/Medical College of Georgia, Twitter: @zklaassen_md, at the 2019 American Society of Clinical Oncology Genitourinary Cancers Symposium (ASCO GU) #GU19, February 14-16, 2019, San Francisco, CA