Mircea Zloteanu
Kingston University London
A considerable amount has been written about the challenges of teaching statistics in psychology, often in a self-reinforcing loop: “We teach these methods because they are expected, and they are expected because this is what is taught.” For decades, critiques of hypothesis testing have highlighted the problems caused by an overreliance on dichotomous decisions, p-value thresholds (e.g., p < .05), and the pursuit of statistical significance (Emmert-Streib, 2024). The debate has alternated between calls to “abandon p-values” and retorts that “p-values are fine*” (*where issues are reframed as features rather than bugs). For example, the fact that p-values are affected by sample size, leading to their “dance” in small samples and their near-certain significance in very large samples, is often not given the attention it deserves when teaching statistics (Cumming & Calin-Jageman, 2024). Yet while statisticians and philosophers of science engage in these debates (Chen et al., 2023), they often overlook a critical audience: the people trying to learn this stuff.
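To see the “dance” for yourself, here is a minimal Python simulation; the effect size, sample sizes, and number of replications are illustrative assumptions, not values from any study:

```python
# Simulate the "dance of the p-values": rerun the same two-group study many
# times and watch how much the p-value moves, especially with small samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def simulate_p_values(n_per_group, true_diff=0.5, reps=1000):
    """p-values from `reps` replications of a two-sample t-test (true d = 0.5)."""
    ps = []
    for _ in range(reps):
        control = rng.normal(0.0, 1.0, n_per_group)
        treatment = rng.normal(true_diff, 1.0, n_per_group)
        ps.append(stats.ttest_ind(control, treatment).pvalue)
    return np.array(ps)

for n in (20, 200):
    ps = simulate_p_values(n)
    print(f"n = {n:>3}/group: middle 90% of p-values spans "
          f"[{np.percentile(ps, 5):.4f}, {np.percentile(ps, 95):.3f}]; "
          f"'significant' in {(ps < .05).mean():.0%} of replications")
```

With n = 20 per group the p-values sprawl from tiny to near 1; with n = 200 the same true effect is significant almost every time.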
One recurring, and valid, critique of null hypothesis significance testing (NHST) is that “it is difficult to understand.” While the process is often outlined in a few easy steps, the reality of frequentist hypothesis testing is much more nuanced and requires a specific kind of thinking that is (arguably) not intuitive to many learners (Rafi & Greenland, 2020).
A good example is simply the probability that p-values refer to. Many students (and researchers) tend to produce a sentence like this: “The p-value is the probability that the null hypothesis is true.” This might seem correct, but it is false, an error known as transposing the conditional. A p-value refers to the following probability: P(data | null hypothesis is true). In lay terms, it is the probability of observing data at least as extreme as ours, given that the null is true. Thus, a p-value only speaks to the compatibility or surprisingness of the data given what we assume to be the state of the world (Rovetta, 2024). However, people often mistakenly think the p-value tells us: P(null hypothesis is true | data), the probability that the null hypothesis is true given the observed data. These two probabilities are not the same, and confusing them can have real-world consequences (e.g., the prosecutor’s fallacy).
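A quick way to convince students of the difference is a one-line application of Bayes’ theorem. The numbers below are illustrative assumptions (a surprising result under the null, decent power, and a field where most tested nulls are true):

```python
# P(data | H0) versus P(H0 | data): the transposed conditional, in numbers.
p_data_given_h0 = 0.05   # how surprising the data are if the null is true
p_data_given_h1 = 0.80   # how likely such data are if a real effect exists
prior_h0 = 0.90          # assume 90% of hypotheses we test are truly null

# Bayes' theorem: P(H0 | data) = P(data | H0) * P(H0) / P(data)
p_data = p_data_given_h0 * prior_h0 + p_data_given_h1 * (1 - prior_h0)
p_h0_given_data = p_data_given_h0 * prior_h0 / p_data

print(f"P(data | H0) = {p_data_given_h0:.2f}")   # the p-value-like quantity
print(f"P(H0 | data) = {p_h0_given_data:.2f}")   # ~0.36: a very different number
```

The same “p = .05” evidence leaves the null with roughly a one-in-three chance of being true under these assumptions; the two conditionals are simply not interchangeable.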
In my own teaching, I have found this to be a common mistake students make; one that is difficult to correct, as the misinterpretation seems to resonate more with people. Even if a diligent student attempts to learn the correct interpretation on their own, the textbook they use may be of no help (Gliner et al., 2002). One study found that 89% of introduction-to-psychology textbooks that define or explain statistical significance do so incorrectly (Cassidy et al., 2019). NHST seems to be a difficult framework to learn.
Having NHST as the default approach for research and publishing in psychology extends beyond researchers to students who are learning these methods. The incentive structure of academic publishing prioritizes statistical significance, leading to questionable research practices (QRPs)—strategies that artificially produce “statistically significant” results (e.g., p-hacking; Amrhein et al., 2017). This mentality affects students from the moment they begin learning statistical analyses, fostering “significance anxiety.” Students often worry when their dissertation results yield a non-significant p-value, believing they have done something wrong, that their thinking was incorrect, or that they will fail their degree. Such concerns are unwarranted, as both significant and non-significant results are part of the scientific (and learning) process, but this fact seems at odds with all other signals in our field (Moran et al., 2023).
Learning Made Easier with Estimation Statistics
The above paints a bleak picture of the NHST framework for teaching and practice. While several frameworks could substitute for the traditional approach, most require significant time commitments and adjustments to how we think about experiments (e.g., Bayesian statistics). One alternative that permits a more fluid transition away from p-values while retaining the core components of frequentist thinking is Estimation Statistics (ES). ES, as its name suggests, focuses on estimating the effects of interest, in contrast to NHST’s “is there an effect?” mentality. The core question asked by ES is “what is the magnitude and uncertainty of my effect?”
ES shifts the focus to understanding what the study in front of you can say, prioritises communicating effects in a meaningful way, accompanied by their margin of error and practical importance, and emphasises the visual presentation of results. Importantly, weight is placed on meta-analyses and cumulative evidence, reminding researchers that a single study cannot say much. Tools like estimation plots, which display all data points, and meta-analyses make it easier to see patterns across studies rather than fixating on individual p-values. While I cannot do the approach justice in a few paragraphs, I encourage readers to consult Introduction to the New Statistics (Cumming & Calin-Jageman, 2024). This has been the handbook I have used for the past three years, with great success, especially in my first-year introduction-to-statistics classes.
To provide a concrete example of the difference in approaches, let us imagine a study comparing aerobic and strength-based exercise for decreasing resting heart rate. Suppose a difference is observed between the two groups. The table below provides an example of how this result would be reported under each approach:
[Table: example write-up of the same result under NHST (test statistic, p-value, and verdict) versus ES (effect size, confidence interval, and interpretation)]
In NHST, results can be confined to simply reporting the observed p-value (or even just p < α) and claiming that a non-zero effect was found. This says nothing of the size of the effect, its uncertainty, or its relevance. In ES, reporting focuses specifically on the effect size in question, communicating its magnitude, margin of error, and interpretation.
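To make the contrast concrete in code, here is a minimal sketch that produces both write-ups from the same simulated, purely illustrative resting-heart-rate data:

```python
# One simulated dataset, two reporting styles. All numbers are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
aerobic = rng.normal(62, 8, 30)    # resting heart rate (bpm), aerobic group
strength = rng.normal(67, 8, 30)   # resting heart rate (bpm), strength group
n1, n2 = len(aerobic), len(strength)

# NHST-style report: test statistic, p-value, binary verdict.
t, p = stats.ttest_ind(aerobic, strength)
print(f"NHST: t({n1 + n2 - 2}) = {t:.2f}, p = {p:.3f}")

# ES-style report: the effect itself (mean difference) with its 95% CI.
diff = aerobic.mean() - strength.mean()
sp2 = ((n1 - 1) * aerobic.var(ddof=1)
       + (n2 - 1) * strength.var(ddof=1)) / (n1 + n2 - 2)  # pooled variance
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
lo, hi = diff + stats.t.ppf([0.025, 0.975], n1 + n2 - 2) * se
print(f"ES: mean difference = {diff:.1f} bpm, 95% CI [{lo:.1f}, {hi:.1f}]")
```

The first line answers only “is there an effect?”; the second tells the reader how big the effect is and how precisely it was estimated.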
Another advantage of ES is that it deepens students’ understanding of effect size interpretation. Students begin to think more about what effects actually mean, move beyond binary conclusions, and develop a stronger appreciation for the role of uncertainty in statistical analyses and communication. Furthermore, they engage more deeply with their research questions: they ask whether the results make sense and explore concepts such as practical significance. This reflective approach strengthens their analytical and scientific reasoning.
Estimation-based approaches also foster a stronger focus on data visualisation, an essential skill in statistics. Students become more comfortable plotting data and critically evaluating the appropriateness of their figures. They begin to consider diverse visualisations and often uncover hidden patterns or issues in the data that might otherwise go unnoticed. By placing greater emphasis on data visualisation, students develop the habit of thinking critically about effective science communication.
On the left would be a typical bar plot accompanying the NHST report above, with asterisks denoting a “significant” difference. On the right is an estimation plot (also known as a Gardner-Altman plot) that would accompany the ES report. The estimation plot displays all data points, the mean of each group, and a confidence interval for each group. Alongside these is the mean difference (the effect) and its uncertainty, with labels indicating the direction of the comparison. The estimation plot reduces misleading interpretations of the data’s variability by showing that values do occur above the group mean (and higher than one may believe if relying on standard error bars), and it highlights the size and uncertainty of the effect (for details, see Ho et al., 2019).
[Figure: bar plot with significance asterisks (left) versus a Gardner-Altman estimation plot of the same data (right)]
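For readers who want to reproduce something like the right-hand panel, here is a bare-bones matplotlib sketch of a Gardner-Altman-style plot, reusing the simulated heart-rate data from the earlier sketch; the dabestr and esci tools named below produce far more polished versions:

```python
# A bare-bones Gardner-Altman-style estimation plot: raw data and group means
# on the left, the mean difference and its 95% CI on the right.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(7)
aerobic = rng.normal(62, 8, 30)    # same simulated data as the earlier sketch
strength = rng.normal(67, 8, 30)
groups = [("Aerobic", aerobic), ("Strength", strength)]

fig, (ax_data, ax_effect) = plt.subplots(
    1, 2, figsize=(7, 4), gridspec_kw={"width_ratios": [2, 1]})

# Left panel: every observation, plus each group's mean and 95% CI.
for i, (label, y) in enumerate(groups):
    jitter = rng.uniform(-0.08, 0.08, len(y))   # spread points for visibility
    ax_data.plot(np.full(len(y), i) + jitter, y, "o", alpha=0.4)
    half_ci = stats.t.ppf(0.975, len(y) - 1) * y.std(ddof=1) / np.sqrt(len(y))
    ax_data.errorbar(i + 0.25, y.mean(), yerr=half_ci, fmt="s",
                     capsize=4, color="black")
ax_data.set_xticks([0, 1], [g[0] for g in groups])
ax_data.set_ylabel("Resting heart rate (bpm)")

# Right panel: the effect itself -- mean difference with its 95% CI.
n1, n2 = len(aerobic), len(strength)
diff = aerobic.mean() - strength.mean()
sp2 = ((n1 - 1) * aerobic.var(ddof=1)
       + (n2 - 1) * strength.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
half = stats.t.ppf(0.975, n1 + n2 - 2) * se
ax_effect.errorbar(0, diff, yerr=half, fmt="o", capsize=5)
ax_effect.axhline(0, linestyle="--", color="grey")  # zero (no-effect) reference
ax_effect.set_xticks([0], ["Aerobic\nminus Strength"])
ax_effect.set_ylabel("Mean difference (bpm)")
fig.tight_layout()
plt.show()
```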
Switching to ES can improve students’ understanding of statistical concepts. In my own experience across several cohorts, test scores show marked improvement and first-attempt progression rates increase. Beyond measurable outcomes, students report feeling more confident and frequently comment that it just “makes sense” to connect the statistics being reported with the information one actually wants: the effect. Because ES focuses entirely on reporting, interpreting, and visualising the effect of interest, the abstraction between the hypothesis and the statistical procedure is reduced. NHST can be quite opaque in how it treats testing, amounting to a convoluted p-value-producing contraption. ES places communication and clarity at the heart of the testing procedure, giving students more confidence in their reporting and understanding. This confidence is reflected in the quality of their work. By focusing on estimation, students engage more directly with the meaning of their results, rather than relying on rote procedures. No longer is the focus on whether a single summary number falls above or below a critical threshold; attention shifts to the actual observed effect and how it contributes to our scientific understanding.
Navigating Institutional Resistance to Change
A barrier to adopting ES is the difficulty of institutional change. Transitioning away from entrenched practices like NHST requires supervisors, lecturers, and support staff to not only understand the new methods but also actively support students who use them. This can be challenging, as academic staff often juggle numerous commitments, leaving little time to learn a new framework or adapt their own workflows. Without adequate time, resources, and institutional support, the adoption of ES can feel like an added burden rather than an improvement. Successful implementation requires thoughtful onboarding, with line managers and department heads providing the necessary resources, training, and encouragement. Institutional inertia—where established practices are seen as "good enough"—can slow progress, as some may view the change as unnecessary or disruptive.
Thankfully, adopting ES does not require discarding the foundations of NHST. The two approaches operate under the same principles, and much of the information is interchangeable. For instance, any NHST values—such as p-values and t-values—can be re-computed using the descriptive statistics provided in estimation outputs (see Francis, 2017). This means that switching to estimation as the default teaching framework does not prevent students from understanding or using NHST methods when required. Moreover, the common objection that “real-world publications will demand p-values” is rendered moot; researchers can simply reply “compute it yourself.” By grounding students’ understanding in estimation first, they are better equipped to handle modern and traditional statistical demands without being constrained by the limitations of NHST.
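As a sketch of that point, using the same illustrative descriptives as the earlier example, the t-value and p-value fall straight out of the group means, standard deviations, and sample sizes that an estimation report already contains:

```python
# Recover Student's t and its two-sided p-value from summary statistics alone.
import numpy as np
from scipy import stats

def t_from_descriptives(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t-test computed from descriptives, not raw data."""
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)  # pooled var
    t = (m1 - m2) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
    p = 2 * stats.t.sf(abs(t), n1 + n2 - 2)                        # two-sided
    return t, p

# Illustrative descriptives (the kind an ES write-up reports anyway).
t, p = t_from_descriptives(m1=62.0, sd1=8.0, n1=30, m2=67.0, sd2=8.0, n2=30)
print(f"t(58) = {t:.2f}, p = {p:.4f}")

# scipy can do the same conversion directly:
print(stats.ttest_ind_from_stats(62.0, 8.0, 30, 67.0, 8.0, 30))
```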
Summary
ES offers a practical and effective alternative to NHST and integrates easily into existing curricula. Students, even those with minimal statistical background, find the approach intuitive and accessible, reducing misunderstandings and improving their ability to interpret and communicate findings. With a wealth of free online resources available (e.g., www.estimationstats.com; esci.thenewstatistics.com), including comprehensive courses, ES is both affordable and adaptable, with implementations in the dabestr R package, the esci module in jamovi, and soon JASP. Most importantly, it enhances statistical thinking for all learners, fostering a deeper understanding of data and its interpretation. Adopting ES is a step forward in creating more proficient and critical researchers.
References
Amrhein, V., Korner-Nievergelt, F., & Roth, T. (2017). The earth is flat (p > 0.05): Significance thresholds and the crisis of unreplicable research. PeerJ, 5, e3544. https://doi.org/10.7717/peerj.3544
Cassidy, S. A., Dimova, R., Giguère, B., Spence, J. R., & Stanley, D. J. (2019). Failing grade: 89% of introduction-to-psychology textbooks that define or explain statistical significance do so incorrectly. Advances in Methods and Practices in Psychological Science, 2(3), 233-239. https://doi.org/10.1177/2515245919858072
Chen, O. Y., Bodelet, J. S., Saraiva, R. G., Phan, H., Di, J., Nagels, G., ... & De Vos, M. (2023). The roles, challenges, and merits of the p value. Patterns, 4(12). https://doi.org/10.1016/j.patter.2023.100878
Cumming, G., & Calin-Jageman, R. (2024). Introduction to the New Statistics: Estimation, Open Science, and Beyond (2nd ed.). Routledge. https://doi.org/10.4324/9781032689470
Emmert-Streib, F. (2024). Trends in null hypothesis significance testing: Still going strong. Heliyon, 10(21).
Francis, G. (2017). Equivalent statistics and data interpretation. Behavior Research Methods, 49(4), 1524-1538. https://doi.org/10.3758/s13428-016-0812-3
Gliner, J. A., Leech, N. L., & Morgan, G. A. (2002). Problems With Null Hypothesis Significance Testing (NHST): What Do the Textbooks Say? The Journal of Experimental Education, 71(1), 83–92. https://doi.org/10.1080/00220970209602058
Ho, J., Tumkaya, T., Aryal, S., Choi, H., & Claridge-Chang, A. (2019). Moving beyond P values: Data analysis with estimation graphics. Nature Methods, 16(7), 565-566. https://doi.org/10.1038/s41592-019-0470-3
Moran, C., Richard, A., Wilson, K., Twomey, R., & Coroiu, A. (2023). I know it’s bad, but I have been pressured into it: Questionable research practices among psychology students in Canada. Canadian Psychology / Psychologie canadienne, 64(1), 12-24. https://doi.org/10.1037/cap0000326
Rafi, Z., & Greenland, S. (2020). Semantic and cognitive tools to aid statistical science: Replace confidence and significance by compatibility and surprise. BMC Medical Research Methodology, 20, 1-13. https://doi.org/10.1186/s12874-020-01105-9
Rovetta, A. (2024). S-values and surprisal intervals to replace p-values and confidence intervals. REVSTAT-Statistical Journal.