Respond to each of my colleagues' posts separately:
Colleague post #1: Listed below is my response to this week's discussion question. I look forward to your comments and questions.
Null hypothesis testing tells the researcher that something has occurred and that it is significant. However, it cannot tell us exactly what occurred. It gives the researcher the ability to state that what occurred did not occur by chance, but the researcher cannot state specifically what occurred. When testing to discredit the null hypothesis, flawed research can result if the researcher conducts the test incorrectly, leading to Type I and Type II errors. Schmidt (1996) argues that relying on significance testing has "systematically retarded the growth of cumulative knowledge in psychology" (p. 115), while Cohen argues that testing the null hypothesis encourages misinterpretation, misuse, and overconfidence, resulting in poor scientific judgment and distorted conclusions across studies (p. 1308).
These two articles got me thinking about what else would be affected if the findings were incorrect. This is where my past classes came in: I began to examine threats to validity and reliability, specifically how flawed research compromises the validity and reliability of study findings. In research, there are two forms of validity: external validity and internal validity. External validity refers to how findings from one study could apply to a population beyond that study; this is known as generalizability (Pearl & Bareinboim, 2022). Thus, if the calculations in a study are done incorrectly, a researcher could argue that the findings do not apply to other settings. Internal validity measures how well a study is conducted (its structure) and how accurately its results reflect the studied group (Cuncic, 2025, para. 2). The same argument applies here: if the statistical tests are not done correctly, the results will not be accurate and will not reflect the studied group (Cuncic, 2025, para. 2).
Reliability refers to the consistency and reproducibility of measurements. It assesses the degree to which a measurement tool produces stable and dependable results when used repeatedly under the same conditions (McLeod, 2024, para. 1). In the statistical-testing section of a report, this is where the researcher states that test X was used to conduct analysis Y, and that the test is deemed reliable because it has produced consistent findings across many prior uses. To the average person, this sounds valid. However, the average person does not consider how the test was conducted, what the population was, or whether the study was conducted correctly. I teach this concept in my business classes using late-night infomercials as the example, specifically how statistics and statements are skewed to sell a product. I tell my students: before you buy the latest thing you saw on late-night TV, ask yourself whether you can find the research results they are talking about. If not, do not buy the product; it is most likely a gimmick, and they are using statistics to make it sound legitimate.
References
Abos, P. (2024). Validity and reliability: The extent to which your research findings are accurate and consistent. Retrieved February 4, 2026, from https://www.researchgate.net/publication/384402476_Validity_and_Reliability_The_extent_to_which_your_research_findings_are_accurate_and_consistent
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45(12), 1304–1312.
Cuncic, A. (2025). Internal validity vs. external validity in research. Retrieved February 4, 2026, from https://www.verywellmind.com/internal-and-external-validity-4584479
McLeod, S. (2024). Reliability vs. validity in research. Retrieved February 4, 2026, from
Pearl, J., & Bareinboim, E. (2022). External validity: From do-calculus to transportability across populations. ACM Books.
Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1(2), 115–129.
Colleague post #2: The Major Flaws of Null Hypothesis Significance Testing
Null hypothesis significance testing (NHST) has been the dominant framework in psychological research for decades, yet many scholars argue that it contains fundamental flaws that limit the field's ability to build reliable and cumulative scientific knowledge. Two of the most influential critiques come from Cohen (1990) and Schmidt (1996), both of whom highlight how NHST is frequently misunderstood, misapplied, and overvalued in psychological science (Cohen, 1990; Schmidt, 1996).
Cohen (1990) argues that researchers routinely misinterpret the meaning of the p-value, treating it as a direct indicator of truth rather than a conditional probability based on hypothetical repeated sampling. He emphasizes that statistical significance does not imply practical significance, noting that trivial effects can become significant with large enough samples (Cohen, 1990). Cohen also points out that NHST encourages researchers to ignore effect sizes and statistical power, which are essential for understanding the magnitude and reliability of findings. His critique suggests that the field often mistakes detectability for importance, leading to a literature filled with statistically significant but scientifically uninformative results (Cohen, 1990; Schmidt, 1996).
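Cohen's point about sample size can be made concrete with a short calculation (a minimal sketch of my own, not taken from either article): holding a trivial true effect fixed, a simple one-sample z-test flips from "nonsignificant" to "highly significant" purely because n grows.

```python
import math

def two_sided_p(d, n):
    """Two-sided p-value for a one-sample z-test when a true
    standardized effect d is observed exactly in a sample of size n
    (population sigma assumed known and equal to 1)."""
    z = d * math.sqrt(n)                          # test statistic grows with sqrt(n)
    phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF
    return 2 * (1 - phi)

d = 0.05  # a trivial effect by any practical standard
for n in (100, 10_000, 1_000_000):
    print(f"n={n:>9}: p = {two_sided_p(d, n):.3g}")
```

With d fixed at 0.05, p is roughly 0.62 at n = 100 but effectively zero at n = 1,000,000: the same trivial effect becomes "significant" simply by collecting more data, which is exactly why significance alone says nothing about importance.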
Schmidt (1996) extends this critique, arguing that NHST actively prevents psychology from developing cumulative knowledge. Because NHST focuses on binary decisions (significant or not), it obscures the true size and consistency of effects across studies (Schmidt, 1996). Schmidt contends that this dichotomous thinking contributes to publication bias, unstable findings, and a lack of theoretical progress. He also notes that NHST is overly sensitive to sample size, producing significant results for trivial effects in large samples and nonsignificant results for meaningful effects in small samples (Schmidt, 1996). According to Schmidt, the belief that NHST provides objective scientific rigor is illusory, and the field must shift toward effect sizes, confidence intervals, and meta-analytic thinking to advance (Schmidt, 1996).
Together, Cohen's and Schmidt's critiques reveal that NHST is not merely a flawed statistical tool but a barrier to scientific progress when used uncritically. Their work has helped push psychology toward more informative approaches, such as estimation statistics, effect-size reporting, and meta-analysis, that better support cumulative knowledge and theoretical development. As the field continues to confront issues such as the replication crisis, these critiques remain highly relevant and underscore the need for statistical practices that prioritize meaning over mere significance (Cohen, 1990; Schmidt, 1996). During the replication crisis, another author, Bargh et al. (1996), did not directly critique NHST, but their study became one of the most influential examples of the flaws that Cohen (1990) and Schmidt (1996) warned about. The priming effects reported by Bargh were statistically significant but later failed to replicate, illustrating how NHST can produce unstable findings, encourage overreliance on p-values, and hinder cumulative scientific progress. Replication failures (e.g., Doyen et al., 2012) provide empirical support for Cohen's and Schmidt's critiques (Cohen, 1990; Bargh et al., 1996; Schmidt, 1996; Doyen et al., 2012).
What do these flaws mean for the field of psychology? Because statistical significance is driven by sample size rather than by meaningful effects, psychology risks building theories on trivial "significant" findings. NHST also contributes to the replication crisis: low power, overreliance on p-values, and publication bias all contribute to poor replicability, one of the biggest issues in modern psychology. The field therefore needs to shift toward estimation and cumulative science (Cohen, 1990; Schmidt, 1996). Both Cohen and Schmidt advocate effect sizes, confidence intervals, meta-analysis, power analysis, and transparent reporting; NHST should not be the primary decision-making tool. Neither Cohen nor Schmidt argued for eliminating NHST entirely, but both insisted it should be supplemented or replaced by more informative statistical approaches. Finally, Cohen and Schmidt both argue that NHST is fundamentally limited and misleading. Their critiques helped spark the modern movement toward effect sizes, confidence intervals, meta-analysis, and estimation-based statistics, approaches that provide richer, more meaningful information than a simple p-value (Cohen, 1990; Schmidt, 1996).
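The estimation approach both authors recommend can be sketched in a few lines (an illustrative example using made-up data, not drawn from either article): instead of reporting only whether p < .05, report the effect size and a confidence interval, which convey magnitude and uncertainty at once.

```python
import math
import statistics

# Hypothetical study data: change scores for ten participants (invented for illustration)
scores = [0.5, 1.2, -0.3, 0.8, 0.1, 0.9, 0.4, 1.1, -0.1, 0.6]

n = len(scores)
mean = statistics.mean(scores)
sd = statistics.stdev(scores)

cohens_d = mean / sd                       # standardized effect size
half_width = 1.96 * sd / math.sqrt(n)      # normal-approximation 95% CI for the mean
ci = (mean - half_width, mean + half_width)

print(f"mean change = {mean:.2f}, d = {cohens_d:.2f}")
print(f"95% CI for the mean: ({ci[0]:.2f}, {ci[1]:.2f})")
```

One caveat on the sketch: with n = 10 a t-based interval (critical value about 2.26) would be wider; 1.96 is used here only to keep the example dependency-free. The point stands either way: the interval tells a reader how large the effect plausibly is, which a bare p-value never does.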
Summary Table
Issue                                      Cohen (1990)   Schmidt (1996)
Misinterpretation of the p-value           yes            yes
Statistical significance ≠ practical
significance                               yes            yes
Low power in psychology                    yes            --
NHST blocks cumulative knowledge           --             yes
Dichotomous thinking                       --             yes
Sample size distortions                    yes            yes
Need for effect sizes & power              yes            yes
References
Bargh, J. A., Chen, M., & Burrows, L. (1996). Automaticity of social behavior: Direct effects of trait construct and stereotype activation on action. Journal of Personality and Social Psychology, 71(2), 230–244.
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45(12), 1304–1312. https://doi.org/10.1037/0003-066X.45.12.1304
Doyen, S., Klein, O., Pichon, C.-L., & Cleeremans, A. (2012). Behavioral priming: It's all in the mind, but whose mind? PLOS ONE, 7(1), e29081. https://doi.org/10.1371/journal.pone.0029081
Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1(2), 115–129. https://doi.org/10.1037/1082-989X.1.2.115
Colleague post #3: Null hypothesis significance testing has several major flaws that have long been recognized as problematic for psychological science. One key issue, emphasized by Cohen (1990), is that statistical significance is often misunderstood as evidence that an effect is important or meaningful, when it only indicates that an effect is unlikely to be zero given the sample size. Because the null hypothesis almost always states that an effect is exactly zero, a condition that is rarely true in real-world psychological phenomena, researchers are often rejecting a hypothesis that was never plausible to begin with. As a result, null hypothesis significance testing encourages a misleading focus on p values rather than on the size of effects, their practical importance, or the theoretical meaning of the findings. This has led to a culture in which results are reduced to binary decisions (significant vs. not significant), oversimplifying complex psychological processes.
Another serious problem, highlighted by Schmidt (1996), is that reliance on significance testing actively slows the development of cumulative knowledge in psychology. Because null hypothesis significance testing focuses heavily on controlling Type I error (false positives), it often ignores Type II error (false negatives) and statistical power. This leads to many real effects going undetected, especially in studies with small sample sizes. Schmidt shows that entire research literatures can appear contradictory simply because some studies reach statistical significance and others do not, even when all are estimating the same underlying effect. This vote-counting approach creates the false impression that effects are inconsistent or unreliable, when in fact the inconsistency is largely due to sampling error and low power rather than true differences in psychological phenomena.
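Schmidt's vote-counting problem is easy to reproduce in a simulation (a hypothetical sketch of my own, not from the article): give many small studies the exact same true effect, and the resulting "literature" still splits into significant and nonsignificant camps through sampling error alone, while averaging the effect estimates recovers the truth.

```python
import math
import random

random.seed(42)

TRUE_EFFECT = 0.3   # one real, modest standardized effect shared by every study
N_PER_STUDY = 30    # typical small-sample study
N_STUDIES = 500

significant = 0
estimates = []
for _ in range(N_STUDIES):
    sample = [random.gauss(TRUE_EFFECT, 1.0) for _ in range(N_PER_STUDY)]
    mean = sum(sample) / N_PER_STUDY
    z = mean * math.sqrt(N_PER_STUDY)   # one-sample z-test, sigma assumed known (= 1)
    if abs(z) > 1.96:                   # the usual alpha = .05 cutoff
        significant += 1
    estimates.append(mean)

frac = significant / N_STUDIES
pooled = sum(estimates) / N_STUDIES
print(f"studies 'significant': {frac:.0%}")
print(f"pooled effect estimate: {pooled:.2f} (true value {TRUE_EFFECT})")
```

Only around a third of the studies come out significant even though every one measures the same real effect. Counting votes therefore suggests an "inconsistent" literature, whereas pooling the estimates, the meta-analytic move Schmidt recommends, lands close to the true effect.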
Together, Cohen (1990) and Schmidt (1996) argue that the field of psychology has been misled by overreliance on null hypothesis significance testing and that this has serious consequences for theory building, replication, and practical application. Both authors emphasize that researchers should shift their focus toward effect sizes, confidence intervals, and meta-analysis, which provide more informative and honest summaries of research findings. Without this shift, psychology risks continuing to produce fragmented and confusing research literatures that obscure rather than clarify real effects. Moving beyond simple significance testing is therefore essential for advancing cumulative knowledge and improving the scientific credibility of the field.
References
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45(12), 1304–1312.
Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1(2), 115–129.
