Powerful Tools for Designing Powerful Studies

Why do studies fail to replicate? There are several possible explanations, but a notable one is that many studies are underpowered: their sample sizes are simply too small given the size of the effect under investigation. In an article in Psychological Science, researchers from the University of Notre Dame explain why many studies end up inadequately powered and offer open-source tools that can help researchers proactively avoid the problem.

Statistical power, as psychological scientists Samantha F. Anderson, Ken Kelley, and Scott E. Maxwell describe in their article, is the “probability of rejecting the null hypothesis of no effect when the true effect is nonnull in the population.” Researchers want to be fairly confident that they’ll be able to detect an effect if one truly does exist, so ensuring that their studies have adequate power is an important component of experimental design.

To do this, they calculate the total number of participants needed to detect an effect of a specific size with their targeted level of power. Researchers can’t know how big an effect actually is in the population, so they often estimate it using effect sizes in published studies. And this is where the problem arises, Anderson and colleagues argue, as such effect-size estimates have several inherent flaws.
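To make the calculation concrete, here is a minimal sketch of a textbook sample-size formula for comparing two independent groups. It is not taken from the article: the function name n_per_group, the two-group design, and the normal approximation are all assumptions made for illustration, and the approximation gives results slightly below those of exact t-based methods.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-sided,
    two-group comparison of means, given a standardized effect size d.

    Illustrative normal approximation: n = 2 * ((z_alpha + z_power) / d)^2.
    """
    if d == 0:
        raise ValueError("An effect size of zero cannot be detected.")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


# The required sample grows quickly as the assumed effect shrinks.
for d in (0.8, 0.5, 0.2):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Because the effect size is squared in the denominator, halving the assumed effect roughly quadruples the required sample, which is why an overly optimistic estimate is so costly.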

One notable flaw, the researchers explain, is that an effect size in published research is likely to be greater than the true population effect size due to the so-called file drawer problem, in which nonsignificant results tend to go unpublished. A publication bias that strongly favors statistically significant findings produces a literature with “upwardly biased” effect size estimates.
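A small simulation can illustrate how this bias arises. The sketch below is not from the article; it assumes a true effect of d = 0.2, 20 participants per group, and a literature that publishes only results that are significant in the expected direction. The function name simulate_literature and every parameter value are hypothetical choices.

```python
import math
import random
import statistics


def simulate_literature(true_d=0.2, n_per_group=20, n_studies=5000,
                        t_crit=2.024, seed=1):
    """Simulate many two-group studies and 'publish' only significant ones.

    t_crit is the two-sided .05 critical t value for df = 38, which matches
    the default of 20 participants per group (an assumption of this sketch).
    Returns (mean d of all studies, mean d of published studies, number published).
    """
    rng = random.Random(seed)
    all_d, published_d = [], []
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
        pooled_sd = math.sqrt(
            (statistics.variance(control) + statistics.variance(treated)) / 2
        )
        d = (statistics.mean(treated) - statistics.mean(control)) / pooled_sd
        t = d * math.sqrt(n_per_group / 2)  # t statistic for equal group sizes
        all_d.append(d)
        if t > t_crit:  # only significant results leave the file drawer
            published_d.append(d)
    return statistics.mean(all_d), statistics.mean(published_d), len(published_d)


mean_all, mean_published, n_published = simulate_literature()
print(f"Mean effect size across all studies:  {mean_all:.2f}")
print(f"Mean effect size in published papers: {mean_published:.2f}")
print(f"Studies published: {n_published} of 5000")
```

Under these assumptions, a study can reach significance only if its observed effect is far larger than the true one, so the average published effect overstates the population value even though the studies as a whole are roughly unbiased.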

Estimates based on previously published effect sizes also fail to account for the uncertainty intrinsic to statistical inferences. Researchers can express the uncertainty of an effect size via a confidence interval, which gives a range of plausible values for the true population effect size. This uncertainty is often ignored, however, when researchers use the single-value point estimate from published studies to determine the sample size required for their own studies.
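The cost of ignoring that uncertainty can be sketched with a rough calculation. The example below is an illustration rather than the authors' procedure: it assumes a published two-group study with 50 participants per group and an observed d of 0.5, uses a common large-sample approximation for the standard error of d, and reuses a normal-approximation sample-size formula. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def d_confidence_interval(d, n1, n2, level=0.95):
    """Approximate confidence interval for a standardized mean difference,
    based on a large-sample standard error (an assumption of this sketch)."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d - z * se, d + z * se


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-group test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Hypothetical published result: d = 0.5 with 50 participants per group.
lower, upper = d_confidence_interval(0.5, 50, 50)
print(f"95% CI for d: ({lower:.2f}, {upper:.2f})")
for label, value in (("lower bound", lower), ("point estimate", 0.5),
                     ("upper bound", upper)):
    print(f"Planning on the {label}: {n_per_group(value)} per group")
```

In this hypothetical case the interval is wide enough that the sample size implied by its lower end is many times the one implied by the point estimate, so planning on the point estimate alone can badly misjudge what a study needs.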

“Given the ubiquity of bias and uncertainty in estimates of effect size, researchers who conscientiously plan their sample sizes using published effect sizes from prior studies can have actual power that is abysmal, especially when the population effect size is small,” Anderson, Kelley, and Maxwell write.
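A back-of-the-envelope calculation shows how this can happen. The sketch below is an illustration under stated assumptions, not the authors' analysis: a researcher plans for 80% power using a published d of 0.5, while the true population effect is assumed to be 0.2. The function name power_two_sample and the normal approximation are choices made for this example.

```python
import math
from statistics import NormalDist


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means,
    using a normal approximation to the test statistic."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n_per_group / 2)  # expected value of the test statistic
    # Probability of landing in either rejection region.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


planned_n = 63  # roughly what 80% power requires if d really were 0.5
print(f"Power if d = 0.5 (as published): {power_two_sample(0.5, planned_n):.2f}")
print(f"Power if d = 0.2 (assumed true): {power_two_sample(0.2, planned_n):.2f}")
```

With these numbers, a study designed for 80% power would detect the smaller true effect only about one time in five.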

Underpowered studies may fail to detect effects that do exist, but they have other consequences as well: they increase the proportion of studies in the literature that falsely find an effect when it doesn’t exist, and they produce effect-size estimates that are inflated. In a broader context, they also limit the replicability of study findings.

Building on a strategy originally proposed by Taylor and Muller in 1996, Anderson and colleagues outline a procedure that enables researchers to account for these flaws from the beginning by adjusting effect-size estimates for publication bias and uncertainty.

Researchers can use this method for free via an open-source R package (BUCSS) and web-based apps; they need only a few key pieces of information to use these platforms.

“We hope that more accurate estimates of effect size will result in new psychological studies that are more adequately powered and will lead to a replicable literature that inspires more confidence and is less in crisis,” Anderson, Kelley, and Maxwell conclude.