A Commitment to Replicability: An Interview with the Editor of Psychological Science


D. Stephen Lindsay

The Association for Psychological Science is committed to publishing cutting-edge research of broad interest in its journals. But it also aims to publish empirical work built on strong and sound research practices.

This week, Psychological Science Interim Editor D. Stephen Lindsay published an editorial confirming the journal’s commitment to replicability and encouraging practices that increase the likelihood that studies can be reproduced by other researchers following the same method.

The Observer asked Lindsay for his thoughts about replicability in psychological science.

Does the interestingness of a given study come at the cost of replicability?

No – there are all sorts of fascinating psychological phenomena that are not only replicable but robust. An example that comes easily to mind (!) is the availability heuristic, in which ideas that are fluently generated mentally are judged to be more common or probable than less-fluently generated ideas. That is a powerfully robust phenomenon that is very interesting indeed.

So it’s not that if an effect is replicable then it is inherently boring. But when researchers are working at the frontiers of knowledge there is naturally risk and uncertainty. If we already knew that a particular effect obtained under a specific range of conditions, then there would be little interest in demonstrating it under those conditions. When scientists are truly exploring the unknown, there are a variety of ways we can be misled into thinking we’ve found something that isn’t really there. We might, for example, mistake a fluke of chance for evidence of a causal effect.

What should researchers do to increase the replicability of their research?

For psychologists who use null hypothesis significance testing, first and foremost is to ensure that they have a good understanding of the noisiness of p values and of how certain widespread practices (collectively termed “p hacking”) greatly increase the risk of spurious and misleading “significant” test outcomes. Like many psychologists, I’ve learned a lot about these issues in just the last few years. A terrific place to start is Geoff Cumming’s tutorial videos on the APS website, and the extraordinarily illuminating 2011 Psychological Science article by Simmons, Nelson, and Simonsohn.
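That point is easy to see in a small simulation. The sketch below (Python, with made-up parameter values; it is an illustration, not anything from the editorial) draws two groups from the same population, then repeatedly adds a few subjects and re-tests, stopping as soon as p dips below .05. Because the null hypothesis is true by construction, every "significant" outcome is spurious, and the false-positive rate ends up well above the nominal 5%.

```python
# Minimal sketch of one p-hacking practice: optional stopping.
# Both groups come from the same distribution, so every "significant"
# result is a false positive. Peeking at the p value after each batch of
# added subjects inflates the false-positive rate above the nominal 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def false_positive_rate(n_start=20, n_max=60, step=5, n_sims=2000):
    hits = 0
    for _ in range(n_sims):
        a = list(rng.normal(size=n_start))
        b = list(rng.normal(size=n_start))
        n = n_start
        while True:
            p = stats.ttest_ind(a, b).pvalue
            if p < .05:
                hits += 1
                break
            if n >= n_max:
                break
            # "Test a few more subjects to see if the pattern stabilizes."
            a.extend(rng.normal(size=step))
            b.extend(rng.normal(size=step))
            n += step
    return hits / n_sims

print(f"Nominal alpha: .05   Simulated false-positive rate: {false_positive_rate():.3f}")
```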

Second, researchers who want to enhance the replicability of their research should do what they can to maximize the precision of the estimates they obtain in their studies. That is partly a matter of testing larger samples than is typical in psychology experiments. But the number of people tested in a study is not the only consideration. Another is the reliability of the measurement tools used in the study. Often it would help a lot to increase the number and/or quality of measures obtained from each participant. And another is the extent to which extraneous sources of variability are minimized.
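To make the precision point concrete, here is a minimal sketch (Python, with hypothetical effect and noise values) that estimates how tightly a group difference is pinned down as a function of sample size and the number of measurements averaged per participant. Both larger samples and more (or better) measurements shrink the standard error of the estimate.

```python
# Sketch of the precision point: the standard error of an estimated group
# difference shrinks with larger samples and with more measurements
# averaged per participant. All values below are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
true_effect = 0.3   # true group difference, in SD units
trait_sd = 1.0      # between-person variability
measure_sd = 1.0    # noise of a single measurement

def se_of_difference(n_per_group, n_measures, n_sims=4000):
    diffs = []
    for _ in range(n_sims):
        # Each person's score is their true level plus averaged measurement noise.
        g1 = rng.normal(true_effect, trait_sd, n_per_group) + \
             rng.normal(0, measure_sd / np.sqrt(n_measures), n_per_group)
        g2 = rng.normal(0, trait_sd, n_per_group) + \
             rng.normal(0, measure_sd / np.sqrt(n_measures), n_per_group)
        diffs.append(g1.mean() - g2.mean())
    return np.std(diffs)

for n, k in [(20, 1), (80, 1), (20, 4), (80, 4)]:
    print(f"n={n:3d} per group, {k} measures each -> SE of difference ~ {se_of_difference(n, k):.3f}")
```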

Researchers wanting to increase replicability should also consider pre-registering their studies. I have started to do this in my own lab, and am requiring my students to do it as well. If you are doing exploratory work in a new domain, then pre-registering might be premature. By all means be creative and flexible while exploring. But if you happen to stumble across some interesting finding while exploring, don’t rush it into publication but rather attempt a pre-registered replication. Pre-registration entails specifying in advance the number and nature of the subjects to be tested and the materials, procedures, measures, exclusions, and analyses to be used in the study. That information can be recorded in a time-stamped public repository such as the Open Science Framework as a way of documenting that you did good science, but the point is not so much to create the document as to in fact do things the right way.

Last, the most obvious and direct way of enhancing replicability is to conduct and report replications. It is not crucial that every study obtains a particular p value. What is more important is the extent to which a set of replications provides converging evidence as to the size of an effect. For a terrific example of this point, see the first Registered Replication Report published in Perspectives on Psychological Science, which found powerful evidence for the Schooler and Engstler-Schooler verbal overshadowing effect (even though the effect was not statistically significant in all individual experiments).
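The converging-evidence idea can be illustrated with a toy fixed-effect meta-analysis (Python; the effect sizes and standard errors below are invented for illustration, not taken from the Registered Replication Report). None of the individual studies is statistically significant on its own, yet the inverse-variance-weighted pooled estimate is precise enough to exclude zero.

```python
# Sketch of converging evidence across replications: several hypothetical
# studies each estimate the same effect imprecisely, but the pooled
# (inverse-variance-weighted) estimate is precise. Numbers are made up.
import numpy as np

# Hypothetical per-study effect estimates (e.g., Cohen's d) and standard errors.
effects = np.array([0.22, 0.31, 0.15, 0.28, 0.19])
ses     = np.array([0.15, 0.16, 0.14, 0.17, 0.15])

weights   = 1 / ses**2                              # fixed-effect weights
pooled    = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1 / np.sum(weights))

for d, se in zip(effects, ses):
    print(f"study : d = {d:.2f}, 95% CI [{d - 1.96*se:.2f}, {d + 1.96*se:.2f}]")
print(f"pooled: d = {pooled:.2f}, 95% CI [{pooled - 1.96*pooled_se:.2f}, {pooled + 1.96*pooled_se:.2f}]")
```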

How can readers evaluate the likely replicability of scientific papers?

Good clear common sense will get you a long way. Be skeptical if the paper reports a single study, if the number of subjects in that study is not large, if the finding is surprising, or if the p value (the probability of obtaining an effect at least that large by chance alone) is greater than 1 or 2 in a hundred. If the study has all four of those characteristics, be very skeptical.

Why did it take so long for this trend toward transparency and replication to emerge?

Good question. I for one simply didn’t understand how noisy p values are and how badly they can be distorted by (for example) repeatedly “testing a few more subjects to see if the pattern stabilizes.” More and better training in math and statistics presumably would have helped me and many others understand these issues better. I wouldn’t be surprised if some cultural and social dynamics also played important roles. We as a culture seem to put a high premium on novelty. Surprising findings hold a magnetic appeal for psychologists, perhaps partly because the field is sometimes dismissed as the science of the obvious. Also, researchers are under tremendous pressure to publish as much as they can as soon as they can.

How can scientists get involved in the efforts to improve the replicability of the field?

Model the desired behavior in their own work. Also, if they have influence on departmental policy, perhaps work to de-emphasize the number of publications in favor of deeper measures of productivity. As consumers of information, they can search for published replication attempts, be a bit skeptical of others’ claims, and be a bit humble about their own.

Should scientists expect to see improvements to replicability in Psychological Science right away?

Psychological Science has always been a great journal. My predecessor Eric Eich put in place many policies to enhance replicability, and my fellow editors and I are now taking further steps in that direction. But it takes quite a bit of time for these sorts of changes to really take root and start bearing fruit, so this will be a gradual, long-term process. Ultimately we aim to change not just the journal but the broader culture of psychological science by putting more of a premium on replicability. And that is likely to take a while.

D. Stephen Lindsay is Interim Editor of Psychological Science and an APS Fellow. He is Professor of Psychology at the University of Victoria, British Columbia. Lindsay researches human memory and cognition, with an emphasis on source monitoring, aging and memory, and eyewitness memory.

