Scientific “freedom” and the Fountain of Youth

October 19, 2011


“Chronological rejuvenation” is psychological jargon for the Fountain of Youth, that elusive tonic that, when we find it, will reverse the aging process. Though many of us would welcome such a discovery, most of us also know it’s a fantasy, a scientific impossibility.

So imagine my surprise when I came across this passage while browsing the journal Psychological Science. I include all the methodological details because they are important:

Using the same method as in Study 1, we asked 20 University of Pennsylvania undergraduates to listen to either “When I’m Sixty-Four” by The Beatles or “Kalimba.” Then, in an ostensibly unrelated task, they indicated their birth date (mm/dd/yyyy) and their father’s age. We used father’s age to control for variation in baseline age across participants. An ANCOVA revealed the predicted effect: According to their birth dates, people were nearly a year-and-a-half younger after listening to “When I’m Sixty-Four” (adjusted M = 20.1 years) rather than to “Kalimba” (adjusted M = 21.5 years), F(1, 17) = 4.92, p = .040.

If you’re like me, you had to reread the passage: It doesn’t say that subjects felt a year and a half younger; it says that they actually were a year and a half younger. That is, the data support the scientists’ unbelievable hypothesis that listening to music about old age can make us significantly younger.

Unbelievable indeed. The authors of the report—Joseph Simmons and Uri Simonsohn of Penn and Leif Nelson of Berkeley—certainly don’t believe this finding, although they did really run the experiment as described, and they used accepted practices for reporting and analyzing their data. They ran the experiment to demonstrate a serious flaw in the usual way that behavioral science data are collected and reported—a flaw that (they claim) allows scientists to prove that “anything” is a significant finding. The scientists are not charging malfeasance or malicious intent, but they do argue that current methods lead inevitably to self-serving intellectual dishonesty, producing a lot of “false positives”—results that appear statistically valid but in fact are not. Discovering the Fountain of Youth is a deliberately absurd example, but the authors believe that false positives in behavioral science are “vastly more likely” than the 5 percent that’s generally acknowledged.

How does this happen? The scientists ran computer simulations of 15,000 data samples and identified the culprit as “researcher degrees of freedom.” This simply means that, in the course of running and reporting an experiment, scientists make a number of decisions that can skew results. These decisions, and the rules governing them, make it “unacceptably easy” to publish bogus evidence for literally any hypothesis. One example is the decision about sample size. Typically, a researcher will recruit a sample, say 20 subjects, and run the experiment. At that point, the researcher tests the data for significance. If the result is significant, terrific: The researcher stops collecting data and reports the result. So far so good, but what if the result does not reach significance after 20 subjects? In that case, the researcher has the option of adding another group of subjects, say 10 more, and testing the data again. And again, and again. According to the authors’ computer simulation, this seemingly small degree of freedom increases the false-positive rate by 50 percent. A recent survey found that about seven in ten psychological scientists admit to making such interim decisions in their research.
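
The effect of this kind of "peeking" is easy to see in a small simulation. What follows is only a rough sketch, not the authors' own code: it assumes two conditions with no true difference, scores drawn from a standard normal distribution, and a simple z-test with a known standard deviation of 1 in place of the tests used in the paper. The function names and the number of simulated studies are illustrative choices.

```python
import random
from statistics import NormalDist, fmean


def z_test_p(a, b):
    """Two-sided p-value for a difference in means, assuming known SD = 1."""
    se = (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def run_study(rng, start_n=20, step=10, max_n=20, alpha=0.05):
    """Simulate one study with NO true effect; return True if it ever hits p < alpha."""
    a = [rng.gauss(0, 1) for _ in range(start_n)]
    b = [rng.gauss(0, 1) for _ in range(start_n)]
    while True:
        if z_test_p(a, b) < alpha:
            return True  # "significant": stop collecting and report
        if len(a) >= max_n:
            return False  # out of subjects: give up
        # Not significant yet, so add `step` more subjects per condition and retest.
        a += [rng.gauss(0, 1) for _ in range(step)]
        b += [rng.gauss(0, 1) for _ in range(step)]


def false_positive_rate(max_n, n_sims=4000, seed=1):
    """Share of simulated null studies that end up 'significant'."""
    rng = random.Random(seed)
    hits = sum(run_study(rng, max_n=max_n) for _ in range(n_sims))
    return hits / n_sims


if __name__ == "__main__":
    print("test once at n = 20:       ", false_positive_rate(max_n=20))
    print("test at n = 20, then 30:   ", false_positive_rate(max_n=30))
    print("test at n = 20, 30, 40, 50:", false_positive_rate(max_n=50))
```

Testing once should produce a rate close to the nominal 5 percent, while each additional look at the data gives chance another opportunity to cross the threshold, so the rate climbs. The exact figures depend on the assumptions above and will not match the paper's numbers precisely.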

Manipulating sample size is just one of the decision-making “freedoms” that the authors highlight in their Psychological Science paper. Another is flexibility in including (and reporting) dependent variables. If a researcher designs an experiment with two dependent variables, he or she can decide to test just one, or the other, or both, increasing the likelihood of producing at least one significant result. Similarly, flexibility in controlling for gender can dramatically boost false positives, as can dropping (or not dropping) one of three experimental conditions. What’s more, the authors note, scientists often use all these freedoms in the same experiment, a practice that would lead to a stunning 61 percent false positive rate. In other words, a researcher with all the best intentions is more likely than not to falsely detect a positive result just by using the accepted practices in the field.

They consider this estimate conservative. Researchers often test and choose among more than two dependent variables; exclude subsets of subjects as “outliers”; consign early data to “pilot studies”; and so on.
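
The flexibility in dependent variables described above can be sketched the same way. This is a hedged illustration under assumptions of my own rather than a reproduction of the authors' simulation: two dependent variables correlated at 0.5, no true effect, 20 subjects per condition, a z-test with known standard deviation, and "both" interpreted as testing the average of the two measures.

```python
import random
from statistics import NormalDist, fmean

R = 0.5  # assumed correlation between the two dependent variables


def p_value(a, b, sd):
    """Two-sided z-test p-value for a difference in means with known SD."""
    se = sd * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def draw_group(rng, n):
    """n subjects, each with two DVs (SD 1, correlated at R) and no true effect."""
    group = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        group.append((z1, R * z1 + (1 - R**2) ** 0.5 * z2))
    return group


def study_p_values(rng, n=20):
    """p-values for DV 1, DV 2, and their average in one simulated study."""
    a, b = draw_group(rng, n), draw_group(rng, n)
    p1 = p_value([s[0] for s in a], [s[0] for s in b], sd=1.0)
    p2 = p_value([s[1] for s in a], [s[1] for s in b], sd=1.0)
    avg_sd = ((1 + R) / 2) ** 0.5  # SD of the mean of two SD-1 variables correlated at R
    p_avg = p_value(
        [(s[0] + s[1]) / 2 for s in a],
        [(s[0] + s[1]) / 2 for s in b],
        sd=avg_sd,
    )
    return p1, p2, p_avg


def false_positive_rates(n_sims=4000, seed=1, alpha=0.05):
    """Return (rate when only DV 1 is tested, rate when the best of three is reported)."""
    rng = random.Random(seed)
    first_only = any_of_three = 0
    for _ in range(n_sims):
        ps = study_p_values(rng)
        first_only += ps[0] < alpha
        any_of_three += min(ps) < alpha
    return first_only / n_sims, any_of_three / n_sims


if __name__ == "__main__":
    single, flexible = false_positive_rates()
    print("report DV 1 only:         ", single)
    print("report whichever 'worked':", flexible)
```

A researcher committed to a single measure stays near the nominal 5 percent, while one who reports whichever of the three analyses came out significant does noticeably worse, even though every individual test is perfectly legitimate.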

So what’s to be done? The authors offer a simple solution to the problem of false-positive publication, including requirements for researchers and guidelines for journal reviewers. The requirements all aim for more transparency in reporting data and methods of analysis—listing all variables, for example, and deciding on sample size before data collection begins. Here, for illustration, is the way the authors would revise their own report on the Fountain of Youth study:

Using the same method as in Study 1, we asked 34 University of Pennsylvania undergraduates to listen only to either “When I’m Sixty-Four” by The Beatles or “Kalimba” or “Hot Potato” by the Wiggles. We conducted our analyses after every session of approximately 10 participants; we did not decide in advance when to terminate data collection. Then, in an ostensibly unrelated task, they indicated only their birth date (mm/dd/yyyy) and how old they felt, how much they would enjoy eating at a diner, the square root of 100, their agreement with “computers are complicated machines,” their father’s age, their mother’s age, whether they would take advantage of an early-bird special, their political orientation, which of four Canadian quarterbacks they believed won an award, how often they refer to the past as “the good old days,” and their gender. We used father’s age to control for variation in baseline age across participants. An ANCOVA revealed the predicted effect: According to their birth dates, people were nearly a year-and-a-half younger after listening to “When I’m Sixty-Four” (adjusted M = 20.1 years) rather than to “Kalimba” (adjusted M = 21.5 years), F(1, 17) = 4.92, p = .040. Without controlling for father’s age, the age difference was smaller and did not reach significance (Ms = 20.3 and 21.2, respectively), F(1, 18) = 1.01, p = .33.

This detailed report is about twice as long as the one above, and will no doubt seem burdensome to many scientists. It also will not stop blatant cheaters, the authors concede, but that is not the point. The goal, they conclude, is to reduce the “self-serving interpretation of ambiguity, which enables us to convince ourselves that whichever decisions produced the most publishable outcome must have also been the most appropriate.”

Wray Herbert’s book, On Second Thought, has recently been published in paperback. Excerpts from his two blogs—“We’re Only Human” and “Full Frontal Psychology”—appear regularly in Scientific American Mind and in The Huffington Post.


Comments

Neuroskeptic October 19, 2011

Two comments:

1) It’s not just psychology, same goes for many other fields of science.

2) I previously proposed a way of preventing this but it would require a radical change in the way science is published, and neither scientists nor publishers would like it.

Andrey October 19, 2011

The idea of “listing all variables” seems reasonable, but only until you realize that it isn’t possible, as no one can say what “all variables” are. Should we include gender as a variable? Or the time of day? Or the temperature in the lab? As for sample size, more is always better, since it reduces noise, as long as one randomly assigns participants to experimental groups.
In my opinion, it is more accurate to say that psychologists should stick to theory when making predictions and report null results as well as positive ones. But this is an old problem, and I don’t see any solutions to it. However, if you are wrong, there will always be someone to correct you, and this is one of the reasons scientists usually take care not to publish possibly bogus results.

Psicologos Barcelona January 5, 2012

I think that, to be valid, a scientific report needs to be supported by objective data. And that’s the problem: in many cases the reported data are accurate but less objective, i.e., they are real data, but we cannot know for sure that they represent what we are observing.
Sorry for my English.
