Making the Case for Prediction Over Explanation

September 1, 2017

Through centuries of study, physical scientists have sought good explanations for the phenomena they observe, explanations that help the field generalize from specific observations to the wider world. Sir Isaac Newton’s theory of gravity, for example, allowed astronomers more than 150 years later to predict mathematically the existence of a major planet, Neptune, before it was observed. But in psychological science, explanations don’t always yield useful predictions. The explanations given for human behavior and thought patterns, whether derived from interviews, case studies, or experimental tests, often have either failed to predict future behavior or never been used to make forecasts at all.

“If ideal explanatory science is not generally ideal predictive science, and vice versa, then researchers must make a conscious choice: to explain or to predict,” Tal Yarkoni and Jacob Westfall write in a recent article published in Perspectives on Psychological Science. The University of Texas at Austin researchers advocate for a shift in psychological research, from an emphasis on explanation-based studies to prediction-based ones. The best tool for the job, they say, is machine learning.

They argue that the reproducibility crisis is really a failure of earlier models to predict what would happen in a replication study. Instead of focusing on prediction, psychological models are too often built to fit the specific data from a single experiment, Yarkoni and Westfall say. If researchers can find a friendly trend line or function, they can report a low p-value and claim success. In addition, psychologists in the past lacked both datasets large enough to reveal meaningful patterns and computers powerful enough to wrangle titanic datasets.
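The gap between fitting one dataset and predicting a replication can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not an analysis from Yarkoni and Westfall's article: the outcome and all predictors are pure noise, and the function names (`fit_line`, `r_squared`, `simulate`) and sample sizes are hypothetical choices made for illustration. The code picks whichever noise predictor fits the original sample best, then applies that same fitted line to a fresh sample.

```python
import random


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x, y, intercept, slope):
    """1 - SSE/SST for a given line evaluated on (x, y); can be negative."""
    mean_y = sum(y) / len(y)
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - mean_y) ** 2 for b in y)
    return 1 - sse / sst


def simulate(n_original=40, n_replication=1000, n_predictors=200, seed=0):
    """Select the best-fitting noise predictor, then test it on new data."""
    rng = random.Random(seed)

    def draw(n):
        # Outcome and predictors are independent noise: nothing truly predicts y.
        y = [rng.gauss(0, 1) for _ in range(n)]
        xs = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_predictors)]
        return xs, y

    orig_x, orig_y = draw(n_original)
    rep_x, rep_y = draw(n_replication)

    best = None
    for j in range(n_predictors):
        intercept, slope = fit_line(orig_x[j], orig_y)
        r2 = r_squared(orig_x[j], orig_y, intercept, slope)
        if best is None or r2 > best[0]:
            best = (r2, j, intercept, slope)

    r2_in, j, intercept, slope = best
    # Reuse the fitted line unchanged on the replication sample.
    r2_out = r_squared(rep_x[j], rep_y, intercept, slope)
    return r2_in, r2_out


if __name__ == "__main__":
    r2_in, r2_out = simulate()
    print(f"R^2 in original sample:    {r2_in:.3f}")
    print(f"R^2 in replication sample: {r2_out:.3f}")
```

Because the "winning" predictor was chosen for how well it happened to fit the original sample, its in-sample fit overstates how well it will predict new data. Evaluating on held-out data is the basic safeguard a prediction-focused approach relies on.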

But recently both barriers have fallen, and the call to reform has grown louder. Massive increases in computing power have been accompanied by more efficient database software and a large knowledge base on how to use both. Machine learning has allowed researchers to ferret out patterns that were previously out of reach. Although computationally expensive, these methods are widely accessible and easy to learn. The biggest obstacle is scientists uncomfortable with change: academic inertia (Hello again, Mr. Newton).

Machine learning can easily ‘consider’ more factors than humans ever could. This is especially helpful for outcomes that have many small contributions. It is reasonable to assume that many human behaviors fall into this category.

In a 2010 article in Psychological Science, Mitja D. Back (University of Münster) and colleagues showed that observer ratings of Facebook profiles correlated with self-reported personalities. This study involved about 200 social network users in the United States and Germany. The researchers concluded that social network profiles reflected a person’s true personality, not just the personality a user might wish to present to the world. While this study gives an explanatory result, it suggests a predictive question: which personality traits can someone accurately pick up on just by viewing a social media profile, and how accurate are the guesses?

Psychological scientist Michal Kosinski and colleagues in 2013 applied predictive thinking and machine learning to this question. They used psychometric questionnaires from 58,000 Facebook users and the ‘likes’ on their profile pages to predict personal attributes and Big Five personality traits. They not only answered the simple question, “Can one get accurate personality information from a social media profile?” but also measured how predictable specific attributes were: the predictions were accurate 88% of the time for sexual orientation in men and 85% of the time for Democrat versus Republican affiliation, and on the trait of Openness they were nearly as accurate as a personality test itself.
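To make the idea of predicting a trait from ‘likes’ concrete, here is a minimal sketch on synthetic data. It is not the pipeline Kosinski and colleagues used, and it does not reproduce their figures. The data generator, the weights, and the function names (`make_users`, `train_logistic`, `accuracy`, `run_demo`) are all assumptions made for illustration. Each simulated ‘like’ nudges the probability of a binary trait only slightly, a simple logistic regression is trained on one group of users, and accuracy is then measured on users the model has never seen.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_users(n_users, weights, rng):
    """Synthetic users: each like is a coin flip and shifts the trait odds a little."""
    users = []
    for _ in range(n_users):
        likes = [rng.choice((0, 1)) for _ in weights]
        logit = sum(w * (2 * x - 1) for w, x in zip(weights, likes))
        trait = 1 if rng.random() < sigmoid(logit) else 0
        users.append((likes, trait))
    return users


def train_logistic(users, n_features, epochs=100, lr=0.5):
    """Batch gradient descent on the logistic log-loss."""
    w = [0.0] * n_features
    b = 0.0
    n = len(users)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for likes, trait in users:
            p = sigmoid(b + sum(wi * x for wi, x in zip(w, likes)))
            err = p - trait
            grad_b += err
            for j, x in enumerate(likes):
                grad_w[j] += err * x
        b -= lr * grad_b / n
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
    return w, b


def accuracy(users, w, b):
    """Share of users whose trait is classified correctly at a 0.5 threshold."""
    hits = 0
    for likes, trait in users:
        p = sigmoid(b + sum(wi * x for wi, x in zip(w, likes)))
        hits += int((p >= 0.5) == (trait == 1))
    return hits / len(users)


def run_demo(seed=0, n_train=500, n_test=500, n_likes=20):
    rng = random.Random(seed)
    # Many small contributions: every like matters a little, none dominates.
    true_weights = [0.6 if j % 2 == 0 else -0.6 for j in range(n_likes)]
    train = make_users(n_train, true_weights, rng)
    test = make_users(n_test, true_weights, rng)
    w, b = train_logistic(train, n_likes)
    return accuracy(train, w, b), accuracy(test, w, b)


if __name__ == "__main__":
    train_acc, test_acc = run_demo()
    print(f"accuracy on training users: {train_acc:.2f}")
    print(f"accuracy on held-out users: {test_acc:.2f}")
```

The held-out accuracy is the number that matters for prediction: it estimates how well the model would perform on new profiles, not on the ones used to build it.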

A focus on predictions and the application of machine learning does not eliminate the possibility of finding explanations, either. Machine learning algorithms can be built to be more or less comprehensible to their human operators, although understandability may come at the cost of accurate prediction, Yarkoni and Westfall say in their article. Algorithms can also be ‘read’ to a certain extent, probed to see whether a few factors are creating large effects or many factors are making small ones, they write. What’s more, it is not uncommon to apply machine learning to subsets of the data, they point out: for example, researchers can separate the effects of medical history and genetic traits by using the two together in one analysis and separately in another.

Moving to a prediction-focused approach will necessitate a shift in thinking and in the questions researchers ask, Yarkoni and Westfall say, likely leading to less elegant models of how internal chemistry and external experience shape human behavior. The models will put uncertainty in the foreground, something many psychologists may be uncomfortable with at first. As the authors point out, “This is arguably not a real weakness at all, inasmuch as estimation uncertainty is a fact of life.” It is simply underplayed in much research reporting, they say.

References

Back, M.D., Stopfer, J.M., Vazire, S., Gaddis, S., Schmukle, S.C., Egloff, B., & Gosling, S.D. (2010). Facebook profiles reflect actual personality, not self-idealization. Psychological Science, 21(3), 372-374, doi:10.1177/0956797609360756

Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 110, 5802-5805, doi:10.1073/pnas.1218772110

Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science, doi:10.1177/1745691617693393

Comments

Edward Erwin September 1, 2017

There is no need to choose between explaining and predicting. An explanatory hypothesis does not explain unless it is true. The fact that it would explain if true is not itself grounds for its truth.

But neither is prediction except under certain conditions. The H-D model of confirmation, still widely accepted, is incorrect.

First, it generates the raven paradox.
Second, deriving observational predictions from a theory and then confirming them is no confirmation at all unless equally credible rivals that make the same predictions are ruled out.

Third, even if there are no rivals, the confirmation of weak or trivial predictions that provide no rigorous test of a theory do not provide credible evidence.

Psychology needs explanations and predictions, but to avoid replication failures, the focus should be on rigorous methodologies that yield high quality evidence.

Ed Erwin, Philosophy, University of Miami


