Through centuries of study, physical scientists have sought good explanations for the phenomena they investigate, explanations that let the field generalize from specific observations to the wider world. Sir Isaac Newton's theory of gravity, for instance, allowed astronomers more than 150 years later to predict mathematically where an undiscovered planet, Neptune, would be found. But in psychological science, explanations don't always yield useful predictions. The explanations offered for human behavior and thought patterns, whether developed through interviews, case studies, or experimental tests, often haven't predicted people's future behavior, or haven't been tested to see whether they can.
“If ideal explanatory science is not generally ideal predictive science, and vice versa, then researchers must make a conscious choice: to explain or to predict,” Tal Yarkoni and Jacob Westfall write in a recent article published in Perspectives on Psychological Science. The University of Texas at Austin researchers advocate for a shift in psychological research, from an emphasis on explanation-based studies to prediction-based ones. The best tool for the job, they say, is machine learning.
They argue that the reproducibility crisis is, at bottom, a crisis of earlier models failing to predict what would happen in a replication study. Instead of being built to predict, psychological models too often are tailored to fit the specific data from a single experiment, Yarkoni and Westfall say. If researchers can find a friendly trend line or function, they can report a low p-value and claim success. In addition, psychologists historically lacked both the large datasets needed to detect meaningful patterns and the computing power needed to analyze such massive datasets.
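To make the overfitting argument concrete, here is a toy simulation in Python. It is our own illustration, not an analysis from Yarkoni and Westfall's article. It assumes an outcome and 200 candidate predictors that are all pure random noise, so no real effect exists. Picking whichever predictor best fits the original sample produces an impressive-looking correlation, but that same predictor shows little correlation in a fresh "replication" sample. The sample size, predictor count, and function names are arbitrary, hypothetical choices.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def draw_sample(rng, n_subjects, n_predictors):
    """Hypothetical study: every predictor and the outcome are independent noise."""
    predictors = [[rng.gauss(0, 1) for _ in range(n_subjects)]
                  for _ in range(n_predictors)]
    outcome = [rng.gauss(0, 1) for _ in range(n_subjects)]
    return predictors, outcome


def fit_then_replicate(seed, n_subjects=50, n_predictors=200):
    """Pick the best-fitting predictor in one sample, then test it in a new one."""
    rng = random.Random(seed)
    original_x, original_y = draw_sample(rng, n_subjects, n_predictors)
    replication_x, replication_y = draw_sample(rng, n_subjects, n_predictors)

    # "Explanation" step: keep whichever predictor best fits the original data.
    best = max(range(n_predictors),
               key=lambda j: abs(pearson(original_x[j], original_y)))
    r_original = pearson(original_x[best], original_y)

    # "Prediction" step: does the same predictor hold up in a replication?
    r_replication = pearson(replication_x[best], replication_y)
    return best, r_original, r_replication


if __name__ == "__main__":
    trials = [fit_then_replicate(seed) for seed in range(10)]
    mean_original = sum(abs(r) for _, r, _ in trials) / len(trials)
    mean_replication = sum(abs(r) for _, _, r in trials) / len(trials)
    print(f"mean |r| in original samples:    {mean_original:.2f}")
    print(f"mean |r| in replication samples: {mean_replication:.2f}")
```

Exact numbers vary with the seed, but in this toy setup the selected predictor's correlation in the original samples is consistently well above its correlation in the replication samples, which hovers near zero. Judging a model by how well it predicts new data, rather than by how well it fits the data it was built from, is the kind of check the authors argue for.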
But recently both barriers have fallen, and the call for reform has grown louder. Massive increases in computing power have arrived alongside more efficient database software and a large body of knowledge about how to use both. Machine learning has allowed researchers to ferret out patterns that were previously out of reach. Although computationally expensive, these methods are widely accessible and relatively easy to learn. The biggest remaining obstacle is academic inertia: scientists' discomfort with change (hello again, Mr. Newton).
Machine learning can easily ‘consider’ far more factors than humans ever could. This is especially helpful for outcomes shaped by many small contributing factors, and it is reasonable to assume that many human behaviors fall into this category.
In a 2010 article in Psychological Science, Mitja D. Back (University of Münster) and colleagues showed that observers' ratings of people based on their Facebook profiles correlated with those people's actual personalities. The study involved about 200 social network users in the United States and Germany. The researchers concluded that social network profiles reflected a person's true personality, not just the idealized personality a user might wish to present to the world. Although the study gives an explanatory result, it also points toward predictive questions: which personality traits can someone accurately pick up on just by viewing a social media profile, and how accurate are those guesses?
In 2013, psychological scientist Michal Kosinski and colleagues applied predictive thinking and machine learning to this question. They used psychometric questionnaires from 58,000 Facebook users, together with the ‘likes’ on their profile pages, to predict personal attributes and Big Five personality traits. Beyond answering the simple question, “Can one get accurate personality information from a social media profile?”, they found that ‘likes’ correctly distinguished gay from heterosexual men 88% of the time and Democrats from Republicans 85% of the time, and that their predictions of Openness were nearly as accurate as the personality test itself.
A focus on prediction and the application of machine learning does not eliminate the possibility of finding explanations, either. Machine learning algorithms can be built to be more or less comprehensible to their human operators, although understandability may come at the cost of predictive accuracy, Yarkoni and Westfall say in their article. Algorithms can also be ‘read’ to a certain extent, probed to see whether a few factors are producing large effects or many factors are producing small ones, they write. What's more, it is not uncommon to apply machine learning to different subsets of predictors, for example to separate the effects of medical history and genetic traits by modeling them together in one analysis and separately in another, they point out.
Moving to a predictive approach will require a shift in how researchers think and what questions they ask, Yarkoni and Westfall say, and will likely lead to less elegant models of how internal chemistry and external experience shape human behavior. These models will put uncertainty in the foreground, something many psychologists may be uncomfortable with at first. But as the authors point out, “This is arguably not a real weakness at all, inasmuch as estimation uncertainty is a fact of life.” It is simply underplayed in much research reporting, they say.
Back, M.D., Stopfer, J.M., Vazire, S., Gaddis, S., Schmukle, S.C., Egloff, B., & Gosling, S.D. (2010). Facebook profiles reflect actual personality, not self-idealization. Psychological Science, 21(3), 372-374, doi:10.1177/0956797609360756
Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 110, 5802-5805, doi:10.1073/pnas.1218772110
Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science, doi:10.1177/1745691617693393