“Lesser of Two Evils”: Applying Artificial Intelligence to Move Beyond Self-Reports

Catharine Fairbairn, Nigel Bosch

February 13, 2026

Psychology has long been a science of self-reports.
Overreliance on self-reports can lead to inflated effects.
A range of automated measures provide new options for behavioral researchers.
Although neither self-reports nor automated tools offer measurement free from error, automated measures are less likely to be vulnerable to systematic forms of error and false-positive findings when compared with self-reports.

When selecting measures for use in new research, psychologists have overwhelmingly turned to self-reports. Though questionnaires and interview-style measures have advantages, studies exploring relationships between multiple self-reports can produce inflated effects.

Progress in artificial intelligence (AI) and machine learning has the potential to transform measurement in psychology by offering new ways to analyze data created naturally through everyday human activities, including social media posts, videos, and photographs. In this essay, coauthored by a behavioral scientist with a doctoral degree in psychology (Catharine Fairbairn) and a machine-learning researcher with a doctorate in computer science (Nigel Bosch), we argue for increased uptake of automated measures in psychological science. Specifically, we advocate for these new AI-based measures not because they offer measurement free from error, but rather because they avoid specific problematic forms of error linked to overreliance on self-reports.

Self-reports

Psychology has long been a science of self-reports. When applied to subjective constructs, self-reports can provide a view of human experience that is tremendously valuable and difficult to capture via other means (Garcia & Gustavson, 1997). Self-reports are also cost-effective and scalable, and when they include multiple-choice and Likert-style response options, they can easily be analyzed with conventional statistical approaches (Paulhus & Vazire, 2007).

Although self-reports have advantages, their use in psychological science might be said to have gotten out of hand (Baumeister et al., 2007). The application of self-reports in behavioral research now extends well beyond the measurement of constructs that are inherently subjective to the measurement of behavior, events, skills/abilities, and even physiology (Fairbairn & Bosch, in press).

Psychological processes often operate at levels below conscious awareness, such that we as humans may find ourselves unable to accurately report on our internal thoughts and feelings or on our external behaviors (Nisbett & Wilson, 1977). Limitations in memory and self-perception interfere with the accuracy of self-reports, with recall for distant life events, aggregation of information across time, and self-evaluation of our own performance emerging as particularly biased (Baldwin et al., 2019; Dunning et al., 2004; Schwarz, 1999).

Even when we are capable of accurately reporting our experiences, we may not always be willing to do so, with misreporting on sensitive topics (e.g., drug use) reaching levels as high as 50% (Tourangeau & Yan, 2007). Fixed-choice items do not solve these problems and introduce biases of their own: participant responses to multiple-choice and Likert-type items vary substantially depending on the specific response options provided by researchers (Schwarz, 1999).

In sum, we know that self-reports contain error. But the same could be said of every assessment technique deployed in the history of science. When it comes to measurement, we never deal in the realm of absolute truth, but rather one fumbling attempt at approximation after another.

Potential harm from measurement error can vary substantially depending not only on its quantity but also, importantly, on its specific characteristics or quality. Psychology features a high proportion of studies in which both predictor and outcome are assessed via self-report, with the prevalence of such designs surpassing 50% in some subdisciplines (Fairbairn & Bosch, in press).

Measurement error shared across a predictor and outcome can artificially inflate the size of observed effects. For example, a significant relationship between self-reports of alcohol problems and self-reports of marital distress might emerge because of a true underlying relationship between these factors or, alternatively, because of systematic forms of measurement error shared across the two self-reports (Campbell & Fiske, 1959; Podsakoff et al., 2003, 2012). These might include individual differences in self-presentational concern (some participants may be unwilling to disclose alcohol use and marital distress), memory/attention (some may struggle to remember instances of either), mood (everything seems bad, including marriage and behaviors), or personal lay theories (“I believe my spouse is driving me to drink.”). Such systematic error has the potential to move beyond the realm of noise into that of true confound.

In theoretical terms, shared measurement error could either inflate or diminish the size of effects, depending on whether it pushes the two measures in the same or in opposite directions. However, when applied to self-reports, research suggests that shared measurement effects are overwhelmingly inflationary (Podsakoff et al., 2024). As such, overreliance on self-reports has the potential to produce false-positive findings. In a discipline recently rocked by a replicability crisis (Open Science Collaboration, 2015), this possibility is one worthy of grave concern.
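The inflation argument can be illustrated with a small simulation. This is a minimal Python sketch under a hypothetical measurement model, not an analysis from the essay. It assumes each observed score equals a latent construct plus a method factor plus independent random noise. The function name `observed_correlation`, the loadings of 0.5, and the noise level are arbitrary illustrative choices. When the method factor is shared across both measures, it adds covariance unrelated to the constructs themselves. An example is a respondent whose mood colors both an alcohol-problems report and a marital-distress report.

```python
import numpy as np


def observed_correlation(true_r, loading_x, loading_y, shared_method=True,
                         noise_sd=0.5, n=200_000, seed=0):
    """Simulated correlation between two error-laden measures.

    Hypothetical measurement model (an illustration only):
        X_obs = X + loading_x * M_x + e_x
        Y_obs = Y + loading_y * M_y + e_y
    X and Y are latent constructs with unit variance and correlation true_r.
    With shared_method=True, M_x and M_y are the same variable (e.g., one
    respondent's mood or self-presentation style coloring both self-reports);
    otherwise they are independent. e_x and e_y are independent noise.
    """
    rng = np.random.default_rng(seed)

    # Latent constructs: unit variance, expected correlation true_r.
    z1 = rng.standard_normal(n)
    z2 = rng.standard_normal(n)
    x_true = z1
    y_true = true_r * z1 + np.sqrt(1.0 - true_r**2) * z2

    # Method factor: one shared source of error, or two unrelated ones.
    m_x = rng.standard_normal(n)
    m_y = m_x if shared_method else rng.standard_normal(n)

    # Observed score = construct + method error + random noise.
    x_obs = x_true + loading_x * m_x + noise_sd * rng.standard_normal(n)
    y_obs = y_true + loading_y * m_y + noise_sd * rng.standard_normal(n)
    return np.corrcoef(x_obs, y_obs)[0, 1]


# (true_r, loading_x, loading_y, shared_method) for each hypothetical scenario.
scenarios = {
    "no true effect, shared method": (0.0, 0.5, 0.5, True),
    "no true effect, separate methods": (0.0, 0.5, 0.5, False),
    "true r = 0.3, shared method": (0.3, 0.5, 0.5, True),
    "true r = 0.3, separate methods": (0.3, 0.5, 0.5, False),
    "true r = 0.3, shared method, opposite signs": (0.3, 0.5, -0.5, True),
}
for label, args in scenarios.items():
    print(f"{label}: observed r = {observed_correlation(*args):.3f}")
```

Under this model each observed variance is 1 + 0.5² + 0.5² = 1.5. The expected correlations for the five scenarios therefore work out analytically to roughly 0.17, 0.00, 0.37, 0.20, and 0.03, and the simulated values should land close to these. The results show three patterns:

- A shared method factor manufactures an association where none exists and inflates a real one.
- Independent error only attenuates a real effect (0.3 shrinks to about 0.2).
- A shared factor with loadings of opposite sign shrinks the effect, which is the "diminish" case.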

A veritable “Who’s Who” of psychological science has raised concerns about overreliance on self-reports, from Floyd Henry Allport to Allen Edwards to Richard Nisbett and APS President James Pennebaker. Although limitations of self-reports have been widely acknowledged, researchers across psychological subdisciplines have continued to use them. Thus, within behavioral research, the measure that is most universally lambasted also represents the one most universally deployed.

Such a seeming contradiction might be explained in part by necessity. Although biological psychologists have recourse to brain scans and biological assays, options available to psychosocial researchers are comparatively sparse. Historically, alternatives to self-reports in psychosocial research required large teams of human coders and costly experimental equipment, constraining investigations to laboratory contexts and requiring coding efforts that span months or even years. As a result, depending on the study and the domain of assessment, psychological researchers have often been faced with a choice between self-report data and no data at all.

Machine learning

Recent developments in AI have the potential to transform the measurement landscape of psychological science, offering a smorgasbord of measurement options to behavioral researchers beyond surveys. Advanced machine-learning subtypes such as deep learning can accurately model relationships characterized by extraordinary levels of complexity, including nonlinear associations and millions of interacting predictors (LeCun et al., 2015). These data-driven model types can help us move beyond the constraints of “designed” data types, such as closed-ended self-reports, into the analysis of rich, organic data sources created naturally through human activities (e.g., social media posts; Adjerid & Kelley, 2018; Tay et al., 2022).

Despite these possibilities, discourse on machine learning in behavioral research has often focused on its potential as a tool for analysis of preexisting “designed” data types, and widely cited applications involve the use of older machine-learning subtypes (e.g., using a random forest algorithm to predict suicide risk from survey responses; see Jacobucci et al., 2021). This focus fails to leverage key advantages of newer machine-learning models, which require massive training datasets rich in both observations and reliably measured predictors—attributes unlikely to characterize even the largest of survey studies (Fairbairn & Bosch, in press; Jacobucci & Grimm, 2020).

In tandem with progress in deep learning, the increased use of smartphones, wearables, and the internet has exponentially expanded access to organic data sources. Therefore, in behavioral research, machine-learning applications are likely to be most impactful not in transforming the manner in which we analyze premeasured constructs, but rather in transforming how we measure these constructs to begin with.

Researchers can apply deep and generative learning methods within complex datasets for a wide range of measurement tasks, including recognition of patterns within predictors spanning time, space, and ordered sequences (LeCun et al., 2015). For example, machine-learning models have surpassed human accuracy in identifying objects and individuals within images (Norvig & Russell, 2021), offering unprecedented possibilities for analyzing environmental and social factors within photographs (e.g., Ariss et al., 2025). Machine learning has also been fruitfully applied to video data, enabling automated analysis of action sequences, body posture, physical proximity, and facial movements over time (e.g., Gurrieri et al., 2021).

Deep learning has been particularly impactful for speech and language analysis, with models for speech recognition now capable of performing advanced transcription tasks, such as parsing individual speakers during social exchanges and accurately recognizing language in noisy recording environments. Furthermore, natural language processing models can now detect broad emotional tone as well as the content and structure of language (e.g., Rathje et al., 2024).

Finally, researchers are using machine-learning models to identify patterns within sequences of events as well as to extract behaviorally relevant constructs from data produced by wearable technology (Fairbairn et al., in press).

Importantly, measures based in machine learning do not and will not ever offer a true "objective" view of human experience and behavior. These models are only as robust as the datasets they are trained on, which were inevitably created and curated by humans. Although comparatively resistant to shared-method bias, they are often trained on human reports as ground truth and therefore can share other limitations linked to these. Automated measures can also be misapplied. And, where individual errors arise, the complexity of machine-learning models means the mechanism of error may at times be difficult to trace.

As such, machine-learning models have access to no truthier sort of truth than do self-reports, always looking through a lens and never directly. Yet in a field where tests of theory all too often rely on reports from the same individuals measured in the same context via similarly structured closed-ended questionnaires, automated measures offer us something infinitely valuable: a source of error variance more likely to be random than systematic. Random error tends to weaken observed associations rather than manufacture them, making it far less likely to produce false positives.

Conclusion

New technological developments can trigger competing drives. On the one hand, there is the draw, the allure of the unknown, the longed-for fix. Here, technology-driven measures are an escape from a marriage that has long turned stale, a fresh start with the shiny and new. On the other hand is the resistance, the attachment born of familiarity, the fear of the unknown. Here, we compare new solutions not to the true array of flawed alternatives, but to the fallacy of a limitation-less tool, a measure that will never exist. Guided in part by these competing drives, attitudes toward new measurement technology follow an oscillating cycle (Borup et al., 2006; Maclure, 2020), swinging from undiscriminating uptake to wholesale dismissal, with little space for a wider view.

In this essay, we aim to offer an alternative. Specifically, we present an argument for the integration of AI-based measurement into psychological science grounded not in an inflated or rose-colored view of the future, but rather in a reasoned vision of the measures available now. Measures based in AI offer great promise, and yet they are associated with limitations, both known and unknown. At the same time, these measures are less vulnerable to systematic forms of error and false-positive findings than self-reports are. As such, these new tools may offer behavioral scientists a relatively unglamorous, but nonetheless precious, "lesser of two evils."

References

Adjerid, I., & Kelley, K. (2018). Big data in psychology: A framework for research advancement. American Psychologist, 73(7), 899–917.

Ariss, T., Caumiant, E. P., Fairbairn, C. E., Kang, D., Bosch, N., & Morris, J. K. (2025). Exploring associations between drinking contexts and alcohol consumption: An analysis of photographs. Journal of Psychopathology and Clinical Science, 134(3), 284–297.

Baldwin, J. R., Reuben, A., Newbury, J. B., & Danese, A. (2019). Agreement between prospective and retrospective measures of childhood maltreatment: A systematic review and meta-analysis. JAMA Psychiatry, 76(6), 584–593.

Baumeister, R. F., Vohs, K. D., & Funder, D. C. (2007). Psychology as the science of self-reports and finger movements: Whatever happened to actual behavior? Perspectives on Psychological Science, 2, 396–403.

Borup, M., Brown, N., Konrad, K., & Van Lente, H. (2006). The sociology of expectations in science and technology. Technology Analysis & Strategic Management, 18(3–4), 285–298.

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81–105.

Dunning, D., Heath, C., & Suls, J. M. (2004). Flawed self-assessment: Implications for health, education, and the workplace. Psychological Science in the Public Interest, 5(3), 69–106.

Fairbairn, C. E., & Bosch, N. (in press). Applying artificial intelligence to expand the measurement toolkit in clinical psychological science: Moving beyond self-reports. Clinical Psychological Science.

Fairbairn, C. E., Kang, D., Han, J., & Bosch, N. (in press). Objective assessment in clinical psychological science: Progress in wearable alcohol biosensors. Annual Review of Clinical Psychology.

Garcia, J., & Gustavson, A. R. (1997). The science of self-report. APS Observer, 10.

Gurrieri, L., Fairbairn, C. E., Sayette, M. A., & Bosch, N. (2021). Alcohol narrows physical distance between strangers. Proceedings of the National Academy of Sciences, 118(20), Article e2101937118.

Jacobucci, R., & Grimm, K. J. (2020). Machine learning and psychological research: The unexplored effect of measurement. Perspectives on Psychological Science, 15(3), 809–816.

Jacobucci, R., Littlefield, A. K., Millner, A. J., Kleiman, E. M., & Steinley, D. (2021). Evidence of inflated prediction performance: A commentary on machine learning and suicide research. Clinical Psychological Science, 9(1), 129–134.

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.

Maclure, J. (2020). The new AI spring: A deflationary view. AI & Society, 35, 747–750.

Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231–259.

Norvig, P., & Russell, S. J. (2021). Artificial intelligence: A modern approach (4th ed.). Pearson.

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), Article aac4716.

Paulhus, D. L., & Vazire, S. (2007). The self-report method. Handbook of Research Methods in Personality Psychology, 1, 224–239.

Podsakoff, P. M., MacKenzie, S. B., Lee, J., & Podsakoff, N. P. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology, 88(5), 879–903.

Podsakoff, P. M., MacKenzie, S. B., & Podsakoff, N. P. (2012). Sources of method bias in social science research and recommendations on how to control it. Annual Review of Psychology, 63(1), 539–569.

Podsakoff, P. M., Podsakoff, N. P., Williams, L. J., Huang, C., & Yang, J. (2024). Common method bias: It’s bad, it’s complex, it’s widespread, and it’s not easy to fix. Annual Review of Organizational Psychology and Organizational Behavior, 11(1), 17–61.

Rathje, S., Mirea, D., Sucholutsky, I., Marjieh, R., Robertson, C. E., & Van Bavel, J. J. (2024). GPT is an effective tool for multilingual psychological text analysis. Proceedings of the National Academy of Sciences, 121(34), Article e2308950121.

Schwarz, N. (1999). Self-reports: How the questions shape the answers. American Psychologist, 54(2), 93–105.

Tay, L., Woo, S. E., Hickman, L., Booth, B. M., & D’Mello, S. (2022). A conceptual framework for investigating and mitigating machine-learning measurement bias (MLMB) in psychological assessment. Advances in Methods and Practices in Psychological Science, 5(1), Article 25152459211061337.

Tourangeau, R., & Yan, T. (2007). Sensitive questions in surveys. Psychological Bulletin, 133(5), 859–883.
