How Should Psychologists Use AI and Big Data? Nine Guides Point the Way

As generative AI and large language models (LLMs) rapidly reshape the research landscape, psychological scientists are seeking guidance on how to responsibly integrate these tools into their work. The Observer has gathered nine recent and upcoming articles from APS’s open-access journal Advances in Methods and Practices in Psychological Science (AMPPS) that offer practical tutorials, frameworks, and cautionary insights for researchers navigating this new terrain. They are listed below in no particular order.
Associate Editor Kong Meng Liew introduces this list with a few thoughts on how these articles fit into the larger landscape of AMPPS submissions.
Introduction from Associate Editor Kong Meng Liew

As a generalist journal focused on methodological innovation, AMPPS has seen an increase in papers on the use of generative AI and associated technologies within the psychological sciences, highlighting strengths, opportunities, and concerns that are shared by many within our community. Barely 3 years after the launch of ChatGPT, AMPPS editors are seeing such AI systems used broadly in psychological research.
One stream of submissions has to do with how researchers can effectively use generative AI in their research. Lechuga & Karle (2026) provide an example for text-stimulus generation. Social psychologists have long relied on text-based stimuli for experiments, and generative AI provides a way to streamline the writing process. A concerning trend, however, has been the substitution of human participants with generative AI models. Lin (2026) provides a framework for understanding when AI can and should be used to simulate humans, and Asher et al. (2026) provide a keystroke-based tool to detect when generative AI models are problematically employed by participants to complete online surveys.
Another stream of submissions concerns the underlying technologies (LLMs) that power these AI models. LLMs are, at their core, sophisticated computational tools for statistical pattern recognition. Abdurahman et al. (2025) offer a primer on what LLMs are for psychologists who may be unfamiliar with these technologies. Debelak et al. (2025) provide a tutorial on how LLMs can be used to classify text, such as automatically deciding whether a given text belongs to a specific category (e.g., high vs. low contentment), and how these models can be unpacked to understand why these classifications are made.
Several submissions also focus on how LLMs can be used effectively in specific areas of psychological research: Hou et al. (2026) provide some considerations for using LLMs in cross-cultural and cross-linguistic survey research, and Brickman et al. (2025) provide a workflow and primer for how LLMs can be used for psychological assessment based on participants’ speech or text-based inputs.
Nine guides for using AI and big data
1. A primer for evaluating large language models in social science research
Suhaib Abdurahman, Alireza Salkhordeh Ziabari, Alexander Moore, et al.
Autoregressive large language models exhibit remarkable conversational and reasoning abilities, and exceptional flexibility across a wide range of tasks. Consequently, LLMs are increasingly being used in scientific research to analyze data, generate synthetic data, or even write scientific papers. The authors provide recommendations to ensure replicable and robust results when using LLMs. They highlight considerations for reviewers, focusing on methodological rigor, replicability, and validity of results when evaluating studies that use LLMs to automate data processing or simulate human data. They also offer practical advice on assessing the appropriateness of LLM applications in submitted studies, emphasizing the need for transparency in methodological reporting and the challenges posed by the nondeterministic and continuously evolving nature of these models. By providing a framework for best practices and critical review, this primer aims to ensure high-quality, innovative research within the evolving landscape of social science studies using LLMs.
Resources:
- Access the data, code, and instructions for replication.
2. Large language models for psychological assessment: A comprehensive overview
Jocelyn Brickman, Mehak Gupta, and Joshua Oltmanns
Large language models are extraordinary tools that demonstrate potential to improve our understanding of psychological characteristics. They provide an unprecedented opportunity to supplement self-report in psychology research and practice with scalable behavioral assessment. However, they also pose unique risks and challenges. This article serves as an overview and guide for psychological scientists to evaluate LLMs for psychological assessment. The authors briefly review the development of transformer-based LLMs and discuss their advances in natural language processing, describe the experimental design process, and discuss important broader ethical and implementation issues and future directions for researchers using this methodology. The reader will develop an understanding of essential ideas and an ability to navigate the process of using LLMs for psychological assessment.
Resources:
- Read the supplemental material.
3. From embeddings to explainability: A tutorial on LLM-based text analysis for behavioral scientists
Rudolf Debelak, Timo Koch, Matthias Aßenmacher, and Clemens Stachl
Large language models are transforming research in psychology and the behavioral sciences by enabling advanced text analysis at scale. Their applications range from the analysis of social media posts to infer psychological traits to the automated scoring of open-ended survey responses. However, despite their potential, many behavioral scientists struggle to integrate LLMs into their research due to the complexity of text modeling. This tutorial aims to provide an accessible introduction to LLM-based text analysis, focusing on transformer architecture. The authors guide researchers through the process of preparing text data, using pretrained transformer models to generate text embeddings, fine-tuning models for specific tasks such as text classification, and applying interpretability methods like SHAP and LIME to explain model predictions. By making these powerful techniques more approachable, they aim to empower behavioral scientists to leverage LLMs in their research, unlocking new opportunities for analyzing and interpreting textual data.
4. Large language models as psychological simulators: A methodological guide
Zhicheng Lin
Large language models offer emerging opportunities for psychological and behavioral research, but methodological guidance is lacking. This article develops a framework for using LLMs as psychological simulators across two primary applications: simulating roles and personas to explore diverse contexts and serving as computational models to investigate cognitive processes. The framework addresses overarching challenges including prompt sensitivity, temporal limitations from training data cutoffs, and ethical considerations that extend beyond traditional human-subjects review. Throughout, the article emphasizes open-weight models as the default for reproducibility and the need for transparency about model capabilities and constraints. Together, this framework integrates emerging empirical evidence about LLM performance—including systematic biases, cultural limitations, and prompt brittleness—to help researchers wrangle these challenges and leverage the unique capabilities of LLMs in psychological research.
5. Chatbots are undermining crowdsourced research in the behavioral sciences: Detecting AI-assisted cheating with a keystroke-based tool
Michael Asher, Gillian Gold, Eason Chen, and Paulo Carvalho
Generative AI poses a significant threat to data integrity on crowdsourcing platforms like Prolific, which behavioral scientists widely rely on for data collection. LLMs allow users to generate fluent and relevant responses to open-ended questions, which can mask inattention and compromise experimental validity. To empirically estimate the prevalence of this behavior, the authors analyzed keystroke data from three studies (N = 928) on Prolific between May and July 2025. Using an embedded JavaScript tool, they flagged participants who pasted text or whose keystroke count was anomalously low compared to their response length. For each flagged participant, they manually compared detected keystrokes to their final response to determine if the text could have been typed. This confirmed that, despite deterrence measures, approximately 9% of participants submitted responses consistent with AI assistance or other forms of outsourced responding. These participants outperformed noncheaters (by up to 1.5 standard deviations), were over twice as likely to share geolocations with other participants (suggesting possible proxy use), and exhibited lower internal consistency on questionnaire scales. Simulated power analyses indicate that this level of undetected cheating can diminish observed effect sizes by 10% and inflate required sample sizes by up to 30%. These findings highlight the urgent need for new detection methods such as keystroke logging, which offers verifiable evidence of cheating that is difficult to obtain from manual review of LLM-generated text alone.
Resources:
- Access the data and code.
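The flagging rule described in this summary (a paste event, or too few keystrokes for the length of the final response) can be sketched in a few lines. This is only an illustration in Python, not the authors' embedded JavaScript tool: the `Response` fields, the `flag_for_review` function, and the 0.8 keystrokes-per-character cutoff are all hypothetical choices made for this example. As in the study, a flag here is a prompt for manual review, not proof of cheating.

```python
from dataclasses import dataclass


@dataclass
class Response:
    participant_id: str
    text: str         # final submitted open-ended response
    keystrokes: int   # key presses logged while the response box was active
    pasted: bool      # True if a paste event was logged


# Hypothetical cutoff (an assumption, not a value from the article):
# fewer than 0.8 logged keystrokes per character counts as anomalously low.
MIN_KEYSTROKES_PER_CHAR = 0.8


def flag_for_review(r: Response, min_ratio: float = MIN_KEYSTROKES_PER_CHAR) -> bool:
    """Return True if the response should be checked by a human reviewer."""
    if r.pasted:
        return True
    if not r.text:
        return False  # nothing was submitted, so there is nothing to compare
    return r.keystrokes / len(r.text) < min_ratio


responses = [
    Response("p1", "I chose the second option because it felt fairer.", 61, False),
    Response("p2", "The scenario illustrates a classic trade-off between equity and efficiency.", 4, False),
    Response("p3", "Seemed fine to me.", 20, True),
]

flagged = [r.participant_id for r in responses if flag_for_review(r)]
print(flagged)
```

In this toy data, p2 is flagged because 4 keystrokes cannot account for a sentence-length answer, and p3 is flagged because of the paste event. In practice the cutoff would need to be calibrated, since deletions and retyping push honest typists well above one keystroke per character.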
6. Bridging cultures in the era of big data: A cross-language equivalence framework in machine learning research with social media texts
Daphne Hou, Stuti Thapa, and Louis Tay
With the rise of big data and machine learning (ML), particularly natural language processing, researchers have powerful tools to study culture using large-scale, organic language data from social media. However, the lack of methodological guidance on how to establish cross-language equivalence in cross-cultural studies, especially with multilingual or culturally diverse text data, poses a major challenge. To address this gap, the authors propose a framework to raise awareness of key equivalence challenges and offer practical guidance for reducing measurement biases when applying ML techniques to social media language data. The framework outlines five types of equivalence following the ML pipeline from data collection to evaluation: source equivalence, sample equivalence, input equivalence, psychological ground truth equivalence, and model performance equivalence. The authors also draw parallels to survey-based research to highlight shared conceptual challenges and identify future directions to advance cross-cultural research with big data and computational linguistic methods.
7. Advancing psychological research with random forests: A review of methods, tools, and applications
Yi Feng, Han Du, Jiarui Song, et al.
This paper provides a comprehensive review of random-forest methods in psychological research. The authors begin by introducing the fundamental concept of decision trees, followed by the theoretical framework of random forests as an ensemble method. Next, they review the methodological development and commonly used software tools for random-forest models. They also discuss practical issues and challenges when implementing random forests in psychological studies and then systematically review the empirical psychological research articles published between 2020 and 2022 that utilized random forests. By synthesizing the theoretical foundation and current empirical practices, the authors identify significant methodological gaps in applying random forests to psychological data and hope to initiate much-needed conversations on how psychologists can effectively use random-forest methods to advance psychological science.
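For readers new to the idea of a forest as an ensemble of trees, the following is a minimal sketch of the ensemble logic only. It is not taken from the paper and is deliberately simplified: each "tree" is a one-split stump, and the random feature subset is drawn once per tree rather than at every split, as standard random forests do. The toy data, the two features, and all function names are hypothetical. Applied work would normally rely on an established package.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Fit a depth-1 tree: the single threshold split with the fewest errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send cases to both sides
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no valid split: always predict the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train each stump on a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]           # sample cases with replacement
        feats = rng.sample(range(len(rows[0])), n_features)  # random feature subset
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Aggregate the individual trees by majority vote."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two questionnaire scores per person, labeled low vs. high risk.
rows = [[1, 1], [2, 1], [1, 2], [2, 2], [8, 9], [9, 8], [8, 8], [9, 9]]
labels = ["low"] * 4 + ["high"] * 4

forest = fit_forest(rows, labels)
print(predict_forest(forest, [1, 1]), predict_forest(forest, [9, 9]))
```

Any single stump depends on which cases and which feature it happened to draw. The majority vote across many such stumps is what stabilizes the prediction, which is the core intuition behind ensemble methods.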
8. Generating experimental text stimuli for psychological research using ChatGPT
Jacqueline Lechuga and Nakul Karle
The introduction of ChatGPT—an AI chatbot capable of text recognition and generation—has been transformative for numerous academic research communities, including psychology. The authors propose using ChatGPT to reduce researchers’ cognitive load and time spent creating text materials for psychological studies (e.g., vignettes). They present examples of ChatGPT-generated text materials for relationship science (N = 60) and social cognition (N = 67) studies and provide evidence of their effectiveness. They also discuss ethical considerations and make recommendations related to using text materials generated by ChatGPT or similar AI tools. They conclude with a brief discussion of the importance of this work and encourage others to leverage AI in the field of psychology.
9. Google-search data for psychological scientists: A tutorial and best practices
Jordan Moon and Michael Barlev
Google searches have been described as the most important dataset on the human psyche ever assembled. Google search data—accessible through a tool called Google Trends—can provide new insights on topics as varied as stereotypes and prejudices, political attitudes, religious identity and belief, personality, motivations, psychological well-being, mental health, and culture. Google Trends can generate highly customized datasets: Users can compare the popularity of search terms across most of the world, or access longitudinal data as far back as 2004, and they can do so with high geographical and temporal granularity. Notwithstanding these opportunities, Google Trends has significant limitations. Without appropriate caution, users can easily rely on data that are not meaningful or draw mistaken conclusions. The authors provide a comprehensive overview and tutorial on best practices when using Google Trends.
Resources:
- Access data and code.
- Read the supplemental material.