New Content From Advances in Methods and Practices in Psychological Science

A Practical Guide for Integrating Community-Engaged Research Across the Psychological Research Cycle
Jawahir Mohamed, Benjamin Koshy Jacob, Régine Debrosse, et al.
Despite growing calls to increase diversity in research, methodological approaches that could address this issue remain underused. In this article, we argue that community-engaged research (CEnR), a framework that ultimately seeks to create genuine partnerships between researchers and marginalized communities, offers a solution for making psychological research more diverse while strengthening scientific rigor. We provide a practical guide for implementing CEnR principles across seven key research phases, from study conceptualization to knowledge dissemination, with different levels of engagement based on what researchers can realistically manage. Drawing from examples across five of our studies with diverse populations, including Black youth in Canada and the United States, Syrian newcomers in the Netherlands, and racial and ethnic minority university students in Canada, we illustrate how CEnR strengthens research quality and impact by fostering culturally responsive methods, building trust with communities, and enabling richer interpretation of findings. We discuss challenges in using CEnR, including the time and resources it requires and institutional barriers, while providing concrete guidance that emphasizes honest self-reflection and starting small. We conclude by highlighting future directions and emphasizing that developing CEnR skills requires ongoing practice, with the goal of building toward more collaborative and impactful psychology research.
Bridging Cultures in the Era of Big Data: A Cross-Language Equivalence Framework in Machine-Learning Research With Social Media Texts
Daphne Xin Hou, Stuti Thapa, Louis Tay
Past research on cross-cultural equivalence has focused on statistical procedures and techniques for ensuring measurement equivalence in tests and surveys. With the rise of big data and machine learning (ML), particularly natural language processing, researchers have powerful tools to study culture using large-scale, organic language data from social media. However, the lack of methodological guidance on how to establish cross-language equivalence in cross-cultural studies, especially with multilingual or culturally diverse text data, poses a major challenge. To address this gap, in this article, we propose a framework to raise awareness of key equivalence challenges and offer practical guidance for reducing measurement biases when applying ML techniques to social media language data. The framework outlines five types of equivalence following the ML pipeline from data collection to evaluation: source equivalence, sample equivalence, input equivalence, psychological-ground-truth equivalence, and model-performance equivalence. We also draw parallels to survey-based research to highlight shared conceptual challenges and identify future directions to advance cross-cultural research with big data and computational-linguistic methods.
Measurement-Reporting Practices in Social-and-Personality-Psychology Articles
Katherine M. Lawson, Julia G. Bottesini, Linh D. Khong, Simine Vazire
Psychological scientists are increasingly acknowledging the importance of transparency for research integrity. In the present study, we examined one important facet of transparency: providing enough information about measures so that readers can evaluate aspects of construct validity. With a focus on social and personality psychology, we explored how often authors in one journal report a scale name, citation, example item, number of items, and reliability coefficient and how often authors provide access to the study’s materials. We also investigated how measurement-reporting practices have changed from 2010 to 2020, the decade encompassing the start of the “credibility revolution” in psychology. Across two samples, we coded 506 Social Psychological and Personality Science articles (N = 425 articles with at least one questionnaire measure; N = 1,198 questionnaire measures). Overall, ≈31% of measures were reported with a name, ≈53% were reported with a citation, ≈66% were reported with an example item, ≈76% were reported with the number of items, and ≈78% of multiitem measures included some reliability information; ≈22% of measures were a single item, and 46% were ad hoc. We did not detect any apparent changes in the reporting practices examined from 2010 to 2020 in either sample except for an increase in the availability of materials over time. Therefore, the replication crisis may have motivated increased access to studies’ materials in recent years but otherwise does not seem to be associated with more transparent reporting of measurement information for questionnaires in brief social-and-personality-psychology articles.
Generating Experimental Text Stimuli for Psychological Research Using ChatGPT
Jacqueline Lechuga, Nakul N. Karle
The introduction of ChatGPT, an artificial-intelligence (AI) chatbot capable of text recognition and generation, has been transformative for numerous academic research communities, including psychology. We propose using ChatGPT to reduce researchers’ cognitive load and time spent creating text materials for psychological studies (e.g., vignettes). We present examples of ChatGPT-generated text materials for relationship-science (N = 60) and social-cognition (N = 67) studies and provide evidence of their effectiveness. Furthermore, we discuss ethical considerations and make recommendations related to using text materials generated by ChatGPT or similar AI tools. We end with a brief discussion of the importance of this work and encourage others to leverage AI in the field of psychology.
Sample-Size Planning for Frequentist and Bayesian 2 × 2 Analysis-of-Variance Designs
Sebastian A. J. Lortz, Andrew Setiono, Marton Kovacs, Don van Ravenzwaaij
Sample-size justification is an essential aspect of rigorous research in the behavioral and social sciences and helps to ensure studies are adequately powered, minimize resource waste, and reduce participant burden. However, researchers often face challenges in navigating the array of sample-size-planning methods available, particularly when balancing inferential goals and statistical frameworks. The SampleSizePlanner (SSP), originally developed to assist researchers in selecting appropriate sample-size determination methods for two-group designs, has been expanded to address 2 × 2 analysis-of-variance (ANOVA) designs. In this article, we introduce novel 2 × 2 design extensions to the SSP, including tools for Bayesian methods, such as the Bayes factor equivalence interval and the region of practical equivalence, and a frequentist approach. The SSP offers an accessible ShinyApp interface and R package, enabling researchers to streamline decision-making and apply various sample-size-planning methods with minimal computational overhead. Ready-to-use reporting templates foster transparency in sample-size justification. In the article, we address the practical application of these tools through comprehensive examples, demonstrating their relevance to scenarios such as interaction testing and equivalence estimation. By providing a standardized and accessible approach to sample-size planning, this work supports researchers in conducting reproducible and well-powered studies while addressing gaps in sample-size planning for 2 × 2 ANOVA designs.
Using Heteroskedasticity-Consistent Standard Errors and the Bootstrap for Linear Regression Analysis Available in SPSS: A Tutorial
Hanna Rajh-Weber, Stefan Ernest Huber, Martin Arendasy
In the landscape of statistical software, from customizable programming-language-based to point-and-click systems, SPSS remains a popular choice among researchers. In SPSS, analyses with conventional methods, such as ordinary least squares regression, can be easily performed. However, violated assumptions, such as homoskedasticity or normality of the errors, can lead to altered Type I error rates or a reduction in statistical power. SPSS provides a multitude of alternative inference methods associated with linear regression, but accessing them is not always straightforward. To facilitate data analysis when assumptions for conventional inference methods are not met, in this tutorial, we aim to provide applied researchers, particularly SPSS users, with a guide for performing linear regression analyses using heteroskedasticity-consistent (HC) standard errors (HC3 and HC4) and two different bootstrap resampling methods (pairs bootstrap and wild bootstrap). Each bootstrap method can further be combined with a bootstrap p value, a percentile confidence interval, or a bias-corrected and accelerated confidence interval. For illustration, the methods are then compared using a computer-generated data set. Although the focus of this article is on applied researchers who use mainly SPSS for their analyses, a tutorial on how to do everything shown here in R (with custom functions) is included in the supplementary materials.
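For readers who want a feel for two of the methods named in this abstract, the following is a minimal Python sketch, not the authors' SPSS or R code. It computes HC3 standard errors and a pairs-bootstrap percentile confidence interval for an ordinary least squares fit. The function names (`ols_hc3`, `pairs_bootstrap_ci`), the simulated data, and all parameter values are illustrative assumptions.

```python
import numpy as np

def ols_hc3(X, y):
    """OLS coefficients with HC3 standard errors.

    X is assumed to already contain an intercept column.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Leverage values h_ii = x_i' (X'X)^{-1} x_i
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    # HC3 weights: squared residuals divided by (1 - h_ii)^2
    w = (resid / (1.0 - h)) ** 2
    meat = X.T @ (X * w[:, None])
    cov = XtX_inv @ meat @ XtX_inv  # sandwich estimator
    return beta, np.sqrt(np.diag(cov))

def pairs_bootstrap_ci(X, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI from resampling (x_i, y_i) pairs with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    betas = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample whole rows
        betas[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    lo, hi = np.percentile(betas, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    return lo, hi

# Hypothetical heteroskedastic data: the error SD grows with x.
rng = np.random.default_rng(42)
n = 200
x = rng.uniform(0, 3, n)
y = 1.0 + 0.5 * x + rng.normal(0, 0.5 + x, n)
X = np.column_stack([np.ones(n), x])

beta, se_hc3 = ols_hc3(X, y)
ci_lo, ci_hi = pairs_bootstrap_ci(X, y)
print("coefficients:", beta)
print("HC3 standard errors:", se_hc3)
print("pairs-bootstrap 95% CI for slope:", ci_lo[1], ci_hi[1])
```

The wild bootstrap, HC4, bootstrap p values, and bias-corrected and accelerated intervals covered in the tutorial are not shown here.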
Handling Item-Level Missing Data in Linear Regression: A Tutorial
Guyin Zhang, Lihan Chen, Dexin Shi
With advances in methodology and statistical software, modern methods for handling missing data have become more accessible and straightforward to apply. In psychological studies, researchers often use questionnaires or scales composed of multiple items to measure constructs of interest. As a result, missing values frequently occur at the item level, whereas data analyses are typically conducted at the scale (composite) level. However, properly addressing item-level missing data remains a common challenge for many applied psychologists, including researchers who are otherwise well experienced in handling missing data at the scale level. In this tutorial, we introduce six approaches for handling item-level missing data: listwise deletion, hybrid methods that include proration with listwise deletion and proration with full-information maximum likelihood, item-level full-information maximum likelihood, item-level multiple imputation, two-stage maximum likelihood, and composite score factored regression. Using a published empirical data set, we provide step-by-step guidance on applying these methods in linear regression models. We include R code for each method and corresponding Mplus syntax if applicable. Finally, we summarize the key assumptions, advantages, and limitations of each approach and offer practical recommendations for researchers.
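As a small illustration of one idea mentioned in this abstract, the sketch below shows proration: a scale score is computed from the items a respondent did answer and rescaled to the full scale length. It is written in Python rather than the R and Mplus used in the tutorial, and the function name `prorated_score`, the 50% answering threshold, and the toy responses are assumptions made for illustration only.

```python
import numpy as np

def prorated_score(items, min_answered=0.5):
    """Prorated sum score per respondent.

    items: 2-D array (respondents x items), with np.nan for missing responses.
    min_answered: assumed minimum proportion of items that must be answered;
                  respondents below it receive a missing (nan) score.
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    n_answered = (~np.isnan(items)).sum(axis=1)
    sums = np.nansum(items, axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / n_answered  # mean of the available items
    scores = means * k  # rescale to the full scale length
    scores[n_answered < min_answered * k] = np.nan
    return scores

# Toy data: three respondents on a four-item scale.
responses = [
    [1, 2, 3, 4],
    [2, np.nan, 4, np.nan],
    [np.nan, np.nan, np.nan, 5],
]
scores = prorated_score(responses)
print(scores)
```

Respondents whose score is still missing after proration would then be handled at the scale level, for example by listwise deletion or full-information maximum likelihood, as in the hybrid methods the abstract lists.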
When Do Interaction/Moderation Effects Stabilize in Linear Regression?
Andrew Castillo, Joshua D. Miller, Colin Vize, David A. A. Baranger, Donald R. Lynam
Two-way interaction effects in linear regression occur when the relation between two variables changes depending on the level of a third. Despite their frequent use, interactions are notoriously difficult to estimate accurately and test for statistical significance because of small effect sizes and low reliability. In this study, we used Monte Carlo simulations to establish stability thresholds for two-way interactions between continuous variables across combinations of reliability (0.7–1.0), main effect size (0.1–0.5), collinearity (0.1–0.5), and interaction effect size (0.05–0.2). Stability was defined as the consistency of estimated effect sizes across repeated samples of the same size from the same population and operationalized using modified definitions of the corridor of stability and point of stability from Schönbrodt and Perugini. Results show that the stability of interaction estimates is primarily determined by sample size and predictor reliability. The case representing a realistic psychology field study, in which researchers have limited control over variables, stabilized at n = 3,800, requiring 72% statistical power. At n ≤ 100, 11% to 45% of the estimates were incorrectly signed (i.e., negative when the true effect was positive). Most psychology studies enroll far fewer than 500 participants, and our results indicate many published interactions may be unstable. Analyses involving highly reliable predictors, such as group assignment in experimental designs, may stabilize at lower sample sizes because they attenuate the expected effect size less than variables with more measurement error. Researchers are encouraged to avoid routine tests of two-way interactions unless sample size and reliability are adequate and hypotheses are specified a priori.
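The kind of Monte Carlo check described in this abstract can be sketched in a few lines of Python. The code below is not the authors' simulation design: the function name `wrong_sign_rate`, the data-generating model, the way measurement error is added, and every default value are assumptions chosen only to show how the share of incorrectly signed interaction estimates can be estimated at a given sample size.

```python
import numpy as np

def wrong_sign_rate(n, b_int=0.1, b_main=0.3, r=0.3,
                    reliability=0.8, n_sims=500, seed=0):
    """Share of simulated samples whose interaction estimate is negative
    although the true interaction effect (b_int) is positive."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, r], [r, 1.0]])  # predictors correlate at r
    # Error variance chosen so that var(true) / var(observed) = reliability
    noise_sd = np.sqrt((1.0 - reliability) / reliability)
    wrong = 0
    for _ in range(n_sims):
        true = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        x1, x2 = true[:, 0], true[:, 1]
        y = b_main * x1 + b_main * x2 + b_int * x1 * x2 + rng.normal(0, 1, n)
        # Observed predictors contain measurement error
        o1 = x1 + rng.normal(0, noise_sd, n)
        o2 = x2 + rng.normal(0, noise_sd, n)
        X = np.column_stack([np.ones(n), o1, o2, o1 * o2])
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        wrong += int(beta[3] < 0)
    return wrong / n_sims

rate_small = wrong_sign_rate(n=50)
rate_large = wrong_sign_rate(n=2000)
print("wrong-sign rate at n = 50:", rate_small)
print("wrong-sign rate at n = 2000:", rate_large)
```

Under these assumed settings, the wrong-sign rate should shrink as the sample size grows, which is the pattern the abstract reports. The corridor-of-stability and point-of-stability criteria used in the article are not implemented here.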
Beyond Statistical Myopia: Replying to a Misguided Critique of Mind–Body Research
Peter J. Aungle, Daniel L. Chen, Nicholas P. Holmes
In response to Gelman and Brown’s recent critique of Aungle and Langer, we argue that their article illustrates how narrow statistical reasoning and selective literature review can misrepresent and undermine credible scientific findings. Using their discussion of perceived time and physical healing as a case study, we identify three general problems: (a) a failure to accurately characterize the methods and results of the study they critiqued, (b) misinterpretations and omissions in their review of the relevant literature, and (c) a tendency to generalize from isolated statistical issues to sweeping claims about the invalidity of mind–body research. We adopt Gelman and Brown’s recommended model and find that the main effect remains robust. We also document errors in their interpretations of other cited studies and demonstrate that they ignore decades of rigorous, well-replicated research on placebo effects and health mindsets. By examining their critique in detail, we highlight how methodological skepticism, when untethered from accurate reading and balanced appraisal, can mislead rather than clarify.
Large Language Models as Psychological Simulators: A Methodological Guide
Zhicheng Lin
Large language models (LLMs) offer emerging opportunities for psychological and behavioral research, but methodological guidance is lacking. In this article, I develop a framework for using LLMs as psychological simulators across two primary applications: simulating roles and personas to explore diverse contexts, and serving as computational models to investigate cognitive processes. For simulation, the framework includes (a) an implementation-confound checklist distinguishing essential from context-dependent methodological checks, (b) methods for developing psychologically grounded personas that move beyond demographic categories, and (c) a three-tier validation framework (direct, indirect, and generative) tailored to data availability. A diagnostic decision framework guides researchers through establishing performance validity, identifying implementation artifacts, and interpreting LLM-human discrepancies. For cognitive modeling, I synthesize (a) emerging approaches for probing internal representations, (b) methodological advances in causal interventions, and (c) strategies for relating model behavior to human cognition. The framework addresses overarching challenges, including prompt sensitivity, temporal limitations from training-data cutoffs, and ethical considerations that extend beyond traditional human-subjects review. Open-weight models are the default for reproducibility. Together, this framework integrates emerging empirical evidence about LLM performance—including systematic biases, cultural limitations, and prompt brittleness—to help researchers wrangle these challenges and leverage the unique capabilities of LLMs in psychological research.