New Content From Advances in Methods and Practices in Psychological Science

Consistent and Precise Description of Research Outputs Could Improve Implementation of Open Science
Evan Mayo-Wilson, Sean Grant, Katherine S. Corker, David Moher
In 2013, the Center for Open Science proposed that journal articles be awarded “badges” for engaging in open-science practices, including preregistration. In 2015, the Transparency and Openness Promotion (TOP) guidelines (TOP 2015) promoted preregistration of studies and analysis plans. Since then, the term “preregistration” has been used to describe different research outputs created at different times—sometimes, but not always, including study registration. Following a review of evidence about TOP 2015 implementation, including evidence that adherence could not be rated reliably, the TOP Guidelines Advisory Board updated these guidelines (TOP 2025). The TOP 2025 guidelines no longer use the term “preregistration.” Instead, TOP 2025 disambiguates specific research outputs, such as registrations, study protocols, analysis plans, code, and other research materials. TOP 2025 also explains that researchers should describe the time at which outputs are created and shared in relation to key study activities. In this article, we explain why adopting the terminology used in TOP 2025 and describing the times at which specific research outputs are created and shared will enhance understanding and support better implementation and reporting of open science.
A Tutorial on Distribution-Free Uncertainty Quantification Using Conformal Prediction
Tim Kaiser, Philipp Herzog
Statistical prediction models are ubiquitous in psychological research and practice. Increasingly, machine-learning models are used. Quantifying the uncertainty of such predictions is rarely considered, partly because prediction intervals are not defined for many of the algorithms used. However, generating and reporting prediction models without information on the uncertainty of the predictions carries the risk of overinterpreting their accuracy. Conventional methods for prediction intervals (e.g., those defined for ordinary least squares regression) are sensitive to violations of several distributional assumptions. In this tutorial, we introduce conformal prediction, a model-agnostic, distribution-free method for generating prediction intervals with guaranteed marginal coverage, to psychological research. We start by introducing the basic rationale of prediction intervals using a motivating example. Then, we proceed to conformal prediction, which is illustrated in three increasingly complex examples using publicly available data and R code.
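As a rough illustration of the idea described above, the following is a minimal sketch of split conformal prediction in Python rather than the R code the tutorial provides. The simulated data, the simple least-squares line, and all function names are assumptions made for this example and are not taken from the article. The sketch fits a model on one part of the data, computes absolute residuals on a held-out calibration part, and uses a finite-sample-corrected quantile of those residuals as the half-width of the prediction interval.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares line for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return math.inf
    return sorted(scores)[rank - 1]


def split_conformal(x_train, y_train, x_cal, y_cal, alpha=0.1):
    """Fit on the training split, calibrate on the held-out split."""
    intercept, slope = fit_line(x_train, y_train)

    def predict(x):
        return intercept + slope * x

    # Nonconformity score: absolute residual on the calibration data.
    scores = [abs(y - predict(x)) for x, y in zip(x_cal, y_cal)]
    q = conformal_quantile(scores, alpha)

    def interval(x):
        center = predict(x)
        return center - q, center + q

    return interval


def simulate(n, rng):
    """Hypothetical data with skewed (exponential) noise, i.e., non-normal errors."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 + 0.5 * x + rng.expovariate(1.0) for x in xs]
    return xs, ys


rng = random.Random(2024)
x_train, y_train = simulate(200, rng)
x_cal, y_cal = simulate(200, rng)
x_test, y_test = simulate(2000, rng)

interval = split_conformal(x_train, y_train, x_cal, y_cal, alpha=0.1)
hits = 0
for x, y in zip(x_test, y_test):
    lower, upper = interval(x)
    hits += lower <= y <= upper
coverage = hits / len(x_test)
print(f"Empirical coverage on test data: {coverage:.3f}")
```

The coverage guarantee is marginal: it holds on average over calibration and test draws, so the empirical coverage in any single run can fall slightly above or below the 90% target.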
Do Musicians Have Better Short-Term Memory Than Nonmusicians? A Multilab Study
Massimo Grassi, Francesca Talamini, Gianmarco Altoè, et al.
Musicians are often regarded as a positive example of brain plasticity and associated cognitive benefits. This emerges when experienced musicians (e.g., musicians with more than 10 years of music training and practice) are compared with nonmusicians. A frequently observed behavioral finding is a short-term memory advantage of the former over the latter. Although an available meta-analysis reported that the effect size of this advantage is medium (Hedges’s g = 0.5), no study in the literature was adequately powered to reliably estimate an effect of such size. This multilab study was conceived, designed, and conducted in the lab by several groups that have been working on this topic. Our ultimate goal was to provide a community-driven, shared, and reliable estimate of the musicians’ short-term memory advantage (if any) and set a method and a standard for future studies in neuroscience and psychology comparing musicians and nonmusicians. Thirty-three research units recruited a total of 600 experienced musicians and 600 nonmusicians, a number that is sufficiently large to estimate a small effect size (Hedges’s g = 0.3) with high statistical power (i.e., 95%). Subsequently, we measured the difference in short-term memory for musical, verbal, and visuospatial stimuli. We also looked at cognitive, personality, and socioeconomic factors that might mediate the difference. Musicians had better short-term memory than nonmusicians for musical, verbal, and visuospatial stimuli, with effect sizes of, respectively, Hedges’s gs = 1.08 (95% confidence interval [CI] = [0.94, 1.22]; large), 0.16 (95% CI = [0.02, 0.30]; very small), and 0.28 (95% CI = [0.15, 0.41]; small). This work sets the basis for sound research practices in studies comparing musicians and nonmusicians and contributes to the ongoing debate on the possible cognitive benefits of musical training.
Meeting the Bare Minimum: Quality Assessment of Idiographic Temporal Networks Using Power Analysis and Predictive-Accuracy Analysis
Yong Zhang, Jordan Revol, Ginette Lafit, et al.
The network theory of psychopathology inspired clinicians and researchers to use idiographic networks to study how symptoms of an individual interact over time, hoping to find the target symptom(s) for intervention to most effectively break this self-sustaining network. These networks are often based on the vector-autoregressive (VAR) model and rely on intensive longitudinal data collected in patients’ daily lives. Nowadays, one major challenge these networks are faced with is that they are used without sufficient quality assessments. Because VAR-based temporal networks are complex and highly parameterized, they can easily face problems of low statistical power and overfitting, especially when the time series available is short. In this study, we review existing idiographic-network studies with a focus on the number of variables and time points used in the analysis and show that the “big network, short time series” problem is prevalent. As potential solutions, we propose two simulation-based methods that aim to find the optimal number of time points to be collected: power analysis and predictive-accuracy analysis. Two applications of both methods are demonstrated: (a) “a priori”—informing the sample-size planning of future network studies and (b) “retrospective”—evaluating whether the sample size of existing network studies was large enough to avoid problems of low statistical power and overfitting. Results confirmed the observation that the sample sizes in past network studies are often insufficient, suggesting that findings of existing network studies should be critically assessed. Future idiographic-network studies are thus strongly advised to make more guided decisions on sample size using the proposed methods.
Gather Demographic Data About Gender, Sexuality, and Relational Identities: Asking the Right Questions
Eleanor J. Junkins, Jaime Derringer
In this tutorial, we suggest ways to improve current practices for measuring gender identity, sexual orientation, and demographics about relationships based on previous datasets and a newly collected survey of people’s behavior and perceptions of alternative-response formats. We apply lessons learned from racial identity/ethnicity to suggest broader principles of improving demographic measurement. We offer guides to meet the expectations of diverse stakeholders, including participants. The response options we recommend were curated to balance global identities and emerging trends to be applicable for online international research and in-person psychology research conducted primarily by U.S. institutions. We also offer practical suggestions for researchers to handle more complex data, including multiselect response options, which tend to be preferred by participants. Improved demographic data allow researchers to more fully capture multidimensional and complex social identities that are related to social inequities. In sum, the current tutorial is a guide to and discussion about challenges in collecting demographic data on social identities in which we use illustrative data to address important points related to measuring gender, sexuality, and relational demographics, specifically.
A Primer on Fixed Effects and Fixed-Effects Panel Modeling Using R, Stata, and SPSS
Nicolas Sommet, Oliver Lipps
Fixed-effects modeling is a powerful tool for estimating within-clusters associations in cross-sectional data and within-participants associations in longitudinal data. Although commonly used by other social scientists, this tool remains largely unknown to psychologists. To address this issue, we offer a pedagogical primer tailored for this audience, complete with R, Stata, and SPSS scripts. This primer is organized into three parts. In Part 1, we show how fixed-effects modeling applies to clustered cross-sectional data. We introduce the concepts of “cluster dummies” and “demeaning” and provide scripts to estimate the within-schools association between sports and depression in a fictional data set. In Part 2, we show how fixed-effects modeling applies to longitudinal data and provide scripts to estimate the within-participants association between sports and depression over time in a fictional four-wave data set. In this part, we cover three additional topics. First, we explain how to calculate effect sizes and offer simulation-based sample-size guidelines to detect within-participants effects of plausible magnitude with sufficient power. Second, we show how to test two possible interactions: between a time-invariant and a time-varying predictor and between two time-varying predictors. Third, we introduce three relevant extensions: first-difference modeling (estimating changes from one wave to the next), time-distributed fixed-effects modeling (estimating changes before, during, and after an individual event), and within-between multilevel modeling (estimating both within- and between-participants associations). In Part 3, we discuss two limitations of fixed-effects modeling: time-varying confounders and reverse causation. We conclude with reflections on causality in nonexperimental data.
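To make the “demeaning” idea from Part 1 concrete, here is a minimal sketch in Python rather than the R, Stata, or SPSS scripts the primer provides. The six-row data set, the variable names, and the helper functions are hypothetical and chosen only so the arithmetic is easy to follow. Subtracting each school's mean from both variables and then regressing the demeaned outcome on the demeaned predictor gives the within-schools slope, which is the same slope a regression with school dummies would return.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def demean_by_cluster(values, clusters):
    """Subtract each cluster's mean from the values belonging to that cluster."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, cluster in zip(values, clusters):
        totals[cluster] += value
        counts[cluster] += 1
    return [v - totals[c] / counts[c] for v, c in zip(values, clusters)]


# Hypothetical data: two schools, three pupils each.
school = ["A", "A", "A", "B", "B", "B"]
sports = [1, 2, 3, 4, 5, 6]            # hours of sports per week
depression = [10, 9, 8, 14, 13, 12]    # depression score

# Pooled regression mixes within- and between-schools variation.
pooled_slope = ols_slope(sports, depression)

# Fixed-effects (within) estimate: regress demeaned outcome on demeaned predictor.
within_slope = ols_slope(
    demean_by_cluster(sports, school),
    demean_by_cluster(depression, school),
)

print(f"Pooled slope: {pooled_slope:.2f}")   # 0.80
print(f"Within slope: {within_slope:.2f}")   # -1.00
```

In this toy data set the pooled slope is positive only because school B has higher averages on both variables; within each school, more sports goes with lower depression scores, which is the association the fixed-effects estimate isolates.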
Towards a Clearer Understanding of Causal Estimands: The Importance of Joint Effects in Longitudinal Designs With Time-Varying Treatments
Lukas Junker, Ramona Schoedel, Florian Pargent
Longitudinal studies with time-varying treatments or exposures make it hard to figure out “what effect” is being estimated. Drawing on causal inference, we clarify this by distinguishing between total, direct, and—centrally—joint effects, defined within the potential-outcomes framework and illustrated with directed acyclic graphs. Joint effects extend average treatment effects to repeated interventions, providing a practical measure of combined intervention effects over time. Using a worked example on smartphone use and sleep quality, we demonstrate how different estimands answer different questions, why single total effects can sometimes mislead in longitudinal settings, and how joint effects capture strategy-level consequences across time. A key practical takeaway is that joint effects can be estimated in both experimental and observational studies. In the latter, it typically suffices to adjust only for variables that govern treatment decisions at each time point rather than modeling the entire causal system. Building on this, we propose covariate-driven treatment assignment (information-restriction designs in which decisions depend only on observed covariates) as a practical route to causal inference in nonexperimental psychology, and we connect these designs to estimation via g-methods from epidemiology. We provide open materials, including R code, to support adoption.
Investigating the Barriers and Enablers to Data-Sharing Behaviors: A Qualitative Registered Report
Emma L. Henderson, Ruth Abrams, Afrodita Marcu, Lou Atkins, Emily K. Farran
“Data sharing” describes the process of making research data available for reuse. The availability of research data is the basis of transparent, effective research systems that democratize access to knowledge and advance discovery. Despite a broad recognition of the value of data sharing across the sector, many researchers are not yet engaging meaningfully with data-sharing behaviors. Through a behavioral lens, in this qualitative Registered Report, we aimed to identify the barriers and enablers to data sharing experienced by researchers working at a UK university. Data were collected using a theoretically informed 26-item interview schedule (capability, opportunity, motivation–behavior [COM-B] model; theoretical-domains framework [TDF]). Fourteen participants across a range of career levels and disciplines were recruited to take part in semistructured interviews focused on data-sharing behaviors and their influences. Transcripts were analyzed using thematic template analysis based on the COM-B constructs and TDF domains. Results indicated that quantitative data-sharing behaviors were performed differently to qualitative behaviors, which affected the required skills. However, the barriers experienced were similar across all disciplines. These barriers included a lack of time to undertake data-sharing activities, concerns over General Data Protection Regulation/correct deidentification of data, and limited infrastructure. Enablers included researchers’ drive to be seen as open researchers. This identity matters to them for both the good of research and what it signals about them. It is a key enabling factor, potentially driving behavior even in the absence of other factors. Mandating data-sharing activities could encourage more widespread behaviors. However, such mandates need to be both discipline-specific and supported by institutions providing adequate resources.
Language Models Accurately Infer Correlations Between Psychological Items and Scales From Text Alone
Björn E. Hommel, Ruben C. Arslan
Many behavioral scientists do not agree on core constructs and how they should be measured. Different literatures measure related constructs, but the connections are not always obvious to readers and meta-analysts. Many measures in behavioral science are based on agreement with survey items. Because these items are sentences, computerized language models can make connections between disparate measures and constructs and help researchers regain an overview over the rapidly growing, fragmented literature. Our fine-tuned language model, the SurveyBot3000, accurately predicts the correlations between survey items, the reliability of aggregated measurement scales, and intercorrelations between scales from item positions in semantic vector space. We measured the model’s performance as the convergence between its synthetic model estimates and empirical coefficients observed in human data. In our pilot study, the out-of-sample accuracy was .71 for item correlations, .89 for reliabilities, and .89 for scale correlations. In our preregistered validation study using novel items, the out-of-sample accuracy was slightly reduced to .59 for item correlations, .84 for reliabilities, and .84 for scale correlations. The synthetic item correlations showed an average prediction error of .17, and there were larger errors for middling correlations. Predictions exhibited generalizability beyond the training data and across various domains, with some variability in accuracy. Our work shows language models can reliably predict psychometric relationships between survey items, enabling researchers to evaluate new measures against existing scales, reduce redundancy in measurement, and work toward a more unified behavioral-science taxonomy.
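The general logic of predicting psychometric quantities from positions in a vector space can be sketched as follows. This is not the SurveyBot3000 model or its code: the three-dimensional item vectors below are invented for illustration, using cosine similarity as a stand-in for a predicted item correlation is an assumption of this example, and the reliability step uses the standard formula for standardized alpha from the mean inter-item correlation.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def standardized_alpha(n_items, mean_r):
    """Standardized alpha from the number of items and the mean inter-item correlation."""
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


# Hypothetical item embeddings (real sentence embeddings have hundreds of dimensions).
item_vectors = {
    "I enjoy meeting new people.": [0.9, 0.2, 0.1],
    "I feel energized at parties.": [0.8, 0.3, 0.2],
    "I like being the center of attention.": [0.7, 0.1, 0.4],
}

# Treat each pairwise similarity as a synthetic estimate of the item correlation.
pairwise = {
    (a, b): cosine(item_vectors[a], item_vectors[b])
    for a, b in combinations(item_vectors, 2)
}
mean_r = sum(pairwise.values()) / len(pairwise)
alpha = standardized_alpha(len(item_vectors), mean_r)

for (a, b), r in pairwise.items():
    print(f"{r:.2f}  {a} | {b}")
print(f"Synthetic standardized alpha: {alpha:.2f}")
```

Synthetic estimates of this kind would still need to be checked against empirical coefficients from human data, which is how the abstract describes the model's accuracy being evaluated.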
CATAcode: A Principled Approach for Coding Check-All-That-Apply Demographic Items
Gabriel J. Merrin, Kyle Nickodem, Nickholas Grant, Sitara M. Weerakoon, Melissa K. Holt, Dorothy L. Espelage
Accurately measuring, reporting, interpreting, and evaluating identity categories in social-science research is essential; however, check-all-that-apply (CATA) responses present methodological challenges because of the large permutations of categories and the fluctuating salience of intersecting identities across time and contexts. These challenges can hinder the validity of quantitative studies, particularly those examining racial, ethnic, and other social-identity differences. Although quantitative-critical-race-theory scholars have proposed principles for handling racial and ethnic categories in quantitative research, their application in statistical analyses remains limited. In this article, we introduce CATAcode, an R package designed to assist researchers in exploring and preparing CATA demographic items for statistical modeling. By applying this tool to cross-sectional and longitudinal data in the tutorial, we demonstrate how CATAcode can enhance the generalizability, transparency, and reproducibility of social-science research. Improving the rigor of demographic measurement is essential for identifying and addressing social inequalities, allocating resources, and understanding broader patterns of marginalization.
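The kind of preparation step that CATA items require can be illustrated with a small generic sketch. It is written in Python and does not use or reproduce the CATAcode R package or its functions. The response data, the category labels, the `min_count` threshold, and the rule of collapsing rare combinations into a single fallback category are all hypothetical choices made for this example; any such rule is an analytic decision that should be reported alongside the results.

```python
from collections import Counter


def combination_label(selected):
    """Turn a set of checked options into one order-independent label."""
    return " + ".join(sorted(selected)) if selected else "No response"


def code_cata(responses, min_count=2, fallback="Multiple selections"):
    """Label each response by its combination; collapse rare multi-option combinations.

    Returns the coded labels and the counts of the original combinations so the
    collapsing step stays visible and can be reported.
    """
    labels = [combination_label(r) for r in responses]
    counts = Counter(labels)
    coded = []
    for label in labels:
        is_combination = " + " in label
        if is_combination and counts[label] < min_count:
            coded.append(fallback)
        else:
            coded.append(label)
    return coded, counts


# Hypothetical check-all-that-apply responses, one set per participant.
responses = [
    {"Asian"},
    {"Black", "White"},
    {"White", "Black"},
    {"Asian", "Latino"},
    set(),
]

coded, counts = code_cata(responses, min_count=2)
for original, final in zip(responses, coded):
    print(sorted(original), "->", final)
print(counts.most_common())
```

Keeping the full table of original combinations next to the coded variable lets readers see exactly which identities were merged and how often each combination occurred.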