New Content From Advances in Methods and Practices in Psychological Science

Open Science in the Developing World: A Collection of Practical Guides for Researchers in Developing Countries
Hu Chuan-Peng, Zhiqi Xu, Aleksandra Lazić, et al.  

Over the past decade, the open-science movement has transformed the research landscape, although its impact has largely been confined to developed countries. Recently, researchers from developing countries have called for a redesign of open science to better align with their unique contexts. However, raising awareness alone is insufficient—practical actions are required to drive meaningful and inclusive change. In this work, we analyze the opportunities offered by the open-science movement and explore the macro- and micro-level barriers researchers in developing countries face when engaging with these practices. Drawing on these insights and aiming to inspire researchers in developing regions or other resource-constrained contexts to embrace open-science practices, we offer a four-level guide for gradual engagement: (a) foundation, using open resources to build a solid foundation for rigorous research; (b) growth, adopting low-cost, easily implementable practices; (c) community, contributing to open-science communities through actionable steps; and (d) leadership, taking on leadership roles or forming local communities to foster cultural change. We further discuss potential pitfalls of the current open-science practices and call for readaptation of these practices in developing countries’ settings. We conclude by outlining concrete recommendations for future action. 

Three Sensitivity-Analysis Methods to Assess Unmeasured Pretreatment Confounding Bias in Experimental Mediation Analysis
Diana Alvarez-Bartolo, David P. MacKinnon  

Statistical-mediation analysis is a widely used method in psychological research that helps understand the intermediate variables, known as mediators (M), by which an independent variable (X) causes an outcome variable (Y). A major contribution to statistical-mediation analysis has been the incorporation of causal methods because it allows a clear definition of the causal direct and mediated effects and the specification of the assumptions to interpret such effects as causal. Modern causal approaches to mediation analysis encourage routinely investigating the extent to which unobserved confounders may explain the observed mediated effects. The recommendation acknowledges that even when X represents random assignment, participants are not usually randomly assigned to levels of M; hence, unobserved confounders may bias the M to Y relation (b-path). In this article, we describe unobserved pretreatment confounding of the M to Y relation in experimental mediation studies and three sensitivity-analysis methods to assess unmeasured pretreatment confounding of the M to Y relation: the correlated-residuals method, the left-out-variables-error method, and the phantom-variable method. We report the results of a simulation study that compares the routine application of the three sensitivity-analysis methods. Results generally indicate that larger effect sizes of the population b-path are less susceptible to confounding bias for all sensitivity methods. Thus, an initial approach to investigating confounding bias in experimental mediation studies is to assess the effect size of the path relating M to Y, and more details can be obtained by applying one of the three sensitivity-analysis methods.
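To make the confounding problem concrete, here is a minimal simulation sketch. It is not one of the three sensitivity-analysis methods described in the article; it only illustrates why the b-path can be biased even when X is randomized. The data-generating model, the function names, and all parameter values (a, b, u_effect) are assumptions chosen for illustration.

```python
import random


def slope(x, y):
    """OLS slope of y on x (intercept included implicitly via centering)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [yi - mean_y - b * (xi - mean_x) for xi, yi in zip(x, y)]


def estimate_b_path(n, a, b, u_effect, seed):
    """Simulate X -> M -> Y with an unmeasured confounder U of M and Y.

    Assumed (hypothetical) model:
        M = a*X + u_effect*U + noise
        Y = b*M + u_effect*U + noise
    Returns the coefficient of M from the regression Y ~ X + M,
    computed by partialling X out of M (Frisch-Waugh-Lovell).
    """
    rng = random.Random(seed)
    xs, ms, ys = [], [], []
    for _ in range(n):
        x = rng.choice([0.0, 1.0])   # X is randomly assigned
        u = rng.gauss(0.0, 1.0)      # U is never observed by the analyst
        m = a * x + u_effect * u + rng.gauss(0.0, 1.0)
        y = b * m + u_effect * u + rng.gauss(0.0, 1.0)
        xs.append(x)
        ms.append(m)
        ys.append(y)
    m_given_x = residuals(xs, ms)
    return slope(m_given_x, ys)


if __name__ == "__main__":
    for u_effect in (0.0, 0.5):
        est = estimate_b_path(n=20000, a=0.5, b=0.3, u_effect=u_effect, seed=1)
        print(f"u_effect={u_effect}: estimated b-path = {est:.3f} (true b = 0.3)")
```

Under this assumed model, the estimate converges to b + u_effect² / (u_effect² + 1), so randomizing X alone does not protect the M to Y relation from bias.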

From Experiments to Policy Insights: Generalizing Causal Effects From Study Samples to Target Populations
Wen Wei Loh, Dongning Ren  

Psychological science holds substantial promise for informing policy decisions but faces challenges in realizing its potential. One widely recognized challenge is bridging the gap between the nonrepresentative study samples commonly used to evaluate interventions and the broader populations that policymakers aim to serve. To address this challenge, we introduce causal effect generalizability, an approach from causal inference and epidemiology, in the form of an accessible, nontechnical tutorial for psychological and behavioral scientists. We use publicly available data from a real-world psychology intervention study to illustrate why causal effects in a nonrepresentative study sample may systematically differ from those in a broader population. We provide a step-by-step guide with user-friendly R functions, enabling researchers to generalize causal effects from a study sample back to the full target population. This approach allows researchers to assess intervention effects in broader populations, offering valuable insights to guide evidence-based policy development. We hope this nontechnical introductory material will assist scholars in enhancing the policy relevance and real-world impact of psychological science. 
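The basic idea can be sketched with a toy example. The code below is not the authors' R functions; it is a hypothetical Python illustration of one simple generalization strategy (reweighting stratum-specific effects by population shares). It assumes that a single observed stratum variable captures all effect heterogeneity and that every stratum contains both treated and control participants. All names and numbers are invented.

```python
from collections import defaultdict


def sample_effect(records):
    """Unadjusted difference in mean outcome, treated minus control."""
    treated = [y for _, t, y in records if t == 1]
    control = [y for _, t, y in records if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def stratum_effects(records):
    """Treated-minus-control mean difference within each stratum.

    records: iterable of (stratum, treated, outcome), with treated in {0, 1}.
    """
    cells = defaultdict(lambda: {0: [], 1: []})
    for stratum, treated, outcome in records:
        cells[stratum][treated].append(outcome)
    return {
        s: sum(arms[1]) / len(arms[1]) - sum(arms[0]) / len(arms[0])
        for s, arms in cells.items()
    }


def generalized_effect(records, population_shares):
    """Average the stratum effects using target-population shares as weights."""
    effects = stratum_effects(records)
    return sum(share * effects[s] for s, share in population_shares.items())


if __name__ == "__main__":
    # Hypothetical study sample that over-represents stratum "A".
    study = [
        ("A", 1, 3.0), ("A", 1, 3.0), ("A", 1, 3.0),
        ("A", 0, 1.0), ("A", 0, 1.0), ("A", 0, 1.0),
        ("B", 1, 1.0), ("B", 0, 1.0),
    ]
    # Hypothetical target population in which "A" is the minority.
    shares = {"A": 0.25, "B": 0.75}
    print("effect in study sample:", sample_effect(study))
    print("effect reweighted to population:", generalized_effect(study, shares))
```

Because the intervention only helps stratum "A" in this invented data set, the effect in the A-heavy sample overstates the effect expected in the B-heavy population.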

A Practical Guide to Specifying Random Effects in Longitudinal Dyadic Multilevel Modeling
Kareena S. del Rosario, Tessa V. West

Analyzing over-time dyadic data can be challenging, particularly when using multilevel models with complex random-effect structures. In this tutorial, we discuss the best practices of model specification for longitudinal dyadic multilevel modeling, providing a practical guide to specifying (and respecifying) random effects with both theoretical and practical considerations in mind. We begin by defining random effects in the context of repeated-measures dyadic data and address common issues such as nonconvergence. Then, using two models—the dyadic growth-curve and the stability and influence model—we demonstrate how to apply these guidelines in both SAS and R. The dyadic growth-curve model provides a straightforward example, whereas the stability and influence model illustrates common challenges when dealing with complex random-effect structures and convergence issues. In the first exercise, we explain how to customize the variance-covariance matrix for these analyses in SAS. In the second exercise, we adapt these analyses for R and discuss how to implement the sum-and-difference approach for indistinguishable dyads. We conclude with a discussion of alternative models and go over the utility of data simulation during study design, helping readers plan and select the best approach for their research.

A Window Into the State of the Science: Current Reporting Practices Related to Generalizability in MRI and Functional-MRI Studies
Arianna M. Gard, Deena Shariq, Alison A. Albrecht, et al.

Concerns for the replicability, reliability, and generalizability of MRI and functional MRI (fMRI) research have led to debates over the contributions of sample size, open-science practices, and recruitment methods, particularly in the psychological sciences. Key to understanding the state of a science is an assessment of reporting practices. In this structured review, we evaluated select reporting practices across three domains: (a) demographic (e.g., age), (b) methodological (e.g., inclusion/exclusion criteria), and (c) open science and generalizability (e.g., preregistration, target-population identification). Included were 919 published MRI and fMRI studies from 2019 in nine top-ranked journals. Reporting across domains was infrequent; participant racial-ethnic identity (14.8%), reasons for missing imaging data (31.2%), and identification of a target population (19.4%) were particularly low. Reporting likelihood varied by study characteristics (e.g., journal) and was correlated across domains. Finally, study sample size but not reporting frequency was positively associated with 2-year citation counts. Results call for recentering transparency in reporting practices in MRI and fMRI studies, with direct implications for study generalizability.

How Do Psychology Journals Handle Postpublication Critique? A Cross-Sectional Study of Policy and Practice
Annie Whamond, Simine Vazire, Beth Clarke, et al.  

Postpublication critique, such as letters to the editor, can contribute to the validity and trustworthiness of scientific research. We conducted a cross-sectional analysis of the policy and practice of postpublication critique in (a) randomly selected (N = 100) and (b) prominent (N = 100) psychology journals. In 2023, an explicit submission option for postpublication critique was available at 23% (95% confidence interval [CI] = [16%, 32%]) of randomly sampled psychology journals and 38% of the most prominent psychology journals. Journals sometimes imposed limits on the length and time allowed to submit critiques. We manually inspected two random samples of empirical articles published in 2020 (articles per sample: N = 101), estimating the prevalence of postpublication critique to be 0% (95% CI = [0%, 3.7%]) in psychology journals generally and 1% (95% CI = [0.2%, 5.4%]) in the most prominent psychology journals. The policy and practice of postpublication critique is seriously neglected in psychology journals.
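For readers who want to see how intervals like these can arise from small counts, here is a minimal sketch of the Wilson score interval for a proportion. The abstract does not say which interval method the authors used, and the counts below (23 of 100, 0 of 101, 1 of 101) are inferred from the reported percentages and sample sizes, so both are assumptions.

```python
import math


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (default: 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot at the edges.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Counts inferred from the abstract (assumption, not reported directly).
    for label, k, n in [
        ("submission option, random sample", 23, 100),
        ("critique prevalence, random sample", 0, 101),
        ("critique prevalence, prominent sample", 1, 101),
    ]:
        lo, hi = wilson_interval(k, n)
        print(f"{label}: {k}/{n} -> [{100 * lo:.1f}%, {100 * hi:.1f}%]")
```

Under these assumed counts, the Wilson bounds agree with the reported intervals to the precision given in the abstract. Note that the interval for zero observed events still has a nonzero upper bound.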

Realistic Expectations for Replications: Expecting Too Little Is Just as Bad as Expecting Too Much
Frieder Göppert, Kriti Bhatia, Sascha Meyen, Volker H. Franz  

In a series of influential articles, Spence and Stanley discussed the degree to which researchers can expect a published effect to replicate in a replication study. They argue that expectations are often too high because sampling variability and measurement error are not fully taken into account. They conclude that (a) the failure of a single study to replicate a published effect might be less serious than often assumed, (b) the replication crisis might only exist for those who hold unduly high expectations about replications, and (c) researchers should temper their expectations about replication studies. However, these claims are based on a highly unusual and far too pessimistic approach they used in their initial work on this topic. Later, Spence and Stanley promoted the well-established prediction intervals, for which their recent article in this journal provides an instructive tutorial. We use these prediction intervals to demonstrate that their previous claims about replications were far too pessimistic and need to be updated. We conclude that the failure of a single study to replicate a published effect should indeed be taken seriously (given, of course, that the replication study is well designed and has sufficient statistical power), and warn that too pessimistic expectations about replications can be just as detrimental to science as too optimistic expectations. This is crucial because too pessimistic expectations can make it prohibitively difficult for researchers to demonstrate evidence against original results.
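As a rough illustration of what a prediction interval for a replication looks like, the sketch below uses the generic normal-approximation form, in which the sampling error of both the original and the replication study contributes to the width. This is a textbook simplification and not necessarily the exact procedure used in the article or in Spence and Stanley's tutorial. The function name and the example numbers are hypothetical.

```python
import math


def replication_prediction_interval(original_estimate, se_original,
                                    se_replication, z=1.959963984540054):
    """Approximate 95% prediction interval for a replication estimate.

    Assumes both estimates are normally distributed around the same true
    effect, so their difference has variance se_original**2 + se_replication**2.
    """
    half_width = z * math.sqrt(se_original ** 2 + se_replication ** 2)
    return original_estimate - half_width, original_estimate + half_width


if __name__ == "__main__":
    # Hypothetical original study: estimate 0.50 with standard error 0.10,
    # followed by a replication with the same standard error.
    lo, hi = replication_prediction_interval(0.50, 0.10, 0.10)
    print(f"replication estimate expected in [{lo:.3f}, {hi:.3f}]")
```

A replication estimate falling outside such an interval is inconsistent with the original result under the assumptions above. The interval narrows as either standard error shrinks, which is one reason a well-powered replication is informative.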

Publication Bias in Academic Decision-Making in Clinical Psychology
Louis Schiekiera, Kristina Eichel, Felicitas Heßelmann, Jacqueline Sachse, Sophie P. Müller, Helen Niemeyer

Review studies suggest that results that are statistically significant or consistent with hypotheses are preferred in the publication process and in reception. The mechanisms underlying this bias remain unclear, and prior research has focused on between-subjects rather than within-subjects designs. We conducted a within-subjects study, grounded in dual-process decision-making theories, to examine these dynamics. Across four online experiments, 303 clinical-psychology researchers evaluated 16 fictitious abstracts varying in statistical significance and hypothesis consistency. Participants provided fast, intuitive judgments about each abstract’s likelihood of being submitted, read, or cited, rated their feeling of rightness (FOR), and gave deliberated evaluations. We analyzed the data using multilevel and mediation models. Researchers rated statistically nonsignificant abstracts as less likely to be submitted, read, or cited compared with significant ones. No such bias was found for hypothesis-inconsistent results. Intuitive judgments were rarely revised, and FOR did not predict response changes. Overall, researchers favored statistically significant results, with deliberation and FOR playing minimal roles.

A Fragmented Field: Construct and Measure Proliferation in Psychology
Farid Anvari, Taym Alsalti, Lorenz A. Oehler, et al.

We examined the extent to which constructs and measures have proliferated in psychological science. We integrated two large databases obtained from the American Psychological Association that have been used to keep track of constructs, measures, and research in the psychological-science literature for the past 30 years. In our descriptive analyses, we found that (a) thousands of new constructs and measures are published each year, (b) most measures are used very few times, and (c) there is no trend toward consensus or standardization in the use of constructs and measures; in fact, there is a trend toward even greater fragmentation over time. That is, constructs and measures are proliferating. We conclude that measurement in the psychological-science literature is fragmented, creating problems such as redundancy and confusion and stifling cumulative scientific progress. We conclude by providing suggestions for what researchers can do about this problem.

Identifying Careless Survey Respondents Through Machine Learning Using Responses to a Gibberish Scale
Leah Bloy, Yehezkel Resheff, Avraham Kluger, Nechumi Malovicki-Yaffe

Invalid responses pose a significant risk of distorting survey data, compromising statistical inferences, and introducing errors in conclusions drawn from surveys. Given the pivotal role of surveys in research, development, and decision-making, it is imperative to identify careless survey respondents. The existing literature on this subject comprises two primary categories of approaches: methods that rely on survey items and methods involving post hoc analyses. The latter, which does not demand preemptive preparation, predominantly incorporates statistical techniques or metadata analysis aimed at identifying distinct response patterns that are associated with careless responses. However, several inherent limitations hinder the precise identification of careless respondents. One notable challenge is the lack of consensus concerning the thresholds to use for the various measures. Furthermore, each method is designed to detect a specific response pattern associated with carelessness, leading to conflicting outcomes. In this article, we seek to assess the efficacy of the existing methods using a novel survey methodology encompassing responses to both meaningful and meaningless gibberish scales in which the latter compels respondents to answer without considering item content. Using this approach, we propose the application of machine learning to identify careless survey respondents. Our findings underscore the efficacy of a methodology using supervised machine learning combined with unique gibberish data as a potent method for the identification of careless respondents, aligning with and outperforming other approaches in terms of effectiveness and versatility.
