Advances in Methods and Practices in Psychological Science

Language Models Accurately Infer Correlations Between Psychological Items and Scales From Text Alone

Abstract

Many behavioral scientists do not agree on core constructs and how they should be measured. Different literatures measure related constructs, but the connections are not always obvious to readers and meta-analysts. Many measures in behavioral science are based on agreement with survey items. Because these items are sentences, computerized language models can make connections between disparate measures and constructs and help researchers regain an overview of the rapidly growing, fragmented literature. Our fine-tuned language model, the SurveyBot3000, accurately predicts the correlations between survey items, the reliability of aggregated measurement scales, and intercorrelations between scales from item positions in semantic vector space. We measured the model’s performance as the convergence between its synthetic estimates and the empirical coefficients observed in human data. In our pilot study, the out-of-sample accuracy was .71 for item correlations, .89 for reliabilities, and .89 for scale correlations. In our preregistered validation study using novel items, the out-of-sample accuracy was slightly reduced to .59 for item correlations, .84 for reliabilities, and .84 for scale correlations. The synthetic item correlations showed an average prediction error of .17, and errors were larger for middling correlations. Predictions generalized beyond the training data and across various domains, although accuracy varied somewhat. Our work shows that language models can reliably predict psychometric relationships between survey items, enabling researchers to evaluate new measures against existing scales, reduce redundancy in measurement, and work toward a more unified behavioral-science taxonomy.
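
To make the logic summarized above concrete, the following is a minimal sketch and not the SurveyBot3000 implementation. It assumes that a synthetic item correlation can be approximated by the cosine similarity between item embedding vectors, that scale reliability can be summarized by standardized Cronbach's alpha computed from the mean inter-item correlation, and that accuracy can be expressed as a Pearson correlation between synthetic and empirical coefficients. The function names and all numbers are hypothetical; the abstract does not specify these details.

```python
import numpy as np


def synthetic_item_correlations(embeddings):
    """Cosine similarity between item vectors.

    Assumption of this sketch: cosine similarity serves as the synthetic
    inter-item correlation. The fine-tuned model may map vectors to
    correlations differently.
    """
    x = np.asarray(embeddings, dtype=float)
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return unit @ unit.T


def standardized_alpha(corr):
    """Standardized Cronbach's alpha from a k x k item correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    k = corr.shape[0]
    mean_r = corr[np.triu_indices(k, k=1)].mean()  # mean off-diagonal r
    return k * mean_r / (1 + (k - 1) * mean_r)


def convergence(synthetic, empirical):
    """Pearson correlation between synthetic and empirical coefficients,
    computed over the unique off-diagonal item pairs."""
    synthetic = np.asarray(synthetic, dtype=float)
    empirical = np.asarray(empirical, dtype=float)
    idx = np.triu_indices(synthetic.shape[0], k=1)
    return float(np.corrcoef(synthetic[idx], empirical[idx])[0, 1])


if __name__ == "__main__":
    # Toy item vectors (made-up numbers, for illustration only).
    toy_embeddings = [
        [0.9, 0.1, 0.2],
        [0.8, 0.3, 0.1],
        [0.7, 0.2, 0.4],
    ]
    # Hypothetical "empirical" correlations, also made up.
    toy_empirical = np.array([
        [1.0, 0.6, 0.5],
        [0.6, 1.0, 0.4],
        [0.5, 0.4, 1.0],
    ])
    synthetic = synthetic_item_correlations(toy_embeddings)
    print("synthetic item correlations:\n", synthetic.round(2))
    print("synthetic scale reliability:", round(standardized_alpha(synthetic), 2))
    print("convergence with empirical:", round(convergence(synthetic, toy_empirical), 2))
```

In this sketch, the same synthetic correlation matrix feeds both the item-level comparison and the scale-level reliability estimate, which mirrors the idea that all three quantities reported above derive from item positions in vector space.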