Advances in Methods and Practices in Psychological Science

Bridging Traditional-Statistics and Machine-Learning Approaches in Psychology: Navigating Small Samples, Measurement Error, Nonindependent Observations, and Missing Data

Abstract

In recent years, machine learning has spread into many areas of psychological research, and supervised machine-learning methods are increasingly used to predict human behavior or psychological characteristics from a large number of candidate predictors. However, researchers often face practical challenges when applying these methods to psychological data. In this article, we identify and discuss four such challenges: (a) limited sample size, (b) measurement error, (c) nonindependent data, and (d) missing data. These challenges are extensively discussed in the “traditional” statistical literature but are often not addressed explicitly, or at least not to the same extent, in the applied-machine-learning community. For each challenge, we present how it is handled first from a traditional-statistics perspective and then from a machine-learning perspective, and we compare the two approaches to discuss the strengths and weaknesses of their solutions. We argue that the boundary between traditional statistics and machine learning is fluid and emphasize the need for cross-disciplinary collaboration to better tackle these core challenges and improve replicability.