MTurk Workers Are More Depressed—But “Bots” and Demographic Differences Inflate the Data

Conducting psychological research through online platforms such as Amazon’s Mechanical Turk (MTurk) is more common than ever, but this convenient pool of participants presents a unique set of challenges. In line with previous findings, research published in Clinical Psychological Science indicates that up to 11% of MTurk participants, compared with just 3% to 7% of the general population, may meet the diagnostic criteria for major depression. However, unscreened reports are substantially inflated by low-quality responses, and sociodemographic differences account for part of the remaining gap.

“Knowledge about the reasons behind increased prevalence rates [of depression] is not only interesting on its own but also necessary for accurate interpretations of research results,” write psychological scientist Yaakov Ophir (The Hebrew University of Jerusalem) and colleagues. “To the extent that this finding reflects a genuine difference, increased prevalence makes MTurk an even more convenient and attractive recruitment platform for clinical researchers.”

The researchers dug deeper into how data quality, demographics, and other factors may contribute to the so-called “Turker Blues” through two online studies.

In the first study of 2,719 MTurk participants reportedly living in the United States, Ophir and colleagues focused on how responses from inattentive and fake “bot” accounts may influence self-reported rates of depression.

Participants reported which symptoms of depression, if any, they had experienced in the past 2 weeks (including hopelessness, trouble concentrating, changes in sleeping or eating patterns, and suicidal ideation), rating each on a scale from 0 (not at all) to 3 (nearly every day). By this measure, individuals who report experiencing five or more symptoms of depression on more than half of the days in the past 2 weeks can be classified as meeting the criteria for major depressive disorder.
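The cutoff rule described above can be expressed as a short scoring function. This is a minimal sketch, not the study’s scoring code: the function name `meets_depression_cutoff` is hypothetical, and it assumes that the intermediate scale points 1 and 2 mean “several days” and “more than half the days,” so an item counts as a symptom when it is scored 2 or higher. Validated questionnaires may add further conditions (for example, requiring a core mood symptom), which this sketch omits.

```python
# Minimal sketch of the "five or more symptoms on more than half the days" rule.
# Assumption (not stated in the article): on the 0-3 scale, 2 = "more than half
# the days" and 3 = "nearly every day", so an item counts when its score is >= 2.

SYMPTOM_THRESHOLD = 2   # assumed lowest score meaning "more than half the days"
MIN_SYMPTOMS = 5        # five or more symptoms, as described in the article


def meets_depression_cutoff(item_scores: list[int]) -> bool:
    """Return True if at least MIN_SYMPTOMS items reach SYMPTOM_THRESHOLD.

    item_scores: one 0-3 rating per symptom item (hypothetical input format).
    """
    for score in item_scores:
        if not 0 <= score <= 3:
            raise ValueError(f"item score out of range 0-3: {score}")
    endorsed = sum(1 for score in item_scores if score >= SYMPTOM_THRESHOLD)
    return endorsed >= MIN_SYMPTOMS


# Example: five items at 2 or 3 meet the cutoff; four do not.
print(meets_depression_cutoff([3, 2, 2, 2, 2, 0, 1, 0, 0]))  # True
print(meets_depression_cutoff([3, 2, 2, 2, 1, 1, 1, 0, 0]))  # False
```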

To weed out automated or otherwise fake accounts, the researchers flagged any purported MTurk workers with suspicious IP addresses, which suggested that the account may have been hosted on a private server. They also flagged those with non-US-based IP addresses, given that bot accounts have largely been found to originate outside the United States.

In addition, Ophir and colleagues targeted low-quality responses using a program that rated participants for attentiveness on the basis of a series of eight indicators, including reading speed and consistency of responses. This allowed the researchers to sort participants into three categories: attentive workers, who passed all attention checks; questionable workers, who failed one attention check; and inattentive workers, who failed two or more attention checks.

Using these strategies, the researchers identified 236 workers with non-US or otherwise suspicious IP addresses, 35% of whom failed the attention checks, compared with 7% of participants using nonsuspicious IPs. They also identified an additional 181 inattentive and 427 questionably attentive workers, leaving a total of 1,848 participants who were both attentive and using a nonsuspicious IP.
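The sorting above can be sketched as a small classification step. This is an illustrative sketch under stated assumptions, not the researchers’ program: the `Participant` record, its field names, and the helper functions are hypothetical; the eight attentiveness indicators and the IP checks themselves are not modeled, only their outcomes; and a suspicious IP is assumed to take precedence over the attention category, since the article reports those workers as a separate group.

```python
# Hypothetical sketch of the participant sorting described above.
# Assumptions: each record already carries a count of failed attention checks
# and a boolean IP flag (how IPs are judged suspicious is not modeled here),
# and a flagged IP overrides the attention category.
from dataclasses import dataclass


@dataclass
class Participant:
    worker_id: str          # hypothetical identifier
    failed_checks: int      # number of failed attention checks
    suspicious_ip: bool     # non-US or otherwise suspicious IP address


def attention_category(failed_checks: int) -> str:
    """Map failed attention checks to the article's three categories."""
    if failed_checks < 0:
        raise ValueError("failed_checks cannot be negative")
    if failed_checks == 0:
        return "attentive"
    if failed_checks == 1:
        return "questionable"
    return "inattentive"


def screening_group(p: Participant) -> str:
    """Assign each participant to exactly one screening group."""
    if p.suspicious_ip:
        return "suspicious_ip"
    return attention_category(p.failed_checks)


def retained(participants: list[Participant]) -> list[Participant]:
    """Keep only attentive workers with nonsuspicious IP addresses."""
    return [p for p in participants if screening_group(p) == "attentive"]


sample = [
    Participant("w1", 0, False),
    Participant("w2", 1, False),
    Participant("w3", 3, False),
    Participant("w4", 0, True),
]
print([screening_group(p) for p in sample])
# ['attentive', 'questionable', 'inattentive', 'suspicious_ip']
print([p.worker_id for p in retained(sample)])  # ['w1']
```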

Taken together, 18.5% of all participants reported symptoms that met the criteria for major depression. This number was heavily skewed, however, by the reports of workers who had suspicious IP addresses or were inattentive, Ophir and colleagues write. In fact, although 26% to 38% of the workers in the suspicious-IP, inattentive, and questionably attentive groups met the cutoff point for major depression, just 13% of attentive workers reported five or more symptoms.
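The reported figures also let a reader check how strongly the excluded groups pull the overall rate upward. Treating the overall rate as a size-weighted average of the retained and excluded workers, and assuming (our assumption) that the 13% figure refers to the 1,848 attentive workers with nonsuspicious IPs, the implied rate among the remaining 871 workers can be backed out. The helper name `implied_rate_of_rest` is hypothetical, and because the published percentages are rounded, the result is approximate.

```python
# Back-of-the-envelope check using the figures reported for the first study.
# Assumption: the 13% rate applies to the 1,848 attentive, nonsuspicious-IP
# workers, and everyone else forms a single "excluded" pool. Percentages are
# rounded, so the implied rate is approximate.

def implied_rate_of_rest(n_total: int, rate_total: float,
                         n_kept: int, rate_kept: float) -> float:
    """Solve rate_total * n_total = rate_kept * n_kept + rate_rest * n_rest."""
    n_rest = n_total - n_kept
    if n_rest <= 0:
        raise ValueError("n_kept must be smaller than n_total")
    return (rate_total * n_total - rate_kept * n_kept) / n_rest


rest = implied_rate_of_rest(n_total=2719, rate_total=0.185,
                            n_kept=1848, rate_kept=0.13)
print(f"Implied rate among excluded workers: {rest:.1%}")
# Implied rate among excluded workers: 30.2%
```

Roughly 30% falls inside the 26% to 38% range reported for those groups, which is consistent with the overall 18.5% figure being pulled up from the 13% rate among attentive workers.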

In line with previous research, this suggests that the prevalence of major depression among MTurk workers is substantially higher than that of the general population, Ophir and colleagues write, but more stringent participant filtering may be required to prevent low-quality responses from further inflating these results.

Clinical measures can be particularly vulnerable to fake and inattentive responses, the researchers note, because these respondents tend to select answers toward the middle of a scale, which on certain diagnostic measures can result in participants meeting the cutoff point for depression.
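A simple probability model shows why careless responding can cross a symptom-count cutoff. This is an illustrative sketch, not an analysis from the study, and every input is our assumption: a nine-item questionnaire, respondents who pick each of the four scale points 0 to 3 uniformly at random and independently across items, and a score of 2 or 3 counting as a symptom. Under those assumptions each item is endorsed with probability 0.5, and the chance of reaching five endorsed items is a binomial tail. The function name `prob_at_least` is hypothetical.

```python
# Illustrative model of careless responding (assumptions, not study data):
#  - nine symptom items, each rated 0-3;
#  - each rating chosen uniformly at random, independently across items;
#  - a rating of 2 or 3 counts as an endorsed symptom.
from math import comb


def prob_at_least(k_min: int, n_items: int, p_endorse: float) -> float:
    """Binomial tail: P(at least k_min of n_items endorsed)."""
    return sum(
        comb(n_items, k) * p_endorse**k * (1 - p_endorse) ** (n_items - k)
        for k in range(k_min, n_items + 1)
    )


N_ITEMS = 9          # assumed questionnaire length
P_ENDORSE = 2 / 4    # two of the four scale points (2 and 3) count
p_random = prob_at_least(5, N_ITEMS, P_ENDORSE)
print(f"Random responder meets the cutoff with probability {p_random:.2f}")
# Random responder meets the cutoff with probability 0.50
```

Set against the 13% rate among attentive workers, a 50% rate for purely random answering illustrates how even a modest share of careless respondents can raise a sample’s apparent prevalence. Under the same assumptions, a respondent who always chose the scale point 2 would meet the cutoff every time.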

“Even though in this study we specifically focused on depression, our findings on the effects of failing to screen for inattentive and suspicious respondents are relevant for any crowdsourcing-based clinical research,” Ophir and colleagues write. “Pathologies such as social anxiety may prove to have been artificially inflated as well.”

The researchers sought to replicate these findings, and to further investigate the influence of sociodemographics on the prevalence of major depression in MTurk workers, through a follow-up study of 2,444 US-based MTurk workers.

After filtering out workers who had suspicious IP addresses or who were inattentive, 11% of the remaining 1,461 attentive participants were found to meet the criteria for major depression. In contrast, just 3.6% of 5,134 individuals who participated in face-to-face interviews for the US Centers for Disease Control and Prevention’s 2015–2016 National Health and Nutrition Examination Survey (NHANES) were found to meet the cutoff for major depression.

Additionally, MTurk workers were found to be significantly younger and less physically active, and to report poorer sleep quality, than the nationally representative NHANES sample. These factors have all been tied to increased risk of depression, Ophir and colleagues write. On the other hand, the MTurk sample was also more highly educated, reported higher income, and was more likely to be employed than participants in the NHANES study, factors that have been found to decrease depression risk.

“Approximately half of this difference can be attributed to differences in the composition of MTurk samples and the general population (i.e., sociodemographics, health, and physical activity lifestyle),” write Ophir and colleagues—and there are a number of potentially provocative explanations for the remaining difference.

Figure: Each of these factors contributes to the differing rates of major depression in National Health and Nutrition Examination Survey (NHANES) and Mechanical Turk (MTurk) participants (Ophir et al., 2020).

One possibility is that individuals who already suffer from depression may be more likely to participate in paid online surveys from the comfort of their home. It’s also possible that using MTurk may itself contribute to depressive feelings, the researchers write, either through the potentially deleterious effects of excessive screen time or through the socially isolating nature of these tasks, which may impair participants’ sense of leading a meaningful life.

The third and perhaps more concerning alternative, Ophir and colleagues write, is that people may simply be more comfortable honestly reporting symptoms of depression in an online setting, in which case the actual rate of depression in the general population may be significantly higher than face-to-face interviews suggest. Further work is needed to narrow down the potential cause, or causes, of the unexplained difference in these samples’ depression rates, the researchers conclude.


Ophir, Y., Sisso, I., Asterhan, C. S. C., Tikochinski, R., & Reichart, R. (2020). The Turker blues: Hidden factors behind increased depression rates among Amazon’s Mechanical Turkers. Clinical Psychological Science, 8(1), 65–83.
