What the Rise of Large Datasets Means for Psycholinguistics

January 20, 2016

The ability to crowdsource data from large groups and the rise of Big Data have helped advance many areas of psychological research. Psycholinguistics, the study of the psychology behind the acquisition, use, production, and comprehension of language, is one of those areas. Big Data is important enough to the field that it was the subject of a 2015 special issue of The Quarterly Journal of Experimental Psychology, edited by Emmanuel Keuleers (Ghent University, Belgium) and APS Fellow David A. Balota (Washington University in St. Louis, USA).

Words are often the main focus of linguistic studies, and variables unique to each word — such as length, pronunciation, frequency, concreteness, and valence — influence how people process and respond to each word. Large datasets that examine these factors allow psychological scientists to understand how each variable affects language processing and recognition, enabling researchers to control for these variables — when not the focus of interest — in their own studies.

As early as the 1940s, researchers created databases chronicling characteristics of words such as their familiarity and vividness. These laboratory-based studies examined several hundred words and were considered large for their time. As data collection has moved to computers in the laboratory and now to the Internet, researchers can more easily amass data on tens of thousands of stimuli from thousands of participants. Psychological scientists have thus been able to create databases covering a wide range of word characteristics, including subjective familiarity, meaningfulness, age of acquisition, valence, arousal, concreteness, and dominance. The data from these large-scale and crowdsourced studies can be used to create norms for words that other researchers may eventually use as stimuli.

Such large datasets also can be used to test novel variables, to create predictive computational models of language processing, and to help researchers understand how words gain meaning from the words that surround them and from the context provided by larger chunks of text.

The expansion of these types of datasets brings many benefits but also potential methodological concerns. For example, how might one go about replicating a megastudy, and does participant fatigue during data collection reduce the reliability of megastudy findings? Psychological scientists are taking up these questions, with many developing and applying innovative techniques to address the reliability and replicability of these types of studies.

This special issue highlights the utility of large-scale datasets for the field of psycholinguistics and shines a light on researchers who are advancing their field by creating new linguistic databases, utilizing such datasets to better understand the way we process language, and tackling methodological issues that arise with the expansion and application of these techniques.

Reference

Keuleers, E., & Balota, D. A. (2015). Megastudies, crowdsourcing, and large datasets in psycholinguistics: An overview of recent developments. The Quarterly Journal of Experimental Psychology, 68, 1457–1468. doi:10.1080/17470218.2015.1051065

