Investigating Human-Like Processing in Large Language Models: A Glimpse into Findings from Early-Career Researchers

In 2022, a colleague of mine and I were required to complete our progress report for full professors at Rice University, which is due every 5 years as part of post-tenure review. As ChatGPT had recently been released, we joked that we should ask it to do this rather tedious task. Neither of us actually took that route—most likely because we thought it would take too much time to figure out the system and the results would not be something we would want our evaluations based on. Had we tried it out, we might have been amazed at its degree of success. In the short time since then, we have all experienced ChatGPT and other artificial intelligence (AI) systems known as large language models (LLMs) become ubiquitous and produce remarkable results—popping up to provide a summary of a technical article we’re reading and being used in commercial products such as those creating descriptions of homes for sale posted on realty websites. Of course, we who serve as professors have become all too familiar with how they can be used to write term papers on complex topics with the results appearing within a few seconds of students entering a prompt.
The success of LLMs raises fundamental questions as to what they actually know and how their processing does or does not match that of humans. LLMs are trained by exposure to terabytes of language data to minimize the error in predicting the next word given some language context. Thus, they don’t represent knowledge in the fashion that had previously been used by search engines on the internet but are being trained in pattern recognition—that is, what words co-occur with other words in particular contexts. Recent models, called transformer models, have enabled striking advances in performance over earlier AI models by taking in context in parallel, rather than using purely sequential prediction. This large advance has made LLMs highly useful as tools for psychological research—for instance, they can be used to generate semantic representations from texts that can then be mapped onto the brain regions coding these representations when a sentence or narrative text is perceived.
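To make the training objective concrete, here is a minimal sketch of next-word prediction, assuming a toy bigram counter in place of a transformer. The corpus, the function names, and the add-one smoothing are illustrative assumptions, not details of any real LLM. The point is only to show what "error in predicting the next word" means: the average negative log-probability that the model assigns to the word that actually came next.

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each word follows each one-word context."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev, vocab):
    """Add-one smoothed probability of each vocabulary word following `prev`."""
    total = sum(counts[prev].values()) + len(vocab)
    return {w: (counts[prev][w] + 1) / total for w in vocab}

def average_loss(counts, tokens, vocab):
    """Mean negative log-probability of each actual next word (cross-entropy)."""
    losses = [
        -math.log(next_word_probs(counts, prev, vocab)[nxt])
        for prev, nxt in zip(tokens, tokens[1:])
    ]
    return sum(losses) / len(losses)

# Hypothetical toy corpus; real models see terabytes of text.
corpus = "the dog chased the cat and the cat chased the mouse".split()
vocab = sorted(set(corpus))

model = train_bigram(corpus)
probs = next_word_probs(model, "the", vocab)
best_guess = max(probs, key=probs.get)   # most likely word after "the"

trained_loss = average_loss(model, corpus, vocab)
uniform_loss = math.log(len(vocab))      # loss of guessing uniformly at random

print(best_guess, round(trained_loss, 3), round(uniform_loss, 3))
```

Even this crude counter predicts the next word better than chance on its own corpus. A transformer pursues the same objective, but conditions on a much longer context and learns its probabilities by gradient descent rather than by counting.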
LLMs’ similarities to human processing
The fact that LLMs appear, on the surface, to work similarly to the human language-processing system is remarkable, especially given that humans have perceptual systems in which to ground concept representations and have intentions during speaking, whereas LLMs have neither. How is it that LLMs’ basis solely in pattern recognition can lead to language abilities that seem very similar to those of humans? The early-career scientists taking part in the plenary session at the APS 2025 Annual Convention titled “Human Language and Thought in the Era of Large Language Models” (Alexander Huth, Laura Gwilliams, and Anna Ivanova) will present their groundbreaking research, which both makes use of LLMs as a cognitive neuroscience research tool and addresses fundamental theoretical questions about how the processing in these models relates to human language and thought.
Alexander Huth began his work in this domain before the advent of these latest LLMs. Earlier studies of semantic processing in the brain had typically presented small sets of words differing in certain semantic features, such as object vs. action or concrete vs. abstract, to find brain regions that show selective responses to these features. By contrast, the study by Huth and colleagues published in Nature (2016) exposed participants to 2 hours of naturally spoken narrative speech from radio shows while brain responses were recorded via functional magnetic resonance imaging (fMRI). A data-driven process determined the semantic representation of every word in the narratives by computing its co-occurrence with a set of fewer than 1,000 basic words within a large corpus of text. The data were then reduced to a smaller set of semantic dimensions along which the greatest variations in meaning occurred (using a process similar to latent semantic analysis, an earlier kind of language model pioneered by the late APS Charter Member and Fellow Thomas Landauer). The values on these semantic dimensions were then correlated with brain activity to find the regions coding these semantic features.
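The co-occurrence step described above can be sketched in a few lines. This is an illustration under stated assumptions, not the pipeline used by Huth and colleagues: the toy corpus, the four "basic words," the window size, and the function names are all hypothetical. The sketch stops at the co-occurrence vectors and leaves out both the dimensionality reduction and the mapping onto fMRI responses.

```python
import math
from collections import Counter

def cooccurrence_vector(target, tokens, basis_words, window=3):
    """Count how often each basis word occurs within `window` tokens of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in basis_words:
                counts[tokens[j]] += 1
    # One dimension per basis word, in a fixed order.
    return [counts[b] for b in basis_words]

def cosine(u, v):
    """Cosine similarity between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical toy corpus and basis set, standing in for a large text
# corpus and the set of fewer than 1,000 basic words.
tokens = (
    "the chef cooked fresh pasta then the chef cooked fresh soup while "
    "the judge read the long verdict and the lawyer read the long brief"
).split()
basis_words = ["cooked", "fresh", "read", "long"]

vectors = {
    word: cooccurrence_vector(word, tokens, basis_words)
    for word in ["pasta", "soup", "verdict", "brief"]
}

food_similarity = cosine(vectors["pasta"], vectors["soup"])
cross_similarity = cosine(vectors["pasta"], vectors["verdict"])
print(vectors, food_similarity, cross_similarity)
```

Words that occur in similar contexts ("pasta" and "soup") end up with similar vectors, while words from different contexts ("pasta" and "verdict") do not. Meaning is thus represented entirely by patterns of co-occurrence, with no grounding in perception.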
The results showed selectivity in the brain regions responding to the different semantic features and found that these feature maps in the brain were shared to some degree across individuals. Later results (Popham et al., 2021) showed a correspondence between the semantic features uncovered in narrative speech input and movie input, indicating correspondence in meaning representations derived from verbal and visual input.
Huth’s recent work uses modern LLMs to uncover semantic features and examine the correspondence of semantic brain responses across verbal and visual modalities and across participants. The work investigates the extent to which sequences of semantic features derived from visual input can be used to generate accurate verbal stories that capture that input. With respect to the question of how LLMs derive meaning without perceptual experience, Huth’s findings suggest a great deal of overlap between the meanings derived purely from language and those derived from visual input.
Laura Gwilliams’s research has focused on determining the brain regions involved in the processes underlying speech comprehension, from the decoding of acoustic signals to the construction of high-level aspects of meaning. She applies time-resolved analysis methods to examine the simultaneous presence and duration of different types of language features in different brain areas. For instance, some of her work on decoding acoustic speech signals has demonstrated that when processing words in natural speech, several phonemes are maintained in parallel while coding their timing to preserve order (Gwilliams et al., 2022). In very recent work, she has investigated the time course of simultaneous activation at different levels of the language hierarchy (e.g., phonetic vs. grammatical vs. semantic) to examine the extent to which the activity that can be decoded follows the relevant input or anticipates it. For instance, when hearing a sentence like “I wanted to eat the dessert,” it is possible that after perceiving “eat,” rather than waiting to hear “the dessert,” the brain anticipates that a direct object noun referring to something edible will follow, allowing for the decoding of these grammatical and semantic features in the brain before the relevant words are perceived. To investigate these semantic properties, she uses LLM analyses to code the semantic features of upcoming words and relates these features to signals from magnetoencephalography (MEG). Her work could inform the construction of automated speech-recognition systems like Siri, using the timing and predictions of different types of representations observed in the human brain to improve automated recognition.
It is important to note that the work of both Huth and Gwilliams has opened up the possibility for applications that use patterns of brain representations of semantic information to generate speech output for individuals with deficits in language production, with Huth exploring this for individuals with aphasia and Gwilliams for those with autism.
Best practices for evaluating LLMs
Anna “Anya” Ivanova has directly engaged with the issues of the overlap in LLM processing of language and thought and that of humans. She comes from a cognitive neuroscience perspective in which neuroimaging and neuropsychological studies have demonstrated brain regions specific to language and other areas specific to other aspects of cognition, such as mathematical processing, logical reasoning, and the use of real-world knowledge (Fedorenko et al., 2024). With respect to LLMs, she and her colleagues have argued for a distinction between formal competence and functional competence—whereby LLMs are good at formal competence in language but not at functional competence in using language in real-world situations (Mahowald et al., 2024). She points to the need to distinguish the accomplishments of LLMs given general language input from those given specialized training regimes. In some domains, LLMs have been augmented with other modules to provide better functional competence, for instance, by adding mathematical processing modules. Ivanova’s presentation will discuss these limitations of LLMs and how psychologists and neuroscientists can help design best practices for accurate evaluation of LLM capabilities (Ivanova, 2025).
Current LLMs have demonstrated remarkable power in carrying out language tasks and may have utility in practical tasks and as research tools. Many questions remain, though, about the similarities and differences between language and thought for humans vs. LLMs. The speakers on this panel will present research relevant to these issues and provide insights as to what the future might hold for these research directions.
Fedorenko, E., Ivanova, A. A., & Regev, T. I. (2024). The language network as a natural kind within the broader landscape of the human brain. Nature Reviews Neuroscience, 25, 289–312.
Gwilliams, L., King, J. R., Marantz, A., & Poeppel, D. (2022). Neural dynamics of phoneme sequences reveal position-invariant code for content and order. Nature Communications, 13, 6606.
Huth, A. G., de Heer, W. A., Griffiths, T. L., Theunissen, F. E., & Gallant, J. L. (2016). Natural speech reveals the semantic maps that tile human cerebral cortex. Nature, 532, 453–458.
Ivanova, A. A. (2025). How to evaluate the cognitive abilities of LLMs. Nature Human Behaviour, 9, 230–233.
Mahowald, K., Ivanova, A. A., Blank, I. A., Kanwisher, N., Tenenbaum, J. B., & Fedorenko, E. (2024). Dissociating language and thought in large language models. Trends in Cognitive Sciences, 28(6), 517–540.
Popham, S. F., Huth, A. G., Bilenko, N. Y., Deniz, F., Gao, J. S., Nunez-Elizalde, A. O., & Gallant, J. L. (2021). Visual and linguistic semantic representations are aligned at the border of human visual cortex. Nature Neuroscience, 24, 1628–1636.