Cover Story

How Sound Becomes Music

Music plays many roles. It can awe a concert hall full of adoring fans, woo a would-be lover, or soothe a fussing child — and psychological scientists are discovering just how deep our connection with this intimate art form goes.

It may seem almost trivial to say that music is universal; every known culture in the world seems to have something that ethnographers and ethnomusicologists would describe as music. What’s far more interesting is how music is universal, says psychological scientist Samuel Mehr, principal investigator at Harvard University’s Music Lab.

By analyzing the lab’s Natural History of Song database, an archive of nearly 5,000 songs and performances from more than 100 societies across the globe, Mehr and his team have been able to distill the musical complexities of a given song into a few key dimensions, such as level of formality, religious/secular purpose, and positive/negative affect. These dimensional labels, Mehr continues, create intuitive clusters of songs that align with the most common genres of music, including songs traditionally accompanied by dancing, ceremonial healing songs, love songs, lullabies, and spiritual or religious music.

This tells us something about the basic types of music that exist across cultures, he said, and although the evidence is mixed on whether every culture produces every type of music, there is always a certain amount of musical variety within a culture.

“In our data, no culture produces only very formal songs and no informal songs, or only very religious songs and no secular songs, so what that suggests is that not only is music universal in a trivial sense in that it turns up everywhere, but there seem to be key ways that music is patterned similarly across human societies,” Mehr explains.

The universality of these patterns means that people are often able to identify the intended purpose of songs from cultures they may otherwise have little or no experience with. In an online study with 750 participants from 60 countries, listeners rated the perceived purpose of a random sampling of 36 song excerpts from the Natural History of Song archives. The excerpts were drawn from a larger set of 118 songs in 75 languages from 86 small-scale hunter-gatherer, pastoral, and subsistence-farming societies. Participants, who were completely unfamiliar with the societies from which these songs originated, rated their perceptions of the songs’ functions, and their ratings correlated highly with what the songs were actually used for in the societies from which they were gathered. Participants rated dance songs highly on the dimension “used for dancing,” lullabies highly on the dimension “used to soothe a baby,” and healing songs highly on the dimension “used to heal illness.” They weren’t able to do so for love songs, however.
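For readers curious about how such a comparison can be tallied, here is a minimal Python sketch. It is not the researchers' analysis: the ratings, the rating scale, and the function names (`mean_ratings`, `best_matching_dimension`) are all invented for illustration. The sketch averages listener ratings for each pairing of a song's actual function and a rated dimension, then reports which dimension listeners scored highest for each song type.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy ratings: (actual function, rated dimension, rating).
# The values and the scale are invented for illustration; they are NOT study data.
TOY_RATINGS = [
    ("dance", "dance", 5), ("dance", "lullaby", 1), ("dance", "healing", 2),
    ("lullaby", "dance", 1), ("lullaby", "lullaby", 5), ("lullaby", "healing", 3),
    ("healing", "dance", 2), ("healing", "lullaby", 3), ("healing", "healing", 4),
]


def mean_ratings(ratings):
    """Average rating for each (actual function, rated dimension) pair."""
    buckets = defaultdict(list)
    for actual, dimension, rating in ratings:
        buckets[(actual, dimension)].append(rating)
    return {key: mean(values) for key, values in buckets.items()}


def best_matching_dimension(ratings):
    """For each actual function, return the dimension listeners rated highest."""
    means = mean_ratings(ratings)
    best = {}
    for (actual, dimension), value in means.items():
        if actual not in best or value > means[(actual, best[actual])]:
            best[actual] = dimension
    return best


if __name__ == "__main__":
    for function, dimension in best_matching_dimension(TOY_RATINGS).items():
        print(f"{function} songs were rated highest on: {dimension}")
```

In this toy data set, every song type is rated highest on its own dimension, which is the pattern the study reported for dance songs, lullabies, and healing songs.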

Samples of the musical genres used in the Natural History of Song study:

Mentawai Dance Music (Mentawai Archipelago, Indonesia)
Iroquois Healing Music (Six Nations Reserve, Ontario, Canada)
Highland Scots Love Music (Outer Hebrides, Scotland)
Nyangatom Lullaby (South Omo, Ethiopia)

In a follow-up study of 1,000 participants, half of whom lived in India and half of whom lived in the United States, Mehr and colleagues found that while all four song types showed reliable differences in their musical features, some were more distinct than others. Participants rated the same set of dance songs and lullabies as having the most distinctive musical profiles; dance songs’ more numerous singers and instruments and greater complexity were easily distinguished from the simpler and often female-led style of lullabies. Errors also appeared to happen nonrandomly: when participants identified a healing or love song as having a different function, for example, it was often because the song possessed features typical of another genre, or the genre was less distinct in general.

Songs that share a social function may take a similar form because those musical features help amplify the music’s social signal, Mehr writes in Current Biology. Drawing numerous singers and instrumentalists into a performance may help reinforce dance songs’ coalition-building effects; lullabies’ slow melodies, on the other hand, can have a calming effect that helps lower arousal in young children.

These soothing songs aren’t just for putting children to sleep, however — in research published in Psychological Science, Mehr and colleagues found that infants use lullabies and other melodies to orient themselves in their newfound social environment.

In a study of 32 infants, the researchers asked parents to sing one of two lullabies to their children at home over the course of 1 to 2 weeks. At the end of this period, in which the parents reported singing the lullaby an average of 76 times, the 5-month-old babies then viewed a pair of videos in which two strangers sang those same songs. Although they were equally attentive to both strangers while they were singing, the infants looked at the singer of the familiar song for longer after they had finished singing, and infants who had heard their assigned lullaby the greatest number of times looked at this singer the longest.

Two later studies, with another group of 64 infants, revealed that infants who received either a musical toy or video calls in which someone sang to them over a 2-week period did not show this attentional effect.

Together, these results suggest that infants may use the songs produced by their parents and others close to them to learn about their social world, Mehr says. Much like remembering the native language spoken around them, remembering the songs their parents sing may help infants determine who is most likely to provide care.

Do You Hear What I Hear?

Drawing from these findings, some researchers, including APS Fellow William Forde Thompson of Macquarie University in Australia, have proposed that the similarities between humans’ linguistic and musical abilities suggest that the functions may have arisen from a common protolanguage.

Evolutionary biologist Charles Darwin was among the first to suggest the possibility that a musical protolanguage split into both referential speech and emotive music at some point in humans’ evolutionary history.

This simpler system of communication, Thompson explains, might have been similar to the way mammals such as vervet monkeys communicate threat information to other members of their troop. These primates make a specific nervous sound when there is a snake in the grass, for example, prompting members of their troop to climb up into the trees; another sound signals the presence of a predatory eagle, prompting the monkeys to avoid more exposed branches.

Thompson’s work with people who have congenital amusia, or tone deafness, also suggests a connection between the faculties responsible for human musicality and those responsible for emotional speech. In one study on emotional prosody (the changes in tone throughout an utterance that reflect the emotional state of the speaker), 12 participants previously diagnosed with amusia and 12 with no diagnosis were tasked with distinguishing between sets of 16 phrases that communicated happiness, sadness, fear, irritation, tenderness, and no emotion.

Overall, the amusic group correctly classified the emotional prosody of a statement in 77% of trials on average, making them 10% less accurate than their peers without amusia. Accuracy varied considerably across emotions, however. While people with and without amusia were able to detect fear and emotionally neutral statements with similar levels of accuracy, tone-deaf participants were 20% less accurate at detecting happiness, and they also struggled to differentiate among sadness, tenderness, and irritation.

Similar to Mehr’s findings on musical features common across cultures, these misinterpretations seemed to reflect auditory overlap in the emotional prosody of the statements. Participants were most likely to confuse emotion statements that had similar intensity and vocal duration, mistaking sadness for tenderness, irritation for fear, and happiness for neutrality.

Amusia does not reduce an individual’s ability to understand the linguistic content of speech, Thompson and colleagues note, suggesting that emotional communication may represent a fundamental link between the domains of music and language.

Metalheads, Metal Minds

The types of music and the purposes they serve may share similarities across cultures, but our individual responses to music can vary widely, and niche genres such as death metal provide an illustrative example of this. Songs in this subgenre often contain lyrics depicting real or imagined moments of extreme violence, and fear about the effects that such music could have on its listeners made it a prime target of the “Satanic Panic” in the United States during the 1980s. But Thompson’s research suggests that death-metal fans may not be thrashing to their favorite songs for the reasons critics once thought.

For one thing, Thompson said, fans and nonfans of a particular genre can experience that music very differently. The 97 college students in his study who were not fans of death metal reported feelings of tension, fear, and anger after listening to songs like “Hammer Smashed Face” by Cannibal Corpse and “Waiting for the Screams” by Autopsy. But the 48 students who self-identified as death-metal fans reported a far more positive experience, including feelings of power, joy, peace, wonder, nostalgia, and even transcendence.

On average, death-metal fans scored slightly lower than nonfans on the personality traits of agreeableness and conscientiousness (as measured by the Big Five Inventory) but similarly on the Interpersonal Reactivity Index, suggesting no between-group differences in empathy for others. Fans who scored highest on openness were also more likely to report feeling higher levels of power and joy.

When asked to describe the musical features of death metal, fans were also more likely to bypass the graphic lyrics — which nonfans described as “gruesome and intense” — to focus on more technical elements, such as the “evocative…fast-paced tempo, down-tuned instruments and blast beats.”

Overall, Thompson writes, these findings suggest that people listen to music for many different reasons, and they can experience that music quite differently from how we might expect. This should make us think twice about stereotyping individuals according to their listening habits, he said.

Time After Time

The fact that art is subjective doesn’t mean that we’re as likely to consider any collection of sounds as musical as any other, of course. There are certain features, says psychological scientist Elizabeth Hellmuth Margulis, director of the Music Cognition Lab at the University of Arkansas, that encourage our brains to perceive a given collection of sounds as music — namely, repetition.

Repetition. Repetition. Repetition. Repetition.

Annoying, right? Repetition in speech, and in writing, often strikes us as grating — and can even cause words to “lose their meaning,” in the case of semantic satiation. But Margulis’s research suggests that this same feature can turn an otherwise simple series of sounds into music.

Previous research by APS Fellow Diana Deutsch (University of California, San Diego) has shown that simply repeating a spoken phrase can shift people’s perception of the utterance from speech to song, a phenomenon known as the speech-to-song illusion. Deutsch did not find evidence of this effect, however, when the syllables of the phrase were presented out of order.

In a recent follow-up study, Margulis and collaborator Rhimmon Simchy-Gross (University of Arkansas) investigated how this illusion might extend to nonspeech sounds. The researchers had 58 students listen to environmental sounds, from bumblebees buzzing to the crackle of breaking ice and the scraping of a shovel being dragged across the ground. Each of the 20 sounds was repeated seven times to form a 10-second clip. After listening to the original, untransformed clip, some students then heard the same clip eight more times, while other students heard eight “jumbled” versions of the clip in which the sound was looped and interrupted at different time points. After each 10-second clip, participants rated the clip’s musicality on a scale of 1 (sounds exactly like environmental stimuli) to 5 (sounds exactly like music).

As in Deutsch’s study, Margulis found that participants rated stimuli as being more musical the more the clips were repeated. Unlike in the previous study, participants also reported experiencing this “sound-to-music illusion” regardless of how the sounds were transformed.

This suggests, Margulis writes, that the speech-to-song illusion may be a function of semantic satiation suppressing the meaning of repeated words and phrases, causing them to be perceived as more musical. When the semantic meaning of a phrase is disrupted by scrambling the syllables of its component words, satiation does not occur, and the musical effect vanishes. This doesn’t seem to be the case with environmental sounds, however.

“A succession of drops of water that has been rearranged is still just a succession of drops of water,” Margulis writes. “Rearranging individual components of the sound does not tend to alter this source identification.”

These phenomena highlight the foundational role that repetition serves in music, both within individual songs and in listening to the same songs over and over again, Margulis notes in Frontiers in Psychology. Similar to the rituals we use to mark holidays and other significant events, repetition in music can cause us to enter a “special mental state” in which we focus on the lower-level properties of an action or stimulus (in this case, the changes in tone and rhythm throughout a familiar song or sound). Repeated listening can also cause us to experience an attentional shift toward larger-scale elements, such as lyrical phrases and song structure, that might be lost on a first-time listener.

Great Expectations

Margulis, who studied piano at the Peabody Conservatory of Music, said that her experience as a performer inspires her to ask questions that get at some of the more elusive things that happen in the practice room, in the teaching studio, and on stage.

“Musicians tend to think about music in terms of gestures and metaphors, but researchers often think in terms of quantitative, measurable attributes,” Margulis says. “The really exciting stuff happens when you find ways to move back and forth between these modes of understanding.”

Jonathan Berger, a composer and professor of music at Stanford University’s Center for Computer Research in Music and Acoustics (CCRMA), has an ear in both worlds as well.

Berger’s work as a composer ranges from the monodrama “My Lai,” a one-man show that reflects on the massacre of more than 500 civilians at the hands of American soldiers in the village of My Lai during the Vietnam War, to electronica and string quartets like “Swallow,” a five-movement piece inspired by the chirps, whines, and gurgles these birds use to communicate. Through his Music Engagement Research Initiative (MERI), Berger, a self-described “amateur researcher,” has shone a light on the neural mechanisms underlying music perception and performance and on how listeners engage with individual songs.

In one exploratory study led by CCRMA Research Scientist Blair Kaneshiro, 13 musicians with at least 5 years of training listened to a cello concerto while their physiological responses were measured. The results showed that the musicians’ cortical responses (measured by electroencephalogram), respiratory rate, and galvanic skin response, but not heart rate, correlated with the musical high points in the concerto, such as the first entrance of a cello, an unexpected pause, or an orchestral climax.

Those data were gathered from a relatively small group of participants listening to just one song, though; to further pursue the question of how people engage with music in their day-to-day lives, Berger, Kaneshiro, and colleagues turned to Shazam, a website and mobile app that helps users identify the music playing around them. The researchers analyzed 188.3 million time-stamped queries related to the top 20 songs of 2015’s Billboard Year-End Hot 100 chart, which included such hits as “Shut Up and Dance” by Walk the Moon and “Can’t Feel My Face” by The Weeknd. The data revealed that users were most likely to search for a song shortly after the onset of vocals and the start of the first chorus. But the relationship between salient musical events and listener engagement continued to evolve throughout the “life cycle” of a song: As songs became more popular, listeners began searching for them as soon as they started.
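To make the idea of analyzing time-stamped queries concrete, here is a small Python sketch of one way such data could be tallied. It is an illustration under stated assumptions, not the researchers' method: the offsets are invented, and the names `query_histogram`, `peak_bin`, and the 5-second bin width are hypothetical choices. Each number is taken to be how many seconds into a song a query occurred.

```python
from collections import Counter

# Hypothetical query offsets, in seconds from the start of one song.
# These numbers are invented for illustration; they are NOT Shazam data.
TOY_OFFSETS = [2, 11, 12, 13, 14, 16, 40, 42, 90]


def query_histogram(offsets_sec, bin_width=5):
    """Count queries per time bin, keyed by the bin's start time in seconds.

    Assumes non-negative offsets.
    """
    counts = Counter((int(t) // bin_width) * bin_width for t in offsets_sec)
    return dict(sorted(counts.items()))


def peak_bin(offsets_sec, bin_width=5):
    """Start time (seconds) of the bin with the most queries.

    If several bins tie, the earliest one is returned.
    """
    hist = query_histogram(offsets_sec, bin_width)
    return max(hist, key=hist.get)


if __name__ == "__main__":
    for start, count in query_histogram(TOY_OFFSETS).items():
        print(f"{start:>3}s-{start + 5:>3}s: {'#' * count}")
    print("Busiest bin starts at", peak_bin(TOY_OFFSETS), "seconds")
```

In this toy example the busiest bin starts 10 seconds in; comparing peaks like this against annotated musical events, such as the onset of vocals, is the general kind of question the study pursued at far larger scale.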

The researchers note that the timing of user queries doesn’t necessarily mean that those musical features were the most interesting points of these songs, just that they were the first point in the song interesting enough to compel listeners to learn more.

These points of interest, whether spurred by an instrumental entrance, a vocal feat, or a beat drop, often elicit one essential reaction: surprise.

“Manipulation of expectation is at the very core of a composer’s craft,” Berger explains.

Musicians intuitively create these moments of surprise by changing tone, volume, and timing throughout a piece, but the actual processes through which these expectations are formed or violated remain mysterious, he said. Although research on this fundamental aspect of musical engagement has been limited, Berger and others continue to seek new methods for bringing this and other acoustic wonders into the lab.


Deutsch, D., Henthorn, T., & Lapidis, R. (2011). Illusory transformation from speech to song. The Journal of the Acoustical Society of America, 129, 2245–2252. doi:10.1121/1.3562174

Kaneshiro, B., Nguyen, D. T., Dmochowski, J. P., Norcia, A. M., & Berger, J. (2016). Neurophysiological and behavioral measures of musical engagement. Proceedings of the 14th International Conference on Music Perception and Cognition (pp. 41–47). Retrieved from

Kaneshiro, B., Ruan, F., Baker, C. W., & Berger, J. (2017). Characterizing listener engagement with popular songs using large-scale music discovery data. Frontiers in Psychology, 8, Article 416. doi:10.3389/fpsyg.2017.00416

Margulis, E. H. (2013). Repetition and emotive communication in music versus speech. Frontiers in Psychology, 4, Article 167. doi:10.3389/fpsyg.2013.00167

Mehr, S. A., Singh, M., York, H., Glowacki, L., & Krasnow, M. M. (2018). Form and function in human song. Current Biology, 28, 356–368. doi:10.1016/j.cub.2017.12.042

Mehr, S. A., Song, L. A., & Spelke, E. S. (2016). For 5-month-old infants, melodies are social. Psychological Science, 27, 486–501. doi:10.1177/0956797615626691

Simchy-Gross, R., & Margulis, E. H. (2018). The sound-to-music illusion. Music & Science, 1. doi:10.1177/2059204317731992

Thompson, W. F., Geeves, A. M., & Olsen, K. N. (2018). Who enjoys listening to violent music and why? Psychology of Popular Media Culture. Advance online publication. doi:10.1037/ppm0000184

Thompson, W. F., Marin, M. M., & Stewart, L. (2012). Reduced sensitivity to emotional prosody in congenital amusia rekindles the musical protolanguage hypothesis. Proceedings of the National Academy of Sciences USA, 109, 19027–19032. doi:10.1073/pnas.1210344109

Thompson, W. F., & Russo, F. A. (2007). Facing the music. Psychological Science, 18, 756–757. doi:10.1111/j.1467-9280.2007.01973.x

Thompson, W. F., Schellenberg, E. G., & Husain, G. (2001). Arousal, mood, and the Mozart effect. Psychological Science, 12, 248–251. doi:10.1111/1467-9280.00345
