How do we pick the voice from the crowd? Focus, my friend, focus.

Aug 01 2012 Published by under Behavioral Neuro

Imagine for a minute. You're in a coffee shop, or a bar, or at a swanky cocktail party (whichever you prefer). There are people around, chatting nearby. But you're speaking to the person directly across from you. Somehow, you can pick their voice out of the chatter and attend to what they are saying, even though the conversations around you might be just as loud as, or louder than, the one you're interested in (especially in a bar!).

Have you ever wondered how you do that? I know I have. It's kind of a mind-boggling problem (and is, in fact, called the cocktail party problem): separating out one stream of speech and making sense of it amid all the noise. And it's not just something for us humans to think about. Voice recognition and recording technologies wrestle with this all the time: how do you pick out the voice from the crowd?

As it turns out, it's all about attention, and how that attention can change your brain.

Mesgarani and Chang. "Selective cortical representation of attended speaker in multi-talker speech perception" Nature, 2012.

The authors of this study were interested in what happens in the brain when someone tries to pick out a single speaker in a room full of people. To look at this, they used electrodes implanted subdurally (beneath the dura mater, the tough membrane covering the outside of the brain) in three human patients. Three is a really small number, but they had to use patients who were already receiving these electrode implants clinically (in this case, as part of their epilepsy treatment) and who were known to have normal hearing and language skills.

These subdural electrodes were placed over the cortex, in particular over the superior part of the temporal lobe (the superior temporal gyrus).


This area is not primary auditory cortex but secondary auditory cortex, and it is closely tied to regions that are important for speech perception. Electrodes implanted here could detect the activity of populations of neurons in the area, and they showed reliable changes in response to sound.

They then had the patients listen to speech samples from two different voices. The sentences were nonsensical (an example from the paper is "ready tiger go to red two now"), but each contained a call sign and a color-number combination that the patients had to identify (in this example, the call sign is "tiger" and the color-number combination is "red two"). The patients had to indicate which call signs and sequences they heard, showing that they could hear and understand the speech. They first heard a male and a female voice separately, each saying similar sentences with similar call signs and color-number combinations, and, unsurprisingly, they were very good at the task (100% accuracy).

Then the subjects listened to a mixture of the two voices speaking at the same time, saying the same types of words, and were told to pick out the call signs. The target speaker varied from trial to trial, and so did the target call sign, so the patients had to listen carefully to keep up. At this point, their performance dropped to about 75% accuracy, but that's still not bad.

Above you can see a visual representation of what the sounds looked like when presented individually and when overlapped (i in the figure). The two voices overlap a great deal, even though they differ in gender, pitch, and intonation, but when the patients attended to one of them (the thin lines displayed in i), they were able to pick it out and focus on the differences between the voices.

But what is going on in the brain? When each voice was played on its own, the electrodes picked up a pattern (shown below as the dotted lines for speaker 1 and speaker 2).

You can see that when the patients were given the mixture and told to attend to one voice or the other, their electrode responses took on forms very similar to those recorded when that speaker was speaking alone (the solid lines). In particular, they showed an increased response to the high-frequency sounds and a suppressed response to the sounds of the other speaker.

So it appears that when you have to pay attention to a single speaker in a group (or, in this case, a pair), your brain can emphasize the signals related to what you recognize as that speaker. This is probably more than just paying better attention; it probably also involves stronger responses to that person's particular speech patterns and intonations that you might recognize (say, if the voice is female, you'd be more sensitive to that pitch range). The neural signals also suggested that when the patients made mistakes in the task, it was because their brains were "losing track" of the features they were supposed to attend to, and the responses from the electrodes faltered.

This means that picking out speech isn't just a response to sounds in your environment; it's also your brain picking out the parts of those sounds that make them relevant to you as speech, probably through several processes.

Of course, it was a simple study: it dealt with only two voices, not a crowd. It'd be interesting to see how much the attended signal stands out as the number of voices increases. And of course, the sample was very small and came from a group that isn't a standard control population, so there might be some differences in their speech processing compared to people without epilepsy (though that seems unlikely here, since their hearing and speech were normal). But I personally think this is a really cool study, and it provides some interesting insights into how our brains allocate attention where it is needed. So the next time you're in a crowded room, shouting to someone over the noise, don't worry. Their brain will help them pick you out, and with any luck they'll even understand what you're saying.

Mesgarani N, & Chang EF (2012). Selective cortical representation of attended speaker in multi-talker speech perception. Nature, 485 (7397), 233-6 PMID: 22522927

17 responses so far

  • Lars says:

    Thank you! I am autistic (Asperger's), and I absolutely cannot do this. I've long suspected it to be some sort of processing deficit, and now I have a study I can point to.

  • Jacob Farkas says:

    This comment is prompted by the one posted by Lars, above.
    I'm 72 years old. From the time I was a toddler, I could not participate in group discussions when people would talk over each other. I could not tolerate cocktail parties or meeting with people in bars; I would do fine at a small dinner party with another couple. This was a multifaceted handicap. People thought I was 'standoffish'.
    When I was growing up, people did not recognize ADHD or autism. Since I did not exhibit any traditional mental health issues, my teachers and others thought I was misbehaving.
    Fortunately, I found a career which required constantly demonstrable high intelligence and excellent analytical skills. I was successful, so I was able to use money and reputation to overcome some of my social shortcomings.
    I hope you continue your research so that others who have the same deficits and aren't as fortunate as I am can get some help -- or, at least, understanding.

  • Michael says:

    Sadly, it is not a very effective process when you have reduced signals from a cochlear implant or hearing aid.

  • Grant says:

    My personal experience is that I can't pick a particular voice out of a crowd, and that this is common for those deaf in one ear. While attentiveness is used in tracking a voice, as in the study, isn't a precondition of picking out one voice in the first place having hearing in both ears?

    (For those with difficulty hearing in a crowd, it might be worth checking your hearing - it's fairly common for people to not realise that they have poor hearing in one ear.)

    While I'm writing, anyone interested in contributing to a blog carnival about disability is welcome to offer posts (excuse my hijacking this thread briefly, but it's for a good cause! 🙂 Please pass this on!)


  • Kathryn says:

    Lars, I have the same issue (but I'm not autistic, so perhaps it's not a symptom of autism?). I've always hated going to crowded parties because I cannot hear what anyone is saying to me. I always thought I was simply tone deaf, but apparently that might not be the case.

  • Rosemary says:

    I lost this ability for over a year after I received a severe concussion. Being in a crowded room was torture, and even after a year, being in a crowded restaurant for more than an hour led to feelings of spaciness and an inability to focus on anything.

  • Lars says:

    I'm not saying this is definitely due to my autism. But most autistic people have additional neurological problems. Some are more closely correlated with autism than others (I have a couple that are definitely unrelated).

  • Brin says:

    Have you ever wondered how you do that?

    Err...I don't. On one occasion, I had my ear only three inches from my mom's mouth and still couldn't make out what she was saying over the chatter. I don't think I even knew that the ability was common until reading this post (so thanks for that, I suppose).

  • ian says:

    I have a neuroma which has shut down the auditory nerve from my left ear to my brain. I have noticed an enormous drop in my ability to discriminate sounds, and a crowded room is extremely difficult to manage with conversation. I am guessing that two ears and two sound channels provide much more focus on a sound source than we might suppose.

  • [...] Science ponders: How do we pick out a voice in a crowd? (why is an oboe not a [...]

  • Paul says:

    Unlike some of the other respondents, I know this effect works; I used to be quite good at it. Then I got tinnitus, and now noisy environs like restaurants are sheer torment. It doesn't even need to be particularly noisy: the whisper of air from an aircon duct in a conference room will mean I can't understand anything said. I have always imagined that my brain is too busy filtering out the infernal whistle that's always in my ears to be able to effectively tune into a speaker.

  • John says:

    I think I have the same problem -- that is, difficulty following a conversation in the presence of background noise.

    I've had it for years, but I was diagnosed with Multiple Sclerosis a couple of years ago.

    Nothing can be done about it, but I feel better knowing there's a possible explanation.

  • [...] How do we pick out a voice from a crowd? [...]

  • [...] Science ponders: How do we pick out a voice in a crowd? (ensemble musicians need to be good at it!) More about “cocktail party [...]

  • [...] to new research from two scientists at the University of California San Francisco, Dr. Nima Mesgarani and Dr. [...]

  • [...] in filtering out the unimportant conversations at a cocktail party. First, there's the matter of attention. We know that the voice you're paying attention to and focusing on in a crowded space will elicit [...]

  • Yvette says:

    This article answers my curiosity somewhat.

    I can recognize who is on the phone after hearing them call a couple of times. I didn't think anything of it until they called again, I asked whether it was them, and they were shocked and said, "That's amazing." I can't do it with everybody, but thinking about it, these people have distinctive voices to me, as the article says.
    When I was a young girl, my musician father said that I had a good ear. Over the last 10 years, after doing spiritual therapy, I noticed that I had unconsciously become more attentive to every part of a musical piece. And when I hear a particular part of a musical piece, or a whole piece, I know that I have heard it in a movie, and I feel so emotionally affected by it that I want to find the movie.
