CARTA: How Language Evolves: Language in The Brain


– [automated message] This UCSD TV program
is presented by University of California Television. ♪ [music] ♪
– [male] We are the paradoxical ape,
bipedal, naked, large-brained, long the master of fire, tools, and language, but
still trying to understand ourselves. Aware that death is inevitable, yet filled
with optimism. We grow up slowly. We hand down knowledge. We empathize and deceive.
We shape the future from our shared understanding of the past. CARTA brings
together experts from diverse disciplines to exchange insights on who we are and how
we got here. An exploration made possible by the generosity of humans like you. ♪ [music] ♪
– [Evelina] Thanks very much for having me
here. So today, I will tell you about the language system and some of its properties
that may bear on the question of how this system came about. I will begin by
defining the scope of my inquiry, because people mean many different things by
language. I will then present you some evidence, suggesting that our language
system is highly specialized for language. Finally I will speculate a bit on the
evolutionary origins of language. So linguistic input comes in through our
ears or our eyes. Once it’s been analyzed perceptually, we interpret it by linking
the incoming representations to our stored language knowledge. This knowledge,
of course, is also used for generating utterances during language production.
Once we’ve generated an utterance in our heads, we can send it down to our
articulation system. The part that I focus on is this kind of high-level component of
language processing. This cartoon shows you the approximate
locations of the brain regions that support speech perception, shown in
yellow, visual letter and word recognition in green. This region is known as the
visual word-form area. Articulation in pink and high-level language processing in
red. What differentiates the high-level
language processing regions from the perceptual and articulation regions is
that they’re sensitive to the meaningfulness of the signal. So for
example, the speech regions will respond just as strongly to foreign speech as they
respond to meaningful speech in our own language. The visual word-form area
responds just as much to a string of consonants as it does to real words. The
articulation regions can be driven to a full extent by people producing
meaningless sequences of syllables. But in contrast, the high-level
language-processing system or network, or sometimes I refer to it just as the
language network for short, cares deeply about whether the linguistic signal is
meaningful or not. In fact, the easiest way to find this
system in the brain is to contrast responses to meaningful language stimuli,
like words, phrases, or sentences, with some control condition, like linguistically
degraded stimuli. The contrast I use most frequently is
between sentences and sequences of non-words. A key methodological innovation
that laid the foundation for much of my work was the development of tools that
enable us to define the regions of the language network, functionally at the
individual subject level, using contrasts like these. Here, I’m showing you sample
language regions in three individual brains. This so-called functional
localization approach has two advantages. One, it circumvents the need to average
brains together, which is what’s done in the common approach, and it’s a very
difficult thing to do because brains are quite different across people. Instead, in
this approach, we can just average the signals that we extract from these key
regions of interest. The second advantage is that it allows us
to establish a cumulative research enterprise, which I think we all agree is
important in science, because comparing results across studies and labs is quite
straightforward if we’re confident that we’re referring to the same brain regions
across different studies. This is just hard or impossible to do in the
traditional approach, which relies on very coarse anatomical landmarks, like the
inferior frontal gyrus or the superior temporal sulcus, which span many
centimeters of cortex and are just not at the right level of analysis.
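The localization logic described here can be sketched in a few lines of Python. This is a minimal illustration, not the lab's actual pipeline: the voxel responses are simulated, and the function names (`simulate_voxel`, `localize`, `roi_response`), the run split, and the 10% threshold are all assumptions made for the example. Half of the runs define a subject's region from the sentences-versus-non-words contrast, and the other half, which played no part in the definition, measure the response.

```python
import random
from statistics import mean


def simulate_voxel(selective, n_runs, rng):
    """Return per-run (sentences, nonwords) responses for one made-up voxel."""
    runs = []
    for _ in range(n_runs):
        nonwords = rng.gauss(0.2, 0.3)
        # Hypothetical language-selective voxels respond more to sentences.
        sentences = nonwords + (1.0 if selective else 0.0) + rng.gauss(0.0, 0.3)
        runs.append((sentences, nonwords))
    return runs


def localize(voxels, run_ids, top_fraction=0.1):
    """Pick the voxels with the largest sentences > nonwords effect in run_ids."""
    def effect(v):
        return mean(voxels[v][r][0] - voxels[v][r][1] for r in run_ids)

    ranked = sorted(range(len(voxels)), key=effect, reverse=True)
    return ranked[: max(1, int(len(voxels) * top_fraction))]


def roi_response(voxels, roi, run_ids):
    """Average (sentences, nonwords) response of the region in held-out runs."""
    sentences = mean(voxels[v][r][0] for v in roi for r in run_ids)
    nonwords = mean(voxels[v][r][1] for v in roi for r in run_ids)
    return sentences, nonwords


rng = random.Random(0)
# One simulated subject: 200 voxels, the first 20 language-selective (assumed).
voxels = [simulate_voxel(i < 20, n_runs=8, rng=rng) for i in range(200)]
define_runs, test_runs = [0, 2, 4, 6], [1, 3, 5, 7]

roi = localize(voxels, define_runs)                # defined in one half of the data
sent, nonw = roi_response(voxels, roi, test_runs)  # measured in the other half
print(f"held-out response: sentences={sent:.2f}, nonwords={nonw:.2f}")
```

Because each person's region is defined in that person's own data, only the extracted responses need to be averaged across subjects, not the brains themselves.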
So what drives me and my work is the desire to understand the nature of our
language knowledge and the computations that mediate language comprehension and
production. However, these questions are hard, especially given the lack of animal
models for language. So for now, I settle on more tractable questions. For example,
one, what is the relationship between the language system and the rest of human
cognition? Language didn’t evolve, and it doesn’t exist in isolation from other
evolutionarily older systems, which include the memory and attention
mechanisms, the visual and the motor systems, the system that supports social
cognition, and so on. That means that we just can’t study language as an isolated
system. A lot of my research effort is aimed at trying to figure out how language
fits with the rest of our mind and brain. The second question delves inside the
language system, asking, “What does its internal architecture look like?” It
encompasses questions like, “What are the component parts of the language system?
And what is the division of labor among them, in space and time?”
Of course, both of those questions ultimately should constrain the space of
possibilities for how language actually works. So they place constraints on both
the data structures that underlie language and the computations that are likely
performed by the regions of the system. Today, I focus on the first of these
questions. Okay. So now onto some evidence. So the relationship between
language and the rest of the mind and brain has been long debated, and the
literature actually is quite abundant with claims that language makes use of the same
machinery that we use for performing other cognitive tasks, including arithmetic
processing, various kinds of executive function tasks, perceiving music,
perceiving actions, abstract conceptual processing, and so on.
I will argue that these claims are not supported by the evidence. Two kinds of
evidence speak to the relationship between language and other cognitive systems.
There is brain-imaging evidence, and there are investigations
of patients with brain damage. In fMRI studies, we do something very simple. We
find our regions that seem to respond a lot to language, and then we ask how do
they respond to other various non-linguistic tasks. If they don’t show
much of a response, then we can conclude that these regions are not engaged during
those tasks. In the patient studies, we can evaluate non-linguistic abilities in
individuals who don’t have a functioning language system. If they perform well, we
can conclude that the language system is not necessary for performing those various
non-linguistic tasks. So starting with the fMRI evidence, I will show you responses
in the language regions to arithmetic, executive function tasks, and music
perception today. So here are two sample regions, the region of the inferior
frontal cortex around Broca’s area and the region in the posterior middle temporal
gyrus, but the rest of the regions of this network look similar in their profiles.
The region on the top is kind of in and around this region known as Broca’s area,
except I don’t use that term because I don’t think it’s a very useful term. In
black and gray, I show you the responses to the two localizer conditions, sentences
and non-words. These are estimated in data that’s not used for defining these
regions. So we divide the data in half, use half of the data to define the regions
and the other half to quantify their responses. I will now show you how these
regions respond when people are asked to do some simple arithmetic additions,
perform a series of executive function tasks, like for example, hold a set of
locations in spatial memory, spatial locations in working memory, or perform
this classic flanker task, or listen to various musical stimuli. For arithmetic
and various executive tasks, we included a harder and an easier condition, because we
wanted to make sure that we can identify regions that are classically associated
with performing these tasks, which is typically done by contrasting a harder and
an easier condition of a task. So I’ll show you now, in different colors,
responses to these various tasks, starting with the region on the lower part of the
screen. So we find that this region doesn’t respond during arithmetic
processing, doesn’t respond during working memory, doesn’t respond during cognitive
control tasks, and doesn’t respond during music perception. Quite strikingly to me
at the time, a very similar profile is observed around this region, which is
smack in the middle of so-called Broca’s area, which appears to be incredibly
selective in its response for language. Note that it’s not just the case that
these, for example, demanding tasks fail to show a hard versus an easy difference.
They respond pretty much at or below fixation baseline when people are engaged
in these tasks. So that basically tells you that these language regions work about as
much when you’re doing a bunch of math in your head or holding information in working
memory as they do when you’re looking at a blank screen. So they really
do not care. So of course, to interpret the lack of the response in these language
regions, you want to make sure that these tasks activate the brain somewhere else.
Otherwise, you may have really bad tasks that you don’t want to use. Indeed, they
do. So here, I’ll show you activations for the
executive function tasks, but music also robustly activates the brain outside of
the language system. So here are two sample regions, one in the right frontal
cortex, one in the left parietal cortex. You see the profiles of response are quite
different from the language regions. For each task, we see robust responses, but
also a stronger response to the harder than the easier condition across these
various domains. These regions turn out to be part of this bilateral fronto-parietal
network, which is known in the literature by many names, including the cognitive
control network or the multiple demand system, the latter term advanced by John
Duncan, who wanted to highlight the notion that these regions are driven by many
different kinds of cognitive demands. So these regions appeared to be sensitive to
effort across tasks, and their activity has been linked to a variety of
goal-directed behaviors. Interestingly if you look at the responses of these regions
to our language localizer conditions, we find exactly the opposite of what we find
in the language regions. They respond less to sentences than sequences of non-words,
presumably because processing sentences requires less effort. This
highlights, again, that the language and the cognitive control systems are clearly
functionally distinct. Moreover, damage to the regions of the
multiple demand network has been shown to lead to decreases in fluid intelligence.
So Alex Woolgar reported a strong relationship between the amount of tissue
loss in frontal and parietal cortices and a measure of IQ. This is not true for
tissue loss in the temporal lobes. It’s quite striking. You can actually calculate
for this many cubic centimeters of loss in the MD system, you lose so many IQ points.
It’s a strong, clear relationship. So this system is clearly an important part of the
cognitive arsenal of humans, because the abilities to think flexibly and abstractly
and to solve new problems, exactly the kinds of abilities that IQ
tests aim to measure, are considered
one of the hallmarks of human cognition. Okay. So as I mentioned, the complementary approach for addressing
questions about language specificity and relationship to other mental functions is
to examine cognitive abilities in individuals who lack a properly
functioning language system. Most telling are cases of global aphasia. So this is a
severe disorder which affects pretty much the entire fronto-temporal language system,
typically due to a large stroke in the middle cerebral artery, and leads to
profound deficits in comprehension and production. Rosemary Varley at UCL has
been studying this population for a few years now.
With her colleagues, she has shown that actually these patients seem to have
preserved abilities across many, many domains. So she showed that they have
intact arithmetic abilities. They can reason causally. They have good nonverbal
social skills. They can navigate in the world. They can perceive music and so on
and so forth. Of course, these findings are then consistent with the kind of
picture that emerges in our work in fMRI. Let’s consider another important
non-linguistic capacity, which a lot of people often bring up when I tell them
about this work. How about the ability to extract meaning from non-linguistic
stimuli? Right? So given that our language regions are so sensitive to meaning, we
can ask how much of that response is due to the activation of some kind of
abstract, conceptual representation that language may elicit, rather than something
more language-specific, a semantic representation type. So to ask these
questions, we can look at how language regions respond to nonverbal, meaningful
representations. In one study, we had people look at events like this or the
sentence-level descriptions of them, and either we had them do kind of a high-level
semantic judgment test, like decide whether the event is plausible, or do a
very demanding perceptual control task. Basically what you find here is, again,
the black and gray are responses to the localizer conditions. So in red, as you
would expect, you find strong responses to this and to the condition where people see
sentences and make semantic judgments on them. So what happens when people make
semantic judgments on pictures? We find that some regions don’t care at all about
those conditions, and other regions show reliable responses, but they’re much
weaker than those elicited by the meaningful sentence condition. So could it
be that some of our language regions are actually abstract semantic regions?
Perhaps. But for now, keep in mind that the response to the sentence-meaning
condition is twice as strong, and it is also possible that participants may be
activating linguistic representations to some extent when they encounter meaningful
visual stimuli. So to answer this question more definitively, we’re turning to the
patient evidence again. If parts of the language system are critical for
processing meaning in non-linguistic representations, then aphasic individuals
should have some difficulties with nonverbal semantics. First, I want to
share a quote with you from Tom Lubbock, a former art critic at The Independent, who
developed a tumor in the left temporal lobe which eventually killed him. As the
tumor progressed, and he was losing his linguistic abilities, he was documenting
his impressions of what it feels like to lose the capacity to express yourself
using verbal means. So he wrote, “My language to describe
things in the world is very small, limited. My thoughts, when I look at the
world, are vast, limitless, and normal, same as they ever were. My experience of
the world is not made less by lack of language, but is essentially unchanged.”
I think this quote quite powerfully highlights the separability of language
and thought. So in work that I’m currently collaborating on with Rosemary Varley and
Nancy Kanwisher, we are evaluating global aphasics’ performance on a wide
range of tasks, requiring you to process meaning in nonverbal stimuli. So for
example, can they distinguish between real objects and novel objects that are matched
for low-level visual properties? Can they make plausibility judgments for visual
events? What about events where plausibility is conveyed simply by the
prototypicality of the roles? So you can’t do this task by simply inferring that a
watering can doesn’t appear next to an egg very frequently. Right? It seems like the
data so far is suggesting that they indeed seem fine on all of these tasks, and they
laugh just like we do when they see these pictures because they’re sometimes a
little funny. So they seem to process these just fine. So this suggests, to me,
that these kinds of tasks can be performed without a functioning language system. So
even if our language system stores some abstract conceptual knowledge in some
parts of it, it tells me at least that that code must live somewhere else as
well. So even if we lose our linguistic way to encode this information, we can
have access to it elsewhere. So to conclude this part, fMRI and patient
studies converge in suggesting that the fronto-temporal language system is not engaged in
and is not needed for non-linguistic cognition. Instead, it appears that these
regions are highly specialized for interpreting and generating linguistic
signals. So just a couple minutes on what this means. So given this highly selective
response to language stimuli that we observe, can we make some guesses already
about what these regions actually do? I think so. I think a plausible hypothesis
is that this network houses our linguistic knowledge, including our knowledge of the
sounds of the language, the words, the constraints on how sounds and words can
combine with one another. Then essentially the process of language interpretation is
finding matches between the pieces of the input that are getting into our language
system and our previously stored representations. Language production is
just selecting the relevant subset of the representations to then convey to our
communication partner. This way . . . The form that this knowledge takes is a huge
question in linguistics, psychology, and neuroscience. So one result I don’t have
time to discuss is that contra some claims, it doesn’t seem to be the case
that syntactic processing is localized to a particular part of this language system.
It seems it’s widely distributed across. Anywhere throughout the system, you find
sensitivity to both word-level meanings and compositional aspects of language,
which is much in line with all current linguistic theorizing, which doesn’t draw
a sharp boundary between the lexicon and grammar. So this way of thinking about the
language system as a store of our language knowledge makes it pretty clear that the
system is probably not innate. In fact, it must arise via experience with language as
we accumulate this language store. It’s also presumably dynamic, changing all the
time as we get more and more linguistic input through our lifetimes. I assume that
our language knowledge is plausibly acquired with domain general statistical
learning mechanisms, just like much other knowledge. So what changed in our brains
that allowed for the emergence of this system? So one thing that changed is that
our association cortices expanded. So
these are regions that are not sensory or
motor regions and include frontal, temporal, and parietal regions. Okay. So
this, people have noted for a long time. I think I’m kind of in the camp of people
who think that our brains are not categorically different in any way.
They’re just scaled-up versions of other primate brains. I think there’s quite good
evidence for that. So how does the system emerge?
So I think one thing that was different between us and chimps is that there is a
protracted course to the brain development in humans. So between birth and adulthood,
our brains increase threefold, compared to just twofold in chimps. It’s a big
difference. Basically this just makes us exceptionally susceptible to environmental
influences, and we can soak stuff up from the environment very, very easily. So as
our brains grow, we make more glial cells. We make more synapses. Our axons continue
to grow and become myelinated, and it’s basically tissue that’s ready to soak up
the regularity that we see in the world. Of course, it comes at a cost. That’s why
we have totally useless babies that can’t do anything, but apparently somehow it was
worth. . . The tradeoffs were worth it. Okay. So the conclusions. We have this
system. It’s highly selective in its responses. It presumably emerges over the
course of our development and was enabled, probably, by some combination of the expansion
of these association cortices, where we can store vast amounts of symbolic
information and this protracted brain development, which makes us great learners
early on. Thank you. [applause] ♪ [music] ♪
– [Rachel] So I want to start this story
actually in the 1800s. So in 1800, a young physician named Itard decided to take on
the task of teaching a young boy French. Turns out that this young boy, who they
think was between 10 and 12 years old, he had no language. He’d been discovered
running around in the woods, naked and unable to communicate. Itard thought that
this was a very important task because he thought that civilization was based on the
ability to empathize and also on language. So he tried valiantly, for two years, to
teach this young boy named Victor . . . He named him Victor because over this
two-year time span, the only language or sounds that this boy could make was the
French ‘er’, which sounds like Victor. Hence, he had his name. But after two
years, Victor was unable to speak any French and comprehended very little French
and primarily communicated with objects. Then Itard wrote this up after two years.
He wrote up his findings, and he said that he thought that a major reason why Victor
didn’t learn French was because he was simply too old.
But of course, this was the 1800s, and he didn’t speculate as to what it was about
being 10 years old with no language that would prevent you from learning language
if you had a daily tutor trying to teach you French. There’s an enormous amount of
irony in this particular story, because Itard was the house physician for the
first school for the deaf in the entire world. Here, we have a picture of it. The
first school for the deaf was begun in 1760 in Paris. At the time that Itard was
teaching Victor, he was at the school with all of these children who used sign
language and all of these teachers who used sign language. Nonetheless, it never
occurred to him to try to teach Victor sign language, but we can’t blame him
because in the 1800s, sign language was not considered to be a language. In fact,
that particular discovery and realization wouldn’t happen for another century and a
half. So we can imagine why he didn’t teach Victor sign language. The question
is: could Victor have actually learned sign language if Itard had tried to teach
it to him? So I’m going to try to answer that question today through a series of
studies. But before I talk about our studies, I want to talk about something
that all of the speakers who have preceded me have talked about, which is that one
thing that’s . . . The defining characteristic of language is that it’s
highly structured, and it’s highly structured at all of these multiple
levels, so that speech sounds, or phonemes, make up morphemes. Morphemes make up words.
These words are strung together in sentences with syntax, and the specific
syntax helps us understand and produce very specific kinds of meanings.
Now one aspect of this language structure is that humans have evolved to the point
where children learn this structure naturally. Nobody has to teach them.
Nobody has to have a tutor to sit with them for two years to teach them the
structure of French or the structure of English or the structure of ASL. Simply by
being around people who use the language, young children naturally acquire all of
this multilevel and complicated structure. That is to say all children do this if, in
fact, they can access the language around them, if they have normal hearing. They’re
born with normal hearing. They hear people talk. Before you know it, they’re talking
themselves. But if children are born profoundly deaf, they cannot hear the
speech around them. We know that lip-reading is insufficient to learn
language because most of the speech sounds are invisible. What happens to these
children? If there’s no sign language in the environment, they can’t learn a visual
form of language either. So it happens that there are large numbers of children
who actually are like Victor, in the sense that they grow up without language,
without learning a language, but they are unlike Victor, in
that they weren’t running around in the woods, nude and having a very harsh life.
So how does the lack of language in childhood affect the ability to learn
language? Or does it affect the ability to learn language? This has been the focus of
studies that we have been doing for many years in our laboratory. Because we’re
using deaf children and sign language as a means to model language acquisition and
its effect on the brain, I think it’s appropriate that I talk a little bit about
the kinds of stimuli we use and about American Sign Language. So first of all,
you should know that American Sign Language, unlike many of the sign
languages that have been discussed up to this point, is a very sophisticated
language. It’s evolved clearly over 200 years. We might even say that American
Sign Language evolved with the development of the United States of America and spread
as civilization went across, as white men went across the continent. So American
Sign Language has a phonological system, a morphological system, syntax, and so
forth. So for those of you who don’t know sign . . . I know there are many people
here who do know sign. I want to show you what some of this structure looks like. So
I’m going to play you two video tapes. For those of you who don’t know sign, I would
like you to guess which one is syntactically structured. For those of you
who do know sign, maybe you could keep the answer to yourselves. That’s one. Here’s two. So how many think two? How many think one?
Okay. For those of you who think two, I captioned this. So this is really ‘kon,
dird, lun, blid, mackers, gancakes.’ Number one is a fully formed sentence with
a subordinate clause. The reason I’m showing you these two
sentences is, for those of you who don’t know the language, you can’t perceive or
parse this particular structure. This is what knowing a language is about. For
those of you who know ASL and who know this language, you know that the second
example had all of these signs which were non-signs, possible signs, but really just
non-signs. This is part of what knowing language is about. How do people learn
this particular structure? The question that we’re interested in is: how does
being a young child help people learn this particular structure? So we did a series
of experiments. When we started this work, it wasn’t even clear that age would make a
difference in sign language acquisition. Sign language is gestural. Sign language
is mimetic. Maybe anybody can learn sign language at any time in their lives.
So in one experiment, what we did is we recruited a number of people. This is in
Canada, who were born deaf and who used ASL. We asked them. We created an
experiment where we had a set of sentences that varied in complexity, and we showed
them these ASL sentences, and we asked them simply to point to a picture that
reflected the meaning of the sentence that they saw. We were quite struck by our
findings. What you see here is that deaf people who learned ASL from birth, from
their parents, performed very, very well on this task. In contrast, deaf people
who were adults and who had been signing for over 20 years performed at chance. So they
had great difficulty understanding some of these basic sentences in ASL. So this
suggests that there are age of acquisition effects for sign, as there are for spoken
language. Everybody sort of . . . The word on the street is it’s much harder to learn
a language if you’re an adult than if you’re a child. But what if it’s something deeper
than this? What if there’s something about learning a language in childhood that sets
up the ability to learn language, that creates the ability to learn language? So
we did another experiment, also in Canada, but we decided in order to test this
particular hypothesis, we should switch languages. So we’re no longer testing ASL
here. What we’re testing here is English. We devised an experiment in English, where
we had a set of sentences, and some of them were ungrammatical, and some of them
were grammatical. This is a common kind of task that psycholinguists use.
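Each judgment in a task like this is a yes/no decision, so a person who is guessing gets about half of the items right. Below is a minimal sketch of how a score can be compared with guessing, using an exact binomial test. The function name `p_value_vs_chance`, the 40-item test length, and the example scores are hypothetical and are not taken from the study.

```python
from math import comb


def p_value_vs_chance(correct, total, chance=0.5):
    """Two-sided exact binomial p-value for `correct` of `total` under guessing."""
    def pmf(k):
        return comb(total, k) * chance**k * (1 - chance) ** (total - k)

    observed = pmf(correct)
    # Add up every outcome that is at least as unlikely as the observed score.
    tail = sum(pmf(k) for k in range(total + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, tail)


# Hypothetical scores on a 40-item yes/no task (illustrative numbers only).
near_chance = p_value_vs_chance(21, 40)
well_above = p_value_vs_chance(36, 40)
print(f"21/40 correct: p = {near_chance:.3f}")
print(f"36/40 correct: p = {well_above:.2e}")
```

A large p-value means the score cannot be told apart from guessing, while a very small one means it can.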
Notice here that the people who were born profoundly deaf and for whom ASL was a
first language are near-native in English. So this is a second language. So learning
a language early, even though it’s in sign, helps people learn a second
language. Notice also that they performed . . . Their performance was
indistinguishable from normally-hearing people who had learned other languages at
birth, German, Urdu, Spanish, and French. So it seems that there’s
something about learning a language early in life, regardless of whether it’s sign
language or spoken language, that actually helps people learn more language. It’s not
simply learning language when one is little. But also as part of this
experiment, we tested a group of individuals who had been signing for 20
years and had gone through the educational system in Canada, who were born deaf. On
this task, they performed at chance. On a grammaticality judgment task, it’s either
yes or no. So it’s at chance. So we see that individuals who are deprived of
language, who aren’t able to learn language at a young age, perform poorly on
their primary language, sign language, and they perform very poorly on a second
language, which is English, and we see the reverse for people who learned a language early. So there’s something really
special going on here about learning language at an early age. What might this
be? And might it be in the brain? So in another set of studies, what we did
is take this population that we had been looking at, and we decided to neuro-image
their language processing to see whether this might give us some clues as to
differences between first and second-language learners and people who
had language and people who did not have language. So in this study, also done in
Canada, with colleagues at the Montreal Neurological Institute, we did fMRI. Maybe
many of you have had MRIs. We showed the subjects sentences, like the sentences
that you saw, and we asked them to make grammatical judgments on these sentences.
We tested 22 people. They were all born profoundly deaf. They all used American
Sign Language as a primary language. They had all gone through the educational
system, but they ranged in the age at which they were first able to acquire
language. This is all the way from birth up to age 14. So if the age at which you
learn your first language doesn’t make a difference, then we should expect the
neural processing patterns of all of these individuals to be similar. If age of
acquisition makes a difference, we should see different patterns in the brain. In
fact, this is what we have. This is what we found. When we did the analysis, we
found that there were seven regions in the brain, primarily in the left hemisphere.
As you know now, the left hemisphere has areas that are responsible for language.
One effect that we found was that in the language regions of the left hemisphere,
the earlier the person learned their first language, the more activation we saw in
the language hemisphere. However, the older the person was when they learned
their first language, the less activation we got in the language areas of the brain.
So if there’s less activation, is there something else going on here? We actually
found a second effect, which we were not expecting at all, which is in the back
part of the brain, the posterior part of the brain, in visual processing. This
particular effect was that the longer the person matured without language . . .
the older they were when they learned language, we found greater
activation, more neural resources being devoted to visual processing. So we see
that here in this group of deaf signers, we have two complementary reciprocal
effects of when a child learns his or her . . . when an individual learns their first
language and what the brain seems to be doing in terms of processing that
language. So that for people who learned language early in life, almost all of
their neural resources are devoted to processing the meaning and structure of
that language. For people who learned their first language later in life, more
neural resources are devoted to just trying, perhaps, to figure out what the
signal was. Was this a word? Was it glum? Or was it gleam? So we have this
reciprocal relationship between perceptual processing and language processing. This
particular pattern is not unique to deaf signers. There’s work by Tim Brown and
Schlaggar that showed that younger children often have more posterior activation than
older children. There are also some clinical populations, such as autistic
individuals, particularly those who have low . . . whose language skills are not
well-developed, will often show more processing in the occipital lobe. So this
is not a pattern that is unique to deafness.
So then the next question we had is whether, in fact . . . How does language
develop when an individual first starts to learn it when they’re much older, for
example, when they’re a teen? We have been very fortunate to have followed five or
six children in our laboratory who had no language until they were 13 to 14 years of
age, for a variety of reasons. Two of these children are from the United States.
These other children are from other countries. Actually this particular
circumstance, while we might think of it as being very rare, is actually very
common, particularly in underdeveloped countries. So the way in which we have
observed or analyzed language acquisition is to use normal procedures that people
use to study children’s language acquisition. We get a lot of spontaneous
data from them, and we analyze it. So one question we had is: if you’re 13 years
old, and you don’t have a language system, will you develop language like a baby? Or
will you do something else? Because you have a developed cognitive system. Will
you jump in the middle of the task? How will this progress? To answer this
question, we need to look a little bit at how normally-hearing children or deaf or
hearing children develop language when they are exposed to it at a young age. The
major hallmark of children’s language acquisition is that they very quickly, as
they’re acquiring the grammar of their language, their sentences get longer and
longer. The reason their sentences get longer and
longer is that they’re learning all of these . . . the morphology, the syntax. As
they say ideas, as they’re expressing their ideas, they’re better able to use
grammar to express them. So these data show the average length of children’s
expressions. Two of these children are normally hearing and acquiring English,
and two of these children are acquiring ASL. So we see that, in fact, the teens
that we have been following show no increase in their language. They’re able
to learn language and put words together, but, in fact, we don’t see an increase in
their grammar. In the last study, we wanted to neuro-image these children. We
wanted to see what are their brains doing with the language that they have. So we
used magnetoencephalography, which is a different technique, which is
complementary to the fMRI. What we did for this is we studied their vocabulary, and
we made stimuli that we knew that they knew, words that were in their vocabulary,
and that looked something like this. In the first instance, the picture matched
the sign. In the second instance, it didn’t. When that happens, the brain goes,
“Uh-oh,” and you get this N400 response. That’s what we were localizing in the
brain for these children. Because we’re using vocabulary that they have, we know
that they knew these words. We asked them to press buttons while they were doing
this task, and we knew that they were accurate. We didn’t only test these
children. We also tested control groups. So some of these control groups are deaf.
Some are hearing. Some are first language learners, and some are second language
learners. The first panel shows the response of a
group of normally hearing adults doing this task while looking at pictures and
listening to words. This is data that Katie Travis used also to look at
children’s development, neural development. The second panel, these are
deaf adults who learned ASL from birth. You can see that their processing is very
much like the hearing adults who are speaking English, primarily left
hemisphere in the language areas, with some support or help from the right
hemisphere. Actually these patterns are indistinguishable. Both the hearing adults
and the deaf adults learned language from birth, even though it was in a different
form. What’s this last panel? These are college students who are normally hearing.
They have been learning sign for about three years, which is about the same
amount of time that our cases were learning language. So we see that
responding on this task in speech or in ASL, whether it’s a first language or a second
language, so long as the subject had language from birth, looks fairly similar.
What about the cases? We were able to neuro-image two cases, and you can see
that their neural processing patterns are quite different, and they look neither
like second language learners, nor do they look like deaf adults. These children have
been signing for three years, and they had no language before they started to learn
how to sign. You can see primarily that there’s a huge response in the right
hemisphere, in the occipital and parietal areas. One of the subjects also shows some
response in the language areas. We can see that even though they’re
acquiring language, they’re doing it in a very different way, and their brains are
responding very differently. So we see that, in fact, there are huge effects of
language environment on both the development of language, but also how the
brain processes language. So we see that it seems to be that the human language
capacity, both understanding language and expressing language, but also the brain’s
ability to process language is very dependent upon the baby’s brain being
exposed to or immersed in language from birth. It’s through this analyzing
language and working on the data that it’s being fed that, in fact, I think the
neural networks of language are being created. So language is a skill that is
not innate but emerges from the interaction of the child with the
environment, through linguistic communication. That was probably the
answer that Itard was looking for and might be the reason why Victor did not
acquire language. Thank you. [applause] ♪ [music] ♪ – [Edward] Okay. So I wanted to first
thank the organizers really for this kind invitation to join this cast of stellar
thinkers about the biology and behavior of language. I actually want to switch the
title a little bit to something more specific. I want to talk to you about
organization, in particular a kind of organization that I refer to as a
taxonomy. The organization that I’m actually really referring to is the
organization of sound, in particular speech sounds and how those are actually
processed in a very important part of the brain called the superior temporal gyrus,
also known as Wernicke’s area. The main focus of my lab is actually to understand
the basic transformation that occurs when you have an acoustic stimulus and how it
becomes transformed into phonetic units. In other words, basically, how do we go
from the physical stimulus that enters our ears into one that’s essentially a mental
construct, one that’s a linguistic one? To basically ask the simple question, what is
the structure of that kind of information as it’s processed in the brain?
Now this actually turns out to be a very complex problem because it’s one that
actually arises from many levels of computation that occur in the ascending
auditory system. As sounds actually come through the ears, they go up through at
least seven different synaptic connections across many different parts of the brain,
even bilaterally, to where they’re actually processed in the non-primary
auditory cortex in the superior temporal gyrus. What we know about this area from
animal studies and non-human primates, for example, is this is an area that no longer
is tuned to basic low-level sound features, like pure tones, pure
frequencies, but is one that is actually tuned to very broad, complex sounds. There
has been very nice work in fMRI that’s actually demonstrated that this area is
far more selective to complex sounds, like speech, over non-speech sounds. So the
basic question is not really about where is this processing going on. The question
is how. Okay. What is the structure of information in this transformation that’s
going on? In particular, what kind of linkages can we make between that physical
stimulus and the internal one, which is a phonetic one? For me, I think it’s really
important to acknowledge some really important fundamental contributions that
occur, that give us some insight and put them in a very important perspective. For
me, I think one of the most important pieces of work that led to this work that
I’m about to describe from our lab was actually 25 years ago, using an approach
that’s actually far more complicated and difficult to achieve than what we do in
our own work. This is using single unit and single
neuron recordings that were recorded from patients that were undergoing
neurosurgical procedures for their clinical routine care. This very extremely
rare but precious opportunity to actually record from certain brain areas while
someone is actually recording . . . listening to speech. These are from my close
colleague and mentor, Dr. Ojemann, who’s in the audience today. But why I think it’s
so important to acknowledge this work is a lot of the clues about what I’m about to
describe were actually seen 25 years ago. This figure that’s extracted from that
paper, where they actually showed and could record from single brain cells,
called neurons, in the superior temporal gyrus, that they were active and
corresponding to very specific sounds. But if you actually look at where those sounds
are, they’re not exactly corresponding to the same exact, let’s say, phonemes or the
same exact sounds, but, in fact, they are corresponding to a class of sounds. This
was an observation that was made in this paper. They thought perhaps this is some
mention of phonetic category representation there. But it wasn’t that
clear, actually, and there were a lot of other really important observations that
were made in that paper. Now from a linguistic perspective, in thinking about
how, behaviorally, we organize this information in the brain, there’s actually
a wonderful way to approach it. It’s not perfect, but a very wonderful way to think
about how languages across the world actually share a similar and shared
inventory of speech sounds, not all completely the same. Each language has a
different number, but they highly overlap. The reason why they overlap is because
they are produced by the same vocal tract. This is essentially like a periodic table
of sound elements for human language and speech. So this table actually has two
really important dimensions. The horizontal dimension is actually one that
we call the place of articulation. It’s referencing where in the vocal tract these
sounds are made. For example, bilabial sounds, the ‘P’ and the ‘B’ require you to
actually have a transient occlusion at the lips, ‘ba.’ You cannot make those sounds
without that particular articulatory movement. Whereas some of the other
sounds, like a ‘D’ or a ‘T’, a ‘da’ a ‘ta’ a ‘da’ we call alveolar because the front
of the tongue tip is actually placed against the teeth. So these are actually
referencing where occlusions are occurring in the vocal tract when we speak, and
those actually correlate necessarily to very specific acoustic signatures. The
other dimension is what we call the manner of articulation. So the manner of
articulation is actually telling you a little bit more about not so much where,
but how in the vocal tract the constrictions are made in order to produce
those sounds. We have certain ones, like plosives, where you have complete closure
of the vocal tract and then a transient release, other sounds where you have
near-complete, like a fricative, like ‘sha,’ ‘za,’ those sounds that we call
fricative. If you actually look at vowels, they
actually have a similar structural organization. There actually is something
that actually references where in the vocal tract, either the front, middle, or
back, or the degree of open and closure. So for both consonants and vowels, there
actually is a structure that we know about, linguistically and phonologically,
about how these things are organized. I think the thing that interests me is that,
like I referenced before, that this is something like a periodic table. There is
something fundamental about these units to our ability to perceive speech. These
phonological representations are not necessarily the ones that we think of as
these letters that we call phonemes, but actually groups of phonemes that share
something in common, what we call features. These are the members of small
categories which combine to form the speech sounds of human language. This
became very attractive to me as a model of something to look for in the brain because
of . . . Essentially why it could be so important is that languages actually do
not vary without limit, but they actually reflect some single or limited general
pattern, which is actually rooted in both the physical and cognitive capacities of
the human brain, and I would add the vocal tract. This is not a new kind of thinking,
but it’s one that has not been clearly elucidated in terms of its biological
mechanisms. So in order for us to get this information, it requires a very special
opportunity, one where we can actually record directly from the brain.
In many ways, this is actually a lot more coarse than the kind of recordings that
were done almost 25 years ago. These are ones from electrode sensors that are
placed on the brain in order to localize seizures in patients that have epilepsy.
In the seven to 10 days that they are usually waiting to be localized, we have a
very, again, precious opportunity to actually have some of the participant, the
patient volunteers, listen to natural, continuous speech and look at those neural
responses on these electrode recordings to see how information is distributed in the
superior temporal gyrus when they’re listening to these sounds. This gives you
a sense of actually what that neural activity pattern looks like.
– We’re going to slow down that sentence a lot here.
– Ready tiger go to green five now.
– So you can see that the information is being processed in a very precise, both
spatial and temporal, manner in the brain. This is exactly the reason why this kind
of information has been elusive, because we do not currently have a method that
actually has both spatial and temporal
resolution and, at the same time, covers all of these areas simultaneously. So
it’s, again, in the context of these rare opportunities with human patient
volunteers that we can conduct this kind of research. So the natural question is . . .
Of course, now that I’ve shown you that we can actually see a pattern in the
brain that’s both temporally and spatially specific, what actually happens
when we try to deconstruct some of those sound patterns from the brain? This just
gives you an example, again, in the superior temporal gyrus, where those
sounds are activating the brain. An example of the spectrogram for a given
sentence, in this case, it’s, “And what eyes they were.” The last part of that
figure basically shows you that pattern across different electrodes. It’s not all
happening in the same particular way. You have very specific evoked responses that
actually occur at different parts of the superior temporal gyrus.
I want to show you what happens when you look at just one of those electrodes. If
you look at the neural response of that one particular electrode that’s labeled
e1, and you organize the neural response by different phonemes, okay, you can
actually see, again, on the vertical axis, starting with ‘da,’ ‘ba,’ ‘ga,’
‘ta,’ ‘ka.’ You can see that this electrode . . . Those hundreds or
thousands of neurons that are under this electrode are very selectively responsive
to this set of sounds that we call plosives. It’s not one phoneme, but a
category, and they share this feature that we actually know, linguistically, to be
called plosive. I can show you a series of other electrodes. Electrode two has a very
different kind of sensitivity. It’s showing you that it really likes those
sounds ‘sha,’ ‘za,’ ‘sa,’ ‘fa.’ This is an electrode that is, again, not tuned
to one phoneme, but actually tuned to the category of sibilant fricatives in
linguistic jargon. We have another electrode, e3, that is selective to
low-back vowels, these “ah” based ones. Another one that is a little bit more
selective to high-front vowels, ‘E.’ Even another electrode, e5, that is
corresponding to nasal sounds. So this is a very low-level description,
but it’s actually the first time we’ve ever seen in this kind of principled way,
obtained through very precise spatial and temporal recordings, the ability to
resolve phonetic feature selectivity at single electrodes in the human brain. Now
this is not enough. We need to really address this issue of structure. That’s
one of the themes here. Are all of these things just equally distributed as
features? In the original thinking about these things, you could have a binary list
of features. It turns out that features, in and of themselves, actually have
structure and have relationships with one another. So what we did, in order to look
at that structure in the brain, we looked at hundreds of electrodes that were
recorded over a dozen patients. Each one of those columns actually corresponds to
one electrode and one particular superior temporal gyrus in someone’s brain. Like I
just showed you before, the vertical axis is actually how they’re organized by
different phonemes. What we did here was we used a statistical method called
hierarchical clustering. What hierarchical clustering is used for is finding the
patterns in this data. What the hierarchical clustering showed us, once it had
sorted this data, was that, in fact, there is, indeed, structure in the brain’s
responses to human speech sounds, and it looks like this.
So we’ve organized the hierarchical clustering as a function of a single
electrode’s, again, a single column’s selectivity to different phonemes, but
we’ve also organized this clustering as a population response across all of the
electrodes and looking at that selectivity for different phonemes. So we have two
different axes along which we’re actually looking at the brain’s large distributed response
to speech sounds. We’re using this method which is what we call unsupervised,
meaning we’re not telling it any linguistic information, or we’re not
organizing the data. We’re just saying, “Tell us how the brain is organizing this
information.” What we see from this is that when we actually look at where this
information is being organized, one of the biggest divisions between different parts
of different kinds of selectivity in the brain are what we would call the
difference between consonants and vowels or really, actually, between obstruents
and sonorants, in linguistic jargon. But within those different categories, you
actually have sub-classification. So within the consonants, you actually have a
subdivision between plosives and fricatives. Within the sonorants you
actually have referencing for different positions of the tongue, low back, low
front, high front, different classes of vowels and, in fact, nasal. So basically
this is telling you that feature selectivity in the brain is actually
hierarchically structured. The second thing is that instead of using phonemes in
order to organize the responses, we actually use features. So as an example,
that term dorsal actually refers to the tongue position when it’s fairly back,
like for ‘G’ ‘K’ sounds. You can see that when we organize things
by features, you have a much cleaner delineation. The electrode responses seem
to be much more tuned to phonetic features shown below, as they are, compared to when
you plot them as phonemes. Okay. So this essentially disproves any idea that there
is individual phoneme representation in the brain, at least not one that’s locally
encoded, but tells you that the brain is organized by its sensitivity to phonetic
features. Now relating it to a phonetic feature is the first step, and it’s one
that’s really important because it’s referencing the one we know about from
linguistics and the one that we know behaviorally. But how do we connect this
to the physical stimulus that’s actually coming through our ears? That’s where we
have to make a linkage to actually something about the sound properties. Are
these things truly abstract features that are being picked up by the brain? Or
actually, are they referencing specific sound properties? Basically the answer is
the latter. It’s that what we’re actually seeing is sensitivity to particular
spectrotemporal features. In the top row, I am showing you basically . . . When
we look at the average tuning curves, the frequency versus time, tuning curves for
each one of those different classifications for plosives and fricatives,
they’re very similar to the acoustic structure when you average those
particular phonemes in the brain. So what this means is the tuning that
we’re seeing that’s corresponding to phonetic features is, in fact, one that is
tuned to higher-order acoustic spectrotemporal ones. The brain is selecting
specific kinds of acoustic information and converting it into what we perceive as
phonetic. In the interest of time, I’m going to sort of skip more in-depth
information about vowels and plosives and how those are specifically encoded. But in
summary, what we’ve found is that there’s actually a multidimensional feature space,
actually, for speech sounds in the human superior temporal gyrus. This feature
space is organized in a way that actually shows hierarchical structure. The
hierarchical structure is fairly strongly driven by the brain’s, in particular this
auditory cortex’s, sensitivity to acoustic differences, which are most signified
actually in the manner of articulation distinctions, linguistically. What’s
interesting about this is it actually does correlate quite well with some known
perceptual behavior. So I would like to conclude there and acknowledge some of the
really important people from my lab, postdoctoral fellow Nima Mesgarani who did
most of this work with one of our graduate students, Connie Cheung. Thank you. [applause] ♪ [music] ♪
