
    Ann. N.Y. Acad. Sci. ISSN 0077-8923

ANNALS OF THE NEW YORK ACADEMY OF SCIENCES. Issue: The Year in Cognitive Neuroscience

The anatomy of language: a review of 100 fMRI studies published in 2009

Cathy J. Price

    Wellcome Trust Centre for Neuroimaging, UCL, London, UK

Address for correspondence: Cathy Price, Wellcome Trust Centre for Neuroimaging, Institute of Neurology, UCL, 12 Queen Square, London WC1N 3BG, UK. [email protected]

In this review of 100 fMRI studies of speech comprehension and production, published in 2009, activation is reported for: prelexical speech perception in bilateral superior temporal gyri; meaningful speech in middle and inferior temporal cortex; semantic retrieval in the left angular gyrus and pars orbitalis; and sentence comprehension in bilateral superior temporal sulci. For incomprehensible sentences, activation increases in four inferior frontal regions, posterior planum temporale, and ventral supramarginal gyrus. These effects are associated with the use of prior knowledge of semantic associations, word sequences, and articulation that predict the content of the sentence. Speech production activates the same set of regions as speech comprehension but, in addition, activation is reported for: word retrieval in left middle frontal cortex; articulatory planning in the left anterior insula; the initiation and execution of speech in left putamen, pre-SMA, SMA, and motor cortex; and for suppressing unintended responses in the anterior cingulate and bilateral head of caudate nuclei. Anatomical and functional connectivity studies are now required to identify the processing pathways that integrate these areas to support language.

    Keywords: speech; comprehension; production; fMRI; language

    Introduction

The aim of this paper was to provide an up-to-date and integrated review of functional imaging studies of language published in 2009. I included studies that (i) used fMRI, (ii) were published online between January and mid-October 2009, and (iii) were concerned with the functional anatomy of speech comprehension and production in the healthy adult brain. This excluded papers that used MEG, EEG, TMS, or optical imaging; that were published in 2008 or before; or that focused on populations with acquired or developmental language disorders. I also excluded papers on orthographic processing of written words, sign language, and multilingualism: each of these topics requires its own review.

Having selected more than 100 fMRI papers on speech comprehension or production, I categorized each paper according to the conclusions provided in their abstracts. The first categorization was speech comprehension or speech production. Within each of these main categories, I progressively grouped the papers according to whether their stimuli were prelexical, single words, or sentences. This process illustrated the wide range of language topics that are currently being investigated with fMRI. Although all papers published prior to 2009 were excluded, my assumption was that the 2009 papers would have incorporated the most salient conclusions from former years, as well as assimilating their literature. Each of the 2009 papers listed in the references of this review article should therefore provide a list of related papers published prior to 2009.

Given the number of language fMRI studies published in 2009, it was not possible to do justice to the contribution of each one. Furthermore, a review is limited to the reviewer's own understanding of the topic and how the conclusions of each paper fit together. What follows is the overall picture that I saw when reviewing the selected literature. It emphasizes consistent rather than atypical results. Only time and more studies will determine whether the conclusions drawn are correct.

    doi: 10.1111/j.1749-6632.2010.05444.x


    Anatomical terms

Across studies, different anatomical terms have been used for the same region. For example, the same inferior frontal region has been variously referred to as Broca's area, BA44, or pars opercularis. The anatomical distinction between anterior and posterior superior temporal lobe activation also varies across studies; and the same left temporo-parietal area has been referred to as the supramarginal gyrus (SMG), planum temporale, Spt (Sylvian parieto-temporal), superior temporal gyrus (STG), and the inferior parietal lobe (IPL). I have therefore tried to use a more consistent terminology (see Fig. 1), reported the MNI or Talairach coordinates [x, y, z] of the effects I am referring to, and estimated the subregion being referred to using both MNI and Talairach templates. I divided inferior frontal activations into four different regions: pars triangularis (pTr), pars orbitalis (pOr), dorsal pars opercularis (dpOp), and ventral pars opercularis (vpOp). In the superior temporal gyrus, anterior and posterior parts were defined in relation to Heschl's gyrus (HG).
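The one explicit coordinate-based criterion given in this review is the rule used later (section "Semantic constraints in sentence comprehension") to separate dorsal from ventral pars opercularis: whether a peak lies more or less than 20 mm above the AC-PC line. Below is a minimal sketch of that kind of labeling rule in Python; the anterior cut-off separating pars orbitalis/triangularis from pars opercularis, and the z < 0 orbitalis rule, are hypothetical placeholders, not values from the paper.

def label_inferior_frontal(x, y, z):
    """Crudely label a left inferior frontal peak [x, y, z] in mm."""
    if x >= 0:
        return "right hemisphere (not labeled here)"
    if y > 20:                               # hypothetical anterior cut-off
        return "pOr" if z < 0 else "pTr"     # hypothetical orbitalis rule
    return "dpOp" if z > 20 else "vpOp"      # 20 mm above the AC-PC line (from text)

# Hypothetical left-hemisphere peak, 27 mm above the AC-PC line:
print(label_inferior_frontal(-41, 8, 27))    # -> dpOp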

Given the limited spatial resolution of fMRI and the use of different spatial normalization templates, it was not always possible to determine the location of reported coordinates, particularly when they fell at the border between two regions. For example, the lower bank of the sylvian fissure, posterior to HG, is in the posterior planum temporale (pPT), and the upper bank of the same part of the sylvian fissure is in the ventral supramarginal gyrus (vSMG). These regions will not have the same connectivity or functions, but it was difficult to tell whether an activation fell in the lower or upper bank of the fissure because of their close proximity. One option is to refer to activation around the posterior part of the sylvian fissure as Spt (Sylvian parieto-temporal), as in Hickok and colleagues.1 However, this de-emphasizes the fact that the lower and upper banks of the sylvian fissure will have different functions. I have therefore labeled activation around Spt as pPT/vSMG to emphasize the convergence of two different regions without implying that they have the same function.

To provide summaries of the results, I marked the approximate location of the reported coordinates on a two-dimensional template of the brain. This further reduced the spatial precision of the results. Remarkably, however, the low-resolution cartoon illustrations were sufficient to highlight distinct spatial dissociations in the anatomical systems that support speech comprehension and production at the prelexical, word, and sentence levels (see Fig. 2). Future studies that dissociate activations within subject, rather than across study, are now needed to provide more accurate anatomical localization; and functional connectivity analyses with high temporal resolution are needed to determine how the widely distributed sets of regions interact with one another.
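The summary procedure described here is essentially a scatter plot of peak coordinates over a flat sketch of the hemisphere. A minimal sketch of that workflow, assuming matplotlib; the peak coordinates below are invented placeholders, not the foci plotted in Figure 2.

import matplotlib.pyplot as plt

# Hypothetical left hemisphere peaks, plotted as a sagittal projection
# (y = posterior-anterior, z = inferior-superior), one marker per effect
# type, mimicking the cartoon summaries in Figure 2.
effects = {
    "prelexical": ([-12, -24, 6], [6, 10, -10], "o"),
    "single-word semantics": ([-64, -40, 10], [38, -20, -35], "s"),
}

fig, ax = plt.subplots(figsize=(5, 4))
for label, (ys, zs, marker) in effects.items():
    ax.scatter(ys, zs, marker=marker, label=label)
ax.set_xlabel("y (mm)")
ax.set_ylabel("z (mm)")
ax.set_title("Approximate activation foci (sagittal projection)")
ax.legend()
plt.show()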

    Review of speech comprehension

Auditory speech processing extracts meaningful information from continuously changing acoustic inputs. Many acoustic features predict the perceptual identity (intelligibility) of a given auditory stimulus, including the temporal dynamics of the timbre, pitch, and volume in the auditory waveform and the integration of this information over short and long time frames. The type of information that is extracted will depend on the expectations and intentions of the listener. For example, auditory input can be translated into meaningful messages (semantics), articulatory associations (speech production), spellings, and physical actions (locomotion) toward or away from the source of the sound (i.e., spatial processing).

As auditory speech information is ambiguous, comprehension can be facilitated by auditory cues from prosody and visual cues from concurrent hand gestures, lip movements, facial expressions, and writing. These additional cues are particularly useful under adverse listening conditions. For example, co-speech hand gestures provide semantic cues relevant to the intended message. Lip reading (viewing mouth movements) provides articulatory information, and writing facilitates both semantic and articulatory processing. Therefore, speech comprehension, even at the single-word level, involves the integration of multiple cues from auditory and visual sources.

Speech comprehension at the sentence level is also ambiguous because the meaning of a sentence can be different from its parts, particularly for a metaphorical sentence such as "You are the sun in my sky." Sentence comprehension is therefore constrained by our prior knowledge of the world (semantic constraints), what we are expecting to hear (context effects), and how the words are typically combined (i.e., syntactic constraints/grammatical rules). As the meaning of the sentence emerges over time, auditory short-term memory is also required to hold individual words in memory until a satisfactory interpretation of the word combination has been achieved. Each stage in the speech-processing hierarchy therefore involves the integration of bottom-up processing from auditory and visual inputs with top-down predictions from our prior knowledge of language and the context in which it is being used.


Figure 1. Anatomical terms. The names and locations of the regions referred to in the text are illustrated on a cartoon sketch of the left hemisphere. This low-resolution sketch was deliberately chosen for all figures because it has minimal anatomical detail; finer detail would not be appropriate given the poor spatial precision of the activation loci illustrated in Figure 2 (see introductory text for further explanation).



Figure 2. Summary of left hemisphere activation foci. The approximate locations of the left hemisphere activation foci reported in the text. Right hemisphere foci were less consistent and are not summarized here. Despite the poor spatial precision, the relative distributions of the different types of effects are valid and informative. Top left panel: prelexical processing.3-5,7,8 Top right panel: pseudowords.13 Middle left panel: black circles = words relative to pseudowords;13 white circles = words relative to rotated words;9,10 open circles = effect of familiarity with auditory objects.14 Middle right panel: black circles = comprehensible versus incomprehensible sentences;19,21,23,24 open circles = sentences with versus without visual gestures.27-29 Lower left panel: black circles = semantic ambiguity;9,19,35-39 white circles = syntactic ambiguity.10,32,49,50,55 Lower right panel: black circles = word retrieval;43,76,79,81-86 open circles = overt articulation;83,87,96 white circles = covert articulation.72,73,84



My review of the anatomy of speech comprehension is divided into seven sections. These focus on prelexical auditory processing ("Prelexical phonemic processing"); semantic recognition at the single-word level ("Semantic processing of spoken words"); semantic recognition at the sentence level ("Sentence comprehension"); semantic constraints at the sentence level ("Semantic constraints in sentence comprehension"); syntactic constraints at the sentence level ("Syntactic constraints"); the role of subvocal articulation during speech comprehension ("Subvocal articulation during speech comprehension"); and the influence of prosodic information ("The role of prosody in speech comprehension"). I then sum up ("Summary of activation related to speech comprehension") and hypothesize ("Left lateralization for speech comprehension") that top-down processing has a strong influence on determining left lateralization for language.

Prelexical phonemic processing

Prelexical processing occurs prior to the semantic recognition of the auditory stimulus. It includes acoustic analysis of the frequency spectrum and the integration of auditory features over different time frames. The result is phonemic categorization of the speech signal and the morphological parsing of word stems and inflectional affixes.
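The acoustic analysis described above, a frequency decomposition integrated over short and long time frames, corresponds closely to computing spectrograms at different window lengths. A minimal sketch, assuming scipy and a synthetic stand-in waveform; this illustrates the signal-processing idea only, not any cited study's pipeline.

import numpy as np
from scipy.signal import stft

fs = 16000                                  # sampling rate (Hz), assumed
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 220 * t)             # stand-in for a speech waveform

# A short window gives fine temporal detail (phoneme scale); a long window
# integrates over a coarser time frame, loosely analogous to the short and
# long time frames mentioned in the text.
for nperseg in (256, 2048):
    freqs, frames, Z = stft(x, fs=fs, nperseg=nperseg)
    print(f"window {1000 * nperseg / fs:.0f} ms: "
          f"{len(freqs)} frequency bins x {len(frames)} time frames")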

Rauschecker and Scott2 review evidence that auditory inputs are processed along an antero-lateral gradient that starts at HG, progressing in the anterior direction toward the temporal pole. As auditory information progresses anteriorly and laterally, lower-level auditory features are combined into increasingly complex and intelligible sounds. For example, middle and anterior parts of the superior temporal sulci respond to prelexical stimuli, such as consonant-vowel syllables, and there may even be subregions selective for particular speech-sound classes, such as vowels.

Further evidence for vowel-selective responses has been presented in two recent studies.3,4 Leff and colleagues3 compared activation for hearing familiar vowels versus single formants and tones. The formants had the same intonation pattern as the vowels but were less acoustically complex because they lacked stop bursts or other formants. Relative to the formants and tones, vowel stimuli increased activation in lateral left HG, anteriorly [62 12 6] and posteriorly [64 24 10]. Concurrent activation was also observed in the right superior temporal sulcus [64 10 8]. Likewise, Britton and colleagues4 identified activation in lateral and anterior HG [64 7 5/59 7 5] for hearing vowels versus tones. These bilateral superior temporal activations are likely to reflect differences in the acoustic complexity of vowel stimuli versus tone and formant stimuli.

To probe more speech-selective responses, Leff and colleagues3 manipulated the acoustic quality of their stimuli within category (i.e., vowel, formant, or tone). Using a mismatch paradigm, the same stimulus was repeated for a few trials followed by a new stimulus in the same category. In this context, the new stimulus results in an activation change that can be related to the processes that differ between the new stimulus and the prior stimuli. The results revealed a left-lateralized activation in the anterior superior temporal lobe [56 6 10] when the vowel changed, and this effect was not observed for formant or tone changes. Myers and colleagues5 also used a mismatch paradigm, repeating a phonemic stimulus and measuring activation in response to acoustic deviants. They found left dorsal pars opercularis activation at [41 8 27] when the phonetic category of the new stimulus was different versus the same as its predecessors. These mismatch responses3,5 can be explained in terms of prediction error, when top-down expectations from prior experience fail to predict the bottom-up inputs.6 According to this account, left anterior temporal activation for vowel deviants3 and left dorsal pars opercularis activation for phonetic deviants5 reflect the influence of top-down predictions, with the location of the mismatch response depending on the type of stimulus and the type of prediction.

Specht and colleagues7 also investigated speech-selective responses by manipulating acoustic complexity within category. Using a sound-morphing technique, white noise was gradually changed into either prelexical speech (consonant-vowels) or music (piano or guitar chords). Bilateral activation in the anterior superior temporal sulci at [51 3 9/60 6 3] was observed irrespective of whether speech or music emerged, but left-lateralized speech-selective responses were identified more posteriorly [56 36 12] and ventrally [54 18 6].

    Likewise, Vaden and colleagues8 report an effect


of phonological priming in superior temporal regions [63 30 3/45 33 3] that were posterior and ventral to HG. Together, these results illustrate speech-selective responses in the posterior and ventral processing directions.

To summarize, prelexical processing of auditory speech increased activation in bilateral superior temporal gyri, but left-lateralized responses were observed when the noise merged into speech versus music7 or when there was a mismatch between the stimulus presented and the stimulus expected.3,4 In both these contexts, left lateralization may be driven by top-down predictions from prior experience rather than bottom-up processing. For example, the emergence of speech over time allows the participant to predict the morphing of the stimulus; and mismatch responses have been associated with the failure of top-down processing to predict the stimulus.6

The locations of all left hemisphere coordinates reported above for prelexical speech versus non-speech processing are illustrated in the top left panel of Figure 2. This illustrates that prelexical activation is most extensive in the direction anterior to HG. However, the results do not support a straightforward gradient of increased speech selectivity as information progresses more anteriorly. Speech selectivity was observed in the ventral7,8 and posterior7 directions as well as the anterior direction.3 This suggests that there are multiple prelexical processing pathways, with the level of activation in each depending on the stimulus, task, and top-down expectations from prior experience.

    Semantic processing of spoken words

Semantic processing of auditory speech links heard sounds to what we know about their meanings. This has been probed by comparing familiar words to (i) closely matched but unintelligible spectrally rotated words9,10 and (ii) pseudowords that are intelligible but without meaning.11-13 A study of auditory familiarity, in which the acoustic stimulus is held constant,14 is also reported in this section.

When participants listen to spoken sentences, relative to spectrally rotated speech (which is unintelligible), bilateral mid-to-anterior superior temporal activation has been reported by Friederici and colleagues10 at [58 4 4/62 4 14] and by Obleser and Kotz9 at [60 8 6/62 6 4], with the latter authors9 also reporting activation in the left angular gyrus [48 64 38]. It is important to note, however, that the bilateral mid-to-anterior superior temporal activations observed for familiar words relative to spectrally rotated speech are in very close proximity to those reported as common to music and prelexical speech by Specht and colleagues7 at [51 3 9/60 6 3]. It may therefore be the case that the activation reported for the comparison of speech to spectrally rotated speech9,10 reflects prelexical differences between speech and rotated speech rather than differences in semantic processing. In contrast, semantic retrieval involves more distributed regions. For example, Sharp and colleagues15 have shown that when attention is directed to the meaning of single words versus the acoustic properties of spectrally rotated speech, activation is observed in a left-lateralized network including regions in the inferior temporal gyrus, anterior fusiform, hippocampus, angular gyrus, pars orbitalis, superior and middle frontal gyri, and the right cerebellum.
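Spectrally rotated speech, the unintelligible control used in these comparisons, inverts the speech spectrum within a band so that overall acoustic complexity is roughly preserved while intelligibility is destroyed. A minimal sketch of one classic implementation (low-pass filter, ring-modulate at the band edge, low-pass again); the 4 kHz band edge and filter order are illustrative assumptions, not the cited studies' parameters.

import numpy as np
from scipy.signal import butter, filtfilt

def spectrally_rotate(x, fs, f_edge=4000.0):
    """Invert the spectrum of x within [0, f_edge] Hz."""
    b, a = butter(6, f_edge / (fs / 2))          # low-pass to the rotation band
    x_lp = filtfilt(b, a, x)
    t = np.arange(len(x)) / fs
    modulated = x_lp * np.cos(2 * np.pi * f_edge * t)  # mirrors f -> f_edge - f
    return filtfilt(b, a, modulated)             # remove the upper image band

fs = 16000
speech = np.random.randn(2 * fs)                 # stand-in for a speech waveform
rotated = spectrally_rotate(speech, fs)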

The comparison of spoken words to pseudowords provides a tighter control for prelexical processing and speech intelligibility than the comparison of spoken words to spectrally rotated speech. I found three studies11-13 published online in 2009 that compared spoken words and pseudowords in the context of deciding whether the stimulus was a word or not (i.e., a lexical decision task). In a review of prior studies, Davis and Gaskell13 report an impressive double dissociation in activation for familiar words versus pseudowords, with activation for words distributed in bilateral anterior middle temporal cortices, posterior temporal-parietal cortices, and the precuneus, with left-lateralized activation in the temporal pole, posterior middle temporal cortex, anterior fusiform, pars orbitalis, middle frontal cortex, anterior cingulate and putamen, and the right precentral gyrus. The left hemisphere foci of these results are illustrated in the left middle panel of Fig. 2. This highlights word activations that are outside the areas associated with prelexical processing, extending further in the anterior, ventral, and posterior directions, consistent with the results of the prelexical studies discussed in section "Prelexical phonemic processing" that suggest multiple speech-processing routes.

The meta-analysis of auditory words versus pseudowords reported by Davis and Gaskell13 also


demonstrated increased activation for pseudowords relative to words within the superior temporal areas associated with prelexical processing in the previous section (compare top left and top right panels in Fig. 2). Together, the results suggest that pseudowords increased the demands on prelexical processing, whereas real words increased activation in semantic areas. Indeed, many of the areas associated with word versus pseudoword processing13 have also been associated with semantic processing of written words and pictures of objects, as summarized in a review of 120 functional imaging studies by Binder and colleagues.16 Diaz and McCarthy17 have also reported greater anterior middle temporal lobe activation bilaterally at [59 14 27/66 10 21] for written content words (with high semantic associations) versus written function words (which carry minimal or ambiguous semantic associations).

The second study that compared spoken words and pseudowords during lexical decision was reported by Kotz and colleagues.12 Each target stimulus (word or pseudoword) was primed with a rhyming or nonrhyming word or pseudoword. Rhyming pseudowords compared to rhyming words increased activation in bilateral superior temporal gyri, as reported by Davis and Gaskell,13 whereas rhyming words compared to rhyming pseudowords increased activation in frontal and parietal regions. Kotz and colleagues12 focus their interpretation on the observation that left inferior frontal activation at the level of the left pars orbitalis/pars triangularis [56 28 6] was greater for words than pseudowords.

The third study to compare lexical decisions to words and pseudowords was reported by Kouider and colleagues,11 who report higher activation for words than pseudowords in the precuneus, medial superior frontal gyrus, and the anterior cingulate. Although Kouider and colleagues11 do not present any coordinates for these effects, it is interesting to note that the same regions were also reported earlier for words versus pseudowords13 and for phonological repetition of words.8 Activation in the precuneus and superior frontal gyrus has also been associated with semantic decisions on heard words,15,16 narratives,18 metaphors,19 and semantic versus phonological word generation.20 In brief, activation for spoken words relative to pseudowords during lexical decision was identified in frontal and parietal areas that are also activated during semantic decision tasks.

Finally, an alternative approach for identifying semantic recognition of auditory stimuli is to keep the stimulus (and prelexical processing) constant and compare responses to the same stimuli before and after learning. Leech and colleagues14 used a video game to train participants to associate novel, acoustically complex, artificial nonlinguistic sounds with visually presented aliens. After training, left posterior superior temporal activation at [54 37 1] increased with how well the auditory categories representing each alien had been learnt. The authors emphasize that part of what makes speech special is the extended experience that we have with it. This could include acoustic familiarity, enhanced audiovisual associations, or auditory memory.

To summarize this section, semantic processing of familiar auditory stimuli activates a distributed set of regions that surround the ventral, anterior, and posterior borders of the perisylvian regions supporting prelexical auditory speech processing (see middle left panel vs. top left panel in Fig. 2). The extended distribution of semantic activations suggests that there are numerous pathways supporting speech perception and comprehension (see Rauschecker and Scott2). This might be envisioned as activity spreading out and around a spider's-web-like network, centered on HG.

Sentence comprehension

To tap speech comprehension at the sentence level, activation has been compared for hearing grammatically correct sentences with plausible versus implausible meanings. This comparison is expected to reveal semantic representations at the level of word combinations while controlling for phonological and lexical familiarity, syntactic processing, and working memory. Activation in four key regions has been reported: anterior and posterior parts of the left middle temporal gyrus,9,19,21 bilateral anterior temporal poles,21,22 left angular gyrus,9,19 and the posterior cingulate/precuneus.18,19 Each of these areas was also associated with semantic processing in the review of 120 studies reported by Binder and colleagues.16 Surprisingly, however, activation for plausible relative to implausible sentences is not as extensive as that associated with semantic processing of single words (see left and right middle panels in Fig. 2) but was primarily focused in anterior and posterior parts of the middle temporal gyrus, just ventral to the superior temporal sulci.


In the left anterior middle temporal locus, activation has been reported by Mashal and colleagues19 at [46 6 10] for semantically plausible more than implausible sentences; by Adank and Devlin23 at [58 8 4] when time-compressed auditory sentences were understood versus not understood during a sentence verification task; by Devauchelle and colleagues24 at [60 12 8/60 8 12] when auditory sentence processing was primed by a sentence with the same meaning; by Obleser and Kotz9 at [52 6 14] for sentences with meanings that were difficult versus easy to predict; and by Hubbard and colleagues25 at [57 12 8] when spoken speech was accompanied by beat gestures (rhythmic beating of the hand) that enhance semantic meaning by providing intonation. These sentence-level comprehension effects are not specific to auditory words, because anterior middle temporal activation has also been reported by Snijders and colleagues26 at [56 6 16] for written sentences compared to unrelated word sequences; and by Kircher and colleagues27 at [52 4 8] when spoken sentences with abstract meanings were accompanied by visual gestures relative to speech only or gesture only. This suggests that the left anterior superior temporal sulcus is involved in multimodal sentence processing.

Likewise, left posterior middle temporal activation has been reported at [54 41 1] by Mashal and colleagues19 for semantically plausible versus implausible sentences; more anteriorly at [41 3 10] by Rogalsky and Hickok21 when meaningful sentences were compared to lists of unrelated words; and more posteriorly at [55 52 2] by Adank and Devlin23 when time-compressed auditory sentences were understood versus not understood during a sentence verification task. Activation in the left posterior middle temporal cortex is also increased when auditory sentences are accompanied by visual observation of the speaker's body movements. For example, Holle and colleagues28 reported activation at [48 39 3] when participants viewed a speaker describing manual object actions (e.g., "now I grate the cheese") while making meaningful manual gestures versus hearing speech only or seeing gestures alone. This effect was stronger in adverse listening conditions that benefit most from visual cues. Likewise, Kircher and colleagues27 found activation at [63 43 1] when spoken sentences with abstract meanings were accompanied by visual cues from gestures versus hearing speech only or seeing gestures only; and Robins and colleagues29 found activation at [66 31 4] when participants viewed the face and emotional expression of a person speaking sentences versus hearing the sentence or viewing the face alone. Seeing the lip movements of the speaker also provides phonological cues;30 it is therefore not surprising that the activation reported by Robins and colleagues29 for speech and faces extended dorsally [45 37 10] into the superior temporal gyrus.

The findings of Dick and colleagues31 further suggest that left posterior temporal activation in response to speech (spoken stories) and gesture reflects two sources of semantic information rather than an active integration process. Specifically, they found that perceiving hand movements during speech increased activation in the left posterior temporal regions irrespective of the semantic relationship between the hand movement and the accompanying speech. Thus, it did not depend on whether speech and gesture could be integrated into the same meaning or not. Instead, the authors conclude that listeners attempt to find meaning, not only in the words that speakers produce, but also in the hand movements that accompany speech.

The role of the left and right temporal poles in sentence processing relative to unrelated lists of words has been reported by Rogalsky and Hickok21 at [47 17 18 and 52 18 20]. Snijders and colleagues26 also found bilateral temporal pole regions at [53 18 30/54 20 32] for written sentences compared to unrelated word sequences, but the coordinates are more ventral than those reported by Rogalsky and Hickok.21 To distinguish auditory and visual sentence processing requires a within-subjects comparison of (i) auditory sentences with plausible versus implausible meanings with (ii) visual sentences with plausible versus implausible meanings. All four of these conditions were included in a study reported by Richardson and colleagues,32 which found no interaction between plausible versus implausible sentences presented in the auditory versus visual modality. Likewise, a meta-analysis of semantic and sentence processing activation by Visser and colleagues22 found anterior temporal lobe activation for both auditory and visual sentences, and these authors discuss the role of the anterior temporal cortex in amodal combinatorial semantics.


Left angular gyrus activation has been less consistently reported during sentence processing but was observed by Mashal and colleagues19 at [47 59 25] for sentences with plausible versus implausible meanings; and by Obleser and Kotz9 at [46 64 38] for the comparison of heard sentences to unintelligible spectrally rotated speech. Obleser and Kotz9 suggest that activation in the angular gyrus facilitates sentence comprehension via top-down activation of semantic concepts. Likewise, Carreiras and colleagues33 demonstrated a top-down role for the angular gyrus at [48 74 28] during reading relative to object naming, and Brownsett and Wise34 highlight the role of the left angular gyrus in both speaking and writing. In the medial parietal lobes, Whitney and colleagues18 highlight the importance of the right precuneus and bilateral posterior/middle cingulate cortices for narrative language comprehension, associating these regions with the processes involved in updating story representations.

In summary, the comparison of grammatically correct sentences with comprehensible versus incomprehensible meanings has been associated with activation in anterior and posterior parts of the left middle temporal gyrus,9,19,21 bilateral anterior temporal poles,21,22 the left angular gyrus,9,19 and the posterior cingulate/precuneus.18,19 Each region may play a different role in sentence semantics. For example, seeing concurrent hand gestures enhanced activation for heard sentences in the posterior middle temporal cortex, consistent with auditory and visual semantics converging at the posterior end of the semantic network. Inconsistent activation in the left angular gyrus and medial parietal regions may reflect task-dependent semantic retrieval strategies; and activation in the anterior middle temporal lobe may reflect the integration of multiple semantic concepts.

When the sentence comprehension activations are illustrated on the cartoon brains in Figure 2, it is striking to see that temporal lobe activation for plausible versus implausible sentences (middle right panel) lies outside the areas associated with prelexical processing (top left panel) or pseudowords (top right panel) but inside the ring of activation associated with semantic processing of single words (middle left panel). This dissociation might reflect a task confound (e.g., lexical decision for words and plausibility judgments for sentences). Alternatively, it raises the interesting possibility that activation in the superior temporal sulci for sentences with plausible versus implausible meanings reflects the consolidation of multiple semantic concepts into a new and integrated concept. For example, the sentence "My memory is a little cloudy" has six words, including three rich content words (memory, little, cloudy), to express one concept: "forgotten." This suggests that activation for accessing the semantics of single words is spatially extensive but consolidates in the ventral banks of the superior temporal sulci when multiple meanings can be integrated into fewer concepts.

Semantic constraints in sentence comprehension

The previous section focused on activation for grammatically correct sentences with plausible versus implausible meanings. In contrast, this section considers the reverse of this comparison, that is, which areas are more activated by sentences with implausible versus plausible meanings. The assumption here is that, when the sentence meaning is implausible, ambiguous, or unconstrained, there will be a greater need for semantic constraints that will be expressed by top-down processing from our prior knowledge of the world and the experimental context.

When sentence comprehension becomes more difficult, activation is consistently observed in the left pars opercularis. I am distinguishing dorsal versus ventral pars opercularis according to whether activation is more or less than 20 mm above the AC-PC line. Dorsal pars opercularis activation has been reported for grammatically correct sentences with implausible versus plausible meanings by Ye and Zhou35 at [40 22 24], and in several studies when the meaning of the sentence is more difficult to extract: for example, by Bilenko and colleagues36 at [52 14 23/47 14 29] for sentences with ambiguous versus unambiguous meanings; by Mashal and colleagues19 at [44 18 30] for sentences with novel metaphoric versus literal meanings; by Willems and colleagues37 at [40 10 22] when speech was presented with incongruent relative to congruent gestures or pantomimes (i.e., when there was interference at the comprehension level); and by Desai and colleagues38 at [54 10 31] when participants listened to sentences with abstract versus visual or motoric meanings. However, left dorsal pars opercularis


activation is not specific to semantic contexts. It has also been reported by Myers and colleagues5 at [41 8 27] for prelexical stimuli when there is an unexpected change in phonetic category, and in other studies of linguistic and nonlinguistic sequencing (see next section).

More ventrally, pars opercularis activation has been reported by Tyler and colleagues39 at [51 15 18] for grammatically correct sentences with implausible versus plausible meanings; by Obleser and Kotz9 at [60 12 16] for sentences with low versus high semantic expectancy; by Desai and colleagues38 at [51 19 13] when participants listened to sentences with abstract versus visual or motoric meanings; by Turner and colleagues40 at [50 14 18] when participants viewed silent videos of a speaker making speech versus non-speech oral movements; and by Szycik and colleagues41 at [49 14 18] when heard speech is inconsistent with seen oral movements. These activations may reflect top-down processing that is attempting to predict the stimuli. For example, sentences with low semantic expectancy (e.g., "She weighs the flour") are difficult to predict before completion of the sentence, because a verb such as "weighs" can be associated with multiple objects; whereas sentences with high semantic expectancy (e.g., "She sifts the flour") are easier to predict before sentence completion, because there are not many nouns that are likely to follow verbs like "sift." In this context, activation in the left ventral pars opercularis might reflect increased demands on predictions about the forthcoming sequence of events. The examples in this section have pertained to semantic predictions. However, left ventral pars opercularis activation has also been associated with verbal working memory at [53 7 15] by Koelsch and colleagues42; and with articulatory planning: specifically, Papoutsi and colleagues43 report activation at [54 12 12] for the repetition and subvocal rehearsal of pseudowords with low versus high sublexical frequency, a manipulation that does not involve semantic, syntactic, or phonological word-form processing. Left ventral pars opercularis activation can therefore be consistently explained in terms of the generation of semantic or articulatory sequences. Its function is considered again in sections "Articulation" and "Summary of speech production."

The comparison of auditory sentences with plausible versus implausible meanings was also

associated with left pars orbitalis activation, at [39 27 6/48 24 9] by Tyler and colleagues39 and at [52 32 4] by Ye and Zhou.35 The same region has been reported by Davis and Gaskell13 at [40 27 9/48 24 12] for single auditory words relative to pseudowords; by Nosarti and colleagues44 at [38 32 6] for reading irregularly spelled words relative to pseudowords; by Schafer and Constable45 at [42 28 11] for semantic relative to syntactic processing in written speech comprehension; and by Aarts and colleagues46 at [40 22 14] when the written words "right" or "left" are incongruent with the direction of an arrow (right or left). All these studies suggest that the left pars orbitalis activates with increasing demands on semantic retrieval in the context of semantic conflict that may arise at the single-word level or between the meanings of single words and sentences. Likewise, right inferior frontal activation has also been reported in the context of conflicting semantic information. For example, activation in the right pars opercularis [44 18 14/54 18 12] and triangularis [46 28 6] was reported by Snijders and colleagues26 when the meaning of a sentence was ambiguous versus unambiguous; and by Peelle and colleagues47 at [44 14 22/44 4 28] when the meanings of a series of sentences conflict with one another. Dick and colleagues31 also report right inferior frontal activation when participants listened to and watched a storyteller using hand movements that were semantically incongruent relative to congruent with the spoken speech. As with left inferior frontal activation, right inferior frontal activation is not specific to the auditory modality. It has also been reported at [60 28 4] by Schmidt and Seger48 when subjects read metaphors relative to literal sentences. Right inferior frontal activation may therefore increase when there is incongruency between the meaning of the words and the meaning of the sentence.

In summary, when the semantic content of sentences is difficult to extract, activation increases in the left and right pars opercularis and pars orbitalis. This contrasts with the temporal and parietal regions that are more activated for sentences with plausible meanings. The most likely explanation is that activation in the pars orbitalis reflects semantic competition at the single-word level and activation in the pars opercularis reflects top-down predictions about the plausible sequence of events. This is


discussed in more detail in the following sections, which also attempt to clarify a distinction between the functions of the dorsal and ventral regions of the pars opercularis.

Syntactic constraints

Syntactic processing refers to the hierarchical sequencing of words and their meanings, with the expected order of words depending on the language spoken. Consequently, there are language-specific grammatical rules about adjacent words (what type of word is likely to follow another) and long-distance word associations. Deriving the meaning of words becomes particularly complex when one phrase is embedded within another. For example, a sentence such as "the man the dog chased was carrying a ball" has one relative clause ("the dog chased") nested within a simpler sentence ("the man was carrying the ball"). As the sentence becomes more complex, the demands on auditory short-term memory increase, because items have to be held in memory for longer until the meaning of the sentence emerges.

Syntactic processing has been investigated by comparing sentences with grammatical errors to sentences without grammatical errors, and sentences with more versus less syntactically complex structures. In both cases, the demands on syntactic processing are confounded by the differing demands on semantics, because both grammatical errors and complex sentences make it more difficult to extract the meaning of the sentence. It is therefore not surprising that the left pars opercularis areas that were more activated for sentences with implausible versus plausible meanings in the previous section are also more activated for sentences with grammatical errors or complex structures. For example, left ventral pars opercularis activation has been reported at [62 18 12] by Friederici and colleagues10 when sentences had syntactic errors; and at [47 12 18] by Raettig and colleagues49 when there were violations in verb-argument structure. As discussed in the previous section on semantic constraints, activation in the left ventral pars opercularis has also been associated with verbal working memory [53 7 15] and with predicting the sequence of semantic or articulatory events. It will be discussed again in section "Summary of speech production."

Syntactic complexity has also been shown to increase auditory sentence activation in the left dorsal pars opercularis at [45 6 24], even when working memory demands are factored out.50 An effect of syntactic complexity at [48 16 30] has also been reported by Newman and colleagues51 for written sentences; and activation at [46 16 24] during written

sentences was more strongly primed by prior presentation of the verbs than the nouns.52 Makuuchi and colleagues50 attributed dorsal left pars opercularis activation to the hierarchical organization of sequentially occurring events. Hierarchical sequencing of events is a core component of syntactic processing, but it is not limited to linguistic stimuli. It can therefore be investigated using nonlinguistic stimuli, and such studies have again highlighted the involvement of the left dorsal pars opercularis.53,54 For example, Bahlmann and colleagues53 found significantly higher activation in the dorsal pars opercularis at [50 6 30] when sequences of nonlinguistic visual symbols could be predicted on the basis of nonadjacent dependencies compared to adjacent dependencies. Similarly, Tettamanti and colleagues54 found dorsal left pars opercularis activation at [54 12 24] when participants learnt the nonrigid (variable) dependencies of items within a sequence of unfamiliar colored shapes. The dorsal left pars opercularis therefore appears to be involved in sequencing events, irrespective of whether they are linguistic or nonlinguistic.

The third region that has been associated with syntactic errors or complexity lies on the border between the pPT and the vSMG. Specifically, Raettig and colleagues49 reported activation at [65 42 15] for sentences with grammatical errors (e.g., "Peter has eat the apple" vs. "Peter has eaten the apple"); Friederici and colleagues55 report activation at [63 42 21] for syntactically complex versus less complex sentences; and Richardson and colleagues32 report increased activation at [58 42 22] for sentences where the meaning depends on the order of the subject and object (e.g., "The dog chased the horse"/"The horse chased the dog") versus sentences where the subject and object are not reversible (e.g., "The dog eats the bone").

Activation in the vicinity of pPT and/or vSMG is not limited to difficult syntactic contexts. These areas are also activated when sentence comprehension is made more challenging at the perceptual or semantic level. In perceptually challenging contexts, Dos Santos Sequeira and colleagues56 report activation at [56 42 24] when participants


listened to consonant-vowel syllables in background noise. In semantically challenging contexts, Obleser and Kotz9 report activation at [46 36 18] for sentences with low versus high semantic expectancy, particularly when the stimuli had low intelligibility; and Hickok and colleagues1 report pPT/vSMG activation for individual subjects at [x = 47 to 61; y = 32 to 50; and z = 11 to 28] for hearing sentences that were syntactically correct but semantically meaningless because some words in the sentence had been replaced with pseudowords ("It is the glandor in my nedderop").

Why does left pPT/vSMG activate during difficult speech comprehension conditions? Evidence from studies of auditory short-term memory42,43,57 and speech production in the absence of auditory stimuli32,58 suggests that it is involved in subvocal articulation. For example, Papoutsi and colleagues43 found increased pPT/vSMG activation bilaterally at [56 38 20/64 32 10] when subjects articulated four versus two syllables during a task that involved delayed repetition and subvocal rehearsal of pseudowords. Left pPT or vSMG activation has also been observed during auditory working memory at [44 38 21] by Koelsch and colleagues42 and at [63 34 19] by Buchsbaum and D'Esposito.57 One interpretation is that left pPT/vSMG activation reflects a working memory rehearsal strategy that holds an auditory representation in memory. However, in Buchsbaum and D'Esposito,57 the task did not involve the online retention of a fixed set of items. Instead, left pPT/vSMG activation was observed when participants made correct word recognition decisions on constantly changing targets that may or may not have been presented before. Buchsbaum and D'Esposito57 therefore suggest that left pPT/vSMG activation reflects phonological retrieval processes (perhaps subvocal articulation) that occur automatically during successful recognition of recently presented words.

Other studies have shown that left pPT/vSMG activation is not limited to speech production in the context of auditory stimuli or short-term memory tasks. Nor is it limited to the production of speech sounds. For example, Richardson and colleagues32 report activation at [62 40 14/50 46 20] for alternately articulating "one" and "three" in response to seeing (not hearing) the digits 1 and 3, and also for nonverbal mouth movements versus hand movements; and Wilson and colleagues58 report activation at [64 34 24] for naming pictures with low- versus high-frequency names. For non-speech sounds, Koelsch and colleagues42 report activation at [47 42 24] when a sequence of auditory tones is held in memory; and Yoncheva and colleagues59 report activation at [42 38 26] for a tone-matching task (are tone-triplet pairs the same or not?) as well as for a rhyme-matching task (do word pairs rhyme or not?), with no significant difference in activation for the rhyme versus tone comparisons.

The correspondence in the location of speech and non-speech effects has been demonstrated by Koelsch and colleagues42 for auditory working memory tasks; by Yoncheva and colleagues59 during matching tasks; and by Richardson and colleagues32 for auditory speech comprehension and visually cued mouth movements. Common activation for two tasks may or may not reflect a common underlying function. Indeed, Hickok and colleagues1 dissociated the spatial pattern of activation in pPT/vSMG for speech comprehension and production using multivariate pattern classification. This approach identifies a significant difference in the pattern of voxels over a single region, but it does not dissociate the functional contribution of the pPT versus vSMG regions.
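Multivariate pattern classification of the kind used by Hickok and colleagues1 asks whether the spatial pattern of responses across voxels in a region distinguishes two conditions, even when mean activation does not. A minimal sketch with scikit-learn on synthetic "voxel" data; the data, region, and effect size are invented, so this shows the general technique rather than their analysis.

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_voxels = 80, 120                 # synthetic trials x voxels in one region
X = rng.standard_normal((n_trials, n_voxels))
y = np.repeat([0, 1], n_trials // 2)         # 0 = comprehension, 1 = production
X[y == 1, :10] += 0.8                        # weak condition-specific pattern

# Above-chance cross-validated decoding accuracy implies that the region's
# voxel pattern carries information about the condition.
acc = cross_val_score(LinearSVC(), X, y, cv=5).mean()
print(f"decoding accuracy: {acc:.2f}")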

To conclude this section, activation for syntactically difficult sentences increases in the left pars opercularis and the left pPT/vSMG.55 The same regions are also activated when syntactic demands are held constant and sentence comprehension is made more difficult. Therefore, activation in these regions is not specific to syntactic processing. The dorsal pars opercularis appears to play a top-down role in sequencing linguistic and nonlinguistic events, independent of working memory demands.50 The ventral pars opercularis is involved in verbal working memory42 and in sequencing semantic and articulatory events. In contrast, activation in the pPT and vSMG can be explained by subvocal articulation. The functional distinctions between these regions will be discussed further in section "Summary of speech production," but future studies will be required to dissociate the functions of pPT and vSMG.

Subvocal articulation during speech comprehension

The previous section concluded that left pPT/vSMG activation during difficult speech comprehension reflected subvocal articulation. The close link


between comprehension and articulation is one of the unique features of speech, because we can accurately reproduce speech sounds but we cannot accurately reproduce non-speech sounds. This is not surprising given that auditory speech, by definition, is constrained by the auditory signals that our vocal tracts and articulators can produce. Evidence that heard speech activates areas involved in articulating speech comes from common activation for (i) auditory speech in the absence of articulation and (ii) articulation in the absence of auditory speech. Three areas meet these criteria. One is the motor cortex;60,61 the others are the pPT and vSMG regions discussed in the previous section. All three of these regions are consistently activated during speech production but inconsistently activated during speech comprehension. It was quite striking that the majority of auditory speech perception and comprehension studies that I reviewed did not report left pPT or vSMG activation, irrespective of whether the stimuli were sentences,10,23,24,38 words,8,11,12,62-64 or prelexical consonant-vowels.3,7 I only found reports of left pPT and/or vSMG activation in five studies of sentence comprehension1,9,32,49,55 and one study of prelexical perception.56 In all cases, pPT/vSMG activation was identified when speech comprehension was made more difficult at the syntactic, semantic, or perceptual level (see previous section). Thus, fMRI evidence suggests that an area associated with speech production is not consistently involved in speech comprehension (when acoustic inputs are matched) but may play an essential role when speech comprehension is difficult.

A similar conclusion has been reached for the role of the motor cortex in speech perception. This is discussed in two recent review papers.60,61 Devlin and Aydelott60 discuss the degree to which functional imaging data support the motor theory of speech perception. They emphasize that motor and premotor activation is not consistently reported in speech perception studies, but it is observed when speech stimuli are degraded (e.g., embedded in noise) or minimal (e.g., syllables rather than words) and may therefore be recruited to aid speech comprehension when speech perception is challenging. Scott and colleagues61 also conclude that motor activation might facilitate speech perception in difficult listening conditions. In addition, they suggest that motor activation during speech perception allows people to coordinate their speech with others, both in terms of turn taking and in terms of idiosyncratic characteristics of pronunciation and the use of conceptual and syntactic structures. The main point in both reviews,60,61 however, is that activation in motor cortex is not always necessary for phonetic encoding or speech comprehension.

To conclude this section, left motor, left pPT, and left vSMG activations have been reported for tasks that challenge auditory comprehension, and each of these regions is also activated for speech production in the absence of auditory cues. Activation may therefore reflect subvocal articulation that facilitates difficult speech comprehension. More specifically, the overlap in the location of activation for speech production and difficult speech perception may reflect the use of the speech production system to make predictions during speech perception.

The role of prosody in speech comprehension

Prosody refers to the patterns of stress and intonation in speech. These patterns carry emotional and nonverbal information that supplements the lexico-semantic information carried by the spoken words. Wiethoff and colleagues65 found that emotional prosody increases activation in the amygdala. This is consistent with previous studies showing amygdala activation for facial expressions66 and when subjects read emotional words.67 Wittfoth and colleagues68 found that when the emotional prosody in heard sentences was incongruous with the semantic content, activation increased in the right superior temporal gyrus and sulcus and the right dorsal anterior cingulate cortex (ACC). Many previous studies have associated the right superior temporal sulcus with nonverbal emotional processing of human voices and faces, but Kreifelts and colleagues69 have recently demonstrated a functional subdivision of the superior temporal lobes. They found maximum voice sensitivity in the trunk of the superior temporal lobe and maximum face sensitivity in the posterior terminal ascending branch. They also argue that an overlap of these two regions at the bifurcation of the superior temporal cortex may support the formation of an audiovisual emotional percept.

Although the earlier studies investigated the impact of emotional prosody in general, Ethofer and colleagues70 attempted to dissociate activation for pseudowords spoken in five prosodic categories


(anger, sadness, neutral, relief, and joy). Using multivariate pattern analysis, they demonstrated that each emotion had a specific spatial signature in the auditory cortex that generalized across speakers.
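Generalization across speakers in a pattern analysis like this is typically established by holding all of one speaker's trials out of training. A minimal sketch using scikit-learn's leave-one-group-out cross-validation on synthetic data; the category, speaker, and voxel counts are placeholders, not the cited study's design.

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(1)
n_trials, n_voxels = 100, 60
X = rng.standard_normal((n_trials, n_voxels))   # synthetic auditory-cortex patterns
y = rng.integers(0, 5, n_trials)                # 5 prosodic categories
speaker = rng.integers(0, 4, n_trials)          # which of 4 speakers spoke each trial

# Train on three speakers, test on the held-out one; above-chance accuracy
# would indicate emotion-specific patterns that generalize across speakers.
acc = cross_val_score(LinearSVC(), X, y,
                      groups=speaker, cv=LeaveOneGroupOut()).mean()
print(f"cross-speaker decoding accuracy: {acc:.2f}")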

Summary of activation related to speech comprehension

In summary, the current speech comprehension literature supports the following anatomical model. Acoustic processing of prelexical speech increases activation in the vicinity of bilateral HG, with selectivity for speech emerging in lateral, anterior, ventral, and posterior regions of the superior temporal gyri and sulci. Semantic processing of single words extends even further in the anterior, ventral, and posterior directions into the middle and inferior temporal cortex. Remarkably, however, semantic processing of plausible relative to implausible sentences was more focal than for words, and was observed in the ventral banks of the anterior and posterior superior temporal sulci, which lie between the activations associated with prelexical processing in the superior temporal gyri and single-word semantics in the middle and inferior temporal gyri (see Fig. 2). This constricted pattern of activation for sentence semantics might reflect the consolidation of multiple words into a reduced set of concepts.

The observation that prelexical and semantic processing of spoken words extends in anterior, ventral, and posterior directions suggests that the same speech inputs can follow multiple different pathways. The location of prelexical activation would then be determined by the task demands. For example, when an articulatory response is required, top-down expectations from motor programs may stabilize and enhance prelexical processing in the dorsal processing direction, but when a lexical decision is required, top-down expectations from semantic knowledge may stabilize and enhance prelexical activation in the ventral direction. In this way, the cortical networks supporting language comprehension are dynamically determined by the task and context30 (see Fig. 3).

Several different sources of top-down inputs/constraints are distinguished in Figure 3, including semantic, syntactic (word sequences), articulation, auditory, visual (spelling), and spatial location for locomotion. The review reported in this section localized semantic constraints to the left pars orbitalis, ventral pars opercularis, angular gyrus, and precuneus. Word order constraints (grammatical sequencing) were identified in the dorsal pars opercularis; and articulatory constraints were associated with activation in the left motor cortex,60 vSMG, and pPT (see Fig. 3). For further discussion of these regions, see section Summary of speech production.

Left lateralization for speech comprehension

Differences in the role of the left and right temporal lobes have been of particular interest because lesion studies indicate that it is the left rather than the right temporal lobe that is needed for speech recognition and production. In this context it is surprising that the comparison of speech and closely matched non-speech stimuli results in bilateral temporal lobe activation.9,10 The specialization and lateralization of temporal lobe speech function may therefore be driven by nonacoustic differences between speech and non-speech stimuli. These nonacoustic differences include our significantly greater experience with speech than non-speech stimuli,14 the resulting categorical qualities of the perceptual representation of speech,71 the ability to produce as well as to perceive speech,60,61 and the influence of spatial orienting.62

Why should the effect of auditory expertise14,23 be greater in the left than the right temporal lobe? Here I consider two possibilities: either (i) familiarity with the auditory stimulus increases bottom-up connections from acoustic processing in bilateral superior temporal lobes to representations of semantics, grammar, and articulation in the left frontal cortex; or (ii) familiarity results in more successful top-down predictions from left frontal regions that stabilize acoustic processing in the left temporal lobe more than in the right temporal lobe. Adank and Devlin23 support the latter hypothesis. They suggest that the left-lateralized temporal lobe response they observed when time-compressed speech was recognized might be driven by top-down processing from left-lateralized premotor activation. Kouider and colleagues11 also highlight the influence of top-down inputs on lexical processing to explain why they observed repetition suppression in left HG for words but not pseudowords. Similarly, the left-lateralized anterior temporal lobe response that Leff and colleagues3 observed for vowel deviants could reflect top-down mechanisms (see section Prelexical phonemic processing).6


Figure 3. Functional attributions. Top panel: summary of the functions attributed to language areas; see text for rationale. Future reviews are required to postulate and illustrate the multiple pathways that connect these areas. Bottom panel: a schematic summary of the balance between bottom-up and top-down processing during speech comprehension. The hypothesis is that top-down constraints from frontal, motor, and parietal areas influence the location of activation foci for bottom-up prelexical processing of auditory inputs.


In conclusion, frontal lobe activations during speech comprehension are more consistently left-lateralized than temporal lobe activations. The proposal developed here is that lateralization in the temporal lobes may be driven top-down from higher-level processing in frontal areas. Moreover, different parts of the frontal cortex may determine lateralization in different temporal regions. If this hypothesis is correct, then the more damaging effect of left temporal lobe lesions compared to right temporal lobe lesions on language function might be a consequence of the loss of top-down connections from anterior speech areas. This could be investigated with functional connectivity studies.
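In its simplest form, such a study could ask whether a left frontal seed region's time course is more strongly coupled to the left than to the right temporal lobe. The sketch below is a minimal, hypothetical illustration of that seed-based logic, not a published pipeline: the region assignments, data shapes, and random stand-in time series are assumptions, and a real analysis would use preprocessed fMRI time series with confound regression.

```python
import numpy as np

def seed_connectivity(seed_ts, region_ts):
    """Pearson correlation between a seed time course (n_timepoints,)
    and each column of region_ts (n_timepoints x n_regions)."""
    seed = (seed_ts - seed_ts.mean()) / seed_ts.std()
    regions = (region_ts - region_ts.mean(axis=0)) / region_ts.std(axis=0)
    return (regions * seed[:, None]).mean(axis=0)

# Hypothetical example: 200 volumes; columns = left frontal seed,
# left temporal ROI, right temporal ROI (random stand-ins for real data).
rng = np.random.default_rng(0)
ts = rng.standard_normal((200, 3))
r_left, r_right = seed_connectivity(ts[:, 0], ts[:, 1:])
print(f"frontal-left temporal r = {r_left:.2f}, "
      f"frontal-right temporal r = {r_right:.2f}")
```

A lateralization account of the kind proposed here would predict reliably stronger frontal coupling with the left than with the right temporal ROI across participants.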

    Review of speech production

This section focuses on speech production, a complex multistage process that links conceptual ideas to articulation. The cognitive components of speech production have been described in many studies. Its roots are in the conceptual ideas that need to be expressed. The words associated with these concepts must then be retrieved and sequenced with their appropriate morphological forms. At the same time, competition from other words with similar meanings needs to be suppressed. Word selection is therefore a dynamic interaction between excitation of the intended words and inhibition of the unintended words.72-74

Sequencing is required at several different levels of the speech production process. At the sentence level, words need to be sequenced in a grammatically correct order. Within a word, sequencing is required to assemble the correct combination of phonemes and syllables. In addition, pitch, prosody, and a metrical structure need to be applied to speech output,75 which will depend on the conversational context. The combination of phonemes and syllables being produced is translated into a sequence of articulatory plans (phonetic encoding). This is followed by the initiation and coordination of sequences of movements in the speech articulators, which include the tongue, lips, and laryngeal muscles that vibrate the vocal tract for vowel phonation and prosody. At the same time, respiration needs to be controlled, and the resulting auditory signal needs to be monitored and fed back to the motor system for online correction of speech production.

Given the many processes that support speech production, it is not surprising that many different brain areas are involved. Each level of the classical speech output pipeline is addressed in the following sections: conceptual processing (controlling for word retrieval and articulation) (section Conceptual processing in speech production); word retrieval (controlling for conceptual processing and articulation) (section Word retrieval); articulation (controlling for conceptual processing and word retrieval) (section Articulation); and auditory feedback (section Monitoring speech output).

Conceptual processing in speech production

In a large-scale review of 120 functional neuroimaging studies of the neural systems that store and retrieve semantic memories, Binder and colleagues16 identified seven left-lateralized regions associated with amodal semantic processing: (i) inferior frontal gyrus, (ii) ventral and dorsal medial prefrontal cortex, (iii) posterior inferior parietal lobe, (iv) middle temporal gyrus, (v) fusiform gyrus, (vi) parahippocampal gyri, and (vii) the posterior cingulate gyrus. All these areas have been associated with speech comprehension in the previous section. Their contribution to speech production depends on the task and the type of semantic information that needs to be retrieved.

A demonstration that speech production and speech comprehension access the same conceptual system was provided by de Zubicaray and McMahon.76 They found that activation in the left pars orbitalis [51 24 9] was reduced when picture naming occurred in the context of semantically related heard words versus unrelated or phonologically related heard words. The authors suggest that the picture is recognized before the auditory word, and that semantic processing of the picture facilitates recognition of the auditory word. The coordinates of this effect [51 24 9] also correspond to those reported by Tyler and colleagues39 at [48 24 9] and by Ye and Zhou35 at [52 32 4] when heard sentences were semantically implausible versus plausible. The left pars orbitalis is therefore involved in semantic retrieval processes that support both speech comprehension and production tasks.

A direct comparison of semantic versus phonological word retrieval by Birn and colleagues20 has illustrated the role of the medial superior frontal cortex at [21 11 45] in the conceptual processes that support speech production. It is interesting to note that activation in the medial superior frontal cortex has also been reported by Tremblay and Gracco77 at [21 19 57] for volitional versus stimulus-driven word selection. This raises the possibility that semantic word retrieval (e.g., producing words that belong to a particular semantic category like "fruits") requires more volitional effort than phonological word retrieval (e.g., producing words that start with the same letter). The point here is that increased activation for semantic tasks may reflect the demands on executive strategies that are necessary for, but not limited to, semantic word retrieval.

In addition to the widely distributed set of regions that are typically associated with semantic/conceptual processing during speech production and comprehension, many other sensory-motor areas may play a role in accessing words with specific meanings. This was illustrated in a study of verbal fluency by Hwang and colleagues,78 who found that the retrieval of words belonging to visual categories activated extrastriate cortex (a secondary visual processing area); retrieval of words belonging to motor categories activated the intraparietal sulcus and posterior middle temporal cortex; and retrieval of words belonging to somato-sensory categories activated postcentral and inferior parietal regions. Thus, they demonstrate that lexico-semantic processing during speech production is distributed across brain regions participating in sensorimotor processing. This is argued to be a consequence of the sensorimotor experiences that occurred during word acquisition.

Hocking and colleagues72 also demonstrate how activation during speech production depends on the type of words that are being retrieved. These authors found bilateral hippocampal activation at [30 33 0/33 6 33] when pictures to be named were blocked in terms of their semantic category, and bilateral anterior medial temporal activations when the objects to be named were blocked according to similar visual features. Whitney and colleagues79 also found left hippocampal activation [28 35 5] for spontaneous word production (free verbal association) relative to category and letter fluency. These medial temporal lobe activations during speech production tasks therefore appear to depend on the demands placed on semantic retrieval, even though medial temporal lobe structures such as the hippocampus are more classically associated with the initial acquisition of new words.80 To conclude, conceptual processing during speech production activates the same set of regions that have been associated with single-word comprehension.

Word retrieval

To investigate the processing that supports word retrieval, speech production has been compared to tasks that control for articulation. For example, Whitney and colleagues79 and Jeon and colleagues81 compared word generation tasks (find, select, and produce a word related to the stimulus) to reading (produce the word or nonword specified by the stimulus). In both studies, the most significant and consistent effects were observed in the left inferior and middle frontal gyri, spanning the pars opercularis (BA 44), the pars triangularis (BA 45), and the inferior frontal sulcus. The peak coordinates for the left middle frontal activations in these two studies are remarkably similar: at [48 28 21] in Whitney and colleagues79 and at [51 25 25] in Jeon and colleagues.81 Two other studies also demonstrated that this left middle frontal area is involved in word retrieval irrespective of the type of words that needed to be produced. Specifically, Heim and colleagues82 observed that the same left middle frontal region at [44 28 22] was activated for generating words that were either semantically or phonologically related to a word; and de Zubicaray and McMahon76 found an effect of semantic and phonological priming on picture naming at [48 21 18], which is again consistent with a generic role for this region in word retrieval.
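How similar is "remarkably similar"? The pairwise Euclidean distances between the four peaks just cited can be checked directly. In the sketch below the coordinates are copied as printed above (hemisphere signs as they appear in the text), and the computation shows that all four peaks fall within about 9 mm of one another, i.e., within the spatial uncertainty introduced by typical smoothing and normalization.

```python
import numpy as np

# Left middle frontal peaks (mm) from the four studies cited above
peaks = {
    "Whitney et al.": np.array([48, 28, 21]),
    "Jeon et al.": np.array([51, 25, 25]),
    "Heim et al.": np.array([44, 28, 22]),
    "de Zubicaray & McMahon": np.array([48, 21, 18]),
}

names = list(peaks)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        d = np.linalg.norm(peaks[a] - peaks[b])
        print(f"{a} vs {b}: {d:.1f} mm")  # largest pair is 9.0 mm apart
```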

A second region that is consistently activated in word retrieval tasks is the left dorsal pars opercularis, which was associated with top-down sequencing of linguistic and nonlinguistic events in the speech comprehension section (Syntactic constraints). During speech production, left dorsal pars opercularis activation was located at [40 17 25] by Jeon and colleagues81 for word generation more than reading, and at [49 13 29] by Fridriksson and colleagues83 for imitating or observing a speaker producing nonsense syllables. As viewing a speaker producing nonsense syllables does not involve semantics or articulation,83 speech production activation in the left dorsal pars opercularis is not specific to semantic or syntactic processing. Indeed, as discussed in the speech comprehension section earlier, the left dorsal pars opercularis is involved in the hierarchical sequencing of linguistic and nonlinguistic events.37,50,53,54 This is required for speech production as well as speech comprehension. The left dorsal pars opercularis may therefore play a role in the hierarchical sequencing of events during production as well as comprehension tasks.

    duction as well as comprehension tasks.A third region that is consistently activated in

    word retrieval tasks is the left ventral pars opercu-laris that was associated with predicting semanticor articulatory sequences in the speech compre-hension section (Syntactic constraints). Duringspeech production, this region has been reportedat [57 9 9] by Zheng and colleagues84 for artic-ulating versus listening to the word Ted; and at[54 12 12] by Papoutsi and colleagues43 for the

    repetition and subvocal rehearsal of pseudowordswith low versus high sublexical frequency. As thismanipulation does not involve semantic, syntactic,or phonological word form processing, the authorsattribute their activations to phonetic encoding (ar-ticulatory planning). Left ventral pars opercularisduring speech production has also been reportedin two studies by Heim and colleagues.82,85 In onestudy, Heim and colleagues82 found that left ventralpars opercularis activation at [46 10 4] was greaterfor retrieving words that were phonologically ver-

    sus semantically related to a target. Another85 foundleft ventral pars opercularis activation at [50 6 6]during picture naming in German was sensitive tothe gender of the noun being produced. This wasprobed by presenting pictures to be named with anauditory word and its gender determiner (e.g.,derTischthe table). The gender of the auditory wordand picture were either the same or different withmore activation when they were different. However,it is not clear from this study whether activation in

    the left ventral pars opercularis was sensitive to thegender of the determiner (which would reflect syn-tactic processing) or to the phonological overlap inthe determiner being heard and the determiner be-ing retrieved (which could reflect articulatory pro-cesses). The explanation that is most consistent withall the other data being presented is that left ventralpars opercularis activation reflects the demands onarticulatory planning.

The distinction between ventral and dorsal pars opercularis has been rather arbitrary. I have been using "dorsal" to refer to activation that is 20-30 mm above the AC-PC line and "ventral" when activation is 4-20 mm above the AC-PC line, but there may be multiple functional subregions within each of these regions. Moreover, some studies are at the border of ventral and dorsal pars opercularis activation. For example, Heim and colleagues86 report activation at [50 8 19] when pictures to be named were blocked according to similarity in semantics or syntax (gender) versus phonology. The results were interpreted in terms of decreased demands when name retrieval was primed by phonologically related responses and increased demands when stimuli are presented with semantic and syntactic competitors. However, without further studies it is not possible to determine whether the function of this mid-pars opercularis region corresponds to the properties of the dorsal or ventral regions.
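Because the convention is defined purely on height above the AC-PC line (z = 0 in standard stereotactic space), it can be stated as a simple rule on the z-coordinate. The function below is our own restatement of the convention used in this review, not an established anatomical tool; the "unclassified" label for peaks outside both bands is an assumption added for completeness.

```python
def pars_opercularis_subdivision(z_mm: float) -> str:
    """Label a pars opercularis peak by the z-coordinate convention used
    in this review: ventral for 4-20 mm above the AC-PC line, dorsal for
    20-30 mm; anything else is left unclassified."""
    if 4 <= z_mm < 20:
        return "ventral"
    if 20 <= z_mm <= 30:
        return "dorsal"
    return "unclassified"

# z-coordinates of peaks discussed above:
for label, z in [("Jeon et al. [40 17 25]", 25),
                 ("Zheng et al. [57 9 9]", 9),
                 ("Heim et al. [50 8 19]", 19)]:
    print(label, "->", pars_opercularis_subdivision(z))
```

The Heim and colleagues86 peak at z = 19 falls just inside the ventral band, which is exactly why the text above treats it as borderline.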

In summary, word retrieval activates the left middle frontal cortex and the dorsal and ventral left pars opercularis. Left middle frontal activation was not associated with speech comprehension unless the task required explicit semantic comparisons between two words;16 therefore the left middle frontal cortex appears to be more involved in word retrieval than in word comprehension. In contrast, activation in the dorsal and ventral regions of the pars opercularis has been reported for both speech production and comprehension (see Fig. 2). The dorsal pars opercularis has been associated with sequencing linguistic and nonlinguistic events, and the ventral pars opercularis has been associated with sequencing articulatory events.43 Generating or interpreting the sequence of events is essential for speech comprehension and production at both the individual word (i.e., the sequence of phonemes) and the sentence level (i.e., the sequence of words). The dorsal and ventral pars opercularis may contribute to sequencing via top-down constraints from prior knowledge of what event typically follows another.

Articulation

When speech is produced, activation typically increases bilaterally in motor and premotor cortex, the cerebellum, the supplementary motor area (SMA and pre-SMA), the superior temporal gyri, the temporo-parietal cortices (pPT/vSMG), and the anterior insula, with left-lateralized activation in the putamen (see Brown et al.87 for a recent review). None of these regions is dedicated to speech per se, as illustrated by Chang and colleagues,88 who report the same areas in the production of speech and the production of non-speech sounds from orofacial and vocal tract gestures (e.g., cough, sigh, kiss, snort, laugh, tongue click, whistle, cry) that have no phonemic content. The only areas that Chang and colleagues88 found to activate for speech more than vocal tract gestures were the ACC and bilateral caudate. Both these areas have been associated with the suppression of inappropriate responses by Aarts and colleagues,46 Ali and colleagues,74 and Kircher and colleagues.89 Greater activation for speech than non-speech production may therefore reflect the greater demands on response selection in the context of more competitors that need to be suppressed during the selection of speech versus non-speech movements.

The functions of the anterior cingulate and caudate nuclei during word selection are likely to be different. There are many subregions in the ACC that may each have their own function during speech production.74,90,91 The head of the caudate has also been implicated in sequencing temporal and cognitive processes,92 including the sequencing of nonlinguistic events.53 It is therefore plausible that greater activation in the head of the caudate nuclei for speech relative to non-speech88 might contribute to greater precision in the sequencing of speech relative to non-speech sounds.

Other studies have shown that activation in areas common to speech and non-speech vocal production (i.e., the premotor, parietal, temporal, cerebellar, and putamen regions) increases with the length of the spoken utterance43 but is inversely related to the familiarity of the stimulus. For example, Papoutsi and colleagues43 found more activation throughout the speech production system for the repetition of pseudowords with four syllables versus two syllables; and Shuster93 found that activation in this system increased for the repetition of pseudowords compared to the repetition of words. Interestingly, the most extensive difference between pseudoword and word repetition in Shuster93 was located in the left anterior insula. This is consistent with a study by Moser and colleagues,94 who found increased activation in the left anterior insula for the repetition of pseudowords with novel syllables relative to the repetition of pseudowords with native (i.e., familiar) syllables.

With respect to the function of the left anterior insula, the meta-analysis by Brown and colleagues87 found that the frontal operculum and the medially adjacent anterior insula were activated by syllable singing as well as oral reading; and Koelsch and colleagues42 found left anterior insula activation for the rehearsal of tone (pitch) information [32 19 3] as well as verbal information [32 15 4]. Brown and colleagues87 speculate that the anterior insula is involved in generalized orofacial functions, including lip movement, tongue movement, and vocalization. The fact that activation does not depend on whether speech is overt or covert,83 and is not reported to depend on the number of syllables being produced,43 is consistent with previous claims that the anterior insula is involved in the planning rather than the execution of articulation. In this context, increased activation in the left anterior insula for unfamiliar speech sounds93,94 may simply reflect greater demands on articulatory speech plans when they are unfamiliar versus familiar.

With respect to the function of the premotor cortex, Brown and colleagues87 distinguish areas that control larynx movements from those that control tongue and mouth movements. In Talairach coordinates, Brown and colleagues87 locate the larynx motor cortex at [40 12 30/44 10 30] and locate tongue movements to part of the rolandic operculum at [50 9 23/59 5 17], consistent with other studies of non-speech oral movements.95 In contrast, Meister and colleagues96 found that the most dorsal part of the premotor cortex [48 12 54/45 12 60], above the larynx motor area, is activated by finger tapping as well as articulation. This led Meister and colleagues96 to suggest that the dorsal premotor region plays a role in action selection and planning within the context of arbitrary stimulus-response mapping tasks. Thus, there are at least three functionally distinct regions in the premotor cortex. Of these three regions, Brown and colleagues87 note that activation during speech is typically strongest in areas associated with phonation (i.e., the vibration of the vocal tract). This is consistent with the findings of Fridriksson and colleagues,83 who show peak activation for overt articulation relative to observation of articulation at [58 2 36/58 10 34], which is in the vicinity of the larynx area rather than the mouth and tongue area.

The direct comparison of producing versus observing speech production83 identified the pre-SMA [2 6 60] and left putamen [24 6 6], as well as the premotor cortex discussed earlier. This suggests that these regions are involved in the initiation or execution of speech movements, but activation is not specific to speech movements. For example, Tremblay and Gracco77 observed pre-SMA activation at [6 13 50] for both volitional and stimulus-driven mouth movements, irrespective of whether the response was speech or oral motor gestures; and Bahlmann and colleagues53 report activation at [1 4 57] for learning the hierarchical organization of sequentially occurring nonlinguistic visual symbols.

In summary, articulatory planning of orofacial movements activates the left anterior insula.87 This area is equally activated for overt articulation and action observation,83 and more activated when the motor plans are unfamiliar.93,94 The initiation and execution of movement increase activation in bilateral premotor/motor cortex, the pre-SMA, and the left putamen.83 At least three distinct areas in the premotor/motor cortex support speech production: the most ventral is involved in tongue movements, a more dorsal region is involved in control of the larynx, and the most dorsal region is involved in planning both speech and finger tapping movements. Two areas were found to be more involved in the articulation of speech than of non-speech orofacial movements: the anterior cingulate and the bilateral head of caudate. Both regions have been associated with the suppression of competing responses (speech and non-speech), which suggests that the subtlety and precision involved in correct speech production require more suppression of competition from nontargets (i.e., all the words we know with similar articulatory patterns). The next section discusses the role of the superior temporal gyri, pPT, vSMG, and cerebellum in speech production.

    Monitoring speech output

The final stage of speech production involves auditory and somato-sensory monitoring of the spoken response. This is crucial for the online correction of speech production, for example, when speakers modify the intensity of their speech in noisy environments, or when auditory feedback is altered (e.g., by delay on the telephone). There are three levels at which this feedback can occur: the first is hearing the sound of the spoken response (auditory feedback), the second is higher-level phonological processing, and the third is somato-sensory information derived from the speech movements.72,97

Auditory processing of the spoken response activates bilateral superior temporal regions associated with speech perception. This has been observed when auditory processing of the spoken response was not controlled with a speech output baseline, for example, by Heim and colleagues86 and Wilson and colleagues58 during picture naming, and by Brown and colleagues87 during reading aloud. When speech output is controlled, however, superior temporal activation has not been reported for word generation tasks.79,81

With respect to higher-level phonological processing of the spoken output, three recent studies have highlighted the involvement of the pPT and/or vSMG when speech production is made more difficult and therefore requires more auditory monitoring. In Hocking et al.,72 activation at [54 36 15] increased when pictures were named in the context of semantic interference; and Abel and colleagues73 showed that picture naming activation at [56 37 15] was stronger for phonological than semantic interference. The argument here is that both semantic and phonological interference make picture naming more error prone, and that pPT and/or vSMG activation is involved in monitoring the verbal response to reduce errors. The effect of verbal self-monitoring on left pPT/vSMG activation was directly demonstrated by Zheng and colleagues.84 Their participants either whispered or produced the word "Ted". During production, the auditory input was either (i) consistent with the speech output (i.e., the word "Ted") or (ii) a masking noise that prevented participants from hearing the sound of their own speech. Left pPT/vSMG activation at [66 45 15] was greatest when participants spoke in the context of a mask relative to the other conditions. This effect could not be explained by either speaking or hearing per se, but indicates greater pPT/vSMG activation when there was conflict between the expected auditory input and the actual auditory input. This is consistent with an interaction between auditory inputs and the auditory predictions generated by the speech production system or, conversely, between motor efferents and the motor predictions generated from the auditory speech inputs.84
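One way to make this forward-model account concrete is as a prediction error: the production system predicts the auditory consequences of articulation, and the pPT/vSMG response scales with the mismatch between the predicted and the actual input. The sketch below is an illustrative toy computation, not a model fitted by Zheng and colleagues; the signals and the squared-error measure are assumptions.

```python
import numpy as np

def auditory_prediction_error(predicted, actual):
    """Mean squared mismatch between the auditory signal predicted by a
    forward model of articulation and the signal actually heard."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return float(np.mean((predicted - actual) ** 2))

rng = np.random.default_rng(1)
predicted = rng.standard_normal(100)                  # predicted feedback
intact = predicted + 0.1 * rng.standard_normal(100)   # own speech heard
masked = rng.standard_normal(100)                     # noise replaces speech

# The masked condition yields the larger mismatch, mirroring the larger
# pPT/vSMG response when speakers could not hear their own speech.
print(auditory_prediction_error(predicted, intact))  # small error
print(auditory_prediction_error(predicted, masked))  # large error
```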

Linking back to the speech comprehension section, pPT/vSMG activation was attributed to subvocal articulation in perceptually or semantically challenging contexts. Increased subvocal articulation can also explain increased pPT/vSMG activation during speech production in the context of semantic,72 phonological,73 or acoustic interference.84 Specifically, pPT/vSMG activation could reflect increased subvocal articulation from semantic and phonological competitors72,73 that is then suppressed before motor output. There may also be increased subvocal articulation during production when no corresponding articulatory signals are received from auditory processing of the output.84 In both cases, pPT and/or vSMG may be involved in suppressing articulatory errors.

With respect to somato-sensory feedback in the absence of auditory feedback, Peschke and colleagues97 suggest a role for the postcentral gyrus. Consistent with this proposal, Zheng and colleagues84 report activation in the left postcentral gyrus when auditory feedback during speech production was masked; and Hocking and colleagues72 report activation in the right postcentral gyrus during error-prone speech production. It is also relevant to note that, during the same conditions that evoked postcentral activation, both Zheng and colleagues84 and Hocking and colleagues72 also report activation in the left cerebellum. This would be consistent with a role for both the postcentral gyri and the left cerebellum in somato-sensory feedback during speech production.

In summary, auditory monitoring and feedback during speech production requires the integration of auditory, articulatory, and somato-sensory signals with the motor output. The dynamics of this