

24 Multisensory Interactions in Speech Perception

Jordi Navarra, H. Henny Yeung, Janet F. Werker, and Salvador Soto-Faraco

The perception of someone talking provides correlated input to more than one sensory modality (mainly vision and audition) simultaneously. There are many everyday situations (such as face-to-face conversations, watching television, or videoconferencing) in which linguistically reliable information can be obtained from the sight of the speaker. The fact that one often communicates effectively in the absence of any visual cue (e.g., talking over the telephone) perhaps leads to the simple inference that these visual speech cues are completely redundant with respect to the concurrent acoustic input or even useless in their linguistic and informational relevance. Nevertheless, empirical evidence accumulated over the last few decades provides solid grounds to dismiss this subjective impression and instead supports the conclusion that there is important information from vision that, when accessible, complements and supplements the acoustic speech signal.

Vision carries substantial, and linguistically relevant, cues about the spoken signal. For example, research and clinical/educational practice with deaf individuals have repeatedly demonstrated the benefits of lipreading (or speechreading) under conditions of hearing loss (see Auer, 2010, for a review). Normally hearing individuals also display a remarkable sensitivity to these visual speech cues. For instance, the fact that adults, and even infants as young as 4 months, are capable of discriminating between silent faces articulating sentences in different languages (e.g., English and French) makes us think that the sensitivity to visual speech information arises as a part of normal development and not only as a compensatory strategy to cope with acoustic impairment (Weikum et al., 2007; see also Soto-Faraco et al., 2007; see figure 24.1). Many different linguistic cues (including both segmental and suprasegmental) can, in fact, be retrieved from visual speech articulations (Bernstein, Eberhardt, & Demorest, 1989; Jiang, Auer, Alwan, Keating, & Bernstein, 2007; Vatikiotis-Bateson, Munhall, Kasahara, Garcia, & Yehia, 1996; Yehia, Kuratate, & Vatikiotis-Bateson, 2002; and see chapter 23, in this volume, by Vatikiotis-Bateson and Munhall) and from other visible correlates such as head motion (Hadar, Steiner, Grant, & Rose, 1983, 1984; Munhall, Jones, Callan, Kuratate, & Vatikiotis-Bateson, 2004). An important question, however, is whether and how this visual source of information about speech is combined with auditory speech when they are both present.

The pioneering work by Sumby and Pollack (1954; see also Cotton, 1935) represents the first successful attempt to assess the role of dynamic facial information on the comprehension of a spoken message. Using a clever setup, these authors demonstrated that the perception of acoustically presented words masked with noise improved substantially when the speaker's facial movements were available to the perceiver (see also Grant & Greenberg, 2001, and Ross, Saint-Amour, Leavitt, Javitt, & Foxe, 2007, for other demonstrations of visual enhancement of auditory speech perception). This type of result reveals that observers can exploit (and thus benefit from) the informational correspondence between visual and acoustic aspects of the speech signal when needed. Moreover, abundant research suggests that visual speech exerts a substantial impact on speech perception even under good acoustic conditions (e.g., Reisberg, McLean, & Goldfield, 1987), that is, not only when the acoustical signal is degraded. McGurk and MacDonald (1976) demonstrated this point in a highly influential study, showing that a syllable that is heard as /ba/ when presented in isolation is often heard as /da/ when dubbed onto a video clip of a face silently articulating the syllable [ga] (see figure 24.2).1 This added value of audiovisual (AV) integration has also been demonstrated in more subtle ways, without the need for artificially induced intersensory conflict. For example, the combination of visual and auditory speech can make us more sensitive to nonnative phonemic distinctions that are difficult to discern on the basis of just visual or auditory information alone (Navarra & Soto-Faraco, 2007; see also Teinonen, Aslin, Alku, & Csibra, 2008). Developmental research has shown that these AV speech perception abilities are already present quite early in life, as young infants, and even neonates, are able to detect the match between speech sounds and corresponding visually perceived articulatory gestures (Aldridge, Braga, Walton, & Bower, 1999; Kuhl & Meltzoff, 1982; Patterson & Werker, 2003) and seem to display McGurk-like illusions (Burnham & Dodd, 2004).

The present chapter provides an overview of the recent advances in this area, discussing some of the research issues on multisensory speech perception that have led to general agreement as well as those that are currently at the focus of debate. We have organized the present review under three topics: the developmental aspects of AV speech integration, the perceptual and neural bases of AV speech integration, and the possible role that attention and top-down processes play on AV speech processing.2

Development of AV Speech Integration (Infancy)

Unlike theories of adult speech perception (see the next section), one primary theoretical divide among researchers studying the development of speech perception is whether the senses work together from birth or whether coordination of input across modalities comes about through experience and learning. The discussion in the current section will, in consequence, focus on when and how the auditory and visual systems come to work together in development.

According to the more standard theoretical approach, the senses are separate at birth and only become integrated through learning and experience. Within this approach, AV speech perception is referred to as "intermodal" or "cross-modal," to capture the fact that two independently functioning modalities need to act together. There is disagreement within this approach as to whether integration occurs across development through general principles of associative learning (e.g., Birch & Lefford, 1963) or through more active hypothesis testing (e.g., Piaget, 1952). In contrast to this integration framework, differentiation theorists refer to initial AV perception as "amodal," to capture the notion that the senses are not separate at birth and respond together to the input, for example, encoding intensity, rhythm, or other amodal properties of the generating sources (e.g., Gibson, 1969). In differentiation approaches experience also plays a role; it is needed to allow the senses to act independently and to attune, or narrow, the perceptual system to make it respond selectively to only those multimodal relations present in the infant's environment (see Lewkowicz, 1994, for a review of both classes of theories).

Figure 24.1 Experimental setup used by Weikum et al. (2007). Infants aged 4-6 months were presented with a series of videoclips showing faces silently articulating sentences in French or English. Once the infants habituated to the clips in one language (e.g., English), test trials appeared. Test trials consisted of similar silent videoclips that could appear in the same language (following the example, English), for half of the infants, or in another language (French) for the other half of the infants. According to the results, infants tested with a new language spent more time looking at the faces than the other group, indicating that infants discriminated English from French speech just from viewing silent articulations. By 8 months, only bilingual (French/English) infants succeed, perhaps reflecting an experience-based selectivity in keeping only necessary perceptual sensitivities.

Figure 24.2 The McGurk effect. We often "hear" the syllable /da/ when being presented with a face articulating the syllable [ga] dubbed with the sounds of the syllable /ba/ (McGurk & MacDonald, 1976).


Evidence for differentiation comes from studies that show, for example, infants' ability to detect the equivalence between the phonetic content of heard and seen speech (i.e., AV matching), even for sounds not previously experienced. There is also incontrovertible evidence that experience does play a role in the development of AV speech perception. For example, initial multimodal sensitivities that are not required in the infants' environment become less evident (a trend called "perceptual narrowing"), and those that are required become more robust. The evidence for each of these theoretical approaches is reviewed below.

Infant AV Matching 

In a typical AV matching paradigm, an infant is shown two side-by-side talking faces, each articulating a different utterance. A speech sound that matches the visible articulation of one of the faces but not the other is presented centrally. Evidence of AV matching is obtained if the infant looks longer to the matching face than to the mismatching face. Research in the late 1970s revealed that 2.5- to 4-month-old infants are able to use synchrony to match mouth movements with sounds (Dodd, 1979; see also Lewkowicz, 2010). By 5-7 months of age, infants are able to match the affective content of faces and voices (Soken & Pick, 1992; Walker-Andrews, 1997) and can perform the AV match using the gender of the faces and the voice by 7-9 months (Patterson & Werker, 2002; Poulin-Dubois, Serbin, Kenyon, & Derbyshire, 1994; Walker-Andrews, Bahrick, Raglioni, & Diaz, 1991). Indeed, a recent study provides electrophysiological evidence that gender matching might be possible during speech perception by as early as 10 weeks (Bristow et al., 2009).
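
As a concrete illustration of the dependent measure in this paradigm, the sketch below (Python; the helper name and the looking times are ours and purely hypothetical, not taken from any of the studies cited) computes a preference score as the proportion of looking directed at the matching face:

def matching_preference(look_match, look_mismatch):
    """Proportion of total looking time spent on the face whose articulation
    matches the heard speech; values above 0.5 indicate AV matching."""
    total = look_match + look_mismatch
    return look_match / total if total else 0.5

# Hypothetical looking times (in seconds) summed across test trials for one infant.
print(matching_preference(look_match=42.0, look_mismatch=28.0))  # 0.6: looks longer at the match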

The first evidence of phonetic matching of AV speech was provided in infants aged 4.5 months by Kuhl and Meltzoff (1982) and has been replicated several times for vowels and consonants, using both female (Kuhl & Meltzoff, 1984; Lewkowicz, 2000; MacKain, Studdert-Kennedy, Spieker, & Stern, 1983) and male videotaped faces (Patterson & Werker, 1999). Recent work has demonstrated that the match between AV vowel pairs is, in fact, evident as early as 2 months of age (Baier, Idsardi, & Lidz, 2007; Patterson & Werker, 2003). Indeed, infants at this age distinguish the dynamic lip movements of a rounded production, /wi/, from a single vowel, /i/, which ends in the same articulatory configuration, and can use this subtle difference to detect an AV match (Baier et al., 2007). These findings of matching as early as 2 months of age, nevertheless, are also consistent with an integration view since they could be the result of the substantial early experience of infants watching speaking faces, particularly in the face-to-face exchanges that are quite characteristic of parent-infant interactions. This is an important issue, as a requirement for experience would be consistent with a learning perspective and would suggest that AV speech perception is akin to general learning principles, just as the association between the sound and the sight of a car.

Support for a differentiation view, on the other hand, comes from accumulating evidence suggesting that bimodal matching may be also possible without specific experience. It has been observed that even neonates are able to match the seen movements of the mouth with heard vowels (Aldridge, Braga, Walton, & Bower, 1999) and that, at only 4 months of age, infants can match AV vowels (Walton & Bower, 1993) and consonants from a foreign language (Pons, Lewkowicz, Soto-Faraco, & Sebastián-Gallés, 2009). In the work by Pons and colleagues, infants were tested on their ability to bimodally match the English syllables /ba/ and /va/ presented acoustically with their respective lip movements. Note that the /ba/-/va/ phonetic distinction is not contrastive in Spanish (i.e., adult Spanish speakers tested in that study could not match correctly between acoustic and visual exemplars of /va/ and /ba/). At 4 months of age both Spanish and American infants succeeded, whereas by 10 months of age only American, but not Spanish, infants showed signs of pairing the sight and sound of /va/ and /ba/. This pattern of results suggests that AV matching precedes specific experience with phonological categories. Yet, experience with a particular language modulates this ability, accommodating each infant's perceptual system to the demands of his/her particular linguistic environment (as the differentiation hypothesis suggests).

Additional support for the limited role of experience in early detection of AV speech congruency may also be the ability to match heard and seen vocalizations beyond one's own species. Human infants at 5 months of age are able to match the species of heard vocalizations with static faces of humans and monkeys but not with static duck faces (Vouloumanos, Druhen, Hauser, & Huizink, 2009). One interpretation of these results is that early matching abilities are driven by experience-independent, evolutionarily based biases for conspecifics rather than by actual experience with specific animals. Yet, it is also possible that the greater similarity between human and monkey faces than between human and duck faces facilitates generalization of learning and hence better matching of monkey calls to monkey faces. More nuanced evidence for early amodal sensitivities to vocal sounds comes from reports that even at birth (Lewkowicz, Leo, & Simion, 2010), and continuing to 6 months (Lewkowicz & Ghazanfar, 2006), human infants can match heard and seen vocalizations of rhesus monkeys, even though they have never been exposed to them before. By 10 months of age, however, this AV matching ability declines for other species (Lewkowicz, Sowinski, & Place, 2008), again suggesting that experience is required to maintain the initial sensitivity.

Infant AV Integration 

Although the evidence for AV matching of speech at an early age is remarkable, sensitivity to AV match/mismatch does not directly imply that correlated information from vision and audition is integrated in a unitary percept. Stronger evidence of the senses working together can be found when a unitary percept arises from the combination of the features specified by heard and seen speech, as it happens in the McGurk effect. Burnham and Dodd (2004) tested integration abilities in 4-month-old infants by habituating them to either an AV matching /ba/ or a mismatching speech event consisting of an audio /ba/ dubbed onto a visual [ga], a combination that leads to a fused "da" (and sometimes "tha") percept in adults (see McGurk & MacDonald, 1976). After habituation, infants were tested with the sounds "ba," "da," or "tha" alone (i.e., dubbed onto a still face that provided no dynamic/linguistic information). In the test following habituation, and according to the typical habituation results, it was expected that infants would look longer to novel stimuli than to the ones experienced during habituation. Consistent with the idea that infants integrated the AV mismatch syllable into "da" or "tha" during the habituation phase, they showed greater recovery in looking times to the sound /ba/ than to /da/ or /tha/ (compared to infants habituated to the audiovisually matching /ba/). This suggests that, during habituation, the acoustic /ba/ + visual [ga] stimulus was fused and perceived as either "da" or "tha."

Two other studies have tested visual capture rather than AV fusion. In the adult literature the presentation of an auditory /b/ accompanied by a visual /v/ results in an acoustic percept of the visually presented phoneme (i.e., perceived as /v/), a phenomenon commonly referred to as "visual capture." Even if this phenomenon were not as indicative of integration as the McGurk effect, it surely represents another example of multisensory speech perception. In a series of experiments, Rosenblum, Schmuckler, and Johnson (1997) provided evidence that infants of 4 months show visual [va] capture of auditory /ba/. In a subsequent study using a slightly different method, Desjardins and Werker (2004) provided further evidence of visual capture in 4-month-old infants (auditory /bi/ by visual [vi]), although in their case, the phenomenon was not evident in all conditions. On the basis of these results, Desjardins and Werker concluded that although a foundation for bimodal speech perception might be present in the very young infant, its strength increases with experience.

Kushnerenko, Teinonen, Volein, and Csibra (2008) recorded event-related potentials (ERPs) from infants aged 5 months in response to four types of AV stimuli: AV congruent /ba/ + [ba] and /ga/ + [ga] stimuli, AV incongruent /ba/ + [ga] (leading to "da" fusions in adults), and AV incongruent /ga/ + [ba] (leading to the phonotactically unusual combination percept "bga" in adults; see McGurk & MacDonald, 1976). They reasoned that the first three types of stimuli should be perceived as a coherent, fused syllable, whereas the AV incongruency that leads to "bga" should pose a perceptual problem. In line with this prediction, the authors found a difference in the visual ERP component to a visual [ga] versus a visual [ba], but more importantly, there was also a distinct ERP response to the /ga/ + [ba] (combination stimulus) compared to any other AV stimulus type (more positive over frontal areas and more negative over temporal areas). Although our current knowledge of the meaning of different infant ERP signals is still lacking, the fact that different responses were seen to AV stimulus configurations leading to a unitary fused syllable ("da") versus those leading to a phonotactically anomalous combination ("bga") led the authors to conclude that neural responses are sensitive to the results of AV integration.

Contrary to developmental research on AV matching abilities, which has involved a wide variety of ages (including newborns) and cross-linguistic approaches (i.e., native and nonnative speech categories), until recently no research on AV integration of speech had involved infants younger than 4 months or nonnative stimuli. In 2009, Bristow and colleagues conducted a study in which they measured the electrophysiological MMR (mismatch response) to a change to a new acoustically presented vowel (either /a/ or /i/) following repeated presentation of either an auditory-only vowel or a visual-only (silently articulated) vowel (/i/ or /a/, respectively). This hybrid design allowed the authors to investigate the neural response to an acoustic stimulus that either matched or did not match the previously presented vowel. Interestingly, the MMR was as rapid to the change from one visual to a different acoustic vowel as it was from one acoustic to a different acoustic vowel. These results suggest the existence of shared representations for visual and auditory speech at just 4 months of age. Moreover, the dominant neural responses were seen first in the left hemisphere temporal areas, and in the left hemisphere frontal areas afterwards, indicating the possible involvement of similar "language" areas of the brain as in adult perceivers.

The above results suggest that the neural systems supporting AV integration emerge very early in life. Nonetheless, in order to more fully address the question of whether visual information influences auditory speech perception prior to specific experience, it will be essential to test AV integration of speech in young infants using unfamiliar auditory and/or visual stimuli, just as has been done in the bimodal matching studies reviewed above. Studies with newborn infants without prior experience with visual speech will also be informative for determining whether there is an organizing role for visual experience. In summary, research to date indicates substantial readiness for AV speech perception in young infants even without prior specific experience. Yet, it has become clear that experience also plays an important role in maintaining and sharpening these initial sensitivities. The initial organization seen provides a foundation for apprehending the redundant information available in visual and auditory speech and thus could facilitate learning about language in general and the native language in particular.

Perceiving and Integrating AV Speech

A great deal of research has sought to characterize the processes and the representations that underlie AV speech perception in adults. By AV speech process we refer to a set of mental operations that contribute to the integration of information from two or more sensory modalities, and from which emerges a speech percept. By AV speech representation we refer to the format and informational content of the input entering into, or the output resulting from, these integrative processes.

As any review of the literature on oral communication will show, there is a great deal of disagreement about both the processes and representations involved in the perception of speech. Researchers have advocated for a variety of alternatives, including abstractions of phonological features that are formatted from prototypes in memory (e.g., Massaro, 1987) or defined acoustically (e.g., Stevens, 2002), abstract motor plans to produce specific speech gestures (Liberman & Mattingly, 1985), or even amodal information about the actual movements involved in these gestures (e.g., Fowler, 1996). In the specific case of AV speech perception, theoretical approaches have similarly disagreed about what the code might be in which visual and auditory inputs are represented during speech perception. Similarly, how AV speech processes are conceptualized is enormously influenced by a commitment to any particular representation of incoming AV speech signals. For example, a direct-realist view about the representation of speech gestures leads to a reduction in the set of processes involved in AV speech perception (see Fowler, 1996).

Equally relevant questions for the understanding of AV integration of speech have been the when (during the processing of speech signals) and the where (in the brain) integration takes place. The advance of neuroimaging, including functional magnetic resonance imaging and electrophysiological techniques, has allowed researchers to investigate these issues in great detail. Initial attempts tended to depict AV integration of speech as a late phenomenon (that is, occurring after early analyses of the unimodal signals have already taken place) in heteromodal areas rather than, for example, in primary visual or auditory cortex. More recent literature, however, tends to emphasize the existence of an early interplay between vision and audition (or even a somewhat unidirectional visual-to-auditory modulation). Both of these controversies (i.e., what and how as well as when and where) are reviewed in the two sections below.

What and How: Representational Codes and Related Processes 

Classic models of speech perception have frequently neglected the multimodal nature of communication. For some, the speech signal is exclusively based on the acoustic input (e.g., Klatt, 1980; Liberman, 1982; McClelland & Elman, 1986), whereas others have included in later formulations visual input as a possible source of complementary information (e.g., Diehl & Kluender, 1987; Diehl, Lotto, & Holt, 2004; Liberman & Mattingly, 1985). Even within those theoretical views of speech perception that consider the multisensory nature of spoken language, there are very different approaches regarding both the processes involved in perceiving AV speech and the representations in which speech inputs are encoded. Three distinct approaches, including the most influential models on AV speech perception proposed to this date, are discussed in turn.

Convergence views  Such theories generally assume that both visual and auditory speech inputs are mapped onto unitary representations (e.g., gestural codes) that mediate perceptual processes (see Fowler, 2004). The convergence theory that gathered most of the attention from the research community in the last decade is, without doubt, the motor theory of speech perception (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967). According to this theory, speech production and speech perception are both based on a common, prearticulatory code (Liberman et al., 1967; see also Liberman & Mattingly, 1985). This theory, and also the direct-realist theory of speech perception (Fowler, 1996; see also Best, 1995), have suggested that speech events are represented as specific speech gestures. A strong claim of this type of theory is, therefore, the involvement of motor representations during speech perception.

The hypothesis of motor involvement during speech perception has received empirical support from a rich body of behavioral evidence showing that perceived auditory speech can alter the latency (Bell-Berti, Raphael, Pisoni, & Sawusch, 1979; Fowler, Brown, Sabadini, & Weihing, 2003; Gordon & D. E. Meyer, 1984; Levelt et al., 1991) and even the kinematics (Houde & Jordan, 1998; Yuen, Davis, Brysbaert, & Rastle, 2010) with which the listener produces speech. Complementary studies also show that articulating (Nasir & Ostry, 2009; Sams, Möttönen, & Sihvonen, 2005), planning articulation (Roelofs, Özdemir, & Levelt, 2007), or even having one's facial skin deformed in ways that mimic articulation (Ito, Tiede, & Ostry, 2009) can all affect the perception of auditory speech. In line with this view, recent evidence has also suggested that perceiving visual and AV speech utterances can have a similar effect on ensuing production, influencing the latency with which speakers articulate these and other utterances (Galantucci, Fowler, & Goldstein, 2009; Gentilucci & Bernardis, 2007; Kerzel & Bekkering, 2000).

Further evidence for the possible role of gestural information in speech perception comes from motor evoked potentials (MEPs) induced by transcranial magnetic stimulation (TMS) of lip- and tongue-related areas within the motor cortex (see Fadiga, Craighero, & Olivier, 2005, for a review). Several studies have seen selective modulation of MEPs at the articulators involved in the production of specific speech events during auditory (e.g., Fadiga, Craighero, Buccino, & Rizzolatti, 2002; Wilson, Pinar-Saygin, Sereno, & Iacoboni, 2004) and visual (e.g., Sato, Buccino, Gentilucci, & Cattaneo, 2009; Watkins, Strafella, & Paus, 2003) speech perception. TMS has also been used to disrupt the functioning in these cortical motor areas, and doing so has been shown to influence speech perception itself, particularly when the speech signal is rendered ambiguous either by adding noise or by using stimuli that fall between categories in a phonetic continuum (D'Ausilio et al., 2009; Möttönen & Watkins, 2009).

Together, these results suggest that motor-articulatory processes might be engaged in AV speech perception, but more evidence must be obtained to determine their causal role (Hickok, Holt, & Lotto, 2009; Mahon & Caramazza, 2008; Mitterer & Ernestus, 2008). In fact, critics of the motor theories also argue that speech perception abilities remain relatively intact in patients in whom motor brain areas are damaged (e.g., Crinion, Warburton, Lambon-Ralph, Howard, & Wise, 2006; Terao et al., 2007; but see Pulvermüller & Fadiga, 2010).

Associationist views  These views generally differ from convergence views in two important respects. First, they suggest that speech information from separate modalities retains modality-specific characteristics throughout processing (see Bernstein, Auer, & Moore, 2004). In other words, unlike convergence theories, which assume amodal (or heteromodal) and unitary representations from the initial stages in the processing of the multisensory speech signal, associationist theories assume parallel processing of speech information in separate modalities until a relatively late stage of perceptual analysis. Second, according to this view, integration of multisensory input does not involve representations of speech that are linked to speech gestures, per se, but rather to abstract features stored as prototypes in memory, which are associated across modalities in the course of processing (see Braida, 1991; Massaro, 1987, 1998, 2004). The fuzzy-logical model of perception (FLMP; see Massaro, 1987, 1998) illustrates both of these claims. This model postulates the existence of different processing stages, and cross-modal integration does not take place until a relatively advanced stage of processing (called feature integration). According to this view, speech features extracted from the visual and auditory inputs are compared against "prestored" phonological prototypes to select the best match.
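
For concreteness, the integration and decision stages of the FLMP are usually expressed as a multiplicative combination of the modality-specific degrees of support followed by a relative goodness (normalization) rule. In the notation below (ours, not the chapter's), a_k and v_k denote the auditory and visual support for candidate prototype k:

\[
P(\text{``da''} \mid A, V) \;=\; \frac{a_{\mathrm{da}}\, v_{\mathrm{da}}}{\sum_{k} a_{k}\, v_{k}}
\]

On this scheme an alternative that is weakly supported by either modality receives a low overall goodness, whereas congruent support from the two modalities is mutually reinforcing.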

Although the FLMP can account for a large amount of empirical data (e.g., Massaro, 1998), it has been subject to some substantial criticisms (e.g., Green & Gerdeman, 1995). A possible argument against one of the postulates of prototype-based models such as the FLMP is the excessive weight it confers to stored memory representations. The fact that prelinguistic (i.e., 4-month-old) infants are able to integrate and match speech events across sensory modalities (see the previous section) may suggest that this cross-modal ability is not necessarily based on established auditory categories or prototypes in memory (i.e., a phonological system). Of course, this does not mean that (adult) perceivers with a well-developed phonological system use the same code during AV integration of the speech signal as infants do, but one can at least conclude that AV matching of speech is possible without an elaborated phonological system or entrenched prototypes in memory. Another empirical argument against the FLMP is the finding of AV interactions at the level of articulatory features, prior to phonological categorization (Green & Gerdeman, 1995).

Analysis-by-synthesis views  According to this relatively recent view, the role of visual input, when it provides salient and sufficiently distinguishable information, is to constrain the processing of the auditory signal, thus making it faster and more accurate. During speech production, for example, there is often visible articulatory information (e.g., closure of the lips) that precedes a speech sound (e.g., the sound /b/; see figure 24.3) by several tenths of a second (see Schroeder, Lakatos, Kajikawa, Partan, & Puce, 2008). According to van Wassenhove and colleagues (van Wassenhove, Grant, & Poeppel, 2005), this advanced information can be used during speech perception to generate phonological predictions against which incoming sounds are evaluated. The more constraining the prediction from visual speech is (e.g., [b] generates a stronger prediction than [g]), the faster its correlated sound seems to be processed. Visual anticipation during AV speech perception may perhaps help the perceiver to quickly retrieve the relevant representations from the rather complex and fast-changing acoustic speech signal (see Navarra, Alsius, Soto-Faraco, & Spence, 2010).

Recent conceptions of this theoretical view suggest that the prediction process may be based on premotor-articulatory representations (Skipper, Nusbaum, & Small, 2005). According to this view, visual speech information generates an internally synthesized motor prediction about an articulatory event, represented by the speaker's own gestural knowledge. These predictions are then integrated with the ongoing acoustic input to derive the final percept (Skipper, van Wassenhove, Nusbaum, & Small, 2007). According to Skipper and colleagues (2007), the generation of these motor predictions may help to solve the lack of invariance problem, that is, the problem of mapping speech signals that are extremely variable (i.e., the way a specific phoneme is produced is different for each speaker and highly depends on its adjacent phonemes, in a phenomenon called coarticulation) onto highly specific, discrete, and categorical representations. It is worth noting, however, that the relevance given to visual anticipatory information and motor predictions is restricted in these models by (1) the large ambiguity that exists within the visual speech signal itself (e.g., the highly salient articulation of /p/, /b/, and /m/ is virtually equivalent) and by (2) the fact that the articulation of many phonemes remains invisible to the perceiver (e.g., the articulation of /g/).
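
To make the predictive idea concrete, the toy sketch below (Python; not a model proposed in the chapter, and all numbers are hypothetical) shows how an early, salient viseme could act as a constraint that is combined with noisier acoustic evidence:

import numpy as np

# Candidate phonemes and made-up support values, for illustration only.
candidates = ["b", "d", "g"]

# A visible lip closure strongly predicts a bilabial; /d/ and /g/ become unlikely.
visual_prediction = np.array([0.80, 0.15, 0.05])

# The acoustic evidence on its own is less decisive.
acoustic_evidence = np.array([0.45, 0.35, 0.20])

combined = visual_prediction * acoustic_evidence
combined /= combined.sum()  # normalize to a probability distribution

for phoneme, p in zip(candidates, combined.round(3)):
    print(phoneme, p)  # /b/ dominates once the visual prediction is applied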

Where and When: The Neural Underpinnings of Perceiving AV Speech 

Figure 24.3 Speech articulatory gestures often precede their corresponding sounds. This early visual information has an impact on the way speech sounds are processed, as shown in, for example, an advancement of auditory ERP components such as the N1 (van Wassenhove et al., 2005; see also Besle et al., 2004).

Studies using positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) have repeatedly shown activation in posterior areas of the superior temporal sulcus (STS) during auditory (Binder et al., 1997; Calvert, Bullmore, Brammer, & Campbell, 1997) and visual (Blasi et al., 1999; Calvert et al., 1997) speech perception. In a seminal study, Calvert and colleagues (Calvert, Campbell, & Brammer, 2000) reported, for the first time using fMRI, multisensory responses in the left STS following a superadditive pattern (with a criterion adapted from single-neuron animal neurophysiology)3 during the perception of matching AV speech. In contrast, a subadditive response pattern in the same brain area was found during the presentation of mismatching AV speech stimuli (see also Callan et al., 2003; Wright, Pelphrey, Allison, McKeown, & McCarthy, 2003). Further studies have highlighted the STS as a node at which correlated auditory and visual information (from speech and nonspeech stimuli) is thought to converge (see Beauchamp, Argall, Bodurka, Duyn, & Martin, 2004).
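
As a rough illustration of the additive criterion mentioned above (a deliberate simplification; real fMRI analyses involve baselines, statistical thresholds, and congruency contrasts), a response could be classified as follows (Python, with hypothetical percent-signal-change values):

def classify_av_response(a_only, v_only, av):
    """Compare the multisensory response with the sum of the unimodal ones:
    superadditive if AV exceeds A + V, subadditive if it falls below it."""
    unimodal_sum = a_only + v_only
    if av > unimodal_sum:
        return "superadditive"
    if av < unimodal_sum:
        return "subadditive"
    return "additive"

# Hypothetical values: matching AV speech vs. mismatching AV speech.
print(classify_av_response(a_only=0.6, v_only=0.4, av=1.3))  # superadditive
print(classify_av_response(a_only=0.6, v_only=0.4, av=0.7))  # subadditive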

The conceptualization of the STS as the place where signals from vision and audition are integrated fits well with a hierarchical and associationist conception of AV speech processing (see previous subsection) whereby multisensory binding occurs at higher-order association areas after independent visual and auditory processing in unimodal brain areas have taken place. Processing within these multisensory sites would then engage in a recurrent loop via feedback projections to hierarchically lower, sensory areas. Previous findings of visually evoked activation in auditory cortices (Calvert et al., 1997; Bernstein et al., 2002; see also Sams et al., 1991) could then be accounted for as a result of feedback projections from heteromodal areas. This hierarchical view of AV speech integration has been put in doubt, however, after accumulating evidence of AV interactions at very early latencies in electrophysiological studies (e.g., as early as 30 msec after the onset of the sound; see Besle, Bertrand, & Giard, 2009, for a review) during speech perception (Besle, Fort, Delpuech, & Giard, 2004; Hertrich, Mathiak, Lutzenberger, & Ackermann, 2008; Lebib, Papo, de Bode, & Baudonnière, 2003; Möttönen, Schürmann, & Sams, 2004; van Wassenhove et al., 2005).4 Note, however, that the degree of stimulus and domain specificity of these early correlates with respect to speech processing is still unclear (e.g., Bernstein, Auer, Wagner, & Ponton, 2008; Ponton, Bernstein, & Auer, 2009).

Because of the biophysical characteristics of speech production, visible articulatory gestures are often available before their consequent sounds (e.g., the lips usually close more than 150 msec before the sound /b/ is produced; see figure 24.3). This precedence of visual information seems to overcompensate the temporal offset in neural processing times whereby vision arrives later than audition to their corresponding primary cortical areas (see Schroeder et al., 2008). As several studies now suggest, this early visual information can, when it is salient enough (e.g., the viseme [p] as opposed to the viseme [g]), have a strong impact on the way incoming sounds are processed (see earlier discussion on the analysis-by-synthesis theoretical view of AV speech perception). For example, van Wassenhove et al. (2005) reported a decrease in the amplitude and a latency speed-up of the auditory evoked potentials N1 and P2 as a result of a visual modulation (see also Stekelenburg & Vroomen, 2007, for similar ERP effects using nonspeech stimuli). This evidence, together with previous results showing multisensory interactions in nonprimary areas such as STS (e.g., Calvert et al., 2000), suggests that integration between auditory and visual speech cues possibly involves interactions at different levels of processing: early, when the speech sounds are predicted by prior visual information, as findings such as the one by van Wassenhove and her colleagues (2005) suggest, and also late (in heteromodal areas such as the multisensory STS), as several researchers have found before (Bernstein, Lu, & Jiang, 2008; Calvert et al., 2000; Callan et al., 2003; Miller & d'Esposito, 2005; Wright et al., 2003).

In a recent study using intracranial ERPs in epileptic patients, Besle and collaborators (2008) proposed that this early influence from vision to audition may be the result of a direct link between motion-responsive visual areas (MT/V5) and auditory cortex, rather than the consequence of feedback projections from heteromodal multisensory areas. In the study by Besle et al., when participants were presented with (AV-matching) consonant-vowel syllables such as /ba/, secondary auditory areas were activated just 10 msec after the activation of MT/V5. Most noteworthy, previous neuropsychological studies had already pointed out the relevance of visual motion processing in AV and visual speech perception (Campbell, Zihl, Massaro, Munhall, & Cohen, 1997; Munhall, Servos, Santi, & Goodale, 2002; Mohammed et al., 2005). As also reported by Besle et al. (2008), a decrease of activity in auditory areas was also observed as a result of highly predictive visual information, perhaps mirroring the N1 amplitude reduction found in earlier ERP studies of AV speech processing (e.g., Besle et al., 2004; van Wassenhove et al., 2005). According to Besle and colleagues, this effect could reflect an alleviation of auditory processing induced by the early disambiguating role of linguistically informative visual cues. Another interesting (yet highly speculative) hypothesis regarding this visual-to-auditory influence implies that visual input modulates the neuronal oscillatory activity in primary auditory cortex (A1; Schroeder et al., 2008). Neuronal oscillations are the result of the spontaneous electrophysiological activity in a population of neurons. The visual modulation of this oscillatory activity in A1 may, according to Schroeder and colleagues, induce a temporary highly excitatory state during which ensuing auditory input could be facilitated (i.e., resulting in amplified neuronal responses to sounds). The fact that an amplitude reduction of the auditory-related evoked potentials (and not amplification) is usually found as a result of the anticipatory effect from visual speech will have to be taken into consideration, in our opinion, in further developments of this hypothesis.

An issue that recently raised the interest of researchers is whether these early visual modulations of auditory responses to upcoming speech sounds are domain-specific about linguistic input (and reflect, for example, phonological AV interactions) or not. Although similar electrophysiological effects (i.e., amplitude reduction and speedup of auditory evoked potentials) have been reported for nonlinguistic stimuli when vision is predictive of acoustic events (such as the video clip of a person handclapping; Stekelenburg & Vroomen, 2007; see also Navarra, Vatakis, et al., 2005, for parallel effects using both speech and nonspeech), some researchers have interpreted the early effects found with AV speech as phonetic in nature (e.g., Hertrich et al., 2008). Arnal, Morillon, Kell, and Giraud (2009) have recently analyzed combined data from magnetoencephalography (MEG) and fMRI, showing that visual-to-auditory early effects seem to be unresponsive to the level of congruency between the visual gesture and the speech sound, perhaps giving support to a prelinguistic and phonologically nonspecific version of these early visual-to-auditory interactions. Arnal et al. also presented evidence suggesting that the viseme-phoneme congruency is detected later on in processing, perhaps owing to the processing in multimodal regions such as STS, where AV integration would take place. According to Arnal and colleagues (2009), and perhaps in line with the proposal by Schroeder and colleagues (2008) described above, the nonspecific visual-to-auditory effects found at an early processing stage could reflect a reset of activity in auditory cortex into an excitatory/preparatory state that may increase perceptual efficiency at a subsequent stage.

In conclusion, finding out when and where visual and auditory speech inputs interact (and are integrated) in terms of brain processes still is an ongoing task. It is also worth noting that, at present, all this evidence about AV interactions needs to be integrated with the mounting evidence for links between speech perception and production (see D'Ausilio et al., 2009; Möttönen & Watkins, 2009; Sato et al., 2010; see also Convergence Views, previous subsection). Skipper and colleagues (2005), for example, found that activity in premotor cortex (as well as in the superior temporal lobe [STS and STG]) during AV speech perception can be modulated by the saliency of the visual speech signal. Activity in these cortical motor areas was observed to a lesser extent during unimodal speech perception, suggesting their stronger involvement in multisensory perception.

Does AV Speech Integration Require Attention?

One currently debated question in multisensory literature (including AV speech perception) is whether integration across sensory modalities depends on the distribution of the observer's attention (see chapter 20, this volume). Interestingly, the case of AV speech is in fact one of the most frequently used illustrations of the automatic and mandatory nature of multisensory integration. Paramount among these is the McGurk illusion, whereby the illusory acoustic percept arising from AV conflict arises even under optimal hearing conditions and regardless of the observer's knowledge about the manipulation (McGurk & MacDonald, 1976; see figure 24.2). Following up on this idea, AV speech integration has been assumed to fulfill the criteria for automaticity that are typically used in cognitive psychology. Namely, automaticity implies that, once the appropriate sensory stimulation is present, the process will initiate quickly, in a preattentive manner, without much influence of the perceiver's voluntary control and largely unaffected by the availability of processing resources (i.e., regardless of whether other ongoing tasks are the current focus of the observer's attention; Schneider & Shiffrin, 1977). In the particular case of multisensory integration, some authors have further assumed that it is a cognitively impenetrable process (Bertelson & de Gelder, 2004; Bertelson & Aschersleben, 1998; Colin et al., 2002; de Gelder & Bertelson, 2003), that is, a process that will proceed with little influence from other cognitive systems, including top-down attention (see Coltheart, 1999; Fodor, 1983).

The characterization of AV speech integration as an automatic, attention-free process has important consequences because it implies that the benefits arising from AV binding (i.e., enhancement in comprehension, faster processing, and increased accuracy, as described earlier) would come about at almost no cost, so that little interference on other ongoing brain processes would be expected. Yet, despite the widespread assumption about the automaticity of AV integration, including that of speech (Bertelson & de Gelder, 2004; Bertelson & Aschersleben, 1998; Colin et al., 2002; de Gelder & Bertelson, 2003; Bertelson, Vroomen, de Gelder, & Driver, 2000; Vroomen, Bertelson, & de Gelder, 2001), recent evidence is starting to question whether some of the classical criteria for automatic processing apply to multisensory integration (Talsma & Woldorff, 2005), including the case of speech (Alsius, Navarra, Campbell, & Soto-Faraco, 2005; Alsius, Navarra, & Soto-Faraco, 2007; Fairhall & Macaluso, 2009). In addition, there is ample consensus nowadays that the interplay between perception and attention can express at various levels of information processing, including the earliest processing stages (see Talsma, Senkowski, Soto-Faraco, & Woldorff, 2010, for the specific case of the interplay between attention and multisensory integration). We review empirical evidence supporting the assumption of automaticity (or lack thereof) in AV speech processing and attempt to integrate past and current findings under a common framework.

 Explicit Manipulations of Attention to AV Speech 

The fact that perceivers do not notice, most of the time, their use of visual information during speech perception gives us an approximate (albeit subjective) idea of the extent to which AV integration appears to be automatic. As noted earlier, a strong empirical argument for the automaticity of AV speech integration is that the McGurk illusion remains impervious to several cognitive manipulations such as repeated exposure, conscious knowledge of the cross-dubbed nature of the stimuli (McGurk & MacDonald, 1976), and gender mismatch between face and voice (Green, Kuhl, Meltzoff, & Stevens, 1991). This resilience of AV integration to such manipulations is typical of automatic, mandatory processes. Another argument favoring the automaticity of AV speech integration is based on the lack of effects of explicit instructions on AV tasks (e.g., Dekle, Fowler, & Funnell, 1992; Easton & Basala, 1982; Massaro, 1987). In Massaro's (1987) study, for instance, identification responses to incongruent stimuli (i.e., auditory /ba/ and visual [da]) were similar regardless of whether the observers received the instruction to focus attention on one or the other modality (or both), thus supporting the automaticity hypothesis.

Nevertheless, more recent findings hint at potential for modulation of AV integration. For example, the observer's semantic and lexical expectations exert some influence on the probability of occurrence of the McGurk illusion (Windmann, 2003). Along similar lines, other studies have reported that responses to McGurk-like stimuli can in fact change depending on the explicit instructions given to participants (Colin, Radeau, & Deltenre, 2005; Easton & Basala, 1982; Massaro, 1998; but see Dekle et al., 1992), or their prior knowledge about the dubbed nature of the stimuli (Summerfield & McGrath, 1984). Finally, in a recent study it was found that the time needed to find the talking face that matched an acoustically presented sentence strongly depended on the number of distractor talking faces (Alsius & Soto-Faraco, 2011). Thus, although there are a number of observations suggesting the lack of cognitive influence on AV speech binding, current data from studies manipulating the observer's expectations and prior knowledge about the stimuli reveal certain flexibility that does not fit perfectly with the idea of a strictly automatic and involuntary mechanism.

Implicit Manipulations of Attention to AV Speech

Explicit manipulations in an intersensory conflict situation just like the ones discussed above can be subject to confounds involving cognitive biases (see de Gelder & Bertelson, 2003, p. 462, for a recent discussion). For example, it is impossible to know with certainty that simply instructing participants to focus on the auditory component of the AV event and ignore vision (i.e., Massaro, 1987; Dekle et al., 1992; Easton & Basala, 1982) will ensure lack of attention to the visual component. It is in fact well known that in low-perceptual-load situations attention can spill over to task-irrelevant stimuli (e.g., Lavie, 1995). Consequently, the interpretation of the findings discussed above in terms of automaticity, or lack thereof, must remain inconclusive. Alternative paradigms using implicit manipulations of the participants' attention on the AV stimulus are less susceptible to these types of confounds. For example, Driver (1996) used a selective listening task where participants had to recall words from one (target) out of two possible and overlapping speech streams (both streams composed of triplets of two-syllable words presented at a rate of two per second). Performance was better when a visual talking face that matched the auditory target utterances was presented away from the source of the sounds than at the source of the sounds. According to Driver, the visible talking face selectively attracted the (correlated) relevant auditory stream toward its spatial location, thus illusorily separating the target stream from the distractor stream (i.e., the ventriloquist effect). In other words, AV congruency acted to create an illusory spatial separation that facilitated selective attention to one speech stream over the other despite the fact that the two auditory speech streams were physically presented from exactly the same location. This suggests that AV congruency was processed before spatial selective attention was deployed.

In a study by Andersen, Tiippana, Laarni, Kojo, and Sams (2009), participants had to covertly attend to one of two speaking faces that were presented simultaneously (one at each side of fixation) while reporting centrally presented auditory speech (syllables). The syllable spoken by the attended visual face dominated participants' reports of the auditory target syllable, suggesting a modulation of the visual influence on AV processing by the direction of spatial attention. As with tasks explicitly directing attention to a specific modality (discussed above), one quibble with the interpretation of the findings of Andersen et al. is that the lip movements at the attended location might have biased responses at the decision stage rather than the perception stage. In order to test the automaticity of AV speech integration without the potential confounds associated with requiring explicit judgments from the perceiver, Soto-Faraco, Navarra, and Alsius (2004) measured the McGurk effect indirectly. Participants were asked to classify the first syllable of disyllabic pseudowords (such as /tobi/ or /tadi/) irrespective of the second syllable, which could either be kept constant throughout a block of trials (always /di/ or always /bi/) or vary randomly from trial to trial; this latter condition is known to lead to a slowdown in RTs (see Navarra, Sebastián-Gallés, & Soto-Faraco, 2005; Navarra & Soto-Faraco, 2007; Pallier, 1994). In the variable condition, the syllable /di/ was achieved by dubbing AV-congruent items (auditory /todi/ + visual [todi]) or else AV-incongruent items conducive to the illusory fusion "di" (auditory /tobi/ + visual [togi]). Syllabic interference was equally effective with real as well as with illusory variation, suggesting that AV binding occurred in a completely task-irrelevant syllable and prior to the perceptual classification of the task-relevant syllable. Furthermore, in a complementary experiment, syllabic interference could be prevented by alternating real and illusory /di/, despite the syllabic variability present in each modality individually.
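To make the logic of this speeded-classification measure explicit (in our own shorthand rather than notation taken from Soto-Faraco et al., 2004), the interference effect can be written as the RT cost of second-syllable variability relative to the constant baseline,
\[ I = \overline{RT}_{\text{variable}} - \overline{RT}_{\text{constant}}, \]
and the critical result is that \(I\) was reliably positive, and of comparable size, whether the second-syllable variability was physically real or merely the product of illusory McGurk fusion.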

Gentilucci and Cattaneo (2005) provide another demonstration of an indirect (and possibly involuntary) influence of AV integration, this time on the perceiver's articulatory production of responses to target AV stimuli. In their study, participants' articulation of spoken responses to the identity of AV-incongruent syllables was influenced by the visually specified speech gesture at both kinematic and acoustic levels. Thus, in AV-incongruent trials for which participants had answered correctly (i.e., when they did not experience the McGurk illusion), the kinematic profile of their verbal response was nevertheless different from the production of responses to AV-congruent trials. Altogether, these results show evidence of AV speech integration at stages of processing that do not operate under voluntary control and perhaps before the allocation of attention (e.g., selecting a spatial location, as per Driver's 1996 findings).

Neural Evidence of Attention-Free versus Attention-Influenced AV Integration of Speech

Colin et al. (2002) used McGurk fusions in an ERP mismatch negativity (MMN) paradigm in which deviant oddballs were presented within an otherwise homogeneous sequence of auditory stimuli (i.e., /ba/). In this study, deviant stimuli consisted of incongruent visual lip movements ([ga]) dubbed onto one of the standard (/ba/) auditory events. Despite the absence of a real acoustic change, these illusory deviants evoked a mismatch ERP response, leading Colin et al. (2002) to conclude that AV integration takes place prior to the preattentive comparison process giving rise to the MMN. Other studies have shown that McGurk stimuli can eliminate the MMN difference wave when they result in an illusory standard event (i.e., an illusory "da" induced by /ba/ + [ga] among real /da/ standard stimuli; Kislyuk, Möttönen, & Sams, 2008). This last finding is important in order to rule out the possibility that the MMN found by Colin et al. (2002) was, in fact, a brain response to AV incongruence (see Ponton et al., 2009, for potential confounds associated with measuring mismatch negativity in AV speech).

A recent fMRI study by Fairhall and Macaluso (2009), however, points to a rather different conclusion about the attention requirements of AV speech integration. Fairhall and colleagues manipulated the direction of covert spatial attention to lateralized speaking faces in the presence of a central auditory speech stream (similar to Andersen et al., 2009). When visual attention was directed to a speaking face whose lips matched the central sound, visual target detection improved and, importantly, the STS was selectively activated (bilaterally) as compared to when the mismatching talking face was attended. Further analyses revealed yet other areas selectively activated by attended AV congruency, including the superior colliculus as well as visual sensory areas. Contrary to the ERP findings discussed above (Colin et al., 2002; Kislyuk et al., 2008), this result suggests a modulation of AV speech processing by (spatial) attention, whose consequences carry over to multisensory association areas.

The Interplay between Attention and AV Integration of Speech 

As the data reviewed above reveal, some manipulations of attention can alter the outcome of AV speech integration, whereas others seem to be ineffective. One possible key to evaluating this mixed pattern of results is whether the potential modulations of AV speech integration caused by attention reflect modulations at the level of unisensory processing (that carry over to multisensory integration stages), or whether, alternatively, attention has a direct impact on the multisensory integration mechanism itself (Navarra et al., 2010). Alsius et al. (2005, 2007; see also Tiippana, Andersen, & Sams, 2004) addressed this question using a dual-task paradigm in which participants reported their perception of McGurk stimuli presented occasionally in the context of a difficult visual, auditory, or somatosensory monitoring task. As expected, visual load reduced the impact of visual influence on auditory speech (less McGurk effect, more auditory responses) when compared to a no-load baseline. The interesting result was that auditory load also led to a reduction of visual influence on AV integration and, perhaps counterintuitively given that the perceptual load was imposed on the auditory system, to an increase of auditory-based responses. In all cases (see Alsius et al., 2007, for effects of somatosensory load on the McGurk effect), the reduction in the McGurk illusion under dual-task conditions was stronger than any decrement in unisensory perception (auditory or visual) and thus consistent with the idea that depleting resources in any sensory modality by imposing a demanding task reduced the ability to integrate multisensory information in exactly the same fashion (i.e., leading to more auditory-based responses).
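Stated compactly (again in our own shorthand rather than the authors' notation), if \(\Delta_{\text{McGurk}}\) denotes the drop in McGurk (fused) responses from the no-load baseline to the dual-task condition, and \(\Delta_{A}\) and \(\Delta_{V}\) denote the corresponding decrements in auditory-only and visual-only identification, the reported pattern was
\[ \Delta_{\text{McGurk}} > \max(\Delta_{A}, \Delta_{V}), \]
which is why the loss of the illusion cannot simply be reduced to a loss of unisensory sensitivity.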

One possible interpretation of these data is that the shortage of resources depletes the online processing of the less-informative (visual) sensory input when both the visual and the auditory signals are available concurrently (see figure 24.4). Following up on this idea, the reduction in McGurk illusions could have resulted from the attenuation of the weight given to the visual input during AV speech integration when resources were compromised (see Navarra et al., 2010).
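A very schematic way to cast this weighting idea, offered here purely as an illustration rather than as a model proposed in the studies cited, is a weighted combination of the two inputs,
\[ \hat{s} = \frac{w_A s_A + w_V s_V}{w_A + w_V}, \]
in which a demanding concurrent task lowers the visual weight \(w_V\), so that the integrated percept \(\hat{s}\) drifts toward the auditory estimate \(s_A\) (i.e., fewer McGurk responses).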

Figure 24.4 Three hypothetical models regarding the possible influence of attention on the perception of AV speech are presented schematically. According to model 1, attention does not have any influence on AV speech perception (e.g., Dekle et al., 1992; Easton & Basala, 1982; Massaro, 1987). Model 2 assumes that attention modulates the integration of visual and auditory speech signals itself. When the available attentional resources are depleted, the final percept will rely, according to model 2, on unimodal information (e.g., auditory, in Alsius et al., 2005, 2007). Model 3 predicts that attention can influence the processing of visual speech input (arguably the less informative channel in normal acoustic conditions). According to this model, the processing of visual speech information is (at least partially) blocked when attentional resources are nearly exhausted (see Navarra et al., 2010). Recent studies have found evidence of attentional modulations of AV speech perception at multiple stages of processing, perhaps backing up a combination of models 2 and 3 (see Fairhall & Macaluso, 2009).

Finally, some studies have addressed whether, as expected from an encapsulated and cognitively impenetrable process, AV speech integration mechanisms prevent access to information about the individual (unisensory) components of the stimulus. This unitary notion of the AV percept has, in fact, been challenged by recent reports showing sensitivity to modality-specific characteristics during AV speech processing. For example, robust AV integration remains even under face/voice gender mismatches that otherwise lead to an increased sensitivity to AV temporal asynchrony (Vatakis & Spence, 2007). In a more explicit demonstration of this point, Soto-Faraco and Alsius (2007, 2009) used incongruent AV speech tokens leading to the illusory combination /bda/ (visual [ba] + acoustic /da/; e.g., McGurk & MacDonald, 1976) presented at varying AV asynchronies. Participants were asked to identify the syllable they heard and to perform a temporal order judgment (or simultaneity) task about the AV asynchrony on each trial. The data revealed that the temporal window within which the AV illusion occurred was substantially wider than the temporal interval needed to resolve the correct order of the AV signals. That is, at some asynchronies, participants would correctly report that audition (containing /da/) temporally led vision (containing [ba]) but would classify the perceived syllable as /bda/ (note the inversion in the sequence of phonemes). This finding strongly suggests that access to some features of the individual unimodal stimuli (temporal order) is not prevented when AV events are bound (see Miller & D'Esposito, 2005, for a dissociation between neural correlates of AV fusion and coincidence detection).
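In rough terms (our own formalization, not the authors'), if \(w_{\text{fusion}}\) denotes the range of audiovisual asynchronies over which the illusory /bda/ percept still arises and \(\text{JND}_{\text{TOJ}}\) denotes the asynchrony needed to report the temporal order of the audio and video reliably, the finding amounts to
\[ w_{\text{fusion}} \gg \text{JND}_{\text{TOJ}}: \]
identity and temporal order are resolved over different temporal windows, so order information remains accessible while fusion still occurs.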

In summary, the extant data are far from conclusive with regard to the assumption of automaticity of AV speech binding held by several authors (Bertelson & Aschersleben, 1998; Bertelson & de Gelder, 2004; Bertelson et al., 2000; Colin et al., 2002; de Gelder & Bertelson, 2003; de Gelder, Böcker, Tuomainen, Hensen, & Vroomen, 1999; Vroomen et al., 2001). On the one hand, it is clear that the consequences of AV integration of speech are quite compelling and that these phenomena are indeed robust to several cognitive and attentional manipulations. On the other hand, it is no less true that behavioral (dual-task) and neuroimaging (spatial attention) evidence for attentional modulation and cognitive penetrability of AV speech integration is starting to accumulate (see Navarra et al., 2010, for a review on this issue). This pattern of findings is in general inconsistent with an extreme conceptualization of the strictly automatic nature of AV speech integration. Instead, the evidence seems to be in line with recent proposals about the flexible interplay between multisensory integration and attention (Talsma et al., 2010).

Concluding Remarks

Perceiving speech is, most of the time, a multisensory experience involving the interplay between at least two different sensory modalities: audition and vision. Although we cannot ignore the fact that, under good listening conditions, auditory input is sufficient to understand speech, there is undeniable evidence that the visual and auditory aspects of speech, when available, contribute to an integrated perception of spoken language. Indeed, anticipatory visual information seems to modulate the way sounds will be processed quite early in the processing of the signal.

Speech is an extremely complex stimulus that varies continuously over time at very fast rates. Thus, retrieving as much information as early as possible becomes crucial for communication to be efficient. The present volume is full of examples where the combination of signals from different sensory modalities leads to more accurate and faster representations of perceptual objects, and speech is no exception. We discussed evidence suggesting that AV integration of speech results in a more efficient encoding of a message. Although experience is crucial for developing an effective AV speech-processing system, there is evidence that infants are surprisingly well prepared to integrate visual and acoustic speech at very early ages.

The binding of AV speech streams seems to be, in fact, so strong that we are less sensitive to AV asynchrony when perceiving speech than when perceiving other stimuli (see Vatakis, Ghazanfar, & Spence, 2008; Vatakis & Spence, 2007). Despite results such as these (see also Tuomainen, Andersen, Tiippana, & Sams, 2005), which suggest that AV speech represents a special case of perceptual input for the brain, other experimental evidence indicates that this may not be the case, at least for the processing of certain perceptual properties. For example, temporal adaptation effects observed during exposure to temporally misaligned AV speech streams influence not only the perception of speech but also the perception of other (simpler) stimuli (e.g., Navarra, Vatakis, et al., 2005). Furthermore, the perception of speech and nonspeech stimuli seems to share some common resources that can be depleted by a demanding nonspeech task (Alsius et al., 2005, 2007). Electrophysiological evidence for a general (rather than speech-specific) predictive role of visual processing with respect to the processing of sounds is another example where the apparently special status of speech perception could be ruled out in favor of a general, stimulus-nonspecific formulation of human perception (see Stekelenburg & Vroomen, 2007).

Regardless of whether or not it is special in terms of brain and cognitive processes, speech is essentially multisensory. Its perception seems to imply the constant interaction between vision and audition and, possibly, the combination of visual and auditory information in heteromodal areas of the brain, as well as the activation of other areas related to phonological representations and articulatory behavior. Although visual speech cues are integrated with sounds in a quite robust and automatic fashion, recent studies suggest (in line with other studies described in section V, Attention, in this volume) that the processes underlying this integration are more amenable to cognitive and attentional penetrability than originally assumed.

Acknowledgments

This work was supported by grant PSI2009–12859 and the Ramón y Cajal Programme (RyC-2008–00231) from the Ministerio de Ciencia e Innovación (Spain) to J. Navarra, by grant 81103 from the Natural Sciences and Engineering Research Council of Canada to J. F. Werker, and by grants PSI2010–15426 and CDS00012 from the Ministerio de Ciencia e Innovación (Spain), grant 2009SGR-292 from the Comissionat per a Universitats i Recerca del DIUE-Generalitat de Catalunya, and grant StG-2010 263145 from the European Research Council to S. Soto-Faraco.

Notes

1. One can experience for oneself how the auditory percept simply changes (i.e., from /da/ to /ba/) as a function of whether one opens or closes one's eyes. There are demonstrations available online; a simple search for "McGurk illusion" in the browser will return some examples (e.g., http://www.youtube.com/watch?v=jtsfidRq2tw). Note that phonemes appear between slashes (/ /) and visemes (the visual equivalents of phonemes) between brackets ([ ]).

2. We note that properties of the speech signal can also be extracted from sensory modalities other than vision and audition (e.g., Auer, Bernstein, & Coulter, 1998; Bernstein et al., 1989; Gick & Derrick, 2009; Grant, Ardell, Kuhl, & Sparks, 1985; Yuan, Reed, & Durlach, 2005; see Kirman, 1973, and Summers, 1992, for reviews). For example, somatosensory inputs can influence speech perception both in individuals who have experience using tactile methods of communication (Bernstein, Demorest, Coulter, & O'Connell, 1991; Reed, Durlach, Braida, & Schultz, 1989) and in individuals without any previous explicit training using, for example, tactile aids (Fowler & Dekle, 1991; Gick & Derrick, 2009; Gick, Jóhannsdóttir, Gibraiel, & Mühlbauer, 2008; Ito et al., 2009). For reasons of brevity, however, we concentrate only on the AV case because the bulk of the research and the related controversies can best be exemplified in this literature.

3. Following seminal studies recording single-cell activity in the cat's superior colliculus (SC), the idea of superadditivity in humans was characterized by a neural response (in this case in terms of BOLD signal, in a population of neurons) that exceeds the sum of the responses elicited by each of the unimodal inputs separately in that particular brain area. Even though the validity of the superadditive criterion for revealing multisensory integration has been put in doubt (both in human fMRI studies, e.g., Laurienti, Perrault, Stanford, Wallace, & Stein, 2005; Stevenson, Kim, & James, 2009; and in terms of single-cell electrophysiology, e.g., Holmes, 2009; Holmes & Spence, 2005), the STS still stands out as a multisensory site for AV speech integration when other criteria are used (see Stevenson et al., 2009; Calvert et al., 2000).
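Expressed as a formula (our shorthand), the superadditive criterion requires that a region's response to the bimodal stimulus exceed the sum of its unimodal responses,
\[ R_{AV} > R_A + R_V, \]
where \(R_{AV}\), \(R_A\), and \(R_V\) stand for the response (e.g., BOLD signal change) to audiovisual, auditory-only, and visual-only stimulation, respectively.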

4. Note that the early interactions between audition and vision seen when subtracting the sum of the auditory and visual responses (A + V) from the audiovisual response (AV) have to be taken with caution because of possible differences in the level of attention among the auditory, the visual, and the AV conditions (see Besle et al., 2009). When these possible effects are controlled, a modulation of the auditory ERP components at 100–125 msec after stimulus onset is observed (see Besle et al., 2009; Ponton et al., 2009).
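The additive-model comparison mentioned here amounts to testing (in the same shorthand) whether
\[ \mathrm{ERP}_{AV}(t) - \big[\mathrm{ERP}_{A}(t) + \mathrm{ERP}_{V}(t)\big] \neq 0 \]
at a given latency \(t\); any reliable departure from zero is taken as evidence of audiovisual interaction, which is precisely why differences in attention across the A, V, and AV conditions can confound the subtraction.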

References

Aldridge, M. A., Braga, E. S., Walton, G. E., & Bower, T. G. R. (1999). The intermodal representation of speech in newborns. Developmental Science, 2, 42–46.
Alsius, A., Navarra, J., & Soto-Faraco, S. (2007). Attention to touch reduces audiovisual speech integration. Experimental Brain Research, 183, 399–404.
Alsius, A., & Soto-Faraco, S. (2011). Searching for audiovisual correspondence in multiple speaker scenarios. Experimental Brain Research; Epub ahead of print.
Alsius, A., Navarra, J., Campbell, R., & Soto-Faraco, S. (2005). Audiovisual integration of speech falters under high attention demands. Current Biology, 15, 839–843.
Andersen, T. S., Tiippana, K., Laarni, J., Kojo, I., & Sams, M. (2009). The role of visual spatial attention in audiovisual speech perception. Speech Communication, 51, 184–193.
Arnal, L. H., Morillon, B., Kell, C. A., & Giraud, A. L. (2009). Dual neural routing of visual facilitation in speech processing. Journal of Neuroscience, 29, 13445–13453.
Auer, E. T., Jr. (2010). Investigating speechreading and deafness. Journal of the American Academy of Audiology, 21, 163–168.
Auer, E. T., Jr., Bernstein, L. E., & Coulter, D. C. (1998). Temporal and spatio-temporal vibrotactile displays for voice fundamental frequency: an initial evaluation of a new vibrotactile speech perception aid with normal-hearing and hearing-impaired individuals. Journal of the Acoustical Society of America, 104, 2477–2489.
Baier, R., Idsardi, W., & Lidz, J. (2007). Two-month-olds are sensitive to lip rounding in dynamic and static speech events. Audiovisual Speech Processing Conference, Kasteel Groenendaal, Hilvarenbeek, The Netherlands.
Beauchamp, M. S., Argall, B. D., Bodurka, J., Duyn, J. H., & Martin, A. (2004). Unraveling multisensory integration: patchy organization within human STS multisensory cortex. Nature Neuroscience, 7, 1190–1192.
Bell-Berti, F., Raphael, L. J., Pisoni, D. B., & Sawusch, J. R. (1979). Some relationships between speech production and perception. Phonetica, 36, 373–383.
Bernstein, L. E., Auer, E. T., Jr., Moore, J. K., Ponton, C. W., Don, M., & Singh, M. (2002). Visual speech perception without primary auditory cortex activation. Neuroreport, 13, 311–315.
Bernstein, L. E., Auer, E. T., Jr., Wagner, M., & Ponton, C. W. (2008). Spatio-temporal dynamics of audiovisual speech processing. NeuroImage, 39, 423–435.
Bernstein, L. E., Auer, E. T., Jr., & Moore, J. K. (2004). Audiovisual speech binding: convergence or association? In G. Calvert, C. Spence, & B. E. Stein (Eds.), Handbook of multisensory processing (pp. 203–223). Cambridge, MA: MIT Press.
Bernstein, L. E., Demorest, M. E., Coulter, D. C., & O'Connell, M. P. (1991). Lipreading sentences with vibrotactile vocoders: performance of normal-hearing and hearing-impaired subjects. Journal of the Acoustical Society of America, 90, 2971–2984.
Bernstein, L. E., Eberhardt, S. P., & Demorest, M. E. (1989). Single-channel vibrotactile supplements to visual perception of intonation and stress. Journal of the Acoustical Society of America, 85, 397–405.
Bernstein, L. E., Lu, Z. L., & Jiang, J. (2008). Quantified acoustic-optical speech signal incongruity identifies cortical sites of audiovisual speech processing. Brain Research, 1242, 172–184.
Bertelson, P., & Aschersleben, G. (1998). Automatic visual bias of perceived auditory location. Psychonomic Bulletin & Review, 5, 482–489.
Bertelson, P., & de Gelder, B. (2004). The psychology of multimodal perception. In C. Spence & J. Driver (Eds.), Crossmodal space and crossmodal attention (pp. 141–179). Oxford: Oxford University Press.
Bertelson, P., Vroomen, J., de Gelder, B., & Driver, J. (2000). The ventriloquist effect does not depend on the direction of deliberate visual attention. Perception & Psychophysics, 62, 321–332.
Besle, J., Fischer, C., Bidet-Caulet, A., Lecaignard, F., Bertrand, O., & Giard, M. H. (2008). Visual activation and audiovisual interactions in the auditory cortex during speech perception: intracranial recordings in humans. Journal of Neuroscience, 28, 14301–14310.
Besle, J., Fort, A., Delpuech, C., & Giard, M. H. (2004). Bimodal speech: early suppressive visual effects in human auditory cortex. European Journal of Neuroscience, 20, 2225–2234.
Best, C. (1995). A direct realist view of cross-language speech perception. In W. Strange (Ed.), Speech perception and linguistic experience: issues in cross-language research (pp. 171–204). Baltimore, MD: York Press.
Binder, J. R., Frost, J. A., Hammeke, T. A., Cox, R. W., Rao, S. M., & Prieto, T. (1997). Human brain language areas identified by functional MRI. Journal of Neuroscience, 17, 353–362.

Birch, H., & Lefford, A. (1963). Intersensory development in children. Monographs of the Society for Research in Child Development, 28.
Blasi, V., Paulesu, E., Mantovani, Menoncello, L., Giovanni, U. D., Sensolo, S., et al. (1999). Ventral prefrontal areas specialised for lip-reading: a PET activation study. NeuroImage, 9, 1003.
Braida, L. (1991). Crossmodal integration in the identification of consonant segments. Quarterly Journal of Experimental Psychology. A, Human Experimental Psychology, 43, 647–678.
Bristow, D., Dehaene-Lambertz, G., Mattout, J., Soares, C., Gliga, T., Baillet, S., et al. (2009). Hearing faces: how the infant brain matches the face it sees with the speech it hears. Journal of Cognitive Neuroscience, 21, 905–921.
Burnham, D., & Dodd, B. (2004). Auditory-visual speech integration by pre-linguistic infants: perception of an emergent consonant in the McGurk effect. Developmental Psychobiology, 44, 204–220.
Callan, D., Jones, J. A., Munhall, K. G., Kroos, C., Callan, A., & Vatikiotis-Bateson, E. (2003). Neural processes underlying perceptual enhancement by visual speech gestures. Neuroreport, 14, 2213–2218.
Calvert, G. A., Bullmore, E. T., Brammer, M. J., & Campbell, R. (1997). Activation of auditory cortex during silent lipreading. Science, 276, 593–596.
Calvert, G. A., Campbell, R., & Brammer, M. J. (2000). Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Current Biology, 10, 649–657.
Campbell, R., Zihl, J., Massaro, D., Munhall, K., & Cohen, M. (1997). Speechreading in the akinetopsic patient L.M. Brain, 120, 1793–1803.
Colin, C., Radeau, M., & Deltenre, P. (2005). Top-down and bottom-up modulation of audiovisual integration in speech. European Journal of Cognitive Psychology, 17, 541–560.
Colin, C., Radeau, M., Soquet, A., Demolin, D., Colin, F., & Deltenre, P. (2002). Mismatch negativity evoked by the McGurk-MacDonald effect: a phonetic representation within short-term memory. Clinical Neurophysiology, 113, 495–506.
Coltheart, M. (1999). Modularity and cognition. Trends in Cognitive Sciences, 3, 115–120.
Cotton, J. C. (1935). Normal visual hearing. Science, 82, 592–593.
Crinion, J. T., Warburton, E. A., Lambon-Ralph, M. A., Howard, D., & Wise, R. J. (2006). Listening to narrative speech after aphasic stroke: the role of the left anterior temporal lobe. Cerebral Cortex, 16, 1116.
D'Ausilio, A., Pulvermüller, F., Salmas, P., Bufalari, I., Begliomini, C., & Fadiga, L. (2009). The motor somatotopy of speech perception. Current Biology, 19, 381–385.
de Gelder, B., Böcker, K. B., Tuomainen, J., Hensen, M., & Vroomen, J. (1999). The combined perception of emotion from voice and face: early interaction revealed by human electric brain responses. Neuroscience Letters, 260, 133–136.
de Gelder, B., & Bertelson, P. (2003). Multisensory integration, perception and ecological validity. Trends in Cognitive Sciences, 7, 460–467.
Dekle, D., Fowler, C., & Funnell, M. (1992). Auditory-visual integration in perception of real words. Perception & Psychophysics, 51, 355–362.
Desjardins, R. N., & Werker, J. F. (2004). Is the integration of heard and seen speech mandatory for infants? Developmental Psychobiology, 45, 187–203.
Diehl, R. L., & Kluender, K. R. (1987). On the categorization of speech sounds. In S. Harnad (Ed.), Categorical perception: the groundwork of cognition (pp. 226–253). New York: Cambridge University Press.
Diehl, R. L., Lotto, A. J., & Holt, L. L. (2004). Speech perception. Annual Review of Psychology, 55, 149–179.
Dodd, B. (1979). Lip-reading in infants: attention to speech presented in- and out-of-synchrony. Cognitive Psychology, 11, 478–484.
Driver, J. (1996). Enhancement of selective listening by illusory mislocation of speech sounds due to lip-reading. Nature, 381, 66–68.
Easton, R. D., & Basala, M. (1982). Perceptual dominance during lipreading. Perception & Psychophysics, 32, 562–570.
Fadiga, L., Craighero, L., Buccino, G., & Rizzolatti, G. (2002). Speech listening specifically modulates the excitability of tongue muscles: a TMS study. European Journal of Neuroscience, 15, 399–402.
Fadiga, L., Craighero, L., & Olivier, E. (2005). Human motor cortex excitability during the perception of others' action. Current Opinion in Neurobiology, 15, 213–218.
Fairhall, S. L., & Macaluso, E. (2009). Spatial attention can modulate audiovisual integration at multiple cortical and subcortical sites. European Journal of Neuroscience, 29, 1247–1257.
Fodor, J. (1983). The modularity of mind. Cambridge, MA: The MIT Press.
Fowler, C. A. (2004). Speech as a supramodal or amodal phenomenon. In G. Calvert, C. Spence, & B. E. Stein (Eds.), Handbook of multisensory processing (pp. 189–202). Cambridge, MA: MIT Press.
Fowler, C. A., Brown, J. M., Sabadini, L., & Weihing, J. (2003). Rapid access to speech gestures in perception: evidence from choice and simple response time tasks. Journal of Memory and Language, 49, 396–413.
Fowler, C. A., & Dekle, D. J. (1991). Listening with eye and hand: cross-modal contributions to speech perception. Journal of Experimental Psychology. Human Perception and Performance, 17, 816–823.
Galantucci, B., Fowler, C. A., & Goldstein, L. (2009). Perceptuomotor compatibility effects in speech. Attention, Perception & Psychophysics, 71, 1138–1149.
Gentilucci, M., & Bernardis, P. (2007). Imitation during phoneme production. Neuropsychologia, 45, 608–615.
Gentilucci, M., & Cattaneo, L. (2005). Automatic audiovisual integration in speech perception. Experimental Brain Research, 167, 66–75.
Gibson, E. J. (1969). Principles of perceptual learning and development. Englewood Cliffs, NJ: Prentice Hall.
Gick, B., & Derrick, D. (2009). Aero-tactile integration in speech perception. Nature, 462, 502–504.
Gick, B., Jóhannsdóttir, K. M., Gibraiel, D., & Mühlbauer, J. (2008). Tactile enhancement of auditory and visual speech perception in untrained perceivers. Journal of the Acoustical Society of America, 123, EL72–EL76.
Gordon, P. C., & Meyer, D. E. (1984). Perceptual-motor processing of phonetic features in speech. Journal of Experimental Psychology. Human Perception and Performance, 10, 153–178.


Grant, K. W., Ardell, L. H., Kuhl, P. K., & Sparks, D. W. (1985). The contribution of fundamental frequency, amplitude envelope, and voicing duration cues to speechreading in normal-hearing subjects. Journal of the Acoustical Society of America, 77, 671–677.
Grant, K. W., & Greenberg, S. (2001). Speech intelligibility derived from asynchronous processing of auditory-visual information. Proceedings of the AVSP 2001 International Conference on Auditory-Visual Speech Processing, Scheelsminde, Denmark, pp. 132–137.
Green, K. P., & Gerdeman, A. (1995). Cross-modal discrepancies in coarticulation and the integration of speech information: the McGurk effect with mismatched vowels. Journal of Experimental Psychology. Human Perception and Performance, 21, 1409–1426.
Green, K. P., Kuhl, P. K., Meltzoff, A. N., & Stevens, E. B. (1991). Integrating speech information across talkers, gender, and sensory modality: female faces and male voices in the McGurk effect. Perception & Psychophysics, 50, 524–536.
Hadar, U., Steiner, T. J., Grant, E. C., & Rose, F. C. (1983). Head movement correlates of juncture and stress at sentence level. Language and Speech, 26, 117–129.
Hadar, U., Steiner, T. J., Grant, E. C., & Rose, F. C. (1984). The timing of shifts in head posture during conversation. Human Movement Science, 3, 237–245.
Hertrich, I., Mathiak, K., Lutzenberger, W., & Ackermann, H. (2008). Time course of early audiovisual interactions during speech and nonspeech central auditory processing: a magnetoencephalography study. Journal of Cognitive Neuroscience, 21, 259–274.
Hickok, G., Holt, L. L., & Lotto, A. J. (2009). Response to Wilson: what does motor cortex contribute to speech perception? Trends in Cognitive Sciences, 13, 330–331.
Holmes, N. P. (2009). The principle of inverse effectiveness in multisensory integration: some statistical considerations. Brain Topography, 21, 168–176.
Holmes, N. P., & Spence, C. (2005). Multisensory integration: space, time, and superadditivity. Current Biology, 15, R762–R764.
Houde, J. F., & Jordan, M. I. (1998). Sensorimotor adaptation in speech production. Science, 279, 1213–1216.
Ito, T., Tiede, M., & Ostry, D. J. (2009). Somatosensory function in speech perception. Proceedings of the National Academy of Sciences of the United States of America, 106, 1245–1248.
Jiang, J., Auer, E. T., Jr., Alwan, A., Keating, P. A., & Bernstein, L. E. (2007). Similarity structure in visual speech perception and optical phonetics. Perception & Psychophysics, 69, 1070–1083.
Kerzel, D., & Bekkering, H. (2000). Motor activation from visible speech: evidence from stimulus response compatibility. Journal of Experimental Psychology. Human Perception and Performance, 26, 634–647.
Kirman, J. H. (1973). Tactile communication of speech: a review and an analysis. Psychological Bulletin, 80, 54–74.
Kislyuk, D. S., Möttönen, R., & Sams, M. (2008). Visual processing affects the neural basis of auditory discrimination. Journal of Cognitive Neuroscience, 20, 2175–2184.
Klatt, D. H. (1980). Speech perception: a model of acoustic-phonemic analysis and lexical access. Journal of Phonetics, 8, 279–312.
Kuhl, P. K., & Meltzoff, A. N. (1982). The bimodal perception of speech in infancy. Science, 218, 1138–1141.
Kuhl, P. K., & Meltzoff, A. N. (1984). The intermodal representation of speech in infants. Infant Behavior and Development, 7, 361–381.
Kushnerenko, E., Teinonen, T., Volein, A., & Csibra, G. (2008). Electrophysiological evidence of illusory audiovisual speech percept in human infants. Proceedings of the National Academy of Sciences of the United States of America, 105, 11442–11445.
Laurienti, P. J., Perrault, T. J., Stanford, T. R., Wallace, M. T., & Stein, B. E. (2005). On the use of superadditivity as a metric for characterizing multisensory integration in functional neuroimaging studies. Experimental Brain Research, 166, 289–297.
Lavie, N. (1995). Perceptual load as a necessary condition for selective attention. Journal of Experimental Psychology. Human Perception and Performance, 21, 451–468.
Lebib, R., Papo, D., de Bode, S., & Baudonnière, P. M. (2003). Evidence of a visual-to-auditory cross-modal sensory gating phenomenon as reflected by the human P50 event-related brain potential modulation. Neuroscience Letters, 341, 185–188.
Levelt, W. J. M., Schriefers, H., Vorberg, D., Meyer, A. S., Pechmann, T., & Havinga, J. (1991). The time course of lexical access in speech production: a study of picture naming. Psychological Review, 98, 122–142.
Lewkowicz, D. J. (1994). Development of intersensory perception in human infants. In D. J. Lewkowicz & R. Lickliter (Eds.), The development of intersensory perception: comparative perspectives (pp. 165–203). Hillsdale, NJ: Lawrence Erlbaum.
Lewkowicz, D. J. (2000). Infants' perception of the audible, visible and bimodal attributes of multimodal syllables. Child Development, 71, 1241–1257.
Lewkowicz, D. J. (2010). Infant perception of audiovisual speech synchrony. Developmental Psychology, 46, 66–77.
Lewkowicz, D. J., & Ghazanfar, A. A. (2006). The decline of cross-species intersensory perception in human infants. Proceedings of the National Academy of Sciences of the United States of America, 103, 6771–6774.
Lewkowicz, D. J., Leo, I., & Simion, F. (2010). Intersensory perception at birth: newborns match non-human primate faces and voices. Infancy, 15, 46–60.
Lewkowicz, D. J., Sowinski, R., & Place, S. (2008). The decline of cross-species intersensory perception in human infants: underlying mechanisms and its developmental persistence. Brain Research, 1242, 291–302.
Liberman, A. M. (1982). On finding that speech is special. American Psychologist, 37, 148–167.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431–461.
Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition, 21, 1–36.
MacKain, K., Studdert-Kennedy, M., Spieker, S., & Stern, D. (1983). Infant intermodal speech perception is a left hemisphere function. Science, 219, 1347–1349.
Mahon, B. Z., & Caramazza, A. (2008). A critical look at the embodied cognition hypothesis and a new proposal for grounding conceptual content. Journal of Physiology, Paris, 102, 59–70.
Massaro, D. W. (1987). Speech perception by ear and eye: a paradigm for psychological inquiry. Hillsdale, NJ: Lawrence Erlbaum Associates.


Massaro, D. W. (1998). Perceiving talking faces: from speech perception to a behavioral principle. Cambridge, MA: MIT Press.
Massaro, D. W. (2004). From multisensory integration to talking heads and language learning. In G. Calvert, C. Spence, & B. E. Stein (Eds.), Handbook of multisensory processing (pp. 153–176). Cambridge, MA: MIT Press.
McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18, 1–86.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748.
Miller, L. M., & D'Esposito, M. (2005). Perceptual fusion and stimulus coincidence in the cross-modal integration of speech. Journal of Neuroscience, 25, 5884–5893.
Mitterer, H., & Ernestus, M. (2008). The link between speech perception and production is phonological and abstract: evidence from the shadowing task. Cognition, 109, 168–173.
Mohammed, T., Campbell, R., MacSweeney, M., Milne, E., Hansen, P., & Coleman, M. (2005). Speechreading skill and visual movement sensitivity are related in deaf speechreaders. Perception, 34, 205–216.
Möttönen, R., Schürmann, M., & Sams, M. (2004). Time course of multisensory interactions during audiovisual speech perception in humans: a magnetoencephalographic study. Neuroscience Letters, 363, 112–115.
Möttönen, R., & Watkins, K. E. (2009). Motor representations of articulators contribute to categorical perception of speech sounds. Journal of Neuroscience, 29, 9819–9825.
Munhall, K., Jones, J. A., Callan, D. E., Kuratate, T., & Vatikiotis-Bateson, E. (2004). Head movement improves auditory speech perception. Psychological Science, 15, 133–137.
Munhall, K. G., Servos, P., Santi, A., & Goodale, M. A. (2002). Dynamic visual speech perception in a patient with visual form agnosia. Neuroreport, 13, 1793–1796.
Nasir, S. M., & Ostry, D. J. (2009). Auditory plasticity and speech motor learning. Proceedings of the National Academy of Sciences of the United States of America, 106, 20470–20475.
Navarra, J., Alsius, A., Soto-Faraco, S., & Spence, C. (2010). Assessing the role of attention in the audiovisual integration of speech. Information Fusion, 11, 4–11.
Navarra, J., Sebastián-Gallés, N., & Soto-Faraco, S. (2005). The perception of second language sounds in early bilinguals: new evidence from an implicit measure. Journal of Experimental Psychology. Human Perception and Performance, 31, 912–918.
Navarra, J., & Soto-Faraco, S. (2007). Hearing lips in a second language: visual articulatory information enables the perception of L2 sounds. Psychological Research, 71, 4–12.
Navarra, J., Vatakis, A., Zampini, M., Soto-Faraco, S., Humphreys, W., & Spence, C. (2005). Exposure to asynchronous audiovisual speech extends the temporal window for audiovisual integration. Brain Research. Cognitive Brain Research, 25, 499–507.
Pallier, C. (1994). Rôle de la syllabe dans la perception de la parole: études attentionnelles. Unpublished PhD dissertation, École des Hautes Études en Sciences Sociales, Paris.
Patterson, M. L., & Werker, J. F. (1999). Matching phonetic information in lips and voice is robust in 4.5-month-old infants. Infant Behavior and Development, 22, 237–247.
Patterson, M. L., & Werker, J. F. (2002). Infants' ability to match dynamic phonetic and gender information in the face and voice. Journal of Experimental Child Psychology, 81, 93–115.
Patterson, M. L., & Werker, J. F. (2003). Two-month-old infants match phonetic information in lips and voice. Developmental Science, 6, 191–196.
Piaget, J. (1952). The origins of intelligence in children. New York: International University Press.
Pons, F., Lewkowicz, D. J., Soto-Faraco, S., & Sebastián-Gallés, N. (2009). Narrowing of intersensory speech perception in infancy. Proceedings of the National Academy of Sciences of the United States of America, 106, 10598–10602.
Ponton, C. W., Bernstein, L. E., & Auer, E. T., Jr. (2009). Mismatch negativity with visual-only and audiovisual speech. Brain Topography, 21, 207–215.
Poulin-Dubois, D., Serbin, L., Kenyon, B., & Derbyshire, A. (1994). Infants' intermodal knowledge about gender. Developmental Psychology, 30, 436–442.
Pulvermüller, F., & Fadiga, L. (2010). Active perception: sensorimotor circuits as a cortical basis for language. Nature Reviews. Neuroscience, 11, 351–360.
Reed, C. M., Durlach, N. I., Braida, L. D., & Schultz, M. C. (1989). Analytic study of the Tadoma method: effects of hand position on segmental speech perception. Journal of Speech and Hearing Research, 32, 921–929.
Reisberg, D., McLean, J., & Goldfield, A. (1987). Easy to hear but hard to understand: a lip-reading advantage with intact auditory stimuli. In B. Dodd & R. Campbell (Eds.), Hearing by eye: the psychology of lip-reading (pp. 97–114). Hillsdale, NJ: Lawrence Erlbaum Associates.
Roelofs, A., Özdemir, R., & Levelt, W. J. M. (2007). Influences of spoken word planning on speech recognition. Journal of Experimental Psychology. Learning, Memory, and Cognition, 33, 900–913.
Rosenblum, L. D., Schmuckler, M. A., & Johnson, J. A. (1997). The McGurk effect in infants. Perception & Psychophysics, 59, 347–357.
Ross, L. A., Saint-Amour, D., Leavitt, V. M., Javitt, D. C., & Foxe, J. J. (2007). Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments. Cerebral Cortex, 17, 1147–1153.
Sams, M., Aulanko, R., Hämäläinen, M., Hari, R., Lounasmaa, O. V., Lu, S. T., et al. (1991). Seeing speech: visual information from lip movements modifies activity in the human auditory cortex. Neuroscience Letters, 127, 141–145.
Sams, M., Möttönen, R., & Sihvonen, T. (2005). Seeing and hearing others and oneself talk. Brain Research. Cognitive Brain Research, 23, 429–435.
Sato, M., Buccino, G., Gentilucci, M., & Cattaneo, L. (2009). On the tip of the tongue: modulation of the primary motor cortex during audiovisual speech perception. Speech Communication, 52, 533–541.
Schneider, W., & Shiffrin, R. M. (1977). Controlled and automatic human information processing: detection, search, and attention. Psychological Review, 84, 1–66.
Schroeder, C. E., Lakatos, P., Kajikawa, Y., Partan, S., & Puce, A. (2008). Neuronal oscillations and visual amplification of speech. Trends in Cognitive Sciences, 12, 106–113.
Skipper, J. I., Nusbaum, H. C., & Small, S. L. (2005). Listening to talking faces: motor cortical activation during speech perception. NeuroImage, 25, 76–89.
Skipper, J. I., van Wassenhove, V., Nusbaum, H. C., & Small, S. L. (2007). Hearing lips and seeing voices: how cortical areas supporting speech production mediate audiovisual speech perception. Cerebral Cortex, 17, 2387–2399.


Soken, N. H., & Pick, A. D. (1992). Intermodal perception of happy and angry expressive behaviors by seven-month-old infants. Child Development, 63, 787–795.
Soto-Faraco, S., & Alsius, A. (2007). Conscious access to the unisensory components in a cross-modal illusion. Neuroreport, 18, 347–350.
Soto-Faraco, S., & Alsius, A. (2009). Deconstructing the McGurk-MacDonald illusion. Journal of Experimental Psychology. Human Perception and Performance, 35, 580–587.
Soto-Faraco, S., Navarra, J., & Alsius, A. (2004). Assessing automaticity in audiovisual speech integration: evidence from the speeded classification task. Cognition, 92, B13–B23.
Soto-Faraco, S., Navarra, J., Weikum, W., Vouloumanos, A., Sebastián-Gallés, N., & Werker, J. (2007). Discriminating languages by speechreading. Perception & Psychophysics, 69, 218–231.
Stekelenburg, J. J., & Vroomen, J. (2007). Neural correlates of multisensory integration of ecologically valid audiovisual events. Journal of Cognitive Neuroscience, 19, 1964–1973.
Stevens, K. N. (2002). Toward a model for lexical access based on acoustic landmarks and distinctive features. Journal of the Acoustical Society of America, 111, 1872–1891.
Stevenson, R. A., Kim, S., & James, T. W. (2009). An additive-factors design to disambiguate neuronal and areal convergence: measuring multisensory interactions between audio, visual, and haptic sensory streams using fMRI. Experimental Brain Research, 198, 183–194.
Sumby, W. H., & Pollack, I. (1954). Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America, 26, 212–215.
Summerfield, Q., & McGrath, M. (1984). Detection and resolution of audiovisual incompatibility in the perception of vowels. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 36, 51–74.
Summers, I. R. (Ed.). (1992). Tactile aids for the hearing impaired (practical aspects of audiology). London: John Wiley & Sons.
Talsma, D., Senkowski, D., Soto-Faraco, S., & Woldorff, M. G. (2010). The multifaceted interplay between attention and multisensory integration. Trends in Cognitive Sciences, 14, 400–410.
Talsma, D., & Woldorff, M. G. (2005). Selective attention and multisensory integration: multiple phases of effects on the evoked brain activity. Journal of Cognitive Neuroscience, 17, 1098–1114.
Teinonen, T., Aslin, R. N., Alku, P., & Csibra, G. (2008). Visual speech contributes to phonetic learning in 6-month-old infants. Cognition, 105, 850–855.
Terao, Y., Ugawa, Y., Yamamoto, T., Sakurai, Y., Masumoto, T., & Abe, O. (2007). Primary face motor area as the motor representation of articulation. Journal of Neurology, 254, 442–447.
Tiippana, K., Andersen, T. S., & Sams, M. (2004). Visual attention modulates audiovisual speech perception. European Journal of Cognitive Psychology, 16, 457–472.
Tuomainen, J., Andersen, T. S., Tiippana, K., & Sams, M. (2005). Audio-visual speech is special. Cognition, 96, B13–B22.
van Wassenhove, V., Grant, K. W., & Poeppel, D. (2005). Visual speech speeds up the neural processing of auditory speech. Proceedings of the National Academy of Sciences of the United States of America, 102, 1181–1186.
Vatakis, A., Ghazanfar, A. A., & Spence, C. (2008). Facilitation of multisensory integration by the "unity effect" reveals that speech is special. Journal of Vision, 8, 1–11.
Vatakis, A., & Spence, C. (2007). Crossmodal binding: evaluating the "unity assumption" using audiovisual speech stimuli. Perception & Psychophysics, 69, 744.
Vatikiotis-Bateson, E., Munhall, K. G., Kasahara, Garcia, & Yehia, H. (1996). Characterizing audiovisual information during speech. Proceedings of the 4th International Conference on Spoken Language Processing (ICSLP 96, October 3–6), Philadelphia, PA (pp. 1485–1488).
Vouloumanos, A., Druhen, M. J., Hauser, M. D., & Huizink, A. T. (2009). Five-month-old infants' identification of the sources of vocalizations. Proceedings of the National Academy of Sciences of the United States of America, 106, 18867–18872.
Vroomen, J., Bertelson, P., & de Gelder, B. (2001). The ventriloquist effect does not depend on the direction of automatic visual attention. Perception & Psychophysics, 63, 651–659.
Walker-Andrews, A. S. (1997). Infants' perception of expressive behaviors: differentiation of multimodal information. Psychological Bulletin, 121, 437–456.
Walker-Andrews, A. S., Bahrick, L. E., Raglioni, S. S., & Diaz, I. (1991). Infants' bimodal perception of gender. Ecological Psychology, 3, 55–75.
Walton, G. E., & Bower, T. G. (1993). Amodal representations of speech in infants. Infant Behavior and Development, 16, 233–243.
Watkins, K. E., Strafella, A. P., & Paus, T. (2003). Seeing and hearing speech excites the motor system involved in speech production. Neuropsychologia, 41, 989–994.
Weikum, W. M., Vouloumanos, A., Navarra, J., Soto-Faraco, S., Sebastián-Gallés, N., & Werker, J. F. (2007). Visual language discrimination in infancy. Science, 316, 1159.
Wilson, S. M., Pinar-Saygin, A., Sereno, M. I., & Iacoboni, M. (2004). Listening to speech activates motor areas involved in speech production. Nature Neuroscience, 7, 701–702.
Windmann, S. (2003). Effects of sentence context and expectation on the McGurk illusion. Journal of Memory and Language, 50, 212–230.
Wright, T. M., Pelphrey, K. A., Allison, T., McKeown, M. J., & McCarthy, G. (2003). Polysensory interactions along lateral temporal regions evoked by audiovisual speech. Cerebral Cortex, 13, 1034–1043.
Yehia, H. C., Kuratate, T., & Vatikiotis-Bateson, E. (2002). Linking facial animation, head motion and speech acoustics. Journal of Phonetics, 30, 555–568.
Yuan, H., Reed, C. M., & Durlach, N. I. (2005). Tactual display of consonant voicing as a supplement to lipreading. Journal of the Acoustical Society of America, 118, 1003–1015.
Yuen, I., Davis, M. H., Brysbaert, M., & Rastle, K. (2010). Activation of articulatory information in speech perception. Proceedings of the National Academy of Sciences of the United States of America, 107, 592–597.