

Producing American English Vowels During Vocal Tract Growth: A Perceptual Categorization Study of Synthesized Vowels

Purpose: To consider interactions of vocal tract change with growth and perceived output patterns across development, the influence of nonuniform vocal tract growth on the ability to reach acoustic–perceptual targets for English vowels was studied.

Method: Thirty-seven American English speakers participated in a perceptual categorization experiment. For the experiment, an articulatory-to-acoustic model was used to synthesize 342 five-formant vowels, covering maximal vowel spaces for speakers at 5 growth stages (from 6 months old to adult).

Results: Results indicate that the 3 vowels /i u æ/ can be correctly perceived by adult listeners when produced by speakers with a 6-month-old vocal tract. Articulatory-to-acoustic relationships for these 3 vowels differ across growth stages. For a given perceived vowel category, the infant’s tongue position is more fronted than the adult’s. Furthermore, nonuniform vocal tract growth influences degree of interarticulator coupling for a given perceived vowel, leading to a reduced correlation between jaw height and tongue body position in infantlike compared with adult vocal tracts.

Conclusion: Findings suggest that nonuniform vocal tract growth does not prevent the speaker from producing acoustic–auditory targets related to American English vowels. However, the relationships between articulatory configurations and perceptual targets change from birth to adulthood.

KEY WORDS: developmental milestones, speech and language, speech perception

Learning to speak is related to the emergence of sensorimotor “maps” in which vowels and consonants are associated with articulatory–acoustic vocal tract configurations. The young child must accommodate transformations of the speech production mechanisms by integrating anatomical, motor, perceptual, and cognitive capacities (Green, Moore, & Reilly, 2002; Kuhl & Meltzoff, 1982; Vorperian et al., 2005). Physical growth in the vocal tract is not complete until adolescence. Thus, producing the speech targets associated with vowels and consonants from birth to adulthood likely involves articulatory and acoustic adjustments, as the production system evolves. Determining the exact role of each component (anatomical, motor, perceptual, and cognitive) in producing speech output is a complex task.

Lucie Ménard
Université du Québec à Montréal, Canada

Barbara L. Davis
University of Texas at Austin

Louis-Jean Boë
Université Stendhal/Institut National Polytechnique de Grenoble, Grenoble, France

Johanna-Pascale Roy
Université Laval, Sainte-Foy, Canada

Journal of Speech, Language, and Hearing Research • Vol. 52 • 1268–1285 • October 2009 • © American Speech-Language-Hearing Association 1092-4388/09/5205-1268

Nonuniform Vocal Tract Growth, Motor Control Development, and Vowel Production

At the anatomical level, cineradiographic (Goldstein, 1980) and magnetic resonance imaging (MRI) data (Callan, Kent, Guenther, & Vorperian, 2000; Fitch & Giedd, 1999; Vorperian et al., 2005) have shown that the adult vocal apparatus is the result of a complex remodeling of the infant tract. At birth, overall vocal tract length, determined from the larynx to the lips, is approximately 8 cm, whereas the adult male vocal tract is 17 cm long (Goldstein, 1980). According to Goldstein, the ratio of the pharynx length to the length of the oral cavity changes from 0.5 at birth to 1.1 in adulthood for a male. An infant’s pharyngeal cavity at birth is thus much shorter than the oral cavity, whereas an adult male’s pharyngeal cavity is longer than the front cavity. MRI data (Fitch & Giedd, 1999) confirm that vocal tract growth is not uniform. These major modifications greatly influence the way speakers produce vowels at different growth stages (Callan et al., 2000; Ménard, Schwartz, & Boë, 2004). For instance, it has been suggested that the sparse occurrence of the vowel /u/ in infant vowel inventories is related to infants’ short pharyngeal cavity (Buhr, 1980).
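To give a rough sense of how vocal tract length alone shifts formant frequencies, the sketch below uses the textbook quarter-wavelength approximation for a uniform tube closed at the glottis and open at the lips, F_n = (2n − 1)c / 4L. This idealization is not the articulatory model used in the study; the function name and the speed of sound (35,000 cm/s) are assumptions made for illustration, and only the two lengths (8 cm and 17 cm) come from the text.

```python
def uniform_tube_formants(length_cm, n_formants=3, speed_of_sound_cm_s=35000.0):
    """Resonances (Hz) of an idealized uniform tube, closed at one end.

    Assumption: the vocal tract is treated as a lossless uniform tube,
    which ignores the pharynx/oral-cavity proportions discussed above.
    """
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
        for n in range(1, n_formants + 1)
    ]


# Lengths quoted from Goldstein (1980) in the text.
infant_formants = uniform_tube_formants(8.0)
adult_formants = uniform_tube_formants(17.0)

print("8 cm tube :", [round(f) for f in infant_formants])
print("17 cm tube:", [round(f) for f in adult_formants])
```

Under this approximation every resonance scales by the same factor (17/8), which is exactly what nonuniform growth does not do; the contrast is the point of the illustration.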

In addition to nonuniform anatomical growth, speech motor abilities constrain the quality of the produced vowels, as they emerge during the first year of life (Davis & MacNeilage, 1995; Kent, 1976; Vilain, Abry, Badin, & Brosda, 1999). In the framework of the frame/content theory of speech production, MacNeilage and Davis (1990) proposed a production system–based account of early speechlike syllabic vocalization patterns. In their view, at the onset of canonical babbling (around 7–8 months), control of rhythmic mandible oscillation gives rise to alternating open-closed cycles, resulting in a percept of CV syllables. Poor control of lip and tongue articulators prevents infants from producing a wide variety of places and manners of articulation. Consonant- and vowel-like sounds produced are the result of the tongue and lips being passively raised and lowered in concert with rhythmic jaw movements. In this view, vowels that do not occur in the first sound inventory are those that cannot be produced because the infant does not yet have motor control abilities for tongue movements independent of the jaw in vocalizations.

Adultlike motor control to produce a variety of speech sounds is reached in late childhood or adolescence (Kent, 1976; Smith & Zelaznik, 2004; Walsh & Smith, 2002). In a study of lip–jaw coarticulatory patterns, Green, Moore, Higashikawa, and Steeve (2000) demonstrated that as children mature, the coordination of lip and jaw movement improves, resulting in decreased trial-to-trial variability. This pattern was confirmed by Smith and Zelaznik (2004), who studied lip and jaw movements in 180 children and adults ages 4–22 years. Cheng, Murdoch, Goozee, and Scott (2007) investigated tongue–jaw interactions with electromagnetic articulography in 48 participants 6–38 years old during production of the consonants /t/ and /k/. Tongue-tip and jaw movements became increasingly synchronized over time. Tongue-body and jaw movements did not show a similar pattern but became less variable with age. These results are interpreted as reflecting increased motor control maturity with age. Learning to integrate new controls into existing ones, such as integrating tongue movements into already controlled jaw oscillatory movements, enables precision for the diversification of speech-related movement.

A Previous Study Combining Perceptual Tests and Articulatory Modeling

The effects of nonuniform cavity growth and motor control development on vowel production have been described in an earlier study of French by using combined perceptual and modeling experiments (Ménard et al., 2004). Five-formant vowels generated by an articulatory-to-acoustic model in vocal tracts representative of five growth stages (newborn, 4-year-old, 10-year-old, 16-year-old, and adult) were presented to French listeners in an identification task. Stimuli were generated on the assumption that all synthesized speakers exhibited adultlike motor control abilities, allowing investigation of the effects of changes in vocal tract length alone. Even though the ratio of pharyngeal length to front cavity length differed across growth stages, acoustic targets for the three cardinal French vowels /i u a/ could be perceived when synthesized in the infant vocal tract. However, for French, more stimuli were perceived as front and open vowels in the infant than in the adult male vocal tract. Thus, vowels that French speakers perceive to be front and open, corresponding to some of the most frequently produced vowels reported for infant sound inventories in canonical babbling (Davis & MacNeilage, 1995), were favored by the infant vocal tract. Further analyses led us to propose that different sensorimotor maps are associated with French vowels from birth to adulthood. An analysis of those maps suggested that different ranges of articulatory parameters were used, depending on the vocal tract morphology. Thus, a given perceived French vowel might be produced using different positions of the jaw, tongue, and lips in the first months of life compared with the adult stage. Those conclusions, however, were specific to French. It is by no means obvious that the same results would hold for another language (e.g., English). It is possible, for instance, that the fact that the French system has six front vowels (the unrounded /i e ɛ/ and their rounded counterparts /y ø œ/) and only three back vowels (/u o ɔ/) might trigger a greater tendency to perceive front vowels.

Objectives

The first objective of the present study was to determine the effects of nonuniform vocal tract growth on a speaker’s ability to produce American English vowels. The results of identification by English listeners of synthesized vowels produced by vocal tract morphology ranging from birth to adulthood were analyzed. The second objective was to compare the underlying articulatory configurations related to perceptual targets in infantlike and adultlike vocal tracts and propose sensorimotor maps for English vowels at both growth stages that represent endpoints of the vocal tract growth process.

Method

This perceptual experiment was designed following a method previously used by Ménard et al. (2004). Synthesized five-formant vowels representing five growth stages from birth to adulthood were generated using an articulatory-to-acoustic model. Realistic F0 values were also taken into account. Those vowels were used as stimuli in an identification task performed by 37 American English listeners.

The Articulatory Model

The variable linear articulatory (VLAM) growth model, developed by Maeda and colleagues (Boë & Maeda, 1997), integrates knowledge acquired from previous models with currently available growth data. VLAM was implemented and tested in an environment (GROWTH) originally developed for an articulatory model of the adult vocal tract (Maeda, 1979, 1990). As described in detail in Ménard et al. (2004), the latter model is based on a statistical analysis of 519 cineradiographic images of a French speaker uttering 10 sentences (Bothorel, Simon, Wioland, & Zerling, 1986). For each image, a semipolar grid (with an intersegment distance of 0.5 cm in the oral and pharyngeal regions and 11° in the polar region) was superimposed on the midsagittal contour. The intersection of each segment of the grid with the interior and exterior walls of the vocal tract was then determined. A principal component analysis revealed that 88% of the variance could be explained by seven articulatory parameters, directly interpretable in terms of functionally organized articulatory blocks: (a) protrusion and labial aperture; (b) movement of the tongue body, dorsum, and tip; (c) jaw height; and (d) larynx height (Boë, Gabioud, & Perrier, 1995). Each of these parameters is adjustable at a value in the range of ±3.5 SD of the mean values for this articulator in the cineradiographic images.

The growth process is simulated by modifying the longitudinal dimension of the vocal tract according to two scaling factors. A schematic representation of the scaling method is provided in Figure 1. One factor relates to the anterior part of the vocal tract (k_oral in Figure 1) and the other to the pharynx (k_pharynx in Figure 1), interpolating the zone in between. On the semipolar grid, the 5-mm intersegment space and the sagittal distance (corresponding to the value used for the adult male) are scaled by one of the two factors or by an interpolation of these factors. The factor used for the oral region differs from the factor used in the pharyngeal region to appropriately reflect the nonuniform growth of the two cavities. The exact values of those factors were calibrated, month by month and year by year, using the data provided by Goldstein (1980).
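The two-factor scaling described above can be sketched as follows. This is only an illustration of the idea, not the VLAM implementation: the normalized position axis, the region boundaries (0.4 and 0.6), the linear form of the interpolation, and the example factor values are all assumptions, since the text gives only the adult spacing (0.5 cm) and the names k_oral and k_pharynx.

```python
ADULT_SPACING_CM = 0.5  # intersegment space for the adult male (from the text)


def region_scale(position, k_pharynx, k_oral, pharynx_end=0.4, oral_start=0.6):
    """Scaling factor at a normalized position (0 = glottis, 1 = lips).

    Assumption: the zone between the pharynx and the oral cavity is
    handled by linear interpolation between the two factors.
    """
    if position <= pharynx_end:
        return k_pharynx
    if position >= oral_start:
        return k_oral
    t = (position - pharynx_end) / (oral_start - pharynx_end)
    return k_pharynx + t * (k_oral - k_pharynx)


def scaled_spacings(n_segments, k_pharynx, k_oral):
    """Grid spacing (cm) for each segment after nonuniform scaling."""
    return [
        ADULT_SPACING_CM * region_scale(i / (n_segments - 1), k_pharynx, k_oral)
        for i in range(n_segments)
    ]


# Hypothetical factors for a young vocal tract: the pharynx shrinks more
# than the oral cavity, mirroring the nonuniform growth described above.
spacings = scaled_spacings(n_segments=11, k_pharynx=0.3, k_oral=0.6)
print([round(s, 3) for s in spacings])
```

Setting both factors to 1.0 recovers the adult grid, which is a convenient sanity check for any scheme of this kind.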

From the midsagittal contour obtained, the cross-sectional area function is computed following Heinz and Stevens’s (1965) formula. The transfer function is calculated according to Badin and Fant’s (1984) model. The poles of the transfer function are excited through a five-formant cascade synthesis system (Feng, 1983), by a pulse train generated by a source according to the Liljencrants–Fant model (Fant, Liljencrants, & Lin, 1985). The parameters related to the source (glottal symmetry quotient and open quotient) were equal to 0.8 and 0.7, respectively, and remained unchanged for all growth stages. The resulting signal is sampled at 22050 Hz and is 500 ms long. Fundamental frequency values are chosen based on the growth data presented by Beck (1996). The model is, therefore, suitable for use in systematic simulation studies.
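A cascade formant synthesizer of the general kind described above can be sketched in a few lines. The sketch is deliberately simplified and should not be read as the system used in the study: it replaces the Liljencrants–Fant source with a plain impulse train, uses standard second-order digital resonators, and the formant and bandwidth values are hypothetical. Only the sampling rate (22050 Hz), the duration (500 ms), and the adult F0 (112 Hz) are taken from the text.

```python
import math

SAMPLE_RATE_HZ = 22050
DURATION_S = 0.5


def impulse_train(f0_hz):
    """One unit impulse per pitch period (a stand-in for the LF source)."""
    n_samples = int(SAMPLE_RATE_HZ * DURATION_S)
    period = round(SAMPLE_RATE_HZ / f0_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def resonator(signal, freq_hz, bandwidth_hz):
    """Second-order digital resonator with unity gain at 0 Hz."""
    t = 1.0 / SAMPLE_RATE_HZ
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize(f0_hz, formants_hz, bandwidths_hz):
    """Pass the source through one resonator per formant, in cascade."""
    signal = impulse_train(f0_hz)
    for freq, bandwidth in zip(formants_hz, bandwidths_hz):
        signal = resonator(signal, freq, bandwidth)
    return signal


# Hypothetical five-formant configuration for an adultlike neutral vowel.
wave = synthesize(112.0, [500.0, 1500.0, 2500.0, 3500.0, 4500.0],
                  [60.0, 90.0, 120.0, 150.0, 200.0])
print(len(wave), "samples")
```

In a cascade arrangement the relative formant amplitudes follow from the resonator frequencies and bandwidths, so no per-formant gain needs to be specified.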

Figure 1. Schematic representation of the semipolar grid superimposed on a sagittal view. Intersegment space (5 mm for an adult male) and sagittal distance (d) are scaled by a factor depending on the region of the vocal tract: the oral cavity (k_oral), the pharynx (k_pharynx), or the zone in between. From “Role of Vocal Tract Morphology in Speech Development: Perceptual Targets and Sensorimotor Maps for Synthesized French Vowels From Birth to Adulthood,” by L. Ménard, J.-L. Schwartz, and L.-J. Boë, 2004, p. 1062. Copyright 2004 by American Speech-Language-Hearing Association. Reprinted with permission.

Stimuli

The stimuli are similar to those used in the earlier study of French listeners (Ménard et al., 2004). First, maximal vowel spaces (hereafter MVSs, after Boë, Perrier, Guérin, & Schwartz, 1989) were generated for five growth stages: 6 months old, and 4, 10, 16, and 21 years old. The MVS is defined as the maximal acoustic space, in the F1–F2–F3 dimensions, obtained by uniformly sampling the entire input space of command parameters. All possible vowels of the world’s languages can be situated within that space. The following vocal tract lengths were obtained, for a neutral configuration (all parameters set to the null value): 7.70 cm (6 months), 10.67 cm (4 years), 12.65 cm (10 years), 13.56 cm (16 years), and 17.45 cm (21 years). A total of about 7,000 vowels for each age were modeled.

In the 21-year-old MVS generated by VLAM, 5-formant vowels were synthesized. Those vowels correspond to the 38 monophthong oral vowels of the world’s languages, as reported in the UCLA Phonological Segments Inventory Database (UPSID) and described in Maddieson (1986). The UPSID vowels constituted an appropriate sample covering the entire MVS while ensuring articulatory coherence as well as a reasonable number of stimuli. First, the 4 corner vowels /i y u a/ were situated at the limits of that space, following criteria inspired by dispersion–focalization theory (Schwartz, Boë, Vallée, & Abry, 1997). The dispersion–focalization theory assumes that vowel systems are shaped by both (a) dispersion constraints that increase mean formant distances between vowels and (b) focalization constraints that increase the trend to have focal vowels in the system, that is, vowels with close adjacent formants (F3 and F4 for /i/, F2 and F3 for /y/, F1 and F2 at their lowest mean position for /u/, F1 and F2 at their highest mean position for /a/). The remaining 34 acoustic prototypes were distributed according to their mean F1–F2–F3 values, provided by previous studies of these vowels in various languages (for a review, see Vallée, 1994). Once established, the optimal acoustic F1–F2–F3 values were related to their underlying articulatory values (command parameters in VLAM) by an inversion procedure exploiting the pseudoinverse of the Jacobian matrix (details can be found in Ménard et al., 2004).
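The general idea of inverting an articulatory-to-acoustic mapping with the pseudoinverse of its Jacobian can be illustrated on a toy problem. Everything in the sketch below is hypothetical: the forward function is an invented smooth mapping from three parameters to two formant-like values (VLAM has seven command parameters and the study targets F1–F2–F3), the Jacobian is estimated by finite differences, and the iteration count is arbitrary. The actual procedure is described in Ménard et al. (2004).

```python
def forward(params):
    """Toy stand-in for an articulatory-to-acoustic model (not VLAM)."""
    p0, p1, p2 = params
    f1 = 500.0 + 100.0 * p0 - 50.0 * p1 + 10.0 * p2
    f2 = 1500.0 - 100.0 * p0 + 300.0 * p1 + 20.0 * p0 * p2
    return [f1, f2]


def jacobian(params, eps=1e-6):
    """Finite-difference Jacobian: rows are outputs, columns are parameters."""
    base = forward(params)
    columns = []
    for i in range(len(params)):
        shifted = list(params)
        shifted[i] += eps
        moved = forward(shifted)
        columns.append([(m - b) / eps for m, b in zip(moved, base)])
    # Transpose the per-parameter columns into a 2 x 3 row-major matrix.
    return [[columns[j][i] for j in range(len(params))] for i in range(len(base))]


def pseudoinverse_step(jac, error):
    """Minimum-norm update J^T (J J^T)^-1 error for a 2 x N Jacobian."""
    a = sum(x * x for x in jac[0])
    b = sum(x * y for x, y in zip(jac[0], jac[1]))
    d = sum(y * y for y in jac[1])
    det = a * d - b * b
    # Solve the 2 x 2 system (J J^T) w = error by explicit inversion.
    w0 = (d * error[0] - b * error[1]) / det
    w1 = (-b * error[0] + a * error[1]) / det
    return [x * w0 + y * w1 for x, y in zip(jac[0], jac[1])]


def invert(target, start, iterations=20):
    """Iteratively move the parameters toward the acoustic target."""
    params = list(start)
    for _ in range(iterations):
        current = forward(params)
        error = [t - c for t, c in zip(target, current)]
        step = pseudoinverse_step(jacobian(params), error)
        params = [p + s for p, s in zip(params, step)]
    return params


target_formants = [600.0, 1800.0]
solution = invert(target_formants, [0.0, 0.0, 0.0])
print([round(p, 3) for p in solution], [round(f, 3) for f in forward(solution)])
```

Because there are more parameters than acoustic targets, many parameter settings reach the same output; the pseudoinverse picks the smallest update at each step, which is one common way to resolve that redundancy.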

For each of the other four growth stages (6 months, 4 years, 10 years, and 16 years), the 38 prototypes were situated in their respective MVS so that their relative positions in the acoustic space were similar to those in the adult MVS. The same inversion method was used to determine the values of the underlying articulatory parameters associated with each prototype. Because those stimuli are based on similar relative positions within the acoustic space (MVS) throughout growth, we refer to them as acoustic prototypes. These prototypes are depicted for the five vocal tracts in Figure 2.

A second set of prototypes was generated for the following growth stages: 6 months, 4 years, 10 years, and 16 years. Those prototypes, referred to as articulatory prototypes, were synthesized using the adult command parameters. For instance, for the 6-month-old vocal tract, for each of the 38 UPSID vowels, the articulatory settings determined for the adult vocal tract were generated in the 6-month-old vocal tract. Whereas acoustic prototypes are based on similar relative acoustic positions within the MVS, articulatory prototypes involve similar articulatory values for all growth stages. Those prototypes are represented for the five growth stages in the F1 versus F2 space in Figure 3. As schematized in Figure 3, the relative location of the articulatory prototypes in the MVS for each growth stage differs. As a result, 38 vowel stimuli were synthesized for the adult vocal tract, and 76 prototypes (acoustic and articulatory prototypes) were generated for the other four vocal tracts (for a total of 342 stimuli).
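The stimulus count follows directly from the design just described; the short check below reproduces the arithmetic. The variable names are illustrative only.

```python
N_PROTOTYPES = 38  # UPSID monophthong oral vowels
YOUNGER_STAGES = ["6 months", "4 years", "10 years", "16 years"]

# Adult vocal tract: one set of 38 stimuli.
adult_stimuli = N_PROTOTYPES

# Each younger vocal tract: 38 acoustic + 38 articulatory prototypes.
stimuli_per_younger_stage = 2 * N_PROTOTYPES
younger_stimuli = len(YOUNGER_STAGES) * stimuli_per_younger_stage

total_stimuli = adult_stimuli + younger_stimuli
print(stimuli_per_younger_stage, total_stimuli)
```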

Fundamental frequency values of 436 Hz, 337 Hz, 229 Hz, 155 Hz, and 112 Hz were associated with the 6-month-old and 4-, 10-, 16-, and 21-year-old vocal tracts, respectively. Those values are based on Beck’s (1996) growth curve of F0 values. Duration was maintained constant (500 ms) for all vowels. Thus, durational cues (as well as spectral variability), which play an important role in the perception of English vowels, were not taken into account. Our goal was to limit as much as possible the variability among the stimuli. It is thus expected that some vowels will be less intelligible than others. This issue is discussed in the text that follows.

Procedure

One occurrence of each of the 342 stimuli was presented binaurally, through high-quality headphones, to a group of 37 American English listeners living in the Austin area (Texas, United States), ages 18–25. The stimuli were grouped into five blocks of 76 (or 38 for the adult stage) randomized items, with each token of a single age being presented in one block. Each block was followed by a brief pause. Listeners were undergraduate students enrolled at the University of Texas at Austin who had no phonetics background and did not report any speech perception or production abnormalities. Using a button press, they identified the perceived vowel among the 12 American English oral vowels /i ɪ e ɛ æ ɑ ɔ o ʊ u ʌ ɚ/. Even though durational and spectral variability were not modeled in our stimuli, the whole set of vowels was presented. Each vowel was represented by a monosyllabic word of the form [hVd]: heed [hiːd], hid [hɪd], hayed [heɪd], head [hɛd], had [hæd], hod [hɑd], hawed [hɔd], hoed [hod], hood [hʊd], who’d [hud], hud [hʌd], and heard [hɚd]. The test lasted about 40 min and took place in a quiet room.

Figure 2. Formant values, in the F1/F2 space, of acoustic prototypes corresponding to the 38 monophthong oral vowels of the UCLA Phonological Segments Inventory Database (UPSID), for an adultlike vocal tract (upper left-hand panel), a 16-year-old-like vocal tract (upper right-hand panel), a 10-year-old-like vocal tract (middle left-hand panel), a 4-year-old-like vocal tract (middle right-hand panel), and a 6-month-old-like vocal tract (lower panel).

Figure 3. Formant values, in the F1/F2 space, of articulatory prototypes corresponding to the 38 monophthong oral vowels of UPSID, for an adultlike vocal tract (upper left-hand panel), a 16-year-old-like vocal tract (upper right-hand panel), a 10-year-old-like vocal tract (middle left-hand panel), a 4-year-old-like vocal tract (middle right-hand panel), and a 6-month-old-like vocal tract (lower panel).

Data Analysis

To achieve the first objective, the perceptual scores of the two sets of stimuli (articulatory and acoustic) representing the 12 American English vowels were analyzed. For each stimulus, the vowel category perceived by the majority of the listeners (modal response) was determined. A first analysis was performed on only the articulatory prototypes, to compare the articulatory and perceptual dimensions. In a second analysis, acoustic prototypes were analyzed, in order to evaluate the variation in perceptual scores related to stable relative acoustic positions within the MVS throughout growth.

Our second objective was attained by an analysis of the perceived vowels in the acoustic space. For each stimulus, formant and F0 values in hertz were first converted into a Bark scale, using the conversion formula proposed by Traunmüller (1990): F_Bark = 26.81 / [1 + (1960 / F_Hz)] − 0.53. In this analysis, stimuli (articulatory and acoustic prototypes) were grouped according to their dominantly perceived vowel category (i.e., the category that triggered at least 50% agreement among listeners). Perceived dispersion ellipses plotted in the formant space for each growth stage were superimposed on the corresponding MVS in order to select the subset of articulatory–acoustic configurations enclosed within each dispersion ellipse. In this subset of configurations, linear regression analyses were carried out between various articulators to assess the influence of vocal tract anatomy on the degree of interarticulator coupling.
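The hertz-to-Bark conversion quoted above is simple to express in code. The sketch below implements the formula as given in the text and applies it to the two extreme F0 values used for the stimuli (436 Hz and 112 Hz); the function name is an assumption made for illustration.

```python
def hz_to_bark(frequency_hz):
    """Traunmüller (1990) conversion, as quoted in the text."""
    return 26.81 / (1.0 + 1960.0 / frequency_hz) - 0.53


# At 1960 Hz the fraction equals 26.81 / 2, which gives 12.875 Bark.
reference_bark = hz_to_bark(1960.0)

# F0 values of the 6-month-old and 21-year-old vocal tracts (from the text).
infant_f0_bark = hz_to_bark(436.0)
adult_f0_bark = hz_to_bark(112.0)

print(round(reference_bark, 3), round(infant_f0_bark, 3), round(adult_f0_bark, 3))
```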

Recall that the stimuli used in this experiment did not include the durational variation that characterizes the American English tense/lax distinction (such as /æ/ vs. /e/, for instance) nor the formant variation typical of some vowels. Consequently, it is expected that some stimuli will be less intelligible. However, our goal was primarily to compare the perception of specific articulatory settings and their acoustic consequences in various vocal tract lengths and morphologies. The experiment was also designed to be as similar as possible to the Ménard et al. (2004) experiment on French. Thus, the same corpus was used here.

Results

To assess the influence of nonuniform vocal tract growth on the production of American English vowels, the modal responses of the perceptual test were compared for the two sets of stimuli (articulatory prototypes and acoustic prototypes). In a second step, dominantly perceived vowels were selected and used to draw dispersion ellipses, representing acoustic areas related to a given perceived vowel target. Finally, underlying articulatory positions related to those perceived dispersion ellipses for the infant vocal tract and the adult vocal tract were compared because those two stages represent the limits of the growing vocal tract.

Perception of Articulatory Prototypes Related to American English Vowels

The identification scores of articulatory prototypes related to the 12 American English vowels were examined for the five growth stages. Recall that the articulatory prototypes are the 38 stimuli generated in each vocal tract with similar underlying articulatory positions throughout growth (the adult’s articulatory settings). The perceptual confusion matrix for this subset of stimuli is depicted in Table 1. In this table, the number of perceived English vowels is depicted for each articulatory prototype generated in a 6-month-old, a 4-year-old, a 10-year-old, a 16-year-old, and an adult male vocal tract. For each produced stimulus, the perceived vowel associated with the highest perceptual score (modal response) is in bold. Cases for which the highest score for a given produced prototype corresponds to the same vowel category for all five growth stages are shaded.

It is clear from Table 1 that nonuniform vocal tract growth has a strong effect on the perception of a given articulatory position. Indeed, for all vowels but three (/i/, /æ/, and /u/), the category that received the maximal number of responses differed for the five growth stages. For the vowels /ɪ e ɛ o ʊ/, the modal perceived categories vary according to height: /ɪ/ was mainly perceived as /ɛ/ in the infant vocal tract; /e/ was mainly perceived as /ɪ/ or /ɛ/ in all five vocal tract lengths; /ɛ/ was mainly perceived as /æ/ in the infant vocal tract and /ɛ/ in the older vocal tracts; /o/ was mainly perceived as /ʊ/ in the 6-month-old vocal tract and /o/ in older vocal tracts; and /ʊ/’s modal response was /ʊ/ or /u/ in the 6-month-old and the adult vocal tracts. The vowels /ɑ ɔ ʌ ɚ/ were, however, more variable in their modal response among the five vocal tracts. It should be noted that, for the majority of the stimuli considered here, the perceived vowel category that received the highest score is similar for the 4-year-old, 10-year-old, 16-year-old, and adult vocal tracts but differs from the 6-month-old vocal tract.

Thus, overall, similar articulatory configurations in the infant and adult male vocal tracts may be associated with different perceived vocalic targets. Of importance, the total number of perceived /æ/ vowels was 138 for the data synthesized in the 6-month-old vocal tract and 15 for the data synthesized in the adult male vocal tract. Thus, a substantial number of articulatory positions perceived as /æ/ in the infant vocal tract would no longer be perceived as such after vocal tract growth. Because similar articulatory maneuvers are perceived as belonging to different vowel categories in the 6-month-old and the adult, it is likely that the articulatory-to-perceptual maps present at the earliest stage of vocalization are modified on the basis of the anatomical transformations that occur during growth. This hypothesis is explored later in the article.

Perception of Acoustic Prototypes Related to American English Vowels

The fact that stable articulatory configurations generated in different vocal tract lengths are not associated with similar perceived vowel categories raises the question of the invariance domain throughout nonuniform vocal tract growth. To determine whether stable relative acoustic positions within the MVS in various vocal tract lengths are associated with similar perceived vowel categories, perceptual scores related to the second set of stimuli, namely the acoustic prototypes, were studied. The perceptual confusion matrix for this subset of stimuli is depicted in Table 2. As was the case for Table 1, in Table 2 the number of perceived English vowels is depicted for each acoustic prototype generated in a 6-month-old, a 4-year-old, a 10-year-old, a 16-year-old, and an adult male vocal tract. For each produced stimulus, the perceived vowel associated with the highest perceptual score (modal response) is in bold. Cases for which the highest score for a given produced prototype corresponds to the same vowel category for all five growth stages are shaded.


As Table 2 shows, the pattern of variation of perceptual scores for acoustic prototypes displays similar trends to the one depicted in Table 1 for articulatory prototypes. The three stimuli /i æ u/ were associated with high perceptual scores for perceived /i æ u/, and those modal responses remained similar for all vocal tract lengths. As for the other vowels, /e ɛ ʊ/ are mainly perceived as vowels that vary according to the height degree throughout vocal tract growth: /e/ is mainly perceived as /e/ or /i/; /ɛ/ is associated with /æ/ or /ɛ/; and /ʊ/ is mainly perceived as /u/ or /ʊ/. Contrary to the articulatory prototype /ɪ/, the acoustic prototype /ɪ/ is perceived as /ʊ/ in the 6-month-old vocal tract, in other words, as a vowel that contrasts with /ɪ/ along the place of articulation feature. Table 2 also reveals that /ɑ/ is perceived as /æ/ in the youngest vocal tract, /l/ in the 4-year-old vocal tract, and /o/ in the older vocal tracts.

Together, these results show that a given perceived vowel category is not necessarily associated with the same relative acoustic location within the MVS over the course of nonuniform vocal tract growth.
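As an illustrative sketch (not part of the original analysis), the modal responses discussed above can be recomputed directly from the response counts in Table 2. The Python snippet below uses the rows for the acoustic prototypes /ɑ/ and /æ/. The function names and the tie-breaking rule (the first maximum wins) are assumptions made for this example.

```python
# Column order of the perceived vowels in Tables 1 and 2.
VOWELS = ["i", "ɪ", "e", "ɛ", "æ", "ɑ", "ɔ", "o", "ʊ", "u", "ʌ", "ɚ"]

# Response counts (37 listeners per row) for two acoustic prototypes in Table 2.
ROWS_A = {  # produced /ɑ/
    "6 m.o.":  [0, 0, 1, 0, 12, 5, 9, 0, 2, 0, 6, 2],
    "4 y.o.":  [0, 0, 0, 0, 2, 8, 12, 9, 1, 0, 4, 1],
    "10 y.o.": [0, 0, 0, 0, 0, 7, 10, 16, 0, 0, 4, 0],
    "16 y.o.": [0, 0, 0, 1, 1, 5, 12, 16, 2, 0, 0, 0],
    "Adult":   [0, 0, 0, 0, 0, 11, 3, 20, 1, 1, 1, 0],
}
ROWS_AE = {  # produced /æ/
    "6 m.o.":  [0, 1, 0, 1, 24, 0, 10, 0, 0, 0, 1, 0],
    "4 y.o.":  [0, 0, 2, 1, 27, 0, 7, 0, 0, 0, 0, 0],
    "10 y.o.": [0, 0, 1, 1, 31, 0, 3, 0, 0, 0, 1, 0],
    "16 y.o.": [0, 1, 0, 0, 28, 0, 8, 0, 0, 0, 0, 0],
    "Adult":   [0, 0, 2, 2, 20, 0, 4, 0, 0, 0, 9, 0],
}


def modal_response(counts):
    """Return the perceived vowel with the highest count (first one on ties)."""
    best = max(range(len(counts)), key=lambda k: counts[k])
    return VOWELS[best]


def is_stable(rows):
    """True when every growth stage shares the same modal response."""
    return len({modal_response(c) for c in rows.values()}) == 1


modes = {stage: modal_response(c) for stage, c in ROWS_A.items()}
print(modes)               # /ɑ/: æ at 6 months, ɔ at 4 years, o afterwards
print(is_stable(ROWS_A))   # False
print(is_stable(ROWS_AE))  # True
```

Under these assumptions, the /æ/ prototype keeps the same modal response at all five growth stages, whereas the /ɑ/ prototype does not, which matches the description of Table 2 given above.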

Dispersion Ellipses for Perceived Vowel Categories

The results presented so far have shown that the perception of a given articulatory setting, considered to be an

Table 1. Confusion matrix of perceived articulatory prototypes associated with American English vowels generated in newborn- and adultlike vocal tracts.

Produced  Growth    Perceived stimuli
stimuli   stage        i   ɪ   e   ɛ   æ   ɑ   ɔ   o   ʊ   u   ʌ   ɚ
i         6 m.o.      27   1   2   3   0   0   0   0   2   2   0   0
          4 y.o.      33   1   1   1   0   0   0   0   0   1   0   0
          10 y.o.     36   0   1   0   0   0   0   0   0   0   0   0
          16 y.o.     33   1   3   0   0   0   0   0   0   0   0   0
          Adult       31   0   4   2   0   0   0   0   0   0   0   0
ɪ         6 m.o.       0  11   3  13   0   1   0   0   1   1   5   2
          4 y.o.      31   2   3   0   0   0   0   0   1   0   0   0
          10 y.o.     27   5   0   5   0   0   0   0   0   0   0   0
          16 y.o.     28   0   4   2   0   1   0   0   1   0   0   1
          Adult       32   0   1   2   0   0   0   1   1   0   0   0
e         6 m.o.       0   1   5  21   6   1   0   2   0   0   0   1
          4 y.o.       1  15  10  11   0   0   0   0   0   0   0   0
          10 y.o.      0  18   7  11   0   0   0   1   0   0   0   0
          16 y.o.      2  12   9  13   0   0   0   1   0   0   0   0
          Adult        3  11   9  13   0   0   0   1   0   0   0   0
ɛ         6 m.o.       0   0   2   2  31   0   2   0   0   0   0   0
          4 y.o.       0   0   7  20   5   0   0   3   0   0   1   1
          10 y.o.      0   0   3  24   5   0   0   0   0   0   3   2
          16 y.o.      0   1   8  25   1   0   0   0   0   0   0   2
          Adult        0   1   5  25   1   1   1   0   0   1   2   0
æ         6 m.o.       0   0   2   1  30   1   2   0   1   0   0   0
          4 y.o.       0   0   0   5  23   1   3   0   0   0   5   0
          10 y.o.      0   0   1   2  22   2   6   0   0   0   4   0
          16 y.o.      0   0   1   1  25   0   6   2   0   0   1   1
          Adult        0   0   2   2  20   0   4   0   0   0   9   0
ɑ         6 m.o.       0   0   1   4   3   4   1   3   6   1  11   3
          4 y.o.       0   0   0   0   2   7   0  17   2   1   8   0
          10 y.o.      0   0   0   0   0   7   2  19   3   1   5   0
          16 y.o.      0   0   0   0   0   8   3  22   2   1   1   0
          Adult        0   0   0   0   0  11   3  20   1   1   1   0
ɔ         6 m.o.       0   0   1   2  31   0   2   0   1   0   0   0
          4 y.o.       0   0   1   0   7   6  22   0   0   0   1   0
          10 y.o.      0   0   0   0   4   6  26   1   0   0   0   0
          16 y.o.      0   0   0   0   4   4  29   0   0   0   0   0
          Adult        0   0   0   0   3   5  29   0   0   0   0   0
o         6 m.o.       0   1   1   0   0   0   1   2  15   3  12   2
          4 y.o.       0   0   0   0   0   6   0   9   7   7   5   3
          10 y.o.      0   0   0   0   0   8   1  25   1   1   1   0
          16 y.o.      0   0   0   0   0   4   0  29   2   2   0   0
          Adult        0   0   1   0   0   8   1  26   1   0   0   0
ʊ         6 m.o.       0   0   0   0   0   0   0   2  17  17   1   0
          4 y.o.       0   0   0   0   0   0   0   2  14  19   2   0
          10 y.o.      0   0   0   0   0   0   0   2  15  19   0   1
          16 y.o.      0   0   0   0   0   1   0   4  14  16   1   1
          Adult        0   0   0   0   0   0   0   1  20  16   0   0
u         6 m.o.       0   0   0   0   0   2   0   1  14  20   0   0
          4 y.o.       0   0   0   0   0   0   0   2  13  21   1   0
          10 y.o.      0   0   0   0   0   0   1   2  16  18   0   0
          16 y.o.      0   0   0   0   0   0   0   1  14  22   0   0
          Adult        0   0   0   0   0   0   0   2   9  25   1   0
ʌ         6 m.o.       0   0   0   0  18   2  17   0   0   0   0   0
          4 y.o.       0   0   0   0   4   6  19   1   0   0   7   0
          10 y.o.      0   0   0   0   8   4  10   0   0   0  14   1
          16 y.o.      0   0   0   0   0   2  10   3   0   0  20   2
          Adult        0   0   0   3   2   1   1   3   1   0  21   5
ɚ         6 m.o.       0   0   0   1  17   4  11   1   0   0   3   0
          4 y.o.       0   0   0   0   2   2   0  10   7   0  12   4
          10 y.o.      0   0   0   2   2   1   0   1   2   0  18  11
          16 y.o.      0   0   2   3   1   0   1   1   3   1   8  17
          Adult        0   0   0   3   1   0   0   0   3   0   2  28

Note. Maximal scores are in bold; shaded areas represent cases in which maximal scores for all five growth stages correspond to the same vowel. m.o. = months old; y.o. = years old.

Ménard et al.: Perceptual Categorization of English Vowels 1275

Table 2. Confusion matrix of perceived acoustic prototypes associated with American English vowels generated in newborn- and adultlike vocal tracts.

Produced  Growth    Perceived stimuli
stimuli   stage        i   ɪ   e   ɛ   æ   ɑ   ɔ   o   ʊ   u   ʌ   ɚ
i         6 m.o.      25   1   3   1   0   0   0   0   2   5   0   0
          4 y.o.      34   0   2   0   0   0   0   0   0   0   0   1
          10 y.o.     35   0   2   0   0   0   0   0   0   0   0   0
          16 y.o.     35   0   1   1   0   0   0   0   0   0   0   0
          Adult       31   0   4   2   0   0   0   0   0   0   0   0
ɪ         6 m.o.       5   5   2   6   0   0   1   3   7   2   5   1
          4 y.o.      24   7   3   3   0   0   0   0   0   0   0   0
          10 y.o.     22  10   2   3   0   0   0   0   0   0   0   0
          16 y.o.     27   3   5   2   0   0   0   0   0   0   0   0
          Adult       32   0   1   2   0   0   0   1   1   0   0   0
e         6 m.o.       0   5   5  17   0   0   1   1   1   0   5   2
          4 y.o.       0  15   6  15   0   0   1   0   0   0   0   0
          10 y.o.      2  13  10  11   1   0   0   0   0   0   0   0
          16 y.o.      2   6  10  18   0   0   0   0   0   0   0   1
          Adult        3  11   9  13   0   0   0   1   0   0   0   0
ɛ         6 m.o.       0   0   3   8  21   1   4   0   0   0   0   0
          4 y.o.       0   1   5  22   7   0   0   0   0   0   1   1
          10 y.o.      0   0   7  26   4   0   0   0   0   0   0   0
          16 y.o.      0   0   4  30   2   0   0   0   0   0   0   1
          Adult        0   1   5  25   1   1   1   0   0   1   2   0
æ         6 m.o.       0   1   0   1  24   0  10   0   0   0   1   0
          4 y.o.       0   0   2   1  27   0   7   0   0   0   0   0
          10 y.o.      0   0   1   1  31   0   3   0   0   0   1   0
          16 y.o.      0   1   0   0  28   0   8   0   0   0   0   0
          Adult        0   0   2   2  20   0   4   0   0   0   9   0
ɑ         6 m.o.       0   0   1   0  12   5   9   0   2   0   6   2
          4 y.o.       0   0   0   0   2   8  12   9   1   0   4   1
          10 y.o.      0   0   0   0   0   7  10  16   0   0   4   0
          16 y.o.      0   0   0   1   1   5  12  16   2   0   0   0
          Adult        0   0   0   0   0  11   3  20   1   1   1   0
ɔ         6 m.o.       0   0   1   1  31   2   2   0   0   0   0   0
          4 y.o.       0   0   0   0   7   5  25   0   0   0   0   0
          10 y.o.      0   1   1   0   8   5  22   0   0   0   0   0
          16 y.o.      0   0   0   0   4   6  27   0   0   0   0   0
          Adult        0   0   0   0   3   5  29   0   0   0   0   0
o         6 m.o.       0   0   1   0   2   5   5   4   2   0  16   2
          4 y.o.       0   0   0   0   0   8   1  23   1   2   2   0
          10 y.o.      0   0   0   0   0   5   1  27   3   1   0   0
          16 y.o.      0   0   0   0   0   6   1  26   3   0   1   0
          Adult        0   0   1   0   0   8   1  26   1   0   0   0
ʊ         6 m.o.       0   0   0   0   0   1   0   1  16  18   1   0
          4 y.o.       0   0   0   0   0   0   0   0  14  23   0   0
          10 y.o.      0   0   0   0   0   0   0   2  14  21   0   0
          16 y.o.      0   0   0   0   0   0   0   3  10  24   0   0
          Adult        0   0   0   0   0   0   0   1  20  16   0   0
u         6 m.o.       0   0   0   0   0   0   0   0  16  20   1   0
          4 y.o.       0   0   0   0   0   1   0   0  17  18   1   0
          10 y.o.      0   0   0   0   0   0   0   0  17  20   0   0
          16 y.o.      0   0   0   0   0   0   0   5  12  19   1   0
          Adult        0   0   0   0   0   0   0   2   9  25   1   0
ʌ         6 m.o.       0   0   1   8  25   0   2   0   1   0   0   0
          4 y.o.       0   0   1   1   5   3   5   4   0   0  16   2
          10 y.o.      0   0   0   0   4   2   3   3   0   0  23   2
          16 y.o.      1   0   0   0   6   1   7   1   0   0  17   4
          Adult        0   0   0   3   2   1   1   3   1   0  21   5
ɚ         6 m.o.       0   0   7  22   5   1   2   0   0   0   0   0
          4 y.o.       0   2   3   7   2   0   1   3   5   1   6   7
          10 y.o.      0   0   3   9   0   0   0   1   0   0   4  20
          16 y.o.      0   0   1   4   0   1   2   2   2   0   3  22
          Adult        0   0   0   3   1   0   0   0   3   0   2  28

Note. Maximal scores are in bold; shaded areas represent cases where maximal scores for all five growth stages correspond to the same vowel.


articulatory canonical position for a given American English vowel, varies depending on the speaker’s vocal tract length and morphology. The same pattern is observed when the canonical position for the vowels is considered to be a stable relative acoustic location within the MVS. To further investigate the relationships between the articulatory, acoustic, and perceptual domains, the full set of stimuli (articulatory and acoustic prototypes) was analyzed, following a method previously applied in Ménard et al. (2004). Dispersion ellipses for the dominantly perceived vowels were determined. Recall that a dominantly perceived vowel is the vowel category chosen by at least 50% of the listeners. First, all stimuli were labeled according to their dominantly perceived vowel category. A dispersion ellipse is defined here as the area, in the F1 versus F2 acoustic space, enclosed by the ellipse plotted with a radius of ±2 SDs around the mean F1 and F2 values of all stimuli dominantly perceived as a given vowel category.
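The two definitions above (dominant perception and the ±2 SD dispersion ellipse) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of Ménard et al. (2004): the ellipse is taken to be axis-aligned in the F1/F2 plane, the sample standard deviation is used, and the formant values are invented for the example. Only the 50% criterion, the 37 listeners, and the ±2 SD extent come from the text; the two response rows are taken from Table 2.

```python
from statistics import mean, stdev

N_LISTENERS = 37  # listeners in the categorization experiment
VOWELS = ["i", "ɪ", "e", "ɛ", "æ", "ɑ", "ɔ", "o", "ʊ", "u", "ʌ", "ɚ"]


def dominant_vowel(counts, vowels=VOWELS, n_listeners=N_LISTENERS):
    """Return the vowel chosen by at least 50% of listeners, or None."""
    for vowel, count in zip(vowels, counts):
        if 2 * count >= n_listeners:
            return vowel
    return None


def dispersion_ellipse(points, n_sd=2.0):
    """Centre and semi-axes of an axis-aligned ellipse around (F1, F2) points.

    Assumption: semi-axes are n_sd sample standard deviations along F1 and F2.
    """
    f1 = [p[0] for p in points]
    f2 = [p[1] for p in points]
    centre = (mean(f1), mean(f2))
    semi_axes = (n_sd * stdev(f1), n_sd * stdev(f2))
    return centre, semi_axes


# Table 2, 6-month-old vocal tract: acoustic prototypes /i/ and /ɪ/.
print(dominant_vowel([25, 1, 3, 1, 0, 0, 0, 0, 2, 5, 0, 0]))  # i
print(dominant_vowel([5, 5, 2, 6, 0, 0, 1, 3, 7, 2, 5, 1]))   # None

# Hypothetical (F1, F2) values in Hz for stimuli dominantly perceived as one vowel.
stimuli = [(300, 2900), (320, 3100), (340, 3000)]
centre, semi_axes = dispersion_ellipse(stimuli)
print(centre, semi_axes)
```

In this sketch, a stimulus with no response category reaching 19 of 37 listeners receives no label and therefore contributes to no ellipse.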

Figure 4 depicts the dispersion ellipses of each perceived vowel category (of dominantly perceived vowels) in the F1 versus F2 space for each growth stage. Data are presented separately for the stimuli synthesized in the 6-month-old, 4-year-old, 10-year-old, 16-year-old, and 21-year-old vocal tracts.

The three vowels /i u æ/ were dominantly perceived in the infantlike vocal tract, as well as in the childlike vocal tracts (4- and 10-year-old), the 16-year-old vocal tract, and the adult vocal tract. Indeed, in each of those vocal tracts, at least 50% of the listeners perceived one or more stimuli as /i/, /u/, or /æ/. Thus, the children’s relatively short pharyngeal cavity did not prevent listeners from perceiving perceptual targets related to the extreme positions of the vowel space.

Figure 4 also shows that some vowel categories were not dominantly perceived among the synthesized stimuli in vocal tracts representative of the five growth stages. For instance, only five vowel categories (/i ʊ u ɛ æ/) were perceived by at least 50% of the listeners in the infantlike vocal tract, whereas six to eight vowel categories (including /ɔ o ʌ/) were dominantly perceived at the older growth stages. This pattern may result from the acoustic differences between the less intelligible high-pitched voices associated with infants and the more intelligible low-pitched adult voices (Ménard et al., 2004). In addition, durational cues, which were not modeled in the present study, may be more important for vowel perception in infantlike vocal tracts than in older vocal tracts.

These findings also suggest that the acoustic vowel space is organized differently across growth stages. For these American English listeners, dispersion ellipses associated with the low vowel /æ/ and the high vowels /i ʊ/ enclose much broader acoustic areas in the simulated 6-month-old vocal tract than in the older vocal tracts. The areas related to the front vowels /i/, /ɛ/, and /æ/ gradually decrease with speaker age, suggesting a tight link between the size of these acoustic categories and vocal tract morphology. Conversely, the high tense vowel /u/ is related to a larger acoustic area for the adult male than for the younger vocal tracts. A similar pattern was observed with the French listeners (Ménard et al., 2004). Thus, perceived low, high front tense, and high back lax vowels are related to larger acoustic areas in a vocal tract in which the pharyngeal cavity is shorter than the oral cavity than in a vocal tract in which the pharyngeal cavity is longer than the oral cavity. These results suggest that some vowel targets are favored by the infant’s or child’s vocal tract configuration. Although vocal tract length and morphology do not prevent a young speaker from producing the three corner vowels /i u æ/, corresponding to the limits of the MVS, a relatively short pharyngeal cavity (typical of an infant or child) constrains the speaker’s vowel inventory such that larger areas of the vowel space are perceived as these three vowels.

Articulatory-to-Perceptual Relationships

According to perceptual criteria, these results indicate that the infant’s vowel space is organized differently from the adult male’s vowel space. This pattern may be the outcome of developmental differences in the underlying articulatory parameters related to a given perceived vowel. If so, articulatory–acoustic relationships would change during growth, leading to variations in articulatory–perceptual relationships as well. However, the data presented so far take into account only one underlying articulatory configuration for each perceived stimulus. Because many articulatory configurations may be exploited to reach a given acoustic and perceptual target (Atal, Chang, Mathews, & Tukey, 1978), the various articulatory strategies enclosed by the perceived dispersion ellipses were further explored in order to consider sensorimotor maps, defined here as relationships between the perceived dispersion ellipses (sensory domain) and the underlying articulatory configurations (motor domain) for a given speaker’s vocal tract configuration. This section focuses on the infantlike and adultlike vocal tracts.

Figure 4. Dispersion ellipses of vowels dominantly perceived by American English adult participants for an adultlike vocal tract (upper left-hand panel), a 16-year-old-like vocal tract (upper right-hand panel), a 10-year-old-like vocal tract (middle left-hand panel), a 4-year-old-like vocal tract (middle right-hand panel), and a 6-month-old-like vocal tract (lower panel). Articulatory prototypes and acoustic prototypes are included. (Note: In the online version of this article, this figure appears in color.)

Articulatory–perceptual maps. To further investigate the links between articulatory parameters and perceptual targets, the dispersion ellipses presented in Figure 4, which represent the dominantly perceived vowel categories for each growth stage, were used to select a subset of articulatory–acoustic configurations corresponding to the perceptual targets. Each ellipse displayed in Figure 4 was superimposed on the corresponding MVS, previously used to locate acoustic prototypes. The MVS data points enclosed in the ellipse were selected as belonging to the perceptual area for this ellipse (or perceived vowel category). Because the MVSs were generated with the VLAM articulatory-to-acoustic model, each data point was associated with a value for each of the seven articulatory parameters. For the sake of clarity, Figures 5, 6, and 7 present the range of articulatory values for four parameters (tongue body, tongue dorsum, jaw, and lip protrusion) for each of the three corner vowels /i/ (Figure 5), /u/ (Figure 6), and /æ/ (Figure 7). Data are presented for the infant and the adult male along the x-axis.
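The selection step described above can be sketched as follows. This is a hedged illustration only: the ellipse is assumed to be axis-aligned, and the data points, parameter names (`tongue_body`, `jaw`), and units are hypothetical stand-ins for MVS points generated with VLAM.

```python
def inside_ellipse(f1, f2, centre, semi_axes):
    """True when (f1, f2) lies in or on an axis-aligned ellipse."""
    (c1, c2), (a1, a2) = centre, semi_axes
    return ((f1 - c1) / a1) ** 2 + ((f2 - c2) / a2) ** 2 <= 1.0


def parameter_range(points, centre, semi_axes, name):
    """(min, max) of one articulatory parameter over the points inside the ellipse."""
    values = [p[name] for p in points
              if inside_ellipse(p["F1"], p["F2"], centre, semi_axes)]
    return (min(values), max(values)) if values else None


# Hypothetical MVS sample: formants in Hz, articulatory parameters in arbitrary units.
mvs_points = [
    {"F1": 310, "F2": 2950, "tongue_body": -1.5, "jaw": 0.5},
    {"F1": 330, "F2": 3080, "tongue_body": -2.0, "jaw": 1.0},
    {"F1": 350, "F2": 3150, "tongue_body": -0.5, "jaw": 0.0},  # falls outside
    {"F1": 600, "F2": 1800, "tongue_body": 1.0, "jaw": -1.0},  # falls outside
]

# Hypothetical ellipse for one perceived vowel: centre and semi-axes in Hz.
centre, semi_axes = (320.0, 3000.0), (40.0, 200.0)

print(parameter_range(mvs_points, centre, semi_axes, "tongue_body"))  # (-2.0, -1.5)
print(parameter_range(mvs_points, centre, semi_axes, "jaw"))          # (0.5, 1.0)
```

The ranges returned for each parameter correspond, in spirit, to the spans of articulatory values plotted per growth stage in Figures 5, 6, and 7.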

Figure 5 shows that, for perceived /i/, the tongue body (upper left-hand panel in Figure 5) is slightly more fronted in the 6-month-old vocal tract than in the adult tract. The tongue dorsum (upper right-hand panel in Figure 5) is lower. Those configurations are probably produced to compensate for the infant’s shorter pharyngeal cavity. No noticeable changes are observed concerning jaw position and lip protrusion (lower left-hand and right-hand panels in Figure 5). For perceived /u/ (Figure 6), the major difference between the infant and adult vocal tracts is also related to tongue position. For this vowel, a more fronted tongue body position is allowed in the infant vocal tract than in the adult tract (upper left-hand panel); most important, the tongue dorsum is higher than it is for the adult (upper right-hand panel). This important difference might contribute to the sparse occurrence of /u/ in early sound inventories. It should also be noted that for the infant, the jaw is slightly lower than for the adult male (lower left-hand panel). A visual inspection of Figure 7 suggests that the variation in the range of articulatory values associated with perceived /æ/ shows a similar pattern. For the infant vocal tract, the tongue body is more fronted than it is in the adult male vocal tract (upper left-hand panel). Apart from a slightly higher jaw position (lower left-hand panel), no other differences are noticeable.

Figure 5. Range of possible articulatory values in variable linear articulatory model (VLAM) where formant values are enclosed in the dispersion ellipse (in F1/F2) corresponding to perceived /i/. Data are presented for the tongue body (upper left-hand panel), tongue dorsum (Drsm; upper right-hand panel), jaw (lower left-hand panel), and lip protrusion (LipP; lower right-hand panel) parameters. Two growth stages are represented along the x-axis: newborn and adult male (21 years old).

Interarticulator coupling. To evaluate interarticulator relationships between jaw position and tongue body position, linear regression analyses were carried out on the parameter values for those articulators for the set of configurations enclosed by the three perceived vowels /i u æ/. Analyses were performed independently for each vowel and growth stage. Slope values and the R² correlation coefficient are presented in Table 3. High positive slope values reflect a high degree of correlation between the positions of both articulators. That is, as jaw height increases, the tongue body is positioned further back. As can be seen in Table 3, slope values between jaw height and tongue body position for perceived /i/, /u/, and /æ/ increase from the infant to the adult stage. Thus, interarticulator coupling becomes stronger, with tongue body position being more closely related to jaw height as the vocal tract grows. Furthermore, R² values increase for all three vowels, reflecting an increasing percentage of variance of tongue body variability that is explained by jaw height.
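For readers who wish to reproduce this kind of analysis, the slope and R² of a simple linear regression can be computed as in the sketch below. It assumes an ordinary least-squares fit with jaw height as predictor and tongue body position as outcome (one reading of the description above); the data values are invented and do not come from the VLAM simulations.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope, intercept, and R^2 for y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # share of variance in y explained by x
    return slope, intercept, r_squared


# Hypothetical parameter values for configurations inside one perceived-vowel ellipse.
jaw_height = [0.0, 1.0, 2.0, 3.0]
tongue_body = [0.0, 1.0, 1.0, 2.0]

slope, intercept, r_squared = linear_fit(jaw_height, tongue_body)
print(round(slope, 2), round(intercept, 2), round(r_squared, 2))  # 0.6 0.1 0.9
```

With a single predictor, R² equals the squared Pearson correlation between the two variables, which is why it can be read as the proportion of tongue body variance accounted for by jaw height.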

Discussion

Nonuniform Vocal Tract Growth and the Ability to Produce American English Vowels

The first objective of this experiment was to investigate the effects of nonuniform vocal tract growth on the production of the American English vowels /i ɪ e ɛ æ ɑ ɔ o ʊ u ʌ ɚ/. To disentangle the roles of vocal tract anatomy and motor control capacities, the available range of command parameters in the VLAM model was similar for 6-month-old, 4-year-old, 10-year-old, 16-year-old, and adult male vocal tracts. The results presented so far reveal that the articulatory prototypes related to the three corner vowels /i u æ/ are well perceived by the majority of the listeners when generated for the five growth stages. Thus, for those vowels, similar articulatory configurations are associated with the same perceived vowel category for the speaker throughout vocal tract growth. Changes in the relative length of the pharyngeal cavity do not prevent the speaker from reaching the auditory targets related to the corner vowels. A different pattern is, however, observed for the articulatory prototypes related to the remaining nine American English vowels. Indeed, for those vowels, the adult’s canonical articulatory configurations are associated with a different perceived vowel target for a newbornlike vocal tract compared with an older vocal tract.

Figure 6. Range of possible articulatory values in VLAM where formant values are enclosed in the dispersion ellipse (in F1/F2) corresponding to perceived /u/. Data are presented for the tongue body (upper left-hand panel), tongue dorsum (Drsm; upper right-hand panel), jaw (lower left-hand panel), and lip protrusion (LipP; lower right-hand panel) parameters. Two growth stages are represented along the x-axis: newborn and adult male (21 years old).

Figure 7. Range of possible articulatory values in VLAM where formant values are enclosed in the dispersion ellipse (in F1/F2) corresponding to perceived /æ/. Data are presented for the tongue body (upper left-hand panel), tongue dorsum (Drsm; upper right-hand panel), jaw (lower left-hand panel), and lip protrusion (LipP; lower right-hand panel) parameters. Two growth stages are represented along the x-axis: newborn and adult male (21 years old).

Table 3. Slopes and correlation coefficients from linear regression analysis between jaw and tongue body values for perceived /i/, /u/, and /æ/ vowels at 0 and 21 years.

Perceived vowel   Parameter   Age 0   Age 21
i                 Slope        .34     .52
                  R²           .18     .63
æ                 Slope        .49     .56
                  R²           .51     .70
u                 Slope        .26     .39
                  R²           .05     .27

Note. All correlations are significant at p < .05.


When considering acoustic prototypes, overall, a similar pattern is observed. When vowels are generated on the basis of a stable acoustic position within the MVS and different articulatory configurations, for all vowels but /i u æ/, the perceived vowel category related to the highest perceptual score differs depending on the speaker’s growth stage. Thus, a given vowel target does not necessarily involve the same acoustic and articulatory configuration during vocal tract growth.

Sensorimotor Maps for Infantlike and Adultlike Vocal Tracts

The second objective was to analyze the links between articulatory configurations and perceptual targets for the infant and adult male synthesized vocal tracts in order to investigate the evolution of sensorimotor maps. To do this, dispersion ellipses were plotted in the F1 versus F2 formant space (see Figure 4) for each of the five growth stages. Figure 4 confirms the stability of the corner vowels, in that it reveals that the three corner vowels /i u æ/ can be dominantly perceived by American English listeners among a set of synthesized five-formant vowels generated in a 6-month-old, a 4-year-old, a 10-year-old, a 16-year-old, and a 21-year-old vocal tract (adult male). However, Figure 4 also shows that the dispersion ellipses associated with the low vowel /æ/ and the high vowels /i ʊ/ enclose much broader acoustic areas in the simulated 6-month-old vocal tract than in the older vocal tracts. The areas related to the front vowels /i/, /ɛ/, and /æ/ gradually decrease with the speaker’s age, suggesting that younger speakers have more acoustic space to produce those vowels than do adult speakers. This result confirms our previous study on French (Ménard et al., 2004). Conversely, the high tense vowel /u/ is associated with a larger acoustic area for the adult male than for the younger vocal tracts. Those differences suggest that vocal tract anatomy does not prevent speakers from producing those vowels but nevertheless constrains their partition of the acoustic vowel space. The sparse occurrence of some vowels in infants’ early sound inventory may instead be attributed to immature motor control abilities (MacNeilage & Davis, 1990).

Further analysis of the underlying articulatory configurations related to the perceived dispersion ellipses revealed that the ranges of articulatory positions related to perceived /i u æ/ vary between the two growth stages. Some positions of the tongue, jaw, and lips associated with /i/, /u/, or /æ/ at a very young age are no longer perceived as belonging to those categories as the vocal tract grows (see Figures 5, 6, and 7). Thus, as growth occurs, reaching a perceptual vocalic target involves the gradual adaptation of articulatory positions to produce the appropriate ranges of articulatory strategies related to the acoustic goal. Vocal tract anatomy, therefore, constrains the sensorimotor maps internalized by the speaker during growth. The articulatory–perceptual relationships established early in life change with vocal tract growth.

This result has important implications for labeling systems of infant speech. Indeed, as we pointed out in the analysis of French (Ménard et al., 2004), the positions of the speech articulators cannot be inferred directly from a solely auditory-based or a solely acoustic-based labeling system. The latter assumes, based on adult production of a given perceived vowel, that the infant uses the same articulatory positions to produce this vowel.

The results of our modeling have also shown that interarticulator coupling increases with vocal tract growth and reconfiguration. Indeed, as shown by linear regression analyses conducted on jaw height and tongue body positions (see Table 3) for perceived /i/, /u/, and /æ/ in infantlike and adultlike vocal tracts, slope values and R² coefficients are lower in a 6-month-old vocal tract than in an adultlike vocal tract. Interarticulator coupling represents an important step in speech development. As suggested by Green et al. (2000), motor control development is characterized by three phases: (a) differentiation, (b) integration, and (c) refinement. Although the development of motor control abilities at the muscular and cognitive levels is undoubtedly crucial in reaching these stages (Cheng et al., 2007; Smith & Zelaznik, 2004), our results suggest that nonuniform vocal tract growth and reconfiguration also contribute to this motor integration. This is in line with similar results in adults. It has been suggested that in the adult vocal tract, anatomical structures such as the shape of the palate play an important role in determining the kinematic patterns of the tongue (Fuchs, Perrier, Geng, & Mooshammer, 2006). The present experiment showed that analogous anatomical constraints contribute to the development of interarticulator synergies in the child.

Limitations and Further Issues

Interestingly, Figure 4 revealed that some vowel categories were not perceived at all when produced in the infantlike and 4-year-old vocal tracts. It is possible that the 76 stimuli (including acoustic prototypes and articulatory prototypes) synthesized for these vocal tracts did not cover the whole acoustic space and that some perceptual regions associated with those vowel categories were not sampled by the stimuli. The fact that durational cues were not modeled also plays an important role in explaining the absence of some perceived vowel categories. It is well known that English vowels differ in their inherent length. This feature is crucial for the perceptual identification of those vowels. (However, see Hillenbrand, Clark, and Houde, 2000, for a discussion of the importance of durational cues versus spectral change cues.) In this experiment, vowels were synthesized in isolation, limiting the influence of durational cues. For instance, the high lax vowel /ʊ/, which is typically shorter than its tense counterpart /u/, was dominantly perceived in all vocal tracts. This methodological choice does not invalidate these results.

It is also worth noting that the stimuli used in the present study did not include variations in formant values over the course of the vowel (spectral change) either. Although it has been shown that the perception of American English vowels is strongly influenced by such changes (Assmann, Nearey, & Hogan, 1982), the goal of this study was not to assess the intelligibility of the American English vowels per se. Rather, it was to compare the perception of those vowels in various vocal tract lengths and morphologies using the Ménard et al. (2004) corpus. The lack of durational cues and spectral change likely contributes to the fact that some vowel categories are not perceived in the various vocal tracts.

As was also mentioned earlier, the infant’s low intelligibility (which led to a reduced number of dominantly perceived vowels for the 6-month-old vocal tract in this experiment) may be ascribed to the undersampled spectra inherent to high-pitched voices. One can raise the issue of the confounding role of F0 and formant variation because the set of stimuli generated in childlike and infantlike vocal tracts (with high formant frequencies) was also associated with increasing F0 values. This point was specifically addressed in Ménard et al. (2004), in which a similar corpus was used to determine the role of F0 and formant frequencies in auditory normalization of French vowels. In the latter study, stimuli were synthesized in vocal tracts representative of seven growth stages, and each of the stimuli was presented with seven different F0 values, ranging from 110 Hz to 450 Hz. The results showed that although high-pitched voices definitely reduce speech intelligibility, the lower number of dominantly perceived vowels (and, more specifically, the reduced number of vowel categories) in the 6-month-old vocal tract compared with the adult vocal tract cannot be ascribed to this parameter. Based on those results, the patterns observed in the present study for the infantlike vocal tract are related to factors other than undersampled spectra related to high-pitched voices.
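The notion of an undersampled spectrum can be made concrete with a small calculation. A periodic voice source has energy only at integer multiples of F0, so the vocal tract transfer function is sampled at intervals of F0. The sketch below counts the harmonics at or below an upper frequency limit for the lowest and highest F0 values used in Ménard et al. (2004); the 4000-Hz limit is an arbitrary assumption chosen for illustration.

```python
def harmonics_below(f0_hz, limit_hz):
    """Number of harmonics of f0_hz that fall at or below limit_hz."""
    return int(limit_hz // f0_hz)


LIMIT_HZ = 4000.0  # assumed upper edge of the formant region (illustrative only)

for f0 in (110.0, 450.0):  # lowest and highest F0 values cited in the text
    print(f0, harmonics_below(f0, LIMIT_HZ))
# 110.0 36
# 450.0 8
```

Under this assumption, the spectral envelope is sampled by 36 harmonics at 110 Hz but by only 8 at 450 Hz, which illustrates why formant peaks are harder to recover from high-pitched voices.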

In summary, the present experiment demonstrated that nonuniform changes in vocal tract length from birth to adulthood do not prevent the speakers from producing /i u æ/ vowels that will be correctly perceived by American English adult listeners. However, the perceptual partition of the vowel space changes as the speaker’s vocal tract length and morphology evolve. First, perceived low vowels are related to broader acoustic areas in infantlike vocal tracts than in adultlike vocal tracts. Second, interarticulator coupling between jaw height and tongue body positions increases with vocal tract growth and reconfiguration. Thus, producing a given perceived vowel category likely requires different articulatory strategies throughout a speaker’s lifetime.

Acknowledgments

This work was supported by the Social Sciences and Humanities Research Council of Canada, the Natural Sciences and Engineering Research Council of Canada, and the Fonds Québécois de Recherche sur la Société et la Culture. We are grateful to Jean-Luc Schwartz for his insightful comments and to Zofia Laubitz for copyediting the article.

References

Assmann, P., Nearey, T., & Hogan, J. (1982). Vowel identification: Orthographic, perceptual, and acoustic aspects. The Journal of the Acoustical Society of America, 71, 975–989.

Atal, B. S., Chang, J. J., Mathews, M. V., & Tukey, J. W. (1978). Inversion of articulatory-to-acoustic transformation in the vocal tract by a computer-sorting technique. The Journal of the Acoustical Society of America, 63, 1535–1555.

Badin, P., & Fant, G. (1984). Notes on vocal tract computation. Speech Transmission Laboratory–Quarterly Progress Status Report, 2–3, 53–108.

Beck, J. M. (1996). Organic variation of the vocal apparatus. In W. J. Hardcastle & J. Laver (Eds.), Handbook of phonetic sciences (pp. 256–297). Oxford, England: Blackwell.

Boë, L.-J., Gabioud, B., & Perrier, P. (1995). The SMIP: An interactive articulatory–acoustic software for speech production studies. Bulletin de la Communication Parlée, 3, 137–154.

Boë, L.-J., & Maeda, S. (1997). Modélisation de la croissance du conduit vocal. Espace vocalique des nouveaux-nés et des adultes. Conséquences pour l’ontogenèse et la phylogenèse [Modeling vocal tract growth. Vowel spaces for newborns and adults. Consequences on ontogenesis and phylogenesis]. Journées d’Études Linguistiques: La Voyelle dans Tous ses États, 98–105.

Boë, L.-J., Perrier, P., Guérin, B., & Schwartz, J. L. (1989). Maximal vowel space. Eurospeech, 89, 281–284.

Bothorel, A., Simon, P., Wioland, F., & Zerling, J.-P. (1986). Cinéradiographies des voyelles et consonnes du Français [Cineradiographic of vowels and consonants of French]. Strasbourg, France: Institut de Phonétique.

Buhr, R. D. (1980). The emergence of vowels in an infant. Journal of Speech and Hearing Research, 23, 73–94.

Callan, D. E., Kent, R. D., Guenther, F. H., & Vorperian, H. K. (2000). An auditory-feedback-based neural network model of speech production that is robust to developmental changes in the size and shape of the articulatory system. Journal of Speech, Language, and Hearing Research, 43, 721–736.

Cheng, H. Y., Murdoch, B. E., Goozee, J. V., & Scott, D. (2007). Physiologic development of tongue–jaw coordination from childhood to adulthood. Journal of Speech, Language, and Hearing Research, 50, 352–360.

Davis, B. L., & MacNeilage, P. F. (1995). The articulatory basis of babbling. Journal of Speech, Language, and Hearing Research, 38, 1199–1211.

Fant, G., Liljencrants, J., & Lin, Q. (1985). A four-parameter model of glottal flow. KTH, Speech Transmission Laboratory Quarterly Progress and Status Report, 4, 1–13.


Feng, G. (1983). Vers une synthèse par la méthode des pôles et des zéros [Toward a synthesis method of using poles and zeroes]. Actes des Journées d’Étude sur la Parole, GCP–GALF, 155–157.

Fitch, W. T., & Giedd, J. (1999). Morphology and development of the human vocal tract: A study using magnetic resonance imaging. The Journal of the Acoustical Society of America, 106, 1511–1522.

Fuchs, S., Perrier, P., Geng, C., & Mooshammer, C. (2006). What role does the palate play in speech motor control? Insights from tongue kinematics for German alveolar obstruents. In J. Harrington & M. Tabain (Eds.), Speech production: Models, phonetic processes and techniques (pp. 149–164). New York: Psychology Press.

Goldstein, U. G. (1980). An articulatory model for the vocal tract of growing children. Unpublished doctoral dissertation, MIT, Cambridge, Massachusetts.

Green, J. R., Moore, C. A., Higashikawa, M., & Steeve, R. W. (2000). The physiologic development of speech motor control: Lip and jaw coordination. Journal of Speech, Language, and Hearing Research, 43, 239–255.

Green, J. R., Moore, C. A., & Reilly, K. (2002). The sequential development of jaw and lip control for speech. Journal of Speech, Language, and Hearing Research, 45, 66–79.

Heinz, J. M., & Stevens, K. N. (1965). On the relations between lateral cineradiographs, area functions, and acoustic spectra of speech. Proceedings of the 5th International Congress of Acoustics, Paper A44.

Hillenbrand, J. M., Clark, M. J., & Houde, R. A. (2000). Some effects of duration on vowel recognition. The Journal of the Acoustical Society of America, 108, 3013–3022.

Kent, R. D. (1976). Anatomical and neuromuscular maturation of the speech mechanism: Evidence from acoustic studies. Journal of Speech and Hearing Research, 19, 421–447.

Kuhl, P. K., & Meltzoff, A. N. (1982, December 10). The bimodal perception of speech in infancy. Science, 218, 1138–1141.

MacNeilage, P. F., & Davis, B. L. (1990). Acquisition of speech production: Frames then content. In M. Jeannerod (Ed.), Attention and performance XIII: Motor representation and control (pp. 453–475). Hillsdale, NJ: Erlbaum.

Maddieson, I. (1986). Patterns of sounds (2nd ed.). Cambridge, England: Cambridge University Press.

Maeda, S. (1979). An articulatory model of the tongue based on a statistical analysis. The Journal of the Acoustical Society of America, 65, S22.

Maeda, S. (1990). Compensatory articulation during speech: Evidence from the analysis and synthesis of vocal-tract shapes using an articulatory model. In W. J. Hardcastle & A. Marchal (Eds.), Speech production and speech modelling (pp. 131–149). Dordrecht, The Netherlands: Kluwer Academic.

Ménard, L., Schwartz, J.-L., & Boë, L.-J. (2004). Role of vocal tract morphology in speech development: Perceptual targets and sensorimotor maps for synthesized French vowels from birth to adulthood. Journal of Speech, Language, and Hearing Research, 47, 1059–1080.

Schwartz, J.-L., Boë, L.-J., Vallée, N., & Abry, C. (1997). The dispersion–focalization theory of vowel systems. Journal of Phonetics, 25, 255–286.

Smith, A., & Zelaznik, H. N. (2004). Development of functional synergies for speech motor coordination in childhood and adolescence. Developmental Psychobiology, 45, 22–33.

Traunmüller, H. (1990). A note on hidden factors in vowel perception experiments. The Journal of the Acoustical Society of America, 88, 2015–2019.

Vallée, N. (1994). Systèmes vocaliques: De la typologie aux prédictions [Vowel systems: From typology to prediction]. Unpublished doctoral dissertation, Université Stendhal/Institut de la Communication Parlée, Grenoble, France.

Vilain, A., Abry, C., Badin, P., & Brosda, S. (1999). From idiosyncratic pure frames to variegated babbling: Evidence from articulatory modeling. In J. J. Ohala, Y. Hasegawa, M. Ohala, D. Granville, & A. C. Bailey (Eds.), Proceedings of the 14th International Congress of Phonetic Sciences, San Francisco (pp. 2497–2500). Berkeley: University of California.

Vorperian, H. K., Kent, R. D., Lindstrom, M. J., Kalina, C. M., Gentry, L. R., & Yandell, B. S. (2005). Development of vocal tract length during early childhood: A magnetic resonance imaging study. The Journal of the Acoustical Society of America, 117, 338–350.

Walsh, B., & Smith, A. (2002). Articulatory movements in adolescents: Evidence for protracted development of speech motor control processes. Journal of Speech, Language, and Hearing Research, 45, 1119–1133.

Received January 9, 2008

Accepted December 10, 2008

DOI: 10.1044/1092-4388(2009/08-0008)

Contact author: Lucie Ménard, Département de Linguistique et Didactique des Langues, Université du Québec à Montréal, Case Postale 8888, Succursale Centre-Ville, Montreal, Quebec H3C 3P8, Canada. E-mail: [email protected].
