
Lagos Papers in English Studies Vol. 1: 1-19 (2007)

July - 2007 1

ALLOPHONE RECOGNITION FOR L2 LEARNERS:

PATTERN MATCHING OR RULE-BASED EFFECT?

CHRISTOPH HAASE

CHEMNITZ UNIVERSITY OF TECHNOLOGY

Contributing to the discussion of learner abilities in a second language, we try to consolidate a notion of interlanguage that holds not only from an acquisition point of view but also typologically, rendering interlanguage a "proper" language in certain areas. To this end, allophonic variation between aspirated and unaspirated stop consonants was studied for its effect on the perception of highly competent L2 learners. Learners were expected to show inhibited performance due to a lack of discriminatory competence. This lack is not acquired but is a feature of the interlanguage, which therefore shows typologically plausible gaps in the stop system. The study offers insight into the learner's phonological model of a target language and makes some predictions about the extent and limitations of their auditory performance.

1. Introduction

The acquisition of phonological subtleties in a second language is subject to a variety of influences, among them rapidity (speaking rate), phoneme separation and noise, with separation being in part a function of rapidity. The listener needs time in order to process the phonetic signal. This timespan can be considered to depend on competence, a parameter that merits further investigation when correlated with learner performance. Another parameter of the complexity of the speech signal is coarticulation (e.g. Miller, 2001): any phoneme carries information beyond its own segment; cf. the /k/ in /ki.p/ (keep), which differs from the one in /ku.l/ (cool). As can be consistently observed, second language learners simplify in articulation, and their audition is likewise comparatively impaired. It takes time until learners start to "hear" differences and to tune and successively readjust their phonological competence to levels beyond mere intelligibility. Although the production of velar stops by L2 learners of English with German as a first language can be considered straightforward and unproblematic as far as intelligibility is concerned, usage conforming to the rules of English phonology is not. Together with [voice], the second laryngeal feature, [aspiration], has a distinctive function in the respective language (Brown 1998: 141). A contoid (i.e. a consonant in its phonetic characterization) can be aspirated in that the voiceless contoid is followed by a voiceless lenis laryngeal approximant. In English aspiration is phonetic for single words: pin /'p.n/ ['p..n]. This is also the case with German zehn (“ten”) (/'tse.n/ ['t.se.n]) but not in Mandarin



Chinese, where aspiration is phonological (Canepari 2005: 203), and not in Cuzco Quechua, where aspiration is not only phonological but also part of a three-way distinctive laryngeal node feature with aspirated, unaspirated and glottalized stops (Parker/Weber 1996: 75). The Quechua word tanta therefore has three lexical entries: /tanta / [tanta] “collection”, /t…þanta/ [t…þanta] “bread” and /t.anta/ [t.anta] “old, used up” (ibid.). Further, English and German differ from other languages that also have preaspirated stops in the way they employ laryngeal contrasts in connected speech. In Russian and Hungarian, for example, this can be captured as a two-way [voice] contrast, in German it is [spread] and in English (together with Swedish and Russian) it is [voice] and [spread], cf. Petrova et al. (2006). This contrast is noticeable to a lesser extent in English as a second language. What is observable is a certain carelessness on the part of advanced second language speakers and a subsequent difficulty in imitating and applying the contrast. In this study, the tables are turned for the speakers insofar as only the perceptual side is considered. Subjects were asked to deduce the rule that governs aspiration in velar stops, for which English has a richer system than German or French. The experiment follows suggestions made in a landmark study by Jaeger (cf. Jaeger 1980), revisited by Ohala (Ohala 1982: 237), and considers more recent results by Whalen/Best/Irwin (1997), Best/McRoberts (2003) and Kingston (2003). In the final part, we sketch a perspective on the phonological acquisition of second language learners and compare second language parameters with results obtained within Best's Perceptual Assimilation Model (PAM; Best 1994). Aspiration has been selected as a parameter because it provides a clear-cut binary feature. There is one exception to this: when stress is also considered, aspiration is strongest in syllable-initial positions of stressed syllables (Giegerich 1992: 219).
This was observed early on; cf. Trubetzkoy, who describes how a one-dimensional opposition between phonemes of the same obstructional level yields particular correlations (1989[1939]: 138). Although non-distinctive in English as well as in German, the opposition provides a testing ground for the auditory competence of advanced L2 speakers of English. Although e.g. [p] and [p.] do not form minimal pairs and do not occur in the same environment (Hyman 1975: 62), English is particularly rule-governed when it comes to aspiration, a circumstance this study makes use of.

2. Aspiration: The phonological and typological perspective

Aspiration requires more energy when it is produced to accompany a stop sound, as it involves a somewhat stronger puff or burst of air. The articulatory background can be found in the feature [-spread glottis] for unaspirated stops (e.g. Vaux 1998: 497) and, respectively, [+spread glottis] for aspirated stops. In simplified terms, the burst of air can therefore be explained as a glottal phenomenon where the glottis opens for a brief moment to leave a gap for the airstream to pass, which is the case especially in voiceless consonant



speech sounds [p], [k] (cf. Giegerich 1992: 3). This explanation, however, ignores the existence of aspirated voiced stops such as [b.]. Aspiration is therefore often seen as a “weakened intermediate step in the spirantization of plosives” (Dogil/Luschützky 1990: 37). Hence, the glottis involvement must be more divergent. Considering the time course of glottal activity makes it obvious that the timing of glottis movement is crucial. Differences in aspiration are measured as a function of the amount of air ejected. Quantity of aspiration is therefore “commonly defined as delay in the onset of the vibration of the cords after the release of a preceding voiceless consonant /p.at / vs. /spat/” (Kenstowicz/Kisseberth 1979: 10). This means that for unaspirated stops the occlusion is released simultaneously with the voice onset (Clark/Yallop 2002: 53), whereas for aspirated stops the voice onset follows the release with a short but discernible delay. Accordingly, the phonological features accompanying aspiration are laryngeal features. Of the two laryngeal features, one controls glottalization [+constricted gl], the other aspiration [+spread gl] (Kenstowicz 1994: 493). This also means that the two features cannot occur at the same time for the same speech segment; they are incompatible, cf. also Paradis/LaCharite 2001: 291. On the other hand, Kenstowicz sees few or no constraints on [- constr gl] or [- spread gl] (ibid.), which creates a certain asymmetry for any categorization that is confined to the positive (+) variants of the respective features.

2.1. Rules governing aspiration in English

The aspiration rules of English must be subdivided according to the stop consonant concerned, although they share some common features. Even though English has no words that are distinguished only by aspirated versus unaspirated [p], aspirated [p] always occurs in certain positions in a word, such as word-initial or syllable-initial position, whereas unaspirated stops occupy positions in the middle of words or at the ends of syllables. Conversely, Kenstowicz and Kisseberth conclude that unaspirated voiceless stops occur in noninitial position (Kenstowicz/Kisseberth 1979: 30); aspiration of voiceless stops in initial position in monosyllabic words is even an “invariable correlation” (ibid.). This statement, however, cannot be made for all environments. Kenstowicz further observes that a preceding [s] deprives voiceless stops of all aspiration, as in [spat] (ibid: 257). Here, lip closure for [p] coincides with the onset of the vowel; there is no discernible delay, and the stop comes out unaspirated. Whether a stop consonant is aspirated or not is therefore not “completely predictable from context in English” (Caplan 1987: 202). Furthermore, attention has to be paid to the distribution in bi- or polysyllabic words and to the presence or absence of stress. It is therefore possible to formulate a rule for the distribution of aspiration in English stops. The obvious difference enables native speakers to articulate effortlessly according to the rule: “In English, voiceless stops are generally aspirated, except after /s/ and before unstressed vowels, where they are unaspirated.” (Crowley 1992: 220). The full system of English aspirated and unaspirated stops is more complex. Although Clark and Yallop maintain that



unaspirated initial stops, besides their distribution in German and Dutch, do occur in certain dialects in northern England (Clark/Yallop 2002: 52), the canonical system for English looks as follows. English has a 4 (6)-way system, cf. Vaux 1998: 497:

b. p p.
d d. t t.
g g. k k.

Deviations from this are dialectal as well as variational: Scottish Standard English exhibits the weakest aspiration of all distinctive varieties of English (Giegerich 1992: 219). Gaps in the system like the missing unaspirated [b] are not uncommon (Clark/Yallop 2002: 124); what is uncommon are fully-fledged systems that exhibit all possible stops in both aspirated and unaspirated variants. The marking of aspiration modifies the time course of the stop articulation. Unaspirated stops occupy very short durations of about 10-15 ms; aspiration extends this duration up to about 50 ms (Fry 1987: 122). Measurements in the experiment described here yielded 18 ms for [k] and 56 ms for [kh], cf. waveforms in fig. 1 and fig. 2.

3. Hypotheses and assumptions

With the experimental setup suggested here, our study of learner perception performance addresses several propositions. First, some assumptions are made about aspiration in interlanguage competence and about ways to consider learner interlanguage as part of a typological spectrum that includes a solid if incomplete system of aspirated speech sounds. Second, differences in perception performance are related to the inference of rules from stimuli on the basis of a first language.
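The rule that listeners would have to infer can be stated compactly. The following is a minimal sketch of the generalization quoted from Crowley (1992: 220) in §2.1, not the procedure used in the experiment; the class name `StopContext`, its fields and the example contexts are illustrative assumptions.

```python
from dataclasses import dataclass

# Voiceless stops of English, the only segments the quoted rule speaks about.
VOICELESS_STOPS = {"p", "t", "k"}


@dataclass
class StopContext:
    """Hypothetical description of one stop token (names are assumptions)."""
    stop: str                      # the stop consonant, e.g. "k"
    after_s: bool                  # True if the stop directly follows /s/
    before_unstressed_vowel: bool  # True if the following vowel is unstressed


def predicts_aspiration(ctx: StopContext) -> bool:
    """Apply the quoted generalization: voiceless stops are aspirated
    except after /s/ and before unstressed vowels."""
    if ctx.stop not in VOICELESS_STOPS:
        return False  # the rule makes no claim about voiced stops
    if ctx.after_s or ctx.before_unstressed_vowel:
        return False
    return True


# Illustrative contexts: initial /k/ before a stressed vowel (as in "keep"),
# /p/ after /s/ (as in [spat]), and /k/ before an unstressed vowel.
initial_k = StopContext("k", after_s=False, before_unstressed_vowel=False)
p_after_s = StopContext("p", after_s=True, before_unstressed_vowel=False)
k_unstressed = StopContext("k", after_s=False, before_unstressed_vowel=True)
```

A rule-following listener would, on this sketch, need access to exactly these two contextual conditions; a pattern-matching listener would not need to represent them explicitly.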

3.1. Aspiration in learner interlanguage

The idea of interlanguage as put forward by Selinker, among others, from 1972 onward is a program to map the progress of second language learners towards proficiency. According to Selinker, interlanguage is a structured system that a learner constructs during acquisition (Ellis 1992: 42). This system is subject to processes of transfer, overgeneralization and instructed rules and is thought of as a continuum, the end of which yields competence in a second language. It remains unclear, moreover, whether competence is part of the continuum or the end of it, which leaves open the question of whether competence itself is a continuum. Arguably, all these processes are processes of simplification. At any stage, however, rules are considered to be in flux, and the rule system is considered open instead of closed. Open phonological rules would generate random, erratic behavior in test subjects, and it is open to debate whether interlanguage in Selinker’s definition is comparable at all. However, Ellis maintains that interlanguage is systematic, even though he states that free variation occurs, which leads to learner performance that varies from task to task. The criteria mentioned above make interlanguage look like an ad hoc device that does not resemble natural language to any large extent. This is, however, implausible given that learners end up with L2 competence. Calling it “interlanguage” therefore invites criticism and debate. If, however, Selinker's notion of interlanguage is correct in that L2 learners reactivate latent language



structure and subsequently employ the same cognitive mechanisms and devices responsible for first language acquisition, then these same mechanisms will determine their internal phonological representation and thus inhibit their performance. Since, as Corder maintains, 95% of all learners stop before reaching the end of the interlanguage continuum (native-like competence) and fossilize at a certain stage, phonological fossilization would be only one component and a logical outcome of second language acquisition. As a starting hypothesis, L2 learners of the type studied have a strong bias towards unaspiration, as their interlanguage is subject to transfer effects that are not compensated or overruled by instruction. As interlanguage usually omits elements that are non-essential for communication (Cook 1993: 40), aspiration looks negligible at first glance. Even at second glance the feature is produced by imitation only, as there is no physiological necessity to aspirate (Kenstowicz/Kisseberth 1979: 30). Accordingly, the question of why aspiration occurs at all is sometimes considered epiphenomenal and outside phonological theory (Giegerich 1992: 292p.). In §5 we will make some assumptions about the origin and scope of aspiration within a general theory of phonation.

3.2. Hypotheses on aspiration perception of L2 learners

Learners and native speakers use considerably different acoustic-phonetic cues to segment speech and to perceive word boundaries. Often, the perception of differences between e.g. keep sticking and keeps ticking (from an experiment in Altenberg 2005: 331) is severely inhibited for L2 learners. Several parameters were identified that inhibit the perception of segmentation (longer /s/ with higher amplitude in sticking, longer closure in keep than in keeps), but primarily the segmentation can be related to the perception of aspirated /t./ in ticking versus unaspirated /t/ in sticking; more generally, aspiration “[is] a primary cue in English speech segmentation” (ibid: 328). Acquisition is a mapping of abstract implicational rules onto concrete speech. Ohala maintains in this respect that occurrences of allophones would be grouped into phonemes (1982: 236) according to parametrized learning. However, establishing these parameters is difficult and unstable across languages. On the other hand, other researchers see a general principle where “stimulus concreteness is mediated by cognitive processes”, as Reynolds and Paivio suggest in a classic 1968 study on cognitive determinants of speech (Reynolds/Paivio 1968: 164). This means that allophonic differences are leveled and intelligibility rules; if a certain bundle of features is kept coherent, even the broadest L1 dialect will be intelligible to nonnative speakers. Whenever this bundle deviates considerably, guesswork will step in, or communication difficulties ranging from mild to complete breakdown will follow (cf. also Atechi, 2006). This is in accordance with a certain hypothetical “capability-continuum” in interlanguage (Cook 1993: 100) from careful to vernacular, which can be applied to articulation and therefore to audition as well.



fig. 1: Unaspirated [k], English segment, native German speaker

Fig. 1 shows an English segment with initial /k./ spoken by a native speaker of German (waveform, top). The spectrogram below shows a more protracted VOT than fig. 2 and much less aspiration, visible in the lower spectral range of 0-4 kHz, spread out across approximately 25 ms. Note that for approximants the full frequency scale of 0-11 kHz was used (Clark/Yallop 1995: 255).



fig. 2: Aspirated [k.], English segment, native English speaker

The native speaker segment shows a short VOT and a clear frequency distribution not only across a much longer timespan (39 ms) but also with more approximant noise in all regions of the spectrum. This difference in performance can be ascertained in all phonetic environments. For native speakers, aspiration will therefore be perceived differently in different contexts. Whalen, Best and Irwin observe perceptual preference in "appropriate and inappropriate stress contexts" (1997: 501). A leveled difference should be language specific only according to the feature that is present and active in the first language of the speaker. Just as a second language will have an impoverished system in terms of aspiration (as Vaux 1998: 497 notes), so will the first language in the case of French and German. As a result, L2 speakers will have systematic difficulties in discriminating between aspirated and unaspirated stops when the distinction does not occur in their mother tongue and rule-based L2 acquisition never focuses on the difference. That this difference could become pivotal during acquisition is unlikely because the feature is nondistinctive (in contrast to phonation, i.e. laryngeal activity that is neither initiatory nor articulatory). The likely situation is that the L2 speaker will not hear the difference even at high competence levels. As Kenstowicz/Kisseberth explain, an English speaker will say [p.a] for pas. “[H]e will have trouble hearing the difference between his [p.a] and the French [pa]” (1979: 31). French learners of English, however, will encounter difficulties of a different kind: they are “sensitive to the difference between aspirated and



nonaspirated consonants since their language lacks aspirated voiceless stops” (ibid: 256). For German it can be concluded that German learners of English will level the distinction between aspirated and unaspirated stops altogether. Even if production is rule-based and near-native, these learners will be blind to the difference when listening. This working hypothesis is further reinforced by the fact that all test subjects have a Saxon German background. In an aside, Arnold and Hansen suggest normatively that all Saxon learners should aspirate voiceless stops as they tend to lenitize in general (Arnold/Hansen 1992: 148). Such rule following further levels discriminatory competence, even though it is commonly agreed that /b d g/ > /p t k / is not difficult for advanced learners, as noted by Moyer, who studied word-final segments (Moyer 1999: 86). A typologically plausible interlanguage would add some realism to the theory. A language cannot possibly be more primitive than any other language. This is the reason why primates do not have simplified language/syntax; for an extensive discussion cf. Deacon, 1998. Therefore we add some remarks on interlanguage from the perspective of second language acquisition where interlanguage is considered.

4. Experimental setup and data

The experiment was conducted in two phases, a training session followed by an evaluation session. The training session did not establish or highlight a rule; instead, after each word occurrence, it was indicated whether the example matched the rule or not. The words were played back out of context; there was a break between words for de-triggering and for the subjects to form their hypothesis. This was designed to exclude further acoustic cues, especially for the voiced-voiceless distinction, where larynx activity results in slightly prolonged VOTs that trigger voiced initial consonants (cf. Fry 1987: 135-136). The importance of such control issues is underscored by neurolinguistic evidence from cerebellar speech perception mechanisms (Mathiak et al. 2002: 902). That study investigated the representation of temporal information where either word-medial pauses or word-medial aspiration noise as intrasegmental cues signaled the difference between the German words Boten (“messengers”) and Boden (“floor”). It showed that durational parameters were singularly encoded. Thus, controlling these cues turned out to be important for the validity of our study.

4.1. Stimuli and results

In the first session, 75 items were replayed. It was indicated whether the word was a “good” example or not. These target training stimuli cued learner subjects towards discriminating phonemes even though the specific phoneme in question was not indicated. In the second session, 40 items were replayed. The breaks were slightly longer (> 10s) and no information whatsoever was given.



All speech samples had been recorded previously and spoken by a native speaker (Irish English native). Test participants were uniformly native speakers of German with advanced L2 competence in English. Recording equipment was a sound card Yamaha OPL running Soundforge 6.0. Replaying equipment was a laptop and regular external speakers. Spectrograms were generated using LingWAVES v.2.30. The first tables show the performance of the listeners after the training session, the results of the test session are in the second tables.

tab. 1: Sample of training session results (upper half) and test session results (lower half)

Performance results of the learner subjects (N=19 German native speakers with very high proficiency in English) are displayed together with the rule that applied to either aspirated or unaspirated [k.]. This cue was given to the learner subjects after the stimulus in all trials of the training session (upper half of table 1). No cue whatsoever was given in the test session (lower half of table 1). Performance to rule divergence is displayed in fig. 4 and fig. 5, cf. §5.

4.2. Spectrogram analysis

As explained in §2, the aspirated allophonic variant requires more energy to produce and maintain over time, cf. spectrograms in fig. 1 and 2. Darker areas represent zones of higher phonetic energy, although both speech sounds occupy nearly the same frequency range (vertical axis). Commonly, for the aspirated



[k.] the energy is more evenly distributed over a longer period of > 40 ms whereas the stop burst for [k] is much shorter with about 10 ms.
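The durational contrast just described lends itself to a simple threshold decision. The sketch below is an illustration only, not part of the reported analysis: the boundary of 30 ms is an assumption placed between the unaspirated range (about 10-15 ms) and the aspirated range (about 50 ms) cited from Fry (1987: 122), and the function name is hypothetical.

```python
# Assumed decision boundary in milliseconds; chosen for illustration as a
# value between the cited unaspirated (10-15 ms) and aspirated (~50 ms) ranges.
ASSUMED_BOUNDARY_MS = 30.0


def classify_stop(duration_ms: float,
                  boundary_ms: float = ASSUMED_BOUNDARY_MS) -> str:
    """Label a stop as aspirated or unaspirated from its measured duration."""
    if duration_ms < 0:
        raise ValueError("duration must be non-negative")
    return "aspirated" if duration_ms >= boundary_ms else "unaspirated"


# The two measurements reported in section 2.1: 18 ms for [k], 56 ms for [kh].
label_plain_k = classify_stop(18.0)
label_aspirated_k = classify_stop(56.0)
```

With any boundary between roughly 20 and 50 ms the two reported measurements fall on opposite sides, which is why the exact value assumed here is not critical for the illustration.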

fig.3: Approximant “noise” across 0-11 kHz

The recorded sample from the training session in fig. 3 exhibits the voiceless stop consonant with a burst of noise of very short duration (10 ms from ms 35 to ms 45, graph scale). The burst of noise energy is spread widely over the spectrum of 0-11 kHz. Fry estimates 10-15 ms for unaspirated stops and >50 ms for aspirated stops (Fry 1987: 122). The intensity for voiceless velar stops is comparatively high in the lower parts of the spectrum (1-4 kHz) and represents a somewhat marked individual difference to their canonical representation. The following segment shows aspiration (ms 45 to ms 70, graph scale) with a “progressive narrowing of the noise band passed by the vocal tract filter” (ibid.). The laryngeal nature of the aspirated segment makes this more vocalic and peaks at around 3 kHz in intensity. It usually anticipates formant F0 of the following vowel (not shown).

5. Evaluation and discussion

The data obtained show a strong tendency towards a learning effect throughout the test. The first graph (fig. 4) shows the performance results of the training session. We recognize two largely divergent graphs, with student performance at chance level in the first trials (1-40) and later running more nearly parallel to the dotted model graph (>40). The model graph represents the linear relationship between full rule application and trial number; that is, both increase in steps of 1. The performance graph (straight line) accumulates the mean performance of all test



subjects and increments thus always <1 (1 is theoretically possible with all subjects showing rule following).

fig. 4: Training session results with accumulated rule of ideal recognition (dotted line) and subject performance (straight line)
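The construction of the two curves can be made explicit. The following is a minimal sketch under the assumption that each response is coded 1 (conforms to the rule) or 0 (does not); the function name and the toy response matrix are hypothetical and do not reproduce the study's data.

```python
from itertools import accumulate
from typing import List, Tuple


def cumulative_curves(responses: List[List[int]]) -> Tuple[List[int], List[float]]:
    """Return (model, performance) curves.

    responses[s][t] is 1 if subject s followed the rule on trial t, else 0.
    The model curve grows by exactly 1 per trial (ideal recognition); the
    performance curve grows by the mean score of all subjects on that trial,
    so its increment is at most 1.
    """
    if not responses or not responses[0]:
        raise ValueError("need at least one subject and one trial")
    n_subjects = len(responses)
    n_trials = len(responses[0])
    if any(len(row) != n_trials for row in responses):
        raise ValueError("all subjects must have the same number of trials")

    model = list(range(1, n_trials + 1))
    per_trial_mean = [
        sum(row[t] for row in responses) / n_subjects for t in range(n_trials)
    ]
    performance = list(accumulate(per_trial_mean))
    return model, performance


# Toy data: two hypothetical subjects, four trials.
toy_model, toy_performance = cumulative_curves([[1, 0, 1, 1], [0, 0, 1, 1]])
```

On this construction the two curves run parallel wherever all subjects follow the rule, and the vertical gap between them records how many trials' worth of rule following has been lost up to that point.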

The test session graphs (in which rule deduction should be complete and no cue was given as to whether a form conformed to the rule or not) show only slight divergence, with all subjects showing rule following greater than chance (cf. correlation results).

fig. 5: Test session results with accumulated rule of ideal recognition (dotted line) and subject performance (straight line)

The almost parallel graphs indicate rule acquisition roughly after 70 trials. This means that test subjects had enhanced their discriminatory ability either a) after a process of rule following or b) after a process of pattern matching with the stimuli given or c) by both.



However, the rule was administered “blindly” and most trials of most learner subjects in the training session were still beating chance levels with 77% recognition. This strongly suggests that learners applied general cognitive pattern matching abilities, as the distribution graph shows.
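How far a recognition rate of this size lies from chance can be estimated with an exact binomial tail probability. The sketch below rests on assumptions that are not stated in the study: a binary response with a chance level of 0.5, independence of trials, and a single hypothetical listener who judges 58 of the 75 training items correctly (about 77%).

```python
from math import comb


def binomial_tail(n_trials: int, n_correct: int, p_chance: float = 0.5) -> float:
    """Probability of at least n_correct successes in n_trials
    if every response were a guess with success probability p_chance."""
    return sum(
        comb(n_trials, k) * p_chance**k * (1 - p_chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


# Hypothetical listener: 58 correct out of 75 items, chance level 0.5.
p_value = binomial_tail(75, 58)
```

Under these assumptions the tail probability is very small, which is in line with the description of the scores as beating chance; the calculation says nothing about whether rule following or pattern matching produced them.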

fig. 5: Grouped frequency distribution of performance for training session (dotted line) and test session (straight line)

The grouped frequency distributions (interpolated histogram in fig. 5) of the training session and the test session yield higher recognition scores for the test session (straight line), in a distribution that is strongly skewed to the right for both sessions. In the interpolated graph we recognize chance behavior in the first trials (01-19) and a more convergent pattern afterwards (20-35), although the scores never exceed 0.8 (80%) recognition. Performance degrades somewhat (36-56), with some irregular spikes (48-52), but scores then improve to 77% over the remainder of the training session. Hayes-Harb observed suppression of learner subjects’ pre-existing discrimination ability (Hayes-Harb 2007: 76). On the basis of rule inference this would have been a plausible outcome, but it was not observed in our study. In order to control for individual deviation in pattern recognition vs. rule following, we carried out a correlation analysis of the matrix of learner subjects against their performance in both the training and the test session.

tab. 2: Individual accumulated learner subject scores (sample)
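A correlation analysis of this kind can be sketched with the standard Pearson product-moment formula. The code below is an illustration under assumptions: it pairs each subject's training score with the same subject's test score, and any scores passed to it here are hypothetical rather than the values of table 2.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation between two paired score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return covariance / (spread_x * spread_y)


# Hypothetical paired scores (training, test) for three subjects.
r_example = pearson_r([0.60, 0.70, 0.80], [0.70, 0.80, 0.90])
```

The resulting coefficient is then compared with the critical value for the chosen significance level and sample size, as in any standard table of critical values for r.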

The overall improvement of the learner subject scores shows a high positive correlation, with a Pearson coefficient of r(19) = 0.949, p < .0005 (one-tailed). This allows for



a very confident establishment of correlation, given a critical value of 0.693 for p = .0005 and N = 19.

From a typological point of view a diverse picture emerges from the data. There is considerable debate as to whether phonation and aspiration form a continuum or are two closely related but separate phenomena. Both are laryngeal features with similar discriminative functions across languages. Aspirates are parallel to affricates as a possible intermediate stage in a development from stops to fricatives, also in non-Indo-European languages. Children acquiring Urdu as a first language must maintain two phonetic dimensions, closure voicing and breathiness, which amounts to a 4-way contrast among /p/, /b/, /ph/, /bh/ (Pierrehumbert 2003: 131). Analogous effects can be found in different Bantu languages. Swahili, a North-East coastal Bantu language, employs a phonemic contrast of aspiration in voiceless plosives (Maw 1981: 225). Zulu, a South-East Bantu language, systematically transforms voiceless aspirates into ejectives. As Dogil and Luschützky note, the following universal pattern holds (1990: 36):

[-constr gl] [-spread gl] > voiceless
[-constr gl] [+spread gl] > aspirated
[+constr gl] [-spread gl] > voicing
[+constr gl] [+spread gl] > breathy voicing

This trend offers a way out of the aspiration/phonation dilemma: voiceless aspirates and voiceless plosives are systematically lenitized by dephonation (ibid: 39), given that stop lenition is a common process in phonological change (Crowley 1992: 40). All this confirms the role of aspiration as an aspect of phonation, not of initiation (as in syllabic stress), and contradicts hypotheses that treat aspirated stops as actually starting out as geminates (Scheer/Ségéral 2000: 3). On such hypotheses, a covert reason for aspiration would be gemination of the consonant in the empty [CV] slot of the initial stressed syllable.
This cannot be substantiated, given that the recognition scores for initial segments are not significantly higher than the scores for medial and final segments (cf. fig. 6).



fig. 6: Recognition scores for initial (I), medial (C) and final (F) segments involving /k/ and /k./

The findings for stress marking further emphasize this, as recognition for non-initial stressed segments was slightly higher (0.81) than the mean recognition of all stressed segments (0.79). This ties in with Cutler’s remark that learner assimilation of phonological rules rarely occurs in initial positions. Speakers “avoid distorting the onset of words” (Cutler 1987: 30), although this concerns especially words of particular discourse significance or emphasis-stressed words. Cognitive primacy effects may be a plausible explanation for this. More significantly, the parameter of syllabicity interferes with recognition performance. Fig. 7 shows the dependency of recognition scores on syllabicity. Monosyllabic items were recognized below average; recognition was best for polysyllabic items.



fig. 7: Recognition scores as a function of syllabicity

Best and McRoberts (2003) as well as Kingston (2003) obtained interesting results within the framework of Best’s Perceptual Assimilation Model (PAM) (Best, 1994). The core principle in PAM is that learners assimilate foreign speech sounds to different native phonemes. Training learner subjects to identify non-native phonemes teaches them sets of exemplars rather than more abstract distinctive feature values (Kingston 2003: 295). This closely parallels our starting question as to whether phoneme perception is based on abstract rule following or on pattern matching that relies on general cognitive problem-solving abilities. Further, PAM makes predictions as to the extent to which learners discriminate foreign sounds at all. It introduces the models TC and SC. TC means “two category”: two foreign sounds are assimilated to two native sounds. SC means “single category”: two foreign sounds are assimilated to one native sound. If TC prevails, then one of the foreign sounds assimilates better than the other; thus foreign sounds differ in category goodness (CG). Kingston detected a CG difference between aspirated [kh] and ejective [k'] (as in Zulu): both assimilate to English /k/, but [kh] is better in category goodness (Kingston 2003: 297). This can be explained well, as [kh] is the typical English allophone of [k] in initial position and [k'] never occurs in this phonetic environment. Kingston’s PAM prediction is therefore that the English lack of a [constricted glottis] contrast leads to SC assimilations for English learners. Best and McRoberts, working on the same laryngeal distinction in isiZulu and the lack thereof in English, obtained similar results and concluded: “as for allophonic experience with these phones, English provides much exposure to [kh], but none to ejective stops” (Best and McRoberts 2003: 190). Interestingly, speakers of American English hear the ejective glottal gesture as a non-speech vocal tract event superimposed on /k/.
Both studies parallel the approach undertaken here: German learners conflate the foreign discrimination (training



session) but can be trained to discriminate perceptually, based on pattern matching of good category members.

6. Conclusions

As elaborated in the previous section, L2 speakers converge on the phenomenon as presented in the training session without being given the specific information. This clearly supports a learning effect over time. We have argued for a certain “blindness” of German learners toward a differentiated perception of [k] and [k.], owing to the fact that this phonological feature is not present in their first language (German) and, for single items, of no discriminating use in their second language (English). The test results show accordingly that learners conflate [k] and [k.] when exposed to them in isolated words. Abstract or less frequent words seemed to have no inhibiting effect of the kind argued for e.g. in Harley 1995: 71. In conclusion, these test results offer an unparametrized (i.e. inductive) look at the L2 acquisition of phonological perception and argue strongly for a pattern-driven learning style. Phonological rules such as those stated here are never “learned” nor “acquired” in the original sense, as they fail to figure prominently in the correspondence between production (where the distinction is made by L2 speakers) and comprehension (where the distinction is lost). From a typological perspective, the interlanguage of these speakers provides a realistic framework of a language conforming to the typological constraints of a real language. As there is no physiological necessity to aspirate (cf. §2), from an evolutionary point of view the feature appears to be a waste of resources (time) and energy (added glottis movement, air burst ejection). This is also substantiated by findings from non-Indo-European languages (cf. §5). It is therefore a feature which could not have emerged on evolutionary grounds alone.

7. Bibliography

Altenberg, Evelyn P. (2005). “The perception of word boundaries in a second language”. Second Language Research 21 (4), 325-58.

Arnold, Roland/Hansen, Klaus (1992). Englische Phonetik. Leipzig: Langenscheidt.

Atechi, Samuel N. (2006). The Intelligibility of Native and Non-native English Speech. Göttingen: Cuvillier.

Beach, Elizabeth Francis, Burnham, Denis and Kitamura, Christine (2001). “Bilingualism and the relationship between perception and production: Greek/English bilinguals and Thai bilabial stops”. International Journal of Bilingualism 5 (2), 221-35.

Best, Catherine (1994). The emergence of native-language phonological influences in infants: A perceptual assimilation model. In: Nusbaum, H.C. (ed.). The development of speech perception: The transition from speech sounds to spoken words. Cambridge, MA: The MIT Press, 167-224.



Best, Catherine and McRoberts, Gerald W. (2003). “Infant perception of non-native consonant contrasts that adults assimilate in different ways”. Language and Speech 46 (2-3), 183-216.

Boersma, Paul (2000). Phonetically-driven acquisition of phonology. Unpublished Ms. Amsterdam: University of Amsterdam.

Brown, Cynthia A. (1998). “The role of the L1 grammar in the L2 acquisition of segmental structure”. Second Language Research 14 (2), 136–93.

Canepari, Luciano (2005). A handbook of phonetics. München: LINCOM.

Caplan, David (1987). Neurolinguistics and linguistic aphasiology. Cambridge: Cambridge University Press.

Clark, John and Yallop, Colin (1995). An Introduction to phonetics and phonology. Oxford: Blackwell.

Cook, Vivian (1993). Linguistics and second language acquisition. Basingstoke: Macmillan.

Crowley, Terry (1992). An introduction to historical linguistics. Oxford: Oxford University Press.

Crystal, Thomas H. and House, Arthur S. (1990). “Articulation rate and the duration of syllables and stress groups in connected speech”. The Journal of the Acoustical Society of America 88 (1), 101-12.

Cutler, E. Anne (1987). Speaking for listening. In: Allport, Alan et al. (eds.). Language perception and production: Relationships between listening, speaking, reading and writing. London: Academic Press, 23-40.

Deacon, Terrence W. (1998). The symbolic species: The co-evolution of language and the brain. New York: W.W. Norton & Company.

Dogil, Grzegorz and Luschützky, Hans Christian (1990). “Notes on sonority and segmental strength”. Rivista di linguistica 2, 3-54.

Ellis, Rod (1992). Understanding second language acquisition. Oxford: Oxford University Press.

Fry, Dennis Butler (1987). The physics of speech. Cambridge: Cambridge University Press.

Giegerich, Heinz J. (1992). English phonology: An introduction. Cambridge: Cambridge University Press.

Greenberg, Joseph H. (1995). The diachronic typological approach to language. In: Shibatani, Masayoshi and Bynon, Theodora (eds.). Approaches to language typology. Oxford: Clarendon Press.

Gussenhoven, Carlos (1991). “The English rhythm rule as an accent deletion rule”. Phonology 8, 1-35.

Hall, T. A., Hamann, Silke, and Zygis, Marzena (2006). “The phonetic motivation for phonological stop assimilation”. Journal of the International Phonetic Association 36 (1), 59-81.

Hardcastle, W.J. (1981). Experimental studies in lingual coarticulation. In: Asher, R.E. and Henderson, J.A. (eds.). Towards a history of phonetics, 50-66.

Harley, Trevor (1995). The psychology of language. Hove: Psychology Press.



Hayes-Harb, Rachel (2007). “Lexical and statistical evidence in the acquisition of second language phonemes”. Second Language Research 23(1), 65–94.

Hyman, Larry M. (1975). Phonology. Theory and analysis. New York: Holt, Rinehart and Winston.

Jaeger, Jeri J. (1980). “Testing the psychological reality of phonemes”. Language and Speech 23, 233-53.

Jessen, Michael (2004). “Instability in the production and perception of intervocalic closure voicing as a cue to bdg vs. ptk in German”. Folia Linguistica 38 (1-2), 27-41.

Jones, Amanda Katherine (2002). “A lexicon-independent phonological well-formedness effect: Listeners’ sensitivity to inappropriate aspiration in initial /st/ clusters”. University of California Working Papers in Phonetics 100, 33-71.

Kallestinova, Elena (2004). “Voice and aspiration of stops in Turkish”. Folia Linguistica 38 (1-2), 117-43.

Kenstowicz, Michael and Kisseberth, Charles (1979). Generative phonology: Description and theory. London: Academic Press.

Kenstowicz, Michael (1994). Phonology in generative grammar. Cambridge, MA: Blackwell.

Kenstowicz, Michael and Suchato, Atiwong (2006). “Issues in loanword adaptation: A case study from Thai”. Lingua 116 (7), 921-49.

Kingston, John (2003). “Learning foreign vowels”. Language and Speech 46(2-3), 295-349.

Lodhi, Abdulaziz and Engstrand, Olle (1985). “On aspiration in Swahili: Hypotheses, field observations, and an instrumental analysis”. Phonetica 42 (4), 175-187.

Mathiak, Klaus, et al. (2002). “Cerebellum and speech perception: A functional magnetic resonance imaging study”. Journal of Cognitive Neuroscience 14 (6), 902-12.

Maw, Joan (1981). Arabic and Roman writing systems for Swahili. In: Asher, R.E. and Henderson, J.A. (eds.). Towards a history of phonetics, 225-247.

Miller, Joanne L. (2001). Speech Perception. In: Wilson, Robert A. and Keil, Frank C. (eds.). The MIT Encyclopedia of Cognitive Science. Cambridge, MA: The MIT Press, 787-790.

Moosmüller, Sylvia, and Ringen, Catherine (2004). “Voice and aspiration in Austrian German plosives”. Folia Linguistica 38 (1-2), 43-62.

Moyer, Alene (1999). “Ultimate attainment in L2 phonology”. Studies in Second Language Acquisition 21(1), 81-108.

Nye, Patrick W. and Fowler, Carol A. (2003). “Shadowing latency and imitation: the effect of familiarity with the phonetic patterning of English”. Journal of Phonetics 31, 63–79.

Ohala, John J. (1982). The phonological end justifies any means. In: Hattori, S. and Inoue, K. (eds.). Proceedings of the XIIIth International congress of linguistics. Tokyo: Plenary 5.



Paradis, Carole and LaCharite, Darlene (2001). “Guttural deletion in loanwords”. Phonology 18, 255-300.

Parker, Steve and Weber, David (1996). “Glottalized and Aspirated Stops in Cuzco Quechua”. International Journal of American Linguistics 62 (1), 70-85.

Pater, Joe (2003). “The perceptual acquisition of Thai phonology by English speakers: Task and stimulus effects”. Second Language Research 19 (3), 209-23.

Peperkamp, Sharon (2003). “Phonological acquisition: Recent attainments and new challenges”. Language and Speech 46 (2-3), 87-113.

Petrova, Olga, et al. (2006). “Voice and aspiration: Evidence from Russian, Hungarian, German, Swedish, and Turkish”. The Linguistic Review 23 (1), 1-35.

Pierrehumbert, Janet B. (2003). “Phonetic diversity, statistical learning, and acquisition of phonology”. Language and Speech 46 (2-3), 115-154.

Plaut, David C. and Kello, Christopher T. (1999). The emergence of phonology from the interplay of speech comprehension and production: A distributed connectionist approach. In: MacWhinney, Brian (ed.). Emergence of Language. Hillsdale, NJ: Lawrence Erlbaum Associates, 381.

Reynolds, Allan and Paivio, Allan (1968). “Cognitive and emotional determinants of speech”. Canadian Journal of Psychology 22 (3), 164-75.

Ringen, Catherine O. (1999). “Aspiration, preaspiration, deaspiration, sonorant devoicing and spirantization in Icelandic”. Nordic Journal of Linguistics 22 (2), 137-156.

Scheer, Tobias and Ségéral, Philippe (2000). In which disjunctive contexts does lenition and fortition occur? International Conference on Phonology: The Strong Position of lenition and fortition. Université de Nice: June 2000.

Sharma, Devyani (2005). “Dialect stabilization and speaker awareness in non-native varieties of English”. Journal of Sociolinguistics 9 (2), 194-224.

Silverman, Daniel (2003). “On the rarity of pre-aspirated stops”. Journal of Linguistics 39 (3), 575-98.

Trubetzkoy, N.S. (1989 [1939]). Grundzüge der Phonologie. Göttingen: Vandenhoeck & Ruprecht.

Tsui, Ida Y. H. and Ciocca, Valter (2000). “Perception of aspiration and place of articulation of Cantonese initial stops by normal and sensorineural hearing-impaired listeners”. International Journal of Language & Communication Disorders 35(4), 507-25.

Vaux, Bert (1998). “The laryngeal specifications of fricatives”. Linguistic Inquiry 29 (3), 497-511.

Vaux, Bert and Samuels, Bridget (2005). “Laryngeal Markedness and Aspiration”. Phonology 22(3), 395-436.

Vennemann, Theo (1988). Preference laws for syllable structure and the explanation of sound change. Berlin: Mouton de Gruyter.

Whalen, D.H., Best, C. T. and Irwin, J. R. (1997). “Lexical effects in the perception and production of American English /p/ allophones”. Journal of Phonetics 25, 501-528.
