phonemic restoration: insights from a new methodology · phonemic restoration: ... several problems...

21
Journal of Experimental Psychology: General Copyright 1981 by the American Psychological Association, Inc. 1981, Vol. 110, No. 4, 474-494 0096-3445/81/1004-0474S00.75 Phonemic Restoration: Insights From a New Methodology Arthur G. Samuel University of California, San Diego SUMMARY Phonemic restoration is a powerful auditory illusion in which listeners "hear" parts of words that are not really there. In earlier studies of the illusion, segments of words (phonemes) were replaced by an extraneous sound; listeners were asked whether any- thing was missing and where the extraneous noise had occurred. Most listeners reported that the utterance was intact and mislocalized the noise, suggesting that they had restored the missing phoneme. In the present study, a second type of stimulus was also presented: items in which the extraneous sound was merely superimposed on the critical phoneme. On each trial, listeners were asked to report whether they thought a stimulus utterance was intact (noise superimposed) or not (noise replacing). Since this procedure yields both a miss rate / > (intact|replaced), and a false alarm rate P(replaced|intact), signal detection parameters of discriminability and bias can be calculated. The discriminability param- eter reflects how similar the two types of stimuli sound; perceptual restoration of replaced items should make them sound intact, producing low discriminability scores. The bias parameter measures the tendency of listeners to report utterances as intact; it reflects postperceptual decision processes. This improved methodology was used to test the hypothesis that restoration (and more generally, speech perception) depends upon the bottom-up confirmation of ex- pectations generated at higher levels. Perceptual restoration varied greatly with the phone class of the replaced segment and its acoustic similarity to the replacement sound, supporting a bottom-up component to the illusion. Increasing listeners' expec- tations of a phoneme increased perceptual restoration: missing segments in words were better restored than corresponding pieces in phonologically legal pseudowords; priming the words produced even more restoration. In contrast, sentential context affected the postperceptual decision stage, biasing listeners to report utterances as intact. A limited interactive model of speech perception, with both bottom-up and top-down components, is used to explain the results. Phonemic restoration is a powerful audi- In the decade since this experiment was tory illusion in which listeners "hear" parts reported, phonemic restoration has been very of words that are not really there. In the first widely cited and very little .studied. Only a report of the illusion, Warren (1970) re- handful of studies of the illusion have ap- placed the first /s/ in "legislatures" with a peared in the literature. The most thorough cough or a tone in the sentence, "The state of these was done by Warren and Obusek governors met with their respective legisla- (1971). Using the same sentence as Warren tures convening in the capital city." Listen- (1970), Warren and Obusek varied the na- ers were given typewritten versions of the ture of the replacement sound, the infor- sentence and asked to circle the location of mation given to the subjects, and the size of the cough or tone and say whether it had the replaced chunk of speech. As in the first replaced a speech sound. Localization of the study, two measures of restoration were extraneous sound was poor, and essentially used: the distance metric (how accurately all of the subjects reported that the sentence subjects located the replacement sound) and was intact; they apparently restored the de- the hit rate (how often subjects accurately leted /s/. detected the deletion). Using these mea- 474

Upload: doanbao

Post on 05-May-2018

216 views

Category:

Documents


3 download

TRANSCRIPT

Journal of Experimental Psychology: General Copyright 1981 by the American Psychological Association, Inc.1981, Vol. 110, No. 4, 474-494 0096-3445/81/1004-0474S00.75

Phonemic Restoration: Insights From a New Methodology

Arthur G. SamuelUniversity of California, San Diego

SUMMARY

Phonemic restoration is a powerful auditory illusion in which listeners "hear" partsof words that are not really there. In earlier studies of the illusion, segments of words(phonemes) were replaced by an extraneous sound; listeners were asked whether any-thing was missing and where the extraneous noise had occurred. Most listeners reportedthat the utterance was intact and mislocalized the noise, suggesting that they hadrestored the missing phoneme.

In the present study, a second type of stimulus was also presented: items in whichthe extraneous sound was merely superimposed on the critical phoneme. On each trial,listeners were asked to report whether they thought a stimulus utterance was intact(noise superimposed) or not (noise replacing). Since this procedure yields both a missrate />(intact|replaced), and a false alarm rate P(replaced|intact), signal detectionparameters of discriminability and bias can be calculated. The discriminability param-eter reflects how similar the two types of stimuli sound; perceptual restoration ofreplaced items should make them sound intact, producing low discriminability scores.The bias parameter measures the tendency of listeners to report utterances as intact;it reflects postperceptual decision processes.

This improved methodology was used to test the hypothesis that restoration (andmore generally, speech perception) depends upon the bottom-up confirmation of ex-pectations generated at higher levels. Perceptual restoration varied greatly with thephone class of the replaced segment and its acoustic similarity to the replacementsound, supporting a bottom-up component to the illusion. Increasing listeners' expec-tations of a phoneme increased perceptual restoration: missing segments in words werebetter restored than corresponding pieces in phonologically legal pseudowords; primingthe words produced even more restoration. In contrast, sentential context affected thepostperceptual decision stage, biasing listeners to report utterances as intact. A limitedinteractive model of speech perception, with both bottom-up and top-down components,is used to explain the results.

Phonemic restoration is a powerful audi- In the decade since this experiment wastory illusion in which listeners "hear" parts reported, phonemic restoration has been veryof words that are not really there. In the first widely cited and very little .studied. Only areport of the illusion, Warren (1970) re- handful of studies of the illusion have ap-placed the first /s/ in "legislatures" with a peared in the literature. The most thoroughcough or a tone in the sentence, "The state of these was done by Warren and Obusekgovernors met with their respective legisla- (1971). Using the same sentence as Warrentures convening in the capital city." Listen- (1970), Warren and Obusek varied the na-ers were given typewritten versions of the ture of the replacement sound, the infor-sentence and asked to circle the location of mation given to the subjects, and the size ofthe cough or tone and say whether it had the replaced chunk of speech. As in the firstreplaced a speech sound. Localization of the study, two measures of restoration wereextraneous sound was poor, and essentially used: the distance metric (how accuratelyall of the subjects reported that the sentence subjects located the replacement sound) andwas intact; they apparently restored the de- the hit rate (how often subjects accuratelyleted /s/. detected the deletion). Using these mea-

474

PHONEMIC RESTORATION 475

sures, the authors concluded that coughs,tones, and buzzes were essentially equallyeffective in producing restoration, and allwere better than leaving a silent gap wherethe phoneme had been. The null differenceamong the different replacement soundsmay have been due to a ceiling effect. Usingsimilar methodology, Layton (1975) foundthat all replacement sounds are not equal;a tone is less effective than noise in produc-ing restoration. Warren and Obusek's 8-dBamplitude manipulation of the replacementsound was ineffective, but even the quieterversion was as loud as the sentence peaklevel, making strong conclusions unwise. Thetwo most surprising findings were that thewhole syllable "gis" was about as restorableas the "s" alone, and that informing subjectsbeforehand that something was in fact de-leted did not help—restoration was still quitestrong.

Obusek and Warren (1973) used a dif-ferent approach to the problem of under-standing restoration. They took a singleword, "magistrjate," and created several ver-sions of it, replacing the /s/ with noise, the/gis/ with noise, or the /s/ with silence.Each version of "magistrate" was made intoa tape loop and presented to listeners. Therationale was clever: It was already knownthat listeners hear (synthesize) very differentwords when a word is played over and over;this is the "verbal transformation effect"(Warren, 1961). Obusek and Warren rea-soned that if restoration and the verbal

This research was conducted at the University ofCalifornia, San Diego, and was supported by U.S. Pub-lic Health Service Grant MH 15828 to the Center forHuman Information Processing, University of Califor-nia, San Diego, and by National Science FoundationGrant BNS76-15024 A01 to Dave Rumelhart. Bell Lab-oratories provided support for the preparation of themanuscript.

I would like to thank those who have helped me inconducting this project. First and foremost, I thankDave Rumelhart, who has provided invaluable adviceand support throughout the project. I have also benefitedfrom the suggestions of Donna Kat, Mark Liberman,Jay McClelland, Don Norman, and the LNR ResearchGroup.

Requests for reprints should be sent to Arthur G.Samuel, who is now at the Department of Psychology,Box 11A Yale Station, Yale University, New Haven,Connecticut 06520.

transformation effect are both due to thelisteners' synthesis of speech, then the trans-formations with the restoration stimuli shouldbe enhanced at the point of replacement(/s/ or /gis/). In fact, they obtained exactlythis result.

The only other published study of resto-ration was done by Warren and Sherman(1974). Three methodological improvementswere made in this study. First, the splicingwas done electronically, allowing greateraccuracy. Second, seven sentences (ratherthan the single "legislatures" sentence) wereused, a step toward generality. Third, in theoriginal recording of the stimuli, the criticalphonemes were mispronounced to insurethat no appropriate cues remained. The the-oretical issue considered was whether sub-sequent context could influence what pho-neme was restored when more than one fitthe prior context. Unfortunately, the oper-ational definition of subsequent context, theremainder of the word, was a very limitedone. For example, a sentence might be con-textually neutral up to "deli* . . .", where* is the replacement and ". . ." is either"ery" or "eration," producing either "deliv-ery" or "deliberation." Not surprisingly, this"subsequent context" was effective in influ-encing whether /b/ or /v/ was restored.

Despite the methodological improvementsof Warren and Sherman (1974), there areseveral problems inherent in the way pho-nemic restoration has been studied. In par-ticular, the measures of restoration have se-rious shortcomings. The distance measure(localizing the replacement sound) is, asWarren and Obusek (1971) admitted, atbest an indirect measure of restoration. Itis not at all clear why a subject who mis-localizes the noise by six phonemes shouldbe considered to have restored more than onewho misses by only three phonemes; bothhave restored the critical phoneme or beenconfused by the apparent location of thenoise. The hit rate measure of restoration isconsiderably better in that it presumably isdirectly related to the restoration phenom-enon. However, it has a serious drawbackin that there is no way to obtain a false alarmrate (reporting that something was missingwhen nothing was), since the noise alwaysreplaces a phoneme. Thus there is no way

476 ARTHUR G. SAMUEL

to know whether a hit (correct report ofsomething missing) is due to a failure to re-store or to some sort of bias. An extremecase of bias would be a subject who neversaid that a phoneme was missing. In thestudies reviewed here, that subject wouldhave to be considered a perfect restorer.With no false alarm data, restoration is fun-damentally indistinguishable from bias.

Clearly, other means are needed to mea-sure phonemic restoration. The approachtaken in the present study is to structure thetask in such a way that the false alarm ratecan be determined, thus providing a meansto factor out bias. To do this, a second typeof stimulus is used in addition to the normalrestoration item. In the new stimulus type,the "replacement" sound is merely addedto the appropriate phoneme, rather than re-placing it. Stimuli thus come in pairs; in oneversion, a sound replaces a phoneme, whereasin the other the same sound coincides withthe phoneme. The rationale for this ap-proach is simple. The phenomenology ofphonemic restoration is that the stimulusword seems to be intact and an additionalsound appears to be overlaid over part of it.The new version of the stimulus correspondsexactly to this description. Given appropri-ate controls (for such factors as masking),to the extent that the two versions soundalike, restoration is indicated. Since a falsealarm rate is available (probability of re-porting "replaced" when presented with anadded item), bias can be factored out usingsignal detection theory.

Using this improved methodology, the ex-periments reported here address a funda-mental question: How does lexical accessoccur from speech? In answering this ques-tion, the focus will be on determining howmuch the system depends on analyzing theacoustic information into words and howmuch it uses higher level knowledge to derivelower level information; in short, what arethe relative contributions of bottom-up andtop-down analysis? The metatheoreticalframework that prompted this approach isthe interactive schema theory of Rumelhart(1977). In this model, the perceptual-cog-nitive system is viewed as a collection of ac-tive processing units (schemata). The sche-mata are data structures that are activated

either by other schemata (top-down) or byincoming sensory information (bottom-up).In Rumelhart's model, the perceptual pro-cess is essentially an evidence-gatheringtask; perception occurs when a given schemahas received sufficient evidence. The piecesof evidence can come from both bottom-upand top-down sources. In general, the top-down information can be characterized asexpectations, and the bottom-up as confir-mation. The restoration phenomenon itselfis clear evidence for top-down processing ofspeech; the listener uses the linguistic con-text to restore the appropriate (expected)phoneme. However, the illusion also dependson bottom-up factors; the amplitude, spec-trum, and very presence of the replacementsound can determine the success of resto-ration. Thus study of a product of the speechrecognition process, restoration, may provideinsights into the general working of thesystem.

Within this framework, a number of pro-cessing issues are considered. In the first ex-periment several factors that could affecttop-down and/or bottom-up analysis ofspeech are manipulated: word frequency andlength, and phone class and location withinthe word of the replaced sound. These fac-tors are examined using words produced inisolation, in order to factor out the influenceof variables at levels higher than the lexical.In the second experiment, a priming para-digm is used to investigate the role of lexicaland phonological knowledge in the restora-tion phenomenon. The final experiment ex-amines the effects of memory load and sen-tential context on phonemic restoration.

Experiment 1

The basic hypothesis of the present studyis that phonemes are restored on thebasis of expectations and confirmation; thestronger the contextual constraints, thegreater the expectations should be, and theless bottom-up confirmation that should beneeded.

Four factors are experimentally manipu-lated in order to vary the level of expectationand confirmation of a particular phoneme.First, high-frequency words are comparedwith low-frequency words. High-frequency

PHONEMIC RESTORATION 477

words are more easily retrieved from the lex-icon. To the extent that lexical informationprovides expectations of constituent pho-nemes, more restoration should occur inmore frequent words. Second, two-syllable,three-syllable, and four-syllable words arecompared. To the extent that a longer wordprovides more context, restoration should beenhanced. Third, five different classes ofphonemes are investigated: liquids, stops,nasals, fricatives, and vowels. The replace-ment-added sound in the experiment is aburst of white noise. The degree of similaritybetween this noise and the five phone classesproduces a variation in bottom-up confir-mation; the fricatives (and to a lesser extentthe stops) should be most easily restoredwhen white noise is the inserted sound. Thefinal factor is the position within a word ofthe replacement-addition: word-initial, word-medial, and word-final phonemes are ma-nipulated. Several predictions can be testedfor this factor. If expectations can be gen-erated on-line and very quickly, then word-final position should exhibit the most res-toration, since the earlier parts of the wordserve as context. If simple forward or back-ward masking is involved, word-medial po-sition should be most susceptible. Finally,several theorists have recently argued thatthe initial syllable is disproportionately im-portant in lexical access (Cole, 1973; Foss& Blank, 1980; Marslen-Wilson & Tyler,1980; Marslen-Wilson & Welsh, 1978; Taft,1979). If this is so, we may expect listenersto be particularly sensitive to any changesmade in word-initial position.

MethodStimuli. Each test word was spoken clearly by the

author and recorded on audiotape. The word was thenamplified, low pass filtered (5 kHz), digitized (12-bitA/D at 10 kHz sample rate), and stored on a PDP-11disc file. Using a waveform editor, two versions of theword were constructed from the stored file, a replace-ment item and an added item. Figure 1 presents oscil-lograms of the critical segment of the original, added,and replaced versions of the word "funerals." In thisexample, the medial /n/ was the target phoneme. Tocreate the test items, the /n/ was located visually onan oscilloscope and auditorily over headphones. Onceits general location was determined, the segment wasbracketed by two pointers according to the followingcriteria: (a) when the part of the word before the pointerto the segment onset was played, no trace of the target

phone was audible; (b) when the part of the word afterthe pointer to the segment offset was played, no traceof the target phone was audible; (c) on the oscilloscope,all parts of the pattern characteristic of the target phonewere included between the onset and offset pointers. Asis evident in Figure 1, satisfying (a) and (b) generallyinsured (c). In fact, due to coarticulation, substantialportions of the adjacent phonemes were virtually alwaysdeleted to meet these criteria. Using this procedure pro-duced critical segments 50-279 msec long.

Once the target phone was properly bracketed, thedigital root mean square amplitude (DRMSA) of its cen-tral 20 msec was computed. To create the added version,a random number was added to each point within thetarget segment (this is the digital equivalent of mixingin white noise). Each random number was between+DRMSA and -DRMSA. Thus the amplitude of the noiseadded to (or replacing) the target was a function of theamplitude of the target itself. Using this procedure in-sured that the S/N ratio would be constant for all tar-gets, regardless of the original amplitude of the target.The panel labeled ADDED in Figure 1 shows the resultsof this procedure. The added version of the word wasstored on disc file. The replaced version was constructedby simply replacing each point within the bracketed seg-ment with a random number between +DRMSA and-DRMSA.' The REPLACED panel in Figure 1 illustratesthe results. The replaced item was also stored on discfile.

Replaced and added versions of 90 test words wereconstructed. Four factors were represented in these 90words:

1. Word frequency: 45 high-frequency words (100-300 occurrences/million) and 45 low (1 occurrence/million, Kucera & Francis, 1967). The low-frequencywords were all recognizable English words.

2. Word length: 30 two-syllable, 30 three-syllable,and 30 four-syllable words.

3. Phone class of the replaced/added phoneme: 18liquids, 18 stops, 18 vowels, 18 fricatives, and 18 nasals.

4. Phone position: 30 word initial, 30 word medial,and 30 word final. The 90 words thus represent a 2 X3 X 5 X 3 factorial crossing of these four factors (Fre-quency X Length X Class X Position).

Matched to the 180 test-word items (90 added and90 replaced) were 180 control segments. Each controlsegment was the segment of the test word that containedwhite noise. Thus, for replacement items, the controlsegment was the segment of white noise that had re-placed a phone; for added items, the control segmentwas a mixture of speech and noise.

The control segments were used to determine thebasic discriminability of the added and replaced versionsof the test words. Since restoration will be measured bythe difficulty of discriminating added and replaced testwords, it is important to obtain a measure of the dis-criminability of the critical segments outside of their

' The amplitude of the replacement sounds in thepresent study was generally lower than that of replace-ment sounds in previous studies by Warren (e.g., 1970)and his colleagues. This may reduce the amount of res-toration observed.

478 ARTHUR G. SAMUEL

linguistic context. With other appropriate controls, gooddiscriminability of the control segments, along with poordiscriminability of the test words, is evidence for res-toration.

Table 1 presents a summary of some of the importantacoustic characteristics of the test words and controlsegments.

Apparatus. All stimuli were stored on disc file in aPDF-11/45 computer. For presentation to the subjects,they were output through a 12-bit D/A converter, am-plified, low pass filtered (5 kHz), and played binaurallyover stereo headphones. Subjects heard the stimuli inindividual acoustically shielded booths and respondedby pressing one of two labeled buttons, using the firsttwo fingers of the right hand. All experimental eventswere controlled by the PDP-11.

Procedure. On each trial, subjects heard one testitem (a word or a segment). For words, they were toldto push one button if the noise replaced part of the word,and the other if it coincided with part of the word. Forsegments, they pushed one button if they heard justwhite noise, and the other if the white noise was mixedwith another (speech) sound. The nature of the stimuliwas fully explained beforehand. Subjects were told torespond as quickly as possible without sacrificing ac-curacy. They were instructed to guess when necessary;a response was required on each trial. Groups of 1-4

subjects were tested simultaneously. When all subjectshad responded (or 5 sec had elapsed), the computerwaited 1 sec, and then initiated the next trial.

Words and segments were presented in blocks. Twentypractice words (half added, half replaced) and twentypractice segments (half noise, half noise plus speech)were presented first. The 90 test words were then pre-sented in a random order. On each trial, it was randomlydetermined whether the added or replaced version of theword would be presented. After the word block finished,a block of 90 segments was run. These segments weretaken directly from the test words that had just beenpresented; if the sixth test word was the replaced versionof "modern," the sixth segment would be the noise burstthat had replaced a phoneme in "modern." The blockof segments was followed by another block of 90 wordsand a block of 90 segments containing the versions ofthe words and segments not presented in the first pass.The design was thus within subject; each subject re-ceived both versions of each word and the control seg-ments from each version.

All trials were run in a single session that lasted ap-proximately 20 min.

Subjects. Twenty subjects participated in Experi-ment 1. All were native English speakers with no knownhearing problems. They received course credit for theirparticipation.

ADDED

ORIGINAL

REPLACED

Figure 1. Oscillograms of a part of the word "funerals." The left panel is from the original digitizedversion of the word, and includes the /n/ and part of the preceding /u/ and succeeding/a/. (The ADDEDpanel is the same part of the word, with digital white noise superimposed on the critical segment. TheREPLACED panel is the version of the word with digital white noise replacing the critical segment.)

PHONEMIC RESTORATION 479

Results

Three related measures of restorationwere calculated from the data. The miss rateis the probability that a replacement itemwas labeled added(or, for the segments, thata noise burst was labeled noise plus speech).This measure is essentially the same as theaccuracy measure used in previous studies.However, it is more valid in this study be-cause subjects were presented with itemsthat really were intact. The primary measureof restoration is d1, the discriminability ofadded and replaced versions of the samewords. To the extent that d' is near zero, thetwo versions are not discriminable; the in-ference is that they sound alike because themissing phoneme is being restored in the re-placed version. This inference is renderedconsiderably more plausible if the d' for thecorresponding segments is far from zero. Ahigh d' for segments indicates that in isola-tion, the critical segments do not sound alike.In particular, it rules out a simple masking

Table 1Stimuli Analysis for Words and Segments inExperiment 1

Table 2Mm, d', and Beta for Words in Experiment 1

Stimulus type

FrequencyHighLow

LengthTwo-syllableThree-syllableFour-syllable

Phone classLiquidStopNasalFricativeVowel

PositionInitialMedialFinal

Wordduration

660651

585655725

624707649717580

668666633

Segmentduration

144133

140141135

127114127193131

115107193

Segmentamplitude

745597

693557763

779610697268

1000

825819369

Note. The word and segment durations are in millisec-onds. The segment amplitudes are the means of the dig-ital RMS amplitudes (DRMSA), expressed in arbitraryunits. These values are scaled such that vowels have amean of 1000; the numbers should be treated as relativevalues.

Stimulus type

FrequencyHighLow

LengthTwo-syllableThree-syllableFour-syllable

Phone classLiquidStopNasalFricativeVowel

PositionInitialMedialFinal

Miss

4441

3443SO

3460406315

305641

d'

.881.06

1.101.02.77

1.47.36

1.13.36

1.69

1.00.89

1.08

Beta

1.281.36

1.151.391.37

1.571.171.431.20.72

.981.711.37

Note. The miss rates are percentages; d' and Beta arein absolute units.

explanation for the low d' for words.2 Thethird measure, Beta, is an index of bias. ABeta greater than 1.0 reflects a tendency toreport that the word was intact, regardlessof whether the item was a replaced or addedversion; values less than 1.0 mean the reverse(for segments, a Beta greater than 1.0 meansa bias toward saying "noise plus speech").High Beta values reflect a form of responsebias in the postperceptual decision process.

Table 2 presents the miss, d', and Betascores for the words, broken down by wordfrequency, word length, phone class, andphone position. The scores for each factorwere derived by pooling all of the data fromall of the subjects, and collapsing across allother factors (this was necessary to assurean adequate number of observations in eachcell). A matrix of the same form as Table2 was also calculated for each subject. Usingthese scores, 12 one-way analyses of variancewere conducted, 4 factors (frequency, length,class, and position) X 3 measures (miss, d',and Beta).3

2 An explanation based on forward and backwardmasking is not ruled out by this control; it is consideredfurther in Experiment 2.

3 Separate one-way analyses of variance were con-ducted for each factor, rather than the usual multifactor

480 ARTHUR G. SAMUEL

The effect of word length was as pre-dicted; longer words provided more contextand greater restoration. The large effect onthe miss rate, F(2, 38) = 19.06, p < .001,is due to both perceptual, d, F(2, 38) = 5.20,p = .01, and postperceptual factors, Beta,F(2, 38) = 6.88, p<. 01.

As can be seen in Table 2, the effect offrequency was less • clear-cut. An expecta-tion-based model predicts more restorationof phonemes in high-frequency words thanlow and thus worse performance. In fact, themiss rate for high frequency words is slightlyhigher, but not significantly so, F(l, 19) =2.85, p > .10. The frequency conditions alsodid not differ in bias, F(l, 19) = 1.89, p >.10. The d measure does reach significance,F(l, 19) = 7.21, p<.05, suggesting thatmore frequent words, through more efficientlexical access, yield more restoration. How-ever, the effect is not very robust; under sim-ilar conditions, Samuel (1981) found no dif-ference on any of the three measures as afunction of word frequency. Thus, accep-tance of a frequency effect must be consid-ered tentative, pending further data.

Even a cursory inspection of Table 2 re-veals the massive effect of phone class onrestoration. As predicted by simple acousticfactors, fricatives and stops are very diffi-cult, whereas the more periodic phones arerelatively easy. Recall that restoration ishypothesized to be due to the bottom-up con-firmation of the listener's expectations. Sinceexpectations of each phone class were pre-sumably equivalent, the data suggest that aburst of white noise is more effective at con-firming the presence of fricatives or stopsthan liquids, nasals, or vowels. The analyses

analyses, in order to assure an adequate number of ob-servations in each cell for each subject. The signal de-tection parameters d and Beta are extremely nonlinearfor miss and false alarm rates near 0% or 100%; suchextreme rates were fairly common for individual sub-jects when the data were factorially divided, since veryfew observations were obtained in some cells (e.g., eachcell of the Phone Class X Position interaction contained90/[5 X 3] = 6 observations). In order to avoid the un-manageable noise such an analysis would produce, sep-arate analyses for each factor were necessary. This nec-essarily implies the loss of any interactions that mightbe present. However, few of the factors would be ex-pected to interact (perhaps position with the others),and no crossover interactions are predicted by any cur-rent theory. Thus the main effects are sufficient.

Table 3Miss, d', and Beta for Segments in Experiment 1

Stimulus type Miss d' Beta

FrequencyHighLow

LengthTwo-syllableThree-syllableFour-syllable

Phone ClassLiquidStopNasalFricativeVowel

PositionInitialMedialFinal

69

787

69868

1094

2.712.60

2.832.522.61

3.121.973.082.553.16

2.542.772.80

.54

.86

.86

.65

.63

1.03.48

1.32.44

1.82

.901.04.31

Note. The miss rates are percentages; d' and Beta arein absolute units.

of variance for the phone class factor bearthis out: miss, F(4, 76) = 51.97, p < .001;d', F(4, 76) = 43.60, p<.001; Beta, F(4,76) = 3.50, p < .05. The effect is thus pri-marily one of discriminability rather thanbias.4

An examination of the comparable datafor the control segments (Table 3) aids ininterpreting the phone class result. The over-all level of performance for the segments ismuch higher than that for the words, indi-cating that in general the acoustic infor-mation needed for discrimination of addedand replaced versions was present. A two-way analysis of variance (Words vs. Seg-ments X Phone Class) confirms the superiordiscriminability of the segments, F(l,19)= 170.51, p<. 001. Taking this absolutedifference in level into account, we may lookat the relative levels of performance withina given factor. Looking at the d's for seg-ments of the various phone classes, we find

4 Samuel (1981) has replicated this pattern usingwhite noise as the replacement sound. When a 1000-Hztone was used instead, vowels were slightly better re-stored than fricatives, the reversal predicted by the con-firmation view. The overall level of restoration was, how-ever, greater with white noise than with the pure tone,suggesting that spectral similarity is weighted moreheavily than periodicity.

PHONEMIC RESTORATION 481

that stops and fricatives are relatively dif-ficult even in isolation. This result is no doubtdue to the fact that stops and especially fric-atives share more acoustic properties withthe noise than do the other phone classes;they are also of relatively low amplitude (seeTable 1) and more difficult to discriminate.

Although the segment data provide a par-tial explanation for the pattern of results onthe words, there apparently is another factorto consider. Performance on the periodic seg-ments was perhaps 50% better than that forstop and fricative segments; the comparabledifferences for words are around 300%. Theanalysis of variance reveals that this, inter-action of lexical status and phone class isreliable, F(4, 76) = 4.52, p < .01. Differen-tial restorability, because of the acousticdifferences, could account for the additionalspread in performance.

The analysis of performance as a functionof phoneme position revealed several inter-esting results. Although discriminability didnot vary reliably, F(2, 38) = 1.33, p > .20,the differences in miss rates for initial, me-dial, and final phonemes were significant,F(2, 38) = 26.69, p< .001. This pattern wasdue to a significant difference in bias acrossthe three positions, F(2, 38) = 4.99, p < .05.The bias results suggest that subjects aregenerally inclined to report "intact" unlessthere is evidence to the contrary. If a noiseburst occurs near the middle of a word,where forward and backward masking re-duce perceptibility, subjects tend to reportthat nothing was removed. Given this overallbias toward reporting "intact," the Betavalue for initial phonemes (.98 vs. 1.71 formedial and 1.37 for final) indicates that thepresence of an initial noise burst (added orreplacing) biases the decision process towardreporting that something was amiss. Thispattern was predicted from the view that lex-ical access is dominated by word-initial pho-nemes; any disruption in initial position isparticularly noticeable and therefore leadsto a bias toward saying "replaced."

No support was found for a rapid on-lineeffect of lexical context. Such an effectwould produce more restoration in final po-sition, since earlier parts of the word wouldprovide contextual cues. Neither the d' northe miss rate measure revealed this result.Apparently, in the isolated word situation

tested here, within-word context cannot pro-duce the expectation and confirmation re-quired for restoration.

Discussion

The results from the first experiment pro-vide support for the schema-based modeloutlined earlier. The fact that more resto-ration was found for longer words than forshorter words supports the view that resto-ration is a function of context; the greaterthe context, the greater the expectation, thegreater the restoration. A small effect ofword frequency was also obtained, providingsome additional support for the role of ex-pectations.

The importance of bottom-up confirma-tion was perhaps even greater than that ofexpectation in this single-word situation. Alarge effect of phone class was obtained, sug-gesting that the replacement sound must becompatible with the phoneme it is to restore.Even more basic acoustic factors, such asforward and backward masking, also appearto play some role. Thus, the pattern of resultsindicates that phonemic restoration occurswhen the bottom-up signal (e.g., white noise)is sufficient to confirm the presence of aschema that is expected on the basis of con-text (e.g., the rest of the word). There is atrade-off between these two factors; whenexpectations are strong, less confirmation(i.e., an acoustic signal less like the actualphoneme) is needed for restoration, and viceversa.

Examination of the results for the variousfactors manipulated in Experiment 1 revealshow critical it is to factor changes in thesimple miss rate into discriminability andbias components. Consider, for example, thephone class and position factors. If only missrates were considered, one would be inclinedto conclude that these two factors can havesimilar effects, for instance, that fricativesand medial phones both tend to be well re-stored. The signal detection analysis showsthat these factors operate in different ways;the high miss rates for fricatives are in factdue to true perceptual restoration, whereasthe high medial miss rates are the result ofa postperceptual response bias.

This response bias can be difficult to in-terpret theoretically: What does it mean to

482 ARTHUR G. SAMUEL

be more biased toward reporting utterancesas intact in some situations than in others?I believe its interpretation is made clearerby an introspective report. In many cases,the real answer to the question, "Was theutterance intact?" is "I don't know." Thatis, listeners often do not have a sufficientlydetailed acoustic representation of the stim-ulus available to make the required judg-ment. Under these circumstances, bias ef-fects will appear. The part of the cognitivesystem responsible for making the requiredchoice uses all of the sources of informationavailable, including questioning whetherthere was anything "wrong" with the stim-ulus. The answer is likely to be yes in initialposition and no in medial. In contrast, theeffect of phone class is truly perceptual andleads to some phonemes being restored bet-ter than others. Vowels, for example, are notrestored very well, leading to an incompleteword percept; this percept is available forthe required judgment and leads to an im-provement in the d' measure. As the resultsfor the word length factor show, a particularmanipulation can affect both true perceptualrestoration and postperceptual bias. It is thuscritical to be able to pull these effects apartto understand the restoration illusion.

Experiment 2

In the single word situation used in thefirst experiment, the kinds of knowledge thatcan be brought to bear are clearly more lim-ited than in the usual discourse situation;syntax, semantics, and pragmatics are ruledout. However, listeners still have at least twopowerful knowledge sources. First, they pos-sess detailed (though probably implicit)knowledge of the allowable sequences ofphonemes in English. Second, they have arather detailed representation of the soundof each word in their memory. The secondexperiment is concerned with the role thatthese phonological and lexical knowledgesources play in the perception of speech.

One way to separate the contributions ofthese two factors is to use stimuli that followthe phonological rules of English but haveno lexical entry in memory—pseudowords.The intuitive prediction is that if the samesort of restoration task is run with pseudo-words and words, performance will be better

on words than on pseudowords because ofthe familiarity of the words; the wordsshould be easier to encode and examine, per-mitting better discriminability of added andreplaced items. However, the top-down com-ponent of the schema model predicts the re-verse; because the words are more familiar(i.e., have a lexical entry), listeners will bebetter able to generate expectations of de-leted phonemes and should therefore restoremore, producing poorer discriminability ofthe two versions.

There is a serious problem in running theexperiment as it has just been outlined. Thepseudqjvord condition is at a fundamentaldisadvantage because the words can be doneby process of elimination, whereas pseudo-words cannot. For example, consider theword item "basis" in which the final "s" hasbeen replaced. If no restoration occurs, thelistener hears "basi*" and can answer "re-placed" because "basi*" is not a word, evenif he or she thought the noise burst was ontop of another phoneme. In the comparablepseudoword condition, "pafis" might be pre-sented with the "s" replaced, yielding"pafi*." If the noise is mislocalized (as inthe word case), the listener should report"added." Thus, because no lexical backupstrategy is available, an advantage exists forthe word condition if the experiment is runthis way.

To overcome this problem, a cuing para-digm is used in Experiment 2. In this par-adigm, the intact version (neither added norreplaced) of a test item is presented first,followed by the usual added or replaced ver-sion. For the pseudoword example given ear-lier, the trial would thus become "pafis" fol-lowed by "pafi*" or "pafi[*s]" (* = re-placement sound, [*s] = replacement soundadded to "s"). The listener thus knows whatthe original was and can use the same strat-egies in both conditions.

For the word items, the cuing paradigmcan also be thought of as a priming condi-tion, since the cue word should activate anentry in the lexicon. Again, the intuitive pre-diction clashes with an expectation-drivenmodel's prediction. The simplest predictionwould be that knowing what word was com-ing should help the subject to know what tolisten for. The top-down alternative is thatdespite this advantage, performance should

PHONEMIC RESTORATION 483

actually be worse because priming a wordincreases the expectation of each phoneme,increasing restoration.

Method

Stimuli and design. To address the various issuesjust discussed, three conditions were included in Ex-periment 2. Each condition was similar to Experiment1 in format. In all of the conditions, the amplitude-matched white noise replacement procedure of Exper-iment 1 was used. All new stimuli were recorded by thesame speaker under the same conditions as the wordsused in Experiment 1 and were constructed using thesame digital editing techniques.

The unprimed pseudoword condition was identical toExperiment 1, except that each word was replaced bya matched pseudoword. Each pseudoword was con-structed by changing one phoneme per syllable in theoriginal word to a phonologically legal different pho-neme from the same phone class (stops replaced stops,fricatives replaced fricatives, etc.). The critical phonemewas never changed, and was used as the added/replacedphoneme in the pseudoword as well. This procedure in-sured that the pseudowords were very closely yoked tothe original words but were nevertheless nonlexicalitems. Each pseudoword was pronounced with the samestress pattern as the original word it replaced and ofcourse had the same number of syllables. Some exam-ples of word-pseudoword pairs (with the critical pho-nemes capitalized) are "prOgress-crOgless," "basiS-pafiS," "Mildew-Molbew," and "acTivity-ecTathigy."

Table 4 presents an analysis of the pseudowords andtheir control segments analogous to the breakdown ofthe words in Table 1. The similarity between the wordsand pseudowords is evident in the values in the twotables. The only consistent difference is a 10% length-ening of the pseudowords, probably produced by theirunfamiliarity.

The unprimed pseudoword condition was included todetermine how well listeners can do the added/replacedtask in a situation that calls for actually determiningwhether the noise is superimposed on a phoneme. If thisproves very difficult, it suggests that the primary strat-egy used on the words was the "failure to restore =nonword = replaced" strategy discussed earlier. Notethat with this strategy, the methodology measures ex-actly what we would like: Failure to restore produces"replaced" responses, whereas "added" is reported whenan intact word is heard (restored).

The same pseudowords were used in the pseudoword/1500 condition, but on each trial, the intact version pre-ceded the added or replaced test item. The time fromonset of the cue pseudoword until onset of the test itemwas 1500 msec (since the cues were approximately 700msec long, there was about 800 msec from cue offsetto test item onset).

The word/1500 condition was comparable to the pseu-doword/1500 version, except that the test words fromExperiment 1 were used (the stimuli were in fact iden-tical). The comparison between performance in this con-dition and the pseudoword/1500 condition should clar-ify the role lexical knowledge plays in phonemicrestoration. The comparison between performance in

Table 4Stimuli Analysis for Pseudowords andSegments in Experiment 2

Pseudoword Segment SegmentStimulus type duration duration amplitude

FrequencyHighLow

LengthTwo-syllableThree-syllableFour-syllable

Phone classLiquidStopNasalFricativeVowel

PositionInitialMedialFinal

720724

656717793

713757707771662

726742698

152145

155149142

136144131195137

124123198

707660

696652702

716569835298

1000

802895354

Note. The pseudoword and segment durations are inmsec. The segment amplitudes are the means of thedigital RMS amplitudes (DRMSA), expressed in arbi-trary units. These values are scaled such that vowelshave a mean of 1000; the numbers should be treated asrelative values.

this condition and performance on the unprimed wordsof Experiment 1 should reveal the effect on restorationof priming a word.

Apparatus and procedure. The apparatus was thesame as that used in the previous experiment. The pro-cedure was also the same except that in the cued con-ditions, subjects were told that they would hear a targetfirst, followed by a test item. They were told to use thetarget to help decide whether the test item was an addedor replaced item.

The segment controls followed the same pattern asthe words (or pseudowords). For example, a replacedsegment trial in the pseudoword/1500 condition wouldinclude the target phoneme (taken from the originalintact pseudoword) followed by the noise burst that re-placed that phoneme. The onset to onset time would be1500 msec.

Sessions for the cued conditions lasted approximately35 min.

Subjects. Sixty subjects participated in Experiment2, 20 in each of the three conditions. All were nativeEnglish speakers with no known hearing problems. Theyreceived $2 or course credit for their participation.

Results

Table 5 and Table 6 present the miss, d',and Beta values for the words and pseudo-words, and their control segments.

As might be expected, performance in the

484 ARTHUR G. SAMUEL

Table 5Miss, d', and Beta for Words and Pseudowords in Experiment 2

Unprimed pseudowords

Stimulus type

FrequencyHighLow

LengthTwo-syllableThree-syllableFour-syllable

Phone ClassLiquidStopNasalFricativeVowel

PositionInitialMedialFinal

Miss

5448

495252

5054525841

476145

d'

.26

.43

.25

.44

.33

.19

.31

.37

.44

.41

.32

.30

.42

Beta

1.061.06

1.031.131.07

1.011.091.091.20.99

1.031.131.03

Pseudoword/1500

Miss

3537

363338

2842276417

344429

d'

1.251.30

1.261.361.19

1.651.051.65.51

1.65

1.191.191.45

Beta

.32

.48

.41

.37

.40

1.451.391.431.38.81

1.251.661.29

Miss

4248

434746

3564456120

345843

Word/ 1500

d'

.79

.82

.92

.72

.78

1.31.17

1.04.03

1.63

.94

.83

.71

Beta

1.171.34

1.281.221.24

1.421.081.481.01.96

1.061.681.13

Note. The miss rates are percentages; d' and Beta are in absolute units.

unprimed pseudoword condition was ex-tremely poor. Listeners apparently were un-able to make an absolute added/replacedjudgment. Rather, the strategy generallyused was to determine whether the speechstimulus heard matched a representation inmemory. If not, "replaced" was reported.Since this strategy was preempted in theunprimed pseudoword condition, perfor-mance was near chance.

The central question of Experiment 2 waswhether lexical knowledge increases resto-ration. The answer seems to be yes; discrim-inability of added/replaced versions ofprimed pseudowords (pseudoword/1500) was50% better than that for the comparableword condition. To test this difference sta-tistically, a two-way analysis of variance wasconducted using Word versus Pseudowordand Phone Class as the factors (the PhoneClass factor was used to pull out variancefrom the effect of interest). The higher missrates for words than pseudowords, F(l,38) = 8.05, p < .01, was entirely due toperceptual restoration (d1), F(\, 38) = 14.19,p < .001; the postperceptual bias showed anonsignificant trend in the opposite direc-tion, F(l, 38) = 3.14, p > .05. Despite thelarge familiarity advantage that words en-

joy, the availability of a lexical entry appearsto increase expectations sufficiently to pro-duce a discriminability disadvantage for thewords.

One caveat needs to be added at this point.Examination of Table 6 reveals that perfor-mance on the primed pseudoword segmentswas also considerably better than that on theprimed word segments; there was also anunusual pattern of bias effects. The reasonfor these differences is not clear—perhapsthe slight lengthening of pseudowords (andthus their segments) relative to words en-hanced performance. The segment controlsfrom the unprimed pseudoword conditionwere intermediate in difficulty, easier thanthose for the words, but harder than thosefor the primed pseudowords, and the biaswas in the normal range. In any event thereis a large difficulty factor (due to familiar-ity) favoring the words over the pseudowordsthat does not operate in the segment con-ditions. In resource allocation terms, thepseudowords should require more resourcesto process, leaving less capacity for the dis-crimination judgment; this factor shouldhave pushed performance on the words andpseudowords in the opposite direction fromthe results obtained. Thus it seems reason-

PHONEMIC RESTORATION 485

able to tentatively accept the original con-clusion that the existence of a lexical entryfacilitates restoration and thus hurts dis-criminability. However, a replication of thepseudoword advantage is clearly needed be-fore any strong claims can be made.

It seems reasonable that telling the sub-jects what word to listen for should be help-ful, since the listener can presumably set upsome sort of auditory image and see whetherthe test item matches it. However, in a sys-tem with expectation-driven processing, thisstrategy could backfire; the auditory imageis in effect an expectation and thus shouldincrease restoration, hurting performance.A comparison of discriminability in theword/1500 condition and the unprimed con-dition of Experiment 1 shows that perfor-mance is in fact worse when subjects knowwhat word to expect. A more detailed com-parison of the primed and unprimed condi-tions reveals that this degradation in per-formance varies with position of the criticalphoneme. Recall that in several current the-ories (e.g., Cole, 1973; Foss & Blank, 1980;Marslen-Wilson & Welsh, 1978), lexicalaccess is based on information in the firstfew phonemes of a word; these theoriesshould predict more restoration (and thus

poorer discriminability) for word-final pho-nemes than for initial ones, since the earlypart of a word provides context for laterparts. For unprimed words, no such discrim-inability effect was observed. For primedwords, however, the predicted pattern ap-pears, with restoration increasing from ini-tial through final position. In a two-wayanalysis of variance (Primed vs. UnprimedX Phone position), this interaction was mar-ginally significant, F(2, 76) = 2.65, .05 <p< .10.

To see whether the degradation due topriming was reliable, the primed word con-dition was replicated. In the replication, theinterval between cue offset and target onsetwas held at 200 msec; all other aspects ofthe experiment were unchanged. The d' val-ues for initial, medial, and final position were1.07, .84, and .78, respectively, replicatingthe Priming X Position interaction. In thereplication, the interaction reached signifi-cance, F(2, 76) = 4.06, p < .02. Thus, therole of the beginnings of words appears tobe confirmatory; in isolation, there are noexpectations to confirm, whereas in the prim-ing condition, listeners expect a particularword, and when their expectations are con-firmed, the ending is restored.

Table 6Mm, d', and Beta for Segments in Experiment 2

Unprimed pseudowords

Stimulus type

FrequencyHighLow

LengthTwo-syllableThree-syllableFour-syllable

Phone ClassLiquidStopNasalFricativeVowel

PositionInitialMedialFinal

Miss

67

685

67838

6103

d'

3.102.78

3.132.852.85

3.452.663.162.733.65

3.212.633.12

Beta

.91

.74

.981.04.53

1.96.67

1.51.22

4.23

1.121.07.37

Pseudoword/ 1 500

Miss

68

697

351

223

47

10

d'

3.673.31

3.883.283.33

4.21•3.344.552.444.12

4.003.443.14

Beta

2.742.33

4.542.061.91

2.581.141.283.002.48

3.062.282.33

Miss

89

799

596

158

78

11

Word/ 1500

d'

2.452.34

2.722.292.23

3.142.103.051.323.42

2.712.611.97

Beta

.61

.67

.76

.61

.59

.64

.54

.80

.612.43

.66

.68

.62

Note. The miss rates are percentages; d' and Beta are in absolute units.

486 ARTHUR G. SAMUEL

Discussion

The major finding of Experiment 2 is thatlexical access from speech is both a top-downand bottom-up process, even in the relativelyimpoverished isolated-word situation tested.The availability of a lexical representationproduced a 50% decrease in discriminabilityof replaced and added items, reflecting therole of lexical knowledge in the restorationphenomenon. Put another way, because ofknowledge at the word level, the perceptionof the sounds of speech was changed.

This perception was also affected by cuingthe subjects as to what word to expect. How-ever, the effect of such cuing was to hurtperformance, rather than its usual effect ofhelping. This reversal is predicted by a top-down model of speech perception, since thecue served to prime a lexical unit, increasingthe expectation of its component sounds.

It is important to note that these effectsare perceptual effects; they appear in thelistener's ability to discriminate intact frominterrupted words or pseudowords. The post-perceptual bias measure was constant acrossthe conditions under study, indicating thatlexicality and priming have little effect onthe decision stage. Thus, the data supporta truly top-down effect, in which higher-levelknowledge affects the operation of a lowerlevel process.

Experiment 3

In the first two experiments, the added/replaced methodology illustrated the rolethat acoustic, phonological, and lexical fac-tors play in the perception of isolated words.In the final experiment, this methodology isextended to investigate phonemic restoration(and thus speech processing) in a more in-teresting domain—sentences.

The general question being addressed ishow higher level context influences whatword (i.e., ordered set of phonemes) is heardby a listener. The basic technique is a simpleextension of the added/replaced methodol-ogy to the sentence level, with one importantmodification. In the word-level experiments,the test words were selected in such a waythat if a replaced phoneme were restored,a unique word (the original) would beformed. In the sentence level version, re-

placement items were always ambiguous;restoration could produce at least two dif-ferent words. For example, the pair of words"battle" and "batter" formed a basic wordpair in this experiment. In the replacementversions of these words the final (syllabic)liquids were replaced, leaving "bat*" in eachcase. The added or replaced versions of thesewords occurred in sentences such as "Thesoldier's thoughts of the dangerous bat*made him very nervous" or "The pitcher'sthoughts of the dangerous bat* made himvery nervous." In addition to the added/re-placed judgment, subjects made a forced-choice word judgment (in this example, be-tween "battle" and "batter"). The use ofsentence pairs with multiply restorable wordsprovides information on which phoneme isrestored, in addition to the rate of restora-tion. Sherman (Note 1) first used multiplyrestorable stimuli and reported that the sen-tential context strongly influenced whichphoneme was restored; this was true even ifthe critical sentential context occurred afterthe restoration item. Experiment 3 investi-gates restoration of multiply restorable wordswith the improved methodology used in thefirst two experiments.

Two theoretical issues are considered inExperiment 3. First, what effect does pre-dictability of a word have on phonemic res-toration? The results of the priming condi-tion in Experiment 2 suggest that makinga word predictable should increase restora-tion, causing a drop in discriminability ofthe added and replaced versions. However,the listeners' predictions in the present ex-periment are cued by semantic/syntactic in-formation; it remains to be seen whetherthese sources of information affect restora-tion in the same way that lexical knowledgedoes. It is possible, for example, that thesehigher level information sources have effectsat higher levels in the task. If so, predict-ability might have its primary effect on Beta,the index of postperceptual bias, rather thanon the perceptual index d.

To manipulate predictability of a word,quadruplets of sentences were used. For ex-ample, in addition to the two predictablesentences mentioned earlier for the "bat-tle"-"batter" word pair, the unpredictablesentences "The soldier's thoughts of the dan-gerous batter made him very nervous" and

PHONEMIC RESTORATION 487

"The pitcher's thoughts of the dangerousbattle made him very nervous" were in-cluded. In these sentences, the disambigu-ating cues ("pitcher's" and "soldier's") havebeen switched, making the critical wordmuch less predictable.

The second factor considered in Experi-ment 3 is the role that memory load playsin the restoration illusion. Pisoni's (1973)work at the syllabic level has shown thatspeech is rapidly reencoded, with a loss ofacoustic detail. A similar reencoding (per-haps at a higher level) could account forsome of the observed restoration results. Totest this possibility, Experiment 3 was runtwice. In one version, the sentence stoppedas soon as the critical word had been pre-sented, and subjects made their two judg-ments immediately. Comparing this condi-tion with one in which subjects delayed theirresponses until the sentence finished providesa measure of the role of memory factors inthe restoration phenomenon.

In summary, Experiment 3 investigatestwo factors potentially affecting phonemicrestoration in sentences: predictability of thecritical word and memory load. In addition,the phone class and phoneme position factorswere carried over from the preceding exper-iments, since they provided particularly in-teresting results.

Method

Stimuli. Twenty-seven basic word pairs were usedto generate all of the sentences and fragments used inExperiment 3. The 27 pairs included three instances ofeach of the nine conditions created by factorially cross-ing three phone classes (liquid, stop, and vowel) andthree positions within the word (initial, medial, and fi-nal). For example, there were 3 basic word pairs inwhich the added/replaced phoneme was an initial liq-uid—"liver-river," "loyal-royal," and "locket-rocket."All basic words were two syllables long.

For each of the 27 basic word pairs, 16 test stimuliwere constructed. The 16 stimuli were broken down asfollows: There were the two memory load conditionsoutlined above—the full sentence and fragment condi-tions. To satisfy the ambiguously restorable and pre-dictability constraints, four versions of the sentenceswere used; each of the two basic words appeared withits own disambiguating context (e.g., battle-soldier) andwith the context of its mate (e.g., battle-pitcher). Forthe battle-batter pair, the full sentences were as follows:the soldier's thoughts of the dangerous battle made himvery nervous; the pitcher's thoughts of the dangerousbatter made him very nervous; the pitcher's thoughtsof the dangerous battle made him very nervous; and the

soldier's thoughts of the dangerous batter made himvery nervous. The fragments were as follows: the sol-dier's thoughts of the dangerous battle; the pitcher'sthoughts of the dangerous batter; the pitcher's thoughtsof the dangerous battle; and the soldier's thoughts ofthe dangerous batter. Each of these 8 stimuli was pre-sented in both an added and a replaced form. Thus atotal of 432 (27 X 16) stimulus items were used in Ex-periment 3.

For each word pair, the 16 test stimuli were con-structed from four utterances recorded by the samespeaker who produced the stimuli in the other experi-ments. The utterances were spoken at a normal con-versational rate, with slightly reduced intonation. Thefragment stimuli were constructed from the full-sen-tence items by editing out everything after the end ofthe word that contained the added/replaced segment.Added and replaced versions of the eight utteranceswere constructed using the same technique and equip-ment used to construct the stimuli in the first two ex-periments. Table 7 presents the utterance lengths, crit-ical segment onset times, segment lengths, and segmentDRMSA for the fragment and full-sentence conditions.

Apparatus and procedure. On each trial, the wordREADY was displayed on a screen in front of the subjectfor 1 sec. This warning signal disappeared, and 300 mseclater a test utterance was played binaurally over stereoheadphones. When the utterance finished, the wordsADDED and REPLACED immediately appeared on the leftand right sides of the screen, respectively. Subjects madethe usual added/replaced judgment as quickly and asaccurately as possible, using the display of ADDED andREPLACED as a signal to respond. Seven hundred msecafter all (1-4) of the subjects had responded, two wordsappeared on the screen, replacing ADDED and RE-PLACED. The two words were the basic pair appropriatefor the utterance the subjects had just heard (e.g., "bat-tle-batter"). Subjects were told to press one of two keysto indicate which of the two words they thought hadbeen in the utterance on that trial (left key - left word,right key - right word). For both the added/replacedjudgment and the word judgment, subjects were told toguess if necessary—both responses were required on alltrials. One second after all of the subjects had made theword judgment, the next trial began.

The apparatus was the same as that used in the pre-vious experiments with the following exceptions: (a) ATektronix 602 display scope was used, and (b) subjectsused response keys instead of response buttons. Thesame two keys were used for the added/replaced andword judgments (left - ADDED for first judgment, andleft - left word for the second).

Each subject received 94 utterances, divided into twopasses through 47 utterances. The first 20 items in eachpass were practice and were not scored. The remaining27 utterances included 1 utterance from each of thebasic word pair sets (see the Design section below). Theorder of the 27 test items was determined randomly. Onthe first pass through the 47 utterances, either the addedor replaced version of each utterance was randomly(p = .5) selected. The second pass through the itemsincluded the versions not presented on the first pass. Thecomparison of added and replaced versions was thuswithin subject. On each trial, the position (left or right)of the forced-choice words was randomly determined.

488 ARTHUR G. SAMUEL

Table 7Stimulus Information in Experiment 3

Stimulustype

Full sentenceFragment

Phone ClassLiquidStopVowel

PositionInitialMedialFinal

Utterancelength

26592063

295725662453

262426262726

Segmentonset time

18021802

196017421702

151918452041

Segmentlength

137137

139133138

142114154

Segmentamplitude

671671

749265

1000

666883466

Note. The utterance length, segment onset time, and segment lengths are in milliseconds. The segment amplitudeis the digital RMS amplitude (DRMSA). Unless otherwise noted, the utterance lengths are for the full sentencecondition. The DRMSA was normalized such that vowels had a mean of 1000.

The data for each subject were collected in a singlesession lasting approximately 20 min.

Design. The fragment and full-sentence conditionswere each run individually—no subject received bothkinds of utterances. Each subject heard one added/re-placed item from each of the 27 basic word pairs. Sincethere are four utterances per basic word pair set in eachof the conditions, four subjects were needed to span theentire stimulus set. Thus four test versions of each mem-ory load condition were constructed. Within each testversion a Latin square procedure was used to counter-balance the four versions of each condition, the phoneclass, and the phone position of the added/replacedphone.

Subjects. A total of 80 subjects participated in Ex-periment 3, 40 in each memory load condition. The sub-jects were randomly assigned to the four versions of eachtest, yielding 10 subjects in each test version.

All subjects were native English speakers with noknown hearing problems. They received $2 or coursecredit for their participation.

Results

Measures. For the added/replaced judg-ments, the two primary measures used in thefirst two experiments, cf and Beta, were usedagain. For the forced-choice word judg-ments, a similar signal detection theorybreakdown can be used. The primary mea-sure of interest is Beta, which is an index ofhow much context biases the listener to hear(or at least report) the word that fits. Betameasures the likelihood that the predictedword was reported, regardless of what wordwas originally there.

Composite subjects. Recall that in eachmemory load condition, 4 subjects were re-quired to span the stimulus set. In each of

these conditions, formation of 10 compositesubjects (pseudosubjects who received allthe stimuli in a condition) was done as fol-lows: for each subject, the Beta score for theadded/replaced judgment was computed.The 10 subjects who had received a givenversion of the experiment were ordered ac-cording to their Beta scores. Composite Sub-ject 1 was then formed by combining thedata from the subject within each group of10 who had the lowest Betas. CompositeSubject 2 was comprised of those with thesecond lowest Betas, and so on, up to Com-posite Subject 10, those with the highestBetas. Combining the data in this way hasthe virtue of putting together data from sub-jects with similar decision criteria on theadded/replaced judgment, decreasing within-pseudosubject variance.5

The major issues being considered in Ex-periment 3 are the roles that semantic-syn-tactic context and memory load play in pho-nemic restoration. Table 8 presents theadded/replaced judgment data, which re-flect how these factors affect the rate of res-toration. Two aspects of the d' data are mostimportant. First, there is no difference inperformance as a function of when the re-

5 In addition to providing pseudosubjects who receivedall of the stimuli, this procedure increases the numberof observations per cell for each pseudosubject. As notedbefore, the d and Beta measures are highly nonlinearfor extreme miss and false alarm rates; the larger cellsizes produced by this procedure substantially reducesuch extreme values and thus increase statistical power.

PHONEMIC RESTORATION 489

Table 8Memory Load by Predictability Breakdown forExperiment 3

Table 9Bias (Beta) on Word Judgments as a Functionof Memory Load and Restoration Occurrence

Contextcondition

Fragment Full sentence

d' Beta d' Beta

Predicted 1.48 3.07 1.71 2.02Unpredicted 1.42 1.33 1.34 1.33

Note. The d' and Beta values are in absolute units.

sponse is made; memory load appears to playno role. Second, and more surprising, dis-criminability is actually better in predictablesituations than in unpredictable ones. A two-way analysis of variance (Memory Load XPredictability) bears out these observations;memory load, F(l, 18) = .10, p > .20; pre-dictability, F(l, 18) = 4.38, p = .06. Al-though the predictability effect is larger forfull sentences than for fragments, the inter-action is not reliable, F(l, 18) = 2.30, p >.10. Note that the predictability result is indirect opposition to the predictions of theexpectation-driven component of the inter-active schema model; it also conflicts withthe priming results. Apparently, high-levelinformation of the type available in this test-ing situation is unable to influence the low-level perceptual process.

The Beta values in Table 8 show that se-mantic-syntactic context is by no means im-potent, however. There is a large bias towardreporting predictable words as intact. In atwo-way analysis of variance of the Betascores (Memory Load X Predictability), thepredictability effect was marginally signifi-cant, F(l, 18) = 3.98, .05 <p < .10. How-ever, the effect was much more reliable thanthis analysis suggests—19 of the 20 pseu-dosubjects showed it. Thus, although dis-criminability was unimpaired by predict-ability, listeners did report predictable wordsto be intact more than unpredictable ones;less "psychological intactness" is requiredwhen a word is expected. As in the d' data,memory load had no effect, F(l, 18) = .43,p > .20.

Table 9 presents data that bear on thequestion of whether semantic-syntactic con-text determines what is restored. The valuesare Beta scores for the forced-choice wordjudgments. All the data in the table come

Restorationoccurrence

RestorationNo restoration

Fragment

.67

.83

Fullsentence

.77

.86

Note. The Beta values are in absolute units.

from trials on which the stimulus was a re-placement item. The scores in the top rowcome from trials on which subjects re-sponded "added" to the replacement item(i.e., restored either perceptually or postper-ceptually); the bottom row data come fromthe trials on which subjects correctly re-ported the replaced stimulus as replaced.The Beta scores were set up such that a valueless than 1.0 indicated a bias toward re-porting the word predicted by context. Thefirst thing to notice is that in all conditions,there is a bias toward reporting the predictedword. However, this bias is not uniform; ontrials in which restoration occurred, listenerswere more biased toward the predicted wordthan when no restoration occurred. An anal-ysis of variance (Restoration X MemoryLoad) revealed that this difference was mar-ginally significant, F(l, 18) = 3.90, .05 <p < .10. It also indicated that the tendencytoward more bias in the immediate responsecondition was not significant, F(l, 18) =1.63, p> .10.

It is of course possible to compute a dscore for the word judgments as well as theBetas that have been discussed. The d' mea-sure reflects the subjects' ability to discrim-inate between replacement items that wereconstructed from different original words(e.g., "bat*" from "battle" and "bat*" from"batter"). These values were generally quitelow, averaging about .7, indicating that few(but some) cues to the original remained.

Table 10 presents the d' and Beta scores(from the added/replaced judgment) for thefragment and full-sentence conditions, bro-ken down by phone class of the added/re-placed phone. The values reported are themeans for the 10 composite subjects in eachcondition. Two-way analyses of variance(Phone Class X Memory Load Condition)were conducted on the d' and Beta scores in

490 ARTHUR G. SAMUEL

Table 10Added/Replaced Judgments: Phone ClassBreakdown

Liquid Stop Vowel

Fragmentd'Beta

Full sentenced'Beta

1.532.78

1.641.53

1.293.39

1.231.86

1.611.65

1.791.46

Note. The d' and Beta values are in absolute units.

Table 11Added/Replaced Judgments: Phone PositionBreakdown

Initial Medial Final

Fragmentd'Beta

Full sentenced1

Beta

1.462.35

1.772.53

1.191.37

1.391.15

1.682.00

1.542.99

Note. The d' and Beta values are in absolute units.

Table 10. The effect of phone class was sig-nificant on the measure of replacement de-tectability, F(2, 36) = 6.76, p < .01, but noton Beta, F(2, 36) = 1.70, p> . 10. The effectof memory load was not significant on eithermeasure (for both, F :£ 1). The potency ofthe phone class factor indicates that thestrong influence of bottom-up confirmationfound in the first two experiments (using iso-lated words) is not lost in a situation withmore top-down influences. As in isolatedwords, the phone class manipulation has itsimpact at the perceptual level, not on higherlevel decision processes. The same orderingof restorability is found in both situations—for both fragments and full sentences, stopsare best restored, followed by liquids andvowels.

The Beta scores for words in sententialcontext are higher than for words in isolation(see Table 2), indicating that subjects aregenerally more likely to report the words asintact in context. However, the d' measureindicates better detection of phoneme re-placement in sentences (or fragments) thanin isolation. A likely basis for this surprisingresult is the influence of prosody (especiallypitch contour) in sentences. The replacementversion of a sentence will often introduce abreak into the prosodic structure, whereasthe added version seldom does. This provideslisteners with an artifactual cue for discrim-inating added and replaced versions in sen-tences. Thus we cannot compare absolutescores across Experiment 3 and Experiments1 and 2. However, within each experiment,we may safely consider the patterns of re-sults.

Table 11 presents the d and Beta scoresfor the fragment and full-sentence condi-

tions, broken down by phoneme positionwithin the target word. A comparison ofthese data with the comparable results usingisolated words (Table 2) reveals that thepatterns differ somewhat. In isolation, nodifference in discriminability was found asa function of position. In sentences, althoughinitial and final position are roughly equal,medial position is consistently poorer. A two-way analysis of variance (Phone Position XMemory Load) indicates that this positioneffect in sentences is reliable, F(2, 36) =4.79, p < .05. The bias measure shows aneven more striking change. In isolation, sub-jects were most biased toward reporting re-placed in initial position and most biasedtoward added in medial position (see Table2). In sentences, medial segments were mostlikely to be called replaced, Beta, F(2,36) = 3.60, p < .05. This result may be dueto the prosody cue discussed earlier. The in-troduction of a noise burst in the middle ofa word (whether added to or replacing aphone) is likely to create the impression ofa break in structure; this impression is lesslikely with noise bursts at word boundaries(i.e., word initial or word final), since breaksbetween words are expected (or even cre-ated) by the listener. If subjects are usingthe prosody-break cue, this would lead to thelower medial Betas directly and to lowermedial d's by use of a cue that is unreliablein this situation. As in the previous analyses,memory load had no effect on the added/replaced judgments (for both, F < 1).

Discussion

The central question under considerationin Experiment 3 was what role sentential

PHONEMIC RESTORATION 491

context plays in phonemic restoration. Theanswer appears to be that context serves tobias listeners. For the word judgments, thisresult is not at all surprising; Sherman (Note1) has shown that subjects' reports of wordstend to follow the sentential context. How-ever, the bias-based nature of context effectson the added/replaced judgment is muchmore surprising and much more important.The data indicate that listeners are more in-clined to report words as intact if the wordsare predictable from preceding context thanif they are not. This bias effect was coupledwith an improvement in discriminability forpredictable words.

These data place an important constrainton the interactive schema model of speechperception that was posited at the outset.The failure to find a discriminability dec-rement for predictable words suggests thatthe higher level (syntactic-semantic) infor-mation was not being passed down to thelower phonetic-phonological level; the top-down processing that was evident betweenthe lexical and phonetic-phonological leveldid not occur here. This is not to say thatthe information was not used. On the con-trary, the Beta data show clearly that high-level information is effective. However, itseffect is at higher levels, particularly thedecision-making level. The data also do notrule out the possibility of top-down use ofsentential context. It is possible that in anextremely predictive context, a discrimina-bility decrement might be found. However,with the materials tested (and with mostnormal discourse), the level of predictionwas insufficient to activate a particular lex-ical entry strongly enough to produce theperformance decrement found in the primingsituation of Experiment 2.

In fact, discriminability actually improvedfor predictable words. This improvementsuggests that the various levels of processingshare some common resources. When pre-ceding context makes a word predictable, theload on the perceptual system decreases.This apparently leaves more processing ca-pacity available for the fine level of acousticanalysis needed to discriminate added andreplacement items.

In light of this finding, the results of thememory load manipulation are a little bitsurprising. On essentially all measures, re-

quiring subjects to delay their responses hadno effect. Maintaining a response in memoryshould place some load on the system, hurt-ing performance. Either this additional loadis negligible, or it is balanced by some aspectof the delay condition, such as the greaternaturalness of whole sentences.

An overall comparison of performance onisolated words (Experiment 1) and words insentences (Experiment 3) revealed higherdiscriminability in sentences. The pattern ofresults for critical phonemes in different po-sitions within words suggested that subjectswere using pitch contour breaks as a cue toimprove discriminability. The possible sen-sitivity of the paradigm to prosodic factorsis both a plus and a minus; although it com-plicates the study of segmental effects, itmay provide insights into suprasegmentalprocessing.

There is a second factor that may havecontributed to the relatively good discrimi-nability in Experiment 3. Recall that in thefirst two experiments, the words were chosenso that a unique lexical item could be re-stored. In Experiment 3, the critical wordswere designed to be multiply restorable inorder to investigate context effects. It is pos-sible that the existence of more than onepotential percept inhibited restoration, im-proving discriminability. If the purpose ofthe restoration mechanism is to reconstructthe original word spoken, the system mightbe stymied when more than one acceptablecandidate is available. Research is currentlyunder way to test this hypothesis.

Finally, the results of the phone class ma-nipulation were straightforward—stops weremost restorable, followed by liquids and vow-els. This pattern of results is identical to thatobserved for isolated words. It thus appearsthat although the functioning of top-downexpectations differs in the two situations, theprocess of bottom-up confirmation of hy-potheses is similar.

General Discussion

The three experiments were designed toprovide some insight into the encoding of thesounds of speech into meaningful units. Thefocus of this inquiry was to determine howmuch of this encoding is a function of theincoming sounds and how much of it is de-

492 ARTHUR G. SAMUEL

termined by the knowledge of the listener.The results, not surprisingly, indicate thatspeech perception depends on the interactionof both the top-down expectations generatedby the listener's knowledge and the bottom-up confirmation provided by the acousticsof the signal. These bottom-up influenceswere actually more potent than one mighthave expected, given the fundamentallyknowledge-driven nature of a phenomenonlike phonemic restoration; a very potent fac-tor in whether restoration occurred turnedout to be the phone class of the speech seg-ment to be restored and its acoustic simi-larity to the replacement sound. Samuel(1981) has shown that specific bottom-upcharacteristics of the signal, including itsamplitude, continuity, and periodicity, havesignificant effects on the rate of restoration.

The technique of separating restorationinto discriminability and bias revealed thatvarious higher level knowledge sources mayincrease the miss rate in different ways. Lex-ical knowledge can be brought to bear duringthe perceptual process in a truly top-downfashion. Thus more restoration occurred inwords than in phonologically legal pseudo-words; the difference resulted in poorer <fsfor words than pseudowords. Similarly, in-creasing the availability of lexical informa-tion, through priming, reduced discrimina-bility of added and replaced stimuli. Incontrast, sentential information producedhigher miss rates through a bias effect; lis-teners were more willing to say "intact"when a word was expected than when it wasnot. Discriminability actually improved forsententially predictable words, suggestingthat the load on the perceptual system wasreduced through predictability.

The model of speech perception thatemerges from these results is a modified ver-sion of Rumelhart's (1977) interactiveschema model. The critical feature of Ru-melhart's model is the combination of bothbottom-up and top-down modes of process-ing. The results of Experiment 2 demon-strate the powerful role of top-down pro-cessing in the perception of speech. Theresults of Experiment 3 place an importantconstraint on the interactive schema model—not all sources of information appear to passtheir results in both directions. In particular,the data indicate that high-level syntactic-

semantic information is used in the high-level decision process, but it does not reachthe acoustic-phonetic level. This result sug-gests that in studies in which the perceptualand decision stage components are insepar-able (e.g., Foss & Blank, 1980), the observedcontextual effects are due to decision stagefactors.

This model of speech perception may beassessed by considering how well it agreeswith the results of several recent studies inwhich different methodological approacheswere taken. Each of these approaches solvessome of the problems inherent in the meth-odology first used to study phonemic resto-ration.

Cole (1973; Cole & Jakimik, 1979) intro-duced a technique in which subjects listento a passage that contains occasional mis-pronunciations. The task was to press a but-ton whenever a mispronunciation was de-tected. Cole varied the degree of mispro-nunciation (one, two, or four distinctivefeatures) and its position within the word(word initial, second syllable final, or wordfinal). The mispronunciation miss rate, anindex of restoration, varied with similarityof the expected and actual phonemes but wasunaffected by position of the mispronuncia-tion. The similarity of the mispronunciationto the original is a bottom-up factor thatappears to be quite potent (as in the presentstudy). Reaction times for the detections alsovaried with similarity. In addition, reactiontimes varied with position; responses wererelatively slow for word-initial mispronun-ciations, whereas second-syllable and word-final responses were about equally fast. Colesuggested that the long reaction times forword-initial mispronunciations reflect thedominance of initial phonemes in lexical ac-cess—a mispronunciation there leads to er-roneous entry into the lexicon.

The results of the position factor in Ex-periments 1 and 2 generally confirm Cole'shypothesis. For words in isolation, there wasa consistent tendency for subjects to reportsomething missing if an extraneous soundoccurred at the beginning of a word, re-gardless of whether the sound coincided withor replaced the initial phone. Such a diffen-tial sensitivity tends to support Cole's view.The priming data provide support for a re-finement of the special role of word begin-

PHONEMIC RESTORATION 493

nings. Those data showed that when expec-tations exist, the initial part of a word canconfirm the expectation, producing greaterrestoration of later phones. The failure toobtain the effect in Experiment 3 may wellbe due to the artifactual prosody cue. Thusthe results of the present study may be takenas support for a special confirmatory role ofword-initial phonemes.

Marslen-Wilson (1975) introduced an-other variant of restoration methodology, theshadowing technique. Marslen-Wilson hadsubjects shadow (repeat almost simulta-neously) speech in which mispronunciationsoccurred, and as in Cole's procedure, in-ferred restoration when a mispronunciationwas not repeated (i.e., the subject restoredthe proper pronunciation). Marslen-Wilsonfound that most restoration occurred in thelast two syllables of words that the subjectsexpected. This result suggests that restora-tion was initiated by the sentential context(i.e., expectations generated by listenerknowledge) and was finalized when the ini-tial syllable confirmed the contextually-driven hypothesis.

Marslen-Wilson and Welsh (1978) com-pared Cole's (1973) mispronunciation tech-nique and the shadowing method, using thesame recorded material for each. The resultsfor the two procedures were similar. Thebiggest difference was that for grossly mis-pronounced phonemes (three distinctive fea-tures changed), subjects rarely restored us-ing the mispronunciation task (6%) but werefairly likely to restore when shadowing(24%). This result may be due to subjectsfocusing their attention on mispronuncia-tions in Cole's task, or to the great atten-tional demands of shadowing. In any event,degree of mispronunciation was a very po-tent factor on both measures. As in Marslen-Wilson's (1975) study, predictability of aword greatly increased the likelihood of ashadowing restoration. In the present study,sentential predictability also increased "res-toration," but did so by biasing the decisionprocess. In the shadowing paradigm, thereis no way to discover whether the higher res-toration rate was due to a perceptual effector some sort of postperceptual decision stage.

Marslen-Wilson and Welsh's (1978) modelof lexical access from speech incorporatesboth bottom-up and top-down processing.

They argued that the incoming acoustic in-formation suggests a group of word candi-dates (the cohort), and that higher level fac-tors help to choose from among thecandidates. Note that this model reverses theroles of bottom-up and top-down informa-tion; the bottom-up information creates theexpectation (or cohort), and the higher levelknowledge is used to confirm one of the can-didates. It is not clear how to choose betweenthe two formulations.

Bashford and Warren (1979) have re-cently introduced a restoration methodologythat seems particularly promising for inves-tigating high-level effects on restoration. Inthis procedure, a speech passage is inter-rupted in such a way that the listener onlyhears half of the passage. For example, thefirst 500 msec of the passage might be heard,the next 500 msec deleted, the next 500 msecheard, and so forth. Miller and Licklider(1950), using word lists, had reported thatreplacing the deleted portions with broad-band noise produced speech that soundedmore complete than the speech produced byleaving the deleted segments as silence.Since Miller and Licklider found no corre-sponding increase in intelligibility, the illu-sory continuity appears to be similar to thebias component of restoration observed inExperiment 3. The continuity effect also hasa property that would be expected if post-perceptual restoration were its basis—thecontinuity breaks down if the interruptionsare too long, reducing the necessary context.

From these facts, Bashford and Warren(1979) developed a new measure of resto-ration: the point at which the continuity ef-fect appears (or disappears) when the rateof speech-noise switching is varied. Subjectswere given control over this rate, and madethreshold judgments for the continuity ef-fect. Bashford and Warren found that ifbroadband noise was the replacement sound,listeners heard continuous speech with noise-filled deletions as long as 304 msec. Whensilence was left, the continuity broke downat 52 msec; silence did not confirm the ex-pectations generated by the remainingspeech. The results of these and other con-ditions in Bashford and Warren's study ledthem to conclude that the success of resto-ration depends on (a) the spectral similarityof the replacement sound and the speech it

494 ARTHUR G. SAMUEL

replaces, and (b) the amount of linguisticcontext available. In short, restoration is afunction of expectation and confirmation.6

The results of three quite different meth-odologies thus coverge with those obtainedin Experiments 1-3. The strengths andweaknesses of the added/replaced paradigmnicely complement those of the alternativemethods. The strongest virtue of the proce-dure used in the present study is its abilityto separate perceptual discriminability frombias; none of the other three methods can dothis because none has any usable false alarmrate. Thus bias and discriminability (of arestored and a "real" phoneme) are inextri-cably intertwined.

A second virtue of the procedure is thatit does not impose a second, high-resourcetask on the listener, as is the case in theshadowing paradigm. The shadowing taskwas first used precisely because it did absorbmost of the subject's attentional capacity.With this strain on the system, it is likelythat the listener is forced to rely more ontop-down processing (knowledge already inhand) than in the normal situation. Thus theshadowing methodology probably exagger-ates the role of expectations. In contrast,Cole's (1973) mispronunciation task and theadded/replaced task focus the listener's at-tention on the speech sounds, accentuatingbottom-up processing to some degree. Themiddle ground between the models derivedfrom the shadowing and detection para-digms is therefore probably closest to thetrue functioning of the speech system.

6 Restoration of speech sounds is not necessarilyunique in being a function of expectation and confir-mation—the interactive schema model of Rumelhart(1977) is applicable to perception in general. In fact,Warren, Obusek, and Ackroff (1972) have reported anonspeech analogue to phonemic restoration that ap-pears to follow similar rules, including sensitivity to suchconfirmatory factors as spectrum and intensity.

Reference Note

1. Sherman, G. The phonemic restoration effect: Aninsight into the mechanisms of speech perception.Unpublished master's thesis, University of Wiscon-sin—Milwaukee, 1971.

References

Bashford, J., & Warren, R. M. Perceptual synthesis ofdeleted phonemes. In J. J. Wolf & D. H. Klatt (Eds.),Speech communication papers. New York: Acousti-cal Society of America, 1979.

Cole, R. Listening for mispronunciations: A measure ofwhat we hear during speech. Perception & Psycho-physics. 1973, /7, 153-156.

Cole, R., & Jakimik, J. A model of speech perception.In R. Cole (Ed.), Perception and production affluentspeech. Hillsdale, N.J.: Erlbaum, 1979.

Foss, D., & Blank, M. Identifying the speech codes.Cognitive Psychology, 1980, 12, 1-31.

Kucera, H., & Francis, W. Computational analysis ofpresent-day American English. Providence, R.I.:Brown University Press, 1967.

Layton, B. Differential effects of two nonspeech soundson phonemic restoration. Bulletin of the PsychonomicSociety, 1975, 6, 487-490.

Marslen-Wilson, W. Sentence perception as an inter-active parallel process. Science, 1975, 189, 226-228.

Marslen-Wilson, W., & Tyler, L. The temporal struc-ture of spoken language understanding. Cognition,1980,5, 1-71.

Marslen-Wilson, W., & Welsh, A. Processing interac-tions and lexical access during word recognition incontinuous speech. Cognitive Psychology, 1978, 10,29-63.

Miller, G. A. & Licklider, J. C. The intelligibility ofinterrupted speech. Journal of the Acoustical Societyof America, 1950, 22, 167-173.

Obusek, C,, & Warren, R. M. Relation of the verbaltransformation and the phonemic restoration effects.Cognitive Psychology, 1973, 5, 97-107.

Pisoni, D. Auditory and phonetic memory codes in thediscrimination of consonants and vowels. Perception& Psychophysics, 1973, 13, 253-260.

Rumelhart, D. E. Toward an interactive model of read-ing. In S. Domic (Ed.), Attention and performanceVI. Hillsdale, N.J.: Erlbaum, 1977.

Samuel, A. G. The role of bottom-up confirmation inthe phonemic restoration illusion. Journal of Exper-imental Psychology: Human Perception and Perfor-mance, 1981, J, 1124-1131.

Taft, M. Lexical access via an orthographic code: Thebasic orthographic syllabic structure (BOSS). Jour-nal of Verbal Learning and Verbal Behavior, 1979,18, 21-40.

Warren, R. M. Illusory changes of distinct speech uponrepetition—The verbal transformation effect. BritishJournal of Psychology, 1961,52, 249-258.

Warren, R. M. Perceptual restoration of missing speechsounds. Science, 1970, 167, 392-393.

Warren, R. M., & Obusek, C. Speech perception andphonemic restorations. Perception & Psychophysics,1971, 9, 358-363.

Warren, R. M., Obusek, C., & Ackroff, J. Auditoryinduction: Perceptual synthesis of absent sounds. Sci-ence, 1972, 176, 1149-1151.

Warren, R. M., & Sherman, G. Phonemic restorationsbased on subsequent context. Perception & Psycho-physics, 1974, 16, 150-156.

Received August 12, 1980 •