
Effects of delayed auditory feedback (DAF) on the pitch-shift reflex

Timothy C. Hain
Departments of Otolaryngology, Head and Neck Surgery, and Neurology, Northwestern University Medical School, Chicago, Illinois 60611

Theresa A. Burnett
Laryngeal & Speech Section, MNB/NINDS/NIH, Bldg. 10, Rm. 5D38, Bethesda, Maryland 20892-1416

Charles R. Larson a) and Swathi Kiran
Department of Communication Sciences and Disorders, Northwestern University, Evanston, Illinois 60208

(Received 27 January 2000; accepted for publication 22 February 2001)

Changes in voice pitch auditory feedback to vocalizing subjects elicit compensatory changes in voice fundamental frequency (F0). The neural mechanisms responsible for this behavior involve the auditory and vocal-motor systems, collectively known as the audio-vocal system. Previous work [Burnett et al., J. Acoust. Soc. Am. 103, 3153–3161 (1998); Hain et al., Exp. Brain Res. 130, 133–141 (2000); Larson et al., J. Acoust. Soc. Am. 107, 559–564 (2000)] indicated that this system operates using negative feedback to cancel out low-level errors in voice F0 output. By introducing delays in the auditory feedback pathway, we hoped to transiently "open" the feedback loop and learn which components of the response are most closely related to the timing of the auditory feedback signal. Subjects were presented with pitch-shift stimuli that were paired with a delay of 0, 50, 100, 200, 300, or 500 ms. Delayed auditory feedback did not affect voice F0 response latency or magnitude, but it delayed the timing of later parts of the response. As a further test of the audio-vocal control system, a second experiment was conducted in which delays of 100, 200, or 300 ms were combined with stimuli having onset velocities of 1000 or 330 cents/s. Results confirmed earlier reports that the system is sensitive to the velocity of stimulus onset. A simple feedback model reproduced most features of both experiments. These results strongly support previous suggestions that the audio-vocal system monitors auditory feedback and, through closed-loop negative feedback incorporating a delay, adjusts voice F0 so as to cancel unintentional small-magnitude fluctuations in F0. © 2001 Acoustical Society of America. [DOI: 10.1121/1.1366319]

PACS numbers: 43.70.Aj [AL]


I. INTRODUCTION

The control of voice fundamental frequency (F0) is a complex process involving volition, memory, kinesthetic and proprioceptive feedback, auditory feedback, and the neuromuscular adjustments of the respiratory and laryngeal systems necessary to produce a desired F0 level. The "pitch-shift reflex" is a vocal response to changes in the pitch of voice auditory feedback. It is thought to help stabilize voice F0 in the presence of unintended perturbations. The pitch-shift reflex can be elicited by experimental manipulation of auditory feedback, offering a unique means of studying neural mechanisms of vocal control (Burnett et al., 1998, 1997; Hain et al., 2000; Kawahara, 1995; Kawahara et al., 1996; Larson et al., 1997, 1995, 1996).

Most voice F0 responses elicited by a pitch shift are compensatory. That is, a downward shift in feedback pitch (pitch-shift stimulus) results in a rise in voice F0, and an upward change in feedback results in a reduction in F0. There is also a proportionality between stimulus onset velocity and response velocity (Larson et al., 2000). This relation was largely replicated in a mathematical model of the audio-vocal system incorporating negative feedback control, intrinsic delays, and filtering (Larson et al., 2000). These observations indicate that the audio-vocal system is sensitive to both the direction and the speed of change in feedback pitch, supporting the hypothesis that the function of the pitch-shift reflex is to null unexpected changes in voice F0 output and thus stabilize F0.

In studying human behavior governed by negative feedback control, the internal components of the system are usually unavailable for experimental modification, but it is often possible to modify the gain or timing of the feedback signal. To the extent that negative feedback is used to stabilize the system, these procedures should lead to instabilities. In the present study, pitch-shift stimuli (PSS) were presented in a delayed auditory feedback (DAF) paradigm to alter the timing of the system, and so to test the hypothesis that the audio-vocal system operates in a negative feedback mode. While subjects were vocalizing a steady vowel ("ah"), a harmonizer continuously stored a segment of the voice in a buffer. At some time during the vocalization, that stored segment was unexpectedly shifted upward or downward in pitch and fed back to the subjects over headphones. This caused the subject to hear what sounded like their own voice at a higher or lower pitch than they were currently producing, but evidence of all subsequent vocal responses was delayed by an interval equivalent to that held in the buffer. This paradigm prevented subjects from hearing their F0 response to the PSS during the delay interval; instead, they heard only the pitch-shift stimulus itself. Thus, an error between intended and perceived pitch was created, and perception of the response to this error was delayed. By this means, the error-correcting feedback loop was transiently opened. At the end of the delay interval, the subjects heard their response to the PSS. The resultant behavior was largely replicated with the negative feedback mathematical model previously reported.

a) Electronic mail: [email protected]

J. Acoust. Soc. Am. 109 (5), Pt. 1, May 2001 0001-4966/2001/109(5)/2146/7/$18.00 © 2001 Acoustical Society of America

Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 130.102.42.98 On: Fri, 21 Nov 2014 22:45:49

II. EXPERIMENT 1

Subjects: Twenty-two healthy young adults between the ages of 18 and 22 (21 females and 1 male) served as subjects. All subjects passed a hearing screening at 20 dB (500–9 kHz), reported no neurological deficits, had no speech or voice disorder, and were not trained singers.

Apparatus and procedures: Subjects were seated in a sound-treated booth. Their voices were transduced with an AKG boom-set microphone, amplified, recorded, processed for pitch shifting through an Eventide (SE 3000) Ultraharmonizer, mixed with 70 dB (SPL) masking noise (3 dB/oct, 100–5 kHz), and fed back to the subject over AKG earphones (Model K 270 H/C). The harmonizer shifts all frequencies, voice F0 as well as formants, and thus the shifted feedback signal sounds like a person's normal voice at a different F0 (details of the pitch-shifting algorithm are a trade secret of the manufacturer and are thus unavailable). Throughout the experiment, subjects maintained vocal loudness at approximately 70 dB SPL, resulting in a voice feedback loudness of approximately 80 dB. For additional methodological details, see Burnett et al. (1998).

Subjects were instructed to vocalize "ah" at a comfortable pitch for 5 s, pause for a breath, then repeat. Thirty such vocalizations constituted a block. Within each block of vocalizations, 15 experimental trials were mixed pseudorandomly with 15 control trials, in which no PSS was presented. In the experimental trials, a PSS of 100 cents (1200 cents = 1 oct) was presented at a random time 500 to 2500 ms after vocalization onset. At the same instant that the pitch shift was initiated, the harmonizer also imposed a delay (DAF) of 0, 50, 100, 200, 300, or 500 ms, which extended the time that the system operated open loop. Thus, for the duration of the PSS, auditory feedback was displaced by 100 cents from the voice F0 produced 0–500 ms earlier. Given production of a steady note, the PSS created a sudden discontinuity between the F0 the subject was producing and the auditory feedback pitch. However, subjects did not hear their response to the PSS until the delay interval had elapsed. After 1 s, the PSS was terminated.
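The stimulus logic of this paragraph can be made concrete with a small sketch of what the subject hears. It assumes an F0 track sampled at a fixed rate and models the harmonizer as replaying the voice from `delay_ms` earlier, multiplied by the pitch-shift ratio, from PSS onset onward. The function names and the buffer behavior during the first delay interval are illustrative assumptions, since the Eventide algorithm itself is not documented.

```python
def cents_to_ratio(cents: float) -> float:
    """Frequency ratio for a shift in cents (1200 cents = 1 octave)."""
    return 2.0 ** (cents / 1200.0)


def feedback_f0(voice_f0, fs, onset_idx, shift_cents, delay_ms):
    """Hypothetical model of the F0 a subject hears over the headphones.

    voice_f0    -- produced F0 (Hz), one value per sample
    fs          -- sampling rate of the F0 track (samples/s)
    onset_idx   -- sample at which the PSS and the DAF start together
    shift_cents -- pitch-shift magnitude (negative = downward)
    delay_ms    -- DAF interval in ms (assumed buffer model, see lead-in)
    """
    d = int(round(delay_ms * fs / 1000.0))
    ratio = cents_to_ratio(shift_cents)
    heard = []
    for n, f in enumerate(voice_f0):
        if n < onset_idx:
            # before the PSS: real-time, unshifted feedback
            heard.append(f)
        else:
            # after onset: the voice from d samples ago, shifted in pitch
            heard.append(voice_f0[max(n - d, 0)] * ratio)
    return heard
```

With a 0-ms delay this reduces to an ordinary pitch-shifted copy of the current voice; larger delays make the heard F0 lag the produced F0 by the DAF interval.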

The PSS onset and offset were abrupt, i.e., 10,000 cents/s. During a single block of trials, one delay condition was presented, resulting in a total of six blocks per subject. Fifteen subjects were tested with downward PSS and seven with upward PSS. We did not think it necessary to have equal numbers of subjects with upward and downward PSS, since previous studies have failed to show significant differences between these two conditions (Burnett et al., 1998, 1997; Hain et al., 2000).

Throughout the experiment, the subject's voice, the feedback signal to the subject, and a control pulse to the harmonizer were digitized at a sample frequency of 2 kHz. The subject's voice and the feedback signal were low-pass filtered at 200 Hz (females) or 100 Hz (males) prior to digitization. The F0 of the digitized acoustical signals was extracted using a software algorithm, resulting in dc signals in which voltage corresponded to F0 (analog F0). All signals in each experimental trial were then time-aligned with the PSS-DAF onset and averaged within each block.

From the averaged waveforms, the following direct measures were made automatically: response latency, end time, peak magnitude, and peak time (Fig. 1). Response latency was defined as the time after stimulus onset at which the analog F0 signal exceeded two standard deviations of the prestimulus mean F0. Response end time was the time at which the response returned to within two S.D.'s of the prestimulus mean. Response duration (latency to end time) and onset slope were calculated from these direct measures.
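The two-S.D. criterion above can be sketched as follows. The length of the prestimulus window is not given in the text, so `baseline_ms` is an assumed parameter, and the threshold is applied in both directions because responses could be upward or downward; the function name is hypothetical.

```python
import statistics


def latency_and_end_ms(f0, fs, onset_idx, baseline_ms=200.0):
    """Two-S.D. latency/end-time criterion applied to an averaged F0 trace.

    f0          -- averaged F0 trace (Hz or cents), one value per sample
    fs          -- sampling rate (samples/s)
    onset_idx   -- sample index of stimulus onset
    baseline_ms -- length of the prestimulus window (assumed value)
    Returns (latency_ms, end_ms); an entry is None if not found.
    """
    nb = int(round(baseline_ms * fs / 1000.0))
    baseline = f0[onset_idx - nb:onset_idx]
    mean = statistics.mean(baseline)
    threshold = 2.0 * statistics.stdev(baseline)

    def to_ms(idx):
        return (idx - onset_idx) * 1000.0 / fs

    # first post-onset sample deviating from the prestimulus mean by > 2 S.D.
    latency_idx = next((n for n in range(onset_idx, len(f0))
                        if abs(f0[n] - mean) > threshold), None)
    if latency_idx is None:
        return None, None
    # first later sample back within 2 S.D. of the prestimulus mean
    end_idx = next((n for n in range(latency_idx, len(f0))
                    if abs(f0[n] - mean) <= threshold), None)
    return to_ms(latency_idx), (to_ms(end_idx) if end_idx is not None else None)
```

Response duration then follows as `end_ms - latency_ms`, matching the derived measure described in the text.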

The F0 response often peaked and then fell back to a holding value, resulting in two response components: phasic and prolonged. In these cases, several additional measures were calculated. The time when the phasic response ended, the "phasic end time," was determined from visual inspection of the records and was defined as a dramatic change in the slope of the F0 response following the response peak (see Fig. 1). The slope change was detected by eye with the concurrence of two independent observers. The time of the slope change was confirmed by low-pass filtering (5 Hz) and then differentiating the averaged waveform; a change in slope of the differentiated signal was apparent at the offset time of the phasic response. For the 61 cases in which the phasic peak end time could clearly be defined, it was measured along with the phasic peak offset duration (peak time to end time), the slope of the phasic peak offset, and the phasic peak duration. Identical measures were made for downward responses following increasing pitch-shift stimuli. Magnitude measures were converted from hertz to cents (100 cents = 1 semitone), and their absolute values were taken in order to permit joint consideration of measures from the upward and downward PSS direction conditions. The measures were charted and submitted to significance testing using MANOVA in DataDesk™ (Data Description, Inc., Ithaca, NY). Mathematical modeling and simulation of responses were performed using the Matlab/Simulink package (The MathWorks, Inc., Natick, MA).

FIG. 1. Illustration of measurement parameters.
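The hertz-to-cents conversion used for the magnitude measures is logarithmic: 1200 cents span one octave, so 100 cents is one equal-tempered semitone. A minimal sketch follows; the choice of the prestimulus F0 as the reference frequency is an assumption, and the function names are hypothetical.

```python
import math


def hz_to_cents(f_hz: float, ref_hz: float) -> float:
    """Interval from ref_hz to f_hz in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f_hz / ref_hz)


def response_magnitude_cents(peak_hz: float, baseline_hz: float) -> float:
    """Absolute response size, so upward and downward responses can be pooled."""
    return abs(hz_to_cents(peak_hz, baseline_hz))
```

Because cents are relative to the reference, the same conversion puts male and female voices, with very different baseline F0s, on a common scale.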

III. RESULTS

We measured 126 averaged responses. In all but […] responses, subjects changed voice F0 in the direction opposite to that of the stimulus (i.e., a "compensating response"). There were 61 cases in which initial phasic and subsequent prolonged response components could be clearly identified. In the remaining cases, only a prolonged response was measured. No delay condition resulted in a significantly higher or lower incidence of phasic responses (χ² = 0.6).

Figure 2(a) (panels A–F) shows representative averaged data from one subject; the simulations also shown are discussed later. Each panel consists of two traces: the top is the subject's voice F0 and the bottom is the feedback pitch. In each case, the onset of the PSS is seen as a downward deflection in the lower trace, which lasts 1000 ms. Shortly after PSS onset, the subject's F0 response is seen as an upward deflection (upward arrow) in the upper trace. The phasic portion of the response ends at the downward arrow and is followed by the prolonged component of the F0 response (except in the 300- and 500-ms conditions), which continues until just beyond the termination of the 1000-ms PSS. Another response occurred at the termination of the PSS, but it was not measured and so will not be discussed further. The onset of the subject's F0 response was reflected in the auditory feedback trace as an upward deflection in the lower trace, indicated by an asterisk (*). The delayed presentation of response feedback (*) apparently constituted a second "pitch-shift stimulus," eliciting an additional compensatory response, this time a reduction in F0 that terminated the first F0 response (downward arrows). Subjects who received PSS that increased in pitch produced patterns opposite in direction to those seen in Fig. 2. Figure 2(b) illustrates the average wave for the 300-ms DAF condition along with the standard error of the waves making up the average.

FIG. 2. (a) Examples of averaged voice F0 responses to decreasing PSS for experiment 1. Each of the six boxes represents data for one delay condition. Solid lines are experimental data and dashed lines are simulations. Upper traces are voice F0 and lower traces are feedback signals. All traces are from the same subject, and each trace is the average of 15 trials. The Y axis is frequency in Hz. Upward arrows indicate the beginning of the F0 response. Downward arrows indicate the end of the phasic component of the F0 response. A phasic response was not identified for the example in subpanel A of (a), and hence there is no downward arrow to indicate the end of the response. The asterisk indicates the delayed response of the auditory feedback to the F0 response. Simulations use the mathematical model of Fig. 6 parametrized with median values. (b) Enlarged view of the average wave and standard error (gray) of the average for the 300-ms delay condition.

The data in Fig. 2 suggest that with an increase in DAF interval, there is a corresponding increase in the duration of the phasic (peak) response. After logarithmic transformation of the data to achieve homogeneity of variance, overall significance testing with a multivariate design revealed significant differences among the dependent variables as a function of DAF interval (F_approx = 2.34, df = 30, 194, p < 0.005). Follow-up testing revealed significant differences (p < 0.005) in phasic peak duration between the 300-ms delay and the 0-, 50-, and 100-ms delays, and between the 500-ms and 0-ms delays. The examples in Fig. 2 suggest that the offset slope may also have been affected by the delay conditions, and the boxplots of Fig. 3 also suggest this; however, the differences were not significant. There were no effects of feedback delay on peak magnitude, latency, peak time, onset slope, or offset slope (see Fig. 1). The overall mean response latency was 130 ms (s.d. = 31 ms) and the mean peak magnitude was 69 cents (s.d. = 69 cents). Although the difference was not statistically significant, the delay conditions resulted in a larger peak magnitude (mean = 71 cents) than the no-delay condition (mean = 59 cents). There were no differences in any of the response measures as a function of pitch-shift direction.

FIG. 3. Boxplots of peak duration and offset slope measures. Boxplot definitions: the upper and lower limits of the boxes (hinges) represent the 75th and 25th percentiles, respectively. Gray bars represent 95% confidence intervals. The horizontal line through the box is the median. Whiskers extend to the upper and lower limits of the main body of data, defined as high hinge + 1.5 (high hinge − low hinge) and low hinge − 1.5 (high hinge − low hinge). Points depicted as "o" extend beyond these limits, unless they exceed high hinge + 3.0 (high hinge − low hinge) or low hinge − 3.0 (high hinge − low hinge), in which case they are shown as "*" (Data Desk, Data Description, Inc.).
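The boxplot rules in the FIG. 3 caption amount to two pairs of fences built from the hinge spread. The sketch below approximates the hinges with `statistics.quantiles(..., method="inclusive")`; Data Desk's exact hinge definition may differ slightly, so treat that choice, and the function name, as assumptions.

```python
import statistics


def boxplot_marks(data):
    """Classify points with the fence rules of the FIG. 3 caption.

    Returns a list of (value, mark): mark is None inside the whiskers,
    "o" beyond 1.5 hinge-spreads, and "*" beyond 3.0 hinge-spreads.
    """
    low_hinge, _, high_hinge = statistics.quantiles(data, n=4, method="inclusive")
    spread = high_hinge - low_hinge
    inner = (low_hinge - 1.5 * spread, high_hinge + 1.5 * spread)
    outer = (low_hinge - 3.0 * spread, high_hinge + 3.0 * spread)
    marks = []
    for x in data:
        if x < outer[0] or x > outer[1]:
            marks.append((x, "*"))  # far outside: beyond 3.0 spreads
        elif x < inner[0] or x > inner[1]:
            marks.append((x, "o"))  # outside the whiskers
        else:
            marks.append((x, None))  # within the main body of data
    return marks
```

For the values 1 through 9 plus one extra point, the hinges are 3.25 and 7.75, so an extra value of 18 is drawn as "o" and a value of 100 as "*".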

In our previous work, we proposed a schematic for the F0 tracking system, implemented as an explicit mathematical model (Larson et al., 2000). We configured and drove an extension of this model (see Fig. 6) with input waveforms synthesized to match those used in this experiment. We assessed goodness of fit from the squared difference between the experimental and simulated data, normalized by the squared difference between the experimental data and baseline F0 (subsequently called vaf, for "variance accounted for").
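One reading of this goodness-of-fit measure is vaf = 100 × (1 − SSE/SS_baseline), where SSE sums the squared experimental-minus-simulated differences and SS_baseline sums the squared experimental-minus-baseline differences. That exact form is an assumption, but it is consistent with the negative ("less than 0%") vaf values reported later, which this form allows when a simulation fits worse than a flat baseline.

```python
def vaf_percent(experimental, simulated, baseline_f0):
    """Variance accounted for, in percent (assumed form:
    100 * (1 - SSE / SS_baseline)); negative for very poor fits."""
    if len(experimental) != len(simulated):
        raise ValueError("traces must have equal length")
    # squared error of the simulation relative to the data
    sse = sum((e - s) ** 2 for e, s in zip(experimental, simulated))
    # squared deviation of the data from the prestimulus baseline
    ss_base = sum((e - baseline_f0) ** 2 for e in experimental)
    return 100.0 * (1.0 - sse / ss_base)
```

A perfect simulation scores 100%, and a simulation that never leaves the baseline scores 0%.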

A second-order filter [Eq. (1)] was needed to reproduce our data accurately:

$$ \frac{f_{\mathrm{gain}}}{s T_c + 1} \cdot \frac{s T_a}{s T_a + 1} \qquad (1) $$

Using the optimization toolbox of Matlab, we determined the values of the four model parameters (f_gain, T_a, T_c, and delay) that yielded the largest vaf. When all four parameters were allowed to vary for each DAF delay, the simulations of a set of six averaged experimental traces from one subject accounted for 91.2% (range 85.2%–96.7%) of the variance in the experimental data. Median parameter values from these six fits were T_c = 0.170 s, T_a = 0.235 s, f_gain = 1.15, and delay = 0.06 s. Using these values uniformly for all DAF delays, the model accounted for 76.6% of the variance (see Fig. 2). A first-order filter [i.e., f_gain/(sT_c + 1)] performed much more poorly, accounting for only 18% of the variance. A less constrained second-order filter, described in Eq. (2) under the heading of experiment 2, fit the data as well as the filter of Eq. (1).

IV. EXPERIMENT 2

Subjects: Twenty-three normal undergraduate students (20 females, 3 males) participated. None reported speech or neurological disorders, and none were trained singers.

Procedures: All aspects of this experiment were identical to the first, with a few exceptions. First, the pitch-shift stimuli had onset velocities of either 1000 or 330 cents/s. These two velocity conditions resulted in "ramp-shaped" PSS with durations of 100 or 300 ms. Second, the subject's voice feedback was delayed upon initiation of the PSS by 100, 200, or 300 ms. Thus, there were six experimental conditions. The PSS remained on for 3 s, although only data up to 1.5 s poststimulus onset were analyzed. Within each block of 30 trials, 15 upward stimuli were pseudo-randomly mixed with 15 trials in which no stimuli were presented.
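The ramp durations follow from dividing the 100-cent stimulus magnitude by the onset velocity: 100/1000 s = 100 ms and 100/330 s ≈ 303 ms, which the text rounds to 300 ms. The sketch below builds such a ramp-and-hold stimulus in cents; the function names and the sampled representation are illustrative assumptions, and only the upward direction used in experiment 2 is handled.

```python
def ramp_duration_ms(magnitude_cents: float, velocity_cents_per_s: float) -> float:
    """Time for a ramp to reach its final magnitude, in ms."""
    return 1000.0 * magnitude_cents / velocity_cents_per_s


def ramp_pss(magnitude_cents, velocity_cents_per_s, fs, total_s):
    """Upward ramp-and-hold PSS in cents, sampled at fs (samples/s)."""
    n_total = int(round(total_s * fs))
    # rise linearly at the onset velocity, then hold at the final magnitude
    return [min(velocity_cents_per_s * n / fs, magnitude_cents)
            for n in range(n_total)]
```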


V. RESULTS

Figure 4 shows representative averaged data from one subject as well as simulated data using the mathematical model of Fig. 6, to be discussed later. Of the 138 averaged responses recorded and analyzed, 134 were "compensatory" and 4 were "following" (the F0 changed in the same direction as the stimulus). All 4 of the "following" responses occurred when the stimulus onset velocity was 1000 cents/s; two occurred under a DAF of 200 ms and two under a DAF of 300 ms. There were 83 cases in which the end of a clearly defined phasic peak could be distinguished from the prolonged F0 response (see Fig. 1). Thus, we were able to measure phasic response onset variables from all 138 responses, but could analyze "end of phasic response" variables from only 83 cases. Figure 4(b) illustrates the averaged wave from the 300-ms delay, 100-ms ramp condition along with the standard error of the waves making up the average.

For statistical analysis, latency, peak time, phasic peak duration, and peak magnitude were evaluated with respect to the independent variables. Peak magnitude, peak time, and peak duration were log-transformed to achieve homogeneity of variance. Offset slope and phasic peak end point were highly correlated with magnitude and duration, so they were not evaluated statistically. There was a main effect of delay (F_approx = 2.70, df = 8, 146, p < 0.005) but not of stimulus onset velocity. As in experiment 1, the DAF condition primarily affected the later parts of the response, while stimulus velocity tended to affect earlier parts of the response. Figures 5(a) and 5(b) show boxplots of peak time and response latency. Although the difference was not significant, peak time increased from 424 ms in the 1000 cents/s condition to 506 ms in the 330 cents/s condition. Latency increased from a mean of 186 ms in the 1000 cents/s condition to 222 ms in the 330 cents/s condition. The overall mean latency was 204 ms (s.d. = 63 ms). Boxplots in Fig. 5(c) illustrate a progressive increase in the duration of the phasic response as the delays became longer. There were significant differences in the 300-ms (415 ms) versus 100-ms (256 ms) comparison (p < 0.001) and in the 300-ms versus 200-ms (309 ms) comparison (p < 0.05). Mean peak magnitude was weakly affected by the DAF interval, decreasing from 39 cents with the 100-ms delay to 34 cents with the 300-ms delay (p < 0.05). The overall mean response magnitude was 3[…] cents (s.d. = 20.5 cents).

FIG. 4. (a) Examples of averaged voice F0 responses to ramped PSS in the increasing direction from experiment 2. Each of the six boxes represents data for a combination of ramp duration and delay condition (duration-delay). The step magnitude was 100 cents in all cases. Solid lines are experimental data and dashed lines are simulations. Upper traces are feedback signals and lower traces are voice F0. Simulations use the mathematical model of Fig. 6, parametrized with median values. (b) Enlarged view of the average wave and standard error (gray) of the averaged responses for the 300-ms delay, 100-ms ramp condition.

Again, we fit the parameters of the model of Fig. 6 to the experimental data and determined the values of three filter parameters and the delay parameter that minimized the variance between the experimental data and the simulation. The second-order filter of Eq. (1) did not do nearly as well in reproducing the data, and the average vaf was less than 0%. A less constrained second-order filter, still having three parameters, did better:

$$ \frac{a\,s}{b\,s^{2} + c\,s + 1} \qquad (2) $$

When the filter parameters were allowed to vary for each condition, fits were obtained with a mean vaf of 60.3%. Using the median values a = 4.31, b = 14.9, c = 70.7, and delay = 0.1 s, the vaf was only 8%.

FIG. 5. (a) Boxplots for experiment 2 of peak time (A) and latency (B) as a function of stimulus onset velocity, and of phasic peak duration (C) as a function of feedback delay.

VI. DISCUSSION

The simultaneous presentation of PSS and DAF is a paradigm in which subjects begin each vocalization trial able to hear their voice feedback in real time. Only with the onset of the PSS is a delay (50–500 ms) introduced in the processing of the subject's voice. As a result, the subject hears the PSS as a change in pitch of the feedback voice signal, but perception of the subject's response is delayed until the particular delay interval has elapsed. Thus, the subject hears the PSS in real time but does not immediately hear his/her own response. Subjects were not told ahead of time that their responses would be delayed. Moreover, the delay was not apparent to the investigators, so it is unlikely that the subjects noticed it and responded to it. Figure 2 illustrates the primary finding of this study: the delayed feedback prolonged the duration of the phasic peak response. Combining a delay with a reduction in stimulus velocity also led to an increase in latency and a reduction in response magnitude.

A key assumption of this study is that in the DAF condition subjects could not hear their true vocal output. While we cannot be certain that their unshifted, undelayed productions were masked by the processed auditory feedback signal relayed via the headphones, we are reasonably sure that whatever perception was present was not a major influence on the responses. The addition of 70 dB SPL masking noise to the processed signal helped reduce perception of the unprocessed signal via bone conduction. Voice loudness was relatively quiet, approximately 70 dB SPL, while the fed-back voice was louder, at approximately 80 dB SPL. This difference, as well as the headphone padding over the ears, reduced hearing of any side-tone signal. In preliminary tests, the experimenters themselves failed to detect the bone-conducted or side-tone signal under these conditions. We are thus reasonably confident that subjects did not hear their own voice response during the delay period.

FIG. 6. (a) Mathematical model of compensatory responses composed of an open-loop pathway (Desired F0 → Gain → F0) and a feedback pathway (upper portion of diagram). Dynamics are implemented with two matched delays and a second-order filter [see Eqs. (1) and (2) in the text for details]. Components to the left of the dotted line represent the person and those to the right represent the model.

The model is an elaboration of a structure originally proposed elsewhere (Hain et al., 2000; Larson et al., 2000). In brief, the input to the model is the desired F0 and the output is vocalization (F0). Internal delays are postulated to represent known response latencies, and the harmonizer DAF circuitry is implemented with a variable delay. After the F0 error (the disparity between intended and auditory F0) is computed, the result is processed with a filter and used to adjust the F0 drive signal.
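The loop described above can be sketched as a forward-Euler simulation using the Eq. (1) filter and the median parameters from experiment 1. This is a simplified stand-in, not the authors' Simulink model: it assumes a desired F0 of 0 cents, an open-loop gain of 1, a single lumped internal delay in place of the two matched delays of Fig. 6, and a DAF delay active for the whole run (equivalent here because the system is at rest before the PSS). The function name is hypothetical.

```python
def simulate_pss_daf(daf_s, delay_s=0.06, f_gain=1.15, Tc=0.170, Ta=0.235,
                     pss_cents=-100.0, pss_dur_s=1.0, total_s=1.5, dt=0.001):
    """Forward-Euler sketch of a negative-feedback F0 loop with DAF.

    Returns the produced F0 deviation (cents), one value per time step.
    """
    n_total = int(round(total_s / dt))
    k_int = int(round(delay_s / dt))   # lumped internal delay (assumption)
    k_daf = int(round(daf_s / dt))     # harmonizer DAF delay
    k_pss = int(round(pss_dur_s / dt))
    if k_int < 1:
        raise ValueError("internal delay must be at least one time step")

    f0 = [0.0] * n_total      # produced F0 deviation (cents)
    heard = [0.0] * n_total   # auditory feedback (cents)
    x_lp = 0.0                # state of f_gain/(s*Tc + 1)
    z = 0.0                   # state of the adaptation stage s*Ta/(s*Ta + 1)
    for n in range(n_total):
        # error between desired F0 (0) and what was heard delay_s ago
        err = -heard[n - k_int] if n >= k_int else 0.0
        x_lp += dt * (f_gain * err - x_lp) / Tc
        z += dt * (x_lp - z) / Ta
        f0[n] = x_lp - z                      # adaptation output: x - z
        own = f0[n - k_daf] if n >= k_daf else 0.0
        heard[n] = own + (pss_cents if n < k_pss else 0.0)
    return f0
```

For a downward PSS the simulated F0 rises (a compensatory response). Runs with different DAF values are identical until the subject's own delayed response reaches the error computation, after which they diverge, which is the "transiently opened loop" the paradigm relies on.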

The filter element (Fig. 6) is critical for simulating details of the experimental data. For the step responses of experiment 1, a second-order filter consisting of two cascaded first-order dynamic elements was used [see Eq. (1)]. The first element is a simple low-pass filter, $\mathrm{gain}/(sT_c+1)$, with a median time constant ($T_c$) of 0.17 s. The second is an adaptation operator of the form $sT_a/(sT_a+1)$, with a median time constant ($T_a$) of 0.235 s. The adaptation operator causes the overall filter response to a step input to eventually decay to 0, and it greatly improved the performance of the filter in simulating experimental data compared to a low-pass filter alone. The frequency response of the entire filter is that of a band-pass filter with a broad peak centered at 5 Hz, where the phase shift is 0, with roll-off at both low and high frequencies. Others have also described the vocal control system in terms of a second-order system (Kawahara, 1995; Ternström and Friberg, 1989).
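The properties claimed for the cascade (a step response that peaks and then decays to 0, and a band-pass frequency response with zero phase at its peak) can be checked directly from the two first-order elements. This sketch uses only the stated form and median constants with a unit gain; the function names are hypothetical. One observation, offered with caution because Eq. (1) itself is not reproduced in this section: the zero-phase (and peak-gain) frequency of this cascade is $\omega_0 = 1/\sqrt{T_cT_a}$, which for the median constants is about 5.0 rad/s, or roughly 0.8 Hz, so the "5 Hz" above matches this value if it is read as an angular frequency.

```python
import math

TC, TA = 0.17, 0.235  # median time constants from the text (s)


def eq1_step_response(t, gain=1.0, tc=TC, ta=TA):
    """Unit-step response of gain/(s*Tc + 1) * s*Ta/(s*Ta + 1) for t >= 0."""
    return gain * ta / (ta - tc) * (math.exp(-t / ta) - math.exp(-t / tc))


def eq1_frequency_response(omega, gain=1.0, tc=TC, ta=TA):
    """Complex gain H(j*omega) of the same cascade; omega in rad/s."""
    s = 1j * omega
    return gain / (s * tc + 1) * (s * ta) / (s * ta + 1)


# Time of the step-response peak (set the derivative to zero).
t_peak = TC * TA * math.log(TA / TC) / (TA - TC)
# Zero-phase, peak-gain frequency of the band-pass: omega0 = 1 / sqrt(Tc * Ta).
omega0 = 1.0 / math.sqrt(TC * TA)
h0 = eq1_frequency_response(omega0)

print(round(t_peak, 3), eq1_step_response(5.0))  # peak time; near zero after 5 s
print(round(omega0, 2), "rad/s =", round(omega0 / (2 * math.pi), 2), "Hz")
print(abs(h0), math.atan2(h0.imag, h0.real))     # gain Ta/(Tc+Ta), phase ~0
```

With these constants the step response peaks at about 0.2 s, and at $\omega_0$ the filter gain reduces to $T_a/(T_c+T_a) \approx 0.58$ with no phase shift.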

For the ramp responses, the filter of Eq. (1) did not fit the experimental data well, but the less constrained second-order system of Eq. (2) performed reasonably well. Although both Eqs. (1) and (2) are second-order systems, with Eq. (1) a degenerate form of Eq. (2), they differ in that the roots of the second-order system can take on complex values in the form of Eq. (2). This allows Eq. (2) to produce responses exhibiting damped oscillations, whereas Eq. (1) can only produce responses consisting of the sum of two exponentials. The frequency response of the entire filter is again that of a band-pass filter, but with a higher center frequency than that of Eq. (1) (approximately 10 Hz).
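The difference between the two forms can be checked from the denominator coefficients. Expanding the cascade of Eq. (1) gives $T_cT_as^2 + (T_c+T_a)s + 1$, whose damping ratio $\zeta = (T_c+T_a)/(2\sqrt{T_cT_a})$ is at least 1 by the arithmetic-geometric mean inequality, so its poles are always real. The sketch assumes Eq. (2), which is not reproduced in this section, can be written with a general denominator $s^2 + 2\zeta\omega_n s + \omega_n^2$; the values ζ = 0.4 and ω_n = 10 rad/s are hypothetical, chosen only to illustrate the underdamped case.

```python
import cmath
import math


def second_order_poles(a2, a1, a0):
    """Roots of the denominator a2*s**2 + a1*s + a0 (complex in general)."""
    disc = cmath.sqrt(a1 * a1 - 4 * a2 * a0)
    return (-a1 + disc) / (2 * a2), (-a1 - disc) / (2 * a2)


def damping_ratio(a2, a1, a0):
    """zeta after normalizing to s**2 + 2*zeta*wn*s + wn**2."""
    return a1 / (2 * math.sqrt(a2 * a0))


tc, ta = 0.17, 0.235
# Eq. (1) denominator: (s*Tc + 1)(s*Ta + 1) = Tc*Ta*s^2 + (Tc + Ta)*s + 1
eq1 = (tc * ta, tc + ta, 1.0)
# Hypothetical underdamped stand-in for Eq. (2): zeta = 0.4, wn = 10 rad/s
zeta, wn = 0.4, 10.0
eq2_like = (1.0, 2 * zeta * wn, wn * wn)

print(damping_ratio(*eq1), second_order_poles(*eq1))            # real poles
print(damping_ratio(*eq2_like), second_order_poles(*eq2_like))  # complex pair
```

With the median constants, ζ ≈ 1.013, just above critical damping, and the poles sit at $-1/T_a$ and $-1/T_c$; the hypothetical underdamped denominator has the complex pair of about −4 ± 9.17j, which is what produces damped oscillations.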

There are several possible reasons why the best-fit models for the step and ramp inputs might differ. The ramp responses were longer than the step responses, owing to the more prolonged stimulus. A longer stimulus might allow other tracking mechanisms to participate, so the response would not be as well fit by a model that reflects early processing. Another possibility is that the ramp stimulus may invoke vocal control systems that make the response more complex than the response to the step. At present there are insufficient data to decide which of the two second-order filters is more accurate. While the mathematical model is doubtless oversimplified and similar behavior might be produced by other constructs, it does demonstrate that such a feedback account is feasible and provides a quantitative hypothesis that can be tested in future experiments.

Returning to the data, the inclusion of the DAF conditions did not affect measures taken from the initial aspects of the responses. The latency, onset slope, and peak times were unchanged by DAF. The model of Fig. 6 reveals that this immunity results from the approximately 100-ms delay between the auditory stimulus and the F0 response, adding further evidence that the response to the PSS in at least the first 100–150 ms after the PSS is driven by input during the 100 ms preceding the F0 response (indicated by “*” in Fig. 2).

2151 J. Acoust. Soc. Am., Vol. 109, No. 5, Pt. 1, May 2001

There was a nonsignificant trend for greater DAF delay to be accompanied by greater peak magnitude in experiment 1. Had it been significant, this difference would have suggested that in situations where an auditory-vocal disparity is artificially maintained, subjects could continue to change voice F0 indefinitely: a “runaway” situation. Instead, it appears that factors other than auditory input limit the magnitude of the response. These factors might simply include dynamics that adapt to the input disparity, such as we have proposed in the filter element of our model, or nonauditory sources of feedback, e.g., efference copy or proprioceptive feedback.
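The contrast between an adapting filter and a runaway can be illustrated by driving a controller with a disparity that never shrinks, i.e., an open loop in which the subject's F0 change does not reduce what is heard. The adapting path below is the low-pass plus adaptation cascade with the median constants; the pure integrator with gain `ki` is a hypothetical contrast, not part of the authors' model, and the 100-cent disparity and 1-ms step are assumptions.

```python
import math


def open_loop_outputs(disparity=100.0, tc=0.17, ta=0.235, ki=1.0,
                      duration_s=3.0, dt=0.001):
    """Drive two controllers with a disparity that is held constant (open loop).

    Returns (adapting, integrating) output traces in cents. `ki` (1/s) and the
    integrating path are hypothetical, used only to show what a runaway looks like.
    """
    a_c, a_a = math.exp(-dt / tc), math.exp(-dt / ta)
    lp = hp = lp_prev = integ = 0.0
    adapting, integrating = [], []
    for _ in range(int(round(duration_s / dt))):
        lp = a_c * lp + (1.0 - a_c) * disparity  # 1 / (s*Tc + 1)
        hp = a_a * (hp + lp - lp_prev)           # approx. s*Ta / (s*Ta + 1)
        lp_prev = lp
        integ += ki * disparity * dt             # ki / s
        adapting.append(hp)
        integrating.append(integ)
    return adapting, integrating


adapting, integrating = open_loop_outputs()
print(max(adapting), adapting[-1])  # bounded peak, then back near 0
print(integrating[-1])              # grows for as long as the disparity lasts
```

Because the adaptation operator has zero gain for a sustained input, the adapting output stays bounded even when the disparity is maintained, which is one way the response magnitude could be limited without any runaway.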

The reduced response magnitude observed in experiment 2 (mean = 35.8 cents) compared to that in experiment 1 (mean = 69 cents) suggests that lower stimulus onset velocities under DAF conditions cause a reduction in response magnitude. Previous findings from experiments without DAF (Larson et al., 2000) showed that response magnitude increased modestly with decreases in onset velocity, and it was suggested that lower stimulus onset velocities might be more effective in eliciting responses to PSS (Larson et al., 2000). Results of the present study indicate that this relationship does not apply under DAF conditions. The model of Fig. 6 also supported the present findings, as simulated response magnitude for experiment 2 declined with decreasing stimulus onset velocity. The response decline was caused by the low-frequency roll-off characteristic of the filter of Eqs. (1) and (2).
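The link between slower onsets and a smaller response through low-frequency roll-off can be sketched with the Eq. (1) cascade alone (unit gain, median constants). A ramp-and-hold pitch shift is a superposition of small steps, so its output is the onset velocity times the integral of the step response over the ramp interval. The 100-cent shift size and the helper names are hypothetical; Eq. (2) is not modeled.

```python
import math

TC, TA = 0.17, 0.235  # median time constants from the text (s)


def eq1_step(t, tc=TC, ta=TA):
    """Unit-step response of 1/(s*Tc + 1) * s*Ta/(s*Ta + 1)."""
    return ta / (ta - tc) * (math.exp(-t / ta) - math.exp(-t / tc)) if t > 0 else 0.0


def eq1_step_integral(t, tc=TC, ta=TA):
    """Closed-form integral of eq1_step from 0 to t."""
    if t <= 0:
        return 0.0
    return ta / (ta - tc) * (ta * (1 - math.exp(-t / ta)) - tc * (1 - math.exp(-t / tc)))


def ramp_hold_response(t, size_cents, velocity):
    """Output for a shift that ramps at `velocity` (cents/s) up to `size_cents`, then holds."""
    ramp_s = size_cents / velocity
    return velocity * (eq1_step_integral(t) - eq1_step_integral(t - ramp_s))


def peak(size_cents, velocity, duration_s=2.0, dt=0.001):
    """Largest output over the simulated window, sampled every dt seconds."""
    return max(ramp_hold_response(k * dt, size_cents, velocity)
               for k in range(int(round(duration_s / dt))))


SIZE = 100.0  # hypothetical shift magnitude (cents)
for v in (1000.0, 330.0):
    print(v, peak(SIZE, v))  # the slower onset gives the smaller peak
```

Spreading the same shift over a longer ramp moves its energy toward low frequencies, where the adaptation operator attenuates it, so the filtered peak falls below the step-response peak and falls further at 330 cents/s.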

The features of the PSS response that were affected by the delay conditions were related to the duration and offset of the response or, in other words, later aspects of the response. One clear effect was the increase in phasic peak duration with the DAF interval (Figs. 2, 4, and 5). The event that apparently terminated the phasic peak response was the response to the delayed perception of the initial F0 change (marked by asterisks in Fig. 2). This behavior was also replicated in mathematical simulations, and it is an example of instability. As the delay period increases, the duration of the phasic peak change in F0 is inappropriately extended. While the instability was not severe, it may be concluded that subjects' sustained perception of pitch-shifted feedback led to a prolonged voice F0 response. This again confirms that the voice F0 response is achieved through a negative feedback circuit rather than through a more primitive stereotyped reaction to a change in auditory input (Sapir et al., 1983).
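The timing argument can be made explicit with simple arithmetic. Assuming a single fixed response latency of about 100 ms (the approximate value given above; treating it as constant across conditions is an assumption), the subject's own F0 change begins one latency after the PSS, is heard one DAF interval later, and a response to hearing it can begin one further latency after that. The helper name is hypothetical.

```python
def daf_timing_ms(daf_ms, latency_ms=100):
    """Rough event times (ms after PSS onset) under a fixed-latency assumption.

    Returns (heard, end): when the delayed perception of the initial F0 change
    arrives, and the earliest time a response to it can end the phasic peak.
    """
    heard = latency_ms + daf_ms       # own F0 change, heard after the DAF delay
    return heard, heard + latency_ms  # one more latency to respond to it


for daf in (0, 50, 100, 200, 300, 500):
    print(daf, daf_timing_ms(daf))
```

Under this assumption each added millisecond of DAF postpones the earliest possible end of the phasic peak by one millisecond, which is the direction of the prolongation seen in the data.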

In experiment 2, the DAF condition had no effect on the early components of the response; however, PSS onset velocity combined with DAF had a weak effect on these components. Latency and peak time decreased as the PSS onset velocity increased. Also, responses in experiment 2 had longer overall latencies than those observed in experiment 1, in which sudden pitch-shift onsets were used. This finding is in agreement with previous findings in which low-velocity stimuli elicited responses with long latencies (Larson et al., 2000).

To summarize, the results of experiment 1 showed that when a subject's audition of their response to a step disparity between their intended F0 and a fed-back F0 is delayed, the response is prolonged but its magnitude does not change. At longer delays, e.g., >300 ms, the effect is an increase in the instability of voice F0. In experiment 2, when the pitch-shift onset velocity was reduced to form a ramp stimulus and the subject's audition of their response to it was likewise delayed, the duration of the subject's response also increased. With reduced pitch-shift velocity, latency increases and magnitude declines. Mathematical modeling of the response using a second-order filter reproduced the step responses well, but did not reproduce the ramps as accurately. This suggests that other processes not reflected by our mathematical model may complicate the longer responses.

ACKNOWLEDGMENTS

We gratefully acknowledge the assistance of Rok Akhavein, who wrote computer programs for data analysis, and of Dr. Mary Kay Kenney and Danielle Lodewyck for help in data analysis. This research was supported by NIH Grant No. DC02764-01.

Burnett, T. A., Freedland, M. B., Larson, C. R., and Hain, T. C. (1998). “Voice F0 responses to manipulations in pitch feedback,” J. Acoust. Soc. Am. 103, 3153–3161.


Burnett, T. A., Senner, J. E., and Larson, C. R. (1997). “Voice F0 responses to pitch-shifted auditory feedback; a preliminary study,” J. Voice 11, 202–211.

Hain, T. C., Larson, C. R., Burnett, T. A., Kiran, S., and Singh, S. (2000). “Instructing participants to make a voluntary response reveals the presence of two vocal responses to pitch-shift stimuli,” Exp. Brain Res. 130, 133–141.

Kawahara, H. (1995). “Hearing voice: Transformed auditory feedback effects on voice pitch control,” presented at Computational Auditory Scene Analysis and International Joint Conference on Artificial Intelligence, Montreal.

Kawahara, H., and Aikawa, K. (1996). “Contributions of auditory feedback frequency components on F0 fluctuations,” J. Acoust. Soc. Am. 100, 2825.

Larson, C. R., Burnett, T. A., Freedland, M. B., and Hain, T. C. (1997). “Voice F0 responses to manipulations in pitch feedback stimuli,” presented at First International Conference on Voice Physiology and Biomechanics, Evanston, IL.

Larson, C. R., Burnett, T. A., Kiran, S., and Hain, T. C. (2000). “Effects of pitch-shift onset velocity on voice F0 responses,” J. Acoust. Soc. Am. 107, 559–564.

Larson, C. R., Carrell, T. D., Senner, J. E., Burnett, T. A., and Nichols, L. L. (1995). “A proposal for the study of voice F0 control using the pitch-shifting technique,” in Vocal Fold Physiology: Voice Quality Control, edited by O. Fujimura and M. Hirano (Singular, San Diego), pp. 321–331.

Larson, C. R., White, J. P., Freedland, M. B., and Burnett, T. A. (1996). “Interactions between voluntary modulations and pitch-shifted feedback signals: Implications for neural control of voice pitch,” in Vocal Fold Physiology: Controlling Complexity and Chaos, edited by P. J. Davis and N. H. Fletcher (Singular, San Diego), pp. 279–289.

Sapir, S., McClean, M. D., and Larson, C. R. (1983). “Human laryngeal responses to auditory stimulation,” J. Acoust. Soc. Am. 73, 315–321.

Ternström, S., and Friberg, A. (1989). “Analysis and simulation of small variations in the fundamental frequency of sustained vowels,” Report No. 3, Speech Transmission Laboratory, Royal Institute of Technology, Stockholm.
