

Journal of Speech and Hearing Research, Volume 38, 1212-1223, December 1995

Comparisons Among Aerodynamic, Electroglottographic, and Acoustic Spectral Measures of Female Voice

Eva B. Holmberg
Voice and Speech Laboratory, Massachusetts Eye and Ear Infirmary, Boston
and Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge

Robert E. Hillman
Voice and Speech Laboratory, Massachusetts Eye and Ear Infirmary, Boston
and Department of Otology and Laryngology, Harvard Medical School
and Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge

Joseph S. Perkell
Peter C. Guiod
Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge

Susan L. Goldman
Voice and Speech Laboratory, Massachusetts Eye and Ear Infirmary, Boston

This study examines measures of the glottal airflow waveform, the electroglottographic signal (EGG), amplitude differences between peaks in the acoustic spectrum, and observations of the spectral energy content of the third formant (F3), in terms of how they relate to one another. Twenty females with normal voices served as subjects. Both group and individual data were studied. Measurements were made for the vowel in two speech tasks: strings of the syllable /pæ/ and sustained phonation of /æ/, which were produced at two levels of vocal effort: comfortable and loud voice. The main results were:

1. Significant differences in parameter values between /pæ/ and /æ/ were related to significant differences in the sound pressure level (SPL).

2. An "adduction quotient," measured from the glottal waveform at a 30% criterion, was sensitive enough to differentiate between waveforms reflecting abrupt versus gradual vocal fold closing movements.

3. DC flow showed weak or nonsignificant relationships with acoustic measures.

4. The spectral content in the third formant (F3) in comfortable loudness typically consisted of a mix of noise and harmonic energy. In loud voice, the F3 spectral content typically consisted of harmonic energy.

5. Significant differences were found in all measures between tokens with F3 harmonic energy and tokens with F3 noise, independent of loudness condition.

6. Strong relationships between flow- and EGG-adduction quotients suggested that these signals can be used to complement each other.

7. The amplitude difference between spectral peaks of the first and third formant (F1-F3) was found to add information about abruptness of airflow decrease (flow declination) that may be lost in the glottal waveform signal due to low-pass filtering.

The results are discussed in terms of how an integrated use of these measures can contribute to a better understanding of the normal vocal mechanism and help to improve methods for evaluating vocal function.

KEY WORDS: female voice, glottal airflow waveform, glottal aperture, acoustic spectral slope, electroglottography

© 1995, American Speech-Language-Hearing Association 0022-4685/95/3806-1212

This study is a continuation of our previous work that examines aerodynamic and acoustic measures of voice production. Its objectives are to (a) gain a deeper understanding of normal voice production and provide normative data for studies of vocal pathology (Holmberg, Hillman, & Perkell, 1988, 1989; Holmberg, Hillman, Perkell, & Gress, 1994a, 1994b); and (b) improve methods (Perkell, Holmberg, & Hillman, 1991) for assessing voice disorders and the efficacy of their treatment, with a particular emphasis on hyperfunctional voice disorders (Hillman, Holmberg, Perkell, Walsh, & Vaughan, 1989, 1990). Vocal hyperfunction is often associated with abnormally loud voice production (Hillman et al., 1989, 1990), and one of the most common hyperfunctional voice disorders, vocal nodules (Bone & McFarlane, 1988; Nagata, Kurita, Yasumoto, Maeda, Kawaski, & Hirano, 1983), is reported to be approximately twice as common in adult females as in adult males (Herrington-Hall, Lee, Stemple, Niemi, & McHone, 1988; Nagata et al., 1983). Therefore, this study focused on female voices under conditions of comfortable and loud phonation.

Our previous elicitation material consisted of strings of repeated /pæ/ syllables. In this study subjects also produced sustained phonation of /æ/ vowels. The /pæ/ syllables are used for inferred measures of the level of subglottal air pressure during the vowel (Rothenberg, 1973; Smitheran & Hixon, 1981; Löfqvist, Carlborg, & Kitzing, 1982). These inferred pressure values are then compared to acoustic and aerodynamic measures of voicing during the vowel. However, the /æ/ vowel from these syllables is too short for measurements of signal periodicity (cf. Hillenbrand, 1988), such as amplitude and frequency perturbation. Vowels from syllable strings are also too short for excision and presentation as stimuli for potential use in perceptual tests to evaluate aspects of voice quality, such as "breathiness." Our intention, then, is to combine measures from the two speech tasks. However, because parameter values obtained for syllable repetition may differ from those obtained for sustained vowel phonation (Higgins & Saxman, 1993), we examined relationships and differences between measures obtained from the two types of productions.

To date we have focused on glottal airflow parameters that were obtained from the inverse filtered oral airflow signal, and our acoustic analyses have included only the sound pressure level (SPL) and fundamental frequency (F0). However, in using indirect methods to estimate laryngeal function, it is important to combine measures from different techniques and to determine how well the measures correspond with each other (cf. Hertegård & Gauffin, 1995). Furthermore, the inverse filtering technique is not without problems, which may become especially significant in studies of large groups of subjects. In settings that are not oriented primarily toward research, such as in clinics where voice disorders are evaluated and treated, audio recordings and acoustic analyses may be easier to obtain than recordings and analyses of glottal airflow, and the necessary acoustic equipment is more readily available (i.e., many more "user friendly" products are commercially available for acoustic recording and analysis than there are for aerodynamic assessments). Therefore, in this study we added measures of the slope of the acoustic spectrum. One purpose was to determine if such acoustic measures could be used to supplement, or even substitute for, flow measures that are harder to obtain and/or are especially sensitive to technical difficulties.

One difficulty with our flow-based technique is assuring that there is a perfect seal between the subject's face and the transducer mask; such a seal is necessary to obtain reliable flow measures. Air leaks are difficult to detect during the recordings, and even small leaks have large effects on the amplitude-based flow data that include the DC flow component. Another difficulty is the occurrence of unsuccessful inverse filtering, which can result in formant residuals superimposed on the glottal waveform. Ripple, due to formant residuals, can cause unreliable time-based measurements, such as measurements of adduction quotient (closed time/time for the entire cycle). Therefore, to look for acoustic information to complement adduction quotient, we measured the amplitude difference between the two first harmonics (H1-H2), which reflects the degree to which the waveform tends towards a sinusoidal shape (Klatt & Klatt, 1990) and should be indirectly related to the degree of glottal abduction.

An additional issue is raised by the low-pass filtering that is used in our inverse filtering algorithm (Perkell, Holmberg, & Hillman, 1991). For female voices, low-pass filtering at 1100 Hz is used to avoid the effects of a resonance in the transducer face mask that occurs just above 1 kHz (Badin, Hertegård, & Karlsson, 1990) and to simplify the inverse filtering procedure. However, low-pass filtering has the undesirable effect of "rounding the corners" of waveform discontinuities, such as at the instant of vocal fold closure, the event that produces the main excitation of the vocal tract (Fant, 1979). Therefore, a glottal waveform associated with high closing velocities and abrupt reduction of the airflow may be more affected by low-pass filtering than a waveform with low closing velocities and gradual reduction of the airflow stream. Abrupt reduction of the flow generates high frequency energy in the acoustic spectrum (Stevens, 1977). Therefore, to look for information that reflects possible abrupt changes in the glottal waveform, we measured the spectral amplitude from the acoustic signal in the frequency region of the third formant (F3), well above the flow low-pass frequency of 1100 Hz. Two measures were used: the difference in amplitudes between the fundamental (H1) and the highest spectral peak in the F3 region (H1-F3), and the difference in amplitudes between the highest spectral peaks in the first and third formants (F1-F3).

Measures made from the acoustic spectrum may also assist in an objective evaluation of voice quality, because much of speech perception is based on auditory spectral analyses of the speech signal (Klatt & Klatt, 1990). One prominent aspect of voice quality is degree of perceived "breathiness" (cf. Hammarberg, 1986). A breathy voice is the result of incomplete vocal fold closure and increased subglottal coupling (Klatt & Klatt, 1990; Södersten & Lindestad, 1990; Södersten, Lindestad, & Hammarberg, 1991). Reported spectral correlates of increased subglottal coupling are an increase of the relative amplitude of the fundamental (Bickley, 1982; Fant, 1979, 1982; Hillenbrand, Cleveland, & Erickson, 1994; Huffman, 1987; Klatt & Klatt, 1990; Sundberg & Gauffin, 1979) and a widening of the first formant (F1) bandwidth with subsequent decrease of the F1 amplitude (Fant, 1979). The acoustic spectra for breathy voices also have been found to display noise in the frequency region of F3 (Hillenbrand et al., 1994; Klatt & Klatt, 1990) and reduced amplitudes of the higher frequency harmonics (Klatt & Klatt, 1990) with a steeper overall downward spectral slope (Klich, 1982; Stevens, 1977). Klatt and Klatt (1990) showed a strong relationship between perceived breathiness and the amplitude difference between the two first harmonics of the acoustic spectrum (H1-H2; cf. Bickley, 1982). Thus, this study also examines relationships between the measures of spectral amplitude (H1-H2, H1-F1, H1-F3, and F1-F3) and glottal airflow waveform measures, as well as their relationship to SPL in two levels of vocal effort. In addition, observations were made of the spectral content of the third formant frequency region, in terms of the presence of noise versus harmonic energy. These observations are related to vocal effort as well as to aerodynamic and acoustic measures that should reflect degree of glottal aperture. For example, incomplete vocal fold closure should be reflected in decreased adduction quotient and possibly increased DC flow, and should result in reduced high frequency harmonic energy in combination with high frequency noise (Klatt & Klatt, 1990).

Finally, we have also included measures obtained from an electroglottographic (EGG) signal. The literature has suggested that the EGG waveform contains a number of interesting features that could be useful in understanding the underlying vocal-fold vibration pattern (Childers, Hicks, Moore, Eskenazi, & Lalwani, 1990; Childers & Lee, 1991; Rothenberg & Mahshie, 1988). We used the EGG waveform for deriving an adduction quotient in order to determine whether any useful information could be obtained from the EGG-based quotient that was not available from the analogous flow-based quotient.

Method

In this section, the methods are described briefly. More detailed descriptions of recording procedures, signal processing, data extraction, and analysis procedures are given in Perkell, Holmberg, and Hillman (1991).

Subjects

Twenty females served as subjects. Subjects' ages ranged from 20 to 43 years, with an average of 27.5 years. All subjects were native speakers of American English, had no history of voice or hearing problems, and were nonsmokers. None had professional singing or speaking training.

Speech Material

Two different speech tasks were produced: (a) strings of five repetitions of the syllable /pæ/ (/pæpæpæpæpæ/) at a rate of about 1.5 syllables per second, and (b) the vowel /æ/, sustained for 2-3 seconds. The two types of utterances were produced alternately three times each in two levels of vocal effort (henceforth called "loudness conditions"): "comfortable" and loud voice. Comfortable loudness was equal to the subject's preferred SPL during a short monologue, and loud voice was approximately 6 dB (±1 dB) above the subject's comfortable level. Subjects produced the comfortable voice level first, followed by loud voice. Loudness levels were monitored by one of the experimenters, using an oscilloscope. The subjects were allowed to practice the tasks if needed, and none had any difficulty in performing the tasks.

Recording Procedures

The subject was seated in a sound-isolated booth. One experimenter monitored the subject in the booth. Outside the booth, the signals were recorded on tape with a data recorder (TEAC RD-111T PCM) and were monitored on a multichannel oscilloscope by a second experimenter. Recordings were made of oral airflow, intraoral air pressure, sound pressure, and EGG.

Oral airflow. Oral airflow was transduced with a "Rothenberg mask," which is a high time-resolution pneumotachograph (Glottal Enterprises; Rothenberg, 1973, 1977) with a flat frequency response up to 1 kHz (Badin, Hertegård, & Karlsson, 1990). It consists of a circumferentially vented face mask, with the venting holes covered with resistive wire mesh. The pneumotachograph was hand-held by the experimenter as tightly as possible against the subject's face, taking care to assure a tight seal between face and mask (while not interfering with utterance production). At the time of the recording, the flow signal was low-pass filtered at 1100 Hz, using a linear phase (eight-pole Bessel) filter, in order to eliminate the resonant effects of the vocal tract above the first formant and the transducer mask. A zero flow level was recorded before each syllable string as a measurement reference, in order to minimize errors due to potential drift in the flow transducer system. During the recording of this "zero flow," the mask was held still on a stationary surface.

Calibration signals for flow were recorded for each subject and individual face mask combination. For DC airflow calibration, a series of six DC levels was generated, using an air tank as the airflow source and a rotameter to monitor the flow levels. A mask mold was used to couple the flow source to the face mask (Glottal Enterprises).

Intraoral air pressure. Intraoral air pressure was transduced using a thin, short catheter, which was passed through a fitting in the face mask. One end of the catheter was passed between the subject's lips into the oral cavity to an approximate location just behind the incisors, and the other end was connected to a differential pressure transducer (Glottal Enterprises) with a flat frequency response up to about 30 Hz.

Calibration of air pressure was performed once for each recording session. A series of six pressure levels, from 0 to 24 cm H2O, was produced with a pressure source (syringe) and monitored with a U-tube water manometer.

Sound pressure. Sound pressure was transduced using a small electret microphone (Sony model ECM 50) that was attached to the handle of the pneumotachograph at a fixed, reproducible distance of 15 cm from the subject's lips.

A calibration signal for the sound pressure level (SPL) was recorded for each subject. The signal was produced by means of a buzzer, manufactured for use as an artificial larynx. The buzzer was held above the microphone, which was attached to the pneumotachograph as during the recordings. The SPL value was read off at the microphone, using a sound level meter (on a linear scale, 400-msec averaging time).

EGG. An electroglottographic (EGG) signal was recorded using a laryngograph (Glottal Enterprises). A DBX automatic gain circuit and a linear phase high-pass filter with a cutoff frequency of 20 Hz (Glottal Enterprises, LP-HP 2) were used to normalize the signal amplitude and eliminate the effects of gross larynx movements. At the time of the recording, the EGG signal was low-pass filtered at 1710 Hz using a linear phase (eight-pole Bessel) filter.

Data Processing and Analyses

The recorded oral airflow, oral air pressure, speech, and EGG signals were digitized simultaneously, after which they were demultiplexed and processed further in software (Perkell et al., 1991). At the time of the digitizing, air pressure and a copy of the speech signal (passed through an RMS circuit) were low-pass filtered at 80 Hz (10th-order Butterworth filter) and sampled at 312.5 samples per second (sps) to yield "slowly varying" signals (with respect to the duration of individual glottal cycles). These signals were used for indirect measurements of average transglottal air pressure and SPL.

The oral airflow signal was low-pass filtered (as during the recording) at 1100 Hz. (Low-pass filtering a second time further helped to eliminate the effects of resonances above F1.) The signal was digitized at 10 kHz and used for inverse filtering of the first formant to obtain a glottal airflow waveform. A single zero-pair was used as the antiresonance for the inverse filtering. The center frequency was determined for each token using spectra from linear prediction coding (LPC) and discrete Fourier transformation (DFT) (51.2 msec window) of the flow signal. A number of measurements (described below) were obtained from the glottal airflow waveform.
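The single zero-pair inverse filter described above can be sketched as a three-tap antiresonance. This is a minimal illustration, not the authors' software: the function name, the 100 Hz bandwidth, and the scaling to unity gain at 0 Hz are assumptions made for the example; only the 10 kHz sampling rate comes from the text.

```python
import math

def zero_pair_inverse_filter(x, f1_hz, bw_hz=100.0, fs_hz=10000.0):
    """Cancel one formant with a complex-conjugate zero pair (antiresonance).

    bw_hz and the unity-gain-at-0-Hz scaling are illustrative assumptions.
    """
    r = math.exp(-math.pi * bw_hz / fs_hz)     # zero radius set by the bandwidth
    theta = 2.0 * math.pi * f1_hz / fs_hz      # zero angle set by the center frequency
    b0, b1, b2 = 1.0, -2.0 * r * math.cos(theta), r * r
    gain = b0 + b1 + b2                        # filter response at 0 Hz
    y = []
    for n in range(len(x)):
        x1 = x[n - 1] if n >= 1 else 0.0
        x2 = x[n - 2] if n >= 2 else 0.0
        y.append((b0 * x[n] + b1 * x1 + b2 * x2) / gain)
    return y

# Usage: the impulse response of a matching two-pole resonance (hypothetical
# F1 = 800 Hz) collapses to a single pulse after inverse filtering.
f1, bw, fs = 800.0, 100.0, 10000.0
r = math.exp(-math.pi * bw / fs)
theta = 2.0 * math.pi * f1 / fs
resonance = [r ** n * math.sin((n + 1) * theta) / math.sin(theta) for n in range(50)]
residual = zero_pair_inverse_filter(resonance, f1, bw, fs)
```

If the center frequency is set incorrectly, the cancellation is incomplete and ripple remains on the output, which is the formant-residual problem mentioned in the introduction.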

A second copy of the speech signal, low-pass filtered at 4.5 kHz and sampled at 10 kHz, was used for acoustic spectral analyses. Amplitude differences between peaks on the spectral slope (described below) were obtained from the acoustic spectrum.

The EGG signal was (again) low-pass filtered at 1710 Hz at digitizing and sampled at 10 kHz. A measure of "adduction quotient" (closed time/T) was obtained from this signal (Colton, Brewer, & Rothenberg, 1985).

Data Extraction and Obtained Measures

The data were extracted algorithmically with interactive monitoring, at a vowel midpoint location (Perkell et al., 1991). Four sets of measures were extracted from the signals for the three productions of sustained /æ/ and the middle three syllables of each /pæ/ string in each loudness condition (making three /æ/ tokens and nine /pæ/ tokens per loudness condition). The measures were chosen for their potential relevance to vocal loudness and voice quality (e.g., Fant, 1982; Klatt & Klatt, 1990; Södersten, Lindestad, & Hammarberg, 1991; Sundberg & Gauffin, 1979).

1. Estimates of average transglottal air pressure (the driving force for phonation, cm H2O) were made for the vowel, interpolating from the levels during the occlusions for the voiceless stop consonants /p/ (Holmberg et al., 1988; Löfqvist, Carlborg, & Kitzing, 1982; Rothenberg, 1977; Smitheran & Hixon, 1981). SPL was calculated from the RMS of the speech signal.

2. Glottal airflow waveform measurements (see Figure 1) were averaged over four consecutive cycles (for each token). Measurements were made of AC flow (the modulated portion of the flow, reflecting magnitude of vocal fold oscillation, l/sec), DC flow (the unmodulated flow, l/sec), and flow-adduction quotient (closed time/T).¹ Maximum flow declination rate (an indirect measure of vocal fold closing velocity, l/sec²) was extracted from the first derivative of the glottal airflow waveform. For calculation of adduction quotient, places where the flow level was 30% of the difference between peak and minimum flow were identified and used to define arbitrary times of "opening" and "closing" in the waveform (Colton, Brewer, & Rothenberg, 1985; Rothenberg, 1985).² Measurements at these levels were well above the portion of the signal that indicates vocal fold closure.

3. The EGG signal was used for extraction of a measure of adduction quotient (vocal fold contact time/T), which was measured at the 65% criterion from the same four cycles as for the glottal flow signal. Measurement at the (arbitrary) 65% level was determined to not interfere with the portion of the signal that indicates vocal fold closure. (See Figure 1.)

4. Amplitude differences (dB) were calculated from the acoustic spectra (see Figure 1): the amplitude difference between (a) the first two harmonics (H1-H2),³ (b) the first harmonic and the peak harmonic in the first formant (H1-F1; Stevens & Hanson, 1995), (c) the first harmonic and the spectral peak (harmonic or noise) of the third formant (H1-F3; Stevens & Hanson, 1995), and (d) the peak harmonic of the first formant and the spectral peak (harmonic or noise) of the third formant (F1-F3).
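The waveform measures in items 2 and 3 can be sketched for a single cycle as follows. This is a rough illustration under stated assumptions rather than the extraction software of Perkell et al. (1991): the function names are hypothetical, DC flow is taken as the cycle minimum, the derivative is a simple first difference, the EGG is assumed to be oriented so that larger values mean more vocal fold contact, and the test cycles are synthetic.

```python
import math

def crossing_fraction(cycle, criterion, below=True):
    """Fraction of samples in one cycle below (or above) a level placed at
    `criterion` of the way from the cycle minimum to the cycle maximum."""
    lo, hi = min(cycle), max(cycle)
    level = lo + criterion * (hi - lo)
    if below:
        count = sum(1 for v in cycle if v < level)
    else:
        count = sum(1 for v in cycle if v > level)
    return count / len(cycle)

def glottal_flow_measures(cycle, fs_hz=10000.0):
    """AC flow, DC flow, MFDR, and flow-adduction quotient for one cycle (flow in l/sec)."""
    dc_flow = min(cycle)                     # assumption: unmodulated flow = cycle minimum
    ac_flow = max(cycle) - dc_flow           # modulated portion of the flow
    derivative = [(b - a) * fs_hz for a, b in zip(cycle, cycle[1:])]
    mfdr = -min(derivative)                  # steepest decline, reported as a positive rate
    aq_flow = crossing_fraction(cycle, 0.30, below=True)   # 30% criterion
    return {"ac": ac_flow, "dc": dc_flow, "mfdr": mfdr, "aq": aq_flow}

# Synthetic 200 Hz cycle at 10 kHz: a 30-sample half-sine pulse on a 0.1 l/sec
# offset, followed by 20 samples at the minimum flow level.
open_phase = [0.1 + 0.2 * math.sin(math.pi * (n + 0.5) / 30) for n in range(30)]
cycle = open_phase + [0.1] * 20
measures = glottal_flow_measures(cycle)

# Synthetic contact-positive EGG cycle, measured at the 65% criterion.
egg_cycle = [0.0] * 30 + [1.0] * 20
aq_egg = crossing_fraction(egg_cycle, 0.65, below=False)
```

In the study these values were averaged over four consecutive cycles per token; the sketch handles one cycle so that the averaging step stays separate.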

Qualitative observations were made of the energy content in the frequency region of the third formant (F3). Three levels were used, according to which the spectrum in the F3 region consisted predominantly of (a) harmonics (cf. Figure 1, top), (b) noise (cf. Figure 1, bottom), or (c) a mixture of harmonic energy and noise. The mixed condition was used as a "wastebasket" for uncertain cases, to help in identifying the two extreme levels (i.e., to reduce ambiguity of judgments).

Figure 1 shows examples of acoustic spectra and waveforms for /pæ/ for two different speakers. In both the top and bottom figures, the (A) spectra show the lower frequency region, used for calculation of the amplitude difference between the two first harmonics. The (B) spectra cover frequencies up to 4 kHz; they were used for the measurements of the spectral slope and observation of the F3 energy content (harmonics or noise), as mentioned above. The right portions of the figure show the glottal airflow waveform (C), the first derivative (D), and the (inverted) EGG waveform (E) for the two speakers. The events for data extraction are indicated on the time waveforms by cross-marks (+).
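The amplitude-difference measures read off spectra like those in panels A and B can be illustrated with a single-frequency DFT. The sketch below is an assumption-laden example, not the authors' analysis: the function names are hypothetical, the token is a synthetic two-harmonic signal, and a 50 msec window (rather than the 51.2 msec window mentioned for the flow DFT) is chosen so that a whole number of periods fits.

```python
import math

def component_level_db(signal, freq_hz, fs_hz=10000.0):
    """Level (dB, arbitrary reference) of one spectral component via a single-bin DFT."""
    re = sum(v * math.cos(2.0 * math.pi * freq_hz * n / fs_hz) for n, v in enumerate(signal))
    im = sum(v * math.sin(2.0 * math.pi * freq_hz * n / fs_hz) for n, v in enumerate(signal))
    amplitude = 2.0 * math.hypot(re, im) / len(signal)
    return 20.0 * math.log10(amplitude)

def amplitude_difference_db(signal, freq_a_hz, freq_b_hz, fs_hz=10000.0):
    """Difference in level between two components, e.g. H1-H2."""
    return component_level_db(signal, freq_a_hz, fs_hz) - component_level_db(signal, freq_b_hz, fs_hz)

# Synthetic token: 200 Hz fundamental with a second harmonic at half its amplitude.
fs, f0, n_samples = 10000.0, 200.0, 500
token = [
    math.sin(2.0 * math.pi * f0 * n / fs) + 0.5 * math.sin(2.0 * math.pi * 2.0 * f0 * n / fs)
    for n in range(n_samples)
]
h1_h2 = amplitude_difference_db(token, f0, 2.0 * f0, fs)   # about 6 dB for a 2:1 amplitude ratio
```

H1-F1, H1-F3, and F1-F3 follow the same pattern once the frequencies of the peak components in the F1 and F3 regions have been located; that peak-picking step is not shown here.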

¹ We focused on adduction quotient rather than the complementary abduction quotient because we use our normal data in studies of vocal hyperfunction and relate the results to a theoretical model of different modes of vocal fold closure in patients with vocal hyperfunction (Hillman et al., 1989).

² Previously we used 25% (flow) and 75% (EGG) criteria to obtain measures of adduction quotients (Perkell et al., 1991). The criteria were changed to 30% (flow) and 65% (EGG), because these levels were less affected by waveform irregularities.

³ A correction was calculated for the influence of F1 (K. N. Stevens, personal communication):

$$A_{H1,\mathrm{corrected}} = A_{H1} - 20\log_{10}\!\left(\frac{1}{1 - \left(Fr_{H1}/Fr_{F1}\right)^{2}}\right)$$

$$A_{H2,\mathrm{corrected}} = A_{H2} - 20\log_{10}\!\left(\frac{1}{1 - \left(Fr_{H2}/Fr_{F1}\right)^{2}}\right)$$

where $A_{H1}$ is the amplitude of the first harmonic, $A_{H2}$ is the amplitude of the second harmonic, $Fr_{H1}$ is the frequency of the first harmonic, $Fr_{H2}$ is the frequency of the second harmonic, and $Fr_{F1}$ is the frequency of the first formant.

[Figure 1 image not reproduced. Top: Speaker 1; bottom: Speaker 2. Panels A and B are spectra plotted against frequency (Hz), 0-1200 Hz and 0-4000 Hz respectively; panels C, D, and E are time waveforms.]

FIGURE 1. Acoustic spectra (dB) (A, B), glottal airflow waveform (l/sec) (C), first derivative (l/sec/sec) (D), and EGG (E) for the vowel in /pæ/ produced by two speakers. The top portion of the figure (Speaker 1) shows signals for a speaker with predominantly harmonic spectral energy in the F3 region. The bottom portion of the figure (Speaker 2) shows signals for a speaker with predominantly noise in the F3 region. In spectrum A (top and bottom) the first two harmonics are indicated. In spectrum B (top) the peak harmonics of F1 and F3 are indicated. In spectrum B (bottom) the peak harmonic of F1 and peak noise energy in F3 are indicated. On the glottal airflow and EGG waveforms, the events for extraction of the adduction quotients are indicated. On the first derivative, the maximum negative peak was used for extraction of maximum flow declination rate.

Statistical Analyses

Analysis of covariance, with SPL as the covariate, was performed for each variable to examine the extent to which differences between /pæ/ and /æ/ were related to differences in SPL (Holmberg et al., 1994a). A significance level of p ≤ 0.005 was used for each test, on the basis of the Bonferroni correction for multiple univariate comparisons for an overall significance level of p < 0.05 (0.05/10 variables = 0.005).

Analysis of variance was performed to examine the extent to which the parameters differed between tokens for which the energy in the F3 region consisted of harmonics and those that consisted predominantly of noise. In the interest of not overinterpreting results from these subjectively based judgments of the spectrum, a conservative level of p ≤ 0.001 was used for each test, on the basis of the Bonferroni correction for multiple univariate comparisons for an overall significance level of p < 0.01 (0.01/11 variables = 0.001).
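The two per-test thresholds above follow directly from dividing the overall significance level by the number of variables tested. A short check of that arithmetic (the function name is ours, chosen for illustration):

```python
def bonferroni_threshold(overall_alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return overall_alpha / n_tests

ancova_level = bonferroni_threshold(0.05, 10)   # covariance analyses, 10 variables
anova_level = bonferroni_threshold(0.01, 11)    # F3 harmonics-versus-noise tests, 11 variables
```

The second quotient is about 0.0009, which the text reports rounded to 0.001.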

Pearson product moment correlation coefficients (p < 0.05) were calculated to examine pairwise linear relationships between parameters. Scatterplots were examined to assure that the results were not determined by outliers and that the data were distributed across scales of measurements (i.e., there was no grouping of values at the extreme ends of scales). The aerodynamic measures were log10 transformed for correlations with acoustic parameters that were measured in dB (because dB is a log10 transformation of sound pressure). Correlations were calculated for tokens across loudness conditions and within each loudness condition for the group of subjects and for each individual subject.

The correlations were calculated in order to examine linear relationships between (a) measures obtained for the vowel in strings of the syllable /pæ/ and sustained phonation of /æ/, (b) two different measures of an "adduction quotient," one obtained from the glottal flow waveform and the other from the EGG signal, and (c) selected parameter pairs that should be related to degree of glottal aperture. Calculations for (b) and (c) were made for data pooled across /æ/ and /pæ/.
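The reason for log-transforming the aerodynamic measures before correlating them with dB values can be seen in a small sketch. The helper names and the four data points are invented for illustration and are not values from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_log_aero_vs_db(aero_values, db_values):
    """Correlate log10-transformed aerodynamic values with a dB-scaled measure."""
    return pearson_r([math.log10(v) for v in aero_values], db_values)

# Hypothetical tokens: flow doubles each time the level rises by 6 dB.
flow = [0.1, 0.2, 0.4, 0.8]        # l/sec, illustrative only
spl = [70.0, 76.0, 82.0, 88.0]     # dB, illustrative only
r_raw = pearson_r(flow, spl)
r_log = r_log_aero_vs_db(flow, spl)
```

For data that are proportional on a ratio scale, the raw correlation understates a relationship that becomes exactly linear after the log10 transformation.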

Results

Comparisons between measures of the vowel in syllable repetitions and sustained phonation. Group means and standard deviations of measures for the vowel from strings of repeated /pæ/ syllables and sustained phonation of /æ/ in comfortable and loud voice are presented in the Appendix.

SPL was higher for the vowel in the syllable strings than in the sustained phonation. Previous studies have shown strong relationships between SPL and several of the measures utilized in this study (Holmberg et al., 1988, 1994a). Therefore analysis of covariance (ANACOVA, p < 0.05, Bonferroni corrections, p = 0.005) with SPL as the covariate was performed to examine the extent to which differences between /æ/ and /pæ/ were related to the difference in SPL. Table 1 presents the results. As seen in the table, there were no significant differences in parameters for the vowel in repeated /pæ/ syllables versus sustained /æ/ phonation after adjustment for SPL.

Comparisons between adduction quotients obtained from the glottal flow waveform and the EGG waveform. Usable EGG signals could be obtained for only 17 of the 20 speakers. Pearson product moment correlations were performed between adduction quotients (pooled over /æ/ and /pæ/ tokens) obtained from the glottal flow and EGG signals. The relationship between the quotients was relatively weak. Across comfortable and loud voice, only 32% of the variance in one adduction quotient could be accounted for by variation in the other (r = 0.57, p < 0.001). Examining each loudness condition, the relationship between the quotients was stronger in loud voice (r = 0.59, p < 0.001) than in comfortable voice (r = 0.47, p < 0.001). This was most likely due to the fact that the EGG signals were often stronger and less noisy in loud voice.
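The 32% figure is the squared correlation coefficient. Applying the same arithmetic to the three reported r values (the dictionary keys are labels chosen for this example):

```python
def variance_explained(r):
    """Proportion of variance accounted for by a linear relationship (r squared)."""
    return r * r

reported_r = {"pooled": 0.57, "loud": 0.59, "comfortable": 0.47}
shared_variance = {label: round(variance_explained(r), 2) for label, r in reported_r.items()}
```

By this calculation the shared variance is roughly 35% in loud voice and 22% in comfortable voice, against 32% for the pooled data.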

Glottal airflow and slope of the acoustic spectrum. Pearson product moment correlations were performed on pairs of parameters from the following groups: (a) spectral measures related to subglottal coupling (Stevens, 1977) and vocal intensity (Fant, 1977, 1982) (the measures were H1-H2, H1-F1, H1-F3, and F1-F3); (b) log10 transformations of glottal airflow and EGG waveform measures that reflect glottal aperture and vocal fold contact, that is, adduction quotient and DC flow; and (c) SPL. Correlations were performed across comfortable and loud voice for /pæ/ and /æ/ tokens pooled together. Correlations that included H1-F3 and F1-F3 were calculated in three different ways, dependent on the content of the spectral energy in the F3 region: (a) separately for tokens with predominantly harmonic energy in F3, (b) separately for tokens with predominantly noise in F3, and (c) pooled across tokens with F3 harmonics,

TABLE 1. Results from repeated tests of covariance analyses (covariate = SPL) between parameter values for the vowel in strings of repeated /pæ/ syllables and in sustained phonation of /æ/ in comfortable and loud voice conditions (sp. cond).

             Token type                Token type Sp. cond
Parameter    df      F       p         df      F       p

ACFL         1,15    4.429   .053      1,15    0.949   .345
AQ-F         1,14    0.113   .742      1,14    0.194   .667
AQ-E         1,13    0.170   .687      1,13    1.408   .257
DCFL         1,15    1.859   .193      1,15    0.852   .370
MFDR         1,15    4.505   .051      1,15    0.173   .683
H1-H2        1,15    2.401   .142      1,15    0.182   .675
H1-F1        1,15    4.001   .064      1,15    0.956   .344
H1-F3        1,15    1.583   .228      1,15    5.015   .041
F1-F3        1,15    0.002   .964      1,15    1.793   .200

Note: The glottal airflow measures are AC flow (ACFL, l/sec), flow-adduction quotient (AQ-F, closed t/T), EGG-adduction quotient (AQ-E, closed t/T), DC flow (DCFL, l/sec), and MFDR (maximum flow declination rate, l/sec/sec). The acoustic measures are the amplitude differences between the first and the second harmonics (H1-H2, dB), the first harmonic and peak F1 (H1-F1, dB), and the first harmonic and peak F3 (H1-F3, dB). Bonferroni corrections: p = 0.005, N = 20 (AQ-E: N = 17).


TABLE 2. Pearson product moment correlations for the group calculated across comfortable and loud voice, pooled over /æ/ and /pæ/ tokens, between measures of the acoustic spectrum slope, SPL, and log10 transformations of selected glottal airflow waveform measures.

                  LAQ-F    LAQ-E    LDCFL    SPL
                  r        r        r        r

H1-H2             -.69     -.46     .23      -.49
H1-F1             -.36     -.15     NS       -.39
H1-F3             -.28     NS       .18      -.32
H1-F3 harmonic    -.33     .29      .28      NS
H1-F3 noise       NS       .21      NS       NS
F1-F3             NS       .13      .18+     NS
F1-F3 harmonic    NS       NS       .22      .22
F1-F3 noise       .42      .51      NS       .36
SPL               .69      .21+     -.25+    --

Note: The acoustic measures are the amplitude differences between the first and the second harmonics (H1-H2, dB), the first harmonic and peak F1 (H1-F1, dB), the first harmonic and peak F3 (H1-F3, dB), and SPL (dB). The glottal airflow measures are flow-adduction quotient (LAQ-F, closed t/T), EGG-adduction quotient (LAQ-E, closed t/T), and DC flow (LDCFL, l/sec). Harmonic = predominantly harmonic F3 energy; noise = predominantly F3 noise. N = 20 (LAQ-E: N = 17). + = both positive and negative relationships. NS = nonsignificant (p > 0.05).

noise, and mixed harmonics and noise. The results are presented in Table 2.

H1-H2 should reflect the degree to which the glottal waveform has a sinusoidal shape and, inversely, the degree of glottal adduction. As seen in Table 2, the strongest group-based relationships were between (a) flow-adduction quotient, measured at a 30% criterion, and H1-H2 (r = -0.69); and (b) flow-adduction quotient and SPL (r = 0.69). The other relationships were weak. In a previous study (Holmberg et al., 1994a), differences were found between group and individual data in terms of strength of relationships between such measures. Therefore, in this study, in addition to the group data, the relationships between acoustic spectral measures and glottal waveform measures were examined for each individual speaker as well.

Table 3 is based on Pearson product moment correlation analyses, performed for each individual subject, pooled over /pæ/ and /æ/ tokens across comfortable and loud voice for the same parameters as presented in Table 2.⁴ The numbers in the table are simple tallies of the number of speakers for whom relationships between the spectral measures and (log10 transformed) glottal waveform measures were relatively strong (r > 0.70, i.e., 49% or more of the variation in one measure was accounted for by variation in the other).

As seen in Table 3, a majority of the individual speakers show strong relationships between flow-adduction quotient and the acoustic spectral measures with the exception of F1-F3 (i.e., with ratios that included the amplitude of the fundamental, but not with the ratio of only the higher spectral peaks).
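The per-speaker tally can be sketched as follows. This is only an illustration under stated assumptions: the three speakers and their values are hypothetical, the function names are invented for the example, and the criterion is applied to the magnitude of r because the table reports both positive and negative relationships.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def tally_strong(per_speaker, threshold=0.70):
    """Count speakers whose |r| exceeds the threshold, split by sign of r."""
    counts = {"positive": 0, "negative": 0}
    for x, y in per_speaker.values():
        r = pearson_r(x, y)
        if abs(r) > threshold:
            counts["positive" if r > 0 else "negative"] += 1
    return counts


# Hypothetical data: (log10 flow-adduction quotient, H1-H2 in dB) per token
demo = {
    "S1": ([-0.40, -0.35, -0.30, -0.25], [9.0, 7.5, 5.0, 3.0]),
    "S2": ([-0.45, -0.40, -0.32, -0.28], [8.0, 8.5, 4.0, 2.5]),
    "S3": ([-0.38, -0.36, -0.33, -0.30], [6.0, 4.0, 7.0, 5.0]),
}
result = tally_strong(demo)
print(result)
```

In this made-up sample, two speakers show a strong negative relationship and one shows no strong relationship, which is the kind of count entered in each cell of the table.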

⁴The F3 harmonic and noise conditions are not presented separately in the individual data because of an insufficient number of cases.

TABLE 3. Pearson product moment correlations for individual subjects. Number of speakers for whom relationships between the acoustic spectral slope, SPL, and log10 transformations of selected glottal airflow waveform measures were relatively strong (r > 0.70).

           LAQ-F     LAQ-E     LDCFL     SPL

H1-H2      15 (-)    5 (-)     7 (+)     18 (-)
H1-F1      15 (-)    7 (-)     8 (+)     18 (-)
H1-F3      12 ( )    3 (-)     5 (+)     13 (-)
F1-F3      1 (-)     1 (-)     3 (+)     5 (-)
SPL        16 (+)    6 ( )     9 (+)     --

Note: The table is based on Pearson product moment correlation analyses performed for each individual subject for /æ/ and /pæ/ tokens pooled together across comfortable and loud voice. The acoustic measures are the amplitude differences between the first and the second harmonics (H1-H2, dB), the first harmonic and peak F1 (H1-F1, dB), the first harmonic and peak F3 (H1-F3, dB), and SPL (dB). The glottal airflow measures are log10 transformations of flow-adduction quotient (LAQ-F, closed t/T), EGG-adduction quotient (LAQ-E, closed time/T), and DC flow (LDCFL, l/sec). N = 20 (LAQ-E: N = 17). + = positive relationships; - = negative relationships; ± = both positive and negative relationships, most of them positive; ∓ = both positive and negative relationships, most of them negative.

Relationships between F3 spectral energy content and loudness condition. In order to examine the relationships between F3 spectral content and loudness condition (comfortable and loud voice), simple counts were made of the number of tokens with F3 harmonic energy, tokens with F3 noise, and tokens with mixed noise and harmonic F3 energy in comfortable and loud voice, respectively. The results are shown in Table 4.

As seen in Table 4, most tokens (122) in comfortable voice displayed a mix of harmonic energy and noise in the F3 region, followed by tokens with predominantly F3 noise (84). Few tokens (34) displayed predominantly F3 harmonic energy in comfortable voice. In contrast, most tokens (144) in loud voice displayed harmonic F3 energy, followed by tokens with a mix of F3 harmonic energy and noise (68). Few tokens (28) in loud voice displayed F3 energy with predominantly noise.
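The counts above can be restated as shares of the 240 tokens in each loudness condition. The sketch below does only that arithmetic on the Table 4 counts; the variable and function names are illustrative.

```python
# Token counts copied from Table 4 (240 tokens per loudness condition)
counts = {
    "comfortable": {"harmonic": 34, "noise": 84, "mix": 122},
    "loud": {"harmonic": 144, "noise": 28, "mix": 68},
}


def shares(condition_counts):
    """Convert raw token counts into proportions of the condition total."""
    total = sum(condition_counts.values())
    return {kind: n / total for kind, n in condition_counts.items()}


for condition, c in counts.items():
    formatted = {kind: f"{p:.1%}" for kind, p in shares(c).items()}
    print(condition, formatted)
```

Expressed this way, harmonic F3 energy accounts for 144 of 240 loud-voice tokens (60%), against 34 of 240 in comfortable voice.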

TABLE 4. Harmonics or noise in F3. Number of tokens, pooled over /æ/ and /pæ/, with harmonic, noise, and mixed harmonic and noise spectral energy in the F3 region in the comfortable and loud voice conditions.

F3 Spectral      Number of Tokens
Energy           Comf     Loud     Total

Harmonic         34       144      178
Noise            84       28       112
Mix              122      68       190
Total            240      240

Note: Total number of tokens in each loudness condition = 240. Number of speakers = 20.

Differences in acoustic and glottal waveform measures between tokens with F3 spectral energy and F3 noise. Analysis of variance was performed to study differences in acoustic and aerodynamic measures between tokens with F3 excited predominantly by harmonic energy and tokens with F3 excited predominantly by noise. The results showed significant differences between the harmonic and noise conditions for all measures. Tokens with predominantly noise in the F3 region were associated with significantly lower SPL; larger amplitude differences between H1-H2, H1-F3, and F1-F3; smaller amplitude difference between H1-F1; lower subglottal air pressure; smaller adduction quotients (flow and EGG); higher DC flow; lower AC flow; and lower MFDR.

Discussion

Repeated Syllables Versus Sustained Phonation

Our elicitation materials contained strings of /pæ/ syllables, from which we measured inferred levels of subglottal air pressure, and sustained phonation of /æ/ vowels with longer duration, needed for future use as stimuli in listening experiments and signal periodicity measures. Our goal is to be able to generalize across the two speech tasks and to relate measurements of air pressure to perceptual judgments of voice quality, such as "breathy" and/or "strained" voice, in future studies.

SPL was higher for the /æ/ in the syllable strings than in sustained /æ/ phonation. A possible explanation is that the time of data extraction (mid-vowel) was shortly after the initiation of the vocalization in the syllables, whereas data for the sustained phonation were extracted at a time well into the vowel. SPL was generally higher in the very beginning of the phonation of the sustained vowels, after which SPL stabilized, conceivably because of initial higher subglottal pressure. Previous studies have shown strong relationships between variation in SPL and several of these measures (Holmberg et al., 1988, 1994a), and in this study there were no significant parameter differences between /pæ/ and /æ/ after covariation of SPL, which suggests that the differences were, to a large extent, related to the higher SPL for /pæ/. The results suggest that it should be possible to make comparisons between values of inferred subglottal pressure measurements from syllable repetition and other measures, obtained from sustained vowel phonation, as long as the utterances have the same SPL or ad hoc adjustments are made for SPL differences.

Flow- and EGG-Adduction Quotients

The EGG waveform was used for deriving an "adduction quotient" (closed time/T) in order to determine whether any useful information can be obtained from the EGG-based quotient that is not available from the flow-based quotient. Initially, the literature suggested that the EGG waveform contained a number of interesting features and events that could be useful for a better understanding of the underlying vocal-fold vibration pattern (Childers et al., 1990; Childers & Lee, 1991; Childers, Naik, Larar, Krishnamurthy, & Moore, 1983; Rothenberg & Mahshie, 1988). However, we seldom find such clear events in the EGG waveforms. In addition, we have experienced several difficulties in recording EGG (cf. Peterson, Verdolini-Marston, Barkmeier, & Hoffman, 1994). For example, for some subjects, especially women, the signal is weak and noisy, and data have to be excluded. For other subjects, it is not possible to obtain an EGG signal at all (often because of obesity and/or short necks). Another difficulty is gross movements of the larynx in change of vocal effort from comfortable to loud voice, which often cause intermittent disruptions of the signal.

The relationship between adduction quotients measured from the EGG and flow signals was relatively weak in the group data. To some extent this could have been due to noise in some of the signals. In examining relationships between the EGG- and flow-adduction quotients for each individual speaker, we found that the subjects fell into subgroups in terms of quality of the flow and EGG signals. For a majority of the speakers, there were problems with either or both of the signals, and the relationships between the two adduction quotients were weak. However, a few subjects displayed strong and noise-free EGG signals as well as successfully inverse filtered glottal waveforms, free of formant residuals. For those subjects the flow and EGG quotients were highly correlated (r > 0.85). These findings suggest that measurements of the flow- and EGG-adduction quotients at the 30% and 65% levels, respectively, were related. Because problems sometimes occur in obtaining data from both the glottal waveforms (formant residuals) and EGG signals (weak signals), simultaneous recordings of both signals can be useful, because quotients from clean samples from one signal can complement the other.
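A criterion-level quotient of this kind can be illustrated on a synthetic flow cycle. The sketch below is an assumption-laden illustration, not the extraction procedure used in the study: it takes the quotient to be the fraction of one period during which flow lies below a level set at 30% of the peak-to-peak amplitude above the minimum, and the half-sine pulse shape, sample counts, and function names are all hypothetical. The EGG case is not shown because it additionally depends on a signal polarity convention that is not specified here.

```python
import math


def quotient_below_criterion(cycle, criterion=0.30):
    """Fraction of samples in one cycle below min + criterion * (max - min)."""
    lo, hi = min(cycle), max(cycle)
    threshold = lo + criterion * (hi - lo)
    below = sum(1 for v in cycle if v < threshold)
    return below / len(cycle)


def synthetic_cycle(n=200, open_fraction=0.6, dc=0.05, ac=0.2):
    """Hypothetical flow cycle: half-sine pulse while open, constant DC flow otherwise."""
    n_open = int(n * open_fraction)
    cycle = []
    for i in range(n):
        if i < n_open:
            cycle.append(dc + ac * math.sin(math.pi * i / n_open))
        else:
            cycle.append(dc)
    return cycle


# A short flow pulse (long closed portion) versus a pulse filling most of the period
short_pulse = quotient_below_criterion(synthetic_cycle(open_fraction=0.5))
long_pulse = quotient_below_criterion(synthetic_cycle(open_fraction=0.9))
print(short_pulse, long_pulse)
```

Under these assumptions the cycle with the shorter flow pulse yields the larger quotient, which is the direction expected for more adducted phonation.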

Measures of Glottal Aperture and Spectral Slope

A particular focus of this study was to examine the relationships between glottal airflow measures and measures of the acoustic spectral slope. As mentioned in the introduction, reliable amplitude-based flow measures that include the DC flow are not always easy to obtain because of potential mask leak, and measurements of adduction quotient can be affected by formant residuals that are due to unsuccessful inverse filtering. Therefore, we examined relationships between these flow and acoustic parameters that have been found to reflect glottal aperture, with the goal of cross-validating these measures.

Figure 2, from Stevens (1977, p. 273), illustrates relationships among the glottal configuration, the air pulse, and the spectral slope for three different adjustments of adduction or abduction.

With reference to Figure 2 and work by Fant (1982), we formed the following hypotheses about relationships among underlying vocal physiology and the flow and acoustic parameters in our paradigm: Gradual closing movements of a somewhat abducted membranous portion of the vocal folds (cf. Figure 2, column b) would result in relatively sinusoidal glottal waveforms and small adduction quotients. The acoustic spectrum would be characterized by a first harmonic with relatively high amplitude, a steep overall spectral slope, an attenuated F1 peak amplitude (because of a relatively wide F1 bandwidth), and a relatively low SPL. Thus, we expected negative relationships between adduction quotient and H1-H2, adduction quotient and H1-F3, SPL and H1-H2, and SPL and H1-F3. A positive relationship was expected between adduction quotient and SPL.

FIGURE 2. Sketches of glottal configurations (upper row), waveforms of glottal volume velocity for one cycle of vibration (middle row), and spectrum of glottal pulse (bottom row) for three adjustments of adduction or abduction: neutral position (a), spread arytenoid cartilages (b), and constricted glottis (c). The arrows indicate the region of the waveform where the abruptness of cessation of airflow influences the relative amount of spectral energy at high frequencies. Source: Stevens, 1977, p. 273. Reprinted by permission of the publisher.

[Figure not reproduced. Columns: a, b, c. Waveform axis: time. Spectrum axis: frequency, log scale.]

As shown in Tables 2 and 3, adduction quotient accounted for a relatively large percent of the variation in H1-H2 in both the group data (r = -0.69) and the individual data (r > 0.70 for 15 of the 20 speakers). These results suggest that measurements of adduction quotient at the 30% level criterion were sensitive enough to separate waveforms with gradual and abrupt closing. The results suggest that H1-H2 could be used as a substitute for adduction quotient in cases where reliable measurements from the glottal waveform are difficult to obtain because of unsuccessful inverse filtering.
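The link between waveform shape and H1-H2 can be illustrated with a small numerical sketch. It rests on several assumptions that are not part of the study: the pulses are synthetic raised-cosine shapes rather than measured glottal waveforms, the harmonic amplitudes are taken directly from a single period by a one-bin discrete Fourier transform rather than from the acoustic spectrum, and all names are hypothetical.

```python
import cmath
import math


def harmonic_amplitude(cycle, k):
    """Magnitude of the k-th harmonic of one period, via a single-bin DFT."""
    n = len(cycle)
    acc = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(cycle))
    return 2 * abs(acc) / n


def h1_minus_h2_db(cycle):
    """Amplitude difference between the first and second harmonics, in dB."""
    return 20 * math.log10(harmonic_amplitude(cycle, 1) / harmonic_amplitude(cycle, 2))


def pulse_cycle(n=256, open_fraction=0.6):
    """Hypothetical pulse: raised cosine during the open phase, zero flow otherwise."""
    n_open = int(n * open_fraction)
    return [
        0.5 * (1 - math.cos(2 * math.pi * i / n_open)) if i < n_open else 0.0
        for i in range(n)
    ]


# A pulse filling almost the whole period is close to a sinusoid;
# a pulse filling half the period has a long zero-flow portion.
near_sinusoidal = h1_minus_h2_db(pulse_cycle(open_fraction=0.95))
half_period_pulse = h1_minus_h2_db(pulse_cycle(open_fraction=0.5))
print(near_sinusoidal, half_period_pulse)
```

In this sketch the nearly sinusoidal cycle gives the larger H1-H2, in line with the negative relationship between adduction quotient and H1-H2 reported above.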

Ideally, high vocal fold closing velocities and abrupt reduction of the air flow should be reflected in glottal waveforms with sharp corners between the closing and closed portions (cf. Figure 2, column c). However, a detrimental effect of low-pass filtering is a "rounding" of waveform discontinuities (corners), which could have an influence on waveforms resulting from high vocal fold closing velocities. Already rounded waveforms, which result from more gradual closing velocities (cf. Figure 2, column b), would be less influenced by the low-pass filtering rounding effect. For such waveforms, the F3 is excited by noise. The findings of nonsignificant relationships between flow-adduction quotient and F1-F3 for tokens with F3 harmonic energy but a significant relationship for tokens with noise in the F3 region (r = 0.42, p < 0.001) may indicate that the sharp corners in the glottal waveforms, resulting from abrupt vocal fold closing, were obscured by the low-pass filtering at 1100 Hz but were still reflected in the F1-F3 value.

These results suggest that the spectral measurement of F1-F3 may serve as a useful complement to the flow measurements, especially when there are high vocal fold closing velocities.

For most of the individual speakers there were strong relationships between flow-adduction quotient and H1-F1 (negative) and between flow-adduction quotient and SPL (positive). These findings suggest that relatively gradual vocal fold closures resulted in an increased amplitude of the first harmonic, reduced amplitude of the first formant, and reduced SPL (cf. Fant, 1979). The same relationships were found for H1-F3: Decreased adduction quotient resulted in a steeper spectral slope and decreased SPL (cf. Hillenbrand et al., 1994; Klatt & Klatt, 1990; Stevens, 1977).

In normal phonation, DC flow in the glottal waveform has been assumed to reflect airflow that passes through a posterior glottal "chink" that is mostly between the arytenoids (i.e., the cartilaginous portion of the vocal folds). However, neither the relationship between underlying physiology and the DC flow (cf. Hertegard et al., 1992, 1995) nor the acoustic effect of the DC flow is completely understood. A large chink would increase the subglottal coupling, with reduced high frequency energy and reduced SPL. Therefore, a positive relationship between DC flow and H1-F3 and a negative relationship between DC flow and SPL could be expected. As seen in Tables 2 and 3, these relationships were weak or inconsistent. Neither of the acoustic measures varied systematically with DC flow.

It is reasonable to believe that there are physiological constraints at some extreme glottal configurations that would produce some covariation between chink size and parameters of flow through the membranous part of the vocal folds. Thus, a strong relationship between DC flow and the adduction quotient should be expected, at least among extreme cases of abduction and adduction. However, descriptive observations of individual glottal waveforms showed both abrupt and gradual closing portions in combination with relatively high levels of DC flow, and the relationship between adduction quotient and DC flow was nonsignificant in the group data. For individual speaker data, the relationship was weak, inconsistent, or nonsignificant and did not indicate any readily observable systematic relationship between the chink and the abruptness of the membranous vocal fold closure. However, one factor that may limit the usefulness of our measure of DC flow in glottal waveforms and contribute to erroneous speaker variation is the difficulty of assuring a perfect seal between the subject's face and the flow transducer mask. For large enough leaks, the lowest parts of the glottal flow waveform signal can fall below the zero flow baseline, making the leak problem easily detectable. However, small DC leaks that would lower all flow amplitude measures except for AC flow can easily go undetected. Taking a somewhat conservative stance, one can never be quite certain that flow amplitude measurements that include DC flow do not include a DC leak and should have been higher than measured. Therefore, DC flow data must be interpreted with caution (cf. Holmberg et al., 1988, 1994a, 1994b).
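The point that a small leak lowers every flow amplitude measure except AC flow can be shown with simple arithmetic. The sketch below assumes that DC flow is read as the minimum of the cycle and AC flow as the peak-to-peak amplitude, and it models the leak as a constant offset; the sample values, the leak size, and the function name are hypothetical.

```python
def flow_measures(cycle):
    """Return (dc_flow, ac_flow, peak_flow) for one cycle of glottal flow."""
    dc = min(cycle)        # unmodulated component, taken here as the minimum flow
    peak = max(cycle)
    return dc, peak - dc, peak


# Hypothetical flow samples (l/sec) and a hypothetical constant mask leak (l/sec)
true_cycle = [0.10, 0.16, 0.24, 0.28, 0.22, 0.12, 0.10, 0.10]
leak = 0.03
measured_cycle = [v - leak for v in true_cycle]

true_dc, true_ac, true_peak = flow_measures(true_cycle)
meas_dc, meas_ac, meas_peak = flow_measures(measured_cycle)

print(f"DC flow:   true {true_dc:.2f}, measured {meas_dc:.2f}")
print(f"AC flow:   true {true_ac:.2f}, measured {meas_ac:.2f}")
print(f"Peak flow: true {true_peak:.2f}, measured {meas_peak:.2f}")
```

Because the measured minimum stays above zero in this example, nothing in the waveform itself signals the leak, while the DC and peak flow values are both underestimated and AC flow is unchanged.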

Conclusions

The following conclusions could be drawn from the results of this study:

Comparisons between measures obtained from the vowel in /pæ/ syllables and those obtained from sustained phonation can be made as long as SPL differences are controlled for.

Strong relationships between flow-adduction quotient and H1-H2 for individual data suggest: (a) Adduction quotient, measured at a 30% level on the glottal waveform, is sensitive enough to separate waveforms with gradual and abrupt closing portions in data for individual subjects. (b) Measurements of the amplitude difference between the first two harmonics (H1-H2) may be used as a substitute measure for flow-adduction quotient in cases of unsuccessful inverse filtering that make measurements of adduction quotient unreliable (e.g., formant residuals are superimposed on the glottal waveform).

Measurements of adduction quotient obtained from clean signals at a 30% level on the glottal airflow waveform and at a 65% level on the EGG waveform were highly related. This finding suggests that (a) the 65% level should be a useful criterion for measurements of adduction quotient on the EGG signal, and (b) the glottal flow and EGG signals may serve to complement one another, especially in cases of formant residuals on the glottal waveforms or weak and/or noisy EGG signals that would cause problems at data extraction.

Measurement of the amplitude difference between the spectral peaks of the first and third formants (F1-F3) reflected high vocal fold closing velocities that may have been obscured because of low-pass filtering at 1100 Hz. Thus, F1-F3 may serve as a useful complement to measurements of maximum flow declination rate, especially in voices with high vocal fold closing velocities.

The lack of any readily observable systematic relationship between DC flow (determined largely by the size of an arytenoid glottal "chink") and adduction quotient (reflecting the abruptness of the membranous vocal fold closure), in combination with the difficulty of assuring a tight seal between the subject's face and the mask, calls for caution in interpretation of DC flow data.

Acknowledgments

This study was supported by a grant (R01-DC 00266) from the National Institutes of Health. We thank Elaine Stathopoulos and three anonymous reviewers for helpful comments on a previous version of this manuscript.

References

Badin, P., Hertegard, S., & Karlsson, I. (1990). Notes on the Rothenberg Mask. Quarterly Report, 1, 1-7. Stockholm: Speech Transmission Laboratory and Music Acoustics, The Royal Institute of Technology.

Bickley, C. A. (1982). Acoustic analysis and perception of breathy vowels. Working papers, 2, 71-82. Cambridge, MA: Speech Communication Group, Research Laboratory of Electronics, MIT.

Boone, D. R., & McFarlane, S. C. (1988). The voice and voice therapy (4th ed.). Englewood Cliffs, NJ: Prentice-Hall.

Childers, D. G., Hicks, D. M., Moore, G. P., Eskenazi, L., & Lalwani, A. L. (1990). Electroglottography and vocal fold physiology. Journal of Speech and Hearing Research, 33, 245-254.

Childers, D. G., Naik, J. M., Larar, J. N., Krishnamurthy, A. K., & Moore, G. P. (1983). Electroglottography, speech, and ultra-high speed cinematography. In I. R. Titze & R. Scherer (Eds.), Vocal fold physiology and biophysics of voice (pp. 202-220). Denver: Denver Center for the Performing Arts.

Childers, D. G., & Lee, C. K. (1991). Vocal quality factors: Analysis, synthesis, and perception. Journal of the Acoustical Society of America, 90, 2394-2410.

Colton, R. H., Brewer, D. W., & Rothenberg, M. (1985, August). Vibratory characteristics of patients with voice disorders. Oral presentation delivered at the Symposium on Voice Acoustics and Dysphonia, Gotland, Sweden.

Fant, G. (1979). Glottal source and excitation analysis. Quarterly Report, 1, 85-107. Stockholm: Speech Transmission Laboratory and Music Acoustics, The Royal Institute of Technology.

Fant, G. (1982). Preliminaries to analysis of the human voice source. Quarterly Report, 4, 1-27. Stockholm: Speech Transmission Laboratory and Music Acoustics, The Royal Institute of Technology.

Hammarberg, B. (1986). Perceptual and acoustic analysis of dysphonia. Doctoral dissertation, Dept. of Logopedics and Phoniatrics, Karolinska Institute, Huddinge University Hospital, Huddinge, Sweden.

Herrington-Hall, B. L., Lee, L., Stemple, J. C., Niemi, K. R., & McHone, M. M. (1988). Description of laryngeal pathology by age, sex, and occupation in a treatment-seeking sample. Journal of Speech and Hearing Disorders, 53, 57-64.

Hertegard, S., & Gauffin, J. (1991). Insufficient vocal fold closure as studied by inverse filtering. In J. Gauffin & B. Hammarberg (Eds.), Vocal fold physiology: Acoustic, perceptual and physiological aspects of voice mechanisms (pp. 243-250). San Diego: Singular Publishing Group, Inc.

Hertegard, S., & Gauffin, J. (1995). Glottal area and vibratory patterns studied with simultaneous stroboscopy, flow glottography, and electroglottography. Journal of Speech and Hearing Research, 38, 85-100.

Hertegard, S., Gauffin, J., & Karlsson, I. (1992). Physiological correlates of the inverse filtered flow waveform. Journal of Voice, 6(3), 224-234.


Higgins, M. B., & Saxman, J. H. (1993). Inverse-filtered air flow and EGG measures for sustained vowels and syllables. Journal of Voice, 7, 47-53.

Hillenbrand, J. (1988). Perception of aperiodicities in synthetically generated voices. Journal of the Acoustical Society of America, 83, 2361-2371.

Hillenbrand, J., Cleveland, R. A., & Erickson, R. L. (1994). Acoustic correlates of breathy vocal quality. Journal of Speech and Hearing Research, 37, 769-778.

Hillman, R. E., Holmberg, E. B., Perkell, J. S., Kobler, J., Guiod, P., Gress, C., & Sperry, E. (1995). Speech respiration in adult females with vocal nodules. Manuscript submitted for publication.

Hillman, R. E., Holmberg, E. B., Perkell, J. S., Walsh, M., & Vaughan, C. (1989). Objective assessment of vocal hyperfunction: An experimental framework and initial results. Journal of Speech and Hearing Research, 32, 373-392.

Hillman, R. E., Holmberg, E. B., Perkell, J. S., Walsh, M., & Vaughan, C. (1990). Phonatory function associated with hyperfunctionally related vocal fold lesions. Journal of Voice, 4, 52-63.

Holmberg, E. B., Hillman, R. E., & Perkell, J. S. (1988). Glottal airflow and transglottal air pressure measurements for male and female speakers in soft, normal, and loud voice. Journal of the Acoustical Society of America, 84, 511-529.

Holmberg, E. B., Hillman, R. E., & Perkell, J. S. (1989). Glottal airflow and transglottal air pressure measurements for male and female speakers in low, normal and high pitch. Journal of Voice, 3, 294-305.

Holmberg, E. B., Hillman, R. E., Perkell, J. S., & Gress, C. (1994a). Relationships between intra-speaker variation in aerodynamic measures of voice production and variation in SPL across repeated recordings. Journal of Speech and Hearing Research, 37, 484-495.

Holmberg, E. B., Perkell, J. S., Hillman, R. E., & Gress, C. (1994b). Individual variation in measures of voice. Phonetica, 51, 30-37.

Huffman, M. K. (1987). Measures of phonation type in Hmong. Journal of the Acoustical Society of America, 81, 495-503.

Karlsson, I. (1985). Glottal waveforms for normal female speakers. Quarterly Progress Report, 1, 31-36. Stockholm: Speech Transmission Laboratory, Royal Institute of Technology.

Klatt, D. H., & Klatt, L. C. (1990). Analysis, synthesis, and perception of voice quality variations among female and male talkers. Journal of the Acoustical Society of America, 87, 820-857.

Klich, R. J. (1982). Relationship of vowel characteristics to listener ratings of breathiness. Journal of Speech and Hearing Research, 25, 574-580.

Koike, Y., & Hirano, M. (1973). Glottal-area time function and subglottal pressure variation. Journal of the Acoustical Society of America, 54, 1618-1627.

Löfqvist, A., Carlborg, B., & Kitzing, P. (1982). Initial validation of an indirect measure of subglottal pressure during vowels. Journal of the Acoustical Society of America, 72, 633-634.

Nagata, K., Kurita, S., Yasumoto, S., Maeda, T., Kawasaki, H., & Hirano, M. (1983). Vocal fold polyps and nodules: A 10-year review of 1,156 patients. Auris-Nasus-Larynx, 10(Suppl.), S27-S35.

Perkell, J. S., Holmberg, E. B., & Hillman, R. E. (1991). A system for signal processing and data extraction from aerodynamic, acoustic, and electroglottographic signals in the study of voice production. Journal of the Acoustical Society of America, 54, 1777-1781.

Peterson, K. L., Verdolini-Marston, K., Barkmeier, J. M., & Hoffman, H. T. (1994). Comparisons of aerodynamic and electroglottographic parameters in evaluating clinically relevant voicing patterns. Annals of Otology, Rhinology, and Laryngology, 103, 335-346.

Rothenberg, M. (1973). A new inverse filtering technique for deriving the glottal air flow waveform during voicing. Journal of the Acoustical Society of America, 53, 1632-1645.

Rothenberg, M. (1977). Measurements of airflow in speech. Journal of Speech and Hearing Research, 20, 155-176.

Rothenberg, M. (1985). Source-tract acoustic interaction in breathy voice. In I. R. Titze & R. Scherer (Eds.), Vocal fold physiology and biophysics of voice (pp. 202-220). Denver: Denver Center for the Performing Arts.

Rothenberg, M., & Mahshie, J. J. (1988). Monitoring vocal fold abduction through vocal fold contact area. Journal of Speech and Hearing Research, 31, 338-351.

Smitheran, J. R., & Hixon, T. J. (1981). A clinical method for estimating laryngeal airway resistance during vowel production. Journal of Speech and Hearing Research, 46, 138-146.

Södersten, M., & Lindestad, P-A. (1990). Glottal closure and perceived breathiness during phonation in normally speaking subjects. Journal of Speech and Hearing Research, 33, 601-611.

Södersten, M., Lindestad, P-A., & Hammarberg, B. (1991). Vocal fold closure, perceived breathiness, and acoustic characteristics in normal adult speakers. In J. Gauffin & B. Hammarberg (Eds.), Vocal fold physiology: Acoustic, perceptual, and physiological aspects of voice mechanisms (pp. 217-225). San Diego: Singular Publishing Group, Inc.

Sperry, E., Hillman, R. E., & Perkell, J. S. (1994). The use of inductance plethysmography to assess respiratory function in a patient with vocal nodules. Journal of Medical Speech-Language Pathology, 2, 137-145.

Stevens, K. N. (1977). Physics of laryngeal behavior and larynx modes. Phonetica, 34, 264-279.

Stevens, K. N., & Hanson, H. M. (1995). Classification of glottal vibration from acoustic measurements. In O. Fujimura & M. Hirano (Eds.), Vocal fold physiology: Voice quality control (pp. 147-170). San Diego: Singular.

Sundberg, J., & Gauffin, J. (1979). Waveform and spectrum of the glottal source. In B. Lindblom & S. Ohman (Eds.), Frontiers of speech communication research (pp. 301-320). London: Academic Press.

Received October 26, 1994
Accepted April 5, 1995

Contact author: Eva B. Holmberg, PhD, Massachusetts Institute of Technology, Research Laboratory of Electronics, Building 36, Room 521, Cambridge, MA 02139. E-mail: [email protected]


Appendix

Mean and standard deviation values for the vowel in strings of repeated /pæ/ syllables and sustained /æ/ phonation in comfortable (C) and loud (L) voice in female speakers

                                         /pæ/                /æ/
Measure                        Cond      M        SD         M        SD

SPL (dB)                       C         74.5     3.5        73.2     3.7
                               L         83.1     3.9        82.3     4.0
H1-H2 (dB)                     C         6.6      3.8        7.7      4.3
                               L         2.4      3.5        2.9      3.9
H1-F1 (dB)                     C         -4.0     7.7        -2.2     7.4
                               L         -11.7    6.2        -12.0    6.1
H1-F3 (dB)                     C         21.0     7.9        22.0     8.0
                               L         12.3     8.4        12.6     8.9
H1-F3 harmonics (dB)           C         22.8     4.7        15.8     7.6
                               L         8.8      8.5        10.5     9.1
H1-F3 noise (dB)               C         24.2     8.6        26.4     6.5
                               L         18.0     7.6        20.0     5.9
H1-F3 mix (dB)                 C         20.0     6.8        20.9     7.5
                               L         14.9     6.5        12.8     8.0
F1-F3 (dB)                     C         25.0     5.6        24.3     6.0
                               L         24.0     5.1        24.6     5.8
F1-F3 harmonics (dB)           C         22.8     4.7        18.1     4.1
                               L         21.2     4.0        22.5     5.7
F1-F3 noise (dB)               C         27.3     5.6        27.6     4.3
                               L         31.3     4.2        29.8     4.0
F1-F3 mix (dB)                 C         24.0     5.4        24.0     5.8
                               L         25.3     3.8        25.8     5.0
AC flow (l/sec)                C         .147     .045       .140     .033
                               L         .213     .058       .198     .049
DC flow (l/sec)                C         .097     .042       .087     .045
                               L         .081     .039       .070     .030
Max. flow decl. rate           C         190.8    75.8       171.8    70.9
(l/sec/sec)                    L         421.4    140.9      372.0    139.5
Adduc. quot. flow              C         .50      .06        .48      .07
(t closed/T)                   L         .59      .07        .59      .08
Adduc. quot. EGG               C         .46      .07        .48      .06
(t closed/T)                   L         .46      .07        .49      .07
Pressure (cm H2O)              C         5.5      1.1        not applicable
                               L         8.3      1.9        not applicable

N = 20 (Adduc. quot. EGG: N = 17).