


Synchronizing Videostroboscopic Images of Human Laryngeal Vibration With Physiological Signals

JOEL A. SERCARZ, MD, GERALD S. BERKE, MD, BRUCE R. GERRATT, PHD, JODY KREIMAN, PHD, YE MING, MD, AND MANUEL NATIVIDAD, BA

Purpose: This report describes a new system that permits the precise correlation of videostroboscopic images with corresponding physiological measures, such as glottography and subglottic pressure.

Method: A healthy volunteer had unilateral vocal cord paralysis induced by infiltrating local anesthesia into the recurrent and superior laryngeal nerves. Vocal-fold vibrations were monitored by photoglottography (PGG) and electroglottography (EGG). Analog signals from the EGG and PGG were synchronized with the video and correlated.

Results: The method described permits images to be sampled throughout sustained phonation. This technique allows study of events during glottic vibration. Results obtained have been in close agreement with previous studies that correlate the vocal-fold morphology to glottographic signal using other methods. This technique is inexpensive in comparison with high-speed filming. The main disadvantage of this method is related to the limitations of stroboscopy.

Copyright © 1992 by W.B. Saunders Company

Videostroboscopy has gained widespread acceptance for use in the assessment of laryngeal vibration, and can complement glottographic and aerodynamic measures in the objective analysis of voice. During stroboscopy, light flashes are generated in synchrony with the fundamental frequency of laryngeal vibration. These flashes of light are slightly out of phase with the fundamental frequency, creating a montage of individual images from numerous consecutive glottal cycles. This montage simulates a moving image of the vibrating vocal folds.¹
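The stroboscopic principle described above can be illustrated with a short calculation. The following is a minimal sketch, not the stroboscope's actual control logic: it assumes a fixed flash rate and a perfectly periodic voice, and the function names and the 120.5 Hz fundamental frequency are hypothetical values chosen only for illustration.

```python
def strobe_phases(f0_hz, flash_hz, n_flashes):
    """Fraction of the glottal cycle (0..1) that each successive flash samples.

    Assumes a perfectly periodic vibration at f0_hz and a fixed flash rate.
    """
    return [(k * f0_hz / flash_hz) % 1.0 for k in range(n_flashes)]


def apparent_frequency(f0_hz, flash_hz):
    """Rate (Hz) at which the montage appears to move.

    When f0 is close to an integer multiple of the flash rate, each flash
    lands slightly later in the cycle than the previous one, so the
    sequence of images looks like slow motion.
    """
    ratio = f0_hz / flash_hz
    return (ratio - round(ratio)) * flash_hz


if __name__ == "__main__":
    # Illustrative values only: 120.5 Hz voice, 30 flashes per second.
    print(apparent_frequency(120.5, 30.0))
    print(strobe_phases(120.5, 30.0, 5))
```

With these illustrative numbers, each flash samples a point about 1/60 of a cycle later than the previous flash, so roughly 60 flashes (drawn from about 240 real glottal cycles) are needed to assemble one apparent cycle.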

From the Division of Head and Neck Surgery, UCLA School of Medicine, Los Angeles, CA; and the Department of Veterans Affairs Medical Center, West Los Angeles, CA.

Supported in part by NIDCD Grant No. R01-DC00855-01 and by VA Medical Research Funds.

Address reprint requests to Gerald S. Berke, MD, UCLA School of Medicine, Division of Head and Neck Surgery, CHS 62-132, 10633 Le Conte Ave, Los Angeles, CA 90024-1624.

Copyright © 1992 by W.B. Saunders Company 0196-0709/92/1301-0005$5.00/0

American Journal of Otolaryngology, Vol 13, No 1 (January-February), 1992: pp 40-44

The correlation of videostroboscopic images of the vibrating vocal folds with specific points on physiological waveforms would improve our understanding of these data. In the past, this type of comparison could be made only at a small number of research centers equipped to perform high-speed laryngeal photography. This report describes a method of correlating images with physiological signals using conventional videostroboscopy and commercially available speech analysis software programs. This method was developed for use in an in vivo canine model of phonation.² Its application to human laryngeal vibration is reported here for the first time.

Previously, Anastaplo and Karnell³,⁴ have correlated electroglottographic (EGG) data with videostroboscopic images by superimposing the EGG signal over a portion of a video field. They used split-screen video technology, with the stroboscopic flash timing signal triggering the oscilloscope trace of an EGG signal. Thus, the leading edge of the EGG signal corresponds to the stroboscopic image on a video monitor. This method has several disadvantages. First, because the EGG waveforms are recorded on videotape, they must be analyzed using digital video processing techniques. Second, the frequency of the stroboscopic flashes may exceed the video frame rate, so some of the images recorded in a single video frame may actually represent a combination of two strobe flashes. Finally, the EGG waveform recorded on the oscilloscope is a composite reconstructed from a number of glottal cycles, rather than a single raw signal. One advantage of the system described by Karnell⁴ is that it does not require any realignment of separately recorded stroboscopic and physiological records.

Gerratt, Hanson, and Berke⁵ reported a technique for correlating photoglottographic (PGG) waveforms with individual stroboscopic images using an SLR camera and electronic flash. By locating the relatively intense signal from the electronic photo flash on the PGG waveform, the precise location of the image on the glottograph can be determined. However, with this technique only a single image per sustained vocalization can be associated with the PGG waveform. In contrast, the system reported here allows the analysis of an entire stroboscopic cycle, any series of images, and any number of corresponding physiological waves.

Fig 1. The C-Speech video screen (traces from top to bottom: EGG, PGG, audio, and vertical retrace). The narrow spikes on the PGG waveform represent the effect of the bright strobe flash on the PGG sensor. The third trace from the top represents the SWP recorded on the audio channel of the video recorder. The vertical retrace of the video camera is presented in the fourth channel, depicting the length and position of the video fields.

METHODS

Instrumentation and Procedures

Synchronizing videostroboscopic images and physiological signals requires (1) a record of the video camera’s vertical retrace signal, (2) a synchronization pulse to align the video image to the digitized glottographic signal, (3) a videotimer or framecoder, and (4) a record of the strobe flash transduced by a light sensor (PGG).

A record of the video signal from a charge-coupled device (CCD) video camera was obtained by passing the signal through the coupled TV-vertical trigger input of an oscilloscope (Hitachi V-1050F, Torrance, CA) for retrieval of the vertical retrace signal from the gated output. The vertical retrace signal provides timing information about the length and position of a video field (ie, a video image; see Fig 1). The video camera images were simultaneously recorded on a Sony VO 9850 video recorder (Sony, Park Ridge, NJ), along with a signal from a timecoder (Panasonic WJ-810, Cypress, CA). A 5-millisecond, 50-mV square wave pulse (SWP) was produced at a rate of one pulse per second; these SWPs were simultaneously digitized and recorded on the audio channel of the video recorder to synchronize the tape-recorded and digitized video fields. Previous research² has shown that an SWP recorded on a VCR’s audio channel can be used to synchronize the video fields on the video recorder with the video fields obtained from the vertical retrace signal.

A 32-year-old man with a normal laryngeal examination and no history of laryngeal pathology volunteered to be the subject of this study. Unilateral vocal fold paralysis was induced by infiltrating 2% lidocaine with a 25-gauge needle into the expected location of the nerve. The recurrent laryngeal nerve (RLN) was infiltrated in the tracheoesophageal groove, approximately 2 cm inferior to the cricoid cartilage. The superior laryngeal nerve (SLN) was infiltrated at the lateral thyrohyoid membrane. To create a combined (“vagal”) paralysis, an SLN injection was performed, verified by stroboscopy, and immediately followed by an RLN injection. RLN paralysis was confirmed with laryngoscopic evidence of paramedial fixation of the left vocal process; unilateral SLN paralysis was verified by laryngoscopic evidence of posterior glottal rotation toward the injected (left) side on phonation.⁶

The subject sustained the vowel /i/ as long as possible at comfortable levels of pitch and loudness. His larynx was visualized through a 90° endoscope (Wolf, Rosemont, IL) with two light cables, attached to a CCD video camera (Toshiba IKA C-30, Buffalo Grove, IL). A 50-mm video lens (Computar, Orange, CA) and a diopter lens (Vivitar No. 1, Santa Monica, CA) were used to magnify and focus the image of the glottis. A Bruel & Kjaer stroboscope (4914A; Orange, CA) was connected to one light cable of the endoscope, and a cold light source (481B; Storz, Culver City, CA) was used to provide constant low-level illumination for PGG through the second light cable. A photosensor (Centronics OSD-50, Mountainside, NJ) was placed on the trachea. Although constant low-level PGG light was present during the stroboscopic imaging, no blurring or distortion of the stroboscopic images was observed.

Vocal fold vibration was monitored by PGG and EGG. EGG was performed with a Synchrovoice (Harrison, NJ) unit; the reference electrode was applied to the strap muscles and the recording electrodes were placed on either side of the thyroid cartilage. Analog signals from EGG and PGG, the square wave synchronizing pulses, and the video vertical retrace signal were low-pass filtered at 3 kHz and sampled for 5 seconds at 10 kHz with 12-bit resolution using C-SPEECH software (Paul Milenkovich, University of Wisconsin, Madison).
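The acquisition settings above imply a few quantities that are useful when reading the digitized channels. The following sketch simply works out that arithmetic; the function name and dictionary keys are hypothetical, and the field rate of 60 per second is the nominal value quoted later in this report.

```python
def acquisition_summary(sample_rate_hz=10_000, duration_s=5, bits=12,
                        field_rate_hz=60, swp_duration_s=0.005):
    """Derive basic quantities from the sampling settings given in the text."""
    return {
        # Number of samples stored for each digitized channel.
        "samples_per_channel": sample_rate_hz * duration_s,
        # Highest frequency representable at this sampling rate.
        "nyquist_hz": sample_rate_hz / 2,
        # Number of distinct amplitude steps at this bit depth.
        "quantization_levels": 2 ** bits,
        # Samples spanned by one video field (nominal field rate).
        "samples_per_field": sample_rate_hz / field_rate_hz,
        # Samples spanned by one square wave pulse.
        "samples_per_swp": round(swp_duration_s * sample_rate_hz),
    }


if __name__ == "__main__":
    for name, value in acquisition_summary().items():
        print(f"{name}: {value}")
```

Under these settings the 3-kHz low-pass cutoff lies below the 5-kHz Nyquist limit, each video field spans roughly 167 samples, and each 5-millisecond SWP spans 50 samples, so both are well resolved in the digitized record.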

Correlation Procedure

Figure 1 shows the computer screen produced by C-SPEECH during the correlation procedure. EGG and PGG signals appear in the first and second channels, respectively; the narrow spikes on the PGG waveform represent the effect of strobe flashes on the light sensor. The third channel contains the SWPs, which were also recorded on the audio channel of the video recorder. The fourth channel contains the vertical retrace of the video camera and indicates the length of individual video fields.

To find the video image corresponding to any sampled point in a physiological signal, we first scan the entire digitized segment to determine where the center of a strobe light spike is superimposed on the PGG point of interest. A strobe image is located in the center of the high-amplitude strobe light flash.² We locate the video field containing the audible SWP nearest to the point of interest by referring to the vertical retrace output. Because a recorded image does not appear on the video screen until the field after the one in which the flash occurred,² we next determine the time interval from the digitized field containing the square wave pulse to the field following the one of interest using the timecoder output. To find the corresponding image, the digitized video retrace signal containing the SWP is synchronized to the video field containing the audible SWP. The previously determined time interval can then be used to find the number of video fields between the one containing the SWP and the one containing the image representing the glottal configuration of interest.
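The field-counting step of this procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the C-SPEECH workflow itself: it assumes the start of each video field has already been read off the vertical retrace channel as a sorted list of sample indices, and the names `field_index`, `fields_from_swp_to_image`, and `display_delay_fields` are hypothetical.

```python
from bisect import bisect_right


def field_index(field_starts, sample):
    """Index of the video field that contains a given sample.

    field_starts: sorted sample indices at which each field begins,
    as read from the digitized vertical retrace channel.
    """
    i = bisect_right(field_starts, sample) - 1
    if i < 0:
        raise ValueError("sample precedes the first field boundary")
    return i


def fields_from_swp_to_image(field_starts, swp_sample, flash_sample,
                             display_delay_fields=1):
    """Number of fields to step forward on the tape from the SWP field.

    The image produced by a flash appears in the field after the one in
    which the flash occurred, hence the default delay of one field.
    """
    flash_field = field_index(field_starts, flash_sample)
    swp_field = field_index(field_starts, swp_sample)
    return flash_field - swp_field + display_delay_fields


if __name__ == "__main__":
    # Hypothetical field boundaries (about 167 samples apart at 10 kHz).
    starts = [0, 167, 333, 500, 667, 833, 1000]
    # SWP falls in field 1, strobe spike of interest falls in field 4.
    print(fields_from_swp_to_image(starts, swp_sample=180, flash_sample=700))
```

In this illustration the operator would locate the tape field containing the audible SWP and advance four fields to reach the image of interest.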

In a 5-second digitized segment of phonation, three to five SWPs usually occur. The position of a specific SWP (first SWP, second SWP, etc) can be determined by reviewing the entire digitized segment. Digitization must begin before the first SWP. Similarly, the audio SWPs can be located by reviewing the audio channel of the videotape during playback in the search mode.
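Locating the SWPs in the digitized segment amounts to finding rising edges in the pulse channel. The sketch below shows one simple way to do this with a fixed threshold; it is an assumption for illustration rather than the procedure used in C-SPEECH, and the synthetic pulse train, threshold, and function name are hypothetical.

```python
def rising_edges(signal, threshold):
    """Sample indices where the signal crosses the threshold from below."""
    edges = []
    for i in range(1, len(signal)):
        if signal[i - 1] < threshold <= signal[i]:
            edges.append(i)
    return edges


def synthetic_swp_channel(sample_rate_hz=10_000, duration_s=5,
                          first_onset_s=0.4, pulse_s=0.005, amplitude_v=0.05):
    """Build an idealized pulse channel: one 5-ms, 50-mV pulse per second."""
    n = sample_rate_hz * duration_s
    signal = [0.0] * n
    width = round(pulse_s * sample_rate_hz)
    onset = round(first_onset_s * sample_rate_hz)
    while onset < n:
        for i in range(onset, min(onset + width, n)):
            signal[i] = amplitude_v
        onset += sample_rate_hz  # next pulse one second later
    return signal


if __name__ == "__main__":
    channel = synthetic_swp_channel()
    onsets = rising_edges(channel, threshold=0.025)
    # Position of each SWP within the segment (first SWP, second SWP, ...).
    for number, sample in enumerate(onsets, start=1):
        print(number, sample / 10_000)
```

A pulse that is already in progress at the first sample produces no rising edge and would be missed, which is consistent with the requirement that digitization begin before the first SWP.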

The system can be used to locate images associated with a target position on a waveform or to locate the point on a waveform that corresponds to some target vocal fold image. The system also permits correlation of all the recorded images with their corresponding positions on glottographic signals.

RESULTS

Figures 2-9 show a series of images and synchronized glottographic waveforms from the simulated laryngeal paralysis described previously. The PGG waveform is markedly skewed to the right, and differs notably from the symmetrical form observed in people with normal laryngeal function. These asymmetries are reflected in a speed quotient (SQ) of 0.3; the mean SQ for a sample of 11 normal male speakers was 0.93.⁷ Low SQ values have been found in previous PGG studies of patients with unilateral recurrent laryngeal nerve paralysis.⁸,⁹

Figure 2 shows PGG, EGG, and differentiated EGG (dEGG) waveforms with the cursor centered in a strobe flash occurring during the most closed portion of the glottal cycle. The EGG signal shows a low amplitude of impedance, indicating maximum vocal fold contact at this point. The PGG waveform is at baseline, indicating a minimum amount of light passing through the glottis. Figure 3 shows the corresponding videostroboscopic image at this moment. Note that vocal fold contact is in fact incomplete. There is a gap in the posterior third of the glottis, extending through the cartilaginous glottis into the posterior aspect of the membranous portion. Although the relatively flat portion at the baseline of the PGG signal appears to represent a period of glottal closure, the videostroboscopic image shows that this is not the case. In fact, because neither PGG nor EGG can distinguish relative from absolute zero (ie, minimum glottal area/maximum contact from no glottal area/complete vocal fold contact), the presence and size of the glottal gap occurring during the most closed portion of the cycle are poorly represented by these waveforms.

Fig 2. PGG, EGG, and dEGG waveforms from an adult male with an induced, temporary, unilateral, combined recurrent and superior laryngeal nerve paralysis. The spike and cursor position show the timing of the strobe flash that produced the laryngeal image in Fig 3.

Fig 3. The videostroboscopic image associated with the waveforms in Fig 2, showing the glottis at the most closed portion of the cycle.

Figure 4 shows the PGG, EGG, and dEGG signals with the cursor centered in a strobe flash occurring during the rising slope of the PGG and EGG waveforms. Figure 5 shows the associated videostroboscopic image of early glottal opening. Previous studies using high-speed photography¹⁰ have shown that for some speakers the positive peak in the dEGG waveform corresponds to the point of opening. As these two figures show, similar information is available from the present technique: glottal opening for this subject also occurred close to the positive dEGG peak.
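For readers who wish to reproduce the dEGG trace from a digitized EGG channel, differentiation can be approximated by a first difference. The sketch below is illustrative only: the report does not state how C-SPEECH computes the dEGG signal, EGG polarity conventions differ between devices, and the toy trace and function names are hypothetical.

```python
def first_difference(signal, sample_rate_hz):
    """Approximate the time derivative of a sampled signal.

    Element i is the slope between samples i and i + 1, in signal
    units per second.
    """
    return [(signal[i + 1] - signal[i]) * sample_rate_hz
            for i in range(len(signal) - 1)]


def index_of_max(values):
    """Index of the largest element (the first one if there are ties)."""
    return max(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    # Toy single-cycle trace in arbitrary units, not recorded data.
    egg = [0, 0, 1, 3, 4, 4, 3, 2, 1, 0]
    degg = first_difference(egg, sample_rate_hz=10_000)
    peak = index_of_max(degg)
    print("positive dEGG peak lies between samples", peak, "and", peak + 1)
```

The sample index of the positive peak can then be compared with the position of the nearest strobe spike on the PGG channel to find the image closest to that event.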

Figure 6 shows the three waveforms with the cursor centered on a strobe flash at the point of maximal glottal area (indicated by maximal PGG amplitude); Fig 7 presents the time-aligned videostroboscopic image. Comparison of this image with that in Fig 5 shows that greater lateral excursion has occurred in the unaffected vocal fold on the subject’s right (left side of the figure), with little apparent movement of the paralyzed left vocal fold (right side of the figure).

Figure 8 shows the three waveforms with the strobe flash and cursor located on the decreasing slopes. Figure 9 shows the associated image, a moment during the closing portion of the cycle. The image resembles Fig 5 (early glottal opening), except that in Fig 9 a faint outline of the crest of the mucosal wave can be observed at approximately the halfway point between the medial and lateral margins of the unaffected (patient’s right) vocal fold. No such wave is visible on the paralyzed (patient’s left) vocal fold.

Fig 4. The PGG, EGG, and dEGG signals with the spike and cursor position further along the glottal cycle, showing the strobe flash that produced the laryngeal image in Fig 5.

Fig 5. The videostroboscopic image corresponding to the waveforms in Fig 4, showing the glottis beginning to open.

EXPERIENCE WITH THE TECHNIQUE

The method described in this report has been used in our laboratory for human studies to indicate the timing of events during glottal vibration. Our results have been in close agreement with previous studies correlating vocal fold vibratory morphology to glottographic signal position using a variety of methods, including high-speed photography.¹⁰ Any signal (including glottographic waveforms, subglottic pressure, particle velocity, and acoustic and electromyographic data) can be correlated with videostroboscopic images using this system.

Fig 6. The PGG, EGG, and dEGG signals with strobe spike and cursor position, demonstrating the timing of the flash that produced the laryngeal image in Fig 7.

Fig 7. The videostroboscopic image corresponding to the waveforms in Fig 6, showing the glottis at maximum opening.

This system was designed using a modified stroboscope with a flash rate of 30 per second. Because video fields occur at a rate of 60 per second, the strobe rate of 30 flashes per second allows approximately one half of the fields to elapse without simultaneous strobe flashes. Blank video fields increase the certainty of correlating strobe flashes with specific video fields, and also prevent the superimposition of two strobe flashes in a single video image. A strobe rate of 60 flashes per second can be used, but results in images that may be composite. However, when a composite image occurs, the vocal fold configuration changes little due to the slow phase advance.
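The relationship between flash rate and field rate can be checked with a simple count of flashes per field. The sketch below is an idealized model under stated assumptions: it treats fields as back-to-back intervals of equal length, ignores camera exposure and interlace details, and uses a hypothetical function name. The 70-per-second case is included only to show how a flash rate above the field rate forces two flashes into one field.

```python
from fractions import Fraction


def flashes_per_field(flash_rate_hz, field_rate_hz, n_fields,
                      first_flash_s=Fraction(0)):
    """Count how many flashes fall inside each of the first n_fields fields.

    Rates are integers per second; exact fractions avoid rounding
    errors when a flash lands on a field boundary.
    """
    counts = [0] * n_fields
    k = 0
    while True:
        t = Fraction(first_flash_s) + Fraction(k, flash_rate_hz)
        field = int(t * field_rate_hz)  # floor, since t is non-negative
        if field >= n_fields:
            break
        counts[field] += 1
        k += 1
    return counts


if __name__ == "__main__":
    # 30 flashes per second against 60 fields per second.
    print(flashes_per_field(30, 60, 6))
    # A flash rate above the field rate (illustrative value).
    print(flashes_per_field(70, 60, 6))
```

At 30 flashes per second every other field is blank, which is what makes each flash attributable to a single field; once the flash rate exceeds the field rate, at least some fields must contain two flashes.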

In contrast to SLR photography, the method described permits images to be sampled throughout sustained phonation. In addition, the system is inexpensive in comparison with high-speed filming. The main disadvantages of this method are related to the limitations of stroboscopy. In particular, only a single image can be obtained from each vocal cycle, and very irregular vibration may be difficult to reconstruct. Although this technique may seem to be a complex and time-consuming process, in reality, once learned, it is simple to use, takes little effort, and is clinically relevant.

Fig 8. The PGG, EGG, and dEGG signals with spike and cursor position corresponding to the laryngeal image in Fig 9.

Fig 9. The videostroboscopic image corresponding to the waveforms in Fig 8, showing the glottis near maximum closure. The crest of the mucosal wave is located about halfway between the medial and lateral margins of the unaffected (subject’s right) vocal fold. No mucosal wave is discernible on the surface of the paralyzed (subject’s left) vocal fold.

REFERENCES

1. Hirano M: Clinical Examination of Voice. Vienna, Austria, Springer, 1981

2. Berke GS, Trapp TK, Gerratt BR, et al: Videostroboscopic images associated with glottographic waveforms in an in vivo canine model of phonation. 85:1789-1793, 1989

3. Anastaplo S, Karnell MP: Synchronized videostroboscopic and electroglottographic examination of glottal opening. J Acoust Soc Am 83:1883-1890, 1988

4. Karnell MP: Synchronized videostroboscopy and electroglottography. J Voice 3:68-75, 1989

5. Gerratt BR, Hanson DG, Berke GS: Laryngeal configuration associated with glottography. Am J Otol 9:173-179, 1988

6. Sercarz JA, Berke GS, Ming Y, et al: Videostroboscopy of human vocal cord paralysis. Ann Otol Rhinol Laryngol (in press)

7. Hanson DG, Gerratt BR, Berke GS: Frequency, intensity and target matching effects on photoglottographic measures of open quotient and speed quotient. J Speech Hear Res 33:45-50, 1990

8. Gerratt BR, Hanson DG, Berke G: Glottographic measures of laryngeal function in individuals with abnormal motor control, in Baer T, Sasaki C, Harris K (eds): Vocal Fold Physiology: Laryngeal Function in Phonation and Respiration. San Diego, CA, College Hill, 1987, pp 521-532

9. Hanson DG, Gerratt BR, Karin R, et al: Glottographic measures of vocal fold vibration: An examination of laryngeal paralysis. Laryngoscope 98:541-549, 1988

10. Childers DG, Hicks DM, Moore GP, et al: Electroglottography and vocal fold physiology. J Speech Hear Res 33:245-254, 1990