Improving Speech Intelligibility in Fluctuating Background
Interference
by Laura A. D’Aquila
S.B., Massachusetts Institute of Technology (2015),
Electrical Engineering and Computer Science, Mathematics
Submitted to the
Department of Electrical Engineering and Computer Science
in Partial Fulfillment of the Requirements for the Degree of
Master of Engineering in Electrical Engineering and Computer Science
at the
Massachusetts Institute of Technology
June 2016
© Massachusetts Institute of Technology 2016. All rights reserved.
Author: _________________________________________________________________
Department of Electrical Engineering and Computer Science
May 20, 2016
Certified by: _____________________________________________________________
Dr. Charlotte M. Reed, Senior Research Scientist
Research Laboratory of Electronics
May 20, 2016
Certified by: _____________________________________________________________
Professor Louis D. Braida, Henry Ellis Warren Professor
Electrical Engineering and Health Sciences and Technology
May 20, 2016
Accepted by: ____________________________________________________________
Dr. Christopher J. Terman, Chairman, Masters of Engineering Thesis Committee
Improving Speech Intelligibility in Fluctuating Background
Interference
by Laura A. D’Aquila
Submitted to the Department of Electrical Engineering and Computer Science on May
20, 2016, in partial fulfillment of the requirements for the degree of Master of
Engineering in Electrical Engineering and Computer Science.
ABSTRACT
The masking release (MR; i.e., better speech recognition in fluctuating compared to continuous
noise backgrounds) that is evident for normal-hearing (NH) listeners is generally reduced or
absent in hearing-impaired (HI) listeners. In this study, a signal-processing technique was
developed to improve MR in HI listeners and offer insight into the mechanisms influencing the
size of MR. This technique compares short-term and long-term estimates of energy, increases the
level of short-term segments whose energy is below the average energy, and normalizes the
overall energy of the processed signal to be equivalent to that of the original long-term estimate.
In consonant-identification tests, HI listeners achieved similar scores for processed and
unprocessed stimuli in quiet and in continuous-noise backgrounds, while superior performance
was obtained for the processed speech in some of the fluctuating background noises. Thus, the
energy-normalized signals led to larger values of MR than those obtained with
unprocessed signals.
ACKNOWLEDGMENTS
This research was supported by the National Institute on Deafness and Other Communication
Disorders of the National Institutes of Health under Award Number R01 DC000117.
I would like to extend a big thank you to my advisors, Dr. Charlotte M. Reed and Professor
Louis D. Braida. From when I first began doing research in their lab as a sophomore to now as I
wrap up my M.Eng thesis, they have always made themselves available and offered much
guidance, instruction, and support. Their analysis and ideas for moving forward were crucial to
the success of this project. Their kindness made me look forward to coming into lab every day. I
am also extremely grateful for the RA funding that they provided me with as I worked on the
project. Additionally, I would like to heartily thank Dr. Joseph G. Desloge, the signal processing
mastermind of the project. During the spring of my senior year, his help was critical as I coded
the different components of this project. Despite having since taken a new job on the West Coast,
he still kindly spoke with me weekly on the phone throughout the year to discuss my project and
offer his very valuable insight, ideas, and feedback. I could not have asked for a better group of
mentors than Dr. Reed, Professor Braida, and Dr. Desloge. As part of the Sensory
Communication Group, the three of them performed much of the previous work that led to this
project, and this project would also not have been possible without their continued involvement.
I would lastly like to thank my family, who have provided me with countless opportunities
throughout my life, without which I would not be where I am today. I am very grateful for the
love and confidence that they have had in me throughout it all and for their shaping me into the
person I am. It is comforting to know that I can always turn to them no matter what happens.
I. BACKGROUND
Many hearing-impaired (HI) listeners with sensorineural hearing loss who are able to
understand speech in quiet environments without much difficulty encounter more problems in
noisy situations, such as in a cafeteria or at a social gathering. Indeed, it has been shown that
these listeners require a higher speech-to-noise ratio (SNR) to achieve a given level of
performance than do normal-hearing (NH) listeners (Festen and Plomp, 1990). This is the case
regardless of whether the noise is temporally fluctuating, such as interfering voices in the
background, or is steady-state, such as a motor.
Festen and Plomp (1990) measured the SNR required for 50%-correct sentence reception
in different types of background interference. Whereas HI listeners required a similar SNR
regardless of the type of interfering noise, NH listeners performed better (i.e., required a lower
SNR) in temporally fluctuating interference than in steady-state interference. Listeners who
perform better with fluctuating interference are said to experience a release from masking. This
release from masking occurs when listeners are able to perceive audible glimpses of the target
speech during dips in the fluctuating noise (Cooke, 2006) and it aids in the ability to converse
normally in the noisy social situations mentioned above.
One possible explanation of reduced release from masking in HI listeners is based on the
effects of reduced audibility in HI listeners, who are less likely to be able to receive the target
speech in the noise gaps (Desloge et al., 2010). Léger et al. (2015) looked at release from
masking in greater depth, particularly with respect to consonant recognition with different types
of speech processing. The processing allowed for the examination of the roles played by the
signal’s slowly varying component, known as its envelope (ENV), and rapidly varying
component, known as its temporal fine structure (TFS), on release from masking. The consonant
speech stimuli were processed using the Hilbert Transform to convey ENV cues, TFS cues, or
ENV cues recovered from TFS speech. Consonant identification was measured in the presence of
steady-state and 10-Hz square-wave interrupted speech-shaped noise. The percent-correct scores
were used to calculate masking release (MR) in percentage points, defined as the difference in
scores in interrupted noise and in continuous noise at a given SNR. The results showed that HI
listeners generally experienced MR for TFS and recovered-ENV speech but not for unprocessed
or ENV speech. The study concluded that the increase in MR may be related to the way the TFS
processing interacts with the interrupted noise signal, rather than to the presence of TFS itself.
Under certain circumstances, the removal of amplitude-envelope variation in TFS speech may
amplify the higher SNR glimpses of the speech signal during gaps in a fluctuating noise.
Reed et al. (2016) further investigated the conclusions of Léger et al. regarding the role of
reduced amplitude variation in MR. The study tested an infinite peak-clipped (IPC) speech
condition, which used the sign of each sample point of the input signal to convert positive terms
to +1, convert negative terms to -1, and leave zero terms unchanged. This processing thus also
removed much of the amplitude variation. Speech intelligibility in noise and MR were compared
for TFS, IPC, and unprocessed speech for HI listeners. Outcomes for TFS and IPC speech were
very similar, leading to the conclusion that the removal of amplitude variation can indeed lead to
MR. Because both the TFS and IPC speech contained fine-structure cues, however, it was still
possible that TFS was responsible for the observed MR. Another condition was created in which
both TFS and amplitude-variation cues were eliminated by passing an ENV signal through the
TFS processing stage. Greater MR was observed for this condition than for the original ENV
speech, thus lending support to the hypothesis that reduced amplitude variation can lead to
improved MR in HI listeners. This MR arose as the less-intense portions of the speech stimulus,
which occurred in the noise gaps, became more audible to HI listeners when the amplitude was
“normalized” to remove variation. These studies proved promising in understanding a potential
way to improve MR in HI listeners; however, the improvement in MR was mainly due to a
decreased performance in continuous noise rather than an increased performance in fluctuating
noise.
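The infinite peak-clipping operation described above is simply the signum function applied sample by sample; a minimal sketch (the function name is ours, not from the original studies):

```python
import numpy as np

def infinite_peak_clip(x):
    """Infinite peak clipping: +1 for positive samples, -1 for negative,
    zero samples left unchanged (removing amplitude variation)."""
    return np.sign(x)
```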
To address these issues, Desloge et al. (2016) developed a signal-processing technique
designed to achieve similar reductions in signal amplitude variation without suffering a loss in
intelligibility in continuous background noise. Using non-real-time processing over the
broadband signal, the technique compared short-term and long-term estimates of energy,
increased the level of short-term segments whose energy was below the average energy, and
normalized the overall energy of the processed signal to be equivalent to that of the original
long-term estimate. In consonant-identification tests, HI listeners achieved similar scores for
processed and unprocessed stimuli in quiet and in continuous-noise backgrounds, while superior
performance was obtained for the processed speech in fluctuating background noises. Thus, the
energy-normalized signals led to larger values of MR than those obtained with
unprocessed signals. The work described in this paper builds upon Desloge et al. by
implementing and evaluating a real-time and multi-band version of the signal processing
algorithm in a broader range of noises.
II. GOALS
This study investigates a novel signal processing technique, called energy equalization
(EEQ), for the reduction of amplitude variation, which Reed et al. (2016) had concluded could
contribute to MR in HI listeners. EEQ processing “normalizes” the fluctuating short-term signal
energy to be equal to the long-term average signal energy. This technique is thus another way of
removing the rapid amplitude variation that occurs in speech. The goal is for this signal
processing to improve the performance of HI listeners in fluctuating background noise without
leading to a drop in performance in continuous background noise. This change in performance
would thus result in greater MR for HI listeners.
Energy equalization is applicable in the area of hearing aid and cochlear implant
processing, and it could potentially also be used to benefit NH listeners and even machine
listening systems that use automatic speech recognition. Other potential applications of EEQ
processing include cell-phone or teleconferencing systems where an individual is speaking in a
noisy environment and in speech recognition in interfering backgrounds. Thus, wherever speech
reception is needed in noise, energy equalization could be used.
The short-term signal energy for speech varies at a syllabic rate as intervals fluctuate
between being more intense (usually during vowels), less intense (usually during consonants),
and silent. Meanwhile, the long-term signal energy remains relatively constant and reflects the
overall loudness at which a speaker is talking. These overall properties of speech persist even
when background noise is added to the signal. The quiet portions of the speech signal are the
most troublesome for HI listeners and lead to reduced speech comprehension. Energy
equalization combats this difficulty by amplifying the quieter parts of the signal
(that may be present during gaps in the background noise) relative to the louder parts of the
signal (that occur when background noise is fully present). This technique makes speech content
present during the dips in background noise more audible and hence useful for speech
comprehension.
III. SIGNAL PROCESSING ALGORITHM
The EEQ processing seeks to reduce short-term amplitude fluctuations of a speech-plus-
noise (S+N) stimulus while operating blindly and without introducing excessive distortion. The
following is a general description of the steps the EEQ processing performs in real-time on a
S+N signal x(t):
● Form running short-term and long-term moving averages of the signal energy, Eshort(t)
and Elong(t):
Eshort(t) = AVGshort[x²(t)] and Elong(t) = AVGlong[x²(t)],
where AVG is a moving-average operator that utilizes specified short and long time
constants to provide an estimate of the signal’s energy.
● In this implementation, the AVG operators are single-pole infinite impulse
response (IIR) low-pass filters applied to the instantaneous signal energy, x²(t),
with time constants of 5 ms and 200 ms for the short and long averages,
respectively. The magnitude and phase of the square root of the ratio of the
frequency response of AVGlong to the frequency response of AVGshort are shown
in Figure 1, which is useful in understanding the scale factor computed in the next
step of the processing.
● Determine the scale factor, SC(t):
SC(t) = √(Elong(t) / Eshort(t)),
where care is taken to avoid dividing by zero during quiet intervals.
● To prevent over-amplification of the noise floor, SC(t) had an upper limit of 20
dB.
● To prevent attenuation of stronger signal components, SC(t) had a lower limit of 0
dB.
● Apply the scale factor to the original signal:
y(t) = SC(t)x(t).
● Form the output z(t) by normalizing y(t) to have the same energy as x(t):
z(t) = K(t)y(t),
where K(t) is chosen such that AVGlong[z²(t)] = AVGlong[x²(t)].
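The steps above can be sketched in Python. This is an offline illustration, not the thesis's Matlab implementation; the single-pole recursive averaging, parameter names, and the epsilon guard are assumptions consistent with the description:

```python
import numpy as np

def eeq(x, fs, tau_short=0.005, tau_long=0.200, max_gain_db=20.0):
    """Energy equalization (EEQ) sketch: boost short-term segments whose
    energy falls below the long-term average, then renormalize."""
    x = np.asarray(x, dtype=float)

    def avg(energy, tau):
        # Single-pole IIR low-pass filter of the instantaneous energy.
        a = np.exp(-1.0 / (fs * tau))
        out = np.empty_like(energy)
        acc = 0.0
        for i, e in enumerate(energy):
            acc = a * acc + (1.0 - a) * e
            out[i] = acc
        return out

    eps = 1e-12  # guards against division by zero during quiet intervals
    e_short = avg(x ** 2, tau_short)
    e_long = avg(x ** 2, tau_long)

    # Scale factor SC(t) = sqrt(Elong / Eshort), limited to the range 0-20 dB.
    sc = np.sqrt(e_long / np.maximum(e_short, eps))
    sc = np.clip(sc, 1.0, 10.0 ** (max_gain_db / 20.0))
    y = sc * x

    # Normalize so the long-term energy of the output matches the input's.
    k = np.sqrt(e_long / np.maximum(avg(y ** 2, tau_long), eps))
    return k * y
```

With a 10-Hz interrupted masker, the gain rides up during the noise gaps, which is exactly where the weaker speech components need to be raised for HI listeners.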
The processing described above can be applied either to a broadband signal or
independently to bandpass filtered components. The current implementation operated on both the
broadband signal (EEQ1) and a signal divided into four contiguous frequency bands (EEQ4).
These conditions are described in more detail in Section IV-D. Figure 2 depicts block diagrams
of the EEQ1 (Figure 2A) and EEQ4 (Figure 2B) algorithms. The EEQ algorithm that was
implemented follows the outline of the steps described above to process x[n], a sampled version
of the original signal x(t). The original signal is first multiplied by SC[n], as shown in Figure 2A,
and the resulting EEQ signal is then multiplied by K[n] to ensure that the long-term energy of the
EEQ signal is equal to the long-term energy of the original signal at every sample point. SC[n] is
restricted to lie in the range of 0 dB to 20 dB. Appendix I describes a modification to the
computation of the scale factor that could be used without this lower limit in place.
IV. METHODS
The experimental protocol for testing human subjects was approved by the institutional
review board of the Massachusetts Institute of Technology. All testing was conducted in
compliance with regulations and ethical guidelines on experimentation with human subjects. All
listeners provided informed consent and were paid for their participation in the experiments.
A. Participants
Six male and three female HI listeners with bilateral, symmetric, mild-to-severe
sensorineural hearing loss participated in the experiment. They were all native speakers of
American English and ranged in age from 20 to 69 years with an average age of 36.7 years. Six
of the listeners were younger (33 years or less) and three were older (58-69 years). Five of the
listeners had sloping high-frequency losses (HI-1, HI-2, HI-4, HI-5, and HI-7), three had
relatively flat losses (HI-6, HI-8, and HI-9), and one had a “cookie-bite” loss (HI-3). Seven of
the listeners (all but HI-1 and HI-3) were regular users of bilateral hearing aids. The five-
frequency (0.25, 0.5, 1, 2, and 4 kHz) audiometric pure-tone average (PTA) ranged from 27 dB
HL to 75 dB HL across listeners with an average of 45.3 dB HL.
The test ear, age, and five-frequency PTA for each HI listener are listed in Table 1 along
with the speech levels and SNRs employed in the experiment. The pure-tone thresholds of the HI
listeners in dB SPL are shown in Figure 3. The pure-tone threshold measurements were obtained
with Sennheiser HD580 headphones for 500 ms stimuli in a three-alternative forced-choice
adaptive procedure which estimates the threshold level required for 70.7%-correct detection (see
Léger et al., 2015).
Four NH listeners (defined as having pure-tone thresholds of 15 dB HL or better in the
octave frequencies between 250 and 8000 Hz) also participated in the study. They were native
speakers of American English, included three males and one female, and ranged in age from 19
to 54 years, with an average age of 30.0 years. A test ear was selected for each listener (two left
and two right). The mean adaptive thresholds across test ears of the NH listeners are provided in
the first panel of Figure 3.
B. Speech Stimuli
The speech materials were Vowel-Consonant-Vowel (VCV) stimuli, with C=/p t k b d g f
s ʃ v z dʒ m n r l/ and V=/a/, taken from the corpus of Shannon et al. (1999). The set used for
testing consisted of 64 VCV tokens (one utterance of each of the 16 disyllables by two male and
two female speakers). The mean VCV duration was 945 ms with a range of 688 to 1339 ms
across the 64 VCVs in the test set. The recordings were digitized with 16-bit precision at a
sampling rate of 32 kHz and filtered to a bandwidth of 80-8020 Hz for presentation.
C. Interference Conditions
Noises from two broad categories of maskers were added to the speech stimuli prior to
processing for presentation. Four background interference conditions were derived from speech-
shaped noise but did not come from actual speech samples. Three additional background
interference conditions, referred to as vocoded modulated noises, were derived from actual
speech samples. The RMS level of each of the noises except for the baseline condition was
adjusted to be equal to that of the continuous noise, whose level was set as described in Section
IV-F. The maskers used in the study are shown in Figure 4 and are summarized below.
Maskers derived from randomly-generated speech-shaped noise (spectrogram
shown in Figure 5) but not coming from actual speech samples. This paper refers
to these as non-speech-derived noises:
o Baseline Noise (BAS): Continuous speech-shaped noise at 30 dB SPL.
o Continuous Noise (CON): Additional continuous noise added to BAS.
o Square-Wave Interrupted Noise (SQW): 10-Hz square-wave interruption
with 50% duty cycle added to BAS.
o Sinusoidal Amplitude Modulation Noise (SAM): 10-Hz sinusoidal
amplitude modulation noise added to BAS.
Maskers derived from actual speech samples (referred to as vocoded modulated
noise). These maskers were designed to exhibit fluctuations characteristic of speech
without the informational masking component. This paper refers to these as
speech-derived noises:
o 1-Speaker Vocoded Modulated Noise (VC-1)
o 2-Speaker Vocoded Modulated Noise (VC-2)
o 4-Speaker Vocoded Modulated Noise (VC-4)
Appendix II describes the steps used to generate the vocoded modulated noises.
D. Speech Conditions
Listeners were presented with S+N signals with three different kinds of processing
applied:
Unprocessed Condition (UNP): The S+N signals were presented as described above with
no further processing beyond per-subject NAL-RP (Dillon, 2001) amplification.
1-band Energy Equalized Condition (EEQ1): EEQ processing was applied to the
broadband S+N signal over the range of 80-8020 Hz. As described in Section III, the
EEQ processing compared short-term and long-term estimates of S+N signal energy,
increased the level of short-term segments whose energy was below the average signal
energy, and normalized the overall energy of the processed signal to be equivalent to that
of the original long-term estimate (see Figure 2A).
4-band Energy Equalized Condition (EEQ4): The same technique as in the EEQ1
condition was applied independently to 4 logarithmically-equal bands of the S+N signal
in the range of 80-8020 Hz. In doing so, sixth-order (36 dB/octave) bandpass filters divided
the input signals into bands with frequency ranges of 80 – 253 Hz, 253 – 801 Hz, 801 –
2535 Hz, and 2535 – 8020 Hz, respectively, and the EEQ1 processing was applied
independently to each band prior to reconstructing the signal by summing across bands
(see Figure 2B).
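The band edges quoted above follow directly from splitting the 80-8020 Hz range into four logarithmically-equal intervals; a quick check:

```python
import numpy as np

# Split 80-8020 Hz into four logarithmically-equal bands (EEQ4 analysis bands).
f_lo, f_hi, n_bands = 80.0, 8020.0, 4
edges = f_lo * (f_hi / f_lo) ** (np.arange(n_bands + 1) / n_bands)
```

Rounded, these edges are 80, 253, 801, 2535, and 8020 Hz, matching the frequency ranges listed above.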
E. Speech and Noise Signals
Figure 6 shows the waveform of one of the VCV tokens used in the experiment, ‘APA’, for
UNP speech in BAS noise. The vowel components, which have more energy than the consonant
component, constitute the two higher-energy sections of the speech that surround the weaker
consonant component in the center. These sections of the speech are annotated at the top of the
figure. Figure 7 shows the waveforms of this token in the different speech and noise conditions
(Figure 7A for the UNP condition, Figure 7B for the EEQ1 condition, and Figure 7C for the
recombined EEQ4 condition). In every type of interference except for BAS, the SNR is set to -4
dB. The left panels show the S+N waveforms, and the right panels show the distribution of the
amplitude of the S+N signal in dB. These amplitude distributions were generated by sampling
points of the S+N signal and, based on their amplitudes in dB, sorting them into bins of width
1 dB spanning -10 dB to 85 dB. The RMS value of the signal in dB is shown by the
blue vertical line, and the median amplitude is shown by the green vertical line.
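A distribution of this kind reduces to a dB histogram over 1-dB bins; a sketch (the function name and the dB floor for silent samples are our assumptions):

```python
import numpy as np

def amplitude_distribution(x, lo_db=-10, hi_db=85):
    """Histogram of instantaneous amplitudes in dB, using 1-dB-wide bins,
    plus the RMS and median amplitudes marked in the right-hand panels."""
    amp_db = 20.0 * np.log10(np.maximum(np.abs(x), 1e-12))
    edges = np.arange(lo_db, hi_db + 1)      # 1-dB bin edges from -10 to 85 dB
    counts, _ = np.histogram(amp_db, bins=edges)
    rms_db = 20.0 * np.log10(np.sqrt(np.mean(np.square(x))))
    median_db = float(np.median(amp_db))
    return counts, rms_db, median_db
```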
The gaps of the noise in the plots of the S+N waveforms make evident the reduction in
short-term amplitude fluctuations by the EEQ processing. For example, a comparison of the S+N
waveforms in SQW between UNP and either EEQ1 or EEQ4 shows that the lower-energy speech
components that are present during the gaps in the fluctuating interference are greater in energy
in the EEQ processed signals. The reduction in amplitude is also seen in the amplitude
distributions in the right panels. The low-energy tails of the amplitude distributions in the UNP
condition are reduced or absent in the EEQ1 and EEQ4 conditions. As a result, the median
amplitudes (given by the green vertical lines) in the EEQ1 and EEQ4 conditions are shifted to
the right, despite the RMS values (given by the blue vertical lines) remaining constant between
UNP and EEQ (as a result of the final normalization step in the EEQ processing that sets the
long-term energy of the output equal to the long-term energy of the input at every sample point).
These effects are analyzed in more detail in Section VI-A of the paper.
Figure 8 shows the EEQ4 waveforms and amplitude distributions in the CON (Figure
8A) and SQW (Figure 8B) conditions on a band-by-band basis. As in Figure 7, an SNR of -4 dB
is used. The boundaries of the four bands, which are logarithmically spaced, are 80 – 253 Hz
(Band 1), 253 – 801 Hz (Band 2), 801 – 2535 Hz (Band 3), and 2535 – 8020 Hz (Band 4). Band
2 has the largest RMS value, followed by, in decreasing order, Band 3, Band 4, and Band 1. The
EEQ1 processing was applied independently in each band.
F. Test Procedure
Experiments were controlled by a desktop PC using Matlab™ software. The digitized
speech-plus-noise stimuli were played through a 24-bit PCI sound card (E-MU 0404 by Creative
Professional) and then passed through a programmable attenuator (Tucker-Davis PA4) and a
headphone buffer (Tucker-Davis HB6) before being presented monaurally to the listener in a
soundproof booth via a pair of headphones (Sennheiser HD580). A monitor, keyboard, and
mouse located within the soundproof booth allowed the listener to interact with the control PC.
Consonant identification was tested using a one-interval, 16-alternative, forced-choice
procedure without correct-answer feedback. On each 64-trial run, one of the 64 tokens from the
test set was selected randomly without replacement. Depending on the noise condition, a
randomly selected noise segment equal in duration to that of the speech token was scaled to
achieve the desired SNR and then added to the speech token. The resulting stimulus was either
presented unprocessed (for the UNP conditions) or processed according to EEQ1 or EEQ4
before being presented to the listener for identification. The listener’s task was to identify the
medial consonant of the VCV token that had been presented by selecting a response (using a
computer mouse) from a 4x4 visual array of orthographic representations associated with the
consonant stimuli. No time limit was imposed on the listeners’ responses. Each run typically
lasted 3-5 minutes depending on the listener’s response times. Chance performance was 6.25%-
correct.
Experiment 1. NH listeners were tested using a speech level of 60 dB SPL. The SNR was
set to -10 dB (selected to yield roughly 50%-correct performance for UNP speech in CON noise)
for all noise conditions (except for BAS). For the HI listeners, a linear-gain amplification was
applied to the speech-plus-noise stimuli using the NAL-RP formula (Dillon, 2001). Each HI
listener selected a comfortable speech level when listening to UNP speech in the BAS condition.
For these listeners, the SNR was selected to yield roughly 50%-correct performance for UNP
speech in CON noise. The speech levels and SNRs for each HI listener are listed in Table 1. The
noise levels in dB are the differences between the speech levels and the SNRs.
The three speech conditions were tested in the order of UNP first, followed by EEQ1 and
EEQ4 in a random order. The seven noise conditions were tested in order of BAS first, followed
by a randomized order of the remaining six noises (CON, SQW, SAM, VC-1, VC-2, and VC-4).
Five 64-trial runs were presented for each of the 21 conditions (3 speech types x 7 noises). The
first run was considered as practice and discarded. The final four test runs were used to calculate
the percent-correct score in each condition.
Experiment 2. Four of the HI listeners (HI-2, HI-4, HI-5, and HI-7) were tested at two
additional values of SNR after completing Experiment 1. As shown in Table 2, one SNR was 4
dB lower than that employed in Experiment 1 and the other was 4 dB higher. This testing was
conducted with UNP and EEQ1 speech in six types of noise: CON, SQW, SAM, VC-1, VC-2,
and VC-4. The test order for UNP and EEQ1 speech was selected randomly for each listener. For
each speech type, the two additional values of SNR were presented in random order. Within each
SNR, the test order of the six types of noises was selected at random. Five 64-trial runs were
presented at each condition using the tokens from the test set. The first run was discarded as
practice and the final four runs were used to calculate the percent-correct score on each of the 24
additional conditions (2 speech types x 6 noises x 2 SNRs). Other than the SNR, all other
experimental parameters remained the same as in Experiment 1.
G. Data Analysis
For each condition, percent-correct scores were averaged over the final 4 runs (consisting
of 4*64=256 trials). Analysis of Variance (ANOVA) tests were performed on rationalized
arcsine units (RAU; Studebaker, 1985) scores to examine the effects of speech type and noise
condition on these percent-correct scores. MR in percentage points was calculated as the
difference between scores in fluctuating noise and in continuous noise:
MR = Score in Fluctuating Noise - Score in Continuous Noise.
Additionally, as was done by Léger et al., a normalized measure of masking release (NMR) was
calculated as the quotient of MR and the difference between scores in quiet and in continuous
noise:
NMR = (Score in Fluctuating Noise - Score in Continuous Noise) / (Score in Quiet - Score in Continuous Noise).
NMR thus represents the fraction of baseline performance lost in continuous noise that can be
recovered in interrupted noise. Listeners who perform just as well in fluctuating noise as in quiet
have an NMR of 1, and listeners who do not perform any better in fluctuating noise than in
continuous noise have an NMR of 0. The metric is useful for comparing performance among HI
listeners whose scores in quiet are different. By using baseline performance as a reference, NMR
emphasizes the differences in performance with interrupted and continuous noise as opposed to
the differences due to factors such as the severity of the hearing loss of the listener or the
distorting effects of the processing on the speech itself.
The MR and NMR calculations in SQW and SAM noises used CON noise as the
continuous noise, and the MR and NMR calculations in VC-1 and VC-2 noises used VC-4 noise
as the continuous noise. These NMR formulas are listed here:
NMRSQW = (SQW Score - CON Score) / (BAS Score - CON Score)
NMRSAM = (SAM Score - CON Score) / (BAS Score - CON Score)
NMRVC-1 = (VC-1 Score - VC-4 Score) / (BAS Score - VC-4 Score)
NMRVC-2 = (VC-2 Score - VC-4 Score) / (BAS Score - VC-4 Score)
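These definitions are trivial to formalize; a minimal sketch (function names are ours):

```python
def masking_release(fluct_score, cont_score):
    """MR in percentage points: fluctuating-noise score minus continuous-noise score."""
    return fluct_score - cont_score

def normalized_masking_release(fluct_score, cont_score, quiet_score):
    """NMR: fraction of the performance lost in continuous noise that is
    recovered in fluctuating noise (1 = full recovery, 0 = none)."""
    return (fluct_score - cont_score) / (quiet_score - cont_score)
```

For example, the average HI scores reported in Section V (BAS 89.8%, CON 51.5%, SQW 72.1%) give MR = 20.6 points and NMR ≈ 0.54.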
V. RESULTS
A. Experiment 1
The scores from Experiment 1 are reported in Appendix III-A and Appendix III-B and
are summarized in Figure 9, Figure 10, and Figure 11. Appendix III-A provides the scores for
each NH listener in each of the seven noise conditions for UNP, EEQ1, and EEQ4 speech, and
Appendix III-B provides this same information for each HI listener. In Figure 9, the scores are
plotted to highlight the differences in the average scores of the NH and HI listeners across
conditions. In Figure 10 and Figure 11, the scores are plotted to highlight the differences of
speech types within each noise for the NH and HI listeners (Figure 10 for the average NH results
and the average HI results and Figure 11 for the average NH results and the individual HI
results).
First, consider average NH and HI performance, as shown in Figure 9. As expected, the
performance for both groups was greatest in the BAS condition. Performance was lowest in
CON (and was approximately 50%-correct by design of the experiment) and VC-4 (which was
derived from samples of enough speakers to behave similarly to continuous noise). Performance
was intermediate for the remaining noises. Other than in CON noise, scores were greater for NH
than for HI listeners across noise conditions for all three speech types. The differences between
the two groups were relatively small in the BAS condition (where the average differences
between NH and HI listeners were 5.3% in UNP, 7.8% in EEQ1, and 12.2% in EEQ4), showing
that the two groups diverged most in fluctuating-noise conditions, where NH listeners were
able to listen in the gaps while HI listeners were not. In fact, across the five noises other than BAS and
CON, NH scores were on average 17.9, 15.9, and 17.1 percentage points higher than HI scores
for the UNP, EEQ1, and EEQ4 conditions, respectively. HI listeners exhibited slightly more
variability in their results than did NH listeners: the mean standard deviations across listeners
(computed as the average of the standard deviations in each of the seven noises1) in percentage
points were 3.59 for UNP, 3.23 for EEQ1, and 4.38 in EEQ4 for NH listeners and 4.67 in UNP,
4.86 in EEQ1, and 4.59 in EEQ4 for HI listeners.
Next, consider NH and HI performance across the different speech types, as is shown in
Figures 10 and 11. Both figures show the mean scores for the NH listeners. Figure 10 shows the
mean scores for the HI listeners, whereas Figure 11 shows the scores for the individual HI
listeners. Note that the data depicted here are the same as that shown in Figure 9 and are
replotted to highlight differences in speech types within a given noise. In general, both NH and
HI listeners scored best in UNP followed by EEQ1 followed by EEQ4. Averaged across the
different listeners and noise types, the NH scores were 78.7% in UNP, 75.5% in EEQ1, and
73.4% in EEQ4, and the HI scores were 65.3% in UNP, 63.0% in EEQ1, and 59.1% in EEQ4.
By noise type, the scores of NH listeners generally followed the pattern of CON = VC-4 < VC-2
< VC-1 < SAM < SQW < BAS, and those of HI listeners generally followed the pattern of VC-4
< CON = VC-2 < VC-1 = SAM < SQW < BAS. Averaged across the different listeners and
speech types, the NH scores were 98.3% in BAS, 52.1% in CON, 92.4% in SQW, 86.2% in
SAM, 81.1% in VC-1, 68.6% in VC-2, and 52.4% in VC-4, and the HI scores were 89.8% in
BAS, 51.5% in CON, 72.1% in SQW, 64.5% in SAM, 61.7% in VC-1, 52.1% in VC-2, and
45.7% in VC-4. EEQ1 processing was effective in improving the scores of HI and NH listeners
in SQW noise: the average NH listener and eight of the nine individual HI listeners (all but HI-3)
¹ Note that here, the standard deviation for a given noise and processing condition is calculated
as √((1/n) ∑_{i=1}^{n} σ_i²), where σ_i² is the variance of the four recorded runs on listener i
and n is the number of listeners.
scored higher with EEQ1 than with UNP in SQW noise. EEQ1 processing also yielded improved
performance for SAM noise in six of the nine HI listeners (all but HI-3, HI-7, and HI-9). For
EEQ4 processing, no improvements over UNP were seen in SQW noise for NH listeners;
however, all but one HI listener (HI-3) showed an improvement. For EEQ4 in SAM noise, there
was no evidence for improvements over UNP for either NH or HI listeners. For all remaining
noise conditions, for both HI and NH, scores were highest with UNP and lowest with EEQ4,
with EEQ1 in between.
The results obtained on each individual NH and HI listener were analyzed using a two-
way ANOVA with main factors of speech type and noise condition. The ANOVAs were
conducted at the 0.01 significance level on the RAU transforms of the 84 percent-correct scores obtained
on each listener (3 speech types x 7 noises x 4 repetitions) and are reported in Table 3. All but one of
the NH listeners (NH-1) and all of the HI listeners had a significant effect of speech type, and all
of the NH and HI listeners had a significant effect of noise type. One of the NH listeners (NH-2)
and all but three of the HI listeners (HI-2, HI-3, and HI-7) had a significant speech by noise
interaction.
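The RAU transform applied to the percent-correct scores is presumably Studebaker's rationalized arcsine transform; a minimal sketch (not the analysis code used in this work), with `correct` and `total` denoting hypothetical trial counts:

```python
import numpy as np

def rau(correct, total):
    """Rationalized arcsine units (RAU) of a correct/total score.

    Assumes Studebaker's transform: an arcsine-root stabilization followed
    by a linear rescaling so that mid-range RAU values track percent-correct.
    """
    t = np.arcsin(np.sqrt(correct / (total + 1.0))) + \
        np.arcsin(np.sqrt((correct + 1.0) / (total + 1.0)))
    return (146.0 / np.pi) * t - 23.0
```

By construction, a score of 32/64 (50%-correct) maps to exactly 50 RAU, while scores near the floor and ceiling are stretched, which is why ANOVAs are run on RAU rather than raw percentages.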
Post-hoc Tukey-Kramer comparisons at the significance level of 0.05 were conducted for
cases of significant main factor effects, and the results are listed in Table 4. By speech type, most
listeners had UNP = EEQ1 > EEQ4 (NH-2, HI-2, HI-4, HI-8, HI-9) or UNP > EEQ1 = EEQ4
(NH-3, NH-4, HI-1, HI-5, HI-7). The exceptions were HI-3, who had UNP > EEQ1 > EEQ4, and
HI-6, who had EEQ1 > UNP > EEQ4. By noise type, BAS, SQW, SAM, and VC-1 were greater
than VC-2, VC-4, and CON. Most listeners had BAS > SQW > SAM > VC-1 (NH-2, NH-3, NH-
4, HI-1, and HI-3) or BAS > SQW > SAM = VC-1 (NH-1, HI-2, HI-4, HI-5, HI-6, HI-8, and HI-
9). The exception was HI-7, who had BAS > SQW = SAM = VC-1. All NH listeners had VC-2 >
VC-4 = CON, and the order of VC-2, VC-4, and CON in HI listeners varied with each listener,
with five of the nine HI listeners (HI-2, HI-4, HI-5, HI-7, and HI-8) having no significant
differences among the three conditions. As discussed in the preceding paragraph, the significant
speech by noise interaction present in many of the HI listeners is largely due to improved
performance with EEQ1 processing relative to UNP in the SQW and SAM conditions but not in
the other noises.
The NMR data calculated from the scores of Experiment 1 are reported in Appendix III-C
and Appendix III-D and are summarized in Figure 12. Appendix III-C provides the NMR for
each NH listener in the SQW, SAM, VC-1, and VC-2 noise conditions for UNP, EEQ1, and
EEQ4 speech, and Appendix III-D provides this same information for each HI listener. In Figure
12, the NMR results are plotted for the average NH listener and the individual HI listeners to
highlight the differences of speech types within each noise.
As shown in Figure 12, for the HI listeners, NMR was generally similar in EEQ1 and
EEQ4 speech in the various noises and was greater in EEQ1 and EEQ4 than in UNP speech.
Averaged over the HI listeners and the noise types, these NMR values were 0.266 in UNP, 0.345
in EEQ1, and 0.361 in EEQ4. NMR for HI listeners by noise type was generally greatest in SQW
interference, smallest in VC-2 interference, and intermediate and roughly equivalent in SAM and
VC-1 interference. As such, NMR was generally greater in the non-speech-derived noises than in
the speech-derived noises. Averaged over the HI listeners and speech types, these NMR values
were 0.525 in SQW, 0.320 in SAM, 0.359 in VC-1, and 0.143 in VC-2. EEQ processing yielded
the largest improvement in NMR for HI listeners in the SQW conditions. This improvement
decreased in the SAM condition and disappeared in the VC-1 and VC-2 conditions. Averaged
across HI listeners, NMR values for UNP, EEQ1, and EEQ4, respectively, were 0.320, 0.639,
and 0.616 in SQW; 0.227, 0.400, and 0.335 in SAM; 0.391, 0.376, and 0.312 in VC-1; and
0.126, 0.125, and 0.180 in VC-2. NH listeners generally achieved greater NMR than did the HI
listeners with little effect of speech type. Averaged across speech type for NH listeners, NMR
decreases in the order of SQW, SAM, VC-1, and VC-2. Averaged across NH listeners, NMR for
UNP, EEQ1, and EEQ4, respectively, were 0.861, 0.907, and 0.846 in SQW; 0.792, 0.735, and
0.694 in SAM; 0.673, 0.600, and 0.607 in VC-1; and 0.356, 0.351, and 0.355 in VC-2. Both
within and across listeners, HI listeners exhibited greater variability in their results than did NH
listeners.
B. Experiment 2
The scores of Experiment 2 are reported in Appendix IV-A and are summarized in Figure
13. Appendix IV-A provides the scores for each HI listener in the non-BAS noise conditions for
UNP and EEQ1 speech at each of the three SNRs that were tested. Figure 13A plots the results in
non-speech derived noises (except BAS) as a function of SNR and fits sigmoidal functions to the
data, and Figure 13B does the same for the speech-derived noises. The sigmoidal fits to the
psychometric functions in Figure 13 assumed a lower bound corresponding to chance
performance on the consonant-identification task (6.25%-correct) and an upper bound
corresponding to a given listener's score in the BAS condition for UNP or EEQ1. The fitting
process found the slope and midpoint values of a logistic function that minimized the error
between the fit and the data points. The results of the fits are summarized in Table 5 in terms of
their midpoints (SNR in dB yielding a 50%-correct score) and slopes around the midpoint (in
percentage points per dB). For the CON noise conditions, the midpoints and slopes were similar
for UNP and EEQ1 signals for each of the HI listeners. In CON, averaged across listeners,
midpoints were -3.9 dB and -2.8 dB for UNP and EEQ1, respectively, and slopes were 5.2
percent per dB and 4.3 percent per dB, respectively. In the two non-speech derived fluctuating
background noises, the midpoints were lower for EEQ1 than for UNP for each of the HI
listeners. Averaged across listeners and for UNP and EEQ1, respectively, midpoints were -13.6
dB and -98.5 dB in SQW (a difference of 84.9 dB) and -7.3 dB and -15.9 dB in SAM (a
difference of 8.6 dB).2 For the speech-derived noise conditions, the midpoints and slopes were
similar for UNP and EEQ1 signals for each of the HI listeners. Averaged across listeners and for
UNP and EEQ1, respectively, midpoints were -9.0 dB and -11.0 dB in VC-1 (a difference of 2.0
dB), -5.8 dB and -4.5 dB in VC-2 (a difference of -1.3 dB), and -3.4 dB and -2.4 dB in VC-4 (a
difference of -1.0 dB). Slopes were similar for both types of processing, where they were ordered
as SQW < SAM < CON for the non-speech derived noises and VC-1 < VC-2 < VC-4 for the
speech-derived noises.
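A constrained logistic fit of this kind can be sketched as follows (an illustrative implementation assuming scipy's `curve_fit`; the thesis's actual fitting procedure may differ in its error criterion):

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_psychometric(snr_db, pct_correct, upper, lower=6.25):
    """Fit a logistic with a fixed floor (chance, 6.25%) and ceiling (BAS score).

    Returns the midpoint (SNR in dB yielding a 50%-correct score) and the
    slope of the fitted curve at that point (percentage points per dB).
    """
    def logistic(x, infl, k):
        return lower + (upper - lower) / (1.0 + np.exp(-k * (x - infl)))

    (infl, k), _ = curve_fit(logistic, snr_db, pct_correct,
                             p0=[np.median(snr_db), 0.5], maxfev=10000)
    # invert the fitted curve to find the SNR giving exactly 50%-correct
    mid50 = infl - np.log((upper - lower) / (50.0 - lower) - 1.0) / k
    # derivative of the logistic evaluated at the 50%-correct point
    slope50 = k * (50.0 - lower) * (upper - 50.0) / (upper - lower)
    return mid50, slope50
```

Note that with an asymmetric floor and ceiling, the 50%-correct midpoint does not coincide with the logistic's inflection point, which is why the fit is inverted explicitly.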
In Figure 14, MR in percentage points is plotted as a function of SNR for SQW, SAM,
VC-1, and VC-2. Here, MR was computed as the difference in sigmoid fits between fluctuating
and continuous noises. Note that this metric differs from NMR in that it is not normalized by the
difference between scores in quiet and in continuous noise (i.e., MR is the numerator in the
NMR quotient). Similarly to what was done in the NMR calculations, MR was computed with
CON as the continuous noise when SQW and SAM were the fluctuating noises and with VC-4 as
the continuous noise when VC-1 and VC-2 were the fluctuating noises. In SQW interference,
these plots indicate greater MR with EEQ1 than with UNP for all subjects across the SNRs. For
SAM interference, MR with EEQ1 generally exceeded that with UNP, although the increase was
generally smaller than in SQW interference. The trend was similar with VC-1 interference,
² It should be noted that the midpoint of HI-5 (-300.1 dB) was highly deviant relative to those of
the remaining three HI listeners (whose midpoints ranged from -24.4 to -37.3 dB). The SQW midpoint
average falls to -31.3 dB if HI-5 is eliminated, leading to a difference of 17.6 dB with UNP.
although the increase in MR with EEQ1 was smaller than that observed for SAM interference.
With VC-2, there was no clear trend showing greater MR for either EEQ1 or UNP. These
observations were generally consistent with the NMR findings discussed in the next paragraph.
NMR was calculated from the scores of Experiment 2 and is reported in Appendix IV-B
and summarized in Figure 15. Appendix IV-B provides the calculated NMR data for each HI
listener in each of the seven noise conditions for UNP and EEQ1 speech at each of the three
SNRs that were tested. In Figure 15A, NMR for EEQ1 is plotted as a function of NMR for UNP
for the individual HI listeners in SQW and SAM noise at the various SNRs, and in Figure 15B,
this same information is plotted for VC-1 and VC-2 noise. In Figure 15A, every NMR data point
lies above the 45-degree reference line, showing a strong tendency in HI listeners for larger
NMR with EEQ1 processing in non-speech derived noises at all SNRs tested. Additionally,
NMR was greater with SQW interference than with SAM interference. In SQW noise, NMR
averaged across subjects at the low, mid, and high SNRs was 0.431, 0.314, and -0.100,
respectively, for UNP and 0.765, 0.657, and 0.564, respectively, for EEQ1. These same numbers
in SAM noise were 0.284, 0.210, and 0.136, respectively, for UNP and 0.505, 0.501, and 0.467,
respectively, for EEQ1.
As shown in Figure 15B, there was less of a difference in NMR for UNP and EEQ1 for
the speech-derived noises than seen in Figure 15A for the non-speech derived noises. However,
more data points were above the reference line with VC-1 than with VC-2. Additionally, NMR
with both UNP and EEQ1 was greater with VC-1 interference than with VC-2 interference. In
VC-1 noise, NMR averaged across subjects for UNP was 0.366 at the low SNR, 0.386 at the mid
SNR, and 0.231 at the high SNR. These numbers for EEQ1 were 0.488 at the low SNR, 0.369 at
the mid SNR, and 0.515 at the high SNR. These same numbers in VC-2 noise were 0.223, 0.117,
and 0.019, respectively, for UNP and 0.177, 0.119, and 0.247, respectively, for EEQ1.
VI. DISCUSSION
This section discusses the results of the experiments in greater detail and analyzes
potential explanations for the outcomes. Section VI-A begins by examining the effects that the
EEQ processing has on the amplitude variability of the waveforms. In Section VI-B, the EEQ
effect on NMR is explored. Models are introduced in Section VI-C that attempt to predict the
performance benefit gained with the EEQ processing. In Section VI-D, EEQ1 is compared to
EEQ4 in an attempt to understand differences in performance. Finally, in Section VI-E, different
types of background interference are subjected to a glimpse analysis to explain the different
effects of the EEQ processing with the speech-derived versus non-speech-derived noises.
A. Effect of EEQ on Signal Amplitude
The waveform and amplitude distribution plots of the various S+N signals in Figures 7A,
7B, and 7C are now examined in more detail to assess how EEQ achieves its goal of equalizing
the energy of an S+N signal. The amplitude distribution plots depict RMS values with blue
vertical lines and amplitude medians with green vertical lines. Median amplitudes were plotted
because of their resilience to outliers as compared to the means. As shown in the figures, the
RMS values are constant between UNP and EEQ1 and between UNP and EEQ4 within each type
of interference. This is because the final step of the EEQ processing normalizes the output signal
at every sample point to have a long-term energy equal to that of the input signal. Note also that
the RMS value, which is determined by the levels of the speech and the noise, is equal in all
types of interference except BAS. This is because, in the figure, the SNR is -4 dB in all non-BAS
conditions. However, despite the RMS values being the same within a type of interference, the
median amplitudes are not the same. The median amplitude is greater with EEQ1 and EEQ4 than
with UNP. For the VCV token depicted in the figure, the differences in median amplitudes in dB
between EEQ1 and UNP are 2.10 for BAS, 0.42 for CON, 1.78 for SQW, 2.05 for SAM, 1.36 for
VC-1, 1.11 for VC-2, and 0.80 for VC-4. Note that except in CON and VC-4, these values are
smaller than the differences in mean amplitudes between EEQ1 and UNP, which are 4.98 for
BAS, 0.39 for CON, 4.25 for SQW, 2.82 for SAM, 2.97 for VC-1, 1.79 for VC-2, and 0.65 for
VC-4. The fact that the differences in mean amplitudes are greater than the differences in median
amplitudes highlights the fact that the UNP histograms contain tails of low-energy components
that are not present with EEQ. The rightwards shift of the amplitude distribution with the EEQ
processing occurs because the lower energy speech components which are present during the
gaps in the noise are amplified with the processing. The movement of the tail of the amplitude
distribution towards the center of the histogram corresponds to the reduction in amplitude
variation in the processed speech. The waveform and amplitude distributions of EEQ1 and EEQ4
look approximately the same when examining the broadband signals. In dB, the absolute values
of the differences in mean amplitudes between EEQ1 and EEQ4 are 0.21 for BAS, 0.27 for
CON, 0.32 for SQW, 0.33 for SAM, 0.18 for VC-1, 0.73 for VC-2, and 0.57 for VC-4. Further
discussion of the differences between EEQ1 and EEQ4 is found in Section VI-D below.
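The amplitude statistics compared above (RMS, mean-absolute, and median-absolute amplitude, each in dB) can be sketched as follows; this is an illustration of the metrics, not the plotting code used for Figures 7A-7C:

```python
import numpy as np

def amplitude_stats_db(x):
    """Return (RMS, mean-absolute, median-absolute) amplitude of x, in dB.

    The median is less sensitive than the mean to the low-energy tail of the
    amplitude distribution, which is why both are reported in the text.
    """
    a = np.abs(np.asarray(x, dtype=float))
    db = lambda v: 20.0 * np.log10(v)
    return db(np.sqrt(np.mean(a ** 2))), db(np.mean(a)), db(np.median(a))
```

A signal containing a long low-amplitude tail will show a mean-absolute value pulled well below its median, mirroring the UNP-versus-EEQ differences discussed above.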
B. Effect of EEQ on NMR
It was stated that the goal of this research is to increase NMR in HI listeners by
increasing performance in fluctuating interference while maintaining performance in the baseline
and continuous noise conditions. For HI listeners, EEQ1 processing yielded improved
performance in SQW and SAM noises (average scores increased by 7.2 and 1.6 percentage
points, respectively) but not for the speech-derived noises. For HI listeners, EEQ1 processing
resulted on average in 2.4 and 6.3 percentage point reductions in performance for BAS and CON
noises, respectively. As such, for HI listeners, NMR was greater in SQW and SAM noises with
EEQ1 compared to UNP. This was brought about both by an increase in performance in
fluctuating noise and a greater decrease in performance in CON noise than in BAS. For HI
listeners with EEQ4 processing, NMR also increased, but relative to UNP there was a larger
performance drop in all noise conditions except SQW, which showed a slight performance increase.
Meanwhile, for NH listeners, EEQ1 processing yielded a slight improvement in performance in
SQW noise (average score increased by 1.4 percentage points) but not in the remaining noises.
The overall effect on NMR was minimal both in EEQ1 and EEQ4.
The benefits of EEQ processing for HI listeners in SQW interference are evident through
the NMR results, which are shown in Figure 12 for Experiment 1 and in Figure 15 for
Experiment 2. Figure 16 re-plots the results from Figure 12 to show NMR as a function of the 5-
frequency PTA hearing loss of each of the nine HI listeners. In the figure, NMR decreases as a
function of PTA with UNP speech, which demonstrates the increasing effects of reduced
audibility in the SQW noise gaps with severity of hearing loss. However, with EEQ1 and EEQ4
processing, NMR is much more constant relative to PTA, which highlights the benefits provided
to HI listeners by making the speech component of the signal more audible in the SQW noise
gaps. Additionally, as shown in Figure 15, the increase in NMR with EEQ1 relative to UNP for
SQW and SAM holds at various SNRs: with UNP in these types of interference, NMR becomes
close to zero or even negative at the high SNRs, whereas with EEQ1, NMR is always positive.
C. Modelling Psychometric Functions
Two analyses were performed to explore the percent-correct performance shown in
Figure 13 for each speech type and noise as a function of SNR and to attempt to account for the
changes in performance, especially the performance boost in SQW noise.
1) Local Changes in SNR
The first analysis investigated whether the performance improvement in fluctuating
noises with EEQ processing can be explained solely by changes to the SNR. Specifically, for
low-to-moderate SNRs and fluctuating noise, EEQ tends to amplify the higher-SNR stimulus
segments present in the gaps when noise energy is low relative to the lower-SNR stimulus
segments when the noise energy is high. By doing this, EEQ changes the effective SNR of the
stimulus, and so it is possible that the observed increase in NMR, which depends on the
observer, might be explained simply by an increase in SNR, which is independent of the
observer. The first analysis addressed this question by estimating the change in SNR as a result
of EEQ processing and looking at scores as a function of this changed SNR.
Although EEQ processing is nonlinear, the scale factor is applied linearly to the speech
and to the noise. Thus, it is possible to determine its effect on the speech and noise components
of the signal separately for a particular stimulus at a particular input SNR, thus allowing
computation of the post-processing SNR for that input. The output SNR for a particular input
sample (consisting of specific speech and noise samples s(t) and n(t) and a known input SNR,
SNRUNP) may be calculated as follows:
(1) Compute the EEQ scale factor SC(t) applied to an input of x(t) = s(t) + n(t).
(2) The EEQ output signal is given as y(t) = x(t) * SC(t) = ys(t) + yn(t), where:
ys(t) = s(t) * SC(t) and
yn(t) = n(t) * SC(t).
(3) The post-processed SNR (in dB) for this combination of s(t), n(t), and SNRUNP is
SNREEQ = 10 log10( mean(ys²(t)) / mean(yn²(t)) ),
where mean(ys²(t)) and mean(yn²(t)) are the time averages of ys²(t) and yn²(t), respectively.
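Steps (2) and (3) can be sketched as follows. The per-sample EEQ scale factor SC(t) is produced by the processing itself, so this illustrative sketch takes it as a given array `scale`:

```python
import numpy as np

def post_processing_snr_db(s, n, scale):
    """Apply the per-sample scale factor to the speech and noise components
    separately (step 2), then form the output SNR in dB (step 3)."""
    ys = np.asarray(s) * scale   # scaled speech component
    yn = np.asarray(n) * scale   # scaled noise component
    return 10.0 * np.log10(np.mean(ys ** 2) / np.mean(yn ** 2))
```

With a constant (all-ones) scale factor, the output SNR reduces to the input SNR, which is a useful sanity check: only the time variation of SC(t) can change the effective SNR.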
Each of the 64 speech tokens used in the experiments was examined with six noise types
(CON, SQW, SAM, VC-1, VC-2, and VC-4) and values of SNRUNP (ranging from -40 to +40
dB). For every combination of speech token, noise type, and SNRUNP, 10 noise samples n(t) of
length equal to s(t) were randomly generated. The above procedure was used to calculate
SNREEQ1 as a function of SNRUNP and noise type averaged across each of 10 noise samples
combined with each of the 64 speech tokens. These averages were used to formulate a pre-to-
post-processing SNR mapping function SNREEQ1 = F(SNRUNP, noise type), shown in Figure 17,
where a diagonal reference line is included to show the case of SNRUNP = SNREEQ1. When
SNRUNP is negative, EEQ1 processing provides an SNR boost by raising the level of the signal
present in the dips in the noise. Interestingly, when SNRUNP is positive, EEQ1 processing actually
lowers the SNR because fluctuations in the signal, as opposed to the noise, drive the
equalization. The CON, SQW, SAM, VC-1, VC-2, and VC-4 curves cross the reference line at
SNRUNP values of -5.8, 4.8, 3.1, 2.5, -0.6, and -2.3 dB, respectively.
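The construction of the mapping function might be sketched as follows, where `make_noise` and `eeq_scale` are hypothetical stand-ins for the noise generator and the EEQ scale-factor computation (neither is re-implemented here):

```python
import numpy as np

def snr_mapping(tokens, make_noise, eeq_scale, snrs_db, n_draws=10):
    """Estimate SNREEQ = F(SNRUNP) for one noise type by averaging the
    post-processing SNR over speech tokens and random noise draws."""
    mapped = []
    for snr in snrs_db:
        vals = []
        for s in tokens:
            for _ in range(n_draws):
                n = make_noise(len(s))
                # rescale the noise so the input mixture sits at `snr` dB
                n = n * np.sqrt(np.mean(s ** 2) /
                                (np.mean(n ** 2) * 10.0 ** (snr / 10.0)))
                sc = eeq_scale(s + n)
                ys, yn = s * sc, n * sc
                vals.append(10.0 * np.log10(np.mean(ys ** 2) / np.mean(yn ** 2)))
        mapped.append(np.mean(vals))
    return np.array(mapped)
```

As a check on the sketch, an identity scale factor must return the input SNRs unchanged; any deviation of the true mapping from the diagonal in Figure 17 is therefore attributable to the EEQ gain variation alone.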
Using the pre-to-post-processing SNR mapping function, the psychometric functions in
Figure 13 were replotted. Scores for UNP were plotted versus SNRUNP and scores for EEQ1 were
plotted versus SNREEQ1. These plots can be seen in Figure 18A for the non-speech-derived noises
and in Figure 18B for the speech-derived noises. Had the performance boost with EEQ1
processing been able to be explained solely by the change in SNR, the score for UNP and EEQ1
in a given noise type should be the same at a given SNR. For the non-speech-derived noises, this
prediction fits well for the data of HI-2 and HI-7, especially in the SQW and SAM conditions,
and for HI-5 in the SAM condition. For the speech-derived noises, this prediction fits well for the
data of HI-2 and HI-7 in the VC-1, VC-2, and VC-4 conditions, for HI-4 in the VC-1 condition,
and for HI-5 in the VC-2 condition. Other than these cases, the modelling of performance based
on the SNR mapping function is less effective.
It should be noted that the local SNR analysis was computed using an entire VCV token,
which is dominated by the vowel components in both duration and level. It is assumed that for
the consonant portion alone, the SNREEQ1 vs SNRUNP curves cross the diagonal reference line at
more positive SNRs than are shown in Figure 17 for the whole VCV token. This is because the
low-energy consonant component is often the beneficiary of the short-term amplification
provided by the EEQ processing algorithm. As such, the EEQ processing does not have a
negative impact on local consonant SNR until more positive SNRs, at which point the speech is
dominant enough that a slight decrease in SNR would not hurt performance.
2) Crest Factor
The second analysis explored whether the performance improvements with the EEQ
processing can be explained by the changes in amplitude variation of an S+N signal. The crest
factor, defined as the peak amplitude of a waveform x divided by its RMS value, gives a sense of
the amplitude variation of the signal. In dB, the crest factor is given as:
Crest Factor = 20 log10( |x|peak / xrms ).
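An illustrative sketch of this metric follows; the optional `percentile` argument corresponds to the variant analysis (mentioned later in this section) that substitutes the 95th or 99th amplitude percentile for the peak:

```python
import numpy as np

def crest_factor_db(x, percentile=None):
    """Crest factor in dB: peak amplitude (or, optionally, a given amplitude
    percentile) over the RMS value of the signal."""
    a = np.abs(np.asarray(x, dtype=float))
    peak = a.max() if percentile is None else np.percentile(a, percentile)
    rms = np.sqrt(np.mean(a ** 2))
    return 20.0 * np.log10(peak / rms)
```

For reference, a pure sinusoid has a crest factor of 20 log10(√2) ≈ 3.01 dB, while speech, with its sparse high peaks, sits considerably higher.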
Because EEQ processing reduces amplitude variation, it is expected that the processing
would reduce the crest factor as the maximum value of the signal moves closer to the RMS value
of the signal. In a manner similar to what was done for the SNR analysis above, each of the 64
speech tokens used in the experiments was examined with various noise types (CON, SQW,
SAM, VC-1, VC-2, and VC-4), processing conditions (UNP and EEQ1), and values of SNRUNP
(ranging from -40 to +40 dB). For every combination of speech token, processing, and SNRUNP,
10 noise samples n(t) of length equal to s(t) were randomly generated. The average maximum
value and the average RMS value across these 10 S+N samples were then recorded. Using these
two average values, an average crest factor in dB was calculated for each of the 64 test syllables
at each noise type, processing type, and value of SNRUNP. Finally, the 64 crest factors (in dB)
calculated for each condition were averaged to formulate a function of Crest Factor = F(SNRUNP,
noise type, processing type). This function is shown in Figure 19A for the non-speech-derived
noises and in Figure 19B for the speech-derived noises, where it can be seen that the EEQ1 crest-
factor curves lie below the corresponding UNP curves of the same noise type. This effect is
consistent with the reduced amplitude variability (and therefore decrease in the ratio of its
maximum value to its RMS value) in the EEQ1 processed signal. Note that the crest factor for
speech in the speech-derived noises is more variable than that in the non-speech-derived noises,
as shown by the jagged curves across the SNRs in Figure 19B as compared to Figure 19A. This
behavior comes from the greater variability in the speech-derived noises in general.
In a manner similar to what was done with the SNR analysis described above, the
psychometric functions in Figure 13 were plotted on a crest-factor scale. Scores for UNP were
plotted versus the pre-processing crest factor and scores for EEQ1 were plotted versus the post-
processing crest factor. These plots can be seen in Figure 20. The percent-correct curves still do
not lie on top of each other, indicating that crest factor by itself also cannot be used to explain the
performance benefits with the EEQ1 processing. In fact, because pure noise has a lower crest
factor than pure speech (as seen by the crest factor curves for CON being lower at negative
SNRs than at positive SNRs), one would expect processing whose performance benefits can be
explained solely by crest factor changes to result in signals that have higher crest factors than
UNP signals. However, as stated above, EEQ1 processing lowers the crest factor and thus cannot
be used to explain the psychometric data. This analysis was also performed by using different
percentiles of signal amplitude in the numerator of the crest factor formula (for example, by
using the 95th or 99th percentile rather than the maximum value), but this mapping did not fit the
data well either.
D. EEQ1 vs EEQ4 Processing
EEQ1 proved to be more effective than EEQ4 processing for HI listeners; with the
average HI data, the mean difference between EEQ1 and EEQ4 scores across the seven noises
was 4.0 percentage points. It had been hypothesized that processing frequency bands
independently could provide benefit particularly with speech-derived noises for HI listeners with
non-uniform losses. However, by applying different scale factors to different frequency bands,
such independent-band processing may have interfered with the spectral shape, resulting in
decreased effectiveness. To see if this might be the case, outputs of the three processing schemes
were examined in each of the four bands used for EEQ4.
Figure 21 compares RMS values and median amplitudes for UNP, EEQ1, and EEQ4
within each of the four bands used in the EEQ4 processing as a function of SNR. As seen in
Figures 21A, 21B, and 21C, the RMS values for the three different kinds of processing do not
differ much on a band-by-band basis. This is because the EEQ processing normalizes the RMS
value of the output signal to equal that of the input signal. However, an obvious difference
can be seen between the median values of the UNP and both EEQ processing schemes, as shown
in Figures 21D, 21E, and 21F. For UNP, the median amplitudes decrease approximately linearly
as SNR increases, whereas the slopes of the EEQ1 and EEQ4 functions level off at around
0 dB. This is consistent with the EEQ processing amplifying the low-energy speech components.
The shapes of these functions are generally similar for EEQ1 and EEQ4. However, at the lower
SNRs, bands 1 and 4, and to a lesser extent bands 2 and 3, show greater overlap with EEQ4 than
with EEQ1. At the higher SNRs, bands 1 and 4 and bands 2 and 3 show greater separation for
EEQ4 compared to EEQ1. It is possible that these differences in spectral shape led to the overall
4.0 percentage point decrease in performance with EEQ4 relative to EEQ1. It is possible that
other metrics besides RMS values and median amplitudes might reveal a larger difference in
spectral shape between the two processing schemes. It is also possible that the additional
processing involved in the multi-band scheme introduced additional distortions to the signal,
which led to the observed decreases in performance with EEQ4 compared to EEQ1.
E. Glimpse Analysis of Vocoded and Non-Vocoded Noises
The EEQ processing scheme proved to be more effective, both in terms of improving
scores and NMR, with the non-speech-derived noises compared to the speech-derived noises. An
analysis was conducted on the differences in occurrences of noise glimpses between these two
categories of noises to explore why this may be the case. This analysis was similar to one done
by Cooke (2006), who looked at glimpse percentages and counts for a number of competing
background speakers. Cooke’s analysis defined a glimpse to be a connected region of the
spectrotemporal excitation pattern in which the energy of the speech token exceeded that of the
background by at least 3 dB in each time-frequency element. Unlike Cooke’s analysis, the
current analysis looked at noise alone and examined its envelope as opposed to its
spectrotemporal pattern. The analysis used here defines a noise glimpse to be a section of the
noise where the envelope drops more than 3 dB below the RMS value of the noise for at least 10
ms. Gaps present at the immediate start or end of the noise were not counted because their
durations might be truncated. The threshold of 3 dB below the RMS
value was chosen to prevent steady-state noise, which has many small fluctuations in the vicinity
of its RMS value, from being classified as having a significant portion of the signal spent in a
glimpse. The minimum duration of 10 ms was chosen based on a study by He et al. (1999),
which showed that the threshold for detecting a gap in a longer duration noise (400 ms) is
roughly 5 ms, independent of the location of the gap within the noise or whether the gap location
is randomly selected on each trial. Therefore, the analysis described here used a minimum
duration of 10 ms, twice the threshold at which gaps can be reliably detected.
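Under this definition, glimpse detection can be sketched as follows (an illustrative implementation, with `envelope` a precomputed envelope array, `rms` the noise RMS, and `fs` the sampling rate):

```python
import numpy as np

def find_noise_glimpses(envelope, rms, fs, thresh_db=3.0, min_dur_s=0.010):
    """Return (start, end) sample-index pairs for glimpses: runs where the
    envelope stays more than thresh_db below the noise RMS for at least
    min_dur_s. Runs touching the start or end of the signal are discarded."""
    threshold = rms * 10.0 ** (-thresh_db / 20.0)
    below = np.asarray(envelope) < threshold
    # pad with False so every run has a detectable rising and falling edge
    padded = np.concatenate(([False], below, [False]))
    d = np.diff(padded.astype(int))
    starts = np.flatnonzero(d == 1)    # run begins at this sample
    ends = np.flatnonzero(d == -1)     # run ends just before this sample
    min_len = int(round(min_dur_s * fs))
    return [(a, b) for a, b in zip(starts, ends)
            if b - a >= min_len and a > 0 and b < len(below) - 2]
```

Runs shorter than 10 ms, and runs abutting either edge of the signal, fall out naturally from the final filter, matching the exclusions described above.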
Figure 22 depicts the waveforms and envelopes of VC-1 (Figure 22A), VC-2 (Figure
22B), and VC-4 (Figure 22C) noises. The envelope was computed from the magnitude of the
signal's analytic (Hilbert) representation in 16 logarithmically spaced bands in the range
of 80 Hz to 8020 Hz, with each band envelope low-pass filtered at a 64-Hz cutoff. The red lines represent the RMS values, and the
green lines represent 3 dB below the RMS values. An interval is considered to be a noise gap if
the envelope (shown in blue) drops below the green threshold line for at least 10 ms. As shown
in the figures, as the number of speakers increases in the speech-derived noises, the envelope
hovers closer to the RMS value.
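A broadband simplification of this envelope computation can be sketched as follows (this sketch omits the 16-band filterbank and applies a single Hilbert envelope plus 64-Hz low-pass to the full-band signal):

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def smooth_envelope(x, fs, cutoff_hz=64.0):
    """Magnitude of the analytic (Hilbert) signal, low-pass filtered at
    cutoff_hz; a broadband stand-in for the band-by-band computation."""
    env = np.abs(hilbert(x))
    b, a = butter(4, cutoff_hz / (fs / 2.0), btype="low")
    return filtfilt(b, a, env)   # zero-phase filtering avoids envelope lag
```

For an amplitude-modulated tone, the recovered envelope tracks the modulator closely away from the signal edges, which is sufficient for the glimpse thresholding described above.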
The analysis considered six of the noises used in the experiment (eliminating only the
BAS noise). Additionally, it considered speech-vocoded modulated noises derived from more
than four speaker samples: VC-8, VC-16, VC-32, VC-64, VC-128, VC-256, and VC-512 were also
examined. Five hundred samples of each of the noise types were generated to have a duration
equal to an arbitrarily chosen VCV token of 1.29 seconds. Note that these additional noise types
were generated from multiple samples of the same eight speakers as were used to generate VC-1,
VC-2, and VC-4. Half of these samples came from combinations of female speakers and half
from combinations of male speakers. For each noise sample, the occurrences of the glimpses
using the above definition were determined. This information was then used to calculate the
percentage that the glimpses constitute of the overall noise duration, the number of glimpses per
second, and the average length of the glimpses. For the first two quantities, the averages over the
500 noise samples generated are shown in Figures 23 and 24: Figure 23 depicts the percentage of
glimpses information, and Figure 24 depicts the measured glimpses-per-second information.
Cooke’s paper contains plots of these same quantities which are similar in shape to the results
obtained with this study’s slightly different metric of a noise glimpse. Figure 25 shows a
histogram of the final quantity, the average glimpse duration in each of the 500 noise samples.
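The three summary quantities can be computed from a list of glimpse intervals as follows (an illustrative sketch operating on (start, end) sample-index pairs):

```python
def glimpse_stats(glimpses, n_samples, fs):
    """From (start, end) glimpse index pairs over a noise of n_samples at
    rate fs, return: fraction of the duration spent in glimpses, glimpse
    rate (per second), and mean glimpse duration (seconds)."""
    total = sum(b - a for a, b in glimpses)
    frac = total / n_samples
    rate = len(glimpses) / (n_samples / fs)
    mean_dur = total / (len(glimpses) * fs) if glimpses else 0.0
    return frac, rate, mean_dur
```

Averaging these per-sample statistics over the 500 generated noise samples yields the quantities plotted in Figures 23-25.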
As shown in Figure 23, the average fraction of time spent in a glimpse decreased as the
number of speakers in the vocoded modulated noise increased. As the number of speakers
increased, this fraction approached zero, the value in CON noise. SQW and SAM had values of
0.439 and 0.419, respectively. Note that had the opening and closing gaps been counted and had
the RMS value been used as a threshold instead of 3 dB below the RMS value, these numbers
would have been closer to 0.5 by nature of the symmetry of the noises. VC-1, VC-2, and VC-4,
the three speech-derived noises used in the experiment, had fractions of 0.423, 0.336, and 0.257,
respectively. VC-1 was therefore very similar to SQW and SAM in terms of the fraction of time
spent in a gap, whereas even VC-4 contained more gaps than CON under the current metric. The variability
of this measure was greater for the speech-derived noises than the non-speech-derived noises and
decreased as the number of speakers in the speech-derived noises increased. Standard deviations
within the 500 samples were 0.120 for VC-1, 0.098 for VC-2, and 0.081 for VC-4, whereas these
values were 0.000 for CON, 0.015 for SQW, and 0.014 for SAM.
As shown in Figure 24, the average number of glimpses per second increased from 1
speaker to 2 speakers and decreased thereafter as the number of speakers in the vocoded
noises increased, approaching zero, the value in CON noise. SQW and SAM had
glimpse-per-second values of 9.20 and 9.22, respectively, and VC-1, VC-2, and VC-4 had
values of 2.57, 3.46, and 3.43, respectively. Thus, all of the speech-derived
noises had fewer glimpses per second than did the fluctuating non-speech-derived noises. The
variability in the number of glimpses per second was greater for the speech-derived noises than
for the non-speech-derived noises, and the variability first increased and then decreased as the
number of speakers in the speech-derived noises increased. VC-1, VC-2, and VC-4 had standard
deviations of 1.14, 1.33, and 1.34, respectively, between the 500 samples, whereas these values
were 0.000, 0.318, and 0.319 for CON, SQW, and SAM, respectively.
To generate Figure 25, the 500 average glimpse durations (corresponding to the average
glimpse duration in each of the 500 noise samples generated for a given noise) were placed into
buckets of length 10 ms. As shown in the figure, the average glimpse duration between samples
of the same type of speech-derived noise varies considerably, especially for the noises
tested in the experiments (VC-1, VC-2, and VC-4). The histograms for these noises span many
buckets. Meanwhile, for the non-speech-derived noises (CON, SQW, and SAM), there is very
little variability in the average glimpse duration between noise samples. The histograms for these
noises occupy a single bucket: for CON, the bin from 0 to 0.01 seconds, and for SQW and SAM,
the bin from 0.04 to 0.05 seconds. Nearly every sample of VC-1, VC-2, and VC-4 falls
into a bucket of greater duration than that of SQW and SAM.
Figures 23, 24, and 25 offer insight into why the EEQ processing performed better with
the non-speech-derived noises than with the speech-derived noises. Although VC-1, SQW, and
SAM have similar amounts of total time spent in glimpses, these times are distributed over a
greater number of glimpses in SQW and SAM. With VC-2 and VC-4, both the total time spent in
glimpses and the total number of glimpses are smaller than with SQW and SAM. Additionally,
the variability is much greater in the speech-derived noises than in the non-speech-derived noises.
The EEQ processing performs best with short, frequent glimpses, as these give it the most
opportunities to amplify speech exposed during the gaps in the noise by normalizing with the ratio
of the long- and short-term energies. With VC-1, there are fewer, longer glimpses, so the
listener receives only a few extended samples of the speech stimulus rather than brief exposures throughout. During
the longer glimpses, the running long-term average would be reduced, leading to smaller changes
in gain in these sections. With fewer and longer glimpses (and therefore fewer and longer non-
glimpses as well), it is also possible that the entirety of the low-energy consonant portion of the
speech stimuli would be covered by noise. Thus, the EEQ processing may have less of an
opportunity to operate effectively on the portion of the speech where HI listeners require the
most amplification and could instead end up amplifying noise during these parts. Finally, the
predictability of the non-speech-derived noises (as evidenced by the low standard deviation in
percentage glimpses and number of glimpses) may make it easier for HI listeners to benefit from
EEQ processing with those noises. With the speech-derived noises, the standard deviations are
high, and each sample of noise is quite different from the others. This unpredictable pattern
makes it harder for HI listeners to benefit from EEQ processing in the speech-derived noises.
To examine the role of glimpsing in performance in the different types of noise, Figure
26 plots the mean NH and HI scores for UNP and EEQ1 as a function of the average fraction of
the noise spent in a glimpse. For both speech types and groups of listeners, scores increased with
the fraction of glimpses once this measure exceeded approximately 0.25. Below
this fraction, scores were roughly constant at the level observed in CON. For both NH and HI
listeners, the UNP curve lies above the EEQ1 curve for the smaller fractions of glimpses.
However, as the fraction of glimpses increases, the difference between the curves gets smaller
and even reverses at the highest fractions of glimpses. These trends are consistent with the
concept that the EEQ processing is most effective when there are many glimpses available
throughout the duration of the noise signal.
Another explanation for the smaller NMR improvement produced by EEQ processing in
speech-derived noises for HI listeners is that many HI listeners begin with a greater
NMR in VC-1 and VC-2 than in SQW and SAM. As shown in Figure 12, the HI listeners
with the most severe hearing losses (HI-6, HI-7, HI-8, and HI-9) have almost no NMR in the
UNP condition in SQW but do have a non-zero NMR in VC-1. In fact, in the UNP condition, the
NMR is much more constant in VC-1 interference as the listener’s PTA increases than is the case
in SQW. For UNP, this implies that in VC-1 noise, HI listeners were able to recover more of the
performance lost in VC-4 noise than they were able to recover, in SQW, of the performance lost
in CON. Thus, there is less room for NMR improvement with EEQ processing in the speech-
derived noises, and it is less surprising that there is not as much of an increase compared to the
non-speech-derived noises.
VII. CONCLUSIONS
- EEQ processing was effective in improving NMR for HI listeners in SQW and SAM
interference. The EEQ effect on NMR was less apparent in VC-1 and VC-2 interference.
These observations held across various SNRs.
- NMR improvements for EEQ resulted primarily from increased performance in
fluctuating noise, especially in SQW interference, although there was also a smaller
decrease in performance in BAS and CON for EEQ.
- EEQ processing is more effective with regular and frequent gaps in the fluctuating noises,
as in SQW and SAM. VC-1 and VC-2 have gaps that are more variable in length,
which limits the effectiveness of the EEQ processing in using the short and long
windows to normalize energy.
- EEQ1 processing was more effective than EEQ4 processing. EEQ4 may have interfered
with the spectral shape, resulting in decreased effectiveness.
- NMR decreased with increasing hearing loss for UNP speech but was roughly
independent of degree of loss for EEQ speech. This resulted in a large increase in NMR
for HI listeners with the most severe hearing losses.
- Although EEQ processing increases the local SNR and decreases the amplitude variation
of a noisy speech signal, neither of these effects provided a complete explanation of
behavioral performance with EEQ signals over a wide range of SNRs.
VIII. FUTURE WORK
This study arose out of the desire to study and understand the factors that influence NMR
in HI listeners and to explore a signal processing technique to improve NMR. Future work
relates to these long-term goals. The work reported here evaluated the effectiveness of the EEQ
processing scheme in a consonant identification task. Future studies will explore how the EEQ
processing scheme fares in other speech types, specifically vowels and sentences. A model to
predict the differences in performance exhibited by HI listeners with UNP and EEQ processing
in the different types of background noise, building on the SNR and crest factor analyses
described in this thesis, will be further investigated. Also, the EEQ processing scheme will
continue to be analyzed for the potential for improvements in a broader range of fluctuating
noises, most notably in noises with irregular gap lengths. Additionally, the factors causing NMR
to be greater in the speech-derived noises than in the non-speech derived noises with UNP will
be investigated. Ways to limit the spectral alteration introduced by the EEQ4 processing will
also be explored, potentially by restricting the scale factors applied to adjacent
bands to lie within a fixed range of one another. Additional signal processing techniques will also
be examined for the improvement of NMR in HI listeners. These techniques will perhaps make
use of what was learned from the EEQ processing, and they could together lead to an increased
understanding of the mechanisms contributing to masking release in both NH and HI listeners.
References
Cooke, M. (2006). “A glimpsing model of speech perception in noise,” J. Acoust. Soc. Am. 119,
1562-1573.
Desloge, J. G., Reed, C. M., Braida, L. D., Perez, Z. D., and D’Aquila, L. A. (2016). “Technique
to improve speech intelligibility in fluctuating background noise by normalizing signal
energy over time.” Manuscript in preparation.
Desloge, J. G., Reed, C. M., Braida, L. D., Perez, Z. D., and Delhorne, L. A. (2010). “Speech
reception by listeners with real and simulated hearing impairment: Effects of continuous
and interrupted noise,” J. Acoust. Soc. Am. 128, 342-359.
Dillon, H. (2001). Hearing Aids (Thieme, New York), pp. 239-247.
Festen, J. M., and Plomp, R. (1990). “Effects of fluctuating noise and interfering speech on the
speech reception threshold for impaired and normal hearing,” J. Acoust. Soc. Am. 88,
1725-1736.
He, N., Horwitz, A. R., Dubno, J. R., and Mills, J. H. (1999). “Psychometric functions for gap
detection in noise measured from young and aged subjects,” J. Acoust. Soc. Am. 106, 966-
978.
Léger, A. C., Reed, C. M., Desloge, J. G., Swaminathan, J., and Braida, L. D. (2015). “Consonant
identification in noise using Hilbert-transform temporal fine-structure speech and
recovered-envelope speech for listeners with normal and impaired hearing,” J. Acoust. Soc.
Am. 136, 859-866.
Phatak, S., and Grant, K. W. (2014). “Phoneme recognition in vocoded maskers by normal-
hearing and aided hearing-impaired listeners,” J. Acoust. Soc. Am. 136, 859-866.
Reed, C. M., Desloge, J. G., Braida, L. D., Léger, A. C., and Perez, Z. D. (2016). “Level
variations in speech: Effect on masking release in hearing-impaired listeners.” Under
review for J. Acoust. Soc. Am.
Shannon, R. V., Jensvold, A., Padilla, M., Robert, M. E., and Wang, X. (1999). "Consonant
recordings for speech testing," J. Acoust. Soc. Am. 106, L71-74.
Studebaker, G. A. (1985). “A ‘Rationalized’ Arcsine Transform,” J. Speech Lang. Hear. Res. 28,
455-462.
Figure 1: The magnitude and phase of the square root of the ratio of the frequency response of
AVGlong to the frequency response of AVGshort. AVGshort and AVGlong are the moving average
operators used by the EEQ processing in the computation of the running short- and long-term
energies, respectively, of the signal. They are single-pole IIR low-pass filters applied to the
instantaneous signal energy with time constants of 5 ms and 200 ms for the short and long
averages, respectively.
Figure 2: Block diagrams of the EEQ processing algorithm used in this implementation. Figure
2A outlines the steps of the EEQ1 processing. Eshort[n] and Elong[n] are computed with single pole
IIR filters applied to the instantaneous signal energy with time constants of 5 ms and 200 ms,
respectively, and the scale factor SC[n] is restricted to lie in the range of 0 dB to 20 dB. Figure
2B shows the EEQ1 processing applied independently in each of the four frequency bands to
yield EEQ4.
Figure 2A:
Figure 2B:
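As a rough illustration of the per-sample computation described in the Figure 2A caption, the following Python sketch applies the EEQ1 gain. It is not the implementation used in the study: the sampling rate, the conversion of the 5 ms and 200 ms time constants to one-pole smoothing coefficients, and all names are assumptions.

```python
import numpy as np

def eeq1(x, fs, tc_short=0.005, tc_long=0.2, max_gain_db=20.0):
    """Sketch of EEQ1: scale each sample by sqrt(Elong/Eshort), with the
    scale factor restricted to the range 0 dB to 20 dB.

    Eshort[n] and Elong[n] are running energies from single-pole IIR
    low-pass filters applied to the instantaneous energy x[n]^2, with
    time constants tc_short (5 ms) and tc_long (200 ms)."""
    a_s = np.exp(-1.0 / (tc_short * fs))   # short-window pole
    a_l = np.exp(-1.0 / (tc_long * fs))    # long-window pole
    e_s = e_l = 0.0
    max_sc = 10.0 ** (max_gain_db / 20.0)  # 20 dB upper limit
    y = np.empty_like(x, dtype=float)
    for n, xn in enumerate(x):
        p = xn * xn                        # instantaneous energy
        e_s = a_s * e_s + (1.0 - a_s) * p  # short-term energy
        e_l = a_l * e_l + (1.0 - a_l) * p  # long-term energy
        sc = np.sqrt(e_l / e_s) if e_s > 0 else 1.0
        sc = min(max(sc, 1.0), max_sc)     # restrict SC to 0-20 dB
        y[n] = sc * xn
    return y
```

Running EEQ1 independently in each of four frequency bands and summing the outputs would correspond to the EEQ4 structure of Figure 2B.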
Table 1: Test ear, age, and 5-frequency pure-tone average (PTA) for each HI listener. The final
two columns provide the comfortable speech presentation levels chosen by each listener with
NAL amplification and the SNR used in testing all speech conditions. The SNR was chosen to
yield 50%-correct in continuous noise.
Listener   Test Ear   Age   5-Frequency PTA (dB HL)   Speech Level (dB SPL)   SNR (dB)
HI-1 R 33 27 68 -8
HI-2 R 58 28 65 -2
HI-3 L 21 30 65 -6
HI-4 L 23 36 65 -2
HI-5 L 20 45 65 -4
HI-6 L 69 53 70 0
HI-7 L 59 56 68 0
HI-8 R 26 58 70 -2
HI-9 L 21 75 71 -2
Figure 3: Pure-tone detection thresholds in dB SPL measured for 500 ms tones in a three-
alternative forced-choice adaptive procedure. A green line representing the average thresholds of
the test ears of the NH listeners is shown in the upper left box, and the thresholds for the HI
listeners are shown in the remaining boxes. For the HI listeners, thresholds are shown for the
right ear (red circles) and left ear (blue x’s), with the points of the test ear connected using a solid
line and the points of the non-test ear connected using a dashed line.
Figure 4: Waveforms of the 7 background interference conditions used in testing. To make it
easier to see the effects of the modulation, the same underlying noise sample was used to
generate all four non-speech derived noises in this figure.
Figure 5: The spectrogram of the randomly generated speech-shaped noise used to create the
BAS, CON, SQW, and SAM interference conditions. The speech-shaped noise had a total
duration of 30 seconds, and a random segment of the speech-shaped noise (of the desired
interference duration) was chosen every time a sample of BAS, CON, SQW, or SAM was
generated.
Figure 6: Waveform of the VCV token ‘APA’ for UNP speech in the BAS noise condition. The
high-energy vowel components and the low energy consonant component are annotated at the
top of the figure.
Figure 7: Waveforms and amplitude distribution plots for the VCV token ‘APA’ presented with
the seven different kinds of background interference (BAS, CON, SQW, SAM, VC-1, VC-2, and
VC-4) with UNP (Figure 7A), EEQ1 (Figure 7B), and EEQ4 (Figure 7C) processing. The speech
is presented at a level of 65 dB SPL, and the noise (other than BAS) is presented at a level of 69
dB SPL, leading to an SNR of -4 dB. The blue line in the amplitude distribution plot represents
the RMS value. The green line is the median of the amplitude distribution.
Figure 7A
Figure 7B
Figure 7C
Figure 8: Waveforms and amplitude distribution plots for the VCV token ‘APA’ presented with
CON (Figure 8A) and SQW (Figure 8B) interference with EEQ4 processing. Each of the four
rows in the figure corresponds to a different logarithmically equal band in the range of 80 Hz to
8020 Hz. The speech is presented at a level of 65 dB SPL, and the noise is presented at a level of
69 dB SPL, leading to an SNR of -4 dB. The blue line in the amplitude distribution plot
represents the RMS value. The green line is the median of the amplitude distribution.
Figure 8A:
Figure 8B:
Table 2: The SNRs employed in Experiment 2. The Mid SNR was equivalent to the one used in
Experiment 1. The Low SNR was 4 dB lower than the Mid SNR, and the High SNR was 4 dB
higher than the Mid SNR.
Listener Low SNR Mid SNR High SNR
HI-2 -6 -2 2
HI-4 -6 -2 2
HI-5 -8 -4 0
HI-7 -4 0 4
Figure 9: The mean percent correct scores achieved by the four NH (green bars) and nine HI
listeners (gold bars) in Experiment 1. The scores were measured with UNP (top panel), EEQ1
(middle panel), and EEQ4 (bottom panel) processing in BAS, CON, SQW, SAM, VC-1, VC-2,
and VC-4 background interference conditions. The error bars associated with each bar are +/- 1
standard deviation.
Figure 10: The mean percent correct scores achieved by the four NH (upper panel) and nine HI
listeners (lower panel) in Experiment 1. The scores were measured with UNP (purple bars),
EEQ1 (orange bars), and EEQ4 (green bars) processing in BAS, CON, SQW, SAM, VC-1, VC-
2, and VC-4 background interference conditions.
Figure 11: The mean percent correct scores achieved by the four NH listeners (upper left panel)
and the individual percent correct scores achieved by the nine HI listeners (other nine panels) in
Experiment 1. The scores were measured with UNP (purple bars), EEQ1 (orange bars), and
EEQ4 (green bars) processing in BAS, CON, SQW, SAM, VC-1, VC-2, and VC-4 background
interference conditions. The error bars associated with each bar are +/- 1 standard deviation.
Table 3: Analysis of Variance results conducted on the rationalized arcsine units for the percent
correct scores of each of the four NH and nine HI listeners. The F-statistic (along with the
degrees of freedom) and probability are shown for each listener by speech type, noise type, and
speech by noise interaction. The probabilities below the 0.01 significance level are bolded and
annotated with an asterisk.
Speech Type Noise Type Speech x Noise
F(2, 63) p F(6, 63) p F(12, 63) p
NH-1 2.54 0.0866 228.22 < 0.0001* 1.24 0.2745
NH-2 26.95 < 0.0001* 269.21 < 0.0001* 2.87 0.0033*
NH-3 17.72 < 0.0001* 281.37 < 0.0001* 1.32 0.2311
NH-4 12.08 < 0.0001* 273.64 < 0.0001* 1.85 0.0592
Speech Type Noise Type Speech x Noise
F(2, 63) p F(6, 63) p F(12, 63) p
HI-1 7.92 0.0009* 237.95 < 0.0001* 2.68 0.0056*
HI-2 7.20 0.0015* 139.51 < 0.0001* 1.57 0.1244
HI-3 114.04 < 0.0001* 325.82 < 0.0001* 0.89 0.565
HI-4 15.71 < 0.0001* 93.95 < 0.0001* 5.22 < 0.0001*
HI-5 21.16 < 0.0001* 162.28 < 0.0001* 2.65 0.0061*
HI-6 30.59 < 0.0001* 163.58 < 0.0001* 3.17 0.0014*
HI-7 7.97 0.0008* 64.67 < 0.0001* 2.26 0.0188
HI-8 17.84 < 0.0001* 134.79 < 0.0001* 4.16 < 0.0001*
HI-9 5.69 0.0054* 94.82 < 0.0001* 3.44 0.0007*
Table 4: Tukey-Kramer post-hoc multiple comparison results among the four NH and nine HI
listeners using a significance level of 0.05. The ordering is shown for each listener by speech
type (1 for UNP, 2 for EEQ1, and 3 for EEQ4), noise type (1 for BAS, 2 for CON, 3 for SQW, 4
for SAM, 5 for VC-1, 6 for VC-2, and 7 for VC-4), and speech by noise interaction. When two
conditions are not significantly different, they are listed in decreasing order of mean value
observed. Note that there were some cases where conditions X and Y and conditions Y and Z
were not significantly different, but conditions X and Z were significantly different. In these
cases, Y was listed in the table as not significantly different from whichever of X and Z was
closer to it in mean value.
Speech Type Noise Type
NH-1 1 = 2 = 3 1 > 3 > 4 = 5 > 6 > 7 = 2
NH-2 1 = 2 > 3 1 > 3 > 4 > 5 > 6 > 7 = 2
NH-3 1 > 2 = 3 1 > 3 > 4 > 5 > 6 > 7 = 2
NH-4 1 > 2 = 3 1 > 3 > 4 > 5 > 6 > 7 = 2
Speech Type Noise Type
HI-1 1 > 2 = 3 1 > 3 > 4 > 5 > 6 > 2 = 7
HI-2 1 = 2 > 3 1 > 3 > 4 = 5 > 6 = 7 = 2
HI-3 1 > 2 > 3 1 > 3 > 4 > 5 > 2 > 6 > 7
HI-4 1 = 2 > 3 1 > 3 > 4 = 5 > 6 = 2 = 7
HI-5 1 > 2 = 3 1 > 3 > 4 = 5 > 6 = 2 = 7
HI-6 2 > 1 > 3 1 > 3 > 4 = 5 = 2 > 6 = 7
HI-7 1 > 2 = 3 1 > 3 = 4 = 5 > 6 = 2 = 7
HI-8 1 = 2 > 3 1 > 3 > 4 = 5 > 2 = 6 = 7
HI-9 1 = 2 > 3 1 > 3 > 4 = 2 = 5 > 6 = 7
Figure 12: The mean NMR achieved by the NH listeners (first group of bars) and the individual
NMR for each of the HI listeners (remaining nine groups of bars) with UNP (purple bars), EEQ1
(orange bars), and EEQ4 (green bars) processing. The NMR for the SQW (upper left panel) and
SAM (lower left panel) noises was calculated relative to the CON condition, whereas the NMR
for the VC-1 (upper right panel) and VC-2 (lower right panel) noises was calculated relative to
the VC-4 noise condition.
Figure 13: Percent-correct scores plotted as a function of SNR in dB for UNP (circles) and
EEQ1 (asterisks) speech. The data for the non-speech derived noises (SQW noise in blue, SAM
noise in green, and CON noise in red) are plotted in Figure 13A, and the data for the speech-
derived noises (VC-1 noise in blue, VC-2 noise in green, and VC-4 noise in red) are plotted in
Figure 13B. Sigmoidal fits to each of these functions are shown with data points connected by
continuous lines for UNP conditions and dashed lines for EEQ1 conditions.
Figure 13A:
Figure 13B:
Table 5: Midpoints (M) of SNR in dB for 50%-correct identification and slopes (S) around the
midpoints in percentage points per dB of sigmoidal fits to the data of the four individual HI
listeners shown in Figure 13. M and S are given for UNP and EEQ1 speech in six noise
backgrounds: CON, SQW, SAM, VC-1, VC-2, and VC-4. Means across listeners are provided in
the final row.
UNP SPEECH EEQ1 SPEECH
CON SQW SAM CON SQW SAM
M S M S M S M S M S M S
HI-2 -3.8 5.1 -18.7 1.7 -7.8 3.5 -2.2 5.0 -32.1 1.1 -8.6 3.5
HI-4 -5.3 4.2 -13.9 2.1 -8.7 3.3 -3.9 4.4 -37.3 1.9 -24.4 1.7
HI-5 -4.3 7.2 -15.8 1.5 -8.2 3.7 -2.7 4.7 -300.1 0.2 -24.2 1.0
HI-7 -2.1 4.1 -6.2 2.1 -4.3 3.4 -2.4 3.3 -24.4 1.2 -6.2 3.5
Means -3.9 5.2 -13.6 1.9 -7.3 3.5 -2.8 4.3 -98.5 1.1 -15.9 2.4
UNP SPEECH EEQ1 SPEECH
VC-1 VC-2 VC-4 VC-1 VC-2 VC-4
M S M S M S M S M S M S
HI-2 -8.0 3.4 -6.3 2.8 -3.4 5.2 -8.9 3.0 -3.9 3.8 -2.1 4.0
HI-4 -11.8 2.7 -7.7 3.3 -5.8 3.2 -13.1 2.6 -6.9 3.1 -3.3 4.2
HI-5 -11.1 3.1 -5.6 3.7 -3.6 5.7 -14.5 1.3 -5.0 3.7 -2.3 4.9
HI-7 -5.1 3.2 -3.7 2.7 -0.9 5.0 -7.4 3.1 -2.4 3.9 -1.8 3.9
Means -9.0 3.1 -5.8 3.1 -3.4 4.8 -11.0 2.5 -4.5 3.6 -2.4 4.3
Figure 14: MR in percentage-points for UNP (solid lines) and EEQ1 (dotted lines) speech
plotted as a function of SNR in dB. MR is computed as a subtraction of the fluctuating and
continuous sigmoid fit functions in Figure 13. MR for the non-speech derived fluctuating noises
(SQW noise in blue and SAM noise in green) is computed with CON as the continuous noise and
is plotted in Figure 14A, and MR for the speech-derived fluctuating noises (VC-1 noise in blue
and VC-2 noise in green) is computed with VC-4 as the continuous noise and is plotted in Figure
14B.
Figure 14A:
Figure 14B:
Figure 15: Normalized masking release (NMR) for EEQ1 plotted as a function of NMR for
UNP for the four HI listeners. The data for the non-speech derived noises (SQW with filled
symbols and SAM with unfilled symbols) are plotted in Figure 15A, and the data for the speech-
derived noises (VC-1 with filled symbols and VC-2 with unfilled symbols) are plotted in Figure
15B. NMR is shown for three values of SNR: circles represent the lowest SNR, diamonds the
middle SNR, and squares the highest SNR tested for each of the listeners. The markers are pink
for HI-2, blue for HI-4, red for HI-5, and green for HI-7.
Figure 15A:
Figure 15B:
Figure 16: The NMR for Experiment 1 in the SQW condition attained by each of the HI
listeners with UNP (purple lines), EEQ1 (orange lines), and EEQ4 (green lines) processing
plotted as a function of PTA in dB HL.
Figure 17: Effective SNR after EEQ1 processing (SNREEQ1) plotted as a function of the SNR
before EEQ1 processing (SNRUNP) for 10 samples of noise in each of the 64 test syllables in
CON (red line), SQW (blue line), SAM (green line), VC-1 (yellow line), VC-2 (pink line), and
VC-4 (cyan line) noises. A reference line of SNREEQ1 = SNRUNP (dashed black line) is shown to
make apparent at which points the EEQ processing lowers or raises the effective SNR.
Figure 18: Scores plotted as a function of SNR. UNP scores (solid lines with circular markers)
are plotted as a function of SNRUNP, and EEQ1 scores (dashed lines with asterisk markers) are
plotted as a function of SNREEQ1 (obtained from the function in Figure 17). The data for the non-
speech derived noises (CON with red markers, SQW with blue markers, and SAM with green
markers) are plotted in Figure 18A, and the data for the speech-derived noises (VC-1 with blue
markers, VC-2 with green markers, and VC-4 with red markers) are plotted in Figure 18B.
Figure 18A:
Figure 18B:
Figure 19: Crest factor plotted as a function of SNR with UNP (solid lines) and EEQ1 (dashed
lines) for 10 samples of noise in each of the 64 test syllables. The functions in the non-speech-
derived noises (CON in red, SQW in blue, and SAM in green) are shown in Figure 19A, and the
functions in the speech-derived noises (VC-1 in blue, VC-2 in green, and VC-4 in red) are shown
in Figure 19B.
Figure 19A:
Figure 19B:
Figure 20: Scores plotted as a function of crest factor (obtained from Figure 19). UNP scores are
solid lines with circular markers, and EEQ1 scores are dashed lines with asterisk markers. The
data for the non-speech derived noises (CON with red markers, SQW with blue markers, and
SAM with green markers) are plotted in Figure 20A, and the data for the speech-derived noises
(VC-1 with blue markers, VC-2 with green markers, and VC-4 with red markers) are plotted in
Figure 20B.
Figure 20A:
Figure 20B:
Figure 21: The RMS value (Figures 21A, 21B, and 21C) and median amplitude (Figures 21D,
21E, and 21F) of the syllable ‘APA’ presented at 70 dB SPL in SQW interference in the UNP,
EEQ1, and EEQ4 speech conditions. The values in four logarithmically spaced frequency bands
in the range of 80-8020 Hz are plotted with SNRs ranging from -20 dB to 20 dB. The data are
plotted with red squares for Band 1, green circles for Band 2, blue asterisks for Band 3, and
black x’s for Band 4.
Figure 21A:
Figure 21B:
Figure 21C:
Figure 21D:
Figure 21E:
Figure 21F:
Figure 22: The waveforms and envelopes of VC-1, VC-2, and VC-4 noises. Shown with the
envelope (in dB) are the RMS value (red line) and the value 3 dB below the RMS (green line). The
black horizontal lines correspond to the noise glimpses.
Figure 22A:
Figure 22B:
Figure 22C:
Figure 23: The fraction that the glimpses constitute of the overall noise duration of 1.29 seconds
(which was fixed to be equal to the duration of an arbitrarily chosen VCV token) plotted as a
function of the number of speakers in the vocoded modulated noises (which are powers of 2
between 1 and 512). Also shown for reference are these values in CON, SQW, and SAM noise,
the first of which is part of the connected vocoded modulated noises curve. For each noise type,
500 samples were averaged together for the computation of the percent glimpses. The error bars
associated with each data point are +/- 1 standard deviation.
Figure 24: The average number of glimpses per second (computed using noise samples of
duration 1.29 seconds, the length of an arbitrarily chosen VCV token) plotted as a function of the
number of speakers in the vocoded modulated noises (which are powers of 2 between 1 and
512). Also shown for reference are these values in CON, SQW, and SAM noise, the first of
which is part of the connected vocoded modulated noises curve. For each noise type, 500
samples were averaged together for the computation of number of glimpses. The error bars
associated with each data point are +/- 1 standard deviation.
Figure 25: Histograms of the average glimpse duration of the different noise types: speaker-
vocoded modulated noises (with the number of speaker samples as powers of 2 between 1 and
512), CON, SQW, and SAM. For every noise type, 500 samples were generated, and for every
sample, the average glimpse duration was calculated. In generating these histograms, these 500
average glimpse durations were placed into buckets of length 10 ms.
Figure 26: The scores averaged across the NH (red lines and markers) and HI listeners (blue
lines and markers) for UNP (circular markers connected by solid lines) and EEQ1 (asterisk
markers connected by dotted lines) plotted as a function of the fraction glimpses in CON (0.000),
SQW (0.436), SAM (0.418), VC-1 (0.423), VC-2 (0.364), and VC-4 (0.236).
APPENDIX
Appendix I: Fast-Attack, Slow-Release Computation of the EEQ Scale Factor
When its scale factor does not have a lower limit of 0 dB, an implementation of EEQ
processing may choose to use a fast-attack, slow-release scheme when computing Eslow(t). In
such a scheme, SC(t) is computed as follows:
SC(t) = sqrt(Eslow(t) / Efast(t)).
Here, Eslow(t) = MAX(Eshort(t), Elong(t)) and Efast(t) = Eshort(t). Computing Eslow(t) in such a manner
ensures that it responds quickly to increases in energy but reacts more slowly to decreases in
energy. As a result, the audibility of the quiet portions of the signal is maximized by quickly
amplifying them and slowly attenuating them. Note that because the current implementation of
EEQ processing sets a lower limit of 0 dB on SC(t), the fast-attack, slow-release scheme does not
have an additional effect beyond what is already provided by the scale factor limit. However,
should an implementation not have this lower limit on the scale factor in place, the fast-attack,
slow-release scheme would provide an additional benefit in terms of maximizing the audibility
of the weaker signal components.
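A minimal Python sketch of this fast-attack, slow-release scheme follows; it is illustrative only, with the time-constant-to-coefficient conversion and all names being assumptions rather than the study's implementation.

```python
import numpy as np

def eeq1_fast_attack(x, fs, tc_short=0.005, tc_long=0.2):
    """Sketch of EEQ scaling with a fast-attack, slow-release Eslow.

    SC(t) = sqrt(Eslow(t) / Efast(t)), where Eslow = max(Eshort, Elong)
    and Efast = Eshort, so the slow estimate tracks energy increases at
    the short time constant but decays at the long one. No explicit
    0 dB lower limit is applied to SC here."""
    a_s = np.exp(-1.0 / (tc_short * fs))
    a_l = np.exp(-1.0 / (tc_long * fs))
    e_s = e_l = 0.0
    y = np.empty_like(x, dtype=float)
    for n, xn in enumerate(x):
        p = xn * xn
        e_s = a_s * e_s + (1.0 - a_s) * p   # short-term energy
        e_l = a_l * e_l + (1.0 - a_l) * p   # long-term energy
        e_slow = max(e_s, e_l)              # fast attack, slow release
        sc = np.sqrt(e_slow / e_s) if e_s > 0 else 1.0
        y[n] = sc * xn
    return y
```

Note that sqrt(max(Eshort, Elong) / Eshort) equals max(1, sqrt(Elong / Eshort)), i.e., the scheme is algebraically identical to applying a 0 dB floor to the ordinary scale factor, which mirrors the observation above that it adds nothing when that floor is already in place.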
Appendix II: Speaker Vocoded Modulated Noise Generation
The 1-, 2-, and 4-speaker vocoded modulated noises were generated according to a
method described in Phatak and Grant (2014). The method takes randomly generated continuous
flat noise and a speech signal as input and produces the speaker vocoded modulated noise as
follows:
1) Apply a Butterworth filter (slope 72 dB/oct) to the noise and the speech signals
separately in 6 logarithmically-equal bands in the range 80-8020 Hz.
2) Apply a low-pass filter (cutoff 64 Hz) to the envelope of the filtered speech signal
(from step 1) in each band.
3) Modulate the filtered noise (from step 1) by the low-pass filtered speech envelope
(from step 2) in each band and scale to the original level of the filtered speech
signal (from step 1).
4) Sum the modulated bands and time-reverse the output to obtain VC-1.
5) Repeat steps 1-4 to obtain a VC-1 noise sample as many times as needed. Add
two VC-1 samples together to create a VC-2 sample or add four VC-1 samples
together to create a VC-4 sample.
6) Scale the VC-1, VC-2, or VC-4 sample to the desired level.
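The steps above can be sketched in Python as follows. This is a simplified illustration rather than the generation code actually used: it assumes Hilbert-transform band envelopes, approximates the 72 dB/oct slope with 12th-order Butterworth filters, and uses hypothetical names.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def vocoded_noise(noise, speech, fs, n_bands=6, f_lo=80.0, f_hi=8020.0):
    """Simplified sketch of speaker-vocoded modulated noise (steps 1-4)."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)   # log-equal band edges
    env_lp = butter(4, 64.0, btype="lowpass", fs=fs, output="sos")
    out = np.zeros(len(noise))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = butter(12, [lo, hi], btype="bandpass", fs=fs, output="sos")
        nb = sosfiltfilt(band, noise)               # step 1: filter noise
        sb = sosfiltfilt(band, speech)              # step 1: filter speech
        env = np.abs(hilbert(sb))                   # band envelope
        env = sosfiltfilt(env_lp, env)              # step 2: 64 Hz low-pass
        mod = nb * env                              # step 3: modulate noise
        # step 3: scale to the level of the filtered speech band
        mod *= np.sqrt(np.mean(sb ** 2) / np.mean(mod ** 2))
        out += mod                                  # step 4: sum bands
    return out[::-1]                                # step 4: time-reverse
```

Summing two or four independently generated outputs (step 5) and rescaling (step 6) would then yield VC-2 or VC-4 samples.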
The speech signal used as input comes from one of eight speakers, four of whom are
female and four of whom are male. The speech samples for a given speaker consist of sentences
that are scaled to the same level and concatenated together with a smoothing window applied to
the onset and offset of each sentence. Every sample of a speaker vocoded modulated noise comes
from a randomly chosen combination of speakers of a randomly chosen gender. For example, if a
sample of VC-2 is selected to consist of female speakers, the concatenated speech samples from
two of the four female speakers will be selected. A random start index is chosen for each selected
concatenated speech sample, and the end index is computed based on the desired duration of the
noise. The concatenated speech samples trimmed to these start and end indices are used as the
input to the above algorithm for generating the speaker vocoded modulated noise.
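The speaker-selection procedure described above can be sketched as follows; the data layout and all names are hypothetical, and the level-equalization and smoothing-window steps are assumed to have been applied already.

```python
import numpy as np

def pick_speaker_segments(concat_samples, n_speakers, duration, fs, rng):
    """Sketch of the speaker-selection procedure.

    concat_samples: dict mapping gender ("F"/"M") to a list of
    concatenated speech arrays, one per speaker. Returns n_speakers
    segments of `duration` seconds, all from distinct speakers of one
    randomly chosen gender, each starting at a random index."""
    gender = rng.choice(["F", "M"])                  # random gender
    pool = concat_samples[gender]
    idx = rng.choice(len(pool), size=n_speakers, replace=False)
    n = int(duration * fs)
    segments = []
    for i in idx:
        sample = pool[i]
        start = rng.integers(0, len(sample) - n)     # random start index
        segments.append(sample[start:start + n])     # trim to duration
    return segments
```

Each returned segment would then be fed to the band-filtering and modulation steps above, with the resulting single-speaker noises summed to form VC-2 or VC-4.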
Appendix III: Results, Experiment 1
Appendix III-A: Experiment 1 consonant identification scores in percent-correct for the four
normal hearing (NH) listeners. Scores are listed for UNP, EEQ1, and EEQ4 speech for seven
noise conditions: BAS, CON, SQW, SAM, VC-1, VC-2, and VC-4. The NH listeners were tested
at a speech level of 65 dB SPL and an SNR of -10 dB. Means across listeners are provided in the
final row.
UNP EEQ1 EEQ4
BAS CON SQW SAM BAS CON SQW SAM BAS CON SQW SAM
NH-1 98.8 52.7 91.0 86.3 98.4 45.3 92.2 78.5 97.3 52.0 91.4 82.8
NH-2 98.4 57.4 93.4 92.6 99.2 56.6 95.3 89.1 99.2 48.0 90.6 80.1
NH-3 98.4 55.1 94.5 90.6 98.4 47.7 93.8 88.7 97.3 45.7 93.4 86.7
NH-4 98.8 55.9 91.4 88.7 97.7 53.1 94.5 86.3 97.7 55.9 87.5 84.0
Mean 98.6 55.3 92.6 89.6 98.4 50.7 93.9 85.6 97.9 50.4 90.7 83.4
UNP EEQ1 EEQ4
VC-1 VC-2 VC-4 VC-1 VC-2 VC-4 VC-1 VC-2 VC-4
NH-1 84.4 69.5 57.8 82.8 71.1 56.6 80.1 71.9 53.9
NH-2 81.6 76.2 59.4 81.6 69.5 53.1 74.2 62.9 41.4
NH-3 89.1 75.0 58.2 77.7 64.8 47.7 81.6 65.6 48.4
NH-4 85.9 67.6 54.7 77.0 67.6 50.4 76.6 61.7 47.7
Mean 85.3 72.1 57.5 79.8 68.3 52.0 78.1 65.5 47.9
Appendix III-B: Experiment 1 consonant identification scores in percent-correct for the nine
hearing-impaired (HI) listeners. Scores are listed for UNP, EEQ1, and EEQ4 speech for seven
noise conditions: BAS, CON, SQW, SAM, VC-1, VC-2, and VC-4. The speech levels and SNRs
used in testing each of the HI listeners are provided in Table 1. Means across listeners are provided
in the final row.
UNP EEQ1 EEQ4
BAS CON SQW SAM BAS CON SQW SAM BAS CON SQW SAM
HI-1 100.0 59.0 91.4 84.0 99.2 53.9 94.9 84.4 97.7 55.1 93.8 86.7
HI-2 94.5 57.4 74.2 66.4 95.3 49.2 77.3 71.1 91.0 47.3 75.4 63.3
HI-3 96.9 58.2 77.3 74.6 94.5 50.8 71.9 64.5 87.1 41.0 62.1 52.7
HI-4 91.4 58.6 68.8 63.7 85.5 54.7 80.9 77.7 82.8 48.0 75.8 60.2
HI-5 95.3 50.4 66.8 62.5 91.4 42.2 85.5 64.1 87.5 43.4 68.4 62.5
HI-6 89.5 56.3 57.8 55.9 91.0 52.3 68.4 62.5 80.1 48.0 62.1 50.4
HI-7 89.8 56.3 60.5 62.1 82.8 46.9 67.2 59.0 78.1 45.3 66.0 62.5
HI-8 87.9 53.9 63.3 55.1 86.7 42.2 75.4 62.5 82.4 43.8 65.2 53.1
HI-9 92.6 59.8 60.2 64.5 89.5 60.5 77.3 57.4 84.4 56.6 71.9 57.4
Mean 93.1 56.6 68.9 65.4 90.7 50.3 77.6 67.0 85.7 47.6 71.2 61.0
UNP EEQ1 EEQ4
VC-1 VC-2 VC-4 VC-1 VC-2 VC-4 VC-1 VC-2 VC-4
HI-1 81.3 75.4 59.8 77.3 59.0 50.0 75.4 64.8 50.4
HI-2 69.5 58.6 54.3 70.3 55.1 52.0 59.4 56.6 52.3
HI-3 66.0 46.1 41.0 54.3 42.2 30.9 46.9 30.9 27.3
HI-4 69.5 62.9 57.4 66.8 55.5 49.6 61.3 55.9 45.7
HI-5 68.4 51.6 47.7 54.7 53.1 36.7 60.9 48.8 39.1
HI-6 57.8 47.7 48.0 63.3 48.8 46.9 48.8 44.9 43.4
HI-7 64.8 54.7 50.0 57.4 47.3 49.2 52.7 52.3 44.1
HI-8 64.1 51.6 46.9 59.4 43.4 39.8 46.9 44.9 35.5
HI-9 63.7 52.3 48.8 53.5 52.3 44.9 52.0 48.8 41.0
Mean 67.2 55.6 50.4 61.9 50.7 44.4 56.0 49.8 42.1
Appendix III-C: Experiment 1 NMR results for normal-hearing (NH) listeners. The NMR for SQW
and SAM noises was calculated relative to the CON noise condition, whereas the NMR for the
VC-1 and VC-2 noises was calculated relative to the VC-4 noise condition. Means across
listeners are provided in the final row.
UNP EEQ1 EEQ4
SQW SAM VC-1 VC-2 SQW SAM VC-1 VC-2 SQW SAM VC-1 VC-2
NH-1 0.83 0.73 0.65 0.29 0.88 0.62 0.63 0.35 0.87 0.68 0.60 0.41
NH-2 0.88 0.86 0.57 0.43 0.91 0.76 0.62 0.36 0.83 0.63 0.57 0.37
NH-3 0.91 0.82 0.77 0.42 0.91 0.81 0.59 0.34 0.92 0.80 0.68 0.35
NH-4 0.83 0.76 0.71 0.29 0.93 0.75 0.56 0.36 0.76 0.67 0.58 0.28
Mean 0.86 0.79 0.67 0.36 0.91 0.73 0.60 0.35 0.85 0.69 0.61 0.35
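The tabulated NMR values appear consistent with normalizing the score gain from a fluctuating masker (over its steady-state reference) by the headroom between the reference score and the BAS score. This formula is inferred from the tabulated values, not stated here, so treat the sketch as an assumption:

```python
def nmr(fluct, steady, baseline):
    """Normalized masking release: percent-correct gain of a fluctuating
    masker over its steady reference (CON for SQW/SAM; VC-4 for VC-1/VC-2),
    normalized by the headroom up to the baseline (BAS) score."""
    return (fluct - steady) / (baseline - steady)

# NH-1, UNP: BAS = 98.8, CON = 52.7, SQW = 91.0
# nmr(91.0, 52.7, 98.8) reproduces the tabulated 0.83
```

On this reading, an NMR of 1 means the fluctuating masker cost nothing relative to BAS, 0 means no release from the steady masker, and negative values (e.g., HI-6) mean performance was worse in the fluctuating masker.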
Appendix III-D: Experiment 1 NMR results for hearing-impaired (HI) listeners. The NMR for SQW
and SAM noises was calculated relative to the CON noise condition, whereas the NMR for the
VC-1 and VC-2 noises was calculated relative to the VC-4 noise condition. Means across
listeners are provided in the final row.
UNP EEQ1 EEQ4
SQW SAM VC-1 VC-2 SQW SAM VC-1 VC-2 SQW SAM VC-1 VC-2
HI-1 0.79 0.61 0.53 0.39 0.91 0.67 0.56 0.18 0.91 0.74 0.53 0.31
HI-2 0.45 0.24 0.38 0.11 0.61 0.47 0.42 0.07 0.64 0.37 0.18 0.11
HI-3 0.49 0.42 0.45 0.09 0.48 0.31 0.37 0.18 0.46 0.25 0.33 0.06
HI-4 0.31 0.15 0.36 0.16 0.85 0.75 0.48 0.16 0.80 0.35 0.42 0.27
HI-5 0.37 0.27 0.43 0.08 0.88 0.44 0.33 0.30 0.57 0.43 0.45 0.20
HI-6 0.05 -0.01 0.24 -0.01 0.41 0.26 0.37 0.04 0.44 0.07 0.15 0.04
HI-7 0.13 0.17 0.37 0.12 0.57 0.34 0.24 -0.06 0.63 0.52 0.25 0.24
HI-8 0.28 0.03 0.42 0.11 0.75 0.46 0.42 0.08 0.56 0.24 0.24 0.20
HI-9 0.01 0.14 0.34 0.08 0.58 -0.11 0.19 0.17 0.55 0.03 0.25 0.18
Mean 0.32 0.23 0.39 0.13 0.67 0.40 0.38 0.12 0.62 0.33 0.31 0.18
Appendix IV: Results, Experiment 2
Appendix IV-A: Experiment 2 consonant identification scores in percent-correct for UNP and
EEQ1 speech for six noise conditions: CON, SQW, SAM, VC-1, VC-2, and VC-4. The SNRs
used in testing each of the four HI listeners are provided in Table 2, and all other experimental
parameters are the same as in Table 1.
Low SNR Mid SNR High SNR
CON SQW SAM CON SQW SAM CON SQW SAM
HI-2 UNP 40.6 70.7 57.8 57.4 74.2 66.4 78.1 80.1 80.9
EEQ1 34.4 76.6 60.2 49.2 77.3 71.1 71.9 82.0 82.0
HI-4 UNP 47.7 65.2 59.8 58.6 68.8 63.7 77.7 77.7 81.6
EEQ1 36.3 82.4 70.3 54.7 80.9 77.7 68.0 85.2 75.8
HI-5 UNP 28.9 62.9 52.7 50.4 66.8 62.5 79.7 73.4 78.9
EEQ1 27.0 80.5 65.6 42.2 85.5 64.1 61.3 77.0 72.3
HI-7 UNP 40.2 52.7 49.2 56.3 60.5 62.1 70.3 68.0 73.0
EEQ1 41.8 66.4 54.3 46.9 67.2 59.0 67.2 72.3 77.0
Low SNR Mid SNR High SNR
VC-1 VC-2 VC-4 VC-1 VC-2 VC-4 VC-1 VC-2 VC-4
HI-2 UNP 57.0 52.7 39.1 69.5 58.6 54.3 78.9 73.4 77.3
EEQ1 59.0 44.1 35.5 70.3 55.1 52.0 78.1 73.0 66.0
HI-4 UNP 64.5 55.9 49.6 69.5 62.9 57.4 80.1 78.5 73.0
EEQ1 64.5 50.8 35.5 66.8 55.5 49.6 78.5 73.0 67.2
HI-5 UNP 61.3 44.5 28.9 68.4 51.6 47.7 81.3 73.4 70.7
EEQ1 60.9 37.5 27.0 54.7 53.1 36.7 71.1 66.0 61.7
HI-7 UNP 50.8 48.8 34.4 64.8 54.7 50.0 71.9 69.5 71.5
EEQ1 57.8 41.4 37.1 57.4 47.3 49.2 78.1 71.1 66.4
Appendix IV-B: Experiment 2 NMR results for HI listeners. The NMR for SQW and SAM
noises was calculated relative to the CON noise condition, whereas the NMR for the VC-1 and
VC-2 noises was calculated relative to the VC-4 noise condition. The SNRs used in testing each
of the four HI listeners are provided in Table 2, and all other experimental parameters are the
same as in Table 1.
Low SNR Mid SNR High SNR
SQW SAM VC-1 VC-2 SQW SAM VC-1 VC-2 SQW SAM VC-1 VC-2
HI-2 UNP 0.56 0.32 0.32 0.25 0.45 0.24 0.38 0.11 0.12 0.17 0.09 -0.23
EEQ1 0.69 0.42 0.39 0.14 0.61 0.47 0.42 0.07 0.43 0.43 0.41 0.24
HI-4 UNP 0.40 0.28 0.36 0.15 0.31 0.15 0.36 0.16 0.00 0.29 0.38 0.30
EEQ1 0.94 0.69 0.58 0.30 0.85 0.75 0.48 0.16 0.98 0.44 0.62 0.32
HI-5 UNP 0.51 0.36 0.49 0.24 0.37 0.27 0.43 0.08 -0.40 -0.05 0.43 0.11
EEQ1 0.83 0.60 0.53 0.16 0.88 0.44 0.33 0.30 0.52 0.36 0.32 0.14
HI-7 UNP 0.25 0.18 0.30 0.26 0.13 0.17 0.37 0.12 -0.12 0.14 0.02 -0.11
EEQ1 0.60 0.30 0.45 0.09 0.57 0.34 0.24 -0.06 0.33 0.63 0.71 0.29