state of art and challenges in improving speech … state of art and challenges in improving speech...
TRANSCRIPT
1
State of art andChallenges in Improving Speech Intelligibility in
Hearing Impaired People
Stefan Launer, Lyon, January 2011
Phonak AG, Stäfa, CH
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 2
Content
� Speech intelligibility in complex listening environments for hearing impaired persons
� Noise reduction technologies in hearing instruments
� De-reverberation
� Single microphone technology
� Multi-microphone technology
� FM systems
� Results
� Challenges
2
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 3
Speech Intelligibility in Noise??? � Speech Intelligibility in Complex Listening Conditions!!
� Different types of interfering sources
� Different spatial arrangements of sources and interferers
� Dynamic…
� Room acoustics
� Reverberation
� Distance
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 4
Speech Intelligibility in Noise??? � Speech Intelligibility in Complex Listening Conditions!!
� Test methodology
� Speech tests: short sentences, words, phonemes- target from front, static
� White noise… from the back
� Anechoic environment
� Lab / real life results
� Speech intelligibility
� Listening effort
3
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 5
Killion 1997
Hearing Loss dB (3FA)Hearing Loss dB (3FA)Hearing Loss dB (3FA)Hearing Loss dB (3FA)
SNR dB
SNR dB
SNR dB
SNR dB
20 30 40 50 60 70 80 90
0
5
10
15
20
Mild hearing loss
Moderate hearing loss
Severe hearing loss
Speech Intelligibility in Noise
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 6
Physical structure of interfering signal has a strong impact on speech intelligibility
Introducing ....
� spectral dips: SH 3 - 4 dB SRT NH: 9 - 15 dB
� temporal dips: SH 1 - 2 dB SRT NH: 6 - 7 dB
� combination of both: SH 4 - 5 dB SRT NH: 15 - 20 dB
� ... improves speech intelligbility a lot for normal-hearing subjects, much less so for hearing impaired subjects !
Peters, Moore and Baer 1998, JASA
4
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 7
Speech Intelligibility in Multi-talker Environment
� Speech intelligibility as a function of interfering talkers
Fig. 2,Bronhorst and Plomp, JASA 1992
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 8
Spatial Release from Masking – Anechoic Chamber
Beutelmann & Brand, JASA 2006
NH HI
10 dB!
5
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 9
Spatial Release from Masking - Office
� Spatial release reduced byreverberation
Beutelmann & Brand, JASA 2006
NH
HI
4 dB!
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 10
Spatial Release from Masking - Cafeteria
Beutelmann & Brand, JASA 2006
NH HI6-7 dB!
6
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 11
Speech intelligibility in reverberant environments
0000
10101010
20202020
30303030
40404040
50505050
60606060
70707070
80808080
90909090
100100100100
Sound suiteSound suiteSound suiteSound suite
Correct
Correct
Correct
Correct
%% %%
T = 0.54T = 0.54T = 0.54T = 0.54 T = 1.55T = 1.55T = 1.55T = 1.55
Reverberation TimeReverberation TimeReverberation TimeReverberation Time
normal
mild
Moderate / severe
Harris & Swenson, Audiology 1990, p. 314-321
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 12
How to mix a Speech in Noise Cocktail
Speech in
noise cocktail
directional microphones
noise
canceling
Objectives for a hearing instrument:
� Speech intelligibility improvement!!!!
� Ease of listening, listening effort, listening comfort
7
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 13
Noise Reduction Using a Single Microphone
� Single Microphone Noise-Cancellers: in principal estimate the noise and subtract it from the noisy signal.
Adaptive Filter:H = 1 - N* / (S + N)
Speech Detection
(S + N)
Noise Estimation
N*
(S + N) - N* ≈ S
� Statistical estimation, amplitude modulation, noise detection in speech pauses
� Use a single information source to separate two signals
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 14
Reverberation Canceller – Reduces the smearing effects by de-blurring the speech signal
Signal
Time
Time span of early
reflections
Time span of disturbing reflections
EchoBlockEchoBlockEchoBlockEchoBlock
Level
8
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 15
Single Microphone Noise Reduction - Summary
� This technique performs well eliminating stationary noises like a fan or in a car, etc.
� Reverberation: very reverberant rooms
� Speech like noises can’t be suppressed without degrading speech quality at the same time.
� ... ease of listening: improving listening comfort
� reduction of perceived noisiness
� less annoyance
� Improvement of speech intelligibility ???
� Sound quality is a trade off…
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 16
Delay & Sum - Technique
� The acoustical signal is picked up at two different locations by the front and the back microphones
� The signal from the back is delayed
� The signals from both microphones are summed
� Depending on delay - different directions are attenuated
f
b
d
Target direction
Delay= d/c
+
9
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 17
Digital Adaptive Directional Microphones
� Adaptive: minimize output energy of the two microphones AD-
Converter
AD-Converter
Back
Front
SpatialProcessor
α
The spatial weighting factor (a) is continuously ad apted, the Directivity Index hereby optimized .
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 18
Digital Adaptive Directional Microphones
� Amplify sounds from front
� Adaptively attenuates strongest noise source
10
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 19
Frequency Specific BeamformingDirectivity in each frequency band
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 20
Directional Microphones: Potential and Limitations
� Significant speech intelligibility benefit compared to omnidirectional systems in complex listening conditions
� from side & asymmetric
� diffuse
� moving noises
� reverberant environments & larger distances
� Lab results: 3-6 dB improvements
11
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 21
Directional Microphones: Potential and Limitations
� Positioning on head
� Microphone mismatch, ageing etc
� More than two microphones
� Noise floor
� Narrow beam pattern acceptable?
� Size constraint: low frequency roll-off
� Computational complexity
f
b
d
Target direction
Delay= d/c
+
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 22
Directivity Index for Different Products Styles and Placements
� BTE
0
5
DI
12
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 23
Factors causing BF mismatch
The beamformer performance in our current products can be limited
due to level and phase mismatch caused by the following factors:
time invariant time variant
Microphone production mismatch Microphone ageing
HI assembly HI repairing
Clean W&W variability W&W pollution
Customer individual head/pinna shape
Non-idealities of current adaptive level matching block
Device geometry:
ITEs and microBTEs have unfavorable mic positions
Customer HI positioning variance
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 24
Effects of Microphone/BF mismatch
� microphone phase deviation
� → rotated null direction
� microphone magnitude deviation
� → reduced suppression
target
blocking
13
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 25
Binaural Directional Microphones
Maximum SNR improvement: 3 dB
Beamformer Beamformer
wireless transmission
∑ ⋅i
ii Xw ∑ ⋅i
ii Xw
Improving directivity by linear combination of mona ural directional microphone outputs
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 26
Test setup
� Subjects
� 20 adults
� Moderate - moderately-severe hearing loss
� Exélia Art and Ambra microP BTE
� Algorithms
� Excelia
� Ambra UltraZoom (monaural)
� Ambra StereoZoom (binaural)
� Test setup
� OLSA: speech intelligibility in noise
� Listening effort scaling
� Paired comparison
14
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 27
Binaural Beamforming
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 28
OLSA
A) B)OLSA, 60° angle
-14
-12
-10
-8
-6
-4
-2
0
SR
T 5
0% in
dB
SN
R
Exélia Art VoiceZoomAmbra UltraZoomAmbra StereoZoom
OLSA, 45° angle
-14
-12
-10
-8
-6
-4
-2
0
15
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 29
Paired Comparison – Subjective Speech Intelligibility
Mit welchem Hörgerät verstehen Sie besser?
0
10
20
30
40
50
60
70
45° Winkel 60° WinkelStörgeräusch
An
zah
l V
erg
leic
he
Ambra UZ Ambra SZ Exelia VZ
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 30
Paired Comparison – Subjective Listening Effort
Mit welchem Hörgerät verstehen Sie leichter?
0
10
20
30
40
50
60
70
80
45° Winkel 60° WinkelStörgeräusch
An
zah
l V
erg
leic
he
Ambra UZ Ambra SZ Exelia VZ
16
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 31
User-Steered directionality
� Traditional beamforming systems focus only to the front
� Speech signals do not always come from the front and facing the speaker is not always possible
� Car, restaurants, small groups
� ZoomControl, accessible through myPilot,allows Exélia wearers to select in which direction to focus hearing
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 32
Listen to the side: User-Steered directionality
� Uses four-microphone network of full bandwidth binaural instruments
� Broadband audio data transfer between devices focuses hearing in one specific direction, while suppressing all signals in other directions
17
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 33
Fal
l_La
unch
_201
0_A
mbr
a_G
B_P
age
33
User-Steered directionality
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 34
Fal
l_La
unch
_201
0_A
mbr
a_G
B_P
age
34
User-Steered directionality
-9
-7
-5
-3
-1
1
3
5
7
0° (front) 90° (left) 180° (back) 270° (right)
SNR (dB
SPL)
Without
Adaptive multichannel directionality
Steerable directionality
ExeliaArt P
18
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 35
Subjective Evaluation – Listening Effort
Which setting needs the least listening effort to u nderstand well? (For
first time and experienced user (n=9))
88%
13%
78%
11% 11%0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Without ZoomControl VoiceZoom Omni
Male speech Female speech
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 36
Binaural noise reduction techniques
� Different types of algorithms
� Beam former: spatial information, timing difference
� Binaural Wiener Filter
� Blind source separation:statistical information estimating room transfer function
� Auditory processing schemes
19
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 37
BWF: Speech Intelligibility Weighted GainS
peec
h In
telli
gibi
lity
Wei
ghte
d G
ain
Acoustic EnvironmentPhD Thesis van den Bogaert 2008
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 38
BWF: Speech Intelligibility Weighted Gain
Spe
ech
Inte
lligi
bilit
y W
eigh
ted
Gai
n
Acoustic EnvironmentPhD Thesis van den Bogaert 2008
20
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 39
Binaural Beam Forming / Noise Reduction
� No stereo output signal => loss of spatial sensation / localization
� Artificially re-introduce that by “split-directionality”
� Mixing in part of the original signal at the output
� Narrower beam width
� How narrow should the beam be (head movement!)?
� Complex environments
� Dynamic -> target tracking, target identification
� Reverberation & distance
� Expected improvements: specific situation, no generic solution
� Single /few strong interfering source, frontal hemisphere
� Environments with little reverberation
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 40
Technical constraints
� Delay over the link
� Clock jtter
� Noise floor, signal degradation
� Microphone calibration (amplitude and phase)
21
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 41
Earlevel FM
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 42
Modern FM Technology
� Dynamic Speech Extraction
� Automatic FM advantage: Adjusts the FM gain depending on the environmental noise level
� Surrounding Noise Compensation
� Voice Activity Detector
� Multi-talker networks: New team teaching concept using up to 10 transmitters
22
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 43
SNR at ear level for different technologies
40 45 50 55 60 65 70 75 80 85
40
35
30
25
20
15
10
5
0
-5
-10
-15
-20
No FM
Traditional FM: fix FM Advantage
Adaptive FM Advantage
Surrounding Noise (dB SPL)
SN
R (
dB)
10 dB FM advantage: - Good environmental awareness- Audibility of the own voice- Compromise at high noise levels
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 44
0°
45°
135°225°
315°
TX3
1 m
*
*
HINT Sentences
Correlated HINT Noise
1 meter (39“) Loudspeakers to center of head.7.6 cm (3“) Loudspeaker toTX3 Transmitter
Fieldstudy with 48 adults
Source: Valente, 2002
23
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 45Speech Intelligibility Threshold
8.6 3.1 -0.8
-14.6 -18.9
-5
-20
-15
-10
-5
0
5
10
Mea
n R
TS
(dB
)
Una
ided
Om
ni
Dua
l
FM
-M
FM
-B
Nor
mal
Listening Conditions
5.9 2.3 1.8 6.5 3.2 2.1 (SD)
Source: Valente, 2002
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 46
Auditory Scene Analysis / Hearing Instrument Processing
Hearing Instrument ProcessingAuditory Processing
Target signal – assumption: in front…Attention control: Target signal identification and tracking, switching back and forth between objects, overcoming salient sources
Retrospective analysisDynamic aspects head / source movement …
A priori knowledge , “situational knowledge”- other sensory modalities- “world knowledge”, models of sources � fill in information…
Channel with limited information capacityChannel: full information capacity
Signal reconstruction & signal modification: amplification & attenuation / filtering -> “distortions”
No signal reconstruction!⇒ Perceptual attenuation, focus attention, suppression of neuronal activity
Delay constraint , real-time processingcomp. power constraint- limited signal analysis, spectro-temporal resolution
No delay constraint , no real-time processingHigher resolution signal analysisMuch higher computational power=> Stream segregation & source formation: works on several different time scales
Bottom upBottom up / top down
24
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 47
Conclusion
� Hearing instruments offer several algorithms to improve speech intelligibility in complex listening environments
� Algorithms based mainly on
� Speech intelligibility in complex listening environments remains a huge challenge
� Reverberation and distance
� Dynamic target selection and tracking
� Technical limitations
� Realistic test setups and test procedures
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 48
Questions
� Speech intelligibility: how much is top-down driven versus bottom-up processing?
� Speech intelligibility: how fast is it really??
� How much information do we infer at the end of a „sentence“?
� Which cues (pitch, temporal fine structure, location,….) are the essential ones, does it depend on situation?
� How does the auditory system pick the relevant one??
� How do we achieve „perceptual constancy“ – voices in real life always sounds the same, (almost) independent of environment?
25
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 49
Thank you…!!!
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 50
0000
10101010
20202020
30303030
40404040
50505050
60606060
70707070
80808080
90909090
100100100100
Sound suiteSound suiteSound suiteSound suite T = 0.54T = 0.54T = 0.54T = 0.54 T = 1.55T = 1.55T = 1.55T = 1.55
Reverberation timeReverberation timeReverberation timeReverberation time
Harris & Swenson, Audiology 1990, p. 314-321
normal
mild
moderate / severe
Speech Intelligibility %
Speech Intelligibility %
Speech Intelligibility %
Speech Intelligibility %
Speech Intelligibility in reverberant environments
26
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 51
Binaural processing - audio delay
� Group delay:
� Is mainly determined by Radio bandwidth, ADC, CODEC, buffering (Error correction)
� Delay shall be deterministic and constant
� For binaural audio processing the link delay adds to the other signal processing delay ie. FFT block processing, ADC.
� Overall system delay should be less than 10 ms (Stone & Moore 2005, …)
� Audiosignals + control data:
� Some more delay for Gain control is acceptable (Hohmann 2009)
©Phonak Stefan Launer, Speech in Noise Workshop, January 2011 Page 52
Jitter – examples: 800 Hz pure tone
30°
0
0°
10time / s
phase difference / deg
sfreq
phasediffT jitter µ30
360≤
⋅°=
Acoustic delay from head dimension: typ. 500 µs for ear distanceNormal hearing minimum audible angle: a few µsJitter should be smaller than 20 µs RMS
-> allows for binaural beamforming without significant localization errors