Meena Ramani
04/12/06
EEL 6586 Automatic Speech Processing
Topics to be covered
Lecture 1: The incredible sense of hearing 1
Anatomy
Perception of Sound
Lecture 2: The incredible sense of hearing 2
Psychoacoustics
Hearing aids and cochlear implants
The incredible sense of hearing-2
“Behind these unprepossessing flaps lie structures of such delicacy that they shame the most skillful craftsman”
-Stevens, S.S. [Professor of Psychophysics, Harvard University]
How do we hear?
Threshold of Hearing
Equal loudness curves
The Bass Loss Problem
Rock music
Too low: no bass
Too high: too much bass
Threshold variation with age
[Figure: Thresholds of hearing for normal & HI listeners. Axes: Frequency (Hz) vs. threshold of hearing (dB SPL). Curves: normal hearing, hearing impaired.]
The Audiogram
[Figure: Audiogram. Axes: Frequency (Hz) vs. hearing level (dB HL). Curves: left ear, right ear.]
The Audiogram (contd.)
Pure tone audiogram
[250 500 1K 2K 4K 6k] Hz
<20 dB HL is Normal Hearing
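A minimal sketch of how the 20 dB HL criterion could be applied to a pure tone audiogram. The pure-tone average (PTA) over 500, 1000 and 2000 Hz is a common convention assumed here, not something stated on the slide; the function names and the example thresholds are hypothetical.

```python
# Audiometric test frequencies listed on the slide (Hz).
FREQS_HZ = [250, 500, 1000, 2000, 4000, 6000]
NORMAL_LIMIT_DB_HL = 20  # slide: < 20 dB HL is normal hearing


def pure_tone_average(thresholds_db_hl, pta_freqs=(500, 1000, 2000)):
    """Mean threshold over pta_freqs (assumed convention: 0.5, 1, 2 kHz)."""
    by_freq = dict(zip(FREQS_HZ, thresholds_db_hl))
    return sum(by_freq[f] for f in pta_freqs) / len(pta_freqs)


def is_normal_hearing(thresholds_db_hl):
    """True if every measured threshold is below the 20 dB HL limit."""
    return all(t < NORMAL_LIMIT_DB_HL for t in thresholds_db_hl)


if __name__ == "__main__":
    example = [10, 15, 25, 40, 55, 60]  # hypothetical audiogram, dB HL
    print("PTA (dB HL):", pure_tone_average(example))
    print("Normal hearing:", is_normal_hearing(example))
```

Checking every frequency, rather than only the PTA, keeps a high-frequency loss from being hidden by good low-frequency thresholds.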
Loudness Growth Curve
[Figure: LGOB loudness growth curve at 250 Hz. Axes: Input level (dB SPL) vs. LGOB loudness rating (0-7). Curves: normal hearing, hearing impaired.]
Otoacoustic emissions
• The ear produces some sounds!
  – OHC: outer hair cell
• Used to test hearing for infants & check if patient is feigning a loss
Monaural beats
If two tones are presented monaurally with a small frequency difference, a beating pattern can be heard
Examples: 500 & 502 Hz; 500 & 520 Hz
Interaction of the two tones in the same auditory filter
Waveform: 150 Hz + 170 Hz
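A small sketch of why two close tones beat, assuming equal-amplitude sine tones (an assumption; the slides do not give amplitudes). By the sum-to-product identity, the sum is a tone at the mean frequency whose envelope is |2 cos(pi (f1 - f2) t)|, so the loudness fluctuates |f1 - f2| times per second. Function names are hypothetical.

```python
import math


def two_tone_sample(f1_hz, f2_hz, t_s):
    """Sum of two equal-amplitude sine tones at time t_s (seconds)."""
    return math.sin(2 * math.pi * f1_hz * t_s) + math.sin(2 * math.pi * f2_hz * t_s)


def beat_envelope(f1_hz, f2_hz, t_s):
    """Envelope |2 cos(pi (f1 - f2) t)| from the sum-to-product identity."""
    return abs(2 * math.cos(math.pi * (f1_hz - f2_hz) * t_s))


def beat_rate_hz(f1_hz, f2_hz):
    """Number of loudness fluctuations per second."""
    return abs(f1_hz - f2_hz)


if __name__ == "__main__":
    print(beat_rate_hz(500, 502))  # slow beating
    print(beat_rate_hz(500, 520))  # faster fluctuation
    print(beat_rate_hz(150, 170))  # the waveform example
```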
Binaural beats
Beating can also be heard when the tones are presented to different ears!
Beating arises from neural interaction
Only perceived if the tones are sufficiently close in frequency
Example: 500 Hz (left), 520 Hz (right), binaural
The case of the missing fundamental
Telephone BW: 300-3400 Hz
How do we know the pitch?
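A sketch of the missing-fundamental situation: a 200 Hz voice pitch (hypothetical value) loses its fundamental in the 300-3400 Hz telephone band, yet the remaining harmonics still repeat every 1/200 s. Autocorrelation is used here only as a simple stand-in for periodicity detection; it is not claimed to be how the auditory system does it. The sampling rate and function names are assumptions.

```python
import math

FS_HZ = 8000           # assumed sampling rate for the sketch
BAND_HZ = (300, 3400)  # telephone bandwidth from the slide


def band_limited_harmonics(f0_hz, band=BAND_HZ):
    """Harmonics of f0 that survive the telephone band."""
    lo, hi = band
    return [k * f0_hz for k in range(1, int(hi // f0_hz) + 1) if lo <= k * f0_hz <= hi]


def synthesize(freqs_hz, n_samples, fs_hz=FS_HZ):
    """Sum of unit-amplitude sines at the given frequencies."""
    return [sum(math.sin(2 * math.pi * f * n / fs_hz) for f in freqs_hz)
            for n in range(n_samples)]


def autocorr_pitch(x, fs_hz=FS_HZ, fmin_hz=80, fmax_hz=500):
    """Pitch estimate: lag with the largest autocorrelation in the search range."""
    best_lag, best_val = None, float("-inf")
    for lag in range(int(fs_hz / fmax_hz), int(fs_hz / fmin_hz) + 1):
        val = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return fs_hz / best_lag


if __name__ == "__main__":
    harmonics = band_limited_harmonics(200)      # 400, 600, ..., 3400 Hz; no 200 Hz
    signal = synthesize(harmonics, n_samples=400)
    print("Lowest component (Hz):", harmonics[0])
    print("Estimated pitch (Hz):", autocorr_pitch(signal))
```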
Primary Auditory cortex
•Pitch sensitive neurons [Bendor and Wang, Nature 2005]
•Neuron responds to fundamental and harmonics
•What are the inputs to these neurons?
How do spikes represent periodic, temporal and spectral information?
Auditory-periphery model (Zhang et al. ~2001)
Matlab code available
Feed it a wav file
Outputs a PSTH (post-stimulus time histogram)
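A minimal sketch of what a PSTH is, written in Python for illustration. It does not reproduce the auditory-periphery model itself; it only shows how spike times from repeated presentations of a stimulus are binned into a firing-rate histogram. The function name and the example spike times are hypothetical.

```python
import math


def psth(spike_times_per_trial, bin_width_s, duration_s):
    """Post-stimulus time histogram: mean firing rate (spikes/s) per time bin.

    spike_times_per_trial: one list of spike times (s, relative to stimulus
    onset) per stimulus presentation.
    """
    n_bins = int(round(duration_s / bin_width_s))
    counts = [0] * n_bins
    for trial in spike_times_per_trial:
        for t in trial:
            idx = math.floor(t / bin_width_s)
            if 0 <= idx < n_bins:  # ignore spikes outside the analysis window
                counts[idx] += 1
    n_trials = len(spike_times_per_trial)
    return [c / (n_trials * bin_width_s) for c in counts]


if __name__ == "__main__":
    trials = [[0.001, 0.012], [0.002, 0.025]]  # hypothetical spike times (s)
    print(psth(trials, bin_width_s=0.01, duration_s=0.03))
```

Dividing by the number of trials and the bin width turns raw counts into a rate, so histograms from different numbers of presentations can be compared.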
Critical bands
Equally loud, close in frequency
•Same IHCs
•Slightly louder
Equally loud, separated in freq.
•Different IHCs
•Twice as loud
Psychoacoustic experiments
Critical Band (cont.)
• Proposed by Fletcher
• How to measure?
  – S/N ratio vs noise BW
• CB ~= 1.5 mm spacing on BM
• 24 such band-pass filters
• BW of the filters increases with fc
• Logarithmic relationship – Weber’s law example
• Bark scale
Center freq (Hz)    Critical BW (Hz)
100                 90
200                 90
500                 110
1000                150
2000                280
5000                700
10000               1200
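A sketch of the Bark scale and critical bandwidth using the widely quoted Zwicker and Terhardt analytic approximations. These formulas are not given on the slides, and they yield somewhat larger bandwidths than the tabulated values, most noticeably above 2 kHz, so treat them as one possible approximation rather than the source of the table.

```python
import math


def bark(f_hz):
    """Critical-band rate in Bark (Zwicker & Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def critical_bandwidth_hz(f_hz):
    """Critical bandwidth in Hz (Zwicker & Terhardt approximation)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


if __name__ == "__main__":
    for fc in (100, 200, 500, 1000, 2000, 5000, 10000):
        print(fc, round(critical_bandwidth_hz(fc)), round(bark(fc), 1))
```

With this approximation the audible range up to about 15.5 kHz spans roughly 24 Bark, in line with the 24 band-pass filters mentioned above.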
Critical bands for HI
[Figure: 4 kHz tuning curve for normal & HI listeners. Axes: Desired tone frequency (Hz) vs. desired tone threshold (dB SPL). Curves: masker, normal hearing, hearing impaired.]
“You know I can't hear you when the water is running!”
MASKING
Frequency Masking
• Masking occurs because two frequencies lie within a critical band and the higher amplitude one masks the lower amplitude signal
• Masking can be caused by broadband or narrowband noise, or by pure or complex tones
• Low frequency broadband sounds mask the most
  – E.g. truck on road, water flowing
• Masking threshold
  – Level (dB) at which a test tone is just audible in the presence of noise
Temporal Aspects of Masking
• Simultaneous masking
• Pre-stimulus / backward / premasking
  – 1st test tone, 2nd masker
• Post-stimulus / forward / postmasking
  – 1st masker, 2nd test tone
Temporal Aspects of Masking (contd.)
Simultaneous masking
  – Duration > 200 ms: constant test tone threshold
  – Assume hearing system integrates over a period of 200 ms
Postmasking
  – Decay in effect of masker for 100 ms
  – More dominant
Premasking
  – Takes place 20 ms before masker is on!
  – Each sensation is not instantaneous; it requires build-up time
    • Quick build-up for loud maskers
    • Slower build-up for softer maskers
  – Less dominant effect
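A sketch of what the 200 ms integration assumption for simultaneous masking implies, assuming ideal energy integration (an idealization, not a claim from the slides): a test tone shorter than the window delivers proportionally less energy, so its threshold rises by 10 log10(200 / duration) dB, and stays constant for longer tones. The function name is hypothetical.

```python
import math

INTEGRATION_TIME_MS = 200.0  # integration window stated on the slide


def threshold_shift_db(duration_ms, integration_ms=INTEGRATION_TIME_MS):
    """Threshold increase relative to a long tone, assuming ideal energy integration."""
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    if duration_ms >= integration_ms:
        return 0.0  # threshold is constant beyond the integration window
    return 10.0 * math.log10(integration_ms / duration_ms)


if __name__ == "__main__":
    for d in (500, 200, 100, 20):
        print(d, "ms ->", round(threshold_shift_db(d), 1), "dB")
```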
Temporal masking for HI
[Figure: Temporal resolution at 4 kHz for normal & HI listeners. Axes: Desired-masker tone separation (ms) vs. desired tone threshold (dB SPL). Curves: normal hearing, hearing impaired.]
Meena Ramani
04/14/06
EEL 6586 Automatic Speech Processing
Normal Hearing
Sensorineural Hearing Loss
Mild to Severe Loss
[10 20 30 60 80 90] dB HL
[Figure: Spectrogram of cell phone speech for normal hearing. Axes: Time (s) vs. frequency (Hz).]
[Figure: Spectrogram of cell phone speech for SNHL. Axes: Time (s) vs. frequency (Hz).]
What do the hearing impaired hear?
Facts on Hearing Loss in Adults
• One in every ten (28 million) Americans has hearing loss.
• The vast majority of Americans (95% or 26 million) with hearing loss can have their hearing loss treated with hearing aids.
• Only 6 million use HAs
• Millions of Americans with hearing loss could benefit from hearing aids but avoid them because of the stigma.
Types of Hearing aids
Behind the ear
In the ear
In the canal
Completely in the canal
Anatomy of a Hearing Aid
• Microphone
• Tone hook
• Volume control
• On/off switch
• Battery compartment
Ear Mold Measurements
Hearing Aid Fitting
Acclimatization effect
Auditory cortex brain plasticity
Time for the HI to reuse the HF information: Acclimatization effect
How does this affect HA fitting?
  – Multiple fitting sessions
  – Initial fitting should be optimum one
So doc, what is the fitting methodology employed by the hearing aid company to compensate for my hearing loss?
Not-so-average Joe
(PhD EE/Speech person)
CONFIDENTIAL?
So, do you want your HA to:
1) Always be comfortably loud
2) Equalize loudness across frequencies
3) Normalize loudness
…?
Which fitting methodology is the best?
Existing HL compensation algorithms
Rationale
  Ad hoc: Half Gain, POGO
  Make speech comfortable: NAL-R
  Loudness normalization: IHAFF, Fig 6
  Loudness equalization: DSL
Hearing aid fitting algorithms
  Threshold-only: NAL-R, POGO, HG
  Suprathreshold: Fig 6, IHAFF, DSL
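A sketch of the simplest rule in the list, the half-gain (HG) rule, which prescribes a gain equal to half the hearing threshold level at each frequency. The loss values are the sensorineural example used in these slides; pairing them with the six audiogram frequencies is an assumption, and the function name is hypothetical. The other algorithms (POGO, NAL-R, Fig 6, IHAFF, DSL) use their own formulas and are not reproduced here.

```python
FREQS_HZ = [250, 500, 1000, 2000, 4000, 6000]  # audiogram frequencies (assumed pairing)
LOSS_DB_HL = [10, 20, 30, 60, 80, 90]          # example sensorineural loss, dB HL


def half_gain(thresholds_db_hl):
    """Half-gain rule: prescribed gain is half the hearing threshold level."""
    return [0.5 * hl for hl in thresholds_db_hl]


if __name__ == "__main__":
    for f, g in zip(FREQS_HZ, half_gain(LOSS_DB_HL)):
        print(f, "Hz:", g, "dB gain")
```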
Sensorineural hearing loss: [10 20 30 60 80 90] dB HL
Speech level = 65 dBA
Spectrograms and sound files
Panels: Normal hearing; Hearing impaired; HI with linear gain; HI with DSL gain; HI with RBC gain
Section Two
Performance metrics
Speech Intelligibility (SI): The degree to which speech can be understood
  Objective measures: AI, STI
  Subjective measures: HINT
Speech Quality: “Does the speech match your expectations?”
  Objective measures: PESQ
  Subjective measures: MOS
Performance metrics (contd.)
• Objective speech quality measure
  – Perceptual Evaluation of Speech Quality (PESQ)
  – [Diagram: reference signal and comparison signal in, score out]
• Subjective speech quality measure
  – Mean Opinion Score (MOS)
• Subjective speech intelligibility measure
  – Hearing In Noise Test (HINT)
Hearing In Noise Test (HINT)
Subjective listening experiments
Audiograms of the HI patients
[Figure: Left ear audiograms of the HI subjects. Axes: Frequency (Hz) vs. threshold of hearing (dB HL).]
Location:
Shands speech & hearing clinic
(sound proof booth)
Subjects:
15 HI people
  – PTA: 40-70 dB HL
15 normal hearing people
Tools used:
Matlab HINT and MOS GUIs
Subjective HINT and MOS scores for RBC: hearing impaired, cell phone speech
RBC has a 7 dB improvement in SI when compared to DSL
MOS scores reveal that RBC has a quality rating of ‘Good’
[Figure: Ave. MOSs of 15 HI subjects. x-axis: Algorithm (None, HPF, RBC, NALR, POGO, HG, NALRP, DSL); y-axis: MOS (1-Bad, 2-Poor, 3-Fair, 4-Good, 5-Excellent).]
[Figure: Ave. HINT scores of 15 HI subjects. x-axis: Algorithm (None, HPF, RBC, NALR, POGO, HG, NALRP, DSL); y-axis: SNR relative to baseline (dB).]
Subjective HINT and MOS scores for RBC: normal hearing, cell phone speech
RBC has a 12 dB improvement in SI when compared to DSL
MOS scores reveal that RBC has a quality rating of ‘Good’
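A sketch of how an average MOS could be computed and mapped back to the five-point scale (1-Bad to 5-Excellent) used in the MOS plots. The ratings in the example are hypothetical and are not the study's data; mapping the mean to the nearest scale label is an assumption about how a rating such as ‘Good’ is read off.

```python
MOS_LABELS = {1: "Bad", 2: "Poor", 3: "Fair", 4: "Good", 5: "Excellent"}


def mean_opinion_score(ratings):
    """Average of listener ratings on the 1-5 MOS scale."""
    if not ratings or any(r not in MOS_LABELS for r in ratings):
        raise ValueError("ratings must be non-empty integers from 1 to 5")
    return sum(ratings) / len(ratings)


def nearest_label(mos):
    """Label of the scale point closest to the mean score."""
    return MOS_LABELS[min(MOS_LABELS, key=lambda k: abs(k - mos))]


if __name__ == "__main__":
    ratings = [4, 4, 5, 3, 4]  # hypothetical ratings from five listeners
    mos = mean_opinion_score(ratings)
    print(mos, nearest_label(mos))
```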
Cochlear Implants
The first fully functional Brain Machine Interface (BMI)
Definition:
A device that electrically stimulates the auditory nerve of patients with severe-to-profound hearing loss to provide them with sound and speech information
Who is a candidate?
• Severe-to-profound sensorineural hearing loss
• Hearing loss did not reach severe-to-profound level until after acquiring oral speech and language skills
• Limited benefit from hearing aids
CI statistics
• Worldwide:
  – Over 100,000 multi-channel implants
• At Univ of Florida:
  – Implanted first patient in 1985
  – Currently follow over 400 cochlear implant patients
Technical and Safety Issues
• Magnetic Resonance Imaging
• Surgical issues
How does the Cochlea encode frequencies?
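One way to put numbers on the place coding of frequency is Greenwood's frequency-position function for the human cochlea. Neither the function nor its commonly quoted constants (165.4, 2.1, 0.88) appear on the slides, so this is an illustrative sketch under that assumption; the function name is hypothetical.

```python
def greenwood_frequency_hz(x):
    """Characteristic frequency at relative position x along the basilar membrane.

    x = 0 is the apex (low frequencies), x = 1 is the base (high frequencies).
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be between 0 and 1")
    return 165.4 * (10.0 ** (2.1 * x) - 0.88)


if __name__ == "__main__":
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(x, round(greenwood_frequency_hz(x)))
```

The mapping is close to logarithmic over most of the cochlea, which is why electrodes placed at equal spacing address roughly equal ratios of frequency.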
Example: New Freedom
CI characteristics
1. Electrode design – Number of electrodes, electrode configuration
2. Type of stimulation – Analog or pulsatile
3. Transmission link – Transcutaneous or percutaneous
4. Signal processing – Waveform representation or feature extraction
Signal processing
• Compressed Analog (CA)
• Continuous Interleaved Sampling (CIS)
• Multiple Peak (MPEAK)
• Spectral Maxima Sound Processor (SMSP)
• Spectral Peak (SPEAK)
Compressed Analog (CA) approach
CA activation signals
Continuous Interleaved Sampling (CIS)
CIS activation signals
Multiple Peak (MPEAK)
MPEAK activated electrodes
Spectral Maxima Sound Processor (SMSP)
SMSP activated electrodes
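A sketch of the spectral-maxima idea: in each analysis cycle only the channels with the largest filter-bank outputs are stimulated. SMSP is commonly described as choosing 6 maxima out of 16 band-pass filters; those numbers are not on the slide and are treated as assumptions here, as are the function name and the example envelope values.

```python
def select_maxima(channel_envelopes, n_maxima):
    """Indices of the n largest filter-bank outputs, returned in channel order."""
    ranked = sorted(range(len(channel_envelopes)),
                    key=lambda i: channel_envelopes[i], reverse=True)
    return sorted(ranked[:n_maxima])


if __name__ == "__main__":
    # Hypothetical envelope amplitudes for 16 band-pass channels in one cycle.
    envelopes = [0.1, 0.4, 0.9, 0.7, 0.3, 0.2, 0.6, 0.8,
                 0.5, 0.1, 0.05, 0.3, 0.2, 0.1, 0.05, 0.02]
    print(select_maxima(envelopes, n_maxima=6))
```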
Spectral Peak (SPEAK)
SPEAK activated electrodes
Outcomes for Post-lingual Adults
• Wide range of success
• Most score 90-100% on AV sentence materials
• Majority score > 80% on high context
• Performance more varied on single word tests
Auditory Brainstem Implant
• Approved October 20, 2000
• Uses the Nucleus 24 system processors
• Plate array with 21 electrodes
Review-1
Pinna:
  ITDs, IIDs: Horizontal localization
  Reflections: Vertical localization
Ear canal:
  ¼ wave resonance 1-3 kHz
Middle ear:
  Amplification by lever action and by area
  Stapedius reflex
Cochlea:
  IHCs/OHCs: convert mechanical to electrical
  Place theory: frequency analysis
  Missing fundamental
Review-2
Adaptation:
  AN firing sensitive to changes
Otoacoustic emissions:
  Produced by movement of OHCs
Beats:
  Monaural & binaural
Measurement of hearing:
  Audiogram: threshold of hearing
  Threshold variation with age
  Equal loudness curves
Bass loss problem:
  Discrimination against LFs
Review-3
Critical bands:
  Used for efficient encoding
  Bark scale
Masking:
  Frequency: LFs mask more
  Temporal: simultaneous, pre and post
Hearing impairment:
  Hearing aids: external to cochlea
  Cochlear implants: inside cochlea