![Page 1: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/1.jpg)
Meena Ramani
04/10/06
EEL 6586 Automatic Speech Processing
![Page 2: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/2.jpg)
Topics to be covered
Lecture 1: The incredible sense of hearing 1
Anatomy
Perception of Sound
Lecture 2: The incredible sense of hearing 2
Psychoacoustics
Hearing aids and cochlear implants
![Page 3: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/3.jpg)
Lecture 1: The incredible sense of hearing
“Behind these unprepossessing flaps lie structures of such delicacy that they shame the most skillful craftsman”
-Stevens, S.S. [Professor of Psychophysics, Harvard University]
![Page 4: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/4.jpg)
Why study hearing?
• Best example of speech recognition
– Mimic human speech processing
• Hearing aids / cochlear implants
• Speech coding
![Page 5: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/5.jpg)
Interesting facts
• The stapes, or stirrup, is the smallest bone in our body.
– It is roughly the size of a grain of rice (~2.5 mm)
• The eardrum moves less than the diameter of a hydrogen atom
– For minimum audible sounds
• The inner ear reaches its full adult size when the fetus is 20-22 weeks old.
• The ears are responsible for keeping the body in balance
• Hearing loss is the number one disability in the world.
– 76.3% of people lose their hearing at age 19 and over
![Page 6: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/6.jpg)
Specifications
Frequency range: 20 Hz-20 kHz
Dynamic range: 0-130 dB
JND frequency: 5 cents
JND intensity: ~1 dB
Size of cochlea: smaller than a dime
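The specification figures above are easier to compare once converted to linear quantities. The following is a minimal sketch of those conversions; the function names are illustrative, and the 1 kHz reference tone used for the frequency JND is an assumption rather than something stated on the slide.

```python
import math


def db_to_intensity_ratio(level_db: float) -> float:
    """Intensity (power) ratio corresponding to a level difference in dB."""
    return 10.0 ** (level_db / 10.0)


def db_to_pressure_ratio(level_db: float) -> float:
    """Sound-pressure (amplitude) ratio corresponding to a level difference in dB."""
    return 10.0 ** (level_db / 20.0)


def cents_to_frequency_ratio(cents: float) -> float:
    """Frequency ratio for a pitch interval in cents (1200 cents = 1 octave)."""
    return 2.0 ** (cents / 1200.0)


def jnd_in_hz(frequency_hz: float, jnd_cents: float = 5.0) -> float:
    """Frequency step in Hz equal to a JND of `jnd_cents` at `frequency_hz`."""
    return frequency_hz * (cents_to_frequency_ratio(jnd_cents) - 1.0)


if __name__ == "__main__":
    print(f"130 dB as an intensity ratio: {db_to_intensity_ratio(130.0):.1e}")
    print(f"130 dB as a pressure ratio:   {db_to_pressure_ratio(130.0):.1e}")
    # 1 kHz is an assumed example frequency, not a value from the slide.
    print(f"5 cents at 1 kHz:             {jnd_in_hz(1000.0):.2f} Hz")
```

Under these conversions, a 130 dB dynamic range is an intensity ratio of 10^13, and a 5-cent JND at 1 kHz is a step of a little under 3 Hz.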
![Page 7: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/7.jpg)
Anatomy
![Page 8: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/8.jpg)
Outer ear
Parts: Pinna/Auricle, Auditory Canal
Focuses sound waves (variations in pressure) into the ear canal
Pinna size:
• Inverse square law
• A larger pinna captures more of the wave
• Elephants hear low-frequency sound from up to 5 miles away
Human pinna structure:
• Pointed forward, with a number of curves
• Helps in sound localization
• More sensitive to sounds in front
Dogs/cats: movable pinna => focus on sounds from a particular direction
![Page 9: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/9.jpg)
Sound Localization
Outer ear: Pinna/Auricle, Auditory Canal
Horizontal localization: Is sound on your right or left side?
• Interaural Time Difference (ITD)
• Interaural Intensity Difference (IID)
Vertical localization
![Page 10: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/10.jpg)
Interaural differences
- The signal needs to travel further to the more distant ear
- The more distant ear is partially occluded by the head
Two types of interaural difference will emerge
- Interaural time difference (ITD)
- Interaural intensity difference (IID)
![Page 11: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/11.jpg)
Illustration of interaural differences
[Figure: left-ear and right-ear waveforms plotted against time from sound onset]
![Page 12: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/12.jpg)
Illustration of interaural differences
[Figure: left-ear and right-ear waveforms against time from sound onset, with the arrival time difference marked]
![Page 13: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/13.jpg)
Illustration of interaural differences
[Figure: left-ear and right-ear waveforms against time from sound onset, with the ongoing time difference marked]
![Page 14: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/14.jpg)
Illustration of interaural differences
[Figure: left-ear and right-ear waveforms against time from sound onset, with the intensity difference marked]
![Page 15: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/15.jpg)
Thresholds
Interaural time differences (ITDs)
Threshold ITD: 10-20 µs (~0.7 cm)
Interaural intensity differences (IIDs)
Threshold IID: 1 dB
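The path length quoted next to the threshold ITD follows from multiplying the time difference by the speed of sound. The sketch below shows that arithmetic, plus a rough conversion to a source angle. The speed of sound (343 m/s), the ear spacing (0.18 m), and the simple two-point geometry that ignores diffraction around the head are all assumptions made for illustration, not values from the slides.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C
EAR_SPACING_M = 0.18        # assumed, hypothetical distance between the ears


def itd_to_path_difference_cm(itd_s: float, c: float = SPEED_OF_SOUND_M_S) -> float:
    """Extra distance (cm) sound travels to the far ear for a given ITD."""
    return itd_s * c * 100.0


def itd_to_azimuth_deg(itd_s: float,
                       ear_spacing_m: float = EAR_SPACING_M,
                       c: float = SPEED_OF_SOUND_M_S) -> float:
    """Source angle from straight ahead under the simplified model
    ITD = ear_spacing * sin(azimuth) / c (no head diffraction)."""
    return math.degrees(math.asin(itd_s * c / ear_spacing_m))


if __name__ == "__main__":
    for itd_us in (10.0, 20.0):
        itd_s = itd_us * 1e-6
        print(f"ITD {itd_us:.0f} us: "
              f"path difference {itd_to_path_difference_cm(itd_s):.2f} cm, "
              f"azimuth about {itd_to_azimuth_deg(itd_s):.1f} degrees")
```

With these assumed constants, a 20 µs ITD corresponds to a path difference of about 0.7 cm, which is the figure quoted with the threshold.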
![Page 16: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/16.jpg)
Duplex Theory
Interaural time differences (ITDs): low frequencies
• Up to around 1500 Hz; sensitivity declines rapidly above 1000 Hz
• Smallest phase difference corresponds to the true ITD
Interaural intensity differences (IIDs): high frequencies
• The amount of attenuation varies across frequency
• Below 500 Hz, IIDs are negligible (due to diffraction)
• IIDs can reach up to 20 dB at high frequencies
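The statement that the smallest phase difference corresponds to the true ITD holds only while the ITD is shorter than half a period of the tone, which is one way to see why ITD cues are tied to low frequencies. A minimal sketch of that check follows; the function name and the 0.6 ms example ITD (roughly what a sound from far to one side produces for a human head) are assumptions, not values from the slides.

```python
import math


def itd_from_smallest_phase(frequency_hz: float, true_itd_s: float) -> float:
    """ITD a listener would infer from a pure tone if only the interaural
    phase difference, wrapped to the range [-pi, pi], were available."""
    phase = 2.0 * math.pi * frequency_hz * true_itd_s
    # math.remainder wraps the phase to the nearest multiple of 2*pi,
    # i.e. it picks the smallest equivalent phase difference.
    wrapped = math.remainder(phase, 2.0 * math.pi)
    return wrapped / (2.0 * math.pi * frequency_hz)


if __name__ == "__main__":
    true_itd_s = 0.6e-3  # assumed example: source far to one side
    for frequency_hz in (250.0, 500.0, 800.0, 1200.0, 1500.0):
        inferred = itd_from_smallest_phase(frequency_hz, true_itd_s)
        status = "correct" if math.isclose(inferred, true_itd_s, rel_tol=1e-9) else "ambiguous"
        print(f"{frequency_hz:6.0f} Hz: inferred ITD {inferred * 1e3:+.3f} ms ({status})")
```

In this sketch the inferred ITD matches the true one at 250 Hz, 500 Hz and 800 Hz, but at 1200 Hz and 1500 Hz the wrapped phase points to a different, even opposite-sided, delay.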
![Page 17: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/17.jpg)
Sound Localization
Outer ear: Pinna/Auricle, Auditory Canal
Horizontal localization
Vertical localization: Is sound above or below?
Pinna Directional Filtering
• The pinna amplifies sounds from above and below differently
• Curves in its structure selectively amplify certain parts of the sound spectrum
![Page 18: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/18.jpg)
Outer ear
Parts: Pinna/Auricle, Auditory Canal
• Closed tube resonance: ¼ wave resonator
• Auditory canal length: 2.7 cm
• Resonance frequency: ~3 kHz
• Boosts energy between 2 and 5 kHz by up to 15 dB
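The resonance frequency of the auditory canal can be checked from the quarter-wave relation for a tube closed at one end, f = c / (4L). The sketch below is a minimal illustration; the speed of sound of 343 m/s and the function names are assumptions, and the canal is idealized as a rigid uniform tube.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C


def quarter_wave_resonance_hz(tube_length_m: float,
                              c: float = SPEED_OF_SOUND_M_S) -> float:
    """Lowest resonance of a tube open at one end and closed at the other."""
    return c / (4.0 * tube_length_m)


def quarter_wave_modes_hz(tube_length_m: float, count: int = 3,
                          c: float = SPEED_OF_SOUND_M_S) -> list[float]:
    """First `count` resonances; a closed tube supports odd multiples only."""
    fundamental = quarter_wave_resonance_hz(tube_length_m, c)
    return [(2 * n - 1) * fundamental for n in range(1, count + 1)]


if __name__ == "__main__":
    canal_length_m = 0.027  # 2.7 cm, from the slide
    print(f"Resonance: {quarter_wave_resonance_hz(canal_length_m):.0f} Hz")
    print("Modes:", [round(f) for f in quarter_wave_modes_hz(canal_length_m)])
```

For a 2.7 cm canal this gives a fundamental just under 3.2 kHz, consistent with the ~3 kHz on the slide.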
![Page 19: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/19.jpg)
Anatomy
![Page 20: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/20.jpg)
Middle Ear
Pressure variations are converted to mechanical motion: Eardrum => Ossicles => Oval window
Ossicles: Malleus, Incus, Stapes
Impedance matching
– Acoustic impedance of the fluid is 4000x that of air
– Without matching, all but 0.1% of the sound would be reflected back
Amplification
– By lever action: < 3x
– Area amplification [55 mm² to 3.2 mm²]: 15x
Stapedius reflex
– Protection against loud low-frequency sounds
– Tensed muscles stiffen the ossicles and damp their vibration
– Reduces the sound transmitted (by 20 dB)
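The 0.1% figure can be reproduced from the standard expression for power transmission across a boundary between two media at normal incidence, T = 4r / (1 + r)^2, where r is the impedance ratio. The sketch below also estimates the pressure gain from the area reduction. The function names are illustrative, and the lever ratio of 1.3 is an assumed example value (the slide only says it is below 3x).

```python
import math


def transmitted_power_fraction(impedance_ratio: float) -> float:
    """Fraction of incident sound power transmitted across a boundary whose
    two sides differ in acoustic impedance by `impedance_ratio`."""
    r = impedance_ratio
    return 4.0 * r / (1.0 + r) ** 2


def pressure_gain_db(area_in_mm2: float, area_out_mm2: float,
                     lever_ratio: float = 1.0) -> float:
    """Pressure gain in dB from collecting force over a large area, applying
    it to a small area, and multiplying it by a lever ratio."""
    return 20.0 * math.log10((area_in_mm2 / area_out_mm2) * lever_ratio)


if __name__ == "__main__":
    t = transmitted_power_fraction(4000.0)
    print(f"Transmitted without matching: {100.0 * t:.3f}% "
          f"({10.0 * math.log10(t):.1f} dB)")
    print(f"Area ratio 55/3.2: {55.0 / 3.2:.1f}x")
    # lever_ratio=1.3 is an assumed illustrative value, not from the slide.
    print(f"Pressure gain with lever 1.3: {pressure_gain_db(55.0, 3.2, 1.3):.1f} dB")
```

With the slide's numbers, the transmitted fraction comes out at roughly 0.1%, a loss of about 30 dB, and the ratio of the two quoted areas is about 17, the same order as the 15x quoted on the slide.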
![Page 21: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/21.jpg)
Anatomy
![Page 22: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/22.jpg)
Inner Ear
Parts: Semicircular Canals, Cochlea
Semicircular canals:
• Body's balance organs
• Accelerometers in 3 perpendicular planes
• Hair cells detect fluid movements
• Connected to the auditory nerve
![Page 23: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/23.jpg)
Inner Ear
Parts: Semicircular Canals, Cochlea
The cochlea is a snail-shell-like structure with 2.5 turns
3 fluid-filled parts:
• Scala tympani
• Scala vestibuli
• Cochlear duct (Organ of Corti)
[Figure labels: (1) Organ of Corti, (2) Scala tympani, (3) Scala vestibuli, (4) Spiral ganglion, (5) Auditory nerve fibres]
![Page 24: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/24.jpg)
Inner Ear
Parts: Semicircular Canals, Cochlea
Organ of Corti
• Basilar membrane
• Inner hair cells (IHC) and outer hair cells (16,000-20,000)
• IHC: 100 tiny stereocilia
The body's microphone:
• Vibrations of the oval window cause the cochlear fluid to vibrate
• Basilar membrane vibration produces a traveling wave
• Bending of the IHC cilia produces action potentials
• The outer hair cells amplify vibrations of the basilar membrane
![Page 25: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/25.jpg)
The cochlea works as a frequency analyzer. It operates on the incoming sound’s frequencies.
![Page 26: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/26.jpg)
Place Theory
• Each position along the basilar membrane (BM) has a characteristic frequency for maximum vibration
• Frequency of vibration depends on the place along the BM
• At the base, the BM is stiff and thin (more responsive to high frequencies)
• At the apex, the BM is wide and floppy (more responsive to low frequencies)
[Figure labels: 32-35 mm long; 4 mm², 1 mm²]
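One common way to put numbers on the place-to-frequency relationship is Greenwood's empirical function for the human cochlea, f = 165.4 (10^(2.1x) - 0.88), where x is the relative distance from the apex (0) to the base (1). This formula is not part of the slides; it is brought in here only as an illustrative sketch, and the 35 mm membrane length used for the millimetre conversion is an assumed value taken from the upper end of the range on the slide.

```python
def greenwood_frequency_hz(relative_position_from_apex: float) -> float:
    """Characteristic frequency at a relative position along the basilar
    membrane (0.0 = apex, 1.0 = base), using Greenwood's human constants."""
    x = relative_position_from_apex
    if not 0.0 <= x <= 1.0:
        raise ValueError("relative position must be between 0 and 1")
    return 165.4 * (10.0 ** (2.1 * x) - 0.88)


if __name__ == "__main__":
    membrane_length_mm = 35.0  # assumed; the slide gives 32-35 mm
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"{x * membrane_length_mm:5.1f} mm from apex: "
              f"{greenwood_frequency_hz(x):8.0f} Hz")
```

The endpoints come out near 20 Hz at the apex and a little over 20 kHz at the base, which lines up with the hearing range and with the base being the high-frequency end.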
![Page 27: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/27.jpg)
Tuning curves of auditory nerve fibers
The response curve is a band-pass filter (BPF) with almost constant Q (= f0/BW)
To determine the tonotopic map on the cochlea:
• Apply 50 ms tone bursts every 100 ms
• Increase the sound level until the discharge rate increases by 1 spike
• Repeat for all frequencies
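A constant Q means the bandwidth of each fiber's band-pass response grows in proportion to its centre frequency, since BW = f0 / Q. The short sketch below tabulates that relationship; the Q value of 8 is a hypothetical placeholder chosen only to make the numbers concrete, not a measured figure.

```python
def bandwidth_hz(center_frequency_hz: float, q: float) -> float:
    """Bandwidth of a band-pass response with quality factor q = f0 / BW."""
    return center_frequency_hz / q


if __name__ == "__main__":
    assumed_q = 8.0  # hypothetical value for illustration only
    for f0 in (250.0, 1000.0, 4000.0):
        print(f"f0 = {f0:6.0f} Hz -> BW = {bandwidth_hz(f0, assumed_q):6.1f} Hz")
```

Because Q is held fixed, quadrupling the centre frequency quadruples the bandwidth, so frequency resolution is roughly constant on a logarithmic scale rather than in Hz.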
![Page 28: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/28.jpg)
Auditory Neuron
• Carries impulses from both the cochlea and the semicircular canals
• Connections with both auditory areas of the brain
• Neurons encode
– Steady state sounds
– Onsets or rapidly changing frequencies
[Figure: Auditory Area of Brain]
![Page 29: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/29.jpg)
Auditory Neuron Adaptation
• At onset, the firing rate of an auditory neuron fiber increases rapidly
• If the stimulus remains (a steady tone, for example), the rate decreases exponentially
• Spontaneous rate: neuron firings in the absence of a stimulus
The neuron is more responsive to changes than to steady inputs
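The adaptation behaviour described above can be sketched as an exponential decay from an onset rate toward a steady-state rate. This is a minimal toy model, not a fitted one: every numeric value below (spontaneous, onset, and steady rates in spikes per second, and the 20 ms time constant) is a hypothetical placeholder.

```python
import math


def firing_rate(t_s: float,
                spontaneous_rate: float = 20.0,   # hypothetical, spikes/s
                onset_rate: float = 300.0,        # hypothetical, spikes/s
                steady_rate: float = 120.0,       # hypothetical, spikes/s
                time_constant_s: float = 0.02) -> float:
    """Firing rate at time t_s for a steady tone switched on at t = 0."""
    if t_s < 0.0:
        # Before the stimulus starts, only spontaneous firing is present.
        return spontaneous_rate
    decay = math.exp(-t_s / time_constant_s)
    return steady_rate + (onset_rate - steady_rate) * decay


if __name__ == "__main__":
    for t_ms in (-10, 0, 10, 20, 50, 100):
        print(f"t = {t_ms:4d} ms: {firing_rate(t_ms / 1000.0):6.1f} spikes/s")
```

The rate jumps at onset and then settles toward the steady value, which is why the response emphasizes changes over sustained input.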
![Page 30: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/30.jpg)
Perception of Sound
Threshold of hearing
– How it is measured
– Age effects
Equal Loudness curves
Bass loss problem
Critical bands
Frequency Masking
Temporal Masking
![Page 31: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/31.jpg)
Threshold of Hearing
The hearing area is the region between the threshold in quiet and the threshold of pain
![Page 32: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/32.jpg)
Bekesy Tracking
STEPS:
• Play a tone
• Vary its amplitude until it is audible
• Then reduce the tone’s amplitude until it is definitely inaudible, and slowly change the frequency
• Continue
![Page 33: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/33.jpg)
Threshold variation with age
•Presbycusis
•Hearing sensitivity decreases with age, especially at high frequencies
•Threshold of pain remains the same
•Reduced dynamic range
![Page 34: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/34.jpg)
Equal Loudness Curves
Loudness is not simply sound intensity!
A factor-of-ten increase in intensity is needed for a sound to be perceived as twice as loud.
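The rule that ten times the intensity sounds twice as loud can be written as a small calculation. The sketch below uses the conventional sone scale, in which 40 phons is defined as 1 sone and each additional 10 phons doubles the loudness; the function names are illustrative, and the rule is only an approximation that works best above about 40 phons.

```python
import math


def phon_to_sone(loudness_level_phon: float) -> float:
    """Perceived loudness in sones: 40 phons = 1 sone, +10 phons doubles it."""
    return 2.0 ** ((loudness_level_phon - 40.0) / 10.0)


def loudness_ratio(intensity_ratio: float) -> float:
    """Approximate perceived-loudness ratio for a given intensity ratio,
    using 'a factor of ten in intensity is a factor of two in loudness'."""
    return 2.0 ** math.log10(intensity_ratio)


if __name__ == "__main__":
    for phon in (40.0, 50.0, 60.0, 80.0):
        print(f"{phon:.0f} phons -> {phon_to_sone(phon):.0f} sones")
    print(f"100x intensity -> about {loudness_ratio(100.0):.0f}x as loud")
```

So a hundredfold increase in intensity (20 dB) is heard as only about four times as loud.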
![Page 35: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/35.jpg)
The Bass Loss Problem
E.g. rock music:
• Volume too low => no bass
• Volume too high => too much bass
For very soft sounds, near the threshold of hearing, the ear strongly discriminates against low frequencies.
For mid-range sounds around 60 phons, the discrimination is not so pronounced.
For very loud sounds in the neighborhood of 120 phons, the hearing response is more nearly flat.
![Page 36: Meena Ramani 04/10/06 EEL 6586 Automatic Speech Processing](https://reader035.vdocument.in/reader035/viewer/2022062515/56649cee5503460f949bc782/html5/thumbnails/36.jpg)
Elephants
Sound Production
• A typical male elephant’s rumble has an average minimum frequency of around 12 Hz, a female's around 13 Hz, and a calf's around 22 Hz.
• They produce sounds ranging over more than 10 octaves, from 5 Hz to over 9,000 Hz.
• They produce very gentle, soft sounds as well as extremely powerful sounds (112 dB recorded a meter away).
Hearing
• Wider tympanic membranes
• Longer ear canals (20 cm)
• Spacious middle ears
• Low-frequency detection
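Two of the elephant figures can be checked with short calculations: the number of octaves between 5 Hz and 9,000 Hz, and the resonance of a 20 cm ear canal. The second part is an extrapolation, assuming the canal behaves like the quarter-wave closed tube described earlier for the human ear and that sound travels at 343 m/s; neither assumption is stated for elephants in the slides.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C


def octave_span(low_hz: float, high_hz: float) -> float:
    """Number of octaves (doublings of frequency) between two frequencies."""
    return math.log2(high_hz / low_hz)


def quarter_wave_resonance_hz(tube_length_m: float,
                              c: float = SPEED_OF_SOUND_M_S) -> float:
    """Lowest resonance of a tube closed at one end (idealized ear canal)."""
    return c / (4.0 * tube_length_m)


if __name__ == "__main__":
    print(f"5 Hz to 9000 Hz: {octave_span(5.0, 9000.0):.1f} octaves")
    print(f"20 cm canal:  {quarter_wave_resonance_hz(0.20):.0f} Hz")
    print(f"2.7 cm canal: {quarter_wave_resonance_hz(0.027):.0f} Hz")
```

The span works out to just under 11 octaves, in line with "more than 10 octaves", and under the idealized tube model the longer canal resonates several times lower than the human one, which fits the emphasis on low-frequency detection.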