09 march 2013

Second International Conference on Intelligent Interactive Technologies and Multimedia (IITM 2013), 09-11 March 2013, Allahabad, India 09 March 2013 Speech Processing for Persons with Moderate Sensorineural Hearing Impairment Prem C. Pandey EE Dept., IIT Bombay pcpandey @ ee.iitb.ac.in www.ee.iitb.ac.in/~pcpandey, www.ee.iitb.ac.in/~spilab

Speech Processing Under Adverse Listening Conditions

Second International Conference on Intelligent Interactive Technologies and Multimedia (IITM 2013), 09-11 March 2013, Allahabad, India

09 March 2013

Speech Processing for Persons with Moderate Sensorineural Hearing Impairment

Prem C. Pandey

EE Dept., IIT Bombay

pcpandey @ ee.iitb.ac.in
www.ee.iitb.ac.in/~pcpandey, www.ee.iitb.ac.in/~spilab

Outline

A. Speech & Hearing

B. Noise Suppression

S. K. Waddi, P. C. Pandey, N. Tiwari: "Speech Enhancement Using Spectral Subtraction and Cascaded Median Based Noise Estimation for Hearing Impaired Listeners" (Proc. NCC 2013, Delhi, 15-17 Feb. 2013, Paper 3.2_2_1569696063)

C. Reducing the Effect of Increased Spectral Masking

N. Tiwari, P. C. Pandey, P. N. Kulkarni: "Real-time Implementation of Multi-band Frequency Compression for Listeners with Moderate Sensorineural Impairment" (Proc. Interspeech 2012, Portland, Oregon, 9-13 Sept. 2012, Paper 689)

Speech Production Mechanism

Excitation source & filter model
  Excitation: voiced/unvoiced glottal, frication
  Filtering: vocal tract filter

Speech segments
  Words
  Syllables
  Phonemes
  Sub-phonemic segments

Phonemes: basic speech units
  Vowels: pure vowels, diphthongs
  Consonants: semivowels, stops, fricatives, affricates, nasals

/aba/  /apa/  /aga/  /ada/

Features
  Modes of excitation
    Glottal
      Unvoiced (aspiration, constriction at the glottis)
      Voiced (vibration of vocal cords)
    Frication
      Unvoiced (constriction in vocal tract)
      Voiced (constriction in vocal tract & glottal vibration)
  Movement of articulators
    Continuant (steady-state vocal tract configuration): vowels, nasal stops, fricatives
    Non-continuant (changing vocal tract): diphthongs, semivowels, oral stops (plosives)
  Place of articulation (place of maximum constriction in vocal tract)
    Bilabial, labio-dental, linguo-dental, alveolar, palatal, velar, glottal
  Changes in voicing frequency (F0)

Supra-segmental features
  Intonation
  Rhythm

Hearing Mechanism

Peripheral auditory system
  External ear (sound collection)
    Pinna
    Auditory canal
  Middle ear (impedance matching)
    Ear drum
    Middle ear bones
  Inner ear (analysis and transduction): cochlea
  Auditory nerve (transmission of neural impulses)

Central auditory system
  Information processing & interpretation

[Figures: Tonotopic map of cochlea; Auditory system]

Types of hearing losses
  Conductive loss
  Sensorineural loss
  Central loss
  Functional loss

Sensorineural hearing loss
  Elevated hearing thresholds
    Reduced intelligibility as speech components are inaudible
  Reduced dynamic range & loudness recruitment (abnormal loudness growth)
    Distortion of loudness relationship among speech components
  Increased temporal masking
    Poor detection of acoustic events
  Increased spectral masking (due to widening of auditory filters)
    Reduced frequency selectivity
    Reduced ability to sense spectral shapes of speech sounds
  >> Poor intelligibility and degraded perception of speech

Hearing

Available
  Frequency selective amplification
    Improves audibility but may not improve intelligibility in presence of noise
  Automatic volume control
  Multichannel dynamic range compression (settable attack time, release time, and compression ratios)
    Compresses the natural dynamic range into the reduced dynamic range

Under investigation
  Improvement of consonant-to-vowel ratio (CVR): for reducing the effects of increased temporal masking
  Techniques for reducing the effects of increased spectral masking: binaural dichotic presentation, spectral contrast enhancement, multi-band frequency compression
  Noise suppression
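As a rough illustration of the multichannel dynamic range compression listed among the available techniques, the Python sketch below applies a static compression curve with separate attack and release smoothing to a single band. The threshold, compression ratio, and time constants are illustrative assumptions, and the function names are hypothetical; none of them is taken from the talk. A multichannel aid would run one such compressor per band after a filter bank.

```python
import math


def static_gain_db(level_db, threshold_db=-40.0, ratio=3.0):
    """Gain in dB of a simple static compression curve (assumed values).

    Below the threshold the gain is 0 dB; above it, every `ratio` dB of
    input level increase yields only 1 dB of output level increase.
    """
    if level_db <= threshold_db:
        return 0.0
    return (threshold_db - level_db) * (1.0 - 1.0 / ratio)


def compress_band(samples, fs, threshold_db=-40.0, ratio=3.0,
                  attack_ms=5.0, release_ms=50.0):
    """Compress one band using an envelope follower with attack/release."""
    # One-pole smoothing coefficients for rising and falling envelopes.
    a_att = math.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
    env = 0.0
    out = []
    for x in samples:
        mag = abs(x)
        coeff = a_att if mag > env else a_rel
        env = coeff * env + (1.0 - coeff) * mag
        level_db = 20.0 * math.log10(max(env, 1e-9))
        gain = 10.0 ** (static_gain_db(level_db, threshold_db, ratio) / 20.0)
        out.append(x * gain)
    return out
```

With these assumed settings, an input 30 dB above the threshold comes out 10 dB above it, which is the sense in which the natural dynamic range is squeezed into a reduced one.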

Signal processing in hearing aids

Hearing Aids
  Pre-amp → AVC → Selectable Freq. Response → Amp.

Programmable Digital Hearing Aids
  Pre-amp → AVC → Multi-band Amplitude Compression & Freq. Response → Amp.

Major Problems
  Noisy environment & reverberation
  Distortions due to multiband amplitude compression
  Poor speech perception due to increased spectral & temporal masking
  Visit to audiologist for change of settings

Proposed Hearing Aids (with user selectable settings)
  Pre-amp → AVC → Noise Suppression → Processing for Reducing the Effects of Increased Spectral Masking → Processing for Reducing the Effects of Increased Temporal Masking → Multi-band Amplitude Compression & Freq. Response → Amp.

Research Objectives

Developing techniques for improving speech perception by listeners with moderate-to-severe sensorineural loss
  Reduction of effects of increased spectral masking
    Binaural aids: binaural dichotic presentation using comb filters for spectral splitting
    Monaural aids: multiband frequency compression
  Reduction of spectral masking
    Enhancement of transient parts (weak & short but perceptually important)
  Noise suppression

Implementation of the techniques using a low-power DSP chip for real-time operation and with acceptable signal delay (< 60 ms)

P. C. Pandey (EE Dept, IIT Bombay): "Speech Processing for Persons with Moderate Sensorineural Hearing Impairment", Plenary talk, Second International Conference on Intelligent Interactive Technologies and Multimedia (IITM 2013), 09-11 March 2013, Allahabad, India

Abstract

Our objective is to develop techniques for improving speech perception by listeners with moderate-to-severe sensorineural loss and to implement these techniques using a low-power DSP chip for real-time operation and with acceptable signal delay (< 60 ms). Here we present two techniques to reduce the adverse effects of increased spectral masking associated with sensorineural loss. The first technique reduces the effects of noise in the listening environment and the second one reduces the effects of increased intra-speech spectral masking.

A spectral subtraction technique is presented for real-time speech enhancement in the aids used by hearing impaired listeners. For reducing computational complexity and memory requirement, it uses a cascaded-median based estimation of the noise spectrum without voice activity detection. The technique is implemented and tested for satisfactory real-time operation, with sampling frequency of 12 kHz, processing using window length of 30 ms with 50% overlap, and noise estimation by 3-frame 4-stage cascaded-median, on a 16-bit fixed-point DSP processor with on-chip FFT hardware. Enhancement of speech with different types of additive stationary and non-stationary noise resulted in SNR advantage of 4 to 13 dB.

Widening of auditory filters in persons with sensorineural hearing impairment leads to increased spectral masking and degraded speech perception. Multi-band frequency compression of the complex spectral samples using pitch-synchronous processing has been reported to increase speech perception by persons with moderate sensorineural loss. It is shown that implementation of multi-band frequency compression using fixed-frame processing along with least-squares error based signal estimation reduces the processing delay and the speech output is perceptually similar to that from pitch-synchronous processing. The processing is implemented on a 16-bit fixed-point DSP processor and real-time operation is achieved using about one-tenth of its computing capacity.
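The noise suppression scheme summarized in the abstract can be sketched in Python as follows. This is one floating-point reading of "3-frame 4-stage cascaded-median" estimation, not the authors' fixed-point DSP code: each stage takes the per-bin median of 3 values and passes it to the next stage, so the final estimate refreshes every 3^4 = 81 frames. The sampling frequency, window length, and overlap come from the abstract. The FFT size, the subtraction factor `alpha`, the spectral floor `beta`, the choice of magnitude-domain subtraction, the zero initial noise estimate, and the names `CascadedMedian` and `enhance` are assumptions made for illustration.

```python
import numpy as np


class CascadedMedian:
    """Per-bin cascaded median over frames_per_stage ** n_stages frames."""

    def __init__(self, n_bins, frames_per_stage=3, n_stages=4):
        self.p = frames_per_stage
        self.buffers = [[] for _ in range(n_stages)]
        self.estimate = np.zeros(n_bins)  # assumed start: no noise estimate
        self.ready = False

    def update(self, mag):
        """Feed one magnitude spectrum and return the current estimate."""
        value = mag
        for buf in self.buffers:
            buf.append(value)
            if len(buf) < self.p:
                return self.estimate  # this stage is still filling up
            value = np.median(np.stack(buf), axis=0)
            buf.clear()  # the median moves on to the next stage
        self.estimate = value  # the last stage produced a new estimate
        self.ready = True
        return self.estimate


def enhance(x, fs=12000, frame_ms=30, n_fft=512, alpha=2.0, beta=0.01):
    """Spectral subtraction with 50% overlap-add and no voice activity detection."""
    n = int(fs * frame_ms / 1000)  # 360 samples per frame at 12 kHz
    hop = n // 2                   # 50% overlap
    # Periodic Hann window: shifted copies sum to 1 at 50% overlap.
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n) / n)
    noise_est = CascadedMedian(n_fft // 2 + 1)
    out = np.zeros(len(x) + n_fft)
    for start in range(0, len(x) - n + 1, hop):
        spec = np.fft.rfft(x[start:start + n] * window, n_fft)
        mag, phase = np.abs(spec), np.angle(spec)
        noise = noise_est.update(mag)
        # Subtract the scaled noise estimate, keeping a small spectral floor.
        clean = np.maximum(mag - alpha * noise, beta * noise)
        out[start:start + n_fft] += np.fft.irfft(clean * np.exp(1j * phase), n_fft)
    return out[:len(x)]
```

Because each stage stores only 3 spectra, the estimator keeps 12 buffered spectra instead of the 81 needed for a direct median over the same span, which is in line with the reduced memory requirement mentioned in the abstract. With a 15 ms hop, 81 frames correspond to roughly 1.2 s between noise estimate updates.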