[email protected] ee dept., iit bombay ncc 2015, mumbai, 27 feb.- 1 mar. 2015, paper no....

Slide 1

5. Summary & ConclusionsProposed technique: Suppression of stationary & non-stationary background noise by estimation of noise spectrum using dynamic quantile tracking without voice activity detection or storage & sorting of past samples.Speech enhancement: SNR advantage (at PESQ score = 2) of 3 6 dB for different stationary & non-stationary noises.Implementation for real-time operation using 16-bit fixed-point processor TI/TMS320C5515: signal delay 36 ms, processing capacity required 41%.Technique permits use of frequency-dependent quantile for noise estimation without introducing processing overheads.Further work Combination of noise suppression with other processing techniques in sensory aids Implementation using other processors12345#/[email protected] Dept., IIT BombayOverview1.Introduction2.Signal Processing for Speech Enhancement 3.Implementation for Real-time Processing4.Test Results5.Summary & Conclusion#/[email protected] Dept., IIT Bombay23. Implementation for Real-time ProcessingDSP board with 16-bit fixed-point processor with on-chip FFT hardwareDSP chip: TI/TMS320C551516 MB memory space (320 KB on-chip RAM with 64 KB dual access data memory)Three 32-bit programmable timers4 DMA controllers each with 4 channelsFFT hardware accelerator (up to 1024-point FFT)Max. clock speed: 120 MHz DSP Board: eZdsp4 MB on-board NOR flash for user programStereo codec TLV320AIC3204: 16/20/24/32-bit ADC & DAC, 8 192 kHz samplingSoftware development: C using TI's 'CCStudio ver. 4.0 12345#/[email protected] Dept., IIT Bombay16Signal processing techniques currently used in hearing aidsFrequency-selective amplification to compensate for frequency dependent elevation of hearing thresholdsAmplitude compression to compensate for decreased dynamic range

Processing for reducing the effects of increased spectral & temporal masking in sensorineural lossBinaural dichotic presentation (Lunner et al. 1993, Kulkarni et al. 2012)Spectral contrast enhancement (Yang et al. 2003) Multiband frequency compression (Arai et al. 2004, Kulkarni et al. 2012)Consonant-vowel ratio enhancement (Jayan et al. 2014) 12345#/[email protected] Dept., IIT Bombay4Suppression of background noise for speech enhancementFor improving speech quality and intelligibilityAs a pre-processing stage for signal processing techniques to improve speech perception by hearing impaired listeners

Single-input speech enhancement using spectral subtraction (Boll 1979, Berouti et al.1979, Martin 1994, Loizou 2007)Short-time spectral analysisDynamic estimation of stationary or non-stationary noise spectrum using voice activity detection during non-speech segments, or continuously using statistical techniquesEstimation of noise-free speech spectrum using spectral noise subtraction, or multiplication by noise suppression functionSpeech resynthesis using enhanced magnitude and noisy phase12345#/[email protected] Dept., IIT Bombay5Noise spectrum estimationDynamic estimation of stationary & non-stationary noise spectrum.Under-estimation results in residual noise & over-estimation results in distortion leading to degraded quality with reduced intelligibility.Techniques for noise spectrum estimationNoise estimation during silence intervals using voice activity detector: Detection may not be satisfactory under low-SNR conditions & the method may not correctly track the noise spectrum during long speech segments.Minimum statistics based estimation of noise spectrum: Needs estimation of SNR-dependent subtraction factor, may underestimate noise & remove parts of speech signal during weaker segments.Quantile-based noise estimation (QBNE): Based on observation that signal energy in a particular frequency bin is low in most of the frames & high only in 10-20% frames corresponding to voiced speech segments. Reported to be useable without voice activity detection & SNR estimation.12345#/[email protected] Dept., IIT Bombay6Spectral subtraction using QBNE techniquesQuantiles obtained by ordering the spectral samples or from dynamically generated histogramsStahl et al. (2000): 0.55-quantile used to estimate the noise spectrum. 26% decrease in word error in speech recognition.Evans & Mason (2002): Time-frequency quantile based noise estimation. 35% improvement in word recognition accuracy.Bai & Wan (2003): A two-pass quantile based noise estimation. SNR estimation for each time-frequency point using a fixed quantile & using it for a new quantile for each sub-band.Not suited for use in hearing aids due to large memory space required for storing the spectral samples & high computational complexity.Cascaded-median based noise estimation (Waddi et al., 2013)Median (0.5-quantile) to reduce computation requirement.Cascaded-median as approximation to median for real-time implementation.Different improvements for different noises indicated need for using frequency dependent quantiles for noise estimation.12345#/[email protected] Dept., IIT Bombay7Research objective Technique for noise spectrum estimation for speech enhancement using spectral subtraction & its real-time implementation for possible use in hearing aids.Proposed technique Dynamic quantile tracking as an approximation to the sample quantile obtained by sorting without use of voice activity detector,without involving storage and sorting of past samples, low computational complexity & memory requirement,low signal delay (algorithmic + computational).12345#/[email protected] Dept., IIT Bombay8+(k) & (k) selected to be fractions of the range such that Dn(k) approaches sample quantile & sum of changes approaches zero:

where p(k) = pth quantile of the magnitude spectrum; M = Number of frames; R = Range (difference between max. & min. of the spectral samples); = Factor controlling the step size.

Ripple in estimation

&No. of steps between initial estimate Di(k) & final estimate Df (k)

(|Df (k) Di(k)|)max = R Convergence factor to be selected for tradeoff between & smax. Method not suited for very low or high p.&12345#/[email protected] Dept., IIT Bombay10Range estimationRange updated dynamically as the underlying distribution is unknown.Peak & valley detection using first-order recursive relations.where = factor controlling rise time; : factor controlling fall time. Small for fast detector response & relatively large to avoid ripples.

Rn(k) = Pn(k) Vn(k)Quantile tracking of noise spectral sample

12345#/[email protected] Dept., IIT Bombay11Block diagram of dynamic quantile tracking

12345#/[email protected] Dept., IIT Bombay12Speech enhancement using spectral subtraction

Short-time spectral analysis using FFTDynamic estimation of non-stationary noise spectrum Enhanced magnitude spectrum calculation by generalized spectral subtraction Enhanced complex spectrum calculation without explicit phase estimationRe-synthesis using IFFT with overlap-add12345#/[email protected] Dept., IIT Bombay13Generalized spectral subtraction |Yn(k)| = 1/ Dn(k), if |Xn(k)| < ( + )1/Dn(k) [ |Xn(k)| (Dn(k)) ] 1/ otherwisewhere = exponent factor (2: power subtraction, 1: magnitude subtraction), = over-subtraction factor (for limiting effect of short-term variations in noise spectrum), = floor factor to mask the musical noise due to over-subtraction, Xn(k) = Windowed speech spectrum, Dn(k) = Estimated noise mag. spectrum.Enhanced complex spectrum calculation with phase of input spectrum without explicit phase calculationYn(k) = |Yn(k)| Xn(k) / |Xn(k)|12345#/[email protected] Dept., IIT Bombay14Implementation for offline processing Implementation using Matlab 7.10 for evaluating the performance of the proposed technique and the effect of processing parameters. Processing parameters fs = 10 kHz Frame length = 25.6 ms (L = 256) Overlap = 75% (S = 64) FFT size N = 512Implementation using = 1 (magnitude subtraction) showed higher tolerance to variation in , values Offline processing to get optimal combination of and for spectral subtractionOffline processing to get optimal combination of and for noise estimation12345#/[email protected] Dept., IIT Bombay15ImplementationDMA based I/O with cyclic buffersADC and DAC with 16-bit quantization (left channel of stereo codec )fs = 10 kHz, L = 256, S = 64, N = 512 (as for offline processing)Data representation (input, spectral values, output): 16-bit real & 16-bit imaginary

12345#/[email protected] Dept., IIT Bombay17Data transfer & buffering operations (S = L/4)DMA cyclic buffers 5-block S-sample input buffer 2-block S-sample output buffer Pointers Current input block Just-filled input block Current output block Write-to output block(incremented cyclically on DMA interrupt)

DelayAlgorithmic: 1 frame (25.6 ms)Computational: frame shift (6.4 ms)

12345#/[email protected] Dept., IIT Bombay18Results: Offline processing Investigations for most suitable values of processing parametersProcessing with noise estimation carried out using sample quantile (SQ) values & the following processing parameters: = 0, = 0.4 6 = 0.1, = (0.9)1/1024 (rise time = 1 frame shift, fall time = 1024 frame shift)p = 0.1, 0.25, 0.5, 0.75, 0.9 M = 32, 64, 128, 256, & 512M = 128 resulted in highest PESQ scores (for fixed SNR, , & p).Noise estimation with p = 0.25 resulted in nearly the best scores for different types of noises at all SNRsPESQ scores obtained for processing with noise estimation using dynamic quantile tracking with = 1/256 nearly equal to the PESQ scores obtained using SQ with M = 128.

12345#/[email protected] Dept., IIT Bombay20Processing examples & PESQ scoresPESQ scores of the unprocessed (Unpr.) noisy speech with babble (a non-stationary noise) and processed (Pr.) signals with noise estimation by sample quantile (SQ) with M = 128 and dynamic quantile tracking (DQT) with = 1/256.

SNR(dB)PESQ ScoreUnpr.Pr., =1,=0Pr., =2,=0Pr., =3, =0SQDQTSQDQTSQDQT-61.681.721.661.711.751.621.5701.972.002.132.202.192.172.2862.392.542.532.702.652.692.67PESQ scores obtained using 0.25-quantile not sensitive to changes in Combination of = 1/256, p = 0.25, & = 2 used for more detailed examination of scores

12345#/[email protected] Dept., IIT Bombay21PESQ score vs SNR: noisy & enhanced speech

Increase in scores: 0.24 0.46 for white noise, 0.08 0.32 for babble noise.SNR advantage: 6 dB for white noise, 3 dB for babble noise.Informal listening: = 0.001 reduced the musical noise without degrading speech quality.12345#/[email protected] Dept., IIT Bombay22Results: Real-time processingTesting of real-time processing using white, babble, car, street, and train noises at different SNRsListening: Real-time processed output perceptually similar to the offline processed outputObjective verification: High PESQ scores (> 3.5) for output of real-time processing with output of offline processing as the referenceSignal delay: 36 msProcessing capacity required: 41% (System clock needed for satisfactory processing = 50 MHz, highest system clock = 120 MHz)12345#/[email protected] Dept., IIT Bombay(a) Clean speech (b) Noisy speech(c) Offline processed(d) Real-time processedExample: -/a/-/i/-/u/ aayiye aap kaa naam kyaa hai? Where were you a year ago?) , white noise, input SNR = 3 dB.

More examples: http://www.ee.iitb.ac.in/~spilab/material/nitya/ncc2015

12345#/[email protected] Dept., IIT BombayThank [email protected] Dept., IIT BombayReferences[1]H. Levitt, J. M. Pickett, and R. A. Houde (eds.), Senosry Aids for the Hearing Impaired. New York: IEEE Press, 1980.[2]J. M. Pickett, The Acoustics of Speech Communication: Fundamentals, Speech Perception Theory, and Technology. Boston, Mass.: Allyn Bacon, 1999, pp. 289323.[3]H. Dillon, Hearing Aids. New York: Thieme Medical, 2001.[4]T. Lunner, S. Arlinger, and J. Hellgren, 8-channel digital filter bank for hearing aid use: preliminary results in monaural, diotic, and dichotic modes, Scand. Audiol. Suppl., vol. 38, pp. 7581, 1993.[5]P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, Binaural dichotic presentation to reduce the effects of spectral masking in moderate bilateral sensorineural hearing loss, Int. J. Audiol., vol. 51, no. 4, pp. 334344, 2012.[6]J. Yang, F. Luo, and A. Nehorai, Spectral contrast enhancement: Algorithms and comparisons, Speech Commun., vol. 39, no. 12, pp. 3346, 2003.[7]T. Arai, K. Yasu, and N. Hodoshima, Effective speech processing for various impaired listeners, in Proc. 18th Int. Cong. Acoust., 2004, Kyoto, Japan, pp. 13891392.[8]P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, Multi-band frequency compression for improving speech perception by listeners with moderate sensorineural hearing loss, Speech Commun., vol. 54, no. 3 pp. 341350, 2012.[9]A. R. Jayan and P. C. Pandey, Automated modification of consonant-vowel ratio of stops for improving speech intelligibility, Int. J. Speech Technol., 2014, [online] DOI 10.1007/s10772-014-9254-4.[10]M. Berouti, R. Schwartz, and J. Makhoul, Enhancement of speech corrupted by acoustic noise, in Proc. IEEE ICASSP 1979, Washington, D.C., pp. [email protected] Dept., IIT Bombay[11]S. F. Boll, Suppression of acoustic noise in speech using spectral subtraction, IEEE Trans. Acoust., Speech, Signal Process., vol. 27, no. 2, pp. 113-120, 1979.[12]P. C. Loizou, Speech Enhancement: Theory and Practice. New York: CRC, 2007.[13]Y. Lu and P. C. Loizou, A geometric approach to spectral subtraction, Speech Commun., vol. 50, no. 6, pp. 453-466, 2008.[14]K. Paliwal, K. Wjcicki, and B. Schwerin, Single-channel speech enhancement using spectral subtraction in the short-time modulation domain, Speech Commun., vol. 52, no. 5, pp. 450475, 2010.[15]R. Martin, Spectral subtraction based on minimum statistics, in Proc. 6th Eur. Signal Process. Conf. (EUSIPCO 1994), Edinburgh, U.K., 1994, pp. 1182-1185.[16]I. Cohen, Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging, IEEE Trans. Speech Audio Process., vol. 11, no. 5, pp. 466-475, 2003.[17]G. Doblinger, Computationally efficient speech enhancement by spectral minima tracking in subbands, in Proc. EUROSPEECH 1995, Madrid, Spain, pp. 1513-1516.[18]V. Stahl, A. Fisher, and R. Bipus, Quantile based noise estimation for spectral subtraction and Wiener filtering, in Proc. IEEE ICASSP 2000, Istanbul, Turkey, pp. 1875-1878.[19]N. W. Evans and J. S. Mason, "Time-frequency quantile-based noise estimation," in Proc. 11th Eur. Signal Process. Conf. (EUSIPCO 2002), Toulouse, France, 2002, pp. 539-542. [20]H. Bai and E. A. Wan, "Two-pass quantile based noise spectrum estimation," Center of spoken language understanding, OGI School of Science and Engineering at OHSU (2003), [online] Available: http://speech.bme.ogi.edu/publications/ps/bai03.pdf.[21]S. K. Waddi, P. C. Pandey, and N. Tiwari, Speech enhancement using spectral subtraction and cascaded-median based noise estimation for hearing impaired listeners, in Proc. 19th Nat. Conf. Commun. (NCC 2013), Delhi, India, 2013, paper no. 1569696063.[22]Texas Instruments, Inc., TMS320C5515 Fixed-Point Digital Signal Processor, 2011, [online] Available: focus.ti.com/lit/ds/symlink/ [email protected] Dept., IIT Bombay[23]Spectrum Digital, Inc., TMS320C5515 eZdsp USB Stick Technical Reference, 2010, [online] Available: support.spectrumdigital.com/ boards/usbstk5515/reva/files/usbstk5515_TechRef_RevA.pdf[24]Texas Instruments, Inc., TLV320AIC3204 Ultra Low Power Stereo Audio Codec, 2008, [online] Available: focus.ti.com/lit/ds/ symlink/tlv320aic3204.pdf.[25]ITU, Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs, ITU-T Rec., P.862, 2001.[26] N. Tiwari, Speech enhancement using noise estimation based on dynamic quantile tracking for hearing impaired listeners: Processing results, 2015, [online] Available: www.ee.iitb.ac.in/~spilab/material /nitya/ncc2015. [email protected] Dept., IIT BombayDelay

Delay

Delay

Quantile Calc.

Valley Calc.

Peak Calc.

+

Range Estimation

Quantile Estimation

|Xn(k)|

x(n)

Noise Estimation

FFT

Windowing

| |

Enh. Mag. Spec. Calc.

|Yn(k)|

Yn(k)

y(n)

Enh. Complex Spec. Calc.

Input Speech

Enhanced Speech

Dn(k)

IFFT

Overlap-Add

Xn(k)

Text

Overlap- Add

Input

ADC

DAC

Output

FFT

InputWindowing

Codec

DMA Ch. 2Input Cyclic Buf.

DMA Ch. 0Output Cyclic Buf.

Processor

OutputWindowing

|X|

|Y|

X / |X|

Y

D

IFFT

| . |

Noise est.

Spec. Sub.

X

X

Text

N L

N L

Input Samples

Ouput Samples

Just-Filled Block

Input Block

L

Words

Output Block

L

Words

DMA Input Cyclic Buffer(5S Samples)

DMA Output Cyclic Buffer(2S Samples)

Write-to Block

Input Data Buffer

Output Data Buffer

LSamples

SSamples

FFT

IFFT

[email protected] ee dept., iit bombay ncc 2015, mumbai, 27 feb.- 1 mar. 2015, paper no....

Documents