
    2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics October 18-21, 2009, New Paltz, NY

    SINGLE-MICROPHONE WIND NOISE REDUCTION BY ADAPTIVE POSTFILTERING

    Elias Nemer Wilf Leblanc

    Broadcom Corp 5300 California Ave, Irvine CA 92617, USA

    [email protected]

    ABSTRACT

    This paper presents a novel time-domain algorithm for detecting and attenuating the acoustic effect of wind noise in speech signals originating from mobile terminals. The detection part uses metrics that exploit the properties of the spectral envelope of wind noise as well as its non-periodic and non-harmonic nature. LPC analyses of various orders are carried out, and the results are used to distinguish between wind and speech frames and to estimate the magnitude and location of the wind-noise ‘resonance’. The suppression part constructs a parameterized postfilter of an appropriate order with a ‘null’ at the location of the wind-noise ‘resonance’. Wind-only frames are used to estimate the wind-noise energy, from which the emphasis parameters of the postfilter are adjusted to provide an appropriate attenuation. The proposed scheme may be combined with background-noise suppression algorithms, or with speech-formant-enhancing postfilters in the context of a speech codec.

    Index Terms: wind noise reduction.

    1. INTRODUCTION

    In mobile telephony contexts, speech is often corrupted by surrounding noise such as traffic, car, street and wind noise, as well as system-introduced noise, such as quantization and channel interference. This in turn has an adverse effect on the perceived quality and intelligibility of speech as well as on the performance of other processing algorithms such as speech or speaker recognition.

    Wind noise is a distinctive type of acoustic interference: it is caused by turbulent airflow over the microphone membrane [1], which produces a relatively high signal level. Wind noise is bursty, with gusts lasting from a few to a few hundred milliseconds. It is often annoying and leads to listener fatigue, since it is impulsive and its amplitude may exceed the nominal speech amplitude. Because of its nonstationary nature, wind noise cannot be effectively attenuated by conventional noise-reduction schemes such as spectral subtraction or statistical estimators [2][3], and therefore requires dedicated detection and processing.

    The most effective methods for reducing wind noise are those utilizing 2 or more sensors, thus exploiting the difference in propagation delay between wind and acoustic waves [4]. Single-microphone wind-noise reduction is still an open problem. Some of the proposed methods make use of independent component analysis [5] but are computationally prohibitive; others [6] use comb filters to reinforce the harmonic nature of speech, but rely on an accurate pitch estimation, which is difficult to achieve in a noisy environment.

    Postfilters are used in model-based speech coders to improve speech quality. They reduce the perceived effect of quantization noise [7][8][9] by emphasizing the formant frequencies and deemphasizing the spectral valleys, where noise contributes most to the perceived distortion. The frequency response of the postfilter thus corresponds to a modified version of the speech spectrum. The commonly used transfer function is given by:

    $$H(z) = G\,\frac{A(z/\beta)}{A(z/\alpha)} \qquad \text{(Eq 1)}$$

    where $A(z)$ is the adaptive short-term filter derived from a prediction-based (LPC) analysis of an appropriate order, G is a gain, and α and β are emphasis parameters (usually fixed) that control the degree of spectral attenuation/emphasis. In [10], an extension of the conventional postfilter is proposed in which the emphasis parameters are adapted to compensate for some of the effects of background noise.
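    As a minimal sketch of how the emphasis parameters shape a conventional postfilter, the Python below evaluates the magnitude response assuming the form H(z) = G·A(z/β)/A(z/α) with A(z) = 1 + Σ a_k z^-k. The single-formant polynomial, the values α = 0.8 and β = 0.5, and all function names are illustrative assumptions, not values taken from this paper.

```python
import cmath
import math


def scaled_coeffs(a, gamma):
    """Coefficients of A(z/gamma) for A(z) = 1 + sum_k a[k-1] z^-k.

    Substituting z/gamma multiplies the k-th coefficient by gamma**k, which
    pulls every root of A(z) toward the origin by the factor gamma.
    """
    return [ak * gamma ** (k + 1) for k, ak in enumerate(a)]


def eval_poly(a, zinv):
    """Evaluate 1 + sum_k a[k-1] * zinv**k, with zinv = z^-1."""
    return 1 + sum(ak * zinv ** (k + 1) for k, ak in enumerate(a))


def postfilter_gain(a, alpha, beta, freq_hz, fs=8000.0, g=1.0):
    """|H(e^jw)| for the assumed form H(z) = G * A(z/beta) / A(z/alpha)."""
    zinv = cmath.exp(-2j * math.pi * freq_hz / fs)  # z^-1 on the unit circle
    num = eval_poly(scaled_coeffs(a, beta), zinv)
    den = eval_poly(scaled_coeffs(a, alpha), zinv)
    return g * abs(num / den)


# Hypothetical single-formant envelope: pole radius 0.95 at 1000 Hz, fs = 8 kHz.
r, theta = 0.95, 2 * math.pi * 1000 / 8000
a = [-2 * r * math.cos(theta), r * r]
at_formant = postfilter_gain(a, alpha=0.8, beta=0.5, freq_hz=1000)  # about 1.93
in_valley = postfilter_gain(a, alpha=0.8, beta=0.5, freq_hz=3000)   # about 0.74
```

    With β < α, the poles of 1/A(z/α) lie closer to the unit circle than the zeros of A(z/β), so the response is boosted near the formant and attenuated in the valley.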

    In this paper, we use a time-domain adaptive postfilter to attenuate the effect of wind noise in corrupted speech. As with speech postfilters, the one proposed here tracks the changing spectral envelope of the wind noise, but unlike speech postfilters, it places a null where the noise energy is concentrated (i.e. it deemphasizes the wind ‘resonance’). The coefficients of the filter are derived from an LPC analysis. In addition to estimating the spectral shape, this analysis is also used to discriminate between wind-only frames and frames containing mostly speech energy. The proposed wind-attenuating postfilter can also be combined with a speech-formant-reinforcing postfilter or its extended version [11] to further reduce coding and background noise. In a speech coding context, suppressing the background noise at the decoder (instead of prior to encoding) has been argued to be beneficial for various reasons [11].

    The paper is organized as follows: Section 2 describes the spectral character that distinguishes wind noise. Section 3 describes the wind-noise detection scheme, and Section 4 describes how the postfilter is constructed. Simulation results are presented in Section 5 and concluding remarks in Section 6.

    2. SPECTRAL CHARACTER OF WIND NOISE

    Published work on wind noise has shown that it consists of two components [1]:

    Flow turbulence and fluctuations occurring naturally in the wind.

    Turbulence generated by the interaction between the microphone and the wind.

    The second component is dominant in practical telephony contexts: the wind effect on handheld and ear-held devices is very pronounced, given the presence of the human hand and face, which generate significant turbulence. The spectrum of the recorded wind noise is generally a decreasing function of frequency, with the bulk of the energy concentrated in the lower part of the spectrum. The bandwidth and amplitude depend on the wind speed, the wind direction, and the handset design. Figure 1 illustrates the power spectrum of wind noise recorded at a 45° angle and at various speeds.

    978-1-4244-3679-8/09/$25.00 ©2009 IEEE

    Figure 1: Spectrum of wind noise at various speeds (level in dB vs. frequency in Hz, 0 to 4000 Hz; curves for 2, 4, 6 and 8 mph)

    2.1. LPC analysis of wind noise vs. speech segments

    Because the energy of wind noise is concentrated in the lower part of the spectrum, the spectral envelope derived from predictive analysis contains a single resonance at low frequency. This is illustrated in Figures 2 and 3, where 2nd-, 4th- and 10th-order LPC analyses were performed on a wind-only frame (Figure 2) and on a speech frame (Figure 3). Since there is only one resonance (or formant), a low-order LPC analysis (for instance, of order 2) yields essentially the same resonance as higher-order ones.

    Figure 2: Wind noise frame and LPC spectral envelope (waveform vs. time in samples; envelope in dB vs. frequency in Hz for 2nd-, 4th- and 10th-order analyses)

    This is in contrast with speech frames, which typically have multiple formants and thus yield different resonance locations for analyses of different orders.

    Figure 3: Voiced speech frame and LPC spectral envelopes of various orders (waveform vs. time in samples; envelope in dB vs. frequency in Hz for 2nd-, 4th- and 10th-order analyses)
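    To make the comparison of LPC orders concrete, the sketch below fits LPC models of order 2, 4 and 10 with the Levinson-Durbin recursion and grid-searches the peak of each spectral envelope, assuming A(z) = 1 + Σ a_k z^-k. The test frame is a hypothetical stand-in for wind noise (white noise shaped by a single 300 Hz resonance, fs = 8 kHz), and the function names are illustrative.

```python
import cmath
import math
import random


def autocorr(x, max_lag):
    """Short-time autocorrelation r[0..max_lag] of a frame (no normalization)."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """LPC coefficients a for A(z) = 1 + sum_k a[k-1] z^-k via Levinson-Durbin."""
    a, err = [], r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        k = -acc / err                                  # m-th reflection coefficient
        a = [a[i] + k * a[m - 2 - i] for i in range(m - 1)] + [k]
        err *= 1.0 - k * k                              # updated prediction error
    return a


def envelope_peak_hz(a, fs=8000.0, step_hz=5.0):
    """Grid-search the frequency where the LPC envelope 1/|A(e^jw)| peaks."""
    best_f, best_mag = 0.0, -1.0
    for i in range(int(fs / 2 / step_hz) + 1):
        f = i * step_hz
        zinv = cmath.exp(-2j * math.pi * f / fs)
        mag = 1.0 / abs(1 + sum(ak * zinv ** (k + 1) for k, ak in enumerate(a)))
        if mag > best_mag:
            best_f, best_mag = f, mag
    return best_f


def wind_like_frame(n, f0=300.0, radius=0.97, fs=8000.0, seed=0):
    """Hypothetical stand-in for wind noise: white noise through one low resonance."""
    rng = random.Random(seed)
    a1, a2 = -2.0 * radius * math.cos(2 * math.pi * f0 / fs), radius * radius
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(-a1 * x[-1] - a2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]


r = autocorr(wind_like_frame(4000), 10)
peaks = {order: envelope_peak_hz(levinson_durbin(r, order)) for order in (2, 4, 10)}
```

    For such a single-resonance frame, the three envelope peaks should land near 300 Hz, in line with Figure 2; a multi-formant frame would not be expected to show this agreement.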

    3. DETECTING WIND NOISE SEGMENTS

    A number of time-domain metrics may be used to detect wind noise segments. These metrics exploit the difference in spectral shape between wind noise and speech segments, as well as the non-periodic and non-harmonic nature of wind noise.

    3.1. LPC analysis of various orders

    Given the spectral distribution of the wind-noise energy, a low-order LPC predictor (e.g. of order 2) is sufficient and should yield a small prediction error for wind-noise frames, but not for speech frames, which contain multiple resonances. The prediction error, normalized by the frame energy, can be derived from the reflection coefficients as:

    $$PE = \prod_{k=1}^{K}\left(1 - rc_k^{2}\right) \qquad \text{(Eq 2)}$$

    where K is the prediction order and $rc_k$ is the k-th reflection coefficient. Moreover, since LPC analyses of all orders yield the same resonance for wind-noise frames, evaluating the higher-order LPC polynomials (for instance, of 4th and 10th order) at the roots of the 2nd-order polynomial should yield a near-zero result.
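    The two wind-frame tests described above can be sketched in Python as follows, assuming A(z) = 1 + Σ a_k z^-k: Eq 2 is computed from the reflection coefficients of a Levinson-Durbin recursion, and the 10th-order polynomial is evaluated at a root of the 2nd-order one. The test frames (a single 300 Hz resonance driven by white noise, and plain white noise) and all function names are hypothetical.

```python
import cmath
import math
import random


def autocorr(x, max_lag):
    """Short-time autocorrelation r[0..max_lag] of a frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (a, rc, err) for A(z) = 1 + sum_k a[k-1] z^-k."""
    a, rc, err = [], [], r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        k = -acc / err                      # m-th reflection coefficient rc_m
        a = [a[i] + k * a[m - 2 - i] for i in range(m - 1)] + [k]
        err *= 1.0 - k * k
        rc.append(k)
    return a, rc, err


def prediction_error(rc):
    """Eq 2: PE = prod_k (1 - rc_k^2), the residual energy relative to r[0]."""
    pe = 1.0
    for k in rc:
        pe *= 1.0 - k * k
    return pe


def lpc2_root(a):
    """Root of z^2 + a1 z + a2 with the + sign of Eq 4 (complex if a1^2 < 4 a2)."""
    a1, a2 = a
    return (-a1 + cmath.sqrt(a1 * a1 - 4.0 * a2)) / 2.0


def eval_A(a, z):
    """A(z) = 1 + sum_k a[k-1] z^-k at a complex point z."""
    return 1 + sum(ak * z ** -(k + 1) for k, ak in enumerate(a))


def make_frame(n, resonance_hz=None, radius=0.97, fs=8000.0, seed=0):
    """White noise, optionally shaped by one resonance (hypothetical wind stand-in)."""
    rng = random.Random(seed)
    if resonance_hz is None:
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    a1 = -2.0 * radius * math.cos(2 * math.pi * resonance_hz / fs)
    a2 = radius * radius
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(-a1 * x[-1] - a2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]


r_wind = autocorr(make_frame(4000, resonance_hz=300.0), 10)
r_white = autocorr(make_frame(4000, seed=1), 10)
a2_wind, rc2_wind, _ = levinson_durbin(r_wind, 2)
a10_wind, rc10_wind, err10 = levinson_durbin(r_wind, 10)
pe_wind = prediction_error(rc2_wind)                          # small
pe_white = prediction_error(levinson_durbin(r_white, 2)[1])   # close to 1
a10_at_root = abs(eval_A(a10_wind, lpc2_root(a2_wind)))       # close to 0
```

    For the single-resonance frame, PE stays small and the 10th-order polynomial nearly vanishes at the 2nd-order root, while for white noise PE stays near 1. Any decision thresholds on these values would have to be tuned and are not given here.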

    The 2nd-order LPC polynomial can be obtained from a simple 2×2 matrix inversion, while the higher-order ones (4th and 10th) can be obtained with the Levinson-Durbin recursion. The roots of the 2nd-order polynomial can be computed in closed form by solving the quadratic equation:

    $$A(z) = 1 + a_1 z^{-1} + a_2 z^{-2} \qquad \text{(Eq 3)}$$

    When $a_1^2 < 4a_2$, the two roots are complex conjugates:

    $$z = \frac{-a_1 \pm j\sqrt{4a_2 - a_1^2}}{2} \qquad \text{(Eq 4)}$$
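    The closed-form 2nd-order solution and its roots can be sketched in Python as follows, assuming the sign convention A(z) = 1 + a_1 z^-1 + a_2 z^-2. The sketch obtains a_1 and a_2 from r[0..2] via the explicit 2×2 inverse and reads the resonance location (pole angle) and magnitude (pole radius) off the upper root. That interpretation of "location and magnitude" and the function names are assumptions of this sketch.

```python
import cmath
import math


def lpc2_closed_form(r0, r1, r2):
    """2nd-order LPC from a direct 2x2 solve of the normal equations.

    Solves [[r0, r1], [r1, r0]] [a1, a2]^T = -[r1, r2]^T for
    A(z) = 1 + a1 z^-1 + a2 z^-2 (Eq 3), using the explicit 2x2 inverse.
    """
    det = r0 * r0 - r1 * r1
    a1 = -r1 * (r0 - r2) / det
    a2 = (r1 * r1 - r0 * r2) / det
    return a1, a2


def resonance(a1, a2, fs=8000.0):
    """Location (Hz) and pole radius of the Eq 4 root, or None if the roots are real."""
    disc = 4.0 * a2 - a1 * a1
    if disc <= 0.0:
        return None                                  # no complex-conjugate pair
    z = complex(-a1, math.sqrt(disc)) / 2.0          # upper-half-plane root of Eq 4
    return cmath.phase(z) * fs / (2 * math.pi), abs(z)


# Check against the exact autocorrelation of an AR(2) process with a known resonance.
radius, f0 = 0.97, 300.0
true_a1 = -2 * radius * math.cos(2 * math.pi * f0 / 8000.0)
true_a2 = radius * radius
r0 = 1.0
r1 = -true_a1 * r0 / (1 + true_a2)       # Yule-Walker relations for an AR(2) model
r2 = -true_a1 * r1 - true_a2 * r0
a1, a2 = lpc2_closed_form(r0, r1, r2)
freq_hz, pole_radius = resonance(a1, a2)  # about 300 Hz and 0.97
```

    The check feeds in the exact autocorrelation of an AR(2) process, for which the 2nd-order normal equations recover the model exactly; on real frames the autocorrelation would be estimated from the data.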


    3.2. Pitch prediction error and variance

    Pitch prediction is used in speech coders to provide an open-loop estimate of the pitch [12]. The predictor parameters are chosen to minimize the mean square prediction error, i.e. the difference between the actual and the predicted speech samples. A 1st-order pitch predictor estimates the speech sample in the current pitch period from the corresponding sample one period earlier. The prediction error is:

    $$e[n] = x[n] - g\,x[n-L]$$

    where L is a candidate pitch lag (a plausible estimate of the pitch period) and g is the prediction gain. It can be shown that the optimum pitch lag is the one that maximizes the gain ratio:

    $$L_0 = \arg\max_{L} \frac{R_x^2[0,L]}{R_x[L,L]} \qquad \text{(Eq 5)}$$

    where $R_x[i,j] = \sum_n x[n-i]\,x[n-j]$ is the short-time autocorrelation of the signal. Given the periodic nature of voiced speech and the impulsive nature of wind noise, the prediction gain ratio is expected to be small during wind noise and generally large during voiced speech segments.

    In addition to the prediction gain, we use the variance of the pitch estimate computed over the past few frames. Because the pitch trajectory varies slowly in natural speech, this variance takes small values during voiced speech and relatively large values during unvoiced speech and wind-noise segments.
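    A minimal sketch of the two pitch-based metrics follows. It assumes a lag search range of 20 to 147 samples (about 54 to 400 Hz at 8 kHz) and normalizes the Eq 5 ratio by R_x[0,0], which leaves the maximizing lag unchanged but makes the values comparable across frames. Both choices, the test signals and the function names are illustrative assumptions.

```python
import math
import random
import statistics


def best_pitch_lag(x, lag_min=20, lag_max=147):
    """Eq 5: lag maximizing R_x^2[0,L] / R_x[L,L], returned with that ratio / R_x[0,0].

    All sums run over n = lag_max .. len(x)-1, so R_x[0,0] is the same for every
    lag; dividing by it keeps the arg max but scales the ratio into [0, 1].
    """
    idx = range(lag_max, len(x))
    r00 = sum(x[n] * x[n] for n in idx)
    best_lag, best_ratio = lag_min, 0.0
    for lag in range(lag_min, lag_max + 1):
        r0l = sum(x[n] * x[n - lag] for n in idx)
        rll = sum(x[n - lag] * x[n - lag] for n in idx)
        if rll <= 0.0 or r00 <= 0.0:
            continue
        ratio = r0l * r0l / (rll * r00)
        if ratio > best_ratio + 1e-9:        # keep the shortest of near-equal lags
            best_lag, best_ratio = lag, ratio
    return best_lag, best_ratio


def pitch_lag_variance(recent_lags):
    """Variance of the pitch lags estimated over the past few frames."""
    return statistics.pvariance(recent_lags)


period = 57  # samples, i.e. about 140 Hz at 8 kHz
voiced = [math.sin(2 * math.pi * n / period) + 0.5 * math.sin(4 * math.pi * n / period)
          for n in range(1000)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
lag_v, ratio_v = best_pitch_lag(voiced)   # lag 57, ratio close to 1
lag_n, ratio_n = best_pitch_lag(noise)    # ratio close to 0
```

    The periodic test signal stands in for voiced speech and the Gaussian sequence for an aperiodic segment; a steady lag track such as [57, 58, 57, 56, 57] gives a much smaller variance than an erratic one.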

    3.3. Higher-order Statistics of the LPC residual

    In [13], we showed that the third- and fourth-order statistics of speech signals are non-zero and can be used as the basis for a robust voice activity detector in the presence of Gaussian noise. Analytical expressions for the higher-order statistics of the LPC residual of speech signals were derived and expressed as functions of the speech energy and bandwidth. Based on this finding, we use the fourth-order statistic (kurtosis) in this work as an additional metric to distinguish speech frames from wind-noise frames. The kurtosis of a zero-mean real process x is given by:

    $$C_4(x) = E[x^4] - 3\left(E[x^2]\right)^2 \qquad \text{(Eq 6)}$$

    In practice, the statistical expectations $E[x^n]$ are approximated by time averages over an interval of 10 to 20 ms. Figure 4 illustrates the basic logic flow of the detector described above.
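    The sketch below estimates Eq 6 with time averages over 20 ms frames (160 samples at 8 kHz). As assumptions of this sketch, it removes each frame's mean first and also reports the level-independent ratio C4/(E[x²])². In the paper the input would be the LPC residual; here a Gaussian sequence and a sparse pulse train (a crude, hypothetical stand-in for the residual of voiced speech) only illustrate the contrast. Function names are illustrative.

```python
import random


def fourth_cumulant(frame):
    """Eq 6 with E[.] replaced by time averages over one frame (mean removed first).

    Returns (C4, E[x^2]) so the caller can also form the level-independent
    ratio C4 / E[x^2]^2.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]
    m2 = sum(v * v for v in x) / n
    m4 = sum(v ** 4 for v in x) / n
    return m4 - 3.0 * m2 * m2, m2


def frames(signal, frame_len=160):
    """Non-overlapping frames; 160 samples is 20 ms at 8 kHz."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, frame_len)]


def normalized_c4(frame):
    c4, m2 = fourth_cumulant(frame)
    return c4 / (m2 * m2)


rng = random.Random(0)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(160 * 50)]
pulses = [1.0 if n % 20 == 0 else 0.0 for n in range(160 * 50)]  # crude pulse train
gauss_scores = [normalized_c4(f) for f in frames(gaussian)]
pulse_scores = [normalized_c4(f) for f in frames(pulses)]          # about 15 each
```

    The Gaussian frames average close to zero, as expected for a fourth-order cumulant, while each pulse-train frame scores about 15.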

    4. ADAPTIVE POSTFILTERING

    Given an estimate of the spectral envelope of the wind noise, a postfilter of order K can be constructed to place a null where the wind resonance is estimated to be. The general expression of the filter is similar to (Eq 1), where

    $$A(z/\gamma) = 1 + \sum_{k=1}^{K} a_k \gamma^{k} z^{-k}, \qquad \gamma \in \{\alpha, \beta\},$$

    and α and β are emphasis parameters that control the degree of spectral attenuation. In order to place nulls at the resonant locations, these parameters satisfy the condition 0 < α < β < 1.


    table. A block diagram of the adaptive filter procedure is illustrated in figure 5.
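    A minimal sketch of the wind-attenuating postfilter, under stated assumptions: the filter is taken as H(z) = G·A(z/β)/A(z/α) with G = 1 and 0 < α < β < 1. The 2nd-order wind polynomial (a 300 Hz resonance with pole radius 0.97) and the values α = 0.5 and β = 0.9 are hypothetical. The rule that maps the estimated wind energy to α and β is not modeled. Function names are illustrative.

```python
import cmath
import math


def scaled(a, gamma):
    """Coefficients [1, a1*gamma, a2*gamma^2, ...] of A(z/gamma)."""
    return [1.0] + [ak * gamma ** (k + 1) for k, ak in enumerate(a)]


def wind_postfilter(a, alpha, beta):
    """(b, d) of H(z) = A(z/beta) / A(z/alpha); alpha < beta puts a null at the resonance."""
    if not 0.0 < alpha < beta < 1.0:
        raise ValueError("expected 0 < alpha < beta < 1")
    return scaled(a, beta), scaled(a, alpha)


def magnitude(b, d, freq_hz, fs=8000.0):
    """|H(e^jw)| of the pole-zero filter b(z^-1) / d(z^-1)."""
    zinv = cmath.exp(-2j * math.pi * freq_hz / fs)
    num = sum(bk * zinv ** k for k, bk in enumerate(b))
    den = sum(dk * zinv ** k for k, dk in enumerate(d))
    return abs(num / den)


def apply_filter(b, d, x, gain=1.0):
    """Time-domain direct form I: y[n] = G*sum b[k]x[n-k] - sum_{k>=1} d[k]y[n-k]."""
    y = []
    for n in range(len(x)):
        acc = gain * sum(b[k] * x[n - k] for k in range(len(b)) if n >= k)
        acc -= sum(d[k] * y[n - k] for k in range(1, len(d)) if n >= k)
        y.append(acc)
    return y


def rms(v):
    return math.sqrt(sum(s * s for s in v) / len(v))


# Hypothetical wind envelope from wind-only frames: one resonance at 300 Hz.
fs, radius, f_wind = 8000.0, 0.97, 300.0
a = [-2 * radius * math.cos(2 * math.pi * f_wind / fs), radius * radius]
b, d = wind_postfilter(a, alpha=0.5, beta=0.9)   # illustrative emphasis values


def tone_gain(freq_hz, n=2000):
    """Steady-state RMS gain for a sinusoid (first half skipped as transient)."""
    x = [math.sin(2 * math.pi * freq_hz * i / fs) for i in range(n)]
    y = apply_filter(b, d, x)
    return rms(y[n // 2:]) / rms(x[n // 2:])
```

    With these values the response is roughly 0.18 at 300 Hz (about 15 dB of attenuation) and about 1.56 at 3000 Hz, and the time-domain filtering of test tones reproduces the same gains. Raising β toward 1, or lowering α, deepens the null.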

    5. SIMULATION RESULTS

    Wind-corrupted speech is recorded in a controlled laboratory environment using a wind-generating machine at wind speeds ranging from 2 to 8 mph.

    Phonetically balanced sentences spoken by male and female speakers are used, and the noisy speech is sampled at 8 kHz and quantized to 16 bits. The detection and filtering algorithm described above is applied, and listeners are asked to rate the resulting speech against the original noisy speech according to the following scoring rule:

    clear improvement: 4; average improvement: 3; marginal improvement: 2; no improvement: 1; deterioration: 0.

    The results of the subjective comparison of the original and processed speech are shown in Table 1. In general, the processed speech is judged better than the original noisy speech up to 6 mph, beyond which excessive attenuation erodes too much of the speech energy.

    Table 1: Subjective evaluation results

    Wind speed        2 mph    4 mph    6 mph    8 mph
    Score (0 to 4)    3.6      3.3      3.2      2.8

    A spectrogram of the noisy and processed speech (using a 10th-order postfilter) is shown in Figure 6.

    6. CONCLUSION

    We described a new method for attenuating wind noise in mobile telephony devices using adaptive postfilters. This is achieved by tracking the wind-noise resonance and constructing a postfilter with a null whose magnitude and location depend on the estimated wind-noise spectral energy. Moreover, the prediction-based analysis is used to discriminate between speech and wind-noise segments and to estimate the general spectral envelope of the wind noise. The approach yields a noticeable improvement in the perceived quality of noisy speech while keeping speech distortion to a minimum.

    7. REFERENCES

    [1] S. Bradley, J. Backman, S. von Hunerbein, T. Wu. The mechanisms creating wind noise in microphones. Audio Engineering Society Convention, March 2003.

    [2] R. McAulay, M. Malpass. Speech enhancement using a soft-decision noise suppression filter. IEEE Trans. on ASSP, Vol. 28, April 1980, pp. 137-145.

    [3] P. Wolfe. Simple alternative to the Ephraim and Malah suppression rule for speech enhancement. Proc. IEEE Statistical Signal Processing Workshop, 2001, pp. 496-499.

    [4] D. F. Marshall. Wind noise reduction using a two microphone array. J. Acoust. Soc. Am., Vol. 76, Issue S1, October 1984, p. S70.

    [5] M. Schmidt, J. Larsen, Fu-Tien Hsiao. Wind noise reduction using non-negative sparse coding. IEEE Workshop on Machine Learning for Signal Processing, 27-29 Aug. 2007, pp. 431-436.

    [6] B. King, L. Atlas. Coherent modulation comb filtering for enhancing speech in wind noise. IWAENC 2008.

    [7] H. Woo, Jeong Jin Kim, Kyung A Jang, Myung Jin Bae. The speech enhancement of the G.723.1 vocoder using multi-order formant post-filter. Proc. IEEE TENCON 1999, pp. 710-713.

    [8] P. Kabal, F. Wang, D. O'Shaughnessy, R. Ramachandran. Adaptive postfiltering for enhancement of noisy speech in the frequency domain. IEEE Int. Symp. Circuits and Systems, June 1991, pp. 312-315.

    [9] Juin-Hwey Chen, A. Gersho. Adaptive postfiltering for quality enhancement of coded speech. IEEE Trans. on Speech and Audio Processing, Vol. 3, No. 1, Jan. 1995, pp. 59-71.

    [10] V. Grancharov, J. Samuelsson, W. B. Kleijn. Noise-dependent post-filtering. IEEE ICASSP, May 2004, Vol. 1, pp. I-457-460.

    [11] V. Grancharov, J. Plasberg, J. Samuelsson, W. B. Kleijn. Generalized postfilter for speech quality enhancement. IEEE Trans. on Audio, Speech, and Language Processing, Vol. 16, No. 1, Jan. 2008, pp. 57-64.

    [12] R. P. Ramachandran, P. Kabal. Pitch prediction filters in speech coding. IEEE Trans. on ASSP, Vol. 37, No. 4, April 1989, pp. 467-478.

    [13] E. Nemer, S. Mahmoud, R. Goubran. Robust voice activity detection using higher-order statistics in the LPC residual domain. IEEE Trans. on Speech and Audio Processing, Vol. 9, No. 3, March 2001, pp. 217-231.

    Figure 6: Spectrogram of noisy and processed speech
