Dealing with Acoustic Noise Part 1: Spectral Estimation

Mark Hasegawa-Johnson, University of Illinois

Lectures at CLSP WS06, July 20, 2006


Page 1: Dealing with Acoustic Noise  Part 1: Spectral Estimation

Dealing with Acoustic Noise

Part 1: Spectral Estimation

Mark Hasegawa-Johnson, University of Illinois

Lectures at CLSP WS06, July 20, 2006

Page 2: Dealing with Acoustic Noise  Part 1: Spectral Estimation

Noise from the Perspective of the Brainstem

1. Something happened!! (VAD)

2. What happened?? (Recognition)

Page 3: Dealing with Acoustic Noise  Part 1: Spectral Estimation

Noise from the Perspective of the Brainstem

1. Something happened!! (VAD)

2. Which pixels belong to the new event (auditory scene analysis)?

3. What are their amplitudes (spectral estimation)?

4. What happened?? (Recognition)

Page 4: Dealing with Acoustic Noise  Part 1: Spectral Estimation

A Speech Recognition Model (Mixture Gaussian)

• Problem: find the most probable class ($C_n$) given measurements of a function ($f(S)$) of the speech signal ($S$). For example, $f(S)$ might be the PLP coefficients.

• Solution: Choose $n$ such that $p(C_n,S) > p(C_m,S)$ for all $m \neq n$.

• What is $p(C_n,S)$? The most effective current computational model is the mixture Gaussian: the weighted sum of $\exp(-|\mu - f(S)|_W^2)$, where $|x|_W^2 \equiv x^T W x$ and $\mu$ is the mean of each mixture component. $2W$ is called the “precision” matrix.
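
As a minimal sketch of the decision rule on this slide, the following scores each class by a weighted sum of $\exp(-|\mu - f(S)|_W^2)$ terms and picks the largest. It assumes a diagonal $W$ and assumes that each weight already absorbs the class prior and the Gaussian normalizing constant; the function names and the toy `models` dictionary are hypothetical, not from the lecture.

```python
import math

def weighted_sq_norm(x, w_diag):
    """|x|_W^2 = x^T W x for a diagonal W, given as a list of diagonal entries."""
    return sum(w * xi * xi for xi, w in zip(x, w_diag))

def mixture_score(f, components):
    """Weighted sum of exp(-|mu - f|_W^2) over (weight, mu, w_diag) components."""
    total = 0.0
    for weight, mu, w_diag in components:
        diff = [m - fi for m, fi in zip(mu, f)]
        total += weight * math.exp(-weighted_sq_norm(diff, w_diag))
    return total

def classify(f, class_models):
    """Return the class label whose mixture score is largest."""
    return max(class_models, key=lambda label: mixture_score(f, class_models[label]))

# Hypothetical two-class toy model over 2-D feature vectors (illustration only).
models = {
    "C1": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 0.0], [1.0, 1.0])],
    "C2": [(1.0, [4.0, 4.0], [0.5, 0.5])],
}
print(classify([0.2, -0.1], models))
```

In practice the scores would be accumulated in the log domain to avoid underflow; the direct form is kept here to mirror the slide.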

Page 5: Dealing with Acoustic Noise  Part 1: Spectral Estimation

How Should the Model React to Additive Noise?

• Suppose that we only have a noisy measurement, X:

... where V is independent noise. Then Cn should maximize:

Page 6: Dealing with Acoustic Noise  Part 1: Spectral Estimation

Answer: By Computing $f_{MMSE}$

Page 7: Dealing with Acoustic Noise  Part 1: Spectral Estimation

Definition of $f_{MMSE}$

Page 8: Dealing with Acoustic Noise  Part 1: Spectral Estimation

Classical Estimators: Maximum Likelihood

Page 9: Dealing with Acoustic Noise  Part 1: Spectral Estimation

Classical Estimators: Maximum A Posteriori

Page 10: Dealing with Acoustic Noise  Part 1: Spectral Estimation

Classical Estimators: Minimum Mean Squared Error

Page 11: Dealing with Acoustic Noise  Part 1: Spectral Estimation

Functions of Random Variables

• Since $f_{MMSE}(S) \neq f(S_{MMSE})$, it is necessary to find the probability density function of $f(S)$ directly. Fortunately, the PDF of $f(S)$ can always be computed from the PDF of $S$, as follows:
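
The slide's own equation is not part of this transcript. As a sketch of the standard change-of-variables rule, assuming a scalar $S$ and a monotonic, differentiable $f$ with inverse $f^{-1}$ (assumptions not stated on the slide):

```latex
p_{f(S)}(y) \;=\; p_S\!\left(f^{-1}(y)\right)\,\left|\frac{d f^{-1}(y)}{dy}\right|
```

For a non-monotonic $f$, the right-hand side becomes a sum of such terms over every solution $s$ of $f(s) = y$.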

Page 12: Dealing with Acoustic Noise  Part 1: Spectral Estimation

PDF of Speech: Time Domain

• Speech samples $s[n]$ are often modeled as Gaussian, because Gaussian PDFs are easy to manipulate.

• In fact, noise tends to be Gaussian, but…

• Speech PDF is actually a mixture of Gaussian small-amplitude samples (the noise bits?) and Laplacian high-amplitude samples (the actual speech bits?)

Page 13: Dealing with Acoustic Noise  Part 1: Spectral Estimation

PDF of Speech: Filter Outputs, e.g., STFT

• STFT is just a filter with complex-valued filter coefficients:

• Central Limit Theorem says $S_k$ should be a 2D Gaussian, if the window is infinitely long…
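
To make the "STFT is just a filter" view concrete, here is a small sketch. It assumes the common filterbank form $h_k[m] = w[m]\,e^{j2\pi km/N}$ and $S_k[n] = \sum_m h_k[m]\,s[n-m]$; the slide's exact formula is not in the transcript, so this convention and the function names are assumptions.

```python
import cmath
import math

def stft_filter(window, k, n_fft):
    """Complex filter coefficients h_k[m] = w[m] * exp(j*2*pi*k*m/N) (assumed form)."""
    return [w * cmath.exp(2j * math.pi * k * m / n_fft) for m, w in enumerate(window)]

def filter_output(s, h, n):
    """S_k[n] = sum_m h[m] * s[n - m], treating out-of-range samples as zero."""
    return sum(h[m] * s[n - m] for m in range(len(h)) if 0 <= n - m < len(s))

# Toy input: a cosine centred exactly on bin 3 of a 16-point analysis.
N = 16
signal = [math.cos(2 * math.pi * 3 * n / N) for n in range(N)]
rect = [1.0] * N                     # rectangular window, illustration only
on_bin = filter_output(signal, stft_filter(rect, 3, N), N - 1)
off_bin = filter_output(signal, stft_filter(rect, 5, N), N - 1)
print(abs(on_bin), abs(off_bin))
```

Each output sample is a weighted sum of many input samples, which is why the Central Limit Theorem argument applies as the window grows.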

Page 14: Dealing with Acoustic Noise  Part 1: Spectral Estimation

Err…. Is it REALLY Gaussian?

Page 15: Dealing with Acoustic Noise  Part 1: Spectral Estimation

If we ignore the previous slide, and pretend that $S_k$ is a complex Gaussian, then it is possible to analytically derive the PDFs of $|S_k|^2$, $|S_k|$, and the phase of $S_k$. They are:
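
The slide's expressions are not part of this transcript. The standard results for a zero-mean, circularly symmetric complex Gaussian are sketched below, assuming $\lambda_S = E[|S_k|^2]$ (the circular-symmetry assumption and the dummy variables $u$, $a$, $\theta$ are introduced here for illustration):

```latex
\begin{aligned}
p_{|S_k|^2}(u) &= \frac{1}{\lambda_S}\, e^{-u/\lambda_S}, \quad u \ge 0
  && \text{(exponential)} \\
p_{|S_k|}(a) &= \frac{2a}{\lambda_S}\, e^{-a^2/\lambda_S}, \quad a \ge 0
  && \text{(Rayleigh)} \\
p_{\angle S_k}(\theta) &= \frac{1}{2\pi}, \quad -\pi \le \theta < \pi
  && \text{(uniform)}
\end{aligned}
```

The second line follows from the first by the substitution $u = a^2$, $du = 2a\,da$.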

Page 16: Dealing with Acoustic Noise  Part 1: Spectral Estimation

Classical Spectral Estimation: Assumptions

• One signal is ongoing (call it the “noise”), so its $\lambda_N$ is known (averaged over time prior to voice activity detection).

• The MMSE estimate of the other signal, $S$, combines two types of information:

– A priori knowledge, $\lambda_S = E[|S_k|^2]$

• A priori SNR: $\xi_k = \lambda_S / \lambda_N$

– Maximum likelihood estimator, $\lambda_{S,ML} = |X_k|^2 - \lambda_N$

• Maximum likelihood SNR: $\gamma_k = |X_k|^2 / \lambda_N$
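
The quantities on this slide can be computed per frequency bin as in the sketch below. It assumes the noise power is a plain average of $|X_k|^2$ over frames judged to be noise-only; the function names are hypothetical, and flooring the maximum likelihood speech power at zero is an addition not stated on the slide.

```python
def noise_power(noise_frames):
    """Average |X_k|^2 per bin over frames assumed to contain noise only."""
    n_frames = len(noise_frames)
    n_bins = len(noise_frames[0])
    return [sum(abs(frame[k]) ** 2 for frame in noise_frames) / n_frames
            for k in range(n_bins)]

def ml_speech_power(x_frame, lam_n):
    """Maximum likelihood speech power |X_k|^2 - noise power, floored at 0 (assumption)."""
    return [max(abs(x) ** 2 - ln, 0.0) for x, ln in zip(x_frame, lam_n)]

def ml_snr(x_frame, lam_n):
    """Maximum likelihood SNR per bin: |X_k|^2 / noise power."""
    return [abs(x) ** 2 / ln for x, ln in zip(x_frame, lam_n)]

# Toy data: two noise-only frames and one noisy speech frame, two bins each.
lam_n = noise_power([[1 + 0j, 2j], [1j, 2 + 0j]])
frame = [3 + 4j, 1 + 0j]
print(lam_n, ml_speech_power(frame, lam_n), ml_snr(frame, lam_n))
```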

Page 17: Dealing with Acoustic Noise  Part 1: Spectral Estimation

Classical Spectral Estimation Results: Wiener Filter (Norbert Wiener, 1949)
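
The slide's result is not in the transcript. In its usual frequency-domain form, the Wiener filter scales each noisy bin by a gain equal to a priori SNR divided by (1 + a priori SNR). A minimal sketch under that assumption, with hypothetical function names:

```python
def wiener_gain(a_priori_snr):
    """Wiener gain: SNR / (1 + SNR), where SNR = speech power / noise power."""
    return a_priori_snr / (1.0 + a_priori_snr)

def wiener_estimate(x_frame, lam_s, lam_n):
    """Scale each noisy bin X_k by its Wiener gain to estimate S_k."""
    return [wiener_gain(ls / ln) * x for x, ls, ln in zip(x_frame, lam_s, lam_n)]

# Toy example: one bin with speech power 3 and noise power 1 (SNR = 3).
print(wiener_estimate([2 + 2j], [3.0], [1.0]))
```

The gain is real and lies between 0 and 1, so the phase of the noisy observation is kept unchanged.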

Page 18: Dealing with Acoustic Noise  Part 1: Spectral Estimation

Classical Spectral Estimation Results: MMSE Spectral Amplitude Estimate

(Ephraim and Malah, 1984)
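
The slide's formula is not in the transcript. The sketch below evaluates the gain as it is commonly stated for the Ephraim and Malah (1984) estimator, $G = \frac{\sqrt{\pi}}{2}\frac{\sqrt{v}}{\gamma}e^{-v/2}\left[(1+v)I_0(v/2) + vI_1(v/2)\right]$ with $v = \xi\gamma/(1+\xi)$, where $\xi$ is the a priori SNR and $\gamma = |X_k|^2/\lambda_N$. The function names and the series-based Bessel routine are illustrative assumptions; the series is only suitable for moderate arguments.

```python
import math

def bessel_i(order, x, tol=1e-15, max_terms=500):
    """Modified Bessel function I_0 or I_1 via its power series (order 0 or 1)."""
    q = (x / 2.0) ** 2
    term = 1.0 if order == 0 else x / 2.0      # k = 0 term of the series
    total = term
    for k in range(1, max_terms):
        term *= q / (k * (k + order))          # ratio of successive series terms
        total += term
        if term < tol * total:
            break
    return total

def mmse_stsa_gain(xi, gamma):
    """Spectral amplitude gain for a priori SNR xi and SNR gamma = |X|^2 / noise power."""
    v = xi * gamma / (1.0 + xi)
    bracket = (1.0 + v) * bessel_i(0, v / 2.0) + v * bessel_i(1, v / 2.0)
    return (math.sqrt(math.pi) / 2.0) * (math.sqrt(v) / gamma) * math.exp(-v / 2.0) * bracket

print(mmse_stsa_gain(1.0, 1.0), mmse_stsa_gain(100.0, 101.0))
```

At high SNR the gain approaches the Wiener gain $\xi/(1+\xi)$, which gives a quick sanity check on an implementation.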

Page 19: Dealing with Acoustic Noise  Part 1: Spectral Estimation

Classical Spectral Estimation Results: MMSE Log Amplitude Estimate

(Ephraim and Malah, 1985)
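
The slide's formula is not in the transcript. The sketch below uses the gain as it is commonly stated for the Ephraim and Malah (1985) log-amplitude estimator, $G = \frac{\xi}{1+\xi}\exp\left(\tfrac{1}{2}E_1(v)\right)$ with $v = \xi\gamma/(1+\xi)$ and $E_1$ the exponential integral. The function names are hypothetical, and the $E_1$ routine (power series for small arguments, continued fraction otherwise) is one possible implementation, not necessarily the one used in the lecture.

```python
import math

EULER_GAMMA = 0.5772156649015329

def exp_integral_e1(x, tol=1e-12, max_iter=500):
    """E1(x) = integral from x to infinity of exp(-t)/t dt, for x > 0."""
    if x <= 0.0:
        raise ValueError("E1(x) is only evaluated here for x > 0")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
        total = -EULER_GAMMA - math.log(x)
        term = 1.0
        for k in range(1, max_iter):
            term *= -x / k                      # term = (-x)^k / k!
            total -= term / k
            if abs(term / k) < tol:
                break
        return total
    # Continued fraction evaluated with the modified Lentz method.
    b = x + 1.0
    c = 1.0 / 1e-300
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < tol:
            break
    return h * math.exp(-x)

def log_stsa_gain(xi, gamma):
    """Log-amplitude gain for a priori SNR xi and SNR gamma = |X|^2 / noise power."""
    v = xi * gamma / (1.0 + xi)
    return xi / (1.0 + xi) * math.exp(0.5 * exp_integral_e1(v))

print(log_stsa_gain(1.0, 1.0))
```

Because $E_1(v) > 0$, this gain is never below the Wiener gain, and it converges to the Wiener gain as $v$ grows.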

Page 20: Dealing with Acoustic Noise  Part 1: Spectral Estimation

How does it sound?

Page 21: Dealing with Acoustic Noise  Part 1: Spectral Estimation

How does it sound?

• MVDR beamformer eliminates high-frequency noise; MMSE-logSA eliminates low-frequency noise.

• MMSE-logSA adds reverberation at low frequencies; the reverberation does not seem to affect speech recognition accuracy.

Page 22: Dealing with Acoustic Noise  Part 1: Spectral Estimation

What about, oh, say… PLP?

Page 23: Dealing with Acoustic Noise  Part 1: Spectral Estimation

Loudness Spectrum

• Perceptual LPC (PLP) begins by computing an estimate of the perceptual loudness spectrum.

– Step 1: filter the signal using complex-valued Bark-scale critical-band filters $h_k[m]$:

– Step 2: compress the amplitudes with a nonlinearity:
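
The two steps can be sketched as follows. The slide's equations are not in the transcript, so this assumes Step 1 is an ordinary convolution with each band filter and Step 2 is the $|X_k|^{2/3}$ compression that appears later in the deck's auditory model; the filters below are trivial placeholders, not Bark-scale designs.

```python
def band_output(x, h, n):
    """Step 1 (assumed form): X_k[n] = sum_m h_k[m] * x[n - m], zero outside the signal."""
    return sum(h[m] * x[n - m] for m in range(len(h)) if 0 <= n - m < len(x))

def loudness_spectrum(x, filters, n):
    """Step 2 (assumed form): compress each band amplitude as |X_k[n]|^(2/3)."""
    return [abs(band_output(x, h, n)) ** (2.0 / 3.0) for h in filters]

# Placeholder one-tap "filters" standing in for critical-band filters.
toy_filters = [[1.0], [1j]]
print(loudness_spectrum([8.0], toy_filters, 0))
```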

Page 24: Dealing with Acoustic Noise  Part 1: Spectral Estimation

MMSE Estimate of the Perceptual Loudness Spectrum

• gPLP requires numerical integration (of $u^{1/3}e^{-u}$)

• Numerical integration is a lot cheaper than it used to be (e.g., via lookup table).
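
As an illustration of the lookup-table idea, the sketch below tabulates the running integral $\int_0^a u^{1/3}e^{-u}\,du$ once and then answers queries by linear interpolation. How this integral enters gPLP is not shown in the transcript, so the table layout, step size, and function names are assumptions; the full integral equals $\Gamma(4/3)$, which serves as a check.

```python
import math

def build_table(u_max=40.0, n_steps=40000):
    """Midpoint-rule table of the running integral of u^(1/3) * exp(-u) from 0."""
    du = u_max / n_steps
    table = [0.0]
    acc = 0.0
    for i in range(n_steps):
        u = (i + 0.5) * du
        acc += u ** (1.0 / 3.0) * math.exp(-u) * du
        table.append(acc)
    return du, table

def lookup(a, du, table):
    """Linearly interpolate the tabulated integral at upper limit a (clamped to the table)."""
    pos = min(max(a / du, 0.0), len(table) - 1)
    i = min(int(pos), len(table) - 2)
    frac = pos - i
    return table[i] + frac * (table[i + 1] - table[i])

du, table = build_table()
print(lookup(40.0, du, table), math.gamma(4.0 / 3.0))
```

The table is built once; each later evaluation costs only an index computation and one interpolation.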

Page 25: Dealing with Acoustic Noise  Part 1: Spectral Estimation

A Conservative Computational Auditory Model

[Block diagram; blocks as labeled on the slide:]

• Input: x(t)

• Basilar Membrane: Mechanical Filterbank (von Bekesy)

• Variable Threshold Synapses: Amplitude Compression (Ghitza, 1986)

• Auditory Nerve Carries the Loudness Spectrum ~ $|X[k]|^{2/3}$ (Fletcher; Hermansky)

• Average Background Loudness $\lambda_N$

• Change Detection

• MMSE Estimator of “New Event”: $E[\,|S[k]|^{2/3} \mid X\,]$

• Auditory Scene Analysis

• Speech Feature Extraction: PLP ≈ Perceptual Formant Extraction (Hermansky); Tandem Features ≈ Perceptual Magnet Effect (Niyogi)

Page 26: Dealing with Acoustic Noise  Part 1: Spectral Estimation

Change Detection (VAD)