Dealing with Acoustic Noise
Part 1: Spectral Estimation
Mark Hasegawa-Johnson, University of Illinois
Lectures at CLSP WS06, July 20, 2006
Noise from the Perspective of the Brainstem
1. Something happened!! (VAD)
2. What happened?? (Recognition)
Noise from the Perspective of the Brainstem
1. Something happened!! (VAD)
2. Which pixels belong to the new event (auditory scene analysis)?
3. What are their amplitudes (spectral estimation)?
4. What happened?? (Recognition)
A Speech Recognition Model (Mixture Gaussian)
• Problem: find the most probable class ($C_n$) given measurements of a function ($f(S)$) of the speech signal ($S$). For example, $f(S)$ might be the PLP coefficients.
• Solution: choose $n$ such that $p(C_n, S) > p(C_m, S)$ for all $m \neq n$.
• What is $p(C_n, S)$? The most effective current computational model is the mixture Gaussian: a weighted sum of terms $\exp(-|\mu - f(S)|_W^2)$, where $\mu$ is the mean of a mixture component and $|x|_W^2 \equiv x^T W x$. $2W$ is called the “precision” matrix.
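The decision rule above can be sketched in a few lines of Python. This is a minimal illustration, not the recognizer used in the lecture: the function names, the toy class models, and the choice to fold all normalizing constants into the mixture weights are assumptions made for the example.

```python
import math

def weighted_sq_norm(x, W):
    """Return |x|_W^2 = x^T W x for a vector x and a square matrix W (nested lists)."""
    n = len(x)
    return sum(x[i] * W[i][j] * x[j] for i in range(n) for j in range(n))

def mixture_score(f, components):
    """Unnormalized mixture score: sum over components of c * exp(-|mu - f|_W^2).

    `components` is a list of (weight, mean, W) triples. Normalizing
    constants are assumed to be folded into the weights.
    """
    total = 0.0
    for weight, mean, W in components:
        diff = [m - fi for m, fi in zip(mean, f)]
        total += weight * math.exp(-weighted_sq_norm(diff, W))
    return total

def classify(f, class_models):
    """Pick the class label whose mixture gives the largest score."""
    return max(class_models, key=lambda label: mixture_score(f, class_models[label]))

# Toy two-class example with 2-D features (all numbers are made up).
HALF_IDENTITY = [[0.5, 0.0], [0.0, 0.5]]  # W = 0.5*I, so the precision 2W is I
class_models = {
    "C1": [(0.6, [0.0, 0.0], HALF_IDENTITY), (0.4, [1.0, 1.0], HALF_IDENTITY)],
    "C2": [(1.0, [4.0, 4.0], HALF_IDENTITY)],
}
```

Calling `classify([0.2, 0.1], class_models)` selects the class whose component means lie closest to the feature vector in the $W$-weighted sense.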
How Should the Model React to Additive Noise?
• Suppose that we only have a noisy measurement, $X = S + V$, where $V$ is independent noise. Then $C_n$ should maximize:
Answer: By Computing $f_{\mathrm{MMSE}}$
Definition of $f_{\mathrm{MMSE}}$
Classical Estimators: Maximum Likelihood
Classical Estimators: Maximum A Posteriori
Classical Estimators: Minimum Mean Squared Error
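For reference, the standard textbook forms of these three estimators, for an unknown $S$ observed through $X$, can be written as follows. The hat notation and the squared-error norm are assumptions rather than the slides' own notation; the last line applies the same MMSE criterion to a feature $f(S)$.

```latex
\begin{align*}
\hat{S}_{\mathrm{ML}}   &= \arg\max_{S}\; p(X \mid S) \\
\hat{S}_{\mathrm{MAP}}  &= \arg\max_{S}\; p(S \mid X)
                         = \arg\max_{S}\; p(X \mid S)\, p(S) \\
\hat{S}_{\mathrm{MMSE}} &= \arg\min_{\hat{S}}\; E\!\left[\, |S - \hat{S}|^2 \,\middle|\, X \right]
                         = E[\, S \mid X \,] \\
f_{\mathrm{MMSE}}       &= E[\, f(S) \mid X \,]
\end{align*}
```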
Functions of Random Variables
Since $f_{\mathrm{MMSE}}(S) \neq f(S_{\mathrm{MMSE}})$, it is necessary to find the probability density function of $f(S)$ directly. Fortunately, the PDF of $f(S)$ can always be computed from the PDF of $S$, as follows:
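A small Monte Carlo experiment shows why the expectation cannot simply be moved inside a nonlinear $f$. The sketch below uses unconditional expectations as a stand-in for the conditional ones, takes $f(u) = u^{1/3}$ as the nonlinearity, and assumes the power $U = |S_k|^2$ is exponentially distributed (which holds when $S_k$ is complex Gaussian). The function name and parameter values are hypothetical.

```python
import math
import random

def compare_estimates(mean_power=1.0, num_samples=200_000, seed=0):
    """Compare E[f(U)] with f(E[U]) for f(u) = u**(1/3).

    U is drawn from an exponential distribution with the given mean.
    """
    rng = random.Random(seed)
    samples = [rng.expovariate(1.0 / mean_power) for _ in range(num_samples)]
    mean_of_f = sum(u ** (1.0 / 3.0) for u in samples) / num_samples
    f_of_mean = (sum(samples) / num_samples) ** (1.0 / 3.0)
    return mean_of_f, f_of_mean

mean_of_f, f_of_mean = compare_estimates()
# Closed form for the exponential case with unit mean: E[U^(1/3)] = Gamma(4/3).
closed_form = math.gamma(4.0 / 3.0)
```

The sample average of $f(U)$ settles near $\Gamma(4/3) \approx 0.89$, while $f$ of the sample mean settles near 1, so the two orders of operation give different answers.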
PDF of Speech: Time Domain
• Speech samples $s[n]$ are often modeled as Gaussian, because Gaussian PDFs are easy to manipulate.
• In fact, noise tends to be Gaussian, but…
• Speech PDF is actually a mixture of Gaussian small-amplitude samples (the noise bits?) and Laplacian high-amplitude samples (the actual speech bits?)
PDF of Speech: Filter Outputs, e.g., STFT
• STFT is just a filter with complex-valued filter coefficients:
• The Central Limit Theorem says $S_k$ should be a 2D Gaussian, if the window is infinitely long…
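The filter view of the STFT can be sketched directly: band $k$ is an FIR filter whose taps are the analysis window multiplied by a complex exponential. The phase convention, the periodic Hann window, and the function name below are assumptions for illustration; other STFT definitions differ from this one by a phase factor.

```python
import cmath
import math

def stft_bin(signal, k, window):
    """Output of the k-th STFT band, viewed as an FIR filter.

    The impulse response is h_k[m] = window[m] * exp(j*2*pi*k*m/N) with
    N = len(window), so each output sample is a complex-weighted sum of
    the most recent N input samples.
    """
    n_fft = len(window)
    h_k = [window[m] * cmath.exp(2j * math.pi * k * m / n_fft) for m in range(n_fft)]
    outputs = []
    for n in range(n_fft - 1, len(signal)):
        # Convolution sum: S_k[n] = sum_m h_k[m] * s[n - m]
        outputs.append(sum(h_k[m] * signal[n - m] for m in range(n_fft)))
    return outputs

N = 32
hann = [0.5 - 0.5 * math.cos(2 * math.pi * m / N) for m in range(N)]
tone = [math.cos(2 * math.pi * 4 * n / N) for n in range(4 * N)]
on_bin = stft_bin(tone, 4, hann)    # band centred on the tone
off_bin = stft_bin(tone, 10, hann)  # band far from the tone
```

With this test tone, the band centred on the tone has constant magnitude, and the band several bins away is zero up to rounding error.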
Err…. Is it REALLY Gaussian?
If we ignore the previous slide, and pretend that $S_k$ is a complex Gaussian, then it is possible to analytically derive the PDFs of $|S_k|^2$, $|S_k|$, and the phase of $S_k$. They are:
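Under the additional assumptions that $S_k$ is zero-mean and circularly symmetric with $E[|S_k|^2] = \lambda_S$ (the symbol $\lambda_S$ is chosen here for illustration), the standard results are an exponential power, a Rayleigh magnitude, and a uniform phase:

```latex
\begin{align*}
p_{|S_k|^2}(u)        &= \frac{1}{\lambda_S}\, e^{-u/\lambda_S},        & u &\ge 0 \\
p_{|S_k|}(a)          &= \frac{2a}{\lambda_S}\, e^{-a^2/\lambda_S},     & a &\ge 0 \\
p_{\angle S_k}(\theta) &= \frac{1}{2\pi},                               & -\pi &\le \theta < \pi
\end{align*}
```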
Classical Spectral Estimation: Assumptions
• One signal is ongoing (call it the “noise”), so its power $\lambda_N$ is known (averaged over time prior to voice activity detection).
• The MMSE estimate of the other signal, $S$, combines two types of information:
  – a priori knowledge, $\lambda_S = E[|S_k|^2]$
    • a priori SNR: $\xi_k = \lambda_S / \lambda_N$
  – maximum likelihood estimator, $\lambda_{S,\mathrm{ML}} = |X_k|^2 - \lambda_N$
    • maximum likelihood SNR: $\gamma_k = |X_k|^2 / \lambda_N$
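The quantities listed above can be computed per frequency bin as in the following sketch. It assumes frames are lists of complex STFT coefficients and that a set of noise-only frames is available from before speech onset. The function names are hypothetical, and flooring the maximum likelihood power at zero is an added safeguard rather than part of the slide.

```python
def noise_power(noise_frames):
    """Average |X_k|^2 over frames assumed to contain noise only.

    Returns one noise power estimate per bin k.
    """
    num_frames = len(noise_frames)
    num_bins = len(noise_frames[0])
    return [
        sum(abs(frame[k]) ** 2 for frame in noise_frames) / num_frames
        for k in range(num_bins)
    ]

def ml_speech_power(frame, noise_pow):
    """ML speech power per bin, |X_k|^2 minus the noise power, floored at zero."""
    return [max(abs(x) ** 2 - n_pow, 0.0) for x, n_pow in zip(frame, noise_pow)]

def snr_estimates(frame, noise_pow, speech_pow):
    """Return (a priori SNR, maximum likelihood SNR) per bin for one noisy frame."""
    a_priori = [s_pow / n_pow for s_pow, n_pow in zip(speech_pow, noise_pow)]
    max_likelihood = [abs(x) ** 2 / n_pow for x, n_pow in zip(frame, noise_pow)]
    return a_priori, max_likelihood

# Toy data: two noise-only frames and one noisy frame, two bins each.
noise_pow = noise_power([[1 + 0j, 0 + 2j], [0 + 1j, 2 + 0j]])
noisy_frame = [3 + 4j, 1 + 0j]
ml_pow = ml_speech_power(noisy_frame, noise_pow)
a_priori, max_likelihood = snr_estimates(noisy_frame, noise_pow, [2.0, 2.0])
```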
Classical Spectral Estimation Results: Wiener Filter (Norbert Wiener, 1949)
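In its usual frequency-domain form for uncorrelated speech and noise, the Wiener filter scales each noisy coefficient by a gain that depends only on the a priori SNR, here written as the ratio of speech power to noise power in that bin. The sketch below assumes both powers are already known per bin; the function names are hypothetical.

```python
def wiener_gain(a_priori_snr):
    """Wiener gain G = xi / (1 + xi) for an a priori SNR xi >= 0."""
    return a_priori_snr / (1.0 + a_priori_snr)

def wiener_filter(frame, speech_pow, noise_pow):
    """Scale each noisy STFT coefficient X_k by its Wiener gain."""
    return [
        wiener_gain(s_pow / n_pow) * x
        for x, s_pow, n_pow in zip(frame, speech_pow, noise_pow)
    ]
```

The gain moves smoothly from 0 (bin dominated by noise) to 1 (bin dominated by speech), and the phase of $X_k$ is left unchanged.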
Classical Spectral Estimation Results: MMSE Spectral Amplitude Estimate (Ephraim and Malah, 1984)
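For reference, the amplitude estimator published by Ephraim and Malah (1984) can be written as below, with $\xi_k$ the a priori SNR, $\gamma_k = |X_k|^2/\lambda_N$, and $I_0$, $I_1$ the modified Bessel functions. The symbols follow that paper and may not match the slide's notation exactly.

```latex
\begin{align*}
v_k       &= \frac{\xi_k}{1+\xi_k}\,\gamma_k \\
\hat{A}_k &= \frac{\sqrt{\pi}}{2}\,\frac{\sqrt{v_k}}{\gamma_k}\,
             e^{-v_k/2}\left[(1+v_k)\,I_0\!\left(\tfrac{v_k}{2}\right)
             + v_k\,I_1\!\left(\tfrac{v_k}{2}\right)\right] |X_k|
\end{align*}
```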
Classical Spectral Estimation Results: MMSE Log Amplitude Estimate (Ephraim and Malah, 1985)
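For reference, the log-amplitude estimator published by Ephraim and Malah (1985) is a Wiener-style gain multiplied by a correction based on the exponential integral. As before, $\xi_k$ is the a priori SNR and $\gamma_k = |X_k|^2/\lambda_N$; the symbols follow that paper and may not match the slide's notation exactly.

```latex
\begin{align*}
\hat{A}_k = \frac{\xi_k}{1+\xi_k}\,
            \exp\!\left(\frac{1}{2}\int_{v_k}^{\infty}\frac{e^{-t}}{t}\,dt\right) |X_k|,
\qquad
v_k = \frac{\xi_k}{1+\xi_k}\,\gamma_k
\end{align*}
```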
How does it sound?
How does it sound?
• MVDR beamformer eliminates high-frequency noise; MMSE-logSA eliminates low-frequency noise.
• MMSE-logSA adds reverberation at low frequencies; reverberation seems not to affect speech recognition accuracy.
What about, oh, say… PLP?
Loudness Spectrum
• Perceptual LPC (PLP) begins by computing an estimate of the perceptual loudness spectrum.
  – Step 1: filter the signal using complex-valued Bark-scale critical-band filters $h_k[m]$:
  – Step 2: compress the amplitudes with a nonlinearity:
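The two steps can be sketched as follows. The Bark-scale filter design is not reproduced here, so `h_k` is simply passed in as a list of complex taps (a placeholder). The compression is taken to be the cube root of power, i.e. $|X_k|^{2/3}$, which is the form of the loudness spectrum used elsewhere in these slides. The function names are hypothetical.

```python
def filter_band(signal, h_k):
    """Step 1: convolve the signal with one complex critical-band filter h_k[m]."""
    taps = len(h_k)
    return [
        sum(h_k[m] * signal[n - m] for m in range(taps))
        for n in range(taps - 1, len(signal))
    ]

def loudness(band_output):
    """Step 2: cube-root compression of power, i.e. |X_k|^(2/3)."""
    return [abs(x) ** (2.0 / 3.0) for x in band_output]

# Toy usage with a two-tap placeholder filter (not a real Bark-scale design).
band = filter_band([1.0, 2.0, 3.0], [1 + 0j, 1 + 0j])
band_loudness = loudness(band)
```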
MMSE Estimate of the Perceptual Loudness Spectrum
• $g_{\mathrm{PLP}}$ requires numerical integration (of $u^{1/3}e^{-u}$)
• Numerical integration is a lot cheaper than it used to be (e.g., via lookup table).
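A lookup table for this integral might be built as in the sketch below. It tabulates $F(x) = \int_0^x u^{1/3} e^{-u}\,du$ once with the trapezoid rule and then answers queries by linear interpolation. How $g_{\mathrm{PLP}}$ combines this integral with the SNR terms is not shown here, and the grid spacing, upper limit, and function names are assumptions.

```python
import math

def build_table(u_max=30.0, step=1e-3):
    """Tabulate F(x) = integral from 0 to x of u^(1/3) e^(-u) du (trapezoid rule)."""
    def integrand(u):
        return u ** (1.0 / 3.0) * math.exp(-u)

    num_steps = int(round(u_max / step))
    table = [0.0]
    previous = integrand(0.0)
    for i in range(1, num_steps + 1):
        current = integrand(i * step)
        table.append(table[-1] + 0.5 * step * (previous + current))
        previous = current
    return table, step

def lookup(table, step, x):
    """Read F(x) from the table by linear interpolation, clamping at both ends."""
    if x <= 0.0:
        return 0.0
    position = x / step
    index = int(position)
    if index >= len(table) - 1:
        return table[-1]
    fraction = position - index
    return table[index] + fraction * (table[index + 1] - table[index])

TABLE, STEP = build_table()
```

As a sanity check, the full integral equals $\Gamma(4/3)$, so `lookup(TABLE, STEP, x)` for large `x` should be close to `math.gamma(4.0 / 3.0)`.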
A Conservative Computational Auditory Model
[Block diagram. Labeled blocks:]
• Input: x(t)
• Basilar membrane: mechanical filterbank (von Békésy)
• Variable-threshold synapses: amplitude compression (Ghitza, 1986)
• Auditory nerve carries the loudness spectrum, ~ $|X[k]|^{2/3}$ (Fletcher; Hermansky)
• Average background loudness $\lambda_N$
• Change detection
• MMSE estimator of the “new event”: $E[\,|S[k]|^{2/3} \mid X\,]$
• Auditory scene analysis
• Speech feature extraction: PLP ≈ perceptual formant extraction (Hermansky); tandem features ≈ perceptual magnet effect (Niyogi)
Change Detection (VAD)