dealing with acoustic noise part 2: beamforming

32
Dealing with Acoustic Noise Part 2: Beamforming Mark Hasegawa-Johnson University of Illinois Lectures at CLSP WS06 July 25, 2006

Upload: faunia

Post on 14-Jan-2016

30 views

Category:

Documents


0 download

DESCRIPTION

Dealing with Acoustic Noise Part 2: Beamforming. Mark Hasegawa-Johnson University of Illinois Lectures at CLSP WS06 July 25, 2006. 8 Mics, Pre-amps, Wooden Baffle. Best Place= Sunvisor. 4 Cameras, Glare Shields, Adjustable Mounting Best Place= Dashboard. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Dealing with Acoustic Noise  Part 2: Beamforming

Dealing with Acoustic Noise

Part 2: Beamforming

Mark Hasegawa-JohnsonUniversity of Illinois

Lectures at CLSP WS06July 25, 2006

Page 2: Dealing with Acoustic Noise  Part 2: Beamforming

AVICAR Recording Hardware

4 Cameras,

Glare Shields,

Adjustable

Mounting

Best Place=

Dashboard

8 Mics,

Pre-amps,

Wooden

Baffle.

Best Place=

Sunvisor.

System is not permanently installed; mounting requires 10 minutes.

Page 3: Dealing with Acoustic Noise  Part 2: Beamforming

AVICAR Database● 100 Talkers● 4 Cameras, 7 Microphones● 5 noise conditions: Engine idling, 35mph, 35mph with

windows open, 55mph, 55mph with windows open● Three types of utterances:

– Digits & Phone numbers, for training and testing phone-number recognizers

– Phonetically balanced sentences, for training and testing large vocabulary speech recognition

– Isolated letters, to see how video can help with an acoustically hard problem

● Open-IP public release to 15 institutions, 5 countries

Page 4: Dealing with Acoustic Noise  Part 2: Beamforming

Noise and Lombard Effect

Page 5: Dealing with Acoustic Noise  Part 2: Beamforming

Beamforming Configuration

Microphone Array

ClosedWindow(AcousticReflector)

Talker(AutomobilePassenger)

OpenWindow(NoiseSource)

Page 6: Dealing with Acoustic Noise  Part 2: Beamforming

Frequency-Domain Expression

• Xmk is the measured signal

at microphone m in frequency band k

• Assume that Xmk was

created by filtering the speech signal Sk through the room response filter Hmk and adding noise Vmk.

• Beamforming estimates an “inverse filter” Wmk.

Page 7: Dealing with Acoustic Noise  Part 2: Beamforming

Time-Domain Approximation

Page 8: Dealing with Acoustic Noise  Part 2: Beamforming

Optimality Criteria for Beamforming

• ŝ[n] = WTX is the estimated speech signal• W is chosen for– Distortionless response:

WTHS = S Time Domain: WTH = [1,0,…,0] Freq Domain: WHH = 1– Minimum variance:

W = argmin(WTRW), R = E[VVT]– Multichannel spectral estimation:

E[f(S)|X] = E[f(S)|Ŝ]

Page 9: Dealing with Acoustic Noise  Part 2: Beamforming

Delay-and-Sum Beamformer

Suppose that the speech signal at microphone m arrives earlier than at some reference microphone, by nm samples. Then we can estimate S by delaying each microphone nm samples, and adding:

…or…

Page 10: Dealing with Acoustic Noise  Part 2: Beamforming

What are the Delays?

d

θ

dsinθ

Far-field approximation: inter-microphone delay τ = d sinθ/c

Near-field (talker closer than ~10d): formulas exist

Page 11: Dealing with Acoustic Noise  Part 2: Beamforming

Delay-and-Sum Beamformer has Distortionless Response

In the noise-free, echo-free case, we get…

Page 12: Dealing with Acoustic Noise  Part 2: Beamforming

Delay-and-Sum Beamformer with Non-Integer Delays

Page 13: Dealing with Acoustic Noise  Part 2: Beamforming

Distortionless Response for Channels with Non-Integer Delays

or Echoes

Page 14: Dealing with Acoustic Noise  Part 2: Beamforming

Beam Patterns• Seven microphones, spaced 4cm apart

• Delay-and-Sum Beamformer, Steered to =0

Page 15: Dealing with Acoustic Noise  Part 2: Beamforming

Minimum Variance Distortionless Response

(Frost, 1972)

• Define an error signal, e[n]:

• Goal: minimize the error power, subject to the constraint of

distortionless response:

Page 16: Dealing with Acoustic Noise  Part 2: Beamforming

Minimum Variance Distortionless Response: The Solution

• The closed-form solution (for stationary noise):

• The adaptive solution: adapt WB so that…

Page 17: Dealing with Acoustic Noise  Part 2: Beamforming

Beam Patterns, MVDR• MVDR beamformer, tuned to cancel a noise

source distributed over 60 < θ < 90 deg.• Beam steered to θ = 0

Page 18: Dealing with Acoustic Noise  Part 2: Beamforming

Multi-Channel Spectral Estimation• We want to estimate some function f(S), given a multi-

channel, noisy, reverberant measurement X:

• Assume that S and X are jointly Gaussian, thus:

Page 19: Dealing with Acoustic Noise  Part 2: Beamforming

p(X|S) Has Two Factors

• Where Ŝ is (somehow, by magic) the MVDR beamformed signal:

… and the covariance matrices of Ŝ and its orthogonal complement are:

Page 20: Dealing with Acoustic Noise  Part 2: Beamforming

Sufficient Statistics for Multichannel Estimation

(Balan and Rosca, SAMSP 2002)

Page 21: Dealing with Acoustic Noise  Part 2: Beamforming

Multi-Channel Estimation: Estimating the Noise

• Independent Noise Assumption• Measured Noise Covariance– Problem: Rv may be singular, especially if estimated from a

small number of frames

• Isotropic Noise (Kim and Hasegawa-Johnson, AES 2005)

– Isotropic = Coming from every direction with equal power

– The form of Rv has been solved analytically, and is guaranteed to never be exactly singular

– Problem: noise is not really isotropic, e.g., it comes primarily from the window and the radio

Page 22: Dealing with Acoustic Noise  Part 2: Beamforming

Isotropic Noise: Not Independent

Page 23: Dealing with Acoustic Noise  Part 2: Beamforming

Adaptive Filtering for Multi-channel Estimation

(Kim, Hasegawa-Johnson, and Sung, ICASSP 2006)

δT(HTH)-1HTX

BT WBT

+ SpectralEstimation

Ŝ+

-

Page 24: Dealing with Acoustic Noise  Part 2: Beamforming

MVDR with Correct Assumed Channel

MVDR eliminates high-

frequency noise,

MMSE-logSA

eliminates low-

frequency noise

MMSE-logSA adds

reverberation at low

frequencies;

reverberation seems

to not effect speech

recognition accuracy

Page 25: Dealing with Acoustic Noise  Part 2: Beamforming

MVDR with Incorrect Assumed Channel

Page 26: Dealing with Acoustic Noise  Part 2: Beamforming

Channel Estimation• Measured signal x[n] is the sum of noise (v[n]), a “direct

sound” (a0s[n-n0]), and an infinite number of echoes:

• The process of adding echoes can be written as convolution:

Page 27: Dealing with Acoustic Noise  Part 2: Beamforming

Channel Estimation• If you know enough about Rs[m], then the echo times can be

estimated from Rx[m]:

Page 28: Dealing with Acoustic Noise  Part 2: Beamforming

Channel Estimation• For example, if the “speech signal” is actually white noise,

then Rx[m] has peaks at every inter-echo-arrival time:

Page 29: Dealing with Acoustic Noise  Part 2: Beamforming

Channel Estimation• Unfortunately, Rs[m] is usually not sufficiently well known to

allow us to infer ni and nj

Page 30: Dealing with Acoustic Noise  Part 2: Beamforming

Channel Estimation

• Seeded methods, e.g., maximum-length sequence pseudo-noise signals, or chirp signals– Problem: channel response from loudspeaker to

microphone not same as from lips to microphone• Independent components analysis– Problem: doesn’t work if speech and noise are

both nearly Gaussian

Page 31: Dealing with Acoustic Noise  Part 2: Beamforming

Channel Response as a Random Variable: EM Beamforming(Kim, Hasegawa-Johnson, and Sung, in preparation)

Page 32: Dealing with Acoustic Noise  Part 2: Beamforming

WER Results, AVICARTen-digit phone numbers; trained and tested with 50/50 mix of

quiet (engine idling) and very noisy (55mph, windows open)