3/24/2006 Lecture notes for Speech Communications
Multi-channel speech enhancement
Chunjian Li
DICOM, Aalborg University
Methods & applied fields
Dual-channel spectral subtraction
- noise reduction in speech
Adaptive Noise Canceling (ANC)
- noise reduction and interference elimination
- echo canceling
- adaptive beamforming
Blind Source Separation (BSS)
Blind Source Extraction (BSE)
Dual-channel spectral subtraction
- Hanson and Wong, ICASSP84.
The method
The exponent is chosen as a = 1, based on listening tests and a spectral distortion measure.
The noisy phase is used to reconstruct the signal.
The noise spectrum estimate is either obtained from a reference channel or estimated from the noisy signal itself, assuming the SNR is very low (about -12 dB).
Revisiting the phase issue
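The estimate with exponent $a$ and the noisy phase $\theta_Y(f)$:
$$\hat{S}(f) = \left( |S(f) + N(f)|^a - |\hat{N}(f)|^a \right)^{1/a} e^{j\theta_Y(f)}$$
To see the dependency of magnitude on phase:
$$|\hat{S}(f)|^a = |S(f)|^a \left[ \left( 1 + \frac{|N(f)|^2}{|S(f)|^2} + 2\frac{|N(f)|}{|S(f)|}\cos\theta(f) \right)^{a/2} - \frac{|\hat{N}(f)|^a}{|S(f)|^a} \right]$$
where $\theta(f)$ is the phase difference between the two signals $S(f)$ and $N(f)$.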
It is clear that the estimate of the signal magnitude spectrum depends on both the SNR and the phase difference. The phase is nevertheless not estimated in this method, because the quality obtained with the noisy phase is acceptable.
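The phase dependency can be checked numerically. The following is a minimal sketch, not the exact Hanson and Wong implementation: the function name `spectral_subtract` and the clamping of negative differences to zero are assumptions made for illustration. It subtracts a noise magnitude estimate from each noisy bin and reuses the noisy phase.

```python
import cmath
import math

def spectral_subtract(noisy_spectrum, noise_mag, a=1.0):
    """Per-bin spectral subtraction with exponent a, keeping the noisy phase."""
    enhanced = []
    for y, n_mag in zip(noisy_spectrum, noise_mag):
        # Assumption: a negative difference is clamped to zero.
        diff = max(abs(y) ** a - n_mag ** a, 0.0)
        magnitude = diff ** (1.0 / a)
        # Reuse the phase of the noisy bin for reconstruction.
        enhanced.append(cmath.rect(magnitude, cmath.phase(y)))
    return enhanced

# One bin with |S| = 2 and |N| = 1; the noise magnitude is known exactly.
s = 2.0 + 0.0j
for theta in (0.0, math.pi / 2, math.pi):
    n = cmath.rect(1.0, theta)  # noise whose phase differs from S by theta
    s_hat = spectral_subtract([s + n], [1.0], a=1.0)[0]
    print(f"theta = {theta:.2f} rad -> |S_hat| = {abs(s_hat):.3f}")
```

Even with a perfect noise magnitude estimate, the magnitude of the estimate equals |S| only when the phase difference is zero.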
Comments
The simplest (and somewhat unrealistic) way of exploiting multiple channels.
Aims at improving intelligibility.
Significant intelligibility gains only at very low SNR (-12 dB).
Unvoiced speech is not processed.
Adaptive Noise Canceling
First proposed by Widrow et al. [1] in 1975.
It is adaptive because it uses an adaptive filter, for example one updated by the LMS algorithm.
The objective: estimate the noise in the primary channel using the noise recorded in the secondary channel, and subtract the estimate from the primary channel recording.
[1] B. Widrow, J. R. Glover, J. M. McCool et al., "Adaptive noise canceling: Principles and applications," Proceedings of the IEEE, vol. 63, pp. 1692-1716, Dec. 1975.
Signal model
Signal estimation
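The estimated signal:
$$\hat{s}(n) = y(n) - \hat{d}_1(n)$$
$$\hat{d}_1(n) = \sum_{i=0}^{M-1} \hat{h}(i)\, d_2(n-i)$$
where $y(n)$ is the primary channel recording and $d_2(n)$ is the noise recorded in the secondary channel.
The optimization criterion:
$$\hat{\mathbf{h}} = \arg\min_{\mathbf{h}} E\left\{ \left[ y(n) - \sum_{i=0}^{M-1} h(i)\, d_2(n-i) \right]^2 \right\}$$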
Signal estimation
The minimization can be solved by applying the orthogonality principle:
$$r_{y d_2}(\ell) - \sum_{i=0}^{M-1} \hat{h}(i)\, r_{d_2}(\ell - i) = 0, \quad \ell = 0, \dots, M-1$$
This can be solved in the same way as the normal equations, but it is usually solved by sequential algorithms such as the LMS algorithm. The advantages of the LMS are:
- No matrix inversion, low complexity
- Fully adaptive, suitable for non-stationary signal and noise
- Low delay
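For comparison with the sequential approach, the batch solution can be sketched as follows. This is an illustrative sketch under stated assumptions, not part of the lecture: the correlations are replaced by sample averages, the noise path `true_h`, the sinusoidal stand-in for speech, and the helper names `correlation`, `solve_linear` and `wiener_taps` are all hypothetical.

```python
import math
import random

def correlation(u, v, lag):
    """Sample estimate of E[u(n) v(n - lag)] for lag >= 0."""
    return sum(u[n] * v[n - lag] for n in range(lag, len(u))) / len(u)

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small system."""
    n = len(b)
    aug = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def wiener_taps(y, d2, num_taps):
    """Solve sum_i h(i) r_d2(l - i) = r_yd2(l) for l = 0..num_taps-1."""
    r_d = [correlation(d2, d2, lag) for lag in range(num_taps)]
    r_yd = [correlation(y, d2, lag) for lag in range(num_taps)]
    R = [[r_d[abs(row - i)] for i in range(num_taps)] for row in range(num_taps)]
    return solve_linear(R, r_yd)

random.seed(0)
N = 5000
true_h = [0.8, -0.3, 0.1]                         # hypothetical noise path
d2 = [random.gauss(0.0, 1.0) for _ in range(N)]   # reference channel noise
s = [0.5 * math.sin(0.05 * n) for n in range(N)]  # stand-in for speech
d1 = [sum(h * d2[n - i] for i, h in enumerate(true_h) if n - i >= 0)
      for n in range(N)]
y = [s_n + d_n for s_n, d_n in zip(s, d1)]        # primary channel

h_hat = wiener_taps(y, d2, num_taps=3)
print("estimated taps:", [round(h, 3) for h in h_hat])
```

The batch solution needs the whole recording and a linear solve, which is the cost the LMS avoids.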
LMS
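- It is a sequential, gradient descent minimization method.
- The estimate of the weights is updated each time a new sample is available:
$$\hat{\mathbf{h}}_{k+1} = \hat{\mathbf{h}}_k - \mu\, \mathbf{g}_k$$
where $\mu$ is the step size and the elements of the gradient vector $\mathbf{g}$ of the mean squared error $E[\hat{s}^2(n)]$ are:
$$g(\ell) = \frac{\partial E[\hat{s}^2(n)]}{\partial \hat{h}(\ell)} = -2\left[ r_{y d_2}(\ell) - \sum_{i=0}^{M-1} \hat{h}(i)\, r_{d_2}(\ell - i) \right]$$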
LMS
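Or, in matrix form:
$$\mathbf{g} = -2\left( \mathbf{r}_{y d_2} - \mathbf{R}_{d_2} \hat{\mathbf{h}} \right)$$
The most important trick in this sequential implementation is to approximate the correlation matrix and the cross-correlation vector by their instantaneous estimates:
$$\hat{\mathbf{R}}_{d_2} = \mathbf{d}_2 \mathbf{d}_2^H, \qquad \hat{\mathbf{r}}_{y d_2} = y(n)\, \mathbf{d}_2$$
where $\mathbf{d}_2 = [d_2(n), d_2(n-1), \dots, d_2(n-M+1)]^T$.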
LMS
The step size is often chosen empirically, as long as the following condition is satisfied for stability:
$$0 < \mu < \frac{1}{\lambda_{\max}}$$
where $\lambda_{\max}$ is the largest eigenvalue of the matrix $\mathbf{R}_{d_2}$.
The larger the step size, the faster the convergence, but also the larger the estimation variance.
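The LMS update with instantaneous estimates reduces to a few lines per sample. The sketch below is one possible reading of the slides, for real-valued signals: with the instantaneous gradient g = -2 e(n) d2, the update becomes h <- h + 2 mu e(n) d2. The noise path `true_h`, the sinusoidal stand-in for speech and the value of `mu` are assumptions chosen for illustration.

```python
import math
import random

def lms_noise_canceller(y, d2, num_taps, mu):
    """Adaptive noise canceller: returns the signal estimate and final taps."""
    h = [0.0] * num_taps
    taps = [0.0] * num_taps          # d2(n), d2(n-1), ..., d2(n-M+1)
    s_hat = []
    for y_n, d_n in zip(y, d2):
        taps = [d_n] + taps[:-1]
        d1_hat = sum(h_i * x_i for h_i, x_i in zip(h, taps))
        e = y_n - d1_hat             # e(n) is also the signal estimate
        # h <- h - mu * g, with instantaneous gradient g = -2 * e * taps
        h = [h_i + 2.0 * mu * e * x_i for h_i, x_i in zip(h, taps)]
        s_hat.append(e)
    return s_hat, h

def mean_square(x):
    return sum(v * v for v in x) / len(x)

random.seed(0)
N = 5000
true_h = [0.8, -0.3, 0.1]                         # hypothetical noise path
d2 = [random.gauss(0.0, 1.0) for _ in range(N)]   # white, so lambda_max is about 1
s = [0.5 * math.sin(0.05 * n) for n in range(N)]  # stand-in for speech
d1 = [sum(h * d2[n - i] for i, h in enumerate(true_h) if n - i >= 0)
      for n in range(N)]
y = [s_n + d_n for s_n, d_n in zip(s, d1)]

# mu = 0.002 is far below the bound 1 / lambda_max for this reference noise.
s_hat, h_hat = lms_noise_canceller(y, d2, num_taps=3, mu=0.002)

noise_in = mean_square(d1[-1000:])
noise_out = mean_square([a - b for a, b in zip(s_hat[-1000:], s[-1000:])])
print("final taps:", [round(h, 3) for h in h_hat])
print("noise power before / after:", round(noise_in, 4), round(noise_out, 4))
```

Raising `mu` shortens the convergence time but makes the taps fluctuate more around their optimum, which is the trade-off stated above.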
Comments
The LMS belongs to the family of stochastic gradient algorithms.
The algorithm is based on instantaneous estimates of the correlation functions, which have high variance. But the algorithm works well because of its iterative nature, which averages the estimates over time.
Low complexity: O(M) per sample, where M is the filter order.
Although the derivation is based on the WSS assumption, the algorithm is applicable to non-stationary signals, due to the sequential implementation.
Implementation issues of ANC
Microphones must be sufficiently separated in space or separated by an acoustic barrier.
Typically 1500 taps are needed => large misadjustment => pronounced echo => must use small step-size => long convergence time.
Different delays from the sources to the two microphones must be taken care of.
Frequency-domain LMS can reduce the number of taps needed.
ANC can be generalized to a multi-channel system, which can be seen as a generalized beamforming system.
Eliminating cross-talk
Cross-talk: if the signal is also captured in the reference channel, the ANC will suppress part of the signal. Cross-talk can be reduced by employing two adaptive filters within a feedback loop.
Beamforming
Compared to ANC, beamforming is truly a spatial filtering technique.
First, locate the source direction; then form a beam directed at the source.
The source location problem is analogous to the spectral analysis problem, with the frequency domain replaced by the spatial domain.
A simple array model
Planar wave
Uniform linear array
Sensor responses are identical and LTI
Sensors are omnidirectional
One parameter to estimate: DOA
ULA
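The signal model:
$$\mathbf{y}(t) = \mathbf{a}(\theta)\, s(t) + \mathbf{e}(t)$$
where the array transfer vector is:
$$\mathbf{a}(\theta) = \left[ 1 \;\; e^{-j\omega_c \tau_2} \;\; \dots \;\; e^{-j\omega_c \tau_m} \right]^T$$
where $\tau_k$ is the delay at sensor $k$ with reference to the first sensor, and $\omega_c$ is the center frequency of the signal. For a uniform linear array with sensor spacing $d$ and propagation speed $c$, $\tau_k = (k-1)\, d \sin\theta / c$. By defining the spatial frequency as:
$$\omega_s = \omega_c \frac{d \sin\theta}{c}$$
we can write the array transfer vector as:
$$\mathbf{a}(\theta) = \left[ 1 \;\; e^{-j\omega_s} \;\; \dots \;\; e^{-j(m-1)\omega_s} \right]^T$$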
ULA
A direct analogy between frequency analysis and spatial analysis using the spatial frequency.
To avoid spatial aliasing:
$$d \le \lambda / 2$$
where $\lambda$ is the signal wavelength.
All frequency analysis techniques can be applied to the DOA estimation problem.
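Spatial aliasing can be illustrated with the array transfer vector of a uniform linear array. This is a small sketch under assumptions: the function names are hypothetical, a negative exponent sign convention is assumed, and the spatial frequency is written as 2 pi d sin(theta) / lambda, which follows from omega_c = 2 pi c / lambda.

```python
import cmath
import math

def spatial_frequency(theta, spacing, wavelength):
    """omega_s = omega_c * d * sin(theta) / c = 2*pi*d*sin(theta) / wavelength."""
    return 2.0 * math.pi * spacing * math.sin(theta) / wavelength

def steering_vector(theta, num_sensors, spacing, wavelength):
    """Array transfer vector [1, e^{-j w_s}, ..., e^{-j (m-1) w_s}]."""
    w_s = spatial_frequency(theta, spacing, wavelength)
    return [cmath.exp(-1j * k * w_s) for k in range(num_sensors)]

def max_difference(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

wavelength = 1.0
m = 4
broadside, endfire = 0.0, math.pi / 2

# Spacing of half a wavelength: the two directions give distinct vectors.
half = max_difference(steering_vector(broadside, m, 0.5, wavelength),
                      steering_vector(endfire, m, 0.5, wavelength))
# Spacing of a full wavelength: the two directions are indistinguishable.
full = max_difference(steering_vector(broadside, m, 1.0, wavelength),
                      steering_vector(endfire, m, 1.0, wavelength))
print("d = lambda/2:", round(half, 6), " d = lambda:", round(full, 6))
```

With a spacing of one wavelength the spatial frequency at 90 degrees is 2 pi, which is the same as 0, so that source looks exactly like one at broadside.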
Spatial filtering
Analogy between spatial filter and temporal filter
Spatial filtering
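The spatially filtered signal:
$$x(t) = \mathbf{h}^* \mathbf{a}(\theta)\, s(t)$$
where $^*$ denotes the conjugate transpose.
Objective: find the filter that passes undistorted the signals with a given DOA, and attenuates all the other DOAs as much as possible:
$$\min_{\mathbf{h}} \mathbf{h}^* \mathbf{h} \quad \text{subject to} \quad \mathbf{h}^* \mathbf{a}(\theta) = 1$$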
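The constrained minimization above has the standard closed-form solution h = a / (a* a), which for a uniform linear array is a delay-and-sum beamformer. The sketch below evaluates its response over a few directions; the array size, the half-wavelength spacing, the look direction and the function names are assumptions chosen for illustration.

```python
import cmath
import math

def steering_vector(theta, num_sensors, spacing_in_wavelengths=0.5):
    """Array transfer vector of a uniform linear array."""
    w_s = 2.0 * math.pi * spacing_in_wavelengths * math.sin(theta)
    return [cmath.exp(-1j * k * w_s) for k in range(num_sensors)]

def distortionless_weights(a):
    """Minimize h* h subject to h* a = 1, which gives h = a / (a* a)."""
    energy = sum(abs(a_k) ** 2 for a_k in a)
    return [a_k / energy for a_k in a]

def response(h, a):
    """Spatial filter output h* a for a unit-amplitude source."""
    return sum(h_k.conjugate() * a_k for h_k, a_k in zip(h, a))

m = 8
look = math.radians(20.0)                 # hypothetical look direction
h = distortionless_weights(steering_vector(look, m))

for deg in (-40, 0, 20, 60):
    gain = abs(response(h, steering_vector(math.radians(deg), m)))
    print(f"DOA {deg:4d} deg -> |h* a| = {gain:.3f}")
```

The response is exactly one in the look direction and smaller elsewhere; sweeping the angle over a fine grid traces out the beam pattern.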
The beam pattern
Restrictions to beamforming
Very sensitive to array geometry; needs good calibration
Has only directivity; no selectivity in range or other location parameters
Frequency response is not flat
Ambient noise is assumed to be spatially white
Beam width (or selectivity) depends on the size of the array
Spatial aliasing problem
Blind Source Separation (BSS)
MIMO systems
Spatial processing techniques with no knowledge of array geometry
Invisible beam
Arbitrarily high spatial resolution
Do not depend on signal frequency
Spatial noise is not assumed to be white
Not a spatial sampling system
Solutions to BSS
Independent Component Analysis (ICA) [2]
Independent Factor Analysis (IFA) [3]
[2] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, John Wiley & Sons, Inc., 2001.
[3] H. Attias, "Independent factor analysis," Neural Computation, 1999.
Independent component analysis (ICA)
Instantaneous mixing
The number of sensors is greater than or equal to the number of sources
No system noise
The sources (components) are independent of each other
The sources are non-Gaussian processes
ICA model
Cocktail party problem. Three sources, three sensors:
$$\begin{aligned}
x_1(t) &= a_{11} s_1(t) + a_{12} s_2(t) + a_{13} s_3(t) \\
x_2(t) &= a_{21} s_1(t) + a_{22} s_2(t) + a_{23} s_3(t) \\
x_3(t) &= a_{31} s_1(t) + a_{32} s_2(t) + a_{33} s_3(t)
\end{aligned}$$
Or, in matrix form:
$$\mathbf{x} = \mathbf{A}\mathbf{s}$$
Neither s nor A is known, so the system cannot be solved by linear algebra alone. If the sources are independent and non-Gaussian, the A matrix can be found by maximizing the non-Gaussianity of the estimated sources.
Contrast function
An iterative gradient method. First initialize the A matrix. If the mixing matrix A is square and non-singular, move it to the left:
$$\mathbf{A}^{-1}\mathbf{x} = \mathbf{s}$$
Calculate the non-Gaussianity of s, and find the next estimate of A that gives a higher non-Gaussianity. Iterate until convergence.
The contrast function is the objective function to maximize or minimize.
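The iteration can be sketched for two sources and two sensors. Note that this sketch swaps in a different update from the plain gradient step described above: it whitens the mixtures and then applies the kurtosis-based fixed-point update w <- E[z (w'z)^3] - 3w from FastICA (see [2]), extracting a single component. The sources, the mixing matrix `A` and all function names are hypothetical.

```python
import math
import random

def center(v):
    m = sum(v) / len(v)
    return [t - m for t in v]

def mean_product(u, v):
    return sum(a * b for a, b in zip(u, v)) / len(u)

def whiten(x1, x2):
    """Rotate onto the principal axes and scale each axis to unit variance."""
    x1, x2 = center(x1), center(x2)
    c11, c12, c22 = mean_product(x1, x1), mean_product(x1, x2), mean_product(x2, x2)
    phi = 0.5 * math.atan2(2.0 * c12, c11 - c22)
    co, si = math.cos(phi), math.sin(phi)
    p1 = [co * a + si * b for a, b in zip(x1, x2)]
    p2 = [-si * a + co * b for a, b in zip(x1, x2)]
    l1, l2 = mean_product(p1, p1), mean_product(p2, p2)
    return [v / math.sqrt(l1) for v in p1], [v / math.sqrt(l2) for v in p2]

def extract_one(z1, z2, iterations=30):
    """Kurtosis-based fixed-point iteration on whitened data."""
    w = [1.0, 0.0]
    for _ in range(iterations):
        y = [w[0] * a + w[1] * b for a, b in zip(z1, z2)]
        y3 = [v ** 3 for v in y]
        w_new = [mean_product(z1, y3) - 3.0 * w[0],
                 mean_product(z2, y3) - 3.0 * w[1]]
        norm = math.hypot(w_new[0], w_new[1])
        w = [w_new[0] / norm, w_new[1] / norm]
    return [w[0] * a + w[1] * b for a, b in zip(z1, z2)]

def correlation(u, v):
    u, v = center(u), center(v)
    return mean_product(u, v) / math.sqrt(mean_product(u, u) * mean_product(v, v))

random.seed(1)
N = 5000
root3 = math.sqrt(3.0)
s1 = [random.uniform(-root3, root3) for _ in range(N)]          # sub-Gaussian
s2 = [random.choice((-1.0, 1.0)) * random.expovariate(math.sqrt(2.0))
      for _ in range(N)]                                        # Laplacian
A = [[1.0, 0.6], [0.4, 1.0]]                                    # unknown to the method
x1 = [A[0][0] * a + A[0][1] * b for a, b in zip(s1, s2)]
x2 = [A[1][0] * a + A[1][1] * b for a, b in zip(s1, s2)]

z1, z2 = whiten(x1, x2)
y = extract_one(z1, z2)
best = max(abs(correlation(y, s1)), abs(correlation(y, s2)))
print("best |correlation| with a true source:", round(best, 3))
```

The extracted component matches one of the sources only up to sign and scale, which is an inherent ambiguity of the ICA model.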
Maximizing non-Gaussianity
Non-Gaussian is independent
Measuring non-Gaussianity
- by kurtosis
- by negentropy
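Kurtosis is the simpler of the two measures to compute. The sketch below uses the excess kurtosis E[(x - m)^4] / var^2 - 3, which is zero for a Gaussian; the sample sizes and test signals are assumptions for illustration. It also shows why non-Gaussianity works as a contrast: a mixture of two independent sources is closer to Gaussian than either source.

```python
import random

def excess_kurtosis(x):
    """Sample excess kurtosis: fourth central moment / variance^2 - 3."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    fourth = sum((v - mean) ** 4 for v in x) / n
    return fourth / var ** 2 - 3.0

random.seed(0)
N = 20000
gaussian = [random.gauss(0.0, 1.0) for _ in range(N)]
u1 = [random.uniform(-1.0, 1.0) for _ in range(N)]
u2 = [random.uniform(-1.0, 1.0) for _ in range(N)]
mixture = [a + b for a, b in zip(u1, u2)]   # sum of two independent sources

k_gauss = excess_kurtosis(gaussian)
k_source = excess_kurtosis(u1)
k_mixture = excess_kurtosis(mixture)
print("Gaussian:", round(k_gauss, 3))
print("uniform source:", round(k_source, 3))
print("mixture of two uniform sources:", round(k_mixture, 3))
```

Unmixing therefore amounts to searching for the linear combinations whose kurtosis is as far from zero as possible.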
ICA methods
ICA by maximizing non-Gaussianity
ICA by Maximum Likelihood
ICA by minimizing mutual information
ICA by nonlinear decorrelation
Extensions to ICA
Noisy ICA
ICA with non-square mixing matrix
Independent Factor Analysis
Convolutive mixture
Methods using time structure
Blind Source Extraction
Only interested in one or a few sources out of many (feature extraction)
Save computation
Don’t know the exact number of sources
BSE
D. Mandic and A. Cichocki, An Online Algorithm For Blind Extraction Of Sources With Different Dynamical Structures.