
Page 1

Independent Component Analysis

Zhen Wei, Li Jin, Yuxue Jin

Department of Statistics, Stanford University

An Introduction

Page 2

Outline

Introduction: History, Motivation, and Problem Formulation

Algorithms: Stochastic Gradient Algorithm, FastICA, Ordering Algorithm

Applications

Concluding Remarks

Page 3

Introduction

Independent Component Analysis (ICA) has been widely discussed in signal processing, neural computation, and finance; it was first introduced as a tool for separating blind sources in a mixed signal. The basic idea of ICA is to reconstruct, from the observed sequences, the hypothesized independent original sequences.

Page 4

ICA versus PCA

Similarity: feature extraction, dimension reduction.

Difference: PCA uses up to second-order moments of the data to produce uncorrelated components; ICA strives to generate components that are as independent as possible.

Page 5

Motivation - Blind Source Separation

Suppose that there are k unknown independent sources s(t) = (s_1(t), ..., s_k(t))^T.

A data vector x(t) is observed at each time point t, such that

    x(t) = A s(t),

where A is a full-rank scalar mixing matrix.
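A minimal synthetic sketch of this model in Python (the source distributions, seed, and dimensions are illustrative assumptions, not from the slides):

    import numpy as np

    rng = np.random.default_rng(0)
    T = 1000
    s = np.vstack([
        rng.laplace(size=T),                      # a supergaussian source
        rng.uniform(-np.sqrt(3), np.sqrt(3), T),  # a subgaussian source
    ])
    A = rng.standard_normal((2, 2))  # full rank with probability one
    x = A @ s                        # observed sequences x(t) = A s(t), shape (2, T)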

Page 6

[Diagram: blind source separation] The blind sources (independent components) pass through the mixing process A to produce the observed sequences; the de-mixing process W maps the observed sequences to the recovered independent components.

Page 7

Problem formulation

The goal of ICA is to find a linear mapping W such that the unmixed sequences u = W x are maximally statistically independent.

Find some W such that

    W A = P C,

where C is a diagonal (scaling) matrix and P is a permutation matrix.

Page 8

Principle of ICA: Nongaussianity

The fundamental restriction in ICA is that the independent components must be nongaussian for ICA to be possible.

This is because gaussianity is invariant under orthogonal transformations, which makes the matrix A unidentifiable for gaussian independent components: if s ~ N(0, I) and U is any orthogonal matrix, then Us ~ N(0, I) as well, so the mixing matrices A and AU^T produce identically distributed observations.

Page 9

Measures of nongaussianity (1)

Kurtosis:

    kurt(y) = E[y^4] - 3 (E[y^2])^2

Kurtosis is zero for gaussian variables, positive for supergaussian (peaked) densities, and negative for subgaussian (flat) densities.

Kurtosis can be very sensitive to outliers when its value has to be estimated from a measured sample.
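A quick numerical sketch of this sensitivity (the helper kurt and the data are our illustrative assumptions):

    import numpy as np

    def kurt(y):
        """Sample kurtosis: E[y^4] - 3 (E[y^2])^2 (zero for gaussian data)."""
        y = y - y.mean()
        return np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2

    rng = np.random.default_rng(0)
    y = rng.standard_normal(10_000)
    print(kurt(y))    # close to 0 for a gaussian sample
    y[0] = 10.0       # a single outlier...
    print(kurt(y))    # ...shifts the estimate by about 10^4 / 10^4 = 1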

Page 10

Measures of nongaussianity (2)

Negentropy

A gaussian variable has the largest entropy among all random variables of equal variance.

Definition:

    J(y) = H(y_gauss) - H(y),

where H(y) = -∫ f(y) log f(y) dy is the (differential) entropy and y_gauss is a gaussian random variable with the same covariance matrix as y.
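Negentropy is hard to compute from samples, so in practice it is approximated. One common contrast is Hyvärinen's log cosh approximation, in which J(y) is proportional to (E[G(y)] - E[G(nu)])^2 with G(u) = log cosh(u) and nu standard gaussian; a sketch, dropping the proportionality constant and assuming centered, unit-variance y:

    import numpy as np

    def negentropy_approx(y, n_ref=100_000, seed=0):
        """J(y) up to a positive constant, via (E[G(y)] - E[G(nu)])^2
        with G(u) = log cosh(u); y is assumed centered with unit variance."""
        rng = np.random.default_rng(seed)
        nu = rng.standard_normal(n_ref)          # gaussian reference sample
        G = lambda u: np.log(np.cosh(u))
        return (G(y).mean() - G(nu).mean()) ** 2

    rng = np.random.default_rng(1)
    print(negentropy_approx(rng.standard_normal(50_000)))            # ~ 0
    print(negentropy_approx(rng.laplace(size=50_000) / np.sqrt(2)))  # > 0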

Page 11

Measures of nongaussianity (3)

Mutual information

Definition:

    I(y_1, ..., y_m) = sum_{i=1}^m H(y_i) - H(y)

Mutual information is a natural measure of the dependence between random variables.

It is always non-negative, and zero if and only if the variables are statistically independent.
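A rough plug-in estimate of mutual information from samples (a sketch only; binning and bias need care in practice):

    import numpy as np

    def mutual_info_plugin(a, b, bins=30):
        """Plug-in estimate of I(a; b) from a 2-D histogram."""
        p_ab, _, _ = np.histogram2d(a, b, bins=bins)
        p_ab = p_ab / p_ab.sum()
        p_a = p_ab.sum(axis=1, keepdims=True)    # marginal of a
        p_b = p_ab.sum(axis=0, keepdims=True)    # marginal of b
        nz = p_ab > 0
        return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz])))

    rng = np.random.default_rng(0)
    a, b = rng.standard_normal(100_000), rng.standard_normal(100_000)
    print(mutual_info_plugin(a, b))            # ~ 0: independent (small positive bias)
    print(mutual_info_plugin(a, a + 0.5 * b))  # clearly positive: dependent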

Page 12

Relation between negentropy and mutual information

If we constrain the y_i to be uncorrelated and of unit variance, then

    I(y_1, ..., y_m) = C - sum_i J(y_i),

where C is a constant that does not depend on W. This shows that finding an invertible transformation W that minimizes the mutual information is equivalent to finding directions in which the negentropy is maximized.

Page 13

Algorithms

Maximum likelihood: Bell and Sejnowski (1995)

Maximum entropy

Minimum mutual information

Low-Complexity Coding and Decoding (LOCOCODE): Sepp Hochreiter et al. (1998)

Neuro-mimetic approach

Page 14

Maximum Likelihood

The log-likelihood is:

    L(W) = sum_{t=1}^T sum_{i=1}^k log f_i(w_i^T x(t)) + T log |det W|,

where the f_i are the density functions of the s_i and w_i is the i-th row of W.

Connection to mutual information: if the f_i were equal to the true distributions of the unmixed components u_i = w_i^T x, maximizing the likelihood would be equivalent to minimizing the mutual information of the components, up to an additive constant.
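A sketch of this log-likelihood for one concrete choice of source densities (unit-variance Laplacian f_i, our assumption; the slides leave the f_i generic):

    import numpy as np

    def ica_loglik(W, x):
        """L(W) = sum_t sum_i log f_i(w_i^T x(t)) + T log |det W|,
        with unit-variance laplacian f_i(s) = exp(-sqrt(2)|s|) / sqrt(2)."""
        T = x.shape[1]
        u = W @ x                                    # candidate sources u = W x
        log_f = -np.sqrt(2) * np.abs(u) - 0.5 * np.log(2)
        return log_f.sum() + T * np.log(abs(np.linalg.det(W)))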

Page 15

Stochastic Gradient Algorithm

Initialize the weight matrix W.

Iteration (in the natural-gradient form of Amari et al. (1996) [1]):

    W <- W + eta (I - g(u) u^T) W,  u = W x,

where eta is the learning rate and g is a nonlinear function applied component-wise, e.g. g(u) = tanh(u) for supergaussian sources.

Repeat until W converges to some W*. The ICAs are the components of u = W* x.
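A compact sketch of this update (assuming centered data and supergaussian sources with g = tanh; batch size, learning rate, and iteration count are arbitrary choices):

    import numpy as np

    def natural_gradient_ica(x, eta=0.01, n_iter=2000, batch=64, seed=0):
        """W <- W + eta (I - g(u) u^T) W with g = tanh, on centered
        data x of shape (k, T); returns the unmixing matrix W."""
        rng = np.random.default_rng(seed)
        k, T = x.shape
        W = np.eye(k) + 0.1 * rng.standard_normal((k, k))
        for _ in range(n_iter):
            u = W @ x[:, rng.integers(0, T, size=batch)]  # random mini-batch
            W += eta * (np.eye(k) - np.tanh(u) @ u.T / batch) @ W
        return W  # recovered components: u = W x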

Page 16

FastICA - Preprocessing

Centering: make x a zero-mean variable: x <- x - E[x].

Whitening: transform the centered vector x linearly so that its covariance is the identity:

    z = V x,  with E[z z^T] = I.

One can show that

    V = E D^{-1/2} E^T

works, where E D E^T is the eigen-decomposition of the covariance matrix E[x x^T].
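The two preprocessing steps in NumPy (a sketch using the sample covariance):

    import numpy as np

    def center_and_whiten(x):
        """Center x (k x T), then whiten with V = E D^{-1/2} E^T,
        where E D E^T is the eigen-decomposition of the sample covariance."""
        x = x - x.mean(axis=1, keepdims=True)    # centering
        d, E = np.linalg.eigh(x @ x.T / x.shape[1])
        V = E @ np.diag(d ** -0.5) @ E.T         # whitening matrix
        z = V @ x                                # now E[z z^T] ~ I
        return z, V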

Page 17

FastICA algorithm

Initialize the weight matrix W.

Iteration (on whitened data z, for each row w of W):

    w <- E[z g(w^T z)] - E[g'(w^T z)] w,

followed by normalization (and decorrelation of the rows of W), where g is a nonlinearity such as g(u) = tanh(u).

Repeat until convergence. The ICAs are the components of u = W z.
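A sketch of this fixed-point iteration with symmetric decorrelation (g = tanh assumed; expects whitened input such as the z produced on the previous slide):

    import numpy as np

    def fastica(z, n_iter=200, tol=1e-9, seed=0):
        """Fixed-point iteration w <- E[z g(w^T z)] - E[g'(w^T z)] w for all
        rows of W at once, with g = tanh; z is whitened data of shape (k, T)."""
        rng = np.random.default_rng(seed)
        k, T = z.shape
        W = rng.standard_normal((k, k))
        for _ in range(n_iter):
            gU = np.tanh(W @ z)
            W_new = gU @ z.T / T - np.diag((1.0 - gU ** 2).mean(axis=1)) @ W
            d, E = np.linalg.eigh(W_new @ W_new.T)        # symmetric decorrelation:
            W_new = E @ np.diag(d ** -0.5) @ E.T @ W_new  # W <- (W W^T)^{-1/2} W
            done = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
            W = W_new
            if done:
                break
        return W  # recovered components: u = W z (up to permutation and sign)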

Page 18

Ordering of the ICAs

Unlike PCA, whose components have a well-defined and intuitive ordering given by the eigenvalues of the covariance matrix, ICA offers no natural ordering of its components, so this problem deserves further investigation.

We follow a heuristic scheme called testing-and-acceptance (TNA).

Page 19

Ordering Algorithm

Page 20

Applications (1)

Feature extraction: recognize the pattern of excess returns of mutual funds in the financial market of China.

Data: the time series of excess returns of four such mutual funds.

Page 21

ICA components

Page 22

ICA reconstruction

Page 23

Applications (2)

Image de-noising: ICA + sparse code shrinkage.

The example is taken from (Hyvärinen, 1999).

Page 24

Image de-noising (1)

Suppose a noisy image model holds:

    x̃ = x + n,

where x is the original image data and n is uncorrelated noise.

The observation is transformed as

    y = W x̃,

where W is an orthogonal matrix that is the best orthogonal approximation of the inverse of the ICA mixing matrix.
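The best orthogonal approximation mentioned here can be computed with an SVD (orthogonal Procrustes); a sketch, where A_hat stands for a hypothetical estimated mixing matrix:

    import numpy as np

    def nearest_orthogonal(M):
        """Closest orthogonal matrix to M in the Frobenius norm:
        if M = U S V^T (SVD), the answer is U V^T (orthogonal Procrustes)."""
        U, _, Vt = np.linalg.svd(M)
        return U @ Vt

    # e.g. W = nearest_orthogonal(np.linalg.inv(A_hat)) for a hypothetical
    # estimated ICA mixing matrix A_hat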

Page 25

Image de-noising (2)

Sparse code shrinkage transformation: shrink the transformed components and map back,

    ŝ = g(W x̃),   x̂ = W^T ŝ,

where g(.) is applied component-wise. Function g(.) is zero close to the origin and linear after a cutoff value depending on the parameters of the Laplacian density and the Gaussian noise density.
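For a Laplacian source density this kind of g reduces to soft thresholding; a sketch with the qualitative shape the slide describes (the exact parameterization via the scale d and the noise variance is our assumption):

    import numpy as np

    def laplacian_shrink(u, noise_var, d):
        """Soft-thresholding shrinkage: zero near the origin, linear beyond
        the cutoff sqrt(2) * noise_var / d (d: laplacian scale parameter)."""
        cut = np.sqrt(2.0) * noise_var / d
        return np.sign(u) * np.maximum(0.0, np.abs(u) - cut)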

Page 26

1. Original image
2. Corrupted with noise
3. Recovered by ICA and sparse code shrinkage
4. Recovered by classical Wiener filtering

Page 27

Concluding Remarks

ICA is a flexible and widely applicable tool that searches for a linear transformation of the observed data into statistically maximally independent components.

It is also interesting to note that the methods for computing ICA (maximum negentropy, minimum mutual information, maximum likelihood) are equivalent to each other, at least in the statistical sense. There is also a resemblance between the forms of the gradient-descent (Newton-Raphson) algorithm and the FastICA algorithm.

Other application prospects: audio (signal) processing, image processing, telecommunications, finance, education.

Page 28

References

[1] Amari, S., Cichocki, A., and Yang, H. (1996). A New Learning Algorithm for Blind Signal Separation. Advances in Neural Information Processing Systems 8, pages 757-763.

[2] Bell, A. J. and Sejnowski, T. J. (1995). An Information-Maximization Approach to Blind Separation and Blind Deconvolution. Neural Computation, 7:1129-1159.

[3] Cardoso, J.-F. and Souloumiac, A. (1993). Blind beamforming for non-Gaussian signals. IEE Proceedings F, 140(6):362-370.

[4] Chatfield, C. (1989). The Analysis of Time Series: An Introduction, Fourth Edition. London: Chapman and Hall.

Page 29

References continued

[5] Moulines, E., Cardoso, J.-F., and Gassiat, E. (1997). Maximum likelihood for blind separation and deconvolution of noisy signals using mixture models. Proc. ICASSP '97, volume 5, pages 3617-3620, Munich.

[6] Nadal, J.-P. and Parga, N. (1997). Redundancy reduction and independent component analysis: Conditions on cumulants and adaptive approaches. Neural Computation, 9:1421-1456.

[7] Xu, L., Cheung, C., Yang, H., and Amari, S. (1997). Maximum equalization by entropy maximization and mixture of cumulative distribution functions. Proc. of ICNN '97, pages 1821-1826, Houston.

[8] Yang, H., Amari, S., and Cichocki, A. (1997). Information back-propagation for blind separation of sources from non-linear mixtures. Proc. of ICNN '97, pages 2141-2146, Houston.