Independent Component Analysis
CAP5610: Machine Learning
Instructor: Guo-Jun QI
Review: Principal Component Analysis
• PCA aims to find a set of principal components that span a subspace.
• Projecting data onto this subspace yields the minimum reconstruction error.
• The principal components are orthogonal.
• PCA projection: $\mathbf{y} = W\mathbf{x}$
• Each row of W is a direction along which x will be projected (a minimal sketch of this projection follows below).
[Figure: data projected onto principal directions $w_1$ and $w_2$]
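As a concrete reference, here is a minimal NumPy sketch of this projection, assuming PCA is computed from the eigendecomposition of the sample covariance; the function name and test data are illustrative only.

```python
import numpy as np

def pca_projection(X, k):
    """Project rows of X (n_samples x n_features) onto the top-k principal components.

    Returns Y = Xc @ W.T, where each row of W is a principal direction.
    """
    Xc = X - X.mean(axis=0)                  # center the data
    C = np.cov(Xc, rowvar=False)             # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # sort directions by decreasing variance
    W = eigvecs[:, order[:k]].T              # rows of W are principal directions
    return Xc @ W.T, W

# Example: 2-D correlated data projected onto its first principal component
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]])
Y, W = pca_projection(X, k=1)
```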
PCA
• PCA removes the correlations between components, but that does not mean the components become independent.
• No correlation: $\mathrm{Cov}(y_1, y_2) = E[y_1 y_2] - E[y_1]E[y_2] = 0$
• Independence: $p(y_1, y_2) - p(y_1)\,p(y_2) = 0$
• Only for the Gaussian distribution does no correlation imply independence (illustrated by the sketch after this slide).
• Independent Component Analysis (ICA) aims at finding a set of independent components.
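The following small numerical illustration (not from the slides) shows two variables with (near-)zero covariance that are clearly dependent, since one is a deterministic function of the other.

```python
import numpy as np

rng = np.random.default_rng(0)
y1 = rng.uniform(-1.0, 1.0, size=100_000)   # symmetric, non-Gaussian
y2 = y1 ** 2                                 # fully determined by y1, so dependent

# Covariance is (close to) zero: E[y1*y2] = E[y1^3] = 0 and E[y1] = 0
cov = np.mean(y1 * y2) - np.mean(y1) * np.mean(y2)
print(f"cov(y1, y2) ~ {cov:.4f}")            # ~0, yet y2 = y1^2 is not independent of y1
```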
Source separation problem
• M independent sources $\{s_1, \dots, s_M\}$
• Mixture observations of the signals:
$x_i = \sum_{j=1}^{M} a_{ij} s_j$, i.e., $\mathbf{x} = A\mathbf{s}$
• $A = [a_{ij}]$ is the mixing matrix.
• Can we find the mixing matrix and recover the sources?
• ICA answers this question; a toy sketch of the mixing model follows this slide.
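A toy sketch of the mixing model, with two made-up non-Gaussian sources and an assumed 2-by-2 mixing matrix; it only generates the observations x = A s that ICA would later try to unmix.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 2000)

# Two independent, non-Gaussian sources s (M = 2)
s1 = np.sign(np.sin(2 * np.pi * 7 * t))      # square wave
s2 = rng.laplace(size=t.size)                # heavy-tailed noise
S = np.vstack([s1, s2])                      # shape (M, T)

# Unknown mixing matrix A; observations x = A s
A = np.array([[0.8, 0.3],
              [0.4, 0.9]])
X = A @ S                                    # each row: x_i = sum_j a_ij * s_j

# ICA would estimate W ~ A^{-1} from X alone, recovering y = W x ~ s
```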
Inverse problem
• Mixture of signals: $\mathbf{x} = A\mathbf{s}$
• ICA: find W such that the components of $\mathbf{y} = W\mathbf{x}$ are as independent as possible.
• y is an estimate of s.
• W is an estimate of $A^{-1}$.
PCA VS. ICA
• ICA finds the underlying independent components that generate the data.
ICA for Natural images
• The ICA components correspond to natural image structures.
PCA for Natural images
• PCA components are orthogonal, which may not correspond to any independent structures in natural images.
Applications: denoising images
• Noise and image are independent.
[Figure: denoising comparison – original, noisy, median filter, ICA]
Statistical independence
• Definition: components $y_1, \dots, y_M$ are independent if their joint density factorizes, $p(y_1, \dots, y_M) = p(y_1)\cdots p(y_M)$.
Source ambiguity
• Independent sources can be recovered only up to sign, scale, and permutation.
• If s is changed by sign, scale, and permutation, there exists another mixing matrix such that the observed signals x stay unchanged.
• Proof: let P be a permutation matrix and D a diagonal scaling matrix; then
$\mathbf{x} = A\mathbf{s} = (A D^{-1} P^{-1})\,[P D\,\mathbf{s}]$
(a numerical check follows this slide).
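A quick numerical check of this ambiguity, using an arbitrary random mixing matrix and example choices of P and D:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 2))                  # original mixing matrix
s = rng.laplace(size=(2, 100))               # original sources

P = np.array([[0.0, 1.0],                    # permutation matrix (swap the sources)
              [1.0, 0.0]])
D = np.diag([2.0, -0.5])                     # scaling / sign-flip matrix

A_new = A @ np.linalg.inv(D) @ np.linalg.inv(P)   # A D^{-1} P^{-1}
s_new = P @ D @ s                                 # P D s

x = A @ s
x_new = A_new @ s_new
print(np.allclose(x, x_new))                 # True: the observations are identical
```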
Preprocessing: subtracting mean
• Mean: $\mathbf{m} = E[\mathbf{x}]$
• Replace x with $\mathbf{x} - \mathbf{m}$ so the observations have zero mean.
• In this case, the original sources s also have zero mean.
Preprocessing: whitening
• Covariance matrix of the observed signals: $C = E[\mathbf{x}\mathbf{x}^T]$
• Do SVD (eigendecomposition): $C = E D E^T$
• Let $\mathbf{z} = E D^{-1/2} E^T \mathbf{x}$; then z is the whitened signal, because $E[\mathbf{z}\mathbf{z}^T] = E D^{-1/2} E^T C\, E D^{-1/2} E^T = I$
• Define a new mixing matrix $A^{*} = E D^{-1/2} E^T A$; then $\mathbf{z} = A^{*}\mathbf{s}$
• We also have $E[\mathbf{z}\mathbf{z}^T] = A^{*} E[\mathbf{s}\mathbf{s}^T] A^{*T} = A^{*} A^{*T} = I$, so $A^{*}$ is orthogonal (a whitening sketch follows this slide).
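A minimal whitening sketch following the derivation above (eigendecomposition of the sample covariance); it assumes x is already zero-mean and stored with one sample per column.

```python
import numpy as np

def whiten(X):
    """Whiten zero-mean data X (n_features x n_samples): return z with E[z z^T] = I."""
    C = X @ X.T / X.shape[1]                 # sample covariance E[x x^T]
    d, E = np.linalg.eigh(C)                 # C = E diag(d) E^T
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T  # whitening matrix V = E D^{-1/2} E^T
    Z = V @ X                                # whitened signals z = V x
    return Z, V

# After whitening, the effective mixing matrix A* = V A is orthogonal,
# since E[z z^T] = A* E[s s^T] A*^T = A* A*^T = I for unit-variance sources.
```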
Preprocessing: Benefit
• Reducing the number of free parameters.
• An orthogonal N-by-N matrix $A^{*}$ has only $N(N-1)/2$ free parameters, instead of the $N^2$ parameters of a general mixing matrix.
Solving ICA
• Problem: given whitened zero-mean x, find an orthogonal matrix W so that the components of y = Wx are as independent as possible.
• Question: how do we measure the independence of the components?
• Central limit theorem: the sum of a set of i.i.d. random variables approaches a Gaussian distribution.
Non-Gaussianity and independence
• $y = \mathbf{w}^T\mathbf{x} = \mathbf{w}^T A\mathbf{s}$ is a weighted sum of the sources s, where $\mathbf{w}^T$ is a row of W.
• If y is a mixture of several sources, then by the central limit theorem y is closer to Gaussian.
• If instead y equals a single source (up to scale and sign), then y should be far from Gaussian.
• Non-Gaussianity therefore serves as a measure of the independence of y.
Measure of non-Gaussianity
• Kurtosis – the fourth-order cumulant: $\kappa_4(y) = E[y^4] - 3\,(E[y^2])^2$ (equal to $E[y^4] - 3$ for unit-variance y); it is zero for a Gaussian (computed in the sketch below).
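A small sketch of this measure; the distributions used for comparison are illustrative, not from the slides.

```python
import numpy as np

def excess_kurtosis(y):
    """Fourth-order cumulant kappa4(y) = E[y^4] - 3 (E[y^2])^2; zero for a Gaussian."""
    y = y - y.mean()
    return np.mean(y ** 4) - 3.0 * np.mean(y ** 2) ** 2

rng = np.random.default_rng(0)
print(excess_kurtosis(rng.normal(size=100_000)))     # ~0    (Gaussian)
print(excess_kurtosis(rng.laplace(size=100_000)))    # ~+3   (super-Gaussian)
print(excess_kurtosis(rng.uniform(-1, 1, 100_000)))  # ~-1.2 (sub-Gaussian)
```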
The Fast ICA algorithm (Hyvarinen)
• Given whitened zero-mean data z, find w such that $y = \mathbf{w}^T\mathbf{z}$ is as far from Gaussian as possible and w is a unit vector, $\mathbf{w}^T\mathbf{w} = 1$.
• Maximize the kurtosis: $f(\mathbf{w}) = \kappa_4(y) = E[y^4] - 3$, s.t. $\mathbf{w}^T\mathbf{w} = 1$.
• Lagrangian function: $L(\mathbf{w}) = f(\mathbf{w}) + \lambda(\mathbf{w}^T\mathbf{w} - 1)$
• KKT (stationarity) condition for the constrained problem: $f'(\mathbf{w}) + 2\lambda\mathbf{w} = 0$, i.e.,
$4\,E[(\mathbf{w}^T\mathbf{z})^3\,\mathbf{z}] + 2\lambda\mathbf{w} = 0$
Algorithm
• Randomly initialize w(1) as a unit vector.
• Update: $\mathbf{w}(k+1) \leftarrow E[(\mathbf{w}(k)^T\mathbf{z})^3\,\mathbf{z}] - 3\,\mathbf{w}(k)$
• Normalize: $\mathbf{w}(k+1) \leftarrow \mathbf{w}(k+1) / \|\mathbf{w}(k+1)\|$
• Repeat until convergence (a sketch of this fixed-point iteration follows).
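A minimal sketch of this one-unit fixed-point iteration, assuming Z is whitened zero-mean data with one sample per column; the function name and convergence test are illustrative.

```python
import numpy as np

def fastica_one_unit(Z, n_iter=100, tol=1e-8, seed=0):
    """Kurtosis-based FastICA fixed point for one component.

    Z: whitened, zero-mean data, shape (n_features, n_samples).
    Returns a unit vector w so that y = w^T z is maximally non-Gaussian.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=Z.shape[0])
    w /= np.linalg.norm(w)                            # random unit initialization w(1)
    for _ in range(n_iter):
        y = w @ Z                                     # y = w^T z for all samples
        w_new = (Z * y ** 3).mean(axis=1) - 3.0 * w   # E[(w^T z)^3 z] - 3 w
        w_new /= np.linalg.norm(w_new)                # renormalize to unit length
        if np.abs(np.abs(w_new @ w) - 1.0) < tol:     # converged (up to sign)
            w = w_new
            break
        w = w_new
    return w
```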
Estimate the other components
• Given an estimate of $\mathbf{w}_1$, find other directions to recover more sources.
• The 2nd w is found with the same formulation, but with the additional constraint $\mathbf{w} \perp \mathbf{w}_1$.
• For the 3rd, 4th, ... components, one more orthogonality constraint is added each time (see the deflation sketch below).
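A sketch of this deflation scheme, repeating the kurtosis fixed-point step and projecting out previously found directions with Gram-Schmidt so each new w stays orthogonal to the earlier ones; names and defaults are illustrative.

```python
import numpy as np

def fastica_deflation(Z, n_components, n_iter=200, seed=0):
    """Estimate several components one by one, keeping each new w orthogonal
    to the previously found directions (Gram-Schmidt deflation). Z must be whitened."""
    rng = np.random.default_rng(seed)
    W = []                                             # previously found unit vectors
    for _ in range(n_components):
        w = rng.normal(size=Z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            y = w @ Z
            w = (Z * y ** 3).mean(axis=1) - 3.0 * w    # kurtosis fixed-point step
            for wj in W:                               # enforce w ⟂ w_1, ..., w_{k-1}
                w -= (w @ wj) * wj
            w /= np.linalg.norm(w)
        W.append(w)
    return np.vstack(W)                                # rows estimate rows of A^{-1}
```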
Another independence measure: negentropy
• Among all distributions with the same variance, the Gaussian has the maximal entropy.
• Maximize the negentropy $J(\mathbf{y}) = H(\mathbf{y}_{gauss}) - H(\mathbf{y})$,
where $\mathbf{y}_{gauss}$ is a Gaussian with the same covariance as y; negentropy is nonnegative and zero only for a Gaussian, so a large value means strong non-Gaussianity.
• Because $y = \mathbf{w}^T\mathbf{z}$, w is a unit vector, and z has covariance I, y has unit variance.
Approximation to Negentropy
• Negentropy is difficult to compute directly (it requires the density of y).
• Approximation using 3rd- and 4th-order cumulants: $J(y) \approx \frac{1}{12}E[y^3]^2 + \frac{1}{48}\kappa_4(y)^2$
• Approximation using non-quadratic functions $G$: $J(y) \approx c\,(E[G(y)] - E[G(\nu)])^2$, where $\nu$ is a standard Gaussian variable (a sketch of this approximation follows this list).
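A sketch of the non-quadratic approximation with the common choice $G(u) = \frac{1}{a}\log\cosh(au)$; the Gaussian reference term is estimated by Monte Carlo here, and the test distributions are illustrative.

```python
import numpy as np

def negentropy_approx(y, a=1.0, n_gauss=1_000_000, seed=0):
    """Approximate negentropy J(y) ∝ (E[G(y)] - E[G(nu)])^2 with G(u) = (1/a) log cosh(a u).

    y is assumed zero-mean and unit-variance (e.g., y = w^T z on whitened data);
    nu is a standard Gaussian reference, estimated here by Monte Carlo.
    """
    rng = np.random.default_rng(seed)
    G = lambda u: np.log(np.cosh(a * u)) / a
    E_G_y = np.mean(G(y))
    E_G_nu = np.mean(G(rng.normal(size=n_gauss)))     # Gaussian reference value
    return (E_G_y - E_G_nu) ** 2

rng = np.random.default_rng(1)
print(negentropy_approx(rng.normal(size=100_000)))                 # ~0 for a Gaussian y
print(negentropy_approx(rng.laplace(size=100_000) / np.sqrt(2)))   # > 0 for a non-Gaussian y
```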
Question
• In MP3, we will use PCA to project images into a subspace where the obtained components are supposed to be independent. Is this assumption valid?
• PCA only gives uncorrelated components.
• Only under a Gaussian distribution do uncorrelated components imply independence.
• So we need to verify whether the pixels (or the projected components) are generated from a Gaussian, using kurtosis and negentropy (a sketch of such a check follows this slide).
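A hedged sketch of such a check: compute the excess kurtosis of each projected component, with values near zero suggesting near-Gaussian (hence approximately independent) components. The random matrix below is only a stand-in for the actual PCA-projected image data in MP3.

```python
import numpy as np

def gaussianity_check(Y):
    """Excess kurtosis of each column of Y (n_samples x n_components).

    Values near 0 suggest near-Gaussian components, so PCA's uncorrelated
    components are then approximately independent; large |kurtosis| suggests not.
    """
    Yc = Y - Y.mean(axis=0)
    Yc /= Yc.std(axis=0)                     # standardize each component
    return (Yc ** 4).mean(axis=0) - 3.0      # kappa4 per component

# Stand-in for MP3: replace this random matrix with the PCA-projected image data.
rng = np.random.default_rng(0)
Y = rng.normal(size=(1000, 5))
print(gaussianity_check(Y))
```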
Summary
• ICA recovers a set of independent components
• PCA finds a set of uncorrelated components
• By the central limit theorem, we use non-Gaussianity to find the independent components.
• Surrogate measures: kurtosis and negentropy.
• FastICA algorithm – an iterative algorithm, no closed-form solution.
• Applications: separating independent sources from mixture signals.
• Image denoising.
• Voice separation.