Independent Component Analysis
CAP5610: Machine Learning
Instructor: Guo-Jun QI
Review: Principal Component Analysis
• PCA aims to find a set of principal components that span a subspace.
• Projecting the data onto this subspace yields the minimum reconstruction error.
• Principal components should be orthogonal.
• PCA projection: 𝐲 = 𝑊𝐱
• Each row of W is a direction along which x will be projected.
[Figure: 2-D data cloud with principal directions 𝑤1 and 𝑤2]
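The projection 𝐲 = 𝑊𝐱 can be sketched in NumPy (a hypothetical toy setup; the data and variable names are illustrative, not from the slides):

```python
import numpy as np

# Toy 2-D data, one sample per column (hypothetical example)
rng = np.random.default_rng(0)
X = np.diag([3.0, 0.5]) @ rng.normal(size=(2, 500))   # elongated cloud

# Center, then take eigenvectors of the covariance as principal directions
Xc = X - X.mean(axis=1, keepdims=True)
C = Xc @ Xc.T / Xc.shape[1]
eigvals, eigvecs = np.linalg.eigh(C)       # eigenvalues in ascending order
W = eigvecs[:, ::-1].T                     # rows of W = directions w1, w2

Y = W @ Xc                                 # PCA projection y = Wx
```

The rows of W are orthonormal, and the projected components in Y are uncorrelated (diagonal covariance), with the largest-variance direction first.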
PCA
• PCA removes the correlations between the components, but uncorrelated components are not necessarily independent.
• No correlation: Cov(𝑦1, 𝑦2) = 𝐸[𝑦1𝑦2] − 𝐸[𝑦1]𝐸[𝑦2] = 0
• Independence: 𝑝(𝑦1, 𝑦2) = 𝑝(𝑦1)𝑝(𝑦2)
• Only for the Gaussian distribution does no correlation imply independence.
• Independent Component Analysis (ICA) aims at finding a set of independent components.
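A minimal numerical illustration of the gap between the two notions (hypothetical example: 𝑦2 is a deterministic function of 𝑦1, yet their covariance is zero):

```python
import numpy as np

rng = np.random.default_rng(1)
y1 = rng.uniform(-1.0, 1.0, size=100_000)
y2 = y1 ** 2                    # fully determined by y1: maximally dependent

# Empirical covariance: E[y1*y2] - E[y1]E[y2] = E[y1^3] - 0, which is ~0
cov = np.mean(y1 * y2) - np.mean(y1) * np.mean(y2)
# y1 and y2 are uncorrelated, yet p(y1, y2) != p(y1) p(y2)
```

Knowing y1 tells us y2 exactly, so the pair is dependent even though the covariance vanishes.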
Source separation problem
• M independent sources {𝑠1, … , 𝑠𝑀}
• Mixture observations of the signals:
  𝑥𝑖 = ∑𝑗=1..𝑀 𝑎𝑖𝑗𝑠𝑗, or in matrix form 𝐱 = 𝐴𝐬
• 𝐴 = [𝑎𝑖𝑗] is the mixing matrix.
• Can we find the mixing matrix and recover the sources? This is the goal of ICA.
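The mixing model 𝐱 = 𝐴𝐬 in code (a hypothetical two-source setup; the sine/uniform sources and the matrix A are illustrative choices, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1000
t = np.linspace(0.0, 8.0, T)

# Two independent, non-Gaussian sources (one per row)
s = np.vstack([np.sin(2 * np.pi * t),
               rng.uniform(-1.0, 1.0, T)])

A = np.array([[1.0, 0.5],       # hypothetical mixing matrix A = [a_ij]
              [0.3, 1.0]])

x = A @ s                       # each observation x_i = sum_j a_ij * s_j
```

Each row of x is a different weighted blend of the same two sources; ICA's task is to undo this blend from x alone.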
Inverse problem
• Mixture of signals: 𝐱 = 𝐴𝐬
• ICA: find W such that the components of 𝐲 = 𝑊𝐱 are as independent as possible.
• 𝐲 is an estimate of 𝐬.
• 𝑊 is an estimate of 𝐴−1.
PCA vs. ICA
• ICA finds the underlying independent components that generate the data.
ICA for Natural images
• ICA components correspond to natural image structures.
PCA for Natural images
• PCA components are orthogonal, which may not correspond to any independent structures in natural images.
Applications: denoising images
• Noise and image are independent.
[Figure: denoising comparison – Original, Noisy, Median filter, ICA]
Statistical independence
• Definition: random variables 𝑦1, … , 𝑦𝑀 are independent if their joint density factorizes, 𝑝(𝑦1, … , 𝑦𝑀) = 𝑝(𝑦1)⋯𝑝(𝑦𝑀)
Source ambiguity
• Independent sources can be recovered only up to sign, scale and permutation.
• If 𝐬 is changed by sign, scale, and permutation, there exists another mixing matrix such that the observed signals 𝐱 stay unchanged.
• Proof: let P be a permutation matrix and D a diagonal scaling matrix. Then
  𝐱 = (𝐴𝐷−1𝑃−1)(𝑃𝐷𝐬) = 𝐴𝐬
  so the modified sources 𝑃𝐷𝐬 with mixing matrix 𝐴𝐷−1𝑃−1 generate the same observations.
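The ambiguity argument can be checked numerically (a hypothetical 2×2 example; the matrices and sources are made up for illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])     # hypothetical mixing matrix
s = np.array([[1.0], [2.0]])               # hypothetical sources

P = np.array([[0.0, 1.0], [1.0, 0.0]])     # permutation matrix
D = np.diag([2.0, -0.5])                   # sign/scale matrix

s2 = P @ D @ s                             # permuted, rescaled sources
A2 = A @ np.linalg.inv(P @ D)              # compensating mixing matrix

# A @ s and A2 @ s2 are the same observations x,
# even though s2 differs from s in order, scale, and sign
```

Since x alone cannot distinguish (A, s) from (A2, s2), ICA can recover the sources only up to sign, scale, and permutation.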
Preprocessing: subtracting mean
• Mean: 𝐦 = 𝐄[𝐱]
• Replace 𝐱 with 𝐱 − 𝐦 so that the data have zero mean.
• In this case, the original sources 𝐬 also have zero mean, since 𝐄[𝐱] = 𝐴𝐄[𝐬].
Preprocessing: whitening
• Covariance matrix of the observed signals: 𝐶 = 𝐄[𝐱𝐱𝑇]
• Eigendecomposition (SVD of the symmetric covariance): 𝐶 = 𝐸𝐷𝐸𝑇
• Let 𝐳 = 𝐷−1/2𝐸𝑇𝐱; then 𝐳 is the whitened signal, because 𝐄[𝐳𝐳𝑇] = 𝐷−1/2𝐸𝑇𝐶𝐸𝐷−1/2 = 𝐼
• Define 𝐴∗ = 𝐷−1/2𝐸𝑇𝐴 as a new mixing matrix; then 𝐳 = 𝐴∗𝐬
• We also have 𝐴∗𝐴∗𝑇 = 𝐼 (for unit-variance sources), i.e., 𝐴∗ is orthogonal
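The whitening step can be sketched with NumPy's eigendecomposition (hypothetical data; `eigh` supplies the factors 𝐸 and 𝐷 of the empirical covariance):

```python
import numpy as np

rng = np.random.default_rng(3)
# Zero-mean, correlated observations (one sample per column)
A = np.array([[2.0, 1.0], [1.0, 1.0]])
x = A @ rng.uniform(-1.0, 1.0, size=(2, 5000))
x = x - x.mean(axis=1, keepdims=True)

# Covariance C = E D E^T
C = x @ x.T / x.shape[1]
d, E = np.linalg.eigh(C)

# Whitening transform z = D^{-1/2} E^T x, so that E[z z^T] = I
z = np.diag(d ** -0.5) @ E.T @ x
```

Because the transform is built from the empirical covariance itself, the whitened signal z has exactly the identity sample covariance.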
Preprocessing: Benefit
• Reducing the number of parameters
• The orthogonal N-by-N matrix 𝐴∗ has only N(N−1)/2 free parameters.
Solving ICA
• Problem: given whitened zero-mean x, find an orthogonal matrix W so that the components of y = Wx are as independent as possible.
• Question: how do we measure the independence of the components?
• Central limit theorem – the sum of a set of i.i.d. random variables approaches a Gaussian distribution.
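The central-limit effect is easy to see empirically: summing many i.i.d. uniform variables drives the excess kurtosis toward the Gaussian value of zero (a hypothetical simulation; the count of 50 variables is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
u = rng.uniform(-1.0, 1.0, size=(50, 100_000))   # 50 i.i.d. uniform variables

def excess_kurtosis(y):
    """kappa_4 = E[y^4] - 3 (E[y^2])^2 for zero-mean y."""
    return np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2

one = u[0] / u[0].std()          # a single uniform variable, standardized
summed = u.sum(axis=0)
summed /= summed.std()           # standardized sum of all 50

# The sum's kurtosis is far closer to the Gaussian value 0 than a single
# uniform variable's (which is about -1.2 after standardization)
```

This is why mixtures of sources look "more Gaussian" than the sources themselves, which motivates non-Gaussianity as an ICA objective.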
Non-Gaussianity and independence
• y = 𝐰𝑇𝐱 = 𝐰𝑇𝐴𝐬 is a weighted sum of the sources 𝐬, where 𝐰𝑇 is a row vector of W.
• If y is a mixture of several sources, then by the central limit theorem y is closer to Gaussian.
• Otherwise, y is not a mixture but (up to scale and sign) a single component of 𝐬, and then y should be far away from Gaussian.
• Non-Gaussianity therefore measures the independence of y.
Measure of non-Gaussianity
• Kurtosis – the fourth-order cumulant: kurt(𝑦) = 𝐸[𝑦4] − 3(𝐸[𝑦2])2
• kurt(𝑦) = 0 for a Gaussian; kurt > 0 for super-Gaussian (peaked) distributions and kurt < 0 for sub-Gaussian (flat) distributions
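The fourth-order cumulant is straightforward to estimate from samples. A hypothetical sketch comparing the three regimes (function and variable names are illustrative):

```python
import numpy as np

def kurtosis(y):
    """Fourth-order cumulant of zero-mean y: E[y^4] - 3 (E[y^2])^2."""
    return np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2

rng = np.random.default_rng(4)
n = 200_000
g = rng.normal(size=n)               # Gaussian: kurtosis ~ 0
u = rng.uniform(-1.0, 1.0, size=n)   # uniform: sub-Gaussian, kurtosis < 0
l = rng.laplace(size=n)              # Laplace: super-Gaussian, kurtosis > 0
```

Kurtosis far from zero in either direction signals non-Gaussianity, which is why |kurt| (or kurt²) is used as the objective below.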
The Fast ICA algorithm (Hyvarinen)
• Given whitened zero-mean data z, find w such that y = 𝐰𝑇𝐳 is far away from Gaussian and w is a unit vector, 𝐰𝑇𝐰 = 1.
• Maximize kurtosis: 𝑓(𝐰) = κ4(𝑦) = 𝐸[𝑦4] − 3, s.t. 𝐰𝑇𝐰 = 1 (for unit-variance y)
• Lagrangian function: 𝐿(𝐰) = 𝑓(𝐰) + 𝜆(𝐰𝑇𝐰 − 1)
• KKT condition for the constrained optimization problem: 𝑓′(𝐰) + 2𝜆𝐰 = 0, i.e.,
  4𝐄[(𝐰𝑇𝐳)3𝐳] + 2𝜆𝐰 = 0
Algorithm
• Randomly initialize 𝐰(1)
• Update:
  𝐰(𝑘 + 1) ← 𝐄[(𝐰(𝑘)𝑇𝐳)3𝐳] − 3𝐰(𝑘)
  𝐰(𝑘 + 1) ← 𝐰(𝑘 + 1) / ‖𝐰(𝑘 + 1)‖
• Repeat until convergence.
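The one-unit iteration above, in a hypothetical NumPy sketch (two unit-variance uniform sources mixed by an orthogonal matrix, so the mixtures are already white; the specific sources, matrix, and iteration count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
# Two independent unit-variance uniform sources (hypothetical setup)
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, 20_000))
A = np.array([[0.8, 0.6], [-0.6, 0.8]])    # orthogonal mixing matrix
z = A @ s                                  # mixtures are (nearly) white

# Fixed-point iteration: w <- E[(w^T z)^3 z] - 3w, then renormalize
w = rng.normal(size=2)
w /= np.linalg.norm(w)
for _ in range(100):
    w_new = (z * (w @ z) ** 3).mean(axis=1) - 3.0 * w
    w = w_new / np.linalg.norm(w_new)

y = w @ z    # one recovered source, up to sign and permutation
```

After convergence, y matches one of the original sources up to sign, which is all the source ambiguity allows.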
Estimate the other components
• Given an estimate of 𝐰1, find further directions to recover more sources.
• The 2nd direction 𝐰2 is found with the same formulation, but with the additional constraint 𝐰2 ⊥ 𝐰1.
• For the 3rd, 4th, … directions, one additional orthogonality constraint is added each time.
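The orthogonality constraints are commonly enforced by deflation: after each update, project out the directions already found and renormalize. A hypothetical helper (not code from the lecture):

```python
import numpy as np

def deflate(w, found):
    """Project w onto the orthogonal complement of the previously found
    unit directions, then renormalize (a Gram-Schmidt step)."""
    for wi in found:
        w = w - (w @ wi) * wi
    return w / np.linalg.norm(w)
```

For example, `deflate(np.array([1.0, 1.0]), [np.array([1.0, 0.0])])` yields a unit vector orthogonal to the first direction, so the next fixed-point iteration searches only in the remaining subspace.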
The other independence measure
• Among all distributions with the same variance, the Gaussian has the maximal entropy.
• Maximize negentropy: 𝐽(𝐲) = 𝐻(𝐲𝑔𝑎𝑢𝑠𝑠) − 𝐻(𝐲),
  where 𝐲𝑔𝑎𝑢𝑠𝑠 is the Gaussian with the same covariance as y
• Because y = 𝐰𝑇𝐳, w is a unit vector, and z has covariance I, y has unit variance.
Approximation to Negentropy
• Negentropy is difficult to compute exactly, since it requires the density of y.
• Approximation using the 3rd- and 4th-order cumulants: 𝐽(𝑦) ≈ (1/12)𝐸[𝑦3]2 + (1/48)kurt(𝑦)2
• Approximation using non-quadratic functions G: 𝐽(𝑦) ≈ 𝑐(𝐸[𝐺(𝑦)] − 𝐸[𝐺(𝜈)])2, where 𝜈 is a standard Gaussian variable, e.g., 𝐺(𝑢) = log cosh(𝑢)
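The cumulant-based approximation can be sketched as follows (a hypothetical helper assuming zero-mean, unit-variance y; the 1/12 and 1/48 coefficients are the standard ones):

```python
import numpy as np

def kurt(y):
    return np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2

def negentropy_approx(y):
    """J(y) ~ E[y^3]^2 / 12 + kurt(y)^2 / 48 for zero-mean, unit-variance y."""
    return np.mean(y ** 3) ** 2 / 12.0 + kurt(y) ** 2 / 48.0

rng = np.random.default_rng(6)
g = rng.normal(size=100_000)                  # Gaussian: J ~ 0
l = rng.laplace(size=100_000) / np.sqrt(2.0)  # unit-variance Laplace: J > 0
```

As expected, the approximation is near zero for Gaussian samples and clearly positive for the super-Gaussian Laplace samples.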
Question
• In MP3, we will use PCA to project images into a subspace where the obtained components are supposed to be independent. Is this assumption valid?
Question
• In MP3, we will use PCA to project images into a subspace where the obtained components are supposed to be independent. Is this assumption valid?
• PCA yields uncorrelated components.
• Under a Gaussian distribution, uncorrelated components imply independence.
• So we need to verify whether the pixels are generated from a Gaussian, using kurtosis and negentropy.
Summary
• ICA recovers a set of independent components.
• PCA finds a set of uncorrelated components.
• By the central limit theorem, we use non-Gaussianity to find the independent components; kurtosis and negentropy serve as surrogate measures.
• Fast ICA is an iterative algorithm; there is no closed-form solution.
• Applications: separating independent sources from mixture signals, e.g., image denoising and voice separation.