Independent Component Analysis
PhD Seminar
Jörgen Ungh
Agenda
• Background – a motivator
• Independence
• ICA vs. PCA
• Gaussian data
• ICA theory
• Examples
Background & motivation
• The cocktail party problem
[Figure: three speakers s1, s2, s3 recorded by three microphones x1, x2, x3]
Cocktail party problem
• Let s1(t), s2(t) and s3(t) be the original spoken signals
• Let x1(t), x2(t) and x3(t) be the recorded signals
• The connection between s and x can be written

x1(t) = a11*s1(t) + a12*s2(t) + a13*s3(t)
x2(t) = a21*s1(t) + a22*s2(t) + a23*s3(t)
x3(t) = a31*s1(t) + a32*s2(t) + a33*s3(t)
Goal: Estimate s1, s2 and s3 from x1, x2 and x3.
Problem: We do not know anything about the right-hand side…
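To make the mixing model concrete, here is a minimal numerical sketch in Python/NumPy. The three source signals and the mixing matrix A are made up purely for illustration; in the real problem only the mixtures x are observed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three illustrative (made-up) source signals s1(t), s2(t), s3(t)
t = np.linspace(0, 8, 1000)
s = np.vstack([np.sin(2 * t),               # s1: sinusoid
               np.sign(np.sin(3 * t)),      # s2: square wave
               rng.laplace(size=t.size)])   # s3: noise-like source

# Unknown square mixing matrix A (values invented for the example)
A = np.array([[0.8, 0.3, 0.2],              # a11 a12 a13
              [0.4, 0.7, 0.3],              # a21 a22 a23
              [0.2, 0.5, 0.9]])             # a31 a32 a33

# The recorded signals: xi(t) = ai1*s1(t) + ai2*s2(t) + ai3*s3(t)
x = A @ s
```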
Cocktail party problem
• Example
Microphone 1
Microphone 2
Separated 1
Separated 2
http://www.cnl.salk.edu/~tewon/Blind/blind_audio.html
”Today we celebrate our independence day”
- US President THOMAS J. WHITMORE (Bill Pullman) in Independence Day (1996)
Independence – what is it?
Independence = uncorrelatedness?
Definitions

Covariance: C_xy = E{(x − m_x)(y − m_y)^T}

Correlation: R_xy = E{x y^T}

If m_x = m_y = 0, then C_xy = R_xy

Uncorrelated
• Two vectors x and y are uncorrelated if:

C_xy = E{(x − m_x)(y − m_y)^T} = 0

which is equivalent to

R_xy = E{x y^T} = E{x} E{y}^T = m_x m_y^T

If m_x = m_y = 0, then C_xy = R_xy = 0

…from now on we assume zero-mean variables
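As a quick numerical check of these definitions, the sketch below estimates covariance and correlation from samples (scalar case, made-up Gaussian data). For zero-mean variables the two estimates coincide, as stated above.

```python
import numpy as np

def covariance(x, y):
    """Sample estimate of C_xy = E{(x - m_x)(y - m_y)} for scalar signals."""
    return np.mean((x - x.mean()) * (y - y.mean()))

def correlation(x, y):
    """Sample estimate of R_xy = E{x y} (no mean removal)."""
    return np.mean(x * y)

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)   # zero-mean variables, as assumed from here on
y = rng.normal(size=100_000)

print(covariance(x, y))        # close to 0: x and y are uncorrelated
print(correlation(x, y))       # equals the covariance, since m_x = m_y = 0
```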
Independent
• Vectors x and y are independent if:

p_x,y(x, y) = p_x(x) p_y(y)

• Which also gives:

E{g_x(x) g_y(y)} = E{g_x(x)} E{g_y(y)}

• where g_x and g_y are arbitrary functions of x and y
Independent

Independent is stronger than uncorrelated!

Uncorrelated: E{x y^T} = E{x} E{y}^T

Independent: E{g_x(x) g_y(y)} = E{g_x(x)} E{g_y(y)}

The two conditions coincide only when g_x and g_y are linear functions of x and y.
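A small numerical example makes the gap concrete (the choice y = x² is made up for illustration): the pair is uncorrelated, yet the product condition for independence already fails for one non-linear choice of g_x and g_y.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=200_000)
y = x ** 2                              # y is completely determined by x

# Uncorrelated: E{xy} - E{x}E{y} is (numerically) zero
print(np.mean(x * y) - np.mean(x) * np.mean(y))

# ...but not independent: E{g_x(x) g_y(y)} != E{g_x(x)} E{g_y(y)}
# already fails for g_x(u) = g_y(u) = u**2
print(np.mean(x**2 * y**2), np.mean(x**2) * np.mean(y**2))   # ~1/7 vs ~1/15
```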
Independent ≠ Uncorrelated

[Two scatter plots of (x, y) data]

Are x and y uncorrelated? YES and YES
Are x and y independent? NO and YES
Relations

Independent ⇒ Uncorrelated

BUT

Uncorrelated ⇏ Independent
ICA vs. PCA

Independent Component Analysis vs. Principal Component Analysis
PCA
• Goal: ”Project data onto an orthonormal basis with maximum variance”
• Data explained by principal components

[Figure: data cloud with principal directions e1 and e2]
PCA
• Uses information up to the second moment, i.e. the mean and variance/covariance
• Reduces the dimension of the data
• Orthonormal basis of uncorrelated vectors
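A minimal PCA sketch, assuming the data matrix has one row per variable and one column per sample, shows that nothing beyond the covariance (second moment) is needed:

```python
import numpy as np

def pca(x):
    """Orthonormal basis from the eigendecomposition of the covariance matrix."""
    xc = x - x.mean(axis=1, keepdims=True)        # remove the mean
    eigval, eigvec = np.linalg.eigh(np.cov(xc))   # covariance is symmetric
    order = np.argsort(eigval)[::-1]              # largest variance first
    return eigval[order], eigvec[:, order]

# Keeping only the first k columns of eigvec reduces the dimension of the data.
```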
ICA
• Goal: ”Find the independent sources”
• Data explained by independent components

[Figure: the same data cloud with independent directions e1 and e2]
ICA
• Uses information beyond the second moment, i.e. higher-order statistics such as kurtosis and skewness
• Does not reduce the dimension of the data
• A basis of independent vectors
ICA vs. PCA
• Independence is the stronger requirement
• In the case of Gaussian data, ICA = PCA
Gaussian data
Gaussian distribution
• Definition:
f_x(x) = 1 / ((2π)^(N/2) |C|^(1/2)) · exp( −(1/2) (x − µ)^T C^(−1) (x − µ) )

C = covariance matrix, µ = mean vector

Explained completely by first- and second-order statistics, i.e. mean and variances
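As a small worked example, the density can be evaluated directly from the formula (the point, mean and covariance below are made-up values):

```python
import numpy as np

def gaussian_pdf(x, mu, C):
    """Evaluate the N-dimensional Gaussian density f(x) defined above."""
    N = len(mu)
    d = x - mu
    norm = (2 * np.pi) ** (N / 2) * np.sqrt(np.linalg.det(C))
    return float(np.exp(-0.5 * d @ np.linalg.solve(C, d)) / norm)

print(gaussian_pdf(np.array([0.0, 0.0]),
                   mu=np.array([0.0, 0.0]),
                   C=np.eye(2)))   # 1 / (2*pi) ≈ 0.159
```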
Gaussian data
• A rotation of the basis cannot be recovered, due to the rotational symmetry of the distribution
Gaussian distribution
• Completely defined by its first and second moments
• Uncorrelated Gaussian data ⇒ independence
• Why assume Gaussian data?
Central limit theorem
• Definition:
”A sum of independent random variables will tend to be Gaussian”
• That is the argument behind many assumptions of Gaussian distributions
What if we put it in another way…?
Central limit theorem
• 2nd definition:
”The mixtures of two or more independent random variables are more Gaussian than the random variables themselves”

[Figure: histogram of a single random u.d. variable vs. a mixture of 2 u.d. variables]
Idea!
• The observed mixtures should be more Gaussian than the original components
• The original components should be less Gaussian than the mixtures
• If we try to maximize the non-Gaussianity of the data, we should get closer to the original components… (illustrated in the sketch below)
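The sketch below illustrates this numerically with made-up uniform variables, using (excess) kurtosis as the Gaussianity measure introduced on the following slides: a linear mixture of two non-Gaussian variables moves the kurtosis toward the Gaussian value of zero.

```python
import numpy as np

rng = np.random.default_rng(3)

def excess_kurtosis(y):
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3          # 0 for Gaussian data

u1 = rng.uniform(-1, 1, 200_000)        # a single uniformly distributed variable
u2 = rng.uniform(-1, 1, 200_000)        # an independent copy

print(excess_kurtosis(u1))              # ~ -1.2: clearly non-Gaussian
print(excess_kurtosis(u1 + u2))         # ~ -0.6: the mixture is "more Gaussian"
print(excess_kurtosis(rng.normal(size=200_000)))   # ~ 0
```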
ICA theory
• Problem definition
• Solution
• Preprocessing
• Different methods
• Examples
ICA: Definition of the problem
• Let s1(t), s2(t) and s3(t) be the original signals
• Let x1(t), x2(t) and x3(t) be the collected signals
• The connection between s and x can be written

x1(t) = a11*s1(t) + a12*s2(t) + a13*s3(t)
x2(t) = a21*s1(t) + a22*s2(t) + a23*s3(t)
x3(t) = a31*s1(t) + a32*s2(t) + a33*s3(t)

Goal: Estimate s1, s2 and s3 from x1, x2 and x3.
ICA: Assumption
• Independence
• Non-gaussian
• Square mixing matrix
ICA: Idea
• Maximize non-gaussianity of the data!
• We need a measure of ”Gaussianity” or ”Non-gaussianity”
Measures of Gaussianity
1. Kurtosis
Assuming zero mean variables
kurt(y) = E{y^4} − 3 (E{y^2})^2
Measures of Gaussianity
1. Kurtosis
Assuming zero mean and unit variance
kurt(y) = E{y^4} − 3
Measures of Gaussianity
1. Kurtosis

For Gaussian data we have:

E{y^4} = 3 (E{y^2})^2

which, inserted into

kurt(y) = E{y^4} − 3 (E{y^2})^2

gives kurt(y) = 0 for Gaussian data.

For most other distributions, kurt(y) ≠ 0, positive or negative.
Measures of Gaussianity
1. Kurtosis

Maximize |kurt(y)|

Advantages:
- Easy to compute

Drawbacks:
- Sensitive to outliers
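To show what maximizing |kurt(y)| can mean in practice, here is a deliberately naive sketch: two made-up 2-D uniform sources are mixed and whitened, and the remaining rotation is found by a brute-force search over angles rather than by a proper algorithm such as FastICA.

```python
import numpy as np

rng = np.random.default_rng(4)

# Two independent, non-Gaussian (uniform) unit-variance sources, mixed by an invented matrix
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 50_000))
x = np.array([[1.0, 0.6],
              [0.4, 1.0]]) @ s

# Whitening (see the preprocessing slides): afterwards only an unknown rotation remains
d, E = np.linalg.eigh(np.cov(x))
z = np.diag(d ** -0.5) @ E.T @ (x - x.mean(axis=1, keepdims=True))

def kurt(y):
    return np.mean(y ** 4) - 3                     # zero mean, unit variance assumed

# Brute-force one-unit ICA: the rotation angle that maximizes |kurt| of the projection
angles = np.linspace(0.0, np.pi, 500)
best = max(angles, key=lambda a: abs(kurt(np.cos(a) * z[0] + np.sin(a) * z[1])))
y = np.cos(best) * z[0] + np.sin(best) * z[1]      # one estimated source, up to sign and scale
print(best, abs(kurt(y)))
```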
Measures of Gaussianity
2. Negentropy

J(y) = H(y_gauss) − H(y)

where H is the entropy, defined as:

H(y) = −∫ p_y(η) log p_y(η) dη
Measures of Gaussianity
2. Negentropy

J(y) = H(y_gauss) − H(y)

Among all distributions with a given variance, Gaussian data has the largest entropy, meaning that it is the ”most random” distribution.

J(y) ≥ 0, and it equals zero if and only if y is Gaussian.
Measures of Gaussianity
2. Negentropy

Maximize J(y)

Advantages:
- Robust

Drawbacks:
- Computationally hard (the density p_y must be estimated, so approximations are used in practice)
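Because the exact negentropy requires the unknown density p_y, practical algorithms such as FastICA use approximations. Below is a minimal sketch of one common approximation, J(y) ≈ (E{G(y)} − E{G(ν)})² up to a positive constant, with the usual choice G(u) = log cosh(u) and ν a standard Gaussian; the test data are made up.

```python
import numpy as np

rng = np.random.default_rng(5)
nu = rng.normal(size=1_000_000)         # standard Gaussian reference variable

def G(u):
    return np.log(np.cosh(u))           # non-quadratic contrast function

def negentropy_approx(y):
    """Rough approximation of J(y) for zero-mean, unit-variance y."""
    return (np.mean(G(y)) - np.mean(G(nu))) ** 2

print(negentropy_approx(rng.normal(size=200_000)))                          # ~ 0 (Gaussian)
print(negentropy_approx(rng.uniform(-np.sqrt(3), np.sqrt(3), 200_000)))     # > 0 (sub-Gaussian)
print(negentropy_approx(rng.laplace(0, 1 / np.sqrt(2), size=200_000)))      # > 0 (super-Gaussian)
```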
ICA: Solutions
• Kurtosis
• Negentropy
• Maximum likelihood
• Infomax
• Mutual information
• …

All based on independence and/or non-Gaussianity
ICA: Restrictions
• Non-Gaussian data*
• Scaling, sign and order of the components cannot be determined
• Need to know the number of components

* If some of the data are Gaussian, the non-Gaussian independent components will still be found, but the Gaussian ones will remain mixed.
ICA: Preprocessing
• No reduction of dimension in ICA
• Need to know the number of components
• But we already have a method for reducing the dimension and estimating the probable number of components

Use PCA as a preprocessing step!
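A minimal sketch of how PCA is typically used for this step (the function name and interface are made up): keep the strongest directions to fix the number of components, and rescale them so that the retained signals are uncorrelated with unit variance (whitening).

```python
import numpy as np

def pca_whiten(x, n_components=None):
    """PCA preprocessing: optional dimension reduction followed by whitening.

    x has one row per mixture and one column per sample.
    """
    xc = x - x.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(xc))
    order = np.argsort(d)[::-1][:n_components]      # keep the largest-variance directions
    V = np.diag(d[order] ** -0.5) @ E[:, order].T   # whitening matrix
    return V @ xc, V                                # whitened data and the transform
```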
ICA: Preprocessing
• Low-pass filtering
  + Reduces noise
  − Reduces independence
• High-pass filtering
  + Increases independence
  − Increases noise
ICA: Overlearning
• Occurs when there are many more mixtures than independent components
• Symptom: spiky character of the estimated components
Examples
• Cocktail party
• Music separation
• Image analysis
• Separation of recorded signals of brain activity
• Process data
• Noise/signal separation
• Process monitoring
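All of these examples follow the same recipe: collect mixtures, preprocess, then run an ICA algorithm. As an illustration, here is a minimal sketch using scikit-learn's FastICA on made-up mixtures like those in the first slides (the library choice and signal shapes are assumptions, not what was used in the original demos).

```python
import numpy as np
from sklearn.decomposition import FastICA    # one widely used ICA implementation

rng = np.random.default_rng(6)

# Made-up mixtures, as in the cocktail-party sketch at the beginning
t = np.linspace(0, 8, 2000)
s = np.vstack([np.sin(2 * t), np.sign(np.sin(3 * t)), rng.laplace(size=t.size)])
x = np.array([[0.8, 0.3, 0.2],
              [0.4, 0.7, 0.3],
              [0.2, 0.5, 0.9]]) @ s

# scikit-learn expects samples in rows, hence the transposes
ica = FastICA(n_components=3, random_state=0)
s_est = ica.fit_transform(x.T).T             # estimates of s, up to scaling, sign and order
print(s_est.shape)                           # (3, 2000)
```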
Cocktail party problem
Music separation
Mix 1 / Est 1 / Source 1
Mix 2 / Est 2 / Source 2
Mix 3 / Est 3 / Source 3
Mix 4 / Est 4 / Source 4
http://www.cis.hut.fi/projects/ica/cocktail/cocktail_en.cgi
Image analysis - NLPCA
Brain activity
[Figure: separated brain-activity sources S1, S2, S3 and S4]
Process data
[Plot: mixed signals, four channels]
Process data
[Plot: whitened signals, four channels]
Process data
[Plot: independent components, four channels]
Process data
[Plot: four signals]
Noise removal
• Different noise sources:
  – Laplacian
  – Gaussian
  – Uniform
  – Exponential

[Plot: example signal]
Noise removal - Laplacian

[Plots: mixed signals, whitened signals, independent components]
Noise removal - Gaussian

[Plots: mixed signals, whitened signals, independent components]
Noise removal - Uniform

[Plots: mixed signals, whitened signals, independent components]
Noise removal - Exponential

[Plots: mixed signals, whitened signals, independent components]
Process monitoring
• Often done by PCA
• Example: F1, F2
• One step further, use ICA!
Practical considerations
• Noise reduction (filtering)
• Dimension reduction (PCA?)
• Overlearning
• Algorithm
What about time signals?
• So far, no information about time has been used
• In the original ICA model, x is a random variable
• What if x is a time signal x(t)?

[Plot: a time signal x(t)]
Time signal x(t)
• Extra information, since the sample order is not random:
  – Autocorrelation
  – Cross-correlation
• More information ⇒ relaxed assumptions ⇒ Gaussian data OK (see the sketch below)
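A minimal sketch of a second-order method in this spirit (similar to AMUSE; the function names and the single-lag choice are illustrative): whiten the signals and then rotate so that one symmetrized time-lagged covariance matrix becomes diagonal. This works even for Gaussian sources, provided their autocorrelations differ.

```python
import numpy as np

def lagged_cov(z, tau):
    """Time-lagged covariance E{z(t) z(t - tau)^T}; z has one row per channel, zero mean."""
    return z[:, tau:] @ z[:, :-tau].T / (z.shape[1] - tau)

def separate_time_signals(x, tau=1):
    """Whiten, then diagonalize a symmetrized lagged covariance (AMUSE-style sketch)."""
    xc = x - x.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(xc))
    z = np.diag(d ** -0.5) @ E.T @ xc               # whitened signals
    C = lagged_cov(z, tau)
    _, W = np.linalg.eigh((C + C.T) / 2)            # rotation from the eigenvectors
    return W.T @ z                                  # estimated sources, up to sign and order
```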
Extensions…
• Non-linear ICA
• Independent subspace analysis
Further information:
Book: Independent Component Analysis, by A. Hyvärinen, J. Karhunen and E. Oja
Covers everything from novice to expert

Homepage: http://www.cis.hut.fi/projects/ica/
Tutorials, material, contacts, Matlab code, …

Journal of Machine Learning Research: http://jmlr.csail.mit.edu/papers/special/ica03.html
Papers and publications

Toolboxes, code:
http://mole.imm.dtu.dk/toolbox/ica/index.html
http://www.bsp.brain.riken.jp/ICALAB/
http://www.cis.hut.fi/projects/ica/book/links.html