Section 5A: ICA (Independent Component Analysis), mathstat.carleton.ca/~smills/2016-17/stat5703/pdf...
TRANSCRIPT
Data Mining 160
SECTION 5A-ICA
Independent Component Analysis (ICA)
Independent Component Analysis (ICA) ("Independent Component Analysis: Algorithms and Applications", Hyvärinen and Oja (2000)) is a variation of Principal Component Analysis (PCA) and a strong competitor to Factor Analysis. ICA is an attempt to decompose complex data into independent subparts (also known as the blind source separation problem or the cocktail party problem). It attempts to determine the source signals S given only the observed mixtures X. (It is necessary to assume independence of the source signals, i.e. the value of one signal gives no information about the other signals.)
Using the singular value decomposition X = UDV^T and writing S = √N U and A^T = DV^T/√N, we can write X = SA^T; thus each column of X is a linear combination of the columns of S. Since U is orthogonal, and assuming that the columns of X each have mean zero, the columns of S have zero mean, are uncorrelated, and have unit variance.
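This decomposition is easy to check numerically; the following is a small R sketch with simulated data (the variable names are my own, not from the notes):

```r
# Verify X = S A^T with S = sqrt(N) U and A^T = D V^T / sqrt(N),
# and that the columns of S are uncorrelated with unit variance.
set.seed(1)
N <- 500
X <- matrix(rnorm(N * 3), N, 3) %*% matrix(runif(9), 3, 3)
X <- scale(X, center = TRUE, scale = FALSE)   # zero-mean columns
sv  <- svd(X)
S   <- sqrt(N) * sv$u
A.t <- diag(sv$d) %*% t(sv$v) / sqrt(N)
max(abs(X - S %*% A.t))        # ~0: exact reconstruction
round(crossprod(S) / N, 10)    # identity: uncorrelated, unit variance
```

Note that `crossprod(S)/N` is exactly `t(U) %*% U = I` because U is orthogonal, which is the algebraic content of the claim above.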
We have
X_i = Σ_{j=1}^p a_ij S_j,  i = 1, ..., p
or (writing X and S as column vectors)
X = AS = (AR^T)(RS) = A*S*  for any orthogonal p × p matrix R.
ICA assumes the S_i are statistically independent (thus determining all the cross moments) rather than merely uncorrelated (which determines only the second-order cross moments). Independence implies uncorrelatedness, so ICA constrains the estimation procedure to give uncorrelated estimates of the independent components (this reduces the number of free parameters and thus simplifies the problem). The extra moment conditions identify A uniquely.
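The distinction matters because uncorrelated variables need not be independent. A quick R illustration (my own, not from the notes): for a symmetric Z, the pair (Z, Z^2) is uncorrelated but obviously dependent.

```r
set.seed(2)
Z <- rnorm(1e5)
cor(Z, Z^2)        # ~0: uncorrelated, since the odd moments of Z vanish
# ...but Z^2 is a deterministic function of Z, so the two are dependent:
cor(abs(Z), Z^2)   # far from 0
```

Second-order (correlation) conditions cannot see this kind of dependence; the higher cross moments that ICA exploits can.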
NOTE: In Factor Analysis with q < p we have
X_i = Σ_{j=1}^q a_ij S_j + ε_i,  i = 1, ..., p
or
X = AS + ε
where the S are the common factors and ε represents unique factors. ICA can be viewed as another Factor Analysis rotation method (just like varimax or quartimax); it starts essentially from a Factor Analysis solution and looks for rotations that lead to independent components.
©Mills 2017 ICA 160
In Factor Analysis, the S_j and ε_i are generally assumed to be Gaussian; orthogonal transformations AS of Gaussians are still Gaussian. Hence we can estimate the model only up to an orthogonal transformation, so A is not identifiable for independent Gaussian components. (If just one component is Gaussian, the ICA model can still be estimated.)
However, we actually do not want Gaussian source variables (we allow at most one Gaussian source variable): if the S_i are Gaussian and the mixing matrix A is orthogonal, the X_i will also be Gaussian and uncorrelated with unit variance, so the joint density is completely symmetric and gives no information on the directions of the columns of the mixing matrix A; hence A cannot be estimated. We avoid this identifiability problem by assuming the S_i are independent and non-Gaussian, so
S = A^{-1}X = A^T X
(because A is orthogonal).
We assume X has been "whitened" (i.e. sphered) via the SVD to have Cov(X) = I; then A is orthogonal, and solving the ICA problem means finding an orthogonal A such that the components of S = A^T X are independent and non-Gaussian.
Writing
Y = W^T X
where
X = AS
and setting
Z = A^T W
we obtain
Y = W^T X = W^T (AS) = (A^T W)^T S = Z^T S
which can be more Gaussian than any of the S_i, and is least Gaussian when it equals one of the S_i (i.e. when only one of the elements of Z is nonzero). We want to find W so as to maximize the non-Gaussianity of Y; this corresponds (in the transformed coordinate system) to a Z which has only one nonzero component.
Thus
Y = W^T X = Z^T S
is one of the independent components.
Thus finding the A that minimizes the mutual information between the components of S = A^T X requires looking for an orthogonal transformation that gives the most independence between its components. This is equivalent to minimizing the sum of the entropies of the separate components of Y, which is equivalent to maximizing their departures from Gaussianity (since, for a given variance, Gaussian variables have maximum entropy).
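A rough numerical check of this idea (a sketch of mine; fastICA itself uses a logcosh approximation to negentropy rather than kurtosis): excess kurtosis is zero for a Gaussian, so its magnitude serves as a crude non-Gaussianity index, and mixing pushes it toward zero.

```r
# Excess kurtosis: 0 for a Gaussian, negative for flat distributions.
ex.kurt <- function(y) mean(((y - mean(y)) / sd(y))^4) - 3
S <- cbind(sin((1:1000)/20),                 # the two signals used
           rep((((1:200) - 100)/100), 5))    # later in these notes
a <- pi/4
A <- matrix(c(cos(a), sin(a), -sin(a), cos(a)), 2, 2)
X <- S %*% A                                 # mixed signals
c(source = ex.kurt(S[, 1]), mixed = ex.kurt(X[, 1]))
# the mixture is closer to 0 (more Gaussian) than the source
```

This is the central-limit-theorem intuition behind FastICA: a sum of independent non-Gaussian variables is more Gaussian than its summands, so maximizing non-Gaussianity of a projection undoes the mixing.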
There are two problems:
1. We cannot determine the variances of the independent components. Since S and A are both unknown, a scalar multiple of one S_i can be cancelled out by dividing the corresponding column a_i of A by the same scalar. Thus we fix the magnitude of the independent components S_i: since they are all random variables, we assume each has unit variance and, since they have been centered, this means E[S_i^2] = 1. Note that we can multiply an independent component by -1 without affecting the model, so there is also an ambiguity of sign.
2. We cannot determine the order of the independent components. Since S and A are both unknown, we are free to change the order of the terms, setting any one of them first. Thus a permutation matrix P and its inverse can be substituted in the model to give
X = (AP^{-1})(PS)
where AP^{-1} is the new unknown mixing matrix to be solved for by ICA and the elements of PS are the original independent S_i in a different (i.e. permuted) order.
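Both ambiguities can be verified directly in R (a small sketch with a made-up S and A, using the X = S %*% A convention of the examples below):

```r
set.seed(3)
S <- matrix(rnorm(20), 10, 2)             # two "source" columns
A <- matrix(c(0.6, 0.4, 0.3, 0.7), 2, 2)  # a fixed mixing matrix
P <- matrix(c(0, 1, 1, 0), 2, 2)          # permutation matrix (swap sources)
X1 <- S %*% A
X2 <- (S %*% solve(P)) %*% (P %*% A)      # permuted sources, compensated mixing
max(abs(X1 - X2))                         # ~0: the observed mixtures are identical
# Scale/sign ambiguity: halve a source, double the matching row of A
X3 <- cbind(S[, 1] / 2, S[, 2]) %*% rbind(2 * A[1, ], A[2, ])
max(abs(X1 - X3))                         # ~0 again
```

Since the observed X is unchanged under either substitution, no estimation procedure can recover the original scale, sign, or order of the sources.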
Read some required files:
drive <- "D:"
code.dir <- paste(drive, "DATA/Data Mining R-Code", sep="/")
data.dir <- paste(drive, "DATA/Data Mining Data", sep="/")
source(paste(code.dir, "BorderHist.r", sep="/"))
source(paste(code.dir, "WaveIO.r", sep="/"))
library(fastICA)
We will create and display two signals (Figure 16):
S.1 <- sin((1:1000)/20)
S.2 <- rep((((1:200)-100)/100), 5)
S <- cbind(S.1, S.2)
plot(S.1)
plot(S.2)
Figure 16. Original signals
and rotate them:
a <- pi/4
A <- matrix(c(cos(a), sin(a), -sin(a), cos(a)), 2, 2)
X <- S %*% A
plot(X[, 1])
plot(X[, 2])
Figure 17. Rotated signals
We combine them and then display them with their histograms:
border.hist(S.1, S.2)
border.hist(X[, 1], X[, 2])
Figure 18. Border histograms of the original (left) and rotated signals.
Now start with the mixed signals and observe what happens to the histograms as we rotate the axes on which the signals are projected:
b <- pi/36
W <- matrix(c(cos(b), -sin(b), sin(b), cos(b)), 2, 2)
XX <- X
for (i in 1:9) {
  XX <- XX %*% W
  border.hist(XX[, 1], XX[, 2])
  readline("Press Enter...")
}
Figure 19. Effect of rotating the projection plane
We see that for the fully mixed signals the histograms appear nearly Gaussian. As we move through the different projections, the histograms move away from normality.
The resulting signals are:
plot(XX[, 1])
plot(XX[, 2])
Figure 20. Result of the ICA
Now consider what happens for three signals: a sine function, a sawtooth, and a pair of exponentials.
S.1 <- sin((1:1000)/20)
S.2 <- rep((((1:200)-100)/100), 5)
S.3 <- rep(c(exp(seq(0, .99, .01)) - 1.845617, -exp(seq(0, .99, .01)) + 1.845617), 5)
S <- cbind(S.1, S.2, S.3)
A <- matrix(runif(9), 3, 3) # Set a random mixing
X <- S %*% A
Do an ICA on the mixed data:
a <- fastICA(X, 3, alg.typ = "parallel", fun = "logcosh", alpha = 1,
             method = "R", row.norm = FALSE, maxit = 200, tol = 0.0001,
             verbose = TRUE)
Whitening
Symmetric FastICA using logcosh approx. to neg-entropy function
Iteration 1 tol = 0.1086564
Iteration 2 tol = 0.004629528
Iteration 3 tol = 0.0001178137
Iteration 4 tol = 5.028182e-06
We then plot the original, mixed, and recovered data:
oldpar <- par(mfcol = c(3, 3), mar = c(2, 2, 2, 1))
plot(1:1000, S[, 1], type = "l", main = "Original Signals", xlab = "", ylab = "")
for (i in 2:3) {
  plot(1:1000, S[, i], type = "l", xlab = "", ylab = "")
}
plot(1:1000, X[, 1], type = "l", main = "Mixed Signals", xlab = "", ylab = "")
for (i in 2:3) {
  plot(1:1000, X[, i], type = "l", xlab = "", ylab = "")
}
plot(1:1000, a$S[, 1], type = "l", main = "ICA source estimates", xlab = "", ylab = "")
for (i in 2:3) {
  plot(1:1000, a$S[, i], type = "l", xlab = "", ylab = "")
}
par(oldpar)
Figure 21. Original, mixed and recovered
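Because of the sign and order ambiguities noted earlier, the recovered components in a$S need not match the order or sign of the originals. One way to pair them up is by absolute correlation; here is a sketch in which S.hat is a stand-in for a$S, built by swapping and sign-flipping the true sources:

```r
S <- cbind(sin((1:1000)/20), rep((((1:200) - 100)/100), 5))
S.hat <- cbind(-S[, 2], S[, 1])  # stand-in "estimates": swapped, sign-flipped
cc <- abs(cor(S.hat, S))         # |correlation| of each estimate with each source
apply(cc, 2, which.max)          # which estimate matches each source: 2 1
```

Each true source has exactly one estimate with |correlation| near 1, so the pairing (and the sign, from the sign of the correlation) can be read off the matrix cc.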
Repeat the process with four signals:
S.1 <- sin((1:1000)/20)
S.2 <- rep((((1:200)-100)/100), 5)
s.3 <- tan(seq(-pi/2 + .1, pi/2 - .1, .0118))
S.3 <- rep(s.3, 4)
S.4 <- rep(c(exp(seq(0, .99, .01)) - 1.845617, -exp(seq(0, .99, .01)) + 1.845617), 5)
S <- cbind(S.1, S.2, S.3, S.4)
(A <- matrix(runif(16), 4, 4))
          [,1]       [,2]        [,3]      [,4]
[1,] 0.4091777 0.79526756 0.773487999 0.7201944
[2,] 0.1084712 0.03256865 0.151097684 0.2899303
[3,] 0.8920621 0.69775810 0.281228361 0.1156242
[4,] 0.4683415 0.91346105 0.003911073 0.1033929
X <- S %*% A
a <- fastICA(X, 4, alg.typ = "parallel", fun = "logcosh", alpha = 1,
             method = "R", row.norm = FALSE, maxit = 200, tol = 0.0001,
             verbose = TRUE)
Centering
Whitening
Symmetric FastICA using logcosh approx. to neg-entropy function
Iteration 1 tol = 0.3458911
Iteration 2 tol = 0.007638039
Iteration 3 tol = 0.001150413
Iteration 4 tol = 0.0003499578
Iteration 5 tol = 9.909304e-05
oldpar <- par(mfcol = c(4, 3), mar = c(2, 2, 2, 1))
plot(1:1000, S[, 1], type = "l", main = "Original Signals", xlab = "", ylab = "")
for (i in 2:4) {
  plot(1:1000, S[, i], type = "l", xlab = "", ylab = "")
}
plot(1:1000, X[, 1], type = "l", main = "Mixed Signals", xlab = "", ylab = "")
for (i in 2:4) {
  plot(1:1000, X[, i], type = "l", xlab = "", ylab = "")
}
plot(1:1000, a$S[, 1], type = "l", main = "ICA source estimates", xlab = "", ylab = "")
for (i in 2:4) {
  plot(1:1000, a$S[, i], type = "l", xlab = "", ylab = "")
}
par(oldpar)
Figure 22. Original, mixed, and recovered (four signals)
For this example we will look at three mixtures of four signals (note the warning messages):
A <- matrix(runif(12), 4, 3)
X <- S %*% A
a <- fastICA(X, 4, alg.typ = "parallel", fun = "logcosh", alpha = 1,
             method = "R", row.norm = FALSE, maxit = 200, tol = 0.0001,
             verbose = TRUE)
n.comp is too large
n.comp set to 3
Centering
Whitening
Symmetric FastICA using logcosh approx. to neg-entropy function
Iteration 1 tol = 0.1473840
Iteration 2 tol = 0.003145043
Iteration 3 tol = 1.781576e-05
oldpar <- par(mfcol = c(4, 3), mar = c(2, 2, 2, 1))
plot(1:1000, S[, 1], type = "l", main = "Original Signals", xlab = "", ylab = "")
for (i in 2:4) {
  plot(1:1000, S[, i], type = "l", xlab = "", ylab = "")
}
plot(1:1000, X[, 1], type = "l", main = "Mixed Signals", xlab = "", ylab = "")
for (i in 2:3) {
  plot(1:1000, X[, i], type = "l", xlab = "", ylab = "")
}
plot(0, type = "n") # Dummy to fill
plot(1:1000, a$S[, 1], type = "l", main = "ICA source estimates", xlab = "", ylab = "")
for (i in 2:3) {
  plot(1:1000, a$S[, i], type = "l", xlab = "", ylab = "")
}
plot(0, type = "n") # Dummy to fill
par(oldpar)
Figure 23. Original, mixed, and recovered (three mixtures of four signals)
The next example uses ICA on sounds. This is a demonstration found at the Laboratory of Computer and Information Science (CIS) of the Department of Computer Science and Engineering at Helsinki University of Technology:
http://www.cis.hut.fi/projects/ica/cocktail/cocktail_en.cgi
For this example we will need to read and write .wav files. A .wav file has the basic structure described in the next function:
read.wav <- function(d.file) {
  zz <- file(d.file, "rb")                            # Open binary file for reading
  # RIFF chunk
  RIFF <- readChar(zz, 4)                             # Word "RIFF" (4)
  file.len <- readBin(zz, integer(), 1)               # Number of bytes in file (4)
  WAVE <- readChar(zz, 4)                             # Word "WAVE" (4)
  # FORMAT chunk
  fmt <- readChar(zz, 4)                              # "fmt " (4)
  len.of.format <- readBin(zz, integer(), 1)          # Format length (4)
  f.one <- readBin(zz, integer(), 1, size=2)          # Number 1 (2)
  Channel.numbs <- readBin(zz, integer(), 1, size=2)  # Number of channels (2)
  Sample.Rate <- readBin(zz, integer(), 1)            # Sample rate (4)
  Bytes.P.Sec <- readBin(zz, integer(), 1)            # Bytes/sec (4)
  Bytes.P.Sample <- readBin(zz, integer(), 1, size=2) # Bytes/sample (2)
  Bits.P.Sample <- readBin(zz, integer(), 1, size=2)  # Bits/sample (2)
  # DATA chunk
  DATA <- readChar(zz, 4)                             # Word "data" (4)
  data.len <- readBin(zz, integer(), 1)               # Length of data (4)
  bias <- 2^(Bits.P.Sample - 1)
  # Read data based on above parameters
  wav.data <- readBin(zz, integer(), data.len, size=Bytes.P.Sample, signed=FALSE)
  close(zz)                                           # Close the file
  wav.data <- wav.data - bias                         # Shift based on bias
  # Return the information for R
  list(RIFF=RIFF, File.Len=file.len, WAVE=WAVE, format=fmt,
       len.of.format=len.of.format, f.one=f.one, Channel.numbs=Channel.numbs,
       Sample.Rate=Sample.Rate, Bytes.P.Sec=Bytes.P.Sec,
       Bytes.P.Sample=Bytes.P.Sample, Bits.P.Sample=Bits.P.Sample,
       DATA=DATA, data.len=data.len, data=wav.data)
}
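The write.wav function used below comes from the WaveIO.r file sourced earlier and is not reproduced in the notes. A minimal sketch of what it must do, mirroring read.wav field for field (my own reconstruction, not the course file), is:

```r
write.wav <- function(d.file, wav) {
  zz <- file(d.file, "wb")                               # Open binary file for writing
  # RIFF chunk
  writeChar(wav$RIFF, zz, 4, eos = NULL)                 # "RIFF" (4)
  writeBin(as.integer(wav$File.Len), zz)                 # Number of bytes in file (4)
  writeChar(wav$WAVE, zz, 4, eos = NULL)                 # "WAVE" (4)
  # FORMAT chunk
  writeChar(wav$format, zz, 4, eos = NULL)               # "fmt " (4)
  writeBin(as.integer(wav$len.of.format), zz)            # Format length (4)
  writeBin(as.integer(wav$f.one), zz, size = 2)          # Number 1 (2)
  writeBin(as.integer(wav$Channel.numbs), zz, size = 2)  # Number of channels (2)
  writeBin(as.integer(wav$Sample.Rate), zz)              # Sample rate (4)
  writeBin(as.integer(wav$Bytes.P.Sec), zz)              # Bytes/sec (4)
  writeBin(as.integer(wav$Bytes.P.Sample), zz, size = 2) # Bytes/sample (2)
  writeBin(as.integer(wav$Bits.P.Sample), zz, size = 2)  # Bits/sample (2)
  # DATA chunk
  writeChar(wav$DATA, zz, 4, eos = NULL)                 # "data" (4)
  writeBin(as.integer(wav$data.len), zz)                 # Length of data (4)
  # Undo the bias shift, clip to the valid range, and write the samples
  bias <- 2^(wav$Bits.P.Sample - 1)
  samples <- pmin(pmax(round(wav$data + bias), 0), 2^wav$Bits.P.Sample - 1)
  writeBin(as.integer(samples), zz, size = wav$Bytes.P.Sample)
  close(zz)
}
```

Note the eos = NULL in each writeChar call: the default eos = "" would append a null byte after every word and corrupt the header layout.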
Set up variables for the data and create the file names for the input, mixed, and output files:
numb.source <- 9
in.file <- matrix(0, numb.source, 1)
mix.file <- matrix(0, numb.source, 1)
out.file <- matrix(0, numb.source, 1)
for (i in 1:numb.source) {
  in.file[i,] <- paste(data.dir, "/source", i, ".wav", sep="")
  mix.file[i,] <- paste(data.dir, "/m", i, ".wav", sep="")
  out.file[i,] <- paste(data.dir, "/s", i, ".wav", sep="")
}
in.wav <- {}
for (m in 1:numb.source) {
  in.wav <- c(in.wav, list(read.wav(in.file[m,])))
}
We can look at the characteristics of the file with:
wav.char <- function(wav)
{
  cat("RIFF = ", wav$RIFF, "\n")
  cat("Length = ", wav$File.Len, "\n")
  cat("Wave = ", wav$WAVE, "\n")
  cat("Format = ", wav$format, "\n")
  cat("Format Length = ", wav$len.of.format, "\n")
  cat("One = ", wav$f.one, "\n")
  cat("Number of Channels = ", wav$Channel.numbs, "\n")
  cat("Sample Rate = ", wav$Sample.Rate, "\n")
  cat("Bytes/Sec = ", wav$Bytes.P.Sec, "\n")
  cat("Bytes/Sample = ", wav$Bytes.P.Sample, "\n")
  cat("Bits/Sample = ", wav$Bits.P.Sample, "\n")
  cat("Data = ", wav$DATA, "\n")
  cat("Data Length = ", wav$data.len, "\n")
}
wav.char(in.wav[[1]])
RIFF = RIFF
Length = 50036
Wave = WAVE
Format = fmt
Format Length = 16
One = 1
Number of Channels = 1
Sample Rate = 8000
Bytes/Sec = 8000
Bytes/Sample = 1
Bits/Sample = 8
Data = data
Data Length = 50000
Set up a random matrix for mixing:
A <- matrix(runif(numb.source*numb.source), numb.source, numb.source)
We will create a matrix (50000 × 9) that has one source in each column:
mixed <- {}
for (i in 1:numb.source) {
  mixed <- cbind(mixed, in.wav[[i]]$data)
}
We multiply by the 9 × 9 mixing matrix to produce a new (50000 × 9) matrix in which each column is a mixture of the 9 columns of the original matrix:
mixed <- mixed %*% A
We now plot the resulting wave forms (Figure 24):
old.par <- par(mfcol = c(numb.source, 1))
par(mar = c(2, 2, 2, 2) + 0.1)
plot(mixed[, 1], type="l", main="Mixed")
for (m in 2:numb.source) {
  plot(mixed[, m], type="l")
}
if (dev.cur()[[1]] != 1) bringToTop(which=dev.cur())
par(old.par)
Figure 24. 9 signals mixed
In order to save the signal as a .wav file, we need the header information. We cheat a little by simply using the in.wav header and replacing its data part with the mixed data. The first part of the following code creates a mixed list from the in list, and the second part does the data replacement.
mix.wav <- {}
for (m in 1:numb.source) {
  mix.wav <- c(mix.wav, list(in.wav[[m]]))
}
for (m in 1:numb.source) {
  mix.wav[[m]]$data <- mixed[, m]
  write.wav(mix.file[m,], mix.wav[[m]])
}
# --------------- Play them ---------------
Use the sound library to play the mixed sounds:
library(sound)
play(mix.file[1,])
play(mix.file[2,])
play(mix.file[3,])
play(mix.file[4,])
play(mix.file[5,])
play(mix.file[6,])
play(mix.file[7,])
play(mix.file[8,])
play(mix.file[9,])
# --------------- Unmix them ---------------
We will use fastICA to unmix the signals, then save and play the results as we did for the mixed signals:
mixed.all <- {}
for (i in 1:numb.source) {
  mixed.all <- cbind(mixed.all, mixed[, i])
}
ICA.wavs <- fastICA(mixed.all, numb.source, alg.typ = "parallel",
                    fun = "logcosh", alpha = 1, method = "R",
                    row.norm = FALSE, maxit = 200, tol = 0.0001,
                    verbose = TRUE)
# --------------- Save them ---------------
new.wav <- {}
for (m in 1:numb.source) {
  new.wav <- c(new.wav, list(in.wav[[m]]))
}
for (m in 1:numb.source) {
  new.wav[[m]]$data <- 5*ICA.wavs$S[, m]
  write.wav(out.file[m,], new.wav[[m]])
}
# --------------- Play them ---------------
play(out.file[1,])
play(out.file[2,])
play(out.file[3,])
play(out.file[4,])
play(out.file[5,])
play(out.file[6,])
play(out.file[7,])
play(out.file[8,])
play(out.file[9,])
# --------------- Plot them ---------------
old.par <- par(mfcol = c(numb.source, 3))
par(mar = c(2, 2, 2, 2) + 0.1)
plot(in.wav[[1]]$data, type="l", main="Original")
for (m in 2:numb.source) {
  plot(in.wav[[m]]$data, type="l")
}
plot(mixed[, 1], type="l", main="Mixed")
for (m in 2:numb.source) {
  plot(mixed[, m], type="l")
}
plot(ICA.wavs$S[, 1], type="l")
if (dev.cur()[[1]] != 1) bringToTop(which=dev.cur())
for (m in 2:numb.source) {
  plot(ICA.wavs$S[, m], type="l")
}
par(old.par)
Figure 25.
# --------------- Original - play ---------------
play(in.file[1,])
play(in.file[2,])
play(in.file[3,])
play(in.file[4,])
play(in.file[5,])
play(in.file[6,])
play(in.file[7,])
play(in.file[8,])
play(in.file[9,])