Section 5A: ICA (Independent Component Analysis), mathstat.carleton.ca/~smills/2016-17/stat5703/pdf...
TRANSCRIPT
Data Mining 160
SECTION 5A-ICA
Independent Component Analysis (ICA)
Independent Component Analysis (ICA) ("Independent Component Analysis: Algorithms and Applications", Hyvärinen and Oja (2000)) is a variation of Principal Component Analysis (PCA) and a strong competitor to Factor Analysis. ICA is an attempt to decompose complex data into independent subparts (also known as the blind source separation problem or the cocktail party problem). It attempts to determine the source signals S given only the observed mixtures X. (It is necessary to assume independence of the source signals, i.e. the value of one signal gives no information about the other signals.)
Using the singular value decomposition X = UDV^T and writing S = √N U and A^T = DV^T/√N, we can write X = SA^T; thus each column of X is a linear combination of the columns of S. Since U is orthogonal, and assuming that the columns of X each have mean zero, the columns of S have zero mean, are uncorrelated, and have unit variance.
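This decomposition is easy to check numerically; the following is a small R sketch with simulated data (the variable names are my own, not from the notes):

```r
# Verify X = S A^T with S = sqrt(N) U and A^T = D V^T / sqrt(N),
# and that the columns of S are uncorrelated with unit variance.
set.seed(1)
N <- 500
X <- matrix(rnorm(N * 3), N, 3) %*% matrix(runif(9), 3, 3)
X <- scale(X, center = TRUE, scale = FALSE)   # zero-mean columns
sv  <- svd(X)
S   <- sqrt(N) * sv$u
A.t <- diag(sv$d) %*% t(sv$v) / sqrt(N)
max(abs(X - S %*% A.t))        # ~0: exact reconstruction
round(crossprod(S) / N, 10)    # identity: uncorrelated, unit variance
```

Note that `crossprod(S)/N` is exactly `t(U) %*% U = I` because U is orthogonal, which is the algebraic content of the claim above.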
We have
X_i = Σ_{j=1}^p a_ij S_j,  i = 1, ..., p
or (writing X and S as column vectors)
X = AS = (AR^T)(RS) = A*S*  for any orthogonal p × p matrix R.
ICA assumes the S_i are statistically independent (thus determining all the cross moments) rather than merely uncorrelated (which determines only the second-order cross moments). Independence implies uncorrelatedness, so ICA constrains the estimation procedure to give uncorrelated estimates of the independent components (this reduces the number of free parameters and thus simplifies the problem). The extra moment conditions identify A uniquely.
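The distinction matters because uncorrelated variables need not be independent. A quick R illustration (my own, not from the notes): for a symmetric Z, the pair (Z, Z^2) is uncorrelated but obviously dependent.

```r
set.seed(2)
Z <- rnorm(1e5)
cor(Z, Z^2)        # ~0: uncorrelated, since the odd moments of Z vanish
# ...but Z^2 is a deterministic function of Z, so the two are dependent:
cor(abs(Z), Z^2)   # far from 0
```

Second-order (correlation) conditions cannot see this kind of dependence; the higher cross moments that ICA exploits can.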
NOTE: In Factor Analysis with q < p we have
X_i = Σ_{j=1}^q a_ij S_j + ε_i,  i = 1, ..., p
or
X = AS + ε
where the S are the common factors and ε represents unique factors. ICA can be viewed as another Factor Analysis rotation method (just like varimax or quartimax); it starts essentially from a Factor Analysis solution and looks for rotations that lead to independent components.
©Mills 2017 ICA 160
In Factor Analysis, the S_j and ε_i are generally assumed to be Gaussian; orthogonal transformations AS of Gaussians are still Gaussian. Hence we can estimate the model only up to an orthogonal transformation, so A is not identifiable for independent Gaussian components. (If just one component is Gaussian, the ICA model can still be estimated.)
However, we actually do not want Gaussian source variables (we allow at most one Gaussian source variable): if the S_i are Gaussian and the mixing matrix A is orthogonal, the X_i will also be Gaussian and uncorrelated with unit variance, so the joint density is completely symmetric and gives no information on the directions of the columns of the mixing matrix A; hence A cannot be estimated. We avoid this identifiability problem by assuming the S_i are independent and non-Gaussian, so
S = A^{-1}X = A^T X
(because A is orthogonal).
We assume X has been "whitened" (i.e. sphered) via the SVD to have Cov(X) = I; then A is orthogonal, and solving the ICA problem means finding an orthogonal A such that the components of S = A^T X are independent and non-Gaussian.
Writing
Y = W^T X
where
X = AS
and setting
Z = A^T W
we obtain
Y = W^T X = W^T (AS) = (A^T W)^T S = Z^T S
which can be more Gaussian than any of the S_i, and is least Gaussian when it equals one of the S_i (i.e. when only one of the elements of Z is nonzero). We want to find W so as to maximize the non-Gaussianity of Y; this corresponds (in the transformed coordinate system) to a Z which has only one nonzero component.
Thus
Y = W^T X = Z^T S
is one of the independent components.
Thus finding the A that minimizes the mutual information between the components of S = A^T X requires looking for an orthogonal transformation that gives the most independence between its components. This is equivalent to minimizing the sum of the entropies of the separate components of Y, which is equivalent to maximizing their departures from Gaussianity (since, for a given variance, Gaussian variables have maximum entropy).
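A rough numerical check of this idea (a sketch of mine; fastICA itself uses a logcosh approximation to negentropy rather than kurtosis): excess kurtosis is zero for a Gaussian, so its magnitude serves as a crude non-Gaussianity index, and mixing pushes it toward zero.

```r
# Excess kurtosis: 0 for a Gaussian, negative for flat distributions.
ex.kurt <- function(y) mean(((y - mean(y)) / sd(y))^4) - 3
S <- cbind(sin((1:1000)/20),                 # the two signals used
           rep((((1:200) - 100)/100), 5))    # later in these notes
a <- pi/4
A <- matrix(c(cos(a), sin(a), -sin(a), cos(a)), 2, 2)
X <- S %*% A                                 # mixed signals
c(source = ex.kurt(S[, 1]), mixed = ex.kurt(X[, 1]))
# the mixture is closer to 0 (more Gaussian) than the source
```

This is the central-limit-theorem intuition behind FastICA: a sum of independent non-Gaussian variables is more Gaussian than its summands, so maximizing non-Gaussianity of a projection undoes the mixing.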
There are two problems:
1. We cannot determine the variances of the independent components. Since S and A are both unknown, a scalar multiple of one S_i can be cancelled out by dividing the corresponding column a_i of A by the same scalar. Thus we fix the magnitude of the independent components S_i: since they are all random variables, we assume each has unit variance and, since they have been centered, this means E[S_i^2] = 1. Note that we can multiply an independent component by -1 without affecting the model, so there is also an ambiguity of sign.
2. We cannot determine the order of the independent components. Since S and A are both unknown, we are free to change the order of the terms, setting any one of them first. Thus a permutation matrix P and its inverse can be substituted in the model to give
X = (AP^{-1})(PS)
where AP^{-1} is the new unknown mixing matrix to be solved for by ICA and the elements of PS are the original independent S_i in a different (i.e. permuted) order.
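Both ambiguities can be verified directly in R (a small sketch with a made-up S and A, using the X = S %*% A convention of the examples below):

```r
set.seed(3)
S <- matrix(rnorm(20), 10, 2)             # two "source" columns
A <- matrix(c(0.6, 0.4, 0.3, 0.7), 2, 2)  # a fixed mixing matrix
P <- matrix(c(0, 1, 1, 0), 2, 2)          # permutation matrix (swap sources)
X1 <- S %*% A
X2 <- (S %*% solve(P)) %*% (P %*% A)      # permuted sources, compensated mixing
max(abs(X1 - X2))                         # ~0: the observed mixtures are identical
# Scale/sign ambiguity: halve a source, double the matching row of A
X3 <- cbind(S[, 1] / 2, S[, 2]) %*% rbind(2 * A[1, ], A[2, ])
max(abs(X1 - X3))                         # ~0 again
```

Since the observed X is unchanged under either substitution, no estimation procedure can recover the original scale, sign, or order of the sources.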
Read some required files:
drive <- "D:"
code.dir <- paste(drive, "DATA/Data Mining R-Code", sep="/")
data.dir <- paste(drive, "DATA/Data Mining Data", sep="/")
source(paste(code.dir, "BorderHist.r", sep="/"))
source(paste(code.dir, "WaveIO.r", sep="/"))
library(fastICA)
We will create and display two signals (Figure 16):
S.1 <- sin((1:1000)/20)
S.2 <- rep((((1:200)-100)/100), 5)
S <- cbind(S.1, S.2)
plot(S.1)
plot(S.2)
Figure 16. Original signals
and rotate them:
a <- pi/4
A <- matrix(c(cos(a), sin(a), -sin(a), cos(a)), 2, 2)
X <- S %*% A
plot(X[, 1])
plot(X[, 2])
Figure 17. Rotated signals
We combine them and then display them with their histograms:
border.hist(S.1, S.2)
border.hist(X[, 1], X[, 2])
Figure 18. Border histograms of the original (left) and rotated signals.
Now start with the mixed signals and observe what happens to the histograms as we rotate the axes on which the signals are projected:
b <- pi/36
W <- matrix(c(cos(b), -sin(b), sin(b), cos(b)), 2, 2)
XX <- X
for (i in 1:9) {
  XX <- XX %*% W
  border.hist(XX[, 1], XX[, 2])
  readline("Press Enter...")
}
Figure 19. Effect of rotating the projection plane
We see that for the fully mixed signals the histograms appear nearly Gaussian. As we move through the different projections, the histograms move away from normality.
The resulting signals are:
plot(XX[, 1])
plot(XX[, 2])
Figure 20. Result of the ICA
Now consider what happens for three signals: a sine function, a sawtooth, and a pair of exponentials.
S.1 <- sin((1:1000)/20)
S.2 <- rep((((1:200)-100)/100), 5)
S.3 <- rep(c(exp(seq(0, .99, .01)) - 1.845617, -exp(seq(0, .99, .01)) + 1.845617), 5)
S <- cbind(S.1, S.2, S.3)
A <- matrix(runif(9), 3, 3) # Set a random mixing
X <- S %*% A
Do an ICA on the mixed data:
a <- fastICA(X, 3, alg.typ = "parallel", fun = "logcosh", alpha = 1,
             method = "R", row.norm = FALSE, maxit = 200, tol = 0.0001,
             verbose = TRUE)
Whitening
Symmetric FastICA using logcosh approx. to neg-entropy function
Iteration 1 tol = 0.1086564
Iteration 2 tol = 0.004629528
Iteration 3 tol = 0.0001178137
Iteration 4 tol = 5.028182e-06
We then plot the original, mixed, and recovered data:
oldpar <- par(mfcol = c(3, 3), mar = c(2, 2, 2, 1))
plot(1:1000, S[, 1], type = "l", main = "Original Signals", xlab = "", ylab = "")
for (i in 2:3) {
  plot(1:1000, S[, i], type = "l", xlab = "", ylab = "")
}
plot(1:1000, X[, 1], type = "l", main = "Mixed Signals", xlab = "", ylab = "")
for (i in 2:3) {
  plot(1:1000, X[, i], type = "l", xlab = "", ylab = "")
}
plot(1:1000, a$S[, 1], type = "l", main = "ICA source estimates", xlab = "", ylab = "")
for (i in 2:3) {
  plot(1:1000, a$S[, i], type = "l", xlab = "", ylab = "")
}
par(oldpar)
Figure 21. Original, mixed and recovered
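Because of the sign and order ambiguities noted earlier, the recovered components in a$S need not match the order or sign of the originals. One way to pair them up is by absolute correlation; here is a sketch in which S.hat is a stand-in for a$S, built by swapping and sign-flipping the true sources:

```r
S <- cbind(sin((1:1000)/20), rep((((1:200) - 100)/100), 5))
S.hat <- cbind(-S[, 2], S[, 1])  # stand-in "estimates": swapped, sign-flipped
cc <- abs(cor(S.hat, S))         # |correlation| of each estimate with each source
apply(cc, 2, which.max)          # which estimate matches each source: 2 1
```

Each true source has exactly one estimate with |correlation| near 1, so the pairing (and the sign, from the sign of the correlation) can be read off the matrix cc.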
Repeat the process with four signals:
S.1 <- sin((1:1000)/20)
S.2 <- rep((((1:200)-100)/100), 5)
s.3 <- tan(seq(-pi/2 + .1, pi/2 - .1, .0118))
S.3 <- rep(s.3, 4)
S.4 <- rep(c(exp(seq(0, .99, .01)) - 1.845617, -exp(seq(0, .99, .01)) + 1.845617), 5)
S <- cbind(S.1, S.2, S.3, S.4)
(A <- matrix(runif(16), 4, 4))
          [,1]       [,2]        [,3]      [,4]
[1,] 0.4091777 0.79526756 0.773487999 0.7201944
[2,] 0.1084712 0.03256865 0.151097684 0.2899303
[3,] 0.8920621 0.69775810 0.281228361 0.1156242
[4,] 0.4683415 0.91346105 0.003911073 0.1033929
X <- S %*% A
a <- fastICA(X, 4, alg.typ = "parallel", fun = "logcosh", alpha = 1,
             method = "R", row.norm = FALSE, maxit = 200, tol = 0.0001,
             verbose = TRUE)
Centering
Whitening
Symmetric FastICA using logcosh approx. to neg-entropy function
Iteration 1 tol = 0.3458911
Iteration 2 tol = 0.007638039
Iteration 3 tol = 0.001150413
Iteration 4 tol = 0.0003499578
Iteration 5 tol = 9.909304e-05
oldpar <- par(mfcol = c(4, 3), mar = c(2, 2, 2, 1))
plot(1:1000, S[, 1], type = "l", main = "Original Signals", xlab = "", ylab = "")
for (i in 2:4) {
  plot(1:1000, S[, i], type = "l", xlab = "", ylab = "")
}
plot(1:1000, X[, 1], type = "l", main = "Mixed Signals", xlab = "", ylab = "")
for (i in 2:4) {
  plot(1:1000, X[, i], type = "l", xlab = "", ylab = "")
}
plot(1:1000, a$S[, 1], type = "l", main = "ICA source estimates", xlab = "", ylab = "")
for (i in 2:4) {
  plot(1:1000, a$S[, i], type = "l", xlab = "", ylab = "")
}
par(oldpar)
Figure 22. Original, mixed, and recovered (four signals)
For this example we will look at three mixtures of four signals (note the warning messages):
A <- matrix(runif(12), 4, 3)
X <- S %*% A
a <- fastICA(X, 4, alg.typ = "parallel", fun = "logcosh", alpha = 1,
             method = "R", row.norm = FALSE, maxit = 200, tol = 0.0001,
             verbose = TRUE)
n.comp is too large
n.comp set to 3
Centering
Whitening
Symmetric FastICA using logcosh approx. to neg-entropy function
Iteration 1 tol = 0.1473840
Iteration 2 tol = 0.003145043
Iteration 3 tol = 1.781576e-05
oldpar <- par(mfcol = c(4, 3), mar = c(2, 2, 2, 1))
plot(1:1000, S[, 1], type = "l", main = "Original Signals", xlab = "", ylab = "")
for (i in 2:4) {
  plot(1:1000, S[, i], type = "l", xlab = "", ylab = "")
}
plot(1:1000, X[, 1], type = "l", main = "Mixed Signals", xlab = "", ylab = "")
for (i in 2:3) {
  plot(1:1000, X[, i], type = "l", xlab = "", ylab = "")
}
plot(0, type = "n") # Dummy to fill
plot(1:1000, a$S[, 1], type = "l", main = "ICA source estimates", xlab = "", ylab = "")
for (i in 2:3) {
  plot(1:1000, a$S[, i], type = "l", xlab = "", ylab = "")
}
plot(0, type = "n") # Dummy to fill
par(oldpar)
Figure 23. Original, mixed, and recovered (three mixtures of four signals)
The next example uses ICA on sounds. This is a demonstration found at the Laboratory of Computer and Information Science (CIS) of the Department of Computer Science and Engineering at Helsinki University of Technology:
http://www.cis.hut.fi/projects/ica/cocktail/cocktail_en.cgi
For this example we will need to read and write .wav files. A .wav file has the basic structure described in the next function:
read.wav <- function(d.file) {
  zz <- file(d.file, "rb")                            # Open binary file for reading
  # RIFF chunk
  RIFF <- readChar(zz, 4)                             # Word "RIFF" (4)
  file.len <- readBin(zz, integer(), 1)               # Number of bytes in file (4)
  WAVE <- readChar(zz, 4)                             # Word "WAVE" (4)
  # FORMAT chunk
  fmt <- readChar(zz, 4)                              # "fmt " (4)
  len.of.format <- readBin(zz, integer(), 1)          # Format length (4)
  f.one <- readBin(zz, integer(), 1, size=2)          # Number 1 (2)
  Channel.numbs <- readBin(zz, integer(), 1, size=2)  # Number of channels (2)
  Sample.Rate <- readBin(zz, integer(), 1)            # Sample rate (4)
  Bytes.P.Sec <- readBin(zz, integer(), 1)            # Bytes/sec (4)
  Bytes.P.Sample <- readBin(zz, integer(), 1, size=2) # Bytes/sample (2)
  Bits.P.Sample <- readBin(zz, integer(), 1, size=2)  # Bits/sample (2)
  # DATA chunk
  DATA <- readChar(zz, 4)                             # Word "data" (4)
  data.len <- readBin(zz, integer(), 1)               # Length of data (4)
  bias <- 2^(Bits.P.Sample - 1)
  # Read data based on above parameters
  wav.data <- readBin(zz, integer(), data.len, size=Bytes.P.Sample, signed=FALSE)
  close(zz)                                           # Close the file
  wav.data <- wav.data - bias                         # Shift based on bias
  # Return the information for R
  list(RIFF=RIFF, File.Len=file.len, WAVE=WAVE, format=fmt,
       len.of.format=len.of.format, f.one=f.one, Channel.numbs=Channel.numbs,
       Sample.Rate=Sample.Rate, Bytes.P.Sec=Bytes.P.Sec,
       Bytes.P.Sample=Bytes.P.Sample, Bits.P.Sample=Bits.P.Sample,
       DATA=DATA, data.len=data.len, data=wav.data)
}
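The write.wav function used below comes from the WaveIO.r file sourced earlier and is not reproduced in the notes. A minimal sketch of what it must do, mirroring read.wav field for field (my own reconstruction, not the course file), is:

```r
write.wav <- function(d.file, wav) {
  zz <- file(d.file, "wb")                               # Open binary file for writing
  # RIFF chunk
  writeChar(wav$RIFF, zz, 4, eos = NULL)                 # "RIFF" (4)
  writeBin(as.integer(wav$File.Len), zz)                 # Number of bytes in file (4)
  writeChar(wav$WAVE, zz, 4, eos = NULL)                 # "WAVE" (4)
  # FORMAT chunk
  writeChar(wav$format, zz, 4, eos = NULL)               # "fmt " (4)
  writeBin(as.integer(wav$len.of.format), zz)            # Format length (4)
  writeBin(as.integer(wav$f.one), zz, size = 2)          # Number 1 (2)
  writeBin(as.integer(wav$Channel.numbs), zz, size = 2)  # Number of channels (2)
  writeBin(as.integer(wav$Sample.Rate), zz)              # Sample rate (4)
  writeBin(as.integer(wav$Bytes.P.Sec), zz)              # Bytes/sec (4)
  writeBin(as.integer(wav$Bytes.P.Sample), zz, size = 2) # Bytes/sample (2)
  writeBin(as.integer(wav$Bits.P.Sample), zz, size = 2)  # Bits/sample (2)
  # DATA chunk
  writeChar(wav$DATA, zz, 4, eos = NULL)                 # "data" (4)
  writeBin(as.integer(wav$data.len), zz)                 # Length of data (4)
  # Undo the bias shift, clip to the valid range, and write the samples
  bias <- 2^(wav$Bits.P.Sample - 1)
  samples <- pmin(pmax(round(wav$data + bias), 0), 2^wav$Bits.P.Sample - 1)
  writeBin(as.integer(samples), zz, size = wav$Bytes.P.Sample)
  close(zz)
}
```

Note the eos = NULL in each writeChar call: the default eos = "" would append a null byte after every word and corrupt the header layout.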
Set up variables for the data and create the file names for the input, mixed, and output files:
numb.source <- 9
in.file <- matrix(0, numb.source, 1)
mix.file <- matrix(0, numb.source, 1)
out.file <- matrix(0, numb.source, 1)
for (i in 1:numb.source) {
  in.file[i,] <- paste(data.dir, "/source", i, ".wav", sep="")
  mix.file[i,] <- paste(data.dir, "/m", i, ".wav", sep="")
  out.file[i,] <- paste(data.dir, "/s", i, ".wav", sep="")
}
in.wav <- {}
for (m in 1:numb.source) {
  in.wav <- c(in.wav, list(read.wav(in.file[m,])))
}
We can look at the characteristics of the file with:
wav.char <- function(wav)
{
  cat("RIFF = ", wav$RIFF, "\n")
  cat("Length = ", wav$File.Len, "\n")
  cat("Wave = ", wav$WAVE, "\n")
  cat("Format = ", wav$format, "\n")
  cat("Format Length = ", wav$len.of.format, "\n")
  cat("One = ", wav$f.one, "\n")
  cat("Number of Channels = ", wav$Channel.numbs, "\n")
  cat("Sample Rate = ", wav$Sample.Rate, "\n")
  cat("Bytes/Sec = ", wav$Bytes.P.Sec, "\n")
  cat("Bytes/Sample = ", wav$Bytes.P.Sample, "\n")
  cat("Bits/Sample = ", wav$Bits.P.Sample, "\n")
  cat("Data = ", wav$DATA, "\n")
  cat("Data Length = ", wav$data.len, "\n")
}
wav.char(in.wav[[1]])
RIFF = RIFF
Length = 50036
Wave = WAVE
Format = fmt
Format Length = 16
One = 1
Number of Channels = 1
Sample Rate = 8000
Bytes/Sec = 8000
Bytes/Sample = 1
Bits/Sample = 8
Data = data
Data Length = 50000
Set up a random matrix for mixing:
A <- matrix(runif(numb.source*numb.source), numb.source, numb.source)
We will create a matrix (50000 × 9) that has one source in each column:
mixed <- {}
for (i in 1:numb.source) {
  mixed <- cbind(mixed, in.wav[[i]]$data)
}
We multiply by the 9 × 9 mixing matrix to produce a new (50000 × 9) matrix in which each column is a mixture of the 9 columns of the original matrix:
mixed <- mixed %*% A
We now plot the resulting wave forms (Figure 24):
old.par <- par(mfcol = c(numb.source, 1))
par(mar = c(2, 2, 2, 2) + 0.1)
plot(mixed[, 1], type="l", main="Mixed")
for (m in 2:numb.source) {
  plot(mixed[, m], type="l")
}
if (dev.cur()[[1]] != 1) bringToTop(which=dev.cur())
par(old.par)
Figure 24. 9 signals mixed
In order to save the signal as a .wav file, we need the header information. We cheat a little by simply using the in.wav header and replacing its data part with the mixed data. The first part of the following code creates a mixed list from the in list, and the second part does the data replacement.
mix.wav <- {}
for (m in 1:numb.source) {
  mix.wav <- c(mix.wav, list(in.wav[[m]]))
}
for (m in 1:numb.source) {
  mix.wav[[m]]$data <- mixed[, m]
  write.wav(mix.file[m,], mix.wav[[m]])
}
# --------------- Play them ---------------
Use the sound library to play the mixed sounds:
library(sound)
play(mix.file[1,])
play(mix.file[2,])
play(mix.file[3,])
play(mix.file[4,])
play(mix.file[5,])
play(mix.file[6,])
play(mix.file[7,])
play(mix.file[8,])
play(mix.file[9,])
# --------------- Unmix them ---------------
We will use fastICA to unmix the signals, then save and play the results as we did for the mixed signals:
mixed.all <- {}
for (i in 1:numb.source) {
  mixed.all <- cbind(mixed.all, mixed[, i])
}
ICA.wavs <- fastICA(mixed.all, numb.source, alg.typ = "parallel",
                    fun = "logcosh", alpha = 1, method = "R",
                    row.norm = FALSE, maxit = 200, tol = 0.0001,
                    verbose = TRUE)
# --------------- Save them ---------------
new.wav <- {}
for (m in 1:numb.source) {
  new.wav <- c(new.wav, list(in.wav[[m]]))
}
for (m in 1:numb.source) {
  new.wav[[m]]$data <- 5*ICA.wavs$S[, m]
  write.wav(out.file[m,], new.wav[[m]])
}
# --------------- Play them ---------------
play(out.file[1,])
play(out.file[2,])
play(out.file[3,])
play(out.file[4,])
play(out.file[5,])
play(out.file[6,])
play(out.file[7,])
play(out.file[8,])
play(out.file[9,])
# --------------- Plot them ---------------
old.par <- par(mfcol = c(numb.source, 3))
par(mar = c(2, 2, 2, 2) + 0.1)
plot(in.wav[[1]]$data, type="l", main="Original")
for (m in 2:numb.source) {
  plot(in.wav[[m]]$data, type="l")
}
plot(mixed[, 1], type="l", main="Mixed")
for (m in 2:numb.source) {
  plot(mixed[, m], type="l")
}
plot(ICA.wavs$S[, 1], type="l")
if (dev.cur()[[1]] != 1) bringToTop(which=dev.cur())
for (m in 2:numb.source) {
  plot(ICA.wavs$S[, m], type="l")
}
par(old.par)
Figure 25.
# --------------- Original - play ---------------
play(in.file[1,])
play(in.file[2,])
play(in.file[3,])
play(in.file[4,])
play(in.file[5,])
play(in.file[6,])
play(in.file[7,])
play(in.file[8,])
play(in.file[9,])