PCA Network: Unsupervised Learning Networks

Post on 22-Dec-2015


TRANSCRIPT

Page 1: PCA Network: Unsupervised Learning Networks. PCA is a representation network useful for signal, image, and video processing.

• PCA Network

Unsupervised Learning Networks

Page 2:

PCA is a representation network useful for signal, image, and video processing.

Page 3: PCA Networks

In order to analyze multi-dimensional input vectors, the representation with maximum information is principal component analysis (PCA).

PCA:

• per component: extract the most significant features;

• inter-component: avoid duplication or redundancy between the neurons.

Page 4:

An estimate of the autocorrelation matrix is obtained by taking the time average over the sample vectors:

R̂x = (1/M) Σt x(t) x(t)T

Its eigen-decomposition is

Rx = U Λ UT
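The time-average estimate and its eigen-decomposition can be checked numerically; the data matrix, dimensions, and seed below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# M hypothetical sample vectors x(t), stacked as rows of X
M, n = 1000, 3
X = rng.normal(size=(M, n)) @ np.diag([3.0, 1.5, 0.5])

# time-average estimate of the autocorrelation matrix: R = (1/M) sum_t x(t) x(t)^T
R = (X.T @ X) / M

# eigen-decomposition R = U Lambda U^T (R is symmetric, so eigh applies)
lam, U = np.linalg.eigh(R)
lam, U = lam[::-1], U[:, ::-1]        # reorder eigenvalues to descending

print(np.allclose(U @ np.diag(lam) @ U.T, R))   # True: decomposition reconstructs R
```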

Page 5:

The optimal matrix W is formed by the first m eigenvectors of Rx:

x̂(t) = W a(t)

The errors of the optimal estimate are [Jain89]:

• matrix-2-norm error = λm+1

• least-mean-square error = Σi=m+1..n λi
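The least-mean-square formula can be verified numerically: keeping the first m eigenvectors, the mean-squared reconstruction error equals the sum of the discarded eigenvalues. The data and dimensions below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
M, n, m = 5000, 4, 2

# hypothetical zero-mean data with an anisotropic spectrum
X = rng.normal(size=(M, n)) @ np.diag([3.0, 2.0, 0.8, 0.3])

R = (X.T @ X) / M
lam, U = np.linalg.eigh(R)
lam, U = lam[::-1], U[:, ::-1]          # descending eigenvalues

W = U[:, :m]                            # first m eigenvectors
a = X @ W                               # a(t) = W^T x(t)
Xhat = a @ W.T                          # reconstruction x^(t) = W a(t)

mse = np.mean(np.sum((X - Xhat) ** 2, axis=1))
print(mse - lam[m:].sum())              # ~0: LMS error equals sum of discarded eigenvalues
```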

Page 6: First PC

To enhance the correlation between the input x(t) and the extracted component a(t), it is natural to use a Hebbian-type rule:

a(t) = w(t)T x(t)

w(t+1) = w(t) + β x(t) a(t)

Page 7: Oja Learning Rule

The Oja learning rule is equivalent to a normalized Hebbian rule: apply the Hebbian update, renormalize w to unit length, and expand to first order in β. The result is

Δw(t) = β [x(t) a(t) − w(t) a(t)2]
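A minimal sketch of the Oja rule on a synthetic data stream (the data, learning rate, and seed are illustrative assumptions); the weight vector should align with the principal eigenvector of Rx while staying near unit length:

```python
import numpy as np

rng = np.random.default_rng(2)
M, n = 20000, 3

# hypothetical correlated input stream x(t)
X = rng.normal(size=(M, n)) @ np.diag([3.0, 1.5, 0.5])

w = rng.normal(size=n)
w /= np.linalg.norm(w)
beta = 1e-3                                # small fixed learning rate (assumption)

for x in X:
    a = w @ x                              # a(t) = w(t)^T x(t)
    w += beta * (a * x - (a ** 2) * w)     # Oja rule: Hebbian term minus normalization

# compare with the principal eigenvector e1 of the estimated Rx
lam, U = np.linalg.eigh((X.T @ X) / M)
e1 = U[:, -1]
print(abs(w @ e1))                         # should be close to 1 (alignment up to sign)
```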

Page 8:

Page 9: Single Component

Convergence theorem: by the Oja learning rule, w(t) converges asymptotically (with probability 1) to

w = w(∞) = e1

where e1 is the principal eigenvector of Rx.

Page 10:

Proof: starting from the Oja rule,

Δw(t) = β [x(t) a(t) − w(t) a(t)2]

Δw(t) = β [x(t) x(t)T w(t) − a(t)2 w(t)]

Take the average over a block of data, redenote ť as the block time index, and let σ(ť) denote the block average of a(t)2:

Δw(ť) = β [Rx − σ(ť) I] w(ť)

Δw(ť) = β [U Λ UT − σ(ť) I] w(ť)

Δw(ť) = β U [Λ − σ(ť) I] UT w(ť)

Δ UT w(ť) = β [Λ − σ(ť) I] UT w(ť)

With Θ(ť) = UT w(ť):

ΔΘ(ť) = β [Λ − σ(ť) I] Θ(ť)

Page 11: Convergence Rates

Write Θ(ť) = [θ1(ť) θ2(ť) … θn(ť)]T. Each of the eigen-components is enhanced or dampened by

θi(ť+1) = [1 + β′(λi − σ(ť))] θi(ť)

so the relative dominance of the principal component grows, with growth rate

[1 + β′(λi − σ(ť))] / [1 + β′(λ1 − σ(ť))]

Page 12: Simulation: Decay Rates of PCs
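The decay of the non-principal components can be reproduced by iterating the averaged dynamics directly in the eigenbasis; the eigenvalue spectrum and step size below are illustrative assumptions:

```python
import numpy as np

# averaged Oja dynamics in the eigenbasis:
#   θi(t+1) = [1 + β(λi − σ(t))] θi(t),  with σ(t) = Θ^T Λ Θ (the mean of a(t)^2)
lam = np.array([4.0, 2.0, 1.0, 0.5])           # hypothetical eigenvalue spectrum
beta = 0.01
theta = np.ones_like(lam) / np.sqrt(len(lam))  # equal mixture of all eigen-components

for _ in range(5000):
    sigma = theta @ (lam * theta)
    theta = (1.0 + beta * (lam - sigma)) * theta

print(np.round(theta, 4))   # principal component survives; the others decay toward 0
```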

Page 13: Multiple Principal Components

How to extract multiple principal components?

Page 14:

Let W denote an n×m weight matrix:

ΔW(t) = β [x(t) − W(t) a(t)] a(t)T

Concern: duplication/redundancy among the extracted components.
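A sketch of this multi-component (subspace) rule on synthetic data: W should converge to an approximately orthonormal basis of the span of the top-m eigenvectors, though not necessarily to the eigenvectors themselves, which illustrates the redundancy concern. Data, sizes, and learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
M, n, m = 30000, 4, 2

# hypothetical data whose top-2 eigenvectors are the first two coordinate axes
X = rng.normal(size=(M, n)) @ np.diag([3.0, 2.0, 0.7, 0.3])

W = 0.1 * rng.normal(size=(n, m))
beta = 1e-3

for x in X:
    a = W.T @ x                         # a(t) = W(t)^T x(t)
    W += beta * np.outer(x - W @ a, a)  # subspace rule: dW = beta (x - W a) a^T

# columns of W should (approximately) span the top-m eigenspace of Rx
lam, U = np.linalg.eigh((X.T @ X) / M)
Em = U[:, -m:]                          # top-m eigenvectors
residual = np.linalg.norm(W - Em @ Em.T @ W)
print(residual)                         # small: columns of W lie in span(e1, e2)
```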

Page 15: Deflation Method

Assume that the first component has already been obtained; then the output value can be "deflated" by the following transformation:

x̃ = (I − w1 w1T) x
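A numerical sketch of deflation (the data below are a hypothetical example): after the w1-component is removed, the deflated data carry no energy along w1, and their principal eigenvector is the second eigenvector of the original Rx:

```python
import numpy as np

rng = np.random.default_rng(4)
M, n = 5000, 3
X = rng.normal(size=(M, n)) @ np.diag([3.0, 1.5, 0.5])

R = (X.T @ X) / M
lam, U = np.linalg.eigh(R)
w1 = U[:, -1]                           # principal eigenvector e1

# deflation: x~ = (I - w1 w1^T) x removes the first principal component
D = np.eye(n) - np.outer(w1, w1)
Xt = X @ D                              # D is symmetric, so x~(t) = D x(t)

lam2, U2 = np.linalg.eigh((Xt.T @ Xt) / M)
print(np.abs(Xt @ w1).max())            # ~0: no energy left along w1
print(abs(U2[:, -1] @ U[:, -2]))        # ~1: top PC of deflated data is e2
```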

Page 16: Lateral Orthogonalization Network

The basic idea is to allow the old hidden units to influence the new units so that the new ones do not duplicate information (in full or in part) already provided by the old units. By this approach, the deflation process is effectively implemented in an adaptive manner.

Page 17:

Page 18: APEX Network (multiple PCs)

Page 19:

APEX: Adaptive Principal-component Extractor

The Oja rule for the i-th component (e.g. i = 2):

Δwi(t) = β [x(t) ai(t) − wi(t) ai(t)2]

The dynamic orthogonalization rule (e.g. i = 2, j = 1):

Δαij(t) = β [ai(t) aj(t) − αij(t) ai(t)2]
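A sketch of the two coupled APEX rules for the second unit. For simplicity the first unit is assumed already converged to e1 (found here by direct eigendecomposition rather than by learning); the data, learning rate, and seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
M, n = 40000, 3
X = rng.normal(size=(M, n)) @ np.diag([3.0, 1.5, 0.5])

lam, U = np.linalg.eigh((X.T @ X) / M)
w1 = U[:, -1]                       # first unit, assumed already converged (w1 = e1)

w2 = rng.normal(size=n)
w2 /= np.linalg.norm(w2)
alpha = 0.0                         # lateral weight α21
beta = 1e-3

for x in X:
    a1 = w1 @ x
    a2 = w2 @ x - alpha * a1                        # output with lateral inhibition
    w2 += beta * (a2 * x - (a2 ** 2) * w2)          # Oja rule for the 2nd component
    alpha += beta * (a1 * a2 - (a2 ** 2) * alpha)   # dynamic orthogonalization rule

e2 = U[:, -2]
print(abs(w2 @ e2))                 # should be close to 1: w2 extracts the second PC
```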

Page 20: Convergence Theorem: Multiple Components

The Hebbian weight matrix W(t) in APEX converges asymptotically to the matrix formed by the m largest principal components. That is, W(t) converges (with probability 1) to

W(∞) = W

where W is the matrix formed by the m row vectors wiT, with

wi = wi(∞) = ei

Page 21:

Proof (i = 2, j = 1): the update rules are

Δw2(t) = β [x(t) a2(t) − w2(t) a2(t)2]

Δα(t) = β [a1(t) a2(t) − α(t) a2(t)2]

Premultiplying the first rule by w1T (using w1T x(t) = a1(t)):

w1T Δw2(t) = β [a1(t) a2(t) − w1T w2(t) a2(t)2]

Subtracting the two rules:

Δ[w1T w2(t) − α(t)] = −β [w1T w2(t) − α(t)] a2(t)2

[w1T w2(t+1) − α(t+1)] = [1 − β a2(t)2] [w1T w2(t) − α(t)]

Hence w1T w2(t) − α(t) → 0, i.e. α(t) → w1T w2(t). The second output then becomes

a2(t) = x(t)T w2(t) − α(t) a1(t) → x(t)T [I − w1 w1T] w2(t)

which is exactly the deflated component.

Page 22: Learning Rates of APEX

With the block time index ť:

[w1T w2(ť+1) − α(ť+1)] = [1 − β′ σ(ť)] [w1T w2(ť) − α(ť)]

The fastest decay is obtained with β′ = 1/σ(ť), which suggests the practical learning rates

• β = 1/[Σt a2(t)2]

• β = 1/[Σt γt a2(t)2] (with a forgetting factor γ)

Page 23: Other Extensions

• PAPEX: Hierarchical Extraction

• DCA: Discriminant Component Analysis

• ICA: Independent Component Analysis