Parallel ICA Algorithm and Modeling
Hongtao Du
March 25, 2004
Outline
Review
– Independent Component Analysis
– FastICA
– Parallel ICA
Parallel Computing Laws
Parallel Computing Models
Model for pICA
Independent Component Analysis (ICA)
A linear transformation that minimizes the higher-order statistical dependence between components.
ICA model:
– What is independence? Source signal S:
  – statistically independent
  – not more than one component is Gaussian distributed
– Weight matrix (unmixing matrix) W: S = W * X

Methods to minimize statistical dependence:
– Mutual information (InfoMax)
– K-L divergence or relative entropy (Output Divergence)
– Nongaussianity (FastICA)
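To make the model concrete, here is a minimal NumPy sketch (illustrative, not from the slides): two independent non-Gaussian sources are mixed into observations X, and applying an unmixing matrix W recovers them. Here W is simply taken as the inverse of the known mixing matrix to illustrate S = W * X; ICA's job is to estimate W when the mixing is unknown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two statistically independent, non-Gaussian sources (uniform noise).
S = rng.uniform(-1, 1, size=(2, 1000))

# An (in practice unknown) mixing matrix produces the observations X.
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
X = A @ S

# ICA estimates an unmixing matrix W with S = W * X.
# Here we cheat and use W = A^{-1} purely to illustrate the model.
W = np.linalg.inv(A)
S_hat = W @ X
print(np.allclose(S, S_hat))  # True: sources recovered
```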
FastICA Algorithm
One-unit fixed-point update:

w+ = E{x g(w^T x)} − E{g'(w^T x)} w

Decorrelation (deflation) of the (p+1)-th weight vector against the p vectors already found:

w_(p+1) = w_(p+1) − Σ_(j=1..p) (w_(p+1)^T w_j) w_j
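The update and deflation steps above translate almost line-for-line into NumPy. This is a minimal sketch, assuming the input X is already centered and whitened (a precondition of FastICA) and using g(u) = tanh(u) as the nonlinearity; it is not the author's implementation.

```python
import numpy as np

def fastica_deflation(X, n_components, n_iter=200, tol=1e-6, seed=0):
    """One-unit FastICA with deflation; X must be whitened, shape (dim, samples)."""
    rng = np.random.default_rng(seed)
    dim, n = X.shape
    W = np.zeros((n_components, dim))
    for p in range(n_components):
        w = rng.standard_normal(dim)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            wx = w @ X                        # w^T x for every sample
            # w+ = E{x g(w^T x)} - E{g'(w^T x)} w, with g = tanh
            w_new = (X * np.tanh(wx)).mean(axis=1) \
                    - (1 - np.tanh(wx) ** 2).mean() * w
            # Deflation: decorrelate against previously found vectors
            w_new -= W[:p].T @ (W[:p] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1) < tol
            w = w_new
            if converged:
                break
        W[p] = w
    return W
```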
Parallel ICA
Internal Decorrelation
Within each sub-matrix i, a newly estimated weight vector is decorrelated only against the p vectors already estimated in the same sub-matrix:

u_i(p+1) = u_i(p+1) − Σ_(j=1..p) (u_i(p+1)^T u_ij) u_ij
External Decorrelation
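A sketch of both decorrelation stages, assuming each sub-matrix is a NumPy array whose rows are weight vectors: internal decorrelation applies the deflation formula within one sub-matrix, and external decorrelation then projects each sub-matrix against the vectors of the sub-matrices already processed. The function names are mine, not from the slides.

```python
import numpy as np

def internal_decorrelation(W_sub):
    """Deflate each weight vector (row) against the earlier rows of its sub-matrix."""
    W_sub[0] /= np.linalg.norm(W_sub[0])
    for p in range(1, W_sub.shape[0]):
        w = W_sub[p] - W_sub[:p].T @ (W_sub[:p] @ W_sub[p])
        W_sub[p] = w / np.linalg.norm(w)
    return W_sub

def external_decorrelation(subs):
    """Decorrelate each sub-matrix against all previously processed sub-matrices."""
    done = subs[0]
    for W_sub in subs[1:]:
        W_sub -= (W_sub @ done.T) @ done    # project out earlier weight vectors
        W_sub /= np.linalg.norm(W_sub, axis=1, keepdims=True)
        done = np.vstack([done, W_sub])
    return subs
```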
Performance Comparison (4 Processors)
Parallel Computing
Parallel architectures are classified by instruction flow and data stream (Flynn's taxonomy), as shown in the table below.
                      Single Instruction Flow   Multiple Instruction Flow
Single Data Stream    SISD                      MISD (Pipeline)
Multiple Data Stream  SIMD (MPI, PVM)           MIMD (Distributed)
– SISD: Do-It-Yourself, no help
– SIMD: Rowing: 1 master, several slaves
– MISD: Assembly line in car manufacturing
– MIMD: Distributed sensor network
The pICA algorithm for hyperspectral image analysis (a high-volume data set) is SIMD.
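A hedged mpi4py sketch of this SIMD pattern (mpi4py stands in for the MPI environment mentioned in the table, and internal_decorrelation refers to the sketch above): every rank executes the same instruction stream on a different sub-matrix.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

# Rank 0 splits the weight matrix into one sub-matrix per processor.
subs = None
if comm.rank == 0:
    W = np.random.default_rng(0).standard_normal((comm.size * 4, 16))
    subs = np.array_split(W, comm.size)

# Same instructions, different data on every rank (SIMD style).
W_sub = comm.scatter(subs, root=0)
W_sub = internal_decorrelation(W_sub)   # from the pICA sketch above

# Collect the internally decorrelated sub-matrices for external decorrelation.
subs = comm.gather(W_sub, root=0)
```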
Parallel Computing Laws and Models
– Amdahl Law
– Gustafson Law
– BSP Model
– LogP Model
Amdahl Law
First law for parallel computing (1967); it limits the speedup of parallel applications.

Speedup = (s + p) / (s + p / N) = 1 / (a + (1 − a) / N), where a = s / (s + p)

where
– N: number of processors
– s: serial fraction
– p: parallel fraction

Speedup boundary: 1 / a
The serial part should be limited and very fast.
Problem: a parallel computer must also be a fast sequential computer.
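A quick worked example with illustrative numbers: with serial fraction a = 0.05, the speedup saturates near the 1/a = 20 boundary no matter how many processors are added.

```python
def amdahl_speedup(a, n):
    """Speedup = 1 / (a + (1 - a) / N) for serial fraction a and N processors."""
    return 1.0 / (a + (1.0 - a) / n)

for n in (4, 16, 64, 1024):
    print(n, round(amdahl_speedup(0.05, n), 2))
# 4 -> 3.48, 16 -> 9.14, 64 -> 15.42, 1024 -> 19.64 (boundary: 1/0.05 = 20)
```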
Gustafson Law
Improvement of Amdahl law Considering data size In a parallel program, if the quantity of data
increases, then the sequential fraction decreases.
Ndp *dsNds
Nps
psSpeedup
*
Nspeedupd
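The scaled-speedup effect in code, again with illustrative numbers: holding the serial time s fixed while the data size d grows drives the speedup toward N.

```python
def gustafson_speedup(s, p, d, n):
    """Speedup = (s + d*p) / (s + d*p/N): serial time s, parallel time p, data size d."""
    return (s + d * p) / (s + d * p / n)

for d in (1, 10, 100, 1000):
    print(d, round(gustafson_speedup(1.0, 1.0, d, 16), 2))
# 1 -> 1.88, 10 -> 6.77, 100 -> 13.93, 1000 -> 15.76: approaches N = 16
```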
Parallel Computing Model
Amdahl's and Gustafson's laws define the limits without considering the properties of the computer architecture, so they cannot predict the real performance of any parallel application.
Parallel computing models integrate the computer architecture and application architecture.
Purpose:
– Predicting computing cost
– Evaluating efficiency of programs

Impacts on performance:
– Computing node (processor, memory)
– Communication network
– Tapp = Tcomp + Tcomm
Centric vs. Distributed
Parallel Random Access Machine (PRAM)
– Synchronous processors
– Shared memory

Distributed-memory Parallel Computer
– Distributed processors and memory
– Interconnected by a communication network
– Each processor has fast access to its own memory, slow access to remote memory
[Diagram: PRAM with processors P1-P4 attached to one shared memory; distributed-memory machine with processor/memory pairs P1/M1 ... P4/M4 connected by a network]
Bulk Synchronous Parallel - BSP
For distributed-memory parallel computers. Assumptions:
– N identical processors, each with its own memory
– Interconnected by a predictable network
– Each processor can perform synchronization

Applications are composed of supersteps separated by global synchronizations. Each superstep includes:
– a computation step
– a communication step
– a synchronization step
TSuperstep = w + g * h + l

– w: maximum computation time among the processors
– g: 1 / (network bandwidth)
– h: number of messages transferred
– l: synchronization time

An algorithm can be described by its w and h.
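The superstep cost translates directly into a helper; this is a sketch using the slide's parameter names, and any numbers passed in would have to be measured, not derived here.

```python
def bsp_superstep_time(w, g, h, l):
    """T_superstep = w + g * h + l (BSP cost model)."""
    return w + g * h + l

def bsp_total_time(supersteps, g, l):
    """Total BSP time: sum over supersteps, each described by its (w, h) pair."""
    return sum(bsp_superstep_time(w, g, h, l) for w, h in supersteps)
```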
LogP Model
Improvement of the BSP model: decomposes the communication cost (g) into 3 parts:
– Latency (L): time for a message to cross the network
– Overhead (o): time lost in send/receive I/O
– Gap (g): minimum interval between 2 consecutive messages
TSuperstep = w + (L + 2 * o) * h + l
Execution time is the time of the slowest process.
The total time for a message to be transferred from processor A to processor B is: L + 2 * o
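The same helper idea for LogP, using the parameters defined above (a sketch, not a full LogP simulator):

```python
def logp_message_time(L, o):
    """Point-to-point time: send overhead + latency + receive overhead."""
    return L + 2 * o

def logp_superstep_time(w, L, o, h, l):
    """T_superstep = w + (L + 2*o) * h + l, as used for pICA below."""
    return w + logp_message_time(L, o) * h + l
```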
[Timing diagrams: when g > o, each processor waits out the gap g between consecutive messages; when g < o, the overhead o dominates and messages are sent back-to-back]
Given the finite capacity of the network, at most ceil(L / g) messages can be in transit to or from any processor at any time.

Drawbacks:
– Does not address the data size. What if all messages are very small?
– Does not consider the global capacity of the network.
Model for pICA
Features:
– SIMD
– High-volume data transfer at the first stage
– Low-volume data transfer at the other stages
Combine BSP and LogP models:
– Stage 1:
  – Pipeline: hyperspectral image transfer, one-unit (weight vector) estimations
  – Parallel: internal decorrelations in sub-matrices
– Other stages:
  – Parallel: external decorrelations
T = T_stage1 + T_stage2 + … + T_stagek

Number of layers: k = log2(P)

T_stage1 = (w_one-unit + w_internal-decorrelation) + (L + 2 * o) * h_hyperspectral-image + g * h_weight-vectors + l_stage1

T_stagei = w_external-decorrelation + g * h_weight-vectors + l_stagei, for i = 2, …, k
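Putting the stage formulas together as one function (a sketch; parameter names mirror the slide, and all inputs are placeholders to be measured on an actual cluster):

```python
import math

def pica_total_time(P, w_one_unit, w_internal, w_external,
                    L, o, g, h_image, h_weights, l_sync):
    """T = T_stage1 + T_stage2 + ... + T_stagek with k = log2(P) layers."""
    k = int(math.log2(P))
    t_stage1 = (w_one_unit + w_internal) + (L + 2 * o) * h_image \
               + g * h_weights + l_sync
    t_rest = sum(w_external + g * h_weights + l_sync for _ in range(2, k + 1))
    return t_stage1 + t_rest
```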
Another Topic
Optimization of parallel computing:
– Heterogeneous parallel computing network
– Minimize overall time
– Tradeoff between computation (individual computer properties) and communication (network)
References
A. Hyvärinen and E. Oja, “A fast fixed-point algorithm for independent component analysis,” Neural Computation, vol. 9, pp. 1483–1492, 1997.
P. Comon, “Independent component analysis, a new concept?,” Signal Processing, vol. 36, no. 3, pp. 287–314, April 1994, Special Issue on Higher-Order Statistics.
A.J. Bell and T.J. Sejnowski, “An information maximisation approach to blind separation and blind deconvolution,” Neural Computation, vol. 7, no. 6, pp. 1129–1159, 1995.
S. Amari, A. Cichocki, and H. Yang, “A new learning algorithm for blind signal separation,” Advances in Neural Information Processing Systems, vol. 8, 1996.
T.-W. Lee, M. Girolami, A.J. Bell, and T.J. Sejnowski, “A unifying information-theoretic framework for independent component analysis,” International Journal on Mathematical and Computer Modeling, 1998.