Parallel ICA Algorithm and Modeling
Hongtao Du
March 25, 2004
Outline
Review
– Independent Component Analysis
– FastICA
– Parallel ICA
Parallel Computing Laws
Parallel Computing Models
Model for pICA
Independent Component Analysis (ICA)
A linear transformation that minimizes the higher-order statistical dependence between components.
ICA model:
– What is independence? Source signal S:
  – statistically independent
  – not more than one component is Gaussian distributed
– Weight matrix (unmixing matrix) W: S = W * X

Methods to minimize statistical dependence:
– Mutual information (InfoMax)
– K-L divergence or relative entropy (Output Divergence)
– Nongaussianity (FastICA)
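To make the model concrete, here is a minimal NumPy sketch (illustrative, not from the slides): two independent non-Gaussian sources are mixed into observations X, and applying an unmixing matrix W recovers them. Here W is simply taken as the inverse of the known mixing matrix to illustrate S = W * X; ICA's job is to estimate W when the mixing is unknown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two statistically independent, non-Gaussian sources (uniform noise).
S = rng.uniform(-1, 1, size=(2, 1000))

# An (in practice unknown) mixing matrix produces the observations X.
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
X = A @ S

# ICA estimates an unmixing matrix W with S = W * X.
# Here we cheat and use W = A^{-1} purely to illustrate the model.
W = np.linalg.inv(A)
S_hat = W @ X
print(np.allclose(S, S_hat))  # True: sources recovered
```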
FastICA Algorithm
One-unit fixed-point update:

w+ = E{x g(w^T x)} − E{g'(w^T x)} w

Decorrelation (deflation) of the (p+1)-th weight vector against the p vectors already found:

w_(p+1) = w_(p+1) − Σ_(j=1..p) (w_(p+1)^T w_j) w_j
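The update and deflation steps above translate almost line-for-line into NumPy. This is a minimal sketch, assuming the input X is already centered and whitened (a precondition of FastICA) and using g(u) = tanh(u) as the nonlinearity; it is not the author's implementation.

```python
import numpy as np

def fastica_deflation(X, n_components, n_iter=200, tol=1e-6, seed=0):
    """One-unit FastICA with deflation; X must be whitened, shape (dim, samples)."""
    rng = np.random.default_rng(seed)
    dim, n = X.shape
    W = np.zeros((n_components, dim))
    for p in range(n_components):
        w = rng.standard_normal(dim)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            wx = w @ X                        # w^T x for every sample
            # w+ = E{x g(w^T x)} - E{g'(w^T x)} w, with g = tanh
            w_new = (X * np.tanh(wx)).mean(axis=1) \
                    - (1 - np.tanh(wx) ** 2).mean() * w
            # Deflation: decorrelate against previously found vectors
            w_new -= W[:p].T @ (W[:p] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1) < tol
            w = w_new
            if converged:
                break
        W[p] = w
    return W
```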
Parallel ICA
Internal Decorrelation
Within each sub-matrix i, a newly estimated weight vector is decorrelated only against the p vectors already estimated in the same sub-matrix:

u_i(p+1) = u_i(p+1) − Σ_(j=1..p) (u_i(p+1)^T u_ij) u_ij
External Decorrelation
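A sketch of both decorrelation stages, assuming each sub-matrix is a NumPy array whose rows are weight vectors: internal decorrelation applies the deflation formula within one sub-matrix, and external decorrelation then projects each sub-matrix against the vectors of the sub-matrices already processed. The function names are mine, not from the slides.

```python
import numpy as np

def internal_decorrelation(W_sub):
    """Deflate each weight vector (row) against the earlier rows of its sub-matrix."""
    W_sub[0] /= np.linalg.norm(W_sub[0])
    for p in range(1, W_sub.shape[0]):
        w = W_sub[p] - W_sub[:p].T @ (W_sub[:p] @ W_sub[p])
        W_sub[p] = w / np.linalg.norm(w)
    return W_sub

def external_decorrelation(subs):
    """Decorrelate each sub-matrix against all previously processed sub-matrices."""
    done = subs[0]
    for W_sub in subs[1:]:
        W_sub -= (W_sub @ done.T) @ done    # project out earlier weight vectors
        W_sub /= np.linalg.norm(W_sub, axis=1, keepdims=True)
        done = np.vstack([done, W_sub])
    return subs
```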
Performance Comparison (4 Processors)
Parallel Computing
Parallel architectures are classified by instruction flow and data stream (Flynn's taxonomy), as shown in the table below.
                      Single Instruction Flow   Multiple Instruction Flow
Single Data Stream    SISD                      MISD (Pipeline)
Multiple Data Stream  SIMD (MPI, PVM)           MIMD (Distributed)
– SISD: Do-It-Yourself, no help
– SIMD: Rowing: 1 master, several slaves
– MISD: Assembly line in car manufacturing
– MIMD: Distributed sensor network
The pICA algorithm for hyperspectral image analysis (a high-volume data set) is SIMD.
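A hedged mpi4py sketch of this SIMD pattern (mpi4py stands in for the MPI environment mentioned in the table, and internal_decorrelation refers to the sketch above): every rank executes the same instruction stream on a different sub-matrix.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

# Rank 0 splits the weight matrix into one sub-matrix per processor.
subs = None
if comm.rank == 0:
    W = np.random.default_rng(0).standard_normal((comm.size * 4, 16))
    subs = np.array_split(W, comm.size)

# Same instructions, different data on every rank (SIMD style).
W_sub = comm.scatter(subs, root=0)
W_sub = internal_decorrelation(W_sub)   # from the pICA sketch above

# Collect the internally decorrelated sub-matrices for external decorrelation.
subs = comm.gather(W_sub, root=0)
```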
Parallel Computing Laws and Models
– Amdahl Law
– Gustafson Law
– BSP Model
– LogP Model
Amdahl Law
First law for parallel computing (1967); it limits the speedup of parallel applications.

Speedup = (s + p) / (s + p / N) = 1 / (a + (1 − a) / N), where a = s / (s + p)

where
– N: number of processors
– s: serial fraction
– p: parallel fraction

Speedup boundary: 1 / a
The serial part should be limited and very fast.
Problem: a parallel computer must also be a fast sequential computer.
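A quick worked example with illustrative numbers: with serial fraction a = 0.05, the speedup saturates near the 1/a = 20 boundary no matter how many processors are added.

```python
def amdahl_speedup(a, n):
    """Speedup = 1 / (a + (1 - a) / N) for serial fraction a and N processors."""
    return 1.0 / (a + (1.0 - a) / n)

for n in (4, 16, 64, 1024):
    print(n, round(amdahl_speedup(0.05, n), 2))
# 4 -> 3.48, 16 -> 9.14, 64 -> 15.42, 1024 -> 19.64 (boundary: 1/0.05 = 20)
```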
Gustafson Law
Improvement of Amdahl law Considering data size In a parallel program, if the quantity of data
increases, then the sequential fraction decreases.
Ndp *dsNds
Nps
psSpeedup
*
Nspeedupd
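The scaled-speedup effect in code, again with illustrative numbers: holding the serial time s fixed while the data size d grows drives the speedup toward N.

```python
def gustafson_speedup(s, p, d, n):
    """Speedup = (s + d*p) / (s + d*p/N): serial time s, parallel time p, data size d."""
    return (s + d * p) / (s + d * p / n)

for d in (1, 10, 100, 1000):
    print(d, round(gustafson_speedup(1.0, 1.0, d, 16), 2))
# 1 -> 1.88, 10 -> 6.77, 100 -> 13.93, 1000 -> 15.76: approaches N = 16
```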
Parallel Computing Model
Amdahl's and Gustafson's laws define the limits without considering the properties of the computer architecture, so they cannot predict the real performance of any parallel application.
Parallel computing models integrate the computer architecture and application architecture.
Purpose:
– Predicting computing cost
– Evaluating efficiency of programs

Impacts on performance:
– Computing node (processor, memory)
– Communication network
– Tapp = Tcomp + Tcomm
Centric vs. Distributed
Parallel Random Access Machine (PRAM)
– Synchronous processors
– Shared memory

Distributed-memory Parallel Computer
– Distributed processors and memory
– Interconnected by a communication network
– Each processor has fast access to its own memory, slow access to remote memory
[Diagram: PRAM with processors P1-P4 attached to one shared memory; distributed-memory machine with processor/memory pairs P1/M1 ... P4/M4 connected by a network]
Bulk Synchronous Parallel - BSP
For distributed-memory parallel computers. Assumptions:
– N identical processors, each with its own memory
– Interconnected by a predictable network
– Each processor can perform synchronization

Applications are composed of supersteps separated by global synchronizations. Each superstep includes:
– a computation step
– a communication step
– a synchronization step
TSuperstep = w + g * h + l

– w: maximum computation time among the processors
– g: 1 / (network bandwidth)
– h: number of messages transferred
– l: synchronization time

An algorithm can be described by its w and h.
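The superstep cost translates directly into a helper; this is a sketch using the slide's parameter names, and any numbers passed in would have to be measured, not derived here.

```python
def bsp_superstep_time(w, g, h, l):
    """T_superstep = w + g * h + l (BSP cost model)."""
    return w + g * h + l

def bsp_total_time(supersteps, g, l):
    """Total BSP time: sum over supersteps, each described by its (w, h) pair."""
    return sum(bsp_superstep_time(w, g, h, l) for w, h in supersteps)
```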
LogP Model
Improvement of the BSP model: decomposes the communication cost (g) into 3 parts:
– Latency (L): time for a message to cross the network
– Overhead (o): time lost in send/receive I/O
– Gap (g): minimum interval between 2 consecutive messages
TSuperstep = w + (L + 2 * o) * h + l
Execution time is the time of the slowest process.
The total time for a message to be transferred from processor A to processor B is: L + 2 * o
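The same helper idea for LogP, using the parameters defined above (a sketch, not a full LogP simulator):

```python
def logp_message_time(L, o):
    """Point-to-point time: send overhead + latency + receive overhead."""
    return L + 2 * o

def logp_superstep_time(w, L, o, h, l):
    """T_superstep = w + (L + 2*o) * h + l, as used for pICA below."""
    return w + logp_message_time(L, o) * h + l
```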
[Timing diagrams: when g > o, each processor waits out the gap g between consecutive messages; when g < o, the overhead o dominates and messages are sent back-to-back]
Given the finite capacity of the network, at most ceil(L / g) messages can be in transit to or from any processor at any time.

Drawbacks:
– Does not address the data size. What if all messages are very small?
– Does not consider the global capacity of the network.
Model for pICA
Features:
– SIMD
– High-volume data transfer at the first stage
– Low-volume data transfer at the other stages
Combine BSP and LogP models:
– Stage 1:
  – Pipeline: hyperspectral image transfer, one-unit (weight vector) estimations
  – Parallel: internal decorrelations in sub-matrices
– Other stages:
  – Parallel: external decorrelations
T = T_stage1 + T_stage2 + … + T_stagek

Number of layers: k = log2(P)

T_stage1 = (w_one-unit + w_internal-decorrelation) + (L + 2 * o) * h_hyperspectral-image + g * h_weight-vectors + l_stage1

T_stagei = w_external-decorrelation + g * h_weight-vectors + l_stagei, for i = 2, …, k
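Putting the stage formulas together as one function (a sketch; parameter names mirror the slide, and all inputs are placeholders to be measured on an actual cluster):

```python
import math

def pica_total_time(P, w_one_unit, w_internal, w_external,
                    L, o, g, h_image, h_weights, l_sync):
    """T = T_stage1 + T_stage2 + ... + T_stagek with k = log2(P) layers."""
    k = int(math.log2(P))
    t_stage1 = (w_one_unit + w_internal) + (L + 2 * o) * h_image \
               + g * h_weights + l_sync
    t_rest = sum(w_external + g * h_weights + l_sync for _ in range(2, k + 1))
    return t_stage1 + t_rest
```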
Another Topic
Optimization of parallel computing:
– Heterogeneous parallel computing network
– Minimize overall time
– Tradeoff between computation (individual computer properties) and communication (network)
References
A. Hyvärinen and E. Oja, “A fast fixed-point algorithm for independent component analysis,” Neural Computation, vol. 9, pp. 1483–1492, 1997.
P. Comon, “Independent component analysis, a new concept?,” Signal Processing, vol. 36, no. 3, pp. 287–314, April 1994, Special Issue on Higher-Order Statistics.
A.J. Bell and T.J. Sejnowski, “An information maximisation approach to blind separation and blind deconvolution,” Neural Computation, vol. 7, no. 6, pp. 1129–1159, 1995.
S. Amari, A. Cichocki, and H. Yang, “A new learning algorithm for blind signal separation,” Advances in Neural Information Processing Systems, vol. 8, 1996.
T.-W. Lee, M. Girolami, A.J. Bell, and T.J. Sejnowski, “A unifying information-theoretic framework for independent component analysis,” International Journal on Mathematical and Computer Modeling, 1998.