Projection Pursuit (PP)
PCA and FDA are linear; PP may be linear or non-linear.
Find an interesting “criterion of fit”, or “figure of merit”, function that allows for a low-dimensional (usually 2D or 3D) projection.
Interesting indices may use a priori knowledge about the problem:
1. mean nearest-neighbor distance – increase clustering of Y(j);
2. maximize mutual information between classes and features;
3. find projections that have non-Gaussian distributions.
The last index does not use a priori knowledge; it leads to Independent Component Analysis (ICA). ICA features are not only uncorrelated but also independent.
$\mathbf{Y} = \mathbf{W}^{T}\mathbf{X}$, or more generally $Y^{(j)} = f_j(\mathbf{X}; \mathbf{W})$, $j = 1, 2, \ldots$ – a general transformation with parameters $\mathbf{W}$.

$I(\mathbf{Y}; \mathbf{W})$ – index of “interestingness”, evaluated on the projected data $f(\mathbf{X}; \mathbf{W})$.
Kurtosis
ICA is a special version of PP, recently very popular.
Gaussian distributions of a variable Y are characterized by 2 parameters:

mean value: $\bar{Y} = E\{Y\}$

variance: $\sigma^2(Y) = E\{(Y - E\{Y\})^2\}$

These are the first 2 moments of the distribution; all higher cumulants are 0 for G(Y).

One simple measure of non-Gaussianity of projections is the 4th-order cumulant of the distribution, called kurtosis, which measures how peaked or flat the distribution is relative to a Gaussian. For $E\{Y\} = 0$ the kurtosis is:

$\kappa_4(Y) = E\{Y^4\} - 3\left(E\{Y^2\}\right)^2$

Super-Gaussian distributions have a long tail and a peak at zero, $\kappa_4(y) > 0$, like binary image data; sub-Gaussian distributions are flatter and have $\kappa_4(y) < 0$, like speech signal data.
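A minimal numerical check of these kurtosis signs, as a sketch assuming NumPy (the kurtosis function below implements the $\kappa_4$ formula above):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def kurtosis(y):
    """Fourth cumulant: kappa_4(Y) = E{Y^4} - 3 (E{Y^2})^2, after centering."""
    y = y - y.mean()
    return np.mean(y**4) - 3 * np.mean(y**2)**2

print(kurtosis(rng.normal(size=n)))           # Gaussian: ~0
print(kurtosis(rng.laplace(size=n)))          # super-Gaussian (peaked, long tails): > 0
print(kurtosis(rng.uniform(-1, 1, size=n)))   # sub-Gaussian (flat): < 0
```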
Correlation and independence

Features $Y_i$, $Y_j$ are uncorrelated if the covariance matrix is diagonal, or:

$E\{Y_i Y_j\} = E\{Y_i\}\, E\{Y_j\}$

Uncorrelated features are orthogonal.

Variables are statistically independent if their joint probability distribution is a product of the probabilities for all variables:

$p(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} p_i(X_i)$

Statistically independent features $Y_i$, $Y_j$ satisfy, for any functions $f_1$, $f_2$:

$E\{f_1(Y_i)\, f_2(Y_j)\} = E\{f_1(Y_i)\}\, E\{f_2(Y_j)\}$

This is a much stronger condition than lack of correlation; in particular, the functions may be powers of the variables. Any non-Gaussian distribution will still have statistically dependent features after a PCA transformation.
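A small sketch of the difference, assuming NumPy: $Y = X^2$ is fully determined by $X$, yet the pair is uncorrelated; choosing $f_1(X) = X^2$ exposes the dependence:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1_000_000)
y = x**2                      # deterministically dependent on x

# Uncorrelated: E{XY} ~ E{X} E{Y} (both ~0 here)
print(np.mean(x * y), np.mean(x) * np.mean(y))

# Not independent: with f1(X) = X^2, f2(Y) = Y the product rule fails (~3 vs ~1)
print(np.mean(x**2 * y), np.mean(x**2) * np.mean(y))
```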
PP/ICA example
Example: PCA and PP based on maximal kurtosis; note the nice separation of the blue class.
Some remarks
• Many formulations of PP and ICA methods exist.
• PP is used for data visualization and dimensionality reduction.
• Nonlinear projections are frequently considered, but solutions are more numerically intensive.
• PCA may also be viewed as PP, maximizing (for standardized data):

$\mathbf{W}^{(1)} = \arg\max_{\mathbf{W}} E\left\{\left(\mathbf{W}^{T}\mathbf{X}\right)^{2}\right\}$

Other components are found in the space orthogonal to $\mathbf{W}^{(1)T}\mathbf{X}$:

$\mathbf{W}^{(k)} = \arg\max_{\mathbf{W}} E\left\{\left[\mathbf{W}^{T}\left(\mathbf{I} - \sum_{i=1}^{k-1} \mathbf{W}^{(i)}\mathbf{W}^{(i)T}\right)\mathbf{X}\right]^{2}\right\}$

The same index is used, with projection on the space orthogonal to the first k−1 PCs; the index I(Y;W) is based here on maximum variance.
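A numerical sketch of PCA-as-PP, assuming NumPy: power iteration maximizes the variance index $E\{(\mathbf{W}^T\mathbf{X})^2\}$ over unit vectors, and the result matches the first eigenvector of the covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))  # correlated features
X -= X.mean(axis=0)                                      # standardized (zero-mean) data

C = X.T @ X / len(X)          # covariance: E{(W^T X)^2} = W^T C W

# Power iteration converges to argmax_{||W||=1} W^T C W, i.e. the first PC
w = rng.normal(size=5)
for _ in range(200):
    w = C @ w
    w /= np.linalg.norm(w)

pc1 = np.linalg.eigh(C)[1][:, -1]   # eigenvector of the largest eigenvalue
print(abs(w @ pc1))                 # ~1.0: same direction up to sign
```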
How do we find multiple projections?
• The statistical approach is complicated: perform a transformation on the data to eliminate structure in the already-found direction, then perform PP again (see the deflation sketch after this list).
• Neural computation approach: lateral inhibition.
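A sketch of the structure-removal (deflation) step in the statistical approach, assuming NumPy; `deflate` is a hypothetical helper name:

```python
import numpy as np

def deflate(X, w):
    """Eliminate structure along direction w: project the data onto the
    orthogonal complement, so the next PP run cannot rediscover w."""
    w = w / np.linalg.norm(w)
    return X - np.outer(X @ w, w)

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))
w = rng.normal(size=4)
Xd = deflate(X, w)
# After deflation every sample has zero projection on w:
print(np.allclose(Xd @ (w / np.linalg.norm(w)), 0.0))   # True
```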
High-dimensional data → dimension reduction → feature extraction / visualisation → classification / analysis
Projection Pursuit
What: an automated procedure that seeks interesting low-dimensional projections of a high-dimensional cloud by numerically maximizing an objective function or projection index (Huber, 1985).
Projection Pursuit
Why: the curse of dimensionality brings
• less robustness,
• worse mean squared error,
• greater computational cost,
• slower convergence to limiting distributions,
• …
• and the required number of labelled samples increases with dimensionality.
What is an interesting projection?
In general: a projection that reveals more information about the structure of the data.
In pattern recognition: a projection that maximises class separability in a low-dimensional subspace.
Projection Pursuit
Dimensional reduction: find lower-dimensional projections of a high-dimensional point cloud to facilitate classification.
Exploratory projection pursuit: reduce the dimension of the problem to facilitate visualization.
Projection Pursuit
How many dimensions to use?
• for visualization
• for classification/analysis

Which projection index to use? For example (a negentropy sketch follows below):
• measure of variation (principal components)
• departure from normality (negative entropy)
• class separability (distance, Bhattacharyya, Mahalanobis, ...)
• …
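As one concrete departure-from-normality index, here is a sketch of Hyvärinen's one-unit negentropy approximation $J(y) \approx \left(E\{G(y)\} - E\{G(\nu)\}\right)^2$ with $G(u) = \log\cosh(u)$ and $\nu$ a standard Gaussian, assuming NumPy; the Monte Carlo reference term and the function name are implementation choices here:

```python
import numpy as np

def negentropy_index(y, n_ref=100_000, seed=4):
    """Approximate negentropy J(y) ~ (E{G(y)} - E{G(nu)})^2,
    G(u) = log cosh(u), for standardized y; nu ~ N(0, 1)."""
    y = (y - y.mean()) / y.std()
    nu = np.random.default_rng(seed).normal(size=n_ref)   # Gaussian reference
    G = lambda u: np.log(np.cosh(u))
    return (G(y).mean() - G(nu).mean())**2

rng = np.random.default_rng(5)
print(negentropy_index(rng.normal(size=50_000)))    # ~0 for a Gaussian
print(negentropy_index(rng.laplace(size=50_000)))   # > 0 for non-Gaussian data
```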
Projection Pursuit
Which optimization method to choose? We are trying to find the global optimum among local ones:
• hill-climbing methods (simulated annealing),
• regular optimization routines with random starting points (sketched below).
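A sketch of the second strategy, assuming NumPy and SciPy: a general-purpose optimizer is restarted from random points to maximize $|\kappa_4|$ of the projection, and the best local optimum is kept. The Laplace source planted in the toy data is the global optimum:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
# Toy data: one super-Gaussian (Laplace) feature among three Gaussian ones
X = np.column_stack([rng.laplace(size=2000), rng.normal(size=(2000, 3))])
X -= X.mean(axis=0)

def neg_index(w):
    """Negative |kurtosis| of the projection Y = w^T x (w normalized to unit length)."""
    w = w / np.linalg.norm(w)
    y = X @ w
    y = y / y.std()
    return -abs(np.mean(y**4) - 3.0)

# Random starting points; keep the best local optimum found
best = min((minimize(neg_index, rng.normal(size=4)) for _ in range(10)),
           key=lambda r: r.fun)
w = best.x / np.linalg.norm(best.x)
print(np.round(w, 2))   # ~(+-1, 0, 0, 0): picks out the Laplace source
```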
Timetable for dimensionality reduction
• Begin: 16 April 1998
• Report on the state of the art: 1 June 1998
• Begin software implementation: 15 June 1998
• Prototype software presentation: 1 November 1998
ICA demos
• ICA has many applications in signal and image analysis.
• Finding independent signal sources allows for separation of signals from different sources and removal of noise or artifacts.

Observations X are a linear mixture W of unknown sources Y:

$\mathbf{X} = \mathbf{W}^{T}\mathbf{Y}$

Both W and Y are unknown! This is a blind source separation problem. How can they be found? If Y are independent components and W is a linear mixing matrix, the problem is similar to FDA or PCA; only the criterion function is different (see the sketch below).

Play with the ICALab PCA/ICA Matlab software for signal/image analysis: http://www.bsp.brain.riken.go.jp/page7.html
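A minimal blind-separation sketch using scikit-learn's FastICA (an assumption on the toolkit; the lecture itself points to ICALab instead):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(7)
t = np.linspace(0, 8, 2000)
Y = np.column_stack([np.sin(2 * t),               # source 1: sinusoid
                     np.sign(np.sin(3 * t)),      # source 2: square wave
                     rng.laplace(size=t.size)])   # source 3: noise
W = rng.normal(size=(3, 3))                       # unknown mixing matrix
X = Y @ W.T                                       # observed mixtures only

ica = FastICA(n_components=3, random_state=0)
Y_est = ica.fit_transform(X)   # recovered sources, up to order, sign, and scale
```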
ICA demo: images & audio
Example from Cichocki's lab: http://www.bsp.brain.riken.go.jp/page7.html

X space for images: take the intensities of all pixels, one vector per image, or take smaller patches (e.g. 64×64), increasing the number of vectors.
• 5 images: originals, mixed, convergence of ICA iterations.

X space for signals: sample the signal for some time t.
• 10 songs: mixed samples and separated samples.
Self-organization
PCA, FDA, ICA, and PP are all inspired by statistics, although some neurally inspired methods have been proposed to find interesting solutions, especially for their non-linear versions.
• Brains learn to discover the structure of signals: visual, tactile, olfactory, auditory (speech and sounds).
• This is a good example of unsupervised learning: spontaneous development of feature detectors, compressing internal information that is needed to model environmental states (inputs).
• Some simple stimuli lead to complex behavioral patterns in animals; brains use specialized microcircuits to derive vital information from signals – for example, amygdala nuclei in rats are sensitive to ultrasound signals signifying “cat around”.
Models of self-organization
SOM or SOFM (Self-Organizing Feature Map) – one of the simplest models.
How can such maps develop spontaneously?
Local neural connections: neurons interact strongly with those nearby, but weakly with those that are far away (in addition inhibiting some intermediate neurons).

History:
• von der Malsburg and Willshaw (1976) – competitive learning, Hebb mechanisms, “Mexican hat” interactions, models of visual systems.
• Amari (1980) – models of continuous neural tissue.
• Kohonen (1981) – simplification, no inhibition; leaving two essential factors: competition and cooperation.
Computational Intelligence: Methods and Applications
Lecture 8: Projection Pursuit & Independent Component Analysis
Włodzisław Duch, SCE, NTU, Singapore
Google: Duch