
Page 1: Multivariate Analysis A Unified Perspective

Multivariate Analysis Harrison B. Prosper Durham, UK 2002 1

Multivariate Analysis: A Unified Perspective

Harrison B. Prosper, Florida State University

Advanced Statistical Techniques in Particle Physics

Durham, UK, 20 March 2002

Page 2: Multivariate Analysis A Unified Perspective


Outline

Introduction
Some Multivariate Methods:
  Fisher Linear Discriminant (FLD)
  Principal Component Analysis (PCA)
  Independent Component Analysis (ICA)
  Self Organizing Map (SOM)
  Random Grid Search (RGS)
  Probability Density Estimation (PDE)
  Artificial Neural Network (ANN)
  Support Vector Machine (SVM)
Comments
Summary

Page 3: Multivariate Analysis A Unified Perspective


Introduction – i

Multivariate analysis is hard! Our mathematical intuition, based on analysis in one dimension, often fails rather badly for spaces of very high dimension.

One should distinguish the problem to be solved from the algorithm to solve it.

Typically, the problems to be solved, when viewed with sufficient detachment, are relatively few in number whereas algorithms to solve them are invented every day.

Page 4: Multivariate Analysis A Unified Perspective


Introduction – ii

So why bother with multivariate analysis? Because:

The variables we use to describe events are usually statistically dependent.

Therefore, the N-d density of the variables contains more information than is contained in the set of 1-d marginal densities f_i(x_i).

This extra information may be useful.

Page 5: Multivariate Analysis A Unified Perspective


[Figure: DØ 1995 top discovery, tt̄ → ℓ + jets and dilepton channels]

Page 6: Multivariate Analysis A Unified Perspective


Introduction – iii

Problems that may benefit from multivariate analysis:
  Signal to background discrimination
  Variable selection (e.g., to give maximum signal/background discrimination)
  Dimensionality reduction of the feature space
  Finding regions of interest in the data
  Simplifying optimization (via a mapping f: U^N \to U^1)
  Model comparison
  Measuring stuff (e.g., tan β in SUSY)

Page 7: Multivariate Analysis A Unified Perspective


Fisher Linear Discriminant

Purpose Signal/background discrimination

D(x) = \log \frac{g(x \mid \mu_1, \Sigma_1)}{g(x \mid \mu_2, \Sigma_2)} = w \cdot x + b, \qquad w = \Sigma^{-1} (\mu_1 - \mu_2)

where g is a Gaussian and \Sigma_1 = \Sigma_2 = \Sigma.

Decision rule: w \cdot x + b > 0 → signal; w \cdot x + b < 0 → background.
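As an illustration of the formulas above, here is a minimal Fisher discriminant sketch in Python. The two Gaussian samples, their means, and their common covariance are invented for the example; w and b are computed directly from sample estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian classes sharing a covariance matrix (invented parameters).
n = 500
cov_true = [[1.0, 0.3], [0.3, 1.0]]
signal = rng.multivariate_normal([1.0, 1.0], cov_true, n)
background = rng.multivariate_normal([-1.0, -1.0], cov_true, n)

mu_s, mu_b = signal.mean(axis=0), background.mean(axis=0)
# Pooled within-class covariance estimate Sigma.
cov = 0.5 * (np.cov(signal.T) + np.cov(background.T))

# w = Sigma^{-1} (mu_1 - mu_2); b places the boundary midway between the means.
w = np.linalg.solve(cov, mu_s - mu_b)
b = -0.5 * w @ (mu_s + mu_b)

def fld(x):
    """w.x + b > 0 -> signal, < 0 -> background."""
    return x @ w + b

accuracy = ((fld(signal) > 0).mean() + (fld(background) < 0).mean()) / 2
```

For equal-covariance Gaussians this linear cut is exactly the log-likelihood-ratio boundary of the slide.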

Page 8: Multivariate Analysis A Unified Perspective


Principal Component Analysis

Purpose Reduce dimensionality of data

Project each point onto a unit vector w: d_i = w \cdot x_i.

1st principal axis:  w_1 = \arg\max_w \sum_{i=1}^{K} [d_i(w)]^2

2nd principal axis:  w_2 = \arg\max_w \sum_{i=1}^{K} [w \cdot (x_i - d_i(w_1)\, w_1)]^2

[Figure: scatter of points in the (x_1, x_2) plane showing the projections d_i and the two principal axes]

Page 9: Multivariate Analysis A Unified Perspective


PCA algorithm in practice

Transform from X = (x_1,..,x_N)^T to U = (u_1,..,u_N)^T in which the lowest order correlations are absent:
  Compute Cov(X)
  Compute its eigenvalues \lambda_i and eigenvectors v_i
  Construct the matrix T = Col(v_i)^T
  U = TX
Typically, one eliminates the u_i with the smallest amount of variation.
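The steps above can be sketched in a few lines of numpy; the correlated 2-d sample is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Correlated 2-d data (invented covariance).
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.2], [1.2, 1.0]], 1000)

C = np.cov(X.T)                        # Cov(X)
eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]      # sort principal axes by variance
T = eigvecs[:, order].T                # rows of T are the eigenvectors v_i^T
U = X @ T.T                            # U = T X, applied event by event

# The components of U are uncorrelated; the last ones carry the least
# variance and are the candidates for elimination.
var = U.var(axis=0)
```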

Page 10: Multivariate Analysis A Unified Perspective


Independent Component Analysis

Purpose
Find statistically independent variables. Dimensionality reduction.

Basic Idea
Assume X = (x_1,..,x_N)^T is a linear sum X = AS of independent sources S = (s_1,..,s_N)^T. Both A, the mixing matrix, and S are unknown.

Find a de-mixing matrix T such that the components of U = TX are statistically independent.

Page 11: Multivariate Analysis A Unified Perspective


ICA Algorithm

Given two densities f(U) and g(U), one measure of their "closeness" is the Kullback–Leibler divergence

K(f \mid g) = \int f(U) \log \frac{f(U)}{g(U)} \, dU \ge 0,

which is zero if, and only if, f(U) = g(U).

We set

g(U) = \prod_i f_i(u_i)

and minimize K(f \mid g) (now called the mutual information) with respect to the de-mixing matrix T.
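Minimizing the mutual information directly is awkward in practice. The sketch below, on an invented two-source mixture, uses a common stand-in: whiten X, then search rotations for maximally non-Gaussian (extreme-kurtosis) components. This is a proxy for the KL-based independence criterion above, not the minimization itself.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two independent non-Gaussian (uniform) sources, linearly mixed: X = A S.
S = rng.uniform(-1.0, 1.0, size=(2, 5000))
A = np.array([[1.0, 0.6], [0.4, 1.0]])   # "unknown" mixing matrix
X = A @ S

# Whiten X; afterwards any residual mixing is a pure rotation.
d, E = np.linalg.eigh(np.cov(X))
Z = np.diag(d ** -0.5) @ E.T @ X

def rotation(t):
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

def kurt(u):
    """Excess kurtosis; zero for a Gaussian, -1.2 for a uniform source."""
    u = u - u.mean()
    return (u ** 4).mean() / (u ** 2).mean() ** 2 - 3.0

# Scan rotations for maximally non-Gaussian components (independence proxy).
angles = np.linspace(0.0, np.pi / 2, 181)
best = max(angles, key=lambda t: sum(kurt(u) ** 2 for u in rotation(t) @ Z))
U = rotation(best) @ Z   # recovered sources, up to order, sign and scale
```

The recovered components should have kurtosis near the uniform value of -1.2, whereas a 45-degree mixture of the two sources would sit near -0.6.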

Page 12: Multivariate Analysis A Unified Perspective


Self Organizing Map

Purpose
Find regions of interest in data; that is, clusters. Summarize data.

Basic Idea (Kohonen, 1988)
Map each of K feature vectors X = (x_1,..,x_N)^T into one of M regions of interest defined by the vectors w_m, so that all X mapped to a given w_m are closer to it than to all the remaining w_m.

Basically, perform a coarse-graining of the feature space.
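A minimal winner-take-all sketch of this coarse-graining on invented clustered data. It omits the lattice-neighbourhood updates of a full Kohonen map, so it is closer to plain vector quantization; the cluster layout and learning rate are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Feature vectors drawn around three invented cluster centres.
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.concatenate([c + rng.normal(0.0, 0.3, (200, 2)) for c in centres])
rng.shuffle(X)

M = 3                                                 # regions of interest
w = X[rng.choice(len(X), M, replace=False)].copy()    # reference vectors w_m

def nearest(x):
    """Index of the reference vector closest to x."""
    return np.argmin(((w - x) ** 2).sum(axis=1))

qe_init = np.mean([((w[nearest(x)] - x) ** 2).sum() for x in X])

eta = 0.1                                             # learning rate
for epoch in range(20):
    for x in X:
        m = nearest(x)                                # winning node
        w[m] += eta * (x - w[m])                      # move winner toward x

# Each X is now summarised by its nearest w_m; the quantization error
# measures how faithful the coarse-graining is.
qe_final = np.mean([((w[nearest(x)] - x) ** 2).sum() for x in X])
```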

Page 13: Multivariate Analysis A Unified Perspective


Grid Search

Purpose: Signal/background discrimination

Apply cuts at each grid point: x > x_i, y > y_i. We refer to (x_i, y_i) as a cut-point.

Number of cut-points ~ N_bin^{N_dim}

[Figure: grid of cut-points in the (x, y) plane]

Page 14: Multivariate Analysis A Unified Perspective


Random Grid Search

Take each point (x_i, y_i) of the signal class as a cut-point, applying the cuts x > x_i, y > y_i.

N_tot = # events before cuts
N_cut = # events after cuts
Fraction = N_cut / N_tot

[Figure: signal fraction versus background fraction, each running from 0 to 1, one point per cut-point]

H.B.P. et al, Proceedings, CHEP 1995
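A sketch of the random grid search on invented Gaussian samples: each signal event serves as a cut-point, and the resulting signal and background fractions are recorded. The 5% background working point at the end is an arbitrary choice for the example.

```python
import numpy as np

rng = np.random.default_rng(4)

# Invented samples: signal sits at larger (x, y) than background.
signal = rng.normal(2.0, 1.0, (500, 2))
background = rng.normal(0.0, 1.0, (500, 2))

fractions = []
for xi, yi in signal:                  # each signal event is a cut-point
    pass_s = (signal[:, 0] > xi) & (signal[:, 1] > yi)
    pass_b = (background[:, 0] > xi) & (background[:, 1] > yi)
    # Fraction = Ncut / Ntot for each class.
    fractions.append((pass_s.mean(), pass_b.mean()))

# e.g. best signal fraction among cut-points keeping < 5% of the background.
best_signal = max((s for s, b in fractions if b < 0.05), default=0.0)
```

Scanning only cut-points drawn from the signal sample keeps the search cost linear in the sample size instead of exponential in the number of dimensions.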

Page 15: Multivariate Analysis A Unified Perspective


Probability Density Estimation

Purpose
Signal/background discrimination. Parameter estimation.

Basic Idea
Parzen Estimation (1960s):

p(x) = \frac{1}{N h^d} \sum_{n=1}^{N} K\left( \frac{x - x_n}{h} \right)

Mixtures:

p(x) = \sum_j p(x \mid j) \, q(j)
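A one-dimensional Parzen estimate with a Gaussian kernel, checked against the standard normal density it should approximately recover; the sample size and bandwidth h are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(0.0, 1.0, 2000)   # sample from a standard Gaussian

def parzen(x, sample, h):
    """p(x) ~ (1 / N h^d) sum_n K((x - x_n)/h), Gaussian kernel, d = 1."""
    u = (np.atleast_1d(x)[:, None] - sample[None, :]) / h
    return (np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)).mean(axis=1) / h

xs = np.array([0.0, 1.0, 2.0])
estimate = parzen(xs, data, h=0.3)
truth = np.exp(-0.5 * xs ** 2) / np.sqrt(2.0 * np.pi)   # N(0,1) density
```

The bandwidth h trades bias (too large: over-smoothed) against variance (too small: noisy), which is the usual tuning problem for kernel estimates.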

Page 16: Multivariate Analysis A Unified Perspective


Artificial Neural Networks

Purpose
Signal/background discrimination. Parameter estimation. Function estimation. Density estimation.

Basic Idea
Encode the mapping f: U^N \to U^M, f(x) = [F_1, .., F_M], using a set of 1-d functions (Kolmogorov, 1950s).

Page 17: Multivariate Analysis A Unified Perspective


Feedforward Networks

n(x, w) = f\left( \sum_{i=1}^{5} w_i f(a_i) \right), \qquad a_i = \sum_{j=1}^{2} w_{ij} x_j

[Figure: network diagram with input nodes x_1, x_2, hidden nodes (weights w_{ij}), an output node (weights w_i), and the sigmoid activation f(a)]

Page 18: Multivariate Analysis A Unified Perspective


ANN Algorithm

Minimize the empirical risk function with respect to the weights w:

R(w) = \frac{1}{N} \sum_i [t_i - n(x_i, w)]^2

Solution (for large N):

n(x, w) = \int t \, p(t \mid x) \, dt

If t(x) = I(x), where I(x) = 1 if x is of class k and 0 otherwise, then

n(x, w) = p(k \mid x) = p(x \mid k) \, p(k) \Big/ \sum_k p(x \mid k) \, p(k)

D.W. Ruck et al., IEEE Trans. Neural Networks 1(4), 296–298 (1990)
E.A. Wan, IEEE Trans. Neural Networks 1(4), 303–305 (1990)
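A small numpy sketch of the above: a network with a few tanh hidden nodes trained by gradient descent on the quadratic empirical risk with 0/1 targets, so that its output approximates the posterior p(S|x). The architecture, sample, learning rate, and iteration count are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(6)

# Invented 1-d two-class sample: p(x|S) = N(+1,1), p(x|B) = N(-1,1),
# equal priors, targets t = 1 (S) and t = 0 (B).
x = np.concatenate([rng.normal(1.0, 1.0, 500), rng.normal(-1.0, 1.0, 500)])
t = np.concatenate([np.ones(500), np.zeros(500)])

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# One hidden layer of 3 tanh nodes, one sigmoid output node.
W1 = rng.normal(0.0, 0.5, 3); b1 = np.zeros(3)
W2 = rng.normal(0.0, 0.5, 3); b2 = 0.0

def forward(xs):
    h = np.tanh(np.outer(xs, W1) + b1)    # hidden activations
    return sigmoid(h @ W2 + b2), h

for _ in range(3000):  # gradient descent on the empirical risk R(w)
    n, h = forward(x)
    d_out = -2.0 * (t - n) * n * (1 - n) / len(x)   # dR/d(output pre-act.)
    d_hid = np.outer(d_out, W2) * (1 - h ** 2)      # back-propagated
    W2 -= 0.5 * (h.T @ d_out); b2 -= 0.5 * d_out.sum()
    W1 -= 0.5 * (x @ d_hid);   b1 -= 0.5 * d_hid.sum(axis=0)

risk = ((t - forward(x)[0]) ** 2).mean()
# For large N the trained output should track p(S|x) = sigmoid(2x).
out, _ = forward(np.array([-2.0, 0.0, 2.0]))
```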

Page 19: Multivariate Analysis A Unified Perspective


Support Vector Machines

Purpose
Signal/background discrimination

Basic Idea
Data that are non-separable in N dimensions have a higher chance of being separable if mapped into a space of higher dimension.

Use a linear discriminant to partition the high-dimensional feature space:

D(x) = w \cdot \Phi(x) + b, \qquad \dim \Phi: huge!

Page 20: Multivariate Analysis A Unified Perspective


SVM – Kernel Trick
Or how to cope with a possibly infinite number of parameters!

\Phi: (x_1, x_2) \to (z_1, z_2, z_3)

[Figure: two classes, y = -1 and y = +1, non-separable in the (x_1, x_2) plane but separable in the (z_1, z_2, z_3) space]

D(x) = w \cdot \Phi(x) + b = \sum_j \alpha_j y_j [\Phi(x) \cdot \Phi(x_j)] + b

K(x, x_j) = \Phi(x) \cdot \Phi(x_j)

Try different kernels K because the mapping \Phi is unknown!
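The kernel trick can be checked numerically for the quadratic kernel K(x, y) = (x · y)^2, whose explicit map Φ(x_1, x_2) = (x_1², √2 x_1 x_2, x_2²) matches the (x_1, x_2) → (z_1, z_2, z_3) picture above; the test points are arbitrary.

```python
import numpy as np

def phi(x):
    """Explicit map (x1, x2) -> (z1, z2, z3) for the quadratic kernel."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def K(x, y):
    """Same inner product, computed without ever leaving the (x1, x2) space."""
    return float(x @ y) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

lhs = float(phi(a) @ phi(b))   # inner product after the explicit mapping
rhs = K(a, b)                  # kernel evaluation in the original space
```

Because D(x) only ever uses Φ through such inner products, the high-dimensional (even infinite-dimensional) space never has to be constructed explicitly.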

Page 21: Multivariate Analysis A Unified Perspective


Comments – i

Every classification task tries to solve the same fundamental problem: after adequately pre-processing the data, find a good, and practical, approximation to the Bayes decision rule: given X, if P(S|X) > P(B|X), choose hypothesis S; otherwise choose B.

If we knew the densities p(X|S) and p(X|B) and the priors p(S) and p(B), we could compute the Bayes Discriminant Function (BDF): D(X) = P(S|X)/P(B|X).
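When the densities and priors really are known, the BDF is a one-liner. Here is a sketch with invented unit-variance Gaussian densities at ±1 and equal priors, for which D(x) = exp(2x) exactly.

```python
import numpy as np

# Illustrative known densities: p(x|S) = N(+1, 1), p(x|B) = N(-1, 1).
def norm_pdf(x, mu):
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

def bdf(x, p_s=0.5):
    """Bayes discriminant D(x) = P(S|x) / P(B|x)."""
    return (norm_pdf(x, 1.0) * p_s) / (norm_pdf(x, -1.0) * (1.0 - p_s))

# Bayes decision rule: choose S wherever D(x) > 1, i.e. P(S|x) > P(B|x).
choose_S = bdf(1.0) > 1.0
```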

Page 22: Multivariate Analysis A Unified Perspective


Comments – ii

The Fisher discriminant (FLD), random grid search (RGS), probability density estimation (PDE), neural network (ANN) and support vector machine (SVM) are simply different algorithms to approximate the Bayes discriminant function D(X), or a function thereof.

It follows, therefore, that if a method is already close to the Bayes limit, then no other method, however sophisticated, can be expected to yield dramatic improvements.

Page 23: Multivariate Analysis A Unified Perspective


Summary

Multivariate analysis is hard, but useful if it is important to extract as much information from the data as possible.

For classification problems, the common methods provide different approximations to the Bayes discriminant.

There is considerable empirical evidence that, as yet, no uniformly most powerful method exists. Therefore, be wary of claims to the contrary!