
Machine Learning for Data Mining: Feature Generation

Andres Mendez-Vazquez

July 16, 2015


Outline

1 Introduction: What do we want?

2 Fisher Linear Discriminant: The Rotation Idea, Solution, Scatter measure, The Cost Function

3 Principal Component Analysis


What do we want?

What: Given a set of measurements, the goal is to discover compact and informative representations of the obtained data.

Our Approach: We want to “squeeze” the data into a relatively small number of features, leading to a reduction of the necessary feature-space dimension.

Properties: This removes information redundancies, which are usually produced by the measurement process itself.


What Methods we will see?

Fisher Linear Discriminant
1 Squeezing to the maximum.
2 From many dimensions to one.

Principal Component Analysis
1 Not so much squeezing.
2 You are willing to lose some information.

Singular Value Decomposition
1 Decompose an m × n data matrix A into A = U S V^T, where U and V are orthonormal matrices and S contains the singular values.
2 You can read more about it in “Singular Value Decomposition - Dimensionality Reduction”.
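As a quick aside, a minimal NumPy sketch of the decomposition A = U S V^T; the toy matrix and variable names are ours, purely for illustration:

```python
import numpy as np

# Toy data matrix: m = 4 samples, n = 3 features (made-up values).
A = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 1.0],
              [4.0, 2.0, 2.0]])

# Thin SVD: U (4x3) and Vt (3x3) have orthonormal columns/rows,
# s holds the singular values in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-2 reconstruction keeps only the two largest singular values,
# i.e., a dimensionality-reduced approximation of A.
A2 = U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]
print(np.allclose(A, U @ np.diag(s) @ Vt))  # exact reconstruction: True
```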



Rotation

Projecting: Projecting well-separated samples onto an arbitrary line usually produces a confused mixture of samples from all of the classes, and thus produces poor recognition performance.

Something Notable: However, moving and rotating the line around might result in an orientation for which the projected samples are well separated.

Fisher Linear Discriminant (FLD): A discriminant analysis that seeks directions which are efficient for discrimination in a binary classification problem.


Example

Example - From Left to Right the Improvement


This actually comes from...

Classifier as: A machine for dimensionality reduction.

Initial Setup: We have:

N d-dimensional samples x_1, x_2, ..., x_N

N_i is the number of samples in class C_i, for i = 1, 2.

Then, we ask for the projection of each x_i onto the line by means of

y_i = w^T x_i   (1)
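A minimal sketch of the projection in Equation (1), assuming two made-up Gaussian classes (all names here, X1, X2, w, are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 1.0, size=(50, 2))   # class C1 samples (made up)
X2 = rng.normal([3.0, 1.0], 1.0, size=(50, 2))   # class C2 samples (made up)

# An arbitrary direction w (not yet the Fisher direction).
w = np.array([1.0, 0.5])

# Equation (1): y_i = w^T x_i turns every 2-D sample into one scalar.
y1 = X1 @ w
y2 = X2 @ w
print(y1.shape, y2.shape)  # (50,) (50,)
```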



Use the mean of each Class

Then: Select w such that class separation is maximized.

We then define the mean sample for each class:

1 C_1 ⇒ m_1 = (1/N_1) ∑_{i=1}^{N_1} x_i
2 C_2 ⇒ m_2 = (1/N_2) ∑_{i=1}^{N_2} x_i

Ok!!! This is giving us a measure of distance. Thus, we want to maximize the distance between the projected means:

m̃_1 − m̃_2 = w^T (m_1 − m_2)   (2)

where m̃_k = w^T m_k for k = 1, 2.
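Continuing in the same spirit, a small sketch (made-up data, our names) that computes the class means and checks Equation (2):

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 1.0, size=(50, 2))   # class C1 (made-up data)
X2 = rng.normal([3.0, 1.0], 1.0, size=(50, 2))   # class C2 (made-up data)
w = np.array([1.0, 0.5])                         # some projection direction

# Class means m_1, m_2 and projected means m~_k = w^T m_k.
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
m1_t, m2_t = w @ m1, w @ m2

# Equation (2): the separation of the projected means equals w^T (m_1 - m_2).
print(np.isclose(m1_t - m2_t, w @ (m1 - m2)))  # True
```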


However

We could simply seek

max_w  w^T (m_1 − m_2)
s.t.  ∑_{i=1}^{d} w_i = 1

After all: We do not care about the magnitude of w.


Example

Here, we have the problem


Fixing the Problem

To obtain good separation of the projected data: The difference between the means should be large relative to some measure of the standard deviation of each class.

We define a SCATTER measure (based on the sample variance):

s̃_k² = ∑_{x_i ∈ C_k} (w^T x_i − m̃_k)² = ∑_{y_i = w^T x_i ∈ C_k} (y_i − m̃_k)²   (3)

We then define the within-class variance for the whole data set:

s̃_1² + s̃_2²   (4)



Finally, a Cost Function

The between-class variance:

(m̃_1 − m̃_2)²   (5)

The Fisher criterion:

between-class variance / within-class variance   (6)

Finally:

J(w) = (m̃_1 − m̃_2)² / (s̃_1² + s̃_2²)   (7)
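A small sketch that evaluates the scatters of Equation (3) and the criterion of Equation (7) for a candidate w; the helper fisher_criterion and the data are ours, for illustration only:

```python
import numpy as np

def fisher_criterion(X1, X2, w):
    """J(w) = (m~1 - m~2)^2 / (s~1^2 + s~2^2), Equations (3), (4) and (7)."""
    y1, y2 = X1 @ w, X2 @ w                 # projected samples y_i = w^T x_i
    m1_t, m2_t = y1.mean(), y2.mean()       # projected means m~_k
    s1_sq = np.sum((y1 - m1_t) ** 2)        # scatter s~_1^2
    s2_sq = np.sum((y2 - m2_t) ** 2)        # scatter s~_2^2
    return (m1_t - m2_t) ** 2 / (s1_sq + s2_sq)

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 1.0, size=(50, 2))   # class C1 (made-up data)
X2 = rng.normal([3.0, 1.0], 1.0, size=(50, 2))   # class C2 (made-up data)
print(fisher_criterion(X1, X2, np.array([1.0, 0.0])))  # J for one candidate w
print(fisher_criterion(X1, X2, np.array([1.0, 1.0])))  # J for another
```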


We use a transformation to simplify our life

First:

J(w) = (w^T m_1 − w^T m_2)² / [ ∑_{y_i = w^T x_i ∈ C_1} (y_i − m̃_1)² + ∑_{y_i = w^T x_i ∈ C_2} (y_i − m̃_2)² ]   (8)

Second:

J(w) = (w^T m_1 − w^T m_2)(w^T m_1 − w^T m_2)^T / [ ∑_{x_i ∈ C_1} (w^T x_i − m̃_1)(w^T x_i − m̃_1)^T + ∑_{x_i ∈ C_2} (w^T x_i − m̃_2)(w^T x_i − m̃_2)^T ]   (9)

Third:

J(w) = w^T (m_1 − m_2) (w^T (m_1 − m_2))^T / [ ∑_{x_i ∈ C_1} w^T (x_i − m_1)(w^T (x_i − m_1))^T + ∑_{x_i ∈ C_2} w^T (x_i − m_2)(w^T (x_i − m_2))^T ]   (10)


Transformation

Fourth:

J(w) = w^T (m_1 − m_2)(m_1 − m_2)^T w / [ ∑_{x_i ∈ C_1} w^T (x_i − m_1)(x_i − m_1)^T w + ∑_{x_i ∈ C_2} w^T (x_i − m_2)(x_i − m_2)^T w ]   (11)

Fifth:

J(w) = w^T (m_1 − m_2)(m_1 − m_2)^T w / ( w^T [ ∑_{x_i ∈ C_1} (x_i − m_1)(x_i − m_1)^T + ∑_{x_i ∈ C_2} (x_i − m_2)(x_i − m_2)^T ] w )   (12)

Now Rename:

J(w) = (w^T S_B w) / (w^T S_W w)   (13)

where S_B is the between-class scatter matrix (the numerator term of Equation (12)) and S_W is the within-class scatter matrix (the bracketed sum in the denominator).
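A sketch of Equation (13) in matrix form, building S_B and S_W from made-up class samples (the helper scatter_matrices is our name):

```python
import numpy as np

def scatter_matrices(X1, X2):
    """Between-class S_B and within-class S_W, as renamed in Equation (13)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    d = (m1 - m2).reshape(-1, 1)
    S_B = d @ d.T                                            # (m1 - m2)(m1 - m2)^T
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)  # within-class scatter
    return S_B, S_W

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 1.0, size=(50, 2))   # class C1 (made-up data)
X2 = rng.normal([3.0, 1.0], 1.0, size=(50, 2))   # class C2 (made-up data)
S_B, S_W = scatter_matrices(X1, X2)

w = np.array([1.0, 1.0])
J = (w @ S_B @ w) / (w @ S_W @ w)   # Equation (13)
print(J)
```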


Differentiate with respect to w

Thus:

dJ(w)/dw = d[ (w^T S_B w)(w^T S_W w)^{-1} ]/dw = 0   (14)

Then:

dJ(w)/dw = (S_B w + S_B^T w)(w^T S_W w)^{-1} − (w^T S_B w)(w^T S_W w)^{-2}(S_W w + S_W^T w) = 0   (15)

Now, because of the symmetry of S_B and S_W:

dJ(w)/dw = [ S_B w (w^T S_W w) − (w^T S_B w) S_W w ] / (w^T S_W w)² = 0   (16)


Differentiate with respect to w

Thus:

dJ(w)/dw = [ S_B w (w^T S_W w) − (w^T S_B w) S_W w ] / (w^T S_W w)² = 0   (17)

Then:

(w^T S_W w) S_B w = (w^T S_B w) S_W w   (18)
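Equation (18) is a generalized eigenvalue problem. A hedged numerical check (ours, not from the slides): the leading eigenvector of S_W^{-1} S_B lines up with S_W^{-1}(m_1 − m_2), anticipating Equation (20):

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 1.0, size=(50, 2))   # class C1 (made-up data)
X2 = rng.normal([3.0, 1.0], 1.0, size=(50, 2))   # class C2 (made-up data)
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
d = (m1 - m2).reshape(-1, 1)
S_B = d @ d.T
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Leading eigenvector of S_W^{-1} S_B solves (w^T S_W w) S_B w = (w^T S_B w) S_W w.
vals, vecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
w = np.real(vecs[:, np.argmax(np.real(vals))])

# It points along S_W^{-1}(m1 - m2): the two vectors are parallel.
w_closed = np.linalg.inv(S_W) @ (m1 - m2)
print(np.isclose(abs(w @ w_closed), np.linalg.norm(w) * np.linalg.norm(w_closed)))
```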


Now, Several Tricks!!!

First:

S_B w = (m_1 − m_2)(m_1 − m_2)^T w = α (m_1 − m_2)   (19)

where α = (m_1 − m_2)^T w is a simple scalar constant. It means that S_B w always points in the direction of m_1 − m_2!!!

In addition: w^T S_W w and w^T S_B w are scalar constants.


Now, Several Tricks!!!

Finally:

S_W w ∝ (m_1 − m_2) ⇒ w ∝ S_W^{-1} (m_1 − m_2)   (20)

Once the data is transformed into y_i:

Use a threshold y_0 ⇒ x ∈ C_1 iff y(x) ≥ y_0, or x ∈ C_2 iff y(x) < y_0.

Or maximum likelihood with a Gaussian can be used to classify the new transformed data using Naive Bayes (by the Central Limit Theorem, y = w^T x is a sum of random variables and is approximately Gaussian).
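Putting Equation (20) and the thresholding rule together, a minimal end-to-end sketch on made-up data; the midpoint threshold below is just one simple choice, not something the slides prescribe:

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 1.0, size=(50, 2))   # class C1 (made-up data)
X2 = rng.normal([3.0, 1.0], 1.0, size=(50, 2))   # class C2 (made-up data)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Equation (20): w proportional to S_W^{-1} (m1 - m2).
w = np.linalg.solve(S_W, m1 - m2)

# One simple threshold y0: the midpoint of the projected class means.
y0 = 0.5 * (w @ m1 + w @ m2)

def classify(x):
    """x in C1 iff y(x) = w^T x >= y0, else C2 (w points from m2 toward m1)."""
    return 1 if w @ x >= y0 else 2

print(classify(np.array([0.2, -0.1])), classify(np.array([3.1, 0.8])))  # expected: 1 2
```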


Please

Your reading material on the multiclass case: Section 4.1.6, “Fisher’s discriminant for multiple classes,” in “Pattern Recognition and Machine Learning” by Bishop.


Principal Component Analysis: Also Known as the Karhunen-Loève Transform

Setup: Consider a data set of observations {x_n} with n = 1, 2, ..., N and x_n ∈ R^d.

Goal: Project the data onto a space with dimensionality m < d (we assume m is given).


Use the 1-Dimensional Variance

Remember the sample variance:

VAR(X) = [ ∑_{i=1}^{N} (x_i − x̄)(x_i − x̄) ] / (N − 1)   (21)

You can do the same in the case of two variables X and Y:

COV(X, Y) = [ ∑_{i=1}^{N} (x_i − x̄)(y_i − ȳ) ] / (N − 1)   (22)
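A quick sketch of Equations (21) and (22) on made-up samples, checked against NumPy's own estimators:

```python
import numpy as np

x = np.array([2.1, 0.5, 3.3, 1.8, 2.9])   # made-up samples of X
y = np.array([1.0, 0.2, 2.7, 1.1, 2.4])   # made-up samples of Y
N = len(x)

var_x = np.sum((x - x.mean()) * (x - x.mean())) / (N - 1)   # Equation (21)
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (N - 1)  # Equation (22)

# np.var with ddof=1 and np.cov use the same (N - 1) normalization.
print(np.isclose(var_x, np.var(x, ddof=1)), np.isclose(cov_xy, np.cov(x, y)[0, 1]))
```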


Now, Define

Given the data:

x_1, x_2, ..., x_N   (23)

where each x_i is a column vector.

Construct the sample mean:

x̄ = (1/N) ∑_{i=1}^{N} x_i   (24)

Build the new (centered) data:

x_1 − x̄, x_2 − x̄, ..., x_N − x̄   (25)


Build the Sample Covariance

The Covariance Matrix:

S = (1/(N − 1)) ∑_{i=1}^{N} (x_i − x̄)(x_i − x̄)^T   (26)

Properties:
1 The ij-th entry of S is equivalent to σ²_ij, the covariance between dimensions i and j.
2 The ii-th entry of S is equivalent to σ²_ii, the variance of dimension i.
3 What else? Look at a plane: centering and rotating!!!
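A sketch of Equations (24)-(26) on made-up data, compared with np.cov:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))          # N = 100 made-up samples, d = 3

x_bar = X.mean(axis=0)                 # sample mean, Equation (24)
Xc = X - x_bar                         # centered data, Equation (25)
S = (Xc.T @ Xc) / (len(X) - 1)         # covariance matrix, Equation (26)

# np.cov expects variables in rows by default, hence rowvar=False for our layout.
print(np.allclose(S, np.cov(X, rowvar=False)))  # True
```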


Using S to Project Data

As in Fisher: We want to project the data onto a line...

For this we use a unit vector u_1, with u_1^T u_1 = 1.

Question: What is the sample variance of the projected data?


Thus we have

Variance of the projected data:

(1/(N − 1)) ∑_{i=1}^{N} (u_1^T x_i − u_1^T x̄)² = u_1^T S u_1   (27)

Use Lagrange multipliers to maximize:

u_1^T S u_1 + λ_1 (1 − u_1^T u_1)   (28)


Differentiate with respect to u_1

We get:

S u_1 = λ_1 u_1   (29)

Then: u_1 is an eigenvector of S.

If we left-multiply by u_1^T:

u_1^T S u_1 = λ_1   (30)
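A sketch of the eigenvector solution (29)-(30) on made-up data: the first principal component is the top eigenvector of S, and the projected variance equals λ_1:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3)) @ np.diag([3.0, 1.0, 0.3])   # made-up, anisotropic data
Xc = X - X.mean(axis=0)
S = (Xc.T @ Xc) / (len(X) - 1)

# eigh returns eigenvalues of the symmetric matrix S in ascending order.
eigvals, eigvecs = np.linalg.eigh(S)
u1, lam1 = eigvecs[:, -1], eigvals[-1]          # first principal component

# Equation (30): the variance of the projection u_1^T x equals lambda_1.
proj_var = np.var(Xc @ u1, ddof=1)
print(np.isclose(proj_var, lam1), np.isclose(u1 @ u1, 1.0))  # True True
```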


Thus: The variance will be maximum when

u_1^T S u_1 = λ_1   (31)

i.e., when λ_1 is the largest eigenvalue. The corresponding u_1 is also known as the First Principal Component.

By Induction: For an M-dimensional projection space, it is possible to define M eigenvectors u_1, u_2, ..., u_M of the data covariance S, corresponding to the M largest eigenvalues λ_1, λ_2, ..., λ_M, that maximize the variance of the projected data.

Computational Cost:
1 Full eigenvector decomposition: O(d³)
2 Power Method: O(M d²) (Golub and Van Loan, 1996)
3 Use the Expectation Maximization algorithm
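Since a full decomposition costs O(d³), here is a hedged sketch of the power method from point 2: repeated multiplication by S converges to the leading eigenvector, assuming a dominant eigenvalue and a start vector not orthogonal to it (the helper power_method is our name):

```python
import numpy as np

def power_method(S, n_iter=200):
    """Leading eigenvector/eigenvalue of a symmetric matrix S by power iteration."""
    rng = np.random.default_rng(0)
    u = rng.normal(size=S.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        u = S @ u
        u /= np.linalg.norm(u)        # renormalize at each step
    return u, u @ S @ u               # eigenvector and Rayleigh quotient

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4)) @ np.diag([3.0, 1.5, 1.0, 0.2])   # made-up data
Xc = X - X.mean(axis=0)
S = (Xc.T @ Xc) / (len(X) - 1)

u1, lam1 = power_method(S)
print(np.isclose(lam1, np.linalg.eigvalsh(S)[-1]))  # matches the largest eigenvalue
```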


Example

From Bishop
