learning from data lecture 10 nonlinear transformsmagdon/courses/lfd-slides/slideslect10.pdfchange...

17
Learning From Data Lecture 10 Nonlinear Transforms The Z -space Polynomial transforms Be careful M. Magdon-Ismail CSCI 4100/6100

Upload: others

Post on 12-Mar-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Learning From Data Lecture 10 Nonlinear Transformsmagdon/courses/LFD-Slides/SlidesLect10.pdfChange Your Features Years in Residence, Y Income Y ≫3 years no additional effect beyond

Learning From Data

Lecture 10

Nonlinear Transforms

The Z-space

Polynomial transforms

Be careful

M. Magdon-IsmailCSCI 4100/6100

Page 2: Learning From Data Lecture 10 Nonlinear Transformsmagdon/courses/LFD-Slides/SlidesLect10.pdfChange Your Features Years in Residence, Y Income Y ≫3 years no additional effect beyond

recap: The Linear Model

linear in x: gives the line/hyperplane separator

s = wtx

↑linear in w: makes the algorithms work

CreditAnalysis

Amountof Credit

Approveor Deny

Probabilityof Default

Perceptron

Logistic Regression

Linear Regression

Classification ErrorPLA, Pocket,. . .

Cross-entropy ErrorGradient descent

Squared ErrorPseudo-inverse

c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 2 /17 Limitations of linear −→

Page 3: Learning From Data Lecture 10 Nonlinear Transformsmagdon/courses/LFD-Slides/SlidesLect10.pdfChange Your Features Years in Residence, Y Income Y ≫3 years no additional effect beyond

The Linear Model has its Limits

(a) Linear with outliers (b) Essentially nonlinear

To address (b) we need something more than linear.

c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 3 /17 Change the features −→

Page 4: Learning From Data Lecture 10 Nonlinear Transformsmagdon/courses/LFD-Slides/SlidesLect10.pdfChange Your Features Years in Residence, Y Income Y ≫3 years no additional effect beyond

Change Your Features

Years in Residence, Y

Income

Y ≫ 3 yearsno additional effect beyond Y = 3;

Y ≪ 0.3 yearsno additional effect below Y = 0.3.

c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 4 /17 ‘Transform’ your features −→

Page 5: Learning From Data Lecture 10 Nonlinear Transformsmagdon/courses/LFD-Slides/SlidesLect10.pdfChange Your Features Years in Residence, Y Income Y ≫3 years no additional effect beyond

Change Your Features Using a Transform

Years in Residence, Y

Income

z1Income

z1

Y

c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 5 /17 Feature transform I: Z-space −→

Page 6: Learning From Data Lecture 10 Nonlinear Transformsmagdon/courses/LFD-Slides/SlidesLect10.pdfChange Your Features Years in Residence, Y Income Y ≫3 years no additional effect beyond

Mechanics of the Feature Transform I

Transform the data to a Z-space in which the data is separable.

x1

x2 −→

z1 = x21

z2=x2 2

x =

1

x1

x2

−→ z = Φ(x) =

1

x21

x22

=

1

Φ1(x)

Φ2(x)

c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 6 /17 Feature transform II: classify in Z-space −→

Page 7: Learning From Data Lecture 10 Nonlinear Transformsmagdon/courses/LFD-Slides/SlidesLect10.pdfChange Your Features Years in Residence, Y Income Y ≫3 years no additional effect beyond

Mechanics of the Feature Transform II

Separate the data in the Z-space with w̃:

g̃(z) = sign(w̃tz)

−→

c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 7 /17 Feature transform III: bring back to X -space −→

Page 8: Learning From Data Lecture 10 Nonlinear Transformsmagdon/courses/LFD-Slides/SlidesLect10.pdfChange Your Features Years in Residence, Y Income Y ≫3 years no additional effect beyond

Mechanics of the Feature Transform III

To classify a new x, first transform x to Φ(x) ∈ Z-space and classify there with g̃.

g(x) = g̃(Φ(x))

= sign(w̃tΦ(x))g̃(z) = sign(w̃tz)

←−

c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 8 /17 Summary of nonlinear transform −→

Page 9: Learning From Data Lecture 10 Nonlinear Transformsmagdon/courses/LFD-Slides/SlidesLect10.pdfChange Your Features Years in Residence, Y Income Y ≫3 years no additional effect beyond

The General Feature Transform

X -space is Rd Z-space is Rd̃

x =

1x1...xd

z = Φ(x) =

1Φ1(x)

...Φd̃(x)

=

1z1...zd̃

x1,x2, . . . ,xN z1, z2, . . . , zN

y1, y2, . . . , yN y1, y2, . . . , yN

no weights w̃ =

w0

w1...wd̃

g(x) = sign(w̃tΦ(x))

c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 9 /17 Generalization −→

Page 10: Learning From Data Lecture 10 Nonlinear Transformsmagdon/courses/LFD-Slides/SlidesLect10.pdfChange Your Features Years in Residence, Y Income Y ≫3 years no additional effect beyond

Generalization

dvc d̃vc

d + 1 −→ d̃+ 1

Choose the feature transform with smallest d̃

c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 10 /17 Many possibilities to choose from −→

Page 11: Learning From Data Lecture 10 Nonlinear Transformsmagdon/courses/LFD-Slides/SlidesLect10.pdfChange Your Features Years in Residence, Y Income Y ≫3 years no additional effect beyond

Many Nonlinear Features May Work

x1

x2 ←− x2

1 + x22 = 0.6

z1 = (x1 + 0.05)2

z2=x2

0

z1 = x21

z2=x2 2

z1 = x21 + x22 − 0.6

A rat! A rat!

This is called data snooping: looking at your data and tailoring your H.

c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 11 /17 Choose before looking at data −→

Page 12: Learning From Data Lecture 10 Nonlinear Transformsmagdon/courses/LFD-Slides/SlidesLect10.pdfChange Your Features Years in Residence, Y Income Y ≫3 years no additional effect beyond

Must Choose Φ BEFORE Your Look at the Data

After constructing features carefully, before seeing the data . . .

. . . if you think linear is not enough, try the 2nd order polynomial transform.

1

x1

x2

= x −→ Φ(x) =

1

Φ1(x)

Φ2(x)

Φ3(x)

Φ4(x)

Φ5(x)

=

1

x1

x2

x21

x1x2

x22

c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 12 /17 The polynomial transform −→

Page 13: Learning From Data Lecture 10 Nonlinear Transformsmagdon/courses/LFD-Slides/SlidesLect10.pdfChange Your Features Years in Residence, Y Income Y ≫3 years no additional effect beyond

The General Polynomial Transform Φk

We can get even fancier: degree-k polynomial transform:

Φ1(x) = (1, x1, x2),

Φ2(x) = (1, x1, x2, x21, x1x2, x

22),

Φ3(x) = (1, x1, x2, x21, x1x2, x

22, x

31, x

21x2, x1x

22, x

32),

Φ4(x) = (1, x1, x2, x21, x1x2, x

22, x

31, x

21x2, x1x

22, x

32, x

41, x

31x2, x

21x

22, x1x

32, x

42),

...

– Dimensionality of the feature space increases rapidly (dvc)!

– Similar transforms for d-dimensional original space.

– Approximation-generalization tradeoffHigher degree gives lower (even zero) Ein but worse generalization.

c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 13 /17 Be carefull with nonlinear transforms −→

Page 14: Learning From Data Lecture 10 Nonlinear Transformsmagdon/courses/LFD-Slides/SlidesLect10.pdfChange Your Features Years in Residence, Y Income Y ≫3 years no additional effect beyond

Be Careful with Feature Transforms

c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 14 /17 Insist on Ein = 0 −→

Page 15: Learning From Data Lecture 10 Nonlinear Transformsmagdon/courses/LFD-Slides/SlidesLect10.pdfChange Your Features Years in Residence, Y Income Y ≫3 years no additional effect beyond

Be Careful with Feature Transforms

High order polynomial transform leads to “nonsense”.

c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 15 /17 Digits data −→

Page 16: Learning From Data Lecture 10 Nonlinear Transformsmagdon/courses/LFD-Slides/SlidesLect10.pdfChange Your Features Years in Residence, Y Income Y ≫3 years no additional effect beyond

Digits Data “1” Versus “All”

Average Intensity

Sym

metry

0.35

Average Intensity

Sym

metry

Linear model

Ein = 2.13%Eout = 2.38%

3rd order polynomial model

Ein = 1.75%Eout = 1.87%

c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 16 /17 Use the linear model! −→

Page 17: Learning From Data Lecture 10 Nonlinear Transformsmagdon/courses/LFD-Slides/SlidesLect10.pdfChange Your Features Years in Residence, Y Income Y ≫3 years no additional effect beyond

Use the Linear Model!

• First try a linear model – simple, robust and works.

• Algorithms can tolerate error plus you have nonlinear feature transforms.

• Choose a feature transform before seeing the data. Stay simple.Data snooping is hazardous to your Eout.

• Linear models are fundamental in their own right; they are also the building blocksof many more complex models like support vector machines.

• Nonlinear transforms also apply to regression and logistic regression.

c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 17 /17