Learning From Data, Lecture 10: Nonlinear Transforms (magdon/courses/LFD-Slides/SlidesLect10.pdf)
Learning From Data
Lecture 10
Nonlinear Transforms
The Z-space
Polynomial transforms
Be careful
M. Magdon-Ismail, CSCI 4100/6100
recap: The Linear Model
$s = \mathbf{w}^{\mathrm{t}}\mathbf{x}$
linear in $\mathbf{x}$: gives the line/hyperplane separator
linear in $\mathbf{w}$: makes the algorithms work
Credit Analysis (task: model; error measure; algorithm):
Approve or Deny: Perceptron; Classification Error; PLA, Pocket, . . .
Amount of Credit: Linear Regression; Squared Error; Pseudo-inverse
Probability of Default: Logistic Regression; Cross-entropy Error; Gradient descent
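The recap can be sketched in code: all three models share the same linear signal and differ only in what they do with it. This is a minimal illustration, not the course's implementation; the function names, the weights `w`, the input `x`, and the tie-breaking at a zero signal are assumptions made for the example.

```python
import numpy as np

def signal(w, x):
    """Linear signal s = w^t x; x carries the constant coordinate x0 = 1."""
    return float(np.dot(w, x))

def perceptron(w, x):
    # Approve or deny: threshold the signal (a zero signal maps to -1 here, an assumption).
    return 1 if signal(w, x) > 0 else -1

def linear_regression(w, x):
    # Amount of credit: output the signal itself.
    return signal(w, x)

def logistic_regression(w, x):
    # Probability of default: squash the signal into (0, 1) with the logistic function.
    return 1.0 / (1.0 + np.exp(-signal(w, x)))

# Hypothetical weights and input, chosen only for illustration.
w = np.array([-1.0, 2.0, 0.5])
x = np.array([1.0, 1.0, 2.0])   # (1, x1, x2)

print(signal(w, x), perceptron(w, x), logistic_regression(w, x))
```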
© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 2 /17 Limitations of linear −→
The Linear Model has its Limits
(a) Linear with outliers (b) Essentially nonlinear
To address (b) we need something more than linear.
Change Your Features
[Figure: axes are Years in Residence, Y and Income]
Y ≫ 3 years: no additional effect beyond Y = 3;
Y ≪ 0.3 years: no additional effect below Y = 0.3.
Change Your Features Using a Transform
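[Figure: the data plotted with axes Years in Residence, Y and Income; the same data plotted with axes z1 and Income; and the transformed feature z1 as a function of Y]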
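The slides do not give a formula for z1. One simple way to encode "no additional effect beyond Y = 3 or below Y = 0.3" is to clip Y to that range; the sketch below assumes exactly that, and the function name and sample values are hypothetical.

```python
import numpy as np

def saturating_feature(years, low=0.3, high=3.0):
    """Hypothetical z1: clip Y into [low, high] so values outside add no further effect."""
    return np.clip(years, low, high)

years = np.array([0.1, 0.3, 1.0, 3.0, 10.0])   # made-up years in residence
z1 = saturating_feature(years)
print(z1)
```

A smooth saturating curve would express the same idea; clipping is just the shortest version to write down.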
Mechanics of the Feature Transform I
Transform the data to a Z-space in which the data is separable.
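[Figure: the data in X-space (axes $x_1$, $x_2$) and the same data in Z-space (axes $z_1 = x_1^2$, $z_2 = x_2^2$)]
$$\mathbf{x} = \begin{bmatrix} 1 \\ x_1 \\ x_2 \end{bmatrix} \longrightarrow \mathbf{z} = \Phi(\mathbf{x}) = \begin{bmatrix} 1 \\ x_1^2 \\ x_2^2 \end{bmatrix} = \begin{bmatrix} 1 \\ \Phi_1(\mathbf{x}) \\ \Phi_2(\mathbf{x}) \end{bmatrix}$$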
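As a small illustration of this transform, the sketch below maps each point (x1, x2) to z = (1, x1², x2²). The function name `phi` and the sample points are assumptions for the example.

```python
import numpy as np

def phi(X):
    """Map each row (x1, x2) of X to z = (1, x1^2, x2^2)."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X[:, 0] ** 2, X[:, 1] ** 2])

X = np.array([[0.5, -0.5],
              [1.0, 2.0]])      # made-up points in X-space
Z = phi(X)
print(Z)
```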
Mechanics of the Feature Transform II
Separate the data in the Z-space with w̃:
$\tilde{g}(\mathbf{z}) = \mathrm{sign}(\tilde{\mathbf{w}}^{\mathrm{t}}\mathbf{z})$
Mechanics of the Feature Transform III
To classify a new x, first transform x to Φ(x) ∈ Z-space and classify there with g̃.
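$g(\mathbf{x}) = \tilde{g}(\Phi(\mathbf{x})) = \mathrm{sign}(\tilde{\mathbf{w}}^{\mathrm{t}}\Phi(\mathbf{x}))$
[Figure: the Z-space separator $\tilde{g}(\mathbf{z}) = \mathrm{sign}(\tilde{\mathbf{w}}^{\mathrm{t}}\mathbf{z})$ mapped back to X-space]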
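The three steps (transform, separate in Z-space, classify back in X-space) can be sketched end to end. This is only an illustration under stated assumptions: the toy data is a grid on [-1, 1]² labeled +1 inside the circle x1² + x2² = 0.6 and -1 outside, points very close to the circle are dropped so that a margin exists, and a plain perceptron learning algorithm stands in for whatever linear algorithm one prefers. All names are hypothetical.

```python
import numpy as np

def phi(X):
    """Feature transform: (x1, x2) -> (1, x1^2, x2^2)."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(len(X)), X[:, 0] ** 2, X[:, 1] ** 2])

def pla(Z, y, max_updates=100_000):
    """Perceptron learning algorithm run on the transformed data (Z, y)."""
    w = np.zeros(Z.shape[1])
    for _ in range(max_updates):
        wrong = np.flatnonzero(np.sign(Z @ w) != y)   # misclassified points
        if wrong.size == 0:
            break                                     # data separated in Z-space
        i = wrong[0]
        w = w + y[i] * Z[i]                           # PLA update
    return w

# Toy data (an assumption): grid points, +1 inside the circle, -1 outside.
ticks = np.linspace(-1.0, 1.0, 21)
X = np.array([(a, b) for a in ticks for b in ticks])
r2 = (X ** 2).sum(axis=1)
keep = np.abs(r2 - 0.6) > 0.045                       # drop points too close to the circle
X, r2 = X[keep], r2[keep]
y = np.where(r2 < 0.6, 1.0, -1.0)

w_tilde = pla(phi(X), y)                              # step II: separate in Z-space

def g(x):
    """Step III: classify in X-space via g(x) = sign(w_tilde^t Phi(x))."""
    return np.sign(phi(np.atleast_2d(x)) @ w_tilde)

e_in = np.mean(g(X) != y)
print(w_tilde, e_in)
```

The learned hypothesis is linear in Z-space but draws a curved (elliptical) boundary in X-space.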
The General Feature Transform
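X-space is $\mathbb{R}^{d}$; Z-space is $\mathbb{R}^{\tilde{d}}$.
Points: $\mathbf{x} = (1, x_1, \ldots, x_d)^{\mathrm{t}}$ in X-space; $\mathbf{z} = \Phi(\mathbf{x}) = (1, \Phi_1(\mathbf{x}), \ldots, \Phi_{\tilde{d}}(\mathbf{x}))^{\mathrm{t}} = (1, z_1, \ldots, z_{\tilde{d}})^{\mathrm{t}}$ in Z-space.
Data: $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$ in X-space; $\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_N$ in Z-space.
Labels: $y_1, y_2, \ldots, y_N$ in both spaces.
Weights: none in X-space; $\tilde{\mathbf{w}} = (w_0, w_1, \ldots, w_{\tilde{d}})^{\mathrm{t}}$ in Z-space.
$g(\mathbf{x}) = \mathrm{sign}(\tilde{\mathbf{w}}^{\mathrm{t}}\Phi(\mathbf{x}))$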
Generalization
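VC dimension: $d_{\mathrm{vc}} = d + 1$ in X-space $\longrightarrow$ $\tilde{d}_{\mathrm{vc}} = \tilde{d} + 1$ in Z-space.
Choose the feature transform with smallest $\tilde{d}$.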
Many Nonlinear Features May Work
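[Figure: the data in X-space (axes $x_1$, $x_2$) with the circle $x_1^2 + x_2^2 = 0.6$, alongside three candidate Z-spaces:]
$z_1 = (x_1 + 0.05)^2$, $z_2 = x_2$
$z_1 = x_1^2$, $z_2 = x_2^2$
$z_1 = x_1^2 + x_2^2 - 0.6$ (a single feature; 0 is marked on its axis)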
A rat! A rat!
This is called data snooping: looking at your data and tailoring your H.
Must Choose Φ BEFORE You Look at the Data
After constructing features carefully, before seeing the data . . .
. . . if you think linear is not enough, try the 2nd order polynomial transform.
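$$\mathbf{x} = \begin{bmatrix} 1 \\ x_1 \\ x_2 \end{bmatrix} \longrightarrow \Phi(\mathbf{x}) = \begin{bmatrix} 1 \\ \Phi_1(\mathbf{x}) \\ \Phi_2(\mathbf{x}) \\ \Phi_3(\mathbf{x}) \\ \Phi_4(\mathbf{x}) \\ \Phi_5(\mathbf{x}) \end{bmatrix} = \begin{bmatrix} 1 \\ x_1 \\ x_2 \\ x_1^2 \\ x_1 x_2 \\ x_2^2 \end{bmatrix}$$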
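A minimal sketch of the 2nd order polynomial transform, applied to a batch of points, might look as follows. The function name `phi2` and the sample point are assumptions; the feature order follows the vector above.

```python
import numpy as np

def phi2(X):
    """Map each row (x1, x2) to (1, x1, x2, x1^2, x1*x2, x2^2)."""
    X = np.asarray(X, dtype=float)
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1 ** 2, x1 * x2, x2 ** 2])

Z = phi2([[2.0, 3.0]])          # a made-up point
d_tilde = Z.shape[1] - 1        # number of features, not counting the constant 1
print(Z, d_tilde)
```

Because this transform is fixed in advance, its five features are paid for in generalization whether or not the data turns out to need them.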
The General Polynomial Transform $\Phi_k$
We can get even fancier: degree-k polynomial transform:
$\Phi_1(\mathbf{x}) = (1, x_1, x_2)$,
$\Phi_2(\mathbf{x}) = (1, x_1, x_2, x_1^2, x_1 x_2, x_2^2)$,
$\Phi_3(\mathbf{x}) = (1, x_1, x_2, x_1^2, x_1 x_2, x_2^2, x_1^3, x_1^2 x_2, x_1 x_2^2, x_2^3)$,
$\Phi_4(\mathbf{x}) = (1, x_1, x_2, x_1^2, x_1 x_2, x_2^2, x_1^3, x_1^2 x_2, x_1 x_2^2, x_2^3, x_1^4, x_1^3 x_2, x_1^2 x_2^2, x_1 x_2^3, x_2^4)$,
$\vdots$
– Dimensionality of the feature space increases rapidly ($d_{\mathrm{vc}}$)!
– Similar transforms for d-dimensional original space.
– Approximation-generalization tradeoff: higher degree gives lower (even zero) $E_{\text{in}}$ but worse generalization.
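To see how quickly the feature space grows, the sketch below builds the degree-k transform for a two-dimensional input and counts its entries. The function name `poly_transform` is hypothetical; the monomials are generated in the same order as the lists above.

```python
def poly_transform(x1, x2, k):
    """Degree-k polynomial features of (x1, x2): 1, x1, x2, x1^2, x1*x2, x2^2, ..."""
    features = []
    for degree in range(k + 1):
        for p in range(degree, -1, -1):          # power of x1 runs degree, ..., 0
            features.append(x1 ** p * x2 ** (degree - p))
    return features

# Number of entries (including the constant 1) for k = 1, 2, 3, 4.
sizes = [len(poly_transform(1.0, 1.0, k)) for k in range(1, 5)]
print(sizes)
print(poly_transform(2, 3, 2))
```

For a two-dimensional input the count is (k + 1)(k + 2)/2, so it grows quadratically in k.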
Be Careful with Feature Transforms
High order polynomial transform leads to “nonsense”.
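The effect of insisting on zero in-sample error can be illustrated with a small regression stand-in (the slide's own example is a classification figure that is not reproduced here). The data below is made up: six points on a line plus alternating noise. A degree-5 polynomial passes through all six points, so its in-sample error is zero up to rounding, yet it has fitted the noise rather than the line.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 6)
noise = np.array([0.3, -0.3, 0.3, -0.3, 0.3, -0.3])   # made-up noise
y = x + noise                                         # target is the line y = x

def e_in(degree):
    """Mean squared in-sample error of a least-squares polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

e_lin, e_high = e_in(1), e_in(5)
print(e_lin, e_high)
```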
Digits Data “1” Versus “All”
[Figure: two panels plotting the digits data with axes Average Intensity and Symmetry]
Linear model: $E_{\text{in}} = 2.13\%$, $E_{\text{out}} = 2.38\%$
3rd order polynomial model: $E_{\text{in}} = 1.75\%$, $E_{\text{out}} = 1.87\%$
Use the Linear Model!
• First try a linear model – simple, robust and works.
• Algorithms can tolerate error plus you have nonlinear feature transforms.
• Choose a feature transform before seeing the data. Stay simple. Data snooping is hazardous to your $E_{\text{out}}$.
• Linear models are fundamental in their own right; they are also the building blocks of many more complex models like support vector machines.
• Nonlinear transforms also apply to regression and logistic regression.
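As a sketch of the last point, the same recipe works for regression: transform the inputs, then run ordinary linear regression (pseudo-inverse) in Z-space. The one-dimensional transform Φ(x) = (1, x, x²), the noiseless target, and all names below are assumptions chosen to keep the example short.

```python
import numpy as np

def phi(x):
    """Transform a 1-d input x to z = (1, x, x^2)."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones(len(x)), x, x ** 2])

x = np.linspace(-1.0, 1.0, 9)
y = 1.0 + 2.0 * x - x ** 2              # made-up target, quadratic in x

Z = phi(x)
w_tilde = np.linalg.pinv(Z) @ y         # linear regression in Z-space

def g(x_new):
    """Prediction in X-space: g(x) = w_tilde^t Phi(x)."""
    return phi(np.atleast_1d(x_new)) @ w_tilde

print(w_tilde, g(0.5))
```

The fit is linear in the weights, so the pseudo-inverse still applies even though the fitted curve is nonlinear in x.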