e m e - yaser abu-mostafawork.caltech.edu/slides/slides07.pdf · c v dimension the c v dimension of...
TRANSCRIPT
![Page 1: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/1.jpg)
Review of Le ture 6• mH(N) is polynomialif H has a break point k
bottom
top
k
1 2 3 4 5 6 . .
1 1 2 2 2 2 2 . .
2 1 3 4 4 4 4 . .
3 1 4 7 8 8 8 . .
N 4 1 5 11 . . . . . . . .
5 1 6 : .
6 1 7 : .
: : : : .
mH(N) ≤k−1∑
i=0
(N
i
)
︸ ︷︷ ︸maximum power is Nk−1
• The VC Inequality
(a) (b) (c)
.
data setsspace of
Hoeffding Inequality Union Bound VC Bound
D
PP [ |Ein(g) − Eout(g)| > ǫ ] ≤ 2 M e− 2 ǫ2N
↓ ↓ ↓
↓ ↓ ↓PP [ |Ein(g) − Eout(g)| > ǫ ] ≤ 4 mH(2N) e−
1
8ǫ2N
![Page 2: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/2.jpg)
Learning From DataYaser S. Abu-MostafaCalifornia Institute of Te hnologyLe ture 7: The VC Dimension
Sponsored by Calte h's Provost O e, E&AS Division, and IST • Tuesday, April 24, 2012
![Page 3: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/3.jpg)
Outline• The denition• VC dimension of per eptrons• Interpreting the VC dimension• Generalization bounds © AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 7 2/24
![Page 4: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/4.jpg)
Denition of VC dimensionThe VC dimension of a hypothesis set H, denoted by dv (H), is
the largest value of N for whi h mH(N) = 2N
the most points H an shatterN ≤ dv (H) =⇒ H an shatter N pointsk > dv (H) =⇒ k is a break point for H
© AML Creator: Yaser Abu-Mostafa - LFD Le ture 7 3/24
![Page 5: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/5.jpg)
The growth fun tionIn terms of a break point k:
mH(N) ≤k−1∑
i=0
(N
i
)
In terms of the VC dimension dv :mH(N) ≤
dv ∑
i=0
(N
i
)
︸ ︷︷ ︸maximum power is N dv © AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 7 4/24
![Page 6: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/6.jpg)
Examples• H is positive rays: •
dv = 1
• H is 2D per eptrons: •
dv = 3 • •
• H is onvex sets:dv = ∞
up
bottom © AML Creator: Yaser Abu-Mostafa - LFD Le ture 7 5/24
![Page 7: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/7.jpg)
VC dimension and learningdv (H) is nite =⇒ g ∈ H will generalize• Independent of the learning algorithm• Independent of the input distribution• Independent of the target fun tion HYPOTHESIS SET
ALGORITHM
LEARNING FINALHYPOTHESIS
H
A
g ~ f ~
f: X Y
TRAINING EXAMPLES
UNKNOWN TARGET FUNCTION
DISTRIBUTION
PROBABILITY
onP X
x y x yNN11
( , ), ... , ( , )
up
down
© AML Creator: Yaser Abu-Mostafa - LFD Le ture 7 6/24
![Page 8: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/8.jpg)
VC dimension of per eptronsFor d = 2, dv = 3
In general, dv = d + 1
We will prove two dire tions:dv ≤ d + 1
dv ≥ d + 1
down
up
© AML Creator: Yaser Abu-Mostafa - LFD Le ture 7 7/24
![Page 9: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/9.jpg)
Here is one dire tionA set of N = d + 1 points in R
d shattered by the per eptron:
X =
xT1 xT2 xT3 ...x
Td+1
=
1 0 0 . . . 0
1 1 0 . . . 0
1 0 1 0
. . . ... . . . 0
1 0 . . . 0 1
X is invertible © AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 7 8/24
![Page 10: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/10.jpg)
Can we shatter this data set?
For any y =
y1
y2...yd+1
=
±1
±1...±1
, an we nd a ve tor w satisfyingsign(Xw) = y
Easy! Just make sign(Xw)= y
whi h means w = X−1y
© AML Creator: Yaser Abu-Mostafa - LFD Le ture 7 9/24
![Page 11: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/11.jpg)
We an shatter these d + 1 pointsThis implies what?[a dv = d + 1
[b dv ≥ d + 1 X
[ dv ≤ d + 1
[d No on lusion © AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 7 10/24
![Page 12: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/12.jpg)
Now, to show that dv ≤ d + 1
We need to show that:[a There are d + 1 points we annot shatter[b There are d + 2 points we annot shatter[ We annot shatter any set of d + 1 points[d We annot shatter any set of d + 2 points X
© AML Creator: Yaser Abu-Mostafa - LFD Le ture 7 11/24
![Page 13: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/13.jpg)
Take any d + 2 pointsFor any d + 2 points,
x1, · · · ,xd+1,xd+2
More points than dimensions =⇒ we must havexj =
∑
i6=j
ai xi
where not all the ai's are zeros © AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 7 12/24
![Page 14: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/14.jpg)
So?xj =
∑
i6=j
ai xi
Consider the following di hotomy:xi's with non-zero ai get yi = sign(ai)
and xj gets yj = −1
No per eptron an implement su h di hotomy! © AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 7 13/24
![Page 15: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/15.jpg)
Why?xj =
∑
i6=j
ai xi =⇒ wTxj =
∑
i6=j
ai wTxi
If yi = sign(wTxi) = sign(ai), then ai w
Txi > 0
This for esw
Txj =
∑
i6=j
ai wTxi > 0
Therefore, yj = sign(wTxj) = +1
© AML Creator: Yaser Abu-Mostafa - LFD Le ture 7 14/24
![Page 16: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/16.jpg)
Putting it togetherWe proved dv ≤ d + 1 and dv ≥ d + 1
dv = d + 1
What is d + 1 in the per eptron?It is the number of parameters w0, w1, · · · , wd
© AML Creator: Yaser Abu-Mostafa - LFD Le ture 7 15/24
![Page 17: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/17.jpg)
Outline• The denition• VC dimension of per eptrons• Interpreting the VC dimension• Generalization bounds © AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 7 16/24
![Page 18: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/18.jpg)
1. Degrees of freedomParameters reate degrees of freedom# of parameters: analog degrees of freedomdv : equivalent `binary' degrees of freedom
© AML Creator: Yaser Abu-Mostafa - LFD Le ture 7 17/24
![Page 19: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/19.jpg)
The usual suspe tsPositive rays (dv = 1):
PSfrag repla ements
x1 x2 x3 xN. . .
h(x) = −1 h(x) = +1a
00.20.40.60.81-0.1-0.08-0.06-0.04-0.0200.020.040.060.080.1Positive intervals (dv = 2):
PSfrag repla ements
x1 x2 x3 xN. . .
h(x) = −1 h(x) = −1h(x) = +1
00.20.40.60.81-0.1-0.08-0.06-0.04-0.0200.020.040.060.080.1 © AML Creator: Yaser Abu-Mostafa - LFD Le ture 7 18/24
![Page 20: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/20.jpg)
Not just parametersParameters may not ontribute degrees of freedom:
down
down
yx
dv measures the ee tive number of parameters © AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 7 19/24
![Page 21: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/21.jpg)
2. Number of data points neededTwo small quantities in the VC inequality:
PP [|Ein(g) − Eout(g)| > ǫ] ≤ 4mH(2N)e−1
8ǫ2N
︸ ︷︷ ︸δ
If we want ertain ǫ and δ, how does N depend on dv ?Let us look at N de−N
© AML Creator: Yaser Abu-Mostafa - LFD Le ture 7 20/24
![Page 22: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/22.jpg)
N de−N
Fix N de−N = small valueHow does N hange with d?Rule of thumb:
N ≥ 10 dv 20 40 60 80 100 120 140 160 180 200
10−5
100
105
1010
N 30e−N
© AML Creator: Yaser Abu-Mostafa - LFD Le ture 7 21/24
![Page 23: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/23.jpg)
Outline• The denition• VC dimension of per eptrons• Interpreting the VC dimension• Generalization bounds © AM
L Creator: Yaser Abu-Mostafa - LFD Le ture 7 22/24
![Page 24: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/24.jpg)
Rearranging thingsStart from the VC inequality:
PP [|Eout − Ein| > ǫ] ≤ 4mH(2N)e−1
8ǫ2N
︸ ︷︷ ︸δGet ǫ in terms of δ:
δ = 4mH(2N)e−1
8ǫ2N =⇒ ǫ =
√
8
Nln
4mH(2N)
δ︸ ︷︷ ︸Ω
With probability ≥ 1 − δ, |Eout − Ein| ≤ Ω(N,H, δ) © AML Creator: Yaser Abu-Mostafa - LFD Le ture 7 23/24
![Page 25: E M e - Yaser Abu-Mostafawork.caltech.edu/slides/slides07.pdf · C V dimension The C V dimension of a hyp othesis set H, denoted y b d c v (H) is the rgest la value of N r fo which](https://reader035.vdocument.in/reader035/viewer/2022070722/5f01c1847e708231d400e239/html5/thumbnails/25.jpg)
Generalization boundWith probability ≥ 1 − δ, |Eout − Ein| ≤ Ω(N,H, δ)
=⇒
With probability ≥ 1 − δ,Eout ≤ Ein + Ω
© AML Creator: Yaser Abu-Mostafa - LFD Le ture 7 24/24