![Page 1: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/1.jpg)
5.3 Algorithmic Stability Bounds
Summarized by:
Sang Kyun Lee
![Page 2: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/2.jpg)
(c) 2005 SNU CSE Biointelligence Lab
Robustness of a learning algorithm
Instead of compression and reconstruction functions, we now consider the “robustness of a learning algorithm A”.
Robustness: a measure of the influence of an additional training example $(x, y) \in \mathcal{Z}$ on the learned hypothesis $A(z) \in \mathcal{H}$, quantified in terms of the loss achieved at any test object $x \in \mathcal{X}$.
A robust learning algorithm guarantees
|expected risk - empirical risk| < M
even if we replace one training example by its worst counterpart.
This fact is of great help when using McDiarmid’s inequality (A.119), a large deviation result perfectly suited for the current purpose.
![Page 3: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/3.jpg)
McDiarmid’s Inequality (A.119)
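For reference, the inequality can be written in its commonly stated form as follows. The symbols $g$, $c_i$ and $\tilde{z}_i$ are chosen here for illustration and may differ from the notation used in (A.119).

```latex
% Bounded-difference condition: changing the i-th argument moves g by at most c_i
\sup_{z_1, \dots, z_m, \tilde{z}_i}
  \left| g(z_1, \dots, z_i, \dots, z_m) - g(z_1, \dots, \tilde{z}_i, \dots, z_m) \right| \le c_i
  \quad \text{for all } i \in \{1, \dots, m\}
% implies, for independent Z_1, ..., Z_m and every \varepsilon > 0,
P\left( g(Z) - E[g(Z)] > \varepsilon \right)
  \le \exp\left( -\frac{2 \varepsilon^2}{\sum_{i=1}^m c_i^2} \right)
```

When all $c_i$ equal a common value $c$, the exponent becomes $-2\varepsilon^2 / (m c^2)$, which is why a bound on the effect of replacing a single training example is the key quantity in this section.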
![Page 4: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/4.jpg)
5.3.1 Algorithmic Stability for Regression
Framework
Training sample: $z \in \mathcal{Z}^m$, drawn iid from an unknown distribution
Hypothesis: a real-valued function
Loss function: $l : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, a function of the predicted value $\hat{t}$ and the observed value $t$
![Page 5: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/5.jpg)
Notations
Given &
![Page 6: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/6.jpg)
$\beta_m$-stability (1/2)
This implies robustness in the more usual sense of measuring the influence of an extra training example, as formally expressed in the following theorem.
![Page 7: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/7.jpg)
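$\beta_m$-stability (2/2)
Proof (Theorem 5.27)
Let $z_{\setminus i}$ denote the sample $z$ with the $i$-th example removed and $z_{i \leftrightarrow \tilde{z}}$ the sample $z$ with the $i$-th example replaced by $\tilde{z}$. Then

$$
\begin{aligned}
l(f_z(x), t) - l(f_{z_{i \leftrightarrow \tilde{z}}}(x), t)
  &= \{ l(f_z(x), t) - l(f_{z_{\setminus i}}(x), t) \} + \{ l(f_{z_{\setminus i}}(x), t) - l(f_{z_{i \leftrightarrow \tilde{z}}}(x), t) \}, \\
| l(f_z(x), t) - l(f_{z_{i \leftrightarrow \tilde{z}}}(x), t) |
  &\le | l(f_z(x), t) - l(f_{z_{\setminus i}}(x), t) | + | l(f_{z_{\setminus i}}(x), t) - l(f_{z_{i \leftrightarrow \tilde{z}}}(x), t) | \\
  &\le 2\beta_m .
\end{aligned}
$$

Each term on the right-hand side is at most $\beta_m$, because $z_{\setminus i}$ results from deleting the $i$-th example from either $z$ or $z_{i \leftrightarrow \tilde{z}}$.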
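The triangle-inequality step of the proof can be checked numerically. The sketch below is only an illustration under assumptions that are not in the slides: a one-dimensional ridge regressor (`fit_ridge_1d`, a hypothetical helper), the absolute loss for evaluation, and random synthetic data. The maxima it reports are taken over a finite set of samples, so they only underestimate the true stability constants.

```python
import random


def fit_ridge_1d(sample, lam):
    """Minimize (1/m) * sum((w*x - t)^2) + lam * w^2 over w (closed form)."""
    m = len(sample)
    sxx = sum(x * x for x, _ in sample)
    sxt = sum(x * t for x, t in sample)
    return sxt / (sxx + lam * m)


def abs_loss(pred, t):
    return abs(pred - t)


def stability_gaps(z, i, z_new, test_point, lam):
    """Loss differences between f_z, f_{z without i}, and f_{z with i replaced}."""
    x, t = test_point
    w_full = fit_ridge_1d(z, lam)
    w_del = fit_ridge_1d(z[:i] + z[i + 1:], lam)
    w_rep = fit_ridge_1d(z[:i] + [z_new] + z[i + 1:], lam)
    l_full, l_del, l_rep = (abs_loss(w * x, t) for w in (w_full, w_del, w_rep))
    return abs(l_full - l_rep), abs(l_full - l_del), abs(l_del - l_rep)


random.seed(0)
m, lam = 30, 0.5
xs = [random.uniform(-1.0, 1.0) for _ in range(m)]
z = [(x, 2.0 * x + random.gauss(0.0, 0.1)) for x in xs]

worst = {"replace": 0.0, "remove": 0.0}
triangle_ok = True
for i in range(m):
    z_new = (random.uniform(-1.0, 1.0), random.uniform(-3.0, 3.0))
    test_point = (random.uniform(-1.0, 1.0), random.uniform(-3.0, 3.0))
    rep, rem, rem_of_rep = stability_gaps(z, i, z_new, test_point, lam)
    # Replacement gap <= sum of the two removal gaps (triangle inequality).
    triangle_ok = triangle_ok and rep <= rem + rem_of_rep + 1e-12
    worst["replace"] = max(worst["replace"], rep)
    worst["remove"] = max(worst["remove"], rem, rem_of_rep)

print(triangle_ok, worst)
```

The largest replacement gap observed never exceeds twice the largest removal gap, mirroring the $2\beta_m$ statement of the theorem on this finite set of trials.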
![Page 8: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/8.jpg)
Lipschitz Loss Function (1/3)
Thus, given a Lipschitz continuous loss function $l$ with Lipschitz constant $C_l$,

$$| l(f_z(x), t) - l(f_{z_{\setminus i}}(x), t) | \le C_l \, | f_z(x) - f_{z_{\setminus i}}(x) | .$$

That is, we can use the difference of the two functions to bound the difference of the losses they incur at any test object $x$.
![Page 9: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/9.jpg)
Lipschitz Loss Function (2/3)
Examples of Lipschitz continuous loss functions
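As a rough numerical illustration, the sketch below estimates the Lipschitz constant (with respect to the prediction) of two losses that are Lipschitz with constant 1, the absolute loss and the $\varepsilon$-insensitive loss, and contrasts them with the squared loss, whose constant depends on the range of predictions. The choice of losses, the function names, and the sampling range are assumptions made for this example.

```python
import random


def abs_loss(pred, t):
    return abs(t - pred)


def eps_insensitive_loss(pred, t, eps=0.1):
    return max(abs(t - pred) - eps, 0.0)


def squared_loss(pred, t):
    return (t - pred) ** 2


def estimate_lipschitz(loss, t, n_pairs=10000, lo=-5.0, hi=5.0, seed=0):
    """Largest observed |loss(a, t) - loss(b, t)| / |a - b| over random pairs.

    This is only a lower estimate of the true Lipschitz constant on [lo, hi].
    """
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_pairs):
        a, b = rng.uniform(lo, hi), rng.uniform(lo, hi)
        if a != b:
            best = max(best, abs(loss(a, t) - loss(b, t)) / abs(a - b))
    return best


for name, loss in [("absolute", abs_loss),
                   ("eps-insensitive", eps_insensitive_loss),
                   ("squared", squared_loss)]:
    print(name, round(estimate_lipschitz(loss, t=0.0), 3))
```

The first two estimates stay at or below 1, while the squared loss produces ratios that grow with the width of the interval, so it is Lipschitz only on a bounded range of predictions.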
![Page 10: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/10.jpg)
Lipschitz Loss Function (3/3)
Using the concept of Lipschitz continuous loss functions, we can upper bound the value of $\beta_m$ for a large class of learning algorithms by means of the following theorem (proof in Appendix C9.1):
Using this, we are able to cast most of the learning algorithms presented in Part I of this book into this framework.
![Page 11: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/11.jpg)
Algorithmic Stability Bound for Regression Estimation
Now, in order to obtain generalization error bounds for $\beta_m$-stable learning algorithms $A$, we proceed as follows:
1. To use McDiarmid’s inequality, define a random variable $g(Z)$ which measures $|R[f_z] - R_{emp}[f_z, z]|$ or $|R[f_z] - R_{loo}[A, z]|$, e.g. $g(Z) = R[f_Z] - R_{emp}[f_Z, Z]$.
2. Then we need to upper bound $E[g]$ over the random draw of training samples $z \in \mathcal{Z}^m$. This is because we are only interested in the probability that $g(Z)$ is larger than some prespecified $\varepsilon$.
3. We also need an upper bound on $\sup_{\tilde{z} \in \mathcal{Z}} | g(z) - g(z_{i \leftrightarrow \tilde{z}}) |$, which should preferably not depend on $i \in \{1, \dots, m\}$.
The argument here is a bit rough.
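The role of the bounded-difference constant in step 3 can be illustrated with a small Monte Carlo sketch. It does not use a learning algorithm: as a simplifying assumption, $g$ is just the mean of $m$ independent values in $[0, 1]$, for which replacing one value changes $g$ by at most $c = 1/m$. The function names and parameter values are chosen for this example only.

```python
import math
import random


def mcdiarmid_bound(eps, c, m):
    """exp(-2 eps^2 / (m c^2)): tail bound when each argument moves g by at most c."""
    return math.exp(-2.0 * eps ** 2 / (m * c ** 2))


def sample_mean(values):
    return sum(values) / len(values)


rng = random.Random(0)
m, eps, trials = 50, 0.1, 2000
c = 1.0 / m      # replacing one value in [0, 1] moves the mean by at most 1/m
expected = 0.5   # E[g] for uniform values on [0, 1]

exceed = sum(
    sample_mean([rng.random() for _ in range(m)]) - expected > eps
    for _ in range(trials)
)
empirical_tail = exceed / trials
bound = mcdiarmid_bound(eps, c, m)
print(empirical_tail, bound)
```

The observed tail frequency sits well below the bound, which is expected: McDiarmid’s inequality only uses the bounded-difference constant and ignores everything else about the distribution.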
![Page 12: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/12.jpg)
Algorithmic Stability Bound for Regression Estimation (C9.2 - 1/8)
Quick proof: expectation over the random draw of training samples $z \in \mathcal{Z}^m$
![Page 13: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/13.jpg)
Algorithmic Stability Bound for Regression Estimation (C9.2 - 2/8)
Quick Proof:
![Page 14: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/14.jpg)
Algorithmic Stability Bound for Regression Estimation (C9.2 - 3/8)
![Page 15: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/15.jpg)
Algorithmic Stability Bound for Regression Estimation (C9.2 - 4/8)
Proof
by Lemma C.21
![Page 16: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/16.jpg)
Algorithmic Stability Bound for Regression Estimation (C9.2 - 5/8)
Summary:
The two bounds are essentially the same: the additive correction is $\approx \beta_m$, and the decay of the probability is $O(\exp(-\varepsilon^2 / (m \beta_m^2)))$.
This result is slightly surprising, because VC theory indicates that the training error $R_{emp}$ is only a good indicator of the generalization error when the hypothesis space has a small VC dimension (Thm. 4.7).
In contrast, the leave-one-out (loo) error disregards the VC dimension and is an almost unbiased estimator of the expected generalization error of an algorithm (Thm. 2.36).
![Page 17: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/17.jpg)
Algorithmic Stability Bound for Regression Estimation (C9.2 - 6/8)
However, recall that VC theory is used for empirical risk minimization algorithms, which only consider the training error as the cost function to be minimized.
In contrast, in the current formulation we have to guarantee a certain stability of the learning algorithm: in the case $\lambda \to 0$ for the trade-off parameter $\lambda$ (the learning algorithm minimizes the empirical risk only), we can no longer guarantee a finite stability.
![Page 18: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/18.jpg)
Algorithmic Stability Bound for Regression Estimation (C9.2 - 7/8)
Let us consider a $\beta_m$-stable algorithm $A$ such that $\beta_m \le \eta m^{-1}$.
From Thm. 5.32,

$$R[A(z)] \le R_{emp}[A(z), z] + \frac{2\eta}{m} + \sqrt{\frac{2 (4\eta + b)^2 \ln(1/\delta)}{m}}$$

with probability at least $1 - \delta$.
This is an amazingly tight generalization error bound whenever $\eta \ll \sqrt{m}$, because the expression is then dominated by the square-root term.
Moreover, this provides practical guidance on the possible values of the trade-off parameter $\lambda$. From (5.19),
m
regardless of the empirical term Remp[A(z),z]
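To get a feeling for the size of the two correction terms, the sketch below evaluates a bound of the form $R_{emp} + 2\eta/m + \sqrt{2(4\eta + b)^2 \ln(1/\delta)/m}$ for a few sample sizes. The function name and all numeric values ($R_{emp} = 0.05$, $\eta = 1$, $b = 1$, $\delta = 0.05$) are made up for illustration and do not come from the slides.

```python
import math


def stability_bound(r_emp, eta, b, m, delta):
    """Return (total bound, additive correction 2*eta/m, square-root deviation term)."""
    correction = 2.0 * eta / m
    deviation = math.sqrt(2.0 * (4.0 * eta + b) ** 2 * math.log(1.0 / delta) / m)
    return r_emp + correction + deviation, correction, deviation


results = []
for m in (100, 10_000, 1_000_000):
    total, correction, deviation = stability_bound(
        r_emp=0.05, eta=1.0, b=1.0, m=m, delta=0.05
    )
    results.append((m, total, correction, deviation))
    print(f"m={m:>9}  bound={total:.4f}  2*eta/m={correction:.6f}  sqrt term={deviation:.4f}")
```

In this setting the square-root term is much larger than $2\eta/m$ at every sample size, and it only becomes small once $\sqrt{m}$ is large compared with $4\eta + b$.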
![Page 19: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/19.jpg)
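From the new error bound expression,

$$R[A(z)] \le R_{emp}[A(z), z] + \frac{2\eta}{m} + \sqrt{\frac{2 (4\eta + b)^2 \ln(1/\delta)}{m}} .$$

Since we assume the $\varepsilon$-insensitive loss,

$$
\begin{aligned}
R[A(z)] &\le \frac{1}{m} \sum_{i=1}^{m} \max(|t_i - \hat{t}_i| - \varepsilon, 0) + \frac{2\eta}{m} + \sqrt{\frac{2 (4\eta + b)^2 \ln(1/\delta)}{m}} \\
        &= \frac{1}{m} \sum_{i=1}^{m} \max(|t_i - \langle w, x_i \rangle| - \varepsilon, 0) + \ldots \\
        &= \frac{1}{m} \sum_{i=1}^{m} \xi_i + \ldots \\
        &= \frac{1}{m} \boldsymbol{\xi}^{\top} \mathbf{1} + \frac{2\eta}{m} + \sqrt{\frac{2 (4\eta + b)^2 \ln(1/\delta)}{m}} \qquad (C_l = 1).
\end{aligned}
$$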
![Page 20: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/20.jpg)
5.3.2 Algorithmic Stability for Classification
Framework
Training sample:
Hypothesis:
Loss function: we confine ourselves to the zero-one loss, although the following also applies to any loss that takes a finite set of values.
![Page 21: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/21.jpg)
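$\beta_m$-stability
For a given classification algorithm $A$, however, we have $\beta_m \in \{0, 1\}$ only.
$\beta_m = 0$ occurs if, for all training samples $z \in \mathcal{Z}^m$ and all test examples $(x, y) \in \mathcal{Z}$, the zero-one losses of $A(z)$ and $A(z_{\setminus i})$ at $(x, y)$ coincide,
which is only possible if $\mathcal{H}$ contains only one hypothesis.
If we exclude this trivial case, then $\beta_m = 1$ and Thm. 5.32 gives only a trivial result.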
![Page 22: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/22.jpg)
Refined Loss Function (1/2)
In order to circumvent this problem, we consider the real-valued output $f(x)$ and classifiers of the form $h(\cdot) = \mathrm{sign}(f(\cdot))$.
As our ultimate interest is the generalization error of $\mathrm{sign}(f(\cdot))$, consider a loss function on the real-valued output which is an upper bound of the zero-one loss of $\mathrm{sign}(f(\cdot))$.
Advantage of this choice of loss function:
![Page 23: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/23.jpg)
Refined Loss Function (2/2)
Another useful requirement on the refined loss function $l_\tau$ is Lipschitz continuity with a small Lipschitz constant. This can be achieved by adjusting the linear soft margin loss

$$l_{lin}(\hat{t}, y) = \max\{1 - y\hat{t}, 0\}, \qquad y \in \{-1, +1\}:$$

1. Modify this function so that zero loss requires an output of at least $\tau$ on the correct side.
2. The loss function has to pass through 1 for $f(x) = 0$. Thus the slope of the function is $-1/\tau$, and therefore the Lipschitz constant is $1/\tau$.
3. The function should lie in the interval $[0, 1]$ because the zero-one loss never exceeds 1.
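One loss that satisfies the three requirements above is the soft margin loss rescaled by a margin parameter and clipped to $[0, 1]$. The sketch below uses that form as an assumption (the exact definition on the slide is not reproduced here; `refined_loss` and the value `tau = 0.5` are illustrative) and checks the upper-bound and Lipschitz properties on a grid.

```python
def zero_one_loss(f_x, y):
    """1 if sign(f(x)) disagrees with y (ties count as errors), else 0."""
    return 1.0 if y * f_x <= 0 else 0.0


def refined_loss(f_x, y, tau):
    """Soft margin loss rescaled by tau and clipped to the interval [0, 1]."""
    return min(max(1.0 - y * f_x / tau, 0.0), 1.0)


tau = 0.5
grid = [i / 100.0 for i in range(-300, 301)]

# Requirement: the refined loss upper bounds the zero-one loss everywhere.
upper_bounds = all(
    refined_loss(f, y, tau) >= zero_one_loss(f, y)
    for f in grid for y in (-1, 1)
)

# Steepest observed slope between neighbouring grid points (should be 1 / tau).
max_slope = max(
    abs(refined_loss(a, y, tau) - refined_loss(b, y, tau)) / (b - a)
    for a, b in zip(grid, grid[1:]) for y in (-1, 1)
)

print(upper_bounds, max_slope, 1.0 / tau)
```

A smaller margin parameter makes the loss a tighter surrogate for the zero-one loss but increases the Lipschitz constant $1/\tau$, which is the trade-off the stability bound has to balance.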
![Page 24: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/24.jpg)
![Page 25: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/25.jpg)
Algorithmic Stability for Classification (1/3)
• For ! 1, the first term is provably non-increasing whereas the second term is always decreasing
![Page 26: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/26.jpg)
Algorithmic Stability for Classification (2/3)
Consider this theorem for the special case of the linear soft margin SVM for classification (see 2.4.2).
Without loss of generality, assume $\tau = 1$.
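From the error bound for classification,

$$R[\mathrm{sign}(A(z))] \le R_{emp}[A(z), z] + \frac{\kappa^2}{\lambda m \tau^2} + \sqrt{\frac{2 \left( \frac{2\kappa^2}{\lambda \tau^2} + 1 \right)^2 \ln(1/\delta)}{m}} .$$

Since we assume the soft margin loss,

$$
\begin{aligned}
R_{emp}[A(z), z] &\le \frac{1}{m} \sum_{i=1}^{m} \max(1 - y_i \hat{t}_i, 0) \\
                 &= \frac{1}{m} \sum_{i=1}^{m} \max(1 - y_i \langle w, x_i \rangle, 0) \\
                 &= \frac{1}{m} \sum_{i=1}^{m} \xi_i = \frac{1}{m} \boldsymbol{\xi}^{\top} \mathbf{1},
\end{aligned}
$$

so that

$$R[\mathrm{sign}(A(z))] \le \frac{1}{m} \boldsymbol{\xi}^{\top} \mathbf{1} + \frac{\kappa^2}{\lambda m} + \sqrt{\frac{2 \left( \frac{2\kappa^2}{\lambda} + 1 \right)^2 \ln(1/\delta)}{m}} \qquad (\tau = 1).$$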
![Page 27: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/27.jpg)
Algorithmic Stability for Classification (3/3)
This bound provides an interesting model selection criterion, by which we select the value of $\lambda$ (the assumed noise level).
In contrast to the result of Subsection 4.4.3, this bound only holds for the linear soft margin SVM.
The results in this section are so recent that no empirical studies have yet been carried out.
![Page 28: 5.3 Algorithmic Stability Bounds](https://reader036.vdocument.in/reader036/viewer/2022070412/56814b99550346895db87991/html5/thumbnails/28.jpg)
Algorithmic Stability for Classification (4/4)