Excess Risk Decomposition
David S. Rosenberg
Bloomberg ML EDU
September 29, 2017
David S. Rosenberg (Bloomberg ML EDU) September 29, 2017 1 / 27
Excess Risk Decomposition
Error Decomposition
$$f^* = \operatorname*{argmin}_{f} \mathbb{E}\,\ell(f(X), Y)$$

$$f_{\mathcal{F}} = \operatorname*{argmin}_{f \in \mathcal{F}} \mathbb{E}\,\ell(f(X), Y)$$

$$\hat{f}_n = \operatorname*{argmin}_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \ell(f(x_i), y_i)$$
Approximation Error (of F) = R(fF)−R(f ∗)
Estimation error (of f̂n in F) = R(f̂n)−R(fF)
Excess Risk
Definition: The excess risk compares the risk of f to that of the Bayes optimal f∗:
Excess Risk(f ) = R(f )−R(f ∗)
Can excess risk ever be negative?
Excess Risk Decomposition for ERM
The excess risk of the ERM f̂n can be decomposed:
$$\text{Excess Risk}(\hat{f}_n) = R(\hat{f}_n) - R(f^*) = \underbrace{R(\hat{f}_n) - R(f_{\mathcal{F}})}_{\text{estimation error}} + \underbrace{R(f_{\mathcal{F}}) - R(f^*)}_{\text{approximation error}}.$$
Approximation Error
Approximation error R(fF) − R(f∗) is:
a property of the class F;
the penalty for restricting to F (rather than considering all possible functions).
A bigger F means smaller approximation error.
Concept check: Is approximation error a random or non-random variable?
Estimation Error
Estimation error R(f̂n)−R(fF)
is the performance hit for choosing f using finite training data;
is the performance hit for minimizing empirical risk rather than true risk.
With a smaller F we expect smaller estimation error.
Under typical conditions: “With infinite training data, estimation error goes to zero.”
Concept check: Is estimation error a random or non-random variable?
ERM Overview
Given a loss function ℓ : A × Y → R.
Choose a hypothesis space F.
Use an optimization method to find the ERM f̂n ∈ F:

$$\hat{f}_n = \operatorname*{argmin}_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \ell(f(x_i), y_i).$$
The data scientist’s job:
choose F to balance approximation and estimation error;
as we get more training data, use a bigger F.
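As a minimal illustration (not from the slides), here is ERM over a tiny hypothesis class of 1-D threshold classifiers, where the argmin can be taken exactly by brute force; the data-generating setup and all names are assumptions for this sketch:

```python
# ERM over F = {x -> 1[x > t] : t in a grid}, solved exactly by brute force.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=200)
# Labels: P(y=1 | x > 0.5) = 0.9, else 0.1 (same flavor as the slides' example).
y = (rng.uniform(size=200) < np.where(x > 0.5, 0.9, 0.1)).astype(int)

def empirical_risk(t):
    """Average 0-1 loss of the threshold classifier 1[x > t] on the sample."""
    return np.mean((x > t).astype(int) != y)

# Brute-force the argmin over the grid: exact ERM for this (finite) class.
thresholds = np.linspace(0, 1, 101)
t_hat = thresholds[np.argmin([empirical_risk(t) for t in thresholds])]
```

For richer classes the argmin cannot be brute-forced like this, which is exactly the issue the next slides raise.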
ERM in Practice
We’ve been cheating a bit by writing “argmin”.
In practice, we need a method to find f̂n ∈ F.
For nice choices of loss functions and classes F, we can get arbitrarily close to a minimizer.
But that takes time – is it worth it?
For some hypothesis spaces (e.g. neural networks), we don’t know how to find f̂n ∈ F.
Optimization Error
In practice, we don’t find the ERM f̂n ∈ F.
We find f̃n ∈ F that we hope is good enough.
Optimization error: if f̃n is the function our optimization method returns and f̂n is the empirical risk minimizer, then
Optimization Error = R(f̃n)−R(f̂n).
Can optimization error be negative? Yes!
But on the training sample we always have

$$\hat{R}(\tilde{f}_n) - \hat{R}(\hat{f}_n) \geq 0.$$
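A sketch of why the empirical gap is nonnegative: f̂n minimizes empirical risk by definition, so any f̃n returned by a cruder optimizer can only tie or do worse on the training sample. The setup below (threshold classifiers, seed, helper names) is illustrative, not from the slides:

```python
# f~ comes from a crude optimizer (coarse grid); f^ from an exact one
# (a threshold at every sample point realizes every possible empirical
# behavior of the class 1[x > t]). The empirical gap is >= 0 by construction.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(size=300)
y = (rng.uniform(size=300) < np.where(x > 0.5, 0.9, 0.1)).astype(int)

def emp_risk(t):
    return np.mean((x > t).astype(int) != y)

coarse = np.linspace(0, 1, 5)                # crude optimizer -> f~
exact = np.concatenate(([0.0], np.sort(x)))  # every candidate split -> f^
t_tilde = min(coarse, key=emp_risk)
t_hat = min(exact, key=emp_risk)

gap = emp_risk(t_tilde) - emp_risk(t_hat)    # always >= 0
```

On *true* risk, by contrast, the crude f̃ can get lucky and beat f̂, which is why optimization error can be negative.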
Error Decomposition in Practice
Excess risk decomposition for function f̃n returned by algorithm:
$$\text{Excess Risk}(\tilde{f}_n) = R(\tilde{f}_n) - R(f^*) = \underbrace{R(\tilde{f}_n) - R(\hat{f}_n)}_{\text{optimization error}} + \underbrace{R(\hat{f}_n) - R(f_{\mathcal{F}})}_{\text{estimation error}} + \underbrace{R(f_{\mathcal{F}}) - R(f^*)}_{\text{approximation error}}$$
Concept check: It would be nice to have a concrete example where we find an f̃n and look at its error decomposition. Why is this usually impossible?
But we could construct an artificial example, where we know PX×Y and f∗ and fF...
Excess Risk Decomposition: Example
A Simple Classification Problem
Y = {blue, orange}
PX = Uniform([0,1]²)
P(orange | x1 > x2) = 0.9
P(orange | x1 < x2) = 0.1
Bayes error rate = 0.1
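A quick Monte Carlo check of this distribution and its Bayes error rate (the `sample` helper and the seed are assumed names, not from the slides):

```python
# X ~ Uniform([0,1]^2); P(orange | x1 > x2) = 0.9, else 0.1.
import numpy as np

def sample(n, rng):
    X = rng.uniform(size=(n, 2))                      # X ~ Uniform([0,1]^2)
    p_orange = np.where(X[:, 0] > X[:, 1], 0.9, 0.1)  # P(orange | x)
    y = (rng.uniform(size=n) < p_orange).astype(int)  # 1 = orange
    return X, y

rng = np.random.default_rng(0)
X, y = sample(200_000, rng)
bayes_pred = (X[:, 0] > X[:, 1]).astype(int)  # Bayes: predict orange iff x1 > x2
bayes_err = np.mean(bayes_pred != y)          # close to the Bayes error rate 0.1
```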
Binary Decision Trees on R2
Consider a binary tree on {(X1,X2) | X1,X2 ∈ R}
From An Introduction to Statistical Learning, with applications in R (Springer, 2013) with permission from the authors: G. James, D. Witten,T. Hastie and R. Tibshirani.
Hypothesis Space: Decision Tree
F = {all decision tree classifiers on [0,1]²}
Fd = {all decision tree classifiers on [0,1]² with depth ⩽ d}
We’ll consider
F1 ⊂ F2 ⊂ F3 ⊂ F4 ⊂ ··· ⊂ F15
Bayes error rate = 0.1
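One way to realize the nested classes Fd in code is scikit-learn's `max_depth` parameter; note that the greedy CART fit only approximates the ERM within each Fd, so this is a sketch (with an assumed seed), not the slides' exact procedure:

```python
# Nested hypothesis spaces F_1 ⊂ F_2 ⊂ ... via max_depth.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(size=(1024, 2))
y = (rng.uniform(size=1024) < np.where(X[:, 0] > X[:, 1], 0.9, 0.1)).astype(int)

trees = {d: DecisionTreeClassifier(max_depth=d, random_state=0).fit(X, y)
         for d in (1, 2, 3, 4, 15)}
train_err = {d: np.mean(t.predict(X) != y) for d, t in trees.items()}
# Deeper trees can only lower *training* error; true risk need not follow.
```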
Theoretical Best in F1
The risk minimizer in F1 has risk = P(error) = 0.3.
Approximation error = 0.3 − 0.1 = 0.2.
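A Monte Carlo sanity check (with an assumed seed and sample size) that a best-in-F1 stump, e.g. "predict orange iff x1 > 0.5", has risk 0.3: in each half of the square the majority class has average conditional probability 0.7, so P(error) = 0.3.

```python
# Estimate the risk of the stump 1[x1 > 0.5] under the slides' distribution.
import numpy as np

rng = np.random.default_rng(0)
n = 500_000
X = rng.uniform(size=(n, 2))
y = (rng.uniform(size=n) < np.where(X[:, 0] > X[:, 1], 0.9, 0.1)).astype(int)

stump_pred = (X[:, 0] > 0.5).astype(int)  # a best-in-F_1 classifier
risk_f1 = np.mean(stump_pred != y)        # close to 0.30
approx_err_f1 = risk_f1 - 0.1             # close to 0.20
```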
Theoretical Best in F2
The risk minimizer in F2 has risk = P(error) = 0.2.
Approximation error = 0.2 − 0.1 = 0.1.
Theoretical Best in F3
The risk minimizer in F3 has risk = P(error) = 0.15.
Approximation error = 0.15 − 0.1 = 0.05.
Theoretical Best in F4
The risk minimizer in F4 has risk = P(error) = 0.125.
Approximation error = 0.125 − 0.1 = 0.025.
Decision Tree in F3 Estimated From Sample (n = 1024)
$$R(\tilde{f}) = P(\text{error}) = 0.176 \pm .004$$

$$\text{Estimation Error} + \text{Optimization Error} = \underbrace{0.176 \pm .004}_{R(\tilde{f})} - \underbrace{0.150}_{\min_{f \in \mathcal{F}_3} R(f)} = .026 \pm .004$$
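The shape of this experiment can be reproduced roughly as below (seeds and helper names are assumptions, so the numbers will not match the slide exactly): fit a depth-3 tree on n = 1024 samples, then estimate its true risk by Monte Carlo on a large fresh sample.

```python
# Fit f~ in F_3 on a finite sample; estimate R(f~) on fresh data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def sample(n, rng):
    X = rng.uniform(size=(n, 2))
    y = (rng.uniform(size=n) < np.where(X[:, 0] > X[:, 1], 0.9, 0.1)).astype(int)
    return X, y

rng = np.random.default_rng(0)
X_train, y_train = sample(1024, rng)
f_tilde = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

X_test, y_test = sample(500_000, rng)
risk = np.mean(f_tilde.predict(X_test) != y_test)
est_plus_opt = risk - 0.15  # subtract min over F_3, which the slide gives as 0.15
```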
Decision Tree in F4 Estimated From Sample (n = 1024)
$$R(\tilde{f}) = P(\text{error}) = 0.144 \pm .005$$

$$\text{Estimation Error} + \text{Optimization Error} = \underbrace{0.144 \pm .005}_{R(\tilde{f})} - \underbrace{0.125}_{\min_{f \in \mathcal{F}_4} R(f)} = .019 \pm .005$$
Decision Tree in F6 Estimated From Sample (n = 1024)
$$R(\tilde{f}) = P(\text{error}) = 0.148 \pm .007$$

$$\text{Estimation Error} + \text{Optimization Error} = \underbrace{0.148 \pm .007}_{R(\tilde{f})} - \underbrace{0.106}_{\min_{f \in \mathcal{F}_6} R(f)} = .042 \pm .007$$
Decision Tree in F8 Estimated From Sample (n = 1024)
$$R(\tilde{f}) = P(\text{error}) = 0.162 \pm .009$$

$$\text{Estimation Error} + \text{Optimization Error} = \underbrace{0.162 \pm .009}_{R(\tilde{f})} - \underbrace{0.102}_{\min_{f \in \mathcal{F}_8} R(f)} = .061 \pm .009$$
Decision Tree in F8 Estimated From Sample (n = 2048)
$$R(\tilde{f}) = P(\text{error}) = 0.146 \pm .006$$

$$\text{Estimation Error} + \text{Optimization Error} = \underbrace{0.146 \pm .006}_{R(\tilde{f})} - \underbrace{0.102}_{\min_{f \in \mathcal{F}_8} R(f)} = .045 \pm .006$$
Decision Tree in F8 Estimated From Sample (n = 8192)
$$R(\tilde{f}) = P(\text{error}) = 0.121 \pm .002$$

$$\text{Estimation Error} + \text{Optimization Error} = \underbrace{0.121 \pm .002}_{R(\tilde{f})} - \underbrace{0.102}_{\min_{f \in \mathcal{F}_8} R(f)} = .019 \pm .002$$
Risk Summary
Why do some curves have confidence bands and others not?
Excess Risk Decomposition