Bias-Variance in Machine Learning
(wcohen/10-601/bias-variance.pdf)
Bias-Variance: Outline
• Underfitting/overfitting
  – Why are complex hypotheses bad?
• Simple example of bias/variance
• Error as bias + variance for regression
  – Brief comments on how it extends to classification
• Measuring bias, variance, and error
• Bagging: a way to reduce variance
• Bias-variance for classification
Bias/Variance is a Way to Understand Overfitting and Underfitting
[Figure: error/loss on the training set Dtrain and on an unseen test set Dtest, plotted across classifiers ranging from simple ("too simple") to complex ("too complex"); regions of high error are marked.]
Bias-Variance: An Example
Example (Tom Dietterich, Oregon State)
Same experiment, repeated with 50 samples of 20 points each
The true function f can't be fit perfectly with hypotheses from our class H (lines) → Error1. Fix: a more expressive set of hypotheses H.
We don't get the best hypothesis from H because of noise/small sample size → Error2. Fix: a less expressive set of hypotheses H.
Noise is similar to Error1.
Bias-Variance Decomposition: Regression
Bias and variance for regression
• For regression, we can easily decompose the error of the learned model into two parts: bias (error 1) and variance (error 2)
  – Bias: the class of models can't fit the data.
    • Fix: a more expressive model class.
  – Variance: the class of models could fit the data, but doesn't because it's hard to fit.
    • Fix: a less expressive model class.
Bias – Variance decomposition of error
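$$E_{D,\varepsilon}\left\{\left(f(x)+\varepsilon-h_D(x)\right)^2\right\}$$
where f is the true function, ε is the noise, h_D is learned from D, and the expectation is taken over the dataset and the noise.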
Fix a test case x, then do this experiment:
1. Draw a size-n sample D = (x1,y1), …, (xn,yn).
2. Train a linear regressor hD using D.
3. Draw one test example (x, f(x)+ε).
4. Measure the squared error of hD on that example x.
What's the expected error?
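A minimal Monte Carlo sketch of this experiment. It assumes a hypothetical quadratic true function f(x) = x²/10, x drawn uniformly from [0, 10], Gaussian noise with σ = 1, and n = 20; none of these choices come from the slides. Averaging the squared error over many repetitions of steps 1 to 4 approximates the expected error at the fixed x.

```python
import random


def true_f(x):
    # Hypothetical "true function" (an assumption; the slides do not give one).
    return x * x / 10.0


def fit_line(points):
    """Ordinary least-squares fit of y = a + b*x to (x, y) pairs; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b


def expected_sq_error(x_test, n=20, sigma=1.0, trials=5000, seed=0):
    """Monte Carlo estimate of E_{D,eps}[(f(x) + eps - h_D(x))^2] at a fixed x."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # 1. Draw a size-n sample D (x uniform on [0, 10], noise N(0, sigma^2): assumptions)
        xs = [rng.uniform(0, 10) for _ in range(n)]
        D = [(x, true_f(x) + rng.gauss(0, sigma)) for x in xs]
        # 2. Train a linear regressor h_D on D
        a, b = fit_line(D)
        # 3. Draw one test example (x, f(x) + eps)
        y_test = true_f(x_test) + rng.gauss(0, sigma)
        # 4. Squared error of h_D on that example
        total += (y_test - (a + b * x_test)) ** 2
    return total / trials


print(round(expected_sq_error(5.0), 2))
```

Because a straight line cannot follow the quadratic, the estimate stays above the noise level σ² = 1 no matter how many trials are run; the excess is what the decomposition attributes to bias and variance.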
To simplify $E_{D,\varepsilon}\{(f(x)+\varepsilon-h_D(x))^2\}$, introduce some notation:
$$\hat y = \hat y_D \equiv h_D(x), \qquad \bar h \equiv E_D\{h_D(x)\}, \qquad f \equiv f(x)+\varepsilon$$
Here $\bar h$ is the long-term expectation of the learner's prediction on this x, averaged over many datasets D.
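$$\begin{aligned}
E_{D,\varepsilon}\{(f-\hat y)^2\} &= E\{([f-\bar h]+[\bar h-\hat y])^2\}\\
&= E\{[f-\bar h]^2+[\bar h-\hat y]^2+2[f-\bar h][\bar h-\hat y]\}\\
&= E\{[f-\bar h]^2+[\bar h-\hat y]^2+2[f\bar h-f\hat y-\bar h^2+\bar h\hat y]\}\\
&= E[(f-\bar h)^2]+E[(\bar h-\hat y)^2]+2\left(E[f\bar h]-E[f\hat y]-E[\bar h^2]+E[\bar h\hat y]\right)
\end{aligned}$$
The cross term is zero because of two identities (the first uses the independence of ε and D; the second uses the fact that $\bar h$ is a constant):
$$E_{D,\varepsilon}\{(f(x)+\varepsilon)\,E_D\{h_D(x)\}\} = E_{D,\varepsilon}\{(f(x)+\varepsilon)\,h_D(x)\}, \quad\text{i.e.}\quad E[f\bar h]=E[f\hat y]$$
$$E_{D,\varepsilon}\{E_D\{h_D(x)\}\,E_D\{h_D(x)\}\} = E_{D,\varepsilon}\{E_D\{h_D(x)\}\,h_D(x)\}, \quad\text{i.e.}\quad E[\bar h^2]=E[\bar h\hat y]$$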
$$E_{D,\varepsilon}\{(f-\hat y)^2\} = E\{([f-\bar h]+[\bar h-\hat y])^2\} = E\{[f-\bar h]^2+[\bar h-\hat y]^2+2[f-\bar h][\bar h-\hat y]\} = \underbrace{E[(f-\bar h)^2]}_{\text{BIAS}^2}+\underbrace{E[(\bar h-\hat y)^2]}_{\text{VARIANCE}}$$
• BIAS²: the squared difference between the best possible prediction for x, f(x), and our "long-term" expectation of what the learner will do if we averaged over many datasets D, ED[hD(x)]
• VARIANCE: the expected squared difference between our long-term expectation of the learner's prediction, ED[hD(x)], and what we get in a representative run on a single dataset D ($\hat y$)
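Because $f \equiv f(x)+\varepsilon$ contains the noise, the BIAS² term above also absorbs the noise. A short worked step makes this explicit, assuming (as is standard, though the slides do not state it) that ε has mean zero and variance σ²; $\bar h$ is a constant, since it already averages over D:

$$E_{D,\varepsilon}[(f-\bar h)^2] = E_\varepsilon[(f(x)-\bar h+\varepsilon)^2] = (f(x)-\bar h)^2 + 2(f(x)-\bar h)\,E[\varepsilon] + E[\varepsilon^2] = (f(x)-\bar h)^2 + \sigma^2$$

So the expected error at x is $(f(x)-\bar h)^2$ (bias²) plus $E[(\bar h-\hat y)^2]$ (variance) plus $\sigma^2$ (noise).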
[Figure: bias and variance, illustrated at x = 5.]
Bias-variance decomposition
• This is something real that you can (approximately) measure experimentally
  – if you have synthetic data
• Different learners and model classes have different tradeoffs
  – large bias/small variance: few features, highly regularized, highly pruned decision trees, large-k k-NN…
  – small bias/high variance: many features, less regularization, unpruned trees, small-k k-NN…
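With synthetic data we can draw as many training sets as we like, so bias² and variance can be estimated directly. A sketch for k-NN regression, assuming a hypothetical true function f(x) = sin(2πx), x uniform on [0, 1], Gaussian noise with σ = 0.3, and 200 training sets of 50 points each (all illustrative choices). Small k should come out with low bias and high variance, large k with the reverse.

```python
import math
import random


def f(x):
    # Hypothetical true function (an assumption for illustration).
    return math.sin(2 * math.pi * x)


def knn_predict(train, x, k):
    """1-D k-NN regression: average y of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def knn_bias_variance(k, n=50, sigma=0.3, datasets=200, seed=0):
    """Average bias^2 and variance of k-NN over fixed test points, estimated by
    drawing many synthetic training sets D (only possible with synthetic data)."""
    rng = random.Random(seed)
    test_xs = [i / 20 for i in range(1, 20)]     # fixed test points in (0, 1)
    preds = {x: [] for x in test_xs}
    for _ in range(datasets):
        xs = [rng.random() for _ in range(n)]
        D = [(x, f(x) + rng.gauss(0, sigma)) for x in xs]
        for x in test_xs:
            preds[x].append(knn_predict(D, x, k))
    bias2 = var = 0.0
    for x, ps in preds.items():
        mean_pred = sum(ps) / len(ps)           # estimate of E_D[h_D(x)]
        bias2 += (f(x) - mean_pred) ** 2
        var += sum((p - mean_pred) ** 2 for p in ps) / len(ps)
    return bias2 / len(test_xs), var / len(test_xs)


for k in (1, 5, 25):
    b2, v = knn_bias_variance(k)
    print(f"k={k:2d}  bias^2={b2:.3f}  variance={v:.3f}")
```

The same seed is used for every k, so all settings are evaluated on the same training sets.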
Bias-Variance Decomposition: Classification
A generalization of bias-variance decomposition to other loss functions
• "Arbitrary" real-valued loss L(y,y'), where t is the observed label of x and y = hD(x) is the learner's prediction, subject to L(y,y') = L(y',y), L(y,y) = 0, and L(y,y') ≠ 0 if y ≠ y'
• Define "optimal prediction": y* = argmin_{y'} Et[L(t,y')]
• Define "main prediction of learner": ym = ym,D = argmin_{y'} ED[L(y,y')]
• Define "bias of learner": Bias(x) = L(y*,ym)
• Define "variance of learner": Var(x) = ED[L(ym,y)]
• Define "noise for x": N(x) = Et[L(t,y*)]
Claim: ED,t[L(t,y)] = c1·N(x) + Bias(x) + c2·Var(x). For squared loss, c1 = c2 = 1; for two-class 0/1 loss, c1 = 2·PrD[y = y*] − 1 and c2 = 1 if ym = y*, −1 otherwise. Here m = |D| is the size of the training set.
Domingos, A Unified Bias-Variance Decomposition and its Applications, ICML 2000
For 0/1 loss, the main prediction is the most common class predicted by hD(x), weighting h's by Pr(D)
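The two-class 0/1 form of the claim can be checked numerically at a single x. The sketch below describes the learner's prediction y = hD(x) and the observed label t only by their class probabilities and assumes t is independent of D; the function names are hypothetical.

```python
def zero_one(a, b):
    """0/1 loss: 0 if the labels agree, 1 otherwise."""
    return 0 if a == b else 1


def domingos_terms(p_pred, p_label):
    """Two-class 0/1 loss at a single x.

    p_pred:  {class: Pr_D[h_D(x) = class]}, the distribution of the prediction y over D
    p_label: {class: Pr_t[t = class]}, the distribution of the observed label t
    Assumes t is independent of D. Returns (E_{D,t}[L(t, y)], c1*N + Bias + c2*Var)."""
    classes = sorted(set(p_pred) | set(p_label))
    assert len(classes) == 2, "this form of the claim is for two classes"
    pp = {c: p_pred.get(c, 0.0) for c in classes}
    pl = {c: p_label.get(c, 0.0) for c in classes}
    expected_loss = sum(pp[y] * pl[t] * zero_one(t, y) for y in classes for t in classes)
    # y* = argmin_y' E_t[L(t, y')] and y_m = argmin_y' E_D[L(y, y')]
    y_star = min(classes, key=lambda c: sum(pl[t] * zero_one(t, c) for t in classes))
    y_m = min(classes, key=lambda c: sum(pp[y] * zero_one(y, c) for y in classes))
    noise = sum(pl[t] * zero_one(t, y_star) for t in classes)      # N(x)
    bias = zero_one(y_star, y_m)                                   # Bias(x)
    var = sum(pp[y] * zero_one(y_m, y) for y in classes)           # Var(x)
    c1 = 2 * pp[y_star] - 1
    c2 = 1 if y_m == y_star else -1
    return expected_loss, c1 * noise + bias + c2 * var


print(domingos_terms({"a": 0.7, "b": 0.3}, {"a": 0.9, "b": 0.1}))  # unbiased: y_m = y*
print(domingos_terms({"a": 0.2, "b": 0.8}, {"a": 0.9, "b": 0.1}))  # biased: y_m != y*
```

In both calls the two returned numbers agree, which is what the claim predicts.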
Bias and variance
• For classification, we can also decompose the error of a learned classifier into two terms: bias and variance
  – Bias: the class of models can't fit the data.
    – Fix: a more expressive model class.
  – Variance: the class of models could fit the data, but doesn't because it's hard to fit.
    – Fix: a less expressive model class.
Bias-Variance Decomposition: Measuring
• This is something real that you can (approximately) measure experimentally
  – if you have synthetic data
  – …or if you're clever
  – You need to somehow approximate ED{hD(x)}
  – I.e., construct many variants of the dataset D
Background: “Bootstrap” sampling
• Input: dataset D
• Output: many variants of D: D1, …, DT
• For t = 1, …, T:
  – Dt = { }
  – For i = 1, …, |D|:
    • Pick (x,y) uniformly at random from D (i.e., with replacement) and add it to Dt
• Some examples never get picked (~37%): a given example is missed by all |D| draws with probability (1 − 1/|D|)^|D| ≈ 1/e ≈ 0.368
• Some are picked 2x, 3x, …
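The bootstrap loop above translates almost line for line into Python (a sketch; the function names and seed handling are illustrative). Sampling indices rather than examples makes it easy to check the ~37% figure.

```python
import random


def bootstrap_sample(D, rng):
    """One bootstrap variant D_t: |D| picks from D, uniformly at random with replacement."""
    return [rng.choice(D) for _ in range(len(D))]


def bootstrap_variants(D, T, seed=0):
    """Bootstrap variants D_1, ..., D_T of the dataset D."""
    rng = random.Random(seed)
    return [bootstrap_sample(D, rng) for _ in range(T)]


# Fraction of examples never picked: use indices so every example is distinct.
n = 10_000
picked = set(bootstrap_sample(list(range(n)), random.Random(1)))
never_picked = 1 - len(picked) / n
print(f"never picked: {never_picked:.3f} (1/e is about 0.368)")
```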
Measuring Bias-Variance with “Bootstrap” sampling
• Create B bootstrap variants of D (approximating many draws of D)
• For each bootstrap dataset:
  – Tb is the dataset; Ub are the "out of bag" examples
  – Train a hypothesis hb on Tb
  – Test hb on each x in Ub
• Now for each (x,y) example we have many predictions h1(x), h2(x), …, so we can estimate (ignoring noise):
  – variance: the ordinary variance of h1(x), …, hn(x)
  – bias: average(h1(x), …, hn(x)) − y
Applying Bias-Variance Analysis
• By measuring the bias and variance on a problem, we can determine how to improve our model
  – If bias is high, we need to allow our model to be more complex
  – If variance is high, we need to reduce the complexity of the model
• Bias-variance analysis also suggests a way to reduce variance: bagging (later)
Bagging
Bootstrap Aggregation (Bagging)
• Use the bootstrap to create B variants of D
• Learn a classifier from each variant
• Vote the learned classifiers to predict on a test example
Bagging (bootstrap aggregation)
• Breaking it down:
  – input: dataset D and YFCL (your favorite classifier learner)
  – output: a classifier hD-BAG
  – use the bootstrap to construct variants D1, …, DT
  – for t = 1, …, T: train YFCL on Dt to get ht
  – to classify x with hD-BAG:
    • classify x with h1, …, hT and predict the most frequently predicted class for x (majority vote)
Note that you can use any learner you like!
You can also test ht on the "out of bag" examples.
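A minimal sketch of this bagging loop. Since any learner can be plugged in, a hypothetical one-dimensional decision stump stands in for YFCL here; the names `train_stump` and `bag`, the toy data, and the default T = 25 are assumptions.

```python
import random
from collections import Counter


def train_stump(data):
    """Hypothetical stand-in for YFCL: the single threshold on a 1-D x (and the
    label on each side of it) with the fewest training errors."""
    best = None
    for thr, _ in data:
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= thr else right) != y for x, y in data)
            if best is None or errors < best[0]:
                best = (errors, thr, left, right)
    _, thr, left, right = best
    return lambda x: left if x <= thr else right


def bag(D, learner, T=25, seed=0):
    """Bagging: train `learner` on T bootstrap variants of D, predict by majority vote."""
    rng = random.Random(seed)
    hs = []
    for _ in range(T):
        D_t = [rng.choice(D) for _ in range(len(D))]   # bootstrap variant D_t
        hs.append(learner(D_t))                       # h_t = learner trained on D_t

    def h_bag(x):
        votes = Counter(h(x) for h in hs)             # classify x with h_1, ..., h_T
        return votes.most_common(1)[0][0]             # most frequently predicted class
    return h_bag


D = [(i / 10, int(i >= 50)) for i in range(100)]      # toy data: label 1 iff x >= 5.0
h = bag(D, train_stump)
print([h(x) for x in (0.5, 4.0, 6.0, 9.5)])
```

`bag` only assumes that `learner` maps a list of (x, y) pairs to a function from x to a class, which is what lets any learner be used.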
Experiments (Freund and Schapire)
Bagged, minimally pruned decision trees
Generally, bagged decision trees eventually outperform the linear classifier if the data is large enough and clean enough.
Bagging (bootstrap aggregation)
• Experimentally:
  – especially with minimal pruning, decision trees have low bias but high variance
  – bagging usually improves performance for decision trees and similar methods
  – it reduces variance without increasing the bias (much)
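The variance reduction is easy to see in simulation. This sketch swaps in 1-nearest-neighbor regression (another low-bias, high-variance learner) for the decision trees on the slide and, since it is a regression problem, averages the bagged predictions instead of voting. The true function, the noise level σ = 0.3, and all sizes are hypothetical choices; the code estimates the bias and variance of the prediction at x = 0.5 over many training sets.

```python
import math
import random
import statistics


def f(x):
    # Hypothetical true function (an assumption for illustration).
    return math.sin(2 * math.pi * x)


def nn_predict(train, x):
    """1-nearest-neighbor regression: low bias, high variance."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def bagged_nn_predict(train, x, rng, T=25):
    """Average the 1-NN predictions of T bootstrap variants (regression version of voting)."""
    preds = []
    for _ in range(T):
        D_t = [rng.choice(train) for _ in range(len(train))]
        preds.append(nn_predict(D_t, x))
    return statistics.fmean(preds)


def single_vs_bagged(x0=0.5, n=30, sigma=0.3, datasets=300, seed=0):
    """Bias and variance at x0 of plain vs. bagged 1-NN, over many synthetic training sets."""
    rng = random.Random(seed)
    single, bagged = [], []
    for _ in range(datasets):
        xs = [rng.random() for _ in range(n)]
        D = [(x, f(x) + rng.gauss(0, sigma)) for x in xs]
        single.append(nn_predict(D, x0))
        bagged.append(bagged_nn_predict(D, x0, rng))
    return {
        "single": (statistics.fmean(single) - f(x0), statistics.pvariance(single)),
        "bagged": (statistics.fmean(bagged) - f(x0), statistics.pvariance(bagged)),
    }


for name, (bias, var) in single_vs_bagged().items():
    print(f"{name:6s}  bias={bias:+.3f}  variance={var:.3f}")
```

Each bootstrap variant omits roughly 37% of the examples, so the bagged prediction mixes the labels of several nearby points instead of relying on the single nearest one; that averaging is where the variance reduction comes from.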
More detail on bias-variance and bagging for classification
(Thanks to Tom Dietterich, MLSS 2014)
More detail on Domingos’s model
• Noisy channel: yi = noise(f(xi))
  – f(xi) is the true label of xi
  – the noise process noise(.) may change y → y'
• h = hD is the learned hypothesis
  – from D = {(x1,y1), …, (xm,ym)}
• For a test case (x*,y*) and predicted label h(x*), the loss is L(h(x*),y*)
  – For instance, L(h(x*),y*) = 1 if error, else 0
• We want to decompose ED,P{L(h(x*),y*)}, where m is the size of D and (x*,y*) ~ P
• Main prediction of learner is ym(x*)
  – ym(x*) = argmin_{y'} ED,P{L(h(x*),y')}
  – for 0/1 loss, ym(x*) is the "most common" hD(x*) among all possible D's, weighted by Pr(D)
• Bias is B(x*) = L(ym(x*), f(x*))
  – main prediction vs. true label
  – this is 0/1, not a random variable
• Variance is V(x*) = ED,P{L(hD(x*), ym(x*))}
  – this hypothesis vs. main prediction
• Noise is N(x*) = L(y*, f(x*))
  – true label vs. observed label
Case analysis of error
Analysis of error: unbiased case
Main prediction is correct
[Figure annotations: "Noise but no variance"; "Variance but no noise"]
Analysis of error: biased case
Main prediction is wrong
[Figure annotations: "Noise and variance"; "No noise, no variance"]
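The two case slides can be reproduced by enumerating the four noise/variance combinations at one test point x* in a two-class problem. In this sketch (names hypothetical), "noise" means the observed label y* differs from f(x*), "variance" means hD(x*) differs from the main prediction ym(x*), and "biased" means ym(x*) differs from f(x*).

```python
def is_error(biased, noisy, varies):
    """Two-class 0/1 loss at one test point x*.
    biased: main prediction y_m(x*) differs from the true label f(x*)
    noisy:  observed label y* differs from f(x*)
    varies: this hypothesis h_D(x*) differs from y_m(x*)"""
    f = 0                               # true label; the other class is 1
    y_m = 1 - f if biased else f        # main prediction
    y_obs = 1 - f if noisy else f       # observed (possibly noisy) label y*
    h = 1 - y_m if varies else y_m      # prediction of this particular h_D
    return h != y_obs


for biased in (False, True):
    print("biased case" if biased else "unbiased case")
    for noisy in (False, True):
        for varies in (False, True):
            print(f"  noise={noisy!s:5}  variance={varies!s:5}  error={is_error(biased, noisy, varies)}")
```

With only two classes, two "flips" cancel out, which is why noise and variance together cause no error in the unbiased case but do in the biased case.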
Analysis of error: overall
Interaction terms are usually small
Hopefully we'll be in this case more often, if we've chosen a good classifier
Analysis of error: without noise (which is hard to estimate anyway)
As with regression, we can approximately measure bias and variance experimentally with bootstrap replicates.
Variance is typically broken down into biased variance, Vb, and unbiased variance, Vu.
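For two-class 0/1 loss with noise ignored, the claim (c2 = 1 on unbiased examples, c2 = −1 on biased ones) means that the error averaged over test examples equals B + Vu − Vb, where Vu and Vb sum V(x) over the unbiased and the biased examples respectively, each divided by the total number of examples. Variance on unbiased examples adds to the error; variance on biased examples subtracts from it. A sketch computing these terms from bootstrap predictions (the function name and data layout are assumptions):

```python
from collections import Counter


def bias_variance_01(preds_per_x, labels):
    """Two-class 0/1 loss, noise ignored.
    preds_per_x[i]: predictions h_1(x_i), ..., h_B(x_i) from bootstrap replicates
    labels[i]:      true label of x_i
    Returns (error, B, Vu, Vb), each averaged over all test examples."""
    n = len(labels)
    error = bias = v_u = v_b = 0.0
    for preds, y in zip(preds_per_x, labels):
        y_m = Counter(preds).most_common(1)[0][0]         # main prediction
        b = int(y_m != y)                                  # bias B(x): 0 or 1
        v = sum(p != y_m for p in preds) / len(preds)      # variance V(x)
        error += sum(p != y for p in preds) / len(preds)
        bias += b
        if b:
            v_b += v        # biased example: variance lowers the error
        else:
            v_u += v        # unbiased example: variance raises the error
    return error / n, bias / n, v_u / n, v_b / n


err, B, Vu, Vb = bias_variance_01([[1, 1, 0, 1], [0, 0, 1, 0], [1, 0, 0, 0]], [1, 1, 0])
print(err, B + Vu - Vb)
```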
K-NN Experiments
Tree Experiments
Tree “stump” experiments (depth 2)
Bias is reduced (!)
Large tree experiments (depth 10)
Bias is not changed much
Variance is reduced