Introduction to Machine Learning (cs.kangwon.ac.kr/~leeck/ml/04-2.pdf, 2016. 6. 6.)
TRANSCRIPT
-
Regression
$r = f(x) + \varepsilon$, where $g(x \mid \theta)$ is the estimator and $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, so $p(r \mid x) \sim \mathcal{N}\!\left(g(x \mid \theta), \sigma^2\right)$.

Log likelihood:
$$\mathcal{L}(\theta \mid \mathcal{X}) = \log \prod_{t=1}^{N} p(x^t, r^t) = \log \prod_{t=1}^{N} p(r^t \mid x^t) + \log \prod_{t=1}^{N} p(x^t)$$
2
-
Regression: From LogL to Error
$$\mathcal{L}(\theta \mid \mathcal{X}) = \log \prod_{t=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{\left(r^t - g(x^t \mid \theta)\right)^2}{2\sigma^2}\right]$$
$$= -N \log\!\left(\sqrt{2\pi}\,\sigma\right) - \frac{1}{2\sigma^2} \sum_{t=1}^{N} \left[r^t - g(x^t \mid \theta)\right]^2$$
Maximizing the log likelihood is therefore equivalent to minimizing the error
$$E(\theta \mid \mathcal{X}) = \frac{1}{2} \sum_{t=1}^{N} \left[r^t - g(x^t \mid \theta)\right]^2$$
3
Least Square Estimates
-
Linear Regression
$$g(x^t \mid w_1, w_0) = w_1 x^t + w_0$$
Setting the partial derivatives of the error to zero gives the normal equations:
$$\sum_t r^t = N w_0 + w_1 \sum_t x^t$$
$$\sum_t r^t x^t = w_0 \sum_t x^t + w_1 \sum_t \left(x^t\right)^2$$
In matrix form $\mathbf{A}\mathbf{w} = \mathbf{y}$:
$$\mathbf{A} = \begin{bmatrix} N & \sum_t x^t \\ \sum_t x^t & \sum_t (x^t)^2 \end{bmatrix}, \quad \mathbf{w} = \begin{bmatrix} w_0 \\ w_1 \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} \sum_t r^t \\ \sum_t r^t x^t \end{bmatrix}, \quad \mathbf{w} = \mathbf{A}^{-1}\mathbf{y}$$
4
Maximum likelihood, or equivalently least squares: set the partial derivatives of the error to zero (MLE)
$$E(w_1, w_0 \mid \mathcal{X}) = \frac{1}{2} \sum_{t=1}^{N} \left[r^t - \left(w_1 x^t + w_0\right)\right]^2$$
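The normal equations for the simple linear fit can be solved directly. A minimal NumPy sketch, using hypothetical toy data (the values below are illustrative, not from the slides):

```python
import numpy as np

# Hypothetical noise-free data from r = 2x + 1, so the fit is checkable
x = np.array([0.0, 1.0, 2.0, 3.0])
r = 2.0 * x + 1.0

N = len(x)
# Normal equations in matrix form: A w = y, with w = [w0, w1]^T
A = np.array([[N,       x.sum()],
              [x.sum(), (x ** 2).sum()]])
y = np.array([r.sum(), (r * x).sum()])

w0, w1 = np.linalg.solve(A, y)
print(w0, w1)  # recovers intercept 1 and slope 2
```

`np.linalg.solve` is used instead of forming `A`-inverse explicitly, which is the numerically preferred way to apply $\mathbf{w} = \mathbf{A}^{-1}\mathbf{y}$.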
-
Polynomial Regression
$$g(x^t \mid w_k, \ldots, w_2, w_1, w_0) = w_k (x^t)^k + \cdots + w_2 (x^t)^2 + w_1 x^t + w_0$$
5
-
Polynomial Regression – cont’d
$$g(x^t \mid w_k, \ldots, w_2, w_1, w_0) = w_k (x^t)^k + \cdots + w_2 (x^t)^2 + w_1 x^t + w_0$$
$$\mathbf{w} = \left(\mathbf{D}^{\mathsf{T}}\mathbf{D}\right)^{-1} \mathbf{D}^{\mathsf{T}} \mathbf{r}$$
where $\mathbf{D}$ is the design matrix whose row $t$ is $[1, x^t, (x^t)^2, \ldots, (x^t)^k]$.
6
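The closed form $\mathbf{w} = (\mathbf{D}^{\mathsf{T}}\mathbf{D})^{-1}\mathbf{D}^{\mathsf{T}}\mathbf{r}$ can be sketched in NumPy. The data here is a hypothetical noise-free quadratic, chosen so the recovered weights are checkable:

```python
import numpy as np

# Hypothetical data from r = 1 - 2x + 3x^2 (noise-free, for a verifiable fit)
x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0])
r = 1.0 - 2.0 * x + 3.0 * x ** 2

k = 2  # polynomial order
# Design matrix D: row t is [1, x^t, (x^t)^2, ..., (x^t)^k]
D = np.vander(x, k + 1, increasing=True)

# w = (D^T D)^{-1} D^T r, computed via a least-squares solver for stability
w, *_ = np.linalg.lstsq(D, r, rcond=None)
print(w)  # approximately [1, -2, 3]
```

Using `np.linalg.lstsq` avoids explicitly inverting $\mathbf{D}^{\mathsf{T}}\mathbf{D}$, which becomes ill-conditioned at higher orders.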
-
Other Error Measures
Square Error:
$$E(\theta \mid \mathcal{X}) = \frac{1}{2} \sum_{t=1}^{N} \left[r^t - g(x^t \mid \theta)\right]^2$$
Relative Square Error:
$$E(\theta \mid \mathcal{X}) = \frac{\sum_{t=1}^{N} \left[r^t - g(x^t \mid \theta)\right]^2}{\sum_{t=1}^{N} \left(r^t - \bar{r}\right)^2}$$
Absolute Error:
$$E(\theta \mid \mathcal{X}) = \sum_t \left|r^t - g(x^t \mid \theta)\right|$$
$\epsilon$-sensitive Error:
$$E(\theta \mid \mathcal{X}) = \sum_t \mathbf{1}\!\left(\left|r^t - g(x^t \mid \theta)\right| > \epsilon\right) \left(\left|r^t - g(x^t \mid \theta)\right| - \epsilon\right)$$
7
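The four error measures can be computed directly from targets and predictions. `error_measures` and the toy arrays below are illustrative names and values, not from the slides:

```python
import numpy as np

def error_measures(r, g, eps=0.5):
    """Return (square, relative square, absolute, eps-sensitive) errors
    for targets r and predictions g, as defined on the slide."""
    resid = r - g
    square = 0.5 * np.sum(resid ** 2)
    relative = np.sum(resid ** 2) / np.sum((r - r.mean()) ** 2)
    absolute = np.sum(np.abs(resid))
    mask = np.abs(resid) > eps                    # indicator 1(|r - g| > eps)
    eps_sensitive = np.sum((np.abs(resid) - eps)[mask])
    return square, relative, absolute, eps_sensitive

r = np.array([1.0, 2.0, 3.0])
g = np.array([1.2, 2.0, 2.2])
print(error_measures(r, g))  # (0.34, 0.34, 1.0, 0.3)
```

Note that the $\epsilon$-sensitive error ignores small residuals entirely, which is the same loss used by support vector regression.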
-
Bias and Variance
8
$$E\left[\left(r - g(x)\right)^2 \mid x\right] = \underbrace{E\left[\left(r - E[r \mid x]\right)^2 \mid x\right]}_{\text{noise}} + \underbrace{\left(E[r \mid x] - g(x)\right)^2}_{\text{squared error}}$$
$$E_{\mathcal{X}}\left[\left(E[r \mid x] - g(x)\right)^2 \mid x\right] = \underbrace{\left(E[r \mid x] - E_{\mathcal{X}}[g(x)]\right)^2}_{\text{bias}^2} + \underbrace{E_{\mathcal{X}}\left[\left(g(x) - E_{\mathcal{X}}[g(x)]\right)^2\right]}_{\text{variance}}$$
Estimator $d_i = d(\mathcal{X}_i)$ on sample $\mathcal{X}_i$:
$$\text{Mean square error} = E\left[(d - \theta)^2\right] = \left(E[d] - \theta\right)^2 + E\left[\left(d - E[d]\right)^2\right] = \text{Bias}^2 + \text{Variance}$$
-
Estimating Bias and Variance
$M$ samples $\mathcal{X}_i = \{x^t_i, r^t_i\}$, $i = 1, \ldots, M$, are used to fit $g_i(x)$, $i = 1, \ldots, M$:
$$\bar{g}(x) = \frac{1}{M} \sum_{i=1}^{M} g_i(x)$$
$$\text{Bias}^2(g) = \frac{1}{N} \sum_t \left[\bar{g}(x^t) - f(x^t)\right]^2$$
$$\text{Variance}(g) = \frac{1}{N M} \sum_t \sum_i \left[g_i(x^t) - \bar{g}(x^t)\right]^2$$
9
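The estimation procedure above can be sketched as follows. The choices here are illustrative assumptions: a known $f(x) = \sin x$, Gaussian noise, and a deliberately too-simple linear model class, so the bias term dominates:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sin(x)   # underlying function, assumed known to evaluate the bias

x = np.linspace(0.0, np.pi, 20)   # fixed evaluation points x^t
M, N = 200, len(x)

# Fit g_i on M noisy samples; the model class is a degree-1 polynomial (a line)
g = np.empty((M, N))
for i in range(M):
    r_i = f(x) + rng.normal(0.0, 0.2, size=N)   # sample X_i = {x^t, r^t_i}
    w = np.polyfit(x, r_i, deg=1)
    g[i] = np.polyval(w, x)

g_bar = g.mean(axis=0)                 # average model: (1/M) sum_i g_i(x)
bias2 = np.mean((g_bar - f(x)) ** 2)   # (1/N) sum_t [g_bar(x^t) - f(x^t)]^2
variance = np.mean((g - g_bar) ** 2)   # (1/(N M)) sum_t sum_i [g_i - g_bar]^2
print(bias2, variance)
```

Because a line cannot represent the sine's curvature, the estimated bias² is large while the variance (averaging over the 200 fits) stays small, matching the dilemma discussed next.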
-
Bias/Variance Dilemma Example:
gi(x) = 2 has no variance and high bias
gi(x) = ∑t rti / N has lower bias with higher variance
As we increase complexity,
bias decreases (a better fit to data) and
variance increases (fit varies more with data)
Bias/Variance dilemma: (Geman et al., 1992)
10
-
11
[Figure: the target $f$, the fitted functions $g_i$, and their average $\bar{g}$, illustrating bias and variance]
-
Polynomial Regression
12
[Figure: polynomial fits of increasing order; the best fit minimizes error, with low orders underfitting and high orders overfitting]
-
13
[Figure: error versus model complexity; the best fit is at the "elbow" of the error curve]
-
Model Selection Cross-validation: Measure generalization accuracy by
testing on data unused during training
Regularization: Penalize complex models
E’= error on data + λ model complexity
Akaike's information criterion (AIC): $\mathrm{AIC} = \log L - k$
Bayesian information criterion (BIC): $\mathrm{BIC} = \log L - \frac{k}{2} \log N$
Minimum description length (MDL): shortest description of data
Structural risk minimization (SRM):
14
h is the VC dimension of a model
k: #parameters L: Likelihood
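A sketch of AIC/BIC model selection on hypothetical data, assuming the common conventions $\mathrm{AIC} = \log L - k$ and $\mathrm{BIC} = \log L - \frac{k}{2}\log N$ with $k$ counting all free parameters (the polynomial weights plus the noise variance):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
x = np.linspace(-1.0, 1.0, N)
r = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=N)   # data from an order-1 model

scores = {}
for k in range(4):                                  # candidate polynomial orders 0..3
    w = np.polyfit(x, r, deg=k)
    resid = r - np.polyval(w, x)
    sigma2 = np.mean(resid ** 2)                    # ML estimate of noise variance
    # Gaussian log likelihood at the ML estimates
    logL = -0.5 * N * (np.log(2.0 * np.pi * sigma2) + 1.0)
    p = k + 2                                       # k+1 weights plus sigma^2
    scores[k] = {"AIC": logL - p, "BIC": logL - 0.5 * p * np.log(N)}
    print(k, scores[k])
```

Raising the order beyond 1 barely improves the likelihood, so the complexity penalty makes both criteria favor the simpler models, which is exactly the regularization effect described above.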
-
Bayesian Model Selection
$$p(\text{model} \mid \text{data}) = \frac{p(\text{data} \mid \text{model})\, p(\text{model})}{p(\text{data})}$$
15
Prior on models, p(model)
Regularization, when prior favors simpler models
Bayes, MAP of the posterior, p(model|data)
Average over a number of models with high posterior (voting, ensembles: Chapter 17)
-
Regression example
16
Coefficients increase in magnitude as order increases:
1: [-0.0769, 0.0016]
2: [0.1682, -0.6657, 0.0080]
3: [0.4238, -2.5778, 3.4675, -0.0002]
4: [-0.1093, 1.4356, -5.5007, 6.0454, -0.0019]
Regularization:
$$E(\mathbf{w} \mid \mathcal{X}) = \frac{1}{2} \sum_{t=1}^{N} \left[r^t - g(x^t \mid \mathbf{w})\right]^2 + \lambda \sum_i w_i^2$$
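The regularized error above has a closed-form minimizer: differentiating with respect to $\mathbf{w}$ and setting the gradient to zero gives $(\mathbf{D}^{\mathsf{T}}\mathbf{D} + 2\lambda \mathbf{I})\,\mathbf{w} = \mathbf{D}^{\mathsf{T}}\mathbf{r}$. A sketch on hypothetical data, showing how larger $\lambda$ shrinks the coefficients of a deliberately high-order polynomial:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 20)
r = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.1, size=20)

k = 7                                      # deliberately high polynomial order
D = np.vander(x, k + 1, increasing=True)   # design matrix, rows [1, x, ..., x^k]

max_abs = {}
for lam in (0.0, 1e-3, 1.0):
    # Closed-form ridge solution: w = (D^T D + 2*lam*I)^{-1} D^T r
    w = np.linalg.solve(D.T @ D + 2.0 * lam * np.eye(k + 1), D.T @ r)
    max_abs[lam] = np.abs(w).max()
    print(lam, max_abs[lam])               # larger lambda -> smaller weights
```

This mirrors the coefficient table on the slide: the unregularized high-order fit has large weights, and the penalty $\lambda \sum_i w_i^2$ pulls them back toward zero.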