Introduction to Machine Learning (cs.kangwon.ac.kr/~leeck/ml/04-2.pdf)


  • Regression

    r^t = f(x^t) + ε,   ε ~ N(0, σ²)

    p(r^t | x^t) ~ N(g(x^t|θ), σ²)

    g(x^t|θ): estimator of f

    Maximum likelihood:

    L(θ|X) = log ∏_{t=1}^{N} p(x^t, r^t)
           = log ∏_{t=1}^{N} p(r^t | x^t) + log ∏_{t=1}^{N} p(x^t)

  • Regression: From LogL to Error

    L(θ|X) = log ∏_{t=1}^{N} (1 / (√(2π) σ)) exp[ -(r^t - g(x^t|θ))² / (2σ²) ]

           = -N log(√(2π) σ) - (1 / (2σ²)) ∑_{t=1}^{N} [r^t - g(x^t|θ)]²

    Since the first term and the factor 1/σ² do not depend on θ, maximizing the log likelihood is equivalent to minimizing the sum of squared errors:

    E(θ|X) = (1/2) ∑_{t=1}^{N} [r^t - g(x^t|θ)]²

    → Least Square Estimates
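    This equivalence can be checked numerically: under Gaussian noise with known σ, the parameter that maximizes L(θ|X) is exactly the one that minimizes E(θ|X). A minimal sketch in Python (the synthetic data, the fixed intercept, and the slope grid are illustrative assumptions, not from the slides):

    import numpy as np

    rng = np.random.default_rng(0)
    N, sigma = 50, 0.3
    x = rng.uniform(0.0, 1.0, N)
    r = 2.0 * x + 0.5 + rng.normal(0.0, sigma, N)      # r^t = f(x^t) + noise

    def log_likelihood(w1, w0):
        # L(θ|X) = -N log(√(2π)σ) - (1/(2σ²)) ∑ (r^t - g(x^t|θ))²
        resid = r - (w1 * x + w0)
        return -N * np.log(np.sqrt(2 * np.pi) * sigma) - np.sum(resid ** 2) / (2 * sigma ** 2)

    def squared_error(w1, w0):
        # E(θ|X) = (1/2) ∑ (r^t - g(x^t|θ))²
        return 0.5 * np.sum((r - (w1 * x + w0)) ** 2)

    # Over a grid of candidate slopes (intercept fixed for illustration),
    # the argmax of the log likelihood equals the argmin of the squared error.
    slopes = np.linspace(1.0, 3.0, 201)
    best_ll = slopes[np.argmax([log_likelihood(w, 0.5) for w in slopes])]
    best_se = slopes[np.argmin([squared_error(w, 0.5) for w in slopes])]
    print(best_ll, best_se)                             # same value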

  • Linear Regression

    g(x^t | w_1, w_0) = w_1 x^t + w_0

    Likelihood or least squares estimates via partial derivatives (MLE):

    E(θ|X) = (1/2) ∑_{t=1}^{N} [r^t - (w_1 x^t + w_0)]²

    Setting the partial derivatives with respect to w_0 and w_1 to zero gives the normal equations:

    ∑_t r^t = N w_0 + w_1 ∑_t x^t

    ∑_t r^t x^t = w_0 ∑_t x^t + w_1 ∑_t (x^t)²

    In matrix form A w = y, with

    A = [ N         ∑_t x^t    ]
        [ ∑_t x^t   ∑_t (x^t)² ],   w = [w_0, w_1]^T,   y = [∑_t r^t, ∑_t r^t x^t]^T

    so that w = A⁻¹ y.
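    The 2x2 system A w = y can be solved directly in code. A minimal sketch of the closed-form fit (the function name and the example data are illustrative):

    import numpy as np

    def fit_linear(x, r):
        # Least-squares fit of g(x | w1, w0) = w1*x + w0 via the normal equations A w = y
        N = len(x)
        A = np.array([[N,         np.sum(x)],
                      [np.sum(x), np.sum(x ** 2)]])
        y = np.array([np.sum(r), np.sum(r * x)])
        w0, w1 = np.linalg.solve(A, y)   # numerically preferable to forming A⁻¹ explicitly
        return w1, w0

    x = np.array([0.0, 1.0, 2.0, 3.0])
    r = np.array([0.9, 3.1, 5.0, 7.2])
    print(fit_linear(x, r))              # slope ≈ 2.08, intercept ≈ 0.93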

  • Polynomial Regression

    g(x^t | w_k, ..., w_2, w_1, w_0) = w_k (x^t)^k + ... + w_2 (x^t)² + w_1 x^t + w_0

  • Polynomial Regression – cont’d

    g(x^t | w_k, ..., w_2, w_1, w_0) = w_k (x^t)^k + ... + w_2 (x^t)² + w_1 x^t + w_0

    With design matrix D, whose row t is [1, x^t, (x^t)², ..., (x^t)^k], and target vector r:

    w = (Dᵀ D)⁻¹ Dᵀ r
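    The closed-form solution w = (Dᵀ D)⁻¹ Dᵀ r maps directly to code once D is built. A minimal sketch (the degree k and the synthetic data are illustrative):

    import numpy as np

    def fit_polynomial(x, r, k):
        # Fit g(x|w) = w0 + w1*x + ... + wk*x^k by least squares: w = (Dᵀ D)⁻¹ Dᵀ r
        D = np.vander(x, k + 1, increasing=True)   # row t: [1, x^t, (x^t)², ..., (x^t)^k]
        return np.linalg.solve(D.T @ D, D.T @ r)   # solve the normal equations; returns [w0, ..., wk]

    x = np.linspace(0.0, 1.0, 20)
    r = np.sin(2 * np.pi * x) + 0.1 * np.random.default_rng(0).normal(size=20)
    print(fit_polynomial(x, r, k=3))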

  • Other Error Measures

    Square Error:

    E(θ|X) = (1/2) ∑_{t=1}^{N} [r^t - g(x^t|θ)]²

    Relative Square Error:

    E(θ|X) = ∑_{t=1}^{N} [r^t - g(x^t|θ)]² / ∑_{t=1}^{N} [r^t - r̄]²

    Absolute Error:

    E(θ|X) = ∑_t |r^t - g(x^t|θ)|

    ε-sensitive Error:

    E(θ|X) = ∑_t 1(|r^t - g(x^t|θ)| > ε) (|r^t - g(x^t|θ)| - ε)
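    These measures differ only in how the residuals r^t - g(x^t|θ) are aggregated. A minimal sketch, assuming the predictions g(x^t|θ) have already been computed into an array pred:

    import numpy as np

    def square_error(r, pred):
        return 0.5 * np.sum((r - pred) ** 2)

    def relative_square_error(r, pred):
        return np.sum((r - pred) ** 2) / np.sum((r - np.mean(r)) ** 2)

    def absolute_error(r, pred):
        return np.sum(np.abs(r - pred))

    def eps_sensitive_error(r, pred, eps):
        # residuals within ±eps cost nothing; beyond eps the cost grows linearly
        resid = np.abs(r - pred)
        return np.sum(np.where(resid > eps, resid - eps, 0.0))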

  • Bias and Variance

    Estimator d_i = d(X_i) on sample X_i

    Mean square error = E[(d - θ)²] = (E[d] - θ)² + E[(d - E[d])²] = Bias² + Variance

    E[(r - g(x))² | x] = E[(r - E[r|x])² | x] + (E[r|x] - g(x))²
                              noise                 squared error

    E_X[(E[r|x] - g(x))² | x] = (E[r|x] - E_X[g(x)])² + E_X[(g(x) - E_X[g(x)])²]
                                        bias²                   variance

  • Estimating Bias and Variance

    M samples X_i = {x^t_i, r^t_i}, i = 1, ..., M, are used to fit g_i(x), i = 1, ..., M

    ḡ(x) = (1/M) ∑_{i=1}^{M} g_i(x)

    Bias²(g) = (1/N) ∑_t [ḡ(x^t) - f(x^t)]²

    Variance(g) = (1/(N M)) ∑_t ∑_i [g_i(x^t) - ḡ(x^t)]²
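    Both estimates follow directly from the M fitted functions. A minimal sketch, assuming each g_i has already been evaluated on a common grid of N points and stored as a row of an array G (shape M x N), with the true function f evaluated on the same grid:

    import numpy as np

    def bias_variance(G, f_vals):
        # G[i, t] = g_i(x^t) for the M fitted models; f_vals[t] = f(x^t)
        g_bar = G.mean(axis=0)                    # ḡ(x^t) = (1/M) ∑_i g_i(x^t)
        bias2 = np.mean((g_bar - f_vals) ** 2)    # (1/N) ∑_t [ḡ(x^t) - f(x^t)]²
        variance = np.mean((G - g_bar) ** 2)      # (1/(N·M)) ∑_t ∑_i [g_i(x^t) - ḡ(x^t)]²
        return bias2, variance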

  • Bias/Variance Dilemma

    Example:

    g_i(x) = 2 has no variance and high bias

    g_i(x) = ∑_t r^t_i / N has lower bias with higher variance

    As we increase complexity,

    bias decreases (a better fit to data) and

    variance increases (fit varies more with data)

    Bias/Variance dilemma (Geman et al., 1992)
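    The two estimators in the example can be compared by simulation: the constant g_i(x) = 2 never changes across samples (zero variance, high bias), while the sample mean ∑_t r^t_i / N tracks each sample (low bias, nonzero variance). A minimal sketch, with the data distribution (Gaussian, mean 3) chosen purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    M, N, true_mean = 1000, 10, 3.0
    samples = rng.normal(true_mean, 1.0, size=(M, N))   # M samples X_i, each of size N

    const_est = np.full(M, 2.0)        # g_i(x) = 2: identical for every sample
    mean_est = samples.mean(axis=1)    # g_i(x) = ∑_t r^t_i / N

    for name, est in [("constant", const_est), ("sample mean", mean_est)]:
        bias2 = (est.mean() - true_mean) ** 2
        variance = est.var()
        print(name, round(bias2, 3), round(variance, 3))
    # constant:    bias² = 1.0,  variance = 0.0
    # sample mean: bias² ≈ 0.0,  variance ≈ 0.1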

  • (Figure: the individual fits g_i scatter around their average ḡ; the gap between ḡ and the true function f illustrates bias, and the spread of the g_i around ḡ illustrates variance.)

  • Polynomial Regression

    (Figure: error versus polynomial order; the best fit is at the minimum error, with underfitting at lower orders and overfitting at higher orders.)

  • (Figure: error versus model complexity; the best fit is chosen at the "elbow" of the error curve.)

  • Model Selection

    Cross-validation: measure generalization accuracy by testing on data unused during training

    Regularization: penalize complex models

    E' = error on data + λ · model complexity

    Akaike's information criterion (AIC)

    Bayesian information criterion (BIC)

    Minimum description length (MDL): shortest description of the data

    Structural risk minimization (SRM): uses h, the VC dimension of a model

    (k: number of parameters, L: likelihood)
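    Both information criteria are computed from a model's maximized log likelihood and its parameter count; in the widely used forms (an assumption here, since the slide does not show the formulas) AIC = 2k - 2 log L and BIC = k log N - 2 log L, and lower values are better. A minimal sketch with illustrative log-likelihood values:

    import numpy as np

    def aic_bic(log_L, k, N):
        # Standard forms: AIC = 2k - 2 log L, BIC = k log N - 2 log L (lower is better)
        return 2 * k - 2 * log_L, k * np.log(N) - 2 * log_L

    # Compare two hypothetical fitted models on N = 50 training examples
    print(aic_bic(log_L=-120.4, k=3, N=50))
    print(aic_bic(log_L=-118.9, k=6, N=50))   # better fit, but penalized for more parameters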

  • Bayesian Model Selection

    p(model | data) = p(data | model) p(model) / p(data)

    Prior on models, p(model)

    Regularization, when prior favors simpler models

    Bayes, MAP of the posterior, p(model|data)

    Average over a number of models with high posterior (voting, ensembles: Chapter 17)

  • Regression example

    Coefficients increase in magnitude as order increases:

    1: [-0.0769, 0.0016]
    2: [0.1682, -0.6657, 0.0080]
    3: [0.4238, -2.5778, 3.4675, -0.0002]
    4: [-0.1093, 1.4356, -5.5007, 6.0454, -0.0019]

    Regularization:

    E(w|X) = (1/2) ∑_{t=1}^{N} [r^t - g(x^t|w)]² + λ ∑_i w_i²
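    This regularized error also has a closed-form minimizer: differentiating with respect to w gives Dᵀ(D w - r) + 2λ w = 0, i.e. w = (Dᵀ D + 2λ I)⁻¹ Dᵀ r (ridge regression). A minimal sketch, reusing the polynomial design matrix from earlier; the degree and the value of λ are illustrative:

    import numpy as np

    def fit_polynomial_ridge(x, r, k, lam):
        # Minimize (1/2) ∑_t [r^t - g(x^t|w)]² + lam * ∑_i w_i² in closed form
        D = np.vander(x, k + 1, increasing=True)
        I = np.eye(k + 1)
        # dE/dw = Dᵀ(D w - r) + 2*lam*w = 0  =>  (Dᵀ D + 2*lam*I) w = Dᵀ r
        return np.linalg.solve(D.T @ D + 2 * lam * I, D.T @ r)

    x = np.linspace(0.0, 1.0, 20)
    r = np.sin(2 * np.pi * x) + 0.1 * np.random.default_rng(0).normal(size=20)
    print(fit_polynomial_ridge(x, r, k=6, lam=0.0))   # λ = 0: ordinary least squares
    print(fit_polynomial_ridge(x, r, k=6, lam=0.1))   # λ > 0 shrinks the coefficients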