

WHAT DOES R² MEASURE?

Steve Thomson, University of Kentucky Computing Center

The coefficient of determination, or squared multiple correlation coefficient, R², is perhaps the most extensively used measure of goodness of fit for regression models. In my consulting, many people have asked questions like: "My R² was .6. That's good, isn't it?", or "My R² was only .05. That's bad, isn't it?". Thus many people seem to use high values of R² as an index of fit, concluding that if R² is above some value, they have good fit, but that small values indicate bad fit. In either case above, probably the best response is "Compared to what?".

One interpretation of R² is as the limit of the regression sum of squares divided by the limit of the total sum of squares. Or equivalently, it is an estimate of the amount of variation in observation means compared to the variation in means plus the variation in residual error. This can be small even for models with small error, or large for models with large error.

I. Review:

The general linear model can be interpreted as investigating how well an observed n×1 vector, y, can be represented in some lower dimensional subspace spanned by the columns of the matrix of regressors. (Recall that the span of a set of vectors is the set of all linear combinations of the vectors, i.e. a subspace.) The coefficients of the vectors determining the points in the subspace are the parameters. The least squares solution for the parameters are the parameter values corresponding to the point ŷ in the subspace closest to y.

The various sums of squares in ANOVA can be represented as (squared) Euclidean distances between subspaces spanned by different groups of columns in the regressor matrix.

[Figure 1. Geometry of ANOVA. (Note that distances are square roots of the indicated sums of squares.)]

The total sum of squares, SST, is the (squared) distance from y to the origin. The error sum of squares, SSE, is the (squared) distance from y to ŷ in the span of A and B. The model sum of squares, SS(A,B), is the (squared) distance from ŷ to the origin. By the Pythagorean theorem,

$$SST = SS(A,B) + SSE.$$

So heuristically, a good model has a large SS(A,B) compared to SST. This seems to justify the use of their ratio as an index of fit, namely:

$$R^2 = \frac{SS(A,B)}{SST} = \frac{\text{model sum of squares}}{\text{total sum of squares}}.$$
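As a quick numerical illustration, here is a minimal Python sketch (the data are simulated and purely hypothetical) checking the Pythagorean decomposition and the ratio:

import numpy as np

# Verify SST = SS(A,B) + SSE for a least squares fit, with SST taken as the
# squared distance from y to the origin, as in Figure 1.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # columns spanning the model subspace
y = X @ np.array([2.0, 1.5]) + rng.normal(size=n)      # illustrative response

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]        # least squares solution
y_hat = X @ beta_hat                                   # projection of y onto span(X)

SST = y @ y                                            # squared distance of y from the origin
SS_model = y_hat @ y_hat                               # squared distance of y_hat from the origin
SSE = (y - y_hat) @ (y - y_hat)                        # squared distance from y to y_hat

print(np.isclose(SST, SS_model + SSE))                 # True: the Pythagorean theorem
print("R^2 =", SS_model / SST)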

The model sum of squares, SS(A,B), could also be partitioned into sequential sums of squares. One possible partition would be

$$SS(A,B) = SS(A) + SS(B|A),$$

where both terms on the right are squared distances, and thus positive. The SS(B|A) is the sum of squares for B adjusted for, or orthogonalized to, A. So comparing the R² for the submodel with only A's as regressors to the R² for the complete model with A's and B's:

$$R_A^2 = \frac{SS(A)}{SST} \;\le\; \frac{SS(A) + SS(B|A)}{SST} = R^2.$$

Clearly R² has to increase as variables are added to the model (or at least not decrease). Also, note from the geometry that 0 ≤ R² ≤ 1.
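A small sketch of this monotonicity (simulated, illustrative data; the added regressor B is pure noise, unrelated to y):

import numpy as np

def r_squared(X, y):
    # Uncorrected R^2 = SS(model)/SST for the least squares fit of y on X.
    y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return (y_hat @ y_hat) / (y @ y)

rng = np.random.default_rng(1)
n = 40
A = np.column_stack([np.ones(n), rng.normal(size=n)])  # submodel regressors
B = rng.normal(size=(n, 1))                            # an extra, irrelevant regressor
y = A @ np.array([1.0, -0.5]) + rng.normal(size=n)

print(r_squared(A, y) <= r_squared(np.hstack([A, B]), y))  # True: R^2 cannot drop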

If the model includes an intercept or constant term, it is particularly easy to adjust, or orthogonalize, to this constant: merely subtract the column mean from each column. Most analyses are performed on such "corrected" models. One last simple observation from the geometry is that there are an uncountably infinite number of subspaces with R² = 1, namely, any subspace that includes y.

Other results on R²:

1. If true replicates are added to the data, so that more than one y value occurs at each regressor value, then R² will often decrease (e.g. Healy, 1984, Draper, 1984). In fact, when there are true replicates its achievable upper bound will be less than 1.0. In practice, the results for close near-replicates are often similar.

2. For $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \epsilon_i$, with the $\epsilon_i$ independent homogeneous normal random variables with mean 0, when $\beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0$, it is easy to show that the expected value of R², using corrected sums of squares, is $(p-1)/(n-1)$ (e.g. Crocker, 1972, Seber, 1977, pg 115). A similar result holds for R² using uncorrected sums of squares ($p/n$). In fact, under these assumptions, R² follows a beta distribution. (A simulation sketch of the expectation follows this list.)
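As a check on result 2, a minimal Python simulation, assuming an intercept plus p − 1 irrelevant standard normal regressors and illustrative values of n and p:

import numpy as np

rng = np.random.default_rng(2)
n, p, reps = 20, 4, 5000            # p parameters, including the intercept
r2 = np.empty(reps)
for i in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    y = rng.normal(size=n)          # true slopes all zero
    y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    r2[i] = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(r2.mean(), (p - 1) / (n - 1))  # both near 0.158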

It would seem that a good fit index for comparing completely different regression equations should be optimized by the least squares equation, should be free of the measurement units, and should be large for models with small residual error and small for models with large residual error. Finally, the regressor variables are usually considered as fixed, so a fit index should be independent of their actual values. R² satisfies the first two criteria, but fails on the last two.

II. An Alternative Definition:

An alternative definition of R² that has been recommended (e.g. Draper & Smith, 1981) is


$$R_c^2 = \big(\mathrm{Corr}(y, \hat{y})\big)^2,$$

the squared correlation between the regressand (response) and the predicted value (written R²_c here to distinguish it from the first definition). It is well known that for a linear model "corrected" for the intercept or constant term, estimated by ordinary least squares (so corrected sums of squares are used in the first definition), the two definitions are equivalent. For uncorrected models, R²_c is easily computed in SAS PROC REG or GLM by OUTPUTing the predicted values and computing correlations in PROC CORR, and finally squaring the result.
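The recipe above is in SAS; purely as an illustration, the same computation can be sketched in Python. For an uncorrected model the two definitions need not agree (simulated data, with a no-intercept line fitted to data that actually have an intercept):

import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(1.0, 5.0, size=30)
y = 4.0 + 0.5 * x + rng.normal(scale=0.5, size=30)  # the true line has an intercept

b = (x @ y) / (x @ x)                               # no-intercept least squares slope
y_hat = b * x                                       # "OUTPUT" the predicted values

r2_ratio = (y_hat @ y_hat) / (y @ y)                # first definition, uncorrected sums of squares
r2_corr = np.corrcoef(y, y_hat)[0, 1] ** 2          # second definition: correlate, then square
print(r2_ratio, r2_corr)                            # the two values differ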

To illustrate a possible problem with the second definition, suppose below that o's denote observed values, and p's denote predicted values from the least squares equation, from a simple regression without an intercept. Then using the second definition, R²_c = 1.

[Figure 2. Observed values (o) and predicted values (p) from a no-intercept simple regression; the two sequences follow the same pattern but lie far apart.]

Yet the predicted values, the ŷ's, are most certainly not doing a good job of predicting the observed values. Of course, in this case, any linear function of the observed x's will give an R²_c = 1. It is slightly more complicated for multiple regression problems, but generally, even if R²_c = 1, there is no requirement that the least squares hyperplane must be "close" to y. That is, for any b and nonzero a,

$$\big(\mathrm{Corr}(\hat{y}, y)\big)^2 = \big(\mathrm{Corr}(a\hat{y} + b, y)\big)^2.$$

Since we could generate a hyperplane passing through zero and any specified value, that suggests that unless the possible choices of coefficients are restricted in some way, R²_c is not explicitly a good measure of "fit".
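A numerical sketch of the Figure 2 situation and of the invariance above (hypothetical data: y is exactly affine in x, but the fitted line is forced through the origin):

import numpy as np

x = np.linspace(1.0, 5.0, 9)
y = 10.0 + 0.5 * x                     # observed values: a line with a large intercept
b = (x @ y) / (x @ x)                  # least squares slope through the origin
y_hat = b * x                          # predicted values

print(np.corrcoef(y, y_hat)[0, 1] ** 2)             # 1.0: "perfect" by the second definition
print(np.max(np.abs(y - y_hat)))                    # about 7.2: the predictions miss y badly
print(np.corrcoef(y, 3.0 * y_hat - 7.0)[0, 1] ** 2) # unchanged for any a*y_hat + b, a != 0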

III. Limiting Behavior of R²:

The definition of R² used in SAS procedures seems to make more sense,

$$R^2 = \frac{SSM}{SST},$$

by default using corrected sums of squares if the model has an intercept, uncorrected if the model has no intercept. But that particular ratio is exactly what R² is measuring, not necessarily the quality of fit.

Suppose a general linear model is appropriate, i.e. the n×1 regressand vector, y, is appropriately modeled as y = Xβ + ε, where the εᵢ's are stochastically independent with mean 0 and variance σ².

For convenience, write the orthogonal projection matrix $P_X = X(X^t X)^{-} X^t$, and, for j denoting the n×1 vector of 1's, $P_1 = \frac{1}{n} J$, an n×n matrix with all elements 1/n. Since j is a column of X, there is a g so that j = Xg. Also then $P_X P_1 = P_1 = P_1 P_X$.

Thus,

$$R^2 = \frac{SSM_c}{SST_c} = \frac{(P_X y - P_1 y)^t (P_X y - P_1 y)}{y^t y - y^t P_1 y}.$$

By the law of large numbers, $\mathrm{plim}\,\frac{1}{n} j^t \epsilon = 0$, and likewise $\mathrm{plim}\,\frac{1}{n} X^t \epsilon = 0$.


As n gets large, with $X^t X / n \to B_0$, say, then

$$\mathrm{plim}\,\frac{y^t P_X y}{n} \;=\; \mathrm{plim}\left(\frac{\beta^t X^t X \beta}{n} + \frac{2\,\beta^t X^t \epsilon}{n} + \frac{\epsilon^t P_X \epsilon}{n}\right) \;=\; \beta^t B_0 \beta.$$

Thus,

$$\mathrm{plim}\, R^2 \;=\; \frac{\beta^t B \beta}{\beta^t B \beta + \sigma^2},$$

where, now, B is the limiting, centered moment matrix of the regressors. Thus R² will be large if σ² is small (i.e. good fit), or if βᵗBβ is large (i.e. large variation among mean values).

Or again, with correct specification, R² can also be interpreted as

$$\lim_{n \to \infty} \frac{\beta^t X^t (P_X - P_1) X \beta / n}{\beta^t X^t (P_X - P_1) X \beta / n + \sigma^2},$$

the numerator being, basically, the corrected regression sum of squares per observation.
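A simulation sketch of this limit in the one-regressor case, where βᵗBβ reduces to β₁²τ² with τ² the limiting variance of the regressor (all values below are assumed for illustration):

import numpy as np

rng = np.random.default_rng(4)
beta1, tau, sigma = 1.0, 2.0, 1.0
n = 200_000                                        # large n to approximate the limit
x = rng.normal(scale=tau, size=n)
y = 3.0 + beta1 * x + rng.normal(scale=sigma, size=n)

X = np.column_stack([np.ones(n), x])
y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

limit = beta1**2 * tau**2 / (beta1**2 * tau**2 + sigma**2)
print(r2, limit)                                   # both near 0.8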

IV. So what does R² measure?

For a correctly specified model, R² tends toward the inverse of

$$1 + \frac{\sigma^2}{\beta^t B \beta},$$

where, again, B is analogous to a sample covariance matrix of regressors for the intercept case, and a moment matrix for the no-intercept case. In either case, the βᵗBβ is a reasonable regression sum of squares, but is not much of an indicator of fit. If one has perfect fit, σ² = 0, and R² will be 1. Of course, R² will also tend to 1 as the variation in observation means gets large.

In practice, this means that if two experimenters choose equal sample sizes from the same process, the one who chooses his regressor variables to have more variation will tend to have a higher R². For example, in an economic context, suppose some process remains true over the years and satisfies the standard regression assumptions. Then an economist who uses monthly data from that process will tend to have a lower R² than one who uses yearly data, since generally the yearly data will have more variation. Similarly, suppose two nutritionists are modeling weight gain as a function of calories and prior weight. All else being equal, the nutritionist who chooses his sample to have more variation in calories and prior weight will tend to have a higher R². In an educational context, with a true model, a researcher who samples across schools in a state could be expected to have a larger R² than a researcher who samples schools in a certain county.
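A sketch of the two-experimenter comparison (assumed process y = 2 + 0.8x + N(0, 1); only the spread of the regressor differs between the two samples):

import numpy as np

rng = np.random.default_rng(5)

def r2_of_fit(x, y):
    X = np.column_stack([np.ones(x.size), x])
    y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

n = 500
x_narrow = rng.uniform(0.0, 1.0, n)                     # e.g. schools in one county
x_wide = rng.uniform(0.0, 10.0, n)                      # e.g. schools across the state
y_narrow = 2.0 + 0.8 * x_narrow + rng.normal(size=n)    # same process, same sigma
y_wide = 2.0 + 0.8 * x_wide + rng.normal(size=n)

print(r2_of_fit(x_narrow, y_narrow))   # small, roughly 0.05
print(r2_of_fit(x_wide, y_wide))       # large, roughly 0.84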

Actually this is not surprising. Under the usual assumptions, the variance of the ordinary least squares estimates of the β's is $\sigma^2 (X^t X)^{-1}$. So large variation in the X's tends to reduce the variance of the estimates of the β's. Thus more variability in the X's increases efficiency. But R² is apparently not usually interpreted as a measure of efficiency, but rather fit. However, R² is not quite a measure of efficiency either. For any set of regressors, R² would also tend to increase as the β's get large, a change which would have no effect on the precision of the estimates.


The fact that R² is not quite a measure of fit is not surprising. It is just the squared correlation between the response variable y and a linear combination of the regressor variables. Correlation measures how much variables vary (linearly) together. So R² has to depend on the regressors as much as on the response variable.

Note that asymptotically, as n increases and the number of regressors (including the intercept), p, remains constant, R² has the same limit as the adjusted R² of SAS PROC REG:

$$R_{adj}^2 = 1 - (1 - R^2)\,\frac{n^*}{n - p},$$

where n* = n − 1 for a corrected model with an intercept, and n for an uncorrected model. This is meant to adjust for the fact that R² always increases as regressors are added to the model. R²_adj only increases if the t-statistic for the added variable is greater than one in absolute magnitude. The same remarks would apply to the other finitely corrected adjusted R²'s, including, for example, Amemiya's prediction criterion (Amemiya, 1985):

$$R_{pc}^2 = 1 - (1 - R^2)\,\frac{n + p}{n - p},$$

designed to correct still further for the tendency of even R²_adj to choose models with many variables. But again, these have the same asymptotic behavior as R².
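A small sketch of both corrections for a corrected model with an intercept (so n* = n − 1), showing that with p fixed they converge to R² as n grows:

def r2_adj(r2, n, p):
    # Adjusted R^2 as in SAS PROC REG, corrected model with an intercept.
    return 1 - (1 - r2) * (n - 1) / (n - p)

def r2_pc(r2, n, p):
    # Amemiya's prediction criterion form of adjusted R^2.
    return 1 - (1 - r2) * (n + p) / (n - p)

for n in (20, 200, 2000):
    print(n, r2_adj(0.6, n, 5), r2_pc(0.6, n, 5))  # both approach 0.6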

V. So again, what does R² measure?

Even in a correctly specified model, R² is primarily an estimate of the relative size of the variation in observation means and the population variance. Within a single set of possible regressors, it could be a useful index of fit to compare various submodels. But models with small error, and thus good fit, can have a high R² or a small R², depending on the observation means, i.e. on the values of the regressor variables. So if R² is large it does not necessarily indicate good fit, while R² small does not necessarily indicate bad fit.

SAS is the registered trademark of SAS Institute, Inc., Cary, NC, USA.

For further information contact:
Steve Thomson
University of Kentucky Computing Center
128 McVey Hall
Lexington, KY 40506-0045


REFERENCES

Amemiya, T. (1985), Advanced Econometrics. Cambridge, Massachusetts: Harvard University Press.

Barrett, J.P. (1974), "The Coefficient of Determination - Some Limitations", The American Statistician, 28, 19-20.

Chang, P.C. and Afifi, A.A. (1987), "Goodness of Fit Statistics for General Linear Regression Equations in the Presence of Replicated Response", The American Statistician, 41, 195-199.

Crocker, D.C. (1972), "Some Interpretations of the Multiple Correlation Coefficient", The American Statistician, 26, 31-33.

Draper, N.R. (1984), "The Box-Wetz Criterion Versus R²", Journal of the Royal Statistical Society, Series A, 147, 100-103.

Draper, N.R. (1985), "Corrections: The Box-Wetz Criterion Versus R²", Journal of the Royal Statistical Society, Series A, 148, 357.

Draper, N.R. and Smith, H. (1981), Applied Regression Analysis, 2nd Ed. New York: Wiley.

Healy, M.J.R. (1984), "The Use of R² as a Measure of Goodness of Fit", Journal of the Royal Statistical Society, Series A, 147, 608-609.

Kvalseth, T.O. (1985), "Cautionary Note about R²", The American Statistician, 39, 279-285.

Ranney, G.B. and Thigpen, C.C. (1981), "The Sample Coefficient of Determination in Simple Linear Regression", The American Statistician, 35, 152-153.

Seber, G.A.F. (1977), Linear Regression Analysis. New York: Wiley.

Theil, H. and Chung, C.-F. (1988), "Information-Theoretic Measures of Fit for Univariate and Multivariate Linear Regressions", The American Statistician, 42, 249-252.

Willett, J.B. and Singer, J.D. (1988), "Another Cautionary Note about R²: Its Use in Weighted Regression Analysis", The American Statistician, 42, 236-238.
