non-parametric regression: the loess...


Post on 01-Feb-2021


  • Non-parametric Regression: the Loess Method

    LOWESS (also written LOESS) is an acronym for LOcally WEighted Scatterplot Smoothing (Cleveland).

    For i = 1, ..., n, the ith measurement $y_i$ of the response y and the corresponding measurement $\mathbf{x}_i$ of the vector x of p predictors are related by

    $$y_i = g(\mathbf{x}_i) + \varepsilon_i,$$

    where g is the regression function and $\varepsilon_i$ is a random error.

    Idea: g(x) can be locally approximated by a parametric function. This local approximation is obtained by fitting a regression surface to the data points within a chosen neighborhood of the point x.
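Below is a minimal Python sketch of this idea. The slides themselves use SAS; Python is used here only for illustration. It estimates g at a point x0 by fitting an ordinary least-squares line to the points within a fixed distance of x0. This is a simplification: it assumes a fixed radius and applies no distance weighting. The function `local_line_fit` and the test curve g(x) = x² are hypothetical.

```python
def local_line_fit(xs, ys, x0, radius):
    """Estimate g(x0) from an ordinary least-squares line fitted only to the
    points with |x_i - x0| <= radius (no distance weighting in this sketch)."""
    pts = [(x - x0, y) for x, y in zip(xs, ys) if abs(x - x0) <= radius]
    if len(pts) < 2:
        raise ValueError("need at least two points in the neighborhood")
    n = len(pts)
    # The predictor is centered at x0, so the line's value at u = 0 is the fit at x0.
    mu = sum(u for u, _ in pts) / n
    my = sum(y for _, y in pts) / n
    suu = sum((u - mu) ** 2 for u, _ in pts)
    suy = sum((u - mu) * (y - my) for u, y in pts)
    if suu == 0:
        raise ValueError("all neighborhood points share the same x")
    slope = suy / suu
    return my - slope * mu  # line evaluated at u = x - x0 = 0


# Hypothetical noise-free data from y_i = g(x_i) with g(x) = x**2.
xs = [i / 20 for i in range(41)]  # 0.00, 0.05, ..., 2.00
ys = [x ** 2 for x in xs]
estimate = local_line_fit(xs, ys, x0=1.0, radius=0.12)
print(estimate)  # close to g(1.0) = 1
```

Only the five points between 0.9 and 1.1 enter the fit, so the straight line tracks the curve closely at x0. The small remaining bias comes from the curvature of g inside the neighborhood.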

  • In the LOESS (LOWESS) method, weighted least squares is used to fit linear or quadratic functions of the predictors at the centers of neighborhoods.

    The radius of each neighborhood is chosen so that the neighborhood contains a specified fraction of the data points. This fraction, called the smoothing parameter, controls the smoothness of the estimated surface.

    Data points in a given local neighborhood are weighted by a smooth decreasing function of their distance from the center of the neighborhood.

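    Distance between the ith and hth points with 2 predictors, $(X_{i1}, X_{i2})$ and $(X_{h1}, X_{h2})$: the Euclidean distance is generally used,

    $$d_i = \sqrt{(X_{i1} - X_{h1})^2 + (X_{i2} - X_{h2})^2},$$

    and the weights are defined by the tri-cube function

    $$w_i = \begin{cases} \left[1 - (d_i/d_q)^3\right]^3 & \text{if } d_i < d_q, \\ 0 & \text{otherwise,} \end{cases}$$

    where $d_q$ is the radius of the neighborhood that contains the fraction q of the data points.

    q is chosen between 0 and 1, often between 0.4 and 0.6.

    Large q: smoother, but possibly too smooth.

    Small q: too rough.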
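The distance and tri-cube formulas above translate directly into code. The Python sketch below makes two assumptions. First, the neighborhood holds $\lfloor qn \rfloor$ points; this count agrees with the "Points in Local Neighborhood" values in the SAS output. Second, $d_q$ is the distance from the center to the farthest of those points. The helper names are hypothetical. The distance function accepts any number of predictors and reduces to the two-predictor formula when each point is a pair.

```python
import math


def euclidean(p, h):
    """Euclidean distance between predictor vectors p and h of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, h)))


def tricube_weight(d, d_q):
    """Tri-cube weight [1 - (d/d_q)^3]^3 if d < d_q, else 0."""
    return (1.0 - (d / d_q) ** 3) ** 3 if d < d_q else 0.0


def loess_weights(points, center, q):
    """Weights of all points for a neighborhood centered at `center`.

    Assumption: the neighborhood contains floor(q * n) points and d_q is the
    distance from `center` to the farthest of them.
    """
    k = max(1, math.floor(q * len(points)))
    dists = [euclidean(p, center) for p in points]
    d_q = sorted(dists)[k - 1]
    if d_q == 0:
        raise ValueError("neighborhood radius is zero; increase q")
    return [tricube_weight(d, d_q) for d in dists]


# Hypothetical two-predictor points (X_i1, X_i2); the center is the first point.
points = [(0, 0), (1, 0), (0, 1), (2, 2), (3, 0)]
w = loess_weights(points, center=(0, 0), q=0.8)
print([round(x, 4) for x in w])
```

With q = 0.8 and five points, the neighborhood holds four points, so $d_q = \sqrt{8}$. The point at exactly that distance, (2, 2), gets weight 0, as does the point beyond it. The two points at distance 1 share the same weight.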

  • Example: SINGLE Variable LOESS (our Tree data)

    [Figure: single-variable LOESS example for the tree data; horizontal axis: diameter]

  • ODS output from SAS

    Height versus breast diameter for selected redwood trees

    The LOESS Procedure

    Independent Variable Scaling

    Scaling applied: None

    Statistic        diameter
    Minimum Value     5.80000
    Maximum Value    29.80000

  • Height versus breast diameter for selected redwood trees

    The LOESS Procedure

    Dependent Variable: height

    Optimal Smoothing Criterion

    AICC        Smoothing Parameter
    1.91149     0.95833
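The table above reports the AICC value at the selected smoothing parameter. The Python sketch below shows how such a criterion can pick q from a grid. It uses hypothetical, slightly perturbed data and a one-predictor local linear fit with tri-cube weights.

The AICC definition assumed here is $\log(\hat\sigma^2) + 1 + 2(\mathrm{tr}\,L + 1)/(n - \mathrm{tr}\,L - 2)$ with $\hat\sigma^2 = \mathrm{RSS}/n$, where L is the smoother matrix (fitted values $= L\,y$). The fits here are computed directly at every $x_i$ rather than through PROC LOESS's kd tree, so numbers from this sketch would not match SAS exactly. The function names are hypothetical.

```python
import math


def tricube(u):
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def smoother_row(xs, x0, q):
    """Coefficients l_j such that the local linear fit at x0 equals sum_j l_j * y_j."""
    k = max(2, math.floor(q * len(xs)))
    d = [abs(x - x0) for x in xs]
    d_q = sorted(d)[k - 1]
    w = [tricube(dj / d_q) for dj in d]
    s0 = sum(w)
    s1 = sum(wj * (x - x0) for wj, x in zip(w, xs))
    s2 = sum(wj * (x - x0) ** 2 for wj, x in zip(w, xs))
    det = s0 * s2 - s1 * s1
    # Intercept of the weighted least-squares line in coordinates centered at x0.
    return [wj * (s2 - s1 * (x - x0)) / det for wj, x in zip(w, xs)]


def aicc(xs, ys, q):
    """Return (AICC, trace of L) for the local linear smoother with span q."""
    n = len(xs)
    rows = [smoother_row(xs, x, q) for x in xs]
    fitted = [sum(l * y for l, y in zip(row, ys)) for row in rows]
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    tr = sum(rows[i][i] for i in range(n))
    return math.log(rss / n) + 1 + 2 * (tr + 1) / (n - tr - 2), tr


# Hypothetical data: a smooth curve plus a small repeating perturbation.
xs = [i / 10 for i in range(30)]
ys = [math.sin(2 * x) + 0.1 * (((7 * i) % 5) - 2) / 2 for i, x in enumerate(xs)]

grid = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
scores = {q: aicc(xs, ys, q) for q in grid}
best_q = min(grid, key=lambda q: scores[q][0])
for q in grid:
    print(f"q={q:.1f}  AICC={scores[q][0]:.4f}  trace(L)={scores[q][1]:.2f}")
print("chosen q:", best_q)
```

Small q gives a small residual sum of squares but a large trace of L (many equivalent parameters). Large q does the opposite, and AICC balances the two.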

  • Height versus breast diameter for selected redwood trees

    The LOESS Procedure

    Selected Smoothing Parameter: 0.958

    Dependent Variable: height

    Fit Summary

    Fit Method                         kd Tree
    Blending                           Linear
    Number of Observations             36
    Number of Fitting Points           9
    kd Tree Bucket Size                6
    Degree of Local Polynomials        1
    Smoothing Parameter                0.95833
    Points in Local Neighborhood       34
    Residual Sum of Squares            68.12269
    Trace[L]                           3.21315
    GCV                                0.06337
    AICC                               1.91149
    AICC1                              68.91565
    Delta1                             32.38147
    Delta2                             32.06626
    Equivalent Number of Parameters    2.80778
    Lookup Degrees of Freedom          32.69979
    Residual Standard Error            1.45043
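Several entries in this Fit Summary can be recomputed from the others, which helps when reading the output. The Python sketch below assumes the following definitions, which reproduce the printed numbers to the displayed precision:

- neighborhood size $\lfloor nq \rfloor$;
- AICC $= \log(\mathrm{RSS}/n) + 1 + 2(\mathrm{tr}\,L + 1)/(n - \mathrm{tr}\,L - 2)$;
- GCV $= \mathrm{RSS}/(n - \mathrm{tr}\,L)^2$;
- lookup degrees of freedom $= \Delta_1^2/\Delta_2$;
- residual standard error $= \sqrt{\mathrm{RSS}/\Delta_1}$.

The function name is hypothetical. The inputs are copied from the output above.

```python
import math


def fit_summary_stats(n, q, rss, trace_l, delta1, delta2):
    """Recompute several Fit Summary entries from basic quantities.

    Assumed definitions (they reproduce the printed values):
      points in neighborhood = floor(n * q)
      AICC  = log(RSS / n) + 1 + 2 (tr L + 1) / (n - tr L - 2)
      GCV   = RSS / (n - tr L)^2
      lookup df = Delta1^2 / Delta2
      residual standard error = sqrt(RSS / Delta1)
    """
    return {
        "Points in Local Neighborhood": math.floor(n * q),
        "AICC": math.log(rss / n) + 1 + 2 * (trace_l + 1) / (n - trace_l - 2),
        "GCV": rss / (n - trace_l) ** 2,
        "Lookup Degrees of Freedom": delta1 ** 2 / delta2,
        "Residual Standard Error": math.sqrt(rss / delta1),
    }


# Inputs copied from the Fit Summary (n = 36, smoothing parameter 0.95833).
stats = fit_summary_stats(n=36, q=0.95833, rss=68.12269, trace_l=3.21315,
                          delta1=32.38147, delta2=32.06626)
for name, value in stats.items():
    print(f"{name:30s} {value:.5f}")
```

The recomputed values agree with the printed ones up to rounding of the five-decimal inputs. The same rule gives $\lfloor 15 \times 0.96 \rfloor = 14$ and $\lfloor 36 \times 0.5 \rfloor = 18$, matching "Points in Local Neighborhood" in the two later fits.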

  • Height versus breast diameter for selected redwood trees

    The LOESS Procedure

    Independent Variable Scaling

    Scaling applied: None

    Statistic        diameter
    Minimum Value     0.70000
    Maximum Value     6.10000

  • Height versus breast diameter for selected redwood trees

    The LOESS Procedure

    Smoothing Parameter: 0.96 (possibly too smooth)

    Dependent Variable: height

    Fit Summary

    Fit Method                         kd Tree
    Blending                           Linear
    Number of Observations             15
    Number of Fitting Points           9
    kd Tree Bucket Size                2
    Degree of Local Polynomials        1
    Smoothing Parameter                0.96000
    Points in Local Neighborhood       14
    Residual Sum of Squares            59.93997

  • Height versus breast diameter for selected redwood trees

    The LOESS Procedure

    Independent Variable Scaling

    Scaling applied: None

    Statistic        diameter
    Minimum Value     5.80000
    Maximum Value    29.80000

  • Height versus breast diameter for selected redwood trees

    The LOESS Procedure

    Smoothing Parameter: 0.5

    Dependent Variable: height

    Fit Summary

    Fit Method                         kd Tree
    Blending                           Linear
    Number of Observations             36
    Number of Fitting Points           17
    kd Tree Bucket Size                3
    Degree of Local Polynomials        1
    Smoothing Parameter                0.50000
    Points in Local Neighborhood       18
    Residual Sum of Squares            60.77325
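The Fit Summary entries "Fit Method kd Tree", "Number of Fitting Points", and "Blending Linear" describe a shortcut. Rather than fitting at every observation, the local regression is computed at a small set of fitting points and the results are interpolated.

The Python sketch below is a simplified one-predictor analogue. It assumes evenly spaced fitting points, not actual kd-tree vertices, and uses plain linear interpolation, so it is not PROC LOESS's exact algorithm. The names `local_linear`, `blended_fit`, and the parameter `m` are hypothetical.

```python
import math


def tricube(u):
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear(xs, ys, x0, q):
    """Tri-cube weighted local linear fit at x0 over the floor(q*n) nearest points."""
    k = max(2, math.floor(q * len(xs)))
    d = [abs(x - x0) for x in xs]
    d_q = sorted(d)[k - 1]
    w = [tricube(dj / d_q) for dj in d]
    s0 = sum(w)
    s1 = sum(wj * (x - x0) for wj, x in zip(w, xs))
    s2 = sum(wj * (x - x0) ** 2 for wj, x in zip(w, xs))
    t0 = sum(wj * y for wj, y in zip(w, ys))
    t1 = sum(wj * (x - x0) * y for wj, x, y in zip(w, xs, ys))
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)


def blended_fit(xs, ys, q, m, queries):
    """Fit only at m evenly spaced fitting points, then interpolate linearly."""
    lo, hi = min(xs), max(xs)
    vertices = [lo + (hi - lo) * j / (m - 1) for j in range(m)]
    values = [local_linear(xs, ys, v, q) for v in vertices]
    out = []
    for x in queries:
        # Index of the cell [vertices[j], vertices[j + 1]] that contains x.
        j = min(max(int((x - lo) / (hi - lo) * (m - 1)), 0), m - 2)
        t = (x - vertices[j]) / (vertices[j + 1] - vertices[j])
        out.append((1 - t) * values[j] + t * values[j + 1])
    return out


# Hypothetical data with an exactly linear response.
xs = [0.5 * i for i in range(20)]
ys = [3 * x + 1 for x in xs]
queries = [0.0, 1.3, 4.75, 9.5]
result = blended_fit(xs, ys, q=0.5, m=5, queries=queries)
print(result)
```

With a linear response, every local linear fit is exact, and linear interpolation between exact values stays exact, so the blended curve reproduces 3x + 1. On curved data the blended curve differs slightly from fitting at every point; that small loss of accuracy is what the shortcut trades for speed.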

  • In principle, a local fit is computed at each point at which the regression surface is to be estimated. A faster computational procedure performs the local fitting only at a selected sample of points and then blends the local polynomials to obtain the regression surface. In PROC LOESS these points are the vertices of a kd tree; this is what the "Fit Method kd Tree", "Number of Fitting Points", and "Blending Linear" entries in the Fit Summary refer to.

    The LOESS procedure can be used to perform statistical inference provided the errors are i.i.d. normal random variables with mean 0.

    With iterative reweighting, LOESS can also provide statistical inference when the error distribution is symmetric but not necessarily normal.

    Iterative reweighting also lets you use the LOESS procedure to perform robust fitting in the presence of outliers in the data.

    PROC LOESS in SAS

    Although all output of the LOESS procedure can optionally be displayed, the procedure is most often used to produce output data sets that are then viewed and manipulated by other SAS procedures. PROC LOESS uses the Output Delivery System (ODS) to place results in output data sets. This is a departure from older SAS procedures, which provide OUTPUT statements to create SAS data sets from analysis results.
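The iterative reweighting mentioned above can be sketched with Cleveland's robustness scheme for LOWESS. After each fit, each residual $r_i$ becomes a robustness weight $B(r_i/(6s))$. Here s is the median absolute residual and $B(u) = (1 - u^2)^2$ for $|u| < 1$ (0 otherwise) is the bisquare function. The local fits are then repeated with the tri-cube weights multiplied by these robustness weights.

This bisquare scheme is assumed for illustration; the slides do not say exactly which reweighting PROC LOESS uses. The Python function names and the data, with one injected outlier, are hypothetical.

```python
import math
import statistics


def tricube(u):
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def bisquare(u):
    return (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0


def local_linear(xs, ys, x0, q, rw):
    """Local linear fit at x0; tri-cube weights are multiplied by robustness weights rw."""
    k = max(2, math.floor(q * len(xs)))
    d = [abs(x - x0) for x in xs]
    d_q = sorted(d)[k - 1]
    w = [tricube(dj / d_q) * r for dj, r in zip(d, rw)]
    s0 = sum(w)
    s1 = sum(wj * (x - x0) for wj, x in zip(w, xs))
    s2 = sum(wj * (x - x0) ** 2 for wj, x in zip(w, xs))
    t0 = sum(wj * y for wj, y in zip(w, ys))
    t1 = sum(wj * (x - x0) * y for wj, x, y in zip(w, xs, ys))
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)


def robust_loess(xs, ys, q, iterations=2):
    """Initial fit followed by `iterations` robustness reweighting passes."""
    rw = [1.0] * len(xs)
    fitted = [local_linear(xs, ys, x, q, rw) for x in xs]
    for _ in range(iterations):
        resid = [y - f for y, f in zip(ys, fitted)]
        s = statistics.median([abs(r) for r in resid])
        if s == 0:
            break  # at least half the residuals are zero; nothing to rescale
        rw = [bisquare(r / (6 * s)) for r in resid]
        fitted = [local_linear(xs, ys, x, q, rw) for x in xs]
    return fitted


# Hypothetical data: y = 2x + 1 with a small repeating perturbation,
# plus one gross outlier at x = 10.
xs = [float(i) for i in range(20)]
ys = [2 * x + 1 + 0.1 * (((7 * i) % 5) - 2) for i, x in enumerate(xs)]
ys[10] = 100.0
plain = robust_loess(xs, ys, q=0.8, iterations=0)
robust = robust_loess(xs, ys, q=0.8, iterations=2)
print(f"fit at x=10: plain {plain[10]:.2f}, robust {robust[10]:.2f} (line: 21)")
```

The outlier pulls the plain fit at x = 10 well above the line value 2x + 1 = 21. After the robustness passes the outlier's weight drops to zero, and the fit returns close to 21.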

  • Multiple Predictors:

    This example comes from an experiment that models photolytic damage to paint molecules as a function of exposure time x1, relative humidity x2, ultraviolet filter x3, and temperature x4.

    It is easier to look at two predictors at a time:

    [Figures: for the predictor pairs (x1, x2), (x1, x3), (x1, x4), and (x2, x3), plots of y and of predicted y against the two predictors]
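With several predictors the same recipe applies: Euclidean distance in predictor space, tri-cube weights, and a weighted least-squares fit of a linear function of the predictors at each center. The paint-damage data are not reproduced in these slides, so the Python sketch below uses hypothetical points on a grid and a planar response. The names `solve3` and `loess2_at` are hypothetical.

Because the Euclidean distance mixes the predictors' units, a predictor measured over a much wider range would dominate the neighborhoods unless the predictors are rescaled. The "Independent Variable Scaling" table in the SAS output reports whether such scaling was applied.

```python
import math


def tricube(u):
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def solve3(a, b):
    """Solve the 3x3 linear system a @ beta = b (Gaussian elimination, partial pivoting)."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular local design; try a larger q")
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    beta = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        beta[r] = (m[r][3] - sum(m[r][c] * beta[c] for c in range(r + 1, 3))) / m[r][r]
    return beta


def loess2_at(points, ys, center, q):
    """Local linear fit in two predictors, evaluated at `center`."""
    k = max(3, math.floor(q * len(points)))
    d = [math.dist(p, center) for p in points]
    d_q = sorted(d)[k - 1]
    w = [tricube(dj / d_q) for dj in d]
    # Accumulate X^T W X and X^T W y with design row (1, x1 - c1, x2 - c2);
    # centering at `center` makes the intercept the fitted value there.
    xtwx = [[0.0] * 3 for _ in range(3)]
    xtwy = [0.0] * 3
    for (x1, x2), y, wj in zip(points, ys, w):
        row = (1.0, x1 - center[0], x2 - center[1])
        for r in range(3):
            xtwy[r] += wj * row[r] * y
            for c in range(3):
                xtwx[r][c] += wj * row[r] * row[c]
    return solve3(xtwx, xtwy)[0]


# Hypothetical grid of (x1, x2) values with a planar response.
points = [(float(i), float(j)) for i in range(6) for j in range(6)]
ys = [1 + 2 * x1 - 3 * x2 for x1, x2 in points]
fit = loess2_at(points, ys, center=(2.5, 2.5), q=0.5)
print(fit)
```

For a planar response the local linear fit is exact, giving 1 + 2(2.5) - 3(2.5) = -1.5 at (2.5, 2.5). A local quadratic variant would add the terms (x1 - c1)², (x1 - c1)(x2 - c2), and (x2 - c2)² to the design row and solve a 6 × 6 system instead.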