Non-parametric Regression: the Loess Method
LOWESS (also called LOESS) is an acronym for LOcally
WEighted ScatterPlot Smoothing (Cleveland).
For i = 1, ..., n, the ith measurement y_i of the response y and
the corresponding measurement x_i of the vector x of p
predictors are related by
$$ y_i = g(x_i) + \epsilon_i $$
where g is the regression function and \epsilon_i is a random error.
Idea: g(x) can be locally approximated by a parametric
function.
This approximation is obtained by fitting a regression surface to the
data points within a chosen neighborhood of the point x.
-
In the LOESS (LOWESS) method, weighted least squares
is used to fit linear or quadratic functions of the predictors
at the centers of neighborhoods.
The radius of each neighborhood is chosen so that the
neighborhood contains a specified percentage of the data
points. The fraction of the data, called the smoothing
parameter, in each local neighborhood controls the
smoothness of the estimated surface.
Data points in a given local neighborhood are weighted by
a smooth decreasing function of their distance from the
center of the neighborhood.
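The neighborhood rule above can be sketched in a few lines of Python for a single predictor. This is a minimal illustration, not PROC LOESS itself: the function name `neighborhood` is hypothetical, and truncating q*n to an integer is an assumption (it does agree with the "Points in Local Neighborhood" values in the SAS fit summaries in this transcript, for example 0.5 x 36 = 18).

```python
def neighborhood(xs, x0, q):
    """Return (indices, radius) of the points nearest to x0.

    The neighborhood holds a fraction q of the data; its radius is the
    distance from x0 to the farthest point it contains.
    """
    n = len(xs)
    # Assumption: truncate q*n to an integer and keep at least 2 points.
    k = max(2, min(n, int(q * n)))
    # Sort point indices by distance from the center x0.
    order = sorted(range(n), key=lambda i: abs(xs[i] - x0))
    chosen = order[:k]
    radius = abs(xs[chosen[-1]] - x0)
    return chosen, radius


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
idx, radius = neighborhood(xs, x0=5.0, q=0.5)
print(sorted(idx), radius)
```

A larger q pulls in more points and widens the radius, which is why the fitted surface becomes smoother as q grows.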
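Finding the distance between the ith and hth points with 2 predictors,
(X_{i1}, X_{i2}) and (X_{h1}, X_{h2}):
Generally the Euclidean distance is used,
$$ d_i = \left[ (X_{i1} - X_{h1})^2 + (X_{i2} - X_{h2})^2 \right]^{1/2} $$
and weights are defined by a tri-cube function:
$$ w_i = \begin{cases} \left[ 1 - (d_i/d_q)^3 \right]^3 & \text{if } d_i < d_q \\ 0 & \text{otherwise} \end{cases} $$
Here d_q is the distance from the center to the farthest point in the
neighborhood that contains the fraction q of the data.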
q is chosen between 0 and 1, often between 0.4 and 0.6.
Large q: smoother but maybe too smooth
Small q: too rough
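As a rough illustration of how the distance, the tri-cube weights and q fit together, here is a minimal Python sketch of a local linear fit at one point for a single predictor. The function names are hypothetical, d_q is taken to be the distance to the farthest point in the neighborhood, and the truncation of q*n is an assumption; none of this is meant to reproduce PROC LOESS exactly.

```python
import math


def euclidean(p, h):
    """Euclidean distance between two predictor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, h)))


def tricube(d, dq):
    """Tri-cube weight [1 - (d/dq)^3]^3 for d < dq, and 0 otherwise."""
    if dq <= 0.0 or d >= dq:
        return 0.0
    return (1.0 - (d / dq) ** 3) ** 3


def loess_at(xs, ys, x0, q):
    """Local linear estimate of g(x0) for one predictor (sketch)."""
    n = len(xs)
    k = max(2, min(n, int(q * n)))      # assumed rounding of q*n
    dists = [abs(x - x0) for x in xs]
    dq = sorted(dists)[k - 1]           # distance to the k-th nearest point
    w = [tricube(d, dq) for d in dists]
    sw = sum(w)
    if sw == 0.0:
        raise ValueError("all weights are zero; increase q")
    # Weighted least-squares straight line, written in centered form.
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    if sxx == 0.0:
        return ybar                     # fall back to a weighted mean
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    return ybar + (sxy / sxx) * (x0 - xbar)


xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]        # exactly linear toy data
print(loess_at(xs, ys, x0=4.5, q=0.5))
```

On exactly linear data the local line reproduces the underlying line, so the estimate at 4.5 is 10 up to floating-point error. On real data, repeating the call over a grid of x0 values traces out the smooth curve.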
-
Example: SINGLE Variable LOESS (our Tree data)
[Figure: plot for the tree data. Vertical axis ticks run from 7 to 23 (height, per the SAS output title); horizontal axis is diameter, 0 to 30.]
-
ODS output from SAS
Height versus breast diameter for selected redwood trees
The LOESS Procedure
Independent Variable Scaling
Scaling applied: None
Statistic diameter
Minimum Value 5.80000
Maximum Value 29.80000
-
Height versus breast diameter for selected redwood trees
The LOESS Procedure
Dependent Variable: height
Optimal Smoothing Criterion
AICC Smoothing Parameter
1.91149 0.95833
-
Height versus breast diameter for selected redwood trees
The LOESS Procedure
Selected Smoothing Parameter: 0.958
Dependent Variable: height
Fit Summary
Fit Method kd Tree
Blending Linear
Number of Observations 36
Number of Fitting Points 9
kd Tree Bucket Size 6
Degree of Local Polynomials 1
Smoothing Parameter 0.95833
Points in Local Neighborhood 34
Residual Sum of Squares 68.12269
Trace[L] 3.21315
GCV 0.06337
AICC 1.91149
AICC1 68.91565
Delta1 32.38147
Delta2 32.06626
Equivalent Number of Parameters 2.80778
Lookup Degrees of Freedom 32.69979
Residual Standard Error 1.45043
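Several entries in this fit summary can be cross-checked against each other. The relations below are assumptions taken from the usual loess definitions (L is the smoothing matrix, Delta1 and Delta2 are traces built from I - L), not something stated in the slides; they reproduce the printed values to about four decimal places.

```python
import math

# Values copied from the fit summary above.
n = 36                 # Number of Observations
rss = 68.12269         # Residual Sum of Squares
trace_l = 3.21315      # Trace[L]
enp = 2.80778          # Equivalent Number of Parameters, assumed trace(L'L)
delta1 = 32.38147
delta2 = 32.06626

# Assumed relations between the reported quantities.
delta1_check = n - 2 * trace_l + enp          # trace((I - L)'(I - L))
residual_se = math.sqrt(rss / delta1)         # Residual Standard Error
lookup_df = delta1 ** 2 / delta2              # Lookup Degrees of Freedom
gcv = rss / (n - trace_l) ** 2                # GCV as scaled in this output
aicc = math.log(rss / n) + 1 + 2 * (trace_l + 1) / (n - trace_l - 2)

for name, value in [("Delta1", delta1_check), ("Residual SE", residual_se),
                    ("Lookup DF", lookup_df), ("GCV", gcv), ("AICC", aicc)]:
    print(f"{name:12s} {value:.5f}")
```

The AICC line is the quantity minimized to select the smoothing parameter 0.95833 reported under "Optimal Smoothing Criterion".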
-
Height versus breast diameter for selected redwood trees
The LOESS Procedure
Independent Variable Scaling
Scaling applied: None
Statistic diameter
Minimum Value 0.70000
Maximum Value 6.10000
-
Height versus breast diameter for selected redwood trees
The LOESS Procedure
Smoothing Parameter: 0.96 (possibly too smooth)
Dependent Variable: height
Fit Summary
Fit Method kd Tree
Blending Linear
Number of Observations 15
Number of Fitting Points 9
kd Tree Bucket Size 2
Degree of Local Polynomials 1
Smoothing Parameter 0.96000
Points in Local Neighborhood 14
Residual Sum of Squares 59.93997
-
Height versus breast diameter for selected redwood trees
The LOESS Procedure
Independent Variable Scaling
Scaling applied: None
Statistic diameter
Minimum Value 5.80000
Maximum Value 29.80000
-
Height versus breast diameter for selected redwood trees
The LOESS Procedure
Smoothing Parameter: 0.5
Dependent Variable: height
Fit Summary
Fit Method kd Tree
Blending Linear
Number of Observations 36
Number of Fitting Points 17
kd Tree Bucket Size 3
Degree of Local Polynomials 1
Smoothing Parameter 0.50000
Points in Local Neighborhood 18
Residual Sum of Squares 60.77325
-
In direct fitting, a local fit is done at each point at which the regression surface is to be estimated.
A faster computational procedure is to perform such local fitting at a selected sample of points and then to
blend the local polynomials to obtain a regression surface.
You can use the LOESS procedure to perform statistical inference provided the errors are i.i.d.
normal random variables with mean 0.
With iterative reweighting, LOESS can also provide statistical inference when the error distribution
is symmetric but not necessarily normal.
Iterative reweighting also lets you use the LOESS procedure to perform robust fitting in the
presence of outliers in the data.
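The iterative reweighting idea can be sketched on a single straight-line fit. The example below uses bisquare robustness weights with scale 6 times the median absolute residual, as in Cleveland's robust scheme; whether PROC LOESS uses exactly these constants is an assumption. It is also a simplification: in LOESS the residuals come from the whole smoothed curve and the base weights are the tri-cube distance weights, whereas here one line is fitted with equal base weights. All function names are hypothetical.

```python
import statistics


def bisquare(u):
    """Bisquare weight: (1 - u^2)^2 for |u| < 1, else 0."""
    u = abs(u)
    return (1.0 - u * u) ** 2 if u < 1.0 else 0.0


def weighted_line(xs, ys, w):
    """Weighted least-squares line; returns (intercept, slope)."""
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def robust_line(xs, ys, base_w, iterations=2):
    """Fit once, then refit `iterations` times with robustness weights."""
    w = list(base_w)
    for _ in range(iterations + 1):
        a, b = weighted_line(xs, ys, w)
        resid = [y - (a + b * x) for x, y in zip(xs, ys)]
        s = statistics.median(abs(r) for r in resid)
        if s < 1e-12:          # residuals are essentially zero: stop
            break
        # Down-weight points with large residuals; beyond 6*s they get 0.
        w = [bw * bisquare(r / (6.0 * s)) for bw, r in zip(base_w, resid)]
    return a, b


xs = [float(i) for i in range(9)]
ys = [2.0 * x + 1.0 for x in xs]
ys[4] += 9.0                                  # one outlier in the middle
base_w = [1.0] * len(xs)                      # tri-cube weights in LOESS
plain = robust_line(xs, ys, base_w, iterations=0)
robust = robust_line(xs, ys, base_w, iterations=2)
print(plain, robust)
```

In this toy data the ordinary fit has its intercept pulled up by the outlier, while after one reweighting pass the outlier receives weight zero and the line through the remaining points is recovered.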
PROC LOESS in SAS
While all output of the LOESS procedure can be optionally
displayed, most often the LOESS procedure is used to
produce output data sets that will be viewed and
manipulated by other SAS procedures. PROC LOESS uses
the Output Delivery System (ODS) to place results in
output data sets. This is a departure from older SAS
procedures that provide OUTPUT statements to create SAS
data sets from analysis results.
-
Multiple Predictors:
These data come from an experiment that modeled photolytic damage
to paint molecules as a function of exposure time x1, relative
humidity x2, ultraviolet filter x3, and temperature x4.
It is easier to look at 2 variables at a time:
[Figure: y plotted against x1 and x2. Axis ranges: x1 0.00 to 99.08, x2 19.00 to 75.00, y -0.46 to 0.94.]
-
[Figure: y plotted against x1 and x3. Axis ranges: x1 0.00 to 99.08, x3 290 to 450, y -2.67 to 3.73.]
[Figure: predicted y plotted against x3 and x1. Axis ranges: x3 290 to 450, x1 0.00 to 99.08, predicted y -0.81998 to 0.73214.]
-
[Figure: y plotted against x1 and x4. Axis ranges: x1 0.00 to 99.08, x4 30 to 60, y 0.000 to 0.828.]
[Figure: predicted y plotted against x4 and x1. Axis ranges: x4 30 to 60, x1 0.00 to 99.08, predicted y -0.05846 to 0.46513.]
-
[Figure: y plotted against x2 and x3. Axis ranges: x2 19.00 to 75.00, x3 290 to 450, y -4.50 to 5.62.]
[Figure: predicted y plotted against x3 and x2. Axis ranges: x3 290 to 450, x2 19.00 to 75.00, predicted y -4.02495 to 5.05948.]
-
[Figure: predicted y plotted against x2 and x1. Axis ranges: x2 20 to 70, x1 0.0 to 90.0, predicted y -0.1 to 0.8.]
[Figure: predicted y plotted against x3 and x1. Axis ranges: x3 290 to 450, x1 0.0 to 90.0, predicted y -0.1 to 0.8.]
-
[Figure: predicted y plotted against x4 and x1. Axis ranges: x4 30 to 60, x1 0.0 to 90.0, predicted y -0.1 to 0.8.]
[Figure: predicted y over x1 (0.0 to 90.0) and x2 (20 to 70), with legend levels -0.0, 0.1, 0.3, 0.4, 0.5, 0.7, 0.8.]
-
[Figure: predicted y over x1 (0.0 to 90.0) and x3 (290 to 450), with legend levels -0.0, 0.1, 0.3, 0.4, 0.5, 0.7, 0.8.]
[Figure: predicted y over x1 (0.0 to 90.0) and x4 (30 to 60), with legend levels -0.0, 0.1, 0.3, 0.4, 0.5, 0.7, 0.8.]