![Page 1: 4.2.1 Curve Fitting In many cases the re~ationship of y to x is not …rregal/documents/stat3411_sp07/stat3411_… · The simplest model to fit in this case is a linear model y ~](https://reader033.vdocument.in/reader033/viewer/2022050301/5f6a609a7d71bf394d226463/html5/thumbnails/1.jpg)
Section 4.2 Fitting Curves and Surfaces by Least Squares
4.2.1 Curve Fitting
In many cases the relationship of y to x is not a straight line.
[Figure: y plotted against miles east of Southport, Connecticut]
To fit a curve to the data, one can
1. Fit a nonlinear function directly to the data.
2. Rescale, transform x or y to make the relationship linear.
3. Fit a polynomial function to the data.
For a uniform fluid, light should decay as an exponential function of depth. Given data on depth, x, and light intensity, y, the data should look like an exponential function:
$$y = a\,e^{b_1 x}$$
1. Direct Fitting
We could fit the equation directly by minimizing

$$\sum_i \left(y_i - a\,e^{b_1 x_i}\right)^2$$
If the residuals are normally distributed with equal variance in the residual plot, then this would be a good way to go.
We could use a "nonlinear regression" program to accomplish this.
This can also be accomplished with the "Solver" add-in in Excel.
We won't be doing such curves this way in Stat 3411.
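For readers curious what a "nonlinear regression" program does internally, here is a minimal sketch of direct fitting for y = a·e^(b1·x). It uses Gauss-Newton iteration, which is our choice of algorithm and is not specified in these notes; the function names `fit_exp_direct` and `sse_exp` are hypothetical. Each step linearizes the model around the current (a, b) and solves a 2×2 least-squares system for the update.

```python
import math

def fit_exp_direct(xs, ys, a, b, max_iter=50, tol=1e-12):
    """Gauss-Newton fit of y = a * exp(b * x), minimizing sum (y_i - a*exp(b*x_i))**2.

    a, b are starting values; the iteration can fail from poor starts.
    """
    for _ in range(max_iter):
        # Accumulate J'J (2x2, symmetric) and J'r at the current (a, b)
        s_aa = s_ab = s_bb = g_a = g_b = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = y - a * e        # residual at the current parameters
            ja = e               # derivative of a*exp(b*x) with respect to a
            jb = a * x * e       # derivative of a*exp(b*x) with respect to b
            s_aa += ja * ja
            s_ab += ja * jb
            s_bb += jb * jb
            g_a += ja * r
            g_b += jb * r
        # Solve the 2x2 system (J'J) step = J'r by Cramer's rule
        det = s_aa * s_bb - s_ab * s_ab
        da = (s_bb * g_a - s_ab * g_b) / det
        db = (s_aa * g_b - s_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

def sse_exp(xs, ys, a, b):
    """Error sum of squares for the exponential model on the original y scale."""
    return sum((y - a * math.exp(b * x)) ** 2 for x, y in zip(xs, ys))
```

Starting values matter for this kind of iteration; a fit on the log scale, as in the transforming approach, is one reasonable way to get them.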
But very, very often in these situations the variances are not constant.
One way to handle nonconstant variance is "weighted regression", where one gives larger weight to residuals where the variance is small and we are more sure of where the fitted line should go. We won't be doing this in Stat 3411.
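To make the weighting idea concrete, here is a minimal sketch of a weighted least-squares straight line. It assumes each point comes with a weight w_i, typically taken proportional to 1/variance_i; the function name `weighted_line_fit` and the example data are hypothetical.

```python
def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares line: minimize sum of w_i * (y_i - b0 - b1*x_i)**2.

    Larger w_i (for example w_i = 1/variance_i) pulls the line toward point i.
    """
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw   # weighted mean of x
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw   # weighted mean of y
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Hypothetical example: the third point is very noisy, so it gets a tiny weight.
b0, b1 = weighted_line_fit([0.0, 1.0, 2.0], [0.0, 1.0, 10.0], [1.0, 1.0, 1e-9])
```

With equal weights this reduces to the ordinary least-squares line; in the example the nearly zero weight lets the line essentially ignore the noisy third point.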
2. Transforming
Often the larger values have more variance, and as the signal decreases exponentially, the smaller values have smaller variances.
In this case it can work better to rescale the data to

$$\ln(y) = \ln(a) + b_1 x = b_0 + b_1 x, \qquad b_0 = \ln(a)$$
Often values in the log scale have fairly constant variances.
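A minimal sketch of the transforming approach: take logs of y, fit an ordinary least-squares line to (x, ln y), then back-transform the intercept with a = e^(b0). It assumes every y is positive; the function name `fit_exp_log` is hypothetical.

```python
import math

def fit_exp_log(xs, ys):
    """Fit ln(y) = b0 + b1*x by ordinary least squares; return (a, b1) with a = exp(b0)."""
    if any(y <= 0 for y in ys):
        raise ValueError("the log transform requires every y > 0")
    logs = [math.log(y) for y in ys]
    n = len(xs)
    xbar = sum(xs) / n
    lbar = sum(logs) / n
    sxy = sum((x - xbar) * (ly - lbar) for x, ly in zip(xs, logs))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx                 # slope on the log scale
    b0 = lbar - b1 * xbar          # intercept on the log scale, i.e. ln(a)
    return math.exp(b0), b1        # back-transform the intercept to a
```

Because the squared errors are minimized on the log scale, the resulting a and b1 generally differ somewhat from a direct fit on the original scale; which is preferable depends on which scale has roughly constant variance.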
[Figure: two scatterplots of the data against x (0 to 20), each with a fitted curve]
Predicted values are less reliable if you are extrapolating outside the range of x values in the data set.
[FIGURE 9.24: Using a regression model outside the experimental region; x-axis: inflation rate (x, %)]
If at all possible, extrapolations are more reliable if one has an equation from a physical model.
For example, physical models predict that in a uniform fluid light will decay exponentially with depth.
CE4505 Surface Water Quality Engineering
http://www.cee.mtu.edu/~mtauer/classes/ce4505/lecture3.htm
The attenuation of light with depth is well-described by first-order kinetics,
$$\frac{dI}{dz} = -k_e\, I$$
where $I$ is light ($\mu\mathrm{E\,m^{-2}\,s^{-1}}$), $z$ is depth (m), and $k_e$ is the extinction coefficient ($\mathrm{m^{-1}}$). Integrating from $z = 0$ to $z = z$,
$$I_z = I_0\, e^{-k_e z}$$
Values for $k_e$ are determined from paired field measurements of light at depth using a log-linearization of the above equation,
$$\ln I_z = -k_e z + \ln I_0$$
where a plot of $\ln I_z$ versus depth yields a line with slope $-k_e$.
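As a worked example with hypothetical numbers (not taken from the CE4505 notes): suppose light at depth $z = 2$ m is 25% of the surface value, so $I_z / I_0 = 0.25$. Rearranging the log-linearized equation for this single pair of measurements gives

$$k_e = -\frac{\ln(I_z / I_0)}{z} = -\frac{\ln 0.25}{2\ \mathrm{m}} = \frac{\ln 4}{2}\ \mathrm{m^{-1}} \approx 0.693\ \mathrm{m^{-1}}$$

With light measured at several depths, one would instead fit a least-squares line to the $(z, \ln I_z)$ pairs and take $k_e$ as minus its slope.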
3. Polynomials
When we have no theory to guide us, we can often fit the curve in the range of observed x values with a polynomial function.
For example, a cubic polynomial would be

$$y \approx b_0 + b_1 x + b_2 x^2 + b_3 x^3$$
This is a linear function of the three variables

- $x_1 = x$
- $x_2 = x^2$
- $x_3 = x^3$

$$y \approx b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$$
Excel and other programs can fit these sorts of polynomial models.

Given the fitted function, we want to check for an adequate fit by plotting the data along with the fitted function.
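The "treat x, x², x³ as three separate variables" idea can be sketched directly: build one row [1, x, x², x³] per observation and solve the least-squares normal equations X'X b = X'y. This is a minimal illustration, not what Excel does internally; the helper names `solve`, `least_squares`, and `fit_cubic` are hypothetical, and in practice a QR-based solver is numerically preferable to forming X'X.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def least_squares(rows, ys):
    """Coefficients b minimizing sum (y - row.b)^2, via the normal equations."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)

def fit_cubic(xs, ys):
    """Return [b0, b1, b2, b3] for y ~ b0 + b1*x + b2*x^2 + b3*x^3."""
    # x, x^2, x^3 act as three separate predictors; the 1.0 column gives b0
    rows = [[1.0, x, x ** 2, x ** 3] for x in xs]
    return least_squares(rows, ys)
```

After fitting, the data and the fitted cubic would be plotted together to judge adequacy, as described above.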
[Figure 4.10: Scatterplot and fitted cubic for the fly ash data; x-axis: percent ammonium phosphate]
Note that replication is useful for assessing how well the functional form fits the data. With replication here we can tell that the quadratic polynomial is underfitting the y values at x = 2.
---
We would also plot residuals in the same ways suggested for a linear fit.

- In run order
- Versus $\hat{y}$
- Versus other potentially influential variables, e.g. technician
- Normal plot of residuals
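The residuals themselves and the coordinates for a normal plot are easy to compute. This minimal sketch assumes the plotting positions (i − 0.5)/n, which is one common convention among several; the function names are hypothetical. It uses `statistics.NormalDist` from the Python standard library (3.8+).

```python
from statistics import NormalDist

def residuals(ys, fitted):
    """e_i = y_i - yhat_i for each observation."""
    return [y - f for y, f in zip(ys, fitted)]

def normal_plot_points(res):
    """(standard normal quantile, ordered residual) pairs for a normal plot.

    Plotting positions (i - 0.5)/n are an assumed convention.
    """
    n = len(res)
    std_normal = NormalDist()          # mean 0, standard deviation 1
    ordered = sorted(res)
    return [(std_normal.inv_cdf((i - 0.5) / n), r)
            for i, r in enumerate(ordered, start=1)]
```

If the residuals are roughly normal, these points fall near a straight line. Plotting residuals against run order simply pairs each residual with its position in the original sequence.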
[Figure 6.3: Residual plots (residuals against $\hat{y}_i$ or $x_i$): (a) null plot; (b) right-opening megaphone; (c) left-opening megaphone; (d) double outward bow; (e) nonlinearity; (f) nonlinearity; (g) nonlinearity and nonconstant variance; (h) nonlinearity and nonconstant variance.]
Section 4.2.2 Surface Fitting by Least Squares
In many situations the response variable, y, is affected by more than one x variable. For example, we could have (see problem 21 in the Exercises)

- $y$ = armor strength
- $x_1$ = thickness
- $x_2$ = Brinell hardness
The simplest model to fit in this case is a linear model
$$y \approx b_0 + b_1 x_1 + b_2 x_2$$
This sort of linear model with more than one x variable is called "multiple linear regression".

In this case we are fitting a plane to the 3-D points.

[FIGURE 10.2.1: Multiple regression plane and scatter of points]
$b_0$, $b_1$, and $b_2$ are chosen to minimize

$$\sum_i (y_i - \hat{y}_i)^2 = \sum_i \left[y_i - (b_0 + b_1 x_{1i} + b_2 x_{2i})\right]^2$$
Again, we are minimizing the sum of squared deviations of the fitted values, $\hat{y}$, from the observed y values; that is, we are minimizing the error sum of squares. This minimization can be solved with explicit matrix formulas, so many programs, including Excel, can fit such multiple linear regression models.
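For the two-variable model the matrix formulas reduce to something small enough to write out. After centering x1, x2, and y at their means, the normal equations for b1 and b2 become a 2×2 system that Cramer's rule solves, and b0 follows from the fact that the fitted plane passes through the point of means. This is a minimal sketch; the function name `fit_plane` is hypothetical.

```python
def fit_plane(x1, x2, y):
    """Least-squares fit of y ~ b0 + b1*x1 + b2*x2; returns (b0, b1, b2)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centered sums of squares and cross-products
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    # Solve [[s11, s12], [s12, s22]] [b1, b2]' = [s1y, s2y]' by Cramer's rule
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("x1 and x2 are perfectly collinear; the plane is not unique")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2    # plane passes through the point of means
    return b0, b1, b2
```

With more x variables the same idea generalizes to solving (X'X)b = X'y, which is what general-purpose regression software does in some numerically careful form.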
For this function, fitted y values for a fixed value of $x_2$ follow parallel lines when plotted against $x_1$.
[Figure: mean y value over $(x_1, x_2)$ for the model $1200 + 15x_1 - 35x_2$]
There can be any number of x variables, for example
$$y \approx b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$$
It is also possible to introduce curvature into one or more of the variables. For example,

$$y \approx b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1^2$$

[Figure 4.15: Plots of fitted stack loss from equation (4.20), $\hat{y} = -15.409 - .069x_1 + .528x_2 + .007x_1^2$]
[FIGURE 14.3: Graphs of mean y value for two different models: (a) $1200 + 15x_1 - 35x_2$; (b) $-4500 + 75x_1 + 60x_2 - x_1 x_2$]
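The difference between the two models in Figure 14.3 can be read off the formulas by collecting terms in $x_1$ (a short derivation of our own, not part of the source figure):

$$1200 + 15x_1 - 35x_2 = (1200 - 35x_2) + 15\,x_1$$

$$-4500 + 75x_1 + 60x_2 - x_1 x_2 = (-4500 + 60x_2) + (75 - x_2)\,x_1$$

In (a), every fixed $x_2$ gives a line in $x_1$ with slope 15, so the lines are parallel and only the intercept $1200 - 35x_2$ shifts. In (b), the cross-product term makes the slope $75 - x_2$ depend on $x_2$ (for instance 65 at $x_2 = 10$ and 55 at $x_2 = 20$), so the lines are not parallel.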
We would also plot residuals in the same ways suggested for a linear fit.
- In run order
- Versus $\hat{y}$
- Versus other potentially influential variables, e.g. technician
- Normal plot of residuals

In addition, we would plot the residuals

- Versus each x variable
[Figure: residual diagnostics: (a) residual plot against predicted value; (b) residual plot against Xs; (c) studentized deleted residuals versus case index; (d) normal probability plot of residuals against expected value]
[Figure: y plotted over $x_1$ and $x_2$ with an error probability distribution]

[Fig. 1.1: Plot of the data (x, y) with the fitted line for four data sets (Table 1.1). Source: Anscombe (1973).]
[FIGURE 13.6: Some commonly encountered patterns in scatter plots: (a) consistent with the simple linear regression model; (b) suggests a nonlinear probabilistic model; (c) suggests that variability in y changes with x]

[FIGURE 13.14: Examples of residual plots (standardized residual on the vertical axis): (a) satisfactory plot; (b) plot suggesting that a curvilinear regression model is needed; (c) plot indicating nonconstant variance; (d) plot showing a large residual; (e) plot showing a potentially influential observation]
[Figure 4.14: Plots of residuals from a two-variable equation fit to the stack loss data ($\hat{y} = -42.00 + .78x_1 + .57x_2$); panels show a normal plot of the residuals and residuals against air flow $x_1$, inlet temperature $x_2$, and fitted stack loss $\hat{y}$]
Residuals are then (as always)
$$e = y - \hat{y}$$
(and should look like noise if the simplified equation is an adequate description of the data set). Further, the fraction of raw variation in y accounted for in the fitting process is (as always)
Coefficient of determination:

$$R^2 = \frac{\sum (y - \bar{y})^2 - \sum (y - \hat{y})^2}{\sum (y - \bar{y})^2} \qquad (4.27)$$
where the sums are over all observed y's. (Summation notation is being abused even further than usual, by not even subscripting the y's and $\hat{y}$'s.)
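Equation (4.27) translates directly into code: total variation about the mean minus the error sum of squares, divided by the total variation. A minimal sketch; the function name `r_squared` is hypothetical, and it assumes the fitted values come from a least-squares fit that includes an intercept.

```python
def r_squared(ys, fitted):
    """Coefficient of determination: (SS_total - SSE) / SS_total."""
    ybar = sum(ys) / len(ys)
    ss_total = sum((y - ybar) ** 2 for y in ys)            # raw variation in y
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))     # error sum of squares
    return (ss_total - sse) / ss_total
```

For example, with y values 1, 2, 3, 4 and fitted values 1.5, 1.5, 3.5, 3.5, the total sum of squares is 5 and the error sum of squares is 1, so R² = 0.8.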
Extrapolating
With multiple x variables we need to be careful about extrapolating beyond the region of x values covered by the observed data.

We can't necessarily tell that a combination of $x_1$ and $x_2$ is unusual just by seeing where the new value of $x_1$ falls among the measured $x_1$ values and, separately, where the new value of $x_2$ falls among the measured $x_2$ values.
[Figure: dots show $(x_1, x_2)$ locations of fictitious data points in the region $1 \le x_1 \le 5$ and $10 \le x_2 \le 20$; the point $(3, 15)$ is unlike the $(x_1, x_2)$ pairs for the data]