4.2.1 Curve fitting: in many cases the relationship of y to x is not...

Section 4.2 Fitting Curves and Surfaces by Least Squares

4.2.1 Curve Fitting

In many cases the relationship of y to x is not a straight line.

[Figure: scatterplot; horizontal axis: Miles east of Southport, Connecticut (0 to 60); vertical axis label not recoverable.]

To fit a curve to the data one can

1. Fit a nonlinear function directly to the data.

2. Rescale, transform x or y to make the relationship linear.

3. Fit a polynomial function to the data.

For a uniform fluid, light should decay as an exponential function of depth. Given data on depth, x, and light intensity, y, the data should look like an exponential function:

$$y = a e^{b_1 x}$$

1. Direct Fitting

We could fit the equation directly by minimizing $\sum_i (y_i - a e^{b_1 x_i})^2$.

If the residuals are normal with equal variance in the plot, then this would be a good way to go.

We could use a "nonlinear regression" program to accomplish this.

• This can also be accomplished with the "Solver" add-in in Excel.
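As a rough illustration of direct fitting, the sketch below minimizes the sum of squares $\sum_i (y_i - a e^{b_1 x_i})^2$ in plain Python. It is not how a nonlinear regression program or the Excel Solver works internally. It relies on two assumptions: for a fixed $b_1$ the best a has a closed form, and the sum of squares has a single minimum in $b_1$ inside the search bracket. The function name `fit_exponential_direct`, the bracket, and the data are made up for the example.

```python
import math

def fit_exponential_direct(x, y, b_lo=-5.0, b_hi=5.0, tol=1e-10):
    """Least-squares fit of y = a * exp(b1 * x) on the original scale.

    Assumption: the sum of squares has one minimum in b1 on [b_lo, b_hi].
    """
    def best_a(b1):
        # For fixed b1 the model is linear in a, so a has a closed form.
        num = sum(yi * math.exp(b1 * xi) for xi, yi in zip(x, y))
        den = sum(math.exp(2.0 * b1 * xi) for xi in x)
        return num / den

    def sse(b1):
        a = best_a(b1)
        return sum((yi - a * math.exp(b1 * xi)) ** 2 for xi, yi in zip(x, y))

    # Golden-section search over b1.
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = b_lo, b_hi
    while hi - lo > tol:
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if sse(c) < sse(d):
            hi = d      # minimum lies in [lo, d]
        else:
            lo = c      # minimum lies in [c, hi]
    b1 = (lo + hi) / 2.0
    return best_a(b1), b1

# Made-up, noise-free data generated from y = 100 * exp(-0.3 * x).
depth = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
light = [100.0 * math.exp(-0.3 * d) for d in depth]
a_hat, b1_hat = fit_exponential_direct(depth, light)
print(a_hat, b1_hat)
```

With real, noisy data the estimates would of course not reproduce the generating values exactly.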

We won't be doing such curves this way in Stat 3411.

But very, very often in these situations the variances are not constant.

One way to handle nonconstant variance is "weighted regression", where one gives larger weight to residuals where the variance is small and we are more sure of where the fitted line should go. We won't be doing this in Stat 3411.
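Although weighted regression is outside the course, a small sketch may make the idea concrete. It fits a straight line by minimizing $\sum_i w_i (y_i - b_0 - b_1 x_i)^2$, with weights taken as 1/variance. The data, the assumed standard deviations, and the name `weighted_line_fit` are hypothetical.

```python
def weighted_line_fit(x, y, w):
    """Fit y = b0 + b1 * x by minimizing sum(w_i * (y_i - b0 - b1 * x_i)**2)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Hypothetical data: the last point is assumed to be much noisier than the rest.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.0, 8.1, 14.0]
sd = [0.1, 0.1, 0.1, 0.1, 4.0]            # assumed standard deviations
w = [1.0 / s ** 2 for s in sd]            # larger weight where variance is small

b0_w, b1_w = weighted_line_fit(x, y, w)                # weighted fit
b0_u, b1_u = weighted_line_fit(x, y, [1.0] * len(x))   # equal weights: ordinary least squares
print(b1_w, b1_u)
```

In this example the noisy point pulls the unweighted slope upward, while the weighted fit largely ignores it.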

2. Transforming

Often the larger values have more variance, and as the signal decreases exponentially, the smaller values have smaller variances.

In this case it can work better to rescale the data to

$$\ln(y) = \ln(a) + b_1 x = b_0 + b_1 x, \quad \text{where } b_0 = \ln(a)$$

Often values in the log scale have fairly constant variances.
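The transformation approach can be sketched in a few lines: take logs of y, fit a straight line by ordinary least squares, and convert the intercept back with $a = e^{b_0}$. The function name `fit_exponential_via_logs` and the data are made up for illustration. Note that this minimizes squared error on the log scale, so it generally gives a somewhat different answer from direct fitting on the original scale.

```python
import math

def fit_exponential_via_logs(x, y):
    """Fit ln(y) = b0 + b1 * x by ordinary least squares; return (a, b0, b1)."""
    ln_y = [math.log(v) for v in y]           # requires every y > 0
    n = len(x)
    xbar = sum(x) / n
    lbar = sum(ln_y) / n
    sxy = sum((xi - xbar) * (li - lbar) for xi, li in zip(x, ln_y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = lbar - b1 * xbar
    return math.exp(b0), b0, b1               # a = e^(b0)

# Made-up, noise-free data generated from y = 50 * exp(-0.2 * x).
x = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
y = [50.0 * math.exp(-0.2 * xi) for xi in x]
a_hat, b0_hat, b1_hat = fit_exponential_via_logs(x, y)
print(a_hat, b1_hat)
```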

[Figure: two scatterplots against x (0 to 20), each with the fit overlaid.]

Predicted values are less reliable if you are extrapolating outside the range of x values in the data set.

FIGURE 9.24 Using a Regression Model Outside the Experimental Region (horizontal axis: Inflation rate (x, %))

Extrapolations are more reliable when the equation comes from a physical model.

For example, physical models predict that in a uniform fluid light will decay exponentially with depth.

CE4505 Surface Water Quality Engineering

http://www.cee.mtu.edu/~mtauer/classes/ce4505/lecture3.htm

The attenuation of light with depth is well-described by first-order kinetics,

$$\frac{dI}{dz} = -k_e I$$

where I is light ($\mu\mathrm{E}\,\mathrm{m}^{-2}\,\mathrm{s}^{-1}$), z is depth (m), and $k_e$ is the extinction coefficient ($\mathrm{m}^{-1}$). Integrating from z = 0 to z = z,

$$I_z = I_0 e^{-k_e z}$$
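The integration step can be written out as follows. This is the standard separation-of-variables argument, with $I_0$ denoting the light at z = 0 and $z'$ a dummy variable of integration:

```latex
\begin{aligned}
\frac{dI}{dz} &= -k_e I \\
\int_{I_0}^{I_z} \frac{dI}{I} &= -k_e \int_{0}^{z} dz' \\
\ln I_z - \ln I_0 &= -k_e z \\
I_z &= I_0 \, e^{-k_e z}
\end{aligned}
```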

Values for $k_e$ are determined from paired field measurements of light at depth using a log-linearization of the above equation,

$$\log I_z = -k_e z + \log I_0$$

where a plot of $\log I_z$ (natural log) versus depth is a straight line with slope $-k_e$.

3. Polynomials

When we have no theory to guide us, we can often fit the curve in the range of observed x values with a polynomial function.

For example a cubic polynomial would be

$$y \approx b_0 + b_1 x + b_2 x^2 + b_3 x^3$$

This is a linear function of the three variables

$$x_1 = x, \quad x_2 = x^2, \quad x_3 = x^3$$

$$y \approx b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$$

Excel and other programs fit these sorts of polynomial models.

Given the fitted function, we want to check for an adequate fit by plotting the data along with the fitted function.
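To show that a polynomial fit is just a linear least-squares problem in the powers of x, here is a minimal pure-Python sketch that builds the columns 1, x, x^2, x^3 and solves the normal equations. It is one simple way to do the computation, not necessarily what Excel or any particular program does. The helper names and the data are made up, and normal equations can be numerically fragile for high degrees or widely spread x values.

```python
def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]       # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def fit_polynomial(x, y, degree=3):
    """Least-squares coefficients [b0, b1, ..., b_degree] via the normal equations."""
    # Each row is (1, x, x^2, ..., x^degree): the powers act as separate x variables.
    X = [[xi ** p for p in range(degree + 1)] for xi in x]
    k = degree + 1
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(XtX, Xty)

# Made-up, noise-free data from y = 2 + x - 0.5 x^2 + 0.1 x^3.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 + 1.0 * xi - 0.5 * xi ** 2 + 0.1 * xi ** 3 for xi in x]
coeffs = fit_polynomial(x, y)
print(coeffs)
```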

Figure 4.10 Scatterplot and fitted cubic for the fly ash data (horizontal axis: Percent ammonium phosphate)

Note that replication is useful for assessing how well the functional form fits the data. With replication here we can tell that the quadratic polynomial is underfitting the y values at x = 2.
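One way to make this use of replication concrete is to average the residuals at each distinct x value. If every replicate at some x sits on the same side of the fitted curve, the average residual there is large compared with the spread among the replicates. The sketch below uses hypothetical replicated data and a hypothetical fitted quadratic; these are not the fly ash values.

```python
from collections import defaultdict

def mean_residual_by_x(x, y, fitted):
    """Average residual, y - fitted(x), at each distinct x value."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi - fitted(xi))
    return {xi: sum(r) / len(r) for xi, r in sorted(groups.items())}

def fitted_quadratic(x):
    # Hypothetical fitted curve, chosen only for illustration.
    return 5.0 + 4.0 * x - 0.5 * x ** 2

# Hypothetical data with three replicate y values at each x.
x = [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0]
y = [8.4, 8.6, 8.5, 12.0, 12.2, 11.8, 12.4, 12.5, 12.6]

mean_resid = mean_residual_by_x(x, y, fitted_quadratic)
print(mean_resid)
```

Here the replicates at x = 2 all lie about one unit above the curve, while their spread is only a few tenths, which points to the functional form rather than noise.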

---

We would also plot residuals in the same ways suggested for a linear fit.

• In run order
• Versus $\hat{y}$
• Versus other potentially influential variables, e.g. technician
• Normal plot of residuals

Figure 6.3 Residual plots: (a) null plot; (b) right-opening megaphone; (c) left-opening megaphone; (d) double outward bow; (e) nonlinearity; (f) nonlinearity; (g) nonlinearity and nonconstant variance; (h) nonlinearity and nonconstant variance.

Section 4.2.2 Surface Fitting by Least Squares

In many situations the response variable, y, is affected by more than one x variable. For example, we could have (see problem 21 in the Exercises)

y = armor strength
$x_1$ = thickness
$x_2$ = Brinell hardness

The simplest model to fit in this case is a linear model

$$y \approx b_0 + b_1 x_1 + b_2 x_2$$

This sort of linear model with more than one x variable is called "multiple linear regression".

In this case we are fitting a plane to the 3-D points.

FIGURE 10.2.1 Multiple regression plane and scatter of points.

$b_0$, $b_1$, and $b_2$ are chosen to minimize

$$\sum_i (y_i - \hat{y}_i)^2 = \sum_i \left[ y_i - (b_0 + b_1 x_{1i} + b_2 x_{2i}) \right]^2$$

Again, we are minimizing the sum of squared deviations of the fitted values, $\hat{y}_i$, from the observed y values; we are minimizing the error sum of squares. This minimization can be solved with explicit matrix formulas, so many programs, including Excel, have the capability of fitting such multiple linear regression models.
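For the two-variable case the explicit solution is short enough to write out by hand. The sketch below centers the variables and solves the resulting 2-by-2 system with Cramer's rule. The function name `fit_plane` is made up, and the noise-free data are generated from the illustrative model 1200 + 15x1 - 35x2 rather than taken from any real data set.

```python
def fit_plane(x1, x2, y):
    """Least-squares b0, b1, b2 for y ~ b0 + b1*x1 + b2*x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centered sums of squares and cross products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (v - my) for a, v in zip(x1, y))
    s2y = sum((b - m2) * (v - my) for b, v in zip(x2, y))
    det = s11 * s22 - s12 ** 2        # zero if x1 and x2 are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2       # the fitted plane passes through the means
    return b0, b1, b2

# Made-up, noise-free data from y = 1200 + 15*x1 - 35*x2.
x1 = [10.0, 20.0, 30.0, 10.0, 20.0, 30.0]
x2 = [5.0, 5.0, 5.0, 10.0, 10.0, 10.0]
y = [1200.0 + 15.0 * a - 35.0 * b for a, b in zip(x1, x2)]
b0, b1, b2 = fit_plane(x1, x2, y)
print(b0, b1, b2)
```

With more than two x variables the same idea applies, but a general linear solver is more practical than Cramer's rule.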

For this function, the fitted y values plotted against $x_1$ follow a straight line for each fixed value of $x_2$, and the lines for different $x_2$ values are parallel.

[Figure: mean y value plotted against $x_1$ and $x_2$ for the model $1200 + 15x_1 - 35x_2$.]

There can be any number of x variables, for example

$$y \approx b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$$

It is also possible to introduce curvature into one or more of the variables.

For example, $y \approx b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1^2$

Figure 4.15 Plots of fitted stack loss from equation (4.20): $\hat{y} = -15.409 - .069x_1 + .528x_2 + .007x_1^2$
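A quick way to see what the squared term does is to evaluate the fitted stack loss equation on a grid. In the sketch below the coefficients come from the equation quoted with Figure 4.15. The grid values for $x_1$ and $x_2$ are illustrative choices, not values read from the figure.

```python
def fitted_stack_loss(x1, x2):
    """Fitted stack loss from the equation quoted with Figure 4.15."""
    return -15.409 - 0.069 * x1 + 0.528 * x2 + 0.007 * x1 ** 2

x1_grid = [50, 55, 60, 65, 70, 75, 80]     # illustrative x1 values
for x2 in (16, 20, 24):                    # illustrative x2 values
    curve = [round(fitted_stack_loss(x1, x2), 3) for x1 in x1_grid]
    print(x2, curve)

# Vertical gap between the x2 = 24 and x2 = 16 curves at each x1.
gap = [fitted_stack_loss(x1, 24) - fitted_stack_loss(x1, 16) for x1 in x1_grid]
print(gap)
```

Each curve bends upward in $x_1$, but the gap between curves is the same at every $x_1$: the curves have the same shape and are only shifted vertically by the $x_2$ term.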

FIGURE 14.3 Graphs of mean y value for two different models: (a) $1200 + 15x_1 - 35x_2$; (b) $-4500 + 75x_1 + 60x_2 - x_1 x_2$
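The difference between the two models in Figure 14.3 can be checked numerically: in model (a) the slope in $x_1$ is the same at every $x_2$, while the cross-product term in model (b) makes the slope depend on $x_2$. The helper `slope_in_x1` and the chosen $x_2$ values are made up for this sketch.

```python
def mean_y_a(x1, x2):
    return 1200 + 15 * x1 - 35 * x2

def mean_y_b(x1, x2):
    return -4500 + 75 * x1 + 60 * x2 - x1 * x2

def slope_in_x1(model, x2, x1_lo=0.0, x1_hi=1.0):
    """Change in mean y per unit increase in x1, holding x2 fixed."""
    return (model(x1_hi, x2) - model(x1_lo, x2)) / (x1_hi - x1_lo)

x2_values = (20, 30, 40)                   # illustrative x2 values
slopes_a = [slope_in_x1(mean_y_a, x2) for x2 in x2_values]
slopes_b = [slope_in_x1(mean_y_b, x2) for x2 in x2_values]
print(slopes_a)   # same slope at every x2: parallel lines
print(slopes_b)   # slope is 75 - x2: lines are not parallel
```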

We would also plot residuals in the same ways suggested for a linear fit.

• In run order
• Versus $\hat{y}$
• Versus other potentially influential variables, e.g. technician
• Normal plot of residuals

In addition we would plot the residuals

• Versus each x variable

[Figure: residual diagnostic panels: residual plot against predicted value; residual plot against an x variable; studentized deleted residuals by case index; normal probability plot (residual versus expected value).]

[Figure: error probability distribution of y, drawn over the $(x_1, x_2)$ plane.]

Fig. 1.1. Plot of the data (x, y) with the fitted line for four data sets (Table 1.1). Source: Anscombe (1973).

FIGURE 13.6 Some commonly encountered patterns in scatter plots: (a) Consistent with the simple linear regression model; (b) Suggests a nonlinear probabilistic model; (c) Suggests that variability in y changes with x

FIGURE 13.14 Examples of residual plots (vertical axis: standardized residual): (a) Satisfactory plot; (b) Plot suggesting that a curvilinear regression model is needed; (c) Plot indicating nonconstant variance; (d) Plot showing a large residual; (e) Plot showing a potentially influential observation

[Figure panels: normal plot of residuals (horizontal axis: residual quantile); residuals versus air flow, $x_1$.]

Figure 4.14 Plots of residuals from a two-variable equation fit to the stack loss data ($\hat{y} = -42.00 + .78x_1 + .57x_2$)

Residuals are then (as always)

$$e = y - \hat{y}$$

(and should look like noise if the simplified equation is an adequate description of the data set). Further, the fraction of raw variation in y accounted for in the fitting process is (as always) the coefficient of determination

$$R^2 = \frac{\sum (y - \bar{y})^2 - \sum (y - \hat{y})^2}{\sum (y - \bar{y})^2} \qquad (4.27)$$

where the sums are over all observed y's. (Summation notation is being abused even further than usual, by not even subscripting the y's and $\hat{y}$'s.)
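These two definitions translate directly into code. The sketch below computes the residuals and $R^2$ from observed and fitted values. The numbers are made up and the function name `residuals_and_r2` is hypothetical.

```python
def residuals_and_r2(y, y_hat):
    """Return (residuals, R^2) using e = y - y_hat and equation (4.27)."""
    e = [yi - fi for yi, fi in zip(y, y_hat)]
    ybar = sum(y) / len(y)
    ss_total = sum((yi - ybar) ** 2 for yi in y)    # raw variation in y
    ss_error = sum(ei ** 2 for ei in e)             # variation left after fitting
    return e, (ss_total - ss_error) / ss_total

# Made-up observed and fitted values.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.5, 1.5, 3.5, 3.5]
e, r2 = residuals_and_r2(y, y_hat)
print(e)    # [-0.5, 0.5, -0.5, 0.5]
print(r2)   # 0.8
```

Here the raw variation is 5 and the error sum of squares is 1, so the fit accounts for 4/5 of the raw variation.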

[Figure 4.14 panels, continued: residuals versus fitted stack loss, $\hat{y}$; residuals versus inlet temperature, $x_2$.]

Extrapolating

With multiple x variables we need to be careful about extrapolating beyond the region of x variables for the observed data.

We can't necessarily tell that a combination of $x_1$ and $x_2$ is unusual just by seeing where the new value of $x_1$ falls amongst the measured $x_1$ values and separately where the new value of $x_2$ falls amongst the measured $x_2$ values.

[Figure: dots show $(x_1, x_2)$ locations of fictitious data points. Within the region with $1 \le x_1 \le 5$ and $10 \le x_2 \le 20$, the point (3, 15) is unlike the $(x_1, x_2)$ pairs for the data.]
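The point can be illustrated with a small check that compares two questions: is each coordinate of a new point inside that variable's observed range, and is the point actually close to any observed $(x_1, x_2)$ pair? The data below are fictitious and their layout (two clusters in opposite corners) is an assumption made for this sketch, not a reading of the figure. Scaling each variable by its range is just one simple choice of distance.

```python
def inside_each_range(point, data):
    """True if every coordinate of point lies within that variable's observed range."""
    return all(
        min(row[j] for row in data) <= point[j] <= max(row[j] for row in data)
        for j in range(len(point))
    )

def nearest_scaled_distance(point, data):
    """Distance from point to the closest data point, each variable scaled by its range."""
    k = len(point)
    spans = [max(r[j] for r in data) - min(r[j] for r in data) for j in range(k)]
    return min(
        sum(((point[j] - r[j]) / spans[j]) ** 2 for j in range(k)) ** 0.5
        for r in data
    )

# Fictitious (x1, x2) pairs: one cluster at low x1 / high x2, one at high x1 / low x2.
data = [
    (1.0, 18.0), (1.5, 20.0), (2.0, 19.0), (1.0, 20.0), (1.5, 18.5),
    (4.0, 10.0), (4.5, 12.0), (5.0, 11.0), (4.0, 11.5), (5.0, 10.0),
]

new_point = (3.0, 15.0)
print(inside_each_range(new_point, data))          # True: inside both marginal ranges
print(nearest_scaled_distance(new_point, data))    # yet far from every observed pair
print(nearest_scaled_distance((4.5, 11.0), data))  # a point inside one of the clusters
```

How large a distance counts as "unlike the data" is a judgment call; the comparison with a point inside a cluster is only meant to show the contrast.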
