Copyright © 2006 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Curve Fitting: Least Squares Regression
Curve Fitting

• Fit the best curve to a discrete data set and obtain estimates for other data points.
• Two general approaches:
– Data exhibit a significant degree of scatter: find a single curve that represents the general trend of the data (regression).
– Data are very precise: pass a curve (or curves) exactly through each of the points (interpolation).
Simple Statistics

In sciences, if several measurements are made of a particular quantity, additional insight can be gained by summarizing the data in one or more well-chosen statistics:

Arithmetic mean – the sum of the individual data points (y_i) divided by the number of points n:

    \bar{y} = \frac{\sum y_i}{n}

Standard deviation – a common measure of spread for a sample:

    S_y = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n - 1}}

or variance:

    S_y^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1}

Coefficient of variation – quantifies the spread of data (similar to relative error):

    c.v. = \frac{S_y}{\bar{y}} \times 100\%
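These statistics translate directly into code. A minimal sketch in plain Python; the sample values are made up for illustration:

```python
import math

y = [5.1, 5.9, 6.3]   # made-up sample data
n = len(y)

ybar = sum(y) / n                                            # arithmetic mean
Sy = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))  # standard deviation
var = Sy ** 2                                                # variance
cv = Sy / ybar * 100                                         # coefficient of variation (%)
```

Note the n - 1 divisor: these are sample statistics, matching the formulas above.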
Linear Regression

• Fitting a straight line to a set of paired observations: (x_1, y_1), (x_2, y_2), …, (x_n, y_n).

Line equation:  y = a_0 + a_1 x
a_0 : intercept
a_1 : slope

Each measured point deviates from the line by an error (residual) e_i:

    y_i = a_0 + a_1 x_i + e_i
    e_i = y_i - a_0 - a_1 x_i

y_i : measured value
e_i : error (residual)
Criteria For a “Best Fit”

• The most commonly used strategy is to minimize the sum of the squares of the residuals between the measured y and the y calculated with the linear model:

    S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_{i,\text{measured}} - y_{i,\text{model}})^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2

• Yields a unique line for a given set of data.
• Need to compute a_0 and a_1 such that S_r is minimized!
Least-Squares Fit of a Straight Line

Minimize the error:

    S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2

Set the partial derivatives with respect to each coefficient to zero:

    \frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i) = 0
    \frac{\partial S_r}{\partial a_1} = -2 \sum \left[ (y_i - a_0 - a_1 x_i) x_i \right] = 0

Since \sum a_0 = n a_0, these give the normal equations, which can be solved simultaneously:

    n a_0 + \left( \sum x_i \right) a_1 = \sum y_i                      (1)
    \left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 = \sum x_i y_i   (2)
Least-Squares Fit of a Straight Line

The normal equations can be written in matrix form:

    \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}

Solving simultaneously gives the slope:

    a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}

and, using (1), a_0 can be expressed in terms of the mean values \bar{x} and \bar{y}:

    a_0 = \bar{y} - a_1 \bar{x}
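The closed-form formulas for a_0 and a_1 map directly onto a few lines of code. A minimal sketch in plain Python (the helper name `fit_line` is ours, not the textbook's):

```python
def fit_line(x, y):
    """Least-squares straight line y = a0 + a1*x from the normal equations."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    a0 = sy / n - a1 * sx / n                       # intercept: a0 = ybar - a1*xbar
    return a0, a1

# Data lying exactly on y = 5 + 4x should be recovered exactly.
a0, a1 = fit_line([0, 1, 2, 3, 4], [5, 9, 13, 17, 21])
```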
“Goodness” of our fit

• S_r = sum of the squares of residuals around the regression line:

    S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2

• S_t = total sum of the squares around the mean:

    S_t = \sum (y_i - \bar{y})^2

• (S_t - S_r) quantifies the improvement or error reduction due to describing the data in terms of a straight line rather than as an average value.

    r^2 = \frac{S_t - S_r}{S_t}

r : correlation coefficient
r^2 : coefficient of determination

• For a perfect fit, S_r = 0 and r = r^2 = 1, signifying that the line explains 100% of the variability of the data.
• For r = r^2 = 0, S_r = S_t and the fit represents no improvement.

[Figure: the spread of the data (a) around the mean and (b) around the best-fit line; notice the reduction in error due to linear regression.]
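The coefficient of determination follows directly from the two sums of squares. A minimal sketch (the helper name `r_squared` is ours):

```python
def r_squared(x, y, a0, a1):
    """Coefficient of determination r^2 = (St - Sr) / St for a fitted line."""
    ybar = sum(y) / len(y)
    St = sum((yi - ybar) ** 2 for yi in y)                      # spread around the mean
    Sr = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))  # spread around the line
    return (St - Sr) / St

# A perfect fit (data exactly on y = 1 + 2x) gives r^2 = 1.
r2 = r_squared([0, 1, 2], [1.0, 3.0, 5.0], 1.0, 2.0)
```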
Example

Use linear regression to fit the following data to a straight line.

x   y      x^2   xy
1   5.1    1     5.1
2   5.9    4     11.8
3   6.3    9     18.9
∑   6      17.3  14    35.8

Substituting the sums into the normal equations:

    \begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \begin{bmatrix} 17.3 \\ 35.8 \end{bmatrix}

Solution: solving this system of two equations gives

a_0 = 4.567 and a_1 = 0.6

y = 4.567 + 0.6 x
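The arithmetic above can be checked in a few lines, using the normal-equation formulas from the preceding slides:

```python
x = [1, 2, 3]
y = [5.1, 5.9, 6.3]
n = len(x)

sx, sy = sum(x), sum(y)                 # 6, 17.3
sxy = sum(a * b for a, b in zip(x, y))  # 35.8
sxx = sum(a * a for a in x)             # 14

a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope: 0.6
a0 = sy / n - a1 * sx / n                       # intercept: 17.3/3 - 0.6*2 ≈ 4.567
```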
Example

Use linear regression to fit the following data to a straight line.

x   y    x^2   xy
0   5    0     0
1   9    1     9
2   13   4     26
3   17   9     51
4   21   16    84
∑   10   65    30    170

Substituting the sums into the normal equations:

    \begin{bmatrix} 5 & 10 \\ 10 & 30 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \begin{bmatrix} 65 \\ 170 \end{bmatrix}

Solution: solving this system of two equations gives

a_0 = 5 and a_1 = 4

y = 5 + 4x
Linearization of Nonlinear Relationships

(a) Data that are ill-suited for linear least-squares regression
(b) Indication that a parabola may be more suitable

Exponential equation:

    y = \alpha e^{\beta x}

Taking the natural logarithm linearizes it:

    \ln y = \ln \alpha + \beta x

so a plot of \ln y versus x gives a straight line with slope \beta and intercept \ln \alpha.
Linearization of Nonlinear Relationships

Power equation:

    y = \alpha x^{\beta}  \quad \Rightarrow \quad  \log y = \log \alpha + \beta \log x

(a plot of \log y versus \log x gives slope \beta and intercept \log \alpha)

Saturation growth-rate equation:

    y = \alpha \frac{x}{\beta + x}  \quad \Rightarrow \quad  \frac{1}{y} = \frac{1}{\alpha} + \frac{\beta}{\alpha} \frac{1}{x}

(a plot of 1/y versus 1/x gives slope \beta / \alpha and intercept 1/\alpha)
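Each linearization reduces a nonlinear fit to the straight-line machinery already developed. A sketch for the exponential case, regressing ln y on x; the data here are synthetic, generated from y = 2e^{0.5x} so the known parameters should be recovered:

```python
import math

x = [0.0, 1.0, 2.0, 3.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]   # synthetic data: alpha = 2, beta = 0.5

ly = [math.log(yi) for yi in y]              # linearize: ln y = ln(alpha) + beta * x
n = len(x)
sx, sl = sum(x), sum(ly)
sxl = sum(a * b for a, b in zip(x, ly))
sxx = sum(a * a for a in x)

beta = (n * sxl - sx * sl) / (n * sxx - sx * sx)  # slope of ln y vs x
alpha = math.exp(sl / n - beta * sx / n)          # exp(intercept)
```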
Polynomial Regression

• Some engineering data is poorly represented by a straight line. A curve (polynomial) may be better suited to fit the data. The least-squares method can be extended to fit the data to higher-order polynomials.
• As an example, let us consider a second-order polynomial to fit the data points:

    y = a_0 + a_1 x + a_2 x^2

Minimize the error:

    S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2

Setting the partial derivatives to zero:

    \frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0
    \frac{\partial S_r}{\partial a_1} = -2 \sum x_i (y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0
    \frac{\partial S_r}{\partial a_2} = -2 \sum x_i^2 (y_i - a_0 - a_1 x_i - a_2 x_i^2) = 0

yields the normal equations:

    n a_0 + \left( \sum x_i \right) a_1 + \left( \sum x_i^2 \right) a_2 = \sum y_i
    \left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 + \left( \sum x_i^3 \right) a_2 = \sum x_i y_i
    \left( \sum x_i^2 \right) a_0 + \left( \sum x_i^3 \right) a_1 + \left( \sum x_i^4 \right) a_2 = \sum x_i^2 y_i
Example: Polynomial Regression

Fit a second-order polynomial to the following data.

x_i   y_i     x_i^2   x_i^3   x_i^4   x_i y_i   x_i^2 y_i
0     2.1     0       0       0       0         0
1     7.7     1       1       1       7.7       7.7
2     13.6    4       8       16      27.2      54.4
3     27.2    9       27      81      81.6      244.8
4     40.9    16      64      256     163.6     654.4
5     61.1    25      125     625     305.5     1527.5
∑     15      152.6   55      225     979       585.6     2488.8

Substituting the sums into the normal equations:

    \begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} 152.6 \\ 585.6 \\ 2488.8 \end{bmatrix}

Solution: solving this system of three equations gives

a_0 = 2.48, a_1 = 2.36, a_2 = 1.86

y = 2.48 + 2.36 x + 1.86 x^2
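The 3×3 system above can be assembled from the data and solved directly. A sketch in plain Python using naive Gaussian elimination (no external libraries assumed; the helper names are ours):

```python
x = [0, 1, 2, 3, 4, 5]
y = [2.1, 7.7, 13.6, 27.2, 40.9, 61.1]
n = len(x)

S = lambda p: sum(xi ** p for xi in x)                       # sum of x_i^p
Sy = lambda p: sum((xi ** p) * yi for xi, yi in zip(x, y))   # sum of x_i^p * y_i

# Normal equations for a second-order polynomial.
A = [[n,    S(1), S(2)],
     [S(1), S(2), S(3)],
     [S(2), S(3), S(4)]]
b = [Sy(0), Sy(1), Sy(2)]

def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for k in range(m):
        p = max(range(k, m), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, m):
            f = M[i][k] / M[k][k]
            for j in range(k, m + 1):
                M[i][j] -= f * M[k][j]
    xs = [0.0] * m
    for i in range(m - 1, -1, -1):   # back substitution
        xs[i] = (M[i][m] - sum(M[i][j] * xs[j] for j in range(i + 1, m))) / M[i][i]
    return xs

a0, a1, a2 = gauss_solve(A, b)   # ≈ 2.4786, 2.3593, 1.8607
```

The unrounded coefficients are a_0 ≈ 2.4786, a_1 ≈ 2.3593, a_2 ≈ 1.8607, matching the rounded values on the slide.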