MAE 4020/5020 – Numerical Methods with MATLAB
SECTION 7: CURVE FITTING
Introduction
K. Webb MAE 4020/5020
Curve Fitting
Often have data, y, that is a function of some independent variable, x, but the underlying relationship is unknown. Know the x’s and y’s (perhaps only approximately), but don’t know y = f(x). Measured data, tabulated data.
Determine a function (i.e., a curve) that “best” describes the relationship between x and y – an approximation to the unknown f(x). This is curve fitting.
Regression vs. Interpolation
We’ll look at two categories of curve fitting:
Least‐squares regression – noisy data, with uncertainty in the y value for a given x value. Want “good” agreement between f(x) and the data points. The curve (i.e., f(x)) may not pass through any data points.
Polynomial interpolation – data points are known exactly (noiseless data). The resulting curve passes through all data points.
Before moving on to discuss least‐squares regression, we’ll first review a few basic concepts from statistics.
Review of Basic Statistics
Basic Statistical Quantities
Arithmetic mean – the average or expected value:
ȳ = (1/n)·Σyᵢ
Standard deviation (unbiased) – a measure of the spread of the data about the mean:
sy = √( St / (n − 1) )
where St = Σ(yᵢ − ȳ)² is the total sum of the squares of the residuals about the mean.
Basic Statistical Quantities
Variance – another measure of spread; the square of the standard deviation. A useful measure due to its relationship with the power and power spectral density of a signal or data set:
sy² = St / (n − 1)   or   sy² = ( Σyᵢ² − (Σyᵢ)²/n ) / (n − 1)
Normal (Gaussian) Distribution
Many naturally‐occurring random processes are normally‐distributed, e.g. measurement noise. Very often we assume the noise in our data is Gaussian. Probability density function (pdf):
f(y) = ( 1 / (σ·√(2π)) )·exp( −(y − μ)² / (2σ²) )
where σ² is the variance, and μ is the mean of the random variable, y.
Statistics in MATLAB
Generation of random values – very useful for simulations including noise.
Uniformly‐distributed: rand(m,n). E.g., generate an m × n matrix of uniformly‐distributed points between xmin and xmax:
x = xmin + (xmax - xmin)*rand(m,n)
Normally‐distributed: randn(m,n). E.g., generate an m × n matrix of normally‐distributed points with mean of mu and standard deviation of sigma:
x = mu + sigma*randn(m,n)
Statistics in MATLAB
Histogram plots: hist(x,binx,Nbins) – a graphical depiction of the variation of random quantities. Plots the frequency of occurrence of ranges (bins) of values and provides insight into the nature of the distribution. E.g., generate a histogram of the values in the vector x with 20 bins:
hist(x,20)
The optional input argument, binx, specifies the x values at the centers of the bins.
Built‐in statistical functions: min.m, max.m, mean.m, var.m, std.m, median.m, mode.m
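Since the slides’ examples are MATLAB one-liners, here is a rough standard-library Python sketch of the same two recipes; the bounds xmin, xmax and the parameters mu, sigma are made up for illustration:

```python
import random
import statistics

random.seed(0)  # repeatable runs

# Uniformly distributed samples: analogous to x = xmin + (xmax - xmin)*rand(m,n)
xmin, xmax = 2.0, 5.0
u = [xmin + (xmax - xmin) * random.random() for _ in range(10_000)]

# Normally distributed samples: analogous to x = mu + sigma*randn(m,n)
mu, sigma = 1.0, 0.5
g = [mu + sigma * random.gauss(0.0, 1.0) for _ in range(10_000)]

# Sample means should land near (xmin + xmax)/2 and mu, respectively
print(statistics.mean(u), statistics.mean(g))
```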
Linear Least Squares Regression
Linear Regression
Noisy data: yᵢ values at known xᵢ values.
Suspect the relationship between x and y is linear, i.e., assume
y = a₀ + a₁x + e
Determine the a₀ and a₁ that define the “best‐fit” line for the data.
How do we define the “best fit”?
Measured Data
Assumed a linear relationship between x and y:
y = a₀ + a₁x + e
Due to noise, we can’t measure y exactly at each x – we can only approximate its value. Measured values are approximations: the true value of y plus some random error, or residual, e.
Best Fit Criteria
Noisy data do not all lie on a single line – there is a discrepancy between each point and the line fit to the data. The error, or residual:
eᵢ = yᵢ − a₀ − a₁xᵢ
Minimize some measure of this residual:
Minimize the sum of the residuals – positive and negative errors can cancel; non‐unique fit.
Minimize the sum of the absolute values of the residuals – the effect of the sign of the error is eliminated, but still not a unique fit.
Minimize the maximum error – the minimax criterion – excessive influence is given to single outlying points.
Least‐Squares Criterion
A better fitting criterion is to minimize the sum of the squares of the residuals:
Sr = Σeᵢ² = Σ(yᵢ − a₀ − a₁xᵢ)²
This yields a unique best‐fit line for a given set of data.
The sum of the squares of the residuals is a function of the two fitting parameters, a₀ and a₁.
Minimize Sr by setting its partial derivatives to zero and solving for a₀ and a₁.
Least‐Squares Criterion
At its minimum point, the partial derivatives of Sr with respect to a₀ and a₁ will be zero:
∂Sr/∂a₀ = −2·Σ(yᵢ − a₀ − a₁xᵢ) = 0
∂Sr/∂a₁ = −2·Σ(yᵢ − a₀ − a₁xᵢ)·xᵢ = 0
Breaking up the summations:
Σyᵢ − Σa₀ − Σa₁xᵢ = 0
Σxᵢyᵢ − Σa₀xᵢ − Σa₁xᵢ² = 0
Normal Equations
These form a system of two equations with two unknowns, a₀ and a₁:
n·a₀ + (Σxᵢ)·a₁ = Σyᵢ    (1)
(Σxᵢ)·a₀ + (Σxᵢ²)·a₁ = Σxᵢyᵢ    (2)
In matrix form:
[ n     Σxᵢ  ] [a₀]   [ Σyᵢ   ]
[ Σxᵢ   Σxᵢ² ] [a₁] = [ Σxᵢyᵢ ]    (3)
These are the normal equations.
Normal Equations
The normal equations can be solved for a₀ and a₁:
a₁ = ( n·Σxᵢyᵢ − Σxᵢ·Σyᵢ ) / ( n·Σxᵢ² − (Σxᵢ)² )
a₀ = ȳ − a₁·x̄
Or solve the matrix form of the normal equations, (3), in MATLAB using mldivide.m (\):
a = A\b
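As a sketch of these formulas outside MATLAB, this pure-Python snippet builds the four sums, solves the 2×2 normal equations with Cramer’s rule, and recovers a slope near 2 for made-up data that roughly follow y = 2x:

```python
# Linear least-squares fit y = a0 + a1*x via the normal equations.
# Data values are illustrative, not from the slides.
x = [0, 1, 2, 3, 4, 5]
y = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9]

n = len(x)
Sx  = sum(x)
Sy  = sum(y)
Sxx = sum(xi * xi for xi in x)
Sxy = sum(xi * yi for xi, yi in zip(x, y))

# Solve [[n, Sx], [Sx, Sxx]] [a0, a1]^T = [Sy, Sxy]^T by Cramer's rule
det = n * Sxx - Sx * Sx
a0 = (Sy * Sxx - Sx * Sxy) / det
a1 = (n * Sxy - Sx * Sy) / det
print(a0, a1)  # slope a1 should be close to 2
```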
Linear Least‐Squares ‐ Example
Noisy data with a suspected linear relationship.
Calculate the summation terms in the normal equations: Σxᵢ, Σyᵢ, Σxᵢ², Σxᵢyᵢ
Linear Least‐Squares ‐ Example
Assemble the normal equation matrices.
Solve the normal equations for the vector of coefficients, a, using mldivide.m
Goodness of Fit
How well does a function fit the data? Is a linear fit best? A quadratic, higher‐order polynomial, or other non‐linear function? Want a way to quantify goodness of fit.
Quantify the spread of the data about the mean prior to regression:
St = Σ(yᵢ − ȳ)²
Following regression, quantify the spread of the data about the regression line (or curve):
Sr = Σ(yᵢ − a₀ − a₁xᵢ)²
Goodness of Fit
St quantifies the spread of the data about the mean; Sr quantifies the spread about the best‐fit line (curve) – the spread that remains after the trend is explained, the unexplained sum of the squares.
St − Sr represents the reduction in data spread after the regression explains the underlying trend.
Normalize St − Sr to St – the coefficient of determination:
r² = (St − Sr)/St
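A quick pure-Python illustration of r² = (St − Sr)/St, using made-up data and an assumed best-fit line (the coefficients a0, a1 here are illustrative, not computed):

```python
# Coefficient of determination for a line fit y = a0 + a1*x.
x = [0, 1, 2, 3, 4]
y = [0.2, 2.1, 3.9, 6.2, 7.8]
a0, a1 = 0.1, 1.95          # assumed best-fit coefficients (illustrative)

ybar = sum(y) / len(y)
St = sum((yi - ybar) ** 2 for yi in y)                        # spread about mean
Sr = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))  # spread about fit
r2 = (St - Sr) / St
print(r2)  # close to 1 for a good fit
```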
Coefficient of Determination
For a perfect fit: no variation in the data about the regression line, Sr = 0 → r² = 1.
If the fit provides no improvement over simply characterizing the data by its mean value: Sr = St → r² = 0.
If the fit is worse at explaining the data than their mean value: Sr > St → r² < 0.
Coefficient of Determination
Calculate r² for the previous example.
Don’t rely too heavily on the value of r². Anscombe’s famous data sets: the same line fits all four very different data sets, with the same r² in each case. (Figure from Chapra.)
Linearization of Nonlinear Relationships
Nonlinear functions
Not all data can be explained by a linear relationship to an independent variable, e.g.
Exponential model: y = α·e^(βx)
Power equation: y = α·x^β
Saturation‐growth‐rate equation: y = α·x/(β + x)
Nonlinear functions
Methods for nonlinear curve fitting:
Linearization of the nonlinear relationship – transform the dependent and/or independent data values, apply linear least‐squares regression, then inverse‐transform the determined coefficients back to those that define the nonlinear functional relationship.
Nonlinear regression – treat as an optimization problem (more later…).
Linearizing an Exponential Relationship
Have noisy data that is believed to be best described by an exponential relationship:
y = α·e^(βx)
Linearize the fitting equation:
ln(y) = ln(α) + βx
or  ỹ = a₀ + a₁x
where ỹ = ln(y), a₀ = ln(α), a₁ = β
Linearizing an Exponential Relationship
Fit a line to the transformed data, (x, ln(y)), using linear least‐squares regression, and determine a₀ and a₁.
Can calculate r² for the line fit to the transformed data.
Note that the original y data must be positive.
Linearizing an Exponential Relationship
Transform the linear fitting parameters, a₀ and a₁, back to the parameters defining the exponential relationship:
y = α·e^(βx),  where α = e^(a₀), β = a₁
Note that r² for the exponential fit is different than that for the line fit to the transformed data.
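The whole linearize, fit, transform-back loop can be sketched in a few lines of pure Python; the data here are noiseless samples of an assumed model y = 2·e^(0.5x), so the fit recovers α = 2 and β = 0.5 (values chosen for illustration):

```python
import math

# Made-up data from y = alpha*exp(beta*x), noiseless for clarity
alpha_true, beta_true = 2.0, 0.5
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [alpha_true * math.exp(beta_true * xi) for xi in x]

# Transform: ln(y) = ln(alpha) + beta*x is linear in x
z = [math.log(yi) for yi in y]

# Linear least squares on (x, z) via the normal equations
n = len(x)
Sx, Sz = sum(x), sum(z)
Sxx = sum(xi * xi for xi in x)
Sxz = sum(xi * zi for xi, zi in zip(x, z))
det = n * Sxx - Sx * Sx
a0 = (Sz * Sxx - Sx * Sxz) / det
a1 = (n * Sxz - Sx * Sz) / det

# Transform back: alpha = e^a0, beta = a1
alpha, beta = math.exp(a0), a1
print(alpha, beta)  # recovers 2.0 and 0.5 for noiseless data
```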
Linearizing a Power Equation
Have noisy data that is believed to be best described by a power equation:
y = α·x^β
Linearize the fitting equation:
log(y) = log(α) + β·log(x)
or  ỹ = a₀ + a₁x̃
where ỹ = log(y), x̃ = log(x), a₀ = log(α), a₁ = β
Linearizing a Power Equation
Fit a line to the transformed data, (log(x), log(y)), using linear least‐squares regression, and determine a₀ and a₁.
Can calculate r² for the line fit to the transformed data.
Note that the original data – both x and y – must be positive.
Linearizing a Power Equation
Transform the linear fitting parameters, a₀ and a₁, back to the parameters defining the power equation:
y = α·x^β,  where α = 10^(a₀), β = a₁
Note that r² for the power‐equation fit is different than that for the line fit to the transformed data.
Linearizing a Saturation Growth‐Rate Equation
Have noisy data that is believed to be best described by a saturation growth‐rate equation:
y = α·x/(β + x)
Linearize the fitting equation:
1/y = 1/α + (β/α)·(1/x)
or  ỹ = a₀ + a₁x̃
where ỹ = 1/y, x̃ = 1/x, a₀ = 1/α, a₁ = β/α
Linearizing a Saturation Growth‐Rate Equation
Fit a line to the transformed data, (1/x, 1/y), using linear least‐squares regression, and determine a₀ and a₁.
Can calculate r² for the line fit to the transformed data.
Linearizing a Saturation Growth‐Rate Equation
Transform the linear fitting parameters, a₀ and a₁, back to the parameters defining the saturation growth‐rate equation:
y = α·x/(β + x),  where α = 1/a₀, β = a₁/a₀
Note that r² for this fit is different than that for the line fit to the transformed data.
Polynomial Regression
Polynomial Regression
So far we’ve looked at fitting straight lines to linear and linearized data sets.
Can also fit mth‐order polynomials directly to data using polynomial regression.
Same fitting criterion as linear regression: minimize the sum of the squares of the residuals. There are m+1 fitting parameters for an mth‐order polynomial, and therefore m+1 normal equations.
Polynomial Regression
Assume, for example, that we have data we believe to be quadratic in nature. 2nd‐order polynomial regression – fitting equation:
y = a₀ + a₁x + a₂x² + e
The best fit will minimize the sum of the squares of the residuals:
Sr = Σ(yᵢ − a₀ − a₁xᵢ − a₂xᵢ²)²
Polynomial Regression – Normal Equations
The best‐fit polynomial coefficients will minimize Sr. Differentiate w.r.t. each coefficient and set to zero:
∂Sr/∂a₀ = −2·Σ(yᵢ − a₀ − a₁xᵢ − a₂xᵢ²) = 0
∂Sr/∂a₁ = −2·Σxᵢ·(yᵢ − a₀ − a₁xᵢ − a₂xᵢ²) = 0
∂Sr/∂a₂ = −2·Σxᵢ²·(yᵢ − a₀ − a₁xᵢ − a₂xᵢ²) = 0
Polynomial Regression – Normal Equations
Rearranging the normal equations yields
n·a₀ + (Σxᵢ)·a₁ + (Σxᵢ²)·a₂ = Σyᵢ
(Σxᵢ)·a₀ + (Σxᵢ²)·a₁ + (Σxᵢ³)·a₂ = Σxᵢyᵢ
(Σxᵢ²)·a₀ + (Σxᵢ³)·a₁ + (Σxᵢ⁴)·a₂ = Σxᵢ²yᵢ
which can be put into matrix form:
[ n     Σxᵢ   Σxᵢ² ] [a₀]   [ Σyᵢ    ]
[ Σxᵢ   Σxᵢ²  Σxᵢ³ ] [a₁] = [ Σxᵢyᵢ  ]
[ Σxᵢ²  Σxᵢ³  Σxᵢ⁴ ] [a₂]   [ Σxᵢ²yᵢ ]
This system of equations can be solved for the vector of unknown coefficients using MATLAB’s mldivide.m (\).
Polynomial Regression – Normal Equations
For mth‐order polynomial regression the normal equations are:
[ n        Σxᵢ        ⋯  Σxᵢ^m     ] [a₀]   [ Σyᵢ      ]
[ Σxᵢ      Σxᵢ²       ⋯  Σxᵢ^(m+1) ] [a₁] = [ Σxᵢyᵢ    ]
[ ⋮        ⋮          ⋱  ⋮         ] [⋮ ]   [ ⋮        ]
[ Σxᵢ^m    Σxᵢ^(m+1)  ⋯  Σxᵢ^(2m)  ] [aₘ]   [ Σxᵢ^m·yᵢ ]
Again, this system of m+1 equations can be solved for the vector of m+1 unknown polynomial coefficients using MATLAB’s mldivide.m (\).
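A pure-Python sketch of these normal equations for general m: build the (m+1)×(m+1) system of power sums, then solve it with naive Gaussian elimination (a stand-in for MATLAB’s mldivide; fine for the small systems here, though such systems become ill-conditioned for large m):

```python
# m-th order polynomial regression by forming and solving the normal equations.
def polyfit_normal(x, y, m):
    n = m + 1
    # A[k][j] = sum(x_i^(k+j)),  b[k] = sum(y_i * x_i^k)
    A = [[sum(xi ** (k + j) for xi in x) for j in range(n)] for k in range(n)]
    b = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        a[r] = (b[r] - sum(A[r][c] * a[c] for c in range(r + 1, n))) / A[r][r]
    return a  # coefficients [a0, a1, ..., am]

# Noiseless quadratic data y = 1 + 2x + 3x^2 is recovered exactly
x = [0, 1, 2, 3, 4]
y = [1 + 2 * xi + 3 * xi ** 2 for xi in x]
print(polyfit_normal(x, y, 2))
```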
Polynomial Regression – Example
Polynomial Regression – polyfit.m
p = polyfit(x,y,m)
x: n‐vector of independent variable data values; y: n‐vector of dependent variable data values; m: order of the polynomial to be fit to the data; p: (m+1)‐vector of best‐fit polynomial coefficients, in descending powers of x.
Least‐squares polynomial regression if m < n − 1, i.e., for over‐determined systems.
Polynomial interpolation if m = n − 1 – the resulting fit passes through all (x,y) points – more later.
Polynomial Regression – polyfit.m
Note that the result matches that obtained by solving normal equations
Exercise
Determine the 4th‐order polynomial with roots at
Generate noiseless data points by evaluating this polynomial at integer values of from to
Add Gaussian white noise with a standard deviation of to your data points
Use polyfit.m to fit a 4th‐order polynomial to the noisy data
Calculate the coefficient of determination, r². Plot the noisy data points, along with the best‐fit polynomial.
Polynomial Regression Using polyfit.m
Multiple Linear Regression
Multiple Linear Regression
We have so far fit lines or curves to data described by functions of a single variable. For functions of multiple variables, fit planes or surfaces to data.
Linear function of two independent variables – multiple linear regression:
y = a₀ + a₁x₁ + a₂x₂ + e
The sum of the squares of the residuals is now
Sr = Σ(yᵢ − a₀ − a₁x₁,ᵢ − a₂x₂,ᵢ)²
Multiple Linear Regression – Normal Equations
Differentiate Sr w.r.t. the fitting coefficients and equate to zero. The normal equations:
[ n       Σx₁,ᵢ       Σx₂,ᵢ      ] [a₀]   [ Σyᵢ     ]
[ Σx₁,ᵢ   Σx₁,ᵢ²      Σx₁,ᵢx₂,ᵢ  ] [a₁] = [ Σx₁,ᵢyᵢ ]
[ Σx₂,ᵢ   Σx₁,ᵢx₂,ᵢ   Σx₂,ᵢ²     ] [a₂]   [ Σx₂,ᵢyᵢ ]
Solve as before – the fitting coefficients, a₀, a₁, a₂, now define a plane.
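A minimal pure-Python sketch of the plane fit: assemble the 3×3 normal equations and solve them with Cramer’s rule. The data are noiseless samples of an assumed plane y = 1 + 2x₁ + 3x₂, so the coefficients are recovered exactly:

```python
# Multiple linear regression: fit y = a0 + a1*x1 + a2*x2 (toy data).
def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

x1 = [0, 1, 2, 0, 1, 2]
x2 = [0, 0, 0, 1, 1, 1]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]   # exact plane, no noise

n = len(y)
Sx1x2 = sum(u * v for u, v in zip(x1, x2))
A = [[n,       sum(x1),                    sum(x2)],
     [sum(x1), sum(u * u for u in x1),     Sx1x2],
     [sum(x2), Sx1x2,                      sum(v * v for v in x2)]]
b = [sum(y),
     sum(u * yi for u, yi in zip(x1, y)),
     sum(v * yi for v, yi in zip(x2, y))]

# Cramer's rule: replace column j of A with b
D = det3(A)
def with_col(j):
    return [[b[r] if c == j else A[r][c] for c in range(3)] for r in range(3)]
a = [det3(with_col(j)) / D for j in range(3)]
print(a)  # recovers [1.0, 2.0, 3.0]
```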
General Linear Least Squares Regression
General Linear Least Squares
We’ve seen three types of least‐squares regression: linear regression, polynomial regression, and multiple linear regression.
All are special cases of general linear least‐squares regression:
y = a₀z₀ + a₁z₁ + a₂z₂ + ⋯ + aₘzₘ + e
The zᵢ’s are basis functions. Basis functions may be nonlinear. This is still linear regression, because the dependence on the fitting coefficients, aᵢ, is linear.
General Linear Least Squares
For linear regression – simple or multiple: z₀ = 1, z₁ = x₁, z₂ = x₂, …
For polynomial regression: z₀ = 1, z₁ = x, z₂ = x², …
In all cases, this is a linear combination of basis functions, which may, themselves, be nonlinear.
General Linear Least Squares
The general linear least‐squares model:
y = a₀z₀ + a₁z₁ + ⋯ + aₘzₘ + e
can be expressed in matrix form:
{y} = [Z]{a} + {e}
where [Z] is an n × (m+1) matrix, the design matrix, whose entries are the m+1 basis functions evaluated at the independent variable values corresponding to the n measurements:
[ z₀(x₁)  z₁(x₁)  ⋯  zₘ(x₁) ]
[ z₀(x₂)  z₁(x₂)  ⋯  zₘ(x₂) ]
[ ⋮       ⋮       ⋱  ⋮      ]
[ z₀(xₙ)  z₁(xₙ)  ⋯  zₘ(xₙ) ]
where zⱼ(xᵢ) is the jth basis function evaluated at the ith independent variable value. (Note: j is not the row index and i is not the column index, here.)
General Linear Least Squares
The least‐squares model is:
{y} = [Z]{a} + {e}
More measurements than coefficients: n > m + 1.
[Z] is not square – it is tall and narrow. Over‐determined system: [Z]⁻¹ does not exist.
General Linear Least Squares – Design Matrix Example
For example, consider fitting a quadratic to five measured values, yᵢ, at x₁, …, x₅. The model is:
y = a₀ + a₁x + a₂x² + e
The basis functions are 1, x, and x², and the least‐squares equation is {y} = [Z]{a} + {e}, where the ith row of [Z] is [1  xᵢ  xᵢ²].
General Linear Least Squares – Residuals
The linear least‐squares model is:
{y} = [Z]{a} + {e}    (1)
Residual:
{e} = {y} − [Z]{a}    (2)
Sum of the squares of the residuals:
Sr = {e}ᵀ{e} = ({y} − [Z]{a})ᵀ({y} − [Z]{a})    (3)
Expanding,
Sr = {y}ᵀ{y} − 2{y}ᵀ[Z]{a} + {a}ᵀ[Z]ᵀ[Z]{a}    (4)
Deriving the Normal Equations
The best fit will minimize the sum of the squares of the residuals. Differentiate Sr with respect to the coefficient vector, {a}, and set to zero:
∂Sr/∂{a} = 0    (5)
We’ll need to use some matrix calculus identities:
∂({a}ᵀ{b})/∂{a} = {b},  ∂({a}ᵀ[A]{a})/∂{a} = 2[A]{a} (for symmetric [A])    (6)
Deriving the Normal Equations
Using the matrix derivative relationships, (6),
∂Sr/∂{a} = −2[Z]ᵀ{y} + 2[Z]ᵀ[Z]{a} = 0    (7)
Equation (7) is the matrix form of the normal equations:
[Z]ᵀ[Z]{a} = [Z]ᵀ{y}    (8)
The solution to (8) is the vector of least‐squares fitting coefficients:
{a} = ([Z]ᵀ[Z])⁻¹[Z]ᵀ{y}    (9)
Solving the Normal Equations
{a} = ([Z]ᵀ[Z])⁻¹[Z]ᵀ{y}    (9)
Remember, our starting point was the linear least‐squares model:
{y} = [Z]{a} + {e}    (10)
Couldn’t we have solved (10) for the fitting coefficients as
{a} = [Z]⁻¹{y} ?    (11)
No – we must solve using (9), because: we don’t have the true y values, only noisy approximations; we have an over‐determined system; and [Z] is not square, so [Z]⁻¹ does not exist.
Solving the Normal Equations
The solution to the linear least‐squares problem is:
{a} = [Z]⁺{y}    (12)
where
[Z]⁺ = ([Z]ᵀ[Z])⁻¹[Z]ᵀ    (13)
is the Moore‐Penrose pseudo‐inverse of [Z].
Use the pseudo‐inverse to find the least‐squares solution to an over‐determined system.
Coefficient of Determination
Goodness of fit is characterized by the coefficient of determination:
r² = (St − Sr)/St    (14)
where Sr is given by (3), and
St = Σ(yᵢ − ȳ)²    (15)
General Least Squares in MATLAB
Have n measurements
y₁, y₂, ⋯, yₙ
at known independent variable values
x₁, x₂, ⋯, xₙ
and a model defined by m+1 basis functions
z₀, z₁, ⋯, zₘ
Generate the design matrix [Z] by evaluating the m+1 basis functions at all n values of x:
[ z₀(x₁)  z₁(x₁)  ⋯  zₘ(x₁) ]
[ ⋮       ⋮       ⋱  ⋮      ]
[ z₀(xₙ)  z₁(xₙ)  ⋯  zₘ(xₙ) ]
General Least Squares in MATLAB
Solve for the vector of fitting coefficients as the solution to the normal equations
a = (Z'*Z)\(Z'*y)
or by using mldivide.m (\) directly:
a = Z\y
The result is the same, though the methods are different – mldivide.m uses QR factorization to find the solution to over‐determined systems.
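The same pipeline can be sketched in pure Python, with deliberately nonlinear basis functions (1 and sin x) to emphasize that “linear” refers to the coefficients; the data are noiseless samples of an assumed model y = 3 + 2·sin(x):

```python
import math

# Basis functions z0(x) = 1, z1(x) = sin(x); model y = a0*z0 + a1*z1
basis = [lambda t: 1.0, lambda t: math.sin(t)]
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [3.0 + 2.0 * math.sin(xi) for xi in x]    # exact model, no noise

# Design matrix Z (n x 2), then normal equations Z^T Z a = Z^T y
Z = [[f(xi) for f in basis] for xi in x]
n = len(x)
ZtZ = [[sum(Z[i][r] * Z[i][c] for i in range(n)) for c in range(2)]
       for r in range(2)]
Zty = [sum(Z[i][r] * y[i] for i in range(n)) for r in range(2)]

# Solve the 2x2 system by Cramer's rule
det = ZtZ[0][0] * ZtZ[1][1] - ZtZ[0][1] * ZtZ[1][0]
a0 = (Zty[0] * ZtZ[1][1] - ZtZ[0][1] * Zty[1]) / det
a1 = (ZtZ[0][0] * Zty[1] - Zty[0] * ZtZ[1][0]) / det
print(a0, a1)   # recovers 3.0 and 2.0
```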
Nonlinear Regression
Nonlinear Regression – fminsearch.m
Nonlinear models have a nonlinear dependence on the fitting parameters, e.g. y = α·e^(βx).
Two options for fitting nonlinear models to data: linearize the model first, then use linear regression; or fit the nonlinear model directly by treating it as an optimization problem.
Want to minimize a cost function, where the cost function is the sum of the squares of the residuals.
Find the minimum of the cost function – a multi‐dimensional optimization. Use MATLAB’s fminsearch.m.
Nonlinear Regression – fminsearch.m
Have noisy data that is believed to be best described by an exponential relationship, y = α·e^(βx). Cost function:
Σ(yᵢ − α·e^(βxᵢ))²
Find the α and β that minimize this cost – use fminsearch.m.
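fminsearch itself is a Nelder–Mead simplex search; as a toolbox-free stand-in, this pure-Python sketch minimizes the same sum-of-squares cost with a crude refining pattern search over (α, β), on noiseless made-up data from y = 2·e^(0.5x):

```python
import math

x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]   # true alpha = 2, beta = 0.5

def cost(alpha, beta):
    # Sum of squared residuals for the model y = alpha*exp(beta*x)
    return sum((yi - alpha * math.exp(beta * xi)) ** 2
               for xi, yi in zip(x, y))

# Pattern search: try the 8 neighbors of the current best point;
# halve the step whenever no neighbor improves the cost.
best = (1.0, 1.0)
step = 1.0
for _ in range(30):
    a0, b0 = best
    candidates = [(a0 + i * step, b0 + j * step)
                  for i in (-1, 0, 1) for j in (-1, 0, 1)]
    best = min(candidates, key=lambda p: cost(*p))
    if best == (a0, b0):
        step /= 2
print(best)   # converges toward (2.0, 0.5)
```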
Nonlinear Regression – lsqcurvefit.m
An alternative to minimizing a cost function using fminsearch.m, if the optimization toolbox is available:
a = lsqcurvefit(f,a0,x,y)
f: handle to the fitting function, e.g., f = @(a,x) a(1)*exp(a(2)*x);
a0: initial guess for the fitting parameters
x: independent variable data
y: dependent variable data
a: best‐fit parameters
Polynomial Interpolation
Polynomial Interpolation
Sometimes we know both the x and y values exactly. Want a function that describes y = f(x) and allows for interpolation between known data points.
Fit an nth‐order polynomial to n+1 data points. The polynomial will pass through all n+1 points.
We’ll look at polynomial interpolation using: Newton’s polynomial, and the Lagrange polynomial.
Polynomial Interpolation
Can approach similarly to linear least‐squares regression:
y = a₀ + a₁x + a₂x² + ⋯ + aₙxⁿ
where the basis functions are 1, x, x², …
For an nth‐order polynomial, we have n+1 equations with n+1 unknowns. In matrix form:
{y} = [X]{a}
Polynomial Interpolation
Now, unlike for linear regression: all n+1 values in {y} are known exactly; there are n+1 equations with n+1 unknown coefficients; and [X] is square, (n+1) × (n+1):
[ x₁ⁿ    ⋯  x₁    1 ] [aₙ]   [ y₁   ]
[ x₂ⁿ    ⋯  x₂    1 ] [⋮ ] = [ ⋮    ]
[ ⋮      ⋱  ⋮     ⋮ ] [a₁]   [ ⋮    ]
[ xₙ₊₁ⁿ  ⋯  xₙ₊₁  1 ] [a₀]   [ yₙ₊₁ ]
Could solve in MATLAB using mldivide.m: a = X\y.
[X] is a Vandermonde matrix. These tend to be ill‐conditioned – the techniques that follow are more numerically robust.
Newton Interpolating Polynomial
Linear Interpolation
Fit a line (1st‐order polynomial) to two data points using a truncated Taylor series (or simple trigonometry):
f₁(x) = f(x₁) + [ (f(x₂) − f(x₁)) / (x₂ − x₁) ]·(x − x₁)
where f₁(x) is the function for the line fit to the data, and f(x₁) and f(x₂) are the known data values.
This is the Newton linear‐interpolation formula.
Quadratic Interpolation
To fit a 2nd‐order polynomial to three data points, consider the following form:
f₂(x) = b₁ + b₂(x − x₁) + b₃(x − x₁)(x − x₂)
Evaluate at x = x₁ to find b₁ = f(x₁).
Back‐substitution and evaluation at x = x₂ and at x = x₃ will yield the other coefficients:
b₂ = (f(x₂) − f(x₁)) / (x₂ − x₁)
and
b₃ = [ (f(x₃) − f(x₂))/(x₃ − x₂) − (f(x₂) − f(x₁))/(x₂ − x₁) ] / (x₃ − x₁)
Quadratic Interpolation
Can still view this as a Taylor series approximation: b₁ represents an offset, b₂ is a slope, and b₃ is curvature.
The choice of the initial quadratic form (the Newton interpolating polynomial) was made to facilitate the development. The resulting polynomial would be the same for any initial form of an nth‐order polynomial – the solution is unique.
nth‐Order Newton Interpolating Polynomial
Extending the quadratic example to nth order:
fₙ(x) = b₁ + b₂(x − x₁) + ⋯ + bₙ₊₁(x − x₁)(x − x₂)⋯(x − xₙ)
Solve for the coefficients as before, with back‐substitution and evaluation of fₙ(x):
b₁ = f(x₁), b₂ = f[x₂, x₁], b₃ = f[x₃, x₂, x₁], …
where f[⋯] denotes a finite divided difference.
Finite Divided Differences
First finite divided difference:
f[xᵢ, xⱼ] = (f(xᵢ) − f(xⱼ)) / (xᵢ − xⱼ)
Second finite divided difference:
f[xᵢ, xⱼ, xₖ] = (f[xᵢ, xⱼ] − f[xⱼ, xₖ]) / (xᵢ − xₖ)
nth finite divided difference:
f[xₙ, xₙ₋₁, …, x₂, x₁] = (f[xₙ, xₙ₋₁, …, x₂] − f[xₙ₋₁, …, x₂, x₁]) / (xₙ − x₁)
Calculate recursively.
nth‐Order Newton Interpolating Polynomial
The nth‐order Newton interpolating polynomial in terms of divided differences:
fₙ(x) = f(x₁) + f[x₂, x₁](x − x₁) + f[x₃, x₂, x₁](x − x₁)(x − x₂) + ⋯ + f[xₙ₊₁, xₙ, …, x₁](x − x₁)(x − x₂)⋯(x − xₙ)
Divided difference table for calculation of the coefficients: (figure from Chapra)
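The divided-difference table and the resulting Newton polynomial can be sketched in pure Python; the data points are made-up samples of y = x², which four points reproduce exactly (as they would any cubic-or-lower polynomial):

```python
# Newton interpolating polynomial via a divided-difference table.
def divided_differences(x, y):
    # After the loops, coef[j] = f[x0, x1, ..., xj]
    coef = list(y)
    n = len(x)
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (x[i] - x[i - j])
    return coef

def newton_eval(x, coef, t):
    # b1 + b2*(t - x0) + b3*(t - x0)(t - x1) + ...
    result, prod = 0.0, 1.0
    for i, b in enumerate(coef):
        result += b * prod
        prod *= (t - x[i])
    return result

xs = [1.0, 2.0, 4.0, 5.0]
ys = [1.0, 4.0, 16.0, 25.0]        # samples of y = x^2
b = divided_differences(xs, ys)
print(newton_eval(xs, b, 3.0))     # interpolates x^2 exactly: 9.0
```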
Newton Interpolating Polynomial – Example
Lagrange Interpolating Polynomial
Linear Lagrange Interpolation
Fit a first‐order polynomial (a line) to two known data points, (x₁, f(x₁)) and (x₂, f(x₂)):
f₁(x) = L₁(x)·f(x₁) + L₂(x)·f(x₂)
L₁ and L₂ are weighting functions, where
L₁(x₁) = 1, L₁(x₂) = 0, L₂(x₂) = 1, L₂(x₁) = 0
The interpolating polynomial is a weighted sum of the individual data point values.
Linear Lagrange Interpolation
For linear (1st‐order) interpolation, the weighting functions are:
L₁(x) = (x − x₂)/(x₁ − x₂),  L₂(x) = (x − x₁)/(x₂ − x₁)
The linear Lagrange interpolating polynomial is:
f₁(x) = [(x − x₂)/(x₁ − x₂)]·f(x₁) + [(x − x₁)/(x₂ − x₁)]·f(x₂)
nth‐Order Lagrange Interpolation
The Lagrange interpolation technique can be extended to nth‐order polynomials:
fₙ(x) = Σ Lᵢ(x)·f(xᵢ),  summed over i = 1 … n+1
where
Lᵢ(x) = Π (x − xⱼ)/(xᵢ − xⱼ),  the product taken over j = 1 … n+1, j ≠ i
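A direct pure-Python transcription of these two formulas, evaluated on made-up samples of y = x² (exact for these points, since four points determine any cubic-or-lower polynomial):

```python
# n-th order Lagrange interpolation: f(t) = sum_i y_i * L_i(t),
# with L_i(t) = prod_{j != i} (t - x_j)/(x_i - x_j).
def lagrange_eval(x, y, t):
    total = 0.0
    for i, (xi, yi) in enumerate(zip(x, y)):
        L = 1.0                      # weighting function L_i(t)
        for j, xj in enumerate(x):
            if j != i:
                L *= (t - xj) / (xi - xj)
        total += yi * L
    return total

xs = [1.0, 2.0, 4.0, 5.0]
ys = [1.0, 4.0, 16.0, 25.0]        # samples of y = x^2
print(lagrange_eval(xs, ys, 3.0))  # interpolates x^2 exactly: 9.0
```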
Lagrange Interpolating Polynomial – Example
MATLAB’s Curve‐Fitting Tool
MATLAB’s Curve‐Fitting Tool GUI
Type cftool at the command line. This launches the curve fitting tool GUI in a new window.
Import data from the workspace and quickly try fitting different types of functions to the data: polynomial, exponential, power, Fourier, custom equations, and more…
Interpolation for m = n − 1; least‐squares regression for m < n − 1.
Exercise
Go to the course website and download the following data file: cftool_ex_data.mat
Load the data file into the workspace. At the command line, type: load('cftool_ex_data')
Launch the Curve Fitting Tool: type cftool at the command line.
Try fitting different functions to each of the three data sets. You can open a new tab in the GUI for each fit.
Curve Fitting with the cftool GUI