Differential Equations & Functional Data Analysis
Parameter Estimation for Differential Equations
Whitney Huang
Department of Statistics
Purdue University
April 21, 2014
Whitney Huang (Purdue University) DE’s & FDA April 21, 2014 1 / 27
Outline
1 Motivation
2 The estimation procedure
3 Example: Groundwater Levels
Differential Equations
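A differential equation is an equation that relates some function of one or more variables with its derivatives.
Ordinary differential equation:
\[ m\ddot{x} + b\dot{x} + kx = F_{\text{ext}} \]
Partial differential equation:
\[ \frac{\partial u}{\partial t} - \alpha\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}\right) = 0 \]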
Differential Equations
- Widely used to model dynamic systems
  - Maxwell’s equations in electromagnetism
  - Navier–Stokes equations in fluid dynamics
  - The Black–Scholes PDE in economics
- The forward problem (i.e. solving differential equations) has been studied extensively by mathematicians
Inverse Problem
Suppose a given data set can be reasonably modeled by a differential equation with unknown coefficients. Can we make statistical inferences about those coefficients?
Differential Equations & Functional Data Analysis
Most dynamic systems defined by the solutions of their differential equations are not fit to data; they are intended to capture gross shape features in the specified context. However, ...
- Solutions of differential equations are functions
- We can treat the data as an approximate solution of the corresponding differential equation
⇒ The FDA framework can help with the inverse problem
Set-up
Differential equation:
\[ f(x, \dot{x}, \ddot{x}, \cdots; \theta) = 0 \]
e.g. $\dot{x} = -\beta x + \mu \Rightarrow f = \dot{x} + \beta x - \mu = 0$
Observed data:
\[ y(t_i) = x(t_i) + \varepsilon_i, \quad i = 1, \cdots, n, \qquad \varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2) \]
Goal: to estimate the unknown θ in the differential equation from data observed with measurement error, and to quantify the uncertainty of the estimates
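To make the set-up concrete, the sketch below simulates noisy observations from the example equation $\dot{x} = -\beta x + \mu$, using its closed-form solution $x(t) = \mu/\beta + (x_0 - \mu/\beta)e^{-\beta t}$. The parameter values, the initial value `x0`, the time grid, and the function name `simulate_observations` are illustrative assumptions, not values from the talk.

```python
import numpy as np

def simulate_observations(beta, mu, x0, sigma, t, seed=0):
    """Return (x, y): the exact solution of x' = -beta*x + mu and noisy data."""
    rng = np.random.default_rng(seed)
    # Closed-form solution of the linear ODE with initial value x(0) = x0.
    x = mu / beta + (x0 - mu / beta) * np.exp(-beta * t)
    # Additive i.i.d. Gaussian measurement error, as in the observation model.
    y = x + rng.normal(0.0, sigma, size=t.shape)
    return x, y

t = np.linspace(0.0, 5.0, 51)          # hypothetical observation times
x, y = simulate_observations(beta=1.5, mu=3.0, x0=0.0, sigma=0.1, t=t)
```

Here θ = (β, µ) plays the role of the unknown parameter vector; in the inverse problem only `t` and `y` would be available.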
Statistical Challenge
Most differential equations cannot be solved analytically, so carrying out the estimation directly
- requires repeatedly solving the differential equation numerically
- requires the initial value of the differential equation to be known
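Both difficulties show up in a naive least-squares fit, sketched below for the example equation $\dot{x} = -\beta x + \mu$. This is only an illustration under stated assumptions: the data are synthetic, µ and the initial value `x0` are treated as known, the solver is a hand-written classical Runge-Kutta scheme, and the grid search stands in for a proper optimizer. The names `solve_ode_rk4` and `sse` are hypothetical.

```python
import numpy as np

def solve_ode_rk4(beta, mu, x0, t):
    """Numerically solve x' = -beta*x + mu on the grid t with classical RK4."""
    f = lambda x: -beta * x + mu
    x = np.empty_like(t)
    x[0] = x0
    for i in range(len(t) - 1):
        h = t[i + 1] - t[i]
        k1 = f(x[i])
        k2 = f(x[i] + 0.5 * h * k1)
        k3 = f(x[i] + 0.5 * h * k2)
        k4 = f(x[i] + h * k3)
        x[i + 1] = x[i] + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x

def sse(beta, mu, x0, t, y):
    """Sum of squared errors; every call triggers a full numerical solve."""
    return float(np.sum((y - solve_ode_rk4(beta, mu, x0, t)) ** 2))

# Synthetic data with hypothetical true values beta = 1.5, mu = 3.0, x0 = 0.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 5.0, 51)
y = 2.0 - 2.0 * np.exp(-1.5 * t) + rng.normal(0.0, 0.05, size=t.shape)

# Crude grid search over beta: one ODE solve per candidate value,
# with mu and the initial value x0 assumed known.
betas = np.linspace(0.5, 3.0, 251)
beta_hat = betas[np.argmin([sse(b, 3.0, 0.0, t, y) for b in betas])]
```

Every candidate value of β costs one numerical solve, and the answer depends on the value supplied for `x0`.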
Estimation procedure (Ramsay et al. 2007): basic idea
1 Use a basis function expansion to approximate x(t), i.e. $x(t) = c^\top \phi(t)$
2 Estimate the coefficients c of the chosen basis functions by incorporating a penalty defined by the differential equation
3 Estimate the parameters θ in the differential equation
4 Choose the amount of smoothing λ
Basis function expansion
\[ x(t) = \sum_{k=1}^{K} c_k \phi_k(t) = c^\top \phi(t) \]
- Choice of basis functions: splines are usually the logical choice because of their compact support and their capacity to capture transient, localized features
- Number of basis functions: usually large, because the expansion must approximate not only x(t) but also its derivatives
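As an illustration of the expansion, the sketch below evaluates a cubic B-spline basis with the Cox-de Boor recursion and forms $x(t) = c^\top \phi(t)$. The knot placement, the degree, the coefficient vector, and the function name `bspline_basis` are assumptions made for the example; the talk does not specify them.

```python
import numpy as np

def bspline_basis(t, knots, degree):
    """Evaluate all B-spline basis functions of the given degree at points t.

    `knots` is the full (clamped) knot vector; the result has shape
    (len(t), len(knots) - degree - 1) and is built by the Cox-de Boor recursion.
    """
    t = np.asarray(t, dtype=float)
    # Degree 0: indicator functions of the half-open knot intervals.
    B = np.zeros((t.size, len(knots) - 1))
    for j in range(len(knots) - 1):
        B[:, j] = (knots[j] <= t) & (t < knots[j + 1])
    # Assign the right end point to the last non-empty interval.
    last = np.max(np.nonzero(np.diff(knots) > 0)[0])
    B[t == knots[-1], last] = 1.0
    for d in range(1, degree + 1):
        B_new = np.zeros((t.size, len(knots) - d - 1))
        for j in range(len(knots) - d - 1):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            # Terms with a zero denominator (repeated knots) are dropped.
            if left_den > 0:
                B_new[:, j] += (t - knots[j]) / left_den * B[:, j]
            if right_den > 0:
                B_new[:, j] += (knots[j + d + 1] - t) / right_den * B[:, j + 1]
        B = B_new
    return B

# Cubic splines on [0, 1] with 8 equal intervals (hypothetical choice).
degree = 3
breaks = np.linspace(0.0, 1.0, 9)
knots = np.concatenate([[0.0] * degree, breaks, [1.0] * degree])
t = np.linspace(0.0, 1.0, 101)
Phi = bspline_basis(t, knots, degree)   # shape (101, K) with K = 11
c = np.ones(Phi.shape[1])
x_hat = Phi @ c                          # x(t) = c^T phi(t)
```

Each column of `Phi` is nonzero on at most four adjacent knot intervals, which is the compact support mentioned above.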
Data fitting criterion
\[ J(c \mid \theta) = \underbrace{\ell\bigl(x(t_i), y(t_i)\bigr)}_{\text{data fidelity}} + \lambda \underbrace{\int [f(x(t); \theta)]^2 \, dt}_{\text{DE-defined penalty}} \]
- the first part of the criterion function is the fidelity of the basis function approximation to the data
- the second part is the penalty term with respect to the differential equation given θ
- the smoothing parameter λ controls the relative emphasis on these two objectives
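For a linear equation the criterion is quadratic in c, so the minimizer has a closed form. The worked case below assumes squared-error loss and the example equation $\dot{x} = -\beta x + \mu$ on an interval $[0, T]$. The symbols $\Phi$ (the $n \times K$ matrix with entries $\phi_k(t_i)$), $R$, $r$ and $T$ are introduced here for illustration and do not appear in the slides.

```latex
\begin{aligned}
J(c \mid \theta) &= \lVert y - \Phi c \rVert^2
  + \lambda \int_0^T \bigl[c^\top(\dot\phi(t) + \beta\phi(t)) - \mu\bigr]^2\,dt \\
 &= \lVert y - \Phi c \rVert^2
  + \lambda\bigl(c^\top R\,c - 2\mu\, c^\top r + \mu^2 T\bigr), \\
R &= \int_0^T (\dot\phi + \beta\phi)(\dot\phi + \beta\phi)^\top dt, \qquad
r = \int_0^T (\dot\phi + \beta\phi)\,dt, \\
\nabla_c J = 0 \;&\Longrightarrow\;
(\Phi^\top\Phi + \lambda R)\,\hat c = \Phi^\top y + \lambda\mu\, r .
\end{aligned}
```

For nonlinear equations no such closed form is available in general, and the minimization over c has to be done numerically.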
The parameter hierarchy
There are three classes of parameters to estimate:
- The coefficients c in the basis function expansion
- The parameters θ defining the differential equation
- The smoothing parameter λ
The roles of the three parameter levels
- c are not of direct interest ⇒ nuisance parameters
- we are primarily interested in θ, the parameters that define the differential equation
- The smoothing parameter λ controls the overall complexity of the model
  - λ → 0 ⇒ high complexity in x(t)
  - λ → ∞ ⇒ low complexity in x(t)
The parameter cascade algorithm
1 The nuisance parameters c are defined as smooth functions c(θ, λ)
2 The structural parameters θ are defined as functions θ(λ) of the complexity parameters
3 These functional relationships are defined implicitly by specifying a different conditional fitting criterion at each level of the parameter hierarchy
The multi–criterion optimization strategy
- Nuisance parameter functions c(θ, λ) are defined by optimizing the regularized fitting criterion
\[ J(c \mid \theta) = \sum_{i=1}^{n} \{y_i - x(t_i)\}^2 + \lambda \int [f(x(t); \theta)]^2 \, dt \]
- A purely data-fitting criterion H(θ) is then optimized with respect to the structural parameters θ alone
\[ H(\theta) = \sum_{i=1}^{n} \{y_i - x(t_i, \theta)\}^2 \]
- At the top level, a complexity criterion F(λ) is optimized with respect to λ
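The two lower levels of this strategy can be sketched for the example equation $\dot{x} = -\beta x + \mu$. Several simplifications are assumed here and are not taken from the talk: linear B-splines (hat functions) replace higher-order splines, the penalty integral is approximated by midpoint quadrature, the outer optimization is a plain grid search, the data are synthetic, and λ is held fixed instead of being tuned by a criterion F(λ). All function names are hypothetical.

```python
import numpy as np

def hat_basis(t, knots):
    """Linear B-spline (hat function) basis evaluated at t; shape (len(t), K)."""
    h = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(t[:, None] - knots[None, :]) / h)

def hat_basis_deriv(t, knots):
    """First derivative of the hat basis at points t that are not knots."""
    h = knots[1] - knots[0]
    d = t[:, None] - knots[None, :]
    return np.where(np.abs(d) < h, -np.sign(d) / h, 0.0)

def inner_fit(theta, lam, y, Phi, Phi_q, dPhi_q, w):
    """Level 1: c(theta, lam) minimizing J(c | theta) for x' = -beta*x + mu."""
    beta, mu = theta
    D = dPhi_q + beta * Phi_q            # (phi' + beta*phi) at the quadrature nodes
    R = D.T @ (w[:, None] * D)           # quadrature approximation of the penalty matrix
    r = D.T @ w
    return np.linalg.solve(Phi.T @ Phi + lam * R, Phi.T @ y + lam * mu * r)

def H(theta, lam, y, Phi, Phi_q, dPhi_q, w):
    """Level 2: data-fitting criterion evaluated at the profiled coefficients."""
    c = inner_fit(theta, lam, y, Phi, Phi_q, dPhi_q, w)
    return float(np.sum((y - Phi @ c) ** 2))

# Synthetic data (hypothetical truth: beta = 1.5, mu = 3.0, x(0) = 0).
rng = np.random.default_rng(2)
t = np.linspace(0.0, 5.0, 101)
y = 2.0 - 2.0 * np.exp(-1.5 * t) + rng.normal(0.0, 0.05, size=t.shape)

knots = np.linspace(0.0, 5.0, 51)
q = 0.5 * (knots[:-1] + knots[1:])      # midpoint quadrature nodes
w = np.diff(knots)                       # midpoint quadrature weights
Phi, Phi_q, dPhi_q = hat_basis(t, knots), hat_basis(q, knots), hat_basis_deriv(q, knots)

lam = 1e4                                # held fixed here; the top level would tune it
grid = [(b, m) for b in np.linspace(0.5, 3.0, 51) for m in np.linspace(1.0, 5.0, 81)]
beta_hat, mu_hat = min(grid, key=lambda th: H(th, lam, y, Phi, Phi_q, dPhi_q, w))
```

No ODE solver and no initial value appear anywhere in the sketch: the coefficients c absorb the initial condition, and θ enters only through the penalty.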
Remarks
- Gradients and Hessians at any level can be computed using the Implicit Function Theorem
- $x(t) = c^\top \phi(t)$ is only approximately a solution of the differential equation. It is a data-regularized approximation to a solution
- When λ is small, the optimization criteria J and H are smooth functions of their arguments and are easily optimized. By increasing λ, we avoid local minima and force x(t) close to $x^*(t)$
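The first remark can be written out for the middle level. With λ held fixed, the coefficients $\hat c(\theta)$ are defined implicitly by $\partial J / \partial c = 0$. Differentiating that identity with respect to θ gives the derivative of $\hat c$, and the chain rule then gives the gradient of H. This is a sketch of the standard argument, assuming the Hessian of J in c is invertible:

```latex
\frac{dH}{d\theta} = \frac{\partial H}{\partial \theta}
  + \frac{\partial H}{\partial c}\,\frac{dc}{d\theta},
\qquad
\frac{dc}{d\theta} = -\left[\frac{\partial^2 J}{\partial c^2}\right]^{-1}
  \frac{\partial^2 J}{\partial c\,\partial\theta}.
```

All derivatives are evaluated at $c = \hat c(\theta)$, so no explicit formula for $\hat c(\theta)$ is needed.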
Rain and Groundwater
Rain and Groundwater: a smaller time scale
Functional regression model
\[ G(t) = G_0 + \alpha R(t) + \varepsilon(t) \]
- it implies that sudden changes in rainfall R(t) are passed on immediately to the groundwater level G(t)
- we observe that G(t) increases smoothly over a series of localized rainfalls, and declines slowly when they cease
⇒ functional linear model won’t work
Differential equation for groundwater level
\[ \frac{dG(t)}{dt} = -\beta G(t) + \alpha R(t - \delta) + \mu \]
where
- β specifies how the rate of change of G(t) depends on G(t) itself
- α defines the impact of a change in R(t)
- µ is a baseline level, required here because the origin for the level G(t) is not meaningful
- the lag δ is the time for rainfall to reach the groundwater level, and is known to be about 3 hours
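The qualitative behaviour of this equation can be checked with a forward simulation. The sketch below uses a forward-Euler step and entirely made-up inputs: the rainfall burst, the coefficient values, the step size, and the function name `simulate_groundwater` are assumptions. Only the 3-hour lag comes from the slide.

```python
import numpy as np

def simulate_groundwater(beta, alpha, mu, delta, rain, dt, g0):
    """Forward-Euler solution of dG/dt = -beta*G + alpha*R(t - delta) + mu.

    `rain` holds R on a regular grid with spacing `dt` (hours); rainfall
    before the start of the record is taken to be zero.
    """
    lag = int(round(delta / dt))                       # lag in grid steps
    rain_lagged = np.concatenate([np.zeros(lag), rain])[:len(rain)]
    g = np.empty(len(rain))
    g[0] = g0
    for i in range(len(rain) - 1):
        dg = -beta * g[i] + alpha * rain_lagged[i] + mu
        g[i + 1] = g[i] + dt * dg
    return g

# Hypothetical inputs: a single 2-hour rain burst and made-up coefficients.
dt = 0.25                                              # hours
hours = np.arange(0.0, 48.0, dt)
rain = np.where((hours >= 5.0) & (hours < 7.0), 4.0, 0.0)
g = simulate_groundwater(beta=0.2, alpha=0.5, mu=0.4, delta=3.0,
                         rain=rain, dt=dt, g0=2.0)
```

Starting from the resting level µ/β, the simulated level stays flat until the lagged rainfall arrives, rises smoothly while it lasts, and then decays slowly back toward µ/β. This is the behaviour that the instantaneous regression model cannot reproduce.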
The constant coefficient fit
Allowing time-varying ODE parameters
- As the groundwater level G(t) changes, the dynamics change too, because we move through different types of sub-soil structures
- We weren’t given sub-soil transmission rates, so we needed to allow β(t), α(t) and µ(t) to vary slowly over time
\[ \frac{dG(t)}{dt} = -\beta(t) G(t) + \alpha(t) R(t - \delta) + \mu(t) \]
The time-varying coefficient fit
Ramsay, J. O., Hooker, G., Campbell, D., & Cao, J. (2007). Parameter estimation for differential equations: a generalized smoothing approach (with discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(5), 741–796.
Xun, X., Cao, J., Mallick, B., Maity, A., & Carroll, R. J. (2013). Parameter estimation of partial differential equation models. Journal of the American Statistical Association, 108(503), 1009–1020.