

Biometrika, pp. 1–20
© 2007 Biometrika Trust
Printed in Great Britain

Inferring Stochastic Dynamics from Functional Data

By Nicolas Verzelen,

Institut National de recherche Agronomique, 2, place Pierre Viala, F-34060 Montpellier, FRANCE.


Wenwen Tao and Hans-Georg Müller

Department of Statistics, University of California, Davis, One Shields Avenue, Davis, California, 95616, U.S.A.


Summary

In most current data modelling for time-dynamic systems, one works with a pre-specified differential equation and attempts to fit its parameters. In contrast, we demonstrate that in the case of functional data, the equation itself can be inferred from the data. Assuming only that the dynamics are described by a first order nonlinear differential equation with a random component, we obtain data-adaptive dynamic equations from the observed data via a simple smoothing-based procedure. We prove consistency and introduce diagnostics to ascertain the fraction of variance that is explained by the deterministic part of the equation. This approach is shown to yield useful insights into the time-dynamic nature of human growth.

Some key words: Empirical Dynamics, Functional Data Analysis, Goodness of Fit, Growth Curves, Smoothing

1. Introduction

In recent years, there has been increasing interest in fitting nonlinear differential equations to data arising in engineering, economics or biology. A major motivation is to understand the dynamics underlying physical or biological processes (Holte et al., 2006; Perelson et al., 1997) or to predict the future behavior of such systems from current observations. These challenges arise in growth studies (Gasser et al., 1984), where, in addition to scientific interest in understanding the dynamics of human growth by studying how growth velocity relates to current age and current height, differential equation models can also be used to assess clinical aspects of a child's growth patterns. For example, a differential equation model that fits the data can be used to predict the growth velocity expected of a healthy child who is short for their current age. This predicted velocity can then be checked against the observed velocity for monitoring purposes.

Substantial work has been devoted to parametric estimation procedures for dynamic systems (Bellman & Roth, 1971; Brunel, 2008; Liang & Wu, 2008; Ramsay et al., 2007). These, and also recent semiparametric approaches (Chen & Wu, 2008; Paul et al., 2011) for modelling dynamic systems, rely on the assumption that a pre-specified non-random differential equation actually applies to the data. However, this is often not the case, particularly in the study of dynamics that are repeatedly observed for many subjects or experiments. There are two major reasons for discrepancies between stipulated dynamic models and the actual behavior of systems. First, differential equation models have traditionally been adopted on the basis of their inherent plausibility and their concordance with presumed underlying mechanisms. All too often, this leads to models that do not fit the data well (Hooker, 2009), because the presumed mechanisms that the model reflects are not well understood or do not approximate the actual mechanisms well. Second, deterministic models rarely provide satisfactory fits to phenomena that are inherently stochastic, because the dynamics vary across subjects or experiments. The subject-specific dynamics of viral levels in HIV studies (Miao et al., 2009) and the dynamics of auction price trajectories (Reddy & Dass, 2006; Wang et al., 2008) provide examples of this difficulty. In such cases, subject-specific effects come into play that cannot be controlled for, and it is then unreasonable to expect a deterministic dynamic equation to fit well across subjects.

All of this motivates an alternative bottom-up approach, namely to obtain information about underlying dynamic systems directly from repeated observations of the trajectories that result from the dynamics, in contrast to the customary top-down approach of postulating a priori what the dynamic equations should be. Our aim thus is to derive differential equations from functional data, i.e., to learn these equations by observing many realizations of the trajectories that they generate. To allow for random variation between subjects, it is necessary to add stochastic elements to a deterministic equation; a convenient way to do so is to include an additional stochastic drift process.

Nonparametric analysis of stochastic differential equations has previously been studied for diffusion processes (Hoffmann, 1999; Jacod, 2000), whose solutions are driven by Brownian motion and have non-differentiable trajectories. As growth and many other dynamic phenomena are usually considered to be quite smooth, the stochastic differential equation approach is ill-suited to most non-financial data. Recently, Müller & Yao (2010) have investigated an empirical dynamic approach, in which linear dynamics are determined empirically from a sample of trajectories. Specifically, each trajectory of a differentiable Gaussian process is shown to satisfy a first order linear differential equation, which can be determined for various types of longitudinal data by suitable estimation procedures. However, this approach does not extend to nonlinear dynamic systems or non-Gaussian processes.

Here we show that each trajectory of a smooth stochastic process $X$ satisfies a first order nonlinear differential equation with a random component, where the stochastic part is an additive smooth drift process $Z$. We call this representation of the process the data-driven differential equation. The variance of the process $Z$ determines to what extent the process $X$ is driven by the deterministic part of the differential equation. Whenever the variance of the drift $Z$ is small in comparison to the variance of the derivative $X'$, a deterministic version of the differential equation explains most of the observed behavior of the process. Obtaining data-driven dynamics reveals underlying mechanisms generating the observed functional data and provides diagnostic tools for assessing the linearity of the dynamics or the quality of a parametric fit. Implementation proceeds via a two-step kernel estimation procedure, which we show to be consistent. We illustrate the method by constructing the data-driven differential equation governing the growth of children in the Berkeley Growth Study.


We conclude this section by describing the structure of the available observations from which the dynamics will be learned. Given $n$ realizations $X_i$ of the underlying process $X$ on a domain $\mathcal{T}$, we assume that $N_i$ measurements $Y_{ij}$ ($i = 1, \dots, n$; $j = 1, \dots, N_i$), with $N = \inf_{i=1,\dots,n} N_i$, are obtained at times $t_{ij}$ according to

$$Y_{ij} = Y_i(t_{ij}) = X_i(t_{ij}) + \epsilon_{ij}. \qquad (1)$$

Here the $\epsilon_{ij}$ are zero-mean, independent and identically distributed measurement errors with finite and constant variance $\mathrm{var}(\epsilon_{ij}) = \sigma^2$, independent of all other random components. The design points $t_{ij}$ are considered deterministic and densely spaced. This model reflects typical measurements obtained in growth studies.
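To make the data structure in (1) concrete, the following sketch simulates noisy, densely sampled measurements. It is illustrative only: the trajectory family $X_i(t) = a_i + b_i t + c_i \sin t$, the distributions of $a_i, b_i, c_i$, the common design grid shared by all subjects, and the names `make_trajectory` and `simulate_measurements` are all hypothetical choices, not part of the model in this paper.

```python
import math
import random


def make_trajectory(a, b, c):
    # Hypothetical smooth random trajectory X_i(t) = a + b*t + c*sin(t).
    return lambda t: a + b * t + c * math.sin(t)


def simulate_measurements(n, times, sigma, seed=0):
    """Return (trajectories, y) with y[i][j] = X_i(t_j) + eps_ij, as in model (1)."""
    rng = random.Random(seed)
    trajectories, y = [], []
    for _ in range(n):
        x_i = make_trajectory(rng.gauss(0.0, 1.0), rng.gauss(1.0, 0.2), rng.gauss(0.0, 0.5))
        # i.i.d. zero-mean errors with common variance sigma**2, independent of X_i
        y.append([x_i(t) + rng.gauss(0.0, sigma) for t in times])
        trajectories.append(x_i)
    return trajectories, y


times = [j / 10 for j in range(101)]  # dense, deterministic design on [0, 10]
trajectories, y = simulate_measurements(n=5, times=times, sigma=0.1)
```

Setting `sigma=0.0` returns the noise-free trajectory values, which is a convenient check when testing the smoothing steps that follow.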

2. Data-driven differential equation

In the following we consider a differentiable stochastic process $X(t)$ such that $X$ and its derivative $X'$ are square integrable. A simple representation of the derivative process is to decompose it into a mean function $\mu_{X'}$ and a mean-zero stochastic process $Z_1$,

$$X'(t) = \mu_{X'}(t) + Z_1(t). \qquad (2)$$

Nonparametric estimation of individual derivative trajectories and of $\mu_{X'}$ provides data-driven descriptions (Gasser & Müller, 1984; Gasser et al., 1984; Mas & Pumo, 2009).

The simplest dynamic equation that captures the relationship between the process $X(t)$ and its derivative $X'(t)$ is a linear relationship between $X'$ and $X$. The corresponding linear empirical dynamics is a natural approach for Gaussian processes, since the joint Gaussianity of $X$ and $X'$ implies that there exists a deterministic function $\beta$ with

$$X'(t) = \mu_{X'}(t) + \beta(t)\{X(t) - \mu_X(t)\} + Z_2(t). \qquad (3)$$

Here $Z_2$ is a zero-mean drift process with $\mathrm{cov}\{Z_2(t), X(t)\} = 0$, which implies independence between $X$ and $Z_2$ in the Gaussian case (Müller & Yao, 2010).

Many complex biological processes, including growth, cannot be expected to be adequately represented by linear dynamics. For more complex dynamics, it is therefore of interest to model the dynamics of $X$ with a nonlinear differential equation. There always exists a function $f$ with

$$E\{X'(t) \mid X(t)\} = f\{t, X(t)\}, \qquad X'(t) = f\{t, X(t)\} + Z(t), \qquad (4)$$

with $E\{Z(t) \mid X(t)\} = 0$ almost surely. When $f$ is unknown and is determined from the data, (4) is a data-driven nonlinear differential equation. The function $f$ and the properties of the drift process $Z$ determine the underlying nonlinear dynamics. In some applications, comparisons with the special case of a simpler autonomous system

$$E\{X'(t) \mid X(t)\} = f_1\{X(t)\}, \qquad (5)$$

for a time-independent function $f_1$, are of interest.

Parametric differential equations with random effects provide alternatives to modelling with equation (4). Upon integration, these become nonlinear random effects models, which are difficult to fit, especially if they contain many random effects. A typical example is the nonlinear Preece–Baines model (Preece & Baines, 1978) for human growth, which can be derived from a non-autonomous differential equation. Such nonlinear models are nearly always fitted by least squares separately for each child, not taking advantage of


the availability of a sample of growth curves and not including any random effects. These model fits are usually inefficient and have been shown to be inferior to nonparametric smoothing and differentiation methods (Gasser et al., 1984). These parametric growth models can be expressed in the form of the proposed general equation $X'(t) = f\{t, X(t)\}$, which thus provides a general and flexible framework that is informed by all data in the sample. As is typical in the life sciences, the nature of the underlying dynamics of growth is largely unknown. The popular Preece–Baines model and related models have been derived purely from data-fitting considerations, and their parameters are not interpretable (Hansen et al., 2003).

Models (2), (3) and (4) are characterized by increasing complexity, as

$$\mathrm{var}\{Z(t)\} \le \mathrm{var}\{Z_2(t)\} \le \mathrm{var}\{Z_1(t)\} = \mathrm{var}\{X'(t)\},$$

by definition of these drift processes: $f\{t, X(t)\} = E\{X'(t) \mid X(t)\}$ is the best mean-squared-error predictor of $X'(t)$ among all functions of $X(t)$, the linear predictor in (3) is the best among affine functions of $X(t)$, and (2) uses no information on $X(t)$ at all. This means that the dynamic behavior of the process $X$ is more predictable with the data-driven nonlinear differential equation (4) than with the empirical linear differential equation (3). If $\mathrm{var}\{Z(t)\} = \mathrm{var}\{Z_2(t)\}$, there is no gain in adopting a nonlinear rather than a simpler linear differential equation, but there can be substantial gains when the variance of $Z(t)$ is strictly smaller than that of $Z_2(t)$. Thus, the estimation of a data-driven nonlinear differential equation can also be used to assess the linearity of the underlying dynamics.
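The ordering of the drift variances can be checked numerically on a toy example. In the sketch below, which is illustrative and not taken from the paper, $X(t)$ at a fixed $t$ is uniform on $[-1, 1]$ and $X'(t) = X(t)^2 + 0.5\,X(t) + \text{noise}$; these distributions and the function `f_true` are hypothetical assumptions chosen so that the true $f$ is known. The residual variances of (2), (3) and (4) are obtained by centering, by the centered least-squares fit with $\beta = \mathrm{cov}(X, X')/\mathrm{var}(X)$, and by subtracting the known $f$.

```python
import random
import statistics


def f_true(u):
    # Hypothetical true deterministic part f(t, .) at a fixed time t.
    return u * u + 0.5 * u


rng = random.Random(1)
n = 20000
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]      # X(t) at a fixed t
dx = [f_true(u) + rng.gauss(0.0, 0.1) for u in x]   # X'(t) = f{t, X(t)} + Z(t)

# Model (2): Z1 = X' - mean(X'), so var(Z1) = var(X').
var_z1 = statistics.pvariance(dx)

# Model (3): beta = cov(X, X') / var(X); Z2 is the residual of the centered linear fit.
mx, mdx = statistics.fmean(x), statistics.fmean(dx)
beta = sum((u - mx) * (v - mdx) for u, v in zip(x, dx)) / sum((u - mx) ** 2 for u in x)
var_z2 = statistics.pvariance([v - mdx - beta * (u - mx) for u, v in zip(x, dx)])

# Model (4): Z = X' - f{t, X(t)}, using the known f.
var_z = statistics.pvariance([v - f_true(u) for u, v in zip(x, dx)])

print(f"var Z = {var_z:.4f} <= var Z2 = {var_z2:.4f} <= var Z1 = {var_z1:.4f}")
```

Here the quadratic term is invisible to the linear fit (it is uncorrelated with $X$ under this symmetric design), so most of its variance ends up in $Z_2$ but not in $Z$.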

3. Estimating the components of data-driven non-linear dynamics

3·1. Estimation of the deterministic component

We adopt a two-step kernel smoothing approach to obtain an estimator $\hat f$ of the deterministic part of the nonlinear differential equation (4), corresponding to the function $f$, which from now on we assume to be smooth. This two-step procedure builds on the same ideas as the method of Ellner et al. (2002) for autonomous dynamics.

Step 1: Obtaining the trajectories of $X(t)$ and $X'(t)$.

For any $i = 1, \dots, n$, we estimate the trajectory $X_i(t)$ and its derivative $X_i'(t)$ by a convolution kernel smoothing method (Gasser et al., 1984). Using a nonnegative symmetric kernel function $K$ and, for derivative estimation, an antisymmetric kernel function $K_2$ with one sign change, such that $\int K(u)\,du = 1$, $\int K_2(u)\,du = 0$ and $\int K_2(u)\,u\,du = 1$, these estimates are

$$\hat X_i(t) = \frac{1}{h_X} \sum_{j=1}^{N_i} \int_{s_{j-1}}^{s_j} Y_{ij}\, K\left(\frac{u - t}{h_X}\right) du, \qquad (6)$$

$$\hat X_i'(t) = \frac{1}{h_{X'}^2} \sum_{j=1}^{N_i} \int_{s_{j-1}}^{s_j} Y_{ij}\, K_2\left(\frac{u - t}{h_{X'}}\right) du, \qquad (7)$$

where $s_j = (t_{ij} + t_{i,j+1})/2$, and $h_X > 0$ and $h_{X'} > 0$ are smoothing bandwidths.

Step 2: Estimation of $f$.

The trajectory estimates $\hat X_i(t)$ and $\hat X_i'(t)$ from Step 1 are combined to obtain a Nadaraya–Watson kernel estimator for $f$,

$$\hat f(t, x) = \frac{\sum_{i=1}^n K\{(\hat X_i(t) - x)/b_X\}\, \hat X_i'(t)}{\sum_{i=1}^n K\{(\hat X_i(t) - x)/b_X\}}, \qquad (8)$$

using a bandwidth $b_X > 0$. When estimators (6) and (7) are supplemented with suitably chosen boundary kernels for estimation near the endpoints of the domain of $X$ (Jones & Foster, 1996; Müller, 1991), these convolution kernel estimates are equivalent to fitting local linear estimates for $X_i(t)$, taking the intercept as estimator, and to fitting local quadratic estimates for $X_i'(t)$, taking the linear coefficient as estimator (Fan & Gijbels, 1996; Müller, 1987). Thus, one can conveniently implement these estimators by local polynomial fitting.
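Given first-step estimates at a fixed time $t$, the Nadaraya–Watson step (8) is a kernel-weighted average of the estimated derivatives across subjects. The sketch below assumes the Epanechnikov kernel and takes the lists `x_hat` and `dx_hat` (the values $\hat X_i(t)$ and $\hat X_i'(t)$ for $i = 1, \dots, n$) as given; the function name is hypothetical. It returns NaN where no subject lies within one bandwidth of $x$, reflecting that $\hat f(t, \cdot)$ is only defined where data are available.

```python
import math


def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def nw_f_hat(x_hat, dx_hat, x, b):
    """Nadaraya-Watson estimate (8) of f(t, x) at one fixed time t.

    x_hat, dx_hat: estimated X_i(t) and X_i'(t) for subjects i = 1, ..., n.
    b: bandwidth b_X.
    """
    weights = [epanechnikov((xi - x) / b) for xi in x_hat]
    total = sum(weights)
    if total == 0.0:
        return math.nan  # no subject close enough to x
    return sum(w * d for w, d in zip(weights, dx_hat)) / total


# Toy input: a symmetric design around x = 0.5 with X_i'(t) = 1 + 2 X_i(t).
x_hat = [i / 100 for i in range(101)]
dx_hat = [1.0 + 2.0 * xi for xi in x_hat]
estimate = nw_f_hat(x_hat, dx_hat, x=0.5, b=0.1)   # approximately 2.0
```

Evaluating $\hat f(t, \cdot)$ on a grid of $x$ values and repeating over a grid of times $t$ yields the estimated surface.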

3·2. Decomposition of variance

By definition (4) of the differential equation, and since $E\{Z(t) \mid X(t)\} = 0$ implies that $f\{t, X(t)\}$ and $Z(t)$ are uncorrelated, we have the following decomposition of variance,

$$\mathrm{var}\{X'(t)\} = \mathrm{var}[f\{t, X(t)\}] + \mathrm{var}\{Z(t)\}. \qquad (9)$$

Therefore, on subdomains where the variance $\mathrm{var}\{Z(t)\}$ of the drift process is small, the solution of (4) will not deviate much from the solution obtained with the deterministic approximation

$$X'(t) = f\{t, X(t)\} \qquad (t \in \mathcal{T}), \qquad (10)$$

which corresponds to the population equation. In this situation, future changes of individual trajectories are easily predictable. This motivates considering the fraction of the variance of $X'(t)$ that is explained by the deterministic part of the data-driven differential equation as a key quantity for assessing the predictability of the process, leading to a coefficient of determination

$$R^2(t) = \frac{\mathrm{var}[f\{t, X(t)\}]}{\mathrm{var}\{X'(t)\}} = 1 - \frac{\mathrm{var}\{Z(t)\}}{\mathrm{var}\{X'(t)\}}. \qquad (11)$$

It is of interest to locate subdomains of $\mathcal{T}$ where $R^2(t)$ is large. On such subdomains, the drift process is small compared to $X'(t)$. An obvious estimate of the coefficient of determination $R^2(t)$ is obtained by plugging in estimates of the unknown quantities, yielding

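$$\hat R^2(t) = 1 - \frac{\sum_{i=1}^n [\hat X_i'(t) - \hat f\{t, \hat X_i(t)\}]^2}{\sum_{i=1}^n \{\hat X_i'(t) - \bar X'(t)\}^2}, \qquad (12)$$

where $\bar X'(t) = n^{-1} \sum_{i=1}^n \hat X_i'(t)$.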
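The plug-in estimate (12) is the familiar one-minus-residual-over-total sum of squares, computed across subjects at a single time $t$. A minimal sketch, assuming the fitted values $\hat f\{t, \hat X_i(t)\}$ have already been computed (for example by the Nadaraya–Watson step (8)); the function name is hypothetical:

```python
def r2_hat(dx_hat, fitted):
    """Plug-in coefficient of determination (12) at a fixed time t.

    dx_hat: estimated derivatives X_i'(t); fitted: f_hat{t, X_i(t)} for the same subjects.
    """
    n = len(dx_hat)
    mean_dx = sum(dx_hat) / n
    rss = sum((d - f) ** 2 for d, f in zip(dx_hat, fitted))
    tss = sum((d - mean_dx) ** 2 for d in dx_hat)
    return 1.0 - rss / tss  # can be negative when the fit is poor


perfect = r2_hat([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])     # 1.0
mean_only = r2_hat([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5])   # 0.0
reversed_fit = r2_hat([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0])  # -3.0
```

The last call illustrates that the estimate is not bounded below by zero: a fit worse than the sample mean gives a negative value.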

The coefficient of determination $R^2(t)$ assesses the fraction of the variance of $X'(t)$ explained by the deterministic part of the differential equation at a given time $t$. However, for some processes the predictability may depend not only on the time $t$ but also on the position $x$ of the process. Considering the nonlinear regression model (4), we define the dynamic signal-to-noise ratio $S(t, x)$ by

$$S(t, x) = \frac{f^2(t, x)}{E\{X'^2(t) \mid X(t) = x\}} = \frac{f^2(t, x)}{f^2(t, x) + \mathrm{var}\{Z(t) \mid X(t) = x\}}. \qquad (13)$$


Clearly, $S(t, x)$ lies between 0 and 1. When $S(t, x)$ is close to one, $f^2(t, x)$ is large compared to $\mathrm{var}\{Z(t) \mid X(t) = x\}$, and the process is well predictable when $X(t) = x$. In contrast, small values of $S(t, x)$ indicate that the variability of $Z(t)$ given $X(t) = x$ is large relative to $f^2(t, x)$. The function $S$ thus quantifies the predictability of $X$ as a function of the level of the process at time $t$.

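By plugging in the estimate $\hat f(t, x)$ for $f(t, x)$, one obtains the estimator

$$\hat S(t, x) = \frac{\hat f^2(t, x)}{\hat E\{X'^2(t) \mid X(t) = x\}}, \qquad \hat E\{X'^2(t) \mid X(t) = x\} = \frac{\sum_{i=1}^n K\{(\hat X_i(t) - x)/b_X\}\, \hat X_i'^2(t)}{\sum_{i=1}^n K\{(\hat X_i(t) - x)/b_X\}}. \qquad (14)$$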
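Both the numerator and the denominator of (14) are kernel-weighted averages with the same weights, so the estimate can be computed in one pass. The sketch below assumes the Epanechnikov kernel and first-step estimates `x_hat`, `dx_hat` at a fixed time; the function name is hypothetical, and it raises an error where no subject lies within the bandwidth (it also fails if all nearby derivative estimates are exactly zero).

```python
def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def s_hat(x_hat, dx_hat, x, b):
    """Plug-in signal-to-noise ratio (14) at a fixed time t and level x."""
    w = [epanechnikov((xi - x) / b) for xi in x_hat]
    total = sum(w)
    if total == 0.0:
        raise ValueError("no subjects within bandwidth b of x")
    f_hat = sum(wi * d for wi, d in zip(w, dx_hat)) / total              # estimator (8)
    second_moment = sum(wi * d * d for wi, d in zip(w, dx_hat)) / total  # E_hat{X'^2 | X = x}
    return f_hat ** 2 / second_moment


# Derivatives of opposite sign at the same level: no signal.
no_signal = s_hat([0.5] * 4, [1.0, -1.0, 1.0, -1.0], x=0.5, b=0.1)   # 0.0
# Identical derivatives at the same level: pure signal.
pure_signal = s_hat([0.5] * 4, [2.0] * 4, x=0.5, b=0.1)            # 1.0
```

Because a squared weighted mean never exceeds the weighted mean of squares, the estimate, like $S(t, x)$ itself, always lies in $[0, 1]$.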

3·3. Applying data-driven nonlinear dynamics for goodness-of-fit

It is of interest to determine whether linear dynamics, as implied by Gaussianity of the underlying processes, suffice to describe the dynamics, or whether a more complex nonlinear model is needed. A simple diagnostic can be obtained by comparing the variance of the drift process $Z(t)$ of the nonlinear dynamic model (4) with that of the drift process $Z_2(t)$ of the linear dynamic model (3), as follows.

For the coefficient of determination of the linear empirical dynamic model (3),

$$R_L^2(t) = \frac{\mathrm{var}\{\beta(t) X(t)\}}{\mathrm{var}\{X'(t)\}} = 1 - \frac{\mathrm{var}\{Z_2(t)\}}{\mathrm{var}\{X'(t)\}}, \qquad (15)$$

one has $R^2(t) \ge R_L^2(t)$, because $\mathrm{var}\{Z(t)\} \le \mathrm{var}\{Z_2(t)\}$. Analogously to (12), $R_L^2(t)$ is estimated by

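$$\hat R_L^2(t) = 1 - \frac{\sum_{i=1}^n [\hat X_i'(t) - \bar X'(t) - \hat\beta(t)\{\hat X_i(t) - \bar X(t)\}]^2}{\sum_{i=1}^n \{\hat X_i'(t) - \bar X'(t)\}^2}, \qquad (16)$$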

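where $\hat\beta(t)$ is an estimate of $\beta(t)$ and $\bar X(t)$ and $\bar X'(t)$ denote the sample means of the $\hat X_i(t)$ and the $\hat X_i'(t)$. Both $\hat R^2(t)$ in (12) and $\hat R_L^2(t)$ in (16) can be negative when the fits are poor. On subdomains of $\mathcal{T}$ where $R^2(t)$ is close to $R_L^2(t)$, $\mathrm{var}\{Z(t)\}$ is close to $\mathrm{var}\{Z_2(t)\}$, and one may infer that the data-driven differential equation is almost linear, so that equation (3) provides a simpler description.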
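The linear counterpart (16) only needs a slope estimate. Under (3) with $\mathrm{cov}\{Z_2(t), X(t)\} = 0$ one has $\beta(t) = \mathrm{cov}\{X(t), X'(t)\}/\mathrm{var}\{X(t)\}$, so a natural plug-in (assumed here; the paper does not spell out its estimator of $\beta$ in this section) is the ratio of sample moments. The function name below is hypothetical.

```python
def r2_linear_hat(x_hat, dx_hat):
    """Plug-in slope beta_hat(t) and R_L^2(t) for the linear dynamic model (3).

    beta is estimated as cov(X, X') / var(X), which follows from (3) and
    cov{Z_2(t), X(t)} = 0.
    """
    n = len(x_hat)
    mx = sum(x_hat) / n
    mdx = sum(dx_hat) / n
    sxy = sum((a - mx) * (d - mdx) for a, d in zip(x_hat, dx_hat))
    sxx = sum((a - mx) ** 2 for a in x_hat)
    beta = sxy / sxx
    rss = sum((d - mdx - beta * (a - mx)) ** 2 for a, d in zip(x_hat, dx_hat))
    tss = sum((d - mdx) ** 2 for d in dx_hat)
    return beta, 1.0 - rss / tss


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
beta_lin, r2_lin = r2_linear_hat(x, [3.0 * a + 1.0 for a in x])   # (3.0, 1.0)
beta_sq, r2_sq = r2_linear_hat(x, [a * a for a in x])             # (0.0, 0.0)
```

In the second call the derivative is an exact, but quadratic, function of the level, so the nonlinear $R^2(t)$ would equal one while $R_L^2(t)$ is zero; this gap is what the diagnostic $R^2(t) - R_L^2(t)$ is designed to detect.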

On subdomains where the diagnostic function $R^2(t) - R_L^2(t)$ is large, the linear differential equation (3) is probably insufficient to describe the underlying dynamics well, and one would then choose the data-driven nonlinear dynamic model (4). Equation (4) can be written as the integral equation $X(t) = X(t_0) + \int_{t_0}^{t} [f\{s, X(s)\} + Z(s)]\,ds$, and a solution can be obtained by numerical integration, given an initial value and a realization of the drift process $Z$.
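As a sketch of that last step, the integral equation can be solved with a forward Euler scheme. The callables `f` and `z` below are hypothetical stand-ins for an estimated $\hat f$ and one drift realization; forward Euler is chosen only for simplicity, and a higher-order scheme such as Runge–Kutta may be preferable in practice.

```python
def integrate_dynamics(f, z, x0, t0, t1, steps=1000):
    """Forward Euler solution of X'(t) = f(t, X(t)) + Z(t) on [t0, t1].

    f: callable f(t, x), e.g. an estimate of the deterministic part of (4).
    z: callable z(t), one realization of the drift process.
    Returns the time grid and the approximate trajectory.
    """
    h = (t1 - t0) / steps
    ts, xs = [t0], [x0]
    t, x = t0, x0
    for k in range(steps):
        x = x + h * (f(t, x) + z(t))
        t = t0 + (k + 1) * h
        ts.append(t)
        xs.append(x)
    return ts, xs


# Hypothetical example: f(t, x) = 2 and Z(t) = t, so the exact solution has X(1) = x0 + 2.5.
ts, xs = integrate_dynamics(lambda t, x: 2.0, lambda t: t, x0=0.0, t0=0.0, t1=1.0)
```

Setting `z` to the zero function gives the deterministic approximation (10), so the same routine can compare individual trajectories with the population equation.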

4. Asymptotic properties

4·1. Assumptions

In the following, we describe consistency results for the estimation of the smooth bivariate function $f$ that determines the deterministic part of the proposed data-driven dynamic model (4), and for the estimate (12) of the fraction of variance explained at time $t$. In the sequel, $g(t, x)$ denotes the density of the random variable $X(t)$ at $x$. The assumptions C.1–7 are listed below.


C.1 The kernels $K$ and $K_2$ have compact support $[-1, 1]$ and are Lipschitz continuous with respective constants $\mu_K$ and $\mu_{K'}$. Moreover, $K$ is positive and satisfies $\int_{-1}^{1} K(u)\,du = 1$, $\int_{-1}^{1} K(u)\,u\,du = 0$ and $\int_{-1}^{1} K(u)\,u^2\,du \neq 0$. The kernel $K_2$ satisfies $\int_{-1}^{1} K_2(u)\,du = 0$, $\int_{-1}^{1} K_2(u)\,u\,du = 1$, $\int_{-1}^{1} K_2(u)\,u^2\,du = 0$ and $\int_{-1}^{1} K_2(u)\,u^3\,du \neq 0$.

C.2 The random function $X$ is almost surely three times continuously differentiable and, for all $t \in \mathcal{T}$, $|X(t)| \le C_0$, $|X'(t)| \le C_1$, $|X^{(2)}(t)| \le C_2$ and $|X^{(3)}(t)| \le C_3$ almost surely.

C.3 The random variables $\epsilon_{ij}$ ($i = 1, \dots, n$; $j = 1, \dots, N$) are centered and have a finite moment of order 8.

C.4 The functions $f(t, \cdot)$ and $g(t, \cdot)$ are Lipschitz with constants $\mu_f$ and $\mu_g$, twice continuously differentiable and have compact support.

C.5 The conditional variance $s(t, u) = \mathrm{var}\{X'(t) \mid X(t) = u\}$ is continuous and non-zero.

C.6 We have $(N, n) \to \infty$ and $(b_X, h_X, h_{X'}) \to 0$ such that $n b_X \ge \log^2 n \to \infty$, $N h_X b_X^4 \ge 1$, $N h_{X'}^3 \to \infty$ and $h_X \le b_X$.

C.7 There exists a constant $C > 0$ such that $g(t, x) > C$ for any $x \in [x_1, x_2]$.

4·2. Results

Theorem 1. Under assumptions C.1–6, for any $t \in \mathcal{T}$ and $x$ such that $g(t, x) \neq 0$,

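$$E[\{\hat f(t, x) - f(t, x)\}^2] = O\left(b_X^4 + \frac{h_X^4}{b_X^2} + h_{X'}^4 + \frac{\sigma^2}{N h_X b_X^2} + \frac{1}{n b_X} + \frac{\sigma^2}{N h_{X'}^3}\right). \qquad (17)$$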

With suitable choices of the bandwidths $b_X$, $h_X$ and $h_{X'}$, one obtains

$$E\{\hat f(t, x) - f(t, x)\}^2 = O\left[\max\left\{N^{-8/15}, n^{-4/5}\right\}\right]. \qquad (18)$$

If $n \le N^{2/3}$, then $n^{-4/5} \ge N^{-8/15}$, and the classical convergence rate $n^{-4/5}$ for nonparametric regression is obtained. Conversely, when $n \ge N^{2/3}$, the estimation error in $\hat X_i$ is no longer negligible, and the lower bound $N$ on the number of measurements per curve becomes the limiting quantity for the convergence rate.
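The bandwidth orders behind the $N^{-8/15}$ term in (18) can be recovered from (17) by balancing terms. The derivation below is a sketch that ignores constants, treats $\sigma^2$ as fixed and handles the $N$- and $n$-dependent terms separately; it is one choice consistent with (17), not necessarily the one used in the proofs. Writing $b_X \asymp N^{-\beta}$ and $h_X \asymp N^{-\eta}$,

$$
\begin{aligned}
b_X^4 &\asymp N^{-4\beta}, \qquad \frac{h_X^4}{b_X^2} \asymp N^{-4\eta + 2\beta}, \qquad \frac{1}{N h_X b_X^2} \asymp N^{-1 + \eta + 2\beta},\\
4\beta = 4\eta - 2\beta \;&\Rightarrow\; \eta = \tfrac{3}{2}\beta, \qquad 4\beta = 1 - \tfrac{3}{2}\beta - 2\beta \;\Rightarrow\; \beta = \tfrac{2}{15},\ \eta = \tfrac{1}{5},\ \text{rate } N^{-8/15},\\
h_{X'}^4 + \frac{1}{N h_{X'}^3} &\text{ is minimized at } h_{X'} \asymp N^{-1/7}, \text{ giving } N^{-4/7} = o(N^{-8/15}),\\
b_X^4 + \frac{1}{n b_X} &\text{ is minimized at } b_X \asymp n^{-1/5}, \text{ giving } n^{-4/5}.
\end{aligned}
$$

These orders are compatible with C.6: $h_X \asymp N^{-1/5} \le N^{-2/15} \asymp b_X$, $N h_X b_X^4 \asymp N^{4/15} \ge 1$ and $N h_{X'}^3 \asymp N^{4/7} \to \infty$.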

The rate of convergence of $\hat R^2(t)$ depends on that of $\hat f(t, \cdot)$ near the boundary of the support of $g(t, \cdot)$, where there are few observations. Therefore, we consider bounded domains for the asymptotic study. For positive numbers $x_1 < x_2$ in the support of $g(t, \cdot)$, define

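$$R^2_{x_1,x_2}(t) = \frac{\mathrm{var}[f\{t, X(t)\} \mid x_1 \le X(t) \le x_2]}{\mathrm{var}\{X'(t) \mid x_1 \le X(t) \le x_2\}} = 1 - \frac{\mathrm{var}\{Z(t) \mid x_1 \le X(t) \le x_2\}}{\mathrm{var}\{X'(t) \mid x_1 \le X(t) \le x_2\}}, \qquad (19)$$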

so that $R^2_{x_1,x_2}(t)$ quantifies the ratio of these variances when $X(t)$ is conditioned to lie between $x_1$ and $x_2$. With $n_{x_1,x_2} = \#\{i : x_1 \le \hat X_i(t) \le x_2\}$, we estimate $R^2_{x_1,x_2}(t)$ by

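$$\hat R^2_{x_1,x_2}(t) = 1 - \frac{\sum_{i=1}^n [\hat f\{t, \hat X_i(t)\} - \hat X_i'(t)]^2\, 1_{\{x_1 \le \hat X_i(t) \le x_2\}}}{\sum_{i=1}^n \hat X_i'^2(t)\, 1_{\{x_1 \le \hat X_i(t) \le x_2\}} - \left\{\sum_{i=1}^n \hat X_i'(t)\, 1_{\{x_1 \le \hat X_i(t) \le x_2\}}\right\}^2 / n_{x_1,x_2}}. \qquad (20)$$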
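Estimator (20) is the same residual-over-total ratio as (12), restricted to subjects whose estimated level falls in $[x_1, x_2]$; the denominator uses the computational form $\sum d^2 - (\sum d)^2/m$ of the centered sum of squares. A minimal sketch with a hypothetical function name, assuming fitted values $\hat f\{t, \hat X_i(t)\}$ are supplied:

```python
def r2_restricted_hat(x_hat, dx_hat, fitted, x1, x2):
    """Estimator (20): R^2 restricted to subjects with x1 <= X_i(t) <= x2."""
    keep = [(d, f) for x, d, f in zip(x_hat, dx_hat, fitted) if x1 <= x <= x2]
    m = len(keep)  # n_{x1,x2}
    if m < 2:
        raise ValueError("need at least two subjects in [x1, x2]")
    rss = sum((f - d) ** 2 for d, f in keep)
    sum_d = sum(d for d, _ in keep)
    # Centered sum of squares: sum d^2 - (sum d)^2 / m, as in the denominator of (20).
    tss = sum(d * d for d, _ in keep) - sum_d ** 2 / m
    return 1.0 - rss / tss


x_hat = [0.0, 1.0, 2.0, 3.0, 4.0]
dx_hat = [0.0, 1.0, 4.0, 9.0, 16.0]
fitted = [0.0, 1.0, 4.0, 9.0, 100.0]   # badly wrong only at the largest level
inner = r2_restricted_hat(x_hat, dx_hat, fitted, 0.0, 3.0)   # 1.0
full = r2_restricted_hat(x_hat, dx_hat, fitted, 0.0, 4.0)    # strongly negative
```

The toy input shows why restricting the range matters: a single poor fit at an extreme level, where data are sparse, can dominate the unrestricted ratio. In Section 6, $x_1(t)$ and $x_2(t)$ are the third smallest and third largest of the $\hat X_i(t)$, which in this sketch would be `sorted(x_hat)[2]` and `sorted(x_hat)[-3]`.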

Theorem 2. Under assumptions C.1–7,

$$\hat R^2_{x_1,x_2}(t) - R^2_{x_1,x_2}(t) = O_p\left\{b_X^2 + \frac{h_X^2}{b_X} + h_{X'}^2 + (n b_X)^{-1/2} + \frac{1}{(N h_X)^{1/2} b_X} + \frac{1}{N^{1/2} h_{X'}^{3/2}}\right\}.$$


Corollary 1. Under assumptions C.1–6, for the dynamic signal-to-noise ratio (13),

$$\hat S(t, x) - S(t, x) = O_p\left\{b_X^2 + \frac{h_X^2}{b_X} + h_{X'}^2 + (n b_X)^{-1/2} + \frac{1}{(N h_X)^{1/2} b_X} + \frac{1}{N^{1/2} h_{X'}^{3/2}}\right\}.$$

5. Nonlinear concurrent model

Our methodology provides an estimation procedure for a nonlinear version of the concurrent model, also known as the varying-coefficient model (Chiang et al., 2001). We aim to investigate the relationship between two stochastic processes $X(t)$ and $U(t)$ at each time $t \in \mathcal{T}$. The linear concurrent model captures a linear relationship between $X$ and $U$ through a deterministic function $\beta(t)$,

$$U(t) = \mu_U(t) + \beta(t)\{X(t) - \mu_X(t)\} + Z_2(t), \qquad (21)$$

where $Z_2$ is a zero-mean drift process with $\mathrm{cov}\{Z_2(t), X(t)\} = 0$. Versions of this functional linear varying-coefficient model were mentioned in Ramsay & Silverman (2005), and estimators and asymptotics were studied in Şentürk & Müller (2010).

Our methodology covers the more general situation where the link between $U(t)$ and $X(t)$ is nonlinear, i.e., where there exist a smooth function $f(\cdot, \cdot)$ and a drift process $Z(t)$ such that

$$U(t) = f\{t, X(t)\} + Z(t), \qquad (22)$$

with $E\{Z(t) \mid X(t)\} = 0$ almost surely and $f\{t, X(t)\} = E\{U(t) \mid X(t)\}$. This nonlinear varying-coefficient model can be studied with the methods that we have developed for the nonlinear dynamic model (4).

Given $n$ realizations $X_i$ and $U_i$ of the underlying processes $X$ and $U$ on a domain $\mathcal{T}$, we assume that $N$ noisy measurements $Y_{ij}$ and $V_{ij}$ ($i = 1, \dots, n$; $j = 1, \dots, N$) have been obtained at times $t_{ij}$ analogously to (1). Following the arguments of Section 3·1, we propose a two-step estimator. For any $i = 1, \dots, n$, we first estimate the trajectories $X_i(t)$ and $U_i(t)$ with a convolution kernel $K$, using bandwidths $h_X$ and $h_U$. Then, using another bandwidth $b_X$, these trajectory estimates $\hat X_i(t)$ and $\hat U_i(t)$ are combined to obtain

$$\hat f(t, x) = \frac{\sum_{i=1}^n K\{(\hat X_i(t) - x)/b_X\}\, \hat U_i(t)}{\sum_{i=1}^n K\{(\hat X_i(t) - x)/b_X\}}.$$

Arguing as for the estimation of the nonlinear dynamics, we obtain the rate of convergence for $\hat f$.

Corollary 2. Suppose that assumptions D.1–6 in the Appendix hold. For any $t \in \mathcal{T}$ and any $x$ such that $g(t, x) \neq 0$,

$$E[\{\hat f(t, x) - f(t, x)\}^2] = O\left(b_X^4 + \frac{h_X^4}{b_X^2} + h_U^4 + \frac{\sigma^2}{N h_X b_X^2} + \frac{1}{n b_X} + \frac{\sigma^2}{N h_U^3}\right).$$

With suitable choices of the bandwidths $b_X$, $h_X$ and $h_U$, one obtains

$$E\{\hat f(t, x) - f(t, x)\}^2 = O\left[\max\left\{N^{-8/15}, n^{-4/5}\right\}\right]. \qquad (23)$$


As before, one can compute a coefficient of determination

$$R^2(t) = \frac{\mathrm{var}[f\{t, X(t)\}]}{\mathrm{var}\{U(t)\}} = 1 - \frac{\mathrm{var}\{Z(t)\}}{\mathrm{var}\{U(t)\}},$$

to decompose the variance of $U(t)$ into a part explained by the model and a part left unexplained.

6. Nonlinear dynamics of human growth data

The proposed model and estimation procedures can be used to illuminate the dynamics of human growth. We illustrate the nonlinear differential equation (4) using the Berkeley Growth Study (Jones & Bayley, 1941), in which the heights of 54 girls and 39 boys were recorded from 1 to 18 years of age. Since male and female growth patterns differ substantially, with girls entering puberty much earlier than boys (Tanner et al., 1966), we focus on girls only. For each of the 54 girls in the study, 31 measurements are available, recorded at intervals of three months from 1 to 2 years of age, one year from 3 to 8 years, and six months from 8 to 18 years. Characterizing the dynamics of human growth, and especially the time domains where the dynamics are nonlinear, serves two purposes. First, it allows us to gain a better understanding of the growth process. Second, it is of clinical interest to distinguish between normal and pathological patterns of development.
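As a quick consistency check on the design just described, the sketch below reconstructs nominal measurement ages from the stated sampling intervals and confirms that they give 31 measurements per girl. The exact ages (for instance, that the half-yearly phase starts at 8.5 years) are an assumption inferred from the intervals, not values taken from the study records, and the function name is hypothetical.

```python
def berkeley_ages():
    """Nominal measurement ages (years) implied by the stated sampling intervals."""
    quarterly = [1.0 + 0.25 * k for k in range(5)]     # 1, 1.25, ..., 2
    yearly = [float(a) for a in range(3, 9)]           # 3, 4, ..., 8
    half_yearly = [8.5 + 0.5 * k for k in range(20)]   # 8.5, 9, ..., 18
    return quarterly + yearly + half_yearly


ages = berkeley_ages()
print(len(ages), ages[0], ages[-1])  # 31 1.0 18.0
```

The irregular spacing (quarterly, then yearly, then half-yearly) is one reason the bandwidths $h_X$ and $h_{X'}$ in the estimators that follow need to be chosen with care.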

To estimate the data-driven differential equation, we apply the two-step procedure described in Section 3·1, implemented through locally weighted least-squares methods (Fan & Gijbels, 1996) with a Gaussian kernel $K$. For $t \in [0, 18]$, we obtain estimates $\hat X_i(t) = \hat a_{i0}(t)$, where

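$$(\hat a_{i0}, \hat a_{i1})(t) = \operatorname*{arg\,min}_{a \in \mathbb{R}^2} \frac{1}{h_X} \sum_{j=1}^{N} K\left(\frac{t_{ij} - t}{h_X}\right) \{Y_{ij} - a_0 - a_1(t_{ij} - t)\}^2, \qquad (24)$$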

with $N = 31$. The growth velocities $X_i'(t)$ are estimated analogously by taking the slope of weighted local quadratic fits, $\hat X_i'(t) = \hat b_{i1}(t)$, where

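$$(\hat b_{i0}, \hat b_{i1}, \hat b_{i2})(t) = \operatorname*{arg\,min}_{b \in \mathbb{R}^3} \frac{1}{h_{X'}} \sum_{j=1}^{N_i} K\left(\frac{t_{ij} - t}{h_{X'}}\right) \{Y_{ij} - b_0 - b_1(t_{ij} - t) - b_2(t_{ij} - t)^2\}^2. \qquad (25)$$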

In a second step, $\hat f(t, x)$ is obtained by another local linear estimator based on $\hat X_i(t)$ and $\hat X_i'(t)$, setting $\hat f(t, x) = \hat d_0(t, x)$, where

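$$\{\hat d_0(t, x), \hat d_1(t, x)\} = \operatorname*{arg\,min}_{d \in \mathbb{R}^2} \sum_{i=1}^n \frac{1}{b_X} K\left\{\frac{\hat X_i(t) - x}{b_X}\right\} \{\hat X_i'(t) - d_0 - d_1(\hat X_i(t) - x)\}^2. \qquad (26)$$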

A practically relevant feature is that, for a given $t$, the function $\hat f(t, \cdot)$ is only defined on the domain $(\min_i \hat X_i(t), \max_i \hat X_i(t))$. A second implementation issue is the choice of the smoothing bandwidths $h_X$, $h_{X'}$ and $b_X$ that are needed for the local polynomial estimators (24), (25) and (26). We select these tuning parameters by generalized cross-validation (Golub et al., 1979).
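For the second step, the local linear fit (26) reduces to a 2 by 2 weighted least-squares problem with a closed-form solution. The sketch below uses a Gaussian kernel, as in this section, but takes the bandwidth as a fixed input rather than selecting it by generalized cross-validation; the function names are hypothetical.

```python
import math


def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_linear_f_hat(x_hat, dx_hat, x, b):
    """Local linear estimate (26) of f(t, x) at one fixed time t; returns d0_hat.

    Solves the weighted normal equations [[s0, s1], [s1, s2]] d = [t0, t1].
    """
    w = [gaussian((xi - x) / b) / b for xi in x_hat]
    z = [xi - x for xi in x_hat]
    s0 = sum(w)
    s1 = sum(wi * zi for wi, zi in zip(w, z))
    s2 = sum(wi * zi * zi for wi, zi in zip(w, z))
    t0 = sum(wi * yi for wi, yi in zip(w, dx_hat))
    t1 = sum(wi * zi * yi for wi, zi, yi in zip(w, z, dx_hat))
    det = s0 * s2 - s1 * s1
    if det <= 0.0:
        raise ValueError("design is degenerate near x")
    return (s2 * t0 - s1 * t1) / det


x_hat = [i / 10 for i in range(11)]
dx_hat = [1.0 + 2.0 * xi for xi in x_hat]
near_edge = local_linear_f_hat(x_hat, dx_hat, x=0.05, b=0.2)   # approximately 1.1
```

Unlike the Nadaraya–Watson form (8), the local linear fit reproduces a linear relationship exactly even near the edge of the range of $\hat X_i(t)$, which is one motivation for using it where few subjects are available.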


Figure 1. Estimated curves. Estimated growth curves and estimated growth velocities for 54 girls.

Estimated growth curves and growth velocities for the sample of girls are depicted in Figure 1. The estimated function $\hat f(t, x)$, corresponding to the deterministic part of the data-driven nonlinear differential equation, is displayed as a surface in the left panel of Figure 2 and as a contour plot in the right panel. Growth velocity tends to decrease with age, with the exception of the pubertal growth spurt between ages 10 and 13.

A more detailed study of the function $\hat f$, considering $\hat f(t, \cdot)$ as a function of current height $x$ for ages $t = 2, 4, 6, 8, 12$ and 16, as shown in Figure 3, reveals that at earlier ages (e.g., at age 2) there is a sizeable difference between the fits of the linear and the nonlinear differential equations, and furthermore that an autonomous differential equation is inadequate. At age 2, the clearly more appropriate nonlinear non-autonomous model shows only a weak relationship between growth velocity and height, while between ages 4 and 8, taller girls also tend to have a higher growth velocity, which can be interpreted as a manifestation of an inherent growth momentum in this age range. In contrast, for ages between 12 and 16, $\hat f(t, \cdot)$ is no longer monotone. At age 12 the relationship is weak, likely because the taller girls have already passed their pubertal growth peak and are in the post-pubertal growth deceleration, while the shorter girls have not yet entered the pubertal spurt with its growth acceleration. At age 16, all girls grow much more slowly; however, both shorter and taller girls grow relatively faster than medium-sized girls, indicating a strongly nonlinear relationship.

Figure 2. Estimation of $f(t, x)$. Left panel: Estimated surface $f(t, x)$ on a curved domain, characterizing the deterministic part of the nonlinear dynamic model in (4). Right panel: Contour plot of the surface $f(t, x)$.

The nonlinear dynamic coefficient of determination $R^2(t)$ defined in Equation (11) quantifies the extent to which the deterministic part of the nonlinear differential equation (4) explains the variance of $X'(t)$. When estimating this coefficient with $\hat R^2_{x_1(t),x_2(t)}(t)$ defined in Equation (19), we chose $x_1(t)$ and $x_2(t)$ as the third smallest and the third largest value, respectively, among the $X_i(t)$ $(i = 1, \ldots, n)$. We also estimated the linear dynamic coefficient of determination $R^2_L(t)$ defined in Equation (15) for the linear dynamic model (3). A comparison of the two coefficients of determination $R^2(t)$ and $R^2_L(t)$ is shown in the left panel of Figure 4, and bootstrap confidence bands for the nonlinear version $R^2(t)$ are shown in the right panel.
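A minimal sketch of the windowed coefficient of determination at a single age $t$ follows. It assumes that $\hat R^2_{x_1,x_2}(t)$ equals one minus the ratio of the windowed residual mean square $\sum_i [\hat f\{t, \hat X_i(t)\} - \hat X_i'(t)]^2 \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2} / \hat n_{x_1,x_2}$ to the windowed variance of the $\hat X_i'(t)$, which are the two quantities analysed in the proof of Theorem 2. The Epanechnikov kernel, the bandwidth `bw`, the simulated data and the helper names `nw_fit` and `windowed_r2` are hypothetical choices for illustration, not the authors' code.

```python
import numpy as np

def nw_fit(x_eval, x, y, bw):
    """Nadaraya-Watson estimate of E(y | x) at each point of x_eval.

    Uses the Epanechnikov kernel (an illustrative assumption).
    """
    u = (x_eval[:, None] - x[None, :]) / bw
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    return (w @ y) / w.sum(axis=1)

def windowed_r2(x, dx, bw, k=3):
    """Hypothetical sketch of R^2_{x1,x2}(t) at one fixed age t.

    x  : smoothed levels X_i(t), one per subject
    dx : smoothed derivatives X_i'(t), one per subject
    The window [x1, x2] runs from the k-th smallest to the k-th largest x,
    mirroring the choice k = 3 described in the text.
    """
    xs = np.sort(x)
    x1, x2 = xs[k - 1], xs[-k]
    inside = (x >= x1) & (x <= x2)
    # f_hat is fitted on all subjects and evaluated at the subjects in the window
    fitted = nw_fit(x[inside], x, dx, bw)
    resid_ms = np.mean((dx[inside] - fitted) ** 2)
    total_var = np.var(dx[inside])  # divisor n_{x1,x2}, as in the proof of Theorem 2
    return 1.0 - resid_ms / total_var

# Usage on simulated data (not the growth data): X' = sin(2 X) + small noise
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 3.0, 200)
dx = np.sin(2.0 * x) + 0.1 * rng.standard_normal(200)
r2 = windowed_r2(x, dx, bw=0.2)
```

Because $\hat f$ is fitted on all subjects but evaluated only inside the window, trimming the two extreme values on each side keeps the ratio away from the unstable boundary region of the kernel smoother.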

For the proposed nonlinear dynamic model, $R^2(t)$ is seen to be close to 0.5 from approximately age 4 to 8. This implies that the deterministic part of the data-driven differential equation captures the behavior of the growth curves during this period quite well. In contrast, $R^2(t)$ decays sharply from around age 11, as growth velocities are difficult to predict during this period, likely due to time variation in the occurrence of menarche and pubertal growth spurts. For the simpler linear dynamic model, the corresponding $R^2_L(t)$ is always smaller than $R^2(t)$ for the proposed model, but comes closest during ages 8 to 10, where the discrepancy between the fits from the linear and the nonlinear systems is relatively small. In conclusion, growth dynamics around the pubertal growth spurt are highly nonlinear.

Acknowledgements

We wish to thank several reviewers for helpful comments. The third author acknowledges support from the U.S. National Science Foundation.


Figure 3. Comparison between nonlinear and linear dynamic estimation. Each of the panels, arranged for ages $t = 2, 4, 6, 8, 12$ years from left to right and top to bottom, respectively, illustrates the estimates $\hat f(t, \cdot)$ of the deterministic part of the nonlinear dynamic model (4) (solid), the linear estimates (3) (dashed) and the scatterplot of the observed data pairs $\{X_i(t), X_i'(t)\}$.

Appendix 1

Assumptions for Corollary 2

In these assumptions, $g(t, \cdot)$ stands for the density of $X(t)$.

D.1 The kernel $K$ has compact support $[-1, 1]$ and is Lipschitz continuous with constant $\mu_K$. Moreover, $K$ is positive and satisfies $\int_{-1}^{1} K(u)\,du = 1$, $\int_{-1}^{1} K(u)\,u\,du = 0$ and $\int_{-1}^{1} K(u)\,u^2\,du \neq 0$.

D.2 The random functions $X$ and $U$ are almost surely twice continuously differentiable. For $t \in \mathcal{T}$, $|X(t)| \le C_0$, $|X'(t)| \le C_1$, $|X^{(2)}(t)| \le C_2$, $|U(t)| \le C_3$, $|U'(t)| \le C_4$ and $|U^{(2)}(t)| \le C_5$.

D.3 The random variables $\epsilon_{ij}$ $(i = 1, \ldots, n;\ j = 1, \ldots, N)$ and $\zeta_{ij}$ $(i = 1, \ldots, n;\ j = 1, \ldots, N)$ are centered and have a finite moment of order 8.

D.4 The functions $f(t, \cdot)$ and $g(t, \cdot)$ are Lipschitz with constants $\mu_f$ and $\mu_g$, twice continuously differentiable and compactly supported.

D.5 The conditional variance $s(t, x) = \mathrm{var}\{U(t) \mid X(t) = x\}$ is continuous and nonzero.

D.6 We have $(N, n) \to \infty$ and $(b_X, h_X, h_U) \to 0$, such that $n b_X \ge \log^2 n \to \infty$, $N h_X b_X^4 \ge 1$, $N h_U \to \infty$ and $h_X \le b_X$.
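As an illustration of Assumption D.1, the following sketch checks numerically that the Epanechnikov kernel $K(u) = \tfrac{3}{4}(1 - u^2)$ on $[-1, 1]$, a common choice assumed here only as an example, has unit mass, zero first moment and a nonzero second moment. It also returns $\int K^2(u)\,du$, which enters the variance terms of the kernel estimators in the proofs below. The function names are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    # Compactly supported on [-1, 1] and Lipschitz continuous (constant 3/2)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def trapezoid(values, grid):
    # Composite trapezoidal rule, written out to avoid version-specific NumPy helpers
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))

def kernel_moments(kernel, m=200_001):
    u = np.linspace(-1.0, 1.0, m)
    k = kernel(u)
    return {
        "mass": trapezoid(k, u),           # should equal 1
        "first": trapezoid(u * k, u),      # should equal 0
        "second": trapezoid(u**2 * k, u),  # should be nonzero (1/5 here)
        "roughness": trapezoid(k**2, u),   # int K^2 = 3/5 here
    }

moments = kernel_moments(epanechnikov)
```

Any other symmetric, bounded, Lipschitz density on $[-1, 1]$ can be passed to `kernel_moments` in the same way.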


Figure 4. Coefficients of determination. Left panel: Estimated coefficients of determination $R^2(t)$ (12), corresponding to the fraction of variance explained by the deterministic part of the nonlinear dynamic model (4) (solid), in comparison with the corresponding fractions of variance $R^2_L(t)$ (15) explained by the linear dynamics (3) (dot-dashed). Right panel: 95% bootstrap confidence interval for $R^2(t)$.

Appendix 2

Proofs

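Proof of Theorem 1. We decompose the difference $\hat f(t, x) - f(t, x)$ into the sum of two terms,

$$A = \frac{\sum_{i=1}^n K\{(X_i(t) - x)/b_X\}\, X_i'(t)}{\sum_{i=1}^n K\{(X_i(t) - x)/b_X\}} - f(t, x),$$

$$B = \frac{\sum_{i=1}^n K\{(\hat X_i(t) - x)/b_X\}\, \hat X_i'(t)}{\sum_{i=1}^n K\{(\hat X_i(t) - x)/b_X\}} - \frac{\sum_{i=1}^n K\{(X_i(t) - x)/b_X\}\, X_i'(t)}{\sum_{i=1}^n K\{(X_i(t) - x)/b_X\}}.$$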

The term $A$ is simply the difference between a Nadaraya–Watson estimator and its target. Under Assumptions C.1–2 and C.4–6, the pointwise risk of this estimator is known (Schimek, 2000, pages 43–70) to be equivalent to

$$\left\{ b_X^2 \int_{-1}^{1} u^2 K(u)\,du \right\}^2 \left\{ \frac{1}{2}\,\frac{\partial^2 f(t, x)}{\partial x^2} + \frac{\partial f(t, x)}{\partial x}\, \frac{\partial g(t, x)/\partial x}{g(t, x)} \right\}^2 + \frac{s(t, x)}{g(t, x)\, n b_X} \int_{-1}^{1} K^2(u)\,du, \qquad (A1)$$

if the quantities involved in the last expression are nonzero. Hence, we have

$$E(A^2) = O\{b_X^4 + (n b_X)^{-1}\}.$$
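The rate $E(A^2) = O\{b_X^4 + (n b_X)^{-1}\}$ can be illustrated by simulation. The sketch below assumes a hypothetical model in which $X(t) \sim N(0, 1)$ and $X'(t) = \sin\{X(t)\} + 0.5\,\varepsilon$ at a fixed age, so that $f(t, x) = \sin x$. It uses the true pairs, i.e. the oracle estimator appearing in $A$, and takes $b_X = n^{-1/5}$, which balances the squared-bias and variance terms of (A1). The Epanechnikov kernel, the evaluation point and all names are illustrative assumptions.

```python
import numpy as np

def nadaraya_watson(x0, x, y, b):
    # Oracle estimate of f(t, x0) from the true pairs {X_i(t), X_i'(t)}
    u = (x - x0) / b
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    return np.sum(w * y) / np.sum(w)

def monte_carlo_mse(n, reps=200, x0=0.5, seed=1):
    # Hypothetical model: X ~ N(0, 1) and X' = sin(X) + 0.5 * noise, so f(x) = sin(x)
    rng = np.random.default_rng(seed)
    b = n ** (-1 / 5)  # b_X proportional to n^{-1/5} balances b_X^4 and (n b_X)^{-1}
    errors = np.empty(reps)
    for r in range(reps):
        x = rng.standard_normal(n)
        y = np.sin(x) + 0.5 * rng.standard_normal(n)
        errors[r] = nadaraya_watson(x0, x, y, b) - np.sin(x0)
    return float(np.mean(errors**2))

mse_small = monte_carlo_mse(250)
mse_large = monte_carlo_mse(4000)
```

With this bandwidth the mean squared error should shrink roughly like $n^{-4/5}$, so a sixteenfold increase in $n$ is expected to reduce it by a factor of about nine, up to Monte Carlo error.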


By Assumption C.2, we have $|X'(t)| \le C_1$, $|X^{(2)}(t)| \le C_2$ and $|X^{(3)}(t)| \le C_3$ almost surely.

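Applying classical results in kernel estimation (Gasser et al., 1984), one finds

$$E\big[\{\hat X(t) - X(t)\}^2 \mid X\big] = O_p\left[ \left\{ h_X^2 C_2 \int_{-1}^{1} K(u)\,u^2\,du \right\}^2 + \frac{\sigma^2}{N h_X} \int_{-1}^{1} K^2(u)\,du \right], \qquad (A2)$$

$$E\big[\{\hat X'(t) - X'(t)\}^2 \mid X\big] = O_p\left[ \left\{ h_{X'}^2 C_3 \int_{-1}^{1} K_2(u)\,u^3\,du \right\}^2 + \frac{\sigma^2}{N h_{X'}^3} \int_{-1}^{1} K_2^2(u)\,du \right]. \qquad (A3)$$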

For the sake of simplicity, we denote the rates in (A2) and (A3) by $r_1^2$ and $r_2^2$, respectively.
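To make the pre-smoothing step behind (A2) and (A3) concrete, here is a sketch that estimates $X(t)$ and $X'(t)$ from measurements of one curve by local polynomial fitting with Epanechnikov weights: a local linear fit with bandwidth $h_X$ for the level and a local quadratic fit with bandwidth $h_{X'}$ for the derivative. Local polynomial fitting (Fan & Gijbels, 1996) is swapped in here for illustration; it is not necessarily the kernel scheme used for the theory, and the function names are hypothetical.

```python
import numpy as np

def local_poly(t0, t, y, h, degree):
    """Weighted least-squares local polynomial fit around t0.

    Returns the coefficient vector; entry 0 estimates the level and
    entry 1 estimates the first derivative at t0.
    """
    u = (t - t0) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    design = np.vander(t - t0, degree + 1, increasing=True)  # columns 1, (t-t0), (t-t0)^2, ...
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef

def smooth_level_and_slope(t0, t, y, h_level, h_slope):
    level = local_poly(t0, t, y, h_level, degree=1)[0]  # X_hat(t0), bandwidth h_X
    slope = local_poly(t0, t, y, h_slope, degree=2)[1]  # X_hat'(t0), bandwidth h_X'
    return level, slope

# Noise-free check: for y = 1 + 2t + 3t^2 the local quadratic slope is exact.
t = np.linspace(0.0, 1.0, 101)
y = 1.0 + 2.0 * t + 3.0 * t**2
slope_at_half = local_poly(0.5, t, y, 0.2, degree=2)[1]  # exact value 2 + 6 * 0.5 = 5
```

Using a larger bandwidth for the derivative than for the level mirrors the choice $h_{X'} = N^{-1/7}$ versus $h_X = N^{-1/5}$ made at the end of this proof.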

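Moreover, we have $E[\{\hat X(t) - X(t)\}^4 \mid X] = O_p(r_1^4)$. To prove that

$$E(B^2) = O\left( \frac{r_1^2}{b_X^2} + \frac{r_1^4}{b_X^6} + \frac{r_1^8}{b_X^{14}} + r_2^2 + \frac{1}{n} \right),$$

we decompose $B$ into the sum of two terms,

$$B_1 = \frac{\sum_{i=1}^n K\{(\hat X_i(t) - x)/b_X\}\, \hat X_i'(t)}{\sum_{i=1}^n K\{(\hat X_i(t) - x)/b_X\}} - \frac{\sum_{i=1}^n K\{(X_i(t) - x)/b_X\}\, \hat X_i'(t)}{\sum_{i=1}^n K\{(X_i(t) - x)/b_X\}},$$

$$B_2 = \frac{\sum_{i=1}^n K\{(X_i(t) - x)/b_X\}\, \{\hat X_i'(t) - X_i'(t)\}}{\sum_{i=1}^n K\{(X_i(t) - x)/b_X\}}.$$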

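Let us first control the term $B_1$. Define the weights

$$\alpha_i = \frac{K\{(X_i(t) - x)/b_X\}}{\sum_{j=1}^n K\{(X_j(t) - x)/b_X\}}, \qquad \hat\alpha_i = \frac{K\{(\hat X_i(t) - x)/b_X\}}{\sum_{j=1}^n K\{(\hat X_j(t) - x)/b_X\}}. \qquad (A4)$$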

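Applying Equation (A3), we upper bound

$$\begin{aligned} B_1^2 &= \sum_{1 \le i_1, i_2 \le n} (\hat\alpha_{i_1} - \alpha_{i_1})(\hat\alpha_{i_2} - \alpha_{i_2})\, \hat X_{i_1}'(t)\, \hat X_{i_2}'(t) \\ &\le O(1) \sum_{1 \le i_1, i_2 \le n} |(\hat\alpha_{i_1} - \alpha_{i_1})(\hat\alpha_{i_2} - \alpha_{i_2})| + \sum_{1 \le i_1, i_2 \le n} n^2 (\hat\alpha_{i_1} - \alpha_{i_1})^2 (\hat\alpha_{i_2} - \alpha_{i_2})^2 \\ &\quad + \frac{1}{n^2} \sum_{1 \le i_1, i_2 \le n} \{\hat X_{i_1}'(t)\, \hat X_{i_2}'(t) - X_{i_1}'(t)\, X_{i_2}'(t)\}^2, \end{aligned}$$

since the random variables $X_i'(t)$ are uniformly bounded. As explained after (A3), we have

$$E\left[ \frac{1}{n^2} \sum_{1 \le i_1, i_2 \le n} \{\hat X_{i_1}'(t)\, \hat X_{i_2}'(t) - X_{i_1}'(t)\, X_{i_2}'(t)\}^2 \right] = O(r_2^2).$$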

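Define the event $\Omega$ by

$$\Omega = \left\{ \sum_{j=1}^n K\left\{\frac{X_j(t) - x}{b_X}\right\} \ge n b_X g(t, x),\ \ \sum_{j=1}^n K\left\{\frac{\hat X_j(t) - x}{b_X}\right\} \ge n b_X g(t, x) \right\}. \qquad (A5)$$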

We bound $B_1^2$ on the event $\Omega^c$:

$$E(B_1^2 \mathbf{1}_{\Omega^c}) \le \{E(B_1^4)\, \mathrm{pr}(\Omega^c)\}^{1/2} \le n^2 [E\{X'^4(t)\}\, \mathrm{pr}(\Omega^c)]^{1/2}.$$


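To obtain an upper bound for $\mathrm{pr}(\Omega^c)$, we bound the first two moments of $K[\{X_j(t) - x\}/b_X]$:

$$E\left[K\left\{\frac{X_j(t) - x}{b_X}\right\}\right] \ge 2 b_X \{g(t, x) - \mu_g b_X\} = 2 b_X g(t, x)\{1 + o(1)\},$$

$$E\left[K^2\left\{\frac{X_j(t) - x}{b_X}\right\}\right] \le 2 b_X g(t, x)\{1 + o(1)\}\, \|K\|_\infty^2,$$

$$E\left[K\left\{\frac{\hat X_j(t) - x}{b_X}\right\}\right] \ge 2 b_X g(t, x)\{1 + o(1)\},$$

since $b_X$ goes to 0, $h_X^2/b_X$ goes to 0 and $N h_X b_X$ goes to infinity. Since the kernel $K$ is bounded, we can apply Bernstein's inequality:

$$\mathrm{pr}\left[ \sum_{j=1}^n K\left\{\frac{X_j(t) - x}{b_X}\right\} \le n b_X g(t, x) \right] \le \exp\left[ -\frac{n b_X g(t, x)}{5 \|K\|_\infty^2 \{1 + o(1)\}} \right].$$

Since $n b_X \ge \log^2 n$, it follows that

$$E(B_1^2 \mathbf{1}_{\Omega^c}) = o(n^{-1}). \qquad (A6)$$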

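Considering $E(B_1^2 \mathbf{1}_\Omega)$, we aim to find bounds for terms of the form $E\{|(\hat\alpha_{i_1} - \alpha_{i_1})(\hat\alpha_{i_2} - \alpha_{i_2})|\, \mathbf{1}_\Omega\}$, noting that $\hat\alpha_i - \alpha_i$ decomposes as

$$\frac{K\{(\hat X_i(t) - x)/b_X\} - K\{(X_i(t) - x)/b_X\}}{\sum_{j=1}^n K\{(\hat X_j(t) - x)/b_X\}} + \frac{K\{(X_i(t) - x)/b_X\} \sum_{j=1}^n [K\{(X_j(t) - x)/b_X\} - K\{(\hat X_j(t) - x)/b_X\}]}{\sum_{j=1}^n K\{(\hat X_j(t) - x)/b_X\} \sum_{j=1}^n K\{(X_j(t) - x)/b_X\}}.$$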
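The algebraic identity behind this decomposition is easy to check numerically. The sketch below computes the weights $\alpha_i$ and $\hat\alpha_i$ of (A4) from hypothetical true and perturbed curve values, using the Epanechnikov kernel as an assumed example, and confirms that $\hat\alpha_i - \alpha_i$ equals the sum of the two displayed terms up to rounding error. The perturbation size and all names are illustrative.

```python
import numpy as np

def kernel_values(x, x0, b):
    # K{(x_i - x0)/b} with the Epanechnikov kernel (illustrative choice)
    u = (x - x0) / b
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def decomposition_gap(x_true, x_hat, x0, b):
    k = kernel_values(x_true, x0, b)     # K{(X_i(t) - x)/b_X}
    k_hat = kernel_values(x_hat, x0, b)  # K{(X_hat_i(t) - x)/b_X}
    alpha, alpha_hat = k / k.sum(), k_hat / k_hat.sum()
    term1 = (k_hat - k) / k_hat.sum()
    term2 = k * (k.sum() - k_hat.sum()) / (k_hat.sum() * k.sum())
    return float(np.max(np.abs((alpha_hat - alpha) - (term1 + term2))))

rng = np.random.default_rng(2)
x_true = rng.standard_normal(500)
x_hat = x_true + 0.05 * rng.standard_normal(500)  # stand-in for pre-smoothing error
gap = decomposition_gap(x_true, x_hat, x0=0.0, b=0.5)
```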

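Applying Assumption C.1, on the event $\Omega$,

$$|\hat\alpha_i - \alpha_i|\, \mathbf{1}_\Omega = O\left\{\frac{1}{n b_X^2 g(t, x)}\right\} \Bigg[ |\hat X_i(t) - X_i(t)|\, \mathbf{1}_{\{|X_i(t) - x| \le 2 b_X\} \cup \{|\hat X_i(t) - X_i(t)| \ge b_X\}} + \frac{\mathbf{1}_{\{|X_i(t) - x| \le b_X\}}}{n b_X g(t, x)} \sum_{j=1, j \ne i}^n |\hat X_j(t) - X_j(t)|\, \mathbf{1}_{\{|X_j(t) - x| \le 2 b_X\} \cup \{|\hat X_j(t) - X_j(t)| \ge b_X\}} \Bigg].$$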

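Applying (A2), the Cauchy–Schwarz inequality and Tchebychev's inequality, for $i_1 \ne i_2$,

$$E\{|(\hat\alpha_{i_1} - \alpha_{i_1})(\hat\alpha_{i_2} - \alpha_{i_2})|\, \mathbf{1}_\Omega\} = \frac{1}{n^2}\, O\left\{ \frac{r_1^2}{b_X^2} + \frac{r_1^4}{b_X^6 g^2(t, x)} + \frac{1}{n} \right\}. \qquad (A7)$$

Similarly, bounding the second moment for $i_1 \ne i_2$,

$$E\{(\hat\alpha_{i_1} - \alpha_{i_1})^2 (\hat\alpha_{i_2} - \alpha_{i_2})^2\, \mathbf{1}_\Omega\} = \frac{1}{n^2}\, O\left\{ \frac{r_1^4}{b_X^6 g^2(t, x)} + \frac{r_1^8}{b_X^{14} g^6(t, x)} + \frac{1}{n} \right\}. \qquad (A8)$$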

The terms corresponding to $i_1 = i_2$ are negligible. Combining the upper bounds (A7) and (A8) with (A6), we conclude that

$$E(B_1^2) = O\left\{ r_2^2 + \frac{r_1^2}{b_X^2} + \frac{r_1^4}{b_X^6 g^2(t, x)} + \frac{r_1^8}{b_X^{14} g^6(t, x)} + \frac{1}{n} \right\}.$$

The term $B_2$ is simply a weighted sum of the differences $\hat X_i'(t) - X_i'(t)$. Recall the weights $\alpha_i$ $(i = 1, \ldots, n)$ defined in (A4). Conditioning on $X_i(t)$ $(i = 1, \ldots, n;\ t \in \mathcal{T})$, we get

$$E(B_2^2) = E\left[ \sum_{1 \le i, j \le n} \alpha_i \alpha_j \{\hat X_i'(t) - X_i'(t)\}\{\hat X_j'(t) - X_j'(t)\} \right] = O(r_2^2).$$


All in all, we conclude that

$$E(B^2) = O\left\{ r_2^2 + \frac{r_1^2}{b_X^2} + \frac{r_1^4}{b_X^6 g^2(t, x)} + \frac{r_1^8}{b_X^{14} g^6(t, x)} + \frac{1}{n} \right\}.$$

It then follows from Assumption C.6, (A2) and (A3) that

$$E(B^2) = O\left( h_{X'}^4 + \frac{h_X^4}{b_X^2} + \frac{\sigma^2}{N h_X b_X^2} + \frac{\sigma^2}{N h_{X'}^3} + \frac{1}{n} \right). \qquad (A9)$$

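Since $\hat f(t, x) - f(t, x) = A + B$ and $E(A + B)^2 \le 2E(A^2) + 2E(B^2)$, combining this last bound with (A1) proves the first part of the theorem. Setting $h_X = N^{-1/5}$, $h_{X'} = N^{-1/7}$, and $b_X = N^{-2/15}$ if $n \ge N^{2/3}$ while $b_X = n^{-1/5}$ if $n \le N^{2/3}$, Assumption C.6 is satisfied and one obtains

$$E\{\hat f(t, x) - f(t, x)\}^2 = O\{\max(N^{-8/15}, n^{-4/5})\}.$$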
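The exponent bookkeeping behind this choice of bandwidths can be verified with exact rational arithmetic. In the sketch below each term of (A1) and (A9) is written as $N^{-e}$, with the sample size parametrized as $n = N^a$, and the function returns the exponents $e$; constants and $\sigma^2$ are ignored, and the smallest exponent gives the overall rate. The function name `exponents` is hypothetical.

```python
from fractions import Fraction as F

def exponents(a, b_x, h_x=F(1, 5), h_xp=F(1, 7)):
    """Exponents e (term = N^{-e}) when n = N^a, b_X = N^{-b_x}, h_X = N^{-h_x}, h_X' = N^{-h_xp}."""
    return {
        "b_X^4": 4 * b_x,
        "1/(n b_X)": a - b_x,
        "h_X^4/b_X^2": 4 * h_x - 2 * b_x,
        "1/(N h_X b_X^2)": 1 - h_x - 2 * b_x,
        "h_X'^4": 4 * h_xp,
        "1/(N h_X'^3)": 1 - 3 * h_xp,
        "1/n": a,
    }

# Regime n >= N^{2/3} with b_X = N^{-2/15}, evaluated at the boundary a = 2/3
rate_large = min(exponents(F(2, 3), F(2, 15)).values())  # 8/15, i.e. N^{-8/15}
# Regime n <= N^{2/3} with b_X = n^{-1/5} = N^{-a/5}, here a = 1/2
rate_small = min(exponents(F(1, 2), F(1, 2) / 5).values())  # 2/5, i.e. n^{-4/5}
```

In the first regime the level and design terms all balance at $8/15$ while the derivative terms decay faster, at $4/7$; in the second regime the terms $b_X^4$ and $(n b_X)^{-1}$ dominate and give $n^{-4/5}$.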

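Proof of Theorem 2. We first consider the denominator of (20) divided by $\hat n_{x_1,x_2}$, and then the numerator of (20) divided by $\hat n_{x_1,x_2}$.

We note that

$$\widehat{\mathrm{var}}_{x_1,x_2}\{X'(t)\} = \frac{\sum_{i=1}^n \hat X_i'^2(t)\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2} - \{\sum_{i=1}^n \hat X_i'(t)\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2}\}^2 / \hat n_{x_1,x_2}}{\hat n_{x_1,x_2}}.$$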

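In the sequel, $n_{x_1,x_2}$ stands for $\#\{i : x_1 \le X_i(t) \le x_2\}$ and $\hat n_{x_1,x_2}$ for $\#\{i : x_1 \le \hat X_i(t) \le x_2\}$. The difference $\widehat{\mathrm{var}}_{x_1,x_2}\{X'(t)\} - \mathrm{var}_{x_1,x_2}\{X'(t)\}$ behaves like

$$O_p(n^{-1/2}) + \left| \frac{\sum_{i=1}^n \hat X_i'^2(t)\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2}}{\hat n_{x_1,x_2}} - \frac{\sum_{i=1}^n X_i'^2(t)\, \mathbf{1}_{x_1 \le X_i(t) \le x_2}}{n_{x_1,x_2}} \right| + \left| \left\{ \frac{\sum_{i=1}^n \hat X_i'(t)\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2}}{\hat n_{x_1,x_2}} \right\}^2 - \left\{ \frac{\sum_{i=1}^n X_i'(t)\, \mathbf{1}_{x_1 \le X_i(t) \le x_2}}{n_{x_1,x_2}} \right\}^2 \right|. \qquad (A10)$$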

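Consider the following upper bound of $|\hat X_i'^2(t)\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2} - X_i'^2(t)\, \mathbf{1}_{x_1 \le X_i(t) \le x_2}|$:

$$|\hat X_i'^2(t) - X_i'^2(t)| + X_i'^2(t)\, |\mathbf{1}_{x_1 \le \hat X_i(t) \le x_2} - \mathbf{1}_{x_1 \le X_i(t) \le x_2}|.$$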

Since $\hat X'$ is a kernel estimator of $X'(t)$, we have

$$E\big[\{\hat X'^2(t) - X'^2(t)\}^2 \mid X\big] = O_p\left( h_{X'}^4 + \frac{1}{N h_{X'}^3} \right).$$

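To bound the expectation of the term $|\mathbf{1}_{x_1 \le \hat X_i(t) \le x_2} - \mathbf{1}_{x_1 \le X_i(t) \le x_2}|$, we use the rate of convergence (A2) of $\hat X_i(t)$. Since $X'(t)$ is uniformly bounded, we get

$$E\{X_i'^2(t)\, |\mathbf{1}_{x_1 \le \hat X_i(t) \le x_2} - \mathbf{1}_{x_1 \le X_i(t) \le x_2}|\} = O\{h_X^2 + (N h_X)^{-1/2}\}.$$

Consequently,

$$E|\hat X_i'^2(t)\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2} - X_i'^2(t)\, \mathbf{1}_{x_1 \le X_i(t) \le x_2}| = O\left\{ h_{X'}^2 + h_X^2 + \frac{1}{N^{1/2} h_{X'}^{3/2}} + (N h_X)^{-1/2} \right\}.$$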

From the rate of convergence (A2) of $\hat X(t)$, we derive that

$$\frac{\hat n_{x_1,x_2} - n_{x_1,x_2}}{n} = O_p\{h_X^2 + (N h_X)^{-1/2}\}. \qquad (A11)$$


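It follows that $\sum_{i=1}^n \hat X_i'^2(t)\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2} / \hat n_{x_1,x_2} - \sum_{i=1}^n X_i'^2(t)\, \mathbf{1}_{x_1 \le X_i(t) \le x_2} / n_{x_1,x_2}$ is

$$O_p\left\{ h_{X'}^2 + h_X^2 + \frac{1}{N^{1/2} h_{X'}^{3/2}} + (N h_X)^{-1/2} \right\}.$$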

Arguing similarly for the last terms in (A10), we conclude that

$$\widehat{\mathrm{var}}_{x_1,x_2}\{X'(t)\} - \mathrm{var}_{x_1,x_2}\{X'(t)\} = O_p\left\{ h_{X'}^2 + h_X^2 + \frac{1}{N^{1/2} h_{X'}^{3/2}} + (N h_X)^{-1/2} + n^{-1/2} \right\}.$$

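Let us now study the convergence of the numerator of (20). The difference

$$\frac{\sum_{i=1}^n [\hat f\{t, \hat X_i(t)\} - \hat X_i'(t)]^2\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2}}{\hat n_{x_1,x_2}} - \mathrm{var}\{Z(t) \mid x_1 \le X(t) \le x_2\}$$

behaves like

$$\begin{aligned} O_p(n^{-1/2}) &+ \left| \frac{\sum_{i=1}^n \hat f^2\{t, \hat X_i(t)\}\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2}}{\hat n_{x_1,x_2}} - \frac{\sum_{i=1}^n f^2\{t, X_i(t)\}\, \mathbf{1}_{x_1 \le X_i(t) \le x_2}}{n_{x_1,x_2}} \right| \\ &+ \left| \frac{\sum_{i=1}^n \hat X_i'^2(t)\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2}}{\hat n_{x_1,x_2}} - \frac{\sum_{i=1}^n X_i'^2(t)\, \mathbf{1}_{x_1 \le X_i(t) \le x_2}}{n_{x_1,x_2}} \right| \\ &+ 2 \left| \frac{\sum_{i=1}^n \hat f\{t, \hat X_i(t)\}\, \hat X_i'(t)\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2}}{\hat n_{x_1,x_2}} - \frac{\sum_{i=1}^n f\{t, X_i(t)\}\, X_i'(t)\, \mathbf{1}_{x_1 \le X_i(t) \le x_2}}{n_{x_1,x_2}} \right|. \end{aligned} \qquad (A12)$$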

We only bound the first difference in (A12), the two other differences being handled similarly.

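For any $1 \le i \le n$, we consider the random variable $\hat f^{(-i)}$, which is computed analogously to $\hat f$ with the data $(Y_{k,j})_{k \ne i,\, 1 \le j \le N_k}$. Consequently, $\hat f^{(-i)}(t, \cdot)$ is independent of $\hat X_i(t)$. Denoting by $E_{-i}$ the expectation with respect to $(Y_{k,j})_{k \ne i,\, 1 \le j \le N_k}$ and by $E_i$ the expectation with respect to $(Y_{i,j})_{1 \le j \le N_i}$, the difference $|\hat f^2\{t, \hat X_i(t)\}\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2} - f^2\{t, X_i(t)\}\, \mathbf{1}_{x_1 \le X_i(t) \le x_2}|$ is bounded by the sum of three terms,

$$\begin{aligned} &|\hat f^2\{t, \hat X_i(t)\} - \hat f^{(-i)2}\{t, \hat X_i(t)\}|\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2} + |\hat f^{(-i)2}\{t, \hat X_i(t)\} - f^2\{t, \hat X_i(t)\}|\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2} \\ &\quad + |f^2\{t, \hat X_i(t)\}\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2} - f^2\{t, X_i(t)\}\, \mathbf{1}_{x_1 \le X_i(t) \le x_2}|. \end{aligned} \qquad (A13)$$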

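Let us bound the expected value of the second difference:

$$E\big[|\hat f^{(-i)2}\{t, \hat X_i(t)\} - f^2\{t, \hat X_i(t)\}|\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2}\big] \le E_i\Big( \big(\|f\|_\infty + E_{-i}[\hat f^{(-i)2}\{t, \hat X_i(t)\}]^{1/2}\big) \times E_{-i}\big([\hat f^{(-i)}\{t, \hat X_i(t)\} - f\{t, \hat X_i(t)\}]^2\big)^{1/2}\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2} \Big).$$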

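Arguing as in the proof of Proposition 1, we know that the rate of convergence of $\hat f^{(-i)}$ satisfies

$$E\big[\{\hat f^{(-i)}(t, x) - f(t, x)\}^2\big] = \frac{O\left( b_X^4 + \dfrac{h_X^4}{b_X^2} + h_{X'}^4 + \dfrac{1}{n b_X} + \dfrac{1}{N h_X b_X^2} + \dfrac{1}{N h_{X'}^3} \right)}{\min\{g^6(t, x), 1\}},$$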


therefore the expectation of the second difference in (A13) is

$$O\left\{ b_X^2 + (n b_X)^{-1/2} + \frac{h_X^2}{b_X} + \frac{\sigma}{(N h_X)^{1/2} b_X} + h_{X'}^2 + \frac{\sigma}{N^{1/2} h_{X'}^{3/2}} \right\}. \qquad (A14)$$

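In order to control the first difference $\hat f^{(-i)2}\{t, \hat X_i(t)\} - \hat f^2\{t, \hat X_i(t)\}$ in (A13), we observe that $\hat f\{t, \hat X_i(t)\}$ decomposes as

$$\frac{K(0)\, \hat X_i'(t)}{K(0) + \sum_{j \ne i} K[\{\hat X_j(t) - \hat X_i(t)\}/b_X]} + \hat f^{(-i)}\{t, \hat X_i(t)\} \left( 1 - \frac{K(0)}{K(0) + \sum_{j \ne i} K[\{\hat X_j(t) - \hat X_i(t)\}/b_X]} \right).$$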
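This leave-one-out identity, $\hat f\{t, \hat X_i(t)\} = \beta\, \hat X_i'(t) + (1 - \beta)\, \hat f^{(-i)}\{t, \hat X_i(t)\}$ with $\beta$ the self-weight of observation $i$, holds exactly for any Nadaraya–Watson estimator. The sketch below checks it on hypothetical simulated pairs, with the arrays `x` and `y` standing in for $\hat X_j(t)$ and $\hat X_j'(t)$ and the Epanechnikov kernel assumed for $K$; the function names are illustrative.

```python
import numpy as np

def epa(u):
    # Epanechnikov kernel, so K(0) = 0.75
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def nw_at(x0, x, y, b):
    w = epa((x - x0) / b)
    return float(np.sum(w * y) / np.sum(w))

def loo_identity_gap(x, y, i, b):
    full = nw_at(x[i], x, y, b)             # f_hat(t, X_i)
    keep = np.arange(len(x)) != i
    loo = nw_at(x[i], x[keep], y[keep], b)  # f_hat^{(-i)}(t, X_i)
    k0 = float(epa(0.0))
    beta = k0 / (k0 + float(epa((x[keep] - x[i]) / b).sum()))
    return abs(full - (beta * y[i] + (1.0 - beta) * loo)), beta

rng = np.random.default_rng(3)
x = rng.standard_normal(300)
y = np.sin(x) + 0.3 * rng.standard_normal(300)
i = int(np.argsort(x)[150])  # a subject near the median of x
gap, beta = loo_identity_gap(x, y, i, b=0.4)
```

The returned `beta` is of order $1/(n b_X g)$, which is the size used for $\beta$ in the bound that follows.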

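Let $\beta = K(0)/(K(0) + \sum_{j \ne i} K[\{\hat X_j(t) - \hat X_i(t)\}/b_X])$. Thus, the expectation $E[|\hat f^{(-i)2}\{t, \hat X_i(t)\} - \hat f^2\{t, \hat X_i(t)\}|\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2}]$ is of the form

$$O(1)\, E\{\beta\, \hat X_i'^2(t)\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2}\} + O(1)\, E\big(\beta\, [\hat f^{(-i)}\{t, \hat X_i(t)\}]^2\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2}\big).$$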

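Applying Bernstein's inequality as in the proof of Theorem 1, we upper bound $\beta$ by $O[1/\{n b_X\, g(t, \hat X_i(t))\}]$ with overwhelming probability. We control the random variable on the complementary event by applying the Cauchy–Schwarz inequality. All in all, we get

$$E_{-i}\big[|\hat f^{(-i)2}\{t, \hat X_i(t)\} - \hat f^2\{t, \hat X_i(t)\}|\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2}\big] \le O_p\left[ \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2}\, \frac{\max\{1, \hat X_i'^2(t)\}}{n b_X\, g\{t, \hat X_i(t)\}} \right].$$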

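Integrating with respect to $\hat X_i$, we conclude that

$$E\big[|\hat f^{(-i)2}\{t, \hat X_i(t)\} - \hat f^2\{t, \hat X_i(t)\}|\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2}\big] = O\left( \frac{1}{n b_X} \right). \qquad (A15)$$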

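In order to control the third difference in (A13), we upper bound $|f^2\{t, \hat X_i(t)\}\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2} - f^2\{t, X_i(t)\}\, \mathbf{1}_{x_1 \le X_i(t) \le x_2}|$ by $2 \mu_f \|f\|_\infty |\hat X_i(t) - X_i(t)|$ if $x_1 \le \hat X_i(t) \le x_2$ and $x_1 \le X_i(t) \le x_2$, by 0 if $\hat X_i(t) \notin [x_1, x_2]$ and $X_i(t) \notin [x_1, x_2]$, and by $\|f\|_\infty^2$ otherwise. From Equation (A2), we derive

$$E|f^2\{t, \hat X_i(t)\}\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2} - f^2\{t, X_i(t)\}\, \mathbf{1}_{x_1 \le X_i(t) \le x_2}| = O\{h_X^2 + (N h_X)^{-1/2}\}. \qquad (A16)$$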

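Combining (A14), (A15) and (A16) with (A12) and (A13), we obtain

$$E\left[ \frac{1}{n} \sum_{i=1}^n \left| \hat f^2\{t, \hat X_i(t)\}\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2} - f^2\{t, X_i(t)\}\, \mathbf{1}_{x_1 \le X_i(t) \le x_2} \right| \right] = O\left\{ b_X^2 + \frac{h_X^2}{b_X} + h_{X'}^2 + (n b_X)^{-1/2} + \frac{1}{(N h_X)^{1/2} b_X} + \frac{1}{N^{1/2} h_{X'}^{3/2}} \right\}.$$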

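Combining this bound with (A11), one finds

$$\frac{1}{\hat n_{x_1,x_2}} \sum_{i=1}^n \hat f^2\{t, \hat X_i(t)\}\, \mathbf{1}_{x_1 \le \hat X_i(t) \le x_2} - \frac{1}{n_{x_1,x_2}} \sum_{i=1}^n f^2\{t, X_i(t)\}\, \mathbf{1}_{x_1 \le X_i(t) \le x_2} = O_p\left\{ b_X^2 + \frac{h_X^2}{b_X} + h_{X'}^2 + (n b_X)^{-1/2} + \frac{1}{(N h_X)^{1/2} b_X} + \frac{1}{N^{1/2} h_{X'}^{3/2}} + h_X \right\}.$$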


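Arguing similarly, we obtain the rate of convergence of the two remaining terms in (A12). We conclude that $\widehat{\mathrm{var}}_{x_1,x_2}[\hat f\{t, \hat X(t)\}] - \mathrm{var}_{x_1,x_2}[f\{t, X(t)\}]$ behaves like

$$O_p\left\{ b_X^2 + \frac{h_X^2}{b_X} + h_{X'}^2 + (n b_X)^{-1/2} + \frac{1}{(N h_X)^{1/2} b_X} + \frac{1}{N^{1/2} h_{X'}^{3/2}} \right\}.$$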

Proof of Corollary 1. We only need to observe that the rate of convergence of $\hat E\{X'^2(t) \mid X(t) = x\}$ towards $E\{X'^2(t) \mid X(t) = x\}$ is the same as that of $\hat f(t, x)$ towards $f(t, x)$. Indeed, $\hat E\{X'^2(t) \mid X(t) = x\}$ is a Nadaraya–Watson estimator based on the pairs $\{\hat X_i(t), \hat X_i'^2(t)\}$ $(i = 1, \ldots, n)$. Combining this remark with Theorem 1 completes the proof.

Proof of Corollary 2. The arguments are the same as in the proof of Theorem 1, the only difference being that the rate of convergence of $\hat X'(t)$ is replaced by the rate of convergence of $\hat U(t)$.

Bibliography

Bellman, R. & Roth, R. (1971). The use of splines with unknown end points in the identification of systems. J. Math. Anal. Appl. 34 26–33.

Brunel, N. (2008). Parameter estimation of ODE's via nonparametric estimators. Electron. J. Statist. 2 1242–1267. URL http://dx.doi.org/10.1214/07-EJS132.

Chen, J. & Wu, H. (2008). Efficient local estimation for time-varying coefficients in deterministic dynamic models with applications to HIV-1 dynamics. J. Am. Statist. Assoc. 103 369–384.

Chiang, C., Rice, J. & Wu, C. (2001). Smoothing spline estimation for varying coefficient models with repeatedly measured dependent variables. J. Am. Statist. Assoc. 96 605–619.

Ellner, S., Seifu, Y. & Smith, R. (2002). Fitting population dynamic models to time-series data by gradient matching. Ecology 83 2256–2270.

Fan, J. & Gijbels, I. (1996). Local Polynomial Modelling and Its Applications, vol. 66 of Monographs on Statistics and Applied Probability. London: Chapman & Hall.

Gasser, T. & Müller, H.-G. (1984). Estimating regression functions and their derivatives by the kernel method. Scand. J. Statist. 11 171–185.

Gasser, T., Müller, H.-G., Köhler, W., Molinari, L. & Prader, A. (1984). Nonparametric regression analysis of growth curves. Ann. Statist. 12 210–229.

Golub, G., Heath, M. & Wahba, G. (1979). Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21 215–223.

Hansen, B., Cortina-Borja, M. & Ratcliffe, S. (2003). Assessing non-linear estimation procedures for human growth models. Ann. Hum. Biol. 30 80–96.

Hoffmann, M. (1999). Adaptive estimation in diffusion processes. Stochastic Process. Appl. 79 135–163. URL http://dx.doi.org/10.1016/S0304-4149(98)00074-X.

Holte, S., Melvin, A., Mullins, J., Tobin, N. & Frenkel, L. (2006). Density-dependent decay in HIV-1 dynamics. J. Acquired Immune Deficiency Syndromes 41 266–276.

Hooker, G. (2009). Forcing function diagnostics for nonlinear dynamics. Biometrics 65 928–936.

Jacod, J. (2000). Non-parametric kernel estimation of the coefficient of a diffusion. Scand. J. Statist. 27 83–96. URL http://dx.doi.org/10.1111/1467-9469.00180.

Jones, H. & Bayley, N. (1941). The Berkeley Growth Study. Child Development 12 167–173.

Jones, M. C. & Foster, P. J. (1996). A simple nonnegative boundary correction method for kernel density estimation. Statist. Sinica 6 1005–1013.

Liang, H. & Wu, H. (2008). Parameter estimation for differential equation models using a framework of measurement error in regression models. J. Am. Statist. Assoc. 103 1570–1583. URL http://dx.doi.org/10.1198/016214508000000797.

Mas, A. & Pumo, B. (2009). Functional linear regression with derivatives. J. Nonparametr. Statist. 21 19–40.


Miao, H., Dykes, C., Demeter, L. M. & Wu, H. (2009). Differential equation modeling of HIV viral fitness experiments: model identification, model selection, and multimodel inference. Biometrics 65 292–300.

Müller, H.-G. (1987). Weighted local regression and kernel methods for nonparametric curve fitting. J. Am. Statist. Assoc. 82 231–238.

Müller, H.-G. (1991). Smooth optimum kernel estimators near endpoints. Biometrika 78 521–530.

Müller, H.-G. & Yao, F. (2010). Empirical dynamics for longitudinal data. Ann. Statist. 38 3458–3486.

Paul, D., Peng, J. & Burman, P. (2011). Semiparametric modeling of autonomous nonlinear dynamical systems with applications. Ann. Appl. Statist. 5 2078–2108.

Perelson, A., Essunger, P., Cao, Y., Vesanen, M., Hurley, A., Saksela, K., Markowitz, M. & Ho, D. (1997). Decay characteristics of HIV-1-infected compartments during combination therapy. Nature 387 188–191.

Preece, M. & Baines, M. (1978). A new family of mathematical models describing the human growth curve. Ann. Hum. Biol. 5 1–24.

Ramsay, J. O., Hooker, G., Campbell, D. & Cao, J. (2007). Parameter estimation for differential equations: a generalized smoothing approach. J. R. Statist. Soc. B 69 741–796. With discussion and a reply by the authors. URL http://dx.doi.org/10.1111/j.1467-9868.2007.00610.x.

Ramsay, J. O. & Silverman, B. W. (2005). Functional Data Analysis, 2nd ed. New York: Springer.

Reddy, S. K. & Dass, M. (2006). Modeling on-line art auction dynamics using functional data analysis. Statist. Sci. 21 179–193.

Schimek, M., ed. (2000). Smoothing and Regression. New York: John Wiley & Sons.

Şentürk, D. & Müller, H.-G. (2010). Functional varying coefficient models for longitudinal data. J. Am. Statist. Assoc. 105 1256–1264.

Tanner, J., Whitehouse, R. & Takaishi, M. (1966). Standards from birth to maturity for height, weight, height velocity, and weight velocity: British children. Arch. Dis. Child. 41 613–635.

Wang, S., Jank, W., Shmueli, G. & Smith, P. (2008). Modeling price dynamics in eBay auctions using principal differential analysis. J. Am. Statist. Assoc. 103 1100–1118.