1 functional data analysis in matlab and r james ramsay, professor, mcgill u., montreal hadley...

25
1 Functional Data Analysis in Matlab and R James Ramsay, Professor, McGill U., Montreal Hadley Wickham, Grad student, Iowa State, Ames, IA Spencer Graves, Statistician, PDF Solutions, San José, CA

Upload: sidney-bromley

Post on 11-Dec-2015

224 views

Category:

Documents


0 download

TRANSCRIPT

1

Functional Data Analysis in Matlab and R

James Ramsay, Professor, McGill U., Montreal

Hadley Wickham, Grad student, Iowa State, Ames, IA

Spencer Graves, Statistician, PDF Solutions, San José, CA

2

Outline • What is Functional Data Analysis?

• FDA and Differential Equations

• Examples: – Squid Neurons– Continuously Stirred Tank Reactor (CSTR)

• Conclusions

• References

3

What is FDA? • Functional data analysis is a collection of

techniques to model data from dynamic systems – possibly governed by differential equations – in terms of some set of basis functions

• The ‘fda’ package supports the use of 8 different types of basis functions: constant, monomial, polynomial, polygonal, B-splines, power, exponential, and Fourier.

4

Observations of different lengths • Observation vectors of different lengths

can be mapped to coordinates of a fixed basis set

• All examples in the ‘fda’ package have the same numbers of observations

• No conceptual obstacles to handling observation vectors of different lengths

5

Time Warping

• “start” and “stop” are sometimes determined by certain transitions

• Example: growth spurts in the life cycle of various species do not occur at exactly the same ages in different individuals (even within the same species)

6

10 Girls: Berkeley Growth Study• Tuddenham, R. D.,

and Snyder, M. M. (1954) "Physical growth of California boys and girls from birth to age 18", _University of California Publications in Child Development_, 1, 183-364.

ooo

ooo

oo

oo

ooooooooooooooooooooo

5 10 15

8010

012

014

016

018

0

age

Hei

ght

(cm

.)

ooooo

oo

oo

ooo

ooooooooooooooooooo

oooo

oo

oo

oo

ooooooooooooooooooooo

oooo

oo

oo

oo

ooooooooooooooooooooo

ooo

ooo

oo

oo

oooooooooo

ooooooooooo

oooo

oo

oo

oo

ooooooo

oooooooooooooo

ooo

ooo

oo

oo

ooooooooooooooooooooo

ooooo

oo

oo

oo

oo

oo

oooooooooooooooo

ooo

ooo

oo

oo

ooooooooooooooooooooo

oooo

o

oo

oo

ooooooo

ooooooooooooooo

7

Acceleration • Growth spurts

occur at different ages

• Average shows the basic trend, but features are damped by improper registration

ooo

ooo

oo

oo

ooooooooooooooooooooo

5 10 1580

100

120

140

160

180

age

Hei

ght

(cm

.)

ooooo

oo

oo

ooo

ooooooooooooooooooo

oooo

oo

oo

oo

ooooooooooooooooooooo

oooo

oo

oo

oo

ooooooooooooooooooooo

ooo

ooo

oo

oo

oooooooooo

ooooooooooo

oooo

oo

oo

oo

ooooooo

oooooooooooooo

ooo

ooo

oo

oo

ooooooooooooooooooooo

ooooo

oo

oo

oo

oo

oo

oooooooooooooooo

ooo

ooo

oo

oo

ooooooooooooooooooooo

oooo

o

oo

oo

ooooooo

ooooooooooooooo

5 10 15

-4-3

-2-1

01

2

age

Gro

wth

acc

eler

atio

n (c

m/y

ear^

2)

8

Registration • register.fd all

to the mean

• Not perfect, but better

5 10 15

-4-3

-2-1

01

2

ageG

row

th a

ccel

erat

ion

(cm

/yea

r^2)

5 10 15

-4-3

-2-1

01

2

warped age

Gro

wth

acc

eler

atio

n (c

m/y

r^2)

9

A Stroll Along the Beach

• Light intensity over 365 days at each of 190*143 = 27140 pixels was – smoothed – functional principal components

• http://www.stat.berkeley.edu/~wickham/userposter.pdf

10

Other fda capabilities

• Correlations – even with

series of different lengths!

• Phase plane plots – good

estimates of derivatives

Month

Me

an

Te

mp

era

ture

Jan Apr Jun Sep Dec

-10

05

15

j F

m

A

M

JJ A

S

O

N

D

Montreal average daily tempdeviation from average (C)

-10 -5 0 5 10 15 20

-0.0

06

0.0

00

0.0

06

Temperature (C)

Acc

ele

ratio

n

jF

m

A

M JJ

AS

O

N

D

j

Montreal average daily tempdeviation from average (C)

afda-ch03.Rfda-ch01.Rfda-ch02.R

11

Script files for fda books • Ramsay and Silverman

– (2002) Applied Functional Data Analysis (Springer)

– (2006) Functional Data Analysis, 2nd ed. (Springer)

• ~R\library\fda\scripts– Some but not all data sets discussed in the

books are in the ‘fda’ package – Script files are available to reproduce some but

not all of the analyses in the books. – plus CSTR demo

12

FDA and Differential Equations

• Many dynamic systems are believed to follow processes where output changes are a function of the outputs, x, and inputs, u (and unknown parameters ):

Tttt ,0,|, θux,fx

• Matlab was designed in part for these types of models

13

Squid Neurons • FitzHugh (1961) - Nagumo et al. (1962) Equations:

Estimate a, b and c in: cbRaVR

RVVcV

33

Vol

tage

acr

oss

Axo

n M

embr

ane

Rec

over

y vi

a O

utw

ard

Cur

rent

s

V

R

14

Tank Reactions • Continuously Stirred Tank Reactor (CSTR)

Tem

pera

ture

C

once

ntra

tion

15

Functional Data Analysis Process1. Select Basis Set

2. Select Smoothing Operator – e.g., differential equation– equivalent to a Bayesian prior over coefficients

to estimate

3. Estimate coefficients to optimize some objective function

4. Model criticism, residual plots, etc.

5. Hypothesis testing

16

Inputs to Tank Reaction Simulation

17

ba

aFFaFF

FTFT

FFFF

FTTFT

TFTFCFTTFFdtdT

CFCFTdtdC

bb

CCTC

TT

CC

TCTT

CC

,,,:parameters 4

2

,130,

,

1110exp,

,,

,

co

co

1 coco

inin

incoinco

inref4

in

cocoininininco

ininin

Computations: Nonlinear ODE

• Compute Input vectors

• Define functions

• Call differential equation solver

• Summarize, plot

Tem

pera

ture

C

once

ntra

tion

estimate parameters (, , a, b)

18

Three problems

• Estimate (, , a, b) to minimize SSE in Temperature only

function SSE SSE-minMatlab lsqnonlin 5.09888 0.00236R nls 5.09652 0

optim Nelder-Mead 5.09652 0BFGS 5.09652 0CG 5.09900 0.00248SANN 5.17504 0.07852

nlminb 5.09652 0

19

0 10 20 30 40 50 601.2

1.4

1.6

C(t

)

Concentration (red = true, blue = estimated)

0 10 20 30 40 50 60330

340

350

360

T(t

)

Temperature

SSE(Temp, Conc)

• Matlab: lsqnonlin • R: nls

0 10 20 30 40 50 60

1.2

1.4

1.6

1.8

Concentration (red = true, blue = estimate)

C(t

)

0 10 20 30 40 50 60

33

03

40

35

03

60

Temperature

C(t

)

Matlab RConcentration 1.149E-03 1.145E-03Temperature 2.640E-04 2.636E-04

Median absolute relative error

20

R vs. Matlab • Gave comparable answers

• R code for CSTR slightly more accurate but requires much more compute time – coded by different people

• R has helper functions not so easily replicated in Matlab – summary.nls – confint.nls – profile.nls

Estimate StdErr t Pr(>|t|) kref 0.466 0.004 113.0 < 2e-16 ***EoverR 0.840 0.009 94.7 < 2e-16 ***a 1.720 0.232 7.4 8.2e-13 ***b 0.496 0.050 10.0 < 2e-16 ***

21

confint.nls• Likelihood-based confidence intervals:

generally more accurate than Wald intervals – Wald subject to parameter effects curvature – Likelihood: only affected by intrinsic curvature

> confintNlsFit 2.5% 97.5%kref 0.458 0.474EoverR 0.823 0.858a 1.300 2.222b 0.401 0.599

22

0.455 0.465 0.475

0.0

1.0

2.0

0.82 0.84 0.86

0.0

1.0

2.0

1.2 1.6 2.0 2.4

0.0

1.0

2.0

0.40 0.50 0.60

0.0

1.0

2.0

plot.profile.nls• for a plot

showing the sqrt(log(LR))

0.455 0.465 0.475

0.0

1.0

2.0

0.82 0.84 0.86

0.0

1.0

2.0

1.2 1.6 2.0 2.4

0.0

1.0

2.0

0.40 0.50 0.60

0.0

1.0

2.0

kref EoverR

a b

50

99

80

9590

23

Conclusions

• R and Matlab give comparable answers

• R:nls has helper functions absent from Matlab:lsqnonlin

• Functional data analysis tools are key for – estimating derivatives and – working with differential operators

24

References

• www.functionaldata.org

• Ramsay and Silverman (2006) Functional Data Analysis, 2nd ed. (Springer)

• ________(2002) Applied Functional Data Analysis (Springer)

• Ramsay, J. O., Hooker, G., Cao, J. and Campbell, D. (2007) Parameter estimation for differential equations: A generalized smoothing approach (with discussion). Journal of the Royal Statistical Society, Series B. To appear.

25

NOT free-knot splines

• For this, see – DierckxSpline package – Companion to Dierckx, P. (1993). Curve and

Surface Fitting with Splines. Oxford Science Publications, New York.

• R package by Sundar Dorai-Raj – links to Fortran code by Dierckx available from

www.netlib.org/dierckx

• soon to appear on CRAN