Methodology
Tutorial on Functional Data Analysis
Ana-Maria Staicu
Department of Statistics, North Carolina State University, [email protected]
SAMSI
April 5, 2017
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 1 / 71
Methodology
Broad overview of course topics:
1. Introduction to Functional Data
2. Modeling Functional Data with Preset Basis Expansions
3. Modeling Functional Data using Functional Principal Component Analysis
4. Beyond Independent and Identically Distributed Functional Data
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 2 / 71
Methodology
Example of functional data
Canadian weather data. Daily (and monthly) temperature and precipitation at 35 different locations in Canada, averaged over 35 years from 1960 to 1994.
[Figure: left panel "Avg Daily Temp in Ottawa" (Ontario); right panel "Avg Daily Temp across Canada". x-axis: days (0 to 300); y-axis: temperature (−30 to 20).]
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 3 / 71
Methodology
Example of functional data (cont’d)
Marathon data. Running time performance of 149 female and 105 male athletes competing in the US Olympic Team Trials Marathon, 02/16/2016.
[Figure: left panel "Speed for females"; right panel "Speed for males". x-axis: Miles (1 to 25); y-axis: Speed (miles/hr) (6 to 12).]
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 4 / 71
Methodology
Example of functional data (cont’d)
CD4 data. CD4 cell count per mm³ of blood is a useful surrogate of the progression of HIV. Below are CD4 cell counts for 366 subjects between months −18 and 42 since seroconversion (diagnosis of HIV).
[Figure: left panel "CD4 for few subjects"; right panel "CD4 counts for all the subjects". x-axis: months since seroconversion (−20 to 40); y-axis: number of CD4 cells (0 to 3000).]
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 5 / 71
Methodology
Example of functional data (cont’d)
Diffusion tensor imaging (DTI) data. Fractional anisotropy (FA), a measure of tissue integrity that is useful in the diagnosis/progression of multiple sclerosis (MS), along the main direction of the corpus callosum (CCA) for many subjects.
[Figure: "Diffusion Tensor Imaging: CCA". Panels: FA obs for one MS subj; FA for all subj; Brain CCA (red). x-axis: tract (0 to 80); y-axis: Fractional anisotropy (FA) (0.3 to 0.7).]
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 6 / 71
Methodology
Example of functional data (cont’d)
Diffusion tensor imaging (DTI) data. Each subject is observed at several hospital visits, and at each visit an FA profile is measured. Below: FA for two subjects. Of interest: what is the dynamic behavior over time?
[Figure: "Subject A (total of 4 visits)", visit times = 0, 192, 365, 834; "Subject B (total of 3 visits)", visit times = 0, 112, 239. x-axis: tract (0 to 80); y-axis: FA. FA along CCA at various times after baseline (time = 0).]
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 7 / 71
Methodology
Some characteristics of functional data:
. Often high dimensional
. Typically involves multiple measurements of the same "process"
. Interpretability across subject domains
. Parametric assumptions on the underlying "process" are often not made
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 8 / 71
Methodology
FDA vs multivariate data analysis
. Topology: well-defined topology for FDA / one can permute the elements in MDA
. Model covariance assumption: smooth for FDA / unstructured or sparse for MDA
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 9 / 71
Methodology
FDA vs longitudinal data analysis
. Model covariance: no assumption for FDA / parametric assumption for LDA
. Mechanisms of missingness: not very important in FDA / very important for LDA
. Interest more in subject-specific trajectories for FDA / inference for LDA
. Sampling design: typically high frequency for FDA / sparse and irregular for LDA
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 10 / 71
Methodology
Some references
. Ramsay & Silverman, 2005, “Functional Data Analysis”
. Ramsay, Hooker & Graves, 2009, "Functional Data Analysis with R and MATLAB"
. Horvath & Kokoszka, 2012, "Inference for Functional Data with Applications"
. Bosq, 2000, "Linear Processes in Function Spaces"
. Ferraty & Vieu, 2006, "Nonparametric Functional Data Analysis"
. Zhang, 2013, "Analysis of Variance for Functional Data"
. Hsing & Eubank, 2015, "Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators"
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 11 / 71
Methodology
From discrete to functional data. Intuition
The term functional, in reference to observed data, refers to the intrinsic structure of the data being functional; i.e., there is an underlying function that gives rise to the observed data.
Advantages of representing the data as a smooth function:
. allows evaluation at any time point
. allows evaluation of rates of change of the underlying curve
. allows registration to a common time-scale
Main idea in FDA: treat the observed data functions as single entities, rather than a sequence of individual observations.
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71
Methodology
Notation
Observed data: $\{(Y_{i1}, t_{i1}), \ldots, (Y_{im_i}, t_{im_i})\}_i$;
. $Y_{ij}$: the "snapshot" of an underlying $i$th signal/curve (latent) $X_i(\cdot)$ at timepoint $t_{ij}$, possibly blurred by error
. $X_i(\cdot)$: smooth latent curve on $\mathcal{T}$; the $X_i(\cdot)$'s are independent realizations of a stochastic process $X(\cdot)$
. $t_{ij} \in \mathcal{T}$; the $t_{ij}$ may vary across $i$ and $j$
. If there is no noise, then $Y_{ij} = X_i(t_{ij})$; otherwise $Y_{ij} = X_i(t_{ij}) + \varepsilon_{ij}$, where $\varepsilon_{ij}$ is white noise; typically the $\varepsilon_{ij}$ are IID
Common objectives:
. Characterize the pattern of variability among curves
. Recover the subject-specific trajectories $X_i(\cdot)$
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 13 / 71
Methodology
Smoothing
Why do we need smoothing?
. Data are often observed with error
. There's a need to "interpolate" to a common grid
How are we going to do smoothing?
. Use prespecified basis functions (e.g. splines, wavelets, Fourier, etc.)
. Use data-driven basis functions (e.g. functional principal components)
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 14 / 71
Methodology
Prespecified basis expansion
Let $\{\psi_1(\cdot), \psi_2(\cdot), \ldots, \psi_K(\cdot)\}$ be a user-specified basis. Assume:

$$Y_{ij} = \sum_{k=1}^{K} c_{ik}\,\psi_k(t_{ij}) + \varepsilon_{ij}.$$

. We only need to estimate the subject-specific scores $c_{ik}$.
What kind of prespecified basis to use? B-splines, wavelets, etc.
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 15 / 71
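As a concrete sketch of the expansion above, the following minimal numpy example fits the coefficients $c_{ik}$ for one noisy curve by ordinary least squares. The truncated-power cubic spline basis, the sine test curve, and the knot placement are illustrative choices, not from the slides.

```python
import numpy as np

def spline_basis(t, knots):
    """Truncated-power cubic spline basis evaluated at points t.
    Columns: 1, t, t^2, t^3, and (t - knot)_+^3 for each interior knot."""
    cols = [np.ones_like(t), t, t**2, t**3]
    cols += [np.clip(t - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 80)                    # observation grid t_ij
x_true = np.sin(2 * np.pi * t)                   # latent curve X_i(t)
y = x_true + rng.normal(scale=0.2, size=t.size)  # Y_ij = X_i(t_ij) + eps_ij

Psi = spline_basis(t, knots=np.linspace(0.1, 0.9, 9))
c_hat, *_ = np.linalg.lstsq(Psi, y, rcond=None)  # estimated scores c_ik
x_hat = Psi @ c_hat                              # fitted smooth curve
```

The fitted curve `x_hat` can then be evaluated, differentiated, or compared across subjects via the coefficient vector `c_hat` alone.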
Methodology
Some common basis functions: B-splines
[Figure: left panel "Fourier basis"; right panel "B spline basis". x-axis: s (0 to 1); y-axis: $\phi_k(s)$.]
. Continuous
. Easily defined derivatives
. Good for smooth data
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 16 / 71
Methodology
Some common basis functions: Wavelets
[Figure: an example wavelet function. x-axis: x (0 to 1); y-axis: wavelet function value.]
. Formed from a single "mother wavelet" function: $\psi_{jk}(t) = 2^{j/2}\,\psi(2^j t - k)$
. Orthonormal basis
. Particularly good when there are jumps, spikes, peaks, etc.
. Wavelet representation is sparse
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 17 / 71
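To make the mother-wavelet formula concrete, here is a small numpy sketch using the Haar wavelet (an illustrative choice; the slide does not specify a wavelet family) that checks the orthonormality of $\psi_{jk}(t) = 2^{j/2}\psi(2^j t - k)$ numerically:

```python
import numpy as np

def haar_mother(t):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    return np.where((t >= 0) & (t < 0.5), 1.0,
                    np.where((t >= 0.5) & (t < 1.0), -1.0, 0.0))

def haar(t, j, k):
    """psi_{jk}(t) = 2^{j/2} psi(2^j t - k): dilated/translated copy."""
    return 2 ** (j / 2) * haar_mother(2 ** j * t - k)

# fine dyadic grid so the numerical inner products are exact for Haar
t = np.linspace(0.0, 1.0, 4096, endpoint=False)
dt = t[1] - t[0]

ip_same = np.sum(haar(t, 2, 1) ** 2) * dt              # ||psi_{2,1}||^2 = 1
ip_cross = np.sum(haar(t, 2, 1) * haar(t, 3, 2)) * dt  # overlapping supports,
                                                       # inner product = 0
```

The two Riemann sums confirm unit norm within a scale and orthogonality across scales even when the supports overlap.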
Methodology
Example
[Figure: left panel "Few B-splines fns expansion"; right panel "Many B-spline fns expansion". x-axis: Distance Along Tract (0 to 1); y-axis: y (0.35 to 0.60).]
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 18 / 71
Methodology
Tuning
For any curve, many possible smooths are available
. Depends on the type of basis
. Depends on the number of basis functions
. Depends on the estimation procedure
"Tuning" is the process of adjusting the smoother to the data at hand. This is often implicit (e.g. kernel smoothing).
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 19 / 71
Methodology
Prespecified basis expansion: estimation
Explicit penalization: use a large number of basis functions and penalize the "roughness" of the fit.

This leads to a penalized SSE:

$$\mathrm{PenSSE}_i = \sum_{j=1}^{m_i}\Big(Y_{ij} - \sum_{k=1}^{K}\psi_k(t_{ij})\,c_{ik}\Big)^2 + \lambda\,\mathrm{Pen}\Big(\sum_{k=1}^{K}\psi_k(\cdot)\,c_{ik}\Big)$$

. Measure the roughness using derivatives of the fit, e.g.

$$\mathrm{Pen}\Big(\sum_{k=1}^{K}\psi_k(\cdot)\,c_{ik}\Big) = \int\Big\{\sum_{k=1}^{K}\psi_k''(t)\,c_{ik}\Big\}^2 dt = c_i^T D\, c_i,$$

where $D$ is the $K \times K$ matrix with $(k, k')$ entry $\int \psi_k''(t)\,\psi_{k'}''(t)\,dt$.
. Choose the $c_{ik}$ that minimize $\mathrm{PenSSE}_i$; obtain $\widehat{X}_i(\cdot)$
. Need to select the tuning parameter $\lambda$. Common ways: CV and REML
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 20 / 71
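A hedged numerical sketch of the penalized criterion: instead of a B-spline basis, the toy code below uses one coefficient per grid point with a squared-second-difference penalty (a Whittaker-type smoother), which is the discrete analogue of the $\int \{x''(t)\}^2 dt$ roughness term on the slide. The data and $\lambda$ values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=t.size)

# A: second-difference matrix; D = A^T A is the discrete roughness penalty
n = t.size
A = np.diff(np.eye(n), n=2, axis=0)
D = A.T @ A

def pen_smooth(y, lam):
    """Minimize ||y - c||^2 + lam * c^T D c  (closed-form ridge solution)."""
    return np.linalg.solve(np.eye(len(y)) + lam * D, y)

x_small = pen_smooth(y, lam=1.0)   # small lambda: follows the data closely
x_large = pen_smooth(y, lam=1e7)   # huge lambda: nearly a straight line
```

Increasing `lam` trades fidelity for smoothness, exactly as the $\lambda = 0$ vs $\lambda \gg 0$ discussion on the next slide describes.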
Methodology
Smoothing parameter λ
. When $\lambda = 0$ the criterion reduces to the SSE (emphasis on fit)
. When $\lambda \gg 0$ (very large), more emphasis is placed on smoothness
Illustration: Vancouver mean temperature. Fits using 100 cubic splines (with equally spaced knots) and various $\lambda$.
[Figure: four panels of fits with lambda = 0.1, 1000, 1e+07, 1e+11. x-axis: day (0 to 300); y-axis: precipitation (0 to 8).]
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 21 / 71
Methodology
Data-driven basis
. The previous bases don't depend on the data; only the loadings do
. Functional principal component analysis (FPCA) gives a "data-driven" basis (constructed from the observed data)
. Similar representation of the observed data $Y_{ij}$:

$$Y_{ij} = \mu(t_{ij}) + \sum_{k=1}^{K} c_{ik}\,\phi_k(t_{ij}) + \varepsilon_{ij}.$$

. The difference is that the $\phi_k(\cdot)$ are not known.
. $\{\phi_1(\cdot), \phi_2(\cdot), \ldots\}$ describe the main directions of variability in the observed data
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 22 / 71
Methodology
fPCA: main idea
Idea of fPCA: find projections of maximum variance. Let $\{X_i(t) : t \in \mathcal{T}\}_i$ be IID zero-mean curves in $L^2[\mathcal{T}]$.
. The 1st fPC is the $\phi_1(t)$ for which

$$\xi_{i1} = \langle \phi_1, X_i \rangle = \int_{\mathcal{T}} \phi_1(t)\,X_i(t)\,dt$$

has maximum variance, subject to the constraint $\|\phi_1\|^2 = 1$.
. The 2nd fPC is the $\phi_2(t)$ for which

$$\xi_{i2} = \langle \phi_2, X_i \rangle = \int_{\mathcal{T}} \phi_2(t)\,X_i(t)\,dt$$

has maximum variance, subject to $\langle \phi_2, \phi_1 \rangle = 0$ and $\|\phi_2\|^2 = 1$.
. . . .
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 23 / 71
Methodology
Data-driven basis: Illustration
FPCA for a sample of 20 curves. Displayed as dashed black lines are 3 FPCs: top middle panel, bottom left and right panels.
[Figure: six panels on (0, 1), y-range −3 to 3: "20 functions"; "comp 1, 85%"; "residuals: remove comp 1"; "comp 2, 14%"; "residuals: remove comp 1,2"; "comp 3, 1%".]
Model: $X_i(t) = \xi_{i1}\phi_1(t) + \xi_{i2}\phi_2(t) + \xi_{i3}\phi_3(t)$
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 24 / 71
Methodology
How to obtain the fPC ?
Notation:
. $\Sigma(s, t) := E[\{X_i(s) - \mu(s)\}\{X_i(t) - \mu(t)\}]$ is the covariance function
. $\Sigma(s, t)$ induces an integral operator $\Sigma$:

$$\Sigma f(t) = \int_{\mathcal{T}} \Sigma(t, s)\,f(s)\,ds, \qquad f \in L^2[\mathcal{T}]$$

Re-write the steps of the algorithm that defines the fPCs:
. 1st fPC: $\phi_1 = \arg\max_f \langle \Sigma f, f \rangle$ subject to $\|f\|^2 = 1$
. 2nd fPC: $\phi_2 = \arg\max_f \langle \Sigma f, f \rangle$ subject to $\langle f, \phi_1 \rangle = 0$ and $\|f\|^2 = 1$
. . . .
The fPCs $\phi_k$ are the eigenvectors of the covariance operator $\Sigma$: $\Sigma\phi_k = \lambda_k\phi_k$.
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 25 / 71
Methodology
Mercer’s thm
. Assume $\Sigma(s, t)$ is continuous over $\mathcal{T} \times \mathcal{T}$. Then there exist an orthonormal basis $\{\phi_k\}_k$ of continuous functions in $L^2[\mathcal{T}]$ and $\lambda_1 \geq \lambda_2 \geq \ldots > 0$ such that
I $\Sigma\phi_k = \lambda_k\phi_k$, where $\Sigma$ is the operator induced by the covariance function
I $\Sigma(s, t) = \sum_{k=1}^{\infty} \lambda_k\,\phi_k(s)\,\phi_k(t)$ for $t, s \in \mathcal{T}$, where the series converges uniformly on $\mathcal{T}^2$
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 26 / 71
Methodology
Karhunen-Loeve expansion
. Let $\{X_i(t) : t \in \mathcal{T}\}_i$ be IID zero-mean curves in $L^2[\mathcal{T}]$ and $\{\phi_k\}_k$ the eigenfunctions of the covariance function $\Sigma(\cdot, \cdot)$. Then

$$X_i(t) = \sum_{k=1}^{\infty} \xi_{ik}\,\phi_k(t) \quad \text{(the series converges in } L^2 \text{ norm)}$$

where
I $\xi_{ik} := \int X_i(t)\,\phi_k(t)\,dt$ are random variables
I $E[\xi_{ik}] = 0$, $\mathrm{Var}[\xi_{ik}] = \lambda_k$, and $\{\xi_{ik} : k \geq 1\}$ are mutually uncorrelated
I the $\xi_{ik}$ are called the fPC scores for $X_i$
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 27 / 71
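The KL expansion can be checked by simulation: generate curves from two known eigenfunction/eigenvalue pairs, recover the scores by projection, and verify that the empirical score variances match the $\lambda_k$ and the scores are essentially uncorrelated. The eigenfunctions and eigenvalues below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 200)
dt = t[1] - t[0]

# two orthonormal eigenfunctions on [0,1] and their eigenvalues
phi1 = np.sqrt(2) * np.sin(2 * np.pi * t)
phi2 = np.sqrt(2) * np.cos(2 * np.pi * t)
lam = np.array([4.0, 1.0])

# generate curves X_i(t) = xi_i1 phi1(t) + xi_i2 phi2(t)
n = 5000
xi = rng.normal(scale=np.sqrt(lam), size=(n, 2))
X = xi @ np.vstack([phi1, phi2])

# recover scores by projection: xi_ik = int X_i(t) phi_k(t) dt
xi1_hat = X @ phi1 * dt
xi2_hat = X @ phi2 * dt
var1_hat = xi1_hat.var()                         # should be close to lam[0]
corr = np.corrcoef(xi1_hat, xi2_hat)[0, 1]       # should be close to 0
```

With a sample of 5000 curves, `var1_hat` lands near 4 and `corr` near 0, matching $\mathrm{Var}[\xi_{ik}] = \lambda_k$ and the uncorrelatedness of the scores.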
Methodology
Implications KL expansion
. One-to-one mapping $X_i(\cdot) \to (\xi_{i1}, \xi_{i2}, \ldots)^T$
. $\lambda_k$ is the average variance of the $k$th fPC $\phi_k$, i.e.:

$$\lambda_k = \lambda_k \int \phi_k^2(t)\,dt = \int \mathrm{Var}\{\xi_{ik}\phi_k(t)\}\,dt$$

. $\int \mathrm{Var}\{X_i(t)\}\,dt = \sum_{k=1}^{\infty} \lambda_k$
. Percentage variance explained (PVE) by the first $K$ fPCs:

$$\frac{\sum_{k=1}^{K} \lambda_k}{\sum_{l=1}^{\infty} \lambda_l}$$
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 28 / 71
Methodology
fPCA for sample of IID true curves
Recall: let $\{X_i(t) : t \in \mathcal{T}\}_i$ be IID curves in $L^2[\mathcal{T}]$.

1. Sample mean $\bar{X}(t)$
2. Sample covariance function: $\widehat{C}_X(s, t) = (n - 1)^{-1}\sum_{i=1}^{n}\{X_i(s) - \bar{X}(s)\}\{X_i(t) - \bar{X}(t)\}$
3. Spectral decomposition (or eigenanalysis) of $\widehat{C}_X$ gives eigenfunction/eigenvalue pairs $\{\widehat{\phi}_k(\cdot), \widehat{\lambda}_k\}$
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 29 / 71
Methodology
fPCA for sample of IID true curves
4. Select a finite truncation $K$ using PVE or information criteria (AIC, BIC, etc.)
5. Estimate the fPC scores: $\widehat{\xi}_{ik} = \int_{\mathcal{T}}\{X_i(t) - \bar{X}(t)\}\,\widehat{\phi}_k(t)\,dt$
6. KL expansion: $\widehat{X}_i(t) = \bar{X}(t) + \sum_{k=1}^{K} \widehat{\xi}_{ik}\,\widehat{\phi}_k(t)$
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 30 / 71
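Steps 1-6 above can be sketched end-to-end in numpy for curves observed on a common grid. The simulated data (two true modes of variation, a 99% PVE cutoff) are illustrative; the grid eigendecomposition approximates the eigenanalysis of $\widehat{C}_X$, with eigenvalues rescaled by the grid spacing.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 100)
dt = t[1] - t[0]
n = 400

# simulate curves with a smooth mean and two dominant modes of variation
mu = t ** 2
phi1 = np.sqrt(2) * np.sin(np.pi * t)
phi2 = np.sqrt(2) * np.sin(2 * np.pi * t)
X = mu + rng.normal(scale=2.0, size=(n, 1)) * phi1 \
       + rng.normal(scale=0.5, size=(n, 1)) * phi2

# steps 1-3: sample mean, sample covariance, eigenanalysis
Xbar = X.mean(axis=0)
C = (X - Xbar).T @ (X - Xbar) / (n - 1)          # C_X(s, t) on the grid
evals, evecs = np.linalg.eigh(C)
evals = evals[::-1] * dt                         # sort descending, rescale
evecs = evecs[:, ::-1] / np.sqrt(dt)             # unit L2 norm w.r.t. dt

# step 4: truncation K by percentage of variance explained
pve = np.cumsum(evals) / np.sum(evals)
K = int(np.argmax(pve >= 0.99) + 1)

# steps 5-6: scores by projection, then the truncated KL reconstruction
scores = (X - Xbar) @ evecs[:, :K] * dt
X_hat = Xbar + scores @ evecs[:, :K].T
```

Since the simulated curves have exactly two modes, the PVE rule selects $K = 2$ and the truncated KL expansion reconstructs the sample essentially exactly.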
Methodology
Theoretical properties fPCA
Consistency results for the sample mean, sample covariance function, and the corresponding eigenvalues/eigenfunctions:
. $E\bar{X} = \mu$ and $E\|\bar{X} - \mu\|^2 = O(n^{-1})$.
. The sample covariance is unbiased and is a mean squared consistent estimator of the covariance function.
Assume $\lambda_1 > \lambda_2 > \ldots > \lambda_K > \lambda_{K+1} \geq 0$. Let $\{\widehat{\lambda}_k, \widehat{\phi}_k\}_k$ be the estimated eigenvalues/eigenfunctions and $c_k = \mathrm{sign}\int \phi_k(t)\,\widehat{\phi}_k(t)\,dt$. Then

$$\limsup_{n\to\infty}\, nE[\|c_k\widehat{\phi}_k - \phi_k\|^2] < \infty, \qquad \limsup_{n\to\infty}\, nE[|\widehat{\lambda}_k - \lambda_k|^2] < \infty$$
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 31 / 71
fPCA for dense sampling design plus noise

Observed data: {Yij, tij : j = 1, …, mi}i; mi is large for all i
Model assumption: Yij = Xi(tij) + εij; Xi ∼ IID(µ, Σ), εij ∼ WN(0, σ2)

Common approach: first smooth each curve, then do fPCA
. For each i, smooth the data {(Yij, tij) : j = 1, …, mi} → X̂i(·)
. µ̂: sample mean of the X̂i's
. Σ̂: sample covariance of the X̂i's
. Eigenanalysis of Σ̂ gives {λ̂k, φ̂k}k
. Use PVE or other criteria to select the truncation K
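A minimal sketch of this smooth-first recipe, assuming dense noisy observations on a common grid. The least-squares Fourier fit below stands in for whichever smoother one prefers (splines, local polynomials); the simulated model and all names are illustrative, not from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 201)
n, sigma = 100, 0.5
# True curves: mu(t) = 1, one component with Var(score) = 2
X_true = 1 + rng.normal(scale=np.sqrt(2.0), size=(n, 1)) \
             * np.sqrt(2) * np.sin(2 * np.pi * t)
Y = X_true + rng.normal(scale=sigma, size=(n, len(t)))   # dense + noise

# Design matrix for a small Fourier basis (constant + 3 harmonics)
B = np.column_stack([np.ones_like(t)] +
                    [f(2 * np.pi * k * t) for k in (1, 2, 3)
                                          for f in (np.sin, np.cos)])
coef, *_ = np.linalg.lstsq(B, Y.T, rcond=None)
X_smooth = (B @ coef).T                                  # one smooth curve per i

# fPCA ingredients from the smoothed curves
mu_hat = X_smooth.mean(axis=0)
C_hat = np.cov(X_smooth, rowvar=False)
lam1 = np.linalg.eigvalsh(C_hat)[-1] * (t[1] - t[0])     # leading L2 eigenvalue
```

Projecting each noisy curve onto a low-dimensional basis removes most of the white noise before the covariance is formed, which is the point of the smooth-then-fPCA approach.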
Trajectory reconstruction

. Estimate the fPC scores by a Riemann sum:

ξ̂ik = ∑_{j=1}^{mi} {Yij − µ̂(tij)} φ̂k(tij) (tij − ti,j−1)

(accounting for possibly unequal grids of points)

. Finite-dimensional approximation:

X̂i^K(t) = µ̂(t) + ∑_{k=1}^K φ̂k(t) ξ̂ik
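The weights (tij − ti,j−1) are what make the score sum work on unequal grids. A tiny numerical check, with an illustrative integrand and a random grid of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0, 1, 2000))           # irregular grid on [0, 1]
w = np.diff(t, prepend=0.0)                    # weights t_j - t_{j-1}

f = np.sin(2 * np.pi * t)                      # stands in for Y_i(t) - mu(t)
phi = np.sqrt(2) * np.sin(2 * np.pi * t)       # an L2-orthonormal basis function
score_riemann = np.sum(f * phi * w)
score_exact = np.sqrt(2) / 2                   # exact value of the integral
```

With equal spacing the weights reduce to the usual constant grid step, so this is a strict generalization of the equally-spaced case.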
Illustration: DTI
Top: FA along the corpus callosum for MS subjects (left); covariance estimate (middle); scree plot (right). Bottom: the two leading fPCs and their effect relative to the population mean.

[Figure: six panels. Top row: 'Diffusion Tensor Imaging: CCA' (fractional anisotropy (FA) vs tract), 'Smooth covariance of FA', and the scree plot (PVE vs number of PCs). Bottom row: fPC1 and its effect (68% PVE), fPC2 and its effect (10% PVE), each plotted against tract.]
3.4. fPCA for sparse sampling design plus noise

Observed data: {Yij, tij : j = 1, …, mi}i; mi is small for all i
Model assumption: Yij = Xi(tij) + εij; Xi ∼ IID(µ, Σ), εij ∼ WN(0, σ2)

Challenges:
. The number of measurements per curve is very small;
. Reconstruction of the curves is very important; however, smoothing each curve individually is not realistic!

Common approach: pool the subjects' data to do fPCA
Reasoning behind the estimation procedure

Recall the model: Yij = Xi(tij) + εij; Xi ∼ IID(µ, Σ), εij ∼ WN(0, σ2)

. Mean estimator µ̂(t): obtained by smoothing the pooled data {(tij, Yij) : i, j}
. To estimate the covariance, note the following:

Cov(Yij, Yij′) = Σ(tij, tij′) if j ≠ j′
Var(Yij) = Σ(tij, tij) + σ2

Treat the diagonal (tij, tij) and off-diagonal (tij, tij′) terms differently
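The two moment identities above can be checked by simulation. The sketch below uses an illustrative one-component process with known Σ and σ2, evaluated at two fixed times; all numerical values are our own choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma2 = 200000, 0.25
t1, t2 = 0.2, 0.7                                # two fixed observation times

# X_i(t) = xi_i * sqrt(2) sin(2 pi t) with Var(xi) = 2, so Sigma is known
phi = lambda t: np.sqrt(2) * np.sin(2 * np.pi * t)
Sigma = lambda s, t: 2.0 * phi(s) * phi(t)
xi = rng.normal(scale=np.sqrt(2.0), size=n)
Y1 = xi * phi(t1) + rng.normal(scale=np.sqrt(sigma2), size=n)
Y2 = xi * phi(t2) + rng.normal(scale=np.sqrt(sigma2), size=n)

# Raw products of centered observations (mu = 0 here):
G_offdiag = np.mean(Y1 * Y2)   # j != j': estimates Sigma(t1, t2)
G_diag = np.mean(Y1 ** 2)      # j == j': estimates Sigma(t1, t1) + sigma^2
```

This is exactly why sparse fPCA smooths the off-diagonal raw covariances to get Σ and treats the diagonal separately: the diagonal is inflated by the noise variance σ2.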
Sparse fPCA

. 'Off-diagonal' smoothing: define the "raw covariances"

Gijj′ = {Yij − µ̂(tij)}{Yij′ − µ̂(tij′)}

- Two-dimensional smoothing of {(Gijj′, tij, tij′) : j ≠ j′}i under the working model Gijj′ = Σ(tij, tij′) + eijj′, eijj′ ∼ IID, where Σ(·, ·) is a symmetric, positive-definite function.
- Model Σ using bivariate basis functions or a tensor product of two univariate bases, with different penalizations (Wood, 2005).

. Adjust the estimate to be symmetric; zero out negative eigenvalues!
Sparse fPCA

. 'Diagonal' smoothing: define the raw variances

Gij = {Yij − µ̂(tij)}^2

- One-dimensional smoothing of {(Gij, tij) : i, j} → σ̂Y^2(t)

. Estimate σ2 by σ̂2 = ∫_T {σ̂Y^2(t) − Σ̂(t, t)} dt
Sparse fPCA (cont'd)

. Eigenanalysis of Σ̂(t, s) gives the eigenvalues/eigenfunctions {λ̂k, φ̂k(t)}k
. Choose K using PVE (e.g. 95%) or other approaches
. Estimate the true signal Xi(·) by the truncated KL expansion:

X̂i^K(t) = µ̂(t) + ∑_{k=1}^K ξ̂ik φ̂k(t)

. How to estimate the fPC scores ξik? The Riemann sum ∑_{j=1}^{mi} {Yij − µ̂(tij)} φ̂k(tij)(tij − ti,j−1) is no longer feasible!
Instead, estimate the fPC scores using conditional expectation!
Prediction of fPC scores

. Consider the reduced-rank model (a mixed effects model)

Yij = µ(tij) + ∑_{k=1}^K φk(tij) ξik + εij, j = 1, 2, …, mi

where ξik ∼ IID(0, λk), uncorrelated over k, and εij ∼ IID(0, σ2).
Assume for now that µ(·), φk(·), λk and σ2 are known.

. Predict ξik by ξ̃ik = E[ξik | Yi1, …, Yimi]
. Assume the ξik's and Yij's are jointly Gaussian. Then ξ̃ik is the best linear unbiased predictor (BLUP) of ξik. In matrix notation (the subscript i allows subject-specific tij's):

ξ̃ik = λk Φik^T (Σi + σ2 Imi)^{−1} (Yi − µi)
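A small Python/NumPy sketch of the BLUP formula for a single subject with K = 1 component; the visit times, eigenvalue, and noise variance are illustrative choices. For K = 1 the matrix formula collapses, via the Woodbury identity, to a scalar expression, which the sketch uses as a cross-check.

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.array([0.1, 0.35, 0.6, 0.9])            # sparse visit times, m_i = 4
lam, sigma2 = 2.0, 0.25
phi = np.sqrt(2) * np.sin(2 * np.pi * t)       # eigenfunction at the visit times

xi_true = rng.normal(scale=np.sqrt(lam))
Y = xi_true * phi + rng.normal(scale=np.sqrt(sigma2), size=t.size)  # mu = 0 here

# BLUP: xi = lam * Phi^T (Sigma_i + sigma^2 I)^{-1} (Y_i - mu_i)
Sigma_i = lam * np.outer(phi, phi)             # Cov{X_i(t_j), X_i(t_j')}
xi_blup = lam * phi @ np.linalg.solve(Sigma_i + sigma2 * np.eye(t.size), Y)

# Woodbury identity for K = 1 gives the equivalent scalar form
xi_scalar = lam * (phi @ Y) / (sigma2 + lam * (phi @ phi))
```

The scalar form makes the shrinkage visible: the larger σ2 is relative to λ‖Φ‖2, the more the predicted score is pulled toward its prior mean of zero.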
Prediction of fPC scores in practice

. Use the approximated reduced-rank model as a working model

Yij = µ̂(tij) + ∑_{k=1}^K φ̂k(tij) ξik + εij

where ξik ∼ IID(0, λ̂k) and εij ∼ IID(0, σ̂2)

. Predict ξik by the empirical BLUP:

ξ̂ik = λ̂k Φ̂ik^T (Σ̂i + σ̂2 Imi)^{−1} (Yi − µ̂i)
Theoretical properties of sparse fPCA

Under suitable regularity conditions:

. Let µ̂(t) be the local linear mean estimator of µ(t). Then
sup_t |µ̂(t) − µ(t)| = Op(n^{−3/10})

. Let Σ̂(s, t) be the local linear estimator of Σ(s, t). Then
sup_{t,s} |Σ̂(s, t) − Σ(s, t)| = Op(n^{−1/10})

Assume λ1 > λ2 > … > λK > λK+1 ≥ 0.
Let {λ̂k, φ̂k}k be the eigenvalues/eigenfunctions of Σ̂(·, ·) and ck = sign ∫ φ̂k(t) φk(t) dt. Then for all k = 1, …, K

‖ck φ̂k − φk‖2 = Op(n^{−1/10}),  |λ̂k − λk| = Op(n^{−1/10})
Illustration: CD4
Top: estimated mean; estimated covariance; scree plot. Bottom: the top fPC and the effect of changes along this direction.

[Figure: four panels. Top: 'CD4 cell counts' (number of CD4 cells vs months since seroconversion), the estimated covariance surface, and the scree plot (PVE vs number of PCs). Bottom: fPC1 (79.5% PVE) and its effect on the mean, plotted against month.]
Software implementation

. R
  - For estimation:
    face: fast covariance estimation for sparse functional data
    fda: functional data analysis in R
    fpca: functional principal component analysis
    mgcv: generalized additive (mixed) models; semi-parametric smoothing
    refund: regression with functional data
  - For visualization:
    fields: 2d image plots
    lattice: various plots
    refund.shiny: interactive plots for functional data analyses
    rgl: 3d plots

. MATLAB
  - PACE
Beyond IID functional data

Although the IID case is quite common, there are other situations:

. Multilevel functional data:
  - {Yij(s) : s ∈ S, i = 1, …, n, j = 1, …, mi}
  - E.g.: crypt-level biomarker data in colon-carcinogenesis studies

. Longitudinal functional data:
  - [{Yij(s), Tij} : s ∈ S, i = 1, …, n, j = 1, …, mi]
  - E.g.: DTI data (multiple clinical visits)
DTI data (revisited)

Each subject is observed at several hospital visits, and at each visit an FA profile is measured. Below: FA for two subjects.
Of interest: what is the dynamic behavior over time?

[Figure: FA along the CCA at various times after baseline (time = 0). Subject A: 4 visits at times 0, 192, 365, 834. Subject B: 3 visits at times 0, 112, 239.]
Longitudinal functional data analysis (LFDA)

Data structure:

[{Yij(·), Tij} : i = 1, …, n, j = 1, …, mi]

. Yij(·) is the jth response of the ith subject, observed on a fine grid
. Tij is the time corresponding to Yij(·); the Tij are observed in a longitudinal design

Objectives:
. Dynamic behavior of the process over time
. Predict the full curve using a subject's past history
Related literature

. Morris & Carroll (JRSSB 06); Baladandayuthapani et al. (Bcs 08); Di et al. (AoAS 09); Aston et al. (JRSSC 10); Staicu et al. (Biost 10); Scheipl et al. (JCGS 14); Morris (AnRev 15)
. Greven et al. (EJS 10); Chen & Müller (JASA 12); …

Limitation: no methodology that captures the process dynamics over time and provides prediction of the full trajectory at a future time
Park & Staicu (Stat 2015)

Proposed model:

Yij(s) = µ(s, Tij) + ∑_{k=1}^∞ ξik(Tij) φk(s) + εij(s)

. µ(·, Tij): mean response profile (FA along the CCA) at time Tij
. φk(·): orthogonal functions in L2(S)
. ξik(T): zero-mean, time-varying random coefficients; they quantify the process dynamics
. εij(·): zero-mean IID residual process
Flexibility of the proposed framework

PAR: cov{ξik(T), ξik(T′)} = λk ρk(|T − T′|), λk > 0;
relates to Gromenko et al. (AnnAS 12) and Gromenko & Kokoszka (CSDA 13) for n = 1

REM: ξik(T) = ζ0,ik + ζ1,ik T;
relates to Greven et al. (EJS 10) for φk^{X1}(s) = φk^{X2}(s) := φk(s)

NP: ξik(T) = ∑_{ℓ≥1} ζikℓ ψkℓ(T);
relates to Chen & Müller (JASA 12) for φk(s|Tij) := φk(s)
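What the REM structure implies for the score covariance can be verified in a few lines: with ξ(T) = ζ0 + ζ1 T and independent zero-mean ζ's, cov{ξ(T), ξ(T′)} = Var(ζ0) + T T′ Var(ζ1). The variances and time points below are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)
v0, v1 = 1.0, 0.5                                  # Var(zeta0), Var(zeta1)
T, Tp = 0.3, 0.8                                   # two time points

z0 = rng.normal(scale=np.sqrt(v0), size=500000)
z1 = rng.normal(scale=np.sqrt(v1), size=500000)
# Both factors are zero-mean, so the plain mean of products estimates the cov
cov_mc = np.mean((z0 + z1 * T) * (z0 + z1 * Tp))
cov_closed = v0 + T * Tp * v1
```

So the REM case induces a simple bilinear covariance in (T, T′), in contrast to the stationary PAR structure, which depends only on |T − T′|.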
Selecting the time-invariant basis functions

Recall the model: Yij(s) = µ(s, Tij) + ∑_{k=1}^∞ ξik(Tij) φk(s) + εij(s)

. Pre-specified basis functions (e.g. Fourier, B-splines, wavelets)
. Data-driven basis functions, e.g. the eigenfunctions of some covariance function. Which covariance function to use?
Time-invariant basis functions φk(s) (cont'd)

Recall the model: Yij(s) = µ(s, Tij) + ∑_{k=1}^∞ ξik(Tij) φk(s) + εij(s)

. Let Wi(s, T) = ∑_{k=1}^∞ ξik(T) φk(s) and denote its covariance by

cov{Wi(s, T), Wi(s′, T′)} = c{(s, T), (s′, T′)}

. Define Σ(s, s′) := ∫ c{(s, T), (s′, T)} g(T) dT, with g(T) the sampling density of T.

. Σ(s, s′) is a proper covariance function; call Σ the marginal covariance induced by Wi.
Similar concepts: Chen et al. (JRSSB 16+), Aston et al. (arXiv 15)
Time-invariant basis functions φk(s) (cont'd)

. Choose the {φk(·)}k as the eigenfunctions of Σ(s, s′)
. Proxy time-varying basis coefficients:

ξW,ijk = ∫ {Yij(s) − µ(s, Tij)} φk(s) ds

- For each k, the pairs {(ξW,ijk, Tij) : j = 1, …, mi}i describe the process dynamics.
Estimation roadmap

Step 1: Mean function µ(s, T)
Step 2: Marginal covariance function Σ(s, s′) and the orthogonal basis φk(s)'s
Step 3: Time-varying coefficients ξik(T) for every T
Step 4: Full trajectories Yi(·, T) for every T
Estimation

Case: fully observed curves with L2(S) error

Step 1: Mean function
. Estimate µ(s, T) using bivariate (tensor-product) spline smoothing (Wood, Bcs 2006) with a working independence assumption ⇒ µ̂(s, T)
Estimation (cont'd)

Step 2: Marginal covariance function
. Demean the data: Ỹij(s) := Yij(s) − µ̂(s, Tij)
. Estimate Σ(s, s′) by the sample covariance of the demeaned data:

Σ̂(s, s′) = (∑_{i=1}^n mi)^{−1} ∑_{i=1}^n ∑_{j=1}^{mi} Ỹij(s) Ỹij(s′)

. Spectral decomposition of Σ̂(s, s′) gives {λ̂k, φ̂k(s)}k
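Step 2 can be sketched directly: pool the demeaned curves over all (i, j), average the outer products, and eigendecompose. The REM-type generating model below is an illustrative stand-in for real data; all names and values are our own.

```python
import numpy as np

rng = np.random.default_rng(5)
s = np.linspace(0, 1, 101)
ds = s[1] - s[0]
n, m = 150, 4                                    # n subjects, m visits each
phi1 = np.sqrt(2) * np.sin(2 * np.pi * s)       # the true (single) basis fn

curves = []
for i in range(n):
    z0, z1 = rng.normal(scale=[1.0, 0.5])        # subject random effects
    for T in rng.uniform(0, 1, m):               # visit times T_ij
        curves.append((z0 + z1 * T) * phi1)      # xi_i(T) = z0 + z1*T (REM)
Ytilde = np.array(curves)                        # all demeaned curves, pooled

# Pooled marginal covariance and its eigenanalysis
Sigma_marg = Ytilde.T @ Ytilde / len(Ytilde)
evals, evecs = np.linalg.eigh(Sigma_marg)
lam1, phi1_hat = evals[-1] * ds, evecs[:, -1] / np.sqrt(ds)
align = abs(np.sum(phi1_hat * phi1) * ds)        # |<phi1_hat, phi1>| in L2
```

The leading eigenvalue of the marginal covariance estimates the average score variance E[ξ(T)^2] over the sampling distribution of T, here 1 + 0.25/3 for Var(ζ0) = 1, Var(ζ1) = 0.25 and T uniform on [0, 1].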
Estimation (cont'd)

Step 3: Time-varying coefficients ξik(T) for every T
. For each k, estimate the proxy ξW,ijk = ξik(Tij) + eijk by

ξ̂W,ijk = ∫ Ỹij(s) φ̂k(s) ds (numerical integration)

. Use standard longitudinal/sparse functional data methods to analyse {(ξ̂W,ijk, Tij) : j = 1, …, mi}i and estimate the temporal covariance
. Predict ξik(T) by ξ̂ik(T) for every T using this estimated covariance
Estimation (cont'd)

Step 4: Predict the full trajectory Yi(·, T) for every T:

Ŷi(·, T) = µ̂(·, T) + ∑_{k=1}^K ξ̂ik(T) φ̂k(·)
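Step 4 is pure assembly. A minimal sketch, in which the 'estimated' mean surface, eigenfunction, and fitted REM line are hypothetical placeholders for the output of Steps 1-3:

```python
import numpy as np

s = np.linspace(0, 1, 101)
T_new = 0.75                                     # a new visit time

mu_hat = lambda s, T: 0.5 + 0.1 * T + 0.0 * s    # placeholder mean surface
phi_hat = np.sqrt(2) * np.sin(2 * np.pi * s)     # K = 1 placeholder eigenfn
xi_hat = lambda T: 0.8 - 0.3 * T                  # placeholder fitted REM line

# Assemble the predicted curve at time T_new
Y_pred = mu_hat(s, T_new) + xi_hat(T_new) * phi_hat
```

Because the basis is time-invariant, only the scalar coefficients ξ̂ik(T) need to be evaluated at the new time; the functions of s are reused as-is.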
Theoretical properties

Lemma 1 (marginal covariance): Under regularity assumptions, ‖Σ̂(·, ·) − Σ(·, ·)‖ →p 0.
. Consistency results for λ̂k and φ̂k(·).

Lemma 2 (proxy time-varying basis coefficients): Under regularity assumptions, supj |ξ̂W,ijk − ξW,ijk| →p 0.
. Consistency of the predicted trajectories Ŷi(s, T).
Implementation in R

Software implementation in R (Wrobel, Park, Staicu, and Goldsmith, Stat 2016):
. refund: fpca.lfda
. refund.shiny: plot_shiny for visualization
Simulation experiment

Generating model:

Yij(s) = µ(s, Tij) + ξi1(Tij) φ1(s) + ξi2(Tij) φ2(s) + εij(s)

. Covariance structures for the time-varying coefficients ξik(T): (i) NP; (ii) REM; (iii) Exp
. Error structure εij(s): smooth + white noise (SNR = 1)
. Grid points for s: 101 equally spaced points in [0, 1]
. For each i, {Tij : j = 1, 2, …, mi} are randomly sampled from 41 equally spaced points in [0, 1]
Prediction performance and computational efficiency comparison:

. Proposed method
. Naïve: average the subject's previous curves to predict the current trajectory
. Chen & Müller (JASA 12): using time-dependent orthogonal functions φk(s|T)
Results

mi ∼ {8, …, 12}

          n     IN-IPE   IN-IPE (naïve)   OUT-IPE   OUT-IPE (naïve)
NP      100     0.406    7.790            0.988     11.478
        300     0.313    7.773            0.559     11.349
        500     0.288    7.779            0.455     11.262
REM     100     0.328    1.199            1.011     2.160
        300     0.265    1.197            0.675     2.160
        500     0.247    1.197            0.571     2.150
Exp     100     0.554    1.528            1.426     2.520
        300     0.508    1.531            1.143     2.498
        500     0.494    1.530            1.074     2.492
Results (cont'd)

. Computationally faster compared to available approaches

mi ∼ {8, …, 12}

                Chen & Müller (JASA 12)            Proposed method
          n     IN-IPE   OUT-IPE   time (sec)      IN-IPE   OUT-IPE   time (sec)
NP      100     0.880    2.221     983.872         0.406    0.988     7.369
        300     0.622    1.468     1659.611        0.313    0.559     15.892
        500     0.556    1.298     2502.462        0.288    0.455     21.418
REM     100     0.424    1.359     1084.753        0.328    1.011     9.282
        300     0.289    0.729     1955.193        0.265    0.675     11.347
        500     0.257    0.614     2947.126        0.247    0.571     22.559
Exp     100     0.634    1.642     1556.182        0.554    1.426     7.514
        300     0.549    1.251     1959.219        0.508    1.143     16.229
        500     0.531    1.155     2865.041        0.494    1.074     17.109
DTI data analysis

. FA along the CCA for MS patients
. 162 MS patients, each observed at between 1 and 8 hospital visits
. Tij: hospital visit time (mean = 2.6 visits/subject)
. Yij(·): FA profile (93 locations along the CCA) for subject i at time Tij
. 421 total curves

Objectives:
. Dynamic behavior of FA over time
. Predict the FA profile at a subject's future visit
DTI study (cont'd)

Model assumption (guided by exploratory analysis of the DTI data):
. Yij(s) = µ(s) + ∑_{k=1}^K ξik(Tij) φk(s) + εij(s)
. Time-varying coefficients: ξik(Tij) = b0ik + b1ik Tij (REM)

Estimation (longitudinal functional data analysis for the DTI data):
. φk(·): eigenfunctions of the estimated marginal covariance Σ̂(·, ·)
. Fixing the percentage of explained variance at 95% → K = 10
DTI study results (time dynamics)

Figure: Estimated basis functions φ̂k(s) for k = 1, 2 and 3
DTI study results (time dynamics)

Figure: Time-varying coefficients ξ̂ik(T) for k = 1, 2 and 3 (REM)
DTI study (cont'd)

Predicted values of FA for the last visits of three randomly selected subjects: actual (gray) / proposed (black) / naïve (dashed).

[Figure: FA vs tract for subject A (visit 6), subject B (visit 2), and subject C (visit 3), showing the actual observation, the naïve prediction, and our prediction.]

                  Naïve   Greven et al. (EJS 10)   Chen & Müller (JASA 12)   Proposed
IN-IPE ×10^2      -       2.66                     3.76                      2.31
OUT-IPE ×10^2     3.52    -                        8.71                      3.48
Final remarks

. Sample of independent curves
  - Smoothing using a pre-specified basis
  - fPCA
. Longitudinally observed curves
  - Process dynamics + future curve prediction
. Software implementation/visualization in R
Thank you!

Comments? Questions? [email protected]