tutorial on functional data analysis - samsi · a-m staicu tutorial on functional data analysis...

111
Methodology Tutorial on Functional Data Analysis Ana-Maria Staicu Department of Statistics, North Carolina State University [email protected] SAMSI April 5, 2017 A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 1 / 71

Upload: others

Post on 14-Mar-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Tutorial on Functional Data Analysis

Ana-Maria Staicu

Department of Statistics, North Carolina State [email protected]

SAMSI

April 5, 2017

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 1 / 71

Page 2: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Broad overview of course topics:

1. Introduction to Functional Data

2. Modeling Functional Data with Preset Basis Expansions

3. Modeling Functional Data using Functional PrincipalComponent Analysis

4. Beyond Independent and Identically Distributed FunctionalData

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 2 / 71

Page 3: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Example of functional data

Canadian weather data. Daily (and monthly) temperature andprecipitation at 35 different locations in Canada averaged over35 years from 1960 to 1994.

●●●●●●●

●●●●●●●

●●●●●

●●●●●●●●●●●●●●●●

●●

●●●●●●●●●

●●●●●●●●●●●●

●●●●●●●●●●

●●●●●●●●●●●●●●●

●●●●●●●

●●●●●●●●

●●●●●●●●●●●●●

●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●

●●●●●●●●

●●●●●●●●●

●●●●●●●●●●

●●●●●●●●

●●●●●●●●●●●●●●●●

●●●●●●●●

●●●●●●●●●

●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●

●●●●●●●●

●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●

●●●●●●●

●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●

●●●●●●●●●

●●●●●●●●●●●●●●●●

●●●●

●●

●●

0 100 200 300

−30

−20

−10

010

20

Ontario

days

tem

pera

ture

0 100 200 300

−30

−20

−10

010

20

days

tem

pera

ture

Avg Daily Temp in Ottawa Avg Daily Temp across Canada

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 3 / 71

Page 4: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Example of functional data (cont’d)

Marathon data. Running time performance of 149 femalesand 105 male athletes competing in the US Olympic TeamTrials Marathon 02/16/2016.

6

8

10

12

Miles

Spe

ed (

mile

s/hr

)

1 3 5 7 9 11 13 15 17 19 21 23 25

●●

● ●

●●

● ●●

● ●

●● ●

●● ●

●●

●●

●●

●● ●

●●

● ●●

● ● ● ●●

● ● ●

●●

●●

● ●

● ●

●● ● ●

● ●● ●

●●

● ● ●

●●

●● ●

●●

● ●●

●● ●

● ● ● ●● ●

● ●

●● ●

● ●● ● ●

● ●

● ●●

● ● ● ● ●

● ●●

● ●●

●● ●

●●

●● ●

6

8

10

12

Miles

Spe

ed (

mile

s/hr

)

1 3 5 7 9 11 13 15 17 19 21 23 25

●●

● ●

● ●●

● ● ●

● ● ●●

●● ● ● ● ●

●●

●●

●● ● ●

● ●● ●

●●

●●

● ● ●●

● ●●

● ● ● ●

● ●● ● ● ●

●●

●●

●●

●●

● ●

● ●

● ●●

● ●●

●●

● ●●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

Speed for females Speed for males

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 4 / 71

Page 5: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Example of functional data (cont’d)

CD4 data. CD4 cell count per mm of blood is a usefulsurrogate of the progression of HIV. Below are CD4 cellcounts for 366 subjects between months -18 and 42 monthssince seroconversion (diagnosis of HIV).

−20 −10 0 10 20 30 40

050

010

0015

0020

0025

0030

00

CD4 cell counts

months since seroconversion

num

ber

of c

d4 c

ells

●●

● ●

●●

●●

−20 −10 0 10 20 30 40

050

010

0015

0020

0025

0030

00

CD4 cell counts

months since seroconversion

num

ber

of c

d4 c

ells

●●

●●

●●

●●

●●

●●

●●

● ●

● ●

● ●

● ●

● ●

●●

● ●●

●●

●●

●●

● ●

● ●

●●

●●

●●

●●

● ●

● ● ●

●●

● ●

●●

●●

● ●

●●

●●

● ●

●●

● ●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

● ●

●●

●●

● ●

● ●

● ●

●●

●● ●

●●

● ●

●●

● ●●

● ●

● ●

● ●

● ●

●● ●

●●

●● ● ●

●●

● ●●

●●

● ●●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●● ●

●●

● ●

● ●

●●

●●

●●

●●

●● ●

●●●

● ●

● ●

● ●

● ● ●

●●

●●

●●

●●

● ●

● ●

● ●

●●

●●

●●

●●

●●

●●

● ●

● ●

● ●

●●

● ●

●● ●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●●

● ●

●●

● ●

● ●

●●

●●

●●

● ●

●●

● ●

● ●

●●

●●

● ●

●●

● ●

●●

● ● ●

●● ●

● ●●

●●

●●

● ● ●

● ●

●●

●●

●●

● ●

● ● ●

●●

●●

● ● ●

● ● ●

● ●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

●● ●

●●

●●

●● ●

● ● ●●

● ●

●●

● ●

●●

● ●

●●

●● ●

●●

●●

● ●●

●●

● ● ●

● ●

●●

● ●

●●

●● ●

●●

●●

● ●

●●

● ●

●●

● ●

●●

CD4 for few subjects CD4 counts for all the subjects

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 5 / 71

Page 6: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Example of functional data (cont’d)

Diffusion tensor imaging (DTI) data. Fractional anisotropy(FA) - measure of the tissue integrity that is useful indiagnosis/progression of multiple sclerosis (MS) - along themain direction of the corpus callosum (CCA) for many subj.

●●

●●

●●●●●

●●●

●●

●●●●

●●●

●●●●●●●●

●●

●●

●●●

●●●●●●●●

●●

●●●●

●●

●●

●●●

●●

●●

●●

●●

●●●

0 20 40 60 80

0.3

0.4

0.5

0.6

0.7

Diffusion Tensor Imaging : CCA

tract

Fra

ctio

nal a

niso

trop

y (F

A)

0 20 40 60 80

0.3

0.4

0.5

0.6

0.7

Diffusion Tensor Imaging : CCA

tract

Fra

ctio

nal a

niso

trop

y (F

A)

FA obs for one MS subj FA for all subj Brain CCA (red)

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 6 / 71

Page 7: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Example of functional data (cont’d)

Diffusion tensor imaging (DTI) data. Each subject is observedat several hospital visits and at each time a FA profile ismeasured. Below: FA for two subjects.Of interest: what is the dynamic behavior over time ?

0 20 40 60 80

0.40

0.50

0.60

Subject A (total of 4 visits)

tract

FA

visit times = 0, 192, 365, 834

0 20 40 60 80

0.30

0.40

0.50

0.60

Subject B (total of 3 visits)

tract

FA

visit times = 0, 112, 239

FA along CCA at various times after baseline (time=0)

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 7 / 71

Page 8: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Some characteristics of functional data:

. Often high dimensional

. Typically involves multiple measurements of the same“process”

. Interpretability across subject domains

. Parametric assumptions on the underlying “process” are oftennot made

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 8 / 71

Page 9: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

FDA vs multivariate data analysis

. Topology: well defined topology for FDA / one can permutethe elements in MDA

. Model covariance assn: smooth for FDA / unstructured orsparse for MDA

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 9 / 71

Page 10: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

FDA vs longitudinal data analysis

. Model covariance: no assn for FDA / parametric assn for LDA

. Mechanisms of missingness: not very important in FDA / veryimportant for LDA

. Interest more in subject-specific trajectories for FDA /inference for LDA

. Sampling design: typically high frequency for FDA / sparseand irregular for LDA

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 10 / 71

Page 11: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Some references

. Ramsay & Silverman, 2005, “Functional Data Analysis”

. Ramsay, Hooker & Graves, 2009, “Functional Data Analysis inR and Matlab”

. Horvath & Kokoszka, 2012, “Inference for Functional Datawith Applications”

. Bosq, 2002, “Linear Processes on Function Spaces”

. Ferraty & Vieux, 2006, “Nonparametric Functional DataAnalysis”

. Zhang, 2013, “Analysis of Variance for Functional Data”

. Hsing & Eubank, 2015, “Theoretical Foundations ofFunctional Data Analysis, with an Introduction to LinearOperators”

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 11 / 71

Page 12: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

From discrete to functional data. Intuition

The term functional in reference to observed data refers to theintrinsic structure of the data being functional; i.e. there is anunderlying function that gives rise to the observed data.

Advantages of representing the data as a smooth function:

. allows evaluation at any time point

. allows evaluation of rates of change of the underlying curve

. allows registration to a common time-scale

Main idea in FDA: treat the observed data functions as singleentities, rather than sequence of individual observations.

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71

Page 13: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

From discrete to functional data. Intuition

The term functional in reference to observed data refers to theintrinsic structure of the data being functional; i.e. there is anunderlying function that gives rise to the observed data.

Advantages of representing the data as a smooth function:

. allows evaluation at any time point

. allows evaluation of rates of change of the underlying curve

. allows registration to a common time-scale

Main idea in FDA: treat the observed data functions as singleentities, rather than sequence of individual observations.

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71

Page 14: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

From discrete to functional data. Intuition

The term functional in reference to observed data refers to theintrinsic structure of the data being functional; i.e. there is anunderlying function that gives rise to the observed data.

Advantages of representing the data as a smooth function:

. allows evaluation at any time point

. allows evaluation of rates of change of the underlying curve

. allows registration to a common time-scale

Main idea in FDA: treat the observed data functions as singleentities, rather than sequence of individual observations.

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71

Page 15: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Notation

Observed data: {(Yi1, ti1), . . . , (Yimi, timi

)}i ;

. Yij : the “snapshot” an underlying ith signal/curve -latent-Xi (·), at timepoint tij , possibly blurred by error

. Xi (·) smooth latent curve on T ; Xi (·)’s are independentrealizations of a stochastic process X (·)

. tij ∈ T ; tij may vary across i and j

. If no noise, then Yij = Xi (tij); otherwise Yij = Xi (tij) + εij .

εij - white noise; Typically εij are IID

Common objectives :

. Characterize the pattern of variability among curves

. Recover the subject specific trajectories Xi (·)’s

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 13 / 71

Page 16: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Notation

Observed data: {(Yi1, ti1), . . . , (Yimi, timi

)}i ;

. Yij : the “snapshot” an underlying ith signal/curve -latent-Xi (·), at timepoint tij , possibly blurred by error

. Xi (·) smooth latent curve on T ; Xi (·)’s are independentrealizations of a stochastic process X (·)

. tij ∈ T ; tij may vary across i and j

. If no noise, then Yij = Xi (tij); otherwise Yij = Xi (tij) + εij .

εij - white noise; Typically εij are IID

Common objectives :

. Characterize the pattern of variability among curves

. Recover the subject specific trajectories Xi (·)’s

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 13 / 71

Page 17: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Notation

Observed data: {(Yi1, ti1), . . . , (Yimi, timi

)}i ;

. Yij : the “snapshot” an underlying ith signal/curve -latent-Xi (·), at timepoint tij , possibly blurred by error

. Xi (·) smooth latent curve on T ; Xi (·)’s are independentrealizations of a stochastic process X (·)

. tij ∈ T ; tij may vary across i and j

. If no noise, then Yij = Xi (tij); otherwise Yij = Xi (tij) + εij .

εij - white noise; Typically εij are IID

Common objectives :

. Characterize the pattern of variability among curves

. Recover the subject specific trajectories Xi (·)’s

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 13 / 71

Page 18: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Notation

Observed data: {(Yi1, ti1), . . . , (Yimi, timi

)}i ;

. Yij : the “snapshot” an underlying ith signal/curve -latent-Xi (·), at timepoint tij , possibly blurred by error

. Xi (·) smooth latent curve on T ; Xi (·)’s are independentrealizations of a stochastic process X (·)

. tij ∈ T ; tij may vary across i and j

. If no noise, then Yij = Xi (tij); otherwise Yij = Xi (tij) + εij .

εij - white noise; Typically εij are IID

Common objectives :

. Characterize the pattern of variability among curves

. Recover the subject specific trajectories Xi (·)’s

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 13 / 71

Page 19: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Notation

Observed data: {(Yi1, ti1), . . . , (Yimi, timi

)}i ;

. Yij : the “snapshot” an underlying ith signal/curve -latent-Xi (·), at timepoint tij , possibly blurred by error

. Xi (·) smooth latent curve on T ; Xi (·)’s are independentrealizations of a stochastic process X (·)

. tij ∈ T ; tij may vary across i and j

. If no noise, then Yij = Xi (tij); otherwise Yij = Xi (tij) + εij .

εij - white noise; Typically εij are IID

Common objectives :

. Characterize the pattern of variability among curves

. Recover the subject specific trajectories Xi (·)’s

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 13 / 71

Page 20: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Smoothing

Why do we need smoothing?

. Data are often observed with error

. There’s a need to“interpolate” to a common grid

How are we going to do smoothing?

. Use prespecified basis functions (e.g. splines, wavelets, Fourieretc.)

. Use data-driven basis functions (e.g. functional principalcomponents)

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 14 / 71

Page 21: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Smoothing

Why do we need smoothing?

. Data are often observed with error

. There’s a need to“interpolate” to a common grid

How are we going to do smoothing?

. Use prespecified basis functions (e.g. splines, wavelets, Fourieretc.)

. Use data-driven basis functions (e.g. functional principalcomponents)

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 14 / 71

Page 22: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Prespecified basis expansion

Let {ψ1(·), ψ2(·), . . . , ψK (·)} be a user specified basis. Assume:

Yij =K∑

k=1

cikψk(tij) + εij .

. We only need to estimate the subject-specific scores cik ’s.

What kind of pre-specified basis to use? B-splines, wavelets etc

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 15 / 71

Page 23: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Some common basis functions: B-splines

0.0 0.2 0.4 0.6 0.8 1.0

−1.5

−0.5

0.5

1.5 Fourier basis

s

φ k(s)

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.4

0.8

B spline basis

s

φ k(s) . Continuous

. Easily defined derivatives

. Good for smooth data

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 16 / 71

Page 24: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Some common basis functions: Wavelets

0.0 0.2 0.4 0.6 0.8 1.0

-0.2

-0.1

0.0

0.1

0.2

0.3

x

wav

elet

func

tion

valu

e

. Formed from a single“mother wavelet” function:ψjk(t) = 2j/2ψ(2j t − k)

. Orthonormal basis

. Particularly good when thereare jumps, spikes, peaks,etc.

. Wavelet representation issparse

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 17 / 71

Page 25: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Example

0.0 0.2 0.4 0.6 0.8 1.0

0.35

0.40

0.45

0.50

0.55

0.60

Distance Along Tract

y

0.0 0.2 0.4 0.6 0.8 1.0

0.35

0.40

0.45

0.50

0.55

0.60

Distance Along Tracty

Few B-splines fns expansion Many B-spline fns expansion

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 18 / 71

Page 26: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Tuning

For any curve, many possible smooths are available

. Depends on the type of basis

. Depends on the number of basis functions

. Depends on the estimation procedure

“Tuning” is the process of adjusting the smoother to the data athand. This is often implicit (eg. kernel smoothing).

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 19 / 71

Page 27: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Prespecified basis expansion: estimation

Explicit penalization: use a large number of basis functions andpenalize the “roughness” of the fit.

Leads to a penalized SSE:

PenSSEi =

mi∑j=1

(Yij −

K∑k=1

ψk(tij)cik

)2

+ λPen

(K∑

k=1

ψk(·)cik

)

. Measure the roughness using derivatives of the fit, e.g.

Pen(∑K

k=1 ψk(·)cik)

=∫{∑K

k=1 ψ′′k (t)cik}2dt = cTi Dci ,

D is K × K matrix with (k, k ′) equal to∫ψ

′′k (t)ψ

′′k ′(t) ds.

. Choose cik ’s that minimize PenSSEi ; obtain Xi (·)

. Need to select tuning parameter λ. Common ways: CV andREML

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 20 / 71

Page 28: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Smoothing parameter λ

. When λ = 0 the criterion reduces to the SSE (emphasis on fit)

. When λ >> 0 (is very large) more emphasis on smoothness

Illustration : Vancouver mean temperature. Fits using 100 cubicsplines (with equally spaced knots) and various λ .

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●●●

●●

●●

●●●●

●●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●●

●●●

●●

●●●●●●

●●●●●

●●●●●

●●

●●

●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●●

●●

●●

0 100 200 300

02

46

8

lambda = 0.1

prec

ipita

tion

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●●●

●●

●●

●●●●

●●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●●

●●●

●●

●●●●●●

●●●●●

●●●●●

●●

●●

●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●●

●●

●●

0 100 200 300

02

46

8

lambda = 1000

prec

ipita

tion

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●●●

●●

●●

●●●●

●●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●●

●●●

●●

●●●●●●

●●●●●

●●●●●

●●

●●

●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●●

●●

●●

0 100 200 300

02

46

8

lambda = 1e+07

prec

ipita

tion

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●●●

●●

●●

●●●●

●●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●●

●●●

●●

●●●●●●

●●●●●

●●●●●

●●

●●

●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●●

●●

●●

0 100 200 300

02

46

8

lambda = 1e+11

prec

ipita

tion

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 21 / 71

Page 29: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Smoothing parameter λ

. When λ = 0 the criterion reduces to the SSE (emphasis on fit)

. When λ >> 0 (is very large) more emphasis on smoothness

Illustration : Vancouver mean temperature. Fits using 100 cubicsplines (with equally spaced knots) and various λ .

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●●●

●●

●●

●●●●

●●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●●

●●●

●●

●●●●●●

●●●●●

●●●●●

●●

●●

●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●●

●●

●●

0 100 200 300

02

46

8

lambda = 0.1

prec

ipita

tion

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●●●

●●

●●

●●●●

●●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●●

●●●

●●

●●●●●●

●●●●●

●●●●●

●●

●●

●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●●

●●

●●

0 100 200 300

02

46

8

lambda = 1000

prec

ipita

tion

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●●●

●●

●●

●●●●

●●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●●

●●●

●●

●●●●●●

●●●●●

●●●●●

●●

●●

●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●●

●●

●●

0 100 200 300

02

46

8

lambda = 1e+07

prec

ipita

tion

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●●●

●●

●●

●●●●

●●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●●

●●●

●●

●●●●●●

●●●●●

●●●●●

●●

●●

●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●●

●●

●●

0 100 200 300

02

46

8

lambda = 1e+11

prec

ipita

tion

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 21 / 71

Page 30: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Data-driven basis

. Previous bases don’t depend on the data; only the loadings do

. Functional principal component analysis (FPCA) gives a“data-driven” basis (constructed from the observed data)

. Similar representation of the observed data Y ′ijs:

Yij = µ(tij) +K∑

k=1

cikφk(sij) + εi`.

. Difference is that the φk(·)’s are not known.

. {φ1(·), φ2(·), . . .} describe the main directions of variability inthe observed data

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 22 / 71

Page 31: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

fPCA: main idea

Idea of fPCA: find projections of maximum variance.Let {Xi (t) : t ∈ T }i be IID zero-mean curves in L2[T ].

. The 1st fPC = φ1(t) for which

ξ1i =< φ1,Xi >=

∫Tφ1(t)Xi (t) dt

has maximum variance subject to the constraint ‖φ1‖2 = 1.

. The 2nd fPC = φ2(t) for which

ξ2i =< φ2,Xi >=

∫Tφ2(t)Xi (t) dt

has maximum variance, subject to < φ2, φ1 >= 0, ‖φ2‖2 = 1.

. . . .

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 23 / 71

Page 32: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

fPCA: main idea

Idea of fPCA: find projections of maximum variance.Let {Xi (t) : t ∈ T }i be IID zero-mean curves in L2[T ].

. The 1st fPC = φ1(t) for which

ξ1i =< φ1,Xi >=

∫Tφ1(t)Xi (t) dt

has maximum variance subject to the constraint ‖φ1‖2 = 1.

. The 2nd fPC = φ2(t) for which

ξ2i =< φ2,Xi >=

∫Tφ2(t)Xi (t) dt

has maximum variance, subject to < φ2, φ1 >= 0, ‖φ2‖2 = 1.

. . . .

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 23 / 71

Page 33: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

fPCA: main idea

Idea of fPCA: find projections of maximum variance.Let {Xi (t) : t ∈ T }i be IID zero-mean curves in L2[T ].

. The 1st fPC = φ1(t) for which

ξ1i =< φ1,Xi >=

∫Tφ1(t)Xi (t) dt

has maximum variance subject to the constraint ‖φ1‖2 = 1.

. The 2nd fPC = φ2(t) for which

ξ2i =< φ2,Xi >=

∫Tφ2(t)Xi (t) dt

has maximum variance, subject to < φ2, φ1 >= 0, ‖φ2‖2 = 1.

. . . .

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 23 / 71

Page 34: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Data-driven basis: Illustration

FPCA for a sample of 20 curves. Displayed in dashed black line are3 FPC: top middle panel, bottom left and right panels.

0.0 0.2 0.4 0.6 0.8 1.0

−3

−2

−1

01

23

20 functions

0.0 0.2 0.4 0.6 0.8 1.0

−3

−2

−1

01

23

comp 1, 85%

0.0 0.2 0.4 0.6 0.8 1.0

−3

−2

−1

01

23

residuals: remove comp 1

0.0 0.2 0.4 0.6 0.8 1.0

−3

−2

−1

01

23

comp 2, 14%

0.0 0.2 0.4 0.6 0.8 1.0

−3

−2

−1

01

23

residuals: remove comp 1,2

0.0 0.2 0.4 0.6 0.8 1.0

−3

−2

−1

01

23

comp 3, 1%

Model: Xi (t) = ξi1φ1(t) + ξi2φ2(t) + ξi3φ3(t)A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 24 / 71

Page 35: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

How to obtain the fPC ?

Notation:

. Σ(s, t) := E [{Xi (s)− µ(s)}{Xi (t)− µ(t)}] is the covar fn

. Σ(s, t) induces an integral operator Σ

Σf (t) =

∫T

Σ(t, s)f (s)ds, f (t) ∈ L2[T ]

Re-write the steps of the algorithm that defines fPCs:

. 1st fPC: φ1 = arg maxf < Σf , f >, ‖f ‖2 = 1

. 2nd fPC:φ2 = arg maxf < Σf , f >,< f , φ1 >= 0 and ‖f ‖2 = 1

. . . .

fPCs φk ’s are the eigenvectors of the cov operator Σ: Σφk = λkφk .

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 25 / 71

Page 36: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

How to obtain the fPC ?

Notation:

. Σ(s, t) := E [{Xi (s)− µ(s)}{Xi (t)− µ(t)}] is the covar fn

. Σ(s, t) induces an integral operator Σ

Σf (t) =

∫T

Σ(t, s)f (s)ds, f (t) ∈ L2[T ]

Re-write the steps of the algorithm that defines fPCs:

. 1st fPC: φ1 = arg maxf < Σf , f >, ‖f ‖2 = 1

. 2nd fPC:φ2 = arg maxf < Σf , f >,< f , φ1 >= 0 and ‖f ‖2 = 1

. . . .

fPCs φk ’s are the eigenvectors of the cov operator Σ: Σφk = λkφk .

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 25 / 71

Page 37: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Mercer’s thm

. Assume Σ(s, t) is continuous over T × T . Then there exists:an orthonormal basis {φk}k of continuous fns in L2[T ] andλ1 ≥ λ2 ≥ . . . > 0 such that

I Σφk = λkφk where Σ is the operator induced by the cov fn

I Σ(s, t) =∑∞

k=1 λkφk(s)φk(t), t, s ∈ T ,where the series converges uniformly on T 2

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 26 / 71

Page 38: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Karhunen-Loeve expansion

. {Xi (t) : t ∈ T }i be IID zero-mean curves in L2[T ] and {φk}k- eigenfns of the cov fn Σ(·, ·) .

Xi (t) =∞∑k=1

ξikφk(t) (series converges in L2 norm)

where

I ξik :=∫Xi (t)φk(t)dt are random variables

I E [ξik ] = 0, Var [ξik ] = λk and {ξik : k ≥ 1} are mutuallyuncorrelated

I ξik ’s are called fPC scores for Xi

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 27 / 71

Page 39: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Karhunen-Loeve expansion

. {Xi (t) : t ∈ T }i be IID zero-mean curves in L2[T ] and {φk}k- eigenfns of the cov fn Σ(·, ·) .

Xi (t) =∞∑k=1

ξikφk(t) (series converges in L2 norm)

where

I ξik :=∫Xi (t)φk(t)dt are random variables

I E [ξik ] = 0, Var [ξik ] = λk and {ξik : k ≥ 1} are mutuallyuncorrelated

I ξik ’s are called fPC scores for Xi

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 27 / 71

Page 40: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Implications KL expansion

. One-to-one mapping Xi (·)→ (ξi1, ξi2, . . .)T

. λk is the average variance of the kth fPC φk , i.e.:

λk = λk

∫φ2k(t)dt =

∫Var{ξikφk(t)}dt

.∫{VarXi (t)}dt =

∑∞k=1 λk

. Percentage variance explained (PVE) by the first K fPCs:∑Kk=1 λk∑∞l=1 λl

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 28 / 71

Page 41: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Implications KL expansion

. One-to-one mapping Xi (·)→ (ξi1, ξi2, . . .)T

. λk is the average variance of the kth fPC φk , i.e.:

λk = λk

∫φ2k(t)dt =

∫Var{ξikφk(t)}dt

.∫{VarXi (t)}dt =

∑∞k=1 λk

. Percentage variance explained (PVE) by the first K fPCs:∑Kk=1 λk∑∞l=1 λl

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 28 / 71

Page 42: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Implications KL expansion

. One-to-one mapping Xi (·)→ (ξi1, ξi2, . . .)T

. λk is the average variance of the kth fPC φk , i.e.:

λk = λk

∫φ2k(t)dt =

∫Var{ξikφk(t)}dt

.∫{VarXi (t)}dt =

∑∞k=1 λk

. Percentage variance explained (PVE) by the first K fPCs:∑Kk=1 λk∑∞l=1 λl

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 28 / 71

Page 43: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Implications KL expansion

. One-to-one mapping Xi (·)→ (ξi1, ξi2, . . .)T

. λk is the average variance of the kth fPC φk , i.e.:

λk = λk

∫φ2k(t)dt =

∫Var{ξikφk(t)}dt

.∫{VarXi (t)}dt =

∑∞k=1 λk

. Percentage variance explained (PVE) by the first K fPCs:∑Kk=1 λk∑∞l=1 λl

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 28 / 71

Page 44: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

fPCA for sample of IID true curves

Recall: {Xi (t) : t ∈ T }i be IID curves in L2[T ].

1. Sample mean X (t)

2. Sample covariance fn:CX (s, t) = (n − 1)−1

∑ni=1{Xi (s)− X (s)}{Xi (t)− X (t)}

3. Spectral decomposition (or eigenanalysis) of CX gives pairseigenfn/eigenval {φk(·), λk}

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 29 / 71

Page 45: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

fPCA for sample of IID true curves

4. Select finite truncation K using PVE or information criteria(AIC, BIC etc.)

5. Estimate fPC scores: ξik =∫T {Xi (t)− X (t)}φk(t)dt

6. KL expansion Xi (t) = X (t) +∑K

k=1 ξik φk(t)

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 30 / 71

Page 46: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

fPCA for sample of IID true curves

4. Select finite truncation K using PVE or information criteria(AIC, BIC etc.)

5. Estimate fPC scores: ξik =∫T {Xi (t)− X (t)}φk(t)dt

6. KL expansion Xi (t) = X (t) +∑K

k=1 ξik φk(t)

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 30 / 71

Page 47: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

fPCA for sample of IID true curves

4. Select finite truncation K using PVE or information criteria(AIC, BIC etc.)

5. Estimate fPC scores: ξik =∫T {Xi (t)− X (t)}φk(t)dt

6. KL expansion Xi (t) = X (t) +∑K

k=1 ξik φk(t)

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 30 / 71

Page 48: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Theoretical properties fPCA

Consistency results for the sample mean, sample covariancefunctions, and the corresponding eigenvalues/eigenfunctions.

. EX = µ and E‖X − µ‖2 = O(n−1).

. The sample covariance is unbiased and is mean squaredconsistent estimator of the covariance function

Assume λ1 > λ2 > . . . > λK > λK+1 ≥ 0

Let {λk , φk}k be the eigenvalues/eigenfunctions andck = sign

∫φk(t)φk(t)dt. Then

lim supn→∞

nE [‖ck φk−φk‖2] <∞ lim supn→∞

nE [|λk−λk |2] <∞

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 31 / 71

Page 49: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Theoretical properties fPCA

Consistency results for the sample mean, sample covariancefunctions, and the corresponding eigenvalues/eigenfunctions.

. EX = µ and E‖X − µ‖2 = O(n−1).

. The sample covariance is unbiased and is mean squaredconsistent estimator of the covariance function

Assume λ1 > λ2 > . . . > λK > λK+1 ≥ 0

Let {λk , φk}k be the eigenvalues/eigenfunctions andck = sign

∫φk(t)φk(t)dt. Then

lim supn→∞

nE [‖ck φk−φk‖2] <∞ lim supn→∞

nE [|λk−λk |2] <∞

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 31 / 71

Page 50: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Theoretical properties fPCA

Consistency results for the sample mean, sample covariancefunctions, and the corresponding eigenvalues/eigenfunctions.

. EX = µ and E‖X − µ‖2 = O(n−1).

. The sample covariance is unbiased and is mean squaredconsistent estimator of the covariance function

Assume λ1 > λ2 > . . . > λK > λK+1 ≥ 0

Let {λk , φk}k be the eigenvalues/eigenfunctions andck = sign

∫φk(t)φk(t)dt. Then

lim supn→∞

nE [‖ck φk−φk‖2] <∞ lim supn→∞

nE [|λk−λk |2] <∞

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 31 / 71

Page 51: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

fPCA for dense sampling design plus noise

Observed data: {Yij , tij : j = 1, . . . ,mi}i ; mi is large ∀i

Model assn: Yij = Xi (tij) + εij ; Xi ∼ IID (µ,Σ), εij ∼WN(0, σ2).

Common approach: first smooth each curve, then do fPCA

. For each i smooth data {(Yij , tij) : j = 1, . . . ,mi} → Xi (·)

. µ sample mean of Xi ’s

. Σ sample cov of Xi ’s

. Eigenanalysis of Σ to get {λk , φk}k

. Use PVE or other criteria to select truncation K

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 32 / 71

Page 52: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

fPCA for dense sampling design plus noise

Observed data: {Yij , tij : j = 1, . . . ,mi}i ; mi is large ∀i

Model assn: Yij = Xi (tij) + εij ; Xi ∼ IID (µ,Σ), εij ∼WN(0, σ2).

Common approach: first smooth each curve, then do fPCA

. For each i smooth data {(Yij , tij) : j = 1, . . . ,mi} → Xi (·)

. µ sample mean of Xi ’s

. Σ sample cov of Xi ’s

. Eigenanalysis of Σ to get {λk , φk}k

. Use PVE or other criteria to select truncation K

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 32 / 71

Page 53: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

fPCA for dense sampling design plus noise

Observed data: {Yij , tij : j = 1, . . . ,mi}i ; mi is large ∀i

Model assn: Yij = Xi (tij) + εij ; Xi ∼ IID (µ,Σ), εij ∼WN(0, σ2).

Common approach: first smooth each curve, then do fPCA

. For each i smooth data {(Yij , tij) : j = 1, . . . ,mi} → Xi (·)

. µ sample mean of Xi ’s

. Σ sample cov of Xi ’s

. Eigenanalysis of Σ to get {λk , φk}k

. Use PVE or other criteria to select truncation K

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 32 / 71

Page 54: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

fPCA for dense sampling design plus noise

Observed data: {Yij , tij : j = 1, . . . ,mi}i ; mi is large ∀i

Model assn: Yij = Xi (tij) + εij ; Xi ∼ IID (µ,Σ), εij ∼WN(0, σ2).

Common approach: first smooth each curve, then do fPCA

. For each i smooth data {(Yij , tij) : j = 1, . . . ,mi} → Xi (·)

. µ sample mean of Xi ’s

. Σ sample cov of Xi ’s

. Eigenanalysis of Σ to get {λk , φk}k

. Use PVE or other criteria to select truncation K

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 32 / 71

Page 55: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

fPCA for dense sampling design plus noise

Observed data: {Yij , tij : j = 1, . . . ,mi}i ; mi is large ∀i

Model assn: Yij = Xi (tij) + εij ; Xi ∼ IID (µ,Σ), εij ∼WN(0, σ2).

Common approach: first smooth each curve, then do fPCA

. For each i smooth data {(Yij , tij) : j = 1, . . . ,mi} → Xi (·)

. µ sample mean of Xi ’s

. Σ sample cov of Xi ’s

. Eigenanalysis of Σ to get {λk , φk}k

. Use PVE or other criteria to select truncation K

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 32 / 71

Page 56: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

fPCA for dense sampling design plus noise

Observed data: {Yij , tij : j = 1, . . . ,mi}i ; mi is large ∀i

Model assn: Yij = Xi (tij) + εij ; Xi ∼ IID (µ,Σ), εij ∼WN(0, σ2).

Common approach: first smooth each curve, then do fPCA

. For each i smooth data {(Yij , tij) : j = 1, . . . ,mi} → Xi (·)

. µ sample mean of Xi ’s

. Σ sample cov of Xi ’s

. Eigenanalysis of Σ to get {λk , φk}k

. Use PVE or other criteria to select truncation K

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 32 / 71

Page 57: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Trajectories reconstruction

. Estimate fPC scores

ξik =

mi∑j=1

{Yij − µ(tij)}φk(tij)(tij − tij−1)

(accounting for possibly unequal grids of points)

. Finite dimension approx:

XKi (t) = µ(t) +

K∑k=1

φk(t)ξik

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 33 / 71

Page 58: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Trajectories reconstruction

. Estimate fPC scores

ξik =

mi∑j=1

{Yij − µ(tij)}φk(tij)(tij − tij−1)

(accounting for possibly unequal grids of points)

. Finite dimension approx:

XKi (t) = µ(t) +

K∑k=1

φk(t)ξik

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 33 / 71

Page 59: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Illustration: DTITop: FA along corpus callosum for MS subjects (left); covarianceestimate (middle); scree plot (right). Bottom: The two leadingfPCs and their effect relative to the population mean

0 20 40 60 80

0.3

0.4

0.5

0.6

0.7

Diffusion Tensor Imaging : CCA

tract

Fra

ctio

nal a

niso

trop

y (F

A)

20 40 60 80

2040

6080

Smooth covariance of FA

tract

trac

t

0.002

0.004

0.006

0.008

●●

● ● ● ● ● ● ● ● ● ● ●

5 10 15 20

0.70

0.75

0.80

0.85

0.90

0.95

1.00

scree plot

number of PCsP

EV

0 20 40 60 80

−2

−1

01

2

fPC1

tract

−−−−−−−−−−−

−−−−−−−−−−−−−−−

−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−

−−−−−−−−−−−−−

−−−−−−−−−

0 20 40 60 80

0.4

0.5

0.6

0.7

fPC1 (68%)

tract

+++++++

+++++++++++++++++++

+++++

+++++++++++++++++

+++++++++++++++++++++++++++++

++++++++++++++++

0 20 40 60 80

−2

−1

01

2

fPC2

tract

−−−−−

−−−−−−−−−−−−−−−−−−−−

−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−

−−−−−−−−−

0 20 40 60 80

0.40

0.45

0.50

0.55

0.60

0.65

fPC2 (10%)

tract

+

+

+

+

+

+++++++

++

+

+

+

+

++++++++

+++++

+++++++++++++++

+++++++++

+++++++++++

+++++

+

+

+

+

+

+

+

+

+

++++

+++++++++

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 34 / 71

Page 60: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

3.4. fPCA for sparse sampling design plusnoise

Observed data: {Yij , tij : j = 1, . . . ,mi}i ; mi small ∀i

Model assn: Yij = Xi (tij) + εij ; Xi ∼ IID (µ,Σ), εij ∼WN(0, σ2).

Challenges:

. Number of measurements per curve is very small;

. Reconstruction of the curves is very important; howeversmoothing each curve individually is not realistic !

Common approach: pool subjects data to do fPCA

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 35 / 71

Page 61: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

3.4. fPCA for sparse sampling design plusnoise

Observed data: {Yij , tij : j = 1, . . . ,mi}i ; mi small ∀i

Model assn: Yij = Xi (tij) + εij ; Xi ∼ IID (µ,Σ), εij ∼WN(0, σ2).

Challenges:

. Number of measurements per curve is very small;

. Reconstruction of the curves is very important; howeversmoothing each curve individually is not realistic !

Common approach: pool subjects data to do fPCA

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 35 / 71

Page 62: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

3.4. fPCA for sparse sampling design plusnoise

Observed data: {Yij , tij : j = 1, . . . ,mi}i ; mi small ∀i

Model assn: Yij = Xi (tij) + εij ; Xi ∼ IID (µ,Σ), εij ∼WN(0, σ2).

Challenges:

. Number of measurements per curve is very small;

. Reconstruction of the curves is very important; howeversmoothing each curve individually is not realistic !

Common approach: pool subjects data to do fPCA

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 35 / 71

Page 63: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

3.4. fPCA for sparse sampling design plusnoise

Observed data: {Yij , tij : j = 1, . . . ,mi}i ; mi small ∀i

Model assn: Yij = Xi (tij) + εij ; Xi ∼ IID (µ,Σ), εij ∼WN(0, σ2).

Challenges:

. Number of measurements per curve is very small;

. Reconstruction of the curves is very important; howeversmoothing each curve individually is not realistic !

Common approach: pool subjects data to do fPCA

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 35 / 71

Page 64: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Reasoning behind estimation procedure

Recall model: Yij = Xi (tij) + εij Xi ∼ IID (µ,Σ), εij ∼WN(0, σ2)

. Mean estimator µ(t): by smoothing the data {(tij ,Yij) : i , j}

. To estimate the covariance, note the following:

Cov(Yij ,Yij ′) = Σ(tij , tij ′) if j 6= j ′

Var(Yij) = Σ(tij , tij) + σ2

Treat differently diagonal (tij , tij) & off-diagonal (tij , tij ′) terms

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 36 / 71

Page 65: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Reasoning behind estimation procedure

Recall model: Yij = Xi (tij) + εij Xi ∼ IID (µ,Σ), εij ∼WN(0, σ2)

. Mean estimator µ(t): by smoothing the data {(tij ,Yij) : i , j}

. To estimate the covariance, note the following:

Cov(Yij ,Yij ′) = Σ(tij , tij ′) if j 6= j ′

Var(Yij) = Σ(tij , tij) + σ2

Treat differently diagonal (tij , tij) & off-diagonal (tij , tij ′) terms

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 36 / 71

Page 66: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Reasoning behind estimation procedure

Recall model: Yij = Xi (tij) + εij Xi ∼ IID (µ,Σ), εij ∼WN(0, σ2)

. Mean estimator µ(t): by smoothing the data {(tij ,Yij) : i , j}

. To estimate the covariance, note the following:

Cov(Yij ,Yij ′) = Σ(tij , tij ′) if j 6= j ′

Var(Yij) = Σ(tij , tij) + σ2

Treat differently diagonal (tij , tij) & off-diagonal (tij , tij ′) terms

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 36 / 71

Page 67: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Sparse fPCA

. ‘Off-diagonal’ smoothing: Define “raw -covariances”

Gijj ′ = {Yij − µ(tij)}{Yij ′ − µ(tij ′)}

- two-dimensional smoothing of {Gijj ′ , tij , tij ′) : j 6= j ′}iworking model: Gijj ′ = Σ(tij , tij ′) + eijj ′ , eijj ′ ∼ IID,

where Σ(·, ·) symmetric + positive-definite fn

Model Σ using bivariate basis fns or tensor product of twouniv bases fns. Different penalization (Wood, 2005)

. Adjust estimate to be symmetric; zero out negative eigenvals !

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 37 / 71

Page 68: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Sparse fPCA

. ‘Off-diagonal’ smoothing: Define “raw -covariances”

Gijj ′ = {Yij − µ(tij)}{Yij ′ − µ(tij ′)}

- two-dimensional smoothing of {Gijj ′ , tij , tij ′) : j 6= j ′}iworking model: Gijj ′ = Σ(tij , tij ′) + eijj ′ , eijj ′ ∼ IID,

where Σ(·, ·) symmetric + positive-definite fn

Model Σ using bivariate basis fns or tensor product of twouniv bases fns. Different penalization (Wood, 2005)

. Adjust estimate to be symmetric; zero out negative eigenvals !

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 37 / 71

Page 69: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Sparse fPCA

. ‘Off-diagonal’ smoothing: Define “raw -covariances”

Gijj ′ = {Yij − µ(tij)}{Yij ′ − µ(tij ′)}

- two-dimensional smoothing of {Gijj ′ , tij , tij ′) : j 6= j ′}iworking model: Gijj ′ = Σ(tij , tij ′) + eijj ′ , eijj ′ ∼ IID,

where Σ(·, ·) symmetric + positive-definite fn

Model Σ using bivariate basis fns or tensor product of twouniv bases fns. Different penalization (Wood, 2005)

. Adjust estimate to be symmetric; zero out negative eigenvals !

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 37 / 71

Page 70: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Sparse fPCA

. ‘Diagonal’ smoothing: Define raw -variances

Gij = {Yij − µ(tij)}2

- one-dimensional smoothing of {(Gij , tij) : i , j} → σ2Y (t)

. Estimate σ2: σ2 =∫T {σ

2Y (t)− Σ(t, t)} dt

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 38 / 71

Page 71: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Sparse fPCA

. ‘Diagonal’ smoothing: Define raw -variances

Gij = {Yij − µ(tij)}2

- one-dimensional smoothing of {(Gij , tij) : i , j} → σ2Y (t)

. Estimate σ2: σ2 =∫T {σ

2Y (t)− Σ(t, t)} dt

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 38 / 71

Page 72: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Sparse fPCA (cont’d)

. Eigenanalysis of Σ(t, s) gives eigenvals/fns, {λk , φk(t)}k ,

. Choose K using PVE or other approaches (e.g. 95%)

. Estimate true signal Xi (·) by the truncated KL expansion:

XKi (t) = µ(t) +

K∑k=1

ξik φk(t)

. How to estimate the fPC ξik ?∑mij=1{Yij − µ(tij)}φk(tij)(tij − tij−1) is no longer feasible !?

Instead estimate fPC using conditional expectation !

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 39 / 71

Page 73: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Sparse fPCA (cont’d)

. Eigenanalysis of Σ(t, s) gives eigenvals/fns, {λk , φk(t)}k ,

. Choose K using PVE or other approaches (e.g. 95%)

. Estimate true signal Xi (·) by the truncated KL expansion:

XKi (t) = µ(t) +

K∑k=1

ξik φk(t)

. How to estimate the fPC ξik ?∑mij=1{Yij − µ(tij)}φk(tij)(tij − tij−1) is no longer feasible !?

Instead estimate fPC using conditional expectation !

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 39 / 71

Page 74: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Sparse fPCA (cont’d)

. Eigenanalysis of Σ(t, s) gives eigenvals/fns, {λk , φk(t)}k ,

. Choose K using PVE or other approaches (e.g. 95%)

. Estimate true signal Xi (·) by the truncated KL expansion:

XKi (t) = µ(t) +

K∑k=1

ξik φk(t)

. How to estimate the fPC ξik ?∑mij=1{Yij − µ(tij)}φk(tij)(tij − tij−1) is no longer feasible !?

Instead estimate fPC using conditional expectation !

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 39 / 71

Page 75: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Prediction of fPC

. Consider the reduced rank model (mixed effects model)

Yij = µ(tij) +K∑

k=1

φk(tij)ξik + εij j = 1, 2, . . . ,mi

ξik ∼i IID(0, λk) , uncorrelated over k , and εij ∼ IID(0, σ2)

Assume for now that µ(·), φk(·), λk and σ2 are known

. Prediction of ξik by ξik = E [ξik |Yi1, . . . ,Yimi]

. Assume ξik ’s and Yij ’s jointly Gaussian.

Then ξik is the best linear unbiased predictor (BLUP) of ξik .In matrix notation; subscript i → to allow for tij ’s per subject:

ξik = λkΦTik(Σi + σ2Imi )

−1(Yi − µi ).

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 40 / 71

Page 76: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Prediction of fPC

. Consider the reduced rank model (mixed effects model)

Yij = µ(tij) +K∑

k=1

φk(tij)ξik + εij j = 1, 2, . . . ,mi

ξik ∼i IID(0, λk) , uncorrelated over k , and εij ∼ IID(0, σ2)

Assume for now that µ(·), φk(·), λk and σ2 are known

. Prediction of ξik by ξik = E [ξik |Yi1, . . . ,Yimi]

. Assume ξik ’s and Yij ’s jointly Gaussian.

Then ξik is the best linear unbiased predictor (BLUP) of ξik .In matrix notation; subscript i → to allow for tij ’s per subject:

ξik = λkΦTik(Σi + σ2Imi )

−1(Yi − µi ).

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 40 / 71

Page 77: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Prediction of fPC

. Consider the reduced rank model (mixed effects model)

Yij = µ(tij) +K∑

k=1

φk(tij)ξik + εij j = 1, 2, . . . ,mi

ξik ∼i IID(0, λk) , uncorrelated over k , and εij ∼ IID(0, σ2)

Assume for now that µ(·), φk(·), λk and σ2 are known

. Prediction of ξik by ξik = E [ξik |Yi1, . . . ,Yimi]

. Assume ξik ’s and Yij ’s jointly Gaussian.

Then ξik is the best linear unbiased predictor (BLUP) of ξik .In matrix notation; subscript i → to allow for tij ’s per subject:

ξik = λkΦTik(Σi + σ2Imi )

−1(Yi − µi ).

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 40 / 71

Page 78: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Prediction of fPC

. Consider the reduced rank model (mixed effects model)

Yij = µ(tij) +K∑

k=1

φk(tij)ξik + εij j = 1, 2, . . . ,mi

ξik ∼i IID(0, λk) , uncorrelated over k , and εij ∼ IID(0, σ2)

Assume for now that µ(·), φk(·), λk and σ2 are known

. Prediction of ξik by ξik = E [ξik |Yi1, . . . ,Yimi]

. Assume ξik ’s and Yij ’s jointly Gaussian.

Then ξik is the best linear unbiased predictor (BLUP) of ξik .In matrix notation; subscript i → to allow for tij ’s per subject:

ξik = λkΦTik(Σi + σ2Imi )

−1(Yi − µi ).

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 40 / 71

Page 79: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Prediction of fPC in practice

. Use the approximated reduced rank model as working model

Yij = µ(tij) +K∑

k=1

φk(tij)ξik + εij

where ξik ∼i IID(0, λk) and εij ∼IID (0, σ2)

. Predict ξik by the empirical BLUP:

ξik = λkΦTik(Σi + σ2Imi )

−1(Yi − µi ).

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 41 / 71

Page 80: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Prediction of fPC in practice

. Use the approximated reduced rank model as working model

Yij = µ(tij) +K∑

k=1

φk(tij)ξik + εij

where ξik ∼i IID(0, λk) and εij ∼IID (0, σ2)

. Predict ξik by the empirical BLUP:

ξik = λkΦTik(Σi + σ2Imi )

−1(Yi − µi ).

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 41 / 71

Page 81: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Theoretical properties of sparse fPCAUnder suitable regularity conditions

. Let µ(t) be the local linear mean estimator of µ(t). Then

supt|µ(t)− µ(t)| = Op(n−3/10)

. Let Σ(s, t) be the local linear estimator of Σ(s, t). Then

supt,s|Σ(s, t)− Σ(s, t)| = Op(n−1/10)

Assume λ1 > λ2 > . . . > λK > λK+1 ≥ 0

Let {λk , φk}k be the eigenvalues/eigenfunctions of Σ(·, ·) andck = sign

∫φk(t)φk(t)dt. Then for all k = 1, . . . ,K

‖ck φk − φk‖2 = Op(n−1/10), |λk − λk | = Op(n−1/10)

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 42 / 71

Page 82: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Theoretical properties of sparse fPCAUnder suitable regularity conditions

. Let µ(t) be the local linear mean estimator of µ(t). Then

supt|µ(t)− µ(t)| = Op(n−3/10)

. Let Σ(s, t) be the local linear estimator of Σ(s, t). Then

supt,s|Σ(s, t)− Σ(s, t)| = Op(n−1/10)

Assume λ1 > λ2 > . . . > λK > λK+1 ≥ 0

Let {λk , φk}k be the eigenvalues/eigenfunctions of Σ(·, ·) andck = sign

∫φk(t)φk(t)dt. Then for all k = 1, . . . ,K

‖ck φk − φk‖2 = Op(n−1/10), |λk − λk | = Op(n−1/10)

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 42 / 71

Page 83: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Illustration: CD4Top: estimated mean; estimated covariance; scree plotBottom: Top fPC and the effect of changes along this direction.

−20 −10 0 10 20 30 40

050

010

0015

0020

0025

0030

00

CD4 cell counts

months since seroconversion

num

ber

of c

d4 c

ells

●●

● ●

●●

●●

row

colu

mn

10

20

30

40

50

60

10 20 30 40 50 60

40000

50000

60000

70000

80000

90000

100000

110000

120000

● ● ●

1 2 3 4 5 6

0.80

0.85

0.90

0.95

1.00

scree plot

number of PCs

PE

V

−20 −10 0 10 20 30 40

−2

−1

01

2

fPC1

month

−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−

−20 −10 0 10 20 30 40

050

010

0015

0020

0025

0030

00

fPC1 79.5%

month

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 43 / 71

Page 84: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Software implementation

. R

I for estimation

face for fast covariance estimation for sparse functional datafda for functional data analysis in Rfpca for functional principal component analysismgcv for generalized additive (mixed) models;semi-parametric smoothingrefund for regression with functional data;

I for visualization

fields for 2d image plotslattice for various plotsrefund.shiny for interacting plots for functional dataanalysesrgl for 3d plots

. MATLAB

I PACE

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 44 / 71

Page 85: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Beyond iid functional data

Although the iid case is quite common, here are other situations:

. Multilevel functional data:

I {Yij(s), s ∈ S, i = 1, . . . , n, j = 1, . . . ,mi}I Eg: crypt-level biomarker data in colon-carcinogenesis studies

. Longitudinal functional data:

I [{Yij(s),Tij}, s ∈ S, i = 1, . . . , n, j = 1, . . . ,mi}I Eg: DTI data (multiple clinical visits)

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 45 / 71

Page 86: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

DTI data (revisit)

Each subject is observed at several hospital visits and at eachtime a FA profile is measured. Below: FA for two subjects.Of interest: what is the dynamic behavior over time ?

0 20 40 60 80

0.40

0.50

0.60

Subject A (total of 4 visits)

tract

FA

visit times = 0, 192, 365, 834

0 20 40 60 80

0.30

0.40

0.50

0.60

Subject B (total of 3 visits)

tract

FA

visit times = 0, 112, 239

FA along CCA at various times after baseline (time=0)

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 46 / 71

Page 87: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Longitudinal functional data analysis (LFDA)

Data Structure

[{Yij(·), Tij} : i = 1, . . . , n, j = 1, . . . ,mi ]

. Yij(·) is jth response of the ith subject observed on fine grid

. Tij is the time corresponding to the Yij(·)

Tij is observed in a longitudinal design

Objective:

. Dynamic behavior of the process over time

. Predict full curve using subjects’ past history

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 47 / 71

Page 88: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Related literature

. Morris & Carroll(JRSSB06); Baladandayuthapani etal(Bcs08); Di et al(AoAS09); Aston et al(JRSSC10); S. et al(Biost10); Scheipl et al(JCGS14); Morris (AnRev15)

. Greven et al(EJS10); Chen & Muller(JASA12);. . .

Limitation: no methodology that captures the process dynamicsover time and provides prediction of full trajectory at future time

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 48 / 71

Page 89: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Park & S. (Stat15)

Proposed Model

Yij(s) = µ(s,Tij) +∑∞

k=1 ξik(Tij)φk(s) + εij(s)

. µ(·,Tij) mean response profile (FA along CCA) at time Tij

. φk(·)’s are orthogonal functions in L2(S)

. ξik(T ) zero-mean time-varying random coefficients; theyquantify the process dynamics

. εij(·) zero-mean IID residual process

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 49 / 71

Page 90: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Flexibility of the proposed framework

PAR : cov{ξik(T ), ξik(T ′)} = λkρk(|T − T ′|) , λk > 0;

relates to Gromenko et al(AnnAS12) and Gromenko andKokoszka (CSDA13) for n = 1

REM : ξik(T ) = ζ0,ik + ζ1,ikT ;

relates to Greven et al(EJS10) for φX1k (s) = φX2

k (s) := φk(s)

NP : ξik(T ) =∑

`≥1 ζik`ψk`(T );

relates to Chen & Muller(JASA12) for φk(s|Tij) := φk(s)

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 50 / 71

Page 91: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Selecting the time-invariant basis functions

Recall model : Yij(s) = µ(s,Tij) +∑∞

k=1 ξik(Tij) φk(s) + εij(s)

. Pre-specified basis functions (e.g. Fourier, B-splines, wavelets)

. Data-driven basis functions like the eigenfunctions of somecovariance function. Which covariance function to use ?

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 51 / 71

Page 92: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Time-invariant basis functions, φk(s)’s (cont’d)

Recall model : Yij(s) = µ(s,Tij) +∑∞

k=1 ξik(Tij) φk(s) + εij(s)

. Let Wi (s,T ) =∑∞

k=1 ξik(T )φk(s) and denote its covariance by

cov{Wi (s,T ), Wi (s′,T ′)} = c{(s,T ), (s ′,T ′)}

. Define Σ(s, s ′) :=∫c{(s,T ), (s ′,T )} g(T ) dT ,

g(T ) the sampling density of T .

. Σ(s, s ′) → proper cov fn. Call Σ marginal cov induced by Wi .Similar concept: Chen et al.(JRSSB16+), Aston et al.(srXiv15)

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 52 / 71

Page 93: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Time-invariant basis functions, φk(s)’s (cont’d)

. Choose {φk(·)}k ’s as the eigenfunctions of Σ(s, s ′)

. Proxy time-varying basis coefficients

ξW ,ijk =∫{Yij(s)− µ(s,Tij)} φk(s)ds

I For each k , {(ξW ,ijk ,Tij)mi

j=1}i describe the process dynamics.

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 53 / 71

Page 94: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Estimation roadmap

Step 1 : Mean function µ(s,T )

Step 2 : Marginal covariance function Σ(s, s ′) and the orthogonalbasis φk(s)’s

Step 3 : Time-varying coefficients ξik(T ) for every T

Step 4 : Full trajectories Yi (·,T ) for every T

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 54 / 71

Page 95: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Estimation

Case: fully observed curves L2[S] error

Step 1 : Mean function

. Estimate µ(s,T ) using bivariate (tensor product) splinessmoothing Wood (Bcs2006) + working independence ⇒ µ(s,T )

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 55 / 71

Page 96: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Estimation (cont’d)

Step 2 : Marginal Covariance Function

. Demean data: Yij(s) := Yij(s)− µ(s,Tij)

. Estimate Σ(s, s ′) by sample covariance of the demeaned data

Σ(s, s ′) =1∑n

i=1mi

∑ni=1

∑mij=1 Yij(s)Yij(s

′)

. Spectral decomposition of Σ(s, s ′) gives {λk , φk(s)}k

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 56 / 71

Page 97: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Estimation (cont’d)

Step 3 : Time-varying coefficients ξik(T ) for every T

For each k estimate proxy ξW ,ijk = ξik(Tij) + eijk by

ξW ,ijk=∫Yij(s) φk(s)ds (numerical integ.)

. Use standard longitudinal/sparse functional data methods toanalyse {(ξW ,ijk ,Tij)

mij=1}i and estimate temporal covaraince

. Predict ξik(T ): ξik(T ) for every T using this estimated covariance

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 57 / 71

Page 98: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Estimation (cont’d)

Step 4 : Predict the full trajectory Yi (·,T ) for every T

Yi (·,T ) = µ(·,T ) +∑K

k=1 ξik(T )φk(·)

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 58 / 71

Page 99: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Theoretical properties

Lemma 1: Marginal covariance

Under regularity assumptions, ‖Σ(·, ·)− Σ(·, ·)‖ →p 0

. Consistency results for λk and φk(·).

Lemma 2 : Proxy time-varying basis coefficients

Under regularity assumptions, supj |ξW ,ijk − ξW ,ijk | →p 0

. Consistency of the predicted trajectories, Yi (s,T ).

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 59 / 71

Page 100: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Implementation in R

Software implementation in R (Wrobel, Park, S., and Goldsmith,Stat 2016)

. refund: fpca.lfda

. refund.shiny: plot shiny for visualization

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 60 / 71

Page 101: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Simulation experiment

Generating Model:

Yij(s) = µ(s,Tij) + ξi1(Tij)φ1(s) + ξi2(Tij)φ2(s) + εij(s)

. Covariance structures for the time-varying coef, ξik(T )’s(i) NP ; (ii) REM ; (iii) Exp

. Error structure εij(s): smooth + white noise (SNR= 1)

. Grid points for s: 101 equally spaced points in [0, 1]

. For each i , {Tij : j = 1, 2, . . . ,mi} are randomly sampled from 41equally spaced points in [0, 1]

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 61 / 71

Page 102: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Prediction performance and computation efficiency comparison:

. Proposed method

. Naıve: take average the subject’s previous curves to predictthe current trajectory

. Chen&Muller (JASA12): using time-dependent orthogonalfunctions φk(s|T )

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 62 / 71

Page 103: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Results

mi ∼ {8, . . . , 12}IN-IPE IN-IPEnaive OUT-IPE OUT-IPEnaive

NP n = 100 0.406 7.790 0.988 11.478n = 300 0.313 7.773 0.559 11.349n = 500 0.288 7.779 0.455 11.262

REM n = 100 0.328 1.199 1.011 2.160n = 300 0.265 1.197 0.675 2.160n = 500 0.247 1.197 0.571 2.150

Exp n = 100 0.554 1.528 1.426 2.520n = 300 0.508 1.531 1.143 2.498n = 500 0.494 1.530 1.074 2.492

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 63 / 71

Page 104: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Results (cont’d)

. Computationally faster compared to available approaches

mi ∼ {8, . . . , 12}Chen&Muller (JASA12) Proposed method

IN-IPE OUT-IPE time (sec) IN-IPE OUT-IPE time (sec)

NP n = 100 0.880 2.221 983.872 0.406 0.988 7.369n = 300 0.622 1.468 1659.611 0.313 0.559 15.892n = 500 0.556 1.298 2502.462 0.288 0.455 21.418

REM n = 100 0.424 1.359 1084.753 0.328 1.011 9.282n = 300 0.289 0.729 1955.193 0.265 0.675 11.347n = 500 0.257 0.614 2947.126 0.247 0.571 22.559

Exp n = 100 0.634 1.642 1556.182 0.554 1.426 7.514n = 300 0.549 1.251 1959.219 0.508 1.143 16.229n = 500 0.531 1.155 2865.041 0.494 1.074 17.109

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 64 / 71

Page 105: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

DTI data analysis

. FA along CCA for MS patients

. 162 MS patients observed at between 1 to 8 hospital visits

. Tij - hospital visit time (mean = 2.6 visits/subj)

. Yij(·) - FA profile (93 locn along CCA) for i subj at Tij

. 421 total curves

Objective:

. Dynamic behavior of FA over time

. Predict FA profile at a subject’s future visit

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 65 / 71

Page 106: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

DTI study (cont’d)

DTI data exploratory analysis

Model assumption

. Yij(s) = µ(s) +∑K

k=1 ξik(Tij) φk(s) + εij(s)

. Time-varying coef: ξik(Tij) = b0ik + b1ikTij (REM)

Estimation

Longitudinal functional data analysis for DTI data

. φk(·)’s - eigenfns of the estimated marginal covariance Ξ(·, ·)

. Fix percentage of explained variance to 95% → K = 10

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 66 / 71

Page 107: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

DTI study results (time dynamics)

Figure: Estimated basis functions φk(s) for k = 1, 2 and 3

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 67 / 71

Page 108: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

DTI study results (time dynamics)

Figure: Time-varying coefficients ξik(T ) for k = 1, 2 and 3 (REM)

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 68 / 71

Page 109: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

DTI study (cont’d)

Predicted values of FA for the last visits of three randomly selectedsubjects; actual(gray) / proposed (black) / naıve (dashed)

0.0 0.2 0.4 0.6 0.8 1.0

0.40

0.45

0.50

0.55

0.60

0.65

subject A, visit 6

tract

0.0 0.2 0.4 0.6 0.8 1.0

0.4

0.5

0.6

0.7

subject B, visit 2

tract

0.0 0.2 0.4 0.6 0.8 1.0

0.35

0.45

0.55

0.65

subject C, visit 3

tract

actual obsnaive predictionour prediction

Naıve Greven et al. (EJS10) Chen & Muller (JASA12) Proposed

IN-IPE×102 - 2.66 3.76 2.31OUT-IPE×102 3.52 - 8.71 3.48

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 69 / 71

Page 110: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Final remarks

. Sample of independent curves

I Smoothing using pre-specified basis

I FPCA

. Longitudinally observed curves

I Process dynamics + future curve prediction

. Software implementation/visualization in R

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 70 / 71

Page 111: Tutorial on Functional Data Analysis - SAMSI · A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71. Methodology From discrete to functional data. Intuition The

Methodology

Thank you!

Comments? Questions ? [email protected]

A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 71 / 71