Methodology
Tutorial on Functional Data Analysis
Ana-Maria Staicu
Department of Statistics, North Carolina State University, [email protected]
SAMSI
April 5, 2017
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 1 / 71
Methodology
Broad overview of course topics:
1. Introduction to Functional Data
2. Modeling Functional Data with Preset Basis Expansions
3. Modeling Functional Data using Functional Principal Component Analysis
4. Beyond Independent and Identically Distributed Functional Data
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 2 / 71
Methodology
Example of functional data
Canadian weather data. Daily (and monthly) temperature and precipitation at 35 different locations in Canada, averaged over 35 years from 1960 to 1994.
[Figure: left panel "Avg Daily Temp in Ottawa" (Ontario); right panel "Avg Daily Temp across Canada". x-axis: days (0 to 300); y-axis: temperature (−30 to 20).]
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 3 / 71
Methodology
Example of functional data (cont’d)
Marathon data. Running time performance of 149 female and 105 male athletes competing in the US Olympic Team Trials Marathon, 02/16/2016.
[Figure: left panel "Speed for females"; right panel "Speed for males". x-axis: Miles (1 to 25); y-axis: Speed (miles/hr) (6 to 12).]
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 4 / 71
Methodology
Example of functional data (cont’d)
CD4 data. CD4 cell count per mm³ of blood is a useful surrogate of the progression of HIV. Below are CD4 cell counts for 366 subjects between months −18 and 42 since seroconversion (diagnosis of HIV).
[Figure: left panel "CD4 for few subjects"; right panel "CD4 counts for all the subjects". x-axis: months since seroconversion (−20 to 40); y-axis: number of CD4 cells (0 to 3000).]
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 5 / 71
Methodology
Example of functional data (cont’d)
Diffusion tensor imaging (DTI) data. Fractional anisotropy (FA), a measure of tissue integrity that is useful in the diagnosis/progression of multiple sclerosis (MS), along the main direction of the corpus callosum (CCA) for many subjects.
[Figure: "Diffusion Tensor Imaging: CCA". Panels: FA obs for one MS subj; FA for all subj; Brain CCA (red). x-axis: tract (0 to 80); y-axis: Fractional anisotropy (FA) (0.3 to 0.7).]
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 6 / 71
Methodology
Example of functional data (cont’d)
Diffusion tensor imaging (DTI) data. Each subject is observed at several hospital visits, and at each visit an FA profile is measured. Below: FA for two subjects. Of interest: what is the dynamic behavior over time?
[Figure: "Subject A (total of 4 visits)", visit times = 0, 192, 365, 834; "Subject B (total of 3 visits)", visit times = 0, 112, 239. x-axis: tract (0 to 80); y-axis: FA. FA along CCA at various times after baseline (time = 0).]
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 7 / 71
Methodology
Some characteristics of functional data:
. Often high dimensional
. Typically involves multiple measurements of the same "process"
. Interpretability across subject domains
. Parametric assumptions on the underlying "process" are often not made
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 8 / 71
Methodology
FDA vs multivariate data analysis
. Topology: well-defined topology for FDA / one can permute the elements in MDA
. Model covariance assumption: smooth for FDA / unstructured or sparse for MDA
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 9 / 71
Methodology
FDA vs longitudinal data analysis
. Model covariance: no assumption for FDA / parametric assumption for LDA
. Mechanisms of missingness: not very important in FDA / very important for LDA
. Interest more in subject-specific trajectories for FDA / inference for LDA
. Sampling design: typically high frequency for FDA / sparse and irregular for LDA
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 10 / 71
Methodology
Some references
. Ramsay & Silverman, 2005, “Functional Data Analysis”
. Ramsay, Hooker & Graves, 2009, "Functional Data Analysis with R and MATLAB"
. Horvath & Kokoszka, 2012, "Inference for Functional Data with Applications"
. Bosq, 2000, "Linear Processes in Function Spaces"
. Ferraty & Vieu, 2006, "Nonparametric Functional Data Analysis"
. Zhang, 2013, "Analysis of Variance for Functional Data"
. Hsing & Eubank, 2015, "Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators"
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 11 / 71
Methodology
From discrete to functional data. Intuition
The term functional, in reference to observed data, refers to the intrinsic structure of the data being functional; i.e., there is an underlying function that gives rise to the observed data.
Advantages of representing the data as a smooth function:
. allows evaluation at any time point
. allows evaluation of rates of change of the underlying curve
. allows registration to a common time-scale
Main idea in FDA: treat the observed data functions as single entities, rather than a sequence of individual observations.
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 12 / 71
Methodology
Notation
Observed data: $\{(Y_{i1}, t_{i1}), \ldots, (Y_{im_i}, t_{im_i})\}_i$;
. $Y_{ij}$: the "snapshot" of an underlying $i$th signal/curve (latent) $X_i(\cdot)$ at timepoint $t_{ij}$, possibly blurred by error
. $X_i(\cdot)$: smooth latent curve on $\mathcal{T}$; the $X_i(\cdot)$'s are independent realizations of a stochastic process $X(\cdot)$
. $t_{ij} \in \mathcal{T}$; the $t_{ij}$ may vary across $i$ and $j$
. If there is no noise, then $Y_{ij} = X_i(t_{ij})$; otherwise $Y_{ij} = X_i(t_{ij}) + \varepsilon_{ij}$, where $\varepsilon_{ij}$ is white noise; typically the $\varepsilon_{ij}$ are IID
Common objectives:
. Characterize the pattern of variability among curves
. Recover the subject-specific trajectories $X_i(\cdot)$
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 13 / 71
Methodology
Smoothing
Why do we need smoothing?
. Data are often observed with error
. There's a need to "interpolate" to a common grid
How are we going to do smoothing?
. Use prespecified basis functions (e.g. splines, wavelets, Fourier, etc.)
. Use data-driven basis functions (e.g. functional principal components)
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 14 / 71
Methodology
Prespecified basis expansion
Let $\{\psi_1(\cdot), \psi_2(\cdot), \ldots, \psi_K(\cdot)\}$ be a user-specified basis. Assume:

$$Y_{ij} = \sum_{k=1}^{K} c_{ik}\,\psi_k(t_{ij}) + \varepsilon_{ij}.$$

. We only need to estimate the subject-specific scores $c_{ik}$.
What kind of prespecified basis to use? B-splines, wavelets, etc.
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 15 / 71
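As a concrete sketch of the expansion above, the following minimal numpy example fits the coefficients $c_{ik}$ for one noisy curve by ordinary least squares. The truncated-power cubic spline basis, the sine test curve, and the knot placement are illustrative choices, not from the slides.

```python
import numpy as np

def spline_basis(t, knots):
    """Truncated-power cubic spline basis evaluated at points t.
    Columns: 1, t, t^2, t^3, and (t - knot)_+^3 for each interior knot."""
    cols = [np.ones_like(t), t, t**2, t**3]
    cols += [np.clip(t - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 80)                    # observation grid t_ij
x_true = np.sin(2 * np.pi * t)                   # latent curve X_i(t)
y = x_true + rng.normal(scale=0.2, size=t.size)  # Y_ij = X_i(t_ij) + eps_ij

Psi = spline_basis(t, knots=np.linspace(0.1, 0.9, 9))
c_hat, *_ = np.linalg.lstsq(Psi, y, rcond=None)  # estimated scores c_ik
x_hat = Psi @ c_hat                              # fitted smooth curve
```

The fitted curve `x_hat` can then be evaluated, differentiated, or compared across subjects via the coefficient vector `c_hat` alone.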
Methodology
Some common basis functions: B-splines
[Figure: left panel "Fourier basis"; right panel "B spline basis". x-axis: s (0 to 1); y-axis: $\phi_k(s)$.]
. Continuous
. Easily defined derivatives
. Good for smooth data
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 16 / 71
Methodology
Some common basis functions: Wavelets
[Figure: an example wavelet function. x-axis: x (0 to 1); y-axis: wavelet function value.]
. Formed from a single "mother wavelet" function: $\psi_{jk}(t) = 2^{j/2}\,\psi(2^j t - k)$
. Orthonormal basis
. Particularly good when there are jumps, spikes, peaks, etc.
. Wavelet representation is sparse
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 17 / 71
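To make the mother-wavelet formula concrete, here is a small numpy sketch using the Haar wavelet (an illustrative choice; the slide does not specify a wavelet family) that checks the orthonormality of $\psi_{jk}(t) = 2^{j/2}\psi(2^j t - k)$ numerically:

```python
import numpy as np

def haar_mother(t):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    return np.where((t >= 0) & (t < 0.5), 1.0,
                    np.where((t >= 0.5) & (t < 1.0), -1.0, 0.0))

def haar(t, j, k):
    """psi_{jk}(t) = 2^{j/2} psi(2^j t - k): dilated/translated copy."""
    return 2 ** (j / 2) * haar_mother(2 ** j * t - k)

# fine dyadic grid so the numerical inner products are exact for Haar
t = np.linspace(0.0, 1.0, 4096, endpoint=False)
dt = t[1] - t[0]

ip_same = np.sum(haar(t, 2, 1) ** 2) * dt              # ||psi_{2,1}||^2 = 1
ip_cross = np.sum(haar(t, 2, 1) * haar(t, 3, 2)) * dt  # overlapping supports,
                                                       # inner product = 0
```

The two Riemann sums confirm unit norm within a scale and orthogonality across scales even when the supports overlap.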
Methodology
Example
[Figure: left panel "Few B-splines fns expansion"; right panel "Many B-spline fns expansion". x-axis: Distance Along Tract (0 to 1); y-axis: y (0.35 to 0.60).]
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 18 / 71
Methodology
Tuning
For any curve, many possible smooths are available
. Depends on the type of basis
. Depends on the number of basis functions
. Depends on the estimation procedure
"Tuning" is the process of adjusting the smoother to the data at hand. This is often implicit (e.g. kernel smoothing).
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 19 / 71
Methodology
Prespecified basis expansion: estimation
Explicit penalization: use a large number of basis functions and penalize the "roughness" of the fit.

This leads to a penalized SSE:

$$\mathrm{PenSSE}_i = \sum_{j=1}^{m_i}\Big(Y_{ij} - \sum_{k=1}^{K}\psi_k(t_{ij})\,c_{ik}\Big)^2 + \lambda\,\mathrm{Pen}\Big(\sum_{k=1}^{K}\psi_k(\cdot)\,c_{ik}\Big)$$

. Measure the roughness using derivatives of the fit, e.g.

$$\mathrm{Pen}\Big(\sum_{k=1}^{K}\psi_k(\cdot)\,c_{ik}\Big) = \int\Big\{\sum_{k=1}^{K}\psi_k''(t)\,c_{ik}\Big\}^2 dt = c_i^T D\, c_i,$$

where $D$ is the $K \times K$ matrix with $(k, k')$ entry $\int \psi_k''(t)\,\psi_{k'}''(t)\,dt$.
. Choose the $c_{ik}$ that minimize $\mathrm{PenSSE}_i$; obtain $\widehat{X}_i(\cdot)$
. Need to select the tuning parameter $\lambda$. Common ways: CV and REML
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 20 / 71
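A hedged numerical sketch of the penalized criterion: instead of a B-spline basis, the toy code below uses one coefficient per grid point with a squared-second-difference penalty (a Whittaker-type smoother), which is the discrete analogue of the $\int \{x''(t)\}^2 dt$ roughness term on the slide. The data and $\lambda$ values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=t.size)

# A: second-difference matrix; D = A^T A is the discrete roughness penalty
n = t.size
A = np.diff(np.eye(n), n=2, axis=0)
D = A.T @ A

def pen_smooth(y, lam):
    """Minimize ||y - c||^2 + lam * c^T D c  (closed-form ridge solution)."""
    return np.linalg.solve(np.eye(len(y)) + lam * D, y)

x_small = pen_smooth(y, lam=1.0)   # small lambda: follows the data closely
x_large = pen_smooth(y, lam=1e7)   # huge lambda: nearly a straight line
```

Increasing `lam` trades fidelity for smoothness, exactly as the $\lambda = 0$ vs $\lambda \gg 0$ discussion on the next slide describes.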
Methodology
Smoothing parameter λ
. When $\lambda = 0$ the criterion reduces to the SSE (emphasis on fit)
. When $\lambda \gg 0$ (very large), more emphasis is placed on smoothness
Illustration: Vancouver mean temperature. Fits using 100 cubic splines (with equally spaced knots) and various $\lambda$.
[Figure: four panels of fits with lambda = 0.1, 1000, 1e+07, 1e+11. x-axis: day (0 to 300); y-axis: precipitation (0 to 8).]
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 21 / 71
Methodology
Data-driven basis
. The previous bases don't depend on the data; only the loadings do
. Functional principal component analysis (FPCA) gives a "data-driven" basis (constructed from the observed data)
. Similar representation of the observed data $Y_{ij}$:

$$Y_{ij} = \mu(t_{ij}) + \sum_{k=1}^{K} c_{ik}\,\phi_k(t_{ij}) + \varepsilon_{ij}.$$

. The difference is that the $\phi_k(\cdot)$ are not known.
. $\{\phi_1(\cdot), \phi_2(\cdot), \ldots\}$ describe the main directions of variability in the observed data
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 22 / 71
Methodology
fPCA: main idea
Idea of fPCA: find projections of maximum variance. Let $\{X_i(t) : t \in \mathcal{T}\}_i$ be IID zero-mean curves in $L^2[\mathcal{T}]$.
. The 1st fPC is the $\phi_1(t)$ for which

$$\xi_{i1} = \langle \phi_1, X_i \rangle = \int_{\mathcal{T}} \phi_1(t)\,X_i(t)\,dt$$

has maximum variance, subject to the constraint $\|\phi_1\|^2 = 1$.
. The 2nd fPC is the $\phi_2(t)$ for which

$$\xi_{i2} = \langle \phi_2, X_i \rangle = \int_{\mathcal{T}} \phi_2(t)\,X_i(t)\,dt$$

has maximum variance, subject to $\langle \phi_2, \phi_1 \rangle = 0$ and $\|\phi_2\|^2 = 1$.
. . . .
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 23 / 71
Methodology
Data-driven basis: Illustration
FPCA for a sample of 20 curves. Displayed as dashed black lines are 3 FPCs: top middle panel, bottom left and right panels.
[Figure: six panels on (0, 1), y-range −3 to 3: "20 functions"; "comp 1, 85%"; "residuals: remove comp 1"; "comp 2, 14%"; "residuals: remove comp 1,2"; "comp 3, 1%".]
Model: $X_i(t) = \xi_{i1}\phi_1(t) + \xi_{i2}\phi_2(t) + \xi_{i3}\phi_3(t)$
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 24 / 71
Methodology
How to obtain the fPC ?
Notation:
. $\Sigma(s, t) := E[\{X_i(s) - \mu(s)\}\{X_i(t) - \mu(t)\}]$ is the covariance function
. $\Sigma(s, t)$ induces an integral operator $\Sigma$:

$$\Sigma f(t) = \int_{\mathcal{T}} \Sigma(t, s)\,f(s)\,ds, \qquad f \in L^2[\mathcal{T}]$$

Re-write the steps of the algorithm that defines the fPCs:
. 1st fPC: $\phi_1 = \arg\max_f \langle \Sigma f, f \rangle$ subject to $\|f\|^2 = 1$
. 2nd fPC: $\phi_2 = \arg\max_f \langle \Sigma f, f \rangle$ subject to $\langle f, \phi_1 \rangle = 0$ and $\|f\|^2 = 1$
. . . .
The fPCs $\phi_k$ are the eigenvectors of the covariance operator $\Sigma$: $\Sigma\phi_k = \lambda_k\phi_k$.
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 25 / 71
Methodology
Mercer’s thm
. Assume $\Sigma(s, t)$ is continuous over $\mathcal{T} \times \mathcal{T}$. Then there exist an orthonormal basis $\{\phi_k\}_k$ of continuous functions in $L^2[\mathcal{T}]$ and $\lambda_1 \geq \lambda_2 \geq \ldots > 0$ such that
I $\Sigma\phi_k = \lambda_k\phi_k$, where $\Sigma$ is the operator induced by the covariance function
I $\Sigma(s, t) = \sum_{k=1}^{\infty} \lambda_k\,\phi_k(s)\,\phi_k(t)$ for $t, s \in \mathcal{T}$, where the series converges uniformly on $\mathcal{T}^2$
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 26 / 71
Methodology
Karhunen-Loeve expansion
. Let $\{X_i(t) : t \in \mathcal{T}\}_i$ be IID zero-mean curves in $L^2[\mathcal{T}]$ and $\{\phi_k\}_k$ the eigenfunctions of the covariance function $\Sigma(\cdot, \cdot)$. Then

$$X_i(t) = \sum_{k=1}^{\infty} \xi_{ik}\,\phi_k(t) \quad \text{(the series converges in } L^2 \text{ norm)}$$

where
I $\xi_{ik} := \int X_i(t)\,\phi_k(t)\,dt$ are random variables
I $E[\xi_{ik}] = 0$, $\mathrm{Var}[\xi_{ik}] = \lambda_k$, and $\{\xi_{ik} : k \geq 1\}$ are mutually uncorrelated
I the $\xi_{ik}$ are called the fPC scores for $X_i$
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 27 / 71
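The KL expansion can be checked by simulation: generate curves from two known eigenfunction/eigenvalue pairs, recover the scores by projection, and verify that the empirical score variances match the $\lambda_k$ and the scores are essentially uncorrelated. The eigenfunctions and eigenvalues below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 200)
dt = t[1] - t[0]

# two orthonormal eigenfunctions on [0,1] and their eigenvalues
phi1 = np.sqrt(2) * np.sin(2 * np.pi * t)
phi2 = np.sqrt(2) * np.cos(2 * np.pi * t)
lam = np.array([4.0, 1.0])

# generate curves X_i(t) = xi_i1 phi1(t) + xi_i2 phi2(t)
n = 5000
xi = rng.normal(scale=np.sqrt(lam), size=(n, 2))
X = xi @ np.vstack([phi1, phi2])

# recover scores by projection: xi_ik = int X_i(t) phi_k(t) dt
xi1_hat = X @ phi1 * dt
xi2_hat = X @ phi2 * dt
var1_hat = xi1_hat.var()                         # should be close to lam[0]
corr = np.corrcoef(xi1_hat, xi2_hat)[0, 1]       # should be close to 0
```

With a sample of 5000 curves, `var1_hat` lands near 4 and `corr` near 0, matching $\mathrm{Var}[\xi_{ik}] = \lambda_k$ and the uncorrelatedness of the scores.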
Methodology
Implications KL expansion
. One-to-one mapping $X_i(\cdot) \to (\xi_{i1}, \xi_{i2}, \ldots)^T$
. $\lambda_k$ is the average variance of the $k$th fPC $\phi_k$, i.e.:

$$\lambda_k = \lambda_k \int \phi_k^2(t)\,dt = \int \mathrm{Var}\{\xi_{ik}\phi_k(t)\}\,dt$$

. $\int \mathrm{Var}\{X_i(t)\}\,dt = \sum_{k=1}^{\infty} \lambda_k$
. Percentage variance explained (PVE) by the first $K$ fPCs:

$$\frac{\sum_{k=1}^{K} \lambda_k}{\sum_{l=1}^{\infty} \lambda_l}$$
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 28 / 71
Methodology
fPCA for sample of IID true curves
Recall: let $\{X_i(t) : t \in \mathcal{T}\}_i$ be IID curves in $L^2[\mathcal{T}]$.

1. Sample mean $\bar{X}(t)$
2. Sample covariance function: $\widehat{C}_X(s, t) = (n - 1)^{-1}\sum_{i=1}^{n}\{X_i(s) - \bar{X}(s)\}\{X_i(t) - \bar{X}(t)\}$
3. Spectral decomposition (or eigenanalysis) of $\widehat{C}_X$ gives eigenfunction/eigenvalue pairs $\{\widehat{\phi}_k(\cdot), \widehat{\lambda}_k\}$
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 29 / 71
Methodology
fPCA for sample of IID true curves
4. Select a finite truncation $K$ using PVE or information criteria (AIC, BIC, etc.)
5. Estimate the fPC scores: $\widehat{\xi}_{ik} = \int_{\mathcal{T}}\{X_i(t) - \bar{X}(t)\}\,\widehat{\phi}_k(t)\,dt$
6. KL expansion: $\widehat{X}_i(t) = \bar{X}(t) + \sum_{k=1}^{K} \widehat{\xi}_{ik}\,\widehat{\phi}_k(t)$
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 30 / 71
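Steps 1-6 above can be sketched end-to-end in numpy for curves observed on a common grid. The simulated data (two true modes of variation, a 99% PVE cutoff) are illustrative; the grid eigendecomposition approximates the eigenanalysis of $\widehat{C}_X$, with eigenvalues rescaled by the grid spacing.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 100)
dt = t[1] - t[0]
n = 400

# simulate curves with a smooth mean and two dominant modes of variation
mu = t ** 2
phi1 = np.sqrt(2) * np.sin(np.pi * t)
phi2 = np.sqrt(2) * np.sin(2 * np.pi * t)
X = mu + rng.normal(scale=2.0, size=(n, 1)) * phi1 \
       + rng.normal(scale=0.5, size=(n, 1)) * phi2

# steps 1-3: sample mean, sample covariance, eigenanalysis
Xbar = X.mean(axis=0)
C = (X - Xbar).T @ (X - Xbar) / (n - 1)          # C_X(s, t) on the grid
evals, evecs = np.linalg.eigh(C)
evals = evals[::-1] * dt                         # sort descending, rescale
evecs = evecs[:, ::-1] / np.sqrt(dt)             # unit L2 norm w.r.t. dt

# step 4: truncation K by percentage of variance explained
pve = np.cumsum(evals) / np.sum(evals)
K = int(np.argmax(pve >= 0.99) + 1)

# steps 5-6: scores by projection, then the truncated KL reconstruction
scores = (X - Xbar) @ evecs[:, :K] * dt
X_hat = Xbar + scores @ evecs[:, :K].T
```

Since the simulated curves have exactly two modes, the PVE rule selects $K = 2$ and the truncated KL expansion reconstructs the sample essentially exactly.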
Methodology
Theoretical properties fPCA
Consistency results for the sample mean, sample covariance function, and the corresponding eigenvalues/eigenfunctions:
. $E\bar{X} = \mu$ and $E\|\bar{X} - \mu\|^2 = O(n^{-1})$.
. The sample covariance is unbiased and is a mean squared consistent estimator of the covariance function.
Assume $\lambda_1 > \lambda_2 > \ldots > \lambda_K > \lambda_{K+1} \geq 0$. Let $\{\widehat{\lambda}_k, \widehat{\phi}_k\}_k$ be the estimated eigenvalues/eigenfunctions and $c_k = \mathrm{sign}\int \phi_k(t)\,\widehat{\phi}_k(t)\,dt$. Then

$$\limsup_{n\to\infty}\, nE[\|c_k\widehat{\phi}_k - \phi_k\|^2] < \infty, \qquad \limsup_{n\to\infty}\, nE[|\widehat{\lambda}_k - \lambda_k|^2] < \infty$$
A-M Staicu Tutorial on Functional Data Analysis April 5, 2017 31 / 71
fPCA for dense sampling design plus noise

Observed data: {Yij, tij : j = 1, …, mi}i; mi is large for all i
Model assumption: Yij = Xi(tij) + εij; Xi ∼ IID(µ, Σ), εij ∼ WN(0, σ2)

Common approach: first smooth each curve, then do fPCA
. For each i, smooth the data {(Yij, tij) : j = 1, …, mi} → X̂i(·)
. µ̂: sample mean of the X̂i's
. Σ̂: sample covariance of the X̂i's
. Eigenanalysis of Σ̂ gives {λ̂k, φ̂k}k
. Use PVE or other criteria to select the truncation K
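A minimal sketch of this smooth-first recipe, assuming dense noisy observations on a common grid. The least-squares Fourier fit below stands in for whichever smoother one prefers (splines, local polynomials); the simulated model and all names are illustrative, not from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 201)
n, sigma = 100, 0.5
# True curves: mu(t) = 1, one component with Var(score) = 2
X_true = 1 + rng.normal(scale=np.sqrt(2.0), size=(n, 1)) \
             * np.sqrt(2) * np.sin(2 * np.pi * t)
Y = X_true + rng.normal(scale=sigma, size=(n, len(t)))   # dense + noise

# Design matrix for a small Fourier basis (constant + 3 harmonics)
B = np.column_stack([np.ones_like(t)] +
                    [f(2 * np.pi * k * t) for k in (1, 2, 3)
                                          for f in (np.sin, np.cos)])
coef, *_ = np.linalg.lstsq(B, Y.T, rcond=None)
X_smooth = (B @ coef).T                                  # one smooth curve per i

# fPCA ingredients from the smoothed curves
mu_hat = X_smooth.mean(axis=0)
C_hat = np.cov(X_smooth, rowvar=False)
lam1 = np.linalg.eigvalsh(C_hat)[-1] * (t[1] - t[0])     # leading L2 eigenvalue
```

Projecting each noisy curve onto a low-dimensional basis removes most of the white noise before the covariance is formed, which is the point of the smooth-then-fPCA approach.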
Trajectory reconstruction

. Estimate the fPC scores by a Riemann sum:

ξ̂ik = ∑_{j=1}^{mi} {Yij − µ̂(tij)} φ̂k(tij) (tij − ti,j−1)

(accounting for possibly unequal grids of points)

. Finite-dimensional approximation:

X̂i^K(t) = µ̂(t) + ∑_{k=1}^K φ̂k(t) ξ̂ik
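The weights (tij − ti,j−1) are what make the score sum work on unequal grids. A tiny numerical check, with an illustrative integrand and a random grid of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0, 1, 2000))           # irregular grid on [0, 1]
w = np.diff(t, prepend=0.0)                    # weights t_j - t_{j-1}

f = np.sin(2 * np.pi * t)                      # stands in for Y_i(t) - mu(t)
phi = np.sqrt(2) * np.sin(2 * np.pi * t)       # an L2-orthonormal basis function
score_riemann = np.sum(f * phi * w)
score_exact = np.sqrt(2) / 2                   # exact value of the integral
```

With equal spacing the weights reduce to the usual constant grid step, so this is a strict generalization of the equally-spaced case.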
Illustration: DTI
Top: FA along the corpus callosum for MS subjects (left); covariance estimate (middle); scree plot (right). Bottom: the two leading fPCs and their effect relative to the population mean.

[Figure: six panels. Top row: 'Diffusion Tensor Imaging: CCA' (fractional anisotropy (FA) vs tract), 'Smooth covariance of FA', and the scree plot (PVE vs number of PCs). Bottom row: fPC1 and its effect (68% PVE), fPC2 and its effect (10% PVE), each plotted against tract.]
3.4. fPCA for sparse sampling design plus noise

Observed data: {Yij, tij : j = 1, …, mi}i; mi is small for all i
Model assumption: Yij = Xi(tij) + εij; Xi ∼ IID(µ, Σ), εij ∼ WN(0, σ2)

Challenges:
. The number of measurements per curve is very small;
. Reconstruction of the curves is very important; however, smoothing each curve individually is not realistic!

Common approach: pool the subjects' data to do fPCA
Reasoning behind the estimation procedure

Recall the model: Yij = Xi(tij) + εij; Xi ∼ IID(µ, Σ), εij ∼ WN(0, σ2)

. Mean estimator µ̂(t): obtained by smoothing the pooled data {(tij, Yij) : i, j}
. To estimate the covariance, note the following:

Cov(Yij, Yij′) = Σ(tij, tij′) if j ≠ j′
Var(Yij) = Σ(tij, tij) + σ2

Treat the diagonal (tij, tij) and off-diagonal (tij, tij′) terms differently
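The two moment identities above can be checked by simulation. The sketch below uses an illustrative one-component process with known Σ and σ2, evaluated at two fixed times; all numerical values are our own choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma2 = 200000, 0.25
t1, t2 = 0.2, 0.7                                # two fixed observation times

# X_i(t) = xi_i * sqrt(2) sin(2 pi t) with Var(xi) = 2, so Sigma is known
phi = lambda t: np.sqrt(2) * np.sin(2 * np.pi * t)
Sigma = lambda s, t: 2.0 * phi(s) * phi(t)
xi = rng.normal(scale=np.sqrt(2.0), size=n)
Y1 = xi * phi(t1) + rng.normal(scale=np.sqrt(sigma2), size=n)
Y2 = xi * phi(t2) + rng.normal(scale=np.sqrt(sigma2), size=n)

# Raw products of centered observations (mu = 0 here):
G_offdiag = np.mean(Y1 * Y2)   # j != j': estimates Sigma(t1, t2)
G_diag = np.mean(Y1 ** 2)      # j == j': estimates Sigma(t1, t1) + sigma^2
```

This is exactly why sparse fPCA smooths the off-diagonal raw covariances to get Σ and treats the diagonal separately: the diagonal is inflated by the noise variance σ2.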
Sparse fPCA

. 'Off-diagonal' smoothing: define the "raw covariances"

Gijj′ = {Yij − µ̂(tij)}{Yij′ − µ̂(tij′)}

- Two-dimensional smoothing of {(Gijj′, tij, tij′) : j ≠ j′}i under the working model Gijj′ = Σ(tij, tij′) + eijj′, eijj′ ∼ IID, where Σ(·, ·) is a symmetric, positive-definite function.
- Model Σ using bivariate basis functions or a tensor product of two univariate bases, with different penalizations (Wood, 2005).

. Adjust the estimate to be symmetric; zero out negative eigenvalues!
Sparse fPCA

. 'Diagonal' smoothing: define the raw variances

Gij = {Yij − µ̂(tij)}^2

- One-dimensional smoothing of {(Gij, tij) : i, j} → σ̂Y^2(t)

. Estimate σ2 by σ̂2 = ∫_T {σ̂Y^2(t) − Σ̂(t, t)} dt
Sparse fPCA (cont'd)

. Eigenanalysis of Σ̂(t, s) gives the eigenvalues/eigenfunctions {λ̂k, φ̂k(t)}k
. Choose K using PVE (e.g. 95%) or other approaches
. Estimate the true signal Xi(·) by the truncated KL expansion:

X̂i^K(t) = µ̂(t) + ∑_{k=1}^K ξ̂ik φ̂k(t)

. How to estimate the fPC scores ξik? The Riemann sum ∑_{j=1}^{mi} {Yij − µ̂(tij)} φ̂k(tij)(tij − ti,j−1) is no longer feasible!
Instead, estimate the fPC scores using conditional expectation!
Prediction of fPC scores

. Consider the reduced-rank model (a mixed effects model)

Yij = µ(tij) + ∑_{k=1}^K φk(tij) ξik + εij, j = 1, 2, …, mi

where ξik ∼ IID(0, λk), uncorrelated over k, and εij ∼ IID(0, σ2).
Assume for now that µ(·), φk(·), λk and σ2 are known.

. Predict ξik by ξ̃ik = E[ξik | Yi1, …, Yimi]
. Assume the ξik's and Yij's are jointly Gaussian. Then ξ̃ik is the best linear unbiased predictor (BLUP) of ξik. In matrix notation (the subscript i allows subject-specific tij's):

ξ̃ik = λk Φik^T (Σi + σ2 Imi)^{−1} (Yi − µi)
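A small Python/NumPy sketch of the BLUP formula for a single subject with K = 1 component; the visit times, eigenvalue, and noise variance are illustrative choices. For K = 1 the matrix formula collapses, via the Woodbury identity, to a scalar expression, which the sketch uses as a cross-check.

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.array([0.1, 0.35, 0.6, 0.9])            # sparse visit times, m_i = 4
lam, sigma2 = 2.0, 0.25
phi = np.sqrt(2) * np.sin(2 * np.pi * t)       # eigenfunction at the visit times

xi_true = rng.normal(scale=np.sqrt(lam))
Y = xi_true * phi + rng.normal(scale=np.sqrt(sigma2), size=t.size)  # mu = 0 here

# BLUP: xi = lam * Phi^T (Sigma_i + sigma^2 I)^{-1} (Y_i - mu_i)
Sigma_i = lam * np.outer(phi, phi)             # Cov{X_i(t_j), X_i(t_j')}
xi_blup = lam * phi @ np.linalg.solve(Sigma_i + sigma2 * np.eye(t.size), Y)

# Woodbury identity for K = 1 gives the equivalent scalar form
xi_scalar = lam * (phi @ Y) / (sigma2 + lam * (phi @ phi))
```

The scalar form makes the shrinkage visible: the larger σ2 is relative to λ‖Φ‖2, the more the predicted score is pulled toward its prior mean of zero.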
Prediction of fPC scores in practice

. Use the approximated reduced-rank model as a working model

Yij = µ̂(tij) + ∑_{k=1}^K φ̂k(tij) ξik + εij

where ξik ∼ IID(0, λ̂k) and εij ∼ IID(0, σ̂2)

. Predict ξik by the empirical BLUP:

ξ̂ik = λ̂k Φ̂ik^T (Σ̂i + σ̂2 Imi)^{−1} (Yi − µ̂i)
Theoretical properties of sparse fPCA

Under suitable regularity conditions:

. Let µ̂(t) be the local linear mean estimator of µ(t). Then
sup_t |µ̂(t) − µ(t)| = Op(n^{−3/10})

. Let Σ̂(s, t) be the local linear estimator of Σ(s, t). Then
sup_{t,s} |Σ̂(s, t) − Σ(s, t)| = Op(n^{−1/10})

Assume λ1 > λ2 > … > λK > λK+1 ≥ 0.
Let {λ̂k, φ̂k}k be the eigenvalues/eigenfunctions of Σ̂(·, ·) and ck = sign ∫ φ̂k(t) φk(t) dt. Then for all k = 1, …, K

‖ck φ̂k − φk‖2 = Op(n^{−1/10}),  |λ̂k − λk| = Op(n^{−1/10})
Illustration: CD4
Top: estimated mean; estimated covariance; scree plot. Bottom: the top fPC and the effect of changes along this direction.

[Figure: four panels. Top: 'CD4 cell counts' (number of CD4 cells vs months since seroconversion), the estimated covariance surface, and the scree plot (PVE vs number of PCs). Bottom: fPC1 (79.5% PVE) and its effect on the mean, plotted against month.]
Software implementation

. R
  - For estimation:
    face: fast covariance estimation for sparse functional data
    fda: functional data analysis in R
    fpca: functional principal component analysis
    mgcv: generalized additive (mixed) models; semi-parametric smoothing
    refund: regression with functional data
  - For visualization:
    fields: 2d image plots
    lattice: various plots
    refund.shiny: interactive plots for functional data analyses
    rgl: 3d plots

. MATLAB
  - PACE
Beyond IID functional data

Although the IID case is quite common, there are other situations:

. Multilevel functional data:
  - {Yij(s) : s ∈ S, i = 1, …, n, j = 1, …, mi}
  - E.g.: crypt-level biomarker data in colon-carcinogenesis studies

. Longitudinal functional data:
  - [{Yij(s), Tij} : s ∈ S, i = 1, …, n, j = 1, …, mi]
  - E.g.: DTI data (multiple clinical visits)
DTI data (revisited)

Each subject is observed at several hospital visits, and at each visit an FA profile is measured. Below: FA for two subjects.
Of interest: what is the dynamic behavior over time?

[Figure: FA along the CCA at various times after baseline (time = 0). Subject A: 4 visits at times 0, 192, 365, 834. Subject B: 3 visits at times 0, 112, 239.]
Longitudinal functional data analysis (LFDA)

Data structure:

[{Yij(·), Tij} : i = 1, …, n, j = 1, …, mi]

. Yij(·) is the jth response of the ith subject, observed on a fine grid
. Tij is the time corresponding to Yij(·); the Tij are observed in a longitudinal design

Objectives:
. Dynamic behavior of the process over time
. Predict the full curve using a subject's past history
Related literature

. Morris & Carroll (JRSSB 06); Baladandayuthapani et al. (Bcs 08); Di et al. (AoAS 09); Aston et al. (JRSSC 10); Staicu et al. (Biost 10); Scheipl et al. (JCGS 14); Morris (AnRev 15)
. Greven et al. (EJS 10); Chen & Müller (JASA 12); …

Limitation: no methodology that captures the process dynamics over time and provides prediction of the full trajectory at a future time
Park & Staicu (Stat 2015)

Proposed model:

Yij(s) = µ(s, Tij) + ∑_{k=1}^∞ ξik(Tij) φk(s) + εij(s)

. µ(·, Tij): mean response profile (FA along the CCA) at time Tij
. φk(·): orthogonal functions in L2(S)
. ξik(T): zero-mean, time-varying random coefficients; they quantify the process dynamics
. εij(·): zero-mean IID residual process
Flexibility of the proposed framework

PAR: cov{ξik(T), ξik(T′)} = λk ρk(|T − T′|), λk > 0;
relates to Gromenko et al. (AnnAS 12) and Gromenko & Kokoszka (CSDA 13) for n = 1

REM: ξik(T) = ζ0,ik + ζ1,ik T;
relates to Greven et al. (EJS 10) for φk^{X1}(s) = φk^{X2}(s) := φk(s)

NP: ξik(T) = ∑_{ℓ≥1} ζikℓ ψkℓ(T);
relates to Chen & Müller (JASA 12) for φk(s|Tij) := φk(s)
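What the REM structure implies for the score covariance can be verified in a few lines: with ξ(T) = ζ0 + ζ1 T and independent zero-mean ζ's, cov{ξ(T), ξ(T′)} = Var(ζ0) + T T′ Var(ζ1). The variances and time points below are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)
v0, v1 = 1.0, 0.5                                  # Var(zeta0), Var(zeta1)
T, Tp = 0.3, 0.8                                   # two time points

z0 = rng.normal(scale=np.sqrt(v0), size=500000)
z1 = rng.normal(scale=np.sqrt(v1), size=500000)
# Both factors are zero-mean, so the plain mean of products estimates the cov
cov_mc = np.mean((z0 + z1 * T) * (z0 + z1 * Tp))
cov_closed = v0 + T * Tp * v1
```

So the REM case induces a simple bilinear covariance in (T, T′), in contrast to the stationary PAR structure, which depends only on |T − T′|.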
Selecting the time-invariant basis functions

Recall the model: Yij(s) = µ(s, Tij) + ∑_{k=1}^∞ ξik(Tij) φk(s) + εij(s)

. Pre-specified basis functions (e.g. Fourier, B-splines, wavelets)
. Data-driven basis functions, e.g. the eigenfunctions of some covariance function. Which covariance function to use?
Time-invariant basis functions φk(s) (cont'd)

Recall the model: Yij(s) = µ(s, Tij) + ∑_{k=1}^∞ ξik(Tij) φk(s) + εij(s)

. Let Wi(s, T) = ∑_{k=1}^∞ ξik(T) φk(s) and denote its covariance by

cov{Wi(s, T), Wi(s′, T′)} = c{(s, T), (s′, T′)}

. Define Σ(s, s′) := ∫ c{(s, T), (s′, T)} g(T) dT, with g(T) the sampling density of T.

. Σ(s, s′) is a proper covariance function; call Σ the marginal covariance induced by Wi.
Similar concepts: Chen et al. (JRSSB 16+), Aston et al. (arXiv 15)
Time-invariant basis functions φk(s) (cont'd)

. Choose the {φk(·)}k as the eigenfunctions of Σ(s, s′)
. Proxy time-varying basis coefficients:

ξW,ijk = ∫ {Yij(s) − µ(s, Tij)} φk(s) ds

- For each k, the pairs {(ξW,ijk, Tij) : j = 1, …, mi}i describe the process dynamics.
Estimation roadmap

Step 1: Mean function µ(s, T)
Step 2: Marginal covariance function Σ(s, s′) and the orthogonal basis φk(s)'s
Step 3: Time-varying coefficients ξik(T) for every T
Step 4: Full trajectories Yi(·, T) for every T
Estimation

Case: fully observed curves with L2(S) error

Step 1: Mean function
. Estimate µ(s, T) using bivariate (tensor-product) spline smoothing (Wood, Bcs 2006) with a working independence assumption ⇒ µ̂(s, T)
Estimation (cont'd)

Step 2: Marginal covariance function
. Demean the data: Ỹij(s) := Yij(s) − µ̂(s, Tij)
. Estimate Σ(s, s′) by the sample covariance of the demeaned data:

Σ̂(s, s′) = (∑_{i=1}^n mi)^{−1} ∑_{i=1}^n ∑_{j=1}^{mi} Ỹij(s) Ỹij(s′)

. Spectral decomposition of Σ̂(s, s′) gives {λ̂k, φ̂k(s)}k
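Step 2 can be sketched directly: pool the demeaned curves over all (i, j), average the outer products, and eigendecompose. The REM-type generating model below is an illustrative stand-in for real data; all names and values are our own.

```python
import numpy as np

rng = np.random.default_rng(5)
s = np.linspace(0, 1, 101)
ds = s[1] - s[0]
n, m = 150, 4                                    # n subjects, m visits each
phi1 = np.sqrt(2) * np.sin(2 * np.pi * s)       # the true (single) basis fn

curves = []
for i in range(n):
    z0, z1 = rng.normal(scale=[1.0, 0.5])        # subject random effects
    for T in rng.uniform(0, 1, m):               # visit times T_ij
        curves.append((z0 + z1 * T) * phi1)      # xi_i(T) = z0 + z1*T (REM)
Ytilde = np.array(curves)                        # all demeaned curves, pooled

# Pooled marginal covariance and its eigenanalysis
Sigma_marg = Ytilde.T @ Ytilde / len(Ytilde)
evals, evecs = np.linalg.eigh(Sigma_marg)
lam1, phi1_hat = evals[-1] * ds, evecs[:, -1] / np.sqrt(ds)
align = abs(np.sum(phi1_hat * phi1) * ds)        # |<phi1_hat, phi1>| in L2
```

The leading eigenvalue of the marginal covariance estimates the average score variance E[ξ(T)^2] over the sampling distribution of T, here 1 + 0.25/3 for Var(ζ0) = 1, Var(ζ1) = 0.25 and T uniform on [0, 1].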
Estimation (cont'd)

Step 3: Time-varying coefficients ξik(T) for every T
. For each k, estimate the proxy ξW,ijk = ξik(Tij) + eijk by

ξ̂W,ijk = ∫ Ỹij(s) φ̂k(s) ds (numerical integration)

. Use standard longitudinal/sparse functional data methods to analyse {(ξ̂W,ijk, Tij) : j = 1, …, mi}i and estimate the temporal covariance
. Predict ξik(T) by ξ̂ik(T) for every T using this estimated covariance
Estimation (cont'd)

Step 4: Predict the full trajectory Yi(·, T) for every T:

Ŷi(·, T) = µ̂(·, T) + ∑_{k=1}^K ξ̂ik(T) φ̂k(·)
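Step 4 is pure assembly. A minimal sketch, in which the 'estimated' mean surface, eigenfunction, and fitted REM line are hypothetical placeholders for the output of Steps 1-3:

```python
import numpy as np

s = np.linspace(0, 1, 101)
T_new = 0.75                                     # a new visit time

mu_hat = lambda s, T: 0.5 + 0.1 * T + 0.0 * s    # placeholder mean surface
phi_hat = np.sqrt(2) * np.sin(2 * np.pi * s)     # K = 1 placeholder eigenfn
xi_hat = lambda T: 0.8 - 0.3 * T                  # placeholder fitted REM line

# Assemble the predicted curve at time T_new
Y_pred = mu_hat(s, T_new) + xi_hat(T_new) * phi_hat
```

Because the basis is time-invariant, only the scalar coefficients ξ̂ik(T) need to be evaluated at the new time; the functions of s are reused as-is.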
Theoretical properties

Lemma 1 (marginal covariance): Under regularity assumptions, ‖Σ̂(·, ·) − Σ(·, ·)‖ →p 0.
. Consistency results for λ̂k and φ̂k(·).

Lemma 2 (proxy time-varying basis coefficients): Under regularity assumptions, supj |ξ̂W,ijk − ξW,ijk| →p 0.
. Consistency of the predicted trajectories Ŷi(s, T).
Implementation in R

Software implementation in R (Wrobel, Park, Staicu, and Goldsmith, Stat 2016):
. refund: fpca.lfda
. refund.shiny: plot_shiny for visualization
Simulation experiment

Generating model:

Yij(s) = µ(s, Tij) + ξi1(Tij) φ1(s) + ξi2(Tij) φ2(s) + εij(s)

. Covariance structures for the time-varying coefficients ξik(T): (i) NP; (ii) REM; (iii) Exp
. Error structure εij(s): smooth + white noise (SNR = 1)
. Grid points for s: 101 equally spaced points in [0, 1]
. For each i, {Tij : j = 1, 2, …, mi} are randomly sampled from 41 equally spaced points in [0, 1]
Prediction performance and computational efficiency comparison:

. Proposed method
. Naïve: average the subject's previous curves to predict the current trajectory
. Chen & Müller (JASA 12): using time-dependent orthogonal functions φk(s|T)
Results

mi ∼ {8, …, 12}

          n     IN-IPE   IN-IPE (naïve)   OUT-IPE   OUT-IPE (naïve)
NP      100     0.406    7.790            0.988     11.478
        300     0.313    7.773            0.559     11.349
        500     0.288    7.779            0.455     11.262
REM     100     0.328    1.199            1.011     2.160
        300     0.265    1.197            0.675     2.160
        500     0.247    1.197            0.571     2.150
Exp     100     0.554    1.528            1.426     2.520
        300     0.508    1.531            1.143     2.498
        500     0.494    1.530            1.074     2.492
Results (cont'd)

. Computationally faster compared to available approaches

mi ∼ {8, …, 12}

                Chen & Müller (JASA 12)            Proposed method
          n     IN-IPE   OUT-IPE   time (sec)      IN-IPE   OUT-IPE   time (sec)
NP      100     0.880    2.221     983.872         0.406    0.988     7.369
        300     0.622    1.468     1659.611        0.313    0.559     15.892
        500     0.556    1.298     2502.462        0.288    0.455     21.418
REM     100     0.424    1.359     1084.753        0.328    1.011     9.282
        300     0.289    0.729     1955.193        0.265    0.675     11.347
        500     0.257    0.614     2947.126        0.247    0.571     22.559
Exp     100     0.634    1.642     1556.182        0.554    1.426     7.514
        300     0.549    1.251     1959.219        0.508    1.143     16.229
        500     0.531    1.155     2865.041        0.494    1.074     17.109
DTI data analysis

. FA along the CCA for MS patients
. 162 MS patients, each observed at between 1 and 8 hospital visits
. Tij: hospital visit time (mean = 2.6 visits/subject)
. Yij(·): FA profile (93 locations along the CCA) for subject i at time Tij
. 421 total curves

Objectives:
. Dynamic behavior of FA over time
. Predict the FA profile at a subject's future visit
DTI study (cont'd)

Model assumption (guided by exploratory analysis of the DTI data):
. Yij(s) = µ(s) + ∑_{k=1}^K ξik(Tij) φk(s) + εij(s)
. Time-varying coefficients: ξik(Tij) = b0ik + b1ik Tij (REM)

Estimation (longitudinal functional data analysis for the DTI data):
. φk(·): eigenfunctions of the estimated marginal covariance Σ̂(·, ·)
. Fixing the percentage of explained variance at 95% → K = 10
DTI study results (time dynamics)

Figure: Estimated basis functions φ̂k(s) for k = 1, 2 and 3
DTI study results (time dynamics)

Figure: Time-varying coefficients ξ̂ik(T) for k = 1, 2 and 3 (REM)
DTI study (cont'd)

Predicted values of FA for the last visits of three randomly selected subjects: actual (gray) / proposed (black) / naïve (dashed).

[Figure: FA vs tract for subject A (visit 6), subject B (visit 2), and subject C (visit 3), showing the actual observation, the naïve prediction, and our prediction.]

                  Naïve   Greven et al. (EJS 10)   Chen & Müller (JASA 12)   Proposed
IN-IPE ×10^2      -       2.66                     3.76                      2.31
OUT-IPE ×10^2     3.52    -                        8.71                      3.48
Final remarks

. Sample of independent curves
  - Smoothing using a pre-specified basis
  - fPCA
. Longitudinally observed curves
  - Process dynamics + future curve prediction
. Software implementation/visualization in R
Thank you!

Comments? Questions? [email protected]