Florida State University Libraries
Electronic Theses, Treatises and Dissertations
The Graduate School
2018
Elastic Functional Principal Component Analysis for Modeling and Testing of Functional Data
Megan Duncan
Follow this and additional works at the DigiNole: FSU's Digital Repository. For more information, please contact [email protected]
FLORIDA STATE UNIVERSITY
COLLEGE OF ARTS AND SCIENCES
ELASTIC FUNCTIONAL PRINCIPAL COMPONENT ANALYSIS FOR MODELING AND
TESTING OF FUNCTIONAL DATA
By
MEGAN DUNCAN
A Dissertation submitted to the Department of Statistics
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
2018
Copyright © 2018 Megan Duncan. All Rights Reserved.
Megan Duncan defended this dissertation on April 19, 2018. The members of the supervisory committee were:
Anuj Srivastava
Professor Directing Thesis
Eric Klassen
University Representative
Fred Huffer
Committee Member
Wei Wu
Committee Member
The Graduate School has verified and approved the above-named committee members, and certifies that the dissertation has been approved in accordance with university requirements.
TABLE OF CONTENTS
List of Tables
List of Figures
Abstract

1 Introduction
  1.1 Motivation for Phase and Amplitude Separation
  1.2 Background Material: Mathematical Amplitude Framework
    1.2.1 Phase Amplitude Separation
    1.2.2 Space of Amplitudes
    1.2.3 Space of Time Warping Functions
    1.2.4 Distance Metric
  1.3 Motivation for Testing Phase Variability
  1.4 Overview of Dissertation

2 Methods of Modeling Functional Data
  2.1 Models for Functional Data
    2.1.1 Functional Principal Component Analysis in SRVF Space
    2.1.2 FPCA of Phase and Amplitude Components
    2.1.3 Elastic FPCA
  2.2 Results
  2.3 Conclusion

3 Metric-Based Hypothesis Testing for Phase Variability in Functional Data
  3.1 Background Material
  3.2 Metric-Based Hypothesis Tests
    3.2.1 Friedman-Rafsky Test
    3.2.2 Schilling Nearest Neighbors Test
  3.3 Results
    3.3.1 Basic Examples
    3.3.2 Pseudo-Bootstrap
  3.4 Additional Work: Energy Test and Permutation Distribution
    3.4.1 Energy Test and Permutation Distribution Methods
    3.4.2 Energy Test and Permutation Distribution Results
  3.5 Conclusion

4 Model-Based Hypothesis Testing for Phase in Functional Data
  4.1 Background Material: Review of Functional Data Models
    4.1.1 Functional Principal Component Analysis
    4.1.2 Elastic Functional Principal Component Analysis
  4.2 Introduction to Hypothesis Testing for Phase Variability
  4.3 Background Material: Concordance Correlation Coefficient
    4.3.1 Concordance Correlation Coefficient in R1
    4.3.2 Concordance Correlation Coefficient in L2
  4.4 Methods of Hypothesis Testing for Phase with CCC
    4.4.1 Applying Concordance Correlation Coefficient to Hypothesis Testing
    4.4.2 Simple Concordance Correlation Coefficient for Two Comparisons
  4.5 Results
    4.5.1 Simulations
    4.5.2 Real Data Examples
  4.6 Conclusion

5 General Discussion and Future Work

Appendix
A Proof of Theorem in Chapter 4
B IRB Approval

Bibliography
Biographical Sketch
LIST OF TABLES
2.1 The table shows the parameters used for simulating the functions. Φ(a, b) represents a Normal PDF with mean a and variance b. The varying parameters demonstrate a variety of amplitude and time warping variability.
3.1 Distances between observations are shown above. Using K = 2 as an example, the two nearest neighbors for each observation are in bold for that row. For example, A1's nearest neighbors are A2 and B1. We then count the number of these neighbors from the same sample as the observation (last column). There are 10 such neighbors, resulting in TN_K = 10/12.
3.2 The table shows the p-values using FRT to compare unaligned to aligned and aligned to aligned functions.
3.3 The table shows the p-values using FRT to compare unaligned to aligned and aligned to aligned functions.
4.1 A table showing the correlation and CCC between X, Y and X, Z.
4.2 The table shows the parameters used for simulating the functions. Φ(a, b) represents a Normal PDF with mean a and variance b. The varying parameters demonstrate a variety of amplitude and time warping variability.
LIST OF FIGURES
1.1 The Berkeley Growth Curves show the change in female children's height (cm/year) from the ages of 1 to 18.
1.2 We show the average daily temperature (Celsius) over the course of the year for 35 locations in Canada.
1.3 On the left are the Akamai Technologies closing stock prices for each business day in 2014. We smoothed the stock data to eliminate seasonal variation (shown on the right).
1.4 On the left are simulated functions. On the right are the pointwise mean (black) and the pointwise mean plus/minus one standard deviation (red).
1.5 On the left are the Berkeley growth curves. On the right are the pointwise mean (black) and the pointwise mean plus/minus one standard deviation (red).
1.6 This figure illustrates the Multiple Alignment Algorithm. The algorithm initializes (top row) time warping as identity. For each iteration, we show the phase (left column), amplitudes (middle column), and mean (right column). After initialization, we show the next three iterations (second through fourth rows). We terminate the process on the twentieth iteration (bottom row).
1.7 Growth rates (cm/year) of girls between the ages of 1 and 18. The first plot shows the unaligned growth curves. The curves are then aligned using Multiple Alignment and are separated into amplitudes (center) and time warping functions (right).
1.8 Left: Unimodal functions with a large amount of phase variability. We use the Multiple Alignment algorithm to separate the functions into amplitude (middle) and phase components (right).
1.9 Left: Unimodal functions with a large amount of phase variability. We use the Multiple Alignment algorithm to separate the functions into amplitude (middle) and phase components (right).
2.1 Representation of Algorithm 5. A: Use {qi} (black dot) to find {bj}. The blue curve represents all possible functions using this basis set. B: Fix {bj} (now black), find optimal {cij} (red dot) by minimizing distance (purple line). C: Find {qi} (green dot), at minimum distance from reparameterizations of {qi} to the reconstruction. D: Find the optimal reconstruction of {qi}; repeat until converged. E: Use {qi} to find a new basis set (blue). F: Repeat steps B through E until converged on {bj}. This image is only meant to be an abstract representation to give a more intuitive understanding of the algorithm. It does not reflect all the details of the algorithm, nor does it properly reflect the mathematics behind the algorithm.
2.2 The original sampled amplitude SRVF functions: {qi}. These amplitudes have almost no variability (left) and a lot of variability (right).
2.3 The original sampled amplitude functions: {fi}. Note that the inverse SRVFs of the basis functions are not orthonormal, so the variability in these functions does not necessarily correspond to the variability seen in Figure 2.2.
2.4 The original sampled time warping functions: {γi}. Note that these functions are sampled in a vector space and mapped to Γ. Therefore the variability in these functions does not necessarily correspond to the variability described in Table 2.1.
2.5 The original SRVF functions: {qi}. Note that the same amplitudes are used for each row and the same time warping functions are used for each column.
2.6 The original sampled functions: {fi}. Note that the same amplitudes are used for each row and the same time warping functions are used for each column.
2.7 The reconstructed SRVF functions: {q̂i} for the three methods. All three methods converge to the same set of functions. Computational noise causes the functions to look more jagged; this noise is very minimal and is avoided when observing the {f̂i} in Figure 2.8.
2.8 The reconstructed functions: {f̂i} for the three methods. Note that these are the inverse SRVFs of the reconstructions, as opposed to the reconstructions themselves.
2.9 The amplitude reconstructions of the SRVFs and functions, {q̂i} and {f̂i}, for Separate FPCA. Note that the time warping functions seem to have accounted for some of the initial amplitude variability, and therefore there is almost none present in the reconstructions.
2.10 The basis functions of the SRVFs, scaled by the variance of the coefficients, used in the reconstruction process of the amplitudes. Computational noise causes these functions to appear jagged, although this error is minimal.
2.11 The first column contains basis functions of the {vi}, scaled by the variance of the coefficients, used in the reconstruction process of the amplitudes from Separate FPCA. The second column shows the {vi} themselves. The final column is the exponential mapping of the {vi}, i.e., the {γi}.
2.12 Starting in the top row and moving left to right: {γi} from the Elastic FPCA method for J = 0, ..., 4. A plot of the {γi} at J = 0 is included for reference. Note that this is the only part of any of the methods where J = 0 is more than just a simple mean of the functions.
2.13 These are the error rates for all 9 variability settings. Each individual plot shows the error rates for J = 0, ..., 4 of the reconstructions for FPCA (blue), Separate FPCA (green), and Elastic FPCA (red), using Equation 2.6.
3.1 We observe sampled functions with phase variability (top) and without phase variability (bottom) in the first column. These functions are then split into amplitude (second column) and phase (third column) components.
3.2 The first image shows the original growth curves (cm/year). Their SRVFs are taken and then aligned, shown in the second and third images respectively. The L2 distances between all the curves in the second and third images are displayed in the fourth image. The first half of the indices correspond to the unaligned SRVFs and the second half correspond to the aligned SRVFs.
3.3 A Minimal Spanning Tree (left) is formed from two samples. As described in Algorithm 6, the tree is then cut where edges connect nodes from different samples (right). There are now TN = 13 separate trees remaining. This example shows that one sample (red) is completely connected in the original tree. Some nodes are only connected to the original tree through the opposite sample, and these now form trees of a single node.
3.4 Various simulated functions before (left) and after alignment (center). The second round of alignment (right) shows both the functions from the first round of alignment (blue) and the second (red) for comparison.
3.5 The TN (left) and p-values (right) for the simulated functions using the SNNT method are shown as a function of K. The results are shown for comparing unaligned functions to aligned (blue) and aligned to aligned (red) functions.
3.6 Various growth curves before (left) and after alignment (center). The second round of alignment (right) shows both the functions from the first round of alignment (blue) and the second (red) for comparison.
3.7 The TN (left) and p-values (right) for the Berkeley growth curves using the SNNT method are shown as a function of K. The results are shown for comparing unaligned functions to unaligned (blue) and aligned to aligned (red) functions.
3.8 Unaligned simulated functions are compared to aligned simulated functions using the sampling technique described. Results are shown for FRT and for K = 6 in SNNT.
3.9 Unaligned growth curves are compared to aligned growth curves using the sampling technique described. Results are shown for FRT and for K = 6 in SNNT.
3.10 We want to know: are the unaligned functions fi (blue) significantly different from their aligned versions (red)? The test statistic is TN = 1540.8 − 986.5 − 68.9 = 485.4.
3.11 Distribution of TN under the null hypothesis (blue). Estimated TN from the original sample (red).
3.12 Amplitudes (top) and phases (bottom) used for simulation. No axes are shown, but all functions are plotted on the same scale. These amplitudes and phases are composed to form the initial functions used in the simulation.
3.13 The amplitudes and phases are composed to simulate a variety of functions. No axes are shown, but all functions are plotted on the same scale. The variability in the amplitudes increases toward the right. The variability in the phases increases upward.
3.14 For each of the nine cases, we see the test statistic (red) along with the permutation distribution. The nine cases shown are all plotted on the same axes to demonstrate the change in significance as phase and amplitude variability increase.
3.15 This image illustrates when phase variability is significant or not significant. The center region is where the methods proposed in Chapter 3 struggle.
4.1 The paired scores for comparison are shown on the left. The identity line (black) has been added for reference. A bootstrapped distribution for ρc is shown on the right. The vertical line shows the location of the point estimate.
4.2 Left: One pair of the twenty simulated functions. Right: The bootstrapped distribution of the CCC.
4.3 We want to measure the agreement between the scores on the x-axis and those on the y-axis for two samples, Y (red) and Z (blue), shown on the left. The line of identity (black) has been added for reference. On the right, the bootstrapped distribution of ρ_c^y is compared to the bootstrapped distribution of ρ_c^z.
4.4 Simulation: The original simulated functions. Each row uses the same amplitudes with increasing time warping variance moving to the right. Each column uses the same time warping functions, but increasing amplitude variance moving upward.
4.5 Simulation: Each boxplot shows the distribution of ρ_c^{F(J)} (blue) and ρ_c^{E(J)} (red) for each J along the x-axis.
4.6 Simulation: The p-values for the nine cases are shown for J = 0, ..., 50. The amplitude variance increases as the plots move to the right. Each plot shows the p-values for small (blue), medium (red), and large (yellow) time warping variance.
4.7 Female Growth Curves: Left: The original female growth curves. Middle and Right: Reconstructions of the growth curves using J = 1 for FPCA and Elastic FPCA, respectively. Note that the functions are shown in the original space and not the SRVFs.
4.8 Female Growth Curves: Each boxplot shows the distribution of ρ_c^{F(J)} (blue) and ρ_c^{E(J)} (red) for each J along the x-axis.
4.9 Female Growth Curves: The p-values for the female growth curves are close to zero until around J = 28.
4.10 Male Growth Curves: Left: The original male growth curves. Middle and Right: Reconstructions of the growth curves using J = 1 for FPCA and Elastic FPCA, respectively. Note that the functions are shown in the original space and not the SRVFs.
4.11 Male Growth Curves: Each boxplot shows the distribution of ρ_c^{F(J)} (blue) and ρ_c^{E(J)} (red) for each indicated J along the x-axis.
4.12 Male Growth Curves: The p-values for the male growth curves are close to zero until around J = 25.
4.13 Tecator Data: Left: The original tecator functions. Middle and Right: Reconstructions of the tecator functions using J = 1 for FPCA and Elastic FPCA, respectively. Note that the functions are shown in the original space and not the SRVFs.
4.14 Tecator Data: Each boxplot shows the distribution of ρ_c^{F(J)} (blue) and ρ_c^{E(J)} (red) for each indicated J along the x-axis.
4.15 Tecator Data: The p-values for the tecator data are close to zero until around J = 20.
4.16 Temperature Data: Left: The original temperature functions. Middle and Right: Reconstructions of the temperature functions using J = 1 for FPCA and Elastic FPCA, respectively. Note that the functions are shown in the original space and not the SRVFs.
4.17 Temperature Data: Each boxplot shows the distribution of ρ_c^{F(J)} (blue) and ρ_c^{E(J)} (red) for each indicated J along the x-axis.
4.18 Temperature Data: The p-values for the weather data are close to zero until around J = 5.
ABSTRACT
Statistical analysis of functional data requires tools for comparing, summarizing and modeling
observed functions as elements of a function space. A key issue in Functional Data Analysis (FDA)
is the presence of phase variability in the observed data. A successful statistical model of
functional data has to account for the presence of phase variability. Otherwise the ensuing inferences
can be inferior. Recent methods for FDA include steps for phase separation or functional alignment.
For example, Elastic Functional Principal Component Analysis (Elastic FPCA) uses the strengths
of Functional Principal Component Analysis (FPCA), along with the tools from Elastic FDA, to
perform joint phase-amplitude separation and modeling. A related problem in FDA is to quantify
and test for the amount of phase in a given data. We develop two types of hypothesis tests for
testing the significance of phase variability: a metric-based approach and a model-based approach.
The metric-based approach treats phase and amplitude as independent components and uses their
respective metrics to apply the Friedman-Rafsky Test, Schilling’s Nearest Neighbors, and Energy
Test to test the differences between functions and their amplitudes. In the model-based test, we
use Concordance Correlation Coefficients as a tool to quantify the agreement between functions
and their reconstructions using FPCA and Elastic FPCA. We demonstrate this framework using a
number of simulated and real data sets, including weather, tecator, and growth data.
CHAPTER 1
INTRODUCTION
Functional data is an important topic with applications in numerous fields, such as biology, physics, and mathematics. Various applications have been explored with functional data: gait measurements, tecator data, stock rates, childhood growth rates, weather data, and more.
The importance of functional data analysis can best be explained through an example. Figure 1.1 shows the Berkeley growth curves [9]. The children's heights were recorded on a quarterly basis from the ages of 1 to 18 years. We could treat each record as a vector of 69 time points instead of as a function over time. However, children's heights change over time, not just at the measured time points, so it makes more sense to treat the records as functional data.
Another example is seen in Figure 1.2, which plots the average daily temperature for each day of the year at 35 locations in Canada [17]. We could treat each location's record as a vector of 365 days, but temperature does not change discretely, so it too makes more sense to consider as a function.
According to Morris [15], functional regression is the fastest growing area in functional data analysis. This developing area includes Functional Predictor Regression, Functional Response Regression, and Function-on-Function Regression. Three other areas of Functional Data Analysis are replication, regularization, and basis functions. Replication and regularization are key components of statistics in general, but they play an especially important role in functional data. Functional data are often noisy, and our goal is to explain the variation between replicates despite that noise. We can smooth functions to help eliminate excessive noise.
Figure 1.3 demonstrates the advantage of smoothing using stock market data. The plot on the left shows the closing stock price of Akamai Technologies on every business day in 2014. The day-to-day noise is sometimes referred to as seasonal variation. If we are interested in the long-term trends of this particular stock, we may want to eliminate this seasonal variation. The plot on the right shows the Akamai stock prices smoothed with a moving average. Now it is easier to see the general trend of the Akamai stock prices over the course of the year. We will not go into the details of this method, since it is not of interest in our problem.
Figure 1.1: The Berkeley Growth Curves show the change in female children's height (cm/year) from the ages of 1 to 18.
Figure 1.2: We show the average daily temperature (Celsius) over the course of the year for 35 locations in Canada.
Figure 1.3: On the left are the Akamai Technologies closing stock prices for each business day in 2014. We smoothed the stock data to eliminate seasonal variation (shown on the right).
The first part of this dissertation focuses on basis functions and using them to model functional data. The various models are discussed in Chapter 2. These models originate from Principal Component Analysis (PCA) [16]. PCA is a method of representing correlated, multivariate observations as uncorrelated variables called principal components, which allows us to express observations using fewer variables. The Karhunen-Loeve Expansion Theorem [10] showed that the same principles can be applied to functions. The first model we explore, Functional Principal Component Analysis (FPCA) [3], uses the Karhunen-Loeve Expansion Theorem to create a PCA equivalent for functions.
The inclusion of a time warping component in the model has been explored through Separate Functional Principal Component Analysis (Separate FPCA) [23]. This model stems from the idea of representing functions as an amplitude component and an orthogonal phase component [20]. In Separate FPCA, the amplitude component is expressed using FPCA. The phase component is also expressed using FPCA, after being mapped to a vector space.
The third model we explore, Elastic Functional Principal Component Analysis (Elastic FPCA), is an expansion of work completed by Kneip [12] and Tucker [22]. Elastic FPCA models functions with an FPCA basis while considering the amplitude and phase components jointly. Details of Elastic FPCA and the other two models are presented in Chapter 2. The goal of that chapter is to explore the reconstruction of functions with and without time warping. As we will see, models that include time warping can overfit data in which time warping isn't present.

The second part of this dissertation focuses on hypothesis testing. This chapter first discusses phase-amplitude separation before giving motivation for hypothesis testing.
Throughout this dissertation, we show examples of our results on both simulated and real data sets. The real data sets include the NIR spectra data [4], the Berkeley Growth Curves [9], and the Canadian Weather data [17]. The NIR spectra data set, also referred to here as the tecator data, used a "Tecator Infratec spectrometer that measures the absorbencies at 100 wavelengths in the region 850-1050 nm" [4]. The Berkeley Growth Curves show the change in children's heights from the ages of one to eighteen. The growth curves are a good example of time warping in a data set, since children tend to have growth spurts at various ages. The Canadian Weather data shows the average daily temperature for each day of the year at 35 separate weather stations in Canada.
1.1 Motivation for Phase and Amplitude Separation
Returning to the idea of growth spurts in children, we may want to consider that children have growth spurts at different times. One child may have growth spurts at the ages of 1, 4, 8, and 12, while another has growth spurts at the ages of 1, 3, 8, and 13. Both children have growth spurts, but at different ages; we can consider this a phase component. Children will also have different amplitudes, or changes in height. For example, one child might grow 10 cm/year at the age of 8, while another grows 7 cm/year. We provide formal definitions of phase and amplitude in the next section.
We need to take phase and amplitude into consideration in order to compute more accurate statistics. Consider Figure 1.4 as an example, which shows sampled functions on the left and their pointwise sample mean and sample standard deviation on the right. When we take the pointwise mean of these functions, we get a plateau-shaped function. The mean does not give an accurate representation of the individual functions, which have distinct peaks. The same is true for the standard deviation. If we were to separate out the phase and amplitude components, we could get a more accurate representation of the functions.
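The plateau effect described above is easy to reproduce numerically. In this illustrative sketch (the Gaussian-bump form of the functions and the distribution of peak locations are our own choices, not the simulation used in Figure 1.4), the pointwise mean of randomly shifted peaks is far lower and flatter than any individual function:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 201)

# Simulate unimodal functions whose peak location varies (phase variability).
centers = rng.uniform(0.3, 0.7, size=30)
F = np.exp(-0.5 * ((t[None, :] - centers[:, None]) / 0.05) ** 2)  # 30 x 201

cross_mean = F.mean(axis=0)       # pointwise (cross-sectional) mean
peak_heights = F.max(axis=1)      # each individual function peaks near 1

# The pointwise mean is a low plateau: its maximum falls well below
# the height of every individual peak.
print(peak_heights.min(), cross_mean.max())
```

Averaging after aligning the peaks (separating phase from amplitude) would instead recover a single sharp peak representative of the sample.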
The same issues arise with the Berkeley growth curves. Figure 1.5 shows the original growth curves on the left and their pointwise sample mean and sample standard deviation on the right. If we take a pointwise average of all the children's growth curves, we do not get a very accurate representation of growth spurts. It appears as though children grow quickly at first and then have a relatively flat rate of growth until around the age of 12. However, we can see from the individual growth curves that children have multiple distinct growth spurts throughout their childhood. We again conclude that separating out phase and amplitude may provide a more accurate representation of the functions.
Figure 1.4: On the left are simulated functions. On the right are the pointwise mean (black) and the pointwise mean plus/minus one standard deviation (red).
Figure 1.5: On the left are the Berkeley growth curves. On the right are the pointwise mean (black) and the pointwise mean plus/minus one standard deviation (red).
An L2 function can be described in terms of amplitude variability and phase (time warping) variability. We introduce the framework this dissertation uses in the next section.
1.2 Background Material: Mathematical Amplitude Framework
It is important to understand the spaces and the distance metric we are using. This section provides an overview of the space of amplitudes, the space of time warping functions, and the distance metric used. More details can be found in the textbook "Functional and Shape Data Analysis" [20]. We begin by introducing the distance metric and multiple alignment. We then discuss the details of the amplitude and phase spaces in the following subsections.
1.2.1 Phase Amplitude Separation
In the previous section, we gave an intuitive idea of what is meant by phase and amplitude. Now the goal is to separate the functions into their phase and amplitude components. Essentially, we align the peaks and valleys of the functions through composition with time warping functions. We then consider the aligned functions to be the amplitude component and the time warping functions to be the phase component. We remind the reader that this subsection gives only a general overview of phase-amplitude separation; the following subsections go into detail.

Algorithm 1 gives a general overview of how phase and amplitude separation is conducted. Our goal is to find a set of time warping functions which minimizes the variance of the amplitudes.
Algorithm 1 Multiple Alignment Overview
1. Initialize the mean of the functions
2. Align each function to the template mean
3. Update the mean
4. Repeat steps 2 and 3 until converged
Figure 1.6 illustrates Algorithm 1 using the Female Berkeley Growth data. We initialize the
process (top row) with an identity phase component (first column), the details of which are explained
in a later subsection. Since the phases are all identity, the amplitudes (second column) are the
original functions. We also show the mean of these amplitudes (third column). On the first
iteration (second row), we find time warping functions (second row, first column), which minimize
the distance between each original function and the mean of the amplitudes. Note that we compute
this distance in a pairwise fashion on a given iteration. We then compose the time warping functions
with the original functions to get a new set of amplitudes (second row, second column). We then
update the mean of the amplitudes using the pointwise mean of the new amplitudes (second row,
third column).
We repeat the process of computing the phase, amplitudes, and updating the mean of the
amplitudes until some convergence criterion has been met. In the particular case demonstrated in
Figure 1.6, we terminate the process after the twentieth iteration (bottom row).
1.2.2 Space of Amplitudes
We begin by discussing the general functions and amplitude space. Note that the L2 functions
exist on a closed interval [a, b]. For simplicity we will assume a = 0, b = 1 unless otherwise specified
for a real dataset. For a set of N L2 functions $\{f_i(t)\}$, $i = 1, \ldots, N$, we compute the Square-Root
Velocity Functions (SRVFs), $\{q_i(t)\}$:

$$ q_i(t) = \begin{cases} \dfrac{\dot{f}_i(t)}{\sqrt{|\dot{f}_i(t)|}} & \text{if } \dot{f}_i(t) \text{ exists and is nonzero} \\ 0 & \text{otherwise.} \end{cases} $$

A discussion of why we use the SRVF instead of the original functions is given in [19].
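As a concrete illustration, the SRVF can be computed on a discrete grid with NumPy. This is a minimal sketch; the finite-difference derivative and the small epsilon guard are our implementation choices, not part of the definition:

```python
import numpy as np

def srvf(f, t):
    """Square-Root Velocity Function: q(t) = f'(t) / sqrt(|f'(t)|).

    f : array of function values sampled at the time grid t.
    A small epsilon guards the division where f'(t) is (near) zero,
    in which case the SRVF is defined to be 0.
    """
    fdot = np.gradient(f, t)                  # finite-difference derivative
    eps = 1e-12
    return np.where(np.abs(fdot) > eps,
                    fdot / np.sqrt(np.abs(fdot) + eps), 0.0)

t = np.linspace(0, 1, 201)
q = srvf(t ** 2, t)                           # f(t) = t^2, so q(t) ~ sqrt(2 t)
```

For f(t) = t² the derivative is 2t, so away from t = 0 the SRVF is approximately √(2t); the boundary points are less accurate because of the one-sided differences.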
The SRVFs can be expressed as follows:

$$ q_i(t) = \mu(t) + \sum_{j=1}^{\infty} c_{ij} b_j(t), $$

where $\mu$ is an underlying mean, $\{b_j\}$ is an orthonormal basis of $L^2$, and $c_{ij}$ has mean 0 and
variance $\sigma_j^2$. Simulated functions will use the Fourier basis with normally distributed coefficients
for simplicity.
In practice we use a finite basis, because an infinite basis is not computationally feasible:

$$ q_i(t) = \mu(t) + \sum_{j=1}^{J} c_{ij} b_j(t) + \varepsilon_i(t), $$

where $J$ is the number of basis functions and $\varepsilon_i(t)$ has mean 0 and variance $\sigma_\varepsilon^2$. In application,
since the functions are discretized at $M$ time points, $J \le \min(M, N)$. When discussing the basis
sets, it will be assumed that $\mathrm{var}(c_{ij}) > \mathrm{var}(c_{ik})$ when $j < k$, as is common practice.
(Figure 1.6 grid: rows correspond to iterations 0, 1, 2, 3, ..., 20; columns show the Phase, Amplitude, and Mean of Amplitude at each iteration.)
Figure 1.6: This figure illustrates the Multiple Alignment Algorithm. The algorithm initializes (top row) time warping as identity. For each iteration, we show the phase (left column), amplitudes (middle column), and mean (right column). After initialization, we show the next three iterations (second through fourth rows). We terminate the process on the twentieth iteration (bottom row).
1.2.3 Space of Time Warping Functions
The time warping functions are represented by $\{\gamma_i\} \subset \Gamma$, where $\Gamma$ is the set of all diffeomorphisms
from $[0, 1]$ to $[0, 1]$. These functions act on the SRVFs through the group action:

$$ (q_i, \gamma_i) = q_i(\gamma_i)\sqrt{\dot{\gamma}_i}. $$

This action preserves the norm of the SRVFs, i.e. $\|q_i\| = \|(q_i, \gamma_i)\|$. To express these time
warping functions with a set of basis functions, they are mapped into a vector space:

$$ v_i(t) = G(\gamma_i(t)) = \frac{\theta_i}{\sin(\theta_i)} \left( \sqrt{\dot{\gamma}_i(t)} - \cos(\theta_i) \right), \qquad (1.1) $$

where

$$ \theta_i = \cos^{-1}\left( \int_0^1 \sqrt{\dot{\gamma}_i(t)}\, dt \right). $$
These functions can now be represented as:

$$ v_i(t) = \mu^{(v)}(t) + \sum_{j=1}^{\infty} c^{(v)}_{ij} b^{(v)}_j(t), $$

where $\{b^{(v)}_j\}$ is an orthonormal basis of $L^2$, and $c^{(v)}_{ij}$ has mean 0 and standard deviation $\sigma^{(v)}_j$.
The parameter $\mu^{(v)}$ is taken to be zero. Note that if $\mu^{(v)}$ is not zero, the mean can be subtracted
from all $v_i$s, which corresponds to centering the $\gamma_i$s at $\gamma_{id}(t) = t$.
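The group action and the map G of Equation 1.1 can be sketched on a discrete grid as follows. This is a hedged illustration: the trapezoid-rule integral, the derivative clamping, and the guard for θ ≈ 0 (the identity warp, which maps to the zero vector as a limit) are our implementation choices:

```python
import numpy as np

def trap(y, x):
    """Trapezoid-rule integral (written out to avoid NumPy version differences)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def group_action(q, gamma, t):
    """Warp an SRVF: (q, gamma) = q(gamma(t)) * sqrt(gamma'(t))."""
    gdot = np.gradient(gamma, t)
    return np.interp(gamma, t, q) * np.sqrt(np.maximum(gdot, 0.0))

def G(gamma, t):
    """Map a warp gamma to the shooting vector v of Equation 1.1."""
    psi = np.sqrt(np.maximum(np.gradient(gamma, t), 0.0))
    theta = np.arccos(np.clip(trap(psi, t), -1.0, 1.0))
    if theta < 1e-10:                      # gamma ~ identity: limit is the zero vector
        return np.zeros_like(gamma)
    return theta / np.sin(theta) * (psi - np.cos(theta))

t = np.linspace(0, 1, 501)
q = np.sin(2 * np.pi * t)
gamma = t ** 2                             # a diffeomorphism of [0, 1]
qw = group_action(q, gamma, t)             # norm-preserving warp of q
```

Up to discretization error, the warped SRVF `qw` has the same L2 norm as `q`, and the identity warp maps to the zero vector under `G`.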
1.2.4 Distance Metric
The methods discussed in the previous subsections develop the distance metric:

$$ d(q_i, q_j) = \inf_{\gamma \in \Gamma} \|q_i - (q_j, \gamma)\|. $$

Variability from time warping can be estimated by finding a mean and a set of time warping
functions which minimize these distances:
Figure 1.7: Growth rates (cm/year) of girls between the ages of 1 and 18 years. The first plot shows the unaligned growth curves. The curves are then aligned using Multiple Alignment and are separated into amplitudes (center) and time warping functions (right).
$$ \{\hat{\gamma}_i\}, \hat{\mu} = \operatorname*{arginf}_{\{\gamma_i\} \subset \Gamma,\, \mu} \sum_{i=1}^{N} \|(q_i, \gamma_i) - \mu\|^2, $$

where $\mu = \frac{1}{N}\sum_{i=1}^{N}(q_i, \gamma_i)$. This process is conducted iteratively using Algorithm 2.
Algorithm 2 Multiple Alignment

1. Set $\gamma_i = \gamma_{id}$, for all $i$.

2. Interpolate $q_i$ at $\gamma_i$: $\tilde{q}_i = (q_i, \gamma_i)$.

3. Compute the estimated mean: $\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} \tilde{q}_i$.

4. Use the Dynamic Programming Algorithm [19]: $\gamma_i = \operatorname*{arginf}_{\gamma \in \Gamma} \|(q_i, \gamma) - \hat{\mu}\|^2$ for all $i$.

5. Repeat steps 2 through 4 until converged.
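The iteration in Algorithm 2 can be sketched as follows. The true infimum over Γ requires the Dynamic Programming Algorithm of [19]; as a simplification for illustration only, this sketch searches a one-parameter warp family γ_a(t) = (e^{at} − 1)/(e^a − 1), with a = 0 giving the identity. The names `warp`, `action`, and `align` are ours:

```python
import numpy as np

def warp(a, t):
    """One-parameter warp family on [0, 1]; a = 0 gives the identity warp."""
    return t if abs(a) < 1e-8 else (np.exp(a * t) - 1) / (np.exp(a) - 1)

def action(q, gamma, t):
    """Group action (q, gamma) = q(gamma) * sqrt(gamma')."""
    return np.interp(gamma, t, q) * np.sqrt(np.maximum(np.gradient(gamma, t), 0.0))

def align(qs, t, grid=np.linspace(-2, 2, 41), iters=10):
    """Multiple Alignment (Algorithm 2), with the DPA step replaced by a
    grid search over the warp family above (an illustrative simplification)."""
    n = len(qs)
    a = np.zeros(n)                                 # step 1: start at identity warps
    for _ in range(iters):
        aligned = [action(qs[i], warp(a[i], t), t) for i in range(n)]
        mu = np.mean(aligned, axis=0)               # step 3: template mean
        for i in range(n):                          # step 4: best warp per function
            costs = [np.sum((action(qs[i], warp(b, t), t) - mu) ** 2)
                     for b in grid]
            a[i] = grid[np.argmin(costs)]
    return a, np.mean([action(qs[i], warp(a[i], t), t) for i in range(n)], axis=0)
```

Because the mean update and the warp update each minimize the same objective in turn, the summed squared deviation from the template never increases across iterations; this coordinate-descent monotonicity is the property the real algorithm relies on as well.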
Figure 1.7 is an example of the Multiple Alignment algorithm. The figure shows growth curves
from the Berkeley Growth Study [9] on the left. We show the amplitude and time warping functions
in the middle and on the right, respectively. We align the growth curves to give a better
understanding of the pattern of growth spurts in children, without needing to worry about these
growth spurts occurring at different ages for different children.
1.3 Motivation for Testing Phase Variability
Once we understand phase-amplitude separation, it is important to determine when phase is
significant. Figure 1.8 and Figure 1.9 demonstrate the importance of testing for significance. Both
Figure 1.8: Left: Unimodal functions with a large amount of phase variability. We use the Multiple Alignment algorithm to separate the functions into amplitude (middle) and phase (right) components.
Figure 1.9: Left: Unimodal functions with a small amount of phase variability. We use the Multiple Alignment algorithm to separate the functions into amplitude (middle) and phase (right) components.
figures show sample functions (left) and their amplitudes (middle) and phases (right). In Figure
1.8, we separated the phase and amplitude components from a set of simulated functions using the
Multiple Alignment Algorithm. Figure 1.9 shows another example, however, it appears there is not
as extreme of a difference between the original functions and the amplitudes after alignment.
We can clearly see phase plays an important role in the first example. It is less clear in the
second example whether phase plays an important role. On the one hand, the functions look almost
identical after alignment. On the other hand, the phases are not identically the identity. The locations
where the phase is not the identity appear to be where the original functions are almost zero. We are left
with the question: is phase significant or not?
Morris [15] conducted a literature review on functional regression. He notes that most work in
functional data analysis focuses on point estimates. He also notes that existing work on confidence
intervals and hypothesis testing for functions focuses on pointwise estimation instead of on the
function as a whole. He concludes that more work is needed in this area.
Chapter 3 explores a metric-based approach to hypothesis testing for the presence of time
warping in functions, while Chapter 4 explores a model-based approach. Hypothesis testing allows
us to better understand the effects of the various models and to identify when overfitting with
a time warping function may be occurring.
Hagwood [7] conducted hypothesis tests on closed curves using time warping components in a
distance metric. Three preexisting tests for comparing two populations are used to conduct the
hypothesis tests: the Friedman-Rafsky Test [6], Schilling's Nearest Neighbors Test [18], and the Energy Test
[1]. Although our interest in hypothesis testing differs from Hagwood [7], this paper led us to
these three hypothesis testing methods.
Chapter 3 will use the Friedman-Rafsky Test (FRT) [6] and the Schilling Nearest Neighbors
Test (SNNT) [18] in our hypothesis tests. The FRT employs a minimal spanning tree to determine
the extent of the separation between two populations of observations. The SNNT uses a distance matrix
and indicator functions to compare the same separation. Later on in Chapter 3 we explore using
the Energy Test, which takes a similar approach to an ANOVA.
The main focus of Chapter 4 is on a model-based approach to creating a hypothesis test for
phase variability. This requires a paired test, something the tests we introduce in Chapter 3 do not
provide. Chapter 4 will review a few of the methods we attempted to use before
we came across Concordance Correlation Coefficients (CCC) [14]. CCC are a way of measuring
agreement between paired continuous data. King [11] expanded the use of CCC for continuous
and categorical data. Li [13] expanded the use of CCC for evaluating the agreement of two sets of
functions.
Our methods require comparing multiple concordance correlation coefficients. Fisher [5] introduced
the z-transformation for use with correlation coefficients. Steiger [21] proposed comparing
two correlations using a correlation matrix. These methods allowed Barnhart [2] to create a method
for comparing multiple CCCs. Unfortunately this method of comparison for CCC does not work
well in cases of almost complete agreement.
1.4 Overview of Dissertation
This dissertation will have the following discussions:
1. Chapter 2: Methods of Modeling Functional Data
(a) Introduction to Functional Principal Component Analysis (FPCA)
(b) Previous models of FPCA after phase-amplitude separation
(c) Introduce a new model for combining phase-amplitude separation with FPCA
2. Chapter 3: Metric-Based Hypothesis Testing for Phase Variability in Functional Data
(a) Review of the importance of hypothesis testing for phase in functional data analysis
(b) Using the Friedman-Rafsky Test and Schilling’s Nearest Neighbors in the hypothesis
testing
(c) Using the Energy Test with a permutation distribution for hypothesis testing
3. Chapter 4: Hypothesis testing for Phase Amplitude Separation using Models
(a) Review of the goals of hypothesis testing for phase in functional data analysis
(b) Introduction to Concordance Correlation Coefficients
(c) Using Concordance Correlation Coefficients in hypothesis testing for phase in functional
data
4. Each Chapter also includes various simulated and/or real data examples to demonstrate the
methods
CHAPTER 2
METHODS OF MODELING FUNCTIONAL DATA
As explained in Chapter 1, there are various L2 functional data sets which have been modeled in
the past. Understanding the variability in functions is important for estimation and prediction. The
challenges of understanding functional variability include working in an infinite dimensional space.
In addition, we run into issues with time warping of functions and issues with registration.
We will first focus on the issue of working in an infinite dimensional space. Dimension reduction
is possible through the use of basis functions. As mentioned in Chapter 1, there are various models
for regression on functional data. This dissertation will focus on linear models instead of more
complicated models. There are two main strategies used to reduce the dimensions in a linear
model: use a pre-determined set of basis functions or use a data driven set of basis functions.
Examples of pre-determined sets of basis functions include the Fourier Basis or the polynomial
basis functions. This dissertation will focus on using data driven methods, specifically those that
stem from Principal Component Analysis.
The goal of this Chapter is to understand the more substantial directions of variability both
from amplitudes and from time warping. To do this, we will study the reconstruction of functions
under three different models.
This chapter will treat functions as random quantities, which can be measured in various ways.
The goal will be to find a model which best represents the variability in the given dataset. We will
need metrics for measuring differences in the functions and ways of representing the functions.
Section 2 will discuss some of the theory behind these models including a distance metric.
Section 3 will discuss the methodology behind the three models: FPCA, Separate FPCA and
Elastic FPCA. Section 4 will show results from simulations and real data examples. Section 5 will
draw conclusions and point to future directions for exploration.
2.1 Models for Functional Data
This section will explore two previous models for functional data: Functional Principal Component
Analysis (FPCA) and Separate Functional Principal Component Analysis (Separate FPCA).
We will introduce and expand on a third model: Elastic Functional Principal Component Analysis
(Elastic FPCA). Examples of the three models will be shown in the results section.
2.1.1 Functional Principal Component Analysis in SRVF Space
Functional Principal Component Analysis (FPCA) [3] is straightforward, since it does not take
time warping into consideration. FPCA comes almost directly from Principal Component Analysis
(PCA) [16]. PCA is a method of representing correlated, multivariate observations as uncorrelated
variables called Principal Components. PCA allows us to express observations using fewer
variables. The Karhunen-Loeve Expansion Theorem [10] showed the same principles can be applied to
functions. The first model we explore, Functional Principal Component Analysis (FPCA) [3] uses
the Karhunen-Loeve Expansion Theorem to create a PCA equivalent for functions. We can express
these functions in the SRVF space as follows:
$$ q_i(t) = \mu(t) + \sum_{j=1}^{J} c_{ij} b_j(t) + \varepsilon_i(t), \qquad (2.1) $$
where:
• µ(t) is the expected value of qi(t),
• {bj} form an orthonormal basis of L2,
• $\varepsilon_i \in L^2$ is considered the noise process, typically chosen to be a white Gaussian process with
zero mean and variance $\sigma_\varepsilon^2$,
• cij ∈ R are coefficients of (qi− µ) with respect to {bj}. In order to ensure that µ is the mean
of qi, we impose the condition that the sample mean of {c·j} is zero.
Given a set of observed functions {fi}, the estimation of these model parameters is performed
using the minimization on their SRVFs, {qi}:
$$ (\hat{\mu}, \{\hat{b}_j\}) = \operatorname*{arginf}_{\mu, \{b_j\}, \{c_{ij}\}} \sum_{i=1}^{N} \Big\| q_i - \mu - \sum_{j=1}^{J} c_{ij} b_j \Big\|^2, \qquad (2.2) $$

and set $\hat{c}_{ij} = \langle q_i - \hat{\mu}, \hat{b}_j \rangle$. This minimization is performed as follows:
Algorithm 3 Functional Principal Component Analysis

1. Estimate the mean: $\hat{\mu}(t) = \frac{1}{N}\sum_{i=1}^{N} q_i(t)$.

2. Define the sample covariance function $K : [0,1] \times [0,1] \to \mathbb{R}$ according to:

$$ K(s, t) = \frac{1}{N-1} \sum_{i=1}^{N} (q_i(s) - \hat{\mu}(s))(q_i(t) - \hat{\mu}(t)). $$

3. Perform Singular Value Decomposition on $K$ to obtain the estimated first $J$ principal directions: $\{\hat{b}_j\}$.

4. Compute the estimated coefficients: $\hat{c}_{ij} = \langle q_i - \hat{\mu}, \hat{b}_j \rangle$.

An estimate for $q_i$ is formed as follows:

$$ \hat{q}_i(t) = \hat{\mu}(t) + \sum_{j=1}^{J} \hat{c}_{ij} \hat{b}_j(t). $$
The function $K$ is by definition symmetric and positive semidefinite, where the latter means
that for any $g \in L^2$, we have $\int_0^1 \int_0^1 K(s,t) g(s) g(t)\, ds\, dt \ge 0$. This covariance function $K$ defines
a linear operator on $L^2$ using the formula: $A : L^2 \to L^2$, $Aq(t) = \int_0^1 K(s,t) q(s)\, ds$. Since $K$ is
positive semidefinite we have $\langle Aq, q \rangle \ge 0$, for all $q \in L^2$ [8].
According to the Karhunen-Loeve expansion theorem [10], the eigenfunctions of $A$ provide
the principal components of the functional data. Let $\hat{b}_1, \hat{b}_2, \ldots, \hat{b}_J$ be the eigenfunctions of $A$, i.e.
$A\hat{b}_j(t) = \lambda_j \hat{b}_j(t)$, with the corresponding eigenvalues $\{\lambda_j\}$ ordered so that $|\lambda_1| \ge |\lambda_2| \ge \ldots$. Then
$\{\hat{b}_j, j = 1, 2, \ldots, J\}$ solves the optimization problem in 2.2. They are termed the first $J$ principal
directions of variation in the given data. The space spanned by them is called the principal
subspace.
As discussed in Hall [8], there are a few asymptotic properties of FPCA:

• As $J \to \infty$, $\|\hat{q}_i - q_i\| \to 0$.

• As $N \to \infty$, $\hat{b}_j \to b_j$ and $\mathrm{var}(\hat{c}_{ij}) \to \mathrm{var}(c_{ij})$.
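Algorithm 3 can be sketched with NumPy; since the discretized covariance is symmetric, an eigendecomposition takes the place of the SVD. Function names such as `fpca` are ours, and the plain dot product stands in for the L2 inner product up to the grid spacing:

```python
import numpy as np

def fpca(qs, J):
    """Functional PCA on discretized SRVFs (a sketch of Algorithm 3).

    qs : (N, M) array, N functions sampled at M time points.
    Returns the mean, the first J principal directions, and the coefficients.
    """
    mu = qs.mean(axis=0)                       # step 1: mean
    X = qs - mu
    K = X.T @ X / (len(qs) - 1)                # step 2: sample covariance (M x M)
    lam, B = np.linalg.eigh(K)                 # step 3: eigendecomposition of symmetric K
    order = np.argsort(lam)[::-1][:J]          # keep J largest eigenvalues
    B = B[:, order]
    C = X @ B                                  # step 4: coefficients <q_i - mu, b_j>
    return mu, B, C

# Toy data whose centered variation lies in a 2-dimensional subspace.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)
qs = np.array([np.sin(2 * np.pi * t)
               + rng.normal(0, 0.3) * np.cos(2 * np.pi * t)
               + rng.normal(0, 0.1) * np.sin(4 * np.pi * t) for _ in range(30)])
mu, B, C = fpca(qs, J=2)
qs_hat = mu + C @ B.T                          # reconstruction: mu + sum_j c_ij b_j
```

With J = 2 the toy data are reconstructed essentially exactly, because the centered functions span only two directions; this mirrors the property that the error vanishes once J reaches the rank of the data.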
2.1.2 FPCA of Phase and Amplitude Components
Separate FPCA [23] takes possible time warping of the functions into consideration with the
following model:

$$ q_i(t) = (\tilde{q}_i, \gamma_i) \qquad (2.3) $$

$$ = \left( \mu(t) + \sum_{j=1}^{J} c_{ij} b_j(t) + \varepsilon_i(t),\; G^{-1}\Big( \sum_{j=1}^{J} c^{(v)}_{ij} b^{(v)}_j(t) + \varepsilon^{(v)}_i(t) \Big) \right), \qquad (2.4) $$
where all parameters are as discussed in Section 2.2 and G−1 is the inverse of Equation 1.1.
Estimates for the $\gamma_i$s and $q_i$s can be found using the following algorithm.

Algorithm 4 Separate FPCA

1. Align the SRVFs of the functions to get the $\tilde{q}_i$s and $\gamma_i$s, using Algorithm 2.

2. FPCA on Amplitudes

(a) Perform FPCA on the $\tilde{q}_i$s to get their lower dimension reconstructions, $\hat{\tilde{q}}_i$, using Algorithm 3.

3. FPCA on Phases

(a) Map the $\gamma_i$s from $\Gamma$ to $T(\mathbb{S}_\infty)$ to get the $v_i$s using the $G$ function.

(b) Perform FPCA on the $v_i$s to get their lower dimension reconstructions, $\hat{v}_i$s.

(c) Map the $\hat{v}_i$s back to $\Gamma$ to get the $\hat{\gamma}_i$s using $G^{-1}$.

4. Combine the estimates as follows: $\hat{q}_i(t) = (\hat{\tilde{q}}_i, \hat{\gamma}_i^{-1})$.
The reconstructions of the amplitudes and time warping functions, taken separately, hold the same
properties as described in the FPCA section. However, the full reconstruction of the functions
is somewhat arbitrary since the principal directions are found separately and ignore the possible
dependence of the time warping functions on the amplitudes and vice versa. This is where Elastic
FPCA comes in.
2.1.3 Elastic FPCA
Elastic FPCA comes from models studied in [12] and [22]. It considers εi a single estimate of
error, as opposed to having two separate estimates of error. While εi represents unaccounted
variability from the reconstructions, γi assists with minimizing the variability which can be described
as being from time warping. The model is as follows:
$$ q_i(t) = \Big( \mu(t) + \sum_{j=1}^{J} c_{ij} b_j(t),\; \gamma_i \Big) + \varepsilon_i(t). \qquad (2.5) $$
Similar to the method used in Separate FPCA, we can apply a coordinate-descent technique
for estimation and the resulting algorithm can be summarized in the following.
Algorithm 5 Elastic Functional Principal Component Analysis

1. Initialization: Set $\gamma_i = \gamma_{id}$ and $\tilde{q}_i = (q_i, \gamma_i)$, for all $i$.

2. Compute the covariance matrix:

$$ K_q(s, t) = \frac{1}{N-1} \sum_{i=1}^{N} (\tilde{q}_i(s) - \hat{\mu}(s))(\tilde{q}_i(t) - \hat{\mu}(t)). $$

3. Take the SVD of $K_q$, and set the $\hat{b}_j$s to be the first $J$ eigenvectors of $K_q$, $\hat{c}_{ij} = \langle \tilde{q}_i, \hat{b}_j \rangle$, and
$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} \tilde{q}_i$. The set of basis functions is now fixed for the following:

(a) For each $i$, solve the optimization problem using the DPA:

$$ \gamma_i = \operatorname*{arginf}_{\gamma \in \Gamma} \Big\| (q_i, \gamma) - \hat{\mu} - \sum_{j=1}^{J} \hat{c}_{ij} \hat{b}_j \Big\|^2. $$

(b) Form the warped SRVFs $\tilde{q}_i = (q_i, \gamma_i)$ for all $i$.

(c) Estimate $\mu$ using $\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} \tilde{q}_i$.

(d) Compute the coefficients $\hat{c}_{ij} = \langle \tilde{q}_i - \hat{\mu}, \hat{b}_j \rangle$ for $i = 1, \ldots, N$ and $j = 1, \ldots, J$.

(e) Check for convergence of the reconstructions. If not converged, return to step (a).

4. Check for convergence of the basis functions. If not converged, return to Step 3.
We test the convergence using the decrease in the objective function from one iteration to the next.
Figure 2.1 is an abstract drawing of Algorithm 5. It does not reflect details of the algorithm, but
the reader may find it useful to get an intuitive sense of the iterations.
As J increases, εi decreases and γi approaches γid. Note that when J = 0, Elastic FPCA
returns the mean of the aligned functions as found in Algorithm 2. As J → ∞, the convergence
properties described under FPCA hold, and the estimates converge to the same parameters as in FPCA.
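The coordinate-descent structure of Algorithm 5 can be sketched as follows. As before, a grid search over a toy one-parameter warp family stands in for the Dynamic Programming step, so this is an illustrative simplification, not the author's implementation:

```python
import numpy as np

def warp(a, t):
    """Toy one-parameter warp family on [0, 1]; a = 0 is the identity."""
    return t if abs(a) < 1e-8 else (np.exp(a * t) - 1) / (np.exp(a) - 1)

def action(q, gamma, t):
    """Group action (q, gamma) = q(gamma) * sqrt(gamma')."""
    return np.interp(gamma, t, q) * np.sqrt(np.maximum(np.gradient(gamma, t), 0.0))

def elastic_fpca(qs, t, J, grid=np.linspace(-2, 2, 41), iters=5):
    """Elastic FPCA sketch: alternate PCA on the warped SRVFs with a
    per-function warp search toward each current reconstruction."""
    N = len(qs)
    a = np.zeros(N)                                # identity warps
    for _ in range(iters):
        W = np.array([action(qs[i], warp(a[i], t), t) for i in range(N)])
        mu = W.mean(axis=0)
        _, _, Vt = np.linalg.svd(W - mu, full_matrices=False)
        B = Vt[:J].T                               # first J principal directions
        C = (W - mu) @ B
        recon = mu + C @ B.T                       # current reconstructions
        for i in range(N):                         # warp q_i toward its reconstruction
            costs = [np.sum((action(qs[i], warp(b, t), t) - recon[i]) ** 2)
                     for b in grid]
            a[i] = grid[np.argmin(costs)]
    return a, mu, B
```

Each half-step (PCA given warps, then warps given the fixed reconstruction targets) cannot increase the residual objective, which is the monotonicity the convergence check in the algorithm exploits.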
2.2 Results
The simulations in this section used the parameters in Table 2.1. Remember that the model
uses the SRVFs and therefore we simulated SRVFs. The only parameters varied in the study relate
to the variance of the coefficients of the amplitudes and time warping functions. Variations in the
remaining parameters have been studied, however, they are less important in terms of current and
future work. Figures 2.2 through 2.6 show the original simulated functions under these parameters.
Figure 2.1: Representation of Algorithm 5.
A: Use {qi} (black dot) to find {bj}. The blue curve represents all possible functions using this basis set.
B: Fix {bj} (now black), find optimal {cij} (red dot) by minimizing distance (purple line).
C: Find {q̃i} (green dot), which is the minimum distance from reparameterizations of {qi} to the reconstruction.
D: Find the optimal reconstruction of {q̃i}; repeat until converged.
E: Use {q̃i} to find a new basis set (blue).
F: Repeat steps B through E until converged on {bj}.
This image is only meant to be an abstract representation to give a more intuitive understanding of the algorithm. It does not reflect all the details of the algorithm, nor does it properly reflect the mathematics behind the algorithm.
Table 2.1: The table shows the parameters used for simulating the functions. Φ(a, b) represents a Normal PDF with mean a and variance b. The varying parameters demonstrate a variety of amplitude and time warping variability.
Parameter     Amplitudes                       Phases
µ             Φ(.5, .5)                        0
{b_j}         Fourier                          Fourier
J             4                                2
{c_ij}        N(0, σ²_j)                       N(0, σ²_W)
var({c_ij})   σ²_j = ξ/j, ξ ∈ {.01, 0.5, 2}    σ²_W ∈ {.1, 1, 3}
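The amplitude side of Table 2.1 can be simulated as sketched below. The Fourier ordering (alternating sines and cosines), the grid size, and the seed are our choices; the mean Φ(.5, .5) is a normal density evaluated on the grid, and the coefficients follow c_ij ~ N(0, ξ/j) as in the table:

```python
import numpy as np

def fourier_basis(J, t):
    """First J Fourier basis functions on [0, 1] (alternating sine/cosine)."""
    B = []
    for j in range(1, J + 1):
        k = (j + 1) // 2
        fn = np.sin if j % 2 else np.cos
        B.append(np.sqrt(2) * fn(2 * np.pi * k * t))
    return np.array(B)                         # shape (J, len(t))

def simulate_srvfs(N, t, xi, J=4, seed=0):
    """Draw N amplitude SRVFs per Table 2.1: mean is a N(.5, .5) density,
    and the coefficients satisfy c_ij ~ N(0, xi / j)."""
    rng = np.random.default_rng(seed)
    mu = np.exp(-(t - 0.5) ** 2 / (2 * 0.5)) / np.sqrt(2 * np.pi * 0.5)
    B = fourier_basis(J, t)
    sig = np.sqrt(xi / np.arange(1, J + 1))    # decreasing standard deviations
    C = rng.normal(0.0, 1.0, (N, J)) * sig
    return mu + C @ B
```

Varying `xi` over {.01, 0.5, 2} reproduces the low, medium, and high amplitude-variability settings of the study.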
(panels, left to right: ξ = .01, ξ = 0.5, ξ = 2)
Figure 2.2: The original sampled amplitude SRVF functions: {qi}. These amplitudes have almostno variability (left) and a lot of variability (right).
Figure 2.2 shows the SRVFs that were simulated using the parameters in Table 2.1. The figure
shows the SRVFs with low (left), medium (middle), and high (right) variability. The inverse SRVFs,
i.e. the amplitudes, for each of the cases shown in Figure 2.2 are shown in Figure 2.3. Figure 2.4
shows the time warping functions, which were formed by the inverse exponential mapping from the
parameters in Table 2.1.
Each of the simulated amplitudes was composed with each of the simulated phases to create
nine test cases. Figure 2.5 shows the SRVFs of the nine test cases and Figure 2.6 shows the inverses of
the SRVFs, i.e. the amplitudes. For both figures, the amount of phase variability increases to the
right, while the amount of amplitude variability increases downward. Looking at Figure 2.6, it is
clear the top left plot shows functions with very little variability overall. This variability increases
as the phase and amplitude variability are increased.
Figures 2.7 and 2.8 show the reconstruction of the amplitudes under the various methods for
(panels, left to right: ξ = .01, ξ = 0.5, ξ = 2)
Figure 2.3: The original sampled amplitude functions: {fi}. Note that the inverse SRVFs of the basis functions are not orthonormal, so the variability in these functions does not necessarily correspond to the variability seen in Figure 2.2.
(panels, left to right: σ²_W = .1, σ²_W = 1, σ²_W = 3)
Figure 2.4: The original sampled time warping functions: {γi}. Note that these functions are sampled in a vector space and mapped to Γ. Therefore the variability in these functions does not necessarily correspond to the variability described by Table 2.1.
(columns, left to right: σ²_W = .1, 1, 3; rows, top to bottom: ξ = .01, 0.5, 2)
Figure 2.5: The original SRVF functions: {qi}. Note that the same amplitudes are used for eachrow and the same time warping functions are used for each column.
(columns, left to right: σ²_W = .1, 1, 3; rows, top to bottom: ξ = .01, 0.5, 2)
Figure 2.6: The original sampled functions: {fi}. Note that the same amplitudes are used for eachrow and the same time warping functions are used for each column.
the middle variabilities, i.e. ξ = 0.5 and σ²_W = 1. This is just to give an idea of the reconstruction
process. A comparison of the error rates is given for all nine test cases at the end. Both Figures 2.7 and
2.8 show the reconstructions of this particular test case for FPCA (left column), Separate FPCA
(middle column), and Elastic FPCA (right column). We show the reconstructions when including
J = 1, 2, 3, and 4 basis sets (top to bottom).
In FPCA, the amplitudes are the only interest and therefore the reconstruction of the amplitudes
is the reconstruction of the functions themselves. In Elastic FPCA, the same is true and γs are used
specifically to aid in this reconstruction process. In Separate FPCA, the amplitudes are considered
independent of the γs. This is reflected by the Separate FPCA amplitudes approaching aligned
amplitudes rather than the original functions.
Figure 2.9 shows reconstructions of the amplitude functions using the Separate FPCA method,
as opposed to incorporating both the reconstructed amplitudes and reconstructed phases. As
previously mentioned, the reconstructions on the amplitudes and time warping functions are conducted
separately and do not necessarily make sense to show together. Figure 2.9 shows the reconstructions
of the SRVFs (left) and amplitudes (right) when J = 1, 2, 3, and 4 (top to bottom) sets of
basis functions are included in the reconstructions.
Figure 2.10 shows the basis functions used in the reconstruction process for FPCA (left column),
Separate FPCA (middle column), and Elastic FPCA (right column) using J = 1, 2, 3, and 4 basis
functions in the reconstruction process. The jth set of basis functions used in the reconstruction
using J basis functions for FPCA and Separate FPCA are consistent. That is, the basis function
shown for J = 1 for FPCA is the same as the first basis function for the remaining J . The same
is true for the Separate FPCA method. The basis functions used in Elastic FPCA converge to the
basis functions used in FPCA.
The Separate FPCA and Elastic FPCA methods both include time warping functions. In
Separate FPCA, they are an independent parameter with their own reconstruction process, while
in Elastic FPCA the time warping functions assist with minimizing errors beyond what the basis
functions can do. Since the time warping functions serve different purposes for the two methods,
Figure 2.11 shows the process of reconstructing the time warping functions for the Separate FPCA
method using J = 1, 2, 3, and 4 (top to bottom) basis functions in the reconstruction process. The
basis functions (left column) are used to form functions on the Hilbert sphere (middle column),
which are then mapped to the space of time warping functions (right column). Figure 2.12 shows
(columns, left to right: FPCA, Separate, Elastic; rows, top to bottom: J = 1, 2, 3, 4)
Figure 2.7: The reconstructed SRVF functions: {qi} for the three methods. All three methods areconverging to the same set of functions. Computational noise is causing the functions to look morejagged. This noise is very minimal and is avoided when observing the {fi} in Figure 2.8.
(columns, left to right: FPCA, Separate, Elastic; rows, top to bottom: J = 1, 2, 3, 4)
Figure 2.8: The reconstructed functions: {fi} for the three methods. Note that these are the inverse SRVFs of the reconstructions, as opposed to the reconstructions themselves.
(columns, left to right: {qi}, {fi}; rows, top to bottom: J = 1, 2, 3, 4)
Figure 2.9: The amplitude reconstruction of the SRVFs and functions: {q̂̃i} and {f̂i} for Separate FPCA. Note that the time warping functions seem to have accounted for some of the initial amplitude variability and therefore there is almost none present in the reconstructions.
(columns, left to right: FPCA, Separate, Elastic; rows, top to bottom: J = 1, 2, 3, 4)
Figure 2.10: The basis functions of the SRVFs, scaled by the variance of the coefficients, used inthe reconstruction process of the amplitudes. Computational noise causes these functions to appearjagged, although this error is minimal.
the process of reconstructing the time warping functions for Elastic FPCA using J = 0, 1, 2, 3, and
4 basis functions in the reconstruction process.
The error rates of the reconstructions are computed using equation 2.6. The error rates are
displayed for the three methods in Figure 2.13. Reconstruction errors are only shown for up to
J = 4, because the reconstructions using the Separate and Elastic FPCA methods have converged
by then. The reconstruction error rates are shown for all nine test cases, where the phase variability
increases moving to the right and the amplitude variability increases moving downward.
$$ \text{ErrorRate} = \sum_{i=1}^{N} \|q_i - \hat{q}_i\| / \|q_i\| \qquad (2.6) $$
Elastic FPCA tends to converge around J = 1 while Separate FPCA tends to converge around
J = 2 or 3, but not as consistently as FPCA. This is because the reconstruction rates of the vs do
not translate back to the reconstruction rates of the γs. FPCA converges the slowest and does not
usually seem to converge by J = 4 unless no time warping is present.
2.3 Conclusion
When the functions are not simulated with time warping variability, it is clear that Separate
FPCA struggles to take potential time warping variability into account. It is clear Elastic FPCA is
the desired method to minimize variability in the functions, while avoiding unnecessary time warping.
It is noted that the amplitude variability in the Separate FPCA method is low and the basis
functions capture a lot of noise created in the minimization process.
It is possible that a combination of FPCA and Separate FPCA would perform better if there is
a way of indicating whether time warping is truly present in the functions or not. This topic will
be discussed in Chapter 3.
(columns, left to right: {bj}, {vi}, {γi}; rows, top to bottom: J = 1, 2, 3, 4)
Figure 2.11: The first column contains basis functions of the {vi}, scaled by the variance of the coefficients, used in the reconstruction process of the amplitudes from Separate FPCA. The second column shows the {vi} themselves. The final column is the exponential mapping of the {vi}, i.e. the {γi}.
Figure 2.12: Starting in the top row and moving left to right: {γi} from the Elastic FPCA method for J = 0, . . . , 4. A plot of the {γi} at J = 0 is included for reference. Note that this is the only part of any of the methods where J = 0 is more than just a simple mean of the functions.
(columns, left to right: σ²_W = .1, 1, 3; rows, top to bottom: ξ = .01, 0.5, 2)
Figure 2.13: These are the error rates for all 9 variability settings. Each individual plot shows theerror rates for J = 0, ..., 4 of the reconstructions for FPCA (blue), Separate FPCA (green) andElastic FPCA (red), using Equation 2.6.
CHAPTER 3
METRIC-BASED HYPOTHESIS TESTING FOR
PHASE VARIABILITY IN FUNCTIONAL DATA
As observed in Chapter 2, time warping can sometimes be unnecessary in functional data analysis.
This may be because the amount of phase variability is too small to make a significant difference.
We need to have a way of detecting when we should consider time warping and when we are risking
overfitting the data. This is similar to determining the significance of the slope in simple linear
regression. We need to know: Is the phase variability significant for a set of functions?
We remind the reader that for a set of functions {fi}, we can separate the phase and amplitude
components of the functions using the Multiple Alignment Algorithm introduced in Chapter 1.
Any particular function $f_i$ can then be expressed as a pair $f_i = (\tilde{f}_i, \gamma_i)$, where $\tilde{f}_i$ is considered the
amplitude and $\gamma_i$ is considered the phase. The Multiple Alignment Algorithm uses the SRVF of $f_i$:

$$ q_i(t) = \begin{cases} \dfrac{\dot{f}_i(t)}{\sqrt{|\dot{f}_i(t)|}} & \text{if } \dot{f}_i(t) \text{ exists and is nonzero} \\ 0 & \text{otherwise,} \end{cases} $$

so that the action of the time warping functions is norm preserving. More details are presented
in Chapter 1.
Figure 3.1 gives an example of the alignment solution. The first column shows two different
samples of simulated functions: an example with large phase variability (top) and an example with
almost no phase variability (bottom). We can see evidence of the phase variability when we separate
the phase and amplitude components, as shown in the right column. Note that the amplitudes,
shown in the middle column, of the second case look almost identical to the original functions,
while the amplitudes in the first case are shifted. We would like a formal test that assesses the
amount of variability in the phase component of a given data set. In other words, we would
like a hypothesis test to indicate there is phase variability in the first case, but not in the second
case.
(columns, left to right: f, f̃, γ; rows, top to bottom: Case 1, Case 2)
Figure 3.1: We observe sampled functions with phase variability (top) and without phase variability (bottom) in the first column. These functions are then split into amplitude (second column) and phase (third column) components.
The goal of Chapters 3 and 4 is to construct such tests for the significance of phase variability.
The hypothesis test, using the same notation as in Chapter 2, is:

$$ H_0 : \{\gamma_i = \gamma_{id}, \text{ for all } i\} $$
$$ H_A : \{\gamma_i \neq \gamma_{id}, \text{ for at least one } i\}. \qquad (3.1) $$
We take two approaches to performing this test: a metric-based approach and a model-based
approach.
In Chapter 3, we will take a metric-based approach. The basic idea is to use the metric between
functions to separate out the phase component $\{\gamma_i\}$ and amplitude component $\{\tilde{q}_i\}$. Given a metric,
we can develop a test involving differences between $\{q_i\}$ and $\{\tilde{q}_i\}$. We can rephrase this test as:

$$ H_0 : \{\tilde{q}_i = q_i, \text{ for all } i\} $$
$$ H_A : \{\tilde{q}_i \neq q_i, \text{ for at least one } i\}. $$

This is equivalent to the previous hypothesis, since $\tilde{q}_i = q_i$ if and only if $\gamma_i = \gamma_{id}$. We make further
modifications to this hypothesis test in Chapter 3.
In Chapter 4, we will take a model-based approach. The goal of the model-based approach is to
consider phase and amplitude together and not separately. We motivate the model-based approach
at the end of Chapter 3. The specific models used are FPCA based, but one can choose any other
type of model also. If we use the Elastic FPCA model introduced in Chapter 2, the amount of phase
variability in our model decreases as we allow for more amplitude variability. We can reconstruct
a set of functions using Elastic FPCA (taking both phase and amplitude into consideration) and
compare it with the standard FPCA model (taking just amplitude into consideration). The FPCA
and Elastic FPCA models are equal if and only if there is no phase variability present. In this
approach, we quantify the differences using the reconstructions based on the two models. We can
write this as:
H0 : {q_i^F = q_i^E, for all i}
HA : {q_i^F ≠ q_i^E, for at least one i}.
This is equivalent to the initial hypothesis test, since the models are equal if and only if γi = γid.
We make further modifications to this hypothesis test in Chapter 4.
We remind the reader that this Chapter will introduce methods for testing the significance of
phase by using a metric-based approach. We will begin this process by first introducing tests to
compare groups of functions. Specifically, Hagwood [7] explored the application of the Friedman-
Rafsky Test (FRT) [6] and Schilling Nearest Neighbors Test (SNNT) [18] on closed curves in RM .
They used methods of finding geodesic distances, which are similar to the problem of open curves.
What makes applying these methods particularly difficult is the need for independence. The
alignment process creates a dependence of the aligned curves on each other. We observe a simple
solution to this problem, with future exploration for better solutions possible. Other than this, the
challenges we face with this chapter are similar to the challenges addressed in Chapter 2.
Section 1 will discuss the theory behind hypothesis testing on L2 functions. Section 2 will give
an overview of the methods and procedures used in FRT and SNNT. Section 3 will show some
simulated and real examples of this application. Section 4 will show additional work using the
Energy Test, which will be introduced later. Section 5 will make observations about the
examples and point to future directions for exploration.
3.1 Background Material
Given a sample of functions, let S1 represent N unaligned curves and S2 represent the same N
curves after alignment. If S1 already contained mostly aligned curves, then the differences between
S1 and S2 should not be significant. We use the L2 norm as the distance.
Both methods, FRT and SNNT, are applicable to sample data sets of unequal size. For the
purposes of this work, we are not interested in unequal sample sizes and only consider equal sample
sizes. We clarify these reasons in the next few sections.
3.2 Metric-Based Hypothesis Tests
The following are hypothesis tests which compare two samples of functions using a distance
matrix. In each example, the distance between two functions is defined using the L2 norm of their
SRVFs.
The basic way we use these tests is to compare two samples: S1 = {qi} and S2 = {q̃i}, where {qi}
are the SRVFs of the original functions and {q̃i} are found using Algorithm 2. Figure 3.2 illustrates
this set up using the Berkeley Growth Curves. The original growth curves are shown in the upper
left. The SRVFs are shown in the upper right. The aligned SRVFs are shown in the lower left. The
SRVFs and aligned SRVFs are taken as the two samples. The lower right panel shows a pairwise
distance matrix between the two samples and will be used shortly.
Better ways of applying these tests are shown later in this chapter and discussed in the future
work chapter.
3.2.1 Friedman-Rafsky Test
The Friedman-Rafsky Test (FRT) is a generalization of the Wald-Wolfowitz Runs Test (WWRT),
in which two populations in R are compared. In WWRT, the data are ordered and the number of
runs of consecutive sample members is recorded as the test statistic.
FRT replaces R with RM and uses a Minimal Spanning Tree (MST) as a form of ordering the data.
Edges connecting nodes of different samples are eliminated and then the number of disjoint trees,
TN, is counted. TN is asymptotically normally distributed with mean µN = N + 1 and variance

σ²N = (N / (2N − 1)) (N − 1 − (C − 2N + 2)/(2N − 3)),

where C is the number of edge pairs that share a common node. The null hypothesis is rejected
for small values of TN . A quick example of this process with the minimal spanning trees is shown
in Figure 3.3. We form a minimal spanning tree (left) and then break the minimal spanning tree
at edges connecting between the two samples (right). This leaves TN = 13 disjoint trees remaining.
In this particular example, one sample remains together as one tree, indicating smaller variability
within that sample than between the two samples.
Figure 3.2: The first image shows the original growth curves (cm/year). Their SRVFs are taken and then aligned, shown in the second and third images respectively. The L2 distances between all the curves in the second and third images are displayed in the fourth image. The first half of the indices correspond to the unaligned SRVFs and the second half correspond to the aligned SRVFs.
Figure 3.3: A Minimal Spanning Tree (left) is formed from two samples. As described in Algorithm 6, the tree is then cut where edges connect nodes from different samples (right). There are now TN = 13 separate trees remaining. This example shows that one sample (red) is completely connected in the original tree. Some nodes are only connected to the original tree through the opposite sample, and these now form trees of a single node.
Algorithm 6 Friedman-Rafsky Test
1. Take the SRVFs of the original functions to get S1 = {qi}.
2. Use Algorithm 2 to get S2 = {q̃i}.
3. Compute a distance matrix between all the functions from both samples.
4. Use the distance matrix to form an MST.
5. Break the MST at edges connecting nodes of different samples.
6. Count the number of disjoint trees, TN .
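Steps 3–6 of Algorithm 6 can be sketched with SciPy; this is only an illustration on synthetic vectors standing in for sampled SRVFs, and the function name is our own:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import pdist, squareform

def friedman_rafsky_TN(X1, X2):
    """Count disjoint trees after cutting the MST edges that connect the
    two samples (steps 3-6 of Algorithm 6). Rows of X1 and X2 are functions
    sampled on a common grid; the distance is the L2 norm."""
    X = np.vstack([X1, X2])
    labels = np.r_[np.zeros(len(X1)), np.ones(len(X2))]
    D = squareform(pdist(X))                  # step 3: pairwise distance matrix
    mst = minimum_spanning_tree(D).toarray()  # step 4: form the MST
    # Step 5: drop edges whose endpoints come from different samples.
    rows, cols = np.nonzero(mst)
    for i, j in zip(rows, cols):
        if labels[i] != labels[j]:
            mst[i, j] = 0.0
    # Step 6: the number of connected components is T_N.
    TN, _ = connected_components(mst, directed=False)
    return TN

rng = np.random.default_rng(0)
# Two well-separated samples: the MST contains a single bridging edge,
# so cutting it leaves exactly 2 disjoint trees.
A = rng.normal(0.0, 0.1, size=(20, 50))
B = rng.normal(5.0, 0.1, size=(20, 50))
print(friedman_rafsky_TN(A, B))   # prints 2
```

Under the null hypothesis (overlapping samples), many cross-sample edges enter the MST, so cutting them yields a TN near its mean µN = N + 1.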
3.2.2 Schilling Nearest Neighbors Test
Schilling's Nearest Neighbors Test (SNNT) observes, for each function in the pooled data, the
proportion of its K nearest neighbors which belong to the same sample. The K nearest neighbors
are determined by pooling the functions from both samples and computing a distance matrix. This
forms the following test statistic:

TNK = (1/(2NK)) Σ_{i=1}^{2N} Σ_{k=1}^{K} Ii(k),
Table 3.1: Distances between observations are shown above. Using K = 2 as an example, the two nearest neighbors for each observation are in bold for that row. For example, A1's nearest neighbors are A2 and B1. We then count, for each observation, the number of these neighbors that come from the same sample as the observation (last column). There are 10 such neighbors, resulting in TNK = 10/12.

      A1    A2    A3    B1    B2    B3    Count
A1    -     1.1   3.2   1.2   2.0   4.2     1
A2    1.1   -     1.9   2.7   2.6   3.6     2
A3    3.2   1.9   -     2.5   3.5   3.4     1
B1    1.2   2.7   2.5   -     0.9   1.1     2
B2    2.0   2.6   3.5   0.9   -     0.4     2
B3    4.2   3.6   3.4   1.1   0.4   -       2
where Ii(k) = 1 if the kth closest function to qi belongs to the same sample and 0 otherwise.
Schilling [18] shows TNK is normally distributed with mean µ = 0.5 and variance

σ²K = 0.25 + 0.25 (1 − C(2K, K) 2^(−2K)),

where C(2K, K) denotes the binomial coefficient. An example of this is shown in Table 3.1.
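As a sketch (the function name is our own), the SNNT statistic can be computed directly from a pooled distance matrix; using the distances from Table 3.1 with K = 2 reproduces TNK = 10/12:

```python
import numpy as np

def schilling_TNK(D, labels, K):
    """Proportion of K-nearest neighbors that come from the same sample,
    computed from a pooled (2N x 2N) distance matrix D."""
    n = len(labels)
    D = D.astype(float)
    np.fill_diagonal(D, np.inf)          # a point is not its own neighbor
    same = 0
    for i in range(n):
        nn = np.argsort(D[i])[:K]        # indices of the K nearest neighbors
        same += np.sum(labels[nn] == labels[i])
    return same / (n * K)                # 1/(2NK) * sum of I_i(k)

# Distances from Table 3.1 (samples A1-A3 and B1-B3), with K = 2.
D = np.array([
    [0.0, 1.1, 3.2, 1.2, 2.0, 4.2],
    [1.1, 0.0, 1.9, 2.7, 2.6, 3.6],
    [3.2, 1.9, 0.0, 2.5, 3.5, 3.4],
    [1.2, 2.7, 2.5, 0.0, 0.9, 1.1],
    [2.0, 2.6, 3.5, 0.9, 0.0, 0.4],
    [4.2, 3.6, 3.4, 1.1, 0.4, 0.0],
])
labels = np.array([0, 0, 0, 1, 1, 1])
print(schilling_TNK(D, labels, K=2))     # prints 10/12 ≈ 0.8333
```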
3.3 Results
The results are shown in two subsections. The first subsection shows some basic examples of the
methods described in the previous section. The second subsection shows results using a repeated
sampling method. The repeated sampling method is detailed before showing results.
3.3.1 Basic Examples
Two simulated sample data sets are examined. The first is simulated bimodal functions with
some amplitude and time warping variability. The second is simulated unimodal functions with a
small amount of amplitude variability and a moderate amount of time warping variability.
Figure 3.4 shows the original functions (left) and the amplitudes after alignment (middle). The
amplitudes after alignment are aligned again and are shown in the third column. For FRT, Table
3.2 shows each data set with two p-values. The first p-value compares the unaligned functions
as S1 and the aligned functions as S2. The second p-value compares the aligned functions to the
re-aligned functions. The purpose of the second p-value is to verify that already aligned functions
need no further alignment (i.e., the null hypothesis is not rejected).
Figure 3.5 shows the results from SNNT as a function of K for the Bimodal example (top) and
Unimodal example (bottom). Both the test statistics (left column) and p-values (right column) are
Figure 3.4: Various simulated functions before (left) and after alignment (center). The second round of alignment (right) shows both the functions from the first round of alignment (blue) and the second (red) for comparison.
Table 3.2: The table shows the p-values using FRT to compare unaligned to aligned and aligned to aligned functions.

Data Set    Unaligned    Aligned
Bimodal     ≈ 10⁻⁹       0.50
Unimodal    ≈ 10⁻¹²      ≈ 10⁻¹⁴
shown. The results are similar to those found using FRT. The variance of the distributions is a
function of K only, and is therefore the same for both sets of p-values and not displayed.
It is of note that there is so little variability in the amplitudes of the unimodal functions that
computational error seems to play a role in the significance of the second p-value. The unimodal
functions return p-values as expected when comparing unaligned to aligned functions, but compu-
tational error results in the two sets of aligned functions being identified as different. The bimodal
functions return p-values as expected.
The next two examples are the Berkeley Growth Data [9], which measure the rate of growth
(cm/year) of female and male children quarterly from age 1 year to 18. Figure 3.6 shows the growth
Figure 3.5: The TN (left) and p-values (right) for the simulated functions using the SNNT method are shown as a function of K. The results are shown for comparing unaligned functions to aligned (blue) and aligned to aligned (red) functions.
Table 3.3: The table shows the p-values using FRT to compare unaligned to aligned and aligned to aligned functions.

Data Set         Unaligned    Aligned
Female Growth    ≈ 10⁻¹⁴      0.58
Male Growth      ≈ 10⁻⁸       0.50
curves for male (bottom row) and female (top row) subjects, before alignment (left column) and
after one (middle column) and two (right column) rounds of alignment, in the same manner as the
simulated functions. Table 3.3 and Figure 3.7 show the results for FRT and SNNT, respectively.
Figure 3.7 shows the test statistic (left column) and p-values (right column) as a function of K,
the number of "neighbors" to include in the calculations. These plots are shown for both the male
(bottom row) and female (top row) data sets. In both cases, we observe a significant difference
in the unaligned and aligned curves and a non-significant difference when comparing the aligned
curves to re-aligned curves.
The process we just observed compared amplitudes to time warpings of the same amplitudes.
This creates a dependence between samples, which is unwanted. This makes these results worth
Figure 3.6: Various growth curves before (left) and after alignment (center). The second round of alignment (right) shows both the functions from the first round of alignment (blue) and the second (red) for comparison.
Figure 3.7: The TN (left) and p-values (right) for the Berkeley Growth curves using the SNNT method are shown as a function of K. The results are shown for comparing unaligned functions to aligned (blue) and aligned to aligned (red) functions.
looking at to understand how we might compare the two methods, however, the results should be
taken with a grain of salt.
3.3.2 Pseudo - Bootstrap
In an attempt to address the issue of independence between the amplitudes, a process similar to
bootstrapping is used. The data set is randomly split into two samples. The first sample, S1,
contains half the sampled functions. The second sample, S2, contains the remaining sampled
functions, which are then aligned. The two samples are compared to get a test statistic and p-value
using FRT and SNNT. This process is repeated to create a distribution of test statistics and
p-values. This process is shown in Algorithm 7.
Algorithm 7 Pseudo - Bootstrap
1. For each desired pseudo - bootstrap sample:
(a) Partition {qi} into two samples: S1 and S∗2 .
(b) Align S∗2 using Algorithm 2 to get S2.
(c) Compare S1 and S2 using FRT or SNNT.
2. Repeat step 1 to create a distribution of test statistics and p-values.
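The control flow of Algorithm 7 can be sketched as follows; the `align` (standing in for Algorithm 2) and `test` (FRT or SNNT) arguments are caller-supplied placeholders, and the toy stand-ins at the bottom are ours, not the actual alignment or tests:

```python
import numpy as np

def pseudo_bootstrap(q, align, test, n_boot=200, seed=0):
    """Algorithm 7 sketch: repeatedly split the SRVFs {q_i} (rows of q)
    into two halves, align only the second half, and compare the halves
    with a two-sample test."""
    rng = np.random.default_rng(seed)
    n = len(q)
    stats = []
    for _ in range(n_boot):
        idx = rng.permutation(n)
        S1 = q[idx[: n // 2]]            # first half: left unaligned
        S2 = align(q[idx[n // 2:]])      # second half: aligned (Algorithm 2)
        stats.append(test(S1, S2))       # FRT or SNNT statistic / p-value
    return np.array(stats)

# Toy run with placeholder stand-ins (identity "alignment" and a
# mean-difference "test"), just to show the control flow.
q = np.random.default_rng(1).normal(size=(40, 30))
out = pseudo_bootstrap(q, align=lambda x: x, test=lambda a, b: a.mean() - b.mean())
print(out.shape)   # one statistic per pseudo-bootstrap sample
```

Because each repetition re-partitions the data before aligning, the unaligned half is never compared against warped copies of itself, which is the dependence problem this procedure tries to mitigate.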
Figure 3.8 shows the test statistics (left four plots) and p-values (right four plots) as distributions
for both simulated samples (bimodal on the top row and unimodal on the bottom row) and for both
methods (FRT in the first and third columns, SNNT in the second and fourth columns). To avoid
showing an excessive number of these plots we choose K = 6 for SNNT, because the test statistics
have mostly converged by this point. These same results are shown for the Berkeley Growth data
in Figure 3.9, for both the male (bottom row) and female (top row) data sets.
3.4 Additional Work: Energy Test and Permutation Distribution
In Hagwood [7] a third test is used in addition to FRT and SNNT, called the Energy Test.
The Energy Test is used to make the same comparison as in the first two tests, but at a later
point in time. While the FRT and SNNT results give a single p-value, the Energy Test employs
the permutation distribution recommended by Hagwood to make a more meaningful comparison.
We did not use the permutation distribution with FRT and SNNT. We have not made a formal
comparison between the Energy Test and FRT and SNNT, which are all less successful than the
Figure 3.8: Unaligned simulated functions are compared to aligned simulated functions using the sampling technique described. Results are shown for FRT and for K = 6 in SNNT.
Figure 3.9: Unaligned growth curves are compared to aligned growth curves using the sampling technique described. Results are shown for FRT and for K = 6 in SNNT.
results of the Model-Based Hypothesis Test presented in Chapter 4. Because of this, the methods
and results of the Energy Test are summarized in this section.
Zech and Aslan in 2003 [1] introduced the Energy Test. The method is similar to an ANOVA,
where the test statistic is based on distances within groups compared to the overall distance.
These distances form a test statistic, which is then compared to a permutation distribution. The
permutation distribution is formed by repeatedly, randomly permuting the groups of functions and
computing a new test statistic each time.
3.4.1 Energy Test and Permutation Distribution Methods
Using the same setup as in the rest of this chapter, we want to compare two sets of SRVFs, {qi}
and {q̃i}, both with sample size N. Note that we are still making an assumption of independence
between these two groups of functions, which is not true. The test statistic for the Energy Test is:

TN = (1/N) [ Σ_{i=1}^{N} Σ_{j=1}^{N} d(qi, q̃j) − Σ_{i=1}^{N} Σ_{j<i} d(qi, qj) − Σ_{i=1}^{N} Σ_{j<i} d(q̃i, q̃j) ].
This test statistic, TN, is a degenerate V-statistic. This leads to the Permutation Distribution,
which allows us to have a comparison for TN. The permutation distribution gives an approximate
distribution for the test statistic under the null hypothesis that {qi} and {q̃i} come from the
same distribution. Assuming these distributions are the same, we can assume the "groupings" of
{qi} and {q̃i} are formed at random. Algorithm 8 shows how to use this information to form the
permutation distribution for the Energy Test.
Algorithm 8 Energy Test with Permutation Distribution
1. Begin with samples {qi} and {q̃i}; find TN.

2. Combine {qi} and {q̃i} into one sample, Z.

3. For each desired sample test statistic:

(a) Sample N functions from Z, without replacement, to form S∗1.

(b) Set the unsampled functions of Z to S∗2.

(c) Find T∗i using S∗1 and S∗2.

4. The p-value is the proportion of T∗i more extreme than TN.
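A minimal sketch of the test statistic and Algorithm 8 (function names are our own), with d taken as the L2 distance between discretized functions:

```python
import numpy as np
from scipy.spatial.distance import cdist

def energy_stat(S1, S2):
    """T_N from the text: cross-sample distances minus the within-sample
    distances of each group, scaled by 1/N (equal sample sizes)."""
    N = len(S1)
    cross = cdist(S1, S2).sum()
    within1 = np.triu(cdist(S1, S1), k=1).sum()   # sum over j < i
    within2 = np.triu(cdist(S2, S2), k=1).sum()
    return (cross - within1 - within2) / N

def energy_permutation_test(S1, S2, n_perm=500, seed=0):
    """Algorithm 8: pool the samples, repeatedly re-split at random, and
    report the proportion of permuted statistics at least as extreme."""
    rng = np.random.default_rng(seed)
    TN = energy_stat(S1, S2)
    Z = np.vstack([S1, S2])
    N = len(S1)
    T_star = np.empty(n_perm)
    for b in range(n_perm):
        idx = rng.permutation(len(Z))
        T_star[b] = energy_stat(Z[idx[:N]], Z[idx[N:]])
    return TN, np.mean(T_star >= TN)   # one-sided p-value

rng = np.random.default_rng(2)
S1 = rng.normal(0.0, 1.0, size=(25, 40))
S2 = rng.normal(1.0, 1.0, size=(25, 40))   # shifted second sample
TN, p = energy_permutation_test(S1, S2)    # TN large, p small
```

With equal sample sizes this TN is a positive multiple of the usual (nonnegative) energy distance, so large values indicate the two samples differ.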
Figure 3.10: We want to know: are the fi (blue) significantly different from the f̃i (red)? The test statistic is TN = 1540.8 − 986.5 − 68.9 = 485.4.
Figure 3.10 gives an example of two such sets of functions. We get a test statistic of TN =
485.4. Figure 3.11 shows the permutation distribution for the test statistic, obtained using Algorithm 8.
Comparing the permutation distribution to the original test statistic indicates that the {qi} and
{q̃i} are significantly different.
3.4.2 Energy Test and Permutation Distribution Results
Only simulation results are presented for the Energy Test with the Permutation Distribution.
Figure 3.12 shows three sets of amplitudes (top) and three sampled sets of phases (bottom), which
are simulated to have small (first column), medium (second column), and large (third column)
variability. Figure 3.13 shows the composition for each of the nine possible pairs of the amplitudes
and phases, with the variability of the amplitudes increasing to the right and the phase variability
increasing upward. These composed functions will serve as nine simulated cases of {fi}.
For each of the nine simulated cases, we can see the test statistics and permutation distributions
in Figure 3.14, with the variability of the amplitudes increasing to the right and the phase variability
increasing upward in the test cases. Note that as the amplitude variability increases, TN moves
closer to the center of the permutation distribution, making the results less and less significant. The
opposite occurs as the phase variability increases. That is to say, when phase variability increases,
the test statistic moves further away from the permutation distribution.
Figure 3.11: Distribution of TN under the null hypothesis (blue). Estimated TN from the original sample (red).
Figure 3.12: Amplitudes (top) and phases (bottom) used for simulation. No axes are shown, but all functions are plotted on the same scale. These amplitudes and phases are composed to form the initial functions used in the simulation.
Figure 3.13: The amplitudes and phases are composed to simulate a variety of functions. No axes are shown, but all functions are plotted on the same scale. The variability in the amplitudes increases toward the right. The variability in the phases increases upward.
Figure 3.14: For each of the nine cases, we see the test statistic (red) along with the permutation distribution. The nine cases shown are all plotted on the same axes to demonstrate the change in significance as phase and amplitude variability increase.
Figure 3.15: This image illustrates when phase variability is significant or not significant. The center region is where the methods proposed in Chapter 3 struggle.
3.5 Conclusion
In this Chapter we proposed several ways to test the significance of phase variability using the
metric introduced at the start of Chapter 1. We evaluated the significance using the Friedman-
Rafsky Test, Schilling’s Nearest Neighbors Test, and the Energy Test. We also proposed a ”Pseudo-
Bootstrap” solution to attempt to create a null distribution of test statistics for the FRT and SNNT
to address issues of independence.
The FRT and SNNT gave similar results for the same data sets. Both methods, along with the
Energy Test, violate the same assumptions of independence between the two groups of functions
being compared. Although the ”Pseudo-Bootstrap” helped to address this assumption in the null
distribution, a better method is clearly needed. The work in Chapter 4 will propose a paired test
to try and address this assumption.
From the results of the Energy Test, it is clear that any quantification of phase variability is
relative to the amplitude variability. The metric-based approach allows the phase component
to account for variability that could instead come from the amplitudes. Figure 3.15 illustrates this issue. This
could be addressed by considering a model that accounts for both phase and amplitude variability.
Chapter 4 will explore this topic.
CHAPTER 4
MODEL-BASED HYPOTHESIS TESTING FOR
PHASE IN FUNCTIONAL DATA
In Chapter 2, we discussed an Elastic framework for aligning functional data. As described there,
for a set of functions {fi} one can decompose the functions into amplitude components, {f̃i}, and
phase components, {γi}. The Multiple Alignment algorithm shown in Chapter 2 is used to solve
for {q̃i}, the SRVFs of {f̃i}. Chapter 2 also discusses and provides examples of the FPCA and
Elastic FPCA models.
In Chapter 3, we presented a study to create hypothesis tests to consider if alignment is necessary
for a given set of functions. If alignment isn’t a significant component, then including it in a model
could result in overfitting. Using a metric-based approach, in Chapter 3 we found that aligning all
the functions to the mean often resulted in a significant difference between {qi} and {q̃i}.
In the metric-based approach, the Multiple Alignment algorithm allows for the phase variability
to account for variance that could be attributed to amplitude variability. This means that our
hypothesis test needs a way to model phase and amplitude variability together. The Elastic
FPCA model is a good candidate, because it allows for more of the amplitude variability to be
accounted for during the alignment process, reducing the need for more variability in the phase
component.
The goal of this Chapter is to create a model-based hypothesis test for phase variability. In
order to do this, we need the following:
1. A model which considers phase and amplitude variability and a model which only considers
amplitude variability for comparison
2. A test for comparing these two models
We already indicated the Elastic FPCA model is a good candidate for a model with a phase
component. We use the standard FPCA model for comparison. Section 1 gives a brief review of
these two models.
The second item on the list is the main focus of the Chapter: How can we compare the FPCA
and Elastic FPCA models?
Section 1 gives a brief review of the FPCA and Elastic FPCA models discussed in Chapter 2.
Section 2 gives an overview of the hypothesis testing problem and discusses previous solutions. We
choose to use Concordance Correlation Coefficients [13] as a tool for comparing the two models
for various reasons, which we explain in Section 2. Section 3 introduces Concordance Correlation
Coefficients and gives some basic examples. Section 4 combines the methodology for using
Concordance Correlation Coefficients with the Elastic FPCA and FPCA models. We present simulated
examples and real data applications in Section 5.
4.1 Background Material: Review of Functional Data Models
The goal of this Chapter is to create a model-based hypothesis test for the significance of phase
variability. In Chapter 3, we noted that phase and amplitude variability could not be considered
completely separate. Therefore, we need a model which incorporates both amplitude and phase
variability together. We also need a ”null” model, which does not include phase variability. We will
use the FPCA model (considers amplitude only) and Elastic FPCA model (considers both phase
and amplitude) as our two models.
Chapter 2 introduces the FPCA and Elastic FPCA models. This section provides a quick review
of these models. We remind the reader that both are discussed and have numerous examples in
Chapter 2.
4.1.1 Functional Principal Component Analysis
Functional Principal Component Analysis (FPCA) [3] comes almost directly from Principal
Component Analysis (PCA) [16]. PCA is a method of representing correlated observations in RN
as uncorrelated variables called Principal Components. PCA allows us to express observations
using fewer variables. The Karhunen-Loeve Expansion Theorem [10] showed the same principles
can be applied to functions. The first model we explore, Functional Principal Component Analysis
(FPCA) [3], uses the Karhunen-Loeve Expansion Theorem to create a PCA equivalent for functions.
For a set of sampled functions, {fi}, we express the SRVFs, {qi}, using the FPCA model:
qi(t) = µ(t) + Σ_{j=1}^{J} cij bj(t) + εi(t),    (4.1)
where:
• µ(t) is the expected value of qi(t),
• {bj} form an orthonormal basis of L2,
• εi ∈ L2 is considered the noise process, typically chosen to be a white Gaussian process with
zero mean and variance σ²,
• cij ∈ R are coefficients of (qi − µ) with respect to {bj}, i.e. ci,j = 〈qi − µ, bj〉. In order to
ensure that µ is the mean of qi, we impose the condition that the sample mean of {c·j} is
zero.
Given a set of observed functions {fi}, the estimation of these model parameters is performed
using the following minimization:
(ˆµ, {ˆbj}) = arg inf_{µ, {bj}, {cij}} Σ_{i=1}^{N} ‖qi − µ − Σ_{j=1}^{J} cij bj‖²,    (4.2)

and set ˆcij = ⟨qi − ˆµ, ˆbj⟩. More details about this model are in Chapter 2, including several
examples.
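As an illustrative sketch (not the dissertation's implementation), the minimization in (4.2) can be solved with an SVD of the centered, discretized SRVFs, since the leading right singular vectors give the optimal J-dimensional basis:

```python
import numpy as np

def fpca(Q, J):
    """Estimate the FPCA model (4.1)-(4.2) from rows of Q (each row an
    SRVF sampled on a common grid): mean, first J principal basis
    functions, and coefficients c_ij = <q_i - mu, b_j>."""
    mu = Q.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal basis functions.
    U, s, Vt = np.linalg.svd(Q - mu, full_matrices=False)
    B = Vt[:J]                    # basis functions b_1, ..., b_J
    C = (Q - mu) @ B.T            # coefficients c_ij (zero column means)
    recon = mu + C @ B            # model reconstruction (noise-free part)
    return mu, B, C, recon

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 100)
# Toy sample: two underlying modes of variation plus a little noise.
Q = (1 + t
     + rng.normal(size=(30, 1)) * np.sin(2 * np.pi * t)
     + rng.normal(size=(30, 1)) * np.cos(2 * np.pi * t)
     + 0.01 * rng.normal(size=(30, 100)))
mu, B, C, recon = fpca(Q, J=2)
```

With J = 2 the reconstruction captures both simulated modes, leaving a residual on the order of the noise; the recovered basis is orthonormal and the coefficient columns have sample mean zero, as required by the model.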
4.1.2 Elastic Functional Principal Component Analysis
Elastic Functional Principal Component Analysis expands on work completed by Kneip [12] and
Tucker [22]. Elastic FPCA models functions with an FPCA basis while considering the amplitude
and phase components jointly. While εi represents unaccounted variability from the reconstructions,
γi assists with minimizing the variability attributed to phase. The model is as follows:
qi(t) = (µ(t) + Σ_{j=1}^{J} cij bj(t), γi) + εi(t).    (4.3)
This model is different from Model (4.1) because of the phase component, as mentioned above. Given
a set of observed functions {fi}, the estimation of these model parameters is performed using the
minimization on their SRVFs, {qi}:

(ˆµ, {ˆbj}) = arg inf_{µ, {bj}, {cij}, {γi}} Σ_{i=1}^{N} ‖(qi, γi) − µ − Σ_{j=1}^{J} cij bj‖²,    (4.4)

and set ˆcij = ⟨(qi, ˆγi) − ˆµ, ˆbj⟩.
Equivalently, we can write the expression above as:

(ˆµ̃, {ˆb̃j}) = arg inf_{µ̃, {b̃j}, {c̃ij}} Σ_{i=1}^{N} ‖q̃i − µ̃ − Σ_{j=1}^{J} c̃ij b̃j‖²,    (4.5)

and set ˆc̃ij = ⟨q̃i − ˆµ̃, ˆb̃j⟩. The tildes indicate that we have composed each component with a time
warping function. In this case, the q̃i are not the aligned functions from the Multiple Alignment
algorithm, but are composed with the γi found in the Elastic FPCA model.
More details about this model are in Chapter 2, including several examples.
4.2 Introduction to Hypothesis Testing for Phase Variability
Now we return to the main problem studied in this thesis: Does a given set of functional data
contain significant phase variability?
We remind the reader that this Chapter is taking a model-based approach to creating a hy-
pothesis test to answer this question. In the previous section, we reviewed two models, which we
initially introduced in Chapter 2: FPCA and Elastic FPCA. We now need a way to compare these
two models. This is an important issue, because if there is no time warping present, we risk
overfitting by using Elastic FPCA. We need to determine if time warping is present in the data or not.
We remind the reader that the basic hypothesis test can be written as:
H0 : {γi = γid, for all i}
HA : {γi ≠ γid, for at least one i}.
As seen in Chapter 3, it is difficult to directly compare the γi; instead, we can modify the hypothesis
test. To understand how we arrived at the modified test, we will review some facts we have already
discussed.
Let q_i^{F(J)} represent the reconstruction of qi using FPCA with J basis functions. Similarly,
let q_i^{E(J)} represent the same reconstruction using Elastic FPCA. As J → ∞, both
q_i^{F(J)} → qi and q_i^{E(J)} → qi. We have previously used this fact to make comparisons similar to those in
Chapter 3 (but not presented in this dissertation), which include:

1. H0 : PE = PF or HA : PE ≠ PF.

• PE, PF represent underlying properties of {q_i^{E(J)}} and {q_i^{F(J)}}, respectively. This reflects
the notation used by Friedman [6].

• We used the tests from Hagwood: the Friedman-Rafsky Test [6], the Energy Test [1], and
Schilling's Nearest Neighbors Test [18] to make the comparison.

2. We used the same setup and solution as in (1), but tested using the residual errors.

3. H0 : E[q_i^{F(J)}] = E[q_i^{E(J)}] or HA : E[q_i^{F(J)}] ≠ E[q_i^{E(J)}].
• This shifts our solution to a paired test.
• We also tried using residual errors.
Hagwood's paper inspired the first modified tests. Issues arise because the variance of the
reconstructions under Elastic FPCA is always smaller than under FPCA. The second test
addresses this issue by considering that the mean and basis sets are different for each model.
We then shifted to paired tests, since our functions are not independent between the samples.
The equivalence of the modified hypothesis tests is stated formally in the following Lemma:

Lemma 1 For the FPCA and Elastic FPCA models, the following hypothesis tests are equivalent:

1. H0 : {γi = γid, for all i} or HA : {γi ≠ γid, for at least one i}.

2. H0 : {q_i^{F(J)} = q_i^{E(J)}, for all i} or HA : {q_i^{F(J)} ≠ q_i^{E(J)}, for at least one i}.

3. H0 : {ε_i^{F(J)} = ε_i^{E(J)}, for all i} or HA : {ε_i^{F(J)} ≠ ε_i^{E(J)}, for at least one i}.
We created something similar to a paired t-test for functional data for the equivalent hypothesis
tests in Lemma 1. The paired test is an improvement, but still does not deal with the differences
in variance properly, which causes some problems when there is relatively low variance in the
amplitudes.
It should be noted that hypothesis testing isn’t completely new to functional data. Morris
[15] points out that most of the work on hypothesis testing with functional data uses pointwise
estimates. As an example, we could compare two samples of functional data using a t-test at each
time point in the domain. These hypothesis tests then involve multiple comparison issues and do
not give results for the functions as a whole, only based on the time point. We are not interested
in these hypothesis tests and therefore do not explore them here.
We came across a paper by Li [13], which proposes using Concordance Correlation Coefficients
(CCC) for evaluating the similarity of paired functions. This is the only paper we found which
presents a method for comparing paired functional data as a whole and not at individual time
points. Section 3 introduces Concordance Correlation Coefficients, before we proceed to a discussion
of how they are used in our hypothesis test. We explain how we use CCC in our hypothesis test in
Section 4.
4.3 Background Material: Concordance Correlation Coefficient
The goal of this Chapter is to create a model-based hypothesis test for the significance of phase
variability. We chose the models and now we need some way to compare the models. We remind the
reader that Section 2 explained why other methods were unsuccessful, and concluded by mentioning
an important tool for quantifying agreement: Concordance Correlation Coefficients (CCC). In this
section, we will introduce CCC. We will make it clear as to how this relates to our problem in
Section 4.
We use a simple example to motivate its definition. Consider an example where X, Y ∈ R1
are paired observations of some random process. We typically use correlation, ρ, to describe the
strength and direction of the relationship between X and Y. Instead, we can use the Concordance
Correlation Coefficient, ρc, which is generally defined as:

ρc = 1 − E[squared distance of (X, Y) from the identity line] / E[squared distance of (X, Y) from the identity line, assuming independence].

Essentially, ρc is scaled based on the distance of the observations to the identity line. The
expected distance to the identity line includes location and scale parameters. The next subsections
will demonstrate the use of CCC for data in R1 and L2.
4.3.1 Concordance Correlation Coefficient in R1
We can write the definition of the CCC using standard notation in ℝ¹ as follows:
\[
\rho_c = \frac{2\rho\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}
       = \frac{2\,\mathrm{cov}(X, Y)}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}.
\]
From this definition we can see that the concordance correlation coefficient is a scaled version of the correlation. While correlation measures the dependence between X and Y, the CCC measures their agreement. This section provides examples that demonstrate the differences between correlation and CCC.

The following example demonstrates the use of the CCC in ℝ¹; it is meant only as a demonstration, not as the best method for analyzing data in ℝ¹. The parameters used are: xᵢ ∼ N(3, 1) i.i.d. and yᵢ = 1.1xᵢ + εᵢ, where εᵢ ∼ N(0, 0.3616) i.i.d. Note that the true correlation is ρ = 0.95 and the true CCC is ρ_c = 0.9051. Our goal is to measure the agreement between X and Y.
Figure 4.1: The paired scores for comparison are shown on the left. The identity line (black) hasbeen added for reference. A bootstrapped distribution for ρc is shown on the right. The verticalline shows the location of the point estimate.
Figure 4.1 shows a plot of the paired scores (left) and the test statistic with its bootstrapped distribution (right); the bootstrapped distribution is explained in the next paragraph. The estimated correlation, ρ̂ = 0.9425, is very strong. The line of best fit is not the identity line, which leads to a penalty in the CCC: the estimated CCC is ρ̂_c = 0.8942. These estimates appear close to the true values, but we need some method of evaluating this claim. Typical choices are standard deviations or confidence intervals of the estimates.
Li [13] discusses how common theoretical estimates underestimate the variance of ρ̂_c and recommends using bootstrapped estimates instead. Since we use the bootstrapped distribution in our method, we use it here as well for consistency and omit a discussion of alternative estimates. To get the bootstrapped distribution, we repeatedly sample N of the N pairs with replacement and compute a new ρ̂_c for each sample. We can then form a 95% confidence interval for ρ_c from the 2.5th and 97.5th percentiles of the bootstrapped distribution. For this example, the confidence interval is (0.86, 0.92).
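The pairs bootstrap just described can be sketched as follows. This is a hypothetical re-creation, not the code used for the figure; the key detail is that resampling always keeps each pair (xᵢ, yᵢ) together.

```python
import numpy as np

def ccc(x, y):
    """Sample CCC: 2*cov / (var(x) + var(y) + (mean gap)^2)."""
    sxy = np.cov(x, y, bias=True)[0, 1]
    return 2.0 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

def bootstrap_ccc(x, y, n_boot=2000, seed=0):
    """Resample the N pairs with replacement and recompute the CCC each time."""
    rng = np.random.default_rng(seed)
    n = x.size
    idx = rng.integers(0, n, size=(n_boot, n))   # N index draws per bootstrap sample
    return np.array([ccc(x[i], y[i]) for i in idx])

rng = np.random.default_rng(1)
x = rng.normal(3.0, 1.0, size=200)
y = 1.1 * x + rng.normal(0.0, 0.3616, size=x.size)

reps = bootstrap_ccc(x, y)
lo, hi = np.percentile(reps, [2.5, 97.5])        # percentile 95% CI for rho_c
```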
We just introduced CCC through an example in R1 to give the reader a basic understanding of
CCC. The next subsection will introduce CCC for L2 functions and provide examples.
4.3.2 Concordance Correlation Coefficient in L2
In the previous subsection, we introduced CCC using a simple example in R1. In this section,
we will discuss CCC and provide examples using L2 functions. Li [13] introduced the use of CCC
for comparing paired functional data. The Concordance Correlation Coefficient between X(t) and
Y (t) is defined as:
Figure 4.2: Left: One pair of the twenty simulated functions. Right: The bootstrapped distributionof the CCC.
\[
\rho_c(X, Y) = \frac{2\langle X - E(X),\, Y - E(Y)\rangle}{\|X - E(X)\|^2 + \|Y - E(Y)\|^2 + \|E(X) - E(Y)\|^2},
\]
where 〈X, Y〉 = E∫X(t)Y(t)w(t)dt and w(t) is an optional weight function. For now we assume w(t) = 1, ignoring any potential weighting. This formula has the same form as the one given for ℝ¹: the numerator is twice the covariance, and the denominator is the sum of the variances plus the squared distance between the means.
The example using functions is taken directly from Li's paper. Twenty pairs of functions are simulated from a Gaussian process with µ_x(t) = −√(0.05t), µ_y(t) = √(0.05t), σ²_x = σ²_y = 1, and σ_xy(t) = 0.95, with each time point simulated independently from a bivariate normal distribution. The true correlation between these functions is ρ = 0.95. Since the means differ slightly, the CCC is smaller, with true value ρ_c = 0.9048.
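As a sketch, the simulation just described and the sample version of the functional CCC can be reproduced on a discrete grid. This is our own illustrative code, not Li's: integrals are approximated by Riemann sums on a uniform grid with w(t) = 1, and we use 500 pairs rather than 20 so the estimate is stable enough to check against the true value.

```python
import numpy as np

def functional_ccc(X, Y, dt):
    """Sample CCC for paired functions; rows are subjects, columns a uniform
    time grid with spacing dt. E<f, g> = E int f(t) g(t) dt is approximated
    by a Riemann sum averaged over subjects."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)        # pointwise mean functions
    Xc, Yc = X - mx, Y - my
    inner = (Xc * Yc).sum(axis=1).mean() * dt      # E<X - EX, Y - EY>
    nx = (Xc ** 2).sum(axis=1).mean() * dt         # E||X - EX||^2
    ny = (Yc ** 2).sum(axis=1).mean() * dt         # E||Y - EY||^2
    dmean = ((mx - my) ** 2).sum() * dt            # ||EX - EY||^2
    return 2.0 * inner / (nx + ny + dmean)

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 101)
dt = t[1] - t[0]
n = 500
L = np.linalg.cholesky(np.array([[1.0, 0.95], [0.95, 1.0]]))
Z = rng.standard_normal((n, t.size, 2)) @ L.T      # each time point independent
X = -np.sqrt(0.05 * t) + Z[:, :, 0]                # mu_x(t) = -sqrt(0.05 t)
Y = np.sqrt(0.05 * t) + Z[:, :, 1]                 # mu_y(t) = +sqrt(0.05 t)

rho_c = functional_ccc(X, Y, dt)                   # true value is about 0.9048
```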
We remind the reader that a bootstrapped distribution is used, as recommended by Li. To get the bootstrapped distribution, we repeatedly sample N of the N pairs of functions with replacement and compute a new ρ̂_c for each sample. To be clear, the pairings do not change, only how often each pair appears in the sample. We can use the bootstrapped distribution to get a 95% confidence interval for ρ_c: (0.8927, 0.9100). Figure 4.2 shows one of the twenty pairs of simulated functions (left) along with the bootstrapped distribution of ρ̂_c (right).
In this section we introduced CCC and provided some simple examples of how to use it. We
made comparisons between CCC and correlation to help emphasize the importance of agreement.
We provided examples in both R1 and L2. In the next section, we will discuss how CCC relates to
the rest of the Chapter.
4.4 Methods of Hypothesis Testing for Phase with CCC
The overall goal of this Chapter is to create a model-based hypothesis test to evaluate the
significance of time warping variability. We chose to use Elastic FPCA and FPCA as our models
with and without phase variability, respectively. We will use CCC as a tool for comparing the two
methods. We can now develop a framework for using CCC in our current hypothesis test situation.
There are several reasons we use CCC:
• As previously discussed, we want to evaluate agreement and not just correlation.
• CCC is the only existing method for comparing the agreement of functional data as a whole.
• Preliminary results using the L² norm did not work well for cases with balanced variability; such methods were discussed earlier in this Chapter.
Before getting into more details, let us review our goal. The overall goal of this Chapter is to compare the reconstructions under FPCA to those under Elastic FPCA. Using the same setup and q-space described in the previous chapters, we would like to compare the original SRVFs to the FPCA reconstructions, and the original SRVFs to the Elastic FPCA reconstructions. We can do this by comparing the CCCs.
Expanding on the methods from Li [13], we can return to the question of hypothesis testing. Let ρ_c^{F(J)} denote the CCC between the original SRVFs and the FPCA reconstructions using J basis functions; note that as J → ∞, ρ_c^{F(J)} → 1. Similarly, let ρ_c^{E(J)} denote the Elastic FPCA equivalent.
If the CCCs are similar, then the γᵢ are not contributing much to the Elastic FPCA model and we risk overfitting our reconstruction. This means our initial hypothesis test,
\[
H_0: \{\gamma_i = \gamma_{id} \text{ for all } i\} \quad \text{vs.} \quad H_A: \{\gamma_i \neq \gamma_{id} \text{ for at least one } i\},
\]
is equivalent to:
\[
H_0: \rho_c^{F(J)} = \rho_c^{E(J)} \quad \text{vs.} \quad H_A: \rho_c^{F(J)} \neq \rho_c^{E(J)}.
\]
We do not want to compare the reconstructions from FPCA and Elastic FPCA to each other directly: even as both approach the original functions, the two sets of reconstructions could have mismatched variances, resulting in a smaller CCC.
In Section 3, we introduced the CCC and gave an example of how to obtain a distribution for a single CCC; however, we will need to compare two correlations. Previous work in this area includes Fisher [5], who used the Fisher Z-transformation to form normal approximations for the correlation, and Steiger [21], who explored methods of comparing two correlations that share an index. Unfortunately, these approximations do not work well when ρ is close to 1, and so are not worth applying to the CCC for our purposes.
Barnhart [2] proposed new versions of the CCC for situations with a reference method and multiple comparison methods. However, that work deals with assessing overall agreement between the reference method and multiple other methods, which is not what we are interested in here.
The rest of this section justifies using the CCC and gives a simple example of our approach to the hypothesis testing.
4.4.1 Applying Concordance Correlation Coefficient to Hypothesis Testing
To show that the CCC makes sense to use, the following conditions must be satisfied:

1. If γᵢ = γ_id for all i, then ρ_c^{F(J)} = ρ_c^{E(J)}. This is easy to show and therefore omitted.

2. If γᵢ ≠ γ_id for at least one i, then ρ_c^{F(J)} ≠ ρ_c^{E(J)}.

We will in fact show ρ_c^{E(J)} > ρ_c^{F(J)}, making our hypothesis equivalent to a one-sided test:
\[
H_0: \rho_c^{F(J)} = \rho_c^{E(J)} \quad \text{vs.} \quad H_A: \rho_c^{F(J)} < \rho_c^{E(J)}.
\]
We will fix J and use the following notation:

• Q is the SRVF of a random function
• Q^F, Q^E are the reconstructions of Q using FPCA and Elastic FPCA, respectively
• 〈X, Y〉 = E∫X(t)Y(t)dt
• ‖X‖² = 〈X, X〉
• E(Q), E(Q^F), E(Q^E) are functions of t
• ρ_c^F = ρ_c^{F(J)} and ρ_c^E = ρ_c^{E(J)} as shorthand notation, since J is fixed
\[
\rho_c^F = \frac{2\langle Q - E(Q),\, Q^F - E(Q^F)\rangle}{\|Q - E(Q)\|^2 + \|Q^F - E(Q^F)\|^2 + \|E(Q) - E(Q^F)\|^2},
\qquad
\rho_c^E = \frac{2\langle Q - E(Q),\, Q^E - E(Q^E)\rangle}{\|Q - E(Q)\|^2 + \|Q^E - E(Q^E)\|^2 + \|E(Q) - E(Q^E)\|^2}.
\]
Theorem 1: For a set of functions with reconstructions as defined above, the Concordance Correlation Coefficient from Elastic FPCA is greater than that of FPCA when phase variability is present, i.e. ρ_c^E > ρ_c^F.

The proof of this theorem is given in the Appendix. Algorithm 9 is used to obtain distributions of ρ̂_c^F and ρ̂_c^E.
Algorithm 9 CCC Comparison with FPCA and Elastic FPCA

1. Fix J.
2. Compute q_i^{F(J)} and q_i^{E(J)} for each i = 1, ..., N.
3. Compute ρ̂_c^{F(J)} and ρ̂_c^{E(J)}.
4. Repeat with bootstrapped samples. For each bootstrapped sample b:
   (a) Sample N of the i with replacement.
   (b) Compute ρ̂_c^{F(J),b} and ρ̂_c^{E(J),b} using the bootstrapped sample.
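The bootstrap step of the algorithm can be sketched as follows, given the original SRVFs and both sets of reconstructions sampled on a common grid. The reconstruction step itself (FPCA / Elastic FPCA from Chapter 2) is not reproduced here; `QF` and `QE` below are synthetic stand-ins, with the "elastic" one made closer to `Q` purely so the code has something to compare.

```python
import numpy as np

def ccc_srvf(Q, R, dt):
    """Sample CCC between original SRVFs Q and reconstructions R
    (rows = subjects, columns = a uniform time grid with spacing dt)."""
    mq, mr = Q.mean(axis=0), R.mean(axis=0)
    Qc, Rc = Q - mq, R - mr
    inner = (Qc * Rc).sum(axis=1).mean() * dt
    nq = (Qc ** 2).sum(axis=1).mean() * dt
    nr = (Rc ** 2).sum(axis=1).mean() * dt
    dmean = ((mq - mr) ** 2).sum() * dt
    return 2.0 * inner / (nq + nr + dmean)

def bootstrap_ccc_pair(Q, QF, QE, dt, n_boot=500, seed=0):
    """Bootstrap rho_c^F and rho_c^E over the same resamples, keeping each
    subject's triple (q_i, q_i^F, q_i^E) together."""
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    rF, rE = np.empty(n_boot), np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        rF[b] = ccc_srvf(Q[idx], QF[idx], dt)
        rE[b] = ccc_srvf(Q[idx], QE[idx], dt)
    return rF, rE

# Synthetic stand-ins: NOT real FPCA / Elastic FPCA reconstructions, just
# noisy copies of Q with the "elastic" one closer to the original.
rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 101)
dt = t[1] - t[0]
Q = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal((30, 1))
QF = Q + rng.normal(0.0, 0.4, size=Q.shape)
QE = Q + rng.normal(0.0, 0.1, size=Q.shape)

rF, rE = bootstrap_ccc_pair(Q, QF, QE, dt)
```

With these stand-ins, the bootstrapped distribution of ρ_c^E sits above that of ρ_c^F, which is the pattern the theorem predicts for real reconstructions when phase variability is present.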
We first demonstrate these methods in ℝ¹ to give a more intuitive understanding; the results section then demonstrates them as described above.
4.4.2 Simple Concordance Correlation Coefficient for Two Comparisons
We continue with the ℝ¹ example from Section 4.3.1. That example started with two random variables: X represented a gold-standard measurement and Y an alternative measurement. Let us assume there is a second alternative to compare, denoted Z. Figure 4.3 illustrates this situation with the paired data (left) and the bootstrapped distributions (right). For this demonstration, zᵢ = xᵢ + ηᵢ, where ηᵢ ∼ N(0, 0.3287) i.i.d.

To be clear, for each i there are three scores: a gold-standard score xᵢ, a score yᵢ using the first alternative, and a score zᵢ using the new method. Concordance Correlation Coefficients can be used to measure the agreement between X and Y and between X and Z, denoted ρ_c^y and ρ_c^z. Note that the true correlation between X and Z is ρ^z = 0.95 and the true CCC is ρ_c^z = 0.9488; recall that ρ_c^y = 0.9051. Thus Y and Z have the same correlation with X. Y is not on the
Table 4.1: The correlation and CCC between X, Y and between X, Z.

            Xᵢ to Yᵢ    Xᵢ to Zᵢ
    ρ        0.94        0.95
    ρ_c      0.90        0.95
Figure 4.3: We want to measure the agreement between the scores on the X-axis and those on theY-axis for two samples: Y (red) and Z (blue) shown on the left. The line of identity (black) hasbeen added for reference. On the right, the bootstrapped distribution of ρyc is compared to thebootstrapped distribution of ρzc .
identity line, which results in a penalty for its Concordance Correlation Coefficient, while Z is not
penalized.
Table 4.1 reports the estimated correlation and CCC in both cases. As expected, the CCC is lower in the case with the worse agreement. However, this observed difference could be a coincidence, so a hypothesis test can be conducted to determine whether the difference is significant:
\[
H_0: \rho_c^y = \rho_c^z \quad \text{vs.} \quad H_A: \rho_c^y \neq \rho_c^z.
\]
We compare the bootstrapped distribution of ρ̂_c^y, which we treat as the null distribution, with the bootstrapped distribution of ρ̂_c^z; Figure 4.3 shows both distributions. We can obtain p-values by comparing the bootstrapped distribution of ρ̂_c^y to the point estimate of ρ̂_c^z. Remember that this demonstration in ℝ¹ is only to aid understanding of the hypothesis test using the CCC before applying the methods in the model-testing setup. That said, we treat the bootstrapped distribution as a null distribution here because the bootstrapped distribution of ρ̂_c^{F(J)} is indeed the null distribution in our setup.
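The p-value computation described here can be sketched as follows. This is our own illustration with synthetic numbers standing in for a bootstrapped null distribution; for the one-sided alternative ρ_c^F < ρ_c^E, the p-value is the fraction of the null distribution at or above the observed alternative estimate.

```python
import numpy as np

def bootstrap_pvalue(null_reps, observed):
    """One-sided bootstrap p-value: the proportion of the bootstrapped null
    distribution that is at or above the observed statistic."""
    return float((np.asarray(null_reps, dtype=float) >= observed).mean())

# Synthetic stand-in for a bootstrapped null distribution (e.g. of rho_c^F):
rng = np.random.default_rng(4)
null_reps = rng.normal(0.90, 0.01, size=2000)

p_far = bootstrap_pvalue(null_reps, 0.95)    # observed value well above the null
p_near = bootstrap_pvalue(null_reps, 0.90)   # observed value at the null's center
```

A value far above the null distribution yields a p-value near zero, while a value at its center yields a p-value near one half.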
Table 4.2: The parameters used for simulating the functions. Φ(a, b) represents a normal PDF with mean a and variance b. The varying parameters demonstrate a variety of amplitude and time-warping variability.

    Parameter      Amplitudes                       Phases
    µ              Φ(.5, .5)                        0
    {bⱼ}           Fourier                          Fourier
    J              2                                2
    {cᵢⱼ}          N(0, σⱼ²)                        N(0, σ_W²)
    var({cᵢⱼ})     σⱼ² = ξ/j, ξ = [.01, 0.5, 2]     σ_W² = [.1, 1, 3]
4.5 Results
In this Chapter we created a test for the significance of phase variability using the FPCA and Elastic FPCA models. This section applies the test to simulated data and then to several real data sets: the Berkeley Growth Curves, the tecator data, and the temperature data.
4.5.1 Simulations
The simulated functions are shown in Figure 4.4. They are simulated using the same technique as in Section 2.3. Amplitudes and phases are simulated separately with three different levels of variability. For simplicity, the amplitudes are labeled Amp 1, 2, 3, where Amp 1 has almost no variance and the variance increases for Amp 2 and again for Amp 3. The same is true for the amount of time-warping variance, labeled Gam 1, 2, 3. These amplitudes and phases are composed in pairs to create the nine test cases. In Figure 4.4, the variability of the phases increases going from left to right, while the variability of the amplitudes increases going from bottom to top.

The actual parameters used are presented in Table 4.2. To ensure the amplitudes do not contain phase variability, we use the Multiple Alignment Algorithm presented in Chapter 2.
Figure 4.5 shows boxplots of the bootstrap estimates of ρ_c^{F(J)} and ρ_c^{E(J)} for J = 0, ..., 10 for each of the nine sample cases (phase variability increases from left to right, amplitude variability increases from bottom to top). As J increases, the distribution of ρ̂_c^{F(J)} approaches the distribution of ρ̂_c^{E(J)}, and both approach a point mass at 1. As expected, as the variance of the time-warping functions increases, it takes longer for these distributions to converge.
Note that when J = 0, the FPCA reconstruction is simply the mean of the functions, which results in a covariance of 0. Elastic FPCA can reconstruct using the mean and the time-warping functions, which is why its covariance is nonzero. An important feature to note is that when the true variance of the time warping is low (i.e., Gam 1), the distribution of ρ̂_c^{E(J)} is lower than when the time-warping variance is higher. This may be due to the Elastic FPCA model overfitting, resulting in a less than optimal covariance with the original functions.
The top row of Figure 4.6 shows the p-values computed using the point estimate of ρ̂_c^{E(J)} for comparison; the bottom row uses the minimum of the bootstrapped ρ̂_c^{E(J)}. Both sets of p-values are shown as a function of J, the number of basis functions used in the reconstruction. Beyond a certain point, both FPCA and Elastic FPCA fully reconstruct the original functions, and therefore have p-values of 1 for those J. As the variance of the amplitudes increases, it generally takes longer for the p-values to converge, and for a fixed J the p-values decrease as the variance of the time-warping functions increases.
4.5.2 Real Data Examples
The Berkeley Growth Curves show the change in height for children from ages 0 to 18. As previously discussed, the Berkeley growth curves are a typical example of an unaligned set of functions, because children have growth spurts at different ages (time warping) and to different extremes (amplitude). We can now apply the hypothesis test to determine whether alignment is significant. The female growth curves are shown first, followed by the male growth curves.
Figure 4.7 shows the original functions (left) and reconstructions of their amplitudes for J = 1 using FPCA (middle) and Elastic FPCA (right). Note that the reconstruction process, as well as the CCC calculations, occur in the SRVF space. Figure 4.8 shows the distributions of ρ̂_c^{F(J)} and ρ̂_c^{E(J)} as a function of J, the number of basis functions used in the reconstruction. We can see the CCCs are converging, but have not yet converged by J = 10. Figure 4.9 shows the p-values for a larger range of J; the distributions converge when J is in the upper twenties.

Figures 4.10, 4.11, and 4.12 show the results for the male growth curves. We expect the same results here as for the female growth curves, because of the similar nature of the data. From the boxplots, we can see the CCCs are converging, but have not yet converged by J = 10. Figure 4.12 shows the p-values for a larger range of J; the distributions converge when J is in the mid twenties.
Figure 4.4: Simulation The original simulated functions. Each row uses the same amplitudes withincreasing time warping variance moving to the right. Each column uses the same time warpingfunctions, but increasing amplitude variance moving upward.
Figure 4.5: Simulation Each boxplot shows the distribution of ρF (J)c (blue) and ρ
E(J)c (red) for
each J along the x-axis.
Figure 4.6: Simulation The p-values for the nine cases are shown for J = 0, ..., 50. The amplitudevariance increases as the plots move to the right. Each plot shows the p-values for small (blue),medium(red), and large (yellow) time warping variance.
Figure 4.7: Female Growth Curves Left: The original female growth curves. Middle and Right:Reconstructions of the growth curves using J = 1 for FPCA and Elastic FPCA, respectively. Notethat the functions are shown in the original space and not the SRVFs.
Figure 4.8: Female Growth Curves Each boxplot shows the distribution of ρ̂_c^{F(J)} (blue) and ρ̂_c^{E(J)} (red) for each J along the x-axis.
Figure 4.9: Female Growth Curves The p-values for the female growth curves are close to zerountil around J = 28.
Figure 4.10: Male Growth Curves Left: The original male growth curves. Middle and Right:Reconstructions of the growth curves using J = 1 for FPCA and Elastic FPCA, respectively. Notethat the functions are shown in the original space and not the SRVFs.
The results from the male and female growth curves are similar, as expected. This is encouraging, because it provides a real-data example of two sets of functions for which we want similar results. Of course, it would be good in the future to have replicated studies of the growth curves to affirm the findings here.
Figures 4.13, 4.14, and 4.15 show the results for the tecator functions. The tecator data were collected using a "Tecator Infratec spectrometer that measures the absorbances at 100 wavelengths in the region 850-1050 mm" [4]. This is an example of a data set that appears mostly aligned, but with some phase variability present.

Figure 4.13 shows the original tecator functions (left) and reconstructions using FPCA and Elastic FPCA (middle and right, respectively). The reconstructions look very similar to the original functions. Figure 4.14 shows boxplots of the CCCs for J = 0, ..., 10. The CCCs appear to converge
Figure 4.11: Male Growth Curves Each boxplot shows the distribution of ρ̂_c^{F(J)} (blue) and ρ̂_c^{E(J)} (red) for each indicated J along the x-axis.
Figure 4.12: Male Growth Curves The p-values for the male growth curves are close to zerountil around J = 25.
Figure 4.13: Tecator Data Left: The original tecator functions. Middle and Right: Reconstruc-tions of the tecator functions using J = 1 for FPCA and Elastic FPCA, respectively. Note thatthe functions are shown in the original space and not the SRVFs.
toward one quickly. Figure 4.15 shows that the p-values remain close to zero until J is close to 20; that is, the phase component remains significant, which contradicts what was expected.

It is hard to see visually in Figure 4.13, but many of the original functions have a small peak to the left of the main peak. Elastic FPCA struggles to converge in such a situation. This is similar to the issues with the Multiple Alignment Algorithm, which struggles to converge when presented with functions with varying numbers of less prominent peaks. Further investigation into similar cases is necessary.
Figures 4.16, 4.17, and 4.18 show the results for the temperature data. This data set is an example with almost no noticeable time-warping variability. Even with only one basis function, the FPCA and Elastic FPCA models are able to reconstruct most of the variance of the original functions. The boxplots in Figure 4.17 converge at very low values of J compared to the other data sets. Observing the p-values, the Elastic FPCA model may not be necessary for J > 1 for this particular data set.
4.6 Conclusion
In this Chapter we proposed a model-based approach to testing the significance of phase vari-
ability. We used the models proposed in Chapter 2: FPCA as a general model with no time warping
component and Elastic FPCA as a model with time warping. Concordance Correlation Coefficients
were used to make the comparison. We created a hypothesis test using CCC which was equivalent
to testing the significance of phase variability using the two models.
Various simulations showed a change in the significance of the time-warping functions based on the initial phase and amplitude variability. We explored several applications including growth
Figure 4.14: Tecator Data Each boxplot shows the distribution of ρ̂_c^{F(J)} (blue) and ρ̂_c^{E(J)} (red) for each indicated J along the x-axis.
Figure 4.15: Tecator Data The p-values for the tecator data are close to zero until around J = 20.
Figure 4.16: Temperature Data Left: The original temperature functions. Middle and Right: Re-constructions of the temperature functions using J = 1 for FPCA and Elastic FPCA, respectively.Note that the functions are shown in the original space and not the SRVFs.
Figure 4.17: Temperature Data Each boxplot shows the distribution of ρ̂_c^{F(J)} (blue) and ρ̂_c^{E(J)} (red) for each indicated J along the x-axis.
Figure 4.18: Temperature Data The p-values for the weather data are close to zero until aroundJ = 5.
curves, tecator data, and weather data. The real data had both positive and negative results. The
growth curves had the desired results: significant differences between the two models and similar
results between the male and female growth curves. The FPCA and Elastic FPCA models were
not significantly different for larger J in the weather data, as desired.
The tecator data was expected to give similar results to those of the weather data. It is
suspected that the discrepancy in results has to do with small additional peaks in some of the
tecator functions. Specifically, it appears the models give different results in locations where the
functions are relatively flat. A solution to this might be to use the weight parameter in CCC to
make locations where the functions are flat less influential. Further investigation is required. In
the meantime, caution should be applied when using the hypothesis test on certain datasets.
In conclusion, we have formulated a hypothesis test for testing the significance of phase vari-
ability. This test can now be applied to data sets before proceeding with time warping techniques,
with some cautions.
CHAPTER 5
GENERAL DISCUSSION AND FUTURE WORK
To summarize, the main contributions of this dissertation are that we have:
• Studied three models for functional data, each accounting for phase variability in a different way:

– Functional Principal Component Analysis (FPCA), which does not separate phase variability from amplitude variability

– Separate Functional Principal Component Analysis, which first separates phase and amplitude and then models each component separately

– Elastic Functional Principal Component Analysis, which models phase and amplitude jointly
• Developed a metric-based hypothesis test for testing the significance of phase variability for
a given set of functions
• Developed a model-based hypothesis test for testing the significance of phase variability using
Concordance Correlation Coefficients and the FPCA and Elastic FPCA models
In Chapter 2 we compared Elastic FPCA with two other models: FPCA and Separate FPCA. When the functions contain small phase variability, the Separate FPCA model clearly struggles to take this variability into account properly. FPCA does not separate out phase variability, making it preferable to Separate FPCA when the functions have no phase variability. Elastic FPCA jointly models phase and amplitude: it modeled functions with phase variability better than FPCA, and functions without phase variability better than Separate FPCA. We concluded that Elastic FPCA is the preferred method for minimizing variability in the functions while avoiding unnecessary phase components.
For both Separate FPCA and Elastic FPCA, we noted that the models overfit their phase
components when phase variability was not necessary. We noted that it was possible for FPCA to
perform better than the other models if there was a way of indicating whether phase variability is
truly present in the functions or not. This led to the hypothesis tests developed in Chapters 3 and
4, that test the significance of phase variability in functional data.
In Chapter 3, we proposed several ways to test the significance of phase variability using the metric introduced in Chapter 1. We evaluated the significance using the Friedman-Rafsky Test (FRT), Schilling's Nearest Neighbors Test (SNNT), and the Energy Test. We also proposed a "Pseudo-Bootstrap" solution to attempt to create a null distribution of test statistics for the FRT and SNNT, to address issues of independence. All three tests violated the same assumption of independence between the two groups of functions being compared. Although the "Pseudo-Bootstrap" helped to address this assumption in the null distribution, a better method was clearly needed.
From the results of the Energy Test, it was clear that quantifications of phase variability are relative to the amplitude variability. The metric-based approach allowed the phase component to account for variability that could come from the amplitudes. We noted that this could be addressed by considering a model that accounts for both phase and amplitude variability, such as the Elastic FPCA model. This led to the work presented in Chapter 4, where we developed a hypothesis test for the significance of phase variability using a model-based approach.
In Chapter 4, we proposed using the Concordance Correlation Coefficient (CCC) as a tool for testing the agreement between FPCA and Elastic FPCA. We developed a hypothesis test that compared the CCCs between SRVFs and their reconstructions under FPCA and under Elastic FPCA. We stated and proved a theorem that the CCC between SRVFs and their Elastic FPCA reconstructions is always closer to one than the CCC between the SRVFs and their FPCA reconstructions, and used this theorem to formulate a hypothesis test based on the CCC.
Various simulations showed a change in the significance of the phase components based on the
initial phase and amplitude variability. We explored several applications including growth curves,
tecator data, and weather data. The real data had both positive and negative results. The growth
curves had the desired results: significant differences between the two models and similar results
between the male and female growth curves. The FPCA and Elastic FPCA models were not
significantly different for larger J in the weather data, as desired.
The tecator data were expected to give results similar to those of the weather data. It is suspected that the discrepancy has to do with the functions being approximately flat in the same locations. A solution may be to include a weight function in the CCC to give SRVFs close to zero less influence over the distribution of the CCC; further investigation is required. In the meantime, caution should be applied when using the hypothesis test on certain data.
In conclusion, we have developed several hypothesis tests for testing the significance of phase
variability in functional data. We will need to continue to work on cases with overall low amplitude
and phase variability, such as the tecator functions.
APPENDIX A
PROOF OF THEOREM IN CHAPTER 4
Theorem: For a set of functions with reconstructions as defined above, the Concordance Correlation Coefficient from Elastic FPCA is greater than that of FPCA when phase variability is present (i.e., ρ_c^E > ρ_c^F).

The proof is broken down into several lemmas and remarks. Many properties of the FPCA model are used throughout.

Remark: If γᵢ ≠ γ_id for at least one i, then E‖Q − Q^E‖² < E‖Q − Q^F‖². This follows directly from the minimization: the Elastic FPCA model will remain at γ_id unless a better fit has been found.
Lemma 1: If ‖Q − Q^E‖² < ‖Q − Q^F‖², then 2〈Q − E(Q), Q^E − E(Q^E)〉 > ‖Q^F − E(Q)‖² + ‖Q^E − E(Q^E)‖².

Proof of Lemma 1:
\begin{align*}
\|Q - Q^E\|^2 &< \|Q - Q^F\|^2 \\
\|Q - E(Q)\|^2 - \|Q^E - E(Q^E)\|^2 - 2\langle Q^E - E(Q^E),\, Q - Q^E\rangle &< \|Q - E(Q)\|^2 - \|Q^F - E(Q)\|^2 \\
\|Q^E - E(Q^E)\|^2 + 2\langle Q^E - E(Q^E),\, Q - Q^E\rangle &> \|Q^F - E(Q)\|^2 \\
\|Q^E - E(Q^E)\|^2 + 2\langle Q - E(Q),\, Q^E - E(Q^E)\rangle - 2\|Q^E - E(Q^E)\|^2 &> \|Q^F - E(Q)\|^2 \\
2\langle Q - E(Q),\, Q^E - E(Q^E)\rangle - \|Q^E - E(Q^E)\|^2 &> \|Q^F - E(Q)\|^2 \\
2\langle Q - E(Q),\, Q^E - E(Q^E)\rangle &> \|Q^F - E(Q)\|^2 + \|Q^E - E(Q^E)\|^2
\end{align*}
Lemma 2: If 2〈Q − E(Q), Q^E − E(Q^E)〉 > ‖Q^F − E(Q)‖² + ‖Q^E − E(Q^E)‖², then 〈Q − E(Q), Q^E − E(Q^E)〉 > ‖Q^F − E(Q)‖².

Proof of Lemma 2: Note that ‖Q^F − E(Q)‖² < ‖Q^E − E(Q^E)‖²:
\begin{align*}
2\langle Q - E(Q),\, Q^E - E(Q^E)\rangle &> \|Q^F - E(Q)\|^2 + \|Q^E - E(Q^E)\|^2 \\
2\langle Q - E(Q),\, Q^E - E(Q^E)\rangle &> 2\|Q^F - E(Q)\|^2 \\
\langle Q - E(Q),\, Q^E - E(Q^E)\rangle &> \|Q^F - E(Q)\|^2
\end{align*}
Lemma 3: We can simplify ρ_c^F in the following manner:
\[
\rho_c^F = \frac{2\|Q^F - E(Q^F)\|^2}{\|Q - Q^F\|^2 + 2\|Q^F - E(Q)\|^2}.
\]
Proof of Lemma 3: Recall that, by the fundamental properties of FPCA, Q − Q^F and Q^F − E(Q) are orthogonal and E(Q^F) = E(Q):
\begin{align*}
\rho_c^F &= \frac{2\langle Q - E(Q),\, Q^F - E(Q^F)\rangle}{\|Q - E(Q)\|^2 + \|Q^F - E(Q^F)\|^2 + \|E(Q) - E(Q^F)\|^2} \\
&= \frac{2\langle Q - E(Q),\, Q^F - E(Q^F)\rangle}{\|Q - E(Q)\|^2 + \|Q^F - E(Q^F)\|^2 + \|E(Q) - E(Q)\|^2} \\
&= \frac{2\langle Q - E(Q),\, Q^F - E(Q^F)\rangle}{\|Q - E(Q)\|^2 + \|Q^F - E(Q^F)\|^2 + 0} \\
&= \frac{2\langle Q^F + Q - Q^F - E(Q),\, Q^F - E(Q^F)\rangle}{\|Q - E(Q)\|^2 + \|Q^F - E(Q^F)\|^2} \\
&= \frac{2\langle Q - Q^F,\, Q^F - E(Q^F)\rangle + 2\langle Q^F - E(Q),\, Q^F - E(Q^F)\rangle}{\|Q - E(Q)\|^2 + \|Q^F - E(Q^F)\|^2} \\
&= \frac{0 + 2\langle Q^F - E(Q),\, Q^F - E(Q^F)\rangle}{\|Q - E(Q)\|^2 + \|Q^F - E(Q^F)\|^2} \\
&= \frac{2\|Q^F - E(Q^F)\|^2}{\|Q - E(Q)\|^2 + \|Q^F - E(Q^F)\|^2} \\
&= \frac{2\|Q^F - E(Q^F)\|^2}{\|Q - Q^F\|^2 + \|Q^F - E(Q)\|^2 + \|Q^F - E(Q)\|^2} \\
&= \frac{2\|Q^F - E(Q^F)\|^2}{\|Q - Q^F\|^2 + 2\|Q^F - E(Q)\|^2}
\end{align*}
Lemma 4: We can simplify ρ_c^E in the following manner:
\[
\rho_c^E = \frac{2\langle Q - E(Q),\, Q^E - E(Q^E)\rangle}{\|Q - Q^E\|^2 + 2\langle Q - E(Q),\, Q^E - E(Q^E)\rangle}.
\]
Proof of Lemma 4:
\begin{align*}
\rho_c^E &= \frac{2\langle Q - E(Q),\, Q^E - E(Q^E)\rangle}{\|Q - E(Q)\|^2 + \|Q^E - E(Q^E)\|^2 + \|E(Q) - E(Q^E)\|^2} \\
&= \frac{2\langle Q - E(Q),\, Q^E - E(Q^E)\rangle}{\|Q - E(Q)\|^2 + \|Q^E - E(Q^E)\|^2 + \|Q - Q^E\|^2 - \|Q - E(Q)\|^2 - \|Q^E - E(Q^E)\|^2 + A_1} \\
&= \frac{2\langle Q - E(Q),\, Q^E - E(Q^E)\rangle}{\|Q - Q^E\|^2 + 2\langle Q - E(Q),\, Q^E - E(Q^E)\rangle}
\end{align*}
In the above proof, A_1 = 2〈Q − E(Q), Q^E − E(Q^E)〉, and the second step substitutes for ‖E(Q) − E(Q^E)‖² using the following remark:

Remark:
\[
\|E(Q) - E(Q^E)\|^2 = \|Q - Q^E\|^2 - \|Q - E(Q)\|^2 - \|Q^E - E(Q^E)\|^2 + 2\langle Q - E(Q),\, Q^E - E(Q^E)\rangle
\]
Proof of Remark:
\begin{align*}
\|E(Q) - E(Q^E)\|^2 &= \|Q - Q^E\|^2 - \|Q - Q^E\|^2 + \|E(Q) - E(Q^E)\|^2 \\
&= \|Q - Q^E\|^2 - \|(Q - Q^E) - E(Q) + E(Q^E)\|^2 \\
&= \|Q - Q^E\|^2 - \|(Q - E(Q)) - (Q^E - E(Q^E))\|^2 \\
&= \|Q - Q^E\|^2 - \|Q - E(Q)\|^2 - \|Q^E - E(Q^E)\|^2 + 2\langle Q - E(Q),\, Q^E - E(Q^E)\rangle
\end{align*}
Proof of Theorem:
We begin with the inequality of residual errors, ‖Q−QE‖2 < ‖Q−QF ‖2, which comes directly
from the models. Recall that if the Elastic FPCA model differs from FPCA, it is because a set of
warping functions has been found, which minimize the energy more than from the FPCA model.
From Lemma 1 and Lemma 2, we can show the inequality of residual errors is equivalent to:
〈Q− E(Q), QE − E(QE)〉 > ‖QF − E(Q)‖2.
Remark: ∥∥Q−QE∥∥2
〈Q− E(Q), QE − E(QE)〉<
‖Q−QF ‖2
‖QF − E(QF )‖2
The remark notes that combing the inequality of residual errors, with the above inequality gives
us the following inequality:
79
∥∥Q−QE∥∥2
〈Q− E(Q), QE − E(QE)〉<
‖Q−QF ‖2
‖QF − E(QF )‖2
The next few steps apply simple algebraic manipulation to the above inequality:

‖Q − Q_E‖² / (2⟨Q − E(Q), Q_E − E(Q_E)⟩) + 1 < ‖Q − Q_F‖² / (2‖Q_F − E(Q_F)‖²) + 1

(‖Q − Q_E‖² + 2⟨Q − E(Q), Q_E − E(Q_E)⟩) / (2⟨Q − E(Q), Q_E − E(Q_E)⟩) < (‖Q − Q_F‖² + 2‖Q_F − E(Q)‖²) / (2‖Q_F − E(Q_F)‖²)

2⟨Q − E(Q), Q_E − E(Q_E)⟩ / (‖Q − Q_E‖² + 2⟨Q − E(Q), Q_E − E(Q_E)⟩) > 2‖Q_F − E(Q_F)‖² / (‖Q − Q_F‖² + 2‖Q_F − E(Q)‖²)
Lemma 3 shows the right side of the above inequality is equivalent to ρ_F. Lemma 5, with the help of Lemma 4, shows the left side is equivalent to ρ_E. We can therefore conclude that if ‖Q − Q_E‖² ≠ ‖Q − Q_F‖², then ρ_E > ρ_F.
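The chain of manipulations above reduces to a one-line algebraic fact: for positive quantities X, Y, A, B (standing in for ‖Q − Q_E‖², ⟨Q − E(Q), Q_E − E(Q_E)⟩, ‖Q − Q_F‖², and ‖Q_F − E(Q_F)‖², respectively), X/Y < A/B implies 2Y/(X + 2Y) > 2B/(A + 2B). A small randomized check of this fact, with purely illustrative variable names:

```python
import random

random.seed(0)
checked = 0
for _ in range(1000):
    X, Y, A, B = (random.uniform(0.1, 10.0) for _ in range(4))
    if X / Y < A / B:                            # hypothesis: the combined inequality
        # Conclusion after adding 1 to both sides and taking reciprocals
        # (which flips the inequality, since all terms are positive).
        assert 2 * Y / (X + 2 * Y) > 2 * B / (A + 2 * B)
        checked += 1

assert checked > 0  # the hypothesis was exercised at least once
```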
APPENDIX B
IRB APPROVAL
BIOGRAPHICAL SKETCH
Megan Duncan received her B.A. in Mathematics with a minor in Latin from the University of Maine in the Fall of 2011. After some graduate work in Mathematics at the University of Maine, she moved to Florida to pursue her Ph.D. In the Fall of 2013, she began the Ph.D. program in the Department of Statistics at Florida State University under the advisement of Dr. Anuj Srivastava. Megan is married to Adam Duncan and they have four cats.