Florida State University Libraries
Electronic Theses, Treatises and Dissertations
The Graduate School
2018

Elastic Functional Principal Component Analysis for Modeling and Testing of Functional Data
Megan Duncan

Follow this and additional works at the DigiNole: FSU's Digital Repository. For more information, please contact [email protected]



Page 2: Florida State University Libraries - fsu.digital.flvc.org653405/...N(left) and P-values (right) for the simulated functions using the SNNT method are shown as a function of K. The

FLORIDA STATE UNIVERSITY

COLLEGE OF ARTS AND SCIENCES

ELASTIC FUNCTIONAL PRINCIPAL COMPONENT ANALYSIS FOR MODELING AND

TESTING OF FUNCTIONAL DATA

By

MEGAN DUNCAN

A Dissertation submitted to the Department of Statistics

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

2018

Copyright © 2018 Megan Duncan. All Rights Reserved.


Megan Duncan defended this dissertation on April 19, 2018. The members of the supervisory committee were:

Anuj Srivastava

Professor Directing Thesis

Eric Klassen

University Representative

Fred Huffer

Committee Member

Wei Wu

Committee Member

The Graduate School has verified and approved the above-named committee members, and certifies that the dissertation has been approved in accordance with university requirements.


TABLE OF CONTENTS

List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi

1 Introduction . . . 1
1.1 Motivation for Phase and Amplitude Separation . . . 4
1.2 Background Material: Mathematical Amplitude Framework . . . 6
1.2.1 Phase Amplitude Separation . . . 6
1.2.2 Space of Amplitudes . . . 7
1.2.3 Space of Time Warping Functions . . . 9
1.2.4 Distance Metric . . . 9
1.3 Motivation for Testing Phase Variability . . . 10
1.4 Overview of Dissertation . . . 12

2 Methods of Modeling Functional Data . . . 14
2.1 Models for Functional Data . . . 14
2.1.1 Functional Principal Component Analysis in SRVF Space . . . 15
2.1.2 FPCA of Phase and Amplitude Components . . . 16
2.1.3 Elastic FPCA . . . 17
2.2 Results . . . 18
2.3 Conclusion . . . 29

3 Metric-Based Hypothesis Testing for Phase Variability in Functional Data . . . 33
3.1 Background Material . . . 35
3.2 Metric-Based Hypothesis Tests . . . 36
3.2.1 Friedman-Rafsky Test . . . 36
3.2.2 Schilling Nearest Neighbors Test . . . 38
3.3 Results . . . 39
3.3.1 Basic Examples . . . 39
3.3.2 Pseudo-Bootstrap . . . 43
3.4 Additional Work: Energy Test and Permutation Distribution . . . 43
3.4.1 Energy Test and Permutation Distribution Methods . . . 45
3.4.2 Energy Test and Permutation Distribution Results . . . 46
3.5 Conclusion . . . 50

4 Model-Based Hypothesis Testing for Phase in Functional Data . . . 51
4.1 Background Material: Review of Functional Data Models . . . 52
4.1.1 Functional Principal Component Analysis . . . 52
4.1.2 Elastic Functional Principal Component Analysis . . . 53
4.2 Introduction to Hypothesis Testing for Phase Variability . . . 54
4.3 Background Material: Concordance Correlation Coefficient . . . 56
4.3.1 Concordance Correlation Coefficient in R1 . . . 56
4.3.2 Concordance Correlation Coefficient in L2 . . . 57
4.4 Methods of Hypothesis Testing for Phase with CCC . . . 59


4.4.1 Applying Concordance Correlation Coefficient to Hypothesis Testing . . . 60
4.4.2 Simple Concordance Correlation Coefficient for Two Comparisons . . . 61
4.5 Results . . . 63
4.5.1 Simulations . . . 63
4.5.2 Real Data Examples . . . 64
4.6 Conclusion . . . 70

5 General Discussion and Future Work 74

Appendix

A Proof of Theorem in Chapter 4 77

B IRB Approval 81

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

Biographical Sketch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84


LIST OF TABLES

2.1 The table shows the parameters used for simulating the functions. Φ(a, b) represents a Normal PDF with mean a and variance b. The varying parameters demonstrate a variety of amplitude and time warping variability. . . . 20

3.1 Distances between observations are shown above. Using K = 2 as an example, the two nearest neighbors for each observation are in bold in that row. For example, A1's nearest neighbors are A2 and B1. We then count the number of these neighbors that come from the same sample as the observation (last column). There are 10 such neighbors, resulting in TNK = 10/12. . . . 39

3.2 The table shows the p-values using FRT to compare unaligned to aligned and aligned to aligned functions. . . . 40

3.3 The table shows the p-values using FRT to compare unaligned to aligned and aligned to aligned functions. . . . 41

4.1 A table showing the correlation and CCC between X,Y and X,Z. . . . . . . . . . . . 62

4.2 The table shows the parameters used for simulating the functions. Φ(a, b) represents a Normal PDF with mean a and variance b. The varying parameters demonstrate a variety of amplitude and time warping variability. . . . 63


LIST OF FIGURES

1.1 The Berkeley Growth Curves show the change in female children's height (cm/year) from the ages of 1 to 18. . . . 2

1.2 We show the average daily temperature (Celsius) over the course of the year for 35 locations in Canada. . . . 2

1.3 On the left are the Akamai Technologies closing stock prices for each business day in2014. We smoothed the stock data to eliminate seasonal variation (shown on the right). 3

1.4 On the left are simulated functions. On the right are the pointwise mean (black) andthe pointwise mean plus/minus one standard deviation (red). . . . . . . . . . . . . . . 5

1.5 On the left are the Berkeley growth curves. On the right are the pointwise mean(black) and the pointwise mean plus/minus one standard deviation (red). . . . . . . . 5

1.6 This figure illustrates the Multiple Alignment Algorithm. The algorithm initializes (top row) the time warping as identity. For each iteration, we show the phase (left column), amplitudes (middle column), and mean (right column). After initialization, we show the next three iterations (second through fourth rows). We terminate the process on the twentieth iteration (bottom row). . . . 8

1.7 Growth rates (cm/year) of girls between the ages of 1 and 18 years. The first plot shows the unaligned growth curves. The curves are then aligned using Multiple Alignment and are separated into amplitudes (center) and time warping functions (right). . . . 10

1.8 Left: Unimodal functions with a large amount of phase variability. We use the Multiple Alignment algorithm to separate the functions into amplitude (middle) and phase components (right). . . . 11

1.9 Left: Unimodal functions with a large amount of phase variability. We use the Multiple Alignment algorithm to separate the functions into amplitude (middle) and phase components (right). . . . 11

2.1 Representation of Algorithm 5. A: Use {qi} (black dot) to find {bj}. The blue curve represents all possible functions using this basis set. B: Fix {bj} (now black), find optimal {cij} (red dot) by minimizing distance (purple line). C: Find {qi} (green dot), which is the minimum distance from reparameterizations of {qi} to the reconstruction. D: Find the optimal reconstruction of {qi}; repeat until converged. E: Use {qi} to find a new basis set (blue). F: Repeat steps B through E until converged on {bj}. This image is only meant to be an abstract representation to give a more intuitive understanding of the algorithm. It does not reflect all the details of the algorithm, nor does it properly reflect the mathematics behind the algorithm. . . . 19

2.2 The original sampled amplitude SRVF functions: {qi}. These amplitudes have almost no variability (left) and a lot of variability (right). . . . 20


2.3 The original sampled amplitude functions: {fi}. Note that the inverse SRVFs of the basis functions are not orthonormal, so the variability in these functions does not necessarily correspond to the variability seen in Figure 2.2. . . . 21

2.4 The original sampled time warping functions: {γi}. Note that these functions are sampled in a vector space and mapped to Γ. Therefore the variability in these functions does not necessarily correspond to the variability described by Table 2.1. . . . 21

2.5 The original SRVF functions: {qi}. Note that the same amplitudes are used for each row and the same time warping functions are used for each column. . . . 22

2.6 The original sampled functions: {fi}. Note that the same amplitudes are used for each row and the same time warping functions are used for each column. . . . 23

2.7 The reconstructed SRVF functions: {qi} for the three methods. All three methods are converging to the same set of functions. Computational noise is causing the functions to look more jagged. This noise is very minimal and is avoided when observing the {fi} in Figure 2.8. . . . 25

2.8 The reconstructed functions: {fi} for the three methods. Note that these are the inverse SRVFs of the reconstructions, as opposed to the reconstructions themselves. . . . 26

2.9 The amplitude reconstructions of the SRVFs and functions, {q̂i} and {f̂i}, for Separate FPCA. Note that the time warping functions seem to have accounted for some of the initial amplitude variability, and therefore almost none is present in the reconstructions. . . . 27

2.10 The basis functions of the SRVFs, scaled by the variance of the coefficients, used in the reconstruction process of the amplitudes. Computational noise causes these functions to appear jagged, although this error is minimal. . . . 28

2.11 The first column contains basis functions of the {vi}, scaled by the variance of the coefficients, used in the reconstruction process of the amplitudes from Separate FPCA. The second column shows the {vi} themselves. The final column is the exponential mapping of the {vi}, i.e. the {γi}. . . . 30

2.12 Starting in the top row and moving left to right: {γi} from the Elastic FPCA method for J = 0, . . . , 4. A plot of the {γi} at J = 0 is included for reference. Note that this is the only part of any of the methods where J = 0 is more than just a simple mean of the functions. . . . 31

2.13 These are the error rates for all 9 variability settings. Each individual plot shows the error rates for J = 0, ..., 4 of the reconstructions for FPCA (blue), Separate FPCA (green), and Elastic FPCA (red), using Equation 2.6. . . . 32

3.1 We observe sampled functions with phase variability (top) and without phase variability (bottom) in the first column. These functions are then split into amplitude (second column) and phase (third column) components. . . . 34

3.2 The first image shows the original growth curves (cm/year). Their SRVFs are taken and then aligned, shown in the second and third images respectively. The L2 distances


between all the curves in the second and third images are displayed in the fourth image. The first half of the indices corresponds to the unaligned SRVFs and the second half corresponds to the aligned SRVFs. . . . 37

3.3 A Minimal Spanning Tree (left) is formed from two samples. As described in Algorithm 6, the tree is then cut where edges connect nodes from different samples (right). There are now TN = 13 separate trees remaining. This example shows that one sample (red) is completely connected in the original tree. Some nodes are only connected to the original tree through the opposite sample, which now form trees of a single node. . . . 38

3.4 Various simulated functions before (left) and after alignment (center). The second round of alignment (right) shows both the functions from the first round of alignment (blue) and the second (red) for comparison. . . . 40

3.5 The TN (left) and P-values (right) for the simulated functions using the SNNT method are shown as a function of K. The results are shown for comparing unaligned functions to aligned (blue) and aligned to aligned (red) functions. . . . 41

3.6 Various growth curves before (left) and after alignment (center). The second round of alignment (right) shows both the functions from the first round of alignment (blue) and the second (red) for comparison. . . . 42

3.7 The TN (left) and P-values (right) for the Berkeley Growth curves using the SNNT method are shown as a function of K. The results are shown for comparing unaligned functions to unaligned (blue) and aligned to aligned (red) functions. . . . 42

3.8 Unaligned simulated functions are compared to aligned simulated functions using the sampling technique described. Results are shown for FRT and for K = 6 in SNNT. . . . 44

3.9 Unaligned growth curves are compared to aligned growth curves using the sampling technique described. Results are shown for FRT and for K = 6 in SNNT. . . . 44

3.10 We want to know: are the fi (blue) significantly different from the fi (red)? The test statistic is TN = 1540.8 − 986.5 − 68.9 = 485.4. . . . 46

3.11 Distribution of TN under the null hypothesis (blue). Estimated TN from the original sample (red). . . . 47

3.12 Amplitudes (top) and phases (bottom) used for simulation. No axes are shown, but all functions are plotted on the same scale. These amplitudes and phases are composed to form the initial functions used in the simulation. . . . 47

3.13 The amplitudes and phases are composed to simulate a variety of functions. No axes are shown, but all functions are plotted on the same scale. The variability in the amplitudes increases toward the right. The variability in the phases increases upward. . . . 48

3.14 For each of the nine cases, we see the test statistics (red) along with the permutation distribution. The nine cases shown are all plotted on the same axes to demonstrate the change in significance as phase and amplitude variability increase. . . . 49


3.15 This image illustrates when phase variability is significant or not significant. The center region is where the methods proposed in Chapter 3 struggle. . . . 50

4.1 The paired scores for comparison are shown on the left. The identity line (black) has been added for reference. A bootstrapped distribution for ρc is shown on the right. The vertical line shows the location of the point estimate. . . . 57

4.2 Left: One pair of the twenty simulated functions. Right: The bootstrapped distribution of the CCC. . . . 58

4.3 We want to measure the agreement between the scores on the X-axis and those on the Y-axis for two samples: Y (red) and Z (blue), shown on the left. The line of identity (black) has been added for reference. On the right, the bootstrapped distribution of ρ_c^y is compared to the bootstrapped distribution of ρ_c^z. . . . 62

4.4 Simulation The original simulated functions. Each row uses the same amplitudes with increasing time warping variance moving to the right. Each column uses the same time warping functions, but increasing amplitude variance moving upward. . . . 65

4.5 Simulation Each boxplot shows the distribution of ρ_c^{F(J)} (blue) and ρ_c^{E(J)} (red) for each J along the x-axis. . . . 66

4.6 Simulation The p-values for the nine cases are shown for J = 0, ..., 50. The amplitude variance increases as the plots move to the right. Each plot shows the p-values for small (blue), medium (red), and large (yellow) time warping variance. . . . 66

4.7 Female Growth Curves Left: The original female growth curves. Middle and Right: Reconstructions of the growth curves using J = 1 for FPCA and Elastic FPCA, respectively. Note that the functions are shown in the original space and not the SRVFs. . . . 67

4.8 Female Growth Curves Each boxplot shows the distribution of ρ_c^{F(J)} (blue) and ρ_c^{E(J)} (red) for each J along the x-axis. . . . 67

4.9 Female Growth Curves The p-values for the female growth curves are close to zero until around J = 28. . . . 68

4.10 Male Growth Curves Left: The original male growth curves. Middle and Right: Reconstructions of the growth curves using J = 1 for FPCA and Elastic FPCA, respectively. Note that the functions are shown in the original space and not the SRVFs. . . . 68

4.11 Male Growth Curves Each boxplot shows the distribution of ρ_c^{F(J)} (blue) and ρ_c^{E(J)} (red) for each indicated J along the x-axis. . . . 69

4.12 Male Growth Curves The p-values for the male growth curves are close to zero until around J = 25. . . . 69

4.13 Tecator Data Left: The original tecator functions. Middle and Right: Reconstructions of the tecator functions using J = 1 for FPCA and Elastic FPCA, respectively. Note that the functions are shown in the original space and not the SRVFs. . . . 70


4.14 Tecator Data Each boxplot shows the distribution of ρ_c^{F(J)} (blue) and ρ_c^{E(J)} (red) for each indicated J along the x-axis. . . . 71

4.15 Tecator Data The p-values for the tecator data are close to zero until around J = 20. 71

4.16 Temperature Data Left: The original temperature functions. Middle and Right: Reconstructions of the temperature functions using J = 1 for FPCA and Elastic FPCA, respectively. Note that the functions are shown in the original space and not the SRVFs. . . . 72

4.17 Temperature Data Each boxplot shows the distribution of ρ_c^{F(J)} (blue) and ρ_c^{E(J)} (red) for each indicated J along the x-axis. . . . 72

4.18 Temperature Data The p-values for the weather data are close to zero until around J = 5. . . . 73


ABSTRACT

Statistical analysis of functional data requires tools for comparing, summarizing, and modeling observed functions as elements of a function space. A key issue in Functional Data Analysis (FDA) is the presence of phase variability in the observed data. A successful statistical model of functional data has to account for the presence of phase variability; otherwise, the ensuing inferences can be inferior. Recent methods for FDA include steps for phase separation or functional alignment. For example, Elastic Functional Principal Component Analysis (Elastic FPCA) uses the strengths of Functional Principal Component Analysis (FPCA), along with tools from Elastic FDA, to perform joint phase-amplitude separation and modeling. A related problem in FDA is to quantify and test for the amount of phase in given data. We develop two types of hypothesis tests for testing the significance of phase variability: a metric-based approach and a model-based approach. The metric-based approach treats phase and amplitude as independent components and uses their respective metrics to apply the Friedman-Rafsky Test, Schilling's Nearest Neighbors Test, and the Energy Test to test the differences between functions and their amplitudes. In the model-based test, we use Concordance Correlation Coefficients as a tool to quantify the agreement between functions and their reconstructions using FPCA and Elastic FPCA. We demonstrate this framework on a number of simulated and real data sets, including weather, tecator, and growth data.


CHAPTER 1

INTRODUCTION

Functional data is an important topic with applications in numerous fields, such as biology, physics, and mathematics. Various applications have been explored with functional data: gait measurements, tecator data, stock rates, childhood growth rates, weather data, and more.

The importance of functional data analysis can best be explained through an example. Figure 1.1 shows the Berkeley growth curves [9]. The children's heights were recorded on a quarterly basis from the ages of 1 to 18 years. We could treat each record as a vector of 69 time points instead of as a function over time. However, children's heights change over time, not just at the time points measured. Therefore it makes more sense to treat the records as functional data.

Another example is seen in Figure 1.2. The average daily temperature is plotted for each day of the year at 35 locations in Canada [17]. We could treat each location as a vector of 365 days, but temperature does not change discretely, so it makes more sense to treat it as a function.

According to Morris [15], functional regression is the fastest growing area in functional data analysis. This developing area includes Functional Predictor Regression, Functional Response Regression, and Function-on-Function Regression. Three other areas of Functional Data Analysis are replication, regularization, and basis functions. Replication and regularization are key components of statistics in general, but they play an especially important role in functional data. Functional data is often noisy, and our goal is to explain the variation in noise between replicates. We can smooth functions to help eliminate excessive noise.

Figure 1.3 demonstrates the advantage of smoothing through stock market data. The plot on the left shows the closing stock price for Akamai Technologies on every business day in 2014. The day-to-day noise is sometimes referred to as seasonal variation. If we are interested in long-term trends of this particular stock, we may want to eliminate this seasonal variation. The plot on the right shows the smoothed Akamai stock prices using a moving average. Now it is easier to see the general trend of the Akamai stock prices over the course of the year. We will not go into the details of this method, since it is not of interest in our problem.
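The moving-average smoothing described above can be sketched in a few lines. This is a minimal illustration on synthetic prices, not the actual Akamai series; the 10-day window and the trend and noise levels are assumptions made for the example:

```python
import numpy as np

def moving_average(prices, window=10):
    """Smooth a series with a moving average of the given window length."""
    kernel = np.ones(window) / window
    # mode="valid" avoids edge artifacts; the result is window-1 points shorter.
    return np.convolve(prices, kernel, mode="valid")

# Synthetic stand-in for ~252 business days of closing prices:
# a slow upward trend plus day-to-day ("seasonal") noise.
rng = np.random.default_rng(0)
days = np.arange(252)
prices = 60 + 0.02 * days + rng.normal(0, 1.5, size=days.size)

smoothed = moving_average(prices, window=10)
# The smoothed series varies far less from day to day than the raw one,
# making the long-term trend easier to see.
```
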


Figure 1.1: The Berkeley Growth Curves show the change in female children's height (cm/year) from the ages of 1 to 18.

Figure 1.2: We show the average daily temperature (Celsius) over the course of the year for 35 locations in Canada.


Figure 1.3: On the left are the Akamai Technologies closing stock prices for each business day in 2014. We smoothed the stock data to eliminate seasonal variation (shown on the right).

The first part of this dissertation will focus on basis functions and using them to model functional data. The various models will be discussed in Chapter 2. These models originate from Principal Component Analysis (PCA) [16]. PCA is a method of representing correlated, multivariate observations as uncorrelated variables called Principal Components. PCA allows us to express observations using fewer variables. The Karhunen-Loève Expansion Theorem [10] showed that the same principles can be applied to functions. The first model we explore, Functional Principal Component Analysis (FPCA) [3], uses the Karhunen-Loève Expansion Theorem to create a PCA equivalent for functions.

The inclusion of a time warping component in the model has been explored through Separate Functional Principal Component Analysis (Separate FPCA) [23]. This model stems from the idea of representing functions as an amplitude component and an orthogonal phase component [20]. In Separate FPCA, the amplitude component is expressed using FPCA. The phase component is also expressed using FPCA, after being mapped to a vector space.

The third model we explore, Elastic Functional Principal Component Analysis (Elastic FPCA), is an expansion of work completed by Kneip [12] and Tucker [22]. Elastic FPCA models functions with an FPCA basis while considering the amplitude and phase components jointly. Details of Elastic FPCA and the other two models are presented in Chapter 2. The goal of that chapter is to explore the reconstruction of functions with and without time warping. As we will see, including time warping can result in overfitting when time warping isn't actually present.

The second part of this dissertation will focus on hypothesis testing. We first discuss phase-amplitude separation before giving motivation for hypothesis testing.


Throughout this dissertation, we show examples of our results on both simulated and real data sets. The real data sets include the NIR spectra data [4], the Berkeley Growth Curves [9], and the Canadian Weather data [17]. The NIR spectra data set, also referred to here as the tecator data, used a "Tecator Infratec spectrometer that measures the absorbencies at 100 wavelengths in the region 850-1050 nm" [4]. The Berkeley Growth Curves show the change in children's heights from the ages of one to eighteen. The growth curves are a good example of time warping in a data set, since children tend to have growth spurts at various ages. The Canadian Weather data shows the average daily temperature for each day of the year at 35 separate weather stations in Canada.

1.1 Motivation for Phase and Amplitude Separation

Returning to the idea of growth spurts in children, we may want to consider that children have

growth spurts at different times. One child may have a growth spurt at the ages of 1, 4, 8, and

12, while another has growth spurts at the ages of 1, 3, 8, and 13. The two children both have

growth spurts, but at different ages. We can consider this a phase component. Children will also

have different amplitudes, or changes in height. For example, one child might grow 10 cm/year

at the age of 8, while another child grows 7 cm/year. We provide formal definitions of phase and

amplitude in the next section.

We need to take phase and amplitude into consideration in order to compute more accurate

statistics. Consider Figure 1.4 as an example, which shows sampled functions on the left and their

pointwise sample mean and sample standard deviation on the right. When we take the

pointwise mean of these functions, we get a plateau-shaped function. The mean does not give an

accurate representation of the individual functions, which have distinct peaks. The same is true

for the standard deviation. If we were to separate out the phase and amplitude components, we

could get a more accurate representation of the functions.
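This plateau effect is easy to reproduce numerically. The bump shape, width, and peak locations below are illustrative choices of ours, not the data behind Figure 1.4:

```python
import numpy as np

# Simulated unimodal functions whose peaks occur at different times.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
peaks = rng.uniform(0.3, 0.7, size=30)                 # random peak locations
F = np.exp(-((t[None, :] - peaks[:, None]) ** 2) / (2 * 0.05 ** 2))

mean_f = F.mean(axis=0)                                # pointwise sample mean
std_f = F.std(axis=0)                                  # pointwise sample std
# Every individual function reaches a height near 1, but the pointwise
# mean is flattened into a plateau well below the individual peaks.
```

The pointwise mean here tops out at roughly a third of the height of any individual function, which is exactly the misleading plateau described above.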

The same issues arise with the Berkeley growth curves. Figure 1.5 shows the original growth

curves on the left, and their pointwise sample mean and sample standard deviation on the right. If

we take a pointwise average of all the children’s growth curves, we would not get a very accurate

representation of growth spurts. It appears as though children grow quickly at first and then have

a relatively flat rate of growth until around the age of 12. However, we can see from the individual

growth curves that children have multiple distinct growth spurts throughout their childhood. We

again conclude that separating out phase and amplitude may provide more accurate representations

of the functions.


Figure 1.4: On the left are simulated functions. On the right are the pointwise mean (black) and the pointwise mean plus/minus one standard deviation (red).

Figure 1.5: On the left are the Berkeley growth curves. On the right are the pointwise mean (black) and the pointwise mean plus/minus one standard deviation (red).


An L2 function can be described in terms of amplitude variability and phase (time warping)

variability. We will introduce the framework this dissertation uses in the next section.

1.2 Background Material: Mathematical Amplitude Framework

It is important to understand the spaces and distance metric we are using. This section provides

an overview of the space of amplitudes, the space of time warping functions, and the distance metric

used. More details can be found in the textbook "Functional and Shape Data Analysis" [20]. We

begin by introducing the distance metric and multiple alignment. We will then discuss the details

of the amplitude and phase spaces in the following subsections.

1.2.1 Phase Amplitude Separation

In the previous section, we gave an intuitive idea of what is meant by phase and amplitude.

Now the goal is to separate the functions into their phase and amplitude components. Essentially

we will align the peaks and valleys of functions through composition with a time warping function.

We will then consider the aligned functions to be the amplitude component and the time warping

functions to be the phase component. We remind the reader that this subsection will just give a

general overview of Phase Amplitude Separation. The following subsections will go into detail.

Algorithm 1 gives a general overview of how phase and amplitude separation is conducted. Our

goal is to find a set of time warping functions, which minimize the variance of the amplitudes.

Algorithm 1 Multiple Alignment Overview

1. Initialize the mean of the functions

2. Align each function to the template mean

3. Update the mean

4. Repeat steps 2 and 3 until converged

Figure 1.6 illustrates Algorithm 1 using the Female Berkeley Growth data. We initialize the

process (top row) with an identity phase component (first column), the details of which are explained

in a later subsection. Since the phases are all identity, the amplitudes (second column) are the

original functions. We also show the mean of these amplitudes (third column). On the first

iteration (second row), we find time warping functions (second row, first column), which minimize

the distance between each original function and the mean of the amplitudes. Note that we compute


this distance in a pairwise fashion on a given iteration. We then compose the time warping functions

with the original functions to get a new set of amplitudes (second row, second column). We then

update the mean of the amplitudes using the pointwise mean of the new amplitudes (second row,

third column).

We repeat the process of computing the phases and amplitudes and updating the mean of the

amplitudes until some convergence criterion has been met. In the particular case demonstrated in

Figure 1.6, we terminate the process after the twentieth iteration (bottom row).

1.2.2 Space of Amplitudes

We begin by discussing the general functions and amplitude space. Note that the L2 functions

exist on a closed interval [a, b]. For simplicity we will assume a = 0, b = 1 unless otherwise specified

for a real dataset. For a set of N L2 functions {f_i(t)}, i = 1, ..., N, we observe the Square-Root

Velocity Functions (SRVF), {q_i(t)}:

$$q_i(t) = \begin{cases} \dot{f}_i(t)/\sqrt{|\dot{f}_i(t)|} & \text{if } \dot{f}_i(t) \text{ exists and is nonzero} \\ 0 & \text{otherwise.} \end{cases}$$

A discussion of why we use the SRVF instead of the original functions is in [19].
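The definition above can be discretized directly. The following sketch (function names ours) uses a finite-difference derivative:

```python
import numpy as np

def srvf(f, t):
    """Square-Root Velocity Function of a function f sampled on grid t.

    Implements q(t) = f'(t) / sqrt(|f'(t)|), with q = 0 where the
    derivative vanishes (the "otherwise" branch of the definition).
    """
    df = np.gradient(f, t)                    # finite-difference derivative
    q = np.zeros_like(df)
    nz = np.abs(df) > 1e-12                   # guard the 0/0 case
    q[nz] = df[nz] / np.sqrt(np.abs(df[nz]))
    return q

t = np.linspace(0, 1, 101)
q = srvf(t ** 2, t)                           # f(t) = t^2, so f'(t) = 2t
# away from the endpoints, q(t) should equal sqrt(2 t)
```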

The SRVFs can be expressed as follows:

$$q_i(t) = \mu(t) + \sum_{j=1}^{\infty} c_{ij} b_j(t),$$

where $\mu$ is an underlying mean, $\{b_j\}$ is an orthonormal basis of $L^2$, and $c_{ij}$ has mean 0 and variance $\sigma_j^2$. Simulated functions will use the Fourier basis with normally distributed coefficients for simplicity.

In practice we use a finite basis, because an infinite basis is not practical:

$$q_i(t) = \mu(t) + \sum_{j=1}^{J} c_{ij} b_j(t) + \varepsilon_i(t),$$

where $J$ is the number of basis functions, and $\varepsilon_i(t)$ has mean 0 and variance $\sigma_\varepsilon^2$. In application, since the functions are discretized at $M$ time points, $J \le \min(M, N)$. When discussing the basis sets, it will be assumed that $\mathrm{var}(c_{ij}) > \mathrm{var}(c_{ik})$ when $j < k$, as is common practice.


Layout of Figure 1.6: columns are Iter., Phase, Amplitude, and Mean of Amp.; rows are iterations 0, 1, 2, 3, ..., 20.

Figure 1.6: This figure illustrates the Multiple Alignment Algorithm. The algorithm initializes (top row) time warping as identity. For each iteration, we show the phase (left column), amplitudes (middle column), and mean (right column). After initialization, we show the next three iterations (second through fourth rows). We terminate the process on the twentieth iteration (bottom row).


1.2.3 Space of Time Warping Functions

The time warping functions are represented by {γi} ∈ Γ where Γ is the set of all diffeomorphisms

from [0, 1] to [0, 1]. These functions form a group action on the SRVFs:

$$(q_i, \gamma_i) = (q_i \circ \gamma_i)\sqrt{\dot{\gamma}_i}.$$

This action preserves the norm of the SRVFs, i.e. ‖qi‖ = ‖(qi, γi)‖. To express these time

warping functions with a set of basis functions, they are mapped into a vector space:

$$v_i(t) = G(\gamma_i(t)) = \frac{\theta_i}{\sin(\theta_i)}\left(\sqrt{\dot{\gamma}_i(t)} - \cos(\theta_i)\right), \qquad (1.1)$$

where

$$\theta_i = \cos^{-1}\left(\int_0^1 \sqrt{\dot{\gamma}_i(t)}\, dt\right).$$

These functions can now be represented as:

$$v_i(t) = \mu^{(v)}(t) + \sum_{j=1}^{\infty} c^{(v)}_{ij} b^{(v)}_j(t),$$

where $\{b^{(v)}_j\}$ is an orthonormal basis of $L^2$, and $c^{(v)}_{ij}$ has mean 0 and standard deviation $\sigma^{(v)}_j$. The parameter $\mu^{(v)}$ is taken to be zero. Note that if $\mu^{(v)}$ is not zero, the mean can be subtracted from all $v_i$s, which corresponds to centering the $\gamma_i$s at $\gamma_{id}(t) = t$.
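Equation 1.1 and its inverse (the exponential map back to Γ) can be sketched numerically along the following lines; the grid size and the discrete approximations of the integrals are our own choices:

```python
import numpy as np

t = np.linspace(0, 1, 201)
h = t[1] - t[0]

def G(gamma):
    """Eq. 1.1: map a warping function gamma to a tangent vector v."""
    psi = np.sqrt(np.gradient(gamma, t))               # sqrt of gamma-dot
    theta = np.arccos(np.clip(np.sum(psi) * h, -1.0, 1.0))
    if theta < 1e-8:                                   # gamma is (numerically) identity
        return np.zeros_like(psi)
    return (theta / np.sin(theta)) * (psi - np.cos(theta))

def G_inv(v):
    """Inverse map: tangent vector v back to a warping function in Gamma."""
    nv = np.sqrt(np.sum(v ** 2) * h)                   # L2 norm of v
    if nv < 1e-8:
        return t.copy()                                # identity warp
    psi = np.cos(nv) + np.sin(nv) * v / nv
    gamma = np.cumsum(psi ** 2) * h                    # gamma(t) = integral of psi^2
    return (gamma - gamma[0]) / (gamma[-1] - gamma[0]) # pin endpoints to 0 and 1

gamma = (np.exp(t) - 1) / (np.e - 1)                   # an example warp in Gamma
gamma_back = G_inv(G(gamma))                           # round trip recovers gamma
```

The identity warp maps to the zero vector, and the round trip γ → v → γ recovers the warp up to discretization error.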

1.2.4 Distance Metric

The methods discussed in the previous subsections develop the distance metric:

$$d(q_i, q_j) = \inf_{\gamma \in \Gamma} \|q_i - (q_j, \gamma)\|.$$

Variability from time warping can be estimated by finding a mean and a set of time warping

functions that minimize these distances:


Figure 1.7: Growth rates (cm/year) of girls between the ages of 1 and 18 years of age. The first plot shows the unaligned growth curves. The curves are then aligned using Multiple Alignment and are separated into amplitudes (center) and time warping functions (right).

$$\{\hat{\gamma}_i\},\ \hat{\mu} = \operatorname*{arg\,inf}_{\gamma_i \in \Gamma,\ \mu} \sum_{i=1}^{N} \|(q_i, \gamma_i) - \mu\|^2,$$

where $\mu = \frac{1}{N}\sum_{i=1}^{N} (q_i, \gamma_i)$. This process is conducted iteratively using Algorithm 2.

Algorithm 2 Multiple Alignment

1. Set $\gamma_i = \gamma_{id}$, for all $i$

2. Interpolate $q_i$ at $\gamma_i$: $\tilde{q}_i = (q_i, \gamma_i)$

3. Compute the estimated mean: $\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} \tilde{q}_i$

4. Use the Dynamic Programming Algorithm [19]: $\gamma_i = \operatorname*{arg\,inf}_{\gamma \in \Gamma} \|(q_i, \gamma) - \hat{\mu}\|^2$ for all $i$

5. Repeat steps 2 through 4 until converged

Figure 1.7 is an example of the Multiple Alignment algorithm. The figure shows growth curves

from the Berkeley Growth Study [9] on the left. We show the amplitude and time warping functions

in the middle and on the right, respectively. We align the growth curves to give a better

understanding of the pattern of growth spurts in children, without needing to worry about these

growth spurts occurring at different ages for different children.
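The outer loop of Algorithm 2 can be sketched compactly. The Dynamic Programming search over all of Γ is beyond a short example, so this sketch restricts the search, purely as an illustrative assumption, to the one-parameter family γ_a(t) = t^a:

```python
import numpy as np

t = np.linspace(0, 1, 200)

def group_action(q, gamma):
    """(q, gamma) = q(gamma(t)) * sqrt(gamma-dot(t))."""
    return np.interp(gamma, t, q) * np.sqrt(np.gradient(gamma, t))

# Candidate warps gamma_a(t) = t^a: a crude stand-in for the DPA search.
candidates = [t ** a for a in np.linspace(0.4, 2.5, 60)]

def multiple_alignment(Q, n_iter=10):
    """Sketch of Algorithm 2: alternate mean updates and per-function alignment."""
    Qa = Q.copy()
    for _ in range(n_iter):
        mu = Qa.mean(axis=0)                           # step 3: template mean
        for i, q in enumerate(Q):                      # step 4: best warp per function
            dists = [np.sum((group_action(q, g) - mu) ** 2) for g in candidates]
            Qa[i] = group_action(q, candidates[int(np.argmin(dists))])
    return Qa

rng = np.random.default_rng(1)
centers = rng.uniform(0.35, 0.65, size=20)
Q = np.exp(-(t[None, :] - centers[:, None]) ** 2 / 0.01)  # misaligned bumps
Qa = multiple_alignment(Q)
# the pointwise variance of the aligned amplitudes is typically much smaller
```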

1.3 Motivation for Testing Phase Variability

Once we understand phase-amplitude separation, it is important to determine when phase is

significant. Figure 1.8 and Figure 1.9 demonstrate the importance of testing for significance. Both


Figure 1.8: Left: Unimodal functions with a large amount of phase variability. We use the Multiple Alignment algorithm to separate the functions into amplitude (middle) and phase components (right).

Figure 1.9: Left: Unimodal functions with a small amount of phase variability. We use the Multiple Alignment algorithm to separate the functions into amplitude (middle) and phase components (right).

figures show sample functions (left) and their amplitudes (middle) and phases (right). In Figure

1.8, we separated the phase and amplitude components from a set of simulated functions using the

Multiple Alignment Algorithm. Figure 1.9 shows another example; however, there does not appear

to be as extreme a difference between the original functions and the amplitudes after alignment.

We can clearly see phase plays an important role in the first example. It is less clear in the

second example whether phase plays an important role. On the one hand, the functions look almost

identical after alignment. On the other hand, the phases are not identically identity. The locations

where phase is not identity appear to be where the original functions are almost zero. We are left

with the question: is phase significant or not?

Morris [15] conducted a literature review on functional regression. He notes that most work in

functional data analysis focuses on point estimates. He also notes that existing work on confidence

intervals and hypothesis testing for functions focuses on pointwise estimation instead of on the

function as a whole. He concludes that more work is needed in this area.


Chapter 3 explores a metric-based approach to hypothesis testing for the presence of time

warping in functions, while Chapter 4 explores a model-based approach. Hypothesis testing allows

us to better understand the effects of the various models and to identify when overfitting with

a time warping function may be occurring.

Hagwood [7] conducted hypothesis tests on closed curves using time warping components in a

distance metric. Three preexisting tests for comparing two populations are used to conduct the

hypothesis tests: the Friedman-Rafsky Test [6], Schilling's Nearest Neighbors Test [18], and the Energy Test

[1]. Although our interest in hypothesis testing differs from Hagwood [7], this paper led us to

these three hypothesis testing methods.

Chapter 3 will use the Friedman-Rafsky Test (FRT) [6] and the Schilling Nearest Neighbors

Test (SNNT) [18] in our hypothesis tests. FRT employs a minimal spanning tree to determine

the extent of the separation between two populations of observations. SNNT uses a distance matrix

and indicator functions to measure the same separation. Later in Chapter 3 we explore using

the Energy Test, which takes an approach similar to an ANOVA.
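Since the Energy Test recurs later, a minimal sketch of the two-sample energy statistic may be useful here. The array shapes, names, and simulated Gaussian samples are our illustrative choices; in a test, the p-value would come from a permutation distribution of this statistic:

```python
import numpy as np

def energy_statistic(X, Y):
    """Two-sample energy test statistic; rows of X and Y are observations.

    E = (nm / (n + m)) * (2 mean|X - Y| - mean|X - X'| - mean|Y - Y'|).
    """
    n, m = len(X), len(Y)
    dxy = np.linalg.norm(X[:, None] - Y[None, :], axis=-1).mean()
    dxx = np.linalg.norm(X[:, None] - X[None, :], axis=-1).mean()
    dyy = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1).mean()
    return n * m / (n + m) * (2 * dxy - dxx - dyy)

rng = np.random.default_rng(3)
X = rng.normal(0, 1, (50, 4))
stat_same = energy_statistic(X, rng.normal(0, 1, (60, 4)))   # same distribution
stat_diff = energy_statistic(X, rng.normal(2, 1, (60, 4)))   # shifted mean
# stat_diff is far larger; reshuffling the pooled group labels many times
# and recomputing the statistic supplies the permutation p-value
```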

The main focus of Chapter 4 is on a model-based approach to creating a hypothesis test for

phase variability. This required the use of a paired test, which is a shortcoming of the tests we

introduce in Chapter 3. Chapter 4 will review a few of the methods we attempted to use before

we came across Concordance Correlation Coefficients (CCC) [14]. CCC are a way of measuring

agreement between paired continuous data. King [11] expanded the use of CCC for continuous

and categorical data. Li [13] expanded the use of CCC for evaluating the agreement of two sets of

functions.

Our methods require comparing multiple concordance correlation coefficients. Fisher [5] intro-

duced the z-transformation for use with correlation coefficients. Steiger [21] proposed comparing

two correlations using a correlation matrix. These methods allowed Barnhart [2] to create a method

for comparing multiple CCCs. Unfortunately this method of comparison for CCC does not work

well in cases of almost complete agreement.

1.4 Overview of Dissertation

This dissertation will have the following discussions:

1. Chapter 2: Methods of Modeling Functional Data

(a) Introduction to Functional Principal Component Analysis (FPCA)


(b) Previous models of FPCA after phase-amplitude separation

(c) Introduce a new model for combining phase-amplitude separation with FPCA

2. Chapter 3: Metric-Based Hypothesis Testing for Phase Variability in Functional Data

(a) Review of the importance of hypothesis testing for phase in functional data analysis

(b) Using the Friedman-Rafsky Test and Schilling’s Nearest Neighbors in the hypothesis

testing

(c) Using the Energy Test with a permutation distribution for hypothesis testing

3. Chapter 4: Hypothesis testing for Phase Amplitude Separation using Models

(a) Review of the goals of hypothesis testing for phase in functional data analysis

(b) Introduction to Concordance Correlation Coefficients

(c) Using Concordance Correlation Coefficients in hypothesis testing for phase in functional

data

4. Each Chapter also includes various simulated and/or real data examples to demonstrate the

methods


CHAPTER 2

METHODS OF MODELING FUNCTIONAL DATA

As explained in Chapter 1, there are various L2 functional data sets which have been modeled in

the past. Understanding the variability in functions is important for estimation and prediction. The

challenges of understanding functional variability include working in an infinite dimensional space.

In addition, we run into issues with time warping of functions and issues with registration.

We will first focus on the issue of working in an infinite dimensional space. Dimension reduction

is possible through the use of basis functions. As mentioned in Chapter 1, there are various models

for regression on functional data. This dissertation will focus on linear models instead of more

complicated models. There are two main strategies used to reduce the dimensions in a linear

model: use a pre-determined set of basis functions or use a data driven set of basis functions.

Examples of pre-determined sets of basis functions include the Fourier Basis or the polynomial

basis functions. This dissertation will focus on using data driven methods, specifically those that

stem from Principal Component Analysis.

The goal of this Chapter is to understand the more substantial directions of variability both

from amplitudes and from time warping. To do this, we will study the reconstruction of functions

under three different models.

This chapter will treat functions as random quantities, which can be measured in various ways.

The goal will be to find a model that best represents the variability in the given dataset. We will

need metrics for measuring differences in the functions and ways of representing the functions.

Section 2 will discuss some of the theory behind these models including a distance metric.

Section 3 will discuss the methodology behind the three models: FPCA, Separate FPCA and

Elastic FPCA. Section 4 will show results from simulations and real data examples. Section 5 will

draw conclusions and point to future directions for exploration.

2.1 Models for Functional Data

This section will explore two previous models for functional data: Functional Principal Compo-

nent Analysis (FPCA) and Separate Functional Principal Component Analysis (Separate FPCA).


We will introduce and expand on a third model: Elastic Functional Principal Component Analysis

(Elastic FPCA). Examples of the three models will be shown in the results section.

2.1.1 Functional Principal Component Analysis in SRVF Space

Functional Principal Component Analysis (FPCA) [3] is straightforward, since it does not take

time warping into consideration. FPCA comes almost directly from Principal Component Analysis

(PCA) [16]. PCA is a method of representing correlated, multivariate observations as uncorrelated

variables called Principal Components. PCA allows us to express observations using fewer vari-

ables. The Karhunen-Loeve Expansion Theorem [10] showed the same principles can be applied to

functions. The first model we explore, Functional Principal Component Analysis (FPCA) [3] uses

the Karhunen-Loeve Expansion Theorem to create a PCA equivalent for functions. We can express

these functions in the SRVF space as follows:

$$q_i(t) = \mu(t) + \sum_{j=1}^{J} c_{ij} b_j(t) + \varepsilon_i(t), \qquad (2.1)$$

where:

• $\mu(t)$ is the expected value of $q_i(t)$,

• $\{b_j\}$ form an orthonormal basis of $L^2$,

• $\varepsilon_i \in L^2$ is considered the noise process, typically chosen to be a white Gaussian process with zero mean and variance $\sigma^2$,

• $c_{ij} \in \mathbb{R}$ are coefficients of $(q_i - \mu)$ with respect to $\{b_j\}$. In order to ensure that $\mu$ is the mean of $q_i$, we impose the condition that the sample mean of $\{c_{\cdot j}\}$ is zero.

Given a set of observed functions {fi}, the estimation of these model parameters is performed

using the minimization on their SRVFs, {qi}:

$$(\hat{\mu}, \{\hat{b}_j\}) = \operatorname*{arg\,inf}_{\mu,\{b_j\},\{c_{ij}\}} \sum_{i=1}^{N} \Big\| q_i - \mu - \sum_{j=1}^{J} c_{ij} b_j \Big\|^2, \qquad (2.2)$$

and set $c_{ij} = \langle q_i - \hat{\mu}, \hat{b}_j \rangle$. This minimization is performed as follows:

Algorithm 3 Functional Principal Component Analysis

1. Estimate the mean: $\hat{\mu}(t) = \frac{1}{N}\sum_{i=1}^{N} q_i(t)$.

2. Define the sample covariance function $K : [0,1] \times [0,1] \to \mathbb{R}$ according to:

$$K(s,t) = \frac{1}{N-1} \sum_{i=1}^{N} (q_i(s) - \hat{\mu}(s))(q_i(t) - \hat{\mu}(t)).$$

3. Perform Singular Value Decomposition on $K$ to obtain the estimated first $J$ principal directions: $\{\hat{b}_j\}$.

4. Compute the estimated coefficients: $\hat{c}_{ij} = \langle q_i - \hat{\mu}, \hat{b}_j \rangle$.

An estimate for qi is formed as follows:

$$\hat{q}_i(t) = \hat{\mu}(t) + \sum_{j=1}^{J} \hat{c}_{ij} \hat{b}_j(t).$$

The function $K$ is by definition symmetric and positive semidefinite, where the latter means that for any $g \in L^2$, we have $\int_0^1 \int_0^1 K(s,t) g(s) g(t)\, ds\, dt \ge 0$. This covariance function $K$ defines a linear operator on $L^2$ using the formula $A : L^2 \to L^2$, $Aq(t) = \int_0^1 K(s,t) q(s)\, ds$. Since $K$ is positive semidefinite, we have $\langle Aq, q \rangle \ge 0$ for all $q \in L^2$ [8].

According to the Karhunen-Loeve expansion theorem [10], the eigenfunctions of $A$ provide the principal components of the functional data. Let $\hat{b}_1, \hat{b}_2, \ldots, \hat{b}_J$ be the eigenfunctions of $A$, i.e. $A\hat{b}_j(t) = \lambda_j \hat{b}_j(t)$, so that the corresponding eigenvalues $\{\lambda_j\}$ satisfy $|\lambda_1| \ge |\lambda_2| \ge \ldots$. Then $\{\hat{b}_j, j = 1, 2, \ldots, J\}$ solves the optimization problem in 2.2. They are termed the first $J$ principal directions of variation in the given data. The space spanned by them is called the principal subspace.

As discussed in Hall [8], there are a few asymptotic properties of FPCA:

• As $J \to \infty$, $\|\hat{q}_i - q_i\| \to 0$

• As $N \to \infty$, $\hat{b}_j \to b_j$ and $\mathrm{var}(\hat{c}_{ij}) \to \mathrm{var}(c_{ij})$
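Algorithm 3 admits a compact numerical sketch. The names, grid, and simulated data below are ours, and the SVD of K is computed via the equivalent eigendecomposition of the symmetric covariance matrix:

```python
import numpy as np

def fpca(Q, t, J):
    """Sketch of Algorithm 3 for SRVFs sampled as rows of Q (N x M) on grid t."""
    h = t[1] - t[0]
    mu = Q.mean(axis=0)                                  # step 1: mean
    Qc = Q - mu
    K = (Qc.T @ Qc) / (len(Q) - 1)                       # step 2: covariance
    lam, B = np.linalg.eigh(K)                           # step 3: eigendecomposition
    order = np.argsort(lam)[::-1][:J]                    #  (largest eigenvalues first)
    B = B[:, order] / np.sqrt(h)                         # L2-orthonormal directions
    C = Qc @ B * h                                       # step 4: <q_i - mu, b_j>
    Qhat = mu + C @ B.T                                  # reconstruction
    return mu, B, C, Qhat

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 100)
basis = np.array([np.sin(2 * np.pi * t), np.sin(4 * np.pi * t)])
coef = rng.normal(0, [1.0, 0.5], size=(40, 2))           # two true components
Q = coef @ basis + 0.01 * rng.normal(size=(40, 100))
errs = [np.sum((fpca(Q, t, J)[3] - Q) ** 2) for J in (1, 2, 3)]
# reconstruction error drops sharply once both true directions are included
```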

2.1.2 FPCA of Phase and Amplitude Components

Separate FPCA [23] takes possible time warping of the functions into consideration with the

following model:

$$q_i(t) = (\tilde{q}_i, \gamma_i) \qquad (2.3)$$

$$= \left( \mu(t) + \sum_{j=1}^{J} c_{ij} b_j(t) + \varepsilon_i(t),\ G^{-1}\Big( \sum_{j=1}^{J} c^{(v)}_{ij} b^{(v)}_j(t) + \varepsilon^{(v)}_i(t) \Big) \right), \qquad (2.4)$$

where all parameters are as discussed in Section 2.2 and $G^{-1}$ is the inverse of Equation 1.1.

Estimates for the $\gamma_i$s and $\tilde{q}_i$s can be found using the following algorithm.

Algorithm 4 Separate FPCA

1. Align the SRVFs of the functions to get the $\tilde{q}_i$s and $\gamma_i$s, using Algorithm 2.

2. FPCA on Amplitudes

(a) Perform FPCA on the $\tilde{q}_i$s to get their lower dimension reconstructions, $\hat{\tilde{q}}_i$, using Algorithm 3.

3. FPCA on Phases

(a) Map the $\gamma_i$s from $\Gamma$ to $T(S_\infty)$ to get the $v_i$s using the $G$ function.

(b) Perform FPCA on the $v_i$s to get their lower dimension reconstructions, $\hat{v}_i$s.

(c) Map the $\hat{v}_i$s back to $\Gamma$ to get the $\hat{\gamma}_i$s using $G^{-1}$.

4. Combine the estimates as follows: $\hat{q}_i(t) = (\hat{\tilde{q}}_i, \hat{\gamma}_i^{-1})$.

The reconstructions of the amplitudes and time warping functions, taken separately, hold the same properties as described in the FPCA section. However, the full reconstruction of the functions is somewhat arbitrary, since the principal directions are found separately and ignore the possible dependence of the time warping functions on the amplitudes, and vice versa. This is where Elastic FPCA comes in.

2.1.3 Elastic FPCA

Elastic FPCA comes from models studied in [12] and [22]. It considers $\varepsilon_i$ a single estimate of error, as opposed to having two separate estimates of error. While $\varepsilon_i$ represents variability unaccounted for by the reconstruction, $\gamma_i$ assists with minimizing the variability that can be attributed to time warping. The model is as follows:

$$q_i(t) = \left( \mu(t) + \sum_{j=1}^{J} c_{ij} b_j(t),\ \gamma_i \right) + \varepsilon_i(t). \qquad (2.5)$$

Similar to the method used in Separate FPCA, we can apply a coordinate-descent technique

for estimation and the resulting algorithm can be summarized in the following.


Algorithm 5 Elastic Functional Principal Component Analysis

1. Initialization: Set $\gamma_i = \gamma_{id}$ and $\tilde{q}_i = (q_i, \gamma_i)$, for all $i$.

2. Compute the covariance matrix:

$$K_q(s,t) = \frac{1}{N-1} \sum_{i=1}^{N} (\tilde{q}_i(s) - \hat{\mu}(s))(\tilde{q}_i(t) - \hat{\mu}(t)).$$

3. Take the SVD of $K_q$, and set the $\hat{b}_j$s to be the first $J$ eigenvectors of $K_q$, $\hat{c}_{ij} = \langle \tilde{q}_i, \hat{b}_j \rangle$, and $\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} \tilde{q}_i$. The set of basis functions is now fixed for the following:

(a) For each $i$, solve the optimization problem using the DPA:

$$\gamma_i = \operatorname*{arg\,inf}_{\gamma \in \Gamma} \Big\| (q_i, \gamma) - \hat{\mu} - \sum_{j=1}^{J} \hat{c}_{ij} \hat{b}_j \Big\|^2.$$

(b) Form the warped SRVFs $\tilde{q}_i = (q_i, \gamma_i)$ for all $i$.

(c) Estimate $\mu$ using $\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} \tilde{q}_i$.

(d) Compute the coefficients $\hat{c}_{ij} = \langle \tilde{q}_i - \hat{\mu}, \hat{b}_j \rangle$ for $i = 1, \ldots, N$ and $j = 1, \ldots, J$.

(e) Check for convergence of the reconstructions. If not converged, return to step (a).

4. Check for convergence of the basis functions. If not converged, return to Step 3.

We test the convergence using the decrease in the objective function from one iteration to the next.

Figure 2.1 is an abstract drawing of Algorithm 5. It does not reflect details of the algorithm, but

the reader may find it useful to get an intuitive sense of the iterations.

As $J$ increases, $\varepsilon_i$ decreases and $\gamma_i$ approaches $\gamma_{id}$. Note that when $J = 0$, Elastic FPCA returns the mean of the aligned functions as found in Algorithm 2. As $J \to \infty$, the convergence properties described under FPCA hold, and the estimates converge to the same parameters as in FPCA.
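Algorithm 5 can be sketched along the same lines as the earlier alignment sketch. Since the DPA is beyond a short example, the search over Γ is again restricted, as an illustrative assumption only, to the one-parameter family γ_a(t) = t^a:

```python
import numpy as np

t = np.linspace(0, 1, 200)
h = t[1] - t[0]
candidates = [t ** a for a in np.linspace(0.5, 2.0, 40)]  # crude stand-in for DPA

def group_action(q, gamma):
    return np.interp(gamma, t, q) * np.sqrt(np.gradient(gamma, t))

def pca_basis(Qw, J):
    """Mean and first J L2-orthonormal principal directions of the rows of Qw."""
    mu = Qw.mean(axis=0)
    Qc = Qw - mu
    lam, B = np.linalg.eigh((Qc.T @ Qc) / (len(Qw) - 1))
    return mu, B[:, np.argsort(lam)[::-1][:J]] / np.sqrt(h)

def elastic_fpca(Q, J, n_outer=4, n_inner=4):
    """Sketch of Algorithm 5: alternate basis estimation (steps 2-3) with
    coefficient and warp updates (steps 3a-3e)."""
    Qw = Q.copy()                                        # step 1: identity warps
    for _ in range(n_outer):
        mu, B = pca_basis(Qw, J)
        for _ in range(n_inner):
            recon = mu + ((Qw - mu) @ B * h) @ B.T       # current reconstructions
            for i, q in enumerate(Q):                    # step 3a: best warp
                d = [np.sum((group_action(q, g) - recon[i]) ** 2)
                     for g in candidates]
                Qw[i] = group_action(q, candidates[int(np.argmin(d))])
            mu = Qw.mean(axis=0)                         # steps 3b-3d
    resid = Qw - (mu + ((Qw - mu) @ B * h) @ B.T)
    return Qw, np.sum(resid ** 2)

rng = np.random.default_rng(4)
Q = np.exp(-(t[None, :] - rng.uniform(0.4, 0.6, 15)[:, None]) ** 2 / 0.01)
Qw, obj_elastic = elastic_fpca(Q, J=1)

# For comparison: plain FPCA residual at the same J, without warping.
mu0, B0 = pca_basis(Q, 1)
obj_plain = np.sum((Q - (mu0 + ((Q - mu0) @ B0 * h) @ B0.T)) ** 2)
```

On misaligned bumps like these, the joint treatment of warping and basis leaves a much smaller residual than FPCA alone at the same J.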

2.2 Results

The simulations in this section used the parameters in Table 2.1. Remember that the model

uses the SRVFs and therefore we simulated SRVFs. The only parameters varied in the study relate

to the variance of the coefficients of the amplitudes and time warping functions. Variations in the

remaining parameters have been studied; however, they are less important in terms of current and

future work. Figures 2.2 through 2.6 show the original simulated functions under these parameters.


Figure 2.1: Representation of Algorithm 5.
A: Use {q_i} (black dot) to find {b_j}. The blue curve represents all possible functions using this basis set.
B: Fix {b_j} (now black), find optimal {c_ij} (red dot) by minimizing distance (purple line).
C: Find {q_i} (green dot), which is minimum distance from reparameterizations of {q_i} to the reconstruction.
D: Find the optimal reconstruction of {q_i}; repeat until converged.
E: Use {q_i} to find a new basis set (blue).
F: Repeat steps B through E until converged on {b_j}.
This image is only meant to be an abstract representation to give a more intuitive understanding of the algorithm. It does not reflect all the details of the algorithm, nor does it properly reflect the mathematics behind the algorithm.


Table 2.1: The table shows the parameters used for simulating the functions. Φ(a, b) represents a Normal PDF with mean a and variance b. The varying parameters demonstrate a variety of amplitude and time warping variability.

Parameter      Amplitudes                       Phases
µ              Φ(.5, .5)                        0
{b_j}          Fourier                          Fourier
J              4                                2
{c_ij}         N(0, σ_j²)                       N(0, σ_W²)
var({c_ij})    σ_j² = ξ/j, ξ = [.01, 0.5, 2]    σ_W² = [.1, 1, 3]
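One cell of this design (ξ = 0.5, σ_W² = 1) can be simulated along the following lines. The specific Fourier ordering, grid size, and use of the Φ(.5, .5) density as the SRVF mean are our reading of Table 2.1, not necessarily the exact choices used for the figures:

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.linspace(0, 1, 100)
h = t[1] - t[0]
N = 30
xi, s2w = 0.5, 1.0                                     # middle case of Table 2.1

def fourier_basis(J):
    """First J elements of an orthonormal Fourier basis on [0, 1]."""
    return np.array([np.sqrt(2) * (np.sin if j % 2 else np.cos)(
        2 * np.pi * (j // 2 + 1) * t) for j in range(J)])

# Amplitude SRVFs: q_i = mu + sum_j c_ij b_j with c_ij ~ N(0, xi / j).
mu = np.exp(-(t - 0.5) ** 2) / np.sqrt(np.pi)          # Phi(.5, .5) density
C = rng.normal(0, np.sqrt(xi / np.arange(1, 5)), size=(N, 4))
Q = mu + C @ fourier_basis(4)

# Phase: v_i = sum_j c_ij^(v) b_j^(v), c_ij^(v) ~ N(0, s2w), mapped to
# Gamma by the inverse of Eq. 1.1.
V = rng.normal(0, np.sqrt(s2w), size=(N, 2)) @ fourier_basis(2)
gammas = []
for v in V:
    nv = np.sqrt(np.sum(v ** 2) * h)
    psi = np.cos(nv) + np.sin(nv) * v / max(nv, 1e-12)
    g = np.cumsum(psi ** 2) * h
    gammas.append((g - g[0]) / (g[-1] - g[0]))
gammas = np.array(gammas)

# Compose amplitude and phase to form the observed SRVFs; repeating this
# for each (xi, sigma_W^2) pair gives the nine test cases described below.
Q_obs = np.array([np.interp(g, t, q) * np.sqrt(np.gradient(g, t))
                  for q, g in zip(Q, gammas)])
```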

ξ = .01 ξ = 0.5 ξ = 2

Figure 2.2: The original sampled amplitude SRVF functions: {qi}. These amplitudes have almostno variability (left) and a lot of variability (right).

Figure 2.2 shows the SRVFs that were simulated using the parameters in Table 2.1. The figure

shows the SRVFs with low (left), medium (middle), and high (right) variability. The inverse SRVFs,

i.e. the amplitudes, for each of the cases shown in Figure 2.2 are shown in Figure 2.3. Figure 2.4

shows the time warping functions, which were formed by the inverse exponential mapping from the

parameters in Table 2.1.

Each of the simulated amplitudes was composed with each of the simulated phases to create

nine test cases. Figure 2.5 shows the SRVFs of the nine test cases and Figure 2.6 shows the inverse of

the SRVFs, i.e. the amplitudes. For both figures, the amount of phase variability increases to the

right, while the amount of amplitude variability increases downward. Looking at Figure 2.6, it is

clear the top left plot shows functions with very little variability overall. This variability increases

as the phase and amplitude variability are increased.

Figures 2.7 and 2.8 show the reconstruction of the amplitudes under the various methods for


ξ = .01 ξ = 0.5 ξ = 2

Figure 2.3: The original sampled amplitude functions: {f_i}. Note that the inverse SRVFs of the basis functions are not orthonormal, so the variability in these functions does not necessarily correspond to the variability seen in Figure 2.2.

σ_W² = .1   σ_W² = 1   σ_W² = 3

Figure 2.4: The original sampled time warping functions: {γ_i}. Note that these functions are sampled in a vector space and mapped to Γ. Therefore the variability in these functions does not necessarily correspond to the variability described by Table 2.1.


Column headers: σ_W² = .1, σ_W² = 1, σ_W² = 3. Row labels: ξ = .01, ξ = 0.5, ξ = 2.

Figure 2.5: The original SRVF functions: {q_i}. Note that the same amplitudes are used for each row and the same time warping functions are used for each column.


Column headers: σ_W² = .1, σ_W² = 1, σ_W² = 3. Row labels: ξ = .01, ξ = 0.5, ξ = 2.

Figure 2.6: The original sampled functions: {f_i}. Note that the same amplitudes are used for each row and the same time warping functions are used for each column.


the middle variabilities, i.e. ξ = 0.5 and σ_W² = 1. This is just to give an idea of the reconstruction

process. A comparison of the error rates is given for all nine test cases at the end. Figures 2.7 and

2.8 both show the reconstructions of this particular test case for FPCA (left column), Separate FPCA

(middle column), and Elastic FPCA (right column). We show the reconstructions when including

J = 1, 2, 3, and 4 basis sets (top to bottom).

In FPCA, the amplitudes are the only interest and therefore the reconstruction of the amplitudes

is the reconstruction of the functions themselves. In Elastic FPCA, the same is true and γs are used

specifically to aid in this reconstruction process. In Separate FPCA, the amplitudes are considered

independent of the γs. This is reflected by the Separate FPCA amplitudes approaching aligned

amplitudes rather than the original functions.

Figure 2.9 shows reconstructions of the amplitude functions using the Separate FPCA method,

as opposed to incorporating both the reconstructed amplitudes and reconstructed phases. As previously

mentioned, the reconstructions of the amplitudes and time warping functions are conducted

separately and do not necessarily make sense to show together. Figure 2.9 shows the reconstructions

of the SRVFs (left) and amplitudes (right) when J = 1, 2, 3, and 4 (top to bottom) sets of

basis functions are included in the reconstructions.

Figure 2.10 shows the basis functions used in the reconstruction process for FPCA (left column),

Separate FPCA (middle column), and Elastic FPCA (right column) using J = 1, 2, 3, and 4 basis

functions in the reconstruction process. The jth set of basis functions used in the reconstruction

using J basis functions for FPCA and Separate FPCA is consistent across J. That is, the basis function

shown for J = 1 for FPCA is the same as the first basis function for the remaining J . The same

is true for the Separate FPCA method. The basis functions used in Elastic FPCA converge to the

basis functions used in FPCA.

The Separate FPCA and Elastic FPCA methods both include time warping functions. In

Separate FPCA, they are an independent parameter with their own reconstruction process, while

in Elastic FPCA the time warping functions assist with minimizing errors beyond what the basis

functions can do. Since the time warping functions serve different purposes for the two methods,

Figure 2.11 shows the process of reconstructing the time warping functions for the Separate FPCA

method using J = 1, 2, 3, and 4 (top to bottom) basis functions in the reconstruction process. The

basis functions (left column) are used to form functions on the Hilbert Sphere (middle column),

which are then mapped to the space of time warping functions (right column). Figure 2.12 shows


[Figure 2.7 panel labels. Columns: FPCA, Separate FPCA, Elastic FPCA; rows: J = 1, 2, 3, 4.]

Figure 2.7: The reconstructed SRVF functions: {q̂i} for the three methods. All three methods are converging to the same set of functions. Computational noise is causing the functions to look more jagged. This noise is very minimal and is avoided when observing the {f̂i} in Figure 2.8.


[Figure 2.8 panel labels. Columns: FPCA, Separate FPCA, Elastic FPCA; rows: J = 1, 2, 3, 4.]

Figure 2.8: The reconstructed functions: {f̂i} for the three methods. Note that these are the inverse SRVFs of the reconstructions, as opposed to the reconstructions themselves.


[Figure 2.9 panel labels. Columns: {qi}, {fi}; rows: J = 1, 2, 3, 4.]

Figure 2.9: The amplitude reconstructions of the SRVFs and functions, {q̂i} and {f̂i}, for Separate FPCA. Note that the time warping functions seem to have accounted for some of the initial amplitude variability and therefore there is almost none present in the reconstructions.


[Figure 2.10 panel labels. Columns: FPCA, Separate FPCA, Elastic FPCA; rows: J = 1, 2, 3, 4.]

Figure 2.10: The basis functions of the SRVFs, scaled by the variance of the coefficients, used in the reconstruction process of the amplitudes. Computational noise causes these functions to appear jagged, although this error is minimal.


the process of reconstructing the time warping functions for Elastic FPCA using J = 0, 1, 2, 3, and

4 basis functions in the reconstruction process.

The error rates of the reconstructions are computed using equation 2.6. The error rates are

displayed for the three methods in Figure 2.13. Reconstruction errors are only shown for up to

J = 4, because the reconstructions using the Separate and Elastic FPCA methods have converged

by then. The reconstruction error rates are shown for all nine test cases, where the phase variability

increases moving right and the amplitude variability increases moving downward.

\mathrm{Error\ Rate} = \sum_{i=1}^{N} \frac{\lVert q_i - \hat{q}_i \rVert}{\lVert q_i \rVert} \qquad (2.6)
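As a numerical sketch (the function name is illustrative, not from the text), Equation 2.6 can be computed assuming the original and reconstructed SRVFs are stored as rows of two arrays on a common time grid:

```python
import numpy as np

def reconstruction_error_rate(q, q_hat):
    """Equation 2.6: sum over i of ||q_i - q_hat_i|| / ||q_i||.

    q, q_hat: arrays of shape (N, T) holding the original and
    reconstructed SRVFs sampled on a common time grid.
    """
    num = np.linalg.norm(q - q_hat, axis=1)  # ||q_i - q_hat_i|| per function
    den = np.linalg.norm(q, axis=1)          # ||q_i|| per function
    return float(np.sum(num / den))
```

Because the rate is a ratio of norms, the grid-spacing factor in the discrete L2 norm cancels, so plain vector norms suffice here.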

Elastic FPCA tends to converge around J = 1, while Separate FPCA tends to converge around

J = 2 or 3, but not as consistently as FPCA. This is because the reconstruction rates of the vs do

not translate back to the reconstruction rates of the γs. FPCA converges the slowest and does not

usually seem to converge by J = 4 unless no time warping is present.

2.3 Conclusion

When the functions are not simulated with time warping variability, it is clear that Separate

FPCA struggles to take potential time warping variability into account. It is clear that Elastic FPCA is

the preferred method for minimizing variability in the functions while avoiding unnecessary time warping.

It is also noted that the amplitude variability in the Separate FPCA method is low, and the basis

functions capture a lot of noise created in the minimization process.

It is possible that a combination of FPCA and Separate FPCA would perform better if there is

a way of indicating whether time warping is truly present in the functions or not. This topic will

be discussed in Chapter 3.


[Figure 2.11 panel labels. Columns: {bj}, {vi}, {γi}; rows: J = 1, 2, 3, 4.]

Figure 2.11: The first column contains basis functions of the {vi}, scaled by the variance of the coefficients, used in the reconstruction process of the amplitudes from Separate FPCA. The second column contains the {vi} themselves. The final column is the exponential mapping of the {vi}, i.e., the {γi}.


Figure 2.12: Starting in the top row and moving left to right: {γi} from the Elastic FPCA method for J = 0, . . . , 4. A plot of the {γi} at J = 0 is included for reference. Note that this is the only part of any of the methods where J = 0 is more than just a simple mean of the functions.


[Figure 2.13 panel labels. Columns: σ²_W = 0.1, 1, 3; rows: ξ = 0, 0.5, 2.]

Figure 2.13: These are the error rates for all 9 variability settings. Each individual plot shows the error rates for J = 0, . . . , 4 of the reconstructions for FPCA (blue), Separate FPCA (green), and Elastic FPCA (red), using Equation 2.6.


CHAPTER 3

METRIC-BASED HYPOTHESIS TESTING FOR

PHASE VARIABILITY IN FUNCTIONAL DATA

As observed in Chapter 2, time warping can sometimes be unnecessary in functional data analysis.

This may be because the amount of phase variability is too small to make a significant difference.

We need to have a way of detecting when we should consider time warping and when we are risking

overfitting the data. This is similar to determining the significance of the slope in simple linear

regression. We need to know: Is the phase variability significant for a set of functions?

We remind the reader that for a set of functions {fi}, we can separate the phase and amplitude

components of the functions using the Multiple Alignment Algorithm introduced in Chapter 1.

Any particular function fi can then be expressed as a pair fi = (f̃i, γi), where f̃i is considered the

amplitude and γi is considered the phase. The Multiple Alignment Algorithm uses the SRVF of fi:

q_i(t) = \begin{cases} \dot{f}_i(t) / \sqrt{|\dot{f}_i(t)|} & \text{if } \dot{f}_i(t) \text{ exists and is nonzero} \\ 0 & \text{otherwise,} \end{cases}

so that the action of the time warping functions is norm preserving. More details are presented

in Chapter 1.
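The SRVF can be computed from sampled function values; the sketch below (the helper name is illustrative) approximates the derivative with finite differences and guards the zero-derivative case, as in the definition above:

```python
import numpy as np

def srvf(f, t):
    """SRVF q(t) = f'(t) / sqrt(|f'(t)|), computed on a sampled grid.

    f, t: 1-D arrays of function values and time points. The derivative
    is approximated with finite differences; q is set to 0 wherever the
    derivative is numerically zero, matching the definition above.
    """
    df = np.gradient(f, t)                   # numerical f'(t)
    q = np.zeros_like(df)
    nz = np.abs(df) > 1e-12                  # zero-derivative guard
    q[nz] = df[nz] / np.sqrt(np.abs(df[nz]))
    return q
```

For example, f(t) = t has unit derivative everywhere, so its SRVF is identically 1.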

Figure 3.1 gives an example of the alignment solution. The first column shows two different

samples of simulated functions: an example with large phase variability (top) and an example with

almost no phase variability (bottom). We can see evidence of the phase variability when we separate

the phase and amplitude components as shown in the right column. Note that the amplitudes,

shown in the middle column, of the second case look almost identical to the original functions,

while the amplitudes in the first case are shifted. We would like a formal test that assesses the

amount of phase variability in the phase component of a given data set. In other words, we would

like a hypothesis test to indicate there is phase variability in the first case, but not in the second

case.


[Figure 3.1 panel labels. Columns: f, f̃, γ; rows: Case 1, Case 2.]

Figure 3.1: We observe sampled functions with phase variability (top) and without phase variability (bottom) in the first column. These functions are then split into amplitude (second column) and phase (third column) components.

The goal of Chapters 3 and 4 is to construct such tests for the significance of phase variability.

The hypothesis test, using the same notation as in Chapter 2 is:

H0 : {γi = γid, for all i}

HA : {γi ≠ γid, for at least one i}.   (3.1)

We take two approaches to performing this test: a metric-based approach and a model-based

approach.

In Chapter 3, we will take a metric-based approach. The basic idea is to use the metric between

functions to separate out the phase component {γi} and the amplitude component {q̃i}. Given a metric,

we can develop a test involving differences between {qi} and {q̃i}. We can rephrase this test as:

H0 : {q̃i = qi, for all i}

HA : {q̃i ≠ qi, for at least one i}.

This is equivalent to the previous hypothesis, since q̃i = qi if and only if γi = γid. We make further

modifications to this hypothesis test in Chapter 3.

In Chapter 4, we will take a model-based approach. The goal of the model-based approach is to

consider phase and amplitude together and not separately. We motivate the model-based approach


at the end of Chapter 3. The specific models used are FPCA based, but one can choose any other

type of model also. If we use the Elastic FPCA model introduced in Chapter 2, the amount of phase

variability in our model decreases as we allow for more amplitude variability. We can reconstruct

a set of functions using Elastic FPCA (taking both phase and amplitude into consideration) and

compare it with the standard FPCA model (taking just amplitude into consideration). The FPCA

and Elastic FPCA models are equal if and only if there is no phase variability present. In this

approach, we quantify the differences using the reconstructions based on the two models. We can

write this as:

H0 : {q^F_i = q^E_i, for all i}

HA : {q^F_i ≠ q^E_i, for at least one i}.

This is equivalent to the initial hypothesis test, since the models are equal if and only if γi = γid.

We make further modifications to this hypothesis test in Chapter 4.

We remind the reader that this Chapter will introduce methods for testing the significance of

phase by using a metric-based approach. We will begin this process by first introducing tests to

compare groups of functions. Specifically, Hagwood [7] explored the application of the Friedman-Rafsky Test (FRT) [6] and the Schilling Nearest Neighbors Test (SNNT) [18] on closed curves in R^M.

They used methods of finding geodesic distances, which are similar to the problem of open curves.

What makes applying these methods particularly difficult is the need for independence. The

alignment process creates a dependence of the aligned curves on each other. We observe a simple

solution to this problem, with future exploration for better solutions possible. Other than this, the

challenges we face with this chapter are similar to the challenges addressed in Chapter 2.

Section 1 will discuss the theory behind hypothesis testing on L2 functions. Section 2 will give

an overview of the methods and procedures used in FRT and SNNT. Section 3 will show some

simulated and real examples of this application. Section 4 will show additional work using the

Energy Test, which will be introduced later. Section 5 will make observations about the

examples and point to future directions for exploration.

3.1 Background Material

Given a sample of functions, let S1 represent N unaligned curves and S2 represent N aligned

curves. If S1 contained mostly aligned curves, then the distance between S1 and S2 shouldn’t be

significant. We use the L2 norm as the distance.


Both methods, FRT and SNNT, are applicable to sample data sets of unequal size. For the

purpose of this paper, we are not interested in unequal sample sizes and only observe equal sample

sizes. We clarify these reasons in the next few sections.

3.2 Metric-Based Hypothesis Tests

The following are hypothesis tests which compare two samples of functions using a distance

matrix. In each example, the distance between two functions is defined using the L2 norm of their

SRVFs.

The basic way we use these tests is to compare two samples: S1 = {qi} and S2 = {q̃i}, where {qi}

are the SRVFs of the original functions and {q̃i} are found using Algorithm 2. Figure 3.2 illustrates

this set up using the Berkeley Growth Curves. The original growth curves are shown in the upper

left. The SRVFs are shown in the upper right. The aligned SRVFs are shown in the lower left. The

SRVFs and aligned SRVFs are taken as the two samples. The lower right panel shows a pairwise

distance matrix between the two samples and will be used shortly.

Better methods for applying these methods are discussed and shown later in this chapter and

discussed in the future works chapter.

3.2.1 Friedman-Rafsky Test

The Friedman-Rafsky Test (FRT) is a generalization of the Wald-Wolfowitz Runs Test (WWRT),

in which two populations in R are compared. In WWRT, the data is ordered and the number of

runs of consecutive sample members are recorded as the test statistic.

FRT replaces R with R^M and uses a Minimal Spanning Tree as a form of ordering the data.

Edges connecting nodes of different samples are eliminated and then the number of disjoint trees,

TN, are counted. TN is asymptotically normally distributed with mean µN = N + 1 and variance

\sigma_N^2 = \frac{N}{2N-1}\left(N - 1 - \frac{C - 2N + 2}{2N - 3}\right),

where C is the number of edge pairs that share a common node. The null hypothesis is rejected

for small values of TN . A quick example of this process with the minimal spanning trees is shown

in Figure 3.3. We form a minimal spanning tree (left) and then break the minimal spanning tree

at edges connecting between the two samples (right). This leaves TN = 13 disjoint trees remaining.

In this particular example, one sample remains together as one tree, indicating a smaller variance

between that sample than with the other sample.


Figure 3.2: The first image shows the original growth curves (cm/year). Their SRVFs are taken and then aligned, shown in the second and third images respectively. The L2 distances between all the curves in the second and third images are displayed in the fourth image. The first half of the indices correspond to the unaligned SRVFs and the second half correspond to the aligned SRVFs.


Figure 3.3: A Minimal Spanning Tree (left) is formed from two samples. As described in Algorithm 6, the tree is then cut where edges connect nodes from different samples (right). There are now TN = 13 separate trees remaining. This example shows that one sample (red) is completely connected in the original tree. Some nodes are only connected to the original tree through the opposite sample, and these now form trees of a single node.

Algorithm 6 Friedman-Rafsky Test

1. Take the SRVFs of the original functions to get S1 = {qi}.

2. Use Algorithm 2 to get S2 = {q̃i}.

3. Compute a distance matrix between all the functions from both samples.

4. Use the distance matrix to form an MST.

5. Break the MST at edges connecting nodes of different samples.

6. Count the number of disjoint trees, TN .
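Algorithm 6 can be sketched with SciPy's graph routines, assuming the pooled distance matrix from step 3 is already available (the function name is illustrative, and the normal approximation is crude for small samples; it is shown only to mirror the formulas above):

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.stats import norm

def friedman_rafsky(D, labels):
    """FRT from a pooled (2N x 2N) distance matrix and 0/1 sample labels.

    Builds the MST, drops edges joining nodes from different samples,
    counts the disjoint trees T_N, and applies the normal approximation
    with mu_N = N + 1 and the variance formula above.
    """
    labels = np.asarray(labels)
    n = len(labels)
    N = n // 2                                 # equal sample sizes assumed
    mst = minimum_spanning_tree(D).toarray()
    mst = mst + mst.T                          # symmetrize the tree
    # keep only edges whose endpoints lie in the same sample
    keep = np.where(labels[:, None] == labels[None, :], mst, 0.0)
    T_N, _ = connected_components(keep, directed=False)
    deg = (mst > 0).sum(axis=1)                # node degrees in the MST
    C = int((deg * (deg - 1) // 2).sum())      # edge pairs sharing a node
    mu = N + 1
    var = N / (2 * N - 1) * (N - 1 - (C - 2 * N + 2) / (2 * N - 3))
    p = norm.cdf((T_N - mu) / np.sqrt(var))    # reject for small T_N
    return T_N, p
```

Two well-separated point clouds give the minimal value T_N = 2, since only the one cross-sample edge of the MST is cut.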

3.2.2 Schilling Nearest Neighbors Test

Schilling's Nearest Neighbors Test observes the proportion of the K nearest neighbors of each function

qi that belong to the same sample. The K nearest neighbors are determined by pooling the

functions from both samples and computing a distance matrix. This forms the following test

statistic:

T_{N,K} = \frac{1}{2NK} \sum_{i=1}^{2N} \sum_{k=1}^{K} I_i(k),

38

Page 51: Florida State University Libraries - fsu.digital.flvc.org653405/...N(left) and P-values (right) for the simulated functions using the SNNT method are shown as a function of K. The

Table 3.1: Distances between observations are shown below. Using K = 2 as an example, the two nearest neighbors for each observation are in bold for that row. For example, A1's nearest neighbors are A2 and B1. We then count the number of these neighbors from the same sample as the observation (last column). There are 10 such neighbors, resulting in T_{N,K} = 10/12.

     A1   A2   A3   B1   B2   B3   Same
A1   -    1.1  3.2  1.2  2.0  4.2  1
A2   1.1  -    1.9  2.7  2.6  3.6  2
A3   3.2  1.9  -    2.5  3.5  3.4  1
B1   1.2  2.7  2.5  -    0.9  1.1  2
B2   2.0  2.6  3.5  0.9  -    0.4  2
B3   4.2  3.6  3.4  1.1  0.4  -    2

where Ii(k) = 1 if the kth closest function to i belongs to the same sample and 0 otherwise.

Schilling [18] shows T_{N,K} is normally distributed with mean µ = 0.5 and variance

\sigma_K^2 = 0.25 + 0.25\left(1 - \binom{2K}{K}\, 2^{-2K}\right).
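The statistic T_{N,K} can be computed directly from the pooled distance matrix; a minimal sketch (the helper name is illustrative) applied to the distances in Table 3.1 with K = 2 reproduces 10/12:

```python
import numpy as np

def schilling_t(D, labels, K):
    """T_NK: fraction of the K nearest neighbors (pooled over both
    samples) that belong to the same sample as the query function."""
    D = np.asarray(D)
    labels = np.asarray(labels)
    n = len(D)
    same = 0
    for i in range(n):
        order = np.argsort(D[i])
        neigh = order[order != i][:K]          # K nearest, excluding self
        same += int((labels[neigh] == labels[i]).sum())
    return same / (n * K)
```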

3.3 Results

The results are shown in two subsections. The first subsection shows some basic examples of the

methods described in the previous section. The second subsection shows results using a repeated

sampling method. The repeated sampling method is detailed before showing results.

3.3.1 Basic Examples

Two simulated sample data sets are examined. The first is simulated bimodal functions with

some amplitude and time warping variability. The second is simulated unimodal functions with a

small amount of amplitude variability and a moderate amount of time warping variability.

Figure 3.4 shows the original functions (left) and the amplitudes after alignment (middle). The

amplitudes after alignment are aligned again and are shown in the third column. For FRT, Table

3.2 shows each data set with two p-values. The first p-value compares the unaligned functions

as S1 and the aligned functions as S2. The second p-value compares the aligned functions to the

re-aligned functions. The purpose of the second p-value is to verify already aligned functions will

need no further alignment (i.e. the null hypothesis is not rejected).

Figure 3.5 shows the results from SNNT as a function of K for the Bimodal example (top) and

Unimodal example (bottom). Both the test statistics (left column) and p-values (right column) are


[Figure 3.4 row labels: Bimodal, Unimodal.]

Figure 3.4: Various simulated functions before (left) and after alignment (center). The second round of alignment (right) shows both the functions from the first round of alignment (blue) and the second (red) for comparison.

Table 3.2: The table shows the p-values using FRT to compare unaligned to aligned and aligned to aligned functions.

Data Set   Unaligned   Aligned
Bimodal    ≈ 10^−9     0.50
Unimodal   ≈ 10^−12    ≈ 10^−14

shown. The results are similar to those found using FRT. The variance of the distributions is a

function of K only, and is therefore the same for both p-values and not displayed.

It is of note that there is so little variability in the amplitudes of the unimodal functions that

computational error seems to play a role in the significance of the second p-value. The unimodal

functions return p-values as expected when comparing unaligned to aligned functions, but computational error results in the two sets of aligned functions being identified as different. The bimodal

functions return p-values as expected.

The next two examples are the Berkeley Growth Data [9], which measure the rate of growth

(cm/year) of female and male children quarterly from age 1 year to 18. Figure 3.6 shows the growth


[Figure 3.5 panel labels. Columns: TN, P-value; rows: Bimodal, Unimodal.]

Figure 3.5: The TN (left) and P-values (right) for the simulated functions using the SNNT method are shown as a function of K. The results are shown for comparing unaligned functions to aligned (blue) and aligned to aligned (red) functions.

Table 3.3: The table shows the p-values using FRT to compare unaligned to aligned and aligned to aligned functions.

Data Set        Unaligned   Aligned
Female Growth   ≈ 10^−14    0.58
Male Growth     ≈ 10^−8     0.50

curves for male (bottom row) and female (top row) subjects, before alignment (left column) and

after one (middle column) and two (right column) rounds of alignment, in the same manner as the

simulated functions. Table 3.3 and Figure 3.7 show the results for FRT and SNNT, respectively.

Figure 3.7 shows the test statistic (left column) and p-values (right column) as a function of K,

the number of "neighbors" to include in the calculations. These plots are shown for both the male

(bottom row) and female (top row) data sets. In both cases, we observe a significant difference

in the unaligned and aligned curves and a non-significant difference when comparing the aligned

curves to re-aligned curves.

The process we just observed compared amplitudes to time warpings of the same amplitudes.

This creates a dependence between samples, which is unwanted. This makes these results worth


[Figure 3.6 row labels: Female Growth, Male Growth.]

Figure 3.6: Various growth curves before (left) and after alignment (center). The second round of alignment (right) shows both the functions from the first round of alignment (blue) and the second (red) for comparison.

[Figure 3.7 panel labels. Columns: TN, P-value; rows: Female, Male.]

Figure 3.7: The TN (left) and P-values (right) for the Berkeley Growth curves using the SNNT method are shown as a function of K. The results are shown for comparing unaligned functions to aligned (blue) and aligned to aligned (red) functions.


looking at to understand how we might compare the two methods; however, the results should be

taken with a grain of salt.

3.3.2 Pseudo-Bootstrap

In an attempt to address the issue of independence between the amplitudes, a process similar to

bootstrapping is used. The data set is randomly split into two samples. The first sample, S1, will

contain half the sampled functions. The second sample, S2, contains the remaining sampled

functions, which are then aligned. The two samples are then compared to get a test statistic and p-value using

FRT and SNNT. This process is repeated to create a distribution of test statistics and p-values.

This process is shown in Algorithm 7.

Algorithm 7 Pseudo-Bootstrap

1. For each desired pseudo - bootstrap sample:

(a) Partition {qi} into two samples: S1 and S∗2 .

(b) Align S∗2 using Algorithm 2 to get S2.

(c) Compare S1 and S2 using FRT or SNNT.

2. Repeat step 1 to create a distribution of test statistics and p-values.
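A sketch of Algorithm 7 follows, with the alignment step and the two-sample test passed in as caller-supplied functions, since here they are only placeholders for Algorithm 2 and for FRT/SNNT:

```python
import numpy as np

def pseudo_bootstrap(Q, n_boot, align_sample, two_sample_test, rng=None):
    """Algorithm 7: repeatedly split the pooled SRVFs, align one half,
    and compare the halves with a two-sample test.

    Q: (2N, T) array of SRVFs. align_sample and two_sample_test are
    caller-supplied callables standing in for Algorithm 2 and FRT/SNNT.
    Returns the list of per-repetition test results.
    """
    rng = np.random.default_rng(rng)
    n = len(Q)
    results = []
    for _ in range(n_boot):
        perm = rng.permutation(n)
        S1 = Q[perm[: n // 2]]                 # first half, left unaligned
        S2 = align_sample(Q[perm[n // 2 :]])   # second half, aligned
        results.append(two_sample_test(S1, S2))
    return results
```

Collecting the returned statistics and p-values over many repetitions gives the distributions shown in Figures 3.8 and 3.9.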

Figure 3.8 shows the test statistics (left four plots) and p-values (right four plots) as distributions for both simulated samples (bimodal on the top row, unimodal on the bottom row) and for both

methods (FRT in the first and third column, SNNT in the second and fourth column). To avoid

showing an excessive number of these plots, we choose K = 6 for SNNT, because the test statistics

have mostly converged by this point. These same results are shown for the Berkeley Growth data

in Figure 3.9, for both the male (bottom row) and female (top row) data sets.

3.4 Additional Work: Energy Test and Permutation Distribution

In Hagwood [7] a third test is used in addition to FRT and SNNT, called the Energy Test.

The Energy Test is used to make the same comparison as in the first two tests, but at a later

point in time. While the FRT and SNNT results give a single p-value, the Energy Test employs

the permutation distribution recommended by Hagwood to make a more meaningful comparison.

We did not use the permutation distribution with FRT and SNNT. We have not made a formal

comparison between the Energy Test and FRT and SNNT, which are all less successful than the


[Figure 3.8 panel labels. Columns: TN - FRT, TN - SNNT, P-value - FRT, P-value - SNNT; rows: Bimodal, Unimodal.]

Figure 3.8: Unaligned simulated functions are compared to aligned simulated functions using the sampling technique described. Results are shown for FRT and for K = 6 in SNNT.

[Figure 3.9 panel labels. Columns: TN - FRT, TN - SNNT, P-value - FRT, P-value - SNNT; rows: Female, Male.]

Figure 3.9: Unaligned growth curves are compared to aligned growth curves using the sampling technique described. Results are shown for FRT and for K = 6 in SNNT.


results of the Model-Based Hypothesis Test presented in Chapter 4. Because of this, the methods

and results of the Energy Test are summarized in this section.

Zech and Aslan in 2003 [1] introduced the Energy Test. The method is similar to an ANOVA,

where the test statistic is based on distances within groups compared to the overall distance.

These distances form a test statistic, which is then compared to a permutation distribution. The

permutation distribution is formed by repeatedly, randomly permuting the groups of functions and

computing a new test statistic.

3.4.1 Energy Test and Permutation Distribution Methods

Using the same setup as in the rest of this chapter, we want to compare two sets of SRVFs, {qi}

and {q̃i}, both with sample size N. Note that we are still making an assumption of independence

between these two groups of functions, which is not true. The test statistic for the Energy Test is:

T_N = \frac{1}{N}\left[\sum_{i=1}^{N}\sum_{j=1}^{N} d(q_i, \tilde{q}_j) - \sum_{i=1}^{N}\sum_{j<i} d(q_i, q_j) - \sum_{i=1}^{N}\sum_{j<i} d(\tilde{q}_i, \tilde{q}_j)\right].

This test statistic, TN, is a degenerate V-statistic. This leads to the permutation distribution,

which allows us to have a comparison for TN. The permutation distribution gives an approximate

distribution for the test statistic under the null hypothesis that {qi} and {q̃i} come from the

same distribution. Assuming these distributions are the same, we can assume the "groupings" of

{qi} and {q̃i} are formed at random. Algorithm 8 shows how to use this information to form the

permutation distribution for the Energy Test.

Algorithm 8 Energy Test with Permutation Distribution

1. Begin with samples {qi} and {q̃i}; find TN.

2. Combine {qi} and {q̃i} into one sample Z.

3. For each desired sample test statistic:

(a) Sample from Z, without replacement, to form S∗1.

(b) Set the unsampled elements of Z to S∗2.

(c) Find T∗i using S∗1 and S∗2.

4. The p-value is the proportion of T∗i more extreme than TN.
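Algorithm 8 can be sketched as follows, where `metric` stands in for the L2 distance between SRVFs and the statistic follows the formula above (the "+1" terms in the p-value are a common finite-sample adjustment, not prescribed by the text):

```python
import numpy as np

def energy_statistic(X, Y, metric):
    """T_N: between-sample distances minus each sample's within-sample
    distances (over j < i), scaled by 1/N, as in the formula above."""
    N = len(X)
    between = sum(metric(x, y) for x in X for y in Y)
    within_x = sum(metric(X[i], X[j]) for i in range(N) for j in range(i))
    within_y = sum(metric(Y[i], Y[j]) for i in range(N) for j in range(i))
    return (between - within_x - within_y) / N

def energy_permutation_test(X, Y, metric, n_perm=999, rng=None):
    """Algorithm 8: the p-value is the share of permuted statistics at
    least as extreme as the observed T_N."""
    rng = np.random.default_rng(rng)
    T_obs = energy_statistic(X, Y, metric)
    Z = np.concatenate([X, Y])
    N = len(X)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(Z))
        count += energy_statistic(Z[perm[:N]], Z[perm[N:]], metric) >= T_obs
    return T_obs, (count + 1) / (n_perm + 1)
```

For two well-separated one-dimensional samples the between-group term dominates, so T_N is large and the permutation p-value is small.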


Figure 3.10: We want to know: are the fi (blue) significantly different from the f̃i (red)? The test statistic is TN = 1540.8 − 986.5 − 68.9 = 485.4.

Figure 3.10 gives an example of two such sets of functions. We get a test statistic of TN =

485.4. Figure 3.11 shows the permutation distribution for the test statistic, obtained using Algorithm 8.

Comparing the permutation distribution to the original test statistic indicates that the {qi} and

{q̃i} are significantly different.

3.4.2 Energy Test and Permutation Distribution Results

Only simulation results are presented for the Energy Test with the Permutation Distribution.

Figure 3.12 shows three sets of amplitudes (top) and three sampled sets of phases (bottom), which

are simulated to have small (first column), medium (second column), and large (third column)

variability. Figure 3.13 shows the composition for each of the nine possible pairs of the amplitudes

and phases, with the variability of the amplitudes increasing to the left and phase variability

increasing upward. These composed functions will serve as nine simulated cases of {fi}.

For each of the nine simulated functions, we can see the test statistics and permutation distri-

butions in Figure 3.14, the variability of the amplitudes increasing to the left and phase variability

increasing upward in the test cases. Note that as the amplitude variability increases, TN moves

closer to the center of the permutation distribution, making the results less and less significant. The

opposite occurs as the phase variability increases. That is to say, when phase variability increases,

the test statistic moves further away from the permutation distribution.


Figure 3.11: Distribution of TN under the null hypothesis (blue). Estimated TN from the original sample (red).

[Figure 3.12 row labels: Amps, Phases.]

Figure 3.12: Amplitudes (top) and phases (bottom) used for simulation. No axes are shown, but all functions are plotted on the same scale. These amplitudes and phases are composed to form the initial functions used in the simulation.


[Figure 3.13 axis labels. Horizontal: Amplitude Variability; vertical: Phase Variability.]

Figure 3.13: The amplitudes and phases are composed to simulate a variety of functions. No axes are shown, but all functions are plotted on the same scale. The variability in the amplitudes increases toward the right. The variability in the phases increases upward.


[Figure 3.14 axis labels. Horizontal: Amplitude Variability; vertical: Phase Variability.]

Figure 3.14: For each of the nine cases, we see the test statistics (red) along with the permutation distribution. The nine cases shown are all plotted on the same axes to demonstrate the change in significance as phase and amplitude variability increase.

49


Figure 3.15: This image illustrates when phase variability is significant or not significant. The center region is where the methods proposed in Chapter 3 struggle.

3.5 Conclusion

In this Chapter we proposed several ways to test the significance of phase variability using the

metric introduced at the start of Chapter 1. We evaluated the significance using the Friedman-Rafsky Test, Schilling's Nearest Neighbors Test, and the Energy Test. We also proposed a "Pseudo-Bootstrap" solution to attempt to create a null distribution of test statistics for the FRT and SNNT

to address issues of independence.

The FRT and SNNT gave similar results for the same data sets. Both methods, along with the

Energy Test, violate the same assumptions of independence between the two groups of functions

being compared. Although the "Pseudo-Bootstrap" helped to address this assumption in the null

distribution, a better method is clearly needed. The work in Chapter 4 will propose a paired test

to try and address this assumption.

From the results of the Energy Test, it is clear that quantifications of phase variability are in

relation to the amplitude variability. The metric-based approach allows for the phase component

to account for variability that could be from the amplitudes. Figure 3.15 illustrates this issue. This

could be addressed by considering a model that accounts for both phase and amplitude variability.

Chapter 4 will explore this topic.

50


CHAPTER 4

MODEL-BASED HYPOTHESIS TESTING FOR

PHASE IN FUNCTIONAL DATA

In Chapter 2, we discussed an Elastic framework for aligning functional data. As described there,

for a set of functions {fi} one can decompose the functions into amplitude components, {f̃i}, and phase components, {γi}. The Multiple Alignment algorithm shown in Chapter 2 is used to solve for {q̃i}, the SRVFs of {f̃i}. Chapter 2 also discusses and provides examples of the FPCA and

Elastic FPCA models.

In Chapter 3, we presented a study to create hypothesis tests to consider if alignment is necessary

for a given set of functions. If alignment isn’t a significant component, then including it in a model

could result in overfitting. Using a metric-based approach, in Chapter 3 we found that aligning all

the functions to the mean often resulted in a significant difference between {qi} and {q̃i}.

In the metric-based approach, the Multiple Alignment algorithm allows for the phase variability

to account for variance that could be attributed to amplitude variability. This means that our

hypothesis test needs a way to model phase and amplitude variability together. The Elastic FPCA model is a good candidate, because it allows more of the amplitude variability to be accounted for during the alignment process, reducing the need for additional variability in the phase component.

The goal of this Chapter is to create a model-based hypothesis test for phase variability. In

order to do this, we need the following:

1. A model which considers phase and amplitude variability and a model which only considers

amplitude variability for comparison

2. A test for comparing these two models

We already indicated the Elastic FPCA model is a good candidate for a model with a phase

component. We use the standard FPCA model for comparison. Section 1 gives a brief review of

these two models.

The second item on the list is the main focus of the Chapter: How can we compare the FPCA

and Elastic FPCA models?

51


Section 1 gives a brief review of the FPCA and Elastic FPCA models discussed in Chapter 2.

Section 2 gives an overview of the hypothesis testing problem and discusses previous solutions. We

choose to use Concordance Correlation Coefficients [13] as a tool for comparing the two models

for various reasons, which we explain in Section 2. Section 3 introduces Concordance Correlation

Coefficients and gives some basic examples. Section 4 combines the methodology for using Concor-

dance Correlation Coefficients with the Elastic FPCA and FPCA models. We present simulated

examples and real data applications in Section 5.

4.1 Background Material: Review of Functional Data Models

The goal of this Chapter is to create a model-based hypothesis test for the significance of phase

variability. In Chapter 3, we noted that phase and amplitude variability could not be considered

completely separate. Therefore, we need a model which incorporates both amplitude and phase

variability together. We also need a "null" model, which does not include phase variability. We will

use the FPCA model (considers amplitude only) and Elastic FPCA model (considers both phase

and amplitude) as our two models.

Chapter 2 introduces the FPCA and Elastic FPCA models. This section provides a quick review

of these models. We remind the reader that both are discussed and have numerous examples in

Chapter 2.

4.1.1 Functional Principal Component Analysis

Functional Principal Component Analysis (FPCA) [3] comes almost directly from Principal

Component Analysis (PCA) [16]. PCA is a method of representing correlated observations in RN

as uncorrelated variables called Principal Components. PCA allows us to express observations

using fewer variables. The Karhunen-Loève Expansion Theorem [10] showed the same principles can be applied to functions. The first model we explore, Functional Principal Component Analysis (FPCA) [3], uses the Karhunen-Loève Expansion Theorem to create a PCA equivalent for functions.

For a set of sampled functions, {fi}, we express the SRVFs, {qi}, using the FPCA model:

q_i(t) = µ(t) + Σ_{j=1}^{J} c_ij b_j(t) + ε_i(t),   (4.1)

where:

• µ(t) is the expected value of qi(t),

52


• {bj} form an orthonormal basis of L2,

• ε_i ∈ L2 is the noise process, typically chosen to be a white Gaussian process with zero mean and variance σ^2,

• c_ij ∈ R are the coefficients of (q_i − µ) with respect to {b_j}, i.e., c_ij = ⟨q_i − µ, b_j⟩. In order to

ensure that µ is the mean of qi, we impose the condition that the sample mean of {c·j} is

zero.

Given a set of observed functions {fi}, the estimation of these model parameters is performed

using the following minimization:

(µ̂, {b̂_j}) = arg inf_{µ, {b_j}, {c_ij}} Σ_{i=1}^{N} ‖q_i − µ − Σ_{j=1}^{J} c_ij b_j‖^2,   (4.2)

and set ĉ_ij = ⟨q_i − µ̂, b̂_j⟩. More details about this model are in Chapter 2, including several

examples.
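For functions sampled on a common grid, the minimization in (4.2) can be carried out with a standard singular value decomposition of the centered data. The following is a minimal numpy sketch of that idea, not the implementation used in this dissertation; the name `fpca_fit` is hypothetical, and a uniform grid is assumed so that the L2 inner product reduces to a Euclidean dot product.

```python
import numpy as np

def fpca_fit(Q, J):
    """Minimal FPCA sketch: PCA of centered, discretized SRVFs.

    Q : (N, T) array; row i holds q_i sampled on a common grid.
    Returns mu (T,), B (J, T) with orthonormal rows, and C (N, J).
    """
    mu = Q.mean(axis=0)            # estimate of mu(t) = E[q_i(t)]
    X = Q - mu                     # centering makes the sample mean of c_ij zero
    # Principal basis functions are the right singular vectors of X.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    B = Vt[:J]                     # first J basis functions b_j
    C = X @ B.T                    # coefficients c_ij = <q_i - mu, b_j>
    return mu, B, C

rng = np.random.default_rng(0)
Q = rng.standard_normal((20, 50))  # toy stand-in for discretized SRVFs
mu, B, C = fpca_fit(Q, J=3)
Q_hat = mu + C @ B                 # reconstructions q_i^{F(3)}
```

Because the SVD basis is optimal in squared error, increasing J can only decrease the total reconstruction error, which is consistent with the CCCs later in this chapter converging to 1 as J grows.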

4.1.2 Elastic Functional Principal Component Analysis

Elastic Functional Principal Component Analysis expands on work completed by Kneip [12] and

Tucker [22]. Elastic FPCA models functions with an FPCA basis while considering the amplitude

and phase components jointly. While εi represents unaccounted variability from the reconstructions,

γi assists with minimizing the variability attributed to phase. The model is as follows:

q_i(t) = ( µ(t) + Σ_{j=1}^{J} c_ij b_j(t), γ_i ) + ε_i(t).   (4.3)

This model is different from Equation (4.1) because of the phase component, as mentioned above. Given

a set of observed functions {fi}, the estimation of these model parameters is performed using the

minimization on their SRVFs, {qi}:

(µ̂, {b̂_j}) = arg inf_{µ, {b_j}, {c_ij}, {γ_i}} Σ_{i=1}^{N} ‖(q_i, γ_i) − µ − Σ_{j=1}^{J} c_ij b_j‖^2,   (4.4)

and set ĉ_ij = ⟨(q_i, γ̂_i) − µ̂, b̂_j⟩.

Equivalently, we can write the expression above as:

(µ̂, {b̂_j}) = arg inf_{µ, {b_j}, {c_ij}} Σ_{i=1}^{N} ‖q̃_i − µ − Σ_{j=1}^{J} c_ij b̃_j‖^2,   (4.5)

53


and set ĉ_ij = ⟨q̃_i − µ̂, b̂_j⟩. The tildes indicate that each component has been composed with a time warping function. In this case, the q̃_i are not the aligned functions from the Multiple Alignment algorithm, but are instead composed with the γ_i found in the Elastic FPCA model.

More details about this model are in Chapter 2, including several examples.

4.2 Introduction to Hypothesis Testing for Phase Variability

Now we return to the main problem studied in this thesis: Does a given set of functional data

contain significant phase variability?

We remind the reader that this Chapter is taking a model-based approach to creating a hy-

pothesis test to answer this question. In the previous section, we reviewed two models, which we

initially introduced in Chapter 2: FPCA and Elastic FPCA. We now need a way to compare these

two models. This is an important issue, because if there is no time warping present, we risk overfitting by using Elastic FPCA. We need to determine if time warping is present in the data or not.

We remind the reader that the basic hypothesis test can be written as:

H0 : {γi = γid, for all i}

HA : {γi ≠ γid, for at least one i}.

As seen in Chapter 3, it is difficult to directly compare the γi; instead, we can modify the hypothesis

test. To understand how we arrived at the modified test, we will review some facts we have already

discussed.

Let q_i^{F(J)} represent the reconstruction of q_i using FPCA with J basis functions. Equivalently, let q_i^{E(J)} represent the same reconstruction using Elastic FPCA. As J → ∞, q_i^{F(J)}, q_i^{E(J)} → q_i. We have previously used this fact to make comparisons similar to those in Chapter 3 (but not presented in this dissertation), which include:

1. H0 : P_E = P_F or HA : P_E ≠ P_F.

• P_E, P_F represent underlying properties of {q_i^{E(J)}} and {q_i^{F(J)}}, respectively. This reflects the notation used by Friedman [6].

• We used the tests from Hagwood: the Friedman-Rafsky Test [6], the Energy Test [1], and Schilling's Nearest Neighbors [18] to make the comparison.

2. We used the same set up and solution as in (1), but tested using the residual errors.

3. H0 : E[q_i^{F(J)}] = E[q_i^{E(J)}] or HA : E[q_i^{F(J)}] ≠ E[q_i^{E(J)}].

54


• This shifts our solution to a paired test.

• We also tried using residual errors.

Hagwood's paper inspired the first modified tests. Issues arise because the variance of the reconstructions under Elastic FPCA is always smaller than that of the reconstructions under FPCA. The second test addresses this issue by accounting for the fact that the mean and basis sets are different for each model. We then shifted to paired tests, since our functions are not independent between the samples.

The equivalence of the modified hypothesis tests is stated formally in the following Lemma:

Lemma 1 For the FPCA and Elastic FPCA models, the following hypothesis tests are equiv-

alent:

1. H0 : {γi = γid, for all i} or HA : {γi ≠ γid, for at least one i}.

2. H0 : {q_i^{F(J)} = q_i^{E(J)}, for all i} or HA : {q_i^{F(J)} ≠ q_i^{E(J)}, for at least one i}.

3. H0 : {ε_i^{F(J)} = ε_i^{E(J)}, for all i} or HA : {ε_i^{F(J)} ≠ ε_i^{E(J)}, for at least one i}.

We created something similar to a paired t-test for functional data for the equivalent hypothesis

tests in Lemma 1. The paired test is an improvement, but it still does not handle the differences in variance properly, which causes some problems when there is relatively low variance in the

amplitudes.

It should be noted that hypothesis testing isn’t completely new to functional data. Morris

[15] points out that most of the work on hypothesis testing with functional data uses pointwise

estimates. As an example, we could compare two samples of functional data using a t-test at each

time point in the domain. These hypothesis tests then involve multiple comparison issues and do not give results for the functions as a whole, only at individual time points. We are not interested

in these hypothesis tests and therefore do not explore them here.

We came across a paper, Li [13], which proposes using Concordance Correlation Coefficients (CCC) for evaluating the similarity of paired functions. This is the only paper we found that presents a method for comparing paired functional data as a whole and not at individual time points. Section 3 introduces Concordance Correlation Coefficients before we proceed to a discussion of how they are used in our hypothesis test. We explain how we use CCC in our hypothesis test in Section 4.

55


4.3 Background Material: Concordance Correlation Coefficient

The goal of this Chapter is to create a model-based hypothesis test for the significance of phase

variability. We have chosen the models, and now we need a way to compare them. We remind the

reader that Section 2 explained why other methods were unsuccessful, and concluded by mentioning

an important tool for quantifying agreement: Concordance Correlation Coefficients (CCC). In this

section, we will introduce CCC. We will make it clear as to how this relates to our problem in

Section 4.

We use a simple example to motivate its definition. Consider an example where X,Y ∈ R1

are paired observations of some random process. We typically use correlation, ρ, to describe the

strength and direction of the connection between X and Y . Instead, we can use the Concordance

Correlation Coefficient, ρc, which is generally defined as:

ρc = 1 − E(squared distance of (X, Y) to the identity line) / E(squared distance of (X, Y) to the identity line, assuming independence).

Essentially, ρc is scaled based on the distance of the observations to the identity line. The expected distance to the identity line includes location and scale parameters. The next subsections

will demonstrate the use of CCC for data in R1 and L2.

4.3.1 Concordance Correlation Coefficient in R1

We can write the definition of CCC using standard notation in R1 as follows:

ρ_c = 2ρσ_xσ_y / (σ_x^2 + σ_y^2 + (µ_x − µ_y)^2) = 2 cov(X, Y) / (σ_x^2 + σ_y^2 + (µ_x − µ_y)^2).

We can see from this definition of CCC that the concordance correlation coefficient is a scaled

version of correlation. While correlation measures the dependence in the relationship between X

and Y , CCC measures the agreement. This section provides examples, which demonstrate the

differences between correlation and CCC.

The following example demonstrates the use of CCC in R1. This may not be the best method

for analyzing data in R1. For this demonstration the following parameters are used: xi ∼ N(3, 1) i.i.d., yi = 1.1·xi + εi, where εi ∼ N(0, 0.3616) i.i.d. Note that the true correlation is ρ = 0.95 and the true CCC is ρc = 0.9051. Our goal is to measure the agreement between X and Y.

56


Figure 4.1: The paired scores for comparison are shown on the left. The identity line (black) has been added for reference. A bootstrapped distribution for ρc is shown on the right. The vertical line shows the location of the point estimate.

Figure 4.1 shows a plot of the pairs of scores (left) and a test statistic with a bootstrapped dis-

tribution (right). The bootstrapped distribution is explained in the next paragraph. The estimated

correlation, ρ = 0.9425, is very strong. The line of best fit is not the identity line, which leads to a penalty in the CCC. The estimated CCC is ρc = 0.8942. These estimates appear close to the true values,

but we desire some method of evaluating this claim. A typical choice is to use standard deviations

or confidence intervals of the estimates.

Li [13] discusses how common theoretical estimates underestimate the variance of ρc and recommends using bootstrapped estimates. We will be using the bootstrapped distribution in our method, so we use bootstrapped distributions here for consistency and omit a discussion of alternative estimates. To get the bootstrapped distribution, we repeatedly sample N of the N pairs of observations and compute a new ρc for each sample. We can use the bootstrapped distribution to

find a 95% confidence interval for ρc by using the 2.5th and 97.5th percentile of the bootstrapped

distribution. For this example, we get a confidence interval of (0.86, 0.92).
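The R1 example above can be reproduced in a few lines. This is a sketch with our own helper name (`ccc`), not code from the dissertation; we read 0.3616 as the standard deviation of the noise, which is the reading that makes the true ρ equal 0.95.

```python
import numpy as np

def ccc(x, y):
    """Concordance correlation coefficient for paired scalar observations."""
    mx, my = x.mean(), y.mean()
    sxy = ((x - mx) * (y - my)).mean()   # covariance, 1/N convention
    return 2 * sxy / (x.var() + y.var() + (mx - my) ** 2)

rng = np.random.default_rng(42)
n = 500
x = rng.normal(3.0, 1.0, n)
# 0.3616 taken as the noise standard deviation, so that rho = 0.95.
y = 1.1 * x + rng.normal(0.0, 0.3616, n)

rho = np.corrcoef(x, y)[0, 1]   # plain correlation
rho_c = ccc(x, y)               # concordance correlation

# Percentile bootstrap CI: resample the (x_i, y_i) pairs with replacement.
boot = np.array([ccc(x[i], y[i])
                 for i in (rng.integers(0, n, n) for _ in range(2000))])
ci = np.quantile(boot, [0.025, 0.975])
```

Because the factor 2σ_xσ_y / (σ_x^2 + σ_y^2 + (µ_x − µ_y)^2) is at most 1, the estimated ρ_c can never exceed the estimated ρ, which is the "penalty" for disagreement described above.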

We just introduced CCC through an example in R1 to give the reader a basic understanding of

CCC. The next subsection will introduce CCC for L2 functions and provide examples.

4.3.2 Concordance Correlation Coefficient in L2

In the previous subsection, we introduced CCC using a simple example in R1. In this section,

we will discuss CCC and provide examples using L2 functions. Li [13] introduced the use of CCC

for comparing paired functional data. The Concordance Correlation Coefficient between X(t) and

Y (t) is defined as:

57


Figure 4.2: Left: One pair of the twenty simulated functions. Right: The bootstrapped distribution of the CCC.

ρ_c(X, Y) = 2⟨X − E(X), Y − E(Y)⟩ / ( ‖X − E(X)‖^2 + ‖Y − E(Y)‖^2 + ‖E(X) − E(Y)‖^2 ),

where ⟨X, Y⟩ = E ∫ X(t)Y(t)w(t) dt and w(t) is an optional weight function. For now we assume

w(t) = 1, ignoring any potential weightings. We see this formula is basically the same as the one

given for R1: the numerator is twice the covariance, and the denominator is the sum of the standard

deviations plus the distance between the means.
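A discretized sketch of this functional CCC (our own helper, not the dissertation's code) with w(t) = 1 and a uniform grid, so the integrals become Riemann sums:

```python
import numpy as np

def functional_ccc(X, Y, t):
    """CCC between paired functional samples.

    X, Y : (N, T) arrays of paired functions evaluated on the uniform grid t.
    Assumes w(t) = 1; integrals approximated by Riemann sums.
    """
    dt = t[1] - t[0]
    mx, my = X.mean(axis=0), Y.mean(axis=0)   # pointwise means E(X), E(Y)
    def inner(a, b):                          # <A, B> = E ∫ a(t) b(t) dt
        return ((a * b).sum(axis=-1) * dt).mean()
    cov = inner(X - mx, Y - my)
    vx = inner(X - mx, X - mx)
    vy = inner(Y - my, Y - my)
    gap = ((mx - my) ** 2).sum() * dt         # ||E(X) - E(Y)||^2
    return 2 * cov / (vx + vy + gap)

t = np.linspace(0.0, 1.0, 101)
rng = np.random.default_rng(7)
X = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal((20, 101))
```

Perfectly agreeing samples give a CCC of exactly 1, while a constant vertical shift between the two samples lowers the CCC through the mean-gap term in the denominator.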

The example using functions is taken directly from Li’s paper. Twenty pairs of functions

are simulated using a Gaussian process with µ_x(t) = −√0.05t, µ_y(t) = √0.05t, σ_x^2 = σ_y^2 = 1, and σ_xy(t) = 0.95. We simulated each time point independently, using a multivariate normal

distribution. The true correlation between these functions is ρ = 0.95. Since the means are slightly

different, we will see a smaller CCC with the true CCC being ρc = 0.9048.

We remind the reader that a bootstrapped distribution is used, as recommended by Li. To get

the bootstrapped distribution, we repeatedly sample N of the N pairs of functions and compute

a new ρc for each sample. To be clear, the pairings do not change, just how often they appear

in the sample. We can use the bootstrapped distribution to get a 95% confidence interval for ρc:

(0.8927, 0.9100). Figure 4.2 shows one of twenty pairs of the simulated functions (left) along with

the bootstrapped distribution of ρc (right).

In this section we introduced CCC and provided some simple examples of how to use it. We

made comparisons between CCC and correlation to help emphasize the importance of agreement.

We provided examples in both R1 and L2. In the next section, we will discuss how CCC relates to

the rest of the Chapter.

58


4.4 Methods of Hypothesis Testing for Phase with CCC

The overall goal of this Chapter is to create a model-based hypothesis test to evaluate the

significance of time warping variability. We chose to use Elastic FPCA and FPCA as our models

with and without phase variability, respectively. We will use CCC as a tool for comparing the two

methods. We can now develop a framework for using CCC in our current hypothesis test situation.

There are several reasons we use CCC:

• As previously discussed, we want to evaluate agreement and not just correlation.

• CCC is the only existing method for comparing the agreement of functional data as a whole.

• Preliminary results using the L2 norm did not work well for cases with balanced variability.

We remind the reader that other methods, such as this, are discussed earlier in this Chapter.

Before getting into more details, let's review our goal. The overall goal for this Chapter is to be

able to compare the reconstructions under FPCA to those under Elastic FPCA. Using the same set

up and q-space described in the previous chapters, we would like to compare the original SRVFs to

the FPCA reconstructions and the original SRVFs to the Elastic FPCA reconstructions. We can do this by comparing the CCCs.

Using the expansion of the methods from Li [13], we can return to the question of Hypothesis Testing. We use the CCC ρ_c^{F(J)} to denote the similarity between the original SRVFs and the FPCA reconstructions. Note that as J → ∞, ρ_c^{F(J)} → 1. Similarly, we denote by ρ_c^{E(J)} the Elastic FPCA equivalent.

If the CCCs are similar, then the γi are not contributing much to the Elastic FPCA model and

we risk overfitting our reconstruction. This means our initial hypothesis test:

H0 : {γi = γid, for all i}

HA : {γi ≠ γid, for at least one i},

is equivalent to:

H0 : ρ_c^{F(J)} = ρ_c^{E(J)}

HA : ρ_c^{F(J)} ≠ ρ_c^{E(J)}.

59


We do not want to compare the correlation of the reconstructions for FPCA and Elastic FPCA

directly. As the reconstructions approach the original functions, the reconstructions for FPCA and

Elastic FPCA could have mismatching variance resulting in a smaller CCC.

In Section 3, we introduced CCC and gave an example of how we can get a distribution for

a single CCC; however, we will need to compare two correlations. Previous work in this area includes Fisher (1921) [5], which uses the Fisher Z-Transformation to form normal approximations for correlation, and Steiger (1980) [21], which explored methods of comparing two correlations that share an index. Unfortunately, these approximations do not work well when ρ is close to 1, and are not

worth applying to CCC for our purposes.

Barnhart [2] proposed new versions of CCC to be used in a situation with a reference method

and multiple comparisons. However, this deals with assessing overall agreement between the refer-

ence method and multiple other methods, which is not what we are interested in here.

The rest of this section justifies using CCC and gives a simple example of how we approach

the hypothesis testing.

4.4.1 Applying Concordance Correlation Coefficient to Hypothesis Testing

To show CCC makes sense to use, the following conditions must be satisfied:

1. If γi = γid for all i, then ρ_c^{F(J)} = ρ_c^{E(J)}. This is easy to show and therefore omitted.

2. If γi ≠ γid for at least one i, then ρ_c^{F(J)} ≠ ρ_c^{E(J)}.

We will in fact show ρ_c^{E(J)} > ρ_c^{F(J)}, making our hypothesis equivalent to a one-sided test:

H0 : ρ_c^{F(J)} = ρ_c^{E(J)}

HA : ρ_c^{F(J)} < ρ_c^{E(J)}.

We will fix J and use the following notation:

• Q is the SRVF of the random functions

• Q^F, Q^E are the reconstructions of Q using FPCA and Elastic FPCA, respectively

• ⟨X, Y⟩ = E ∫ X(t)Y(t) dt

• ‖X‖^2 = ⟨X, X⟩

• E(X) = E(X(t)) is a function of t

• ρ_c^F = ρ_c^{F(J)} and ρ_c^E = ρ_c^{E(J)} as shorthand notation, since J is fixed

60


• ρ_c^F = 2⟨Q − E(Q), Q^F − E(Q^F)⟩ / ( ‖Q − E(Q)‖^2 + ‖Q^F − E(Q^F)‖^2 + ‖E(Q) − E(Q^F)‖^2 )

• ρ_c^E = 2⟨Q − E(Q), Q^E − E(Q^E)⟩ / ( ‖Q − E(Q)‖^2 + ‖Q^E − E(Q^E)‖^2 + ‖E(Q) − E(Q^E)‖^2 )

Theorem 1: For a set of functions with reconstructions as defined above, the Concordance

Correlation Coefficient from Elastic FPCA is greater than that of FPCA when phase variability is

present, i.e., ρ_c^E > ρ_c^F.

The proof of the above Theorem is shown in the Appendix. Algorithm 9 is used for getting

distributions of ρ_c^F and ρ_c^E.

Algorithm 9 CCC Comparison with FPCA and Elastic FPCA

1. Fix J.

2. Compute q_i^{F(J)} and q_i^{E(J)} for each i = 1, ..., N.

3. Compute ρ_c^{F(J)} and ρ_c^{E(J)}.

4. Replicate step 2 with bootstrapped samples. For each bootstrapped sample b:

(a) Sample N of the i with replacement.

(b) Compute ρ_c^{bF(J)} and ρ_c^{bE(J)} using the bootstrapped sample.

We will demonstrate these methods in R1 to give a more intuitive understanding.

The results section will demonstrate these methods as described.
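Algorithm 9 can be sketched in code as follows. The reconstructions `QF` and `QE` are assumed to be precomputed for a fixed J (e.g., by FPCA and Elastic FPCA fits); the function names are ours, and a uniform grid with spacing `dt` is assumed.

```python
import numpy as np

def pop_ccc(Q, R, dt):
    """CCC between original SRVFs Q and reconstructions R, both (N, T) arrays."""
    mq, mr = Q.mean(axis=0), R.mean(axis=0)
    inner = lambda a, b: ((a * b).sum(axis=-1) * dt).mean()
    gap = ((mq - mr) ** 2).sum() * dt
    return 2 * inner(Q - mq, R - mr) / (
        inner(Q - mq, Q - mq) + inner(R - mr, R - mr) + gap)

def algorithm9(Q, QF, QE, dt, n_boot=1000, seed=0):
    """Steps 3-4 of Algorithm 9: point estimates plus bootstrap replicates."""
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    rho_f, rho_e = pop_ccc(Q, QF, dt), pop_ccc(Q, QE, dt)
    boot_f, boot_e = np.empty(n_boot), np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)   # resample the pairings with replacement
        boot_f[b] = pop_ccc(Q[idx], QF[idx], dt)
        boot_e[b] = pop_ccc(Q[idx], QE[idx], dt)
    return rho_f, rho_e, boot_f, boot_e

# Toy illustration: pretend the elastic fit is perfect and the FPCA fit noisy.
rng = np.random.default_rng(3)
Q = rng.standard_normal((15, 60))
QE = Q.copy()
QF = Q + 0.5 * rng.standard_normal(Q.shape)
rho_f, rho_e, boot_f, boot_e = algorithm9(Q, QF, QE, dt=0.02)
```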

4.4.2 Simple Concordance Correlation Coefficient for Two Comparisons

We will continue with the example from Section 4.3.1 using R1. The example started with two random variables, X and Y. X represented a gold standard measurement and Y was an alternative measurement. Let's assume there is an alternative to Y, which we would also like to compare,

denoted Z. Figure 4.3 illustrates this situation with the paired data (left) and the bootstrapped

distribution (right). For this demonstration the following parameters are used: zi = xi + ηi, where

ηi ∼ N(0, 0.3287) i.i.d.

To be clear, for each i there are three scores: a gold standard score xi, a score yi from the alternative method, and a score zi from the new method. Concordance Correlation Coefficients can be used

to measure the agreement between X and Y and between X and Z, denoted ρ_c^y and ρ_c^z. Note that the true correlation between X and Z is ρ_z = 0.95 and the true CCC is ρ_c^z = 0.9488. Recall that ρ_c^y = 0.9051. Also, note that Y and Z have the same correlation with X. Y is not on the

61


Table 4.1: The correlation and CCC between X, Y and between X, Z.

         X_i to Y_i   X_i to Z_i
ρ        0.94         0.95
ρ_c      0.90         0.95

Figure 4.3: We want to measure the agreement between the scores on the X-axis and those on the Y-axis for two samples: Y (red) and Z (blue), shown on the left. The line of identity (black) has been added for reference. On the right, the bootstrapped distribution of ρ_c^y is compared to the bootstrapped distribution of ρ_c^z.

identity line, which results in a penalty for its Concordance Correlation Coefficient, while Z is not

penalized.

Table 4.1 states the estimates of the correlation and CCC in both cases. The CCC is lower

in the case with the worse agreement, as expected. However, this observed difference could be a

coincidence. A hypothesis test can be conducted to determine if this difference is significant:

H0 : ρ_c^y = ρ_c^z

HA : ρ_c^y ≠ ρ_c^z.

We compare a bootstrapped distribution of ρ_c^y, which we consider our null distribution, with the bootstrapped distribution of ρ_c^z. Figure 4.3 shows the distributions. We can get p-values by comparing the bootstrapped distribution of ρ_c^y to the point estimate of ρ_c^z. Remember that this demonstration in R1 is just to assist with understanding the hypothesis test using CCC before applying the methods in the model testing setup. That being said, we treat the bootstrapped distribution as a null distribution, because the bootstrapped distribution of ρ̂_c^{F(J)} is indeed the null distribution in our setup.
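Under this setup, the one-sided p-value is simply the fraction of null bootstrap draws at or above the observed alternative CCC. A minimal sketch, with a hypothetical helper name:

```python
import numpy as np

def phase_pvalue(null_boot, rho_alt_hat):
    """One-sided p-value for HA: the alternative CCC exceeds the null CCC.

    null_boot   : bootstrap replicates of the null CCC (e.g. rho_c^{F(J)});
    rho_alt_hat : observed point estimate of the alternative CCC (e.g. rho_c^{E(J)}).
    A small p-value indicates significant phase variability.
    """
    null_boot = np.asarray(null_boot, dtype=float)
    return float((null_boot >= rho_alt_hat).mean())

phase_pvalue([0.80, 0.82, 0.84, 0.86], 0.85)   # -> 0.25
```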

62


Table 4.2: The table shows the parameters used for simulating the functions. Φ(a, b) represents a Normal PDF with mean a and variance b. The varying parameters demonstrate a variety of amplitude and time warping variability.

Parameter      Amplitudes            Phases

µ              Φ(0.5, 0.5)           0
{bj}           Fourier               Fourier
J              2                     2
{cij}          N(0, σ_j^2)           N(0, σ_W^2)
var({cij})     σ_j^2 = ξ/j,          σ_W^2 = [0.1, 1, 3]
               ξ = [0.01, 0.5, 2]

4.5 Results

In this Chapter we created a test for the significance of phase variability using the FPCA and

Elastic FPCA models. This section applies the test to simulated data and then to several real data sets: the Berkeley Growth Curves, Tecator data, and temperature data.

4.5.1 Simulations

The simulated functions are shown in Figure 4.4. They are simulated using the same technique

as in Chapter 2.3. Amplitudes and phases are simulated separately with three different levels of

variability. For simplicity, the amplitudes are labeled as Amp 1, 2, 3 where Amp 1 has almost

no variance and the variance increases for Amp 2 and again for Amp 3. The same is true for the

amount of time warping variance, labeled Gam 1, 2, 3. These amplitudes and phases are composed in pairs to create the nine test cases. In Figure 4.4 the variability of the phases increases going from left to right, while the variability of the amplitudes increases going from the bottom to the top.

The actual parameters used are presented in Table 4.2. To ensure the amplitudes do not contain

phase variability, we use the Multiple Alignment Algorithm presented in Chapter 2.
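The amplitude portion of Table 4.2 can be sketched as follows. This is our own reading of the table (the N(0.5, 0.5) pdf as the mean curve, the first two orthonormal Fourier basis functions on [0, 1], and coefficients c_ij ∼ N(0, ξ/j)), not the dissertation's simulation code.

```python
import numpy as np

def simulate_amplitudes(n, xi, T=101, seed=0):
    """Simulate n amplitude functions per our reading of Table 4.2."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, T)
    var = 0.5
    # Mean curve: the N(0.5, 0.5) pdf evaluated on [0, 1].
    mu = np.exp(-(t - 0.5) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    # First two orthonormal Fourier basis functions on [0, 1].
    B = np.vstack([np.sqrt(2) * np.sin(2 * np.pi * t),   # b_1
                   np.sqrt(2) * np.cos(2 * np.pi * t)])  # b_2
    sd = np.sqrt(xi / np.array([1.0, 2.0]))              # sigma_j = sqrt(xi / j)
    C = rng.standard_normal((n, 2)) * sd                 # c_ij ~ N(0, xi / j)
    return t, mu + C @ B

# The three variability levels of the table: xi in {0.01, 0.5, 2}.
t, F_low = simulate_amplitudes(20, xi=0.01)
t, F_high = simulate_amplitudes(20, xi=2.0, seed=1)
```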

Figure 4.5 shows the boxplots of the bootstrap estimates of ρ_c^{F(J)} and ρ_c^{E(J)} for J = 0, ..., 10 for each of the nine sample cases (phase variability increases from left to right, amplitude variability increases from bottom to top). As J increases, the distribution of ρ_c^{F(J)} approaches the distribution of ρ_c^{E(J)}, and both approach a distribution concentrated at 1. As expected, as the variance of the time warping functions increases, it takes longer for these distributions to converge.

63


Note that when J = 0, the FPCA model is simply the mean of the functions, which results in a covariance of 0. Elastic FPCA is able to reconstruct with the mean and time warping functions, which is why its covariance is nonzero. An important feature to note is that when the true variance of time warping is low (i.e., Gam 1), we see a lower distribution of ρ_c^{E(J)} than when the time warping variance is higher. This may relate to the Elastic FPCA model overfitting, resulting in a less than optimal covariance with the original functions.

The top row of Figure 4.6 shows the p-values using the point estimate of ρ_c^{E(J)} for comparison. The bottom row uses the minimum of the bootstrapped ρ_c^{E(J)} for comparison. Both p-values are shown as a function of J, the number of basis functions used in the reconstruction. At a certain point, both FPCA and Elastic FPCA are able to fully reconstruct the original functions, and therefore have p-values of 1 for those J. We can see that as the variance of the amplitudes increases, it generally takes longer for the p-values to converge. In general, we can see the p-values decrease for a fixed J as the variance of the time warping functions increases.

4.5.2 Real Data Examples

The Berkeley Growth Curves show the change in height for children ages 0 to 18 years old. As

previously discussed, the Berkeley growth curves are used as a typical example of an unaligned set

of functions. This is because children have growth spurts at different ages (time warping) and to different extremes (amplitude). We can now apply the hypothesis test to determine if alignment is significant. The female growth curves are shown first, followed by the male growth

curves.

Figure 4.7 shows the original functions (left) and the reconstructions of their amplitudes for J = 1 using FPCA (middle) and Elastic FPCA (right). Note that the reconstruction process, as well as the CCC calculations, occurs in the SRVF space. Figure 4.8 shows the distributions of ρ_c^{F(J)} and ρ_c^{E(J)} as a function of J, i.e., the number of basis functions used in reconstruction. We can see the CCC are converging, but have not yet converged by J = 10. Figure 4.9 shows the p-values for a larger range of J; the distributions converge when J is in the upper twenties.
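The SRVF space referred to above is the image of the square-root velocity transform, q(t) = f'(t)/√|f'(t)| (as in Srivastava et al. [19]). A minimal numerical sketch on a discrete grid, with illustrative names and a finite-difference derivative standing in for f':

```python
import numpy as np

def srvf(f, t):
    """Square-root velocity function q(t) = f'(t) / sqrt(|f'(t)|),
    using a finite-difference derivative on the grid t."""
    df = np.gradient(f, t)
    return np.sign(df) * np.sqrt(np.abs(df))

t = np.linspace(0.0, 1.0, 101)
q = srvf(t ** 2, t)  # for f(t) = t^2, f'(t) = 2t, so q(t) = sqrt(2t) on the interior
```

The transform is useful because the elastic (Fisher–Rao) comparison of functions reduces to the ordinary L² metric between their SRVFs.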

Figures 4.10, 4.11, and 4.12 show the results for the male growth curves. We expect the same results here as for the female growth curves, because of the similar nature of the data. From the boxplots, we can see the CCC are converging, but have not yet converged by J = 10. Figure 4.12 shows the p-values for a larger range of J; the distributions converge when J is in the mid twenties.



Figure 4.4: Simulation. The original simulated functions. Each row uses the same amplitudes, with time warping variance increasing to the right. Each column uses the same time warping functions, with amplitude variance increasing upward.



Figure 4.5: Simulation. Each boxplot shows the distribution of ρ_c^{F(J)} (blue) and ρ_c^{E(J)} (red) for each J along the x-axis.


Figure 4.6: Simulation. The p-values for the nine cases are shown for J = 0, ..., 50. The amplitude variance increases as the plots move to the right. Each plot shows the p-values for small (blue), medium (red), and large (yellow) time warping variance.


Figure 4.7: Female Growth Curves. Left: The original female growth curves. Middle and Right: Reconstructions of the growth curves using J = 1 for FPCA and Elastic FPCA, respectively. Note that the functions are shown in the original space and not the SRVFs.

Figure 4.8: Female Growth Curves. Each boxplot shows the distribution of ρ_c^{F(J)} (blue) and ρ_c^{E(J)} (red) for each J along the x-axis.


Figure 4.9: Female Growth Curves. The p-values for the female growth curves are close to zero until around J = 28.

Figure 4.10: Male Growth Curves. Left: The original male growth curves. Middle and Right: Reconstructions of the growth curves using J = 1 for FPCA and Elastic FPCA, respectively. Note that the functions are shown in the original space and not the SRVFs.

The results from the male and female growth curves are similar, as expected. This is encouraging, because it provides a real data example of two sets of functions for which similar results are desired. Of course, it would be good in the future to have replicated studies of the growth curves to affirm the findings here.

Figures 4.13, 4.14, and 4.15 show the results for the tecator functions. The tecator data used a "Tecator Infratec spectrometer that measures the absorbances at 100 wavelengths in the region 850–1050 nm" [4]. This is an example of a data set that appears mostly aligned, but with some phase variability present.

Figure 4.13 shows the original tecator functions (left) and reconstructions using FPCA and Elastic FPCA (middle and right, respectively). The reconstructions look very similar to the original functions. Figure 4.14 shows the boxplots of the CCC for J = 0, ..., 10. The CCC appear to converge


Figure 4.11: Male Growth Curves. Each boxplot shows the distribution of ρ_c^{F(J)} (blue) and ρ_c^{E(J)} (red) for each indicated J along the x-axis.

Figure 4.12: Male Growth Curves. The p-values for the male growth curves are close to zero until around J = 25.


Figure 4.13: Tecator Data. Left: The original tecator functions. Middle and Right: Reconstructions of the tecator functions using J = 1 for FPCA and Elastic FPCA, respectively. Note that the functions are shown in the original space and not the SRVFs.

toward one quickly. Figure 4.15 shows that the p-values remain significant until J is close to 20. This contradicts what was expected.

It is hard to see visually in Figure 4.13, but many of the original functions have a small peak to the left of the main peak. Elastic FPCA struggles to converge in such a situation. This is similar to issues with the Multiple Alignment Algorithm, which struggles to converge when presented with functions with varying numbers of less prominent peaks. Further investigation into cases similar to this is necessary.

Figures 4.16, 4.17, and 4.18 show the results for the temperature data. This data set is an example where there is almost no noticeable time warping variability. Even with only one basis, the FPCA and Elastic FPCA models seem able to reconstruct most of the variance of the original functions. The boxplots in Figure 4.17 converge at very low values of J compared to the other data sets. Observing the p-values, the Elastic FPCA model might not be necessary for J > 1 for this particular data set.

4.6 Conclusion

In this chapter we proposed a model-based approach to testing the significance of phase variability. We used the models proposed in Chapter 2: FPCA as a general model with no time warping component and Elastic FPCA as a model with time warping. Concordance Correlation Coefficients (CCC) were used to make the comparison. We created a hypothesis test using the CCC that is equivalent to testing the significance of phase variability under the two models.

Various simulations showed a change in the significance of the time warping functions based on the initial phase and amplitude variability. We explored several applications, including growth


Figure 4.14: Tecator Data. Each boxplot shows the distribution of ρ_c^{F(J)} (blue) and ρ_c^{E(J)} (red) for each indicated J along the x-axis.

Figure 4.15: Tecator Data. The p-values for the tecator data are close to zero until around J = 20.


Figure 4.16: Temperature Data. Left: The original temperature functions. Middle and Right: Reconstructions of the temperature functions using J = 1 for FPCA and Elastic FPCA, respectively. Note that the functions are shown in the original space and not the SRVFs.

Figure 4.17: Temperature Data. Each boxplot shows the distribution of ρ_c^{F(J)} (blue) and ρ_c^{E(J)} (red) for each indicated J along the x-axis.


Figure 4.18: Temperature Data. The p-values for the weather data are close to zero until around J = 5.

curves, tecator data, and weather data. The real data had both positive and negative results. The

growth curves had the desired results: significant differences between the two models and similar

results between the male and female growth curves. The FPCA and Elastic FPCA models were

not significantly different for larger J in the weather data, as desired.

The tecator data was expected to give similar results to those of the weather data. It is

suspected that the discrepancy in results has to do with small additional peaks in some of the

tecator functions. Specifically, it appears the models give different results in locations where the

functions are relatively flat. A solution to this might be to use the weight parameter in CCC to

make locations where the functions are flat less influential. Further investigation is required. In

the meantime, caution should be applied when using the hypothesis test on certain datasets.

In conclusion, we have formulated a hypothesis test for testing the significance of phase vari-

ability. This test can now be applied to data sets before proceeding with time warping techniques,

with some cautions.


CHAPTER 5

GENERAL DISCUSSION AND FUTURE WORK

To summarize the main contributions of this dissertation, we have:

• Studied three models for functional data, each accounting for phase variability in a different way:

– Functional Principal Component Analysis (FPCA), which does not separate phase variability from amplitude variability

– Separate Functional Principal Component Analysis, which first separates phase and amplitude and then models each component separately

– Elastic Functional Principal Component Analysis (Elastic FPCA), which jointly models phase and amplitude

• Developed a metric-based hypothesis test for testing the significance of phase variability for a given set of functions

• Developed a model-based hypothesis test for testing the significance of phase variability using Concordance Correlation Coefficients and the FPCA and Elastic FPCA models

In Chapter 2 we compared Elastic FPCA with two other models: FPCA and Separate FPCA. When the functions contain small phase variability, it is clear the Separate FPCA model struggles to take this variability into account properly. FPCA does not separate out phase variability, making it preferable to Separate FPCA when functions do not have phase variability. Elastic FPCA jointly models phase and amplitude; it was able to model functions with phase better than FPCA, and to model functions without phase better than Separate FPCA. We concluded that Elastic FPCA is the desired method to minimize variability in the functions while avoiding unnecessary phase components.

For both Separate FPCA and Elastic FPCA, we noted that the models overfit their phase components when phase variability was not necessary. We noted that it would be possible for FPCA to perform better than the other models if there were a way of indicating whether phase variability is truly present in the functions. This led to the hypothesis tests developed in Chapters 3 and 4, which test the significance of phase variability in functional data.


In Chapter 3, we proposed several ways to test the significance of phase variability using the metric introduced in Chapter 1. We evaluated the significance using the Friedman-Rafsky Test (FRT), Schilling's Nearest Neighbors Test (SNNT), and the Energy Test. We also proposed a "Pseudo-Bootstrap" solution to attempt to create a null distribution of test statistics for the FRT and SNNT, to address issues of independence. Both methods, along with the Energy Test, violated the same assumption of independence between the two groups of functions being compared. Although the "Pseudo-Bootstrap" helped to address this assumption in the null distribution, a better method was clearly needed.

From the results of the Energy Test, it was clear that quantifications of phase variability are relative to the amplitude variability. The metric-based approach allowed the phase component to account for variability that could have come from the amplitudes. We noted that this could be addressed by considering a model that accounts for both phase and amplitude variability, such as the Elastic FPCA model. This led to the work presented in Chapter 4, where we developed a hypothesis test for the significance of phase variability using a model-based approach.

In Chapter 4, we proposed using Concordance Correlation Coefficients (CCC) as a tool for testing the agreement between FPCA and Elastic FPCA. We developed a hypothesis test that compared the CCC between SRVFs and their reconstructions using FPCA and Elastic FPCA. We stated and proved a theorem that the CCC between SRVFs and their Elastic FPCA reconstructions will always be closer to one than the CCC between the SRVFs and their FPCA reconstructions. We used this theorem to formulate a hypothesis test using the CCC.
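Lin's CCC [14], on which the test is built, measures agreement between paired samples. A minimal sketch of the coefficient for two discretized samples (names are illustrative):

```python
import numpy as np

def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient between paired samples x and y:
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxy = np.mean((x - x.mean()) * (y - y.mean()))
    return 2.0 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

x = np.linspace(0.0, 1.0, 50)
assert np.isclose(concordance_ccc(x, x), 1.0)  # perfect agreement gives CCC = 1
```

Unlike Pearson correlation, the CCC is penalized by location and scale shifts, which is what makes it suitable for measuring agreement between a function and its reconstruction.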

Various simulations showed a change in the significance of the phase components based on the

initial phase and amplitude variability. We explored several applications including growth curves,

tecator data, and weather data. The real data had both positive and negative results. The growth

curves had the desired results: significant differences between the two models and similar results

between the male and female growth curves. The FPCA and Elastic FPCA models were not

significantly different for larger J in the weather data, as desired.

The tecator data was expected to give similar results to those of the weather data. It is suspected

that the discrepancy in results has to do with the functions being approximately flat in the same

locations. Further investigation is required. A solution may be to include a weighted function in

CCC to give SRVFs close to zero less influence over the distribution of the CCC. In the meantime,

caution should be applied when using the hypothesis test on certain data.


In conclusion, we have developed several hypothesis tests for testing the significance of phase

variability in functional data. We will need to continue to work on cases with overall low amplitude

and phase variability, such as the tecator functions.


APPENDIX A

PROOF OF THEOREM IN CHAPTER 4

Theorem: For a set of functions with reconstructions as defined above, the Concordance Correlation Coefficient from Elastic FPCA is greater than that of FPCA when phase variability is present (i.e., ρ_c^E > ρ_c^F).

The proof is broken down into several lemmas and remarks. Many properties of the FPCA model are used throughout this process.

Remark: If γ_i ≠ γ_id for at least one i, it is because E‖Q − Q^E‖² < E‖Q − Q^F‖². This follows directly from the minimization: the Elastic FPCA model will remain at γ_id unless a better fit has been found.

Lemma 1: If ‖Q − Q^E‖² < ‖Q − Q^F‖², then 2〈Q − E(Q), Q^E − E(Q^E)〉 > ‖Q^F − E(Q)‖² + ‖Q^E − E(Q^E)‖².

Proof of Lemma 1:

‖Q − Q^E‖² < ‖Q − Q^F‖²

‖Q − E(Q)‖² − ‖Q^E − E(Q^E)‖² − 2〈Q^E − E(Q^E), Q − Q^E〉 < ‖Q − E(Q)‖² − ‖Q^F − E(Q)‖²

‖Q^E − E(Q^E)‖² + 2〈Q^E − E(Q^E), Q − Q^E〉 > ‖Q^F − E(Q)‖²

‖Q^E − E(Q^E)‖² + 2〈Q − E(Q), Q^E − E(Q^E)〉 − 2‖Q^E − E(Q^E)‖² > ‖Q^F − E(Q)‖²

2〈Q − E(Q), Q^E − E(Q^E)〉 − ‖Q^E − E(Q^E)‖² > ‖Q^F − E(Q)‖²

2〈Q − E(Q), Q^E − E(Q^E)〉 > ‖Q^F − E(Q)‖² + ‖Q^E − E(Q^E)‖²

Lemma 2: If 2〈Q − E(Q), Q^E − E(Q^E)〉 > ‖Q^F − E(Q)‖² + ‖Q^E − E(Q^E)‖², then 〈Q − E(Q), Q^E − E(Q^E)〉 > ‖Q^F − E(Q)‖².

Proof of Lemma 2:

Note that ‖Q^F − E(Q)‖² < ‖Q^E − E(Q^E)‖²:

2〈Q − E(Q), Q^E − E(Q^E)〉 > ‖Q^F − E(Q)‖² + ‖Q^E − E(Q^E)‖²

2〈Q − E(Q), Q^E − E(Q^E)〉 > 2‖Q^F − E(Q)‖²

〈Q − E(Q), Q^E − E(Q^E)〉 > ‖Q^F − E(Q)‖²

Lemma 3: We can simplify ρ^F in the following manner:

ρ^F = 2‖Q^F − E(Q^F)‖² / (‖Q − Q^F‖² + 2‖Q^F − E(Q)‖²).

Proof of Lemma 3:

Recall that (using the fundamental rules of FPCA) Q − Q^F and Q^F − E(Q) are orthogonal:

ρ^F = 2〈Q − E(Q), Q^F − E(Q^F)〉 / (‖Q − E(Q)‖² + ‖Q^F − E(Q^F)‖² + ‖E(Q) − E(Q^F)‖²)

= 2〈Q − E(Q), Q^F − E(Q^F)〉 / (‖Q − E(Q)‖² + ‖Q^F − E(Q^F)‖² + ‖E(Q) − E(Q)‖²)

= 2〈Q − E(Q), Q^F − E(Q^F)〉 / (‖Q − E(Q)‖² + ‖Q^F − E(Q^F)‖² + 0)

= 2〈Q^F + Q − Q^F − E(Q), Q^F − E(Q^F)〉 / (‖Q − E(Q)‖² + ‖Q^F − E(Q^F)‖²)

= [2〈Q − Q^F, Q^F − E(Q^F)〉 + 2〈Q^F − E(Q), Q^F − E(Q^F)〉] / (‖Q − E(Q)‖² + ‖Q^F − E(Q^F)‖²)

= [0 + 2〈Q^F − E(Q), Q^F − E(Q^F)〉] / (‖Q − E(Q)‖² + ‖Q^F − E(Q^F)‖²)

= 2‖Q^F − E(Q^F)‖² / (‖Q − E(Q)‖² + ‖Q^F − E(Q^F)‖²)

= 2‖Q^F − E(Q^F)‖² / (‖Q − Q^F‖² + ‖Q^F − E(Q)‖² + ‖Q^F − E(Q)‖²)

= 2‖Q^F − E(Q^F)‖² / (‖Q − Q^F‖² + 2‖Q^F − E(Q)‖²)
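As a sanity check, the simplified form in Lemma 3 can be verified numerically for an ordinary PCA reconstruction, where the mean-preservation and orthogonality properties used above hold exactly. This is an illustrative sketch on multivariate data rather than SRVFs; all names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.normal(size=(60, 10))          # 60 "functions" sampled at 10 points
mu = Q.mean(axis=0)
Qc = Q - mu

# Rank-J PCA reconstruction Q^F (plays the role of the FPCA reconstruction)
J = 3
_, _, Vt = np.linalg.svd(Qc, full_matrices=False)
QF = mu + (Qc @ Vt[:J].T) @ Vt[:J]

def msq(A):
    """Mean squared norm E||A||^2, averaging over the sample."""
    return np.mean(np.sum(A ** 2, axis=1))

# CCC from its definition ...
num = 2.0 * np.mean(np.sum((Q - mu) * (QF - QF.mean(axis=0)), axis=1))
den = msq(Q - mu) + msq(QF - QF.mean(axis=0)) + np.sum((mu - QF.mean(axis=0)) ** 2)
rho_def = num / den

# ... and from Lemma 3's simplified form
rho_lemma3 = 2.0 * msq(QF - QF.mean(axis=0)) / (msq(Q - QF) + 2.0 * msq(QF - mu))

assert np.isclose(rho_def, rho_lemma3)
```

The two expressions agree because E(Q^F) = E(Q) and the residual Q − Q^F is orthogonal to the reconstruction term Q^F − E(Q), exactly the two FPCA properties invoked in the proof.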

Lemma 4: We can simplify ρ^E in the following manner:

ρ^E = 2〈Q − E(Q), Q^E − E(Q^E)〉 / (‖Q − Q^E‖² + 2〈Q − E(Q), Q^E − E(Q^E)〉).

Proof of Lemma 4:

ρ^E = 2〈Q − E(Q), Q^E − E(Q^E)〉 / (‖Q − E(Q)‖² + ‖Q^E − E(Q^E)‖² + ‖E(Q) − E(Q^E)‖²)

= 2〈Q − E(Q), Q^E − E(Q^E)〉 / (‖Q − E(Q)‖² + ‖Q^E − E(Q^E)‖² + ‖Q − Q^E‖² − ‖Q − E(Q)‖² − ‖Q^E − E(Q^E)‖² + A1)

= 2〈Q − E(Q), Q^E − E(Q^E)〉 / (‖Q − Q^E‖² + 2〈Q − E(Q), Q^E − E(Q^E)〉)

In the above proof, A1 = 2〈Q − E(Q), Q^E − E(Q^E)〉. Note that we use the following remark to move from the first to the second step of the above proof:

Remark:

‖E(Q) − E(Q^E)‖² = ‖Q − Q^E‖² − ‖Q − E(Q)‖² − ‖Q^E − E(Q^E)‖² + 2〈Q − E(Q), Q^E − E(Q^E)〉

Proof of Remark:

‖E(Q) − E(Q^E)‖² = ‖Q − Q^E‖² − ‖Q − Q^E‖² + ‖E(Q) − E(Q^E)‖²

= ‖Q − Q^E‖² − ‖(Q − Q^E) − E(Q) + E(Q^E)‖²

= ‖Q − Q^E‖² − ‖(Q − E(Q)) − (Q^E − E(Q^E))‖²

= ‖Q − Q^E‖² − ‖Q − E(Q)‖² − ‖Q^E − E(Q^E)‖² + 2〈Q − E(Q), Q^E − E(Q^E)〉

Proof of Theorem:

We begin with the inequality of residual errors, ‖Q − Q^E‖² < ‖Q − Q^F‖², which comes directly from the models. Recall that if the Elastic FPCA model differs from FPCA, it is because a set of warping functions has been found that minimizes the energy more than the FPCA model does.

From Lemma 1 and Lemma 2, the inequality of residual errors is equivalent to:

〈Q − E(Q), Q^E − E(Q^E)〉 > ‖Q^F − E(Q)‖².

Remark: Combining the inequality of residual errors with the above inequality (and noting that E(Q^F) = E(Q), so ‖Q^F − E(Q)‖² = ‖Q^F − E(Q^F)‖²) gives us the following inequality:

‖Q − Q^E‖² / 〈Q − E(Q), Q^E − E(Q^E)〉 < ‖Q − Q^F‖² / ‖Q^F − E(Q^F)‖²

The next few steps use some simple algebraic manipulation on the above inequality:

‖Q − Q^E‖² / (2〈Q − E(Q), Q^E − E(Q^E)〉) + 1 < ‖Q − Q^F‖² / (2‖Q^F − E(Q^F)‖²) + 1

[‖Q − Q^E‖² + 2〈Q − E(Q), Q^E − E(Q^E)〉] / (2〈Q − E(Q), Q^E − E(Q^E)〉) < [‖Q − Q^F‖² + 2‖Q^F − E(Q)‖²] / (2‖Q^F − E(Q^F)‖²)

2〈Q − E(Q), Q^E − E(Q^E)〉 / [‖Q − Q^E‖² + 2〈Q − E(Q), Q^E − E(Q^E)〉] > 2‖Q^F − E(Q^F)‖² / [‖Q − Q^F‖² + 2‖Q^F − E(Q)‖²]

Lemma 3 shows the right side of the above inequality is equivalent to ρ^F, and Lemma 4 shows the left side is equivalent to ρ^E. We can therefore conclude that if ‖Q − Q^E‖² ≠ ‖Q − Q^F‖², then ρ^E > ρ^F.


APPENDIX B

IRB APPROVAL


BIBLIOGRAPHY

[1] B. Aslan and G. Zech. New test for the multivariate two-sample problem based on the conceptof minimum energy. Journal of Statistical Computation and Simulation, 75(2):109–119, 2005.

[2] Huiman X. Barnhart, Yuliya Lokhnygina, Andrzej S. Kosinski, and Michael Haber. Compari-son of concordance correlation coefficient and coefficient of individual agreement in assessingagreement. Journal of Biopharmaceutical Statistics, 17(4):721–738, Feb 2007.

[3] Philippe Besse and J. O. Ramsay. Principal components analysis of sampled functions. Psy-chometrika, 51(2):285–311, 1986.

[4] Claus Borggaard and Hans Henrik Thodberg. Optimal minimal neural interpretation of spectra. Analytical Chemistry, 64(5):545–551, 1992.

[5] Ronald A. Fisher. On the "probable error" of a coefficient of correlation deduced from a small sample. Metron, 1:3–32, 1921.

[6] Jerome H. Friedman and Lawrence C. Rafsky. Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests. The Annals of Statistics, 7(4):697–717, 1979.

[7] C. Hagwood, J. Bernal, M. Halter, J. Elliott, and T. Brennan. Testing Equality of CellPopulations Based on Shape and Geodesic Distance. IEEE Trans Med Imaging, 32(12):2230–2237, Dec 2013.

[8] Peter Hall and Mohammad Hosseini-Nasab. On properties of functional principal componentsanalysis. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 68(1):109–126, 2006.

[9] Harold E. Jones and Nancy Bayley. The berkeley growth study. Child Development, 12(2):167–173, 1941.

[10] K. Karhunen. Ueber lineare Methoden in der Wahrscheinlichkeitsrechnung. Annales Academiaescientiarum Fennicae. Series A. 1, Mathematica-physica. 1947.

[11] Tonya S. King and Vernon M. Chinchilli. A generalized concordance correlation coefficient forcontinuous and categorical data. Statistics in Medicine, 20(14):2131–2147, 2001.

[12] Alois Kneip and James O Ramsay. Combining registration and fitting for functional models.Journal of the American Statistical Association, 103(483), 2008.

[13] Runze Li and Mosuk Chow. Evaluation of reproducibility for paired functional data. Journalof Multivariate Analysis, 93(1):81–101, 2005.

[14] Lawrence I-Kuei Lin. A concordance correlation coefficient to evaluate reproducibility. Bio-metrics, 45(1):255, 1989.


[15] Jeffrey S. Morris. Functional regression. Annual Review of Statistics and Its Application,2(1):321–359, 2015.

[16] Karl Pearson. LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):559–572, 1901.

[17] J. O. Ramsay and B. W. Silverman. Applied functional data analysis: methods and case studies. Springer, 2002.

[18] Mark F. Schilling. Multivariate two-sample tests based on nearest neighbors. Journal of theAmerican Statistical Association, 81(395):799–806, 1986.

[19] A. Srivastava, E. Klassen, S. H. Joshi, and I. H. Jermyn. Shape analysis of elastic curves in eu-clidean spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(7):1415–1428, July 2011.

[20] Anuj Srivastava and Eric P. Klassen. Functional and shape data analysis. Springer, 2016.

[21] James H. Steiger. Tests for comparing elements of a correlation matrix. Psychological Bulletin,87(2):245–251, 1980.

[22] J. Derek Tucker. Functional Component Analysis and Regression Using Elastic Methods. PhDthesis, Florida State University Libraries, 2014.

[23] J. Derek Tucker, Wei Wu, and Anuj Srivastava. Generative models for functional data usingphase and amplitude separation. Comput. Stat. Data Anal., 61:50–66, May 2013.


BIOGRAPHICAL SKETCH

Megan Duncan received her B.A. in Mathematics with a minor in Latin from the University of

Maine in Fall of 2011. After some graduate work in Mathematics at the University of Maine, she

moved to Florida to pursue her Ph.D. In the Fall of 2013, she began the Ph.D. program in the

Department of Statistics at Florida State University under the advisement of Dr. Anuj Srivastava.

Megan is married to Adam Duncan and they have four cats.
