
Post on 14-Jul-2020


Introduction Method Simulation study Discussion Bibliography Miscellaneous

Functional data analysis for activity profiles from wearable devices

Ian McKeague

Joint work with Hsin-wen Chang, Institute of Statistical Science, Academia Sinica

September 16, 2019

Outline

• Motivation: inference for sensor data from wearable devices

• Activity profiles based on sensor data (no pre-alignment)

• Empirical likelihood based confidence bands and functional ANOVA testing for mean activity profiles

• Monotonic functional data: no need for smoothing

• Application: accelerometer data from NHANES

Wearable device data

• Inexpensive wearable sensors generate massive amounts of real-time data, with potentially exciting applications to physiological monitoring and health care delivery (mHealth).

• Inferential methods for comparing treatments based on wearable device (outcome) data are not well developed.

• Serious challenges: unmeasured time-dependent confounders (e.g., circadian and dietary patterns), highly non-stationary data, difficulty aligning across subjects, missing data, . . .

• Connection to precision medicine: reinforcement learning for mHealth (Murphy et al., 2017): http://papers.nips.cc/paper/7179-action-centered-contextual-bandits.pdf

Tradeoff between exploration and exploitation. Assumes stationarity.

Example: blood pressure monitoring

Example: sweat monitoring (really!)

Noninvasive alternative to blood glucose monitoring (Nyein et al. 2019)

Example: real-time sweat measurements

Patches worn on the forehead, forearm, underarm, and back, and sweat parameters monitored simultaneously.

Example: physical activity monitoring

• National Health and Nutrition Examination Survey (NHANES)

• Accelerometer ‘counts’ recorded for 7 consecutive days in 1-minute epochs using an ActiGraph device

• Goal: to compare groups of subjects using their activity profiles

• activity profile: the amount of time activity exceeds some level

• Typically in the physical activity literature, activity is classified using selected thresholds (e.g., as “sedentary”)

Example: gene therapy for mitochondrial disease

5,000 children are born with mitochondrial disease each year in the US.

Columbia RCT: 40 patients. Accelerometer: activPAL.

ActiGraph accelerometer readings (NHANES)

[Figure: two panels of intensity counts (vertical axis) against time in 1-minute units (horizontal axis, 0 to 10000).]

Activity profiles as monotonic functional data

Sensor readings $X(t)$, $t \in [0, \tau]$, generate an activity profile:

$$T_a = \mathrm{Leb}(\{t \in [0, \tau] : X(t) > a\}), \quad a \in \mathbb{R},$$

where Leb denotes Lebesgue measure, i.e., the total time spent above level $a$.

[Figure: sensor readings $X(t)$ over 25 minutes, with activity $T_a = 9$ minutes above level $a = 0.1$ and $T_a = 16$ minutes above level $a = -0.1$.]

Note: we need to avoid pre-alignment of sensor data among subjects (needed in standard FDA approaches).
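As a minimal sketch of the definition above, the activity profile can be approximated from equally spaced readings by counting epochs above each level. The function name `activity_profile`, the toy counts, and the levels are illustrative assumptions, not part of the talk.

```python
def activity_profile(readings, levels, dt=1.0):
    """Approximate T_a = Leb({t : X(t) > a}) for each level a.

    readings: sensor values X(t) on an equally spaced time grid (assumed)
    levels:   activity levels a at which the profile is evaluated
    dt:       time represented by one reading (e.g. 1 minute per epoch)
    """
    return [dt * sum(1 for x in readings if x > a) for a in levels]


# Hypothetical example: ten 1-minute epochs of accelerometer counts.
counts = [0, 5, 120, 300, 80, 0, 0, 950, 40, 10]
levels = [0, 50, 100, 500]
profile = activity_profile(counts, levels)
print(profile)  # [7.0, 4.0, 3.0, 1.0]

# The profile depends only on how long each level is exceeded, not on when,
# so reordering the readings in time leaves it unchanged.
shuffled_profile = activity_profile(sorted(counts), levels)
```

The last two lines illustrate why no alignment across subjects is needed: the profile is invariant to any rearrangement of the time axis, and it is non-increasing in the level.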

Raw and mean activity profiles from the NHANES data

Veterans aged 75-and-older (black, $n_1 = 160$), non-veterans aged 75-and-older (red, $n_2 = 279$), veterans aged 65–74 (blue, $n_3 = 139$), and non-veterans aged 65–74 (green, $n_4 = 348$).

Functional ANOVA for activity profiles

Goal: Compare $k$ mean activity profiles $\mu_j(a) = E\,T_{aj}$, $j = 1, \ldots, k$.

Functional ANOVA: tests $\mu_1(\cdot) = \cdots = \mu_k(\cdot)$ vs. an omnibus alternative

• $T_{aj} = \mathrm{Leb}(\{t \in [0, \tau] : X_j(t) > a\})$, where $X_j = \{X_j(t),\ t \in [0, \tau]\}$ denotes the sensor readings in the $j$th group and $\tau$ is the total study time

• Observe $n_j$ iid copies $\{T_{aj1}, \ldots, T_{ajn_j},\ a \in [\alpha_1, \alpha_2]\}$ of the activity profile $T_{aj}$. This is weaker than iid observations of $X_j$. $[\alpha_1, \alpha_2]$ is the range of device readings of interest

• Approach based on a nonparametric likelihood ratio procedure: empirical likelihood (EL)

Empirical likelihood (EL)

• EL involves forming a ratio of two nonparametric likelihoods subject to constraints on the parameters of interest

• Two early papers: [Thomas and Grunkemeier, 1975], [Owen, 1988]

• Produces highly accurate confidence regions [Owen, 2001] and tests with optimal power [Kitamura et al., 2012]

Empirical likelihood

Observe $X_1, \ldots, X_n \sim$ iid $F$, with $\mu = \mu(F)$ a parameter of interest.

NP likelihood ratio:

$$R(\mu_0) = \frac{\sup\{L(F) : \mu(F) = \mu_0\}}{\sup\{L(F)\}}$$

$L(F) = \prod_{i=1}^{n} p_i$ is the NP likelihood, $p_i$ = point mass (of $F$) at $X_i$.

Hypothesis tests: accept $\mu(F) = \mu_0$ when $R(\mu_0) \ge r_0$ for some threshold $r_0$.

Confidence regions: $\{\mu : R(\mu) \ge r_0\}$

EL for means

$\mu = E(X)$

$$R(\mu) = \max\left\{ \prod_{i=1}^{n} n p_i \;:\; \sum_{i=1}^{n} p_i X_i = \mu,\ p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1 \right\}$$

Chi-squared calibration: Wilks-type theorem for $-2 \log R(\mu_0)$.
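The EL ratio for a mean can be computed with the standard Lagrange-multiplier form of the solution, p_i = 1/(n(1 + λ(X_i − µ))), where λ solves Σ (X_i − µ)/(1 + λ(X_i − µ)) = 0. The sketch below is one way to do this with plain bisection; the function name `el_mean_stat`, the tolerance choices, and the toy data are assumptions made for illustration, and degenerate (constant) samples are not handled.

```python
import math


def el_mean_stat(x, mu, iters=100):
    """Return -2 log R(mu) for the mean of the sample x."""
    z = [xi - mu for xi in x]
    zmin, zmax = min(z), max(z)
    if zmin >= 0 or zmax <= 0:
        return math.inf  # mu not strictly inside the data range: R(mu) = 0
    # lam must keep every 1 + lam * z_i positive; shrink slightly to stay inside.
    lo = (-1.0 / zmax) * (1 - 1e-10)
    hi = (-1.0 / zmin) * (1 - 1e-10)
    for _ in range(iters):  # bisection: the score below is decreasing in lam
        mid = 0.5 * (lo + hi)
        score = sum(zi / (1.0 + mid * zi) for zi in z)
        if score > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    # -2 log prod(n p_i) = 2 sum log(1 + lam z_i)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)


# Hypothetical data; the statistic is zero at the sample mean.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
stat_at_mean = el_mean_stat(x, 3.0)
stat_left = el_mean_stat(x, 2.5)
stat_right = el_mean_stat(x, 3.5)
stat_outside = el_mean_stat(x, 6.0)
```

Under the chi-squared calibration, a value µ0 would be retained in an approximate 95% confidence interval when `el_mean_stat(x, mu0)` is at most 3.841, the 0.95 quantile of the chi-squared distribution with one degree of freedom.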

EL for quantiles

Estimating equation:

$E(m(X, \mu)) = 0$, where for the $\alpha$-quantile

$$m(X, \mu) = 1\{X \le \mu\} - \alpha.$$

$$R(\mu) = \max\left\{ \prod_{i=1}^{n} n p_i \;:\; \sum_{i=1}^{n} p_i\, m(X_i, \mu) = 0,\ p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1 \right\}$$

Chi-squared calibration:

Wilks theorem still applies: replace $X_i - \mu_0$ by $m(X_i, \mu_0)$.
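For the quantile estimating function, the maximization has a closed form: because m(X_i, µ) takes only two values, the optimal weights are constant on {X_i ≤ µ} and on its complement, with total mass α and 1 − α respectively. Writing F_n(µ) for the empirical distribution function, this gives −2 log R(µ) = 2n[F_n log(F_n/α) + (1 − F_n) log((1 − F_n)/(1 − α))]. The sketch below evaluates that expression; the function name and toy data are illustrative assumptions.

```python
import math


def el_quantile_stat(x, mu, alpha):
    """Return -2 log R(mu) for the alpha-quantile, using the closed form."""
    n = len(x)
    f = sum(1 for xi in x if xi <= mu) / n  # empirical CDF at mu
    if f == 0.0 or f == 1.0:
        return math.inf  # the constraint cannot be met, so R(mu) = 0
    return 2.0 * n * (f * math.log(f / alpha)
                      + (1.0 - f) * math.log((1.0 - f) / (1.0 - alpha)))


# Hypothetical data: the integers 1..10, testing candidate medians.
x = list(range(1, 11))
stat_median = el_quantile_stat(x, 5, 0.5)   # empirical CDF equals alpha here
stat_low = el_quantile_stat(x, 3, 0.5)
stat_outside = el_quantile_stat(x, 0, 0.5)
```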

Activity profiles: relevant references

• Functional data literature
  • Wald-type ANOVA tests requiring curve registration [Gorecki and Smaga, 2018]
  • EL-based tests in a concurrent linear model [Wang et al., 2018], requiring curve registration and smoothing
  • Curve registration/alignment only useful on raw sensor data
  • Time warping alters the activity profiles!

• Physical activity literature
  • Only considers activity profiles at a few activity levels
    • e.g., the time spent in sedentary behavior could be represented by the accumulated amount of time below 100 counts/minute [Matthews et al., 2008]
  • The levels are typically chosen in an ad hoc fashion

Our contribution

EL-based functional ANOVA test for comparing groups of subjects based on their activity profile data, i.e., an omnibus test of

$$H_0 : \mu_1(\cdot) = \cdots = \mu_k(\cdot)$$

• greater efficiency using EL

• avoids issues in pre-aligning sensor data

• no smoothing needed (as activity profiles are monotonic)

• analyze entire activity profiles

• EL-based simultaneous confidence bands

• approach also applies to the quantiles of activity profiles

Approach applies to functions of bounded variation

Example: Area covered by Arctic sea ice (Nature, Sept 2019)

Example: Canadian temperature data (Ramsay & Silverman)

Canadian temperature data

Average daily temperature at 35 Canadian weather stations.

EL-based ANOVA test for activity profiles

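• For an activity level $a$, construct the local EL ratio as

$$R(a) = \frac{\sup\left\{ \prod_{j=1}^{k} L(F_{aj}) \;:\; \mu_1(a) = \cdots = \mu_k(a) \right\}}{\sup\left\{ \prod_{j=1}^{k} L(F_{aj}) \right\}},$$

where $L(F_{aj})$ is the NP likelihood based on observation of $T_{aj}$

• To test $H_0$ we propose the maximally selected EL statistic:

$$K_n = \sup_{a \in [\alpha_1, \alpha_2]} \left[ -2 \log R(a) \right].$$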
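One way to evaluate the statistic numerically, sketched here under stated assumptions: at a fixed level a, the constraint that all group means share a common value µ means −2 log R(a) equals the minimum over µ of the sum of the one-sample EL statistics for the mean, and the supremum over a is approximated by a maximum over a finite grid of levels. The helper names, the golden-section search (which relies on the summed statistic being convex in µ), and the toy profiles are all illustrative choices rather than the authors' implementation.

```python
import math


def el_mean_stat(x, mu, iters=100):
    """-2 log R(mu) for the mean of one sample (Lagrange form, bisection)."""
    z = [xi - mu for xi in x]
    zmin, zmax = min(z), max(z)
    if zmin >= 0 or zmax <= 0:
        return math.inf
    lo = (-1.0 / zmax) * (1 - 1e-10)
    hi = (-1.0 / zmin) * (1 - 1e-10)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(zi / (1.0 + mid * zi) for zi in z) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)


def local_el_anova_stat(groups, tol=1e-8):
    """-2 log R(a) at one level: minimise the summed EL statistics over mu."""
    lo = max(min(g) for g in groups)
    hi = min(max(g) for g in groups)
    if lo >= hi:
        return math.inf  # no common mean lies inside every group's range

    def total(mu):
        return sum(el_mean_stat(g, mu) for g in groups)

    r = (math.sqrt(5.0) - 1.0) / 2.0  # golden-section search
    c, d = hi - r * (hi - lo), lo + r * (hi - lo)
    fc, fd = total(c), total(d)
    while hi - lo > tol:
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - r * (hi - lo)
            fc = total(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + r * (hi - lo)
            fd = total(d)
    return min(fc, fd)


def max_el_stat(profiles_by_group):
    """K_n on a grid: profiles_by_group[j][i][m] is T_a at level a_m."""
    n_levels = len(profiles_by_group[0][0])
    return max(
        local_el_anova_stat([[subj[m] for subj in grp] for grp in profiles_by_group])
        for m in range(n_levels)
    )


# Hypothetical profiles: 2 groups, 3 subjects each, 3 activity levels.
groups = [
    [[10, 6, 2], [9, 5, 1], [8, 7, 3]],
    [[10, 4, 1], [7, 3, 2], [9, 6, 0]],
]
k_n = max_el_stat(groups)
```

A level at which some group has zero variance (for instance, every subject exceeding a for the whole study) is outside the scope of this sketch, in line with the positive-variance condition assumed for the limit theory.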

Wilks type theorem for the EL-based ANOVA test

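Suppose $n_j/n \to \gamma_j > 0$ and $\inf_{a \in [\alpha_1, \alpha_2]} \mathrm{Var}(T_{aj}) > 0$, for each $j$.

Then, under $H_0$,

$$K_n \xrightarrow{d} \sup_{a \in [\alpha_1, \alpha_2]} \sum_{j=1}^{k} w_j(a) \left[ \frac{U_j(a)}{\sqrt{w_j(a)}} - U(a) \right]^2, \qquad U(a) = \sum_{j=1}^{k} \sqrt{w_j(a)}\, U_j(a),$$

where the $U_j$ are independent zero-mean Gaussian processes, and the weights $w_j(a) \propto \gamma_j / \mathrm{Var}(T_{aj})$ are normalized to sum to 1 across the groups.

Proof: A bracketing-entropy CLT for stochastic processes with monotone sample paths furnishes $U_j$ as the limit of the process $\sqrt{n_j}\{\hat\mu_j(\cdot) - \mu_j(\cdot)\}/\sigma_j(\cdot)$.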

Nonparametric bootstrap calibration

• The limiting distribution can be bootstrapped by replacing $U_j(a)$ by its nonparametric bootstrap

$$U_j^*(a) = \sqrt{n_j}\{\hat\mu_j^*(a) - \hat\mu_j(a)\}/\sigma_j(a)$$

and replacing other unknowns by their estimates.

• $\hat\mu_j^*(a)$ is obtained by evaluating $\hat\mu_j(a)$ after resampling with replacement from $\{T_{aj1}, \ldots, T_{ajn_j}\}$, with each $T_{aji}$ regarded as a function of $a$

• Let $M_n^*$ denote the resulting bootstrap statistic

• Simulate $M_n^*$ by repeatedly resampling

• Compare the empirical quantiles of these bootstrapped values $M_n^*$ with our test statistic $K_n$
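The bootstrap steps above can be sketched as follows, on a finite grid of activity levels. This is an illustration under assumptions, not the authors' code: the function names, the use of the sample standard deviation for σ_j(a) and of n_j/n for γ_j, and the randomly generated toy profiles are all hypothetical choices. Whole curves are resampled within each group, so each subject's profile stays intact across levels.

```python
import math
import random


def mean_sd(vals):
    """Sample mean and standard deviation (n - 1 denominator)."""
    n = len(vals)
    m = sum(vals) / n
    return m, math.sqrt(sum((v - m) ** 2 for v in vals) / (n - 1))


def bootstrap_limit_draws(groups, n_boot=500, seed=0):
    """Draws of M*_n; groups[j][i][m] is subject i's profile at level a_m."""
    rng = random.Random(seed)
    n_levels = len(groups[0][0])
    n_total = sum(len(g) for g in groups)
    # est[j][m] = (mean, sd) of group j at level a_m
    est = [[mean_sd([s[m] for s in g]) for m in range(n_levels)] for g in groups]
    weights = []  # w_j(a) proportional to gamma_j / Var(T_aj), summing to 1
    for m in range(n_levels):
        raw = [(len(g) / n_total) / est[j][m][1] ** 2 for j, g in enumerate(groups)]
        weights.append([r / sum(raw) for r in raw])
    draws = []
    for _ in range(n_boot):
        resampled = [[rng.choice(g) for _ in g] for g in groups]  # whole curves
        best = 0.0
        for m in range(n_levels):
            u = []
            for j, g in enumerate(resampled):
                mu_star = sum(s[m] for s in g) / len(g)
                mu_hat, sd_hat = est[j][m]
                u.append(math.sqrt(len(g)) * (mu_star - mu_hat) / sd_hat)
            w = weights[m]
            u_all = sum(math.sqrt(wj) * uj for wj, uj in zip(w, u))
            val = sum(wj * (uj / math.sqrt(wj) - u_all) ** 2 for wj, uj in zip(w, u))
            best = max(best, val)  # supremum over the grid of levels
        draws.append(best)
    return draws


def bootstrap_p_value(k_n, draws):
    """Proportion of bootstrap draws at least as large as the observed K_n."""
    return sum(d >= k_n for d in draws) / len(draws)


# Hypothetical data: 3 groups of non-increasing profiles on 3 levels.
data_rng = random.Random(1)
groups = [
    [[data_rng.uniform(5, 10), data_rng.uniform(2, 5), data_rng.uniform(0, 2)]
     for _ in range(n)]
    for n in (8, 10, 12)
]
draws = bootstrap_limit_draws(groups, n_boot=200)
```

Given an observed value of K_n, `bootstrap_p_value(k_n, draws)` then gives an approximate p-value; rejecting when it falls below 0.05 corresponds to comparing K_n with the empirical 0.95 quantile of the draws.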

Simulation study

• Compare our approach with tests from R package fdANOVA that apply to generic functional data:
  • Fmaxb: a maximally-selected F-statistic
  • GPF: an integrated F-statistic
  • TRP: random projections

• Striking differences in performance if groups are unbalanced

• Simulation model:
  • Generate $X_j(\cdot)$ as the positive part of a scaled Ornstein–Uhlenbeck (OU) process; multiply the resulting $T_{aj}$ by an independent beta r.v.
  • $k = 3$, each group/scenario with distinct OU/beta parameters.
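A rough sketch of how one subject's profile could be generated under a model of this kind is given below. The slides do not state the OU or beta parameters, so every default value here (`theta`, `sigma`, `scale`, `beta_a`, `beta_b`, `tau`, `n_steps`) is a placeholder assumption, as is the function name; the OU path uses the exact Gaussian transition on a fine time grid.

```python
import math
import random


def simulate_profile(levels, rng, theta=1.0, sigma=1.0, scale=1.0,
                     beta_a=2.0, beta_b=2.0, tau=1.0, n_steps=500):
    """Simulate one activity profile; all parameter values are placeholders."""
    dt = tau / n_steps
    decay = math.exp(-theta * dt)
    # Exact one-step standard deviation of dX = -theta X dt + sigma dW
    step_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * theta))
    x = rng.gauss(0.0, sigma / math.sqrt(2.0 * theta))  # stationary start
    path = []
    for _ in range(n_steps):
        path.append(scale * max(x, 0.0))  # positive part of the scaled process
        x = decay * x + step_sd * rng.gauss(0.0, 1.0)
    b = rng.betavariate(beta_a, beta_b)  # independent beta multiplier
    # T_a approximated by (number of grid points above a) * dt, then scaled by b
    return [b * dt * sum(1 for v in path if v > a) for a in levels]


rng = random.Random(0)
levels = [0.0, 0.25, 0.5, 1.0]
profile = simulate_profile(levels, rng)
```

Repeating the call with group-specific parameters would give samples for each of the k groups, which can then be passed to a test procedure.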

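[Figure: three columns of panels, one per scenario A, B, and C.]

Top row: $\mu_j(a) = E\,T_{aj}$. Bottom row: $\sigma_j^2(a) = \mathrm{Var}(T_{aj})$. Scenario A: identical $\mu_j(a)$. Scenario B: crossing $\mu_j(a)$. Scenario C: ordered $\mu_j(a)$.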

Table: Empirical rejection rates (percentages) for functional ANOVA tests under various scenarios and sample sizes, based on 1000 Monte Carlo replications, 1000 bootstrap samples, and a nominal level of 5%.

scenario  (n1, n2, n3)     EL test  Fmaxb  GPF   TRP
A         (70, 100, 130)   5.4      3.8    3.6   2.3
A         (130, 100, 70)   5.8      8.1    9.5   4.0
B         (70, 100, 130)   76.3     50.2   30.4  67.1
B         (130, 100, 70)   75.6     70.0   60.0  61.7
C         (70, 100, 130)   81.3     60.9   55.2  61.6
C         (130, 100, 70)   77.6     74.5   69.5  57.4

NHANES data revisited

Sample means of raw accelerometer readings (in 4 consecutive days) comparing veterans aged 75-and-older (aqua) and veterans aged 65–74 (coral). Differences are apparent even without curve alignment/smoothing.

Confidence bands of mean activity profiles

Right: EL (black), Wald-type (red), and MFD (blue) 95% simultaneous confidence bands for the mean activity profile (estimate in gray) of veterans aged 75-and-older, showing that the EL band is narrower than the Wald-type band and similar to the MFD band at most activity levels.

MFD (mean of functional data) band: uses local linear smoothing with cross-validated bandwidth selection [Degras, 2011, Degras, 2017].

Applying the various tests

Table: p-values from various functional ANOVA tests: veterans aged 75-and-older (group 1), non-veterans aged 75-and-older (group 2), veterans aged 65–74 (group 3), and non-veterans aged 65–74 (group 4).

Comparison     EL test   GPF       Fmaxb     TRP
all groups     < 0.001   < 0.001   < 0.001   < 0.001
group 1 vs 2   0.010     0.060     0.016     0.033
group 3 vs 4   0.345     0.416     0.365     0.579
group 1 vs 3   < 0.001   < 0.001   < 0.001   < 0.001
group 2 vs 4   < 0.001   < 0.001   < 0.001   < 0.001

Conclusion

• We have developed a new functional ANOVA test based on a maximally-selected local empirical likelihood statistic

• Approach applies generally to functional data with sample paths of bounded variation. Smoothing avoided.

• Simulation study shows that the new test is more accurate (closer to the nominal level) and more powerful than standard FDA approaches

• We applied the proposed method to wearable device data from NHANES and obtained smaller p-values than existing functional ANOVA tests (e.g., 0.010 vs. 0.016 to 0.060 for group 1 vs 2)

• Directions for future work: gaps in sensor observations, activity profiles regressed on high-dimensional predictors . . .

Thank you!

Degras, D. A. (2011). Simultaneous confidence bands for nonparametric regression with functional data. Statistica Sinica, 21(4):1735–1765.

Degras, D. A. (2017). Simultaneous confidence bands for the mean of functional data. Wiley Interdisciplinary Reviews: Computational Statistics, 9(3):e1397.

Gorecki, T. and Smaga, L. (2018). fdANOVA: an R software package for analysis of variance for univariate and multivariate functional data. Computational Statistics. https://doi.org/10.1007/s00180-018-0842-7.

Kitamura, Y., Santos, A., and Shaikh, A. M. (2012). On the asymptotic optimality of empirical likelihood for testing moment restrictions. Econometrica, 80(1):413–423.

Matthews, C. E., Chen, K. Y., Freedson, P. S., Buchowski, M. S., Beech, B. M., Pate, R. R., and Troiano, R. P. (2008). Amount of time spent in sedentary behaviors in the United States, 2003–2004. American Journal of Epidemiology, 167(7):875–881.

Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2):237–249.

Owen, A. B. (2001). Empirical Likelihood. Chapman & Hall/CRC, Boca Raton.

Thomas, D. R. and Grunkemeier, G. L. (1975). Confidence interval estimation of survival probabilities for censored data. Journal of the American Statistical Association, 70:865–871.

Wang, H., Zhong, P.-S., Cui, Y., and Li, Y. (2018). Unified empirical likelihood ratio tests for functional concurrent linear models and the phase transition from sparse to dense functional data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(2):343–364.

Specifying [α1, α2]

• In practice α1 and α2 may be specified by practitioners based on a range of accelerometer readings available in the particular context

• They could also be chosen in a data-driven fashion, say α1 = inf{a : µ(a) < 0.95τ} and α2 = sup{a : µ(a) > 0.05τ}; this is what we use in our simulation studies and data analysis
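On a finite grid of levels, the data-driven rule above reduces to picking the smallest level at which the mean profile drops below 0.95τ and the largest level at which it still exceeds 0.05τ. The sketch below assumes the mean profile has already been estimated on the grid; the function name and the example numbers are hypothetical.

```python
def data_driven_range(levels, mean_profile, tau):
    """Grid version of alpha1 = inf{a : mu(a) < 0.95 tau},
    alpha2 = sup{a : mu(a) > 0.05 tau}."""
    below = [a for a, m in zip(levels, mean_profile) if m < 0.95 * tau]
    above = [a for a, m in zip(levels, mean_profile) if m > 0.05 * tau]
    if not below or not above:
        raise ValueError("the grid does not cover the requested range")
    return min(below), max(above)


# Hypothetical mean profile (minutes above each level), with tau = 10 minutes.
levels = [0, 100, 200, 300, 400]
mean_profile = [10.0, 9.0, 5.0, 1.0, 0.2]
alpha1, alpha2 = data_driven_range(levels, mean_profile, tau=10.0)
print(alpha1, alpha2)  # 100 300
```

Trimming the range this way excludes levels that nearly every reading exceeds, or nearly none does, where the profile variance is close to zero.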
