hierarchical linear models for energy prediction using...

Noname manuscript No.(will be inserted by the editor)

Hierarchical Linear Models for Energy Prediction usingInertial Sensors: A Comparative Study for TreadmillWalking

Harshvardhan Vathsangam · B. Adar Emken · E. Todd Schroeder ·Donna Spruijt-Metz · Gaurav S. Sukhatme

the date of receipt and acceptance should be inserted later

Abstract Walking is a commonly available activity tomaintain a healthy lifestyle. Accurately tracking andmeasuring calories expended during walking can im-prove user feedback and intervention measures. Inertialsensors are a promising measurement tool to achievethis purpose. An important aspect in mapping inertialsensor data to energy expenditure is the question ofnormalizing across physiological parameters. Commonapproaches such as weight scaling require validation foreach new population. An alternative is to use a hier-archical approach to model subject-speci�c parametersat one level and cross-subject parameters connected byphysiological variables at a higher level. In this paper,we evaluate an inertial sensor-based hierarchical modelto measure energy expenditure across a target popu-lation. We �rst determine the optimal movement andphysiological features set to represent data. Periodic-ity based features are more accurate (p<0.1 per sub-ject) when generalizing across populations. Weight isthe most accurate parameter (p<0.1 per subject) mea-sured as percentage prediction error. We also comparethe hierarchical model with a subject-speci�c regressionmodel and weight exponent scaled models. Subject-speci�cmodels perform signi�cantly better (p<0.1 per subject)than weight exponent scaled models at all exponentscales whereas the hierarchical model performed worse

Harshvardhan Vathsangam · Gaurav S. SukhatmeDept. of Computer Science, University of Southern California,Los Angeles, CA - 90007.E-mail: vathsang|[email protected]

B. Adar Emken · Donna Spruijt-MetzDept. of Preventive Medicine, University of Southern California,Los Angeles, CA - 90007.E-mail: emken|[email protected]

E. Todd SchroederDivision of Kinesiology, University of Southern California, LosAngeles, CA - 90007.E-mail: [email protected]

than both. However, using an informed prior from thehierarchical model produces similar errors to using asubject-speci�c model with large amounts of trainingdata (p<0.1 per subject). The results provide evidencethat hierarchical modeling is a promising technique forgeneralized prediction energy expenditure prediction acrossa target population in a clinical setting.

Keywords Accelerometer · Bayesian Linear regres-sion · Gyroscope · Hierarchical Linear Model

1 Introduction

Regular physical activity plays an important role inweight control, reducing risk of cardiovascular disease,type 2 diabetes, some cancers besides improving men-tal health and bone strength as described by Warburtonet al. [2006]. Dunn et al. [1999] have shown that an eas-ily available practice for an active lifestyle is to walk reg-ularly . Characterizing energy expenditure from walk-ing would provide a valuable tool in the assessment ofactivity-based intervention measures.

Over the last decade, commodity-priced kinematicsensors such as accelerometers and gyroscopes have be-come promising tools for the quanti�cation of caloriesconsumed from human movement. Recent research byAminian and Naja� [2004], Troiano [2006], Rothneyet al. [2007] has focused on combining their small size,low cost, increasingly high precision with pattern recog-nition techiniques. These techniques rely on learning adata-driven regression map from movement features toa ground truth measure of calorie consumption. Learn-ing regression models from inertial sensors is a well-studied problem as shown by [Vathsangam et al., 2010],[Albinali et al., 2010] and Vathsangam et al. [2011a] .An important issue that arises when learning this mapis that the participants of any sample population will

exhibit natural variations in weight, height, leg-length,age, sex etc. A successful calorie estimation model mustbe able to predict calories from movement accuratelywhile accounting for these variations. A natural ques-tion to ask is whether it is possible to create a modelthat allows accurate user-speci�c modeling while cap-turing commonalities across users of di�ering physio-logical descriptions. If so, what combinations of inertialsensor and physiological features together provide thebest descriptors of this model?

In this study, we address the problem of normaliz-ing predictions of energy expenditure across a popula-tion when using inertial sensor data to predict caloriesburnt. We focus on a particular activity: steady-statetreadmill walking, because it allows the capture of arepeatable, well-de�ned and easily quanti�able move-ment. We use inertial data from a triaxial accelerometerand triaxial gyroscope mounted on the right iliac crestas inputs to describe walking movement and treat thefunctional mapping of these inputs to energy expendi-ture as a two-level hierarchical regression problem. Our�rst goal is to identify the best features to representmovement and physiological parameters as de�ned byhighest prediction accuracy. We then show results com-paring the accuracy of the generalized approach withconventional regression models. This study expands onprevious work involving hierarchical linear modeling byVathsangam et al. [2011b]. It di�ers from the originalwork in that an in-depth feature study is evaluated forthe Hierarchical Linear Model. We also compare ourapproach to current state of the art speed-based andaccelerometer count-based approaches.

2 Related Work

Current techniques for normalizing energy predictionacross a population adopt two methodologies. The �rstfamily of techniques create isolated user-speci�c mod-els from each individual's data. Such models are notlikely to be successful for unseen data points of an-other participant. The second set of techniques learna general model treating all users as one after normal-izing for users based on physical characteristics suchas weight or height. For example, a common techniqueto normalize across participants is to scale energy con-sumption values by a suitable weight exponent. Theparticipants in the population are then replaced by asingle pseudo-participant with weight-scaled energy val-ues. Most common scaling coe�cients include a rangefrom 0.6 − 1.0 by Zakeri et al. [2006], the most popu-lar being 0.67 by Neville et al. [1992], 0.75 by Rogerset al. [1995] and 1.0 by Waters and Mulroy [1999],Wyndham et al. [1971] andPearce et al. [1983]. An is-sue with weight scaling is determining the appropriatescaling coe�cient across a target population. Rogerset al. [1995] showed that scaling coe�cients vary across

Fig. 1: Illustration of the di�erence between traditional

approaches (left) and our approach (right). Rather than

treating all subjects as equivalent or developing a sep-

arate model per user, we adopt a hierarchical approach

to capture common behaviors while retaining individu-

alization.

age groups and stages of development in individuals .This means that one has to determine a di�erent scal-ing coe�cient for each new population. Such an ap-proach suppresses the role of individual variances inpredicting energy expenditure. Also, Waters and Mul-roy [1999] showed that in addition to weight, the ef-fect of other physiological descriptors such as sex, stridelength and heart rate also have to be incorporated andit is not clear how scaling based techniques would gen-eralize across these additional parameters.

An important observation to be made when mod-eling across a population is that the individual par-ticipants in the population share common kinematictraits. In the case of energy expenditure from steady-state treadmill walking, all participants share a com-mon property in that steady state walking is cyclicalin nature. This individual details of the map from thenature of walk to energy expenditure might vary foreach participant. However, similar participants mightexhibit similar maps. It might be possible to take ad-vantage of common traits across users to capture a gen-eral population model and use it to create better indi-vidual models. The challenge is to fuse both the individ-ual and population-based characteristics into a uni�edframework while maintaining the simplicity of standardregression techniques.

Linear regression can be extended to capture com-monalities across a population using a Hierarchical Lin-ear Model (HLM) such as that described by Gelmanand Hill [2007]. Given a target population, HLM basedtechniques use linear models at levels within and acrossparticipants. Figure 1 illustrates the principle behindHLMs. At one level we have participant speci�c mod-els relating inertial sensor features to energy expendi-ture. At a second level we capture the inter-dependenceof di�erent subject-speci�c models on physiological pa-rameters using a (second) regression model. The ad-vantages of such an approach are many. Using a sec-ond level to capture commonalities across subjects al-lows the separation of the dependence on physiologi-cal parameters from participant-speci�c inertial sensordata. Such an approach also allows �exibility in decid-ing the right combination of physiological parameters torepresent participants. Training this model allows jointmodeling of inter-participant information. Most impor-tantly, HLM allows one to generate informed participant-speci�c models using only higher level information. Thisis an advantage when limited or no data is available fora new participant. Thus we retain all the bene�ts ofsubject-speci�c monitoring using linear regression whilecapturing generalizability across populations. Hierar-chical Linear Modeling has been successfully used invarious biological systems for joint modeling across apopulation [Gelfand et al., 1990].

3 Estimating Energy across a Population

Our goal is to obtain a data-driven functional map frommovement features (derived from continuous inertialsensor data) to calories burned (as measured by averageV O2 consumption). This map must be accurate be ob-tainable using minimal data from the user. We set thisin a regression framework. Consider a D-dimensionalinput variable x ∈ RD of which there are speci�c datapoints {xn}Nn=1. The goal of regression is to predict thevalue of one or more continuous target variables y ofwhich there are corresponding observed values {yn}Nn=1

that are related to the input variables by a �best-�t�function f(xn). In this section, we describe a family oftechniques for creating individualized regression mapsand then extend these techniques normalize these mapsacross a populating using a Hierarchical Linear Model(HLM).

3.1 Representing Treadmill Walking

Steady state human walk is cyclic in nature [Changet al., 2004]. We capture this periodicity using a singleinertial sensor worn above the iliac crest on the righthip. Sensor data corresponds directly to the accelera-tions and rotational rates of the hip in the sensor's local

Fig. 2: Illustration of individualized energy prediction

model. Training the model is equivalent to �nding op-

timal w to maximize data likelihood. A distribution is

assumed over w : w ∼ N (w : 0, α−1I). α is a hyper-

parameter.

frame of reference. To take advantage of this periodic-ity, we considered six families of features as describedby Tapia et al. [Tapia, 2008]:

� Measures of body posture consisting of AC meanacross each axis and area under signal

� Measures of motion shape consisting of meanof absolute value, total signal vector magnitude, en-tropy, skewness and kurtosis

� Measures of motion variation consisting of vari-ance, coe�cient of variation over the absolute valueand signal range

� Measures of motion spectral content consist-ing of the normalized Fast Fourier Transform (1024point FFT) coe�cients

� Measures of motion energy consisting of totalenergy, energy in the activity band of 0.3 - 3.5 Hz,energy of low physical activity band of 0.0 - 0.7 Hzand energy in the vigorous physical activity band of0.71 - 10 Hz in the unnormalized 1024 - point FFTsof raw signals within each epoch

� Measures of motion periodicity consisting ofmean crossing rate and cepstral coe�cients

Any particular category or a combination of these fea-tures can be used to represent the input data pointsx = {xn}Nn=1.

3.2 Subject-speci�c energy estimation models withBayesian Linear Regression

One way to map input features to target values is touse a separate linear regression model for each partici-pant. We adopt a Bayesian approach through BayesianLinear Regression (BLR). The approach is similar toour earlier work in predicting energy expenditure [Vath-sangam et al., 2010]. Figure 2 represents a graphicalmodel based plate representation of the conventionalBLR technique.

In the context of energy prediction, for a given per-

son p, with input feature data set Xp = {xnp}Np

n=1 and

target energy values Yp = {ynp}Np

n=1:

Fig. 3: Illustration of generalized model for energy

expenditure prediction for P people and Np training

points per person . Each person has a subject-speci�c

weight wp that is in�uenced by physiological parame-

ters such as height, weight and age through k and in

turn in�uences energy prediction given an input point

xnp.

ynp = wTp xnp + ε, ε ∼ N

(0, σ−2p

)(1)

where ε is a noise parameter andwp = (w0p , . . . , wM−1p)T

are the model weights. Training the model is equivalentto learning the weights wp and the noise parameters σp.Using the properties of Gaussian distributions, we have

p(ynp |xnp ,wp, σp) = N(ynp ;w

Tp xnp , σ

−2p

)(2)

Bayesian Linear Regression (BLR) Bishop [2006a]adopts a Bayesian approach to linear regression by in-troducing a prior probability distribution over wp. Wechoose a Gaussian prior, p(wp) = N

(wp;0, α

−1I)over

the model parameters wp. The optimal prediction for anew data point is given by the predictive distribution:

p(y∗p|x∗p,Yp, α, σp) = N (mTNx∗p, σ

2N (x∗p)) (3)

and σ2Np

(x∗p) = σ−2p + x∗pTSNp

x∗p (4)

mNp = σ2pSNXT

p Yp (5)

and S−1Np= αI+ σ2

pXTp Xp (6)

Model parameters are estimated by �nding the best αand σp to maximize the evidence function, �nding thebest parameters w to maximize the likelihood givena �xed α and σp alternately until convergence. Thistechnique provides a subject-speci�c BLR model thatcan be trained and used for each participant separately.

3.3 Generalizing Energy Estimation Models withHierarchical Regression

3.3.1 Model description

We build on the individualized model to obtain an HLMincorporating physiological features. Consider a test pop-ulation consisting of P participants. For each partici-pant p, let there be Np data points collected, consisting

of the energy values Yp = {y1p , y2p , . . . , yNp} and D-dimensional feature inputs Xp = {x1p , x2p , . . . , xNp

} asdetailed in Sec. 4.2. We denote Y = {Y1,Y2, . . . ,YP }and X = {X1,X2, . . . ,XP } to be complete set of train-ing data for all participants. Let each participant havephysiological features determined by Physp and the

complete set be PHY = {Physp}Pp=1These includeparameters such as height, weight and age (with a con-stant term for bias). We model top-down dependence ofwp on each participant's physiological features Physpthrough a universal weight parameter k. Each wp inturn in�uences energy predictions ynp

for an input xnp.

Figure 3 illustrates the plate representation of the hier-archical model. Similar to Sec. 3.2, each output energyvalue, ynp is linearly dependent on input xnp . This canbe expressed as:

Yp ∼ N (Yp;wTp Xp, σ

−2p I) (7)

or ynp∼ N (ynp

;wTp xnp

, σ−2p ) (8)

Our model di�ers from the BLR case in that, we assumethat the prior distribution overwp is not uninformativebut has a linear dependence on k and Physp:

wp ∼ N (wp;kTPhysp, α

−1I) (9)

Both wp and k are hidden variables which need to beestimated from data. Variable k is also not a point es-timate but has a prior distribution k ∼ N (k;0, σ−2I).Each wp is now an informative prior dependent on theperson's physiological features Physp through k. We

denote W = {wp}Pp=1

Training the multilevel model is equivalent to learn-ing individual wp's, the overall parameter k as well asthe noise parameters {σp}Pp=1, α and σ. The HLM com-bines P personal regression models in two ways. First,the local regression coe�cients wp determine energyvalues for each person. Second, the di�erent coe�cientsare connected through the population-level model pa-rameter k. Intuitively, the HLM captures the inherentsimilarity in walking across di�erent people while ac-counting for individual walking styles and energy con-sumption.

3.3.2 Likelihood evaluation

We aim to �nd the optimal parameters that maximizethe likelihood of each energy prediction ynp

for eachperson p, given the input data points xnpand the phys-iological features Physp. This likelihood can be writtenas:

l = p(Y|X,PHY) (10)

=

ˆp(Y,W|X,PHY)dW (11)

=

ˆp(Y|W,X,PHY)p(W|X,PHY)dW (12)

p(W|X,PHYS) can be expressed in terms of hid-den variable k as:

p(W|X,PHY)

=´p(W,k|X,PHY)dk

=´p(W|X,k,PHY)p(k|X,PHY)dk (13)

From the graphical model in Fig. 3, the probabili-ties p(Y|W,X,PHYS) and p(W|X,k,PHYS) can bebroken down into individual distributions as:

p(W|X,k,PHY) =

P∏p=1

p(wp|Xp,k,Physp)

p(Y|W,X,PHY) =

P∏p=1

Np∏np=1

p(ynp|xnp

,wp,Physp)

From the model graph, we can inferwp ⊥⊥ Xp|k,Physp,ynp⊥⊥ Physp|xnp

,wp and k ⊥⊥ PhyspXp. Thus:

p(W|X,k,PHY) =

P∏p=1

p(wp|k,Physp) (14)

p(Y|W,X,PHY) =

P∏p=1

Np∏np=1

p(ynp |xnp ,wp) (15)

p(k|X,PHY) = p(k) (16)

Substituting Eqs. 14, 15 and 16 into Eqs. 12 and 13, wehave:

l = p(Y|X,PHY) =ˆ P∏

p=1

Np∏np=1

p(ynp |xnp ,wp,Physp)p(W|X,PHY)dW

p(W|X,PHY) =

ˆ P∏p=1

p(wp|k,Physp)dk (17)

The pair of equations represented by [17] represent thelikelihood of the observationsY given physiological fea-tures PHY and inputs X. Maximizing the log - likeli-hood, L = log l is equivalent to �nding the optimalwp,kand respective noise parameters that maximize theseequations. Of particular interest is parameter k ∈ RD

which helps generate a person dependent weight wp

given only the physiological parameters. The probabil-ities p(ynp

|xnp,wp) and p(wp|k,Physp) are de�ned by

Eqs. 8 and 9 respectively. For the class of exponentialdistributions in the absence of prior information, thereis no closed form solution for {wp}Pp=1 and k. Henceapproximation techniques are required. Table 1 sum-marizes the terms used in our model.

Table 1: Glossary of Terms

Term Description Dim

xnpnth input point for

person p

1×D

ynpnth target V O2 for

person p

1× 1

wp Weight parameter for

person p

1×D

k Population weight

parameter

1×D

Physp Physiological features 1× (M + 1)

σp Noise parameter for

person p

1× 1

σ Prior variance for k 1× 1

α Variance for wp 1× 1

Np Number of data points

for person p

−

Xp Collection of input

data for person p

Np ×D

Yp Collection of V O2

values for person p

Np × 1

X Collection of all input

data points

P ×Np ×D

Y Collection of all V O2

values

P ×Np × 1

PHY Collection of all

physiological values

P × (M + 1)× 1

3.3.3 Algorithm description

We propose an EM like algorithm Bishop [2006b] tolearn the parameters {wp}Pp=1 and k. Our original aimwas to maximize the likelihood L = log p(Y|X,PHY).We approximate the likelihood term to incorporate themaximum a posteriori estimates of individual weightswp denoted by wp. Each wp is now a point estimateassumed to be known and can be interpreted as a pa-rameter that has to be optimized. From Eq. 9, the MAPestimate corresponds to the mean of eachwp. The mod-i�ed algorithm maximizes the incomplete log likelihoodlog p(Y|W,X,PHY). It does so by maximizing the ex-

pected complete log likelihood⟨log p(Y,Z|W,X,PHY)

⟩.

This expectation is written as:

Q(W,Wold) =∑k

log p(Y,Z|W,X,PHY)p(Y,Z|W,X,PHY)

We treat k as a hidden variable that needs to be esti-mated and hence Z = k. Correspondingly, we have thefollowing steps:

Initialization Initialize wp, σp, α to appropriate values.An appropriate initialization condition is one obtainedusing least squares estimates.

E step Evaluate p(k|{woldp }Pp=1, {Physp}Pp=1). For this,

we take advantage of the linear dependence of eachperson's weight vector wp given their physiological fea-tures Physp through k. This means we have a data setof P target wp's and their corresponding input vari-ables Physp's. We can thus frame a regression problemfrom a person's physiological parameter variable Physto a weight variable w (of which there are P exam-ples) through k. Assuming that k ∈ RD and no co-variance terms, we frame D separate linear regressionproblems. The mth regression problem, maps the phys-iological features to the mth component of w throughthe mth component of k. Thus we have D separate lin-ear regression problems mapping physiological featuresPhysp to wEM through k in a component-wise man-ner. We solve each of these regression problems in aBLR framework similar to that described in Sec. 3.2 toobtain a mean and variance measure for k.

M step Maximize likelihood of the data set given k.This corresponds to re-estimating parameters using thecurrent value of k. The learned k from the E-step isused to estimate individual wp's given their physiologi-cal features Physp. From the linear relationship as de-�ned by Eq. 9, we have:

wP = kTPhysp (18)

We use this estimate as initial conditions and maximizethe likelihood of the data set. Given k (which is �xedafter the E-step), this is equivalent to maximizing indi-vidual likelihoods of each of the participants. Maximiz-ing the individual likelihoods is the same as �nding the

optimal wp given Np data pairs {xnp , ynp}Np

np=1. Thisis equivalent to solving P individual Bayesian LinearRegression problems with the initial conditions of wp'sas de�ned above and �nding the optimal wp's.

Evaluate log likelihood The total log likelihood is thesum of the P individual log likelihoods found from theM step. We check for convergence of log likelihood andif not, repeat the E and M steps again. Using this al-gorithm we learn a generalized energy prediction modelthat maps physiological features to subject speci�c weightswp and uses these weights to predict energy expendi-ture for each subject.

4 Methods

4.1 Hardware

Human movement was captured using a modi�ed Spark-fun 6DoF Inertial Measurement Unit (IMU) [2009] worn

on the right iliac crest. The sensor provided 6 sen-sor streams conveying triaxial acceleration through aFreescale [2008] MMA7260Q triaxial accelerometer fromand triaxial rotational rates through 2 Invensense [2006]IDG300 gyroscopes allowing translational and rotationalmotion capture in all three planes � sagittal, frontal andtransverse. The accelerometer was set at ±2g range.Data were sampled at 100 Hz and transmitted via Blue-tooth (RN-41 Bluetooth module) to a nearby PC. Addi-tionally, activity patterns in the form of accelerometercounts as generated from a single Actigraph GT1M (10second epochs) were recorded. The GT1M was mounted�rmly on top of the IMU with the X and Y axes of bothsensors aligned as close as possible. Figure 4a illustratesthe hardware setup used for collecting data.

4.2 Data collection

Nine healthy adults (Five men, four women) partici-pated in the study. Height and weight of each partici-pant were recorded using a Healthometer balance beamscale. Figure 4b illustrates participant statistics usinga Weight (vs) Height graph. Each point is color codedby Body Mass Index (BMI). Informed written consentwas obtained from participants and the study was ap-proved by the Institutional Review Board, University ofSouthern California. Participants had average weight =72± 21 kg and average height = 1.73± 0.11 m.

Rate of oxygen consumption (V O2, ml/min) as mea-sured using the MedGraphics Cardio II metabolic sys-tem with BreezeSuite v6.1B (Medical Graphics Corpo-ration) was the representation of energy expenditure.Before each test, the �ow meter was calibrated againsta 3.0 L syringe and the system was calibrated againstO2 and CO2 gases of known concentration. This systemoutputs data at the frequency of every breath. Figure4c illustrates a typical recording procedure. Each par-ticipant was asked to walk at 5 speeds (2.5, 2.8, 3.0, 3.3,and 3.5 mph) on a motorized treadmill for 7 minutesof recording time per speed. At each speed transition,V O2 readings were allowed to stabilize for 2 minutesprior to the start of data collection. Speeds were cho-sen based on the Compendium of Physical Activities[Ainsworth et al., 2000].

4.3 Data Preparation

Each sensor stream from the IMU was passed througha low pass �lter with 3dB cuto� at 20 Hz. The cut-o� frequency was chosen keeping in mind that every-day activities fall in the frequency range of 0.1-10 Hzas was shown by Bao and Intille [2004]. Each of the7 minute streams from the IMU were divided into 10second intervals or epochs. The 10 second interval waschosen based on previous successful implementations

(a) Sensor board used to collect data.Data streams consist of triaxial ac-celerometer and gyroscopic information.Source:www.sparkfun.com

(b) Illustration of population statistics plotted as Weight (vs)Height for each subject. Each point is color coded by BMI.Average weight = 72± 21 kg. Average height = 1.73± 0.11m.

(c) An example recordingprocedure for a single par-ticipant.

Fig. 4: Illustration of hardware, ground truth collection and population statistics

by Vathsangam et al. [2010]. Each subject's data con-sists of roughly 210 data points. Within each epoch,feature vectors were extracted from each sensor streamas explained in Sec. 3.1. A combination of these featuresrepresent the input points xnp as described in Sec. 3.3.When it was required to extract an FFT over a sig-nal, the Fourier coe�cients corresponding to frequen-cies greater than 10 Hz were discarded. All featureswere calculated for both the accelerometer and gyro-scope and for each axis within each. Data for each userwas thus a set of epochs each containing 6-dimensionalraw data and the average rate of oxygen consumption(V O2) for that epoch. These represent per-user datawhile walking at �ve di�erent speeds.

4.4 Reference Equations for Comparison

In order to compare our approach with current state-of-the-art techniques, a speed based calorie predictionobtained from The ACSM [2010] Exercise Guidelinesand an Actigraph count-based 2-regression model byCrouter et al. [2006] were also calculated on the corre-sponding recorded speeds and synchronized Actigraphdata respectively. The ACSM Exercise Guidelines pro-vide a means to estimate calories burnt from the speedof walk as:

V O2(mL/min) =

((0.1× 26.8× Speed (mph)) + 3.5)×Weight (kg)

Crouter et al. [2006] provides a method to convert 10second epoch based accelerometer counts to caloriesconsumed as:

V O2(mL/min) =

2.330519 + (0.001646× Counts)−(1.2017× 10−7 × Counts2)

+(3.3779× 10−12Counts3)) ∗ 3.5×Weight (kg)

Errors for all models were measured as least-squarederror from the ground truth output from the metaboliccart.

5 Results and Discussion

This section provides a comparative analysis of predic-tion accuracy based on di�erent models. Models werevaried across two axes. First, a comparative featurestudy was performed to identify the best feature spacefor the Hierarchical Linear Model and the subject-speci�cregression model. The best feature space was de�nedas that which resulted in the least percentage predic-tion errors across participants. Second, given the op-timal feature space the Hierarchical Linear Model wascompared with subject-speci�c and weight-scaled ap-proaches to determine relative accuracies. Unless oth-erwise stated, all results were statistically signi�cant(p<0.1 per participant).

5.1 Feature Study

5.1.1 Choosing Optimal Movement Features

Given candidate feature families as described in Sec.3.1, our �rst study focused on determining the best fea-ture space to represent human movement. Once again,the best candidate was determined as that which min-imized the V O2 prediction error. Figure 5 summarizesthe results. In each run, data from one participant pwas selected. 60% of this data was randomly partitionedinto training data, the remaining constituting test data.Five di�erent feature families were extracted from theepochs corresponding to the training data. Five di�er-ent subject-speci�c models (shown in yellow) for each ofthe feature families using participant p's training data

Fig. 5: Comparison of errors obtained from di�erent fea-

ture spaces using the HLM (red) and subject-speci�c

models averaged across all participants (yellow). The

1024 point FFT produced the least error in both the

subject-speci�c model and HLM. Errors from the HLM

were roughly twice that of the subject-speci�c regres-

sion models. The second lowest errors in the subject-

speci�c model cases were obtained using motion shape

as a feature. Corresponding second lowest errors in the

HLM cases were obtained when using motion periodic-

ity as a feature. Features that constitute motion shape

and motion variation vary across experiment sessions

and lead to poor generalizations. Features that derive

periodicity are more reliable across a population.

alone. Energy predictions were made for participant p'stest data. Simultaneously, �ve di�erent HLMs (shownin red) were trained using all data from the remainingP − 1 participants and energy predictions were madefor participant p′s test data. For reference, speed-based(shown in black) and count-based (shown in gray) V O2

calculation techniques were also implemented on thesame test data. This was repeated �ve times for dif-ferent randomly partitioned training and test data anda per-participant error was calculated. This procedurewas repeated for each participant p and errors were av-eraged across all participants.

The 1024 point FFT produced the least error inboth the subject-speci�c model and HLM. Errors fromthe HLM were roughly twice that of the subject-speci�cregression models. Subject-speci�c regression models per-formed better than speed or count based techniques. Aninteresting observation is that while the second lowesterrors in the subject-speci�c model cases were obtainedusing motion shape as a feature, the corresponding sec-ond lowest errors in the HLM cases were obtained whenusing motion periodicity as a feature. Features that con-stitute motion shape and motion variation vary acrossexperiment sessions. These feature values can changewith slight changes in orientation, location of sensor,di�ering walking styles between sessions in additionto natural variations across participants. On the other

Fig. 6: Illustration of prediction errors (expressed as

percentage of ground truth) with di�erent combinations

of physical parameters. Maximum errors were obtained

when only height was used as a feature vector. Among

features chosen weight is the single best physiological

features to estimate k.

hand, since steady-state walking is a cyclical activity,features that derive this periodicity are more reliableacross a population. The 1024 point FFT is a case of�ne-grained tracking of periodicity. This is the primaryreason why periodicity based features perform betterthan motion shape or motion variation based features.Given that it produced the lowest error rates, in bothsubject-speci�c models and HLMs the 1024 point FFTwas used as the only feature space to represent humanmovement for the remainder of this study.

5.1.2 Choosing the Single-best Optimal Physiological

Feature

Given the 1024 point FFT transform as the optimalfeature set to represent movement, the second studyexamined the best set of physiological features thatminimized energy expenditure prediction errors. Thefeatures used were Height, Weight, BMI, Age and allfeatures combined. We did not consider combinationsof features because the small size of our populationmade the algorithm prone to over�tting. Maximum er-rors were obtained when only height was used as afeature vector. The best individual physiological fea-ture for this population was weight. Combining weightand height only marginally improved performance andadding age degraded performance. Based on the aboveresults, it was decided to use weight as the only physio-logical feature. Figure 6 illustrates the errors obtained.With these results in mind, the optimal feature spacewas chosen to be the 1024 point FFT at the individuallevel and weight only at the population level.

Fig. 7: Comparison of errors obtained from the HLM

(red) with individualized subject-speci�c models av-

eraged across all participants (yellow). The general-

ized model showed comparable errors to subject-speci�c

models with 10% of training data. With more training

data, subject-speci�c models outperformed the general-

ized model and previous speed or accelerometer based

approaches.

5.2 Algorithm Comparison

5.2.1 Comparison with Subject-speci�c Modeling

Given the optimal feature space, an important ques-tion is how the HLM compares with subject-speci�cmodels. Fig. 7 illustrates the relative errors obtainedwhen an HLM (shown in red) trained using data fromP − 1 people is compared with a subject-speci�c modelfor the pth person with varying training data (shownin yellow) averaged across all participants. The test-ing methodology was similar to that described in Sec-tion 5.1.1 except that the feature set was kept constantand the percentage of training data was varied. Forreference, speed based and accelerometer count basedcalorie determination techniques also shown. The HLMshowed the same accuracy at all percentages of train-ing data and hence only one bar is shown. The HLMshowed comparable errors to subject speci�c modelswith 10% of training data used. With more trainingdata, subject speci�c models outperformed the HLMand speed/count based approaches. The availability ofmore training data allows stronger modeling capabil-ity of subject-speci�c models. An HLM still showed thesame level of performance as a subject-speci�c modelwith small amounts of data. Hence, in cases where nodata from a subject is available, using an HLM mightbe a preferable option to predict calories burnt. Withlarge amounts of data, a subject-speci�c model withlarge amounts of data would perform better than boththe HLM and speed/accelerometer based approachesdescribed. It can be shown that predictions from a BLRmodel (used in estimating k's) approach the ideal valuewith increasing large amounts of training data. Henceit is expected that as the size of the target population

Fig. 8: Illustration of the e�ect of adding a small

amount of subject-speci�c training data to the HLM.

An informative prior from physiological features is used

to train a subject-speci�c model, shown in red. For

comparison, a subject-speci�c model (shown in yellow)

with an uninformative prior is also trained. Using an

HLM with initial conditions along with a small percent-

age of training data produced similar errors to using a

subject-speci�c model with large amounts of training

data (p<0.1 per subject).

is expanded, the HLM will perform competitively withthe subject-speci�c approach.

5.2.2 Utilization of HLM as an Informative Prior

Given the superior performance of subject-speci�c mod-els, this section of the study explores whether a hy-brid approach utilizing both an HLM and a subject-speci�c model could be used to produce even moreaccurate results. This can be achieved by combiningthe weights obtained using the generalized model withlimited training data to equal or improve model pre-diction accuracy. Such an approach would be bene�-cial because it o�ers the potential of using less train-ing data for training subject-speci�c models. Given thelearned k, from a HLM, one can predict the subjectspeci�c weight wp for an unknown subject using Eq.18. One can then train a subject-speci�c model withthis wp as an informative initial condition and limitedsubject-speci�c data. Fig. 8 illustrates percentage er-rors obtained with varying amounts of training data.Training a model with an informative prior that waslearned from the generalized model (red) showed lowererrors when compared with cases where an uninforma-tive prior was used (yellow). This was most pronouncedwhen small amounts of training data are used (p<0.1per subject). It can also be seen from the �gure that us-ing an informative prior with small amounts of trainingdata showed comparable errors to training with no anuninformative prior and substantial training data. Theuse of an informative prior also showed higher accu-racy than existing speed-based or accelerometer basedtechniques. This approach suggests that while the HLM

Fig. 9: Illustration of the predicted values versus ground

truth using both generalized algorithms (red) and

subject-speci�c algorithms (blue) for a single partici-

pant. Similar plots exist for other participants. HLMs

perform poorly in the end regions. This could be be-

cause the parameters are optimized over the most sim-

ilar looking input points. These occur in the middle

ranges of speeds of each participant.

by itself produces higher errors than a subject-speci�cmodel, the lowest errors can be obtained by using the kfrom the model to obtain a subject speci�c weight wp

and using this wp as an informative prior with smallamounts of training data from the participant.

5.2.3 Sources of Model Inaccuracies

Figure 9 illustrates the relative predictive capacities ofthe HLM and subject-speci�c model as applied to oneparticipant. It can be seen that while the HLM made asgood V O2 predictions as the subject-speci�c model inthe middle energy range, the model broke down whenpredicting lower or higher energy ranges. This can beunderstood as follows, the HLM has to simultaneously�t model parameters across participants and withineach participant. Most participants' walking styles ex-hibit the most variation at edges of the walking speedrange due to natural physiological di�erences such asweight, height and leg-length. The generalized modelhas to tradeo� between overall accuracy and subject-speci�c accuracy. Therefore, the parameters are opti-mized over the most similar looking input points whichoccur in the middle ranges of speeds of each participant.This is typical of the bias-variance tradeo� involved inlearning such models.

5.2.4 Comparison with Weight-scaled Models

As was described in Section 2, current normalizing ap-proaches model weight dependence by using power lawscaling. V O2 values are scaled by a weight exponent ofthe form W s. All participants are considered to repre-sent a single data set and each V O2 value is scaled by

(a) A color map illustration of percentage errors as a func-tion of training data and weight exponent extracted from theprevious �gure. Lowest errors of ≈ 6.5% are seen when anexponent coe�cient of 0.7 − 1.0 and a large percentage oftraining data (> 50%) are used.

(b) Illustration of the training errors as a function of di�erentexponents as compared with subject-speci�c models and HLM.Subject-speci�c models outperform weight exponent scaled mod-els regardless of exponent. Generalized models perform worsethan subject-speci�c or weight exponent scaled models.

Fig. 10: Comparison of HLM and subject-speci�c mod-

els with weight-scaled linear approaches. V O2 values

are scaled by a weight exponentW s where 0.6 < s < 1.6

and probabilistic linear models are trained, these are

compared against the subject-speci�c and generalized

model described in this study.

a suitable weight exponent W s as V O2,scaled =V O2

W s.

This section of the study focused on how the HLM andsubject-speci�c models compare against such scalingbased approaches. The uni�ed data set was divided intotraining and testing data and regression models weretrained. This was repeated for di�erent randomly sam-pled data and percentages of training and testing data.Various exponent coe�cients in the range 0.6 < s < 1.6were used in increments of 0.1 and percentage errors (asde�ned in previous section) were recorded. Thus for dif-

ferent combinations of training data and exponent co-e�cients, we have an error surface. Fig. 10a shows thesurface as seen from above. This represents a color plotof errors for various training percentage-exponent coef-�cient combinations. Lowest errors (≈ 6.5%) were seenwith an exponent coe�cient of 0.7−1.0 and a large per-centage of training data (> 50%). This exponent coe�-cient value corresponded to previous research indicatingthat the optimal value is approximately 0.65− 1.0.

Fig. 10b illustrates a comparative performance be-tween power law scaled approaches, generalized modeland subject-speci�c model based approaches for di�er-ent power coe�cients at all training percentages. Subject-speci�c models outperformed weight exponent scaledmodels irrespective of exponent. This is because eachsubject-speci�c model used training data only from thatparticular participant. This would naturally result in abetter �t than any form of weight scaling across allparticipants. Generalized models performed worse thansubject-speci�c or weight exponent scaled models butperformed better than either when a small amount oftraining data was used.

6 Conclusion

Accurately tracking and measuring calories burned fromwalking provides a valuable tool in designing e�ectiveintervention measures. The last decade has seen theemergence of inertial sensors to detect, characterize andquantify physical activity. An issue with using inertialsensor data to estimate energy expenditure is how datacan be normalized across varying physiological featuressuch as height, weight, age etc. Common approachessuch as weight scaling require validation across eachnew target population. An alternative is to extend thecapability of standard linear regression through Hier-archical Linear Modeling (HLM). At one level we haveparticipant speci�c models relating inertial sensor fea-tures to energy expenditure. At a second level we cap-ture the inter-dependence of participant-speci�c mod-els on physiological features using a second regressionmodel. Our contributions are summarized as:

Flexibility in modeling physiological and fea-

ture parameters: HLMs allow �exibility in incorpo-rating features both at the physiological and sensormodeling level. By di�erentiating these features at mul-tiple levels, one can easily switch, add or remove variouscombinations and examine their e�ects on predictionaccuracy. In our study, weight was the single best phys-iological feature and a 1024-point FFT was the bestfeature for description of movement.

Accurate models with sparse data: In manystudies across a large population, researchers often haveto deal with inadequate or unequal amounts of datafrom participants. Capturing inter-participant depen-dencies through a higher level of modeling allows one

to e�ectively �transfer� model information from thoseparticipants for whom extensive data are available tothose where only limited data are available. Our imple-mentation demonstrated the e�ectiveness of using theHLM to obtain an informative initial condition to fur-ther train individual models. Despite having access tosparse data, using an informative initial condition pro-duced similar errors to a subject-speci�c model withlarge training data.

Comparison with subject-speci�c and weight-

scaled modeling: The generalized model showed sim-ilar errors to subject-speci�c models with 10% of train-ing data used. Subject-speci�c models performed bet-ter than weight exponent scaled models for all exponentscales. An important insight was that generalized mod-els showed competitive prediction accuracies with thesubject-speci�c model in the middle energy range foreach subject but broke down when predicting for loweror higher energy ranges. This is most likely becausemost subjects exhibit similar walking patterns in themid-speed ranges.

7 Future Work

We plan to expand our work in a number of directions.The most important extension is to test our algorithmacross a much larger population and more comprehen-sive set of activities. Testing across a larger populationwill also allow a comparison between the e�ects of otherphysiological features such as height, stride length andsex on prediction accuracies. We also aim to test thealgorithm in free-living conditions across common ac-tivities such as walking, sitting and standing. Finally,we plan to compare other approaches that learn theparameters k and wp, including Gibbs-sampling andvariational approximations.

Acknowledgements This work was supported in part by Qual-comm, Nokia, NSF (CCR-0120778) as part of the Center for Em-bedded Networked Sensing (CENS), and the USC ComprehensiveNCMHD Research Center of Excellence (P60 MD 002254). Sup-port for H. Vathsangam was provided by the USC AnnenbergDoctoral fellowship program. The authors would like to thankDavid Erceg of the Division of Biokinesiology and Physical Ther-apy, USC, for his invaluable support and guidance.

References

6DoF Inertial Measurement Unit Datasheet. Technical report,Sparkfun Electronics, 2009. URL http://www.sparkfun.com/

products/8454.ACSM. ACSM's Guidelines for Exercise Testing and Prescrip-

tion. American College of Sports Medicine, 2010.B.E. Ainsworth, W.L. Haskell, M.C. Whitt, A.M Swartz, S.J.

Strath, W.L. O'Brien, D.R. Bassett Jr., K.H Schmitz, P.O.Emplaincourt, D.R. Jacobs, and A.S Leon. Compendium ofphysical activities: classi�cation of energy costs of human phys-ical activities. Med. & Sci. in Sports & Exer., 32:S498�S516,2000.

http://www.sparkfun.com/products/8454

http://www.sparkfun.com/products/8454

Fahd Albinali, Stephen Intille, William Haskell, and Mary Rosen-berger. Using wearable activity type detection to improvephysical activity energy expenditure estimation. In 12th con-ference on Ubiquitous Computing, 2010.

Kamiar Aminian and Bijan Naja�. Capturing human motionusing body-�xed sensors: outdoor measurement and clinicalapplications. Computer Animation and Virtual Worlds, 15(2):79�94, 2004.

Ling Bao and Stephen S. Intille. Activity Recognition from User-Annotated Acceleration Data. In Alois, editor, PERVASIVE,pages 1�17. Massachusetts Institute of Technology, Springer-Verlag GmbH, April 2004.

Christopher M. Bishop. Pattern Recognition and Machine Learn-ing, chapter 3.3, pages 152�156. Springer, 2006a.

Christopher M. Bishop. Pattern Recognition and Machine Learn-ing, chapter 9.4, pages 450�454. Springer, 2006b.

C. Chang, R. Ansari, and A.A. Khokhar. E�cient tracking ofcyclic human motion by component motion. (12):941�944,2004.

Scott E Crouter, Kurt G Clowers, and David R Bassett. A novelmethod for using accelerometer data to predict energy expen-diture. J Appl Physiol, 100(4):1324�1331, Apr 2006.

A.L. Dunn, B.H Marcus, J.B. Kampert, M.E. Garcia, H.W. KohlIII, and S.B. Blair. Comparison of Lifestyle and StructuredInterventions to Increase Physical Activity and Cardiorespira-tory Fitness: A Randomized Trial. JAMA, 281:327�334, 1999.

Freescale. 1.5g - 6g three axis low-g micromachined accelerome-ter. Technical report, 2008. URL http://www.freescale.com/

files/sensors/doc/data_sheet/MMA7260QT.pdf.A. E. Gelfand, S. E. Hills, A. Racine-Poon, and A. F. M. Smith.

Illustration of bayesian inference in normal data models usinggibbs sampling. Journal of the American Statistical Associa-tion, 85:972�985, 1990.

Andrew Gelman and Jennifer Hill. Data Analysis Using Regres-sion and Multilevel/Hierarchical Models. Cambridge Univer-sity Press, 2007.

Invensense. Integrated dual-axis gyro - idg 300. Technicalreport, 2006. URL http://www.sparkfun.com/datasheets/

Components/IDG-300_Datasheet.pdf.A.M Neville, R. Ramsbottom, and C. Williams. Scaling phys-

iological measurements for individuals of di�erent body size.Eur J Appl Physiol Occup Physiol., 65(2):110�7, 1992.

M. Pearce, D. Cunningham, A. Donner, P. Rechnitzer, G. Fuller-ton, and J. Howard. Energy cost of treadmill and �oor walkingat self-selected paces. European Journal of Applied Physiologyand Occupational Physiology, 52:115�119, 1983. ISSN 0301-5548.

D.M. Rogers, B.L. Olson, and J.H. Wilmore. Scaling for thevo_2-to-body size relationship among children and adults. JAppl. Physiol., 79(3):958�967, 1995.

Megan P. Rothney, Megan Neumann, Ashley Beziat, and Kong Y.Chen. An arti�cial neural network model of energy ex-penditure using nonintegrated acceleration signals. J ApplPhysiol, 103(4):1419�1427, 2007. doi: 10.1152/japplphysiol.00429.2007. URL http://jap.physiology.org/cgi/content/

abstract/103/4/1419.Munguia Tapia. Using machine learning for real-time activity

recognition and estimation of energy expenditure. PhD thesis,Massachusetts Institute of Technology. Dept. of Architecture.Program in Media Arts and Sciences., 2008.

Richard P Troiano. Translating accelerometer counts into en-ergy expenditure: advancing the quest. J Appl Physiol, 100(4):1107�1108, Apr 2006.

H. Vathsangam, A. Emken, E.T. Schroeder, D. Spruijt-Metz,and G.S. Sukhatme. Determining Energy Expenditure FromTreadmill Walking Using Hip-Worn Inertial Sensors: An Ex-perimental Study. IEEE Transactions on Biomedical Engi-neering, 58(10):2804 �2815, oct. 2011a. ISSN 0018-9294. doi:10.1109/TBME.2011.2159840.

Harshvardhan Vathsangam, Adar Emken, Donna Spruijt-Metz,Todd E. Schroeder, and Gaurav S. Sukhatme. Energy Estima-tion of Treadmill Walking using On-body Accelerometers andGyroscopes. In EMBC, 2010.

Harshvardhan Vathsangam, Adar Emken, E. Todd Schroeder,Donna Spruijt-Metz, and Gaurav S. Sukhatme. Towards aGeneralized Regression Model for On-body Energy Predictionfrom Treadmill Walking. In 5th International ICST Con-ference on Pervasive Computing Technologies for Healthcare,2011b.

Darren E.R. Warburton, Crystal Whitney, and Shannon S.D.Bredin. Health Bene�ts of Physical Activity. CMAJ, 174,2006.

Robert L. Waters and Sara Mulroy. The energy expenditure ofnormal and pathologic gait. Gait & Posture, 9(3):207 � 231,1999.

C. Wyndham, W. van der Walt, A. van Rensburg, G. Rogers, andN. Strydom. The in�uence of body weight on energy expendi-ture during walking on a road and on a treadmill. EuropeanJournal of Applied Physiology and Occupational Physiology,29:285�292, 1971. ISSN 0301-5548.

I. Zakeri, M.R. Puyau, A.L. Adolph, F.A. Vohra, and N.F. Butte.Normalization of Energy Expenditure Data for Di�erences inBody Mass or Composition in Children and Adolescents. J ofNutrition, 136:1371�1376, 2006.

http://www.freescale.com/files/sensors/doc/data_sheet/MMA7260QT.pdf

http://www.freescale.com/files/sensors/doc/data_sheet/MMA7260QT.pdf

http://www.sparkfun.com/datasheets/Components/IDG-300_Datasheet.pdf

http://www.sparkfun.com/datasheets/Components/IDG-300_Datasheet.pdf

http://jap.physiology.org/cgi/content/abstract/103/4/1419

http://jap.physiology.org/cgi/content/abstract/103/4/1419

hierarchical linear models for energy prediction using...

Documents