

Online Human Activity Recognition using Low-Power Wearable Devices

Ganapati Bhat1, Ranadeep Deb1, Vatika Vardhan Chaurasia1, Holly Shill2, Umit Y. Ogras1
1School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ

2Lonnie and Muhammad Ali Movement Disorder Center, Phoenix, AZ

ABSTRACT
Human activity recognition (HAR) has attracted significant research interest due to its applications in health monitoring and patient rehabilitation. Recent research on HAR focuses on using smartphones due to their widespread use. However, this leads to inconvenient use, a limited choice of sensors and inefficient use of resources, since smartphones are not designed for this purpose. This paper presents the first HAR framework that can perform both online training and inference. The proposed framework starts with a novel technique that generates features using the fast Fourier and discrete wavelet transforms of a textile-based stretch sensor and accelerometer. Using these features, we design an artificial neural network classifier which is trained online using a policy gradient algorithm. Experiments on a low-power IoT device (TI-CC2650 MCU) with nine users show 97.7% accuracy in identifying six activities and their transitions with less than 12.5 mW power consumption.

1 INTRODUCTION
Advances in wearable electronics have the potential to disrupt a wide range of health applications [10, 22]. For example, diagnosis and follow-up for many health problems, such as motion disorders, currently depend on the behavior observed in a clinical environment. Specialists analyze the gait and motor functions of patients in a clinic, and prescribe a therapy accordingly. However, as soon as the person leaves the clinic, there is no way to continuously monitor the patient and report potential problems1 [12, 25]. Another high-impact application area is obesity-related diseases, which claim about 2.8 million lives every year [2, 4]. Automated tracking of the physical activities of overweight patients, such as walking, offers tremendous value to health specialists, since self-recording is inconvenient and unreliable. As a result, human activity recognition (HAR) using low-power wearable devices can revolutionize health and activity monitoring applications.

There has been growing interest in human activity recognition with the prevalence of low-cost motion sensors and smartphones. For example, accelerometers in smartphones are used to recognize activities such as standing, sitting, lying down, walking and jogging [3, 15, 19]. This information is used for rehabilitation instruction, fall detection for the elderly, and reminding users to be active [17, 34]. Furthermore, activity tracking also facilitates physical activity, which improves the wellness and health of its users [7, 8, 18]. HAR techniques can be broadly classified based on when training and inference take place. Early work collects the sensor data before processing. Then, both classifier design and inference are performed offline [5]. Hence, these approaches have limited applicability. More recent work trains a classifier offline, but processes the sensor data online to infer the activity [3, 29]. However, to date, there is no technique that

1One of the authors is a neurologist whose expertise is in movement disorders.

can perform both online training and inference. Online training is crucial, since the classifier needs to adapt to new, and potentially large numbers of, users who are not involved in the training process. To this end, this paper presents the first HAR technique that continues to train online to adapt to its user.

The vast majority, if not all, of recent HAR techniques employ smartphones. Major motivations behind this choice are their widespread use and easy access to integrated accelerometer and gyroscope sensors [34]. We argue that smartphones are not suitable for HAR for three reasons. First, patients cannot always carry a phone as prescribed by the doctor. Even when they have the phone, it is not always in the same position (e.g., in hand or in a pocket), which is typically required in these studies [9, 29]. Second, mobile operating systems are not designed to meet real-time constraints. For example, the Parkinson's Disease Dream Challenge [1] organizers shared raw motion data collected using iPhones in more than 30K experiments. According to the official specification, the sampling frequency is 100 Hz. However, the actual sampling rate varies from 89 Hz to 100 Hz, since the phones continue to perform many unintended tasks during the experiments. For the same reason, the power consumption is on the order of watts (more than 100× our result). Finally, researchers are limited to the sensors integrated in the phones, which are not specifically designed for human activity recognition.

This paper presents an online human activity recognition framework using the wearable system setup shown in Figure 1. The proposed solution is the first to perform online training and to leverage textile-based stretch sensors in addition to commonly used accelerometers. Using the stretch sensor is notable, since it provides low-noise motion data that enables us to segment the raw data into non-uniform windows ranging from one to three seconds. In contrast, prior studies are forced to divide the sensor data into fixed windows [4, 19] or smoothen noisy accelerometer data over long durations [9] (detailed in Section 2). After segmenting the stretch and accelerometer data, we generate features that enable

Figure 1: Wearable system setup, sensors and the low-power IoT device [33]. We knitted the textile-based stretch sensor to a knee sleeve to accurately capture the leg movements.

This paper has been accepted for publication in the Proceedings of the International Conference on Computer Aided Design, 2018.


classifying the user activity into walking, sitting, standing, driving, lying down, jumping, as well as the transitions between them. Since the stretch sensor accurately captures the periodicity in the motion, its fast Fourier transform (FFT) reveals invaluable information about the human activity in different frequency bands. Therefore, we judiciously use the leading coefficients as features in our classification algorithm. Unlike the stretch sensor, the accelerometer data is notoriously noisy. Hence, we employ the approximation coefficients of its discrete wavelet transform (DWT) to capture the behavior as a function of time. We evaluate the performance of these features for HAR using commonly used classifiers including artificial neural networks, random forests and k-nearest neighbors (k-NN). Among these, we focus on the artificial neural network, since it enables online reinforcement learning using policy gradient [32] with low implementation cost. Finally, this work is the first to provide a detailed power consumption and performance break-down of the sensing, processing and communication tasks. We implement the proposed framework on the TI-CC2650 MCU [33], and present an extensive experimental evaluation using data from nine users and a total of 2614 activity windows. Our approach provides 97.7% overall recognition accuracy with 27.60 ms processing time, 1.13 mW sensing power and 11.24 mW computation power consumption.

The major contributions of this work are as follows:
• A novel technique to segment the sensor data non-uniformly as a function of the user motion,
• Online inference and training using an ANN, and reinforcement learning based on policy gradient,
• A low-power implementation on a wearable device and extensive experimental evaluation of accuracy, performance and power consumption using nine users.

The rest of the paper is organized as follows. We review the related work in Section 2. Then, we present the feature generation and classifier design techniques in Section 3. Online learning using the policy gradient algorithm is detailed in Section 4. Finally, the experimental results are presented in Section 5, and our conclusions are summarized in Section 6.

2 RELATED WORK AND NOVELTY
Human activity recognition has been an active area of research due to its applications in health monitoring, patient rehabilitation and in promoting physical activity among the general population [4, 6, 7]. Advances in sensor technology have enabled activity recognition to be performed using body-mounted sensors [27]. Typical steps for activity recognition using sensors include data collection, segmentation, feature extraction and classification.

HAR studies typically use a fixed window length to infer the activity of a person [4, 14, 19]. For instance, the studies in [4, 19] use 10-second windows to perform activity recognition. Increasing the window duration improves accuracy [6], since it provides richer data about the underlying activity. However, transitions between different activities cannot be captured with long windows. Moreover, fixed window lengths rarely capture the beginning and end of an activity. This leads to inaccurate classification, as the window can contain features of two different activities [6]. A recent work proposes action segmentation using a step detection algorithm on the accelerometer data [9]. Since the accelerometer data is noisy, they need to smoothen the data using a one-second sliding window with 0.5-second overlap. Hence, this approach is not practical for low-cost devices with limited memory capacity. Furthermore, the authors state that there is a strong need for better segmentation techniques to improve the accuracy of HAR [9]. To this end, we present a robust segmentation technique which produces windows whose sizes vary as a function of the underlying activity.

Most existing studies employ statistical features such as mean, median, minimum, maximum, and kurtosis to perform HAR [4, 14, 19, 26]. These features provide useful insight, but there is no guarantee that they are representative of all activities. Therefore, a number of studies use all the features or choose a subset of them through feature selection [26]. The fast Fourier transform and, more recently, the discrete wavelet transform have been employed on accelerometer data. For example, the work in [9] computes the 5th-order DWT of the accelerometer data. Eventually, it uses only a few of the coefficients to calculate the wavelet energy in the 0.625-2.5 Hz band. In contrast, we use only the approximation coefficients of a single-level DWT with O(N/2) complexity. Unlike prior work, we do not use the FFT of the accelerometer data, since it entails significant high-frequency components without clear implications. In contrast, we employ the leading FFT coefficients of the stretch sensor data, since they give a very good indication of the underlying activity.

Early work on HAR used wearable sensors to perform data collection while subjects carried out various activities [5]. This data is then processed offline to design the classifier and perform the inference. However, offline inference has limited applicability since users do not get any real-time feedback. Therefore, recent work on HAR has focused on implementations on smartphones [3, 7, 29, 31]. However, smartphones are not designed for human activity recognition, which leads to a higher power consumption [16]. Moreover, a smartphone is mostly in a user's pocket, and it is inconvenient to place it elsewhere to collect sensor data. In addition to these challenges, approaches using smartphones are not reproducible due to the variability in phones, operating systems and usage patterns [8, 29]. In contrast, our implementation on a wearable device consumes less than 12.5 mW power.

Finally, existing HAR studies employ commonly used classifiers, such as k-NN [13], support vector machines [13], decision trees [28], and random forests [13], which are trained offline. In strong contrast to these methods, the proposed framework is the first to enable online training. We first train an artificial neural network offline to generate an initial implementation of the HAR system. Then, we use reinforcement learning at runtime to improve the accuracy of the system. This enables our approach to adapt to new users in the field.

3 FEATURE SET AND CLASSIFIER DESIGN
3.1 Goals and Problem Statement
The goal of the proposed HAR framework is to recognize the six common daily activities listed in Table 1 and the transitions between them in real time with more than 90% accuracy within a mW-level power budget. These goals are set to make the proposed system practical for daily use. The power consumption target enables day-long operation using ultrathin lithium polymer cells [11].

The stretch sensor is knitted to a knee sleeve, and the IoT devicewith a built-in accelerometer is attached to it, as shown in Figure 1.


Figure 2: Overview of the proposed human activity recognition framework.

All the processing outlined in Figure 2 is performed locally on the IoT device. More specifically, the streaming stretch sensor data is processed to generate segments ranging from one to three seconds (Section 3.2). Then, the raw accelerometer and stretch data in each window are processed to produce the features used by the classifier (Section 3.3). Finally, these features are used both for online inference (Section 3.4) and reinforcement learning using policy gradient (Section 4). Since the communication energy is significant, only the recognized activity and time stamps are transmitted to a gateway, such as a phone or PC, using Bluetooth whenever they are nearby (within 10 m). The following sections provide a theoretical description of the proposed framework without tying it to specific parameter values. These parameters are chosen to enable a low-overhead implementation using streaming data. The actual values used in our experiments are summarized in Section 5.1 while describing the experimental setup.

3.2 Sensor Data Segmentation
Activity windows should be sufficiently short to catch transitions and fast movements, such as falls and jumps. However, short windows can also waste computation time and power during idle periods, such as sitting. Furthermore, a fixed window may contain portions of two different activities, since perfect alignment is not possible. Hence, activity-based segmentation is necessary to maintain a high accuracy with minimum processing time and power consumption.

To illustrate the proposed segmentation algorithm, we start with the snapshot in Figure 3 from our user studies. Both the 3-axis accelerometer data and stretch sensor data are preprocessed using

Table 1: List of activities used in the HAR framework

• Drive (D)  • Jump (J)  • Lie Down (L)
• Sit (S)  • Stand (Sd)  • Walk (W)
• Transition (T) between the activities

Figure 3: Illustration of the segmentation algorithm.

a moving average filter similar to prior studies. The unit of acceleration is already normalized to the gravitational acceleration. The stretch sensor outputs a capacitance value which changes as a function of its state. This value ranges from around 390 pF (neutral) to close to 500 pF when it is stretched [23]. Therefore, we normalize the stretch sensor output by subtracting its neutral value and scaling by a constant: s(t) = [s_raw(t) − min(s_raw)] / S_const. We adopted S_const = 8 to obtain a range comparable to the accelerometer. First, we note that the 3-axis accelerometer data exhibits significantly larger variations compared to the normalized stretch capacitance. Therefore, decisions based on accelerations are prone to false hits [9]. In contrast, we propose a robust solution which generates the segments marked with red * markers in Figure 3.

The boundaries between different activities can be identified by detecting the deviation of the stretch sensor from its neutral value. For example, the first segment in Figure 3 corresponds to a step during walking. The sensor value starts increasing from a local minimum to a peak at the beginning of the step. The beginning of the second segment (t ≈ 21 s) exhibits similar behavior, since it is another step. Although the second step is followed by a longer neutral period (the user stops and sits on a chair at t ≈ 23 s), the beginning of the next segment is still marked by a rise from a local minimum. In general, we can observe a distinct minimum (a fall followed by a rise, as in walking) or a flat period followed by a rise (as in the walk-to-sit transition) at the boundaries of different activity windows. Therefore, the proposed segmentation algorithm monitors the derivative of the stretch sensor to detect the activity boundaries.

We employ the 5-point derivative formula given below to track the trend of the sensor value:

s′(t) = [s(t − 2) − 8s(t − 1) + 8s(t + 1) − s(t + 2)] / 12    (1)

where s(t) and s′(t) are the stretch sensor value and its derivative at time step t, respectively. When the derivative is positive, we know that the stretch value is increasing. Similarly, a negative value means a decrease, and s′(t) = 0 implies a flat region. Looking at a single data point can catch sudden peaks and lead to false alarms. To improve the robustness, one can look at multiple consecutive data points before determining the trend. In our implementation, we conclude that the trend changes only if the last three derivatives consistently signal the new trend. For example, if the current trend is flat, we require that the derivative is positive for three consecutive data points to filter glitches in the data. Whenever we


Figure 4: Illustration of the sensor data segmentation. (The figure shows the segmented 3-axis accelerometer and stretch sensor traces, labeled Jump, Walk, Sit, Stand and Transition.)

detect that the trend changes from flat or decreasing to positive, we produce a new segment. Finally, we bound the window size from below and above to prevent excessively short or long windows. We start looking for a new segment only if a minimum duration (one second in this work) passes after starting a new window. Besides preventing unnecessarily small segments, this approach saves computation time. Similarly, a new segment is generated automatically after exceeding an upper threshold. This choice improves robustness in case a local minimum is missed. We use tmax = 3 s as the upper bound, since it is long enough to cover all transitions.
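The segmentation rules above can be sketched as follows. This is a simplified sketch, not the authors' MCU implementation: a boundary is emitted when a rise is confirmed by three consecutive positive values of the Equation 1 derivative after the one-second minimum has elapsed, or automatically at the three-second maximum. The function and parameter names are ours:

```python
def five_point_derivative(s, t):
    """5-point central difference of Equation 1."""
    return (s[t - 2] - 8 * s[t - 1] + 8 * s[t + 1] - s[t + 2]) / 12.0

def segment(s, fs, t_min=1.0, t_max=3.0):
    """Return segment start indices for normalized stretch samples s
    sampled at fs Hz. A new segment starts when a rising trend is
    confirmed by three consecutive positive derivatives (after at least
    t_min seconds), or automatically after t_max seconds."""
    n_min, n_max = int(t_min * fs), int(t_max * fs)
    bounds, start, rising = [0], 0, 0
    for t in range(2, len(s) - 2):
        if t - start >= n_max:          # force a boundary at the 3 s bound
            bounds.append(t)
            start, rising = t, 0
            continue
        if t - start < n_min:           # within the 1 s blackout: not looking
            continue
        rising = rising + 1 if five_point_derivative(s, t) > 0 else 0
        if rising == 3:                 # confirmed flat/fall -> rise
            bounds.append(t)
            start, rising = t, 0
    return bounds
```

For example, a trace that is flat for 1.5 s and then rises steadily at 10 Hz produces a boundary shortly after the rise is confirmed.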

Figure 4 shows the segmented data for the complete duration of the illustrative example given in Figure 3. The proposed approach is able to clearly segment each step of the walk. Moreover, it captures the transitions from walking to sitting and from sitting to standing very well. This segmentation allows us to extract meaningful features from the sensor data, as described in the next section.

3.3 Feature Generation
To achieve a high classification accuracy, we need to choose representative features that capture the underlying movements. We start by noting that human movements typically do not exceed 10 Hz. Since statistical features, such as mean and variance, are not necessarily representative, we focus on FFT and DWT coefficients, which have clear frequency interpretations. Prior studies typically choose the largest transform coefficients [29] to preserve the maximum signal power, as in compression algorithms. However, sorting loses the frequency connotation, besides using valuable computational resources. Instead, we focus on the coefficients in the frequency bins of interest by preserving the number of data samples in each segment, as described next.
Stretch sensor features: The stretch sensor shows a periodic pattern for walking, and remains mostly constant during sitting and standing, as shown in Figure 4. As the level of activity changes, the segment duration varies in the (1, 3] second interval. We can preserve a 10 Hz sampling rate for the longest duration (3 s during low activity) if we maintain 2^5 = 32 data samples per segment. As the level of activity intensifies, the effective sampling rate grows to 32 Hz, which is sufficient to capture human movements. We choose a power of 2, since it enables efficient FFT computation in real time. When the segment has more than 32 samples due to a larger sensor sampling rate, we first sub-sample and smooth the input data as follows:

ss[k] = (1 / 2SR) Σ_{i=−SR}^{SR} s(kSR + i),   0 ≤ k < 32    (2)

where SR = ⌊N/32⌋ is the subsampling rate, and ss[k] is the sub-sampled and smoothed data point. When there are fewer than 32 samples, we simply pad the segment with zeros.
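A minimal sketch of Equation 2 follows. Boundary handling is our own assumption (the averaging window is clipped at the segment edges, while the 1/(2·SR) normalization of the equation is kept), and the function name is ours:

```python
def subsample_smooth(seg, n_out=32):
    """Sub-sample a segment to n_out points per Equation 2: average the
    2*SR+1 samples around every SR-th sample, dividing by 2*SR as in the
    paper. Segments shorter than n_out are zero-padded instead."""
    n = len(seg)
    if n < n_out:                        # fewer samples: pad with zeros
        return list(seg) + [0.0] * (n_out - n)
    sr = n // n_out                      # SR = floor(N / 32)
    out = []
    for k in range(n_out):
        lo = max(0, k * sr - sr)         # clip window at segment edges
        hi = min(n, k * sr + sr + 1)
        out.append(sum(seg[lo:hi]) / (2 * sr))
    return out
```

For a 64-sample segment, SR = 2, so each output point averages a 5-sample neighborhood (3 samples at the left edge).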

After standardizing the size, we take the FFT of the current window and the previous window. We use two windows since this allows us to capture repetitive patterns in the data. With a 32 Hz sampling rate during high-activity regions, we cover up to Fs/2 = 16 Hz per the Nyquist theorem. We observe that the leading 16 FFT coefficients, which cover the 0-8 Hz frequency range, carry most of the signal power in our experimental data. Therefore, they are used as features in our classifiers. The level of the stretch sensor also gives useful information. For instance, it can reliably differentiate sitting from standing. Hence, we also add the minimum and maximum value of the stretch sensor to the feature set.
Accelerometer features: Acceleration data contains faster changes compared to the stretch data, even when the underlying human motion is slow. Therefore, we sub-sample and smoothen the acceleration to 2^6 = 64 points following the same procedure given in Equation 2. Three-axis accelerometers provide accelerations ax, ay and az along the x-, y- and z-axes, respectively. In addition, we compute the body acceleration excluding the effect of gravity g as b_acc = √(ax² + ay² + az²) − g, since it carries useful information.

The discrete wavelet transform is an effective method to recursively divide the input signal into approximation (Ai) and detail (Di) coefficients. One can decompose the input signal into log2 N levels, where N is the number of data points. After one level of decomposition, the A1 coefficients in our data correspond to the 0-32 Hz band, while the D1 coefficients cover the 32-64 Hz band. Since the former is more than sufficient to capture acceleration due to human activity, we only compute and preserve the A1 coefficients with O(N/2) complexity. The number of features could be further reduced by computing lower-level coefficients and preserving the largest ones. As shown in the performance break-down in Table 5, using the features in the ANN computations takes less time than computing the DWT coefficients. Moreover, keeping more coefficients and preserving their order maintains the shape of the underlying data.
Feature Overview: In summary, we use the following features:
Stretch sensor: We use 16 FFT coefficients and the minimum and maximum values in each segment. This results in 18 features.
Accelerometer: We use 32 DWT coefficients each for ax, az and b_acc. In our experiments, we use only the mean value of ay, since no activity is expected in the lateral direction, and b_acc already captures its effect given the other two directions. This results in 97 features.
General features: The length of the segment also carries important information, since the number of data points in each segment is normalized. Similarly, the activity in the previous window is useful to detect transitions. Therefore, we also add these two features to obtain a total of 117 features.
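The feature computations above can be sketched as follows. Two details are our own assumptions, since the text does not specify them: the wavelet family (the sketch uses Haar, whose A1 coefficients are scaled pairwise averages) and the use of FFT magnitudes rather than complex coefficients:

```python
import numpy as np

def haar_approx(x):
    """Single-level Haar DWT approximation coefficients A1 with O(N/2)
    work: scaled pairwise averages that retain the low-frequency
    half-band of the input segment."""
    x = np.asarray(x, dtype=float)
    return (x[0::2] + x[1::2]) / np.sqrt(2.0)

def stretch_features(prev_seg, cur_seg):
    """18 stretch features: 16 leading FFT magnitudes of the two
    concatenated 32-sample windows, plus the min and max level."""
    x = np.concatenate([prev_seg, cur_seg])          # 64 samples
    lead = np.abs(np.fft.rfft(x))[:16]               # leading bins (0-8 Hz)
    return np.concatenate([lead, [np.min(cur_seg), np.max(cur_seg)]])
```

A 64-point accelerometer segment thus yields 32 A1 coefficients per axis, matching the feature counts listed above.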

3.4 Supervised Learning for State Classification
In the offline phase of our framework, the feature set is assigned a label corresponding to the user activity. Then, a supervised learning technique takes the labeled data to train a classifier which is used at runtime. Since one of our major goals is online training using reinforcement learning, we employ a cost-optimized artificial neural


network (ANN). We also compare our solution to the classifiers most commonly used by prior work, and provide brief explanations.
Support Vector Machine (SVM): SVM [13] finds a hyperplane that can separate the feature vectors of two output classes. If a separating hyperplane does not exist, SVM maps the data into higher dimensions until a separating hyperplane is found. Since SVM is a two-class classifier, multiple classifiers need to be trained to recognize more than two output classes. Due to this, SVM is not suitable for reinforcement learning with multiple classes [20], which is the case in our HAR framework.
Random Forests and Decision Trees: Random forests [13] use an ensemble of tree-structured classifiers, where each tree independently predicts the output class as a function of the feature vector. Then, the class which is predicted most often is selected as the final output class. The C4.5 decision tree [28] is another commonly used classifier for HAR. Instead of using multiple trees, C4.5 uses a single tree. Random forests typically show a higher accuracy than decision trees, since they evaluate multiple decision trees. Reinforcement learning using random forests has recently been investigated in [24]. As part of the reinforcement learning process, additional trees are constructed and then a subset of trees is chosen to form the new random forest. This adds additional processing and memory requirements, making it unsuitable for implementation on a wearable system with limited memory.
k-Nearest Neighbors (k-NN): k-nearest neighbors [13] is one of the most popular techniques used by many previous HAR studies. k-NN evaluates the output class by first finding the k nearest neighbors in the training dataset. Then, it chooses the class that is most common among the k neighbors and assigns it as the output class. This requires storing all the training data locally. Since storing the training data on a wearable device with limited memory is not feasible, k-NN is not suitable for online training.
Proposed ANN Classifier: We use the artificial neural network shown in Figure 5 as our classifier. The input layer processes the features denoted by X, and relays them to the hidden layer with ReLU activation. It is important to choose an appropriate number of neurons (Nh) in the hidden layer to achieve a good accuracy while keeping the computational complexity low. To obtain the best trade-off, we evaluate the recognition accuracy and memory requirements as a function of the number of neurons, as detailed in Section 5.2.

The output layer includes a neuron for each activity $a_i \in \mathcal{A} = \{D, J, L, S, Sd, W, T\}$, $1 \le i \le N_A$, where $N_A$ is the number of activities in set $\mathcal{A}$, which are listed in Table 1. The output neuron for activity $a_i$ computes $O_{a_i}(\mathbf{X}, \theta_{in}, \theta)$ as a function of the input features $\mathbf{X}$ and the weights of the ANN. To facilitate the policy gradient approach described in Section 4, we express the output $O_{a_i}$ in terms of the hidden layer outputs as:

$$O_{a_i}(\mathbf{X}, \theta_{in}, \theta) = O_{a_i}(\mathbf{h}, \theta) = \sum_{j=1}^{N_h+1} h_j \theta_{j,i}, \quad 1 \le i \le N_A \qquad (3)$$

where $h_j$ is the output of the $j$th neuron in the hidden layer, and $\theta_{j,i}$ is the weight from the $j$th neuron to output activity $a_i$. Note that $h_j$ is a function of $\mathbf{X}$ and $\theta_{in}$. The summation goes to $N_h + 1$, since there are $N_h$ neurons and one bias term in the hidden layer.

Figure 5: The ANN used for the activity classifier and reinforcement learning (input features X, one hidden layer with bias, and a softmax output layer).

After computing the output functions, we use the softmax activation function to obtain the probability of each activity:

$$\pi(a_i \mid \mathbf{h}, \theta) = \frac{e^{O_{a_i}(\mathbf{h}, \theta)}}{\sum_{j=1}^{N_A} e^{O_{a_j}(\mathbf{h}, \theta)}}, \quad 1 \le i \le N_A \qquad (4)$$

We express $\pi(a_i \mid \mathbf{h}, \theta)$ as a function of the hidden layer outputs $\mathbf{h}$ instead of the input features, since our reinforcement learning algorithm will leverage it. Finally, the activity with the maximum probability is chosen as the output.

Implementation cost: Our optimized classifier requires 264 multiplications for the FFT of the stretch data, $118 N_h + (N_h + 1) N_A$ multiplications for the ANN, and uses only 2 kB of memory.
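As a concrete illustration, the forward pass of Equations 3 and 4 can be sketched as below. This is a minimal sketch, not the deployed fixed-point MCU code: the 10-feature input size and the random weights are placeholders, and the max-subtraction in the softmax is a standard numerical-stability trick not stated in the paper.

```python
import numpy as np

ACTIVITIES = ["Drive", "Jump", "LieDown", "Sit", "Stand", "Walk", "Transition"]

def forward(x, theta_in, theta):
    """One forward pass of the single-hidden-layer ANN.

    x        : input feature vector X, shape (n_features,)
    theta_in : input->hidden weights incl. bias row, shape (n_features + 1, N_h)
    theta    : hidden->output weights incl. bias row, shape (N_h + 1, N_A)
    """
    # Hidden layer with ReLU activation; append 1.0 for the bias input.
    x_b = np.append(x, 1.0)
    h = np.maximum(0.0, x_b @ theta_in)        # N_h hidden outputs
    h_b = np.append(h, 1.0)                    # N_h + 1 entries, as in Eq. 3

    # Output functions O_ai (Eq. 3), then softmax probabilities (Eq. 4).
    o = h_b @ theta
    e = np.exp(o - o.max())                    # subtract max for stability
    pi = e / e.sum()
    return h_b, pi

rng = np.random.default_rng(0)
x = rng.normal(size=10)                        # placeholder feature vector
theta_in = rng.normal(size=(11, 4))            # N_h = 4 hidden neurons
theta = rng.normal(size=(5, 7))                # N_A = 7 activities
h_b, pi = forward(x, theta_in, theta)
print(ACTIVITIES[int(np.argmax(pi))])
```

The hidden outputs `h_b` are returned alongside the probabilities because the online update in Section 4 reuses them.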

4 ONLINE LEARNING WITH POLICY GRADIENT

The trained ANN classifier is implemented on the IoT device to recognize the human activities in real time. In addition to online activity recognition, we employ policy gradient based reinforcement learning (RL) to continue training the classifier in the field. Online training improves the recognition accuracy for new users by as much as 33%, as demonstrated in our user studies. We use the following definitions for the state, action, policy, and the reward.

State: Stretch sensor and accelerometer readings within a segment are used as the continuous state space. We process them as described in Section 3.3 to generate the input feature vector $\mathbf{X}$ (Figure 5).

Policy: The ANN processes the input features as shown in Figure 5 to generate the hidden layer outputs $\mathbf{h} = \{h_j, 1 \le j \le N_h + 1\}$ and the activity probabilities $\pi(a_i \mid \mathbf{h}, \theta)$, i.e., the policy given in Equation 4.

Action: The activity performed in each sensor data segment is interpreted as the action in our RL framework. It is given by $\arg\max_i \pi(a_i \mid \mathbf{h}, \theta)$, i.e., the activity with the maximum probability.

Reward: Online training requires user feedback, which is defined as the reward function. When no feedback is provided by the user, the weights of the network remain the same. The user can give feedback upon completion of an activity, such as walking, which contains multiple segments (i.e., non-uniform action windows). If the classification in this period is correct, a positive reward (in our implementation +1) is given. Otherwise, the reward is negative (-1). We define the sequence of segments for which a reward is given as an epoch. The set of epochs in a given training session is called an episode, following the RL terminology [32].

Objective: The value function for a state is defined as the total reward that can be earned starting from that state and following the given policy until the end of an episode. Our objective is to maximize the total reward as a function of the classifier weights $\theta$.


Proposed Policy Gradient Update: In general, all the weights in the policy network can be updated after an epoch [32]. This is useful when we start with an untrained network with random weights. When a policy network is trained offline, as in our example, its first few layers generate broadly applicable intermediate features [21]. Consequently, we can update only the weights of the output layer to take advantage of offline training and minimize the computation cost. More precisely, we update the weights denoted by $\theta$ in Figure 5 to tune our optimized ANN to individual users.

Since we use the value function as the objective, the gradient of the objective is proportional to the gradient of the policy [32]. Using this result, the update equation for $\theta$ is given as:

$$\theta_{t+1} = \theta_t + \alpha r_t \frac{\nabla_\theta\, \pi(a_t \mid \mathbf{h}, \theta_t)}{\pi(a_t \mid \mathbf{h}, \theta_t)}, \quad \alpha: \text{learning rate} \qquad (5)$$

where $\theta_t$ and $\theta_{t+1}$ are the current and updated weight matrices, respectively. Similarly, $a_t$ is the current action at time $t$, $r_t$ is the corresponding reward, and $\mathbf{h}$ denotes the hidden layer outputs. Hence, we need to compute the gradient of the policy to update the weights. To facilitate this computation and the partial update, we partition the weights into two disjoint sets $S_t$ and $\bar{S}_t$. The weights that connect to the output $O_{a_t}$ corresponding to the current action are in $S_t$. The rest of the weights belong to the complementary set $\bar{S}_t$. With this definition, we summarize the weight update rule in a theorem in order not to disrupt the flow of the paper with derivations. Interested readers can go through the proof.

Weight Update Theorem: Given the current policy, reward, and the learning rate $\alpha$, the weights in the output layer of the ANN given in Figure 5 are updated online as follows:

$$\theta_{t+1,j,i} = \begin{cases} \theta_{t,j,i} + \alpha r_t \left(1 - \pi(a_t \mid \mathbf{h}, \theta_t)\right) h_j & \theta_{t,j,i} \in S_t \\ \theta_{t,j,i} - \alpha r_t\, \pi(a_i \mid \mathbf{h}, \theta_t)\, h_j & \theta_{t,j,i} \in \bar{S}_t \end{cases} \qquad (6)$$

Proof: The partial derivative of the policy $\pi(a_t \mid \mathbf{h}, \theta)$ with respect to the weight $\theta_{j,i}$ can be expressed using the chain rule as:

$$\frac{\partial \pi(a_t \mid \mathbf{h}, \theta)}{\partial \theta_{j,i}} = \frac{\partial \pi(a_t \mid \mathbf{h}, \theta)}{\partial O_{a_i}(\mathbf{h}, \theta)} \cdot \frac{\partial O_{a_i}(\mathbf{h}, \theta)}{\partial \theta_{j,i}} \qquad (7)$$

where $1 \le j \le N_h + 1$ and $1 \le i \le N_A$. When $\theta_{t,j,i} \in S_t$, action $a_t$ corresponds to the output $O_{a_t}(\mathbf{h}, \theta)$. Hence, we can express the first partial derivative using Equation 4 as follows:

$$\frac{\partial \pi(a_t \mid \mathbf{h}, \theta)}{\partial O_{a_t}(\mathbf{h}, \theta)} = \frac{e^{O_{a_t}(\mathbf{h}, \theta)}}{\sum_{j=1}^{N_A} e^{O_{a_j}(\mathbf{h}, \theta)}} - \frac{\left(e^{O_{a_t}(\mathbf{h}, \theta)}\right)^2}{\left(\sum_{j=1}^{N_A} e^{O_{a_j}(\mathbf{h}, \theta)}\right)^2} = \pi(a_t \mid \mathbf{h}, \theta)\left(1 - \pi(a_t \mid \mathbf{h}, \theta)\right) \qquad (8)$$

Otherwise, i.e., $\theta_{t,j,i} \in \bar{S}_t$, the derivative is taken with respect to another output. Hence, we can find the partial derivative as:

$$\frac{\partial \pi(a_t \mid \mathbf{h}, \theta)}{\partial O_{a_i}(\mathbf{h}, \theta)} = -\frac{e^{O_{a_t}(\mathbf{h}, \theta)}\, e^{O_{a_i}(\mathbf{h}, \theta)}}{\left(\sum_{j=1}^{N_A} e^{O_{a_j}(\mathbf{h}, \theta)}\right)^2} = -\pi(a_t \mid \mathbf{h}, \theta)\, \pi(a_i \mid \mathbf{h}, \theta) \qquad (9)$$

The second partial derivative in Equation 7, $\partial O_{a_i}(\mathbf{h}, \theta) / \partial \theta_{j,i}$, can be easily computed as $h_j$ using Equation 3. The weight update is the product of the learning rate $\alpha$, the reward $r_t$, $h_j$, and the partial derivative of the policy with respect to the output functions. For the weights $\theta_{t,j,i} \in S_t$, we use the partial derivative in Equation 8. For the remaining weights, we use Equation 9. Hence, we obtain the first and second lines in Equation 6, respectively. Q.E.D.

In summary, the weights of the output layer are updated online using Equation 6 after user feedback. Detailed results for the improvement in accuracy using RL are presented in Section 5.3.
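The update in Equation 6 can be sketched as follows. This is a minimal sketch of the rule, not the MCU implementation; the learning rate, dimensions, and feedback values are illustrative assumptions.

```python
import numpy as np

def policy_gradient_update(theta, h_b, pi, action, reward, alpha=0.01):
    """Apply the Equation 6 update to the hidden->output weights.

    theta  : weight matrix, shape (N_h + 1, N_A)
    h_b    : hidden layer outputs incl. bias term, shape (N_h + 1,)
    pi     : softmax activity probabilities, shape (N_A,)
    action : index of the recognized activity a_t
    reward : +1 for correct user feedback, -1 otherwise
    """
    theta = theta.copy()
    # Weights in S_t (column of the taken action): +alpha*r*(1 - pi(a_t))*h_j.
    theta[:, action] += alpha * reward * (1.0 - pi[action]) * h_b
    # Weights in the complementary set: -alpha*r*pi(a_i)*h_j for other outputs.
    for i in range(theta.shape[1]):
        if i != action:
            theta[:, i] -= alpha * reward * pi[i] * h_b
    return theta
```

With a positive reward, the column of the recognized activity is reinforced in proportion to $(1 - \pi(a_t))$, so confidently correct predictions change the weights little, while the other columns are suppressed in proportion to their probabilities.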

5 EXPERIMENTAL EVALUATION

5.1 Experimental Setup

Wearable System Setup: The proposed HAR framework is implemented on the TI-CC2650 [33] IoT device, which includes a motion processing unit. It also integrates a radio that runs the Bluetooth Low Energy (BLE) protocol. This device is placed on the ankle, since this allows for maximum swing in the accelerometer [15]. The users wear the flexible stretch sensor on the right knee to capture the knee movements of the user. In our current implementation, the stretch sensor transmits its output to the IoT device over BLE to provide flexibility in placement. To synchronize the sensors, we record the wall clock time of each sensor at the beginning of the experiment. Then, we compute the offset between the sensors and use this offset to align the sensor readings, as proposed in [30].
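The offset-based alignment described above can be sketched as follows. This is a simplified illustration with made-up timestamps; the actual synchronization scheme follows [30], and a constant clock offset is an assumption here.

```python
def align_timestamps(stretch_t0, accel_t0, accel_samples):
    """Shift accelerometer timestamps into the stretch sensor's clock.

    stretch_t0, accel_t0 : wall-clock start times (seconds) recorded at
                           the beginning of the experiment for each sensor
    accel_samples        : list of (timestamp, value) pairs in the
                           accelerometer's local clock
    """
    offset = stretch_t0 - accel_t0   # constant offset between the two clocks
    return [(t + offset, v) for t, v in accel_samples]

# Example: the accelerometer clock started 0.25 s behind the stretch sensor.
aligned = align_timestamps(100.25, 100.00, [(100.000, 0.1), (100.004, 0.2)])
```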

All processing is performed on the IoT device. The recognized activities, such as walk, and their time durations are recorded by the device. This data is then transmitted to a host, such as a smartphone, for debugging and offline analysis.

Parameter Selection: We use the default sampling frequencies: 100 Hz for the stretch sensor and 250 Hz for the accelerometer. We did not experiment with lower sampling frequencies for two reasons. First, it did not produce any significant power savings. Second, we already sub-sample and smooth the sensor readings during feature generation (Section 3.3). The moving average filter used for preprocessing computes averages over five sensor readings.

User Studies: We evaluate the accuracy of the proposed approach using data from nine users, as summarized in Table 2. Data from only five of them are employed during the training phase. This data is divided into 80% training/validation and 20% test, following common practice. The rest of the user data is saved for evaluating only the online reinforcement learning framework. Each user performs the activities listed in Table 1 while wearing the sensors. For example, the illustration in Figure 4 is from an experiment where the user jumps, takes 10 steps, sits on a chair, and finally stands up. The experiments vary from 21 seconds to 6 minutes in length and have different compositions of activities. We report results from 58 different experiments with a total duration of 100 minutes, as summarized in Table 2. After each experiment, the segmentation algorithm presented in Section 3.2 is used to identify non-uniform activity windows. This results in 2614 unique segments in our experimental data. Then, each window is labeled manually by visual inspection by four human experts. Finally, the labeled data is used for offline training. Comparing specific HAR approaches is challenging, since data is collected using different platforms, sensors and settings. Therefore, we compare our results with all commonly

Table 2: Summary of user studies

Users   Unique Experiments   No. of Segments   Duration (min)
9       58                   2614              100


used classifiers in the next section. We also release the labeled experimental data to the public (blind-url) to enable other researchers to make comparisons using a common data set.

Figure 6: Recognition accuracy (left axis) and memory usage (right axis) as a function of the number of neurons in the hidden layer.
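As a cross-check on the memory figures discussed in Section 5.2, one can count the ANN weights directly. The 117-feature input size below is an inference from the $118 N_h$ multiplication count in Section 3.4 (117 features plus one bias input) and is an assumption, as is the 4-byte-per-weight storage.

```python
def ann_memory_bytes(n_hidden, n_features=117, n_activities=7):
    """Approximate weight storage for the single-hidden-layer ANN.

    Assumes 117 input features (inferred from the 118*N_h multiplication
    count), one bias per layer, and 32-bit (4-byte) weights.
    """
    n_weights = (n_features + 1) * n_hidden + (n_hidden + 1) * n_activities
    return 4 * n_weights

print(ann_memory_bytes(4))                        # about 2 kB for N_h = 4
print(ann_memory_bytes(5) - ann_memory_bytes(4))  # about 500 B per extra neuron
```

Under these assumptions, $N_h = 4$ gives 2028 bytes, matching the reported 2 kB, and each additional hidden neuron adds 500 bytes, matching the linear growth seen in Figure 6.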

5.2 Training by Supervised Learning

We use an artificial neural network to perform online activity recognition and training. The ANN has to be implemented on the wearable device with a limited memory (in our case 20 kB). Therefore, it should have a small memory footprint, i.e., number of weights, while giving a high recognition accuracy. To achieve robust online weight updates during reinforcement learning, we first fix the number of hidden layers to one. Then, we vary the number of neurons in the hidden layer to study the effect on the accuracy and memory requirements. Specifically, we varied the number of hidden layer neurons from one to seven. Note that the number of neurons in the output layer remains constant, as we do not change the number of activities being recognized. Figure 6 shows the recognition accuracy (left axis) and memory requirements (right axis) of the network as a function of the number of neurons in the hidden layer. We observe that the accuracy is only about 80% when a single neuron is used in the hidden layer. As we increase the number of neurons, both the memory requirements and accuracy increase. The accuracy starts saturating after the third neuron, while the number of weights and memory requirements keep increasing. In fact, the increase in memory requirement is linear, with an increase of around 500 bytes for every additional neuron in the hidden layer. Thus, there is a trade-off between the memory requirements and accuracy. In our HAR framework, we choose an ANN with four neurons in the hidden layer, as it gives an overall accuracy of about 97.7% and has a memory requirement of 2 kB, leaving the rest of the memory for the operating system and other tasks.

5.2.1 Confusion Matrix

We analyze the accuracy of recognizing each activity in our experiment in Table 3. There is one column and one row corresponding

Table 3: Confusion matrix for 5 training users

          Drive   Jump    LieDown  Sit     Stand   Walk    Transition
D (155)   99.4%   0.00    0.00     0.00    0.00    0.00    0.6%
J (181)   0.00    93.4%   0.00     0.00    1.1%    3.9%    1.6%
L (204)   0.00    0.00    100%     0.00    0.00    0.00    0.00
S (394)   0.25%   0.25%   0.00     97.7%   0.76%   0.00    1.0%
Sd (350)  0.00    0.29%   0.00     0.00    98.6%   1.1%    0.00
W (806)   0.00    0.50%   0.00     0.00    0.62%   98.5%   0.37%
T (127)   0.00    3.1%    0.79%    2.4%    0.79%   2.4%    90.5%

to the activities of interest. The numbers on the diagonal show the recognition accuracy for each activity. For example, the entry in the first row and first column shows that driving is recognized with 99.4% accuracy. According to the first row, only 0.6% of the driving activity windows are classified falsely as "Transition". To also provide the absolute numbers, the number in parentheses in each row label shows the total number of activity windows with the corresponding label. For instance, a total of 155 windows were labeled "Drive" according to row 1.

We achieve an accuracy greater than 97% for five of the seven activities. The accuracy is slightly lower for jump because it is more dynamic than all the other activities. Moreover, there is a higher variability in the jump patterns of each user, leading to slightly lower accuracy. It is also harder to recognize transitions due to the fact that each transition segment contains features of two activities. This can lead to a higher confusion for the ANN, but we still achieve more than 90% accuracy. We also note that the loss in accuracy is acceptable for transitions, since we can indirectly infer a transition by looking at the segments before and after it.

5.2.2 Comparison with Other Classifiers

It is not possible to do a one-to-one comparison with existing approaches because they use different devices, data sets and activities. Therefore, we use our data set with the commonly used classifiers described in Section 3.4. The results are summarized in Table 4. Although we use only a single hidden layer and minimize the number of neurons, our implementation achieves competitive test and overall accuracy compared to the other classifiers. We also emphasize that our ANN is used for both online classification and training on the IoT device.

Table 4: Comparison of accuracy for di�erent classi�ers

Classifier      Train Acc. (%)   Test Acc. (%)   Overall Acc. (%)
Random Forest   100.00           94.58           98.92
C4.5            99.09            93.90           98.05
k-NN            100.00           94.80           98.96
SVM             97.68            95.03           97.15
Our ANN         98.53            94.36           97.70

5.3 Reinforcement Learning with New Users

The ANN obtained in the offline training stage is used to recognize the activities of four new users previously unseen by the network. This capability provides a real-world evaluation of the approach, since a device cannot be trained for all possible users. Due to variations in usage patterns, it is possible that the initial accuracy for a new user is low. Indeed, the initial accuracy for users 6 and 9 is only about 60 to 70%. Therefore, we use reinforcement learning with policy gradients to continuously adapt the HAR system to each user. Figure 7 shows the improvement achieved using reinforcement learning for four users. Each episode on the x-axis corresponds to an iteration of RL using the data set for the new users. The weights of the ANN are updated after each segment as a function of the user feedback for a total of 100 episodes. Moreover, we perform 5 independent runs, each consisting of 100 epochs, to obtain the average accuracy of the ANN at each episode. We observe a consistent improvement in accuracy for all four users. The accuracy for users 6 and 9 starts low and increases to about 93% after about 20 episodes. User 8



Figure 7: Reinforcement learning results for four new users.

starts with a higher accuracy of about 85%. The accuracy increases quickly to about 98% after 10 episodes. In summary, reinforcement learning allows us to improve the accuracy for users not previously seen by the network. This ensures that the device can adapt to new users very easily.

5.4 Power, Performance and Energy Evaluation

To fully assess the cost of the proposed HAR framework, we present a detailed breakdown of the execution time, power consumption and energy consumption of each step. The first part of HAR involves data acquisition from the sensors and segmentation. The segmentation algorithm runs continuously while the data is being acquired. Therefore, we include its energy consumption in the sensing block. Table 5 shows the power and energy consumption for a typical segment of 1.5 s. The average power consumption for the data acquisition is 1.13 mW, leading to a total energy consumption of 1695 µJ. If the segments are of a longer duration, the energy consumption for data sensing increases linearly. Following the data segmentation, we extract the features and run the classifier. The execution time, power and energy for these blocks are shown in the "Compute" rows of Table 5. As expected, the FFT block has the largest execution time and energy consumption. However, it is still two orders of magnitude lower than the duration of a typical segment. Finally, the energy consumption of the BLE communication block is given in the last row of Table 5. Since we transmit only the inferred activity, the energy consumed by the BLE communication is only about 43 µJ. In summary, with less than 12.5 mW average power consumption, our approach enables close to 60 hours of uninterrupted operation using a 200 mAh @ 3.7 V battery [11].

Table 5: Execution time, power and energy consumption

Block                  Exe. Time (ms)   Average Power (mW)   Energy (µJ)
Sense: Read/Segment    1500.00          1.13                 1695.00
Compute: DWT           7.90             9.50                 75.05
Compute: FFT           17.20            11.80                202.96
Compute: ANN           2.50             12.90                32.25
Compute: Overall       27.60            11.24                310.26
Comm.: BLE             8.60             5.00                 43.00
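The close-to-60-hour figure follows directly from the 12.5 mW power budget and the 200 mAh @ 3.7 V battery [11]; this sketch only reproduces that arithmetic and ignores battery non-idealities:

```python
def battery_life_hours(capacity_mah, voltage_v, avg_power_mw):
    """Battery life = stored energy (mAh * V = mWh) / average power (mW)."""
    return capacity_mah * voltage_v / avg_power_mw

# 200 mAh at 3.7 V stores 740 mWh; at 12.5 mW this lasts 59.2 hours.
print(round(battery_life_hours(200, 3.7, 12.5), 1))
```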

6 CONCLUSIONS

We presented a HAR framework on a wearable IoT device using stretch and accelerometer sensors. The first step of our solution is a

novel technique to segment the sensor data non-uniformly as a function of the user motion. Then, we generate FFT and DWT features using the segmented data. Finally, these features are used for online inference and training using an ANN. Our solution is the first to perform online training. Experiments on the TI-CC2650 MCU with nine users show 97.7% accuracy in identifying six activities and their transitions with less than 12.5 mW power consumption.

REFERENCES

[1] Parkinsons Disease Digital Biomarker DREAM Challenge. [Online] https://www.synapse.org/#!Synapse:syn8717496/wiki/. Accessed 04/15/2018.
[2] World Health Organization, Obesity and Overweight. Fact Sheets, 2013. [Online] http://www.who.int/mediacentre/factsheets/fs311/en/. Accessed 03/22/2018.
[3] D. Anguita, A. Ghio, L. Oneto, F. X. Llanas Parra, and J. L. Reyes Ortiz. Energy Efficient Smartphone-Based Activity Recognition Using Fixed-Point Arithmetic. J. of Universal Comput. Sci., 19(9):1295–1314.
[4] M. Arif, M. Bilal, A. Kattan, and S. I. Ahamed. Better Physical Activity Classification Using Smartphone Acceleration Sensor. J. of Med. Syst., 38(9):95, 2014.
[5] L. Bao and S. S. Intille. Activity Recognition From User-Annotated Acceleration Data. In Int. Conf. on Pervasive Comput., pages 1–17, 2004.
[6] A. G. Bonomi, A. H. Goris, B. Yin, and K. R. Westerterp. Detection of Type, Duration, and Intensity of Physical Activity Using an Accelerometer. Medicine & Science in Sports & Exercise, 41(9):1770–1777, 2009.
[7] J. Bort-Roig, N. D. Gilson, A. Puig-Ribera, R. S. Contreras, and S. G. Trost. Measuring and Influencing Physical Activity With Smartphone Technology: A Systematic Review. Sports Medicine, 44(5):671–686, 2014.
[8] M. A. Case, H. A. Burwick, K. G. Volpp, and M. S. Patel. Accuracy of Smartphone Applications and Wearable Devices for Tracking Physical Activity Data. JAMA, 313(6):625–626, 2015.
[9] Y. Chen and C. Shen. Performance Analysis of Smartphone-Sensor Behavior for Human Activity Recognition. IEEE Access, 5:3095–3110, 2017.
[10] K. Dinesh, M. Xiong, J. Adams, R. Dorsey, and G. Sharma. Signal Analysis for Detecting Motor Symptoms in Parkinson's and Huntington's Disease Using Multiple Body-Affixed Sensors: A Pilot Study. In Image and Signal Process. Workshop, pages 1–5, 2016.
[11] DMI International Distribution Ltd. Curved Lithium Thin Cells. [Online] http://www.dmi-international.com/data%20sheets/Curved%20Li%20Polymer.pdf. Accessed 04/18/2018.
[12] A. J. Espay, P. Bonato, F. B. Nahab, W. Maetzler, J. M. Dean, J. Klucken, B. M. Eskofier, A. Merola, F. Horak, A. E. Lang, et al. Technology in Parkinson's Disease: Challenges and Opportunities. Movement Disorders, 31(9):1272–1282, 2016.
[13] J. Friedman, T. Hastie, and R. Tibshirani. The Elements of Statistical Learning, volume 1. Springer, 2001.
[14] P. Gupta and T. Dallas. Feature Selection and Activity Recognition System Using a Single Triaxial Accelerometer. IEEE Trans. Biomed. Eng., 61(6):1780–1786, 2014.
[15] N. Győrbíró, Á. Fábián, and G. Hományi. An Activity Recognition System for Mobile Phones. Mobile Networks and Appl., 14(1):82–91, 2009.
[16] Y. He and Y. Li. Physical Activity Recognition Utilizing the Built-in Kinematic Sensors of a Smartphone. Int. J. of Distrib. Sensor Networks, 9(4):481–580.
[17] R. Jafari, W. Li, R. Bajcsy, S. Glaser, and S. Sastry. Physical Activity Monitoring for Assisted Living at Home. In Int. Workshop on Wearable and Implantable Body Sensor Networks, pages 213–219, 2007.
[18] M. Kirwan, M. J. Duncan, C. Vandelanotte, and W. K. Mummery. Using Smartphone Technology to Monitor Physical Activity in the 10,000 Steps Program: A Matched Case–Control Trial. J. of Med. Internet Research, 14(2), 2012.
[19] J. R. Kwapisz, G. M. Weiss, and S. A. Moore. Activity Recognition Using Cell Phone Accelerometers. ACM SigKDD Explorations Newsletter, 12(2):74–82, 2011.
[20] M. G. Lagoudakis and R. Parr. Reinforcement Learning as Classification: Leveraging Modern Classifiers. In Proc. Int. Conf. on Machine Learning, pages 424–431, 2003.
[21] N.-Y. Liang, G.-B. Huang, P. Saratchandran, and N. Sundararajan. A Fast and Accurate Online Sequential Learning Algorithm for Feedforward Networks. IEEE Trans. Neural Netw., 17(6):1411–1423, 2006.
[22] A. Mosenia, S. Sur-Kolay, A. Raghunathan, and N. K. Jha. Wearable Medical Sensor-Based System Design: A Survey. IEEE Trans. Multi-Scale Comput. Syst., 3(2):124–138, 2017.
[23] B. O'Brien, T. Gisby, and I. A. Anderson. Stretch Sensors for Human Body Motion. In Electroactive Polymer Actuators and Devices, volume 9056, page 905618, 2014.
[24] A. Paul and D. P. Mukherjee. Reinforced Random Forest. In Proc. Indian Conf. on Comput. Vision, Graphics and Image Processing, page 1. ACM, 2016.
[25] Pérez-López et al. Dopaminergic-Induced Dyskinesia Assessment Based on a Single Belt-Worn Accelerometer. Artificial Intell. in Medicine, 67:47–56, 2016.
[26] S. Pirttikangas, K. Fujinami, and T. Nakajima. Feature Selection and Activity Recognition From Wearable Sensors. In Int. Symp. on Ubiquitous Comput. Systems, pages 516–527, 2006.
[27] S. J. Preece et al. Activity Identification Using Body-Mounted Sensors: A Review of Classification Techniques. Physiological Measurement, 30(4):R1, 2009.
[28] J. R. Quinlan. C4.5: Programs for Machine Learning. Elsevier, 2014.
[29] M. Shoaib et al. A Survey of Online Activity Recognition Using Mobile Phones. Sensors, 15(1):2059–2085, 2015.
[30] S. Sridhar, P. Misra, G. S. Gill, and J. Warrior. CheepSync: A Time Synchronization Service for Resource Constrained Bluetooth LE Advertisers. IEEE Commun. Mag., 54(1):136–143, 2016.
[31] V. Stewart, S. Ferguson, J.-X. Peng, and K. Rafferty. Practical Automated Activity Recognition Using Standard Smartphones. In Int. Conf. on Pervasive Comput. and Commun., pages 229–234, 2012.
[32] R. S. Sutton and A. G. Barto. Introduction to Reinforcement Learning. MIT Press, 2nd edition.
[33] Texas Instruments Inc. CC2650 Microcontroller. [Online] http://www.ti.com/product/CC2650. Accessed 04/18/2018.
[34] A. Wang, G. Chen, J. Yang, S. Zhao, and C.-Y. Chang. A Comparative Study on Human Activity Recognition Using Inertial Sensors in a Smartphone. IEEE Sensors J., 16(11):4566–4578, 2016.