Wearable Computing - Part III: The Activity Recognition Chain (ARC)


DESCRIPTION

Introduction to wearable computing, sensors and methods for activity recognition.

TRANSCRIPT

Page 1: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Daniel Roggen

2011

Wearable Computing - Part III

The Activity Recognition Chain (ARC)

Page 2: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

© Daniel Roggen www.danielroggen.net [email protected]

Focus: activity recognition

• Activity is a key element of context!

• Example applications: fitness coaching, location-based services (iPhone), step counters, Wii gaming, fall detection and alarms, elderly assistance

Page 3: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

There is no "drink sensor"

• Simple sensors (e.g. RFID) can provide "binary" information:
– Presence (e.g. RFID, proximity infrared sensors)
– Movement (e.g. ADXL345 accelerometer 'activity/inactivity' pin)
– Fall (e.g. ADXL345 accelerometer 'freefall' pin)

• But in general an "activity-X sensor" does not exist:
– Sensor data must be interpreted
– Multiple sensors must be correlated (data fusion)
– Several factors influence the sensor data:
• Drinking while standing: the arm reaches the object, then the mouth
• Drinking while walking: the arm moves, and so does the whole body

• Context is interpreted from the sensor data with:
– Signal processing
– Machine learning
– Reasoning

• Can be integrated into a "sensor node" or "smart sensor": sensor chip + data processing in one device

Page 4: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

User activity structure

Activities form a hierarchy across time scales (slide figure):
– Years (Year 1–3): Working / Resting cycles
– Weeks (Week 10–12): Go to work, Read mail, Meeting, Shopping, Go home
– Within a meeting: Enter, Give talk, Listen, Leave
– Within a talk: Walk, Show, Speak, Stand

Page 5: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

How to detect a presentation?

• Place:
– Conference room
– In front of an audience
– Generally at the lectern

• Sound:
– User speaks
– Maybe short interruptions
– Otherwise silence

• Motion:
– Mostly standing, with small walking motions
– Hand motion, pointing
– Typical head motion

Page 6: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Greeting

Sensor placement (slide figure): upper body, right wrist, left upper leg

Activity sequence: the person is seated, stands up, greets somebody, sits down again

Page 7: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Greeting

[Figure: acceleration signals recorded during the greeting sequence; vertical axes span -2g to +2g]

Page 8: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Data recording

[Figure: acceleration [g] vs. time [s] for the upper-body and wrist sensors during the sequence: sitting (hand on table), stand up, standing (arm motion, handshake), sit down, sitting (hand on table)]

The combination of the individual sensor signals is distinctive of the activity!

Page 9: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

How to recognize activities? With sensors on the body, in objects, in the environment, ...

1. Activities are represented by typical signal patterns in the sensor data
– Activity = movement: motion sensor (e.g. turning pages, drinking from a glass)
– Activity = sound: microphone

2. Recognition: "comparison" between a template and the sensor data
– "Drink" or "turn page" is recognized when the corresponding pattern matches

Page 10: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Recognition system characteristics

Characteristic | Type | Description
Execution | Offline | The system records the sensor data first; recognition is performed afterwards. Typically used for non-interactive applications such as activity monitoring for health-related applications.
Execution | Online | The system acquires sensor data and processes it on the fly to infer activities. Typically used for activity-based computing and interactive applications (HCI).
Recognition | Continuous | The system "spots" the occurrence of activities or gestures in streaming data. It implements data stream segmentation, classification and null-class rejection.
Recognition | Isolated / segmented | The system assumes the sensor data stream is segmented at the start and end of a gesture by an oracle; it only classifies the segments into activity classes. The oracle can be an external system in a working system (e.g. cross-modality segmentation), or the experimenter when assessing classification performance during design phases.

Page 11: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Activity recognition: learning by demonstration

• Sensor data
• 1) Train activity models from the sensor data (training data is required)
• 2) Recognition: compare incoming sensor data against the activity models ("=?") to infer the activity/context

Page 12: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Activity models

Characteristic | Type | Description
World model | Stateless | The recognition system does not model the state of the world. Activities are recognized by spotting specific sensor signals. This is currently the dominant approach when dealing with the recognition of activity primitives (e.g. reach, grasp).
World model | Stateful | The system uses a model of the world, such as the user's context or an environment map with object locations. This enhances activity recognition performance, at the expense of design-time knowledge and a more complex recognition system.

Page 13: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Assumptions

• Constant sensor-signal to activity-class mapping

• Design time: identify the sensor-signal/activity-class mapping
– Sensor setup
– Activity sets

• Run time: "low" variability
– Can't displace sensors or modify garments
– Can't change the way activities are done

Page 14: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

The activity recognition chain (ARC)

• A standard set of steps followed by most research in activity recognition (e.g. [1,2,3,4])
• Streaming signal processing
• Machine learning
• Reasoning

[1] J. Ward, P. Lukowicz, G. Tröster, and T. Starner, "Activity recognition of assembly tasks using body-worn microphones and accelerometers," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1553–1567, 2006.
[2] L. Bao and S. S. Intille, "Activity recognition from user-annotated acceleration data," in Pervasive Computing: Proc. of the 2nd Int'l Conference, Apr. 2004, pp. 1–17.
[3] D. Figo, P. C. Diniz, D. R. Ferreira, and J. M. P. Cardoso, "Preprocessing techniques for context recognition from accelerometer data," Personal and Ubiquitous Computing, vol. 14, no. 7, pp. 645–662, 2010.
[4] Roggen et al., An educational and research kit for activity and context recognition from on-body sensors, Int. Conf. on Body Sensor Networks (BSN), 2010

Page 15: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Design time: training phase
– Sensor data and annotations are used to train and optimize the low-level activity models (primitives) and the high-level activity models

Runtime: recognition phase
– Subsymbolic processing: sensor sampling (S0...S4) → preprocessing (P0...P4) → segmentation → feature extraction (F0...F3) → classification (C0...C2) → decision fusion → null-class rejection (R)
– Symbolic processing: reasoning on the recognized primitives yields the activity/context stream (A1,p1,t1), (A2,p2,t2), (A3,p3,t3), (A4,p4,t4) over time, delivered to the activity-aware application

[1] Roggen et al., Wearable Computing: Designing and Sharing Activity-Recognition Systems Across Platforms, IEEE Robotics & Automation Magazine, 2011

Page 16: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Segmentation

• A major challenge!
• Find the boundaries of activities (e.g. "drink", "turn") for later classification

• Methods:
– Sliding-window segmentation
– Energy-based segmentation
– Rest-position segmentation
– HMM [1], DTW [2,3], SWAB [4]

• Classification is undefined between activities:
– The classifier is not trained on "no activity"
– The "null class" is hard to model: it can be anything
– Alternatively, use "null-class rejection" after classification

[1] J. Deng and H. Tsui. An HMM-based approach for gesture segmentation and recognition. In 15th International Conference on Pattern Recognition, volume 3, pages 679–682, 2000.
[2] M. Ko, G. West, S. Venkatesh, and M. Kumar, "Online context recognition in multisensor systems using dynamic time warping," in Proc. Int. Conf. on Intelligent Sensors, Sensor Networks and Information Processing, 2005, pp. 283–288.
[3] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, IEEE Pervasive Computing, 2008.
[4] E. Keogh, S. Chu, D. Hart, and M. Pazzani. An online algorithm for segmenting time series. In Proceedings of the IEEE International Conference on Data Mining, pages 289–296, 2001.

Page 17: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Segmentation: sliding/jumping window

• Commonly used for audio processing
– E.g. 20 ms windows

• Or for periodic activities
– E.g. walking, with windows of a few seconds
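Purely as an illustration (not from the slides), a minimal sliding/jumping-window sketch in Python; the 32 Hz sampling rate and window sizes are arbitrary assumptions:

```python
import numpy as np

def sliding_windows(signal, win_len, step):
    """Yield successive (start, window) pairs over a 1-D or 2-D signal.

    win_len and step are in samples; step == win_len gives a
    "jumping" (non-overlapping) window.
    """
    for start in range(0, len(signal) - win_len + 1, step):
        yield start, signal[start:start + win_len]

# Example: 2 s windows with 50% overlap on 32 Hz accelerometer data
fs = 32                                # sampling rate [Hz] (assumed)
acc = np.random.randn(10 * fs, 3)      # placeholder 3-axis signal
for start, win in sliding_windows(acc, win_len=2 * fs, step=fs):
    pass  # compute features on `win` here
```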

Page 18: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Activity characteristics

Characteristic | Type | Description
Activity kinds | Periodic | Activities exhibiting periodicity, such as walking, running, rowing, biking, etc. Sliding windows and frequency-domain features are generally used.
Activity kinds | Sporadic | The activity or gesture occurs sporadically, interspersed with other activities or gestures. Segmentation plays a key role to isolate the subset of data containing the gesture.
Activity kinds | Static | The system deals with the detection of static postures or static pointing gestures. Sliding windows and time-domain features are generally used.

Page 19: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Segmentation

• Energy-based segmentation [1]
– Between activities the user does not move
– Low energy in the acceleration signal
– E.g. compare the standard deviation of the acceleration to a threshold

• Rest-position segmentation [1]
– The user comes back to a rest position between gestures
– Can be trained

• Challenge:
– Usually there is no 'pause' or 'rest' between activities!
– Combine segmentation and null-class rejection
– E.g. DTW [2]

[1] Roggen et al., An educational and research kit for activity and context recognition from on-body sensors, Int. Conf. on Body Sensor Networks (BSN), 2010
[2] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, IEEE Pervasive Computing, 2008
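A minimal sketch of the energy-based idea, using a std-of-magnitude criterion; the window length and threshold are placeholder assumptions:

```python
import numpy as np

def energy_segments(acc_mag, fs, win=0.5, threshold=0.05):
    """Mark samples as 'active' where the windowed standard deviation
    of the acceleration magnitude exceeds a threshold.

    acc_mag: 1-D acceleration magnitude [g]; fs: sampling rate [Hz].
    Returns a boolean array, True where motion energy is high.
    """
    n = int(win * fs)
    active = np.zeros(len(acc_mag), dtype=bool)
    for start in range(0, len(acc_mag) - n + 1, n):
        if np.std(acc_mag[start:start + n]) > threshold:
            active[start:start + n] = True
    return active
```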

Page 20: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Feature extraction

• Compute features on the signal that emphasize the signal characteristics related to the activities

• Trade-offs:
– Reduce dimensionality
– Computational complexity
– Maximize separation between classes
– Specificity of the features to the classes: robustness, overfitting

• Some common features for acceleration data [1]: e.g. mean and standard deviation per window

[1] Figo, Diniz, Ferreira, Cardoso. Preprocessing techniques for context recognition from accelerometer data, Personal and Ubiquitous Computing, 14:645–662, 2010
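For illustration, a sketch computing several of the common features used later in these slides (mean, std, mean-crossing rate, std of magnitude); the exact definitions in [1] may differ:

```python
import numpy as np

def features(win):
    """Common time-domain features for one window of 3-axis
    acceleration data, shape (n_samples, 3)."""
    mag = np.linalg.norm(win, axis=1)
    centered = win - win.mean(axis=0)
    # mean-crossing rate per axis: fraction of sign changes of the centered signal
    signs = np.signbit(centered).astype(int)
    mcr = (np.diff(signs, axis=0) != 0).mean(axis=0)
    return np.concatenate([
        win.mean(axis=0),    # mean of x, y, z
        win.std(axis=0),     # std of x, y, z
        mcr,                 # mean-crossing rate of x, y, z
        [mag.std()],         # std of the magnitude
    ])
```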

Page 21: Wearable Computing - Part III: The Activity Recognition Chain (ARC)


Car manufacturing activities

Data from Zappi et al, Activity recognition from on-body sensors: accuracy-power trade-off by dynamic sensor selection, EWSN, 2008

Dataset available at: http://www.wearable.ethz.ch/resources/Dataset

Page 22: Wearable Computing - Part III: The Activity Recognition Chain (ARC)


Feature space: car manufacturing activities

Data from Zappi et al, Activity recognition from on-body sensors: accuracy-power trade-off by dynamic sensor selection, EWSN, 2008

Dataset available at: http://www.wearable.ethz.ch/resources/Dataset

[Figure: six feature-space projections: (angle X, angle Y, angle Z), (energy X, energy Y, energy Z), (energy, angle X, angle Y), (energy X, energy Y), (energy, angle X), (angle X, angle Y)]

Page 23: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Feature space: modes of locomotion (FS1)

• Features: mean-crossing rate of the x, y and z axes; std of the magnitude
• Classes: 1 = Stand; 2 = Walk; 3 = Sit; 4 = Lie

Calatroni et al, Transferring Activity Recognition Capabilities between Body-Worn Motion Sensors: How to Train Newcomers to Recognize Modes of Locomotion, INSS, 2011

Page 24: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Feature space: modes of locomotion (FS2)

• Features: mean value of the x, y and z axes; std of the magnitude
• Classes: 1 = Stand; 2 = Walk; 3 = Sit; 4 = Lie

Calatroni et al, Transferring Activity Recognition Capabilities between Body-Worn Motion Sensors: How to Train Newcomers to Recognize Modes of Locomotion, INSS, 2011

Page 25: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Feature space: modes of locomotion (FS3)

• Features: ratio of the x and y axes; ratio of the y and z axes; std of the magnitude
• Classes: 1 = Stand; 2 = Walk; 3 = Sit; 4 = Lie

Calatroni et al, Transferring Activity Recognition Capabilities between Body-Worn Motion Sensors: How to Train Newcomers to Recognize Modes of Locomotion, INSS, 2011

Page 26: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Feature space: modes of locomotion (FS4)

• Features: mean value of the x, y and z axes; std of the x, y and z axes
• Classes: 1 = Stand; 2 = Walk; 3 = Sit; 4 = Lie

Calatroni et al, Transferring Activity Recognition Capabilities between Body-Worn Motion Sensors: How to Train Newcomers to Recognize Modes of Locomotion, INSS, 2011

Page 27: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Classification accuracy

Less overlapping features yield better accuracies with all classifiers:

Placement | FS1 NCC | FS1 11-NN | FS2 NCC | FS2 11-NN | FS3 NCC | FS3 11-NN | FS4 NCC | FS4 11-NN
Knee | 0.64 | 0.71 | 0.94 | 0.95 | 0.94 | 0.94 | 0.95 | 0.94
Shoe | 0.53 | 0.65 | 0.68 | 0.86 | 0.70 | 0.86 | 0.77 | 0.87
Back | 0.60 | 0.70 | 0.79 | 0.81 | 0.66 | 0.74 | 0.78 | 0.82
RUA | 0.53 | 0.58 | 0.77 | 0.84 | 0.72 | 0.75 | 0.73 | 0.86
RLA | 0.45 | 0.59 | 0.72 | 0.81 | 0.67 | 0.80 | 0.61 | 0.84
LUA | 0.55 | 0.64 | 0.86 | 0.85 | 0.78 | 0.85 | 0.75 | 0.87
LLA | 0.60 | 0.66 | 0.70 | 0.82 | 0.75 | 0.80 | 0.68 | 0.82
Hip | 0.57 | 0.62 | 0.77 | 0.81 | 0.81 | 0.79 | 0.77 | 0.79

k-NN performs better than NCC; this is more evident for more overlapping features.

Page 28: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Feature extraction

• Ideally: explore as many features as possible
– Not limited to the "human design space" within the space of all possible designs

• Evolutionary techniques to search a larger set of solutions
– E.g. genetic programming: evolved feature expressions combined with cross-over genetic operators [1]

[1] Förster et al., Evolving discriminative features robust to sensor displacement for activity recognition in body area sensor networks, ISSNIP, 2009

Page 29: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Feature selection

• Select the "best" subset of features (F1...F9)
• Improve the performance of learning models by:
– Alleviating the effect of the curse of dimensionality
– Enhancing generalization capability
– Speeding up the learning process
– Improving model interpretability

• Trade-offs:
– Select features that correlate strongest with the classification variable (maximum relevance)...
– ... and are mutually far away from each other (minimum redundancy) [1]
– Emphasize characteristics of the signal related to the activity
– Computational complexity (minimize the number of features)
– Complementarity
– Robustness

[1] Peng et al., Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy, PAMI, 2005

Page 30: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Feature selection

Filter methods
• Do not involve a classifier, but a 'filter' criterion, e.g. mutual information
• Flow: set of candidate features → subset selection algorithm → learning algorithm
• +
– Computationally light
– General: good for a larger set of classifiers
• -
– The feature set may not be ideal for all classifiers
– Larger subsets of features

Wrapper methods
• Involve the classifier itself
• Flow: set of candidate features → subset evaluation (runs the learning algorithm) → subset selection algorithm → learning algorithm
• +
– Higher accuracy (exploits the classifier's characteristics)
– Can avoid overfitting with cross-validation
• -
– Computationally expensive
– Features are not general

Page 31: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Sequential forward selection (SFS)

• "Brute force" is not applicable!
– With N candidate features there are 2^N feature sets to test

1. Start from an empty feature set Y0 = {Ø}
2. Select the best feature x+ that maximizes an objective function J: x+ = argmax_x [J(Yk + x)]
3. Update the feature set: Yk+1 = Yk + x+; k = k + 1
4. Go to 2

• Works well with a small number of features
• Objective J: a measure of the "goodness" of the features, e.g. accuracy

[1] Peng et al., Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy, PAMI, 2005
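A minimal sketch of SFS as described above; `score` stands for any objective J, e.g. cross-validated accuracy of a classifier (an assumption, not fixed by the slides):

```python
def sfs(candidates, score, max_features):
    """Greedy sequential forward selection.

    candidates: list of feature names/indices
    score: callable mapping a feature subset to a goodness value J
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < max_features:
        # pick the feature whose addition maximizes the objective
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```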

Page 32: Wearable Computing - Part III: The Activity Recognition Chain (ARC)


Classification

• Map feature vector to a class label

Page 33: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Bayesian classification

• F: sensor reading, features
• C: activity class

Bayes' theorem:

P(C|F) = P(F|C) · P(C) / P(F)

– P(F|C): conditional probability of the features F given the class C (from training data)
– P(C): prior probability of the class (from training data)
– P(F): marginal probability (sum over all classes of the probabilities of obtaining F)
– P(C|F): posterior probability

• With multiple sensors, assume conditional independence (naive Bayes):

P(C|F1,...,Fn) = P(F1,...,Fn|C) · P(C) / P(F) = P(F1|C) · ... · P(Fn|C) · P(C) / P(F)

• In practice only the numerator matters (the denominator is constant across classes)
• Classification with a detector: e.g. pick the class with the maximum posterior probability
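As a sketch only, a naive Bayes classifier with per-class Gaussian feature models; the Gaussian form of P(F|C) is my assumption, since the slides do not fix it:

```python
import numpy as np

class GaussianNaiveBayes:
    """Minimal naive Bayes with per-class Gaussian feature models."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = {c: X[y == c].mean(axis=0) for c in self.classes}
        self.var = {c: X[y == c].var(axis=0) + 1e-9 for c in self.classes}
        self.prior = {c: np.mean(y == c) for c in self.classes}
        return self

    def predict(self, X):
        scores = []
        for c in self.classes:
            # log P(C) + sum_k log P(F_k | C); the denominator P(F) is omitted
            log_lik = -0.5 * (np.log(2 * np.pi * self.var[c])
                              + (X - self.mu[c]) ** 2 / self.var[c]).sum(axis=1)
            scores.append(np.log(self.prior[c]) + log_lik)
        return self.classes[np.argmax(scores, axis=0)]
```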

Page 34: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Nearest centroid classifier (NCC)

• One of the simplest classification methods
– No parameters
– Classify to the nearest class center in the feature space (F1, F2)

• Memory: C class centers
• Classification: C comparisons

• Pros:
– Simple implementation
– Online model update: add/remove classes, adapt class centers
– Fast, little memory

• Cons:
– Simple class boundaries
– Suited when the classes cluster in the feature space
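A minimal NCC sketch consistent with the description above (Euclidean distance assumed):

```python
import numpy as np

class NearestCentroid:
    """Nearest centroid classifier: one mean vector per class."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.centers = np.stack([X[y == c].mean(axis=0) for c in self.classes])
        return self

    def predict(self, X):
        # Euclidean distance from each sample to each class center
        d = np.linalg.norm(X[:, None, :] - self.centers[None, :, :], axis=2)
        return self.classes[d.argmin(axis=1)]
```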

Page 35: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

k-nearest neighbor (k-NN)

• A simple classification method
– Instance-based learning
– Classify to the class most represented around the test point
– Parameter: k
– k = 1: nearest neighbor (tends to overfit)
– k >> 1: "smoothes" noise in the training data

• Memory: N training points
• Classification: N comparisons

• Pros:
– Simple implementation
– Online model update (add/remove instances, classes)
– Complex boundaries

• Cons:
– Potentially slow, or requires lots of memory

• Some faster versions:
– GPGPU [1]
– k-d trees to optimize the neighborhood search

[1] Garcia et al, K-nearest neighbor search: fast GPU-based implementations and application to high-dimensional feature matching, ICIP, 2010
Figure from http://jakehofman.com/ddm/2009/09/lecture-02/

Page 36: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Decision tree

• A simple classification method
– Programmatic tree of threshold tests over the features (e.g. F1 < t1 vs. F1 >= t1, then F2 < t2 vs. F2 >= t2)
– Parameters: decision boundaries
– E.g. C4.5 for learning the tree from data

• Memory: decision boundaries
• Classification: lightweight if/else comparisons

• Pros:
– Simple implementation
– Handles continuous and discrete values, symbols

• Cons:
– Appropriate when classes separate along the feature dimensions (or use PCA first)
– Limit the size of the tree to avoid overfitting

Quinlan, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, 1993
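To make the if/else nature concrete, a hand-built two-threshold tree in the spirit of the slide's figure; the thresholds and class labels are placeholders:

```python
def classify(f1, f2, t1=0.5, t2=0.5):
    """Hand-built two-level decision tree over features F1 and F2."""
    if f1 < t1:
        return "class A"      # left branch: F1 < t1
    elif f2 < t2:
        return "class B"      # F1 >= t1 and F2 < t2
    else:
        return "class C"      # F1 >= t1 and F2 >= t2
```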

Page 37: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Null-class rejection

• In continuous activity recognition with sliding-window segmentation:
– Gestures are not always present in a segment
– Such segments must be labeled "null class"
– Also reject when the confidence in the classification result is too low

• Many classifiers can be "calibrated" to have probabilistic outputs [2]
– Statistical test / likelihood of an activity (e.g. for NCC and kNN [1])

[1] Calatroni et al., ETHZ Tech Report, 2010
[2] I. Cohen and M. Goldszmidt, "Properties and benefits of calibrated classifiers," in Proc. Knowledge Discovery in Databases (PKDD), 2004.
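A sketch of threshold-based rejection on calibrated probabilistic outputs; the 0.8 threshold is an arbitrary assumption:

```python
def predict_with_rejection(proba, classes, threshold=0.8):
    """Return the most probable class, or None (null class) when the
    classifier's calibrated confidence is below a threshold.

    proba: 1-D numpy array of calibrated class probabilities."""
    best = proba.argmax()
    return classes[best] if proba[best] >= threshold else None
```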

Page 38: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Sliding window and temporal data structure

• Activities where the temporal data structure is generally not important:
– Walking, running, rowing, biking...
– Generally periodic activities

• Activities where it is important:
– Open dishwasher: walk, grasp handle up, pull down, walk
– Close dishwasher: walk, grasp handle down, pull up, walk
– Opening or closing a car door
– Generally manipulative gestures
– Complex hierarchical activities

• Problem with some features: different sensor readings can yield identical features, e.g. the window means of activities A and B are equal (μ1 = μ2)

Page 39: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Sliding window and temporal data structure

• Time-to-space mapping: encode the temporal unfolding in the feature vector
– E.g. subwindows: compute the features per subwindow (sw1, sw2), so activities A and B become distinguishable, (μ1,1, μ1,2) vs. (μ2,1, μ2,2)

• Other approaches:
– Hidden Markov models
– Dynamic time warping / string matching
– Signal predictors
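A minimal time-to-space sketch: per-subwindow means concatenated in temporal order (two subwindows, as in the slide's example):

```python
import numpy as np

def subwindow_features(win, n_sub=2):
    """Time-to-space mapping: split the window into n_sub subwindows
    and concatenate per-subwindow means, preserving temporal order."""
    return np.concatenate([sw.mean(axis=0) for sw in np.array_split(win, n_sub)])
```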

Page 40: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Gesture recognition using neural-network signal predictors

• Signal: 3-D acceleration vector a_t
• Predict the future acceleration vector: the predictor (CTRNN) maps (ax, ay, az)_{t-1} to a prediction (px, py, pz)_t, which is compared with the actual (ax, ay, az)_t to yield a prediction error
• Operates on the raw signal

• Predictors are "trained" on gesture classes (one per class); the prediction error is smaller on the trained class (e.g. prediction error for gestures of class 1 vs. class 2)
• Class = best prediction (smallest prediction error)

[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007

Page 41: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Predictor: Continuous Time Recurrent Neural Network (CTRNN)

• Continuous model neurons
• Fully connected network (weights between neurons i and j)
• Rich dynamics (non-linear, temporal dynamics)
• Theoretically: can approximate any dynamical system
• Well suited as a universal predictor

[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007

Page 42: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Architecture of the CTRNN predictor

• 5 neurons, fully connected
• 3 inputs: acceleration vector at the previous step
• "Hidden" neurons
• 3 outputs: acceleration vector at the next step
• Connections between neurons/inputs

Notation:
– y_i(t): state of neuron i at time t
– w_ij: connection weight between neurons i and j
– s_ik: connection weight of input k to neuron i
– I_k: value of input k (X, Y, Z)
– θ_j: bias of neuron j
– τ_i: time constant of neuron i
– Δt = 0.01 s

Discretization using forward Euler numerical integration (standard CTRNN form): y_i(t+Δt) = y_i(t) + (Δt/τ_i) · (−y_i(t) + Σ_j w_ij σ(y_j(t) + θ_j) + Σ_k s_ik I_k(t))

[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
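A sketch of one forward-Euler CTRNN update, assuming the standard CTRNN formulation with a logistic activation (the slides' exact equations were stripped from the transcript):

```python
import numpy as np

def ctrnn_step(y, W, S, I, theta, tau, dt=0.01):
    """One forward-Euler update of a CTRNN state vector y.

    W: neuron-neuron weights, S: input weights, I: inputs (ax, ay, az),
    theta: neuron biases, tau: neuron time constants, dt: 0.01 s."""
    sigma = lambda x: 1.0 / (1.0 + np.exp(-x))   # logistic activation
    dy = (-y + W @ sigma(y + theta) + S @ I) / tau
    return y + dt * dy
```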

Page 43: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Training of the signal predictors

• Record instances of each gesture class
• Train one predictor for each class
• For each class: minimize the prediction error

• Genetic algorithm
– Robust in complex search spaces
– Representation of the parameters as a genetic string (binary string)

• Global optimization of the neural network parameters:
– Neuron interconnection weights
– Neuron input weights
– Time constants
– Biases

[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007

Page 44: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Genetic algorithm

• Encoding: 6 bits per parameter (neuron weights, input weights, bias & time constant); 60 bits per neuron; genetic string (5 neurons): 300 bits

Fitness function
• Minimize the prediction error for a given class
• Measured on N instances of a training set T (T1...TN)
• Lower is better (smaller prediction error)

GA parameters
• 100 individuals
• Rank selection of the 30 best individuals
• One-point crossover rate: 70%
• Mutation rate: 1% per bit
• Elitism

[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007

Page 45: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Experiments

• 8 gesture classes
• Planar gestures
• Acceleration sensor on the wrist
• 20 instances per class (one person)

• "Restricted" setup:
– No motion between gestures
– Automatic segmentation (signal magnitude > 1g indicates a gesture)

• "Unconstrained" setup:
– Freely moving in an office, typical activities (sitting, walking, reading, ...)
– Manual segmentation by pressing a button

[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007

Page 46: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Results: unconstrained setup

• Training: 62%–100% (80.5% average); testing: 48%–92% (63.6% average)
• User egomotion adds variability in this setup

[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007

Page 47: Wearable Computing - Part III: The Activity Recognition Chain (ARC)


Prediction error: gesture of class A

[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007

Page 48: Wearable Computing - Part III: The Activity Recognition Chain (ARC)


Prediction error: one instance per class

[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007

Page 49: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Activity segmentation and classification with string matching

Pipeline (slide figure): sensors + signal processing → trajectories → motion strings (e.g. "becfcca", "aabadca"); templates (e.g. "bad", "cfcc") are spotted in the strings by string matching; fusion, overlap detection and filtering yield the spotted activity segments.

[1] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, IEEE Pervasive Computing, 2008

Page 50: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Motion encoding

[Figure: the 2-D direction of motion is quantized with a codebook of 8 symbols (a–h); each direction vector along the trajectory (x, y) is mapped to its nearest codebook symbol, so the trajectory becomes a motion string such as "bbccbbddccbbbb"]

[1] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, IEEE Pervasive Computing, 2008

Page 51: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

String matching

• An approximate string matching algorithm is used to spot activity occurrences in the motion string
– Based on a distance measure called the Levenshtein or edit distance
– The edit distance involves symbol operations with dedicated costs:
• substitution/replacement (r)
• insertion (i)
• deletion (d)
– Crucial algorithm modification: find template occurrences at arbitrary positions within the motion string

[1] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, IEEE Pervasive Computing, 2008

Page 52: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Approximate string matching

[Figure: two templates aligned against the motion string "b d a h c g b" over time t-6 ... t0; template string 1 ("b c b d c b b") matches with one replacement and one deletion, template string 2 ("a b c b b") with two insertions and one replacement, giving matching costs C1(t0) = r + d and C2(t0) = 2i + r]

[1] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, IEEE Pervasive Computing, 2008
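A hedged sketch of edit-distance spotting in the spirit of [1], not the authors' exact implementation: the first DP row is all zeros so a template match may start anywhere in the motion string, and unit operation costs are assumed:

```python
def matching_cost(template, motion, costs=(1, 1, 1)):
    """Edit-distance matching cost of `template` ending at each position
    of `motion`.

    costs: (substitution r, insertion i, deletion d). Returns a list
    where entry t is the best cost of a match ending at motion[t-1]."""
    r, i, d = costs
    m, n = len(template), len(motion)
    prev = [0] * (n + 1)            # zero first row: free start position
    for row in range(1, m + 1):
        cur = [row * d] + [0] * n   # all-deletion cost at empty motion prefix
        for col in range(1, n + 1):
            sub = prev[col - 1] + (0 if template[row - 1] == motion[col - 1] else r)
            cur[col] = min(sub, cur[col - 1] + i, prev[col] + d)
        prev = cur
    return prev

# Example: costs drop where the template occurs in the stream
costs = matching_cost("cfcc", "becfcca")
```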

Page 53: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Spotting operation

[Figure: the matching cost C1(t) is computed over time on the motion string "bbdccbb"; when it falls below a threshold k_thr,1 an activity end point is reported, which together with the activity start point defines the spotted segment]

[1] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, IEEE Pervasive Computing, 2008

Page 54: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

String matching

• +
– Easily implemented in FPGAs / ASICs
– Lightweight
– Computational complexity scales linearly with the number of templates
– Multiple templates per activity

• -
– Needs a string encoding
– Hard to decide how to quantize the sensor data
– An online implementation requires "forgetting the past"

Page 55: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Activity recognition with hidden Markov models

• Markov chain
– Discrete-time stochastic process
– Describes the state of a system at successive times
– State transitions are probabilistic
– Markov property: the state transition depends only on the current state
– The state is visible to the observer
– Only parameters: the state transition probabilities

• Hidden Markov model
– Statistical model which assumes the system being modeled is a Markov chain
– Unknown parameters
– The state is NOT visible to the observer
– But variables influenced by the state are visible (one observation probability distribution per state)
– The observations generated by the HMM give information about the state sequence

[Figure: a 4-state model (states 0–3) with transition probabilities a00, a01, a02, a12, a13, a23; in the HMM version, state 2 emits observations Z0, Z1, Z2 with probabilities b20, b21, b22]

Page 56: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Hidden Markov model: parameters

– a_ij: state transition probabilities (A = {a_ij}, here a 4×4 matrix)
– b_ij: observation probabilities (B = {b_ij}, here a 4×3 matrix)
– Π = (Π0, Π1, Π2, Π3): initial state probabilities
– N: number of states (N = 4)
– M: number of observation symbols (M = 3)
– X: state space, X = {x1, x2, x3, ...}
– Z: observations, Z = {z1, z2, z3, ...}

λ = (A, B, Π): the HMM model

[Figure: the 4-state transition graph with A, B and Π laid out as matrices]

Page 57: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Hidden Markov model: 3 main questions

1. Find the probability of an output sequence: P(Z|λ)
• Model parameters λ known, output sequence Z known
• Forward algorithm

2. Find the most likely sequence of states generating Z: {x_i}^T
• Model parameters λ known, output sequence Z known
• Viterbi algorithm

3. HMM training: find the HMM parameters λ
• (Set of) output sequence(s) known
• Find the observation probabilities, state transition probabilities, ...
• Statistics, expectation maximization: Baum-Welch algorithm
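A minimal forward-algorithm sketch for question 1, assuming discrete observation symbols:

```python
import numpy as np

def forward(A, B, pi, Z):
    """Forward algorithm: probability P(Z | lambda) of an observation
    sequence Z (list of symbol indices) under an HMM lambda = (A, B, pi).

    A: (N, N) transition matrix, B: (N, M) observation matrix,
    pi: (N,) initial state probabilities."""
    alpha = pi * B[:, Z[0]]            # joint prob. of state and first symbol
    for z in Z[1:]:
        alpha = (alpha @ A) * B[:, z]  # propagate, then weight by emission
    return alpha.sum()
```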

Page 58: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Hand raise vs. handshake

• Waving hello (by raising the hand):
– Raising the arm
– Lowering the arm immediately after

• Handshake:
– Raising the arm
– Shaking
– Lowering the arm

• Measurement: angular speed α of the lower arm at the elbow
• Quantized to only 3 discrete values:
– '<': negative angular speed
– '=': zero angular speed
– '>': positive angular speed

• Which gesture produced each observation sequence (hand raise? handshake?):
– > > > > < < < = < <
– > > = < = = < > < < > < < < < < = < < <

Page 59: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Classification with separate HMMs

• Train one HMM per class (HMM0, HMM1, ...) with Baum-Welch
– Each HMM models one gesture

• Classify a sequence of observations:
– Compute the probability of the sequence under each HMM with the forward algorithm: P(Z|HMM_i)
– Use these per-class probabilities as the scores for classification
– In general the class corresponds to the HMM with the highest probability (maximum likelihood)

Pipeline (slide figure): training/testing dataset (gestures 0–3) → likelihood estimation with HMM0...HMM3 → P(G=0)...P(G=3) → max → gesture class C ∈ {0, 1, 2, 3}

Page 60: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Validation of activity recognition [1]

• Recognition performance:
– Confusion matrix
– ROC curve
– Continuous activity recognition measures
– Latency

• User-related measures:
– Comfort / user acceptance
– Robustness
– Cost

• Processing-related measures:
– Computational complexity, memory
– Energy

• ... the relevant measures are application dependent!

[1] Villalonga et al., Bringing Quality of Context into Wearable Human Activity Recognition Systems, First International Workshop on Quality of Context (QuaCon), 2009

Page 61: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Performance measures: confusion matrix

• Instance based
• Indicates how each instance is classified vs. its true class
• Ideally: a diagonal matrix
• TP / TN: true positive / negative
– Correctly detected when there is (or isn't) an activity
• FP / FN: false positive / negative
– Detected an activity when there isn't one, or missed one when there is
• Substitution: correctly detected, but incorrectly classified

[1] Villalonga et al., Bringing Quality of Context into Wearable Human Activity Recognition Systems, First International Workshop on Quality of Context (QuaCon), 2009
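A minimal confusion-matrix sketch (integer class labels assumed):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count how often true class i is predicted as class j;
    a perfect classifier yields a diagonal matrix."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm
```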

Page 62: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Performance measures: ROC curve

• Receiver operating characteristic
• Indicates classifier performance as a parameter is varied
– E.g. the null-class rejection threshold

• True positive rate (TPR), or sensitivity:
– TPR = TP / P = TP / (TP + FN)

• False positive rate (FPR):
– FPR = FP / N = FP / (FP + TN)
– Specificity = 1 − FPR = TN / (TN + FP)

[1] Villalonga et al., Bringing Quality of Context into Wearable Human Activity Recognition Systems, First International Workshop on Quality of Context (QuaCon), 2009

Page 63: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Performance measures: online activity recognition

• Problem with the previous measures: they are suited for isolated activity recognition
– I.e. when the activity is perfectly segmented
• They do not reflect the performance of online (continuous) recognition
• Ward et al. introduce [2]:
– Overfill / underfill: activities detected as longer/shorter than the ground truth
– Insertions / deletions
– Merges / fragmentations / substitutions

[1] Villalonga et al., Bringing Quality of Context into Wearable Human Activity Recognition Systems, First International Workshop on Quality of Context (QuaCon), 2009
[2] Ward et al., Performance metrics for activity recognition, ACM Transactions on Intelligent Systems and Technology, 2(1), 2011

Page 64: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Validation

• The entire dataset is split into a train set and a test set
• Train set:
– Optimization of the ARC on the train set
– Includes feature selection, classifier training, null-class rejection, etc.
• Test set:
– Never seen during training
– Assesses generalization
– Used only once for testing (otherwise one is indirectly optimizing on the test set)

• Cross-validation (e.g. 4-fold: folds 1–4)
– Assesses whether results generalize to an independent dataset
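A minimal k-fold split sketch; the random shuffling and seed are assumptions:

```python
import numpy as np

def kfold_indices(n_samples, k=4, seed=0):
    """Split sample indices into k folds; each fold serves once as the
    test set while the remaining folds form the train set."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        yield train, fold
```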

Page 65: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

Validation

• Leave-one-out cross-validation:
– Train on all samples minus one
– Test on the left-out sample

• In wearable computing, various goals:
– Robustness to multiple users (user-independent)
– Robustness to multiple sensor placements (placement-independent)
– ...

Leave out | Assess performance
Person | User-independent
Day, week, ... | Time-independent (e.g. if the user can change behavior over time)
Sensor placement | Sensor-placement-independent
Sensor modality | Modality-independent
... | ...

Page 66: Wearable Computing - Part III: The Activity Recognition Chain (ARC)

For further reading

ARC
• Roggen et al., Wearable Computing: Designing and Sharing Activity-Recognition Systems Across Platforms, IEEE Robotics & Automation Magazine, 2011

Activity recognition
• Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, IEEE Pervasive Computing, 2008
• J. Ward, P. Lukowicz, G. Tröster, and T. Starner, "Activity recognition of assembly tasks using body-worn microphones and accelerometers," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1553–1567, 2006.
• L. Bao and S. S. Intille, "Activity recognition from user-annotated acceleration data," in Pervasive Computing: Proc. of the 2nd Int'l Conference, Apr. 2004, pp. 1–17.
• D. Figo, P. C. Diniz, D. R. Ferreira, and J. M. P. Cardoso, "Preprocessing techniques for context recognition from accelerometer data," Personal and Ubiquitous Computing, vol. 14, no. 7, pp. 645–662, 2010.
• Roggen et al., An educational and research kit for activity and context recognition from on-body sensors, Int. Conf. on Body Sensor Networks (BSN), 2010

Classification / machine learning / pattern recognition
• Duda, Hart, Stork, Pattern Classification, Wiley Interscience, 2000
• Bishop, Pattern Recognition and Machine Learning, Springer, 2006 (http://research.microsoft.com/en-us/um/people/cmbishop/prml/)

Performance measures
• Villalonga et al., Bringing Quality of Context into Wearable Human Activity Recognition Systems, First International Workshop on Quality of Context (QuaCon), 2009
• Ward et al., Performance metrics for activity recognition, ACM Transactions on Intelligent Systems and Technology, 2(1), 2011
