Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing research

Daniel Roggen

2011

Wearable Computing

Part IV

Ensemble classifiers

Insight into ongoing research

© Daniel Roggen www.danielroggen.net

[Figure: the activity recognition chain. Stages: sensor sampling, preprocessing, segmentation, feature extraction, classification, decision fusion, null class rejection and reasoning, leading from sensor data to Context/Activity for the activity-aware application. Blocks are labelled S0 to S4, P0 to P4, F0 to F3 and C0 to C2. The early stages are subsymbolic processing, the later ones symbolic processing. Design-time (training phase): low-level activity models (primitives) and high-level activity models are trained from sensor data and annotations. Runtime (recognition phase): the chain outputs activities (A1, p1, t1) … (A4, p4, t4) along the time axis t.]


Many classifiers: Ensemble classifiers

• What is it?

• How to generate ensembles?

• What are they useful for in wearable computing?


What are ensemble classifiers?

{(X1,y1),(X2,y2)…(Xn,yn)}

Decision fusion


Why?

• Intuitively: increasing the confidence in the decision taken

– Seek additional opinion before making a decision

– Read multiple product reviews

– Request reference before hiring someone


Background

• 1785 Condorcet’s Jury Theorem

– Probability of a group of individuals arriving at a correct decision

– Each individual votes correctly (with probability p) or incorrectly (with probability 1-p)

– With p>0.5, the more voters the higher the probability that the majority decision is correct

– « Theoretical basis for democracy »

http://en.wikipedia.org/wiki/Condorcet_jury_theorem
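The theorem can be checked numerically. The sketch below assumes independent voters who all share the same probability p of being correct, and an odd number of voters so that no tie can occur; the function name is chosen for illustration only.

```python
from math import comb

def majority_correct_probability(n_voters: int, p: float) -> float:
    """Probability that a strict majority of n independent voters is correct."""
    if n_voters % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    needed = n_voters // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_voters, k) * p**k * (1 - p) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )

# With p > 0.5 the probability grows with the number of voters;
# with p < 0.5 it shrinks instead.
for n in (1, 5, 21, 101):
    print(n, round(majority_correct_probability(n, 0.6), 3),
          round(majority_correct_probability(n, 0.4), 3))
```

The same binomial sum is the usual back-of-the-envelope argument for majority-voting ensembles, with the caveat that real classifiers are rarely independent.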


Also known as…

• Combination of multiple classifiers

• Classifier fusion

• Classifier ensembles

• Mixture of experts

• Consensus aggregation

• Composite classifier systems

• Dynamic classifier selection

• …

Polikar, Ensemble based systems in decision making, IEEE Circuits and Systems magazine, 2006


Why are classifier ensembles interesting?

• Ruta: Another approach [to progress in decision support systems] suggests that as the limits of the existing individual method are approached and it is hard to develop a better one, the solution of the problem might be just to combine existing well performing methods, hoping that better results will be achieved.

• Dietterich: The main discovery is that ensembles are often much more accurate than the individual classifiers that make them up.

• Polikar: If we had access to a classifier with perfect generalization performance, there would be no need to resort to ensemble techniques. The realities of noise, outliers and overlapping data distributions, however, make such a classifier an impossible proposition. At best, we can hope for classifiers that correctly classify the field data most of the time. The strategy in ensemble systems is therefore to create many classifiers, and combine their outputs such that the combination improves upon the performance of a single classifier.

Ruta et al., An overview of classifier fusion methods, Computing and Information Systems, 2000

Dietterich, Ensemble methods in machine learning, Proc. Multiple Classifier Systems, 2000



Motivation


• Representational problem

– The ‘true f’ cannot be represented by any of the classifiers in H

– A combination of multiple classifiers expands the representable functions

• Computational problem

– Enough training data but computationally difficult to find the best classifier

– Local optima

– Ensemble constructed from different start points better approximates f

• Statistical problem

– Insufficient data

– Many classifiers give the same accuracy on the training data

– An ensemble of ‘accurate’ classifiers reduces the risk of choosing the wrong classifier

Dietterich: “These three fundamental issues are the three most important ways in which existing learning algorithms fail. Hence, ensemble methods have the promise of reducing (and perhaps even eliminating) these three key shortcomings of standard learning algorithms.”


Motivation

• Statistical reasons

– Good performance on the training set does not guarantee generalization

– Combining classifiers reduces the risk of selecting a poorly performing one

• Large volume of data

– Training classifiers with large amounts of data can be impractical

– Partition the data into smaller subsets, train a specific classifier on each, and combine them

• Too little data

– Resampling techniques and training of different classifiers on (random) subsets

• Data fusion

– Multiple/multimodal sensors

– A specific classifier is trained for each modality, and the classifiers are then combined

• Divide and conquer

– Decision boundary too complex for a single classifier

– Approximate the complex decision boundary by multiple classifiers


Polikar, Ensemble based systems in decision making, IEEE Circuits and Systems magazine, 2006

Divide and conquer


Classifier selection / Classifier fusion

• Classifier selection: Use an expert in a local area of the feature space

• Classifier fusion: merge individual (weaker) learners to obtain a single (stronger) learner


The diversity problem

• Classifiers must (in a fused sense) agree on the right decision

• When classifiers disagree, they must disagree differently

5 classifiers, majority voting

Classifier Decision

h0: 0

h1: 1

h2: 0

h3: 2

h4: 3

• Classifiers are diverse if they make different errors on data points

• A strategy for ensemble generation must find diverse classifiers
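The five-classifier example above can be reproduced with a few lines of code. This is a minimal sketch of plurality voting over class labels; the second decision list is a made-up counterexample, not taken from the slides.

```python
from collections import Counter

def majority_vote(decisions):
    """Return the most frequent label and the number of votes it received."""
    label, votes = Counter(decisions).most_common(1)[0]
    return label, votes

# Decisions of h0..h4 from the table above. If class 0 is correct, the three
# wrong classifiers disagree differently (1, 2, 3), so class 0 still wins.
diverse = [0, 1, 0, 2, 3]
print(majority_vote(diverse))     # class 0 wins with 2 of 5 votes

# Hypothetical counterexample: the wrong classifiers all make the same error,
# so the wrong class 1 outvotes the correct class 0.
correlated = [0, 1, 0, 1, 1]
print(majority_vote(correlated))  # class 1 wins with 3 of 5 votes
```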


Measuring diversity

• A good diversity measure should relate to the ensemble accuracy

• No strict definition of ‘diversity’ – active area of research

• For two classifiers: statistical literature

• For three+ classifiers: no consensus


Kuncheva, Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy, Machine Learning, 2003


Measuring diversity: pair-wise measures

• Average of all pair-wise diversity measures

• Q-Statistics

• Correlation

• Disagreement, double fault
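As a sketch of how these pair-wise measures are computed, the code below follows the usual definitions based on the counts N11, N10, N01 and N00 (both classifiers correct, only the first correct, only the second correct, both wrong), as given in the Kuncheva paper cited above. The function names and the toy correctness vectors are illustrative assumptions.

```python
from math import sqrt

def pairwise_counts(correct_a, correct_b):
    """Count joint outcomes of two classifiers on the same samples."""
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1
        elif a:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00

def diversity_measures(correct_a, correct_b):
    """Q statistic, correlation, disagreement and double fault for one pair."""
    n11, n10, n01, n00 = pairwise_counts(correct_a, correct_b)
    n = n11 + n10 + n01 + n00
    q_den = n11 * n00 + n01 * n10
    rho_den = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    nan = float("nan")
    return {
        "Q": (n11 * n00 - n01 * n10) / q_den if q_den else nan,
        "correlation": (n11 * n00 - n01 * n10) / rho_den if rho_den else nan,
        "disagreement": (n01 + n10) / n,
        "double_fault": n00 / n,
    }

# 1 = sample classified correctly, 0 = misclassified (toy data).
a = [1, 1, 1, 0, 0, 1, 0, 1]
b = [1, 0, 1, 1, 0, 0, 0, 1]
print(diversity_measures(a, b))
```

For an ensemble of more than two classifiers, the pair-wise values are averaged over all pairs, as stated in the first bullet.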


Measuring diversity: summary

• No diversity measure consistently correlates with higher accuracy

• “although a rough tendency was confirmed. . . no prominent links appeared between the diversity of the ensemble and its accuracy. Diversity alone is a poor predictor of the ensemble accuracy” [1]

• Although there are proven connections between diversity and accuracy in some special cases, our results raise some doubts about the usefulness of diversity measures in building classifier ensembles in real-life pattern recognition problems. [2]

[1] Kuncheva, That Elusive Diversity in Classifier Ensembles, IbPRIA, 2003

[2] Kuncheva, Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy, Machine Learning, 2003


Measuring diversity: summary

• In the absence of additional information Q may be recommended

– Simple implementation

– Limits: [-1;1]

– Independence value: 0

Kuncheva, Is Independence Good for Combining Classifiers?, Proc. Int. Conf. Pattern Recognition, 2000


How to obtain diversity

Strategies for ensemble generation

1. Enumerating the hypotheses

2. Manipulating the training examples

3. Manipulating the input features

4. Manipulating the output targets

5. Injecting randomness


Brown, Yao, Diversity creation methods: a survey and categorisation, Information Fusion, 2005


Strategy for ensemble generation (1)

Manipulating the training examples

• Learning algorithm run multiple times on different training subsets

• Suited for unstable classifiers

– Decision trees, neural networks, …

– (Stable: linear regression, nearest neighbor, linear threshold)

• Methods:

– Bagging: randomly draw samples (with replacement) from the training set

– Cross-validation: leave out disjoint subsets from training

– Boosting: draw samples with higher likelihood for difficult samples
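Bagging can be sketched in a few lines. The example below is an illustration under simplifying assumptions: the base learner is a hypothetical one-dimensional threshold classifier ("stump"), the data are a toy separable set, and the ensemble is combined by majority vote.

```python
import random
from collections import Counter

def train_stump(samples):
    """Fit a 1-D threshold classifier that predicts 1 if x > threshold."""
    best_thr, best_err = None, None
    for thr, _ in samples:  # candidate thresholds are the observed x values
        err = sum((x > thr) != bool(y) for x, y in samples)
        if best_err is None or err < best_err:
            best_thr, best_err = thr, err
    return best_thr

def bagging(samples, n_classifiers, rng):
    """Train each stump on a bootstrap replicate (sampling with replacement)."""
    return [
        train_stump([rng.choice(samples) for _ in samples])
        for _ in range(n_classifiers)
    ]

def predict(ensemble, x):
    """Majority vote over the stumps."""
    votes = Counter(int(x > thr) for thr in ensemble)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
data = [(x / 10, int(x >= 5)) for x in range(10)]  # toy 1-D data, two classes
ensemble = bagging(data, n_classifiers=11, rng=rng)
print(sorted(ensemble))  # thresholds differ between bootstrap replicates
print(predict(ensemble, 0.05), predict(ensemble, 0.95))
```

Boosting differs mainly in the sampling step: instead of drawing uniformly, samples misclassified by earlier classifiers are drawn with higher probability.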



Manipulating the input features

• Change the set of input features available to the learning algorithm

• E.g. select/group features according to identical sensors

• Input features need to be redundant

• Input decimated ensembles [1]

[1] Tumer,Oza, Input decimated ensembles, Pattern Anal Applic, 2003

Ho, The Random Subspace Method for Constructing Decision Forests, IEEE PAMI, 1998
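Two ways of building feature subsets can be sketched as follows: random subsets in the spirit of the random subspace method, and grouping by sensor. The feature naming scheme `sensor.feature` and all function names are assumptions made for this illustration.

```python
import random

def random_subspaces(n_features, subset_size, n_classifiers, rng):
    """Draw one random subset of feature indices per ensemble member."""
    return [
        sorted(rng.sample(range(n_features), subset_size))
        for _ in range(n_classifiers)
    ]

def group_by_sensor(feature_names):
    """Group feature indices by the sensor prefix of a 'sensor.feature' name."""
    groups = {}
    for idx, name in enumerate(feature_names):
        sensor = name.split(".")[0]
        groups.setdefault(sensor, []).append(idx)
    return groups

def project(x, indices):
    """Restrict a feature vector to the indices seen by one classifier."""
    return [x[i] for i in indices]

names = ["acc_wrist.mean", "acc_wrist.var", "gyro_arm.mean", "gyro_arm.var"]
print(group_by_sensor(names))
print(random_subspaces(len(names), 2, 3, random.Random(1)))
print(project([0.1, 0.2, 0.3, 0.4], [0, 3]))
```

Each ensemble member is then trained and queried on `project(x, indices)` only, which is why the input features need some redundancy.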



Manipulating the output targets

• Classification: {(X1,y1),(X2,y2)…(Xn,yn)}

• Change the classification problem by changing y

• Error correcting codes

– Change from 1 classifier with K classes -> at least log2(K) 2-class classifiers (extra classifiers add error-correcting redundancy)
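A minimal sketch of output coding: each class gets a binary codeword, one 2-class classifier is trained per bit, and a test sample is assigned to the class whose codeword is closest in Hamming distance to the predicted bits. The class labels and the 5-bit codebook below are hypothetical; the codewords were chosen to have a minimum Hamming distance of 3, so one wrong bit can be corrected.

```python
# Hypothetical codebook for K = 4 classes using 5 bits instead of log2(4) = 2.
CODEBOOK = {
    "A": (0, 0, 0, 0, 0),
    "B": (0, 0, 1, 1, 1),
    "C": (1, 1, 0, 1, 1),
    "D": (1, 1, 1, 0, 0),
}

def binary_targets(labels, bit):
    """Relabel the training targets y for the 2-class classifier of one bit."""
    return [CODEBOOK[y][bit] for y in labels]

def decode(bits):
    """Return the class whose codeword is nearest in Hamming distance."""
    def hamming(label):
        return sum(b != c for b, c in zip(bits, CODEBOOK[label]))
    return min(CODEBOOK, key=hamming)

print(binary_targets(["A", "C", "D", "B"], bit=0))  # targets for classifier 0
print(decode((1, 1, 0, 1, 1)))  # exact codeword of C
print(decode((1, 0, 0, 1, 1)))  # one bit flipped, still decoded as C
```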



Injecting randomness

• Randomness in the learning algorithm

• E.g.

– Initial weights of a neural network

– Initial parameters of HMM

– C4.5: random selection among N best decision tree splits


How to combine the classifiers?

Ruta et al., An overview of classifier fusion methods, Computing and Information Systems, 2000


• (Weighted) majority voting

– Class label output

– Select the class most voted for

• Mean rule

– Continuous output

– Support for class wj is the average of the classifier outputs

• Product rule

– Continuous output

– Support for class wj is the product of the classifier outputs
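The three rules can be sketched as follows. The code assumes that each classifier outputs one support value per class (for instance an estimated posterior); the example values are made up to show that the rules do not always agree.

```python
from collections import Counter
from math import prod

def weighted_majority(labels, weights=None):
    """Sum the weights of the classifiers voting for each label."""
    weights = weights or [1.0] * len(labels)
    tally = Counter()
    for label, w in zip(labels, weights):
        tally[label] += w
    return max(tally, key=tally.get)

def mean_rule(supports):
    """supports[i][j] is the support of classifier i for class j."""
    n_classes = len(supports[0])
    avg = [sum(s[j] for s in supports) / len(supports) for j in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)

def product_rule(supports):
    """Multiply the supports per class; one near-zero output acts as a veto."""
    n_classes = len(supports[0])
    prods = [prod(s[j] for s in supports) for j in range(n_classes)]
    return max(range(n_classes), key=prods.__getitem__)

supports = [[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]
labels = [max(range(2), key=s.__getitem__) for s in supports]  # [0, 1, 1]
print(weighted_majority(labels))                   # 1: two of three vote class 1
print(mean_rule(supports), product_rule(supports)) # 0 0: one confident vote dominates
print(weighted_majority(labels, [3.0, 1.0, 1.0]))  # 0: first classifier trusted more
```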



Which method is better?

• No free lunch - problem dependent

• Ensemble generation

– Boosting vs bagging: boosting usually achieves better generalization but is more sensitive to noise and outliers

• Ensemble combination

– General case: mean rule, with consistent performance on a broad range of problems

– Reliable estimate of classifier accuracy available: weighted average, weighted majority

– Classifiers output posterior probabilities: product rule



Which method is better?

• Ensemble combination

– No information on the distribution of classifier errors: median

• Always leads to Pe → 0 for large ensembles, even with heavy-tailed distributions

– Error distribution less heavy tailed: mean

– For technical reasons (e.g. communication in WSN) majority vote may be the only one that can be implemented

• Performance of the majority vote strategy coincides with the performance of the median strategy

Cabrera, On the impact of fusion strategies on classification errors for large ensembles of classifiers , Pattern recognition, 2006
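A small sketch of why the median is the safer default: a single classifier with a grossly wrong continuous output can dominate the mean but not the median. The support values are invented for illustration. The last lines show that for binary votes from an odd number of classifiers, the median is the majority vote.

```python
from statistics import mean, median

def fuse(supports, rule):
    """Apply a fusion rule per class, then pick the best-supported class."""
    n_classes = len(supports[0])
    fused = [rule([s[j] for s in supports]) for j in range(n_classes)]
    return max(range(n_classes), key=fused.__getitem__)

# Four classifiers mildly favour class 1; the fifth produces an outlier
# (heavy-tailed error) in favour of class 0.
supports = [[0.4, 0.6], [0.45, 0.55], [0.3, 0.7], [0.35, 0.65], [5.0, 0.0]]
print(fuse(supports, mean))    # 0: the outlier dominates the average
print(fuse(supports, median))  # 1: the median ignores the outlier

# Binary votes from an odd number of classifiers.
votes = [1, 0, 1, 1, 0]
print(median(votes), int(sum(votes) > len(votes) / 2))  # both give 1
```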


In wearable computing

Classifier fusion

• Multimodal sensors & NULL class rejection

• Sound

• Acceleration

• Null class when the sound and acceleration classifications disagree

Ward, Gesture Spotting Using Wrist Worn Microphone and 3-Axis Accelerometer, Proc. Joint Conf on Smart objects and ambient intelligence, 2005
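The agreement-based rejection rule can be sketched as below. This only illustrates the principle stated on the slide; the gesture labels and the per-segment structure are assumptions, and the cited work may implement the fusion differently.

```python
NULL = "NULL"

def fuse_with_null_rejection(sound_label, accel_label):
    """Accept a class only if both modality-specific classifiers agree."""
    return sound_label if sound_label == accel_label else NULL

# Hypothetical per-segment decisions of the two classifiers.
sound = ["sawing", "drilling", "sawing", "hammering"]
accel = ["sawing", "sawing", "sawing", "hammering"]
fused = [fuse_with_null_rejection(s, a) for s, a in zip(sound, accel)]
print(fused)  # the second segment is rejected as NULL
```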


Zappi, Roggen et al. Activity recognition from on-body sensors: accuracy-power trade-off by dynamic sensor selection. EWSN, 2008.

Stiefmeier et al., Wearable activity tracking in car manufacturing, Pervasive Computing Magazine, 2008




Classifier fusion

Sensor Scalability [2]

• Application defined performance

• Clustering

Robustness to faults [1]

• Graceful degradation

• Implicit fault-tolerance

[1] Zappi, Stiefmeier, Farella, Roggen, Benini, Tröster, Activity Recognition from On-Body Sensors by Classifier Fusion: Sensor Scalability and Robustness. ISSNIP 07

[2] Zappi, Lombriser, Stiefmeier, Farella, Roggen, Benini, Tröster, Activity recognition from on-body sensors: accuracy-power trade-off by dynamic sensor selection, EWSN 08



Classifier fusion

Power-performance management [1]

[1] Zappi, Roggen et al., Network-level power-performance trade-off in wearable activity recognition: a dynamic sensor selection approach, submitted to ACM Trans. Embedded Computing Systems



Classifier selection

Stiefmeier, Combining Motion Sensors and Ultrasonic Hands Tracking for Continuous Activity Recognition in a Maintenance Scenario,

Location Class 1 (μ1, σ1): select 'expert' classifier for location class 1

Location Class 2 (μ2, σ2): select 'expert' classifier for location class 2
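One way to read the slide is that each location class is modelled by a Gaussian (μ, σ) and the expert of the most likely location class is used. The sketch below makes that assumption explicit, uses a one-dimensional location value for simplicity, and stands in trivial functions for the expert classifiers; none of these details are taken from the cited paper.

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mu, sigma):
    """Likelihood of x under a 1-D Gaussian location model."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def select_expert(location, location_classes, experts):
    """Pick the expert of the location class with the highest likelihood."""
    best = max(
        location_classes,
        key=lambda name: gaussian_pdf(location, *location_classes[name]),
    )
    return best, experts[best]

# Hypothetical location models (mu, sigma) and placeholder experts.
location_classes = {"location 1": (0.0, 0.5), "location 2": (2.0, 0.5)}
experts = {
    "location 1": lambda features: "activity near location 1",
    "location 2": lambda features: "activity near location 2",
}

name, expert = select_expert(0.3, location_classes, experts)
print(name, "->", expert([0.1, 0.2]))
```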


Further applications

• Classification despite missing features

– "A bootstrap-based method can provide an alternative approach to the missing data problem by generating an ensemble of classifiers, each trained with a random subset of the features." [1]

– "Strikingly the reduced-models approach, seldom mentioned or used, consistently outperforms the other two [imputation] methods, sometimes by a large margin." [2]

• E.g.:

– Long term multimodal activity recognition

– Physiological signal assessment

– Opportunistic activity recognition

[1] Polikar, Bootstrap-inspired techniques in computational intelligence, IEEE Signal Processing Magazine, 2007

[2] Provost, Handling Missing Values when Applying Classification Models, Machine Learning Research, 2007
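The idea in quote [1] can be sketched as follows: every ensemble member is trained on a random subset of the features, and at run time only the members whose features are all available take part in the vote. The nearest-centroid base classifier, the toy data and the activity labels are assumptions made for this illustration.

```python
import random
from collections import Counter

def train_centroids(X, y, idx):
    """Nearest-centroid model restricted to the feature indices idx."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(idx))
        for k, i in enumerate(idx):
            acc[k] += x[i]
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict_member(centroids, idx, x):
    def dist(label):
        return sum((x[i] - c) ** 2 for i, c in zip(idx, centroids[label]))
    return min(centroids, key=dist)

def train_ensemble(X, y, n_members, subset_size, rng):
    members = []
    for _ in range(n_members):
        idx = sorted(rng.sample(range(len(X[0])), subset_size))
        members.append((idx, train_centroids(X, y, idx)))
    return members

def predict(members, x):
    """x may contain None for missing features; only usable members vote."""
    votes = Counter(
        predict_member(centroids, idx, x)
        for idx, centroids in members
        if all(x[i] is not None for i in idx)
    )
    return votes.most_common(1)[0][0] if votes else None

X = [[1.0, 1.0, 1.0], [1.2, 0.9, 1.1], [0.0, 0.0, 0.0], [0.1, -0.1, 0.2]]
y = ["walk", "walk", "sit", "sit"]
members = train_ensemble(X, y, n_members=30, subset_size=2, rng=random.Random(0))
print(predict(members, [1.1, None, 0.9]))    # feature 1 missing
print(predict(members, [None, None, None]))  # nothing usable: no decision
```

No imputation is needed: a missing sensor simply reduces the number of voters.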


Further applications

• Enhanced robustness in activity recognition

– Typically small datasets: are we using the optimal decision boundary for field deployment?

– Ensembles of classifiers trained with resampling

– Ensembles have different field generalization performance

• Confidence estimation/QoC

– Continuous valued output of ensemble classifiers can estimate posterior probability [1]

• WSN

– "classifiers using data from different sensors are usually uncorrelated to a far greater degree than classifiers which use data from the same sensor" [2]

– Distributed activity recognition (Tiny Task Network): only classification result is required, lower bandwidth

[1] Muhlbaier, Polikar, Ensemble confidence estimates posterior probability, Int. Workshop on Multiple Classifier Systems, 2005

[2] Fumera, Roli, A theoretical and experimental analysis of linear combiners for multiple classifier systems , IEEE Trans. Pattern Anal. Mach. Intell., 2005


Reasons not to use ensembles

• Classifier with (perfect|good) generalization performance available

• Decreased comprehensibility

• Limited storage and computational resources

• Correlated errors or uncorrelated errors at rate higher than chance


Summary

• Large body of research showing benefits of ensembles

• Some ensemble classifiers already in use in Wearable Computing

• Potentials: missing features, confidence/QoC, improved robustness, WSN

• Active field of research


Further reading

Reviews, books

• Ruta et al., An overview of classifier fusion methods, Computing and Information Systems, 2000

• Dietterich, Ensemble methods in machine learning, Proc. Multiple Classifier Systems, 2000

• Polikar, Ensemble based systems in decision making, IEEE Circuits and Systems magazine, 2006

• Polikar, Bootstrap-inspired techniques in computational intelligence, IEEE Signal Processing Magazine, 2007

• Kuncheva, Combining Pattern Classifiers, Methods and Algorithms, Wiley, 2005

Diversity

• Kuncheva, Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy, Machine Learning, 2003

• Brown, Yao, Diversity creation methods: a survey and categorisation, Information Fusion, 2005

Decimation

• Tumer, Oza, Input decimated ensembles, Pattern Anal Applic, 2003

• Ho, The Random Subspace Method for Constructing Decision Forests, IEEE PAMI, 1998

Confidence

• Muhlbaier, Polikar, Ensemble confidence estimates posterior probability, Int. Workshop on Multiple Classifier Systems, 2005

• Tourassi, Reliability Assessment of Ensemble Classifiers-Application in Mammography

Missing features

• Provost, Handling Missing Values when Applying Classification Models, Machine Learning Research, 2007

Conferences

• Proc. Workshop Multiple Classifier Systems (Springer)

Various

• Cabrera, On the impact of fusion strategies on classification errors for large ensembles of classifiers, Pattern recognition, 2006

• Fumera, A theoretical and experimental analysis of linear combiners for multiple classifier systems, IEEE Trans. Pattern Anal. Mach. Intell., 2005


Multiplication of sensors in real-world use


http://www.opportunity-project.eu


Activity recognition with sensors that just happen to be available

Opportunistic activity recognition

Designing a pattern recognition system without knowing the input space!


The OPPORTUNITY activity recognition chain


WP4 Ad-hoc cooperative sensing

OPPORTUNITY Architecture, Recognition goal, Self-* principles

• Specify what should be recognized but not how

– E.g.: « Detect grasping manipulative activities with wearable sensors »

• Self-organization in a coordinated sensing mission

– E.g.: « Recognition of manipulative activities » calls for sensors capable of providing movement information, and placed on body to network

• Sensor self-description (statically known characteristics)


WP1 Sensor and features

Filter variations

• Conditioning: re-define features to make them less sensitive to variations

– E.g. use magnitude of acceleration signal, rather than X,Y,Z vector

• Abstraction: different modalities map to the same feature space

– E.g. hand coordinates from inertial sensors or localization system

• Self-characterization: run-time characteristics

– E.g. location, orientation
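The conditioning example can be made concrete: the magnitude of the acceleration vector does not change when the sensor is rotated, whereas the individual X, Y, Z components do. The sample values below are illustrative (a sensor at rest measuring gravity in two different orientations).

```python
from math import sqrt

def acceleration_magnitude(x, y, z):
    """Orientation-independent feature: Euclidean norm of the 3-axis signal."""
    return sqrt(x * x + y * y + z * z)

# Same physical situation, two sensor orientations (values in m/s^2).
upright = (0.0, 0.0, 9.81)
rotated = (9.81, 0.0, 0.0)
print(acceleration_magnitude(*upright), acceleration_magnitude(*rotated))
```

The price of this invariance is that direction information is discarded, which may matter for gestures that differ only in orientation.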


WP2: Opportunistic classifiers

Robust classification & allow for adaptation

• Dynamic « Ensemble classifier » architecture

• Dynamic selection of most informative information channel

• Allow for multimodal data, changing sensor numbers

• Allow for adaptation

[Figure: sensor0, sensor1 … sensorn each feed their own classifier (classifier0, classifier1 … classifiern); the classifier outputs c0, c1 … cn are fused into the class of the user gesture.]


WP3 Dynamic adaptation and autonomous evolution

Run-time monitoring and adaptation of the system

• Adaptation to slow changes, long-term, concept drift

– Sensor degradation, change in user action-motor strategies

• Use new sensors

– Sensing infrastructure changes with upgrades

• Opportunistic user feedback

– Explicit: e.g. feedback through keyboard

– Implicit: e.g. from EEG signals


Dynamic adaptation: power-performance management

• Dynamic ensemble classifiers

• Passively: ensemble classifiers allow for changes in the environment

• Actively: benefit of dynamic adaptation

Zappi et al. Network-level power-performance trade-off in wearable activity recognition: a dynamic sensor selection approach, To appear ACM TECS


Adaptation: Classifier self-calibration to sensor displacement

Förster, Roggen, Tröster, Unsupervised classifier self-calibration through repeated context occurences: is there robustness against sensor displacement to gain?, Proc. Int. Symposium Wearable Computers, 2009

Calibration dynamics: class centers follow cluster displacement in feature space

Self-calibration to displaced sensors increases accuracy:

• by 33.3% in HCI dataset

• by 13.4% in fitness dataset

Principle: upon activity detection, classifiers are re-trained to better model the last classified activity
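The principle can be illustrated with a nearest-class-center classifier whose winning center is moved toward each sample it has just classified. This is a simplified sketch, not the method of the cited paper: the update rule, the learning rate `rate`, the gesture labels and the simulated drift are all assumptions.

```python
def nearest_center(centers, x):
    """Classify x as the class with the closest center (squared distance)."""
    def dist2(label):
        return sum((a - b) ** 2 for a, b in zip(centers[label], x))
    return min(centers, key=dist2)

def classify_and_adapt(centers, x, rate=0.5):
    """Classify x, then move the winning class center toward x."""
    label = nearest_center(centers, x)
    centers[label] = [c + rate * (xi - c) for c, xi in zip(centers[label], x)]
    return label

initial = {"gesture A": [0.0, 0.0], "gesture B": [4.0, 0.0]}

# Simulated sensor displacement: samples of gesture A drift toward B's center.
samples = [[0.3 * k, 0.0] for k in range(1, 11)]

adaptive = {label: list(center) for label, center in initial.items()}
adaptive_labels = [classify_and_adapt(adaptive, x) for x in samples]
static_label = nearest_center(initial, samples[-1])

print(adaptive_labels[-1])  # gesture A: the class center followed the drift
print(static_label)         # gesture B: the fixed classifier is fooled
```

Because the update is unsupervised, a wrong classification also moves a center, so the displacement has to be gradual relative to the distance between classes.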


Adaptation: minimally user-supervised adaptation

Acceleration data Recognized gesture

Error button

Förster et al., Incremental kNN classifier exploiting correct - error teacher for activity recognition, Submitted to ICMLA 2010


Adaptation: minimally user-supervised adaptation

• Adaptation leads to:

• Higher accuracy in the adaptive case vs. control

• Higher input rate

• More "personalized" gestures

Förster et al., Online user adaptation in gesture and activity recognition - what’s the benefit? Tech Rep.

Förster et al., Incremental kNN classifier exploiting correct - error teacher for activity recognition, Submitted to ICMLA 2010

Förster et al., On the use of brain decoded signals for online user adaptive gesture recognition systems, Pervasive 2010

Adaptation: with brain-signal feedback

• ~9% accuracy increase with perfect brain signal recognition

• ~3% accuracy increase with effective brain signal recognition accuracy

• Adaptation guided by the user’s own perception of the system

• User in the loop


• New sensors may be discovered

– Infrastructure upgrades

– Entering a new environment

• Problem: how to use the sensor without self-*?

– Typical in open-ended environments

– Hard to predict what future sensors will be deployed

• Unsupervised approaches to use new sensors!

Using new sensors without supervision…


Using new sensors without supervision… … using behavioral assumptions

• Can a reed switch recognize different gestures and modes of locomotion?

• Extract maximum information content from simple sensors

– Use behavioral assumptions




Application to Opportunity Dataset

• Functionality of wearable sensor is learned incrementally

• Autonomous training of wearable systems

• Only needed: sporadic interactions with the environment

• Applicable in WSN/AmI as demonstrated by hardware implementation

Calatroni et al. Context Cells: Towards Lifelong Learning in Activity Recognition Systems, EuroSSC 2009


Transfer of recognition capabilities

• System designed for domain 1 should work in domain 2

• Changes of sensors between setup 1 and 2

Roggen et al., Wearable Computing: Designing and Sharing Activity-Recognition Systems Across Platforms, IEEE Robotics&Automation Magazine, 2011


Summary

• Improving wearability & user-acceptance

• Addressing real-world deployment issues

• Enabling large-scale Ambient Intelligence environments

www.opportunity-project.eu

EC grant n° 225938
