
Handling Concept Drift in Adaptive Information Systems:

Review of the State-of-the-Art and Ideas how to Handle it Better

Mykola Pechenizkiy

Information Systems Group

Department of Computer Science, Eindhoven University of Technology

The Netherlands

GOOD-dag at Hingene, Bornem, Belgium, May 25, 2007


Very Short CV

• 1996 – 2000 Univ. of Radioelectronics, Ukraine (B.Sc.)

• 2000 – 2001 University of Jyväskylä, Finland (M.Sc.)

• 2002 – 2005 University of Jyväskylä
– Ph.D. (12.2005) in CS, “Feature Extraction for Supervised Learning in Knowledge Discovery Systems”

• 01–09.2006 University of Jyväskylä and VTT
– DM for a pilot CFB reactor (time-series mining)

• 10.2006 joined TU/e


Research Interests

• Dimensionality reduction in machine learning
– feature extraction, feature selection (PhD)

• Ensemble methods in machine learning
– combination functions, ensemble feature selection

• Processing data streams with concept drift (CD)
– handling (local) CD using dynamic integration in ensembles

• Data mining systems as information systems
– use of achievements of the IS discipline in DM research

• Research questions that application areas suggest
– industrial (pilot CFB reactor),
– medical (antibiotic resistance),
– adaptive hypermedia (access to cultural heritage)


Outline

• Adaptive IS / Adaptive Hypermedia (AH)
– information overload; user modelling (UM)
– machine learning for UM

• What is concept drift
– types and a few examples

• Approaches to track concept drift
– performance measures, data characteristics

• Approaches to handle concept drift
– instance selection, instance weighting, ensemble learning

• Summary on the state of the art
– ways of improving CD handling in AIS


Adaptive Information Systems

• Explosion of information
– WWW, including the deep web, various IS
– e-Learning, e-Health, e-Culture, e-Entertainment

• Information overload
– IR, personalized IR, adaptive IS, new recommender systems, personalized agents, etc.

• Requires user modelling (UM)
– quizzes and simple user profiling

• Observation of the user’s behaviour
– machine learning


Examples of ML for UM

• Student modelling
– predicting which learning objects are (the most) relevant

• On-line shopper modelling (electronic commerce)
– buying preferences

• Mobile-phone users
– use of different information services (news, LBS)

• Web news readers, movie recommenders
– predicting which information objects are (the most) relevant

• E-mail processing
– classifying e-mails into different folders, including the spam folder

• User interface adaptation

• Traditional classification task?


Traditional classification task?

• Learning to predict which items are of the most interest to a user (or simply are/aren’t of interest)
– can be defined as a traditional classification or IR task (in the context of on-line learning).

• Learning to predict user interests
– can be defined similarly to a multi-label hierarchical text classification task (also in on-line learning).

• Learning to predict changes (drift) in:
– user interests and preferences,
– user groups (social network evolution),
– categorization of information content (the corpus of information resources is open).


Difficulties of applying ML for UM

Webb et al., 2000:

• Lack of large data sets
– classifiers need to learn from very few examples

• Lack of labeled data
– correct labels aren’t apparent from observing the user’s behaviour

• Concept drift
– the rest of this talk

• Computational complexity
– e.g. in the context of giants like Google, Yahoo, etc.

but also:

• Variety of user needs, interests, and skills

• Lack of both real applications and benchmarks


ML for UM: Concept drift

• UM is known to be highly dynamic
– attributes that characterize a user are likely to change over time;
– therefore, it is important that learning algorithms be capable of adjusting to these changes quickly;
– in many cases the attributes that cause the change are not observed by the AIS.

• From an ML perspective, this is a challenging problem known as concept drift (WK96).

• What is concept drift?


General Definition of Concept Drift

• The closed-world assumption in machine learning
– existing algorithms learn from observations described by a finite set of attributes.

• In reality there can be important properties of the domain that are not observed
– hidden variables that influence the concept.

• Hidden variables may change over time
– concepts learned at one time can become inaccurate;
– possible changes in the characteristic properties of the concept.

• Concept drift
– changes in the hidden context (a dependency not given explicitly in the form of predictive features) that can induce more or less radical changes in the target concept.


Types of Concept Drift

• The nature of change is diverse and abundant.

• Cause of change (population drift versus concept drift); a formal sketch of this distinction is given below
– Virtual concept drift (sampling shift, population drift)
• the current model needs to change because the data distribution has changed.
– Real concept drift
• changes in the hidden context change the target concept itself.
– Virtual CD often occurs together with real CD.

• Rate of change
– Sudden (abrupt, instantaneous), sometimes called concept shift
– Gradual (moderate, slow)
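
One common way to make this distinction precise (a sketch in standard notation, not spelled out this way on the slide) is in terms of the joint distribution from which examples are drawn at time t:

```latex
% Examples (x, y) observed at time t are drawn from a joint distribution
%   P_t(x, y) = P_t(x) \, P_t(y \mid x).
% Concept drift between time points t_0 and t_1 means
\[
  \exists\, x :\; P_{t_0}(x, y) \neq P_{t_1}(x, y).
\]
% Real concept drift:    P_{t_0}(y \mid x) \neq P_{t_1}(y \mid x)
%                        (the target concept itself changes);
% Virtual concept drift: P_{t_0}(x) \neq P_{t_1}(x) while P(y \mid x) is unchanged
%                        (only the distribution of the incoming data changes).
```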


General Goals of an Ideal CD Handling Approach

– Quickly adapts to concept drift.
– Is robust to noise and distinguishes it from concept drift.
– Recognizes and reacts to recurring contexts (such as seasonal differences).

• Examples of CD in UM/AIS
– news items are, by definition, interesting only once
– learning objects are useful until they are learnt
– buying preferences change because of a new job/salary/marital status
– please suggest more!

• Examples of different types of noise (not CD)
– occasionally wrong selection/rating/skipping of an information object
– connection failure (mobile computing)
– related to discouraging brittleness in AIS


Approaches to Handle Concept Drift

• Instance selection (partial memory, abrupt forgetting):
– select the instances relevant to the current concept;
– generalize from a moving window (of fixed or adaptive size) and use the learnt concept for prediction only in the immediate future;
– case-base editing strategies in CBR that delete noisy, irrelevant and redundant cases.

• Instance weighting (full memory, gradual forgetting):
– weight instances according to their “age” and their competence w.r.t. the current concept;
– weighting techniques tend to handle CD worse than analogous instance selection techniques (due to overfitting the data).

• Ensemble learning:
– maintains a set of concept descriptions whose predictions are combined using, e.g., a form of voting;
– the data are divided into sequential blocks of fixed size and an ensemble is built on them.

(A small sketch of the first two strategies follows this list.)
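
To make the first two strategies concrete, here is a minimal sketch of a sliding-window learner and of age-based gradual-forgetting weights. It assumes scikit-learn-style estimators; the class, function names, window size and decay rate are illustrative, not the exact algorithms from the literature surveyed here.

```python
from collections import deque
import numpy as np
from sklearn.naive_bayes import GaussianNB

class SlidingWindowLearner:
    """Instance selection: keep only the last `window_size` examples
    (abrupt forgetting) and retrain on them after every update."""

    def __init__(self, window_size=500):
        self.window = deque(maxlen=window_size)   # old examples fall out automatically
        self.model = GaussianNB()

    def update(self, x, y):
        self.window.append((x, y))
        X = np.array([xi for xi, _ in self.window])
        Y = np.array([yi for _, yi in self.window])
        self.model.fit(X, Y)                      # relearn from the current window only

    def predict(self, x):
        return self.model.predict(np.asarray(x).reshape(1, -1))[0]


def gradual_forgetting_weights(n_examples, decay=0.01):
    """Instance weighting: full memory, but older examples get exponentially
    smaller weights (example 0 is the oldest, example n-1 the newest)."""
    ages = np.arange(n_examples - 1, -1, -1)      # age 0 = most recent example
    return np.exp(-decay * ages)

# Usage sketch: pass the weights to any learner that accepts sample_weight,
# e.g. GaussianNB().fit(X, y, sample_weight=gradual_forgetting_weights(len(y)))
```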


Approaches to Detect Concept Drift

• Detection can provide a meaningful description and quantification of the changes
– indicating change points or small time windows where the change occurs.

• Detection may follow two different approaches:
– monitoring the evolution of performance indicators over time
• performance measures like accuracy, coverage
• properties of the data like …
– monitoring distributions on two different time windows.

• Existing techniques: the FLORA family (WK96).

(A minimal sketch of performance-based detection follows.)
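
A simplified illustration of the first approach (monitoring a performance indicator over time). This is not FLORA or any other specific published detector; the window size, the drop threshold and the hypothetical retrain_on_recent_data reaction are illustrative assumptions.

```python
from collections import deque

class AccuracyDropDetector:
    """Signal drift when accuracy on a recent window drops by more than a
    threshold below the best accuracy seen so far (illustrative only)."""

    def __init__(self, window_size=100, drop_threshold=0.10):
        self.recent = deque(maxlen=window_size)   # 1 = correct prediction, 0 = error
        self.best_accuracy = 0.0
        self.drop_threshold = drop_threshold

    def add_result(self, correct):
        self.recent.append(1 if correct else 0)
        if len(self.recent) < self.recent.maxlen:
            return False                          # not enough evidence yet
        acc = sum(self.recent) / len(self.recent)
        self.best_accuracy = max(self.best_accuracy, acc)
        if self.best_accuracy - acc > self.drop_threshold:
            self.best_accuracy = acc              # reset the reference level
            return True                           # drift signalled
        return False

# Usage sketch:
# detector = AccuracyDropDetector()
# for x, y in stream:
#     if detector.add_result(model.predict(x) == y):
#         model = retrain_on_recent_data()        # hypothetical reaction to drift
```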


Categorization of CD Handling Strategies

Blind methods vs. informed methods
• Blind: adapt the learner at regular intervals without considering whether changes have really occurred
– instance selection and instance weighting
• Informed: modify the decision model only after a change has been detected
– used in conjunction with a detection model.

Single vs. multiple models and their granularity
• global models (Naive Bayes, FLD, SVM)
– require reconstruction of the whole decision model
• granular decision models (decision rules or trees)
– can adapt parts of the decision model
• ensembles of multiple classifiers
– static and dynamic integration

Global eager learners vs. local lazy learning
• only local learning has the possibility to adapt to local CD
– local CD: changes are different in different regions of the instance space


More recent research on handling CD

• Takes into account that CD can be local
– changes in the concept or data distribution occur only in some regions of the instance space,
– the type and severity of changes may depend on the location in the instance space.

• Changes in the concept and data distribution occur at the instance rather than the data-set level
– local CD occurs between two consecutive time points
• if there is a sub-space of the whole instance space whose concept and/or data distribution changes differently from the rest of the data;
– it is reflected by a different change in the (local) predictive performance of the currently used model in this subspace.

(A formal sketch of this definition is given below.)
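
One way to write this down, as a sketch reusing the notation introduced earlier; the region symbol R and the error notation are illustrative and not taken from the slide:

```latex
% Local concept drift between consecutive time points t and t+1: there exists a
% sub-space R of the instance space X whose change differs from that of the rest:
\[
  \exists\, R \subset X :\quad
  \text{the change of } P(y \mid x) \text{ and/or } P(x) \text{ on } R
  \;\neq\; \text{the change on } X \setminus R .
\]
% It is reflected in the local error of the current model h,
%   err_S(h) = P( h(x) \neq y \mid x \in S ),
% which changes differently on R than on X \setminus R.
```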


Approaches to Handle Local CD

• Case-based reasoning

• Granular decision models
– VFDT, UFFT
– (figures from Gama & Castillo, “Learning with Local Drift Detection”)

• Ensemble learning with dynamic integration of classifiers


Ensemble classification

[Diagram: a training set T is split into subsets T1, T2, …, TS; base classifiers h1, h2, …, hS are learnt from them (learning phase); at application time a new instance (x, ?) is labelled by the combined model h* = F(h1, h2, …, hS), producing (x, y*).]

How to prepare the inputs for the generation of the base classifiers?


Ensembles: the need for disagreement

[Same ensemble diagram as on the previous slide: the learning phase builds h1, h2, …, hS from T1, T2, …, TS; the application phase combines them as h* = F(h1, h2, …, hS).]

Krogh and Vedelsby (1995): the ensemble error decomposes as

E = Ē − Ā

where Ē is the (weighted) average error of the ensemble members and Ā is the ensemble ambiguity (the average disagreement of the members with the combined prediction).

• The overall error depends on the average error of the ensemble members.
• Increasing ambiguity decreases the overall error,
• provided it does not result in an increase in the average error.

(A numerical check of this decomposition follows.)
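
The decomposition is easy to check numerically for an averaging ensemble under squared error (a small self-contained sketch; the target value, member predictions and combination weights are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 5 ensemble members predicting a single target value y,
# combined by a weighted average (weights sum to 1).
y = 2.0                                          # true target
preds = rng.normal(loc=2.0, scale=0.5, size=5)   # member predictions f_i(x)
w = np.full(5, 1 / 5)                            # uniform combination weights

f_bar = np.sum(w * preds)                 # combined prediction
E = (f_bar - y) ** 2                      # ensemble error
E_bar = np.sum(w * (preds - y) ** 2)      # average member error
A_bar = np.sum(w * (preds - f_bar) ** 2)  # ambiguity (disagreement with the combination)

print(E, E_bar - A_bar)                   # the two numbers coincide: E = E_bar - A_bar
assert np.isclose(E, E_bar - A_bar)
```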


Ensemble classification

[Same ensemble diagram: the learning phase produces h1, h2, …, hS; the application phase forms h* = F(h1, h2, …, hS).]

How to combine the predictions of the base classifiers?


Integration of classifiers

Motivation for dynamic integration:
The main assumption is that each classifier is the best in some sub-areas of the whole data set, where its local error is comparatively lower than the corresponding errors of the other classifiers.

[Taxonomy of integration approaches:]
• Integration by selection
– static: Static Selection (CVM)
– dynamic: Dynamic Selection (DS)
• Integration by combination
– static: Weighted Voting (WV)
– dynamic: Dynamic Voting (DV), Dynamic Voting with Selection (DVS)

(A sketch of dynamic integration at prediction time follows.)
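
A minimal sketch of the idea behind DS and DV: local accuracy is estimated on the k nearest validation instances of the incoming example, and the base classifiers are then either selected or weighted accordingly. The published procedures differ in details; the function name and parameter values here are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def dynamic_integration(x, classifiers, X_val, y_val, k=15, mode="dv"):
    """Combine base classifiers according to their *local* accuracy around x.
    mode="ds": dynamic selection (pick the locally best classifier);
    mode="dv": dynamic voting (weight each vote by local accuracy)."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_val)
    _, idx = nn.kneighbors(np.asarray(x).reshape(1, -1))
    neigh_X, neigh_y = X_val[idx[0]], y_val[idx[0]]

    # Local accuracy of each base classifier in the neighbourhood of x
    local_acc = np.array([np.mean(clf.predict(neigh_X) == neigh_y)
                          for clf in classifiers])

    if mode == "ds":
        return classifiers[int(np.argmax(local_acc))].predict(
            np.asarray(x).reshape(1, -1))[0]

    # Dynamic voting: accuracy-weighted vote over the classifiers' predictions
    votes = {}
    for clf, w in zip(classifiers, local_acc):
        label = clf.predict(np.asarray(x).reshape(1, -1))[0]
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)
```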


Handling Concept Drift with Ensembles

• The ensemble is constructed as a set of concept descriptions corresponding to different time intervals:

[Timeline figure: the data stream is cut into consecutive blocks; each block becomes the training set for the next base classifier.]

• Usually simple voting is used for model combination
– it does not work well in complex domains with local concept drift.

• Our basic idea: use local accuracies for model combination in order to handle local concept drift
– adapts to concept drift better (e.g. with the antibiotic resistance data).

(A sketch of block-based ensemble construction follows.)
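
A sketch of the block-based construction described above, reusing the dynamic_integration sketch from the previous slide. The block size, the choice of base learner and the use of the most recent block as validation data for the local accuracies are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def build_block_ensemble(X_stream, y_stream, block_size=500, max_members=10):
    """Cut the stream into consecutive blocks of fixed size and train one
    base classifier per block, keeping only the most recent members."""
    classifiers = []
    for start in range(0, len(X_stream) - block_size + 1, block_size):
        Xb = X_stream[start:start + block_size]
        yb = y_stream[start:start + block_size]
        classifiers.append(DecisionTreeClassifier().fit(Xb, yb))
        classifiers = classifiers[-max_members:]      # bounded ensemble size
    return classifiers

# Usage sketch: combine the members per instance with the local-accuracy scheme
# above, using the most recent block as the validation set:
# ensemble = build_block_ensemble(X_stream, y_stream)
# y_hat = dynamic_integration(x_new, ensemble, X_stream[-500:], y_stream[-500:])
```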


The task of classification with (local) CD

• Predicting antibiotic resistance
– predict the sensitivity of a pathogen to an antibiotic based on data about the antibiotic, the isolated pathogen, and the demographic and clinical features of the patient.


Prediction of Antibiotic Resistance


Classification over Sequential Data Blocks

[Chart: accuracy of C4.5 ensembles over sequential data blocks; curves for v (voting), wv (weighted voting), ds, dv and dvs; y-axis 0.3–0.9. Our ds, dv and dvs approaches show much better results.]


Stability of Regions: Rotating Hyperplane

• Base models of an ensemble should not be discarded if their global accuracy on the current block of data falls but they are still good experts in the stable parts of the data.

• One solution to this problem is the use of DIC:
– the models are integrated at the instance level according to their local accuracies.

[Figure: stability of regions in the rotating hyperplane problem at time points t1, t2, t3, t4.]

(A sketch of a rotating hyperplane stream generator follows.)
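
The rotating hyperplane is a common synthetic benchmark for gradual drift; here is a minimal generator sketch. The dimensionality, rotation speed and labelling rule are illustrative and not necessarily the exact setup behind the results reported on the next slide.

```python
import numpy as np

def rotating_hyperplane_stream(n_blocks=20, block_size=1000, n_features=2,
                               angle_step=np.pi / 40, seed=0):
    """Yield (X, y) blocks where the class boundary is a hyperplane through the
    origin whose normal vector rotates a little between consecutive blocks,
    producing gradual concept drift."""
    rng = np.random.default_rng(seed)
    angle = 0.0
    for _ in range(n_blocks):
        # Normal vector of the current decision hyperplane (rotation in the
        # first two dimensions; any extra features are irrelevant noise).
        w = np.zeros(n_features)
        w[0], w[1] = np.cos(angle), np.sin(angle)
        X = rng.uniform(-1.0, 1.0, size=(block_size, n_features))
        y = (X @ w >= 0).astype(int)         # label = side of the hyperplane
        yield X, y
        angle += angle_step                  # the concept drifts gradually

# Usage sketch:
# for X_block, y_block in rotating_hyperplane_stream():
#     ...train a new base classifier on the block and evaluate the ensemble...
```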


Classification over Sequential Data Blocks

[Chart: accuracy of C4.5 ensembles on the rotating hyperplane data over sequential blocks; curves for v, wv, ds, dv and dvs; y-axis 0.7–0.95.]


Classification over Sequential Data Blocks

[Chart: accuracy of Naïve Bayes ensembles on the SEA concepts data over sequential blocks; curves for v, wv, ds, dv and dvs; y-axis 0.7–0.9.]


State-of-the-art CD handling approaches

With respect to the UM problem, they do not take into account:

• the development of the scope of user interests in terms of coverage of topics,

• the different types of changes in user interest and their recurrence,

• the hierarchical nature and rich semantics of information content,

• (un)certainty in
– the correct capturing of user interests,
– the relevancy (and top relevancy) of an information object,

• the existence and evolution of user social networks
– CD can be predicted (for proactive control).


Future work

• Development of a new unifying theoretical framework and corresponding ML techniques to facilitate more effective UM;

• more stress on interdisciplinary work
– online machine learning, stream mining,
– text mining, information retrieval,
– recommender systems,
– user modelling, adaptive IS, etc.;

• development of a research infrastructure for studying CD;

• evaluation of the framework and techniques with various simulation studies; and

• integration of the techniques with existing AIS for external validation.


Concluding Summary

• CD handling is still an emerging area of research.

• Interesting research directions from the DM side
– ensemble learning, and meta-learning in general
– CBR approaches
– rule-based approaches (also for cyclic concept sequences)
– dimensionality reduction in the presence of CD
– regression, clustering
– visualization of CD
– … application areas will suggest more

• E.g. in UM for AIS
– nature of information content
– many uncertainties
– social networks

• Research infrastructure is important.


Thank you!

Questions, comments, suggestions, and collaboration, of course, are warmly welcome!


References

• Gama, J., Castillo, G. 2006. Learning with Local Drift Detection. ADMA 2006: 42–55.

• Klinkenberg, R. and Renz, I. 1998. Adaptive information filtering: Learning in the presence of concept drifts. In Learning for Text Categorization, pages 33–40. AAAI Press.

• Tsymbal, A., Pechenizkiy, M., Cunningham, P., Puuronen, S. 2006. Dynamic Integration of Classifiers for Handling Concept Drift. (To appear) Special Issue of the Information Fusion journal “Applications of Ensemble Methods”, Elsevier Science.
http://www.win.tue.nl/~mpechen/data/DriftSets/dic_cd.pdf

• Widmer, G. and Kubat, M. 1996. Learning in the presence of concept drift and hidden contexts. Machine Learning, 23: 69–101.