Lecture 8 - From CBR to IBk

Introduction to Machine Learning. Lecture 8: Instance Based Learning and Case-Based Reasoning. Albert Orriols i Puig, aorriols@salle.url.edu. Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull

Upload: albert-orriols-puig

Post on 24-Jan-2015


Page 1: Lecture8 - From CBR to IBk

Introduction to Machine Learning

Lecture 8: Instance Based Learning and Case-Based Reasoning

Albert Orriols i Puig, aorriols@salle.url.edu

Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle

Universitat Ramon Llull

Page 2: Lecture8 - From CBR to IBk

Recap of Lecture 7: kNN

[Figure: 15-NN vs. 1-NN]

Key aspects
  Value of k
  Distance functions

Slide 2, Artificial Intelligence Machine Learning

Page 3: Lecture8 - From CBR to IBk

Recap of Lecture 7: Where is learning in kNN?

Retrieval system

No global model

No generalization

No learning!

But still, it is able to create accurate classification models


Page 4: Lecture8 - From CBR to IBk

Today’s Agenda

Formalizing the framework: From kNN to CBR

Incorporating learning in different phases:
  Learn prototypes
  Organize the memory in clusters
  Learn the best distance function

Provide explanations


Page 5: Lecture8 - From CBR to IBk

From kNN to CBR

kNN provides a retrieval system

Much work on different phases of kNN
  Prototype selection
  Distance function selection

CBR provides a general framework based on kNN


Page 6: Lecture8 - From CBR to IBk

Schema of CBR

[Figure: the CBR cycle (Aamodt & Plaza, 1994)]

Problem → Retrieve → Reuse → Revise → Solution, with Retain feeding the Case Memory

Retrieve: similarity function
Reuse: select a solution
Revise: revise the solution
Retain: retain the new knowledge
Case Memory: coherence and relevance of the attributes; structure and grouping of the cases


Page 7: Lecture8 - From CBR to IBk

Phases of CBR

Five key phases

Preprocess the training instance
  So that it meets the requirements of the system

Retrieve
  Use kNN with the selected distance function

Reuse
  Vote-based scheme

Revise
  Adapt the solution if necessary

Retain
  Remove examples from or add examples to the case memory
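
The five phases above can be sketched as one small class. This is only an illustration under stated assumptions: the class name `CaseMemory`, the Euclidean distance, the "replace the proposal when feedback is available" revision rule, and the "store only misclassified cases" retain policy are all hypothetical choices, not something the slides prescribe.

```python
from collections import Counter
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class CaseMemory:
    """Hypothetical minimal case memory: a list of (features, label) cases."""

    def __init__(self, cases, k=3, distance=euclidean):
        self.cases = [(self.preprocess(x), y) for x, y in cases]
        self.k = k
        self.distance = distance

    @staticmethod
    def preprocess(raw):
        # Preprocess: here, just coerce the attributes to a tuple of floats.
        return tuple(float(v) for v in raw)

    def retrieve(self, query):
        # Retrieve: the k cases closest to the query under the chosen distance.
        ranked = sorted(self.cases, key=lambda c: self.distance(c[0], query))
        return ranked[: self.k]

    def reuse(self, retrieved):
        # Reuse: majority vote over the labels of the retrieved cases.
        votes = Counter(label for _, label in retrieved)
        return votes.most_common(1)[0][0]

    def revise(self, proposed, true_label=None):
        # Revise: assumed rule, adopt the correct label when it is known.
        return proposed if true_label is None else true_label

    def retain(self, query, label, proposed):
        # Retain: assumed policy, store only cases the memory got wrong.
        if label != proposed:
            self.cases.append((query, label))

    def solve(self, raw_query, true_label=None):
        query = self.preprocess(raw_query)
        proposed = self.reuse(self.retrieve(query))
        final = self.revise(proposed, true_label)
        self.retain(query, final, proposed)
        return proposed


memory = CaseMemory([((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
                     ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")])
print(memory.solve((0.2, 0.1)))
```

Other retain policies (for example, removing redundant cases) would fit the same interface; the one shown is only the simplest to write down.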


Page 8: Lecture8 - From CBR to IBk

Challenges in CBR

Hot areas

Reduce the cost of matching
  Reduce the total number of examples in the case memory
  Organize the case memory in clusters and only consult examples of some clusters

Automatically create distance functions that are suited to your problem

Extraction of explanations
  CBR does not extract legible models (actually, it does not learn any model)


Page 9: Lecture8 - From CBR to IBk

Prototype Selection

Training data sets contain a large number of instances
  Increase the prediction time
  May contain noisy instances

Prototype selection
  Select the representative examples to form the case base
  Remove all the other examples

How?
  Learn which examples are the ones that maximize CBR accuracy


Page 10: Lecture8 - From CBR to IBk

Prototype Selection

Possible sets of prototypes

[Figure: the training data set is split into a training set and a validation set; candidate selections (Sel. Proto 1, Sel. Proto 2, Sel. Proto 3, …) each yield a prototype set (Proto 1, Proto 2, Proto 3), which kNN evaluates on the validation set; a separate test data set is kept apart]

How do we know which is the best selection of prototypes (SP)?
  Does it sound familiar to you?
  Problem: Search for the best SP
  It's just an optimization problem
  For robustness, use cross-validation or similar validation procedures
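
A minimal sketch of the search for the best selection of prototypes, assuming a single train/validation split, 1-NN as the classifier, and plain random search as the optimizer. The function names (`select_prototypes`, `accuracy`) and the parameters `n_candidates` and `size` are hypothetical; random search stands in here for whatever optimizer is actually used.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance (enough for ranking neighbours)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_1nn(prototypes, query):
    """Label of the prototype closest to the query."""
    return min(prototypes, key=lambda p: sq_dist(p[0], query))[1]


def accuracy(prototypes, validation):
    """Fraction of validation cases that 1-NN over the prototypes gets right."""
    hits = sum(predict_1nn(prototypes, x) == y for x, y in validation)
    return hits / len(validation)


def select_prototypes(train, validation, n_candidates=50, size=4, seed=0):
    """Random search: sample candidate prototype sets, keep the most accurate."""
    rng = random.Random(seed)
    best_set, best_acc = None, -1.0
    for _ in range(n_candidates):
        candidate = rng.sample(train, size)       # one candidate selection
        acc = accuracy(candidate, validation)     # score it on the validation set
        if acc > best_acc:
            best_set, best_acc = candidate, acc
    return best_set, best_acc
```

The test data set plays no role in the search; it would only be used once, afterwards, to estimate the accuracy of the chosen prototype set. Replacing the single split with cross-validation only changes how `accuracy` is computed.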

Page 11: Lecture8 - From CBR to IBk

Prototype Selection

Optimization methods used so far
  Genetic Algorithms (Holland, 1975)
  Genetic Programming (Koza et al., 1989)
  Grammatical Evolution (Ryan & O'Neill, 1998)
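
As an illustration of the first method in the list, here is a small genetic algorithm for prototype selection. It is a sketch under assumptions, not the method of any cited work: each individual is a bit mask over the training cases, fitness is 1-NN accuracy on a validation set, and the operators (truncation selection, one-point crossover, bit-flip mutation) and all parameter values are hypothetical choices.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fitness(mask, train, validation):
    """Validation accuracy of 1-NN restricted to the prototypes the mask selects."""
    protos = [case for bit, case in zip(mask, train) if bit]
    if not protos:
        return 0.0
    hits = 0
    for x, y in validation:
        nearest = min(protos, key=lambda p: sq_dist(p[0], x))
        hits += nearest[1] == y
    return hits / len(validation)


def genetic_selection(train, validation, pop_size=20, generations=30,
                      mutation_rate=0.05, seed=0):
    """Return the best bit mask found (1 = keep the case as a prototype)."""
    rng = random.Random(seed)
    n = len(train)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def score(mask):
        # Prefer accurate masks; break ties in favour of fewer prototypes.
        return (fitness(mask, train, validation), -sum(mask))

    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]      # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n)              # one-point crossover
            child = mum[:cut] + dad[cut:]
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in child]               # bit-flip mutation
            children.append(child)
        population = parents + children            # parents survive (elitism)
    return max(population, key=score)
```

Because the parents are carried over unchanged, the best validation accuracy in the population never decreases from one generation to the next.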


Page 12: Lecture8 - From CBR to IBk

Case-Based Memory Clustering

Training data sets contain a large number of instances

Clustering: Place instances in different clusters
  Only retrieve from the same cluster or clusters that are close to you


Page 13: Lecture8 - From CBR to IBk

Case-Based Memory Clustering

Retrieve phase
  1. Compare with all the prototypes
  2. Compare only with the examples of the closest cluster

Reuse phase
  Propose a solution with the retrieved cases

Revise phase
  Revise if the solution is potentially valid

Retain phase
  Update the organization. It may imply the update of the clusters

[Figure: Case Memory at the centre of the Retrieve, Reuse, Revise, Retain cycle]
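
The two-step retrieval and the retain-time update can be sketched as follows. The assumptions are loud ones: the clusters are taken as given (the clustering step itself is not shown), each cluster's prototype is assumed to be its centroid, and the class and method names are hypothetical.

```python
from collections import Counter


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(cases):
    """Mean feature vector of a non-empty list of (features, label) cases."""
    dims = len(cases[0][0])
    return tuple(sum(c[0][d] for c in cases) / len(cases) for d in range(dims))


class ClusteredMemory:
    def __init__(self, clusters, k=3):
        # clusters: list of non-empty lists of (features, label) cases,
        # assumed to come from some clustering step not shown here.
        self.clusters = [list(c) for c in clusters]
        self.prototypes = [centroid(c) for c in self.clusters]
        self.k = k

    def closest_cluster(self, query):
        # Step 1: compare the query with every cluster prototype.
        return min(range(len(self.prototypes)),
                   key=lambda i: sq_dist(self.prototypes[i], query))

    def retrieve(self, query):
        # Step 2: compare only with the examples of the closest cluster.
        members = self.clusters[self.closest_cluster(query)]
        return sorted(members, key=lambda c: sq_dist(c[0], query))[: self.k]

    def reuse(self, retrieved):
        # Propose a solution by majority vote over the retrieved cases.
        return Counter(label for _, label in retrieved).most_common(1)[0][0]

    def retain(self, query, label):
        # Update the organization: add the case and refresh that prototype.
        i = self.closest_cluster(query)
        self.clusters[i].append((query, label))
        self.prototypes[i] = centroid(self.clusters[i])
```

With C clusters of roughly N/C cases each, a query costs about C + N/C distance computations instead of N, which is the saving this organization is after.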

Page 14: Lecture8 - From CBR to IBk

Generation of Distance Functions

How does the distance function influence learning?

It may be the difference between success and failure!


Page 15: Lecture8 - From CBR to IBk

Generation of Distance Functions

Can I find a distance function that makes kNN perform the best in all cases?
  No way. Actually, the No Free Lunch (NFL) theorem says so (Wolpert, 1992)
  Different distances are suited to different domains

Can I try to create a new distance function for each specific problem?
  Of course. Again, an optimization problem


Page 16: Lecture8 - From CBR to IBk

Generation of Distance Functions

Split the training data set into
  Training set'
  Validation set

Optimization problem
  Assume a parametric form
  Optimize the parameters of the underlying function

Being more ambitious?
  Do not assume any parametric form
  Optimize both the function structure and the parameters

Examples:
  (Fornells et al., 2005)
  (Camps et al., 2003)

[Figure: candidate distance functions 1 … n are each plugged into kNN over the training set' and evaluated on the validation set, yielding error 1 … error n]
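
A minimal sketch of the parametric case, assuming the parametric form is an attribute-weighted squared Euclidean distance and that the weights are chosen by exhaustive search over a small grid. Both the form and the search are illustrative assumptions (the cited works are not reproduced here), and the names `weighted_sq_dist`, `error_1nn` and `best_weights` are hypothetical.

```python
import itertools


def weighted_sq_dist(weights, a, b):
    """Assumed parametric form: attribute-weighted squared Euclidean distance."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))


def error_1nn(weights, train, validation):
    """Validation error of 1-NN over `train` using the weighted distance."""
    errors = 0
    for x, y in validation:
        nearest = min(train, key=lambda c: weighted_sq_dist(weights, c[0], x))
        errors += nearest[1] != y
    return errors / len(validation)


def best_weights(train, validation, grid=(0.0, 0.5, 1.0)):
    """Try every weight vector on the grid; return (weights, error) of the best."""
    dims = len(train[0][0])
    candidates = itertools.product(grid, repeat=dims)
    scored = ((error_1nn(w, train, validation), w) for w in candidates)
    err, w = min(scored)   # lowest error; ties go to the smallest weight tuple
    return w, err


# Second attribute is pure noise here, so a good distance should ignore it.
train = [((0.0, 5.0), "a"), ((1.0, 0.0), "b")]
validation = [((0.1, 0.0), "a"), ((0.9, 5.0), "b")]
print(best_weights(train, validation))
```

Each weight vector plays the role of one "distance function i" in the figure, and `error_1nn` produces the corresponding "error i". The grid grows exponentially with the number of attributes, which is why the slides frame this as an optimization problem for a proper search method.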


Page 17: Lecture8 - From CBR to IBk

Extraction of Explanations

One of the main drawbacks of CBR is that it does not provide any explanation
  Prediction based on nearest neighbors

New techniques to provide explanations
  Based on used instances
  Building of partial models

Not studied in more detail here


Page 18: Lecture8 - From CBR to IBk

Next Class

Probabilistic-based learning

