
Page 1

Supervised IR

• Incremental user feedback (Relevance Feedback)

OR

• Initial fixed training sets

– User tags relevant/irrelevant

– Routing problem: initial class

Big open question: How do we obtain feedback automatically with minimal effort?

Refined computation of the relevant set based on:

Page 2

"Unsupervised" IR

Predicting relevance without user feedback

Pattern matching:

– Query vector/set

– Document vector/set

– Co-occurrence of terms assumed to be an indication of relevance

Page 3

Relevance Feedback

Incremental Feedback in vector model

Refer to Rocchio, 1971.

Q0 = initial query

Q1 = Q0 + (1/N_Rel) * Σ(i = 1..N_Rel) R_i  -  (1/N_Irrel) * Σ(i = 1..N_Irrel) S_i

where the R_i are the vectors of the documents tagged relevant and the S_i are the vectors of the documents tagged irrelevant.
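The update can be sketched in a few lines of Python. This is an illustration only, not code from the lecture; the toy query and document vectors are invented.

```python
# Minimal sketch of the Rocchio (1971) relevance-feedback update shown above.
# Queries and documents are dense term-weight vectors (plain Python lists).

def rocchio_update(q0, relevant_docs, irrelevant_docs):
    """Q1 = Q0 + mean(relevant doc vectors) - mean(irrelevant doc vectors)."""
    dims = len(q0)
    q1 = list(q0)
    if relevant_docs:
        for d in range(dims):
            q1[d] += sum(doc[d] for doc in relevant_docs) / len(relevant_docs)
    if irrelevant_docs:
        for d in range(dims):
            q1[d] -= sum(doc[d] for doc in irrelevant_docs) / len(irrelevant_docs)
    return q1

# Example: 4-term vocabulary, one document judged relevant, one judged irrelevant.
q0 = [1.0, 0.0, 0.5, 0.0]
rel = [[0.8, 0.2, 0.9, 0.0]]
irrel = [[0.0, 0.7, 0.0, 0.6]]
print(rocchio_update(q0, rel, irrel))
```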

Page 4

Probabilistic IR/Text Classification

Document Retrieval

If P(Rel|Doci) > P(Irrel|Doci)

Then Doci is “relevant”

Else Doci is “not relevant”

-OR-

If P(Rel|Doci) / P(Irrel|Doci) > 1

Then Doci is "relevant"…

The magnitude of the ratio indicates our confidence.
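A minimal sketch of this decision rule in Python, assuming the two posterior probabilities have already been estimated somewhere upstream; the numbers below are arbitrary.

```python
# Decision rule from the slide: compare P(Rel|Doc) with P(Irrel|Doc),
# or equivalently test whether their ratio exceeds 1.

def is_relevant(p_rel, p_irrel):
    ratio = p_rel / p_irrel          # magnitude of the ratio reflects confidence
    return ratio > 1, ratio

decision, confidence = is_relevant(p_rel=0.95, p_irrel=0.05)
print("relevant" if decision else "not relevant", "ratio =", confidence)
```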

Page 5

Text Classification

Select Classj (e.g. Bowling, DogBreeding, etc.) such that:

P(Classj | Doci) is maximized, where Doci is an incoming mail message

Alternately, select Classj such that

P(Classj | Doci) / P(NOT Classj | Doci)

is maximized

Page 6

General Formulation

Compute:

P(Classj | Evidence)

where Classj is one of a fixed K *disjoint* classes (an item cannot be a member of more than one) and the evidence is a set of feature values (e.g. words in a language, medical test results, etc.)

Uses:

• REL/IRREL: Document Retrieval

• Work/Bowling/Dog Breeding: Text Classification/Routing

• Spanish/Italian/English: Language ID

• Sick/Well: Medical Diagnosis

• Herpes/Not Herpes: Medical Diagnosis

Page 7

Feature Set

Goal: to compute P(Classj | Doci)  (abstract formulation)

P(Classj | Representation of Doci)  (probability given a representation of Doci)

P(Classj | W1, W2, …, Wk)  (one representation: the vector of words in the document)

-OR-

P(Classj | F1, F2, …, Fk)  (more general: a list of document features)
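For concreteness, a small sketch of the word-vector representation: turning a raw document into the features W1…Wk as a bag of words. The whitespace/lowercase tokenization is a simplifying assumption, not something the slides specify.

```python
# Represent Doc_i as a bag of word features W1..Wk.
# Tokenization by whitespace and lowercasing is an assumption for illustration.
from collections import Counter

def bag_of_words(text):
    tokens = text.lower().split()
    return Counter(tokens)           # feature -> count

doc = "Collie show pup fur collie groom"
print(bag_of_words(doc))             # Counter({'collie': 2, 'show': 1, ...})
```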

Page 8

Problem – Sparse Feature Set

In medical diagnosis there are few enough features that it is worth considering all possible feature combinations:

Class     Test 1   Test 2   Test 3   F(H) / F(Not H)
Herpes    T        T        T        30 / 1
-Herpes   T        T        F        12 / 120
Herpes    T        F        T        17 / 3
-Herpes   T        F        F        4 / 186
-Herpes   F        T        T        100 / 32

Can compute P(Evidence|Classi) directly from data for all evidence patterns, e.g. P(T,T,F|Herpes) = 12 / (total Herpes cases)

In IR:

Class     Word 17   Word 24   Word 38   Word 54
Work      C++       Compile   Run       486
Personal  Collie    Show      Pup       Fur
Work      …         …         …         …
Personal  Akita     Show      Pup       Groom

Too many combinations of feature values to estimate the class distribution over all combinations.
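The direct-counting idea can be made concrete with a short sketch; the counts are the ones in the table above (only the rows shown), and the closing comment restates the sparsity problem the slide is pointing at.

```python
# Estimate P(evidence pattern | class) directly from counts of complete
# feature patterns, as in the medical-diagnosis table above.
counts = {
    # (Test1, Test2, Test3): (count with Herpes, count without Herpes)
    (True, True, True):   (30, 1),
    (True, True, False):  (12, 120),
    (True, False, True):  (17, 3),
    (True, False, False): (4, 186),
    (False, True, True):  (100, 32),
}

total_herpes = sum(h for h, _ in counts.values())   # total Herpes cases in the rows shown
p_ttf_given_herpes = counts[(True, True, False)][0] / total_herpes
print(p_ttf_given_herpes)                            # P(T,T,F | Herpes) = 12 / total Herpes

# With 3 binary tests there are only 2**3 = 8 possible patterns, so counting
# every pattern is feasible.  With a vocabulary of tens of thousands of words
# the number of patterns explodes, which is why IR cannot estimate the class
# distribution for every full combination of feature values.
```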

Page 9

Bayes Rule

P(Classi | Evidence) = P(Evidence | Classi) * P(Classi) / P(Evidence)

where P(Classi | Evidence) is the posterior probability of the class given the evidence and P(Classi) is the prior probability of the class.

Uninformative prior: P(Classi) = 1 / (total # of classes)
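A minimal sketch of Bayes' rule as written above, with P(Evidence) obtained by summing over the classes and the uninformative prior of 1/(number of classes) used when no prior is supplied; the likelihood numbers are invented.

```python
# Bayes rule: P(Class_i | Evidence) = P(Evidence | Class_i) * P(Class_i) / P(Evidence),
# where P(Evidence) = sum over classes of P(Evidence | Class_j) * P(Class_j).

def posterior(likelihoods, priors=None):
    classes = list(likelihoods)
    if priors is None:                      # uninformative prior: 1 / (number of classes)
        priors = {c: 1.0 / len(classes) for c in classes}
    p_evidence = sum(likelihoods[c] * priors[c] for c in classes)
    return {c: likelihoods[c] * priors[c] / p_evidence for c in classes}

# Hypothetical likelihoods P(Evidence | Class) for two classes.
print(posterior({"Herpes": 0.8, "NotHerpes": 0.05}))
```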

Page 10

Example in Medical Diagnosis

A single blood test:

P(Herpes | Evidence) = P(Evidence | Herpes) * P(Herpes) / P(Evidence)

where P(Herpes | Evidence) is the probability of herpes given a test result, P(Evidence | Herpes) is the probability of the test result if the patient has herpes, P(Herpes) is the prior probability of the patient having herpes, and P(Evidence) is the probability of a (pos/neg) test result.

P(Herpes | Positive Test) = .9

P(Herpes | Negative Test) = .001

P(Not Herpes | Positive Test) = .1

P(Not Herpes | Negative Test) = .999

Page 11

Evidence Decomposition

P(Classj | Evidence), where the evidence is a given combination of feature values

Medical Diagnosis:

Class      Blood Test   Visible Sores   Fever   Blood Test 2
HERPES     POS          T               T       F
NOT HERP   NEG          F               T       F
NOT HERP   NEG          F               F       F
HERPES     NEG          T               F       T

Text Classification / Routing:

Class          W13        W27       W34        W49            Wi
Work           Compiler   C++       YK486      Disassembler   …
Bowling        …          …         …          …              …
Dog Breeding   Collie     Show      Grooming   Sire           …
Personal       date       Tonight   movie      love           …

Page 12

Example in Text Classification / Routing

P(Classi | Evidence) = P(Evidence | Classi) * P(Classi) / P(Evidence)

where Classi might be Dog Breeding, the evidence is the words in the message (collie, groom, show), and P(Classi) is the prior chance that a mail message is about dog breeding. P(Evidence | Classi) is observed directly from training data.

Training data, Class 1 – Dog Breeding: Fur, Collie, Collie, Groom, Show, Poodle, Sire, Breed, Akita, Pup

Training data, Class 2 – Work: Compiler, X86, C++, Lex, YACC, Java, Computer
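A rough sketch of the whole pipeline on this slide: estimate P(word | class) from the two training word lists and score a new message with Bayes' rule under an independence assumption over words (made explicit a few slides later). Add-one smoothing and the prior values are my assumptions; the slides do not specify either.

```python
# Estimate P(word | class) from the small training lists above, then score a
# new message under the word-independence (naive Bayes) assumption.
from collections import Counter

training = {
    "DogBreeding": ["fur", "collie", "collie", "groom", "show", "poodle",
                    "sire", "breed", "akita", "pup"],
    "Work": ["compiler", "x86", "c++", "lex", "yacc", "java", "computer"],
}
vocab = {w for words in training.values() for w in words}

def word_prob(word, cls):
    counts = Counter(training[cls])
    # Add-one smoothing (an assumption) so unseen words do not get probability 0.
    return (counts[word] + 1) / (len(training[cls]) + len(vocab))

def score(message, prior):
    """Unnormalized P(class | message) ~ P(class) * product of P(word | class)."""
    scores = {}
    for cls in training:
        p = prior[cls]
        for word in message:
            p *= word_prob(word, cls)
        scores[cls] = p
    return scores

print(score(["collie", "groom", "show"], prior={"DogBreeding": 0.3, "Work": 0.7}))
```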

Page 13

Probabilistic IR

Target/Goal:

Document Retrieval (for a given model of relevance to the user's needs):

Evidence          P(Rel|Doci)   P(Irrel|Doci)
(Words in) Doc1   .95           .05
(Words in) Doc2   .80           .20
(Words in) Doc3   .01           .99

Document Routing / Classification:

Evidence          P(Work1)   P(Work2)   P(Dog Breeding)   P(Bowling)   P(other)
(Words in) Doc1   .91        .01        .07               .02          .01
(Words in) Doc2   .45        .45        .03               .05          .02
(Words in) Doc3   .01        .03        .94               .01          .01

Page 14

Multiple Binary Splits

[Diagram: a query Q1 is classified by successive binary splits – first A vs. B, then A1 vs. A2 or B1 vs. B2.]

Flat K-Way Classification

[Diagram: Q1 is classified directly into one of the classes A, B, C, D, E, F, G.]

Page 15

Likelihood Ratios

P(Class1 | Evidence) = P(Evidence | Class1) * P(Class1) / P(Evidence)

P(Class2 | Evidence) = P(Evidence | Class2) * P(Class2) / P(Evidence)

Dividing the two, P(Evidence) cancels:

P(Class1 | Evidence) / P(Class2 | Evidence) = [P(Evidence | Class1) / P(Evidence | Class2)] * [P(Class1) / P(Class2)]

Page 16

Likelihood Ratios

Binary Classifications

1. P(Rel|Doci) / P(Irrel|Doci) – document retrieval; the options are Rel and Irrel

2. P(Work|Doci) / P(Personal|Doci) – binary routing task (2 possible classes)

3. P(Classj|Doci) / P(NOT Classj|Doci) – can treat K-way classification as a series of binary classifications:

• Compute this ratio for all classes

• Choose the class j for which this ratio is greatest

Page 17

Independence Assumption

Evidence = w1, w2, w3, …, wk

P(Class1|Evidence) / P(Class2|Evidence) = [P(Class1) / P(Class2)] * [P(Evidence|Class1) / P(Evidence|Class2)]

= [P(Class1) / P(Class2)] * Π(i = 1..k) [P(wi|Class1) / P(wi|Class2)]

(final odds = initial odds * product of the per-word likelihood ratios)

Page 18

Using Independence Assumption

P(Personal | Akita, pup, fur, show) / P(Work | Akita, pup, fur, show)

= [P(Personal)/P(Work)] * [P(Akita|Personal)/P(Akita|Work)] * [P(pup|Personal)/P(pup|Work)] * [P(fur|Personal)/P(fur|Work)] * [P(show|Personal)/P(show|Work)]

With the counts from the training data:

P(Personal|Evidence) / P(Work|Evidence) = (1/9) * (27/2) * (18/0) * (36/2) * (3/5) = some constant

(product of the likelihood ratios for each word)
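A short sketch of this odds computation: the final odds are the initial odds times the product of the per-word likelihood ratios. The slide's P(Akita|Work) entry is zero, which would make the ratio blow up, so a small floor value stands in for smoothing here; both the floor and the per-word probabilities are invented for illustration, not the slide's raw counts.

```python
# Final odds = initial odds * product of per-word likelihood ratios.
# A tiny floor value replaces zero probabilities so a P(word|class) = 0 entry
# (like P(Akita|Work) on the slide) does not cause division by zero.

def odds(initial_odds, word_ratios, floor=1e-6):
    result = initial_odds
    for p_word_class1, p_word_class2 in word_ratios:
        result *= max(p_word_class1, floor) / max(p_word_class2, floor)
    return result

# Illustrative (P(w|Personal), P(w|Work)) pairs for the four words.
ratios = [(0.0027, 0.0002), (0.0018, 0.0), (0.0036, 0.0002), (0.0003, 0.0005)]
print(odds(initial_odds=1/9, word_ratios=ratios))
```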

Page 19

Note: Ratios Are (Partially) Self-Weighting

e.g. P(The|Personal) / P(The|Work) ≈ 1   (5137/100,000 vs. 5238/100,000)

e.g. P(Akita|Personal) / P(Akita|Work) = 37   (37/100,000 vs. 1/100,000)

Page 20

Bayesian Model Applications

Authorship Identification:

P(Hamilton|Evidence) / P(Madison|Evidence) = [P(Evidence|Hamilton) / P(Evidence|Madison)] * [P(Hamilton) / P(Madison)]

Sense Disambiguation:

P(Tank-Container|Evidence) / P(Tank-Vehicle|Evidence) = [P(Evidence|Tank-Container) / P(Evidence|Tank-Vehicle)] * [P(Tank-Container) / P(Tank-Vehicle)]

Page 21

Dependence Trees (Hierarchical Bayesian Models)

P(w1, w2, …, w6) = P(w1) * P(w2|w1) * P(w3|w2) * P(w4|w2) * P(w5|w2) * P(w6|w5)

[Diagram: dependence tree with arrows showing the direction of dependence – w1 → w2; w2 → w3, w4, w5; w5 → w6.]
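To make the tree factorization concrete, here is a small sketch that evaluates the joint probability of binary variables along a dependence tree with the structure in the figure; the conditional probability tables are invented.

```python
# Evaluate the joint probability of binary variables factorized along a
# dependence tree: P(w1,...,w6) = P(w1) * product over non-root nodes of P(wi | parent(wi)).
# Tree structure follows the figure (w1 -> w2; w2 -> w3, w4, w5; w5 -> w6);
# the probability numbers are invented for illustration.

parent = {"w2": "w1", "w3": "w2", "w4": "w2", "w5": "w2", "w6": "w5"}
p_root = {"w1": 0.6}                                  # P(w1 = True)
p_given = {                                           # P(node = True | parent value)
    "w2": {True: 0.7, False: 0.2},
    "w3": {True: 0.5, False: 0.1},
    "w4": {True: 0.4, False: 0.3},
    "w5": {True: 0.8, False: 0.1},
    "w6": {True: 0.6, False: 0.05},
}

def joint(assignment):
    p = p_root["w1"] if assignment["w1"] else 1 - p_root["w1"]
    for node, par in parent.items():
        cond = p_given[node][assignment[par]]
        p *= cond if assignment[node] else 1 - cond
    return p

print(joint({"w1": True, "w2": True, "w3": False, "w4": True, "w5": True, "w6": False}))
```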

Page 22

Full Probability Decomposition

P(w) = P(w1) * P(w2|w1) * P(w3|w2,w1) * P(w4|w3,w2,w1) * …

Using Simplifying (Markov) Assumptions

P(w) = P(w1) * P(w2|w1) * P(w3|w2) * P(w4|w3) * …
(Assume each word's probability is conditioned only on the previous word)

Assumption of Full Independence

P(w) = P(w1) * P(w2) * P(w3) * P(w4) * …

Graphical Models – Partial Decomposition into Dependence Trees

P(w) = P(w1) * P(w2|w1) * P(w3) * P(w4|w1,w3) * …
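As a small numeric contrast between the Markov decomposition and full independence, here is a toy sketch with invented unigram and bigram probabilities for a three-word sequence.

```python
# Toy comparison of the Markov (bigram) decomposition with full independence
# for a short word sequence; all probabilities are invented for illustration.

unigram = {"the": 0.05, "dog": 0.01, "barks": 0.005}
bigram = {("the", "dog"): 0.2, ("dog", "barks"): 0.1}   # P(w_i | w_{i-1})

sequence = ["the", "dog", "barks"]

# Markov assumption: P(w) = P(w1) * P(w2|w1) * P(w3|w2) * ...
p_markov = unigram[sequence[0]]
for prev, cur in zip(sequence, sequence[1:]):
    p_markov *= bigram[(prev, cur)]

# Full independence: P(w) = P(w1) * P(w2) * P(w3) * ...
p_indep = 1.0
for w in sequence:
    p_indep *= unigram[w]

print(p_markov, p_indep)
```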