Post on 01-Jan-2016
Supervised IR
• Incremental user feedback (Relevance Feedback)
OR
• Initial fixed training sets – user tags documents as relevant/irrelevant
– Routing problem: initial class labels
Big open question: How do we obtain feedback automatically with minimal effort?
Refined computation of the relevant set is based on this feedback.
“Unsupervised” IR
Predicting relevance without user feedback
Pattern matching:
– Query vector/set
– Document vector/set
– Co-occurrence of terms is assumed to be an indication of relevance
Relevance Feedback
Incremental Feedback in vector model
Refer to Rocchio, 71
Q0 = initial query

Q1 = Q0 + (1/NRel) * Σ(i=1..NRel) Ri - (1/NIrrel) * Σ(i=1..NIrrel) Si

where the Ri are the vectors of documents judged relevant and the Si are the vectors of documents judged irrelevant.
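As a concrete sketch, the Rocchio update above can be written in a few lines of NumPy. The unit weights on the three terms follow the formula as given; practical systems often add tunable weights on each term.

```python
import numpy as np

def rocchio_update(q0, relevant, irrelevant):
    """One round of Rocchio relevance feedback: move the query toward the
    centroid of the relevant documents and away from the irrelevant centroid."""
    r = np.mean(relevant, axis=0) if len(relevant) else np.zeros_like(q0)
    s = np.mean(irrelevant, axis=0) if len(irrelevant) else np.zeros_like(q0)
    return q0 + r - s

# Toy term-weight vectors over a 3-word vocabulary.
q0 = np.array([1.0, 0.0, 1.0])
q1 = rocchio_update(q0,
                    relevant=[np.array([1.0, 1.0, 0.0])],
                    irrelevant=[np.array([0.0, 1.0, 1.0])])
# q1 is now [2.0, 0.0, 0.0]
```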
Probabilistic IR/Text Classification
Document Retrieval
If P(Rel|Doci) > P(Irrel|Doci)
Then Doci is “relevant”
Else Doci is “not relevant”
-OR-
If P(Rel|Doci) / P(Irrel|Doci) > 1
Then Doci is “relevant”
The magnitude of the ratio indicates our confidence.
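The ratio rule above is trivial to code; a minimal sketch (the function name is mine) that returns both the label and the ratio as a confidence score:

```python
def retrieval_decision(p_rel, p_irrel):
    """Classify a document by the ratio P(Rel|Doc) / P(Irrel|Doc):
    ratio > 1 means relevant, and its magnitude reflects confidence."""
    ratio = p_rel / p_irrel
    label = "relevant" if ratio > 1 else "not relevant"
    return label, ratio

label, confidence = retrieval_decision(0.95, 0.05)  # "relevant", ratio ~ 19
```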
Text Classification
Select Classj such that:
P(Classj | Doci) is maximized
– Classj: e.g. Bowling, DogBreeding, etc.
– Doci: e.g. an incoming mail message
Alternately, select Classj such that:
P(Classj | Doci) / P(NOT Classj | Doci) is maximized
General Formulation
Compute: P(Classj | Evidence)
– Classj: one of a fixed set of K *disjoint classes* – can’t be a member of more than 1
– Evidence: a set of feature values (e.g. words in a language, medical test results, etc.)
Uses:
• Rel/Irrel – Document Retrieval
• Work/Bowling/Dog Breeding – Text Classification/Routing
• Spanish/Italian/English – Language ID
• Sick/Well – Medical Diagnosis
• Herpes/Not Herpes – Medical Diagnosis
Feature Set
Goal: to compute:
P(Classj | Doci)                      – abstract formulation
P(Classj | representation of Doci)    – probability given a representation of Doci
P(Classj | W1, W2, … Wk)              – one representation: a vector of words in the document
-OR-
P(Classj | F1, F2, … Fk)              – more general: a list of document features
Problem – Sparse Feature Set
In medical diagnosis there are few enough features that it is worth considering all possible feature combinations:

Class        Test 1   Test 2   Test 3   F(H) / F(Not H)
Herpes         T        T        T         30 / 1
Not Herpes     T        T        F         12 / 120
Herpes         T        F        T         17 / 3
Not Herpes     T        F        F          4 / 186
Not Herpes     F        T        T        100 / 32

Can compute P(Evidence|Classi) directly from the data for all evidence patterns,
e.g. P(T,T,F | Herpes) = 12 / (total Herpes cases)
In IR, by contrast:

Class      Word 17   Word 24   Word 38   Word 54
Work       C++       Compile   Run       486
Personal   Collie    Show      Pup       Fur
Work       …
Personal   Akita     Show      Pup       Groom

There are too many combinations of feature values to estimate the class distribution for every combination.
Bayes Rule

P(Classi|Evidence) = P(Evidence|Classi) * P(Classi) / P(Evidence)

P(Classi|Evidence): posterior probability of the class given the evidence
P(Classi): prior probability of the class

Uninformative prior: P(Classi) = 1 / (total # of classes)
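A worked instance of the rule, with hypothetical numbers (the 1% prior and 5% false-positive rate below are my assumptions, not from the slides); P(Evidence) is expanded as the total probability of a positive test:

```python
def posterior(likelihood, prior, evidence_prob):
    """Bayes rule: P(Class|Evidence) = P(Evidence|Class) * P(Class) / P(Evidence)."""
    return likelihood * prior / evidence_prob

# Hypothetical numbers: P(pos|Herpes) = 0.9, P(Herpes) = 0.01,
# P(pos|Not Herpes) = 0.05, so P(pos) = 0.9*0.01 + 0.05*0.99.
p_pos = 0.9 * 0.01 + 0.05 * 0.99
p = posterior(0.9, 0.01, p_pos)  # ~ 0.154: a positive test alone is weak evidence
```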
Example in Medical Diagnosis – a single blood test

P(Herpes|Evidence) = P(Evidence|Herpes) * P(Herpes) / P(Evidence)

P(Herpes|Evidence): probability of herpes given a test result
P(Evidence|Herpes): probability of the test result if the patient has herpes
P(Herpes): prior probability of the patient having herpes
P(Evidence): probability of a (pos/neg) test result

P(Herpes|Positive Test)     = .9
P(Herpes|Negative Test)     = .001
P(Not Herpes|Positive Test) = .1
P(Not Herpes|Negative Test) = .999
Evidence Decomposition
P(Classj | Evidence), where Evidence is a given combination of feature values

Medical Diagnosis:

Class      Blood Test   Visible Sores   Fever   Blood Test 2
HERPES     POS          T               T       F
NOT HERP   NEG          F               T       F
NOT HERP   NEG          F               F       F
HERPES     NEG          T               F       T
Text Classification / Routing:

Class          W13        W27       W34        W49            Wi
Work           Compiler   C++       YK486      Disassembler   …
Bowling        …
Dog Breeding   Collie     Show      Grooming   Sire           …
Personal       date       Tonight   movie      love           …
Example in Text Classification / Routing

P(Classi|Evidence) = P(Evidence|Classi) * P(Classi) / P(Evidence)

P(Evidence|Classi): observed directly through training data
  (e.g. Dog Breeding evidence: collie, groom, show)
P(Classi): prior chance that a mail message is about dog breeding

Training data:

Class 1 – Dog Breeding: Fur, Collie, Collie, Groom, Show, Poodle, Sire, Breed, Akita, Pup
Class 2 – Work: Compiler, X86, C++, Lex, YACC, Java, Computer
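The two training lists can drive a tiny naive Bayes classifier. A minimal sketch; the Laplace smoothing and the uniform prior are my additions (the slides’ raw counts would give zero probability to any unseen word):

```python
from collections import Counter

# Training vocabularies from the slide's two classes.
training = {
    "DogBreeding": ["fur", "collie", "collie", "groom", "show",
                    "poodle", "sire", "breed", "akita", "pup"],
    "Work": ["compiler", "x86", "c++", "lex", "yacc", "java", "computer"],
}
counts = {c: Counter(ws) for c, ws in training.items()}
totals = {c: sum(cnt.values()) for c, cnt in counts.items()}
vocab = {w for ws in training.values() for w in ws}

def p_word_given_class(word, cls, alpha=1.0):
    # Add-one (Laplace) smoothing keeps unseen words from zeroing the product.
    return (counts[cls][word] + alpha) / (totals[cls] + alpha * len(vocab))

def classify(words, prior=0.5):
    """Choose the class maximizing P(Class) * prod_i P(w_i | Class)."""
    scores = {}
    for cls in training:
        score = prior
        for w in words:
            score *= p_word_given_class(w, cls)
        scores[cls] = score
    return max(scores, key=scores.get)

# classify(["collie", "pup"])     -> "DogBreeding"
# classify(["compiler", "java"])  -> "Work"
```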
Probabilistic IR
Target/Goal:

Document Retrieval – for a given model of relevance to the user’s needs:

Evidence          P(Rel|Doci)   P(Irrel|Doci)
(Words in) Doc1   .95           .05
(Words in) Doc2   .80           .20
(Words in) Doc3   .01           .99

Document Routing / Classification:

Evidence          P(Work1)   P(Work2)   P(Dog Breeding)   P(Bowling)   P(other)
(Words in) Doc1   .91        .01        .07               .02          .01
(Words in) Doc2   .45        .45        .03               .05          .02
(Words in) Doc3   .01        .03        .94               .01          .01
Multiple Binary Splits
[Diagram: question Q1 first splits into A and B; A then splits into A1 and A2, and B into B1 and B2]

Flat K-Way Classification
[Diagram: a single question Q1 chooses directly among classes A, B, C, D, E, F, G]
Likelihood Ratios

P(Class1|Evidence) = P(Evidence|Class1) * P(Class1) / P(Evidence)
P(Class2|Evidence) = P(Evidence|Class2) * P(Class2) / P(Evidence)

Dividing the two, P(Evidence) cancels:

P(Class1|Evidence)     P(Evidence|Class1)     P(Class1)
------------------  =  ------------------  *  ---------
P(Class2|Evidence)     P(Evidence|Class2)     P(Class2)
Likelihood Ratios
Binary Classifications

1. P(Rel|Doci) / P(Irrel|Doci)
   Document retrieval: the options are Rel and Irrel
2. P(Work|Doci) / P(Personal|Doci)
   Binary routing task (2 possible classes)
3. P(Classj|Doci) / P(NOT Classj|Doci)
   Can treat K-way classification as a series of binary classifications:
   • Compute this ratio for all classes
   • Choose the class j for which this ratio is greatest
Independence Assumption

Evidence = w1, w2, w3, … wk

P(Class1|Evidence)     P(Class1)     P(Evidence|Class1)
------------------  =  ---------  *  ------------------
P(Class2|Evidence)     P(Class2)     P(Evidence|Class2)

                       P(Class1)      k    P(wi|Class1)
                    =  ---------  *   Π    ------------
                       P(Class2)     i=1   P(wi|Class2)

Final odds = initial odds * product of per-word likelihood ratios
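In code, the odds update is just a running product (in practice one sums logarithms instead, to avoid underflow). The likelihood ratios below are hypothetical; the slide’s own example even contains a zero count, which smoothing would have to handle:

```python
def posterior_odds(prior_odds, likelihood_ratios):
    """Final odds = initial odds * product of per-word likelihood ratios,
    under the naive independence assumption."""
    odds = prior_odds
    for r in likelihood_ratios:
        odds *= r
    return odds

# Hypothetical ratios P(w|Personal) / P(w|Work) for three observed words.
odds = posterior_odds(1 / 9, [27 / 2, 36 / 2, 3 / 5])  # = 16.2: favors Personal
```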
Using the Independence Assumption

P(Personal|Akita,pup,fur,show)     P(Personal)   P(Akita|Personal)   P(pup|Personal)   P(fur|Personal)   P(show|Personal)
------------------------------  =  ----------- * ----------------- * --------------- * --------------- * ----------------
P(Work|Akita,pup,fur,show)         P(Work)       P(Akita|Work)       P(pup|Work)       P(fur|Work)       P(show|Work)

With counts from the training data:

P(Personal|Evidence)     1     27     18     36     3
--------------------  =  -  *  --  *  --  *  --  *  -  =  some constant
P(Work|Evidence)         9     2      0      2      5

(the product of the likelihood ratios for each word; note the zero count in one denominator)

Note: the ratios are (partially) self-weighting, e.g.:

P(The|Personal)     5137/100,000
---------------  =  ------------  ≈  1
P(The|Work)         5238/100,000

P(Akita|Personal)     37/100,000
-----------------  =  ----------  =  37
P(Akita|Work)          1/100,000
Bayesian Model Applications

Authorship identification:

P(Hamilton|Evidence)     P(Evidence|Hamilton)     P(Hamilton)
--------------------  =  --------------------  *  -----------
P(Madison|Evidence)      P(Evidence|Madison)      P(Madison)

Sense disambiguation:

P(Tank-Container|Evidence)     P(Evidence|Tank-Container)     P(Tank-Container)
--------------------------  =  --------------------------  *  -----------------
P(Tank-Vehicle|Evidence)       P(Evidence|Tank-Vehicle)       P(Tank-Vehicle)
Dependence Trees (Hierarchical Bayesian Models)

P(w1,w2,…,w6) = P(w1) * P(w2|w1) * P(w3|w2) * P(w4|w2) * P(w5|w2) * P(w6|w5)

[Diagram: dependence tree with arrows giving the direction of dependence – w1 → w2; w2 → w3, w4, w5; w5 → w6]
Full Probability Decomposition
P(w) = P(w1) * P(w2|w1) * P(w3|w2,w1) * P(w4|w3,w2,w1) * …

Using Simplifying (Markov) Assumptions
P(w) = P(w1) * P(w2|w1) * P(w3|w2) * P(w4|w3) * …
(assume each word is conditioned only on the previous word)

Assumption of Full Independence
P(w) = P(w1) * P(w2) * P(w3) * P(w4) * …

Graphical Models – Partial Decomposition into Dependence Trees
P(w) = P(w1) * P(w2|w1) * P(w3) * P(w4|w1,w3) * …
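The Markov (bigram) decomposition above can be estimated from raw counts; a minimal sketch with a toy corpus (no smoothing, so an unseen bigram would get probability zero):

```python
from collections import Counter

def markov_prob(sentence, unigrams, bigrams, total):
    """P(w1..wn) ~ P(w1) * prod_i P(w_i | w_{i-1}) under the Markov assumption."""
    words = sentence.split()
    p = unigrams[words[0]] / total
    for prev, cur in zip(words, words[1:]):
        p *= bigrams[(prev, cur)] / unigrams[prev]
    return p

corpus = "the dog ran the dog sat".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
p = markov_prob("the dog ran", unigrams, bigrams, len(corpus))
# p = (2/6) * (2/2) * (1/2) = 1/6
```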