3.3 Probabilistic IR (htw saar, 2019-02-28)
3.3 Probabilistic IR
§ The vector space model is commonly criticized for being heuristic and lacking a clear model of when a document should be considered relevant
§ Probabilistic IR relies on probability theory to model the event that a document d is relevant to a query q
§ This probability is then estimated based on the terms contained in the document and the query
Information Retrieval / Chapter 3: Retrieval Models
Events and Probabilities
§ Let's consider two events A and B
§ A is the event that an object is a circle
§ B is the event that an object is green
§ We refer to A ∧ B as the joint event that an object is a green circle

P[A] = 5/9    P[B] = 4/9    P[A · B] = P[A, B] = 3/9
Conditional Probabilities
§ The conditional probability P[B | A] (B given A) is the probability that the event B occurs if we already know that the event A has occurred

P[B | A] = P[A · B] / P[A]

§ In our example: P[B | A] = 3/5 and P[A | B] = 3/4
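These values can be reproduced with a few lines of Python (a sketch; the counts are taken from the running example above: 9 objects, 5 circles, 4 green objects, 3 green circles):

```python
# Toy recreation of the slide's running example.
from fractions import Fraction

n_total, n_A, n_B, n_AB = 9, 5, 4, 3

p_A = Fraction(n_A, n_total)    # P[A] = 5/9
p_B = Fraction(n_B, n_total)    # P[B] = 4/9
p_AB = Fraction(n_AB, n_total)  # P[A, B] = 3/9

# P[B | A] = P[A, B] / P[A] and P[A | B] = P[A, B] / P[B]
p_B_given_A = p_AB / p_A  # 3/5
p_A_given_B = p_AB / p_B  # 3/4
```

Using exact fractions avoids any floating point rounding in this small example.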
Independence
§ Two events A and B are called (stochastically) independent if the following holds for their joint probability

P[A · B] = P[A] · P[B]

§ In our example, the events A and B are not independent, since 3/9 ≠ 5/9 · 4/9
Bayes' Theorem
§ Thomas Bayes (1701-1761) is known for the following theorem regarding the conditional probabilities of events

P[A | B] = P[B | A] · P[A] / P[B]

§ Bayes' theorem is particularly useful when, for two events A and B, one of the conditional probabilities is easy to estimate, but the other is hard to estimate
Bayes' Theorem in Action
§ Example: Examining animals in the wild
§ A is the event that the animal is a fox
§ B is the event that the animal has rabies ("Tollwut")
§ Assume that we know the following probabilities
§ P[A] = 0.1 (e.g., estimated based on video surveillance)
§ P[B] = 0.05 (e.g., estimated based on hunted animals)
§ P[A | B] = 0.25 (e.g., estimated based on deceased animals)
§ We can now estimate the probability that a fox has rabies

P[B | A] = P[A | B] · P[B] / P[A] = 0.25 · 0.05 / 0.1 = 0.125
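The calculation above can be checked directly (a minimal sketch using the probabilities from the slide):

```python
import math

# Applying Bayes' theorem to the fox/rabies example.
p_A = 0.1            # P[A]: animal is a fox
p_B = 0.05           # P[B]: animal has rabies
p_A_given_B = 0.25   # P[A | B]

# P[B | A] = P[A | B] * P[B] / P[A]
p_B_given_A = p_A_given_B * p_B / p_A  # about 0.125
```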
Probabilistic Ranking Principle (PRP)
§ The Probabilistic Ranking Principle (PRP) suggests that documents should be ranked in descending order of their probability of being relevant to the query

P[R = 1 | d, q]

(R = 1 indicates the event of observing a relevant document)

§ PRP maximizes precision under the assumptions that the probabilities can be determined exactly and that they are independent (both questionable assumptions)
Binary Independence Model
§ The Binary Independence Model (BIM) considers documents and queries as sets of terms, i.e., a term either occurs in a document or it doesn't
§ BIM assumes that terms occur independently from each other in documents (a questionable assumption)
§ Documents are ranked, following the PRP, according to their probability P[R = 1 | d, q] with

P[R = 1 | d, q] + P[R = 0 | d, q] = 1
Binary Independence Model
§ We obtain the same ranking of documents if we consider their so-called odds ratios

O[R | d, q] = P[R = 1 | d, q] / P[R = 0 | d, q]

§ Applying Bayes' theorem, we obtain

O[R | d, q] = P[R = 1 | q] / P[R = 0 | q] · P[d | R = 1, q] / P[d | R = 0, q]

where the first factor is constant (it depends only on q), so the ranking is determined by P[d | R = 1, q] / P[d | R = 0, q]
Binary Independence Model
§ Assuming that terms occur independently, we obtain

P[d | R = 1, q] / P[d | R = 0, q] = ∏_{v ∈ V} P[v | R = 1, q] / P[v | R = 0, q]

with V as the vocabulary of all known terms

§ Assuming that only terms from the query play a role

P[d | R = 1, q] / P[d | R = 0, q] ≈ ∏_{v ∈ q} P[v | R = 1, q] / P[v | R = 0, q]
Binary Independence Model
§ We can distinguish between terms that occur in a document and terms that don't
§ Let pv and uv denote the probabilities that a term v occurs in a relevant and irrelevant document, respectively

P[d | R = 1, q] / P[d | R = 0, q]
  ≈ ∏_{v ∈ q, v ∈ d} P[v | R = 1, q] / P[v | R = 0, q] · ∏_{v ∈ q, v ∉ d} P[v | R = 1, q] / P[v | R = 0, q]
  ≈ ∏_{v ∈ q, v ∈ d} pv / uv · ∏_{v ∈ q, v ∉ d} (1 − pv) / (1 − uv)
Binary Independence Model
§ This can be rewritten as

P[d | R = 1, q] / P[d | R = 0, q] ≈ ∏_{v ∈ q, v ∈ d} pv (1 − uv) / (uv (1 − pv)) · ∏_{v ∈ q} (1 − pv) / (1 − uv)

where the second factor is constant (it depends only on q), so that

P[d | R = 1, q] / P[d | R = 0, q] ∝ ∏_{v ∈ q, v ∈ d} pv (1 − uv) / (uv (1 − pv))
Computing with Probabilities
§ When representing probabilities as floating point numbers (e.g., double in Java) we have to worry about numerical imprecision
§ We can mitigate the problem of numerical imprecision by applying a logarithmic transformation, thus turning products into sums and operating with logarithms of probabilities
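The underflow problem and the logarithmic fix can be demonstrated in a few lines (a sketch; the per-term probability 1e-5 is an arbitrary illustrative value):

```python
import math

# A product of many small probabilities underflows in IEEE 754
# doubles, while the equivalent sum of log-probabilities does not.
probs = [1e-5] * 100  # hypothetical per-term probabilities

product = 1.0
for p in probs:
    product *= p          # underflows to 0.0 (true value is 1e-500)

log_sum = sum(math.log(p) for p in probs)  # about -1151.3, well-behaved
```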
Binary Independence Model
§ Applying a logarithmic transformation to the Binary Independence Model, we obtain

log ( ∏_{v ∈ q, v ∈ d} pv (1 − uv) / (uv (1 − pv)) ) = Σ_{v ∈ q, v ∈ d} log ( pv (1 − uv) / (uv (1 − pv)) ) = RSVd

§ We can return documents in descending order of their retrieval status value (RSVd) and obtain the same ranking that we would have obtained when computing with the actual probabilities
§ How can we estimate the probabilities pv and uv?
Binary Independence Model
§ Assuming that most documents in the document collection are irrelevant to any query, we estimate

uv = df(v) / |D|

as the probability that the term v occurs in a document that is irrelevant to the query
Binary Independence Model
§ We have no information about which documents are relevant to the query and thus estimate

pv = (1 − pv) = 0.5

as the probability that the term v occurs in a document that is relevant to the query
Binary Independence Model
§ The retrieval status value RSVd can thus be rewritten as the following variant of tf.idf

RSVd = Σ_{v ∈ q, v ∈ d} log ( pv (1 − uv) / (uv (1 − pv)) )
     = Σ_{v ∈ q, v ∈ d} log ( (1 − uv) / uv )
     = Σ_{v ∈ q, v ∈ d} log ( (1 − df(v)/|D|) / (df(v)/|D|) )
     = Σ_{v ∈ q, v ∈ d} log ( (|D| − df(v)) / df(v) )
     ≈ Σ_{v ∈ q, v ∈ d} log ( |D| / df(v) )

under the assumption that most terms occur rarely
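This simplified scoring function is easy to sketch in Python; the three-document toy collection and the query below are hypothetical, chosen only to illustrate the ranking:

```python
import math

# Minimal sketch of BIM ranking with RSVd = sum of log(|D| / df(v))
# over the query terms that occur in d.
docs = {
    "d1": {"probabilistic", "retrieval", "model"},
    "d2": {"vector", "space", "model"},
    "d3": {"probabilistic", "ranking", "principle"},
}
N = len(docs)  # |D|

df = {}  # document frequencies
for terms in docs.values():
    for v in terms:
        df[v] = df.get(v, 0) + 1

def rsv(query, doc_terms):
    return sum(math.log(N / df[v]) for v in query if v in doc_terms)

query = {"probabilistic", "model"}
ranking = sorted(docs, key=lambda d: rsv(query, docs[d]), reverse=True)
# d1 ranks first, since it contains both query terms
```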
Binary Independence Model
§ The Binary Independence Model has been shown to obtain good results on collections with documents of homogeneous lengths, but it does not work well when document lengths differ a lot (e.g., on the Web)
§ Relevance feedback by a user can be incorporated when estimating the probabilities pv and uv
§ While more principled than the vector space model, many of the assumptions made are questionable in practice (e.g., the independence of terms)
Okapi BM25
§ Okapi BM25 is a probabilistic retrieval model that builds on the Binary Independence Model but takes term frequencies into account
§ It assumes that term frequencies in relevant and irrelevant documents follow a Poisson distribution

P[tf(v, d) = k] = λ^k / k! · e^(−λ)

§ The derivation of the retrieval status value is beyond the scope of this lecture
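The Poisson probability mass function from the slide can be written directly (a sketch; λ = 2 is an arbitrary illustrative rate):

```python
import math

# Poisson pmf: P[tf(v, d) = k] = λ^k / k! * e^(-λ)
def poisson_pmf(k, lam):
    return lam ** k / math.factorial(k) * math.exp(-lam)

# sanity check: probabilities over k sum to 1 (first 50 terms, λ = 2)
total = sum(poisson_pmf(k, 2.0) for k in range(50))
```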
Okapi BM25
§ Documents are ranked according to

RSVd = Σ_{v ∈ q} ( (k1 + 1) · tf(v, d) ) / ( k1 · ((1 − b) + b · |d| / avdl) + tf(v, d) ) · log ( (|D| − df(v) + 0.5) / (df(v) + 0.5) )

§ Parameter k1 controls the influence of term frequencies
§ k1 = 0.0 yields a binary model similar to the BIM
§ k1 = 1.2 is a common choice in practice
§ Parameter b controls the normalization of term frequencies based on the document length |d| and the average document length avdl
§ b = 0.0 ignores document lengths
§ b = 0.75 is a common choice in practice
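The formula above translates into a short scoring function (a sketch under the slide's default parameters k1 = 1.2 and b = 0.75; the toy corpus and query are hypothetical):

```python
import math

# Sketch of the BM25 scoring formula shown above.
def bm25(query, doc, docs, k1=1.2, b=0.75):
    N = len(docs)                              # |D|
    avdl = sum(len(d) for d in docs) / N       # average document length
    score = 0.0
    for v in query:
        tf = doc.count(v)
        if tf == 0:
            continue
        df = sum(1 for d in docs if v in d)    # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5))
        length_norm = k1 * ((1 - b) + b * len(doc) / avdl) + tf
        score += (k1 + 1) * tf / length_norm * idf
    return score

docs = [
    "probabilistic retrieval model".split(),  # d0
    "vector space retrieval".split(),         # d1
    "okapi bm25 ranking".split(),             # d2
]
query = ["probabilistic", "model"]
scores = [bm25(query, d, docs) for d in docs]  # d0 scores highest
```

Note that the idf component of this formula can become negative when a term occurs in more than half of the documents; production implementations (e.g., Lucene) adjust for this.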
Okapi BM25
§ Okapi BM25F is an extension that can deal with fielded documents (e.g., title, abstract, body)
§ Okapi BM25 has been shown to yield excellent results in different settings and is considered one of the state-of-the-art retrieval models (e.g., it is available in Apache Lucene)
§ While more principled than the vector space model, many of the assumptions made are questionable in practice (e.g., the independence of terms)
Summary
§ Probabilistic IR relies on probability theory to model the event that a document is relevant to a query
§ The Probabilistic Ranking Principle suggests ranking documents according to their probability of being relevant
§ The Binary Independence Model considers whether a term occurs in a document or not and assumes term independence
§ Okapi BM25 is a more sophisticated model that yields good results and is considered state of the art
Literature
[1] C. D. Manning, P. Raghavan, and H. Schütze: Introduction to Information Retrieval, Cambridge University Press, 2008 (Chapter 11)
[2] W. B. Croft, D. Metzler, and T. Strohman: Search Engines – Information Retrieval in Practice, Pearson Education, 2009 (Chapter 7)