Michael Bendersky, W. Bruce Croft
Dept. of Computer Science, Univ. of Massachusetts Amherst, Amherst, MA
SIGIR 2012


Outline
• Motivation
• Query Hypergraphs
• Ranking Documents
• Parameter Estimation
• Evaluation
• Conclusion

Motivation
• Goal: retrieve more relevant documents for users
• Query representation:
  - bag-of-words
  - term dependencies
  - concept dependencies (this paper)

Example
• "Provide information on the use of dogs worldwide for law enforcement purposes."
• bag-of-words: {provide, information, dog, ...}
• term dependency: {(provide, information), (law, enforcement)}
• concept dependency: {(dog, law enforcement), ...}

Example (cont.)
• "Provide information on the use of dogs worldwide for law enforcement purposes."
{provide, information, (law, enforcement)}
{(dog, law enforcement)}

Model concept dependency
• Use query hypergraphs
  1. Build linguistic structures over the query, e.g. "members of the rock group nirvana"
  2. Each element in the structures can be represented as a concept

Query Hypergraphs
• Query hypergraph for the query "international art crime"
D: a document
V = {D, i, a, c, ac}
E = {({i}, D), ({a}, D), ({c}, D), ({ac}, D), ({i, a, c, ac}, D)}
Each element of E is a hyperedge.
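The vertex and hyperedge sets on this slide can be written out in a few lines of Python. This is only an illustrative sketch: the function name `build_query_hypergraph` and the choice to represent a hyperedge as a `(frozenset of concepts, document)` pair are assumptions made here, not something taken from the talk.

```python
from typing import FrozenSet, List, Set, Tuple

# Assumed representation: a hyperedge pairs a set of concepts with the document vertex.
Hyperedge = Tuple[FrozenSet[str], str]


def build_query_hypergraph(concepts: List[str], doc: str = "D") -> Tuple[Set[str], List[Hyperedge]]:
    """Return (V, E): one hyperedge per single concept, plus one over all concepts."""
    vertices: Set[str] = {doc, *concepts}
    # One hyperedge ({k}, D) for every concept k.
    single_concept_edges: List[Hyperedge] = [(frozenset([k]), doc) for k in concepts]
    # One hyperedge connecting the entire concept set to the document.
    all_concepts_edge: Hyperedge = (frozenset(concepts), doc)
    return vertices, single_concept_edges + [all_concepts_edge]


# "international art crime": i, a, c are the terms, ac stands for the phrase "art crime".
V, E = build_query_hypergraph(["i", "a", "c", "ac"])
```

Here `E` holds five hyperedges, matching the set E listed on the slide.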

Query Hypergraph Induction
• Three types of structures
  - query term structure: individual query words
  - phrase structure: bigrams (word order matters)
  - proximity structure: arbitrary subsets of query terms
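As a rough illustration of the three structures, the sketch below enumerates candidate concepts for a tokenized query. The function name `induce_structures` and the `max_subset_size` cap on proximity subsets are assumptions introduced here to keep the example small; the slide itself only says "arbitrary subsets".

```python
from itertools import combinations
from typing import Dict, List, Tuple


def induce_structures(query_terms: List[str], max_subset_size: int = 2) -> Dict[str, List[Tuple[str, ...]]]:
    """Enumerate candidate concepts for each of the three structures."""
    # Query term structure: every individual query word.
    terms = [(t,) for t in query_terms]
    # Phrase structure: adjacent word pairs, in query order.
    phrases = [tuple(query_terms[i:i + 2]) for i in range(len(query_terms) - 1)]
    # Proximity structure: unordered subsets of terms (capped in size, an assumption).
    proximity = [
        subset
        for size in range(2, max_subset_size + 1)
        for subset in combinations(query_terms, size)
    ]
    return {"terms": terms, "phrases": phrases, "proximity": proximity}


structures = induce_structures(["international", "art", "crime"])
```

Note that the proximity structure contains the pair (international, crime), which the phrase structure does not, because those two words are not adjacent in the query.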

Hyperedges
• Local hyperedges: $(\{k\}, D)$ for each $k \in K^Q$
• Global hyperedge: $(K^Q, D)$
k: a concept
$K^Q$: the set of query concepts

Ranking Documents
• relevance score: $sc(Q, D) = \prod_{e \in E} \phi_e(k_e, D)$
Q: a query
D: a document
e: a hyperedge
E: the set of hyperedges
Factor: $\phi_e(k_e, D)$

Local Factors
$\phi(\{k\}, D) = \exp\big(\lambda(k)\, f(k, D)\big)$
$\lambda(k)$: the importance weight of the concept k
$f(k, D)$: a matching function between the concept k and the document D

Matching Function
$f(k, D) = \log \dfrac{tf(k, D) + \mu \, \frac{tf(k, C)}{|C|}}{\mu + |D|}$
C: the collection
$|D|$: the number of terms in the document
$|C|$: the number of terms in the collection
$\mu$: the Dirichlet smoothing parameter
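The matching function is a Dirichlet-smoothed log-likelihood of the concept under the document. A minimal sketch follows; the function name, argument names, and the default value of `mu` are assumptions chosen for illustration, not values given in the talk.

```python
import math


def matching_function(tf_k_d: float, doc_len: int, tf_k_c: float, coll_len: int, mu: float = 2500.0) -> float:
    """Dirichlet-smoothed log matching score f(k, D).

    tf_k_d:   occurrences of concept k in document D
    doc_len:  number of terms in D, |D|
    tf_k_c:   occurrences of k in the collection C (must be > 0 so the log is defined)
    coll_len: number of terms in C, |C|
    mu:       Dirichlet smoothing parameter (the default is an assumed placeholder)
    """
    collection_prob = tf_k_c / coll_len
    return math.log((tf_k_d + mu * collection_prob) / (mu + doc_len))


# A document that contains the concept scores higher than one of equal length that does not.
score_present = matching_function(tf_k_d=3, doc_len=200, tf_k_c=10, coll_len=1000)
score_absent = matching_function(tf_k_d=0, doc_len=200, tf_k_c=10, coll_len=1000)
```

With an empty document the score falls back to the log of the collection probability, which is the behaviour smoothing is meant to provide.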

Global Factor
• considers the dependency between the entire set of query concepts
$\phi(K^Q, D) = \exp\big(\sum_{k \in K^Q} \lambda(k, K^Q)\, f(k, \pi_D)\big)$
$\pi_D$: the highest-scoring passage from the document
$\lambda(k, K^Q)$: the importance weight of concept k in the context of the entire set of query concepts $K^Q$ (with the concepts matched in the passage $\pi_D$)
The dependency range is much longer for concept dependencies.
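The idea of scoring the whole concept set against the highest-scoring passage can be sketched as below. Several simplifications here are assumptions rather than details from the talk: concepts are single tokens, passages are fixed-size overlapping windows, and the names `passage_score` and `best_passage_score` as well as the parameter values are made up for the example.

```python
import math
from typing import Dict, List


def passage_score(passage: List[str], weights: Dict[str, float],
                  coll_tf: Dict[str, int], coll_len: int, mu: float) -> float:
    """Weighted sum of Dirichlet-smoothed matching scores of all concepts in one passage."""
    score = 0.0
    for concept, weight in weights.items():
        tf = passage.count(concept)  # single-token concepts only (simplification)
        coll_prob = coll_tf[concept] / coll_len
        score += weight * math.log((tf + mu * coll_prob) / (mu + len(passage)))
    return score


def best_passage_score(doc_tokens: List[str], weights: Dict[str, float],
                       coll_tf: Dict[str, int], coll_len: int,
                       passage_len: int = 50, mu: float = 2500.0) -> float:
    """Score of the highest-scoring passage; windows overlap by half (an assumed scheme)."""
    step = max(1, passage_len // 2)
    starts = range(0, max(1, len(doc_tokens)), step)
    return max(
        passage_score(doc_tokens[s:s + passage_len], weights, coll_tf, coll_len, mu)
        for s in starts
    )


weights = {"dog": 1.0, "law": 1.0}
coll_tf = {"dog": 10, "law": 10}
doc_close = ["dog", "x", "law", "y", "z", "z", "z", "z"]  # both concepts in one window
doc_far = ["dog", "z", "z", "z", "z", "z", "law", "z"]    # concepts never share a window
close = best_passage_score(doc_close, weights, coll_tf, 1000, passage_len=4, mu=10.0)
far = best_passage_score(doc_far, weights, coll_tf, 1000, passage_len=4, mu=10.0)
```

In this toy setup `close` is larger than `far`: the two documents contain the same concepts, but only the first has a passage in which they co-occur.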

Example
{(dog, law enforcement)}
The two concepts do not appear in the same sentence, but they co-occur in a larger text passage.

Query Hypergraph Parameterization
• Goal: parameterize the concept weights, local $\lambda(k)$ and global $\lambda(k, K^Q)$
• Parameterization by structure
• Parameterization by concept

Parameterization By Structure
$\sigma$: a structure

Parameterization By Concept
• parameterize the concept weights based on the concepts themselves

concept importance feature

estimation

Parameter Estimation
• optimize a target metric (mean average precision)
• rely on a large collection
• use the coordinate ascent algorithm
  - a coordinate-level hill-climbing search
• repeatedly cycle through each of the parameters $\lambda(\cdot)$, while holding all other parameters fixed
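Coordinate ascent as described above can be sketched generically: perturb one parameter at a time, keep a change only if the objective improves, and stop when a full cycle brings no gain. In the talk the objective is mean average precision on training queries; the toy objective, the step sizes, and the function name below are assumptions used only to make the sketch runnable.

```python
from typing import Callable, List, Sequence, Tuple


def coordinate_ascent(objective: Callable[[List[float]], float],
                      initial: Sequence[float],
                      step_sizes: Sequence[float] = (1.0, 0.5, 0.1),
                      max_rounds: int = 20) -> Tuple[List[float], float]:
    """Maximize `objective` by changing one coordinate at a time."""
    params = list(initial)
    best = objective(params)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(params)):          # cycle through the parameters
            for step in step_sizes:
                for delta in (step, -step):   # try a move in each direction
                    candidate = params.copy()
                    candidate[i] += delta     # all other parameters stay fixed
                    value = objective(candidate)
                    if value > best:
                        params, best, improved = candidate, value, True
        if not improved:                      # a full cycle without gain: stop
            break
    return params, best


# Toy stand-in for MAP: a concave function with its maximum at (1, -2).
def toy_objective(p: List[float]) -> float:
    return -((p[0] - 1.0) ** 2) - ((p[1] + 2.0) ** 2)


best_params, best_value = coordinate_ascent(toy_objective, [0.0, 0.0])
```

Because only objective values are compared, the same loop works with a non-differentiable metric such as MAP, which is presumably why a hill-climbing search is used here instead of a gradient method.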

Parameter Estimation (cont.)
1. Optimize the local component (the weights $\lambda(k)$)
2. Retrieve the top thousand documents
3. Optimize the global component (the weights $\lambda(k, K^Q)$)

Parameter Estimation (cont.)

(Robust04 collection)

Evaluation (testing)
• search engine: Indri
• test collections
• query

Evaluation (evaluation metric)
• MAP (mean average precision)
  e.g. Topic 1: 3 relevant documents (at ranks 1, 3, 5): AP = (1/1 + 2/3 + 3/5) / 3
• ERR@k (expected reciprocal rank, k = 20)
  $\mathrm{ERR@}k = \sum_{i=1}^{k} \frac{1}{i} \, R(g_i) \prod_{j=1}^{i-1} \big(1 - R(g_j)\big)$
  g = 0, 1, 2, 3, 4
  R(g) = (2^g - 1)/16
  $R(g_i)$: the user is satisfied by the document at rank i
  $\prod_{j=1}^{i-1} (1 - R(g_j))$: the user is not satisfied with the previous documents (ranks 1 to i-1)
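Both metrics are short enough to compute by hand or in code. The sketch below reproduces the Topic 1 example and the ERR definition with grades 0 to 4; the function names and argument layout are assumptions made for this illustration. MAP is the mean of `average_precision` over all topics.

```python
from typing import Sequence


def average_precision(relevant_ranks: Sequence[int], num_relevant: int) -> float:
    """AP for one topic, given the 1-based ranks at which relevant documents appear."""
    hits = sorted(relevant_ranks)
    # Precision at the n-th relevant document is n / rank.
    return sum((n + 1) / rank for n, rank in enumerate(hits)) / num_relevant


def err_at_k(grades: Sequence[int], k: int = 20, max_grade: int = 4) -> float:
    """ERR@k with R(g) = (2^g - 1) / 2^max_grade; grades are listed in rank order."""
    err = 0.0
    prob_reach = 1.0  # probability that no earlier document satisfied the user
    for rank, grade in enumerate(grades[:k], start=1):
        satisfaction = (2 ** grade - 1) / 2 ** max_grade
        err += prob_reach * satisfaction / rank
        prob_reach *= 1.0 - satisfaction
    return err


ap_topic1 = average_precision([1, 3, 5], num_relevant=3)  # (1/1 + 2/3 + 3/5) / 3
err_example = err_at_k([0, 4])                            # perfect document at rank 2
```

For the second call, the document at rank 1 has grade 0 and contributes nothing, so the value is (15/16) / 2.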

Evaluation (retrieval performance)

Conclusion
• models arbitrary term dependencies as concepts
• uses passage-level evidence to model the dependencies between the concepts
• assigns weights to both concepts and concept dependencies
• The proposed retrieval framework improves retrieval effectiveness for verbose natural language queries.