michael bendersky, w. bruce croft dept. of computer science univ. of massachusetts amherst amherst,...
Michael Bendersky, W. Bruce Croft
Dept. of Computer Science
Univ. of Massachusetts Amherst
Amherst, MA
SIGIR 2012
Outline
• Motivation
• Query Hypergraphs
• Ranking Documents
• Parameter estimation
• Evaluation
• Conclusion
Motivation
• Goal: retrieve more relevant documents for users
• Query representation: bag-of-words, term dependencies, concept dependencies (this paper)
Example
• "Provide information on the use of dogs worldwide for law enforcement purposes."
• bag-of-words: {provide, information, dog, …}
• term dependency: {(provide, information), (law, enforcement)}
• concept dependency: {(dog, law enforcement), …}
Example (cont.)
• "Provide information on the use of dogs worldwide for law enforcement purposes."
{provide, information, (law, enforcement)} {(dog, law enforcement)}
Model concept dependency
• Use query hypergraphs
  1. build linguistic structures over the query, e.g. "members of the rock group nirvana"
  2. each element in the structures can be represented as a concept
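Query Hypergraphs
• Query hypergraph for the query "international art crime"
D: a document
V = {D, i, a, c, ac} (i = international, a = art, c = crime, ac = art crime)
E = {({i}, D), ({a}, D), ({c}, D), ({ac}, D), ({i, a, c, ac}, D)}
Each element of E is a hyperedge.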
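The vertex and hyperedge sets above can be written down directly as data. The following is a minimal sketch, assuming concepts are plain strings and the document is a single placeholder vertex; the function name `build_query_hypergraph` is hypothetical and not from the slides.

```python
from typing import FrozenSet, List, Tuple

# A hyperedge pairs a set of concept vertices with the document vertex.
Hyperedge = Tuple[FrozenSet[str], str]


def build_query_hypergraph(concepts: List[str], doc: str = "D"):
    """Return (V, E) for a query hypergraph over the given concepts.

    Hypothetical helper: one hyperedge per single concept, plus one
    hyperedge connecting the whole concept set to the document.
    """
    vertices = [doc] + list(concepts)
    single_concept_edges: List[Hyperedge] = [(frozenset([k]), doc) for k in concepts]
    all_concepts_edge: Hyperedge = (frozenset(concepts), doc)
    return vertices, single_concept_edges + [all_concepts_edge]


# The "international art crime" example: i, a, c are terms, ac is a phrase.
V, E = build_query_hypergraph(["i", "a", "c", "ac"])
for concept_set, doc in E:
    print(sorted(concept_set), "->", doc)
```

Using `frozenset` for the concept side keeps each hyperedge hashable, so edges can later serve as dictionary keys for per-edge weights.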
Query Hypergraph Induction
• Three types of structures
  • query term structure: individual query words
  • phrase structure: bigrams (word order matters)
  • proximity structure: arbitrary subsets of query terms
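The three structures above can be enumerated from a tokenized query. This is a minimal sketch, assuming distinct query terms and, purely to keep the output small, limiting proximity subsets to a maximum size; `induce_structures` and `max_subset_size` are hypothetical names, not part of the original method.

```python
from itertools import combinations


def induce_structures(query_terms, max_subset_size=2):
    """Enumerate candidate concepts for each of the three structures.

    Assumption: proximity concepts are unordered subsets of at least two
    terms, capped at max_subset_size (the cap is an illustrative choice).
    """
    terms = list(query_terms)
    # Phrase structure: adjacent word pairs, order preserved.
    phrases = [(terms[i], terms[i + 1]) for i in range(len(terms) - 1)]
    # Proximity structure: unordered subsets of the query terms.
    proximity = [
        frozenset(subset)
        for size in range(2, max_subset_size + 1)
        for subset in combinations(terms, size)
    ]
    return {"terms": terms, "phrases": phrases, "proximity": proximity}


structures = induce_structures(["international", "art", "crime"])
for name, concepts in structures.items():
    print(name, len(concepts))
```

Phrases are kept as ordered tuples and proximity concepts as sets, which mirrors the distinction between the order-sensitive and order-free structures.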
Hyperedges
• Local hyperedges: $(\{k\}, D)$, for each $k \in K^Q$
• Global hyperedge: $(K^Q, D)$
$k$: a concept
$K^Q$: the set of query concepts
Ranking Documents
• relevance score
Q: a query
D: a document
e: a hyperedge
E: the set of hyperedges
Factor: $\phi_e(k_e, D)$
Local Factors
$\lambda(k)$: the importance weight of the concept $k$
$f(k, D)$: a matching function between the concept $k$ and the document $D$
Matching Function
$$f(k, D) = \log \frac{tf(k, D) + \mu \, \frac{tf(k, C)}{|C|}}{|D| + \mu}$$
$C$: the collection
$|D|$: the number of terms in the document
$|C|$: the number of terms in the collection
$\mu$: Dirichlet smoothing parameter
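As a worked illustration, the Dirichlet-smoothed matching function in its standard form, log((tf(k,D) + μ·tf(k,C)/|C|) / (|D| + μ)), can be computed as below. This is a sketch only: the function and argument names are hypothetical, and the default μ = 2500 is a commonly used value assumed here, not one stated in the slides.

```python
import math


def matching_score(tf_doc, doc_len, tf_coll, coll_len, mu=2500.0):
    """Dirichlet-smoothed log-likelihood of a concept in a document.

    tf_doc:   occurrences of the concept in the document
    doc_len:  number of terms in the document
    tf_coll:  occurrences of the concept in the collection
    coll_len: number of terms in the collection
    mu:       Dirichlet smoothing parameter (default is an assumption)
    Note: the log is undefined if the concept never occurs in the collection
    and does not occur in the document.
    """
    background = tf_coll / coll_len  # collection language model estimate
    return math.log((tf_doc + mu * background) / (doc_len + mu))


# A document of 100 terms containing the concept twice, with a small mu.
print(matching_score(tf_doc=2, doc_len=100, tf_coll=50, coll_len=10_000, mu=100.0))
```

Smoothing lets a document that lacks the concept still receive a finite score, driven by how common the concept is in the collection.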
Global Factor
• considers the dependency between the entire set of query concepts
$\pi$: the highest-scoring passage in the document
The dependency range is much longer for concept dependencies.
$\lambda(k, K^Q)$: the importance weight of concept $k$ in the context of the entire set of query concepts $K^Q$ (with the concept matched in the passage $\pi$)
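One way to picture how the local and global evidence might combine is sketched below. It assumes, which the slides do not spell out, that the score is a weighted sum of per-concept document matches plus the best weighted sum over passages; all names (`rank_score`, the weight and score dictionaries) are hypothetical.

```python
def rank_score(local_weights, global_weights, doc_scores, passage_scores):
    """Combine document-level and passage-level evidence for one document.

    local_weights:  concept -> weight of the concept on its own
    global_weights: concept -> weight of the concept in the context of all concepts
    doc_scores:     concept -> matching score against the whole document
    passage_scores: non-empty list of dicts, concept -> matching score per passage
    Assumption: an additive (log-space) combination of the two parts.
    """
    local_part = sum(local_weights[k] * doc_scores[k] for k in local_weights)
    # Global part: keep only the highest-scoring passage.
    global_part = max(
        sum(global_weights[k] * passage[k] for k in global_weights)
        for passage in passage_scores
    )
    return local_part + global_part


score = rank_score(
    local_weights={"dog": 1.0, "law enforcement": 0.5},
    global_weights={"dog": 0.2, "law enforcement": 0.2},
    doc_scores={"dog": -2.0, "law enforcement": -4.0},
    passage_scores=[
        {"dog": -3.0, "law enforcement": -5.0},
        {"dog": -1.0, "law enforcement": -2.0},
    ],
)
print(score)
```

Taking the maximum over passages rewards documents in which all query concepts occur close together somewhere, even if not in the same sentence.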
Example
{(dog, law enforcement)}
The two concepts may not appear in the same sentence, but they co-occur in a larger text passage.
Query Hypergraph Parameterization
• Goal: parameterize the concept weights (local and global), $\lambda(k)$ and $\lambda(k, K^Q)$
• Parameterization by structure
• Parameterization by concept
Parameterization By Structure
$\sigma$: a structure
Parameterization By Concept
• parameterize the concept weights based on the concepts themselves
concept importance feature
estimation
Parameter Estimation
• optimize a target metric (mean average precision)
• rely on a large collection
• use the coordinate ascent algorithm, a coordinate-level hill-climbing search
• repeatedly cycle through each of the parameters, while holding all other parameters fixed
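The coordinate-level search described above can be sketched as follows. This is a generic illustration, not the authors' implementation: the fixed step size, the stopping rule, and the toy objective are assumptions, and in the real setting `objective` would be mean average precision measured over training queries.

```python
def coordinate_ascent(objective, params, step=0.1, max_rounds=50):
    """Maximize objective(params) by adjusting one parameter at a time.

    Each round cycles through every coordinate, tries a step up and a step
    down while all other coordinates stay fixed, and keeps any improvement.
    Stops when a full round yields no improvement.
    """
    params = list(params)
    best = objective(params)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                candidate = params[:]
                candidate[i] += delta
                value = objective(candidate)
                if value > best:
                    params, best = candidate, value
                    improved = True
        if not improved:
            break
    return params, best


# Toy stand-in for a retrieval metric, maximized at (1.0, -0.5).
def toy_metric(p):
    return -((p[0] - 1.0) ** 2) - ((p[1] + 0.5) ** 2)


weights, value = coordinate_ascent(toy_metric, [0.0, 0.0])
print(weights, value)
```

Because the search only needs objective values, it works with non-differentiable metrics such as MAP, at the cost of one evaluation per trial step.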
Parameter Estimation (cont.)
• optimize the local component (the weights $\lambda(k)$)
• retrieve the top thousand documents
• optimize the global component (the weights $\lambda(k, K^Q)$)
Parameter Estimation (cont.)
(Robust04 collection)
Evaluation (testing)
• search engine: Indri
• test collections
• query
Evaluation (evaluation metric)
• MAP (mean average precision)
  e.g. Topic 1: 3 relevant documents, retrieved at ranks 1, 3, 5: AP = (1/1 + 2/3 + 3/5)/3 ≈ 0.756
• ERR@k (expected reciprocal rank, k = 20)
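$$\mathrm{ERR@}k = \sum_{i=1}^{k} \frac{1}{i} \, R(g_i) \prod_{j=1}^{i-1} \bigl(1 - R(g_j)\bigr)$$
$g \in \{0, 1, 2, 3, 4\}$, $R(g) = (2^g - 1)/16$
$R(g_i)$: the user is satisfied by the document at rank $i$
$\prod_{j=1}^{i-1} (1 - R(g_j))$: the user was not satisfied by the previous documents (ranks 1 to $i-1$)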
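Both metrics are short enough to compute by hand or in a few lines. The sketch below assumes average precision is given the ranks of the relevant documents, and uses the standard cascade form of ERR with R(g) = (2^g − 1)/16 for grades 0 to 4; the function names are hypothetical.

```python
def average_precision(relevant_ranks, num_relevant):
    """Average precision from the 1-based ranks of the retrieved relevant documents."""
    ranks = sorted(relevant_ranks)
    # Precision at each relevant rank: (relevant documents seen so far) / rank.
    return sum(i / rank for i, rank in enumerate(ranks, start=1)) / num_relevant


def err_at_k(grades, k=20, max_grade=4):
    """Expected reciprocal rank over the top-k graded results (cascade model)."""
    err = 0.0
    p_not_satisfied = 1.0  # probability the user was not satisfied at earlier ranks
    for rank, g in enumerate(grades[:k], start=1):
        r = (2 ** g - 1) / 2 ** max_grade  # probability of satisfaction at this rank
        err += p_not_satisfied * r / rank
        p_not_satisfied *= 1 - r
    return err


# The slide's example: relevant documents at ranks 1, 3 and 5.
print(average_precision([1, 3, 5], num_relevant=3))
# A perfect document at rank 2 behind a non-relevant one.
print(err_at_k([0, 4]))
```

MAP is the mean of `average_precision` over all topics; ERR differs in that a highly relevant document near the top discounts everything ranked below it.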
Evaluation (retrieval performance)
Conclusion
• models arbitrary term dependencies as concepts
• uses passage-level evidence to model the dependencies between the concepts
• assigns weights to both concepts and concept dependencies
• The proposed retrieval framework improves retrieval effectiveness for verbose natural language queries.