Affinity Rank – Yi Liu, Benyu Zhang, Zheng Chen, MSRA
Affinity Rank
Yi Liu, Benyu Zhang, Zheng Chen
MSRA
Outline
- Motivation
- Related Work
- Model & Algorithm
- Evaluation
- Conclusion & Future Work
Search for Useful Information
- Full-text search
- Importance judgment
- Manual compilation
Failures still exist
Example – “Spielberg” Search
Example – “Spielberg” Search (Cont.)
Motivation
Existing problems in IR applications:
- Similar search results dominate the top one or two pages
- Users grow tired of near-duplicate results on the same topic
- Users cannot find what they need among those similar results
Situations where the problem is, or will be, intensified:
- Highly repetitive corpora, e.g. newsgroups, news archives, specialized websites
- Generalized or short queries
Diversity & Informativeness
Diversity: the coverage of different topics by a group of documents
Informativeness: to what extent a document can represent its topic locality (high informativeness: inclusive)
Why?
Traditional IR evaluation measures:
- Maximize relevance between query & results
- Return the most important results
To end-users, relevant + important ≠ desirable
A way out:
- Increase diversity in the top results
- Increase the informativeness of each single result
Basic Idea
- Build a similarity-based link map
- Link analysis → Affinity Rank, indicating the informativeness of each document
- Rank adjustment: only the most informative document of each topic can rank high
- Re-rank with Affinity Rank → more diversified, more informative top results
Related Work – Link Analysis
Explicit links (web author’s perspective, subjective):
- PageRank (Page et al. 1998)
- HITS (Kleinberg, 1998)
Implicit links (end-user’s perspective, objective):
- DirectHit (http://www.directhit.com)
- Small Web Search (Xue et al. 2003)
Related Work – Clustering

| Algorithm | Complexity | Naming |
|---|---|---|
| Scatter/Gather* | O(kn) | Centroid + ranked words |
| TopCat | High | Set of named entities |
| WBSC* | O(m²+n) | Ranked words |
| STC* | O(n) | Sets of N-grams |
| IF | O(kn) | - |
| PRSA | O(knm) | Ranked words |
| Bipartite | O(nm)? | Ranked words |

n: #docs, k: #clusters, m: #words
* applied to clustering search results
Our Proposed IR Framework
[Framework diagram: the Document Collection is turned into an Affinity Graph, from which Informativeness scores and a Diversity Penalty are computed query-independently (Affinity Rank); at query time, Relevance to the Query is combined with Affinity Rank in a Re-rank step to produce the Output.]
Link Construction
- Similarity → directed link → directed graph
- Threshold: saves storage space and reduces the noise brought by the overwhelmingly large number of weak-similarity links
[Diagram: two document nodes A and B connected by directed links in both directions]
$$\mathrm{sim}(A, B) = \cos(\vec{A}, \vec{B})$$
$$\mathrm{aff}(A \rightarrow B) = \mathrm{sim}(A, B), \qquad \mathrm{aff}(B \rightarrow A) = \mathrm{sim}(A, B)$$
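A minimal sketch of this link-construction step in NumPy (the function name and the threshold value 0.2 are illustrative; the slides do not give a concrete threshold):

```python
import numpy as np

def build_affinity_graph(doc_vectors, threshold=0.2):
    """Build a thresholded, row-normalized affinity matrix from document vectors.

    doc_vectors: (n_docs, n_terms) array of tf-idf vectors.
    threshold:   directed links with affinity below this value are dropped,
                 which saves storage and removes weak-similarity noise.
    """
    # Cosine similarity between every pair of documents: sim(A, B) = cos(A, B)
    norms = np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    unit = doc_vectors / np.maximum(norms, 1e-12)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)  # no self-links

    # Keep only links whose affinity reaches the threshold: aff(A -> B) = sim(A, B)
    aff = np.where(sim >= threshold, sim, 0.0)

    # Row-normalize to obtain the matrix used by the link-analysis step
    row_sums = aff.sum(axis=1, keepdims=True)
    return np.divide(aff, row_sums, out=np.zeros_like(aff), where=row_sums > 0)
```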
Assumption
Observation: relations among documents vary
- Some documents are similar, others are not; the degree of similarity varies
Assumptions:
- The more relatives a document has, the more informative it is
- The more informative a document’s relatives are, the more informative it is
Link Analysis
- Link map → adjacency matrix → row-normalize to $\tilde{M}$
- Based on the two assumptions
- Principal eigenvector → rank score
- Implementation: power method
$$AR_i = \sum_{\text{all } j} AR_j \,\tilde{M}_{j,i}$$
With a damping factor $c$ over $n$ documents:
$$AR_i = \frac{c}{n} + (1-c)\sum_{\text{all } j} AR_j \,\tilde{M}_{j,i},
\qquad
\tilde{M}^{*} = \frac{c}{n}\,\vec{e}\,\vec{e}^{\,T} + (1-c)\,\tilde{M}$$
where $\tilde{M}$ is the row-normalized affinity matrix and $\vec{e}$ is the all-ones vector.
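A short power-method sketch for the damped update above, taking the row-normalized matrix from the previous step (the damping value 0.15 and the tolerance are our assumptions):

```python
import numpy as np

def affinity_rank(M, c=0.15, tol=1e-8, max_iter=100):
    """Compute Affinity Rank scores by the power method.

    M: row-normalized affinity matrix (e.g. from build_affinity_graph).
    c: damping factor.
    Returns the principal eigenvector of the damped transition matrix,
    interpreted as the informativeness score of each document.
    """
    n = M.shape[0]
    ar = np.full(n, 1.0 / n)            # uniform initial scores
    for _ in range(max_iter):
        # AR_i = c/n + (1 - c) * sum_j AR_j * M[j, i]
        new_ar = c / n + (1.0 - c) * (M.T @ ar)
        new_ar /= new_ar.sum()          # keep the scores normalized
        if np.abs(new_ar - ar).sum() < tol:
            break
        ar = new_ar
    return ar
```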
“Random Transform” Model
- A transforming document jumps from doc. to doc. at each time step
- Markov chain: stationary transition probabilities; principal eigenvector → informativeness
[Diagram: from the current doc., the walk follows an affinity link to a “relative” doc., or with probability c jumps to a randomly picked doc.]
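Written as a worked equation (our restatement of the slide’s Markov-chain claim, using the matrix $\tilde{M}^{*}$ from the previous slide):

$$\pi^{T} = \pi^{T}\,\tilde{M}^{*}, \qquad AR_i = \pi_i$$

i.e. the Affinity Rank vector is the stationary distribution (principal eigenvector) of the damped transition matrix.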
Rank Adjustment
Greedy-like algorithm: once the most informative document i of a topic is picked, decrease the score of every other document j by the part conveyed from i:
$$AR_j \leftarrow AR_j - \tilde{M}_{i,j}\,AR_i$$
[Diagram: documents grouped into topics T1 (T1-1 … T1-6) and T2 (T2-1 … T2-3); only the most informative document of each topic keeps a high score.]
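A greedy sketch of this rank adjustment, applying the penalty above each time a document is picked (the selection size k and the function name are ours):

```python
import numpy as np

def diversity_penalty_rerank(M, ar, k=10):
    """Greedy rank adjustment: pick documents one by one, penalizing
    documents that are already covered by what has been picked.

    M:  row-normalized affinity matrix.
    ar: Affinity Rank scores from the power method.
    k:  number of documents to select.
    """
    scores = ar.astype(float).copy()
    picked = []
    candidates = set(range(len(ar)))
    while candidates and len(picked) < k:
        # Pick the currently most informative remaining document
        i = max(candidates, key=lambda d: scores[d])
        picked.append(i)
        candidates.remove(i)
        # Decrease each remaining score by the part conveyed from i
        for j in candidates:
            scores[j] -= M[i, j] * scores[i]
    return picked
```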
Re-rank
Score-combine scheme:
$$\mathrm{Score}(q, d_i) = \alpha\,\frac{\mathrm{Sim}(q, d_i)}{\overline{\mathrm{Sim}}(q)} + (1-\alpha)\,\frac{\log AR_{d_i}}{\log \overline{AR}}$$
where $\overline{\mathrm{Sim}}(q) = \max_{d_i}\mathrm{Sim}(q, d_i)$ and $\overline{AR} = \max_{d_i} AR_{d_i}$.
Rank-combine scheme:
$$\mathrm{Score}(q, d_i) = \alpha\,\mathrm{Rank}^{-1}_{\mathrm{Sim}(q, d_i)} + (1-\alpha)\,\mathrm{Rank}^{-1}_{AR_{d_i}}$$
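A sketch of the score-combine re-ranking along the lines of the formula above; the weight alpha, the max-normalization of relevance, and the rescaling of AR scores before taking the log are assumptions on our part:

```python
import numpy as np

def score_combine_rerank(sim_q, ar, alpha=0.5):
    """Blend query relevance with Affinity Rank (score-combine scheme).

    sim_q: relevance scores Sim(q, d_i) for the answer set (e.g. Okapi scores).
    ar:    Affinity Rank scores of the same documents.
    alpha: combination weight; alpha = 1 keeps the relevance-only ranking,
           alpha = 0 re-ranks purely by Affinity Rank.
    Returns document indices sorted best-first by the combined score.
    """
    sim_q = np.asarray(sim_q, dtype=float)
    ar = np.asarray(ar, dtype=float)
    rel = sim_q / sim_q.max()          # normalize relevance to [0, 1]
    ar = ar / ar.min()                 # rescale so every AR score is >= 1
    denom = np.log(ar.max())
    info = np.log(ar) / denom if denom > 0 else np.ones_like(ar)
    combined = alpha * rel + (1.0 - alpha) * info
    return np.argsort(-combined)
```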
Advantages of Affinity Rank
- Gives attention to both diversity and informativeness
- Implicitly expands the query towards multiple topics
- Automatically picks the representative documents for each chosen topic
- Most of the computation can be done OFFLINE
Experiment Setup
Dataset
- Microsoft Newsgroup: 117 Office-product-related newsgroups
- 256,449 posts (mainly within 4 months), about 400 MB
Preprocessing
- Title & text body (citations, signatures, etc. stripped)
- Stemming, stop-word removal, tf-idf weighting
Queries
- 20 randomly picked query scenarios with query words
Search Results
- Okapi; top 50 results as the answer set
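One plausible way to set up this kind of preprocessing in Python (scikit-learn and NLTK are our choice of tools; the example posts are made up):

```python
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer

# Example posts: title + text body with citations/signatures already stripped
posts = [
    "excel formula returns wrong value when cell is empty",
    "outlook rules stop working after moving pst file",
]

stemmer = PorterStemmer()
base_analyzer = TfidfVectorizer(stop_words="english").build_analyzer()

def stemmed_analyzer(text):
    # Tokenize, drop stop words, then stem each remaining term
    return [stemmer.stem(token) for token in base_analyzer(text)]

# tf-idf weighting over the stemmed, stop-word-filtered vocabulary
vectorizer = TfidfVectorizer(analyzer=stemmed_analyzer)
doc_vectors = vectorizer.fit_transform(posts).toarray()
```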
Evaluation – Ground Truth
User study
- 4 users independently evaluated all results
- For each query:
  - First, manually cluster all results into different topics
  - Then, score each result by its informativeness within the corresponding topic
  - Finally, score each result by its relevance to the query
Evaluation
- Compare the original ranking with the new ranking (re-ranked by Affinity Rank)
- 3 aspects of ranking are considered: diversity, informativeness & relevance in the top n results
Definitions
Diversity: the number of different topics in a document group
Informativeness: 3 - very informative, 2 - informative, 1 - somewhat informative, 0 - not informative
Relevance: 1 - relevant, 0 - hard to tell, -1 - irrelevant
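A small sketch of how these per-query measures might be computed for a top-n result list (the data layout is an assumption):

```python
def evaluate_top_n(results, topic_of, informativeness, relevance, n=10):
    """Compute diversity, average informativeness and average relevance
    of the top-n results, following the definitions above.

    results:         ranked list of document ids.
    topic_of:        dict doc id -> manually assigned topic label.
    informativeness: dict doc id -> score in {0, 1, 2, 3}.
    relevance:       dict doc id -> score in {-1, 0, 1}.
    """
    top = results[:n]
    diversity = len({topic_of[d] for d in top})   # number of distinct topics
    avg_info = sum(informativeness[d] for d in top) / len(top)
    avg_rel = sum(relevance[d] for d in top) / len(top)
    return diversity, avg_info, avg_rel
```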
Experiment Result (1)
Top 10 search results, compared to traditional IR results:

| | Diversity | Informativeness | Relevance |
|---|---|---|---|
| Relative change | +31.02% | +11.97% | +0.72% |
| p value (t-test) | 0.004632 | 0.002225 | 0.067255 |

Significant improvement in diversity & informativeness without loss in relevance.
Experiment Result (2)
[Charts: diversity improvement and informativeness improvement, re-ranking all top 50 results by Affinity Rank]
Affinity Rank effectively improves both the diversity & informativeness of top search results.
Experiment Result (3) – Parameter Tuning
Top 10 search results.
Affinity Rank is robust:
1. The parameter does not matter much once enough weight is given
2. No over-tuning problem: simply re-ranking everything by Affinity Rank is nearly optimal
Experiment Result (4) – Parameter Tuning
Overview of the improvement subject to weight adjustment:
Affinity Rank CONSISTENTLY exerts a positive influence on diversity & informativeness.
Conclusion
- A new IR framework: Affinity Rank helps to improve the diversity & informativeness of search results, especially the TOP ones
- Affinity Rank is computed offline, and therefore adds little burden to online retrieval
Future Work
- Metrics for measuring information quantity
- Scaling to large collections
Thanks