Entity Ranking Using Wikipedia as a Pivot
1
Entity Ranking Using Wikipedia as a Pivot
(CIKM ’10) Rianne Kaptein, Pavel Serdyukov, Arjen de Vries, Jaap Kamps
2010/12/14 Yu-wen Hsu
2
Outline
Introduction
From Wikipedia Entities to Web Entities and back
Entity Ranking on Wikipedia
Entity Ranking on Web
Conclusion
3
Introduction
Entity ranking is the task of finding documents that represent entities of the correct type and are relevant to a query.
The goal is to present a ranked list of entities directly, rather than a list of web pages with relevant but potentially redundant information about these entities.
4
Entity ranking differs from document retrieval in at least three ways:
i) returned documents have to represent an entity;
ii) this entity should belong to a specified entity type;
iii) to create a diverse result list, an entity should be returned only once.
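The three constraints can be illustrated as a post-filter over an ordinary ranked document list. This is a minimal Python sketch, not the authors' implementation; `entity_of` and `type_of` are hypothetical lookups standing in for whatever maps a document to the entity it represents and an entity to its type.

```python
from typing import Callable, Iterable, List, Optional, Set


def entity_result_list(
    ranked_docs: Iterable[str],
    entity_of: Callable[[str], Optional[str]],  # hypothetical: doc -> entity or None
    type_of: Callable[[str], str],              # hypothetical: entity -> entity type
    target_type: str,
) -> List[str]:
    """Turn a ranked document list into a ranked, de-duplicated entity list."""
    seen: Set[str] = set()
    result: List[str] = []
    for doc in ranked_docs:
        entity = entity_of(doc)
        if entity is None:                  # i) the document must represent an entity
            continue
        if type_of(entity) != target_type:  # ii) the entity must have the target type
            continue
        if entity in seen:                  # iii) each entity is returned only once
            continue
        seen.add(entity)
        result.append(entity)
    return result


# Toy data (hypothetical): d2 represents no entity, d3 repeats Amsterdam.
entity_of = {"d1": "Amsterdam", "d2": None, "d3": "Amsterdam", "d4": "Netherlands"}.get
type_of = {"Amsterdam": "city", "Netherlands": "country"}.__getitem__
print(entity_result_list(["d1", "d2", "d3", "d4"], entity_of, type_of, "city"))
```

Ranking order is preserved, so the first (highest-ranked) document for an entity decides where that entity appears.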
5
Main Goal
To rank web entities:
1. Associate target entity types with the query.
2. Rank Wikipedia pages according to their similarity with the query and the target entity types.
3. Find web entities corresponding to the Wikipedia entities.
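The three steps compose into a simple pipeline. The sketch below only shows the data flow; `assign_types`, `rank_wikipedia` and `find_web_pages` are hypothetical callables, stubbed here with lambdas, not components described on the slides.

```python
from typing import Callable, Dict, List


def rank_web_entities(
    query: str,
    assign_types: Callable[[str], List[str]],                # hypothetical step 1
    rank_wikipedia: Callable[[str, List[str]], List[str]],   # hypothetical step 2
    find_web_pages: Callable[[str], List[str]],              # hypothetical step 3
) -> Dict[str, List[str]]:
    """Wikipedia-as-a-pivot pipeline; returns {Wikipedia title: web pages} in rank order."""
    # Step 1: associate target entity types (Wikipedia categories) with the query.
    target_types = assign_types(query)
    # Step 2: rank Wikipedia pages by similarity to the query and the target types.
    wikipedia_entities = rank_wikipedia(query, target_types)
    # Step 3: find the web entities (e.g. homepages) for each Wikipedia entity.
    # Dicts keep insertion order, so the Wikipedia ranking is preserved.
    return {title: find_web_pages(title) for title in wikipedia_entities}


# Stub components (hypothetical) to show the data flow.
result = rank_web_entities(
    "universities in the Netherlands",
    assign_types=lambda q: ["Universities in the Netherlands"],
    rank_wikipedia=lambda q, types: ["University of Amsterdam", "Leiden University"],
    find_web_pages=lambda title: ["http://example.org/" + title.replace(" ", "_")],
)
print(result)
```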
6
Using Wikipedia as a pivot
Entities: Wikipedia pages
The name of the entity: the title of the page
The representation of the entity: the content of the page
Each Wikipedia page is assigned to a number of categories: topical, type, and administrative categories.
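The mapping from Wikipedia pages to entities can be written down as a small data structure. This is an illustrative sketch only: the class names are hypothetical, and labelling each category as topical, type, or administrative is assumed here for illustration, since a page's category list does not carry such labels by itself.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class CategoryKind(Enum):
    TOPICAL = "topical"
    TYPE = "type"
    ADMINISTRATIVE = "administrative"


@dataclass
class WikipediaEntity:
    title: str    # the name of the entity
    content: str  # the page text, used as the representation of the entity
    # category name -> kind; the kind labels are assumed, for illustration only
    categories: Dict[str, CategoryKind] = field(default_factory=dict)

    def type_categories(self) -> List[str]:
        """Categories that describe what kind of entity the page is."""
        return [name for name, kind in self.categories.items() if kind is CategoryKind.TYPE]


amsterdam = WikipediaEntity(
    title="Amsterdam",
    content="Amsterdam is the capital of the Netherlands ...",
    categories={
        "Capitals in Europe": CategoryKind.TYPE,
        "Dutch culture": CategoryKind.TOPICAL,
        "Articles with short description": CategoryKind.ADMINISTRATIVE,
    },
)
print(amsterdam.type_categories())
```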
7
From Wikipedia Entities to Web Entities and back
From Web to Wikipedia
Do these repositories provide enough clues to find the corresponding entities on the Web?
Do they contain enough entities to cover the complete range of entities needed to satisfy all kinds of information needs?
8
From Wikipedia to Web
Use external links
9
Entity Ranking on Wikipedia
* Entity Types
Entity type assignment: exploit the existing Wikipedia categorization of documents.
Pseudo-relevance feedback: from the top retrieved documents, we extract the categories that are most frequently assigned.
We take the top 10 results and look at the 2 most frequently occurring categories belonging to these documents.
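The pseudo-relevance feedback step amounts to counting categories over the top-ranked pages. A minimal sketch with the slide's parameters (top 10 results, 2 categories); the function name and the `categories_of` lookup are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def assign_target_categories(
    ranked_docs: Sequence[str],
    categories_of: Dict[str, List[str]],  # hypothetical: page -> its Wikipedia categories
    top_docs: int = 10,
    top_cats: int = 2,
) -> List[str]:
    """Pseudo-relevance feedback: most frequent categories among the top-ranked pages."""
    counts: Counter = Counter()
    for doc in ranked_docs[:top_docs]:
        counts.update(categories_of.get(doc, []))
    return [category for category, _ in counts.most_common(top_cats)]


categories_of = {
    "Amsterdam": ["Capitals in Europe", "Port cities"],
    "Berlin": ["Capitals in Europe"],
    "Rotterdam": ["Port cities", "Dutch culture"],
    "Paris": ["Capitals in Europe"],
}
print(assign_target_categories(["Amsterdam", "Berlin", "Rotterdam", "Paris"], categories_of))
```

Here "Capitals in Europe" occurs 3 times and "Port cities" twice, so those two become the target categories.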
10
* Entity Types: Scoring Entities
Estimate background probabilities.
Smooth the probabilities of a term occurring in a category name with the background collection.
The formulas refer to the name of the category, the category, the query terms, the document, and the entire Wikipedia document collection.
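The smoothing step can be sketched as linear interpolation (Jelinek-Mercer smoothing) between a maximum-likelihood estimate from the category name and the background model of the whole collection. The exact formula is not preserved in this transcript, so both the interpolation form and the weight `lam` are assumptions, not the authors' settings.

```python
from collections import Counter
from typing import Dict, Iterable


def background_model(collection: Iterable[str]) -> Dict[str, float]:
    """P(t|D): relative frequency of each term in the whole collection."""
    counts: Counter = Counter()
    for text in collection:
        counts.update(text.lower().split())
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def category_term_prob(term: str, category_name: str,
                       background: Dict[str, float], lam: float = 0.9) -> float:
    """Smoothed P(t|C); lam is a hypothetical interpolation weight."""
    tokens = category_name.lower().split()
    term = term.lower()
    # Maximum-likelihood estimate from the category name alone.
    p_name = tokens.count(term) / len(tokens) if tokens else 0.0
    # Interpolate with the background probability P(t|D).
    return lam * p_name + (1 - lam) * background.get(term, 0.0)


bg = background_model(["capital cities of europe", "national parks of europe"])
print(category_term_prob("cities", "Capital cities", bg, lam=0.5))
print(category_term_prob("europe", "Capital cities", bg, lam=0.5))
```

Smoothing gives "europe" a non-zero probability for the category "Capital cities" even though the term is not in its name, which keeps later comparisons between categories well defined.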
11
Similarity between two categories
The entity type score for a document in relation to a query topic
Score Normalization
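The formulas for these three steps are not preserved in this transcript. The sketch below shows one plausible instantiation under stated assumptions: category similarity as negative KL divergence between smoothed category-name models, a document's type score as the best match over (target, document) category pairs, min-max normalization, and a linear combination with the baseline score for reranking. All function names and the `weight` parameter are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def category_model(name: str, background: Dict[str, float], lam: float = 0.9) -> Dict[str, float]:
    """Smoothed term distribution of a (non-empty) category name.

    Assumes the background vocabulary contains every category-name term.
    """
    tf = Counter(name.lower().split())
    length = sum(tf.values())
    return {t: lam * tf[t] / length + (1 - lam) * p for t, p in background.items()}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q); smoothing keeps every q[t] > 0 when 0 <= lam < 1."""
    return sum(pt * math.log(pt / q[t]) for t, pt in p.items() if pt > 0)


def type_score(doc_categories: List[str], target_categories: List[str],
               background: Dict[str, float], lam: float = 0.9) -> float:
    """Higher is better: negative KL of the closest (target, document) category pair."""
    return max(
        -kl_divergence(category_model(t, background, lam), category_model(c, background, lam))
        for t in target_categories
        for c in doc_categories
    )


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so that different score types can be combined."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def rerank(baseline: Dict[str, float], type_scores: Dict[str, float], weight: float = 0.5) -> List[str]:
    """Linear combination of normalized baseline and entity type scores."""
    b, t = min_max(baseline), min_max(type_scores)
    combined = {d: (1 - weight) * b[d] + weight * t[d] for d in baseline}
    return sorted(combined, key=combined.get, reverse=True)


words = "capital cities europe national parks europe".split()
bg = {w: c / len(words) for w, c in Counter(words).items()}
targets = ["capital cities"]
scores = {
    "Amsterdam": type_score(["capital cities"], targets, bg),
    "Hoge Veluwe": type_score(["national parks"], targets, bg),
}
print(scores["Amsterdam"] > scores["Hoge Veluwe"])  # True
print(rerank({"Amsterdam": 1.0, "Hoge Veluwe": 2.0}, scores, weight=0.6))  # ['Amsterdam', 'Hoge Veluwe']
```

Normalizing before combining matters because negative KL values and retrieval scores live on unrelated scales; with a large enough `weight`, the entity type evidence can overturn the baseline order.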
12
Entity Ranking on Wikipedia
* Experimental Setup
Data sets:
INEX: specific entity types, e.g. countries, national parks
TREC: people, organizations, products
Advantage: clear, with few options, so the types can easily be selected.
Disadvantage: they cover only a small part of all possible entity ranking queries.
We manually assigned more specific entity types.
13
Rerank the top 2,500 results of the baseline, with entity types that are either:
manually assigned (by the authors), or
automatically assigned (PRF).
Evaluation: TREC 2009: P10 and NDCG@20; INEX: P10 and MAP.
INEX 2006-2008: 79 topics. INEX 2009: a selection of 55 topics from the 2006-2008 topics.
Only the so-called ‘primary’ pages are counted.
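For reference, the evaluation measures can be sketched as follows. These are common textbook formulations (NDCG here uses linear gains and a log2 rank discount); official evaluation tools may differ in details such as gain values or tie handling. For the TREC setting, the relevant set would contain only the primary pages. MAP is the mean of average precision over all topics.

```python
import math
from typing import Dict, List, Set


def precision_at_k(ranking: List[str], relevant: Set[str], k: int = 10) -> float:
    """P@k: fraction of the top k results that are relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at the rank of each relevant result."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def ndcg_at_k(ranking: List[str], gains: Dict[str, float], k: int = 20) -> float:
    """NDCG@k with linear gains and a log2(rank + 1) discount."""
    dcg = sum(gains.get(d, 0.0) / math.log2(i + 2) for i, d in enumerate(ranking[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


ranking = ["a", "b", "c", "d"]
relevant = {"a", "c"}
print(precision_at_k(ranking, relevant))     # 0.2
print(average_precision(ranking, relevant))  # (1/1 + 2/3) / 2
print(ndcg_at_k(ranking, {"a": 1.0, "c": 1.0}))
```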
14
15
Entity Ranking on The Web
We have three approaches for finding web pages associated with Wikipedia pages:
1. External links: use the links in the External links section of the Wikipedia page.
2. Anchor text: use the Wikipedia page title as a query to retrieve pages from the anchor text index.
3. Combined: not all Wikipedia pages have external links, and not all external links of Wikipedia pages are part of the ClueWeb collection. If fewer than 3 web pages are found, we fill up the results to 3 pages using the top pages retrieved using anchor text.
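The combined approach can be sketched as "external links first, then top up from anchor text". In this sketch the set `collection` stands in for the ClueWeb collection and `anchor_search` is a hypothetical callable standing in for a query against the anchor text index; neither is a real API.

```python
from typing import Callable, Dict, List, Set


def web_pages_for_entity(
    title: str,
    external_links: Dict[str, List[str]],    # hypothetical: page title -> External links
    collection: Set[str],                    # stand-in for the URLs in the web collection
    anchor_search: Callable[[str], List[str]],  # hypothetical anchor-text retrieval
    min_pages: int = 3,
) -> List[str]:
    """Combined approach: external links first, topped up with anchor-text results."""
    # 1. External links from the Wikipedia page, kept only if the page is in the collection.
    pages = [url for url in external_links.get(title, []) if url in collection]
    # 2. + 3. If fewer than min_pages were found, fill up with anchor-text results
    #    (query = Wikipedia page title), skipping duplicates.
    for url in anchor_search(title):
        if len(pages) >= min_pages:
            break
        if url not in pages:
            pages.append(url)
    return pages


collection = {"http://amsterdam.example", "http://a.example", "http://b.example", "http://c.example"}
links = {"Amsterdam": ["http://amsterdam.example", "http://not-crawled.example"]}
search = lambda title: ["http://amsterdam.example", "http://a.example", "http://b.example"]
print(web_pages_for_entity("Amsterdam", links, collection, search))
```

External links outside the collection are dropped, and pages already found via external links are not added twice from the anchor-text results.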
16
17
Conclusion
Our experiments show that our Wikipedia-as-a-pivot approach outperforms a full-text search baseline.
Both external links on Wikipedia pages and searching an anchor text index of the web are effective approaches for finding homepages of entities represented by Wikipedia pages.