Entity Ranking Using Wikipedia as a Pivot

(CIKM '10) Rianne Kaptein, Pavel Serdyukov, Arjen de Vries, Jaap Kamps

2010/12/14 Yu-wen Hsu



Outline

- Introduction
- From Wikipedia Entities to Web Entities and back
- Entity Ranking on Wikipedia
- Entity Ranking on Web
- Conclusion


Introduction

- Entity ranking is the task of finding documents representing entities of a correct type that are relevant to a query.
- The goal is to present a ranked list of entities directly, rather than a list of web pages with relevant but also potentially redundant information about these entities.


Entity ranking differs from document retrieval on at least three points:
i) returned documents have to represent an entity
ii) this entity should belong to a specified entity type
iii) to create a diverse result list, an entity should only be returned once.
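
The three points above can be read as filters applied to an ordinary document ranking. The following is a minimal sketch under stated assumptions: the `Result` record, its `entity` and `types` fields, and the toy data are hypothetical and do not come from the slides.

```python
from typing import NamedTuple, Optional

class Result(NamedTuple):
    doc_id: str
    entity: Optional[str]   # None if the document does not represent an entity
    types: frozenset        # entity types attached to the document (hypothetical field)
    score: float

def entity_ranking(results, target_type):
    """Turn a scored document list into an entity ranking."""
    seen = set()
    ranked = []
    for r in sorted(results, key=lambda r: r.score, reverse=True):
        if r.entity is None:             # i) must represent an entity
            continue
        if target_type not in r.types:   # ii) must belong to the requested type
            continue
        if r.entity in seen:             # iii) each entity is returned only once
            continue
        seen.add(r.entity)
        ranked.append(r)
    return ranked

# Toy data for illustration only.
docs = [
    Result("d1", "Yellowstone", frozenset({"national park"}), 0.9),
    Result("d2", "Yellowstone", frozenset({"national park"}), 0.8),
    Result("d3", None, frozenset(), 0.7),
    Result("d4", "Wyoming", frozenset({"state"}), 0.6),
    Result("d5", "Yosemite", frozenset({"national park"}), 0.5),
]
print([r.doc_id for r in entity_ranking(docs, "national park")])
```

Here `d2` is dropped as a duplicate entity, `d3` because it represents no entity, and `d4` because its type does not match.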


Main Goal

To rank Web entities:
1. Associate target entity types with the query
2. Rank Wikipedia pages according to their similarity with the query and target entity types
3. Find web entities corresponding to the Wikipedia entities


Using Wikipedia as a pivot

- Entities: Wikipedia pages
- The name of the entity: the title of the page
- The content of the page: the representation of the entity
- Each Wikipedia page is assigned to a number of categories: topical, type, and administrative categories.


From Wikipedia Entities to Web Entities and back

From Web to Wikipedia

- Do these repositories provide enough clues to find the corresponding entities on the Web?
- Do they contain enough entities to cover the complete range of entities needed to satisfy all kinds of information needs?


From Wikipedia to Web

- Use external links


Entity Ranking on Wikipedia: Entity Types

Entity Type Assignment

- Exploit the existing Wikipedia categorization of documents
- Pseudo-relevance feedback: from the top retrieved documents we extract the categories that are most frequently assigned
- We take the top 10 results and look at the 2 most frequently occurring categories belonging to these documents
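
The pseudo-relevance feedback step above can be sketched as a frequency count over the categories of the top-ranked pages. This is only an illustration: the function name, the `doc_categories` mapping, and the toy data are assumptions, while the defaults of 10 documents and 2 categories follow the slide.

```python
from collections import Counter

def prf_entity_types(ranked_docs, doc_categories, top_docs=10, top_cats=2):
    """Return the categories most frequently assigned to the top-ranked documents."""
    counts = Counter()
    for doc in ranked_docs[:top_docs]:
        # Count each category at most once per document.
        for cat in sorted(set(doc_categories.get(doc, ()))):
            counts[cat] += 1
    return [cat for cat, _ in counts.most_common(top_cats)]

# Toy ranking and category assignments, for illustration only.
ranked = ["Yellowstone", "Yosemite", "Grand Canyon", "Wyoming"]
categories = {
    "Yellowstone": ["National parks", "World Heritage Sites"],
    "Yosemite": ["National parks", "World Heritage Sites"],
    "Grand Canyon": ["National parks"],
    "Wyoming": ["States"],
}
print(prf_entity_types(ranked, categories))
```

The selected categories then act as the automatically assigned target entity types for the query.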


Entity Types: Scoring Entities

- Estimate background probabilities
- Smooth the probabilities of a term occurring in a category name with the background collection

Quantities used in the scoring formulas: the name of the category, the category, the query terms, the document, and the entire Wikipedia document collection.
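
The slide's own formulas are not reproduced in this transcript, so the following is only a sketch of one common way to smooth a category-name term probability with a background collection: linear interpolation (Jelinek-Mercer smoothing). The interpolation weight `lam`, the function names, and the toy token lists are assumptions.

```python
from collections import Counter

def term_probs(tokens):
    """Maximum-likelihood term probabilities for a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def smoothed_prob(term, category_name_tokens, collection_probs, lam=0.9):
    """Interpolate the category-name probability with the background probability.

    lam is a hypothetical smoothing weight, not a value taken from the slides.
    """
    p_cat = term_probs(category_name_tokens).get(term, 0.0)
    p_bg = collection_probs.get(term, 0.0)
    return lam * p_cat + (1 - lam) * p_bg

# Toy background collection and category name, for illustration only.
collection_tokens = "national parks of the united states national museums of the world".split()
collection_probs = term_probs(collection_tokens)
category_name = ["national", "parks"]

print(smoothed_prob("national", category_name, collection_probs))
print(smoothed_prob("museums", category_name, collection_probs))
```

Smoothing keeps a term such as "museums", which does not occur in the category name, from receiving a probability of zero.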


Similarity between two categories

The entity type score for a document in relation to a query topic

Score Normalization
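
The formulas behind these three captions are not preserved in this transcript. The sketch below shows one plausible reading and should not be taken as the authors' exact method: it assumes that category similarity is measured by KL divergence between smoothed category language models, that the entity type score is the smallest divergence over all pairs of target and document categories, and that scores are min-max normalized. All function names and toy distributions are hypothetical.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) over a shared vocabulary; q must be non-zero wherever p is."""
    return sum(pv * math.log(pv / q[t]) for t, pv in p.items() if pv > 0)

def type_score(target_models, doc_models):
    """Smallest divergence over all (target category, document category) pairs.

    Lower means the document's categories are closer to the target types.
    """
    return min(kl_divergence(t, d) for t in target_models for d in doc_models)

def min_max(scores):
    """Rescale scores to [0, 1] so they can be combined with other scores."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

# Toy smoothed category models, for illustration only.
target = [{"national": 0.5, "parks": 0.4, "museums": 0.1}]
doc_a = [{"national": 0.5, "parks": 0.4, "museums": 0.1}]
doc_b = [{"national": 0.1, "parks": 0.1, "museums": 0.8}]

scores = {"a": type_score(target, doc_a), "b": type_score(target, doc_b)}
normalized = min_max(scores)
print(scores, normalized)
```

Because the divergence is a distance, a system that ranks by descending score would need to invert or negate it before combining it with a retrieval score.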


Entity Ranking on Wikipedia: Experimental Setup

Data sets and their entity types:

- INEX: specific types, e.g. countries, national parks
- TREC: people, organizations, products
  - Advantage: clear, few options, could be easily selected
  - Disadvantage: cover only a small part of all possible entity ranking queries
  - More specific entity types were manually assigned


- Rerank the top 2,500 results of the baseline, using entity types that are:
  - manually assigned (author)
  - automatically assigned (PRF)
- Evaluation:
  - TREC 2009: P10 and NDCG@20
  - INEX: P10 and MAP
- INEX 2006-2008: 79 topics
- INEX 2009: a selection of 55 topics from the 2006-2008 topics
- Only the so-called 'primary' pages are counted
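
For reference, the measures named above can be computed as follows. This is a sketch using common textbook definitions; the NDCG variant shown (linear gain with a log2 discount) is an assumption, since the slides do not say which formulation was used, and the toy ranking is illustrative.

```python
import math

def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top k results that are relevant (P10 when k=10)."""
    return sum(1 for d in ranking[:k] if d in relevant) / k

def average_precision(ranking, relevant):
    """Average of the precision values at each relevant result."""
    hits, total = 0, 0.0
    for i, d in enumerate(ranking, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def ndcg_at_k(ranking, gains, k=20):
    """DCG of the ranking divided by the DCG of an ideal ordering."""
    def dcg(values):
        return sum(g / math.log2(i + 1) for i, g in enumerate(values, start=1))
    actual = dcg([gains.get(d, 0) for d in ranking[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0

# Toy ranking for one topic; "a" and "c" are the relevant pages.
ranking = ["a", "b", "c", "d"]
relevant = {"a", "c"}
gains = {"a": 1, "c": 1}

print(precision_at_k(ranking, relevant))
print(average_precision(ranking, relevant))
print(ndcg_at_k(ranking, gains))
```

MAP is the mean of `average_precision` over all topics. Counting only primary pages would correspond to restricting `relevant` and `gains` to those pages.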


Entity Ranking on The Web

We have three approaches for finding web pages associated with Wikipedia pages.

1. External links: take the links in the External links section of the Wikipedia page
2. Anchor text: use the Wikipedia page title as a query to retrieve pages from the anchor text index
3. Combined:
   - not all Wikipedia pages have external links
   - not all external links of Wikipedia pages are part of the ClueWeb collection
   - when fewer than 3 web pages are found, we fill up the results to 3 pages using the top pages retrieved using anchor text
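
The combined approach can be sketched as a simple fallback. The function name, argument names, and example URLs below are hypothetical; only the fill-up threshold of 3 pages comes from the slide.

```python
def combined_pages(external_links, anchor_hits, collection_urls, min_pages=3):
    """Use external links found in the collection, then fill up from anchor-text results."""
    # Keep only external links that are part of the web collection, without duplicates.
    pages = list(dict.fromkeys(u for u in external_links if u in collection_urls))
    # Fill up to min_pages with the top anchor-text results not already present.
    for url in anchor_hits:
        if len(pages) >= min_pages:
            break
        if url not in pages:
            pages.append(url)
    return pages

# Toy inputs, for illustration only.
external = ["http://example.org/home", "http://example.org/missing"]
collection = {"http://example.org/home", "http://example.org/about", "http://example.org/fan"}
anchor = [
    "http://example.org/home",
    "http://example.org/about",
    "http://example.org/fan",
    "http://example.org/extra",
]
print(combined_pages(external, anchor, collection))
```

In this example one external link is missing from the collection, so the two best anchor-text results that are not already present fill the list up to 3 pages.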


Conclusion

Our experiments show that our Wikipedia-as-a-pivot approach outperforms a baseline of full-text search.

Both external links on Wikipedia pages and searching an anchor text index of the web are effective approaches to find homepages for entities represented by Wikipedia pages.