Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations

Posted on 10-May-2015


Institute for Web Science & Technologies – WeST

Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations

Christian Hachenberg and Thomas Gottron

Workshop Web of Linked Entities (WoLE 2012) at ISWC 2012

Sunday, 11 November 2012

Thomas Gottron, WoLE Workshop 2012, Finding Good URLs

Mapping Documents to Entities

dbpedia.org:Rob_Roy_(film)


Mapping Entities to Documents

dbpedia.org:Rob_Roy_(film)

Align entities in a knowledge base (KB) with public documents:

• Publish the knowledge base
• Propagate changes
• Human-readable representation


Task Definition

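[Figure: example knowledge-base graph with three entities]

• dbpedia:Harrison_Ford, label "Harrison Ford", type: actor
• dbpedia:Star_Wars_Episode_IV:_A_New_Hope, label "Star Wars IV: A New Hope", type: movie
• dbpedia:George_Lucas, label "George Lucas", type: director
• Links: property "starring" (between the movie and Harrison Ford) and property "directs" (between George Lucas and the movie)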

3 types of information:

• Labels
• Link structure
• Types

Open question (???): which public web URL best represents each entity?
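A minimal Python sketch of the task input follows. It assumes the knowledge base is held in memory and that each entity carries the three kinds of information listed above: a label, its types and its links. The class `Entity` and the helper `neighbours` are hypothetical names chosen for illustration. The link directions (movie `starring` actor, director `directs` movie) are also an assumption, read off the example graph.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical KB entity: label, types and outgoing links."""
    uri: str
    label: str
    types: set[str] = field(default_factory=set)
    # Outgoing links as (property, target URI) pairs.
    links: list[tuple[str, str]] = field(default_factory=list)


def neighbours(kb: dict[str, Entity], uri: str) -> set[str]:
    """URIs linked to `uri` in either direction (the link-structure signal)."""
    outgoing = {target for _, target in kb[uri].links}
    incoming = {e.uri for e in kb.values() if any(t == uri for _, t in e.links)}
    return outgoing | incoming


# The example graph from the task definition (link directions assumed).
kb = {
    "dbpedia:Harrison_Ford": Entity(
        "dbpedia:Harrison_Ford", "Harrison Ford", {"actor"}),
    "dbpedia:Star_Wars_Episode_IV:_A_New_Hope": Entity(
        "dbpedia:Star_Wars_Episode_IV:_A_New_Hope", "Star Wars IV: A New Hope",
        {"movie"}, [("starring", "dbpedia:Harrison_Ford")]),
    "dbpedia:George_Lucas": Entity(
        "dbpedia:George_Lucas", "George Lucas", {"director"},
        [("directs", "dbpedia:Star_Wars_Episode_IV:_A_New_Hope")]),
}
```

The task output would then be a ranked list of candidate URLs per entity URI, for example a `dict[str, list[str]]`.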


Label Search (using Web Search Engine)

[Figure: the example graph from the task definition; a web search on the entity labels returns candidate documents, marked SW4 in the figure]

Implementation:

• Bing
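Label search can be sketched as follows. The sketch assumes a generic search backend that maps a query string to a ranked list of result URLs. The talk used Bing, but the `SearchFn` type and the canned `fake_search` stand-in below are hypothetical and do not model any real Bing API. Issuing the bare label as the query and keeping the top `k` distinct URLs are likewise assumptions.

```python
from typing import Callable

# Hypothetical search backend: query string -> ranked result URLs.
SearchFn = Callable[[str], list[str]]


def label_search(label: str, search: SearchFn, k: int = 10) -> list[str]:
    """Rank candidate URLs for an entity by using its label as the query.

    Duplicate URLs are dropped while the engine's order is preserved.
    """
    seen: set[str] = set()
    ranked: list[str] = []
    for url in search(label):
        if len(ranked) >= k:
            break
        if url not in seen:
            seen.add(url)
            ranked.append(url)
    return ranked


def fake_search(query: str) -> list[str]:
    """Canned stand-in for a real search engine, for illustration only."""
    results = {
        "Star Wars IV: A New Hope": [
            "http://example.org/sw4",
            "http://example.org/sw4",
            "http://example.com/star-wars",
        ],
    }
    return results.get(query, [])


print(label_search("Star Wars IV: A New Hope", fake_search, k=2))
# ['http://example.org/sw4', 'http://example.com/star-wars']
```

Passing the backend in as a function keeps the ranking logic independent of any particular search engine.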


Exploiting Link Structure

[Figure: the example graph from the task definition; candidate web documents for the entities (marked GL, SW4 and HF in the figure) are connected by web links]

Implementation:

• In-degree
• PageRank
• HITS

+ Variations: Topic, Focussed
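The three link-analysis baselines can be sketched in plain Python. The sketch assumes that a directed graph of candidate web documents (page → pages it links to) has already been collected. It covers only textbook in-degree, PageRank (with an assumed damping factor of 0.85) and HITS. The Topic and Focussed variations are not reproduced, because the slides do not define them. The toy graph reuses the GL, SW4 and HF labels from the figure, but its links are invented.

```python
import math

Graph = dict[str, list[str]]  # page -> pages it links to


def nodes_of(graph: Graph) -> list[str]:
    """All pages appearing as source or target of a link."""
    return sorted(set(graph) | {v for vs in graph.values() for v in vs})


def in_degree(graph: Graph) -> dict[str, int]:
    """Number of distinct pages linking to each page."""
    deg = {n: 0 for n in nodes_of(graph)}
    for vs in graph.values():
        for v in set(vs):
            deg[v] += 1
    return deg


def pagerank(graph: Graph, d: float = 0.85, iters: int = 50) -> dict[str, float]:
    """Power-iteration PageRank with damping factor d."""
    nodes = nodes_of(graph)
    n = len(nodes)
    pr = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(pr[u] for u in nodes if not graph.get(u))
        new = {u: (1 - d) / n + d * dangling / n for u in nodes}
        for u, vs in graph.items():
            for v in vs:
                new[v] += d * pr[u] / len(vs)
        pr = new
    return pr


def hits(graph: Graph, iters: int = 50) -> tuple[dict[str, float], dict[str, float]]:
    """Kleinberg's HITS: returns (hub, authority), each L2-normalised."""
    nodes = nodes_of(graph)
    hub = {u: 1.0 for u in nodes}
    auth = {u: 0.0 for u in nodes}
    for _ in range(iters):
        # Authority: sum of the hub scores of pages linking to the page.
        auth = {u: 0.0 for u in nodes}
        for u, vs in graph.items():
            for v in vs:
                auth[v] += hub[u]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {u: a / norm for u, a in auth.items()}
        # Hub: sum of the authority scores of the pages it links to.
        hub = {u: sum(auth[v] for v in graph.get(u, [])) for u in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {u: h / norm for u, h in hub.items()}
    return hub, auth


# Toy graph: the GL and HF pages both link to the SW4 page.
web = {"GL": ["SW4"], "HF": ["SW4"], "SW4": []}
print(in_degree(web))  # {'GL': 0, 'HF': 0, 'SW4': 2}
```

To rank one entity's candidates, one would sort them by the chosen score, e.g. `sorted(candidates, key=pagerank(web).get, reverse=True)`.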


Type Filtering

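[Figure: three entities of type movie, each with its candidate documents (marked SW4, RR and GT in the figure)]

• dbpedia:Star_Wars_Episode_IV:_A_New_Hope, label "Star Wars IV: A New Hope", type: movie
• dbpedia:Gran_Torino_(film), label "Gran Torino", type: movie
• dbpedia:Rob_Roy_(film), label "Rob Roy", type: movie

Implementation:

• Borda Count for domain ranking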
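Type filtering with a Borda count can be sketched as below. The sketch rests on three assumptions that the slide does not spell out. First, the candidate lists of all entities of one type (here, movies) vote for web domains. Second, a URL at 0-based position i of a list of depth 10 earns 10 - i points for its domain. Third, each entity's candidates are then re-ranked by their domain's total. The function names and example URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Host part of a URL, e.g. 'movies.example' for 'http://movies.example/x'."""
    return urlparse(url).netloc.lower()


def borda_domain_scores(ranked_lists: list[list[str]], depth: int = 10) -> dict[str, int]:
    """Borda count over the candidate lists of all entities of one type.

    Within each list, the URL at 0-based position i earns depth - i points
    for its domain; points are summed over all lists.
    """
    scores: dict[str, int] = defaultdict(int)
    for urls in ranked_lists:
        for i, url in enumerate(urls[:depth]):
            scores[domain_of(url)] += depth - i
    return dict(scores)


def type_filter(urls: list[str], scores: dict[str, int]) -> list[str]:
    """Re-rank one entity's candidates by the Borda score of their domain.

    sorted() is stable, so candidates from equally scored domains keep
    their original order.
    """
    return sorted(urls, key=lambda u: -scores.get(domain_of(u), 0))


movies = {
    "dbpedia:Star_Wars_Episode_IV:_A_New_Hope": ["http://fans.example/sw4", "http://movies.example/sw4"],
    "dbpedia:Gran_Torino_(film)": ["http://movies.example/gt", "http://blog.example/gt"],
    "dbpedia:Rob_Roy_(film)": ["http://movies.example/rr"],
}
scores = borda_domain_scores(list(movies.values()))
print(scores)  # {'fans.example': 10, 'movies.example': 29, 'blog.example': 9}
print(type_filter(movies["dbpedia:Star_Wars_Episode_IV:_A_New_Hope"], scores))
# ['http://movies.example/sw4', 'http://fans.example/sw4']
```

Because the score is pooled over all entities of a type, a domain that ranks well for many movies can promote a lower-ranked candidate for a single movie.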


Experimental Setup

• 100 entities from 4 domains (cities, companies, persons, movies)
  – stratified by small, medium and large representation on the web
  – complete network of linked entities
• Application of label search and link structure approaches
  – type filtering as a post-process
• User evaluation (Cranfield setup, pooling)
  – graded relevance judgements
  – high juror agreement (Krippendorff's Alpha > 0.67)
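Juror agreement can be measured with Krippendorff's alpha, α = 1 - D_o / D_e, which compares observed disagreement with expected disagreement. The sketch below treats the graded judgements as interval data with a squared-difference distance. That choice is an assumption: the slides do not say which variant was used, and ordinal data would need a different distance function. The function name is hypothetical.

```python
from itertools import permutations


def krippendorff_alpha_interval(units: list[list[float]]) -> float:
    """Krippendorff's alpha for interval data, delta(c, k) = (c - k) ** 2.

    `units` holds, for each judged item, the grades given by the jurors who
    judged it; items with fewer than two grades are not pairable and are skipped.
    """
    units = [u for u in units if len(u) >= 2]
    values = [v for u in units for v in u]
    n = len(values)
    # Observed disagreement: ordered pairs of grades within the same item.
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n
    # Expected disagreement: ordered pairs of grades across all items.
    d_e = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    return 1.0 - d_o / d_e


print(krippendorff_alpha_interval([[1, 2], [3, 3]]))  # 8/11, about 0.727
```

Perfect agreement within every item gives D_o = 0 and therefore α = 1.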


Evaluation Metrics

• At which rank can I expect the first relevant result?
• Average P@1 (Precision@1): how often can I expect the first result to be relevant?
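Both questions can be answered from per-entity lists of graded judgements. This sketch makes two assumptions. First, a grade >= `threshold` counts as relevant. Second, "at which rank can I expect the first relevant result" is read as mean reciprocal rank (MRR). The slides do not name that metric, so MRR is a stand-in here, and the function names are hypothetical.

```python
from typing import Optional


def first_relevant_rank(grades: list[int], threshold: int = 1) -> Optional[int]:
    """1-based rank of the first result graded >= threshold, or None."""
    for rank, grade in enumerate(grades, start=1):
        if grade >= threshold:
            return rank
    return None


def evaluate(runs: list[list[int]], threshold: int = 1) -> tuple[float, float]:
    """Average P@1 and mean reciprocal rank over all entities.

    `runs` holds, per entity, the relevance grades of the returned URLs in
    rank order. Entities without any relevant result contribute 0 to both.
    """
    ranks = [first_relevant_rank(r, threshold) for r in runs]
    # P@1 is 1 exactly when the first relevant result sits at rank 1.
    avg_p_at_1 = sum(1 for r in ranks if r == 1) / len(runs)
    mrr = sum(1.0 / r for r in ranks if r is not None) / len(runs)
    return avg_p_at_1, mrr


print(evaluate([[2, 0, 1], [0, 0, 1], [0, 0, 0]]))
```

In this example, the first entity is answered at rank 1, the second at rank 3 and the third not at all. Average P@1 is therefore 1/3 and MRR is (1 + 1/3) / 3 = 4/9.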


Evaluation: Results

Statistically significant at p = 0.05


Evaluation: Results (Domain, Stratum)


Evaluation: Results (Filtering)


Conclusions and Next Steps

Novel task: Mapping entities to public web URLs

– Evaluated 9 link analysis and web search methods (plus 1 post-processing step using Borda counts)

– Best methods: Label Search and Focussed HITS
– Semantic typing improves the results of all methods

Next steps: Investigate domain-dependent performance of methods


Thank you!

Contact: WeST – Institute for Web Science and Technologies

Universität Koblenz-Landau

gottron@uni-koblenz.de
