Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations

Christian Hachenberg and Thomas Gottron

Institute for Web Science & Technologies – WeST

Workshop Web of Linked Entities (WoLE 2012) at ISWC 2012

Sunday, 11 November 2012



Page 2: Mapping Documents to Entities

dbpedia.org:Rob_Roy_(film)

Page 3: Mapping Entities to Documents

dbpedia.org:Rob_Roy_(film)

Align entities in KB with public documents

• Publish knowledge base
• Propagate changes
• Human readable representation

Page 4: Task Definition

[Diagram: three linked knowledge base entities, each with a label and a type. The web documents for the entities are marked "???".]

• dbpedia:Harrison_Ford (label: Harrison Ford, type: actor)
• dbpedia:Star_Wars_Episode_IV:_A_New_Hope (label: Star Wars IV: A New Hope, type: movie)
• dbpedia:George_Lucas (label: George Lucas, type: director)
• Properties linking the entities: starring, directs

3 types of information:

• Labels
• Link structure
• Types

Page 5: Label Search (using Web Search Engine)

[Diagram: the example graph of dbpedia:Harrison_Ford, dbpedia:Star_Wars_Episode_IV:_A_New_Hope and dbpedia:George_Lucas, with three documents marked SW4.]

Implementation:

• Bing
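A minimal sketch of the label search step, assuming the entity label is sent unchanged as the query and the top results are kept as candidate URLs. The slide names Bing but gives no query details, so the `search` callable, `fake_search`, `FAKE_INDEX` and the example URLs are hypothetical stand-ins, not a real search API.

```python
from typing import Callable, Dict, List

def label_search(label: str, search: Callable[[str], List[str]], k: int = 10) -> List[str]:
    """Send the entity label to a search backend and keep the top-k URLs."""
    return search(label)[:k]

# Hypothetical in-memory backend so the sketch runs without network access.
FAKE_INDEX: Dict[str, List[str]] = {
    "Rob Roy": [
        "https://films.example/rob-roy",
        "https://fans.example/rob-roy",
        "https://shop.example/rob-roy-dvd",
    ],
}

def fake_search(query: str) -> List[str]:
    """Return the ranked URLs stored for a query, or an empty list."""
    return FAKE_INDEX.get(query, [])

candidates = label_search("Rob Roy", fake_search, k=2)
```

Passing the backend in as a function keeps the ranking step independent of any particular search engine.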

Page 6: Exploiting Link Structure

[Diagram: the example graph of dbpedia:Harrison_Ford, dbpedia:Star_Wars_Episode_IV:_A_New_Hope and dbpedia:George_Lucas, with documents marked GL, SW4 (three times) and HF.]

Implementation:

• In-degree
• PageRank
• HITS

+ Variations: Topic, Focussed
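Two of the listed link analysis scores can be sketched as follows. This is plain in-degree and textbook HITS on a toy link graph; the Topic and Focussed variations from the slide are not covered, and the graph, the URLs and the function names are assumptions made for illustration.

```python
from collections import defaultdict

def in_degree(links):
    """Count incoming links per page; `links` maps a page to the pages it links to."""
    scores = defaultdict(int)
    for source, targets in links.items():
        scores[source] += 0  # make sure pages without incoming links appear
        for target in targets:
            scores[target] += 1
    return dict(scores)

def _normalise(scores):
    """Scale scores to unit L2 length (left unchanged if all are zero)."""
    norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
    return {page: v / norm for page, v in scores.items()}

def hits(links, iterations=50):
    """Textbook HITS: returns (hub, authority) score dictionaries."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages linking in.
        auth = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        auth = _normalise(auth)
        # Hub: sum of authority scores of the pages linked to.
        hub = _normalise(
            {page: sum(auth[t] for t in links.get(page, ())) for page in pages}
        )
    return hub, auth

# Hypothetical documents: pages about HF and GL link to candidate SW4 pages.
links = {
    "hf.example/bio": ["films.example/sw4"],
    "gl.example/bio": ["films.example/sw4", "fan.example/sw4"],
    "fan.example/sw4": ["films.example/sw4"],
}
degrees = in_degree(links)
hub, auth = hits(links)
best_sw4 = max(auth, key=auth.get)
```

In this toy graph the candidate that receives links from the documents of neighbouring entities ends up with the highest authority score.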

Page 7: Type Filtering

[Diagram: three entities of type movie, with documents marked SW4 (three times), RR and GT.]

• dbpedia:Star_Wars_Episode_IV:_A_New_Hope (label: Star Wars IV: A New Hope, type: movie)
• dbpedia:Gran_Torino_(film) (label: Gran Torino, type: movie)
• dbpedia:Rob_Roy_(film) (label: Rob Roy, type: movie)

Implementation:

• Borda Count for domain ranking
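One way to read "Borda Count for domain ranking" is sketched below: the candidate lists of all entities sharing a type vote for web domains, and the aggregated domain order is then used to reorder a single entity's candidates. The slides do not give the exact scoring, so the point scheme (n points for the top domain of a list with n domains, down to 1 for the last), the function names and the URLs are assumptions.

```python
from collections import defaultdict
from urllib.parse import urlparse

def borda_domains(rankings):
    """Aggregate ranked URL lists (best first) into one domain order via Borda count."""
    scores = defaultdict(int)
    for ranked_urls in rankings:
        # Keep each domain once, at its best position in this list.
        domains = []
        for url in ranked_urls:
            domain = urlparse(url).netloc
            if domain not in domains:
                domains.append(domain)
        n = len(domains)
        for position, domain in enumerate(domains):
            scores[domain] += n - position
    # Highest total first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda d: (-scores[d], d))

def rerank(ranked_urls, domain_order):
    """Reorder one entity's candidates by domain rank, keeping the original order within ties."""
    position = {domain: i for i, domain in enumerate(domain_order)}
    return sorted(ranked_urls,
                  key=lambda url: position.get(urlparse(url).netloc, len(position)))

# Hypothetical candidate lists for three entities of type movie.
rankings = [
    ["https://movies.example/sw4", "https://fans.example/sw4", "https://shop.example/sw4"],
    ["https://movies.example/gran-torino", "https://shop.example/gt", "https://news.example/gt"],
    ["https://fans.example/rob-roy", "https://movies.example/rob-roy"],
]
domain_order = borda_domains(rankings)
rob_roy = rerank(rankings[2], domain_order)
```

Here `movies.example` collects the most points across the three movies, so the Rob Roy candidate on that domain moves to the front of its list.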

Page 8: Experimental Setup

• 100 entities
• 4 domains (cities, companies, persons, movies)
• Stratified by little, medium and large representation on the web
• Complete network of linked entities

• Application of label search and link structure approaches
• Type-filtering as post-process

• User evaluation (Cranfield setup, pooling)
• Graded relevance judgements
• High juror agreement (Krippendorff's Alpha > 0.67)

Page 9: Evaluation Metrics

• At which rank can I expect the first relevant result?
• Average P@1 (Precision@1): How often can I expect the first result to be relevant?
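The two questions above can be made concrete with a short sketch. Average P@1 is named on the slide; the first question matches the usual reading of reciprocal rank, but the metric name did not survive in the transcript, so using mean reciprocal rank here is an assumption. The judgements are reduced to binary values, whereas the study used graded relevance, and the entity names are hypothetical.

```python
def precision_at_1(relevance):
    """1.0 if the top-ranked result is relevant, else 0.0."""
    return 1.0 if relevance and relevance[0] else 0.0

def reciprocal_rank(relevance):
    """1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0

def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    values = list(values)
    return sum(values) / len(values)

# Hypothetical binary judgements per entity, in ranked order (best result first).
judgements = {
    "entity_a": [True, False],
    "entity_b": [False, True],
    "entity_c": [False, False, False, True],
}
average_p_at_1 = mean(precision_at_1(r) for r in judgements.values())
mean_rr = mean(reciprocal_rank(r) for r in judgements.values())
```

With these judgements one of three entities has a relevant first result, and the first relevant results sit at ranks 1, 2 and 4.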

Page 10: Evaluation: Results

Statistically significant, p=0.05

Page 11: Evaluation: Results (Domain, Stratum)

Page 12: Evaluation: Results (Filtering)

Page 13: Conclusions and Next Steps

Novel task: Mapping entities to public web URLs

– Evaluated 9 link analysis and web search methods (+1 post-processing using Borda counts)

– Best methods: Label Search and Focussed HITS

• Semantic Typing boosts all results

Next steps: Investigate domain-dependent performance of methods

Page 14: Thank you!

Contact: WeST – Institute for Web Science and Technologies

Universität Koblenz-Landau

[email protected]