Institute for Web Science & Technologies – WeST
Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Document Representations
Christian Hachenberg and Thomas Gottron
Workshop Web of Linked Entities (WoLE 2012) at ISWC 2012
Sunday, 11 November 2012
Thomas Gottron, WoLE Workshop 2012: Finding Good URLs
Mapping Documents to Entities
dbpedia.org:Rob_Roy_(film)
Mapping Entities to Documents
dbpedia.org:Rob_Roy_(film)
Align entities in KB with public documents
• Publish knowledge base
• Propagate changes
• Human-readable representation
Task Definition
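[Figure: example knowledge base with dbpedia:Harrison_Ford (label "Harrison Ford", type: actor), dbpedia:Star_Wars_Episode_IV:_A_New_Hope (label "Star Wars IV: A New Hope", type: movie) and dbpedia:George_Lucas (label "George Lucas", type: director). The movie is linked to Harrison Ford via the property "starring" and to George Lucas via the property "directs". The matching web documents are unknown (???).]
3 types of information:
• Labels
• Link structure
• Types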
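To make the three kinds of input concrete, here is a minimal sketch of the example knowledge base as plain Python data. The `Entity` class, the `neighbours` helper and the link directions (movie "starring" actor, director "directs" movie) are hypothetical choices for illustration, not the authors' data model.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One knowledge-base entity carrying the three kinds of information."""
    uri: str
    label: str  # human-readable label
    type: str   # semantic type, e.g. "movie"
    # Outgoing links: property name -> target entity URIs.
    # The directions used below are illustrative assumptions.
    links: dict[str, list[str]] = field(default_factory=dict)


def neighbours(kb: dict[str, Entity], uri: str) -> set[str]:
    """Return all entities linked to `uri`, following links in either direction."""
    outgoing = {t for targets in kb[uri].links.values() for t in targets}
    incoming = {e.uri for e in kb.values()
                if any(uri in targets for targets in e.links.values())}
    return outgoing | incoming


# Hypothetical encoding of the example knowledge base.
kb = {
    e.uri: e
    for e in [
        Entity("dbpedia:Harrison_Ford", "Harrison Ford", "actor"),
        Entity("dbpedia:Star_Wars_Episode_IV:_A_New_Hope",
               "Star Wars IV: A New Hope", "movie",
               {"starring": ["dbpedia:Harrison_Ford"]}),
        Entity("dbpedia:George_Lucas", "George Lucas", "director",
               {"directs": ["dbpedia:Star_Wars_Episode_IV:_A_New_Hope"]}),
    ]
}

print(sorted(neighbours(kb, "dbpedia:Star_Wars_Episode_IV:_A_New_Hope")))
# ['dbpedia:George_Lucas', 'dbpedia:Harrison_Ford']
```

Labels feed the label search, the links form the link structure, and the types are what type filtering groups on.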
Label Search (using Web Search Engine)
[Figure: the example knowledge base from the task definition, with three web documents marked SW4]
Implementation:
• Bing
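Label search can be sketched as querying a web search engine with each entity's label and keeping the ranked result URLs as candidates. The slide names Bing, but no particular API is described, so the sketch takes the engine as a plain function `search(query)` returning URLs in rank order; that interface, the use of the bare label as query, and the duplicate removal are all assumptions.

```python
from typing import Callable

# Hypothetical interface: a search engine returning result URLs in rank order.
SearchFn = Callable[[str], list[str]]


def label_search(labels: dict[str, str], search: SearchFn,
                 k: int = 10) -> dict[str, list[str]]:
    """Map each entity URI to up to k candidate URLs found via its label."""
    candidates: dict[str, list[str]] = {}
    for uri, label in labels.items():
        seen: set[str] = set()
        ranked: list[str] = []
        for url in search(label):      # query = bare entity label (assumption)
            if len(ranked) >= k:
                break
            if url not in seen:        # drop duplicate hits, keep the best rank
                seen.add(url)
                ranked.append(url)
        candidates[uri] = ranked
    return candidates


# Usage with a stub in place of a real search engine (placeholder URLs).
fake_index = {
    "Harrison Ford": ["http://example.org/hf", "http://example.org/hf",
                      "http://example.net/hf"],
}
print(label_search({"dbpedia:Harrison_Ford": "Harrison Ford"},
                   lambda q: fake_index.get(q, [])))
# {'dbpedia:Harrison_Ford': ['http://example.org/hf', 'http://example.net/hf']}
```

Passing the engine in as a function keeps the ranking logic testable without network access.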
Exploiting Link Structure
[Figure: the example knowledge base from the task definition, with web documents marked GL, SW4 and HF]
Implementation:
• In-degree
• PageRank
• HITS
+ Variations: Topic, Focussed
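The three link-analysis scores listed above can be sketched over a small directed graph of web documents. These are generic textbook versions (in-degree, PageRank by power iteration, HITS with L2 normalisation); the example graph, how such a graph would be built from the linked entities, and the damping factor of 0.85 are assumptions, and the Topic and Focussed variations are not reproduced here.

```python
import math


def nodes_of(graph: dict[str, list[str]]) -> list[str]:
    """All nodes appearing as a link source or a link target."""
    return sorted(set(graph) | {v for outs in graph.values() for v in outs})


def in_degree(graph):
    deg = {u: 0 for u in nodes_of(graph)}
    for outs in graph.values():
        for v in outs:
            deg[v] += 1
    return deg


def pagerank(graph, damping=0.85, iters=200, tol=1e-12):
    nodes = nodes_of(graph)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Rank held by documents without out-links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u, outs in graph.items():
            for v in outs:
                new[v] += damping * rank[u] / len(outs)
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def hits(graph, iters=100):
    nodes = nodes_of(graph)
    hub = {u: 1.0 for u in nodes}
    auth = {u: 0.0 for u in nodes}
    for _ in range(iters):
        # Authority: sum of hub scores of documents linking in.
        auth = {u: 0.0 for u in nodes}
        for u, outs in graph.items():
            for v in outs:
                auth[v] += hub[u]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {u: a / norm for u, a in auth.items()}
        # Hub: sum of authority scores of documents linked to.
        hub = {u: sum(auth[v] for v in graph.get(u, [])) for u in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {u: h / norm for u, h in hub.items()}
    return hub, auth


# Hypothetical document graph using the labels from the slide.
docs = {"GL": ["SW4"], "HF": ["SW4"], "SW4": ["GL", "HF"]}
for name, scores in [("in-degree", in_degree(docs)),
                     ("PageRank", pagerank(docs)),
                     ("HITS authority", hits(docs)[1])]:
    print(name, max(scores, key=scores.get))
```

In this toy graph all three scores put SW4 first, since it is the only document linked from both others.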
Type Filtering
[Figure: three entities of type movie, dbpedia:Star_Wars_Episode_IV:_A_New_Hope ("Star Wars IV: A New Hope"), dbpedia:Gran_Torino_(film) ("Gran Torino") and dbpedia:Rob_Roy_(film) ("Rob Roy"), with web documents marked SW4, RR and GT]
Implementation:
• Borda Count for domain ranking
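The slide only states that a Borda count is used for domain ranking. One plausible reading, sketched here as an assumption rather than the authors' exact procedure: each entity of the same type contributes a ranked list of candidate URLs, each list awards Borda points to the web domains it contains (n points for rank 1, n-1 for rank 2, and so on), and every entity's candidates are then re-sorted so that URLs from the type's highest-scoring domains come first. The example.* URLs are placeholders.

```python
from collections import defaultdict
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Host name of a URL, lower-cased, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def borda_domain_ranking(rankings: dict[str, list[str]]) -> list[str]:
    """Rank domains by Borda points summed over all entities' candidate lists."""
    points: dict[str, float] = defaultdict(float)
    for urls in rankings.values():
        n = len(urls)
        seen = set()
        for pos, url in enumerate(urls):
            dom = domain_of(url)
            if dom not in seen:  # a domain scores once per list, at its best rank
                seen.add(dom)
                points[dom] += n - pos
    return sorted(points, key=lambda d: (-points[d], d))


def type_filter(rankings: dict[str, list[str]]) -> dict[str, list[str]]:
    """Re-sort each entity's candidates by domain rank (stable within a domain)."""
    order = {dom: i for i, dom in enumerate(borda_domain_ranking(rankings))}
    return {e: sorted(urls, key=lambda u: order[domain_of(u)])
            for e, urls in rankings.items()}


# Hypothetical candidate lists for three entities of type movie.
movies = {
    "dbpedia:Star_Wars_Episode_IV:_A_New_Hope":
        ["http://fans.example.net/sw4", "http://movies.example.org/sw4"],
    "dbpedia:Gran_Torino_(film)":
        ["http://movies.example.org/gt", "http://news.example.com/gt"],
    "dbpedia:Rob_Roy_(film)":
        ["http://movies.example.org/rr", "http://fans.example.net/rr"],
}
print(borda_domain_ranking(movies))
# ['movies.example.org', 'fans.example.net', 'news.example.com']
print(type_filter(movies)["dbpedia:Star_Wars_Episode_IV:_A_New_Hope"][0])
# http://movies.example.org/sw4
```

Aggregating over all entities of a type lets a domain that is good for most movies lift the right document for a movie whose own ranking was poor.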
Experimental Setup
• 100 entities from 4 domains (cities, companies, persons, movies)
• Stratified into entities with little, medium and large representation on the web
• Complete network of linked entities
• Application of label search and link structure approaches
• Type filtering as a post-processing step
• User evaluation (Cranfield setup, pooling)
• Graded relevance judgements
• High juror agreement (Krippendorff's Alpha > 0.67)
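Juror agreement of this kind can be computed with the pairwise form of Krippendorff's alpha, alpha = 1 - D_o / D_e, where D_o is the observed disagreement within judged items and D_e the disagreement expected by chance over all judgements. Which distance function the authors used (nominal, ordinal or interval) is not stated; this sketch defaults to nominal and offers interval as an option, and the input format (one list of grades per judged URL) is an assumption.

```python
from itertools import permutations
from typing import Callable, Sequence


def nominal(a, b) -> float:
    return 0.0 if a == b else 1.0


def interval(a, b) -> float:
    return float(a - b) ** 2


def krippendorff_alpha(units: Sequence[Sequence], distance: Callable = nominal) -> float:
    """Krippendorff's alpha for grades grouped per unit (here: per judged URL).

    Each unit lists the grades from the jurors who judged it; missing
    judgements are simply left out. Units with fewer than two grades are ignored.
    """
    pairable = [list(u) for u in units if len(u) >= 2]
    values = [v for u in pairable for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least one unit with two or more judgements")
    # Observed disagreement: ordered pairs of grades within each unit.
    d_o = sum(sum(distance(a, b) for a, b in permutations(u, 2)) / (len(u) - 1)
              for u in pairable) / n
    # Expected disagreement: ordered pairs over all pairable grades (O(n^2)).
    d_e = sum(distance(a, b) for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when all judgements are identical")
    return 1.0 - d_o / d_e


print(krippendorff_alpha([[0, 0], [2, 2], [1, 1]]))  # 1.0
print(krippendorff_alpha([[0, 1], [1, 0]], distance=interval))
```

Values near 1 mean the jurors agree far more than chance would predict; values at or below 0 mean no agreement beyond chance.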
Evaluation Metrics
• At which rank can I expect the first relevant result?
• Average P@1 (Precision@1): How often can I expect the first result to be relevant?
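Both questions can be answered from the graded judgements of each entity's ranked result list. This sketch assumes a result counts as relevant when its grade reaches a threshold (1 here, an arbitrary choice) and uses mean reciprocal rank as one common summary of where the first relevant result appears; the slide does not name that metric, so it is an assumption. Average P@1 is the fraction of entities whose top result is relevant.

```python
from typing import Optional, Sequence


def first_relevant_rank(grades: Sequence[int], threshold: int = 1) -> Optional[int]:
    """1-based rank of the first result with grade >= threshold, or None."""
    for rank, grade in enumerate(grades, start=1):
        if grade >= threshold:
            return rank
    return None


def mean_reciprocal_rank(runs: Sequence[Sequence[int]], threshold: int = 1) -> float:
    """Average of 1/rank of the first relevant result (0 if there is none)."""
    ranks = [first_relevant_rank(g, threshold) for g in runs]
    return sum(1.0 / r for r in ranks if r is not None) / len(runs)


def average_p_at_1(runs: Sequence[Sequence[int]], threshold: int = 1) -> float:
    """Fraction of entities whose first result is relevant."""
    return sum(1 for g in runs if g and g[0] >= threshold) / len(runs)


# Hypothetical graded judgements for three entities' ranked results.
runs = [[0, 2, 1], [1, 0], [0, 0, 0]]
print(mean_reciprocal_rank(runs))  # 0.5
print(average_p_at_1(runs))
```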
Evaluation: Results
Statistically significant (p = 0.05)
Evaluation: Results (Domain, Stratum)
Evaluation: Results (Filtering)
Conclusions and Next Steps
Novel task: Mapping entities to public web URLs
– Evaluated 9 link analysis and web search methods (+1 post-processing using Borda counts)
– Best methods: Label Search and Focussed HITS
• Semantic Typing boosts all results
Next steps: Investigate domain-dependent performance of methods
Thank you!
Contact: WeST – Institute for Web Science and Technologies
Universität Koblenz-Landau