ranking the linked data: the case of dbpedia - icwe 2010

10th International Conference on Web Engineering, Vienna July 5-9, 2010 Ranking the LinkedData: the case of DBpedia Roberto Mirizzi 1 , Azzurra Ragone 1,2 , Tommaso Di Noia 1 , Eugenio Di Sciascio 1 1 Politecnico di Bari Via Orabona, 4 70125 Bari (ITALY) 2 University of Trento Via Sommarive, 14 38100 Trento (ITALY)


Page 1: Ranking the Linked Data: the case of DBpedia - ICWE 2010

10th International Conference on Web Engineering, Vienna, July 5-9, 2010

Ranking the Linked Data: the case of DBpedia

Roberto Mirizzi (1), Azzurra Ragone (1,2), Tommaso Di Noia (1), Eugenio Di Sciascio (1)

(1) Politecnico di Bari, Via Orabona, 4, 70125 Bari (ITALY)

(2) University of Trento, Via Sommarive, 14, 38100 Trento (ITALY)

Page 2: Ranking the Linked Data: the case of DBpedia - ICWE 2010

Outline

• Tags are all around
• NOT (Not Only Tag): what is it?
• NOT, a look behind the curtains:
  – Ranking of RDF resources: a hybrid approach
• Evaluation
• Conclusion and Future Work

Page 3: Ranking the Linked Data: the case of DBpedia - ICWE 2010

Tags are all around

Page 5: Ranking the Linked Data: the case of DBpedia - ICWE 2010

Tagging: a double face

• Annotation phase
• Retrieval phase

Page 6: Ranking the Linked Data: the case of DBpedia - ICWE 2010

Problems with annotation

• Insert as many tags as possible (time consuming):
  – different versions of the same tag to catch all the possible searches
  – multilingual tags

Page 7: Ranking the Linked Data: the case of DBpedia - ICWE 2010

Problem with retrieval

• Exact (syntactic) match among tags: "web service" is different from "web services", "webservices", …

Page 8: Ranking the Linked Data: the case of DBpedia - ICWE 2010

Why not use semantic tags?

• Plugged into the Web 3.0
• Disambiguation
• Relations among tags
• Machine understandable

NOT: Not Only Tag

http://sisinflab.poliba.it/not-only-tag/
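The difference between syntactic and semantic matching can be illustrated with a minimal Python sketch. The `SURFACE_TO_URI` dictionary and both function names are hypothetical, introduced only for illustration; they are not taken from NOT. The sketch assumes that each semantic tag resolves to a single DBpedia resource URI.

```python
# Hypothetical mapping from tag surface forms to one DBpedia resource URI.
# It only illustrates the idea of a semantic tag; it is not NOT's data model.
SURFACE_TO_URI = {
    "web service": "http://dbpedia.org/resource/Web_service",
    "web services": "http://dbpedia.org/resource/Web_service",
    "webservices": "http://dbpedia.org/resource/Web_service",
}


def syntactic_match(query_tag, resource_tags):
    """Exact string comparison: the retrieval problem described above."""
    return query_tag in resource_tags


def semantic_match(query_tag, resource_tags):
    """Compare the resources the tags denote instead of their spelling."""
    query_uri = SURFACE_TO_URI.get(query_tag.lower())
    if query_uri is None:
        return False  # unknown tag: nothing to compare against
    resource_uris = {SURFACE_TO_URI.get(tag.lower()) for tag in resource_tags}
    return query_uri in resource_uris


resource_tags = ["web services"]
print(syntactic_match("web service", resource_tags))  # False
print(semantic_match("web service", resource_tags))   # True
```

With plain strings, the annotator has to enter every spelling variant; with resource identifiers, one tag covers all of them.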

Page 9: Ranking the Linked Data: the case of DBpedia - ICWE 2010

Demo

• Let's imagine tagging the book:

Page 10: Ranking the Linked Data: the case of DBpedia - ICWE 2010

NOT

http://sisinflab.poliba.it/not-only-tag/

Page 11: Ranking the Linked Data: the case of DBpedia - ICWE 2010

Smarter tagging

• Annotation phase
• Retrieval phase

Page 12: Ranking the Linked Data: the case of DBpedia - ICWE 2010

What is behind NOT?

• DBpedia graph exploration
• Computation of a similarity value between each pair of RDF resources using external information sources (search engines, bookmarking systems)

Page 13: Ranking the Linked Data: the case of DBpedia - ICWE 2010

What is behind NOT? (II)

Page 14: Ranking the Linked Data: the case of DBpedia - ICWE 2010

What is behind NOT? (III)

Page 15: Ranking the Linked Data: the case of DBpedia - ICWE 2010

What is behind NOT? (IV)

[Figure: fragment of the DBpedia graph. Node labels: Semantic_Web, XML-based_standards, Knowledge_representation, Data_management, Internet_architecture, Triplestores, Folksonomy, XML, Computer_and_telecommunication_standards, Web_services, User_interface_markup_languages, Scalable_Vector_Graphics, Microformats, Resource Description Framework, Microformat, RDFa. Legend: skos:subject, skos:broader, Category, Article.]

Page 16: Ranking the Linked Data: the case of DBpedia - ICWE 2010

DBpedia-Ranker: hybrid ranking

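[Diagram: ?r1 isSimilar ?r2, hasValue v]

$$sim(r_1, r_2) = \frac{f(r_1, r_2)}{f(r_1)} + \frac{f(r_1, r_2)}{f(r_2)}$$

$$wikilinkScore(r_1, r_2) = \begin{cases} 0, & \text{no wikilink between } r_1 \text{ and } r_2 \\ 1, & \text{wikilink between } r_1 \text{ and } r_2 \text{ or vice versa} \\ 2, & \text{wikilink between } r_1 \text{ and } r_2 \text{ and vice versa} \end{cases}$$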

$$abstractScore(r_1, r_2) = \frac{l(r_2, r_1)}{l(r_2)}$$

Graph-based ranking

External sources-based ranking
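As a minimal sketch, sim and wikilinkScore can be written in Python as follows. It assumes that f(r) and f(r1, r2) are hit counts returned by an external information source for one resource and for the pair, and that wikilinks are available as a set of directed (source, target) pairs. Both representations, the function names, and the zero-count guard are assumptions made for illustration, not details given on the slide.

```python
def sim(f_r1, f_r2, f_r1_r2):
    """sim(r1, r2) = f(r1, r2) / f(r1) + f(r1, r2) / f(r2).

    The arguments are assumed to be hit counts from an external source.
    Returning 0.0 when a single-resource count is zero is an assumption.
    """
    if f_r1 == 0 or f_r2 == 0:
        return 0.0
    return f_r1_r2 / f_r1 + f_r1_r2 / f_r2


def wikilink_score(wikilinks, r1, r2):
    """0 = no wikilink, 1 = one direction only, 2 = both directions.

    `wikilinks` is assumed to be a set of directed (source, target) pairs.
    """
    return int((r1, r2) in wikilinks) + int((r2, r1) in wikilinks)


# Illustrative, made-up links between three resources.
links = {("RDF", "RDFa"), ("RDFa", "RDF"), ("RDF", "Microformat")}
print(wikilink_score(links, "RDF", "RDFa"))          # 2
print(wikilink_score(links, "RDF", "Microformat"))   # 1
print(wikilink_score(links, "RDFa", "Microformat"))  # 0
```

Because each term of sim is a ratio of the joint count to a single-resource count, the value lies between 0 and 2 whenever the joint count does not exceed either single count.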

Page 17: Ranking the Linked Data: the case of DBpedia - ICWE 2010

Functional Architecture

[Figure: functional architecture. Components: GUI, Cloud Generator, Query engine, Back-end, Storage, Graph Explorer, Context Analyzer, Ranker, SPARQL, DBpedia Lookup Service. External information sources: Delicious, Yahoo!, Bing, Google.]

Offline computation
1. Linked Data graph exploration
2. Rank nodes exploiting external information
3. Store results as pairs of nodes together with their similarity

Runtime search
1. Start typing a tag
2. Query the system for relevant tags (corresponding to DBpedia resources)
3. Show the semantic tag cloud
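The split between offline computation and runtime search can be sketched in Python as below. `SimilarityStore`, its methods, and the similarity values are hypothetical: the class stands in for the storage and query engine, the prefix match stands in for the tag lookup step, and storing each pair symmetrically is an assumption.

```python
class SimilarityStore:
    """Toy stand-in for the storage filled during offline computation."""

    def __init__(self):
        self._neighbours = {}

    def add_pair(self, r1, r2, value):
        # Offline step 3: store a pair of nodes with its similarity.
        # Symmetric storage is an assumption of this sketch.
        self._neighbours.setdefault(r1, {})[r2] = value
        self._neighbours.setdefault(r2, {})[r1] = value

    def complete(self, prefix):
        # Runtime step 1: resources whose name starts with the typed text.
        p = prefix.lower()
        return sorted(r for r in self._neighbours if r.lower().startswith(p))

    def related(self, resource, k=5):
        # Runtime steps 2-3: the k most similar resources, best first,
        # which a GUI could render as a tag cloud.
        pairs = self._neighbours.get(resource, {}).items()
        return sorted(pairs, key=lambda kv: kv[1], reverse=True)[:k]


store = SimilarityStore()
# Made-up similarity values, for illustration only.
store.add_pair("Semantic_Web", "Resource_Description_Framework", 0.9)
store.add_pair("Semantic_Web", "RDFa", 0.7)
store.add_pair("Semantic_Web", "Folksonomy", 0.4)

print(store.complete("sem"))               # ['Semantic_Web']
print(store.related("Semantic_Web", k=2))
```

Precomputing the pairs keeps the runtime path to a lookup and a sort, so no external source has to be queried while the user is typing.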

Page 18: Ranking the Linked Data: the case of DBpedia - ICWE 2010

Evaluation

We evaluate five different algorithms:
1. DBpediaRanker
2. DBpediaRanker minus Wikipedia info
3. DBpediaRanker minus external info sources
4. Co-occurrence
5. Similarity Distance

$$coOcc(r_1, r_2) = \frac{f(r_1, r_2)}{f(r_1) + f(r_2) - f(r_1, r_2)}$$

$$ngd(r_1, r_2) = \frac{\max\{\log f(r_1), \log f(r_2)\} - \log f(r_1, r_2)}{\log N - \min\{\log f(r_1), \log f(r_2)\}}$$
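The two baseline measures can be sketched in Python, taking coOcc as the ratio of the joint count to the union count and ngd as the normalized Google distance. The sketch assumes that f values are hit counts and that N is the total number of documents indexed by the source, with N larger than every count. The function names and the handling of zero counts are assumptions.

```python
import math


def co_occ(f_r1, f_r2, f_r1_r2):
    """coOcc(r1, r2) = f(r1, r2) / (f(r1) + f(r2) - f(r1, r2))."""
    denom = f_r1 + f_r2 - f_r1_r2
    # Returning 0.0 for an empty union is an assumption of this sketch.
    return f_r1_r2 / denom if denom > 0 else 0.0


def ngd(f_r1, f_r2, f_r1_r2, n):
    """Normalized Google distance computed from hit counts.

    `n` is assumed to be the number of indexed documents (n > f values).
    Zero counts are mapped to infinity, which is an assumption.
    """
    if min(f_r1, f_r2, f_r1_r2) <= 0:
        return math.inf
    log1, log2, log12 = math.log(f_r1), math.log(f_r2), math.log(f_r1_r2)
    return (max(log1, log2) - log12) / (math.log(n) - min(log1, log2))


# Made-up counts: two resources that always occur together.
print(co_occ(1000, 1000, 1000))       # 1.0
print(ngd(1000, 1000, 1000, 10**6))   # 0.0
```

Note the opposite orientation of the two measures: coOcc grows with relatedness, while ngd is a distance and shrinks toward 0.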

Page 19: Ranking the Linked Data: the case of DBpedia - ICWE 2010

Evaluation (II)

http://sisinflab.poliba.it/evaluation

• 50 volunteers
• Researchers in the ICT area
• 244 votes collected (on average 5 votes per user)
• Time to vote: 1 min and 40 secs

Page 20: Ranking the Linked Data: the case of DBpedia - ICWE 2010

Evaluation (III)

http://sisinflab.poliba.it/evaluation/data

3.91 - Good

Page 21: Ranking the Linked Data: the case of DBpedia - ICWE 2010

Conclusion

• NOT *is* useful in the annotation phase:
  – suggestions of semantically related tags
  – tag enrichment
• NOT *is* useful in the retrieval phase:
  – semantic match among tags

Page 22: Ranking the Linked Data: the case of DBpedia - ICWE 2010

Future Work

Page 23: Ranking the Linked Data: the case of DBpedia - ICWE 2010

Impakt Revolution

http://sisinflab.poliba.it/impakt-revolution/

Page 24: Ranking the Linked Data: the case of DBpedia - ICWE 2010

Inspiration: Google Wonder Wheel

Exploratory search in Google... nice, but there is no “semantics” in it.

You cannot discover new knowledge by exploiting the meaning of a term (keyword/tag/query).

Page 25: Ranking the Linked Data: the case of DBpedia - ICWE 2010

SWOC: Semantic Wonder Cloud

http://sisinflab.poliba.it/semantic-wonder-cloud/index/

Page 26: Ranking the Linked Data: the case of DBpedia - ICWE 2010

Q&A

[email protected]

Thanks for being here on Friday! :-)

http://sisinflab.poliba.it/not-only-tag/

http://sisinflab.poliba.it/semantic-wonder-cloud/index/

http://sisinflab.poliba.it/impakt-revolution/

Page 27: Ranking the Linked Data: the case of DBpedia - ICWE 2010

Conclusion

• NOT: a tool for smarter tagging
• Ranking algorithm for RDF graphs

Future work
• Test our algorithms with different domains
• Extract more fine-grained contexts
• Enrich the extracted context using also relevant properties
• Integrate our approach with real existing systems
• Use the core system to automatically extract relevant tags (concepts) from a document (or from a collection of documents), exploiting tools for named entity extraction