ranking the linked data: the case of dbpedia - icwe 2010
10th International Conference on Web Engineering, Vienna, July 5-9, 2010
Ranking the Linked Data: the case of DBpedia
Roberto Mirizzi (1), Azzurra Ragone (1,2), Tommaso Di Noia (1), Eugenio Di Sciascio (1)
(1) Politecnico di Bari, Via Orabona 4, 70125 Bari (ITALY)
(2) University of Trento, Via Sommarive 14, 38100 Trento (ITALY)
Outline
• Tags are all around
• NOT (Not Only Tag): what is it?
• NOT: a look behind the curtains
  – Ranking of RDF resources: a hybrid approach
• Evaluation
• Conclusion and Future Work
Tags are all around
Tag cloud, and many more…
Tagging: two faces
• Annotation phase
• Retrieval phase
Problems with annotation
• Users must insert as many tags as possible (time-consuming):
  – different versions of the same tag, to catch all the possible searches
  – multilingual tags
Problems with retrieval
• Exact (syntactic) matching among tags: “web service” is different from “web services”, “webservices”, …
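To make the retrieval problem concrete, here is a small Python sketch contrasting purely syntactic tag matching with matching on the DBpedia resource a tag denotes. The SURFACE_TO_RESOURCE table and both function names are hypothetical: they stand in for whatever lookup step maps a typed string to a resource, and are not NOT's actual interface.

```python
# Tags already attached to some item, stored as plain strings.
tags_on_resource = {"web service"}
queries = ["web service", "web services", "webservices"]


def exact_match(query: str, tags: set[str]) -> bool:
    """Purely syntactic: the query must equal a stored tag character by character."""
    return query in tags


# Hypothetical surface-form -> resource table, standing in for a lookup step.
SURFACE_TO_RESOURCE = {
    "web service": "http://dbpedia.org/resource/Web_service",
    "web services": "http://dbpedia.org/resource/Web_service",
    "webservices": "http://dbpedia.org/resource/Web_service",
}


def semantic_match(query: str, tags: set[str]) -> bool:
    """Match on the resource a tag denotes rather than on its spelling."""
    target = SURFACE_TO_RESOURCE.get(query)
    return target is not None and any(
        SURFACE_TO_RESOURCE.get(tag) == target for tag in tags
    )


for q in queries:
    print(f"{q!r}: exact={exact_match(q, tags_on_resource)}, "
          f"semantic={semantic_match(q, tags_on_resource)}")
```

Only the first query matches syntactically, while all three match once they are mapped to the same resource in the toy table.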
Why not use semantic tags?
• Plugged into the Web 3.0
• Disambiguation
• Relations among tags
• Machine-understandable
NOT: Not Only Tag
http://sisinflab.poliba.it/not-only-tag/
Demo
• Let’s imagine tagging the book:
NOT
http://sisinflab.poliba.it/not-only-tag/
Smarter tagging
• Annotation phase
• Retrieval phase
What is behind NOT?
• DBpedia graph exploration
• Computation of a similarity value between each pair of RDF resources using external information sources (search engines, bookmarking systems)
What is behind NOT? (II)
What is behind NOT? (III)
What is behind NOT? (IV)
[Figure: excerpt of the DBpedia graph around the articles Resource Description Framework, Microformat and RDFa, with categories such as Semantic_Web, XML-based_standards, Knowledge_representation, Data_management, Internet_architecture, Triplestores, Folksonomy, XML, Computer_and_telecommunication_standards, Web_services, User_interface_markup_languages, Scalable_Vector_Graphics and Microformats. Legend: Category, Article, skos:subject, skos:broader.]
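One way to picture the graph exploration step is as a bounded breadth-first walk over skos:subject and skos:broader links. The Python sketch below assumes exactly that; the slides do not state the depth limit, whether links are followed in both directions, or how triples are fetched (the architecture lists a SPARQL component), so max_depth, the bidirectional walk and the toy TRIPLES list are illustrative assumptions.

```python
from collections import deque

# Toy (subject, predicate, object) triples; names are hypothetical placeholders.
TRIPLES = [
    ("Article_A", "skos:subject", "Category_X"),
    ("Article_B", "skos:subject", "Category_X"),
    ("Article_C", "skos:subject", "Category_Y"),
    ("Category_X", "skos:broader", "Category_Y"),
    ("Category_Y", "skos:broader", "Category_Z"),
]
FOLLOWED = ("skos:subject", "skos:broader")


def neighbours(node):
    """Nodes one hop away via a followed predicate, in either direction (assumption)."""
    for s, p, o in TRIPLES:
        if p in FOLLOWED:
            if s == node:
                yield o
            elif o == node:
                yield s


def explore(start, max_depth):
    """Breadth-first exploration from start; returns {node: hop distance}."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in neighbours(node):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return depth
```

For example, explore("Article_A", 2) reaches Article_B through the shared category and Category_Y through skos:broader; Article_C only appears with max_depth=3, once the walk comes back down from the broader category.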
DBpedia-Ranker: hybrid ranking
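Similarity between two resources is modeled as ?r1 isSimilar ?r2, with hasValue v holding the computed value.

$$\mathit{sim}(r_1, r_2) = \frac{f(r_1, r_2)}{f(r_1)} + \frac{f(r_1, r_2)}{f(r_2)}$$

$$\mathit{wikilinkScore}(r_1, r_2) = \begin{cases} 0, & \text{no wikilink between } r_1 \text{ and } r_2 \\ 1, & \text{wikilink between } r_1 \text{ and } r_2 \text{ or vice versa} \\ 2, & \text{wikilink between } r_1 \text{ and } r_2 \text{ and vice versa} \end{cases}$$

$$\mathit{abstractScore}(r_1, r_2) = \frac{l(r_2, r_1)}{l(r_2)}$$

The ranking combines graph-based ranking and external sources-based ranking.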
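The three formulas translate directly into code. In this Python sketch, f and l are passed in as callables because the slide does not define how they are computed (f presumably returns counts from the external sources mentioned earlier, such as search engines and bookmarking systems). The slide also does not show how sim, wikilinkScore and abstractScore are combined into the value v, so no combination is attempted. The function names and the zero-division guards are our own.

```python
from typing import Callable

# f(r) and f(r1, r2): caller-supplied counts (assumed to come from external sources).
# l(r) and l(r2, r1): caller-supplied values for abstractScore (not defined on the slide).
Count = Callable[..., float]


def sim(f: Count, r1: str, r2: str) -> float:
    """External sources-based score: f(r1,r2)/f(r1) + f(r1,r2)/f(r2)."""
    f1, f2 = f(r1), f(r2)
    if f1 == 0 or f2 == 0:
        return 0.0  # guard against division by zero (not specified on the slide)
    joint = f(r1, r2)
    return joint / f1 + joint / f2


def wikilink_score(links: set[tuple[str, str]], r1: str, r2: str) -> int:
    """0: no wikilink; 1: a link in one direction; 2: links in both directions."""
    return int((r1, r2) in links) + int((r2, r1) in links)


def abstract_score(l: Count, r1: str, r2: str) -> float:
    """l(r2, r1) / l(r2), following the slide's formula."""
    denom = l(r2)
    return l(r2, r1) / denom if denom != 0 else 0.0
```

For example, with made-up counts f(A) = 1000, f(B) = 500 and f(A, B) = 100, sim gives 100/1000 + 100/500 = 0.3.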
Functional Architecture
[Figure: functional architecture. Components: Back-end, Query engine, Storage, Cloud Generator, GUI, Graph Explorer, SPARQL, Context Analyzer, Ranker. External information sources: DBpedia Lookup Service, Delicious, Yahoo!, Bing, Google.]
Offline computation
1. Linked Data graph exploration
2. Rank nodes exploiting external information
3. Store results as pairs of nodes together with their similarity
Runtime search
1. Start typing a tag
2. Query the system for relevant tags (corresponding to DBpedia resources)
3. Show the semantic tag cloud
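The offline/runtime split can be sketched as a precomputed table of similarity values that is queried by prefix while the user types. This Python sketch is hypothetical: SimilarityStore, add_pair and suggest are illustrative names, pairs are stored in one direction only (abstractScore is not symmetric), and simple prefix matching on names stands in for the DBpedia Lookup Service listed among the external information sources.

```python
from collections import defaultdict


class SimilarityStore:
    """Hypothetical stand-in for the Storage component: resource pairs with a similarity."""

    def __init__(self) -> None:
        # resource -> {related resource: similarity value}
        self._pairs = defaultdict(dict)

    def add_pair(self, r1: str, r2: str, value: float) -> None:
        # Offline step 3: store r1 -> r2 with its similarity (one direction only).
        self._pairs[r1][r2] = value

    def suggest(self, prefix: str, k: int = 5) -> list[tuple[str, str, float]]:
        # Runtime steps 1-2: match the typed prefix against stored resource names
        # (case-insensitive), then return each match's k most similar resources.
        results = []
        for resource, related in self._pairs.items():
            if resource.lower().startswith(prefix.lower()):
                ranked = sorted(related.items(), key=lambda kv: kv[1], reverse=True)
                results.extend((resource, other, value) for other, value in ranked[:k])
        return results
```

After add_pair("Alpha", "Beta", 0.9) and add_pair("Alpha", "Gamma", 0.4), typing "al" yields the related resources ordered by similarity, which is what a tag cloud generator would then render.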
Evaluation
We evaluate five different algorithms:
1. DBpediaRanker
2. DBpediaRanker minus Wikipedia info
3. DBpediaRanker minus external info sources
4. Co-occurrence
5. Similarity Distance

$$\mathit{coOcc}(r_1, r_2) = \frac{f(r_1, r_2)}{f(r_1) + f(r_2) - f(r_1, r_2)}$$

$$\mathit{ngd}(r_1, r_2) = \frac{\max\{\log f(r_1), \log f(r_2)\} - \log f(r_1, r_2)}{\log N - \min\{\log f(r_1), \log f(r_2)\}}$$
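The two baseline measures can be computed from the same kind of counts f used by DBpediaRanker; ngd is the Normalized Google Distance. In this Python sketch f is again a caller-supplied callable, n stands for the N of the formula (the total number of indexed items), and returning infinity when r1 and r2 never co-occur is our own convention for avoiding log 0.

```python
import math
from typing import Callable

Count = Callable[..., float]


def co_occurrence(f: Count, r1: str, r2: str) -> float:
    """coOcc(r1, r2) = f(r1,r2) / (f(r1) + f(r2) - f(r1,r2))."""
    joint = f(r1, r2)
    union = f(r1) + f(r2) - joint
    return joint / union if union > 0 else 0.0


def ngd(f: Count, r1: str, r2: str, n: float) -> float:
    """Normalized Google Distance over counts; n is the total number of indexed items."""
    joint = f(r1, r2)
    if joint == 0:
        return math.inf  # our convention: no co-occurrence means maximal distance
    a, b = math.log(f(r1)), math.log(f(r2))
    return (max(a, b) - math.log(joint)) / (math.log(n) - min(a, b))
```

With made-up counts f(A) = 1000, f(B) = 100, f(A, B) = 100 and N = 10^6, coOcc = 100/1000 = 0.1 and ngd = (log 1000 - log 100)/(log 10^6 - log 100) = 0.25. The log base cancels in the ratio, so the natural log of math.log is fine.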
Evaluation (II)
http://sisinflab.poliba.it/evaluation
• 50 volunteers, researchers in the ICT area
• 244 votes collected (on average 5 votes per user)
• Time to vote: 1 min 40 s
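The “on average 5 votes” figure follows from the totals:

$$\frac{244 \text{ votes}}{50 \text{ volunteers}} = 4.88 \approx 5 \text{ votes per volunteer}$$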
Evaluation (III)
http://sisinflab.poliba.it/evaluation/data
3.91 - Good
Conclusion
• NOT *is* useful in the annotation phase:
  – suggestions of semantically related tags
  – tag enrichment
• NOT *is* useful in the retrieval phase:
  – semantic match among tags
Future Work
Impakt Revolution
http://sisinflab.poliba.it/impakt-revolution/
Inspiration: Google Wonder Wheel
Exploratory search in Google… nice, but there is no “semantics” in it.
You cannot discover new knowledge by exploiting the meaning of a term (keyword/tag/query).
SWOC: Semantic Wonder Cloud
http://sisinflab.poliba.it/semantic-wonder-cloud/index/
Q&A
Thanks for being here on Friday! :-)
http://sisinflab.poliba.it/not-only-tag/
http://sisinflab.poliba.it/semantic-wonder-cloud/index/
http://sisinflab.poliba.it/impakt-revolution/
Conclusion
• NOT: a tool for smarter tagging
• Ranking algorithm for RDF graphs
Future work
• Test our algorithms with different domains
• Extract more fine-grained contexts
• Enrich the extracted context also using relevant properties
• Integrate our approach with real existing systems
• Use the core system to automatically extract relevant tags (concepts) from a document (or from a collection of documents), exploiting tools for named entity extraction