Semantic Grounding of Tag Relatedness in Social Bookmarking Systems


Ciro Cattuto, Dominik Benz, Andreas Hotho, Gerd Stumme

Presented by Smitashree Choudhury


Overview

Motivation
Measures of semantic relatedness
Semantic grounding of the measures
Result analysis


Motivation

Folksonomies are open-ended, noisy and large systems.

There are no explicit semantic relations in the tag space.

Lack of robust semantic grounding of existing similarity measures.

Possible applications: ontology learning, tag recommendation, query expansion.


Folksonomies and tagging

A folksonomy is the result of the social annotation of shared resources.

A folksonomy is a tuple F := (U, T, R, Y), where
U: the set of users
T: the set of tags
R: the set of resources
Y ⊆ U × T × R: a ternary "tagging" relation whose elements are tag assignments.

A post is the set of tags assigned by one user to one resource.

[Figure: a tag assignment linking user1, tag1 and resource1]
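The definition can be made concrete with a minimal Python sketch. The tuple F := (U, T, R, Y) is held as four sets, and posts are recovered by grouping Y on (user, resource). All names here are hypothetical. For brevity, the sketch also assumes that U, T and R are exactly the users, tags and resources that occur in Y.

```python
from collections import defaultdict
from typing import NamedTuple


class Folksonomy(NamedTuple):
    """F := (U, T, R, Y); Y holds (user, tag, resource) triples."""
    users: frozenset
    tags: frozenset
    resources: frozenset
    assignments: frozenset  # Y, a subset of U x T x R


def from_assignments(Y):
    """Build F from Y alone, taking U, T and R as the projections of Y."""
    Y = frozenset(Y)
    return Folksonomy(
        users=frozenset(u for u, _, _ in Y),
        tags=frozenset(t for _, t, _ in Y),
        resources=frozenset(r for _, _, r in Y),
        assignments=Y,
    )


def posts(F):
    """Map each (user, resource) pair to the set of tags in that post."""
    grouped = defaultdict(set)
    for u, t, r in F.assignments:
        grouped[(u, r)].add(t)
    return {key: frozenset(tags) for key, tags in grouped.items()}


Y = {("user1", "tag1", "resource1"),
     ("user1", "tag2", "resource1"),
     ("user2", "tag1", "resource2")}
F = from_assignments(Y)
print(sorted(F.tags))                             # ['tag1', 'tag2']
print(sorted(posts(F)[("user1", "resource1")]))   # ['tag1', 'tag2']
```

Frozensets keep F hashable and immutable. A real folksonomy would more likely be stored as an indexed table of triples.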


Data under study

Del.icio.us data from November 2006:
667,128 users (U)
2,454,546 tags (T)
18,782,132 resources (R)
140,333,714 tag assignments (Y)

The study focused on the |T| = 10,000 most frequent tags and their users (|U| = 476,378), resources (|R| = 12,660,470) and tag assignments (|Y| = 101,491,722).
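A quick way to see what the restriction keeps is to divide each subset size by the corresponding full size. The short Python computation below uses only the counts reported on this slide.

```python
# Counts as reported on the slide for the del.icio.us data (November 2006).
full = {"users": 667_128, "tags": 2_454_546,
        "resources": 18_782_132, "assignments": 140_333_714}
# The subset restricted to the 10,000 most frequent tags.
subset = {"users": 476_378, "tags": 10_000,
          "resources": 12_660_470, "assignments": 101_491_722}

# Share of each set that survives the restriction.
coverage = {name: subset[name] / full[name] for name in full}
for name, share in coverage.items():
    print(f"{name:<12}{share:8.2%}")
```

By these figures, about 0.4% of the distinct tags account for roughly 72% of all tag assignments. This suggests that the restriction to frequent tags still retains most of the tagging activity.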


Similarity and relatedness

The goal is to capture the emergent semantics of the folksonomy.

Similarity can be considered a special case of relatedness.

There are (at least) two options for defining similarity metrics: mapping the tags into a domain where similarity is well-defined, or exploiting the network structure of the folksonomy.


Measures of Relatedness

Co-occurrence
Contextual (distributional) measures, based on three different vector-space feature representations of a tag:
  Tag context
  Resource context
  User context
FolkRank (graph-based)


Co-Occurrence

Given a folksonomy (U, T, R, Y), the tag-tag co-occurrence graph is a weighted undirected graph whose set of nodes is the set of tags T.

Two tags t1 and t2 are connected by an edge if they are used together in at least one post.

The weight w(t1, t2) of this edge is the number of posts that contain both t1 and t2.

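Example posts: u1 tags r1 with {t1, t2, t3}; u2 tags r1 with {t1, t2}; u3 tags r2 with {t1, t2, t5}.

[Figure: example co-occurrence graph with nodes t1, t2, t3, t4, t5 and weighted edges]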
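The co-occurrence weight can be computed directly from posts. The Python sketch below assumes that posts are given as a dict from (user, resource) to a set of tags; `cooccurrence` is a hypothetical helper. Applied to the three example posts on this slide, it counts each unordered tag pair once per post.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(posts):
    """Edge weights of the tag-tag co-occurrence graph.

    w[(t1, t2)] is the number of posts containing both tags. Keys are sorted
    pairs, so each undirected edge is stored once.
    """
    w = Counter()
    for tags in posts.values():
        for t1, t2 in combinations(sorted(tags), 2):
            w[(t1, t2)] += 1
    return w


example_posts = {
    ("u1", "r1"): {"t1", "t2", "t3"},
    ("u2", "r1"): {"t1", "t2"},
    ("u3", "r2"): {"t1", "t2", "t5"},
}
w = cooccurrence(example_posts)
print(w[("t1", "t2")])  # 3
print(w[("t1", "t3")])  # 1
```

With these three posts, t1 and t2 co-occur in every post (weight 3). Each of t3 and t5 co-occurs once with t1 and once with t2.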


Contextual measures (cosine similarity)

Three measures of tag relatedness are based on three different vector-space representations of tags, whose entries are tag, user or resource weights, respectively.

If two tags t1 and t2 are represented by the vectors v1 and v2, their cosine similarity is defined as cossim(t1, t2) := cos∠(v1, v2) = (v1 · v2) / (||v1|| · ||v2||).

Because it depends only on the angle between the vectors, cosine similarity is independent of their length; this normalization avoids a bias toward frequent tags.
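Here is a minimal sketch of cosine similarity on sparse vectors. It assumes vectors are Python dicts mapping a feature (tag, user or resource) to a weight. Returning 0 for an all-zero vector is a convention chosen here, not something the slides specify. The example shows the length independence: scaling a vector leaves the result unchanged.

```python
import math


def cossim(v1, v2):
    """Cosine of the angle between two sparse vectors stored as dicts.

    Missing keys count as 0. Returning 0.0 when either vector is all zeros
    is a convention chosen here, not one stated on the slides.
    """
    dot = sum(x * v2.get(k, 0.0) for k, x in v1.items())
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    if n1 == 0 or n2 == 0:
        return 0.0
    return dot / (n1 * n2)


a = {"web": 1.0, "ajax": 2.0}
b = {"web": 10.0, "ajax": 20.0}  # same direction, ten times longer
c = {"cooking": 5.0}             # no shared features with a
print(round(cossim(a, b), 6))    # 1.0
print(cossim(a, c))              # 0.0
```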


Contextual measures (Tag context)

Tag Context Similarity. The Tag Context Similarity (TagCont) is computed in the vector space R^T, where, for a tag t, the entry of the vector v_t for every other tag t' is the co-occurrence weight w(t, t') defined above; the entry for t itself is 0.

Example (rows: tags; columns: context tags t1 ... t10000):

      t1  t2  t3  ...  t10000
t1     0   3   1  ...       0
t2     3   0   0  ...       2
t3   ...
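TagCont can be sketched in two steps. First, build each tag's vector of co-occurrence weights over a fixed set of context tags (the columns t1 ... t10000 above). Then compare those vectors by cosine. In this Python sketch the helper names and the toy posts are hypothetical. A tag's entry for itself is left at 0, matching the zero diagonal of the example matrix. The toy data shows two tags that never co-occur but appear in the same contexts.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence(posts):
    """Symmetric w(t1, t2): number of posts containing both tags."""
    w = Counter()
    for tags in posts.values():
        for t1, t2 in combinations(sorted(tags), 2):
            w[(t1, t2)] += 1
            w[(t2, t1)] += 1
    return w


def tag_context_vectors(posts, context_tags):
    """v_t[c] = w(t, c) for every context tag c != t; v_t[t] stays 0."""
    w = cooccurrence(posts)
    tags = set().union(*posts.values())
    return {t: {c: w[(t, c)] for c in context_tags if c != t} for t in tags}


def cossim(v1, v2):
    """Cosine of two sparse dict vectors (0.0 if one is all zeros)."""
    dot = sum(x * v2.get(k, 0) for k, x in v1.items())
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


# Hypothetical toy posts: java and python never appear in the same post.
posts = {
    ("u1", "r1"): {"java", "programming"},
    ("u2", "r2"): {"python", "programming"},
    ("u3", "r3"): {"java", "code"},
    ("u4", "r4"): {"python", "code"},
}
context = sorted(set().union(*posts.values()))
vectors = tag_context_vectors(posts, context)
print(cooccurrence(posts)[("java", "python")])               # 0
print(round(cossim(vectors["java"], vectors["python"]), 6))  # 1.0
```

Because the entries measure shared contexts rather than direct co-occurrence, java and python come out maximally similar here even though their co-occurrence weight is 0.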


Contextual measures (Resource and User Context)

For resource context similarity, the vector of a tag t is indexed by resources and records how often t is used to annotate each resource r.

User context similarity is built in the same way as resource context similarity, by swapping the roles of the sets R and U.

Example resource-context vectors (one column per resource, including r5):

t1: 0 1 0 0 3 0 0 1 0 0 0 1 0 0 3 2
t2: 0 0 0 1 1 0 1 2 0 0 0 1 0 0 1 1
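Both contexts can be read off the tag assignments directly. The Python sketch below assumes Y is a set of (user, tag, resource) triples. For the resource context, v_t[r] counts the users who tagged r with t; for the user context, v_t[u] counts the resources u tagged with t. The helper names and the toy assignments are hypothetical. The last lines apply the same cosine to the two example vectors shown above.

```python
import math
from collections import Counter, defaultdict


def context_vectors(Y, context="resource"):
    """Sparse vector per tag from tag assignments Y of (user, tag, resource).

    context="resource": v_t[r] = number of users who tagged r with t.
    context="user":     v_t[u] = number of resources u tagged with t.
    """
    if context not in ("resource", "user"):
        raise ValueError("context must be 'resource' or 'user'")
    vectors = defaultdict(Counter)
    for u, t, r in Y:
        vectors[t][r if context == "resource" else u] += 1
    return vectors


def cossim(v1, v2):
    """Cosine of two sparse dict vectors (0.0 if one is all zeros)."""
    dot = sum(x * v2.get(k, 0) for k, x in v1.items())
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


# Hypothetical tag assignments.
Y = {("u1", "web2.0", "r1"), ("u2", "web2.0", "r1"), ("u1", "web2", "r1"),
     ("u3", "web2", "r2"), ("u3", "web2.0", "r2"), ("u2", "ajax", "r3")}
res = context_vectors(Y, "resource")
usr = context_vectors(Y, "user")
print(res["web2.0"]["r1"], usr["web2.0"]["u3"])  # 2 1

# The two example vectors from the slide, one entry per resource column.
t1 = [0, 1, 0, 0, 3, 0, 0, 1, 0, 0, 0, 1, 0, 0, 3, 2]
t2 = [0, 0, 0, 1, 1, 0, 1, 2, 0, 0, 0, 1, 0, 0, 1, 1]
print(round(cossim(dict(enumerate(t1)), dict(enumerate(t2))), 3))  # 0.696
```

The only difference between the two measures is which element of the triple indexes the vector. This is the "swap the roles of R and U" step.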


FolkRank

FolkRank is an adaptation of PageRank to folksonomies: "A resource which is tagged with important tags by important users becomes important itself" [Hotho].

FolkRank computes a ranked list of tags relevant to a given tag by biasing the random surfer's preference vector towards that tag.

It treats a folksonomy (U, T, R, Y) as an undirected graph. Initially each tag is assigned weight 1, and the weights are adjusted over several iterations by spreading them along the edges. For a given tag t1, the tags that obtain the highest FolkRank weight are considered the most relevant to t1.
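The spreading process can be sketched as a simplified Python reading of FolkRank. It is not the reference implementation. The tripartite graph follows [Hotho]: each assignment (u, t, r) adds weight to the edges {u, t}, {t, r} and {u, r}. Several choices are assumptions made here for illustration:

- the damping factor d = 0.7;
- the size of the preference boost for the query tag;
- the fixed iteration count;
- using a uniform-preference run as the baseline.

The score of a tag is its weight in the run biased towards t1 minus its weight in the baseline run.

```python
from collections import defaultdict


def folkrank(Y, focus_tag, d=0.7, boost=None, iterations=100):
    """Rank tags by relevance to focus_tag (simplified FolkRank sketch).

    Y: iterable of (user, tag, resource) triples.
    d, boost and iterations are illustrative defaults, not published values.
    """
    # Weighted undirected tripartite graph; nodes carry a type prefix so a
    # user and a tag with the same name remain different nodes.
    adj = defaultdict(lambda: defaultdict(float))
    for u, t, r in Y:
        nu, nt, nr = ("u", u), ("t", t), ("r", r)
        for a, b in ((nu, nt), (nt, nr), (nu, nr)):
            adj[a][b] += 1.0
            adj[b][a] += 1.0
    focus = ("t", focus_tag)
    if focus not in adj:
        raise ValueError(f"unknown tag: {focus_tag!r}")
    nodes = list(adj)
    strength = {n: sum(adj[n].values()) for n in nodes}

    def spread(pref):
        # Iterate w <- d * A w + (1 - d) * p, where each node passes its
        # weight to its neighbours in proportion to the edge weights.
        total = sum(pref.values())
        p = {n: pref[n] / total for n in nodes}
        w = dict(p)
        for _ in range(iterations):
            nxt = {n: (1 - d) * p[n] for n in nodes}
            for n in nodes:
                share = d * w[n] / strength[n]
                for m, weight in adj[n].items():
                    nxt[m] += share * weight
            w = nxt
        return w

    # Every node starts with preference weight 1; the query tag gets more.
    pref = {n: 1.0 for n in nodes}
    baseline = spread(pref)
    pref[focus] += float(len(nodes)) if boost is None else boost
    biased = spread(pref)
    scores = {n[1]: biased[n] - baseline[n]
              for n in nodes if n[0] == "t" and n != focus}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data.
Y = {("u1", "java", "r1"), ("u1", "programming", "r1"),
     ("u2", "java", "r2"), ("u2", "jvm", "r2"),
     ("u3", "cooking", "r3")}
for tag, score in folkrank(Y, "java"):
    print(tag, round(score, 4))
```

Tags in a part of the graph not connected to the query tag (cooking here) get a negative score, because the boost on the query tag draws preference weight away from them.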


Related tags according to various similarity measures

[Table: related tags for sample tags according to the co-occurrence, cosine and FolkRank measures]


Result Analysis

The most related tags were computed for each of the 10,000 most frequent tags. Tag and resource context similarity provide more synonyms than the other measures. For instance, for the tag web2.0 they return some of its alternative spellings, such as web-2.0, web and web2.

For the tag games, tag and resource context similarity also provide tags that could be regarded as semantically similar, for instance the morphological variations game and gaming, or corresponding words in other languages, like spiel (German), jeu (French) and juegos (Spanish).

By contrast, the FolkRank and co-occurrence measures provide more general related tags and categories.

For the tag java, tag context similarity returns python, perl and c++, which could all be considered siblings in some suitable concept hierarchy, presumably under a common parent concept such as programming languages.


Result analysis

Are related tags shared across relatedness measures?

Related tags obtained via tag context or resource context appear to be "synonyms" or "siblings" of the original tag.

Co-occurrence and FolkRank seem to provide “more general” tags.

In terms of shared tags, the co-occurrence and FolkRank measures are the most similar: on average, 6.81 of their 10 most related tags overlap, while cosine similarity displays little overlap with either of them.
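An overlap figure of this kind can be computed as follows: intersect the top-k lists that two measures return for the same tag, then average over tags. The Python sketch below is illustrative only. The function name and the toy lists are hypothetical, and k is shortened to 3 to keep the example small.

```python
def average_overlap(ranked_a, ranked_b, k=10):
    """Mean number of shared tags in the top-k lists of two measures.

    ranked_a, ranked_b: dicts mapping a tag to its ranked list of related tags.
    Only tags ranked by both measures are averaged over.
    """
    common = [t for t in ranked_a if t in ranked_b]
    if not common:
        raise ValueError("no tag is ranked by both measures")
    shared = sum(len(set(ranked_a[t][:k]) & set(ranked_b[t][:k])) for t in common)
    return shared / len(common)


# Hypothetical top-3 lists for two tags under two measures.
cooc = {"java": ["programming", "development", "software"],
        "web2.0": ["ajax", "social", "blog"]}
folk = {"java": ["programming", "software", "tools"],
        "web2.0": ["social", "tools", "web"]}
print(average_overlap(cooc, folk, k=3))  # 1.5
```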


Semantic grounding

The strategy is to ground the relations between the original and the related tags by looking up the tags in a formal representation of word meanings.

Mapping tags to WordNet synsets allows the relatedness measures to be compared against well-studied measures of semantic similarity.

In WordNet words are grouped into synsets, sets of synonyms that represent one concept. Synsets are nodes in a network and links between synsets represent semantic relations.

Only is-a relationships are considered. Roughly 61% of the 10,000 most frequent tags in del.icio.us are covered by WordNet.


WordNet similarity

In WordNet, semantic similarity is measured using both:
the taxonomic shortest-path length;
the Jiang-Conrath metric, which combines taxonomic path length with an information-theoretic similarity measure and has been validated in user studies.

A first assessment of the relatedness measures is carried out by measuring, in WordNet, the average semantic distance between a tag and its most closely related tag according to each relatedness measure.
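Both grounding measures can be illustrated on a small is-a hierarchy. The Python sketch below uses a hypothetical toy taxonomy and made-up concept probabilities in place of WordNet and corpus counts. It assumes a tree; WordNet allows multiple inheritance, which would require taking the best of several paths. The two measures work as follows:

- Path length counts is-a edges through the lowest common subsumer (lcs).
- The Jiang-Conrath distance is IC(c1) + IC(c2) - 2 * IC(lcs(c1, c2)), with information content IC(c) = -log p(c).

The last helper averages a distance over (tag, most related tag) pairs, as in the assessment above.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent) standing in for WordNet.
PARENT = {
    "entity": None,
    "language": "entity",
    "programming_language": "language",
    "java": "programming_language",
    "python": "programming_language",
    "activity": "entity",
    "game": "activity",
}
# Hypothetical concept probabilities p(c); each includes its descendants.
P = {"entity": 1.0, "language": 0.4, "programming_language": 0.1,
     "java": 0.05, "python": 0.04, "activity": 0.5, "game": 0.2}


def ancestors(c):
    """Path from c up to the root, starting with c itself."""
    path = []
    while c is not None:
        path.append(c)
        c = PARENT[c]
    return path


def lcs(c1, c2):
    """Lowest common subsumer: first ancestor of c1 that also subsumes c2."""
    up2 = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in up2)


def path_length(c1, c2):
    """Number of is-a edges on the path from c1 to c2 through their lcs."""
    a = lcs(c1, c2)
    return ancestors(c1).index(a) + ancestors(c2).index(a)


def ic(c):
    """Information content IC(c) = -log p(c)."""
    return -math.log(P[c])


def jiang_conrath(c1, c2):
    """Jiang-Conrath distance: IC(c1) + IC(c2) - 2 * IC(lcs(c1, c2))."""
    return ic(c1) + ic(c2) - 2 * ic(lcs(c1, c2))


def average_distance(pairs, dist):
    """Mean distance between each tag and its most related tag."""
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


pairs = [("java", "python"), ("java", "game")]
print(average_distance(pairs, path_length))  # 3.5
print(round(jiang_conrath("java", "python"), 3), round(jiang_conrath("java", "game"), 3))
```

A measure whose top related tags sit closer in the taxonomy yields a smaller average distance under both metrics.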


WordNet similarity


Analysis

The Jiang-Conrath measure has been validated in user studies [Budanitsky], so its semantic distances correspond to distances cognitively perceived by human subjects.

Tag and resource context relatedness point to tags that are semantically closer according to both grounding measures.

The resource context measure performs best but is computationally expensive.

Tag context performs as well as resource context while being computationally lighter.


Summary

The work introduces a systematic methodology for characterizing measures of tag relatedness in a folksonomy.

It grounds several measures of tag relatedness by mapping the tags of the folksonomy to WordNet synsets and measuring semantic distance there.

Semantic characterization of similarity measures computed on a folksonomy is possible and insightful in terms of the types of relations that can be extracted.

Given an appropriate measure, globally meaningful tag relations can be harvested from an aggregated and uncontrolled folksonomy vocabulary.

Admittedly, in their current status, none of the measures we studied can be seen as the way to instant ontology creation, but further analysis and combination of measures will help to close the gap towards the Semantic Web.

Tag and resource context similarities are clearly the first measures to choose for discovering synonyms; they are also useful for query expansion.

FolkRank and co-occurrence relatedness seem to extract taxonomic (more general) relations between tags and are suited to tag recommendation.


References

Jiang, J.J., Conrath, D.W.: Semantic Similarity based on Corpus Statistics and Lexical Taxonomy. In: Proceedings of the International Conference on Research in Computational Linguistics (ROCLING), Taiwan (1997)

Hotho, A., Jäschke, R., Schmitz, C., Stumme, G.: Information retrieval in folksonomies: Search and ranking. In: Sure, Y., Domingue, J. (eds.): The Semantic Web: Research and Applications. Volume 4011 of LNAI, Heidelberg, Springer (2006) 411–426

Budanitsky, A., Hirst, G.: Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics 32(1) (2006) 13–47

Salton, G.: Automatic text processing: the transformation, analysis, and retrieval of information by computer. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA (1989)

And others.....