growing a tree in the forest: constructing folksonomies by integrating structured metadata anon...

Growing a Tree in the Forest: Constructing Folk-sonomies by Integrating Structured Metadata

Anon Plangprasopchok1, Kristina Lerman1, Lise Getoor2

1USC Information Sciences Institute 2Department of Computer Science, University of Maryland

SIGKDD 2010

2010. 12. 17.

Summarized and Presented by Kim Chung Rim, IDS Lab., Seoul National University

Contents

Introduction

Challenges

Learning Folksonomies

SAP: Growing a Tree

Evaluation

Conclusion & Discussion

Introduction

The social Web has changed the way people create and use information

Users organize their content via tags

Sites such as Flickr, Del.icio.us and YouTube allows users to tag their contents with descriptive keywords

Social Metadata

captures collective knowledge of Social Web users

Once mined from many users,

Such collective knowledge will add a rich semantic layer to the content of Social Web

Could be used for information discovery

Structured Social Metadata

Some Social Websites allow users to organize tags hierarchically

Delicious

Users can group related tagsinto bundles

Flickr

Users can group related photos into sets and related sets into collections

Folksonomy

Folksonomy – a common taxonomy that is learned from social metadata.

There exist predefined hierarchies such as WordNet or Linnaean classification system.

Benefits of Folksonomy over already existing hierarchies

represent collective agreement of many individuals

Are relatively inexpensive to obtain

Can adapt to evolving vocabularies and community’s in-formation needs

Directly tied to the annotated content

The goal of this paper is to

Extract users’ knowledge (folk knowledge) from user-gen-erated semantics

To build up a tree of communal taxonomy

animal

fish cat dog

biggle jindo

animal

reptile cat

animal

fish cat dog

biggle jindo

reptile

Challenges

Sparseness

Social metadata is very sparse – only 3.74 tags per photo

Nosiy vocabulary

Spelling errors and meaningless tags – not sure, mykid

Ambiguity

Individual tag is often ambiguous – jaguar

Challenges

Structural noise and conflicts

Users use different term to build their own hierarchy

Varying granularity level

Users’ level of expertise and expressiveness differ

Learning Folksonomies - intuition

roots with similar term should be aggregated horizontally

A Root and a leaf with similar term should be aggregated vertically

animal

fish cat dog

animal

reptile cat

animal

fish cat dog reptile

animal

fish cat dog

biggle jindo

animal

fish cat dog

biggle jindo

Learning Folksonomy

Relational Clustering

Two nodes should be clustered if their similarity is above a certain threshold

Similarity is computed using local similarity & structural similarity

– localSim(a,b) measures the similarity of node names and their tag distribution

– structSim(a,b) measures the similarity of leaves

– is a weight for adjusting contributions from localSim and structSim ( )

),(),()1(),( bastructSimbalocalSimbanodesim

Learning Folksonomy - terms

A personal hierarchy is called a sapling

A sapling is composed of a root node and its leaves

Tag statistic of a node x is defined as

Where and are tag and its frequency

is aggregated from all

= {(canada, 1), (vacation, 1), … }

= {(bc,1), (canada, 2), (vacation, 1), … }

i ll ,...,1

)},(),...,,(),,{(:21 21 ktkttx ftftft

kt ktf

Local Similarity

The local similarity of nodes a and b is composed of

Name similarity

Tag distribution similarity

nameSim can be any string similarity metric, which returns a value from 0 to 1

tagSim can be any function for measuring the similarity of distributions – in this paper, tagSim counts the number of common tags n, in the top K tag of a and b. if n ≥ J (a cer-tain threshold), it returns 1, else it returns n/J.

),()1(),(),( batagSimbanameSimbalocalSim

Structural Similarity

Compute structural similarity in two cases

Similarity between two root nodes

Similarity between a root node and a leaf node

Root-to-Root Similarity

Similarity based on

How many of their children have common name

The tag distribution similarity of those that do not have the same name

– Where ,

– is an aggregation of tag distributions of all

),()1(),( Btag

BA LLtagSimCLCLrrRstructSimR

))(),((1

Ai lnamelname

ZCL |)||,min(| YX llZ

Root-to-Leaf Similarity

Takes into account when a sapling contains the same leaf as the compared node

When merging UK->{Scotland, Glasgow, Edinburgh, Lon-don} and Scotland->{Glasgow, Shetland}, it can be said that Scotland and UK are similar because UK contains a shortcut to Glasgow

Measures overlap between siblings of and children of

),(),( BABAi rrRstructSimRrlRstructSimL

Relational Clustering

When the nodeSim(a,b)

exceeds a certain threshold, two nodes are clustered(merged)

),(),()1(),( bastructSimbalocalSimbanodesim

animal

fish cat dog

animal

reptile cat

animal

fish cat dog reptile

SAP: Growing a Tree

Using Incremental Relational Clustering,

canada

canada canada

canada

victoria

victoria victoria

Horizontal aggrega-

Vertical ag-gregation

Handling Problems

Shortcuts

Take the longer route

Noisy vocabularies

If a name is used by less than 1% of the user who contrib-uted in the sapling, it is considered as noisy and discarded

Evaluation

20,759 saplings from 7,121 users are selected from Flickr

Values for several parameters are chosen

Parame-ters

RR 0.1

LR 0.8 0.5

Evaluation Methodologies

Against the reference hierarchy (ODP)

Taxonomic Overlap – measuring structure similarity be-tween two trees. Measures how many ancestor and de-scendant nodes overlap to those in the reference tree for each node and take the harmonic mean (fmTO)

Lexical Recall – measuring how well an approach can dis-cover concepts in the reference hierarchy (LR)

Structural evaluation

Area Under Tree – a measurement to define how bushy and deep a tree is (AUT)

Manual evaluation (for concepts not in ODP)

Asking 5 judges whether the path is correct

Evaluation metric - AUT

AUT for below tree is:

1*(3+1)/2 + 1*(3+2)/2 = 4.5

animal

fish cat dog

biggle jindo0 0.5 1 1.5 2

nodes per depth

Baseline of comparison

SIG – past work of the authors

Assume nodes having the same name refer to the same concept

Keep the relations between node pairs

Combine all relations into a tree

Results

Simple comparison

Some of the Learned Folksonomies

Conclusion & Discussion

This work can create more accurate and more detailed folksonomies than the current state-of-the-art approach by exploiting structural information

This work is more scalable – it incrementally grows the folksonomies

Lacks comparison with other state-of-the-art approaches

Thank you

growing a tree in the forest: constructing folksonomies by integrating structured metadata anon...

similarity of leaves

social websites

cebtstructured social

similarity of node names

cebtintroductionthe

content of social webcould

mykidambiguityindividual

tag distributionstructsima

Documents

tag relatedness in image folksonomies · folksonomies....

lise getoor, "

taxonomy, ontology, folksonomies & skos

aspects of broad folksonomies

dr. robert lerman,

harnessing folksonomies for resource classification

widgets, folksonomies, mashups and syndication

paper by elena zhelea and lise getoor

lerman evolving

folksonomies in museums

folksonomies and facets

anon plangprasopchok & kristina lerman usc information...

workshop on folksonomies - uni-duesseldorf.de...1....

liz lerman critical response process

the social web or how flickr changed my life kristina lerman...

graphical models: an introduction lise getoor computer...

identifying overlapping communities in folksonomies or...

understanding the semantics of ambiguous tags in...

constructing folksonomies from user-speciﬁed relations on...

folksonomies, ontologies and corporate blogging