![Page 1: Anon Plangprasopchok & Kristina Lerman USC Information Sciences Institute Constructing Folksonomies from User-Specificed Relations on Flickr](https://reader035.vdocument.in/reader035/viewer/2022062423/56649cc65503460f9498fce5/html5/thumbnails/1.jpg)
Constructing Folksonomies from User-Specified Relations on Flickr
Anon Plangprasopchok & Kristina Lerman
USC Information Sciences Institute
Motivation
[Diagram: users consume, produce, annotate, organize, and discover Web content; the resulting annotation/metadata, such as classification, can be leveraged to organize, search, and recommend.]
Inducing Folksonomy
• GOAL: induce hidden classification hierarchies, “folksonomies*,” from user-generated metadata
• Although metadata from an individual user may be inaccurate and incomplete, metadata from different users may complement each other and become meaningful in combination.
• In this work, we explore some strategies that combine metadata from many users and then induce folksonomies.
* The definition is somewhat different from the original one, made by Thomas Vander Wal.
Outline
• Motivation
• Hierarchical Relations
• Approaches
• Results
• Discussion
• Related work
Hierarchical Relations in Social Web
• Appear implicitly
  - Tags: Insect, Grasshopper, Australian, Macro, Orthoptera
• Appear explicitly
  - Relations: folder (collection) → sub-folder (set)
Goal: to induce deeper hierarchies from this metadata
Inducing Hierarchy from Tags
Existing approaches
• Graph based [Mika05]
  - build a network of associated tags (node = tag, edge = co-occurrence of tags)
  - suggest applying betweenness centrality and set theory to determine broader/narrower relations
• Hierarchical clustering [Brooks06; Heymann06+]
  - tags that appear more frequently likely have higher centrality and are thus more abstract
• Probabilistic subsumption [Sanderson99+; Schmitz06]
  - x is broader than y if x subsumes y
  - x subsumes y if p(x|y) > t and p(y|x) < t
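The subsumption rule above can be sketched in a few lines of Python. This is only an illustration, assuming each item is a set of tags and that p(x|y) is estimated from co-occurrence counts; the function name `subsumption_relations`, the threshold value 0.8, and the toy data are assumptions, not taken from the cited papers.

```python
from collections import Counter
from itertools import permutations


def subsumption_relations(tagged_items, t=0.8):
    """Return sorted (x, y) pairs where x subsumes y: p(x|y) > t and p(y|x) < t."""
    tag_count = Counter()   # number of items carrying each tag
    pair_count = Counter()  # number of items carrying both tags (ordered pairs)
    for tags in tagged_items:
        unique = set(tags)
        tag_count.update(unique)
        pair_count.update(permutations(unique, 2))

    relations = []
    for (x, y), n_xy in pair_count.items():
        p_x_given_y = n_xy / tag_count[y]
        p_y_given_x = n_xy / tag_count[x]
        if p_x_given_y > t and p_y_given_x < t:
            relations.append((x, y))
    return sorted(relations)


# Toy data (hypothetical): "insect" co-occurs with every narrower tag.
photos = [
    {"insect", "butterfly"},
    {"insect", "butterfly"},
    {"insect", "moth"},
    {"insect"},
]
print(subsumption_relations(photos))
# [('insect', 'butterfly'), ('insect', 'moth')]
```

With this toy data, "insect" appears on every item that carries "butterfly" or "moth", but not the other way around, so it comes out as the broader tag.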
Inducing Hierarchy from Tags
• Some difficulties when using tags to induce hierarchy:
  - Specificity: Washington → United States
  - Rarity: Car → Automobile
  - Tags are from different facets: Insect → Hongkong, Color → Brazilian
The relations above were induced using the subsumption approach on tags [Sanderson99+, Schmitz06].
Notation: A → B (A is broader than B, i.e., a hypernym relation)
Inducing Hierarchy from user-specified relations
• User-specified relations, e.g.,
  - Flickr’s Collection-Set
  - Delicious’ Bundle-Tag
  - Bibsonomy’s Relation-Tag
• Key intuition: not many people specify peculiar relations like
  - “automobile” → “car”, or
  - “Washington” → “United States”
In this work, we concentrate on metadata from Flickr.
Simple Strategy
[Diagram: Flickr collection and set names (e.g., “Mushrooms & Fungi”; “Fungi, Puffballs & Shelf Fungi”) are tokenized and stemmed into concept relations over terms such as mushroom, fungi, puffballs, and shelf fungi; relations from many users are then combined into a deeper hierarchy with terms such as live thing, plant, fungi, and mushroom.]
1. Remove “noisy” relations
- Conflict resolution
- Significance test
2. Link concepts & select path
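One plausible reading of the "tokenize + stem" step is sketched below. The slide does not give the exact procedure, so everything here is an assumption: the split on "," and "&", the toy `naive_stem` helper (a stand-in for a real stemmer), and the choice to treat "Mushrooms & Fungi" as the collection that contains the set "Fungi, Puffballs & Shelf Fungi".

```python
import re


def naive_stem(token):
    # Hypothetical stand-in for a real stemmer: strip one trailing "s".
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def terms(name):
    """Split a collection or set name on ',' and '&', lower-case it, stem each word."""
    parts = re.split(r"[,&]", name.lower())
    return [" ".join(naive_stem(w) for w in part.split()) for part in parts if part.strip()]


def concept_relations(collection_name, set_names):
    """Pair every collection term (broader) with every set term (narrower)."""
    relations = set()
    for parent in terms(collection_name):
        for set_name in set_names:
            for child in terms(set_name):
                if child != parent:  # skip self-relations such as fungi -> fungi
                    relations.add((parent, child))
    return sorted(relations)


print(terms("Mushrooms & Fungi"))
# ['mushroom', 'fungi']
print(concept_relations("Mushrooms & Fungi", ["Fungi, Puffballs & Shelf Fungi"]))
```

The resulting pairs are shallow (one level deep) and noisy, which is why the two later steps, noise removal and linking, are needed.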
Remove noisy relations: 1st approach
• Conflict resolution (when both A → B and B → A appear)
  - Relation conflicts occur because of noise
  - Voting scheme: keep A → B (and discard B → A) if Nu(A → B) > 1 and Nu(A → B) > Nu(B → A)
[Diagram: insect → butterfly (10) versus butterfly → insect (2).]
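The voting scheme can be sketched as follows. This is a minimal illustration, assuming Nu(A → B) is a count of users who specified the relation and that the same rule is also applied to relations with no reverse counterpart (whose reverse count is then 0); the function name `resolve_conflicts` and the extra "bug → moth" entry are hypothetical.

```python
def resolve_conflicts(user_counts):
    """Keep A -> B only if Nu(A -> B) > 1 and Nu(A -> B) > Nu(B -> A).

    user_counts maps (A, B) to Nu(A -> B).
    """
    kept = {}
    for (a, b), n_ab in user_counts.items():
        n_ba = user_counts.get((b, a), 0)  # 0 when the reverse relation never appears
        if n_ab > 1 and n_ab > n_ba:
            kept[(a, b)] = n_ab
    return kept


counts = {
    ("insect", "butterfly"): 10,
    ("butterfly", "insect"): 2,
    ("bug", "moth"): 1,  # hypothetical relation seen only once
}
print(resolve_conflicts(counts))
# {('insect', 'butterfly'): 10}
```

Ties (equal counts in both directions) discard both relations under this reading, since neither side strictly outvotes the other.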
Remove noisy relations: 2nd approach
• Significance test
  - Use a statistical significance test to decide whether A → B is significant
  - Null hypothesis: the observed relation A → B was generated by chance, via the random, independent generation of the individual concepts A and B.
[Diagram: distribution of the number of observations of A → B with accept and reject regions; caption: “Is B narrower than A by chance?”]
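The slide does not name the test statistic, so the sketch below swaps in a one-sided binomial test as one way to operationalize the null hypothesis. The assumptions are: under the null, a relation is A → B with probability p(A)·p(B), estimated from how often A appears as a parent and B as a child; the significance level 0.001 is a guess prompted by the "Sig001" label in the results; and all names and counts are illustrative.

```python
from math import exp, lgamma, log, log1p


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1, computed in log space."""
    def log_pmf(i):
        log_choose = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        return log_choose + i * log(p) + (n - i) * log1p(-p)
    return sum(exp(log_pmf(i)) for i in range(k, n + 1))


def is_significant(n_ab, n_a, n_b, n_total, alpha=0.001):
    """Reject the null if seeing n_ab or more A -> B relations by chance is unlikely."""
    p_chance = (n_a / n_total) * (n_b / n_total)  # independence assumption
    return binomial_tail(n_ab, n_total, p_chance) < alpha


# Hypothetical counts: 1000 relations, A is a parent in 50, B is a child in 40.
print(is_significant(10, 50, 40, 1000))  # True: far more than the ~2 expected by chance
print(is_significant(3, 50, 40, 1000))   # False: consistent with chance
```

Working in log space avoids overflow in the binomial coefficient when the total number of relations is large.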
Link Concepts
• Link concepts together
• Simply assume that the same terms refer to the same concept
[Diagram: relation fragments from different users over the stemmed terms anim, bug, insect, and moth are merged into one graph by joining nodes that share a term.]
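Linking by term identity amounts to merging every user's relations into one adjacency map keyed by term. The sketch below assumes relations are (broader, narrower) pairs of already-stemmed terms; the function name `link_concepts` and the three fragments are illustrative, not read off the slide's diagram.

```python
from collections import defaultdict


def link_concepts(user_relations):
    """Merge per-user (parent, child) relations into one graph keyed by term."""
    graph = defaultdict(set)
    for relations in user_relations:
        for parent, child in relations:
            graph[parent].add(child)  # identical terms collapse into one node
    return {parent: sorted(children) for parent, children in graph.items()}


# Hypothetical fragments from three users.
fragments = [
    [("anim", "bug")],
    [("anim", "insect"), ("insect", "moth")],
    [("bug", "moth")],
]
linked = link_concepts(fragments)
print(linked)
# {'anim': ['bug', 'insect'], 'bug': ['moth'], 'insect': ['moth']}
```

Because identical strings are merged unconditionally, ambiguous terms end up sharing a node; the deck lists term ambiguity as future work.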
Select path
• Select path: linking relations from many users can cause a spaghetti graph
[Diagram: graph over anim (a), bug (b), insect (i), and moth (m) with edge weights a → b = 26, a → i = 72, a → m = 10, b → i = 1, b → m = 4, i → m = 18.]
Network bottleneck idea: “the flow bottleneck is the minimum flow capacity among all relations in the path”
4 possible paths from anim to moth:
1) a → b → i → m [BN score = min(26, 1, 18) = 1]
2) a → i → m [BN score = min(72, 18) = 18]
3) a → m [BN score = min(10) = 10]
4) a → b → m [BN score = min(26, 4) = 4]
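The bottleneck scores above can be reproduced with a short sketch. The edge weights are the ones implied by the four BN scores on the slide; the rule that the path with the largest bottleneck score is the one kept is an assumption, as are the function names.

```python
def all_paths(graph, source, target, path=None):
    """Yield every simple path from source to target as a list of nodes."""
    path = (path or []) + [source]
    if source == target:
        yield path
        return
    for nxt in graph.get(source, {}):
        if nxt not in path:  # avoid revisiting nodes
            yield from all_paths(graph, nxt, target, path)


def bottleneck(graph, path):
    """Bottleneck score: the minimum edge weight along the path."""
    return min(graph[a][b] for a, b in zip(path, path[1:]))


weights = {
    "anim": {"bug": 26, "insect": 72, "moth": 10},
    "bug": {"insect": 1, "moth": 4},
    "insect": {"moth": 18},
}
scored = {tuple(p): bottleneck(weights, p) for p in all_paths(weights, "anim", "moth")}
best = max(scored, key=scored.get)  # assumed selection rule: largest bottleneck wins
print(best, scored[best])
# ('anim', 'insect', 'moth') 18
```

Enumerating all simple paths is exponential in the worst case, so this form only suits small graphs like the example.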
Evaluation & Data Set
• Hypothesis: the approach that takes explicit relations into account can induce better hierarchies.
  - “Better” means more consistent with the reference hierarchy, obtained from the Open Directory Project (ODP).
• The hierarchy in ODP is
  - created by volunteer editors
  - controlled under ODP guidelines
Evaluation & Data Set (2)
• The baseline is the subsumption approach [Schmitz06]. Collection and set terms are used instead of tags, to make the two approaches comparable.
• Data set: data from 17 user groups devoted to wildlife and naturalist photography
• 21,792 of 39,922 users specify at least one collection
• 110,543 unique terms (cf. 166,153 unique terms in ODP), with 15,495 terms in common
Evaluation methodology
ODP has many sub-hierarchies, so comparing all of them to the induced ones is impractical. It is easier to compare when a “root concept” and “leaf concepts” are specified, i.e., when a certain subtree is chosen for comparison.
[Diagram: relations (right after tokenization) are induced (noise removal + linking) into an induced hierarchy, which is compared with the reference hierarchy (ODP).]
Metrics
• Taxonomic Overlap [adapted from Maedche02+]
  - measures structural similarity between two trees
  - for each node, determines how many ancestor and descendant nodes overlap with those in the reference tree
• Lexical Recall
  - measures how well an approach discovers concepts that exist in the reference hierarchy (coverage)
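Both metrics can be illustrated on toy trees. The slide gives only the idea, so the sketch below is a simplified stand-in and not the adapted measure from [Maedche02+]: it assumes trees are stored as child → parent maps, scores each shared node by the Jaccard overlap of its ancestors and descendants (plus itself) in the two trees, and averages over shared nodes. All names and the toy trees are hypothetical.

```python
def nodes(tree):
    """All terms in a tree stored as a child -> parent map."""
    return set(tree) | set(tree.values())


def ancestors(tree, node):
    found = set()
    while node in tree:  # walk up until the root (assumes no cycles)
        node = tree[node]
        found.add(node)
    return found


def cotopy(tree, node):
    """The node itself plus all of its ancestors and descendants."""
    descendants = {n for n in tree if node in ancestors(tree, n)}
    return ancestors(tree, node) | descendants | {node}


def lexical_recall(induced, reference):
    """Fraction of reference concepts that also appear in the induced tree."""
    return len(nodes(induced) & nodes(reference)) / len(nodes(reference))


def taxonomic_overlap(induced, reference):
    """Mean Jaccard overlap of ancestor/descendant sets over shared nodes."""
    shared = nodes(induced) & nodes(reference)
    if not shared:
        return 0.0
    scores = []
    for n in shared:
        a, b = cotopy(induced, n), cotopy(reference, n)
        scores.append(len(a & b) / len(a | b))
    return sum(scores) / len(scores)


reference = {"insect": "anim", "moth": "insect", "butterfly": "insect"}
induced = {"insect": "anim", "moth": "insect"}
print(lexical_recall(induced, reference))  # 0.75
print(round(taxonomic_overlap(induced, reference), 3))  # 0.833
```

In the toy example the induced tree misses "butterfly", which lowers recall to 3 of 4 concepts and lowers the overlap for "anim" and "insect" but not for "moth".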
Quantitative Result
• Manually selecting 32 root nodes
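[Chart panel 1] Subs: 1/32; Conres: 11/32; Sig001: 15/32
[Chart panel 2] Subs: 2/32; Conres: 17/32; Sig001: 6/32
[Chart panel 3] Subs: ~0.85; Conres: ~2.24; Sig001: ~2.08
(Subs = subsumption baseline, Conres = conflict resolution, Sig001 = significance test)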
Sport hierarchy
Invertebrate hierarchy
Country hierarchy
Discussion
• Simple strategy to aggregate a large number of shallow relations specified by different users into a common, deeper hierarchy
• Induced hierarchies are more consistent with ODP than those induced by the subsumption baseline
• Future work includes: term ambiguity, global structure, relation types, and application to other datasets
Related Work
• Learning concept hierarchy from text data
  - Syntactic based [Hearst92, Caraballo99, Pasca04, Cimiano+05, Snow+06]
  - Word clustering [e.g., Segal+02, Blei+03]
• Inducing concept hierarchy from tags
  - Graph-based & clustering based [Mika05, Brooks+06, Heymann+06, Zhou07+]
  - Probabilistic subsumption [Schmitz06]
• Ontology alignment [e.g., Udrea+07]
• Exploiting user-specified hierarchy
  - GiveALink [Markines06+]
Questions?
• Is the metric used in evaluation meaningful?
• How scalable is the system?
• WordNet and ODP already exist. Why do we need this system?
• How is this work related to ontology enrichment?
• Is it ethical to collect users’ data?
• …
THANK YOU!