Pragmatic Evaluation of Concept Hierarchies
Presentation of our work that received the best paper award at i-Know 2012
Graz University of Technology
1
T Trattner C., Singer P., Helic D., Strohmaier M. I-Know 2012
Pragmatic Evaluation of Concept
Hierarchies
Christoph Trattner, Philipp Singer
Denis Helic, Markus Strohmaier
Graz University of Technology, Austria
What is this talk about?
Part 1: We will introduce a framework for evaluating concept
hierarchies that does not rely on a gold standard
The framework determines the pragmatic usefulness of
concept hierarchies using Kleinberg's idea of
hierarchical decentralized search
Part 2: We will show evidence that the framework works not
only in theory but also in practice
What was the motivation of our research?
Directories: Categorization by Experts
Research question
Can a crowd of users contribute to the
creation of such categorizations?
How can we generate such hierarchical
structures automatically?
Annotation by Users: Tagging
Folksonomy
Tuple (U, R, T, Y)
User (U)
Resource (R)
Tag (T)
Relation (Y)
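As a minimal sketch of the tuple above, a folksonomy can be held as a set of (user, resource, tag) assignments. The class and method names are illustrative assumptions, and deriving U, R and T from Y is a simplification (it ignores users, resources or tags without any assignment).

```python
from dataclasses import dataclass, field


@dataclass
class Folksonomy:
    """Folksonomy (U, R, T, Y); names here are hypothetical, not from the talk."""
    # Y: the set of (user, resource, tag) assignments.
    assignments: set = field(default_factory=set)

    def add(self, user, resource, tag):
        self.assignments.add((user, resource, tag))

    @property
    def users(self):  # U, derived from Y
        return {u for u, _, _ in self.assignments}

    @property
    def resources(self):  # R, derived from Y
        return {r for _, r, _ in self.assignments}

    @property
    def tags(self):  # T, derived from Y
        return {t for _, _, t in self.assignments}

    def tags_of(self, resource):
        """All tags that any user assigned to the given resource."""
        return {t for _, r, t in self.assignments if r == resource}


f = Folksonomy()
f.add("alice", "http://example.org/a", "python")
f.add("bob", "http://example.org/a", "programming")
f.add("alice", "http://example.org/b", "python")
```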
Folksonomies
Emerge from the process of collaborative tagging
Latent hierarchical structures
Turn the flat structure into a hierarchy: taxonomy induction algorithms
Generality-based algorithms (centrality in tag-to-tag networks)
Other algorithms possible: k-means, affinity propagation, ...
E.g., [Heymann and Garcia-Molina 2006] or [Benz et al. 2010]
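The generality-based idea can be sketched roughly as follows. This is a simplified illustration, not the exact algorithm of the cited papers: it assumes degree centrality in the tag co-occurrence network as the generality measure and attaches each tag to the already placed tag it co-occurs with most often. The function name and the ROOT label are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def induce_hierarchy(resource_tags, root="ROOT"):
    """Return {tag: parent}, adding tags from most to least general."""
    # Tag-to-tag network: co-occurrence counts over resources.
    cooc = defaultdict(lambda: defaultdict(int))
    for tags in resource_tags.values():
        for a, b in combinations(sorted(set(tags)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    all_tags = {t for tags in resource_tags.values() for t in tags}
    # Assumed generality proxy: degree centrality (ties broken by name).
    ranked = sorted(all_tags, key=lambda t: (-len(cooc[t]), t))
    parent = {}
    for tag in ranked:
        # Candidate parents: already placed tags that co-occur with this tag.
        placed = [p for p in parent if cooc[tag].get(p, 0) > 0]
        if placed:
            # max() keeps the first maximum, i.e. the more general tag on ties.
            parent[tag] = max(placed, key=lambda p: cooc[tag][p])
        else:
            parent[tag] = root
    return parent


resource_tags = {
    "r1": ["programming", "python"],
    "r2": ["programming", "java"],
    "r3": ["programming", "python", "numpy"],
    "r4": ["python", "numpy"],
    "r5": ["python", "django"],
}
hierarchy = induce_hierarchy(resource_tags)
```

In this toy data set, "programming" ends up directly under the root, with "python" and "java" below it and "numpy" and "django" below "python".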
Problem: How can we evaluate the
usefulness of these hierarchies?
Idea: Gold-standard-based methods
Problem: Lack of a gold standard [Strohmaier et al. 2012]
Little taxonomic overlap => results are not trustworthy
Very small overlap!
M. Strohmaier, D. Helic, D. Benz, C. Körner and R. Kern, Evaluation of Folksonomy Induction Algorithms, ACM Transactions on Intelligent Systems and Technology
Question?
Can we somehow find another evaluation method?
Stanley Milgram
A social psychologist
Yale and Harvard University
Study on the Small World Problem,
beyond well defined communities
and relations
(such as actors, scientists, …)
"An Experimental Study of the Small World Problem"
1933-1984
The simplest way of formulating the small-world problem is:
Starting with any two people in the world, what is the
likelihood that they will know each other?
A somewhat more sophisticated formulation, however, takes
account of the fact that while person X and Z may not know
each other directly, they may share a mutual acquaintance -
that is, a person who knows both of them. One can then think of
an acquaintance chain with X knowing Y and Y knowing Z.
Moreover, one can imagine circumstances in which X is linked
to Z not by a single link, but by a series of links, X-A-B-C-D…Y-Z.
That is to say, person X knows person A who in turn knows
person B, who knows C… who knows Y, who knows Z.
[Milgram 1967, according to
http://www.ils.unc.edu/dpr/port/socialnetworking/theory_paper.html#2]
An Experimental Study of the Small World
Problem [Travers and Milgram 1969]
A social network experiment tailored towards
Demonstrating
Defining
And measuring
Inter-connectedness in a large society (USA)
A test of the modern idea of “six degrees of
separation”
Which states that: every person on earth is
connected to any other person through a chain of
acquaintances not longer than 6
Set Up
Target person: A Boston stockbroker
Three starting populations:
100 “Nebraska stockholders”
96 “Nebraska random”
100 “Boston random”
[Diagram: chains from the Nebraska random, Nebraska stockholders and Boston random starters to the target, the Boston stockbroker]
Results
How many of the starters would be able to establish
contact with the target?
64 out of 296 reached the target
How many intermediaries would be required to link
starters with the target?
Well, that depends: the overall mean was 5.2 links
Through hometown: 6.1 links
Through business: 4.6 links
Boston group faster than Nebraska groups
Nebraska stockholders not faster than Nebraska random
What form would the distribution of chain lengths
take?
Decentralized Search
Search in (social) networks: people have only local
knowledge of the network
People have background knowledge of the network, e.g.
geography
Background knowledge defines the notion of distance
between nodes
People are greedy: at each step people select a node that
has the smallest distance to the target
Kleinberg explained the process of navigating a network and
finding others with only local knowledge
Decentralized search with hierarchical background
knowledge [Kleinberg 2000]
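The greedy step described above can be sketched as follows, assuming the background knowledge is a tree given as a {node: parent} mapping and the distance between two nodes is their path length in that tree. The function names, the rule of not revisiting nodes, and the max_hops cut-off are assumptions made for this illustration.

```python
def tree_distance(parent, a, b):
    """Number of hierarchy edges between a and b ({node: parent} mapping)."""
    depth_from_a = {}
    node, d = a, 0
    while node is not None:
        depth_from_a[node] = d
        node, d = parent.get(node), d + 1
    node, d = b, 0
    while node not in depth_from_a:
        node, d = parent.get(node), d + 1
        if node is None:
            return float("inf")  # no common ancestor
    return depth_from_a[node] + d


def greedy_search(network, parent, start, target, max_hops=20):
    """Greedy decentralized search: only the current node's links are seen."""
    path, visited, current = [start], {start}, start
    while current != target and len(path) - 1 < max_hops:
        candidates = [n for n in network.get(current, ()) if n not in visited]
        if not candidates:
            return path, False
        # Pick the neighbour that the hierarchy places closest to the target.
        current = min(candidates, key=lambda n: tree_distance(parent, n, target))
        visited.add(current)
        path.append(current)
    return path, current == target


# Toy hierarchy (background knowledge) and toy link network.
parent = {"science": "all", "arts": "all", "physics": "science",
          "biology": "science", "music": "arts"}
network = {"music": ["arts", "biology"], "arts": ["music", "all"],
           "biology": ["science", "music"], "science": ["physics", "biology"],
           "physics": ["science"], "all": ["arts"]}
path, found = greedy_search(network, parent, "music", "physics")
```

Here the searcher moves from "music" to "biology" rather than "arts", because the hierarchy places "biology" closer to the target "physics".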
Hierarchical decentralized searcher
[Diagram: an information network and a hierarchy]
Idea!
Use Kleinberg's model of decentralized search in social
networks and apply it to information networks.
Framework
Hence, we implemented a framework that takes as input a given
hierarchy & network and determines the usefulness of this
hierarchy for navigating the network [Helic et al. 2011].
[Diagram: Hierarchy and Network are fed into the framework (Hierarchical Decentralized Searcher), which answers "Useful? Yes/No"]
D. Helic, M. Strohmaier, C. Trattner, M. Muhr, K. Lerman, Pragmatic Evaluation of Folksonomies, 20th International World Wide Web Conference (WWW2011), Hyderabad, India, March 28 - April 1, ACM, 2011.
Question?
To what extent are current tag hierarchy induction
algorithms useful for navigation?
Evaluating Tag Hierarchy Induction
Algorithms
In [Helic et al. 2011] we used this kind of framework to
evaluate 5 different hierarchy induction algorithms on
5 different datasets (25 combinations):
BibSonomy
Delicious
CiteULike
Flickr
LastFM
Simulations were based on a random sample of
100,000 search pairs
Success rate and stretch were measured for the evaluation
Evaluating Tag Hierarchy Induction
Algorithms
[Figure panels: BibSonomy, CiteULike, Delicious, Flickr, LastFM]
Results:
Centrality-based hierarchy induction
algorithms outperform complicated
methods such as K-Means or Affinity
Propagation
Question
What are the differences and similarities of hierarchies
based on different types of annotations?
To what extent are hierarchies based on tags more useful for navigation
than hierarchies based on keywords?
[Figure: keyword-based vs. tag-based hierarchies]
Results
Tag-based hierarchies are more
useful for navigation than
keyword-based hierarchies
Question???
To what extent is it justified to model human navigation
in information networks with hierarchical
decentralized search?
Idea?
Compare Simulations with real world data!
Exploring the Differences and Similarities between Hierarchical Decentralized
Search and Human Navigation in Information Networks
Evaluation
We compared simulations with
human click trails from the online game
The Wiki Game (http://thewikigame.com/)
The dataset contains 1,500,000
click trails of more
than 500,000 users with
(start, target) information.
Hierarchy Creation
Two types of hierarchies were evaluated
1.) First type is based on our previous work
Categorial Concepts:
Tags from Delicious
Category labels from Wikipedia
Similarity Graph -> Latent Hierarchical Taxonomy
Wikipedia Category Label Dataset:
2,300,000 category labels,
4,500,000 articles, 30,000,000 category
label assignments
Delicious Tag Dataset:
440,000 tags, 580,000 articles and
3,400,000 tag assignments
Hierarchy Creation
2.) Second type is based on the work of [Muchnik et al. 2007]
Muchnik, L., Itzhack, R., Solomon S. and Louzoun Y.: Self-emergence of knowledge trees: Extraction
of the Wikipedia hierarchies, PHYSICAL REVIEW E 76, 016106 (2007)
Simple idea: The algorithm iterates through all
links in the network and decides whether each link is
of a hierarchical type. If it is, the link
remains in the network; otherwise it is
removed.
Directed link-network dataset of the
English-Wikipedia from February
2012.
All in all, the dataset includes
around 10,000,000 articles and
around 250,000,000 links
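The link-filtering idea can be sketched as a single pass over the links with a pluggable test. The slides do not state how a link is judged to be hierarchical, so the degree-based criterion below is only a placeholder assumption and not the criterion of Muchnik et al.; all function names are hypothetical.

```python
from collections import Counter


def extract_hierarchy(links, is_hierarchical):
    """Keep only the directed links that the given test marks as hierarchical."""
    return [(s, t) for s, t in links if is_hierarchical(s, t)]


def make_degree_criterion(links, ratio=2.0):
    """Placeholder test (assumption): a link is hierarchical if its target
    has at least `ratio` times the in-degree of its source."""
    in_degree = Counter(t for _, t in links)

    def is_hierarchical(source, target):
        return in_degree[target] >= ratio * max(in_degree[source], 1)

    return is_hierarchical


links = [("Graz", "Austria"), ("Vienna", "Austria"), ("Salzburg", "Austria"),
         ("Austria", "Graz"), ("Graz", "Vienna")]
kept = extract_hierarchy(links, make_degree_criterion(links))
```

Passing the test as a function keeps the filtering loop independent of whichever criterion is actually used.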
Evaluation Metrics
Success Rate: Percentage of target nodes found
Number of Hops: Number of hops needed to reach the target
node
Stretch: Ratio of the number of steps taken to the length of the global
shortest path
Path Similarity: |intersection(h_clicks, s_clicks)| / |s_clicks|
Degree: Median in- and out-degree values of the nodes visited
by the simulator and the human navigator
Transition Similarity
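A minimal sketch of how the first metrics above could be computed, assuming each path is a list of visited nodes and that path similarity compares the sets of nodes visited by the human (h_clicks) and the simulator (s_clicks). The function names are illustrative and not part of the framework.

```python
def success_rate(outcomes):
    """Fraction of searches that reached their target (list of booleans)."""
    return sum(outcomes) / len(outcomes)


def hops(path):
    """Number of hops in a path given as a list of visited nodes."""
    return len(path) - 1


def stretch(path, shortest_path_length):
    """Hops actually taken divided by the global shortest path length."""
    return hops(path) / shortest_path_length


def path_similarity(h_clicks, s_clicks):
    """Share of the simulator's nodes that the human path also visited."""
    return len(set(h_clicks) & set(s_clicks)) / len(set(s_clicks))


human_path = ["A", "B", "C", "D"]
simulated_path = ["A", "E", "C", "D"]
rate = success_rate([True, True, False, True])
ratio = stretch(simulated_path, shortest_path_length=2)
similarity = path_similarity(human_path, simulated_path)
```

A stretch of 1.0 means the searcher found a shortest path; larger values mean detours.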
What are the results??
Results: Hops, Stretch, Success Rate
Humans: Success Rate 100%, Stretch 2.5
Searcher with Wikipedia Category Hierarchy: Success Rate 31.6%, Stretch 1.7
Results: Hops, Stretch, Success Rate
Humans: Success Rate 100%, Stretch 2.5
Searcher with Wikipedia Delicious Hierarchy: Success Rate 69%, Stretch 8.8
Results: Hops, Stretch, Success Rate
Humans: Success Rate 100%, Stretch 2.5
Searcher with Wikipedia Network Hierarchy: Success Rate 93%, Stretch 1.5
Results: Path Similarity
Question: How similar are the paths taken by our searcher to
those taken by humans?
[Figure panels: Humans vs. Humans; Humans vs. Simulators]
Results: Degree
[Figure panels: In-Degree; Out-Degree]
Results: Transition Similarity
[Figure panels: Humans; Searcher]
Conclusions
We have shown that our approach of hierarchical
decentralized search models human navigation in
information networks fairly well
Furthermore, we have shown that hierarchies created
directly from the link network are better suited for
navigation than hierarchies that are created from
external knowledge
What do we plan for the future?
Enhance the framework to consider not only
navigation but also search (= search box)
Evaluation of alternative navigational structures
and many more things
Thank you!
Christoph Trattner
www.christophtrattner.info
@ctrattner
Philipp Singer
www.philippsinger.info
@ph_singer
Denis Helic
http://coronet.iicm.edu/denis/homepage/
@dhelic
Markus Strohmaier
www.markusstrohmaier.info
@mstrohm
Take home message
Network hierarchies are better suited for
navigation than hierarchies created from
external knowledge