networks navigability: theory and applications

Networks Navigability: Theory and Applications Denis Helic & Christoph Trattner KMI, TU Graz August 31, 2011 Denis Helic & Christoph Trattner (KMI, TU Graz) Networks Navigability: Theory and Applications August 31, 2011 1 / 75


Page 1: Networks Navigability: Theory and Applications

Networks Navigability: Theory and Applications

Denis Helic & Christoph Trattner

KMI, TU Graz

August 31, 2011

Denis Helic & Christoph Trattner (KMI, TU Graz) Networks Navigability: Theory and Applications August 31, 2011 1 / 75

Page 2: Networks Navigability: Theory and Applications

Internet of Things

http://www.youtube.com/watch?v=sfEbMV295Kk


Page 3: Networks Navigability: Theory and Applications

Internet of Things

We are heading towards a completely interconnected society

where people, devices, and sensors are all connected to each other,

producing billions upon billions of data items each day...


Page 4: Networks Navigability: Theory and Applications

Internet of Things

One big challenge in this context is how we can find relevant information in such a networked world of data

Hence, in this presentation:

The latest research results on the navigability of such networks will be shown


Page 5: Networks Navigability: Theory and Applications

Internet of Things

In particular I will show:

what are the structural clues that make such networks navigable/searchable?

In addition to this, I will present a framework that is able to measure network navigability,

and I will present two algorithms to generate efficient navigational tools for such networks.


Page 6: Networks Navigability: Theory and Applications

Networks

What are networks?

Basically, a network is a system that can be modeled with graphs.

Graphs are mathematical structures consisting of vertices and edges connecting the vertices

When we observe large graphs that exist in nature, societies, or systems we refer to them as networks


Page 7: Networks Navigability: Theory and Applications

Networks

What are popular examples of such networks?

Social networks. Nodes are people and links are acquaintances, friendship, and so on.

Communication networks. Internet: nodes are computers and links are cables connecting computers

Biological networks. Metabolism: nodes are substances and links are metabolic reactions

Information networks. Web: nodes are Web pages and links are hyperlinks connecting pages


Page 8: Networks Navigability: Theory and Applications

Networks

6 How to search in a small world

Figure 2: HP Labs’ email communication (light grey lines) mapped onto the organizational hierarchy (black lines). Note that communication tends to “cling” to the formal organizational chart.

with one another. The h-distance, used to navigate the network, is computed as follows: individuals have h-distance one to their manager and to everyone they share a manager with. Distances are then recursively assigned, so that each individual has h-distance 2 to their first neighbor’s neighbors, and h-distance 3 to their second neighbor’s neighbors, etc.

The optimum relationship derived in [7] for the probability of linking would be inversely proportional to the size of the smallest organizational group that both individuals belong to. However, the observed relationship, shown in Figure 5 is slightly off, with p ∼ g^(−3/4), g being the group size. This means that far-flung collaborations occur slightly more often than would be optimal for the particular task of searching, at the expense of short range contacts. The tendency for communication to occur across the organization was also revealed in an analysis utilizing spectroscopy methods on the same email network [12]. While collaborations mostly occurred within the same organizational unit, they also occasionally bridged different parts of the organization or broke up a single organizational unit into noninteracting subgroups.

Given the close correspondence between the assumptions of the models regarding group structure and the email network, we expected greedy strategies using the organizational hierarchy to work fairly well. Indeed, this was confirmed in our simulations.

Figure: Social network of HP Labs constructed out of e-mail communication. From: How to search a social network, Adamic, 2005.


Page 9: Networks Navigability: Theory and Applications

Networks

Figure: Network of pages and hyperlinks on a Website. From: Networks, Mark Newman, 2011.


Page 10: Networks Navigability: Theory and Applications

Structure and Function of Networks

One of the most important research questions in the study of networks: what is the relation between the structure and the function of networks?

For example, the Internet: what should the link structure of the Internet look like to support efficient routing?

Or what should the link structure of the Web look like to be efficiently navigable?

In this presentation we will focus on network navigability!


Page 11: Networks Navigability: Theory and Applications

Network Navigability

Definition

Put simply, a network is navigable if and only if there is a short path between all or almost all pairs of nodes in the network.

Formally:

1 There exists a giant component

2 The effective diameter is low, i.e. bounded by log(n), where n is the number of nodes in the network
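The two conditions can be checked mechanically on a small graph. The following is a minimal sketch, not the authors' tool: it assumes an undirected graph stored as a dictionary of neighbor sets, treats a component as "giant" when it covers at least 90% of the nodes, and takes the effective diameter to be the 90th percentile of pairwise distances inside that component. Both thresholds and all function names are illustrative assumptions.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Return hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def is_navigable(adj, giant_fraction=0.9, percentile=0.9):
    """Check both conditions on a graph given as {node: set(neighbors)}.

    giant_fraction and percentile are assumed thresholds, not from the slides.
    """
    n = len(adj)
    # Condition 1: the largest connected component covers most nodes.
    seen, largest = set(), set()
    for node in adj:
        if node not in seen:
            component = set(bfs_distances(adj, node))
            seen |= component
            if len(component) > len(largest):
                largest = component
    if len(largest) < giant_fraction * n:
        return False
    # Condition 2: the effective diameter (here: a percentile of all
    # pairwise distances inside the giant component) is at most log2(n).
    lengths = sorted(
        d for s in largest for t, d in bfs_distances(adj, s).items() if t != s
    )
    if not lengths:
        return True  # a single node has no pairs to check
    effective_diameter = lengths[math.ceil(percentile * len(lengths)) - 1]
    return effective_diameter <= math.log2(n)
```

With these assumptions a chain of 8 nodes is rejected because its distances are too long, while a star with one hub and 9 leaves is accepted.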


Page 12: Networks Navigability: Theory and Applications

Network Navigability

Example 1:

Not navigable: no giant component

Figure: The network is not navigable because there is no giant component, i.e. the network is not connected.


Page 13: Networks Navigability: Theory and Applications

Network Navigability

Example 2:

Figure: Now there is a giant component, i.e. the network is connected. However, the network is not navigable because eff.diam = 6, and 6 > log2(8) = 3.


Page 14: Networks Navigability: Theory and Applications

Network Navigability

Example 3:

Navigable: giant component AND eff.diam: 2 < log2(10)

Is this efficiently navigable? There are short paths between all nodes, but can an agent or algorithm find them with local knowledge only?

Figure: The network is navigable because there is a giant component and eff.diam = 2. The effective diameter is bounded by log2(10).


Page 15: Networks Navigability: Theory and Applications

Global Network Navigability

So far we have discussed global network navigability

Suppose that the network is navigable and we have global knowledge of the network

Then it is easy to design efficient procedures to find an arbitrary target node from an arbitrary start node

For example, breadth-first search is such an algorithm; it has linear time complexity O(n + m), where n is the number of nodes and m is the number of links

Such procedures are called centralized search
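As an illustration of centralized search, here is a minimal breadth-first search sketch. It assumes the whole network is available as an adjacency dictionary (this is the global knowledge); the function name and the example graph are hypothetical.

```python
from collections import deque


def shortest_path(adj, start, target):
    """Breadth-first search over {node: neighbors}; returns a node list or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent pointers back to the start node.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in adj[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None  # target is not reachable from start


example = {"A": ["B", "E"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"], "E": ["A"]}
print(shortest_path(example, "A", "D"))
```

Every node and every link is touched at most a constant number of times, which is where the O(n + m) bound comes from.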


Page 16: Networks Navigability: Theory and Applications

Local Network Navigability

Let us now discuss local network navigability

Suppose that the network is navigable but we have only local knowledge of the network

That means that when we arrive at a particular node we know only the outgoing links from that node and nothing beyond that

For instance, on Facebook we only know our friends or the friends of our friends.

Search procedures that work with such local knowledge are typically called decentralized search


Page 17: Networks Navigability: Theory and Applications

Local Network Navigability

But, how efficient are people in such social search?

As shown by Milgram’s experiment, people are very efficient in social search.

People are able to find each other in fewer than seven hops (friends), ∝ log(n)

Hence, people have an extremely efficient decentralized search procedure


Page 18: Networks Navigability: Theory and Applications

Local Network Navigability

How are we able to find other people efficiently?

Or in other words, what are the properties of social networks, or networks in general, that make efficient decentralized search possible?

Are there structural clues in the network which allow us to design sub-linear algorithms?

And if yes, what are these structural clues?


Page 19: Networks Navigability: Theory and Applications

Local Network Navigability

Example:

Efficiently navigable

A network is efficiently navigable iff there is an algorithm that can find a short path with only local knowledge, and the delivery time of the algorithm is bounded polynomially by log^k(n).

Efficiently navigable, if the algorithm knows it needs to go through A, B, C

J. Kleinberg. The small-world phenomenon: An algorithmic perspective. Proc. 32nd ACM Symposium on Theory of Computing, 2000. Also appears as Cornell Computer Science Technical Report 99-1776 (October 1999).

Figure: A is start node and D is target node.


Page 20: Networks Navigability: Theory and Applications

Local Network Navigability

Step 1:

Figure: There are two possible paths from A. Obviously, the optimal path leads to B. What is the structural property that can guide us in selecting B?

Node degree



Page 22: Networks Navigability: Theory and Applications

Local Network Navigability

Step 2:

Figure: There are seven possible paths from B. Obviously, the optimal path leads to C. What is the structural property that can guide us in selecting C?

Node clustering



Page 24: Networks Navigability: Theory and Applications

Local Network Navigability

Summarizing, local network navigability requires:

1 Existence of network hubs that are connected to many nodes

2 Existence of network clusters where nodes are highly interlinked


Page 25: Networks Navigability: Theory and Applications

Local Network Navigability

Formally:

1 Power-law degree distribution with exponent γ

2 High clustering coefficient C
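The clustering coefficient can be computed directly from the link structure. The sketch below uses one common definition, the mean of the local clustering coefficients, on an undirected graph stored as a dictionary of neighbor sets. The slides do not say which definition of C is meant, so this choice and the function name are assumptions. Estimating the exponent γ requires fitting the degree distribution and is not shown.

```python
from itertools import combinations


def average_clustering(adj):
    """Mean local clustering coefficient of a graph {node: set(neighbors)}."""
    total = 0.0
    for node, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        # Count the links that exist among the neighbors of this node.
        links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
print(average_clustering(triangle))
```

A triangle is fully clustered, whereas a star has a hub but no links among its leaves and therefore no clustering at all.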


Page 26: Networks Navigability: Theory and Applications

Local Network Navigability

[Plots: success probability (p_s) versus network size (N, 10^2 to 10^5) for γ = 2.1 to 2.5 and for γ = 2.6 to 3.0, panels labeled α = 1.1 and α = 5.0; success probability (p_s) versus degree exponent (γ) for α = 1.1, 1.5, 2.0, 3.0, 5.0; degree exponent (γ) versus clustering coefficient (C), showing the navigable region, the non-navigable region, and the networks Internet, Web of trust, Airports, Metabolic.]

FIG. 3: Success probability of greedy routing. Left plots: success probability p_s as a function of network size N for different values of γ with weak (top) and strong (bottom) clustering. The top-right plot shows p_s as a function of γ and α for networks of fixed size N ≈ 10^5. In the bottom-right plot, parameter α is mapped to clustering coefficient C [15] by computing C for each network with given γ and α. For each value of C, there is a critical value of γ = γ_c(C) such that the success ratio in networks with this C and γ > γ_c(C) decreases with the network size (p_s(N) → 0 as N → ∞), while p_s(N) reaches a constant value for large N in networks with γ < γ_c(C). The solid line in the plot shows these critical values γ_c(C), separating the low-γ, high-C navigable region, in which greedy routing remains efficient in the large-graph limit, from the high-γ, low-C non-navigable region, where the efficiency of greedy routing degrades for large networks. The plot labels measured values of γ and C for several real complex networks. Internet is the global Internet topology of autonomous systems as seen by the Border Gateway Protocol (BGP) [31]; Web of trust is the Pretty Good Privacy (PGP) social network of mutual trust relationships [32]; Metabolic is the network of metabolic reactions of E. coli [33]; and Airports is the network of the public air transportation system [34].

the number of successful paths once clustering is above a threshold, α ≥ 1.5. These observations mean that for a fixed clustering strength, there is a critical value of the exponent γ (Fig. 3 bottom-right) below which networks remain navigable as their size increases, but above which their navigability deteriorates with their size.

In summary, strong clustering improves both navigability metrics. We also find a delicate trade-off between values of γ close to 2 minimising path lengths, and higher values – not exceeding γ ≈ 2.6 – maximising the percentage of successful paths. We explain these findings in the next section, but we note here that qualitatively, this navigable parameter region contains a majority of complex networks observed in reality [1, 2, 3], as confirmed in Fig. 3 (bottom-right), where we juxtapose few paradigmatic examples of communication, social, biological, and transportation networks vs. the identified navigable region of clustering and degree distribution exponent. Interestingly, power grids, which propagate electricity rather than route information, are neither scale-free nor clustered [15, 35].

IV. AIR TRAVEL BY GREEDY ROUTING AS AN EXPLANATION

We illustrate the greedy routing function, and the structure of networks conductive to such routing, with an example of passenger air travel. Suppose we want to travel from Toksook Bay, Alaska, to Ibiza, Spain, by the public air transportation network. Nodes in this network are airports, and two airports are connected if there is at least one flight between them. We travel according to the greedy routing strategy using geography as the underlying metric space. At each airport we choose the next-hop airport geographically closest to the destination. Under these settings, our journey goes first to Bethel, then to Anchorage, to Detroit, over the Atlantic to Paris, then to Valencia and finally to Ibiza, see Fig. 4. The sequence and sizes of airport hops reveal the structure of our greedy-routing path. The path proceeds from a small airport to a local hub at a small distance, from there to a larger hub at a larger distance, and so on until we reach Paris. At that point, when the distance to the destination becomes sufficiently small, greedy routing leads us closer to our final destination by choosing not another hub, but a less connected neighbouring airport.

We observe that the navigation process has two, somewhat symmetric phases. The first phase is a coarse-grained search, travelling longer and longer distances per hop toward hubs, thus “zooming out” from the starting point. The second phase corresponds to a fine-grained search, “zooming in” onto the destination. The turning point between the two phases appears naturally: once we are in a hub near the destination, the probability that it is connected to a bigger hub closer to the destination sharply decreases, but at this point we do not need hubs anyway, and greedy routing directs us to smaller airports at shorter distances next to the destination.

This zoom out/zoom in mechanism works efficiently only if the coupling between the airport network topology and the underlying geography satisfies the following two conditions: the sufficient hubs condition and the sufficient clustering condition. The first condition ensures that a network has enough hub airports (high-degree nodes) to provide an increasing sequence during the zoom out phase. This condition is fulfilled by the real airport network and by other scale-free networks with small values of degree distribution exponent γ, because the smaller the γ, the larger the proportion of hubs in the network.

However, the presence of many hubs does not ensure that greedy routing will use them. Unlike humans, who can use their knowledge of airport size to selectively travel via hub airports, greedy routing uses only one con-

Figure: Navigable networks in γ, C space.


Page 27: Networks Navigability: Theory and Applications

Local Network Navigability

Revisiting Step 2:

Figure: There are seven possible paths from B. Obviously, the optimal path leads to C. What is an additional hint that can guide us in selecting C over E?

Node similarity



Page 29: Networks Navigability: Theory and Applications

Local Network Navigability

Node similarity is external to the network

It is derived from some additional information that we have about network nodes

In Milgram’s experiment people selected the next person according to their occupation or geography


Page 30: Networks Navigability: Theory and Applications

Local Network Navigability

All of this information, i.e. degrees, clustering, and similarity, can be understood as a kind of background knowledge about the network

We use this background knowledge to guide us in our search for a target node

When we have more than one link to follow, we consult the background knowledge and ask which of the links will lead us to the given target node with the highest probability


Page 31: Networks Navigability: Theory and Applications

Greedy Decentralized Search

On the next abstraction level we can say that background knowledge defines a notion of distance between nodes

In other words, background knowledge is a metric space where each node has unique coordinates and we can calculate the distance between nodes

Put differently, we can abstract background knowledge as a black box executing a simple function: getDistance(node, target node)


Page 32: Networks Navigability: Theory and Applications

Greedy Decentralized Search

Let us now take an algorithmic perspective on decentralized search

We start at an arbitrary node and need to find a target node as fast as possible, having only local knowledge of the network

In addition, we have background knowledge represented through the getDistance(node, target node) function

At each search step we have to decide which of the available links to follow


Page 33: Networks Navigability: Theory and Applications

Greedy Decentralized Search

In order to maximize the probability of finding the target node we always select the neighboring node which has the smallest distance to the target node

It has been shown that the greedy algorithm is very efficient, i.e. the number of steps to reach an arbitrary target node is ∝ log(n)

Kleinberg proved it theoretically, Watts by simulation

Watts was able to reproduce Milgram’s experiment with proper selection of parameters: Identity and Search in Social Networks, 2002
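The greedy rule can be sketched in a few lines. This is an illustrative version rather than the simulator used by the authors: `get_distance` stands in for the getDistance black box, and the refusal to revisit nodes and the `max_steps` cut-off are assumptions added so that the search always terminates.

```python
def greedy_search(adj, get_distance, start, target, max_steps=100):
    """Greedy decentralized search; returns the visited path or None on failure."""
    path = [start]
    visited = {start}
    current = start
    while current != target and len(path) <= max_steps:
        # Local knowledge only: inspect the links of the current node.
        candidates = [n for n in adj[current] if n not in visited]
        if not candidates:
            return None  # dead end
        # Background knowledge: pick the neighbor closest to the target.
        current = min(candidates, key=lambda n: get_distance(n, target))
        visited.add(current)
        path.append(current)
    return path if current == target else None


# Hypothetical example: 10 nodes on a ring, distance measured along the ring.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}


def ring_distance(a, b):
    return min(abs(a - b), 10 - abs(a - b))


print(greedy_search(ring, ring_distance, 0, 3))
```

Note that the search never looks beyond the neighbors of the current node; all global information enters through the distance function.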


Page 34: Networks Navigability: Theory and Applications

Background Knowledge

Now, what does our background knowledge of people typically look like?

Is it a metric space, e.g. 1-D spaces, 2-D vector spaces, 3-D Euclidean spaces, hyperbolic spaces, ... or does it look completely different?

Actually, it was observed by Kleinberg and also by Watts that a hierarchy of nodes is also a very good approximation of how people think

Hence, we will also use hierarchical background knowledge


Page 35: Networks Navigability: Theory and Applications

Hierarchy as a Metric Space

Figure: Node distances in a hierarchy.

Distance: d(i, j) = h(i) + h(j) − 2h(lca(i, j)) − 1
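The distance formula can be evaluated on a small tree as follows. This is a sketch under stated assumptions: the hierarchy is given as a child-to-parent dictionary, h(x) is interpreted as the depth of node x (the root has depth 0), and lca(i, j) is the lowest common ancestor. The slide does not define h, so the depth interpretation and all names below are hypothetical; the trailing −1 is taken literally from the slide.

```python
def depth(parent, node):
    """Number of edges from node up to the root (the root has depth 0)."""
    d = 0
    while parent[node] is not None:
        node = parent[node]
        d += 1
    return d


def lca(parent, i, j):
    """Lowest common ancestor of i and j in a tree given as child -> parent."""
    ancestors = set()
    while i is not None:
        ancestors.add(i)
        i = parent[i]
    while j not in ancestors:
        j = parent[j]
    return j


def hierarchy_distance(parent, i, j):
    """d(i, j) = h(i) + h(j) - 2 h(lca(i, j)) - 1, with h taken as depth."""
    a = lca(parent, i, j)
    return depth(parent, i) + depth(parent, j) - 2 * depth(parent, a) - 1


tree = {"root": None, "a": "root", "b": "root", "a1": "a", "a2": "a", "b1": "b"}
print(hierarchy_distance(tree, "a1", "a2"))  # siblings
print(hierarchy_distance(tree, "a1", "b1"))  # nodes in different branches
```

Under this reading two siblings are at distance 1, and the distance grows the higher up in the hierarchy the two nodes first meet.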


Page 36: Networks Navigability: Theory and Applications

Example of a Greedy Navigation


Figure: Greedy search.


Page 37: Networks Navigability: Theory and Applications

Calculating Network Navigability

In order to measure network navigability, we developed a theoretical framework that estimates network navigability by simulation

As input we take a network, e.g. an information network like Wikipedia or Delicious,

and a suitable hierarchy that models background knowledge,

for example, Wikipedia categories or a Delicious folksonomy,

and simulate decentralized search on 10^6 start and target node pairs


Page 38: Networks Navigability: Theory and Applications

Network Navigability Simulation Framework

The metrics we measure by our framework are

success rate s

and stretch τ

For both metrics we calculate distributions over the global shortest path length

Definition

Stretch: τ = h / l, where h is the number of simulator steps and l is the length of the global shortest path.
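Both metrics can be aggregated from raw simulation results in a few lines. The sketch below assumes each simulated search is recorded as a pair (h, l), with h set to None when the search failed; this record format and the function name are illustrative assumptions, not the framework's actual interface.

```python
from collections import defaultdict


def summarize(runs):
    """Group runs by shortest path length l.

    runs: iterable of (h, l); h is the number of simulator steps, or None
    if the search failed; l is the global shortest path length (l >= 1).
    Returns {l: (success_rate, mean_stretch)}; mean_stretch is None when
    no search with that l succeeded.
    """
    buckets = defaultdict(lambda: {"runs": 0, "successes": 0, "stretch_sum": 0.0})
    for h, l in runs:
        bucket = buckets[l]
        bucket["runs"] += 1
        if h is not None:
            bucket["successes"] += 1
            bucket["stretch_sum"] += h / l  # stretch of this run
    summary = {}
    for l, b in sorted(buckets.items()):
        success_rate = b["successes"] / b["runs"]
        mean_stretch = b["stretch_sum"] / b["successes"] if b["successes"] else None
        summary[l] = (success_rate, mean_stretch)
    return summary


print(summarize([(2, 2), (3, 2), (None, 2), (4, 4)]))
```

A stretch of 1 means the search found a globally shortest path; larger values mean detours.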


Page 39: Networks Navigability: Theory and Applications

Evaluating hierarchies

The framework lets you, e.g., estimate the quality of a hierarchy that serves as background knowledge

A hierarchy with better navigational properties will have a better success rate and stretch in comparison with other hierarchies

For example, Wikipedia categories versus Delicious tags

For example, different folksonomies for navigating social tagging systems, see Helic et al.: Pragmatic Evaluation of Folksonomies, 2011


Page 40: Networks Navigability: Theory and Applications

Evaluating Navigational Tools

We can also use the framework to estimate the effects of changes in the network on its navigational properties

For example, how navigable is Wikipedia now?

How navigable will Wikipedia be if we include Delicious tags?

How navigable will Wikipedia be if we include breadcrumbs?

We take Wikipedia as the starting network and create new links in the network to emulate Delicious tags, breadcrumbs, etc.


Page 41: Networks Navigability: Theory and Applications

Evaluating folksonomies

A folksonomy is a hierarchy that is automatically generated from a tagging system

Today there are several folksonomy algorithms, see e.g. Heymann 2008, or Benz 2010

In addition, you can produce folksonomies by using standard hierarchical clustering methods such as K-Means or Affinity Propagation


Page 42: Networks Navigability: Theory and Applications

Evaluating folksonomies

In Helic et al.: Pragmatic Evaluation of Folksonomies, WWW2011 we took 5 tagging datasets and 5 different folksonomy algorithms

We produced 5x5 folksonomies and simulated greedy decentralized search (100,000 samples) on the datasets

We measured the success rate and stretch to see if some folksonomies perform better than others.


Page 43: Networks Navigability: Theory and Applications

Evaluating folksonomies

[Plot: Greedy Search Success Rate: BibSonomy. x-axis: Shortest path (1 to 7); y-axis: Success Rate (Percentage), 0 to 100; legend (Folksonomy): Random, Aff.Prop., K-Means, Deg/Cooc, Clo/Cos]

Figure: Success Rate of different folksonomies in BibSonomy


Page 44: Networks Navigability: Theory and Applications

Evaluating folksonomies

[Plot: Greedy Search Success Rate: CiteULike. x-axis: Shortest path (1 to 7); y-axis: Success Rate (Percentage), 0 to 100; legend (Folksonomy): Random, Aff.Prop., K-Means, Deg/Cooc, Clo/Cos]

Figure: Success Rate of different folksonomies in CiteULike


Page 45: Networks Navigability: Theory and Applications

Evaluating folksonomies

Figure: Success Rate of different folksonomies in Delicious. Plot "Greedy Search Success Rate: Delicious"; x-axis: Shortest path (1 to 7); y-axis: Success Rate (Percentage), 0 to 100; legend entries as extracted: Folksonomy, Random, Aff.Prop., K-Means, Deg/Cooc, Clo/Cos.

Page 46: Networks Navigability: Theory and Applications

Evaluating folksonomies

Figure: Success Rate of different folksonomies in Flickr. Plot "Greedy Search Success Rate: Flickr"; x-axis: Shortest path (1 to 6); y-axis: Success Rate (Percentage), 0 to 100; legend entries as extracted: Folksonomy, Random, Aff.Prop., K-Means, Deg/Cooc, Clo/Cos.

Page 47: Networks Navigability: Theory and Applications

Evaluating folksonomies

Figure: Success Rate of different folksonomies in LastFM. Plot "Greedy Search Success Rate: LastFM"; x-axis: Shortest path (1 to 5); y-axis: Success Rate (Percentage), 0 to 100; legend entries as extracted: Folksonomy, Random, Aff.Prop., K-Means, Deg/Cooc, Clo/Cos.

Page 48: Networks Navigability: Theory and Applications

Evaluating folksonomies

Centrality-based algorithms such as Heymann 2008 or Benz 2010 outperform traditional methods

However, these are all theoretical results

What if we wanted to embed folksonomies in the user interface (UI) to support users in their navigation tasks, and the space in the user interface were limited?

Page 49: Networks Navigability: Theory and Applications

Embedding folksonomies in UI

Figure: Directory Based Navigation. Screenshot of the Google Directory page Computers > Internet > On the Web > Online Communities (http://directory.google.com/Top/Computers/Internet/On_the_Web/Online_Communities/, captured 11.05.2011), showing a breadcrumb trail, a list of categories with entry counts (e.g. Chat (745), Social Networking (222), By Subject (204)), a list of related categories (e.g. Society > Organizations (16987)), and the web pages of the category in Google PageRank order.

Page 50: Networks Navigability: Theory and Applications

Embedding folksonomies in UI

We have breadcrumbs connecting each node all the way up to the root node

We have a limited number of subcategories (n)

We have a limited number of related categories (m)

Now we embed a folksonomy as in Benz 2010 and apply different restrictions

Page 51: Networks Navigability: Theory and Applications

Embedding folksonomies in UI

Figure: Success Rate and stretch in BibSonomy with n = 20 and m = 20. Plot of Success Rate (s) and Stretch (τ) over Shortest Path (1 to 9). Greedy Navigator (1000000 Runs): l̄ = 3.585123, h̄ = 5.936013, s_g = 0.005548, τ_g = 1.655735.

Page 52: Networks Navigability: Theory and Applications

Embedding folksonomies in UI

Figure: Success Rate and stretch in CiteULike with n = 20 and m = 20. Plot of Success Rate (s) and Stretch (τ) over Shortest Path (1 to 9). Greedy Navigator (1000000 Runs): l̄ = 3.634634, h̄ = 6.536937, s_g = 0.001110, τ_g = 1.798513.

Page 53: Networks Navigability: Theory and Applications

Embedding folksonomies in UI

Figure: Success Rate and stretch in Delicious with n = 20 and m = 20. Plot of Success Rate (s) and Stretch (τ) over Shortest Path (1 to 12). Greedy Navigator (1000000 Runs): l̄ = 3.518932, h̄ = 5.557032, s_g = 0.000903, τ_g = 1.579181.

Page 54: Networks Navigability: Theory and Applications

Embedding folksonomies in UI

Figure: Success Rate and stretch in Flickr with n = 20 and m = 20. Plot of Success Rate (s) and Stretch (τ) over Shortest Path (1 to 9). Greedy Navigator (1000000 Runs): l̄ = 3.467684, h̄ = 4.162304, s_g = 0.000382, τ_g = 1.200312.

Page 55: Networks Navigability: Theory and Applications

Embedding folksonomies in UI

Figure: Success Rate and stretch in LastFM with n = 20 and m = 20. Plot of Success Rate (s) and Stretch (τ) over Shortest Path (1 to 6). Greedy Navigator (1000000 Runs): l̄ = 3.197477, h̄ = 6.662900, s_g = 0.001062, τ_g = 2.083799.

Page 56: Networks Navigability: Theory and Applications

Embedding folksonomies in UI

Under practical user interface restrictions (here n = 20 and m = 20), folksonomies are useless for supporting navigation: the success rate of the greedy navigator drops below 1%.
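The numbers printed in the five plots for n = 20 and m = 20 can be cross-checked with a little arithmetic. The sketch below assumes that l̄ is the mean shortest path length, h̄ the mean number of hops of the greedy navigator, s_g the global success rate and τ_g the global stretch; this reading of the symbols is an assumption, since the plot labels do not define them. Under that reading, τ_g matches h̄ / l̄ up to rounding, and every s_g is below 1%.

```python
# Values copied from the plot labels (n = 20, m = 20): (l_bar, h_bar, s_g, tau_g).
restricted = {
    "BibSonomy": (3.585123, 5.936013, 0.005548, 1.655735),
    "CiteULike": (3.634634, 6.536937, 0.001110, 1.798513),
    "Delicious": (3.518932, 5.557032, 0.000903, 1.579181),
    "Flickr":    (3.467684, 4.162304, 0.000382, 1.200312),
    "LastFM":    (3.197477, 6.662900, 0.001062, 2.083799),
}

for name, (l_bar, h_bar, s_g, tau_g) in restricted.items():
    ratio = h_bar / l_bar  # assumed definition of the global stretch
    print(f"{name:10s} s_g = {s_g:.4%}  h/l = {ratio:.6f}  reported tau_g = {tau_g:.6f}")

# Largest success rate across the five datasets.
max_success = max(s_g for _, _, s_g, _ in restricted.values())
```

The best case is BibSonomy, where only about 0.55% of the simulated searches reach their target.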


Page 57: Networks Navigability: Theory and Applications

Embedding folksonomies in UI

Thus, folksonomies (unlimited) are useful theoretically but useless practically

The problem is that top nodes have many children (possibly thousands) and UI restrictions cut too many child nodes off

Hence, we need a new algorithm that takes these UI restrictions into account

Technically, we need to be able to determine the branching factor for the hierarchy

We developed such an algorithm and published it at CIKM 2011: Helic et al., Building Directories for Social Tagging Systems

We were able to almost recover theoretical navigability
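To see why a display limit hurts a flat hierarchy so much, consider what remains reachable when every node may show at most n children. The sketch below is illustrative only: the toy hierarchy, the function name `reachable_with_limit` and the rule of keeping the first n children are assumptions, and breadcrumbs and related categories are ignored. It is not the algorithm from the CIKM 2011 paper.

```python
def reachable_with_limit(children, root, n):
    """Nodes still reachable from root when each node shows only its first n children."""
    reachable, stack = {root}, [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, [])[:n]:
            if child not in reachable:
                reachable.add(child)
                stack.append(child)
    return reachable

# Hypothetical flat hierarchy: one root with 1,000 children, each with 2 children.
children = {"root": [f"tag{i}" for i in range(1000)]}
for i in range(1000):
    children[f"tag{i}"] = [f"tag{i}_a", f"tag{i}_b"]

total = 1 + 1000 + 2000
visible = reachable_with_limit(children, "root", n=20)
coverage = len(visible) / total
print(f"{len(visible)} of {total} nodes reachable ({coverage:.1%})")
```

With n = 20, only the root, 20 of its children and their 40 children stay reachable, so most targets cannot be found at all. A hierarchy whose branching factor is bounded by construction does not lose nodes in this way.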


Page 58: Networks Navigability: Theory and Applications

Embedding folksonomies in UI

Figure: Success Rate and stretch in BibSonomy with new folksonomy algorithm. Plot of Success Rate (s) and Stretch (τ) over Shortest Path (1 to 9). Greedy Navigator (1000000 Runs): l̄ = 3.585123, h̄ = 8.691685, s_g = 1.000000, τ_g = 2.424376.

Page 59: Networks Navigability: Theory and Applications

Embedding folksonomies in UI

Figure: Success Rate and stretch in CiteULike with new folksonomy algorithm. Plot of Success Rate (s) and Stretch (τ) over Shortest Path (1 to 9). Greedy Navigator (1000000 Runs): l̄ = 3.634634, h̄ = 9.163688, s_g = 1.000000, τ_g = 2.521213.

Page 60: Networks Navigability: Theory and Applications

Embedding folksonomies in UI

Figure: Success Rate and stretch in Delicious with new folksonomy algorithm. Plot of Success Rate (s) and Stretch (τ) over Shortest Path (1 to 12). Greedy Navigator (1000000 Runs): l̄ = 3.518932, h̄ = 9.720769, s_g = 1.000000, τ_g = 2.762420.

Page 61: Networks Navigability: Theory and Applications

Embedding folksonomies in UI

Figure: Success Rate and stretch in Flickr with new folksonomy algorithm. Plot of Success Rate (s) and Stretch (τ) over Shortest Path (1 to 9). Greedy Navigator (1000000 Runs): l̄ = 3.467684, h̄ = 8.886960, s_g = 0.996066, τ_g = 2.562794.

Page 62: Networks Navigability: Theory and Applications

Embedding folksonomies in UI

Figure: Success Rate and stretch in LastFM with new folksonomy algorithm. Plot of Success Rate (s) and Stretch (τ) over Shortest Path (1 to 6). Greedy Navigator (1000000 Runs): l̄ = 3.197477, h̄ = 9.830726, s_g = 1.000000, τ_g = 3.074526.

Page 63: Networks Navigability: Theory and Applications

Why is the usefulness of folksonomies for navigation limited?

Even if folksonomies allow the user to navigate to related concepts in an efficient manner, navigation to a particular resource is still a problem

As shown in related work, the tag-resource distribution in tagging systems follows a power-law function, i.e. there are many tags that refer to a large number of resources.

In BibSonomy or CiteULike, for instance, there are tags that refer to hundreds or even thousands of resources.

To display such long resource lists, developers typically paginate the resource lists in a tagging system by a certain factor k

Page 64: Networks Navigability: Theory and Applications

Why is the usefulness of folksonomies for navigation limited?

Figure: Tag distributions. Panels: (a) Austria-Forum, (b) BibSonomy, (c) CiteULike.

Page 65: Networks Navigability: Theory and Applications

Creating tag-resource Taxonomies

To help the user navigate efficiently not only to related tags but also to the resources of a tagging system, we invented the approach of so-called tag-resource taxonomies.

Figure: Folksonomy vs. Tag-Resource Taxonomy. Panel (a) Folksonomy shows the nodes Car, Tire, Motor, Mercedes, VW, VOLVO and BMW; panel (b) Tag-Resource Taxonomy shows the nodes Car, Tire, Motor, VW, VW, BMW and BMW. In a folksonomy, tags appear only once, whereas resources can be referred to by different tags. In a tag-resource taxonomy, on the other hand, resources can occur only once, while tags can appear multiple times and on different levels.

Page 66: Networks Navigability: Theory and Applications

Why is the usefulness of folksonomies for navigation limited?

In the worst case, a user would have to click max{click(T_tag)} times to reach a desired resource with the help of a folksonomy.

$$\max\{\mathrm{click}(T_{tag})\} = \frac{c_1 |r|}{k} + \log_{b/2}(c_2 \cdot |r|), \quad b \geq 2 \qquad (1)$$

or

$$\max\{\mathrm{click}(T_{tag})\} \approx \frac{c_1 \cdot |r|}{k} \qquad (2)$$

supposing that $\log_{b/2}(c_2 \cdot |r|) \ll \frac{c_1 \cdot |r|}{k}$

Page 67: Networks Navigability: Theory and Applications

Why is the usefulness of folksonomies for navigation limited?

The worst-case scenario of a tag-resource taxonomy is significantly better. In the worst case, a user would have to click max{click(T_res)} times to reach a desired target resource.

$$\max\{\mathrm{click}(T_{res})\} = \max\{\mathrm{depth}(T_{res})\} = \log_{k/2} |r|, \quad k \geq 2 \qquad (3)$$

Then for large values of |r| we have:

$$\log_{k/2} |r| \ll \frac{c_1 \cdot |r|}{k} \qquad (4)$$
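The two worst-case bounds can be compared numerically. The sketch below is a hedged illustration: the function names are hypothetical, the constants c1 = 0.1 and c2 = 1.0 and the folksonomy branching factor b = 10 are made-up values (the slides do not give them), and |r| = 19,430 assumes that the n reported in the evaluation tables is the number of resources of Austria-Forum. The code needs b > 2 and k > 2, because a logarithm with base 1 is undefined.

```python
import math

def max_clicks_folksonomy(num_resources, k, b, c1, c2):
    """Worst case for a folksonomy: c1*|r|/k paging clicks plus log_{b/2}(c2*|r|) tree clicks."""
    return c1 * num_resources / k + math.log(c2 * num_resources, b / 2)

def max_clicks_tag_resource(num_resources, k):
    """Worst case for a tag-resource taxonomy: its maximum depth log_{k/2}(|r|)."""
    return math.log(num_resources, k / 2)

num_resources = 19430  # assumption: |r| for Austria-Forum
k = 10                 # pagination factor, also the taxonomy branching factor

worst_tag = max_clicks_folksonomy(num_resources, k, b=10, c1=0.1, c2=1.0)  # hypothetical constants
worst_res = max_clicks_tag_resource(num_resources, k)
print(f"folksonomy: {worst_tag:.1f} clicks, tag-resource taxonomy: {worst_res:.1f} clicks")
```

The linear paging term dominates the folksonomy bound, while the taxonomy bound grows only logarithmically with the number of resources.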


Page 68: Networks Navigability: Theory and Applications

Why is the usefulness of folksonomies for navigation limited?

                     Austria-Forum   BibSonomy   CiteULike
max{click(T_tag)}    184             5,278       20,799
max{click(T_res)}    6.1             7.7         8.5

Table: Tag Taxonomy vs. Tag-Resource Taxonomy: Maximum number of clicks for k = 10.

Page 69: Networks Navigability: Theory and Applications

Why is the usefulness of folksonomies for navigation limited?

To calculate the number of tags suffering from the so-called pagination effect, we can use the following equation:

$$|t_p| = |t| \cdot \left(\frac{\alpha}{k} - \frac{1}{k}\right)^{\left(\frac{1}{\alpha}\right)} \qquad (5)$$

            Austria-Forum   BibSonomy    CiteULike
|t_p| (%)   5079 (38%)      7401 (28%)   51748 (32%)

Table: Number of paginated tags for k = 10.

Page 70: Networks Navigability: Theory and Applications

Why is the usefulness of folksonomies for navigation limited?

The mean number of clicks is calculated as follows:

Tag-Resource Taxonomy: $\mathrm{mean}\{\mathrm{click}(T_{res})\} = \log_k(|r|)$

Folksonomy: $\mathrm{mean}\{\mathrm{click}(T_{tag})\} = \log_k(|t|) + \frac{1}{|t|} \sum_{i=1}^{|t|} \frac{r_i}{k}$

Measure              k    Austria-Forum   BibSonomy   CiteULike
mean{click(T_res)}   2    14.2            17.8        19.8
mean{click(T_tag)}   2    29.5            22.4        30.7
mean{click(T_res)}   5    6.1             7.6         8.5
mean{click(T_tag)}   5    11.6            9.2         12.3
mean{click(T_res)}   10   4.3             5.3         5.9
mean{click(T_tag)}   10   6.4             5.6         7.3

Table: Tag Taxonomy vs. Tag-Resource Taxonomy: Mean number of clicks for different branching factors k.
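The two mean-click formulas translate directly into code. The sketch below is illustrative: the function names and the per-tag resource counts `toy_resources_per_tag` are hypothetical, and |r| = 19,430 assumes that the n reported in the evaluation tables is the number of resources of Austria-Forum. Under that assumption, log_k(|r|) rounds to the Austria-Forum values in the table for k = 2, 5 and 10.

```python
import math

def mean_clicks_tag_resource(num_resources, k):
    """Tag-resource taxonomy: mean number of clicks is log_k(|r|)."""
    return math.log(num_resources, k)

def mean_clicks_folksonomy(resources_per_tag, k):
    """Folksonomy: log_k(|t|) tree clicks plus the average of r_i / k paging clicks per tag."""
    num_tags = len(resources_per_tag)
    paging = sum(r_i / k for r_i in resources_per_tag) / num_tags
    return math.log(num_tags, k) + paging

# Assumption: |r| = 19,430 resources (Austria-Forum).
taxonomy_means = [round(mean_clicks_tag_resource(19430, k), 1) for k in (2, 5, 10)]
print(taxonomy_means)

# Hypothetical skewed tag usage: one tag with 100 resources, two with 10, seven with 1.
toy_resources_per_tag = [100, 10, 10, 1, 1, 1, 1, 1, 1, 1]
toy_mean = mean_clicks_folksonomy(toy_resources_per_tag, k=10)
print(f"{toy_mean:.2f}")
```

In the toy folksonomy, the single heavily used tag accounts for most of the paging term, which illustrates how a skewed tag-resource distribution drives the folksonomy's mean click count up.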


Page 71: Networks Navigability: Theory and Applications

Creating tag-resource Taxonomies

1. Compute the degree centrality of the resource-to-resource tag network

2. Take the most general resource as root and attach at most b resources as children. Child nodes are selected according to their cosine similarity values.

3. After that, we take the resource taxonomy and apply labels (tags) to the resources (top-down, in left order)

3.1. We calculate candidate labels by the method of co-occurrence, i.e. we take the tags of the related resources into account to rank the actual tags of the currently processed resource.

3.2. If the candidate tag has already been applied to one of the parent resources of the currently processed resource, we take the next candidate tag from the co-occurrence vector and try to apply it.
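The steps above can be sketched as follows. This is a rough illustration under several assumptions that the slides do not confirm: two resources are linked when they share at least one tag, the hierarchy is grown breadth-first below the root, resources with zero similarity to a node are never attached to it, the "related resources" used for ranking are a node's children, and ties are broken alphabetically. All names and the toy data are hypothetical.

```python
import math
from collections import Counter, deque

def cosine(tags_a, tags_b):
    """Cosine similarity of two tag sets treated as binary vectors."""
    if not tags_a or not tags_b:
        return 0.0
    return len(tags_a & tags_b) / math.sqrt(len(tags_a) * len(tags_b))

def build_taxonomy(resource_tags, b):
    """Steps 1 and 2: returns (root, children) with at most b children per node."""
    resources = list(resource_tags)
    # Step 1: degree in the resource-to-resource network (edge = shared tag).
    degree = {r: sum(1 for o in resources if o != r and resource_tags[r] & resource_tags[o])
              for r in resources}
    root = max(resources, key=lambda r: degree[r])
    # Step 2: attach up to b of the most similar unplaced resources to each node.
    children, placed, queue = {}, {root}, deque([root])
    while queue:
        node = queue.popleft()
        scored = [(cosine(resource_tags[node], resource_tags[r]), r)
                  for r in resources if r not in placed]
        best = [r for s, r in sorted(scored, key=lambda x: -x[0]) if s > 0][:b]
        children[node] = best
        placed.update(best)
        queue.extend(best)
    return root, children

def label_taxonomy(resource_tags, root, children):
    """Step 3: label resources top-down, left to right."""
    labels, stack = {}, [(root, frozenset())]
    while stack:
        node, used = stack.pop()
        # Step 3.1: rank the node's own tags by co-occurrence among related resources.
        counts = Counter(t for r in children[node] for t in resource_tags[r])
        candidates = sorted(resource_tags[node], key=lambda t: (-counts[t], t))
        # Step 3.2: skip candidates already applied to a resource on the path to the root.
        labels[node] = next((t for t in candidates if t not in used), None)
        for child in reversed(children[node]):
            stack.append((child, used | {labels[node]}))
    return labels

# Hypothetical resources and their tags.
resource_tags = {
    "r1": {"car", "tire", "motor"},
    "r2": {"car", "tire"},
    "r3": {"car", "motor"},
    "r4": {"tire", "vw"},
    "r5": {"motor", "bmw"},
}
root, children = build_taxonomy(resource_tags, b=2)
labels = label_taxonomy(resource_tags, root, children)
print(root, children, labels)
```

In the toy data, each resource appears exactly once in the hierarchy, and the more specific tags (vw, bmw) end up on the leaves because the general tags were already used further up the path.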


Page 72: Networks Navigability: Theory and Applications

Evaluating Tag-Resource Taxonomies

In the first experiment we measured the average and maximum number of clicks and the drop rate

Name    b    n        max{click(T_res)}   mean{click(T_res)}
Res2    2    19,430   17                  12.45
Res5    5    19,430   10                  5.93
Res10   10   19,430   8                   4.44

Table: max{click(T_res)} and mean{click(T_res)} for different branching factors b.

Page 73: Networks Navigability: Theory and Applications

Evaluating Tag-Resource Taxonomies

In the second experiment we measured the number of collisions

Name    b    n        CR (%)
Res2    2    19,430   0.1%
Res5    5    19,430   0.2%
Res10   10   19,430   0.2%

Table: Collision Rates (CR) for different resource taxonomies with different branching factors b.

Page 74: Networks Navigability: Theory and Applications

Evaluating Tag-Resource Taxonomies

In the third experiment we measured the semantic structure of the tag-resource taxonomy compared to popular folksonomy induction algorithms such as Heymann, K-Means, Affinity Propagation and Co-Occurrence

As measures for this experiment we used taxonomic recall/precision and overlap.

Ground truth: GermaNet ontology

Figure: Results of the semantic evaluation of the three generated tag-resource taxonomies Res2, Res5 and Res10. Bar chart over Res2, Res5, Res10, Deg/Cooc, Aff. Prop, K-Means and Heymann; y-axis: Count (1 = 100%), 0 to 0.4; series: Taxonomic F-Measure, Taxonomic Overlap.

Page 75: Networks Navigability: Theory and Applications

Evaluating Tag-Resource Taxonomies

In the fourth and last experiment a user study was conducted to test whether the approach is also useful for humans

As ground truth for the experiment, the best folksonomy generation approach known so far was used

Overall, we had 9 test users who had to judge 200 tag trails extracted from both hierarchies

Name         b    Correct (%)   Related (%)   Equivalent (%)   Not Related (%)   Unknown (%)
Deg/Cooc10   10   33.2          27.3          13               21.9              5.1
Res10        10   27.3          36.2          12.3             19.8              4.2

Table: Results of the empirical analysis of the tag-resource taxonomy with branching factor b = 10 compared to a Deg/Cooc folksonomy with branching factor b = 10.

Page 76: Networks Navigability: Theory and Applications

End of presentation

Thank you very much for your attention!

Christoph Trattner ([email protected])