kdd 2005 review session jure leskovec. query incentive networks jon kleinberg prabhakar raghavan

KDD 2005 Review Session

Jure Leskovec

Query Incentive Networks

Jon Kleinberg

Prabhakar Raghavan

Query Incentive Networks

Networks are everywhere:

Decentralized peer-to-peer networks

On-line communities

There is no central index, so users post queries to the network itself

Requests are propagated until the answer is found

Motivating example: On-line Social Networking sites

Friendster, Orkut, LinkedIn, … maintain the social networks of their members

A large on-line community built on off-line friendships

Use the network to help find information and services:

Find a job or an apartment through friends of friends

Queries propagate over the network of friendships (trust)

Networks as Marketplaces

Intuitively, we want our network to have short, easily findable paths

More questions

How do members of the network extract utility from their interactions with other members?

What is the system behavior as members interact strategically to maximize their utilities?

The Setting

Formulation of a simple model of query propagation on a random network:

Node v* poses a query and offers a reward

The query propagates and the answer is found

How should the other nodes behave? How much reward should the query node v* offer?

Main Result

If the branching factor is below 2 (a node has fewer than 2 neighbors to forward to, on average), the query node has to invest an enormous reward to receive the answer

If the branching factor is above 2, it needs to invest only O(log n)

Consequence

Known result: once the branching factor exceeds 1, the network has a giant connected component and short paths, of length O(log N)

Consequence

The network achieves structural robustness at branching factor 1

Only at a branching factor above 2 does the network make searching feasible in the presence of incentives

Formulating the Model – Big Picture

Node v*, belonging to the network, is seeking a piece of information held by certain nodes

Node v* offers a reward, which will be paid when the answer is received

If a neighbor of v* does not have the answer, it:

takes a piece of the reward for itself, and

offers a smaller reward to its own neighbors, its “subcontractors” (hoping they will have the answer)

The query propagates and eventually finds the answer

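Tree Model – Example

[Figure: query tree rooted at v* (initial utility 9) with nodes b, c, d, e, f and g. Offered rewards shrink down the tree (5, then 2, then 1, and 0 at g), and some nodes hold the answer. Caption: each node offers a reward for the answer.]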

Tree Model – Answer propagation

[Figure: the same tree; the answer held by d travels back to v*. d receives reward 2; the node above it, which was offered 5, keeps 5 - 2 = 3; v* ends with utility 9 - 5 = 4.]
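The arithmetic on this slide can be reproduced with a short sketch. It assumes the figure's bookkeeping: each node on the answer path keeps what it receives minus what it passes on, the answer-holder keeps the whole reward offered to it, and the unit forwarding cost introduced later is ignored. `path_shares` is a hypothetical helper name.

```python
from typing import List


def path_shares(amounts: List[int]) -> List[int]:
    """Split the payout along an answer path.

    amounts[0] is the root's utility r*; amounts[i] for i >= 1 is the reward
    offered to the i-th node on the path, the last entry being the
    answer-holder. Each node keeps what it receives minus what it passes on;
    the answer-holder keeps its whole reward (unit costs ignored here).
    """
    if any(a < b for a, b in zip(amounts, amounts[1:])):
        raise ValueError("rewards must not increase along the path")
    kept = [a - b for a, b in zip(amounts, amounts[1:])]
    kept.append(amounts[-1])
    return kept


# Root utility 9, root offers 5, the intermediate node offers 2 to answer-holder d.
print(path_shares([9, 5, 2]))  # [4, 3, 2]
```

The shares always sum to the root's utility, since every unit offered by the root ends up with exactly one node on the path.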

Tree model

We model the underlying network as a tree T

The root v* of T has a query for which it has utility r*

Each node holds an answer with probability 1-p

Node v* offers a reward for the answer

The query propagates down the tree

Tree model – “Subcontracting”

Each node takes its share of the reward

Each node v has an integer-valued function fv

If node v is offered a reward of r by its parent and does not possess the answer, then it offers a reward fv(r) < r to its children

The propagation of the query stops along a particular path in T when:

the offered rewards shrink to 0, or

a node that holds the answer is reached
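The subcontracting rules can be simulated on an explicit tree. This is a minimal sketch under stated assumptions: the tree is given as a child-list dictionary, every node uses the same strategy `f`, and a node offered reward 0 does not participate (the slide only says propagation stops there). `propagate_query` and the example tree are hypothetical.

```python
from typing import Callable, Dict, List, Set


def propagate_query(
    children: Dict[str, List[str]],
    holders: Set[str],
    root: str,
    offer: int,
    f: Callable[[int], int],
) -> Dict[str, int]:
    """Push a query down a tree; return {answer_holder: reward offered to it}.

    The root offers `offer` to each child. A node offered r > 0 that holds the
    answer stops the query on its path; otherwise it passes f(r) < r to its
    own children. A path dies once the offered reward reaches 0.
    """
    found: Dict[str, int] = {}
    stack = [(child, offer) for child in children.get(root, [])]
    while stack:
        node, r = stack.pop()
        if r <= 0:
            continue  # assumption: a node offered nothing does not act
        if node in holders:
            found[node] = r  # answer reached: this path stops here
            continue
        passed = f(r)
        if not 0 <= passed < r:
            raise ValueError("strategy must satisfy 0 <= f(r) < r")
        stack.extend((child, passed) for child in children.get(node, []))
    return found


# Hypothetical tree loosely shaped like the slide example.
tree = {"v*": ["b", "c"], "b": ["d", "e"], "c": ["f"], "f": ["g"]}
print(propagate_query(tree, {"d", "g"}, "v*", 5, lambda r: r // 2))
```

With the halving strategy, the root's offer of 5 reaches d as 2 and g as 1, so both answer-holders are discovered; with a smaller initial offer the deeper holder would be cut off.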

Tree model – Getting the Reward

From among all the answer-holders that are discovered, the root v* selects an answer

The reward propagates down the path to the answer-holder

Each node on the answer path keeps its share of the reward

Forwarding the reward has unit cost

The Model as a Game (1)

Nodes behave strategically: each chooses how to offer a reward so as to maximize its payoff

Each player (node) v picks a strategy in the form of a function fv

[Figure: node u offers reward r to its child v; v offers reward fv(r) to its own children.]

Note that fv(r) is integer-valued and smaller than r, so propagation cannot go on forever

The Model as a Game (2)

The function fv(r) determines how much of the reward is passed forward; it defines the strategy

Forwarding the query has unit cost: fv(r) ≤ r-1, so each node keeps at least 1 unit for its effort

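Because fv(r) is integer-valued and at most r-1, the reward shrinks by at least one unit per level, so a path carries at most r+1 offers before reaching 0. The sketch below makes this concrete; `reward_chain` is a hypothetical helper and the strategy `max(r - 3, 0)` is an arbitrary illustrative choice, not one taken from the paper.

```python
from typing import Callable, List


def reward_chain(r: int, f: Callable[[int], int]) -> List[int]:
    """Rewards offered level by level along one path: r, f(r), f(f(r)), ... down to 0."""
    chain = [r]
    while chain[-1] > 0:
        nxt = f(chain[-1])
        # Unit forwarding cost: every node must keep at least one unit.
        if not (isinstance(nxt, int) and 0 <= nxt <= chain[-1] - 1):
            raise ValueError("strategy must be integer-valued with 0 <= f(r) <= r - 1")
        chain.append(nxt)
    return chain


chain = reward_chain(9, lambda r: max(r - 3, 0))
print(chain)  # [9, 6, 3, 0]
print([a - b for a, b in zip(chain, chain[1:])])  # shares kept per node: [3, 3, 3]
```

The loop is guaranteed to stop because each iteration strictly decreases a non-negative integer.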

Unique Nash Equilibrium

The game has a unique Nash equilibrium:

$f_v(r) = \arg\max_{0 \le x \le r-1} (r - x - 1)\,\alpha_v(f, x)$

where $\alpha_v(f, x)$ is the probability that the subtree below v yields an answer when v offers reward x to its children and the nodes below play f

At a Nash equilibrium, no player gains anything by deviating from its current strategy

All nodes have the same strategy (function f)

Proof idea: show that the maximum expected payoff of v is exactly $\max_{x} (r - x - 1)\,\alpha_v(f, x)$
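The equilibrium condition is a best response: v, offered r, pays out x, spends one unit on forwarding, and succeeds with probability α_v(f, x). A minimal sketch, assuming α is supplied as a plain function `alpha` (in the game it is determined by the strategies below v) and that ties are broken toward the smallest x, which the slides do not specify. The example `alpha(x) = 1 - 0.5**x` is purely hypothetical.

```python
from typing import Callable, Tuple


def best_response(r: int, alpha: Callable[[int], float]) -> Tuple[int, float]:
    """Return (x, payoff) maximizing (r - x - 1) * alpha(x) over x = 0..r-1.

    alpha(x) stands in for alpha_v(f, x). Ties go to the smallest x
    (an assumption made for this sketch).
    """
    if r < 1:
        return 0, 0.0  # nothing to share
    best_x, best_val = 0, (r - 1) * alpha(0)
    for x in range(1, r):
        val = (r - x - 1) * alpha(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


print(best_response(6, lambda x: 1 - 0.5 ** x))  # (2, 2.25)
```

Offering more raises the success probability but shrinks the kept margin r - x - 1; the best response balances the two.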

The Model – the Details

Rarity n: one out of n nodes has the answer, $n = (1-p)^{-1}$

We model the tree T with 2 parameters:

T is a d-ary tree

Each node in T is on-line with probability q

The average branching factor of T is then $b = q \cdot d$

[Figure: example tree T rooted at v*, with d = 3 and q = 0.5.]
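Both parameters are simple to compute, and b = q·d can be checked by simulation. This sketch assumes each of a node's d children is on-line independently with probability q (the independence is our assumption); the helper names and the value p = 0.99 (one node in 100 holds the answer) are illustrative, while d = 3 and q = 0.5 come from the figure.

```python
import random
from typing import Tuple


def model_parameters(p: float, d: int, q: float) -> Tuple[float, float]:
    """Rarity n = 1 / (1 - p) and average branching b = q * d."""
    return 1.0 / (1.0 - p), q * d


def sample_branching(d: int, q: float, trials: int, seed: int = 0) -> float:
    """Empirical mean number of on-line children of a node in a d-ary tree."""
    rng = random.Random(seed)
    total = sum(sum(rng.random() < q for _ in range(d)) for _ in range(trials))
    return total / trials


n, b = model_parameters(p=0.99, d=3, q=0.5)
print(n, b)
print(sample_branching(3, 0.5, 100_000))
```

The simulated mean should land close to b = 1.5, which places this example tree in the 1 < b < 2 regime discussed below.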

Structure of Rewards (1)

How large must the initial reward be to obtain an answer with high probability?

Given: n, the rarity of an answer, and b, the tree branching factor

Let Rσ(n,b) be the reward the root node v* must offer to get the answer with probability σ

What can we say about Rσ(n,b) as we change σ, the probability of getting an answer?

Structure of Rewards (2)

The reward Rσ(n,b) increases with σ in steps (jumps larger than 1)

The steps occur exactly when the reward becomes sufficient to propagate the query one level deeper

[Figure: reward offered by the root, r* = Rσ(n,b), plotted against σ, the probability of getting an answer.]
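Rσ(n,b) can be explored numerically by computing a symmetric equilibrium strategy bottom-up in the reward. This is a hypothetical discretization, not the paper's code: it assumes each of a node's d children is on-line with probability q, an on-line child holds the answer with probability 1/n, a child offered x ≥ 1 answers if it can and otherwise plays f, a child offered 0 does nothing, and equilibrium ties go to the smaller offer. `equilibrium` and `reward_needed` are illustrative names.

```python
from typing import List, Optional, Tuple


def equilibrium(n: float, d: int, q: float, r_max: int) -> Tuple[List[int], List[float]]:
    """Symmetric strategy f[x] and success probabilities A[x] for x = 0..r_max.

    A[x] plays the role of alpha(f, x): the probability that a node offering x
    to its children gets an answer from its subtree (under the assumptions above).
    """
    hold = 1.0 / n
    f = [0] * (r_max + 1)
    A = [0.0] * (r_max + 1)  # offering 0 recruits nobody
    for x in range(1, r_max + 1):
        # Best response of a node offered x: maximize (x - y - 1) * A[y], ties to smaller y.
        f[x] = max(range(x), key=lambda y: ((x - y - 1) * A[y], -y))
        # Probability that a single child slot, offered x, returns an answer.
        child = q * (hold + (1.0 - hold) * A[f[x]])
        A[x] = 1.0 - (1.0 - child) ** d
    return f, A


def reward_needed(sigma: float, n: float, d: int, q: float, r_max: int = 500) -> Optional[int]:
    """Smallest offer x with A[x] >= sigma: a sketch of R_sigma(n, b) with b = q * d."""
    _, A = equilibrium(n, d, q, r_max)
    for x, prob in enumerate(A):
        if prob >= sigma:
            return x
    return None  # sigma not reachable with rewards up to r_max


for sigma in (0.1, 0.3, 0.5):
    # b = 3 (d = 3, q = 1.0) versus b = 1.5 (d = 3, q = 0.5), rarity n = 100
    print(sigma, reward_needed(sigma, n=100, d=3, q=1.0), reward_needed(sigma, n=100, d=3, q=0.5))
```

A result of `None` means the target σ was not reached within `r_max`; with b = 1.5 some targets may be unreachable because the query can die out in a sparse tree.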

Growth rate of Rewards (1)

How does the reward offered by the root, Rσ(n,b), change as a function of the rarity n and the branching b?

For branching factor 1 < b < 2:

$R_\sigma(n,b) = \Omega(n^c)$

where the constant c > 0 depends on 2 - b

A node needs to offer rewards that grow polynomially in n as the answer gets rarer, exponentially larger than the O(log n) rewards needed when b > 2

Growth rate of Rewards (2)

For branching factor b > 2:

$R_\sigma(n,b) = O(\log n)$

where the constant in the O(·) depends on b - 2

The offered reward increases logarithmically with the rarity

Consequences + Conclusion (1)

For branching factors b > 1 the answer is close: the distance to the nearest answer is O(log n)

Each node on the path retains at least a unit of reward

For large branching factors (b > 2), the propagation of queries is very efficient in its use of reward

When 1 < b < 2 the distance to the nearest answer is still O(log n)

But now the reward needed by the root is exponential in that distance
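A back-of-the-envelope calculation shows why the same O(log n) distance can cost so differently. It assumes, as a rough approximation, that about one node in n holds the answer and that the number of nodes within distance h of the root grows like b^h, then substitutes n ≈ b^h into the two growth rates above:

```latex
\begin{aligned}
b^{h} \approx n \;&\Longrightarrow\; h \approx \log_b n = \frac{\ln n}{\ln b} = O(\log n) && (b > 1 \text{ fixed})\\
1 < b < 2:\quad R_\sigma(n,b) &= \Omega(n^{c}) = \Omega\!\left(b^{c h}\right) && \text{exponential in } h\\
b > 2:\quad R_\sigma(n,b) &= O(\log n) = O(h) && \text{linear in } h
\end{aligned}
```

For b > 2 each level costs only a constant amount of reward on average, while for 1 < b < 2 the reward must be multiplied by a constant factor per level.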

Consequences + Conclusion (2)

The result is a surprise, since b = 2 is not a critical point of the branching process itself

For b > 1 the answer lies O(log n) steps away: the network structure becomes robust

But once incentives are taken into account, b = 2 is a critical value: above it, the network supports efficient incentive-based queries

Other interesting papers (1)

The Predictive Power of On-Line Chatter by Gruhl, Guha, Kumar, Novak and Tomkins

Can we say anything about book sales by observing blog postings?

Correlate the blog postings with book sales:

Crawl blogs to get blog postings

Get sales data from Amazon (sales rank)

They automate the process:

Given a set of book titles, automatically design queries for blog posts

Predict whether a spike in the book sales will occur
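One simple way to "correlate the blog postings with book sales" is a lagged Pearson correlation between daily mention counts and a daily sales measure. This is a swapped-in illustrative technique, not necessarily the paper's method; it assumes aligned daily series where larger sales values mean more sales (a raw Amazon sales rank would need to be inverted first). The helper names and the toy numbers are hypothetical, not data from the paper.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)


def lagged_correlation(mentions: Sequence[float], sales: Sequence[float], lag: int) -> float:
    """Correlate blog mentions on day t with sales on day t + lag."""
    if len(mentions) != len(sales) or not 0 <= lag < len(mentions) - 1:
        raise ValueError("series must align and leave at least two overlapping days")
    return pearson(mentions[: len(mentions) - lag], sales[lag:])


# Toy series: the sales bump trails the blog bump by one day.
mentions = [0, 1, 5, 2, 0, 0]
sales = [0, 0, 1, 5, 2, 0]
for lag in range(3):
    print(lag, round(lagged_correlation(mentions, sales, lag), 3))
```

A peak at a positive lag suggests that chatter leads sales, which is the kind of signal a spike predictor could use.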

Predictive power of on-line chatter

[Figure: example for the book “The Lance Armstrong Performance Program”]

Other interesting papers (2)

Nomograms for Visualizing Support Vector Machines by Jakulin, Mozina, Demsar, Bratko and Zupan

A nomogram places the axes parallel to each other instead of at right angles

Can work with a large number of attributes

Supports either regression or classification

Can be generalized to visualize any Generalized Additive Model (decomposable kernels)

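Example – Boston Housing

[Figure: nomogram for the Boston Housing data, outcome Expensive vs. Cheap. Labels: attributes with their possible values and the values taken; log odds ratio (the odds in favor of a cheap area); aggregate OR; outcome probability; prediction.]

$$\log \frac{p_{\text{cheap}}}{p_{\text{expensive}}} = \underbrace{\text{intercept}}_{\text{prior}} + \sum_{\text{attributes}} \underbrace{\text{effect}}_{\text{evidence}}$$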
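Reading a prediction off the nomogram amounts to adding the per-attribute effects to the intercept and mapping the resulting log odds to a probability. A minimal sketch, assuming the additive log-odds form above; the attribute names and effect values are hypothetical placeholders, not numbers read off the slide.

```python
import math
from typing import Dict


def nomogram_predict(intercept: float, effects: Dict[str, float]) -> float:
    """P(cheap) from log(p_cheap / p_expensive) = intercept + sum of per-attribute effects."""
    log_odds = intercept + sum(effects.values())
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)  # this branch avoids overflow for very negative log odds
    return z / (1.0 + z)


# Hypothetical attribute effects, for illustration only.
print(nomogram_predict(-0.2, {"attribute_A": 0.9, "attribute_B": -0.3}))
```

Because the model is additive, each attribute's contribution can be drawn on its own parallel axis, which is what makes the nomogram layout possible.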

Other interesting papers (3)

Unweaving a Web of Documents by Guha, Kumar, Sivakumar and Sundaram

Given a set of time-stamped documents (e.g. news), we want to decompose them into semantically coherent threads

Algorithm:

Create a relevance graph pointing back in time

Decompose the graph into node-disjoint paths

Formulate this as a minimum cost flow problem: given a graph with costs and capacities, push as much flow as possible at minimum cost
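One way to realize "node-disjoint paths via minimum cost flow" is the standard node-splitting reduction: each document gets an "out" copy (it may pick one earlier predecessor) and an "in" copy (it may be picked by at most one later document), and a min-cost max-flow over unit capacities selects the links. This is our own sketch, not necessarily the paper's exact construction: it assumes documents are numbered 0..n-1 in time order, each relevance edge comes with a cost (lower meaning more coherent), and successive shortest paths with Bellman-Ford are used. `unweave` is a hypothetical name.

```python
from typing import List, Tuple


def unweave(n_docs: int, links: List[Tuple[int, int, int]]) -> List[List[int]]:
    """Split documents 0..n_docs-1 into node-disjoint threads via min-cost max-flow.

    links: (later, earlier, cost) edges of the relevance graph, pointing back in time.
    """
    S, T = 2 * n_docs, 2 * n_docs + 1
    graph: List[List[int]] = [[] for _ in range(2 * n_docs + 2)]
    to: List[int] = []
    cap: List[int] = []
    cost: List[int] = []

    def add_edge(u: int, v: int, c: int, w: int) -> None:
        # Forward edge gets an even index, its residual twin the next odd one.
        for a, b, cc, ww in ((u, v, c, w), (v, u, 0, -w)):
            graph[a].append(len(to))
            to.append(b)
            cap.append(cc)
            cost.append(ww)

    for d in range(n_docs):
        add_edge(S, d, 1, 0)           # "out" copy: d links to at most one earlier doc
        add_edge(n_docs + d, T, 1, 0)  # "in" copy: d is followed by at most one later doc
    link_edges = []
    for later, earlier, c in links:
        if not 0 <= earlier < later < n_docs:
            raise ValueError("links must point back in time")
        link_edges.append((len(to), later, earlier))
        add_edge(later, n_docs + earlier, 1, c)

    INF = float("inf")
    while True:  # successive shortest augmenting paths (Bellman-Ford on the residual graph)
        dist = [INF] * len(graph)
        dist[S] = 0
        prev_edge = [-1] * len(graph)
        for _ in range(len(graph) - 1):
            updated = False
            for u in range(len(graph)):
                if dist[u] == INF:
                    continue
                for e in graph[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[to[e]]:
                        dist[to[e]] = dist[u] + cost[e]
                        prev_edge[to[e]] = e
                        updated = True
            if not updated:
                break
        if dist[T] == INF:
            break  # maximum flow reached
        v = T
        while v != S:  # push one unit back along the path
            e = prev_edge[v]
            cap[e] -= 1
            cap[e ^ 1] += 1
            v = to[e ^ 1]

    succ, has_pred = {}, set()
    for e, later, earlier in link_edges:
        if cap[e] == 0:  # link used by the flow
            succ[earlier] = later
            has_pred.add(later)
    threads = []
    for d in range(n_docs):
        if d not in has_pred:  # earliest document of a thread
            thread = [d]
            while thread[-1] in succ:
                thread.append(succ[thread[-1]])
            threads.append(thread)
    return threads


print(unweave(4, [(2, 0, 1), (3, 1, 1), (3, 0, 10), (2, 1, 10)]))
```

Maximizing flow minimizes the number of threads, and among such covers the cost term prefers the most coherent links.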

Unweaving a Web of Documents

[Figure: a thread found by the algorithm: IRA attacks]
