reputation systems i

49
Reputation Systems I HITS, PageRank, SALSA, eBay, EigenTrust, VKontakte Yury Lifshits Caltech http://yury.name Caltech CMI Seminar March 4, 2008 1 / 32

Upload: yury-lifshits

Post on 11-May-2015

2.043 views

Category:

Economy & Finance


0 download

DESCRIPTION

More details at http://yury.name/reputation.html

TRANSCRIPT

Page 1: Reputation Systems I

Reputation Systems IHITS, PageRank, SALSA,

eBay, EigenTrust, VKontakte

Yury LifshitsCaltech

http://yury.name

Caltech CMI SeminarMarch 4, 2008

1 /32

Page 2: Reputation Systems I

Wiki Definition

Reputation is the opinion (more technically, asocial evaluation) of the public toward aperson, a group of people, or an organization

2 /32

Page 3: Reputation Systems I

Outline

1 Intro

2 Reputations in Hyperlink GraphsHITSPageRankSALSA

3 Trust ReputationseBayEigenTrust

4 Personal ReputationsVKontakte

3 /32

Page 4: Reputation Systems I

1Introduction to Reputations

4 /32

Page 5: Reputation Systems I

Applications

Search

Trust and recommendations

Motivating openness & contribution

Keeping users engaged

Spam protection

Loyalty programs

Online systems: Slashdot, ePinions, Amazon, eBay,Yahoo! Answers, Digg, Wikipedia, World of Warcraft,BizRate.

Russian systems: Habr, VKontakte, Photosight

5 /32

Page 6: Reputation Systems I

Applications

Search

Trust and recommendations

Motivating openness & contribution

Keeping users engaged

Spam protection

Loyalty programs

Online systems: Slashdot, ePinions, Amazon, eBay,Yahoo! Answers, Digg, Wikipedia, World of Warcraft,BizRate.

Russian systems: Habr, VKontakte, Photosight

5 /32

Page 7: Reputation Systems I

Applications

Search

Trust and recommendations

Motivating openness & contribution

Keeping users engaged

Spam protection

Loyalty programs

Online systems: Slashdot, ePinions, Amazon, eBay,Yahoo! Answers, Digg, Wikipedia, World of Warcraft,BizRate.

Russian systems: Habr, VKontakte, Photosight 5 /32

Page 8: Reputation Systems I

Aspects

Input information

Benefits of reputation

Centralized/decentralized

Spam protection mechanisms

6 /32

Page 9: Reputation Systems I

Main Ideas

Random walk model

Rights, limits and thresholds

Real name, photo, contact and profileinformation

7 /32

Page 10: Reputation Systems I

Challenges

Spam protection

Fast computing

General theory, taxonomy of existingsystems

Reputation exchange market

What’s inside the real systems?

8 /32

Page 11: Reputation Systems I

2Reputations in Hyperlink Graphs

9 /32

Page 12: Reputation Systems I

Challenge

How to define the most relevant webpage to“Bill Gates”?

Naive ideas

By frequency of query words in a webpage

By number of links from other relevantpages

10 /32

Page 13: Reputation Systems I

Challenge

How to define the most relevant webpage to“Bill Gates”?

Naive ideas

By frequency of query words in a webpage

By number of links from other relevantpages

10 /32

Page 14: Reputation Systems I

Web Search: Formal Settings

Every webpage is represented as aweighted set of keywords

There are hyperlinks (directed edges)between webpages

Conceptual problem: define a relevancerank based on keyword weights and linkstructure of the web

11 /32

Page 15: Reputation Systems I

Web Search: Formal Settings

Every webpage is represented as aweighted set of keywords

There are hyperlinks (directed edges)between webpages

Conceptual problem: define a relevancerank based on keyword weights and linkstructure of the web

11 /32

Page 16: Reputation Systems I

HITS Algorithm

1 Given a query construct a focusedsubgraph F(q) of the web

2 Compute hubs and authorities ranks forall vertices in F(q)

Focused subgraph: pages with highestweights of query words and pageshyperlinked with them

12 /32

Page 17: Reputation Systems I

HITS Algorithm

1 Given a query construct a focusedsubgraph F(q) of the web

2 Compute hubs and authorities ranks forall vertices in F(q)

Focused subgraph: pages with highestweights of query words and pageshyperlinked with them

12 /32

Page 18: Reputation Systems I

Hubs and Authorities

Mutual reinforcing relationship:

A good hub is a webpage with many linksto query-authoritative pages

A good authority is a webpage with manylinks from query-related hubs

13 /32

Page 19: Reputation Systems I

Hubs and Authorities: Equations

a(p) ∼∑

q:(q,p)∈Eh(q)

h(p) ∼∑

q:(p,q)∈Ea(q)

14 /32

Page 20: Reputation Systems I

Hubs and Authorities: SolutionInitial estimate:

∀p : a0(p) = 1,h0(p) = 1

Iteration:

ak+1(p) =∑

q:(q,p)∈Ehk(q)

hk+1(p) =∑

q:(p,q)∈Eak(q)

We normalize ak, hk after every step15 /32

Page 21: Reputation Systems I

Convergence Theorem

TheoremLet M be the adjacency matrix of focusedsubgraph F(query). Then ak converges toprincipal eigenvector of MTM and hkconverges to principal eigenvector of MMT

16 /32

Page 22: Reputation Systems I

Lessons from HITS

Link structure is useful for relevancesorting

Link popularity is defined by linearequations

Solution can be computed by iterativealgorithm

17 /32

Page 23: Reputation Systems I

PageRank: Problem Statement

Compute “quality” of every page

Idea: base quality on the number of referringpages and their own quality

Other factors:Frequency of updatesNumber of visitorsRegistration in affiliated directory

18 /32

Page 24: Reputation Systems I

PageRank: Problem Statement

Compute “quality” of every page

Idea: base quality on the number of referringpages and their own quality

Other factors:Frequency of updatesNumber of visitorsRegistration in affiliated directory

18 /32

Page 25: Reputation Systems I

PageRank: Problem Statement

Compute “quality” of every page

Idea: base quality on the number of referringpages and their own quality

Other factors:Frequency of updatesNumber of visitorsRegistration in affiliated directory

18 /32

Page 26: Reputation Systems I

Random Walk ModelNetwork:NodesDirected edges (hyperlinks)

Model of random surferStart in a random nodeUse a random outgoing edgewith probability 1− ϵMove to a random node with probability ϵ

Limit probabilitiesFor every k the value PRk(i) is defined asprobability to be in the node i after k stepsFact: limk→∞ PRk(i) = PR(i), i.e.all probabilities converge to some limit ones

19 /32

Page 27: Reputation Systems I

Random Walk ModelNetwork:NodesDirected edges (hyperlinks)

Model of random surferStart in a random nodeUse a random outgoing edgewith probability 1− ϵMove to a random node with probability ϵ

Limit probabilitiesFor every k the value PRk(i) is defined asprobability to be in the node i after k stepsFact: limk→∞ PRk(i) = PR(i), i.e.all probabilities converge to some limit ones

19 /32

Page 28: Reputation Systems I

Random Walk ModelNetwork:NodesDirected edges (hyperlinks)

Model of random surferStart in a random nodeUse a random outgoing edgewith probability 1− ϵMove to a random node with probability ϵ

Limit probabilitiesFor every k the value PRk(i) is defined asprobability to be in the node i after k stepsFact: limk→∞ PRk(i) = PR(i), i.e.all probabilities converge to some limit ones19 /32

Page 29: Reputation Systems I

20 /32

Page 30: Reputation Systems I

PageRank EquationLet T1, . . . ,Tn be the nodes referring to iLet C(X) denote the out-degree of X

Claim: PR(i) = ϵ/N+ (1− ϵ)∑n

i=1PR(Ti)C(Ti)

Proof?

By definition of PRk(i):PR0(i) = 1/NPRk(i) = ϵ/N+ (1− ϵ)

∑ni=1

PRk−1(Ti)C(Ti)

Then just take the limits of both sides

Practical solution: to use PR50(i) computedvia iterative formula instead of PR(i)

21 /32

Page 31: Reputation Systems I

PageRank EquationLet T1, . . . ,Tn be the nodes referring to iLet C(X) denote the out-degree of X

Claim: PR(i) = ϵ/N+ (1− ϵ)∑n

i=1PR(Ti)C(Ti)

Proof?

By definition of PRk(i):PR0(i) = 1/NPRk(i) = ϵ/N+ (1− ϵ)

∑ni=1

PRk−1(Ti)C(Ti)

Then just take the limits of both sides

Practical solution: to use PR50(i) computedvia iterative formula instead of PR(i)

21 /32

Page 32: Reputation Systems I

PageRank EquationLet T1, . . . ,Tn be the nodes referring to iLet C(X) denote the out-degree of X

Claim: PR(i) = ϵ/N+ (1− ϵ)∑n

i=1PR(Ti)C(Ti)

Proof?

By definition of PRk(i):PR0(i) = 1/NPRk(i) = ϵ/N+ (1− ϵ)

∑ni=1

PRk−1(Ti)C(Ti)

Then just take the limits of both sides

Practical solution: to use PR50(i) computedvia iterative formula instead of PR(i)

21 /32

Page 33: Reputation Systems I

PageRank EquationLet T1, . . . ,Tn be the nodes referring to iLet C(X) denote the out-degree of X

Claim: PR(i) = ϵ/N+ (1− ϵ)∑n

i=1PR(Ti)C(Ti)

Proof?

By definition of PRk(i):PR0(i) = 1/NPRk(i) = ϵ/N+ (1− ϵ)

∑ni=1

PRk−1(Ti)C(Ti)

Then just take the limits of both sides

Practical solution: to use PR50(i) computedvia iterative formula instead of PR(i)

21 /32

Page 34: Reputation Systems I

PageRank as an EigenvectorLet us define a matrix L:lij := ϵ/N, if there is no edge from i to jlij := ϵ/N+ (1− ϵ) · 1

C(j), if there is an edge

Notation:PRk = (PRk(1), . . . ,PRk(N))

PR = (PR(1), . . . ,PR(N))

We have:PRk = Lk PR0PR = LPR

22 /32

Page 35: Reputation Systems I

PageRank as an EigenvectorLet us define a matrix L:lij := ϵ/N, if there is no edge from i to jlij := ϵ/N+ (1− ϵ) · 1

C(j), if there is an edge

Notation:PRk = (PRk(1), . . . ,PRk(N))

PR = (PR(1), . . . ,PR(N))

We have:PRk = Lk PR0PR = LPR

22 /32

Page 36: Reputation Systems I

PageRank as an EigenvectorLet us define a matrix L:lij := ϵ/N, if there is no edge from i to jlij := ϵ/N+ (1− ϵ) · 1

C(j), if there is an edge

Notation:PRk = (PRk(1), . . . ,PRk(N))

PR = (PR(1), . . . ,PR(N))

We have:PRk = Lk PR0PR = LPR

22 /32

Page 37: Reputation Systems I

PageRank as an EigenvectorLet us define a matrix L:lij := ϵ/N, if there is no edge from i to jlij := ϵ/N+ (1− ϵ) · 1

C(j), if there is an edge

Notation:PRk = (PRk(1), . . . ,PRk(N))

PR = (PR(1), . . . ,PR(N))

We have:PRk = Lk PR0PR = LPR

22 /32

Page 38: Reputation Systems I

SALSA

Construct query-specific directed graphF(q)

Transform F(q) into undirected bipartiteundirected graph W

Define its column weighted and rowweighted versions Wc,Wr

Consider “hub-authority” random walk:a(k+1) = WT

cWra(k)

Define authorities as the limit value of a(k)

vector23 /32

Page 39: Reputation Systems I

3Trust Reputations

24 /32

Page 40: Reputation Systems I

eBay

Buyers and sellers

Bidirectional feedback evaluation afterevery transaction

eBay Feedback: +/-, four criteria-specificratings, text comment

Total score: sum of +/- Feedback points

1, 6, 12, months and lifetime versions

25 /32

Page 41: Reputation Systems I

EigenTrust

Local trust cij ≥ 0 is based on personalexperience

Normalization∑n

j=1cij = 1

Experience matrix C

Trust equation t(k)

i =∑n

j=1cij · t

(k−1)

j

t(k)

i = (CT)nci

Trust vector t is the principle eigenvectorof C: t = lim t(k)

i

26 /32

Page 42: Reputation Systems I

EigenTrust: Pre-Trusted Nodes

Starting vector. Let P is the set ofpre-trusted nodes. Use t(0) = 1/ |P |

Local trust. Assume ϵ local trust from anynode to any pre-trusted node

27 /32

Page 43: Reputation Systems I

4Personal Reputations

28 /32

Page 44: Reputation Systems I

VKontakte

What is VKontakte.ru?

Russian “Facebook-style” website

Name means “in touch” in Russian

8.5M users (February 2008)

Working on English language version

29 /32

Page 45: Reputation Systems I

VKontakte Rating1 First 100 points: real name and photo, profile

completeness

2 Then: paid points (via SMS) gifted by yoursupporters

3 Any person has 1 free reference link, initiallypointing to a person who invited him to VKontakte.Bonus points (acquired by rules 2 and 3) arepropagating with 1/4 factor by reference links.

Rating benefits:

Basis for sorting: friends lists, group members,event attendees

Bias for “random six friends” selection

30 /32

Page 46: Reputation Systems I

VKontakte Rating1 First 100 points: real name and photo, profile

completeness

2 Then: paid points (via SMS) gifted by yoursupporters

3 Any person has 1 free reference link, initiallypointing to a person who invited him to VKontakte.Bonus points (acquired by rules 2 and 3) arepropagating with 1/4 factor by reference links.

Rating benefits:

Basis for sorting: friends lists, group members,event attendees

Bias for “random six friends” selection30 /32

Page 47: Reputation Systems I

References

J. Kleinberg

Authoritative sources in a hyperlinked environment

L. Page, S. Brin, R. Motwani, T. Winograd

The Pagerank citation ranking: Bringing order to the web

R. Lempel, S. Moran

The stochastic approach for link-structure analysis (SALSA) and the TKC effect

D. Houser, J. Wooders

Reputation in Auctions: Theory, and Evidence from eBay

S.D. Kamvar, M.T. Schlosser, H. Garcia-Molina

The Eigentrust algorithm for reputation management in P2P networks

VKontakte Team

http://vkontakte.ru/rate.php?act=help (in Russian)

31 /32

Page 48: Reputation Systems I

http://yury.nameOngoing project: http://businessconsumer.net

Thanks for your attention!Questions?

Second part (March 11, 4pm):

Spam protection for reputations

Open problems

32 /32

Page 49: Reputation Systems I

http://yury.nameOngoing project: http://businessconsumer.net

Thanks for your attention!Questions?

Second part (March 11, 4pm):

Spam protection for reputations

Open problems

32 /32