Post on 18-Jul-2020
Page 1: Efficient Algorithms for Personalized PageRankdimacs.rutgers.edu/Workshops/ParallelAlgorithms/... · A Dark Test for Twitter’s People Recommendation System Run various algorithms

Sublinear Algorithms for Personalized PageRank, with

Applications

Ashish Goel

Joint work with Peter Lofgren; Sid Banerjee; C Seshadhri

Page 2:

Personalized PageRank

Assume a directed graph with n nodes and m edges

Page 3:

Motivation: Personalized Search

Page 4:

Motivation: Personalized Search Re-ranked by PPR

Page 5:

Applications

• Personalized Web Search [Haveliwala, 2003]

• Product Recommendation [Baluja, et al, 2008]

• Friend Recommendation (SALSA) [Gupta et al, 2013]

Page 6:
Page 7:

A Dark Test for Twitter’s People Recommendation System

Run various algorithms to predict follows, but don’t display the results. Instead, just observe how many of the top predictions get followed organically

(Money = Personalized PageRank on a bipartite graph; Love = HITS)

[Bahmani, Chowdhury, Goel; 2010]

Page 8:

Promoted Tweets and Promoted Accounts

Page 9:

Applications

• Community Detection

– Personalized PageRank [Yang, Leskovec 2015], [Andersen, Chung, Lang 2006]

Page 10:

Estimation Goal

Page 11:

The Challenge

• Every user has a different score vector. Full pre-computation: O(n^2)

• Computing from scratch previously took Ω(n) time—several minutes on Twitter-2010

Page 12:

Previous Algorithms Summary

• Monte-Carlo: Sample random walks.

• (Local) Power-Iteration: Iteratively improve estimates based on recursive equation

Page 13:

Results Preview (Experiment)

[Figure: bar chart of running time per estimate (s) for Monte-Carlo, Power Iteration, and Bidirectional. Runtime on Twitter-2010 (1.5 billion edges). Bar labels: 3s, 6 min, 10+ min.]

Mean relative error set to ≈10% for all algorithms.

Page 14:

Results Preview (Theory)

• Task: estimate a Personalized PageRank score of size δ within relative error ε

• Previous Algorithms:

– Monte Carlo:

– Power Iteration/ Local Update:

• Bidirectional Estimator for average target:

On Twitter-2010, n=40M (# nodes), m=1.5B (# edges), =40K

Page 15:

Generalizations

• Arbitrary starting distributions.

Uniform ⇒ Global PageRank in average time

• Other Walk Length Distributions like Heat Kernel (used in community detection [Kloster, Gleich 2014],[Chung 2007]): Our estimator is 100x faster on 4 graphs

• Arbitrary Discrete Markov Chain

Page 16:

Previous Algorithm: Monte-Carlo

[Avrachenkov, et al 2007]

Page 17:

Previous Algorithm: Local Update

[Andersen, et al 2007]

• Computes Personalized PageRank estimates from all sources s to a single target t

• Works from t backwards along edges, updating Personalized PageRank estimates locally.

• Running time for average t: depends on the average degree and the additive error
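The backward local update can be sketched as a "reverse push" loop. This is a minimal sketch under stated assumptions, not the cited authors' code: the adjacency-list format, `alpha = 0.2`, the threshold name `r_max`, the toy graph, and the function name are illustrative. The target starts with residual 1; pushing a node moves an `alpha` fraction of its residual into its estimate and spreads the rest to its in-neighbors, scaled by each in-neighbor's out-degree.

```python
from collections import defaultdict, deque

def reverse_push(graph, target, alpha=0.2, r_max=1e-4):
    """Return (estimates p, residuals r) of PPR to `target` from every source.

    graph: dict mapping node -> list of out-neighbors (assumed format).
    r_max: additive-error threshold; pushing stops once every residual
           is at most r_max.
    """
    in_nbrs = defaultdict(list)
    for u, outs in graph.items():
        for v in outs:
            in_nbrs[v].append(u)

    p = defaultdict(float)   # estimates
    r = defaultdict(float)   # residuals
    r[target] = 1.0
    queue = deque([target])
    while queue:
        v = queue.popleft()
        rv = r[v]
        if rv <= r_max:
            continue  # stale queue entry, nothing left to push
        p[v] += alpha * rv
        r[v] = 0.0
        for u in in_nbrs[v]:
            # u reaches v with probability 1 / outdeg(u) per step.
            r[u] += (1 - alpha) * rv / len(graph[u])
            if r[u] > r_max:
                queue.append(u)
    return dict(p), dict(r)

# Hypothetical toy graph: s -> a, b; a -> t; b -> t; t -> s.
toy_graph = {"s": ["a", "b"], "a": ["t"], "b": ["t"], "t": ["s"]}
p, r = reverse_push(toy_graph, "t")
```

On termination every residual is at most `r_max`, so each estimate `p[s]` is within additive error `r_max` of the true score from `s` to the target.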

Page 18:

Local Update Background

Page 19:

Local Update Example

Page 20:

Local Update Example

Page 21:

Local Update Example

Page 22:

Local Update Example

Page 23:

Local Update Example

Page 24:

Local Update Example

Page 25:

Analogy: Bidirectional Shortest Path

Page 26:

Bidirectional Estimation

The estimates p and residuals r satisfy a loop invariant [Andersen, et al 2007]:

Reinterpret the residuals as an expectation!
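In the notation commonly used for the reverse local-update algorithm, the invariant and its reading as an expectation can be written as below. The symbols are assumptions and may differ from the slide: $\pi_s(t)$ is the Personalized PageRank of target $t$ from source $s$, $p_t$ holds the estimates, and $r_t$ holds the residuals.

```latex
\pi_s(t) \;=\; p_t(s) + \sum_{v \in V} \pi_s(v)\, r_t(v)
\qquad\Longrightarrow\qquad
\pi_s(t) \;=\; p_t(s) + \mathbb{E}_{v \sim \pi_s}\!\left[ r_t(v) \right]
```

The sum is an expectation because $\pi_s$ is a probability distribution over the endpoints of random walks from $s$, so sampling walk endpoints and averaging their residuals gives an unbiased estimate of the correction term.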

Page 27:

Bidirectional-PPR Algorithm
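A minimal sketch of how the two directions could be combined, assuming the usual formulation (reverse pushes from the target, then forward walks from the source whose endpoint residuals are averaged). The graph format, parameter values, toy graph, and all function names are assumptions for illustration, and every node is assumed to have at least one out-edge.

```python
import random
from collections import defaultdict, deque

def reverse_push(graph, target, alpha, r_max):
    """Reverse local update: estimates p and residuals r to `target`."""
    in_nbrs = defaultdict(list)
    for u, outs in graph.items():
        for v in outs:
            in_nbrs[v].append(u)
    p, r = defaultdict(float), defaultdict(float)
    r[target] = 1.0
    queue = deque([target])
    while queue:
        v = queue.popleft()
        rv = r[v]
        if rv <= r_max:
            continue
        p[v] += alpha * rv
        r[v] = 0.0
        for u in in_nbrs[v]:
            r[u] += (1 - alpha) * rv / len(graph[u])
            if r[u] > r_max:
                queue.append(u)
    return p, r

def walk_endpoint(graph, source, alpha, rng):
    """Endpoint of one random walk that stops with probability alpha per step."""
    node = source
    while rng.random() > alpha:
        node = rng.choice(graph[node])
    return node

def bidirectional_ppr(graph, source, target, alpha=0.2, r_max=0.01,
                      num_walks=5000, seed=0):
    """Estimate PPR from source to target: p[source] + mean endpoint residual."""
    p, r = reverse_push(graph, target, alpha, r_max)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_walks):
        total += r.get(walk_endpoint(graph, source, alpha, rng), 0.0)
    return p.get(source, 0.0) + total / num_walks

# Hypothetical toy graph: s -> a, b; a -> t; b -> t; t -> s.
toy_graph = {"s": ["a", "b"], "a": ["t"], "b": ["t"], "t": ["s"]}
estimate = bidirectional_ppr(toy_graph, "s", "t")
```

Because every sampled residual is at most `r_max`, far fewer walks are needed than in plain Monte Carlo; lowering `r_max` shifts work from the walks to the reverse pushes.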

Page 28:

Number of samples

Every walk gives a sample, with

– Maximum value r_max

– Expected value at least δ

Number of walks needed to get a (1 ± ε)-approximation with high probability =
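A standard multiplicative Chernoff bound for independent samples in $[0, r_{\max}]$ with mean at least $\delta$ gives a walk count of the following form. The failure probability $p_{\mathrm{fail}}$ is a symbol introduced for this sketch, and constants are omitted.

```latex
w \;=\; O\!\left( \frac{r_{\max}}{\epsilon^{2}\,\delta}\; \log\frac{1}{p_{\mathrm{fail}}} \right)
```

The count scales with $r_{\max}$, so a smaller residual threshold in the reverse phase directly reduces the number of forward walks.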

Page 29:

Bidirectional-PPR Example

Page 30:

Theoretical Results

Page 31:

Forward vs Reverse Work Trade-off

[Figure: forward walks and reverse pushes meeting at a node u. Annotations: more walks, fewer reverse pushes; more reverse pushes, fewer walks.]

Page 32:

Theoretical Results

Time-Space Trade-off [Lofgren, Banerjee, Goel, Seshadhri 2014; Lofgren, Banerjee, Goel 2015]

Page 33:

Problem: Unbalanced Forward and Reverse Runtime

[Figure: runtime (s) versus global PageRank of target. Labels: 0.5 s Forward, 0.1 ms Reverse, 1000 s Reverse.]

Page 34:

Heuristic: Balancing Forward and Reverse Runtime

[Figure: runtime (s) versus global PageRank of target. Labels: 0.5 s Forward, 0.1 ms Reverse, 1000 s Reverse, 50s Forward and Reverse. Median is 25 ms Forward and Reverse.]

Page 35:

Experiments

Page 36:

Experimental Results: 70x Faster

Mean relative error set to ≈10% for all algorithms.

[Figure: Running Time (Targets PageRank). Axis: running time per estimate (ms). Labels: 20s, 50ms, 5min, 3s.]

Page 37:

Alternative Estimator for Undirected Graphs

Key property:

We push forwards from s, and take random walks from t.

Page 38:

Alternative Algorithm for Undirected Graphs

• Loop Invariant of push-forward algorithm [Andersen, Chung, Lang, 2006]

• Use symmetry, and then interpret as expectation

Page 39:

Page 40:

Running time for Undirected Graphs

Page 41:

Open Problems

• Get rid of the dependence on degree, to get an amortized bound of O(δ^{1/2})

• Get a worst-case bound of O(m^{1/2}) for directed graphs under the condition that the target has a high global PageRank

• Find sharding and sampling algorithms that preserve Personalized PageRank (e.g., a sparsifier for Personalized PageRank?)

• Build an index around Personalized PageRank to enable network-based Personalized Search

Page 42:

Page 43:

Personalized Search Problem

Given

– A network with nodes (with keywords) and edges (weighted, directed)—Twitter

– A query, filtering nodes to a set T— “People named Adam”

– A user s (or distribution over nodes) —me

Rank the approximate top-k targets by

Page 44:

Personalized Search Problem

Baselines:

– Monte Carlo: Needs many walks to find enough samples within T unless T is very large

– Bidirectional-PPR to each t: Slow unless T is small

Challenge: Can we efficiently find top-k for any size of T?

Idea: Modify Bidirectional-PPR to sample in proportion to

Page 45:

Personalized Search Example

[Figure: graph with the searching user s, intermediate nodes a, b, c, and targets t1, t2, t3 (people named Adam). Annotations: Expand targets, Random walks.]

To sample a target:

– Layer 1: sample (a, b, c) w.p. (0, 10%, 90%)

– Layer 2: b → t1; c → sample (t1, t2, t3) w.p. (56%, 22%, 22%)
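The two-layer sampling in this example can be reproduced with a short sketch. The probabilities are the ones given in the example; the dictionary layout and the function names `sample_target` and `marginal` are assumptions made for illustration.

```python
import random

# Layer 1: probability of picking each intermediate node (from the example).
layer1 = {"a": 0.0, "b": 0.10, "c": 0.90}
# Layer 2: probability of each target given the intermediate node.
layer2 = {
    "b": {"t1": 1.0},
    "c": {"t1": 0.56, "t2": 0.22, "t3": 0.22},
}

def sample_target(rng):
    """Draw one target: pick an intermediate node, then a target below it."""
    mids = list(layer1)
    mid = rng.choices(mids, weights=[layer1[m] for m in mids])[0]
    targets = list(layer2[mid])
    return rng.choices(targets, weights=[layer2[mid][t] for t in targets])[0]

def marginal():
    """Overall probability of each target, summing over intermediate nodes."""
    out = {}
    for mid, weight in layer1.items():
        for target, prob in layer2.get(mid, {}).items():
            out[target] = out.get(target, 0.0) + weight * prob
    return out

probs = marginal()
draw = sample_target(random.Random(0))
```

With these numbers, t1 is sampled with overall probability 0.10 · 1 + 0.90 · 0.56 = 0.604, and t2 and t3 with probability 0.198 each.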

Page 46:

Personalized Search Running Time

[Figure: running time per search (s) versus target set size. Runtime on Twitter-2010 (1.5 billion edges). Axis ticks: 10, 100, 1000, 10,000 and 0.1, 1, 10, 100, 1000.]

Precision@3 set to 90% for all algorithms.

Significant Pre-computation (3-30MB per keyword)

Page 47:

Personalized Search Result

Page 48:

Demo

Page 49:

Page 50:

Page 51:

Demo Task: Find applications of entropy in computer networking.

personalizedsearchdemo.com

Page 52:

Distributed PageRank

• Problem: Computing PageRank on graph too large for one machine.

• Algorithm:

– Shard edges randomly

– Compute on each machine

– Average results

• Basic idea: Duplicate edges from low-degree nodes. Gives an unbiased* estimator.
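The shard, compute, and average steps can be sketched on a single machine as below. This is an illustration under assumptions, not the method evaluated in the talk: the edge-list format, `alpha = 0.2`, the shard count, the toy graph, and the function names are hypothetical, and the duplication of edges from low-degree nodes is not implemented because its details are not given here.

```python
import random

def pagerank(nodes, edges, alpha=0.2, iters=50):
    """Global PageRank by power iteration on one edge list."""
    out = {v: [] for v in nodes}
    for u, v in edges:
        out[u].append(v)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: alpha / n for v in nodes}
        for u in nodes:
            share = (1 - alpha) * rank[u]
            if out[u]:
                for v in out[u]:
                    nxt[v] += share / len(out[u])
            else:
                # Node has no out-edges in this shard: spread uniformly.
                for v in nodes:
                    nxt[v] += share / n
        rank = nxt
    return rank

def sharded_pagerank(nodes, edges, num_shards=3, seed=0):
    """Assign each edge to a random shard, rank each shard, average."""
    rng = random.Random(seed)
    shards = [[] for _ in range(num_shards)]
    for edge in edges:
        shards[rng.randrange(num_shards)].append(edge)
    ranks = [pagerank(nodes, shard) for shard in shards]
    return {v: sum(r[v] for r in ranks) / num_shards for v in nodes}

# Hypothetical toy graph.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "c"), ("d", "a")]
averaged = sharded_pagerank(nodes, edges)
```

A low-degree node can lose all of its out-edges within a shard, which distorts that shard's walk distribution; this is the situation that duplicating edges from low-degree nodes is meant to address.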

Page 53:

Sharded Global PageRank Accuracy on Twitter-2010

[Figure: L1 error versus min degree parameter, comparing No Edge Duplication with Average 2.1x Duplication.]