Fast Random Walk with Restart and Its Applications
Hanghang Tong, Christos Faloutsos and Jia-Yu (Tim) Pan
ICDM 2006, Dec. 18-22, Hong Kong
Motivating Questions
• Q: How to measure the relevance?
• A: Random walk with restart
• Q: How to do it efficiently?
• A: This talk tries to answer!
Random walk with restart
[Figure: example graph with 12 nodes]
Query: Node 4
Ranking vector $r_4$:
Node    1     2     3     4     5     6     7     8     9     10    11    12
Score   0.13  0.10  0.13  0.22  0.13  0.05  0.05  0.08  0.04  0.03  0.04  0.02
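One way to read the ranking vector is as a long-run visiting frequency. A walker starts at the query node. At each step it moves to a uniformly random neighbor with probability c, or jumps back to the query node with probability 1 - c. The Monte Carlo sketch below illustrates that reading on the example graph. It makes three assumptions: c = 0.9, the value used in this talk's example; an edge list that we reconstructed from the example's adjacency matrix W; and the hypothetical helper name `simulate_rwr`, which is not part of FastRWR.

```python
import random

# Undirected edges of the 12-node example graph (1-based ids), reconstructed
# from the nonzero pattern of the example's adjacency matrix W (assumption).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]


def adjacency_lists(edges):
    """Map each node to the list of its neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def simulate_rwr(edges, start, c=0.9, steps=200_000, seed=0):
    """Monte Carlo estimate of RWR scores for `start` (hypothetical helper).

    At each step the walker moves to a uniformly random neighbor with
    probability c, or restarts at `start` with probability 1 - c. The score
    of a node is the fraction of steps that end there.
    """
    rng = random.Random(seed)
    adj = adjacency_lists(edges)
    visits = {node: 0 for node in adj}
    node = start
    for _ in range(steps):
        node = rng.choice(adj[node]) if rng.random() < c else start
        visits[node] += 1
    return {n: k / steps for n, k in sorted(visits.items())}


scores = simulate_rwr(EDGES, start=4)
print({n: round(s, 2) for n, s in scores.items()})
```

Node 4 comes out on top, with nodes 1, 3 and 5 next. The sampling noise shrinks as `steps` grows.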
Automatic Image Caption
• [Pan KDD04]
[Figure: graph of Text, Image, and Region nodes for a Test Image; word labels include Jet, Plane, Runway, Candy, Texture, Background]
Neighborhood Formulation
• [Sun ICDM05]
Center-Piece Subgraph
• [Tong KDD06]
[Figure: center-piece subgraph of authors including R. Agrawal, Jiawei Han, V. Vapnik, M. Jordan, H.V. Jagadish, Laks V.S. Lakshmanan, Heikki Mannila, Christos Faloutsos, Padhraic Smyth, Corinna Cortes, and Daryl Pregibon]
Other Applications
• Content-based Image Retrieval
• Personalized PageRank
• Anomaly Detection (for node; link)
• Link Prediction [Getoor], [Jensen], …
• Semi-supervised Learning
• …
Roadmap
• Background
  – RWR: Definitions
  – RWR: Algorithms
• Basic Idea
• FastRWR
  – Pre-Compute Stage
  – On-Line Stage
• Experimental Results
• Conclusion
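Computing RWR
[Figure: the 12-node example graph]
Example (query node 4, c = 0.9):
$$ r_4 = 0.9\, W r_4 + 0.1\, e_4 $$
$$ W = \left[\begin{array}{cccccccccccc}
0 & 1/3 & 1/3 & 1/3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 1/3 & 0 & 0 & 0 & 0 & 1/4 & 0 & 0 & 0 & 0 \\
1/3 & 1/3 & 0 & 1/3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 1/3 & 0 & 1/4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1/3 & 0 & 1/2 & 1/2 & 1/4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/4 & 0 & 1/2 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/4 & 1/2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1/3 & 0 & 0 & 1/4 & 0 & 0 & 0 & 1/2 & 0 & 1/3 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/4 & 0 & 1/3 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/2 & 0 & 1/3 & 1/2 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/4 & 0 & 1/3 & 0 & 1/2 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/3 & 1/3 & 0
\end{array}\right] $$
$r_4 = (0.13, 0.10, 0.13, 0.22, 0.13, 0.05, 0.05, 0.08, 0.04, 0.03, 0.04, 0.02)^T$, $e_4 = (0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)^T$
In general:
$$ r_i = c\, W r_i + (1-c)\, e_i $$
• $r_i$: ranking vector (n × 1)
• $W$: adjacency matrix, column-normalized (n × n)
• $e_i$: starting vector (n × 1)
Q: Given $e_i$, how to solve?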
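The adjacency matrix above is column-normalized. Column j spreads node j's probability mass equally over its neighbors, so every column sums to 1. The sketch below rebuilds W from an edge list using exact fractions. The edge list is our reconstruction of the example graph, not something stated on the slide, and `column_normalized_adjacency` is a hypothetical helper.

```python
from fractions import Fraction

# Edge list of the example graph (1-based ids); an assumption reconstructed
# from the nonzero pattern of W above.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]
N = 12


def column_normalized_adjacency(edges, n):
    """Build W with W[i][j] = 1/deg(j) when nodes i and j are adjacent
    (0-based indices), i.e. the probability of stepping from j to i."""
    deg = [0] * n
    for u, v in edges:
        deg[u - 1] += 1
        deg[v - 1] += 1
    W = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        W[u - 1][v - 1] = Fraction(1, deg[v - 1])
        W[v - 1][u - 1] = Fraction(1, deg[u - 1])
    return W


W = column_normalized_adjacency(EDGES, N)
for row in W:
    print(" ".join(str(x) for x in row))
```

Printing the rows reproduces the W shown above, entry for entry.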
OnTheFly
[Figure: the 12-node example graph]
Iterate $r \leftarrow c\, W r + (1-c)\, e$, starting from $r = e$; for the example, $r_4 \leftarrow 0.9\, W r_4 + 0.1\, e_4$:
Node          1     2     3     4     5     6     7     8     9     10    11    12
Iteration 1   0.30  0     0.30  0.10  0.30  0     0     0     0     0     0     0
Iteration 2   0.12  0.18  0.12  0.35  0.03  0.07  0.07  0.07  0     0     0     0
Iteration 3   0.19  0.09  0.19  0.18  0.18  0.04  0.04  0.06  0.02  0     0.02  0
Iteration 4   0.14  0.13  0.14  0.26  0.10  0.06  0.06  0.08  0.01  0.01  0.01  0
Iteration 5   0.16  0.10  0.16  0.21  0.15  0.05  0.05  0.07  0.02  0.01  0.02  0.01
Converged     0.13  0.10  0.13  0.22  0.13  0.05  0.05  0.08  0.04  0.03  0.04  0.02
• No pre-computation/light storage
• Slow on-line response: O(mE) per query (m iterations over E edges)
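The OnTheFly strategy above is plain fixed-point (power) iteration. The sketch below runs it on the example graph, assuming the reconstructed edge list and c = 0.9. It stores W as sparse (row, column, value) triples, so one iteration costs O(E), which is where the O(mE) query cost comes from. The function names and the stopping rule (`tol` on the L1 change between iterates) are our own choices.

```python
# Edge list of the example graph (1-based ids), reconstructed from W (assumption).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]
N = 12


def sparse_w(edges, n):
    """Nonzeros of the column-normalized W as (row, col, value) triples."""
    deg = [0] * n
    for u, v in edges:
        deg[u - 1] += 1
        deg[v - 1] += 1
    triples = []
    for u, v in edges:
        i, j = u - 1, v - 1
        triples.append((i, j, 1.0 / deg[j]))  # W[i][j]: step j -> i
        triples.append((j, i, 1.0 / deg[i]))  # W[j][i]: step i -> j
    return triples


def on_the_fly(triples, n, i, c=0.9, tol=1e-10, max_iter=1000):
    """Iterate r <- c W r + (1 - c) e_i starting from r = e_i.

    Each iteration is one pass over the nonzeros of W, i.e. O(E). Stops when
    the L1 change falls below `tol`. Returns (r, iterations used).
    """
    e = [0.0] * n
    e[i] = 1.0
    r = e[:]
    for it in range(1, max_iter + 1):
        new = [(1 - c) * x for x in e]
        for a, b, w in triples:
            new[a] += c * w * r[b]
        delta = sum(abs(x - y) for x, y in zip(new, r))
        r = new
        if delta < tol:
            return r, it
    return r, max_iter


W_NZ = sparse_w(EDGES, N)
r_first, _ = on_the_fly(W_NZ, N, 3, max_iter=1)   # node 4 is index 3
r4, iters = on_the_fly(W_NZ, N, 3)
print([round(x, 2) for x in r_first])
print([round(x, 3) for x in r4], iters)
```

`r_first` matches the first iterate in the table (0.30, 0, 0.30, 0.10, 0.30, 0, ...). `iters` reports how many passes over the edges the query needed. All of that work is repeated for every new query node.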
PreCompute: $Q^{-1} = (I - cW)^{-1}$
$$ Q^{-1} = \left[\begin{array}{cccccccccccc}
2.20 & 1.28 & 1.43 & 1.29 & 0.68 & 0.56 & 0.56 & 0.63 & 0.44 & 0.35 & 0.39 & 0.34 \\
1.28 & 2.02 & 1.28 & 0.96 & 0.64 & 0.53 & 0.53 & 0.85 & 0.60 & 0.48 & 0.53 & 0.45 \\
1.43 & 1.28 & 2.20 & 1.29 & 0.68 & 0.56 & 0.56 & 0.63 & 0.44 & 0.35 & 0.39 & 0.33 \\
1.29 & 0.96 & 1.29 & 2.06 & 0.95 & 0.78 & 0.78 & 0.61 & 0.43 & 0.34 & 0.38 & 0.32 \\
0.91 & 0.86 & 0.91 & 1.27 & 2.41 & 1.97 & 1.97 & 1.05 & 0.73 & 0.58 & 0.66 & 0.56 \\
0.37 & 0.35 & 0.37 & 0.52 & 0.98 & 2.06 & 1.37 & 0.43 & 0.30 & 0.24 & 0.27 & 0.22 \\
0.37 & 0.35 & 0.37 & 0.52 & 0.98 & 1.37 & 2.06 & 0.43 & 0.30 & 0.24 & 0.27 & 0.22 \\
0.84 & 1.14 & 0.84 & 0.82 & 1.05 & 0.86 & 0.86 & 2.13 & 1.49 & 1.19 & 1.33 & 1.13 \\
0.29 & 0.40 & 0.29 & 0.28 & 0.36 & 0.30 & 0.30 & 0.74 & 1.78 & 1.00 & 0.76 & 0.79 \\
0.35 & 0.48 & 0.35 & 0.34 & 0.44 & 0.36 & 0.36 & 0.89 & 1.50 & 2.45 & 1.54 & 1.80 \\
0.39 & 0.53 & 0.39 & 0.38 & 0.49 & 0.40 & 0.40 & 1.00 & 1.14 & 1.54 & 2.28 & 1.72 \\
0.22 & 0.30 & 0.22 & 0.21 & 0.28 & 0.22 & 0.22 & 0.56 & 0.79 & 1.20 & 1.14 & 2.05
\end{array}\right] $$
On-line, $r_i = (1-c)\, Q^{-1} e_i$; for the example, $r_4 = 0.1\, Q^{-1} e_4$.
[Figure: the resulting ranking vector $r_4$ on the example graph]
• Fast on-line response
• Heavy pre-computation/storage cost: O(n^3) to pre-compute, O(n^2) to store
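PreCompute trades the per-query iteration for one up-front inverse. The sketch below uses the same assumptions as before (reconstructed edge list, c = 0.9). It relies on a textbook Gauss-Jordan inverse, which is only sensible for tiny dense matrices. `invert` and `rwr_query` are hypothetical helpers.

```python
# Edge list of the example graph (1-based ids), reconstructed from W (assumption).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]
N, C = 12, 0.9


def adjacency(edges, n):
    """Column-normalized W: W[i][j] = 1/deg(j) if nodes i, j are adjacent."""
    deg = [0] * n
    for u, v in edges:
        deg[u - 1] += 1
        deg[v - 1] += 1
    W = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        W[u - 1][v - 1] = 1.0 / deg[v - 1]
        W[v - 1][u - 1] = 1.0 / deg[u - 1]
    return W


def invert(A):
    """Dense Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(A)
    M = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for k in range(n):
            if k != col:
                f = M[k][col]
                M[k] = [x - f * y for x, y in zip(M[k], M[col])]
    return [row[n:] for row in M]


W = adjacency(EDGES, N)
# Off-line: Q^{-1} = (I - cW)^{-1}; dense O(n^3) time, O(n^2) storage.
Qinv = invert([[float(i == j) - C * W[i][j] for j in range(N)] for i in range(N)])


def rwr_query(Qinv, i, c=C):
    """On-line: r_i = (1 - c) Q^{-1} e_i, i.e. column i of Q^{-1} scaled by 1 - c."""
    return [(1 - c) * row[i] for row in Qinv]


r4 = rwr_query(Qinv, 3)   # node 4 is index 3
print([round(x, 3) for x in r4])
```

Because W is column-normalized, every column of $Q^{-1}$ sums to 1/(1 - c) = 10. That gives a cheap sanity check on the inverse. The on-line step is then just reading one column and scaling it.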
Q: How to Balance?
[Figure: balancing the on-line and off-line costs]
Basic Idea
[Figure: the example graph processed in three steps]
• Find Community
• Fix the remaining
• Combine
Basic Idea: Pre-computation stage
• A few small matrix inversions, instead of ONE BIG one
[Figure: Q-matrices plus link matrices $U$, $V$ in place of $Q^{-1}$]
Basic Idea: On-Line Stage
• A few matrix-vector multiplications, instead of MANY
[Figure: the Query $e_i$ is combined (+) with the Q-matrices and the link matrices $U$, $V$, instead of $Q^{-1}$, to give the Result $r_i$]
Pre-compute Stage
• p1: B_Lin Decomposition
  – P1.1 partition
  – P1.2 low-rank approximation
• p2: Q matrices
  – P2.1 computing $Q_1^{-1}$ (for each partition)
  – P2.2 computing (for concept space)
P1.1: partition
[Figure: the example graph split into three partitions, {1, 2, 3, 4}, {5, 6, 7} and {8, 9, 10, 11, 12}]
$$ W = W_1 + W_2 $$
$W_1$: within-partition links; $W_2$: cross-partition links
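Once each node has a partition label, splitting W into within-partition and cross-partition parts is a masking operation. The sketch below takes the labels as given. It assumes the partitions {1, 2, 3, 4}, {5, 6, 7} and {8, ..., 12}, as in this example's block-diagonal $W_1$, and the reconstructed edge list. `split_by_partition` is a hypothetical helper.

```python
from fractions import Fraction

# Edge list of the example graph (1-based ids), reconstructed from W (assumption).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]
# Partition id of nodes 1..12, read off the example's block structure (assumption).
PART = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
N = 12


def column_normalized_adjacency(edges, n):
    """W[i][j] = 1/deg(j) if nodes i and j are adjacent (0-based)."""
    deg = [0] * n
    for u, v in edges:
        deg[u - 1] += 1
        deg[v - 1] += 1
    W = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        W[u - 1][v - 1] = Fraction(1, deg[v - 1])
        W[v - 1][u - 1] = Fraction(1, deg[u - 1])
    return W


def split_by_partition(W, part):
    """W = W1 + W2: W1 keeps links inside a partition, W2 the cross links."""
    n = len(W)
    W1 = [[W[i][j] if part[i] == part[j] else Fraction(0) for j in range(n)]
          for i in range(n)]
    W2 = [[W[i][j] - W1[i][j] for j in range(n)] for i in range(n)]
    return W1, W2


W = column_normalized_adjacency(EDGES, N)
W1, W2 = split_by_partition(W, PART)
cross = sorted((i + 1, j + 1) for i in range(N) for j in range(N) if W2[i][j])
print(cross)
```

The cross-partition links are the three edges 2-8, 4-5 and 5-8, each appearing once per direction. Everything else lands in $W_1$.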
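P1.1: block-diagonal $W_1$
$$ W_1 = \left[\begin{array}{ccc} W_{1,1} & 0 & 0 \\ 0 & W_{1,2} & 0 \\ 0 & 0 & W_{1,3} \end{array}\right] = \left[\begin{array}{cccccccccccc}
0 & 1/3 & 1/3 & 1/3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 1/3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 1/3 & 0 & 1/3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 1/3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1/2 & 1/2 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/4 & 0 & 1/2 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/4 & 1/2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/2 & 0 & 1/3 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/4 & 0 & 1/3 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/2 & 0 & 1/3 & 1/2 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/4 & 0 & 1/3 & 0 & 1/2 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/3 & 1/3 & 0
\end{array}\right] $$
[Figure: the partitions $W_{1,1}$, $W_{1,2}$, $W_{1,3}$ on the example graph]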
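P1.2: LRA for $W_2$
$$ W_2 \approx U S V $$
[Figure: the example graph]
$$ U = \left[\begin{array}{cccc}
0 & 0 & 0 & 0 \\
-0.18 & -0.36 & 0.13 & -0.90 \\
0 & 0 & 0 & 0 \\
0.36 & -0.18 & 0.90 & 0.13 \\
-0.40 & -0.81 & -0.06 & 0.40 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0.81 & -0.40 & -0.40 & -0.06 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{array}\right], \quad S = \left[\begin{array}{cccc}
0.44 & 0 & 0 & 0 \\
0 & 0.44 & 0 & 0 \\
0 & 0 & 0.18 & 0 \\
0 & 0 & 0 & 0.18
\end{array}\right] $$
$$ V = \left[\begin{array}{cccccccccccc}
0 & 0.60 & 0 & -0.30 & 0.65 & 0 & 0 & -0.32 & 0 & 0 & 0 & 0 \\
0 & -0.30 & 0 & -0.60 & -0.32 & 0 & 0 & -0.65 & 0 & 0 & 0 & 0 \\
0 & -0.72 & 0 & -0.11 & 0.66 & 0 & 0 & 0.10 & 0 & 0 & 0 & 0 \\
0 & -0.11 & 0 & 0.72 & 0.10 & 0 & 0 & -0.66 & 0 & 0 & 0 & 0
\end{array}\right] $$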
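The slide keeps a rank-4 factorization $W_2 \approx USV$. The sketch below is a simplified stand-in, not the paper's exact procedure. It extracts only the leading singular triplet of $W_2$ by power iteration on $W_2^T W_2$, so $\sigma u v^T$ is a rank-1 approximation. It reuses the assumed edge list and partition labels, and the helper names are hypothetical.

```python
import math

# Edge list and partition of the example graph (reconstructed; assumptions).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]
PART = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
N = 12


def adjacency(edges, n):
    """Column-normalized W: W[i][j] = 1/deg(j) if nodes i, j are adjacent."""
    deg = [0] * n
    for u, v in edges:
        deg[u - 1] += 1
        deg[v - 1] += 1
    W = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        W[u - 1][v - 1] = 1.0 / deg[v - 1]
        W[v - 1][u - 1] = 1.0 / deg[u - 1]
    return W


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def leading_singular_triplet(A, iters=200):
    """Rank-1 truncated SVD of A by power iteration on A^T A.

    Returns (u, sigma, v) with A v = sigma u and unit-norm u, v; once the
    iteration has converged, sigma * u v^T is a best rank-1 approximation.
    """
    At = [list(col) for col in zip(*A)]
    v = [1.0] * len(At)
    for _ in range(iters):
        w = matvec(At, matvec(A, v))          # (A^T A) v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = matvec(A, v)
    sigma = math.sqrt(sum(x * x for x in Av))
    u = [x / sigma for x in Av]
    return u, sigma, v


W = adjacency(EDGES, N)
W2 = [[W[i][j] if PART[i] != PART[j] else 0.0 for j in range(N)] for i in range(N)]
u, sigma, v = leading_singular_triplet(W2)
print(round(sigma, 3))
```

For this $W_2$ the top singular value is about 0.449 and is repeated, which is consistent with the two 0.44 entries of $S$ above. The rank-1 sketch keeps only one of them. A rank-k version would deflate and repeat, or call a library SVD.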
[Figure: the example graph with four concept nodes c1, c2, c3, c4]
$$ W \approx W_1 + U S V $$
p2.1 Computing $Q_1^{-1}$
$$ Q_1^{-1} = (I - cW_1)^{-1} = \left[\begin{array}{ccc} Q_{1,1}^{-1} & 0 & 0 \\ 0 & Q_{1,2}^{-1} & 0 \\ 0 & 0 & Q_{1,3}^{-1} \end{array}\right], \qquad Q_{1,i}^{-1} = (I - cW_{1,i})^{-1} $$
$$ Q_1^{-1} = \left[\begin{array}{cccccccccccc}
1.85 & 0.88 & 1.08 & 0.88 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.88 & 1.52 & 0.88 & 0.52 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1.08 & 0.88 & 1.85 & 0.88 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.88 & 0.52 & 0.88 & 1.52 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1.58 & 1.29 & 1.29 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.64 & 1.78 & 1.09 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.64 & 1.09 & 1.78 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1.42 & 1.00 & 0.79 & 0.89 & 0.76 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.50 & 1.61 & 0.86 & 0.60 & 0.66 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.59 & 1.30 & 2.29 & 1.35 & 1.64 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.67 & 0.91 & 1.35 & 2.07 & 1.54 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.38 & 0.66 & 1.09 & 1.02 & 1.95
\end{array}\right] $$
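Because $W_1$ is block-diagonal, $(I - cW_1)^{-1}$ can be assembled from independent inverses of the small blocks $I - cW_{1,i}$. The sketch below applies that idea to the example, assuming the reconstructed edge list, the partition labels, and c = 0.9. `block_inverse` is a hypothetical helper, and the dense Gauss-Jordan `invert` is only meant for small blocks.

```python
# Edge list and partition of the example graph (reconstructed; assumptions).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]
PART = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
N, C = 12, 0.9


def adjacency(edges, n):
    """Column-normalized W: W[i][j] = 1/deg(j) if nodes i, j are adjacent."""
    deg = [0] * n
    for u, v in edges:
        deg[u - 1] += 1
        deg[v - 1] += 1
    W = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        W[u - 1][v - 1] = 1.0 / deg[v - 1]
        W[v - 1][u - 1] = 1.0 / deg[u - 1]
    return W


def invert(A):
    """Dense Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(A)
    M = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for k in range(n):
            if k != col:
                f = M[k][col]
                M[k] = [x - f * y for x, y in zip(M[k], M[col])]
    return [row[n:] for row in M]


def block_inverse(W1, part, c):
    """Assemble Q1^{-1} = (I - c W1)^{-1} block by block: invert each
    partition's small matrix I - c W1_{1,i} and place it on the diagonal."""
    n = len(W1)
    Q1inv = [[0.0] * n for _ in range(n)]
    for p in sorted(set(part)):
        idx = [k for k in range(n) if part[k] == p]
        block = [[float(a == b) - c * W1[a][b] for b in idx] for a in idx]
        inv = invert(block)
        for x, a in enumerate(idx):
            for y, b in enumerate(idx):
                Q1inv[a][b] = inv[x][y]
    return Q1inv


W = adjacency(EDGES, N)
W1 = [[W[i][j] if PART[i] == PART[j] else 0.0 for j in range(N)] for i in range(N)]
Q1inv = block_inverse(W1, PART, C)
print([round(x, 3) for x in Q1inv[0][:4]])
```

The first row comes out as about 1.855, 0.882, 1.086, 0.882, within 0.01 of the slide's 1.85, 0.88, 1.08, 0.88. The off-block entries are exactly zero.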
Comparing $Q_1^{-1}$ and $Q^{-1}$
• Computing Time
  – 100,000 nodes; 100 partitions
  – Computing $Q_1^{-1}$ is 10,000x faster!
• Storage Cost (100x saving!)
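Both figures on this slide follow from simple cost arithmetic. The sketch makes three assumptions: dense inversion of an m × m matrix takes on the order of m^3 time and m^2 storage, the 100,000 nodes split into 100 equal partitions, and the cost of P1.2 and p2.2 can be ignored. `inversion_speedups` is a hypothetical helper.

```python
def inversion_speedups(n, k):
    """Back-of-the-envelope ratios for inverting one n x n matrix versus
    k diagonal blocks of size n/k, assuming O(m^3) time and O(m^2) storage
    per dense m x m inverse (constants ignored)."""
    b = n // k
    time_ratio = n ** 3 // (k * b ** 3)
    storage_ratio = n ** 2 // (k * b ** 2)
    return time_ratio, storage_ratio


print(inversion_speedups(100_000, 100))   # (10000, 100)
```

In general, for k equal partitions the time ratio is k^2 and the storage ratio is k.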
p2.2 Computing $\Lambda$:
$$ \Lambda = (S^{-1} - c\, V Q_1^{-1} U)^{-1} $$
[Figure: the example graph]
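SM (Sherman-Morrison) Lemma says:
$$ Q^{-1} = Q_1^{-1} + c\, Q_1^{-1} U \Lambda V Q_1^{-1}, \qquad \Lambda = (S^{-1} - c\, V Q_1^{-1} U)^{-1} $$
We have: $Q^{-1}$ expressed through the Q-matrices and the link matrices ($U$, $V$).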
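The SM identity can be checked numerically on the example. To keep the sketch short and exact, it swaps in a simple exact factorization $W_2 = USV$ in place of the low-rank approximation of P1.2: U holds the nonzero columns of $W_2$, S = I, and V is a 0/1 column selector. It also inverts $I - cW_1$ in one piece rather than block by block. The edge list, partition labels, c = 0.9, and all helper names are assumptions.

```python
# Edge list and partition of the example graph (reconstructed; assumptions).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]
PART = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
N, C = 12, 0.9


def adjacency(edges, n):
    """Column-normalized W: W[i][j] = 1/deg(j) if nodes i, j are adjacent."""
    deg = [0] * n
    for u, v in edges:
        deg[u - 1] += 1
        deg[v - 1] += 1
    W = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        W[u - 1][v - 1] = 1.0 / deg[v - 1]
        W[v - 1][u - 1] = 1.0 / deg[u - 1]
    return W


def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def invert(A):
    """Dense Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(A)
    M = [list(row) + e for row, e in zip(A, identity(n))]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for k in range(n):
            if k != col:
                f = M[k][col]
                M[k] = [x - f * y for x, y in zip(M[k], M[col])]
    return [row[n:] for row in M]


def i_minus_cA(A, c):
    """Return I - c A."""
    return [[float(i == j) - c * A[i][j] for j in range(len(A))] for i in range(len(A))]


W = adjacency(EDGES, N)
W1 = [[W[i][j] if PART[i] == PART[j] else 0.0 for j in range(N)] for i in range(N)]
W2 = [[W[i][j] - W1[i][j] for j in range(N)] for i in range(N)]

# Exact stand-in for the LRA (hypothetical choice, not an SVD):
# W2 = U S V with U = nonzero columns of W2, S = I, V selecting those columns.
J = [j for j in range(N) if any(W2[i][j] for i in range(N))]
U = [[W2[i][j] for j in J] for i in range(N)]
S = identity(len(J))
V = [[float(j == t) for j in range(N)] for t in J]

Q1inv = invert(i_minus_cA(W1, C))   # block-diagonal; inverted whole here
Sinv = invert(S)
VQU = matmul(matmul(V, Q1inv), U)
Lam = invert([[Sinv[a][b] - C * VQU[a][b] for b in range(len(J))] for a in range(len(J))])

# SM lemma: Q^{-1} = Q1^{-1} + c Q1^{-1} U Lam V Q1^{-1}
corr = matmul(matmul(matmul(matmul(Q1inv, U), Lam), V), Q1inv)
Qinv_sm = [[Q1inv[i][j] + C * corr[i][j] for j in range(N)] for i in range(N)]
Qinv_direct = invert(i_minus_cA(W, C))
err = max(abs(a - b) for ra, rb in zip(Qinv_sm, Qinv_direct) for a, b in zip(ra, rb))
print(f"max |SM - direct| = {err:.1e}")
```

With an exact factorization the two sides agree to round-off. With a truncated LRA, the same formula gives the exact inverse of $I - c(W_1 + USV)$, which is an approximation of $Q^{-1}$.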
On-Line Stage
• Q: Given the Query $e_i$, how to get the Result $r_i$ from $Q_1^{-1}$, $U$ and $V$ instead of $Q^{-1}$?
• A: the SM lemma
On-Line Query Stage
q1: $r_0 = Q_1^{-1} e_i$
q2: $r_i = V r_0$
q3: $r_i = \Lambda r_i$
q4: $r_i = U r_i$
q5: $r_i = Q_1^{-1} r_i$
q6: $r_i = (1-c)(r_0 + c\, r_i)$
• q1: Find the community
• q2-q5: Compensate out-community links
• q6: Combine
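Once $Q_1^{-1}$, U, V and $\Lambda$ are pre-computed, steps q1-q6 reduce to a chain of matrix-vector products. The sketch below runs them for node 4. It uses the same assumptions as the SM check: the reconstructed edge list and partition, c = 0.9, and an exact factorization of $W_2$ in place of the LRA. `precompute` and `query` are hypothetical names.

```python
# Edge list and partition of the example graph (reconstructed; assumptions).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]
PART = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
N, C = 12, 0.9


def adjacency(edges, n):
    """Column-normalized W: W[i][j] = 1/deg(j) if nodes i, j are adjacent."""
    deg = [0] * n
    for u, v in edges:
        deg[u - 1] += 1
        deg[v - 1] += 1
    W = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        W[u - 1][v - 1] = 1.0 / deg[v - 1]
        W[v - 1][u - 1] = 1.0 / deg[u - 1]
    return W


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def invert(A):
    """Dense Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(A)
    M = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for k in range(n):
            if k != col:
                f = M[k][col]
                M[k] = [x - f * y for x, y in zip(M[k], M[col])]
    return [row[n:] for row in M]


def i_minus_cA(A, c):
    """Return I - c A."""
    return [[float(i == j) - c * A[i][j] for j in range(len(A))] for i in range(len(A))]


def precompute(W, part, c):
    """Off-line stage (sketch): Q1^{-1}, U, V and Lam = (S^{-1} - c V Q1^{-1} U)^{-1}.

    Uses an exact factorization W2 = U I V instead of a low-rank
    approximation (hypothetical simplification), so S^{-1} = I."""
    n = len(W)
    W1 = [[W[i][j] if part[i] == part[j] else 0.0 for j in range(n)] for i in range(n)]
    J = [j for j in range(n) if any(W[i][j] - W1[i][j] for i in range(n))]
    U = [[W[i][j] - W1[i][j] for j in J] for i in range(n)]
    V = [[float(j == t) for j in range(n)] for t in J]
    Q1inv = invert(i_minus_cA(W1, c))
    Lam = invert(i_minus_cA(matmul(matmul(V, Q1inv), U), c))   # S = I
    return Q1inv, U, V, Lam


def query(Q1inv, U, V, Lam, i, c):
    """On-line stage q1-q6 for query node i (0-based)."""
    r0 = [row[i] for row in Q1inv]     # q1: r0 = Q1^{-1} e_i (find the community)
    r = matvec(V, r0)                  # q2
    r = matvec(Lam, r)                 # q3
    r = matvec(U, r)                   # q4
    r = matvec(Q1inv, r)               # q5
    return [(1 - c) * (a + c * b) for a, b in zip(r0, r)]   # q6: combine


W = adjacency(EDGES, N)
Q1inv, U, V, Lam = precompute(W, PART, C)
r4 = query(Q1inv, U, V, Lam, 3, C)     # node 4 is index 3
print([round(x, 3) for x in r4])
```

Because the factorization is exact here, `r4` agrees with $(1-c)Q^{-1}e_4$ to round-off. With a truncated LRA it would be an approximation, which is the accuracy trade-off FastRWR accepts for speed.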
Example
• We have $Q_1^{-1}$, $U$, $V$ from the pre-compute stage
• We want to compute $r_4$
[Figure: the example graph]
q1: Find Community
$r_0 = Q_1^{-1} e_4$, i.e. column 4 of $Q_1^{-1}$: $(0.88, 0.52, 0.88, 1.52, 0, 0, 0, 0, 0, 0, 0, 0)^T$, nonzero only inside node 4's partition {1, 2, 3, 4}
[Figure: $r_0$ on the example graph]
q2-q5: out-community
$$ r_i = Q_1^{-1} U \Lambda V r_0 $$
[Figure: the example graph]
q6: Combination
$$ r_4 = 0.1\,(r_0 + 0.9\, r_i) $$
[Figure: the resulting ranking vector $r_4$ on the example graph]
Experimental Setup
• Dataset
  – DBLP/authorship
  – Author-Paper
  – 315k nodes
  – 1,800k edges
• Quality: Relative Accuracy
• Application: Center-Piece Subgraph
Query Time vs. Pre-Compute Time
[Figure: log query time vs. log pre-compute time]
Query Time vs. Pre-Storage
[Figure: log query time vs. log storage]
Conclusion
• FastRWR
  – Reasonable quality preservation (90%+)
  – 150x speed-up: query time
  – Orders of magnitude saving: pre-compute & storage
• More in the paper
  – The variant of FastRWR and theoretical justification
  – Implementation details
    • normalization, low-rank approximation, sparse
  – More experiments
    • Other datasets, other applications
Q&A
Thank you!
www.cs.cmu.edu/~htong
Future work
• Incremental FastRWR
• Parallel FastRWR
  – Partition
  – Q-matrices for each partition
• Hierarchical FastRWR
  – How to compute one Q-matrix for
Possible Q?
• Why RWR?