scs cmu proximity on large graphs speaker: hanghang tong 2008-4-10 15-826 guest lecture
TRANSCRIPT
![Page 1: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/1.jpg)
SCS CMU
Proximity on Large Graphs
Speaker:
Hanghang Tong
2008-4-10 15-826 Guest Lecture
![Page 2: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/2.jpg)
SCS CMU
2
Graphs are everywhere!
![Page 3: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/3.jpg)
SCS CMU
4
Graph Mining: the big picture Graph/Global Level
Subgraph/Community Level
Node Level We are here!
![Page 4: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/4.jpg)
SCS CMU
5
Proximity on Graph: What?
A BH1 1
D1 1
E
F
G1 11
I J1
1 1
a.k.a Relevance, Closeness, ‘Similarity’…
![Page 5: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/5.jpg)
SCS CMU
6
Proximity is the main tool behind…
• Link prediction [Liben-Nowell+], [Tong+]• Ranking [Haveliwala], [Chakrabarti+]• Email Management [Minkov+]• Image caption [Pan+]• Neighborhooh Formulation [Sun+]• Conn. subgraph [Faloutsos+], [Tong+], [Koren+]• Pattern match [Tong+]• Collaborative Filtering [Fouss+]• Many more…
Will return to this later
![Page 6: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/6.jpg)
SCS CMU
7
Roadmap
• Motivation
• Part I: Definitions
• Part II: Fast Solutions
• Part III: Applications
• Conclusion
• Basic: RWR
• Variants
• Asymmetry of Prox.
• Group Prox
• Prox w/ Attributes
• Prox w/ Time
![Page 7: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/7.jpg)
SCS CMU
8
Why not shortest path?
‘pizza delivery guy’ problem
A BD1 1
A BD1 1
E
F
G1 11
A BD1 1
E F1
1 1
A BD1 1
‘multi-facet’ relationship
Some ``bad’’ proximities
![Page 8: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/8.jpg)
SCS CMU
9
A BD1 1
A BD1 11 E
Why not max. netflow?
No punishment on long paths
Some ``bad’’ proximities
![Page 9: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/9.jpg)
SCS CMU
11
What is a ``good’’ Proximity?
A BH1 1
D1 1
E
F
G1 11
I J1
1 1
• Multiple Connections
• Quality of connection
•Direct & In-directed Conns
•Length, Degree, Weight…
…
![Page 10: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/10.jpg)
SCS CMU
12
1
4
3
2
56
7
910
8
11
12
Random walk with restart
![Page 11: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/11.jpg)
SCS CMU
13
Random walk with restart
Node 4
Node 1Node 2Node 3Node 4Node 5Node 6Node 7Node 8Node 9Node 10Node 11Node 12
0.130.100.130.220.130.050.050.080.040.030.040.02
1
4
3
2
56
7
910
811
120.13
0.10
0.13
0.13
0.05
0.05
0.08
0.04
0.02
0.04
0.03
Ranking vector More red, more relevant
Nearby nodes, higher scores
4r
![Page 12: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/12.jpg)
SCS CMU
Why RWR is a good score?
14
2c 3cc ...
W 2W 3W
all paths from i to j with length 1
all paths from i to j with length 2
all paths from i to j with length 3
![Page 13: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/13.jpg)
SCS CMU
15
Roadmap
• Motivation
• Part I: Definitions
• Part II: Fast Solutions
• Part III: Applications
• Conclusion
• Basic: RWR
• Variants
• Asymmetry of Prox.
• Group Prox
• Prox w/ Attributes
• Prox w/ Time
![Page 14: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/14.jpg)
SCS CMU
16
Variant: escape probability
• Define Random Walk (RW) on the graph• Esc_Prob(AB)
– Prob (starting at A, reaches B before returning to A)
Esc_Prob = Pr (smile before cry)
A Bthe remaining graph
![Page 15: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/15.jpg)
SCS CMU
17
Other Variants
• Other measure by RWs– Community Time/Hitting Time [Fouss+]– SimRank [Jeh+]
• Equivalence of Random Walks– Electric Networks:
• EC [Doyle+]; SAEC[Faloutsos+]; CFEC[Koren+]
– String Systems
• Katz [Katz], [Huang+], [Scholkopf+]• Matrix-Forest-based Alg [Chobotarev+]
![Page 16: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/16.jpg)
SCS CMU
18
Other Variants
• Other measure by RWs– Community Time/Hitting Time [Fouss+]– SimRank [Jeh+]
• Equivalence of Random Walks– Electric Networks:
• EC [Doyle+]; SAEC[Faloutsos+]; CFEC[Koren+]
– String Systems
• Katz [Katz], [Huang+], [Scholkopf+]• Matrix-Forest-based Alg [Chobotarev+]
All are related to, or similar to random walk with restart!
![Page 17: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/17.jpg)
SCS CMU
19
Roadmap
• Motivation
• Part I: Definitions
• Part II: Fast Solutions
• Part III: Applications
• Conclusion
• Basic: RWR
• Variants
• Asymmetry of Prox.
• Group Prox
• Prox w/ Attributes
• Prox w/ Time
![Page 18: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/18.jpg)
SCS CMU
20
Asymmetry of Proximity [Tong+ KDD07 a]
A B
1 1
1
111
1 A B
1 1
1
10.51
0.5
What is Prox from A to B?What is Prox from B to A?
What is Prox between A and B?
![Page 19: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/19.jpg)
SCS CMU
21
Asymmetry also exists in un-directed graphs
• Hanghang’s most important conf. is KDD
• The most important author in KDD is ...
So is love…
HanghangKDD
![Page 20: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/20.jpg)
SCS CMU
22
Roadmap
• Motivation
• Part I: Definitions
• Part II: Fast Solutions
• Part III: Applications
• Conclusion
• Basic: RWR
• Variants
• Asymmetry of Prox.
• Group Prox
• Prox w/ Attributes
• Prox w/ Time
![Page 21: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/21.jpg)
SCS CMU
23
Group Proximity [Tong+ 2007]
• Q: How close are Accountants to SECs?
• A: Prob (starting at any RED, reaches any GREEN before touching any RED again)
Accountant
CEO
Manager
SEC
![Page 22: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/22.jpg)
SCS CMU
24
Proximity on Attribute Graphs
What is the proximity from node 7 to 10?If we know that…
Accountant
CEO
Manager
SEC
![Page 23: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/23.jpg)
SCS CMU
25
Sol: Augmented graphs
12
1311
9
10
5
6
7
8
1
2
3
4
![Page 24: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/24.jpg)
SCS CMU
26
Attributes on nodes/edges (ER graph) [Chakrabarti+ WWW07]
skip
Wrote Sent Received
In-Replied-toCited
Works
![Page 25: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/25.jpg)
SCS CMU
27
Proximity w/ Time
• Sol #1: treat time an categorical attr. [Minkov+]
• Sol #2: aggregate slice matrices [Tong+]
Time
•Global aggregation
•Slide window
•Exponential emphasis
![Page 26: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/26.jpg)
SCS CMU
28
Summary of Part I
• Goal: Summarize multiple … relationships
• Solutions– Basic: Random Walk with Restart– Property: Asymmetry– Variants: Esc_Prob and many others.– Generalization: Group Prox.; w/ Attr.; w/ Time
![Page 27: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/27.jpg)
SCS CMU
29
• Motivation
• Part I: Definitions
• Part II: Fast Solutions
• Part III: Applications
• Conclusion
Roadmap
• B_Lin: RWR
• FastAllDAP: Esc_Prob
• BB_Lin: Skewed BGs
• FastUpdate: Time-Evolving
![Page 28: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/28.jpg)
SCS CMU
Preliminary: Sherman–Morrison Lemma
30
Tv
uAA =
If:
1 11 1 1
1( )
1
TT
T
A u v AA A u v A
v A u
Then:
![Page 29: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/29.jpg)
SCS CMU
SM Lemma: Applications
• RLS – and almost any algorithm in time series!
• Leave-one-out cross validation for LS
• Kalman filtering
• Incremental matrix decomposition
• … and all the fast sols we will introduce!
31
![Page 30: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/30.jpg)
SCS CMU
32
Computing RWR
1
43
2
5 6
7
9 10
811
12
0.13 0 1/3 1/3 1/3 0 0 0 0 0 0 0 0
0.10 1/3 0 1/3 0 0 0 0 1/4 0 0 0
0.13
0.22
0.13
0.050.9
0.05
0.08
0.04
0.03
0.04
0.02
0
1/3 1/3 0 1/3 0 0 0 0 0 0 0 0
1/3 0 1/3 0 1/4 0 0 0 0 0 0 0
0 0 0 1/3 0 1/2 1/2 1/4 0 0 0 0
0 0 0 0 1/4 0 1/2 0 0 0 0 0
0 0 0 0 1/4 1/2 0 0 0 0 0 0
0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 0
0 0 0 0 0 0 0 1/4 0 1/3 0 0
0 0 0 0 0 0 0 0 1/2 0 1/3 1/2
0 0 0 0 0 0 0 1/4 0 1/3 0 1/2
0 0 0 0 0 0 0 0 0 1/3 1/3 0
0.13 0
0.10 0
0.13 0
0.22
0.13 0
0.05 00.1
0.05 0
0.08 0
0.04 0
0.03 0
0.04 0
2 0
1
0.0
n x n n x 1n x 1
Ranking vector Starting vectorAdjacent matrix
1
(1 )i i ir cWr c e
Restart p
![Page 31: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/31.jpg)
SCS CMU
33
Beyond RWR
P-PageRank[Haveliwala]
PageRank[Haveliwala]
RWR[Pan, Sun]
SM Learning[Zhou, Zhu]
RL in CBIR[He]
Fast RWR (B_Lin) Finds the Root Solution !
: Maxwell Equation for Web![Chakrabarti]
![Page 32: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/32.jpg)
SCS CMU
34
• RWR is the building block for computing…– Escape Probability (augmented w/sink)
[Tong+]– ..Effective ConductancResistance Dist.
Commute Time– MRF (special structure) [Cohen]
• Similar Idea of B_Lin to compute other measurements
Beyond RWR
![Page 33: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/33.jpg)
SCS CMU
35
• Q: Given query i, how to solve it?
0 1/3 1/3 1/3 0 0 0 0 0 0 0 0
1/3 0 1/3 0 0 0 0 1/4 0 0 0 0
1/3 1/3 0 1/3 0 0 0 0 0 0 0 0
1/3 0 1/3 0 1/4
0.9
0 0 0 0 0 0 0
0 0 0 1/3 0 1/2 1/2 1/4 0 0 0 0
0 0 0 0 1/4 0 1/2 0 0 0 0 0
0 0 0 0 1/4 1/2 0 0 0 0 0 0
0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 0
0 0 0 0 0 0 0 1/4 0 1/3 0 0
0 0 0 0 0 0 0 0 1/2 0 1/3 1/2
0 0 0 0 0
0
0
0
0
00.1
0
0
0
0
0 0 1/4 0 1/3 0 1/2 0
0 0 0 0 0 0 0 0 0 1/3 1/3
1
0 0
??
Adjacent matrix Starting vector
![Page 34: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/34.jpg)
SCS CMU
36
1
43
2
5 6
7
9 10
8 11
120.130.10
0.13
0.130.05
0.05
0.08
0.04
0.02
0.04
0.03
OntheFly: 0 1/3 1/3 1/3 0 0 0 0 0 0 0 0
1/3 0 1/3 0 0 0 0 1/4 0 0 0 0
1/3 1/3 0 1/3 0 0 0 0 0 0 0 0
1/3 0 1/3 0 1/4
0.9
0 0 0 0 0 0 0
0 0 0 1/3 0 1/2 1/2 1/4 0 0 0 0
0 0 0 0 1/4 0 1/2 0 0 0 0 0
0 0 0 0 1/4 1/2 0 0 0 0 0 0
0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 0
0 0 0 0 0 0 0 1/4 0 1/3 0 0
0 0 0 0 0 0 0 0 1/2 0 1/3 1/2
0 0 0 0 0
0
0
0
0
00.1
0
0
0
0
0 0 1/4 0 1/3 0 1/2 0
0 0 0 0 0 0 0 0 0 1/3 1/3
1
0 0
0
0
0
1
0
0
0
0
0
0
0
0
0.13
0.10
0.13
0.22
0.13
0.05
0.05
0.08
0.04
0.03
0.04
0.02
1
43
2
5 6
7
9 10
811
12
0.3
0
0.3
0.1
0.3
0
0
0
0
0
0
0
0.12
0.18
0.12
0.35
0.03
0.07
0.07
0.07
0
0
0
0
0.19
0.09
0.19
0.18
0.18
0.04
0.04
0.06
0.02
0
0.02
0
0.14
0.13
0.14
0.26
0.10
0.06
0.06
0.08
0.01
0.01
0.01
0
0.16
0.10
0.16
0.21
0.15
0.05
0.05
0.07
0.02
0.01
0.02
0.01
0.13
0.10
0.13
0.22
0.13
0.05
0.05
0.08
0.04
0.03
0.04
0.02
No pre-computation/ light storage
Slow on-line response O(mE)
ir
ir
![Page 35: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/35.jpg)
SCS CMU
37
0.20 0.13 0.14 0.13 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.34
0.28 0.20 0.13 0.96 0.64 0.53 0.53 0.85 0.60 0.48 0.53 0.45
0.14 0.13 0.20 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.33
0.13 0.10 0.13 2.06 0.95 0.78 0.78 0.61 0.43 0.34 0.38 0.32
0.09 0.09 0.09 1.27 2.41 1.97 1.97 1.05 0.73 0.58 0.66 0.56
0.03 0.04 0.04 0.52 0.98 2.06 1.37 0.43 0.30 0.24 0.27 0.22
0.03 0.04 0.04 0.52 0.98 1.37 2.06 0.43 0.30 0.24 0.27 0.22
0.08 0.11 0.04 0.82 1.05 0.86 0.86 2.13 1.49 1.19 1.33 1.13
0.03 0.04 0.03 0.28 0.36 0.30 0.30 0.74 1.78 1.00 0.76 0.79
0.04 0.04 0.04 0.34 0.44 0.36 0.36 0.89 1.50 2.45 1.54 1.80
0.04 0.05 0.04 0.38 0.49 0.40 0.40 1.00 1.14 1.54 2.28 1.72
0.02 0.03 0.02 0.21 0.28 0.22 0.22 0.56 0.79 1.20 1.14 2.05
4
PreCompute
1 2 3 4 5 6 7 8 9 10 11 12r r r r r r r r r r r r
1
43
2
5 6
7
9 10
8 11
120.130.10
0.13
0.130.05
0.05
0.08
0.04
0.02
0.04
0.03
13
2
5 6
7
9 10
811
12
[Haveliwala]
R:
![Page 36: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/36.jpg)
SCS CMU
38
2.20 1.28 1.43 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.34
1.28 2.02 1.28 0.96 0.64 0.53 0.53 0.85 0.60 0.48 0.53 0.45
1.43 1.28 2.20 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.33
1.29 0.96 1.29 2.06 0.95 0.78 0.78 0.61 0.43 0.34 0.38 0.32
0.91 0.86 0.91 1.27 2.41 1.97 1.97 1.05 0.73 0.58 0.66 0.56
0.37 0.35 0.37 0.52 0.98 2.06 1.37 0.43 0.30 0.24 0.27 0.22
0.37 0.35 0.37 0.52 0.98 1.37 2.06 0.43 0.30 0.24 0.27 0.22
0.84 1.14 0.84 0.82 1.05 0.86 0.86 2.13 1.49 1.19 1.33 1.13
0.29 0.40 0.29 0.28 0.36 0.30 0.30 0.74 1.78 1.00 0.76 0.79
0.35 0.48 0.35 0.34 0.44 0.36 0.36 0.89 1.50 2.45 1.54 1.80
0.39 0.53 0.39 0.38 0.49 0.40 0.40 1.00 1.14 1.54 2.28 1.72
0.22 0.30 0.22 0.21 0.28 0.22 0.22 0.56 0.79 1.20 1.14 2.05
PreCompute:
1
43
2
5 6
7
9 10
8 11
120.130.10
0.13
0.130.05
0.05
0.08
0.04
0.02
0.04
0.03
1
43
2
5 6
7
9 10
811
12
Fast on-line response
Heavy pre-computation/storage costO(n ) O(n )
0.13
0.10
0.13
0.22
0.13
0.05
0.05
0.08
0.04
0.03
0.04
0.02
3 2
![Page 37: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/37.jpg)
SCS CMU
39
Q: How to Balance?
On-line Off-line
![Page 38: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/38.jpg)
SCS CMU
40
B_Lin: Basic Idea[Tong+]
1
43
2
5 6
7
9 10
8 11
120.130.10
0.13
0.130.05
0.05
0.08
0.04
0.02
0.04
0.03
1
43
2
5 6
7
9 10
811
12
Find Community
Fix the remaining
Combine1
43
2
5 6
7
9 10
8 11
12
1
43
2
5 6
7
9 10
8 11
12
5 6
7
9 10
811
12
1
43
2
5 6
7
9 10
8 11
12
1
43
2
5 6
7
9 10
8 11
12
1
43
2
![Page 39: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/39.jpg)
SCS CMU
41
Pre-computational stage
• Q: • A: A few small, instead of ONE BIG, matrices inversions
Efficiently compute and store Q-1
![Page 40: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/40.jpg)
SCS CMU
42
• Q: Efficiently recover one column of Q• A: A few, instead of MANY, matrix-vector multiplication
On-Line Query Stage
+
0
0
0
0
0
0
1
0
0
0
0
0
-1
ie ir
![Page 41: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/41.jpg)
SCS CMU
43
Pre-compute Stage
• p1: B_Lin Decomposition– P1.1 partition– P1.2 low-rank approximation
• p2: Q matrices– P2.1 computing (for each partition)– P2.2 computing (for concept space)
11Q
![Page 42: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/42.jpg)
SCS CMU
44
P1.1: partition
1
43
2
5 6
7
9 10
811
12
1
43
2
5 6
7
9 10
811
12
Within-partition links cross-partition links
skip
![Page 43: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/43.jpg)
SCS CMU
45
P1.1: block-diagonal
1
43
2
5 6
7
9 10
811
12
1
43
2
5 6
7
9 10
811
12
skip
![Page 44: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/44.jpg)
SCS CMU
46
P1.2: LRA for
31
4
2
5 6
7
9 10
811
12
1
43
2
5 6
7
9 10
811
12
|S| << |W2|~
skip
![Page 45: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/45.jpg)
SCS CMU
47
+
=
skip
![Page 46: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/46.jpg)
SCS CMU
48
p2.1 Computing
c
skip
![Page 47: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/47.jpg)
SCS CMU
49
Comparing and
• Computing Time– 100,000 nodes; 100 partitions– Computing 100,00x is Faster!
• Storage Cost – 100x saving!
Q 1,1
Q 1,2
Q 1,k
11Q
=
11Q
11Q
skip
![Page 48: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/48.jpg)
SCS CMU
50
• Q: How to fix the green portions?
W +~
~~
11Q
+ ?
skip
![Page 49: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/49.jpg)
SCS CMU
51
p2.2 Computing:
1S UV=_
-1
1
43
2
5 6
7
9 10
811
12
Q 1,1
Q 1,2
Q 1,k
skip
![Page 50: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/50.jpg)
SCS CMU
52
SM Lemma says:
We have:
Communities Bridges
1 1 11 1
11U VcQQ Q Q
skip
![Page 51: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/51.jpg)
SCS CMU
53
On-Line Stage
• Q
+
Query
0
0
0
0
0
0
1
0
0
0
0
0
Result
?
• A (SM lemma)
Pre-Computation
ie ir
skip
![Page 52: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/52.jpg)
SCS CMU
54
On-Line Query Stage
q1:q2:q3:q4:q5:q6:
skip
![Page 53: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/53.jpg)
SCS CMU
55
skip
![Page 54: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/54.jpg)
SCS CMU
56
Query Time vs. Pre-Compute Time
Log Query Time
Log Pre-compute Time
•Quality: 90%+ •On-line:
•Up to 150x speedup•Pre-computation:
•Two orders saving
![Page 55: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/55.jpg)
SCS CMU
57
• Motivation
• Part I: Definitions
• Part II: Fast Solutions
• Part III: Applications
• Conclusion
Roadmap
• B_Lin: RWR
• FastAllDAP: Esc_Prob
• BB_Lin: Skewed BGs
• FastUpdate: Time-Evolving
![Page 56: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/56.jpg)
SCS CMU
58
FastAllDAP [Tong+]
Footnote: augmented w/ universal sink as practical modification
A Bthe remaining graph
• Q: How to compute – Esc_Prob = Pr (smile before cry)?
![Page 57: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/57.jpg)
SCS CMU
59
Solving DAP (Straight-forward way)
One matrix inversion, one proximity!
2 1,
ˆProx( )=c ( )ti j i ji j p I cP p c p
1 x (n-2) 1 x (n-2)(n-2) x (n-2)
1-c: fly-out probability (to black-hole)
![Page 58: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/58.jpg)
SCS CMU
60
Esc_Prob(1->5) =
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
P=
I - +
-1
1 5
3
2
6
4
0.5 0.5
0.5
0.50.5
0.5
0.5
1
0.5 1
P: Transition matrix (row norm.)
2cc
![Page 59: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/59.jpg)
SCS CMU
61
• Case 1, Medium Size Graph– Matrix inversion is feasible, but…– What if we want many proximities?– Q: How to get all (n ) proximities efficiently?– A: FastAllDAP!
• Case 2: Large Size Graph – Matrix inversion is infeasible– Q: How to get one proximity efficiently?– A: FastOneDAP!
Challenges
2
skip
![Page 60: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/60.jpg)
SCS CMU
62
FastAllDAP
• Q1: How to efficiently compute all possible proximities on a medium size graph?– a.k.a. how to efficiently solve multiple
linear systems simultaneously?
• Goal: reduce # of matrix inversions!
![Page 61: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/61.jpg)
SCS CMU
63
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
FastAllDAP: Observation
1 5
3
2
6
4
0.5 0.5
0.5
0.50.5
0.5
0.5
1
0.5 1
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
Need two different matrix inversions!
P=
P=
![Page 62: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/62.jpg)
SCS CMU
64
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
FastAllDAP: Rescue
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
Redundancy among different linear systems!
P=
P=
Overlap between two gray parts!
Prox(1 5)
Prox(1 6)
![Page 63: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/63.jpg)
SCS CMU
65
FastAllDAP: Theorem
• Theorem:
• Proof: by SM Lemma
• Example:
![Page 64: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/64.jpg)
SCS CMU
66
FastAllDAP: Algorithm
• Alg.– Compute Q– For i,j =1,…, n, compute
• Computational Save O(1) instead of O(n )!
• Example– w/ 1000 nodes, – 1m matrix inversion vs. 1 matrix!
2
![Page 65: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/65.jpg)
SCS CMU
67
FastAllDAP
Size of Graph
Time (sec)
Straight-Solver
FastAllDAP
1,000xfaster!
![Page 66: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/66.jpg)
SCS CMU
68
• Motivation
• Part I: Definitions
• Part II: Fast Solutions
• Part III: Applications
• Conclusion
Roadmap
• B_Lin: RWR
• FastAllDAP: Esc_Prob
• BB_Lin: Skewed BGs
• FastUpdate: Time-Evolving
![Page 67: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/67.jpg)
SCS CMU
RWR on Bipartite Graph
69
n
m
authors
Conferences
Author-Conf. Matrix
Observation: n >> m!
Examples: 1. DBLP: 400k aus, 3.5k confs 2. NetFlix: 2.7M usrs, 18k mvs
![Page 68: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/68.jpg)
SCS CMU
70
• Q: Given query i, how to solve it?
0 1/3 1/3 1/3 0 0 0 0 0 0 0 0
1/3 0 1/3 0 0 0 0 1/4 0 0 0 0
1/3 1/3 0 1/3 0 0 0 0 0 0 0 0
1/3 0 1/3 0 1/4
0.9
0 0 0 0 0 0 0
0 0 0 1/3 0 1/2 1/2 1/4 0 0 0 0
0 0 0 0 1/4 0 1/2 0 0 0 0 0
0 0 0 0 1/4 1/2 0 0 0 0 0 0
0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 0
0 0 0 0 0 0 0 1/4 0 1/3 0 0
0 0 0 0 0 0 0 0 1/2 0 1/3 1/2
0 0 0 0 0
0
0
0
0
00.1
0
0
0
0
0 0 1/4 0 1/3 0 1/2 0
0 0 0 0 0 0 0 0 0 1/3 1/3
1
0 0
RWR on Skewed bipartite graphs
??
… . . .. . …. .. .
. …. . ..
0
0
n m
Ar
… .
. .. . …. .. .
. …. . .. Ac
![Page 69: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/69.jpg)
SCS CMU
• Step 1:
• Step 2:
• Cost:
• Examples – NetFlix: 1.5hr for pre-computation; – DBLP: 1 few minutes
71
BB_Lin: Pre-Computation [Tong+ 06]
M = Ac ArX
1( 0.9 )I M 3( )O m m E
2-step RWR for Conferences
All Conf-Conf Prox. Scores
![Page 70: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/70.jpg)
SCS CMU
72
BB_Lin: Pre-Computation [Tong+ 06]
• Step 1:
• Step 2:
M = Ac ArX
1( 0.9 )I M
2-step RWR for Conferences
All Conf-Conf Prox. Scores
![Page 71: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/71.jpg)
SCS CMU
73
BB_Lin: Pre-Computation [Tong+ 06]
• Step 1:
• Step 2:
• Cost:
• Examples – NetFlix: 1.5hr for pre-computation; – DBLP: 1 few minutes
M = Ac ArX
1( 0.9 )I M 3( )O m m E
2-step RWR for Conferences
All Conf-Conf Prox. Scores
Ac/ArE edges
m x m
![Page 72: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/72.jpg)
SCS CMU
BB_Lin: On-Line Stage
74Ac/Ar
E edges
Case 1: - Conf - Conf
authors
Conferences
Read out !
![Page 73: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/73.jpg)
SCS CMU
BB_Lin: On-Line Stage
75Ac/Ar
E edges
Case 2: - Au - Conf
authors
Conferences
1 matrix-vec!
![Page 74: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/74.jpg)
SCS CMU
BB_Lin: On-Line Stage
76Ac/Ar
E edges
Case 3: - Au - Au
authors
Conferences
2 m atrix-vec!
![Page 75: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/75.jpg)
SCS CMU
BB_Lin: Examples
• NetFlix dataset (2.7m user x 18k movies)– 1.5hr for pre-computation; – <1 sec for on-line
• DBLP dataset (400k authors x 3.5k confs)– A few minutes for pre-computation– <0.01 sec for on-line
77
![Page 76: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/76.jpg)
SCS CMU
78
• Motivation
• Part I: Definitions
• Part II: Fast Solutions
• Part III: Applications
• Conclusion
Roadmap
• B_Lin: RWR
• FastAllDAP: Esc_Prob
• BB_Lin: Skewed BGs
• FastUpdate: Time-Evolving
![Page 77: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/77.jpg)
SCS CMU
79
Challenges
• BB_Lin is good for skewed bipartite graphs– for NetFlix (2.7M nodes and 100M edges)– w/ 1.5 hr pre-computation for m x m core matrix– fraction of seconds for on-line query
• But…what if the graph is evolving over time– New edges/nodes arrive; edge weights increase…– 1.5hr itself becomes a part of on-line cost!
![Page 78: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/78.jpg)
SCS CMU
80
1( 0.9 )I M 3( )O m m E t=0
Q: How to update the core matrix?
t=11( 0.9 )I M
~ ~3( )O m m E
?
![Page 79: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/79.jpg)
SCS CMU
Update the core matrix
• Step 1:
• Step 2:
81
M = Ac ArX
1( 0.9 )I M ~
~
~ ?
M= X+
Rank 2 update
= + X
2( )O m 3( )O m m E
![Page 80: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/80.jpg)
SCS CMU
Update : General Case [Tong+ 2008]
• E’ edges changed
• Involves n’ authors, m’ confs.
• Observation
82
M = Ac ArX~
min( ', ') 'n m E
n authors
m Conferences
![Page 81: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/81.jpg)
SCS CMU
83
• Observation: – the rank of update is small!
• Algorithm:– E’ edges changed– Involves n’ authors, m’ confs.
– our Alg.
– (details in the paper)
Update : General Case
2(min( ', ') ')O n m m E3( )O m m E
min( ', ') 'n m E
83
n authors
m Conferences
![Page 82: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/82.jpg)
SCS CMU
84
FastOneUpdate
176x speedup
40x speedup
Time (Seconds)
Datasets
![Page 83: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/83.jpg)
SCS CMU
85
Fast-Batch-Update
Min (n’, m’)E’
Time (Seconds)Time (Seconds)
15x speed-up on average!
![Page 84: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/84.jpg)
SCS CMU
86
Summary of Part II
• Goal: Efficiently Solve Linear System(s)
• Sols.– B_Lin: Approximate one large linear system– FastAllDAP: multiple inner-related linear
systems– BB_Lin: the intrinsic complexity is small– FastUpdate: (smooth) dynamic linear system
![Page 85: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/85.jpg)
SCS CMU
87
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
B_Lin
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
1,1 1,2 1,3 1,4 1,5 1,6
2,1 2,2 2,3 2,4 2,5 2,6
3,1 3,2 3,3 3,4 3,5 3,6
4,1 4,2 4,3 4,4 4,5 4,6
5,1 5,2 5,3 5,4 5,5 5,6
6,1 6,2 6,3 6,4 6,5 6,6
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
p p p p p p
FastAllDAP
…
BB_Lin…FastUpdate
![Page 86: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/86.jpg)
SCS CMU
88
• Motivation
• Part I: Definitions
• Part II: Fast Solutions
• Part III: Applications
• Conclusion
Roadmap
• Link Prediction
• NF
• gCap
• CePS
• G-Ray
• pTrack/cTrack
![Page 87: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/87.jpg)
SCS CMU
890 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0
0.05
0.1
0.15
0.2
0.25
0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.090
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18 Link Prediction: existence
no link
with link
density
density
Prox (ij)+Prox (ji)
Prox (ij)+Prox (ji)
Prox. is effective to distinguish red and blue!
![Page 88: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/88.jpg)
SCS CMU
90
Link Prediction: direction
• Q: Given the existence of the link, what is the direction of the link?
• A: Compare prox(ij) and prox(ji)>70%
Prox (ij) - Prox (ji)
density
![Page 89: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/89.jpg)
SCS CMU
91
Neighborhood Formulation
ICDM
KDD
SDM
Philip S. Yu
IJCAI
NIPS
AAAI M. Jordan
Ning Zhong
R. Ramakrishnan
…
…
… …
Conference Author
A: RWR! [Sun ICDM2005]
Q: what is most related conference to ICDM
![Page 90: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/90.jpg)
SCS CMU
92
NF: example
ICDM
KDD
SDM
ECML
PKDD
PAKDD
CIKM
DMKD
SIGMOD
ICML
ICDE
0.009
0.011
0.0080.007
0.005
0.005
0.005
0.0040.004
0.004
![Page 91: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/91.jpg)
SCS CMU
93
gCaP: Automatic Image Caption• Q
…
Sea Sun Sky Wave{ } { }Cat Forest Grass Tiger
{?, ?, ?,}
?A: RWR! [Pan KDD2004]
![Page 92: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/92.jpg)
SCS CMU
94
Test Image
Sea Sun Sky Wave Cat Forest Tiger Grass
Image
Keyword
Region
![Page 93: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/93.jpg)
SCS CMU
95
Test Image
Sea Sun Sky Wave Cat Forest Tiger Grass
Image
Keyword
Region
{Grass, Forest, Cat, Tiger}
![Page 94: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/94.jpg)
SCS CMU
96
Center-Piece Subgraph(CePS)
A C
B
A C
B
?
Original GraphBlack: query nodes
CePS
Q
A: RWR! [Tong KDD 2006]
Red: Max (Prox(Red, A) x Prox(Red, B) x Prox(Red, C))
CePS guy
![Page 95: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/95.jpg)
SCS CMU
97
CePS: Example
R. Agrawal Jiawei Han
V. Vapnik M. Jordan
H.V. Jagadish
Laks V.S. Lakshmanan
Heikki Mannila
Christos Faloutsos
Padhraic Smyth
Corinna Cortes
15 1013
1 1
6
1 1
4 Daryl Pregibon
10
2
11
3
16
![Page 96: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/96.jpg)
SCS CMU
98
K_SoftAnd: Relaxation of AND
Asking AND query? No Answer!
Disconnected Communities
Noise
![Page 97: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/97.jpg)
SCS CMU
99
1 7
5
10
11
9 8
12
13
4
3
62
0.0103
0.1617
0.1387
0.1617
0.0849
0.1617
0.1617
0.0103
0.1387
0.13870.16170.1617
0.0103
2_SoftAnd
And
1_SoftAnd (OR)
x 1e-4
1 7
5
10
11
9 8
12
13
4
3
62
0.4505
0.1010
0.0710
0.1010
0.2267
0.1010
0.1010
0.4505
0.0710
0.07100.10100.1010
0.4505
1 7
5
10
11
9 8
12
13
4
3
62
0.0103
0.0046
0.0019
0.0046
0.0024
0.0046
0.0046
0.0103
0.0019
0.00190.00460.0046
0.0103
![Page 98: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/98.jpg)
SCS CMU
100
R. Agrawal Jiawei Han
V. Vapnik M. Jordan
H.V. Jagadish
Laks V.S. Lakshmanan
Umeshwar Dayal
Bernhard Scholkopf
Peter L. Bartlett
Alex J. Smola
1510
13
3 3
5 2 2
327
4
CePS: 2 Soft_AND
Stat.
DB
![Page 99: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/99.jpg)
SCS CMU
101
OutputInput
Attributed Data Graph
Query Graph
Matching SubgraphAccountant
CEO
Manager
SEC
Graph X-Ray
![Page 100: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/100.jpg)
SCS CMU
102
G-Ray: How to?
matching node
matching node
matching node
matching node
Goodness = Prox (12, 4) x Prox (4, 12) x Prox (7, 4) x Prox (4, 7) x Prox (11, 7) x Prox (7, 11) x Prox (12, 11) x Prox (11, 12)
![Page 101: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/101.jpg)
SCS CMU
103
Effectiveness: star-query
Query Result
![Page 102: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/102.jpg)
SCS CMU
104
Effectiveness: line-query
Query
Result
![Page 103: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/103.jpg)
SCS CMU
105
Query
Result
Effectiveness: loop-query
![Page 104: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/104.jpg)
SCS CMU
106
pTrack
• [Given] – (1) a large, skewed time-evolving bipartite
graphs, – (2) the query nodes of interest
• [Track] – (1) top-k most related nodes for each query
node at each time step t; – (2) the proximity score (or rank of proximity)
between any two query nodes at each time step t
Author A’ Rank in KDD
Year
![Page 105: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/105.jpg)
SCS CMU
107
Philip S. Yu’s Top-5 conferences up to each year
ICDE
ICDCS
SIGMETRICS
PDIS
VLDB
CIKM
ICDCS
ICDE
SIGMETRICS
ICMCS
KDD
SIGMOD
ICDM
CIKM
ICDCS
ICDM
KDD
ICDE
SDM
VLDB
1992 1997 2002 2007
DatabasesPerformanceDistributed Sys.
DatabasesData Mining
![Page 106: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/106.jpg)
SCS CMU
108
KDD’s Rank wrt. VLDB over years
Rank
Year
Data Mining and Databases are more and more relavant!
![Page 107: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/107.jpg)
SCS CMU
109
cTrack
• [Given] – (1) a large, skewed time-evolving graphs, – (2) the query nodes of interest
• [Track] – (1) top-k most central nodes at each time step t; – (2) the centrality score (or rank of centrality) for
each query node at each time step t
![Page 108: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/108.jpg)
SCS CMU
110
Ranking of Centrality up to each year (in NIPS)
M. Jordan
G.HintonC. Koch
T. Sejnowski
Year
Rank of Influential-ness
![Page 109: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/109.jpg)
SCS CMU
111
10 most influential authors up to each year
Author-paper bipartite graph from NIPS 1987-1999. 3k. 1740 papers, 2037 authors, spreading over 13 years
T. Sejnowski
M. Jordan
![Page 110: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/110.jpg)
SCS CMU
112
A BH1 1
D1 1
E
F
G1 11
I J1
1 1
RWR
Varia
nts
w/ Tim
e
w/ Attr
ibut
e
Gro
up P
orx.
Definitions
B_Lin
FastAllDAP
BB_Lin
FastUpdate
Computations
Link Prediction
NF
gCap
CePS
G-Ray
pTrack
cTrack
Applications
ProximityOn
Graphs
Weighted Multiple
Relationship
EfficientlySolve Linear
System(s)
Use Proximity as
Building block
![Page 111: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/111.jpg)
SCS CMU
Take-home Messages
• Proximity Definitions – RWR
– and a lot of variants
• Computations– SM Lemma
113
(1 )i i ir cWr c e
1 11 1 1
1( )
1
TT
T
A u v AA A u v A
v A u
![Page 112: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/112.jpg)
SCS CMU
References
• L. Page, S. Brin, R. Motwani, & T. Winograd. (1998), The PageRank Citation Ranking: Bringing Order to the Web, Technical report, Stanford Library.
• T.H. Haveliwala. (2002) Topic-Sensitive PageRank. In WWW, 517-526, 2002
• J.Y. Pan, H.J. Yang, C. Faloutsos & P. Duygulu. (2004) Automatic multimedia cross-modal correlation discovery. In KDD, 653-658, 2004.
• C. Faloutsos, K. S. McCurley & A. Tomkins. (2002) Fast discovery of connection subgraphs. In KDD, 118-127, 2004.
• J. Sun, H. Qu, D. Chakrabarti & C. Faloutsos. (2005) Neighborhood Formation and Anomaly Detection in Bipartite Graphs. In ICDM, 418-425, 2005.
• W. Cohen. (2007) Graph Walks and Graphical Models. Draft.114
![Page 113: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/113.jpg)
SCS CMU
References
• P. Doyle & J. Snell. (1984) Random walks and electric networks, volume 22. Mathematical Association America, New York.
• Y. Koren, S. C. North, and C. Volinsky. (2006) Measuring and extracting proximity in networks. In KDD, 245–255, 2006.
• A. Agarwal, S. Chakrabarti & S. Aggarwal. (2006) Learning to rank networked entities. In KDD, 14-23, 2006.
• S. Chakrabarti. (2007) Dynamic personalized pagerank in entity-relation graphs. In WWW, 571-580, 2007.
• F. Fouss, A. Pirotte, J.-M. Renders, & M. Saerens. (2007) Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation. IEEE Trans. Knowl. Data Eng. 19(3), 355-369 2007.
115
![Page 114: SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture](https://reader035.vdocument.in/reader035/viewer/2022062721/56649f1c5503460f94c32dc0/html5/thumbnails/114.jpg)
SCS CMU
References
• H. Tong & C. Faloutsos. (2006) Center-piece subgraphs: problem definition and fast solutions. In KDD, 404-413, 2006.
• H. Tong, C. Faloutsos, & J.Y. Pan. (2006) Fast Random Walk with Restart and Its Applications. In ICDM, 613-622, 2006.
• H. Tong, Y. Koren, & C. Faloutsos. (2007) Fast direction-aware proximity for graph mining. In KDD, 747-756, 2007.
• H. Tong, B. Gallagher, C. Faloutsos, & T. Eliassi-Rad. (2007) Fast best-effort pattern matching in large attributed graphs. In KDD, 737-746, 2007.
• H. Tong, S. Papadimitriou, P.S. Yu & C. Faloutsos. (2008) Proximity Tracking on Time-Evolving Bipartite Graphs. to appear in SDM 2008.
116