jure leskovecand anandrajaraman · 2/4/2010 jure leskovec & anand rajaraman, stanford cs345a:...

37
CS345a: Data Mining Jure Leskovec and Anand Rajaraman Stanford University

Upload: others

Post on 08-Jul-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

CS345a: Data MiningJure Leskovec and Anand RajaramanjStanford University

Page 2: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

… …

Philip S. Yu

IJCAI Q: what is most relatedconference to ICDM 

ICDM

KDD

Ning Zhong

A: Random walks!

conference to ICDM 

SDM

AAAI M. Jordan

R. RamakrishnanA: Random walks!

[Sun, ICDM2005]

NIPS…

2

Conference Author2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining

Page 3: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

0 008

0.009

0.011

0.0080.007

0.0050.011

0.005

0.0050 004

0.004

0.0040.004

32/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining

Page 4: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

[Pan, KDD ‘04]

Sea Sun Sky Wave{ } { }Cat Forest Grass Tiger

?A: Proximity on 

grpahs!

4

{?, ?, ?,}grpahs!

[Pan KDD2004]2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining

Page 5: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

[Pan, KDD ‘04]

Region

Image

Test Image

5

Sea Sun Sky Wave Cat Forest Tiger Grass

Keyword2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining

Page 6: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

[Pan, KDD ‘04]

Region

Image

Test Image{Grass, Forest, Cat, Tiger}

6

Sea Sun Sky Wave Cat Forest Tiger Grass

Keyword2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining

Page 7: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

[Tong‐Faloutsos, ‘06]

Connection subgraphs: Connection subgraphs: What is the most likely connection between Andrew McCallum and Yiming Yang:Andrew McCallum and Yiming Yang:

Michael I.Jordan

RongJin

AndrewNg 1 16

2

7

Andrew Yiming

Jordan JinNg

ZoubinJohn D.

2

22 2

3

McCallum Yang

Xiaojin

GhahramaniLaffterty

2

Fernando42

1

42

2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining 7

XiaojinZhu

Jian ZhangFernando

C.N. Pereira

Page 8: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

[Tong‐Faloutsos, ‘06]

Gi Q d B Given Q query nodes Find Center‐piece subgraph on b nodes

A C

subgraph on b nodes

Input of CEPS: Q Query nodes Q Query nodes Budget b k‐softAND coefficient B

2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining 8

A C

Page 9: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

[Tong‐Faloutsos, ‘06]

Individual Score Calculation: Individual Score Calculation: Measure proximity of each nodewith respect to individual query node

( , ) n Qr i j

with respect to individual query node

Combine Individual Scores:

Measure importance of a node tothe whole query set

1( , ) nr Q j

“Extract” the connection subgraph: arg max ( )H g H

2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining 9

Page 10: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

[Tong‐Faloutsos, ‘06]

Goal: Goal: Calculate importance score r(i,j) = ri,j f h d j d h d i for each node j and each query node i

How to do that? How to do that?

102/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining

Page 11: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

[Tong‐Faloutsos, ‘06]

I J1

1 1

A BH1 1

1 1D

1 1

1 11E

FG

1 11

11

a.k.a.: Relevance, Closeness, ‘Similarity’…

2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining

Page 12: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

Shortest path is not good:Shortest path is not good:

No influence for degree‐1 nodes (E, F)! known as ‘pizza delivery guy’ problem in undirected graphgraph

Multi‐faceted relationships2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining 12

Page 13: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

Max flow is not good: Max‐flow is not good:

Does not punish long paths

2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining 13

Page 14: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

[Tong‐Faloutsos, ‘06]

I J11 1

A BH1 1

D1 1

1 11•Multiple Connections…

EF

G1 11

• Quality of connection

•Direct & In‐direct connections

14

Direct & In direct connections

•Length, Degree, Weight…2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining

Page 15: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

910

2

9

12

1

3

8

11

4

56

15

7

2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining

Page 16: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

Node 4

Node 1Node 2Node 3Node 4

0.130.100.130.222

9 10

8

120.13

0.10

0.08

0.04

0.02

0.03

Node 5Node 6Node 7Node 8

0.130.050.050.08

1

4

3

6

811

30.13

0 05

0.04

Node 9Node 10Node 11Node 12

0.040.030.040.02

5 6

7

0.13

0.05

0.05

Ranking vector More red  more relevant

Nearby nodes, higher scores

16

More red, more relevant4r

2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining

Page 17: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

( , ) i jQ i j rj

W : adjacency matrix. 1( )Q I cW

,( , ) i jQ j

i

j

c: damping factor

2c 3c ...W 2W 3WQ I c W WQ

all paths from i all paths from i all paths from i

17

to jwith length 1 to jwith length 2 to jwith length 3

2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining

Page 18: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

(1 )r cWr c e

S i  

(1 )i i ir cWr c e R   b

0.13 0 1/3 1/3 1/3 0 0 0 0 0 0 0 00.10 1/3 0 1/3 0 0 0 0 1/4 0 0 0 0

0.13 00.10 0

Ranking vector Starting vectorTransition matrix Restart prob.

13

2

9 10

811

12

0.10 1/3 0 1/3 0 0 0 0 1/4 0 0 0 0.130.220.130.05

0.9

01/3 1/3 0 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 1/4 0 0 0 0 0 0 00 0 0 1/3 0 1/2 1/2 1/4 0 0 0 00 0 0 0 1/4 0 1/2 0 0 0 0 0

0.10 00.13 00.220.13 00.05 0

0.1

1

4

5 6

0.050.080.040.030.04

0 0 0 0 1/4 1/2 0 0 0 0 0 0 0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 00 0 0 0 0 0 0 1/4 0 1/3 0 00 0 0 0 0 0 0 0 1/2 0 1/3 1/2 0 0 0 0 0 0 0 1/4 0 1/3 0 1/2

0.05 00.08 00.04 00.03 00.04 0

7

0.02 0 0 0 0 0 0 0 0 0 1/3 1/3 0

2 00.0

n x n n x 1n x 1182/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining

Page 19: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

0 1/3 1/3 1/3 0 0 0 0 0 0 0 0 0 0 1/3 1/3 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 0 0 0 1/4 0 0 0 01/3 1/3 0 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 1/4 0 0 0 0 0 0 0

0001

Query

0.9

0 0 0 1/3 0 1/2 1/2 1/4 0 0 0 00 0 0 0 1/4 0 1/2 0 0 0 0 0

0 0 0 0 1/4 1/2 0 0 0 0 0 0

00

0.10

?? 0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 00 0 0 0 0 0 0 1/4 0 1/3 0 00 0 0 0 0 0 0 0 1/2 0 1/3 1/20 0 0 0 0

000

0 0 1/4 0 1/3 0 1/2 0

0 0 0 0 0 0 0 1/4 0 1/3 0 1/2 0

0 0 0 0 0 0 0 0 0 1/3 1/3 0 0

Adjacency matrix Starting vectorRanking vectorRanking vector Adjacency matrix Starting vectorRanking vectorRanking vector

19

2/4/2010Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining

Page 20: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

0 1/3 1/3 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 0 0 0 1/4 0 0 0 0

00

00

0.130.10

0.30

0.120.18

0.190.09

0.140.13

0.160.10

0.130.10 1/3 1/3 0 1/3 0 0 0 0 0 0 0 0

1/3 0 1/3 0 1/4

0.9

0 0 0 0 0 0 00 0 0 1/3 0 1/2 1/2 1/4 0 0 0 00 0 0 0 1/4 0 1/2 0 0 0 0 00 0 0 0 1/4 1/2 0 0 0 0 0 0

0

00

0.10

1

01000

0.130.220.130.050 05

1

2

9 10

8

12

0.30.10.300

0.120.350.030.070 07

0.190.180.180.040 04

0.140.260.100.060 06

0.160.210.150.050 05

0.130.220.130.050 05

12

9 10

811

120.130.10

0.08

0.04

0.02

0.03

0 0 0 0 1/4 1/2 0 0 0 0 0 0 0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 00 0 0 0 0 0 0 1/4 0 1/3 0 00 0 0 0 0 0 0 0 1/2 0 1/3 1/2

0 0 0 0 0

0000

0 0 1/4 0 1/3 0 1/2 0

00000

0.050.080.040.030.04

1

4

3

5 6

1100000

0.070.07000

0.040.060.0200.02

0.060.080.010.010.01

0.050.070.020.010.02

0.050.080.040.030.04

43

5 6

7

110.13

0.13

0.05

0.04

0 0 0 0 0 0 0 0 0 1/3 1/3 0 0 0

0.02

70 0 0 0 0.01 0.02

ir

ir

70.05

No pre computation / light storage

2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining 20

No pre‐computation / light storageSlow on‐line response O(mE)

Page 21: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

Q1 Q2 Q3

Node 1Node 2

0.5767 0.0088 0.00880.1235 0.0076 0.0076

0 00240 0333

0.0088

5

Node 3Node 4Node 5Node 6

0.0283 0.0283 0.02830.0076 0.1235 0.00760.0088 0.5767 0.00880.0076 0.0076 0.123510

11 12

13

4

0.1260

0 02830.0024

0.00760.00240.0333

Node 7Node 8Node 9Node 10

0.0088 0.0088 0.57670.0333 0.0024 0.12600.1260 0.0024 0.03330.1260 0.0333 0.0024

10 133

62

0.1235

0.0283

0.0076

Node 11Node 12Node 13

0.0333 0.1260 0.00240.0024 0.1260 0.03330.0024 0.0333 0.1260

19 8

0.5767

0.1260 0.0333

0.0088

7

2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining 21

Page 22: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

The RWR score r(i j) The RWR score r(i,j) i … query node, iQ j d i th t k j … node in the network

Combine r(i,j) to get the score r(Q,j)( ,j) g (Q,j) 1) AND query: Meeting probability prob. that |Q| random particles coincide on node jp |Q| p j

2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining 22

Page 23: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

OR query: OR query: At least one particle arrives to node j:(i d j i i t t t t l t 1 d )(i.e.,  node j is important to at least 1 query node)

k‐SoftAND query:d l k d Node j is important to at least k query nodes

Can be computed recursively:

2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining 23

Page 24: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

Goal:2

910Goal:

Extract a sub‐graph S on b nodes that maximizes 

4

68

9

1113

the uS r(Q, u) Idea: Iterate until budget is

1 35

4712

14 15 16 Iterate until budget is reached Pick not‐yet‐selected node j (Q j)

29

10

j=arg maxj r(Q,j) Find good paths to all query nodes 4

68

1113

Add the paths to S

2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining 24

1 35 712

Page 25: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

0 4505 0.0103

5

11 0 07100.10100.1010

0.45055

11 120.00190.00460.0046

0.0103

10

11 12

13

4

0.1010

0.22670.1010

0.0710

10

12

13

4

3

0.0046

0.00240.0046

3

62

0.0710

0 1010 0 1010

0.0710

1 7

3

62

0.0019

0 0046 0 0046

0.0019

AND

1 7

9 80.4505

0.1010 0.1010

0.4505

1 7

9 80.0103

0.0046 0.0046

0.0103

S f AND

25

AND 2‐SoftAND

Page 26: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

0 0103

5

0 16170.1617

0.0103

11 124

0 1617 0 1617

0.13870.16170.1617

10 133

0.1617

0.1387

0.08490.1617

0.1387

1 7

62

0.1617 0.1617

26

9 80.0103 0.0103

Page 27: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining 27

Page 28: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

Solving the random walk for each query Solving the random walk for each query separately is time consuming Not appropriate for real time Not appropriate for real‐time

Can we do better? Can we do better?

Idea 1) Pre compute scores for each possible Idea 1) Pre‐compute scores for each possible query node i

2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining 28

Page 29: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

1 2 3 4 5 6 7 8 9 10 11 12r r r r r r r r r r r r0.20 0.13 0.14 0.13 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.340.28 0.20 0.13 0.96 0.64 0.53 0.53 0.85 0.60 0.48 0.53 0.450.14 0.13 0.20 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.330.13 0.10 0.13 2.06 0.95 0.78 0.78 0.61 0.43 0.34 0.38 0.32

29 10

8120.13

0.100.08

0.04

0 02

0.03

0.09 0.09 0.09 1.27 2.41 1.97 1.97 1.05 0.73 0.58 0.66 0.560.03 0.04 0.04 0.52 0.98 2.06 1.37 0.43 0.30 0.24 0.27 0.220.03 0.04 0.04 0.52 0.98 1.37 2.06 0.43 0.30 0.24 0.27 0.220.08 0.11 0.04 0.82 1.05 0.86 0.86 2.13 1.49 1.19 1.33 1.130 03 0 04 0 03 0 28 0 36 0 30 0 30 0 74 1 78 1 00 0 76 0 79

1

43

5 6

8 110.13

0.130.05

0.02

0.04Q‐1:

0.03 0.04 0.03 0.28 0.36 0.30 0.30 0.74 1.78 1.00 0.76 0.790.04 0.04 0.04 0.34 0.44 0.36 0.36 0.89 1.50 2.45 1.54 1.800.04 0.05 0.04 0.38 0.49 0.40 0.40 1.00 1.14 1.54 2.28 1.720.02 0.03 0.02 0.21 0.28 0.22 0.22 0.56 0.79 1.20 1.14 2.05

70.13

0.05

Fast on‐line response

29

Heavy pre‐computation/storage costO(n3 ) O(n2)

Page 30: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

[Tong‐Faloutsos‐Pan, ‘06]

Find Communities 12

9 10

812

12

9 10

812

100.04 0.039 10

12

1

43

5 6

7

11

9 1012

1

43

5 6

7

11

1

43

29 10

8 11

120.130.10

0.13

0.080.02

0.04

1

43

28

11

12 78

11

12 7

1

43

2

45 6

70.13

0.05

0.05

4

5 6

7

5 6

7

4

Combine1 3

29 10

8 11

121 3

29 10

8 11

12

30

Fix the remaining4

3

5 6

7

43

5 6

7

Page 31: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

10

13

29 10

811

12

12

9 10

812

4

5 6

7

1

43

5 6

811

7 5

7

Within partition links cross partition links

31

Within‐partition links cross‐partition links

Page 32: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

9 109 10

1

43

29

811

12

1

43

29

811

12

4

5 6

7

4

5 6

7

32

Page 33: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

9 1012

9 1012

31

4

28

11

12

1

43

28

11

12

5 6

7

5 6

7

|S| << |W2|~| | | |

33

Page 34: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

Offline stage: Offline stage:

QQ1,1

Q 1,2

=11Q

Q1,k

C i i BridgesCommunities Bridges

8/22/2006 34

Page 35: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

Sherman Morrison Woodbury matrix identity: Sherman‐Morrison‐Woodbury matrix identity: Inverse of a rank‐k correction of matrix X can be computed by doing a rank‐k correction to X‐1:computed by doing a rank‐k correction to X :

In our case: In our case:Q‐1=c(I‐W)‐1=c(I‐W1‐W2)‐1=

1 1 11 U VQQ Q Q 8/22/2006 35

1 1 11 1

11U VcQQ Q Q

Page 36: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

Online stage: Online stage: Compute column i of:

1 1 11 QQ Q Q

U i

1 1 11 1

11U VcQQ Q Q

Using: 

8/22/2006 36

Page 37: Jure Leskovecand AnandRajaraman · 2/4/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining. 0 008 0.009 0.011. 0.007 0.005 0.005 0.005 0 004 0.004 0.004. 2/4/2010 Jure

Most slides are borrowed from the tutorial by Most slides are borrowed from the tutorial by Hanghang Tong and Christos Faloutsos from CMUCMU

http://www.cs.cmu.edu/~htong/tut/cikm2008/cikm_tutorial.html

8/22/2006 37