scs cmu outlinehtong/tut/cikm2008/part3_proximity.pdf · faloutsos, tong cikm, 2008 4 scs cmu...

19
Faloutsos, Tong CIKM, 2008 1 SCS CMU Large Graph Mining: Patterns, Tools and Case Studies CIKM’08 Copyright: Faloutsos, Tong (2008) Christos Faloutsos Hanghang Tong CMU 3-1 SCS CMU Outline Part 1: Patterns Part 2: Matrix and Tensor Tools Part 3: Proximity CIKM’08 Copyright: Faloutsos, Tong (2008) Part 3: Proximity Part 4: Case Studies 3-2 SCS CMU Part 3: Proximity on Graphs -Definitions, Fast Solutions, and Applications Copyright: Faloutsos, Tong (2008) 3-3 CIKM, 2008 Joint work with Christos Faloutsos (CMU) Jia-Yu Pan (Google) Yehuda Koren (AT&T Labs) IBM Spiros Papadimitriou Tina Eliassi-Rad (LLNL) Brian Gallagher (LLNL) Kensuke Oonuma (Sony Corp.) Yasushi Sakurai (NTT Labs) Philip S. Yu Huiming Qu Hani Jamjoom SCS CMU Recap: Graphs are everywhere! Copyright: Faloutsos, Tong (2008) 3-5 CIKM, 2008 SCS CMU Graph Mining: Big Picture John Smith Tom Jones Alan Peter Adam + Graph Level -Patterns -Laws -Generators + Subgraph Level - Community Adam Dan Jack Alice Amy Anna Beck Tom Cell Phone Message E-Mail Community + Node Level -Association -Correlation -Causality -Proximity Anna We are here! Copyright: Faloutsos, Tong (2008) 3-6

Upload: others

Post on 19-May-2020

18 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SCS CMU Outlinehtong/tut/cikm2008/part3_proximity.pdf · Faloutsos, Tong CIKM, 2008 4 SCS CMU Random walk with restart Node 4 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 0.13 0.10 0.13

Faloutsos, Tong CIKM, 2008

1

SCS CMU

Large Graph Mining:Patterns, Tools and Case Studies

CIKM’08 Copyright: Faloutsos, Tong (2008)

Christos FaloutsosHanghang Tong

CMU

3-1

SCS CMU

Outline

• Part 1: Patterns• Part 2: Matrix and Tensor Tools• Part 3: Proximity

CIKM’08 Copyright: Faloutsos, Tong (2008)

• Part 3: Proximity• Part 4: Case Studies

3-2

SCS CMU

Part 3: Proximity on Graphs-Definitions, Fast Solutions, and Applications

Copyright: Faloutsos, Tong (2008) 3-3CIKM, 2008

SCS CMU

Joint work with

Christos Faloutsos (CMU)

Jia-Yu Pan(Google)

Yehuda Koren(AT&T Labs)

IBM

3-4

Spiros Papadimitriou

Tina Eliassi-Rad(LLNL)

Brian Gallagher(LLNL)

Kensuke Oonuma(Sony Corp.)

Yasushi Sakurai (NTT Labs)

Philip S. Yu Huiming Qu Hani Jamjoom

SCS CMU

Recap: Graphs are everywhere!

Copyright: Faloutsos, Tong (2008) 3-5CIKM, 2008

SCS CMU

Graph Mining: Big Picture

John

Smith

TomJones

Alan

Peter

Adam

+ Graph Level-Patterns-Laws-Generators

+ Subgraph Level- Community

Adam

Dan

Jack

Alice

Amy Anna

Beck

Tom

Cell PhoneMessageE-Mail

Community+ Node Level

-Association-Correlation-Causality-Proximity

Anna

We are here! Copyright: Faloutsos, Tong (2008) 3-6

Page 2: SCS CMU Outlinehtong/tut/cikm2008/part3_proximity.pdf · Faloutsos, Tong CIKM, 2008 4 SCS CMU Random walk with restart Node 4 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 0.13 0.10 0.13

Faloutsos, Tong CIKM, 2008

2

SCS CMU

Proximity on Graph: What?

A BH1 1

I J1

1 1

Copyright: Faloutsos, Tong (2008)

D1 1

EF

G1 11

a.k.a Relevance, Closeness, ‘Similarity’… 3-7

SCS CMU

Proximity on Graphs: Why?

• Link prediction [Liben-Nowell+], [Tong+]• Ranking [Haveliwala], [Chakrabarti+]• Email Management [Minkov+]• Image caption [Pan+]g p [ ]• Neighborhooh Formulation [Sun+]• Conn. subgraph [Faloutsos+], [Tong+], [Koren+]• Pattern match [Tong+]• Collaborative Filtering [Fouss+]• Many more…

Will return to this laterCopyright: Faloutsos, Tong (2008) 3-8CIKM, 2008

SCS CMU

0.2

0.25

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.090

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

Link Prediction

Prox. Hist. for a set of deleted links

density

density

Prox (i j)+Prox (j i)

Prox. is effective to ‘deleted’ and absent edges!

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.090

0.05

0.1

0.15

Prox (i j)+Prox (j i)

Q: How to predict the existence of the link?A: Proximity! [Liben-Nowell + 2003]

Prox. Hist. for a set of absent links

Copyright: Faloutsos, Tong (2008) 3-9CIKM, 2008

SCS CMU Neighborhood Search on graphs

ICDM

KDD

SDM

Philip S. Yu

IJCAI

AAAI M. Jordan

Ning Zhong

R. Ramakrishnan

… …

NIPS…

Conference Author

A: Proximity! [Sun+ ICDM2005]

Q: what is most related conference to ICDM?

Copyright: Faloutsos, Tong (2008) 3-10CIKM, 2008

SCS CMU

Image

Region Automatic Image Caption

Test Image

Sea Sun Sky Wave Cat Forest Tiger Grass

Keyword

Q: How to assign keywords to the test image?A: Proximity! [Pan+ 2004]

3-11Copyright: Faloutsos, Tong (2008)CIKM, 2008

SCS CMU

Center-Piece Subgraph(CePS)

B

CePS guy

Input Output

A C

Original Graph CePS

Q: How to find hub for the black nodes?A: Proximity! [Tong+ KDD 2006]

Copyright: Faloutsos, Tong (2008) 3-12CIKM, 2008

Page 3: SCS CMU Outlinehtong/tut/cikm2008/part3_proximity.pdf · Faloutsos, Tong CIKM, 2008 4 SCS CMU Random walk with restart Node 4 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 0.13 0.10 0.13

Faloutsos, Tong CIKM, 2008

3

SCS CMU

OutputInput

Data Graph

Query Graph

Best-Effort Pattern Match

Matching Subgraph

Copyright: Faloutsos, Tong (2008) 3-13

Q: How to find matching subgraph?A: Proximity![Tong+ KDD 2007 b]CIKM, 2008

SCS CMU

Outline: Part 3

• Motivation• Definitions• Fast Solutions

• Basic: RWR

• Variants

• Properties

• Applications• Conclusion

• Generalizations

Copyright: Faloutsos, Tong (2008) 3-14CIKM, 2008

SCS CMU

Why not shortest path?

A BD1 1

E F11 1

A BD1 1

Some ``bad’’ proximities

‘pizza delivery guy’ problem

A BD1 1

A BD1 1

EF

G1 11

‘multi-facet’ relationship

Copyright: Faloutsos, Tong (2008) 3-15CIKM, 2008

SCS CMU

A BD1 1

Why not max. netflow?Some ``bad’’ proximities

A BD1 11 E

No punishment on long paths

Copyright: Faloutsos, Tong (2008) 3-16CIKM, 2008

SCS CMU

What is a ``good’’ Proximity?

A BH1 1

I J1

1 1

D1 1

EF

G1 11

• Multiple Connections

• Quality of connection

•Direct & In-directed Conns

•Length, Degree, Weight…

Copyright: Faloutsos, Tong (2008) 3-17CIKM, 2008

SCS CMU

13

2

910

811

12

Random walk with restart

Copyright: Faloutsos, Tong (2008)

4

3

56

7

3-18CIKM, 2008

Page 4: SCS CMU Outlinehtong/tut/cikm2008/part3_proximity.pdf · Faloutsos, Tong CIKM, 2008 4 SCS CMU Random walk with restart Node 4 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 0.13 0.10 0.13

Faloutsos, Tong CIKM, 2008

4

SCS CMU

Random walk with restartNode 4

Node 1Node 2Node 3Node 4Node 5Node 6

0.130.100.130.220.130.05

13

2

910

811

120.13

0.10

0.13

0.08

0.04

0.02

0.04

0.03

Node 7Node 8Node 9Node 10Node 11Node 12

0.050.080.040.030.040.02

4

5 6

7

0.13

0.05

0.05

Ranking vector More red, more relevantNearby nodes, higher scores

4rCopyright: Faloutsos, Tong (2008) 3-19CIKM, 2008

SCS CMU

Why RWR is a good score?

W : adjacency matrix. c: damping factor

1( )Q I cW −= − =

,( , ) i jQ i j r∝

i

j

2c 3cQ c= ...W 2W 3W

all paths from ito j with length 1

all paths from ito j with length 2

all paths from ito j with length 3

Copyright: Faloutsos, Tong (2008) 3-20CIKM, 2008

SCS CMU

Outline: Part 3

• Motivation• Definitions• Fast Solutions

• Basic: RWR

• Variants

• Properties

• Applications• Conclusion

• Generalizations

Copyright: Faloutsos, Tong (2008) 3-21CIKM, 2008

SCS CMU

Variant: escape probability• Define Random Walk (RW) on the graph• Esc_Prob(CMU Napa)

– Prob (starting at CMU, reaches Napa before returning to CMU)

CMU Napathe remaining graph

Copyright: Faloutsos, Tong (2008) 22Esc_Prob = Pr (smile before cry)

SCS CMU

Other Variants

• Other measure by RWs– Community Time/Hitting Time [Fouss+]– SimRank [Jeh+]

• Equivalence of Random WalksAll are “related to” or “similar to”Equivalence of Random Walks– Electric Networks:

• EC [Doyle+]; SAEC[Faloutsos+]; CFEC[Koren+] – String Systems

• Katz [Katz], [Huang+], [Scholkopf+]• Matrix-Forest-based Alg [Chobotarev+]

All are “related to” or “similar to” random walk with restart!

Copyright: Faloutsos, Tong (2008) 3-23CIKM, 2008

SCS CMU

Chaptering different measurements

RWR

Esc_Prob+ Sink

Hitting Time/Commute Time

RegularizedUn-constrainedQuad Opt.

relax

4 ssp decides 1 esc_prob

KatzNormalize

+ Sink Commute Time

Effective Conductance

String System

Harmonic Func.ConstrainedQuad Opt.

Mathematic Tools

X out-degree

“voltage = position”

CIKM, 2008 Physical Models

Page 5: SCS CMU Outlinehtong/tut/cikm2008/part3_proximity.pdf · Faloutsos, Tong CIKM, 2008 4 SCS CMU Random walk with restart Node 4 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 0.13 0.10 0.13

Faloutsos, Tong CIKM, 2008

5

SCS CMU

Outline: Part 3

• Motivation• Definitions• Fast Solutions

• Basic: RWR

• Variants

• Properties

• Applications• Conclusion

• Generalizations

Copyright: Faloutsos, Tong (2008) 3-25CIKM, 2008

SCS CMU

Property: Monotonicity

A B

Candidate Graph

Original Graph

We want:

A B

Prox ( ) Prox ( )candi origianla b a b≤→ →A: degree preserving! [Koren+ KDD06][Tong+ KDD07a][Tong+ SDM08]

Copyright: Faloutsos, Tong (2008) 26CIKM, 2008

SCS CMU

Property: Asymmetry [Tong+ KDD07 a]

A B

1 1

1 A B

1 1

0.5

111

11

10.51

What is Prox from A to B?What is Prox from B to A?

What is Prox between A and B?

Copyright: Faloutsos, Tong (2008) 3-27CIKM, 2008

SCS CMU

Asymmetry in un-directed graphs• Hanghang’s # 1 Conf. is CIKM• The #1 author of CIKM is ...

So is love…

HanghangCIKM

Copyright: Faloutsos, Tong (2008) 3-28CIKM, 2008

SCS CMU

Outline: Part 3

• Motivation• Definitions• Fast Solutions

• Basic: RWR

• Variants

• PropertiesFast Solutions

• Applications• Conclusion

• Generalizations

Copyright: Faloutsos, Tong (2008) 3-29CIKM, 2008

SCS CMU

Group Proximity [Tong+ KDD07 a]

• Q: How close are Accountants to SECs?

Copyright: Faloutsos, Tong (2008) 3-30CIKM, 2008

• A: Prob (starting at any RED, reaches anyGREEN before touching any RED again)

Page 6: SCS CMU Outlinehtong/tut/cikm2008/part3_proximity.pdf · Faloutsos, Tong CIKM, 2008 4 SCS CMU Random walk with restart Node 4 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 0.13 0.10 0.13

Faloutsos, Tong CIKM, 2008

6

SCS CMU

Proximity on Attributed Graphs [Tong+ KDD07 b]

What is the proximity from node 7 to 10?If we know that…

Copyright: Faloutsos, Tong (2008) 3-31CIKM, 2008

SCS CMU

A: Augmented graphs [Tong+ KDD07 b]

Copyright: Faloutsos, Tong (2008) 3-32CIKM, 2008

SCS CMU

More on Generalizations

• Attributed on edges [Chakrabarti+ KDD 06]

• Proximity w/ Time– [Minkov+], [Tong+ SDM 2008], [Tong+ CIKM 2008]

• Proximity w/ Side Information [Tong+ 2008]

• …

Copyright: Faloutsos, Tong (2008) 3-33CIKM, 2008

SCS CMU

Summary of Proximity Definitions• Goal: Summarize multiple … relationship

• Solutions– Basic: Random Walk with Restart

• [Pan+ 2004][Sun+ 2006][Tong+ 2006][ ][ ][ g ]

– Properties: Asymmetry, monotonicity• [Koren+ 2006][Tong+ 2007] [Tong+ 2008]

– Variants: Esc_Prob and many others.• [Faloutsos+ 2004] [Koren+ 2006][Tong+ 2007]

– Generalizations: Group Prox, w/ Attr., w/ Time, w/ Side Information

• [Charkrabarti+ 2006][Tong+ 2007] [Tong+ 2008]Copyright: Faloutsos, Tong (2008) 3-34CIKM, 2008

SCS CMU

Outline: Part 3

• Motivation• Definitions• Fast Solutions• Applications• Conclusion

Copyright: Faloutsos, Tong (2008) 3-35CIKM, 2008

SCS CMU

Outline: Part 3

• Motivation• Definitions• Fast Solutions

• B_Lin: RWR

• BB_Lin: Skewed BGs

• Applications• Conclusion

Copyright: Faloutsos, Tong (2008) 3-36CIKM, 2008

• FastUpdate: Time-Evolving

Page 7: SCS CMU Outlinehtong/tut/cikm2008/part3_proximity.pdf · Faloutsos, Tong CIKM, 2008 4 SCS CMU Random walk with restart Node 4 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 0.13 0.10 0.13

Faloutsos, Tong CIKM, 2008

7

SCS CMU

Preliminary: Sherman–Morrison Lemma

Tv

uAA =If:

1 11 1 1

1( )1

TT

T

A u v AA A u v Av A u

− −− − −

⋅ ⋅= + ⋅ = −+ ⋅ ⋅

Then:

Copyright: Faloutsos, Tong (2008) 3-37CIKM, 2008

SCS CMU

SM: The block-form A

D

B

C1 1 1 1 1 1 1 1 1

1 1 1 1 1

( ) ( )( ) ( )

A B A A B D CA B CA A B D CA BC D D CA B CA D CA B

− − − − − − − − −

− − − − −

⎡ ⎤+ − − −⎡ ⎤= ⎢ ⎥⎢ ⎥ − − −⎢ ⎥⎣ ⎦ ⎣ ⎦

Or1 1 1 1 1 1

1 1 1 1 1

( ) ( )( ) ( )

A B A BD C A B D CA BC D D C A BD C D CA B

− − − − − −

− − − − −

⎡ ⎤− − −⎡ ⎤= ⎢ ⎥⎢ ⎥ − − −⎢ ⎥⎣ ⎦ ⎣ ⎦

And many other variants…Also known as Woodburg Identity

Or…

Copyright: Faloutsos, Tong (2008) 3-38CIKM, 2008

SCS CMU

SM Lemma: Applications

• RLS (Recursive least square)– and almost any algorithm in time series!

• Leave-one-out cross validation for LSR• Kalman filtering• Incremental matrix decomposition• …• … and all the fast sol.s we will introduce!

Copyright: Faloutsos, Tong (2008) 3-39CIKM, 2008

SCS CMU

Computing RWR

9 1012

0.13 0 1/3 1/3 1/3 0 0 0 0 0 0 0 00.10 1/3 0 1/3 0 0 0 0 1/4 0 0 0 0.13

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟

01/3 1/3 0 1/3 0 0 0 0 0 0 0 0

⎛ ⎞⎜⎜⎜⎜

0.13 00.10 00.13 0

⎛ ⎞ ⎛ ⎞⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟

Ranking vector Starting vectorAdjacency matrix

(1 )i i ir cWr c e= + −Restart p

1

43

2

5 6

7

811

120.220.130.05

0.90.050.080.040.030.040.02

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ = ×⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

1/3 0 1/3 0 1/4 0 0 0 0 0 0 00 0 0 1/3 0 1/2 1/2 1/4 0 0 0 00 0 0 0 1/4 0 1/2 0 0 0 0 0 0 0 0 0 1/4 1/2 0 0 0 0 0 0 0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 00 0 0 0 0 0 0 1/4 0 1/3 0 00 0 0 0 0 0 0 0 1/2 0 1/3 1/2 0 0 0 0 0 0 0 1/4 0 1/3 0 1/2 0 0 0 0 0 0 0 0 0 1/3 1/3 0

⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝ ⎠

0.220.13 00.05 0

0.10.05 00.08 00.04 00.03 00.04 0

2 0

1

0.0

⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟+ ×⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠

n x n n x 1n x 1

1

Copyright: Faloutsos, Tong (2008) 3-40CIKM, 2008

SCS CMU

0 1/3 1/3 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 0 0 0 1/4 0 0 0 01/3 1/3 0 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 1/4 0 0 0 0 0 0 00 0 0 1/3 0 1/2 1/2 1/4 0 0 0 00 0 0 0 1/4 0 1/2 0 0 0 0 0

000

00

1

⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟

Q: Given query i, how to solve it?

??

Query

0.9= ×0 0 0 0 1/4 0 1/2 0 0 0 0 0

0 0 0 0 1/4 1/2 0 0 0 0 0 0 0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 00 0 0 0 0 0 0 1/4 0 1/3 0 00 0 0 0 0 0 0 0 1/2 0 1/3 1/2

0 0 0 0 0

00.1

0000

0 0 1/4 0 1/3 0 1/2 0 0 0 0 0 0 0 0 0 0 1/3 1/3 0 0

⎜ ⎟ ⎜ ⎟+ ×⎜ ⎟ ⎜ ⎟

⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠

??

Adjacency matrix Starting vectorRanking vectorRanking vector

Copyright: Faloutsos, Tong (2008) 3-41CIKM, 2008

SCS CMU

14

32

6

9 10

8 11120.13

0.10

0.13

0 05

0.08

0.04

0.02

0.04

0.03

OntheFly: 0 1/3 1/3 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 0 0 0 1/4 0 0 0 01/3 1/3 0 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 1/4

0.9= ×

0 0 0 0 0 0 00 0 0 1/3 0 1/2 1/2 1/4 0 0 0 00 0 0 0 1/4 0 1/2 0 0 0 0 0

0 0 0 0 1/4 1/2 0 0 0 0 0 0 0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 00 0 0 0 0 0 0 1/4 0 1/3 0 0

000

00

0.1000

1

⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟

+ ×⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟

000100000

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟

0.130.100.130.220.130.050.050.080.04

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟

1

43

2

6

9 10

8 11

12

5 6

70.13

0.05

0.05

0 0 0 0 0 0 0 0 1/2 0 1/3 1/2 0 0 0 0 0

00 0 1/4 0 1/3 0 1/2 0

0 0 0 0 0 0 0 0 0 1/3 1/3 0 0

⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠

000

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

0.030.040.02

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

5 6

7

No pre-computation/ light storage

Slow on-line response O(mE)

ir ir

Copyright: Faloutsos, Tong (2008) 3-42CIKM, 2008

Page 8: SCS CMU Outlinehtong/tut/cikm2008/part3_proximity.pdf · Faloutsos, Tong CIKM, 2008 4 SCS CMU Random walk with restart Node 4 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 0.13 0.10 0.13

Faloutsos, Tong CIKM, 2008

8

SCS CMU

0.20 0.13 0.14 0.13 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.340.28 0.20 0.13 0.96 0.64 0.53 0.53 0.85 0.60 0.48 0.53 0.450.14 0.13 0.20 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.330.13 0.10 0.13 2.06 0.95 0.78 0.78 0.61 0.43 0.34 0.38 0.320.09 0.09 0.09 1.27 2.41 1.97 1.97 1.05 0.73 0.58 0.66 0.560.03 0.04 0.04 0.52 0.98 2.06 1.37 0.43 0.30 0.24 0.27 0.22

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟

4

PreCompute1 2 3 4 5 6 7 8 9 10 11 12r r r r r r r r r r r r

1 32

6

9 10

8 11

12

R:0.03 0.04 0.04 0.52 0.98 1.37 2.06 0.43 0.30 0.24 0.27 0.220.08 0.11 0.04 0.82 1.05 0.86 0.86 2.13 1.49 1.19 1.33 1.130.03 0.04 0.03 0.28 0.36 0.30 0.30 0.74 1.78 1.00 0.76 0.790.04 0.04 0.04 0.34 0.44 0.36 0.36 0.89 1.50 2.45 1.54 1.800.04 0.05 0.04 0.38 0.49 0.40 0.40 1.00 1.14 1.54 2.28 1.720.02 0.03 0.02 0.21 0.28 0.22 0.22 0.56 0.79 1.20 1.14 2.05

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

5 6

7

[Haveliwala+ 2002]c x Q

Q Copyright: Faloutsos, Tong (2008) 3-43CIKM, 2008

SCS CMU

2.20 1.28 1.43 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.341.28 2.02 1.28 0.96 0.64 0.53 0.53 0.85 0.60 0.48 0.53 0.451.43 1.28 2.20 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.331.29 0.96 1.29 2.06 0.95 0.78 0.78 0.61 0.43 0.34 0.38 0.320.91 0.86 0.91 1.27 2.41 1.97 1.97 1.05 0.73 0.58 0.66 0.560.37 0.35 0.37 0.52 0.98 2.06 1.37 0.43 0.30 0.24 0.27 0.22

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟

PreCompute:

14

32

6

9 10

8 11120.13

0.10

0.13

0.08

0.04

0.02

0.04

0.03

0.37 0.35 0.37 0.52 0.98 1.37 2.06 0.43 0.30 0.24 0.27 0.220.84 1.14 0.84 0.82 1.05 0.86 0.86 2.13 1.49 1.19 1.33 1.130.29 0.40 0.29 0.28 0.36 0.30 0.30 0.74 1.78 1.00 0.76 0.790.35 0.48 0.35 0.34 0.44 0.36 0.36 0.89 1.50 2.45 1.54 1.800.39 0.53 0.39 0.38 0.49 0.40 0.40 1.00 1.14 1.54 2.28 1.720.22 0.30 0.22 0.21 0.28 0.22 0.22 0.56 0.79 1.20 1.14 2.05

⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

5 6

70.13

0.05

0.05

Fast on-line response

Heavy pre-computation/storage costO(n ) O(n )3 2 3-44

SCS CMU

Q: How to Balance?

On-line Off-line

Copyright: Faloutsos, Tong (2008) 3-45CIKM, 2008

SCS CMU

B_Lin: Basic Idea[Tong+ ICDM 2006]

1 32

9 10

8 11120.13

0.10

0.13

0.08

0.04

0.02

0 04

0.03

13

29 10

811

12

Find Community 14

32

5 6

7

9 10

8 1112

9 10

811

12

14

32

5 6

7

9 10

8 1112

13

2

45 6

70.13

0.05

0.05

0.044

5 6

7

Fix the remaining

Combine1

43

2

5 6

7

9 10

8 1112

5 6

7

14

32

5 6

7

9 10

8 1112

4

3-46

SCS CMU

+~~

B_Lin: details

W~

details

W 1: within community~

Cross-community

Copyright: Faloutsos, Tong (2008) 3-47CIKM, 2008

SCS CMU

B_Lin: details

W~I – c ~~ I – c – U S VW1~

-1 -1

details

Easy to be inverted LRA difference

SM Lemma!

Copyright: Faloutsos, Tong (2008) 3-48CIKM, 2008

Page 9: SCS CMU Outlinehtong/tut/cikm2008/part3_proximity.pdf · Faloutsos, Tong CIKM, 2008 4 SCS CMU Random walk with restart Node 4 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 0.13 0.10 0.13

Faloutsos, Tong CIKM, 2008

9

SCS CMU

B_Lin: summary

• Pre-Computational Stage• Q: • A: A few small, instead of ONE BIG, matrices inversions

Efficiently compute and store Q

• On-Line Stage• Q: Efficiently recover one column of Q• A: A few, instead of MANY, matrix-vector multiplications

Copyright: Faloutsos, Tong (2008) 3-49CIKM, 2008

SCS CMU

Query Time vs. Pre-Compute Time

Log Query Time

•Quality: 90%+ •On-line:

Log Pre-compute Time

•Up to 150x speedup•Pre-computation:

•Two orders saving

Copyright: Faloutsos, Tong (2008) 3-50CIKM, 2008

SCS CMU

Outline: Part 3

• Motivation• Definitions• Fast Solutions

• B_Lin: RWR

• BB_Lin: Skewed BGs

• Applications• Conclusion

Copyright: Faloutsos, Tong (2008) 3-51CIKM, 2008

• FastUpdate: Time-Evolving

SCS CMU

RWR on Bipartite Graph

n

authorsAuthor-Conf. Matrix

Observation: n >> m!

Examples: n

mConferences

p1. DBLP: 400k aus, 3.5k confs2. NetFlix: 2.7M usrs,18k mvs

Copyright: Faloutsos, Tong (2008) 3-52CIKM, 2008

SCS CMU

• Q: Given query i, how to solve it?

0 1/3 1/3 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 0 0 0 1/4 0 0 0 01/3 1/3 0 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 1/4 0 0 0 0 0 0 0

0001

⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟

RWR on Skewed bipartite graphs

… .. .

. .

m confs

1/3 0 1/3 0 1/4

0.9= ×

0 0 0 0 0 0 00 0 0 1/3 0 1/2 1/2 1/4 0 0 0 00 0 0 0 1/4 0 1/2 0 0 0 0 0

0 0 0 0 1/4 1/2 0 0 0 0 0 0 0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 00 0 0 0 0 0 0 1/4 0 1/3 0 00 0 0 0 0 0 0 0 1/2 0 1/3 1/2

0 0 0 0 0

00

0.10000

0 0 1/4 0 1/3 0 1/2 0 0 0 0 0 0 0 0 0 0 1/3 1/3

1

0 0

⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟

+ ×⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠

??…. .. . . …. ...

0

0

n m

Ar

… .

. .. . …. .. . . …. ... Ac n aus

3-53

SCS CMU

• Step 1:

BB_Lin: Pre-Computation dd[Tong+ ICDM 06]

M = AcArX

2-step RWR for Conferences

m conferences

details

n authorsCopyright: Faloutsos, Tong (2008) 3-54CIKM, 2008

Page 10: SCS CMU Outlinehtong/tut/cikm2008/part3_proximity.pdf · Faloutsos, Tong CIKM, 2008 4 SCS CMU Random walk with restart Node 4 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 0.13 0.10 0.13

Faloutsos, Tong CIKM, 2008

10

SCS CMU

• Step 1:

• Step 2:

M = AcArX

1( 0 9 )I M −Λ = ×

2-step RWR for Conferences

All Conf-Conf Prox. Scores

m conferences

BB_Lin: Pre-Computation dd[Tong+ ICDM 06] details

• Step 2: ( 0.9 )I MΛ = − ×

n authorsCopyright: Faloutsos, Tong (2008) 3-55CIKM, 2008

SCS CMU

• Step 1:

• Step 2:

M = Ac ArX

1( 0 9 )I M −Λ = ×

2-step RWR for Conferences

All Conf-Conf Prox. Scores

BB_Lin: Pre-Computation dd[Tong+ ICDM 06] details

• Step 2:

• Cost:

• Examples – NetFlix: 1.5hr for pre-computation; – DBLP: 1 few minutes

( 0.9 )I MΛ = − ×3( )O m m E+ ⋅

Ac/ArE edges

m x m

SCS CMU

BB_Lin: On-Line Stage

(Base) Case 1: - Conf - Conf

authors

Conferences

Read out !

Ac/ArE edges 3-57CIKM, 2008

SCS CMU

BB_Lin: On-Line Stage

Case 2: - Au - Conf

authors

Conferences

Ac/ArE edges

1 matrix-vec!

3-58CIKM, 2008

SCS CMU

BB_Lin: On-Line Stage

Case 3: - Au - Au

authors

Conferences

Ac/ArE edges

2 matrix-vec!

3-59CIKM, 2008

SCS CMU

BB_Lin: Examples

Dataset Off-Line Cost On-Line CostDBLP a few minutes frac. of sec.

NetFlix 1.5 hours <0.01 sec.

400k authors x 3.5k conf.s 2.7m user

x 18k movies

Copyright: Faloutsos, Tong (2008) 3-60CIKM, 2008

Page 11: SCS CMU Outlinehtong/tut/cikm2008/part3_proximity.pdf · Faloutsos, Tong CIKM, 2008 4 SCS CMU Random walk with restart Node 4 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 0.13 0.10 0.13

Faloutsos, Tong CIKM, 2008

11

SCS CMU

Outline: Part 3

• Motivation• Definitions• Fast Solutions

• B_Lin: RWR

• BB_Lin: Skewed BGs

• Applications• Conclusion

Copyright: Faloutsos, Tong (2008) 3-61CIKM, 2008

• FastUpdate: Time-Evolving

SCS CMU

Challenges

• BB_Lin is good for skewed bipartite graphs– for NetFlix (2.7M nodes and 100M edges)– On-line cost for query: fraction of seconds

• w/ 1.5 hr pre-computation for m x m core matrix

• But…what if the graph is evolving over time– New edges/nodes arrive; edge weights increase…– On-line cost: 1.5hr itself becomes a part of this!

Copyright: Faloutsos, Tong (2008) 3-62CIKM, 2008

SCS CMU

1( 0.9 )I M −Λ = − × 3( )O m m E+ ⋅t=0

Q: How to update the core matrix?

t=11( 0.9 )I M −Λ = − ×

~ ~ 3( )O m m E+ ⋅

?Copyright: Faloutsos, Tong (2008) 3-63CIKM, 2008

SCS CMU

Update the core matrix• Step 1:

M = Ac ArX~ M= X+

details

• Step 2:1( 0.9 )I M −Λ = − ×~ ~ Rank 2 update

= + X

2( )O m 3( )O m m E+ ⋅ 3-64CIKM, 2008

SCS CMU

Update : General Case

M = AcArX~

n authors

m Conferences

details

• E’ edges changed• Involves n’ authors, m’ confs.

• Observationmin( ', ') 'n m E

Copyright: Faloutsos, Tong (2008) 3-65CIKM, 2008

SCS CMU

• Observation: – the rank of update is small!– Real Example (DBLP Post)

• 1258 time steps

Update : General Case

min( ', ') 'n m E n authors

m Conferences

details

• 1258 time steps• E’ up to ~20,000!• min(n’,m’) <=132

• Our Algorithm2(min( ', ') ')O n m m E+

3( )O m m E+ ⋅ 663-66

Page 12: SCS CMU Outlinehtong/tut/cikm2008/part3_proximity.pdf · Faloutsos, Tong CIKM, 2008 4 SCS CMU Random walk with restart Node 4 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 0.13 0.10 0.13

Faloutsos, Tong CIKM, 2008

12

SCS CMU

Fast-Single-Update

176x speeduplog(Time) (Seconds)

40x speedup

Datasets

Our method

Our method

Copyright: Faloutsos, Tong (2008)

SCS CMU

Fast-Batch-UpdateTime (Seconds)Time (Seconds)

Min (n’, m’)E’

15x speed-up on average!

Our method Our method

Copyright: Faloutsos, Tong (2008) 3-68CIKM, 2008

SCS CMU

More on “Fast Solutions”

• FastAllDAP– Simultaneously solve multiple linear systems– [Tong+ KDD 2007 a]

• MT3– Multiple-Resolution Analysis on Time [Tong+ CIKM 2008]

• Fast-ProSIN– On-Line response for users’ feedback [Tong+ ICDM 2008]

Copyright: Faloutsos, Tong (2008) 3-69CIKM, 2008

SCS CMU

Summary of Fast Solutions

• Goal: Efficiently Solve Linear System(s)

• Sols.– B_Lin: one large linear system [Tong+ ICDM06]

– BB_Lin: the intrinsic complexity is small [Tong+ ICDM06]

– FastUpdate: dynamic linear system [Tong+ SDM08]– FastAllDAP: multiple linear systems [Tong+ KDD07 a]– MT3: [Tong+ CIKM 2008]

– Fast-ProSIN: [Tong+ 2008]Copyright: Faloutsos, Tong (2008) 3-70CIKM, 2008

SCS CMU

Outline: Part 3

• Motivation• Definitions• Fast Solutions• Applications• Conclusion

Copyright: Faloutsos, Tong (2008) 3-71CIKM, 2008

SCS CMU

Outline: Part 3

• Motivation• Definitions• Fast Solutions

• Link Prediction & +

• Ranking Related Tasks• Applications• Conclusion

Copyright: Faloutsos, Tong (2008) 3-72CIKM, 2008

• Ranking Related Tasks

• User Specific Patterns

• Time Related Tasks

Page 13: SCS CMU Outlinehtong/tut/cikm2008/part3_proximity.pdf · Faloutsos, Tong CIKM, 2008 4 SCS CMU Random walk with restart Node 4 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 0.13 0.10 0.13

Faloutsos, Tong CIKM, 2008

13

SCS CMU

0.2

0.25

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.090

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18 Link Prediction:existence

Prox. Hist. for a set of deleted links

density

density

Prox (i j)+Prox (j i)

Prox. is effective to ‘deleted’ and absent edges!

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.090

0.05

0.1

0.15

Prox (i j)+Prox (j i)

Q: How to predict the existence of the link?A: Proximity! [Liben-Nowell + 2003]

Prox. Hist. for a set of absent links

Copyright: Faloutsos, Tong (2008) 3-73CIKM, 2008

SCS CMU

Link Prediction: direction [Tong+ KDD 07 a]

• Q: Given the existence of the link, what is the direction of the link?

• A: Compare prox(i j) and prox(j i)>70%>70%

Prox (i j) - Prox (j i)

density

CIKM, 2008

SCS CMU

Beyond Link Prediction

• Collaborative Filtering [Fouss+]

• Name Disambiguation – [Minkov+ SIGIR 06]

• Anomaly Nodes/Edges – ‘a’ is abnormal if the neighborhood of ‘a’ is so different– [Sun+ ICDM 2005]

Copyright: Faloutsos, Tong (2008) 3-75CIKM, 2008

SCS CMU

Outline: Part 3

• Motivation• Definitions• Fast Solutions

• Link Prediction & +

• Ranking Related Tasks• Applications• Conclusion

Copyright: Faloutsos, Tong (2008) 3-76CIKM, 2008

• Ranking Related Tasks

• User Specific Patterns

• Time Related Tasks

SCS CMU Neighborhood Search on graphs

ICDM

KDD

SDM

Philip S. Yu

IJCAI

AAAI M. Jordan

Ning Zhong

R. Ramakrishnan

… …

NIPS…

Conference Author

A: Proximity! [Sun+ ICDM2005]

Q: what is most related conference to ICDM?

Copyright: Faloutsos, Tong (2008)CIKM, 2008

SCS CMU

NF: example

ICDM

KDD

SDM

PKDD

PAKDD

ICML0.009

0.011

0.0080.007

0.005

ICDM

ECML

CIKM

DMKD

SIGMOD

ICDE0.005

0.0050.004

0.004

0.004

Copyright: Faloutsos, Tong (2008) 3-78CIKM, 2008

Page 14: SCS CMU Outlinehtong/tut/cikm2008/part3_proximity.pdf · Faloutsos, Tong CIKM, 2008 4 SCS CMU Random walk with restart Node 4 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 0.13 0.10 0.13

Faloutsos, Tong CIKM, 2008

14

SCS CMU

gCaP: Automatic Image Caption• Q

Sea Sun Sky Wave{ } { }Cat Forest Grass Tiger

{?, ?, ?,}

?A: Proximity!

[Pan+ KDD2004]

Copyright: Faloutsos, Tong (2008) 3-79CIKM, 2008

SCS CMU

Image

Region

Test Image

Sea Sun Sky Wave Cat Forest Tiger Grass

Image

KeywordCopyright: Faloutsos, Tong (2008) 3-80

SCS CMU

Image

Region

Test Image

Sea Sun Sky Wave Cat Forest Tiger Grass

Image

KeywordCopyright: Faloutsos, Tong (2008) 3-81

SCS CMU

C-DEM: Multi-Modal Query System for DrosophilaEmbryo Databases [Fan+ VLDB 2008]

Copyright: Faloutsos, Tong (2008) 3-82CIKM, 2008

SCS CMUC-DEM (Screen-shot)

SCS CMU

Outline: Part 3

• Motivation• Definitions• Fast Solutions

• Link Prediction & +

• Ranking Related Tasks• Applications• Conclusion

Copyright: Faloutsos, Tong (2008) 3-84CIKM, 2008

• Ranking Related Tasks

• User Specific Patterns

• Time Related Tasks

Page 15: SCS CMU Outlinehtong/tut/cikm2008/part3_proximity.pdf · Faloutsos, Tong CIKM, 2008 4 SCS CMU Random walk with restart Node 4 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 0.13 0.10 0.13

Faloutsos, Tong CIKM, 2008

15

SCS CMU

Center-Piece Subgraph(CePS)

B

CePS guy

Input Output

A C

Original Graph CePS

Q: How to find hub for the black nodes?A: Proximity! [Tong+ KDD 2006]

Red: Max (Prox(A, Red) x Prox(B, Red) x Prox(C, Red))

SCS CMU

CePS: Example

R. Agrawal Jiawei Han

H.V. Jagadish

Laks V.S. Lakshmanan

Heikki

15 10 13

1 110

V. Vapnik M. Jordan

Mannila

Christos Faloutsos

Padhraic Smyth

Corinna Cortes

1 1

6

1 1

4 Daryl Pregibon

2

11 3

16

Copyright: Faloutsos, Tong (2008) 3-86CIKM, 2008

SCS CMU

K_SoftAnd: Relaxation of AND

Asking AND query? No Answer!

Disconnected Communities

Noise

Copyright: Faloutsos, Tong (2008) 3-87CIKM, 2008

SCS CMU

R. Agrawal Jiawei Han

H.V. Jagadish

Laks V.S. Lakshmanan

Umeshwar D l

1510 13

3 3

CePS: 2 SoftANDDB

V. Vapnik M. Jordan

Dayal

Bernhard Scholkopf

Peter L. Bartlett

Alex J. Smola

5 2 2

327

4

Stat.

SCS CMU

OutputInput

Data Graph

Query Graph

Best-Effort Pattern Match

Matching Subgraph

Q: How to find matching subgraph?A: Proximity![Tong+ KDD 2007 b]

CIKM, 2008

SCS CMU

G-Ray: How to?

matching nodematching node

matching node

details

matching node

Goodness = Prox (12, 4) x Prox (4, 12) xProx (7, 4) x Prox (4, 7) x

Prox (11, 7) x Prox (7, 11) xProx (12, 11) x Prox (11, 12)

Copyright: Faloutsos, Tong (2008) 3-90CIKM, 2008

Page 16: SCS CMU Outlinehtong/tut/cikm2008/part3_proximity.pdf · Faloutsos, Tong CIKM, 2008 4 SCS CMU Random walk with restart Node 4 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 0.13 0.10 0.13

Faloutsos, Tong CIKM, 2008

16

SCS CMU

Effectiveness: star-query

Query ResultCopyright: Faloutsos, Tong (2008) 3-91CIKM, 2008

SCS CMU

Effectiveness: line-query

Query

ResultCopyright: Faloutsos, Tong (2008) 3-92CIKM, 2008

SCS CMU

Query

Effectiveness: loop-query

Result

3-93CIKM, 2008

SCS CMU

Outline: Part 3

• Motivation• Definitions• Fast Solutions

• Link Prediction & +

• Ranking Related Tasks• Applications• Conclusion

Copyright: Faloutsos, Tong (2008) 3-94CIKM, 2008

• Ranking Related Tasks

• User Specific Patterns

• Time Related Tasks

SCS CMU

Challenge• Graphs are evolving over time!

–New nodes/edges show up; –Existing nodes/edges die out;

Ed i ht h–Edge weights change…

Q: How to generalize everything? A: Track Proximity! [Tong+ SDM 2008]

Copyright: Faloutsos, Tong (2008) 3-95CIKM, 2008

SCS CMU

pTrack/cTrack: Trend analysis on graph level

G.Hinton

T. SejnowskiRank of Influential-ness

M. Jordan

C. Koch

Year

Copyright: Faloutsos, Tong (2008) 3-96CIKM, 2008

Page 17: SCS CMU Outlinehtong/tut/cikm2008/part3_proximity.pdf · Faloutsos, Tong CIKM, 2008 4 SCS CMU Random walk with restart Node 4 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 0.13 0.10 0.13

Faloutsos, Tong CIKM, 2008

17

SCS CMU

pTrack: Problem Definition

• [Given]– (1) a large, skewed time-evolving bipartite

graphs, – (2) the query nodes of interest

3-97

• [Track]– (1) top-k most related nodes for each query

node at each time step t; – (2) the proximity score (or rank of proximity)

between any two query nodes at each time step t

Copyright: Faloutsos, Tong (2008)CIKM, 2008

SCS CMU

pTrack: Philip S. Yu’s Top-5 conferences up to each year

ICDEICDCS

SIGMETRICSPDISVLDB

CIKMICDCSICDE

SIGMETRICSICMCS

KDDSIGMOD

ICDMCIKM

ICDCS

ICDMKDDICDESDMVLDB

1992 1997 2002 2007

DatabasesPerformanceDistributed Sys.

DatabasesData Mining

DBLP: (Au. x Conf.)- 400k aus, - 3.5k confs - 20 yrs

Copyright: Faloutsos, Tong (2008) 3-98CIKM, 2008

SCS CMU

KDD’s Rank wrt. VLDB over years

Prox.Rank

Year

Data Mining and Databases are more and more relavant!

CIKM, 2008 3-99

SCS CMU

cTrack:10 most influential authors in NIPS community up to each year

T. Sejnowski

Author-paper bipartite graph from NIPS 1987-1999. 3k. 1740 papers, 2037 authors, spreading over 13 years

M. Jordan

3-100

SCS CMU

T3: Understand Time in Complex Context [Tong+ CIKM 2008]

Time Cluster, rep. entities: b7,b6, b8

Abnormal Time

Time Event Entityt1 e1 b1, b2

e2 b2, b3t2 e3 b2, b3

e b b

Input Output

rep. entities: b5,b4

Time Clusterrep. entities: b3, b2, b1

e4 b3, b4t3 e5 b4, b5t4 e6 b5, b6

e7 b6, b7t5 e8 b6, b7t6 e9 b7, b8

Copyright: Faloutsos, Tong (2008) 3-101CIKM, 2008

SCS CMU

T3: Time-to-Time Proximity Matrix

Time Event Entityt1 e1 b1, b2

e2 b2, b3t2 e3 b2, b3

e b be4 b3, b4t3 e5 b4, b5t4 e6 b5, b6

e7 b6, b7t5 e8 b6, b7t6 e9 b7, b8

Copyright: Faloutsos, Tong (2008) 3-102CIKM, 2008

Page 18: SCS CMU Outlinehtong/tut/cikm2008/part3_proximity.pdf · Faloutsos, Tong CIKM, 2008 4 SCS CMU Random walk with restart Node 4 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 0.13 0.10 0.13

Faloutsos, Tong CIKM, 2008

18

SCS CMU

More Applications• Clustering

– Proximity as input [Ding+ KDD 2007]• Email management [Minkov+ CEAS 06].

• Business Process Management [Qu+ 2008]

• ProSINProSIN– Listen to clients’ comments [Tong+ 2008]

• Ghost Edge– Within Network Classification [Gallagher & Tong+ KDD08 b]

• …

Copyright: Faloutsos, Tong (2008) 3-103

SCS CMU

gCap:

pTrack/cTrack:

LP: [Liben-Nowell+][Tong+ 2007]

NF:

CePS:

G-Ray:

T3:GhostEdge:

[Gallagher &

[Tong+ 2007]

[Tong+ 2006]

[Pan+ 2004]

[Pan+ 2005]

[Tong+ 2008]

FastAllDAP: [Tong+ 2007]

BB Lin: [Tong+ 2006]

FastUpdate: [Tong+ 2008]

ComputationsApplications

EfficientlySolve Linear

System(s)

MT3: [Tong+ 2008]Use Proximity as

Building block(gCap, CePS…)

Fast-ProSIN: [Tong+ 2008]

T3: & Tong+ 2008][Tong+ 2008]

A BH1 1

D1 1

EF

G1 11

I J11 1 RWR: [Pan+ 2004][Sun+ 2006][Tong+ 2006]

Variants: [Faloutsos+ 2004] [Koren+ 2006][Tong+ 2007]

Generalizations: [Charkrabarti+ 2006][Tong+ 2007, 2008]

Properties.: [Koren+ 2006]]Tong+ 2007, 2008]

Definitions

B_Lin: [Tong+ 2006]

BB_Lin: [Tong+ 2006]

ProximityOn

Graphs

Weighted Multiple

Relationship

System(s)

SCS CMU

Take-home messages• Proximity Definitions

– RWR– and a lot of variants

Computations• Computations– Sherman–Morrison Lemma– Fast Incremental Computation

• Applications– Proximity as a building block

Copyright: Faloutsos, Tong (2008) 3-105CIKM, 2008

SCS CMU

References• L. Page, S. Brin, R. Motwani, & T. Winograd. (1998), The

PageRank Citation Ranking: Bringing Order to the Web, Technical report, Stanford Library.

• T.H. Haveliwala. (2002) Topic-Sensitive PageRank. In WWW, 517-526, 2002

• J Y Pan H J Yang C Faloutsos & P Duygulu (2004) Automatic• J.Y. Pan, H.J. Yang, C. Faloutsos & P. Duygulu. (2004) Automatic multimedia cross-modal correlation discovery. In KDD, 653-658, 2004.

• C. Faloutsos, K. S. McCurley & A. Tomkins. (2002) Fast discovery of connection subgraphs. In KDD, 118-127, 2004.

• J. Sun, H. Qu, D. Chakrabarti & C. Faloutsos. (2005) Neighborhood Formation and Anomaly Detection in Bipartite Graphs. In ICDM, 418-425, 2005.

• W. Cohen. (2007) Graph Walks and Graphical Models. Draft.Copyright: Faloutsos, Tong (2008) 3-106CIKM, 2008

SCS CMU

References• P. Doyle & J. Snell. (1984) Random walks and electric

networks, volume 22. Mathematical Association America, New York.

• Y. Koren, S. C. North, and C. Volinsky. (2006) Measuring and extracting proximity in networks. In KDD, 245–255, 2006.

• A Agarwal S Chakrabarti & S Aggarwal (2006) Learning to• A. Agarwal, S. Chakrabarti & S. Aggarwal. (2006) Learning to rank networked entities. In KDD, 14-23, 2006.

• S. Chakrabarti. (2007) Dynamic personalized pagerank in entity-relation graphs. In WWW, 571-580, 2007.

• F. Fouss, A. Pirotte, J.-M. Renders, & M. Saerens. (2007) Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation. IEEE Trans. Knowl. Data Eng. 19(3), 355-369 2007.

Copyright: Faloutsos, Tong (2008) 3-107CIKM, 2008

SCS CMU

References• H. Tong & C. Faloutsos. (2006) Center-piece subgraphs:

problem definition and fast solutions. In KDD, 404-413, 2006.• H. Tong, C. Faloutsos, & J.Y. Pan. (2006) Fast Random Walk

with Restart and Its Applications. In ICDM, 613-622, 2006.• H. Tong, Y. Koren, & C. Faloutsos. (2007) Fast direction-

aware proximity for graph mining In KDD 747 756 2007aware proximity for graph mining. In KDD, 747-756, 2007.• H. Tong, B. Gallagher, C. Faloutsos, & T. Eliassi-Rad. (2007)

Fast best-effort pattern matching in large attributed graphs. In KDD, 737-746, 2007.

• H. Tong, S. Papadimitriou, P.S. Yu & C. Faloutsos. (2008) Proximity Tracking on Time-Evolving Bipartite Graphs. to appear in SDM 2008.

Copyright: Faloutsos, Tong (2008) 3-108CIKM, 2008

Page 19: SCS CMU Outlinehtong/tut/cikm2008/part3_proximity.pdf · Faloutsos, Tong CIKM, 2008 4 SCS CMU Random walk with restart Node 4 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 0.13 0.10 0.13

Faloutsos, Tong CIKM, 2008

19

SCS CMU

References• B. Gallagher, H. Tong, T. Eliassi-Rad, C. Faloutsos. Using

Ghost Edges for Classification in Sparsely Labeled Networks. KDD 2008

• H. Tong, Y. Sakurai, T. Eliassi-Rad, and C. Faloutsos. Fast Mining of Complex Time-Stamped Events CIKM 08

• H Tong H Qu and H Jamjoom Measuring Proximity on• H. Tong, H. Qu, and H. Jamjoom. Measuring Proximity on Graphs with Side Information. ICDM 2008

Copyright: Faloutsos, Tong (2008) 3-109CIKM, 2008