anomaly detection in large graphschristos/talks/16-09-arl/faloutsos-social-computing... · cmu scs...

51
CMU SCS Anomaly Detection in Large Graphs Christos Faloutsos CMU

Upload: others

Post on 04-Mar-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Anomaly Detection in Large Graphs

Christos Faloutsos CMU

Page 2: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Thank you!

Prof. Judith Klavans

Soc. Comp. ARL (c) 2016, C. Faloutsos 2

Page 3: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

(c) 2016, C. Faloutsos 3

Roadmap

•  Introduction – Motivation – Why study (big) graphs?

•  Part#1: Patterns in graphs •  Part#2: time-evolving graphs; tensors •  Conclusions

Soc. Comp. ARL

Page 4: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

(c) 2016, C. Faloutsos 4

Graphs - why should we care?

>$10B; ~1B users

Soc. Comp. ARL

Page 5: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

(c) 2016, C. Faloutsos 5

Graphs - why should we care?

Internet Map [lumeta.com]

Soc. Comp. ARL

computer network security: •  Email traffic •  IP traffic (src, dst, dst-port, t)

Malware propagation •  (machine-id, infected-file-id)

Page 6: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

(c) 2016, C. Faloutsos 7

Graphs - why should we care? •  web-log (‘blog’) news propagation •  Recommendation systems •  Documents & terms (and plagiarism; fake

appstore reviews, ....)

•  Many-to-many db relationship -> graph

Soc. Comp. ARL

Page 7: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Motivating problems •  P1: patterns? Fraud detection?

•  P2: patterns in time-evolving graphs / tensors

Soc. Comp. ARL (c) 2016, C. Faloutsos 8

time

destination

Page 8: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Motivating problems •  P1: patterns? Fraud detection?

•  P2: patterns in time-evolving graphs / tensors

Soc. Comp. ARL (c) 2016, C. Faloutsos 9

time

destination

Patterns anomalies

Page 9: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

(c) 2016, C. Faloutsos 10

Roadmap

•  Introduction – Motivation – Why study (big) graphs?

•  Part#1: Patterns & fraud detection •  Part#2: time-evolving graphs; tensors •  Conclusions

Soc. Comp. ARL

Page 10: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Soc. Comp. ARL (c) 2016, C. Faloutsos 11

Part 1: Patterns, &

fraud detection

Page 11: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

(c) 2016, C. Faloutsos 12

Laws and patterns •  Q1: Are real graphs random?

Soc. Comp. ARL

Page 12: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

(c) 2016, C. Faloutsos 13

Laws and patterns •  Q1: Are real graphs random? •  A1: NO!!

– Diameter (‘6 degrees’; ‘Kevin Bacon’) –  in- and out- degree distributions –  other (surprising) patterns

•  So, let’s look at the data

Soc. Comp. ARL

Page 13: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

(c) 2016, C. Faloutsos 23

Solution# S.3: Triangle ‘Laws’

•  Real social networks have a lot of triangles

Soc. Comp. ARL

Page 14: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

(c) 2016, C. Faloutsos 24

Solution# S.3: Triangle ‘Laws’

•  Real social networks have a lot of triangles –  Friends of friends are friends

•  Any patterns? –  2x the friends, 2x the triangles ?

Soc. Comp. ARL

Page 15: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

(c) 2016, C. Faloutsos 25

Triangle Law: #S.3 [Tsourakakis ICDM 2008]

SN Reuters

Epinions X-axis: degree Y-axis: mean # triangles n friends -> ~n1.6 triangles

Soc. Comp. ARL

Page 16: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Triangle counting for large graphs?

Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]

26 Soc. Comp. ARL 26 (c) 2016, C. Faloutsos

Page 17: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Triangle counting for large graphs?

Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]

27 Soc. Comp. ARL 27 (c) 2016, C. Faloutsos

Page 18: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Triangle counting for large graphs?

Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]

28 Soc. Comp. ARL 28 (c) 2016, C. Faloutsos

Page 19: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Triangle counting for large graphs?

Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]

29 Soc. Comp. ARL 29 (c) 2016, C. Faloutsos

Page 20: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

MORE Graph Patterns

Soc. Comp. ARL (c) 2016, C. Faloutsos 30

✔ ✔

RTG: A Recursive Realistic Graph Generator using Random Typing Leman Akoglu and Christos Faloutsos. PKDD’09.

Page 21: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

MORE Graph Patterns

Soc. Comp. ARL (c) 2016, C. Faloutsos 31

•  Mary McGlohon, Leman Akoglu, Christos Faloutsos. Statistical Properties of Social Networks. in "Social Network Data Analytics” (Ed.: Charu Aggarwal)

•  Deepayan Chakrabarti and Christos Faloutsos, Graph Mining: Laws, Tools, and Case Studies Oct. 2012, Morgan Claypool.

Page 22: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

(c) 2016, C. Faloutsos 44

Roadmap

•  Introduction – Motivation •  Part#1: Patterns in graphs

– Patterns – Anomaly / fraud detection

•  CopyCatch •  Spectral methods (‘fBox’) •  Belief Propagation

•  Part#2: time-evolving graphs; tensors •  Conclusions Soc. Comp. ARL

Page 23: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Soc. Comp. ARL (c) 2016, C. Faloutsos 45

E-bay Fraud detection

w/ Polo Chau & Shashank Pandit, CMU [www’07]

Page 24: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Soc. Comp. ARL (c) 2016, C. Faloutsos 46

E-bay Fraud detection

Page 25: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Soc. Comp. ARL (c) 2016, C. Faloutsos 47

E-bay Fraud detection

Page 26: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Soc. Comp. ARL (c) 2016, C. Faloutsos 48

E-bay Fraud detection - NetProbe

Page 27: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Popular press

And less desirable attention: •  E-mail from ‘Belgium police’ (‘copy of

your code?’) Soc. Comp. ARL (c) 2016, C. Faloutsos 49

Page 28: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

(c) 2016, C. Faloutsos 55

Roadmap

•  Introduction – Motivation •  Part#1: Patterns in graphs •  Part#2: time-evolving graphs; tensors •  Conclusions

Soc. Comp. ARL

Page 29: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Soc. Comp. ARL (c) 2016, C. Faloutsos 56

Part 2: Time evolving

graphs; tensors

Page 30: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Graphs over time -> tensors! •  Problem #2:

– Given who calls whom, and when – Find patterns / anomalies

Soc. Comp. ARL (c) 2016, C. Faloutsos 57

smith

Page 31: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Graphs over time -> tensors! •  Problem #2:

– Given who calls whom, and when – Find patterns / anomalies

Soc. Comp. ARL (c) 2016, C. Faloutsos 58

Page 32: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Graphs over time -> tensors! •  Problem #2:

– Given who calls whom, and when – Find patterns / anomalies

Soc. Comp. ARL (c) 2016, C. Faloutsos 59

Mon Tue

Page 33: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Graphs over time -> tensors! •  Problem #2:

– Given who calls whom, and when – Find patterns / anomalies

Soc. Comp. ARL (c) 2016, C. Faloutsos 60 callee

caller

Page 34: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Graphs over time -> tensors! •  Problem #2’:

– Given author-keyword-date – Find patterns / anomalies

Soc. Comp. ARL (c) 2016, C. Faloutsos 61 keyword

author

MANY more settings, with >2 ‘modes’

Page 35: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Graphs over time -> tensors! •  Problem #2’’:

– Given subject – verb – object facts – Find patterns / anomalies

Soc. Comp. ARL (c) 2016, C. Faloutsos 62 object

subject

MANY more settings, with >2 ‘modes’

Page 36: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Graphs over time -> tensors! •  Problem #2’’’:

– Given <triplets> – Find patterns / anomalies

Soc. Comp. ARL (c) 2016, C. Faloutsos 63 mode2

mode1

MANY more settings, with >2 ‘modes’ (and 4, 5, etc modes)

Page 37: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

(c) 2016, C. Faloutsos 64

Roadmap

•  Introduction – Motivation •  Part#1: Patterns in graphs •  Part#2: time-evolving graphs; tensors

–  Intro to tensors – Results – Speed

•  Conclusions

Soc. Comp. ARL

Page 38: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS Answer to both: tensor

factorization •  Recall: (SVD) matrix factorization: finds

blocks

Soc. Comp. ARL (c) 2016, C. Faloutsos 65

N users

M products

‘meat-eaters’ ‘steaks’

‘vegetarians’ ‘plants’

‘kids’ ‘cookies’

~ + +

Page 39: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS Answer to both: tensor

factorization •  PARAFAC decomposition

Soc. Comp. ARL (c) 2016, C. Faloutsos 66

= + + subject

object

politicians artists athletes

Page 40: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Answer: tensor factorization •  PARAFAC decomposition •  Results for who-calls-whom-when

–  4M x 15 days

Soc. Comp. ARL (c) 2016, C. Faloutsos 67

= + + caller

callee

?? ?? ??

Page 41: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS Anomaly detection in time-

evolving graphs

•  Anomalous communities in phone call data: – European country, 4M clients, data over 2 weeks

~200 calls to EACH receiver on EACH day!

1 caller 5 receivers 4 days of activity

Soc. Comp. ARL 68 (c) 2016, C. Faloutsos

=

Page 42: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS Anomaly detection in time-

evolving graphs

•  Anomalous communities in phone call data: – European country, 4M clients, data over 2 weeks

~200 calls to EACH receiver on EACH day! Soc. Comp. ARL 69 (c) 2016, C. Faloutsos

=

1 caller 5 receivers 4 days of activity

Page 43: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS Anomaly detection in time-

evolving graphs

•  Anomalous communities in phone call data: – European country, 4M clients, data over 2 weeks

~200 calls to EACH receiver on EACH day! Soc. Comp. ARL 70 (c) 2016, C. Faloutsos

=

Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra. Com2: Fast Automatic Discovery of Temporal (Comet) Communities. PAKDD 2014, Tainan, Taiwan.

Page 44: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

(c) 2016, C. Faloutsos 86

Roadmap

•  Introduction – Motivation – Why study (big) graphs?

•  Part#1: Patterns in graphs •  Part#2: time-evolving graphs; tensors •  Acknowledgements and Conclusions

Soc. Comp. ARL

Page 45: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

(c) 2016, C. Faloutsos 87

Thanks

Soc. Comp. ARL

Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab

Disclaimer: All opinions are mine; not necessarily reflecting the opinions of the funding agencies

Page 46: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

(c) 2016, C. Faloutsos 89

Cast

Akoglu, Leman

Chau, Polo

Kang, U

Hooi, Bryan

Amazon'16

Koutra, Danai

Beutel, Alex

Papalexakis, Vagelis

Shah, Neil

Araujo, Miguel

Song, Hyun Ah

Eswaran, Dhivya

Shin, Kijung

Page 47: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

(c) 2016, C. Faloutsos 90

CONCLUSION#1 – Big data •  Patterns Anomalies

•  Large datasets reveal patterns/outliers that are invisible otherwise

Soc. Comp. ARL

Page 48: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

(c) 2016, C. Faloutsos 91

CONCLUSION#2 – tensors

•  powerful tool

Soc. Comp. ARL

=

1 caller 5 receivers 4 days of activity

Page 49: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

(c) 2016, C. Faloutsos 92

References •  D. Chakrabarti, C. Faloutsos: Graph Mining – Laws,

Tools and Case Studies, Morgan Claypool 2012 •  http://www.morganclaypool.com/doi/abs/10.2200/

S00449ED1V01Y201209DMK006

Soc. Comp. ARL

Page 50: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

TAKE HOME MESSAGE: Cross-disciplinarity

Soc. Comp. ARL (c) 2016, C. Faloutsos 93

=

Page 51: Anomaly Detection in Large Graphschristos/TALKS/16-09-ARL/faloutsos-Social-Computing... · CMU SCS Anomaly detection in time-evolving graphs • Anomalous communities in phone call

CMU SCS

Cross-disciplinarity

Soc. Comp. ARL (c) 2016, C. Faloutsos 94

=

Thank you!