anomaly detection in large graphschristos/talks/16-09-arl/faloutsos-social-computing... · cmu scs...
TRANSCRIPT
CMU SCS
Anomaly Detection in Large Graphs
Christos Faloutsos CMU
CMU SCS
Thank you!
Prof. Judith Klavans
Soc. Comp. ARL (c) 2016, C. Faloutsos 2
CMU SCS
(c) 2016, C. Faloutsos 3
Roadmap
• Introduction – Motivation – Why study (big) graphs?
• Part#1: Patterns in graphs • Part#2: time-evolving graphs; tensors • Conclusions
Soc. Comp. ARL
CMU SCS
(c) 2016, C. Faloutsos 4
Graphs - why should we care?
>$10B; ~1B users
Soc. Comp. ARL
CMU SCS
(c) 2016, C. Faloutsos 5
Graphs - why should we care?
Internet Map [lumeta.com]
Soc. Comp. ARL
computer network security: • Email traffic • IP traffic (src, dst, dst-port, t)
Malware propagation • (machine-id, infected-file-id)
CMU SCS
(c) 2016, C. Faloutsos 7
Graphs - why should we care? • web-log (‘blog’) news propagation • Recommendation systems • Documents & terms (and plagiarism; fake
appstore reviews, ....)
• Many-to-many db relationship -> graph
Soc. Comp. ARL
CMU SCS
Motivating problems • P1: patterns? Fraud detection?
• P2: patterns in time-evolving graphs / tensors
Soc. Comp. ARL (c) 2016, C. Faloutsos 8
time
destination
CMU SCS
Motivating problems • P1: patterns? Fraud detection?
• P2: patterns in time-evolving graphs / tensors
Soc. Comp. ARL (c) 2016, C. Faloutsos 9
time
destination
Patterns anomalies
CMU SCS
(c) 2016, C. Faloutsos 10
Roadmap
• Introduction – Motivation – Why study (big) graphs?
• Part#1: Patterns & fraud detection • Part#2: time-evolving graphs; tensors • Conclusions
Soc. Comp. ARL
CMU SCS
Soc. Comp. ARL (c) 2016, C. Faloutsos 11
Part 1: Patterns, &
fraud detection
CMU SCS
(c) 2016, C. Faloutsos 12
Laws and patterns • Q1: Are real graphs random?
Soc. Comp. ARL
CMU SCS
(c) 2016, C. Faloutsos 13
Laws and patterns • Q1: Are real graphs random? • A1: NO!!
– Diameter (‘6 degrees’; ‘Kevin Bacon’) – in- and out- degree distributions – other (surprising) patterns
• So, let’s look at the data
Soc. Comp. ARL
CMU SCS
(c) 2016, C. Faloutsos 23
Solution# S.3: Triangle ‘Laws’
• Real social networks have a lot of triangles
Soc. Comp. ARL
CMU SCS
(c) 2016, C. Faloutsos 24
Solution# S.3: Triangle ‘Laws’
• Real social networks have a lot of triangles – Friends of friends are friends
• Any patterns? – 2x the friends, 2x the triangles ?
Soc. Comp. ARL
CMU SCS
(c) 2016, C. Faloutsos 25
Triangle Law: #S.3 [Tsourakakis ICDM 2008]
SN Reuters
Epinions X-axis: degree Y-axis: mean # triangles n friends -> ~n1.6 triangles
Soc. Comp. ARL
CMU SCS
Triangle counting for large graphs?
Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]
26 Soc. Comp. ARL 26 (c) 2016, C. Faloutsos
CMU SCS
Triangle counting for large graphs?
Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]
27 Soc. Comp. ARL 27 (c) 2016, C. Faloutsos
CMU SCS
Triangle counting for large graphs?
Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]
28 Soc. Comp. ARL 28 (c) 2016, C. Faloutsos
CMU SCS
Triangle counting for large graphs?
Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]
29 Soc. Comp. ARL 29 (c) 2016, C. Faloutsos
CMU SCS
MORE Graph Patterns
Soc. Comp. ARL (c) 2016, C. Faloutsos 30
✔
✔ ✔
RTG: A Recursive Realistic Graph Generator using Random Typing Leman Akoglu and Christos Faloutsos. PKDD’09.
CMU SCS
MORE Graph Patterns
Soc. Comp. ARL (c) 2016, C. Faloutsos 31
• Mary McGlohon, Leman Akoglu, Christos Faloutsos. Statistical Properties of Social Networks. in "Social Network Data Analytics” (Ed.: Charu Aggarwal)
• Deepayan Chakrabarti and Christos Faloutsos, Graph Mining: Laws, Tools, and Case Studies Oct. 2012, Morgan Claypool.
CMU SCS
(c) 2016, C. Faloutsos 44
Roadmap
• Introduction – Motivation • Part#1: Patterns in graphs
– Patterns – Anomaly / fraud detection
• CopyCatch • Spectral methods (‘fBox’) • Belief Propagation
• Part#2: time-evolving graphs; tensors • Conclusions Soc. Comp. ARL
CMU SCS
Soc. Comp. ARL (c) 2016, C. Faloutsos 45
E-bay Fraud detection
w/ Polo Chau & Shashank Pandit, CMU [www’07]
CMU SCS
Soc. Comp. ARL (c) 2016, C. Faloutsos 46
E-bay Fraud detection
CMU SCS
Soc. Comp. ARL (c) 2016, C. Faloutsos 47
E-bay Fraud detection
CMU SCS
Soc. Comp. ARL (c) 2016, C. Faloutsos 48
E-bay Fraud detection - NetProbe
CMU SCS
Popular press
And less desirable attention: • E-mail from ‘Belgium police’ (‘copy of
your code?’) Soc. Comp. ARL (c) 2016, C. Faloutsos 49
CMU SCS
(c) 2016, C. Faloutsos 55
Roadmap
• Introduction – Motivation • Part#1: Patterns in graphs • Part#2: time-evolving graphs; tensors • Conclusions
Soc. Comp. ARL
CMU SCS
Soc. Comp. ARL (c) 2016, C. Faloutsos 56
Part 2: Time evolving
graphs; tensors
CMU SCS
Graphs over time -> tensors! • Problem #2:
– Given who calls whom, and when – Find patterns / anomalies
Soc. Comp. ARL (c) 2016, C. Faloutsos 57
smith
CMU SCS
Graphs over time -> tensors! • Problem #2:
– Given who calls whom, and when – Find patterns / anomalies
Soc. Comp. ARL (c) 2016, C. Faloutsos 58
CMU SCS
Graphs over time -> tensors! • Problem #2:
– Given who calls whom, and when – Find patterns / anomalies
Soc. Comp. ARL (c) 2016, C. Faloutsos 59
Mon Tue
CMU SCS
Graphs over time -> tensors! • Problem #2:
– Given who calls whom, and when – Find patterns / anomalies
Soc. Comp. ARL (c) 2016, C. Faloutsos 60 callee
caller
CMU SCS
Graphs over time -> tensors! • Problem #2’:
– Given author-keyword-date – Find patterns / anomalies
Soc. Comp. ARL (c) 2016, C. Faloutsos 61 keyword
author
MANY more settings, with >2 ‘modes’
CMU SCS
Graphs over time -> tensors! • Problem #2’’:
– Given subject – verb – object facts – Find patterns / anomalies
Soc. Comp. ARL (c) 2016, C. Faloutsos 62 object
subject
MANY more settings, with >2 ‘modes’
CMU SCS
Graphs over time -> tensors! • Problem #2’’’:
– Given <triplets> – Find patterns / anomalies
Soc. Comp. ARL (c) 2016, C. Faloutsos 63 mode2
mode1
MANY more settings, with >2 ‘modes’ (and 4, 5, etc modes)
CMU SCS
(c) 2016, C. Faloutsos 64
Roadmap
• Introduction – Motivation • Part#1: Patterns in graphs • Part#2: time-evolving graphs; tensors
– Intro to tensors – Results – Speed
• Conclusions
Soc. Comp. ARL
CMU SCS Answer to both: tensor
factorization • Recall: (SVD) matrix factorization: finds
blocks
Soc. Comp. ARL (c) 2016, C. Faloutsos 65
N users
M products
‘meat-eaters’ ‘steaks’
‘vegetarians’ ‘plants’
‘kids’ ‘cookies’
~ + +
CMU SCS Answer to both: tensor
factorization • PARAFAC decomposition
Soc. Comp. ARL (c) 2016, C. Faloutsos 66
= + + subject
object
politicians artists athletes
CMU SCS
Answer: tensor factorization • PARAFAC decomposition • Results for who-calls-whom-when
– 4M x 15 days
Soc. Comp. ARL (c) 2016, C. Faloutsos 67
= + + caller
callee
?? ?? ??
CMU SCS Anomaly detection in time-
evolving graphs
• Anomalous communities in phone call data: – European country, 4M clients, data over 2 weeks
~200 calls to EACH receiver on EACH day!
1 caller 5 receivers 4 days of activity
Soc. Comp. ARL 68 (c) 2016, C. Faloutsos
=
CMU SCS Anomaly detection in time-
evolving graphs
• Anomalous communities in phone call data: – European country, 4M clients, data over 2 weeks
~200 calls to EACH receiver on EACH day! Soc. Comp. ARL 69 (c) 2016, C. Faloutsos
=
1 caller 5 receivers 4 days of activity
CMU SCS Anomaly detection in time-
evolving graphs
• Anomalous communities in phone call data: – European country, 4M clients, data over 2 weeks
~200 calls to EACH receiver on EACH day! Soc. Comp. ARL 70 (c) 2016, C. Faloutsos
=
Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra. Com2: Fast Automatic Discovery of Temporal (Comet) Communities. PAKDD 2014, Tainan, Taiwan.
CMU SCS
(c) 2016, C. Faloutsos 86
Roadmap
• Introduction – Motivation – Why study (big) graphs?
• Part#1: Patterns in graphs • Part#2: time-evolving graphs; tensors • Acknowledgements and Conclusions
Soc. Comp. ARL
CMU SCS
(c) 2016, C. Faloutsos 87
Thanks
Soc. Comp. ARL
Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab
Disclaimer: All opinions are mine; not necessarily reflecting the opinions of the funding agencies
CMU SCS
(c) 2016, C. Faloutsos 89
Cast
Akoglu, Leman
Chau, Polo
Kang, U
Hooi, Bryan
Amazon'16
Koutra, Danai
Beutel, Alex
Papalexakis, Vagelis
Shah, Neil
Araujo, Miguel
Song, Hyun Ah
Eswaran, Dhivya
Shin, Kijung
CMU SCS
(c) 2016, C. Faloutsos 90
CONCLUSION#1 – Big data • Patterns Anomalies
• Large datasets reveal patterns/outliers that are invisible otherwise
Soc. Comp. ARL
CMU SCS
(c) 2016, C. Faloutsos 91
CONCLUSION#2 – tensors
• powerful tool
Soc. Comp. ARL
=
1 caller 5 receivers 4 days of activity
CMU SCS
(c) 2016, C. Faloutsos 92
References • D. Chakrabarti, C. Faloutsos: Graph Mining – Laws,
Tools and Case Studies, Morgan Claypool 2012 • http://www.morganclaypool.com/doi/abs/10.2200/
S00449ED1V01Y201209DMK006
Soc. Comp. ARL
CMU SCS
TAKE HOME MESSAGE: Cross-disciplinarity
Soc. Comp. ARL (c) 2016, C. Faloutsos 93
=
CMU SCS
Cross-disciplinarity
Soc. Comp. ARL (c) 2016, C. Faloutsos 94
=
Thank you!