Xiaowei Ying, Xintao Wu, Daniel Barbara: Spectrum Based Fraud Detection in Social Networks

Upload: megan-charles

Post on 17-Dec-2015

TRANSCRIPT

  • Slide 1
  • Xiaowei Ying, Xintao Wu, Daniel Barbara. Spectrum Based Fraud Detection in Social Networks
  • Slide 2
  • Random Link Attack (Shrivastava et al., ICDE08): an abstraction of collaborative attacks, including spam, viral marketing, and individual re-identification via active/passive attacks. The attacker creates some fake nodes and uses them to attack a large set of randomly selected regular nodes. Fake nodes also mimic the real graph structure among themselves to evade detection.
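The attack model on this slide can be simulated with a short sketch. The function name `add_random_link_attack`, its parameters, and the choice of independent random inner links with probability `inner_p` are assumptions made for illustration; the slides only say that fake nodes mimic the real graph structure among themselves.

```python
import random

def add_random_link_attack(adj, n_attackers, victims_per_attacker, inner_p, seed=0):
    """Return (attacked_graph, attackers) for one simulated RLA.

    adj maps each regular node to the set of its neighbors (undirected graph).
    The input graph is left unmodified.
    """
    rng = random.Random(seed)
    regular = list(adj)
    attacked = {u: set(nbrs) for u, nbrs in adj.items()}
    attackers = [("fake", i) for i in range(n_attackers)]
    for a in attackers:
        attacked[a] = set()
    # Outer links: every fake node attacks randomly selected regular nodes.
    for a in attackers:
        for v in rng.sample(regular, victims_per_attacker):
            attacked[a].add(v)
            attacked[v].add(a)
    # Inner links (assumption): fake nodes link to each other independently
    # with probability inner_p, a simple stand-in for mimicking real structure.
    for i, a in enumerate(attackers):
        for b in attackers[i + 1:]:
            if rng.random() < inner_p:
                attacked[a].add(b)
                attacked[b].add(a)
    return attacked, attackers

# Toy regular graph: a ring of 30 nodes.
ring = {i: {(i - 1) % 30, (i + 1) % 30} for i in range(30)}
attacked, attackers = add_random_link_attack(ring, 5, 4, inner_p=0.5, seed=1)
print(len(attacked), "nodes after the attack")
```

Each fake node ends up with exactly `victims_per_attacker` regular neighbors, plus whatever inner links the random draw produces.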
  • Slide 3
  • Topology Approach (Shrivastava et al., ICDE08). Idea: count external triangles around each node; neighbors of a regular user have many triangles, but random victims do not. Algorithm: detect suspects with a clustering test and a neighborhood independence test, then detect RLAs with GREEDY and TRWALK. Limitations: too many parameters, high computational cost, and difficulty detecting attacks when multiple RLAs exist.
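The triangle-counting idea can be illustrated with a minimal sketch. It counts, for each node, how many pairs of its neighbors are themselves linked. This is a simplification for intuition only: it is not the clustering test, GREEDY, or TRWALK from the cited work, and the helper names are hypothetical.

```python
from itertools import combinations

def build_graph(edges):
    """Build an undirected adjacency dict (node -> set of neighbors)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def triangles_per_node(adj):
    """For every node, count the pairs of its neighbors that are linked."""
    return {
        u: sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
        for u, nbrs in adj.items()
    }

# Two 4-cliques of regular users, plus a node "a" that links to one
# unrelated victim in each clique.
clique1 = list(combinations([0, 1, 2, 3], 2))
clique2 = list(combinations([4, 5, 6, 7], 2))
adj = build_graph(clique1 + clique2 + [("a", 0), ("a", 5)])

counts = triangles_per_node(adj)
print(counts["a"], counts[0])
```

The victims of `a` are not linked to each other, so no triangle closes around `a`, while a regular clique member keeps the triangles formed by its clique neighbors.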
  • Slide 4
  • Our Approach: examine the spectral space of the graph topology. The graph is undirected, unweighted, and unsigned, and link/node attribute information is not considered. Adjacency matrix A (symmetric). Adjacency eigenspace.
  • Slide 5
  • Spectral coordinate (Ying and Wu, SDM09). Example: Polbook network.
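A minimal sketch of how spectral coordinates can be computed, assuming the usual reading of the term: node u's coordinate is the u-th row of the matrix whose columns are the eigenvectors of A for the k largest eigenvalues. The function name and the toy graph are illustrative, not taken from the slides.

```python
import numpy as np

def spectral_coordinates(A, k):
    """Return the k largest eigenvalues of symmetric A and an (n, k) matrix
    whose row u is the spectral coordinate of node u."""
    eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]    # indices of the k largest
    return eigvals[top], eigvecs[:, top]

# Toy graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

lam, X = spectral_coordinates(A, k=2)
for u in range(6):
    print(u, np.round(X[u], 3))
```

In this toy graph the second coordinate takes opposite signs on the two triangles, which is the kind of community separation the adjacency eigenspace is used to expose. The overall sign of each eigenvector is arbitrary, so only relative signs are meaningful.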
  • Slide 6
  • Spectrum Based Fraud Detection: RLA from the matrix perturbation point of view.
  • Slide 7
  • Spectrum Based Fraud Detection: approximate the spectral coordinate.
  • Slide 8
  • Approximate the eigenvector in a random link attack, for regular nodes and for attacking nodes (first-order and second-order approximations).
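The slide's own formulas did not survive in this transcript. As background only, the following are standard results that a perturbation view of this kind builds on; the notation (E for the added attack links, a tilde for quantities of the attacked graph, Γ(p) for the neighbors of node p) is assumed here, and A is taken to be padded with zero rows and columns for the new nodes so that the dimensions match.

```latex
% Attacked graph as a perturbation of the original adjacency matrix,
% where A x_i = \lambda_i x_i and \|x_i\|_2 = 1.
\tilde{A} = A + E

% Textbook first-order estimate for a simple eigenvalue:
\tilde{\lambda}_i \approx \lambda_i + x_i^{\top} E \, x_i

% Exact identity from \tilde{A}\tilde{x}_i = \tilde{\lambda}_i \tilde{x}_i,
% read at the row of node p (valid when \tilde{\lambda}_i \neq 0):
\tilde{x}_{ip} = \frac{1}{\tilde{\lambda}_i} \sum_{v \in \Gamma(p)} \tilde{x}_{iv}
```

The last identity says that a node's entry in an eigenvector is the scaled sum of its neighbors' entries, which is why the coordinates of an attacking node are determined by the coordinates of the nodes it links to.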
  • Slide 9
  • Illustrating network data: network of the political blogs on the 2004 U.S. election (polblogs, 1,222 nodes and 16,714 edges). The blogs were labeled as either liberal or conservative.
  • Slide 10
  • Illustrating example: political blogs (1,222 nodes, 16,714 edges), each node labeled as either liberal or conservative. Add one RLA with 20 attacking nodes that have the same degree distribution as the regular ones.
  • Slide 11
  • Problem: we do not know who the attackers and victims are in the graph topology. For Random Link Attacks, we can derive the distribution of the attacking nodes' spectral coordinates.
  • Slide 12
  • Distribution of attackers' spectral coordinates: the spectral coordinate of attacking node p follows a normal distribution with bounded mean and variance. We can get the region in the spectral space where RLA attacking nodes appear with high probability. The inner structure of the attackers does not affect the region. Example: polblogs (1,222 nodes, 16,714 edges), 20 attackers, each randomly attacking 30 victims.
  • Slide 13
  • Using node non-randomness: it is tedious to check every dimension one by one. For the node non-randomness of RLA attackers, we derive upper bounds on the mean and variance and obtain the decision line.
  • Slide 14
  • Identifying suspects with the node non-randomness of RLA attackers: nodes below the decision line are suspects.
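A hedged sketch of the score behind this slide. It assumes the definition of node non-randomness from the cited Ying and Wu SDM09 work as I understand it, R(u) = Σ_{i=1..k} λ_i x_iu², which equals the sum of α_u · α_v over the neighbors v of u. The decision line itself is not reproduced because its formula is not in this transcript; the code only computes the score.

```python
import numpy as np

def node_non_randomness(A, k):
    """Return (R, lam, X): R[u] = sum_i lam[i] * X[u, i]**2 over the k
    largest eigenvalues of symmetric A (assumed definition, see lead-in)."""
    eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]
    lam, X = eigvals[top], eigvecs[:, top]
    return (X ** 2) @ lam, lam, X

# Toy graph: two triangles joined by an edge, plus node 6 linked to one
# node in each triangle (a stand-in for a node with random links).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3),
         (6, 0), (6, 5)]
n = 7
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

R, lam, X = node_non_randomness(A, k=2)
for u in range(n):
    print(u, int(A[u].sum()), round(float(R[u]), 3))   # node, degree, score
```

Under this definition the scores sum to λ_1 + ... + λ_k, so the score can be read as each node's share of the structure captured by the leading eigenvectors. Comparing a node's score against what its degree would suggest is the kind of test a decision line performs.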
  • Slide 15
  • RLAs with varied inner structure
  • Slide 16
  • SPCTRA Algorithm
  • Slide 17
  • Evaluation. Baseline: the topology-based RLA detection approach (Shrivastava et al., ICDE08), which uses a clustering test and a neighborhood independence test with GREEDY and TRWALK. Experimental setting: political blogs (1,222 nodes, 16,714 edges), add 1 RLA with 20 attackers; Web Spam Challenge data (114K nodes and 1.8M links), add a mix of 8 RLAs with varied sizes and connection patterns.
  • Slide 18
  • Evaluation on political blogs (1 RLA each time).
  • Slide 19
  • Accuracy: evaluation on Web Spam Challenge data, a snapshot of websites in the .UK domain (2007). SPCTRA: based on spectral space. GREEDY: based on outer triangles [Shrivastava, ICDE, 2008].
  • Slide 20
  • Execution time: TRWALK is 10 times faster than GREEDY (with less accuracy), but still 100 times slower than SPCTRA. Discussion of complexity is in the paper.
  • Slide 21
  • Bipartite Core Attacks: the attacker creates two types of nodes. Accomplices behave like normal users, except that they connect heavily to fraudsters to enhance the fraudsters' ratings. Fraudsters are the nodes that actually commit fraud, and they mostly connect to accomplices. No link exists within the accomplices or within the fraudsters. Figure (bipartite core) from: Duen Horng Chau et al., Detecting Fraudulent Personalities in Networks of Online Auctioneers.
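The structure described on this slide can be sketched as a small graph generator. The function name and parameters are hypothetical, and two simplifications are assumed: every fraudster links to every accomplice (the slide says only "mostly"), and accomplices imitate normal users by linking to a few random regular nodes.

```python
import random

def add_bipartite_core(adj, n_fraudsters, n_accomplices, regular_links, seed=0):
    """Return (attacked_graph, fraudsters, accomplices).

    adj maps each regular node to the set of its neighbors (undirected graph).
    The input graph is left unmodified.
    """
    rng = random.Random(seed)
    regular = list(adj)
    g = {u: set(nbrs) for u, nbrs in adj.items()}
    fraudsters = [("fraudster", i) for i in range(n_fraudsters)]
    accomplices = [("accomplice", i) for i in range(n_accomplices)]
    for node in fraudsters + accomplices:
        g[node] = set()
    # Core (assumption: complete): links run only between the two groups,
    # never inside a group.
    for f in fraudsters:
        for a in accomplices:
            g[f].add(a)
            g[a].add(f)
    # Accomplices also look like normal users by linking to regular nodes.
    for a in accomplices:
        for v in rng.sample(regular, regular_links):
            g[a].add(v)
            g[v].add(a)
    return g, fraudsters, accomplices

# Toy regular graph: a ring of 40 nodes.
ring = {i: {(i - 1) % 40, (i + 1) % 40} for i in range(40)}
g, fraudsters, accomplices = add_bipartite_core(ring, 3, 5, regular_links=2)
print(len(g[fraudsters[0]]), len(g[accomplices[0]]))
```

In this sketch a fraudster's neighborhood consists solely of accomplices, while each accomplice bridges the fraudsters and the regular part of the graph.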
  • Slide 22
  • Bipartite Core Attacks: 20 fraudsters and 30 accomplices.
  • Slide 23
  • DDoS attacks: the attacker controls 10% of the normal nodes to attack one victim node.
  • Slide 24
  • Conclusion: presented a framework that exploits the spectral space of graph topology to detect attacks. Theoretical analysis showed that attackers are located in a different region of the spectral space from the regular nodes. Developed the SPCTRA algorithm for detecting RLAs. Demonstrated its effectiveness and efficiency through empirical evaluation.
  • Slide 25
  • Future Work: explore other attacking scenarios in both social networks and communication networks. In Sybil attacks, attackers may choose victims purposely rather than randomly. Track how the graph evolves dynamically.
  • Slide 26
  • Questions? Acknowledgments: this work was done in collaboration with Xiaowei Ying and Daniel Barbara, and was supported in part by U.S. National Science Foundation grants IIS-0546027, CNS-0831204, and CCF-1047621. Thank you!
  • Slide 27
  • Another Example
  • Slide 28
  • Adjacency Eigenspace. Spectral coordinate (Ying and Wu, SDM09). Example: Polbook network.