Neighborhood Formation and Anomaly Detection in Bipartite Graphs
Jimeng Sun, Huiming Qu, Deepayan Chakrabarti, Christos Faloutsos
Speaker: Jimeng Sun
Bipartite Graphs
• G = {V1 + V2, E}, such that every edge in E connects a node in V1 to a node in V2
• Many applications can be modeled using bipartite graphs
• The key is to exploit the links between these two natural groups for data mining
[Figure: bipartite graph with nodes a1, ..., ak in V1 and t1, ..., tn in V2, joined by edges E]
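One concrete way to hold G = {V1 + V2, E} in memory is a |V1| x |V2| biadjacency matrix M, where row i is a node of V1, column j is a node of V2, and M[i][j] counts the edges between them. This is an illustrative sketch only: the function name `biadjacency` and the choice to sum repeated (a, t) pairs into an edge weight are assumptions, not something the slides specify.

```python
def biadjacency(edges, v1, v2):
    """Return the |V1| x |V2| biadjacency matrix of a bipartite graph.

    edges: iterable of (a, t) pairs with a in V1 and t in V2.
    v1, v2: ordered lists of the node labels on each side.
    Assumption: repeated (a, t) pairs are summed into an edge weight.
    """
    row = {a: i for i, a in enumerate(v1)}
    col = {t: j for j, t in enumerate(v2)}
    m = [[0] * len(v2) for _ in v1]
    for a, t in edges:
        # Every edge must join one node of V1 and one node of V2.
        if a not in row or t not in col:
            raise ValueError(f"edge ({a!r}, {t!r}) does not go from V1 to V2")
        m[row[a]][col[t]] += 1
    return m


# Example: authors (V1) vs. papers (V2).
authors = ["a1", "a2", "a3"]
papers = ["t1", "t2"]
M = biadjacency([("a1", "t1"), ("a2", "t1"), ("a2", "t2"), ("a3", "t2")], authors, papers)
print(M)  # [[1, 0], [1, 1], [0, 1]]
```

Row sums of M give the degrees of the V1 nodes and column sums give the degrees of the V2 nodes.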
Problem Definition
• Neighborhood formation (NF)
  • Given a query node a in V1, what are the relevance scores of all the nodes in V1 to a?
• Anomaly detection (AD)
  • Given a query node a in V1, what are the normality scores for the nodes in V2 that link to a?
[Figure: query node a in V1 of a bipartite graph (V1, V2), with example scores attached to the surrounding nodes]
Application I: Publication network
• Authors vs. papers in research communities
• Interesting queries:
  • Which authors are most related to Dr. Carman?
  • Which is the most unusual paper written by Dr. Carman?
Application II: P2P network
• Users vs. files in P2P systems
• Interesting queries:
  • Find the users with preferences similar to mine
  • Locate files that are downloaded by users with very different preferences
[Figure: users linked to files]
Application III: Financial trading
• Traders vs. stocks in stock markets
• Interesting queries:
  • Which stocks are most similar to company A's stock?
  • Find the most unusual traders (i.e., traders whose trades cross sectors)
Application IV: Collaborative filtering
• Collaborative filtering
• Recommendation systems
[Figure: customers linked to products]
Outline
• Problem Definition
• Motivation
• Neighborhood formation
• Anomaly detection
• Experiments
• Related work
• Conclusion and future work
Neighborhood formation – intuition
Input: a graph G and a query node q
Output: relevance scores to q
• Run a random walk with restart from q in V1
• Record the probability of visiting each node in V1
• Nodes with a higher visiting probability are the neighbors of q
[Figure: query node q in V1 of a bipartite graph (V1, V2), with example relevance scores on the V1 nodes]
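The intuition can be simulated directly: start at q, at each step jump back to q with probability c, otherwise move to a uniformly chosen neighbor, and count how often each V1 node is visited. The visit fractions approximate the steady-state probabilities. This Monte Carlo version is only an illustration of the idea; the function name, c = 0.15, the step count, the seed, and the uniform choice among incident edges are all assumptions.

```python
import random
from collections import Counter, defaultdict


def rwr_visit_frequencies(edges, q, c=0.15, steps=100_000, seed=0):
    """Monte Carlo estimate of the relevance of each V1 node to query q.

    Simulates a random walk with restart (RWR) from q and returns the
    fraction of V1 visits that landed on each V1 node.
    Assumptions: c = 0.15 and a uniform choice among incident edges.
    """
    nbrs = defaultdict(list)
    for a, t in edges:
        # Tag nodes with their side so V1 and V2 labels cannot collide.
        nbrs[("V1", a)].append(("V2", t))
        nbrs[("V2", t)].append(("V1", a))
    start = ("V1", q)
    if start not in nbrs:
        raise ValueError(f"query node {q!r} has no edges")
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    node, visits = start, Counter()
    for _ in range(steps):
        if rng.random() < c:
            node = start  # restart: jump back to the query node
        else:
            node = rng.choice(nbrs[node])  # follow a random incident edge
        if node[0] == "V1":
            visits[node[1]] += 1
    total = sum(visits.values())
    return {a: n / total for a, n in visits.items()}


# a and b share two papers; x sits in a separate component.
edges = [("a", "t1"), ("b", "t1"), ("a", "t2"), ("b", "t2"), ("x", "t3")]
scores = rwr_visit_frequencies(edges, "a")
print(sorted(scores, key=scores.get, reverse=True))  # ['a', 'b']
```

Node x never shows up in the result (relevance 0) because no walk from a can reach it.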
Exact neighborhood formation
Input: a graph G and a query node q
Output: relevance scores to q
• Construct the transition matrix P, where
  • every node in the graph becomes a state
  • every state has a restart probability c of jumping back to the query node q
  • transition probability: with the remaining probability 1 - c, the walk moves from the current state to one of its neighbors
• Find the steady-state probability u, which gives the relevance scores of all the nodes to q
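[Figure: state diagram in which every state returns to q with probability c and follows graph edges with probability (1-c)]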
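Written as an equation, the steady state satisfies u = (1 - c) P u + c e_q, where P is the column-normalized adjacency matrix of the whole bipartite graph and e_q is the indicator vector of q. Repeating that update until u stops changing (power iteration) is one standard way to solve it. The sketch below is a minimal illustration under assumptions (the name `rwr_exact`, c = 0.15, the tolerance, and edge weights equal to edge multiplicity), not the authors' implementation; it returns only the V1 entries of u.

```python
from collections import defaultdict


def rwr_exact(edges, q, c=0.15, tol=1e-10, max_iter=1000):
    """Solve u = (1 - c) * P u + c * e_q by power iteration.

    P is the column-normalized adjacency matrix of the full bipartite
    graph (V1 and V2 nodes together); e_q is the indicator of query q.
    Assumption: edge weight = number of times (a, t) appears in edges.
    Returns the relevance scores of the V1 nodes only.
    """
    w = defaultdict(lambda: defaultdict(float))  # w[j][i] = weight of edge j-i
    for a, t in edges:
        w[("V1", a)][("V2", t)] += 1.0
        w[("V2", t)][("V1", a)] += 1.0
    start = ("V1", q)
    if start not in w:
        raise ValueError(f"query node {q!r} has no edges")
    out = {j: sum(nbrs.values()) for j, nbrs in w.items()}  # column sums
    u = dict.fromkeys(w, 0.0)
    u[start] = 1.0
    for _ in range(max_iter):
        nxt = dict.fromkeys(w, 0.0)
        for j, nbrs in w.items():
            for i, wij in nbrs.items():
                # Column j of P spreads u[j] over j's neighbors by weight.
                nxt[i] += (1 - c) * u[j] * wij / out[j]
        nxt[start] += c  # restart mass always returns to q
        diff = sum(abs(nxt[n] - u[n]) for n in w)
        u = nxt
        if diff < tol:
            break
    return {n[1]: s for n, s in u.items() if n[0] == "V1"}


edges = [("a", "t1"), ("b", "t1"), ("a", "t2"), ("b", "t2"), ("x", "t3")]
u = rwr_exact(edges, "a")
print(round(u["a"], 3), round(u["b"], 3), u["x"])  # 0.345 0.195 0.0
```

The remaining probability mass (about 0.46 here) sits on the V2 nodes, so the V1 scores sum to less than 1.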
Approximate neighborhood formation
• Scalability problem with exact neighborhood formation:
  • too expensive to compute for every single node in V1
• Observation:
  • nodes that are far away from q have relevance scores close to 0
• Idea:
  • partition the graph and apply neighborhood formation only to the partition containing q
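A minimal sketch of the idea, assuming the partition is already available as a mapping from each side-tagged node to a partition id (in practice it would come from a graph partitioning tool such as the one cited under Related Work). Only edges whose endpoints both lie in q's partition are kept, and RWR runs on that subgraph, so nodes outside the partition implicitly get relevance 0. The helper names, the fixed iteration count, and the hand-made partition in the example are illustrative assumptions.

```python
from collections import defaultdict


def rwr(edges, q, c=0.15, iters=200):
    """Power iteration for u = (1 - c) P u + c e_q; returns V1 scores only."""
    w = defaultdict(lambda: defaultdict(float))
    for a, t in edges:
        w[("V1", a)][("V2", t)] += 1.0
        w[("V2", t)][("V1", a)] += 1.0
    start = ("V1", q)
    if start not in w:
        raise ValueError(f"query node {q!r} has no edges here")
    out = {j: sum(nbrs.values()) for j, nbrs in w.items()}
    u = dict.fromkeys(w, 0.0)
    u[start] = 1.0
    for _ in range(iters):
        nxt = dict.fromkeys(w, 0.0)
        for j, nbrs in w.items():
            for i, wij in nbrs.items():
                nxt[i] += (1 - c) * u[j] * wij / out[j]
        nxt[start] += c  # restart back to q
        u = nxt
    return {n[1]: s for n, s in u.items() if n[0] == "V1"}


def approx_nf(edges, q, part, c=0.15):
    """Approximate NF: run RWR only on the subgraph of q's partition.

    part maps ("V1", a) and ("V2", t) to a partition id (hypothetical
    input format; assumed to come from an external graph partitioner).
    """
    p = part[("V1", q)]
    local = [(a, t) for a, t in edges
             if part[("V1", a)] == p and part[("V2", t)] == p]
    return rwr(local, q, c)  # nodes outside q's partition are absent (score 0)


# Two tight clusters joined by a single bridge edge (b, t3).
edges = [("a", "t1"), ("b", "t1"), ("a", "t2"), ("b", "t2"),
         ("d", "t3"), ("e", "t3"), ("d", "t4"), ("e", "t4"), ("b", "t3")]
part = {("V1", "a"): 0, ("V1", "b"): 0, ("V2", "t1"): 0, ("V2", "t2"): 0,
        ("V1", "d"): 1, ("V1", "e"): 1, ("V2", "t3"): 1, ("V2", "t4"): 1}
exact = rwr(edges, "a")
approx = approx_nf(edges, "a", part)
```

In this toy case both versions rank a above b, while d and e drop out of the approximate result entirely because they lie outside a's partition.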
Anomaly detection - intuition
• A node t in V2 is normal if all the nodes a in V1 that link to t belong to the same neighborhood
• Example:
[Figure: a node t with low normality (left) and a node t with high normality (right)]
Anomaly detection - method
Input: a query node q from V2
Output: the normality score of q
• Find the set S of nodes connected to q
• Compute the relevance scores of the elements of S, denoted rs
• Apply a score function f(rs) to obtain the normality score, e.g., f(rs) = mean(rs)
[Figure: query node q in V2 and its neighbor set S in V1]
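The method can be sketched in a few lines. This version fills in details the slide leaves open, and they are assumptions: rs is taken to be the pairwise relevance scores among the members of S (RWR run from each member, scored at every other member), and f is the mean over those off-diagonal pairs, so each node's trivially high relevance to itself does not inflate the score. Function names and parameters are hypothetical.

```python
from collections import defaultdict


def rwr(edges, q, c=0.15, iters=200):
    """Power iteration for u = (1 - c) P u + c e_q; returns V1 scores only."""
    w = defaultdict(lambda: defaultdict(float))
    for a, t in edges:
        w[("V1", a)][("V2", t)] += 1.0
        w[("V2", t)][("V1", a)] += 1.0
    start = ("V1", q)
    out = {j: sum(nbrs.values()) for j, nbrs in w.items()}
    u = dict.fromkeys(w, 0.0)
    u[start] = 1.0
    for _ in range(iters):
        nxt = dict.fromkeys(w, 0.0)
        for j, nbrs in w.items():
            for i, wij in nbrs.items():
                nxt[i] += (1 - c) * u[j] * wij / out[j]
        nxt[start] += c  # restart back to q
        u = nxt
    return {n[1]: s for n, s in u.items() if n[0] == "V1"}


def normality(edges, t, c=0.15):
    """Normality of t in V2, assuming f(rs) = mean pairwise relevance in S."""
    s = sorted({a for a, t2 in edges if t2 == t})  # S = V1 nodes linked to t
    if len(s) < 2:
        return None  # no pairs to compare; left undefined in this sketch
    rs = []
    for a in s:
        scores = rwr(edges, a, c)  # relevance of every V1 node to a
        rs.extend(scores[b] for b in s if b != a)  # off-diagonal entries only
    return sum(rs) / len(rs)  # f(rs) = mean(rs)


# Two clusters; t1 links a and b (same cluster), t5 links a and d (across).
edges = [("a", "t1"), ("b", "t1"), ("a", "t2"), ("b", "t2"),
         ("d", "t3"), ("e", "t3"), ("d", "t4"), ("e", "t4"),
         ("a", "t5"), ("d", "t5")]
print(normality(edges, "t1") > normality(edges, "t5"))  # True
```

The cross-cluster node t5 scores lower than t1, matching the intuition that a V2 node linked to V1 nodes from different neighborhoods is less normal.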
Datasets
Dataset                   |V1|    |V2|    |E|     Avg. degree (V1)   Avg. degree (V2)
Conference-Author (CA)    2687    288K    662K    510                5
Author-Paper (AP)         316K    472K    1M      3                  2
IMDB                      553K    204K    2.2M    4                  11
Goals
[Q1]: Do the neighborhoods make sense? (NF)
[Q2]: How accurate is the approximate NF?
[Q3]: Do the anomalies make sense? (AD)
[Q4]: What about the computational cost?
[Q1] Exact NF
• The nodes (x-axis) with the highest relevance scores (y-axis) are indeed very relevant to the query node.
• The relevance scores quantify how closely each node is related to the query node.
[Figure: relevance score (y-axis) of the most relevant neighbors (x-axis) for two queries, ICDM (CA) and Robert DeNiro (IMDB)]
[Q2] Approximate NF
• Precision = the fraction of the top-k neighbors found by exact NF that are also among the top-k neighbors found by approximate NF (ApprNF)
• Precision drops slowly as the number of partitions increases
• Precision remains high over a wide range of neighborhood sizes
[Figure: left, precision vs. # of partitions (neighborhood size = 20); right, precision vs. neighborhood size (number of partitions = 10)]
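The precision measure can be computed from the two score dictionaries, one from exact NF and one from ApprNF. Breaking ties by node label and excluding the query node from its own neighborhood are assumptions of this sketch; the slides do not specify either, and the function names are hypothetical.

```python
def top_k(scores, k, exclude=None):
    """Labels of the k highest-scoring nodes; ties broken by label."""
    ranked = sorted((n for n in scores if n != exclude),
                    key=lambda n: (-scores[n], n))
    return ranked[:k]


def precision_at_k(exact, approx, k, query=None):
    """Fraction of exact NF's top-k neighbors that ApprNF also ranks in its top k."""
    hits = set(top_k(exact, k, query)) & set(top_k(approx, k, query))
    return len(hits) / k


exact = {"q": 0.40, "a": 0.25, "b": 0.15, "c": 0.10, "d": 0.05}
approx = {"q": 0.45, "a": 0.30, "c": 0.15, "b": 0.05}  # d fell outside q's partition
print(precision_at_k(exact, approx, k=2, query="q"))  # 0.5
print(precision_at_k(exact, approx, k=3, query="q"))  # 1.0
```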
[Q3] Anomaly detection
• Randomly inject some nodes and edges (biased towards high-degree nodes)
• On average, the genuine nodes have higher normality scores than the injected ones
[Figure: normality score of genuine vs. injected nodes]
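One way to set up the injection step, sketched under assumptions: each injected node is a new V2 node linked to a fixed number of distinct V1 nodes, and every draw picks a V1 node with probability proportional to its degree (the slide says only that injection is biased towards high-degree nodes). Genuine and injected V2 nodes can then be scored with a normality function and compared. The function name, labels such as `fake0`, and the parameters are illustrative.

```python
import random
from collections import Counter


def inject_anomalies(edges, n_new, links_per_node, seed=0):
    """Return (edges plus injected edges, list of injected V2 labels).

    Each injected V2 node links to `links_per_node` distinct V1 nodes.
    Assumption: every draw picks a V1 node with probability proportional
    to its degree; a node drawn twice is simply redrawn.
    """
    deg = Counter(a for a, _ in edges)  # degree of each V1 node
    v1 = sorted(deg)
    if links_per_node > len(v1):
        raise ValueError("links_per_node exceeds the number of V1 nodes")
    weights = [deg[a] for a in v1]
    rng = random.Random(seed)
    new_edges, fakes = list(edges), []
    for i in range(n_new):
        t = f"fake{i}"  # hypothetical label for an injected V2 node
        chosen = set()
        while len(chosen) < links_per_node:
            chosen.add(rng.choices(v1, weights=weights)[0])  # degree-biased draw
        new_edges.extend((a, t) for a in sorted(chosen))
        fakes.append(t)
    return new_edges, fakes


edges = [("a", "t1"), ("b", "t1"), ("a", "t2"), ("b", "t2"),
         ("d", "t3"), ("e", "t3"), ("d", "t4"), ("e", "t4")]
augmented, fakes = inject_anomalies(edges, n_new=3, links_per_node=2)
```

Because the draws ignore neighborhood structure, an injected node tends to link V1 nodes from different neighborhoods, which is exactly what the normality score is meant to flag.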
[Q4] Computational cost
• Even with a small number of partitions, the computational cost can be reduced dramatically.
[Figure: Approximate NF, Time (sec) vs. # of Partitions]
Related Work
• Random walk: [Brin & Page 98] [Haveliwala WWW02]
• Graph partitioning: [Karypis & Kumar 98] [Kannan et al. FOCS00]
• Collaborative filtering: [Shardanand & Maes 95] …
• Anomaly detection: [Aggarwal & Yu SIGMOD01] [Noble & Cook KDD03] [Newman 03]
Conclusion
• Two important queries on bipartite graphs: NF and AD
• An efficient method for NF using random walk with restart and graph partitioning techniques
• Based on the results of NF, we can also spot anomalies (AD)
• Effectiveness is confirmed on real datasets
Future work and Q & A
• Future work
  • What about time-evolving graphs?
• Contact: Jimeng Sun, http://www.cs.cmu.edu/~jimeng