Discovering Important Nodes through Graph Entropy
Jitesh Shetty, Jafar Adibi [KDD ’05]
Advisor: Dr. Koh Jia-Ling
Reporter: Che-Wei, Liang
Date: 2008/09/18
Outline
• Introduction
• Order in Networks
• Graph Entropy
• Experimental Results
• Conclusions
Introduction
• A new challenge in the area of Link Discovery and Social Network Analysis
  – Exploiting communication-pattern information and text information within knowledge discovery processes
  – e.g., discovering hidden organizational structure and selecting interesting, prominent members
Introduction
• Email logs
  – Of prime importance and relevance in the study of information flow in an organization
  – An evidence database for law enforcement and intelligence organizations to detect hidden groups within an organization that are engaged in illegal activities
• Graph entropy
  – Used to determine the most prominent, interesting people
Order In Networks
• A graph model might not be the best representation of some organizations
  – e.g., drug dealers, terrorist organizations, threat groups
• Graph models usually ignore the hierarchy of such organizations
  – These organizations are composed of leaders and followers
Graph Entropy (1/6)
• To find prominent people in a network
  – We need to aggregate the links between them and discover which node has the most effect on the network
  – An entropy model can identify the entity that has the most effect on the graph entropy
• Transform the problem space into a multigraph
  – Each node represents an entity; each link represents an action between entities
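A minimal sketch of the multigraph transformation, assuming (this is not stated on the slide) that the raw data is a list of (sender, recipients) pairs and that each recipient of a message yields one directed parallel edge sender → recipient. Parallel edges are stored as multiplicities in a `collections.Counter`; `build_multigraph` is a hypothetical helper name.

```python
from collections import Counter

def build_multigraph(emails):
    """Collapse an email log into a directed multigraph.

    `emails` is an iterable of (sender, recipients) pairs (assumed layout).
    Every recipient of every message adds one parallel edge
    sender -> recipient; the Counter stores each edge's multiplicity.
    """
    edges = Counter()
    for sender, recipients in emails:
        for recipient in recipients:
            edges[(sender, recipient)] += 1
    return edges

# Hypothetical toy log: (sender, [recipients])
email_log = [
    ("alice", ["bob"]),
    ("alice", ["bob", "carol"]),
    ("bob", ["carol"]),
]
graph = build_multigraph(email_log)
print(graph[("alice", "bob")])  # 2 parallel edges
print(sum(graph.values()))      # 4 links in total
```

Keeping multiplicities rather than collapsing to a simple graph preserves how often two entities interact, which is the information an entropy measure over links needs.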
Graph Entropy (3/6)
• Let G = (V, E) be a graph, and let P be a probability distribution on the vertex set V(G)
• P(A email B) =
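The formula for P(A email B) did not survive in this transcript, so the sketch below fills it with an explicit assumption: P(A email B) is the fraction of all emails that go from A to B, and the graph entropy is the Shannon entropy H = -Σ P log₂ P over those links. This is a stand-in, not Körner's graph entropy and not necessarily the paper's estimator. A node's effect is taken to be how much H changes when the node and all of its links are removed, following the idea that the most prominent entity is the one with the most effect on the graph entropy. The helper names are hypothetical.

```python
import math
from collections import Counter

def link_entropy(edges):
    """Shannon entropy (bits) of the link distribution.

    ASSUMPTION: P(A email B) = count(A -> B) / total number of emails.
    """
    total = sum(edges.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in edges.values() if c > 0)

def rank_by_entropy_effect(edges):
    """Rank nodes by |H(G) - H(G without that node)|, largest first."""
    base = link_entropy(edges)
    nodes = {n for pair in edges for n in pair}
    effect = {}
    for node in nodes:
        # Drop every link that has `node` as sender or recipient.
        remaining = Counter({pair: c for pair, c in edges.items()
                             if node not in pair})
        effect[node] = abs(base - link_entropy(remaining))
    return sorted(effect.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy multigraph: four links, each sent once.
toy = Counter({("a", "b"): 1, ("a", "c"): 1, ("b", "c"): 1, ("c", "d"): 1})
ranking = rank_by_entropy_effect(toy)
print(ranking[0])  # ('c', 2.0)
```

In the toy graph H = 2 bits; removing `c` leaves a single link (H = 0), the largest change, so `c` ranks first.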
Graph Entropy (4/6)
• A major concern in the LD (Link Discovery) domain is that data elements are not independent
  – e.g., the links "A sends email to B" and "B sends email to C" are dependent on each other: B may forward A's email to C
• Three approaches to discovering dependency:
  1. Examine the similarity of emails
  2. Check
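Approach 1 (email similarity) can be sketched under assumptions of our own: each email is a (timestamp, sender, recipient, text) tuple, similarity is the Jaccard coefficient of lower-cased word sets, and a later email B → C whose text closely matches an earlier email A → B is flagged as a possible forward of A's message. The cut-off `threshold=0.5` and the function names are hypothetical, not values from the paper.

```python
def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def candidate_forwards(emails, threshold=0.5):
    """Flag dependent link pairs A->B, B->C that look like forwards.

    `emails` holds (timestamp, sender, recipient, text) tuples.
    Emails are processed in time order, so only a later B->C email
    can depend on an earlier A->B email.
    """
    ordered = sorted(emails, key=lambda e: e[0])
    words = [set(e[3].lower().split()) for e in ordered]
    pairs = []
    for i, (_, a, b, _) in enumerate(ordered):
        for j in range(i + 1, len(ordered)):
            _, b2, c, _ = ordered[j]
            if b2 == b and jaccard(words[i], words[j]) >= threshold:
                pairs.append((a, b, c))
    return pairs

# Hypothetical sample log.
sample = [
    (1, "alice", "bob", "quarterly numbers attached please review"),
    (2, "bob", "carol", "fw quarterly numbers attached please review"),
    (3, "bob", "dave", "lunch on friday"),
]
print(candidate_forwards(sample))  # [('alice', 'bob', 'carol')]
```

The pairwise loop is quadratic in the number of emails; it is meant to show the dependency test, not to scale to a full mailbox.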
Graph Entropy (5/6)
3. Exploit a Markov-blanket type of model
  – Assume that an event (link) between two nodes depends only on those nodes' events
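Under this Markov-blanket assumption, the events a link depends on are the other links that touch either of its two endpoints. A minimal sketch, assuming links are stored as a list of (sender, recipient) pairs and referenced by position, so that parallel edges of the multigraph stay distinct; `link_blanket` is a hypothetical name.

```python
def link_blanket(links, index):
    """Return the links that the link at `index` is assumed to depend on.

    A link between u and v depends only on other events of u or v,
    i.e. every other link that has u or v as sender or recipient.
    Parallel copies of the same link are kept, since they are also
    events of those nodes.
    """
    u, v = links[index]
    return [other for k, other in enumerate(links)
            if k != index and (u in other or v in other)]

# Hypothetical multigraph as a list of directed links.
links = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "b"), ("d", "e")]
print(link_blanket(links, 0))  # [('b', 'c'), ('a', 'b')]
```

Restricting dependencies to a link's endpoints keeps each link's conditioning set local, instead of conditioning on the whole graph.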
Experiment
• Enron Email Dataset
  – 151 users, mostly senior management of Enron
  – Contains 252,759 email messages
  – Almost all users use folders to organize their emails
Experiment
• Created an Enron dictionary
  – Normalized all emails using the Porter stemming algorithm
  – Compared the vectors using the Jaccard similarity coefficient
• Ordered emails by timestamp
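The preprocessing steps above can be sketched as follows. The slides use the Porter stemmer (available, for example, as `PorterStemmer` in NLTK's `nltk.stem` module); to keep the sketch dependency-free, `normalize` below uses a crude suffix stripper instead, which is not the Porter algorithm. Each email becomes a binary vector over the dictionary, stored as the set of its active column indices, and two vectors are compared with the Jaccard coefficient |A ∩ B| / |A ∪ B|. The (timestamp, text) layout and all helper names are assumptions.

```python
import re

def normalize(text):
    """Lower-case, tokenize, and crudely strip common suffixes.

    Stand-in for Porter stemming: it only removes one of a few
    English suffixes and is NOT the Porter algorithm.
    """
    stems = []
    for tok in re.findall(r"[a-z]+", text.lower()):
        for suffix in ("ing", "ed", "es", "s"):
            if tok.endswith(suffix) and len(tok) > len(suffix) + 2:
                tok = tok[: -len(suffix)]
                break
        stems.append(tok)
    return stems

def build_dictionary(texts):
    """Map every normalized term in the corpus to a column index."""
    vocab = sorted({t for text in texts for t in normalize(text)})
    return {term: i for i, term in enumerate(vocab)}

def to_vector(text, dictionary):
    """Binary term vector, stored as the set of active column indices."""
    return {dictionary[t] for t in normalize(text) if t in dictionary}

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical emails as (timestamp, text).
emails = [
    ("2001-05-02 09:00", "Meeting moved to Friday"),
    ("2001-05-01 17:30", "Scheduling the meeting"),
]
emails.sort(key=lambda e: e[0])  # ISO-8601 strings sort chronologically
dictionary = build_dictionary(text for _, text in emails)
vectors = [to_vector(text, dictionary) for _, text in emails]
print(jaccard(vectors[0], vectors[1]))  # 1 shared term of 6 -> 0.1666...
```

Storing vectors as index sets rather than dense 0/1 arrays keeps memory proportional to the words an email actually contains, which matters for a dictionary built over a whole corpus.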