collaborative social network discovery from - cpdiehl.org presentation - april 07.pdf ·...
Collaborative Social Network Discovery from Online Communications
Chris Diehl
USMA-ARI Network Science Workshop
Collaboration with Lise Getoor and Galileo Namata, University of Maryland – College Park
The Question
Organizations today utilize a number of communication channels
Email, Instant Messaging, Text Messaging, Wikis, Blogs
Given access to an organization’s online communications, how does one infer relationship and role types within the organization from the data?
From: [email protected]
Subject: Re: trade
My friend John says ….
Data Attributes
Structured Data (Metadata)
Sender and recipient(s), datetime
Can identify patterns of communication from metadata
Metadata provides no relationship context
Unstructured Data (Content)
Message subject and body, attachments
Content may provide relationship and role information
Additional context may be needed to clarify the message
Goal is to exploit complementary cues offered by the metadata and content
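The split between structured metadata and unstructured content can be illustrated with Python's standard `email` package. This is a minimal sketch: the message text, the addresses, and the `split_message` helper are hypothetical and are not taken from the talk's data.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical message; the addresses are placeholders, not data from the talk.
RAW = """\
From: alice@example.com
To: bob@example.com, carol@example.com
Date: Tue, 23 Jan 2001 09:45:00 -0600
Subject: Hedge Funds

Have you had other discussions about hedge funds?
"""


def split_message(raw):
    """Return (metadata, content) dictionaries for one raw email."""
    msg = message_from_string(raw)
    # Structured data: who communicated with whom, and when.
    metadata = {
        "sender": getaddresses(msg.get_all("From", []))[0][1],
        "recipients": [addr for _, addr in getaddresses(msg.get_all("To", []))],
        "datetime": parsedate_to_datetime(msg["Date"]),
    }
    # Unstructured data: free text that may carry relationship cues.
    content = {
        "subject": msg["Subject"],
        "body": msg.get_payload().strip(),
    }
    return metadata, content


metadata, content = split_message(RAW)
```

The metadata dictionary is enough to recover communication patterns, while any relationship context has to come from the content dictionary.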
Identifying Key Actors – A Motivating Example
From: Jennifer Fraser
Subject: john arnold bid for 20,000?
true? and when do you plan on selling them?
From: John Arnold
exaggerations...word travels everywhere doesnt it? how'd you hear?
From: Jennifer Fraser
johnny johhny johnny-- there is no secrecy when one is the king of ng .. your brokers have the biggest moves in the world…
Representations: Data and Network
Communication (Hyper)Graph
Nodes: Network References
Edges: Communication Events
Network (Hyper)Graph
Nodes: Entities
Edges: Social Relationships
HP Labs Communication Graph (Adamic and Adar, 2003)
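The two representations can be sketched in a few lines of Python. The events, the addresses, and the `entity_of` mapping below are hypothetical. The co-occurrence counts are only an assumed stand-in for candidate edges, since the network graph in the talk carries social relationships rather than raw counts.

```python
from collections import Counter

# Hypothetical communication events: (sender reference, recipient references).
events = [
    ("a.smith@example.com", ["b.jones@example.com", "c.lee@example.com"]),
    ("asmith@example.org", ["b.jones@example.com"]),
    ("b.jones@example.com", ["a.smith@example.com"]),
]

# Communication hypergraph: nodes are network references, and each
# hyperedge is one communication event joining sender and recipients.
hyperedges = [frozenset([sender, *recipients]) for sender, recipients in events]
reference_nodes = set().union(*hyperedges)

# Assumed output of entity resolution: network reference -> entity.
entity_of = {
    "a.smith@example.com": "A. Smith",
    "asmith@example.org": "A. Smith",
    "b.jones@example.com": "B. Jones",
    "c.lee@example.com": "C. Lee",
}

# Candidate edges for the network graph: entity pairs that co-occur in an
# event, weighted by the number of shared events. Relationship
# identification would still have to decide what each pair means.
pair_counts = Counter()
for edge in hyperedges:
    entities = sorted({entity_of[ref] for ref in edge})
    for i, left in enumerate(entities):
        for right in entities[i + 1:]:
            pair_counts[(left, right)] += 1
```

Note that the two addresses for "A. Smith" are separate nodes in the communication hypergraph but collapse into one node once the references are mapped to entities.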
Collaborative Social Network Discovery
Entity Resolution
Relationship Identification
Incremental Machine Learning from Context
Communication Graph
Validated Network
Entity Resolution: InfoVis Co-Author Network Fragment
[Figure: network fragment before and after entity resolution]
D-Dupe: An Interactive Tool for Entity Resolution
http://www.cs.umd.edu/projects/linqs/ddupe
Entity Resolution: Name and Network References
Datetime: 2001-01-23 09:45:00
Sender: [email protected]
Recipients: [email protected]
Subject: Hedge Funds
Tana: Other than your email attached, have you had other discussions with Mark or credit about hedge funds? Sara
Network References
Name References
Every individual has two classes of references
To define an individual’s identity and draw broader connections across emails, we need to first associate name and network references
Reference: C. P. Diehl, L. Getoor, G. Namata, "Name Reference Resolution
in Organizational Email Archives," SIAM Data Mining 2006
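One way to picture the association of a name reference with a network reference is a simple traffic heuristic: among the network references whose name matches, prefer the one that communicates most with the email's sender and recipients. The sketch below is an illustrative assumption, not the scoring method of the cited paper. The addresses, counts, and the `resolve_name` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical first names attached to network references.
first_names = {
    "mark.a@example.com": "Mark",
    "mark.b@example.com": "Mark",
    "sender@example.com": "Sara",
    "recipient@example.com": "Tana",
}

# Hypothetical message counts between unordered pairs of network references.
traffic = Counter({
    frozenset(["sender@example.com", "mark.a@example.com"]): 40,
    frozenset(["recipient@example.com", "mark.a@example.com"]): 25,
    frozenset(["sender@example.com", "mark.b@example.com"]): 3,
})


def resolve_name(name, sender, recipients):
    """Rank network references whose first name matches `name` by their
    total traffic with the participants of the email."""
    participants = [sender, *recipients]
    scores = {}
    for ref, first in first_names.items():
        if first != name:
            continue
        # Counter returns 0 for pairs that never communicated.
        scores[ref] = sum(traffic[frozenset([p, ref])] for p in participants)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = resolve_name("Mark", "sender@example.com", ["recipient@example.com"])
```

Here the candidate with the heavier traffic to both participants is ranked first. A real system would likely need further context, as the next slide suggests.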
Context Challenges
Datetime: 2001-02-28 09:32:00
Sender: [email protected]
Recipients: [email protected]
Subject: Greg's Bill
Johnny, What does Greg owe you for the champagne? Is it $896.00? Liz
Datetime: 2000-06-19 09:52:00
Sender: [email protected]
Recipients: [email protected]
Subject: Just a tease!!!
Wouldn't you like to know which of the two Susan's gave her notice today
Relationship Identification - Incremental Ego Network Exploration
Relationship Ranking
Rank  Relationship with Ego (Christian Yoder)
1     Elizabeth Sager
2     Richard Sanders
3     Steve Hall
4     Mark Haedicke
5     Dave Fuller
6     Tracy Ngo
Message Ranking
Rank  Message Subject
1     Happiness
2     System Outage Risk
3     Mark Taylor Visit
4     Question about a deal we did
Evidence Discovery
From: Christian Yoder [[email protected]]
To: Elizabeth Sager [[email protected]],
Genia Fitzgerald [[email protected]]
Subject: Happiness
Happiness is looking at the new legal org chart (which Jan just now dropped on my desk). I always approach these dry documents as though they were trigrams resulting from throwing the coins and consulting the I-Ching. At the top of the trigram which I find myself listed in I see a single name: Elizabeth Sager, and at the bottom I see the name Genia FitzGerald. ... cgy
Reference: C. P. Diehl, G. Namata, L. Getoor, "Relationship Identification
for Social Network Discovery," AAAI 2007
Enron Manager-Subordinate Communications Relationships
[Figure: Enron manager-subordinate communications relationship graph]
Relationship Identification - Manager-Subordinate Relations
Preference Learning
Supervised learning of relationship ranker
Given initial set of labeled ego networks
Ranking dyadic relationships
Traffic-Based Approach
Message frequency
Number of recipients
Exchanges between relationship participants and common recipients
Content-Based Approach
Term frequency vector for set of messages corresponding to the relationship
Exploits text from sender to recipient
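The two feature families above can be sketched as follows. The messages, the tokenizer, and the function names are hypothetical, and the exact feature definitions used in the cited work may differ. The sketch only shows message frequency, recipient counts, and a term-frequency vector built from sender-to-recipient text.

```python
import re
from collections import Counter

# Hypothetical messages: (sender, recipients, body).
messages = [
    ("ego", ["alice", "bob"], "Please review the contract and report back"),
    ("ego", ["alice"], "Approve the budget request today"),
    ("alice", ["ego"], "Budget report attached for review"),
]


def traffic_features(ego, other):
    """Traffic-based cues: message frequency and mean recipient count
    for messages sent from ego to other."""
    sent = [m for m in messages if m[0] == ego and other in m[1]]
    count = len(sent)
    mean_recipients = sum(len(m[1]) for m in sent) / count if count else 0.0
    return {"message_count": count, "mean_recipients": mean_recipients}


def term_frequencies(ego, other):
    """Content-based cue: one term-frequency vector over all message
    bodies sent from ego to other."""
    counts = Counter()
    for sender, recipients, body in messages:
        if sender == ego and other in recipients:
            counts.update(re.findall(r"[a-z]+", body.lower()))
    return counts


alice_traffic = traffic_features("ego", "alice")
alice_terms = term_frequencies("ego", "alice")
```

Either feature set could then be passed to a learned ranker that orders the relationships in an ego network.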
Approach                                   Mean Reciprocal Rank
Worst Case                                 0.141
Random Selection                           0.211
Traffic-Based                              0.518
Content-Based                              0.660
Content-Based with Attribute Selection     0.719
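For readers unfamiliar with the metric, mean reciprocal rank can be computed as in the sketch below. It assumes that each ego network contributes the reciprocal of the rank of the first relevant relationship. The rankings and labels are hypothetical and do not reproduce the figures in the table.

```python
def mean_reciprocal_rank(rankings, relevant):
    """Average of 1/rank of the first relevant item in each ranking.

    rankings: dict mapping an ego to its ranked list of candidates.
    relevant: dict mapping an ego to the set of correct candidates.
    A ranking with no relevant item contributes 0.
    """
    total = 0.0
    for ego, ranked in rankings.items():
        rank = next(
            (i for i, cand in enumerate(ranked, start=1) if cand in relevant[ego]),
            None,
        )
        total += 1.0 / rank if rank else 0.0
    return total / len(rankings)


# Hypothetical ranked ego networks and their labeled relationships.
rankings = {
    "ego1": ["a", "b", "c"],
    "ego2": ["x", "y", "z"],
    "ego3": ["p", "q", "r", "s"],
}
relevant = {"ego1": {"a"}, "ego2": {"y"}, "ego3": {"s"}}

mrr = mean_reciprocal_rank(rankings, relevant)  # (1 + 1/2 + 1/4) / 3
```

A value near 1 means the labeled relationship is usually ranked first, so a higher score in the table corresponds to less effort spent searching an ego network.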
Future Directions
Incremental, Active Learning
Relationship-Level and Message-Level Annotations
Automated Model Selection
Automated Feature Selection
Visualization
Communications Graph Exploration
Network Graph Construction
Interaction Paradigms
Unified Workflow for Entity Resolution and Relationship Identification