collaborative social network discovery from

14
Collaborative Social Network Discovery from Online Communications Chris Diehl USMA-ARI Network Science Workshop Collaboration with Lise Getoor and Galileo Namata, University of Maryland – College Park

Upload: others

Post on 03-Feb-2022

4 views

Category:

Documents


0 download

TRANSCRIPT

Collaborative SocialNetwork Discovery fromOnline Communications

Chris Diehl

USMA-ARI Network ScienceWorkshop

Collaboration with Lise Getoorand Galileo Namata, University

of Maryland – College Park

2

The Question

Organizations today utilize a number ofcommunication channels

Email, Instant Messaging, Text Messaging,Wikis, Blogs

Given access to an organization’sonline communications, how does oneinfer relationship and role types withinthe organization from the data?

To: [email protected]

From: [email protected]

Subject: Re: trade

My friend John says ….

3

Data Attributes

Structured Data (Metadata)

Sender and recipient(s), datetime

Can identify patterns of communication from metadata

Metadata provides no relationship context

Unstructured Data (Content)

Message subject and body, attachments

Content may provide relationship and role information

Additional context may be needed to clarify the message

Goal is to exploit complimentary cues offered bythe metadata and content

4

Identifying Key Actors –A Motivating Example

From: Jennifer Fraser

Subject: john arnold bid for 20,000?

true? and when do you plan on selling them?

From: John Arnold

exaggerations...word travels everywhere doesnt it?how'd you hear?

From: Jennifer Fraser

johnny johhny johnny-- there is no secrecy whenone is the king of ng .. your brokers have thebiggest moves in the world…

5

Representations: Data and Network

Nodes: Network ReferencesEdges: Communication Events

Communication

(Hyper)GraphNetwork (Hyper)Graph

Nodes: EntitiesEdges: Social Relationships

HP Labs Communication Graph(Adamic and Adar, 2003)

6

Collaborative Social Network Discovery

Entity ResolutionRelationship Identification

Incremental MachineLearning from Context

Communication Graph

Validated Network

7

Before After

Entity Resolution:InfoVis Co-Author Network Fragment

8

D-Dupe: An Interactive Toolfor Entity Resolution

http://www.cs.umd.edu/projects/linqs/ddupe

9

Entity Resolution:Name and Network References

Datetime: 2001-01-23 09:45:00

Sender: [email protected]

Recipients: [email protected]

Subject: Hedge Funds

Tana: Other than your emailattached, have you had otherdiscussions with Mark or creditabout hedge funds? Sara

NetworkReferences

NameReferences

Every individualhas two classes ofreferences

To define anindividual’s identityand draw broaderconnections acrossemails, we need tofirst associatename and networkreferences

Reference: C. P. Diehl, L. Getoor, G. Namata, "Name Reference Resolution

in Organizational Email Archives," SIAM Data Mining 2006

10

Context Challenges

Datetime: 2001-02-2809:32:00

Sender:[email protected]

Recipients:[email protected]

Subject: Greg s Bill

Johnny, What does Greg oweyou for the champagne? Is it$896.00? Liz

Datetime: 2000-06-1909:52:00

Sender:[email protected]

Recipients:[email protected]

Subject: Just a tease!!!

Wouldn t you like to knowwhich of the two Susan sgave her notice today

11

Relationship Identification -Incremental Ego Network Exploration

Tracy Ngo6

Dave Fuller5

Mark Haedicke4

Steve Hall3

Richard Sanders2

Elizabeth Sager1

Relationship with Ego(Christian Yoder)

Rank

Question about a dealwe did

4

Mark Taylor Visit3

System Outage Risk2

Happiness1

Message SubjectRank

From: Christian Yoder[[email protected]]

To: Elizabeth Sager[[email protected]],

Genia Fitzgerald[[email protected]]

Subject: Happiness

Happiness is looking at the new legalorg chart (which Jan just nowdropped on my desk). I alwaysapproach these dry documents asthough they were trigrams resultingfrom throwing the coins andconsulting the I-Ching. At the top ofthe trigram which I find myself listedin I see a single name: ElizabethSager, and at the bottom I see thename Genia FitzGerald. ... cgy

Relationship Ranking Message Ranking

Evidence Discovery

Reference: C. P. Diehl, G. Namata, L. Getoor, ”Relationship Identification

for Social Network Discovery," AAAI 2007

13

Relationship Identification -Manager-Subordinate Relations

Preference Learning

Supervised learning of relationshipranker

Given initial set of labeled ego networks

Ranking dyadic relationships

Traffic-Based Approach

Message frequency

Number of recipients

Exchanges between relationshipparticipants and common recipients

Content-Based Approach

Term frequency vector for set ofmessages corresponding to therelationship

Exploits text from sender to recipient

0.141Worst Case

0.211RandomSelection

0.518Traffic-Based

0.660Content-Based

0.719

Content-Based withAttributeSelection

MeanReciprocal

Rank

Approach

14

Future Directions

Incremental, Active Learning

Relationship-Level and Message-Level Annotations

Automated Model Selection

Automated Feature Selection

Visualization

Communications Graph Exploration

Network Graph Construction

Interaction Paradigms

Unified Workflow for Entity Resolution and

Relationship Identification