Copyright 2006, Data Mining Research Laboratory An Event-based Framework for Characterizing the Evolutionary Behavior of Interaction Graphs Sitaram Asur , Srinivasan Parthasarathy and Duygu Ucar Department of Computer Science The Ohio State University

Copyright 2006, Data Mining Research Laboratory

An Event-based Framework for Characterizing the Evolutionary Behavior of Interaction Graphs

Sitaram Asur, Srinivasan Parthasarathy and Duygu Ucar

Department of Computer Science, The Ohio State University

Motivation

• Interaction Networks
– Represent scientific data from various domains
– Nodes represent entities
– Edges represent interactions among entities
– Examples:
  • Biological networks: Protein-Protein Interaction (PPI) networks, gene expression networks
  • Collaboration networks
  • Social networks, online communities, blog networks

Protein-protein interactions in yeast (Jeong et al, 2001)

Physicist collaboration network (Newman and Girvan, 2004)

Motivation

• Mining interaction networks important
– Gain insight into structure, properties and behavior of these networks [Newman, 2001]
• Modular nature of interaction networks important
– Co-expression networks: dense components -> functional modules
– Social networks: clusters -> community structure

Motivation

• A large number of earlier approaches focused on mining static interaction networks
• Many important real-world networks are dynamic

Temporal protein interaction network of the yeast mitotic cell cycle.

Ulrik de Lichtenberg, et al. Science 307, 724 (2005)

Motivation

• Dynamic Interaction Networks
– Nodes and interactions change over time
– Structure changes in the network
• Need for a structured method to characterize and model evolution
– Understand nature of change (evolution) in networks
– Consider evolution of individuals and communities
– Develop models for reasoning and inference of future events

Workflow

[Figure: workflow diagram. Evolving Graph -> Temporal Snapshots (S_i, S_i+1) -> Clustering (C_i, C_i+1) -> Event Detection -> Behavioral Patterns -> Analysis and Inference; iterate over i]

Temporal Snapshots

• Split the graph data into non-overlapping temporal snapshots
– Each snapshot corresponds to a graph
– Consists of all nodes and interactions active in that time period
– Nodes active if they have an interaction in a particular time period

[Figure: example graph on nodes A to G at snapshots T1 and T2]
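
The snapshot step above can be sketched as follows. This is only an illustration, assuming interactions arrive as `(u, v, t)` triples with numeric timestamps and that every snapshot covers a fixed-length period; the function name `split_into_snapshots` and the `window` parameter are hypothetical and not taken from the slides.

```python
from collections import defaultdict


def split_into_snapshots(interactions, window):
    """Group (u, v, t) interactions into non-overlapping periods of length `window`.

    Returns a list of snapshots ordered by time. Each snapshot holds the
    edges observed in that period and the nodes active in it (a node is
    active when it takes part in at least one interaction of the period).
    Periods without any interaction are skipped.
    """
    snapshots = defaultdict(lambda: {"nodes": set(), "edges": set()})
    for u, v, t in interactions:
        period = int(t // window)            # index of the period containing t
        snap = snapshots[period]
        snap["edges"].add(frozenset((u, v)))  # undirected edge
        snap["nodes"].update((u, v))
    return [snapshots[p] for p in sorted(snapshots)]


interactions = [("A", "B", 0), ("B", "C", 1), ("A", "C", 5), ("D", "E", 6)]
snaps = split_into_snapshots(interactions, window=5)
for i, snap in enumerate(snaps, start=1):
    print(f"T{i}: nodes={sorted(snap['nodes'])}")
```

Node B is absent from the second snapshot because it has no interaction in that period, which matches the activity rule on the slide.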

Clustering

• Represent the snapshot graphs using clusters
– Clusters of a graph can provide structure information
– Examine the evolution of clusters over time
– Can provide insight on corresponding changes to the graph
– MCL clustering algorithm employed in this work
– Ensemble clustering approaches can be employed to obtain robust clusters (Asur et al, ISMB 2007)

[Figure: example graph on nodes A to G at snapshots T1 and T2]
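
For readers unfamiliar with MCL (Markov clustering), the sketch below shows its two alternating steps, expansion and inflation, on a dense adjacency matrix. It is a minimal illustration and not the authors' setup: the inflation value, the tolerance, and the way clusters are read off the converged matrix are assumptions chosen for brevity.

```python
import numpy as np


def mcl(adj, inflation=2.0, max_iter=100, tol=1e-6):
    """Minimal dense Markov clustering sketch.

    adj: symmetric adjacency matrix (list of lists or array).
    Returns clusters as a sorted list of sorted node-index lists.
    """
    m = np.asarray(adj, dtype=float) + np.eye(len(adj))  # add self-loops
    m /= m.sum(axis=0, keepdims=True)                    # column-stochastic
    for _ in range(max_iter):
        prev = m
        m = m @ m                                        # expansion: two-step random walk
        m = m ** inflation                               # inflation: favor strong flows
        m /= m.sum(axis=0, keepdims=True)
        if np.allclose(m, prev, atol=tol):
            break
    # Each non-empty row of the converged matrix lists one cluster's members.
    clusters = set()
    for row in m:
        members = frozenset(np.flatnonzero(row > tol).tolist())
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Two disconnected triangles: nodes 0-2 and nodes 3-5.
adj = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    adj[a, b] = adj[b, a] = 1
print(mcl(adj))
```

Higher inflation values generally produce finer clusters; real interaction graphs would call for a sparse implementation.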

Community-based Event Detection

• Continue
• Merge
• Split
• Form
• Dissolve

[Figure: clusters at snapshots T=1 through T=6 illustrating the community-based events]

Entity-based Event Detection

• Appear
• Disappear
• Join
• Leave

[Figure: nodes A and B and their clusters at snapshots T=1 through T=4 illustrating the entity-based events]

Event Detection

• Represent each set of snapshot clusters as a k × N binary cluster-membership matrix (k clusters, N nodes)

• Use bitwise operators to compute the events between each successive pair of matrices (snapshots)

• Example: Continue Event

Continue(C_i^j, C_i+1^k) holds when AND(S_i(j), S_i+1(k)) == OR(S_i(j), S_i+1(k)), i.e. cluster j of snapshot i and cluster k of snapshot i+1 contain exactly the same nodes

• Event Detection algorithm is linear in the number of nodes in the graph: O(N)
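
The Continue test above translates directly into boolean array operations. The sketch below assumes nodes are numbered 0..N-1 and clusters are given as sets of node indices; the helper names are hypothetical. The Appear/Disappear part is an assumed reading of those entity events (a node counts as present in a snapshot when it belongs to some cluster), since the slides name the events without defining them. The pairwise loop is written for clarity and makes no attempt to match the O(N) bound stated on the slide.

```python
import numpy as np


def membership_matrix(clusters, n_nodes):
    """k x N boolean matrix: entry [j, n] is True when node n is in cluster j."""
    m = np.zeros((len(clusters), n_nodes), dtype=bool)
    for row, members in enumerate(clusters):
        m[row, list(members)] = True
    return m


def continue_events(s_i, s_next):
    """Pairs (j, k) where AND of the two membership rows equals their OR."""
    events = []
    for j, row_j in enumerate(s_i):
        for k, row_k in enumerate(s_next):
            if np.array_equal(row_j & row_k, row_j | row_k):
                events.append((j, k))
    return events


def appear_disappear(s_i, s_next):
    """Assumed definition: compare which nodes are covered by any cluster."""
    before, after = s_i.any(axis=0), s_next.any(axis=0)
    appear = np.flatnonzero(~before & after).tolist()
    disappear = np.flatnonzero(before & ~after).tolist()
    return appear, disappear


s1 = membership_matrix([{0, 1, 2}, {3, 4}], n_nodes=6)
s2 = membership_matrix([{3, 4, 5}, {0, 1, 2}], n_nodes=6)
print(continue_events(s1, s2))
print(appear_disappear(s1, s2))
```

Here cluster 0 of the first snapshot continues as cluster 1 of the second, while the cluster {3, 4} gains node 5 and therefore does not register as a Continue.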

Temporal Analysis

• Use critical events for analysis
• Form and Dissolve events
– Used to study group formation and dissipation
• Merge and Split events
– Evolution of groups
• Continue events
– Stability of clusters/groups
– Evolution of topics in a collaboration network

Behavioral Analysis

• Use entity-based critical events discovered to compose incremental measures for capturing behavioral patterns

• Behavioral measures can then be used to analyze evolutionary behavior of nodes and clusters

• Four Behavioral measures
– Stability Index
– Sociability Index
– Popularity Index
– Influence Index

Case Study 1 : DBLP Collaboration network

• Data from 28 key conferences in databases/data mining/AI over 10 years

• Authors (nodes) connected by collaborations (edges)

• 23136 nodes and 54989 edges

• Collaboration networks display many of the structural features of social networks (Kempe, Kleinberg and Tardos 2003, Newman 2001)

Case Study 2 : Clinical Trials Network

• Clinical Trials
– Can provide information on risks, benefits and optimal dosage levels.
– Consists of observations of patients under drug use as well as some under placebo
– Generally represented as a set of multivariate time series
• Evolving clinical trials network
– Nodes representing patients
– Correlations among patients modeled as edges
– Edges change over time as correlations change
• Motivation: Use evolution of correlation to identify potential toxic effects of drugs
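
One way to picture the evolving clinical trials network is sketched below. The slides do not say how correlations were computed or thresholded, so everything specific here is an assumption made for illustration: a single univariate series per patient, non-overlapping windows, Pearson correlation, and a cutoff of 0.8. The function name `correlation_snapshots` is hypothetical.

```python
import numpy as np


def correlation_snapshots(series, window, threshold=0.8):
    """Build one edge set per time window from patient time series.

    series: array of shape (n_patients, n_steps).
    An edge (a, b) is added for a window when the Pearson correlation of
    patients a and b within that window is at least `threshold`.
    """
    series = np.asarray(series, dtype=float)
    n_patients, n_steps = series.shape
    snapshots = []
    for start in range(0, n_steps - window + 1, window):
        corr = np.corrcoef(series[:, start:start + window])
        edges = {
            (a, b)
            for a in range(n_patients)
            for b in range(a + 1, n_patients)
            if corr[a, b] >= threshold
        }
        snapshots.append(edges)
    return snapshots


series = [
    [1, 2, 3, 4, 1, 2, 3, 4],  # patient 0
    [2, 4, 6, 8, 4, 3, 2, 1],  # patient 1: tracks patient 0, then diverges
]
print(correlation_snapshots(series, window=4))
```

The edge between the two patients exists in the first window and vanishes in the second, which is the kind of change the event framework is meant to pick up.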

Stability Index

• Propensity of a node to interact with the same group of people over time

• Stability for a node over time incrementally computed based on the stability of the clusters it belongs to

Stability for Clinical Trials data

• Nodes with low Stability Index values represent patients with fluctuating correlation values (outliers)
• Null Hypothesis:
– If the drug does not result in toxicity, then outliers are likely to be flagged at random from each group (drug and placebo).
• Experiment on clinical trials network for diabetes patients
– 19 nodes (patients) found having Stability Index below threshold.
– The drug under study was discontinued due to possible toxic effects.

18 out of the 19 were on the drug!!!
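
To get a feel for how unlikely this split is under the null hypothesis, the sketch below computes a one-sided binomial tail. It rests on an assumption the slides do not state: that the drug and placebo groups are the same size, so each flagged patient is on the drug with probability 0.5. With unequal groups the value of `p` would change accordingly.

```python
from math import comb


def tail_probability(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance that at least 18 of 19 randomly flagged patients are on the drug,
# assuming (hypothetically) equal-sized drug and placebo groups.
p_value = tail_probability(19, 18)
print(f"{p_value:.2e}")
```

Under that assumption the probability is 20 / 2^19, roughly 4 in 100,000, so random flagging is a poor explanation for the observed outliers.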

Sociability Index

• Incremental measure of the different interactions a node participates in

• Opposite of the Stability Index

Does not represent degree!

Sociability Index for Community Prediction

• Goal : To identify future cluster co-occurrences based on history data for the DBLP dataset

• Key Intuition: If two authors have high sociability, and they have not yet collaborated (not been clustered together), there is a high chance they will.

• Setup : Use the data for 1997-2001 to predict cluster co-occurrences for 2002-2006

Experimental Results

• Comparison with other measures (Liben-Nowell and Kleinberg, CIKM 2003)
– Common Neighbor
– Adamic-Adar
– Jaccard
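
The three baseline measures listed above are standard neighborhood-based link-prediction scores. The sketch below shows their usual definitions, assuming the graph is stored as a dictionary mapping each node to its set of neighbors; the `adj` layout and the toy graph are illustrative only and unrelated to the DBLP data.

```python
import math


def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Shared neighbors weighted by 1 / log(degree); rare neighbors count more."""
    return sum(
        1.0 / math.log(len(adj[z]))
        for z in adj[u] & adj[v]
        if len(adj[z]) > 1
    )


# Toy undirected graph with edges a-c, a-d, b-c, b-d, b-e, d-e.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"b", "d"},
}
for score in (common_neighbors, jaccard, adamic_adar):
    print(score.__name__, score(adj, "a", "b"))
```

All three scores depend only on the static neighborhood of the pair, which is the contrast with a measure built incrementally from events over time.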

Popularity Index

• Measure of attraction of nodes to a cluster
• Influence measure of a cluster
• Does not reflect the size of the cluster
• DBLP dataset
– Can be used to identify hot topics
– If a large number of nodes join a cluster and they are all working on a similar topic, it indicates a buzz around that topic for that year

Application of Popularity Index

• Example: XML
• Year 1999: 3 authors (XML and web applications)
• Year 2000: 50 joins
– 30 of these authors published papers on XML

Influence Index

• Measure of influence of a node on others
• Influence in terms of participation in critical events
• Influence of a node initially computed as [equation not preserved in the transcript]
• Follower nodes need to be pruned!

Top Influential authors – DBLP dataset

Diffusion Models

• Study the spread of information in an evolving interaction network (Kempe et al, 2003, 2005)
– Nodes activated with information
– Newly activated nodes become contagious briefly
– Information propagates through the network
– Activation function maps weights of the links of a node to determine if it is activated
  • SUM Activation: If sum of weights > threshold, activate
  • MAX Activation: If any single weight > threshold, activate

[Figure: diffusion over time steps t1, t2, t3, t4]
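
The two activation rules can be compared on a toy network. The sketch below is a simplified reading of the model: the graph is static, weights are fixed, and a node is contagious only in the step right after it becomes active. The data layout (`in_weights` maps each node to its incoming neighbors and link weights) and all names are hypothetical.

```python
def sum_activation(weights, threshold):
    """Activate when the combined weight from contagious neighbors exceeds the threshold."""
    return sum(weights) > threshold


def max_activation(weights, threshold):
    """Activate when any single contagious neighbor's weight exceeds the threshold."""
    return max(weights) > threshold


def diffuse(in_weights, seeds, threshold, activation, steps=10):
    """Spread activation from `seeds`; returns the set of all activated nodes."""
    active = set(seeds)
    contagious = set(seeds)                 # only these can activate others this step
    for _ in range(steps):
        newly = set()
        for node, sources in in_weights.items():
            if node in active:
                continue
            weights = [w for src, w in sources.items() if src in contagious]
            if weights and activation(weights, threshold):
                newly.add(node)
        if not newly:
            break
        active |= newly
        contagious = newly                  # earlier nodes stop being contagious
    return active


in_weights = {
    "a": {},
    "b": {},
    "c": {"a": 0.3, "b": 0.3},
    "d": {"c": 0.6},
}
print(sorted(diffuse(in_weights, {"a", "b"}, 0.5, sum_activation)))
print(sorted(diffuse(in_weights, {"a", "b"}, 0.5, max_activation)))
```

With SUM activation the two weak links into `c` add up and the information reaches `d`; with MAX activation neither link is strong enough on its own, so the spread stops at the seeds.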

Diffusion Models – Influence Maximization

• Influence Maximization Problem: Find the initial set of nodes that can activate the largest number of nodes over a time period
– Critical in applications such as viral marketing and for epidemiological research
– Complicated in the case of dynamic interaction networks as the network changes over time
• Need for dynamic measures that reflect the current status of the network
– Sociability Index used to weight links
  • Highly sociable nodes have high propensity to pass on information
– Influence Index to determine initial set of active nodes
– Comparison with random choice of nodes and degree-based selection (Wasserman and Faust, 1994)

Conclusions

• Most real-world graphs dynamic in nature
– Need for analysis, reasoning and inference
– Proposed an event-based framework
  • Clusters to capture structure at different snapshots
  • Critical events over clusters to identify dynamic properties of graphs
  • Behavioral patterns incrementally composed from critical events
– Proposed method useful in many application domains
  • Protein function prediction, drug design, recommender systems, viral marketing, epidemiology

Future Directions

• Extensions to large interaction graphs
• Use of semantic information for reasoning and inference
– Merge and Split Events
  • If two clusters have high semantic similarity, probability of a Merge is high
– Continue events
  • Track the evolution of topics
  • Sequences of Form, Continue, Continue …
• Multi-scale temporal modeling
  • Analyze snapshots of different granularity

• Poster # 36, this evening (Mon 13th Aug, 6:15 – 9:15 pm)

• This work was supported by the following grants:
– DOE Early Career Principal Investigator Award No. DE-FG02-04ER25611
– NSF CAREER Grant IIS-0347662

• Contacts:
– Sitaram Asur : [email protected]
– Dr Srinivasan Parthasarathy : [email protected]
– Duygu Ucar : [email protected]

• Group Webpage : http://dmrl.cse.ohio-state.edu

Thanks!
