University at Buffalo The State University of New York: Clustering of Interaction Network


Upload: derek-mitchell

Post on 13-Jan-2016


Page 1

University at Buffalo The State University of New York

Clustering of Interaction Network

Definition

Process to detect densely connected sub-graphs

Determines protein complexes or functional modules

Difficulties

Noisy data (too many false positives and false negatives)

Cannot be solved by traditional clustering techniques

Difficult to define the pairwise distance between proteins in the network

Protein complexes may overlap

Disparate sources of data

Different reliabilities (17% to 50%)

Small overlaps (<17%)

Page 2

Protein Interaction Network

Undirected, unweighted graph

Node represents protein, edge represents interaction

Example: yeast protein interaction network

Importance

Provides a global view of cellular organization and biological functions

Applicable to systematic approaches for functional knowledge discovery

Problem

Large scale

Complex connectivity

Page 3

Small-world Phenomenon (Watts & Strogatz)

Networks that lie between regular and random networks

Higher average clustering coefficient than expected by random chance

Short average shortest path length

Scale-free Distribution (Barabási & Albert)

Network growth by preferential attachment

Power-law degree distribution: a few high-degree nodes, many low-degree nodes

Clustering coefficient distribution independent of degree

Structural properties of two protein interaction databases:

Structural Property              DIP      MIPS
density                          0.0015   0.0015
average clustering coefficient   0.2283   0.2878
average shortest path length     4.14     4.43
degree distribution (γ)          1.77     1.64
high modularity
hub existence
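The structural measures in the table above can be computed directly from a list of interactions. The sketch below is an assumed, minimal implementation (names such as `build_graph` are hypothetical) that treats the network as an undirected, unweighted graph. It uses the common convention that a node with fewer than two neighbors has clustering coefficient 0, and it averages shortest path lengths over connected pairs only. It is not meant to reproduce the DIP or MIPS values, which depend on the exact data release.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Undirected, unweighted graph: protein -> set of interacting proteins."""
    graph = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-interactions
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def density(graph):
    """Fraction of all possible node pairs that are connected: 2m / (n(n-1))."""
    n = len(graph)
    m = sum(len(nbrs) for nbrs in graph.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def clustering_coefficient(graph, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = graph[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention: undefined coefficient counted as 0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return 2 * links / (k * (k - 1))


def average_clustering(graph):
    return sum(clustering_coefficient(graph, v) for v in graph) / len(graph)


def average_shortest_path_length(graph):
    """Mean BFS hop distance over all connected, ordered pairs of distinct nodes."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0
```

For a triangle A-B-C with an extra edge C-D, this gives a density of 2/3, an average clustering coefficient of 7/12, and an average shortest path length of 4/3.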

Page 4

Conventional Graph Clustering Approaches

Density-based Clustering

Finding densely connected sub-graphs (e.g., maximal clique algorithm)

Hierarchical Clustering

Top-down approach: iteratively partitioning a graph (e.g., minimum cut algorithm)

Bottom-up approach: iteratively merging nodes (e.g., node merging by common neighbors)

Problems

Computationally inefficient

Unable to detect overlapping clusters

Discards sparsely connected nodes
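As an illustration of the density-based approach, maximal cliques can be enumerated with the Bron-Kerbosch algorithm with pivoting. This is a generic sketch, not necessarily the maximal clique algorithm the slides refer to. The graph is assumed to be a dict mapping each protein to the set of its neighbors.

```python
def maximal_cliques(graph):
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting).

    graph: dict mapping each node to the set of its neighbors (undirected).
    Returns a list of sets, one per maximal clique.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend r, x: already-processed nodes
        if not p and not x:
            cliques.append(r)  # r cannot be extended, so it is maximal
            return
        # Pivot on the node adjacent to the most candidates to prune branches.
        pivot = max(p | x, key=lambda u: len(graph[u] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques
```

On a triangle A-B-C with a pendant edge C-D, the result is the two maximal cliques {A, B, C} and {C, D}. Note that a clique-based view keeps D only as part of the small clique {C, D}, which hints at why sparsely connected nodes tend to be lost by this family of methods.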

Page 5

Functional Influence Model

Functional Flow

Treats each protein of known functional annotation as a ‘source’ of ‘functional flow’ for that function

Simulates the spread of this functional flow through the neighborhoods surrounding the sources with a random walk

‘Functional score’: the amount of ‘flow’ that the protein has received for that function

[Figure: example with nodes u and v and function Func(a)]
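The slides describe the flow only in words. Below is a minimal sketch of one possible reading, assuming an unweighted graph, one unit of flow per annotated source, and a uniform random walk run for a fixed, hypothetical number of `steps`: at each step every node passes its current flow evenly to its neighbors, and a protein's functional score is the total flow it has received. It is not meant to reproduce any published functional-flow algorithm exactly.

```python
def functional_flow_scores(graph, sources, steps=3):
    """Random-walk spread of 'functional flow' from annotated source proteins.

    graph: dict node -> set of neighbors (undirected, unweighted).
    sources: proteins annotated with the function of interest.
    steps: number of random-walk steps to simulate (hypothetical parameter).
    Returns: node -> total flow received over all steps (its 'functional score').
    """
    flow = {v: 0.0 for v in graph}
    for s in sources:
        flow[s] = 1.0  # each source starts with one unit of flow
    score = {v: 0.0 for v in graph}
    for _ in range(steps):
        nxt = {v: 0.0 for v in graph}
        for u, amount in flow.items():
            if amount == 0.0 or not graph[u]:
                continue  # nothing to pass on (isolated nodes keep no flow)
            share = amount / len(graph[u])  # random walk: uniform over neighbors
            for w in graph[u]:
                nxt[w] += share
        for v, amount in nxt.items():
            score[v] += amount  # accumulate flow received at this step
        flow = nxt
    return score
```

On the path a-b-c with source a and two steps, b receives 1.0 in the first step, and a and c each receive 0.5 in the second.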

Page 6

Functional Influence

Functional influence based on distance

Weibull Distribution

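$$f(d; k, \lambda) = \frac{k}{\lambda}\left(\frac{d}{\lambda}\right)^{k-1} e^{-(d/\lambda)^{k}}, \quad d \ge 0$$

where k > 0 is the shape parameter and λ > 0 is the scale parameter.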

Curve Fitting

d is the distance between two nodes
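The Weibull density can be evaluated as below to serve as a distance model F(d). This is a sketch: the shape `k` and scale `lam` values used in any example are placeholders, since the parameters obtained from the curve-fitting step are not given.

```python
import math


def weibull_pdf(d, k, lam):
    """Weibull density f(d; k, lam) for distance d, shape k > 0, scale lam > 0."""
    if d < 0:
        return 0.0  # the density is zero for negative distances
    if d == 0 and k < 1:
        return math.inf  # the density diverges at d = 0 when k < 1
    x = d / lam
    return (k / lam) * x ** (k - 1) * math.exp(-(x ** k))
```

With k = 1 the density reduces to an exponential decay, (1/λ)e^(-d/λ); larger k concentrates the influence around a characteristic distance near λ.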

Page 7

Information Flow Simulation

Computation of the functional influence inf_s(x) of s on each node x ∈ V, based on shortest paths

Input: a weighted interaction network and a source node s

Output: functional influence pattern of s

Measurements

Functional Influence Model

PathRatio

PathRatio models the natural “aging” or “loss” of information as it propagates through the network.

SPath(s,y) is the set of all shortest paths between node s and node y.

PR(s,y) is the PathRatio between node s and node y.

PathStrength

PS(p) measures the strength of path p using the weights of the edges along p.

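$$PR(s,y) = \frac{\sum_{p \in SPath(s,y)} PS(p)}{N(SPath(s,y))}$$

where N(SPath(s,y)) is the number of shortest paths between s and y.

$$PS(p) = \prod_{i=1}^{k} w(v_i, v_{i+1})$$

where p = (v_1, ..., v_{k+1}) and w(v_i, v_{i+1}) is the weight of the edge between v_i and v_{i+1}.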
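A minimal sketch of the two measurements follows. It assumes that "shortest path" means fewest hops (the slides do not say whether edge weights affect path length) and that PathStrength multiplies the edge weights along the path. Shortest paths are enumerated explicitly from BFS predecessor lists, which is fine for small examples but can become expensive when many shortest paths exist. Function names are hypothetical.

```python
from collections import deque


def shortest_paths(graph, s, y):
    """All fewest-hop paths from s to y, as lists of nodes.

    graph: dict node -> dict neighbor -> edge weight (undirected).
    """
    dist = {s: 0}
    preds = {s: []}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                preds[w] = [u]
                queue.append(w)
            elif dist[w] == dist[u] + 1:
                preds[w].append(u)  # another shortest route into w
    if y not in dist:
        return []

    def walk(v):
        # Follow predecessor lists back to s to list every shortest path ending at v.
        if v == s:
            return [[s]]
        return [path + [v] for p in preds[v] for path in walk(p)]

    return walk(y)


def path_strength(graph, path):
    """PS(p): product of the edge weights along the path."""
    strength = 1.0
    for a, b in zip(path, path[1:]):
        strength *= graph[a][b]
    return strength


def path_ratio(graph, s, y):
    """PR(s, y): mean PathStrength over all shortest paths between s and y."""
    paths = shortest_paths(graph, s, y)
    if not paths:
        return 0.0
    return sum(path_strength(graph, p) for p in paths) / len(paths)
```

In a square s-a-y, s-b-y with weights 0.5, 0.8 and 1.0, 0.6, the two shortest paths have strengths 0.4 and 0.6, so PR(s, y) = 0.5.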

Page 8

Framework of functional influence simulation

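$$I(s \rightarrow y) = \mathrm{inf}(s) \cdot PR(s,y) \cdot F(d)$$

$$\mathrm{inf}_s(y) = \sum_{x \in N(y)} I(x \rightarrow y)$$

where N(y) is the set of neighbors of y.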

Algorithm

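1. Initialize inf(s).

2. Compute the initial flow I(s → y) from inf(s), the PathRatio PR(s,y), and the distance model F(d).

3. Update inf_s(y) by summing the incoming flows I(x → y) from the neighbors x of y.

4. Repeat step 3 for every node in the network.

5. Finally, the functional profile of every node y, the vector of the influences inf_s(y) received from all sources s, is generated for every node in the network.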

F(d) is the functional distribution model; d is the distance between node s and node y.

PR(s,y) is the PathRatio between node s and node y.

inf(s) is the initial functional influence from node s.

inf_s(y) is the functional influence received by node y from node s.

$$V(y) = [\mathrm{inf}_{s_1}(y), \mathrm{inf}_{s_2}(y), \ldots, \mathrm{inf}_{s_n}(y)]$$
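The framework can be sketched compactly under simplifying assumptions that are not from the slides: d is the hop distance, PR(s, y) is accumulated along the BFS shortest-path DAG (summing path strengths and counting paths) rather than by enumerating paths, F is a Weibull density with placeholder parameters, the source keeps its own inf(s), and inf_s(y) is taken directly as I(s → y), so the neighbor summation of step 3 is not modeled. The names `influence_from` and `functional_profiles` are hypothetical.

```python
import math
from collections import deque


def weibull(d, k, lam):
    """F(d): Weibull density used as the distance model (k, lam are placeholders)."""
    x = d / lam
    return (k / lam) * x ** (k - 1) * math.exp(-(x ** k))


def influence_from(graph, s, inf_s=1.0, k=1.5, lam=2.0):
    """Influence of source s on every reachable y via I(s->y) = inf(s) * PR(s,y) * F(d).

    graph: dict node -> dict neighbor -> edge weight (undirected).
    """
    dist, count, strength_sum = {s: 0}, {s: 1}, {s: 1.0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w, weight in graph[u].items():
            if w not in dist:
                dist[w], count[w], strength_sum[w] = dist[u] + 1, 0, 0.0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                # Every shortest path to u extends by the edge (u, w).
                count[w] += count[u]
                strength_sum[w] += strength_sum[u] * weight
    result = {}
    for y, d in dist.items():
        if y == s:
            result[y] = inf_s  # assumption: the source keeps its initial influence
        else:
            pr = strength_sum[y] / count[y]  # mean path strength = PathRatio
            result[y] = inf_s * pr * weibull(d, k, lam)
    return result


def functional_profiles(graph, **params):
    """V(y) = [inf_{s1}(y), ..., inf_{sn}(y)], with sources taken in sorted order."""
    sources = sorted(graph)
    influence = {s: influence_from(graph, s, **params) for s in sources}
    return {y: [influence[s].get(y, 0.0) for s in sources] for y in sources}
```

Nodes that a source cannot reach receive 0 in its position of the profile vector, so every profile has one entry per node in the network.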

Page 9

Functional Module Detection (FMD)

Page 10

Flowchart for functional module detection

Page 11

Functional Modularity Detection

Experimental data: DIP (4935 proteins, 14162 interactions)

Evaluation

Functional categories and annotations from MIPS

Hypergeometric p-value

Result
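The slides name the hypergeometric p-value without spelling it out. The standard one-sided form gives the probability of finding at least k proteins of a functional category in a module of n proteins drawn at random from a network of N proteins, K of which carry that category; a small value suggests the module is enriched for that function. A minimal sketch using Python's `math.comb` (the function name is hypothetical):

```python
from math import comb


def hypergeometric_p_value(N, K, n, k):
    """P(X >= k) under the hypergeometric distribution.

    N: proteins in the network, K: proteins annotated with the category,
    n: proteins in the detected module, k: module proteins with the category.
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probabilities of observing exactly i annotated proteins, for i >= k.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total
```

For example, with N = 10, K = 5, n = 3, and k = 3, the p-value is C(5,3)/C(10,3) = 10/120 = 1/12.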

Page 12

Computational Epidemiology

Computational Epidemiology is a multidisciplinary field that uses computational techniques to develop tools and models to aid epidemiologists in their study of the spread of diseases.

1. Developing a virus spread and containment response model

2. Understanding virus spread and identifying critical properties

3. Applying these findings to real infectious virus spread

4. Analyzing the results of the containment strategy (death toll vs. strategies)

Page 13

Virus Spread Network Model

What do nodes and edges represent in the virus spread network model?

Node: a person (community network) or a town or place (road network)

Edge: an interaction (community network) or a pathway (road network)

Weights of nodes and edges change over time t according to the virus spread dynamics model

Node weight: health status (0 to 1)

Edge weight: strength status (0 to 1)

Page 14

Model Scheme

Spread Model

Spreading phase: edges in the spreading region are damaged

Defense Model

Signaling and propagation phase: nodes with a certain number of damaged edges send signals to their neighbor nodes

Defense action phase: nodes that receive a certain level of signals from their neighbor nodes remove all of their edges

[Figure panels: signaling alarms to neighbor nodes from an infected neighbor node; virus progression to neighbor nodes; culling nodes to prevent virus progression]
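The three phases can be sketched as a toy simulation loop. Several choices here are assumptions, not taken from the slides: the node and edge weights in [0, 1] are collapsed into discrete edge states (healthy, damaged, culled, mirroring the green, red, and blue edges of the Oldenburg example), the spread radius is r(t) = t hops from the center, an edge lies in the spreading region when both endpoints are within that radius, signals travel only along edges that have not been culled, and both thresholds are free parameters. The function names are hypothetical.

```python
from collections import deque

HEALTHY, DAMAGED, CULLED = "healthy", "damaged", "culled"


def edge_key(u, v):
    """Undirected edge identifier (node labels must be mutually comparable)."""
    return (u, v) if u <= v else (v, u)


def simulate(graph, center, steps, damage_threshold=1, signal_threshold=1):
    """Toy spread/defense loop; returns the final status of every edge.

    graph: dict node -> set of neighbors (undirected).
    """
    status = {edge_key(u, v): HEALTHY for u in graph for v in graph[u]}

    def open_neighbors(u):
        return [v for v in graph[u] if status[edge_key(u, v)] != CULLED]

    for t in range(1, steps + 1):
        # Spreading phase: BFS up to r(t) = t hops over edges that are not culled.
        dist = {center: 0}
        queue = deque([center])
        while queue:
            u = queue.popleft()
            if dist[u] == t:
                continue
            for v in open_neighbors(u):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Damage healthy edges whose endpoints both lie inside the radius.
        for u in dist:
            for v in open_neighbors(u):
                if v in dist and status[edge_key(u, v)] == HEALTHY:
                    status[edge_key(u, v)] = DAMAGED

        # Signaling phase: nodes with enough damaged edges alert their open neighbors.
        signals = {u: 0 for u in graph}
        for u in graph:
            damaged = sum(1 for v in graph[u] if status[edge_key(u, v)] == DAMAGED)
            if damaged >= damage_threshold:
                for v in open_neighbors(u):
                    signals[v] += 1

        # Defense action phase: sufficiently alerted nodes cull all of their edges.
        for u, count in signals.items():
            if count >= signal_threshold:
                for v in graph[u]:
                    status[edge_key(u, v)] = CULLED
    return status
```

On the path 0-1-2-3-4 with the center at node 0 and `signal_threshold=2`, node 1 culls its edges at t = 2, and the edges (2, 3) and (3, 4) stay healthy; with an unreachable signal threshold, every edge is eventually damaged.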

Page 15

Spread Model

Simulating disease spreading

Damaging nodes and edges that lie within a virus spread radius of the center

Virus spread by radius r(t)

Page 16

Defense Model

Simulating the defense system against disease spreading through message spreading

Culling the interactions of damaged nodes to stop the spread (edge culling shown in green circles)

Page 17

Problem / Solution Approach

Which element of the virus spread system has the greatest impact on a containment campaign?

Identifying the critical elements of the system by computational modeling and stochastic simulation.

How can we plan an effective containment campaign that minimizes the damage caused by virus spread?

Mining the best combination of critical parameters under given conditions.

[Figure: parameters, simulation & analysis, critical parameter]

Page 18

Application: virus spread simulation on the road network of the city of Oldenburg, Germany

Green edges: healthy edges. Red edges: edges damaged by the spread process. Blue edges: edges damaged by the defense process.

Uncontrolled = 0.02

Intermediate = 0.12

Controlled = 0.22

Page 19

Osteoporosis

Definition: “a systemic skeletal disease characterized by low bone mass and micro-architectural deterioration of bone tissue leading to enhanced bone fragility and a consequent increase in fracture risk”

25 million people in the United States suffer from osteoporosis.

$10 billion is spent on medical charges, including rehabilitation and treatment facilities.

Research funding will reach $200 billion by 2040.

[Figure: normal bone vs. osteoporotic bone]

Page 20

Challenges

Diagnosis of osteoporosis?

The traditional method of evaluating bone strength is to assess bone mineral density (BMD).

Limitations of BMD

A major limitation of BMD is that it only partially reflects variation in bone strength. Other factors, such as bone microarchitecture, contribute substantially to bone strength.

By evaluating bone microstructure, we can improve the determination of bone quality and strength.

Computational Model on Bone Microstructure

Page 21

Computational Model on Bone Microstructure

Questions

What is a better way to evaluate bone strength?

How can we identify fragile locations in the bone structure?

Why not approach this problem from a new direction?

Let us think about this problem from a structural point of view.

Graph-based approach to bone microstructure

Bone microstructure contributes to bone strength. We represent rod-like mineral fibers as edges in a graph.

This approach is capable of quantitative assessment of both bone mineral density and bone microarchitecture.

Page 22

Model Approach

Bone is not a uniformly solid material, but rather has some spaces between its hard elements.

Designing a network model of the bone microstructure.

Quantitative assessment of bone mineral density could be done successfully with this approach.

Page 23

Bone Network Model

Creating the Bone Network

A femur image from patients with osteoporosis, obtained by DXA scan.

By profiling the DXA scan image, we create a bone network based on bone density.

What do nodes and edges represent in the bone network model?

Node: a fiber binding point for bone cell movements and biochemical interactions

Edge: a group of mineralized fibers

Weight of nodes and edges

Node weight: average weight of the directly connected edges

Edge weight: strength status of the mineralized fibers
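The node-weight rule is simple enough to state as code. A minimal sketch, assuming edges are given as a dict from an undirected node pair to a fiber strength in [0, 1]; how those strengths are derived from the DXA image profile is not specified in the slides, so any example values are made up. The function name is hypothetical.

```python
def node_weights(edges):
    """Node weight = average weight of the edges directly connected to the node.

    edges: dict mapping an undirected edge (u, v) to its strength in [0, 1].
    """
    totals, degrees = {}, {}
    for (u, v), w in edges.items():
        for node in (u, v):
            totals[node] = totals.get(node, 0.0) + w
            degrees[node] = degrees.get(node, 0) + 1
    return {node: totals[node] / degrees[node] for node in totals}
```

For a chain a-b-c with strengths 0.8 and 0.4, node b gets weight 0.6, while a and c inherit the strength of their single edge.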

Page 24

Problem / Solution Approach

What alternatives to bone mineral density (BMD) can determine the strength of bone?

Designing a computational model of bone microstructure.

How can we identify fragile locations in the bone structure?

Creating algorithms for mining weak locations from a computational model of bone microstructure.

[Figure: bone model and human bone]

Page 25

Identifying Critical Locations

Information Propagation Model

An algorithm to find critical edges in the bone network

Measuring the quantity of stress energy in each edge

Cutting the most critical edge identified by the Information Propagation Model

Running iteratively to find the next critical edges; the process stops when the first isolated sub-network appears
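The iterative cutting loop can be sketched as follows. The slides rank edges by stress energy computed with their Information Propagation Model, which they do not define; here that measure is replaced by a hypothetical, pluggable `criticality` function, and the example score `weakest_first` (weaker fibers rank as more critical) is only a placeholder. The loop assumes the input network starts out connected.

```python
from collections import deque


def is_connected(nodes, adjacency):
    """True if every node is reachable from an arbitrary start node."""
    nodes = list(nodes)
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(nodes)


def weakest_first(edges, edge):
    """Placeholder criticality score: weaker fibers count as more critical."""
    return 1.0 - edges[edge]


def cut_until_disconnected(edges, criticality):
    """Repeatedly remove the most critical edge; stop when the network first splits.

    edges: dict (u, v) -> strength. criticality: function (edges, edge) -> float.
    Returns the removed edges, in removal order.
    """
    edges = dict(edges)  # work on a copy
    nodes = {n for e in edges for n in e}
    removed = []
    while edges:
        target = max(edges, key=lambda e: criticality(edges, e))
        del edges[target]
        removed.append(target)
        adjacency = {n: set() for n in nodes}
        for u, v in edges:
            adjacency[u].add(v)
            adjacency[v].add(u)
        if not is_connected(nodes, adjacency):
            break  # the first isolated sub-network has appeared
    return removed
```

Because the score is passed in as a function, a stress-energy measure could replace `weakest_first` without changing the loop.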

Page 26

Conclusions

Various applications are generating data very rapidly and in great volume, demanding data mining approaches.

Network-based approaches look promising to solve complex problems.

This research requires close collaboration among multidisciplinary groups.

Semi-supervised approaches to integrate domain knowledge into data mining tools are important to the success of the research.