functional module identification in biological networksfunctional module identification in...

63
Functional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center for Bioinformatics & Genomic Systems Engineering Texas A&M University 21 st European Signal Processing Conference (EUSIPCO 2013) EMBC 2017 Tutorial, July 11th, Jeju Island, Korea

Upload: others

Post on 22-Jan-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Functional Module identification in Biological Networks

Xiaoning Qian Department of Electrical & Computer Engineering;

Center for Bioinformatics & Genomic Systems Engineering Texas A&M University

21st European Signal Processing Conference (EUSIPCO 2013) EMBC 2017 Tutorial, July 11th, Jeju Island, Korea

Page 2: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Acknowledgements

Dr. Yijie Wang

Siamak Zamani Dadaneh

Profs. Byung-Jun Yoon and Mingyuan Zhou

Page 3: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Protein-Protein Interaction Networks

Jeong H, Mason S, Barabási A-L, Oltvai ZN (2001) Lethality and centrality in protein networks, Nature 441: 41-42.

21st European Signal Processing Conference (EUSIPCO 2013)

Page 4: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Gene Co-Expression Networks

Tamasin N Doig, et al. Coexpression analysis of large cancer datasets provides insight into the cellular phenotypes of the tumour microenvironment. BMC Genomics, 14:469, 2013.

Page 5: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Images à Graph Representations

Page 6: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Images à Graph Representations

Page 7: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Networks (Graphs)

Page 8: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Networks (Graphs) §  Graph representation: G = {V, E}

§  Adjacency matrix: A: Aij = 1 if (i,j) E

§  Graph Laplacian: L = D - A

It is symmetric for undirected graphs: L = BBT

It is non-negative definite.

It is directly related to graph bi-partitioning (cut size).

Page 9: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Network Clustering §  Clustering: assign vertices into groups such that

there are many edges within groups but very few across groups.

§  Graph partitioning:

§  Community detection:

§  Clustering is NP hard.

The algorithm has to divide the given graph.

Number and size of partitions are typically fixed.

Some vertices may not belong to any group.

Number and size of partitions are not fixed.

Page 10: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Module Identification

How do we define and identify biologically meaningful modules in biological networks?

Question

Page 11: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Community (Module) Identification §  Group “similar” nodes.

21st European Signal Processing Conference (EUSIPCO 2013) M. Van Steen, Complex network analysis, Course materials

Page 12: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Community (Module) Identification §  Group “similar” nodes.

21st European Signal Processing Conference (EUSIPCO 2013) M. Van Steen, Complex network analysis, Course materials

Page 13: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Community (Module) Identification §  Group “similar” nodes.

21st European Signal Processing Conference (EUSIPCO 2013) M. Van Steen, Complex network analysis, Course materials

Page 14: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Formulations & Solution Algorithms §  Group “similar” nodes. •  “Modularity”-based formulations

•  Clique, k-plex, other network motif based algorithms •  Heuristic algorithms (greedy) •  Integer programming •  Spectral algorithms – Random Walk on Graph •  Hybrid algorithms

•  Interaction-pattern-based formulations •  Non-negative matrix factorization (NMF) •  Edge partition models (Stochastic Block Models)

21st European Signal Processing Conference (EUSIPCO 2013) MEJ Newman (2010) Networks: An Introduction (ISBN 9780199206650)

Page 15: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Non-negative Matrix Factorization

21st European Signal Processing Conference (EUSIPCO 2013) Wang Y and Qian X (2016) Clustering of Noisy Graphs via Non-negative Matrix Factorization With Sparsity Regularization. Texas A&M ECE Tech Report (http://www.ece.tamu.edu/~xqian/FMItutorial.html)

Page 16: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Modularity §  Topologically, nodes within the same modules have

higher than “expected” connectivity.

Many existing algorithms search for these densely connected modules.

This is a “diagonal” block model.

max c

Page 17: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Modularity §  Topologically, nodes within the same modules have

higher than “expected” connectivity.

Many existing algorithms search for these densely connected modules.

This is a “diagonal” block model.

Do molecules always work together by dense connections?

Question

Page 18: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Blockmodel Module Identification §  Signal transduction (e.g. transmembrane) proteins

do not have high connectivity among themselves but interact with similar other types of proteins.

21st European Signal Processing Conference (EUSIPCO 2013) Pinkert S, et al.. (2010) Protein interaction networks---more than mere modules, PLoS Comput Biol 6(1).

Page 19: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Blockmodel Module Identification §  Signal transduction (e.g. transmembrane) proteins

do not have high connectivity among themselves but interact with similar other types of proteins.

vs.

21st European Signal Processing Conference (EUSIPCO 2013) Pinkert S, et al.. (2010) Protein interaction networks---more than mere modules, PLoS Comput Biol 6(1).

Page 20: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Blockmodel “Modularity” §  More general blockmodel “modularity” by

introducing a virtual image graph

vs.

21st European Signal Processing Conference (EUSIPCO 2013) Pinkert S, et al.. (2010) Protein interaction networks---more than mere modules, PLoS Comput Biol 6(1).

Page 21: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Blockmodel “Modularity” §  More general blockmodel “modularity” by

introducing a virtual image graph

However, it is computationally hard to get the global optimum due to the inherent combinatorial complexity of the resulting optimization problem.

21st European Signal Processing Conference (EUSIPCO 2013) Wang Y and Qian X (2013) A novel subgradient-based algorithm for blockmodel functional module identification, APBC Best Paper

Page 22: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Random Walk on Graphs §  Random walk on graphs is a Markov chain

u6 u5

u2

u8

u7

u3

u1

u4

Transition matrix:

For a connected graph, there is a steady-state distribution:

Transition probability from set S to T:

Page 23: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Random Walk and Graph Partitioning §  Solving graph cut:

Graph Laplacian:

The second smallest eigenvalue quantifies connectivity.

Normalized cut is for graph partition by solving:

Essentially, it uses the second smallest eigenvector of

21st European Signal Processing Conference (EUSIPCO 2013) Maila M and Shi J (2001) A random walks view of spectral segmentation , AI and STATISTICS 2001.

Page 24: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Random Walk and Graph Partitioning §  Solving graph cut:

vs.

21st European Signal Processing Conference (EUSIPCO 2013) Maila M and Shi J (2001) A random walks view of spectral segmentation , AI and STATISTICS 2001.

Graph Laplacian:

The second smallest eigenvalue quantifies connectivity.

Normalized cut is for graph partition by solving:

Essentially, it uses the second smallest eigenvector of

Page 25: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Random Walk and Graph Partitioning §  Normalized cut:

The objective function of normalized cut is for balanced graph partition:

We can define this as the conductance of S on graph:

Hypothesis: Functional modules have low conductance to the rest of the graph.

21st European Signal Processing Conference (EUSIPCO 2013) Maila M and Shi J (2001) A random walks view of spectral segmentation , AI and STATISTICS 2001.

Page 26: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Blockmodel Module Identification §  Do we capture general blockmodel modules with conductance?

1

1

1

21st European Signal Processing Conference (EUSIPCO 2013) Wang Y and Qian X (2014) Blockmodel module identification in PPI through Markov random walk, Bioinformatics doi:btt569.

Page 27: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Blockmodel Module Identification §  Can we define a new conductance which captures

interaction patterns instead of “modularity”?

Two-hop transition matrix:

Intuition: If two nodes interact with similar nodes, it is more probable that they can reach each other in two steps.

As the steady-state distribution does not change, we can define a new two-hop conductance:

21st European Signal Processing Conference (EUSIPCO 2013) Wang Y and Qian X (2014) Blockmodel module identification in PPI through Markov random walk, Bioinformatics doi:btt569.

Page 28: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Blockmodel Module Identification §  Do we capture general blockmodel modules with conductance?

1

1

1

21st European Signal Processing Conference (EUSIPCO 2013) Wang Y and Qian X (2014) Blockmodel module identification in PPI through Markov random walk, Bioinformatics doi:btt569.

Page 29: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Blockmodel Module Identification

§  We search for modules as low conductance sets:

Module assignment matrix (binary): X n-by-q

After algebraic manipulations, we can prove that

21st European Signal Processing Conference (EUSIPCO 2013) Wang Y and Qian X (2014) Blockmodel module identification in PPI through Markov random walk, Bioinformatics doi:btt569.

Page 30: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Blockmodel Module Identification

§  We can solve :

This final optimization problem can be solved by semi-definite programming or spectral approximate method.

21st European Signal Processing Conference (EUSIPCO 2013) Wang Y and Qian X (2014) Blockmodel module identification in PPI through Markov random walk, Bioinformatics doi:btt569.

Page 31: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Experimental Results

Page 32: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Synthetic Networks

Page 33: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Synthetic Networks

1/16 1/8 3/16 1/4 5/16 3/8 7/16 1/2 9/16 5/8 11/160

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Noise level

NMI

SA (blockmodel)LCS (blockmodel)InfoModNormalized cut (P)MCL(I=3)EM

Page 34: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Biological Networks

Page 35: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Biological Networks

There is no ground truth regarding functional modules in these real-world PPI networks.

The performance is measured by F-measures based on Gene Ontology (GO) terms and KOG categories.

We collect yeast (Sce) PPI network from DIP and human (Hsa) PPI network from HPRD.

We check whether an identified module can be associated with specific GO terms or KOG categories.

Page 36: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Biological Networks

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

Our (coarse−grained)Our (fine−grained)PGGS

F-measure GO Terms

F-measure GO Terms

F-measure KOG

F-measure KOG

Sce PPI network Hsa PPI network

Coarse-grained: Average module size is around 10. Fine-grained: Average module size is around 5.

Page 37: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Biological Networks

Page 38: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Biological Networks

Page 39: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Biological Networks

Page 40: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Biological Networks

Page 41: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Ongoing Research §  The current publicly accessible data may not be

complete or accurate. à Will integration of network data help improve clustering? 1.  Random Walk across networks 2.  Generative models: Edge Partition Models with

Bayesian Computation

§  What about vertex properties? 1.  Deep models: Graph Convolutional Networks

Page 42: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Interactomic data are noisy. §  The current publicly accessible data may not be

complete or accurate. Can we borrow strengths from data across different data sources or even different organisms?

Page 43: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Random Walk across Networks §  Random walk across networks to integrate both

topology and constituent similarity between nodes.

21st European Signal Processing Conference (EUSIPCO 2013) Wang Y and Qian X (2014) Joint clustering of protein interaction networks through Markov random walk, APBC 2014.

Page 44: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Random Walk across Networks §  Random walk across networks can either first take

a step within a network or take a step across networks.

§  We focus on the two-hop random walk again:

Page 45: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Blockmodel Module Identification

§  We can similarly solve :

We can again solve this by semi-definite programming or spectral approximate method.

Note that the time complexity is linear with respect to the number of networks.

Page 46: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Experimental Results

Page 47: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Biological Networks

There is no ground truth regarding functional modules in these real-world PPI networks.

We compare the top GO enriched clusters based on the average –log(p-value).

We construct two human (Hsa) PPI networks from HPRD as well as from PIPs.

We also simultaneously cluster human PPI network from HPRD and yeast (Sce) network from DIP.

Page 48: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Two is more than one.

Page 49: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Two is more than one.

Page 50: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Image Co-Segmentation

ASModel

ASModel

ASModel

ASModel

Joulin et al., CVPR10

Joulin et al., CVPR10ASModel ASModelASModel

Joint network clustering helps identify better modules.

Random walk for blockmodel module identification, which can be extended to other applications.

Page 51: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Ongoing Research §  Bayesian multiple network clustering •  Generative models: Edge Partition Models (Stochastic

Block Model) with Bayesian Computation

Zamani Dadaneh S and Qian X (2016) Bayesian module identification from multiple noisy networks. EURASIP JBSB.

21st European Signal Processing Conference (EUSIPCO 2013)

Page 52: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Stochastic Block Model (SBM) §  A network is represented by a binary adjacency

matrix A

§  Module membership can be captured by a vector z, which has a multinomial distribution with the prior π

§  Probability of the existence of an edge between two nodes (Aij = 0 or 1) is governed by θij , depending on whether zi = zj

Zamani Dadaneh S and Qian X (2016) Bayesian module identification from multiple noisy networks. EURASIP JBSB.

21st European Signal Processing Conference (EUSIPCO 2013)

Page 53: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Stochastic Block Model (SBM) §  SBM Formulation

Hofman JK and Wiggins CH (2008) Bayesian approach to network modularity. Phys. Rev. Lett. 100, 258701 .

21st European Signal Processing Conference (EUSIPCO 2013)

Page 54: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Hierarchical SBM for Multiple Nets

Zamani Dadaneh S and Qian X (2016) Bayesian module identification from multiple noisy networks. EURASIP JBSB.

21st European Signal Processing Conference (EUSIPCO 2013)

Page 55: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Experimental Results

Page 56: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Integrating 10 noisy networks

Page 57: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Edge Prediction

Page 58: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Ongoing Research §  What about vertex properties?

1.  Deep models: Graph Convolutional Networks

Page 59: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Graph Convolution Networks (GCN) §  What is GCN?

Wipf T and Wellling M (2017) Semi-Supervised Classification with Graph Convolutional Networks. ICLR.

21st European Signal Processing Conference (EUSIPCO 2013)

Page 60: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Graph Convolution Networks (GCN) §  What is GCN?

Wipf T and Wellling M (2017) Semi-Supervised Classification with Graph Convolutional Networks. ICLR.

21st European Signal Processing Conference (EUSIPCO 2013)

Page 61: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Conclusions

Markov models is one class of appropriate models that enables effective clustering of biological networks.

There are more challenges in module identification: definitions, optimization, and evaluation due to empirical properties of “measured” biological networks.

It is more appropriate to investigate interaction patterns for biological modules.

Page 62: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Acknowledgements

Drs. Byung-Jun Yoon, Mingyuan Zhou, and other collaborators for insightful discussion.

NSF Awards #1244068, #1447235, #1547557, #1553281, and NIH R21DK092845 for their kind funding support.

Dr. Yijie Wang, Siamak Zamani Dadaneh, and other students who have been working on the problem.

Page 63: Functional Module identification in Biological NetworksFunctional Module identification in Biological Networks Xiaoning Qian Department of Electrical & Computer Engineering; Center

Q&A Any Questions?

Thank you!