flexible and robust multi-network clusteringflexible and robust multi-network clustering jingchao...

27
Flexible and Robust Multi-Network Clustering Jingchao Ni 1 , Hanghang Tong 2 , Wei Fan 3 , Xiang Zhang 1 1 Department of Electrical Engineering and Computer Science, Case Western Reserve University 2 School of Computing, Informatics, Decision Systems Engineering, Arizona State University 3 Baidu Research Big Data Lab The 21 th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Upload: others

Post on 18-Mar-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Flexible and Robust Multi-Network

Clustering

Jingchao Ni1, Hanghang Tong2, Wei Fan3, Xiang Zhang1 1Department of Electrical Engineering and Computer Science, Case Western Reserve University

2School of Computing, Informatics, Decision Systems Engineering, Arizona State University 3Baidu Research Big Data Lab

The 21th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Page 2: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Network Clustering

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

Network Data are ubiquitous

Web networks

Social networks

Biological networks, etc.

Network Clustering

Detect sub-networks that satisfy

certain properties

Many connections within clusters

and few connections across

clusters

Page 3: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Network Clustering

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

Network Data are ubiquitous

Web networks

Social networks

Biological networks, etc.

Network Clustering

Detect sub-networks that satisfy

certain properties

Many connections within clusters

and few connections across

clusters

Page 4: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Network Clustering

Network Data are ubiquitous

Web networks

Social networks

Biological networks, etc.

Network Clustering

Detect sub-networks that satisfy

certain properties

Many connections within clusters

and few connections across

clusters Coauthorship network between physicisits

Figure from “Mark EJ Newman and Michelle Girvan. Finding and

evaluating community structure in networks. Physical review

E 69.2 (2004): 026113.”

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

Page 5: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Network Clustering

Network Data are ubiquitous

Web networks

Social networks

Biological networks, etc.

Network Clustering

Detect sub-networks that satisfy

certain properties

Many connections within clusters

and few connections across

clusters

Figure from “Daniel Marbach, et al. Wisdom of crowds for robust

gene network inference. Nature methods 9.8 (2012): 796-804.”

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

Page 6: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Multi-Network Clustering

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

Networks collected from multiple conditions,

sources or domains

E.g., co-author networks from different

research areas

E.g., gene co-expression networks from

different tissues of model organisms

Multi-network clustering motivation

Single network can be noisy, incomplete and provide partial knowledge

Multi-network can provide compatible and complementary information

Multi-network can be robust to noise in individual networks

Figure from “Mikko Kivelä, et al. Multilayer

networks. Journal of Complex Networks 2.3

(2014): 203-271.”

Page 7: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Multi-Network Clustering

1. Abhishek Kumar, et al., Co-regularized multi-view spectral clustering.

In NIPS, 2011.

2. Wei Cheng, et al., Flexible and robust co-regularized multi-domain

graph clustering. In KDD, 2013.

Multi-view and multi-domain network clustering1,2

Key assumption

Different views/domains share the same underlying clustering structure

Methods are designed to identify consistent clustering structure across all

views/domains

Multi-view networks Multi-domain networks

Page 8: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Motivation

In many emerging applications, different networks have different data

distributions

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

Gene cluster

Research community

Domain similarity

Page 9: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Motivation

In many emerging applications, different networks have different data

distributions

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

An unrooted tissue distance tree

Figure from “Daniela Börnigen, et al. Concordance of gene expression in

human protein complexes reveals tissue specificity and pathology. Nucleic

acids research 41.18 (2013): e171-e171.”

Domain similarity

Page 10: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Motivation

In many emerging applications, different networks have different data

distributions

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

A research conference similarity network

Domain similarity

Page 11: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Motivation

Network of Networks (NoN)

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

The dashed line network formed by A to F

is called the main network. Denoted as G.

The solid line networks formed by 1 to 10

are called the domain-specific networks.

Denoted as {A(1), …, A(g)}.

The goal of this work is to simultaneously

clustering multi-network by using their

multiple underlying clustering structures.

Adjacency matrix G

Page 12: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Motivation

Network of Networks (NoN)

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

The dashed line network formed by A to F

is called the main network. Denoted as G.

The solid line networks formed by 1 to 10

are called the domain-specific networks.

Denoted as {A(1), …, A(g)}.

The goal of this work is to simultaneously

clustering multi-network by using their

multiple underlying clustering structures.

{A(1), A(2), …, A(6)}

Adjacency matrices

Page 13: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Motivation

Network of Networks (NoN)

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

The dashed line network formed by A to F

is called the main network. Denoted as G.

The solid line networks formed by 1 to 10

are called the domain-specific networks.

Denoted as {A(1), …, A(g)}.

The goal of this work is to simultaneously

clustering multi-network by using their

multiple underlying clustering structures.

Page 14: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Problem Formulation

Phase I: Main Network Clustering

Symmetric Non-negative Matrix Factorization (SNMF)

Minimizing

where is the factor matrix of G. k is the number of main clusters.

Main cluster: the cluster in the main network

hij indicates to which degree a main node i belongs to the jth main cluster.

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

0..2

HHHG tsJF

T

M

kg

H

Page 15: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Problem Formulation

Phase II: Domain-specific Network Clustering (A Simplified Case)

Assumption: domain-specific networks in the same main cluster share a common

underlying clustering structure, so we have k underlying clustering structures.

A simplified case: all domains have n nodes and t clusters.

Let the domain cluster assignment vector for node x in A(i) be (i = 1, …, g).

Define k hidden domain cluster assignment vectors (j=1, …, k) for

each domain node x.

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

g

i

k

jF

j

x

i

xijx vuhJ1 1

2)(

*

)(

*

tj

xv

1)(

*

)(

*

i

xu

g

i

k

jF

ji

ij

g

iF

Tiii

D haJ1 1

2)()(

1

2)()()( )( VUUUA

Domain-specific network clustering Main cluster guided regularization Recall hij represents main cluster membership

The number of main clusters

Page 16: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Phase II: Domain-specific Network Clustering (The General Case)

Different domains can have different set of nodes thus different sizes.

Different domains can have different number of clusters.

Indirect regularization:

Example: if nodes 1 and 3 have similar cluster

assignments in D , their cluster assignments

in the underlying clustering structure shared by

{ D , E , F } should be similar as well.

Define two mapping matrices , such that the same

rows of and represent the same instances.

Problem Formulation

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

ji nnij~

)( }1,0{

O ii nnij }1,0{)(

D)()( jij

VO)()( iij

UD

2)(

*

)(

*

)(

*

)(

* )ˆ(ˆ)ˆ(ˆ Tij

y

ij

x

Tij

y

ij

xijh vvuu Minimize

Page 17: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Phase II: Domain-specific Network Clustering (The General Case)

Optimization problem

Where

Learning algorithm: an alternating minimization approach. U(i) and V(j) are

alternately solved by multiplicative updating rules with convergence guarantee.

Problem Formulation

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

g

i

k

j

Rij

g

i

AD

kj

giJhaJJ

j

i

1 11),...,1(,0

),...,1(,0)(

)(min

V

U

2)()()( )(

F

Tiii

AJ UUA

2)()()()()()()()(

1,

2)(

*

)(

*

)(

*

)(

* ))(())(())ˆ(ˆ)ˆ(ˆ(F

TjijjijTiijiijn

yx

ij

y

ij

x

Tij

y

ij

xR

i

J VOVOUDUDvvuu

Domain-specific network clustering

Main cluster guided regularization

Page 18: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Simulation Study

Synthetic data generation

Experimental Results

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

Main Network An underlying

Clustering Structure

A(1)

A(2)

A(3)

Page 19: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Simulation Study

Accuracy of different methods on synthetic datasets

In view dataset, all A(i) have the same size. In dom dataset, different A(i) have

different sizes.

CTSC, PairCRSC, CentCRSC are multi-view graph clustering methods. TF is the

tensor factorization. CGC is a multi-domain graph clustering method.

Experimental Results

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

Page 20: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Scalability Evaluation on Synthetic Dataset

Experimental Results

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

Running time evaluation

(a) Varying network size (a) Varying number of networks

NoNClus

NoNClus

Page 21: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Functional Module Detection in Tissue-specific Gene Co-Expression Networks

Experimental Results

Tissue-specific Network # nodes # edges

Blood 633 2,573

Lymph node 648 2,256

Tonsil 682 2,480

Thymus 786 2,939

Brain 746 3,135

Caudate nucleus 640 2,578

Hypothalamus 641 2,500

Cerebellum 679 2,636

Total 5,455 21,097

Tissue-specific gene co-expression networks1

*5372 samples for 128 different tissues in four different cell

types, i.e., normal, disease, neoplasm and cell line. We

select 8 tissues to construct gene co-expression networks.

Tissue-tissue similarity network

(the main network in NoN)

1. Margus Lukk, et al. A global map of human gene expression. Nature

biotechnology 28.4 (2010): 322-324.

Page 22: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Functional Module Detection in Tissue-specific Gene Co-Expression Networks

Evaluation method: standard Gene Set Enrichment Analysis (GSEA).

The most significant Gene Ontology (GO) term in the biological process category

is assigned to each identified gene cluster.

The significance is assessed by Hypergeometric distribution.

Raw p-values are adjusted for multiple testing problem by False Discovery Rate

(FDR).

Experimental Results

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

Page 23: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Functional Module Detection in Tissue-specific Gene Co-Expression Networks

Experimental Results

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

Number of detected significant clusters with various input number of clusters

NoNClus

Page 24: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Functional Module Detection in Tissue-specific Gene Co-Expression Networks

Experimental Results

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

Comparison of FDR adjusted p-values of detected clusters

NoNClus

Page 25: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Functional Module Detection in Tissue-specific Gene Co-Expression Networks

Experimental Results

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

Method # significant clusters p-values

SNMF 116 4.64e-5

Spectral clustering 119 6.66e-3

Markov clustering 70 6.45e-17

ClusterOne 89 1.43e-10

NoNClus’ 121 4.87e-2

NoNClus 130 1

Comparison of number of detected significant clusters

Page 26: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

A novel multi-network clustering problem

Multi-network with multi-underlying clustering structures

A new clustering framework based on new network model

NoNClus on a Network of Networks (NoN)

Comprehensive experiments

Results on both synthetic and real datasets demonstrate the

effectiveness of NoNClus

Conclusion

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.

Page 27: Flexible and Robust Multi-Network ClusteringFlexible and Robust Multi-Network Clustering Jingchao 3Ni1, Hanghang Tong2, Wei Fan , Xiang Zhang1 1Department of Electrical Engineering

Thank you!

Questions?

Jingchao Ni, Hanghang Tong, Wei Fan, Xiang Zhang.

Flexible and Robust Multi-Network Clustering. In KDD, 2015.