
Post on 05-Jun-2020


Ch4 Network topology

Part 1 Network Topology Based on Graph Theory

Zhuo Wang

zhuowang@sjtu.edu.cn

Outline

• Basic network topology analysis

• Advanced network topology analysis

• Comparison with random networks

Basic Network Measures

• Degree ki

• Degree distribution P(k)

• Mean path length

• Network Diameter

• Clustering Coefficient

Paths: metabolic, signaling pathways

Cliques: protein complexes

Hubs: regulatory modules

Subgraphs: maximally weighted

Network Analysis

Graphs

• Graph G=(V,E) is a set of vertices V and edges E

• A subgraph G′ of G is formed by some V′ ⊆ V and E′ ⊆ E, where every edge in E′ joins vertices of V′

• Graph properties:

– Connectivity (node degree, paths)

– Cyclic vs. acyclic

– Directed vs. undirected

Sparse vs Dense

• G(V, E), where |V| = n and |E| = m are the numbers of vertices and edges

• A graph is sparse if m ~ n

• A graph is dense if m ~ n²

• A graph is complete when every pair of vertices is connected, i.e. m = n(n − 1)/2 (of order n²)
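A minimal Python sketch of the sparse/dense distinction, assuming a simple undirected graph given as an edge list: it computes the density m / (n(n − 1)/2), i.e. the fraction of possible edges that are present. The function name `edge_density` is illustrative, not taken from the slides.

```python
def edge_density(n, edges):
    """Fraction of the n*(n-1)/2 possible undirected edges that are present."""
    # Count each undirected edge once, whichever way round it is listed;
    # self-loops are ignored because a simple graph cannot contain them.
    unique = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    max_edges = n * (n - 1) // 2
    return len(unique) / max_edges if max_edges else 0.0


# A 5-node path (m = n - 1, sparse) versus the complete graph on 5 nodes.
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
complete = [(i, j) for i in range(5) for j in range(i + 1, 5)]
print(edge_density(5, path))      # 0.4
print(edge_density(5, complete))  # 1.0
```

For a sparse graph (m ~ n) the density shrinks roughly like 2/n as the graph grows, while for a dense graph it stays bounded away from zero.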

Connected Components

• G(V,E)

• |V| = 69

• |E| = 71

• 6 connected components
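The number of connected components can be counted with breadth-first search (BFS), a standard technique the slides do not name. The sketch below assumes an undirected graph given as a node list plus an edge list; the toy graph is hypothetical, not the 69-node network above.

```python
from collections import deque


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        # BFS collects every node reachable from `start`.
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        components.append(comp)
    return components


# Hypothetical toy graph: a triangle, a single edge, and an isolated node.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")]
print(len(connected_components(nodes, edges)))  # 3
```

An isolated node counts as a component of its own, which is why the toy graph has three.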

Paths

A path is a sequence {x1, x2,…, xn} such that (x1,x2), (x2,x3), …, (xn-1,xn) are edges of the graph.

A closed path xn=x1 on a graph is called a graph cycle or circuit.

Shortest-Path between nodes


Longest Shortest-Path
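In an unweighted graph, BFS from every node gives all shortest-path lengths, from which the mean path length and the diameter (the longest shortest path) follow directly. This is a sketch assuming an undirected, unweighted graph stored as a dict of neighbour sets, with pairs in different components skipped. The example is the five-node graph with edges A–B, A–C, B–C, B–D, B–E, which is the graph of the centrality examples later in this chapter.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distance from `source` to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def path_length_stats(adj):
    """Mean shortest-path length and diameter over all connected node pairs."""
    dist = {v: bfs_distances(adj, v) for v in adj}
    lengths = [dist[u][v] for u, v in combinations(adj, 2) if v in dist[u]]
    return sum(lengths) / len(lengths), max(lengths)


adj = {"A": {"B", "C"}, "B": {"A", "C", "D", "E"}, "C": {"A", "B"},
       "D": {"B"}, "E": {"B"}}
print(path_length_stats(adj))  # (1.5, 2)
```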

Network Measures: Degree

P(k) is the probability of each degree k, i.e. the fraction of nodes having that degree.

For random networks, P(k) is bell-shaped around the mean degree (binomial, approximately Poisson for large networks).

For real networks the distribution is often a power law:

$$P(k) \sim k^{-\gamma}$$

Such networks are said to be scale-free.
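P(k) can be tabulated directly from the node degrees. Below is a minimal sketch assuming a graph stored as a dict of neighbour sets. On a real network one would then plot P(k) on log-log axes. There a power law P(k) ~ k^(−γ) appears as a straight line of slope −γ.

```python
from collections import Counter


def degree_distribution(adj):
    """Return {k: P(k)}, the fraction of nodes that have degree k."""
    n = len(adj)
    counts = Counter(len(neighbours) for neighbours in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}


adj = {"A": {"B", "C"}, "B": {"A", "C", "D", "E"}, "C": {"A", "B"},
       "D": {"B"}, "E": {"B"}}
print(degree_distribution(adj))  # {1: 0.4, 2: 0.4, 4: 0.2}
```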

Degree Distribution

Hierarchical Networks

Detecting Hierarchical Organization

[Figure: log-log plot of P(k) versus degree k, with fitted power law y = 1.2x^(−1.91)]

Knock-out Lethality and Connectivity

[Figure: percentage of essential genes versus degree k]

Target the hubs to run an efficient safe-sex education campaign.

Lewin Bo, et al., Sex i Sverige; Om sexuallivet i Sverige 1996, Folkhälsoinstitutet, 1998

Scale-Free Networks Are Robust

• Complex systems (cells, the Internet, social networks) are resilient to component failure

• Network topology plays an important role in this robustness

– Even if ~80% of randomly selected nodes fail, the remaining ~20% still maintain network connectivity

• They are vulnerable to attack if hubs are selectively targeted

• In yeast, only ~20% of proteins are essential (lethal when deleted), and these are 5 times more likely to have degree k > 15 than k < 5.
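The contrast between random failure and targeted attack can be reproduced on a toy graph by deleting nodes and measuring the largest connected component that remains. This sketch makes several assumptions. The graph is a dict of neighbour sets. "Failure" removes uniformly random nodes, while "attack" removes the highest-degree nodes first. The star graph is a hypothetical stand-in for a hub-dominated network, not a model of any cellular network.

```python
import random
from collections import deque


def largest_component_fraction(adj, removed):
    """Share of the original nodes lying in the largest connected component
    once the nodes in `removed` (and their edges) are deleted."""
    alive = set(adj) - set(removed)
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        # BFS restricted to surviving nodes measures one component.
        size, queue = 0, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w in alive and w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best / len(adj)


def random_failure(adj, fraction, rng):
    """Remove a uniformly random `fraction` of the nodes."""
    k = int(fraction * len(adj))
    return largest_component_fraction(adj, rng.sample(sorted(adj), k))


def hub_attack(adj, fraction):
    """Remove the `fraction` of nodes with the highest degree."""
    k = int(fraction * len(adj))
    hubs = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]
    return largest_component_fraction(adj, hubs)


# Toy hub-dominated graph (hypothetical): node 0 linked to 10 leaves.
star = {0: set(range(1, 11)), **{i: {0} for i in range(1, 11)}}
print(hub_attack(star, 0.1))                        # 0.0909... (= 1/11): the graph shatters
print(random_failure(star, 0.1, random.Random(0)))  # 10/11 unless the hub happens to be drawn
```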

Degree (hubs)-attack

Other Interesting Features

• Cellular networks are disassortative: hubs tend not to interact directly with other hubs.

• Hubs tend to be “older” proteins (so far claimed for protein-protein interaction networks only)

• Hubs also seem to have more evolutionary pressure—their protein sequences are more conserved than average between species (shown in yeast vs. worm)

• Experimentally determined protein complexes tend to contain solely essential or non-essential proteins—further evidence for modularity.
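Whether hubs avoid each other can be quantified with Newman's degree assortativity coefficient r. It is the Pearson correlation between the degrees at the two ends of each edge. A value r < 0 corresponds to hubs tending not to interact with other hubs. This measure is our illustrative addition. The sketch assumes an undirected graph stored as a dict of neighbour sets.

```python
def degree_assortativity(adj):
    """Pearson correlation between the degrees at the two ends of every edge.

    Each undirected edge is counted in both directions, so r lies in [-1, 1];
    r < 0 means hubs tend to attach to low-degree nodes (disassortative).
    Undefined (zero variance) when every edge joins nodes of equal degree."""
    xs, ys = [], []
    for u, neighbours in adj.items():
        for v in neighbours:
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / m
    var_x = sum((x - mean_x) ** 2 for x in xs) / m
    var_y = sum((y - mean_y) ** 2 for y in ys) / m
    return cov / (var_x * var_y) ** 0.5


# A star, where the single hub only touches leaves, is perfectly disassortative.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(degree_assortativity(star))  # -1.0
```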

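Clustering Coefficient

$$C_I = \frac{2 n_I}{k_I (k_I - 1)}$$

k_I: number of neighbors of node I

n_I: number of edges between node I's neighbors

The clustering coefficient measures the density of the network surrounding node I, characterized as the number of triangles through I relative to the maximum possible number, k_I(k_I − 1)/2.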

Related to network modularity

The center node has 8 (grey) neighbors

There are 4 edges between the neighbors

C = 2*4 /(8*(8-1)) = 8/56 = 1/7
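The formula can be checked against the worked example. This is a minimal sketch assuming a graph stored as a dict of neighbour sets. The particular neighbour pairs joined below are hypothetical, since only their count (4) matters.

```python
from itertools import combinations


def clustering_coefficient(adj, node):
    """C_I = 2 n_I / (k_I (k_I - 1)), where n_I counts edges among I's neighbours."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # undefined for k < 2; reported as 0 by convention here
    n_links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * n_links / (k * (k - 1))


# Rebuild the example: a centre with 8 neighbours and 4 edges among them.
adj = {"centre": set(range(8)), **{i: {"centre"} for i in range(8)}}
for u, v in [(0, 1), (2, 3), (4, 5), (6, 7)]:
    adj[u].add(v)
    adj[v].add(u)
print(clustering_coefficient(adj, "centre"))  # 0.14285714285714285 (= 1/7)
```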

Small-world Network

• Every node can be reached from every other by a small number of hops or steps

• High clustering coefficient and low mean shortest-path length

– Random graphs don’t necessarily have high clustering coefficients

• Social networks, the Internet, and biological networks all exhibit small-world network characteristics

Social network

Small-world network

Summary: Network Measures

• Degree k_i

The number of edges involving node i

• Degree distribution P(k)

The probability (frequency) of nodes of degree k

• Mean path length

The average shortest path between all node pairs

• Network diameter

The longest shortest path

• Clustering coefficient

A high CC is found for modules

Assignment

1. List and explain the basic topological parameters of a network.

2. List the organisms that have genome-scale curated metabolic networks, and give the source (database or literature).

3. Get the regulatory network information of E. coli and yeast from the RegulonDB and Yeastract databases, respectively.

(Some of these networks may be used in your team project.)

Advanced Network Topology

Types of graphs

• Simple graphs

• Weighted graphs

• Multigraphs

• Directed graphs

Simple Graphs

Simple graphs are graphs without multiple edges or self-loops. They are weighted graphs with all edge weights equal to one.

[Figure: a simple graph on nodes A–E]

Weighted graph

A weighted graph is a graph in which each edge has an associated weight, usually given by a weight function w: E → ℝ, generally positive.

[Figure: a weighted graph on nodes A–E and its weighted adjacency matrix]

Adjacency Matrix of Weighted graphs

Degree of Weighted graphs

▪ The sum of the weights of all edges incident to the corresponding node

▪ The sum of the corresponding row or column of the adjacency matrix

[Figure: the weighted adjacency matrix of nodes A–E, with the weighted degree of each node]
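The weighted degree is just a row sum of the weight matrix. This is a minimal sketch. The three-node matrix and its weights are hypothetical.

```python
def weighted_degrees(labels, W):
    """Weighted degree of each node: the sum of its row in the weight matrix W."""
    # For an undirected weighted graph W is symmetric, so column sums agree.
    return {label: sum(row) for label, row in zip(labels, W)}


# Hypothetical 3-node weighted graph: X-Y with weight 0.5, Y-Z with weight 2.0.
labels = ["X", "Y", "Z"]
W = [
    [0.0, 0.5, 0.0],
    [0.5, 0.0, 2.0],
    [0.0, 2.0, 0.0],
]
print(weighted_degrees(labels, W))  # {'X': 0.5, 'Y': 2.5, 'Z': 2.0}
```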

Multigraph

▪ A multigraph is a graph that is permitted to have multiple edges. It is an ordered pair G := (V, E) with

▪ V a set of nodes

▪ E a multiset of unordered pairs of vertices.

Adjacency Matrix of Multigraphs

     A  B  C  D  E
A    2  1  0  0  0
B    1  0  3  0  4
C    0  3  0  1  1
D    0  0  1  0  2
E    0  4  1  2  0
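In a multigraph the adjacency entry counts parallel edges instead of being 0/1. The sketch below builds such a matrix from an edge multiset. Two things are assumptions: the edge list is hypothetical, and the code counts a self-loop once on the diagonal, which is one of two conventions in use.

```python
def multigraph_adjacency(labels, edges):
    """A[i][j] = number of parallel edges between nodes i and j (undirected).

    A self-loop adds 1 to the diagonal here; some texts add 2 instead."""
    index = {v: i for i, v in enumerate(labels)}
    A = [[0] * len(labels) for _ in labels]
    for u, v in edges:
        i, j = index[u], index[v]
        A[i][j] += 1
        if i != j:
            A[j][i] += 1  # keep the matrix symmetric
    return A


# Hypothetical multiset of edges: three parallel B-C edges and one A-B edge.
edges = [("A", "B"), ("B", "C"), ("C", "B"), ("B", "C")]
for row in multigraph_adjacency(["A", "B", "C"], edges):
    print(row)
# [0, 1, 0]
# [1, 0, 3]
# [0, 3, 0]
```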

Directed Graph (digraph)

• Edges have directions

– The adjacency matrix is not symmetric

     A  B  C  D  E
A    0  1  0  0  0
B    0  1  0  0  2
C    0  1  0  0  1
D    0  0  1  0  1
E    0  0  0  1  0
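Because the matrix is not symmetric, a directed graph has two degrees per node. Out-degree is a row sum and in-degree is a column sum. The sketch assumes the convention that A[i][j] counts edges from i to j. The slides do not state their convention, and the example digraph is hypothetical.

```python
def in_out_degrees(labels, A):
    """Out-degree = row sum, in-degree = column sum of a directed adjacency matrix.

    Assumes the convention A[i][j] = number of edges from node i to node j."""
    n = len(labels)
    out_deg = {labels[i]: sum(A[i]) for i in range(n)}
    in_deg = {labels[j]: sum(A[i][j] for i in range(n)) for j in range(n)}
    return out_deg, in_deg


# Hypothetical digraph: X -> Y, Y -> Z, Z -> X and X -> Z.
labels = ["X", "Y", "Z"]
A = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
out_deg, in_deg = in_out_degrees(labels, A)
print(out_deg)  # {'X': 2, 'Y': 1, 'Z': 1}
print(in_deg)   # {'X': 1, 'Y': 1, 'Z': 2}
```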

Local metrics

◼ Local metrics provide a measurement of a structural property of a single node

◼ Designed to characterise:

– Functional role: what part does this node play in system dynamics?

– Structural importance: how important is this node to the structural characteristics of the system?

Degree Centrality

Degree centrality of a node is the number of edges attached to the node. To obtain the standardized score, divide each score by n − 1 (n = the number of nodes).

[Figure: example graph on nodes A–E with edges A–B, A–C, B–C, B–D, B–E]

     A  B  C  D  E   degree   degree centrality
A    0  1  1  0  0   2        0.5
B    1  0  1  1  1   4        1
C    1  1  0  0  0   2        0.5
D    0  1  0  0  0   1        0.25
E    0  1  0  0  0   1        0.25
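Standardized degree centrality is each row sum of the adjacency matrix divided by n − 1. The sketch below reproduces the example for the five-node graph with edges A–B, A–C, B–C, B–D, B–E. It assumes the matrix is given as a list of rows.

```python
def degree_centrality(labels, A):
    """Degree of each node divided by n - 1 (the standardized score)."""
    n = len(labels)
    return {labels[i]: sum(A[i]) / (n - 1) for i in range(n)}


labels = ["A", "B", "C", "D", "E"]
A = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
print(degree_centrality(labels, A))
# {'A': 0.5, 'B': 1.0, 'C': 0.5, 'D': 0.25, 'E': 0.25}
```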

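Betweenness centrality

◼ The number of shortest paths in the graph that pass through the node divided by the total number of shortest paths:

$$BC(k) = \sum_{i}\sum_{j} \frac{\rho(i,k,j)}{\rho(i,j)}, \qquad i \neq j \neq k$$

where ρ(i, j) is the number of shortest paths from node i to node j, and ρ(i, k, j) is the number of these paths that pass through node k.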

Betweenness centrality


◼ Shortest paths are:

◼ AB, AC, ABD, ABE, BC, BD, BE, CBD, CBE, DBE

◼ 5 of these 10 shortest paths (ABD, ABE, CBD, CBE, DBE) pass through B, so B has a BC of 5/10 = 0.5

[Figure: example graph on nodes A–E]

$$\rho(A,B,D) = 1,\ \rho(A,D) = 1;\quad \rho(A,B,E) = 1,\ \rho(A,E) = 1;\quad \rho(C,B,D) = 1,\ \rho(C,D) = 1;\quad \rho(C,B,E) = 1,\ \rho(C,E) = 1;\quad \rho(D,B,E) = 1,\ \rho(D,E) = 1$$

Betweenness centrality

◼ Nodes with a high betweenness centrality are interesting because they

– control information flow in a network

– may be required to carry more information

◼ Therefore, such nodes

– may be the subject of targeted attack
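Below is a brute-force sketch of the betweenness formula for small unweighted graphs. BFS from every node gives distances and shortest-path counts ρ. Node k is then credited ρ(i,k)ρ(k,j)/ρ(i,j) for each pair (i, j) it lies between. This is a straightforward illustration, not the faster Brandes algorithm. The graph is the five-node example with edges A–B, A–C, B–C, B–D, B–E, stored as a dict of neighbour sets.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, s):
    """Hop distances from s and the number of shortest paths from s to each node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:  # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """BC(k) = sum over unordered pairs {i, j} not containing k of rho(i,k,j) / rho(i,j)."""
    info = {v: bfs_counts(adj, v) for v in adj}
    bc = {v: 0.0 for v in adj}
    for i, j in combinations(adj, 2):
        dist_i, sigma_i = info[i]
        if j not in dist_i:
            continue  # i and j lie in different components
        for k in adj:
            if k in (i, j) or k not in dist_i:
                continue
            dist_k, sigma_k = info[k]
            # k is on a shortest i-j path exactly when d(i,k) + d(k,j) = d(i,j);
            # the number of such paths is rho(i,k) * rho(k,j).
            if dist_i[k] + dist_k[j] == dist_i[j]:
                bc[k] += sigma_i[k] * sigma_k[j] / sigma_i[j]
    return bc


adj = {"A": {"B", "C"}, "B": {"A", "C", "D", "E"}, "C": {"A", "B"},
       "D": {"B"}, "E": {"B"}}
bc = betweenness(adj)
print(bc)            # {'A': 0.0, 'B': 5.0, 'C': 0.0, 'D': 0.0, 'E': 0.0}
print(bc["B"] / 10)  # 0.5
```

Here every pair has exactly one shortest path. Dividing BC(B) = 5 by the 10 shortest paths therefore reproduces the 0.5 of the example. Other normalisations, such as dividing by the number of pairs not involving k, are also in use.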

Closeness centrality

$$CC(i) = \frac{n-1}{\sum_{j} d(i,j)}$$

◼ The normalised inverse of the sum of topological distances d(i, j) in the graph (n = number of nodes).

[Figure: example graph on nodes A–E]

     A  B  C  D  E   Σ_j d(i,j)
A    0  1  1  2  2   6
B    1  0  1  1  1   4
C    1  1  0  2  2   6
D    2  1  2  0  2   7
E    2  1  2  2  0   7

Closeness centrality

[Figure: example graph on nodes A–E]

Node   Closeness
A      0.67
B      1.00
C      0.67
D      0.57
E      0.57

◼ Node B is the most central one in spreading information from it to the other nodes in the network.
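Closeness follows from one BFS per node. This sketch assumes a connected, unweighted graph stored as a dict of neighbour sets. On a disconnected graph the distance sums would silently skip unreachable nodes. It reproduces the closeness values of the five-node example.

```python
from collections import deque


def closeness(adj):
    """CC(i) = (n - 1) / sum_j d(i, j); assumes a connected, unweighted graph."""
    n = len(adj)
    scores = {}
    for source in adj:
        dist, queue = {source: 0}, deque([source])
        while queue:  # BFS gives hop distances from `source`
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        scores[source] = (n - 1) / sum(dist.values())
    return scores


adj = {"A": {"B", "C"}, "B": {"A", "C", "D", "E"}, "C": {"A", "B"},
       "D": {"B"}, "E": {"B"}}
print({v: round(c, 2) for v, c in closeness(adj).items()})
# {'A': 0.67, 'B': 1.0, 'C': 0.67, 'D': 0.57, 'E': 0.57}
```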


Local metrics

▪ Node B is the most central one according to the degree, betweenness and closeness centralities.

Which is the most central node?

[Figure: a graph with two candidate central nodes, A and B]

The winner is…

▪A is the most central according to the degree

▪B is the most central according to closeness and betweenness

Degree: Difficulties

Extending the Concept of Degree

Make x_i proportional to the sum of the centralities of node i's network neighbors:

$$x_i = \frac{1}{\lambda}\sum_{j=1}^{n} A_{ij}\,x_j$$

where λ is a constant. In matrix-vector notation we can write

$$x = \frac{1}{\lambda} A x, \quad \text{i.e.} \quad A x = \lambda x$$

In order to make the centralities non-negative, we select the eigenvector corresponding to the principal eigenvalue (Perron-Frobenius theorem).

Eigenvalues and Eigenvectors

◼ The value λ is an eigenvalue of matrix A if there exists a non-zero vector x such that Ax = λx. Vector x is then an eigenvector of matrix A.

◼ The largest eigenvalue is called the principal eigenvalue

◼ The corresponding eigenvector is the principal eigenvector

◼ Corresponds to the direction of maximum change

Eigenvector Centrality

◼ The corresponding entry of the principal eigenvector of the adjacency matrix of the network.

◼ It assigns relative scores to all nodes in the network based on the principle that connections to high-scoring nodes contribute more.

Node   EC
1      0.500
2      0.238
3      0.238
4      0.575
5      0.354
6      0.354
7      0.168
8      0.168
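The principal eigenvector can be approximated by power iteration, that is, by repeatedly multiplying by the matrix and renormalising. This is a sketch assuming a connected undirected graph given as a symmetric 0/1 matrix. Iterating with A + I instead of A is our choice. It has the same eigenvectors as A but avoids the oscillation that plain power iteration shows on bipartite graphs such as the star used below.

```python
def eigenvector_centrality(A, iters=1000, tol=1e-12):
    """Unit-length principal eigenvector of a symmetric adjacency matrix A.

    Power iteration on A + I: it has the same eigenvectors as A, and the shift
    prevents the sign oscillation plain iteration shows on bipartite graphs."""
    n = len(A)
    x = [1.0] * n  # positive start, so it overlaps the principal eigenvector
    for _ in range(iters):
        y = [x[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        y = [v / norm for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


# Star with hub 0 and leaves 1-3.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print([round(v, 3) for v in eigenvector_centrality(star)])  # [0.707, 0.408, 0.408, 0.408]
```

For this star the principal eigenvalue is √3. The unit-length eigenvector gives the hub 1/√2 ≈ 0.707 and each leaf 1/√6 ≈ 0.408.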


[Figure panels: a path of length 6, a walk of length 8, and the shortest path]

Communicability

DEFINITION (Communicability):

$$G_{pq} = b_s P_{pq}^{(s)} + \sum_{k>s} c_k W_{pq}^{(k)}$$

Let P_pq^(s) be the number of shortest paths of length s between p and q.

Let W_pq^(k) be the number of walks of length k > s between p and q.

b_s and c_k must be selected such that the communicability converges.

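Communicability

▪ By selecting b_s = 1/s! and c_k = 1/k!, we obtain

$$G_{pq} = \sum_{k=0}^{\infty} \frac{\left(A^{k}\right)_{pq}}{k!} = \left(e^{A}\right)_{pq} = \sum_{j=1}^{n} \varphi_j(p)\,\varphi_j(q)\,e^{\lambda_j}$$

where e^A is the exponential of the adjacency matrix, λ_1 ≥ λ_2 ≥ … ≥ λ_n are the eigenvalues of A, and φ_j(p) is the p-th entry of the orthonormal eigenvector associated with λ_j.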

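▪ For simple graphs we have

$$G_{pq} = \varphi_1(p)\varphi_1(q)e^{\lambda_1} + \underbrace{\sum_{j=2}^{n}\varphi_j^{+}(p)\varphi_j^{+}(q)e^{\lambda_j} + \sum_{j=2}^{n}\varphi_j^{-}(p)\varphi_j^{-}(q)e^{\lambda_j}}_{\text{intracluster}} + \underbrace{\sum_{j=2}^{n}\varphi_j^{+}(p)\varphi_j^{-}(q)e^{\lambda_j} + \sum_{j=2}^{n}\varphi_j^{-}(p)\varphi_j^{+}(q)e^{\lambda_j}}_{\text{intercluster}}$$

where φ_j(p) is the p-th entry of the orthonormal eigenvector of A for eigenvalue λ_j, and the superscripts + and − mark positive and negative entries. Terms in which p and q have entries of the same sign are intracluster; terms with opposite signs are intercluster.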
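Below is a minimal sketch that evaluates G_pq = (e^A)_pq by summing the series Σ_k A^k/k! directly. It assumes a small graph given as a list-of-rows adjacency matrix. It also truncates the series after a fixed number of terms, which is adequate only while A^k/k! becomes negligible. For real networks a library routine such as `scipy.linalg.expm` would be the usual choice.

```python
import math


def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def communicability(A, terms=40):
    """G = e^A = sum_{k>=0} A^k / k!, truncated after `terms` terms.

    (A^k)[p][q] counts walks of length k from p to q, so each walk of
    length k contributes 1/k! to G[p][q]."""
    n = len(A)
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0 = I
    G = [row[:] for row in power]
    for k in range(1, terms):
        power = matmul(power, A)
        weight = 1.0 / math.factorial(k)
        G = [[G[i][j] + weight * power[i][j] for j in range(n)] for i in range(n)]
    return G


# Single edge p - q: e^A = [[cosh 1, sinh 1], [sinh 1, cosh 1]].
G = communicability([[0, 1], [1, 0]])
print(round(G[0][0], 6), round(G[0][1], 6))  # 1.543081 1.175201
```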

Communicability

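$$\Delta G_{pq} = \sum_{j=2}^{n} \varphi_j(p)\,\varphi_j(q)\,e^{\lambda_j} = \left[\sum_{j=2}^{n} \varphi_j(p)\,\varphi_j(q)\,e^{\lambda_j}\right]_{\text{intracluster}} + \left[\sum_{j=2}^{n} \varphi_j(p)\,\varphi_j(q)\,e^{\lambda_j}\right]_{\text{intercluster}}$$

ΔG_pq is the communicability G_pq with the principal term φ_1(p)φ_1(q)e^{λ_1} removed; intracluster terms are positive and intercluster terms are negative.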

Communicability & Communities

▪ A community is a group of nodes for which the intra-cluster communicability is larger than the inter-cluster one:

$$\left[\sum_{j=2}^{n} \varphi_j(p)\,\varphi_j(q)\,e^{\lambda_j}\right]_{\text{intracluster}} > \left|\left[\sum_{j=2}^{n} \varphi_j(p)\,\varphi_j(q)\,e^{\lambda_j}\right]_{\text{intercluster}}\right|$$

▪ These nodes communicate better among themselves than with nodes outside the community.

▪ Let

$$\Theta(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$$

▪ The communicability graph Θ(G) is the graph whose adjacency matrix Θ(ΔG) results from the elementwise application of the function Θ(x) to the matrix ΔG.

Communicability Graph

[Figure: a graph G and its communicability graph Θ(G)]

$$\mathbf{A}(\Theta(G)) = \Theta(\Delta G), \qquad \Theta(\Delta G_{pq}) \in \{0, 1\}$$

Communicability Graph

▪ A community is defined as a clique in the communicability graph.

▪ Identifying communities is reduced to the “all cliques problem” in the communicability graph.
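The community recipe can be sketched end to end for a small graph in four steps. Compute G = e^A. Subtract the principal term φ_1(p)φ_1(q)e^{λ_1} to get ΔG. Keep the pairs with ΔG_pq > 0 as edges of the communicability graph. Finally, list its maximal cliques. The sketch rests on several assumptions. The graph is a connected undirected graph given as a 0/1 matrix. e^A comes from a truncated Taylor series and (λ_1, φ_1) from power iteration. Maximal cliques are found with the basic Bron–Kerbosch algorithm, which is our choice, since the slides only name the “all cliques problem”. The two-triangle test graph is hypothetical.

```python
import math
from itertools import combinations


def expm_series(A, terms=40):
    """e^A via its Taylor series sum_k A^k / k! (adequate for small graphs only)."""
    n = len(A)
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0 = I
    G = [row[:] for row in power]
    for k in range(1, terms):
        power = [[sum(power[i][m] * A[m][j] for m in range(n)) for j in range(n)]
                 for i in range(n)]
        fact = math.factorial(k)
        G = [[G[i][j] + power[i][j] / fact for j in range(n)] for i in range(n)]
    return G


def principal_pair(A, iters=2000):
    """(lambda_1, phi_1) by power iteration on A + I; assumes a connected graph."""
    n = len(A)
    x = [1.0] * n
    for _ in range(iters):
        y = [x[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    lam = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))  # Rayleigh quotient
    return lam, x


def communities(A):
    """Maximal cliques of the communicability graph Theta(Delta G)."""
    n = len(A)
    G = expm_series(A)
    lam1, phi1 = principal_pair(A)
    # Delta G_pq = G_pq - phi_1(p) phi_1(q) e^{lambda_1}; Theta keeps pairs with Delta G > 0.
    theta = {p: set() for p in range(n)}
    for p, q in combinations(range(n), 2):
        if G[p][q] - phi1[p] * phi1[q] * math.exp(lam1) > 0:
            theta[p].add(q)
            theta[q].add(p)

    cliques = []

    def bron_kerbosch(r, candidates, excluded):
        # Basic Bron-Kerbosch enumeration of maximal cliques (no pivoting).
        if not candidates and not excluded:
            cliques.append(r)
            return
        for v in list(candidates):
            bron_kerbosch(r | {v}, candidates & theta[v], excluded & theta[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    bron_kerbosch(set(), set(range(n)), set())
    return cliques


# Hypothetical test graph: two triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1
print(sorted(sorted(c) for c in communities(A)))  # [[0, 1, 2], [3, 4, 5]]
```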

Communicability Graph

[Figure: social (friendship) network]

Communities: Example


[Figure panels: the network, its communicability graph, and the communities]

[Figure panels: communities in social networks and in metabolic networks]

How to construct a proper random network?

Randomization of a network

[Figure: a given complex network and its randomized counterpart]

Stub reconnection algorithm

• Break every edge into two halves (“stubs”)

• Randomly reconnect stubs

• Watch for multiple edges!

• For example, in the AS-Internet the two largest hubs would end up being connected by 50 edges

• Cannot be adapted to conserve other low-level topological properties of the network
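A minimal sketch of stub reconnection. Every node contributes one stub per unit of degree, the stubs are shuffled, and consecutive stubs are paired into edges. It assumes the degree sequence is given as a dict. The helper `count_defects` and the hub-heavy degree sequence are hypothetical illustrations of the multiple-edge problem.

```python
import random
from collections import Counter


def stub_reconnection(degrees, rng):
    """Randomise a network by cutting every edge into two stubs and re-pairing them.

    degrees: {node: k}. Every node keeps its degree, but self-loops and
    multiple edges can appear."""
    stubs = [v for v, k in degrees.items() for _ in range(k)]
    if len(stubs) % 2:
        raise ValueError("the degrees must sum to an even number")
    rng.shuffle(stubs)
    return list(zip(stubs[::2], stubs[1::2]))


def count_defects(edges):
    """Return (self-loops, surplus parallel edges) in an edge list."""
    loops = sum(1 for u, v in edges if u == v)
    counts = Counter(frozenset(e) for e in edges if e[0] != e[1])
    return loops, sum(c - 1 for c in counts.values())


# Hypothetical degree sequence with two big hubs.
degrees = {"h1": 6, "h2": 6, "a": 2, "b": 2, "c": 2, "d": 2, "e": 2, "f": 2}
edges = stub_reconnection(degrees, random.Random(7))
print(count_defects(edges))  # (self-loops, extra parallel edges); often non-zero here
```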

Local rewiring algorithm

• Randomly select and rewire two edges

• Repeat many times

• R. Kannan, P. Tetali, and S. Vempala, Random Structures and Algorithms (1999)

• SM, K. Sneppen, Science (2002)
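A minimal sketch of the local rewiring idea: repeated two-edge swaps that keep every node's degree. It assumes an undirected simple graph given as an edge list. Skipping swaps that would create self-loops or duplicate edges, and the number of swaps, are our assumptions, not details taken from the cited papers.

```python
import random
from collections import Counter


def local_rewiring(edges, n_swaps, rng):
    """Randomise an undirected simple graph while keeping every node's degree.

    Each step picks two edges (a, b) and (c, d) and rewires them to (a, d) and
    (c, b); swaps that would create a self-loop or duplicate edge are skipped."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # either pairing of the four end-points can be tried
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if a == d or c == b or new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def degrees(edge_list):
    """Degree of every node appearing in an edge list."""
    return Counter(v for e in edge_list for v in e)


ring = [(i, (i + 1) % 8) for i in range(8)]
randomised = local_rewiring(ring, n_swaps=100, rng=random.Random(42))
print(degrees(randomised) == degrees(ring))  # True
```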

Metropolis rewiring algorithm

• Randomly select two edges

• Calculate the change ΔE in the “energy function” E = (N_actual − N_desired)²/N_desired

• Rewire with probability p = exp(−ΔE/T) (always, if ΔE ≤ 0)

[Figure: a rewiring step changes the “energy” from E to E + ΔE]

SM, K. Sneppen: cond-mat preprint (2002), Physica A (2004)
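The Metropolis variant can be sketched by scoring each candidate swap with the energy E and accepting it with probability min(1, exp(−ΔE/T)). The slides do not say which quantity N counts. Purely for illustration, this sketch assumes N is the number of triangles. The target value, the temperature and the test graph are all hypothetical.

```python
import math
import random
from itertools import combinations


def triangle_count(adj):
    """Number of triangles, each counted once via its smallest node u < v < w."""
    return sum(1 for u in adj for v, w in combinations(sorted(adj[u]), 2)
               if u < v and w in adj[v])


def _move(adj, remove, add):
    """Delete the undirected edges in `remove`, then insert those in `add`."""
    for u, v in remove:
        adj[u].remove(v)
        adj[v].remove(u)
    for u, v in add:
        adj[u].add(v)
        adj[v].add(u)


def metropolis_rewiring(edges, n_desired, temperature, n_steps, rng):
    """Degree-preserving edge swaps biased towards n_desired triangles.

    Energy E = (N_actual - N_desired)^2 / N_desired, with N = triangle count
    (a hypothetical choice of target property). A swap changing E by dE is
    accepted with probability min(1, exp(-dE / T))."""
    edges = [tuple(e) for e in edges]
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    energy = (triangle_count(adj) - n_desired) ** 2 / n_desired
    for _ in range(n_steps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # try either pairing of the four end-points
        if a == d or c == b or d in adj[a] or b in adj[c]:
            continue  # swap would create a self-loop or a multiple edge
        _move(adj, [(a, b), (c, d)], [(a, d), (c, b)])
        # Full recount keeps the sketch simple; real code would update E locally.
        new_energy = (triangle_count(adj) - n_desired) ** 2 / n_desired
        delta = new_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            edges[i], edges[j] = (a, d), (c, b)
            energy = new_energy
        else:
            _move(adj, [(a, d), (c, b)], [(a, b), (c, d)])  # reject: undo the swap
    return edges, energy


ring = [(i, (i + 1) % 10) for i in range(10)] + [(0, 5), (2, 7)]
rewired, final_energy = metropolis_rewiring(ring, n_desired=3, temperature=0.5,
                                            n_steps=300, rng=random.Random(1))
print(final_energy)
```

With a low temperature almost only energy-lowering swaps survive. A high temperature approaches the unbiased local rewiring.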

Assignment

Identify the most central node according to the following criteria:

(a) the largest chance of receiving information from closest neighbors;

(b) spreading information to the rest of nodes in the network;

(c) passing information from some nodes to others.
