ch4 network topology part1 network topology...
Outline
• Basic network topology analysis
• Advanced network topology analysis
• Comparison with random network
Basic Network Measures
• Degree ki
• Degree distribution P(k)
• Mean path length
• Network Diameter
• Clustering Coefficient
Network Analysis
Paths: metabolic, signaling pathways
Cliques: protein complexes
Hubs: regulatory modules
Subgraphs: maximally weighted
Graphs
• Graph G=(V,E) is a set of vertices V and edges E
• A subgraph G' of G is induced by some V' ⊆ V and E' ⊆ E
• Graph properties:
– Connectivity (node degree, paths)
– Cyclic vs. acyclic
– Directed vs. undirected
Sparse vs Dense
• G(V, E) where |V|=n, |E|=m the number of vertices and edges
• Graph is sparse if m ~ n
• Graph is dense if m ~ n^2
• Complete graph when m = n(n-1)/2 (undirected, no self-loops)
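The sparse/dense distinction can be made concrete with a density ratio. The sketch below assumes an undirected simple graph, so the maximum number of edges is n(n-1)/2; the function names are illustrative only.

```python
def max_edges(n: int) -> int:
    """Edges in a complete undirected simple graph on n vertices."""
    return n * (n - 1) // 2


def density(n: int, m: int) -> float:
    """Fraction of possible edges that are present (0 = empty, 1 = complete)."""
    return m / max_edges(n) if n > 1 else 0.0


# A graph with |V| = 69 and |E| = 71 has m close to n, so it is sparse.
print(max_edges(69))          # 2346 possible edges
print(density(69, 71) < 0.05)  # True
```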
Connected Components
• G(V,E)
• |V| = 69
• |E| = 71
• 6 connected components
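Counting connected components can be sketched with a breadth-first search from every node not yet visited. The toy graph below is hypothetical (it is not the 69-node network from the slide) and is only meant to show the procedure.

```python
from collections import deque


def connected_components(nodes, edges):
    """Return a list of components, each a list of nodes (undirected graph)."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            u = queue.popleft()
            component.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(component)
    return components


# Hypothetical toy graph: 7 nodes, 3 edges, two isolated nodes.
toy_nodes = list(range(7))
toy_edges = [(0, 1), (1, 2), (3, 4)]
print(len(connected_components(toy_nodes, toy_edges)))  # 4
```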
Paths
A path is a sequence {x1, x2,…, xn} such that (x1,x2), (x2,x3), …, (xn-1,xn) are edges of the graph.
A closed path (xn = x1) on a graph is called a graph cycle or circuit.
Shortest-Path between nodes
Longest Shortest-Path
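Shortest paths, the mean path length, and the longest shortest path (the diameter) can all be obtained from breadth-first search in an unweighted graph. The sketch below uses a small five-node graph (edges A-B, A-C, B-C, B-D, B-E) chosen for illustration; the helper names are assumptions.

```python
from collections import deque
from itertools import combinations

# Illustrative five-node undirected graph.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("B", "E")]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def path_statistics(adj):
    """Return (mean shortest-path length, diameter) over connected node pairs."""
    lengths = []
    for u, v in combinations(sorted(adj), 2):
        dist = bfs_distances(adj, u)
        if v in dist:
            lengths.append(dist[v])
    return sum(lengths) / len(lengths), max(lengths)


adj = build_adjacency(EDGES)
mean_length, diameter = path_statistics(adj)
print(mean_length, diameter)  # 1.5 2
```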
Network Measures: Degree
P(k) is the probability of each degree k, i.e. the fraction of nodes having that degree.
For random networks, P(k) is approximately Poisson (bell-shaped around the mean degree).
For real networks the distribution is often a power law:
P(k) ~ k^(-γ)
Such networks are said to be scale-free.
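As a minimal sketch, P(k) can be tabulated directly from an edge list by counting how many nodes have each degree. The five-node edge list is illustrative, and nodes without any edge are not represented in this simple version.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {k: fraction of nodes with degree k} for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    counts = Counter(degree.values())
    return {k: counts[k] / n for k in sorted(counts)}


# Illustrative graph: degrees are A=2, B=4, C=2, D=1, E=1.
example_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("B", "E")]
print(degree_distribution(example_edges))  # {1: 0.4, 2: 0.4, 4: 0.2}
```

For a real network one would plot P(k) against k on log-log axes; an approximately straight line suggests a power law.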
Degree Distribution
Hierarchical Networks
Detecting Hierarchical Organization
[Figure: log-log plot of P(k) versus degree k, with power-law fit y = 1.2x^(-1.91)]
Knock-out Lethality and Connectivity
[Figure: % essential genes versus degree k]
Target the hubs to have an efficient safe sex education campaign
Lewin Bo, et al., Sex i Sverige; Om sexuallivet i Sverige 1996,
Folkhälsoinstitutet, 1998
Scale-Free Networks are Robust
• Complex systems (cell, internet, social networks) are resilient to component failure
• Network topology plays an important role in this robustness
– Even if ~80% of randomly chosen nodes fail, the remaining ~20% still maintain network connectivity
• Attack vulnerability if hubs are selectively targeted
• In yeast, only ~20% of proteins are lethal when deleted, and these are 5 times more likely to have degree k>15 than k<5.
Degree (hubs)-attack
Other Interesting Features
• Cellular networks are disassortative: hubs tend not to interact directly with other hubs.
• Hubs tend to be “older” proteins (so far claimed for protein-protein interaction networks only)
• Hubs also seem to have more evolutionary pressure—their protein sequences are more conserved than average between species (shown in yeast vs. worm)
• Experimentally determined protein complexes tend to contain solely essential or non-essential proteins—further evidence for modularity.
Clustering Coefficient
$C_I = \frac{2 n_I}{k (k - 1)}$
k: number of neighbors of node I
n_I: number of edges between node I's neighbors
The density of the network
surrounding node I, characterized as
the number of triangles through I.
Related to network modularity
The center node has 8 (grey) neighbors
There are 4 edges between the neighbors
C = 2*4 /(8*(8-1)) = 8/56 = 1/7
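The worked example can be checked with a short sketch. The graph below is an assumed reconstruction of the figure: a center node "I" with 8 neighbors and 4 edges among them (which neighbors are linked is a guess; only the counts matter).

```python
def local_clustering(adj, node):
    """C = 2 * n_I / (k * (k - 1)), with k neighbors and n_I edges among them."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # Each neighbor-neighbor edge is seen from both ends, hence the // 2.
    links = sum(1 for u in neighbors for v in adj[u] if v in neighbors) // 2
    return 2 * links / (k * (k - 1))


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


# Center node "I" with neighbors n1..n8 and 4 edges between neighbors.
star = [("I", f"n{i}") for i in range(1, 9)]
between_neighbors = [("n1", "n2"), ("n3", "n4"), ("n5", "n6"), ("n7", "n8")]
adj = build_adjacency(star + between_neighbors)

print(local_clustering(adj, "I"))  # 0.14285714285714285, i.e. 1/7
```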
Small-world Network
• Every node can be reached from every other by a small number of hops or steps
• High clustering coefficient and low mean-shortest path length
– Random graphs don’t necessarily have high clustering coefficients
• Social networks, the Internet, and biological networks all exhibit small-world network characteristics
Social network
Small-world network
Summary: Network Measures
• Degree ki
– The number of edges involving node i
• Degree distribution P(k)
– The probability (frequency) of nodes of degree k
• Mean path length
– The average shortest path between all node pairs
• Network Diameter
– i.e. the longest shortest path
• Clustering Coefficient
– A high CC is found for modules
Assignment
1. List and explain the basic topological parameters of a network.
2. List the organisms that have a genome-scale curated metabolic network, and give the source (database or literature).
3. Get the regulatory network information of E. coli and yeast from the RegulonDB and Yeastract databases, respectively.
(Some of these networks may be your team project.)
Advanced Network Topology
Types of graphs
• Simple graphs
• Weighted graphs
• Multigraphs
• Directed graphs
Simple Graphs
Simple graphs are graphs without multiple edges or self-loops. They are weighted graphs with all edge weights equal to one.
[Figure: example simple graph on nodes A, B, C, D, E]
Weighted graph
A graph for which each edge has an associated weight, usually given by a weight function w: E → R, generally positive.
Adjacency Matrix of Weighted graphs
      A     B     C     D     E
A     0    1.5    0     0     0
B    1.5    0    3.4    0     0
C     0    3.4    0    2.1   0.5
D     0     0    2.1    0    0.7
E     0    2.1   0.5   0.7    0
Degree of Weighted graphs
▪ The sum of the weights associated to every edge incident to the corresponding node
▪ The sum of the corresponding row or column of the adjacency matrix
      A     B     C     D     E    Degree
A     0    1.5    0     0     0     1.5
B    1.5    0    3.4    0     0     4.9
C     0    3.4    0    2.1   0.5    6.0
D     0     0    2.1    0    0.7    2.8
E     0    2.1   0.5   0.7    0     3.3
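The weighted degree is just a row sum of the weighted adjacency matrix. The sketch below uses a hypothetical three-node symmetric matrix (not the slide's five-node example) so that row and column sums can be compared directly.

```python
# Hypothetical symmetric weighted adjacency matrix for nodes A, B, C.
LABELS = ["A", "B", "C"]
W = [
    [0.0, 1.5, 0.0],
    [1.5, 0.0, 3.4],
    [0.0, 3.4, 0.0],
]


def weighted_degrees(matrix):
    """Weighted degree of each node: the sum of its row."""
    return [sum(row) for row in matrix]


def column_sums(matrix):
    return [sum(col) for col in zip(*matrix)]


for label, degree in zip(LABELS, weighted_degrees(W)):
    print(label, round(degree, 2))
```

For an undirected graph the matrix is symmetric, so the row sums and column sums coincide.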
Multigraph
▪ A graph which is permitted to have multiple edges. It is an ordered pair G := (V, E) with
▪ V a set of nodes
▪ E a multiset of unordered pairs of vertices.
Adjacency Matrix of Multigraphs
     A  B  C  D  E
A    2  1  0  0  0
B    1  0  3  0  4
C    0  3  0  1  1
D    0  0  1  0  2
E    0  4  1  2  0
Directed Graph (digraph)
• Edges have directions
– The adjacency matrix is not symmetric
     A  B  C  D  E
A    0  1  0  0  0
B    0  1  0  0  2
C    0  1  0  0  1
D    0  0  1  0  1
E    0  0  0  1  0
Local metrics
◼ Local metrics provide a measurement of a structural property of a single node
◼ Designed to characterise
◼ Functional role – what part does this node play in system dynamics?
◼ Structural importance – how important is this node to the structural characteristics of the system?
Degree Centrality
Degree centrality of a node is the number of edges attached to the node. To obtain the standardized score, divide each score by n-1 (n = the number of nodes).
[Figure: example graph on nodes A, B, C, D, E]
     A  B  C  D  E   Degree   Degree centrality
A    0  1  1  0  0     2          0.5
B    1  0  1  1  1     4          1
C    1  1  0  0  0     2          0.5
D    0  1  0  0  0     1          0.25
E    0  1  0  0  0     1          0.25
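A minimal sketch of the standardized degree centrality, using the five-node example graph (edges A-B, A-C, B-C, B-D, B-E) written as an adjacency matrix; the function name is an assumption.

```python
LABELS = ["A", "B", "C", "D", "E"]
A = [
    [0, 1, 1, 0, 0],  # A
    [1, 0, 1, 1, 1],  # B
    [1, 1, 0, 0, 0],  # C
    [0, 1, 0, 0, 0],  # D
    [0, 1, 0, 0, 0],  # E
]


def degree_centrality(matrix):
    """Degree of each node divided by n - 1."""
    n = len(matrix)
    return [sum(row) / (n - 1) for row in matrix]


print(dict(zip(LABELS, degree_centrality(A))))
# {'A': 0.5, 'B': 1.0, 'C': 0.5, 'D': 0.25, 'E': 0.25}
```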
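Betweenness centrality
◼ The number of shortest paths in the graph that pass through the node divided by the total number of shortest paths.
$BC(k) = \sum_i \sum_j \frac{\rho(i,k,j)}{\rho(i,j)}, \quad i \neq j \neq k$
where ρ(i,j) is the number of shortest paths from i to j and ρ(i,k,j) is the number of those paths that pass through k.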
Betweenness centrality
[Figure: example graph with node B linked to A, C, D and E]
◼ Shortest paths are:
◼ AB, AC, ABD, ABE, BC, BD, BE, CBD, CBE, DBE
◼ B has a BC of 5/10=0.5
Each of the ten shortest paths is unique, so ρ(i,j) = 1 for every pair; the five that pass through B give ρ(A,B,D) = ρ(A,B,E) = ρ(C,B,D) = ρ(C,B,E) = ρ(D,B,E) = 1.
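The 5/10 count for node B can be reproduced with a short sketch. It follows the counting used in this example (shortest paths through a node as an interior vertex, divided by the total number of shortest paths over unordered pairs), which differs from the pair-normalized betweenness found in many libraries. Function names are assumptions.

```python
from collections import deque
from itertools import combinations

EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("B", "E")]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distances and numbers of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def path_fraction_betweenness(adj):
    """Fraction of all shortest paths that pass through each node (not as an endpoint)."""
    info = {v: bfs_counts(adj, v) for v in adj}
    total = 0
    through = {v: 0 for v in adj}
    for s, t in combinations(sorted(adj), 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # s and t are in different components
        total += sigma_s[t]
        for v in adj:
            if v in (s, t) or v not in dist_s:
                continue
            dist_v, sigma_v = info[v]
            if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                through[v] += sigma_s[v] * sigma_v[t]
    if total == 0:
        return {v: 0.0 for v in adj}
    return {v: through[v] / total for v in adj}


bc = path_fraction_betweenness(build_adjacency(EDGES))
print(bc["B"])  # 0.5
```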
Betweenness centrality
◼ Nodes with a high betweenness centrality are interesting because they
◼ control information flow in a network
◼ may be required to carry more information
◼ And therefore, such nodes
◼ may be the subject of targeted attack
Closeness centrality
◼ The normalised inverse of the sum of topological distances in the graph.
$CC(i) = \frac{N - 1}{\sum_j d(i,j)}$
[Figure: example graph on nodes A, B, C, D, E]
     A  B  C  D  E   Σ_j d(i,j)
A    0  1  1  2  2      6
B    1  0  1  1  1      4
C    1  1  0  2  2      6
D    2  1  2  0  2      7
E    2  1  2  2  0      7
Closeness centrality
Node   Closeness
A      0.67
B      1.00
C      0.67
D      0.57
E      0.57
◼ Node B is the most central one in spreading information from it to the other nodes in the network.
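The closeness values can be reproduced with a breadth-first search per node. This sketch assumes a connected, unweighted graph and uses the five-node example (edges A-B, A-C, B-C, B-D, B-E).

```python
from collections import deque

EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("B", "E")]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj):
    """CC(i) = (N - 1) / sum_j d(i, j), assuming the graph is connected."""
    n = len(adj)
    return {v: (n - 1) / sum(bfs_distances(adj, v).values()) for v in adj}


cc = closeness_centrality(build_adjacency(EDGES))
for node in sorted(cc):
    print(node, round(cc[node], 2))
```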
Local metrics
▪ Node B is the most central one according to the degree, betweenness and closeness centralities.
Which is the most central node?
[Figure: example network with two candidate nodes, A and B]
the winner is…
▪ A is the most central according to the degree
▪ B is the most central according to closeness and betweenness
Degree: Difficulties
Extending the Concept of Degree
Make x_i proportional to the average of the centralities of node i's network neighbors:
$x_i = \frac{1}{\lambda} \sum_{j=1}^{n} A_{ij} x_j$
where λ is a constant. In matrix-vector notation we can write
$\mathbf{x} = \frac{1}{\lambda} \mathbf{A} \mathbf{x}$
In order to make the centralities non-negative we select the eigenvector corresponding to the principal eigenvalue (Perron-Frobenius theorem).
Eigenvalues and Eigenvectors
◼ The value λ is an eigenvalue of matrix A if there exists a non-zero vector x such that Ax=λx. Vector x is an eigenvector of matrix A
◼ The largest eigenvalue is called the principal eigenvalue
◼ The corresponding eigenvector is the principal eigenvector
◼ Corresponds to the direction of maximum change
Eigenvector Centrality
◼ The corresponding entry of the principal eigenvector of the adjacency matrix of the network.
◼ It assigns relative scores to all nodes in the network based on the principle that connections to high-scoring nodes contribute more.
Node   EC
1      0.500
2      0.238
3      0.238
4      0.575
5      0.354
6      0.354
7      0.168
8      0.168
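One common way to approximate the principal eigenvector is power iteration: multiply a positive start vector by A repeatedly and renormalize. The sketch below is an illustration on a five-node graph (edges A-B, A-C, B-C, B-D, B-E), not the eight-node network of the table above, and it assumes a connected, non-bipartite graph so that the iteration converges.

```python
import math

LABELS = ["A", "B", "C", "D", "E"]
A = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
]


def matvec(matrix, vector):
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def eigenvector_centrality(matrix, iterations=200):
    """Power iteration; returns (principal eigenvalue estimate, unit-norm eigenvector)."""
    x = [1.0] * len(matrix)
    eigenvalue = 0.0
    for _ in range(iterations):
        y = matvec(matrix, x)
        eigenvalue = math.sqrt(sum(v * v for v in y))
        x = [v / eigenvalue for v in y]
    return eigenvalue, x


lam, ec = eigenvector_centrality(A)
ranking = sorted(zip(LABELS, ec), key=lambda item: -item[1])
print(ranking[0][0])  # B has the highest eigenvector centrality
```

All entries of the resulting vector are positive, in line with the Perron-Frobenius argument.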
Eigenvector Centrality
Path of length 6 Walk of length 8
Shortest path
Communicability
Let $P_{pq}^{s}$ be the number of shortest paths of length s between p and q.
Let $W_{pq}^{k}$ be the number of walks of length k>s between p and q.
DEFINITION (Communicability):
$G_{pq} = b_s P_{pq}^{s} + \sum_{k>s} c_k W_{pq}^{k}$
$b_s$ and $c_k$ must be selected such that the communicability converges.
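Communicability
▪ By selecting $b_s = 1/s!$ and $c_k = 1/k!$ we obtain
$G_{pq} = \sum_{k=0}^{\infty} \frac{(\mathbf{A}^k)_{pq}}{k!} = (e^{\mathbf{A}})_{pq}$
where $e^{\mathbf{A}}$ is the exponential of the adjacency matrix.
▪ For simple graphs we have
$G_{pq} = \sum_{j=1}^{n} \varphi_j(p) \varphi_j(q) e^{\lambda_j}$
where $\varphi_j(p)$ is the p-th entry of the j-th eigenvector of A and $\lambda_j$ is the corresponding eigenvalue.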
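$G_{pq} = \varphi_1(p)\varphi_1(q)e^{\lambda_1} + \left[ \sum_{j \ge 2} \varphi_j^{+}(p)\varphi_j^{+}(q)e^{\lambda_j} + \sum_{j \ge 2} \varphi_j^{-}(p)\varphi_j^{-}(q)e^{\lambda_j} \right] + \left[ \sum_{j \ge 2} \varphi_j^{+}(p)\varphi_j^{-}(q)e^{\lambda_j} + \sum_{j \ge 2} \varphi_j^{-}(p)\varphi_j^{+}(q)e^{\lambda_j} \right]$
The first bracket is the intracluster term (eigenvector components of p and q with the same sign); the second bracket is the intercluster term (components of opposite sign).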
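The exponential form of the communicability, G = e^A, can be evaluated numerically by summing the series of A^k/k!. The sketch below truncates the series at a fixed number of terms, which is an assumption made for illustration; a library matrix exponential would normally be used instead.

```python
import math


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def communicability(adjacency, terms=30):
    """Approximate G = sum_{k < terms} A^k / k! (truncated matrix exponential)."""
    n = len(adjacency)
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0 = I
    total = [row[:] for row in power]
    for k in range(1, terms):
        power = matmul(power, adjacency)  # now A^k
        factor = 1.0 / math.factorial(k)
        for i in range(n):
            for j in range(n):
                total[i][j] += factor * power[i][j]
    return total


# Two nodes joined by one edge: G_pp = cosh(1), G_pq = sinh(1).
single_edge = [[0, 1], [1, 0]]
G = communicability(single_edge)
print(round(G[0][0], 4), round(G[0][1], 4))  # 1.5431 1.1752
```

The diagonal entry counts closed walks (even lengths only for a single edge), and the off-diagonal entry counts walks between the two nodes (odd lengths only), each weighted by 1/k!.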
Communicability
$\Delta G_{pq} = \sum_{j=2}^{\text{intra-cluster}} \varphi_j(p)\varphi_j(q)e^{\lambda_j} + \sum_{j=2}^{\text{inter-cluster}} \varphi_j(p)\varphi_j(q)e^{\lambda_j}$
Communicability & Communities
▪ A community is a group of nodes for which the intra-cluster communicability is larger than the inter-cluster one:
$\sum_{j=2}^{\text{intra-cluster}} \varphi_j(p)\varphi_j(q)e^{\lambda_j} > \left| \sum_{j=2}^{\text{inter-cluster}} \varphi_j(p)\varphi_j(q)e^{\lambda_j} \right|$
▪ These nodes communicate better among themselves than with the nodes outside the community.
▪ Let
$\Theta(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$
▪ The communicability graph Θ(G) is the graph whose adjacency matrix Θ(Δ(G)) results from the elementwise application of the function Θ(x) to the matrix Δ(G).
Communicability Graph
[Figure: network G → matrix Δ(G) = [ΔG_pq] → matrix Θ(Δ(G)) with entries in {0, 1} → communicability graph Θ(G)]
Communicability Graph
▪A community is defined as a clique
in the communicability graph.
▪Identifying communities is reduced
to the “all cliques problem” in the
communicability graph.
Communicability Graph
Social (Friendship) Network
Communities: Example
Communities: Example
The Network
Its Communicability Graph
Communities
Social Networks Metabolic Networks
How to construct a proper random network?
Randomization of a network
given complex network random
Stub reconnection algorithm
• Break every edge into two halves (“stubs”)
• Randomly reconnect stubs
• Watch for multiple edges!
• For example, in the AS-Internet the two largest hubs would end up being connected with 50 edges
• Not adaptable to conserve other low-level topological properties of the network
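The stub reconnection steps can be sketched as follows: cut every edge into two stubs, shuffle the stubs, and pair them up again. Degrees are preserved, but self-loops and multiple edges may appear, which is why the result has to be checked. The helper names and the toy edge list are assumptions.

```python
import random
from collections import Counter


def stub_reconnection(edges, rng):
    """Break every edge into two stubs and reconnect the stubs at random."""
    stubs = [node for edge in edges for node in edge]
    rng.shuffle(stubs)
    return list(zip(stubs[0::2], stubs[1::2]))


def count_defects(edges):
    """Return (number of self-loops, number of surplus parallel edges)."""
    self_loops = sum(1 for u, v in edges if u == v)
    multiplicity = Counter(frozenset((u, v)) for u, v in edges if u != v)
    extra_copies = sum(count - 1 for count in multiplicity.values())
    return self_loops, extra_copies


def degrees(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


original = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("B", "E")]
randomized = stub_reconnection(original, random.Random(1))
print(degrees(randomized) == degrees(original))  # True: degree sequence is kept
print(count_defects(randomized))  # self-loops and multi-edges to watch for
```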
Local rewiring algorithm
• Randomly select and rewire two edges
• Repeat many times
• R. Kannan, P. Tetali, and S. Vempala, Random Structures and Algorithms (1999)
• SM, K. Sneppen, Science (2002)
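A minimal sketch of local rewiring, assuming the usual degree-preserving edge swap: pick two edges (a, b) and (c, d), and replace them with (a, d) and (c, b) unless that would create a self-loop or a duplicate edge. The function name, the rejection rule, and the toy graph are assumptions made for illustration.

```python
import random
from collections import Counter


def local_rewiring(edges, n_swaps, rng):
    """Attempt n_swaps degree-preserving edge swaps on a simple undirected graph."""
    edges = [tuple(edge) for edge in edges]
    present = {frozenset(edge) for edge in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:  # pick one of the two possible pairings
            c, d = d, c
        if a == d or c == b:  # swap would create a self-loop
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present or new1 == new2:
            continue  # swap would create a multiple edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def degrees(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


# Toy graph: a 6-cycle with two chords.
original = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3), (1, 4)]
rewired = local_rewiring(original, 100, random.Random(0))
print(degrees(rewired) == degrees(original))  # True
```

Unlike stub reconnection, rejected swaps keep the graph simple at every step.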
Metropolis rewiring algorithm
• Randomly select two edges
• Calculate the change ΔE in the “energy function” E = (N_actual - N_desired)^2 / N_desired
• Rewire with probability p = exp(-ΔE/T)
[Figure: rewiring step from “energy” E to “energy” E+ΔE]
SM, K. Sneppen: cond-mat preprint (2002), Physica A (2004)
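The acceptance rule can be sketched in a few lines. The energy function follows the form given above; treating p > 1 as "always accept" is the standard Metropolis convention and is assumed here, as are the function names and the example numbers.

```python
import math
import random


def energy(n_actual, n_desired):
    """E = (N_actual - N_desired)^2 / N_desired."""
    return (n_actual - n_desired) ** 2 / n_desired


def accept_rewiring(delta_e, temperature, rng):
    """Accept a proposed rewiring with probability min(1, exp(-delta_e / T))."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


# Hypothetical step: a swap moves the counted property from 12 to 11 (target 10).
rng = random.Random(0)
delta_e = energy(11, 10) - energy(12, 10)
print(delta_e < 0, accept_rewiring(delta_e, 1.0, rng))  # True True
```

A low temperature T makes the walk accept almost only improving swaps; a high T approaches unbiased local rewiring.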
Assignment
Identify the most central node according to the following criteria:
(a) the largest chance of receiving information from closest neighbors;
(b) spreading information to the rest of nodes in the network;
(c) passing information from some nodes to others.