Hybrid model of protein interaction network:
Modularity and the family constraint
Doochul Kim (Seoul National University, Korea)
based on K.-I. Goh, B. Kahng and D. Kim
(q-bio.MN/0312009 v2)
and Goh’s talk at Statphys 22
Academia Sinica, Taipei, 2004.09.15
Introduction
• Most real-world networks are modular. How does modularity emerge dynamically?
• Vertices can be grouped according to their common characteristics into vertex families.
• In some systems, the vertex families can be defined explicitly.
• The families themselves form a network, which may also evolve in time [cf. social network models with a priori defined communities (Jin et al., PRE 2001; Watts et al., Science 2002)].
System     Vertex family     Vertex
Internet   Domains/ASes      Hosts/Routers
Society    Social parties    Individuals
Cell       Protein family    Proteins
Family constraint
Topological properties of yeast protein-protein interaction network
Yeast protein network
[Jeong et al., Nature 2001]
[Maslov & Sneppen, Science 2002]
From K.-I. Goh, B. Kahng and D. Kim, “Graphical analysis of biocomplex networks and transport phenomena”, book chapter in “Power Laws, Scale-free Networks and Genome Biology”, eds. E. Koonin, Y. Wolf and G. Karev (Landes Bioscience, 2004)
Yeast protein network
Re-analysis with integrated data
i) Scale-free   ii) Modular clustering   iii) Disassortative mixing
a. DIP + MIPS + BIND   b. Ho et al., Nature ’02 (mass spectrometry)
c. Uetz et al., Nature ’00 + Ito et al., PNAS ’01 + Tong et al., Science ’02 (yeast two-hybrid, Y2H)
$p_d(k) \sim (k+k_0)^{-\gamma}$,   $C = 0.13$ ($C_{\rm random} \approx 0.02$),   $k_{nn}(k) \sim k^{-\nu}$
Yeast protein network
3.5 0.2 0.3
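2-node correlations: average neighbor degree function
$k_{nn}(k) = \sum_{k'} k' P(k'|k) = \sum_{k'} k' P(k,k') \big/ \big(k P_d(k)/\langle k \rangle\big)$
- increasing $k_{nn}(k)$: positive correlation; assortative
- flat $k_{nn}(k)$: no correlation; neutral
- decreasing $k_{nn}(k)$: negative correlation; disassortative
3-node correlations (clustering): local clustering function $C(k)$
Local clustering coefficient $C_i = (\text{number of edges between neighbors of } i) \big/ \big[k_i(k_i-1)/2\big]$; $C(k)$ is its average over vertices of degree $k$.
[Figure: sketches of $C(k)$ for modular clustering, hierarchical clustering, and no clustering]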
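A minimal Python sketch of these two functions, assuming the network is stored as a dict mapping each protein to the set of its neighbors (a hypothetical representation, not the authors' code). Averaging the mean neighbor degree over all vertices of degree $k$ is equivalent to the sum over $P(k,k')$ above; $C_i$ is set to 0 for $k_i = 1$ by convention.

```python
from collections import defaultdict


def knn_and_ck(adj):
    """Return (k_nn, C) as dicts keyed by degree k.

    adj: dict mapping each vertex to the set of its neighbors (undirected).
    k_nn(k): mean neighbor degree, averaged over vertices of degree k.
    C(k): local clustering coefficient, averaged over vertices of degree k
    (C_i = 0 when k_i = 1, by convention).
    """
    knn_sum = defaultdict(float)
    c_sum = defaultdict(float)
    count = defaultdict(int)
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k == 0:
            continue  # isolated vertices carry no correlation information
        count[k] += 1
        knn_sum[k] += sum(len(adj[j]) for j in nbrs) / k
        if k > 1:
            # every edge between two neighbors of i is seen twice
            links = sum(1 for j in nbrs for l in adj[j] if l in nbrs) / 2
            c_sum[k] += links / (k * (k - 1) / 2)
    knn = {k: knn_sum[k] / count[k] for k in count}
    ck = {k: c_sum[k] / count[k] for k in count}
    return knn, ck
```

For a triangle with one pendant vertex attached to a corner, this gives $k_{nn}(2) = 2.5$ and $C(2) = 1$.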
Network correlations
Protein interactions
• Physical interactions between proteins mostly occur on a structural basis (lock-and-key, induced fit, etc.).
• Protein structure is well conserved during evolution, and proteins can be classified into families on this basis.
How do the high clustering and the strong modular organization appear? => Family compatibility constraint
Yeast protein network
Protein family network is also scale-free [Park et al. J Mol Biol 2001].
Yeast protein network
Protein family size distribution follows a power law [Huynen & van Nimwegen, PNAS 1998].
Yeast protein network
Evolution by gene duplication and divergence (DD) [Ohno, 1970] => incorporated in previous models:
- Protein interaction network evolution [Solé et al., Adv Compl Sys 2002; Vázquez et al., ComplexUs 2003].
- Domain occurrence frequency distribution [Qian et al., J Mol Biol 2001; Karev et al., BMC Evol Biol 2002].
Protein family compatibility: The interaction between proteins is possible only when the corresponding families they belong to are “compatible.”
- Those connected in the family network are compatible.
Model
Basic scenario: duplication+divergence+mutation+family constraint
Proteins can interact only with proteins in compatible families; e.g., the pink protein CANNOT interact with the black one.
[Figure: divergence with a given probability; mutation with rate 1 − (duplication rate); panels: protein family network, protein interaction network]
Model
Model: Stage 0
1) $N_0 = 3$ proteins in the beginning.
2) They interact with one another, forming a complete network of size $N_0$: $k_i(t=0) = 2$ for $i = 1, 2, 3$.
3) Each protein constitutes its own protein family: $N_f(t=0) = 1$.
4) Each family contains a single domain: $D_f(t=0) = 1$.
[Figure legend: protein, protein family, protein interaction, protein family link; t = 0]
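Stage 0 can be sketched as follows, assuming (this is not stated on the slide) that the three initial families are mutually compatible, since their proteins all interact. The function name `stage0` and the dict-of-sets layout are hypothetical.

```python
def stage0(n0=3):
    """Stage 0 (sketch): complete network of n0 proteins, one single-domain family each."""
    adj = {i: {j for j in range(n0) if j != i} for i in range(n0)}  # k_i(0) = n0 - 1
    family = {i: i for i in range(n0)}   # protein i founds family i, so N_f(0) = 1
    domains = {f: 1 for f in range(n0)}  # D_f(0) = 1
    # Assumption: initial family links mirror the (complete) protein interactions.
    fam_links = {f: {g for g in range(n0) if g != f} for f in range(n0)}
    return adj, family, domains, fam_links
```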
Model (details)
Model: Stage 1
1) With the duplication rate, a randomly chosen protein is duplicated.
2) Each inherited interaction is removed with the divergence probability.
3) The new protein establishes a new protein family.
4) The new family's initial links in the family network are determined by the new protein's remaining interactions.
5) If no interaction is left, the new protein belongs to the original family.
[Figure: duplication with the duplication rate; inherited interactions removed with the divergence probability]
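A sketch of one Stage-1 duplication event, assuming a hypothetical state layout: `adj` (protein → neighbor set), `family` (protein → family id), `domains` (family → $D_f$) and `fam_links` (family → linked families). The name `div_prob` stands in for the divergence probability, and giving a new family a single domain is an assumption carried over from Stage 0.

```python
import random


def duplicate_stage1(adj, family, domains, fam_links, div_prob, rng):
    """One Stage-1 duplication + divergence event (sketch); returns the new protein id."""
    old = rng.choice(list(adj))
    new = max(adj) + 1
    # each inherited interaction is removed with probability div_prob
    kept = {j for j in adj[old] if rng.random() >= div_prob}
    adj[new] = kept
    for j in kept:
        adj[j].add(new)
    if kept:
        # the copy founds a new family whose links follow its remaining interactions
        f = max(domains) + 1
        family[new] = f
        domains[f] = 1  # assumption: a new family starts with a single domain
        fam_links[f] = {family[j] for j in kept}
        for g in fam_links[f]:
            fam_links[g].add(f)
    else:
        # no interaction left: the copy stays in the original family
        family[new] = family[old]
    return new
```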
Model (details)
Model: Stage 1-2
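1) With the mutation rate, a randomly chosen protein mutates.
2) The mutating protein $i$ gains a new interaction with a protein $l$ not previously linked to it, chosen with probability $\Pi_{il} = D_{F_l} \big/ \sum_j D_{F_j}$, where $F_l$ is the family of protein $l$ and the sum runs over all candidate proteins $j$.
3) $F_l$ must be compatible with $F_i$: this sets the constraint on family compatibility.
4) $D_{F_i}$ is increased by 1.
[Figure: mutation with the mutation rate]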
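The mutation rule can be sketched as a weighted random choice. Here "compatible" is read as "linked to $F_i$ in the family network", following the earlier definition; the helper name `mutate` and the data layout are hypothetical. `random.Random.choices` draws one candidate with probability proportional to its weight, which realizes $\Pi_{il} = D_{F_l} / \sum_j D_{F_j}$ over the candidate set.

```python
import random


def mutate(adj, family, domains, fam_links, rng):
    """One mutation event (sketch); returns (i, l), or None if i has no candidate."""
    i = rng.choice(list(adj))
    allowed = fam_links[family[i]]  # families compatible with F_i
    cands = [l for l in adj
             if l != i and l not in adj[i] and family[l] in allowed]
    if not cands:
        return None
    # Pi_il proportional to D_{F_l}, normalized over the candidate set
    weights = [domains[family[l]] for l in cands]
    l = rng.choices(cands, weights=weights, k=1)[0]
    adj[i].add(l)
    adj[l].add(i)
    domains[family[i]] += 1  # D_{F_i} grows by one
    return i, l
```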
Model (details)
Model: Stage 2 (family map is fixed)
1) With the duplication rate, a randomly chosen protein is duplicated, and the copy becomes a member of the original family. It again diverges with the same probability as before, to keep the model minimal.
2) With the mutation rate, a randomly chosen protein mutates under the same constraint as in Stage 1.
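Stage-2 duplication differs from Stage 1 only in where the copy goes; a sketch with the same hypothetical layout (`duplicate_stage2` and `div_prob` are illustrative names). Because the copy only inherits links the original already had, and those links already join compatible families, the fixed family map is respected without an extra check.

```python
import random


def duplicate_stage2(adj, family, div_prob, rng):
    """One Stage-2 duplication (sketch): the copy joins the original family."""
    old = rng.choice(list(adj))
    new = max(adj) + 1
    # same divergence rule as in Stage 1
    kept = {j for j in adj[old] if rng.random() >= div_prob}
    adj[new] = kept
    for j in kept:
        adj[j].add(new)
    family[new] = family[old]  # family map fixed: no new family, no new family links
    return new
```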
Model (details)
Parameters for the simulation:
1) The duplication rate = 0.8 and the divergence ratio = 0.7.
- The divergence ratio is fixed to accommodate the high level of “sequence diversity” within a family.
- The duplication rate is tuned to match the empirical average degree ⟨k⟩ ≈ 6.4.
2) Family creation lasts until 1000, when the number of families becomes about 500.
3) Evolution lasts until 6000, the size of the yeast proteome.
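Putting the stages together gives a compact simulation sketch. Several readings here are assumptions, not taken from the slides: each time step is a duplication with probability `dup_rate` and otherwise a mutation (reading the lost mutation rate as 1 minus the duplication rate), and Stage 1 is taken to end once the network holds `n_stage1` proteins. All names are hypothetical.

```python
import random


def simulate(n_stage1=1000, n_final=6000, dup_rate=0.8, div_prob=0.7, seed=0):
    """Sketch of duplication + divergence + mutation under a family constraint.

    Returns (adj, family, fam_links). Parameter names are hypothetical.
    """
    rng = random.Random(seed)
    # Stage 0: complete network of 3 proteins, one single-domain family each
    adj = {i: {j for j in range(3) if j != i} for i in range(3)}
    family = {i: i for i in range(3)}
    domains = {f: 1 for f in range(3)}
    fam_links = {f: {g for g in range(3) if g != f} for f in range(3)}
    proteins = list(adj)  # kept alongside adj for O(1) random choice
    while len(proteins) < n_final:
        stage1 = len(proteins) < n_stage1  # assumption: Stage 1 ends at n_stage1 proteins
        if rng.random() < dup_rate:
            # duplication + divergence
            old = rng.choice(proteins)
            new = len(proteins)
            kept = {j for j in adj[old] if rng.random() >= div_prob}
            adj[new] = kept
            for j in kept:
                adj[j].add(new)
            proteins.append(new)
            if stage1 and kept:
                f = len(domains)  # new family id
                family[new] = f
                domains[f] = 1
                fam_links[f] = {family[j] for j in kept}
                for g in fam_links[f]:
                    fam_links[g].add(f)
            else:
                family[new] = family[old]
        else:
            # mutation: new link into a compatible family, weighted by D_F
            i = rng.choice(proteins)
            allowed = fam_links[family[i]]
            cands = [l for l in proteins
                     if l != i and l not in adj[i] and family[l] in allowed]
            if cands:
                weights = [domains[family[l]] for l in cands]
                l = rng.choices(cands, weights=weights, k=1)[0]
                adj[i].add(l)
                adj[l].add(i)
                domains[family[i]] += 1
    return adj, family, fam_links
```

Counting the average degree over proteins with at least one link, e.g. `sum(len(s) for s in adj.values()) / sum(1 for s in adj.values() if s)`, is one way to check the tuning of `dup_rate` against ⟨k⟩ ≈ 6.4; no particular output is claimed here.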
Model
Snapshot
$N = 600$, $N_F = 62$
Results
$p_d(k) \sim (k+k_0)^{-\gamma}$,   $C \approx 0.1$,   $k_{nn}(k) \sim k^{-\nu}$
Results
Protein interaction network
[Figure panels: yeast data; model with family constraint; model without family constraint]
Degree-correlation profile
[Figure: two panels, Yeast and Model, showing $z$ over $\log_{10} k$ and $\log_{10} k'$]
$z = \log_{10}[P(k,k')/P_{\rm rand}(k,k')]$
$P(k,k')$: probability that a randomly chosen link connects proteins with degrees $k$ and $k'$; $P_{\rm rand}(k,k')$ is the same probability in a randomized network.
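One common way to estimate $P_{\rm rand}(k,k')$ is to average over degree-preserving randomizations obtained by repeated double-edge swaps, in the spirit of Maslov and Sneppen; whether the authors used exactly this procedure is an assumption. This sketch works with unbinned degrees, while the plotted profile uses $\log_{10}$ axes, so in practice the degrees would be binned first. Function names are hypothetical.

```python
import math
import random
from collections import Counter


def edge_list(adj):
    """Undirected links as sorted (u, v) tuples, each listed once."""
    return [(i, j) for i in adj for j in adj[i] if i < j]


def rewire(edges, n_swaps, rng):
    """Degree-preserving double-edge swaps; rejects self-loops and multi-links."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        a, b = rng.sample(range(len(edges)), 2)
        (u, v), (x, y) = edges[a], edges[b]
        if rng.random() < 0.5:
            x, y = y, x  # pick one of the two possible rewirings at random
        e1, e2 = tuple(sorted((u, y))), tuple(sorted((x, v)))
        if u == y or x == v or e1 in present or e2 in present:
            continue
        present -= {edges[a], edges[b]}
        present |= {e1, e2}
        edges[a], edges[b] = e1, e2
    return edges


def joint_degree(edges):
    """P(k, k'): fraction of link ends joining degrees k and k' (symmetric)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    counts = Counter()
    for u, v in edges:
        counts[(deg[u], deg[v])] += 1
        counts[(deg[v], deg[u])] += 1
    total = 2 * len(edges)
    return {key: n / total for key, n in counts.items()}


def z_profile(adj, n_rand=10, seed=0):
    """z(k, k') = log10[P(k, k') / P_rand(k, k')], P_rand averaged over rewirings."""
    rng = random.Random(seed)
    edges = edge_list(adj)
    p = joint_degree(edges)
    p_rand = Counter()
    for _ in range(n_rand):
        for key, val in joint_degree(rewire(edges, 10 * len(edges), rng)).items():
            p_rand[key] += val / n_rand
    return {key: math.log10(p[key] / p_rand[key]) for key in p if p_rand[key] > 0}
```

For a regular graph every link joins equal degrees, so the profile is flat at $z = 0$, which makes a convenient sanity check.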
Results
Results: Clustering of clustering
[Figure: two panels, Yeast and the model (duplication rate 0.8, divergence ratio 0.7), showing $z$ over $\log_{10} c$ and $\log_{10} c'$]
$z = \log_{10}[P(c,c')/P_{\rm rand}(c,c')]$
$c$, $c'$: local clustering coefficients of the two proteins joined by a link.
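A sketch of the numerator $P(c,c')$, binning each end's clustering coefficient in $\log_{10} c$ (the bin width in decades is a free, hypothetical parameter). Links touching a vertex with $c = 0$ are skipped because $\log_{10} 0$ is undefined. $P_{\rm rand}(c,c')$ would be obtained by running the same function on degree-preserving randomizations, as for the degree profile.

```python
import math
from collections import Counter


def local_clustering(adj):
    """C_i = (links among neighbors of i) / [k_i (k_i - 1) / 2]; 0 if k_i < 2."""
    c = {}
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            c[i] = 0.0
            continue
        links = sum(1 for j in nbrs for l in adj[j] if l in nbrs) / 2
        c[i] = links / (k * (k - 1) / 2)
    return c


def joint_clustering(adj, bin_width=0.5):
    """P(c, c') over links, with c binned as floor(log10(c) / bin_width)."""
    c = local_clustering(adj)
    counts = Counter()
    for i, nbrs in adj.items():
        for j in nbrs:  # each link is visited in both directions
            if c[i] > 0 and c[j] > 0:
                bi = math.floor(math.log10(c[i]) / bin_width)
                bj = math.floor(math.log10(c[j]) / bin_width)
                counts[(bi, bj)] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}
```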
Results: Statistics
Item                                  Model          Yeast
Total number of proteins, n           6000           ~6000
Number of interacting proteins, N     ~5000          4929
Average degree, ⟨k⟩                   6.5 ± 0.2      6.41
Clustering coefficient, C             0.13 ± 0.2     0.128
Assortativity, r                      -0.09 ± 0.04   -0.13
Size of the largest component, N1     4900 ± 7       4832
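The network statistics in the table can be computed for any network with a few lines, assuming the dict-of-sets representation used above (function names hypothetical). `assortativity` computes Newman's coefficient $r$ as the Pearson correlation of the degrees at the two ends of each link, counting every link in both directions.

```python
def assortativity(adj):
    """Newman's degree assortativity r (Pearson correlation over link ends)."""
    xs, ys = [], []
    for i, nbrs in adj.items():
        for j in nbrs:
            xs.append(len(nbrs))    # degree at this end
            ys.append(len(adj[j]))  # degree at the other end
    n = len(xs)
    mean = sum(xs) / n  # same for ys, since each link appears in both directions
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var if var > 0 else float("nan")  # undefined for regular graphs


def interacting_proteins(adj):
    """Number of proteins with at least one interaction (N)."""
    return sum(1 for nbrs in adj.values() if nbrs)


def largest_component_size(adj):
    """Size of the largest connected component (N1), via iterative DFS."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best
```

A star graph, where every link joins the hub to a leaf, is maximally disassortative and gives $r = -1$.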
Results
Modular clustering is preserved under edge shuffling that respects the family constraint.
Results
[Figure panels: model network before shuffling; model network shuffled with family constraint; model network shuffled without family constraint]
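The shuffling test can be sketched as degree-preserving double-edge swaps with an optional acceptance rule: when `family` and `fam_links` are given, a swap is kept only if both new links join families connected in the family network. This is one plausible reading of "shuffling conserving the family constraint", not necessarily the authors' procedure; comparing $C(k)$ before and after shuffling would then test the claim. Names are hypothetical.

```python
import random


def shuffle_edges(edges, n_swaps, rng, family=None, fam_links=None):
    """Degree-preserving double-edge swaps, optionally under the family constraint."""
    edges = [tuple(sorted(e)) for e in edges]
    present = set(edges)

    def compatible(a, b):
        # without family data every pair is allowed (unconstrained shuffle)
        return family is None or family[b] in fam_links[family[a]]

    for _ in range(n_swaps):
        ia, ib = rng.sample(range(len(edges)), 2)
        (u, v), (x, y) = edges[ia], edges[ib]
        if rng.random() < 0.5:
            x, y = y, x
        e1, e2 = tuple(sorted((u, y))), tuple(sorted((x, v)))
        if u == y or x == v or e1 in present or e2 in present:
            continue  # would create a self-loop or a duplicate link
        if not (compatible(u, y) and compatible(x, v)):
            continue  # would violate the family constraint
        present -= {edges[ia], edges[ib]}
        present |= {e1, e2}
        edges[ia], edges[ib] = e1, e2
    return edges
```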
Protein family network
$p(x) \sim (x+x_0)^{-\gamma}$
Results
• “Family constraint” is introduced as a mechanism behind the emergence of modularity in evolving networks.
• We applied it to the protein-protein interaction and protein family networks of yeast and achieved detailed agreement with the empirical data in the topological properties.
• Our results suggest that the physical constraint encoded in the domain structure of proteins is crucial to the organization of protein interaction networks.
• The concept can be applied in other systems, e.g., the Internet (domains/hosts) and the social networks (social parties/individuals).
Summary