Doochul Kim (Seoul National University, Korea), based on K.-I. Goh, B. Kahng and D. Kim (q-bio.MN/0312009 v2) and Goh’s talk at Statphys 22. Hybrid model of protein interaction network: Modularity and the family constraint. Academia Sinica, Taipei, 2004.09.15


Page 1: Doochul Kim (Seoul National University, Korea)  based on K.I- Goh, B. Kahng and D. Kim

Doochul Kim (Seoul National University, Korea)

based on K.-I. Goh, B. Kahng and D. Kim

(q-bio.MN/0312009 v2)

and Goh’s talk at Statphys 22

Hybrid model of protein interaction network:

Modularity and the family constraint

Academia Sinica, Taipei, 2004.09.15

Page 2

Introduction

• Most real-world networks are modular. How does the modularity emerge dynamically?

• Vertices can be grouped according to their common characteristics: Vertex families.

• In some systems, the vertex families can be defined explicitly.

• Families themselves form a network, which may also evolve in time [cf. the social network models with a priori defined communities (Jin et al., PRE 2001; Watts et al., Science 2002)].

Vertex family     Vertices         System
Domains/ASes      Hosts/Routers    Internet
Social parties    Individuals      Society
Protein family    Proteins         Cell

Family constraint

Page 3

Topological properties of yeast protein-protein interaction network

Yeast protein network

[Jeong et al., Nature 2001]

[Maslov & Sneppen, Science 2002]

Page 4

From K.-I. Goh, B. Kahng and D. Kim, “Graphical analysis of biocomplex networks and transport phenomena”, book chapter in “Power Laws, Scale-free Networks and Genome Biology”, eds. E. Koonin, Y. Wolf and G. Karev (Landes Bioscience, 2004)

Yeast protein network

Page 5

Re-analysis with an integrated data set

i) Scale-free ii) Modular clustering iii) Disassortative mixing

a. DIP+MIPS+BIND b. Ho et al. Nature ’02 (MassSpec)

c. Uetz et al. Nature ’00 + Ito et al. PNAS ’01 + Tong et al. Science ’02 (yeast two-hybrid, Y2H)

$p_d(k) \sim (k+k_0)^{-\gamma}$, $C = 0.13$ ($C_{\mathrm{random}} \approx 0.02$), $k_{nn}(k) \sim k^{-\nu}$

Yeast protein network

3.5 0.2 0.3

Page 6

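2-node correlations: average neighbor degree function

$$k_{nn}(k) = \sum_{k'} k' \, \frac{P(k,k')}{k \, p_d(k) / \langle k \rangle}$$

• $k_{nn}(k)$ increasing in k: positive correlation; assortative

• $k_{nn}(k)$ flat: no correlation; neutral

• $k_{nn}(k)$ decreasing in k: negative correlation; disassortative

3-node correlations; clustering: local clustering function $C(k)$, the local clustering coefficient averaged over vertices of degree k

Local clustering coefficient:

$$c_i = \frac{\#\text{ of edges between neighbors of } i}{k_i (k_i - 1)/2}$$

[Figure: schematic $C(k)$ curves labeled modular clustering, hierarchical clustering, and no clustering.]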

Network correlations
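The two correlation functions defined on this slide can be estimated directly from an adjacency structure. The following is a minimal sketch, not the authors' code; the function name `degree_resolved_correlations` and the dict-of-sets input format are assumptions made for illustration.

```python
from collections import defaultdict

def degree_resolved_correlations(adj):
    """Return (knn, C): dicts mapping degree k to the average neighbor degree
    and to the average local clustering coefficient of degree-k vertices.

    adj maps each vertex to the set of its neighbors (undirected, no self-loops).
    The input format is an assumption of this sketch.
    """
    knn_sum = defaultdict(float)
    c_sum = defaultdict(float)
    count = defaultdict(int)
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k == 0:
            continue  # isolated vertices carry no correlation information
        # Average degree of the neighbors of vertex i.
        knn_i = sum(len(adj[j]) for j in nbrs) / k
        # Every edge among the neighbors is counted twice in this double loop.
        twice_links = sum(1 for a in nbrs for b in adj[a] if b in nbrs)
        # c_i = links / (k (k - 1) / 2) = twice_links / (k (k - 1)).
        c_i = twice_links / (k * (k - 1)) if k > 1 else 0.0
        knn_sum[k] += knn_i
        c_sum[k] += c_i
        count[k] += 1
    knn = {k: knn_sum[k] / count[k] for k in count}
    C = {k: c_sum[k] / count[k] for k in count}
    return knn, C

# Toy example: a triangle (0, 1, 2) with one pendant vertex 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
knn, C = degree_resolved_correlations(adj)
```

In this toy graph the degree-3 hub has low-degree neighbors, so `knn` decreases with k, which is the disassortative signature described above.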

Page 7

Protein interactions

• Physical interactions between proteins mostly occur on a structural basis (key-and-lock, induced fit, etc).

• Protein structure is well conserved during evolution, and on this basis the proteins can be classified into families.

Question: how do the high clustering and the strong modular organization appear? Proposed answer: the family compatibility constraint.

Yeast protein network

Page 8

Protein family network is also scale-free [Park et al. J Mol Biol 2001].

Yeast protein network

Page 9

Protein family size distribution follows a power law [Huynen & van Nimwegen, PNAS 1998].

Yeast protein network

Page 10

Evolution by gene duplication and divergence (DD) [Ohno, 1970] => incorporated in previous models:

- Protein interaction network evolution [Solé et al., Adv Compl Sys 2002; Vázquez et al., ComplexUs 2003].

- Domain occurrence frequency distribution [Qian et al., J Mol Biol 2001; Karev et al., BMC Evol Biol 2002].

Protein family compatibility: The interaction between proteins is possible only when the corresponding families they belong to are “compatible.”

- Those connected in the family network are compatible.

Model

Page 11

Basic scenario: duplication+divergence+mutation+family constraint

Proteins can interact only with those in compatible families: e.g., the pink protein CANNOT interact with the black one.

Divergence: inherited interactions are removed with the divergence probability.

Mutation with rate 1 − (duplication rate).

[Figure: protein family network and protein interaction network.]

Model

Page 12

Model: Stage 0

1) N0 = 3 proteins in the beginning.

2) They interact with one another, forming a complete network of size N0: ki(t=0)=2 for i=1,2,3.

3) Each protein constitutes a protein family: Nf(t=0) = 1.

4) Each family contains a single domain: Df(t=0) = 1.

[Figure at t=0; legend: protein, protein family, protein interaction, protein family link.]

Model (details)
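The Stage 0 configuration can be written down as a small data structure. This is a minimal sketch under stated assumptions: the function name `initial_state` and the dictionary keys are hypothetical, and linking the three initial families to one another (mirroring the protein interactions) is an assumption based on the family links drawn in the t=0 figure.

```python
def initial_state(n0=3):
    """Build the Stage 0 configuration: n0 fully connected proteins,
    each forming its own single-member, single-domain family."""
    proteins = range(n0)
    return {
        # Complete interaction network: every protein has degree n0 - 1.
        "adj": {i: {j for j in proteins if j != i} for i in proteins},
        # Protein i starts in its own family, labeled i.
        "family_of": {i: i for i in proteins},
        # N_f(t=0) = 1: one member per family.
        "family_size": {f: 1 for f in proteins},
        # D_f(t=0) = 1: one domain per family.
        "domains": {f: 1 for f in proteins},
        # Assumption: families are linked wherever their proteins interact.
        "family_links": {f: {g for g in proteins if g != f} for f in proteins},
    }

state = initial_state()
```

Keeping the protein network and the family network as two separate adjacency maps makes it easy to check the family constraint later without recomputing it from the protein links.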

Page 13

Model: Stage 1

1) With the duplication rate, a randomly chosen protein is duplicated.

2) Each inherited interaction is removed with the divergence probability.

3) The new protein establishes a new protein family.

4) The initial protein family links are determined by the protein interactions.

5) If no interaction is left, the new protein belongs to the original family.

[Figure labels: duplication with the duplication rate; inherited interactions removed with the divergence probability.]

Model (details)
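One duplication and divergence event following the five rules above might be sketched as below. This is an illustration, not the authors' implementation: the function name `duplicate`, the argument names, and the integer labeling of proteins and families are hypothetical, and rule 4 is read here as "the new family is linked to the families of the proteins that the duplicate still interacts with", which is an assumption.

```python
import random

def duplicate(adj, family_of, family_links, div_prob, rng, new_family=True):
    """Duplicate a random protein, then remove each inherited link
    with probability div_prob. Modifies the arguments in place."""
    parent = rng.choice(sorted(adj))
    child = max(adj) + 1  # proteins are labeled by consecutive integers
    # Divergence: each inherited interaction survives with probability 1 - div_prob.
    kept = {j for j in adj[parent] if rng.random() >= div_prob}
    adj[child] = set(kept)
    for j in kept:
        adj[j].add(child)
    if new_family and kept:
        # Rule 3: the duplicate founds a new family.
        fam = max(family_links) + 1
        family_of[child] = fam
        family_links[fam] = set()
        # Rule 4 (assumed reading): link the new family to its partners' families.
        for j in kept:
            g = family_of[j]
            family_links[fam].add(g)
            family_links[g].add(fam)
    else:
        # Rule 5: with no interaction left, stay in the original family.
        family_of[child] = family_of[parent]
    return child

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
family_of = {0: 0, 1: 1, 2: 2}
family_links = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
rng = random.Random(1)
first = duplicate(adj, family_of, family_links, 0.0, rng)   # keeps every link
second = duplicate(adj, family_of, family_links, 1.0, rng)  # loses every link
```

Calling the same function with `new_family=False` keeps the duplicate in its parent's family, which corresponds to the later stage of the model in which the family map is fixed.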

Page 14

Model: Stage 1-2

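1) With rate 1 − (duplication rate), a randomly chosen protein mutates.

2) The mutating protein i gains a new interaction with another protein j, not previously linked to it, chosen with the probability

$$\frac{D_{F_j}}{\sum_{l:\, F_l \leftrightarrow F_i} D_{F_l}},$$

where $F_i$ denotes the family of protein i and $D_F$ the number of domains of family F.

3) The condition $F_l \leftrightarrow F_i$ (family $F_l$ is compatible with family $F_i$) sets the constraint on family compatibility.

4) $D_{F_i}$ is increased by 1.

[Figure label: mutation with rate 1 − (duplication rate).]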

Model (details)
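The constrained partner choice in the mutation step can be sketched as a weighted draw. This is a minimal sketch and not the authors' code: the function name `mutate` and its arguments are hypothetical, compatibility is taken to mean "linked in the family network" as stated earlier in the talk, and whether a family counts as compatible with itself is not specified in the slides, so this sketch assumes it does not.

```python
import random

def mutate(i, adj, family_of, family_links, domains, rng):
    """Give protein i one new partner j drawn with weight D_{F_j} among
    unlinked proteins in compatible families. Returns j, or None if no
    candidate exists. Modifies adj and domains in place."""
    fi = family_of[i]
    # Candidates: not i itself, not already linked, and in a compatible family.
    candidates = [l for l in sorted(adj)
                  if l != i and l not in adj[i]
                  and family_of[l] in family_links[fi]]
    if not candidates:
        return None
    # Weight of candidate l is the domain count of its family, D_{F_l}.
    weights = [domains[family_of[l]] for l in candidates]
    j = rng.choices(candidates, weights=weights, k=1)[0]
    adj[i].add(j)
    adj[j].add(i)
    domains[fi] += 1  # D_{F_i} is increased by 1
    return j

adj = {0: set(), 1: set(), 2: set()}
family_of = {0: "A", 1: "B", 2: "C"}
family_links = {"A": {"B"}, "B": {"A"}, "C": set()}
domains = {"A": 1, "B": 1, "C": 5}
rng = random.Random(0)
partner = mutate(0, adj, family_of, family_links, domains, rng)
```

Here protein 2 is never chosen even though its family has the largest weight, because family C is not linked to family A; the constraint filters candidates before the weights are applied.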

Page 15

Model: Stage 2 (family map is fixed)

1) With the duplication rate, a randomly chosen protein is duplicated, and the duplicate becomes a member of the original family. (It again diverges with the same probability as before, to keep the model minimal.)

2) With rate 1 − (duplication rate), a randomly chosen protein mutates with the same constraint as in Stage 1.

Model (details)

Page 16

Parameters for the simulation:

1) The duplication rate = 0.8 and the divergence ratio = 0.7.

- The divergence ratio is fixed to accommodate the high level of “sequence diversity” within a family.

- The duplication rate is tuned to match the empirical average degree ~ 6.4.

2) Family creation lasts until = 1000, when the number of families becomes about 500.

3) Evolution lasts until the total number of proteins reaches 6000, the size of the yeast proteome.

Model

Page 17

Snapshot

N = 600, NF = 62

Results

Page 18

$p_d(k) \sim (k+k_0)^{-\gamma}$, $C \sim 0.1$, $k_{nn}(k) \sim k^{-\nu}$

Results

Protein interaction network

[Figure legend: yeast data; model with family constraint; model without family constraint.]

Page 19

Degree-correlation profile

[Figure: degree-correlation profile in the (log10 k, log10 k’) plane, shown as two color-coded panels: Yeast and Model.]

z = log10[P(k,k’)/Prand(k,k’)]

P(k,k’): probability that a randomly chosen link connects proteins with degrees k and k’.

Results
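A degree-correlation profile of this kind can be estimated from an edge list as sketched below. The slides do not say how Prand is obtained; this sketch assumes the uncorrelated expectation Prand(k,k’) = q(k) q(k’), where q(k) is the fraction of link ends attached to degree-k vertices, rather than an average over explicitly rewired networks. The function name `degree_correlation_profile` is hypothetical.

```python
import math
from collections import Counter

def degree_correlation_profile(edges):
    """Return {(k, k'): z} with z = log10[P(k,k') / Prand(k,k')].

    edges is a list of (u, v) pairs of an undirected simple graph.
    Prand is the uncorrelated expectation q(k) q(k'), an assumption of this sketch.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    joint = Counter()  # link counts per degree pair, both orientations
    ends = Counter()   # number of link ends attached to degree-k vertices
    for u, v in edges:
        ku, kv = degree[u], degree[v]
        joint[(ku, kv)] += 1
        joint[(kv, ku)] += 1
        ends[ku] += 1
        ends[kv] += 1
    total = 2 * len(edges)
    profile = {}
    for (k, kp), n in joint.items():
        p = n / total
        p_rand = (ends[k] / total) * (ends[kp] / total)
        profile[(k, kp)] = math.log10(p / p_rand)
    return profile

star = degree_correlation_profile([(0, 1), (0, 2), (0, 3)])
ring = degree_correlation_profile([(0, 1), (1, 2), (2, 3), (3, 0)])
```

Positive z marks degree pairs that are linked more often than expected at random and negative z marks suppressed pairs; only degree pairs that actually occur on some link appear in the result.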

Page 20

Results: Clustering of clustering

z = log10[P(c,c’)/Prand(c,c’)]

[Figure: z in the (log10 c, log10 c’) plane, shown as two color-coded panels: Yeast and Model (duplication rate = 0.8, divergence ratio = 0.7).]

Page 21

Results: Statistics

Item                                 model           yeast
Total number of proteins n           6000            ~6000
Number of interacting proteins N     ~5000           4929
Average degree ⟨k⟩                   6.5 ± 0.2       6.41
Clustering coefficient C             0.13 ± 0.2      0.128
Assortativity r                      -0.09 ± 0.04    -0.13
Size of the largest component N1     4900 ± 7        4832

Results
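Several of the quantities in this table can be computed from an edge list with the standard library alone. The following is a minimal sketch with a hypothetical function name, `network_statistics`; it assumes the average degree is taken over interacting proteins only and that r is the Pearson correlation of the degrees at the two ends of a link, and it expects at least one edge.

```python
from collections import deque

def network_statistics(edges):
    """Return N, <k>, r and N1 for an undirected simple graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n_interacting = len(adj)
    mean_degree = sum(len(nb) for nb in adj.values()) / n_interacting

    # Assortativity r: Pearson correlation of end degrees, each link in both orientations.
    pairs = [(len(adj[u]), len(adj[v])) for u in adj for v in adj[u]]
    m1 = sum(k for k, _ in pairs) / len(pairs)
    m2 = sum(k * k for k, _ in pairs) / len(pairs)
    mkk = sum(k * kp for k, kp in pairs) / len(pairs)
    var = m2 - m1 * m1
    r = (mkk - m1 * m1) / var if var > 1e-12 else float("nan")

    # Size of the largest connected component, by breadth-first search.
    seen, largest = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            x = queue.popleft()
            size += 1
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        largest = max(largest, size)
    return {"N": n_interacting, "mean_degree": mean_degree,
            "r": r, "N1": largest}

# Toy example: a three-leaf star plus one separate edge.
stats = network_statistics([(0, 1), (0, 2), (0, 3), (4, 5)])
```

For a regular graph every link end has the same degree, so the variance vanishes and r is reported as NaN rather than as a number.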

Page 22

Modular clustering is preserved under edge shuffling that respects the family constraint.

Results

[Figure: three panels: model network before shuffling; model network shuffled with the family constraint; model network shuffled without the family constraint.]
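The slides do not specify the shuffling procedure. One plausible reading, sketched here as an assumption, is a degree-preserving double-edge swap in which a proposed swap is accepted only if both new links join compatible families. The names `shuffle_edges` and `compatible` are hypothetical.

```python
import random

def shuffle_edges(edges, family_of, compatible, n_swaps, rng, use_constraint=True):
    """Degree-preserving edge shuffling with an optional family constraint.

    compatible(f, g) returns True if families f and g may interact.
    Returns a new edge list; the input list is not modified.
    """
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        a, b = rng.sample(range(len(edges)), 2)
        (u, v), (x, y) = edges[a], edges[b]
        # Proposed rewiring: u-v, x-y  ->  u-y, x-v.
        if len({u, v, x, y}) < 4:
            continue  # would create a self-loop or touch a shared vertex
        if frozenset((u, y)) in present or frozenset((x, v)) in present:
            continue  # would create a duplicate link
        if use_constraint and not (
            compatible(family_of[u], family_of[y])
            and compatible(family_of[x], family_of[v])
        ):
            continue  # family constraint violated, reject the swap
        present -= {frozenset((u, v)), frozenset((x, y))}
        present |= {frozenset((u, y)), frozenset((x, v))}
        edges[a], edges[b] = (u, y), (x, v)
    return edges

family_of = {0: "A", 1: "A", 2: "B", 3: "B", 4: "C", 5: "C"}
links = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
compatible = lambda f, g: g in links[f]
original = [(0, 2), (1, 3), (2, 4), (3, 5)]
shuffled = shuffle_edges(original, family_of, compatible, 200, random.Random(0))
```

Running the same function with `use_constraint=False` gives the unconstrained shuffle, so the two randomized ensembles can be compared on equal footing.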

Page 23

Protein family network

p (x)~(x+x0)

Results

Page 24

• “Family constraint” is introduced as a mechanism behind the emergence of modularity in evolving networks.

• We applied it to the protein-protein interaction network and the protein family network of yeast, and obtained detailed agreement with the empirical data in the topological properties.

• Our result suggests that the physical constraint encoded in the domain structure of proteins is crucial in the organization of protein interaction networks.

• The concept can be applied in other systems, e.g., the Internet (domains/hosts) and the social networks (social parties/individuals).

Summary