gene duplication models and reconstruction of gene regulatory network evolution from network...

Post on 14-Dec-2015


Gene duplication models and reconstruction of gene regulatory network evolution from network structure

Juris Viksna, David Gilbert

Riga, IMCS, 10.02.2006

Gene regulatory networks

[J.Rung,T.Schlitt,A.Brazma,K.Freivalds,J.Vilo Bioinformatics 18 S2 (ECCB), 202-210 ]

[Figure: yeast network]

Gene regulatory networks

• Directed graph

• Graph vertices correspond to genes

• An edge from gene A to B means that gene B is (directly) regulated by gene A
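As a minimal sketch of this representation (not taken from the slides), a regulatory network can be stored as two adjacency maps built from (regulator, target) pairs. The function name `build_network` and the gene labels are hypothetical placeholders.

```python
from collections import defaultdict

def build_network(regulations):
    """Build adjacency sets from (regulator, target) pairs.

    An edge A -> B means that gene B is directly regulated by gene A.
    """
    targets = defaultdict(set)     # gene -> genes it regulates
    regulators = defaultdict(set)  # gene -> genes that regulate it
    for a, b in regulations:
        targets[a].add(b)
        regulators[b].add(a)
    return targets, regulators

# Hypothetical toy network; the gene names are placeholders.
targets, regulators = build_network([("A", "B"), ("A", "C"), ("B", "C")])
```

Keeping both directions makes it cheap to ask either "what does A regulate?" or "what regulates C?".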

Properties of gene networks (1)

• Believed to be scale-free (vertex degrees satisfy a so-called power law):

N(k) – number of vertices with degree k

$N(k) \propto k^{-\beta}$


[F.Chung,L.Lu,T.Dewey,D.Gallas JCB 10, 677-687]
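One rough way to check a power law on a concrete network is to tabulate N(k) and fit a straight line to log N(k) against log k. The sketch below assumes an undirected view of the network (in- and out-edges counted together); the helper names are hypothetical, and a least-squares fit on a log-log plot is only a crude estimator of the exponent.

```python
from collections import Counter
import math

def degree_histogram(edges):
    """Return a mapping k -> N(k), the number of vertices with degree k."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())

def loglog_slope(hist):
    """Least-squares slope of log N(k) versus log k.

    For an exact power law N(k) = c * k**(-beta) the slope equals -beta.
    """
    pts = [(math.log(k), math.log(n)) for k, n in hist.items() if k > 0 and n > 0]
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx
```

For real data, the low-count tail of the histogram is noisy, so the fitted slope should be read as indicative only.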

Properties of gene networks (2)

• Believed to have a noticeable modularity

i – vertex

$k_i$ – number of neighbours of vertex i

$n_i$ – number of direct links between these $k_i$ neighbours

Clustering coefficient (for vertex i):

$C_i = \frac{2 n_i}{k_i (k_i - 1)}$


[E.Ravasz,A.Somera,D.Mongru,Z.Oltvai,A.Barabasi Science 297, 1551-1555]
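The clustering coefficient of a vertex is the number of links among its neighbours divided by the maximum possible number of such links, k(k-1)/2 for k neighbours. A minimal sketch, assuming the network is treated as undirected and stored as a dict of neighbour sets (the function name is hypothetical):

```python
def clustering_coefficient(adj, i):
    """Clustering coefficient of vertex i.

    adj maps each vertex to the set of its neighbours (undirected graph).
    Returns 0.0 for vertices with fewer than two neighbours, where the
    ratio is undefined.
    """
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Count each link between two neighbours of i exactly once.
    n = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
    return 2.0 * n / (k * (k - 1))

# Toy graph: triangle 1-2-3 plus a pendant vertex 4 attached to 1.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
```

In this toy graph vertex 1 has three neighbours with one link among them, so its coefficient is 1/3, while vertex 2 sits in a closed triangle and gets 1.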

Network evolution models (1)

[A.Barabasi, R.Albert Science 286, 509-512]

(i) networks expand continuously by the addition of new vertices,

(ii) new vertices attach preferentially to sites that are already well connected.

A model based on these two ingredients reproduces the observed stationary scale-free distributions.
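The two ingredients can be sketched in a few lines. This is an illustrative simulation, not the authors' code: the function name, the seed graph (a single edge) and the parameter `m` (edges added per new vertex) are assumptions made here.

```python
import random

def preferential_attachment(n, m=1, seed=0):
    """Grow an undirected graph to n vertices by preferential attachment.

    (i) one new vertex is added per step;
    (ii) it links to m distinct existing vertices chosen with probability
         proportional to their current degree.
    """
    rng = random.Random(seed)
    edges = [(0, 1)]
    endpoints = [0, 1]  # every vertex appears once per incident edge
    for v in range(2, n):
        chosen = set()
        while len(chosen) < min(m, v):
            # Sampling from the endpoint list is degree-proportional.
            chosen.add(rng.choice(endpoints))
        for u in sorted(chosen):
            edges.append((v, u))
            endpoints += [v, u]
    return edges

edges = preferential_attachment(100, m=2)
```

Sampling from the list of edge endpoints is a common trick: a vertex of degree d appears d times in the list, so a uniform draw is automatically degree-weighted.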

Network evolution models (2)

"Hierarchical" model

[E.Ravasz,A.Somera,D.Mongru,Z.Oltvai,A.Barabasi Science 297, 1551-1555]

Sample hierarchical networks (scale-free and modular)

Network evolution models (3)

"Duplication" model

Scale-free with power-law exponent β < 2 for ½ < p < 1

[F.Chung,L.Lu,T.Dewey,D.Gallas JCB 10, 677-687]
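The slide's diagram is not preserved, so the following is only a sketch of a partial duplication step as it is commonly described: a random existing vertex is copied, and each of its edges is inherited by the copy with probability p. The seed graph, the function name and the decision to never link the copy to its template are assumptions made here.

```python
import random

def partial_duplication(n, p, seed=0):
    """Grow an undirected graph to n vertices by partial duplication.

    At each step a random existing vertex u is duplicated; the copy
    keeps each edge of u independently with probability p.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # assumed seed graph: a single edge
    for v in range(2, n):
        u = rng.randrange(v)  # template vertex to duplicate
        adj[v] = set()
        for w in list(adj[u]):
            if rng.random() < p:
                adj[v].add(w)
                adj[w].add(v)
    return adj

adj = partial_duplication(200, 0.7)
```

With p = 1 every copy is an exact twin of its template, which is the situation the reconstruction problems later in the talk try to detect.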

Network evolution models (4)

Network evolution models (M1)

[Plot: M1, p = 0.1, 5000 vertices; annotated value: 4.5]

[Plot: M1, p = 0.01, 5000 vertices; annotated value: 3]

[Plot: M1, p = 0.05, d = 0.2, 5000 vertices; annotated value: 2.5]

Network evolution models (M1)

M1

V       E
20      40
50      200
100     700
500     15000
1000    50000
5000    800000

Network evolution models (M2)

[Diagram: genome evolution step on vertices A, X and X']

Network evolution models (M2)

[Diagram: genome evolution step on vertices A, X and X', with two alternative outcomes ("or")]

Network evolution models (M2)

[Plot: M2, p = 0.1, 20000 vertices; annotated value: 1]

Network evolution models (M2)

M2

V       E
20      40
50      80
100     150
500     700
1000    1500
5000    7000

Evolution graphs

k+2 vertices

two types of edges:

- for swappable events (black)

- for dependent events (grey)


Evolution graphs

Initial graph G

Graph G' obtained from G after k (in this example k=6) evolution steps

Intermediate graphs between G and G' correspond to cuts of the evolution graph (G and G' can also be obtained in this way)

Numbered vertices correspond to evolution steps and are marked by the vertices duplicated in the corresponding steps

Evolution graphs – some questions

Equivalence: Decide whether 2 given evolution graphs are equivalent

Irreducible networks – networks that can’t be obtained from simpler networks by an evolution graph

Uniqueness of evolution: Is it possible that D(G1,E1)= D(G2,E2) for two different irreducible networks G1 and G2?

"Reverse engineering" problems

Given: G'

Reconstruct: G, E

"Reverse engineering" problem (1)

(Assuming either model M1 or M2.)

Reconstruction of evolution graph

For a given network N’ find an irreducible network N, the sequence of duplication events D1,...,Dm and the corresponding evolution tree, such that N’=D(N,E).

"Reverse engineering" problem (2)

(Assuming either model M1 or M2.)

Reconstruction of duplication event

For a given network N’ find a network N and a duplication event D, such that N’=D(N).
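In the simplest setting, where a duplicated gene inherits all regulators and all targets of the original and nothing diverges afterwards, candidate duplication events show up as groups of genes with identical in- and out-neighbourhoods. The sketch below finds such groups; the no-divergence assumption and the function name are introduced here for illustration and are not the slides' models M1 or M2.

```python
from collections import defaultdict

def duplicate_candidates(edges):
    """Group genes that have identical regulators and identical targets.

    edges is a list of (regulator, target) pairs. Each returned group
    with more than one gene is a candidate result of duplication.
    """
    ins, outs = defaultdict(set), defaultdict(set)
    genes = set()
    for a, b in edges:
        outs[a].add(b)
        ins[b].add(a)
        genes.update((a, b))
    groups = defaultdict(list)
    for g in sorted(genes):
        key = (frozenset(ins[g]), frozenset(outs[g]))
        groups[key].append(g)
    return [grp for grp in groups.values() if len(grp) > 1]

# Toy network N': A regulates X and X', both of which regulate B.
pairs = duplicate_candidates([("A", "X"), ("A", "X'"), ("X", "B"), ("X'", "B")])
```

Removing one gene from a returned group gives a smaller network N from which N' can be obtained by a single duplication, under the stated assumption.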

"Reverse engineering" problem (3)

(Assuming either model M1 or M2.)

Reconstruction of the largest duplication event

For a given network N’ find a network N with the smallest possible number of genes and a duplication event D, such that N’=D(N).

"Reverse engineering" - complexity

For a given network N’ find a network N with the smallest possible number of genes and a duplication event D, such that N’=D(N).

• at least as hard as graph isomorphism problem

• likely NP-hard (maximum clique for reconstruction graphs)

• reconstruction graphs are much smaller than networks

• still might be practically solvable for random graphs of reasonable size (a few tens of thousands of vertices).

Algorithm – stage 1

Partition G' vertices into orbits

Can be done e.g. with nauty package

One can try to use some property p which is simpler to compute than automorphisms and is such that p(G1)=p(G2) for isomorphic graphs G1 and G2.
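One well-known invariant of this kind is colour refinement: vertices start with their degree as a colour and are repeatedly re-coloured by the multiset of their neighbours' colours until the partition stops changing. This is a swapped-in illustration, not the property used in the talk. The resulting classes are unions of orbits, so they only give candidates; true orbits still need an automorphism tool such as nauty. The sketch assumes an undirected graph stored as a dict of neighbour sets.

```python
def refine_partition(adj):
    """Partition vertices by colour refinement.

    Vertices in the same automorphism orbit always end up in the same
    class; the converse does not hold in general.
    """
    color = {v: len(adj[v]) for v in adj}  # initial colour: degree
    while True:
        # New signature: own colour plus sorted colours of neighbours.
        sig = {v: (color[v], tuple(sorted(color[w] for w in adj[v])))
               for v in adj}
        ids = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new = {v: ids[sig[v]] for v in adj}
        # Each round can only split classes, so an unchanged class
        # count means the partition is stable.
        if len(set(new.values())) == len(set(color.values())):
            break
        color = new
    classes = {}
    for v in adj:
        classes.setdefault(new[v], []).append(v)
    return sorted(sorted(c) for c in classes.values())

# Toy example: a path on five vertices.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
```

On the path the classes are the two end vertices, the two vertices next to them, and the middle vertex, which here coincide with the true orbits.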

Reconstruction graphs

Vertices correspond to non-singleton orbits

Two types of edges:

- (1) have to participate in the same duplication event (solid)

- (2) cannot participate in the same duplication event (dotted)

Algorithm – stage 2

Find reconstruction graph

Algorithm – stage 3

Find the largest independent set (according to type 2 edges) in reconstruction graph
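Since reconstruction graphs are stated to be much smaller than the networks themselves, even an exhaustive search can serve as a first sketch of this stage. The code below considers only the "cannot participate together" (type 2) edges and ignores type 1 constraints; the function name and the orbit labels are hypothetical, and the running time is exponential in the number of vertices.

```python
from itertools import combinations

def largest_independent_set(vertices, conflicts):
    """Return a largest vertex subset containing no conflicting pair.

    conflicts is an iterable of 2-element pairs (type 2 edges).
    Tries subset sizes from largest to smallest and returns the first
    conflict-free subset found.
    """
    conflicts = {frozenset(e) for e in conflicts}
    vertices = list(vertices)
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if not any(frozenset(pair) in conflicts
                       for pair in combinations(subset, 2)):
                return set(subset)
    return set()

# Hypothetical reconstruction graph with four orbits and two conflicts.
best = largest_independent_set(
    ["O1", "O2", "O3", "O4"], [("O1", "O2"), ("O2", "O3")]
)
```

Here orbit O2 conflicts with both O1 and O3, so dropping O2 leaves the largest compatible selection.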

Algorithm – stage 4

- if all selected orbits contain just 2 nodes, we are practically done

- otherwise we have to find a pair of (largest) sets of vertices from selected orbits, which correspond to duplication event [currently exhaustive search]

Algorithm

Evolution graph can be reconstructed by repeated use of Largest duplication event

Algorithm - efficiency

- using nauty we can deal with networks with < 200 genes

- for larger graphs one can use heuristics to compute orbits

- vertex/edge counts at different DFS levels seem to work quite well

- likely to find a large part of duplication event

- for <200 vertices often gives the exact result

Algorithm – Model 2

General case – check automorphisms for all k-tuples of vertices

A serious problem even for k=2

However, large components are duplicated not that often

Previous algorithm could be used to find "large" part of duplicated genes

Still an open problem

Also, a question about good heuristics

Model 2 – Component sizes

Model M2: 550 vertices, 132 duplications

Model 2 – Component sizes

Constructing random network with 20000 genes:

Component sizes       # of events
1                     177008
2                     342
3                     97
4                     49
5                     37
6                     18
7                     13
8                     10
10,11,14              4
9,12,13,15,27         3
16,24                 2
17,18,21,22,31,27     1

Experiments with yeast network

6270 genes, 106 regulators

Experiments with yeast network

p=0.0001

E=106

V=216

Experiments with yeast network

277 pairs of duplication candidates were discovered

Few "real": COS5 and COS8, YLR460C and YNL134C

All 5962 genes were compared all-against-all using Smith-Waterman (SW)

Normalized compression score: ssearch_score(P1,P2)/min{length(P1),length(P2)}

Scores for the found duplication pairs were compared with average values
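The normalisation divides a local alignment score by the length of the shorter sequence. The sketch below illustrates it with a simplified Smith-Waterman scorer using fixed match/mismatch values and a linear gap penalty; these scoring parameters and the function names are assumptions made here, whereas ssearch itself works with substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b (linear gaps)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores never drop below zero.
            val = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            cur.append(val)
            best = max(best, val)
        prev = cur
    return best

def normalized_score(p1, p2, score=smith_waterman_score):
    """Alignment score divided by the length of the shorter sequence."""
    return score(p1, p2) / min(len(p1), len(p2))
```

Dividing by the shorter length keeps short proteins that align fully from being penalised relative to long ones.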

Experiments with yeast network

Observed distances vs average, all non-adjacent gene pairs
