rna bioinformatics - dartmouth computer sciencerockmore/csss2006/stadler...rna bioinformatics peter...
Post on 14-Oct-2020
1 Views
Preview:
TRANSCRIPT
RNA Bioinformatics
Peter F. Stadler
Bioinformatics Group, Dept. of Computer Science & Interdisciplinary Center forBioinformatics, University of Leipzig
Institute for Theoretical Chemistry, Univ. of Vienna (external faculty)The Santa Fe Institute (external faculty)
CSSS, June 2006
Overview
PART 1: RNA Structures and How to Compute Them
PART 2: RNA Landscapes
PART 3: The Modern RNA World
PART1Why RNA?
until relatively recently:Central Dogma of Molecular BiologyDNA → RNA → ProteinDNA = “genetic memory”, RNA = working copy, proteins dothe work
around 1980: discovery of catalytic RNAs (Nobelprize for TomCech and Sidney Altman)nevertheless long considered “exotic” remnants from theancient RNA world
around 2000: structure of the ribosome showns that theribosome is an “RNA enzyme”
around 2000: microRNAs are discovered as a large class ofregulatory RNAs that inhibit translation of proteins
2006: the ENCODE project shows that human geneexpression is quite different from textbook knowledge
RNA Bioinformatics
RNA Secondary Structures are an appropriate level of description
explain the thermodynamics of RNA Structures
often highly conserved in evolution
can be computed efficiently
Many Functional RNAs are Structured
(a) Group I intron P4-P6 domain(b) Hammerhead ribozyme(c) HDV ribozyme
(d) Yeast tRNAphe
(e) L1 domain of 23S rRNA
Hermann & Patel, JMB 294, 1999
The RNA Model
GCGGGAAU
AGCUC
AGUUGG U A
G A G CA
CGA
CC
UU
GC C
AAGGUCGGGGU
CG C G A G
U U CGA
GUCUCGU
UUCCCGC
UC
CA
GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA
Formal Definition
A secondary structure on a sequence s is a collection of pairs (i , j)with i < j such that
Base pairing rules are respected, i.e., (i , j) ∈ Ω implies (si , sj)form an allowed pair (GC, CG, AU, UA, GU, UG)
Each base is involved in at most one pair, i.e., Ω is amatching, (i , j), (i , k) ∈ Ω implies j = k and (i , k), (j , k) ∈ Ωin implies i = j .
(i , j)Ω implies |j − i | > 3 (sterical constraint)
No-crossing rule: (i , j), (k, l) ∈ Ω and i < k implies eitheri < k < l < j or i < j < k < l .This excludes so-called pseudoknots
Let’s count the structures . . .
Counting secondary structures. Given a sequence of length n.Πkl = 1 if sequence positions k, l can form a pair GC, CG, AU,
UA, GU, UG and Πkl = 0 otherwise.Nkl = number of structures of the subsequence from k to l .Basic recursion:
• xxxxxxx +∑
(xxxx)xxxx
Nkl = Nk+1,l +l∑
j=k+m
ΠkjNk+1,j−1Nj+1,l
RNA Folding in a nutshell
i jj i i+1 j i i+1 k−1 k k+1|=
Nij = Ni+1,j +∑
k(i,k)pair
Ni+1,k−1Nk+1,j
Eij = minEi+1,j + min
k(i,k)pair
(Ei+1,k−1 + Ek+1,j + εik)
Zij = Zi+1,j +∑
k(i,k)pair
Zi+1,k−1Zk+1,j exp(−εik/RT )
Partition function: Z =∑
Ω exp(−E (Ω)/RT )
A word on the Partition Function
The partition function is the link between the combinatorics of thestructures (in general: states in an ensemble) and thethermodynamic properties of the physical ensemble, e.g.:
Free energy G = −RT lnZ
Expected Energy 〈E 〉 = RT 2 ∂ lnZ∂T
Heat Capacity Cp = −T ∂2G∂T 2
0 20 37 100
0
1
2
3
4
5
CUGUAUUGUUGUAUAGCCCGUGUGGUAAUAUGG
T [C]
C(T
) [k
cal•(
mol
•K)-1
]
Realistic Energy Model
GC
U
A
C
G
closing base pair
A
5
3
3
5
A
5
3
closing base pair
interior base pairsclosing base pair
3
C
U
A
A
U G
C
closing base pair
multi-loop
interior base pair
A U
interior base pair
interior base pair
hairpin loop
bulge
C
G
C U
A
closing base pair
stacking pair
interior loop
G
G
G
3
5A
G
C
CA
5
Parameters from large number of melting experiments by Douglas
Turner, David Matthews, John Santa Lucia, and others
Recursions for Linear RNAs
M1 M1
M1
i u u+1|
MC
= |
= | |
FC
i j i+1 j i
hairpin Cinterior
i j i i k l j
k k+1 j
=
C
F F
i j
M
=i ij j−1
j
j| C
i j
M
ui+1 u+1i j−1 j
j i
M
j−1 ji u u+1|C
Recursions for Linear RNAs
Fij free energy of the optimal substructure on the subsequencex [i , j].
Cij free energy of the optimal substructure on the subsequencex [i , j] subject to the constraint that i and j form a base pair.
Mij free energy of the optimal substructure on the subsequencex [i , j] subject to the constraint that that x [i , j] is part of amultiloop and has at least one component, i.e., asub-sequence that is enclosed by a base pair.
M1ij free energy of the optimal substructure on the subsequence
x [i , j] subject to the constraint that that x [i , j] is part of amultiloop and has exactly one component, which has theclosing pair i , h for some h satisfying i ≤ h < j .
Recursions for Linear RNAs
Fij = minFi+1,j , min
i<k≤jCik + Fk+1,j
Cij = minH(i , j), min
i<k<l<jCkl + I(i , j ; k, l),
mini<u<j
Mi+1,u + M1u+1,j−1 + a
Mij = min
mini<u<j
(u − i − 1)c + Cu+1,j + b,
mini<u<j
Mi ,u + Cu+1,j + b, Mi ,j−1 + c
M1ij = min
M1
i ,j−1 + c , Cij + b
Backward Recursion: Base Pairing Probabilities
pij =Z1,i−1Zi ,jZj+1,n
Z1,n+
∑
k<i
∑
l>j
pklΞij ,kl .
Ξij ,kl is a ratio of the two partition functions:
Zij ,kl ... both i , j and k, l pair
Zkl ... k, l pair.Simplest case:Zij ,kl = Zk+1,i−1ZijZj+1,l−1ζkl where ζkl = exp(−βkl/RT ) is theBoltzman factor of the pairing energy
Backward recursion: full modelBackward recursion:
Pkl = Pkl +
∑
p<k;q>l
Ppq
ZBk,l
ZBp,q
e−I(p,q,k,l)
+
∑
p<u<k
ZMp+1,uZ
M1u+1,k−1)
e−(a+(q−l−1)c)
+
∑
l<u<q
ZMl+1,uZ
M1v+1,q−1)
e−(a+(k−p−1)c)
+ ZMp+1,k−1Z
Ml+1,q−1
k lp q
Single-Stranded Circular RNAs
Viroid RNA
Hepatitis Delta Virus Genome
Cryptic by-products of splicing formed intronic sequence
Circularized C/D box snoRNAs were recently reported inPyrococcus furiosus
Synthetic constructs for in vitro selection
Circular, Linear, and Interacting RNAs
In the maximum matching case=⇒ same algorithm for all three cases
CIRCULAR FOLDING LINEAR FOLDING BINARY COFOLDING
i
j j j
i i
1n
1
n
1
n’
n
1’
Linear versus Circular Folding
Linear folding: energy contributions inside a pair (i , j) only.Co-folding: additional contribution for loop spanning [n, 1].
i
j
no energy contribution for external loop
i
j
1
n
k
l
p
q
1n
no external loop
extra contribution
external loop
Strategy 1 (e.g. Michael Zucker’s mfold)For each pair (i , j): compute energy both inside and outsidethe pair⇒ doubles memory requirements
Strategy 2 (Vienna RNA Package)First compute linear folding energies. Then compute energiesfor the loop spanning [n, 1].
1
n
1
n
1
n
hairpin loop interior loop or bulge multi−branch loop
Implementing Circular Folding
Relative to linear folding, only the loop containing the cut has tobe re-evaluated.Three cases: cut in Hairpin, Interior-, or Multi-loop
F = minF H ,F
I ,F
M
M1 M1
= | |n1
n1
n1
n1C
C
C
kl
p
interior
hairp
in kk+1
ql
k
F o
M2
M
k n
=
k nu u+1
M 2
Exterior Hairpin.
F H = min
p<qCpq + H(q, p)
Exterior Interior Loop.
F I = min
k<l<p<qCpq + Ckl + I(q, p, l , k)
Exterior Multi-Loop.
Modified decomposition: one or more components M1,k +exactly two components M2
k+1,n
M2kn = min
k<u<n
(M1
ku + M1u+1,n
)
F M = min
1<k<n
M1,k + M2
k+1,n + a
Folding energy: F = minF H ,F
I ,F
M
Application
sof
Circu
larFold
ing
Itdoes
make
adiff
erence
120
4060
80100
120140
160180
200220
240260
284starting point of linearized sequence
0 10 20 30 40 50
Structure distance
-6 -4 -2 0 2 4Energy difference
CU
GGGGAAUUUCUCUGCGGGACCAA
AUAA
A AACAGCUUGUGGAGGGAACAUACCUGAAGAG
GG
AUCCCCGGGG
AA
A UCU
CUUCAG
AC
UCGUCGAGGGGAGGGCG
CCGCGGAUC
ACUGG
CGUCCAGC
ACC
GGAA
CAGGAG
CU C G
UCUCCUU
CCU
UCC
AU
CGCUGGCU
CCA
CAUCCGAUCG
UCGCUUCUUCCUUCGCGA
CCUGAG
AU
AGA A
ACU
ACCCGGUG
GAUA
CA
ACUCUUGGGUUGUUCCUCCCAGGCUUGUU A
AUAA
AAAU
GGCCCGCGUUUG
AGACCCC
U
Citru
sviro
idIV
RNAfold-circ
inth
eViennaRNAPackage
Local structures
Idea: Restrict Recursion to base pairs (i , j) with j − i < L.Special interest in robust structures:Z
u,Lij . . . partition function of sub-sequence [i , j] when sequence
window [u, u + L] is folded
pu,Lij . . . probability that i and j form a base pair when window
[u, u + L] is folded.
Zu,Lij =
Zij if [i , j] ⊆ [u, u + L]0 otherwise
pu,Lij =
Zu,L1,i−1Z
u,Li ,j Z
u,Lj+1,n
Zu,Lu,u+L
+∑
k<i
∑
l>j
pu,Lkl Ξu,L
ij ,kl
=Zu,i−1Zi ,jZj+1,u+L
Zu,u+L
+∑
k<i
∑
l>j
pu,Lkl Ξij ,kl .
Robust local structuresAverage probability of an (i , j) pair over all folding windowscontaining the sequence interval [i , j]
πLij =
1
L − (j − i) + 1
i∑
u=j−L
pu,Lij .
Direct Recursion:
πLij =
1
L − (j − i) + 1
iX
u=j−L
Zu,L1,i−1
bZu,Li,j Z
u,Lj+1,n
Zu,L1,n
| z π∗L
ij
+1
L − (j − i) + 1
iX
u=j−L
X
k<i
X
l>j
pu,Lkl
Ξij,kl
= π∗Lij +
i−1X
k=j−L
i+LX
l=j+1
kX
u=l−L
pu,Lkl
Ξij,kl
L − (j − i) + 1= π∗L
ij +
i−1X
k=j−L
i+LX
l=j+1
L − (k − l) + 1
L − (j − i) + 1πL
klΞij,kl .
(1)
C C U U G G C C A U G U A A A A G U G C U U A C A G U G C A G G U A G C U U U U U G A G A U C U A C U G C A A U G U A A G C A C U U C U U A C A U U A C C A U G G U G A U U U A G U C A A U G G C U A C U G A G A A C U G U A G U U U G U G C A U A A U U A A G U A G U U G A U G C U U U U G A G C U G C U U C U U A U A A U G U G U C U C U U G U G U U A A G G U G C A U C U A G U G C A G U U A G U G A A G C A G C U U A G A A U C U A C U G C C C U A A A U G C C C C U U C U G G C A C A G G C U G C C U A A U A U A C A G C A U U U U A A A A G U A U G C C U U G A G U A G U A A U U U G A A U A G G A C A C A U U U C A G U G G U U U G U U U U U U G C C U U U U U A U U G U U U G U U G G G A A C A G A U G G U G G G G A C U G U G C A G U G U A C A G U U G U G U A C A G A G G A U A A G A U U G G G U C C U A G U A G U A C C A A A G U G C U C A U A G U G C A G G U A G U U U U G G C A U G A C U C U A C U G U A G U A U G G G C A C U U C C A G U A C U C U U G G A U A A C A A A U C U C U U G U U G A U G G A G A G A A U A U U C A A A G A C A U U G C U A C U U A C A A U U A G U U U U G C A G G U U U G C A U U U C A G C G U A U A U A U G U A U A U G U G G C U G U G C A A A U C C A U G C A A A A C U G A U U G U G A U A A U G U G U G C U U C C U A C G U C U G U G U G A A C A C A C C U U C A U G C G U A U C U C C A G C A C U C A U G C C C A U U C A U C C C U G G G U G G G G A U U U G U U G C A U U A C U U G U G U U C U A U A U A A A G U A U U G C A C U U G U C C C G G C C U G U G G A A G A
133029828 133029088Human chromosome X (minus strand)
mir-92-2mir-19b-2mir-20bmir-18bmir-106a
Local structures (L=100) in a 740 nt region of human X chromosome
Cofold: How to deal with Concentration?
Algorithmically that same as linear foldingspecial energy contribution for “loop with the cut”
Additional energy contribution for forming duplex
At least 5 molecular species need to be taken into account(Dmitrov & Zuker, 2005): A, B , A2, B2, AB .
Their folding energies and partition functions are easilycomputed
Cofold
A U G A A G A U G A C U G U C U G U C U U G A G A C A
A U G A A G A U G A C U G U C U G U C U U G A G A C AAU
GA
AG
AU
GA
CU
GU
CU
GU
CU
UG
AG
AC
A
AU
GA
AG
AU
GA
CU
GU
CU
GU
CU
UG
AG
AC
A
AUGAAG
AUG A
CUG
UC
UGUC
UUG
AGACA
Dot plot (left) and mfe structure representation (right) of thecofolding structure of the two RNA molecules AUGAAGAUGA (red)and CUGUCUGUCUUGAGACA.
Cofold: Concentration dependencies
Q = V n a!b! × (Z ′A)nA(Z ′AA)nAA(Z ′AB)nAB (Z ′BB)nBB (Z ′B)nB
nA!nB ! 2 nAA! 2 nBB !nAB !
where a = nA + 2nAA + nAB . The system minimizes the freeenergy −kT lnQ.solving this optimization problem yields the equilibria:[AA] = KAA [A]2 , [BB ] = KBB [B ]2 . [AB ] = KAB [A] [B ].with [A] = 6.023 × 1023nA, etc., and
KAA =Z ′AA
(ZA)2=
(ZAA − (ZA)2)e−ΘI /RT/2
(ZA)2=
1
2e−ΘI /RT
(ZAA
(ZA)2− 1
)
KBB =1
2e−ΘI /RT
(ZBB
(ZB)2− 1
)
KAB = e−ΘI /RT
(ZAB
ZAZB− 1
)
1 10total siRNA concentration b [nmol]
0
10
20
conc
entr
atio
n [n
mol
]A.siA.AAsiA’.si’A’.A’A’si’
Example for the concentration dependency for two mRNA-siRNAbinding experiments.
RNAup: Small RNAs Binding to Large Ones
RNA folding excludes pseudoknots, i.e., non outerplanargraphs
cofold thus does not allow small RNA binding into loopregions of large ones
... but this happens in reality
Remedy: Compute energy/partition function
Pu[i , j] =Z [1, i − 1] × 1 × Z [j + 1,N]
Z︸ ︷︷ ︸exterior
+∑
p,qp<i≤j<q
Ppq ×Zpq[i , j]
Z b[p, q]︸ ︷︷ ︸enclosed
that subsequence [i , j] is unpaired and the energy of binding ashort molecule in this location
RNAup
q
i j
1n
p
lk
i
j i
j
k l
qp p q
1 n 1 n
p q
i j
1 n
i j
1 n
qp p q
i j
1 n
(a) (b’) (b”) (c) (d) (e)
Zpq[i , j ] = exp(−βH(p, q))︸ ︷︷ ︸(a)
+∑
p < i ≤ j < k or
l < i ≤ j < q
Z b[k , l ]e−βI (p,q;k,l)
︸ ︷︷ ︸(b)
+∑
p<i≤j<q
Zm2[p + 1, i − 1]e−βc(q−i)
︸ ︷︷ ︸(c)
+∑
p<i≤j<q
Zm[p + 1, i − 1]Zm[j + 1, q − 1]e−βc(j−i+1)
︸ ︷︷ ︸(d)
+∑
p<i≤j<q
Zm2[j + 1, q − 1]e−βc(j−p)
︸ ︷︷ ︸(e)
RNAup: Interaction part
Z I [i , j , i∗, j∗] =∑
i<k<ji∗>k∗>j∗
Z I [i , k, i∗, k∗]e−βI (k,k∗ ;j ,j∗)
Z ∗[i , j] = Pu[i , j]∑
i∗>j∗
Z I [i , j , i∗, j∗];
P∗[i , j] = Z ∗[i , j]/∑
k<l
Z ∗[k, l ]n (3’)
m (3’)
1 (5’)
i
j
k k*
1 (5’)j*
i*
RNAup: Application
0
0.2
0.4
0.6
0.8
1
Prob
abili
ties
VR1 straight VR1 HP5_16 VR1 HP5_11
0
0.2
0.4
Exp
ress
ion
Sequence position
160 180 160 180 160 180 160 180
VR1 HP5_6
1060 1080
-25
-20
-15
-10
-5
0
∆G
i [kc
al/m
ol]
Binding of siRNAs to VR mRNA.Pu[i , i ] (dashed line), P∗
i (thick black line), ∆Gi (thick red line).Below: activity of siRNA
Alternative Approach
Consider RNA Folding as a Machine Learning ProblemContext Free Grammar + probabilities for production rules⇒ Stochastic Context Free Grammarssee work by Sean Eddy, Jotun Hein, and collaborators
Folding Kinetics
RNA molecules may have kinetic traps which prevent them from reachingequilibrium within the lifetime of the molecule. Long molecules are oftentrapped in such meta-stable states during transcription.Possible solutions are
Stochastic folding simulations can predict folding pathways and finalstructures. Computationally expensive, few programs available.
Predicting structures for growing fragments of the sequence canshow whether large scale re-folding will occur during transcription.Cheap but inaccurate.
Analysis of the energy landscape based on complete suboptimalfolding can identify possible traps (local minima).
Kinetic Folding Algorithm
Simulate folding kinetics by a Monte-Carlo type algorithm:Generate all neighbors using the move-setAssign rates to each move, e.g.
Pi = min
1, exp
(−
∆E
kT
)
Select a move with probability proportional toits rateAdvance clock 1/
∑i Pi .
P4
P3
P5P6
P7
P8
P1 P2
Characterization of Landscapes
A landscape consists of a configuration space V , a move set within thatconfiguration space and an energy function f : V → R.Simplest move set for secondary structure: opening and closing of basepairs.Speed of optimization depends on the roughness of the Landscape.Measures of roughness suggested in the literature:
Number of local optima
Correlation lengths (e.g. along a random walk)
Lengths of adaptive walks
Folding temperature vs. glass temperature Tf /Tg
Energy barriers between the local optima. Especially, themaximum barrier height (“depth” in SA literature)
Energy barriers
E [s,w ] = min
max
[f (z)
∣∣z ∈ p] ∣∣∣∣ p : path from s to w
,
B(s) = minE [s,w ] − f (s)
∣∣w : f (w) < f (s)
Depth and Difficulty(borrowed from simulated annealing theory)
D = maxB(s)
∣∣s is not a global minimum
ψ = max
B(s)
f (s) − f (min)
∣∣∣∣s is not a global minimum
Energy Barriers and Barrier Trees
Some topological definitions:A structure is a
local minimum if its energy is lowerthan the energy of all neighbors
local maximum if its energy ishigher than the energy of all
neighbors
saddle point if there are at leasttwo local minima that can bereached by a downhill walk startingat this point
Calculating barrier trees
M1
M2
M3
S12
S23
M3
M3
M1
M1
M2
M2
S12
S12
S23
S23
M3
M1
M2
M3
M1
M2
S23
M3
M1
M2
S12
S23
The flooding algorithm:Read conformations in energy sortedorder.For each confirmation x we havethree cases:
(a) x is a local minimum if it hasno neighbors we’ve alreadyseen
(b) x belongs to basin B(s), if allknown neighbors belong toB(s)
(c) if x has neighbors in severalbasins B(s1) . . .B(sk) then it’sa saddle point that merges
these basins.Basins B(s1), . . . ,B(sk) arethen united and are assigned tothe deepest of local minimum.
Information from the Barrier Trees
Local minima Saddle points Barrier heights Gradient basins Partition functions and free energies of (gradient) basins Depth and Difficulty of the landscape
N.B.: A gradient basin is the set of all initial points from which a
gradient walk (steepest descent) ends in the same local minimum.
Energy Landscape of a Toy Sequence
G C U A U U A
GC
GC
G
UG
A
CG
UG
CG
U
UU
A
G C U A U UCG
UG
CG
U
UU
A
C
C
G
G
G
UG
A
A G C U A U UCG
UG
CG
U
UU
A
C C G G A
G
UG
A
G C U A U UCG
UG
CG
U
UU
A
C
C
A
G
UGGGA
G C U A G UCG
UG
CG
U
UU
A
G G G
AC
CU
U
A
G C G U G GCG
UG
CG
U
UU
A
G A
AU
C
CU
U
A
G U G G G ACG
UG
CG
U
UU
A
GC
AU
C
CU
U
A
G U
UG
CG
U
UU
A GC
AU
C
CU
U
A
C G G GG
G U
CG
U
UU
A
GC
AU
C
CU
U
A
C G
G GUGG
G U
GC
AU
C
CU
U
A
C G
GU
C GUUUAGGG
8 9 10
1 2 3
4567
A A
G U
GC
AU
C
CU
U
A
C G
GU
C G
UUUAGGG
11
U AA
Steps [arbitrary]
−8
−6
−4
−2
0
2
Ene
rgy
[kca
l/mol
]
1
2
3
4
5
6
7
8
9
10
11
2.45[5]
[9]3.80 1 [11]
1.60 3 [7]
13
141.30 9
1.40 192.10 11
3.80 2 [1][3] 12
1.20 8 [4]
173.60 4
2.20 73.61 5
1.40 161.90 10
1.50 152.00 205.02 6
3.90 18E
Folding Kinetics
Transition rates from x to y :
ryx = r0e−
E6=yx−E(x)
RT for x 6= y
rxx = −∑
y 6=x
ryx
Kinetics as a Markov process:
dpx
dt=
∑
y∈X
rxypy (t) .
Transition states:E 6=
yx = maxE (x),E (y)
or more complex models (Tacker et al 1994, Schmitz et al 1996)
Reduced Description of the Folding Dynamics
Macrostates = Classes of a partition of the state space.Partition function for a macro state:
Zα =∑
x∈α
exp(−E (x)/RT )
Free energy of a macro state:
G(α) = −RT ln Zα
rβα =∑
y∈β
∑
x∈α
ryxProb[x |α] for α 6= β
=1
Zα
∑
y∈β
∑
x∈α
ryxe−E(x)/RT
rβα “on flight” while executing the barriers program.Transition state free energy:
G6=βα = −RT ln
∑
y∈β
∑
x∈α
e−E6=xy
RT
12
3
4
5
6
7
8
910
11
12
13
14
0.0
2.0
4.0
6.0
1.2
1.5
2.1
2.8
1.1
0.9
2.4
2.8
0.5 1.3
4.7
0.9
1.2
0.8
2.0
lillyA simple model sequence
10-1
100
101
102
103
time
0
0.2
0.4
0.6
0.8
1
popu
latio
n pr
obab
ility
mfe23456
10-1
100
101
102
103
time
0
0.2
0.4
0.6
0.8
1
popu
latio
n pr
obab
ility
mfe23456
10-1
100
101
102
103
time
0
0.2
0.4
0.6
0.8
1
popu
latio
n pr
obab
ility
mfe23456
103
104
105
106
107
108
109
time
0
0.2
0.4
0.6
0.8
1
popu
latio
n pr
obab
ility
101
102
103
104
105
106
107
108
time
0
0.2
0.4
0.6
0.8
1
popu
latio
n pr
obab
ility
mfe255680
Refolding of a tRNA molecule.
Summary I:
RNA structures can be computed efficiently by means ofdynamic programming
Computations are based on a set of carefully measures energyparameters and an additive energy model
Algorithms exist for ground state energy and structure, fullpartition functions, density of states, interacting structures,. . .
The folding kinetics of a given RNA Sequence can also beinvestigated as the level of secondary structures
VIENNA RNA PACKAGE
PART II: How Do RNAs Evolve
Basic Assumption
Selection Acts on Secondary Structures, Mutations acts on theunderlying sequences⇒ We need to understand the sequence-to-structure map of RNAs(hang on, we’ll discuss the empirical evidence for that a bit later)
Sewall Wright’s Fitness Landscapes
Fitn
ess
Phenotype
How do realistic fitness landscapes look like?
Biological Landscapes
The RNA case is a special case of a very general paradigm:
genotype 7→ phenotype 7→ fitness
What is the relationship between Genotyp and Phenotype?
Central topic in any theory of evolutionbecause:* Selection acts on the Phenotype* Mutation/Recombination acts on the GenotypeBiopolymers as the simplest model:The molecule is both genotype (sequence) and phenotype (structure).
The map from genotype to genotype is determined by physical chemistry:
⇐⇒ folding problem
Computational Analysis of the RNA Map
There are many more sequences than structures.(.)-string: 3-letters (with constraints)
=⇒ less than 3n structures
but 4n sequences.
=⇒ Redundancy
How are sequences folding into the same structure distributed insequence space?Neutral Set S(ψ) = x ∈ Qn
α|f (x) = ψ
Sensitivity and Neutrality
GCGGGAAU
AGCUC
AGUUGG U A
G A G CA
CGA
CC
UU
GC C
AAGGUCGGGGU
CG C G A G
U U CGA
GUCUCGU
UUCCCGC
UC
CA
GCGGGUAUA
GCUCAGU
UGG U A
G A G CA C G
A C CUU G CC A A
G GU
C G G G GU CG C G A G
U U CGA
GUCUCGU
UUCCCGCUCC
A
Effect of a single
point mutation0 100 200 300
Structure Distance
10-4
10-3
10-2
10-1
100
Fre
quen
cy
Distribution of structure distances
The Random Graph Model
Approach:Model S(ψ) as a random induced subgraph Γ with a given value
λ =〈#neutral neighbors〉
(α− 1)n
Threshold value:
λ∗ = 1 −
(1
α
) 1α−1
Theorem. [Reidys, Stadler, Schuster]If λ > λ∗ then Γ is a.s. dense and connected,if λ < λ∗ then Γ is a.s. neither dense nor connected
A complication: Base Pairing Rules
Unpaired bases:Alphabet A = A,U,G,C
Paired bases: 5’ and 3’ side correlated:Alphabet: B = AU,UA,GC,CG,GU,UG, .
Thus consider only the set of compatible sequences C (ψ):S(ψ) ⊆ C (ψ) ≡ Qnu
4 ×Qnp
6 .=⇒ Two neutrality parameters λu and λp
Connected Components of Neutral Networks
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0λu
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
λp
gray many small components red 1 connected componentgreen 2 equal sized components yellow 3 components size 2:1blue 4 equal sized components
Explanation: for this deviation from the random graph model in terms of the energy model. Some structures can
be made only with a significant bias in the G/C ratio.
x??P (d)
Length of Neutral Path: d !0.0 10.0 20.0 30.0 40.0 50.0 60.0 70.0 80.0 90.0 100.0
0.250.200.150.100.050.00
0 20 40 60 80 100 120
Chain lenght n
0
5
10
15
20
25
30
Cov
erin
g ra
dius
from enumeration from inverse folding lower bound
Distance to Target structure Covering radiuslength neutral paths
Closest Approach
Intersection Theorem. For any two secondary structures φ, andψ holds
C (φ) ∩ C (ψ) 6= ∅
What is the distance of neutral networks
δ(φ,ψ) = mind(x , y)|f (x) = φ and f (y) = ψ
Random graph Theory: If λ > λ∗ then δ(φ,ψ) ≈ 2.Computer simulations: upper bounds on δ(φ,ψ):
n GC AU AUGC
50 5.6 2.6 2.170 9.3 4.6 3.4100 13.0 7.8 5.6
AccessibilityFontana & Schuster 1998
Idea: The “interface” between two structures is large is they are“similar”.More precisely: Structure ψ is accessible for φ if x ∈ S(φ) is like tohave neighbor (mutant) x ′ ∈ S(ψ).Structural characterization of “easy” (continuous) transitions:
Shorteningof stacks
Elongationof stacks
Opening ofconstrained stacks
Closing ofconstrained stacks
SUMMARY: Sequence-Structure Map of RNA
1. Redundancy: Many more sequences than structures
2. Sensitivity: Small changes in the sequences may lead to largechanges in the structure
3. Neutrality: A substantial fraction of mutations does not alter thestructure.
4. Isotropy: S(ψ) is “randomly” embedded in C (ψ).
Implications:
1. Neutral Networks: S(ψ) forms a connected “percolating” network insequence space for all “common” structures.
2. Shape Space Covering: Almost all structures can be found in arelatively small neighborhood of almost every sequence.
3. Mutual Accessibility: The neutral networks of any two structuresalmost touch each other somewhere in sequence space.
Simulated Trajectories
0.5 1.0 1.5time (arbitrary units)
5
15
25
35
45
stru
ctur
e di
stan
ce
0 510
40
proj
ectio
n co
ordi
nate
Punctuated equilibra = diffusion of neutral networks +constant rate of innovation +exponential selection of rare mutants
Proc.Natl.Acad.Sci. 93: 397-401 (1996)
Diffusion Constant
. . . can be deduced from Moran model:
D = λ6Anp
3 + 4Np(1 + 1/N) ∼
(3/2)A(n/N) p ≫ 0 orN ≫ 1
2Anp p ≪ 1
A . . . replication raten . . . sequence lengthN . . . population sizep . . . mutation rateλ . . . neutrality of network
Dynamics of Interacting Replicators
Ik + Ij −→ Il + Ik + Ij
With mutation:
xk = xk
∑
j
Akjxj −∑
i ,j
Aijxixj
+∑
l ,j
(QklAljxjxl − QlkAkjxkxj)
where
Qkl = (1 − p)n−d(k,l)
(p
α− 1
)d(k,l)
How does this behave in sequence space?
Simplest case: Simplest case: Akl = A0(1 − d(Ik , Il )/n):
0 1000 2000 3000 4000 5000time
0.0
0.2
0.4
0.6
0.8di
vers
ity
0 2×105
4×105
6×105
8×105
time lag τ
0
10
20
30
40
50
60
g(τ)
g(τ) =1
T2 − T1 + 1
T2∑
t=T1
‖p(t + τ) − p(t)‖2
B.M.R. Stadler, Adv. Complex Syst. (2003)
10-6
10-4
10-2
100
p
10-6
10-4
10-2
100
D
100
101
N/n
10-3
10-2
10-1
100
D
Left: Diffusion coefficient D as a function of the mutation rate for N = 10, 20, 30, 40, 80 and
n = 10, 20, 30, 40, 80 such that N/n = 1 after equilibration for 105 timesteps. Right: Dependence of the ratio
D/p on N/n.
An RNA-Based Model in the Plane
Target hypercycle with 8 mem-bers.
Model:Hypercyclically coupledspecies, each sequence hasa function that depends onits structure.
Spatial Extension: CA Model
Possible Catalysts Actual CatalystsPossible Replicators
Rules of replication. For each of the neighbors (•) of the empty cell (marked by a bold outline) the replication rate
ρz is computed taking into account their neighbors in the direction of the replication () as potential catalysts.
The neighbor with the largest values of ρz invades the empty position. In this example, for the chosen replicator,
only three of its neighbours are catalysts according to the hypercycle topology.
Spirals formed after 3000 generations in an evolution experimentstarted with 300 random sequences in the absence of parasites.see also Borlijst & Hogeweg (1993)
Diffusion in Sequence Space
2000 4000 6000 8000 10000time lag (tau)
2
4
6
8
10
12
g(ta
u)
0.0005 0.0010 0.0015 0.0020Mutation rate
0.002
0.003
0.004
Diff
usio
n co
nsta
nt
Summary
Neutrality of the Sequence-Structure Map impliesdiffusion/drift-like motion in sequence independent of detailsof the selection/mutation mechanisms and whether spatialextension is taken into account or not.
=⇒ The basic assumption of molecular phylogenetics, namelya dominating influence of drift in sequence evolution, holdstrue even when phenotypic evolution is dominated byinteractions(co-evolution).
TODO Development of a rigorous mathematical theorydescribing the motion in sequence space of a population withstrong interactions.
Evolutionary histories of some structured RNAs
Ribosomal RNAs (rRNAs) are the most frequently used sequencedata for reconstructing phylogenies from molecular dataHow does that work:In a nutshell:(1) compute evolutionary distances from the sequence data(2) “fit” an additive tree to the distances(In reality, there are other methods such as maximum parsimonyand maximum likelihood approaches, but the basic idea is thesame)Observation: all tRNAs have more or less the same clover-leafstructure.
MicroRNAs
processed from precursorhairpins
short (∼ 22nt) RNAmolecules
highly conserved
Function
bind to 3’UTRs of mRNAtargets
supress expression of thismRNA
mark mRNA molecule fordegradation
in plants involved in DNAmethylation
AGU
GCC
ACACU
CC
GUGUAUUUGACAAGCU
GAGU
U GGACACUC
CAU
G U GGU
AGA
GUGUCAGUUUGUCAAAUACC
CCA
AGUG
AGG
CACA
CGAU
GC
GCAU
MicroRNAs — processing and function
MicroRNAs ...
transcribed inlonger transcripts(primary-miRNA)
in some cases:polycistronic
“clusters”
Drosha processing→ precursor
miRNA
export to cytoplasmExportin-5 pathway
Dicer processing →mature miRNA
Evolution of microRNA Families: mir-17 clusters
Many miRNAs are transcribed from polycistronic transcriptsMost spectacular example: Human mir-17 clusters
100nt
Chr−13
Chr−X
Chr−7
19b−1 92−1
18X 20X 19b−2 92−2
106b 93 25
18 19a 2017
106a
91 17 18 19a 20 19b 92
106a 18X 20X 19b 92
106b 93 25
I−1
I−X
II−3
J. Mol. Biol. 339: 327-335 (2004)
Case Study: mir-17 clusters
AU
UG
CG
GC
GA
A U
A U
AC
AA
UA
GC
CAU
A
GCA
CG
C
GG
AU
AA
AU
UG
CU
UA
UA
GAUA
UAAG
AG
AU
C_
A_
G_
_U
AC
CA
A_
G_
UG
UA
AU
CG
AU
UG
_G
GC
UC
AU
GU
GC
UA
UA
A U
G C
A U
G C
G C
G C
U A
C G
C G
G U
C G
C G
C G
U A
G U
U _
C G
A U
C G
G C
U A
A U
U A
G C
A U
A U
X-106a
X-18X
X-20X
X-19b-2
X-92-2
. . . . . . . . ( ( ( ( ( ( ( ( ( ( ( ( ( ( . . . . ( ( ( ( ( . ( ( ( ( ( ( . . . ) ) ) . . ) ) ) ) ) ) ) ) . . . . ) ) ) ) ) ) ) ) . ) ) ) ) ) ) ( ( ( . . . . ) ) ) . . . ( ( ( . ( ( ( ( ( . . . . . . . . . . . . . . . . . . . . . . . . . ( ( ( ( . ( ( ( . . . ( ( ( . . . . . . . . ( ( ( ( ( . . ( ( ( ( . . . . . . . . ( ( ( . . . ( ( ( ( ( ( ( ( ( ( . ( ( ( ( . ( ( ( . ( ( ( ( ( . . . . . ( ( ( . . . ) ) ) . . . . . . . ) ) ) ) ) . ) ) ) . ) ) ) ) . ) ) ) ) . . ) ) ) ) ) ) ) ) ) ( ( ( . . . . . . . . . . . . ( ( ( ( ( ( ( . . . . . . . . . . . . . . . . . . . . ) ) ) ) ) ) ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ( ( ( ( ( ( ( ( ( . . . . . . . . . . . ) ) ) ) ) ) ) ) ) . . . . . . . . . . . . . . . . . . . . ) ) ) . . . . . ) ) ) ) . . ) ) ) ) ) . . . ( ( ( . . . . ( ( ( ( ( . . ( ( ( ( ( ( ( ( ( ( ( . ( ( ( ( ( . ( ( ( . . . . . . . . . . . . . ) ) ) ) ) ) ) ) . ) ) ) ) ) ) ) ) ) ) ) . . . ) ) ) ) ) . . . . ) ) ) . . . . ) ) ) . . . ) ) ) . ) ) ) ) . . ( ( ( . . . . . . . ) ) ) . . . . . ) ) ) ) ) ) ) ) . . . . . . . ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( . . . ( ( ( ( . . . . . . . . . . . . . . . . . ) ) ) ) ) ) ) ) ) ) ) ) . . ) ) ) ) ) ) ) ) ) ) ) . . . . . . . . . . . ( ( ( ( . . . . . . . ) ) ) ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ( ( ( . . . . ) ) ) . . . ( ( ( . ( ( ( . ( ( ( ( ( . ( ( ( ( ( . ( ( ( ( . . . . ( ( ( ( ( ( ( ( ( . . . ) ) ) ) . ) ) ) ) ) . . . ) ) ) ) . ) ) ) ) ) . ) ) ) ) ) . ) ) ) . ) ) ) . . . . . . . .
0 100
200
300
400
500
600
700
X-106a
X-18X
X-20X
X-19b-2
X-92-2
(a) (b)
Structure of the pri-pre-mir-17 at the human X chromosome.
Construction of Gene Treesfrom concatenated sequences in the cluster
HsXPtX
MmX
RnX
Pt1Hs1 Mm1
Rn1
Xt
TnB
TrB
TnA
TrA
DrA
Dr14
974
898
CcX
1000
1000
998
DrB
1000
233617
427
10
00
954
462
464
TnCTrC
0.1
Pt−IIHs−II
Mm−IIRn−II
Xt−II
Dr−II−D
Tr−II−D
0.1
Distant Homologies with unreliable Alignments
How to quantify sequence similarity when we cannot get a goodalignment?
measure pairwise sequence similarity s(x , y)
compare to the distribution of similarity values of alignmentsof shuffled sequences
define a z-score
z(x , y) =s(m, y) − 〈s(π(x), π′(y))〉π,π′
√varπ,π′(s(π(x), π′(y)))
use z(x , y) as similarity measure in WPGMA clustering
Gen
eTre
eof
mir-1
7cl
ust
erm
ember
s
1710
620
X−
20
106b
9318
X−
18
9225
19a
19b1−
19b
X−
19b
Hs1−17
Hs1−18
Hs1−19a
Hs1−19b−1
Hs1−20
Hs1−92−1
Mm1−17
Mm1−18
Mm1−19a
Mm1−19b−1
Mm1−20
Mm1−92−1
Rn1−20
HsX−19b−2
HsX−92−2
HsX−106a
MmX−19b−2
MmX−92−2
MmX−106a
Hs3−25
Hs3−93
Hs3−106b
Mm3−25
Mm3−93
Mm3−106b
Dm−92aDm−92b
Ce−235
Pt1−17
Pt1−18
Pt1−19a
Pt1−19b−1
Pt1−20
Pt1−92−1
Rn1−17
Rn1−18
Rn1−19a
Rn1−19b−1
Rn1−92−1
Xt−17
Xt−18
Xt−19a
Xt−20
Xt−19b
Xt−92
HsX−18
HsX−20
PtX−106a
PtX−18
PtX−19b−2
PtX−20
MmX−18
MmX−20
RnX−106a
RnX−18
RnX−19b−2
RnX−20
RnX−92−2
CcX−19b−2
CcX−92
Pt3−93
Pt3−25
Pt3−106bRn3−106b
Rn3−25
Rn3−93
Cc3−25DrD−25
DrD−19b
DrD−93
TrD−17/106b
TrD−20/93
TrD−25
TrD−19b
XtB−93
XtB−25
DrA−17
DrA−18
DrA−92−1
DrB−17
DrB−18
DrB−19a
DrB−19b−1
DrB−92−1
Dr14−20
Dr14−18
Dr14−19−b1
TrA−17
TrA−18
TrA−19a
TrA−19b−1
TrA−20
TrA−92−1
TrB−17
TrB−18
TrB−19a
TrB−19b−1
TrB−20
TrB−92−1
TrC−18
TrC−19b
TnB−17
TnB−18
TnB−19a
TnB−20
TnB−19b
TnB−92
TnC−18
TnC−19b
TnA−17
TnA−18
TnA−19a
TnA−20
TnA−19b
TnA−92
22.0
20.0
18.0
16.0
14.0
12.0
10.08.0
6.0
4.0
2.0
0.0
z−sc
ore
Collapsed tree of microRNA subgroups
z−score
I−17
I−18
I−19a
I−20
I−19b
I−92
II−106b
II−93
II−19b
II−25
Ce−92
Dm
−92
18.0
16.0
14.0
12.0
10.0
8.0
6.0
4.0
2.0
0.0
mir−17 groupmir−92 group mir−19 group
obtained by collapsingvertebrate, insect,and nematode speciestrees to single vertices
next step:combine gene treesand syntenyinformation to aduplication history
Scenario for the evolution of the mir17 familyancestral mir17 cluster probably contained
mir-17, mir-19 and mir-92
181720
93106b 19b
19−3
1992
25
deletion
Scenario for the evolution of the mir17 familyfirst detectable duplication event:
branch mir-17 and mir-18
181720
93106b 19b
19−3
1992
25
deletion
19b is copy of 19
18 is copy of 17
Scenario for the evolution of the mir17 familyseries of duplications:
branch mir-19 and mir19b, mir-17 and mir-93
181720
93106b 19b
19−3
1992
25
deletion
19b is copy of 19
18 is copy of 17
93 is copy of 17
Scenario for the evolution of the mir17 familygenome wide duplication:
duplication of whole cluster and loss of individual miRNAs
181720
93106b 19b
19−3
1992
25
deletion
cluster duplication
Deletions
19b is copy of 19
18 is copy of 17
93 is copy of 17
Scenario for the evolution of the mir17 familyindependent miRNA duplications
in type I cluster
III
181720
93106b 19b
19−3
1992
25
deletion
20 is copy of 17
cluster duplication
Deletions
19b is copy of 19
18 is copy of 17
93 is copy of 17
Scenario for the evolution of the mir17 familysplit of teleosts and mammalia
teleost specific genome duplication
Mammalia
III
I−1I−XII
181720
93106b 19b
19−3
1992
25
deletion
20 is copy of 17
cluster duplication
Deletions
19b is copy of 19
18 is copy of 17
93 is copy of 17
Scenario for the evolution of the mir17 familysplit of teleosts and mammalia
teleost specific genome duplication
TeleosteiMammalia
I−BI−AI−CII−D
III
I−1I−XII
181720
93106b 19b
19−3
1992
25
deletion
20 is copy of 17
cluster duplication
Deletions
19b is copy of 19
18 is copy of 17
93 is copy of 17
History of the mir-17 cluster: updated data
TeleosteiTetrapoda
deletion
cluster duplication
I
II
I−X
IIII−D
I−B
I−C
I−A
20
93
18
106b
19
19b92
2519b−II
I−1
17/106a
20 is copy of 17
18 is copy of 17
93 is copy of 17
Further Examples: let-7 family
a1 f1 df2 m
ir98
a3/c
2
k g i c1 e a2jb
Ancestral Eubilaterian
Tetrapoda only
Ancestral vertebrate
lost in mammals loss of mir100 or mir125 in one paralog in most species
Dro
sop
hila
Ne
ma
tod
es
duplicationteleost genome
Mammalian transcription from
intron of coding sequenceintron of non−coding sequenceexon
Further Examples: mir-1 and mir-30
1−1 1−2Teleosts Teleosts
TetrapodaTetrapoda
Nematoda
Urochordata
Arthropoda
Sea Urchin
Xtr
0.10.0
Gga−1b
d
b
bd
c2
c2
teleosts
tetrapoda
teleosts
tetrapoda
tetr
ap
od
a
tetrapoda
teleosts
teleosts
tetrapoda
teleosts
tetrapoda
c1
e
a
c1e
a
0.10.0
Dr
mir-1 mir-30
Further Examples: mir-9, mir-23, mir130/301
tt
t t
t
t
t
Apis
Sea Urchin
Nematoda
Schistosoma (?)
Tetrapods
Teleosts Diptera
9−3
9−2
9−3
9−4
mir−79
mir−9a
mir−9b
mir−9c
9c 79 9b306 9a
0.1
mir-9
(2)
Dr22(
6.1)
/ Tn1
Dr22(
8.8M
) / T
n23
Dr11
/ Tn3
Dr8 /
Tn12
Dr22(
10.1
)
lost in Tn/Tr
23a
27a
24.2
23b
27b
24.1
Hs19 Hs9
(1)
mir-23 cluster
Gg
Md
Xt
Rn, Mm
Cf, Pt, Hs
Dr
Fr
Tv
mir−301
mir−130b
mir−130a
mir−301
mir−130a
? ?
mir-130 cluster
Expansion of the Metazoan MicroRNA Repertoire
0 10 20 30 40 50 60 70 80
21
140
Rn Bt Cf PtMdGgXtDr Tn TrCsCiOdSpAgBmTcAmCb Ce D.sp. Mm Hs
40
miRNA innovationsnon−local duplicationslocal duplications
2
1
11
18
171
4 44
114
22
23
131
10
11
1124
56
2
13
11
5
46
26 1
1
Sm
Similar Situation: snoRNA
snoRNAs direct chemical modification of other RNAs (mostlyrRNA, snRNA, and (some?) messenger RNAs
two classes: box-H/ACA and box-C/D
known in eukaryotes and archea, not in eubacteria
H/A
CA
box
snoR
NAs
inVer
tebr
ates
Mm_4_E1_1Mm_4_E1_2
Rn_8_E1Mm_9_E1
Hs_1_E1_1Pt_1_E1_1
Hs_1_E1_2
Pt_1_E1_2Oc_E1_1r
Ss_E1_1Cf_2_E1_1
Cf_2_E1_2Bt_E1
Gg_E1Xl_E1_6r
Xl_E1_1rXl_E1_5r
Xl_E1_4rXl_E1_3r
Xl_E1_2rTr_E1_3
Tr_E1_4Tr_E1_5Tr_E1_6
Ol_E1_1Tn_E1_1
Tn_E1_2Tn_E1_4
Tn_E1_3Om_E1_r
St_E1_rDr_E1_3r
Dr_E1_2rDr_E1_5r
Dr_E1_4r
98
.79
9.8
90
.9
90
.49
6.3
82
.0
89
.6
80
.2X
enop
us
Tel
eost
s
Chi
ck
Mam
mal
s
0.00
0.02
0.04
0.06
0.08
0.10
DrE2_1
DrE2_2
Tr_E2
Cf_23_E2_1
Hs_3_E2_1
Mm_9_E2_1
Rn_8_E2_1
Gg_E2_2
Xt_E2_2
Xt_E2_1
Mm_9_E2
Rn_8_E2
Hs_3_E2
Cf_23_E2
Gg_E2_1 0.00
0.05
0.10
66
.6
92
.3
85
.4
74
.7
Tet
rapo
da−
1
Tet
rapo
da−
2
93
.0
Dr_25_E1_1Dr_25_E3_3
Dr_25_E3_5Dr_25_E3_2
Dr_25_E3_4Dr_25_E3_6
Tr_E3_1Tn_E3_2
Tr_E3_2Tn_E3_1
Xl_E3_2Gg_9_E3_1
Xl_E3_1Cf_34_E3_2
Hs_3_E3_2Pt_2_E3_2
Rn_11_E3_1Mm_16_E3_2
Cf_34_E3_1Hs_3_E3_1
Pt_2_E3_1Mm_16_E3_1
Rn_11_E3_2
0.00
0.05
0.10
99
.6
99
.8
10
0.0
81
.2
54
.7
Mam
mal
s−1
Mam
mal
s−2
Tel
eost
s
E1
E2
E3
Ver
tebr
ate
YRN
As
Rn
RnMm
MmRn
FrTnTn
Dr
MmRn
GgMmRn
Out
grou
p
HsY5
XlYa XlY3
XlY3
MmY3
OmY1HsY4
CfY4
XlY4
HsY3
HsY1
ApY3
OcY1
Tr
Y4
Y1
Y3
Y5
Summary
The genotype-phenotype map of RNA is charcterized by aninterplay of “ruggedness” and neutrality
Selection plus drift results in diffusion on neutral networks
Many non-coding RNAs have highly constrained (i.e.,evolutionarily very well conserved) structures but fairly rapidlyevolving sequences
Drift of sequences is independent of the details of theselection mechanism
Ongoing research: elucidate the evolutionary histories ofstructured ncRNAs
PART III: The Modern RNA World
mRNA
pre−mRNA
tRNA rRNA
miRNA
pre−miRNA7SL + proteinsY + proteins
NUCLEOLUS
CAJAL BODY
7SL Y RNA vRNApre−tRNA RNAse_P
tRNA
MRPmiRNA
pre−miRNA
pri−miRNA
pre−rRNAsnoRNA
rRNASplicosome
snRNA scaRNA
Drosha
Introns mRNA
translational inhibition
RISCRNAi
Ro RNP SRP
Ribosome
vRNA + proteins
vRNP
Dicer
Proteins
NUCLEUS
Multiple Origins of ncRNAs
CraniataCephalochordata
UrochordataEchinodermataHemichordata
ChoanoflagellataFungiMicrosporidia
AlveolatesStramenophilesRhodophyta
Chordata
Metazoa
Green Plantsother protists
Eukarya
Protostomia
Bacteria
ArcheaLUCA
rRNA
tmRNA
snoRNA C/DsnoRNA H/ACA
RNAse MRPtelomerase RNAmost snRNAsvault RNAs ?
7SK ?
U7 Y RNA
tRNARNAseP7SL/SRP small bacterial RNAs
in Kinetoplastids onlygRNA
microRNA in multicellular animals and plants only ?
Surveys for noncoding RNAs
> 5% of the human genome is under stabilizing selection(from man/mouse comparison), less than 1/3 of this codes forprotein
Virtually the entire genome is transcribed as primary nucleartranscripts in at least one direction(ENCODE Genes&Transcripts group, unpublished data)
∼ 80% of the ENCODE regions are transcribed in as parts ofprotein coding transcripts including introns and UTRs
Only a tiny part of the primary transcripts is protein coding
Large fraction of apparently non-protein-coding cDNAs
The functions of most of these transcripts are unclear.
“There is need for reliable experimental and computational methods
for comprehensive identification of non-coding RNAs.”
–International Human Genome Sequencing Consortium, Nature 431, p.943, October 2004
The ENCODE Project
ENCyclopedia Of DNA Elements
Public research consortium launched by NHGRI in 2003
Purpose: “testing and comparing existing methods torigorously analyze a defined portion of the human genomesequence”.
Focus: specified 30 megabases ( 1% of genome) in more than20 species
Informally organized in subgroups: Sequencing Technology,Comparative Genomics, Genes and Transcripts, GeneticVariation, ...
Results from 1st phase currently under review
Phase 2: scale-up to complete genome
Highlights from
ENCODE Genes and Transcripts Analysis Group
(Data presented by Tom Gingeras in Bethesda, Jan 12 2006)
Only a fraction of processed RNA transcripts correspond toGeneCode annotated transcripts:70% correlated with annotated (m)RNAs52% correlate with annotated protein coding sequences
Substantial fraction of transcription is specific of cellularconditionsonly 2.6% of transfrags are common to all 11 cell-lines.
The same genomic sequence may be processed into multipleRNA sequences with different fates
Virtually the entire genome is transcribed as primary nucleartranscript in at least one direction.
Transcriptional output is MUCH more extensive AND much more
complex than previously thought.
Recall: Sequence-Structure Map of RNA
1. Redundancy: Many more sequences than structures
2. Sensitivity: Small changes in the sequences may lead to largechanges in the structure
3. Neutrality: A substantial fraction of mutations does not alter thestructure.
4. Isotropy: S(ψ) is “randomly” embedded in C (ψ).
Implications:1. Neutral Networks: S(ψ) forms a connected “percolating” network in
sequence space for all “common” structures.
2. Shape Space Covering: Almost all structures can be found in arelatively small neighborhood of almost every sequence.
3. Mutual Accessibility: The neutral networks of any two structuresalmost touch each other somewhere in sequence space.
Proc.Roy.Soc.B 255 279-284 (1994), Proc. Natl. Acad. Sci. USA 93, 397-401 (1996),
Bull. Math. Biol. 59, 339-397 (1997), RNA 7: 254-265 (2000).
RNA Sequencs
Multiple Sequence Alignment
CLUSTAL W
Minimum Energy Folding
Mountain Representations
Vienna RNA Package
Secondary Structures
Aligned Structures
Detect conserved sub-structures Confirmed conserved sub-structures
CHECK:
compensatory
mutations
. . .. . .. . .. . .
. . .
. . .. . .. . .
.
..
.. .. ..
... . .
. . .. . .
. . .
. . ... .. ... . .. . ..
..
...
...
. .
.... .
....
.
..
.. . .
......
.. . . ..
...
......
.
. . ... .. ... . ... .. ..
RNA Sequencs
Dot Plots
Multiple Sequence Alignment
Combined Pair
Table
Conserved sub-structures
McCaskill’s
Algorithm
CLUSTAL W
UGUGGUCGAUAU 0.99
0.01
0.45
0.00
0.77
0.34
sequence and pairing probability
CHECK
compensatory
mutations
Credibility RankingReduce Pair List
Minimum Energy Base Pairing Probabilities
Nucl. Acids Res. 26: 3825-3836 (1998), Comp. & Chem. 23: 401-414 (1999)
Examples: HIV-1 TAR-hairpin
. . . . . (((((((((((. (((((. . . ((((. . . . . . )))))))))))))))))))).
0 10 20 30 40 50
. . . . . ( ( ( ( ( ( ( ( ( ( ( . ( ( ( ( ( . . . ( ( ( ( . . . . . . ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) . ..
..
.(
((
((
((
((
((
.(
((
((
..
.(
((
(.
..
..
.)
))
))
))
))
))
))
))
))
))
).
GG
UC
UC
UC
UG
GU
UA
GAC
CA
GA
UCU
GA
GC
CUG
GG
A GC
UC
UC
UG
G
CU
AA
CU
AG
GG
A
A
Flaviviridae: Nucl. Acids Res. 29: 5079-5089 (2001), Picornaviridae: J. Gen. Virol. 85: 1113-1124 (2004), Broad
survey: Bioinformatics 20: 1495-1499 (2004)
Examples: Picornaviridae: Cis-acting-Replication Element
(CRE)
The function of the CRE probably involves the initiation of the synthesisof the negative-sense strand template RNA during virus replication.
CGACGGUUA
CAA C C A
GC
AGACCGUCG C
AUAC
AGUUCAAG
UCCA
A A U GCCG
UAU
UGAAC
CU
GUAUG
A UU A
G C
ACGGCCAC
AAACA
CC C A A U
CAACU
GUUG
GCCGU A
UCAUAUACCG
AACA
A A C ACUA
UAG
GUGAUGAU G
AAGUCAUCGUUG
AGAA
A A C GAAA
CAG
ACGGUGGCC
UC
U G
ACGGCUA
CAA A C A
AC
AAGCUGU
C G
UUUUGCAUUUUG
CA
A AUU
CAAGAUGUAGAG
C G
U AG U
Aphthovirus Enterovirus Cardiovirus HRV-A HRV-B Teschov. Hepatov.region:2C 2C 1B 2A 1B 2C 2C
Aphto ~~~~CGAC-GGUU------ACA-CCAAGCA------GACCGUCG~~~~~Entero CAUACAGU-UCAAG--------UCCAAAU-GCCGUAUUGAACCUGUAUGCardio ~~~~~ACG-GCCA---CAAACACCCAAUCAACUGU-UGGCCGU~~~~~~HRV-A ~~~AUCAUAUACCGAACAAACA---------CUAUAGGUGAUGAU~~~~HRV-B GAAGUCAU-CGUUGAGAAAACG---AAACA------GACGGUGGCCUC~Tescho ~~~~~~AC-GGCU--ACAAACA-----ACA------AGCUGU~~~~~~~Hepato UUUUGCAU-UUUG---CAAA--------------UUCAAGAUGUAGAG~ ~~~(((((-((((.......................)))))))))~~~~ 1.......10........20........30........40.........
predicted in Nucl. Acids Res. 29 5079-5089 (2001),
experimentally detected by Gerber, Wimmer Paul J.Virol. 75 10979-10990 (2001).
A Method for Large Genomes: RNAz
∗ Two ingredients: Thermodynamic Stability & StructureConservation
Measuring thermodynamic stability of ncRNAs
Naturally occurring structured RNAs have a lower foldingenergy compared to random sequences of the same size andbase composition?
1. Calculate native MFE m.2. Calculate mean µ and standard deviation σ of MFEs of a large
number of shuffled random sequences.3. Express significance in standard deviations from the mean as
z-score
z =m − µ
σ
Negative z-scores indicate that the native RNA is more stablethan the random RNAs.
Efficient calculation of stability z-scores
The mean µ and standard deviation σ ofrandom samples of a given sequence arefunctions of the length and the basecomposition:
µ, σ(length,GC
AT,G
C,A
T)
Calculating z-scores is thus a 5 dimensionalregression problem.
The regression problem is solved using aSupport Vector Machine regression algorithm.
The SVM was trained on 10,000 syntheticsequences spaced evenly in the variable space.
The regression calculation is of the sameaccuracy as the sampling procedure.
-8
-7
-6
-5
-4
-3
-2
-1
0
1
2
3
Sam
pled
z-s
core
s-8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3
Sampled z-scores
-8
-7
-6
-5
-4
-3
-2
-1
0
1
2
3
Cal
cula
ted
z-sc
ores
z-scores of known ncRNAs
ncRNA Type No. of Seqs. Mean z-score
tRNA 579 −1.845S rRNA 606 −1.62Hammerhead ribozyme III 251 −3.08Group II catalytic intron 116 −3.88SRP RNA 73 −3.37U5 spliceosomal RNA 199 −2.73
Functional RNAs are clearly more stable than randomsequences.
However: The scores are too small to discriminate reliably in agenome-wide screens since the z-score distributions haveheavy tails.
Consensus folding using RNAalifold
RNAalifold uses the same algorithms and energy parametersas RNAfold
Energy contributions of the single sequences are averaged
Covariance information (e.g. compensatory mutations) isincorporated in the energy model.
It calculates a consensus MFE consisting of an energy termand a covariance term:
J.Mol.Biol. 319:1059-1066 (2002)
The structure conservation index
The SCI is an efficient and convenient measure for secondarystructure conservation.
Separation of native ncRNAs from random controls in two
dimensions
0
0.2
0.4
0.6
0.8
1
1.2
0
0.2
0.4
0.6
0.8
1
1.2
5S rRNA tRNASignal recognitionparticle RNA
RNAseP U2 spliceosomal RNA U5 spliceosomalRNA
z-score
Str
uctu
re c
onse
rvat
ion
inde
x
Classification based on both scores
Classification based on both scores
Implementation and availability
The approach is implemented in ANSI C in the program RNAz.
The z-score regression is limited to 400 nucleotides.
The classification model is currently limited to alignments ofsix sequences.
At least an order of magnitude faster than other programs.
RNAz is freely available:Download from www.tbi.univie.ac.at/∼wash/RNAz
Proc. Natl. Acad. Sci. USA 102: 2454-2459 (2005)
Screening the human genome
Large scale comparative screen including: human, mouse, rat, dog chicken fugu, zebrafish
Reduction of the ≈ 3.095 MB human genome: Take ≈ 5% of the best conserved regions Remove all annotated coding exons Only take alignments strictly conserved in all 4 mammals.
→ 438,788 alignments alignments covering 82.64 MB
92.0M 94.0M 96.0M 98.0MMost conserved noncoding regions (present in at least human/mouse/rat/dog)
RNAz structural RNAs (P>0.5)
RNAz structural RNAs (P>0.9)
RefSeq Genes
90801000 90801500RNAz structural RNAs (P>0.9)
miRNAsmir-17 mir-19a mir-19b-1
mir-18 mir-20mir-92-1
(((((..((((((..((((((((.((.(((((...(((........)))...))))).)).))))))))...))))))....)))))
GTCAGAATAATGTCAAAGTGCTTACAGTGCAGGTAGTGATATGT-GCATCTACTGCAGTGAAGGCACTTGTAGCATTA-TG-GTGAC
GTCAGAATAATGTCAAAGTGCTTACAGTGCAGGTAGTGATGTGT-GCATCTACTGCAGTGAGGGCACTTGTAGCATTA-TG-CTGAC
GTCAGGATAATGTCAAAGTGCTTACAGTGCAGGTAGTGGTGTGT-GCATCTACTGCAGTGAAGGCACTTGTGGCATTG-TG-CTGAC
GTCAGAGTAATGTCAAAGTGCTTACAGTGCAGGTAGTGATATATAGAACCTACTGCAGTGAAGGCACTTGTAGCATTA-TG-TTGAC
GTCAATGTATTGTCAAAGTGCTTACAGTGCAGGTAGTATTATGGAATATCTACTGCAGTGGAGGCACTTCTAGCAATA-CACTTGAC
GTCTGTGTATTGCCAAAGTGCTTACAGTGCAGGTAGTTCTATGTGACACCTACTGCAATGGAGGCACTTACAGCAGTACTC-TTGAC
HumanMouseRatChickenZebrafishFugu
G U C A GA A
U A A U G UC A
A A G U G C U UA C A
G U G C A GG
U AG U G
AU A
UG
U_G
CAUC
UACUGCA
GUGAAGGCACUU
GUAGCAUUA
_UG_UUGAC
93104k 93106k 93108kRNAz structural RNAs (P>0.5)
RNAz structural RNAs (P>0.9)
H/ACA snoRNAs
C/D-box snoRNAs
ACA25ACA32
ACA1ACA8
ACA18ACA40
mgh28S-2412mgh28S-2410
Chr. 13
Chr. 13
Chr. 11
a
b
d
c
Results of Human Genome Screen
Genome Coverage Alignments RNAz hits p > 0.9Size Fraction Number Size Fraction of Number
(MB) (%) (MB) input (%)Human genome 3,095.02 100.00 –PhastCons most conserved 137.85 4.81 1,601,903without coding regions 110.04 3.84 1,291,385without alignments < 50nt 103.83 3.33 564,455Set 1: 4 Mammals 82.64 2.88 438,788 5.46 6.62 35,985Set 2: + Chicken 24.00 0.85 104,266 1.34 5.50 8,802Set 3: + Fugu or zebrafish 6.86 0.24 30,896 0.14 2.03 996
Nature Biotechn. 23: 1383-1390 (2005)
Pictures instead of Numbers
91,676
35,958
20,391
8,8022,916 9965,6616,898
2,281
26,508
2087950
30,000
60,000
90,000
Str
uctu
rale
lem
ents Observed
Expected
P > 0.5 P > 0.9
6.6%15.1%
Structural RNA
Estimated false positives
Other conservednoncoding element s
4 Mammals
4 Mammals+ chicken
All vertebrates
P > 0.5 P > 0.5 P > 0.5P > 0.9 P > 0.9 P > 0.9
Distribution related to known protein gene annotation
Known gene< 10 kb from nearest gene> 10 kb from nearest gene
Intron of coding region3’−UTR (exon or intron)
1538016860
3745
2866
283011205
5’−UTR (exon or intron)
Sensitivity on known classes of ncRNAs
Detected ( > 0.9)P
Detected (0.5 < < 0.9)P
Not detected
Not in input setmicroRNA
(207)C/D snoRNA
(256)H/ACA
(86)
45
150
7 522
14
9
41
12977
2624
Not all ncRNAs have conserved secondary structures!
chr7: RNAz_set1_50
EvoFoldsno/miRNA
Conservation
RepeatMasker
26.90m 26.95m 27.00m 27.05m
HOXA1
chr7.279
HOXA2
HOXA3
chr7.283
HOXA4
HOXA5
HOXA6
chr7.287
HOXA7
HOXA10HOXA9
chr7.290
HOXA11
hoxa11-as
HOXA13 chr7.295
EVX1
HOXA1HOXA1
AC004079.7AC004079.7AC004079.7
AC004079.7HOXA2
HOXA3
HOXA3HOXA3
AC010990.1HOXA3HOXA3
AC010990.1AC010990.1
AC010990.1
HOXA4
AC004080.14HOXA5HOXA5
HOXA6HOXA6
AC004080.14HOXA6
AC004080.14AC004080.14
HOXA7HOXA7
HOXA9HOXA9
HOXA9
HOXA9
HOXA9
HOXA10HOXA10
HOXA10
HOXA10HOXA10
HOXA11HOXA11
HOXA13 EVX1
AC004080.12AC004080.12AC004080.12
AC004080.13
AC004080.15AC004080.15
AC004080.1AC004080.1
AC004080.1AC004080.17
AC004080.18AC004080.19
Affy Transcription
GENCODE
GENCODEputative
mRNA
alternativesplicing
Other RNAz Screens
Urochordates: Ciona intestinalis & Ciona savignyi
only a few conserved RNA with Oikopleura dioica
Bioinformatics 21(S2): i77-i78
Nematodes: Caernorhabditis elegans & Caenorhabditis
briggsae
JEZ:MDE 2006 epub
Teleost fishes: Danio rerio, Takifugu rubripes, Tetraodon
nigroviridis, Oryzias latipes (partial)(in progress)
Trypanosomatids: Trypananosoma and Leishmania species
Yeasts. (joint work with Kay Nieselt and Stephan Steigele)
Summary
Predicted structured RNAs (RNAz predictions, p > 0.9)
Teleosts Mammals36000
Yeasts Nematods Insects Urochordates4000 20002500
? ?
Trypanosomatids
rRNA, tRNA, snoRNA, snRNA ...
>10000
1000 unknownconserved
500<200
Novel Human ncRNA Candidates
GU
GGAGGCCU
UUGUCCGCUG
GAG
G C A G CGUU A U
GG G A
AG C
A G G C CA C CUUCCAAAGCCUGCACAAGGG
CCU C
CAG
GCAG
UGG A G G
U AGA
CCCCUC
GG
UG
CU
CC
AG
C AC
AUG
CU
GG
AG
UG
A
CGC
GGC
GCG
CGC
CGG
C
GC
GC
GC
GC
UA
GC
0 10 20 30 40 50 60 70 80 90 100
110
120
Novel ncRNA Candidates in Caenorhabditis
__
__
_ACCUU
AC
UCGAA
AU _ A C C
CG
UCGAUGAA
GA
CCACUAA
AU
GA C
GA A
UC
CU
AA
UA
AC
CCA A
UGG
GU
UUCA
UU
GCG
GA
UAU
GA
GGCA
UU
UG
UC
U
GAG
CG
GG
GUCU C
GGUC
C
GG
CGUC
AGUGGGUUAU
CG
UAUUUCUCUC
CC
UU
CGGG G
_AAU
UU
CCCAU
CGGC
ACCAA
CUU
GACCG U
UG
CGU C
AAU
U CGGUC
CGG
AGUCAAUGGGUU
AUCU
UU C A A
AACC
C_C
CCAUUGACAA
CAA
CUU
GACCG
GCG
U
CeN23 (UM1) CeN74 (UM3) CeN77 (UM3)unknown sb-RNA sb-RNA
__GAUCA
UGC
__U
CAUGCU
__
__
CUCAACCAG
UUA C C C U A C C
UGUCC
UGG CUGUGG
ACA
C C C A C A GU
AC
GC
AUUCG
GUACAG
UA
AC
CA
UCA
A CG
UG
GC A C
AAU
UA
CA
CCG A C AC
C C A CA
A C CG
GA
CAUGACACU
GG
UC
G UC
GG
AUC
A AG
ACA
A U AAC
ACGU
CUC
UUGU
CC
AGU
GGC
CA
ACU
GU
CC
GAUGG
C C G G GU
AUACGGU
AGGUGGCG
AC
GCGGU
GU
ACA
UG
GA
CG
GA
UU
CAAGAG
UG
G
UCUGA
CUAU
C A GAAA
UAAUCGAU
UCCG
GUUUGAAUUGUUUCAAUUGU
GA C
UGCAAGGAAACAAU C C
GC
UUCAAA
GCU
CG
AUCAAUCU
UCUC
GCCA
CA
AC
AU
AGCAUA
GA
UC
UC
UG
CUCAGAUAUCAAUU__UCUACAACA
UG
GA
GGU_
_ GA
GA
CGG
AC G
AG
CC
UC
UU
CA
UGAUUAGCAU
GA
UU C
UCA
UC
A
C
C
GC
AG
AA A C C A A A A
UA
ACAGA
AA
AA
ACAAACCAC
UU
A__
CA
513253 515948 513590(UM2) (UM3) (UM1)
Efforts to Annotate the RNAz Results
ongoing effort
Large number of microRNA candidates approximately 30-40 good H/ACA-box snoRNAs only 6% of hits (comparable to estimated false positive rate)
overlaps with predicted coding regions few clusters of signals with high sequence-similarity
work in progress: structure-based clustering (joint work withRolf Backofen’s lab in Freiburg)
BOTTOM LINE: most signals still unclassified.We need MUCH better methods to recognize members of knownRNA classes
RNAmicro: A classificator for microRNA Precursors
Input: Multiple Sequence alignment
Preprocessing: non-restrictive check for almost-hairpinstructureSome known microRNA precursors, notably some let-7
family members have small branches!
SVM Classification with few descriptors:Property # DescriptorsStructure 2 ls , lhSequence composition 1 G+CSequence conservation 4 S5′ , S3′ , S0 , SminThermodynamic stability 4 E , ǫ, η, z
Structure conservation 1 Econs
ISMB 2006, in press
Results: Caenorhabditis
351 158
RNAmicroP > 0.5 P > 0.9
3666
19
00
6
86
9
2 7
2
45
626 31 5
1251452675
RNAz
miRNA registry 7.1
Grad et al 2003
other RNAs
206
Results: Mammals
5440 1491
RNAmicroP > 0.5 P > 0.9
RNAz
208481
177
00
2
2541
72
10 21
33
miRNA registry 7.1
Berezikov et al. 2005
203014 3826 1260
38
846
Clustering
A
B
D E
AB DE C
A
B
DE
CE
BD
AE
AD
AC
AB
BC
BE
CD
C
DE
C
dot.ps
U A C G A C G G A C U U A C G G A C U U A C G
U A C G A C G G A C U U A C G G A C U U A C GUA
CG
AC
GG
AC
UU
AC
GG
AC
UU
AC
G
UA
CG
AC
GG
AC
UU
AC
GG
AC
UU
AC
G
dot.ps
A U C A C U C G U A C U G U A C
A U C A C U C G U A C U G U A CAU
CA
CU
CG
UA
CU
GU
AC
AU
CA
CU
CG
UA
CU
GU
AC
dot.ps
A U C A C U C G U A C U G U A C
A U C A C U C G U A C U G U A CAU
CA
CU
CG
UA
CU
GU
AC
AU
CA
CU
CG
UA
CU
GU
AC
dot.ps
A U C A C U C G U A C U G U A C
A U C A C U C G U A C U G U A CAU
CA
CU
CG
UA
CU
GU
AC
AU
CA
CU
CG
UA
CU
GU
AC
dot.ps
U A C G A C G G A C U U A C G G A C U U A C G
U A C G A C G G A C U U A C G G A C U U A C GUA
CG
AC
GG
AC
UU
AC
GG
AC
UU
AC
G
UA
CG
AC
GG
AC
UU
AC
GG
AC
UU
AC
G
Pro
ofof
Con
cept:
tRN
As
inCio
na
inte
stin
alis
Arg/Asn2 2 Arg 2 Thr 2 ~ Ile2 Phe Cys 2 2 Lys 2 Gln 3 Gly 2 2 Val Glu 2 Pro~ 2 Met Arg Met Gln Ala ~~~4 2 Ile 4 Ser Leu Tyr
0.0
0.050.1
0.150.2
0.250.3
0.350.4
Summary
Some classes of ncRNAs, namely the structures ones, can befound efficiently by means of comparative genomics
There are Tens of Thousands of structured RNAs of unknownfunction in the human genome
Some of them probably act, like microRNA and snoRNAs bybinding to other RNAs. These could be investigated usingRNA cofolding approaches (ongoing research).
So far, we know only of the proverbial tip of the iceberg of the
complexity of cellular regulation
& RNA bioinformatics is a really cool research topic ...
Acknowledgments: It’s not my fault . . .
Leipzig: Kristin Missal, Dominic Rose, Jana Hertel, ManjaLindemeyer, Matthias Kruspe, Sonja J. Prohaska, ClaudiaFried, Roman Stocsits, Axel Mosig Bettina Muller (FH
Weihenstephan), Katrin Sameith (U Jena)
Vienna: Stefan Washietl, Ivo L. Hofacker, Christoph Flamm,Andrea Tanzer, Stefan BernhartSusanne Rauscher, Caroline Thurner, Christina Witwer, Ingrid
Abfalter, and many others
Havard: Walter Fontana
Beijing: Xiaopeng Zhu, Wei Deng, Geir Skogerbø, RunshengChen
Tubingen: Stephan Steigele, Kay Nieselt,
Copenhagen: Jan Gorodkin, Stefan Seemann
Freiburg: Rolf Backofen, Sebastian Will
top related