RNA Bioinformatics Beyond the One Sequence-One Structure Paradigm
Peter Schuster
Institut für Theoretische Chemie, Universität Wien, Austria, and
The Santa Fe Institute, Santa Fe, New Mexico, USA
2008 Molecular Informatics and Bioinformatics
Collegium Budapest, 27.– 29.03.2008
Web-Page for further information:
http://www.tbi.univie.ac.at/~pks
1. Computation of RNA equilibrium structures
2. Inverse folding and neutral networks
3. Evolutionary optimization of structure
4. Suboptimal conformations and kinetic folding
1. Computation of RNA equilibrium structures
[Figure: section of an RNA chain from the 5'-end to the 3'-end, showing the ribose-phosphate backbone with Na counterions and the nucleobases N1 to N4, where Nk = A, U, G, or C]
[Figure: secondary structure drawn from the 5'-end to the 3'-end; sequence fragments as extracted: GCGGAU AUUCGCUUA AGUUGGGA G CUGAAGA AGGUC UUCGAUC A ACCAGCUC GAGC CCAGA UCUGG CUGUG CACAG]
Definition of RNA structure
Number of sequences: $N = 4^n$
Number of structures: $N_S < 3^n$
Criterion: Minimum free energy (mfe)
Rules: _ ( _ ) _ $\in$ {AU, CG, GC, GU, UA, UG}
A symbolic notation of RNA secondary structure that is equivalent to the conventional graphs
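The symbolic notation (commonly called dot-bracket notation) can be converted into the base pair list of the conventional graph with a simple stack, and the pairing rules can then be checked position by position. The following is a minimal sketch, not the author's code; the function names `pair_table` and `is_compatible` are chosen here for illustration.

```python
# Allowed base pairs, as listed in the rules above.
ALLOWED_PAIRS = {"AU", "CG", "GC", "GU", "UA", "UG"}


def pair_table(structure):
    """Map every paired position (0-based) to its partner."""
    stack, pairs = [], {}
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            partner = stack.pop()
            pairs[partner] = pos
            pairs[pos] = partner
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def is_compatible(sequence, structure):
    """True if every base pair of the structure is an allowed pair."""
    if len(sequence) != len(structure):
        return False
    pairs = pair_table(structure)
    return all(sequence[i] + sequence[j] in ALLOWED_PAIRS
               for i, j in pairs.items() if i < j)
```

For example, `pair_table("((...))")` pairs position 0 with 6 and position 1 with 5, which are exactly the edges that the conventional graph adds to the backbone.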
Conventional definition of RNA secondary structures
H-type pseudoknot
$S_{n+1} = S_n + \sum_{j=1}^{n-1} S_j \cdot S_{n-j-1}$
Counting the numbers of structures of chain length n → n+1
M.S. Waterman, T.F. Smith (1978) Math.Bioscience 42:257-266
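The counting recursion of Waterman and Smith can be evaluated directly. This is a minimal sketch that assumes the starting values S_0 = S_1 = S_2 = 1 (only the open chain exists for chains that short); the function name `count_structures` is hypothetical.

```python
def count_structures(n_max):
    """Return [S_0, ..., S_n_max] from S_{n+1} = S_n + sum_j S_j * S_{n-j-1}."""
    s = [1, 1, 1]  # assumed initial values: only the open chain for n <= 2
    for n in range(2, n_max):
        # j runs from 1 to n-1, as in the recursion above
        s.append(s[n] + sum(s[j] * s[n - j - 1] for j in range(1, n)))
    return s[: n_max + 1]


print(count_structures(8))  # [1, 1, 1, 2, 4, 8, 17, 37, 82]
```

The numbers stay well below 3^n, the upper bound on the number of strings over the three symbols of the symbolic notation.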
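Restrictions on physically acceptable mfe-structures: $\lambda \ge 3$ and $\sigma \ge 2$
Size restriction of elements: (i) hairpin loop: $n_{loop} \ge \lambda$; (ii) stack: $n_{stack} \ge \sigma$
$S_{m+1} = \Xi_{m+1} + \Phi_{m-1}$
$\Xi_{m+1} = S_m + \sum_{k=\lambda+2\sigma-2}^{m-2} \Phi_k \cdot S_{m-k-1}$
$\Phi_{m+1} = \sum_{k=\sigma-1}^{\lfloor (m-\lambda+1)/2 \rfloor} \Xi_{m-2k+1}$
$S_n$: number of structures of a sequence with chain length n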
Recursion formula for the number of physically acceptable stable structures
I.L.Hofacker, P.Schuster, P.F. Stadler. 1998. Discr.Appl.Math. 89:177-207
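The recursion with a minimum hairpin loop size and a minimum stack length can be evaluated with three mutually recursive, memoized functions. The sketch below is one reading of that recursion, not the published implementation: it assumes that Ξ_m counts structures of length m whose two end bases are not paired with each other, that Φ_m counts structures enclosed by an outer base pair with m bases inside, and that S_0 = S_1 = 1. The names `count_restricted`, `min_loop`, and `min_stack` are hypothetical.

```python
from functools import lru_cache


def count_restricted(n, min_loop=3, min_stack=2):
    """Number of structures of chain length n with hairpin loops >= min_loop
    and stacks >= min_stack (intended for moderate n, roughly n < 300)."""
    lam, sig = min_loop, min_stack

    @lru_cache(maxsize=None)
    def s(m):       # S_m = Xi_m + Phi_{m-2}
        if m < 2:
            return 1
        return xi(m) + phi(m - 2)

    @lru_cache(maxsize=None)
    def xi(m):      # Xi_m = S_{m-1} + sum_k Phi_k * S_{m-k-2}
        if m < 1:
            return 1
        return s(m - 1) + sum(phi(k) * s(m - k - 2)
                              for k in range(lam + 2 * sig - 2, m - 2))

    @lru_cache(maxsize=None)
    def phi(m):     # Phi_m = sum_k Xi_{m-2k}, k = sig-1 .. floor((m-lam)/2)
        if m < 0:
            return 0
        return sum(xi(m - 2 * k) for k in range(sig - 1, (m - lam) // 2 + 1))

    return s(n)
```

With `min_loop=1` and `min_stack=1` the function reproduces the unrestricted counts 1, 1, 1, 2, 4, 8, 17, 37, 82. With the default restrictions the shortest chain that has a structure other than the open chain is n = 7 (a stack of two pairs closing a loop of three bases).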
Sequence, structure, and design
RNA sequence: GUAUCGAAAUACGUAGCGUAUGGGGAUGCUGGACGGUCCCAUCGGUACUCCA
RNA folding: structural biology, spectroscopy of biomolecules, understanding molecular function
Empirical parameters: biophysical chemistry (thermodynamics and kinetics)
RNA structure of minimal free energy
[Figure: free energy axis with the conformations S0(h) to S9(h). S0(h) is the minimum of free energy; S1(h) to S9(h) are suboptimal conformations.]
The minimum free energy structures on a discrete space of conformations
Elements of RNA secondary structures as used in free energy calculations
$\Delta G_0^{300} = \sum_{\text{stacks of base pairs}} g_{ij,kl} + \sum_{\text{hairpin loops}} h(n_l) + \sum_{\text{bulges}} b(n_b) + \sum_{\text{internal loops}} i(n_i) + \cdots$
Maximum matching
An example of a dynamic programming computation of the maximum number of base pairs
Back tracking yields the structure(s).
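[Figure: when base k pairs with base j+1, the interval [i, j] is split into the two independent intervals [i, k-1] and [k+1, j], which contribute X_{i,k-1} and X_{k+1,j} base pairs]
$X_{i,j+1} = \max\{ X_{i,j},\ \max_{i \le k \le j-1} [ (X_{i,k-1} + 1 + X_{k+1,j}) \cdot \rho_{k,j+1} ] \}$
where $\rho_{k,j+1} = 1$ if bases k and j+1 can form a pair and 0 otherwise.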
i\j      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
         G  G  C  G  C  G  C  C  C  G  G  C  G  C  C
 1  G    *  *  1  1  1  1  2  3  3  3  4  4  5  6  6
 2  G       *  *  0  1  1  2  2  2  3  3  4  4  5  6
 3  C          *  *  0  1  1  1  2  3  3  3  4  5  5
 4  G             *  *  0  1  1  2  2  2  3  4  5  5
 5  C                *  *  0  1  1  2  2  3  4  4  4
 6  G                   *  *  1  1  1  2  3  3  3  4
 7  C                      *  *  0  1  2  2  2  2  3
 8  C                         *  *  1  1  1  2  2  2
 9  C                            *  *  1  1  2  2  2
10  G                               *  *  1  1  1  2
11  G                                  *  *  0  1  1
12  C                                     *  *  0  1
13  G                                        *  *  1
14  C                                           *  *
15  C                                              *
Minimum free energy computations are based on empirical energies
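The maximum matching table can be filled with a short dynamic programming routine. The sketch below follows the recursion for X_{i,j+1} with 0-based indices and assumes that at least one unpaired base must lie between two paired bases, which is what the starred entries next to the diagonal of the table suggest; the names `max_matching` and `min_loop` are hypothetical.

```python
PAIRS = {"AU", "UA", "CG", "GC", "GU", "UG"}


def max_matching(seq, min_loop=1):
    """Return the table x with x[i][j] = maximum number of base pairs
    that can be formed within seq[i..j] (both ends inclusive)."""
    n = len(seq)
    x = [[0] * n for _ in range(n)]

    def get(i, j):
        # Empty intervals (j < i) contribute no base pairs.
        return x[i][j] if i <= j else 0

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = x[i][j - 1]                    # base j stays unpaired
            for k in range(i, j - min_loop):      # base j pairs with base k
                if seq[k] + seq[j] in PAIRS:
                    best = max(best, get(i, k - 1) + 1 + get(k + 1, j - 1))
            x[i][j] = best
    return x


table = max_matching("GGCGCGCCCGGCGCC")
print(table[0][-1])  # 6, the top right entry of the table
```

Back tracking through the same table (asking for each entry which case of the maximum produced it) recovers one or all structures with that number of base pairs.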
2. Inverse folding and neutral networks
Sequence, structure, and design
RNA sequence: GUAUCGAAAUACGUAGCGUAUGGGGAUGCUGGACGGUCCCAUCGGUACUCCA
RNA folding: structural biology, spectroscopy of biomolecules, understanding molecular function
RNA structure of minimal free energy
Inverse folding of RNA: biotechnology, design of biomolecules with predefined structures and functions
Inverse folding algorithm: iterative determination of a sequence for the given secondary structure
Compatibility of sequences and structures
Inverse folding algorithm
I0 I1 I2 I3 I4 ... Ik Ik+1 ... It
S0 S1 S2 S3 S4 ... Sk Sk+1 ... St
Ik+1 = Mk(Ik) and ΔdS(Sk,Sk+1) = dS(Sk+1,St) - dS(Sk,St) < 0
M ... base or base pair mutation operator
dS(Si,Sj) ... distance between the two structures Si and Sj
'Unsuccessful trial' ... termination after n steps
Approach to the target structure Sk in the inverse folding algorithm
The inverse folding algorithm searches for sequences that form a given RNA secondary structure under the minimum free energy criterion.
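The search can be sketched as an adaptive walk: mutate the sequence, fold it, and keep the mutation only if the folded structure moves closer to the target. The sketch below rests on loud assumptions: `toy_fold` is a stand-in that returns one maximum-matching structure instead of a true minimum free energy structure, the structure distance is the Hamming distance between the symbolic notations, and only single-base mutations are tried. All function names are hypothetical.

```python
import random

PAIRS = {"AU", "UA", "CG", "GC", "GU", "UG"}


def toy_fold(seq, min_loop=3):
    """Stand-in for an mfe folding routine (assumption): maximum matching."""
    n = len(seq)
    x = [[0] * n for _ in range(n)]

    def get(i, j):
        return x[i][j] if 0 <= i <= j < n else 0

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = x[i][j - 1]
            for k in range(i, j - min_loop):
                if seq[k] + seq[j] in PAIRS:
                    best = max(best, get(i, k - 1) + 1 + get(k + 1, j - 1))
            x[i][j] = best

    structure, todo = ["."] * n, [(0, n - 1)]
    while todo:                                   # back tracking
        i, j = todo.pop()
        if j - i <= min_loop:
            continue
        if x[i][j] == x[i][j - 1]:
            todo.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k] + seq[j] in PAIRS
                    and get(i, k - 1) + 1 + get(k + 1, j - 1) == x[i][j]):
                structure[k], structure[j] = "(", ")"
                todo += [(i, k - 1), (k + 1, j - 1)]
                break
    return "".join(structure)


def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))


def inverse_fold(target, max_steps=2000, seed=0, fold=toy_fold):
    """Accept a mutation only if the distance to the target decreases."""
    rng = random.Random(seed)
    seq = [rng.choice("ACGU") for _ in target]
    dist = hamming(fold("".join(seq)), target)
    for _ in range(max_steps):                    # 'unsuccessful trial' limit
        if dist == 0:
            break
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice("ACGU")
        new_dist = hamming(fold("".join(seq)), target)
        if new_dist < dist:
            dist = new_dist
        else:
            seq[pos] = old                        # reject the mutation
    return "".join(seq), dist
```

A call such as `inverse_fold("((((....))))....")` returns the final sequence together with its remaining distance to the target; a nonzero distance corresponds to an unsuccessful trial. In practice the `fold` argument would be replaced by a thermodynamic folding routine.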
A mapping and its inversion
$G_k = \psi^{-1}(S_k) = \{ I_j \mid \psi(I_j) = S_k \}$
$\psi(I_j) = S_k$
Space of genotypes: $\mathcal{I} = \{I_1, I_2, I_3, I_4, \ldots, I_N\}$; Hamming metric
Space of phenotypes: $\mathcal{S} = \{S_1, S_2, S_3, S_4, \ldots, S_M\}$; metric (not required)
$N \gg M$
3. Evolutionary optimization of structure
Phenylalanyl-tRNA as target structure
Structure of randomly chosen initial sequence
Evolution in silico
W. Fontana, P. Schuster, Science 280 (1998), 1451-1455
Evolution of RNA molecules as a Markov process and its analysis by means of the relay series
Replication rate constant: $f_k = \gamma / [\alpha + \Delta d_S^{(k)}]$, with $\Delta d_S^{(k)} = d_H(S_k, S_\tau)$
Selection constraint: Population size, N = # RNA molecules, is controlled by the flow: $N(t) \approx \bar{N} \pm \sqrt{\bar{N}}$
Mutation rate: p = 0.001 per site and replication
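The two ingredients of the simulation, a replication rate that falls off with the Hamming distance between a molecule's structure and the target structure, and error-prone copying with a fixed mutation rate per site, can be sketched as follows. The two constants of the rate law are called `gamma` and `alpha` here, and their default values are placeholders chosen for illustration, not values taken from the talk.

```python
import random


def hamming(a, b):
    """Hamming distance between two structures in symbolic notation."""
    return sum(u != v for u, v in zip(a, b))


def replication_rate(structure, target, gamma=1.0, alpha=0.01):
    """f = gamma / (alpha + d_H(structure, target)); gamma and alpha
    are illustrative placeholder values (assumption)."""
    return gamma / (alpha + hamming(structure, target))


def replicate(seq, p=0.001, rng=random):
    """Copy a sequence; each site mutates to another base with probability p."""
    copy = []
    for base in seq:
        if rng.random() < p:
            base = rng.choice([b for b in "ACGU" if b != base])
        copy.append(base)
    return "".join(copy)
```

A molecule that already folds into the target has the largest rate, `gamma / alpha`, so a small `alpha` strongly favors the target structure once it appears in the population.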
The flow reactor as a device for studies of evolution in vitro and in silico
In silico optimization in the flow reactor: Evolutionary Trajectory
28 neutral point mutations during a long quasi-stationary epoch
Transition inducing point mutations change the molecular structure
Neutral point mutations leave the molecular structure unchanged
Neutral genotype evolution during phenotypic stasis
A sketch of optimization on neutral networks
Randomly chosen initial structure
Phenylalanyl-tRNA as target structure
Application of molecular evolution to problems in biotechnology
4. Suboptimal conformations and kinetic folding
RNA secondary structures derived from a single sequence
An algorithm for the computation of all suboptimal structures of RNA molecules using the same concept for retrieval as applied in the sequence alignment algorithm by
M.S. Waterman and T.F. Smith. Math.Biosci. 42:257-266, 1978.
An algorithm for the computation of RNA folding kinetics
The Folding Algorithm
A sequence I specifies an energy ordered set of compatible structures S(I):
S(I) = {S0 , S1 , … , Sm , O}
A trajectory Tk(I) is a time ordered series of structures in S(I). A folding trajectory is defined by starting with the open chain O and ending with the global minimum free energy structure S0 or a metastable structure Sk which represents a local energy minimum:
T0(I) = {O , S (1) , … , S (t-1) , S (t) , S (t+1) , … , S0}
Tk(I) = {O , S (1) , … , S (t-1) , S (t) , S (t+1) , … , Sk}
Kinetic equation
$\frac{dP_k}{dt} = \sum_{i=0}^{m+1} \left( P_{ik}(t) - P_{ki}(t) \right) = \sum_{i=0}^{m+1} k_{ik} P_i - P_k \sum_{i=0}^{m+1} k_{ki}$, $k = 0, 1, \ldots, m+1$
Transition rate parameters Pij(t) are defined by
$P_{ij}(t) = P_i(t) \cdot k_{ij} = P_i(t) \cdot \exp(-\Delta G_{ij}/2RT) / \Sigma_i$
$P_{ji}(t) = P_j(t) \cdot k_{ji} = P_j(t) \cdot \exp(-\Delta G_{ji}/2RT) / \Sigma_j$
with the normalization $\Sigma_i = \sum_{k=1, k \ne i}^{m+2} \exp(-\Delta G_{ik}/2RT)$
The symmetric rule for transition rate parameters is due to Kawasaki (K. Kawasaki, Diffusion constants near the critical point for time dependent Ising models. Phys. Rev. 145:224-230, 1966).
Formulation of kinetic RNA folding as a stochastic process and by reaction kinetics
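The reaction-kinetics formulation can be illustrated with a few states and an explicit Euler integration of the kinetic equation. This is a sketch under stated assumptions: ΔG_ij is taken as G_j − G_i, every state is treated as a neighbor of every other state (a real conformation space only connects structures that differ by an elementary move), and RT is set to a value for 37 °C. The names `transition_rates` and `integrate` are hypothetical.

```python
import math

RT = 0.6163  # kcal/mol at about 37 degrees C (assumed temperature)


def transition_rates(energies, rt=RT):
    """k[i][j] = exp(-(G_j - G_i) / 2RT) / Sigma_i for j != i, where
    Sigma_i sums the same exponential over all states other than i."""
    n = len(energies)
    k = []
    for i in range(n):
        w = [math.exp(-(energies[j] - energies[i]) / (2 * rt)) if j != i else 0.0
             for j in range(n)]
        sigma = sum(w)
        k.append([wj / sigma for wj in w])
    return k


def integrate(p, k, dt=0.01, steps=1000):
    """Euler steps for dP_j/dt = sum_i k[i][j] * P_i - P_j * sum_i k[j][i]."""
    n = len(p)
    p = list(p)
    for _ in range(steps):
        flow = [sum(k[i][j] * p[i] for i in range(n)) - p[j] * sum(k[j])
                for j in range(n)]
        p = [p[j] + dt * flow[j] for j in range(n)]
    return p


# Open chain (0.0) and two structures with free energies -4.3 and -3.5.
rates = transition_rates([0.0, -4.3, -3.5])
print(integrate([1.0, 0.0, 0.0], rates))
```

Starting from the open chain, the probability flows into the two low-energy states while the total probability stays equal to one, which is the conservation property built into the gain and loss terms of the kinetic equation.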
[Figure: free energy axis with suboptimal conformations; the local minima Sh, S1(h), S2(h), S5(h), S6(h), S7(h), and S9(h) are marked]
Search for local minima in conformation space
[Figure: free energy along a "reaction coordinate" between two local minima Sk and Sl, separated by the saddle point Tlk; the same pair of minima drawn as a branch of the "barrier tree"]
Definition of a 'barrier tree'
CUGCGGCUUUGGCUCUAGCC
....((((........)))) -4.30
(((.(((....))).))).. -3.50
(((..((....))..))).. -3.10
..........(((....))) -2.80
..(((((....)))...)). -2.20
....(((..........))) -2.20
((..(((....)))..)).. -2.00
..((.((....))....)). -1.60
....(((....)))...... -1.60
.....(((........))). -1.50
.((.(((....))).))... -1.40
....((((..(...).)))) -1.40
.((..((....))..))... -1.00
(((.(((....)).)))).. -0.90
(((.((......)).))).. -0.90
....((((..(....))))) -0.80
.....((....))....... -0.80
..(.(((....))))..... -0.60
....(((....)).)..... -0.60
(((..(......)..))).. -0.50
..(((((....)).)..)). -0.50
..(.(((....))).).... -0.40
..((.......))....... -0.30
..........((......)) -0.30
...........((....)). -0.30
(((.(((....)))).)).. -0.20
....(((.(.......)))) -0.20
....(((..((....))))) -0.20
..(..((....))..).... 0.00
.................... 0.00
.(..(((....)))..)... 0.10
[Figure: barrier tree of these structures, with S0 and S1 labeled]
M.T. Wolfinger, W.A. Svrcek-Seiler, C. Flamm, I.L. Hofacker, P.F. Stadler. 2004. J.Phys.A: Math.Gen. 37:4731-4741.
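The first step towards a barrier tree is to find the local minima of the energy-sorted list. The sketch below assumes that two structures are neighbors when they differ by exactly one base pair, and it only looks at the ten lowest structures of the list for CUGCGGCUUUGGCUCUAGCC, so a structure can appear to be a minimum merely because its lower-energy neighbors are missing from this truncated set. The names `LANDSCAPE`, `base_pairs`, and `local_minima` are hypothetical.

```python
# Ten lowest structures from the energy-sorted list (truncated on purpose).
LANDSCAPE = """\
....((((........)))) -4.30
(((.(((....))).))).. -3.50
(((..((....))..))).. -3.10
..........(((....))) -2.80
..(((((....)))...)). -2.20
....(((..........))) -2.20
((..(((....)))..)).. -2.00
..((.((....))....)). -1.60
....(((....)))...... -1.60
.....(((........))). -1.50
"""


def base_pairs(structure):
    """Set of base pairs (i, j) of a structure in symbolic notation."""
    stack, pairs = [], set()
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            pairs.add((stack.pop(), pos))
    return frozenset(pairs)


def local_minima(text):
    """Structures without a lower-energy neighbor; neighbors differ by the
    opening or closing of exactly one base pair (assumed move set)."""
    entries = []
    for line in text.splitlines():
        structure, energy = line.split()
        entries.append((structure, float(energy), base_pairs(structure)))
    minima = []
    for structure, energy, pairs in entries:
        has_lower_neighbor = any(len(pairs ^ other) == 1 and e < energy
                                 for _, e, other in entries)
        if not has_lower_neighbor:
            minima.append(structure)
    return minima


minima = local_minima(LANDSCAPE)
```

Within this subset the two lowest structures are local minima, while the third structure is not, because closing one additional base pair turns it into the second structure at lower free energy. A barrier tree then joins such minima at the lowest saddle point that connects them.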
Arrhenius kinetics
M.T. Wolfinger, W.A. Svrcek-Seiler, C. Flamm, I.L. Hofacker, P.F. Stadler. 2004. J.Phys.A: Math.Gen. 37:4731-4741.
Arrhenius kinetics and exact solution of the kinetic equation
[Figure: the sequence JN1LH (5' to 3') drawn in alternative secondary structures, with regions labeled 1D, 2D, and R; free energies shown: -28.6, -31.8, -28.2, and -28.6 kcal/mol]
Design of an RNA switch
[Figure: barrier tree with local minima numbered 1 to 50 and barrier heights annotated at the branches; free energy axis from -26.0 to -50.0 kcal/mole]
J1LH barrier tree
J.H.A. Nagel, C. Flamm, I.L. Hofacker, K. Franke, M.H. de Smit, P. Schuster, and C.W.A. Pleij. Nucleic Acids Res. 34:3568-3576 (2006)
A ribozyme switch
E.A. Schultes, D.P. Bartel, Science 289 (2000), 448-452
Two ribozymes of chain length n = 88 nucleotides: an artificial ligase (A) and a natural cleavage ribozyme of hepatitis-δ-virus (B)
The sequence at the intersection:
An RNA molecule that is 88 nucleotides long and can form both structures
Two neutral walks through sequence space with conservation of structure and catalytic activity
Acknowledgement of support
Fonds zur Förderung der wissenschaftlichen Forschung (FWF): Projects No. 09942, 10578, 11065, 13093, 13887, and 14898
Wiener Wissenschafts-, Forschungs- und Technologiefonds (WWTF): Project No. Mat05
Jubiläumsfonds der Österreichischen Nationalbank: Project No. Nat-7813
European Commission: Contracts No. 98-0189, 12835 (NEST)
Austrian Genome Research Program – GEN-AU: Bioinformatics Network (BIN)
Österreichische Akademie der Wissenschaften
Siemens AG, Austria
Universität Wien and the Santa Fe Institute
Coworkers
Walter Fontana, Harvard Medical School, MA
Christian Forst, Christian Reidys, Los Alamos National Laboratory, NM
Peter Stadler, Bärbel Stadler, Universität Leipzig, GE
Jord Nagel, Kees Pleij, Universiteit Leiden, NL
Christoph Flamm, Ivo L. Hofacker, Andreas Svrček-Seiler, Universität Wien, AT
Stefan Bernhart, Jan Cupal, Lukas Endler, Kurt Grünberger, Michael Kospach, Ulrike Langhammer, Rainer Machne, Ulrike Mückstein, Hakim Tafer, Andreas Wernitznig, Stefanie Widder, Michael Wolfinger, Stefan Wuchty, Dilmurat Yusuf, Universität Wien, AT
Ulrike Göbel, Walter Grüner, Stefan Kopp, Jaqueline Weber, Institut für Molekulare Biotechnologie, Jena, GE