software article open access viennarna package 2

14
SOFTWARE ARTICLE Open Access ViennaRNA Package 2.0 Ronny Lorenz 1* , Stephan H Bernhart 1 , Christian Höner zu Siederdissen 1 , Hakim Tafer 2 , Christoph Flamm 1 , Peter F Stadler 2,1,3,4,5,6 and Ivo L Hofacker 1,3,7* Abstract Background: Secondary structure forms an important intermediate level of description of nucleic acids that encapsulates the dominating part of the folding energy, is often well conserved in evolution, and is routinely used as a basis to explain experimental findings. Based on carefully measured thermodynamic parameters, exact dynamic programming algorithms can be used to compute ground states, base pairing probabilities, as well as thermodynamic properties. Results: The ViennaRNA Package has been a widely used compilation of RNA secondary structure related computer programs for nearly two decades. Major changes in the structure of the standard energy model, the Turner 2004 parameters, the pervasive use of multi-core CPUs, and an increasing number of algorithmic variants prompted a major technical overhaul of both the underlying RNAlib and the interactive user programs. New features include an expanded repertoire of tools to assess RNA-RNA interactions and restricted ensembles of structures, additional output information such as centroid structures and maximum expected accuracy structures derived from base pairing probabilities, or z-scores for locally stable secondary structures, and support for input in fasta format. Updates were implemented without compromising the computational efficiency of the core algorithms and ensuring compatibility with earlier versions. Conclusions: The ViennaRNA Package 2.0, supporting concurrent computations via OpenMP, can be downloaded from http://www.tbi.univie.ac.at/RNA. Background A typical single stranded-nucleic acid molecule has the propensity to form double helical structures causing the molecule to fold back onto itself. Simple rules of com- plementary base pairing govern this process, which results in a regular pattern of Watson-Crick and GU pairings (helices) and intervening stretches of less regu- larly ordered nucleotides (loops), collectively known as the molecules secondary structure. Secondary structure elements may be placed in close spatial proximity allow- ing additional non-covalent interactions. These are not as frequent and often are energetically less favorable compared to canonical base pairs, thus rendering the 3- dimensional tertiary structure of an RNA to be domi- nated by the underlying scaffold of the secondary struc- ture. The canonical base pairing governs not only the thermodynamics but also the folding kinetics, which can be approximated as a hierarchical process in which sec- ondary structure is formed before tertiary structure [1]. The dominance of base pairing and the confinement to a single interaction partner makes it possible to model RNA (and DNA) secondary structures at a purely combinatorial level, completely ignoring both atom-scale details and spatial embeddings. Formally, an RNA sec- ondary structure is a (labeled) graph whose nodes repre- sent nucleotides. The edge set contains edges between consecutive nodes (i, i + 1) representing the phosphate backbone as well as edges between base pairs. For the latter, the following conditions must hold: 1. base pair edges are formed only between nucleo- tides that form Watson-Crick or GU base pairs; 2. no two base pair edges emanate from the same vertex, i.e., a secondary structure is a matching; 3. base pair edges span at least three unpaired bases; 4. if the vertices are placed in 5to 3order on the circumference of a circle and edges are drawn as straight lines, no two edges cross. * Correspondence: [email protected]; [email protected] 1 Institute for Theoretical Chemistry and Structural Biology, University of Vienna, Währingerstraße 17/3, A-1090 Vienna, Austria Full list of author information is available at the end of the article Lorenz et al. Algorithms for Molecular Biology 2011, 6:26 http://www.almob.org/content/6/1/26 © 2011 Lorenz et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Upload: others

Post on 20-Apr-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SOFTWARE ARTICLE Open Access ViennaRNA Package 2

SOFTWARE ARTICLE Open Access

ViennaRNA Package 2.0Ronny Lorenz1*, Stephan H Bernhart1, Christian Höner zu Siederdissen1, Hakim Tafer2, Christoph Flamm1,Peter F Stadler2,1,3,4,5,6 and Ivo L Hofacker1,3,7*

Abstract

Background: Secondary structure forms an important intermediate level of description of nucleic acids thatencapsulates the dominating part of the folding energy, is often well conserved in evolution, and is routinely usedas a basis to explain experimental findings. Based on carefully measured thermodynamic parameters, exactdynamic programming algorithms can be used to compute ground states, base pairing probabilities, as well asthermodynamic properties.

Results: The ViennaRNA Package has been a widely used compilation of RNA secondary structure relatedcomputer programs for nearly two decades. Major changes in the structure of the standard energy model, theTurner 2004 parameters, the pervasive use of multi-core CPUs, and an increasing number of algorithmic variantsprompted a major technical overhaul of both the underlying RNAlib and the interactive user programs. Newfeatures include an expanded repertoire of tools to assess RNA-RNA interactions and restricted ensembles ofstructures, additional output information such as centroid structures and maximum expected accuracy structuresderived from base pairing probabilities, or z-scores for locally stable secondary structures, and support for input infasta format. Updates were implemented without compromising the computational efficiency of the corealgorithms and ensuring compatibility with earlier versions.

Conclusions: The ViennaRNA Package 2.0, supporting concurrent computations via OpenMP, can bedownloaded from http://www.tbi.univie.ac.at/RNA.

BackgroundA typical single stranded-nucleic acid molecule has thepropensity to form double helical structures causing themolecule to fold back onto itself. Simple rules of com-plementary base pairing govern this process, whichresults in a regular pattern of Watson-Crick and GUpairings (helices) and intervening stretches of less regu-larly ordered nucleotides (loops), collectively known asthe molecule’s secondary structure. Secondary structureelements may be placed in close spatial proximity allow-ing additional non-covalent interactions. These are notas frequent and often are energetically less favorablecompared to canonical base pairs, thus rendering the 3-dimensional tertiary structure of an RNA to be domi-nated by the underlying scaffold of the secondary struc-ture. The canonical base pairing governs not only thethermodynamics but also the folding kinetics, which can

be approximated as a hierarchical process in which sec-ondary structure is formed before tertiary structure [1].The dominance of base pairing and the confinement

to a single interaction partner makes it possible tomodel RNA (and DNA) secondary structures at a purelycombinatorial level, completely ignoring both atom-scaledetails and spatial embeddings. Formally, an RNA sec-ondary structure is a (labeled) graph whose nodes repre-sent nucleotides. The edge set contains edges betweenconsecutive nodes (i, i + 1) representing the phosphatebackbone as well as edges between base pairs. For thelatter, the following conditions must hold:

1. base pair edges are formed only between nucleo-tides that form Watson-Crick or GU base pairs;2. no two base pair edges emanate from the samevertex, i.e., a secondary structure is a matching;3. base pair edges span at least three unpaired bases;4. if the vertices are placed in 5’ to 3’ order on thecircumference of a circle and edges are drawn asstraight lines, no two edges cross.

* Correspondence: [email protected]; [email protected] for Theoretical Chemistry and Structural Biology, University ofVienna, Währingerstraße 17/3, A-1090 Vienna, AustriaFull list of author information is available at the end of the article

Lorenz et al. Algorithms for Molecular Biology 2011, 6:26http://www.almob.org/content/6/1/26

© 2011 Lorenz et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative CommonsAttribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction inany medium, provided the original work is properly cited.

Page 2: SOFTWARE ARTICLE Open Access ViennaRNA Package 2

The last condition ensures that the graph is outerpla-nar and therefore excludes so-called pseudo-knots.Matching problems usually have cost functions deter-mined by edge-weights. The earliest predictions of RNAsecondary structures in the early 1970s indeed usedsuch simple energy models [2]. Detailed melting experi-ments, however, soon showed that a different, morecomplex type of energy function is necessary to properlymodel the thermodynamics of nucleic acid structures.Instead of individual base pairs, the energy contributionsare dominated by base-pair stacking and the destabiliz-ing entropic effects of unpaired “loops”. Sequence-dependent energy parameters for these building blockscontribute to a very good approximation additively tothe folding energy [3]. Over the last two decades, thisadditive standard energy model has been repeatedlyrefined and updated, see e.g. [4-9].The RNA folding problem is solvable by means of

dynamic programming. The simplest version, known asmaximum circular matching problem, accounts for basepairing energies only [10,11]. In the early 1980s Nussi-nov and Jacobson [12] and Michael Zuker with colla-borators [13,14] demonstrated that the loop-basedenergy model is also amenable to the same algorithmicideas. Their work made computational RNA structureprediction accurate and efficient enough for practicaluse, resulting in the first versions of mfold. A decadelater, John McCaskill realized that the dynamic pro-gramming recursions can be adapted to compute thepartition function of an equilibrium ensemble of RNAmolecules [15], paving the way for efficient computa-tional access to accurate thermodynamic modellingwithout exceeding an asymptotic time complexity ofO(n3) .The secondary structure model of RNA perfectly fits

together with modern genomics and transcriptomicssince it works at the same level of abstraction, treatingnucleotides as basic entities. With the increasing avail-ability of RNA sequence data, and the realization thatmany of the functional RNAs have evolutionary well-conserved secondary structures, many research groupsdeveloped a plethora of specialized tools for variousaspects of RNA bioinformatics. As an alternative to thedirect measurement of thermodynamic parameters, forinstance, machine learning approaches employing sto-chastic context free grammars (SCFG) were introducede.g. in the infernal suite [16,17]. The algorithmicwork horses of the SCFG approach, the Cocke-Younger-Kasami (CYK), the inside and the outside algorithms,are also dynamic programming schemes. They are, infact, very close cousins of the minimum free energy andpartition function folding algorithms. The contrafoldtools in fact recently bridged the apparent gap between

the thermodynamic and the machine learning approachto RNA bioinformatics proposing to learn a parameterset for a SCFG that structurally matches the standardenergy model [18].Several other tools implement dynamic programming

based RNA secondary structures prediction: UNAfold[19] is the successor of the original mfold program andadds suppport for predicting RNA-RNA hybridization.RNAstructure [20] started as a reimplementation ofmfold with a graphical user interface in Windows, butis now available for other platforms and has added sev-eral additional algorithms such as partition functionfolding and suboptimal structures. The NUPACK suite[21] focuses on folding of several interacting RNAstrands and design problems. The group around KiyoshiAsai developed several tools focusing the usage of cen-troid and maximum expected accuracy (MEA) estima-tors, see e.g. [22]. Ye Ding’s Sfold program [23] wasthe first to introduce stochastic structure sampling. Thegroup around Robert Giegerich provides several RNArelated tools, notably the RNAshapes [24] program.The Vienna RNA Package [25] has its roots in a

series of large-scale simulation studies aiming at anunderstanding of adaptive evolution on rugged fitnesslandscapes [26-28] and the statistical properties of thesequence-structure relationships of RNA [29-31] ratherthan the detailed analysis of individual RNA moleculesof biological interest. The primary design goals for itsimplementation in the early 1990s, therefore, were two-fold. First and foremost, the basic folding algorithmswere to be implemented so as to be as efficient as possi-ble in their usage of both CPU and memory resources.The core algorithms are accessible as a C library, whichlater on was also equipped with Perl bindings to facili-tate interoperability with this commonly used scriptinglanguage. Secondly, the interactive programs were to beused mostly in (shell-script) pipelines, hence they use asimple command-line interface and, where possible, theyread from and write to a stream. This feature made iteasy to construct a suite of web services [32] providingeasy access to most functionalities of the Vienna RNAPackage. With the rising tide of first genomics andthen transcriptomics data, the need for both efficientimplementation and easy incorporation into pipelinesremained, even though the focus gradually shifted fromlarge-scale simulation to large-scale data analysis. Littlehas changed in the core folding algorithms in the 17years since the first publication [25] of the package. Onthe other hand, a variety of variants have been includedsuch as consensus structure prediction from alignmentsor scanning versions capable of dealing with local struc-tures in genome-scale data sets. The systematic overhaulof the Vienna RNA Package documented here was

Lorenz et al. Algorithms for Molecular Biology 2011, 6:26http://www.almob.org/content/6/1/26

Page 2 of 14

Page 3: SOFTWARE ARTICLE Open Access ViennaRNA Package 2

largely triggered by the publication of improved parame-trizations of the energy model, which affected nearlyevery component in the library, and by the progress incomputer technology, which led to the widespreaddeployment of shared-memory multi-core processors. Inorder to exploit these hardware features a restructuringof the RNA library to make it thread-safe and hence fitfor use in concurrent computations was required.Beyond these technical improvements, the Vienna RNAPackage 2.0 features a number of additions to itsalgorithmic repertoire, an improved API to RNAlib,and an expanded toolkit of auxiliary programs.

Interactive toolsSince its first release, the ViennaRNA Packageincluded interactive command-line tools which enableusers to access the high performance implementationsof the algorithms via a command-line interface. Toensure scalability of the use-cases all programs weredeveloped with the objective of handling input- and out-put-streams, facilitating their integration into UNIXpipes. Thus pre- and post-processing of the input/outputdata can proceed without the need of intermediateinput- or output-files. Most programs of the Vien-naRNA Package furthermore are able to operate inbatch mode, handling large sets of input data with a sin-gle call. By default, the programs of the ViennaRNAPackage generate an output that is meant to be easilyparsable while keeping it human-readable.The core of the package provides several variants of

the RNA folding recursion: energy minimization, parti-tion function and base pairing probabilities, backtracingof suboptimal structures, alignment-based as well asscanning versions. The decision whether a certain func-tionality is implemented as a separate stand alone pro-gram or as an optional command-line switch is basedon the compatibility of I/O formats and internal datastructures. Table 1 presents the implemented model var-iants as well as the data formats for each program,whereas Figure 1 illustrates example program callstogether with their corresponding output. In the follow-ing paragraphs, we provide a comprehensive summaryof programs included in the ViennaRNA Package.

FoldingThe main secondary structure prediction tool is RNA-fold, which computes the minimum free energy (MFE)and backtraces an optimal secondary structure. Usingthe -p option, RNAfold also uses McCaskill’s algo-rithm [15] to compute the partition function, the matrixof base pairing probabilities, and the centroid structure.The RNAfold output is a string representation of thestructure and the folding energy written to the standardoutput stream. With the -p option, it also creates a

PostScript file containing the base pairing probabilitymatrix. Circular RNA sequences are rare in nature andappear infrequently in practical applications. With the–circ option this case is handled as a post-processingfor the forward recursion and a preprocessing of thebackward recursions without compromising the perfor-mance of the folding algorithms for linear RNAs [33].Constraints can be supplied to the folding algorithmsenforcing that individual positions are paired, unpaired,or paired with specific partners.The program RNAsubopt can be used to generate

suboptimal structures. Using command-line options, itcan switch between three different ways of generatingthem: by default, it generates the complete set of subop-timal structures within a certain energy band, the size ofwhich can be chosen using the -e option [34]. With the-p option it uses stochastic backtracking [35] from thepartition function to generate a Boltzmann-weightedrandom sample of structures, effectively providing thefunctionality of sfold [23]. Finally, the -z option gen-erates suboptimal secondary structures according toZuker’s algorithm [36]. The resulting set consists, foreach basepair (i, j) that can be formed by the inputstructure, of the energetically most favorable structurethat contains the (i, j)-pair. This option implements afeature that has been used frequently in applications ofthe mfold package.RNALfold [37] is a “scanning” version of the folding

programs that can be used to calculate local stable sub-structures of very long RNA molecules. Local in thiscontext means that the sequence interval spanned by abase pair is limited by a user-defined upper bound (setby the -L option). Scanning versions of RNA foldingprograms conceptutally perform computations for allsequence-windows of a fixed size. Algorithmically, theyare faster than the naïve approach by re-using partialresults for overlapping windows. RNALfold does notcome with a partition function version because the glo-bal partition function with restricted base pair span is oflimited interest in practical applications. Instead, a sepa-rate program, RNAplfold [38], computes the basepairing probability averaged over all sequence windowsthat contain the putative pair. This tool can also beused to compute the local accessibilities, i.e., the prob-abilities that sequence intervals are single-stranded inthermodynamic equilibrium (option -a).RNA2Dfold [39] implements energy minimization,

partition function computations, and stochastic backtra-cing for the two dimensional projection of the secondarystructure space that is defined by the base pair distancesfrom the two prescribed reference structures. Therestricted ensembles of secondary structures are usefulin particular for tracing refolding pathways and to com-pute lower bounds of energy barriers between

Lorenz et al. Algorithms for Molecular Biology 2011, 6:26http://www.almob.org/content/6/1/26

Page 3 of 14

Page 4: SOFTWARE ARTICLE Open Access ViennaRNA Package 2

Table

1Mainfeatures

oftheinteractiveprogramsprovided

bytheViennaRNAPackage2.0

Prog

ram

Energymod

elvarian

tsDataform

ats

intram

olecular

bpinterm

olecular

bpstructure

constraint

cano

nical

structures

circular

sequ

ence

dang

ling

endmod

el(s)

centroid

structure

MEA

structure

subo

ptim

alstructures

base

pair

prob

abilitie

s/pa

rtition

functio

n

inpu

tform

at(s)

text

output

file(s)

PostScript

plot(s)

singlesequ

ence

analysis(globa

lvariant)

RNAfold

+-

++

+0,1,2,3

++

-+

F,V

-+

RNAsubopt

++

++

+0,1,2,3

NA

NA

B,E,Z

-F,V

--

RNAcofold

++

++

NA

0,1,2,3

--

-+

F,V

++

RNAup

++

++

-0,2

--

-+

V+

-

RNAduplex

-+

-+

-0,1,2,3

--

E-

V+

-

RNA2Dfold*

+-

--

+0,2

-+

B-

V-

-

RNAPKplex*

+-

-+

-0,1,2,3

--

E-

F,V

-+

RNAplex*

-+

+-

-2

NA

NA

E-

V,W

++

RNAsnoop*

++

++

-2

NA

NA

E-

V,W

++

singlesequ

ence

analysis(localvariant)

RNALfold

+-

-+

-0,1,2,3

--

--

F,V

--

RNAplfold

+-

-+

-0,2

--

--

F,V

++

comparativean

alysis(globa

lvariant)

RNAalifold

+-

++

+0,2

++

B+

C,S

-+

RNAaliduplex

-+

-+

-0,1,2,3

--

E-

C,S

++

comparativean

alysis(localvariant)

RNALalifold*

+-

-+

-0,1,2,3

--

-+

C,S

++

Misc.a

nalysis/Utilities

RNAeval

++

NA

NA

+0,1,2,3

NA

NA

NA

NA

F,V

--

RNAplot

NA

NA

NA

NA

+NA

NA

NA

NA

NA

F,V

-+

RNAheat

+-

-+

-0,2

--

--

F,V

--

RNAinverse

+-

NA

NA

-0,1,2,3

NA

NA

NA

NA

V-

-

RNApaln

+-

-+

-0,1,2,3

NA

NA

NA

+V

++

RNApdist

+-

--

-0,1,2,3

NA

NA

NA

+V

++

RNAdistance

+-

NA

NA

NA

NA

NA

NA

NA

NA

V+

+

Thecharacters

+an

d-show

presen

cean

dab

senceof

acertainfeature,

while

NAindicatesthat

thefeatureisno

tap

plicab

lein

agivencontext.Abb

reviations

ofinpu

tfileform

atsare(C)lu

stal-format,(F)asta-format,

(S)tockh

olm-format,a

nd(V)ie

nnaR

NA-format.S

uppo

rtforpred

ictio

nof

subo

ptim

alstructures

may

beim

plem

entedas

(B)oltzman

nweigh

tedsampling,

exha

ustiv

e(E)num

erationof

allstructures

inagivenen

ergy

band

,and

(Z)uker-stylesubo

ptim

alstructures.P

rogram

smarkedby

anasterisk(*)wereno

tinclud

edin

aprevious

releaseof

theViennaRNAPackage.

Lorenz et al. Algorithms for Molecular Biology 2011, 6:26http://www.almob.org/content/6/1/26

Page 4 of 14

Page 5: SOFTWARE ARTICLE Open Access ViennaRNA Package 2

alternative conformations of an RNA molecule.Although RNA2Dfold is based upon the usual dynamicprogramming recursion of energy-directed folding, theasymptotic time complexity is multiplied by a factor ofk2 · l2, where k and l are maximum base pair distances

to the first and the second reference structure, resp.Hence, the overall time complexity for a sequence oflength n is O(n7) . The memory requirements of O(n4)are also higher than for the regular secondary structureprediction scheme. However, since the implementation

$ RNAfold -p < Example.fa

>Example1

GCGACCCAUGCGAACGCGAGCAUUUGAAGCUAGAUGCCGUUUUGAAACGAAUGGGAACGCGAACGC

(((.(((((.((((.(((.(((((((....)))))))))).))....)).)))))..)))...... (-19.50)

(((.(((((.(({{.(((.(((((((....)))))))))).}}....)).)))))..)))...... [-20.45]

(((.(((((.((((.(((.(((((((....)))))))))).))....)).)))))..)))...... {-19.50 d=2.85}

frequency of mfe structure in ensemble 0.212986; ensemble diversity 4.22

G C G AC C C A U G

C G A AC G C GAG C A U U U G

A AG

CUAGAUGCCGUUUUGAAACGA

AUGGGAACGC

GAACG C

$ Utils/mountain.pl Example1_dp.ps

0 10 20 30 40 50 600

10

20

30

base pair probabilityminimum free energypositional entropy

$ Utils/relplot.pl Example1_ss.ps Example1_dp.ps > Example1_rss.ps

0 1.5

G C G AC C C A U G

C G A AC G C GAG C A U U U G

A AG

CUAGAUGCCGUUUUGAAACGA

AUGGGAA

CGCGA

ACG C

$ RNAplfold -W 32 -L 30 < allseamp.seq

c c g g a a a c c g a a c g c a g c a c c g c g g a u c u g g a a c g c c g c u a G a a c a a c u a u c u g u a G c g c g a a a a c a u u g u g U A G c a u U A G u u u g c g u g c a a a g a a c g c a g c a c c g a a c c g c a u g c g a a c u G a g a a

$ RNALfold -L 30 < allexample.fa

.(((.((...))..))). ( -0.80) 100

.((((((........)).)))). ( -7.10) 96

.(((((.((((..........))))))))). ( -9.50) 80

.(((......))). ( -2.60) 74

.(((((((......))).)))). ( -5.80) 70

.((((((.....(((...)))....)))))). ( -6.60) 57

.(((.((((((.....)))))).))). ( -6.40) 52

.(((((.(.........).))))). ( -5.10) 36

.(((.((((.(.........).))))))). ( -7.50) 33

.(((.........))). ( -2.60) 21

.(((......))). ( -1.00) 19

.(((......))). ( -3.30) 12

.((((.(((......)))..)).)). ( -5.80) 7

ccggaaaccgaacgcagcaccgcggaucuggaacgccgcuaGaacaacuaucuguaGcgcgaaaacauugugUAGcau

UAGuuugcgugcaaagaacgcagcaccgaaccgcaugcgaacuGagaa

(-26.20)

_ _ _ _ _ _ _ _ _ _ C G C U _ A A _ _ A C C A A C _ _ _ _ _ A G C _ C G C _ _ _ _ _ _ _ G _ G G C G A G A A C _ _

_ _ _ _ _ _ _ _ _ _ C G C U _ A A _ _ A C C A A C _ _ _ _ _ A G C _ C G C _ _ _ _ _ _ _ G _ G G C G A G A A C _ ___

__

__

__

__

CG

CU

_A

A_

_A

CC

AA

C_

__

__

AG

C_

CG

C_

__

__

__

G_

GG

CG

AG

AA

C_

_

__

__

__

__

__

CG

CU

_A

A_

_A

CC

AA

C_

__

__

AG

C_

CG

C_

__

__

__

G_

GG

CG

AG

AA

C_

_

____

____

_ _ CGCUGA

A__

ACCA

AC_

_ _ G _AGC

GCGC__

___

_GG_

GGCG A

GAAC

__

$ RNAalifold -p --aln --color < samples.aln5 sequences; length of alignment 57.__________CGCUGAA__ACCAAC___G_AGCGCGC______GG_GGCGAGAAC__..........((((((...(((............))).......))))))....... ( -9.71 = -6.50 + -3.21)..........((((((...(((............))).......))))))....... [ -9.99]frequency of mfe structure in ensemble 0.636366

..........((((((...(((............))).......))))))....... -9.79 { -6.50 + -3.29}

..........((((((...(((............))).......)))))).......Examp1 ----------CCGG-AAA-CCGAACGCAGCACCGCGG------AU-CUGGAACGC--Examp2 ----------CGCU-AG--AACAAC-------UAUCU------GU-AGCGCGAAAACExamp3 ---------AUUGUGUA--GCAUU------AGUUUGC-------GUGCAAAGAACGCExamp4 -------UGCCAUCGCAUUAGCACC---U-AGCCGCAUUUUCUGGCGAUGAUG----Examp5 AGCACCGAACCGCAU----GCGAACUGAG-AA--CGCAACC----AUGCGCGCACC-

0 50 100 150 200 250nt

-10

-5

0

5

kcal

/mol

opening energy w=4interaction energy

$ RNAup --interaction_pairwise < inputup

(((((((((((((..((((((&))))))....))))))))))))) 111,131 : 2,24 (-13.01 = -24.85 + 11.84)

gcaugcgaacuGagaacgcaa&uuguguagcauuaguuugcgugc

RNAup output in file: RNA_w25_u1.out

CGCUAG

AACA

AC U A U

CUG UAG C G C G

AA

AA C AGCAC C G

AA

CCGCA

U G C G A ACU

GAGA

ACGCA

ACCAU

GCGCGCAC

C

$ RNAcofold -p < Examplecofold

>Examplecofold

CGCUAGAACAACUAUCUGUAGCGCGAAAAC&AGCACCGAACCGCAUGCGAACUGAGAACGCAACCAUGCGCGCACC

.................((.(((((.....&.((........)).((((.........))))......))))))). (-14.00)

.{{{{.,.........}|},||(((.....&..........{||{((((.........)))}...,}})))))... [-16.00]

frequency of mfe structure in ensemble 0.039241 , delta G binding= -2.19

$ RNAsubopt -s -e 1 < berni.fa

> berni [100]

AGCACCGAACCGCAUGCGAACUGAGAACGCAACCAUGCGCGCACC -700

..........((((((((.........))))....))))...... -7.00

.........(((((((((.........))))....)))).).... -6.90

..........((((((.(...((......)).)))))))...... -6.70

.........(((((((.(...((......)).))))))).).... -6.60

.((........)).((((.........((((....)))))))).. -6.50

..........((((((((.........)))....)))))...... -6.50

.........(((((((((.........)))....))))).).... -6.40

.....((...((((((((.........))))....)))))).... -6.20

..............((((.........((((....)))))))).. -6.00

$ RNAsubopt -z < peter.fa

> peter [100]

.(((...)))..... [-1.30]

.((.........)). [0.40]

(.((...)))..... [1.80]

(.(....).)..... [1.90]

((((...)))...). [2.10]

...(....)...... [2.70]

...(........).. [3.00]

........(....). [3.00]

.......(.....). [3.10]

.........(....) [4.00]

..(...........) [4.10]

...(..........) [5.00]

$ RNAsubopt -p 10 < xtof.fa

> Xtof [100]

...((((((....((........))........))))))...

...((((((...(...(......)..)......))))))...

...((((((....((.....))..((.....))))))))...

...((((((........................))))))...

(((....)))....(((...(((.........)))).))...

...((((((........................))))))...

.(.((((((...((.............))....)))))).).

...((((((....((.....))...........))))))...

...((((((....((........))........))))))...

.(.((((((....((........))........)))))).).1e−03 1e−01 1e+01 1e+03 1e+05

0.0

0.2

0.4

0.6

0.8

1.0

time

po

pu

lati

on

den

sity

12

3

4 56 7

89

10

11

12

13

14

15

16

17

18

19

20

-6.0

-4.0

-2.0

0.0

1.1

0.7

1.4

0.4 0.6 1.8 2.0

0.8 1.2

2.1

1.9

1.4

2.0

0.82.3

1.52.0

2.9

2.0

3.9

2.4

3.4

0.9

0.7 1.5

2.1

3.7

3.6

20

1

2

A B

C

D

E

F

G

Figure 1 Example calls of programs included in the ViennaRNA Package and their corresponding output. (A) Single sequence analysisusing RNAfold. (B) Locally optimal secondary structures and base pair probabilities using RNAplfold and RNALfold. (C) Interactionthermodynamics of two RNA sequences computed by RNAup. (D) Consensus structures and base pair probabilities for RNA sequencealignments obtained from RNAalifold. (E) Secondary structure of an RNA dimer calculated by RNAcofold. (F) Folding kinetics usingRNAsubopt in conjunction with the external programs barriers and treekin. (G) Suboptimal secondary structures generated byRNAsubopt. For a detailed description see the appendix.

Lorenz et al. Algorithms for Molecular Biology 2011, 6:26http://www.almob.org/content/6/1/26

Page 5 of 14

Page 6: SOFTWARE ARTICLE Open Access ViennaRNA Package 2

uses a sparse matrix approach, the prefactor of time andmemory complexity is very small, making the programapplicable for RNA sequence lengths of up to about 400- 600 nt.

RNA-RNA interactionsSeveral programs focus on various aspects of the hybri-dization structure of two RNA molecules, using differentlevels of detail. The programs RNAcofold [40] andRNAup [41] are two complementary programs with thehighest level of detail available within the ViennaRNAPackage. RNAup first computes local opening energiesfor both molecules and then computes interaction ener-gies, looking for the best interaction site of two mole-cules. RNAcofold, on the other hand, concatenatestwo molecules and computes a common secondarystructure using modified energies for the loop that con-tains the cut. RNAcofold thus can generate arbitrarymany binding sites, but does not allow pseudoknottedconfigurations, while RNAup covers only a single inter-action site, which however may form a complex pseudo-knotted configuration. The partition function version ofRNAcofold can be used to investigate the concentra-tion dependency of dimerization, similar to [42]. On theother hand, RNAup is mostly geared towards investiga-tions of the binding of regulatory RNA molecules withtheir target RNAs.RNAPKplex is at present the only component of the

Vienna RNA Package that explicitly predicts pseudo-knotted RNA structures [43]. As an “intramolecular var-iant” of RNAup it computes accessibilities and thenidentifies regions that can form stable base pairs.Although optimized for speed, the full-fledged folding

algorithms are not fast enough for genome-wide applica-tions. RNAduplex, similar to Rehmsmeier’s RNAhy-brid [44], ignores intramolecular structures and allmulti-branch loops in its search for thermodynamicallyfavorable interaction regions. RNAplex [45] achieves amassive gain in speed by simplifying the energy modelfor interior loops to an affine gap cost model, effectivelyreducing the folding problem to a variant of localsequence alignment. The accuracy of this approach canbe further improved by reading in accessibilities (ascomputed by RNAplfold) and incorporating them intothe scoring model [46].The specialized programs RNAsnoop [47] for the pre-

diction of target sites of H/ACA snoRNAs, and RNAL-foldz [48] for the evaluation of predicted localsecondary structures, use SVMs to further classify theoutput of the RNA folding routines.

Consensus structures and alignmentsA central issue for the comparative analysis of RNAsequences is the computation of a consensus structure.

Starting from a sequence alignment, this can beachieved using the same algorithmic framework as fold-ing a single sequence. More precisely, energy contribu-tions can be added up in a columnwise manner to yieldan effective energy model for the alignment as a whole[49]. The Vienna RNA Package provides alignment-based variants for several of the algorithms discussedabove: RNAalifold [50] computes global consensusstructures both in MFE and partition function mode, ascanning version of long sequence alignments is RNA-Lalifold. RNAaliduplex is designed to facility thesearch for conserved RNA-RNA interaction sites inlarge alignment data sets. The alidot program [51,52],finally, extracts local conserved structures given asequence alignment and secondary structure predictionsfor each of the aligned sequences. By default, consensusstructure prediction is dominated by the thermodynamicparameters and sequence covariation. Thus, phyloge-netic support for conservation of secondary structure isincluded only as a small bonus energy term. A muchmore sophisticated substitution model for paired regionsbased on the RIBOSUM scoring scheme [53] can beinvoked with the -R option.The Vienna RNA Package does not contain its own

optimized implementation for the simultaneous foldingand alignment of two RNA sequences, i.e., of the Sank-off algorithm [54]. We refer to the well-established soft-ware tools FoldAlign [55], or DynAlign [56] for thistask. A simplified version of the Sankoff algorithmunderlies pmcomp [57,58], a facility to align pre-com-puted base-pairing probability matrices, although thistool is now included mostly for backward compatibility.An improved and much more efficient implementationis provided by the locarna package [59] developed incooperation with Rolf Backofen and Sebastian Will anddistributed separately.With RNApaln and RNApdist the package also pro-

vides tools to align and compare base pair probabilitypatterns using modified string alignment algorithms.Tree editing distances and corresponding pairwise align-ments can be computed with RNAdist.

Miscellaneous toolsConcerning sequence design, we ship the programRNAinverse [25]. It generates a sequence that foldsinto the input structure by mutating a start sequence.More efficient versions of inverse folding algorithmshave become available over the last decade, see e.g.INFO-RNA [60], RNA Designer [61] and the recentNUPACK design algorithms [62]. Nevertheless, RNAin-verse remains useful for some applications as it isdesigned for search for solutions as close as possible tothe starting sequence. RNAswitch [63] takes a pair ofsecondary structures as input and finds a sequence that

Lorenz et al. Algorithms for Molecular Biology 2011, 6:26http://www.almob.org/content/6/1/26

Page 6 of 14

Page 7: SOFTWARE ARTICLE Open Access ViennaRNA Package 2

has both input structures as near ground states. Thepossibility to design bistable RNAs may be useful e.g.for synthetic biology.A closer look at the dynamics of RNA folding a avail-

able through kinfold [64], a rejectionless MonteCarlo simulation algorithm generating trajectories ofsubsequent secondary structures. Kinetic informationcan also be obtained from the exhaustive enumerationof suboptimal structures using RNAsubopt in conjunc-tion with the barriers package [64,65]. The latter isnot restricted to RNA landscapes and hence distributedseparately from the Vienna RNA Package.

Auxiliary ProgramsIn addition to the prediction and analysis tools, theViennaRNA Package provides utility programs andscripts that mainly assist in processing input- and out-put data. RNAeval computes the energy of a givenstructure formed by a given sequence and can in parti-cular be used to re-compute energies for a given pair ofsequence and structure with different energy models.The Perl script refold.pl generates single structurepredictions using a previously computed consensusstructure as constraint.RNAplot can be used to generate a graphical repre-

sentation of the an input sequence/structure pair [66].Several Perl scripts can be used to further manipulatePostScript output produced by the various componentsof the Vienna RNA Package. Conventional structuredrawings can be rotated with rotate_ss.pl. Therelplot.pl script includes reliability annotation intosecondary structure plots, colorrna.pl uses the con-servation of alignments for coloring consensus structureplots, while coloraln.pl does the same with analignment. Mountain plots can be produced with moun-tain.pl and cmount.pl from single and consensusstructures, respectively.Many tools in RNA bioinformatics use mfold’s “con-

nectivity” (.ct) file format. The dot-bracket representa-tion used consistently by the Vienna RNA Packagecan converted into this format using b2ct and ct2b.pl, resp.

The ViennaRNA WebserverThe ViennaRNA Webserver [32] facilitates an easy touse form based web browser interface to most of theprograms included in the ViennaRNA Package and addi-tional tools. It combines the call of the appropriate com-mand-line tools with post-processing steps to obtain avisualization of the output. The webtools echo the com-mand-lines used to call components of the ViennaRNA Package; this feature can be used to get morefamiliar with the individual tools. The webserver alsoprovides an interface to the barriers and treekin

program allowing the analysis of folding landscapes andstructural refolding kinetics. The backbone of theViennaRNA Webserver has been upgraded so that allcalculations with the webserver profit from theincreased performance of the new ViennaRNAPackage.

Modifying the energy parameters of the modelThe energy model implemented in ViennaRNA Pack-age 2.0 follows the structure of the Turner 2004energy parameters as described in [9] with a few veryminor deviations. Compared to previous parametriza-tions, the Turner 2004 model introduced additionallook-up tables for certain free energies and for loopentropies in response to more precise measurements ofcertain loop types. For the sake of computational effi-ciency a few peculiar rules were deliberately ignored,however. Details on these discrepancies, which do notaffect the overall accuracy of predictions (see below), areprovided in the appendix.All programs of the ViennaRNA Package can read

in energy parameters from a human-readable text fileallowing the user to replace the default Turner 2004parameter set. This can either be a user-supplied para-meter file or one of several parameter compilations thatare shipped with the package. Of particular interest areparameters for DNA folding. Here we provide a para-meter set compiled by Douglas Turner and David Math-ews [67] from published data, incorporating in particularearlier work by the group of John SantaLucia [68].While the Turner parameters are based almost exclu-sively on thermodynamic measurements, there has beenincreasing interest in optimizing parameters such as tomaximize prediction accuracy, see e.g. [69]. As an exam-ple for such trained parameters we provide the Andro-nescu parameter set from ref. [70].To maintain backward compatibility we also ship

Turner ‘99 energy parameter files containing the basiccontributions used in previous versions of the Vien-naRNA Package. These parameter files, however, willnot always produce results identical to earlier versionsof the package. Affecting mainly the computation ofconsensus structures, these differences are mainly owedto a different handling of non-standard base pairs (i.e.,base pairs other than Watson-Crick and GU). The cur-rent implementation assumes that the energy contribu-tion of a loop with non-standard base pairs or non-standard nucleotides equals the least stabilizing contri-bution from the same loop type with canonical nucleo-tides and pairs only. Small differences may also appearin partition function computations as a consequence ofround-off errors.Since the structure of the energy model has changed

in ViennaRNA Package 2.0, energy parameter files for

Lorenz et al. Algorithms for Molecular Biology 2011, 6:26http://www.almob.org/content/6/1/26

Page 7 of 14

Page 8: SOFTWARE ARTICLE Open Access ViennaRNA Package 2

versions 1.8.5. and earlier will not work with the newversion of the package. Such old-style user-suppliedparameter files can be converted to the new file formatusing the RNAparconv utility.Additional output optionsMore information gathered through the course of thefolding algorithms can be included in the output. RNA-fold and RNAalifold, for instance, optionally pro-vide further information about the reliability of foldingresults. When evaluating ensemble properties with thepartition function, most programs now also computethe centroid structure [71], i.e., the structure with thesmallest average base pair distance to all other struc-tures in the ensemble. When base pair probabilities arecomputed, the maximum expected accuracy (MEA)structure [18,72] is also available. The RNALfold/RNALfoldz program now features an add-on to calcu-late the z-score for the predicted local secondary struc-tures [48]. This makes results comparable betweensequences with different nucleotide compositions andfacilitates the choice of a reasonable cutoff thresholds todecrease the number of structure hits.

Program options and documentationEach of the command-line tools provides the option -hor –help to print a brief overview of its general beha-vior as well as a list of all available parameter optionsincluding their description. To obtain more detailedinformation or even exemplary use-case scenarios for acertain program of the ViennaRNA Package, a UNIXmanpage is provided for each of them.An important change in the new release is the compli-

ance to the GNU standard regarding the format of com-mand-line options. Short options consist of a singlecharacter preceded by a minus sign, e.g. -p, while longoptions are strings of two or more characters precededby two minus signs, e.g. –noLP. This change will breakbackward compatibility wherever command-line toolsfrom older versions of the package were used. This canbe easily fixed by inserting the second dash in longoptions.

Input file formatsA plethora of different file formats have been introducedby the many tools and databases relevant to RNA bioin-formatics. The ViennaRNA Package has also contrib-uted to this unpleasant diversity with its own nativeformats. Originally designed for simulation pipelines inwhich no meta-data is attached to sequence or structuredata, it expects input items (sequences and/or struc-tures) as single strings uninterrupted by white spaces orline breaks. FASTA-like headers can optionally be usedto specify an identifier for the data item(s). Secondarystructures are also specified as strings, using the three

characters (, ), and. to denote nucleotides that are pairedwith a partner upstream or downstream, or that areunpaired, resp. In addition to uniquely determining apseudoknot-free secondary structure, this notation hasthe advantage of providing a compact annotation of thesequence or alignment to which the structure refers.The dot-parentheses-format is meanwhile usedalso in many unrelated tools e.g. [18,21,61,73-79]. Simi-lar annotation strings are used to specify constraints asinput to folding algorithms.The requirement to write input items on a single line

usually requires data format conversions for the interac-tions with most other bioinformatics tools. Theseusually read and write FASTA format [80], which allowswhite spaces and line breaks arbitrarily interspersedwithin a sequence. An improved handling of data inputnow provides full FASTA support for all tools thatrequire only sequences or sequence alignments as input.This should considerably facilitate the use of the Vien-naRNA Package. More complex input structures arestill required for the tools that compute RNA-RNAinteractions, in particular RNAup and RNAcofold.Programs that process alignment data used clustal

format [81] in previous versions of the package. Due tothe wide-spread use of the STOCKHOLM format in RNAbioinformatics, e.g. in the Rfam - RNA family data-base [82]), support for *.stk files has been added.There are currently no plans to include support for

input formats that use heavy markup such as Genbank[83] files or XML-based formats such as BioXSD [84]or RNAML [85].

RNAlib -

API to fast and reliable algorithmsThe algorithms implemented in the ViennaRNA Pack-age are not only accessible by means of the interactiveprograms outlined in the previous section but alsodirectly in the form of a C/C++ library. This makesthem readily available for third-party programs and,with the help of included Perl-interface, to elaboratescripting pipelines.

OpenMP thread-safe C/C++ APIMulti-core CPUs have become standard components inoff-the-shelf PC hardware. In order to allow the Vien-naRNA Package to make use of this increase of com-putational power, several changes had to be introducedinto the API functions of the RNAlib. Although it ispossible to parallelize the core folding algorithms[86,87] this requires substantial overheads so that thegain is small unless massively parallel architectures areused. On the other hand, computationally demandingapplications of RNA folding typically require the

Lorenz et al. Algorithms for Molecular Biology 2011, 6:26http://www.almob.org/content/6/1/26

Page 8 of 14

Page 9: SOFTWARE ARTICLE Open Access ViennaRNA Package 2

processing of large numbers of input sequences, a taskthat trivially can be parallelized. The only requirementfor enabling concurrent computation on shared memorymulti-core systems using OpenMP [88] is that the corealgorithms are independent of shared global variablesand thus thread-safe. In particular the variables referringto the energy parameters are now deprecated andreplaced by additional functions or parameters whichhave to be passed to functions. A few remaining globalvariables, which are inaccessible through RNAlib, weremade thread-private using OpenMP, allowing simulta-neous function calls to operate on private copies ofthese variables. Using the OpenMP framework, thirdparty applications are therefore now able to call RNA-lib interfaces, such as MFE or partition function algo-rithms, in parallel. Limitations concerning the use ofdifferent energy models used in concurrent computa-tions are described in detail in the API reference man-ual. For backward compatibility, the old functions of theprevious API remain included in RNAlib but aremarked as deprecated. Thus, programs which weredeveloped for binding against the previous versions ofRNAlib up to 1.8.5 are still working without limitationswhen linked against the new library.

The reference manualDocumentation is an important issue for the usability ofthe RNAlib API. In previous versions of the Vien-naRNA Package, this was addressed by maintaining, inaddition to the source code, a texinfo-based referencemanual containing introductions into the particular pro-blem sets and describing the related library functions. Inorder to keep this documentation up to date and todecrease the developers’ effort in maintaining the man-ual, we opted to use in-source documentation that (a)helps developers who interact with the source codedirectly and (b) enable to use the doxygen documenta-tion program to generate a comprehensive and alwaysup-to-date reference manual automatically. An HTMLand a PDF version are included in the package.

PERL bindingsScripting language bindings to the C functions in theRNAlib are made using the SWIG interface compiler.With the ViennaRNA Package, we include bindingsfor the most important library functions made accessiblefor the script language Perl. This allows a very easyaccess to e.g. the folding functions and thus a rapiddesign of functional pipelines or small programs thatexploit the potential of the ViennaRNA Package.Using the SWIG environment bindings for other (script-ing) languages including Python and JAVA can beimplemented quite easily.

PerformanceWe assess the performance of the ViennaRNA Pack-age 2.0 both in terms of computational efficiency andin terms of prediction accuracy. We emphasize that it isnot the purpose of this contribution to compare ther-modynamics-based prediction algorithms against otherapproaches to RNA structure prediction. For such abenchmark we refer to the literature, e.g. [18,89,90].In order to investigate the impact of the energy para-

meters, and in particular of our small changes to theTurner 2004 model, we use a test set comprising all 1817non-multimer sequence/structure pairs in the RNAstranddatabase [73] without pseudoknots in the reference struc-ture. For each sequence, the MFE secondary structure wascalculated with RNAfold 2.0, RNAfold 1.8.5, UNAFold3.8 [19], and RNAstructure 5.2 [20]. All use a nearestneighbor energy model and a variant of Zuker’s dynamicprogramming algorithm. As expected, the new version ofRNAfold performs better than the old one. Somewhat sur-prisingly, however, RNAfold 2.0 also performs slightlybetter than RNAstructure 5.2 and UNAFold 3.8,despite the fact that we neglected a few peculiarities of themost recent energy model, see Figure 2, Additional File 1and the implementation details in the appendix. The aver-age performance indicators are compiled in Table 2. Weemphasize, however, that the performance of the algo-rithms differs widely across RNA families and no singleimplementation provides consistently superior results.Detailed data can be found in Additional File 2.Despite the increase in the number of parameters fromTurner ‘99 to Turner 2004 we observe virtually no dif-ference in the runtime and memory consumptionbetween RNAfold 1.8.5 and RNAfold 2.0. Similarcomparisons can be made for other components of theViennaRNA Package. The computational speed ofRNAfold compares quite favorably to that of the com-peting implementations, Figure 2B, although all theimplementations of thermodynamic folding algorithmsuse essentially the same energy model and algorithmicframework, and hence have the same asymptotic run-time and memory consumption.

DiscussionThe ViennaRNA Package has been a useful tool for theRNA bioinformatics community for almost two decades.Quite a few widely-used software tools and data analysispipelines have been built upon this foundation, eitherincorporating calls to the interactive programs or directlyinterfacing to RNAlib. Numeric characteristics of second-ary structures, such as Gibbs free energy ΔG, Minimumfree energy (MFE), ensemble diversity or probabilities ofMFE structures in the ensemble, have been widely used asfeatures for machine learning classification, e.g. in

Lorenz et al. Algorithms for Molecular Biology 2011, 6:26http://www.almob.org/content/6/1/26

Page 9 of 14

Page 10: SOFTWARE ARTICLE Open Access ViennaRNA Package 2

microRNA precursor and target detection [91-94]. Thenon-coding RNA gene finder RNAz [95,96], the snoRNAdetector snoReport [97], and RNAstrand [98], a toolthat predicts the reading direction of structured RNAsfrom a multiple sequence alignment, combine thermody-namic properties computed with RNAlib functions and amachine learning component. RNAsoup [99] takes advan-tage of the programs RNAfold, RNAalifold and someother tools provided by the ViennaRNA Package for astructural clustering of ncRNAs. The siRNA design pro-gram RNAxs [100] employs the site accessibility predic-tions offered by RNAplfold, as does IntaRNA [60], aprogram to predict RNA interaction sites. Several second-ary structure prediction tools, such as CentroidFold[22], McCaskill-MEA [101], or RNAsalsa [102], usebase pair probabilities predicted by RNAfold -p as input,

while the LocARNA package [59] uses them for structuralalignment. The motif-based comparison and alignmenttool ExpaRNA [103] and the tree alignment programRNAforester [75] also rely on the algorithms providedby RNAlib. Since its initial publication [25], no compre-hensive description [104] of the ViennaRNA Packagehas appeared. Release 2.0 now implements the latestenergy model, provides many new and improved function-alities, and - as we hope - is even easier and more efficientto use due to a thread-safe architecture, an improved API,a more consistent set of options, and a much moredetailed documentation. Care has been taken to ensurebackward compatibility so that ViennaRNA Package2.0 can be readily substituted for earlier versions.

Availability and RequirementsThe source code of the ViennaRNA Package as wellas the current reference manual can be downloadedfrom http://www.tbi.univie.ac.at/RNA.

AppendixEnergy model implementation detailsThe most important technical innovation is the use ofthe 2004 - improved nearest neighbor model byMathews et al. [9] as the default parameter set in all

0 500 1000 1500 2000Prediction count

0.0

0.2

0.4

0.6

0.8

1.0

MC

C o

f pre

dict

ion

102

103

104

Sequence length [nt]

10-3

10-2

10-1

100

101

102

103

104

Run

time

[s]

RNAfold 2.0RNAfold 1.8.5UNAfold 3.8RNAstructure 5.2

Figure 2 Performance comparison of RNAfold 2.0 to other secondary structure prediction software. (A) Accuracy of thermodynamicfolding programs in terms of cumulative distribution of the Matthews correlation coefficient (MCC). RNAfold 2.0 outperforms the othersecondary structure prediction programs on the RNAstrand dataset: more of its predictions fall into the region of higher performance values.Both versions of RNAfold were run with -d2 option. For UNAFold and RNAStructure default options were used. Performancedistributions of Sensitivity, Positive predictive value (PPV) and F-measure are shown in Additional File 1. The averaged overall accuracies can betaken from table 2. (B) Comparison of runtimes for MFE structure predictions. Measurement was performed on an Intel® Core™ 2 6600 CPUrunning at 2.4 GHz. Shown are averaged running times for random sequences of lengths 100 nt (100 samples), 500 nt (100 samples), 1000 nt(100 samples), 2500 nt (20 samples), 5000 nt (16 samples) and 10000 nt (16 samples). While the compared programs RNAfold 2.0, RNAfold1.8.5 and UNAfold 3.8 were capable of predicting an MFE structure for all tested samples in a relatively small time frame,RNAstructure 5.2 was omitted from predictions for the 10000 nt sample set due to its time requirements.

Table 2 Averaged performance measures forthermodynamic folding algorithms

Sensitivity Specificity MCC F-measure

RNAfold 2.0 0.739 0.792 0.763 0.761

RNAfold 1.8.5 0.711 0.773 0.740 0.737

UNAFold 0.692 0.766 0.727 0.724

RNAStructure 0.715 0.781 0.745 0.742

Lorenz et al. Algorithms for Molecular Biology 2011, 6:26http://www.almob.org/content/6/1/26

Page 10 of 14

Page 11: SOFTWARE ARTICLE Open Access ViennaRNA Package 2

free energy calculations. This entails not only an updateof all free energy evaluating sections in each affectedprogram, but also major changes in the structure of theparameter sets. In particular, several additional energyparameters for the different loop types (hairpin loops,interior loops and multi-branch loops) were introduced.In order to keep the number of energy parameters and

thus the complexity of the energy model small, werefrained from implementing exceptional contributionsfor some highly specialized configurations. In particularthe following special cases are not incorporated in ourfolding recursions:

1. All-C loop penalty, i.e., a penalizing contributionfor loops consisting of unpaired cytosine only;2. Additional stabilizing GU-closure term that isapplied only in the context of hairpin loops, enclosedby a GU (not UG) base pair which is preceded bytwo Gs;3. A special intramolecular helix formation of thefour consecutive base pairs GC, GU, UG and CG,which has a single tabulated contribution of -4.12kcal/mol.4. Consideration of an auxilary contributing factorthat reflects the number of states of a bulge loop, i.e.the number of all possible bulges with identicalsequence.5. Average asymmetry correcting penalty in multi-branch loops which constitutes the mean differencein unpaired nucleotides on both sides of the branch-ing stems;6. Extra penalty for three-way branching loops withless then two unpaired nucleotides;

Adapting the dynamic programming recursions to alsotake into account these loop configurations resulted inan increase of time and memory requirements without acompensating benefit in terms of prediction accuracy.The data-set we used for measuring the prediction per-formance also did not reveal any significant unfavorableeffect of our simplification of the model. However, freeenergy evaluation of a given sequence/structure pair, asdone by RNAeval, may introduce these extra cases inthe near future as an additional parameter, such as loga-rithmic multi-branch loop evaluation.All our folding algorithms assume -d2 as the default

dangling-end model, allowing a single nucleotide to con-tribute with all its possible favorable interactions. The dan-gling-end/helix-stacking model suggested by the Turner’04parameters is realized with the -d3 option. An additionalmodel allowing a single nucleotide to be involved in atmost one favorable interaction but ignoring helix-stackingcan be chosen with -d1, while -d0 deactivates dangling-end and helix-stacking contributions altogether.

PerformanceThe base pair positions along the RNA sequence weretaken as predicted properties for all of the performancemeasurements. Thus, all base pairs in the referencestructure contribute to the number of true positives(TP). The number of false positives (FP) is obtained bycounting all base pairs that are in the predicted but notin the reference secondary structure. Along with that, allbase pairs present in the reference but not in the predic-tion result are regarded as false negatives (FN). Thesenumbers are then used to compute the sensitivity, alsoknown as true positive rate (TPR), and precision, alsoknown as positive predictive value (PPV) [105].

TPR =TP

TP + FN

PPV =TP

TP + FP

To combine these performance measures into one sin-gle value, we used the Matthews Correlation Coefficient(MCC) [106] and the F1-score (F-measure), i.e. the har-monic mean of precision and true positive rate.

MCC =TP · TN− FP · FN

√(TP + FP)(TP + FN)(TN + FP)(TN + FN)

F1 = 2 · PPV · TPRPPV + TPR

Since the total number of possible base pairings is

bound by12· n · (n− 1) , with sequence length n, we

estimated the number of true negative (TN) which isrequired for calculating the MCC by its upper bound of

TN =12· n · (n− 1)− TP .

Detailed description of Figure 1Example calls of programs included in the ViennaRNAPackage and their corresponding output. (A) RNA-fold output on a small example sequence. Top: On-screen output - mfe, ensemble representation, and cen-troid structure as dot-parenthesis (Vienna) representa-tions. Numbers in brackets denote the energies, and thecentroid’s mean distance to the ensemble. Below: post-script output as generated by the above programm call.The mountain plot and the generating program call arein the center of the sub figure. The bottom shows posi-tional entropy derived reliabilty information color codedinto the secondary structure drawing.(B) Example output of programs for local folding.

Top: Dot plot as generated by RNAplfold. The plot isa cut out along the diagonal of a quadratic dot plot (seee.g. part (D) of this figure). At the bottom, an exampleoutput of RNALfold is shown. Local optimal

Lorenz et al. Algorithms for Molecular Biology 2011, 6:26http://www.almob.org/content/6/1/26

Page 11 of 14

Page 12: SOFTWARE ARTICLE Open Access ViennaRNA Package 2

substructures are shown in dot-parenthesis notationtogether with their energy and the index of their firstbase.(C) Example output of RNAup. At the bottom the best

interacting site between the two input molecules isshown. The xmgrace generated picture above shows theenergy necessary to open a window of 4 consecutivebases and the interaction energy that can be achievedwhen the probe molecule is bound to the target mole-cule in black and red, respectively.(D) RNAalifold output. At the top and bottom pic-

tures generated by RNAalifold are shown. The con-servation of the base pairs is encoded in a color scheme.Red means only one of the 6 possible base pairs is pre-sent, ochre means two, green 3 and so on. Paler colorsindicate that some of the sequences cannot form a basepair at the respective position in the alignment. The topright corner shows a dot plot. Every dot symbolizes abase pair, the size of the dots at the upper right triangleis proportional to the respective base pair probabilities,while on the lower left triangle the mfe structure isdepicted. On the top right the conservation annotatedconsensus structure drawing can be seen, while on thebottom the annotated alignment is shown. The center ofthe subfigure shows the on-screen output of RNAali-fold. As in the ordinary fold case, the minimum freeenergy structure, a representation of the ensemble struc-ture and the centroid structure are shown. The energiesare split into a thermodynamic part (first) and the con-servation part, which are summed to give the total pre-dicted score.(E) RNAcofold output. At the top the secondary

structure drawing of the minimum free energy foldingof the two molecules is shown. The molecules are colorcoded to make it easier to tell them apart. The “&”character in the on-screen output below is the separatorbetween the two sequences. In addition to the mfe andthe ensemble representation with their energies, thebinding energy is shown.(F) Output for kinetics (using RNAsubopt output fed

into the external programs barriers and treekin).The diagram shows the change in population from thestart, where state 20 is populated, towards the equili-brium state 1. The inner picture shows the barrier treeupon which the relative concentrations of the diagramare based. The 20 lowest suboptimal structures and thepaths connecting them are depicted, together with thebarrier heights.(G) Output of the three versions of RNAsubopt. Left:

Output of the Wuchty algorithm, all structures within acertain energy band are shown. Right: Zuker algorithm,showing the best structures for every possible base pair.Bottom: Stochastic backtracking, random structuresdrawn according to their probability in the ensemble.

Additional material

Additional file 1: Performance comparison (Sensitivity, PPV, F-measure).

Additional file 2: Detailed performance comparison.

AcknowledgementsThis work is dedicated to Peter Schuster on the occasion of his 70thbirthday.We thank all major contributers to earlier versions of the ViennaRNAPackage whose implementation of algorithms and programs helped tocreate this versatile and comprehensive software collection (in alphabeticalorder):Wolfgang Beyer, Sebastian Bonhöffer, Martin Fekete, Walter Fontana, UlrikeMückstein, Wolfgang Schnabel, Manfred Tacker, and Stefan Wuchty.We are grateful to all beta testers who were unrestrained in reporting bugson preliminary versions of ViennaRNA Package 2.0. This work wasfunded, in part, by the austrian GEN-AU projects “regulatory non codingRNA”, “Bioinformatics Integration Network III”, the Austrian FWF project “SFBF43 RNA regulation of the transcriptome”, the European Union FP-7 projectQUANTOMICS and Deutsche Forschungsgemeinschaft (grant No. STA 850/7-1 under the auspices of SPP-1258 “Small Regulatory RNAs in Prokaryotes”).

Author details1Institute for Theoretical Chemistry and Structural Biology, University ofVienna, Währingerstraße 17/3, A-1090 Vienna, Austria. 2Bioinformatics Group,Department of Computer Science, and Interdisciplinary Center forBioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig,Germany. 3Center for non-coding RNA in Technology and Health, Universityof Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark. 4MaxPlanck Institute for Mathematics in the Sciences, Inselstraße 22 D-04103Leipzig, Germany. 5Fraunhofer Institute for Cell Therapy and Immunology,Perlickstraße 1, D-04103 Leipzig, Germany. 6Santa Fe Institute, 1399 HydePark Rd, Santa Fe, NM 87501, USA. 7Research Group Bioinformatics andComputational Biology, Faculty of Computer Science, University of Vienna,Währingerstraße 17/3, A-1090 Vienna, Austria.

Authors’ contributionsWork on the Vienna RNA Package is coordinated by ILH. The designand structure of version 2.0 resulted from discussion of RL with ILH, PFS, CF,SHB and CHzS. Implementation and performance analysis was performed byRL with contributions of HT (RNAplex and RNAsnoop). CHzS providedthe converted new energy parameter files. Detailed documentation for theRNAlib was done by RL and SHB based on pre-existing sources. Themanuscript was written by RL with major contribution by SHB, PFS, and ILH.All authors read and approved the final manuscript.

Competing interestsThe authors declare that they have no competing interests.

Received: 22 August 2011 Accepted: 24 November 2011Published: 24 November 2011

References1. Thirumalai D, Lee N, Woodson SA, Klimov DK: Early Events in RNA Folding.

Annu Rev Phys Chem 2001, 52:751-762.2. Tinoco I Jr, Uhlenbeck OC, Levine MD: Estimation of Secondary Structure

in Ribonucleic Acids. Nature 1971, 230:362-367.3. Tinoco I Jr, Borer PN, Dengler B, Levine ND, Uhlenbeck OC, Crothers DM,

Gralla J: Improved estimation of secondary structure in ribonucleic acids.Nature 1973, 246:40-41.

4. Freier SM, Kierzek R, Jaeger JA, Sugimoto N, Caruthers MH, Neilson T,Turner DH: Improved free-energy parameters for predictions of RNAduplex stability. Proc Natl Acad Sci, USA 1986, 83:9373-9377.

5. Jaeger JA, Turner DH, Zuker M: Improved predictions of secondarystructures for RNA. Proc Natl Acad Sci USA 1989, 86:7706-7710.

6. He L, Kierzek R, SantaLucia J, Walter AE, Turner DH: Nearest-NeighborParameters for GU Mismatches. Biochemistry 1991, 30.

Lorenz et al. Algorithms for Molecular Biology 2011, 6:26http://www.almob.org/content/6/1/26

Page 12 of 14

Page 13: SOFTWARE ARTICLE Open Access ViennaRNA Package 2

7. Xia T, SanatLucia J Jr, Burkard ME, Kierzek R, Schroeder SJ, Jiao X, Cox C,Turner DH: Parameters for an expanded nearest-neighbor model forformation of RNA duplexes with Watson-Crick pairs. Biochemistry 1998,37:14719-14735.

8. Mathews DH, Sabina J, Zuker M, Turner DH: Expanded sequencedependence of thermodynamic parameters improves prediction of RNAsecondary structure. J Mol Biol 1999, 288(5):911-940.

9. Mathews DH, Disney MD, Childs JL, Schroeder SJ, Zuker M, Turner DH:Incorporating chemical modification constraints into a dynamicprogramming algorithm for prediction ofRNA secondary structure. ProcNatl Acad Sci USA 2004, 101:7287-7292.

10. Nussinov R, Piecznik G, Griggs JR, Kleitman DJ: Algorithms for LoopMatching. SIAM J Appl Math 1978, 35:68-82.

11. Waterman MS, Smith TF: RNA secondary structure: A completemathematical analysis. Mathematical Biosciences 1978, 42:257-266.

12. Nussinov R, Jacobson AB: Fast algorithm for predicting the secondarystructure of single stranded RNA. Proc Natl Acad Sci USA 1980,77:6309-6313.

13. Zuker M, Stiegler P: Optimal computer folding of large RNA sequencesusing thermodynamics and auxiliary information. Nucleic Acids Res 1981,9:133-148.

14. Zuker M, Sankoff D: RNA secondary structures and their prediction. BullMath Biol 1984, 46:591-621.

15. McCaskill JS: The Equilibrium Partition Function and Base Pair BindingProbabilities for RNA Secondary Structure. Biopolymers 1990,29:1105-1119.

16. Eddy SR: A memory-efficient dynamic programming algorithm foroptimal alignment of a sequence to an RNA secondary structure. BMCBioinformatics 2002, 3:18.

17. Nawrocki EP, Kolbe DL, Eddy SR: Infernal 1.0: inference of RNAalignments. Bioinformatics 2009, 25:1335-1337.

18. Do CB, Woods DA, Batzoglou S: CONTRAfold: RNA secondary structureprediction without physics-based models. Bioinformatics 2006, 22:90-98.

19. Markham NR, Zuker M: UNAFold: software for nucleic acid folding andhybridization. Methods Mol Biol 2008, 453:3-31.

20. Reuter JS, Mathews DH: RNAstructure: software for RNA secondarystructure prediction and analysis. BMC Bioinformatics 2010, 11:129-129.

21. Zadeh JN, Steenberg CD, Bois JS, Wolfe BR, Pierce MB, Khan AR, Dirks RM,Pierce NA: NUPACK: Analysis and design of nucleic acid systems. JComput Chem 2011, 32:170-3.

22. Hamada M, Kiryu H, Sato K, Mituyama T, Asai K: Prediction of RNAsecondary structure using generalized centroid estimators. Bioinformatics2009, 25(4):465-473.

23. Ding Y, Lawrence CE: A statistical sampling algorithm for RNA secondarystructure prediction. Nucleic Acids Res 2003, 31:7280-7301.

24. Reeder J, Giegerich R: RNA secondary structure analysis using theRNAshapes package. Curr Protoc Bioinformatics 2009, Chapter 12:Unit12.8..

25. Hofacker IL, Fontana W, Stadler PF, Bonhoeffer S, Tacker M, Schuster P: FastFolding and Comparison of RNA Secondary Structures (The Vienna RNAPackage). Monatsh Chem 1994, 125(2):167-188.

26. Fontana W, Schuster P: A computer model of evolutionary optimization.Biophys Chem 1987, 26:123-147.

27. Fontana W, Schnabl W, Schuster P: Physical aspects of evolutionaryoptimization and adaption. Phys Rev A 1989, 40:3301-3321.

28. Fontana W, Griesmacher T, Schnabl W, Stadler PF, Schuster P: Statistics ofLandscapes Based on Free Energies, Replication and Degradation RateConstants of RNA Secondary Structures. Monatsh Chem 1991,122:795-819.

29. Bonhoeffer S, McCaskill JS, Stadler PF, Schuster P: RNA Multi-StructureLandscapes. A Study Based on Temperature Dependent PartitionFunctions. Eur Biophys J 1993, 22:13-24.

30. Fontana W, Konings DAM, Stadler PF, Schuster P: Statistics of RNASecondary Structures. Biopolymers 1993, 33:1389-1404.

31. Schuster P, Fontana W, Stadler PF, Hofacker IL: From Sequences to Shapesand Back: A case study in RNA secondary structures. Proc Roy Soc Lond B1994, 255:279-284.

32. Gruber AR, Lorenz R, Bernhart SH, Neuböck R, Hofacker IL: The Vienna RNAwebsuite. Nucleic Acids Res 2008, 36(Web Server):70-74.

33. Hofacker IL, Stadler PF: Memory Efficient Folding Algorithms for CircularRNA Secondary Structures. Bioinformatics 2006, 22:1172-1176.

34. Wuchty S, Fontana W, Hofacker IL, Schuster P: Complete SuboptimalFolding of RNA and the Stability of Secondary Structures. Biopolymers1999, 49:145-165.

35. Tacker M, Stadler PF, Bornberg-Bauer EG, Hofacker IL, Schuster P: AlgorithmIndependent Properties of RNA Structure Prediction. Eur Biophy J 1996,25:115-130.

36. Zuker M: On finding all suboptimal foldings of an RNA molecule. Science1989, 244:48-52.

37. Hofacker IL, Priwitzer B, Stadler PF: Prediction of Locally Stable RNASecondary Structures for Genome-Wide Surveys. Bioinformatics 2004,20:186-190.

38. Bernhart S, Hofacker IL, Stadler PF: Local RNA Base Pairing Probabilities inLarge Sequences. Bioinformatics 2006, 22:614-615.

39. Lorenz R, Flamm C, Hofacker IL: 2D Projections of RNA foldingLandscapes. In German Conference on Bioinformatics 2009, Volume 157 ofLecture Notes in Informatics. Edited by: Grosse I, Neumann S, Posch S,Schreiber F, Stadler PF. Bonn: Gesellschaft f. Informatik; 2009:11-20.

40. Bernhart SH, Tafer H, Mückstein U, Flamm C, Stadler PF, Hofacker IL:Partition Function and Base Pairing Probabilities of RNA Heterodimers.Algorithms Mol Biol 2006, 1:3.

41. Mückstein U, Tafer H, Bernhart SH, Hernandez-Rosales M, Vogel J, Stadler PF,Hofacker IL: Translational Control by RNA-RNA Interaction: ImprovedComputation of RNA-RNA Binding Thermodynamics. In BioinformaticsResearch and Development, Volume 13 of Communications in Computer andInformation Science. Edited by: Elloumi M, Küng J, Linial M, Murphy R,Schneider K, Toma C. Springer; 2008:114-127.

42. Dimitrov RA, Zuker M: Prediction of Hybridization and Melting forDouble-Stranded Nucleic Acids. Biophys J 2004, 87:215-226.

43. Beyer W: RNA Secondary Structure Prediction including Pseudoknots.Master’s thesis University Vienna; 2010 [http://www.tbi.univie.ac.at/newpapers/Master_theses2010.html].

44. Rehmsmeier M, Steffen P, Hochsmann M, Giegerich R: Fast and effectiveprediction of microRNA/target duplexes. RNA 2004, 10:1507-1517.

45. Tafer H, Hofacker IL: RNAplex: a fast tool for RNA-RNA interaction search.Bioinformatics 2008, 24(22):2657-2663.

46. Tafer H, Ammann F, Eggenhoffer F, Stadler PF, Hofacker IL: FastAccessibility-Based Prediction of RNA-RNA Interactions. Bioinformatics2011, 27:1934-1940.

47. Tafer H, Kehr S, Hertel J, Hofacker IL, Stadler PF: RNAsnoop: efficient targetprediction for H/ACA snoRNAs. Bioinformatics 2010, 26(5):610-616.

48. Gruber AR, Bernhart SH, Zhou Y, Hofacker IL: RNALfoldz: Efficientprediction of thermodynamically stable, local secondary structures. InGerman Conference on Bioinformatics 2010, Volume 173 of Lecture Notes inInformatics. Edited by: Schomburg D, Grote A. Bonn: Gesellschaft f.Informatik; 2010:12-21.

49. Hofacker IL, Fekete M, Stadler PF: Secondary Structure Prediction forAligned RNA Sequences. J Mol Biol 2002, 319:1059-1066.

50. Bernhart SH, Hofacker IL, Will S, Gruber AR, Stadler PF: RNAalifold:improved consensus structure prediction for RNA alignments. BMCBioinformatics 2008, 9:474.

51. Hofacker IL, Fekete M, Flamm C, Huynen MA, Rauscher S, Stolorz PE,Stadler PF: Automatic Detection of Conserved RNA Structure Elements inComplete RNA Virus Genomes. Nucl Acids Res 1998, 26:3825-3836.

52. Hofacker IL, Stadler PF: Automatic Detection of Conserved Base PairingPatterns in RNA Virus Genomes. Comp & Chem 1999, 23:401-414.

53. Klein R, Eddy S: RSEARCH: Finding homologs of single structured RNAsequences. BMC Bioinformatics 2003, 4:44.

54. Sankoff D: Simultaneous solution of the RNA folding, alignment, andproto-sequence problems. SIAM J Appl Math 1985, 45:810-825.

55. Havgaard JH, Lyngs RB, Gorodkin J: The foldalign web server for pairwisestructural RNA alignment and mutual motif search. Nucleic Acids Research2005, 33(suppl 2):W650-W653.

56. Mathews DH, Turner DH: Dynalign: an algorithm for finding thesecondary structure common to two RNA sequences. Journal of MolecularBiology 2002, 317(2):191-203.

57. Hofacker IL, Bernhart SHF, Stadler PF: Alignment of RNA Base PairingProbability Matrices. Bioinformatics 2004, 20:2222-2227.

58. Hofacker IL, Stadler PF: The Partition Function Variant of Sankoff’sAlgorithm. In Computational Science - ICCS 2004, Volume 3039 of LectureNotes in Computer Science Edited by: Bubak M, van Albada G, Sloot PA,Dongarra J 2004, 728-735, [Kraków, June 6-9, 2004]..

Lorenz et al. Algorithms for Molecular Biology 2011, 6:26http://www.almob.org/content/6/1/26

Page 13 of 14

Page 14: SOFTWARE ARTICLE Open Access ViennaRNA Package 2

59. Will S, Missal K, Hofacker IL, Stadler PF, Backofen R: Inferring Non-CodingRNA Families and Classes by Means of Genome-Scale Structure-BasedClustering. PLoS Comp Biol 2007, 3:e65.

60. Busch A, Backofen R: INFO-RNA-a fast approach to inverse RNA folding.Bioinformatics 2006, 22(15):1823-1831.

61. Andronescu M, Aguirre-Hernández R, Condon A, Hoos HH: RNAsoft: a suiteof RNA secondary structure prediction and design software tools. NucleicAcids Research 2003, 31(13):3416-3422.

62. Zadeh JN, Wolfe BR, Pierce NA: Nucleic acid sequence design via efficientensemble defect optimization. J Comput Chem 2011, 32:439-452.

63. Flamm C, Hofacker IL, Maurer-Stroh S, Stadler PF, Zehl M: Design of Multi-Stable RNA Molecules. RNA 2001, 7:254-265.

64. Flamm C, Fontana W, Hofacker IL, Schuster P: RNA Folding at ElementaryStep Resolution. RNA 2000, 6:325-338.

65. Flamm C, Hofacker IL, Stadler PF, Wolfinger MT: Barrier Trees ofDegenerate Landscapes. Z Phys Chem 2002, 216:155-173.

66. Bruccoleri RE, Heinrich G: An improved algorithm for nucleic acidsecondary structure display. Computer applications in the biosciences:CABIOS 1988, 4:167-173.

67. Turner DH, Mathews DH: NNDB: the nearest neighbor parameterdatabase for predicting stability of nucleic acid secondary structure.Nucleic Acids Res 2010, 38(Database):280-282.

68. SantaLucia J, Hicks D: The thermodynamics of DNA structural motifs.Annu Rev Biophys Biomol Struct 2004, 33:415-440.

69. Andronescu MS, Pop C, Condon AE: Improved free energy parameters forRNA pseudoknotted secondary structure prediction. RNA 2010, 16:26-42.

70. Andronescu M, Condon A, Hoos HH, Mathews DH, Murphy KP: Efficientparameter estimation for RNA secondary structure prediction.Bioinformatics 2007, 23:i19-i28.

71. Ding Y, Chan CY, Lawrence CE: RNA secondary structure prediction bycentroids in a Boltzmann weighted ensemble. RNA 2005, 11(8):1157-1166.

72. Knudsen B, Hein J: Pfold: RNA secondary structure prediction usingstochastic context-free grammars. Nucleic Acids Research 2003,31(13):3423.

73. Andronescu M, Bereg V, Hoos HH, Condon A: RNA STRAND: the RNAsecondary structure and statistical analysis database. BMC Bioinformatics2008, 9:340-340.

74. Sczyrba A, Krüger J, Mersch H, Kurtz S, Giegerich R: RNA-related tools onthe Bielefeld Bioinformatics Server. Nucleic Acids Research 2003,31(13):3767-3770.

75. Höchsmann M, Töller T, Giegerich R, Kurtz S: Local similarity in RNAsecondary structures. Proc IEEE Comput Soc Bioinform Conf 2003, 2:159-168.

76. Aksay C, Salari R, Karakoc E, Alkan C, Sahinalp S: taveRNA: a web suite forRNA algorithms and applications. Nucleic acids research 2007, 35(suppl 2):W325.

77. Parisien M, Major F: The MC-Fold and MC-Sym pipeline infers RNAstructure from sequence data. Nature 2008, 452:51-55.

78. Darty K, Denise A, Ponty Y: VARNA: Interactive drawing and editing of theRNA secondary structure. Bioinformatics 2009, 25(15):1974-1975.

79. Höner zu Siederdissen C, Bernhart SH, Stadler PF, Hofacker IL: A FoldingAlgorithm for Extended RNA Secondary Structures. Bioinformatics 2011,27:129-136.

80. Pearson W, Lipman D: Improved tools for biological sequencecomparison. Proceedings of the National Academy of Sciences of the UnitedStates of America 1988, 85(8):2444.

81. Larkin M, Blackshields G, Brown N, Chenna R, McGettigan P, McWilliam H,Valentin F, Wallace I, Wilm A, Lopez R, et al: Clustal W and Clustal Xversion 2.0. Bioinformatics 2007, 23(21):2947.

82. Gardner P, Daub J, Tate J, Nawrocki E, Kolbe D, Lindgreen S, Wilkinson A,Finn R, Griffiths-Jones S, Eddy S, et al: Rfam: updates to the RNA familiesdatabase. Nucleic acids research 2009, 37(suppl 1):D136.

83. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW: GenBank.Nucleic Acids Research 2009, 37(suppl 1):D26-D31.

84. Kalaš M, Puntervoll P, Joseph A, Bartaševičiūte E, Töpfer A, Venkataraman P,Pettifer S, Bryne J, Ison J, Blanchet C, et al: BioXSD: the common data-exchange format for everyday bioinformatics web services. Bioinformatics2010, 26(18):i540.

85. Waugh A, Gendron P, Altman R, Brown JW, Case D, Gautheret D, Harvey SC,Leontis N, Westbrook J, Westhof E, Zuker M, Major F: RNAML: a standardsyntax for exchanging RNA information. RNA 2002, 8(6):707-717.

86. Hofacker IL, Huynen MA, Stadler PF, Stolorz PE: Knowledge Discovery inRNA Sequence Families of HIV Using Scalable Computers. In Proceedingsof the 2nd International Conference on Knowledge Discovery and DataMining, Portland, OR. Edited by: Simoudis E, Han J, Fayyad U, Menlo Park.CA: AAAI Press; 1996:20-25.

87. Fekete M, Hofacker IL, Stadler PF: Prediction of RNA base pairingprobabilities using massively parallel computers. J Comp Biol 2000,7:171-182.

88. Dagum L, Menon R: OpenMP: an industry standard API for shared-memory programming. IEEE Computational Science and Engineering 1998,5:46-55.

89. Gardner P, Giegerich R: A comprehensive comparison of comparativeRNA structure prediction approaches. BMC bioinformatics 2004, 5:140.

90. Zakov S, Goldberg Y, Elhadad M, Ziv-Ukelson M: Rich parameterizationimproves RNA structure prediction. Research in Computational MolecularBiology Springer; 2011, 546-562.

91. Enright AJ, John B, Gaul U, Tuschl T, Sander C, Marks DS: MicroRNA targetsin Drosophila. Genome Biol 2003, 5.

92. Thadani R, Tammi MT: MicroTar: predicting microRNA targets from RNAduplexes. BMC Bioinformatics 2006, 7:Suppl 5.

93. Rusinov V, Baev V, Minkov IN, Tabler M: MicroInspector: a web tool fordetection of miRNA binding sites in an RNA sequence. Nucleic Acids Res2005, 33(Web Server):696-700.

94. Hertel J, Stadler PF: Hairpins in a Haystack: recognizing microRNAprecursors in comparative genomics data. Bioinformatics 2006,22(14):197-202.

95. Washietl S, Hofacker IL, Stadler PF: Fast and reliable prediction ofnoncoding RNAs. Proc Natl Acad Sci USA 2005, 102(7):2454-2459.

96. Gruber AR, Findeiss S, Washietl S, Hofacker IL, Stadler PF: RNAZ 2.0:IMPROVED NONCODING RNA DETECTION. Pac Symp Biocomput 2010,15:69-79.

97. Hertel J, Hofacker IL, Stadler PF: SnoReport: computational identificationof snoRNAs with unknown targets. Bioinformatics 2008, 24(2):158-164.

98. Reiche K, Stadler PF: RNAstrand: reading direction of structured RNAs inmultiple sequence alignments. Algorithms Mol Biol 2007, 2:6-6.

99. Kaczkowski B, Torarinsson E, Reiche K, Havgaard JH, Stadler PF, Gorodkin J:Structural profiles of human miRNA families from pairwise clustering.Bioinformatics 2009, 25(3):291-294.

100. Tafer H, Ameres SL, Obernosterer G, Gebeshuber CA, Schroeder R,Martinez J, Hofacker IL: The impact of target site accessibility on thedesign of effective siRNAs. Nat Biotechnol 2008, 26(5):578-583.

101. Kiryu H, Kin T, Asai K: Robust prediction of consensus secondarystructures using averaged base pairing probability matrices.Bioinformatics 2007, 23:434-441.

102. Stocsits RR, Letsch H, Hertel J, Misof B, Stadler PF: Accurate and efficientreconstruction of deep phylogenies from structured RNAs. Nucleic AcidsResearch 2009, 37(18):6184-6193.

103. Heyne S, Will S, Beckstette M, Backofen R: Lightweight comparison ofRNAs based on exact sequence-structure matches. Bioinformatics 2009,25(16):2095-2102.

104. Upper D: The unsuccessful self-treatment of a case of “writer’s block”.Journal of Applied Behavior Analysis 1974, 7(3):497.

105. Dowell R, Eddy S: Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMCbioinformatics 2004, 5:71.

106. Matthews B: Comparison of the predicted and observed secondarystructure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA)-ProteinStructure 1975, 405(2):442-451.

doi:10.1186/1748-7188-6-26Cite this article as: Lorenz et al.: ViennaRNA Package 2.0. Algorithms forMolecular Biology 2011 6:26.

Lorenz et al. Algorithms for Molecular Biology 2011, 6:26http://www.almob.org/content/6/1/26

Page 14 of 14