does native state topology determine the rna folding...

9
Does Native State Topology Determine the RNA Folding Mechanism? Eric J. Sorin 1 , Bradley J. Nakatani 1 , Young Min Rhee 1 Guha Jayachandran 2 , V Vishal 1 and Vijay S. Pande 1,3,4,5 * 1 Department of Chemistry Stanford University, Stanford CA 94305-5080, USA 2 Department of Computer Science, Stanford University Stanford, CA 94305-5080 USA 3 Department of Biophysics Stanford University, Stanford CA 94305-5080, USA 4 Department of Structural Biology, Stanford University Stanford, CA 94305-5080 USA 5 Stanford Synchrotron Radiation Laboratory Stanford University Stanford, CA 94305-5080 USA Recent studies in protein folding suggest that native state topology plays a dominant role in determining the folding mechanism, yet an analogous statement has not been made for RNA, most likely due to the strong coupling between the ionic environment and conformational energetics that make RNA folding more complex than protein folding. Applying a distributed computing architecture to sample nearly 5000 complete tRNA folding events using a minimalist, atomistic model, we have characterized the role of native topology in tRNA folding dynamics: the simulated bulk folding behavior predicts well the experimentally observed folding mechanism. In contrast, single-molecule folding events display multiple discrete folding transitions and compose a largely diverse, heterogeneous dynamic ensemble. This both supports an emerging view of hetero- geneous folding dynamics at the microscopic level and highlights the need for single-molecule experiments and both single-molecule and bulk simulations in interpreting bulk experimental measurements. q 2004 Elsevier Ltd. All rights reserved. Keywords: RNA folding; bulk kinetics; mechanism; distributed computing; biasing potential *Corresponding author Introduction The study of biological self-assembly is at the forefront of modern biophysics, with many labora- tories around the world focused on answering the question “How do biopolymers adopt biologically active native folds?” In recent years, protein folding studies have put forth a new “topology- driven” view of folding by recognizing correlations between folding mechanism and topology of the native state. 1 For instance, the laboratories of Goddard and Plaxco have put forth several reports connecting the folding rates of small proteins with the topological character of those proteins, 2–5 and comparisons between topology and unfolding pathways have been made. 6,7 Ferrara and Caflisch studied the folding of peptides forming beta struc- ture and concluded that the folding free energy landscape is determined by the topology of the native state, whereas specific atomic interactions determine the pathway(s) followed, 8 and our recent comparison between protein and RNA hair- pin structures supports this line of reasoning. 9 This is not surprising when one considers that the folding transition states for proteins of similar shape were shown to be insensitive to significant sequence mutations. 10 – 12 Here, we attempt to directly characterize the contribution of native state topology to the RNA folding mechanism at both the single-molecule and bulk levels. Because bulk experiments inher- ently include ensemble-averaged observables, with distributions around those averages expected within a given system, single-molecule studies complement bulk studies by allowing observation 0022-2836/$ - see front matter q 2004 Elsevier Ltd. All rights reserved. E-mail address of the corresponding author: [email protected] Abbreviations used: RMSD, root-mean-squared deviations; LJ, Lennard – Jones. doi:10.1016/j.jmb.2004.02.024 J. Mol. Biol. (2004) 337, 789–797

Upload: others

Post on 16-Mar-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Does Native State Topology Determine the RNA Folding ...folding.cnsm.csulb.edu/pubs/sorin_tRNA_2004.pdf · Does Native State Topology Determine the RNA Folding Mechanism? Eric J

Does Native State Topology Determine the RNAFolding Mechanism?

Eric J. Sorin1, Bradley J. Nakatani1, Young Min Rhee1

Guha Jayachandran2, V Vishal1 and Vijay S. Pande1,3,4,5*

1Department of ChemistryStanford University, StanfordCA 94305-5080, USA

2Department of ComputerScience, Stanford UniversityStanford, CA 94305-5080USA

3Department of BiophysicsStanford University, StanfordCA 94305-5080, USA

4Department of StructuralBiology, Stanford UniversityStanford, CA 94305-5080USA

5Stanford SynchrotronRadiation LaboratoryStanford UniversityStanford, CA 94305-5080USA

Recent studies in protein folding suggest that native state topology plays adominant role in determining the folding mechanism, yet an analogousstatement has not been made for RNA, most likely due to the strongcoupling between the ionic environment and conformational energeticsthat make RNA folding more complex than protein folding. Applying adistributed computing architecture to sample nearly 5000 complete tRNAfolding events using a minimalist, atomistic model, we have characterizedthe role of native topology in tRNA folding dynamics: the simulated bulkfolding behavior predicts well the experimentally observed foldingmechanism. In contrast, single-molecule folding events display multiplediscrete folding transitions and compose a largely diverse, heterogeneousdynamic ensemble. This both supports an emerging view of hetero-geneous folding dynamics at the microscopic level and highlights theneed for single-molecule experiments and both single-molecule and bulksimulations in interpreting bulk experimental measurements.

q 2004 Elsevier Ltd. All rights reserved.

Keywords: RNA folding; bulk kinetics; mechanism; distributed computing;biasing potential*Corresponding author

Introduction

The study of biological self-assembly is at theforefront of modern biophysics, with many labora-tories around the world focused on answering thequestion “How do biopolymers adopt biologicallyactive native folds?” In recent years, proteinfolding studies have put forth a new “topology-driven” view of folding by recognizing correlationsbetween folding mechanism and topology of thenative state.1 For instance, the laboratories ofGoddard and Plaxco have put forth several reportsconnecting the folding rates of small proteins withthe topological character of those proteins,2 – 5 andcomparisons between topology and unfolding

pathways have been made.6,7 Ferrara and Caflischstudied the folding of peptides forming beta struc-ture and concluded that the folding free energylandscape is determined by the topology of thenative state, whereas specific atomic interactionsdetermine the pathway(s) followed,8 and ourrecent comparison between protein and RNA hair-pin structures supports this line of reasoning.9

This is not surprising when one considers thatthe folding transition states for proteins of similarshape were shown to be insensitive to significantsequence mutations.10 – 12

Here, we attempt to directly characterize thecontribution of native state topology to the RNAfolding mechanism at both the single-moleculeand bulk levels. Because bulk experiments inher-ently include ensemble-averaged observables,with distributions around those averages expectedwithin a given system, single-molecule studiescomplement bulk studies by allowing observation

0022-2836/$ - see front matter q 2004 Elsevier Ltd. All rights reserved.

E-mail address of the corresponding author:[email protected]

Abbreviations used: RMSD, root-mean-squareddeviations; LJ, Lennard–Jones.

doi:10.1016/j.jmb.2004.02.024 J. Mol. Biol. (2004) 337, 789–797

Page 2: Does Native State Topology Determine the RNA Folding ...folding.cnsm.csulb.edu/pubs/sorin_tRNA_2004.pdf · Does Native State Topology Determine the RNA Folding Mechanism? Eric J

of dynamics that are masked by such ensemble-averaging.13 – 15 Yet differences between single-molecule and bulk folding behaviors can beprofound, as detailed herein.

In the past, the ability of simulation to predictbulk system dynamical properties has been hin-dered by the limited number of events that onecould simulate. Using a distributed computingarchitecture16 to overcome this barrier, we havecollected nearly 5000 complete folding events forthe 76 nucleotide tRNA topology, which is signifi-cantly more complex than the small, fast-foldingsystems commonly examined in foldingsimulations.9,17,18 This degree of sampling repre-sents ,500 CPU years of work conducted on,10,000 CPUs within our global computingnetwork†.

Interestingly, topological considerations haveonly been applied to the smallest of RNAs,9 likelybecause the folding of functional RNAs is knownto be more complex than that of small proteinsdue to the coupling of RNA conformational ener-getics with the predominantly ionic environment.19

For instance, it has been demonstrated that theidentity and concentration of ions present canalter the folding and unfolding dynamics exhibitedby the tRNA topology,20 – 22 further complicatingour understanding of RNA folding.

To directly assess the relationship betweennative topology and RNA folding mechanism, weemploy a minimalist model similar to the Go-likemodels previously used to study proteinfolding.15,23 – 28 We improve upon these models bydeveloping a folding potential that: (i) treats thepolymer in atomic detail; (ii) introduces attractivenon-native interaction energies; and (iii) includesion-dependent effects on the folding dynamics.These features allow for a much more accurateapproximation of the true energetics than thecoarse-grained models commonly used to studythe dynamics of large biomolecules,28 while inher-ently maintaining the polymeric entropy expectedfrom atomistic models, thereby allowing a causalrelationship to be inferred between changes inthe energetics of the model and changes in theresulting dynamics.

Striving for simplicity, we consider the mostelementary model of the ionic effect in RNA fold-ing, which assumes that monovalent and divalentcations stabilize secondary and tertiary structure,respectively. We propose that such a simple modelis adequate to capture the essential foldingdynamics observed in experiments under varyingionic conditions: our model employs two para-meters, 12 and 13, which specify the energetic bene-fit of native secondary and tertiary interactions,respectively, coupled only in their simultaneoususe in thermal calibrations of the model.

We note that while the use of Go-like potentialsto extract precise folding dynamics is questionable

at best, the simplified biasing potential used herepresents an ideal methodology to probe therelationship between topology and folding mecha-nism: the folding potential is built on informationfrom the native tRNA fold. If the relationshipbetween topology and folding mechanism is negli-gible, our results should show significant disagree-ment with experimental observations. If, on theother hand, this relationship is significant, theagreement with experiment should also besignificant.

Indeed, we show below that the bulk foldingbehavior predicted by this simple, atomistic,minimalist model predicts well the experimentallyobserved bulk folding mechanism, suggesting thattopology does in fact contribute to the bulk foldingmechanism. Furthermore, massive sampling has,for the first time, allowed direct comparison ofsimulated ensemble kinetics to single-moleculefolding simulations, illustrating a tremendousheterogeneity inherent to individual folding eventsthat collectively contribute to a topologically dri-ven bulk folding mechanism. We discuss belowthe general results seen in single-molecule foldingtrajectories, followed by analysis of kinetic andmechanistic observations from an ensemble-averaged perspective, as well as the role of poly-meric entropy in folding. Caveats of the model arethen discussed in Methods.

Results

Diverse pathways in tRNA folding

The native fold of tRNA can be described interms of nine “substructures” (Figure 1(a)). Thewell-defined secondary structure is composed offour helix-stem regions (S1, S2, S3, S4), while thetertiary structure consists of a set of five regions oflong-range contact (T12, T13, T14, T23, T24), wherethe dual subscripts denote the two helix-stems incontact for a given tertiary region. Formation ofthese substructures in simulated single-moleculefolding events occur as a series of numerous, dis-crete transitions in the fraction of native contacts,Q. To quantitatively characterize the observedpathways, the relative folding times for thesenine substructures were calculated. In many cases,multiple substructures folded in the same timerange. To unambiguously determine the sub-structural folding order in a given trajectory, thevariance in the fraction of native contacts withineach substructure throughout that trajectory,VarðQÞ; was calculated, and the relative foldingtime of each substructure was defined as the tem-poral point at which VarðQÞ was a maximum (i.e.the point at which Q was increasing most rapidly).

Pearson correlations (defined as the covarianceof two sets divided by the product of their stan-dard deviations) between folding times for eachpair of substructures within the ensemble of fold-ing events were then evaluated, and the resulting† Folding@Home, http://folding.stanford.edu

790 Bulk tRNA Folding Simulations

Page 3: Does Native State Topology Determine the RNA Folding ...folding.cnsm.csulb.edu/pubs/sorin_tRNA_2004.pdf · Does Native State Topology Determine the RNA Folding Mechanism? Eric J

matrix (Figure 1(b)) shows a distinct separationbetween two phases during folding. Theseuncorrelated groups are referred to below asphase 1 ¼ {S2; S3; S4;T12} and phase 2 ¼ {T13;T14;T23;T24; S1}: Each folding trajectory was thenassigned to a specific folding pathway, denoted bythe sequence of substructural folding events thatoccurred in that specific simulation.

In an effort to simplify the representation of themany pathways observed, folding simulationswere grouped into three classes. While the initialevents in the folding process were similar for allfolders (predominantly stem formation), the final

stages of folding exhibited a much larger variation.For this reason, the ordering of the substructuralfolding events within phase 2 was used to classifyfolding trajectories. The observed statisticalweighting of folding pathways is detailed for eachfolding class in Table 1. For brevity, pathways thatcontributed less than 1% of the folding ensemblehave been removed, and the tabulated pathwaysaccount for approximately 90% of observedfolders.

Although this sampling of pathways cannotquantitatively capture equilibrium thermodynamicbehavior, insight into the qualitative nature of thefolding landscape (e.g. locations of intermediatesand free energy barriers) can be gained by consid-ering the relative probabilities of each microstatepresent in the simulated ensemble along a minimalnumber of folding parameters (the fraction ofnative secondary and tertiary contacts, Q2 and Q3,respectively). The log of the conformational proba-bilities on this simplified, two-dimensional surfaceis plotted in Figure 2(a). It is evident that eachclass of folding pathways predicts multiple inter-mediates in the folding process, observed as thepreviously mentioned numerous, discrete steps insingle-molecule folding trajectories. A graphicalrepresentation of the most statistically predomi-nant pathway observed is shown in Figure 2(b),and an animation of this simulation is availablefor viewing online†.

Ensemble-averaged kinetics

We find significant differences in comparingensemble averaged dynamic properties to single-molecule trajectories. As shown in Figure 3(a),again using the fraction of native contacts as theprimary folding parameter, several distinct popu-lations are present throughout the ensemble offolding events. With the exception of the fullyextended chain (E, a physically irrelevant stateused to start simulations without biasing the resul-ting folding pathways), these states qualitatively

Figure 1. tRNA substructure depiction and foldingtime correlations. (a) tRNA substructures: red, S1 accep-tor stem; orange, S2 D-stem; green, S3 anti-codon stem;blue, S4 T-stem; regions of tertiary contact are notedwith arrows, where Tij indicates tertiary interactionsbetween the Si and Sj stems. (b) Relative folding time(Pearson) correlations between each substructural pairwithin the ensemble of 2592 folding trajectories.

Table 1. Weighting of tRNA folding pathways

Class Weight (%) Pathway

I 36.3 S4 [S2,T12] S3 [T13,T23] [T14,T24] S1

10.6 S4 S3 [S2,T12] [T13,T23] [T14,T24] S1

4.5 [S2, T12] S4 S3 [T13,T23] [T14,T24] S1

1.7 S4 [S2,T12] S3 T24 [T13,T23] T14 S1

II 13.0 S4 [S2,T12] S3 S1 [T13,T23] [T14,T24]3.4 S4 S3 [S2,T12] S1 [T13,T23] [T14,T24]2.2 S4 [S2,T12] S3 S1 T14 [T13,T23] T24

1.5 S4 [S2,T12] S1 S3 [T13, T23] [T14,T24]1.4 [S2,T12] S4 S3 S1 [T13,T23] [T14,T24]

III 8.1 S4 [S2,T12] S3 [T13,T23] S1 [T14,T24]2.4 S4 S3 [S2,T12] [T13,T23] S1 [T14,T24]2.4 S4 S3 [S2,T12] [T13,T23] S1 [T14,T24]1.6 S4 [S2,T12] S3 [T13,T23] T14 S1 T24

† http://folding.stanford.edu/tRNA/

Bulk tRNA Folding Simulations 791

Page 4: Does Native State Topology Determine the RNA Folding ...folding.cnsm.csulb.edu/pubs/sorin_tRNA_2004.pdf · Does Native State Topology Determine the RNA Folding Mechanism? Eric J

agree with experimentally observed populationsdescribed by Sosnick and co-workers,20,29 whodifferentiate between multiple unfolded states (U0

and U in the presence and absence of 4 M urea,respectively) and intermediates (INa denotes thebulk intermediate observed in high sodium con-centrations and IMg is the bulk intermediateobserved in the presence of magnesium).

The distributions of native contact formationPfoldðQÞ versus the number of native contacts ðQÞfor each substructure are plotted in Figure 3(b),with secondary substructures (upper panel) color-coded as in Figure 1(a), and tertiary substructures(lower panel) scaled from white (T12) to black(T24). Early events in the folding process (when Qis small) include formation of the three non-terminal helices, alongside the T12 tertiary contact(phase 1). Long-range tertiary contacts{T13;T14;T23;T24} then form in any variety of tem-poral sequences after phase 1 has completed, withthe formation of the terminal helix-stem ðS1; redÞeither preceding or succeeding collapse andformation of tertiary structure, as describedexperimentally.20

The distribution of the radius of gyration Rg isshown in Figure 3(c), and shows quantitativeagreement with experimentally measured molecu-lar sizes attained for collapsed states:29 the IMg

and N ensembles have kRgl ¼ 31:8ð^3:4Þ A andkRgl ¼ 26:2ð^1:3Þ A, respectively. These values areonly slightly larger than those reported by Fanget al. due to the added polymeric flexibility of ourmodel (see Methods). While the quantitative agree-ment for the native state Rg is fortuitous (based onthe construction of the model), the quantitative

agreement observed for IMg is a striking predictionof the character of the intermediate from a modelbased solely on native topology information. Thedistribution of all-atom root-mean-squared devia-tions (RMSD) from the native structure is alsoshown (inset).

The folding mechanism predicted from a bulkperspective (which is dependent on ion identityand concentration) is shown in Figure 3(d). Fromthe unfolded state, the IMg (full formation ofsecondary structure) and INa (formation of thethree non-terminal helix-stems) intermediates areboth possible. From IMg, the INa intermediate canbe formed, with phase 2 tertiary contacts forminglast. Alternatively, folding may include formationof tertiary contacts followed by zipping of theterminal S1 helix-stem.

To illustrate how a three-state (U ! I ! N)observation could come from a large ensemble ofindividual folding events, each exhibiting numer-ous discrete transitions, Figure 4 depicts theensemble-averaged fraction of native contactskQðtÞl alongside a single, randomly chosen trajec-tory (Figure 4(a)). Figure 4(b) demonstrates howthe discrete nature of folding is lost when consider-ing only ten randomly chosen individual foldingevents in the averaging. With more than 2500trajectories (Figure 4(c)), all sense of discrete, step-wise folding is lost, and the kQðtÞl curve becomesa smooth function of time. Still, the averaging ofonly ten independent folding trajectories is morerepresentative of a single trajectory than of theensemble as a whole, and only after including ,500or more events (data not shown) does the kQðtÞlcurve approximate that of the entire ensemble.

Figure 2. Statistical energy pro-files and the predominant foldingpathway. (a) Locations of freeenergy barriers and minima on thesimplified two-dimensional foldingsurface, where folding classes weredefined by the order of substruc-tural formation in the second phaseof folding. (b) Graphical represen-tation of the most predominantfolding pathway in our simulationswith substructures color-coded asin Figure 1(a). Labels indicatewhich substructures have foldedor are folding in each frame. Therelevant statistical energy landscapeis shown in grayscale with thetrajectory overlaid in red.

792 Bulk tRNA Folding Simulations

Page 5: Does Native State Topology Determine the RNA Folding ...folding.cnsm.csulb.edu/pubs/sorin_tRNA_2004.pdf · Does Native State Topology Determine the RNA Folding Mechanism? Eric J

It is convenient to consider the formation of eachof the nine independent substructures within theensemble, starting from the extended polymer, asa stochastic (Poisson) process with a given foldingrate ln. This is particularly the case for simplehelix formation (with no bulges or other internalstructure), which has been shown to be a two-state process.9,30,31 Such processes are additivesuch that the sum of n independent Poissonprocesses is itself a Poisson process with a totalrate equal to the sum of the n individual rates.Thus, if folding occurred simply as simultaneousformation of all nine substructures, one wouldexpect the total number of native contacts Q toincrease exponentially in time, with a total rateequal to the sum of the rates for each substructure.

In contrast to the simple, exponentially increas-ing kQðtÞl described above, two such phases areapparent in Figure 4(c). Because phase 2 beginsonly after a portion of individual trajectories havecompleted phase 1, the two transitions in thekQðtÞl curve were fit individually to single-expo-nential processes (dotted curves in Figure 4(c)),resulting in R2 values of 0.999 and 0.998, respect-ively. Were there not a significant overlap betweenthe two phases (i.e. if all trajectories completedphase 1 prior to any of them starting phase 2),these fits would be expected to fully predict kQðtÞl;which would then exhibit a cusp at the crossoverpoint between the two fits. As the resulting rate ofphase 1 is greater than that for phase 2, the buildupof a single observable intermediate between thephases is predicted.

To consider the dynamic populations presentfrom a chemical kinetics perspective, Figure 4(d)shows the evolution of ensemble mole fractions of

Figure 4. Ensemble-averaged kinetics. The fraction ofnative contacts for a single folding trajectory (a) shownnext to the ensemble-averaged fraction of native contactsfor ten randomly chosen folding simulations (b) and all2592 folding events (c). kQðtÞl approaches a smooth func-tion of time with two distinct, sequential exponentialphases (fitted as dotted curves) in (c). In (b) and (c),Var½QðtÞ� is shown as a thick gray line. (d) The time-dependent mole fractions for the U, I, and N states.

Figure 3. Ensemble-averaged properties during tRNAfolding. (a) Detected states in the bulk folding process.(b) Distributions of relative folding times for each sub-structure, with secondary structures color-coded as inFigure 1(a) (upper panel) and tertiary structures scaledfrom white to black (lower panel). (c) Rg and RMSD(inset) distributions for the observed states. (d) Predictedion-dependent bulk folding mechanism.

Bulk tRNA Folding Simulations 793

Page 6: Does Native State Topology Determine the RNA Folding ...folding.cnsm.csulb.edu/pubs/sorin_tRNA_2004.pdf · Does Native State Topology Determine the RNA Folding Mechanism? Eric J

unfolded U, intermediate I, and native state N con-formations. The I state was defined (from statisticalpopulations in Figure 2(a)) as having formed,60% and ,10% of native secondary and tertiarycontacts, respectively, and exhibits a concentrationmaximum that coincides with the crossover pointof the individual fits in (c). The native state curveis well fit by the proper biexponential relationshipfor the sequential, irreversible reaction U ! I ! Nwith an R2 value of 0.999.

Pathways for overcoming entropy

A strength of atomistic, minimalist models istheir utility in elucidating the role of polymericentropy in the folding dynamics.27 It is thus inter-esting to consider the balance of interaction energyand polymeric entropy in tRNA folding. We stressthat in this minimalist model the energetics aredefined by the native tRNA topology or, morespecifically, by the “stability ratio,” 12/13. The fold-ing mechanisms for extreme values of the stabilityratio are obvious: for 12=13 q 1 secondary structurewill form preferentially, and for 12=13 p 1; tertiarystructure would be preferred. Simulation can playan important role in characterizing the experi-mentally relevant intermediate regime, where onemust consider the delicate balance between theenergetic benefits and entropic penalties of form-ing native structure.

The variation of 12/13 implies the variation ofionic conditions: calibration of our model toexperiment suggests that for 12=13 ¼ 1 (Figure 5,top panels) we are in a regime analogous tomagnesium-mediated folding, whereby three ofthe four helix-stem elements are structured in theearly stages of folding (observed experimentally

as a preference for IMg), with S1 forming late in thefolding process. In contrast, when secondary struc-ture is significantly more favorable than tertiarystructure, 12=13 ¼ 3; our model simulates a regimeanalogous to higher sodium concentrations in theabsence of magnesium, as shown in Figure 5(lower panels). In this case, the formation of theterminal stem-helix precedes the formation ofphase 2 tertiary contacts (INa). We stress that thisfinding is in good agreement with the rather non-canonical experimental results reported by Sheltonet al.,20 in which the simple cloverleaf secondarystructure is not a necessary intermediate on thetRNA folding pathway.

Like the formation of nucleic acid hairpins, thefolding of the terminal RNA helix-stem appearsto be cooperative due to a balance between theenergetics of interaction and the polymeric entropyloss of structure formation. Indeed, it has beenpreviously suggested that the polymeric entropyof biomolecules can be sufficient to lead tocooperative folding.32 However, one would expecta significant entropic penalty for bringing togetherthe two ends of the 76 nucleotide tRNA, and sur-mounting this barrier is apparently accomplishedin one of the two experimentally observed ways.

Increasing the stability ratio results in greatersecondary structure stability, and the enthalpy ofbase-pairing dominates the conformational freeenergy, thus favoring early zipping of the terminalstem (INa). On the other hand, decreasing thestability ratio implies a more stable tertiary struc-ture, and the entropic barrier to terminal zippingdominates, resulting in unpaired S1 strands (IMg)until late in the folding process (after long-rangecollapse has occurred). We thus hypothesize thatthis entropic barrier is responsible for the distinction-dependent dynamics observed in tRNA fold-ing, with counterions tuning the relative stabilityof the intermediates and, in effect, “tipping thescale” in favor of one mechanism over another. Tobe sure, our simulations show that even a some-what subtle change in the stability ratio can leadto a qualitative change in the folding behavior.

Discussion and Implications

Using massively distributed computationalsampling and a minimalist, atomistic model withimplicit solvation and counterion effects, wehave shown that bulk non-equilibrium reactiondynamics can be extracted from many single-mol-ecule events without the need for extrapolatingbulk behavior from a small number of simulations.While our ensemble-averaging includes far fewerevents than the number studied in standard bulkexperiments, we have observed convergence to anensemble signal from ,2500 individual foldingevents, and have identified the lower bound forattaining the ensemble signal to be ,500 indi-vidual trajectories for a molecule of this size andcomplexity.

Figure 5. Ion-dependent dynamics. As in Figure 3(b),the distributions of relative substructural folding timesare shown for both perturbed values of the stabilityratio 12=13:

794 Bulk tRNA Folding Simulations

Page 7: Does Native State Topology Determine the RNA Folding ...folding.cnsm.csulb.edu/pubs/sorin_tRNA_2004.pdf · Does Native State Topology Determine the RNA Folding Mechanism? Eric J

Our results indicate that native tRNA topologyserves as a dominant predictor of the bulk foldingmechanism. To be sure, a biasing potential builtfrom native state information that includes non-native interaction energetics recovers the knownthree-state behavior reported previously and wellpredicts the character of known environmentallydependent intermediates, as specified by bothmolecular size and substructure formation.20 Incontrast to the ensemble-averaged behavior, greatdiversity in single-molecule folding pathways hasbeen observed. This is not surprising when con-sidered in the context of contemporary foldingtheories, which identify native folds as energetic“traps” and portray the polymeric motion thatleads to trapping in low-energy states as predomi-nantly stochastic in nature.

Still, it remains a common practice to interpretbulk signals from various experimental sources asrepresenting the “true” dynamics on a microscopiclevel, and it has only recently been put forth that(unobservable) intermediates are likely presenteven in the most simple two-state folders.33 Thesampling reported herein thus serves as anexample of the distinction that should be drawnbetween single-molecule (microscopic) and bulk(macroscopic) probes, and highlights the need forboth single-molecule experiments and simulation,both single-molecule and bulk, in the interpre-tation of bulk measurements. Indeed, it will beintriguing to see this complexity revealed bysingle-molecule experiments of tRNA folding.

Methods

Construction of the minimalist model

To construct a minimalist model for RNA folding, wehave expanded upon previous minimalist models of pro-tein folding15,25,28,34,35 by including non-native long-rangeinteraction energy terms (adding energetic frustrationto the folding landscape) and by considering the role ofsolvated counterions in the folding process. We definenative contacts as atomic pairs separated by three ormore residues and within 6.0 A of one another in thenative structure (based on tRNAphe, PDB ID 6tna). Theseinteratomic interactions are modeled using differentclasses of long-range Lennard–Jones (LJ) interactions:native contact pairs within the same helix-stem region(secondary contacts) are assigned energy 12 (initially,ten times greater than non-native interaction energies),while all other native pairs (tertiary contacts) areassigned LJ energy 13.

An initial stability ratio of 12=13 ¼ 2 was used, therebyassigning twice the energetic benefit to native secondarycontacts as that assigned to native tertiary contacts, anda total of 2592 complete folding events were simulated.To gain insight into the distinct kinetics reported fordiffering ionic conditions,20 over 1000 complete foldingevents were also collected using stability ratios of 1 and3, respectively. Each parameterization was calibrated toa melting temperature of ,343 K such that an effectivetemperature of 280(^10) K was employed for all foldingsimulations.

A unified-atom variant of the AMBER94 force field36

was used to define bonded interactions (bond, angle,and torsion terms), with optimal geometries taken fromthe native state. To speed the dynamics of interest, lowviscosity (tT ¼ 0.5 ps, ,2% that of water) was employedand dihedral energy terms were increased by a factor of5 relative to the effective temperature. The polymer wasmade less rigid by also decreasing bonded interactionenergies by a factor of 10 relative to the effectivetemperature. A complete set of simulation input files isavailable†.

All simulations were conducted using the stochasticdynamics integrator within the Gromacs moleculardynamics suite37 with 15 A cutoffs on all non-bondedinteractions. As the mean fraction of native contactspresent in simulations of the native state was 0.87, allfolding simulations (started from a fully extended con-formation) were considered fully folded when 87% ofall possible secondary contacts and 87% of all possibletertiary contacts were formed.

Caveats of the biasing potential

While minimalist models allow the study of events notobservable via fully atomistic molecular dynamics,15,25,27

the limitations of the minimalist model approach shouldbe considered. First of all, a quantitative prediction ofabsolute transition rates is not possible, and temporalanalyses have therefore been conducted in units ofiterations, yielding qualitative information on the natureof relative rates.15 Moreover, since non-native inter-actions are less energetically favorable than native ones,this model is not expected to accurately characterizeoff-pathway folding intermediates (kinetic traps), otherthan those resulting from topological frustration.15,26

However, we note that the inclusion of weak non-nativeattractions not present in previous biasing modelsgreatly enhances the stochastic nature of folding trajec-tories, allowing a more plausible description of theunderlying energetic landscape.

Furthermore, as 12 and 13 were chosen to be constantover the whole molecule, the simulated tRNA sequencemay not have specific energetic biases to particular sub-structures. While some tRNA sequences may break thisapproximation, the goal of this work is to study thebiasing role of polymeric entropy and we are thus inves-tigating a general property of RNA folding, i.e. the effectof the tRNA topology on folding. While one couldinclude substructure-dependent energetic biases tomodel differences between specific tRNA sequences,this would obscure the roles of polymeric entropy andtopology in folding.

Finally, a prime consideration in employing a biasingpotential of any sort, is the possibility of artifacts withinthe potential distorting the observed dynamics far fromthat of the true biomolecular system, and the true test ofsuch a potential is the ability to show agreement withcurrent knowledge prior to making predictions. Withinthe results reported above, we propose that such “arti-factual folding” is seen as the early formation of the T12

tertiary contact (Figure 3(b)), which has not been shownto occur in the early stages of folding. Such potential arti-facts may also affect the relative rates of individual helix-stem formation. We believe such artifactual folding to beminimal within the model, and stress that the ordering of

† http://folding.stanford.edu/tRNA

Bulk tRNA Folding Simulations 795

Page 8: Does Native State Topology Determine the RNA Folding ...folding.cnsm.csulb.edu/pubs/sorin_tRNA_2004.pdf · Does Native State Topology Determine the RNA Folding Mechanism? Eric J

phase 1 secondary structure formation does not changethe overall character of the analyses presented above.

Acknowledgements

We thank Erik Lindahl for technical aid, DanHerschlag, Rhiju Das, Sung-Joo Lee, and MarkEngelhardt for their comments on the manuscript,and the Folding@Home volunteers worldwidewho made this work possible (a list of contributorsis available)†. G.J. is a Siebel Fellow. E.J.S. andY.M.R. are supported by predoctoral fellowshipsfrom the Krell/DOE CSGF and Stanford GraduateFellowship, respectively, with additional support-ing grants ACS-PRF (36028-AC4) and NSF MRSECCPIMA (DMR-9808677), and a gift from Intel.

References

1. Baker, D. (2000). A surprising simplicity to proteinfolding. Nature, 405, 39–42.

2. Debe, D. A. & Goddard, W. A. (1999). First principlesprediction of protein folding rates. J. Mol. Biol. 294,619–625.

3. Plaxco, K. W., Simons, K. T., Ruczinski, I. & David, B.(2000). Topology, stability, sequence, and length:defining the determinants of two-state protein fold-ing kinetics. Biochemistry, 39, 11177–11183.

4. Makarov, D. E., Keller, C. A., Plaxco, K. W. & Metiu,H. (2002). How the folding rate constant of simple,single-domain proteins depends on the numberof native contacts. Proc. Natl Acad. Sci. USA, 99,3535–3539.

5. Jewett, A. I., Pande, V. S. & Plaxco, K. W. (2003).Cooperativity, smooth energy landscapes and theorigins of topology-dependent protein folding rates.J. Mol. Biol. 326, 247–253.

6. Klimov, D. K. & Thirumalai, D. (2000). Native top-ology determines force-induced unfolding pathwaysin globular proteins. Proc. Natl Acad. Sci. USA, 97,7254–7259.

7. Gsponer, J. & Caflisch, A. (2001). Role of nativetopology investigated by multiple unfolding simu-lations of four SH3 domains. J. Mol. Biol. 309,285–298.

8. Ferrara, P. & Caflisch, A. (2001). Native topology orspecific interactions: what is more important forprotein folding? J. Mol. Biol. 306, 837–850.

9. Sorin, E. J., Rhee, Y. M., Nakatani, B. J. & Pande, V. S.(2003). Insights into nucleic acid conformationaldynamics from massively parallel stochastic simu-lations. Biophys. J. 85, 790–803.

10. Chiti, F., Taddei, N., White, P. M., Bucciantini, M.,Magherini, F., Stefani, M. & Dobson, C. M. (1999).Mutational analysis of acylphosphatase suggests theimportance of topology and contact order in proteinfolding. Nature Struct. Biol. 6, 1005–1009.

11. Martınez, J. C. & Serrano, L. (1999). The folding tran-sition state between SH3 domains is conformation-ally restricted and evolutionarily conserved. NatureStruct. Biol. 6, 1010–1016.

12. Riddle, D. S., Grantcharova, V. P., Santiago, J. V.,Alm, E., Ruczinski, I. & Baker, D. (1999). Experimentand theory highlight role of native state topology inSH3 folding. Nature Struct. Biol. 6, 1016–1024.

13. Deschenes, L. A. & Vanden Bout, D. A. (2001).Single-molecule studies of heterogeneous dynamicsin polymer melts near the glass transition. Science,292, 255–258.

14. McKinney, S. A., Declais, A. C., Lilley, D. M. J. & Ha,T. (2003). Structural dynamics of individual Hollidayjunctions. Nature Struct. Biol. 10, 93–97.

15. Shimada, J. & Shakhnovich, E. I. (2002). The ensem-ble folding kinetics of protein G from an all-atomMonte Carlo simulation. Proc. Natl Acad. Sci. USA,99, 11175–11180.

16. Zagrovic, B., Sorin, E. J. & Pande, V. (2001).b-Hairpin folding simulations in atomistic detailusing an implicit solvent model. J. Mol. Biol. 313,151–169.

17. Pande, V. S., Baker, I., Chapman, J., Elmer, S., Kaliq,S., Larson, S. et al. (2003). Atomistic protein foldingsimulations on the submillisecond timescale usingworldwide distributed computing. Biopolymers, 68,91–109.

18. Sorin, E. J., Engelhardt, M. A., Herschlag, D. &Pande, V. S. (2002). RNA simulations: probing hair-pin unfolding and the dynamics of a GNRA tetra-loop. J. Mol. Biol. 317, 493–506.

19. Thirumalai, D., Lee, N., Woodson, S. A. & Klimov,D. K. (2001). Early events in RNA folding. Annu.Rev. Phys. Chem. 52, 751–762.

20. Shelton, V. M., Sosnick, T. R. & Pan, T. (2001). Alter-ing the intermediate in the equilibrium folding ofunmodified yeast tRNAPhe with monovalent anddivalent cations. Biochemistry, 40, 3629–3638.

21. Stein, A. & Crothers, D. M. (1976). Conformationalchanges of transfer RNA. The role of magnesium(II).Biochemistry, 15, 160–168.

22. Stein, A. & Crothers, D. M. (1976). Equilibrium bind-ing of magnesium(II) by Escherichia coli tRNAfMet.Biochemistry, 15, 157–160.

23. Ueda, Y., Taketomi, H. & Go, N. (1975). Studies onprotein folding, unfolding and fluctuations by com-puter simulation. I. The effects of specific aminoacid sequence represented by specific inter-unitinteractions. Int. J. Pept. Protein Res. 7, 445–459.

24. Ueda, Y., Taketomi, H. & Go, N. (1978). Studies onprotein folding, unfolding, and fluctuations by com-puter-simulation. 2. 3-dimensional lattice model oflysozyme. Biopolymers, 17, 1531–1548.

25. Shea, J. E., Onuchic, J. N. & Brooks, C. L. (2002).Probing the folding free energy landscape of thesrc-SH3 protein domain. Proc. Natl Acad. Sci. USA,99, 16064–16068.

26. Clementi, C., Hugh Nymeyer, H. & Onuchic, J. N.(2000). Topological and energetic factors: what deter-mines the structural details of the transition stateensemble and “en-route” intermediates for proteinfolding? an investigation for small globular proteins.J. Mol. Biol. 298, 937–953.

27. Pande, V. S. & Rokhsar, D. S. (1998). Is the moltenglobule a third phase of proteins? Proc. Natl Acad.Sci. USA, 95, 1490–1494.

28. Clementi, C., Garcia, A. E. & Onuchic, J. N. (2003).Interplay among tertiary contacts, secondary struc-ture formation and side-chain packing in the proteinfolding mechanism: all-atom representation study ofprotein L. J. Mol. Biol. 326, 933–954.

29. Fang, X., Littrell, K., Yang, X.-J., Henderson, S. J.,† Folding@Home, http://folding.stanford.edu

796 Bulk tRNA Folding Simulations

Page 9: Does Native State Topology Determine the RNA Folding ...folding.cnsm.csulb.edu/pubs/sorin_tRNA_2004.pdf · Does Native State Topology Determine the RNA Folding Mechanism? Eric J

Siefert, S., Thiyagarajan, P. et al. (2000). Mg2þ-depen-dent compaction and folding of yeast tRNAPhe andthe catalytic domain of the B. subtilis RNase P RNAdetermined by small-angle X-ray scattering. Biochem-istry, 39, 11107–11113.

30. Zhang, W. B. & Chen, S. J. (2002). RNA hairpin-fold-ing kinetics. Proc. Natl Acad. Sci. USA, 99, 1931–1936.

31. Ansari, A., Kuznetsov, S. V. & Shen, Y. Q. (2001).Configurational diffusion down a folding funneldescribes the dynamics of DNA hairpins. Proc. NatlAcad. Sci. USA, 98, 7771–7776.

32. Pande, V. S., Grosberg, A. Y. & Tanaka, T. (2000).Heteropolymer freezing and design: towards physi-cal models of protein folding. Rev. Mod. Phys. 72,259–314.

33. Daggett, V. & Fersht, A. (2003). The present view ofthe mechanism of protein folding. Nature Rev. Mol.Cell Biol. 4, 497–502.

34. Nymeyer, H., Garcia, A. E. & Onuchic, J. N. (1998).Folding funnels and frustration in off-lattice minim-alist protein landscapes. Proc. Natl Acad. Sci. USA,95, 5921–5928.

35. Li, L. & Shakhnovich, E. I. (2001). Constructing, veri-fying, and dissecting the folding transition state ofchymotrypsin inhibitor 2 with all-atom simulations.Proc. Natl Acad. Sci. USA, 98, 13014–13018.

36. Cornell, W. D., Cieplak, P., Bayly, C. I., Gould, I. R.,Merz, K. M., Ferguson, D. M. et al. (1995). A secondgeneration force field for the simulation of proteins,nucleic acids, and organic molecules. J. Am. Chem.Soc. 117, 5179–5197.

37. Lindahl, E., Hess, B. & van der Spoel, D. (2001).GROMACS 3.0: a package for molecular simulationand trajectory analysis. J. Mol. Model. 7, 306–317.

Edited by J. Doudna

(Received 3 October 2003; received in revised form 7 February 2004; accepted 10 February 2004)

Bulk tRNA Folding Simulations 797