ms3d structural elucidation of the hiv-1 packaging … structural elucidation of the hiv-1 packaging...

6
MS3D structural elucidation of the HIV-1 packaging signal Eizadora T. Yu* , Arie Hawkins , Julian Eaton, and Daniele Fabris Department of Chemistry and Biochemistry, University of Maryland Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250 Edited by Roger D. Kornberg, Stanford University School of Medicine, Stanford, CA, and approved July 1, 2008 (received for review January 17, 2008) The structure of HIV-1 -RNA has been elucidated by a concerted approach combining structural probes with mass spectrometric detection (MS3D), which is not affected by the size and crystalli- zation properties of target biomolecules. Distance constraints from bifunctional cross-linkers provided the information required for assembling an all-atom model from the high-resolution coordi- nates of separate domains by triangulating their reciprocal place- ment in 3D space. The resulting structure revealed a compact cloverleaf morphology stabilized by a long-range tertiary interac- tion between the GNRA tetraloop of stemloop 4 (SL4) and the upper stem of stemloop 1 (SL1). The preservation of discrete stemloop structures ruled out the possibility that major rearrange- ments might produce a putative supersite with enhanced affinity for the nucleocapsid (NC) domain of the viral Gag polyprotein, which would drive genome recognition and packaging. The steric situation of single-stranded regions exposed on the cloverleaf structure offered a valid explanation for the stoichiometry exhib- ited by full-length -RNA in the presence of NC. The participation of SL4 in a putative GNRA loop-receptor interaction provided further indications of the plasticity of this region of genomic RNA, which can also anneal with upstream sequences to stabilize alter- native conformations of the 5 untranslated region (5-UTR). Con- sidering the ability to sustain specific NC binding, the multifaceted activities supported by the SL4 sequence suggest a mechanism by which Gag could actively participate in regulating the vital func- tions mediated by 5-UTR. Substantiated by the 3D structure of -RNA, the central role played by SL4 in specific RNA-RNA and protein-RNA interactions advances this domain as a primary target for possible therapeutic intervention. GNRA loop-receptor interaction high-resolution mass spectrometry HIV-1 Psi-RNA modeling structural probing T he multifaceted activities attributed to the packaging signal (-RNA) (1, 2) of HIV-1 are mediated by discrete stemloop domains that exercise individual functions, but can also operate in concert to complete complex tasks [supporting information (SI) Fig. S1] (3–5). Stemloop 1 (SL1) is primarily responsible for genome dimerization (6, 7), but its mutations induce significant reductions of RNA encapsidation (8). Stemloop 2 (SL2) contains the major splice donor (SD) (9), but its high affinity for the nucleocapsid (NC) domain of the Gag polyprotein implies its participation in genome recognition and packaging (10, 11). Stemloop 3 (SL3) is sufficient by itself to induce the packaging of heterologous RNA into virus-like particles (12), but its deletion does not completely eliminate encapsidation that is still sustained with suboptimal efficiency by the remaining stemloops (13). Located downstream of the gag’s starting codon, stemloop 4 (SL4) may be engaged in both coding and non-coding activities, as suggested by its possible involvement in long-range pairing interactions (14) and its ability to bind NC in vitro (11, 15). Despite a wealth of data showing extensive communication among domains, the bases for their functional relationships are not substantiated by comprehensive structural information, which is still missing because of the size and flexibility of full-length -RNA. A promising alternative for elucidating the 3D structure of biomolecules with no intrinsic limitations of size and crystalli- zation properties is represented by the combination of footprint- ing and cross-linking techniques with mass spectrometric detec- tion, collectively known as MS3D (16). We have recently developed MS3D approaches for RNA and its functional as- semblies, which involve the application of structural probes under conditions that preserve the native folding of the target substrate (17). The identity and sequence position of ensuing covalent adducts are determined by nuclease digestion, mass mapping, and tandem sequencing (17–19), which enable the utilization of a broader probe selection than allowed by tradi- tional electrophoresis-based methods (20). The experimental constraints obtained from structural probing are combined with fundamental information pertaining RNA structure to generate all-atom 3D models of target substrates. The validity of this approach was confirmed by the excellent agreement between the NMR and MS3D structures of the mouse mammary tumor virus pseudoknot completed in earlier work (21). The MS3D approach was implemented here to complete the structural determination of full-length -RNA. Any ambiguities arising from the simultaneous presence of alternative dimeric conformations were avoided by focusing on a 115-nt construct corresponding to the 240 –354 sequence of HIV-1 subtype B (Lai variant), in which G259 was replaced with A to disrupt the palindrome responsible for initiating RNA dimerization (DIS, Fig. S1) (6, 7). Additional mutants were designed ad hoc to validate salient features that emerged from initial probing experiments. Reproducing the sequential nature of the RNA folding process, full-length -RNA was modeled from the high-resolution coordinates of separate domains available in the Protein Data Bank (PDB), which were assembled according to the experimental constraints from structural probes. The new insights provided by this all-atom model are discussed in the context of the known functional communications among do- mains. A greater understanding of the structural basis for these relationships could provide the key for developing novel inhib- itors aimed at disrupting the crucial processes of genome rec- ognition, dimerization, and packaging in HIV-1. Results Structural Probing. The secondary structure of full-length -RNA was probed by monofunctional agents that alkylate the base- pairing faces of nucleotides exposed to the medium. Kethoxal (KT) was applied to probe N1 and N2 of unpaired guanines, Author contributions: E.T.Y. and D.F. designed research; E.T.Y. and A.H. performed re- search; E.T.Y., A.H., J.E., and D.F. analyzed data; and D.F. wrote the paper. The authors declare no conflict of interest. This article is a PNAS Direct Submission. *Present address: Sandia National Laboratories, Livermore, CA 94551. E.T.Y. and A.H. contributed equally to this work. To whom correspondence should be addressed. E-mail: [email protected]. This article contains supporting information online at www.pnas.org/cgi/content/full/ 0800509105/DCSupplemental. © 2008 by The National Academy of Sciences of the USA 12248 –12253 PNAS August 26, 2008 vol. 105 no. 34 www.pnas.orgcgidoi10.1073pnas.0800509105

Upload: dodieu

Post on 09-Apr-2018

218 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: MS3D structural elucidation of the HIV-1 packaging … structural elucidation of the HIV-1 packaging signal Eizadora T. Yu* †, Arie Hawkins †, Julian Eaton, and Daniele Fabris

MS3D structural elucidation of the HIV-1packaging signalEizadora T. Yu*†, Arie Hawkins†, Julian Eaton, and Daniele Fabris‡

Department of Chemistry and Biochemistry, University of Maryland Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250

Edited by Roger D. Kornberg, Stanford University School of Medicine, Stanford, CA, and approved July 1, 2008 (received for review January 17, 2008)

The structure of HIV-1 �-RNA has been elucidated by a concertedapproach combining structural probes with mass spectrometricdetection (MS3D), which is not affected by the size and crystalli-zation properties of target biomolecules. Distance constraints frombifunctional cross-linkers provided the information required forassembling an all-atom model from the high-resolution coordi-nates of separate domains by triangulating their reciprocal place-ment in 3D space. The resulting structure revealed a compactcloverleaf morphology stabilized by a long-range tertiary interac-tion between the GNRA tetraloop of stemloop 4 (SL4) and theupper stem of stemloop 1 (SL1). The preservation of discretestemloop structures ruled out the possibility that major rearrange-ments might produce a putative supersite with enhanced affinityfor the nucleocapsid (NC) domain of the viral Gag polyprotein,which would drive genome recognition and packaging. The stericsituation of single-stranded regions exposed on the cloverleafstructure offered a valid explanation for the stoichiometry exhib-ited by full-length �-RNA in the presence of NC. The participationof SL4 in a putative GNRA loop-receptor interaction providedfurther indications of the plasticity of this region of genomic RNA,which can also anneal with upstream sequences to stabilize alter-native conformations of the 5� untranslated region (5�-UTR). Con-sidering the ability to sustain specific NC binding, the multifacetedactivities supported by the SL4 sequence suggest a mechanism bywhich Gag could actively participate in regulating the vital func-tions mediated by 5�-UTR. Substantiated by the 3D structure of�-RNA, the central role played by SL4 in specific RNA-RNA andprotein-RNA interactions advances this domain as a primary targetfor possible therapeutic intervention.

GNRA loop-receptor interaction � high-resolution mass spectrometry �HIV-1 Psi-RNA � modeling � structural probing

The multifaceted activities attributed to the packaging signal(�-RNA) (1, 2) of HIV-1 are mediated by discrete stemloop

domains that exercise individual functions, but can also operatein concert to complete complex tasks [supporting information(SI) Fig. S1] (3–5). Stemloop 1 (SL1) is primarily responsible forgenome dimerization (6, 7), but its mutations induce significantreductions of RNA encapsidation (8). Stemloop 2 (SL2) containsthe major splice donor (SD) (9), but its high affinity for thenucleocapsid (NC) domain of the Gag polyprotein implies itsparticipation in genome recognition and packaging (10, 11).Stemloop 3 (SL3) is sufficient by itself to induce the packagingof heterologous RNA into virus-like particles (12), but itsdeletion does not completely eliminate encapsidation that is stillsustained with suboptimal efficiency by the remaining stemloops(13). Located downstream of the gag’s starting codon, stemloop4 (SL4) may be engaged in both coding and non-coding activities,as suggested by its possible involvement in long-range pairinginteractions (14) and its ability to bind NC in vitro (11, 15).Despite a wealth of data showing extensive communicationamong domains, the bases for their functional relationships arenot substantiated by comprehensive structural information,which is still missing because of the size and flexibility offull-length �-RNA.

A promising alternative for elucidating the 3D structure ofbiomolecules with no intrinsic limitations of size and crystalli-zation properties is represented by the combination of footprint-ing and cross-linking techniques with mass spectrometric detec-tion, collectively known as MS3D (16). We have recentlydeveloped MS3D approaches for RNA and its functional as-semblies, which involve the application of structural probesunder conditions that preserve the native folding of the targetsubstrate (17). The identity and sequence position of ensuingcovalent adducts are determined by nuclease digestion, massmapping, and tandem sequencing (17–19), which enable theutilization of a broader probe selection than allowed by tradi-tional electrophoresis-based methods (20). The experimentalconstraints obtained from structural probing are combined withfundamental information pertaining RNA structure to generateall-atom 3D models of target substrates. The validity of thisapproach was confirmed by the excellent agreement between theNMR and MS3D structures of the mouse mammary tumor viruspseudoknot completed in earlier work (21).

The MS3D approach was implemented here to complete thestructural determination of full-length �-RNA. Any ambiguitiesarising from the simultaneous presence of alternative dimericconformations were avoided by focusing on a 115-nt constructcorresponding to the 240–354 sequence of HIV-1 subtype B (Laivariant), in which G259 was replaced with A to disrupt thepalindrome responsible for initiating RNA dimerization (DIS,Fig. S1) (6, 7). Additional mutants were designed ad hoc tovalidate salient features that emerged from initial probingexperiments. Reproducing the sequential nature of the RNAfolding process, full-length �-RNA was modeled from thehigh-resolution coordinates of separate domains available in theProtein Data Bank (PDB), which were assembled according tothe experimental constraints from structural probes. The newinsights provided by this all-atom model are discussed in thecontext of the known functional communications among do-mains. A greater understanding of the structural basis for theserelationships could provide the key for developing novel inhib-itors aimed at disrupting the crucial processes of genome rec-ognition, dimerization, and packaging in HIV-1.

ResultsStructural Probing. The secondary structure of full-length �-RNAwas probed by monofunctional agents that alkylate the base-pairing faces of nucleotides exposed to the medium. Kethoxal(KT) was applied to probe N1 and N2 of unpaired guanines,

Author contributions: E.T.Y. and D.F. designed research; E.T.Y. and A.H. performed re-search; E.T.Y., A.H., J.E., and D.F. analyzed data; and D.F. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

*Present address: Sandia National Laboratories, Livermore, CA 94551.

†E.T.Y. and A.H. contributed equally to this work.

‡To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/cgi/content/full/0800509105/DCSupplemental.

© 2008 by The National Academy of Sciences of the USA

12248–12253 � PNAS � August 26, 2008 � vol. 105 � no. 34 www.pnas.org�cgi�doi�10.1073�pnas.0800509105

Page 2: MS3D structural elucidation of the HIV-1 packaging … structural elucidation of the HIV-1 packaging signal Eizadora T. Yu* †, Arie Hawkins †, Julian Eaton, and Daniele Fabris

1-cyclohexyl-3-(2-morpholinoethyl) carbodiimide (CMCT) wasused for N3-uridine, and dimethyl sulfate (DMS) for N1-adenineand N3-cytidine (17, 18). DMS was also used to assess theparticipation of guanine in helical structures, taking advantageof its ability to methylate N7-guanine in the absence of stackinginteractions (22). Probing reactions completed under a variety ofexperimental conditions were terminated by ethanol precipita-tion to eliminate excess reagent and undesirable salts. Irrevers-ibly modified products were then digested by specific nucleases(e.g., RNase A, T1, etc.) to enable mass mapping and sequencingby electrospray ionization-Fourier transform ion cyclotron res-onance (ESI-FTICR) mass spectrometry (23).

The representative ESI-FTICR mass spectrum in Fig. 1Ashows that footprinting reagents induce unique and readilyrecognizable shifts in the molecular masses of digestion prod-ucts. The relatively low abundance of alkylated adducts ascompared with their unmodified counterparts is the intendedresult of the low probe/substrate ratios necessary to minimizestructure distortion and kinetic traps (17–20). Nucleases withdifferent specificity were used to obtain overlapping maps andenable the correct assignment of alkylated nucleotides. If nec-essary, tandem sequencing was performed to solve any ambigu-ity caused by the possible presence of multiple adducts on thesame digestion product (19). A comprehensive summary ofproducts provided by monofunctional probing is provided inTable S1.

The tertiary structure of �-RNA was probed using bifunc-tional agents capable of cross-linking juxtaposed nucleotides,which provide valuable distance relationships between the con-

jugated moieties. Nitrogen mustard (NM) was used to bridgeN7-guanine and N3-adenine with a relatively flexible spacer of9 � 3 Å, whereas 1,4-phenyl-diglyoxal (PDG) was used tocross-link N1- and N2-guanine with a more rigid aromatic spacerof 7.5 � 0.5 Å (20). In the case of bifunctional reagents, nucleasetreatment provided a wide range of products (Fig. 1B) consistingof two hydrolytic fragments conjugated by the cross-linker,bifunctional adducts in which both conjugated bases werepresent on the same fragment or monofunctional adducts cor-responding to failed cross-links (20). As described earlier for thefootprinting experiments, mass mapping (Fig. S2) and tandemsequencing (Fig. S3) were used as needed to characterize theproducts of probing reactions and identify the sequence positionof conjugated nucleotides. The masses of observed cross-linksand their assignments are summarized in Table S2.

Modification Maps. The results provided by structural probes wererationalized using two-dimensional plots, in which the positionof modified nucleotides were marked on the consensus second-ary structure of �-RNA (4, 24). The color-coded symbols in Fig.2 identify bases modified by monofunctional footprinting agents,whereas dashed lines connect representative cross-linked prod-ucts. The filling offers a pictorial representation of the degree ofmodification observed at each position under a variety ofprobing conditions (17, 21), which covered different Mg2�

concentrations, probe/substrate ratios, reaction temperatures,and incubation intervals (see Materials and Methods). In this way,nucleotides that are either consistently exposed or protected canbe readily differentiated from those affected by dynamic pro-

~ ~~ ~

500 1000 1500 m/z

R.I.

A32

4:U

337*

A32

4:U

337*

+ K

T

G29

2:U

295

A32

4:U

337*

AG

C*/

GA

C*+

KT

G25

4:C

258*

G25

4:C

258*

+ K

TU

UU

1240A

324:

U33

7*+

KT

A32

4:U

337*

+ 2

KT

A26

8:C

274

+ K

T

G27

5:C

281

+ K

TA

~

G27

5-C

281*

R.I.

m/z500 1000 1500

A32

4:U

337*

G31

7:C

322

A32

4:U

337*

G25

4:C

258*

G27

5:C

281

A26

8:C

274*

GG

C

UU

GC

/GU

A30

1:U

306*

G34

4:C

349*

BG

292:

U29

5+N

MO

H

G25

4:C

258*

+N

MO

H

A324:U337*+NMOH

G34

2:C

349*

+N

MA

324:

U33

7*+

NM

G31

7:C

322*

+ 2

NM

OH

A30

1:U

307*

A32

4:U

337*

-G

G246:U249

G344:C349*

G246:C248*

A268:C274*

G292:U295

G317:C322

1120 1140

Fig. 1. Representative ESI-FTICR mass spectra of RNase A digestion products obtained from full-length �-RNA probed with kethoxal (KT) (A) and nitrogenmustard (NM) (B). Hydrolysis products are identified by their 5� and 3� nucleotides separated by a colon. All digestion products included a 2�,3�-cyclic phosphateend, except for those marked by an asterisk, which included a linear 3�-phosphate. A bar connects cross-linked hydrolysis products. (A) Comparing the intensityof A324:U337 (reduced by half to fit on scale) and A324:U337 � KT (enlarged 2.5 times in Inset) gives an idea of typical alkylation yields achieved in probingreactions, which were kept as low as possible to minimize the risk of structure distortion (17,18). (B) The Inset allows one to examine the yields achieved bybifunctional probes.

Yu et al. PNAS � August 26, 2008 � vol. 105 � no. 34 � 12249

BIO

PHYS

ICS

Page 3: MS3D structural elucidation of the HIV-1 packaging … structural elucidation of the HIV-1 packaging signal Eizadora T. Yu* †, Arie Hawkins †, Julian Eaton, and Daniele Fabris

cesses (e.g., transient pair dissociation, or ‘‘breathing’’). Thisnotation also allows for a direct comparison of the susceptibilityof different targets to the same probe, which may be ascribed todifferent structural contexts. It is important to note that, unlikefor pseudoknot substrates investigated in ref. 21, no significantvariations in the modification patterns were observed at differ-ent Mg2� concentration. In those structures, local changesdetected at 50 �M Mg2� were ascribed to protection effectsinduced by specific metal coordination by high-affinity sites (21).No further changes were observed at 100 �M or higher, whichcould be attributed to the ‘‘diffuse’’ ions attracted near the RNAsurface by its electrostatic field (25). No major changes in themodification patterns were detected for �-RNA within the samerange of concentrations, which suggests that the effects of Mg2�

on this structure are not as significant.In the SL1 domain, guanines located in the bulge were highly

susceptible to DMS methylation but were modified by CMCTwith only half the yield exhibited by those in the apical loop. TheN7-position of guanines in canonical A and B helixes is effec-tively protected by base stacking (17, 22); therefore, increasedDMS reactivity is consistent with disruption of regular stackingbetween contiguous nucleobases in the bulge region. Reducedsusceptibility to CMCT can instead be ascribed to a directparticipation in Watson–Crick interactions or to a partiallyobstructed access to its base-pairing face. Consistent with theseresults, high-resolution structures obtained from isolated SL1have shown that G247 and A271 form a noncanonical base pairprone to local breathing, whereas G272 and G273 can adoptalternative conformations pointing in or out of the bulge, whichmay leave them partially exposed to alkylation (26, 27).

Despite their double-stranded context, G265 in the SL1 upperstem and G340 and G350 in SL4 were effectively modified byDMS (Fig. 2). A common characteristic of these nucleotides istheir participation in noncanonic GU pairs (or GU wobbles),which force the local helical structure to assume a less compactconformation with reduced stacking stabilization. Therefore, itwas not surprising to detect methylation of G339 and G342,which flank the tandem GU wobbles of SL4 and are thus affectedby helix distortion. In the triple-base platform of SL2, A296 was

effectively modified because its N1 position remains accessibleto the relatively small DMS probe (17). Finally, lower alkylationlevels were observed for nucleotides involved in pairs at the endof helical regions, which are left unprotected by the ending of thestacked pattern or are transiently exposed by base pair breathing.

Modifications provided by bifunctional probes consisted ofintra- and interdomain conjugated products (Fig. 2). The formerare usually observed between nucleotides exposed on single-stranded regions of the same stemloop or linker, which aresituated within the span of the selected probe. Intradomainconjugates are necessary to corroborate the secondary structuresidentified by monofunctional probes and are valuable in estab-lishing the pairing alignment between complementary regions.This characteristic is particularly important when the RNAsequence of interest is capable of folding into alternative struc-tures stabilized by minor register shifts in base pairing. Incontrast, interdomain cross-links are obtained between nucleo-tides located in different secondary structures, which are not inclose proximity on a 2D map but are juxtaposed by the substratefold. The information afforded by this type of cross-link is usedto constrain the reciprocal positions of entire domains in threedimensions (21). Additionally, these conjugates can lead to thepositive identification of long-range tertiary interactions that arenot sustained by canonical base-pairing and consequently are notdetectable by classic monofunctional probes.

Among intradomain cross-links, products bridging A259 andG261, A259-A263, and G247–G272 correctly define the struc-ture of the apical and internal loops of SL1 (Fig. 2 and Fig. S2).In particular, the first two are consistent with a full-f ledgedstemloop and rule out the formation of any extended duplexconformation that would contradict the inability of this mutantto form stable dimers (28). More significantly, salient interdo-main cross-links were observed between the loops of contiguousstemloops, defining unambiguously their reciprocal placement in3D space (Fig. 2 and Fig. S4). For example, G292–A319 connectsthe loops of SL2 and SL3, whereas G265–G292 links SL1 andSL2 (Fig. S2 A and B, respectively). Other important cross-linksbridge the loop and upper stem of SL4 with the bulge of SL1(e.g., G273–G342 and G247–A345) (Fig. S2C). Taken together,the observed bifunctional products provided a network of spatialrelationships, which enabled triangulating the positions of thestemloops in the global fold of �-RNA.

MS3D Modeling. The results provided by �-RNA were comparedwith those obtained by probing its isolated domains underidentical experimental conditions. The vast majority of thedescribed monofunctional products and intradomain cross-linkswere obtained also from the separate RNA constructs with onlysmall yield variations, whereas none of the interdomain conju-gates was detected. The excellent agreement between domain-specific modifications is consistent with the preservation of theRNA secondary structure determined by well defined pairingwithin each stemloop, which was not significantly affected by theglobal fold of the full-length construct. This observationprompted a modeling approach that resembled very closely thehierarchic process prospected for RNA folding (29), in whichelements of secondary structure are initially formed by theannealing of complementary sequences within the same domain,whereas tertiary interactions are subsequently established be-tween discrete domains mediated by hydrogen bonding andelectrostatic interactions. Reproducing this sequential progres-sion, the high-resolution coordinates of SL1, SL2, SL3, and SL4(PDB entries 2D1B, 1ESY, 1BN0, and 1JTW) (30–33) weremerged with linker structures created by MC-SYM (34). Initialrounds of simulated annealing and energy minimization wereperformed using CNS (35), restrained only by base pair hydrogenbonding, base pair planarity, and standard backbone torsionangles. In subsequent rounds, distance constraints provided by

SL3

AAGGAGAGAGAUG

GG

••

AAAAAUUUUGAC GU A

UA

C GCG

GG

GA

GGGGC CG C

G CGC

A AUCGUAG

G GU

GG-3’

AG

G C

G C

G UGU

GCGA

5’-GGA

G

C GU AC G

CG

GAG

G

CU

AUG

CU

G

AC

CAA A

CGC

CG

A

250

260

270

320

DMSCMCTKT

350

SL1 SL2SL4

Modeling (M)Validation (V)2ary struct. conf. (I)

290

300

340

Fig. 2. Map summarizing the results obtained by probing full-length �-RNAwith mono- and bifunctional reagents. Reported are the sites of modificationby DMS (red triangles), CMCT (blue circles), KT (green squares), and nucleo-tides bridged by NM (dashed lines). Filling gives a representation of themodification degree. When multiple adducts were detected with differentdistributions on the same digestion product, the position of modified nucle-otides was indicated by referring to the entire hydrolysis fragment, using its5� and 3� ends separated by a colon. For clarity, only a handful of represen-tative cross-links are included here as green, red, or blue lines according towhether they were used for modeling, validation, or secondary structureconfirmation (see Table S2). A complete summary of bifunctional cross-linkersgrouped according to usage is provided in Fig. S4.

12250 � www.pnas.org�cgi�doi�10.1073�pnas.0800509105 Yu et al.

Page 4: MS3D structural elucidation of the HIV-1 packaging … structural elucidation of the HIV-1 packaging signal Eizadora T. Yu* †, Arie Hawkins †, Julian Eaton, and Daniele Fabris

high-scoring bifunctional cross-links (M in Table S2 and shownin Fig. S4 A) were applied to define the domains’ spatial ar-rangement. Intradomain cross-links (I in Fig. S4B) were mainlyused to verify that the structures of discrete domains wereretained in the global fold of �-RNA, but were not included inmodel building. After structural strain was relieved by furtherminimization, the resulting model possessed the atomic-leveldetail contained in the initial high-resolution structures and thetertiary topology determined by the cross-linking data (Fig. 3).

Validation of the MS3D structure was obtained by using asubset of intra- and interdomain cross-links, which had beenpurposely excluded from the modeling procedure (V in Table S2and shown in Fig. S4C). The distances between cross-linkedpositions in the final model matched very closely the rangeallowed by the respective probes. A handful of cross-linksinvolving flexible single-stranded regions (F in Fig. S4D), whichhad been used for simulated annealing, fell slightly out of rangeafter the final step of energy minimization was performedwithout explicit cross-linking constraints to relieve model strain.However, no steric clashes or unfavorable geometric situationswere noted, which would contradict the actual experimentalobservation of these conjugates, thus indicating that the modeloffered a valid representation of the structural context encoun-tered in solution by the probes. When the coordinates ofdouble-stranded nucleotides were compared with the start-ing high-resolution coordinates used as input, an overall �2.90Å rmsd was obtained from all backbone atoms excluding hydro-gens, which should be ascribed to the flexible nature of the probespacers and to the need for minimizing any strain induced by theglobal fold onto the initial secondary structures.

After an initial model was obtained, further experiments wereperformed to test salient features revealed by its structure.Because of the numerous bifunctional products bridging thetetraloop and wobble regions of SL4 with the bulge and upperstem of SL1 (Fig. 2 and Table S2), the modeling process placedthese domains in very close contiguity of each other, raising thepossibility that they may be involved in a long-range tertiarycontact. In particular, these constraints could be simultaneouslysatisfied only by locating the tetraloop in direct contact with theregion immediately above the bulge. The fact that isolated SL4presents a classic GNRA tetraloop structure (15, 33) suggestedthat such contact may consist of a GNRA loop-receptor (L/R)interaction stabilized by noncanonical hydrogen bonding be-

tween its loop nucleobases and the minor groove of the SL1upper stem (36–38). The combination of a helical region fol-lowed by an asymmetric loop is very common in receptorsobserved in nature (36), or selected in vitro (37, 38). For thesereasons, we investigated the L/R hypothesis by generating amutant in which A345 was replaced by C to maintain the stabilityof the SL4 structure (15, 33) while varying the second nucleotideof the tetraloop, which determines receptor recognition (37, 38).Submitted to monofunctional probing, the mutant exhibited nodetectable variations of footprinting pattern (data not shown),which confirmed the preservation of the tetraloop fold. Incontrast, the mutation induced the complete elimination ofinterdomain cross-links between SL4 and SL1, providing exper-imental evidence for the prospected long-range interaction.

In the absence of direct experimental information on themolecular contacts between nucleobases, the L/R interactionwas modeled according to structural features exhibited by knownloop-receptor motifs, while ensuring that domain placementsatisfied the cross-linking constraints. Although the structure ofGAGA-specific receptors has not been described, in vitro selec-tion methods demonstrated that this type of tetraloop can bindwith excellent affinity to the broad-spectrum class II receptors(38). In �-RNA, the residual susceptibility to footprinting agentsmanifested by L/R structures and the absence of detectableMg2� effects are consistent with characteristics exhibited by thisclass of receptors, which remain solvent accessible even in boundform and are not magnesium-dependent (38). For these reasons,their structure was used as template for homology-modeling the�-RNA interaction. Matching the consensus sequence at thisposition, the final model displays A347 making hydrogen bondsin the minor groove of the C248-G270 base pair (Fig. 4). G346is hydrogen-bonded with the noncanonical G247-A271 base pairthat replaces the consensus A-U mismatch, with which it sharesa shallower narrow groove (38). This arrangement provides twocontiguous base-triplets stabilized by stacking interactions. Inaddition, A345 forms specific hydrogen bonds with G272 andstacks on G273 protruding out of the bulge motif. This bulgenucleotide is capable of assuming different conformations inisolated SL1 (26, 27, 39) but can be locked in the extrahelicalposition by stacking with A345, thus contributing to stabilize theL/R interaction. The mutation permissivity described for class IIreceptors and the stringent requirements driving their selectionto maximize loop-binding in vitro (38) can account for local

A B

SL1

SL4

5’

3’

SL3

SL2

SL1

SL4

SL3

SL2

5’3’

G259A

A345C

A345C

A296

Fig. 3. Ribbon diagram of full-length �-RNA viewed from the side (A) and top (B). Red, SL1; green, SL2; blue, SL3; yellow, SL4. Linker sequences betweenadjoining stemloops are orange. The relatively compact cloverleaf conformation is stabilized by a long-range tertiary interaction between the apical loop of SL4and the upper stem region of SL1.

Yu et al. PNAS � August 26, 2008 � vol. 105 � no. 34 � 12251

BIO

PHYS

ICS

Page 5: MS3D structural elucidation of the HIV-1 packaging … structural elucidation of the HIV-1 packaging signal Eizadora T. Yu* †, Arie Hawkins †, Julian Eaton, and Daniele Fabris

differences with the �-RNA structure, which has likely evolvedto retain significant dynamic properties. In this direction, itshould be noted that the nucleotides involved in this L/Rinteraction are nearly 100% conserved in all HIV-1 sequencespublished in the 2006/07 HIV Sequence Compendium (40).Although the proposed model will require confirmation byhigh-resolution techniques, the loss of the long-range contactintroduced by mutating A345 is consistent with the centralposition assumed by this nucleobase in the complex network ofhydrogen bonds and stacking interactions, which stabilizes theGNRA loop-receptor interaction in �-RNA.

DiscussionThe MS3D model of full-length �-RNA presents a compactcloverleaf morphology in which the helical stems of the differentstemloops are situated nearly parallel to each other (Fig. 3). Theapical loops point in the same direction, opposite to the linkersbetween adjoining stems. This arrangement places the majorityof the construct’s single-stranded nucleotides in positions thatare not hindered by juxtaposed structures but are insteadaccessible to footprinting agents and possible ligands. Consistentwith the constraints provided by bifunctional probes, SL2 andSL4 flank SL1 from nearly opposite sides, whereas SL3 is placedcloser to SL4. Significantly, the putative GNRA loop-receptorinteraction between SL4 and SL1 defines the overall cloverleaffold by connecting secondary structures that are coded bynucleotides at the opposite ends of the �-RNA sequence.

The compact nature of this structure provides valuable indi-cations for understanding the modes of interaction between�-RNA and the viral Gag polyprotein during genome recogni-tion and packaging. In particular, the preservation of the indi-vidual stemloops accounts for their ability to act as discretebinding units for the NC domain of Gag, which was substantiatedby the identification of six specific binding sites on the apicalloops, the SL1 bulge, and the linker between SL3 and SL4 (11,41). The distribution of such sites on the different domains andthe observation that NC can bind them in vitro with comparableaffinities rule out the possible formation of an alleged supersiteof enhanced affinity, which might be produced only in thecontext of a full-length construct by extensive remodeling of itssecondary structure. The very close spatial proximity of thesesites could explain the lack of complete packaging suppressionafter deletion of individual stemloops (13), which could bepartially compensated by the intrinsic ability of Gag to dimerizethrough protein–protein interactions (42).

The participation of SL4 in a specific GNRA loop-receptorinteraction further expands the broad range of activities per-formed by this region of genomic RNA in vitro and in vivo. In

addition to spanning the start of the gag gene, this sequence hasbeen shown capable of annealing with different upstream se-quences to stabilize alternative conformations of the 5�-untranslated region (5�-UTR) of viral RNA, which have beenproposed to mediate distinct biological activities (14). Althoughprobing of 5�-UTR in virions and infected cells has failed toestablish the presence of either a full-f ledged SL4 stemloop, oralternative RNA conformers (24), these conclusions were basedsolely on the monofunctional agent DMS, which can only revealthe presence of footprinting protection but cannot identify itssource. Considering the ability of NC to bind the SL4 tetraloop(11, 41) in possible competition with its SL1 interaction, even thetransient folding of a full-f ledged stemloop could provide amechanism for Gag to affect the 3D structure of 5�-UTR, eitherby dissociating the GNRA loop-receptor interaction, or bychaperoning the transition between alternative conformers. Ifconfirmed in vivo, the central role suggested by these multifac-eted activities would make the SL4 sequence a primary target forpossible therapeutic intervention.

Finally, the MS3D elucidation of �-RNA demonstrates thefeasibility of a general divide-and-conquer strategy for thestructural determination of progressively larger multisubunitassemblies that are not directly amenable to established tech-niques. According to such strategy, high-resolution coordinatescould be obtained for well behaved parts of the target structureby NMR or crystallography, and structural probing would pro-vide the constraints necessary to assemble the separate sectionsinto a complete 3D model. It is important to emphasize that,once a probing reaction is completed without disruption of thenative fold, the desired structural information is stored into apattern of irreversible chemical modifications, which is notaffected by subsequent hydrolytic procedures designed to handlesubstrates of virtually any size. The broad applicability of MS-based platforms to substrates of diverse biochemical nature andthe development of software tools for aiding the direct utilizationof these types of constraints are expected to propel MS3D to theforefront of the strategies used for elucidating previously intrac-table biological substrates.

Materials and MethodsRNA Samples. RNA constructs were prepared by in vitro transcription ofappropriate DNA templates, using the phage T7 polymerase reaction. Atemplate spanning the G240–G354 sequence of HIV-1 subtype B (Lai isolate)was prepared from the pNL4-3 plasmid provided by the National Institutes ofHealth AIDS Reagents Repository (Germantown, MD). The described muta-tions were introduced using the QuikChange II kit (Stratagene). TranscribedRNAs were purified by denaturing gel electrophoresis performed on 6%(wt/vol) polyacrylamide gels and recovered by electroelution from manuallyexcised bands. Desalting was performed by ultrafiltration using Centricondevices from Millipore (11, 21).

Structural Probing. Dimethyl sulfate (DMS), 1-cyclohexyl-3-(2-morpholinoeth-yl)carbodiimide metho-p-toluenesulfonate (CMCT), and bis (2-chloroethyl)methylamine [nitrogen mustard (NM)] were purchased from Sigma–Aldrichand used without further purification. Kethoxal (KT) was purchased from ICN.1,4-Phenyl-diglyoxal (PDG) was synthesized in house and purified according toprocedures described in ref. 20. Due to their known or suspected mutagenicactivity, these reagents were treated using all possible precautions to avoidinhalation and direct contact with eyes, mucosae, and skin, including the useof latex gloves, face masks, and fume hoods.

Before probing, each target RNA was reannealed by incubating for fiveminutes at 90°C and slow-cooling to room temperature. Typical probingbuffers consisted of either 50 mM (NH4)2BO3 for KT and PDG or 50 mM NH4OAcfor the remaining agents, with pH adjusted to the optimum value for eachprobe (17–19). In selected experiments, 50 to 100 �M MgCl2 was added todetermine the effect of the divalent metal on probing patterns. It should benoted that, although the [Mg2�] in viral particles is still debated, a value ofonly 240 �M has been reported for the intracellular compartment of humanlymphocytes (43). Probe to RNA ratios were experimentally optimized for eachspecific reaction to avoid saturating susceptible nucleotides, which mightresult in structure distortion and secondary alkylation. Typical reactions in-

GCA

AG

GC GG A

5’

U A

G CC G

G272

G273

SL1

5’

SL4

247 344

A B

G273

A347

G247

SL1

SL4

5’

3’

5’

5’3’

3’

Fig. 4. GNRA loop-receptor interaction in �-RNAs. (A) diagram showing theplacement of the SL1 and SL4 domains in the proposed interaction. (B)All-atom model detailing the specific molecular contacts. The model wasobtained by using class II receptors as template (38).

12252 � www.pnas.org�cgi�doi�10.1073�pnas.0800509105 Yu et al.

Page 6: MS3D structural elucidation of the HIV-1 packaging … structural elucidation of the HIV-1 packaging signal Eizadora T. Yu* †, Arie Hawkins †, Julian Eaton, and Daniele Fabris

volved adding an �10-fold excess probe to 25 �M RNA in a total volume of 20�l. Typical reactions were carried out at room temperature for 20 min andquenched by ethanol precipitation in the presence of 1.5 M NH4OAc. ProbedRNA was reconstituted in RNase-free water to prepare �10 �M aliquots fordigestion with RNase A and RNase T1. Both enzymes were purchased fromSigma–Aldrich and used as received. A total of 0.05 units of RNase was addedper mg of RNA substrate; each digestion was carried out in 20 mM NH4OAc (pH7.5) at 37°C for 1–2 h to drive the hydrolysis to near completion.

Mass Spectrometry and Data Analysis. Analyte solutions containing a finalconcentration of 5–10 �M RNA in electrospray buffer (50:50 10 mM ammo-nium acetate:iso-propanol) were prepared by appropriate dilution of theoriginal digestion mixtures. All determinations were performed on a BrukerApex III Fourier transform ion cyclotron resonance (FTICR) mass spectrometerequipped with a 7T shielded superconductive magnet and a thermally-assistedApollo electrospray ion source (17–19). Each experiment required loading �5�l of analyte solution into a nanospray needle with a stainless steel wireinserted from the back end to provide the necessary voltage. Tandem massspectrometry was completed as described earlier (19). Data reduction andanalysis were performed with the aid of MS2Links (19) and the Mongo OligoMass Calculator version 2.05 [made available by J. A. McCloskey (University ofUtah, Salt Lake City)].

Molecular Modeling. Initial models were generated by combining the coordi-nates of each stemloop (PDB entries 2D1B, IESY, 1BN0, and 1JTW) (30–33),using the merge�structure.inp module of CNS (35). Linker sequences werebuilt de novo using MC-SYM (34). In iterative fashion, the number of cross-

links used for model building was progressively increased each round, follow-ing the ranking order provided in Table S2. The iterations were stopped whenno significant variations were detected between successive models. A handfulof randomly selected cross-links were used for validation purposes. The CNSmodules anneal.inp and minimize.inp were used to submit the model torounds of simulated annealing and Cartesian molecular dynamics restrainedby standard backbone torsion angles, planarity, and hydrogen bonding ofbase pairs, according to the modified CharmM force field used by CNS.Restraint set files normally used for NOE restraints in anneal.inp were used tospecify distances between cross-linked nucleotides. The specific contacts be-tween the GNRA tetraloop and its double-stranded receptor were modeled byhomology with interactions supported by class II receptors (38). To establishthe correct contacts in the minor groove of the SL1 helix, hydrogen bondingwas enforced between nucleotide A347 and base pair C248–G270, betweenG346 and G247–A271, and between A345 and G272, as discussed in Results.The specific hydrogen bonding arrangement was built according to typicalbase-triplet interactions. The final model was energy minimized withoutenforcing the cross-linking constraints to relieve possible strain, which did notresult in disruption of the tetraloop docking into the receptor site. Modelswere viewed using PyMOL (http://pymol.sourceforge.net), and figures wereprepared using the NUCCYL module (www.mssm.edu/students/jovinl02/research/nuccyl.html) on a Linux computer.

ACKNOWLEDGMENTS. We thank Dr. M. F. Summers (University of MarylandBaltimore County) for helpful discussions. This work was supported by Na-tional Institutes of Health Grant GM643208 and National Science FoundationGrant CHE-0439067.

1. Harrison GP, Lever AML (1992) The human immunodeficiency virus type-1 packagingsignal and a major splice donor region have a conserved stable secondary structure.J Virol 66:4144–4153.

2. Baudin F, et al. (1993) Functional sites in the 5� region of human immunodeficiencyvirus type 1 RNA form defined structural domains. J Mol Biol 229:382–397.

3. Luban J, Goff S (1994) Mutational analysis of cis-acting packaging signals in humanimmunodeficiency virus type 1 RNA. J Virol 68:3784–3793.

4. Clever J, Sassetti C, Parslow TG (1995) RNA secondary structure and binding sites for gaggene products in the 5� packaging signal of human immunodeficiency virus type 1.J Virol 69:2101–2109.

5. Berkhout B (1996) Structure and function of the human immunodeficiency virus leaderRNA. Prog Nucleic Acids Res Mol Biol 54:1–34.

6. Laughrea M, Jette, L (1994) A 19-nucleotide sequence upstream of the 5� major splicedonor is part of the dimerization domain of human immunodeficiency virus 1 genomicRNA. Biochemistry 33:13464–13474.

7. Skripkin E, Paillart JC, Marquet R, Ehresmann B, Ehresmann C (1994) Identification ofthe primary site of the human immunodeficiency virus type 1 RNA dimerization in vitro.Proc Natl Acad Sci USA 91:4945–4949.

8. Berkhout B, van Wamel JL (1996) Role of the DIS hairpin in replication of humanimmunodeficiency virus type 1. J Virol 70:6723–6732.

9. Purcell DF, Martin MA (1993) Alternative splicing of human immunodeficiency virustype 1 mRNA modulates viral protein expression, replication, and infectivity. J Virol67:6365–6378.

10. Amarasinghe GK, et al. (2000) NMR structure of the HIV-1 nucleocapsid protein boundto stem-loop SL2 of the �-RNA packaging signal. Implications for genome recognition.J Mol Biol 301:491–511.

11. Hagan N, Fabris D (2003) Direct mass spectrometric determination of the stoichiometryand binding affinity of the complexes between HIV-1 nucleocapsid protein and RNAstem-loops hairpins of the HIV-1 �-recognition element. Biochemistry 42:10736–10745.

12. Hayashi T, Shioda T, Iwakura Y, Shibuta H (1992) RNA packaging signal of humanimmunodeficiency virus type 1. Virology 188:590–599.

13. Clever J, Parslow TG (1997) Mutant human immunodeficiency virus type I genomeswith defects in RNA dimerization or encapsidation. J Virol 71:3407–3414.

14. Huthoff H, Berkhout B (2001) Two alternating structures of the HIV-1 leader RNA. RNA7:143–157.

15. Amarasinghe GK, et al. (2001) Stem-loop 4 of the HIV-1 �-RNA packaging signalexhibits weak affinity for the nucleocapsid protein. Structural studies and implicationsfor genome recognition. J Mol Biol 314:961–970.

16. Young MM, et al. (2000) High throughput protein fold identification by using exper-imental constraints derived from intramolecular cross-links and mass spectrometry.Proc Natl Acad Sci USA 97:5802–5806.

17. Yu E, Fabris D (2003) Direct probing of RNA structures and RNA-protein interactions inthe HIV-1 packaging signal by chemical modification and electrospray ionizationFourier transform mass spectrometry. J Mol Biol 330:211–223.

18. Yu ET, Fabris D (2004) Toward multiplexing the application of solvent accessibilityprobes for the investigation of RNA three-dimensional structures by electrosprayionization—Fourier transform mass spectrometry. Anal Biochem 344:356–366.

19. Kellersberger KA, Yu E, Kruppa GH, Young MM, Fabris D (2004) Top-down character-ization of nucleic acids modified by structural probes using high-resolution tandemmass spectrometry and automated data interpretation. Anal Chem 76:2438–2445.

20. Zhang Q, Yu ET, Kellersberger KA, Crosland E, Fabris D (2006) Toward building adatabase of bifunctional probes for the MS3D investigation of nucleic acids structures.J Am Soc Mass Spectrom 17:1570–1581.

21. Yu ET, Zhang Q, Fabris D (2005) Untying the FIV frameshifting pseudoknot structure byMS3D. J Mol Biol 345:69–80.

22. Peattie DA, Gilbert W (1980) Chemical probes for higher-order structure in RNA. ProcNatl Acad Sci USA 77:4679–4682.

23. HendricksonCL,EmmettMR,MarshallAG(1999)Electrospray ionizationFouriertransformion cyclotron resonance mass spectrometry. Annu Rev Phys Chem 50:517–536.

24. Paillart JC, et al. (2004) First snapshots of the HIV-1 RNA structure in infected cells andin virions. J Biol Chem 279:48397–48403.

25. Draper DE (2004) A guide to ions and RNA structure. RNA 10:335–343.26. Yuan Y, Kerwood DJ, Paoletti AC, Shubsda MF, Borer PN (2003) Stem of SL1 RNA in

HIV-1: Structure and nucleocapsid protein binding for a 1 � 3 internal loop. Biochem-istry 42:5259–5269.

27. Lawrence DC, Stover CC, Noznitsky J, Wu Z, Summers MF (2003) Structure of the intactstem and bulge of HIV-1 Psi-RNA stem-loop SL1. J Mol Biol 326:529–542.

28. Turner KB, Hagan NA, Fabris D (2007) Understanding the isomerization of the HIV-1dimerization initiation domain by the nucleocapsid protein. J Mol Biol 369:812–828.

29. Tinoco I, Bustamante C (1999) How RNA folds. J Mol Biol 293:271–281.30. Baba S, et al. (2005) Solution RNA Structures of the HIV-1 Dimerization Initiation Site

in the Kissing-Loop and Extended-Duplex Dimers. J Biochem (Tokyo) 138:583–592.31. Amarasinghe GK, De Guzman RN, Turner RB, Summers MF (2000) NMR structure of

stem-loop SL2 of the HIV-1 psi RNA packaging signal reveals a novel A-U-A base-tripleplatform. J Mol Biol 299:145–156.

32. Pappalardo L, Kerwood DJ, Pelczer I, Borer PN (1998) Three-dimensional folding of anRNA hairpin required for packaging HIV-1. J Mol Biol 282:801–818.

33. Kerwood DJ, Cavaluzzi MJ, Borer PN (2001) Structure of SL-4 from the HIV-1 packagingsignal. Biochemistry 40:14518–14529.

34. Major F, Gautheret D, Cedergren R (1993) Reproducing the three-dimensional structure ofa tRNA molecule from structural constraints. Proc Natl Acad Sci USA 90:9408–9412.

35. Brunger AT, et al. (1998). Crystallography and NMR System: A new software suite formacromolecular structure determination. Acta Crystallogr D 54:905–921.

36. Cate JH, et al. (1996) RNA tertiary structure mediation by adenosine platforms. Science273:1696–1699.

37. Costa M, Michel F (1997) Rules for RNA recognition of GNRA tetraloops deduced by invitro selection: Comparison with in vivo evolution. EMBO J 16:3289–3302.

38. Geary C, Baudrey S, Jaeger L (2008). Comprehensive features of natural and in vitroselected GNRA tetraloop-binding receptors. Nucleic Acids Res 36:1138–1152.

39. Greatorex J, Gallego J, Varani G, Lever A (2002) Structure and stability of wild-type andmutant RNA internal loops from the SL-1 domain of the HIV-1 packaging signal. J MolBiol 322:543–557.

40. Leitner T, et al., eds (2007) HIV Sequence Compendium 2006/2007 (Theoretical Biologyand Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM).

41. Fabris D, Chaudhari P, Hagan NA, Turner KB (2007) Functional investigations ofretroviral protein-ribonucleic acid complexes by nanospray Fourier transform ioncyclotron resonance mass spectrometry. Eur J Mass Spectrom 13:29–33.

42. Johnson MC, Scobie HM, Ma YM, Vogt VM (2002) Nucleic acid-independent retrovirusassembly can be driven by dimerization. J Virol 76:11177–11185.

43. Delva P, Pastori C, Degan M, Montesi G, Lechi A (2004) Catecholamine-induced regu-lation in vitro and ex vivo of intralymphocyte ionized magnesium. J Membr Biol199:163–171.

Yu et al. PNAS � August 26, 2008 � vol. 105 � no. 34 � 12253

BIO

PHYS

ICS