MOLECULAR AND CELLULAR BIOLOGY, Nov. 1992, p. 5102-5110
0270-7306/92/115102-09$02.00/0
Copyright © 1992, American Society for Microbiology
Vol. 12, No. 11

Distinct Families of Site-Specific Retrotransposons Occupy Identical Positions in the rRNA Genes of Anopheles gambiae

NORA J. BESANSKY,¹,²* SUSAN M. PASKEWITZ,¹† DIANE MILLS HAMM,¹ AND FRANK H. COLLINS¹,²

Malaria Branch, Division of Parasitic Diseases, National Center for Infectious Diseases, Centers for Disease Control, Atlanta, Georgia 30333,¹ and Department of Biology, Emory University, Atlanta, Georgia 30322²

Received 15 May 1992/Returned for modification 1 July 1992/Accepted 27 August 1992

Two distinct site-specific retrotransposon families, named RT1 and RT2, from the sibling mosquito species Anopheles gambiae and A. arabiensis, respectively, were previously identified. Both were shown to occupy identical nucleotide positions in the 28S rRNA gene and to be flanked by identical 17-bp target site duplications. Full-length representatives of each have been isolated from a single species, A. gambiae, and the nucleotide sequences have been analyzed. Beyond insertion specificity, RT1 and RT2 share several structural and sequence features which show them to be members of the LINE-like, or non-long-terminal-repeat retrotransposon, class of reverse transcriptase-encoding mobile elements. These features include two long overlapping open reading frames (ORFs), poly(A) tails, the absence of long terminal repeats, and heterogeneous 5' truncation of most copies. The first ORF of both elements, particularly ORF1 of RT1, is glutamine rich and contains long tracts of polyglutamine reminiscent of the opa repeat. Near the carboxy ends, three cysteine-histidine motifs occur in ORF1 and one occurs in ORF2. In addition, each ORF2 contains a region of sequence similarity to reverse transcriptases and integrases.
Alignments of the protein sequences from RT1 and RT2 reveal 36% identity over the length of ORF1 and 60% identity over the length of ORF2, but the elements cannot be aligned in the 5' and 3' noncoding regions. Unlike that of RT2, the 5' noncoding region of RT1 contains 3.5 copies of a 500-bp subrepeat, followed by a poly(T) tract and two imperfect 55-bp subrepeats, the second spanning the beginning of ORF1. The pattern of distribution of these elements among five sibling species in the A. gambiae complex is nonuniform. RT1 is present in laboratory and wild A. gambiae, A. arabiensis, and A. melas but has not been detected in A. quadriannulatus or A. merus. RT2 has been detected in all available members of the A. gambiae complex except A. merus. Copy number fluctuates, even among the offspring of individual wild female A. gambiae mosquitoes. These findings reflect a complex evolutionary history balancing gain and loss of copies against the coexistence of two elements competing for a conserved target site in the same species for perhaps millions of years.

Among the transposable elements that encode reverse transcriptase (RT) are those without long terminal repeats (LTRs). Examples of this type, sometimes referred to as non-LTR retrotransposons, are the mammalian LINE-1 (L1) elements and the Drosophila melanogaster I factors, F elements, and jockey (6). Comparisons among the many elements characterized from a broad phylogenetic spectrum of organisms have revealed several common structural and sequence features. Full-length elements, typically about 6 kb long, include one or two long open reading frames (ORFs) and poly(A)- or (A)-rich terminal tracts. Most copies, thousands in mammals and tens in other organisms, are heterogeneously truncated at the 5' end. The coding region includes domains with RT homology and may also contain one or more cysteine-histidine (Cys) motifs reminiscent of the nucleic acid-binding domains of retroviral nucleocapsid proteins.
Recent reports have demonstrated transposition through an RNA intermediate for the I factor (37, 56) and the mouse L1 element (25). Indeed, the RT enzymes encoded by CRE1, jockey, and a human L1 element have been shown to be functional (30, 33, 50), and the RT activity is associated with virus-like particles of L1 RNA and protein in the microsomal fraction of human and mouse embryonal carcinoma cells (20, 49). However, unlike retroviruses, L1-like elements are not infectious (43), and the absence of LTRs indicates a very different replication strategy that is not well understood. Peptide alignments show that the RT and capsid-like domains of non-LTR retrotransposons are more closely related to one another than to those of other RT-encoding mobile elements (23, 51, 66, 68). However, although the coding region may occupy over 80% of the length of a given element, the sequence diversity among characterized non-LTR retrotransposons is high enough to preclude amino acid alignment of any but very short segments.

* Corresponding author.
† Present address: Department of Entomology, University of Wisconsin, Madison, WI 53706.

Many non-LTR retrotransposons, such as mammalian L1 elements (32) and T1 elements from mosquitoes in the Anopheles gambiae complex (8, 9), are widely dispersed in the host genomic DNA and have no apparent insertion site specificity. However, some elements of this same class exhibit a preference for a particular target site. In three different trypanosomatid protozoa, three distinct elements (CRE1, SLACS, and CZAR) interrupt some portion of the array of spliced leader RNA genes at the same conserved sequence (2). The Drosophila G element occupies a precise site in the intergenic spacers of the rRNA genes (22). Similarly, some 28S rRNA genes are interrupted by non-LTR retrotransposons in most insects (34).
Most of the rDNA insertions studied belong to the R1 and R2 families of elements that are found at highly conserved sites located 74 bp apart and termed R1 and R2 sites, respectively. Exceptions from the Japanese beetle (34) and fungus gnat (41) insert about 30 bp upstream of the R2 site, one from a nematode (3) inserts between the two sites, and the mosquito elements reported here, RT1 and RT2, insert about 630 bp downstream (55).

The A. gambiae complex, named for the member species that is the most important vector of malaria in the world, is a group of at least six tropical African mosquito species that are morphologically indistinguishable as adults. They are so closely related that the divergence from a common ancestor of the most anthropophilic members may have accompanied the shift in settlement patterns and land use associated with the origins of agriculture in Africa in the last several thousand years (17). Nucleotide sequence comparisons using regions of the ribosomal and mitochondrial DNAs are consistent with this view (11). Previously described non-LTR retrotransposons from diverse species have been too diverged to align, except for a few relatively short conserved amino acid sequence motifs typical of RT and putative nucleic acid-binding proteins. Sequence comparisons among distinct but less diverged element families from sibling species may help define the importance of less understood regions of the ORFs, particularly the more rapidly evolving gag-like ORF1. Analysis of ORF2 structural domains from distinct site-specific elements may elucidate those residues which dictate target site preference and recognition. Comparison between two elements competing for the same target site in A. gambiae and other species in the complex may also provide an evolutionary picture of their transmission, propagation, and loss. With these goals in mind, we isolated from A. gambiae full-length representatives of two non-LTR retrotransposon families with identical target site specificities and determined and analyzed their complete nucleotide sequences.

MATERIALS AND METHODS

Isolation and sequencing of clones. Genomic clones from EMBL3 phage libraries containing RT1 and RT2 insertions in the 3' region of the 28S rRNA gene of A. gambiae and A. arabiensis were isolated as described previously (55). Full-length RT1 clone Agr23 from A. gambiae was obtained from the initial screening. A full-length representative of RT2 from A. gambiae, Ag106, was selected from the EMBL3 library by using the KpnI-PstI fragment of Aar23, a truncated RT2 element from A. arabiensis, as a probe (55). Overlapping restriction fragments from each bacteriophage clone were subcloned into plasmid Bluescript SKII+ (Stratagene Cloning Systems, La Jolla, Calif.).

Each subclone was sequenced from both strands by the dideoxy-chain termination method (58) as modified for double-stranded sequencing (62), by using a combination of commercially available and synthetic oligonucleotide primers. The structural organization of both elements and their insertion site in the rDNA are shown in Fig. 1. Sequences were analyzed by using the Genetics Computer Group Sequence Analysis Package (21) and a program kindly provided by W.-H. Li for estimating the rate of synonymous and nonsynonymous nucleotide substitutions between sequences (45), run on the VAX network of the Centers for Disease Control.

Preparation of probes. RT1 and RT2 probes were derived from subclones pAgr23H4S6 (~1.9-kb insert) and pAgr106S3 (~2.8-kb insert), respectively (Fig. 1). Probes were prepared following digestion of subclone DNA to release the insert and electrophoresis through a 1.2% low-melting-temperature agarose gel. The insert was excised from the gel and labeled with [32P]dCTP to 5 × 10^8 cpm/µg by random primer extension in the gel overnight (27).

FIG. 1. Structural organization of RT1 and RT2 and their rDNA insertion site relative to those of R1 and R2. Cross-hatched boxes, coding regions; stippled boxes, subrepeat regions; black and open boxes, coding and transcribed spacer sequences, respectively; S, SalI sites used for subcloning. Not drawn to scale.

Southern analysis. DNA was prepared from individual adult mosquitoes as described previously (15), digested with 15 U of SalI (Boehringer Mannheim Biochemicals, Indianapolis, Ind.) for 90 min, electrophoresed on a 0.8% agarose gel, and transferred to GeneScreen Plus membrane (Du Pont, NEN Research Products, Boston, Mass.). Hybridization was performed overnight in 0.25% nonfat dry milk (Carnation)-6x SSC (1x SSC is 0.15 M NaCl plus 0.015 M sodium citrate) at 65°C. The blot was washed three times for 15 min each time in prewarmed 0.1x SSC-0.1% sodium dodecyl sulfate at 65°C. Activity was recorded both by autoradiography and by a Betascope 603 Blot Analyzer (Betagen Corp., Waltham, Mass.) made available by the Centers for Disease Control Biotechnology Core Facility. The latter was used in accordance with the manufacturer's specifications to record beta emissions directly from the hybridized membranes.

Northern (RNA) analysis. Total RNA and polyadenylated mRNA were isolated from embryos, larvae, pupae, and adults of an A. gambiae G3 colony by using the guanidinium-cesium chloride method (47) and the FastTrack mRNA isolation kit (Invitrogen), respectively. RNA (10 µg of total RNA, 1 µg of mRNA) was separated by electrophoresis on a 0.8% agarose gel under denaturing conditions by using the formaldehyde method (48), transferred to GeneScreen Plus membrane, and hybridized in accordance with the manufacturer's recommendations.

Nucleotide sequence accession numbers. The sequences reported here have been given GenBank accession numbers M93690 (RT1) and M93691 (RT2).

RESULTS

Nucleotide sequence of an RT1 element. Clone Agr23 from A. gambiae was previously determined to contain a full-length RT1 element on the basis of its hybridization pattern to A. gambiae genomic DNA (55) and sequence comparisons at the 5' ends of other elements of the same length (data not shown). The RT1 element in Agr23 is 8,036 nucleotides long (data not shown). It is flanked at each end by a 17-bp duplication of a 28S rDNA sequence (55). The first of two ORFs is preceded by a 2,427-bp noncoding leader with several interesting features. Beginning at position 214, there are 3.5 consecutive imperfect subrepeats approximately 500 nucleotides long. These are followed by a poly(T) tract approximately 60 bases long. Beyond this tract is a sequence containing two 55-bp subrepeats separated by 160 bases. Each subrepeat is very G+C rich (67%) and contains several palindromes that potentially participate in secondary structure. The second of the two subrepeats overlaps the beginning of ORF1 by 38 bp.

FIG. 2. Alignment of 267 C-terminal amino acids from ORF1 of RT1 and RT2. Identities are indicated by vertical lines; Cys motifs are underlined. (Alignment not reproduced.)

ORF1 spans positions 2428 to 4266, ending with the termination codon TGA. A potential initiation codon is found 55 bp from the beginning of the ORF. No other methionines are found for 75 codons downstream. This finding, together with a favorable context for initiation (44), suggests that the first methionine is the initiation codon. Translation beginning at this AUG triplet would produce a 595-amino-acid protein with a net charge of +19 at pH 7.5. Most striking is a preponderance of glutamine residues, 21% (125 of 595) overall. Further, they are clustered at the center of the ORF and occur in runs of up to 20 consecutive glutamines, reminiscent of the opa repeat (65). This repeat, which is defined by repeated triplets of CA(G, A, C) and encodes glutamine or histidine, is present in gene products that are expressed in a developmental or tissue-specific manner in organisms such as yeast, fruit flies, humans, mice, and rats (31). At the carboxyl (C) terminus of ORF1 are three Cys motifs, arranged in a pattern typical of sequences from retroviral nucleocapsid proteins that recognize single-stranded nucleic acids (7).

The second ORF extends from position 4096 to position 7731, ending with the termination codon TAA. The beginning of ORF2 overlaps ORF1 by 171 bp in the -1 reading frame. Methionines occur 31 and 40 codons from the beginning of the ORF, but neither one is in a favorable sequence context for initiation of translation. Thus, as can occur with retroviruses, translation of ORF2 may occur by ribosomal frame shifting. This mechanism has also been proposed to account for translation of ORF2 in R1 elements (35). ORF2 contains a domain of RT homology followed by two distinct amino acid sequence motifs, one or both of which may correspond to an integrase domain.
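The glutamine statistics quoted for ORF1 (overall fraction and longest uninterrupted run) are simple to compute from any translated sequence. The following is a minimal Python sketch under stated assumptions: the function name `glutamine_stats` and the toy peptide are hypothetical illustrations, not the authors' software and not the RT1 sequence.

```python
import re

def glutamine_stats(protein: str) -> tuple[float, int]:
    """Return (fraction of Q residues, length of the longest consecutive Q run)."""
    protein = protein.upper()
    if not protein:
        return 0.0, 0
    fraction = protein.count("Q") / len(protein)
    # Every maximal run of one or more glutamines (one-letter code Q).
    runs = re.findall(r"Q+", protein)
    longest = max((len(run) for run in runs), default=0)
    return fraction, longest

# Hypothetical toy peptide used only to exercise the function.
toy = "MKQQQQQRAQQLK"
fraction, longest = glutamine_stats(toy)
```

Applied to the counts reported in the text, 125 glutamines in 595 residues corresponds to a fraction of about 0.21.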

Following ORF2 is a 304-bp noncoding sequence ending in a poly(A) tail. Interestingly, the poly(A) tail of all 5'-truncated copies is distinct from that of full-length copies in that it is interrupted at the third position by a cytidine, AAC(A)n (55). Preceding the tail by 14 bp is a putative polyadenylation signal (AATAAA).
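The spacing between a polyadenylation signal and the poly(A) tail can be checked directly on a sequence string. The sketch below is one way to do this in Python; the function name `polya_signal_offset`, the `min_tail` threshold, and the toy sequence are assumptions made for illustration and do not come from the paper.

```python
import re

def polya_signal_offset(seq: str, min_tail: int = 10):
    """Return the number of bases between the last AATAAA and a terminal
    poly(A) tail, or None if either feature is missing."""
    seq = seq.upper()
    # Terminal run of at least min_tail adenines (assumed threshold).
    tail = re.search(r"A{%d,}$" % min_tail, seq)
    if tail is None:
        return None
    # Last AATAAA lying entirely upstream of the tail.
    signal = seq.rfind("AATAAA", 0, tail.start())
    if signal == -1:
        return None
    return tail.start() - (signal + len("AATAAA"))

# Hypothetical toy 3' end: signal, 14-bp spacer, then a 20-nt poly(A) tail.
toy = "GGCC" + "AATAAA" + "GCTAGCTAGCTAGC" + "A" * 20
offset = polya_signal_offset(toy)
```

Because only the final uninterrupted run of adenines is treated as the tail, a tail interrupted by a cytidine, such as the AAC(A)n form described for truncated copies, would be measured from the base after the cytidine.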

Nucleotide sequence of an RT2 element. The 5' end of the A. gambiae RT2 element was defined from sequence comparisons among different phage clones containing elements presumed to be full length on the basis of restriction patterns (data not shown). Clone Ag106, containing a full-length RT2 element, was selected for sequencing. The complete sequence of this RT2 element was determined (data not shown). It is 6,731 bp long and flanked by the same 17 bp of 28S rDNA as the RT1 elements. The 1,134-bp 5' noncoding region is almost 1,300 bp shorter than the corresponding region from RT1, accounting for the difference in overall length between the two elements. Unlike RT1, this region is devoid of any significant subrepeats or palindromes. However, the nucleotide composition switches abruptly from segments rich in A+T to segments rich in G+C, particularly C in the mRNA-equivalent strand.

ORF1 begins at position 1134 and ends at position 2858 with the sequence TGA. The 5'-proximal ATG is not encountered until position 1533, almost 400 bp downstream. Although a purine (G) is found three positions upstream, it is uncommon for the initiation codon to occur so far from the beginning of the ORF (44). Sequence comparisons with other copies of RT2 in the 5' noncoding region suggest that Ag106 is a full-length copy, but the possibility that Ag106 is defective near the beginning of ORF1 cannot be ruled out. Translation from the 5' ATG codon would produce a 442-amino-acid protein with a net charge of +24 at pH 7.5. Although not as glutamine rich as its counterpart from RT1 (13 versus 21%), glutamine is a predominant residue in this putative peptide. Three consecutive Cys motifs occur at the C terminus.

Overlapping ORF1 by 44 bp in the +1 reading frame, ORF2 spans positions 2812 to 6477. Since no potential initiator codon occurs in the first 468 bp, ORF2 may be translated by ribosomal frame shifting. While retroviral frame shifting is -1, in retrotransposon Ty1 there is a precedent for +1 frame shifting (12). This is similar to the situation that has been described for L1 elements from mice, in which ORF1 and ORF2 overlap in the +1 reading frame (32). ORF2 contains regions that show homology to RT and integrase. Following ORF2 is a 3' noncoding region of 244 bp, terminating with a putative polyadenylation signal and a poly(A) tail.

Comparison of the ORFs. The overall identity between the ORF1 of RT1 and that of RT2 is 36% at the amino acid level. The percent identity increases to 55% when only the 267 C-terminal amino acids are considered (Fig. 2). Thus, the C terminus, following the run of glutamines, is more highly conserved than the amino (N) terminus. As noted by Jakubczak et al. (35), the first of the three conserved Cys motifs found in several L1-like elements is invariant and is identical in spacing to motifs in retroviral nucleocapsid genes (CX2CX4HX4C). The second and particularly the third may vary from the spacing of four residues between the second cysteine and the histidine. Interestingly, while the canonical spacing is found within the first and second motifs of the ribosomal insertion elements RT1, RT2, and R1 from Bombyx mori, R1Bm (67), and D. melanogaster, R1Dm (35), the spacing within the third motifs of RT1 and RT2 is identical to that of F, G, and jockey (three residues) and differs from that of the two R1 elements (eight residues). The Cys motifs of the ribosomal insertion elements R2Bm and R2Dm, as well as the trypanosomatid site-specific elements CZAR, SLACS, and INGI, more closely resemble the motif found in transcription factor TFIIIA (63). This type of Cys motif interacts with double-stranded DNA (7).
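The canonical nucleocapsid-type spacing CX2CX4HX4C translates directly into a regular expression, which makes it easy to scan a protein sequence for candidate Cys motifs. The sketch below is illustrative only: the names `CCHC` and `find_cchc` and the toy peptide are hypothetical, and motifs with variant spacing (such as the three- or eight-residue third motifs mentioned above) would need a relaxed pattern.

```python
import re

# C, any 2 residues, C, any 4 residues, H, any 4 residues, C.
CCHC = re.compile(r"C.{2}C.{4}H.{4}C")

def find_cchc(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched text) for each
    non-overlapping CX2CX4HX4C match in a protein sequence."""
    return [(m.start() + 1, m.group()) for m in CCHC.finditer(protein.upper())]

# Hypothetical toy peptide containing one canonical motif.
toy = "MAKRCYRCLEHGHNARDCRSPV"
hits = find_cchc(toy)
```

Note that `finditer` reports non-overlapping matches only, so two motifs sharing residues would be reported as one hit.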

Overall identity between ORF2 of RT1 and that of RT2 is high, 60% at the amino acid level. Except for a glutamine-rich stretch present only in RT2 near the N terminus (31 of 55 residues, including runs of 10 and 6 glutamines), this high level of identity is upheld throughout the length of ORF2. Centrally located in both ORFs is a region with extensive homology to RT, as has been reported for numerous other retrotransposons (66, 68). By using the Genetics Computer Group program Pileup (21), an amino acid alignment was created with the RT domains of RT1, RT2, and eight other non-LTR retrotransposons: R1Dm (35), R1Bm (67), R2Dm (35), R2Bm (13), I (26), Ingi (42), and two from A. gambiae, T1 (8) and T2 (5). The alignment confirmed that RT1 and RT2 are more closely related to one another (75% identity over 280 residues) than either is to any other (28 to 40% identity over about 270 residues; data not shown). This RT alignment also suggested an especially close evolutionary relationship between R1 ribosomal insertion elements and RT1-RT2. The 30% of residues shared between RT1 or RT2 and the elements that do not specifically insert into rDNA are comparable to the 29% shared between RT1 or RT2 and the R2 ribosomal insertion elements. In contrast, RT1 or RT2 and the R1 ribosomal insertion elements share 39% of residues in this domain. Supporting this relationship is a similarity in the target site duplications of RT1-RT2 and R1 (55). RT1-RT2 and R1 insert in opposite orientations in the rDNA, and 9 bp of the RT1-RT2 target site duplication (TATCCCTGT) are identical but reversed in orientation with respect to 9 bp of the target site duplication of R1. It is important to note that all individuals of A. gambiae tested have another, 5-kb insertion in approximately 15% of the 28S coding sequences, 5' to the RT1-RT2 insertion site (16). These insertions, which do not vary in abundance, may correspond to R1 and/or R2 elements, but no example has been cloned.

C terminal to the RT domain in both RT1 and RT2 are two potential nucleic acid-binding motifs, one or both of which might correspond to an integrase domain. Like the Cys motif found in retroviral integrases, HX3HX22-32CX2CX108-130ER (38), the first Cys motif encountered has the sequence HX5HX25CX2CX9=101ER. A similar motif has been reported from the trypanosomatid site-specific elements, although it occurs at the 5' end of ORF2, N terminal to the RT domain (63). Encompassed by the first Cys motif, the second potential Cys motif in RT1, CX2CX7HHX4C, shares two cysteines with the first motif. In RT2, the two histidines are replaced by two aspartic acid residues. This second Cys motif more closely resembles the pattern of cysteine and histidine residues that characterizes the Cys motifs in ORF1. Interestingly, the second Cys motif matches a consensus, CX1-3CX7-8HX4C, derived from similar motifs found in the same relative location in eight other non-LTR retrotransposons (35). The relative importance of the first and second Cys motifs is not known. If they play a role in target site recognition, however, the substitution of two positively charged histidine residues by two negatively charged aspartic acid residues in the second motif of RT2 would be unexpected.

A comparison of the rates of synonymous (silent) and nonsynonymous (replacement) substitutions for both ORFs of RT1 and RT2 was conducted by using the method of Li et al. (45). In ORF2, the rate of synonymous substitutions (expressed in terms of 10^-9 substitutions per site per year ± the standard error) was sixfold higher than the rate of nonsynonymous substitutions (2.1 ± 0.2 versus 0.35 ± 0.01). For ORF1, when the comparison was based on alignment of the entire coding region, no significant difference in rates of substitution was observed, probably owing to the lack of a meaningful alignment. This suggests an accelerated rate of evolution at the N terminus. However, if the comparison was limited to the last 816 bp of the ORF, again a sixfold difference in rates was found (2.0 ± 0.4 versus 0.35 ± 0.03). The absolute rates are based on an estimated divergence time of 80 million years between mammals. The rate of substitution at silent sites in nuclear genes of mammals is at least three times lower than that observed in Drosophila species (60), and since rates of substitution in retroelements are generally higher than in nuclear genes because of error-prone RT (24), the absolute rates are less meaningful than the sixfold relative difference between the types of substitutions, indicating selection for protein function.

RT gene expression. Developmental Northern blots were prepared by using both total RNA and mRNA isolated from embryos, larvae, pupae, and adults of the G3 colony of A. gambiae. These were probed separately with an internal fragment of an RT1 clone and an RT2 clone (see Materials and Methods) and an A. gambiae actin clone (57) as a positive control. While a signal was detected with the actin probe, no signal was detected with the RT probes, even after prolonged exposure of the blots to film (data not shown).
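The sixfold contrast between synonymous and nonsynonymous substitution rates reported above for ORF2 and for the last 816 bp of ORF1 can be reproduced from the quoted values. The Python sketch below also attaches an approximate standard error to each ratio by first-order error propagation; that propagation step assumes independent errors and is an addition for illustration, not part of the method of Li et al. (45). The function name `rate_ratio` is hypothetical.

```python
import math

def rate_ratio(ks: float, ks_se: float, ka: float, ka_se: float) -> tuple[float, float]:
    """Return (Ks/Ka, approximate SE of the ratio).

    The SE uses first-order propagation and assumes the two rate
    estimates are independent (an assumption, not stated in the text)."""
    ratio = ks / ka
    se = ratio * math.sqrt((ks_se / ks) ** 2 + (ka_se / ka) ** 2)
    return ratio, se

# Values quoted in the text; units cancel in the ratio.
orf2_ratio, orf2_se = rate_ratio(2.1, 0.2, 0.35, 0.01)
orf1_ratio, orf1_se = rate_ratio(2.0, 0.4, 0.35, 0.03)  # last 816 bp of ORF1
```

Because the units cancel, the ratio does not depend on the assumed divergence time, which is why the relative difference is more informative than the absolute rates.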

Distribution of RT1 and RT2 in the A. gambiae complex. The presence of RT1 and RT2 elements in members of the A. gambiae complex was assessed by Southern blotting. Total genomic DNA was isolated from individual adult mosquitoes, digested with SalI, fractionated on agarose gels, blotted, and probed with internal fragments of cloned RT1, RT2, and the xanthine dehydrogenase (Xdh) gene (19) from A. gambiae. Since RT1 and RT2 sequences do not cross-hybridize on Southern blots, the probes were used simultaneously to produce the results presented in Fig. 3.

Figure 3A shows the results of comparisons among vari-ous colonies and field-collected specimens. The Xdh probeproduced a single 5.5-kb band of hybridization in all lanes.Strong bands of hybridization of the expected size for RT2were detected from each A. gambiae, A. arabiensis, and A.quadriannulatus colony and field specimen assayed. No RT2signals were detected from A. merus colonies or field spec-imens. Although no A. melas specimens were assayed, RT2does occur in this species because a region of this elementhas been successfully cloned by PCR and sequenced fromthe BAL colony (10).The distribution pattern of RT1 was complex. An intense

band of hybridization of the expected size for RT1 wasdetected from the A. gambiae colony assayed. In addition,faint RT1 signals detected from both A. arabiensis coloniesindicated the presence of RT1 in lower copy number in thisspecies. No RT1 signal was detected from A. quadriannula-tus or either of two A. merus colonies. Again, A. melas wasnot assayed, but the occurrence of RT1 in this species hasbeen demonstrated by PCR amplification and sequencing(10). A faint RT1 hybridization signal was detected from a


FIG. 3. (A) Hybridization patterns of Xdh, RT1, and RT2 to SalI-digested genomic DNA extracted from individual adult females. Lanes: 1 to 6, laboratory specimens of A. gambiae (SUA) (lane 1), A. arabiensis (ARZAG and GMAL) (lanes 2 and 3), A. quadriannulatus (SQUAD) (lane 4), and A. merus (V12 and ZULU) (lanes 5 and 6); 7 to 9, progeny of wild-caught A. gambiae females (no. 25, 28, and 34); 10 to 12, wild specimens of A. arabiensis (lane 10), A. quadriannulatus (lane 11), and A. merus (lane 12). (B) Hybridization patterns of Xdh, RT1, and RT2 to SalI-digested genomic DNA extracted from progeny of wild-caught female 34. F, female; M, male.

Kenyan specimen of A. arabiensis, and no signal was detected from A. quadriannulatus and A. merus field specimens from Zimbabwe and Kenya, respectively. Three field specimens of A. gambiae, the daughters of blood-fed females captured at the same time from one Kenyan village, revealed a surprising pattern of hybridization with the RT1 probe. One, from family 25, showed a moderate hybridization signal. The next, from family 28, showed a signal no stronger than that of the Xdh band. A third, from family 34, showed an intense hybridization signal.

To investigate this phenomenon more closely, a Southern blot containing genomic DNA of siblings from family 34 was prepared as before and probed with the Xdh, RT1, and RT2 probes (Fig. 3B). The hybridization signal with the RT2 probe was uniform, taking into account the fact that sons, with about half of the signal intensity, have only one X chromosome, to which the rRNA loci map (16). With the RT1 probe, two different signal intensities were observed. The bands detected in four of the daughters appeared to have about half of the intensity of bands detected in another five daughters. Likewise, the bands detected in two sons appeared to have roughly half of the intensity of those detected in another two sons. We were able to arrive at a more quantitative estimate of copy number by using the Xdh band in each lane as a single-copy gene reference and analyzing the blot directly with an instrument that senses and records beta emissions. After 4 h of data collection, the total activity (expressed as total counts) recorded from RT1 or RT2 in a given lane was divided by the total activity recorded from Xdh in that lane, after correction for the background in each case. Since males are heterogametic, activity ratios for males were doubled to estimate the copy number per haploid genome. Occasional very faint bands detected by the RT1 and RT2 probes in unexpected positions, probably the result of deviant element copies, were omitted from the analysis. The results for both blots are given in Table 1.

TABLE 1. Copy numbers of RT1 and RT2 in selected colony and field mosquitoes (a)

                                        Copy no.
Mosquito and colony or line           RT1      RT2

A. gambiae colonies
  G3                                   20        4
  SUA                                  76       18
A. arabiensis colonies
  ARZ                                   2       32
  GMAL                                  1       53
A. quadriannulatus SQUAD             0(b)        7
A. merus colonies
  V12                                0(b)     0(b)
  ZULU                               0(b)     0(b)
A. arabiensis K1 line                   3       24
A. quadriannulatus CHIL line         0(b)       10
A. gambiae lines
  25, male                             24       10
  28, male                              4       10
  34
    Sib A                              38       23
    Sib B                              37       19
    Sib C                               4       19
    Sib D                               5       20
    Sib E                               4       20
    Sib F                              31       17
    Sib G                              33       18
    Sib H                              39       20
    Sib I                              50       26
    Sib J                               8       30
    Sib K, male                         7       28
    Sib L, male                        10       28
    Sib M, male                        80       22
    Sib N, male                        79       20

(a) Individuals are female unless otherwise indicated. Copy number refers to haploid genome as determined by the element/Xdh activity ratio, assuming no dosage compensation. Since males are heterogametic, copy number estimates for males based on this ratio were doubled. Sib, sibling.
(b) Absence of elements was supported by PCR.

The copy numbers of RT1 and RT2 in two laboratory colonies of A. gambiae varied about fourfold, from 20 to 76 copies of RT1 and from 4 to 18 copies of RT2. These numbers are consistent with the 10 to 30 copies of RT2 estimated from the offspring of field-collected females. However, a 20-fold variation in copy number (4 to 80) was observed for RT1 among the offspring of a single field-collected female, no. 34. It is possible that such large copy number fluctuations would also have been detected in laboratory colonies, had more individuals been assayed. While RT1 predominates over RT2 in wild and laboratory-reared A. gambiae, the reverse applies to A. arabiensis, with 1 to 3 copies of RT1 versus 24 to 53 copies of RT2. The A. quadriannulatus specimens represent an extreme case, in which RT2 elements are present in low copy number (7 to 10) and RT1 elements are not detected. As noted earlier, both element families are apparently absent from A. merus.
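The copy number estimate described above (element activity divided by Xdh activity in the same lane, each corrected for background, and doubled for males) can be written out as a short sketch. The Python below is an illustration only and is not the software supplied with the beta-emission instrument; the function name, argument names, and example counts are hypothetical, and the sketch assumes that Xdh behaves as a single-copy reference present in two doses in both sexes and that there is no dosage compensation.

```python
def copies_per_haploid_genome(element_counts, xdh_counts,
                              element_background=0.0, xdh_background=0.0,
                              is_male=False):
    """Estimate element copy number per haploid genome for one blot lane.

    All arguments are hypothetical names; counts are total beta-emission
    counts recorded for the element band and the Xdh band in the same lane.
    """
    element_signal = element_counts - element_background
    xdh_signal = xdh_counts - xdh_background
    if xdh_signal <= 0:
        raise ValueError("Xdh reference signal must exceed its background")
    ratio = element_signal / xdh_signal
    # Males carry one X chromosome (where the rRNA loci map), so the
    # element/Xdh ratio is doubled, assuming no dosage compensation.
    return 2.0 * ratio if is_male else ratio


# Made-up counts, chosen only to show the arithmetic (not data from the blots).
female = copies_per_haploid_genome(4100.0, 200.0, 100.0, 100.0)
male = copies_per_haploid_genome(2100.0, 200.0, 100.0, 100.0, is_male=True)
print(female, male)  # prints "40.0 40.0"
```

With these invented counts, a son showing half the raw element signal of a daughter yields the same per-haploid estimate, which is the intent of the doubling.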

DISCUSSION

In this study, we characterized two distinct families of non-LTR retrotransposons from the mosquito A. gambiae that share a specific insertion site in the 28S coding region of the rDNA. These two element families, RT1 and RT2, not


only share all of the structural features that typify elements of this class but are more similar to each other at the nucleotide and amino acid levels within the coding regions than are any two other families of such elements described to date, including other site-specific ribosomal insertion elements. The protein sequence divergence between RT1 and RT2 averaged across ORF2 is comparable to that found within the conserved 3' end of ORF2 among members of the mammalian L1 family derived from humans and mice (32).

We have shown that while the ORF2 regions are more closely related than the ORF1 regions of RT1 and RT2, considerable similarity can still be found between the ORF1s, even outside of the conserved nucleic acid-binding (Cys) domains. The relatively high degree of conservation among the ORFs and the strong bias toward silent versus replacement substitutions are consistent with the operation of selective constraints associated with their proposed roles in encoding functional nucleic acid-binding proteins, RT, and integrases. That they are more similar to the R1 ribosomal insertion elements than to other non-LTR retrotransposons, both throughout the RT domain and in target sequence, is also consistent with a hypothesis of divergence of RT1 and RT2 from an R1 ancestral sequence (or vice versa).

It is difficult to reconcile the relative similarity among coding regions with the lack of any obvious relationship among 5' and 3' noncoding regions. This phenomenon has been previously observed within the mammalian L1 family of elements, where the 5' and 3' ends diverge so much more rapidly than the coding regions that there are species-specific length differences at the 3' ends, and no meaningful sequence alignment at either end is possible between members from distantly related species (64). In fact, among the L1 elements of mice, three alternative 5' noncoding regions that lack sequence similarity have been described, the so-called A, F, and V types (39, 46, 54). Types A and F are composed of tandem repeats of 208 and 206 bp, respectively, reminiscent of the 500-bp tandem-repeat structure of the 5' end of RT1. Type V lacks any tandem repetitive structure, as does the 5' end of RT2. Interestingly, F and V sequences are found separate from L1 elements in the mouse genome. Although human L1 elements lack a 5' repeat structure, the 5' end of rat L1 elements also contains a tandem repeat 600 bp long (29). Apart from the mammalian L1 family, two site-specific trypanosomatid non-LTR retrotransposons containing a 185-bp tandem repeat at the 5' end have been described, although related elements that lack this structural feature also exist (2). Thus, it appears that this type of element may be able to recombine at the 5' end, by an undefined mechanism, with unrelated genomic sequences (39) and may do so relatively frequently in an evolutionary time frame. We intend to establish whether the 5' tandem repeats exhibited by RT1 are found by themselves in the A. gambiae genome.

The type A repeats of mouse L1 and the repeats of rat L1 are able to promote transcription (29, 59). However, a tandem repetitive structure is not critical for transcription of non-LTR retrotransposons, since sequences within the first 100 bp of the human L1 element are also capable of promoting transcription (61), as are sequences from D. melanogaster jockey and F elements (52, 53). Differences in both structure and sequence do seem to have important effects on regulation of transcription. For example, although the human L1 promoter is active in rat and mouse cells, its activity appears to be both cell type and stage specific (61). In contrast, jockey seems to be transcribed at high levels at all developmental stages (53). Restrictions on the level and timing of transcription have important implications for the ability of the element to persist and spread in the genome. To the extent that the acquisition of new 5' ends with different transcriptional regulation can lead to the establishment of new subfamilies or families of elements in the same organism, this mechanism might partially underlie the proliferation of many related families in the genome. Unfortunately, we have been unable to detect transcripts from either RT1 or RT2 elements at any developmental stage of the mosquito or from the A. gambiae cell line. Neither RT1 nor RT2 shares any sequence similarity to the promoter motifs present at the 5' ends of F, jockey, and other D. melanogaster non-LTR retrotransposons (52). It is possible that the level of transcription was lower than was measurable by Northern blotting or that the window of active transcription was limited to a very narrow time period at a given developmental stage.

Given the extremely close evolutionary relatedness of members of the A. gambiae complex, the uneven distribution of RT1 and RT2 in both wild and colonized specimens is surprising. The only species that apparently lacked sequences homologous to either element family was A. merus, and A. quadriannulatus harbored only RT2. These results could be attributed to either a sampling artifact or sequence divergence, rather than true absence. A sampling artifact cannot be ruled out but seems unlikely given the consistent failure to detect these element families by either PCR or Southern analysis by using specimens that originated from geographically distant regions, Kenya and Zululand. Sequence divergence is more difficult to discount, however, even though the probes included one of the most highly conserved regions of the coding portion of these elements and non-LTR retrotransposons generally. Sequence comparisons have been made among PCR fragments of RT2 cloned from four members of the species complex (10). They show that of the last 174 nucleotides of ORF2, there are neither intra- nor interspecific differences among elements from A. gambiae, A. arabiensis, and A. quadriannulatus but that elements from A. melas differ at 54 (31%) of the positions. If there were elements in A. merus that showed a comparable degree of divergence over the 2- to 3-kb region represented by the probes, then it is possible that washing of Southern blots at high stringency would remove the probe. Experiments are in progress to test this possibility.

The remaining species assayed contained various copy numbers of both element families. Because of the existence of various polyploid tissues in the adult, the potential underreplication of interrupted rRNA genes in those tissues (4), the possibility of somatic elimination of heterochromatic sequences (40), and questions about dosage compensation in male mosquitoes, the copy numbers in Table 1 are not exact. Nevertheless, they reveal striking differences, particularly with respect to RT1 elements, not only between species and between colonies of the same species, but also among the offspring of individual adult females from the field. These differences in copy number of RT1 between isofemale lines suggest the coexistence in a single population of several X chromosomes that differ with respect to the copy number of this element. Within isofemale line 34, males and females both fall into two groups: high copy number, ranging from 31 to 80, and low copy number, ranging from 4 to 10. Thus, Mendelian inheritance may be sufficient to explain the immediate results of this particular mating. Interestingly, the copy number fluctuations detected in RT2 were much lower, particularly within isofemale line 34, in which no striking differences in copy number were observed.

The distribution pattern of RT1 and RT2 needs to be


understood in the context of their role as elements that insert specifically into the 28S coding region of a percentage (from 2 to 20%) of the 500 to 600 rRNA genes. In D. melanogaster, individuals with 50% of rRNA genes interrupted in the 28S region, known as bobbed 8 mutants, have significantly longer development times, presumably because of insufficient rRNA. Studies of rDNA transcription in these mutants have shown that interrupted genes are not usually transcribed and that the transcripts of most of those that are transcribed are processed or degraded (36). We assume that a similar phenomenon occurs in A. gambiae. This phenotypic effect is surely subject to selective pressures, particularly in the context of population flushes and crashes associated with rainy and dry seasons and ecologically marginal situations in the field. Other important considerations include recombination mechanisms typical of tandemly repeated DNA, such as rRNA genes. Intrachromosomal recombination would result in rapid loss of part of a tandem array on a single chromosome, and unequal crossing over between chromosomes would result in expansion of the copy number on one chromosome and deletion on the other. Assuming that some of the elements in these families are still capable of amplification by retroposition and that some gene flow between taxa still occurs, albeit at a very low frequency (18), the only conclusion that can safely be drawn is that the copy number of RT1 and RT2 is unstable within this sibling species complex.
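The relationship between element copy number and the percentage of interrupted rRNA genes can be illustrated with a rough conversion. The Python sketch below is not part of the original analysis; the function name is hypothetical, and the calculation assumes that every RT1 and RT2 copy occupies a different rRNA gene and that all copies reside in the rDNA, neither of which is established here. The copy numbers are those listed in Table 1 for the two A. gambiae colonies.

```python
def interrupted_fraction(rt1_copies, rt2_copies, rrna_genes):
    """Fraction of rRNA genes interrupted, assuming one element per gene."""
    if rrna_genes <= 0:
        raise ValueError("rrna_genes must be positive")
    return (rt1_copies + rt2_copies) / rrna_genes


# Copy numbers per haploid genome for the G3 and SUA colonies (Table 1),
# evaluated against the 500 to 600 rRNA genes cited in the text.
for colony, rt1, rt2 in (("G3", 20, 4), ("SUA", 76, 18)):
    low = interrupted_fraction(rt1, rt2, 600)
    high = interrupted_fraction(rt1, rt2, 500)
    print(f"{colony}: {low:.1%} to {high:.1%}")
```

Under these assumptions the two colonies give roughly 4 to 5% and 16 to 19% of genes interrupted, values that fall inside the 2 to 20% range quoted above.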

In the face of such flux, it is remarkable that RT1 and RT2 appear to have coexisted for millions of years in competition for the same conserved insertion site. Support for this assertion comes from a sequence comparison between functional I factors from D. melanogaster and D. teissieri (1). Sequences that hybridize to I-factor probes have been detected in all but 1 of the 21 members of the D. melanogaster group and four species outside it, leading to the suggestion that I elements are evolutionarily old components of the genomes of these species (28). The level of single-copy nuclear DNA divergence between D. melanogaster and D. teissieri has been determined by DNA-DNA hybridization to be about a 7% base pair mismatch (14). Assuming a conversion of 1.7% mismatch per million years (14), D. melanogaster may have split from D. teissieri about 4 million years ago. The overall nucleotide mismatch between I factors of D. melanogaster and D. teissieri is 15% (1), twice as high as measured for single-copy nuclear DNA, only a small fraction of which is composed of coding sequences. The overall nucleotide mismatch between the RT1 and RT2 coding regions is 39%. Therefore, discounting the possibility of horizontal transmission, which we have no experimental evidence to support, and assuming comparable rates of DNA divergence in Anopheles, RT1 and RT2 may have coexisted for at least 4 million and perhaps as many as 10 million years.

These elements may owe their lengthy coexistence to two factors. First, because of major structural and sequence differences in the 5' noncoding region, they are almost certainly regulated quite differently. These differences in regulation may be reflected in the copy number differences shown in Table 1. The second factor concerns the rapidly evolving N terminus of ORF1. Others have suggested that the first ORF in non-LTR retrotransposons serves a species-specific function, since it seems to be much less conserved among elements from different species than the second, RT-encoding ORF. We propose that the different N termini of the respective ORF1s encode specialized functions unique to each family that allow each to coexist in an individual niche in the genome. Thus, ORF1 differences may have less to do with species specificity than with functional specificity.
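The arithmetic behind the coexistence estimate can be retraced with a brief sketch. The Python below uses only the mismatch percentages and the conversion rate quoted in the text; scaling the RT1/RT2 mismatch by the rate implied for I factors is an assumed reading of how the upper figure of about 10 million years could arise, not a calculation spelled out in the text. It also assumes that mismatch accumulates linearly with time, with no correction for multiple substitutions, and all names are hypothetical.

```python
SCN_MISMATCH_PCT = 7.0        # D. melanogaster vs D. teissieri, single-copy nuclear DNA (14)
SCN_RATE_PCT_PER_MYR = 1.7    # conversion rate, % mismatch per million years (14)
I_FACTOR_MISMATCH_PCT = 15.0  # I factors of the same two species (1)
RT_MISMATCH_PCT = 39.0        # RT1 vs RT2 coding regions


def divergence_time_myr(mismatch_pct, rate_pct_per_myr):
    """Divergence time in millions of years under a linear clock."""
    return mismatch_pct / rate_pct_per_myr


# Host split time from single-copy nuclear DNA.
species_split = divergence_time_myr(SCN_MISMATCH_PCT, SCN_RATE_PCT_PER_MYR)

# Rate implied for I factors if they diverged along with their hosts.
i_factor_rate = I_FACTOR_MISMATCH_PCT / species_split

# RT1/RT2 divergence time if their coding regions evolve at the I-factor rate.
coexistence = divergence_time_myr(RT_MISMATCH_PCT, i_factor_rate)

print(round(species_split, 1), round(coexistence, 1))  # prints "4.1 10.7"
```

Because the RT1/RT2 mismatch exceeds that of the I factors, the host split time serves as the lower bound in this reading, and the scaled value as the upper figure.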

ACKNOWLEDGMENTS

This work was supported by World Health Organization/Tropical Disease Research grants 890534 and 900573 and by the National Center for Infectious Diseases of the U.S. Centers for Disease Control.

We gratefully acknowledge the support of staff of the CDC Biotechnology Core Facility Branch, B. Holloway, M. Rasmussen, and E. George, for synthesis of oligonucleotide sequencing primers and assistance with the Betascope Analyzer and S. McKneally for computer assistance. We thank an anonymous reviewer whose helpful comments improved the manuscript.

REFERENCES

1. Abad, P., C. Vaury, A. Pelisson, M.-C. Chaboissier, I. Busseau, and A. Bucheton. 1989. A long interspersed repetitive element - the I factor of Drosophila teissieri - is able to transpose in different Drosophila species. Proc. Natl. Acad. Sci. USA 86:8887-8891.
2. Aksoy, S. 1991. Site-specific retrotransposons of the trypanosomatid protozoa. Parasitol. Today 7:281-285.
3. Back, E., E. V. Meir, F. Mueller, D. Schaller, H. Neuhaus, P. Aeby, and H. Tobler. 1984. Intervening sequences in the ribosomal RNA genes of Ascaris lumbricoides: DNA sequences at junctions and genomic organization. EMBO J. 3:2523-2529.
4. Beckingham, K., and N. Thompson. 1982. Under-replication of intron+ rDNA cistrons in polyploid nurse cell nuclei of Calliphora erythrocephala. Chromosoma 87:177-196.
5. Bedell, J. A., and N. J. Besansky. Unpublished data.
6. Berg, D. E., and M. M. Howe. 1989. Mobile DNA. American Society for Microbiology, Washington, D.C.
7. Berg, J. M. 1990. Zinc fingers and other metal-binding domains. J. Biol. Chem. 265:6513-6516.
8. Besansky, N. J. 1990. A retrotransposable element from the mosquito Anopheles gambiae. Mol. Cell. Biol. 10:863-871.
9. Besansky, N. J. 1990. Evolution of the T1 retroposon family in the Anopheles gambiae complex. Mol. Biol. Evol. 7:229-246.
10. Besansky, N. J., and J. A. Bedell. Unpublished data.
11. Besansky, N. J., D. Mills Hamm, and F. H. Collins. Unpublished data.
12. Boeke, J. D., and K. B. Chapman. 1991. Retrotransposition mechanisms. Curr. Opin. Cell Biol. 3:502-507.
13. Burke, W. D., C. C. Calalang, and T. H. Eickbush. 1987. The site-specific ribosomal insertion element type II of Bombyx mori (R2Bm) contains the coding sequence for a reverse transcriptase-like enzyme. Mol. Cell. Biol. 7:2221-2230.
14. Caccone, A., G. D. Amato, and J. R. Powell. 1988. Rates and patterns of scnDNA and mtDNA divergence within the Drosophila melanogaster subgroup. Genetics 118:671-683.
15. Collins, F. H., M. A. Mendez, M. O. Rasmussen, P. C. Mehaffey, N. J. Besansky, and V. Finnerty. 1987. A ribosomal RNA gene probe differentiates member species of the Anopheles gambiae complex. Am. J. Trop. Med. Hyg. 37:37-41.
16. Collins, F. H., S. M. Paskewitz, and V. Finnerty. 1989. Ribosomal RNA genes of the Anopheles gambiae complex. Adv. Dis. Vector Res. 6:1-28.
17. Coluzzi, M., V. Petrarca, and M. A. Di Deco. 1985. Chromosomal inversion intergradation and incipient speciation in Anopheles gambiae. Boll. Zool. 52:45-63.
18. Coluzzi, M., A. Sabatini, V. Petrarca, and M. A. Di Deco. 1979. Chromosomal differentiation and adaptation to human environments in the Anopheles gambiae complex. Trans. R. Soc. Trop. Med. Hyg. 73:483-497.
19. Crews-Oyen, A., D. Mills Hamm, and F. H. Collins. Unpublished data.
20. Deragon, J.-M., D. Sinnett, and D. Labuda. 1990. Reverse transcriptase activity from human embryonal carcinoma cells NTera2D1. EMBO J. 9:3363-3368.
21. Devereux, J., P. Haeberli, and O. Smithies. 1984. A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12:387-395.
22. Di Nocera, P. P., F. Graziani, and G. Lavorgna. 1986. Genomic and structural organization of Drosophila melanogaster G elements. Nucleic Acids Res. 14:675-691.
23. Doolittle, R. F., D.-F. Feng, M. S. Johnson, and M. A. McClure. 1989. Origins and evolutionary relationships of retroviruses. Q. Rev. Biol. 64:1-30.
24. Dougherty, J. P., and H. M. Temin. 1988. Determination of the rate of base-pair substitution and insertion mutations in retrovirus replication. J. Virol. 62:2817-2822.
25. Evans, J. P., and R. D. Palmiter. 1991. Retrotransposition of a mouse L-1 element. Proc. Natl. Acad. Sci. USA 88:8792-8795.
26. Fawcett, D. H., C. K. Lister, E. Kellett, and D. J. Finnegan. 1986. Transposable elements controlling I-R hybrid dysgenesis in D. melanogaster are similar to mammalian LINEs. Cell 47:1007-1015.
27. Feinberg, A. P., and B. Vogelstein. 1984. A technique for radiolabeling DNA restriction endonuclease fragments to high specific activity. Anal. Biochem. 137:266-267.
28. Finnegan, D. J. 1989. The I factor and I-R hybrid dysgenesis in Drosophila melanogaster, p. 503-517. In D. E. Berg and M. M. Howe (ed.), Mobile DNA. American Society for Microbiology, Washington, D.C.
29. Furano, A. V., S. M. Robb, and F. T. Robb. 1988. The structure of the regulatory region of the rat L1 (L1Rn, long interspersed repeated) DNA family of transposable elements. Nucleic Acids Res. 16:9215-9231.
30. Gabriel, A., and J. D. Boeke. 1991. Reverse transcriptase encoded by a retrotransposon from the trypanosomatid Crithidia fasciculata. Proc. Natl. Acad. Sci. USA 88:9794-9798.
31. Grabowski, D. T., J. P. Carney, and M. R. Kelley. 1991. A Drosophila gene containing the opa repetitive element is exclusively expressed in adult male abdomens. Nucleic Acids Res. 19:1709.
32. Hutchison, C. A., III, S. C. Hardies, D. D. Loeb, W. R. Shehee, and M. H. Edgell. 1989. LINEs and related retroposons: long interspersed repeated sequences in the eucaryotic genome, p. 593-617. In D. E. Berg and M. M. Howe (ed.), Mobile DNA. American Society for Microbiology, Washington, D.C.
33. Ivanov, V. A., A. A. Melnikov, A. V. Siunov, L. I. Fodor, and Y. V. Ilyin. 1991. Authentic reverse transcriptase is coded by jockey, a mobile Drosophila element related to mammalian LINEs. EMBO J. 10:2489-2495.
34. Jakubczak, J. L., W. D. Burke, and T. H. Eickbush. 1991. Retrotransposable elements R1 and R2 interrupt the rRNA genes of most insects. Proc. Natl. Acad. Sci. USA 88:3295-3299.
35. Jakubczak, J. L., Y. Xiong, and T. H. Eickbush. 1990. Type I (R1) and type II (R2) ribosomal DNA insertions of Drosophila melanogaster are retrotransposable elements closely related to those of Bombyx mori. J. Mol. Biol. 212:37-52.
36. Jamrich, M., and O. L. Miller, Jr. 1984. The rare transcripts of interrupted rRNA genes in Drosophila melanogaster are processed or degraded during synthesis. EMBO J. 3:1541-1545.
37. Jensen, S., and T. Heidmann. 1991. An indicator gene for detection of germline retrotransposition in transgenic Drosophila demonstrates RNA-mediated transposition of the LINE 1 element. EMBO J. 10:1927-1937.
38. Johnson, M. S., M. A. McClure, D. F. Feng, J. Gray, and R. F. Doolittle. 1986. Computer analysis of retroviral pol genes: assignment of enzymatic functions to specific sequences and homologies with nonviral enzymes. Proc. Natl. Acad. Sci. USA 83:7648-7652.
39. Jubier-Maurin, V., G. Cuny, A.-M. Laurent, L. Paquereau, and G. Roizes. 1992. A new 5' sequence associated with mouse L1 elements is representative of a major class of L1 termini. Mol. Biol. Evol. 9:41-55.
40. Karpen, G. H., and A. C. Spradling. 1990. Reduced DNA polytenization of a minichromosome region undergoing position-effect variegation in Drosophila. Cell 63:97-107.
41. Kerrebrock, A. W., R. Srivastava, and S. A. Gerbi. 1989. Isolation and characterization of ribosomal DNA variants from Sciara coprophila. J. Mol. Biol. 20:1-13.
42. Kimmel, B. E., O. K. ole-Moiyoi, and J. R. Young. 1987. Ingi, a 5.2-kb dispersed sequence element from Trypanosoma brucei that carries half of a smaller mobile element at either end and has homology with mammalian LINEs. Mol. Cell. Biol. 7:1465-1475.
43. Kinsey, J. A. 1990. Tad, a LINE-like transposable element of Neurospora, can transpose between nuclei in heterokaryons. Genetics 126:317-323.
44. Kozak, M. 1984. Compilation and analysis of sequences upstream from the translational start site in eukaryotic mRNAs. Nucleic Acids Res. 12:857-872.
45. Li, W.-H., C.-I. Wu, and C.-C. Luo. 1985. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2:150-174.
46. Loeb, D. D., R. W. Padgett, S. C. Hardies, W. R. Shehee, M. B. Comer, M. H. Edgell, and C. A. Hutchison III. 1986. The sequence of a large L1Md element reveals a tandemly repeated 5' end and several features found in retrotransposons. Mol. Cell. Biol. 6:168-182.
47. MacDonald, R. J., G. H. Swift, A. E. Przybyla, and J. M. Chirgwin. 1987. Isolation of RNA using guanidinium salts. Methods Enzymol. 152:219-227.
48. Maniatis, T., E. F. Fritsch, and J. Sambrook. 1982. Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
49. Martin, S. L. 1991. Ribonucleoprotein particles with LINE-1 RNA in mouse embryonal carcinoma cells. Mol. Cell. Biol. 11:4804-4807.
50. Mathias, S. L., A. F. Scott, H. H. Kazazian, Jr., J. D. Boeke, and A. Gabriel. 1991. Reverse transcriptase encoded by a human transposable element. Science 254:1808-1810.
51. McClure, M. A. 1991. Evolution of retroposons by acquisition or deletion of retrovirus-like genes. Mol. Biol. Evol. 8:835-856.
52. Minchiotti, G., and P. P. Di Nocera. 1991. Convergent transcription initiates from oppositely oriented promoters within the 5' end regions of Drosophila melanogaster F elements. Mol. Cell. Biol. 11:5171-5180.
53. Mizrokhi, L. J., S. G. Georgieva, and Y. V. Ilyin. 1988. Jockey, a mobile Drosophila element similar to mammalian LINEs, is transcribed from the internal promoter by polymerase II. Cell 54:685-691.
54. Padgett, R. W., C. A. Hutchison III, and M. H. Edgell. 1988. The F-type 5' motif of mouse L1 elements: a major class of L1 termini similar to the A-type in organization but unrelated in sequence. Nucleic Acids Res. 16:739-749.
55. Paskewitz, S. M., and F. H. Collins. 1989. Site-specific ribosomal DNA insertion elements in Anopheles gambiae and A. arabiensis: nucleotide sequence of gene-element boundaries. Nucleic Acids Res. 17:8125-8133.
56. Pelisson, A., D. J. Finnegan, and A. Bucheton. 1991. Evidence for retrotransposition of the I factor, a LINE element of Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 88:4907-4910.
57. Salazar, C., D. Mills Hamm, C. B. Beard, and F. H. Collins. Unpublished data.
58. Sanger, F., S. Nicklen, and A. R. Coulson. 1977. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74:5463-5467.
59. Severynse, D., C. Hutchison III, and M. Edgell. 1989. Transcriptional regulatory sequences within the L1Md family. Presented at the LINE-1 Related Transposable Elements Workshop, Washington, D.C., 9-11 October.
60. Sharp, P. M., and W.-H. Li. 1989. On the rate of DNA sequence evolution in Drosophila. J. Mol. Evol. 28:398-402.
61. Swergold, G. 1990. Identification, characterization, and cell specificity of a human LINE-1 promoter. Mol. Cell. Biol. 10:6718-6729.
62. Toneguzzo, F., S. Glynn, E. Levi, S. Mjolsness, and A. Hayday. 1988. Use of a chemically modified T7 DNA polymerase for manual and automated sequencing of supercoiled DNA. BioTechniques 6:460-469.
63. Villanueva, M. S., S. P. Williams, C. B. Beard, F. R. Richards, and S. Aksoy. 1991. A new member of a family of site-specific retrotransposons is present in the spliced leader RNA genes of Trypanosoma cruzi. Mol. Cell. Biol. 11:6139-6148.
64. Weiner, A. M., P. L. Deininger, and A. Efstratiadis. 1986. Nonviral retroposons: genes, pseudogenes, and transposable elements generated by the reverse flow of genetic information. Annu. Rev. Biochem. 55:631-661.
65. Wharton, K. A., B. Yedvobnick, V. G. Finnerty, and S. Artavanis-Tsakonas. 1985. opa: a novel family of transcribed repeats shared by the Notch locus and other developmentally regulated loci in D. melanogaster. Cell 40:55-62.
66. Xiong, Y., and T. H. Eickbush. 1988. Similarity of reverse transcriptase-like sequences of viruses, transposable elements, and mitochondrial introns. Mol. Biol. Evol. 5:675-690.
67. Xiong, Y., and T. H. Eickbush. 1988. The site-specific ribosomal DNA insertion element R1Bm belongs to a class of non-long-terminal-repeat retrotransposons. Mol. Cell. Biol. 8:114-123.
68. Xiong, Y., and T. H. Eickbush. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353-3362.
