the centromeric retrotransposons of rice are transcribed ... · this family of retrotransposons...

13
Copyright Ó 2007 by the Genetics Society of America DOI: 10.1534/genetics.107.071902 The Centromeric Retrotransposons of Rice Are Transcribed and Differentially Processed by RNA Interference Pavel Neumann, Huihuang Yan and Jiming Jiang 1 Department of Horticulture, University of Wisconsin, Madison, Wisconsin 53706 Manuscript received February 9, 2007 Accepted for publication March 12, 2007 ABSTRACT Retrotransposons consist of significant portions of many complex eukaryotic genomes and are often enriched in heterochromatin. The centromeric retrotransposon (CR) family in grass species is colonized in the centromeres and highly conserved among species that have been diverged for .50 MY. These unique characteristics have inspired scientists to speculate about the roles of CR elements in organization and function of centromeric chromatin. Here we report that the CRR (CR of rice) elements in rice are highly enriched in chromatin associated with H3K9me2, a hallmark for heterochromatin. CRR elements were transcribed in root, leaf, and panicle tissues, suggesting a constitutive transcription of this retro- transposon family. However, the overall transcription level was low and the CRR transcripts appeared to be derived from relatively few loci. The majority of the CRR transcripts had chimerical structures and contained only partial CRR sequences. We detected small RNAs (smRNAs) cognate to nonautonomous CRR1 (noaCRR1) and CRR1, but not CRR2 elements. This result was also confirmed by in silico analysis of rice smRNA sequences. These results suggest that different CRR subfamilies may play different roles in the RNAi- mediated pathway for formation and maintenance of centromeric heterochromatin. T HE centromeric retrotransposon of rice (CRR) is highly enriched in the centromeres of rice chro- mosomes (Cheng et al. 2002; Nagaki et al. 2005). CRR is a Ty3/gypsy-like retrotransposon (Metaviridae) and belongs to a class of chromoviruses (Gorinsek et al. 2004, 2005; Kordis 2005). This class of retrotransposon is defined by the presence of a chromodomain within its integrase and seems to be widespread among eukar- yotes. The chromodomain identified within chromovi- ruses is related to the chromodomain of heterochromatin protein 1 (HP1) of Drosophila melanogaster (Gorinsek et al. 2005; Kordis 2005), which interacts with histone 3 meth- ylated at lysine 9 (H3K9me) (Bannister et al. 2001; Fischle et al. 2003). Both HP1 and H3K9me are hallmarks of heterochromatin (Elgin 1996; Grewal and Rice 2004). Although it has not been experimentally proved yet, it is likely that the chromodomain of the CRR elements is responsible for their preferential localization in the cen- tromeres as well as in pericentromeric regions. Phylogenetic analysis showed that the CRR family has diverged into two subfamilies of autonomous elements (CRR1 and CRR2) and two subfamilies of nonautono- mous elements (noaCRR1 and noaCRR2) (Nagaki et al. 2005). Close CRR relatives identified in several distantly related grass species (Miller et al. 1998), including maize (centromeric retrotransposon of m aize, CRM) (Zhong et al. 2002; Nagaki et al. 2003), barley (Cereba elements) (Presting et al. 1998; Hudakova et al. 2001), and sug- arcane (Nagaki and Murata 2005), are almost exclu- sively located in centromeres. These results suggest that this family of retrotransposons colonized in the centro- meres of the common ancestor of these grass species and has maintained its centromeric specificity for .50 MY (Kellogg 2001). The comparison of centromeric retro- transposons from rice, maize, and barley revealed several highly conserved motifs even within their long terminal repeats (LTRs), suggesting that the sequences might have evolved under selective pressure at the DNA level (Nagaki et al. 2003). Repetitive DNA elements located in the centromeric heterochromatin are often transcriptionally silent. How- ever, a low level of transcription paradoxically is required for establishing the transcriptionally silent state of het- erochromatin through RNA interference (RNAi) (reviewed in Bernstein and Allis 2005; Matzke and Birchler 2005). In the RNAi pathway, double-stranded RNA (dsRNA) derived from repetitive DNA sequences is processed by the RNAi machinery into short interfer- ence RNA (siRNA). The siRNAs then participate in RNAi by targeting other RNA molecules for degradation and also by driving heterochromatin formation at com- plementary genomic sites (Grewal and Rice 2004; Verdel et al. 2004; Chan et al. 2005; Gendrel and Colot 2005). Thus, the importance of this pathway not only lies in silencing of repetitive DNA elements, which are frequent targets of RNAi, but also seems to be crucial 1 Corresponding author: Department of Horticulture, University of Wisconsin, 1575 Linden Dr., Madison, WI 53706. E-mail: [email protected] Genetics 176: 749–761 ( June 2007)

Upload: others

Post on 07-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Centromeric Retrotransposons of Rice Are Transcribed ... · this family of retrotransposons colonized in the centro-meres of the common ancestor of these grass species and has

Copyright � 2007 by the Genetics Society of AmericaDOI: 10.1534/genetics.107.071902

The Centromeric Retrotransposons of Rice Are Transcribed andDifferentially Processed by RNA Interference

Pavel Neumann, Huihuang Yan and Jiming Jiang1

Department of Horticulture, University of Wisconsin, Madison, Wisconsin 53706

Manuscript received February 9, 2007Accepted for publication March 12, 2007

ABSTRACT

Retrotransposons consist of significant portions of many complex eukaryotic genomes and are oftenenriched in heterochromatin. The centromeric retrotransposon (CR) family in grass species is colonizedin the centromeres and highly conserved among species that have been diverged for .50 MY. Theseunique characteristics have inspired scientists to speculate about the roles of CR elements in organizationand function of centromeric chromatin. Here we report that the CRR (CR of rice) elements in rice arehighly enriched in chromatin associated with H3K9me2, a hallmark for heterochromatin. CRR elementswere transcribed in root, leaf, and panicle tissues, suggesting a constitutive transcription of this retro-transposon family. However, the overall transcription level was low and the CRR transcripts appeared to bederived from relatively few loci. The majority of the CRR transcripts had chimerical structures and containedonly partial CRR sequences. We detected small RNAs (smRNAs) cognate to nonautonomous CRR1 (noaCRR1)and CRR1, but not CRR2 elements. This result was also confirmed by in silico analysis of rice smRNAsequences. These results suggest that different CRR subfamilies may play different roles in the RNAi-mediated pathway for formation and maintenance of centromeric heterochromatin.

THE centromeric retrotransposon of rice (CRR) ishighly enriched in the centromeres of rice chro-

mosomes (Cheng et al. 2002; Nagaki et al. 2005). CRRis a Ty3/gypsy-like retrotransposon (Metaviridae) andbelongs to a class of chromoviruses (Gorinsek et al.2004, 2005; Kordis 2005). This class of retrotransposonis defined by the presence of a chromodomain withinits integrase and seems to be widespread among eukar-yotes. The chromodomain identified within chromovi-ruses is related to the chromodomain of heterochromatinprotein 1 (HP1) of Drosophila melanogaster (Gorinsek et al.2005; Kordis 2005), which interacts with histone 3 meth-ylated at lysine 9 (H3K9me) (Bannister et al. 2001;Fischle et al. 2003). Both HP1 and H3K9me are hallmarksofheterochromatin(Elgin1996;GrewalandRice2004).Although it has not been experimentally proved yet, itis likely that the chromodomain of the CRR elements isresponsible for their preferential localization in the cen-tromeres as well as in pericentromeric regions.

Phylogenetic analysis showed that the CRR family hasdiverged into two subfamilies of autonomous elements(CRR1 and CRR2) and two subfamilies of nonautono-mous elements (noaCRR1 and noaCRR2) (Nagaki et al.2005). Close CRR relatives identified in several distantlyrelated grass species (Miller et al. 1998), including maize(centromeric retrotransposon of maize, CRM) (Zhong

et al. 2002; Nagaki et al. 2003), barley (Cereba elements)(Presting et al. 1998; Hudakova et al. 2001), and sug-arcane (Nagaki and Murata 2005), are almost exclu-sively located in centromeres. These results suggest thatthis family of retrotransposons colonized in the centro-meres of the common ancestor of these grass species andhas maintained its centromeric specificity for .50 MY(Kellogg 2001). The comparison of centromeric retro-transposons from rice, maize, and barley revealed severalhighly conserved motifs even within their long terminalrepeats (LTRs), suggesting that the sequences might haveevolved under selective pressure at the DNA level (Nagaki

et al. 2003).Repetitive DNA elements located in the centromeric

heterochromatin are often transcriptionally silent. How-ever, a low level of transcription paradoxically is requiredfor establishing the transcriptionally silent state of het-erochromatin through RNA interference (RNAi) (reviewedin Bernstein and Allis 2005; Matzke and Birchler

2005). In the RNAi pathway, double-stranded RNA(dsRNA) derived from repetitive DNA sequences isprocessed by the RNAi machinery into short interfer-ence RNA (siRNA). The siRNAs then participate inRNAi by targeting other RNA molecules for degradationand also by driving heterochromatin formation at com-plementary genomic sites (Grewal and Rice 2004;Verdel et al. 2004; Chan et al. 2005; Gendrel andColot 2005). Thus, the importance of this pathway notonly lies in silencing of repetitive DNA elements, whichare frequent targets of RNAi, but also seems to be crucial

1Corresponding author: Department of Horticulture, University ofWisconsin, 1575 Linden Dr., Madison, WI 53706.E-mail: [email protected]

Genetics 176: 749–761 ( June 2007)

Page 2: The Centromeric Retrotransposons of Rice Are Transcribed ... · this family of retrotransposons colonized in the centro-meres of the common ancestor of these grass species and has

for heterochromatin formation and centromere function,which has been documented in several species, includ-ing Schizosacharomyces pombe (Volpe et al. 2003; Cam

et al. 2005; Pidoux and Allshire 2005), humans(Fukagawa et al. 2004), mice (Kanellopoulou et al.2005), Drosophila (Pal-Bhadra et al. 2004; Deshpande

et al. 2005), and Trypanosoma brucei (Durand-Dubief

and Bastin 2003).The high abundance and centromere-specific locali-

zation of the CRR elements in rice provide an idealsystem for studying the transcription and its associationwith heterochromatin formation and centromere func-tion of the centromeric repeats. Here we report a com-prehensive transcription study of the CRR elementsusing a combination of bioinformatics and experimen-tal approaches. We demonstrate that CRR transcriptsare present in all tested rice organs, suggesting that thetranscription of CRR sequences is constitutive. Tran-scription initiation as well as termination occur bothinside and outside of CRR elements. Some of the CRRtranscripts are processed into small RNA (smRNA). Thepotential impact of CRR transcription on centromericheterochromatin formation and centromere function isdiscussed.

MATERIALS AND METHODS

Chromatin immunoprecipitation–polymerase chain reac-tion, reverse transcriptase–polymerase chain reaction, and 39rapid amplification of cDNA ends analyses: A japonica ricevariety Nipponbare was used in all experiments. Chromatinimmunoprecipitation (ChIP) and ChIP–polymerase chainreaction (ChIP–PCR) followed published protocols (Yan et al.2005). Two sets of primers were designed from the CRR1,CRR2, and noaCRR1 elements. These primers were designedfrom the LTR of CRR1 (CRR1_LTRF1 and CRR1_LTRR1), CRR2(CRR2_LTRF1 and CRR2_LTRR1), and noaCRR1 (noaCRR1_LTRF1and noaCRR1_LTRR1); the gag region of CRR1 (CRR1_CDF1and CRR1_CDR1) and CRR2 (CRR2_CDF1 and CRR2_CDR1); and the 59-UTR region of noaCRR1 (noaCRR1_insF1 andnoaCRR1_insR1) (supplemental Table 1 at http://www.genetics.org/supplemental/). Reverse transcriptase–polymerase chainreaction (RT–PCR) and 39 rapid amplification of cDNA ends(RACE) were according to Lee et al. (2006). Random hexamerswere used for the reverse transcription in RT–PCR, and the 39-RACE_oligoT primer (59-GGC CAC GCG TCG ACT AGT ACTTTT TTT TTT TTT TTT TTV-39) was used in 39-RACE. RT–PCR reaction, using primers specific for actin, actin_F1 (59-CTC GTC TGC GAT AAT GGA A-39), and actin_R1 (59-ATCCTC CAATCC AGA CAC TG-39), was carried out as a control ofequal RNA loading. GeneRACER kit (Invitrogen, San Diego)was used for 59-RACE. Nested PCR was employed to increasethe sensitivity and specificity of both RACE methods. Productsof the first PCR were used as a template in the second PCR thatwas carried out using nested primers. The first 39-RACE PCRwas done using primers CRR_ppt1F and AUAP_39-RACE (59-GGC CAC GCG TCG ACTAGTAC-39). 39-RACE nested PCR wasdone using a primer specific to a particular CRR subfamily(CRR1_1F, CRR2_1F, or noaCRR1_1F) and AUAP_39-RACE.The first 59-RACE PCR was done using primers CRR_pbs1F andRACER_PN1 (59-GGA GCA CGA GGA CAC TGA-39). The 59-RACE nested PCR was done using a primer specific for a partic-

ular CRR subfamily (CRR1_2R, CRR2_2R or noaCRR1_2R) andRACER_PN2 (59-CAC TGA CAT GGA CTG AAG GA-39).

Cloning, sequencing, and sequence analyses: Both RT–PCRand RACE products were cloned into pCRII TOPO vectorusing the TOPO TA cloning kit (Invitrogen). The sequencingof selected clones was done by the dideoxy-mediated chain-termination method using the BigDye terminator v3.1 cyclesequencing kit (Applied Biosystems, Foster City, CA). Sequen-ces of RT–PCR and RACE products were assembled usingStaden Package software (Staden 1996) and all of them weresubmitted into the GenBank database (supplemental Table 2at http://www.genetics.org/supplemental/).

GenBank EST (Boguski et al. 1993) and KOME full-lengthcDNA (fl-cDNA) (Kikuchi et al. 2003) were searched for CRR-similar sequences using the Blast program (Altschul et al.1997). The sequences were analyzed using programs imple-mented at the Biology WorkBench website (http://workbench.sdsc.edu/), EMBOSS (Rice et al. 2000), and Staden Packagesoftware (Staden 1996). Preliminary sequence compari-sons were done using the Dotter and J-dotter programs(Sonnhammer and Durbin 1995; Brodie et al. 2004). Multi-ple alignments were made using CLUSTALW (Thompsonet al.1994). RPS-Blast (Marchler-Bauer et al. 2003) and InterPro-Scan (Zdobnov and Apweiler 2001) were used to searchconserved domain and InterPro protein databases, respec-tively. Splice site analysis was performed using the FSPLICEprogram implemented at the SoftBerry server (http://www.softberry.com/berry.phtml). Databases of CRR elements, RT–PCR products, fl-cDNAs, ESTs, and rice genome pseudomole-cules were screened and compared with one another usingStandalone Blast. The database of CRR elements contained allsequences published by Nagaki et al. (2005). Sequences of therice genome pseudomolecules (build 4.0 release) were down-loaded from the International Rice Genome SequencingProject website (http://rgp.dna.affrc.go.jp/IRGSP/Build4/build4.html). Sequences of CRR elements that we consideredtypical for each CRR subfamily and used for database searchesand basic sequence comparisons were downloaded from theGenBank database under the following accession nos.: AY827957(CRR1_CH1-2), AY828150 (CRR1_CH10-1), AY828151 (CRR1_CH10-2), AY827959 (CRR2_CH1-1), AY828019 (CRR2_CH4-2),AY828024 (CRR2_CH4-7), AY828152 (CRR2_CH10-1),AY828039 (noaCRR1_CH4-5), and AY828004 (noaCRR2_CH1-3). Two databases of smRNA sequences were searched toidentify CRR-derived smRNA sequences (Sunkar et al. 2005b;Johnson et al. 2007). Sequences of all CRR elements describedby Nagaki et al. (2005) and sequences of RT–PCR and RACEproducts were used as queries in the smRNA database search.

Northern blot hybridization: Total RNA was isolated fromleaves and treated with DNaseI (Ambion, Austin, TX). Approx-imately 40 mg RNA was loaded in each lane and resolvedon denaturing 1% agarose gel and then transferred on NytranSPC nylon membrane (Schleicher & Schuell BioScience, Keene,NH). Probes were radioactively labeled with 3000 Ci/mmol[a-32P]dATP (Amersham Biosciences, Piscataway, NJ) usingthe Strip-EZ DNA kit (Ambion). PCR-amplified inserts fromclones ID51 (CRR1 LTR; DV468825), ID69 (CRR2 LTR;DV468838), ID13 (CRR1 reverse transcriptase coding domain,rt; DV468787), ID18 (CRR2 rt; DV468782), ID101 (noaCRR1LTR; DV468869), and ID45 (noaCRR1 gag; DV468819) wereused as templates for the labeling. The hybridization wasperformed overnight in 125 mm sodium phosphate buffer(pH 7.2) containing 50% deionized formamide, 7% SDS, and250 mm sodium chloride at 50�. After the hybridization,membranes were washed twice in 23 SSC and 0.1% SDS for15 min, twice in 0.23 SSC and 0.1% SDS for 15 min, and,finally, once in 53 SSC and 0.5% SDS for 10 min at 65�. Signalswere detected using a phosphorimager.

750 P. Neumann, H. Yan and J. Jiang

Page 3: The Centromeric Retrotransposons of Rice Are Transcribed ... · this family of retrotransposons colonized in the centro-meres of the common ancestor of these grass species and has

Detection of smRNA: Low-molecular-weight RNA was iso-lated using the mirVana microRNA (miRNA) isolation kit(Ambion). Ten micrograms of RNA was resolved on denatur-ing 15% polyacrylamide gel and then transferred to NytranSPC nylon membrane (Schleicher & Schuell BioScience).Probes were labeled with 800 Ci/mmol [a-32P]UTP using theMAXIscript kit for in vitro transcription labeling (Ambion).The templates for in vitro transcription were prepared fromRACE clones ID51 (CRR1 LTR; DV468825), ID69 (CRR2 LTR;DV468838), and ID101 (noaCRR1 LTR; DV468869). Promotersequences for T7 polymerase were added to the CRR sequencesby PCR using primer pairs T71CRR1_1F (59-TAA TAC GACTCA CTA TAG GGT GAT GAG GAC ATC CCT TCC-39) andAUAP_39-RACE, T71CRR2_1F (59-TAA TAC GAC TCA CTATAG GGT GAT GAG GAC ATC AAC ACC A-39) and AUAP_39-RACE or T71RACER_PN2 (59-TAA TAC GAC TCA CTA TAGGGC ACT GAC ATG GAC TGA AGG A-39) and noaCRR1_2R.To visualize marker RNA, 0.5 fmol of the marker-specific tem-plate was added to the labeling reactions. The hybridizationwas performed overnight in 125 mm sodium phosphate buffer(pH 7.2) containing 50% deionized formamide, 7% SDS, and250 mm sodium chloride at 42�. After the hybridization, mem-branes were washed three times in 23 SSC and 0.1% SDS for10 min, twice in 13 SSC and 0.1% SDS for 15 min, and, finally,once in 53 SSC and 0.5% SDS for 10 min at 50�. DNA oli-gonucleotide probes P98-A8 (59-ACA CGG ACA TCT ACA GAAAAA-39) and OsmiR156 (59-TGT GCT CAC TCT CTT CTGTCA-39) were labeled at 59-ends with [g-32P]ATP (AmershamBiosciences) using T4 polynucleotide kinase (Invitrogen) accord-ing to manufacturer’s protocol. For these two probes, temper-atures of hybridization and washing were changed to 38� and 42�,respectively. Signals were detected using a phosphorimager.

RESULTS

CRR elements are enriched in chromatin associatedwith histone H3 dimethylation at Lys 4: Cytologicalanalysis showed that both CRR1 and CRR2 elements arealmost exclusively located in the centromeres, whereasnoaCRR1 elements are located in both centromeric andpericentromeric regions (Nagaki et al. 2005). We per-formed ChIP analysis to reveal whether the CRR elementsare preferentially associated with rice heterochromatin.We used antibodies against histone H3 dimethylation atLys 4 (H3K4me2), which is preferentially associated withtranscriptionally active euchromatin, and histone H3dimethylation at Lys 9 (H3K9me2), which is a hallmarkof repressive heterochromatin ( Jenuwein and Allis

2001). Two pairs of PCR primers were designed fromthe CRR1, CRR2, and noaCRR1 elements. Quantita-tive real-time PCR analysis revealed significant enrich-ments of the CRR sequences in the immunoprecipitatedDNA associated with H3K9me2 but not with H3K4me2(Figure 1). These results demonstrate that CRR1, CRR2,and noaCRR1 elements are highly enriched in chromo-somal domains associated with heterochromatin.

Bioinformatic analysis of transcribed CRR sequencesin databases: To find if CRR elements are transcribed inthe rice genome, we first searched the KOME (fl-cDNA)(Kikuchi et al. 2003) and GenBank EST (Boguski et al.1993) databases. Sequences typical for each of four CRRsubfamilies (CRR1, CRR2, noaCRR1, and noaCRR2)

described by Nagaki et al. (2005) were used as queries.We found 25 CRR-like fl-cDNA sequences (from a total of28,469), which originated from natural plant tissues orcallus (supplemental Table 3 at http://www.genetics.org/supplemental/). Of these fl-cDNA sequences, 7 belongedto CRR2, 14 to noaCRR1, and only 1 to noaCRR2 sub-families (Figure 2; supplemental Table 3 at http://www.genetics.org/supplemental/). The remaining 3 fl-cDNAsequences had chimerical structures containing sequen-ces derived from different CRR subfamilies or from CRR3,a previously uncharacterized subfamily.

Only two fl-cDNA sequences, AK064474 and AK106612,appeared to contain exclusively CRR-related sequences.These two fl-cDNAs had a similar structure, containingthe 39-part of the coding region, PPT, and an incom-plete 39-LTR (loci J and Q in Figure 2). All other fl-cDNAaccessions (23 of 25; 92%) were chimerical transcriptscontaining both CRR-related and CRR-unrelated se-quences. Most of the CRR-related regions correspondedto the LTRs which, in some cases, were extended into the59-UTR or preceded by the coding region (Figure 2). Noneof the fl-cDNAs contained the complete polyprotein-coding region. The CRR-unrelated regions were alwayslocated upstream of the CRR-related regions, indicatingthat upstream, not CRR-related promoters, drove the tran-scription of these sequences. Seventeen fl-cDNA sequen-ces terminated in the CRR-related regions and 8 extendedinto downstream CRR-unrelated regions. Six fl-cDNAsequences terminated within the A-rich region in the

Figure 1.—Association of CRR elements with H3K4me2and H3K9me2. ChIP–PCR primers were designed from the LTR(CRR1, CRR2, noaCRR1), gag region (CRR1 and CRR2), and59-UTR (noaCRR1). The presence of H3K4me2 or H3K9me2was inferred when enrichment of an antibody-binding fractionover mock DNA was significant in Student’s t -test (a ¼ 0.05, n ¼3). Relative fold enrichment (RFE) is calculated as described(Yan et al. 2005) and indicated by the height of the bar with stan-dard error; the dashed line (RFE ¼ 1) represents the RFE of thereference gene Cen8.t00238 that shows no H3K4me2 and noH3K9me2 (Yan et al. 2005). C1 (gene Cen8.t00390 ; see Yan et al.2005) is a positive control for H3K4me2. C2 (gene Cen8.t00633;see Yan et al. 2005) is a positive control for H3K9me2.

Centromeric Retrotransposons of Rice 751

Page 4: The Centromeric Retrotransposons of Rice Are Transcribed ... · this family of retrotransposons colonized in the centro-meres of the common ancestor of these grass species and has

59-UTR of CRR sequences, which likely worked as a falsepoly(A) tail during reverse transcription. Interestingly,the majority of fl-cDNA sequences (22 of 25; 88%) wasderived from the plus strand of the CRR elements.

The fl-cDNA sequences were used to search thegenome sequence of Oryza sativa cv. Nipponbare to findcorresponding genomic loci. The identified genomic locishared 99.7–100% sequence similarity with the fl-cDNAs.

Figure 2.—Structure of CRR-related fl-cDNAs and their corresponding genomic loci. The fl-cDNA sequences (exons) areshown as white rectangles and their orientation is from the 59- to the 39-end. The only exception is fl-cDNA AK069963 (chromo-some 11), which is shown in the opposite orientation. The longest ORFs are highlighted in blue. The genomic loci are named A–Sand their positions in the rice chromosome molecules (version 4.0) are summarized in Table 1. CRR-related regions and CRR-unrelated regions within the genomic loci are depicted as red rectangles and thick black lines, respectively. The horizontal blackarrows show the position and orientation of LTRs. The orientation of sequences lacking LTR is depicted by white arrows. Thevertical black arrows mark the position of A-rich regions, which likely represent false poly(A) tails in corresponding fl-cDNA se-quences. The imperfect tandem repeat present in the genomic locus ‘‘I’’ is depicted as an array of yellow arrowheads (note thattheir number does not reflect the number of monomers).

752 P. Neumann, H. Yan and J. Jiang

Page 5: The Centromeric Retrotransposons of Rice Are Transcribed ... · this family of retrotransposons colonized in the centro-meres of the common ancestor of these grass species and has

Thirteen genomic sequences were interrupted by one tonine introns, which were identified as regions missing inthe fl-cDNA sequences and bordered by the canonicalGT-AG dinucleotides. The intron spliced out from se-quence AK102311 was extremely long (12,999 bp) andwas composed largely of a cluster of an unknown tan-dem repeat (locus I in Figure 2). Comparison of multiplefl-cDNAs derived from the same genomic loci revealedalternative splicing and variability in the position of poly(A)tails (loci A, B, E, and O in Figure 2). The genomic lociwere also analyzed with respect to the CRR sequences.Full-length elements were identified in eight loci. Theremaining genomic loci contained solo LTR or trun-cated CRR sequences. The similarity between two LTRsof the full-length elements varied between 90.33% and99.91%. Considering that 59- and 39-LTRs are identicalupon insertion and that the rate of nucleotide substitu-tion is 6.5 3 10�9 (Gaut et al. 1996), the age of these CRRelements was estimated to be from 0.1 to 15 MY.

The search in the EST database in the GenBank re-vealed 55 accessions (of 406,790 ESTs) with sequencesimilarity to CRRs. Most of them (38) were similar tothe noaCRR1 subfamily. Six sequences shared similaritywith CRR1, six with CRR2, and five with noaCRR2 sub-families. About half of the ESTs had a chimerical struc-ture containing CRR-unrelated sequences. CRR-relatedregions mostly corresponded to the LTR or the 59-UTRregions (supplemental Table 4 at http://www.genetics.org/supplemental/). The internal coding region waspresent in only eight ESTs and it was exclusively repre-sented by a short sequence upstream of the 39-LTR. Only

six EST sequences (11%) originated from plants grownunder normal physiological conditions, whereas most ofthem (78%) were from stressed or transgenic plants orfrom callus tissue (supplemental Table 4 at http://www.genetics.org/supplemental/). No EST sequence wasidentical to any of the KOME fl-cDNAs.

Experimental confirmation of CRR transcription:RT–PCR was used to assess the transcriptional patternsof the CRR elements in different rice organs (root, leaf,and panicles) and to exploit transcripts associated withthe internal regions of CRRs, which were not found inthe fl-cDNA/EST databases. RT–PCR was performedusing primers specific for three CRR subfamilies, includ-ing CRR1, CRR2, and noaCRR1 (supplemental Table 1 athttp://www.genetics.org/supplemental/). These primerscovered the entire internal region of individual CRRelements in 0.5- to 1-kb intervals. Since the similarity inthe pol region is very high between CRR1 and CRR2, weused the same primer pairs to cover this region for bothsubfamilies.

RT–PCR products of expected sizes were obtainedin all reactions and showed the same pattern in threedifferent organs (Figure 3). However, there were differ-ences in yield in many cases, mostly showing a higherabundance of CRR transcripts in panicles than inroots and leaves (Figure 3, A and B). As the actin con-trol showed a similar yield in all three organs, those dif-ferences were not due to unequal RNA loading. Oneconsiderably shorter fragment was amplified in additionto the fragment of expected size using primers CRR1/2_2F and CRR1/2_3R (Figure 3A). Negative controls,

Figure 3.—RT–PCR anal-ysis of CRR transcription. RT–PCR was performed using 14primer pairs specific to bothCRR1 and CRR2 subfamilies(A); to CRR1 only (B); toCRR2 only (C); and tonoaCRR1 only (D). RNA iso-lated from roots (R), leaves(L), or panicles (P) was usedas a template. RT–PCR withprimers specific for the actingene was carried out as anRNA loading control (E).The approximate locationsof the primers within theCRR elements are marked inapproximate positions of am-plified regions within autono-mous (CRR1 and CRR2) andnonautonomous (noaCRR1)elements (F). Only onescheme is shown for bothautonomous CRR elements,

CRR1 and CRR2, because their structure is the same. Primer combinations were as follows: 1—CRR1/2_1F and CRR1/2_2R;2—CRR1/2_2F and CRR1/2_3R; 3—CRR1/2_3F and CRR1/2_4R; 4—CRR1/2_4F and CRR_ppt1R; 5—CRR1_2F and CRR1_3R;6—CRR1_3F and CRR1_4R; 7—CRR1_4F and 5R; 8—CRR2_2F and CRR2_3R; 9—CRR2_3F and CRR2_4R; 10—CRR2_4F and CRR2_5R;11—noaCRR1_2F and noaCRR1_3R; 12—noaCRR1_6F 1 noaCRR1_7Rb; 13—noaCRR1_4F 1 noaCRR1_5R; 14—noaCRR1_5F andCRR_ppt1R.

Centromeric Retrotransposons of Rice 753

Page 6: The Centromeric Retrotransposons of Rice Are Transcribed ... · this family of retrotransposons colonized in the centro-meres of the common ancestor of these grass species and has

which were included for each primer combination andeach RNA sample, yielded no products. RT–PCR prod-ucts amplified from leaf and in few cases from root andpanicle were cloned and sequenced. All of the obtainedsequences (64 total; supplemental Table 2 at http://www.genetics.org/supplemental/) shared high similar-ity with CRR elements. Most sequences derived from thecoding region contained one intact open reading frame(ORF), which could be directly translated into protein.The putative protein sequences contained motifs of ac-tive sites, which suggests that they could be functional(data not shown).

To examine the abundance and size variability ofvarious CRR transcripts described above, we carried outNorthern blot hybridizations using RNA isolated fromleaves. Blots containing �40 mg of total RNA were hy-bridized with probes specific to the LTR region of CRR1,CRR2, and noaCRR1, the RT-coding region of CRR1and CRR2, and the GAG-coding region of noaCRR1. Al-though all probes were prepared from RT–PCR orRACE clones (see below), strong and unambiguoussignals were detected only by the CRR2 LTR probe andthe two noaCRR1 probes (Figure 4A). These three probeshybridized to an�3.1-kb long transcript. The CRR2 LTRprobe hybridized to this transcript strongly. However,both noaCRR1 probes hybridized weakly (Figure 4A).We assume that the transcript likely originated from aCRR2 element and that the weak hybridization to thenoaCRR1 probes was due to partial similarity betweenCRR2 and noaCRR1 sequences. Its size, which is consid-erably shorter than would be expected for a hypothet-ical full-length CRR2 transcript (7–8 kb), suggests thatthe transcript was prematurely terminated, spliced, ororiginated from a truncated element. In addition to thistranscript, the three probes also hybridized as a 4- to15-kb smear having the highest intensity between 7 and10 kb. The Northern hybridization results revealed a lowlevel of transcription, suggesting that most of the CRRelements are transcriptionally silent.

Mapping of 59- and 39-ends of CRR transcripts: Al-though database searches and RT–PCR results showedthat CRR sequences are transcribed in the rice genome,it remained unclear whether CRR transcription can beinitiated from the CRR promoters and whether thereare termination (polyadenylation) sites additional tothose identified in some fl-cDNA and EST sequences.To address these questions, we conducted 59- and 39-RACE analyses using RNA isolated from leaves. BothRACE methods resulted in amplification of one or moremajor products for each of three CRR subfamilies(Figure 5). These products were cloned and several ran-domly selected clones were sequenced. The analysis of17 39-RACE and 27 59-RACE sequences revealed severalsites of transcription initiation as well as terminationwithin each CRR subfamily (supplemental Figure 1 athttp://www.genetics.org/supplemental/). Transcriptioninitiations were found within the LTR but also in

sequences located upstream of CRR elements. On theother hand, all polyadenylation sites were found withinthe LTR sequences. Importantly, some 59-RACE and39-RACE products contained overlapping sequences.Since the overlap sequences between 59- and 39-LTRs inretrotransposon transcripts are crucial for replication(Wilhelm and Wilhelm 2001), this finding suggeststhat some CRR transcripts are capable of replication.

CRR transcripts are processed into smRNA: To findif CRR transcripts are processed into smRNA, we con-ducted Northern hybridizations using blots containinglow-molecular-weight RNA isolated from rice leaves(Figure 4B). The blots were hybridized with the CRR-specific probes derived from LTRs (see materials and

methods). The noaCRR1 probe hybridized strongly tothe smRNA of �23–24 nt in length (Figure 4C). FaintsmRNA signals were also detected with the CRR1 probe.

Figure 4.—Detection of CRR transcription and CRRsmRNAs by Northern blot hybridization. (A) Detection ofCRR transcripts using five different probes derived fromCRR1, CRR2, and noaCRR1. RNA ladder between 0.24 and9.9 kb was used as a marker (M). Signals were detected onlywith CRR2 LTR and the two noaCRR1 probes, which hybrid-ized to an �3.1-kb transcript (solid arrowheads) and as asmear between 4 and 15 kb. (B) Low-molecular-weight RNAwas separated on 15% polyacrylamide gel. A faint band corre-sponding to the smRNA is marked by an open arrowhead. (C)Blots were hybridized with four CRR probes. The noaCRR1probe hybridized strongly to the smRNA of 23–24 nt. TheCRR1 probe generated several faint hybridization signals tothe smRNA of �21–25 nt. The CRR2 LTR and P98-A8 probesdid not hybridize with the smRNA. Probe OsmiR156 was usedto check RNA quality on blots.

754 P. Neumann, H. Yan and J. Jiang

Page 7: The Centromeric Retrotransposons of Rice Are Transcribed ... · this family of retrotransposons colonized in the centro-meres of the common ancestor of these grass species and has

However, the CRR2 probes did not hybridize with smRNAs.The CRR2 LTR probe detected an�3.1-kb CRR transcripton the total RNA blots (Figure 4A), which implies that themajority, if not all, of CRR2 transcripts escaped process-ing by the RNAi machinery in rice leaves. To check RNAquality, the blots were reprobed with a probe specific forpreviously described miRNA OsmiR156 (Sunkar et al.2005a). Both sizes of this miRNA (20 and 21 nt) were con-sistently detected, showing that the small RNA on all blotswas intact (Figure 4C).

In addition to the experimental detection of CRR-derived smRNA, we also searched for CRR-related smRNAsin rice databases (Sunkar et al. 2005b; Johnson et al. 2007).The search within the databases of Sunkar et al. (2005b)and Johnson et al. (2007) revealed 1 and 25 sequences,respectively, with high similarity to CRR elements (supple-mental Table 5 at http://www.genetics.org/supplemental/).Of these smRNAs, 1 was derived from CRR1, 4 fromCRR2, 17 from noaCRR1, and 4 from noaCRR2. Themajority (88%) of smRNA sequences originated fromthe LTR and 59-UTR regions (supplemental Table 5at http://www.genetics.org/supplemental/). This is inagreement with the fact that these two regions weremost represented among the fl-cDNA and EST sequen-ces described above. Some smRNA sequences showed aperfect match to the fl-cDNA and EST sequences but itwas not possible to determine whether they originatedfrom these sequences because they have multiple iden-tical matches to different genomic loci. The only excep-tion was smRNA15884, which matches only one locusin the entire rice genome. All identified CRR-derivedsmRNAs occurred only one to two times among 35,454in the genome-mapped smRNA collection (supplemen-tal Table 5 at http://www.genetics.org/supplemental/

and Johnson et al. 2007). Importantly, five smRNAsequences shared significance with the noaCRR1 and1 with the CRR1 probes that hybridized to the smRNAin Northern hybridization (Figure 4). In contrast, nosmRNA sequence was similar to the CRR2 probe, whichdid not hybridize to the smRNA. Thus, our experimen-tal data are in agreement with the result of the compu-tational analysis of CRR-derived smRNA sequences.

Splicing of CRR sequences: Two spliced regions werefound in the CRR1 sequences derived from RT–PCRand 39-RACE analyses. The first one was identified as amissing region in the shorter fragment amplified byprimers CRR1/2_2F and CRR1/2_3R (Figure 3A). Themissing region proved to be an intron because it wasbordered by canonical 59GT and AG 39 dinucleotides.This was also confirmed by absence of the spliced-sizeproduct when the genomic DNA was used as a templatefor PCR (data not shown). The splicing of this regionresulted in a complete removal of the RT-coding domainpredicted by the RPS-Blast program. The position of theacceptor site was the same in all analyzed sequences.However, there were three different donor sites (DS1–3)located very close to one another and alternatively usedfor the splicing (Figure 6). As the donor sites were placedin three different reading frames, the impact of thesplicing on the translation of the downstream codingdomain (RNaseH, integrase, chromodomain) differed.The splicing at the site DS2 removed only the RT-codingdomain but still allowed the translation of the down-stream domains, whereas the splicing at sites DS1 andDS3 separated upstream and downstream coding re-gions into different reading frames and most likelysuppressed translation of the downstream domains.Although all spliced sequences had higher similarityto CRR1 than to the CRR2 subfamily, comparison ofthis region showed that donor and acceptor sites areconserved in all CRR subfamilies and also in the relatedCRM (maize) and Cereba (barley) elements (Figure 6).

The second spliced region was found in the chromo-domain-coding region in the 39-LTR of CRR1 elements.It was detected in one 39-RACE (DV468825) and twoEST (CA998740 and CA999991) sequences as a deletionbetween donor and acceptor splice sites predicted byFSPLICE program (Figure 7). Splice-site analysis of thecorresponding region in CRR2 sequences revealed thesame donor but a different acceptor site. One fl-cDNACRR2 sequence (AK121472) was spliced at the same donorsite, which suggests that CRR2 transcripts were spliced inthe same region as CRR1. The GT dinucleotide at thedonor site was fully conserved among other sequences ofcentromeric retrotransposons (Figure 7). However, theAG dinucleotide in both of the two acceptor sites waspresent in CRR and CRM but absent in Cereba sequences.

Comparison of the fl-cDNA AK067185 with the corre-sponding genomic sequence revealed a splicing event inthe coding region of noaCRR1 (locus F in Figure 2; supple-mental Figure 2 at http://www.genetics.org/supplemental/).

Figure 5.—Mapping ends of CRR transcripts using RACE.(Left) 39-RACE products. (Right) 59-RACE products. DNAfrom bands marked by arrowheads was cloned.

Centromeric Retrotransposons of Rice 755

Page 8: The Centromeric Retrotransposons of Rice Are Transcribed ... · this family of retrotransposons colonized in the centro-meres of the common ancestor of these grass species and has

This event removed the entire GAG-coding domain, butallowed potential translation of the downstream codingsequence (data not shown). Although we identified onlyone sequence spliced in this region, donor and acceptorsites were conserved in 16 of 26 noaCRR1 elements an-alyzed (supplemental Figure 2 at http://www.genetics.org/supplemental/), which suggests that their tran-scripts may also undergo splicing. Two additional fl-cDNAs (AK064150 and AK111287) were spliced withinthe 59-LTR at the same acceptor site, which seems to behighly conserved among noaCRR1 elements (data notshown). However, as the donor sites were located at dif-ferent positions in CRR-unrelated sequences, it is notclear whether full-size noaCRR1 transcripts can bespliced in the LTR as well.

DISCUSSION

Transcription of the CRR elements: The rice genomeis occupied by many families of LTR retrotransposons

(Gao et al. 2004). However, the transcriptional or trans-positional activities are largely unknown for most riceretrotransposons. Transcripts derived from the RIRE9family were detected in leaves and stems but not inroots (Li et al. 2000). Retrotransposon-related se-quences belonging to several other families were foundin the rice EST database (Vicient et al. 2001a; Jiang

et al. 2002). Retrotransposons are generally silent, butcan be activated by various stress conditions (Wessler

1996; Grandbastien 1998). Few rice retrotransposonfamilies (Tos10, Tos17–20) are transpositionally activeunder tissue culture conditions (Hirochika et al. 1996;Hirochika 1997). Nevertheless, it has been demon-strated in several species that some retrotransposonfamilies, or rather only some of their copies, are tran-scriptionally active (Suoniemi et al. 1997; Meyers et al.2001; Rossi et al. 2001; Vicient et al. 2001a,b; Echenique

et al. 2002; Neumann et al. 2003, 2005), which sug-gests that they can escape from the host silencingmechanisms.

Figure 6.—Alignmentofsequencesofcentromeric retrotransposons in theregionspliced inCRR1transcripts.DV468781–DV468790are GenBank accession nos. of sequences derived from the RT–PCR products. Donor (GT; marked DS1–DS3) and acceptor (AG) splicesites detected in CRR1 and found in other CR sequences are shown against a solid background. Shading marks additional GTand AGdinucleotides predicted as potential splice sites using the FSPLICE program. The reverse-transcriptase domain identified within the pu-tative protein sequence of CRR1-CH10 using RPS-Blast (Marchler-Bauer et al. 2003) is in boldface type and underlined. Asterisks markthe positions of fully conserved nucleotides. Dots indicate parts of the intron sequences that are not shown in the alignment.

756 P. Neumann, H. Yan and J. Jiang

Page 9: The Centromeric Retrotransposons of Rice Are Transcribed ... · this family of retrotransposons colonized in the centro-meres of the common ancestor of these grass species and has

CRR transcripts were detected by RT–PCR in all threeorgans tested (root, leaf, panicle), suggesting that theCRR-related sequences are transcribed constitutively.Although transcription can be initiated from promoterslocated within 59-LTRs (supplemental Figure 1 at http://www.genetics.org/supplemental/), analysis of the fl-cDNAsequences showed that most CRR transcripts are prob-ably initiated from promoters upstream of the CRRelements (Figure 2). The level of CRR transcription ap-peared to be low. The frequencies of CRR transcriptsin the fl-cDNA and EST databases are 0.089% and0.013%, respectively. Similar EST frequencies were re-ported for both Ty3/gypsy and Ty1/copia retrotranspo-sons present in other Graminae species (Vicient et al.2001a; Echenique et al. 2002), suggesting that the CRRfamily does not differ significantly in transcriptionallevel from other retrotransposon families. The low levelof transcription of the CRR sequences, except for a fewspecific loci, was confirmed by Northern blot hybridiza-tion (Figure 4A). We also found that many CRR transcriptswere possibly derived from relatively few genomic ele-ments. Several fl-cDNAs were derived from the samegenomic loci (Figure 2). A total of 24 RT–PCR and RACEsequences had the highest similarity to the CRR2 elementassociated with fl-cDNA AY828096 (locus N in Figure 2).Six RT–PCR sequences were most similar to three differ-ent regions of one noaCRR1 element (DQ458290). All sixCRR2 39-RACE sequences seemed to originate from thesame genomic locus (locus E in Figure 2). These results

suggest that only a few CRR elements escape silencing andcontribute to the transcription.

We mapped the chromosomal locations of the geno-mic loci corresponding to the fl-cDNAs (Table 1). Twoloci were mapped to regions of chromosomes 3 and10 that contain the centromere-specific satellite CentOarrays. Chromosomes 3 and 10 contain ,500 kb of theCentO repeats (Cheng et al. 2002; Yan et al. 2006).These two loci are likely located within the CENH3-binding domains that span �750 kb in chromosome 8and�1800 kb in chromosome 3 (Nagaki et al. 2004; Yan

et al. 2006). Nine other loci were located within 5 Mbpfrom the CentO arrays. The remaining eight loci weremore distant (up to �21 Mbp) from the centromeres.These results show that the CRR elements located inthe centromeric regions are not more frequently tran-scribed than those in the pericentromeric regions.

Possible function of the CRR transcripts: We ana-lyzed the potential coding capacity of the CRR-relatedfl-cDNAs. The longest ORF was considered a potentialcoding sequence. The length of such coding sequencesvaried from 99 to 2046 nt (Figure 2). Comparison of thepositions between the longest ORFs and CRR-relatedregions showed that they were overlapping in 13 (52%)fl-cDNAs, which in some cases was reflected by similarityto CRR putative polyprotein sequences (data not shown).The remaining fl-cDNAs contained a CRR-related re-gion exclusively upstream (2 fl-cDNAs) or downstream(10 fl-cDNAs) of the longest ORF. Proteins translated

Figure 7.—LTR sequences are spliced in the chromodomain-coding region. (A) Alignment of DNA sequences. Donor (GT)and acceptor (AG) splice sites are shown against a solid background. Asterisks mark the positions of fully conserved nucleotides.Dots indicate parts of the intron sequences that are not shown in the alignment. (B) Alignment of chromodomain sequences fromdifferent families of centromeric retrotransposons. The chromodomain in the Cereba protein defined by Kordis (2005) is boxed.Parts missing in putative proteins translated from spliced CRR1 and CRR2 sequences are in boldface type.

Centromeric Retrotransposons of Rice 757

Page 10: The Centromeric Retrotransposons of Rice Are Transcribed ... · this family of retrotransposons colonized in the centro-meres of the common ancestor of these grass species and has

from these fl-cDNAs were used as queries for searchesmade by RPS-Blast (Marchler-Bauer et al. 2003) andInterProScan (Zdobnov and Apweiler 2001), but nosignificant similarity to any proteins with known func-tion was found.

The role of partial and chimerical CRR transcripts canonly be speculated. Some of them may encode for pro-teins of unknown function, while others may representnoncoding RNAs with regulatory functions (Morey andAvner 2004; Costa 2005). Several recent reports dem-onstrated the function of noncoding RNA on hetero-chromatin structure and centromere function. Maison

et al. (2002) were the first to demonstrate that the higher-order structure of the centromeric heterochromatin inmice depends on the presence of an RNA component.In maize, transcripts ranging in size from 40 to 900 ntderived from CRM elements and centromeric satelliterepeat CentC are tightly bound to CENH3, suggesting apotential role of such noncoding RNA in the specifica-tion of centromeric chromatin (Topp et al. 2004). Mostrecently, Bouzinba-Segard (2006) showed that centro-meric RNAs transcribed from the centromeric minorsatellite repeat of mice are localized to centromeres.Forced accumulation of the 120-nt transcripts leads todefects in chromosome segregation and sister-chromatidcohesion (Bouzinba-Segard et al. 2006). The authorsproposed that the centromeric RNAs may play a role inregulation of heterochromatin assembly by tethering ofkinetochore- and heterochromatin-associated proteins.

Our PCR-based experiment could not prove the hy-pothesis that CRR promoters may play a significant rolein transcription of the downstream sequences, such asCentO repeats, which are highly intermingled with CRRelements in rice centromeres (Cheng et al. 2002).Cotranscripts of CRR and CentO repeats neither werefound in the sequence databases nor were detectedusing RT–PCR (data not shown). In addition, we did notdetect transcripts by RT–PCR using primers designedfrom downstream sequences of specific CRR elementsin 12 loci within the centromere of chromosome 8 (datanot shown). Thus, although some CRR insertions canprovide active promoters, as we showed by 59-RACE ex-periments, their impact on transcription of the down-stream sequences may be limited. This is in agreementwith the result of May et al. (2005) who detected co-transcripts of the Arabidopsis thaliana centromeric satel-lite repeat cen180 and retrotransposon Athila (106B) inthe met1 mutant but not in the wild-type plant.

We propose that the long transcripts and smRNA de-rived from CRR elements may contribute to the for-mation of centromeric heterochromatin in rice. Onepossible explanation for the low level of CRR transcrip-tion is that most CRR transcripts may be turned overquickly by the RNAi pathway or integrated into centro-meric chromatin. Several recent studies reported that asignificant number of transcripts derived from centro-meric repeats accumulated in mutants of componentsof the RNAi machinery (Volpe et al. 2002; Fukagawa

TABLE 1

Distances of the transcribed CRR loci from centromeres

Centromere position (bp)a Locus Locus start (bp) Locus end (bp) Distance (bp)

Chromosome 1: 16,932,169–18,790,672 A 19,449,052 19,451,341 658,380B 19,941,448 19,952,765 1,150,776C 39,107,933 39,103,924 20,317,261D 40,043,730 40,045,321 21,253,058

Chromosome 2: 13,595,245–14,536,146 E 19,908,619 19,902,819 5,372,473Chromosome 3: CenH3-binding region— F 20,807,744 20,802,720 0

19,446,542–20,972,049 G 22,933,275 22,939,425 1,961,226H 25,226,904 25,234,144 4,254,855I 27,685,061 27,659,042 6,713,012

Chromosome 5: 12,469,491–12,533,116 J 12,070,306 12,076,501 462,810Chromosome 6: 16,301,659–16,367,415 K 7,059,565 7,055,723 9,307,850

L 22,915,357 22,918,270 6,547,942Chromosome 7: 12,322,115–12,941,422 M 15,378,002 15,374,900 2,436,580Chromosome 8: CenH3-binding region— N 12,220,132 12,227,950 763,621

12,983,753–13,853,371 O 26,264,391 26,273,122 12,411,020Chromosome 9: 2,697,295–3,546,982 P 5,545,687 5,532,031 1,998,705

Q 19,701,131 19,693,149 16,154,149Chromosome 10: 7,865,379–9,282,956 R 8,993,007 8,990,787 0Chromosome 11: 12,115,113–14,267,963 S 15,954,099 15,960,552 3,733,183

a Positions of centromeric regions were estimated on the basis of the presence of CentO sequences. The se-quence coordinates shown in the table mark positions of CentO arrays in the centromeres but do not delimitthe whole CENH3-binding regions. The only exceptions are centromeres 3 and 8 for which the whole CENH3-binding regions are already known (Nagaki et al. 2004; Yan et al. 2006). Distances were measured from the locusstart to the closer centromere border.

758 P. Neumann, H. Yan and J. Jiang

Page 11: The Centromeric Retrotransposons of Rice Are Transcribed ... · this family of retrotransposons colonized in the centro-meres of the common ancestor of these grass species and has

et al. 2004; Murchison et al. 2005). Our data showedthat there are considerable differences in abundanceand origin of the smRNAs derived from individualCRR subfamilies. While the noaCRR1 transcripts seemto be extensively processed into smRNA, CRR1- andCRR2-derived smRNAs are much less abundant (Figure4C; supplemental Table 5 at http://www.genetics.org/supplemental/). Strikingly, all identified small RNAsequences derived from CRR1, noaCRR1, and noaCRR2elements originated from the LTR and 59-UTR regions.In contrast, all CRR2 smRNAs originated from the insidepart, including 59-UTR and gag-pol coding regions. As weshowed that all CRR regions are transcribed (Figures 3and 5), these findings suggest a differential extent ofRNAi processing of the CRR transcripts. CRR1 andCRR2 elements are almost exclusively located in thecentromeric regions. However, noaCRR1 elements aredistributed more widely in the rice genome and werefound in both centromeric and noncentromeric regions(Nagaki et al. 2005). The observed differences may alsobe due to differences in copy numbers of individualCRR elements in the rice genome and to differences inabundance of the corresponding transcripts. The ab-sence of CRR2 LTR-derived smRNA correlates with thehigh abundance of long transcripts hybridizing with theCRR2 LTR probe (Figure 4), which suggests that someCRR2 transcripts escape from the RNAi processing.These results suggest that the different CRR subfamilies,and perhaps also different parts of their sequences, playdifferent roles in the RNAi-mediated pathway for forma-tion and maintenance of centromeric heterochromatin.

Alternative splicing of CRR elements: Although ret-roviruses are spliced during post-transcriptional pro-cessing (Rabson and Graves 1997), splicing of LTRretrotransposons is rare and has been reported in only afew retrotransposon families (Brierley and Flavell 1990;Vicient et al. 2001b; Neumann et al. 2003). Retrotrans-poson splicing was proposed to play roles in regulationof the GAG/GAG-PRO-POL or GAG-PRO/GAG-PRO-POLratios (Brierley and Flavell 1990; Neumann et al.2003) and in expression of the envelope-like gene(Vicient et al. 2001b). The splicing of CRR transcriptsis unique in that at least two regions, both of which aredifferent from the spliced regions in the three above-mentioned retrotransposons, are spliced from autono-mous CRR elements. CRR elements seem to encode allprotein domains within a single ORF (Nagaki et al.2005) and thus cannot regulate the GAG (GAG-PRO) toGAG-PRO-POL ratio by the two common translationalrecoding mechanisms on the basis of stop-codon read-through or ribosomal frameshifting (Swanstrom andWills 1997; Vogt 1997; Gao et al. 2003). However, asGAG is generally required in a higher molar amountthan POL (Swanstrom and Wills 1997), CRRs must usea strategy that favors expression of GAG or GAG-PROover GAG-PRO-POL polyproteins. Considering that thesplicing of the reverse-transcriptase-coding region in

CRRs results in separation of gag-pro and pol genes intodifferent reading frames, it is likely that translation ofthe pol gene from spliced transcripts is suppressed whilethe translation of gag and pro genes remains unaffected.As only some CRR transcripts are spliced, the ratiobetween GAG-PRO and GAG-PRO-POL polyproteinsmight be controlled by the ratio between spliced andunspliced transcripts. Interestingly, the splice sites appearto be conserved among all autonomous CRR subfamilies(including the newly discovered CRR3 subfamily; Figures6 and 7) as well as in the centromeric retrotransposonsfrom maize (CRM) and barley (Cereba), supporting ourhypothesis that CRR splicing may play a role in regulationof the expression of its encoding genes.

We thank Dan Voytas for his valuable comments on the manuscript.This research was supported by Department of Energy grant FG02-01ER15266 to J.J.

LITERATURE CITED

Altschul, S. F., T. L. Madden, A. A. Schaffer, J. H. Zhang,Z. Zhang et al., 1997 Gapped BLAST and PSI-BLAST: a newgeneration of protein database search programs. Nucleic AcidsRes. 25: 3389–3402.

Bannister, A. J., P. Zegerman, J. F. Partridge, E. A. Miska, J. O.Thomas et al., 2001 Selective recognition of methylated lysine9 on histone H3 by the HP1 chromo domain. Nature 410:120–124.

Bernstein, E., and C. D. Allis, 2005 RNA meets chromatin. GenesDev. 19: 1635–1655.

Boguski, M. S., T. M. J. Lowe and C. M. Tolstoshev, 1993 Dbest:database for expressed sequence tags. Nat. Genet. 4: 332–333.

Bouzinba-Segard, H., A. Guais and C. Francastel, 2006 Accumulationof small murine minor satellite transcripts leads to impaired centro-meric architecture and function. Proc. Natl. Acad. Sci. USA 103:8709–8714.

Brierley, C., and A. J. Flavell, 1990 The retrotransposon Copiacontrols the relative levels of its gene products posttranscription-ally by differential expression from its two major mRNAs. NucleicAcids Res. 18: 2947–2951.

Brodie, R., R. L. Roper and C. Upton, 2004 JDotter: a Java inter-face to multiple dotplots generated by dotter. Bioinformatics20: 279–281.

Cam, H. P., T. Sugiyama, E. S. Chen, X. Chen, P. C. FitzGerald et al.,2005 Comprehensive analysis of heterochromatin- and RNAi-mediated epigenetic control of the fission yeast genome. Nat.Genet. 37: 809–819.

Chan, S. W. L., I. R. Henderson and S. E. Jacobsen, 2005 Gardeningthe genome: DNA methylation in Arabidopsis thaliana. Nat. Rev.Genet. 6: 351–360.

Cheng, Z. K., F. G. Dong, T. Langdon, O. Y. Shu, C. R. Buell et al.,2002 Functional rice centromeres are marked by a satellite re-peat and a centromere-specific retrotransposon. Plant Cell 14:1691–1704.

Costa, F. F., 2005 Non-coding RNAs: new players in eukaryotic bi-ology. Gene 357: 83–94.

Deshpande, G., G. Calhoun and P. Schedl, 2005 Drosophilaargonaute-2 is required early in embryogenesis for the assemblyof centric/centromeric heterochromatin, nuclear division, nuclearmigration, and germ-cell formation. Genes Dev. 19: 1680–1685.

Durand-Dubief, M., and P. Bastin, 2003 TbAGOI, an Argonaute pro-tein required for RNA interference, is involved in mitosis and chro-mosome segragation in Trypanosoma brucei. BMC Biol. 1: 2.

Echenique, V., B. Stamova, P. Wolters, G. Lazo, V. L. Carollo

et al., 2002 Frequencies of Ty1-copia and Ty3-gypsy retroele-ments within the Triticeae EST databases. Theor. Appl. Genet.104: 840–844.

Elgin, S. C. R., 1996 Heterochromatin and gene regulation in Dro-sophila. Curr. Opin. Genet. Dev. 6: 193–202.

Centromeric Retrotransposons of Rice 759

Page 12: The Centromeric Retrotransposons of Rice Are Transcribed ... · this family of retrotransposons colonized in the centro-meres of the common ancestor of these grass species and has

Fischle, W., Y. M. Wang, S. A. Jacobs, Y. C. Kim, C. D. Allis et al.,2003 Molecular basis for the discrimination of repressivemethyl-lysine marks in histone H3 bv Polvcomb and HP1 chro-modomains. Genes Dev. 17: 1870–1881.

Fukagawa, T., M. Nogami, M. Yoshikawa, M. Ikeno, T. Okazaki

et al., 2004 Dicer is essential for formation of the heterochroma-tin structure in vertebrate cells. Nat. Cell Biol. 6: 784–791.

Gao, L. H., E. M. McCarthy, E. W. Ganko and J. F. McDonald,2004 Evolutionary history of Oryza sativa LTR retrotransposons:a preliminary survey of the rice genome sequences. BMC Ge-nomics 5: 18.

Gao, X., E. R. Havecker, P. V. Baranov, J. F. Atkins and D. F.Voytas, 2003 Translational recoding signals between gag andpol in diverse LTR retrotransposons. RNA 9: 1422–1430.

Gaut, B. S., B. R. Morton, B. C. McCaig and M. T. Clegg,1996 Substitution rate comparisons between grasses and palms:synonymous rate differences at the nuclear gene Adh parallelrate differences at the plastid gene rbcL. Proc. Natl. Acad. Sci.USA 93: 10274–10279.

Gendrel, A. V., and V. Colot, 2005 Arabidopsis epigenetics: whenRNA meets chromatin. Curr. Opin. Plant Biol. 8: 142–147.

Gorinsek, B., F. Gubensek and D. Kordis, 2004 Evolutionary ge-nomics of chromoviruses in eukaryotes. Mol. Biol. Evol. 21: 781–798.

Gorinsek, B., F. Gubensek and D. Kordis, 2005 Phylogenomic anal-ysis of chromoviruses. Cytogenet. Genome Res. 110: 543–552.

Grandbastien, M. A., 1998 Activation of plant retrotransposons un-der stress conditions. Trends Plant Sci. 3: 181–187.

Grewal, S. I. S., and J. C. Rice, 2004 Regulation of heterochroma-tin by histone methylation and small RNAs. Curr. Opin. Cell Biol.16: 230–238.

Hirochika, H., 1997 Retrotransposons of rice: their regulation anduse for genome analysis. Plant Mol. Biol. 35: 231–240.

Hirochika, H., K. Sugimoto, Y. Otsuki, H. Tsugawa and M. Kanda,1996 Retrotransposons of rice involved in mutations inducedby tissue culture. Proc. Natl. Acad. Sci. USA 93: 7783–7788.

Hudakova, S., W. Michalek, G. G. Presting, R. ten Hoopen, K. dos

Santos et al., 2001 Sequence organization of barley centro-meres. Nucleic Acids Res. 29: 5029–5035.

Jenuwein, T., and C. D. Allis, 2001 Translating the histone code.Science 293: 1074–1080.

Jiang, N., I. K. Jordan and S. R. Wessler, 2002 Dasheng andRIRE2: a nonautonomous long terminal repeat element andits putative autonomous partner in the rice genome. Plant Phys-iol. 130: 1697–1705.

Johnson, C., L. Bowman, A. T. Adai, V. Vance and V. Sundaresan,2007 CSRDB: a small RNA integrated database and browser re-source for cereals. Nucleic Acids Res. 35: D829–D833.

Kanellopoulou, C., S. A. Muljo, A. L. Kung, S. Ganesan, R.Drapkin et al., 2005 Dicer-deficient mouse embryonic stemcells are defective in differentiation and centromeric silencing.Genes Dev. 19: 489–501.

Kellogg, E. A., 2001 Evolutionary history of the grasses. Plant Phys-iol. 125: 1198–1205.

Kikuchi, S., K. Satoh, T. Nagata, N. Kawagashira, K. Doi et al.,2003 Collection, mapping, and annotation of over 28,000cDNA clones from japonica rice. Science 301: 376–379.

Kordis, D., 2005 A genomic perspective on the chromodomain-containing retrotransposons: chromoviruses. Gene 347: 161–173.

Lee, H. R., P. Neumann, J. Macas and J. Jiang, 2006 Transcriptionand evolutionary dynamics of the centromeric satellite repeatCentO in rice. Mol. Biol. Evol. 23: 2505–2520.

Li, Z. Y., S. Y. Chen, X. W. Zheng and L. H. Zhu, 2000 Identificationand chromosomal localization of a transcriptionally active retro-transposon of Ty3-gypsy type in rice. Genome 43: 404–408.

Maison, C., D. Bailly, A. H. F. M. Peters, J. P. Quivy, D. Roche et al.,2002 Higher-order structure in pericentric heterochromatin in-volves a distinct pattern of histone modification and an RNAcomponent. Nat. Genet. 30: 329–334.

Marchler-Bauer, A., J. B. Anderson, C. DeWeese-Scott, N. D.Fedorova, L. Y. Geer et al., 2003 CDD: a curated Entrez data-base of conserved domain alignments. Nucleic Acids Res. 31:383–387.

Matzke, M. A., and J. A. Birchler, 2005 RNAi-mediated pathwaysin the nucleus. Nat. Rev. Genet. 6: 24–35.

May, B. P., Z. B. Lippman, Y. D. Fang, D. L. Spector and R. A.Martienssen, 2005 Differential regulation of strand-specifictranscripts from Arabidopsis centromeric satellite repeats. PLoSGenet. 1: 705–714.

Meyers, B. C., S. V. Tingley and M. Morgante, 2001 Abundance,distribution, and transcriptional activity of repetitive elements inthe maize genome. Genome Res. 11: 1660–1676.

Miller, J. T., F. G. Dong, S. A. Jackson, J. Song and J. M. Jiang,1998 Retrotransposon-related DNA sequences in the centro-meres of grass chromosomes. Genetics 150: 1615–1623.

Morey, C., and P. Avner, 2004 Employment opportunities for non-coding RNAs. FEBS Lett. 567: 27–34.

Murchison, E. P., J. F. Partridge, O. H. Tam, S. Cheloufi and G.J. Hannon, 2005 Characterization of Dicer-deficient murineembryonic stem cells. Proc. Natl. Acad. Sci. USA 102: 12135–12140.

Nagaki, K., and M. Murata, 2005 Characterization of CENH3 andcentromere-associated DNA sequences in sugarcane. Chromo-some Res. 13: 195–203.

Nagaki, K., J. Q. Song, R. M. Stupar, A. S. Parokonny, Q. P. Yuan

et al., 2003 Molecular and cytological analyses of large tracks ofcentromeric DNA reveal the structure and evolutionary dynamicsof maize centromeres. Genetics 163: 759–770.

Nagaki, K., Z. K. Cheng, S. Ouyang, P. B. Talbert, M. Kim et al.,2004 Sequencing of a rice centromere uncovers active genes.Nat. Genet. 36: 138–145.

Nagaki, K., P. Neumann, D. F. Zhang, S. Ouyang, C. R. Buell et al.,2005 Structure, divergence, and distribution of the CRR cen-tromeric retrotransposon family in rice. Mol. Biol. Evol. 22:845–855.

Neumann, P., D. Pozarkova and J. Macas, 2003 Highly abundantpea LTR retrotransposon Ogre is constitutively transcribed andpartially spliced. Plant Mol. Biol. 53: 399–410.

Neumann, P., D. Pozarkova, A. Koblizkova and J. Macas,2005 PIGY, a new plant envelope-class LTR retrotransposon.Mol. Genet. Genomics 273: 43–53.

Pal-Bhadra, M., B. A. Leibovitch, S. G. Gandhi, M. Rao, U. Bhadra

et al., 2004 Heterochromatin silencing and HP1 localization inDrosophila are dependent on the RNAi machinery. Science 303:669–672.

Pidoux, A. L., and R. C. Allshire, 2005 The role of heterochroma-tin in centromere function. Philos. Trans. R. Soc. Lond. B Biol.Sci. 360: 569–579.

Presting, G. G., L. Malysheva, J. Fuchs and I. Schubert,1998 A TY3/GYPSY retrotransposon-like sequence localizes tothe centromeric regions of cereal chromosomes. Plant J. 16:721–728.

Rabson, A. B., and B. J. Graves, 1997 Synthesis and processingof viral RNA, pp. 205–261 in Retroviruses, edited by J. M. Coffin,S. H. Hughes and H. E. Varmus. Cold Spring Harbor LaboratoryPress, Cold Spring Harbor, NY.

Rice, P., I. Longden and A. Bleasby, 2000 EMBOSS: The Europeanmolecular biology open software suite. Trends Genet. 16: 276–277.

Rossi, M., P. G. Araujo and M. A. Van Sluys, 2001 Survey of trans-posable elements in sugarcane expressed sequence tags (ESTs).Genet. Mol. Biol. 24: 147–154.

Sonnhammer, E. L. L., and R. Durbin, 1995 A dot-matrix programwith dynamic threshold control suited for genomic DNA andprotein sequence analysis. Gene 167: 1–10.

Staden, R., 1996 The Staden sequence analysis package. Mol. Bio-technol. 5: 233–241.

Sunkar, R., T. Girke, P. K. Jain and J. K. Zhu, 2005a Cloningand characterization of MicroRNAs from rice. Plant Cell 17:1397–1411.

Sunkar, R., T. Girke and J. K. Zhu, 2005b Identification and char-acterization of endogenous small interfering RNAs from rice.Nucleic Acids Res. 33: 4443–4454.

Suoniemi, A., D. Schmidt and A. H. Schulman, 1997 BARE-1 inser-tion site preferences and evolutionary conservation of RNA andcDNA processing sites. Genetica 100: 219–230.

Swanstrom, R., and J. W. Wills, 1997 Synthesis, assembly, and pro-cessing of viral proteins, pp. 263–334 in Retroviruses, edited byJ. M. Coffin, S. H. Hughes and H. E. Varmus. Cold SpringHarbor Laboratory Press, Cold Spring Harbor, NY.

760 P. Neumann, H. Yan and J. Jiang

Page 13: The Centromeric Retrotransposons of Rice Are Transcribed ... · this family of retrotransposons colonized in the centro-meres of the common ancestor of these grass species and has

Thompson, J. D., D. G. Higgins and T. J. Gibson, 1994 Clustal-W:improving the sensitivity of progressive multiple sequence align-ment through sequence weighting, position-specific gap penal-ties and weight matrix choice. Nucleic Acids Res. 22: 4673–4680.

Topp, C. N., C. X. Zhong and R. K. Dawe, 2004 Centromere-encoded RNAs are integral components of the maize kineto-chore. Proc. Natl. Acad. Sci. USA 101: 15986–15991.

Verdel, A., S. T. Jia, S. Gerber, T. Sugiyama, S. Gygi et al.,2004 RNAi-mediated targeting of heterochromatin by the RITScomplex. Science 303: 672–676.

Vicient, C. M., M. J. Jaaskelainen, R. Kalendar and A. H.Schulman, 2001a Active retrotransposons are a common fea-ture of grass genomes. Plant Physiol. 125: 1283–1292.

Vicient, C. M., R. Kalendar and A. H. Schulman, 2001b Envelope-class retrovirus-like elements are widespread, transcribed andspliced, and insertionally polymorphic in plants. Genome Res.11: 2041–2049.

Vogt, V. M., 1997 Retroviral virions and genomes, pp. 27–70 inRetroviruses, edited by J. M. Coffin, S. H. Hughes and H. E.Varmus. Cold Spring Harbor Laboratory Press, Cold SpringHarbor, NY.

Volpe, T. A., C. Kidner, I. M. Hall, G. Teng, S. I. S. Grewal

et al., 2002 Regulation of heterochromatic silencing and his-tone H3 lysine-9 methylation by RNAi. Science 297: 1833–1837.

Volpe, T., V. Schramke, G. L. Hamilton, S. A. White, G. Teng et al.,2003 RNA interference is required for normal centromerefunction in fission yeast. Chromosome Res. 11: 137–146.

Wessler, S. R., 1996 Plant retrotransposons: turned on by stress.Curr. Biol. 6: 959–961.

Wilhelm, M., and F. X. Wilhelm, 2001 Reverse transcription ofretroviruses and LTR retrotransposons. Cell. Mol. Life Sci. 58:1246–1262.

Yan, H. H., W. W. Jin, K. Nagaki, S. L. Tian, S. Ouyang

et al., 2005 Transcription and histone modifications in therecombination-free region spanning a rice centromere. PlantCell 17: 3227–3238.

Yan, H. H., H. Ito, K. Nobuta, S. Ouyang, W. W. Jin et al.,2006 Genomic and genetic characterization of rice Cen3 revealsextensive transcription and evolutionary implications of a com-plex centromere. Plant Cell 18: 2123–2133.

Zdobnov, E. M., and R. Apweiler, 2001 InterProScan: an integra-tion platform for the signature-recognition methods in InterPro.Bioinformatics 17: 847–848.

Zhong, C. X., J. B. Marshall, C. Topp, R. Mroczek, A. Kato et al.,2002 Centromeric retroelements and satellites interact withmaize kinetochore protein CENH3. Plant Cell 14: 2825–2836.

Communicating editor: V. Sundaresan

Centromeric Retrotransposons of Rice 761