Computational Biology and Chemistry 32 (2008) 222–226

Brief Communication

On the relationship between GC content and the number of predicted microRNA binding sites by MicroInspector

Nicole Davis a, Natasha Biddlecom a, David Hecht a, Gary B. Fogel b,∗

a Southwestern College, 900 Otay Lakes Road, Chula Vista, CA 91910, USA
b Natural Selection Inc., 9330 Scranton Road, Suite 150, San Diego, CA 92121, USA

∗ Corresponding author. Tel.: +1 858 455 6449; fax: +1 858 455 1560. E-mail address: [email protected] (G.B. Fogel).

Article history: Received 11 February 2008; received in revised form 15 February 2008; accepted 16 February 2008

Keywords: MicroInspector; MicroRNA; Target prediction; GC content

Abstract

MicroRNA GC content and length are believed to play a role in the prediction of putative microRNA targets. MicroInspector was evaluated to determine the extent to which these characteristics of microRNAs play a role in binding site predictive accuracy. A strong bias towards underpredicting the number of expected binding sites for low GC content sequences was observed, especially for microRNAs with <50% GC content. Researchers working with organisms with unusually low GC content should be aware of this bias.

© 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.compbiolchem.2008.02.004

1. Introduction

MicroRNAs (miRs) are known to play key roles in cellular differentiation and regulation, and operate via basepairing interactions with a target sequence (Foshay and Gallicano, 2007; Marquez and McCaffrey, 2007; Liu et al., 2008). miRs are also believed to be involved with tumorigenesis and some miRs may have diagnostic value (Lowery et al., 2008; Schetter et al., 2008). Unfortunately, the number of experimentally verified miRs exceeds the number of experimentally verified miR targets (Betel et al., 2008). Computational and experimental approaches to help ascertain putative miR targets are highly desired by the research community.

Several bioinformatics approaches have been generated for the prediction of miR targets in sequence information, based on databases of known miRs and calculations of sequence similarity and/or free energy of binding (Sethupathy et al., 2006a,b; Thadani and Tammi, 2006). For example, MicroInspector (http://mirna.imbb.forth.gr/microinspector/) is a tool for the detection of microRNA binding sites (Rusinov et al., 2005). While there are many tools that attempt to identify as many mRNA targets as possible for each given miR, MicroInspector attempts to identify possible binding sites based on rules for the free energy of binding that are appropriate to known miRs for each organism of interest. These putative binding sites can be examined with follow-on experimentation (Bonnet et al., 2004a,b).

The authors of MicroInspector acknowledged several possible issues with the reliance on free energy calculation for binding prediction. These included a higher likelihood for binding site prediction with increase in GC content of the sequence and miR length (Rusinov et al., 2005). We chose to better understand the relationship of GC content and miR length to binding site prediction from MicroInspector, specifically to determine which organisms might be most affected by such a bias simply because of their genomic GC content.

2. Materials and Methods

2.1. MicroInspector

MicroInspector (Rusinov et al., 2005) is a web-based tool for searching miR binding sites in a target RNA sequence. The user is first asked to input a sequence to be analyzed (which can be either RNA or DNA). The user can either input the sequence data manually or provide GenBank accession numbers. A hybridization temperature can then be adjusted by the user if required. This value affects the calculation of binding affinity and the program authors recommend the use of a default temperature of 37 °C. In addition, a value for a minimum free energy also needs to be defined, and the default value is −20 kcal/mol. This cut-off affects the number of results that are presented to the user: only those results that have lower energy than the cut-off are displayed. As a final step, the user must select a miR database to be interrogated, effectively allowing the user to choose a miR database in an organism-specific manner.
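The effect of the free energy cut-off described in Section 2.1 can be illustrated with a minimal sketch. Only the default of −20 kcal/mol comes from the text; the `Hit` record, its field names, and the example energies are hypothetical and are not MicroInspector output.

```python
from dataclasses import dataclass

# Default cut-off stated in the text (kcal/mol).
DEFAULT_CUTOFF = -20.0


@dataclass
class Hit:
    """Hypothetical record for one candidate binding site."""
    mir_name: str
    position: int
    free_energy: float  # kcal/mol


def filter_hits(hits, cutoff=DEFAULT_CUTOFF):
    """Keep only hits whose free energy is lower than the cut-off."""
    return [h for h in hits if h.free_energy < cutoff]


# Made-up energies, for illustration only.
example_hits = [
    Hit("mir-A", 12, -24.3),
    Hit("mir-B", 40, -18.9),
    Hit("mir-C", 77, -20.0),
]
kept = filter_hits(example_hits)
```

With a strict "lower than" comparison, a hit exactly at the cut-off is not reported; whether MicroInspector treats the boundary this way is an assumption of the sketch.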



Fig. 1. Distribution of GC content (%) for known human microRNA hairpin sequences in the microRNA registry (n = 533). According to the Kolmogorov–Smirnov test, the data are not likely to be normally distributed (P = 0.00 where the normal distribution has mean = 50.52 and standard deviation = 10.35).

Once these parameters are established, the algorithm scans each target sequence for every miR sequence from the chosen miR database consecutively in an attempt to identify possible regions of hybridization. This is accomplished using a window of 6 nt sliding by 1 nt throughout the sequences. For each window pair, the approach searches for domains having five Watson–Crick base pairs or four Watson–Crick pairs with one additional G:U at any location in the window. If these conditions are not found, the window is shifted by 1 nt and the calculation is repeated until all windows have been examined. If any window does satisfy the basepairing criteria, then additional methods are used to examine hybridization and folding. The output of this approach to the user is a list of putative binding locations of miRs to any sequence that was input to the original query window.
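The window scan described above can be sketched as follows. This is an illustrative reading of the stated rule, not MicroInspector's code: it assumes the two 6 nt windows are paired antiparallel, that a window passes with at least five Watson–Crick pairs or with exactly four Watson–Crick pairs plus at least one G:U pair, and it omits the later hybridization and folding steps. All function names are hypothetical.

```python
WINDOW = 6  # window length in nt, as stated in the text

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def to_rna(seq):
    """Upper-case the sequence and convert DNA input to RNA."""
    return seq.upper().replace("T", "U")


def window_passes(mir_window, target_window):
    """Apply the base-pairing rule to one antiparallel window pair."""
    wc = gu = 0
    for m, t in zip(mir_window, reversed(target_window)):
        if (m, t) in WATSON_CRICK:
            wc += 1
        elif (m, t) in WOBBLE:
            gu += 1
    return wc >= 5 or (wc == 4 and gu >= 1)


def candidate_windows(mir, target):
    """Slide both windows by 1 nt and return (mir_start, target_start) pairs that pass."""
    mir, target = to_rna(mir), to_rna(target)
    hits = []
    for i in range(len(mir) - WINDOW + 1):
        for j in range(len(target) - WINDOW + 1):
            if window_passes(mir[i:i + WINDOW], target[j:j + WINDOW]):
                hits.append((i, j))
    return hits
```

In the described tool, each passing window would then be handed to the additional hybridization and folding calculations; here the function simply returns the candidate positions.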

For the purposes of our investigation, all human and random sequences as well as their reverse complements were processed through MicroInspector (Rusinov et al., 2005) to identify the number and location of possible microRNA binding sites. Standard default settings were used as recommended.

2.2. Sequence Curation

One hundred random sequences each of length 100 nucleotides were generated using the Sequence Manipulation Suite (Stothard, 2000) with default settings. This process was repeated nine times with each set using a different GC content in the range (20–80%) to cover reasonable estimates of GC content for genomic sequence information as well as known microRNAs. Hairpin sequences for all known human, Caenorhabditis briggsae, Arabidopsis thaliana, and Rattus norvegicus microRNAs were downloaded from the microRNA registry (Griffiths-Jones, 2004; Griffiths-Jones et al., 2006) and for each sequence, the GC content was calculated. MicroRNAs were binned into three GC content categories (low ≤33%GC, medium 34–66%GC, and high ≥67%GC).
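The GC calculation, the three-way binning, and the generation of fixed-GC random sequences can be sketched as below. This is not the Sequence Manipulation Suite's procedure, which the text does not describe; the sketch assumes an exact GC composition obtained by shuffling, and it treats non-integer GC values between 33 and 34 (or 66 and 67) as "medium". Function names are hypothetical.

```python
import random


def gc_percent(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_category(gc):
    """Bin a GC percentage as in the text: low <=33, medium 34-66, high >=67."""
    if gc <= 33:
        return "low"
    if gc >= 67:
        return "high"
    return "medium"


def random_sequence(length, gc_pct, rng):
    """Random DNA sequence with the requested GC composition (assumed method)."""
    n_gc = round(length * gc_pct / 100)
    bases = [rng.choice("GC") for _ in range(n_gc)]
    bases += [rng.choice("AT") for _ in range(length - n_gc)]
    rng.shuffle(bases)
    return "".join(bases)


# Example: one set of 100 sequences of length 100 nt at 20% GC.
rng = random.Random(0)  # fixed seed so the example is repeatable
example_set = [random_sequence(100, 20, rng) for _ in range(100)]
```

Repeating the last line for each GC level would give one set per level, mirroring the nine sets used in the study.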

3. Results

3.1. Human MicroRNAs and Random Sequences

Analysis of all known human microRNA hairpin sequences in the miR registry (n = 533) indicates that GC content is normally distributed with a mean of 50.52% and standard deviation of 10.35% (Fig. 1). The mean hairpin length is 89 nucleotides with a standard deviation of 13 nucleotides and is not normally distributed (Fig. 2). The microRNA GC content is similar to the estimated GC content of the human genome; the mean GC content of the human genome has been estimated to be ∼40% (Lander et al., 2001). When human microRNA hairpin sequences are provided to MicroInspector, the number of binding sites predicted by MicroInspector was found to be dependent on the GC content of the sequence being provided.

To further analyze this behavior, the set of known human microRNAs was subdivided into categories of GC content using three arbitrarily chosen equal bins based on GC% (Table 1). The microRNAs used for these different GC contents are provided in Table 2. Sequences with a high mean GC content yielded roughly 10 times the mean number of predicted binding sites using MicroInspector when compared to those with a low mean GC content; however, the mean number and standard deviation of predicted binding sites for sequences in the medium GC content and high GC content categories were quite similar. The observation of such a dramatic difference in the number of predicted binding sites by MicroInspector relative to GC content was unanticipated, especially due to the observed variance in GC content for known human microRNA hairpins in the miR registry. Using an unpaired t-test, this difference in the number of predicted binding sites was statistically significant for “low” GC content sequences relative to either “medium” (P < 0.0001) or “high” GC content sequences (P < 0.0001). However, when comparing the “medium” and “high” GC content categories, there was no statistically significant difference (P = 0.3711).

Table 1
Categorization of microRNA hairpin sequences by GC content (low, medium, or high) with resulting mean GC content by category and number of microRNA binding sites predicted by MicroInspector

GC category   Number of sequences   Mean GC%   Predicted number of binding sites by MicroInspector (μ ± σ)
Low           26                    30.1       1.3 ± 1.5
Medium        30                    50.0       8.8 ± 5.4
High          30                    71.5       10.0 ± 4.9

The predicted number of binding sites by MicroInspector in the GC category “low” is statistically different from either the “medium” (P < 0.0001) or “high” GC categories (P < 0.0001).

Fig. 2. Distribution of length in nucleotides (nt) for known human microRNA hairpin sequences in the microRNA registry (n = 533). According to the Kolmogorov–Smirnov test, the data are not normally distributed (P = 0.00 where the normal distribution has mean = 90.85 and standard deviation = 13.61). In particular there is a large overabundance of sequences with length 95–100 nt.

Table 2
MicroRNA sequences used in each of the three GC categories listed in Table 1

Low GC                     Medium GC                  High GC
hsa-mir-376c MI0000776     hsa-mir-1-2 MI0000437      hsa-mir-150 MI0000479
hsa-mir-376a-1 MI0000784   hsa-mir-15b MI0000438      hsa-mir-99b MI0000746
hsa-mir-374a MI0000782     hsa-mir-30b MI0000441      hsa-mir-375 MI0000783
hsa-mir-135a-2 MI0000453   hsa-mir-27b MI0000440      hsa-mir-328 MI0000804
hsa-mir-297 MI0005775      hsa-mir-124-2 MI0000444    hsa-mir-149 MI0000478
hsa-mir-944 MI0005769      hsa-let-7g MI0000433       hsa-mir-339 MI0000815
hsa-mir-450b MI0005531     hsa-let-7i MI0000434       hsa-mir-638 MI0003653
hsa-mir-548d-1 MI0003668   hsa-mir-124-3 MI0000445    hsa-mir-939 MI0005761
hsa-mir-651 MI0003666      hsa-mir-143 MI0000459      hsa-mir-935 MI0005757
hsa-mir-620 MI0003634      hsa-mir-152 MI0000462      hsa-mir-940 MI0005762
hsa-mir-644 MI0003659      hsa-mir-23b MI0000439      hsa-mir-943 MI0005768
hsa-mir-626 MI0003640      hsa-mir-122 MI0000442      hsa-mir-941-4 MI0005766
hsa-mir-548a-2 MI0003598   hsa-mir-125b-1 MI0000446   hsa-mir-941-3 MI0005765
hsa-mir-586 MI0003594      hsa-mir-132 MI0000449      hsa-mir-937 MI0005759
hsa-mir-580 MI0003587      hsa-mir-133a-1 MI0000450   hsa-mir-933 MI0005755
hsa-mir-579 MI0003586      hsa-mir-137 MI0000454      hsa-mir-874 MI0005532
hsa-mir-569 MI0003576      hsa-mir-760 MI0005567      hsa-mir-675 MI0005416
hsa-mir-568 MI0003574      hsa-mir-543 MI0005565      hsa-mir-658 MI0003682
hsa-mir-567 MI0003573      hsa-mir-889 MI0005540      hsa-mir-663 MI0003672
hsa-mir-556 MI0003562      hsa-mir-220c MI0005536     hsa-mir-662 MI0003670
hsa-mir-553 MI0003558      hsa-mir-890 MI0005533      hsa-mir-661 MI0003669
hsa-mir-539 MI0003514      hsa-mir-220b MI0005529     hsa-mir-647 MI0003662
hsa-mir-450a-2 MI0003187   hsa-mir-668 MI0003761      hsa-mir-639 MI0003654
hsa-mir-450a-1 MI0001652   hsa-mir-646 MI0003661      hsa-mir-636 MI0003651
hsa-mir-384 MI0001145      hsa-mir-449b MI0003673     hsa-mir-33b MI0003646
hsa-mir-29a MI0000087      hsa-mir-656 MI0003678      hsa-mir-615 MI0003628
                           hsa-mir-657 MI0003681      hsa-mir-602 MI0003615
                           hsa-mir-672 MI0005522      hsa-mir-596 MI0003608
                           hsa-mir-767 MI0003763      hsa-mir-572 MI0003579
                           hsa-mir-421 MI0003685      hsa-mir-564 MI0003570

In order to test the hypothesis that the number of predicted binding sites in MicroInspector was affected by GC content, nine sets of random sequences of length 100 nucleotides were generated, with each set having a predefined GC content (Table 3). The length of 100 nucleotides was selected as representative of the length of known human microRNA hairpins. Providing each of these sequence sets to MicroInspector provided sufficient evidence to test the hypothesis in a controlled manner. As indicated by Table 3, the resulting number of predicted binding sites offered by MicroInspector was significantly affected by the GC content of the input sequence, and appears to follow a nonlinear relationship. Sequences with low GC contents had unusually low numbers of predicted binding sites while sequences at 70% GC had the highest number of predicted binding sites. Curiously, the mean number of predicted binding sites decreased when the GC content of the input sequence was >70%. Sequences with a GC content >60% shared roughly the same number of predicted binding sites (Table 3). The mean number of predicted binding sites over all sequences for all nine sequence sets was 8.7 binding sites.

Table 3
Predicted number of microRNA binding sites via MicroInspector using sets of random sequences of differing GC content

GC content (%)   Predicted number of binding sites by MicroInspector (μ ± σ)
20               0.39 ± 0.95
25               1 ± 1.16
30               1.76 ± 2.03
40               5.05 ± 3.41
50               9.66 ± 4.64
60               14.83 ± 7.11
70               16.98 ± 7.35
75               16.27 ± 7.27
80               12.46 ± 5.62
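The unpaired t-test comparisons between the Table 1 categories can be re-derived from the summary statistics alone. The sketch below assumes the pooled-variance (Student) form of the test, which the text does not specify, and computes only the t statistic and degrees of freedom; converting these to a P value requires a t-distribution table or a statistics library.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic and degrees of freedom from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se, df


# (mean, standard deviation, n) taken from Table 1.
table1 = {
    "low": (1.3, 1.5, 26),
    "medium": (8.8, 5.4, 30),
    "high": (10.0, 4.9, 30),
}

t_high_vs_medium, df_high_vs_medium = pooled_t(*table1["high"], *table1["medium"])
t_medium_vs_low, df_medium_vs_low = pooled_t(*table1["medium"], *table1["low"])
```

Under these assumptions the medium versus high comparison gives t of about 0.90 on 58 degrees of freedom, while the low versus medium comparison gives a much larger t on 54 degrees of freedom, which matches the pattern of significance reported in the text.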

3.2. Predicted MicroRNA Binding Sites from Non-human Species

To determine if the GC content bias of MicroInspector was specific to Homo sapiens microRNAs, the above analysis was repeated for other organisms with experimentally determined microRNAs in the miR registry. Table 4 provides a listing of all known microRNAs in the miR registry (release 10.0). Note that the range of GC content of hairpins from all known species is large (34% to ∼60%) and that the mean GC% over all species is 47.1% (n = 49). Therefore, species such as A. thaliana may have an abnormally low number of predicted microRNA binding sites in MicroInspector simply because of their lower overall GC content.

To test this hypothesis, we downloaded all hairpin microRNAs from three non-human species in the miR registry (A. thaliana, GC = 39.0%; R. norvegicus, GC = 49%; and Chlamydomonas reinhardtii, GC = 60%) as different representatives from this GC content distribution, with sufficient sequences in the miR registry and similarity to the “low,” “medium,” and “high” categories shown in Table 1. The distributions for microRNA hairpin GC content and length for these organisms are shown in Figs. 3–8, respectively. As anticipated, the number of resulting binding sites predicted by MicroInspector was reduced for A. thaliana relative to R. norvegicus (Table 5) in a manner similar to that observed for Table 1.

Fig. 3. Distribution of GC content (%) for Arabidopsis thaliana microRNA hairpin sequences in the microRNA registry (n = 184). According to the Kolmogorov–Smirnov test, the data are consistent with the normal distribution (P = 0.56 where the normal distribution has mean = 39.48 and standard deviation = 6.601).

Fig. 4. Distribution of length in nucleotides for A. thaliana microRNA hairpin sequences in the microRNA registry (n = 184). According to the Kolmogorov–Smirnov test, the data are not normally distributed (P = 0.00 where the normal distribution has mean = 236.6 and standard deviation = 106.8). In particular there is a large overabundance of sequences with length ∼100 nt and many hairpins with lengths over 200 nt.

Fig. 5. Distribution of GC content (%) for Rattus norvegicus microRNA hairpin sequences in the microRNA registry (n = 285). According to the Kolmogorov–Smirnov test, the data are not likely to be normally distributed (P = 0.01 where the normal distribution has mean = 50.60 and standard deviation = 8.514).

Fig. 6. Distribution of length in nucleotides for R. norvegicus microRNA hairpin sequences in the microRNA registry (n = 285). According to the Kolmogorov–Smirnov test, the data are not normally distributed (P = 0.00 where the normal distribution has mean = 87.69 and standard deviation = 11.22). In particular there is a large overabundance of sequences with length 95–100 nt, and at 110 nt.

Fig. 7. Distribution of GC content (%) for Chlamydomonas reinhardtii microRNA hairpin sequences in the microRNA registry (n = 50). According to the Kolmogorov–Smirnov test, the data are consistent with the normal distribution (P = 1.00 where the normal distribution has mean = 60.01 and standard deviation = 9.249).

Fig. 8. Distribution of length in nucleotides for C. reinhardtii microRNA hairpin sequences in the microRNA registry (n = 50). According to the Kolmogorov–Smirnov test, the data are not normally distributed (P = 0.00 where the normal distribution has mean = 342.1 and standard deviation = 218.8).

Table 4
Number of microRNA hairpin sequences per species listed in miR registry sorted by decreasing mean GC%

Organism                     Average GC%   Number of sequences
Chlamydomonas reinhardtii    60.18         49
Rhesus monkey                58.21         7
Saccharum officinarum        54.90         16
Zea mays                     54.87         96
Ovis aries                   53.51         4
Sorghum bicolor              52.35         72
Mouse gammaherpesvirus       50.73         9
Rattus norvegicus            49.66         285
Selaginella moellendorffii   49.34         58
Bos taurus                   49.28         117
Homo sapiens                 48.96         533
Mus musculus                 48.56         442
Pongo pygmaeus               48.43         84
Bombyx mori                  48.39         21
Macaca mulatta               48.37         71
Pinus taeda                  48.12         27
Macaca nemestrina            47.90         75
Pan paniscus                 47.85         89
Tetraodon nigroviridis       47.44         132
Pan troglodytes              47.42         83
Physcomitrella patens        47.42         220
Gorilla gorilla              47.29         86
Lagothrix lagotricha         47.17         48
Saguinus labiatus            47.11         42
Xenopus laevis               47.07         7
Fugu rubripes                46.97         131
Monodelphis domestica        46.93         107
Cricetulus griseus           46.74         1
Sus scrofa                   46.71         54
Ateles geoffroyi             46.70         45
Apis mellifera               46.63         54
Triticum aestivum            46.56         32
Lemur catta                  46.00         16
Gallus gallus                45.77         149
Xenopus tropicalis           45.63         177
Danio rerio                  45.55         337
Glycine max                  45.36         22
Canis familiaris             45.03         6
Oryza sativa                 44.73         243
Caenorhabditis briggsae      44.44         95
Anopheles gambiae            44.37         38
Caenorhabditis elegans       43.79         135
Populus trichocarpa          43.76         215
Drosophila pseudoobscura     42.28         73
Drosophila melanogaster      41.31         93
Arabidopsis thaliana         39.01         184
Medicago truncatula          38.95         30
Brassica napus               38.47         5
Schmidtea mediterranea       34.00         63

Table 5
Predicted number of binding sites by MicroInspector for three organisms with varied mean GC%

Organism        Number of sequences   Mean GC%   Predicted number of binding sites by MicroInspector (μ ± σ)
A. thaliana     184                   39.0       1.1 ± 1.8
R. norvegicus   285                   49.7       5.7 ± 5.0

4. Discussion

The prediction of microRNA binding sites continues to be an important component of systems biology research, leading to a better understanding of the dynamic interplay of noncoding RNAs in the cell. Tools such as MicroInspector are useful in gaining insight for putative binding locations that can be addressed with follow-on experimental analysis. In addition, these same computational approaches help us to understand the rules that guide RNA interference and RNA-mediated regulation in the cell. Thus, it is also critical to understand any shortcomings of these computational approaches.

We investigated the role of GC content on the number of microRNA binding sites predicted by MicroInspector. A strong bias towards underpredicting binding sites for low GC content sequences was observed, especially for microRNAs with <50% GC content (Tables 3 and 5). Researchers working with organisms with unusually low GC content should be aware of this bias. Table 4 lists some of the organisms that may fall into this category. There are two possible reasons for this bias in the number of predicted binding sites. Either the binding efficiency calculation in MicroInspector overly favors GC-rich regions at the risk of ignoring true positive binding sites in low GC sequences, or the data suggest that sequences with low GC content truly have fewer microRNA binding sites (Ho et al., 2007). Given that microRNA sequences exhibit a folding free energy that is lower than that of randomized sequences (Bonnet et al., 2004a,b), further experimental evidence and comparison to known miR targets (Sethupathy et al., 2006a,b) will help to resolve this issue. However, by knowing the distribution of the bias for GC content by MicroInspector, it should be possible to apply this as a corrective filter for binding prediction so that true positive binding sites are not ignored in low GC-content genomic sequences.
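One way such a corrective filter might look is sketched below. This is purely an illustration and not a method proposed or validated in the paper: it assumes the Table 3 means can serve as a GC-dependent baseline, interpolates linearly between the tabulated GC levels, and expresses an observed count relative to that baseline. The function names are hypothetical.

```python
# (GC content %, mean predicted binding sites) from Table 3.
TABLE3 = [
    (20, 0.39), (25, 1.00), (30, 1.76), (40, 5.05), (50, 9.66),
    (60, 14.83), (70, 16.98), (75, 16.27), (80, 12.46),
]


def expected_sites(gc):
    """Baseline count for a random 100 nt sequence, by linear interpolation.

    Values outside the tabulated 20-80% range are clamped to the nearest end.
    """
    if gc <= TABLE3[0][0]:
        return TABLE3[0][1]
    if gc >= TABLE3[-1][0]:
        return TABLE3[-1][1]
    for (x0, y0), (x1, y1) in zip(TABLE3, TABLE3[1:]):
        if x0 <= gc <= x1:
            return y0 + (y1 - y0) * (gc - x0) / (x1 - x0)


def relative_to_baseline(observed_sites, gc):
    """Observed count divided by the GC-matched random-sequence baseline."""
    return observed_sites / expected_sites(gc)


# Each set holds the same number of sequences, so the mean of the nine
# set means equals the overall mean quoted in the Results.
overall_mean = sum(mean for _, mean in TABLE3) / len(TABLE3)
```

A ratio well above 1 for a low-GC sequence would flag it as having more predicted sites than random sequence of the same composition, even if its absolute count is small. Rounded to one decimal place, `overall_mean` reproduces the 8.7 binding sites reported for the nine random sets.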

Acknowledgements

This work was supported by the National Science Foundation under a SBIR Phase IIcc award (DMI-0522270). The views, opinions and/or findings contained in this report are those of the authors and should not be construed as an official National Science Foundation position, policy or decision unless so designated by other documentation.

References

Betel, D., Wilson, M., Gabow, A., Marks, D.S., Sander, C., 2008. The microRNA.org resource: targets and expression. Nucleic Acids Res. 36, D149–D153.

Bonnet, E., Wuyts, J., Rouze, P., Van de Peer, Y., 2004a. Detection of 91 potential conserved plant microRNAs in Arabidopsis thaliana and Oryza sativa identifies important target genes. Proc. Natl. Acad. Sci. U.S.A. 101, 11511–11516.

Bonnet, E., Wuyts, J., Rouze, P., Van de Peer, Y., 2004b. Evidence that microRNA precursors, unlike other non-coding RNAs, have lower folding free energies than random sequences. Bioinformatics 20, 2911–2917.

Foshay, K.M., Gallicano, G.I., 2007. Small RNAs, big potential: the role of microRNA in stem cell function. Curr. Stem Cell Res. Ther. 2, 264–271.

Griffiths-Jones, S., 2004. The microRNA registry. Nucleic Acids Res. 32, D109–D111.

Griffiths-Jones, S., Grocock, R.J., van Dongen, S., Bateman, A., Enright, A.J., 2006. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 34, D140–D144.

Ho, T., Wang, H., Pallett, D., Dalmay, T., 2007. Evidence for targeting common siRNA hotspots and GC preference by plant Dicer-like proteins. FEBS Lett. 581, 3267–3272.

Lander, E., et al., 2001. Initial sequencing and analysis of the human genome. Nature 409, 860–921.

Liu, X., Fortin, K., Mourelatos, Z., 2008. MicroRNAs: biogenesis and molecular functions. Brain Pathol. 18, 113–121.

Lowery, A.J., Miller, N., McNeill, R.E., Kerin, M.J., 2008. MicroRNAs as prognostic indicators and therapeutic targets: potential effect on breast cancer management. Clin. Cancer Res. 14, 360–365.

Marquez, R.T., McCaffrey, A.P., 2007. Advances in MicroRNA: implications for gene therapists. Hum. Gene Ther. 19 (December) (Epub ahead of print).

Rusinov, V., Vesselin, B., Minkov, I.N., Tabler, M., 2005. MicroInspector: a web tool for detection of miRNA binding sites in an RNA sequence. Nucleic Acids Res. 33, W696–W700.

Schetter, A.J., Leung, S.Y., Sohn, J.J., Zanetti, K.A., Bowman, E.D., Yanaihara, N., Yuen, S.T., Chan, T.L., Kwong, D.L., Au, D.K., Liu, C.G., Calin, G.A., Croce, C.M., Harris, C.C., 2008. MicroRNA expression profiles associated with prognosis and therapeutic outcome in colon adenocarcinoma. JAMA 299, 425–436.

Sethupathy, P., Corda, B., Hatzigeorgiou, A.G., 2006a. TarBase: a comprehensive database of experimentally supported animal microRNA targets. RNA 12, 192–197.

Sethupathy, P., Megraw, M., Hatzigeorgiou, A.G., 2006b. A guide through present computational approaches for the identification of mammalian microRNA targets. Nat. Methods 3, 881–886.

Stothard, P., 2000. The sequence manipulation suite: JavaScript programs for analyzing and formatting protein and DNA sequences. BioTechniques 28, 1102–1104.

Thadani, T., Tammi, M.T., 2006. MicroTar: predicting microRNA targets from RNA duplexes. BMC Bioinformatics 7, S20.