research article guitar: an r/bioconductor package for ...research article guitar: an r/bioconductor...

9
Research Article Guitar: An R/Bioconductor Package for Gene Annotation Guided Transcriptomic Analysis of RNA-Related Genomic Features Xiaodong Cui, 1 Zhen Wei, 2,3 Lin Zhang, 4 Hui Liu, 4 Lei Sun, 5 Shao-Wu Zhang, 6 Yufei Huang, 1 and Jia Meng 2,3 1 Department of Electrical and Computer Engineering, University of Texas at San Antonio, San Antonio, TX 78230, USA 2 Department of Biological Sciences, HRINU, SUERI, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China 3 Institute of Integrative Biology, University of Liverpool, Liverpool L69 3BX, UK 4 School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou 221116, China 5 School of Information Engineering, Yangzhou University, Yangzhou, Jiangsu 225127, China 6 School of Automation, Northwestern Polytechnical University, Xi’an, Shaanxi 710027, China Correspondence should be addressed to Jia Meng; [email protected] Received 10 February 2016; Revised 5 April 2016; Accepted 11 April 2016 Academic Editor: Ivan Merelli Copyright © 2016 Xiaodong Cui et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Biological features, such as genes and transcription factor binding sites, are oſten denoted with genome-based coordinates as the genomic features. While genome-based representation is usually very effective in correlating various biological features, it can be tedious to examine the relationship between RNA-related genomic features and the landmarks of RNA transcripts with existing tools due to the difficulty in the conversion between genome-based coordinates and RNA-based coordinates. We developed here an open source Guitar R/Bioconductor package for sketching the transcriptomic view of RNA-related biological features represented by genome based coordinates. Internally, Guitar package extracts the standardized RNA coordinates with respect to the landmarks of RNA transcripts, with which hundreds of millions of RNA-related genomic features can then be efficiently analyzed within minutes. We demonstrated the usage of Guitar package in analyzing posttranscriptional RNA modifications (5-methylcytosine and N6-methyladenosine) derived from high-throughput sequencing approaches (MeRIP-Seq and RNA BS-Seq) and show that RNA 5-methylcytosine (m 5 C) is enriched in 5 UTR. e newly developed Guitar R/Bioconductor package achieves stable performance on the data tested and revealed novel biological insights. It will effectively facilitate the analysis of RNA methylation data and other RNA-related biological features in the future. 1. Introduction Genome-based coordinates, which consist of the name of chromosome and the starting/ending coordinates, have been widely used to denote the genomic location of various bio- logical features, such as genes, SNPs, and transcription factor binding sites (TFBS). With genome-based coordinates, the relationship between different biological features can be easily inferred. Currently, genomic features (biological features represented by genome-based coordinates) have become the basis of many bioinformatics tools in various biological data processing pipelines, and dedicated types of operation are also available [1]. While genome-based coordinates are very useful for analysis of genome related biological features, it can still be tedious for analysis or visualization of RNA-related features, such as RNA N6-methyladenosine (m 6 A) and RNA 5-methylcytosine (m 5 C) [2]. As an emerging layer of gene expression regulation, posttranscriptional RNA modifications, including m 6 A and m 5 C, are recently found to play various important roles in a number of biological processes, such as translation efficiency [3], microRNA processing [4], RNA-protein interaction [5], RNA stability [6], and pluripotency [7]. Together with the development of new sequencing approaches [8–11] for unbi- ased profiling of the posttranscriptional RNA modifications, a number of bioinformatics tools [12, 13] have been created Hindawi Publishing Corporation BioMed Research International Volume 2016, Article ID 8367534, 8 pages http://dx.doi.org/10.1155/2016/8367534

Upload: others

Post on 24-Mar-2021

13 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Research Article Guitar: An R/Bioconductor Package for ...Research Article Guitar: An R/Bioconductor Package for Gene Annotation Guided Transcriptomic Analysis of RNA-Related Genomic

Research ArticleGuitar An RBioconductor Package for Gene Annotation GuidedTranscriptomic Analysis of RNA-Related Genomic Features

Xiaodong Cui1 Zhen Wei23 Lin Zhang4 Hui Liu4 Lei Sun5 Shao-Wu Zhang6

Yufei Huang1 and Jia Meng23

1Department of Electrical and Computer Engineering University of Texas at San Antonio San Antonio TX 78230 USA2Department of Biological Sciences HRINU SUERI Xirsquoan Jiaotong-Liverpool University Suzhou 215123 China3Institute of Integrative Biology University of Liverpool Liverpool L69 3BX UK4School of Information and Electrical Engineering China University of Mining and Technology Xuzhou 221116 China5School of Information Engineering Yangzhou University Yangzhou Jiangsu 225127 China6School of Automation Northwestern Polytechnical University Xirsquoan Shaanxi 710027 China

Correspondence should be addressed to Jia Meng jiamengxjtlueducn

Received 10 February 2016 Revised 5 April 2016 Accepted 11 April 2016

Academic Editor Ivan Merelli

Copyright copy 2016 Xiaodong Cui et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Biological features such as genes and transcription factor binding sites are often denoted with genome-based coordinates as thegenomic features While genome-based representation is usually very effective in correlating various biological features it can betedious to examine the relationship between RNA-related genomic features and the landmarks of RNA transcripts with existingtools due to the difficulty in the conversion between genome-based coordinates and RNA-based coordinatesWe developed here anopen source Guitar RBioconductor package for sketching the transcriptomic view of RNA-related biological features representedby genome based coordinates Internally Guitar package extracts the standardized RNA coordinates with respect to the landmarksof RNA transcripts with which hundreds of millions of RNA-related genomic features can then be efficiently analyzed withinminutesWe demonstrated the usage ofGuitar package in analyzing posttranscriptional RNAmodifications (5-methylcytosine andN6-methyladenosine) derived from high-throughput sequencing approaches (MeRIP-Seq and RNA BS-Seq) and show that RNA5-methylcytosine (m5C) is enriched in 51015840UTR The newly developed Guitar RBioconductor package achieves stable performanceon the data tested and revealed novel biological insights It will effectively facilitate the analysis of RNAmethylation data and otherRNA-related biological features in the future

1 Introduction

Genome-based coordinates which consist of the name ofchromosome and the startingending coordinates have beenwidely used to denote the genomic location of various bio-logical features such as genes SNPs and transcription factorbinding sites (TFBS) With genome-based coordinates therelationship betweendifferent biological features can be easilyinferred Currently genomic features (biological featuresrepresented by genome-based coordinates) have become thebasis of many bioinformatics tools in various biological dataprocessing pipelines and dedicated types of operation arealso available [1] While genome-based coordinates are very

useful for analysis of genome related biological features it canstill be tedious for analysis or visualization of RNA-relatedfeatures such as RNA N6-methyladenosine (m6A) and RNA5-methylcytosine (m5C) [2]

As an emerging layer of gene expression regulationposttranscriptional RNA modifications including m6A andm5C are recently found to play various important roles in anumber of biological processes such as translation efficiency[3] microRNA processing [4] RNA-protein interaction [5]RNA stability [6] and pluripotency [7] Together with thedevelopment of new sequencing approaches [8ndash11] for unbi-ased profiling of the posttranscriptional RNA modificationsa number of bioinformatics tools [12 13] have been created

Hindawi Publishing CorporationBioMed Research InternationalVolume 2016 Article ID 8367534 8 pageshttpdxdoiorg10115520168367534

2 BioMed Research International

for interpretation of these datasets A mammalian RNAmethylation database [14] has been created that paved theway for a systematic understanding of the RNA methylomeregulation mechanism [15] however to our knowledge nobioinformatics effort has been made specifically for effectivevisualization of RNA methylation features from global levelConceivably the functions of RNA-related features are likelyto be related to the landmarks of RNA transcripts that istranscription starting site (TSS) start codon stop codonand transcription ending site (TES) and the existing toolsdeveloped for genome-based features are not effective foranalysis of RNA methylation data

Compared with genome-regulated biological features(eg histonemodifications and TFBS) visualization of RNA-related features (such as RNA methylation sites) representedin genomic coordinates is nontrivial due to the followingreasons

(i) Increased Ambiguity due to IsoformsThe ambiguity ingenome-based coordinates is often due to the repeatsCompared with genome transcriptome has increasedcomplexity due to transcript isoform and alternativesplicing It is possible that an RNA methylation sitefalls into the CDS of one transcript but on the31015840UTR of another isoform transcript It is knownthat many genes have very large number of isoformtranscripts so it may be impossible to specificallyassign a genomic feature aligned on this gene to aparticular isoform transcript

(ii) Variation in Transcript Length In human and mousetranscripts vary greatly in size It is important tonotice that 1000 base pairs (bp) from start codonmay be a long distance for shorter genes but notfor the longer ones For genomic coordinates bphas been shown to be a reasonable unit howeverwhen measuring distance between two RNA-relatedfeatures a relative unit standardized by the entirewidth of the transcript may be more suitable Thisidea has been used widely today in many studiesPlease note that the technical resolution remains thesame when measured in bp on shorter or longertranscripts When a standardized coordinate is usedlonger transcripts actually obtain a higher resolutionin terms of standardized coordinate compared withthe shorter transcripts Compared with a transcriptwith 20 k bp it is more difficult to tell whether theRNA methylation site on a 200 bp transcript is closeto its stop codon even though the technical resolutionremains the same

(iii) Complexity in Landmarks of RNA Transcript For his-tone modifications and transcription factor bindingsites landmarks of interests are often the transcrip-tion starting site (TSS) and transcription ending sites(TES) For RNA methylation two additional land-marks are the start codon and stop codon of mRNAand the setting is further complicated by the existenceof long noncoding RNA (lncRNA) and various smallRNA families Furthermore some mRNAs may nothave 31015840UTRor 51015840UTR Additionally the 51015840UTR CDS

and 31015840UTR are apparently of different length for mostgenes so the 3 components need to be standardizedindependently when summarized or compared withother transcripts There are already tools such asngsplot [16] developed to handle TSS and TES butto our knowledge none supportsmore detailed struc-ture (start codon and stop codon) It is important toinclude all RNA landmarks and discriminate codingand noncoding RNAs when analyzing RNA-relatedfeatures due to their intrinsic property

For the aforementioned reasons a dedicated approach needsto be developed for visualization of RNA methylation dataand other RNA-related biological features

2 Design and Implementation

We develop here an open source R package Guitar for geneannotation guided transcriptomic analysis of RNA-relatedgenomic features such as RNA methylation sites denoted ingenome-based coordinates The approach is detailed next

21 Guitar Coordinates To visualize the multiple RNA-related features together transcripts of different length needto be standardized in the first place For this purpose weconstructed the Guitar coordinates which is essentially thegenomic projection of the standardized transcriptomic coor-dinates Specifically each component of a single transcriptis divided into a number of bins of equal width For longnoncoding RNA the whole transcript is a single componentfor mRNA there are 3 components that is 51015840UTR CDSand 31015840UTR Their genomic projected coordinates are thenobtained with the help of GenomicFeatures RBioconductorpackage [1] Please note that of interest are the mature mRNAand lncRNA and it is possible that a specific bin may spanintrons The generated Guitar coordinates are essentially stillgenome-based coordinates but clearly associated with land-marks of transcript for example 02 standardized lncRNAlength from the TSS The procedures for generating Guitarcoordinates are illustrated in Figure 1

22 Guitar Coordinates of a Transcriptome As mentionedpreviously for mRNA of interest are usually 3 componentsrather than a single one that is 51015840UTR CDS and 31015840UTRConsistently the Guitar coordinates need to be generatedseparately for all the 3 regions In order to make the 3components comparable each component is standardizedindependently and contributes to 13 of the entire codingtranscript (the difference between 51015840UTR CDS and 31015840UTRin size can also be reflected in the analysis byGuitar package)For lncRNA this is not needed and theGuitar coordinates aregenerated for the entire lncRNA

Due to the existence of isoform ambiguity the samegenomic locationmay be associated withmultiple transcriptsand thus related to multiple Guitar coordinates To ensurethe specificity of the generated Guitar coordinates filteringof highly ambiguous transcripts may be needed Two filtersare implemented Firstly a length filter is implemented toselect transcripts longer than a user-defined threshold This

BioMed Research International 3

lncRNA

Pre-lncRNAExon-1 Exon-2 Exon-3Intron Intron

Binning

Bin-1 Bin-2 Bin-3

Guitar coordinates

Exon-1 Exon-2 Exon-3Intron Intron

Bin-1 Bin-2 Bin-3

Convert to genomiccoordinates

5

5

5

5

3

3

3

3

Standardized coordinateson lncRNA

Splicing

Figure 1 Guitar coordinates This figure illustrates how the Guitar coordinates are generated based on 3 bins on lncRNA transcript Thebins may be split into multiple sections of the transcript represented by GRangesList object which can be conveniently compared with othergenome-based features to reflect their distribution In practice if 70 of a feature overlap 51015840UTR and 30 CDS then it is likely that thefeature overlaps with more Guitar Coordinates (GCs) composed from 51015840UTR than from CDS As a result more GCs from 51015840UTR will reportthe overlapping of this feature and the resulting weight will precisely reflect the 70 and 30 division

is to ensure the generated Guitar coordinates have sufficientresolution from the technology perspective with the dataanalyzed For techniques with single-base resolution thiscutoff can be smaller (eg 10 bp) while for techniqueswith lower resolution it should be larger (eg 100 bp)Secondly an ambiguity filter is implemented to discard geneswith too many isoforms which can cause ambiguity in thefeature assignment stage The implementation of this filter ismainly to reduce memory usage and save computation time(see Supplementary Material Figure S1 and Table S1 for acomparison of different settings in Supplementary Materialavailable online at httpdxdoiorg10115520168367534) Inaddition the filter may be necessary for less-studied specieswith problematic gene annotationThe second filter can filterout a number of genes with higher isoform ambiguity anddoing so will significantly decrease the memory usage forthe constructed Guitar coordinates and save computationtime For this purpose we implemented a simple strategyby counting the number of overlapping transcripts on thegenome that is checking the number of transcripts sharingexons In the default setting ofGuitar package we filtered outtranscripts that overlap with more than 3 other transcriptson the genome The parameter provides reasonable goodresults in the data tested and can be easily customized by theuser After applying the ambiguity filter if a genomic featurestill overlaps with multiple transcripts the ambiguity will befactored in the analysis Specifically if a site is located onboth CDS and 31015840UTR of the same transcript then the Guitarcoordinates associated with those components should haveoverlapped with that genomic feature and those coordinatesare labeled as associatedwith the genomic feature withweight1 indicating the feature is fully associated with this transcriptHowever if a genomic feature is associated with 119899 transcripts

that is it overlaps with 119899 transcripts then the overlappingcoordinates are associated with that genomic feature withweight 1119899 reflecting the ambiguity in association

23 Transcriptomic View of Genomic Features To sketchthe transcriptomic view of genomic features the Guitarcoordinates can be compared with other genomic featuresand the number of overlapped features can then be countedand standardized on mRNA and lncRNA respectively Thedistribution of genomic features onRNAwill then be summa-rized and visualized by ggplot2 package for quality graphics[17] We also provided a convenient function GuitarPlotfor fast visualization of various genomic features in differentformats The general working procedures for Guitar packageare shown in Figure 2

Besides the aforementioned structure there are a fewmore useful features provided by Guitar package

(i) Including neighborhood DNA regions for comparisonThe neighborhood DNA regions (promoter regionand its complementary DNA at the 31015840 end) can beoptionally included in analysis ofGuitar packageThiswill be useful for analyzing genomic features that arerelated to both DNA and RNA such as H3K4me3ChIP-Seq data

(ii) Ambiguous Assignment of the Ambiguous GenomicFeatures Due to ambiguity of heterogeneous tran-scriptome some genomic features overlap with morethan one transcript and thus with ambiguous belong-ings In the Guitar package ambiguous features arealso counted in the analysis with their weights equallydivided between all the overlapping transcripts How-ever we can optionally filter out genomic features

4 BioMed Research International

Prepare transcript info

(i) Downloadedfrom UCSC

(ii) Filtered by size and ambiguity

Build guitar coordinates Guitar plot

(i) mRNA

(ii) lncRNA(iii) Promoter and

tail DNAregions

5998400UTR CDS

and 3998400UTR

(i) Count overlaps withother features

(ii) Assign weights

(iii) Normalizationfor distribution

Figure 2 Workflow of Guitar package The gene annotation can be automatically downloaded from Internet or provided as a TxDb object[1] by the user After filtering transcripts that may not provide useful information Guitar coordinates of different components are builtindependently with which the actual transcriptomic distribution of genomic features can be observed

overlapping with more than a predefined numberof transcripts (Default 5) to completely discard theimpact of these highly ambiguous features

(iii) Resizing the Components in Visualization In practicedifferent components of mRNA that is 51015840UTR CDSand 31015840UTR are of quite different width 51015840UTR isusually much shorter than 31015840UTR and CDS in mouseand human Although these components are treatedindependently when building the Guitar coordinateswe still calculated the average length of each compo-nent across different genes so the true relative widthcan be optionally reflected in the plot generated fromGuitar package however doing so may make the51015840UTR region too small for clear observation

(iv) Connection with ggplot2 for More Complex Graph-ics The Guitar may optionally return intermediateresults which can be reused in the ggplot2 packagefor more complex graphics This is designed for theadvanced users only and not recommended

3 Results

We developed the Guitar RBioconductor package for geneannotation guided transcriptomic analysis of RNA-relatedgenomic features It is currently and publicly available fromBioconductor In this section we show the application ofGuitar package to the analysis of RNA methylation sitesdenoted as genomic features with a few examples Moreinformation is available from the documentation of theGuitar RBioconductor package

31 Case Study 1 RNA N6-Methyladenosine from MeRIP-SeqIn the first example we studyMeRIP-Seq data [10] profiling ofthe transcriptome RNA N6-methyladenosine (m6A) [18 19]sites in human HepG2 cell lines The RNA m6A methylationsites are obtained with exomePeak RBioconductor package[12] Compared with other peak calling algorithms oneadded flexibility of the exomePeak package is to detect onlythe highly methylated sites With a larger ldquoIPinput ratiordquoexomePeak will report only the highly methylated sites thatis a higher proportion of a specific kind of RNA moleculecarries methylation at this site

We implemented exomePeak setting different ldquoIPinputratiordquo (1 2 4 and 8) and we then examined the distributionsof the peaks in different exonic regions at each enrichmentthreshold As shown in Figure 3 the detected m6A sites withmore than 8 times of enrichment are overly present near thestop codonbut highly deficient near transcription starting site(TSS) and on 51015840UTR The different distribution patterns ofthe highly and weakly methylated sites may indicate functionversatility and call for more specialized analysis targetingthe sites with different methylation level In contrast todistribution preference of m6A sites on mRNA it is almostuniformly (or randomly) distributed on lncRNA regardlessof the ldquoIPinput ratiordquo specified

In the aforementioned example we implemented thedefault

32 Case Study 2 RNA 5-Methylcytosine from BS-Seq Inprevious study we confirmed that RNA m6A methylationsites are enriched near stop codon and discovered that highlyenrichedm6Amethylation sites are depleted on 51015840UTRNextwe study 5-methylcytosine (m5C) [20 21] profiled with RNABS-Seq experiment [11 22] Different from the relatively well-studiedDNAm5Cmethylation [23 24] the functions of RNAm5C on mRNA are still largely elusive [20] and its globaldistributions onmRNAand lncRNA are poorly characterizedso far

The bisulfite sequencing data characterizing the RNAm5C methylation profiles in mouse embryo fibroblast [25]was directly obtained from GEO (GSE44359) and thenprocessed with trim galore [26] to remove low-quality readsand adaptor sequences the processed reads are then alignedto mouse mm10 genome assembly with Bismark [27] and424627 Cytosine (C) residuals are called with at least 5 readsaligned To analyze the distribution ofmethylatedC residualswe further divided all the reported C residuals into 3 groupsbased on their methylation level with a binomial test usingthe average methylation probability (165) of the entiretranscriptome at confidence level 005

It can be seen from Figure 4 that the distribution ofm5C on both lncRNA and mRNA has a very strong 51015840 biasthat is more reported C residuals from BS-Seq data are onthe 51015840UTR region The distribution patterns of 3 groups ofresiduals with different methylation level are quite different

BioMed Research International 5

00

05

10

15

20

25Fr

eque

ncy

Fold enrichment

12

48

Fold enrichment

12

48

Distribution on mRNA

lncRNA00

05

10

15

Freq

uenc

y

Distribution on lncRNA

5998400UTR CDS 3

998400UTR

Figure 3 m6A sites on mRNA and lncRNA In mRNA the strongest binding sites (ldquoIPinput ratiordquo larger than 8) are highly enriched nearstop codon side of 31015840UTR and deficient on TSS (transcription starting site) side of 51015840UTR and the phenomena are more prominent than lowlymethylated sites In contrast the m6A sites are almost uniformly distributed on lncRNA despite the ldquoIPinput ratiordquo specified Please notethat in this figure the size of 51015840UTR CDS and 31015840UTR reflects their true width within the transcriptome so the 51015840UTR region is much shortercompared with the other two components This result is based on peaks called on human HepG2 dataset [10]

0

1

2

Freq

uenc

y

Methylation levelHigherUnsureLower

Distribution on mRNA

lncRNA00

05

10

15

20

Freq

uenc

y

Distribution on lncRNA

Methylation levelHigherUnsureLower

5998400UTR CDS 3

998400UTR

Figure 4 Distribution of RNAm5C residuals on mRNA and lncRNA We divided all the Bismark reported C residuals into 3 groups that is9514 residuals with methylation level higher than 165 8600 lower and 406513 residuals cannot be determined with statistical significanceWe can see that for all 3 groups there exists a strong 51015840 bias on the mouse RNA BS-Seq data on both mRNA and lncRNA The highly m5Cmethylated residuals are strongly enriched at 51015840UTR of mRNA

on mRNA with highly methylated Cs (higher than 165with 005 confidence level) prominently enriched on 51015840UTRand near start codon A cytosine residual located on 51015840UTRis around 1 times more likely to be methylated than thaton CDS and 068 times more likely than that on 31015840UTRWhile previous study indicates RNAm5C is enriched on both

51015840UTR and 31015840UTR our analysis with increased resolutionindicates the enrichment is a lot stronger on 51015840UTR than31015840UTR [20] We also notice the pattern is complementary tothe distribution of m6A which is enriched near stop codonand 31015840UTR [8] (see Figure 3) These observations suggestthat RNA m5C and m6A may work in a complementary

6 BioMed Research International

manner On lncRNA however despite the overall 51015840 biasthere is no apparent difference in their distribution patternsrevealed

We in the following compared the relative distributionof highly and lowly m5C methylated C residuals on RNAand DNA The same processing pipe line as described pre-viously is applied to mouse embryo fibroblast whole genomebisulfite sequencing data [25] which profiles the DNA m5Cmethylation rather than that on RNA A total of 3641243cytosine residuals are reported by Bismark and they are thendivided into 3 groups (988335 1966017 and 686891 residualsresp) based onwhether theirmethylation level is significantlyhigher or lower than the average reported DNA methylationlevel (6579) with a binomial test at 005 confidence levelThe distributions of the 3 groups of cytosine residuals arestill profiled by Guitar package with neighborhood DNAregions included and the results are shown in Figure 5 Wecan see that compared with RNA methylation profile wherethe highly methylated residuals are enriched on 51015840UTRthe DNA methylation profile shows a clear oppose patternwith the lowly methylated residuals enriched on 51015840UTR andpeaked near the start codon to enable transcription initiationThis observation indicates DNA and RNA methyltransferasecomplexes probably have quite different sequence specificityeven though some key enzyme genes such as Dnmt2 [28 29]are shared between them [30]

4 Discussion and Conclusion

Currently most biological features are represented withgenome-based coordinates making it rather tedious forcomparing with transcriptomic landmarks We developed aGuitar R package which can be a useful tool for analyzingRNA-related genomic features especially RNA methylationBuilt upon the highly efficient GRangesList structure andGenomicFeatures RBioconductor packages Guitar can effi-ciently process millions of genomic features within minutesfor efficient transcriptomic analysis It may automaticallydownload gene information from UCSC genome browserincluding neighborhood DNA regions and allocate theweight of ambiguous features The developed Guitar coordi-nates also provide low level conversion from transcript-basedcoordinates to genome-based coordinates which shouldfacilitate various customized analysis Nevertheless there arestill a number of issues remaining to be addressed

Firstly highly abundant transcripts and ambiguousgenomic features may dominate the result from Guitar pack-age Currently all genomic features and all RNA transcriptshave the same weight this is fine when the genomic featuresare biological features such as RNA methylation sites butit may cause problem when the genomic features are NGSmapped reads such as RNAseq reads because some genesare a lot more highly expressed than the other genes and thereads generated from these genes may dominate the result Amore robust approach may be to use the median abundanceof all RNA transcripts rather the summed density It is alsopossible to develop a boxplot-like visualization scheme in thefuture to indicate the estimated higher and lower bounds forthe distribution of tested features

000

025

050

075

100

Freq

uenc

y

000

025

050

075

100

Freq

uenc

y

5998400UTR CDS 3

998400UTR

Methylation levelHigherUnsureLower

Methylation levelHigherUnsureLower

5998400UTR CDS 3

998400UTRPromoter(1kb) (1kb)

Tail

RNA m5C on mRNA

DNA m5C on mRNA encoding region

Figure 5 Distribution of DNA and RNA m5C residuals WhileC residuals of mRNA are likely to be methylated on 51015840UTR it isopposite on DNA On DNA the start codon and 51015840UTR regions areless likely to bemethylated compared with other regions to allow theinitiation of transcription process

Secondly the developed normalization scheme withinGuitar package only facilitates the comparison of featuredistribution on mRNA transcripts or on lncRNA transcriptsA cross-comparison between mRNA and lncRNA is not sup-ported so far In general it is still very difficult to rigorouslycontrast the distribution of genomic features on mRNA andlncRNA due to their intrinsic difference Compared withmRNA lncRNA usually has lower expression level and lessnumber of exons and is less conserved in sequence

Thirdly because of the heterogeneity of transcriptomeand the discrepancy between transcriptomic and genomicfeatures ambiguity arises and cannot be resolved perfectlyPrevious study of transcriptomic distribution of RNAmethy-lation relies on the longest transcript to eliminate ambiguity

BioMed Research International 7

in feature assignment whichmay require further justificationon why the longest transcript should be the canonical one Amore reasonable solution may be to rely on the most abun-dant isoform transcript however the isoform quantificationis usually a difficult problem and such information may notbe handy and sometimes may be unavailable at all CurrentlyGuitar package adapts two strategies to address this issueFirstly highly ambiguous genomic features or transcripts areexcluded from the analysis Secondly the weight of ambigu-ousmapping is evenly divided among transcripts It should bepossible to develop amore rigorous formulation for examplewith fuzzy system model [31ndash33] to further improve theallocation of weight from ambiguous overlapping

Fourthly during the development process of the Gui-tar package we also realized the Guitar coordinates maypotentially be used for accessing the data quality of MeRIP-Seq datasets The reproducibility uniformity homogeneityand the information entropy within RNA-related sequencingdata (RNA-SeqMeRIP-Seq and BS-Seq) can be convenientlyassessed with the help of Guitar coordinates however thisapplication has not been explored so far andwill be our futurework

Despite the aforementioned limitations the Guitar pack-age is capable of sketching the approximate transcriptomicdistribution of millions of genomic features within minutesin a single line of command We believe that the newlydeveloped Guitar coordinates and the Guitar package haveachieved stable performance on the data tested and shouldeffectively facilitate the analysis of RNAmethylation data andother RNA-related biological features

Additional Points

Project name is Guitar Project homepage is httpbiocon-ductororgpackagesGuitar Operating system(s) is Plat-form Independent Programming language is R License isGPL-2 For manual and guide please see supplementarymaterials

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Jia Meng Yufei Huang and Shao-Wu Zhang conceived theproject Xiaodong Cui Zhen Wei and Jia Meng carried outdevelopment and implementation of the software draftedthe paper and carried out development and implementationof the software Lin Zhang Hui Liu and Lei Sun providedvaluable discussion and helped draft the paper All authorsread and approved the final paper Xiaodong Cui and ZhenWei contribute equally to this work

Acknowledgments

This study is supported by National Natural Science Foun-dation of China (61401370 61473232 91430111 61501466

and 61301220) Fundamental Research Funds for the Cen-tral Universities (2014QNB47 2014QNA84) Jiangsu Scienceand Technology Program (BK20140403) and US NationalInstitutes of Health 5 U54 CA113001 and R01GM113245The authors also appreciate the computational support fromComputational Systems Biology Core University of Texas atSanAntonio funded byNational Institute onMinorityHealthand Health Disparities (G12MD007591) from the NationalInstitutes of Health of USA

References

[1] M LawrenceWHuberH Pages et al ldquoSoftware for computingand annotating genomic rangesrdquo PLoS Computational Biologyvol 9 no 8 Article ID e1003118 2013

[2] N Liu and T Pan ldquoRNA epigeneticsrdquo Translational Researchvol 165 no 1 pp 28ndash35 2015

[3] X Wang B S Zhao I A Roundtree et al ldquoN(6)-methylad-enosine modulates messenger RNA translation efficiencyrdquo Cellvol 161 no 6 pp 1388ndash1399 2015

[4] C R Alarcon H Lee H Goodarzi N Halberg and S FTavazoie ldquoN 6-methyladenosinemarks primarymicroRNAs forprocessingrdquo Nature vol 519 no 7544 pp 482ndash485 2015

[5] N Liu Q Dai G Zheng C He M Parisien and T Pan ldquoN6-methyladenosine-dependent RNA structural switches regulateRNA-protein interactionsrdquo Nature vol 518 no 7540 pp 560ndash564 2015

[6] YWang Y Li J I TothMD Petroski Z Zhang and J C ZhaoldquoN 6-methyladenosinemodification destabilizes developmentalregulators in embryonic stem cellsrdquo Nature Cell Biology vol 16no 2 pp 191ndash198 2014

[7] S Geula S Moshitch-Moshkovitz D Dominissini et al ldquom6AmRNA methylation facilitates resolution of naıve pluripotencytoward differentiationrdquo Science vol 347 no 6225 pp 1002ndash1006 2015

[8] K D Meyer Y Saletore P Zumbo O Elemento C E Masonand S R Jaffrey ldquoComprehensive analysis of mRNA methyla-tion reveals enrichment in 31015840 UTRs and near stop codonsrdquo Cellvol 149 no 7 pp 1635ndash1646 2012

[9] DDominissini SMoshitch-MoshkovitzM Salmon-DivonNAmariglio and G Rechavi ldquoTranscriptome-wide mapping ofN6-methyladenosine by m6A-seq based on immunocapturingand massively parallel sequencingrdquo Nature Protocols vol 8 no1 pp 176ndash189 2013

[10] D Dominissini S Moshitch-Moshkovitz S Schwartz et alldquoTopology of the human and mouse m6A RNA methylomesrevealed by m6A-seqrdquo Nature vol 484 no 7397 pp 201ndash2062012

[11] M Schaefer T Pollex K Hanna and F Lyko ldquoRNA cytosinemethylation analysis by bisulfite sequencingrdquo Nucleic AcidsResearch vol 37 no 2 article e12 2009

[12] J Meng X Cui M K Rao Y Chen and Y Huang ldquoExome-based analysis for RNA epigenome sequencing datardquo Bioinfor-matics vol 29 no 12 pp 1565ndash1567 2013

[13] J Meng Z Lu H Liu et al ldquoA protocol for RNA methylationdifferential analysis with MeRIP-Seq data and exomePeakRBioconductor packagerdquo Methods vol 69 no 3 pp 274ndash2812014

[14] H Liu M A Flores J Meng et al ldquoMeT-DB a database oftranscriptome methylation in mammalian cellsrdquo Nucleic AcidsResearch vol 43 no 1 pp D197ndashD203 2015

8 BioMed Research International

[15] L Liu S-W Zhang Y-C Zhang et al ldquoDecomposition ofRNA methylome reveals co-methylation patterns induced bylatent enzymatic regulators of the epitranscriptomerdquoMolecularBioSystems vol 11 no 1 pp 262ndash274 2015

[16] L Shen N Shao X Liu and E Nestler ldquoNgsplot quickmining and visualization of next-generation sequencing data byintegrating genomic databasesrdquo BMC Genomics vol 15 article284 2014

[17] C Ginestet ldquoggplot2 elegant graphics for data analysisrdquo Journalof the Royal Statistical Society Series A (Statistics in Society) vol174 no 1 pp 245ndash246 2011

[18] Y Fu D Dominissini G Rechavi and C He ldquoGene expressionregulation mediated through reversible m6A RNA methyla-tionrdquo Nature Reviews Genetics vol 15 no 5 pp 293ndash306 2014

[19] K D Meyer and S R Jaffrey ldquoThe dynamic epitranscriptomeN6-methyladenosine and gene expression controlrdquo NatureReviews Molecular Cell Biology vol 15 no 5 pp 313ndash326 2014

[20] J E Squires H R Patel M Nousch et al ldquoWidespreadoccurrence of 5-methylcytosine in human coding and non-coding RNArdquo Nucleic Acids Research vol 40 no 11 pp 5023ndash5033 2012

[21] S Hussain J Aleksic S Blanco S Dietmann and M FryeldquoCharacterizing 5-methylcytosine in the mammalian epitran-scriptomerdquo Genome Biology vol 14 article 215 2013

[22] Y Motorin F Lyko and M Helm ldquo5-Methylcytosine inRNA detection enzymatic formation and biological functionsrdquoNucleic Acids Research vol 38 no 5 Article ID gkp1117 pp1415ndash1430 2009

[23] C Bock ldquoAnalysing and interpreting DNA methylation datardquoNature Reviews Genetics vol 13 no 10 pp 705ndash719 2012

[24] P A Jones ldquoFunctions of DNAmethylation islands start sitesgene bodies and beyondrdquo Nature Reviews Genetics vol 13 no7 pp 484ndash492 2012

[25] V Khoddami and B R Cairns ldquoIdentification of direct tar-gets and modified bases of RNA cytosine methyltransferasesrdquoNature Biotechnology vol 31 no 5 pp 458ndash464 2013

[26] F Krueger Trim Galore A wrapper tool around Cutadaptand FastQC to consistently apply quality and adapter trimmingto FastQ files with some extra functionality for MspI-digestedRRBS-type (Reduced Representation Bisufite-Seq) libraries 2013

[27] F Krueger and S R Andrews ldquoBismark a flexible aligner andmethylation caller for Bisulfite-Seq applicationsrdquo Bioinformat-ics vol 27 no 11 Article ID btr167 pp 1571ndash1572 2011

[28] A Dong J A Yoder X Zhang L Zhou T H Bestor andX Cheng ldquoStructure of human DNMT2 an enigmatic DNAmethyltransferase homolog that displays denaturant-resistantbinding to DNArdquoNucleic Acids Research vol 29 no 2 pp 439ndash448 2001

[29] M Schaefer T Pollex K Hanna et al ldquoRNA methylation byDnmt2 protects transfer RNAs against stress-induced cleavagerdquoGenes amp Development vol 24 no 15 pp 1590ndash1595 2010

[30] M A Machnicka K Milanowska O O Oglou et al ldquoMOD-OMICS a database of RNA modification pathwaysmdash2013updaterdquo Nucleic Acids Research vol 41 no 1 pp D262ndashD2672013

[31] C L P Chen J Wang C-H Wang and L Chen ldquoA newlearning algorithm for a fully connected neuro-fuzzy inferencesystemrdquo IEEE Transactions on Neural Networks and LearningSystems vol 25 no 10 pp 1741ndash1757 2014

[32] J Zhou C L P Chen L Chen and H-X Li ldquoA collaborativefuzzy clustering algorithm in distributed network environ-mentsrdquo IEEE Transactions on Fuzzy Systems vol 22 no 6 pp1443ndash1456 2014

[33] M Gan C L P Chen H-X Li and L Chen ldquoGradientradial basis function based varying-coefficient autoregressivemodel for nonlinear and nonstationary time seriesrdquo IEEE SignalProcessing Letters vol 22 no 7 pp 809ndash812 2015

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 2: Research Article Guitar: An R/Bioconductor Package for ...Research Article Guitar: An R/Bioconductor Package for Gene Annotation Guided Transcriptomic Analysis of RNA-Related Genomic

2 BioMed Research International

for interpretation of these datasets A mammalian RNAmethylation database [14] has been created that paved theway for a systematic understanding of the RNA methylomeregulation mechanism [15] however to our knowledge nobioinformatics effort has been made specifically for effectivevisualization of RNA methylation features from global levelConceivably the functions of RNA-related features are likelyto be related to the landmarks of RNA transcripts that istranscription starting site (TSS) start codon stop codonand transcription ending site (TES) and the existing toolsdeveloped for genome-based features are not effective foranalysis of RNA methylation data

Compared with genome-regulated biological features(eg histonemodifications and TFBS) visualization of RNA-related features (such as RNA methylation sites) representedin genomic coordinates is nontrivial due to the followingreasons

(i) Increased Ambiguity due to IsoformsThe ambiguity ingenome-based coordinates is often due to the repeatsCompared with genome transcriptome has increasedcomplexity due to transcript isoform and alternativesplicing It is possible that an RNA methylation sitefalls into the CDS of one transcript but on the31015840UTR of another isoform transcript It is knownthat many genes have very large number of isoformtranscripts so it may be impossible to specificallyassign a genomic feature aligned on this gene to aparticular isoform transcript

(ii) Variation in Transcript Length In human and mousetranscripts vary greatly in size It is important tonotice that 1000 base pairs (bp) from start codonmay be a long distance for shorter genes but notfor the longer ones For genomic coordinates bphas been shown to be a reasonable unit howeverwhen measuring distance between two RNA-relatedfeatures a relative unit standardized by the entirewidth of the transcript may be more suitable Thisidea has been used widely today in many studiesPlease note that the technical resolution remains thesame when measured in bp on shorter or longertranscripts When a standardized coordinate is usedlonger transcripts actually obtain a higher resolutionin terms of standardized coordinate compared withthe shorter transcripts Compared with a transcriptwith 20 k bp it is more difficult to tell whether theRNA methylation site on a 200 bp transcript is closeto its stop codon even though the technical resolutionremains the same

(iii) Complexity in Landmarks of RNA Transcript For his-tone modifications and transcription factor bindingsites landmarks of interests are often the transcrip-tion starting site (TSS) and transcription ending sites(TES) For RNA methylation two additional land-marks are the start codon and stop codon of mRNAand the setting is further complicated by the existenceof long noncoding RNA (lncRNA) and various smallRNA families Furthermore some mRNAs may nothave 31015840UTRor 51015840UTR Additionally the 51015840UTR CDS

and 31015840UTR are apparently of different length for mostgenes so the 3 components need to be standardizedindependently when summarized or compared withother transcripts There are already tools such asngsplot [16] developed to handle TSS and TES butto our knowledge none supportsmore detailed struc-ture (start codon and stop codon) It is important toinclude all RNA landmarks and discriminate codingand noncoding RNAs when analyzing RNA-relatedfeatures due to their intrinsic property

For the aforementioned reasons a dedicated approach needsto be developed for visualization of RNA methylation dataand other RNA-related biological features

2 Design and Implementation

We develop here an open source R package Guitar for geneannotation guided transcriptomic analysis of RNA-relatedgenomic features such as RNA methylation sites denoted ingenome-based coordinates The approach is detailed next

21 Guitar Coordinates To visualize the multiple RNA-related features together transcripts of different length needto be standardized in the first place For this purpose weconstructed the Guitar coordinates which is essentially thegenomic projection of the standardized transcriptomic coor-dinates Specifically each component of a single transcriptis divided into a number of bins of equal width For longnoncoding RNA the whole transcript is a single componentfor mRNA there are 3 components that is 51015840UTR CDSand 31015840UTR Their genomic projected coordinates are thenobtained with the help of GenomicFeatures RBioconductorpackage [1] Please note that of interest are the mature mRNAand lncRNA and it is possible that a specific bin may spanintrons The generated Guitar coordinates are essentially stillgenome-based coordinates but clearly associated with land-marks of transcript for example 02 standardized lncRNAlength from the TSS The procedures for generating Guitarcoordinates are illustrated in Figure 1

22 Guitar Coordinates of a Transcriptome As mentionedpreviously for mRNA of interest are usually 3 componentsrather than a single one that is 51015840UTR CDS and 31015840UTRConsistently the Guitar coordinates need to be generatedseparately for all the 3 regions In order to make the 3components comparable each component is standardizedindependently and contributes to 13 of the entire codingtranscript (the difference between 51015840UTR CDS and 31015840UTRin size can also be reflected in the analysis byGuitar package)For lncRNA this is not needed and theGuitar coordinates aregenerated for the entire lncRNA

Due to the existence of isoform ambiguity the samegenomic locationmay be associated withmultiple transcriptsand thus related to multiple Guitar coordinates To ensurethe specificity of the generated Guitar coordinates filteringof highly ambiguous transcripts may be needed Two filtersare implemented Firstly a length filter is implemented toselect transcripts longer than a user-defined threshold This

BioMed Research International 3

lncRNA

Pre-lncRNAExon-1 Exon-2 Exon-3Intron Intron

Binning

Bin-1 Bin-2 Bin-3

Guitar coordinates

Exon-1 Exon-2 Exon-3Intron Intron

Bin-1 Bin-2 Bin-3

Convert to genomiccoordinates

5

5

5

5

3

3

3

3

Standardized coordinateson lncRNA

Splicing

Figure 1 Guitar coordinates This figure illustrates how the Guitar coordinates are generated based on 3 bins on lncRNA transcript Thebins may be split into multiple sections of the transcript represented by GRangesList object which can be conveniently compared with othergenome-based features to reflect their distribution In practice if 70 of a feature overlap 51015840UTR and 30 CDS then it is likely that thefeature overlaps with more Guitar Coordinates (GCs) composed from 51015840UTR than from CDS As a result more GCs from 51015840UTR will reportthe overlapping of this feature and the resulting weight will precisely reflect the 70 and 30 division

is to ensure the generated Guitar coordinates have sufficientresolution from the technology perspective with the dataanalyzed For techniques with single-base resolution thiscutoff can be smaller (eg 10 bp) while for techniqueswith lower resolution it should be larger (eg 100 bp)Secondly an ambiguity filter is implemented to discard geneswith too many isoforms which can cause ambiguity in thefeature assignment stage The implementation of this filter ismainly to reduce memory usage and save computation time(see Supplementary Material Figure S1 and Table S1 for acomparison of different settings in Supplementary Materialavailable online at httpdxdoiorg10115520168367534) Inaddition the filter may be necessary for less-studied specieswith problematic gene annotationThe second filter can filterout a number of genes with higher isoform ambiguity anddoing so will significantly decrease the memory usage forthe constructed Guitar coordinates and save computationtime For this purpose we implemented a simple strategyby counting the number of overlapping transcripts on thegenome that is checking the number of transcripts sharingexons In the default setting ofGuitar package we filtered outtranscripts that overlap with more than 3 other transcriptson the genome The parameter provides reasonable goodresults in the data tested and can be easily customized by theuser After applying the ambiguity filter if a genomic featurestill overlaps with multiple transcripts the ambiguity will befactored in the analysis Specifically if a site is located onboth CDS and 31015840UTR of the same transcript then the Guitarcoordinates associated with those components should haveoverlapped with that genomic feature and those coordinatesare labeled as associatedwith the genomic feature withweight1 indicating the feature is fully associated with this transcriptHowever if a genomic feature is associated with 119899 transcripts

that is it overlaps with 119899 transcripts then the overlappingcoordinates are associated with that genomic feature withweight 1119899 reflecting the ambiguity in association

23 Transcriptomic View of Genomic Features To sketchthe transcriptomic view of genomic features the Guitarcoordinates can be compared with other genomic featuresand the number of overlapped features can then be countedand standardized on mRNA and lncRNA respectively Thedistribution of genomic features onRNAwill then be summa-rized and visualized by ggplot2 package for quality graphics[17] We also provided a convenient function GuitarPlotfor fast visualization of various genomic features in differentformats The general working procedures for Guitar packageare shown in Figure 2

Besides the aforementioned structure there are a fewmore useful features provided by Guitar package

(i) Including neighborhood DNA regions for comparisonThe neighborhood DNA regions (promoter regionand its complementary DNA at the 31015840 end) can beoptionally included in analysis ofGuitar packageThiswill be useful for analyzing genomic features that arerelated to both DNA and RNA such as H3K4me3ChIP-Seq data

(ii) Ambiguous Assignment of the Ambiguous GenomicFeatures Due to ambiguity of heterogeneous tran-scriptome some genomic features overlap with morethan one transcript and thus with ambiguous belong-ings In the Guitar package ambiguous features arealso counted in the analysis with their weights equallydivided between all the overlapping transcripts How-ever we can optionally filter out genomic features

4 BioMed Research International

Prepare transcript info

(i) Downloadedfrom UCSC

(ii) Filtered by size and ambiguity

Build guitar coordinates Guitar plot

(i) mRNA

(ii) lncRNA(iii) Promoter and

tail DNAregions

5998400UTR CDS

and 3998400UTR

(i) Count overlaps withother features

(ii) Assign weights

(iii) Normalizationfor distribution

Figure 2 Workflow of Guitar package The gene annotation can be automatically downloaded from Internet or provided as a TxDb object[1] by the user After filtering transcripts that may not provide useful information Guitar coordinates of different components are builtindependently with which the actual transcriptomic distribution of genomic features can be observed

overlapping with more than a predefined numberof transcripts (Default 5) to completely discard theimpact of these highly ambiguous features

(iii) Resizing the Components in Visualization In practicedifferent components of mRNA that is 51015840UTR CDSand 31015840UTR are of quite different width 51015840UTR isusually much shorter than 31015840UTR and CDS in mouseand human Although these components are treatedindependently when building the Guitar coordinateswe still calculated the average length of each compo-nent across different genes so the true relative widthcan be optionally reflected in the plot generated fromGuitar package however doing so may make the51015840UTR region too small for clear observation

(iv) Connection with ggplot2 for More Complex Graph-ics The Guitar may optionally return intermediateresults which can be reused in the ggplot2 packagefor more complex graphics This is designed for theadvanced users only and not recommended

3 Results

We developed the Guitar RBioconductor package for geneannotation guided transcriptomic analysis of RNA-relatedgenomic features It is currently and publicly available fromBioconductor In this section we show the application ofGuitar package to the analysis of RNA methylation sitesdenoted as genomic features with a few examples Moreinformation is available from the documentation of theGuitar RBioconductor package

31 Case Study 1 RNA N6-Methyladenosine from MeRIP-SeqIn the first example we studyMeRIP-Seq data [10] profiling ofthe transcriptome RNA N6-methyladenosine (m6A) [18 19]sites in human HepG2 cell lines The RNA m6A methylationsites are obtained with exomePeak RBioconductor package[12] Compared with other peak calling algorithms oneadded flexibility of the exomePeak package is to detect onlythe highly methylated sites With a larger ldquoIPinput ratiordquoexomePeak will report only the highly methylated sites thatis a higher proportion of a specific kind of RNA moleculecarries methylation at this site

We implemented exomePeak setting different ldquoIPinputratiordquo (1 2 4 and 8) and we then examined the distributionsof the peaks in different exonic regions at each enrichmentthreshold As shown in Figure 3 the detected m6A sites withmore than 8 times of enrichment are overly present near thestop codonbut highly deficient near transcription starting site(TSS) and on 51015840UTR The different distribution patterns ofthe highly and weakly methylated sites may indicate functionversatility and call for more specialized analysis targetingthe sites with different methylation level In contrast todistribution preference of m6A sites on mRNA it is almostuniformly (or randomly) distributed on lncRNA regardlessof the ldquoIPinput ratiordquo specified

In the aforementioned example we implemented thedefault

32 Case Study 2 RNA 5-Methylcytosine from BS-Seq Inprevious study we confirmed that RNA m6A methylationsites are enriched near stop codon and discovered that highlyenrichedm6Amethylation sites are depleted on 51015840UTRNextwe study 5-methylcytosine (m5C) [20 21] profiled with RNABS-Seq experiment [11 22] Different from the relatively well-studiedDNAm5Cmethylation [23 24] the functions of RNAm5C on mRNA are still largely elusive [20] and its globaldistributions onmRNAand lncRNA are poorly characterizedso far

The bisulfite sequencing data characterizing the RNAm5C methylation profiles in mouse embryo fibroblast [25]was directly obtained from GEO (GSE44359) and thenprocessed with trim galore [26] to remove low-quality readsand adaptor sequences the processed reads are then alignedto mouse mm10 genome assembly with Bismark [27] and424627 Cytosine (C) residuals are called with at least 5 readsaligned To analyze the distribution ofmethylatedC residualswe further divided all the reported C residuals into 3 groupsbased on their methylation level with a binomial test usingthe average methylation probability (165) of the entiretranscriptome at confidence level 005

It can be seen from Figure 4 that the distribution ofm5C on both lncRNA and mRNA has a very strong 51015840 biasthat is more reported C residuals from BS-Seq data are onthe 51015840UTR region The distribution patterns of 3 groups ofresiduals with different methylation level are quite different

BioMed Research International 5

00

05

10

15

20

25Fr

eque

ncy

Fold enrichment

12

48

Fold enrichment

12

48

Distribution on mRNA

lncRNA00

05

10

15

Freq

uenc

y

Distribution on lncRNA

5998400UTR CDS 3

998400UTR

Figure 3 m6A sites on mRNA and lncRNA In mRNA the strongest binding sites (ldquoIPinput ratiordquo larger than 8) are highly enriched nearstop codon side of 31015840UTR and deficient on TSS (transcription starting site) side of 51015840UTR and the phenomena are more prominent than lowlymethylated sites In contrast the m6A sites are almost uniformly distributed on lncRNA despite the ldquoIPinput ratiordquo specified Please notethat in this figure the size of 51015840UTR CDS and 31015840UTR reflects their true width within the transcriptome so the 51015840UTR region is much shortercompared with the other two components This result is based on peaks called on human HepG2 dataset [10]

0

1

2

Freq

uenc

y

Methylation levelHigherUnsureLower

Distribution on mRNA

lncRNA00

05

10

15

20

Freq

uenc

y

Distribution on lncRNA

Methylation levelHigherUnsureLower

5998400UTR CDS 3

998400UTR

Figure 4 Distribution of RNAm5C residuals on mRNA and lncRNA We divided all the Bismark reported C residuals into 3 groups that is9514 residuals with methylation level higher than 165 8600 lower and 406513 residuals cannot be determined with statistical significanceWe can see that for all 3 groups there exists a strong 51015840 bias on the mouse RNA BS-Seq data on both mRNA and lncRNA The highly m5Cmethylated residuals are strongly enriched at 51015840UTR of mRNA

on mRNA with highly methylated Cs (higher than 165with 005 confidence level) prominently enriched on 51015840UTRand near start codon A cytosine residual located on 51015840UTRis around 1 times more likely to be methylated than thaton CDS and 068 times more likely than that on 31015840UTRWhile previous study indicates RNAm5C is enriched on both

51015840UTR and 31015840UTR our analysis with increased resolutionindicates the enrichment is a lot stronger on 51015840UTR than31015840UTR [20] We also notice the pattern is complementary tothe distribution of m6A which is enriched near stop codonand 31015840UTR [8] (see Figure 3) These observations suggestthat RNA m5C and m6A may work in a complementary

6 BioMed Research International

manner On lncRNA however despite the overall 51015840 biasthere is no apparent difference in their distribution patternsrevealed

We in the following compared the relative distributionof highly and lowly m5C methylated C residuals on RNAand DNA The same processing pipe line as described pre-viously is applied to mouse embryo fibroblast whole genomebisulfite sequencing data [25] which profiles the DNA m5Cmethylation rather than that on RNA A total of 3641243cytosine residuals are reported by Bismark and they are thendivided into 3 groups (988335 1966017 and 686891 residualsresp) based onwhether theirmethylation level is significantlyhigher or lower than the average reported DNA methylationlevel (6579) with a binomial test at 005 confidence levelThe distributions of the 3 groups of cytosine residuals arestill profiled by Guitar package with neighborhood DNAregions included and the results are shown in Figure 5 Wecan see that compared with RNA methylation profile wherethe highly methylated residuals are enriched on 51015840UTRthe DNA methylation profile shows a clear oppose patternwith the lowly methylated residuals enriched on 51015840UTR andpeaked near the start codon to enable transcription initiationThis observation indicates DNA and RNA methyltransferasecomplexes probably have quite different sequence specificityeven though some key enzyme genes such as Dnmt2 [28 29]are shared between them [30]

4 Discussion and Conclusion

Currently most biological features are represented withgenome-based coordinates making it rather tedious forcomparing with transcriptomic landmarks We developed aGuitar R package which can be a useful tool for analyzingRNA-related genomic features especially RNA methylationBuilt upon the highly efficient GRangesList structure andGenomicFeatures RBioconductor packages Guitar can effi-ciently process millions of genomic features within minutesfor efficient transcriptomic analysis It may automaticallydownload gene information from UCSC genome browserincluding neighborhood DNA regions and allocate theweight of ambiguous features The developed Guitar coordi-nates also provide low level conversion from transcript-basedcoordinates to genome-based coordinates which shouldfacilitate various customized analysis Nevertheless there arestill a number of issues remaining to be addressed

Firstly highly abundant transcripts and ambiguousgenomic features may dominate the result from Guitar pack-age Currently all genomic features and all RNA transcriptshave the same weight this is fine when the genomic featuresare biological features such as RNA methylation sites butit may cause problem when the genomic features are NGSmapped reads such as RNAseq reads because some genesare a lot more highly expressed than the other genes and thereads generated from these genes may dominate the result Amore robust approach may be to use the median abundanceof all RNA transcripts rather the summed density It is alsopossible to develop a boxplot-like visualization scheme in thefuture to indicate the estimated higher and lower bounds forthe distribution of tested features

000

025

050

075

100

Freq

uenc

y

000

025

050

075

100

Freq

uenc

y

5998400UTR CDS 3

998400UTR

Methylation levelHigherUnsureLower

Methylation levelHigherUnsureLower

5998400UTR CDS 3

998400UTRPromoter(1kb) (1kb)

Tail

RNA m5C on mRNA

DNA m5C on mRNA encoding region

Figure 5 Distribution of DNA and RNA m5C residuals WhileC residuals of mRNA are likely to be methylated on 51015840UTR it isopposite on DNA On DNA the start codon and 51015840UTR regions areless likely to bemethylated compared with other regions to allow theinitiation of transcription process

Secondly the developed normalization scheme withinGuitar package only facilitates the comparison of featuredistribution on mRNA transcripts or on lncRNA transcriptsA cross-comparison between mRNA and lncRNA is not sup-ported so far In general it is still very difficult to rigorouslycontrast the distribution of genomic features on mRNA andlncRNA due to their intrinsic difference Compared withmRNA lncRNA usually has lower expression level and lessnumber of exons and is less conserved in sequence

Thirdly because of the heterogeneity of transcriptomeand the discrepancy between transcriptomic and genomicfeatures ambiguity arises and cannot be resolved perfectlyPrevious study of transcriptomic distribution of RNAmethy-lation relies on the longest transcript to eliminate ambiguity

BioMed Research International 7

in feature assignment whichmay require further justificationon why the longest transcript should be the canonical one Amore reasonable solution may be to rely on the most abun-dant isoform transcript however the isoform quantificationis usually a difficult problem and such information may notbe handy and sometimes may be unavailable at all CurrentlyGuitar package adapts two strategies to address this issueFirstly highly ambiguous genomic features or transcripts areexcluded from the analysis Secondly the weight of ambigu-ousmapping is evenly divided among transcripts It should bepossible to develop amore rigorous formulation for examplewith fuzzy system model [31ndash33] to further improve theallocation of weight from ambiguous overlapping

Fourthly during the development process of the Gui-tar package we also realized the Guitar coordinates maypotentially be used for accessing the data quality of MeRIP-Seq datasets The reproducibility uniformity homogeneityand the information entropy within RNA-related sequencingdata (RNA-SeqMeRIP-Seq and BS-Seq) can be convenientlyassessed with the help of Guitar coordinates however thisapplication has not been explored so far andwill be our futurework

Despite the aforementioned limitations the Guitar pack-age is capable of sketching the approximate transcriptomicdistribution of millions of genomic features within minutesin a single line of command We believe that the newlydeveloped Guitar coordinates and the Guitar package haveachieved stable performance on the data tested and shouldeffectively facilitate the analysis of RNAmethylation data andother RNA-related biological features

Additional Points

Project name is Guitar Project homepage is httpbiocon-ductororgpackagesGuitar Operating system(s) is Plat-form Independent Programming language is R License isGPL-2 For manual and guide please see supplementarymaterials

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Jia Meng Yufei Huang and Shao-Wu Zhang conceived theproject Xiaodong Cui Zhen Wei and Jia Meng carried outdevelopment and implementation of the software draftedthe paper and carried out development and implementationof the software Lin Zhang Hui Liu and Lei Sun providedvaluable discussion and helped draft the paper All authorsread and approved the final paper Xiaodong Cui and ZhenWei contribute equally to this work

Acknowledgments

This study is supported by National Natural Science Foun-dation of China (61401370 61473232 91430111 61501466

and 61301220) Fundamental Research Funds for the Cen-tral Universities (2014QNB47 2014QNA84) Jiangsu Scienceand Technology Program (BK20140403) and US NationalInstitutes of Health 5 U54 CA113001 and R01GM113245The authors also appreciate the computational support fromComputational Systems Biology Core University of Texas atSanAntonio funded byNational Institute onMinorityHealthand Health Disparities (G12MD007591) from the NationalInstitutes of Health of USA

References

[1] M LawrenceWHuberH Pages et al ldquoSoftware for computingand annotating genomic rangesrdquo PLoS Computational Biologyvol 9 no 8 Article ID e1003118 2013

[2] N Liu and T Pan ldquoRNA epigeneticsrdquo Translational Researchvol 165 no 1 pp 28ndash35 2015

[3] X Wang B S Zhao I A Roundtree et al ldquoN(6)-methylad-enosine modulates messenger RNA translation efficiencyrdquo Cellvol 161 no 6 pp 1388ndash1399 2015

[4] C R Alarcon H Lee H Goodarzi N Halberg and S FTavazoie ldquoN 6-methyladenosinemarks primarymicroRNAs forprocessingrdquo Nature vol 519 no 7544 pp 482ndash485 2015

[5] N Liu Q Dai G Zheng C He M Parisien and T Pan ldquoN6-methyladenosine-dependent RNA structural switches regulateRNA-protein interactionsrdquo Nature vol 518 no 7540 pp 560ndash564 2015

[6] YWang Y Li J I TothMD Petroski Z Zhang and J C ZhaoldquoN 6-methyladenosinemodification destabilizes developmentalregulators in embryonic stem cellsrdquo Nature Cell Biology vol 16no 2 pp 191ndash198 2014

[7] S Geula S Moshitch-Moshkovitz D Dominissini et al ldquom6AmRNA methylation facilitates resolution of naıve pluripotencytoward differentiationrdquo Science vol 347 no 6225 pp 1002ndash1006 2015

[8] K D Meyer Y Saletore P Zumbo O Elemento C E Masonand S R Jaffrey ldquoComprehensive analysis of mRNA methyla-tion reveals enrichment in 31015840 UTRs and near stop codonsrdquo Cellvol 149 no 7 pp 1635ndash1646 2012

[9] DDominissini SMoshitch-MoshkovitzM Salmon-DivonNAmariglio and G Rechavi ldquoTranscriptome-wide mapping ofN6-methyladenosine by m6A-seq based on immunocapturingand massively parallel sequencingrdquo Nature Protocols vol 8 no1 pp 176ndash189 2013

[10] D Dominissini S Moshitch-Moshkovitz S Schwartz et alldquoTopology of the human and mouse m6A RNA methylomesrevealed by m6A-seqrdquo Nature vol 484 no 7397 pp 201ndash2062012

[11] M Schaefer T Pollex K Hanna and F Lyko ldquoRNA cytosinemethylation analysis by bisulfite sequencingrdquo Nucleic AcidsResearch vol 37 no 2 article e12 2009

[12] J Meng X Cui M K Rao Y Chen and Y Huang ldquoExome-based analysis for RNA epigenome sequencing datardquo Bioinfor-matics vol 29 no 12 pp 1565ndash1567 2013

[13] J Meng Z Lu H Liu et al ldquoA protocol for RNA methylationdifferential analysis with MeRIP-Seq data and exomePeakRBioconductor packagerdquo Methods vol 69 no 3 pp 274ndash2812014

[14] H Liu M A Flores J Meng et al ldquoMeT-DB a database oftranscriptome methylation in mammalian cellsrdquo Nucleic AcidsResearch vol 43 no 1 pp D197ndashD203 2015

8 BioMed Research International

[15] L Liu S-W Zhang Y-C Zhang et al ldquoDecomposition ofRNA methylome reveals co-methylation patterns induced bylatent enzymatic regulators of the epitranscriptomerdquoMolecularBioSystems vol 11 no 1 pp 262ndash274 2015

[16] L Shen N Shao X Liu and E Nestler ldquoNgsplot quickmining and visualization of next-generation sequencing data byintegrating genomic databasesrdquo BMC Genomics vol 15 article284 2014

[17] C Ginestet ldquoggplot2 elegant graphics for data analysisrdquo Journalof the Royal Statistical Society Series A (Statistics in Society) vol174 no 1 pp 245ndash246 2011

[18] Y Fu D Dominissini G Rechavi and C He ldquoGene expressionregulation mediated through reversible m6A RNA methyla-tionrdquo Nature Reviews Genetics vol 15 no 5 pp 293ndash306 2014

[19] K D Meyer and S R Jaffrey ldquoThe dynamic epitranscriptomeN6-methyladenosine and gene expression controlrdquo NatureReviews Molecular Cell Biology vol 15 no 5 pp 313ndash326 2014

[20] J E Squires H R Patel M Nousch et al ldquoWidespreadoccurrence of 5-methylcytosine in human coding and non-coding RNArdquo Nucleic Acids Research vol 40 no 11 pp 5023ndash5033 2012

[21] S Hussain J Aleksic S Blanco S Dietmann and M FryeldquoCharacterizing 5-methylcytosine in the mammalian epitran-scriptomerdquo Genome Biology vol 14 article 215 2013

[22] Y Motorin F Lyko and M Helm ldquo5-Methylcytosine inRNA detection enzymatic formation and biological functionsrdquoNucleic Acids Research vol 38 no 5 Article ID gkp1117 pp1415ndash1430 2009

[23] C Bock ldquoAnalysing and interpreting DNA methylation datardquoNature Reviews Genetics vol 13 no 10 pp 705ndash719 2012

[24] P A Jones ldquoFunctions of DNAmethylation islands start sitesgene bodies and beyondrdquo Nature Reviews Genetics vol 13 no7 pp 484ndash492 2012

[25] V Khoddami and B R Cairns ldquoIdentification of direct tar-gets and modified bases of RNA cytosine methyltransferasesrdquoNature Biotechnology vol 31 no 5 pp 458ndash464 2013

[26] F Krueger Trim Galore A wrapper tool around Cutadaptand FastQC to consistently apply quality and adapter trimmingto FastQ files with some extra functionality for MspI-digestedRRBS-type (Reduced Representation Bisufite-Seq) libraries 2013

[27] F Krueger and S R Andrews ldquoBismark a flexible aligner andmethylation caller for Bisulfite-Seq applicationsrdquo Bioinformat-ics vol 27 no 11 Article ID btr167 pp 1571ndash1572 2011

[28] A Dong J A Yoder X Zhang L Zhou T H Bestor andX Cheng ldquoStructure of human DNMT2 an enigmatic DNAmethyltransferase homolog that displays denaturant-resistantbinding to DNArdquoNucleic Acids Research vol 29 no 2 pp 439ndash448 2001

[29] M Schaefer T Pollex K Hanna et al ldquoRNA methylation byDnmt2 protects transfer RNAs against stress-induced cleavagerdquoGenes amp Development vol 24 no 15 pp 1590ndash1595 2010

[30] M A Machnicka K Milanowska O O Oglou et al ldquoMOD-OMICS a database of RNA modification pathwaysmdash2013updaterdquo Nucleic Acids Research vol 41 no 1 pp D262ndashD2672013

[31] C L P Chen J Wang C-H Wang and L Chen ldquoA newlearning algorithm for a fully connected neuro-fuzzy inferencesystemrdquo IEEE Transactions on Neural Networks and LearningSystems vol 25 no 10 pp 1741ndash1757 2014

[32] J Zhou C L P Chen L Chen and H-X Li ldquoA collaborativefuzzy clustering algorithm in distributed network environ-mentsrdquo IEEE Transactions on Fuzzy Systems vol 22 no 6 pp1443ndash1456 2014

[33] M Gan C L P Chen H-X Li and L Chen ldquoGradientradial basis function based varying-coefficient autoregressivemodel for nonlinear and nonstationary time seriesrdquo IEEE SignalProcessing Letters vol 22 no 7 pp 809ndash812 2015

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 3: Research Article Guitar: An R/Bioconductor Package for ...Research Article Guitar: An R/Bioconductor Package for Gene Annotation Guided Transcriptomic Analysis of RNA-Related Genomic

BioMed Research International 3

lncRNA

Pre-lncRNAExon-1 Exon-2 Exon-3Intron Intron

Binning

Bin-1 Bin-2 Bin-3

Guitar coordinates

Exon-1 Exon-2 Exon-3Intron Intron

Bin-1 Bin-2 Bin-3

Convert to genomiccoordinates

5

5

5

5

3

3

3

3

Standardized coordinateson lncRNA

Splicing

Figure 1 Guitar coordinates This figure illustrates how the Guitar coordinates are generated based on 3 bins on lncRNA transcript Thebins may be split into multiple sections of the transcript represented by GRangesList object which can be conveniently compared with othergenome-based features to reflect their distribution In practice if 70 of a feature overlap 51015840UTR and 30 CDS then it is likely that thefeature overlaps with more Guitar Coordinates (GCs) composed from 51015840UTR than from CDS As a result more GCs from 51015840UTR will reportthe overlapping of this feature and the resulting weight will precisely reflect the 70 and 30 division

is to ensure the generated Guitar coordinates have sufficientresolution from the technology perspective with the dataanalyzed For techniques with single-base resolution thiscutoff can be smaller (eg 10 bp) while for techniqueswith lower resolution it should be larger (eg 100 bp)Secondly an ambiguity filter is implemented to discard geneswith too many isoforms which can cause ambiguity in thefeature assignment stage The implementation of this filter ismainly to reduce memory usage and save computation time(see Supplementary Material Figure S1 and Table S1 for acomparison of different settings in Supplementary Materialavailable online at httpdxdoiorg10115520168367534) Inaddition the filter may be necessary for less-studied specieswith problematic gene annotationThe second filter can filterout a number of genes with higher isoform ambiguity anddoing so will significantly decrease the memory usage forthe constructed Guitar coordinates and save computationtime For this purpose we implemented a simple strategyby counting the number of overlapping transcripts on thegenome that is checking the number of transcripts sharingexons In the default setting ofGuitar package we filtered outtranscripts that overlap with more than 3 other transcriptson the genome The parameter provides reasonable goodresults in the data tested and can be easily customized by theuser After applying the ambiguity filter if a genomic featurestill overlaps with multiple transcripts the ambiguity will befactored in the analysis Specifically if a site is located onboth CDS and 31015840UTR of the same transcript then the Guitarcoordinates associated with those components should haveoverlapped with that genomic feature and those coordinatesare labeled as associatedwith the genomic feature withweight1 indicating the feature is fully associated with this transcriptHowever if a genomic feature is associated with 119899 transcripts

that is it overlaps with 119899 transcripts then the overlappingcoordinates are associated with that genomic feature withweight 1119899 reflecting the ambiguity in association

23 Transcriptomic View of Genomic Features To sketchthe transcriptomic view of genomic features the Guitarcoordinates can be compared with other genomic featuresand the number of overlapped features can then be countedand standardized on mRNA and lncRNA respectively Thedistribution of genomic features onRNAwill then be summa-rized and visualized by ggplot2 package for quality graphics[17] We also provided a convenient function GuitarPlotfor fast visualization of various genomic features in differentformats The general working procedures for Guitar packageare shown in Figure 2

Besides the aforementioned structure there are a fewmore useful features provided by Guitar package

(i) Including neighborhood DNA regions for comparisonThe neighborhood DNA regions (promoter regionand its complementary DNA at the 31015840 end) can beoptionally included in analysis ofGuitar packageThiswill be useful for analyzing genomic features that arerelated to both DNA and RNA such as H3K4me3ChIP-Seq data

(ii) Ambiguous Assignment of the Ambiguous GenomicFeatures Due to ambiguity of heterogeneous tran-scriptome some genomic features overlap with morethan one transcript and thus with ambiguous belong-ings In the Guitar package ambiguous features arealso counted in the analysis with their weights equallydivided between all the overlapping transcripts How-ever we can optionally filter out genomic features

4 BioMed Research International

Prepare transcript info

(i) Downloadedfrom UCSC

(ii) Filtered by size and ambiguity

Build guitar coordinates Guitar plot

(i) mRNA

(ii) lncRNA(iii) Promoter and

tail DNAregions

5998400UTR CDS

and 3998400UTR

(i) Count overlaps withother features

(ii) Assign weights

(iii) Normalizationfor distribution

Figure 2 Workflow of Guitar package The gene annotation can be automatically downloaded from Internet or provided as a TxDb object[1] by the user After filtering transcripts that may not provide useful information Guitar coordinates of different components are builtindependently with which the actual transcriptomic distribution of genomic features can be observed

overlapping with more than a predefined numberof transcripts (Default 5) to completely discard theimpact of these highly ambiguous features

(iii) Resizing the Components in Visualization In practicedifferent components of mRNA that is 51015840UTR CDSand 31015840UTR are of quite different width 51015840UTR isusually much shorter than 31015840UTR and CDS in mouseand human Although these components are treatedindependently when building the Guitar coordinateswe still calculated the average length of each compo-nent across different genes so the true relative widthcan be optionally reflected in the plot generated fromGuitar package however doing so may make the51015840UTR region too small for clear observation

(iv) Connection with ggplot2 for More Complex Graph-ics The Guitar may optionally return intermediateresults which can be reused in the ggplot2 packagefor more complex graphics This is designed for theadvanced users only and not recommended

3 Results

We developed the Guitar RBioconductor package for geneannotation guided transcriptomic analysis of RNA-relatedgenomic features It is currently and publicly available fromBioconductor In this section we show the application ofGuitar package to the analysis of RNA methylation sitesdenoted as genomic features with a few examples Moreinformation is available from the documentation of theGuitar RBioconductor package

31 Case Study 1 RNA N6-Methyladenosine from MeRIP-SeqIn the first example we studyMeRIP-Seq data [10] profiling ofthe transcriptome RNA N6-methyladenosine (m6A) [18 19]sites in human HepG2 cell lines The RNA m6A methylationsites are obtained with exomePeak RBioconductor package[12] Compared with other peak calling algorithms oneadded flexibility of the exomePeak package is to detect onlythe highly methylated sites With a larger ldquoIPinput ratiordquoexomePeak will report only the highly methylated sites thatis a higher proportion of a specific kind of RNA moleculecarries methylation at this site

We implemented exomePeak setting different ldquoIPinputratiordquo (1 2 4 and 8) and we then examined the distributionsof the peaks in different exonic regions at each enrichmentthreshold As shown in Figure 3 the detected m6A sites withmore than 8 times of enrichment are overly present near thestop codonbut highly deficient near transcription starting site(TSS) and on 51015840UTR The different distribution patterns ofthe highly and weakly methylated sites may indicate functionversatility and call for more specialized analysis targetingthe sites with different methylation level In contrast todistribution preference of m6A sites on mRNA it is almostuniformly (or randomly) distributed on lncRNA regardlessof the ldquoIPinput ratiordquo specified

In the aforementioned example we implemented thedefault

32 Case Study 2 RNA 5-Methylcytosine from BS-Seq Inprevious study we confirmed that RNA m6A methylationsites are enriched near stop codon and discovered that highlyenrichedm6Amethylation sites are depleted on 51015840UTRNextwe study 5-methylcytosine (m5C) [20 21] profiled with RNABS-Seq experiment [11 22] Different from the relatively well-studiedDNAm5Cmethylation [23 24] the functions of RNAm5C on mRNA are still largely elusive [20] and its globaldistributions onmRNAand lncRNA are poorly characterizedso far

The bisulfite sequencing data characterizing the RNAm5C methylation profiles in mouse embryo fibroblast [25]was directly obtained from GEO (GSE44359) and thenprocessed with trim galore [26] to remove low-quality readsand adaptor sequences the processed reads are then alignedto mouse mm10 genome assembly with Bismark [27] and424627 Cytosine (C) residuals are called with at least 5 readsaligned To analyze the distribution ofmethylatedC residualswe further divided all the reported C residuals into 3 groupsbased on their methylation level with a binomial test usingthe average methylation probability (165) of the entiretranscriptome at confidence level 005

It can be seen from Figure 4 that the distribution ofm5C on both lncRNA and mRNA has a very strong 51015840 biasthat is more reported C residuals from BS-Seq data are onthe 51015840UTR region The distribution patterns of 3 groups ofresiduals with different methylation level are quite different

BioMed Research International 5

00

05

10

15

20

25Fr

eque

ncy

Fold enrichment

12

48

Fold enrichment

12

48

Distribution on mRNA

lncRNA00

05

10

15

Freq

uenc

y

Distribution on lncRNA

5998400UTR CDS 3

998400UTR

Figure 3 m6A sites on mRNA and lncRNA In mRNA the strongest binding sites (ldquoIPinput ratiordquo larger than 8) are highly enriched nearstop codon side of 31015840UTR and deficient on TSS (transcription starting site) side of 51015840UTR and the phenomena are more prominent than lowlymethylated sites In contrast the m6A sites are almost uniformly distributed on lncRNA despite the ldquoIPinput ratiordquo specified Please notethat in this figure the size of 51015840UTR CDS and 31015840UTR reflects their true width within the transcriptome so the 51015840UTR region is much shortercompared with the other two components This result is based on peaks called on human HepG2 dataset [10]

0

1

2

Freq

uenc

y

Methylation levelHigherUnsureLower

Distribution on mRNA

lncRNA00

05

10

15

20

Freq

uenc

y

Distribution on lncRNA

Methylation levelHigherUnsureLower

5998400UTR CDS 3

998400UTR

Figure 4 Distribution of RNAm5C residuals on mRNA and lncRNA We divided all the Bismark reported C residuals into 3 groups that is9514 residuals with methylation level higher than 165 8600 lower and 406513 residuals cannot be determined with statistical significanceWe can see that for all 3 groups there exists a strong 51015840 bias on the mouse RNA BS-Seq data on both mRNA and lncRNA The highly m5Cmethylated residuals are strongly enriched at 51015840UTR of mRNA

on mRNA with highly methylated Cs (higher than 165with 005 confidence level) prominently enriched on 51015840UTRand near start codon A cytosine residual located on 51015840UTRis around 1 times more likely to be methylated than thaton CDS and 068 times more likely than that on 31015840UTRWhile previous study indicates RNAm5C is enriched on both

51015840UTR and 31015840UTR our analysis with increased resolutionindicates the enrichment is a lot stronger on 51015840UTR than31015840UTR [20] We also notice the pattern is complementary tothe distribution of m6A which is enriched near stop codonand 31015840UTR [8] (see Figure 3) These observations suggestthat RNA m5C and m6A may work in a complementary

6 BioMed Research International

manner On lncRNA however despite the overall 51015840 biasthere is no apparent difference in their distribution patternsrevealed

We in the following compared the relative distributionof highly and lowly m5C methylated C residuals on RNAand DNA The same processing pipe line as described pre-viously is applied to mouse embryo fibroblast whole genomebisulfite sequencing data [25] which profiles the DNA m5Cmethylation rather than that on RNA A total of 3641243cytosine residuals are reported by Bismark and they are thendivided into 3 groups (988335 1966017 and 686891 residualsresp) based onwhether theirmethylation level is significantlyhigher or lower than the average reported DNA methylationlevel (6579) with a binomial test at 005 confidence levelThe distributions of the 3 groups of cytosine residuals arestill profiled by Guitar package with neighborhood DNAregions included and the results are shown in Figure 5 Wecan see that compared with RNA methylation profile wherethe highly methylated residuals are enriched on 51015840UTRthe DNA methylation profile shows a clear oppose patternwith the lowly methylated residuals enriched on 51015840UTR andpeaked near the start codon to enable transcription initiationThis observation indicates DNA and RNA methyltransferasecomplexes probably have quite different sequence specificityeven though some key enzyme genes such as Dnmt2 [28 29]are shared between them [30]

4 Discussion and Conclusion

Currently most biological features are represented withgenome-based coordinates making it rather tedious forcomparing with transcriptomic landmarks We developed aGuitar R package which can be a useful tool for analyzingRNA-related genomic features especially RNA methylationBuilt upon the highly efficient GRangesList structure andGenomicFeatures RBioconductor packages Guitar can effi-ciently process millions of genomic features within minutesfor efficient transcriptomic analysis It may automaticallydownload gene information from UCSC genome browserincluding neighborhood DNA regions and allocate theweight of ambiguous features The developed Guitar coordi-nates also provide low level conversion from transcript-basedcoordinates to genome-based coordinates which shouldfacilitate various customized analysis Nevertheless there arestill a number of issues remaining to be addressed

Firstly highly abundant transcripts and ambiguousgenomic features may dominate the result from Guitar pack-age Currently all genomic features and all RNA transcriptshave the same weight this is fine when the genomic featuresare biological features such as RNA methylation sites butit may cause problem when the genomic features are NGSmapped reads such as RNAseq reads because some genesare a lot more highly expressed than the other genes and thereads generated from these genes may dominate the result Amore robust approach may be to use the median abundanceof all RNA transcripts rather the summed density It is alsopossible to develop a boxplot-like visualization scheme in thefuture to indicate the estimated higher and lower bounds forthe distribution of tested features

000

025

050

075

100

Freq

uenc

y

000

025

050

075

100

Freq

uenc

y

5998400UTR CDS 3

998400UTR

Methylation levelHigherUnsureLower

Methylation levelHigherUnsureLower

5998400UTR CDS 3

998400UTRPromoter(1kb) (1kb)

Tail

RNA m5C on mRNA

DNA m5C on mRNA encoding region

Figure 5 Distribution of DNA and RNA m5C residuals WhileC residuals of mRNA are likely to be methylated on 51015840UTR it isopposite on DNA On DNA the start codon and 51015840UTR regions areless likely to bemethylated compared with other regions to allow theinitiation of transcription process

Secondly the developed normalization scheme withinGuitar package only facilitates the comparison of featuredistribution on mRNA transcripts or on lncRNA transcriptsA cross-comparison between mRNA and lncRNA is not sup-ported so far In general it is still very difficult to rigorouslycontrast the distribution of genomic features on mRNA andlncRNA due to their intrinsic difference Compared withmRNA lncRNA usually has lower expression level and lessnumber of exons and is less conserved in sequence

Thirdly because of the heterogeneity of transcriptomeand the discrepancy between transcriptomic and genomicfeatures ambiguity arises and cannot be resolved perfectlyPrevious study of transcriptomic distribution of RNAmethy-lation relies on the longest transcript to eliminate ambiguity

BioMed Research International 7

in feature assignment whichmay require further justificationon why the longest transcript should be the canonical one Amore reasonable solution may be to rely on the most abun-dant isoform transcript however the isoform quantificationis usually a difficult problem and such information may notbe handy and sometimes may be unavailable at all CurrentlyGuitar package adapts two strategies to address this issueFirstly highly ambiguous genomic features or transcripts areexcluded from the analysis Secondly the weight of ambigu-ousmapping is evenly divided among transcripts It should bepossible to develop amore rigorous formulation for examplewith fuzzy system model [31ndash33] to further improve theallocation of weight from ambiguous overlapping

Fourthly during the development process of the Gui-tar package we also realized the Guitar coordinates maypotentially be used for accessing the data quality of MeRIP-Seq datasets The reproducibility uniformity homogeneityand the information entropy within RNA-related sequencingdata (RNA-SeqMeRIP-Seq and BS-Seq) can be convenientlyassessed with the help of Guitar coordinates however thisapplication has not been explored so far andwill be our futurework

Despite the aforementioned limitations the Guitar pack-age is capable of sketching the approximate transcriptomicdistribution of millions of genomic features within minutesin a single line of command We believe that the newlydeveloped Guitar coordinates and the Guitar package haveachieved stable performance on the data tested and shouldeffectively facilitate the analysis of RNAmethylation data andother RNA-related biological features

Additional Points

Project name is Guitar Project homepage is httpbiocon-ductororgpackagesGuitar Operating system(s) is Plat-form Independent Programming language is R License isGPL-2 For manual and guide please see supplementarymaterials

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Jia Meng Yufei Huang and Shao-Wu Zhang conceived theproject Xiaodong Cui Zhen Wei and Jia Meng carried outdevelopment and implementation of the software draftedthe paper and carried out development and implementationof the software Lin Zhang Hui Liu and Lei Sun providedvaluable discussion and helped draft the paper All authorsread and approved the final paper Xiaodong Cui and ZhenWei contribute equally to this work

Acknowledgments

This study is supported by National Natural Science Foun-dation of China (61401370 61473232 91430111 61501466

and 61301220) Fundamental Research Funds for the Cen-tral Universities (2014QNB47 2014QNA84) Jiangsu Scienceand Technology Program (BK20140403) and US NationalInstitutes of Health 5 U54 CA113001 and R01GM113245The authors also appreciate the computational support fromComputational Systems Biology Core University of Texas atSanAntonio funded byNational Institute onMinorityHealthand Health Disparities (G12MD007591) from the NationalInstitutes of Health of USA

References

[1] M LawrenceWHuberH Pages et al ldquoSoftware for computingand annotating genomic rangesrdquo PLoS Computational Biologyvol 9 no 8 Article ID e1003118 2013

[2] N Liu and T Pan ldquoRNA epigeneticsrdquo Translational Researchvol 165 no 1 pp 28ndash35 2015

[3] X Wang B S Zhao I A Roundtree et al ldquoN(6)-methylad-enosine modulates messenger RNA translation efficiencyrdquo Cellvol 161 no 6 pp 1388ndash1399 2015

[4] C R Alarcon H Lee H Goodarzi N Halberg and S FTavazoie ldquoN 6-methyladenosinemarks primarymicroRNAs forprocessingrdquo Nature vol 519 no 7544 pp 482ndash485 2015

[5] N Liu Q Dai G Zheng C He M Parisien and T Pan ldquoN6-methyladenosine-dependent RNA structural switches regulateRNA-protein interactionsrdquo Nature vol 518 no 7540 pp 560ndash564 2015

[6] YWang Y Li J I TothMD Petroski Z Zhang and J C ZhaoldquoN 6-methyladenosinemodification destabilizes developmentalregulators in embryonic stem cellsrdquo Nature Cell Biology vol 16no 2 pp 191ndash198 2014

[7] S Geula S Moshitch-Moshkovitz D Dominissini et al ldquom6AmRNA methylation facilitates resolution of naıve pluripotencytoward differentiationrdquo Science vol 347 no 6225 pp 1002ndash1006 2015

[8] K D Meyer Y Saletore P Zumbo O Elemento C E Masonand S R Jaffrey ldquoComprehensive analysis of mRNA methyla-tion reveals enrichment in 31015840 UTRs and near stop codonsrdquo Cellvol 149 no 7 pp 1635ndash1646 2012

[9] DDominissini SMoshitch-MoshkovitzM Salmon-DivonNAmariglio and G Rechavi ldquoTranscriptome-wide mapping ofN6-methyladenosine by m6A-seq based on immunocapturingand massively parallel sequencingrdquo Nature Protocols vol 8 no1 pp 176ndash189 2013

[10] D Dominissini S Moshitch-Moshkovitz S Schwartz et alldquoTopology of the human and mouse m6A RNA methylomesrevealed by m6A-seqrdquo Nature vol 484 no 7397 pp 201ndash2062012

[11] M Schaefer T Pollex K Hanna and F Lyko ldquoRNA cytosinemethylation analysis by bisulfite sequencingrdquo Nucleic AcidsResearch vol 37 no 2 article e12 2009

[12] J Meng X Cui M K Rao Y Chen and Y Huang ldquoExome-based analysis for RNA epigenome sequencing datardquo Bioinfor-matics vol 29 no 12 pp 1565ndash1567 2013

[13] J Meng Z Lu H Liu et al ldquoA protocol for RNA methylationdifferential analysis with MeRIP-Seq data and exomePeakRBioconductor packagerdquo Methods vol 69 no 3 pp 274ndash2812014

[14] H Liu M A Flores J Meng et al ldquoMeT-DB a database oftranscriptome methylation in mammalian cellsrdquo Nucleic AcidsResearch vol 43 no 1 pp D197ndashD203 2015

8 BioMed Research International

[15] L Liu S-W Zhang Y-C Zhang et al ldquoDecomposition ofRNA methylome reveals co-methylation patterns induced bylatent enzymatic regulators of the epitranscriptomerdquoMolecularBioSystems vol 11 no 1 pp 262ndash274 2015

[16] L Shen N Shao X Liu and E Nestler ldquoNgsplot quickmining and visualization of next-generation sequencing data byintegrating genomic databasesrdquo BMC Genomics vol 15 article284 2014

[17] C Ginestet ldquoggplot2 elegant graphics for data analysisrdquo Journalof the Royal Statistical Society Series A (Statistics in Society) vol174 no 1 pp 245ndash246 2011

[18] Y Fu D Dominissini G Rechavi and C He ldquoGene expressionregulation mediated through reversible m6A RNA methyla-tionrdquo Nature Reviews Genetics vol 15 no 5 pp 293ndash306 2014

[19] K D Meyer and S R Jaffrey ldquoThe dynamic epitranscriptomeN6-methyladenosine and gene expression controlrdquo NatureReviews Molecular Cell Biology vol 15 no 5 pp 313ndash326 2014

[20] J E Squires H R Patel M Nousch et al ldquoWidespreadoccurrence of 5-methylcytosine in human coding and non-coding RNArdquo Nucleic Acids Research vol 40 no 11 pp 5023ndash5033 2012

[21] S Hussain J Aleksic S Blanco S Dietmann and M FryeldquoCharacterizing 5-methylcytosine in the mammalian epitran-scriptomerdquo Genome Biology vol 14 article 215 2013

[22] Y Motorin F Lyko and M Helm ldquo5-Methylcytosine inRNA detection enzymatic formation and biological functionsrdquoNucleic Acids Research vol 38 no 5 Article ID gkp1117 pp1415ndash1430 2009

[23] C Bock ldquoAnalysing and interpreting DNA methylation datardquoNature Reviews Genetics vol 13 no 10 pp 705ndash719 2012

[24] P A Jones ldquoFunctions of DNAmethylation islands start sitesgene bodies and beyondrdquo Nature Reviews Genetics vol 13 no7 pp 484ndash492 2012

[25] V Khoddami and B R Cairns ldquoIdentification of direct tar-gets and modified bases of RNA cytosine methyltransferasesrdquoNature Biotechnology vol 31 no 5 pp 458ndash464 2013

[26] F Krueger Trim Galore A wrapper tool around Cutadaptand FastQC to consistently apply quality and adapter trimmingto FastQ files with some extra functionality for MspI-digestedRRBS-type (Reduced Representation Bisufite-Seq) libraries 2013

[27] F Krueger and S R Andrews ldquoBismark a flexible aligner andmethylation caller for Bisulfite-Seq applicationsrdquo Bioinformat-ics vol 27 no 11 Article ID btr167 pp 1571ndash1572 2011

[28] A Dong J A Yoder X Zhang L Zhou T H Bestor andX Cheng ldquoStructure of human DNMT2 an enigmatic DNAmethyltransferase homolog that displays denaturant-resistantbinding to DNArdquoNucleic Acids Research vol 29 no 2 pp 439ndash448 2001

[29] M Schaefer T Pollex K Hanna et al ldquoRNA methylation byDnmt2 protects transfer RNAs against stress-induced cleavagerdquoGenes amp Development vol 24 no 15 pp 1590ndash1595 2010

[30] M A Machnicka K Milanowska O O Oglou et al ldquoMOD-OMICS a database of RNA modification pathwaysmdash2013updaterdquo Nucleic Acids Research vol 41 no 1 pp D262ndashD2672013

[31] C L P Chen J Wang C-H Wang and L Chen ldquoA newlearning algorithm for a fully connected neuro-fuzzy inferencesystemrdquo IEEE Transactions on Neural Networks and LearningSystems vol 25 no 10 pp 1741ndash1757 2014

[32] J Zhou C L P Chen L Chen and H-X Li ldquoA collaborativefuzzy clustering algorithm in distributed network environ-mentsrdquo IEEE Transactions on Fuzzy Systems vol 22 no 6 pp1443ndash1456 2014

[33] M Gan C L P Chen H-X Li and L Chen ldquoGradientradial basis function based varying-coefficient autoregressivemodel for nonlinear and nonstationary time seriesrdquo IEEE SignalProcessing Letters vol 22 no 7 pp 809ndash812 2015

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 4: Research Article Guitar: An R/Bioconductor Package for ...Research Article Guitar: An R/Bioconductor Package for Gene Annotation Guided Transcriptomic Analysis of RNA-Related Genomic

4 BioMed Research International

Prepare transcript info

(i) Downloadedfrom UCSC

(ii) Filtered by size and ambiguity

Build guitar coordinates Guitar plot

(i) mRNA

(ii) lncRNA(iii) Promoter and

tail DNAregions

5998400UTR CDS

and 3998400UTR

(i) Count overlaps withother features

(ii) Assign weights

(iii) Normalizationfor distribution

Figure 2 Workflow of Guitar package The gene annotation can be automatically downloaded from Internet or provided as a TxDb object[1] by the user After filtering transcripts that may not provide useful information Guitar coordinates of different components are builtindependently with which the actual transcriptomic distribution of genomic features can be observed

overlapping with more than a predefined numberof transcripts (Default 5) to completely discard theimpact of these highly ambiguous features

(iii) Resizing the Components in Visualization In practicedifferent components of mRNA that is 51015840UTR CDSand 31015840UTR are of quite different width 51015840UTR isusually much shorter than 31015840UTR and CDS in mouseand human Although these components are treatedindependently when building the Guitar coordinateswe still calculated the average length of each compo-nent across different genes so the true relative widthcan be optionally reflected in the plot generated fromGuitar package however doing so may make the51015840UTR region too small for clear observation

(iv) Connection with ggplot2 for More Complex Graph-ics The Guitar may optionally return intermediateresults which can be reused in the ggplot2 packagefor more complex graphics This is designed for theadvanced users only and not recommended

3 Results

We developed the Guitar RBioconductor package for geneannotation guided transcriptomic analysis of RNA-relatedgenomic features It is currently and publicly available fromBioconductor In this section we show the application ofGuitar package to the analysis of RNA methylation sitesdenoted as genomic features with a few examples Moreinformation is available from the documentation of theGuitar RBioconductor package

31 Case Study 1 RNA N6-Methyladenosine from MeRIP-SeqIn the first example we studyMeRIP-Seq data [10] profiling ofthe transcriptome RNA N6-methyladenosine (m6A) [18 19]sites in human HepG2 cell lines The RNA m6A methylationsites are obtained with exomePeak RBioconductor package[12] Compared with other peak calling algorithms oneadded flexibility of the exomePeak package is to detect onlythe highly methylated sites With a larger ldquoIPinput ratiordquoexomePeak will report only the highly methylated sites thatis a higher proportion of a specific kind of RNA moleculecarries methylation at this site

We implemented exomePeak setting different ldquoIPinputratiordquo (1 2 4 and 8) and we then examined the distributionsof the peaks in different exonic regions at each enrichmentthreshold As shown in Figure 3 the detected m6A sites withmore than 8 times of enrichment are overly present near thestop codonbut highly deficient near transcription starting site(TSS) and on 51015840UTR The different distribution patterns ofthe highly and weakly methylated sites may indicate functionversatility and call for more specialized analysis targetingthe sites with different methylation level In contrast todistribution preference of m6A sites on mRNA it is almostuniformly (or randomly) distributed on lncRNA regardlessof the ldquoIPinput ratiordquo specified

In the aforementioned example we implemented thedefault

32 Case Study 2 RNA 5-Methylcytosine from BS-Seq Inprevious study we confirmed that RNA m6A methylationsites are enriched near stop codon and discovered that highlyenrichedm6Amethylation sites are depleted on 51015840UTRNextwe study 5-methylcytosine (m5C) [20 21] profiled with RNABS-Seq experiment [11 22] Different from the relatively well-studiedDNAm5Cmethylation [23 24] the functions of RNAm5C on mRNA are still largely elusive [20] and its globaldistributions onmRNAand lncRNA are poorly characterizedso far

The bisulfite sequencing data characterizing the RNAm5C methylation profiles in mouse embryo fibroblast [25]was directly obtained from GEO (GSE44359) and thenprocessed with trim galore [26] to remove low-quality readsand adaptor sequences the processed reads are then alignedto mouse mm10 genome assembly with Bismark [27] and424627 Cytosine (C) residuals are called with at least 5 readsaligned To analyze the distribution ofmethylatedC residualswe further divided all the reported C residuals into 3 groupsbased on their methylation level with a binomial test usingthe average methylation probability (165) of the entiretranscriptome at confidence level 005

It can be seen from Figure 4 that the distribution ofm5C on both lncRNA and mRNA has a very strong 51015840 biasthat is more reported C residuals from BS-Seq data are onthe 51015840UTR region The distribution patterns of 3 groups ofresiduals with different methylation level are quite different

BioMed Research International 5

00

05

10

15

20

25Fr

eque

ncy

Fold enrichment

12

48

Fold enrichment

12

48

Distribution on mRNA

lncRNA00

05

10

15

Freq

uenc

y

Distribution on lncRNA

5998400UTR CDS 3

998400UTR

Figure 3 m6A sites on mRNA and lncRNA In mRNA the strongest binding sites (ldquoIPinput ratiordquo larger than 8) are highly enriched nearstop codon side of 31015840UTR and deficient on TSS (transcription starting site) side of 51015840UTR and the phenomena are more prominent than lowlymethylated sites In contrast the m6A sites are almost uniformly distributed on lncRNA despite the ldquoIPinput ratiordquo specified Please notethat in this figure the size of 51015840UTR CDS and 31015840UTR reflects their true width within the transcriptome so the 51015840UTR region is much shortercompared with the other two components This result is based on peaks called on human HepG2 dataset [10]

0

1

2

Freq

uenc

y

Methylation levelHigherUnsureLower

Distribution on mRNA

lncRNA00

05

10

15

20

Freq

uenc

y

Distribution on lncRNA

Methylation levelHigherUnsureLower

5998400UTR CDS 3

998400UTR

Figure 4 Distribution of RNAm5C residuals on mRNA and lncRNA We divided all the Bismark reported C residuals into 3 groups that is9514 residuals with methylation level higher than 165 8600 lower and 406513 residuals cannot be determined with statistical significanceWe can see that for all 3 groups there exists a strong 51015840 bias on the mouse RNA BS-Seq data on both mRNA and lncRNA The highly m5Cmethylated residuals are strongly enriched at 51015840UTR of mRNA

on mRNA with highly methylated Cs (higher than 165with 005 confidence level) prominently enriched on 51015840UTRand near start codon A cytosine residual located on 51015840UTRis around 1 times more likely to be methylated than thaton CDS and 068 times more likely than that on 31015840UTRWhile previous study indicates RNAm5C is enriched on both

51015840UTR and 31015840UTR our analysis with increased resolutionindicates the enrichment is a lot stronger on 51015840UTR than31015840UTR [20] We also notice the pattern is complementary tothe distribution of m6A which is enriched near stop codonand 31015840UTR [8] (see Figure 3) These observations suggestthat RNA m5C and m6A may work in a complementary

6 BioMed Research International

manner On lncRNA however despite the overall 51015840 biasthere is no apparent difference in their distribution patternsrevealed

We in the following compared the relative distributionof highly and lowly m5C methylated C residuals on RNAand DNA The same processing pipe line as described pre-viously is applied to mouse embryo fibroblast whole genomebisulfite sequencing data [25] which profiles the DNA m5Cmethylation rather than that on RNA A total of 3641243cytosine residuals are reported by Bismark and they are thendivided into 3 groups (988335 1966017 and 686891 residualsresp) based onwhether theirmethylation level is significantlyhigher or lower than the average reported DNA methylationlevel (6579) with a binomial test at 005 confidence levelThe distributions of the 3 groups of cytosine residuals arestill profiled by Guitar package with neighborhood DNAregions included and the results are shown in Figure 5 Wecan see that compared with RNA methylation profile wherethe highly methylated residuals are enriched on 51015840UTRthe DNA methylation profile shows a clear oppose patternwith the lowly methylated residuals enriched on 51015840UTR andpeaked near the start codon to enable transcription initiationThis observation indicates DNA and RNA methyltransferasecomplexes probably have quite different sequence specificityeven though some key enzyme genes such as Dnmt2 [28 29]are shared between them [30]

4 Discussion and Conclusion

Currently most biological features are represented withgenome-based coordinates making it rather tedious forcomparing with transcriptomic landmarks We developed aGuitar R package which can be a useful tool for analyzingRNA-related genomic features especially RNA methylationBuilt upon the highly efficient GRangesList structure andGenomicFeatures RBioconductor packages Guitar can effi-ciently process millions of genomic features within minutesfor efficient transcriptomic analysis It may automaticallydownload gene information from UCSC genome browserincluding neighborhood DNA regions and allocate theweight of ambiguous features The developed Guitar coordi-nates also provide low level conversion from transcript-basedcoordinates to genome-based coordinates which shouldfacilitate various customized analysis Nevertheless there arestill a number of issues remaining to be addressed

Firstly highly abundant transcripts and ambiguousgenomic features may dominate the result from Guitar pack-age Currently all genomic features and all RNA transcriptshave the same weight this is fine when the genomic featuresare biological features such as RNA methylation sites butit may cause problem when the genomic features are NGSmapped reads such as RNAseq reads because some genesare a lot more highly expressed than the other genes and thereads generated from these genes may dominate the result Amore robust approach may be to use the median abundanceof all RNA transcripts rather the summed density It is alsopossible to develop a boxplot-like visualization scheme in thefuture to indicate the estimated higher and lower bounds forthe distribution of tested features

000

025

050

075

100

Freq

uenc

y

000

025

050

075

100

Freq

uenc

y

5998400UTR CDS 3

998400UTR

Methylation levelHigherUnsureLower

Methylation levelHigherUnsureLower

5998400UTR CDS 3

998400UTRPromoter(1kb) (1kb)

Tail

RNA m5C on mRNA

DNA m5C on mRNA encoding region

Figure 5 Distribution of DNA and RNA m5C residuals WhileC residuals of mRNA are likely to be methylated on 51015840UTR it isopposite on DNA On DNA the start codon and 51015840UTR regions areless likely to bemethylated compared with other regions to allow theinitiation of transcription process

Secondly the developed normalization scheme withinGuitar package only facilitates the comparison of featuredistribution on mRNA transcripts or on lncRNA transcriptsA cross-comparison between mRNA and lncRNA is not sup-ported so far In general it is still very difficult to rigorouslycontrast the distribution of genomic features on mRNA andlncRNA due to their intrinsic difference Compared withmRNA lncRNA usually has lower expression level and lessnumber of exons and is less conserved in sequence

Thirdly because of the heterogeneity of transcriptomeand the discrepancy between transcriptomic and genomicfeatures ambiguity arises and cannot be resolved perfectlyPrevious study of transcriptomic distribution of RNAmethy-lation relies on the longest transcript to eliminate ambiguity

BioMed Research International 7

in feature assignment whichmay require further justificationon why the longest transcript should be the canonical one Amore reasonable solution may be to rely on the most abun-dant isoform transcript however the isoform quantificationis usually a difficult problem and such information may notbe handy and sometimes may be unavailable at all CurrentlyGuitar package adapts two strategies to address this issueFirstly highly ambiguous genomic features or transcripts areexcluded from the analysis Secondly the weight of ambigu-ousmapping is evenly divided among transcripts It should bepossible to develop amore rigorous formulation for examplewith fuzzy system model [31ndash33] to further improve theallocation of weight from ambiguous overlapping

Fourthly during the development process of the Gui-tar package we also realized the Guitar coordinates maypotentially be used for accessing the data quality of MeRIP-Seq datasets The reproducibility uniformity homogeneityand the information entropy within RNA-related sequencingdata (RNA-SeqMeRIP-Seq and BS-Seq) can be convenientlyassessed with the help of Guitar coordinates however thisapplication has not been explored so far andwill be our futurework

Despite the aforementioned limitations the Guitar pack-age is capable of sketching the approximate transcriptomicdistribution of millions of genomic features within minutesin a single line of command We believe that the newlydeveloped Guitar coordinates and the Guitar package haveachieved stable performance on the data tested and shouldeffectively facilitate the analysis of RNAmethylation data andother RNA-related biological features

Additional Points

Project name is Guitar Project homepage is httpbiocon-ductororgpackagesGuitar Operating system(s) is Plat-form Independent Programming language is R License isGPL-2 For manual and guide please see supplementarymaterials

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Jia Meng Yufei Huang and Shao-Wu Zhang conceived theproject Xiaodong Cui Zhen Wei and Jia Meng carried outdevelopment and implementation of the software draftedthe paper and carried out development and implementationof the software Lin Zhang Hui Liu and Lei Sun providedvaluable discussion and helped draft the paper All authorsread and approved the final paper Xiaodong Cui and ZhenWei contribute equally to this work

Acknowledgments

This study is supported by National Natural Science Foun-dation of China (61401370 61473232 91430111 61501466

and 61301220) Fundamental Research Funds for the Cen-tral Universities (2014QNB47 2014QNA84) Jiangsu Scienceand Technology Program (BK20140403) and US NationalInstitutes of Health 5 U54 CA113001 and R01GM113245The authors also appreciate the computational support fromComputational Systems Biology Core University of Texas atSanAntonio funded byNational Institute onMinorityHealthand Health Disparities (G12MD007591) from the NationalInstitutes of Health of USA

References

[1] M LawrenceWHuberH Pages et al ldquoSoftware for computingand annotating genomic rangesrdquo PLoS Computational Biologyvol 9 no 8 Article ID e1003118 2013

[2] N Liu and T Pan ldquoRNA epigeneticsrdquo Translational Researchvol 165 no 1 pp 28ndash35 2015

[3] X Wang B S Zhao I A Roundtree et al ldquoN(6)-methylad-enosine modulates messenger RNA translation efficiencyrdquo Cellvol 161 no 6 pp 1388ndash1399 2015

[4] C R Alarcon H Lee H Goodarzi N Halberg and S FTavazoie ldquoN 6-methyladenosinemarks primarymicroRNAs forprocessingrdquo Nature vol 519 no 7544 pp 482ndash485 2015

[5] N Liu Q Dai G Zheng C He M Parisien and T Pan ldquoN6-methyladenosine-dependent RNA structural switches regulateRNA-protein interactionsrdquo Nature vol 518 no 7540 pp 560ndash564 2015

[6] YWang Y Li J I TothMD Petroski Z Zhang and J C ZhaoldquoN 6-methyladenosinemodification destabilizes developmentalregulators in embryonic stem cellsrdquo Nature Cell Biology vol 16no 2 pp 191ndash198 2014

[7] S Geula S Moshitch-Moshkovitz D Dominissini et al ldquom6AmRNA methylation facilitates resolution of naıve pluripotencytoward differentiationrdquo Science vol 347 no 6225 pp 1002ndash1006 2015

[8] K D Meyer Y Saletore P Zumbo O Elemento C E Masonand S R Jaffrey ldquoComprehensive analysis of mRNA methyla-tion reveals enrichment in 31015840 UTRs and near stop codonsrdquo Cellvol 149 no 7 pp 1635ndash1646 2012

[9] DDominissini SMoshitch-MoshkovitzM Salmon-DivonNAmariglio and G Rechavi ldquoTranscriptome-wide mapping ofN6-methyladenosine by m6A-seq based on immunocapturingand massively parallel sequencingrdquo Nature Protocols vol 8 no1 pp 176ndash189 2013

[10] D Dominissini S Moshitch-Moshkovitz S Schwartz et alldquoTopology of the human and mouse m6A RNA methylomesrevealed by m6A-seqrdquo Nature vol 484 no 7397 pp 201ndash2062012

[11] M Schaefer T Pollex K Hanna and F Lyko ldquoRNA cytosinemethylation analysis by bisulfite sequencingrdquo Nucleic AcidsResearch vol 37 no 2 article e12 2009

[12] J Meng X Cui M K Rao Y Chen and Y Huang ldquoExome-based analysis for RNA epigenome sequencing datardquo Bioinfor-matics vol 29 no 12 pp 1565ndash1567 2013

[13] J Meng Z Lu H Liu et al ldquoA protocol for RNA methylationdifferential analysis with MeRIP-Seq data and exomePeakRBioconductor packagerdquo Methods vol 69 no 3 pp 274ndash2812014

[14] H Liu M A Flores J Meng et al ldquoMeT-DB a database oftranscriptome methylation in mammalian cellsrdquo Nucleic AcidsResearch vol 43 no 1 pp D197ndashD203 2015

8 BioMed Research International

[15] L Liu S-W Zhang Y-C Zhang et al ldquoDecomposition ofRNA methylome reveals co-methylation patterns induced bylatent enzymatic regulators of the epitranscriptomerdquoMolecularBioSystems vol 11 no 1 pp 262ndash274 2015

[16] L Shen N Shao X Liu and E Nestler ldquoNgsplot quickmining and visualization of next-generation sequencing data byintegrating genomic databasesrdquo BMC Genomics vol 15 article284 2014

[17] C Ginestet ldquoggplot2 elegant graphics for data analysisrdquo Journalof the Royal Statistical Society Series A (Statistics in Society) vol174 no 1 pp 245ndash246 2011

[18] Y Fu D Dominissini G Rechavi and C He ldquoGene expressionregulation mediated through reversible m6A RNA methyla-tionrdquo Nature Reviews Genetics vol 15 no 5 pp 293ndash306 2014

[19] K D Meyer and S R Jaffrey ldquoThe dynamic epitranscriptomeN6-methyladenosine and gene expression controlrdquo NatureReviews Molecular Cell Biology vol 15 no 5 pp 313ndash326 2014

[20] J E Squires H R Patel M Nousch et al ldquoWidespreadoccurrence of 5-methylcytosine in human coding and non-coding RNArdquo Nucleic Acids Research vol 40 no 11 pp 5023ndash5033 2012

[21] S Hussain J Aleksic S Blanco S Dietmann and M FryeldquoCharacterizing 5-methylcytosine in the mammalian epitran-scriptomerdquo Genome Biology vol 14 article 215 2013

[22] Y Motorin F Lyko and M Helm ldquo5-Methylcytosine inRNA detection enzymatic formation and biological functionsrdquoNucleic Acids Research vol 38 no 5 Article ID gkp1117 pp1415ndash1430 2009

[23] C Bock ldquoAnalysing and interpreting DNA methylation datardquoNature Reviews Genetics vol 13 no 10 pp 705ndash719 2012

[24] P A Jones ldquoFunctions of DNAmethylation islands start sitesgene bodies and beyondrdquo Nature Reviews Genetics vol 13 no7 pp 484ndash492 2012

[25] V Khoddami and B R Cairns ldquoIdentification of direct tar-gets and modified bases of RNA cytosine methyltransferasesrdquoNature Biotechnology vol 31 no 5 pp 458ndash464 2013

[26] F Krueger Trim Galore A wrapper tool around Cutadaptand FastQC to consistently apply quality and adapter trimmingto FastQ files with some extra functionality for MspI-digestedRRBS-type (Reduced Representation Bisufite-Seq) libraries 2013

[27] F Krueger and S R Andrews ldquoBismark a flexible aligner andmethylation caller for Bisulfite-Seq applicationsrdquo Bioinformat-ics vol 27 no 11 Article ID btr167 pp 1571ndash1572 2011

[28] A Dong J A Yoder X Zhang L Zhou T H Bestor andX Cheng ldquoStructure of human DNMT2 an enigmatic DNAmethyltransferase homolog that displays denaturant-resistantbinding to DNArdquoNucleic Acids Research vol 29 no 2 pp 439ndash448 2001

[29] M Schaefer T Pollex K Hanna et al ldquoRNA methylation byDnmt2 protects transfer RNAs against stress-induced cleavagerdquoGenes amp Development vol 24 no 15 pp 1590ndash1595 2010

[30] M A Machnicka K Milanowska O O Oglou et al ldquoMOD-OMICS a database of RNA modification pathwaysmdash2013updaterdquo Nucleic Acids Research vol 41 no 1 pp D262ndashD2672013

[31] C L P Chen J Wang C-H Wang and L Chen ldquoA newlearning algorithm for a fully connected neuro-fuzzy inferencesystemrdquo IEEE Transactions on Neural Networks and LearningSystems vol 25 no 10 pp 1741ndash1757 2014

[32] J Zhou C L P Chen L Chen and H-X Li ldquoA collaborativefuzzy clustering algorithm in distributed network environ-mentsrdquo IEEE Transactions on Fuzzy Systems vol 22 no 6 pp1443ndash1456 2014

[33] M Gan C L P Chen H-X Li and L Chen ldquoGradientradial basis function based varying-coefficient autoregressivemodel for nonlinear and nonstationary time seriesrdquo IEEE SignalProcessing Letters vol 22 no 7 pp 809ndash812 2015

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 5: Research Article Guitar: An R/Bioconductor Package for ...Research Article Guitar: An R/Bioconductor Package for Gene Annotation Guided Transcriptomic Analysis of RNA-Related Genomic

BioMed Research International 5

00

05

10

15

20

25Fr

eque

ncy

Fold enrichment

12

48

Fold enrichment

12

48

Distribution on mRNA

lncRNA00

05

10

15

Freq

uenc

y

Distribution on lncRNA

5998400UTR CDS 3

998400UTR

Figure 3 m6A sites on mRNA and lncRNA In mRNA the strongest binding sites (ldquoIPinput ratiordquo larger than 8) are highly enriched nearstop codon side of 31015840UTR and deficient on TSS (transcription starting site) side of 51015840UTR and the phenomena are more prominent than lowlymethylated sites In contrast the m6A sites are almost uniformly distributed on lncRNA despite the ldquoIPinput ratiordquo specified Please notethat in this figure the size of 51015840UTR CDS and 31015840UTR reflects their true width within the transcriptome so the 51015840UTR region is much shortercompared with the other two components This result is based on peaks called on human HepG2 dataset [10]

0

1

2

Freq

uenc

y

Methylation levelHigherUnsureLower

Distribution on mRNA

lncRNA00

05

10

15

20

Freq

uenc

y

Distribution on lncRNA

Methylation levelHigherUnsureLower

5998400UTR CDS 3

998400UTR

Figure 4 Distribution of RNAm5C residuals on mRNA and lncRNA We divided all the Bismark reported C residuals into 3 groups that is9514 residuals with methylation level higher than 165 8600 lower and 406513 residuals cannot be determined with statistical significanceWe can see that for all 3 groups there exists a strong 51015840 bias on the mouse RNA BS-Seq data on both mRNA and lncRNA The highly m5Cmethylated residuals are strongly enriched at 51015840UTR of mRNA

on mRNA with highly methylated Cs (higher than 165with 005 confidence level) prominently enriched on 51015840UTRand near start codon A cytosine residual located on 51015840UTRis around 1 times more likely to be methylated than thaton CDS and 068 times more likely than that on 31015840UTRWhile previous study indicates RNAm5C is enriched on both

51015840UTR and 31015840UTR our analysis with increased resolutionindicates the enrichment is a lot stronger on 51015840UTR than31015840UTR [20] We also notice the pattern is complementary tothe distribution of m6A which is enriched near stop codonand 31015840UTR [8] (see Figure 3) These observations suggestthat RNA m5C and m6A may work in a complementary

6 BioMed Research International

manner On lncRNA however despite the overall 51015840 biasthere is no apparent difference in their distribution patternsrevealed

We in the following compared the relative distributionof highly and lowly m5C methylated C residuals on RNAand DNA The same processing pipe line as described pre-viously is applied to mouse embryo fibroblast whole genomebisulfite sequencing data [25] which profiles the DNA m5Cmethylation rather than that on RNA A total of 3641243cytosine residuals are reported by Bismark and they are thendivided into 3 groups (988335 1966017 and 686891 residualsresp) based onwhether theirmethylation level is significantlyhigher or lower than the average reported DNA methylationlevel (6579) with a binomial test at 005 confidence levelThe distributions of the 3 groups of cytosine residuals arestill profiled by Guitar package with neighborhood DNAregions included and the results are shown in Figure 5 Wecan see that compared with RNA methylation profile wherethe highly methylated residuals are enriched on 51015840UTRthe DNA methylation profile shows a clear oppose patternwith the lowly methylated residuals enriched on 51015840UTR andpeaked near the start codon to enable transcription initiationThis observation indicates DNA and RNA methyltransferasecomplexes probably have quite different sequence specificityeven though some key enzyme genes such as Dnmt2 [28 29]are shared between them [30]

4 Discussion and Conclusion

Currently most biological features are represented withgenome-based coordinates making it rather tedious forcomparing with transcriptomic landmarks We developed aGuitar R package which can be a useful tool for analyzingRNA-related genomic features especially RNA methylationBuilt upon the highly efficient GRangesList structure andGenomicFeatures RBioconductor packages Guitar can effi-ciently process millions of genomic features within minutesfor efficient transcriptomic analysis It may automaticallydownload gene information from UCSC genome browserincluding neighborhood DNA regions and allocate theweight of ambiguous features The developed Guitar coordi-nates also provide low level conversion from transcript-basedcoordinates to genome-based coordinates which shouldfacilitate various customized analysis Nevertheless there arestill a number of issues remaining to be addressed

Firstly highly abundant transcripts and ambiguousgenomic features may dominate the result from Guitar pack-age Currently all genomic features and all RNA transcriptshave the same weight this is fine when the genomic featuresare biological features such as RNA methylation sites butit may cause problem when the genomic features are NGSmapped reads such as RNAseq reads because some genesare a lot more highly expressed than the other genes and thereads generated from these genes may dominate the result Amore robust approach may be to use the median abundanceof all RNA transcripts rather the summed density It is alsopossible to develop a boxplot-like visualization scheme in thefuture to indicate the estimated higher and lower bounds forthe distribution of tested features

000

025

050

075

100

Freq

uenc

y

000

025

050

075

100

Freq

uenc

y

5998400UTR CDS 3

998400UTR

Methylation levelHigherUnsureLower

Methylation levelHigherUnsureLower

5998400UTR CDS 3

998400UTRPromoter(1kb) (1kb)

Tail

RNA m5C on mRNA

DNA m5C on mRNA encoding region

Figure 5 Distribution of DNA and RNA m5C residuals WhileC residuals of mRNA are likely to be methylated on 51015840UTR it isopposite on DNA On DNA the start codon and 51015840UTR regions areless likely to bemethylated compared with other regions to allow theinitiation of transcription process

Secondly the developed normalization scheme withinGuitar package only facilitates the comparison of featuredistribution on mRNA transcripts or on lncRNA transcriptsA cross-comparison between mRNA and lncRNA is not sup-ported so far In general it is still very difficult to rigorouslycontrast the distribution of genomic features on mRNA andlncRNA due to their intrinsic difference Compared withmRNA lncRNA usually has lower expression level and lessnumber of exons and is less conserved in sequence

Thirdly because of the heterogeneity of transcriptomeand the discrepancy between transcriptomic and genomicfeatures ambiguity arises and cannot be resolved perfectlyPrevious study of transcriptomic distribution of RNAmethy-lation relies on the longest transcript to eliminate ambiguity

BioMed Research International 7

in feature assignment whichmay require further justificationon why the longest transcript should be the canonical one Amore reasonable solution may be to rely on the most abun-dant isoform transcript however the isoform quantificationis usually a difficult problem and such information may notbe handy and sometimes may be unavailable at all CurrentlyGuitar package adapts two strategies to address this issueFirstly highly ambiguous genomic features or transcripts areexcluded from the analysis Secondly the weight of ambigu-ousmapping is evenly divided among transcripts It should bepossible to develop amore rigorous formulation for examplewith fuzzy system model [31ndash33] to further improve theallocation of weight from ambiguous overlapping

Fourthly during the development process of the Gui-tar package we also realized the Guitar coordinates maypotentially be used for accessing the data quality of MeRIP-Seq datasets The reproducibility uniformity homogeneityand the information entropy within RNA-related sequencingdata (RNA-SeqMeRIP-Seq and BS-Seq) can be convenientlyassessed with the help of Guitar coordinates however thisapplication has not been explored so far andwill be our futurework

Despite the aforementioned limitations the Guitar pack-age is capable of sketching the approximate transcriptomicdistribution of millions of genomic features within minutesin a single line of command We believe that the newlydeveloped Guitar coordinates and the Guitar package haveachieved stable performance on the data tested and shouldeffectively facilitate the analysis of RNAmethylation data andother RNA-related biological features

Additional Points

Project name is Guitar Project homepage is httpbiocon-ductororgpackagesGuitar Operating system(s) is Plat-form Independent Programming language is R License isGPL-2 For manual and guide please see supplementarymaterials

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Jia Meng Yufei Huang and Shao-Wu Zhang conceived theproject Xiaodong Cui Zhen Wei and Jia Meng carried outdevelopment and implementation of the software draftedthe paper and carried out development and implementationof the software Lin Zhang Hui Liu and Lei Sun providedvaluable discussion and helped draft the paper All authorsread and approved the final paper Xiaodong Cui and ZhenWei contribute equally to this work

Acknowledgments

This study is supported by National Natural Science Foun-dation of China (61401370 61473232 91430111 61501466

and 61301220) Fundamental Research Funds for the Cen-tral Universities (2014QNB47 2014QNA84) Jiangsu Scienceand Technology Program (BK20140403) and US NationalInstitutes of Health 5 U54 CA113001 and R01GM113245The authors also appreciate the computational support fromComputational Systems Biology Core University of Texas atSanAntonio funded byNational Institute onMinorityHealthand Health Disparities (G12MD007591) from the NationalInstitutes of Health of USA

References

[1] M LawrenceWHuberH Pages et al ldquoSoftware for computingand annotating genomic rangesrdquo PLoS Computational Biologyvol 9 no 8 Article ID e1003118 2013

[2] N Liu and T Pan ldquoRNA epigeneticsrdquo Translational Researchvol 165 no 1 pp 28ndash35 2015

[3] X Wang B S Zhao I A Roundtree et al ldquoN(6)-methylad-enosine modulates messenger RNA translation efficiencyrdquo Cellvol 161 no 6 pp 1388ndash1399 2015

[4] C R Alarcon H Lee H Goodarzi N Halberg and S FTavazoie ldquoN 6-methyladenosinemarks primarymicroRNAs forprocessingrdquo Nature vol 519 no 7544 pp 482ndash485 2015

[5] N Liu Q Dai G Zheng C He M Parisien and T Pan ldquoN6-methyladenosine-dependent RNA structural switches regulateRNA-protein interactionsrdquo Nature vol 518 no 7540 pp 560ndash564 2015

[6] YWang Y Li J I TothMD Petroski Z Zhang and J C ZhaoldquoN 6-methyladenosinemodification destabilizes developmentalregulators in embryonic stem cellsrdquo Nature Cell Biology vol 16no 2 pp 191ndash198 2014

[7] S Geula S Moshitch-Moshkovitz D Dominissini et al ldquom6AmRNA methylation facilitates resolution of naıve pluripotencytoward differentiationrdquo Science vol 347 no 6225 pp 1002ndash1006 2015

[8] K D Meyer Y Saletore P Zumbo O Elemento C E Masonand S R Jaffrey ldquoComprehensive analysis of mRNA methyla-tion reveals enrichment in 31015840 UTRs and near stop codonsrdquo Cellvol 149 no 7 pp 1635ndash1646 2012

[9] DDominissini SMoshitch-MoshkovitzM Salmon-DivonNAmariglio and G Rechavi ldquoTranscriptome-wide mapping ofN6-methyladenosine by m6A-seq based on immunocapturingand massively parallel sequencingrdquo Nature Protocols vol 8 no1 pp 176ndash189 2013

[10] D Dominissini S Moshitch-Moshkovitz S Schwartz et alldquoTopology of the human and mouse m6A RNA methylomesrevealed by m6A-seqrdquo Nature vol 484 no 7397 pp 201ndash2062012

[11] M Schaefer T Pollex K Hanna and F Lyko ldquoRNA cytosinemethylation analysis by bisulfite sequencingrdquo Nucleic AcidsResearch vol 37 no 2 article e12 2009

[12] J Meng X Cui M K Rao Y Chen and Y Huang ldquoExome-based analysis for RNA epigenome sequencing datardquo Bioinfor-matics vol 29 no 12 pp 1565ndash1567 2013

[13] J Meng Z Lu H Liu et al ldquoA protocol for RNA methylationdifferential analysis with MeRIP-Seq data and exomePeakRBioconductor packagerdquo Methods vol 69 no 3 pp 274ndash2812014

[14] H Liu M A Flores J Meng et al ldquoMeT-DB a database oftranscriptome methylation in mammalian cellsrdquo Nucleic AcidsResearch vol 43 no 1 pp D197ndashD203 2015

8 BioMed Research International

[15] L Liu S-W Zhang Y-C Zhang et al ldquoDecomposition ofRNA methylome reveals co-methylation patterns induced bylatent enzymatic regulators of the epitranscriptomerdquoMolecularBioSystems vol 11 no 1 pp 262ndash274 2015

[16] L Shen N Shao X Liu and E Nestler ldquoNgsplot quickmining and visualization of next-generation sequencing data byintegrating genomic databasesrdquo BMC Genomics vol 15 article284 2014

[17] C Ginestet ldquoggplot2 elegant graphics for data analysisrdquo Journalof the Royal Statistical Society Series A (Statistics in Society) vol174 no 1 pp 245ndash246 2011

[18] Y Fu D Dominissini G Rechavi and C He ldquoGene expressionregulation mediated through reversible m6A RNA methyla-tionrdquo Nature Reviews Genetics vol 15 no 5 pp 293ndash306 2014

[19] K D Meyer and S R Jaffrey ldquoThe dynamic epitranscriptomeN6-methyladenosine and gene expression controlrdquo NatureReviews Molecular Cell Biology vol 15 no 5 pp 313ndash326 2014

[20] J E Squires H R Patel M Nousch et al ldquoWidespreadoccurrence of 5-methylcytosine in human coding and non-coding RNArdquo Nucleic Acids Research vol 40 no 11 pp 5023ndash5033 2012

[21] S Hussain J Aleksic S Blanco S Dietmann and M FryeldquoCharacterizing 5-methylcytosine in the mammalian epitran-scriptomerdquo Genome Biology vol 14 article 215 2013

[22] Y Motorin F Lyko and M Helm ldquo5-Methylcytosine inRNA detection enzymatic formation and biological functionsrdquoNucleic Acids Research vol 38 no 5 Article ID gkp1117 pp1415ndash1430 2009

[23] C Bock ldquoAnalysing and interpreting DNA methylation datardquoNature Reviews Genetics vol 13 no 10 pp 705ndash719 2012

[24] P A Jones ldquoFunctions of DNAmethylation islands start sitesgene bodies and beyondrdquo Nature Reviews Genetics vol 13 no7 pp 484ndash492 2012

[25] V Khoddami and B R Cairns ldquoIdentification of direct tar-gets and modified bases of RNA cytosine methyltransferasesrdquoNature Biotechnology vol 31 no 5 pp 458ndash464 2013

[26] F Krueger Trim Galore A wrapper tool around Cutadaptand FastQC to consistently apply quality and adapter trimmingto FastQ files with some extra functionality for MspI-digestedRRBS-type (Reduced Representation Bisufite-Seq) libraries 2013

[27] F Krueger and S R Andrews ldquoBismark a flexible aligner andmethylation caller for Bisulfite-Seq applicationsrdquo Bioinformat-ics vol 27 no 11 Article ID btr167 pp 1571ndash1572 2011

[28] A Dong J A Yoder X Zhang L Zhou T H Bestor andX Cheng ldquoStructure of human DNMT2 an enigmatic DNAmethyltransferase homolog that displays denaturant-resistantbinding to DNArdquoNucleic Acids Research vol 29 no 2 pp 439ndash448 2001

[29] M Schaefer T Pollex K Hanna et al ldquoRNA methylation byDnmt2 protects transfer RNAs against stress-induced cleavagerdquoGenes amp Development vol 24 no 15 pp 1590ndash1595 2010

[30] M A Machnicka K Milanowska O O Oglou et al ldquoMOD-OMICS a database of RNA modification pathwaysmdash2013updaterdquo Nucleic Acids Research vol 41 no 1 pp D262ndashD2672013

[31] C L P Chen J Wang C-H Wang and L Chen ldquoA newlearning algorithm for a fully connected neuro-fuzzy inferencesystemrdquo IEEE Transactions on Neural Networks and LearningSystems vol 25 no 10 pp 1741ndash1757 2014

[32] J Zhou C L P Chen L Chen and H-X Li ldquoA collaborativefuzzy clustering algorithm in distributed network environ-mentsrdquo IEEE Transactions on Fuzzy Systems vol 22 no 6 pp1443ndash1456 2014

[33] M Gan C L P Chen H-X Li and L Chen ldquoGradientradial basis function based varying-coefficient autoregressivemodel for nonlinear and nonstationary time seriesrdquo IEEE SignalProcessing Letters vol 22 no 7 pp 809ndash812 2015

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 6: Research Article Guitar: An R/Bioconductor Package for ...Research Article Guitar: An R/Bioconductor Package for Gene Annotation Guided Transcriptomic Analysis of RNA-Related Genomic

6 BioMed Research International

manner On lncRNA however despite the overall 51015840 biasthere is no apparent difference in their distribution patternsrevealed

We in the following compared the relative distributionof highly and lowly m5C methylated C residuals on RNAand DNA The same processing pipe line as described pre-viously is applied to mouse embryo fibroblast whole genomebisulfite sequencing data [25] which profiles the DNA m5Cmethylation rather than that on RNA A total of 3641243cytosine residuals are reported by Bismark and they are thendivided into 3 groups (988335 1966017 and 686891 residualsresp) based onwhether theirmethylation level is significantlyhigher or lower than the average reported DNA methylationlevel (6579) with a binomial test at 005 confidence levelThe distributions of the 3 groups of cytosine residuals arestill profiled by Guitar package with neighborhood DNAregions included and the results are shown in Figure 5 Wecan see that compared with RNA methylation profile wherethe highly methylated residuals are enriched on 51015840UTRthe DNA methylation profile shows a clear oppose patternwith the lowly methylated residuals enriched on 51015840UTR andpeaked near the start codon to enable transcription initiationThis observation indicates DNA and RNA methyltransferasecomplexes probably have quite different sequence specificityeven though some key enzyme genes such as Dnmt2 [28 29]are shared between them [30]

4 Discussion and Conclusion

Currently most biological features are represented withgenome-based coordinates making it rather tedious forcomparing with transcriptomic landmarks We developed aGuitar R package which can be a useful tool for analyzingRNA-related genomic features especially RNA methylationBuilt upon the highly efficient GRangesList structure andGenomicFeatures RBioconductor packages Guitar can effi-ciently process millions of genomic features within minutesfor efficient transcriptomic analysis It may automaticallydownload gene information from UCSC genome browserincluding neighborhood DNA regions and allocate theweight of ambiguous features The developed Guitar coordi-nates also provide low level conversion from transcript-basedcoordinates to genome-based coordinates which shouldfacilitate various customized analysis Nevertheless there arestill a number of issues remaining to be addressed

Firstly highly abundant transcripts and ambiguousgenomic features may dominate the result from Guitar pack-age Currently all genomic features and all RNA transcriptshave the same weight this is fine when the genomic featuresare biological features such as RNA methylation sites butit may cause problem when the genomic features are NGSmapped reads such as RNAseq reads because some genesare a lot more highly expressed than the other genes and thereads generated from these genes may dominate the result Amore robust approach may be to use the median abundanceof all RNA transcripts rather the summed density It is alsopossible to develop a boxplot-like visualization scheme in thefuture to indicate the estimated higher and lower bounds forthe distribution of tested features

000

025

050

075

100

Freq

uenc

y

000

025

050

075

100

Freq

uenc

y

5998400UTR CDS 3

998400UTR

Methylation levelHigherUnsureLower

Methylation levelHigherUnsureLower

5998400UTR CDS 3

998400UTRPromoter(1kb) (1kb)

Tail

RNA m5C on mRNA

DNA m5C on mRNA encoding region

Figure 5 Distribution of DNA and RNA m5C residuals WhileC residuals of mRNA are likely to be methylated on 51015840UTR it isopposite on DNA On DNA the start codon and 51015840UTR regions areless likely to bemethylated compared with other regions to allow theinitiation of transcription process

Secondly the developed normalization scheme withinGuitar package only facilitates the comparison of featuredistribution on mRNA transcripts or on lncRNA transcriptsA cross-comparison between mRNA and lncRNA is not sup-ported so far In general it is still very difficult to rigorouslycontrast the distribution of genomic features on mRNA andlncRNA due to their intrinsic difference Compared withmRNA lncRNA usually has lower expression level and lessnumber of exons and is less conserved in sequence

Thirdly because of the heterogeneity of transcriptomeand the discrepancy between transcriptomic and genomicfeatures ambiguity arises and cannot be resolved perfectlyPrevious study of transcriptomic distribution of RNAmethy-lation relies on the longest transcript to eliminate ambiguity

BioMed Research International 7

in feature assignment whichmay require further justificationon why the longest transcript should be the canonical one Amore reasonable solution may be to rely on the most abun-dant isoform transcript however the isoform quantificationis usually a difficult problem and such information may notbe handy and sometimes may be unavailable at all CurrentlyGuitar package adapts two strategies to address this issueFirstly highly ambiguous genomic features or transcripts areexcluded from the analysis Secondly the weight of ambigu-ousmapping is evenly divided among transcripts It should bepossible to develop amore rigorous formulation for examplewith fuzzy system model [31ndash33] to further improve theallocation of weight from ambiguous overlapping

Fourthly during the development process of the Gui-tar package we also realized the Guitar coordinates maypotentially be used for accessing the data quality of MeRIP-Seq datasets The reproducibility uniformity homogeneityand the information entropy within RNA-related sequencingdata (RNA-SeqMeRIP-Seq and BS-Seq) can be convenientlyassessed with the help of Guitar coordinates however thisapplication has not been explored so far andwill be our futurework

Despite the aforementioned limitations the Guitar pack-age is capable of sketching the approximate transcriptomicdistribution of millions of genomic features within minutesin a single line of command We believe that the newlydeveloped Guitar coordinates and the Guitar package haveachieved stable performance on the data tested and shouldeffectively facilitate the analysis of RNAmethylation data andother RNA-related biological features

Additional Points

Project name is Guitar Project homepage is httpbiocon-ductororgpackagesGuitar Operating system(s) is Plat-form Independent Programming language is R License isGPL-2 For manual and guide please see supplementarymaterials

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Jia Meng Yufei Huang and Shao-Wu Zhang conceived theproject Xiaodong Cui Zhen Wei and Jia Meng carried outdevelopment and implementation of the software draftedthe paper and carried out development and implementationof the software Lin Zhang Hui Liu and Lei Sun providedvaluable discussion and helped draft the paper All authorsread and approved the final paper Xiaodong Cui and ZhenWei contribute equally to this work

Acknowledgments

This study is supported by National Natural Science Foun-dation of China (61401370 61473232 91430111 61501466

and 61301220) Fundamental Research Funds for the Cen-tral Universities (2014QNB47 2014QNA84) Jiangsu Scienceand Technology Program (BK20140403) and US NationalInstitutes of Health 5 U54 CA113001 and R01GM113245The authors also appreciate the computational support fromComputational Systems Biology Core University of Texas atSanAntonio funded byNational Institute onMinorityHealthand Health Disparities (G12MD007591) from the NationalInstitutes of Health of USA

References

[1] M LawrenceWHuberH Pages et al ldquoSoftware for computingand annotating genomic rangesrdquo PLoS Computational Biologyvol 9 no 8 Article ID e1003118 2013

[2] N Liu and T Pan ldquoRNA epigeneticsrdquo Translational Researchvol 165 no 1 pp 28ndash35 2015

[3] X Wang B S Zhao I A Roundtree et al ldquoN(6)-methylad-enosine modulates messenger RNA translation efficiencyrdquo Cellvol 161 no 6 pp 1388ndash1399 2015

[4] C R Alarcon H Lee H Goodarzi N Halberg and S FTavazoie ldquoN 6-methyladenosinemarks primarymicroRNAs forprocessingrdquo Nature vol 519 no 7544 pp 482ndash485 2015

[5] N Liu Q Dai G Zheng C He M Parisien and T Pan ldquoN6-methyladenosine-dependent RNA structural switches regulateRNA-protein interactionsrdquo Nature vol 518 no 7540 pp 560ndash564 2015

[6] YWang Y Li J I TothMD Petroski Z Zhang and J C ZhaoldquoN 6-methyladenosinemodification destabilizes developmentalregulators in embryonic stem cellsrdquo Nature Cell Biology vol 16no 2 pp 191ndash198 2014

[7] S Geula S Moshitch-Moshkovitz D Dominissini et al ldquom6AmRNA methylation facilitates resolution of naıve pluripotencytoward differentiationrdquo Science vol 347 no 6225 pp 1002ndash1006 2015

[8] K D Meyer Y Saletore P Zumbo O Elemento C E Masonand S R Jaffrey ldquoComprehensive analysis of mRNA methyla-tion reveals enrichment in 31015840 UTRs and near stop codonsrdquo Cellvol 149 no 7 pp 1635ndash1646 2012

[9] DDominissini SMoshitch-MoshkovitzM Salmon-DivonNAmariglio and G Rechavi ldquoTranscriptome-wide mapping ofN6-methyladenosine by m6A-seq based on immunocapturingand massively parallel sequencingrdquo Nature Protocols vol 8 no1 pp 176ndash189 2013

[10] D Dominissini S Moshitch-Moshkovitz S Schwartz et alldquoTopology of the human and mouse m6A RNA methylomesrevealed by m6A-seqrdquo Nature vol 484 no 7397 pp 201ndash2062012

[11] M Schaefer T Pollex K Hanna and F Lyko ldquoRNA cytosinemethylation analysis by bisulfite sequencingrdquo Nucleic AcidsResearch vol 37 no 2 article e12 2009

[12] J Meng X Cui M K Rao Y Chen and Y Huang ldquoExome-based analysis for RNA epigenome sequencing datardquo Bioinfor-matics vol 29 no 12 pp 1565ndash1567 2013

[13] J Meng Z Lu H Liu et al ldquoA protocol for RNA methylationdifferential analysis with MeRIP-Seq data and exomePeakRBioconductor packagerdquo Methods vol 69 no 3 pp 274ndash2812014

[14] H Liu M A Flores J Meng et al ldquoMeT-DB a database oftranscriptome methylation in mammalian cellsrdquo Nucleic AcidsResearch vol 43 no 1 pp D197ndashD203 2015

8 BioMed Research International

[15] L Liu S-W Zhang Y-C Zhang et al ldquoDecomposition ofRNA methylome reveals co-methylation patterns induced bylatent enzymatic regulators of the epitranscriptomerdquoMolecularBioSystems vol 11 no 1 pp 262ndash274 2015

[16] L Shen N Shao X Liu and E Nestler ldquoNgsplot quickmining and visualization of next-generation sequencing data byintegrating genomic databasesrdquo BMC Genomics vol 15 article284 2014

[17] C Ginestet ldquoggplot2 elegant graphics for data analysisrdquo Journalof the Royal Statistical Society Series A (Statistics in Society) vol174 no 1 pp 245ndash246 2011

[18] Y Fu D Dominissini G Rechavi and C He ldquoGene expressionregulation mediated through reversible m6A RNA methyla-tionrdquo Nature Reviews Genetics vol 15 no 5 pp 293ndash306 2014

[19] K D Meyer and S R Jaffrey ldquoThe dynamic epitranscriptomeN6-methyladenosine and gene expression controlrdquo NatureReviews Molecular Cell Biology vol 15 no 5 pp 313ndash326 2014

[20] J E Squires H R Patel M Nousch et al ldquoWidespreadoccurrence of 5-methylcytosine in human coding and non-coding RNArdquo Nucleic Acids Research vol 40 no 11 pp 5023ndash5033 2012

[21] S Hussain J Aleksic S Blanco S Dietmann and M FryeldquoCharacterizing 5-methylcytosine in the mammalian epitran-scriptomerdquo Genome Biology vol 14 article 215 2013

[22] Y Motorin F Lyko and M Helm ldquo5-Methylcytosine inRNA detection enzymatic formation and biological functionsrdquoNucleic Acids Research vol 38 no 5 Article ID gkp1117 pp1415ndash1430 2009

[23] C Bock ldquoAnalysing and interpreting DNA methylation datardquoNature Reviews Genetics vol 13 no 10 pp 705ndash719 2012

[24] P A Jones ldquoFunctions of DNAmethylation islands start sitesgene bodies and beyondrdquo Nature Reviews Genetics vol 13 no7 pp 484ndash492 2012

[25] V Khoddami and B R Cairns ldquoIdentification of direct tar-gets and modified bases of RNA cytosine methyltransferasesrdquoNature Biotechnology vol 31 no 5 pp 458ndash464 2013

[26] F Krueger Trim Galore A wrapper tool around Cutadaptand FastQC to consistently apply quality and adapter trimmingto FastQ files with some extra functionality for MspI-digestedRRBS-type (Reduced Representation Bisufite-Seq) libraries 2013

[27] F Krueger and S R Andrews ldquoBismark a flexible aligner andmethylation caller for Bisulfite-Seq applicationsrdquo Bioinformat-ics vol 27 no 11 Article ID btr167 pp 1571ndash1572 2011

[28] A Dong J A Yoder X Zhang L Zhou T H Bestor andX Cheng ldquoStructure of human DNMT2 an enigmatic DNAmethyltransferase homolog that displays denaturant-resistantbinding to DNArdquoNucleic Acids Research vol 29 no 2 pp 439ndash448 2001

[29] M Schaefer T Pollex K Hanna et al ldquoRNA methylation byDnmt2 protects transfer RNAs against stress-induced cleavagerdquoGenes amp Development vol 24 no 15 pp 1590ndash1595 2010

[30] M A Machnicka K Milanowska O O Oglou et al ldquoMOD-OMICS a database of RNA modification pathwaysmdash2013updaterdquo Nucleic Acids Research vol 41 no 1 pp D262ndashD2672013

[31] C L P Chen J Wang C-H Wang and L Chen ldquoA newlearning algorithm for a fully connected neuro-fuzzy inferencesystemrdquo IEEE Transactions on Neural Networks and LearningSystems vol 25 no 10 pp 1741ndash1757 2014

[32] J Zhou C L P Chen L Chen and H-X Li ldquoA collaborativefuzzy clustering algorithm in distributed network environ-mentsrdquo IEEE Transactions on Fuzzy Systems vol 22 no 6 pp1443ndash1456 2014

[33] M Gan C L P Chen H-X Li and L Chen ldquoGradientradial basis function based varying-coefficient autoregressivemodel for nonlinear and nonstationary time seriesrdquo IEEE SignalProcessing Letters vol 22 no 7 pp 809ndash812 2015

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 7: Research Article Guitar: An R/Bioconductor Package for ...Research Article Guitar: An R/Bioconductor Package for Gene Annotation Guided Transcriptomic Analysis of RNA-Related Genomic

BioMed Research International 7

in feature assignment whichmay require further justificationon why the longest transcript should be the canonical one Amore reasonable solution may be to rely on the most abun-dant isoform transcript however the isoform quantificationis usually a difficult problem and such information may notbe handy and sometimes may be unavailable at all CurrentlyGuitar package adapts two strategies to address this issueFirstly highly ambiguous genomic features or transcripts areexcluded from the analysis Secondly the weight of ambigu-ousmapping is evenly divided among transcripts It should bepossible to develop amore rigorous formulation for examplewith fuzzy system model [31ndash33] to further improve theallocation of weight from ambiguous overlapping

Fourthly during the development process of the Gui-tar package we also realized the Guitar coordinates maypotentially be used for accessing the data quality of MeRIP-Seq datasets The reproducibility uniformity homogeneityand the information entropy within RNA-related sequencingdata (RNA-SeqMeRIP-Seq and BS-Seq) can be convenientlyassessed with the help of Guitar coordinates however thisapplication has not been explored so far andwill be our futurework

Despite the aforementioned limitations the Guitar pack-age is capable of sketching the approximate transcriptomicdistribution of millions of genomic features within minutesin a single line of command We believe that the newlydeveloped Guitar coordinates and the Guitar package haveachieved stable performance on the data tested and shouldeffectively facilitate the analysis of RNAmethylation data andother RNA-related biological features

Additional Points

Project name is Guitar Project homepage is httpbiocon-ductororgpackagesGuitar Operating system(s) is Plat-form Independent Programming language is R License isGPL-2 For manual and guide please see supplementarymaterials

Competing Interests

The authors declare that they have no competing interests

Authorsrsquo Contributions

Jia Meng Yufei Huang and Shao-Wu Zhang conceived theproject Xiaodong Cui Zhen Wei and Jia Meng carried outdevelopment and implementation of the software draftedthe paper and carried out development and implementationof the software Lin Zhang Hui Liu and Lei Sun providedvaluable discussion and helped draft the paper All authorsread and approved the final paper Xiaodong Cui and ZhenWei contribute equally to this work

Acknowledgments

This study is supported by National Natural Science Foun-dation of China (61401370 61473232 91430111 61501466

and 61301220) Fundamental Research Funds for the Cen-tral Universities (2014QNB47 2014QNA84) Jiangsu Scienceand Technology Program (BK20140403) and US NationalInstitutes of Health 5 U54 CA113001 and R01GM113245The authors also appreciate the computational support fromComputational Systems Biology Core University of Texas atSanAntonio funded byNational Institute onMinorityHealthand Health Disparities (G12MD007591) from the NationalInstitutes of Health of USA

References

[1] M LawrenceWHuberH Pages et al ldquoSoftware for computingand annotating genomic rangesrdquo PLoS Computational Biologyvol 9 no 8 Article ID e1003118 2013

[2] N Liu and T Pan ldquoRNA epigeneticsrdquo Translational Researchvol 165 no 1 pp 28ndash35 2015

[3] X Wang B S Zhao I A Roundtree et al ldquoN(6)-methylad-enosine modulates messenger RNA translation efficiencyrdquo Cellvol 161 no 6 pp 1388ndash1399 2015

[4] C R Alarcon H Lee H Goodarzi N Halberg and S FTavazoie ldquoN 6-methyladenosinemarks primarymicroRNAs forprocessingrdquo Nature vol 519 no 7544 pp 482ndash485 2015

[5] N Liu Q Dai G Zheng C He M Parisien and T Pan ldquoN6-methyladenosine-dependent RNA structural switches regulateRNA-protein interactionsrdquo Nature vol 518 no 7540 pp 560ndash564 2015

[6] YWang Y Li J I TothMD Petroski Z Zhang and J C ZhaoldquoN 6-methyladenosinemodification destabilizes developmentalregulators in embryonic stem cellsrdquo Nature Cell Biology vol 16no 2 pp 191ndash198 2014

[7] S Geula S Moshitch-Moshkovitz D Dominissini et al ldquom6AmRNA methylation facilitates resolution of naıve pluripotencytoward differentiationrdquo Science vol 347 no 6225 pp 1002ndash1006 2015

[8] K D Meyer Y Saletore P Zumbo O Elemento C E Masonand S R Jaffrey ldquoComprehensive analysis of mRNA methyla-tion reveals enrichment in 31015840 UTRs and near stop codonsrdquo Cellvol 149 no 7 pp 1635ndash1646 2012

[9] DDominissini SMoshitch-MoshkovitzM Salmon-DivonNAmariglio and G Rechavi ldquoTranscriptome-wide mapping ofN6-methyladenosine by m6A-seq based on immunocapturingand massively parallel sequencingrdquo Nature Protocols vol 8 no1 pp 176ndash189 2013

[10] D Dominissini S Moshitch-Moshkovitz S Schwartz et alldquoTopology of the human and mouse m6A RNA methylomesrevealed by m6A-seqrdquo Nature vol 484 no 7397 pp 201ndash2062012

[11] M Schaefer T Pollex K Hanna and F Lyko ldquoRNA cytosinemethylation analysis by bisulfite sequencingrdquo Nucleic AcidsResearch vol 37 no 2 article e12 2009

[12] J Meng X Cui M K Rao Y Chen and Y Huang ldquoExome-based analysis for RNA epigenome sequencing datardquo Bioinfor-matics vol 29 no 12 pp 1565ndash1567 2013

[13] J Meng Z Lu H Liu et al ldquoA protocol for RNA methylationdifferential analysis with MeRIP-Seq data and exomePeakRBioconductor packagerdquo Methods vol 69 no 3 pp 274ndash2812014

[14] H Liu M A Flores J Meng et al ldquoMeT-DB a database oftranscriptome methylation in mammalian cellsrdquo Nucleic AcidsResearch vol 43 no 1 pp D197ndashD203 2015

8 BioMed Research International

[15] L Liu S-W Zhang Y-C Zhang et al ldquoDecomposition ofRNA methylome reveals co-methylation patterns induced bylatent enzymatic regulators of the epitranscriptomerdquoMolecularBioSystems vol 11 no 1 pp 262ndash274 2015

[16] L Shen N Shao X Liu and E Nestler ldquoNgsplot quickmining and visualization of next-generation sequencing data byintegrating genomic databasesrdquo BMC Genomics vol 15 article284 2014

[17] C Ginestet ldquoggplot2 elegant graphics for data analysisrdquo Journalof the Royal Statistical Society Series A (Statistics in Society) vol174 no 1 pp 245ndash246 2011

[18] Y Fu D Dominissini G Rechavi and C He ldquoGene expressionregulation mediated through reversible m6A RNA methyla-tionrdquo Nature Reviews Genetics vol 15 no 5 pp 293ndash306 2014

[19] K D Meyer and S R Jaffrey ldquoThe dynamic epitranscriptomeN6-methyladenosine and gene expression controlrdquo NatureReviews Molecular Cell Biology vol 15 no 5 pp 313ndash326 2014

[20] J E Squires H R Patel M Nousch et al ldquoWidespreadoccurrence of 5-methylcytosine in human coding and non-coding RNArdquo Nucleic Acids Research vol 40 no 11 pp 5023ndash5033 2012

[21] S Hussain J Aleksic S Blanco S Dietmann and M FryeldquoCharacterizing 5-methylcytosine in the mammalian epitran-scriptomerdquo Genome Biology vol 14 article 215 2013

[22] Y Motorin F Lyko and M Helm ldquo5-Methylcytosine inRNA detection enzymatic formation and biological functionsrdquoNucleic Acids Research vol 38 no 5 Article ID gkp1117 pp1415ndash1430 2009

[23] C Bock ldquoAnalysing and interpreting DNA methylation datardquoNature Reviews Genetics vol 13 no 10 pp 705ndash719 2012

[24] P A Jones ldquoFunctions of DNAmethylation islands start sitesgene bodies and beyondrdquo Nature Reviews Genetics vol 13 no7 pp 484ndash492 2012

[25] V Khoddami and B R Cairns ldquoIdentification of direct tar-gets and modified bases of RNA cytosine methyltransferasesrdquoNature Biotechnology vol 31 no 5 pp 458ndash464 2013

[26] F Krueger Trim Galore A wrapper tool around Cutadaptand FastQC to consistently apply quality and adapter trimmingto FastQ files with some extra functionality for MspI-digestedRRBS-type (Reduced Representation Bisufite-Seq) libraries 2013

[27] F Krueger and S R Andrews ldquoBismark a flexible aligner andmethylation caller for Bisulfite-Seq applicationsrdquo Bioinformat-ics vol 27 no 11 Article ID btr167 pp 1571ndash1572 2011

[28] A Dong J A Yoder X Zhang L Zhou T H Bestor andX Cheng ldquoStructure of human DNMT2 an enigmatic DNAmethyltransferase homolog that displays denaturant-resistantbinding to DNArdquoNucleic Acids Research vol 29 no 2 pp 439ndash448 2001

[29] M Schaefer T Pollex K Hanna et al ldquoRNA methylation byDnmt2 protects transfer RNAs against stress-induced cleavagerdquoGenes amp Development vol 24 no 15 pp 1590ndash1595 2010

[30] M A Machnicka K Milanowska O O Oglou et al ldquoMOD-OMICS a database of RNA modification pathwaysmdash2013updaterdquo Nucleic Acids Research vol 41 no 1 pp D262ndashD2672013

[31] C L P Chen J Wang C-H Wang and L Chen ldquoA newlearning algorithm for a fully connected neuro-fuzzy inferencesystemrdquo IEEE Transactions on Neural Networks and LearningSystems vol 25 no 10 pp 1741ndash1757 2014

[32] J Zhou C L P Chen L Chen and H-X Li ldquoA collaborativefuzzy clustering algorithm in distributed network environ-mentsrdquo IEEE Transactions on Fuzzy Systems vol 22 no 6 pp1443ndash1456 2014

[33] M Gan C L P Chen H-X Li and L Chen ldquoGradientradial basis function based varying-coefficient autoregressivemodel for nonlinear and nonstationary time seriesrdquo IEEE SignalProcessing Letters vol 22 no 7 pp 809ndash812 2015

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 8: Research Article Guitar: An R/Bioconductor Package for ...Research Article Guitar: An R/Bioconductor Package for Gene Annotation Guided Transcriptomic Analysis of RNA-Related Genomic

8 BioMed Research International

[15] L Liu S-W Zhang Y-C Zhang et al ldquoDecomposition ofRNA methylome reveals co-methylation patterns induced bylatent enzymatic regulators of the epitranscriptomerdquoMolecularBioSystems vol 11 no 1 pp 262ndash274 2015

[16] L Shen N Shao X Liu and E Nestler ldquoNgsplot quickmining and visualization of next-generation sequencing data byintegrating genomic databasesrdquo BMC Genomics vol 15 article284 2014

[17] C Ginestet ldquoggplot2 elegant graphics for data analysisrdquo Journalof the Royal Statistical Society Series A (Statistics in Society) vol174 no 1 pp 245ndash246 2011

[18] Y Fu D Dominissini G Rechavi and C He ldquoGene expressionregulation mediated through reversible m6A RNA methyla-tionrdquo Nature Reviews Genetics vol 15 no 5 pp 293ndash306 2014

[19] K D Meyer and S R Jaffrey ldquoThe dynamic epitranscriptomeN6-methyladenosine and gene expression controlrdquo NatureReviews Molecular Cell Biology vol 15 no 5 pp 313ndash326 2014

[20] J E Squires H R Patel M Nousch et al ldquoWidespreadoccurrence of 5-methylcytosine in human coding and non-coding RNArdquo Nucleic Acids Research vol 40 no 11 pp 5023ndash5033 2012

[21] S Hussain J Aleksic S Blanco S Dietmann and M FryeldquoCharacterizing 5-methylcytosine in the mammalian epitran-scriptomerdquo Genome Biology vol 14 article 215 2013

[22] Y Motorin F Lyko and M Helm ldquo5-Methylcytosine inRNA detection enzymatic formation and biological functionsrdquoNucleic Acids Research vol 38 no 5 Article ID gkp1117 pp1415ndash1430 2009

[23] C Bock ldquoAnalysing and interpreting DNA methylation datardquoNature Reviews Genetics vol 13 no 10 pp 705ndash719 2012

[24] P A Jones ldquoFunctions of DNAmethylation islands start sitesgene bodies and beyondrdquo Nature Reviews Genetics vol 13 no7 pp 484ndash492 2012

[25] V Khoddami and B R Cairns ldquoIdentification of direct tar-gets and modified bases of RNA cytosine methyltransferasesrdquoNature Biotechnology vol 31 no 5 pp 458ndash464 2013

[26] F Krueger Trim Galore A wrapper tool around Cutadaptand FastQC to consistently apply quality and adapter trimmingto FastQ files with some extra functionality for MspI-digestedRRBS-type (Reduced Representation Bisufite-Seq) libraries 2013

[27] F Krueger and S R Andrews ldquoBismark a flexible aligner andmethylation caller for Bisulfite-Seq applicationsrdquo Bioinformat-ics vol 27 no 11 Article ID btr167 pp 1571ndash1572 2011

[28] A Dong J A Yoder X Zhang L Zhou T H Bestor andX Cheng ldquoStructure of human DNMT2 an enigmatic DNAmethyltransferase homolog that displays denaturant-resistantbinding to DNArdquoNucleic Acids Research vol 29 no 2 pp 439ndash448 2001

[29] M Schaefer T Pollex K Hanna et al ldquoRNA methylation byDnmt2 protects transfer RNAs against stress-induced cleavagerdquoGenes amp Development vol 24 no 15 pp 1590ndash1595 2010

[30] M A Machnicka K Milanowska O O Oglou et al ldquoMOD-OMICS a database of RNA modification pathwaysmdash2013updaterdquo Nucleic Acids Research vol 41 no 1 pp D262ndashD2672013

[31] C L P Chen J Wang C-H Wang and L Chen ldquoA newlearning algorithm for a fully connected neuro-fuzzy inferencesystemrdquo IEEE Transactions on Neural Networks and LearningSystems vol 25 no 10 pp 1741ndash1757 2014

[32] J Zhou C L P Chen L Chen and H-X Li ldquoA collaborativefuzzy clustering algorithm in distributed network environ-mentsrdquo IEEE Transactions on Fuzzy Systems vol 22 no 6 pp1443ndash1456 2014

[33] M Gan C L P Chen H-X Li and L Chen ldquoGradientradial basis function based varying-coefficient autoregressivemodel for nonlinear and nonstationary time seriesrdquo IEEE SignalProcessing Letters vol 22 no 7 pp 809ndash812 2015

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 9: Research Article Guitar: An R/Bioconductor Package for ...Research Article Guitar: An R/Bioconductor Package for Gene Annotation Guided Transcriptomic Analysis of RNA-Related Genomic

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology