gramene 2016: comparative plant genomics and pathway resources › download › pdf ›...

8
Published online 8 November 2015 Nucleic Acids Research, 2016, Vol. 44, Database issue D1133–D1140 doi: 10.1093/nar/gkv1179 Gramene 2016: comparative plant genomics and pathway resources Marcela K. Tello-Ruiz 1 , Joshua Stein 1 , Sharon Wei 1 , Justin Preece 2 , Andrew Olson 1 , Sushma Naithani 2 , Vindhya Amarasinghe 2 , Palitha Dharmawardhana 2 , Yinping Jiao 1 , Joseph Mulvaney 1 , Sunita Kumari 1 , Kapeel Chougule 1 , Justin Elser 2 , Bo Wang 1 , James Thomason 1 , Daniel M. Bolser 3 , Arnaud Kerhornou 3 , Brandon Walts 3 , Nuno A. Fonseca 3 , Laura Huerta 3 , Maria Keays 3 , Y. Amy Tang 3 , Helen Parkinson 3 , Antonio Fabregat 3 , Sheldon McKay 4 , Joel Weiser 4 , Peter D’Eustachio 5 , Lincoln Stein 4 , Robert Petryszak 3 , Paul J. Kersey 3 , Pankaj Jaiswal 2 and Doreen Ware 1,6,* 1 Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA, 2 Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA, 3 EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK, 4 Informatics and Bio-computing Program, Ontario Institute of Cancer Research, Toronto, M5G 1L7, Canada, 5 Department of Biochemistry & Molecular Pharmacology, NYU School of Medicine, New York, NY 10016, USA and 6 USDA ARS NAA Robert W. Holley Center for Agriculture and Health, Agricultural Research Service, Ithaca, NY 14853, USA Received September 25, 2015; Accepted October 13, 2015 ABSTRACT Gramene (http://www.gramene.org) is an online re- source for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled ref- erence genomes that are integrated using ontology- based annotation and comparative analyses, and ac- cessed through both visual and programmatic in- terfaces. Additional community data, such as ge- netic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reac- tome pathway portal (http://plantreactome.gramene. org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addi- tion to 200 curated rice reference pathways, the portal hosts gene homology-based pathway projec- tions for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI’s Ex- pression Atlas to enable the projection of baseline and differential expression data from curated ex- pression studies in plants. Gramene’s archive web- site (http://archive.gramene.org) continues to pro- vide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel car- rying video tutorials. INTRODUCTION Modern agriculture faces global challenges including food security and adaptation to climate change. In this con- text, bioinformatics resources can facilitate better under- standing of the complexity, structure and evolution of plants. Gramene provides online resources for visualiz- ing and comparing plant genomes and biological path- ways. Community-based gene annotations constitute the primary sources of annotations accompanying each refer- ence genome, and are supplemented with functional clas- sification and phylogenomics-based comparisons. In addi- tion, we annotate and display variation data, largely ob- tained through collaboration with large-scale re-sequencing and genotyping initiatives. The project is also committed to consolidating plant pathway databases by applying both manual curation and automated methods. Gramene is powered by several platform infrastruc- tures that are linked via the Drupal content management system to provide a unified user experience. Our genome browser (http://ensembl.gramene.org/genome browser) takes advantage of the Ensembl infrastructure (1) to provide an interface for exploration of genome features, * To whom correspondence should be addressed. Tel: +1 516 367 6979; Fax: +1 516 367 6851; Email: [email protected] Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US. at Cold Spring Harbor Laboratory on March 25, 2016 http://nar.oxfordjournals.org/ Downloaded from

Upload: others

Post on 29-May-2020

12 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Gramene 2016: comparative plant genomics and pathway resources › download › pdf › 33027963.pdf · Gramene 2016: comparative plant genomics and pathway resources Marcela K. Tello-Ruiz1,

Published online 8 November 2015 Nucleic Acids Research 2016 Vol 44 Database issue D1133ndashD1140doi 101093nargkv1179

Gramene 2016 comparative plant genomics andpathway resourcesMarcela K Tello-Ruiz1 Joshua Stein1 Sharon Wei1 Justin Preece2 Andrew Olson1Sushma Naithani2 Vindhya Amarasinghe2 Palitha Dharmawardhana2 Yinping Jiao1Joseph Mulvaney1 Sunita Kumari1 Kapeel Chougule1 Justin Elser2 Bo Wang1James Thomason1 Daniel M Bolser3 Arnaud Kerhornou3 Brandon Walts3 NunoA Fonseca3 Laura Huerta3 Maria Keays3 Y Amy Tang3 Helen Parkinson3Antonio Fabregat3 Sheldon McKay4 Joel Weiser4 Peter DrsquoEustachio5 Lincoln Stein4Robert Petryszak3 Paul J Kersey3 Pankaj Jaiswal2 and Doreen Ware16

1Cold Spring Harbor Laboratory Cold Spring Harbor NY 11724 USA 2Department of Botany and Plant PathologyOregon State University Corvallis OR 97331 USA 3EMBL-European Bioinformatics Institute Wellcome TrustGenome Campus Hinxton CB10 1SD UK 4Informatics and Bio-computing Program Ontario Institute of CancerResearch Toronto M5G 1L7 Canada 5Department of Biochemistry amp Molecular Pharmacology NYU School ofMedicine New York NY 10016 USA and 6USDA ARS NAA Robert W Holley Center for Agriculture and HealthAgricultural Research Service Ithaca NY 14853 USA

Received September 25 2015 Accepted October 13 2015

ABSTRACT

Gramene (httpwwwgrameneorg) is an online re-source for comparative functional genomics in cropsand model plant species Its two main frameworks aregenomes (collaboration with Ensembl Plants) andpathways (The Plant Reactome and archival BioCycdatabases) Since our last NAR update the databasewebsite adopted a new Drupal management platformThe genomes section features 39 fully assembled ref-erence genomes that are integrated using ontology-based annotation and comparative analyses and ac-cessed through both visual and programmatic in-terfaces Additional community data such as ge-netic variation expression and methylation are alsomapped for a subset of genomes The Plant Reac-tome pathway portal (httpplantreactomegrameneorg) provides a reference resource for analyzingplant metabolic and regulatory pathways In addi-tion to sim200 curated rice reference pathways theportal hosts gene homology-based pathway projec-tions for 33 plant species Both the genome andpathway browsers interface with the EMBL-EBIrsquos Ex-pression Atlas to enable the projection of baselineand differential expression data from curated ex-pression studies in plants Gramenersquos archive web-site (httparchivegrameneorg) continues to pro-

vide previously reported resources on comparativemaps markers and QTL To further aid our userswe have also introduced a live monthly educationalwebinar series and a Gramene YouTube channel car-rying video tutorials

INTRODUCTION

Modern agriculture faces global challenges including foodsecurity and adaptation to climate change In this con-text bioinformatics resources can facilitate better under-standing of the complexity structure and evolution ofplants Gramene provides online resources for visualiz-ing and comparing plant genomes and biological path-ways Community-based gene annotations constitute theprimary sources of annotations accompanying each refer-ence genome and are supplemented with functional clas-sification and phylogenomics-based comparisons In addi-tion we annotate and display variation data largely ob-tained through collaboration with large-scale re-sequencingand genotyping initiatives The project is also committedto consolidating plant pathway databases by applying bothmanual curation and automated methods

Gramene is powered by several platform infrastruc-tures that are linked via the Drupal content managementsystem to provide a unified user experience Our genomebrowser (httpensemblgrameneorggenome browser)takes advantage of the Ensembl infrastructure (1) toprovide an interface for exploration of genome features

To whom correspondence should be addressed Tel +1 516 367 6979 Fax +1 516 367 6851 Email warecshledu

Published by Oxford University Press on behalf of Nucleic Acids Research 2015This work is written by (a) US Government employee(s) and is in the public domain in the US

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

D1134 Nucleic Acids Research 2016 Vol 44 Database issue

functional ontologies variation data and comparativephylogenomics For the past six years Gramene haspartnered with the Plants division of Ensembl Genomes(httpplantsensemblorg) to jointly produce a commongenome browser mirrored yet independently hosted by theUnited States and the United Kingdom respectively ThisUSA-European collaboration has facilitated the timelyadoption of the software updates and novelty tools thatfrequently accompany Ensembl releases (1)

Gramene developed a comprehensive framework formetabolic signaling and genetic networks at the systemslevel for plants using the Reactome data model analysis andvisualization platform (2) also known as The Plant Reac-tome The current version of The Plant Reactome databasewebsite (httpplantreactomegrameneorg) has a new userinterface and features 238 rice pathways (sim80 of whichwere manually curated) which also serve as reference forprojection of orthologous pathways for 33 plant specieswith sequenced and annotated plant genomes

Gramenersquos software platforms provide region-specific(eg genome browser) or pathway-specific (eg Plant Reac-tome) data downloads In addition project data are avail-able for customizable downloads from the GrameneMart(34) bulk downloads via File Transfer Protocol (ftpftpgrameneorgpubgramene) and programmatic access viapublic MySQL (5) and a new RESTful API (httpdatagrameneorg) together with REST APIs from Plant Reac-tome and Ensembl This article summarizes Gramene up-dates since the last report in this journal (3) through the47th release of the Gramene database in August 2015 Thewebsite database and its contents continue to be updatedfive times per year changes can be followed at the Gramenenews portal (httpwwwgrameneorgblog) and by brows-ing the sitersquos release notes (httpwwwgrameneorgrelease-notes)

NEW PLANT GENOMES

Gramene added 12 new fully sequenced reference genomes(Supplementary Table S1) bringing the total to 39 Thecomposition of selected species reflected community needsbalanced with other considerations such as assembly qual-ity and placement within the plant phylogenetic tree aswell as collaborative support In addition all referencegenome assemblies are accessioned with the InternationalNucleotide Sequence Database Collaboration (INSDChttpwwwinsdcorg) (6) a data sharing policy standardadopted by Gramene and Ensembl Plants As shown inFigure 1A and Supplementary Table S1 the list of in-cluded species provides broad taxonomic representationacross green plants with greater representation in partic-ular clades such as rice wheat and the Brassicas In all theset includes the genomes of 21 monocots 12 core eudicots1 basal angiosperm and 5 non-flowering plants which col-lectively serve both crop and model organism research com-munities Five of the newly added species are wild relativesof rice Oryza meridionalis O glumaepatula O nivara Orufipogon O longistaminata and the rice outgroup Leersiaperrieri which were sequenced by an international consor-tium of researchers known as I-OMAP (7) With the ad-dition of these species Gramene now hosts data for almost

half of the 23 extant species in the Oryza genus providing anunprecedented resource for rice research Deep species rep-resentation is also provided within the Triticaceae includ-ing the first chromosome-scale assembly of hexaploid breadwheat (89) now the largest genome (16 Gbp) in GrameneNewly added eudicot genomes include three nutritionallyand economically important species the cole crops (Bras-sica oleracea) (10) cocoa (Theobroma cacao) (11) and peach(Prunus persica) (12) Two other new species are of partic-ular interest in diverse fields of research The addition ofAmborella trichopoda (1314) basal to Angiosperms (flow-ering plants) should aid in efforts to elucidate archaic andderived genomic features within the angiosperm tree of lifeThe genome of one of the smallest known free-living eu-karyotes a unicellular green alga Ostreococcus lucimarinus(15) has attracted intense research interest due to its centralrole in the oceanic carbon cycle

ANNOTATION AND COMPARATIVE GENOMICS

A principal aim of Gramene is to integrate and draw con-nections between genomes through consistent functionalannotation and comparative analyses For each genomethe community-recognized gene annotation set is character-ized for InterPro domains assigned Gene Ontology (GO)and Plant Ontology terms (Supplementary Table S2) (16)and cross-referenced to source databases Genomes aremapped by whole genome alignment between selected pairsof species (1718) In the past year the number of avail-able pairwise alignments increased from 64 to 133 withparticular emphasis on comparisons within the rice andwheat-related species (Supplementary Table S3) In addi-tion protein-coding genes from all species were used tobuild phylogenetic gene trees using the Ensembl ComparaGene Trees pipeline (19) In addition to providing inferredevolutionary histories of gene families this pipeline iden-tifies orthologous relationships between genes which aresubsequently used to build synteny maps between suffi-ciently related species (Supplementary Table S4) (5) Thegene tree pipeline also outputs contiguous lsquosplitrsquo gene mod-els frequently associated with either the artifacts of mis-annotation or evolutionary adaptations We provide thesplit gene data as species-wise downloadable files with eachrelease (Supplementary Table S5)

Advances in genomics technologies and research overthe last several years have provided new types of data forGramene to display particularly in the realms of gene ex-pression regulation and epigenetics RNA-seq alignmenttracks representing whole transcriptome expression in di-verse tissues conditions or genetic backgrounds are nowdisplayed for wheat maize (20) and all rice-related specieseither as publicly accessible data tracks or importable geneexpression Atlas data as described later Methylation sitedata derived from bisulphite sequencing of two maize ge-netic backgrounds (21) are now displayed in the maizegenome browser (Figure 1B) New data and software ad-vances have also contributed to improved annotation tracksin maize including long non-coding RNAs (21) and proteincoding genes annotated with MAKER-P (2223)

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

Nucleic Acids Research 2016 Vol 44 Database issue D1135

Figure 1 New genome visualizations and improvements to the Gramene database (A) Species tree of 39 plant reference genomes available in Gramene(B) Visualization of maize methylomic variation (Regulski et al 2013) (C) Improvements to BLAST include multiple target species selection processtracking system and data integration with genome browser visualizations

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

D1136 Nucleic Acids Research 2016 Vol 44 Database issue

PLANT GENETIC DIVERSITY AND SEQUENCE VARI-ATION

Gramene provides SNP andor structural variation datasets for 11 genomes (Supplementary Table S6 (24ndash39)) Thenew bread wheat variation data include over 15 millionSNPs from the Wheat HapMap study (36) about 725 000non-redundant SNPs imported from CerealsDB (AxiomiSelect and KASP array sets httpwwwcerealsdbuknet)and over 103 million inter-homoeologous variants As partof the wheat genome analysis we provide homoeologousSNPs (ie variants between each two of the A B andD wheat component genomes (40) (Kersey et al this is-sue)) as well as their corresponding sequence alignmentsagainst the Hordeum vulgare genome in the wheat and bar-ley genome browsers Sorghum variation was enriched withover 65 million SNPs genotyped in 45 Sorghum bicolor linesby aligning them against the BTx623 reference genomeplus two S propinquum lines reported by Mace et al (35)and about 265 000 SNPs genotyped in 378 lines from theUS Sorghum Association Panel (SAP) (34) The Panzea27 genotyping-by-sequencing (GBS) data set (httpwwwpanzeaorg) consisting of almost 720 000 SNPs typed in 16718 maize and teosinte lines was newly added to our maizegenetic variation collection The barley variation collectionbenefitted from the addition of about 5 million SNPs from90 Morex times Barke individuals and about 6 million vari-ants from 84 Oregon Wolfe barley individuals determinedby POPSEQ (41) in addition to sim2600 markers associatedwith the SNPs from the Illumina iSelect 9k barley SNP chip(42) The new variation data for tomato include over 71million SNPs obtained from whole-genome sequencing of84 tomato accessions and related species (32) SNPs identi-fied in the de-novo transcriptome studies on T monococ-cum (sim500 000 SNPs) (29) and Brachypodium distachyon(sim340 000 SNPs) (27) are also being provided We continueto assign putative functional and structural consequencesto variants on the genes analyzed with the Ensembl varianteffect predictor (VEP) tool (43) Consequences can be vi-sualized in the context of transcript structure and proteindomains For many studies we also provide information onthe genotypes of individual plant accessions and their phe-notypes

BETTER BLAST

Each Gramene release brings new features through ad-vances in the Ensembl software infrastructure As of July2015 Gramene makes use of Ensembl version 80 andwe improved our plant-specific Basic Local AlignmentSearch Tool (BLAST Figure 1C) BLAST improvementsinclude better integration of input and output with thegenome browser (eg direct input from a DNAprotein se-quence view page results linked to browser views includinggenome-wide distribution of hits local alignments associ-ated genes etc) multi-species target selection and trackingsystem for job progress and completion

PLANT PATHWAYS

The pathway portal of Gramene (httpwwwgrameneorgpathways) provides resources for comparative analysis of

plant pathways across several model and crop plant speciesThe Plant Reactome (httpplantreactomegrameneorg) adatabase of plant metabolic and regulatory pathways wasdeveloped collaboratively by Gramene and the Human Re-actome project (44) The Reactome data model organizesgene products small molecules and macromolecular inter-actions into reactions and pathways to build a systems-levelframework of an eukaryotic organism and to illustrate howit is influenced by intrinsic developmental cues and in re-sponse to various signals originating in the surrounding en-vironment

The Plant Reactome features O sativa as a referencespecies To build the reference pathway database cellular-level metabolic networks from RiceCyc (45) were importedinto the Reactome data structure via BioPax v20 Thedatabase was further enriched by adding a small set ofhighly conserved signaling and regulatory pathways (egcell cycle DNA replication transcription translation etc)that were projected from the curated human Reactomebased on gene homology From this initial platform we con-tinue to curate metabolic and regulatory pathways in riceRecently we began to curate and link signaling and reg-ulatory pathways The signaling pathways involve variousplant hormones such as brassinosteroid auxin ethyleneabscisic acid and strigolactone Currently the Plant Reac-tome database contains sim240 rice reference pathways ofwhich 200 were manually curated

The pathways are organized based on a hierarchical clas-sification (Figure 2A) The detailed view of a pathway showsa graphical display of the various associated reactions en-zymes metabolites and cofactors in the context of their sub-cellular locations (Figure 2B) Typically details and sum-mary of various entities (eg overview molecules struc-tures download links etc) are provided below the path-way diagram as shown in Figure 2 Links are also providedto Gramene gene loci and several external resources likeUniProt PIR ChEBI PubChem PubMed and GO Foradvanced users the data are also accessible for downloadin various formats from the download tab and via APIs

We developed gene homology-based pathway projec-tions for 33 plant species (httpplantreactomegrameneorgstatshtml) by using O sativa ssp japonica as a refer-ence Gene IDs and corresponding UniProt IDs of genesfrom curated rice pathways were used as a reference to iden-tify orthologs from plant species with a sequenced genomeandor transcriptomes even when the data might not havebeen served from the Gramene browser For example weapplied Inparanoid clustered orthology data on annotatedleaf transcriptomes of wild Oryza species provided by theOMAP project (46) and the O sativa Aus Kasalath genome(47) which are not hosted by Gramene

In addition to search and browsing of pathways thenew version of Plant Reactome allows analysis of expres-sion data (eg transcriptome proteome metabolome etc)Users can upload custom data to conduct pathway enrich-ment analysis andor compare rice reference pathways withprojected pathways on another plant species To illustratethe utility of the pathway enrichment tool we uploadedtissue-specific expression data from Oryza sativa cv Nip-ponbare (Figure 2B) provided by the EMBL-EBI Expres-sion Atlas project (48) (Petryszak et al this issue)

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

Nucleic Acids Research 2016 Vol 44 Database issue D1137

Figure 2 A view of Gramenersquos Plant Reactome Pathway browser (httpplantreactomegrameneorg) showing the gene expression data overlay featureof the Analysis Tools The left panel (A) hierarchically lists pathways from reference species Oryza sativa along with the number of gene entities mappedfrom the gene expression data and the FDR value The upper-right panel (B) shows the network of entities from the starch biosynthesis pathway paintedwith the EBI Atlas baseline expression profiles (Petryszak et al 2013) of associated genes across seven tissue types from O sativa cv Nipponbare (httpwwwebiacukgxaexperimentsE-MTAB-2037) Gene products colored in blue and yellow indicate low and high relative expression values respectivelyDiagram entities simultaneously displaying multiple colors signify a defined set or complex of multiple gene products Display options at the top of thebrowser allow users to toggle the display of left and right panels and view a diagram key defining pathway entities reaction types and attributes If theusers are in the expression analysis panel as show here they will see a data analysis table (C) that provides a list of mapped pathways counts of genes andreactions and statistical parameters such as P-value FDR and average expression Users have options to download the analyzed data table The pop-updialogue box in this example (D) displays the constituent gene products for one such entity along with their expression ID and level

Gramene continues to provide access to the previ-ously described Pathway Tools-based metabolic networks(34549) from the iPlant Collaborative website (httppathwayiplantcollaborativeorg)

PLANT GENE EXPRESSION ATLAS

In collaboration with the EMBL-EBIrsquos Expression Atlas(httpwwwebiacukgxa) project (48) (Petryszak et althis issue) we provide information about gene expressionassayed in samples of different cell types organism partsdevelopmental stages diseases and other conditions Thesedata include microarray and RNA-sequencing studies im-ported from ArrayExpress selected for having at least threebiological replicates and other quality measures To system-atize individual studies into a common framework meta-

data was manually curated and annotated with ontologyterms prior to processing using standardized analysis meth-ods described by Petryszak et al (this issue) Atlas datacan be queried for both the baseline and differential ex-pression of genes Baseline expression data provided onlyfor carefully selected RNA-sequencing based studies re-port transcript abundance estimates for each gene in allsamples (control treated tissues cell types collection ofgermplasms) Differential expression data report statisti-cally significant differential gene expression in manually cu-rated pairwise comparisons between two sets of biologi-cal replicates ndash the lsquoreferencersquo set (eg lsquohealthyrsquo or lsquowildtypersquo) and a lsquotestrsquo set (eg lsquodiseasedrsquo or lsquomutantrsquo) As ofAugust 2015 Atlas contains 389 experiments in 11 speciesof plants (httpwwwebiacukgxaplantexperiments eg

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

D1138 Nucleic Acids Research 2016 Vol 44 Database issue

rice wheat maize and Arabidopsis) including seven base-line studies reporting expression in tissues strains and cul-tivars Atlasrsquos ability to display expression across all tissuesand all baseline studies in a juxtaposed manner in a singleintuitive interface makes it easy for the user to spot corre-lated patterns of expression across multiple lsquoomicsrsquo studiesAtlas offers a number of visualizations eg lsquoenrichmentrsquo ofGO terms Plant Reactome pathways and InterPro domainswithin each pairwise comparison The user interface alsoallows researchers to select the experimental condition andthe genes and to upload the expression data as a data trackon the plant genome browsers provided by Gramene andEnsembl Plants

NEW PUBLIC WEB SERVICE

Gramene now provides a public web service at httpdatagrameneorg that collects and links data from GramenersquosEnsembl and Plant Reactome resources exposing themthrough a RESTful API alongside the Ensembl and Reac-tome REST APIs This service is the basis for developmentof our web-based search analysis and visualization toolsbut is open to all developers

In order to provide a comprehensive gene search ser-vice we decided to integrate Gramenersquos Ensembl geneswith pathway Interpro domain and plant and GO associ-ations using MongoDB Gramenersquos genes document collec-tion (GDC) contains basic information on each gene (egID name description species genomic location cross ref-erences etc) as well as references to Compara gene treesassociated GO and PO terms (50) Interpro domains andpathways Separate document collections are maintainedfor each of these structured annotation types to facilitatecustomization and extension of the database in the fu-ture After incorporation of some additional indexing fieldsfields from the GDC are loaded into a Solr core AuxiliaryMongoDB collections are also loaded into Solr so that wecan suggest relevant filters given a partial query string Weuse Solr for its advanced querying and facet-counting fea-tures and MongoDB for its flexible support for structureddocuments

Using this integrated service it is possible to composecomplex queries across all hosted genomes and obtain a va-riety of summary statistics for the genes in the result set Theweb services provided through httpdatagrameneorg aredocumented with Swagger and open to all developers

WEBINARS AND VIDEO-TUTORIALS

In November 2014 Gramene began to offer monthly we-binars focused on providing an overview of resources andfunctionalities available to users of the Gramene databasefor comparative plant genomics as well as explaining howusers can visualize and analyze data of their choice usingvarious built-in tools The recorded webinars are availableon Gramenersquos YouTube channel (httpsgooglqQ2Pjn)Examples include webinars focused on specific plant species(eg Arabidopsis rice maize etc) andor specific resources(eg genome browser Plant Reactome GrameneMart ex-pression data analysis phylogenetic comparisons and genetrees etc) We are open to suggestions for webinar topics

via e-mail (webinarsgrameneorg) and we benefit fromuser feedback via post-webinar surveys (eg httpswwwsurveymonkeycomr3J2J2M9)

DISCUSSION AND FUTURE PERSPECTIVE

Currently in its 15th year the Gramene database has be-come an unparalleled resource for integrated genomic andpathway information for major model and crop plantsSince the last NAR update in January 2014 Grameneadded 12 new plant genome assemblies to a total of 39fully sequenced reference genomes many of which werealso updated with newer assembly versions andor anno-tations We added sim100 million new genetic variants to anet total of over 190 million plant genetic and structuralvariants from 11 plant species scored on a diversity ofgermplasm accessions Thus the bulk of the new variationdata set is comprised of over 70 million SNPs determined in84 tomato accessions (httpwwwtomatogenomenet) (32)over 12 million bread wheat SNPs (including 10 million ofinter-homoeologous wheat variants Wheat HapMap SNPsand wheat SNPs from the CerealsDB database) as wellas maize SNPs typed in 16 718 maize and teosinte linesfrom the Panzea 27 GBS set (httpwwwpanzeaorggenotypescctl) In addition we expanded our data collec-tion describing the transcriptome and epigenome of eco-nomically important species like rice and maize

In the next year besides projected growth of existing ge-nomic and genetic data we aim to consolidate our phe-notypic data collection by updating our QTL informationand storing and visualizing genome-wide association stud-ies data

Further user interface and search improvements includedevelopment of a comparative genomics-based search inter-face available at httpsearchgrameneorg (beta version)that queries and connects data from the public web servicesdescribed above This enables users to find genes of interestby asking specific questionsndashndashrather than performing a gen-eral text searchndashndashand to see the distribution of results acrossall genomes in Gramene The speed with which results arereturned enables an lsquoexploratoryrsquo navigation style when aninteresting gene is identified it can be used as the basis offurther searches eg to see all available homologs or findgenes with similar domain structures In its alpha releasethe search interface is a JavaScript web application built us-ing ReactJS interface library with visualizations developedin D3 and derived from the KBase platform The web appli-cation is open source with source code public and managedat Github Releases are built using Node Package Manager(lsquonpmrsquo) and deployed automatically using Travis-CI

We will continue to interact develop and provide value-added data and analysis tools to plant biologists breedersgeneticists molecular biologists biochemists and bioinfor-matics researchers With special emphasis on inter an intra-specific analysis of small and large-scale data sets gener-ated from global crop and emerging plant model specieswe expect that researchers will benefit by building hypothe-ses andor validating the existing or new knowledge aboutnovel genes their function expression role in pathwaysphenotypes and associated functional and structural vari-ations

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

Nucleic Acids Research 2016 Vol 44 Database issue D1139

Our monthly webinar series will continue to consider thespecific needs of the plant community addressing trend-ing topics and targeting communities working on particularspecies that would most benefit from direct assistance forworking with Gramenersquos data and resources

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

The authors are grateful to Gramenersquos users researchersand numerous collaborators for sharing data sets generatedin their projects valuable suggestions and feedback on im-proving the overall quality of Gramene as a community re-source We would also like to thank the Cold Spring HarborLaboratory (CSHL) and the Dolan DNA Learning Cen-ter the Center for Genome Research and Biocomputing(CGRB) at Oregon State University and the Ontario In-stitute for Cancer Research (OICR) for infrastructure sup-port We also thank Peter van Buren from Cold Spring Har-bor Laboratory for system administration support DavidCroft (EBI) and Robin Haw (OICR) for support on PlantReactome development undergraduate students Dylan Be-orchia Teague Green and Kindra Amoss (OSU) for theirhelp in pathway curation Samuel Fox and Matthew Geniza(OSU) on gene expression Atlas curation and data analysis

FUNDING

National Science Foundation [IOS-0703908 and IOS-1127112] United States Department of Agriculture - Agri-cultural Research Service [58-1907-4-030 and 1907-21000-030-00D to DW] European Communityrsquos 7th Frame-work Programme (FP72007-2013 Infrastructures) [con-tract 283496 to PK] United Kingdom Biotechnol-ogy and Biosciences Research Council [BBJ000328X1I0080711 and H5315191 to PK] The in-kind infrastruc-ture and intellectual support for the development and run-ning the Plant Reactome is supported by the Reactomedatabase project via a grant from the US National Institutesof Health [P41 HG003751 to LS] EU grant [LSHG-CT-2005-518254] lsquoENFINrsquo Ontario Research Fund EBI In-dustry Programme The funders had no role in the study de-sign data analysis or preparation of the manuscript Fund-ing for open access charge ldquoGramene - Exploring Func-tion through Comparative Genomics and Network Analy-sisrdquo [NSF award 1127112]Conflict of interest statement None declared

REFERENCES1 CunninghamF AmodeMR BarrellD BealK BillisK

BrentS Carvalho-SilvaD ClaphamP CoatesG FitzgeraldSet al (2015) Ensembl 2015 Nucleic Acids Res 43 D662ndashD669

2 CroftD (2013) Building models using Reactome pathways astemplates Methods Mol Biol 1021 273ndash283

3 MonacoMK SteinJ NaithaniS WeiS DharmawardhanaPKumariS AmarasingheV Youens-ClarkK ThomasonJ PreeceJet al (2014) Gramene 2013 comparative plant genomics resourcesNucleic Acids Res 42 D1193ndashD1199

4 SpoonerW Youens-ClarkK StainesD and WareD (2012)GrameneMart the BioMart data portal for the Gramene projectDatabase J Biol Databases Curation bar056

5 Youens-ClarkK BucklerE CasstevensT ChenC DeclerckGDerwentP DharmawardhanaP JaiswalP KerseyPKarthikeyanAS et al (2011) Gramene database in 2010 updatesand extensions Nucleic Acids Res 39 D1085ndashD1094

6 NakamuraY CochraneG Karsch-MizrachiI and InternationalNucleotide Sequence Database C (2013) The internationalnucleotide sequence database collaboration Nucleic Acids Res 41D21ndashD24

7 JacqueminJ BhatiaD SinghK and WingRA (2013) TheInternational Oryza Map Alignment Project development of agenus-wide comparative genomics platform to help solve the 9billion-people question Curr Opin Plant Biol 16 147ndash156

8 International Wheat Genome Sequencing C (2014) Achromosome-based draft sequence of the hexaploid bread wheat(Triticum aestivum) genome Science 345 1251788

9 ChapmanJA MascherM BulucA BarryK GeorganasESessionA StrnadovaV JenkinsJ SehgalS OlikerL et al(2015) A whole-genome shotgun approach for assembling andanchoring the hexaploid bread wheat genome Genome Biol 16 26

10 ParkinIA KohC TangH RobinsonSJ KagaleS ClarkeWETownCD NixonJ KrishnakumarV BidwellSL et al (2014)Transcriptome and methylome profiling reveals relics of genomedominance in the mesopolyploid Brassica oleracea Genome Biol 15R77

11 MotamayorJC MockaitisK SchmutzJ HaiminenNLivingstoneD 3rd CornejoO FindleySD ZhengP UtroFRoyaertS et al (2013) The genome sequence of the most widelycultivated cacao type and its use to identify candidate genesregulating pod color Genome Biol 14 r53

12 International Peach Genome I VerdeI AbbottAG ScalabrinSJungS ShuS MarroniF ZhebentyayevaT DettoriMTGrimwoodJ et al (2013) The high-quality draft genome of peach(Prunus persica) identifies unique patterns of genetic diversitydomestication and genome evolution Nat Genet 45 487ndash494

13 Amborella GenomeP (2013) The Amborella genome and theevolution of flowering plants Science 342 1241089

14 ChamalaS ChanderbaliAS DerJP LanT WaltsBAlbertVA dePamphilisCW Leebens-MackJ RounsleySSchusterSC et al (2013) Assembly and validation of the genome ofthe nonmodel basal angiosperm Amborella Science 342 1516ndash1517

15 PalenikB GrimwoodJ AertsA RouzeP SalamovAPutnamN DupontC JorgensenR DerelleE RombautsS et al(2007) The tiny eukaryote Ostreococcus provides genomic insightsinto the paradox of plankton speciation Proc Natl Acad SciUSA 104 7705ndash7710

16 KerseyPJ StainesDM LawsonD KuleshaE DerwentPHumphreyJC HughesDS KeenanS KerhornouAKoscielnyG et al (2012) Ensembl Genomes an integrative resourcefor genome-scale data from non-vertebrate species Nucleic AcidsRes 40 D91ndashD97

17 KentWJ BaertschR HinrichsA MillerW and HausslerD(2003) Evolutionrsquos cauldron duplication deletion andrearrangement in the mouse and human genomes Proc Natl AcadSci USA 100 11484ndash11489

18 HarrisRS (2007) Improved pairwise alignment of genomic DNA PhDThesis Pennsylvania State University

19 VilellaAJ SeverinJ Ureta-VidalA HengL DurbinR andBirneyE (2009) EnsemblCompara GeneTrees completeduplication-aware phylogenetic trees in vertebrates GenomeRes 19 327ndash335

20 ErhardKF Jr TalbotJE DeansNC McClishAE andHollickJB (2015) Nascent transcription affected by RNApolymerase IV in Zea mays Genetics 199 1107ndash1125

21 RegulskiM LuZ KendallJ DonoghueMT ReindersJLlacaV DeschampsS SmithA LevyD McCombieWR et al(2013) The maize methylome influences mRNA splice sites andreveals widespread paramutation-like switches guided by small RNAGenome Res 23 1651ndash1662

22 CampbellMS LawM HoltC SteinJC MogheGDHufnagelDE LeiJ AchawanantakunR JiaoD LawrenceCJet al (2014) MAKER-P a tool kit for the rapid creation

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

D1140 Nucleic Acids Research 2016 Vol 44 Database issue

management and quality control of plant genome annotations PlantPhysiol 164 513ndash524

23 LawM ChildsKL CampbellMS SteinJC OlsonAJ HoltCPanchyN LeiJ JiaoD AndorfCM et al (2015) Automatedupdate revision and quality control of the maize genomeannotations using MAKER-P improves the B73 RefGen v3 genemodels and identifies new genes Plant Physiol 167 25ndash39

24 ClarkRM SchweikertG ToomajianC OssowskiS ZellerGShinnP WarthmannN HuTT FuG HindsDA et al (2007)Common sequence polymorphisms shaping genetic diversity inArabidopsis thaliana Science 317 338ndash342

25 AtwellS HuangYS VilhjalmssonBJ WillemsG HortonMLiY MengD PlattA TaroneAM HuTT et al (2010)Genome-wide association study of 107 phenotypes in Arabidopsisthaliana inbred lines Nature 465 627ndash631

26 GanX StegleO BehrJ SteffenJG DreweP HildebrandKLLyngsoeR SchultheissSJ OsborneEJ SreedharanVT et al(2011) Multiple reference genomes and transcriptomes forArabidopsis thaliana Nature 477 419ndash423

27 FoxSE PreeceJ KimbrelJA MarchiniGL SageAYouens-ClarkK CruzanMB and JaiswalP (2013) Sequencing andde novo transcriptome assembly of Brachypodium sylvaticum(Poaceae) Appl Plant Sci 1

28 International Barley Genome Sequencing C MayerKFWaughR BrownJW SchulmanA LangridgeP PlatzerMFincherGB MuehlbauerGJ SatoK et al (2012) A physicalgenetic and functional sequence assembly of the barley genomeNature 491 711ndash716

29 FoxSE GenizaM HanumappaM NaithaniS SullivanCPreeceJ TiwariVK ElserJ LeonardJM SageA et al (2014) Denovo transcriptome assembly and analyses of gene expression duringphotomorphogenesis in diploid wheat Triticum monococcum PLoSOne 9 e96855

30 McNallyKL ChildsKL BohnertR DavidsonRM ZhaoKUlatVJ ZellerG ClarkRM HoenDR BureauTE et al (2009)Genomewide SNP variation reveals relationships among landracesand modern varieties of rice Proc Natl Acad Sci USA 10612273ndash12278

31 ZhaoK WrightM KimballJ EizengaG McClungAKovachM TyagiW AliML TungCW ReynoldsA et al (2010)Genomic diversity and introgression in O sativa reveal the impact ofdomestication and breeding on the rice genome PLoS One 5 e10780

32 Tomato Genome Sequencing C AflitosS SchijlenE de JongHde RidderD SmitS FinkersR WangJ ZhangG LiN et al(2014) Exploring genetic variation in the tomato (Solanum sectionLycopersicon) clade by whole-genome sequencing Plant J 80136ndash148

33 ZhengLY GuoXS HeB SunLJ PengY DongSS LiuTFJiangS RamachandranS LiuCM et al (2011) Genome-widepatterns of genetic variation in sweet and grain sorghum (Sorghumbicolor) Genome Biol 12 R114

34 MorrisGP RamuP DeshpandeSP HashCT ShahTUpadhyayaHD Riera-LizarazuO BrownPJ AcharyaCBMitchellSE et al (2013) Population genomic and genome-wideassociation studies of agroclimatic traits in sorghum Proc NatlAcad Sci USA 110 453ndash458

35 MaceES TaiS GildingEK LiY PrentisPJ BianLCampbellBC HuW InnesDJ HanX et al (2013)Whole-genome sequencing reveals untapped genetic potential inAfricarsquos indigenous cereal crop sorghum Nat Commun 4 2320

36 JordanKW WangS LunY GardinerLJ MacLachlanRHuclP WiebeK WongD ForrestKL ConsortiumI et al (2015)

A haplotype map of allohexaploid wheat reveals distinct patterns ofselection on homoeologous genomes Genome Biol 16 48

37 MylesS ChiaJM HurwitzB SimonC ZhongGY BucklerEand WareD (2010) Rapid genomic characterization of the genusvitis PLoS One 5 e8219

38 GoreMA ChiaJM ElshireRJ SunQ ErsozESHurwitzBL PeifferJA McMullenMD GrillsGSRoss-IbarraJ et al (2009) A first-generation haplotype map ofmaize Science 326 1115ndash1117

39 ChiaJM SongC BradburyPJ CostichD de LeonNDoebleyJ ElshireRJ GautB GellerL GlaubitzJC et al (2012)Maize HapMap2 identifies extant variation from a genome in fluxNat Genet 44 803ndash807

40 BrenchleyR SpannaglM PfeiferM BarkerGL DrsquoAmoreRAllenAM McKenzieN KramerM KerhornouA BolserDet al (2012) Analysis of the bread wheat genome using whole-genomeshotgun sequencing Nature 491 705ndash710

41 MascherM MuehlbauerGJ RokhsarDS ChapmanJSchmutzJ BarryK Munoz-AmatriainM CloseTJ WiseRPSchulmanAH et al (2013) Anchoring and ordering NGS contigassemblies by population sequencing (POPSEQ) Plant J 76718ndash727

42 ComadranJ KilianB RussellJ RamsayL SteinN GanalMShawP BayerM ThomasW MarshallD et al (2012) Naturalvariation in a homolog of Antirrhinum CENTRORADIALIScontributed to spring growth habit and environmental adaptation incultivated barley Nat Genet 44 1388ndash1392

43 ChenY CunninghamF RiosD McLarenWM SmithJPritchardB SpudichGM BrentS KuleshaE Marin-GarciaPet al (2010) Ensembl variation resources BMC Genomics 11 293

44 CroftD MundoAF HawR MilacicM WeiserJ WuGCaudyM GarapatiP GillespieM KamdarMR et al (2014) TheReactome pathway knowledgebase Nucleic Acids Res 42D472ndashD477

45 DharmawardhanaP RenL AmarasingheV MonacoMThomasonJ RavenscroftD McCouchS WareD and JaiswalP(2013) A genome scale metabolic network for rice and accompanyinganalysis of tryptophan auxin and serotonin biosynthesis regulationunder biotic stress Rice 6 15

46 WingRA AmmirajuJS LuoM KimH YuY KudrnaDGoicoecheaJL WangW NelsonW RaoK et al (2005) The oryzamap alignment project the golden path to unlocking the geneticpotential of wild rice species Plant Mol Biol 59 53ndash62

47 SakaiH KanamoriH Arai-KichiseY Shibata-HattaMEbanaK OonoY KuritaK FujisawaH KatagiriS MukaiYet al (2014) Construction of pseudomolecule sequences of the ausrice cultivar Kasalath for comparative genomics of Asian cultivatedrice DNA Res 21 397ndash405

48 PetryszakR BurdettT FiorelliB FonsecaNAGonzalez-PortaM HastingsE HuberW JuppS KeaysMKryvychN et al (2014) Expression Atlas updatendasha database of geneand transcript expression from microarray- and sequencing-basedfunctional genomics experiments Nucleic Acids Res 42 D926ndashD932

49 MonacoMK SenTZ DharmawardhanaPD RenLSchaefferM NaithaniS AmarasingheV ThomasonJ HarperLand GardinerJ (2013) Maize metabolic network construction andtranscriptome analysis Plant Genome 6

50 CooperL WallsRL ElserJ GandolfoMA StevensonDWSmithB PreeceJ AthreyaB MungallCJ RensingS et al (2013)The plant ontology as a tool for comparative plant anatomy andgenomic analyses Plant Cell Physiol 54 e1

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

Page 2: Gramene 2016: comparative plant genomics and pathway resources › download › pdf › 33027963.pdf · Gramene 2016: comparative plant genomics and pathway resources Marcela K. Tello-Ruiz1,

D1134 Nucleic Acids Research 2016 Vol 44 Database issue

functional ontologies variation data and comparativephylogenomics For the past six years Gramene haspartnered with the Plants division of Ensembl Genomes(httpplantsensemblorg) to jointly produce a commongenome browser mirrored yet independently hosted by theUnited States and the United Kingdom respectively ThisUSA-European collaboration has facilitated the timelyadoption of the software updates and novelty tools thatfrequently accompany Ensembl releases (1)

Gramene developed a comprehensive framework formetabolic signaling and genetic networks at the systemslevel for plants using the Reactome data model analysis andvisualization platform (2) also known as The Plant Reac-tome The current version of The Plant Reactome databasewebsite (httpplantreactomegrameneorg) has a new userinterface and features 238 rice pathways (sim80 of whichwere manually curated) which also serve as reference forprojection of orthologous pathways for 33 plant specieswith sequenced and annotated plant genomes

Gramenersquos software platforms provide region-specific(eg genome browser) or pathway-specific (eg Plant Reac-tome) data downloads In addition project data are avail-able for customizable downloads from the GrameneMart(34) bulk downloads via File Transfer Protocol (ftpftpgrameneorgpubgramene) and programmatic access viapublic MySQL (5) and a new RESTful API (httpdatagrameneorg) together with REST APIs from Plant Reac-tome and Ensembl This article summarizes Gramene up-dates since the last report in this journal (3) through the47th release of the Gramene database in August 2015 Thewebsite database and its contents continue to be updatedfive times per year changes can be followed at the Gramenenews portal (httpwwwgrameneorgblog) and by brows-ing the sitersquos release notes (httpwwwgrameneorgrelease-notes)

NEW PLANT GENOMES

Gramene added 12 new fully sequenced reference genomes(Supplementary Table S1) bringing the total to 39 Thecomposition of selected species reflected community needsbalanced with other considerations such as assembly qual-ity and placement within the plant phylogenetic tree aswell as collaborative support In addition all referencegenome assemblies are accessioned with the InternationalNucleotide Sequence Database Collaboration (INSDChttpwwwinsdcorg) (6) a data sharing policy standardadopted by Gramene and Ensembl Plants As shown inFigure 1A and Supplementary Table S1 the list of in-cluded species provides broad taxonomic representationacross green plants with greater representation in partic-ular clades such as rice wheat and the Brassicas In all theset includes the genomes of 21 monocots 12 core eudicots1 basal angiosperm and 5 non-flowering plants which col-lectively serve both crop and model organism research com-munities Five of the newly added species are wild relativesof rice Oryza meridionalis O glumaepatula O nivara Orufipogon O longistaminata and the rice outgroup Leersiaperrieri which were sequenced by an international consor-tium of researchers known as I-OMAP (7) With the ad-dition of these species Gramene now hosts data for almost

half of the 23 extant species in the Oryza genus providing anunprecedented resource for rice research Deep species rep-resentation is also provided within the Triticaceae includ-ing the first chromosome-scale assembly of hexaploid breadwheat (89) now the largest genome (16 Gbp) in GrameneNewly added eudicot genomes include three nutritionallyand economically important species the cole crops (Bras-sica oleracea) (10) cocoa (Theobroma cacao) (11) and peach(Prunus persica) (12) Two other new species are of partic-ular interest in diverse fields of research The addition ofAmborella trichopoda (1314) basal to Angiosperms (flow-ering plants) should aid in efforts to elucidate archaic andderived genomic features within the angiosperm tree of lifeThe genome of one of the smallest known free-living eu-karyotes a unicellular green alga Ostreococcus lucimarinus(15) has attracted intense research interest due to its centralrole in the oceanic carbon cycle

ANNOTATION AND COMPARATIVE GENOMICS

A principal aim of Gramene is to integrate and draw con-nections between genomes through consistent functionalannotation and comparative analyses For each genomethe community-recognized gene annotation set is character-ized for InterPro domains assigned Gene Ontology (GO)and Plant Ontology terms (Supplementary Table S2) (16)and cross-referenced to source databases Genomes aremapped by whole genome alignment between selected pairsof species (1718) In the past year the number of avail-able pairwise alignments increased from 64 to 133 withparticular emphasis on comparisons within the rice andwheat-related species (Supplementary Table S3) In addi-tion protein-coding genes from all species were used tobuild phylogenetic gene trees using the Ensembl ComparaGene Trees pipeline (19) In addition to providing inferredevolutionary histories of gene families this pipeline iden-tifies orthologous relationships between genes which aresubsequently used to build synteny maps between suffi-ciently related species (Supplementary Table S4) (5) Thegene tree pipeline also outputs contiguous lsquosplitrsquo gene mod-els frequently associated with either the artifacts of mis-annotation or evolutionary adaptations We provide thesplit gene data as species-wise downloadable files with eachrelease (Supplementary Table S5)

Advances in genomics technologies and research overthe last several years have provided new types of data forGramene to display particularly in the realms of gene ex-pression regulation and epigenetics RNA-seq alignmenttracks representing whole transcriptome expression in di-verse tissues conditions or genetic backgrounds are nowdisplayed for wheat maize (20) and all rice-related specieseither as publicly accessible data tracks or importable geneexpression Atlas data as described later Methylation sitedata derived from bisulphite sequencing of two maize ge-netic backgrounds (21) are now displayed in the maizegenome browser (Figure 1B) New data and software ad-vances have also contributed to improved annotation tracksin maize including long non-coding RNAs (21) and proteincoding genes annotated with MAKER-P (2223)

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

Nucleic Acids Research 2016 Vol 44 Database issue D1135

Figure 1 New genome visualizations and improvements to the Gramene database (A) Species tree of 39 plant reference genomes available in Gramene(B) Visualization of maize methylomic variation (Regulski et al 2013) (C) Improvements to BLAST include multiple target species selection processtracking system and data integration with genome browser visualizations

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

D1136 Nucleic Acids Research 2016 Vol 44 Database issue

PLANT GENETIC DIVERSITY AND SEQUENCE VARI-ATION

Gramene provides SNP andor structural variation datasets for 11 genomes (Supplementary Table S6 (24ndash39)) Thenew bread wheat variation data include over 15 millionSNPs from the Wheat HapMap study (36) about 725 000non-redundant SNPs imported from CerealsDB (AxiomiSelect and KASP array sets httpwwwcerealsdbuknet)and over 103 million inter-homoeologous variants As partof the wheat genome analysis we provide homoeologousSNPs (ie variants between each two of the A B andD wheat component genomes (40) (Kersey et al this is-sue)) as well as their corresponding sequence alignmentsagainst the Hordeum vulgare genome in the wheat and bar-ley genome browsers Sorghum variation was enriched withover 65 million SNPs genotyped in 45 Sorghum bicolor linesby aligning them against the BTx623 reference genomeplus two S propinquum lines reported by Mace et al (35)and about 265 000 SNPs genotyped in 378 lines from theUS Sorghum Association Panel (SAP) (34) The Panzea27 genotyping-by-sequencing (GBS) data set (httpwwwpanzeaorg) consisting of almost 720 000 SNPs typed in 16718 maize and teosinte lines was newly added to our maizegenetic variation collection The barley variation collectionbenefitted from the addition of about 5 million SNPs from90 Morex times Barke individuals and about 6 million vari-ants from 84 Oregon Wolfe barley individuals determinedby POPSEQ (41) in addition to sim2600 markers associatedwith the SNPs from the Illumina iSelect 9k barley SNP chip(42) The new variation data for tomato include over 71million SNPs obtained from whole-genome sequencing of84 tomato accessions and related species (32) SNPs identi-fied in the de-novo transcriptome studies on T monococ-cum (sim500 000 SNPs) (29) and Brachypodium distachyon(sim340 000 SNPs) (27) are also being provided We continueto assign putative functional and structural consequencesto variants on the genes analyzed with the Ensembl varianteffect predictor (VEP) tool (43) Consequences can be vi-sualized in the context of transcript structure and proteindomains For many studies we also provide information onthe genotypes of individual plant accessions and their phe-notypes

BETTER BLAST

Each Gramene release brings new features through ad-vances in the Ensembl software infrastructure As of July2015 Gramene makes use of Ensembl version 80 andwe improved our plant-specific Basic Local AlignmentSearch Tool (BLAST Figure 1C) BLAST improvementsinclude better integration of input and output with thegenome browser (eg direct input from a DNAprotein se-quence view page results linked to browser views includinggenome-wide distribution of hits local alignments associ-ated genes etc) multi-species target selection and trackingsystem for job progress and completion

PLANT PATHWAYS

The pathway portal of Gramene (httpwwwgrameneorgpathways) provides resources for comparative analysis of

plant pathways across several model and crop plant speciesThe Plant Reactome (httpplantreactomegrameneorg) adatabase of plant metabolic and regulatory pathways wasdeveloped collaboratively by Gramene and the Human Re-actome project (44) The Reactome data model organizesgene products small molecules and macromolecular inter-actions into reactions and pathways to build a systems-levelframework of an eukaryotic organism and to illustrate howit is influenced by intrinsic developmental cues and in re-sponse to various signals originating in the surrounding en-vironment

The Plant Reactome features O sativa as a referencespecies To build the reference pathway database cellular-level metabolic networks from RiceCyc (45) were importedinto the Reactome data structure via BioPax v20 Thedatabase was further enriched by adding a small set ofhighly conserved signaling and regulatory pathways (egcell cycle DNA replication transcription translation etc)that were projected from the curated human Reactomebased on gene homology From this initial platform we con-tinue to curate metabolic and regulatory pathways in riceRecently we began to curate and link signaling and reg-ulatory pathways The signaling pathways involve variousplant hormones such as brassinosteroid auxin ethyleneabscisic acid and strigolactone Currently the Plant Reac-tome database contains sim240 rice reference pathways ofwhich 200 were manually curated

The pathways are organized based on a hierarchical clas-sification (Figure 2A) The detailed view of a pathway showsa graphical display of the various associated reactions en-zymes metabolites and cofactors in the context of their sub-cellular locations (Figure 2B) Typically details and sum-mary of various entities (eg overview molecules struc-tures download links etc) are provided below the path-way diagram as shown in Figure 2 Links are also providedto Gramene gene loci and several external resources likeUniProt PIR ChEBI PubChem PubMed and GO Foradvanced users the data are also accessible for downloadin various formats from the download tab and via APIs

We developed gene homology-based pathway projec-tions for 33 plant species (httpplantreactomegrameneorgstatshtml) by using O sativa ssp japonica as a refer-ence Gene IDs and corresponding UniProt IDs of genesfrom curated rice pathways were used as a reference to iden-tify orthologs from plant species with a sequenced genomeandor transcriptomes even when the data might not havebeen served from the Gramene browser For example weapplied Inparanoid clustered orthology data on annotatedleaf transcriptomes of wild Oryza species provided by theOMAP project (46) and the O sativa Aus Kasalath genome(47) which are not hosted by Gramene

In addition to search and browsing of pathways thenew version of Plant Reactome allows analysis of expres-sion data (eg transcriptome proteome metabolome etc)Users can upload custom data to conduct pathway enrich-ment analysis andor compare rice reference pathways withprojected pathways on another plant species To illustratethe utility of the pathway enrichment tool we uploadedtissue-specific expression data from Oryza sativa cv Nip-ponbare (Figure 2B) provided by the EMBL-EBI Expres-sion Atlas project (48) (Petryszak et al this issue)

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

Nucleic Acids Research 2016 Vol 44 Database issue D1137

Figure 2 A view of Gramenersquos Plant Reactome Pathway browser (httpplantreactomegrameneorg) showing the gene expression data overlay featureof the Analysis Tools The left panel (A) hierarchically lists pathways from reference species Oryza sativa along with the number of gene entities mappedfrom the gene expression data and the FDR value The upper-right panel (B) shows the network of entities from the starch biosynthesis pathway paintedwith the EBI Atlas baseline expression profiles (Petryszak et al 2013) of associated genes across seven tissue types from O sativa cv Nipponbare (httpwwwebiacukgxaexperimentsE-MTAB-2037) Gene products colored in blue and yellow indicate low and high relative expression values respectivelyDiagram entities simultaneously displaying multiple colors signify a defined set or complex of multiple gene products Display options at the top of thebrowser allow users to toggle the display of left and right panels and view a diagram key defining pathway entities reaction types and attributes If theusers are in the expression analysis panel as show here they will see a data analysis table (C) that provides a list of mapped pathways counts of genes andreactions and statistical parameters such as P-value FDR and average expression Users have options to download the analyzed data table The pop-updialogue box in this example (D) displays the constituent gene products for one such entity along with their expression ID and level

Gramene continues to provide access to the previ-ously described Pathway Tools-based metabolic networks(34549) from the iPlant Collaborative website (httppathwayiplantcollaborativeorg)

PLANT GENE EXPRESSION ATLAS

In collaboration with the EMBL-EBIrsquos Expression Atlas(httpwwwebiacukgxa) project (48) (Petryszak et althis issue) we provide information about gene expressionassayed in samples of different cell types organism partsdevelopmental stages diseases and other conditions Thesedata include microarray and RNA-sequencing studies im-ported from ArrayExpress selected for having at least threebiological replicates and other quality measures To system-atize individual studies into a common framework meta-

data was manually curated and annotated with ontologyterms prior to processing using standardized analysis meth-ods described by Petryszak et al (this issue) Atlas datacan be queried for both the baseline and differential ex-pression of genes Baseline expression data provided onlyfor carefully selected RNA-sequencing based studies re-port transcript abundance estimates for each gene in allsamples (control treated tissues cell types collection ofgermplasms) Differential expression data report statisti-cally significant differential gene expression in manually cu-rated pairwise comparisons between two sets of biologi-cal replicates ndash the lsquoreferencersquo set (eg lsquohealthyrsquo or lsquowildtypersquo) and a lsquotestrsquo set (eg lsquodiseasedrsquo or lsquomutantrsquo) As ofAugust 2015 Atlas contains 389 experiments in 11 speciesof plants (httpwwwebiacukgxaplantexperiments eg

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

D1138 Nucleic Acids Research 2016 Vol 44 Database issue

rice wheat maize and Arabidopsis) including seven base-line studies reporting expression in tissues strains and cul-tivars Atlasrsquos ability to display expression across all tissuesand all baseline studies in a juxtaposed manner in a singleintuitive interface makes it easy for the user to spot corre-lated patterns of expression across multiple lsquoomicsrsquo studiesAtlas offers a number of visualizations eg lsquoenrichmentrsquo ofGO terms Plant Reactome pathways and InterPro domainswithin each pairwise comparison The user interface alsoallows researchers to select the experimental condition andthe genes and to upload the expression data as a data trackon the plant genome browsers provided by Gramene andEnsembl Plants

NEW PUBLIC WEB SERVICE

Gramene now provides a public web service at httpdatagrameneorg that collects and links data from GramenersquosEnsembl and Plant Reactome resources exposing themthrough a RESTful API alongside the Ensembl and Reac-tome REST APIs This service is the basis for developmentof our web-based search analysis and visualization toolsbut is open to all developers

In order to provide a comprehensive gene search ser-vice we decided to integrate Gramenersquos Ensembl geneswith pathway Interpro domain and plant and GO associ-ations using MongoDB Gramenersquos genes document collec-tion (GDC) contains basic information on each gene (egID name description species genomic location cross ref-erences etc) as well as references to Compara gene treesassociated GO and PO terms (50) Interpro domains andpathways Separate document collections are maintainedfor each of these structured annotation types to facilitatecustomization and extension of the database in the fu-ture After incorporation of some additional indexing fieldsfields from the GDC are loaded into a Solr core AuxiliaryMongoDB collections are also loaded into Solr so that wecan suggest relevant filters given a partial query string Weuse Solr for its advanced querying and facet-counting fea-tures and MongoDB for its flexible support for structureddocuments

Using this integrated service it is possible to composecomplex queries across all hosted genomes and obtain a va-riety of summary statistics for the genes in the result set Theweb services provided through httpdatagrameneorg aredocumented with Swagger and open to all developers

WEBINARS AND VIDEO-TUTORIALS

In November 2014 Gramene began to offer monthly we-binars focused on providing an overview of resources andfunctionalities available to users of the Gramene databasefor comparative plant genomics as well as explaining howusers can visualize and analyze data of their choice usingvarious built-in tools The recorded webinars are availableon Gramenersquos YouTube channel (httpsgooglqQ2Pjn)Examples include webinars focused on specific plant species(eg Arabidopsis rice maize etc) andor specific resources(eg genome browser Plant Reactome GrameneMart ex-pression data analysis phylogenetic comparisons and genetrees etc) We are open to suggestions for webinar topics

via e-mail (webinarsgrameneorg) and we benefit fromuser feedback via post-webinar surveys (eg httpswwwsurveymonkeycomr3J2J2M9)

DISCUSSION AND FUTURE PERSPECTIVE

Currently in its 15th year the Gramene database has be-come an unparalleled resource for integrated genomic andpathway information for major model and crop plantsSince the last NAR update in January 2014 Grameneadded 12 new plant genome assemblies to a total of 39fully sequenced reference genomes many of which werealso updated with newer assembly versions andor anno-tations We added sim100 million new genetic variants to anet total of over 190 million plant genetic and structuralvariants from 11 plant species scored on a diversity ofgermplasm accessions Thus the bulk of the new variationdata set is comprised of over 70 million SNPs determined in84 tomato accessions (httpwwwtomatogenomenet) (32)over 12 million bread wheat SNPs (including 10 million ofinter-homoeologous wheat variants Wheat HapMap SNPsand wheat SNPs from the CerealsDB database) as wellas maize SNPs typed in 16 718 maize and teosinte linesfrom the Panzea 27 GBS set (httpwwwpanzeaorggenotypescctl) In addition we expanded our data collec-tion describing the transcriptome and epigenome of eco-nomically important species like rice and maize

In the next year besides projected growth of existing ge-nomic and genetic data we aim to consolidate our phe-notypic data collection by updating our QTL informationand storing and visualizing genome-wide association stud-ies data

Further user interface and search improvements includedevelopment of a comparative genomics-based search inter-face available at httpsearchgrameneorg (beta version)that queries and connects data from the public web servicesdescribed above This enables users to find genes of interestby asking specific questionsndashndashrather than performing a gen-eral text searchndashndashand to see the distribution of results acrossall genomes in Gramene The speed with which results arereturned enables an lsquoexploratoryrsquo navigation style when aninteresting gene is identified it can be used as the basis offurther searches eg to see all available homologs or findgenes with similar domain structures In its alpha releasethe search interface is a JavaScript web application built us-ing ReactJS interface library with visualizations developedin D3 and derived from the KBase platform The web appli-cation is open source with source code public and managedat Github Releases are built using Node Package Manager(lsquonpmrsquo) and deployed automatically using Travis-CI

We will continue to interact develop and provide value-added data and analysis tools to plant biologists breedersgeneticists molecular biologists biochemists and bioinfor-matics researchers With special emphasis on inter an intra-specific analysis of small and large-scale data sets gener-ated from global crop and emerging plant model specieswe expect that researchers will benefit by building hypothe-ses andor validating the existing or new knowledge aboutnovel genes their function expression role in pathwaysphenotypes and associated functional and structural vari-ations

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

Nucleic Acids Research 2016 Vol 44 Database issue D1139

Our monthly webinar series will continue to consider thespecific needs of the plant community addressing trend-ing topics and targeting communities working on particularspecies that would most benefit from direct assistance forworking with Gramenersquos data and resources

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

The authors are grateful to Gramenersquos users researchersand numerous collaborators for sharing data sets generatedin their projects valuable suggestions and feedback on im-proving the overall quality of Gramene as a community re-source We would also like to thank the Cold Spring HarborLaboratory (CSHL) and the Dolan DNA Learning Cen-ter the Center for Genome Research and Biocomputing(CGRB) at Oregon State University and the Ontario In-stitute for Cancer Research (OICR) for infrastructure sup-port We also thank Peter van Buren from Cold Spring Har-bor Laboratory for system administration support DavidCroft (EBI) and Robin Haw (OICR) for support on PlantReactome development undergraduate students Dylan Be-orchia Teague Green and Kindra Amoss (OSU) for theirhelp in pathway curation Samuel Fox and Matthew Geniza(OSU) on gene expression Atlas curation and data analysis

FUNDING

National Science Foundation [IOS-0703908 and IOS-1127112] United States Department of Agriculture - Agri-cultural Research Service [58-1907-4-030 and 1907-21000-030-00D to DW] European Communityrsquos 7th Frame-work Programme (FP72007-2013 Infrastructures) [con-tract 283496 to PK] United Kingdom Biotechnol-ogy and Biosciences Research Council [BBJ000328X1I0080711 and H5315191 to PK] The in-kind infrastruc-ture and intellectual support for the development and run-ning the Plant Reactome is supported by the Reactomedatabase project via a grant from the US National Institutesof Health [P41 HG003751 to LS] EU grant [LSHG-CT-2005-518254] lsquoENFINrsquo Ontario Research Fund EBI In-dustry Programme The funders had no role in the study de-sign data analysis or preparation of the manuscript Fund-ing for open access charge ldquoGramene - Exploring Func-tion through Comparative Genomics and Network Analy-sisrdquo [NSF award 1127112]Conflict of interest statement None declared

REFERENCES1 CunninghamF AmodeMR BarrellD BealK BillisK

BrentS Carvalho-SilvaD ClaphamP CoatesG FitzgeraldSet al (2015) Ensembl 2015 Nucleic Acids Res 43 D662ndashD669

2 CroftD (2013) Building models using Reactome pathways astemplates Methods Mol Biol 1021 273ndash283

3 MonacoMK SteinJ NaithaniS WeiS DharmawardhanaPKumariS AmarasingheV Youens-ClarkK ThomasonJ PreeceJet al (2014) Gramene 2013 comparative plant genomics resourcesNucleic Acids Res 42 D1193ndashD1199

4 SpoonerW Youens-ClarkK StainesD and WareD (2012)GrameneMart the BioMart data portal for the Gramene projectDatabase J Biol Databases Curation bar056

5 Youens-ClarkK BucklerE CasstevensT ChenC DeclerckGDerwentP DharmawardhanaP JaiswalP KerseyPKarthikeyanAS et al (2011) Gramene database in 2010 updatesand extensions Nucleic Acids Res 39 D1085ndashD1094

6 NakamuraY CochraneG Karsch-MizrachiI and InternationalNucleotide Sequence Database C (2013) The internationalnucleotide sequence database collaboration Nucleic Acids Res 41D21ndashD24

7 JacqueminJ BhatiaD SinghK and WingRA (2013) TheInternational Oryza Map Alignment Project development of agenus-wide comparative genomics platform to help solve the 9billion-people question Curr Opin Plant Biol 16 147ndash156

8 International Wheat Genome Sequencing C (2014) Achromosome-based draft sequence of the hexaploid bread wheat(Triticum aestivum) genome Science 345 1251788

9 ChapmanJA MascherM BulucA BarryK GeorganasESessionA StrnadovaV JenkinsJ SehgalS OlikerL et al(2015) A whole-genome shotgun approach for assembling andanchoring the hexaploid bread wheat genome Genome Biol 16 26

10 ParkinIA KohC TangH RobinsonSJ KagaleS ClarkeWETownCD NixonJ KrishnakumarV BidwellSL et al (2014)Transcriptome and methylome profiling reveals relics of genomedominance in the mesopolyploid Brassica oleracea Genome Biol 15R77

11 MotamayorJC MockaitisK SchmutzJ HaiminenNLivingstoneD 3rd CornejoO FindleySD ZhengP UtroFRoyaertS et al (2013) The genome sequence of the most widelycultivated cacao type and its use to identify candidate genesregulating pod color Genome Biol 14 r53

12 International Peach Genome I VerdeI AbbottAG ScalabrinSJungS ShuS MarroniF ZhebentyayevaT DettoriMTGrimwoodJ et al (2013) The high-quality draft genome of peach(Prunus persica) identifies unique patterns of genetic diversitydomestication and genome evolution Nat Genet 45 487ndash494

13 Amborella GenomeP (2013) The Amborella genome and theevolution of flowering plants Science 342 1241089

14 ChamalaS ChanderbaliAS DerJP LanT WaltsBAlbertVA dePamphilisCW Leebens-MackJ RounsleySSchusterSC et al (2013) Assembly and validation of the genome ofthe nonmodel basal angiosperm Amborella Science 342 1516ndash1517

15 PalenikB GrimwoodJ AertsA RouzeP SalamovAPutnamN DupontC JorgensenR DerelleE RombautsS et al(2007) The tiny eukaryote Ostreococcus provides genomic insightsinto the paradox of plankton speciation Proc Natl Acad SciUSA 104 7705ndash7710

16 KerseyPJ StainesDM LawsonD KuleshaE DerwentPHumphreyJC HughesDS KeenanS KerhornouAKoscielnyG et al (2012) Ensembl Genomes an integrative resourcefor genome-scale data from non-vertebrate species Nucleic AcidsRes 40 D91ndashD97

17 KentWJ BaertschR HinrichsA MillerW and HausslerD(2003) Evolutionrsquos cauldron duplication deletion andrearrangement in the mouse and human genomes Proc Natl AcadSci USA 100 11484ndash11489

18 HarrisRS (2007) Improved pairwise alignment of genomic DNA PhDThesis Pennsylvania State University

19 VilellaAJ SeverinJ Ureta-VidalA HengL DurbinR andBirneyE (2009) EnsemblCompara GeneTrees completeduplication-aware phylogenetic trees in vertebrates GenomeRes 19 327ndash335

20 ErhardKF Jr TalbotJE DeansNC McClishAE andHollickJB (2015) Nascent transcription affected by RNApolymerase IV in Zea mays Genetics 199 1107ndash1125

21 RegulskiM LuZ KendallJ DonoghueMT ReindersJLlacaV DeschampsS SmithA LevyD McCombieWR et al(2013) The maize methylome influences mRNA splice sites andreveals widespread paramutation-like switches guided by small RNAGenome Res 23 1651ndash1662

22 CampbellMS LawM HoltC SteinJC MogheGDHufnagelDE LeiJ AchawanantakunR JiaoD LawrenceCJet al (2014) MAKER-P a tool kit for the rapid creation

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

D1140 Nucleic Acids Research 2016 Vol 44 Database issue

management and quality control of plant genome annotations PlantPhysiol 164 513ndash524

23 LawM ChildsKL CampbellMS SteinJC OlsonAJ HoltCPanchyN LeiJ JiaoD AndorfCM et al (2015) Automatedupdate revision and quality control of the maize genomeannotations using MAKER-P improves the B73 RefGen v3 genemodels and identifies new genes Plant Physiol 167 25ndash39

24 ClarkRM SchweikertG ToomajianC OssowskiS ZellerGShinnP WarthmannN HuTT FuG HindsDA et al (2007)Common sequence polymorphisms shaping genetic diversity inArabidopsis thaliana Science 317 338ndash342

25 AtwellS HuangYS VilhjalmssonBJ WillemsG HortonMLiY MengD PlattA TaroneAM HuTT et al (2010)Genome-wide association study of 107 phenotypes in Arabidopsisthaliana inbred lines Nature 465 627ndash631

26 GanX StegleO BehrJ SteffenJG DreweP HildebrandKLLyngsoeR SchultheissSJ OsborneEJ SreedharanVT et al(2011) Multiple reference genomes and transcriptomes forArabidopsis thaliana Nature 477 419ndash423

27 FoxSE PreeceJ KimbrelJA MarchiniGL SageAYouens-ClarkK CruzanMB and JaiswalP (2013) Sequencing andde novo transcriptome assembly of Brachypodium sylvaticum(Poaceae) Appl Plant Sci 1

28 International Barley Genome Sequencing C MayerKFWaughR BrownJW SchulmanA LangridgeP PlatzerMFincherGB MuehlbauerGJ SatoK et al (2012) A physicalgenetic and functional sequence assembly of the barley genomeNature 491 711ndash716

29 FoxSE GenizaM HanumappaM NaithaniS SullivanCPreeceJ TiwariVK ElserJ LeonardJM SageA et al (2014) Denovo transcriptome assembly and analyses of gene expression duringphotomorphogenesis in diploid wheat Triticum monococcum PLoSOne 9 e96855

30 McNallyKL ChildsKL BohnertR DavidsonRM ZhaoKUlatVJ ZellerG ClarkRM HoenDR BureauTE et al (2009)Genomewide SNP variation reveals relationships among landracesand modern varieties of rice Proc Natl Acad Sci USA 10612273ndash12278

31 ZhaoK WrightM KimballJ EizengaG McClungAKovachM TyagiW AliML TungCW ReynoldsA et al (2010)Genomic diversity and introgression in O sativa reveal the impact ofdomestication and breeding on the rice genome PLoS One 5 e10780

32 Tomato Genome Sequencing C AflitosS SchijlenE de JongHde RidderD SmitS FinkersR WangJ ZhangG LiN et al(2014) Exploring genetic variation in the tomato (Solanum sectionLycopersicon) clade by whole-genome sequencing Plant J 80136ndash148

33 ZhengLY GuoXS HeB SunLJ PengY DongSS LiuTFJiangS RamachandranS LiuCM et al (2011) Genome-widepatterns of genetic variation in sweet and grain sorghum (Sorghumbicolor) Genome Biol 12 R114

34 MorrisGP RamuP DeshpandeSP HashCT ShahTUpadhyayaHD Riera-LizarazuO BrownPJ AcharyaCBMitchellSE et al (2013) Population genomic and genome-wideassociation studies of agroclimatic traits in sorghum Proc NatlAcad Sci USA 110 453ndash458

35 MaceES TaiS GildingEK LiY PrentisPJ BianLCampbellBC HuW InnesDJ HanX et al (2013)Whole-genome sequencing reveals untapped genetic potential inAfricarsquos indigenous cereal crop sorghum Nat Commun 4 2320

36 JordanKW WangS LunY GardinerLJ MacLachlanRHuclP WiebeK WongD ForrestKL ConsortiumI et al (2015)

A haplotype map of allohexaploid wheat reveals distinct patterns ofselection on homoeologous genomes Genome Biol 16 48

37 MylesS ChiaJM HurwitzB SimonC ZhongGY BucklerEand WareD (2010) Rapid genomic characterization of the genusvitis PLoS One 5 e8219

38 GoreMA ChiaJM ElshireRJ SunQ ErsozESHurwitzBL PeifferJA McMullenMD GrillsGSRoss-IbarraJ et al (2009) A first-generation haplotype map ofmaize Science 326 1115ndash1117

39 ChiaJM SongC BradburyPJ CostichD de LeonNDoebleyJ ElshireRJ GautB GellerL GlaubitzJC et al (2012)Maize HapMap2 identifies extant variation from a genome in fluxNat Genet 44 803ndash807

40 BrenchleyR SpannaglM PfeiferM BarkerGL DrsquoAmoreRAllenAM McKenzieN KramerM KerhornouA BolserDet al (2012) Analysis of the bread wheat genome using whole-genomeshotgun sequencing Nature 491 705ndash710

41 MascherM MuehlbauerGJ RokhsarDS ChapmanJSchmutzJ BarryK Munoz-AmatriainM CloseTJ WiseRPSchulmanAH et al (2013) Anchoring and ordering NGS contigassemblies by population sequencing (POPSEQ) Plant J 76718ndash727

42 ComadranJ KilianB RussellJ RamsayL SteinN GanalMShawP BayerM ThomasW MarshallD et al (2012) Naturalvariation in a homolog of Antirrhinum CENTRORADIALIScontributed to spring growth habit and environmental adaptation incultivated barley Nat Genet 44 1388ndash1392

43 ChenY CunninghamF RiosD McLarenWM SmithJPritchardB SpudichGM BrentS KuleshaE Marin-GarciaPet al (2010) Ensembl variation resources BMC Genomics 11 293

44 CroftD MundoAF HawR MilacicM WeiserJ WuGCaudyM GarapatiP GillespieM KamdarMR et al (2014) TheReactome pathway knowledgebase Nucleic Acids Res 42D472ndashD477

45 DharmawardhanaP RenL AmarasingheV MonacoMThomasonJ RavenscroftD McCouchS WareD and JaiswalP(2013) A genome scale metabolic network for rice and accompanyinganalysis of tryptophan auxin and serotonin biosynthesis regulationunder biotic stress Rice 6 15

46 WingRA AmmirajuJS LuoM KimH YuY KudrnaDGoicoecheaJL WangW NelsonW RaoK et al (2005) The oryzamap alignment project the golden path to unlocking the geneticpotential of wild rice species Plant Mol Biol 59 53ndash62

47 SakaiH KanamoriH Arai-KichiseY Shibata-HattaMEbanaK OonoY KuritaK FujisawaH KatagiriS MukaiYet al (2014) Construction of pseudomolecule sequences of the ausrice cultivar Kasalath for comparative genomics of Asian cultivatedrice DNA Res 21 397ndash405

48 PetryszakR BurdettT FiorelliB FonsecaNAGonzalez-PortaM HastingsE HuberW JuppS KeaysMKryvychN et al (2014) Expression Atlas updatendasha database of geneand transcript expression from microarray- and sequencing-basedfunctional genomics experiments Nucleic Acids Res 42 D926ndashD932

49 MonacoMK SenTZ DharmawardhanaPD RenLSchaefferM NaithaniS AmarasingheV ThomasonJ HarperLand GardinerJ (2013) Maize metabolic network construction andtranscriptome analysis Plant Genome 6

50 CooperL WallsRL ElserJ GandolfoMA StevensonDWSmithB PreeceJ AthreyaB MungallCJ RensingS et al (2013)The plant ontology as a tool for comparative plant anatomy andgenomic analyses Plant Cell Physiol 54 e1

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

Page 3: Gramene 2016: comparative plant genomics and pathway resources › download › pdf › 33027963.pdf · Gramene 2016: comparative plant genomics and pathway resources Marcela K. Tello-Ruiz1,

Nucleic Acids Research 2016 Vol 44 Database issue D1135

Figure 1 New genome visualizations and improvements to the Gramene database (A) Species tree of 39 plant reference genomes available in Gramene(B) Visualization of maize methylomic variation (Regulski et al 2013) (C) Improvements to BLAST include multiple target species selection processtracking system and data integration with genome browser visualizations

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

D1136 Nucleic Acids Research 2016 Vol 44 Database issue

PLANT GENETIC DIVERSITY AND SEQUENCE VARI-ATION

Gramene provides SNP andor structural variation datasets for 11 genomes (Supplementary Table S6 (24ndash39)) Thenew bread wheat variation data include over 15 millionSNPs from the Wheat HapMap study (36) about 725 000non-redundant SNPs imported from CerealsDB (AxiomiSelect and KASP array sets httpwwwcerealsdbuknet)and over 103 million inter-homoeologous variants As partof the wheat genome analysis we provide homoeologousSNPs (ie variants between each two of the A B andD wheat component genomes (40) (Kersey et al this is-sue)) as well as their corresponding sequence alignmentsagainst the Hordeum vulgare genome in the wheat and bar-ley genome browsers Sorghum variation was enriched withover 65 million SNPs genotyped in 45 Sorghum bicolor linesby aligning them against the BTx623 reference genomeplus two S propinquum lines reported by Mace et al (35)and about 265 000 SNPs genotyped in 378 lines from theUS Sorghum Association Panel (SAP) (34) The Panzea27 genotyping-by-sequencing (GBS) data set (httpwwwpanzeaorg) consisting of almost 720 000 SNPs typed in 16718 maize and teosinte lines was newly added to our maizegenetic variation collection The barley variation collectionbenefitted from the addition of about 5 million SNPs from90 Morex times Barke individuals and about 6 million vari-ants from 84 Oregon Wolfe barley individuals determinedby POPSEQ (41) in addition to sim2600 markers associatedwith the SNPs from the Illumina iSelect 9k barley SNP chip(42) The new variation data for tomato include over 71million SNPs obtained from whole-genome sequencing of84 tomato accessions and related species (32) SNPs identi-fied in the de-novo transcriptome studies on T monococ-cum (sim500 000 SNPs) (29) and Brachypodium distachyon(sim340 000 SNPs) (27) are also being provided We continueto assign putative functional and structural consequencesto variants on the genes analyzed with the Ensembl varianteffect predictor (VEP) tool (43) Consequences can be vi-sualized in the context of transcript structure and proteindomains For many studies we also provide information onthe genotypes of individual plant accessions and their phe-notypes

BETTER BLAST

Each Gramene release brings new features through ad-vances in the Ensembl software infrastructure As of July2015 Gramene makes use of Ensembl version 80 andwe improved our plant-specific Basic Local AlignmentSearch Tool (BLAST Figure 1C) BLAST improvementsinclude better integration of input and output with thegenome browser (eg direct input from a DNAprotein se-quence view page results linked to browser views includinggenome-wide distribution of hits local alignments associ-ated genes etc) multi-species target selection and trackingsystem for job progress and completion

PLANT PATHWAYS

The pathway portal of Gramene (httpwwwgrameneorgpathways) provides resources for comparative analysis of

plant pathways across several model and crop plant speciesThe Plant Reactome (httpplantreactomegrameneorg) adatabase of plant metabolic and regulatory pathways wasdeveloped collaboratively by Gramene and the Human Re-actome project (44) The Reactome data model organizesgene products small molecules and macromolecular inter-actions into reactions and pathways to build a systems-levelframework of an eukaryotic organism and to illustrate howit is influenced by intrinsic developmental cues and in re-sponse to various signals originating in the surrounding en-vironment

The Plant Reactome features O sativa as a referencespecies To build the reference pathway database cellular-level metabolic networks from RiceCyc (45) were importedinto the Reactome data structure via BioPax v20 Thedatabase was further enriched by adding a small set ofhighly conserved signaling and regulatory pathways (egcell cycle DNA replication transcription translation etc)that were projected from the curated human Reactomebased on gene homology From this initial platform we con-tinue to curate metabolic and regulatory pathways in riceRecently we began to curate and link signaling and reg-ulatory pathways The signaling pathways involve variousplant hormones such as brassinosteroid auxin ethyleneabscisic acid and strigolactone Currently the Plant Reac-tome database contains sim240 rice reference pathways ofwhich 200 were manually curated

The pathways are organized based on a hierarchical clas-sification (Figure 2A) The detailed view of a pathway showsa graphical display of the various associated reactions en-zymes metabolites and cofactors in the context of their sub-cellular locations (Figure 2B) Typically details and sum-mary of various entities (eg overview molecules struc-tures download links etc) are provided below the path-way diagram as shown in Figure 2 Links are also providedto Gramene gene loci and several external resources likeUniProt PIR ChEBI PubChem PubMed and GO Foradvanced users the data are also accessible for downloadin various formats from the download tab and via APIs

We developed gene homology-based pathway projec-tions for 33 plant species (httpplantreactomegrameneorgstatshtml) by using O sativa ssp japonica as a refer-ence Gene IDs and corresponding UniProt IDs of genesfrom curated rice pathways were used as a reference to iden-tify orthologs from plant species with a sequenced genomeandor transcriptomes even when the data might not havebeen served from the Gramene browser For example weapplied Inparanoid clustered orthology data on annotatedleaf transcriptomes of wild Oryza species provided by theOMAP project (46) and the O sativa Aus Kasalath genome(47) which are not hosted by Gramene

In addition to search and browsing of pathways thenew version of Plant Reactome allows analysis of expres-sion data (eg transcriptome proteome metabolome etc)Users can upload custom data to conduct pathway enrich-ment analysis andor compare rice reference pathways withprojected pathways on another plant species To illustratethe utility of the pathway enrichment tool we uploadedtissue-specific expression data from Oryza sativa cv Nip-ponbare (Figure 2B) provided by the EMBL-EBI Expres-sion Atlas project (48) (Petryszak et al this issue)

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

Nucleic Acids Research 2016 Vol 44 Database issue D1137

Figure 2 A view of Gramenersquos Plant Reactome Pathway browser (httpplantreactomegrameneorg) showing the gene expression data overlay featureof the Analysis Tools The left panel (A) hierarchically lists pathways from reference species Oryza sativa along with the number of gene entities mappedfrom the gene expression data and the FDR value The upper-right panel (B) shows the network of entities from the starch biosynthesis pathway paintedwith the EBI Atlas baseline expression profiles (Petryszak et al 2013) of associated genes across seven tissue types from O sativa cv Nipponbare (httpwwwebiacukgxaexperimentsE-MTAB-2037) Gene products colored in blue and yellow indicate low and high relative expression values respectivelyDiagram entities simultaneously displaying multiple colors signify a defined set or complex of multiple gene products Display options at the top of thebrowser allow users to toggle the display of left and right panels and view a diagram key defining pathway entities reaction types and attributes If theusers are in the expression analysis panel as show here they will see a data analysis table (C) that provides a list of mapped pathways counts of genes andreactions and statistical parameters such as P-value FDR and average expression Users have options to download the analyzed data table The pop-updialogue box in this example (D) displays the constituent gene products for one such entity along with their expression ID and level

Gramene continues to provide access to the previ-ously described Pathway Tools-based metabolic networks(34549) from the iPlant Collaborative website (httppathwayiplantcollaborativeorg)

PLANT GENE EXPRESSION ATLAS

In collaboration with the EMBL-EBIrsquos Expression Atlas(httpwwwebiacukgxa) project (48) (Petryszak et althis issue) we provide information about gene expressionassayed in samples of different cell types organism partsdevelopmental stages diseases and other conditions Thesedata include microarray and RNA-sequencing studies im-ported from ArrayExpress selected for having at least threebiological replicates and other quality measures To system-atize individual studies into a common framework meta-

data was manually curated and annotated with ontologyterms prior to processing using standardized analysis meth-ods described by Petryszak et al (this issue) Atlas datacan be queried for both the baseline and differential ex-pression of genes Baseline expression data provided onlyfor carefully selected RNA-sequencing based studies re-port transcript abundance estimates for each gene in allsamples (control treated tissues cell types collection ofgermplasms) Differential expression data report statisti-cally significant differential gene expression in manually cu-rated pairwise comparisons between two sets of biologi-cal replicates ndash the lsquoreferencersquo set (eg lsquohealthyrsquo or lsquowildtypersquo) and a lsquotestrsquo set (eg lsquodiseasedrsquo or lsquomutantrsquo) As ofAugust 2015 Atlas contains 389 experiments in 11 speciesof plants (httpwwwebiacukgxaplantexperiments eg

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

D1138 Nucleic Acids Research 2016 Vol 44 Database issue

rice wheat maize and Arabidopsis) including seven base-line studies reporting expression in tissues strains and cul-tivars Atlasrsquos ability to display expression across all tissuesand all baseline studies in a juxtaposed manner in a singleintuitive interface makes it easy for the user to spot corre-lated patterns of expression across multiple lsquoomicsrsquo studiesAtlas offers a number of visualizations eg lsquoenrichmentrsquo ofGO terms Plant Reactome pathways and InterPro domainswithin each pairwise comparison The user interface alsoallows researchers to select the experimental condition andthe genes and to upload the expression data as a data trackon the plant genome browsers provided by Gramene andEnsembl Plants

NEW PUBLIC WEB SERVICE

Gramene now provides a public web service at httpdatagrameneorg that collects and links data from GramenersquosEnsembl and Plant Reactome resources exposing themthrough a RESTful API alongside the Ensembl and Reac-tome REST APIs This service is the basis for developmentof our web-based search analysis and visualization toolsbut is open to all developers

In order to provide a comprehensive gene search ser-vice we decided to integrate Gramenersquos Ensembl geneswith pathway Interpro domain and plant and GO associ-ations using MongoDB Gramenersquos genes document collec-tion (GDC) contains basic information on each gene (egID name description species genomic location cross ref-erences etc) as well as references to Compara gene treesassociated GO and PO terms (50) Interpro domains andpathways Separate document collections are maintainedfor each of these structured annotation types to facilitatecustomization and extension of the database in the fu-ture After incorporation of some additional indexing fieldsfields from the GDC are loaded into a Solr core AuxiliaryMongoDB collections are also loaded into Solr so that wecan suggest relevant filters given a partial query string Weuse Solr for its advanced querying and facet-counting fea-tures and MongoDB for its flexible support for structureddocuments

Using this integrated service it is possible to composecomplex queries across all hosted genomes and obtain a va-riety of summary statistics for the genes in the result set Theweb services provided through httpdatagrameneorg aredocumented with Swagger and open to all developers

WEBINARS AND VIDEO-TUTORIALS

In November 2014 Gramene began to offer monthly we-binars focused on providing an overview of resources andfunctionalities available to users of the Gramene databasefor comparative plant genomics as well as explaining howusers can visualize and analyze data of their choice usingvarious built-in tools The recorded webinars are availableon Gramenersquos YouTube channel (httpsgooglqQ2Pjn)Examples include webinars focused on specific plant species(eg Arabidopsis rice maize etc) andor specific resources(eg genome browser Plant Reactome GrameneMart ex-pression data analysis phylogenetic comparisons and genetrees etc) We are open to suggestions for webinar topics

via e-mail (webinarsgrameneorg) and we benefit fromuser feedback via post-webinar surveys (eg httpswwwsurveymonkeycomr3J2J2M9)

DISCUSSION AND FUTURE PERSPECTIVE

Currently in its 15th year the Gramene database has be-come an unparalleled resource for integrated genomic andpathway information for major model and crop plantsSince the last NAR update in January 2014 Grameneadded 12 new plant genome assemblies to a total of 39fully sequenced reference genomes many of which werealso updated with newer assembly versions andor anno-tations We added sim100 million new genetic variants to anet total of over 190 million plant genetic and structuralvariants from 11 plant species scored on a diversity ofgermplasm accessions Thus the bulk of the new variationdata set is comprised of over 70 million SNPs determined in84 tomato accessions (httpwwwtomatogenomenet) (32)over 12 million bread wheat SNPs (including 10 million ofinter-homoeologous wheat variants Wheat HapMap SNPsand wheat SNPs from the CerealsDB database) as wellas maize SNPs typed in 16 718 maize and teosinte linesfrom the Panzea 27 GBS set (httpwwwpanzeaorggenotypescctl) In addition we expanded our data collec-tion describing the transcriptome and epigenome of eco-nomically important species like rice and maize

In the next year besides projected growth of existing ge-nomic and genetic data we aim to consolidate our phe-notypic data collection by updating our QTL informationand storing and visualizing genome-wide association stud-ies data

Further user interface and search improvements includedevelopment of a comparative genomics-based search inter-face available at httpsearchgrameneorg (beta version)that queries and connects data from the public web servicesdescribed above This enables users to find genes of interestby asking specific questionsndashndashrather than performing a gen-eral text searchndashndashand to see the distribution of results acrossall genomes in Gramene The speed with which results arereturned enables an lsquoexploratoryrsquo navigation style when aninteresting gene is identified it can be used as the basis offurther searches eg to see all available homologs or findgenes with similar domain structures In its alpha releasethe search interface is a JavaScript web application built us-ing ReactJS interface library with visualizations developedin D3 and derived from the KBase platform The web appli-cation is open source with source code public and managedat Github Releases are built using Node Package Manager(lsquonpmrsquo) and deployed automatically using Travis-CI

We will continue to interact develop and provide value-added data and analysis tools to plant biologists breedersgeneticists molecular biologists biochemists and bioinfor-matics researchers With special emphasis on inter an intra-specific analysis of small and large-scale data sets gener-ated from global crop and emerging plant model specieswe expect that researchers will benefit by building hypothe-ses andor validating the existing or new knowledge aboutnovel genes their function expression role in pathwaysphenotypes and associated functional and structural vari-ations

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

Nucleic Acids Research 2016 Vol 44 Database issue D1139

Our monthly webinar series will continue to consider thespecific needs of the plant community addressing trend-ing topics and targeting communities working on particularspecies that would most benefit from direct assistance forworking with Gramenersquos data and resources

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

The authors are grateful to Gramenersquos users researchersand numerous collaborators for sharing data sets generatedin their projects valuable suggestions and feedback on im-proving the overall quality of Gramene as a community re-source We would also like to thank the Cold Spring HarborLaboratory (CSHL) and the Dolan DNA Learning Cen-ter the Center for Genome Research and Biocomputing(CGRB) at Oregon State University and the Ontario In-stitute for Cancer Research (OICR) for infrastructure sup-port We also thank Peter van Buren from Cold Spring Har-bor Laboratory for system administration support DavidCroft (EBI) and Robin Haw (OICR) for support on PlantReactome development undergraduate students Dylan Be-orchia Teague Green and Kindra Amoss (OSU) for theirhelp in pathway curation Samuel Fox and Matthew Geniza(OSU) on gene expression Atlas curation and data analysis

FUNDING

National Science Foundation [IOS-0703908 and IOS-1127112] United States Department of Agriculture - Agri-cultural Research Service [58-1907-4-030 and 1907-21000-030-00D to DW] European Communityrsquos 7th Frame-work Programme (FP72007-2013 Infrastructures) [con-tract 283496 to PK] United Kingdom Biotechnol-ogy and Biosciences Research Council [BBJ000328X1I0080711 and H5315191 to PK] The in-kind infrastruc-ture and intellectual support for the development and run-ning the Plant Reactome is supported by the Reactomedatabase project via a grant from the US National Institutesof Health [P41 HG003751 to LS] EU grant [LSHG-CT-2005-518254] lsquoENFINrsquo Ontario Research Fund EBI In-dustry Programme The funders had no role in the study de-sign data analysis or preparation of the manuscript Fund-ing for open access charge ldquoGramene - Exploring Func-tion through Comparative Genomics and Network Analy-sisrdquo [NSF award 1127112]Conflict of interest statement None declared

REFERENCES1 CunninghamF AmodeMR BarrellD BealK BillisK

BrentS Carvalho-SilvaD ClaphamP CoatesG FitzgeraldSet al (2015) Ensembl 2015 Nucleic Acids Res 43 D662ndashD669

2 CroftD (2013) Building models using Reactome pathways astemplates Methods Mol Biol 1021 273ndash283

3 MonacoMK SteinJ NaithaniS WeiS DharmawardhanaPKumariS AmarasingheV Youens-ClarkK ThomasonJ PreeceJet al (2014) Gramene 2013 comparative plant genomics resourcesNucleic Acids Res 42 D1193ndashD1199

4 SpoonerW Youens-ClarkK StainesD and WareD (2012)GrameneMart the BioMart data portal for the Gramene projectDatabase J Biol Databases Curation bar056

5 Youens-ClarkK BucklerE CasstevensT ChenC DeclerckGDerwentP DharmawardhanaP JaiswalP KerseyPKarthikeyanAS et al (2011) Gramene database in 2010 updatesand extensions Nucleic Acids Res 39 D1085ndashD1094

6 NakamuraY CochraneG Karsch-MizrachiI and InternationalNucleotide Sequence Database C (2013) The internationalnucleotide sequence database collaboration Nucleic Acids Res 41D21ndashD24

7 JacqueminJ BhatiaD SinghK and WingRA (2013) TheInternational Oryza Map Alignment Project development of agenus-wide comparative genomics platform to help solve the 9billion-people question Curr Opin Plant Biol 16 147ndash156

8 International Wheat Genome Sequencing C (2014) Achromosome-based draft sequence of the hexaploid bread wheat(Triticum aestivum) genome Science 345 1251788

9 ChapmanJA MascherM BulucA BarryK GeorganasESessionA StrnadovaV JenkinsJ SehgalS OlikerL et al(2015) A whole-genome shotgun approach for assembling andanchoring the hexaploid bread wheat genome Genome Biol 16 26

10 ParkinIA KohC TangH RobinsonSJ KagaleS ClarkeWETownCD NixonJ KrishnakumarV BidwellSL et al (2014)Transcriptome and methylome profiling reveals relics of genomedominance in the mesopolyploid Brassica oleracea Genome Biol 15R77

11 MotamayorJC MockaitisK SchmutzJ HaiminenNLivingstoneD 3rd CornejoO FindleySD ZhengP UtroFRoyaertS et al (2013) The genome sequence of the most widelycultivated cacao type and its use to identify candidate genesregulating pod color Genome Biol 14 r53

12 International Peach Genome I VerdeI AbbottAG ScalabrinSJungS ShuS MarroniF ZhebentyayevaT DettoriMTGrimwoodJ et al (2013) The high-quality draft genome of peach(Prunus persica) identifies unique patterns of genetic diversitydomestication and genome evolution Nat Genet 45 487ndash494

13 Amborella GenomeP (2013) The Amborella genome and theevolution of flowering plants Science 342 1241089

14 ChamalaS ChanderbaliAS DerJP LanT WaltsBAlbertVA dePamphilisCW Leebens-MackJ RounsleySSchusterSC et al (2013) Assembly and validation of the genome ofthe nonmodel basal angiosperm Amborella Science 342 1516ndash1517

15 PalenikB GrimwoodJ AertsA RouzeP SalamovAPutnamN DupontC JorgensenR DerelleE RombautsS et al(2007) The tiny eukaryote Ostreococcus provides genomic insightsinto the paradox of plankton speciation Proc Natl Acad SciUSA 104 7705ndash7710

16 KerseyPJ StainesDM LawsonD KuleshaE DerwentPHumphreyJC HughesDS KeenanS KerhornouAKoscielnyG et al (2012) Ensembl Genomes an integrative resourcefor genome-scale data from non-vertebrate species Nucleic AcidsRes 40 D91ndashD97

17 KentWJ BaertschR HinrichsA MillerW and HausslerD(2003) Evolutionrsquos cauldron duplication deletion andrearrangement in the mouse and human genomes Proc Natl AcadSci USA 100 11484ndash11489

18 HarrisRS (2007) Improved pairwise alignment of genomic DNA PhDThesis Pennsylvania State University

19 VilellaAJ SeverinJ Ureta-VidalA HengL DurbinR andBirneyE (2009) EnsemblCompara GeneTrees completeduplication-aware phylogenetic trees in vertebrates GenomeRes 19 327ndash335

20 ErhardKF Jr TalbotJE DeansNC McClishAE andHollickJB (2015) Nascent transcription affected by RNApolymerase IV in Zea mays Genetics 199 1107ndash1125

21 RegulskiM LuZ KendallJ DonoghueMT ReindersJLlacaV DeschampsS SmithA LevyD McCombieWR et al(2013) The maize methylome influences mRNA splice sites andreveals widespread paramutation-like switches guided by small RNAGenome Res 23 1651ndash1662

22 CampbellMS LawM HoltC SteinJC MogheGDHufnagelDE LeiJ AchawanantakunR JiaoD LawrenceCJet al (2014) MAKER-P a tool kit for the rapid creation

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

D1140 Nucleic Acids Research 2016 Vol 44 Database issue

management and quality control of plant genome annotations PlantPhysiol 164 513ndash524

23 LawM ChildsKL CampbellMS SteinJC OlsonAJ HoltCPanchyN LeiJ JiaoD AndorfCM et al (2015) Automatedupdate revision and quality control of the maize genomeannotations using MAKER-P improves the B73 RefGen v3 genemodels and identifies new genes Plant Physiol 167 25ndash39

24 ClarkRM SchweikertG ToomajianC OssowskiS ZellerGShinnP WarthmannN HuTT FuG HindsDA et al (2007)Common sequence polymorphisms shaping genetic diversity inArabidopsis thaliana Science 317 338ndash342

25 AtwellS HuangYS VilhjalmssonBJ WillemsG HortonMLiY MengD PlattA TaroneAM HuTT et al (2010)Genome-wide association study of 107 phenotypes in Arabidopsisthaliana inbred lines Nature 465 627ndash631

26 GanX StegleO BehrJ SteffenJG DreweP HildebrandKLLyngsoeR SchultheissSJ OsborneEJ SreedharanVT et al(2011) Multiple reference genomes and transcriptomes forArabidopsis thaliana Nature 477 419ndash423

27 FoxSE PreeceJ KimbrelJA MarchiniGL SageAYouens-ClarkK CruzanMB and JaiswalP (2013) Sequencing andde novo transcriptome assembly of Brachypodium sylvaticum(Poaceae) Appl Plant Sci 1

28 International Barley Genome Sequencing C MayerKFWaughR BrownJW SchulmanA LangridgeP PlatzerMFincherGB MuehlbauerGJ SatoK et al (2012) A physicalgenetic and functional sequence assembly of the barley genomeNature 491 711ndash716

29 FoxSE GenizaM HanumappaM NaithaniS SullivanCPreeceJ TiwariVK ElserJ LeonardJM SageA et al (2014) Denovo transcriptome assembly and analyses of gene expression duringphotomorphogenesis in diploid wheat Triticum monococcum PLoSOne 9 e96855

30 McNallyKL ChildsKL BohnertR DavidsonRM ZhaoKUlatVJ ZellerG ClarkRM HoenDR BureauTE et al (2009)Genomewide SNP variation reveals relationships among landracesand modern varieties of rice Proc Natl Acad Sci USA 10612273ndash12278

31 ZhaoK WrightM KimballJ EizengaG McClungAKovachM TyagiW AliML TungCW ReynoldsA et al (2010)Genomic diversity and introgression in O sativa reveal the impact ofdomestication and breeding on the rice genome PLoS One 5 e10780

32 Tomato Genome Sequencing C AflitosS SchijlenE de JongHde RidderD SmitS FinkersR WangJ ZhangG LiN et al(2014) Exploring genetic variation in the tomato (Solanum sectionLycopersicon) clade by whole-genome sequencing Plant J 80136ndash148

33 ZhengLY GuoXS HeB SunLJ PengY DongSS LiuTFJiangS RamachandranS LiuCM et al (2011) Genome-widepatterns of genetic variation in sweet and grain sorghum (Sorghumbicolor) Genome Biol 12 R114

34 MorrisGP RamuP DeshpandeSP HashCT ShahTUpadhyayaHD Riera-LizarazuO BrownPJ AcharyaCBMitchellSE et al (2013) Population genomic and genome-wideassociation studies of agroclimatic traits in sorghum Proc NatlAcad Sci USA 110 453ndash458

35 MaceES TaiS GildingEK LiY PrentisPJ BianLCampbellBC HuW InnesDJ HanX et al (2013)Whole-genome sequencing reveals untapped genetic potential inAfricarsquos indigenous cereal crop sorghum Nat Commun 4 2320

36 JordanKW WangS LunY GardinerLJ MacLachlanRHuclP WiebeK WongD ForrestKL ConsortiumI et al (2015)

A haplotype map of allohexaploid wheat reveals distinct patterns ofselection on homoeologous genomes Genome Biol 16 48

37 MylesS ChiaJM HurwitzB SimonC ZhongGY BucklerEand WareD (2010) Rapid genomic characterization of the genusvitis PLoS One 5 e8219

38 GoreMA ChiaJM ElshireRJ SunQ ErsozESHurwitzBL PeifferJA McMullenMD GrillsGSRoss-IbarraJ et al (2009) A first-generation haplotype map ofmaize Science 326 1115ndash1117

39 ChiaJM SongC BradburyPJ CostichD de LeonNDoebleyJ ElshireRJ GautB GellerL GlaubitzJC et al (2012)Maize HapMap2 identifies extant variation from a genome in fluxNat Genet 44 803ndash807

40 BrenchleyR SpannaglM PfeiferM BarkerGL DrsquoAmoreRAllenAM McKenzieN KramerM KerhornouA BolserDet al (2012) Analysis of the bread wheat genome using whole-genomeshotgun sequencing Nature 491 705ndash710

41 MascherM MuehlbauerGJ RokhsarDS ChapmanJSchmutzJ BarryK Munoz-AmatriainM CloseTJ WiseRPSchulmanAH et al (2013) Anchoring and ordering NGS contigassemblies by population sequencing (POPSEQ) Plant J 76718ndash727

42 ComadranJ KilianB RussellJ RamsayL SteinN GanalMShawP BayerM ThomasW MarshallD et al (2012) Naturalvariation in a homolog of Antirrhinum CENTRORADIALIScontributed to spring growth habit and environmental adaptation incultivated barley Nat Genet 44 1388ndash1392

43 ChenY CunninghamF RiosD McLarenWM SmithJPritchardB SpudichGM BrentS KuleshaE Marin-GarciaPet al (2010) Ensembl variation resources BMC Genomics 11 293

44 CroftD MundoAF HawR MilacicM WeiserJ WuGCaudyM GarapatiP GillespieM KamdarMR et al (2014) TheReactome pathway knowledgebase Nucleic Acids Res 42D472ndashD477

45 DharmawardhanaP RenL AmarasingheV MonacoMThomasonJ RavenscroftD McCouchS WareD and JaiswalP(2013) A genome scale metabolic network for rice and accompanyinganalysis of tryptophan auxin and serotonin biosynthesis regulationunder biotic stress Rice 6 15

46 WingRA AmmirajuJS LuoM KimH YuY KudrnaDGoicoecheaJL WangW NelsonW RaoK et al (2005) The oryzamap alignment project the golden path to unlocking the geneticpotential of wild rice species Plant Mol Biol 59 53ndash62

47 SakaiH KanamoriH Arai-KichiseY Shibata-HattaMEbanaK OonoY KuritaK FujisawaH KatagiriS MukaiYet al (2014) Construction of pseudomolecule sequences of the ausrice cultivar Kasalath for comparative genomics of Asian cultivatedrice DNA Res 21 397ndash405

48 PetryszakR BurdettT FiorelliB FonsecaNAGonzalez-PortaM HastingsE HuberW JuppS KeaysMKryvychN et al (2014) Expression Atlas updatendasha database of geneand transcript expression from microarray- and sequencing-basedfunctional genomics experiments Nucleic Acids Res 42 D926ndashD932

49 MonacoMK SenTZ DharmawardhanaPD RenLSchaefferM NaithaniS AmarasingheV ThomasonJ HarperLand GardinerJ (2013) Maize metabolic network construction andtranscriptome analysis Plant Genome 6

50 CooperL WallsRL ElserJ GandolfoMA StevensonDWSmithB PreeceJ AthreyaB MungallCJ RensingS et al (2013)The plant ontology as a tool for comparative plant anatomy andgenomic analyses Plant Cell Physiol 54 e1

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

Page 4: Gramene 2016: comparative plant genomics and pathway resources › download › pdf › 33027963.pdf · Gramene 2016: comparative plant genomics and pathway resources Marcela K. Tello-Ruiz1,

D1136 Nucleic Acids Research 2016 Vol 44 Database issue

PLANT GENETIC DIVERSITY AND SEQUENCE VARI-ATION

Gramene provides SNP andor structural variation datasets for 11 genomes (Supplementary Table S6 (24ndash39)) Thenew bread wheat variation data include over 15 millionSNPs from the Wheat HapMap study (36) about 725 000non-redundant SNPs imported from CerealsDB (AxiomiSelect and KASP array sets httpwwwcerealsdbuknet)and over 103 million inter-homoeologous variants As partof the wheat genome analysis we provide homoeologousSNPs (ie variants between each two of the A B andD wheat component genomes (40) (Kersey et al this is-sue)) as well as their corresponding sequence alignmentsagainst the Hordeum vulgare genome in the wheat and bar-ley genome browsers Sorghum variation was enriched withover 65 million SNPs genotyped in 45 Sorghum bicolor linesby aligning them against the BTx623 reference genomeplus two S propinquum lines reported by Mace et al (35)and about 265 000 SNPs genotyped in 378 lines from theUS Sorghum Association Panel (SAP) (34) The Panzea27 genotyping-by-sequencing (GBS) data set (httpwwwpanzeaorg) consisting of almost 720 000 SNPs typed in 16718 maize and teosinte lines was newly added to our maizegenetic variation collection The barley variation collectionbenefitted from the addition of about 5 million SNPs from90 Morex times Barke individuals and about 6 million vari-ants from 84 Oregon Wolfe barley individuals determinedby POPSEQ (41) in addition to sim2600 markers associatedwith the SNPs from the Illumina iSelect 9k barley SNP chip(42) The new variation data for tomato include over 71million SNPs obtained from whole-genome sequencing of84 tomato accessions and related species (32) SNPs identi-fied in the de-novo transcriptome studies on T monococ-cum (sim500 000 SNPs) (29) and Brachypodium distachyon(sim340 000 SNPs) (27) are also being provided We continueto assign putative functional and structural consequencesto variants on the genes analyzed with the Ensembl varianteffect predictor (VEP) tool (43) Consequences can be vi-sualized in the context of transcript structure and proteindomains For many studies we also provide information onthe genotypes of individual plant accessions and their phe-notypes

BETTER BLAST

Each Gramene release brings new features through ad-vances in the Ensembl software infrastructure As of July2015 Gramene makes use of Ensembl version 80 andwe improved our plant-specific Basic Local AlignmentSearch Tool (BLAST Figure 1C) BLAST improvementsinclude better integration of input and output with thegenome browser (eg direct input from a DNAprotein se-quence view page results linked to browser views includinggenome-wide distribution of hits local alignments associ-ated genes etc) multi-species target selection and trackingsystem for job progress and completion

PLANT PATHWAYS

The pathway portal of Gramene (httpwwwgrameneorgpathways) provides resources for comparative analysis of

plant pathways across several model and crop plant speciesThe Plant Reactome (httpplantreactomegrameneorg) adatabase of plant metabolic and regulatory pathways wasdeveloped collaboratively by Gramene and the Human Re-actome project (44) The Reactome data model organizesgene products small molecules and macromolecular inter-actions into reactions and pathways to build a systems-levelframework of an eukaryotic organism and to illustrate howit is influenced by intrinsic developmental cues and in re-sponse to various signals originating in the surrounding en-vironment

The Plant Reactome features O sativa as a referencespecies To build the reference pathway database cellular-level metabolic networks from RiceCyc (45) were importedinto the Reactome data structure via BioPax v20 Thedatabase was further enriched by adding a small set ofhighly conserved signaling and regulatory pathways (egcell cycle DNA replication transcription translation etc)that were projected from the curated human Reactomebased on gene homology From this initial platform we con-tinue to curate metabolic and regulatory pathways in riceRecently we began to curate and link signaling and reg-ulatory pathways The signaling pathways involve variousplant hormones such as brassinosteroid auxin ethyleneabscisic acid and strigolactone Currently the Plant Reac-tome database contains sim240 rice reference pathways ofwhich 200 were manually curated

The pathways are organized based on a hierarchical clas-sification (Figure 2A) The detailed view of a pathway showsa graphical display of the various associated reactions en-zymes metabolites and cofactors in the context of their sub-cellular locations (Figure 2B) Typically details and sum-mary of various entities (eg overview molecules struc-tures download links etc) are provided below the path-way diagram as shown in Figure 2 Links are also providedto Gramene gene loci and several external resources likeUniProt PIR ChEBI PubChem PubMed and GO Foradvanced users the data are also accessible for downloadin various formats from the download tab and via APIs

We developed gene homology-based pathway projec-tions for 33 plant species (httpplantreactomegrameneorgstatshtml) by using O sativa ssp japonica as a refer-ence Gene IDs and corresponding UniProt IDs of genesfrom curated rice pathways were used as a reference to iden-tify orthologs from plant species with a sequenced genomeandor transcriptomes even when the data might not havebeen served from the Gramene browser For example weapplied Inparanoid clustered orthology data on annotatedleaf transcriptomes of wild Oryza species provided by theOMAP project (46) and the O sativa Aus Kasalath genome(47) which are not hosted by Gramene

In addition to search and browsing of pathways thenew version of Plant Reactome allows analysis of expres-sion data (eg transcriptome proteome metabolome etc)Users can upload custom data to conduct pathway enrich-ment analysis andor compare rice reference pathways withprojected pathways on another plant species To illustratethe utility of the pathway enrichment tool we uploadedtissue-specific expression data from Oryza sativa cv Nip-ponbare (Figure 2B) provided by the EMBL-EBI Expres-sion Atlas project (48) (Petryszak et al this issue)

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

Nucleic Acids Research 2016 Vol 44 Database issue D1137

Figure 2 A view of Gramenersquos Plant Reactome Pathway browser (httpplantreactomegrameneorg) showing the gene expression data overlay featureof the Analysis Tools The left panel (A) hierarchically lists pathways from reference species Oryza sativa along with the number of gene entities mappedfrom the gene expression data and the FDR value The upper-right panel (B) shows the network of entities from the starch biosynthesis pathway paintedwith the EBI Atlas baseline expression profiles (Petryszak et al 2013) of associated genes across seven tissue types from O sativa cv Nipponbare (httpwwwebiacukgxaexperimentsE-MTAB-2037) Gene products colored in blue and yellow indicate low and high relative expression values respectivelyDiagram entities simultaneously displaying multiple colors signify a defined set or complex of multiple gene products Display options at the top of thebrowser allow users to toggle the display of left and right panels and view a diagram key defining pathway entities reaction types and attributes If theusers are in the expression analysis panel as show here they will see a data analysis table (C) that provides a list of mapped pathways counts of genes andreactions and statistical parameters such as P-value FDR and average expression Users have options to download the analyzed data table The pop-updialogue box in this example (D) displays the constituent gene products for one such entity along with their expression ID and level

Gramene continues to provide access to the previ-ously described Pathway Tools-based metabolic networks(34549) from the iPlant Collaborative website (httppathwayiplantcollaborativeorg)

PLANT GENE EXPRESSION ATLAS

In collaboration with the EMBL-EBIrsquos Expression Atlas(httpwwwebiacukgxa) project (48) (Petryszak et althis issue) we provide information about gene expressionassayed in samples of different cell types organism partsdevelopmental stages diseases and other conditions Thesedata include microarray and RNA-sequencing studies im-ported from ArrayExpress selected for having at least threebiological replicates and other quality measures To system-atize individual studies into a common framework meta-

data was manually curated and annotated with ontologyterms prior to processing using standardized analysis meth-ods described by Petryszak et al (this issue) Atlas datacan be queried for both the baseline and differential ex-pression of genes Baseline expression data provided onlyfor carefully selected RNA-sequencing based studies re-port transcript abundance estimates for each gene in allsamples (control treated tissues cell types collection ofgermplasms) Differential expression data report statisti-cally significant differential gene expression in manually cu-rated pairwise comparisons between two sets of biologi-cal replicates ndash the lsquoreferencersquo set (eg lsquohealthyrsquo or lsquowildtypersquo) and a lsquotestrsquo set (eg lsquodiseasedrsquo or lsquomutantrsquo) As ofAugust 2015 Atlas contains 389 experiments in 11 speciesof plants (httpwwwebiacukgxaplantexperiments eg

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

D1138 Nucleic Acids Research 2016 Vol 44 Database issue

rice wheat maize and Arabidopsis) including seven base-line studies reporting expression in tissues strains and cul-tivars Atlasrsquos ability to display expression across all tissuesand all baseline studies in a juxtaposed manner in a singleintuitive interface makes it easy for the user to spot corre-lated patterns of expression across multiple lsquoomicsrsquo studiesAtlas offers a number of visualizations eg lsquoenrichmentrsquo ofGO terms Plant Reactome pathways and InterPro domainswithin each pairwise comparison The user interface alsoallows researchers to select the experimental condition andthe genes and to upload the expression data as a data trackon the plant genome browsers provided by Gramene andEnsembl Plants

NEW PUBLIC WEB SERVICE

Gramene now provides a public web service at httpdatagrameneorg that collects and links data from GramenersquosEnsembl and Plant Reactome resources exposing themthrough a RESTful API alongside the Ensembl and Reac-tome REST APIs This service is the basis for developmentof our web-based search analysis and visualization toolsbut is open to all developers

In order to provide a comprehensive gene search ser-vice we decided to integrate Gramenersquos Ensembl geneswith pathway Interpro domain and plant and GO associ-ations using MongoDB Gramenersquos genes document collec-tion (GDC) contains basic information on each gene (egID name description species genomic location cross ref-erences etc) as well as references to Compara gene treesassociated GO and PO terms (50) Interpro domains andpathways Separate document collections are maintainedfor each of these structured annotation types to facilitatecustomization and extension of the database in the fu-ture After incorporation of some additional indexing fieldsfields from the GDC are loaded into a Solr core AuxiliaryMongoDB collections are also loaded into Solr so that wecan suggest relevant filters given a partial query string Weuse Solr for its advanced querying and facet-counting fea-tures and MongoDB for its flexible support for structureddocuments

Using this integrated service it is possible to composecomplex queries across all hosted genomes and obtain a va-riety of summary statistics for the genes in the result set Theweb services provided through httpdatagrameneorg aredocumented with Swagger and open to all developers

WEBINARS AND VIDEO-TUTORIALS

In November 2014 Gramene began to offer monthly we-binars focused on providing an overview of resources andfunctionalities available to users of the Gramene databasefor comparative plant genomics as well as explaining howusers can visualize and analyze data of their choice usingvarious built-in tools The recorded webinars are availableon Gramenersquos YouTube channel (httpsgooglqQ2Pjn)Examples include webinars focused on specific plant species(eg Arabidopsis rice maize etc) andor specific resources(eg genome browser Plant Reactome GrameneMart ex-pression data analysis phylogenetic comparisons and genetrees etc) We are open to suggestions for webinar topics

via e-mail (webinarsgrameneorg) and we benefit fromuser feedback via post-webinar surveys (eg httpswwwsurveymonkeycomr3J2J2M9)

DISCUSSION AND FUTURE PERSPECTIVE

Currently in its 15th year the Gramene database has be-come an unparalleled resource for integrated genomic andpathway information for major model and crop plantsSince the last NAR update in January 2014 Grameneadded 12 new plant genome assemblies to a total of 39fully sequenced reference genomes many of which werealso updated with newer assembly versions andor anno-tations We added sim100 million new genetic variants to anet total of over 190 million plant genetic and structuralvariants from 11 plant species scored on a diversity ofgermplasm accessions Thus the bulk of the new variationdata set is comprised of over 70 million SNPs determined in84 tomato accessions (httpwwwtomatogenomenet) (32)over 12 million bread wheat SNPs (including 10 million ofinter-homoeologous wheat variants Wheat HapMap SNPsand wheat SNPs from the CerealsDB database) as wellas maize SNPs typed in 16 718 maize and teosinte linesfrom the Panzea 27 GBS set (httpwwwpanzeaorggenotypescctl) In addition we expanded our data collec-tion describing the transcriptome and epigenome of eco-nomically important species like rice and maize

In the next year besides projected growth of existing ge-nomic and genetic data we aim to consolidate our phe-notypic data collection by updating our QTL informationand storing and visualizing genome-wide association stud-ies data

Further user interface and search improvements includedevelopment of a comparative genomics-based search inter-face available at httpsearchgrameneorg (beta version)that queries and connects data from the public web servicesdescribed above This enables users to find genes of interestby asking specific questionsndashndashrather than performing a gen-eral text searchndashndashand to see the distribution of results acrossall genomes in Gramene The speed with which results arereturned enables an lsquoexploratoryrsquo navigation style when aninteresting gene is identified it can be used as the basis offurther searches eg to see all available homologs or findgenes with similar domain structures In its alpha releasethe search interface is a JavaScript web application built us-ing ReactJS interface library with visualizations developedin D3 and derived from the KBase platform The web appli-cation is open source with source code public and managedat Github Releases are built using Node Package Manager(lsquonpmrsquo) and deployed automatically using Travis-CI

We will continue to interact develop and provide value-added data and analysis tools to plant biologists breedersgeneticists molecular biologists biochemists and bioinfor-matics researchers With special emphasis on inter an intra-specific analysis of small and large-scale data sets gener-ated from global crop and emerging plant model specieswe expect that researchers will benefit by building hypothe-ses andor validating the existing or new knowledge aboutnovel genes their function expression role in pathwaysphenotypes and associated functional and structural vari-ations

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

Nucleic Acids Research 2016 Vol 44 Database issue D1139

Our monthly webinar series will continue to consider thespecific needs of the plant community addressing trend-ing topics and targeting communities working on particularspecies that would most benefit from direct assistance forworking with Gramenersquos data and resources

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

The authors are grateful to Gramenersquos users researchersand numerous collaborators for sharing data sets generatedin their projects valuable suggestions and feedback on im-proving the overall quality of Gramene as a community re-source We would also like to thank the Cold Spring HarborLaboratory (CSHL) and the Dolan DNA Learning Cen-ter the Center for Genome Research and Biocomputing(CGRB) at Oregon State University and the Ontario In-stitute for Cancer Research (OICR) for infrastructure sup-port We also thank Peter van Buren from Cold Spring Har-bor Laboratory for system administration support DavidCroft (EBI) and Robin Haw (OICR) for support on PlantReactome development undergraduate students Dylan Be-orchia Teague Green and Kindra Amoss (OSU) for theirhelp in pathway curation Samuel Fox and Matthew Geniza(OSU) on gene expression Atlas curation and data analysis

FUNDING

National Science Foundation [IOS-0703908 and IOS-1127112] United States Department of Agriculture - Agri-cultural Research Service [58-1907-4-030 and 1907-21000-030-00D to DW] European Communityrsquos 7th Frame-work Programme (FP72007-2013 Infrastructures) [con-tract 283496 to PK] United Kingdom Biotechnol-ogy and Biosciences Research Council [BBJ000328X1I0080711 and H5315191 to PK] The in-kind infrastruc-ture and intellectual support for the development and run-ning the Plant Reactome is supported by the Reactomedatabase project via a grant from the US National Institutesof Health [P41 HG003751 to LS] EU grant [LSHG-CT-2005-518254] lsquoENFINrsquo Ontario Research Fund EBI In-dustry Programme The funders had no role in the study de-sign data analysis or preparation of the manuscript Fund-ing for open access charge ldquoGramene - Exploring Func-tion through Comparative Genomics and Network Analy-sisrdquo [NSF award 1127112]Conflict of interest statement None declared

REFERENCES1 CunninghamF AmodeMR BarrellD BealK BillisK

BrentS Carvalho-SilvaD ClaphamP CoatesG FitzgeraldSet al (2015) Ensembl 2015 Nucleic Acids Res 43 D662ndashD669

2 CroftD (2013) Building models using Reactome pathways astemplates Methods Mol Biol 1021 273ndash283

3 MonacoMK SteinJ NaithaniS WeiS DharmawardhanaPKumariS AmarasingheV Youens-ClarkK ThomasonJ PreeceJet al (2014) Gramene 2013 comparative plant genomics resourcesNucleic Acids Res 42 D1193ndashD1199

4 SpoonerW Youens-ClarkK StainesD and WareD (2012)GrameneMart the BioMart data portal for the Gramene projectDatabase J Biol Databases Curation bar056

5 Youens-ClarkK BucklerE CasstevensT ChenC DeclerckGDerwentP DharmawardhanaP JaiswalP KerseyPKarthikeyanAS et al (2011) Gramene database in 2010 updatesand extensions Nucleic Acids Res 39 D1085ndashD1094

6 NakamuraY CochraneG Karsch-MizrachiI and InternationalNucleotide Sequence Database C (2013) The internationalnucleotide sequence database collaboration Nucleic Acids Res 41D21ndashD24

7 JacqueminJ BhatiaD SinghK and WingRA (2013) TheInternational Oryza Map Alignment Project development of agenus-wide comparative genomics platform to help solve the 9billion-people question Curr Opin Plant Biol 16 147ndash156

8 International Wheat Genome Sequencing C (2014) Achromosome-based draft sequence of the hexaploid bread wheat(Triticum aestivum) genome Science 345 1251788

9 ChapmanJA MascherM BulucA BarryK GeorganasESessionA StrnadovaV JenkinsJ SehgalS OlikerL et al(2015) A whole-genome shotgun approach for assembling andanchoring the hexaploid bread wheat genome Genome Biol 16 26

10 ParkinIA KohC TangH RobinsonSJ KagaleS ClarkeWETownCD NixonJ KrishnakumarV BidwellSL et al (2014)Transcriptome and methylome profiling reveals relics of genomedominance in the mesopolyploid Brassica oleracea Genome Biol 15R77

11 MotamayorJC MockaitisK SchmutzJ HaiminenNLivingstoneD 3rd CornejoO FindleySD ZhengP UtroFRoyaertS et al (2013) The genome sequence of the most widelycultivated cacao type and its use to identify candidate genesregulating pod color Genome Biol 14 r53

12 International Peach Genome I VerdeI AbbottAG ScalabrinSJungS ShuS MarroniF ZhebentyayevaT DettoriMTGrimwoodJ et al (2013) The high-quality draft genome of peach(Prunus persica) identifies unique patterns of genetic diversitydomestication and genome evolution Nat Genet 45 487ndash494

13 Amborella GenomeP (2013) The Amborella genome and theevolution of flowering plants Science 342 1241089

14 ChamalaS ChanderbaliAS DerJP LanT WaltsBAlbertVA dePamphilisCW Leebens-MackJ RounsleySSchusterSC et al (2013) Assembly and validation of the genome ofthe nonmodel basal angiosperm Amborella Science 342 1516ndash1517

15 PalenikB GrimwoodJ AertsA RouzeP SalamovAPutnamN DupontC JorgensenR DerelleE RombautsS et al(2007) The tiny eukaryote Ostreococcus provides genomic insightsinto the paradox of plankton speciation Proc Natl Acad SciUSA 104 7705ndash7710

16 KerseyPJ StainesDM LawsonD KuleshaE DerwentPHumphreyJC HughesDS KeenanS KerhornouAKoscielnyG et al (2012) Ensembl Genomes an integrative resourcefor genome-scale data from non-vertebrate species Nucleic AcidsRes 40 D91ndashD97

17 KentWJ BaertschR HinrichsA MillerW and HausslerD(2003) Evolutionrsquos cauldron duplication deletion andrearrangement in the mouse and human genomes Proc Natl AcadSci USA 100 11484ndash11489

18 HarrisRS (2007) Improved pairwise alignment of genomic DNA PhDThesis Pennsylvania State University

19 VilellaAJ SeverinJ Ureta-VidalA HengL DurbinR andBirneyE (2009) EnsemblCompara GeneTrees completeduplication-aware phylogenetic trees in vertebrates GenomeRes 19 327ndash335

20 ErhardKF Jr TalbotJE DeansNC McClishAE andHollickJB (2015) Nascent transcription affected by RNApolymerase IV in Zea mays Genetics 199 1107ndash1125

21 RegulskiM LuZ KendallJ DonoghueMT ReindersJLlacaV DeschampsS SmithA LevyD McCombieWR et al(2013) The maize methylome influences mRNA splice sites andreveals widespread paramutation-like switches guided by small RNAGenome Res 23 1651ndash1662

22 CampbellMS LawM HoltC SteinJC MogheGDHufnagelDE LeiJ AchawanantakunR JiaoD LawrenceCJet al (2014) MAKER-P a tool kit for the rapid creation

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

D1140 Nucleic Acids Research 2016 Vol 44 Database issue

management and quality control of plant genome annotations PlantPhysiol 164 513ndash524

23 LawM ChildsKL CampbellMS SteinJC OlsonAJ HoltCPanchyN LeiJ JiaoD AndorfCM et al (2015) Automatedupdate revision and quality control of the maize genomeannotations using MAKER-P improves the B73 RefGen v3 genemodels and identifies new genes Plant Physiol 167 25ndash39

24 ClarkRM SchweikertG ToomajianC OssowskiS ZellerGShinnP WarthmannN HuTT FuG HindsDA et al (2007)Common sequence polymorphisms shaping genetic diversity inArabidopsis thaliana Science 317 338ndash342

25 AtwellS HuangYS VilhjalmssonBJ WillemsG HortonMLiY MengD PlattA TaroneAM HuTT et al (2010)Genome-wide association study of 107 phenotypes in Arabidopsisthaliana inbred lines Nature 465 627ndash631

26 GanX StegleO BehrJ SteffenJG DreweP HildebrandKLLyngsoeR SchultheissSJ OsborneEJ SreedharanVT et al(2011) Multiple reference genomes and transcriptomes forArabidopsis thaliana Nature 477 419ndash423

27 FoxSE PreeceJ KimbrelJA MarchiniGL SageAYouens-ClarkK CruzanMB and JaiswalP (2013) Sequencing andde novo transcriptome assembly of Brachypodium sylvaticum(Poaceae) Appl Plant Sci 1

28 International Barley Genome Sequencing C MayerKFWaughR BrownJW SchulmanA LangridgeP PlatzerMFincherGB MuehlbauerGJ SatoK et al (2012) A physicalgenetic and functional sequence assembly of the barley genomeNature 491 711ndash716

29 FoxSE GenizaM HanumappaM NaithaniS SullivanCPreeceJ TiwariVK ElserJ LeonardJM SageA et al (2014) Denovo transcriptome assembly and analyses of gene expression duringphotomorphogenesis in diploid wheat Triticum monococcum PLoSOne 9 e96855

30 McNallyKL ChildsKL BohnertR DavidsonRM ZhaoKUlatVJ ZellerG ClarkRM HoenDR BureauTE et al (2009)Genomewide SNP variation reveals relationships among landracesand modern varieties of rice Proc Natl Acad Sci USA 10612273ndash12278

31 ZhaoK WrightM KimballJ EizengaG McClungAKovachM TyagiW AliML TungCW ReynoldsA et al (2010)Genomic diversity and introgression in O sativa reveal the impact ofdomestication and breeding on the rice genome PLoS One 5 e10780

32 Tomato Genome Sequencing C AflitosS SchijlenE de JongHde RidderD SmitS FinkersR WangJ ZhangG LiN et al(2014) Exploring genetic variation in the tomato (Solanum sectionLycopersicon) clade by whole-genome sequencing Plant J 80136ndash148

33 ZhengLY GuoXS HeB SunLJ PengY DongSS LiuTFJiangS RamachandranS LiuCM et al (2011) Genome-widepatterns of genetic variation in sweet and grain sorghum (Sorghumbicolor) Genome Biol 12 R114

34 MorrisGP RamuP DeshpandeSP HashCT ShahTUpadhyayaHD Riera-LizarazuO BrownPJ AcharyaCBMitchellSE et al (2013) Population genomic and genome-wideassociation studies of agroclimatic traits in sorghum Proc NatlAcad Sci USA 110 453ndash458

35 MaceES TaiS GildingEK LiY PrentisPJ BianLCampbellBC HuW InnesDJ HanX et al (2013)Whole-genome sequencing reveals untapped genetic potential inAfricarsquos indigenous cereal crop sorghum Nat Commun 4 2320

36 JordanKW WangS LunY GardinerLJ MacLachlanRHuclP WiebeK WongD ForrestKL ConsortiumI et al (2015)

A haplotype map of allohexaploid wheat reveals distinct patterns ofselection on homoeologous genomes Genome Biol 16 48

37 MylesS ChiaJM HurwitzB SimonC ZhongGY BucklerEand WareD (2010) Rapid genomic characterization of the genusvitis PLoS One 5 e8219

38 GoreMA ChiaJM ElshireRJ SunQ ErsozESHurwitzBL PeifferJA McMullenMD GrillsGSRoss-IbarraJ et al (2009) A first-generation haplotype map ofmaize Science 326 1115ndash1117

39 ChiaJM SongC BradburyPJ CostichD de LeonNDoebleyJ ElshireRJ GautB GellerL GlaubitzJC et al (2012)Maize HapMap2 identifies extant variation from a genome in fluxNat Genet 44 803ndash807

40 BrenchleyR SpannaglM PfeiferM BarkerGL DrsquoAmoreRAllenAM McKenzieN KramerM KerhornouA BolserDet al (2012) Analysis of the bread wheat genome using whole-genomeshotgun sequencing Nature 491 705ndash710

41 MascherM MuehlbauerGJ RokhsarDS ChapmanJSchmutzJ BarryK Munoz-AmatriainM CloseTJ WiseRPSchulmanAH et al (2013) Anchoring and ordering NGS contigassemblies by population sequencing (POPSEQ) Plant J 76718ndash727

42 ComadranJ KilianB RussellJ RamsayL SteinN GanalMShawP BayerM ThomasW MarshallD et al (2012) Naturalvariation in a homolog of Antirrhinum CENTRORADIALIScontributed to spring growth habit and environmental adaptation incultivated barley Nat Genet 44 1388ndash1392

43 ChenY CunninghamF RiosD McLarenWM SmithJPritchardB SpudichGM BrentS KuleshaE Marin-GarciaPet al (2010) Ensembl variation resources BMC Genomics 11 293

44 CroftD MundoAF HawR MilacicM WeiserJ WuGCaudyM GarapatiP GillespieM KamdarMR et al (2014) TheReactome pathway knowledgebase Nucleic Acids Res 42D472ndashD477

45 DharmawardhanaP RenL AmarasingheV MonacoMThomasonJ RavenscroftD McCouchS WareD and JaiswalP(2013) A genome scale metabolic network for rice and accompanyinganalysis of tryptophan auxin and serotonin biosynthesis regulationunder biotic stress Rice 6 15

46 WingRA AmmirajuJS LuoM KimH YuY KudrnaDGoicoecheaJL WangW NelsonW RaoK et al (2005) The oryzamap alignment project the golden path to unlocking the geneticpotential of wild rice species Plant Mol Biol 59 53ndash62

47 SakaiH KanamoriH Arai-KichiseY Shibata-HattaMEbanaK OonoY KuritaK FujisawaH KatagiriS MukaiYet al (2014) Construction of pseudomolecule sequences of the ausrice cultivar Kasalath for comparative genomics of Asian cultivatedrice DNA Res 21 397ndash405

48 PetryszakR BurdettT FiorelliB FonsecaNAGonzalez-PortaM HastingsE HuberW JuppS KeaysMKryvychN et al (2014) Expression Atlas updatendasha database of geneand transcript expression from microarray- and sequencing-basedfunctional genomics experiments Nucleic Acids Res 42 D926ndashD932

49 MonacoMK SenTZ DharmawardhanaPD RenLSchaefferM NaithaniS AmarasingheV ThomasonJ HarperLand GardinerJ (2013) Maize metabolic network construction andtranscriptome analysis Plant Genome 6

50 CooperL WallsRL ElserJ GandolfoMA StevensonDWSmithB PreeceJ AthreyaB MungallCJ RensingS et al (2013)The plant ontology as a tool for comparative plant anatomy andgenomic analyses Plant Cell Physiol 54 e1

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

Page 5: Gramene 2016: comparative plant genomics and pathway resources › download › pdf › 33027963.pdf · Gramene 2016: comparative plant genomics and pathway resources Marcela K. Tello-Ruiz1,

Nucleic Acids Research 2016 Vol 44 Database issue D1137

Figure 2 A view of Gramenersquos Plant Reactome Pathway browser (httpplantreactomegrameneorg) showing the gene expression data overlay featureof the Analysis Tools The left panel (A) hierarchically lists pathways from reference species Oryza sativa along with the number of gene entities mappedfrom the gene expression data and the FDR value The upper-right panel (B) shows the network of entities from the starch biosynthesis pathway paintedwith the EBI Atlas baseline expression profiles (Petryszak et al 2013) of associated genes across seven tissue types from O sativa cv Nipponbare (httpwwwebiacukgxaexperimentsE-MTAB-2037) Gene products colored in blue and yellow indicate low and high relative expression values respectivelyDiagram entities simultaneously displaying multiple colors signify a defined set or complex of multiple gene products Display options at the top of thebrowser allow users to toggle the display of left and right panels and view a diagram key defining pathway entities reaction types and attributes If theusers are in the expression analysis panel as show here they will see a data analysis table (C) that provides a list of mapped pathways counts of genes andreactions and statistical parameters such as P-value FDR and average expression Users have options to download the analyzed data table The pop-updialogue box in this example (D) displays the constituent gene products for one such entity along with their expression ID and level

Gramene continues to provide access to the previ-ously described Pathway Tools-based metabolic networks(34549) from the iPlant Collaborative website (httppathwayiplantcollaborativeorg)

PLANT GENE EXPRESSION ATLAS

In collaboration with the EMBL-EBIrsquos Expression Atlas(httpwwwebiacukgxa) project (48) (Petryszak et althis issue) we provide information about gene expressionassayed in samples of different cell types organism partsdevelopmental stages diseases and other conditions Thesedata include microarray and RNA-sequencing studies im-ported from ArrayExpress selected for having at least threebiological replicates and other quality measures To system-atize individual studies into a common framework meta-

data was manually curated and annotated with ontologyterms prior to processing using standardized analysis meth-ods described by Petryszak et al (this issue) Atlas datacan be queried for both the baseline and differential ex-pression of genes Baseline expression data provided onlyfor carefully selected RNA-sequencing based studies re-port transcript abundance estimates for each gene in allsamples (control treated tissues cell types collection ofgermplasms) Differential expression data report statisti-cally significant differential gene expression in manually cu-rated pairwise comparisons between two sets of biologi-cal replicates ndash the lsquoreferencersquo set (eg lsquohealthyrsquo or lsquowildtypersquo) and a lsquotestrsquo set (eg lsquodiseasedrsquo or lsquomutantrsquo) As ofAugust 2015 Atlas contains 389 experiments in 11 speciesof plants (httpwwwebiacukgxaplantexperiments eg

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

D1138 Nucleic Acids Research 2016 Vol 44 Database issue

rice wheat maize and Arabidopsis) including seven base-line studies reporting expression in tissues strains and cul-tivars Atlasrsquos ability to display expression across all tissuesand all baseline studies in a juxtaposed manner in a singleintuitive interface makes it easy for the user to spot corre-lated patterns of expression across multiple lsquoomicsrsquo studiesAtlas offers a number of visualizations eg lsquoenrichmentrsquo ofGO terms Plant Reactome pathways and InterPro domainswithin each pairwise comparison The user interface alsoallows researchers to select the experimental condition andthe genes and to upload the expression data as a data trackon the plant genome browsers provided by Gramene andEnsembl Plants

NEW PUBLIC WEB SERVICE

Gramene now provides a public web service at httpdatagrameneorg that collects and links data from GramenersquosEnsembl and Plant Reactome resources exposing themthrough a RESTful API alongside the Ensembl and Reac-tome REST APIs This service is the basis for developmentof our web-based search analysis and visualization toolsbut is open to all developers

In order to provide a comprehensive gene search ser-vice we decided to integrate Gramenersquos Ensembl geneswith pathway Interpro domain and plant and GO associ-ations using MongoDB Gramenersquos genes document collec-tion (GDC) contains basic information on each gene (egID name description species genomic location cross ref-erences etc) as well as references to Compara gene treesassociated GO and PO terms (50) Interpro domains andpathways Separate document collections are maintainedfor each of these structured annotation types to facilitatecustomization and extension of the database in the fu-ture After incorporation of some additional indexing fieldsfields from the GDC are loaded into a Solr core AuxiliaryMongoDB collections are also loaded into Solr so that wecan suggest relevant filters given a partial query string Weuse Solr for its advanced querying and facet-counting fea-tures and MongoDB for its flexible support for structureddocuments

Using this integrated service it is possible to composecomplex queries across all hosted genomes and obtain a va-riety of summary statistics for the genes in the result set Theweb services provided through httpdatagrameneorg aredocumented with Swagger and open to all developers

WEBINARS AND VIDEO-TUTORIALS

In November 2014 Gramene began to offer monthly we-binars focused on providing an overview of resources andfunctionalities available to users of the Gramene databasefor comparative plant genomics as well as explaining howusers can visualize and analyze data of their choice usingvarious built-in tools The recorded webinars are availableon Gramenersquos YouTube channel (httpsgooglqQ2Pjn)Examples include webinars focused on specific plant species(eg Arabidopsis rice maize etc) andor specific resources(eg genome browser Plant Reactome GrameneMart ex-pression data analysis phylogenetic comparisons and genetrees etc) We are open to suggestions for webinar topics

via e-mail (webinarsgrameneorg) and we benefit fromuser feedback via post-webinar surveys (eg httpswwwsurveymonkeycomr3J2J2M9)

DISCUSSION AND FUTURE PERSPECTIVE

Currently in its 15th year the Gramene database has be-come an unparalleled resource for integrated genomic andpathway information for major model and crop plantsSince the last NAR update in January 2014 Grameneadded 12 new plant genome assemblies to a total of 39fully sequenced reference genomes many of which werealso updated with newer assembly versions andor anno-tations We added sim100 million new genetic variants to anet total of over 190 million plant genetic and structuralvariants from 11 plant species scored on a diversity ofgermplasm accessions Thus the bulk of the new variationdata set is comprised of over 70 million SNPs determined in84 tomato accessions (httpwwwtomatogenomenet) (32)over 12 million bread wheat SNPs (including 10 million ofinter-homoeologous wheat variants Wheat HapMap SNPsand wheat SNPs from the CerealsDB database) as wellas maize SNPs typed in 16 718 maize and teosinte linesfrom the Panzea 27 GBS set (httpwwwpanzeaorggenotypescctl) In addition we expanded our data collec-tion describing the transcriptome and epigenome of eco-nomically important species like rice and maize

In the next year besides projected growth of existing ge-nomic and genetic data we aim to consolidate our phe-notypic data collection by updating our QTL informationand storing and visualizing genome-wide association stud-ies data

Further user interface and search improvements includedevelopment of a comparative genomics-based search inter-face available at httpsearchgrameneorg (beta version)that queries and connects data from the public web servicesdescribed above This enables users to find genes of interestby asking specific questionsndashndashrather than performing a gen-eral text searchndashndashand to see the distribution of results acrossall genomes in Gramene The speed with which results arereturned enables an lsquoexploratoryrsquo navigation style when aninteresting gene is identified it can be used as the basis offurther searches eg to see all available homologs or findgenes with similar domain structures In its alpha releasethe search interface is a JavaScript web application built us-ing ReactJS interface library with visualizations developedin D3 and derived from the KBase platform The web appli-cation is open source with source code public and managedat Github Releases are built using Node Package Manager(lsquonpmrsquo) and deployed automatically using Travis-CI

We will continue to interact develop and provide value-added data and analysis tools to plant biologists breedersgeneticists molecular biologists biochemists and bioinfor-matics researchers With special emphasis on inter an intra-specific analysis of small and large-scale data sets gener-ated from global crop and emerging plant model specieswe expect that researchers will benefit by building hypothe-ses andor validating the existing or new knowledge aboutnovel genes their function expression role in pathwaysphenotypes and associated functional and structural vari-ations

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

Nucleic Acids Research 2016 Vol 44 Database issue D1139

Our monthly webinar series will continue to consider thespecific needs of the plant community addressing trend-ing topics and targeting communities working on particularspecies that would most benefit from direct assistance forworking with Gramenersquos data and resources

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

The authors are grateful to Gramenersquos users researchersand numerous collaborators for sharing data sets generatedin their projects valuable suggestions and feedback on im-proving the overall quality of Gramene as a community re-source We would also like to thank the Cold Spring HarborLaboratory (CSHL) and the Dolan DNA Learning Cen-ter the Center for Genome Research and Biocomputing(CGRB) at Oregon State University and the Ontario In-stitute for Cancer Research (OICR) for infrastructure sup-port We also thank Peter van Buren from Cold Spring Har-bor Laboratory for system administration support DavidCroft (EBI) and Robin Haw (OICR) for support on PlantReactome development undergraduate students Dylan Be-orchia Teague Green and Kindra Amoss (OSU) for theirhelp in pathway curation Samuel Fox and Matthew Geniza(OSU) on gene expression Atlas curation and data analysis

FUNDING

National Science Foundation [IOS-0703908 and IOS-1127112] United States Department of Agriculture - Agri-cultural Research Service [58-1907-4-030 and 1907-21000-030-00D to DW] European Communityrsquos 7th Frame-work Programme (FP72007-2013 Infrastructures) [con-tract 283496 to PK] United Kingdom Biotechnol-ogy and Biosciences Research Council [BBJ000328X1I0080711 and H5315191 to PK] The in-kind infrastruc-ture and intellectual support for the development and run-ning the Plant Reactome is supported by the Reactomedatabase project via a grant from the US National Institutesof Health [P41 HG003751 to LS] EU grant [LSHG-CT-2005-518254] lsquoENFINrsquo Ontario Research Fund EBI In-dustry Programme The funders had no role in the study de-sign data analysis or preparation of the manuscript Fund-ing for open access charge ldquoGramene - Exploring Func-tion through Comparative Genomics and Network Analy-sisrdquo [NSF award 1127112]Conflict of interest statement None declared

REFERENCES1 CunninghamF AmodeMR BarrellD BealK BillisK

BrentS Carvalho-SilvaD ClaphamP CoatesG FitzgeraldSet al (2015) Ensembl 2015 Nucleic Acids Res 43 D662ndashD669

2 CroftD (2013) Building models using Reactome pathways astemplates Methods Mol Biol 1021 273ndash283

3 MonacoMK SteinJ NaithaniS WeiS DharmawardhanaPKumariS AmarasingheV Youens-ClarkK ThomasonJ PreeceJet al (2014) Gramene 2013 comparative plant genomics resourcesNucleic Acids Res 42 D1193ndashD1199

4 SpoonerW Youens-ClarkK StainesD and WareD (2012)GrameneMart the BioMart data portal for the Gramene projectDatabase J Biol Databases Curation bar056

5 Youens-ClarkK BucklerE CasstevensT ChenC DeclerckGDerwentP DharmawardhanaP JaiswalP KerseyPKarthikeyanAS et al (2011) Gramene database in 2010 updatesand extensions Nucleic Acids Res 39 D1085ndashD1094

6 NakamuraY CochraneG Karsch-MizrachiI and InternationalNucleotide Sequence Database C (2013) The internationalnucleotide sequence database collaboration Nucleic Acids Res 41D21ndashD24

7 JacqueminJ BhatiaD SinghK and WingRA (2013) TheInternational Oryza Map Alignment Project development of agenus-wide comparative genomics platform to help solve the 9billion-people question Curr Opin Plant Biol 16 147ndash156

8 International Wheat Genome Sequencing C (2014) Achromosome-based draft sequence of the hexaploid bread wheat(Triticum aestivum) genome Science 345 1251788

9 ChapmanJA MascherM BulucA BarryK GeorganasESessionA StrnadovaV JenkinsJ SehgalS OlikerL et al(2015) A whole-genome shotgun approach for assembling andanchoring the hexaploid bread wheat genome Genome Biol 16 26

10 ParkinIA KohC TangH RobinsonSJ KagaleS ClarkeWETownCD NixonJ KrishnakumarV BidwellSL et al (2014)Transcriptome and methylome profiling reveals relics of genomedominance in the mesopolyploid Brassica oleracea Genome Biol 15R77

11 MotamayorJC MockaitisK SchmutzJ HaiminenNLivingstoneD 3rd CornejoO FindleySD ZhengP UtroFRoyaertS et al (2013) The genome sequence of the most widelycultivated cacao type and its use to identify candidate genesregulating pod color Genome Biol 14 r53

12 International Peach Genome I VerdeI AbbottAG ScalabrinSJungS ShuS MarroniF ZhebentyayevaT DettoriMTGrimwoodJ et al (2013) The high-quality draft genome of peach(Prunus persica) identifies unique patterns of genetic diversitydomestication and genome evolution Nat Genet 45 487ndash494

13 Amborella GenomeP (2013) The Amborella genome and theevolution of flowering plants Science 342 1241089

14 ChamalaS ChanderbaliAS DerJP LanT WaltsBAlbertVA dePamphilisCW Leebens-MackJ RounsleySSchusterSC et al (2013) Assembly and validation of the genome ofthe nonmodel basal angiosperm Amborella Science 342 1516ndash1517

15 PalenikB GrimwoodJ AertsA RouzeP SalamovAPutnamN DupontC JorgensenR DerelleE RombautsS et al(2007) The tiny eukaryote Ostreococcus provides genomic insightsinto the paradox of plankton speciation Proc Natl Acad SciUSA 104 7705ndash7710

16 KerseyPJ StainesDM LawsonD KuleshaE DerwentPHumphreyJC HughesDS KeenanS KerhornouAKoscielnyG et al (2012) Ensembl Genomes an integrative resourcefor genome-scale data from non-vertebrate species Nucleic AcidsRes 40 D91ndashD97

17 KentWJ BaertschR HinrichsA MillerW and HausslerD(2003) Evolutionrsquos cauldron duplication deletion andrearrangement in the mouse and human genomes Proc Natl AcadSci USA 100 11484ndash11489

18 HarrisRS (2007) Improved pairwise alignment of genomic DNA PhDThesis Pennsylvania State University

19 VilellaAJ SeverinJ Ureta-VidalA HengL DurbinR andBirneyE (2009) EnsemblCompara GeneTrees completeduplication-aware phylogenetic trees in vertebrates GenomeRes 19 327ndash335

20 ErhardKF Jr TalbotJE DeansNC McClishAE andHollickJB (2015) Nascent transcription affected by RNApolymerase IV in Zea mays Genetics 199 1107ndash1125

21 RegulskiM LuZ KendallJ DonoghueMT ReindersJLlacaV DeschampsS SmithA LevyD McCombieWR et al(2013) The maize methylome influences mRNA splice sites andreveals widespread paramutation-like switches guided by small RNAGenome Res 23 1651ndash1662

22 CampbellMS LawM HoltC SteinJC MogheGDHufnagelDE LeiJ AchawanantakunR JiaoD LawrenceCJet al (2014) MAKER-P a tool kit for the rapid creation

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

D1140 Nucleic Acids Research 2016 Vol 44 Database issue

management and quality control of plant genome annotations PlantPhysiol 164 513ndash524

23 LawM ChildsKL CampbellMS SteinJC OlsonAJ HoltCPanchyN LeiJ JiaoD AndorfCM et al (2015) Automatedupdate revision and quality control of the maize genomeannotations using MAKER-P improves the B73 RefGen v3 genemodels and identifies new genes Plant Physiol 167 25ndash39

24 ClarkRM SchweikertG ToomajianC OssowskiS ZellerGShinnP WarthmannN HuTT FuG HindsDA et al (2007)Common sequence polymorphisms shaping genetic diversity inArabidopsis thaliana Science 317 338ndash342

25 AtwellS HuangYS VilhjalmssonBJ WillemsG HortonMLiY MengD PlattA TaroneAM HuTT et al (2010)Genome-wide association study of 107 phenotypes in Arabidopsisthaliana inbred lines Nature 465 627ndash631

26 GanX StegleO BehrJ SteffenJG DreweP HildebrandKLLyngsoeR SchultheissSJ OsborneEJ SreedharanVT et al(2011) Multiple reference genomes and transcriptomes forArabidopsis thaliana Nature 477 419ndash423

27 FoxSE PreeceJ KimbrelJA MarchiniGL SageAYouens-ClarkK CruzanMB and JaiswalP (2013) Sequencing andde novo transcriptome assembly of Brachypodium sylvaticum(Poaceae) Appl Plant Sci 1

28 International Barley Genome Sequencing C MayerKFWaughR BrownJW SchulmanA LangridgeP PlatzerMFincherGB MuehlbauerGJ SatoK et al (2012) A physicalgenetic and functional sequence assembly of the barley genomeNature 491 711ndash716

29 FoxSE GenizaM HanumappaM NaithaniS SullivanCPreeceJ TiwariVK ElserJ LeonardJM SageA et al (2014) Denovo transcriptome assembly and analyses of gene expression duringphotomorphogenesis in diploid wheat Triticum monococcum PLoSOne 9 e96855

30 McNallyKL ChildsKL BohnertR DavidsonRM ZhaoKUlatVJ ZellerG ClarkRM HoenDR BureauTE et al (2009)Genomewide SNP variation reveals relationships among landracesand modern varieties of rice Proc Natl Acad Sci USA 10612273ndash12278

31 ZhaoK WrightM KimballJ EizengaG McClungAKovachM TyagiW AliML TungCW ReynoldsA et al (2010)Genomic diversity and introgression in O sativa reveal the impact ofdomestication and breeding on the rice genome PLoS One 5 e10780

32 Tomato Genome Sequencing C AflitosS SchijlenE de JongHde RidderD SmitS FinkersR WangJ ZhangG LiN et al(2014) Exploring genetic variation in the tomato (Solanum sectionLycopersicon) clade by whole-genome sequencing Plant J 80136ndash148

33 ZhengLY GuoXS HeB SunLJ PengY DongSS LiuTFJiangS RamachandranS LiuCM et al (2011) Genome-widepatterns of genetic variation in sweet and grain sorghum (Sorghumbicolor) Genome Biol 12 R114

34 MorrisGP RamuP DeshpandeSP HashCT ShahTUpadhyayaHD Riera-LizarazuO BrownPJ AcharyaCBMitchellSE et al (2013) Population genomic and genome-wideassociation studies of agroclimatic traits in sorghum Proc NatlAcad Sci USA 110 453ndash458

35 MaceES TaiS GildingEK LiY PrentisPJ BianLCampbellBC HuW InnesDJ HanX et al (2013)Whole-genome sequencing reveals untapped genetic potential inAfricarsquos indigenous cereal crop sorghum Nat Commun 4 2320

36 JordanKW WangS LunY GardinerLJ MacLachlanRHuclP WiebeK WongD ForrestKL ConsortiumI et al (2015)

A haplotype map of allohexaploid wheat reveals distinct patterns ofselection on homoeologous genomes Genome Biol 16 48

37 MylesS ChiaJM HurwitzB SimonC ZhongGY BucklerEand WareD (2010) Rapid genomic characterization of the genusvitis PLoS One 5 e8219

38 GoreMA ChiaJM ElshireRJ SunQ ErsozESHurwitzBL PeifferJA McMullenMD GrillsGSRoss-IbarraJ et al (2009) A first-generation haplotype map ofmaize Science 326 1115ndash1117

39 ChiaJM SongC BradburyPJ CostichD de LeonNDoebleyJ ElshireRJ GautB GellerL GlaubitzJC et al (2012)Maize HapMap2 identifies extant variation from a genome in fluxNat Genet 44 803ndash807

40 BrenchleyR SpannaglM PfeiferM BarkerGL DrsquoAmoreRAllenAM McKenzieN KramerM KerhornouA BolserDet al (2012) Analysis of the bread wheat genome using whole-genomeshotgun sequencing Nature 491 705ndash710

41 MascherM MuehlbauerGJ RokhsarDS ChapmanJSchmutzJ BarryK Munoz-AmatriainM CloseTJ WiseRPSchulmanAH et al (2013) Anchoring and ordering NGS contigassemblies by population sequencing (POPSEQ) Plant J 76718ndash727

42 ComadranJ KilianB RussellJ RamsayL SteinN GanalMShawP BayerM ThomasW MarshallD et al (2012) Naturalvariation in a homolog of Antirrhinum CENTRORADIALIScontributed to spring growth habit and environmental adaptation incultivated barley Nat Genet 44 1388ndash1392

43 ChenY CunninghamF RiosD McLarenWM SmithJPritchardB SpudichGM BrentS KuleshaE Marin-GarciaPet al (2010) Ensembl variation resources BMC Genomics 11 293

44 CroftD MundoAF HawR MilacicM WeiserJ WuGCaudyM GarapatiP GillespieM KamdarMR et al (2014) TheReactome pathway knowledgebase Nucleic Acids Res 42D472ndashD477

45 DharmawardhanaP RenL AmarasingheV MonacoMThomasonJ RavenscroftD McCouchS WareD and JaiswalP(2013) A genome scale metabolic network for rice and accompanyinganalysis of tryptophan auxin and serotonin biosynthesis regulationunder biotic stress Rice 6 15

46 WingRA AmmirajuJS LuoM KimH YuY KudrnaDGoicoecheaJL WangW NelsonW RaoK et al (2005) The oryzamap alignment project the golden path to unlocking the geneticpotential of wild rice species Plant Mol Biol 59 53ndash62

47 SakaiH KanamoriH Arai-KichiseY Shibata-HattaMEbanaK OonoY KuritaK FujisawaH KatagiriS MukaiYet al (2014) Construction of pseudomolecule sequences of the ausrice cultivar Kasalath for comparative genomics of Asian cultivatedrice DNA Res 21 397ndash405

48 PetryszakR BurdettT FiorelliB FonsecaNAGonzalez-PortaM HastingsE HuberW JuppS KeaysMKryvychN et al (2014) Expression Atlas updatendasha database of geneand transcript expression from microarray- and sequencing-basedfunctional genomics experiments Nucleic Acids Res 42 D926ndashD932

49 MonacoMK SenTZ DharmawardhanaPD RenLSchaefferM NaithaniS AmarasingheV ThomasonJ HarperLand GardinerJ (2013) Maize metabolic network construction andtranscriptome analysis Plant Genome 6

50 CooperL WallsRL ElserJ GandolfoMA StevensonDWSmithB PreeceJ AthreyaB MungallCJ RensingS et al (2013)The plant ontology as a tool for comparative plant anatomy andgenomic analyses Plant Cell Physiol 54 e1

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

Page 6: Gramene 2016: comparative plant genomics and pathway resources › download › pdf › 33027963.pdf · Gramene 2016: comparative plant genomics and pathway resources Marcela K. Tello-Ruiz1,

D1138 Nucleic Acids Research 2016 Vol 44 Database issue

rice wheat maize and Arabidopsis) including seven base-line studies reporting expression in tissues strains and cul-tivars Atlasrsquos ability to display expression across all tissuesand all baseline studies in a juxtaposed manner in a singleintuitive interface makes it easy for the user to spot corre-lated patterns of expression across multiple lsquoomicsrsquo studiesAtlas offers a number of visualizations eg lsquoenrichmentrsquo ofGO terms Plant Reactome pathways and InterPro domainswithin each pairwise comparison The user interface alsoallows researchers to select the experimental condition andthe genes and to upload the expression data as a data trackon the plant genome browsers provided by Gramene andEnsembl Plants

NEW PUBLIC WEB SERVICE

Gramene now provides a public web service at httpdatagrameneorg that collects and links data from GramenersquosEnsembl and Plant Reactome resources exposing themthrough a RESTful API alongside the Ensembl and Reac-tome REST APIs This service is the basis for developmentof our web-based search analysis and visualization toolsbut is open to all developers

In order to provide a comprehensive gene search ser-vice we decided to integrate Gramenersquos Ensembl geneswith pathway Interpro domain and plant and GO associ-ations using MongoDB Gramenersquos genes document collec-tion (GDC) contains basic information on each gene (egID name description species genomic location cross ref-erences etc) as well as references to Compara gene treesassociated GO and PO terms (50) Interpro domains andpathways Separate document collections are maintainedfor each of these structured annotation types to facilitatecustomization and extension of the database in the fu-ture After incorporation of some additional indexing fieldsfields from the GDC are loaded into a Solr core AuxiliaryMongoDB collections are also loaded into Solr so that wecan suggest relevant filters given a partial query string Weuse Solr for its advanced querying and facet-counting fea-tures and MongoDB for its flexible support for structureddocuments

Using this integrated service it is possible to composecomplex queries across all hosted genomes and obtain a va-riety of summary statistics for the genes in the result set Theweb services provided through httpdatagrameneorg aredocumented with Swagger and open to all developers

WEBINARS AND VIDEO-TUTORIALS

In November 2014 Gramene began to offer monthly we-binars focused on providing an overview of resources andfunctionalities available to users of the Gramene databasefor comparative plant genomics as well as explaining howusers can visualize and analyze data of their choice usingvarious built-in tools The recorded webinars are availableon Gramenersquos YouTube channel (httpsgooglqQ2Pjn)Examples include webinars focused on specific plant species(eg Arabidopsis rice maize etc) andor specific resources(eg genome browser Plant Reactome GrameneMart ex-pression data analysis phylogenetic comparisons and genetrees etc) We are open to suggestions for webinar topics

via e-mail (webinarsgrameneorg) and we benefit fromuser feedback via post-webinar surveys (eg httpswwwsurveymonkeycomr3J2J2M9)

DISCUSSION AND FUTURE PERSPECTIVE

Currently in its 15th year the Gramene database has be-come an unparalleled resource for integrated genomic andpathway information for major model and crop plantsSince the last NAR update in January 2014 Grameneadded 12 new plant genome assemblies to a total of 39fully sequenced reference genomes many of which werealso updated with newer assembly versions andor anno-tations We added sim100 million new genetic variants to anet total of over 190 million plant genetic and structuralvariants from 11 plant species scored on a diversity ofgermplasm accessions Thus the bulk of the new variationdata set is comprised of over 70 million SNPs determined in84 tomato accessions (httpwwwtomatogenomenet) (32)over 12 million bread wheat SNPs (including 10 million ofinter-homoeologous wheat variants Wheat HapMap SNPsand wheat SNPs from the CerealsDB database) as wellas maize SNPs typed in 16 718 maize and teosinte linesfrom the Panzea 27 GBS set (httpwwwpanzeaorggenotypescctl) In addition we expanded our data collec-tion describing the transcriptome and epigenome of eco-nomically important species like rice and maize

In the next year besides projected growth of existing ge-nomic and genetic data we aim to consolidate our phe-notypic data collection by updating our QTL informationand storing and visualizing genome-wide association stud-ies data

Further user interface and search improvements includedevelopment of a comparative genomics-based search inter-face available at httpsearchgrameneorg (beta version)that queries and connects data from the public web servicesdescribed above This enables users to find genes of interestby asking specific questionsndashndashrather than performing a gen-eral text searchndashndashand to see the distribution of results acrossall genomes in Gramene The speed with which results arereturned enables an lsquoexploratoryrsquo navigation style when aninteresting gene is identified it can be used as the basis offurther searches eg to see all available homologs or findgenes with similar domain structures In its alpha releasethe search interface is a JavaScript web application built us-ing ReactJS interface library with visualizations developedin D3 and derived from the KBase platform The web appli-cation is open source with source code public and managedat Github Releases are built using Node Package Manager(lsquonpmrsquo) and deployed automatically using Travis-CI

We will continue to interact develop and provide value-added data and analysis tools to plant biologists breedersgeneticists molecular biologists biochemists and bioinfor-matics researchers With special emphasis on inter an intra-specific analysis of small and large-scale data sets gener-ated from global crop and emerging plant model specieswe expect that researchers will benefit by building hypothe-ses andor validating the existing or new knowledge aboutnovel genes their function expression role in pathwaysphenotypes and associated functional and structural vari-ations

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

Nucleic Acids Research 2016 Vol 44 Database issue D1139

Our monthly webinar series will continue to consider thespecific needs of the plant community addressing trend-ing topics and targeting communities working on particularspecies that would most benefit from direct assistance forworking with Gramenersquos data and resources

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

The authors are grateful to Gramenersquos users researchersand numerous collaborators for sharing data sets generatedin their projects valuable suggestions and feedback on im-proving the overall quality of Gramene as a community re-source We would also like to thank the Cold Spring HarborLaboratory (CSHL) and the Dolan DNA Learning Cen-ter the Center for Genome Research and Biocomputing(CGRB) at Oregon State University and the Ontario In-stitute for Cancer Research (OICR) for infrastructure sup-port We also thank Peter van Buren from Cold Spring Har-bor Laboratory for system administration support DavidCroft (EBI) and Robin Haw (OICR) for support on PlantReactome development undergraduate students Dylan Be-orchia Teague Green and Kindra Amoss (OSU) for theirhelp in pathway curation Samuel Fox and Matthew Geniza(OSU) on gene expression Atlas curation and data analysis

FUNDING

National Science Foundation [IOS-0703908 and IOS-1127112] United States Department of Agriculture - Agri-cultural Research Service [58-1907-4-030 and 1907-21000-030-00D to DW] European Communityrsquos 7th Frame-work Programme (FP72007-2013 Infrastructures) [con-tract 283496 to PK] United Kingdom Biotechnol-ogy and Biosciences Research Council [BBJ000328X1I0080711 and H5315191 to PK] The in-kind infrastruc-ture and intellectual support for the development and run-ning the Plant Reactome is supported by the Reactomedatabase project via a grant from the US National Institutesof Health [P41 HG003751 to LS] EU grant [LSHG-CT-2005-518254] lsquoENFINrsquo Ontario Research Fund EBI In-dustry Programme The funders had no role in the study de-sign data analysis or preparation of the manuscript Fund-ing for open access charge ldquoGramene - Exploring Func-tion through Comparative Genomics and Network Analy-sisrdquo [NSF award 1127112]Conflict of interest statement None declared

REFERENCES1 CunninghamF AmodeMR BarrellD BealK BillisK

BrentS Carvalho-SilvaD ClaphamP CoatesG FitzgeraldSet al (2015) Ensembl 2015 Nucleic Acids Res 43 D662ndashD669

2 CroftD (2013) Building models using Reactome pathways astemplates Methods Mol Biol 1021 273ndash283

3 MonacoMK SteinJ NaithaniS WeiS DharmawardhanaPKumariS AmarasingheV Youens-ClarkK ThomasonJ PreeceJet al (2014) Gramene 2013 comparative plant genomics resourcesNucleic Acids Res 42 D1193ndashD1199

4 SpoonerW Youens-ClarkK StainesD and WareD (2012)GrameneMart the BioMart data portal for the Gramene projectDatabase J Biol Databases Curation bar056

5 Youens-ClarkK BucklerE CasstevensT ChenC DeclerckGDerwentP DharmawardhanaP JaiswalP KerseyPKarthikeyanAS et al (2011) Gramene database in 2010 updatesand extensions Nucleic Acids Res 39 D1085ndashD1094

6 NakamuraY CochraneG Karsch-MizrachiI and InternationalNucleotide Sequence Database C (2013) The internationalnucleotide sequence database collaboration Nucleic Acids Res 41D21ndashD24

7 JacqueminJ BhatiaD SinghK and WingRA (2013) TheInternational Oryza Map Alignment Project development of agenus-wide comparative genomics platform to help solve the 9billion-people question Curr Opin Plant Biol 16 147ndash156

8 International Wheat Genome Sequencing C (2014) Achromosome-based draft sequence of the hexaploid bread wheat(Triticum aestivum) genome Science 345 1251788

9 ChapmanJA MascherM BulucA BarryK GeorganasESessionA StrnadovaV JenkinsJ SehgalS OlikerL et al(2015) A whole-genome shotgun approach for assembling andanchoring the hexaploid bread wheat genome Genome Biol 16 26

10 ParkinIA KohC TangH RobinsonSJ KagaleS ClarkeWETownCD NixonJ KrishnakumarV BidwellSL et al (2014)Transcriptome and methylome profiling reveals relics of genomedominance in the mesopolyploid Brassica oleracea Genome Biol 15R77

11 MotamayorJC MockaitisK SchmutzJ HaiminenNLivingstoneD 3rd CornejoO FindleySD ZhengP UtroFRoyaertS et al (2013) The genome sequence of the most widelycultivated cacao type and its use to identify candidate genesregulating pod color Genome Biol 14 r53

12 International Peach Genome I VerdeI AbbottAG ScalabrinSJungS ShuS MarroniF ZhebentyayevaT DettoriMTGrimwoodJ et al (2013) The high-quality draft genome of peach(Prunus persica) identifies unique patterns of genetic diversitydomestication and genome evolution Nat Genet 45 487ndash494

13 Amborella GenomeP (2013) The Amborella genome and theevolution of flowering plants Science 342 1241089

14 ChamalaS ChanderbaliAS DerJP LanT WaltsBAlbertVA dePamphilisCW Leebens-MackJ RounsleySSchusterSC et al (2013) Assembly and validation of the genome ofthe nonmodel basal angiosperm Amborella Science 342 1516ndash1517

15 PalenikB GrimwoodJ AertsA RouzeP SalamovAPutnamN DupontC JorgensenR DerelleE RombautsS et al(2007) The tiny eukaryote Ostreococcus provides genomic insightsinto the paradox of plankton speciation Proc Natl Acad SciUSA 104 7705ndash7710

16 KerseyPJ StainesDM LawsonD KuleshaE DerwentPHumphreyJC HughesDS KeenanS KerhornouAKoscielnyG et al (2012) Ensembl Genomes an integrative resourcefor genome-scale data from non-vertebrate species Nucleic AcidsRes 40 D91ndashD97

17 KentWJ BaertschR HinrichsA MillerW and HausslerD(2003) Evolutionrsquos cauldron duplication deletion andrearrangement in the mouse and human genomes Proc Natl AcadSci USA 100 11484ndash11489

18 HarrisRS (2007) Improved pairwise alignment of genomic DNA PhDThesis Pennsylvania State University

19 VilellaAJ SeverinJ Ureta-VidalA HengL DurbinR andBirneyE (2009) EnsemblCompara GeneTrees completeduplication-aware phylogenetic trees in vertebrates GenomeRes 19 327ndash335

20 ErhardKF Jr TalbotJE DeansNC McClishAE andHollickJB (2015) Nascent transcription affected by RNApolymerase IV in Zea mays Genetics 199 1107ndash1125

21 RegulskiM LuZ KendallJ DonoghueMT ReindersJLlacaV DeschampsS SmithA LevyD McCombieWR et al(2013) The maize methylome influences mRNA splice sites andreveals widespread paramutation-like switches guided by small RNAGenome Res 23 1651ndash1662

22 CampbellMS LawM HoltC SteinJC MogheGDHufnagelDE LeiJ AchawanantakunR JiaoD LawrenceCJet al (2014) MAKER-P a tool kit for the rapid creation

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

D1140 Nucleic Acids Research 2016 Vol 44 Database issue

management and quality control of plant genome annotations PlantPhysiol 164 513ndash524

23 LawM ChildsKL CampbellMS SteinJC OlsonAJ HoltCPanchyN LeiJ JiaoD AndorfCM et al (2015) Automatedupdate revision and quality control of the maize genomeannotations using MAKER-P improves the B73 RefGen v3 genemodels and identifies new genes Plant Physiol 167 25ndash39

24 ClarkRM SchweikertG ToomajianC OssowskiS ZellerGShinnP WarthmannN HuTT FuG HindsDA et al (2007)Common sequence polymorphisms shaping genetic diversity inArabidopsis thaliana Science 317 338ndash342

25 AtwellS HuangYS VilhjalmssonBJ WillemsG HortonMLiY MengD PlattA TaroneAM HuTT et al (2010)Genome-wide association study of 107 phenotypes in Arabidopsisthaliana inbred lines Nature 465 627ndash631

26 GanX StegleO BehrJ SteffenJG DreweP HildebrandKLLyngsoeR SchultheissSJ OsborneEJ SreedharanVT et al(2011) Multiple reference genomes and transcriptomes forArabidopsis thaliana Nature 477 419ndash423

27 FoxSE PreeceJ KimbrelJA MarchiniGL SageAYouens-ClarkK CruzanMB and JaiswalP (2013) Sequencing andde novo transcriptome assembly of Brachypodium sylvaticum(Poaceae) Appl Plant Sci 1

28 International Barley Genome Sequencing C MayerKFWaughR BrownJW SchulmanA LangridgeP PlatzerMFincherGB MuehlbauerGJ SatoK et al (2012) A physicalgenetic and functional sequence assembly of the barley genomeNature 491 711ndash716

29 FoxSE GenizaM HanumappaM NaithaniS SullivanCPreeceJ TiwariVK ElserJ LeonardJM SageA et al (2014) Denovo transcriptome assembly and analyses of gene expression duringphotomorphogenesis in diploid wheat Triticum monococcum PLoSOne 9 e96855

30 McNallyKL ChildsKL BohnertR DavidsonRM ZhaoKUlatVJ ZellerG ClarkRM HoenDR BureauTE et al (2009)Genomewide SNP variation reveals relationships among landracesand modern varieties of rice Proc Natl Acad Sci USA 10612273ndash12278

31 ZhaoK WrightM KimballJ EizengaG McClungAKovachM TyagiW AliML TungCW ReynoldsA et al (2010)Genomic diversity and introgression in O sativa reveal the impact ofdomestication and breeding on the rice genome PLoS One 5 e10780

32 Tomato Genome Sequencing C AflitosS SchijlenE de JongHde RidderD SmitS FinkersR WangJ ZhangG LiN et al(2014) Exploring genetic variation in the tomato (Solanum sectionLycopersicon) clade by whole-genome sequencing Plant J 80136ndash148

33 ZhengLY GuoXS HeB SunLJ PengY DongSS LiuTFJiangS RamachandranS LiuCM et al (2011) Genome-widepatterns of genetic variation in sweet and grain sorghum (Sorghumbicolor) Genome Biol 12 R114

34 MorrisGP RamuP DeshpandeSP HashCT ShahTUpadhyayaHD Riera-LizarazuO BrownPJ AcharyaCBMitchellSE et al (2013) Population genomic and genome-wideassociation studies of agroclimatic traits in sorghum Proc NatlAcad Sci USA 110 453ndash458

35 MaceES TaiS GildingEK LiY PrentisPJ BianLCampbellBC HuW InnesDJ HanX et al (2013)Whole-genome sequencing reveals untapped genetic potential inAfricarsquos indigenous cereal crop sorghum Nat Commun 4 2320

36 JordanKW WangS LunY GardinerLJ MacLachlanRHuclP WiebeK WongD ForrestKL ConsortiumI et al (2015)

A haplotype map of allohexaploid wheat reveals distinct patterns ofselection on homoeologous genomes Genome Biol 16 48

37 MylesS ChiaJM HurwitzB SimonC ZhongGY BucklerEand WareD (2010) Rapid genomic characterization of the genusvitis PLoS One 5 e8219

38 GoreMA ChiaJM ElshireRJ SunQ ErsozESHurwitzBL PeifferJA McMullenMD GrillsGSRoss-IbarraJ et al (2009) A first-generation haplotype map ofmaize Science 326 1115ndash1117

39 ChiaJM SongC BradburyPJ CostichD de LeonNDoebleyJ ElshireRJ GautB GellerL GlaubitzJC et al (2012)Maize HapMap2 identifies extant variation from a genome in fluxNat Genet 44 803ndash807

40 BrenchleyR SpannaglM PfeiferM BarkerGL DrsquoAmoreRAllenAM McKenzieN KramerM KerhornouA BolserDet al (2012) Analysis of the bread wheat genome using whole-genomeshotgun sequencing Nature 491 705ndash710

41 MascherM MuehlbauerGJ RokhsarDS ChapmanJSchmutzJ BarryK Munoz-AmatriainM CloseTJ WiseRPSchulmanAH et al (2013) Anchoring and ordering NGS contigassemblies by population sequencing (POPSEQ) Plant J 76718ndash727

42 ComadranJ KilianB RussellJ RamsayL SteinN GanalMShawP BayerM ThomasW MarshallD et al (2012) Naturalvariation in a homolog of Antirrhinum CENTRORADIALIScontributed to spring growth habit and environmental adaptation incultivated barley Nat Genet 44 1388ndash1392

43 ChenY CunninghamF RiosD McLarenWM SmithJPritchardB SpudichGM BrentS KuleshaE Marin-GarciaPet al (2010) Ensembl variation resources BMC Genomics 11 293

44 CroftD MundoAF HawR MilacicM WeiserJ WuGCaudyM GarapatiP GillespieM KamdarMR et al (2014) TheReactome pathway knowledgebase Nucleic Acids Res 42D472ndashD477

45 DharmawardhanaP RenL AmarasingheV MonacoMThomasonJ RavenscroftD McCouchS WareD and JaiswalP(2013) A genome scale metabolic network for rice and accompanyinganalysis of tryptophan auxin and serotonin biosynthesis regulationunder biotic stress Rice 6 15

46 WingRA AmmirajuJS LuoM KimH YuY KudrnaDGoicoecheaJL WangW NelsonW RaoK et al (2005) The oryzamap alignment project the golden path to unlocking the geneticpotential of wild rice species Plant Mol Biol 59 53ndash62

47 SakaiH KanamoriH Arai-KichiseY Shibata-HattaMEbanaK OonoY KuritaK FujisawaH KatagiriS MukaiYet al (2014) Construction of pseudomolecule sequences of the ausrice cultivar Kasalath for comparative genomics of Asian cultivatedrice DNA Res 21 397ndash405

48 PetryszakR BurdettT FiorelliB FonsecaNAGonzalez-PortaM HastingsE HuberW JuppS KeaysMKryvychN et al (2014) Expression Atlas updatendasha database of geneand transcript expression from microarray- and sequencing-basedfunctional genomics experiments Nucleic Acids Res 42 D926ndashD932

49 MonacoMK SenTZ DharmawardhanaPD RenLSchaefferM NaithaniS AmarasingheV ThomasonJ HarperLand GardinerJ (2013) Maize metabolic network construction andtranscriptome analysis Plant Genome 6

50 CooperL WallsRL ElserJ GandolfoMA StevensonDWSmithB PreeceJ AthreyaB MungallCJ RensingS et al (2013)The plant ontology as a tool for comparative plant anatomy andgenomic analyses Plant Cell Physiol 54 e1

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

Page 7: Gramene 2016: comparative plant genomics and pathway resources › download › pdf › 33027963.pdf · Gramene 2016: comparative plant genomics and pathway resources Marcela K. Tello-Ruiz1,

Nucleic Acids Research 2016 Vol 44 Database issue D1139

Our monthly webinar series will continue to consider thespecific needs of the plant community addressing trend-ing topics and targeting communities working on particularspecies that would most benefit from direct assistance forworking with Gramenersquos data and resources

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

The authors are grateful to Gramenersquos users researchersand numerous collaborators for sharing data sets generatedin their projects valuable suggestions and feedback on im-proving the overall quality of Gramene as a community re-source We would also like to thank the Cold Spring HarborLaboratory (CSHL) and the Dolan DNA Learning Cen-ter the Center for Genome Research and Biocomputing(CGRB) at Oregon State University and the Ontario In-stitute for Cancer Research (OICR) for infrastructure sup-port We also thank Peter van Buren from Cold Spring Har-bor Laboratory for system administration support DavidCroft (EBI) and Robin Haw (OICR) for support on PlantReactome development undergraduate students Dylan Be-orchia Teague Green and Kindra Amoss (OSU) for theirhelp in pathway curation Samuel Fox and Matthew Geniza(OSU) on gene expression Atlas curation and data analysis

FUNDING

National Science Foundation [IOS-0703908 and IOS-1127112] United States Department of Agriculture - Agri-cultural Research Service [58-1907-4-030 and 1907-21000-030-00D to DW] European Communityrsquos 7th Frame-work Programme (FP72007-2013 Infrastructures) [con-tract 283496 to PK] United Kingdom Biotechnol-ogy and Biosciences Research Council [BBJ000328X1I0080711 and H5315191 to PK] The in-kind infrastruc-ture and intellectual support for the development and run-ning the Plant Reactome is supported by the Reactomedatabase project via a grant from the US National Institutesof Health [P41 HG003751 to LS] EU grant [LSHG-CT-2005-518254] lsquoENFINrsquo Ontario Research Fund EBI In-dustry Programme The funders had no role in the study de-sign data analysis or preparation of the manuscript Fund-ing for open access charge ldquoGramene - Exploring Func-tion through Comparative Genomics and Network Analy-sisrdquo [NSF award 1127112]Conflict of interest statement None declared

REFERENCES1 CunninghamF AmodeMR BarrellD BealK BillisK

BrentS Carvalho-SilvaD ClaphamP CoatesG FitzgeraldSet al (2015) Ensembl 2015 Nucleic Acids Res 43 D662ndashD669

2 CroftD (2013) Building models using Reactome pathways astemplates Methods Mol Biol 1021 273ndash283

3 MonacoMK SteinJ NaithaniS WeiS DharmawardhanaPKumariS AmarasingheV Youens-ClarkK ThomasonJ PreeceJet al (2014) Gramene 2013 comparative plant genomics resourcesNucleic Acids Res 42 D1193ndashD1199

4 SpoonerW Youens-ClarkK StainesD and WareD (2012)GrameneMart the BioMart data portal for the Gramene projectDatabase J Biol Databases Curation bar056

5 Youens-ClarkK BucklerE CasstevensT ChenC DeclerckGDerwentP DharmawardhanaP JaiswalP KerseyPKarthikeyanAS et al (2011) Gramene database in 2010 updatesand extensions Nucleic Acids Res 39 D1085ndashD1094

6 NakamuraY CochraneG Karsch-MizrachiI and InternationalNucleotide Sequence Database C (2013) The internationalnucleotide sequence database collaboration Nucleic Acids Res 41D21ndashD24

7 JacqueminJ BhatiaD SinghK and WingRA (2013) TheInternational Oryza Map Alignment Project development of agenus-wide comparative genomics platform to help solve the 9billion-people question Curr Opin Plant Biol 16 147ndash156

8 International Wheat Genome Sequencing C (2014) Achromosome-based draft sequence of the hexaploid bread wheat(Triticum aestivum) genome Science 345 1251788

9 ChapmanJA MascherM BulucA BarryK GeorganasESessionA StrnadovaV JenkinsJ SehgalS OlikerL et al(2015) A whole-genome shotgun approach for assembling andanchoring the hexaploid bread wheat genome Genome Biol 16 26

10 ParkinIA KohC TangH RobinsonSJ KagaleS ClarkeWETownCD NixonJ KrishnakumarV BidwellSL et al (2014)Transcriptome and methylome profiling reveals relics of genomedominance in the mesopolyploid Brassica oleracea Genome Biol 15R77

11 MotamayorJC MockaitisK SchmutzJ HaiminenNLivingstoneD 3rd CornejoO FindleySD ZhengP UtroFRoyaertS et al (2013) The genome sequence of the most widelycultivated cacao type and its use to identify candidate genesregulating pod color Genome Biol 14 r53

12 International Peach Genome I VerdeI AbbottAG ScalabrinSJungS ShuS MarroniF ZhebentyayevaT DettoriMTGrimwoodJ et al (2013) The high-quality draft genome of peach(Prunus persica) identifies unique patterns of genetic diversitydomestication and genome evolution Nat Genet 45 487ndash494

13 Amborella GenomeP (2013) The Amborella genome and theevolution of flowering plants Science 342 1241089

14 ChamalaS ChanderbaliAS DerJP LanT WaltsBAlbertVA dePamphilisCW Leebens-MackJ RounsleySSchusterSC et al (2013) Assembly and validation of the genome ofthe nonmodel basal angiosperm Amborella Science 342 1516ndash1517

15 PalenikB GrimwoodJ AertsA RouzeP SalamovAPutnamN DupontC JorgensenR DerelleE RombautsS et al(2007) The tiny eukaryote Ostreococcus provides genomic insightsinto the paradox of plankton speciation Proc Natl Acad SciUSA 104 7705ndash7710

16 KerseyPJ StainesDM LawsonD KuleshaE DerwentPHumphreyJC HughesDS KeenanS KerhornouAKoscielnyG et al (2012) Ensembl Genomes an integrative resourcefor genome-scale data from non-vertebrate species Nucleic AcidsRes 40 D91ndashD97

17 KentWJ BaertschR HinrichsA MillerW and HausslerD(2003) Evolutionrsquos cauldron duplication deletion andrearrangement in the mouse and human genomes Proc Natl AcadSci USA 100 11484ndash11489

18 HarrisRS (2007) Improved pairwise alignment of genomic DNA PhDThesis Pennsylvania State University

19 VilellaAJ SeverinJ Ureta-VidalA HengL DurbinR andBirneyE (2009) EnsemblCompara GeneTrees completeduplication-aware phylogenetic trees in vertebrates GenomeRes 19 327ndash335

20 ErhardKF Jr TalbotJE DeansNC McClishAE andHollickJB (2015) Nascent transcription affected by RNApolymerase IV in Zea mays Genetics 199 1107ndash1125

21 RegulskiM LuZ KendallJ DonoghueMT ReindersJLlacaV DeschampsS SmithA LevyD McCombieWR et al(2013) The maize methylome influences mRNA splice sites andreveals widespread paramutation-like switches guided by small RNAGenome Res 23 1651ndash1662

22 CampbellMS LawM HoltC SteinJC MogheGDHufnagelDE LeiJ AchawanantakunR JiaoD LawrenceCJet al (2014) MAKER-P a tool kit for the rapid creation

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

D1140 Nucleic Acids Research 2016 Vol 44 Database issue

management and quality control of plant genome annotations PlantPhysiol 164 513ndash524

23 LawM ChildsKL CampbellMS SteinJC OlsonAJ HoltCPanchyN LeiJ JiaoD AndorfCM et al (2015) Automatedupdate revision and quality control of the maize genomeannotations using MAKER-P improves the B73 RefGen v3 genemodels and identifies new genes Plant Physiol 167 25ndash39

24 ClarkRM SchweikertG ToomajianC OssowskiS ZellerGShinnP WarthmannN HuTT FuG HindsDA et al (2007)Common sequence polymorphisms shaping genetic diversity inArabidopsis thaliana Science 317 338ndash342

25 AtwellS HuangYS VilhjalmssonBJ WillemsG HortonMLiY MengD PlattA TaroneAM HuTT et al (2010)Genome-wide association study of 107 phenotypes in Arabidopsisthaliana inbred lines Nature 465 627ndash631

26 GanX StegleO BehrJ SteffenJG DreweP HildebrandKLLyngsoeR SchultheissSJ OsborneEJ SreedharanVT et al(2011) Multiple reference genomes and transcriptomes forArabidopsis thaliana Nature 477 419ndash423

27 FoxSE PreeceJ KimbrelJA MarchiniGL SageAYouens-ClarkK CruzanMB and JaiswalP (2013) Sequencing andde novo transcriptome assembly of Brachypodium sylvaticum(Poaceae) Appl Plant Sci 1

28 International Barley Genome Sequencing C MayerKFWaughR BrownJW SchulmanA LangridgeP PlatzerMFincherGB MuehlbauerGJ SatoK et al (2012) A physicalgenetic and functional sequence assembly of the barley genomeNature 491 711ndash716

29 FoxSE GenizaM HanumappaM NaithaniS SullivanCPreeceJ TiwariVK ElserJ LeonardJM SageA et al (2014) Denovo transcriptome assembly and analyses of gene expression duringphotomorphogenesis in diploid wheat Triticum monococcum PLoSOne 9 e96855

30 McNallyKL ChildsKL BohnertR DavidsonRM ZhaoKUlatVJ ZellerG ClarkRM HoenDR BureauTE et al (2009)Genomewide SNP variation reveals relationships among landracesand modern varieties of rice Proc Natl Acad Sci USA 10612273ndash12278

31 ZhaoK WrightM KimballJ EizengaG McClungAKovachM TyagiW AliML TungCW ReynoldsA et al (2010)Genomic diversity and introgression in O sativa reveal the impact ofdomestication and breeding on the rice genome PLoS One 5 e10780

32 Tomato Genome Sequencing C AflitosS SchijlenE de JongHde RidderD SmitS FinkersR WangJ ZhangG LiN et al(2014) Exploring genetic variation in the tomato (Solanum sectionLycopersicon) clade by whole-genome sequencing Plant J 80136ndash148

33 ZhengLY GuoXS HeB SunLJ PengY DongSS LiuTFJiangS RamachandranS LiuCM et al (2011) Genome-widepatterns of genetic variation in sweet and grain sorghum (Sorghumbicolor) Genome Biol 12 R114

34 MorrisGP RamuP DeshpandeSP HashCT ShahTUpadhyayaHD Riera-LizarazuO BrownPJ AcharyaCBMitchellSE et al (2013) Population genomic and genome-wideassociation studies of agroclimatic traits in sorghum Proc NatlAcad Sci USA 110 453ndash458

35 MaceES TaiS GildingEK LiY PrentisPJ BianLCampbellBC HuW InnesDJ HanX et al (2013)Whole-genome sequencing reveals untapped genetic potential inAfricarsquos indigenous cereal crop sorghum Nat Commun 4 2320

36 JordanKW WangS LunY GardinerLJ MacLachlanRHuclP WiebeK WongD ForrestKL ConsortiumI et al (2015)

A haplotype map of allohexaploid wheat reveals distinct patterns ofselection on homoeologous genomes Genome Biol 16 48

37 MylesS ChiaJM HurwitzB SimonC ZhongGY BucklerEand WareD (2010) Rapid genomic characterization of the genusvitis PLoS One 5 e8219

38 GoreMA ChiaJM ElshireRJ SunQ ErsozESHurwitzBL PeifferJA McMullenMD GrillsGSRoss-IbarraJ et al (2009) A first-generation haplotype map ofmaize Science 326 1115ndash1117

39 ChiaJM SongC BradburyPJ CostichD de LeonNDoebleyJ ElshireRJ GautB GellerL GlaubitzJC et al (2012)Maize HapMap2 identifies extant variation from a genome in fluxNat Genet 44 803ndash807

40 BrenchleyR SpannaglM PfeiferM BarkerGL DrsquoAmoreRAllenAM McKenzieN KramerM KerhornouA BolserDet al (2012) Analysis of the bread wheat genome using whole-genomeshotgun sequencing Nature 491 705ndash710

41 MascherM MuehlbauerGJ RokhsarDS ChapmanJSchmutzJ BarryK Munoz-AmatriainM CloseTJ WiseRPSchulmanAH et al (2013) Anchoring and ordering NGS contigassemblies by population sequencing (POPSEQ) Plant J 76718ndash727

42 ComadranJ KilianB RussellJ RamsayL SteinN GanalMShawP BayerM ThomasW MarshallD et al (2012) Naturalvariation in a homolog of Antirrhinum CENTRORADIALIScontributed to spring growth habit and environmental adaptation incultivated barley Nat Genet 44 1388ndash1392

43 ChenY CunninghamF RiosD McLarenWM SmithJPritchardB SpudichGM BrentS KuleshaE Marin-GarciaPet al (2010) Ensembl variation resources BMC Genomics 11 293

44 CroftD MundoAF HawR MilacicM WeiserJ WuGCaudyM GarapatiP GillespieM KamdarMR et al (2014) TheReactome pathway knowledgebase Nucleic Acids Res 42D472ndashD477

45 DharmawardhanaP RenL AmarasingheV MonacoMThomasonJ RavenscroftD McCouchS WareD and JaiswalP(2013) A genome scale metabolic network for rice and accompanyinganalysis of tryptophan auxin and serotonin biosynthesis regulationunder biotic stress Rice 6 15

46 WingRA AmmirajuJS LuoM KimH YuY KudrnaDGoicoecheaJL WangW NelsonW RaoK et al (2005) The oryzamap alignment project the golden path to unlocking the geneticpotential of wild rice species Plant Mol Biol 59 53ndash62

47 SakaiH KanamoriH Arai-KichiseY Shibata-HattaMEbanaK OonoY KuritaK FujisawaH KatagiriS MukaiYet al (2014) Construction of pseudomolecule sequences of the ausrice cultivar Kasalath for comparative genomics of Asian cultivatedrice DNA Res 21 397ndash405

48 PetryszakR BurdettT FiorelliB FonsecaNAGonzalez-PortaM HastingsE HuberW JuppS KeaysMKryvychN et al (2014) Expression Atlas updatendasha database of geneand transcript expression from microarray- and sequencing-basedfunctional genomics experiments Nucleic Acids Res 42 D926ndashD932

49 MonacoMK SenTZ DharmawardhanaPD RenLSchaefferM NaithaniS AmarasingheV ThomasonJ HarperLand GardinerJ (2013) Maize metabolic network construction andtranscriptome analysis Plant Genome 6

50 CooperL WallsRL ElserJ GandolfoMA StevensonDWSmithB PreeceJ AthreyaB MungallCJ RensingS et al (2013)The plant ontology as a tool for comparative plant anatomy andgenomic analyses Plant Cell Physiol 54 e1

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from

Page 8: Gramene 2016: comparative plant genomics and pathway resources › download › pdf › 33027963.pdf · Gramene 2016: comparative plant genomics and pathway resources Marcela K. Tello-Ruiz1,

D1140 Nucleic Acids Research 2016 Vol 44 Database issue

management and quality control of plant genome annotations PlantPhysiol 164 513ndash524

23 LawM ChildsKL CampbellMS SteinJC OlsonAJ HoltCPanchyN LeiJ JiaoD AndorfCM et al (2015) Automatedupdate revision and quality control of the maize genomeannotations using MAKER-P improves the B73 RefGen v3 genemodels and identifies new genes Plant Physiol 167 25ndash39

24 ClarkRM SchweikertG ToomajianC OssowskiS ZellerGShinnP WarthmannN HuTT FuG HindsDA et al (2007)Common sequence polymorphisms shaping genetic diversity inArabidopsis thaliana Science 317 338ndash342

25 AtwellS HuangYS VilhjalmssonBJ WillemsG HortonMLiY MengD PlattA TaroneAM HuTT et al (2010)Genome-wide association study of 107 phenotypes in Arabidopsisthaliana inbred lines Nature 465 627ndash631

26 GanX StegleO BehrJ SteffenJG DreweP HildebrandKLLyngsoeR SchultheissSJ OsborneEJ SreedharanVT et al(2011) Multiple reference genomes and transcriptomes forArabidopsis thaliana Nature 477 419ndash423

27 FoxSE PreeceJ KimbrelJA MarchiniGL SageAYouens-ClarkK CruzanMB and JaiswalP (2013) Sequencing andde novo transcriptome assembly of Brachypodium sylvaticum(Poaceae) Appl Plant Sci 1

28 International Barley Genome Sequencing C MayerKFWaughR BrownJW SchulmanA LangridgeP PlatzerMFincherGB MuehlbauerGJ SatoK et al (2012) A physicalgenetic and functional sequence assembly of the barley genomeNature 491 711ndash716

29 FoxSE GenizaM HanumappaM NaithaniS SullivanCPreeceJ TiwariVK ElserJ LeonardJM SageA et al (2014) Denovo transcriptome assembly and analyses of gene expression duringphotomorphogenesis in diploid wheat Triticum monococcum PLoSOne 9 e96855

30 McNallyKL ChildsKL BohnertR DavidsonRM ZhaoKUlatVJ ZellerG ClarkRM HoenDR BureauTE et al (2009)Genomewide SNP variation reveals relationships among landracesand modern varieties of rice Proc Natl Acad Sci USA 10612273ndash12278

31 ZhaoK WrightM KimballJ EizengaG McClungAKovachM TyagiW AliML TungCW ReynoldsA et al (2010)Genomic diversity and introgression in O sativa reveal the impact ofdomestication and breeding on the rice genome PLoS One 5 e10780

32 Tomato Genome Sequencing C AflitosS SchijlenE de JongHde RidderD SmitS FinkersR WangJ ZhangG LiN et al(2014) Exploring genetic variation in the tomato (Solanum sectionLycopersicon) clade by whole-genome sequencing Plant J 80136ndash148

33 ZhengLY GuoXS HeB SunLJ PengY DongSS LiuTFJiangS RamachandranS LiuCM et al (2011) Genome-widepatterns of genetic variation in sweet and grain sorghum (Sorghumbicolor) Genome Biol 12 R114

34 MorrisGP RamuP DeshpandeSP HashCT ShahTUpadhyayaHD Riera-LizarazuO BrownPJ AcharyaCBMitchellSE et al (2013) Population genomic and genome-wideassociation studies of agroclimatic traits in sorghum Proc NatlAcad Sci USA 110 453ndash458

35 MaceES TaiS GildingEK LiY PrentisPJ BianLCampbellBC HuW InnesDJ HanX et al (2013)Whole-genome sequencing reveals untapped genetic potential inAfricarsquos indigenous cereal crop sorghum Nat Commun 4 2320

36 JordanKW WangS LunY GardinerLJ MacLachlanRHuclP WiebeK WongD ForrestKL ConsortiumI et al (2015)

A haplotype map of allohexaploid wheat reveals distinct patterns ofselection on homoeologous genomes Genome Biol 16 48

37 MylesS ChiaJM HurwitzB SimonC ZhongGY BucklerEand WareD (2010) Rapid genomic characterization of the genusvitis PLoS One 5 e8219

38 GoreMA ChiaJM ElshireRJ SunQ ErsozESHurwitzBL PeifferJA McMullenMD GrillsGSRoss-IbarraJ et al (2009) A first-generation haplotype map ofmaize Science 326 1115ndash1117

39 ChiaJM SongC BradburyPJ CostichD de LeonNDoebleyJ ElshireRJ GautB GellerL GlaubitzJC et al (2012)Maize HapMap2 identifies extant variation from a genome in fluxNat Genet 44 803ndash807

40 BrenchleyR SpannaglM PfeiferM BarkerGL DrsquoAmoreRAllenAM McKenzieN KramerM KerhornouA BolserDet al (2012) Analysis of the bread wheat genome using whole-genomeshotgun sequencing Nature 491 705ndash710

41 MascherM MuehlbauerGJ RokhsarDS ChapmanJSchmutzJ BarryK Munoz-AmatriainM CloseTJ WiseRPSchulmanAH et al (2013) Anchoring and ordering NGS contigassemblies by population sequencing (POPSEQ) Plant J 76718ndash727

42 ComadranJ KilianB RussellJ RamsayL SteinN GanalMShawP BayerM ThomasW MarshallD et al (2012) Naturalvariation in a homolog of Antirrhinum CENTRORADIALIScontributed to spring growth habit and environmental adaptation incultivated barley Nat Genet 44 1388ndash1392

43 ChenY CunninghamF RiosD McLarenWM SmithJPritchardB SpudichGM BrentS KuleshaE Marin-GarciaPet al (2010) Ensembl variation resources BMC Genomics 11 293

44 CroftD MundoAF HawR MilacicM WeiserJ WuGCaudyM GarapatiP GillespieM KamdarMR et al (2014) TheReactome pathway knowledgebase Nucleic Acids Res 42D472ndashD477

45 DharmawardhanaP RenL AmarasingheV MonacoMThomasonJ RavenscroftD McCouchS WareD and JaiswalP(2013) A genome scale metabolic network for rice and accompanyinganalysis of tryptophan auxin and serotonin biosynthesis regulationunder biotic stress Rice 6 15

46 WingRA AmmirajuJS LuoM KimH YuY KudrnaDGoicoecheaJL WangW NelsonW RaoK et al (2005) The oryzamap alignment project the golden path to unlocking the geneticpotential of wild rice species Plant Mol Biol 59 53ndash62

47 SakaiH KanamoriH Arai-KichiseY Shibata-HattaMEbanaK OonoY KuritaK FujisawaH KatagiriS MukaiYet al (2014) Construction of pseudomolecule sequences of the ausrice cultivar Kasalath for comparative genomics of Asian cultivatedrice DNA Res 21 397ndash405

48 PetryszakR BurdettT FiorelliB FonsecaNAGonzalez-PortaM HastingsE HuberW JuppS KeaysMKryvychN et al (2014) Expression Atlas updatendasha database of geneand transcript expression from microarray- and sequencing-basedfunctional genomics experiments Nucleic Acids Res 42 D926ndashD932

49 MonacoMK SenTZ DharmawardhanaPD RenLSchaefferM NaithaniS AmarasingheV ThomasonJ HarperLand GardinerJ (2013) Maize metabolic network construction andtranscriptome analysis Plant Genome 6

50 CooperL WallsRL ElserJ GandolfoMA StevensonDWSmithB PreeceJ AthreyaB MungallCJ RensingS et al (2013)The plant ontology as a tool for comparative plant anatomy andgenomic analyses Plant Cell Physiol 54 e1

at Cold Spring H

arbor Laboratory on M

arch 25 2016httpnaroxfordjournalsorg

Dow

nloaded from