the arabidopsis information resource: making and mining the … · 2017-11-18 · associate genes...

12
RESEARCH ARTICLE The Arabidopsis Information Resource: Making and Mining the “Gold Standard” Annotated Reference Plant Genome Tanya Z. Berardini,* Leonore Reiser, Donghui Li, Yarik Mezheritsky, Robert Muller, Emily Strait, and Eva Huala Phoenix Bioinformatics, Redwood City, California Received 8 April 2015; Revised 15 July 2015; Accepted 15 July 2015 Summary: The Arabidopsis Information Resource (TAIR) is a continuously updated, online database of genetic and molecular biology data for the model plant Arabi- dopsis thaliana that provides a global research commu- nity with centralized access to data for over 30,000 Arabidopsis genes. TAIR’s biocurators systematically extract, organize, and interconnect experimental data from the literature along with computational predictions, community submissions, and high throughput datasets to present a high quality and comprehensive picture of Arabidopsis gene function. TAIR provides tools for data visualization and analysis, and enables ordering of seed and DNA stocks, protein chips, and other experimental resources. TAIR actively engages with its users who contribute expertise and data that augments the work of the curatorial staff. TAIR’s focus in an extensive and evolving ecosystem of online resources for plant biology is on the critically important role of extracting experi- mentally based research findings from the literature and making that information computationally accessible. In response to the loss of government grant funding, the TAIR team founded a nonprofit entity, Phoenix Bioinfor- matics, with the aim of developing sustainable funding models for biological databases, using TAIR as a test case. Phoenix has successfully transitioned TAIR to subscription-based funding while still keeping its data relatively open and accessible. genesis 53:474–485, 2015. V C 2015 Wiley Periodicals, Inc. Key words: genome database; plant; sustainable funding INTRODUCTION Arabidopsis thaliana is an annual dicotyledonous plant that serves as a model plant for thousands of research- ers across the globe. The Arabidopsis Information Resource (TAIR) is a curated online database of genetic and molecular biology data for A. thaliana (Lamesch et al., 2012). Arabidopsis is an attractive model orga- nism because of its small genome size, experimental tractability, and short generation time (Koornneef and Meinke 2010; Somerville and Koornneef, 2002). Arabi- dopsis was the first plant genome to be sequenced, and subsequently the subject of a concerted effort to under- stand the functions of the more than 30,000 genes iden- tified to date (Arabidopsis Genome Initiative, 2000); https://www.arabidopsis.org/portals/masc/projects. jsp). Decades of research have produced an expansive collection of data about Arabidopsis that serves as a ref- erence for understanding plant gene functions and ulti- mately unraveling mechanisms of plant physiology, biochemistry, and development. TAIR continues to grow and evolve along with its research community, data, and knowledge about the organism. TAIR was launched in 1999 just prior to the release of the public genome sequence and rapidly became an indispensable resource for thousands of plant biology researchers around the world (Huala et al., 2001). In 2014, TAIR registered over 2.1 million visits with an average of 178,000 visits and 61,000 users per month (Google Analytics, accessed March 1, 2015). Visits originated from around the world with Asia *Correspondence to: Tanya Z. Berardini, Phoenix Bioinformatics, 643 Bair Island Road, Suite 403, Redwood City, CA 94063, USA. E-mail: [email protected] Contract grant sponsor: National Human Genome Research Institute (NHGRI) of the National Institutes of Health, Contract grant number: U41HG002273 Published online 4 August 2015 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/dvg.22877 V C 2015 Wiley Periodicals, Inc. genesis 53:474–485 (2015)

Upload: others

Post on 30-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The arabidopsis information resource: Making and mining the … · 2017-11-18 · associate genes to research articles (Yoo et al., 2006). Curators read the abstracts of all new articles,

RESEARCH ARTICLE

The Arabidopsis Information Resource: Making and Miningthe “Gold Standard” Annotated Reference Plant Genome

Tanya Z. Berardini,* Leonore Reiser, Donghui Li, Yarik Mezheritsky, Robert Muller,Emily Strait, and Eva Huala

Phoenix Bioinformatics, Redwood City, California

Received 8 April 2015; Revised 15 July 2015; Accepted 15 July 2015

Summary: The Arabidopsis Information Resource (TAIR)is a continuously updated, online database of geneticand molecular biology data for the model plant Arabi-dopsis thaliana that provides a global research commu-nity with centralized access to data for over 30,000Arabidopsis genes. TAIR’s biocurators systematicallyextract, organize, and interconnect experimental datafrom the literature along with computational predictions,community submissions, and high throughput datasetsto present a high quality and comprehensive picture ofArabidopsis gene function. TAIR provides tools for datavisualization and analysis, and enables ordering of seedand DNA stocks, protein chips, and other experimentalresources. TAIR actively engages with its users whocontribute expertise and data that augments the work ofthe curatorial staff. TAIR’s focus in an extensive andevolving ecosystem of online resources for plant biologyis on the critically important role of extracting experi-mentally based research findings from the literature andmaking that information computationally accessible. Inresponse to the loss of government grant funding, theTAIR team founded a nonprofit entity, Phoenix Bioinfor-matics, with the aim of developing sustainable fundingmodels for biological databases, using TAIR as a testcase. Phoenix has successfully transitioned TAIR tosubscription-based funding while still keeping its datarelatively open and accessible. genesis 53:474–485,2015. VC 2015 Wiley Periodicals, Inc.

Key words: genome database; plant; sustainable funding

INTRODUCTION

Arabidopsis thaliana is an annual dicotyledonous plantthat serves as a model plant for thousands of research-ers across the globe. The Arabidopsis Information

Resource (TAIR) is a curated online database of geneticand molecular biology data for A. thaliana (Lameschet al., 2012). Arabidopsis is an attractive model orga-nism because of its small genome size, experimentaltractability, and short generation time (Koornneef andMeinke 2010; Somerville and Koornneef, 2002). Arabi-dopsis was the first plant genome to be sequenced, andsubsequently the subject of a concerted effort to under-stand the functions of the more than 30,000 genes iden-tified to date (Arabidopsis Genome Initiative, 2000);https://www.arabidopsis.org/portals/masc/projects.jsp). Decades of research have produced an expansivecollection of data about Arabidopsis that serves as a ref-erence for understanding plant gene functions and ulti-mately unraveling mechanisms of plant physiology,biochemistry, and development.

TAIR continues to grow and evolve along with itsresearch community, data, and knowledge about theorganism. TAIR was launched in 1999 just prior to therelease of the public genome sequence and rapidlybecame an indispensable resource for thousands ofplant biology researchers around the world (Hualaet al., 2001). In 2014, TAIR registered over 2.1 millionvisits with an average of 178,000 visits and 61,000 usersper month (Google Analytics, accessed March 1, 2015).Visits originated from around the world with Asia

* Correspondence to: Tanya Z. Berardini, Phoenix Bioinformatics, 643

Bair Island Road, Suite 403, Redwood City, CA 94063, USA.E-mail: [email protected]

Contract grant sponsor: National Human Genome Research Institute

(NHGRI) of the National Institutes of Health, Contract grant number:

U41HG002273Published online 4 August 2015 in

Wiley Online Library (wileyonlinelibrary.com).

DOI: 10.1002/dvg.22877

VC 2015 Wiley Periodicals, Inc. genesis 53:474–485 (2015)

Page 2: The arabidopsis information resource: Making and mining the … · 2017-11-18 · associate genes to research articles (Yoo et al., 2006). Curators read the abstracts of all new articles,

accounting for 42%, Europe 28%, and the Americas 27%.As of March 2015, the number of registered TAIR userswas 27,500, with about 38% of these considered to behighly active. Over the years the underlying data pro-vided by the resource has grown in both quantity andquality. The main activity of TAIR is producing a “goldstandard” functional annotation for this important refer-ence plant in order to make experimental data maximallydiscoverable and computable by the research commu-nity. That activity was significantly curtailed in recentyears due to the loss of TAIR’s main funding source fromthe National Science Foundation. The most significantchange in TAIR has been the transition from grant fund-ing to subscription support from institutions and individ-uals worldwide. This new, international, sustainablefunding model provides increased stability over tradi-tional grant funding and ensures that TAIR can continueits mission to provide highest quality Arabidopsisgenome information to the research community.

GOLD STANDARD ANNOTATION OF THEARABIDOPSIS GENOME

A rigorously annotated reference genome dataset isessential for making inferences, producing accuratemodels, and generating testable hypotheses about genefunctions in Arabidopsis and other plant genomes. Oneof the most common uses of TAIR is to extrapolate thefunction of genes in agriculturally important speciesbased on orthology to Arabidopsis genes. Biologicalroles can be inferred with greater degree of confidenceif the evidence supporting the assertion for the refer-ence gene is experimentally based rather than the resultof a computational prediction. Thus, the main activityof TAIR curators is integrating experimental data fromthe research literature to produce high quality annota-tions that enable functional and comparative genomics(Lamesch et al., 2010).

TAIR curators extract and organize a variety of datafrom the peer-reviewed literature (Table 1). Curated datainclude, but are not limited to, gene function informationcaptured in the form of Gene Ontology (GO) annotations(Berardini et al., 2004; Gene Ontology Consortium et al.,2013), gene expression information in the form of PlantOntology (PO) annotations (Cooper et al., 2013), genesymbol and full names, alleles, phenotypes and germ-plasm information, and publications. These data are care-fully curated according to exacting standards and arepresented in a structured way that reflects actual biologi-cal relationships. When community members submitdata to TAIR, a curator reviews the submission and per-forms standard quality control checks before incorpora-tion of this data into the resource.

Because the Arabidopsis literature is extensive andTAIR’s curation resources are limited, we established atriage system that enables us to focus our literature

curation efforts on articles with novel or high impactresults (Li et al., 2012). Each month, approximately 350new articles indexed by PubMed contain the word“Arabidopsis” in the title, abstract, or MeSH headings.Of those, about 60% contain information about one ormore Arabidopsis genes. TAIR curators use a semi-automated entity recognition and linking method toassociate genes to research articles (Yoo et al., 2006).Curators read the abstracts of all new articles, reviewcomputationally generated links, and validate, invali-date, or manually add new links between genes andarticles. As a result, the growing literature corpus isaccurately associated to individual genes and becomesaccessible from the corresponding gene detail pages. Asubset of these articles, such as those that containexperimental results on gene function for previouslyuncharacterized genes, is read in depth and experimen-tally derived results are added to TAIR in a structuredmanner.

Gene Function

Gene function information is captured in the form ofGO annotations that describe the molecular function,biological role, and subcellular localization of geneproducts (The Gene Ontology Consortium et al., 2000).The ontologies themselves are controlled vocabulariesthat are structured to accurately represent biology andare continuously reviewed and updated. TAIR curatorsplay an important role in maintaining the ontologiesand ensure that plant biology terms and relations areincorporated correctly. TAIR is the main source of man-ual GO annotations for Arabidopsis. Since 2001 TAIRhas created tens of thousands of annotations from theliterature. In the current set of valid annotations, TAIR’scontribution includes 91,445 manual GO annotationsto 18,932 distinct Arabidopsis gene products

Table 1Literature-Derived, Curated Data Added to TAIR Since August 31,

2013 (data as of June 15, 2015)

Data type Number added or updated

Articles 7,428Gene symbols 1,288Genes linked to articles 10,818Articles linked to genes 4,019Articles used for

experimentally-supportedGO or PO annotations

432 (TAIR curators 1

community)*215 (TAIR curators only)

Experimentally supportedGO and PO annotations

2,870 (TAIR curators 1community)*

1,120 (TAIR curators only)Alleles from the literature 150Phenotypes from the

literature90

*All community submissions are reviewed by a TAIR curatorprior to incorporation into the database. Occasionally, additionalinformation that supplements or clarifies the user’s submission isadded during the review process.

TAIR: MAKING AND MINING THE “GOLD STANDARD” PLANT GENOME 475

Page 3: The arabidopsis information resource: Making and mining the … · 2017-11-18 · associate genes to research articles (Yoo et al., 2006). Curators read the abstracts of all new articles,

accounting for 80.6% of all publically available experi-mentally based (i.e., manual) GO annotations for A. thali-

ana (http://www.ebi.ac.uk/GOA/arabidopsis_release,GOA Arabidopsis (version 118), released on 27 May,2015). Annotations are curated according to a rigorouslydefined set of standards that have been extensively dis-cussed and documented by the GO consortium (Hillet al., 2008). In addition to the annotation done by TAIRcurators, we integrate manual A. thaliana annotationsfrom UniProtKB, the GO Consortium, and contributionsfrom our research community to present a unified viewof Arabidopsis gene function. TAIR also incorporatescomputationally generated annotations from UniProtthat rely on InterPro domain mapping (Burge et al.,2012) as well as in-house computational inferencesbased on sequence features. The computational annota-tions are especially valuable in the case where no otherinformation is available from the research literature.Computationally predicted gene function annotationsare removed from display as new experimentally vali-dated annotations are added, or when more accurate andup-to-date computational predictions become available.Researchers can filter annotations based on evidence toselect only those annotations that are supported byexperiment.

Alleles and Phenotypes

The Arabidopsis community has generated an exten-sive toolkit of genetic resources including numerousmutant collections. Tens of thousands of insertion lines,knockdowns, point mutations, over-expressors andmore have been donated to the stock centers and manyof the insertion sites have been mapped to the genome(Alonso et al., 2003; Rosso et al., 2003; Sessions et al.,2002; Till et al., 2003; Woody et al., 2007). Many of thesealleles have been characterized experimentally anddescribed in the literature. TAIR adds value to thesegenetic resources by gathering phenotype descriptionsfrom the literature and linking that information to theallele record in TAIR, making phenotype data moreaccessible to researchers. For example, if a researcherreports the phenotype of a T-DNA insertion line gener-ated by the Salk T-DNA Express project, TAIR curatorswill associate information about the molecular pheno-type of the insertion (e.g., verified to be a null allele) aswell as the mutant plant phenotype. Phenotype informa-tion is summarized in the form of free text descriptionsthat are searchable in the database. Alleles and pheno-types are linked to the source publications so thatresearchers can immediately find the relevant references.

Gene Expression

Researchers wishing to mine Arabidopsis geneexpression data in TAIR can access data from differenttypes of experiments ranging from large-scale, genome

wide expression studies of whole plants, tissues andsingle cells to in situ expression of single genes usingprobes or reporter genes. We integrate gene expressiondata using PO annotations. The PO terms describe plantstructures from the whole plant down to individualcells and plant growth and developmental stages(whole plant and plant parts) (Jaiswal et al., 2005). Aswith the GO annotations, the PO annotations facilitatecross experiment and cross species comparisons(Cooper et al., 2013). As part of our literature curationworkflow, we annotate gene expression patternsreported in publications. This often adds significantgranularity to what might be known from studies usingmicroarrays or RNA-seq data. For example, a geneshown to be expressed in flowers in the AtGenExpresstool (Schmid et al., 2005) might be shown to have verytissue-specific expression (e.g., only ovules) whenexamined by in situ hybridization. TAIR captures anddisplays both of these results to present a more detailedexpression profile. Literature curation adds granularityand makes experimental resources more visible byincluding information about the type of experimentsupporting the annotation. If a gene fusion is used tolocalize expression that information will be part of theannotation so a researcher can quickly determinewhom to contact to obtain the construct.

Genome organization and structure; then andnow

The reference genome annotation for the original A.

thaliana Col-0 genome sequence has been periodicallyupdated to incorporate new experimental data on geneexpression, update gene structures, and add splice var-iants and newly discovered genes. From 2005 to 2010,TAIR was responsible for updating both gene structuresand gene function for the standard Arabidopsis genomerelease, available from NCBI’s RefSeq and many otherresources. Using a combination of computational meth-ods and manual review, TAIR produced a total of fivegenome releases with the most recent (TAIR10) madepublic in November 2010 (https://www.arabidopsis.org/portals/genAnnotation/gene_structural_annota-tion/annotation_data.jsp#data) (Lamesch et al., 2012;Swarbreck et al., 2008;). In 2013, the newly funded Ara-port project took on the responsibility to provide addi-tional Arabidopsis genome releases (Araport, www.araport.org) (Krishnakumar et al., 2015). The Araportgenome releases will be made available from TAIR alongwith data on tagged insertion sites, protein features likepredicted subcellular localization and domains, andorthology data from other organisms. As we have donewith previous releases generated by TAIR, we will alsoproduce a set of custom sequence datasets based onthe Araport genome release for our sequence analysistools.

476 BERARDINI ET AL.

Page 4: The arabidopsis information resource: Making and mining the … · 2017-11-18 · associate genes to research articles (Yoo et al., 2006). Curators read the abstracts of all new articles,

Gene Nomenclature

Each locus in Arabidopsis is assigned a unique identi-fier, termed the AGI locus code (AGI, ArabidopsisGenome Initiative) which consists of the prefix At, fol-lowed by the chromosome identifier (1–5 or M or C) fol-lowed by g for gene and then a unique 5 digit number(e.g. At2g46340). Typically, authors of research articlesrefer to a gene using a gene symbol rather than the AGIlocus identifier. However, gene symbols are not uniqueidentifiers for Arabidopsis genes; in many cases onegene symbol refers to two or more genes or the samegene has been given two or more gene symbols by dif-ferent research groups. This creates challenges for bothresearchers and curators who want to search for or addnew information to a specific gene. TAIR curators some-times expend considerable effort to resolve thesenomenclature issues and correctly link genes to publica-tions or other information. In cases where the authordoes not explicitly include the unique AGI locus identi-fier in an article, curators must often undertake somedetective work to infer which identifier should be asso-ciated to the new gene symbol and publication.

TAIR also maintains a community registry for genesymbol nomenclature (https://www.arabidopsis.org/portals/nomenclature/symbol_main.jsp) that was estab-lished to minimize duplication of gene symbols andthus make it easier to unambiguously identify genes inarticles. Researchers can use this registry to ensure thatthe symbol they wish to use for a new gene has notalready been used for a different gene, and reserve it fora future publication. Researchers who have discovereda new gene that lacks an AGI locus identifier in themost recent genome release should contact the Araportproject at https://www.araport.org/contact to have oneassigned. The newly assigned identifier should beincluded in abstracts and publications discussing thenew gene.

SEARCHING, BROWSING, ANDDOWNLOADING DATA

TAIR provides basic and advanced search tools for eachof the various data types in the database (https://www.arabidopsis.org/servlets/Search?type5general&action5

new_search). The header at the top of every page fea-tures a simple name search that can be used to searchfor specific types of data such as genes or clones. Forexample, a researcher looking for information about aspecific locus can simply type the AGI locus identifieror symbolic name into the search box, and chooseGene from the drop down menu. For each data typethere are also advanced search tools that include addi-tional relevant search parameters such as locationwithin the genome, associated keywords, and muta-gens. Thus, a researcher interested in cell wall biosyn-

thesis could use the advanced Gene search to identifyall Arabidopsis genes annotated to this process basedon experimental evidence (https://www.arabidopsis.org/servlets/Search?action5new_search&type5gene).This would be done by entering the keyword cell wallbiosynthesis into the keyword field, choosing“contains” search and selecting one or more types ofexperimental evidence (inferred from direct assay,inferred from expression pattern, inferred fromgenetic interaction, inferred from physical interaction,and inferred from mutant phenotype). The results canthen be downloaded as a list or browsed individually.

There are several entry points for browsing differenttypes of data. The keyword browser can be used to findannotations, loci, and publications associated to GOor PO terms (https://www.arabidopsis.org/servlets/Search?action5new_search&type5keyword). Users withquestions about nomenclature and gene families canbrowse the registry of gene class symbols (https://www.arabidopsis.org/servlets/processor?type5genesymbol&update_action5view_symbol) or community-generatedgene family pages (https://www.arabidopsis.org/browse/genefamily/index.jsp).

Researchers wishing to retrieve large datasets, suchas all GO annotations or upstream sequences for a setof co-expressed genes, can use the Bulk Downloadsearch and retrieval tools (https://www.arabidopsis.org/tools/bulk/index.jsp). Very large datasets, such asall GO or PO annotations, FASTA formatted BLAST data-sets, and whole genome structural annotations, can bedownloaded as complete files (https://www.arabidop-sis.org/download/index.jsp). Custom datasets can alsobe created for TAIR subscribers upon request.

THE LOCUS DETAIL PAGE

Each of the more than 30,000 genes in the Arabidopsisgenome has its own locus detail page that presents anoverview of the information associated with that gene.The Locus Detail Page (Fig. 1) is the heart of TAIR and isthe most frequently visited page in the entire resource.From this single page, users can navigate to GO and POannotations, amino acid and DNA sequence details,RNA expression data, publication lists, and externalresources that provide additional data for that particularlocus. Each locus page is divided into logically groupedcontent areas including gene structural features such asUTRs, introns, and exons; protein features such asdomains and molecular weights; polymorphisms; germ-plasms and phenotypes; associated publications; andothers. For ease of viewing, a limited amount of detailfrom the associated data is shown in the locus page.Clicking on a related data type opens a new detail pagefor that data element with additional information.

An example of a linked page providing an additionallevel of detail is the Annotation Detail page, which can

TAIR: MAKING AND MINING THE “GOLD STANDARD” PLANT GENOME 477

Page 5: The arabidopsis information resource: Making and mining the … · 2017-11-18 · associate genes to research articles (Yoo et al., 2006). Curators read the abstracts of all new articles,

FIG. 1. A TAIR Locus Detail page. The locus page aggregates and summarizes all the information in TAIR about any given locus with themost frequently used data toward the top. Each locus is referred to by its unique AGI identifier along with other names from the literature.The graphical representation of the locus is linked to GBrowse for viewing within the chromosomal context. GO and PO annotations arepresented in summary form; the detailed view is accessible from the link to annotation detail page (red circle). Similarly, links within eachdata subtype (e.g., Protein Data, Polymorphism) can be followed to access more detailed information from TAIR. Hyperlinks from the exter-nal links section go directly to resources outside of TAIR. Manually validated associated publications are shown at the bottom.

478 BERARDINI ET AL.

Page 6: The arabidopsis information resource: Making and mining the … · 2017-11-18 · associate genes to research articles (Yoo et al., 2006). Curators read the abstracts of all new articles,

be opened by clicking on the “Annotation Detail” linkon the locus page. This secondary page provides thedetailed evidence and other metadata associated to indi-vidual GO and PO annotations, including the publica-tion on which the annotation is based, the evidencecode or basis for linking the gene product to the con-trolled vocabulary term, and the person or organizationthat contributed the annotation. Figure 2 illustratesannotations attributed to a variety of sources.

DATA VISUALIZATION AND ANALYSIS TOOLS

TAIR’s main tools to visually interrogate the Arabidopsisgenome are the Seqviewer (http://tairvm09.tacc.utexas.edu/servlets/sv) (Huala et al., 2001) and GBrowse,(http://tairvm17.tacc.utexas.edu/cgi-bin/gb2/gbrowse/arabidopsis/) (Stein et al., 2002). Both can be used forexploration of the A. thaliana reference sequence andmapped sequenced objects such as cDNAs, ESTs, poly-morphisms, gene-disrupting T-DNA insertion sites, andmarkers. SeqViewer was built in-house and is searchableby feature name and short (15–150 nt) sequences. Itallows users to scan entire chromosomes or drill down to

the nucleotide level, viewing 10 kb of nucleotidesequence at a time with gene structures and mappedobjects clearly marked at the individual nucleotide level,making visualization of the locations of mapped objectsvery clear (Fig. 3a). GBrowse, a component of the GMODproject (Stein et al., 2002) provides views of additionaldata types not available in SeqViewer including methyla-tion and phosphorylation patterns, repeat regions, andorthologous genes in other species, as well as additionalexpression data types including peptides and RNA-seqreads (Fig. 3b). GBrowse can also be customized toinclude user-defined annotation tracks. Currently, themost recent genome annotation release, TAIR10 (Novem-ber 2010), provides the underlying data for both brows-ers. The reference genome will be updated to theAraport genome annotation when this is released.

TAIR features other specialized tools that are devel-oped in-house or draw upon data only available in TAIR.The chromosome map tool is used for drawing posi-tions of loci on the five Arabidopsis chromosomes(https://www.arabidopsis.org/jsp/ChromosomeMap/tool.jsp). Users upload a list of loci that are then marked onthe map and the image can be saved and used for

FIG. 2. A TAIR Annotation Detail page. This secondary page linked from the annotation summary on the locus page presents additionaldetails and links about the origins of each annotation including publication (linked to publication detail page), keyword (linked to keyworddetail page) annotation contributor (linked to the community detail page), evidence code, and evidence description.

TAIR: MAKING AND MINING THE “GOLD STANDARD” PLANT GENOME 479

Page 7: The arabidopsis information resource: Making and mining the … · 2017-11-18 · associate genes to research articles (Yoo et al., 2006). Curators read the abstracts of all new articles,

publications. TAIR also offers a variety of sequence analy-sis programs including NCBI BLAST (https://www.arabi-dopsis.org/Blast/index.jsp) (Altschul et al., 1990) andWU-BLAST (https://www.arabidopsis.org/wublast/index2.jsp) for identification of sequence similarity, PatMatch forshort sequence pattern matching (https://www.arabidop-sis.org/cgi-bin/patmatch/nph-patmatch.pl) (Yan et al.,2005), and statistical software for finding common motifsin upstream sequences (https://www.arabidopsis.org/tools/bulk/motiffinder/index.jsp). These sequence analysisprograms are paired with datasets that are generated in-house (https://www.arabidopsis.org/help/helppages/BLAST_help.jsp#datasets), some of which are exclu-sively available at TAIR.

STOCK ORDERING

The Arabidopsis research community has generated anextensive collection of biological resources that have

been made available via the major stock centers, theArabidopsis Biological Resource Center (ABRC; https://abrc.osu.edu/) and the Nottingham Arabidopsis StockCenter (NASC; http://arabidopsis.info/). Stocks includeall types of germplasms ranging from natural variants tosingle and multiple mutated lines, insertion mutants,and other transgenics, as well as DNA stocks (clones)and protein chips. TAIR and ABRC have a long-standingcollaboration in which TAIR provides the database infra-structure and software tools to facilitate ordering ofseed and DNA stocks, protein chips, and other experi-mental resources for the research community from theABRC. The high level of integration with ABRC meansthat stocks are linked to relevant data to facilitate dis-covery and are not just independently searchable andbrowseable from within TAIR. If researcher wants toidentify knockout lines to study the phenotype of agene of interest, there are several ways to find and orderthe relevant stock. Starting from a locus, all associated

FIG. 3. Genome Browsers at TAIR. (a): SeqViewer close-up view. Each of the five chromosomes is represented by a green line on top. Asubset of any given chromosome is shown in the blue-boxed area along with the specific tracks of data selected using the check boxes inthe upper left. (b): SeqViewer nucleotide view. Drilled down nucleotide level view of locus AT3G57280 showing the exact positions of exons(orange), introns (light blue), UTR (red), TDNA/transposon insertions (purple track), translation start/stop (boxed blue area). (c): GBrowseshowing similar data as (a) but with a different presentation. (d): Additional data tracks in GBrowse. Some examples of data types areshown that are visible in GBrowse but not in SeqViewer such as phosphorylation sites and transcriptional regulatory sequences.

480 BERARDINI ET AL.

Page 8: The arabidopsis information resource: Making and mining the … · 2017-11-18 · associate genes to research articles (Yoo et al., 2006). Curators read the abstracts of all new articles,

alleles and germplasms can be viewed on the locusdetail page (Fig. 1). Alternatively, the Polymorphismsearch (https://www.arabidopsis.org/servlets/Search?action5new_search&type5polyallele) can be used toidentify specific types of alleles for a given locus, oralleles can be identified based on physical locationusing SeqViewer or GBrowse. If TAIR has curated phe-notype descriptions for a stock, that information will bedisplayed in the Germplasm record. Users must have abasic (free) TAIR account and be affiliated with anactive laboratory to order stocks from ABRC; a subscrip-tion to TAIR is not required for searching for, browsingthrough, or ordering stocks.

COMMUNITY CURATION IN TAIR

As a community resource, TAIR is proactive in encour-aging users to contribute expertise and data. Becauseour community members are the experts in their fields,we actively solicit their contributions of GO and POannotations by directly contacting authors of articlespublished in Plant Physiology and The Plant Journal(Berardini et al., 2012). TAIR has well-established col-laborations with these journals which provide us withdata identifying authors of recently accepted articles, sothat we may contact them and remind them to submitdata to TAIR. An additional eight journals (Plant Cell,Journal of Integrative Plant Biology, Environmental

and Experimental Botany, Molecular Plant, Plant, Cell

and Environment, Journal of Experimental Botany,Plant Science, and Plant Physiology and Biochemistry)include instructions to submit data to TAIR in theirinstructions to authors or as a part of the manuscriptsubmission process. To make this process as simple aspossible, we developed an online data submission tool,TOAST (TAIR Online Annotation Submission Tool) thatguides researchers through the use of controlled vocab-ularies (https://www.arabidopsis.org/doc/submit/func-tional_annotation/123) (Berardini et al., 2012). We alsoincorporate submissions from researchers in spread-sheet format. Community contributed annotations areattributed to the submitter and are linked to the publi-cation record in TAIR as well as the locus. As of June2015, a total of 24,379 GO and 57,788 PO have beensubmitted to TAIR by 730 authors derived from experi-mental evidence described in 1,124 articles and pub-lished in 80 journals. Another way that registeredcommunity members can contribute data is by addinginformal comments about any data object from theobject detail (e.g., locus or germplasm detail) page.These comments are visible to everyone and are animportant way of sharing information. For example, if aresearcher orders a T-DNA insertion line from ABRCand finds that the insertion is not actually present, thatinformation can be added to the comments so that the

community is aware of the finding which may other-wise not ever be published.

COMMUNITY ENGAGEMENT

TAIR staff members provide many other important serv-ices to the research community in addition to biocura-tion. TAIR is a member of the Gene OntologyConsortium and actively participates both in the devel-opment of the ontologies to ensure plant biology isaccurately represented and in the specification of anno-tation standards. Curators attend professional meetingssuch as the International Society of Biocurators annualconference, to learn about and follow best practicesand ensure that TAIR adheres to the highest qualitystandards for data curation. Curators also staff the TAIRhelpdesk which receives and replies to an average of 10queries a week, ranging from requests for custom datasets to help using tools, nomenclature questions, sub-scription enquiries, and requests for collaboration. Werespond to questions not only about TAIR-specific dataand tools, but also general questions from researchersand students on a wide variety of topics. Researchersalso contact us to point out errors or omissions andmake requests to add certain links or datasets. Thisfeedback from our community is one of the very impor-tant ways in which we maintain the accuracy and integ-rity of information in the database and informs ourdecision making regarding priority areas for datacuration.

As a central organizing resource for the community,we strive to help foster connections among researchersand disseminate information of interest to them.Researchers can promote conferences and list jobopportunities through TAIR; we typically post severalon the TAIR site every week and also funnel them intoour Twitter feed (@tair_news). Job seekers can sub-scribe to the job posting RSS feed to be notified whennew opportunities are posted.

Recently we have begun using social media moreextensively and have reached over 2,400 Facebook fol-lowers (https://www.facebook.com/tairnews) and over850 Twitter followers (https://twitter.com/tair_news).Our Facebook community increased by 150% afterimplementing a global ad campaign in October of 2014that was used to reach out to more members of theresearch community. We regularly post articles of inter-est to both Facebook and Twitter with positive engage-ment from our followers.

We also engage directly with community members atconferences such as the Plant and Animal Genomemeeting, the annual meeting of the American Societyfor Plant Biology, and the International Conference forArabidopsis Research. Through a combination of work-shops, exhibit hall booths, posters, and both spontane-ous and planned one-on-one encounters, we gather

TAIR: MAKING AND MINING THE “GOLD STANDARD” PLANT GENOME 481

Page 9: The arabidopsis information resource: Making and mining the … · 2017-11-18 · associate genes to research articles (Yoo et al., 2006). Curators read the abstracts of all new articles,

information on the needs of our scientific communityas well as the new trends in plant biology both inresearch areas and technology. We also conduct occa-sional online surveys to solicit community input on vari-ous aspects of TAIR usage and future directions.

TAIR’S PLACE IN THE PLANT BIOLOGYRESEARCH LANDSCAPE

TAIR is part of an extensive and evolving ecosystem ofdatabases and tools that researchers can use to accessArabidopsis data. The main function of a model orga-nism database is to serve as a trusted, integrated sourceof information for the organism and TAIR has servedthat role for many years. In the time since TAIR waslaunched, the amount of information available hasgrown tremendously along with the number of placeswhere that information resides. Gene and protein infor-mation about Arabidopsis, including data from TAIR,can be found in the widely-used NCBI and UniProt data-bases. Examples of other popular resources that includeArabidopsis data are: Genevisible (http://genevisible.com/) (Zimmermann et al., 2004), the electronic Fluo-rescent Pictograph (eFP) Browser (http://bar.utoronto.ca/efp/cgi-bin/efpWeb.cgi) (Winter et al., 2007), andMapMan (http://mapman.gabipd.org/web/guest/map-man) (Thimm et al., 2004) for gene expression data;AraCyc for metabolic pathways (Rhee et al., 2006); theBio-Analytic Resource (BAR; http://bar.utoronto.ca/wel-come.htm) and Membrane Based Interactome Database(MIND; https://associomics.dpb.carnegiescience.edu/Associomics/Home.html) for interaction data; 1001genomes for natural variations (http://1001genomes.org/), and Aramemnon (http://aramemnon.uni-koeln.de/) for analyzing proteins based on subcellular localiza-tion. TAIR’s role within this rapidly expanding data eco-system is to provide researchers with a dataset ofexperimentally verified results along with their associ-ated sources and experimental methods that accuratelyand consistently represent the current state of knowl-edge about Arabidopsis gene function. To accomplishthis, TAIR focuses primarily on extraction of experimen-tal results and associated information from the Arabi-dopsis research literature and representation of theseresults in both computable (as Gene Ontology annota-tions) and human-readable (as a series of convenientlyorganized web pages) formats. No other resource in thecurrent data ecosystem has taken on this charge, andTAIR has been and still remains the ultimate source of alarge majority (�80%) of this type of data at all the otherresources within our data landscape, with the remain-der coming from multispecies curation efforts at Uni-Prot/SwissProt.

The loss of NSF funding for TAIR and the perceivedconsequences of closure were deeply felt by theresearch community which recognized a continuing

need for an Arabidopsis resource and made recommen-dations that ultimately led to the formation of AraPort(International Arabidopsis Informatics Consortium,2010; Krishnakumar et al., 2015). TAIR’s decision, in2013, to pursue alternative funding models rather thanshut down was a response both to the Arabidopsis com-munity, which voiced strong support for TAIR (https://www.arabidopsis.org/doc/about/tair_survey/411), andthe need to find scalable solutions to broadly addressthe question of database sustainability. Ultimately thisdecision benefits the community because our emphasison functional annotation and continuous curation ofthe literature complements Araport’s focus on genomeannotation and analytics. These two Arabidopsis resour-ces reinforce each other and help ensure that the plantresearch community continues to have access to thehighest quality data and most completely and accuratelyannotated plant genome possible.

THE SUSTAINABILITY CRISIS AND TAIR’STRANSITION TO SUBSCRIPTION SUPPORT

Historically, biological databases have been funded bygrants from national funding agencies, either as stand-alone projects or as outputs of broader research efforts.Yet, despite the important role of these databases insupporting research, there are currently no establishedmechanisms to supply long-term funding. With theexception of a very few repositories with fundingdirectly from line-item federal budget allocations,ongoing support generally depends on repeatedlyapplying for, and successfully acquiring, competitivegrant funds (Abbott, 2009; Merali and Giles, 2005).While the number of repositories has increased, therehas been no commensurate increase in funding, asagency budgets have remained relatively flat over thelast decade (AAAS Society, 2014).

In response to this sustainability crisis, and feedbackfrom our community, TAIR established a new user-based funding mechanism. From 1999, when theresource was established, to August 2013, TAIR wassupported primarily by grants from the National Sci-ence Foundation. In September 2013, the remainingTAIR team founded a 501(c)(3) nonprofit entity, Phoe-nix Bioinformatics, with the explicit aim of developingsustainable funding models for data repositories, start-ing with TAIR as a test case. As a popular and highlyused database, TAIR was in a good position to pioneer anew approach to funding scientific resources based oncontributions from users. TAIR began requiring sub-scriptions for companies in October 2013 and for aca-demic/nonprofit researchers in April 2014. As of June2015, TAIR is supported by a large base of subscribersincluding two country-wide subscriptions (China andSwitzerland), 140 individual academic institutions, fouracademic consortia, and �200 individual researchers.

482 BERARDINI ET AL.

Page 10: The arabidopsis information resource: Making and mining the … · 2017-11-18 · associate genes to research articles (Yoo et al., 2006). Curators read the abstracts of all new articles,

To enforce access restrictions, Phoenix developed asubscription management system customized for TAIR’sneeds. The user-based subscription funding model pro-vides greater stability and longevity for fundamentallyimportant datasets, better alignment of project goalswith the needs of the research community, and the abil-ity to scale up funding support in direct proportion tothe demand for a dataset or tool. A subscription modelalso provides a way to fund valuable activities that aredifficult to support via traditional grant funding, includ-ing manual curation of research literature.

ASSURING BROAD ACCESS TO TAIR DATA

To ensure that Arabidopsis data remain open and acces-sible while still retaining an incentive to subscribe, weadopted the following set of guiding principles to maxi-mize data availability: (1) TAIR data are made freelyavailable to all after one year to facilitate reuse by otherrepositories (ftp://ftp.arabidopsis.org/home/tair/TAIR_Public_Releases/); (2) subscriptions for more recentdata must be affordable and offered at a range of levelsincluding country, consortium, institution, and individ-ual to maximize subscription options and coverage ofresearchers; (3) researchers can access a few pages permonth of recent data without subscribing, to facilitateaccess by occasional users (e.g., animal researchers);(4) students taking a course in which TAIR is a coursematerial, provided with free access; (5) free access isprovided to the lowest income countries https://www.arabidopsis.org/doc/about/tair_subscriptions/413).Datasets resulting from our grant-funded Gene Ontol-ogy data curation and collection work are released on amonthly basis to the Gene Ontology Consortium with-out delay and are available at http://geneontology.org/.

CURRENT AND FUTURE DIRECTIONS FOR TAIRAND PHOENIX

In the first few months following the end of TAIR’s fed-eral funding in late 2013, the remaining TAIR staff laidthe groundwork for future efforts by establishing Phoe-nix Bioinformatics as a separate 501(c)(3) nonprofitorganization located in Redwood City, California. Assubscription revenue increased over the course ofPhoenix’s first year of operations in 2014, we were ableto add additional staff and shift our focus from maintain-ing a basic level of curation and software support toenhancing TAIR with an increase in both the number ofarticles curated and the amount and variety of dataextracted. Specifically, we have doubled the number ofpapers curated per month and resumed curating newalleles and phenotypes which had been dramaticallyreduced during the past two years. Going forward,TAIR will expand efforts to capture and integrate exper-imental data from the literature for better coverage of

new research results especially as new technologies areadopted by the community. We have also begun makinglong overdue improvements to the TAIR software,including streamlining our registration process, restruc-turing parts of the locus detail pages and other parts ofthe TAIR site for improved usability, and enhancing ourinternal curation software to improve the efficiency ofour curation efforts.

As we move into our second year of user-supportedfunding, we will continue to provide new and updatedfunctional information for Arabidopsis genes and add tothe corpus of allele, phenotype, and publication infor-mation in TAIR. New reference datasets will be inte-grated, including the new genome release expectedfrom Araport. With a recent grant from the Alfred P.Sloan Foundation, we have begun developing a newcloud-based subscription management system capableof providing subscription services to additional biologi-cal databases and other research resources in need ofstable financial support. In keeping with the nonprofitmission and vision of Phoenix Bioinformatics, our newhome, we will continue to explore sustainable fundingmodels, apply our lessons learned to other databases inneed of help, and support the community of biologicaldatabases and the researchers that rely on them fortheir work.

ACKNOWLEDGMENTS

TAIR is supported by national, academic institutional, cor-porate, and individual subscriptions (see, https://goo.gl/NYY6cf for a list of institutional subscribers). The contentis solely the responsibility of the authors and does notnecessarily represent the official views of the NationalInstitutes of Health. TAIR is administered by the501(c)(3) non-profit Phoenix Bioinformatics (www.phoe-nixbioinformatics.org).

LITERATURE CITED

AAAS Society. 2014. Historical Trends in Federal R&D.http://www.aaas.org/page/historical-trends-federal-rd.Accessed July 30, 2015.

Abbott A. 2009. Plant genetics database at risk as fundsrun dry. Nat News 462:258–259.

Alonso JM, Stepanova AN, Leisse TJ, Kim CJ, Chen H,Shinn P, Stevenson DK, Zimmerman J, Barajas P,Cheuk R, Gadrinab C, Heller C, Jeske A, Koesema E,Meyers CC, Parker H, Prednis L, Ansari Y, Choy N,Deen H, Geralt M, Hazari N, Hom E, Karnes M,Mulholland C, Ndubaku R, Schmidt I, Guzman P,Aguilar-Henonin L, Schmid M, Weigel D, Carter DE,Marchand T, Risseeuw E, Brogden D, Zeko A,Crosby WL, Berry CC, Ecker JR. 2003. Genome-wide insertional mutagenesis of Arabidopsis thali-

ana Science 301:653–657.

TAIR: MAKING AND MINING THE “GOLD STANDARD” PLANT GENOME 483

Page 11: The arabidopsis information resource: Making and mining the … · 2017-11-18 · associate genes to research articles (Yoo et al., 2006). Curators read the abstracts of all new articles,

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ.1990. Basic local alignment search tool. J Mol Biol215:403–410.

Arabidopsis Genome Initiative. 2000. Analysis of thegenome sequence of the flowering plant Arabidop-

sis thaliana. Nature 408:796–815.Berardini TZ, Li D, Muller R, Chetty R, Ploetz L, Singh S,

Wensel A, Huala E. 2012. Assessment of community-submitted ontology annotations from a noveldatabase-journal partnership. Database 2012:bas030.

Berardini TZ, Mundodi S, Reiser L, Huala E, Garcia-Hernandez M, Zhang P, Mueller LA, Yoon J, Doyle A,Lander G, Moseyko N, Yoo D, Xu I, Zoeckler B,Montoya M, Miller N, Weems D, Rhee SY. 2004.Functional annotation of the Arabidopsis genomeusing controlled vocabularies. Plant Physiol 135:745–755.

Burge S, Kelly E, Lonsdale D, Mutowo-Muellenet P,McAnulla C, Mitchell A, Sangrador-Vegas A, Yong S-Y, Mulder N, Hunter S. 2012. Manual GO annotationof predictive protein signatures: The InterProapproach to GO curation. Database 2012:bar068.

Cooper L, Walls RL, Elser J, Gandolfo MA, StevensonDW, Smith B, Preece J, Athreya B, Mungall CJ,Rensing S, Hiss M, Lang D, Reski R, Berardini TZ, LiD, Huala E, Schaeffer M, Menda N, Arnaud E,Shrestha R, Yamazaki Y, Jaiswal P. 2013. The plantontology as a tool for comparative plant anatomyand genomic analyses. Plant Cell Physiol 54:e1.

Gene Ontology Consortium, Blake JA, Dolan M,Drabkin H, Hill DP, Li N, Sitnikov D, Bridges S,Burgess S, Buza T, McCarthy F, Peddinti D, Pillai L,Carbon S, Dietze H, Ireland A, Lewis SE, Mungall CJ,Gaudet P, Chrisholm RL, Fey P, Kibbe WA, Basu S,Siegele DA, McIntosh BK, Renfro DP, Zweifel AE, HuJC, Brown NH, Tweedie S, Alam-Faruque Y,Apweiler R, Auchinchloss A, Axelsen K, Bely B,Blatter M-C, Bonilla C, Bouguerleret L, Boutet E,Breuza L, Bridge A, Chan WM, Chavali G, Coudert E,Dimmer E, Estreicher A, Famiglietti L, FeuermannM, Gos A, Gruaz-Gumowski N, Hieta R, Hinz C,Hulo C, Huntley R, James J, Jungo F, Keller G, LaihoK, Legge D, Lemercier P, Lieberherr D, Magrane M,Martin MJ, Masson P, Mutowo-Muellenet P,O’Donovan C, Pedruzzi I, Pichler K, Poggioli D,Porras Mill�an P, Poux S, Rivoire C, Roechert B,Sawford T, Schneider M, Stutz A, Sundaram S,Tognolli M, Xenarios I, Foulgar R, Lomax J,Roncaglia P, Khodiyar VK, Lovering RC, Talmud PJ,Chibucos M, Giglio MG, Chang H-Y, Hunter S,McAnulla C, Mitchell A, Sangrador A, Stephan R,Harris MA, Oliver SG, Rutherford K, Wood V, BahlerJ, Lock A, Kersey PJ, McDowall DM, Staines DM,Dwinell M, Shimoyama M, Laulederkind S, HaymanT, Wang S-J, Petri V, Lowry T, D’Eustachio P,

Matthews L, Balakrishnan R, Binkley G, Cherry JM,Costanzo MC, Dwight SS, Engel SR, Fisk DG, HitzBC, Hong EL, Karra K, Miyasato SR, Nash RS, Park J,Skrzypek MS, Weng S, Wong ED, Berardini TZ,Huala E, Mi H, Thomas PD, Chan J, Kishore R,Sternberg P, Van Auken K, Howe D, Westerfield M.2013. Gene Ontology annotations and resources.Nucleic Acids Res 41:D530–D535.

Hill DP, Smith B, McAndrews-Hill MS, Blake JA. 2008.Gene Ontology annotations: What they mean andwhere they come from. BMC Bioinformatics 9:S2.

Huala E, Dickerman AW, Garcia-Hernandez M, WeemsD, Reiser L, LaFond F, Hanley D, Kiphart D, ZhuangM, Huang W, Mueller LA, Bhattacharyya D, Bhaya D,Sobral BW, Beavis W, Meinke DW, Town CD,Somerville C, Rhee SY. 2001. The Arabidopsis Infor-mation Resource (TAIR): A comprehensive databaseand web-based information retrieval, analysis, andvisualization system for a model plant. NucleicAcids Res 29:102–105.

International Arabidopsis Informatics Consortium.2010. An international bioinformatics infrastructureto underpin the Arabidopsis community. Plant Cell22:2530–2536.

Jaiswal P, Avraham S, Ilic K, Kellogg EA, McCouch S,Pujar A, Reiser L, Rhee SY, Sachs MM, Schaeffer M,Stein L, Stevens P, Vincent L, Ware D, Zapata F.2005. Plant Ontology (PO): A controlled vocabularyof plant structures and growth stages. Comp FunctGenomics 6:388–397.

Koornneef M, Meinke D. 2010. The development of Ara-bidopsis as a model plant. Plant J 61:909–921.

Krishnakumar V, Hanlon MR, Contrino S, Ferlanti ES,Karamycheva S, Kim M, Rosen BD, Cheng C-Y,Moreira W, Mock SA, Stubbs J, Sullivan JM, KrampisK, Miller JR, Micklem G, Vaughn M, Town CD. 2015.Araport: The Arabidopsis information portal.Nucleic Acids Res 43:D1003–D1009.

Lamesch P, Berardini TZ, Li D, Swarbreck D, Wilks C,Sasidharan R, Muller R, Dreher K, Alexander DL,Garcia-Hernandez M, Karthikeyan AS, Lee CH,Nelson WD, Ploetz L, Singh S, Wensel A, Huala E.2012. The Arabidopsis Information Resource(TAIR): Improved gene annotation and new tools.Nucleic Acids Res 40:D1202–D1210.

Lamesch P, Dreher K, Swarbreck D, Sasidharan R, ReiserL, Huala E. 2010. Using the Arabidopsis InformationResource (TAIR) to find information about Arabi-dopsis genes. Curr Protoc Bioinformatics 30:1.11.1–1.11.51.

Li D, Berardini TZ, Muller RJ, Huala E. 2012. Building anefficient curation workflow for the Arabidopsis liter-ature corpus. Database 2012:bas047

Merali Z, Giles J. 2005. Databases in peril. Nature 435:1010–1011.

484 BERARDINI ET AL.

Page 12: The arabidopsis information resource: Making and mining the … · 2017-11-18 · associate genes to research articles (Yoo et al., 2006). Curators read the abstracts of all new articles,

Rhee SY, Zhang P, Foerster H, Tissier C. 2006. AraCyc:Overview of an arabidopsis metabolism databaseand its applications for plant research. In: SaitoPDK, Dixon PDRA, Willmitzer PDL, editors. Plantmetabolomics. Heidelberg, Berlin: Springer. pp141–154.

Rosso MG, Li Y, Strizhov N, Reiss B, Dekker K,Weisshaar B. 2003. An Arabidopsis thaliana T-DNAmutagenized population (GABI-Kat) for flankingsequence tag-based reverse genetics. Plant Mol Biol53:247–259.

Schmid M, Davison TS, Henz SR, Pape UJ, Demar M,Vingron M, Sch€olkopf B, Weigel D, Lohmann JU.2005. A gene expression map of Arabidopsis thali-

ana development. Nat Genet 37:501–506.Sessions A, Burke E, Presting G, Aux G, McElver J,

Patton D, Dietrich B, Ho P, Bacwaden J, Ko C, ClarkeJD, Cotton D, Bullis D, Snell J, Miguel T, HutchisonD, Kimmerly B, Mitzel T, Katagiri F, Glazebrook J,Law M, Goff SA. 2002. A high-throughput Arabidop-sis reverse genetics system. Plant Cell 14:2985–2994.

Somerville C, Koornneef M. 2002. A fortunate choice:The history of Arabidopsis as a model plant. NatRev. Genetics 3:883–889.

Stein LD, Mungall C, Shu S, Caudy M, Mangone M, DayA, Nickerson E, Stajich JE, Harris TW, Arva A, LewisS. 2002. The Generic Genome Browser: A buildingblock for a model organism system database.Genome Res 12:1599–1610.

Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, Li D, Meyer T, Muller R,Ploetz L, Radenbaugh A, Singh S, Swing V, Tissier C,Zhang P, Huala E. 2008. The Arabidopsis Informa-tion Resource (TAIR): Gene structure and functionannotation. Nucleic Acids Res 36:D1009–D1014.

The Gene Ontology Consortium; Ashburner M, Ball CA,Blake JA, Botstein D, Butler H, Cherry JM, Davis AP,Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP,

Issel-Tarver L, Kasarskis A, Lewis S, Matese JC,Richardson JE, Ringwald M, Rubin GM, Sherlock G.2000. Gene ontology: Tool for the unification ofbiology. Nat Genet 25:25–29.

Thimm O, Bl€asing O, Gibon Y, Nagel A, Meyer S, Kr€ugerP, Selbig J, M€uller LA, Rhee SY, Stitt M. 2004. MAP-MAN: A user-driven tool to display genomics datasets onto diagrams of metabolic pathways and otherbiological processes. Plant J 37:914–939.

Till BJ, Reynolds SH, Greene EA, Codomo CA, Enns LC,Johnson JE, Burtner C, Odden AR, Young K, TaylorNE, Henikoff JG, Comai L, Henikoff S. 2003. Large-scale discovery of induced point mutations withhigh-throughput TILLING. Genome Res 13:524–530.

Winter D, Vinegar B, Nahal H, Ammar R, Wilson GV,Provart NJ. 2007. An “Electronic FluorescentPictograph” browser for exploring and analyzinglarge-scale biological data sets (I Baxter, Ed.). PLoSOne 2:e718.

Woody ST, Austin-Phillips S, Amasino RM, Krysan PJ.2007. The WiscDsLox T-DNA collection: An arabi-dopsis community resource generated by using animproved high-throughput T-DNA sequencing pipe-line. J Plant Res 120:157–165.

Yan T, Yoo D, Berardini TZ, Mueller LA, Weems DC,Weng S, Cherry JM, Rhee SY. 2005. PatMatch: A pro-gram for finding patterns in peptide and nucleotidesequences. Nucleic Acids Res 33:W262–W266.

Yoo D, Xu I, Berardini TZ, Rhee SY, Narayanasamy V,Twigger S. 2006. PubSearch and PubFetch: A simplemanagement system for semiautomated retrievaland annotation of biological information from theliterature. Curr Protoc Bioinformatics 13:9.7.1–9.7.27.

Zimmermann P, Hirsch-Hoffmann M, Hennig L,Gruissem W. 2004. GENEVESTIGATOR. Arabidopsismicroarray database and analysis toolbox. PlantPhysiol 136:2621–2632.

TAIR: MAKING AND MINING THE “GOLD STANDARD” PLANT GENOME 485