TAIR: Bringing together data for the global plant biology community Philippe Lamesch Kate Dreher The Arabidopsis Information Resource www.arabidopsis.org contact us: [email protected]

Upload: jacob-combs

Post on 27-Mar-2015


Page 1

TAIR: Bringing together data for the global plant biology community

Philippe Lamesch, Kate Dreher

The Arabidopsis Information Resource
www.arabidopsis.org

contact us: [email protected]

Page 2

Outline

• Philippe Lamesch
  o Introducing TAIR and PMN
  o TAIR10 genome annotation
  o TAIR gene confidence ranking
  o TAIR tools

• Kate Dreher

Page 3

TAIR: The Arabidopsis Information Resource

• Collect, curate and distribute information on Arabidopsis
• Information freely available from arabidopsis.org

Page 4

Slides available from TAIR: www.arabidopsis.org

Page 5

TAIR is used worldwide

Visits per month (source: Google Analytics)

Page 6

TAIR usage worldwide: July 2009 to July 2010

Page 7

What TAIR does: (1) Arabidopsis genome annotation

Page 8

What TAIR does: (2) Manual literature curation

• Controlled vocabulary annotations
  o Gene Ontology (GO) http://www.geneontology.org/
  o Plant Ontology (PO) http://www.plantontology.org/
• Gene name, symbol
• Allele, phenotype
• Summary statement composition

Page 9

Who we partner with: PMN (Plant Metabolic Network) and PlantCyc

A comprehensive plant biochemical pathway database, containing curated information from the literature and computational analyses about the genes, enzymes, compounds, reactions, and pathways involved in primary and secondary metabolism

Page 10

Who we partner with: ABRC (Arabidopsis Biological Resource Center)

Distribution of biological research materials

Page 11

Arabidopsis genome annotation

• A new approach for improving the Arabidopsis genome annotation for TAIR10
• The Arabidopsis gene structure confidence ranking

Page 12

Arabidopsis genome annotation

• Arabidopsis genome sequenced almost 10 years ago
• High-quality sequence with few gaps
• TIGR did the initial genome annotation
• TAIR took over responsibility in 2005
• Current TAIR9 stats:
  o 27,379 protein-coding genes
  o 4827 pseudogenes or transposable elements
  o 1312 ncRNAs

Page 13

Genome annotation at TAIR

• Add novel genes
• Update exon/intron structures of existing genes
• Delete mispredicted genes
• Merge and split genes
• Change gene types
• Add splice variants

Page 14

Genome annotation at TAIR

• Add novel genes
• Update exon/intron structures of existing genes
• Delete mispredicted genes
• Merge and split genes
• Change gene types
• Add splice variants
• Annotate 'atypical' gene classes:
  o Short protein-coding genes
  o Transposable element genes
  o Pseudogenes
  o uORFs (genes within the UTRs of other genes)

Page 15

Arabidopsis gene structure annotation: a new approach

TAIR6-TAIR9: Use ESTs, cDNAs and an assembly tool called PASA to improve gene structures

TAIR10: Use new experimental data and new prediction tools to further improve gene structure predictions

Page 16

Genome annotation TAIR6-TAIR9: Using PASA and ESTs/cDNAs

[Figure: clustered transcripts (NCBI)]

Page 17

Genome annotation TAIR6-TAIR9: Using PASA and ESTs/cDNAs

[Figure: clustered transcripts (NCBI) and the resulting gene model]

Page 18

Genome annotation TAIR6-TAIR9: Using PASA and ESTs/cDNAs

[Figure: clustered transcripts (NCBI), the resulting gene model and the previous gene model]

Comparing the resulting gene model with the previous gene model yields:
• Novel genes
• New splice variants
• Gene structure updates
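The comparison step can be illustrated with a small sketch that classifies an assembled model against the previous annotation by comparing intron chains. The `Model` class, the `classify` function and the classification rules are hypothetical simplifications for illustration; they are not PASA's actual comparison logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """Hypothetical gene model: exons as sorted (start, end) pairs, 1-based inclusive."""
    chrom: str
    strand: str
    exons: tuple

    def span(self):
        return self.exons[0][0], self.exons[-1][1]

    def introns(self):
        # Each intron lies between the end of one exon and the start of the next.
        return tuple((a[1] + 1, b[0] - 1) for a, b in zip(self.exons, self.exons[1:]))


def overlaps(a, b):
    if a.chrom != b.chrom or a.strand != b.strand:
        return False
    (s1, e1), (s2, e2) = a.span(), b.span()
    return s1 <= e2 and s2 <= e1


def classify(new, previous_models):
    """Assumed rules: no overlap -> novel gene; identical intron chain -> unchanged;
    at least one shared intron -> new splice variant; otherwise -> structure update."""
    hits = [m for m in previous_models if overlaps(new, m)]
    if not hits:
        return "novel gene"
    if any(m.introns() == new.introns() for m in hits):
        return "unchanged"
    new_introns = set(new.introns())
    if any(new_introns & set(m.introns()) for m in hits):
        return "new splice variant"
    return "gene structure update"
```

A real pipeline would also weigh UTR extensions, strand-aware coordinates and partial transcripts; the intron-chain test is only the simplest reasonable proxy.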

Page 19

Manual annotation at TAIR: Apollo

[Figure: Apollo view of a locus with 2 gene isoforms and evidence tracks for ESTs, cDNAs, radish sequence alignments, Eugene prediction, dicot sequence alignments, monocot sequence alignments, AceView gene predictions and a short MS peptide]

Page 20

TAIR10: using proteomics and RNA-seq data to improve genome annotation

4-step process:
1. Mapping RNA-seq reads and peptides
2. Assembly/gene build
3. Manual review
4. Integration (genome release/GBrowse)

Page 21

Mapping and Assembly

1. Mapping
• RNA-seq sequences (TopHat (C. Trapnell), SuperSplat (T.C. Mockler))
• Peptides (6-frame translation, spliced exon graph)

2. Assembly approaches
• Augustus (M. Stanke)
  o Uses spliced RNA-seq reads, peptides
  o Aim: identify additional splice variants, update existing genes
• TAU (T.C. Mockler)
  o Uses spliced RNA-seq reads
  o Aim: identify additional splice variants
• Cufflinks (C. Trapnell)
  o Uses spliced and unspliced RNA-seq data
  o Aim: identify novel genes
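To make the peptide-mapping idea concrete, here is a minimal sketch that translates a genomic sequence in all six reading frames with the standard genetic code and locates a peptide. The function names are hypothetical, and the sketch only finds peptides encoded within a single contiguous stretch; peptides that span an intron would need something like the spliced exon graph mentioned above.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons in TTT, TTC, TTA, TTG, TCT, ... order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    # Codons containing ambiguous bases become 'X'; a trailing partial codon is dropped.
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame(seq):
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def find_peptide(peptide, seq):
    """Return (frame, start, end) hits as 0-based half-open forward-strand coordinates."""
    hits = []
    n = len(seq)
    for frame, protein in six_frame(seq).items():
        offset = int(frame[1]) - 1
        pos = protein.find(peptide)
        while pos != -1:
            nt_start = offset + 3 * pos
            nt_end = nt_start + 3 * len(peptide)
            if frame[0] == "+":
                hits.append((frame, nt_start, nt_end))
            else:
                # Convert reverse-strand coordinates back to the forward strand.
                hits.append((frame, n - nt_end, n - nt_start))
            pos = protein.find(peptide, pos + 1)
    return hits
```

For example, `find_peptide("MAKF", "ATGGCCAAATTTTAA")` reports a single hit in frame +1 covering the first 12 bases.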

Page 22

RNA-seq junctions for Augustus

• RNA-seq datasets (Mockler Lab, Ecker Lab)
• 200 million aligned RNA-seq reads (TopHat, SuperSplat)
• 203,000 clustered spliced RNA-seq junctions
• 145,000 RNA-seq junctions based on >1 read, passed to Augustus
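The ">1 read" criterion can be sketched as counting splice junctions from spliced alignments and keeping only those seen in at least two reads. The sketch assumes SAM-style alignments reduced to (chromosome, 1-based position, CIGAR) tuples and reads junctions from `N` operations; the function names and the tuple input are hypothetical, and strand is ignored for brevity.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def junctions_from_cigar(chrom, pos, cigar):
    """Yield (chrom, intron_start, intron_end), 1-based inclusive, for each N operation.

    pos is the 1-based leftmost aligned reference position, as in a SAM record.
    """
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            yield (chrom, ref, ref + length - 1)
            ref += length
        elif op in "MD=X":
            ref += length
        # I, S, H and P do not consume reference bases.


def supported_junctions(alignments, min_reads=2):
    """Count junctions across alignments and keep those with at least min_reads reads."""
    counts = Counter()
    for chrom, pos, cigar in alignments:
        counts.update(junctions_from_cigar(chrom, pos, cigar))
    return {j: n for j, n in counts.items() if n >= min_reads}
```

Using `min_reads=2` matches "based on >1 read"; clustering nearby junctions, as in the 203,000 clustered junctions above, is not attempted here.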

Page 23

Augustus inputs:
• 145,000 RNA-seq junctions based on >1 read
• 260,000 peptides (Baerenfaller et al., Castellana et al.)
• ESTs and cDNAs
• AGI models

Augustus gene prediction:
• 11% of RNA-seq junctions incorporated into Augustus models
• 64% of peptide sequences incorporated into Augustus models

Predicted Augustus models:
• 5461 distinct models
• 1596 novel models

Page 24

Categorisation/Review

[Figure: review view with tracks for TAU models (splice variants, NMD targets), RNA-seq junctions, Augustus model (correction), TAIR confidence rank, TAIR model and peptides (colour reflects matching model); callouts mark an incorrect junction in the TAIR model and an unsupported exon]

Page 25

Example Augustus update

Page 26

Example Augustus splice variant

Page 27

Example 2: Augustus splice variant

Page 28

Augustus/TAU/Cufflinks

Augustus
• Incorporates 64% of peptides not contained in TAIR models and 11% of RNA-seq junctions
• 5461 potential updated genes
• 1596 potential novel genes

TAU
• 30,083 junctions distinct from those in Augustus or TAIR models
• 10,902 junctions incorporated into 10,491 TAU models

Cufflinks
• 367 novel assemblies longer than 100 bp

(TE filter applied to Augustus and Cufflinks models)
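The length cutoff and TE filter for novel assemblies can be sketched as a simple interval filter. Everything here is an assumption for illustration: the `Interval` class and `filter_novel_assemblies` function are hypothetical, length is measured as genomic span, and the linear overlap scan would be replaced by an interval tree for genome-scale data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    def length(self) -> int:
        return self.end - self.start + 1

    def overlaps(self, other: "Interval") -> bool:
        return self.chrom == other.chrom and self.start <= other.end and other.start <= self.end


def filter_novel_assemblies(assemblies, annotated_genes, te_regions, min_length=100):
    """Keep assemblies longer than min_length that overlap neither genes nor TEs."""
    kept = []
    for asm in assemblies:
        if asm.length() <= min_length:
            continue  # too short
        if any(asm.overlaps(g) for g in annotated_genes):
            continue  # overlaps an existing gene, so not novel
        if any(asm.overlaps(te) for te in te_regions):
            continue  # removed by the TE filter
        kept.append(asm)
    return kept
```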

Page 29

Preliminary TAIR10 results

• Novel genes
• Updated genes
• Splice variants
• B-list
• Rejects

Page 30

Preliminary TAIR10 results

• Novel genes: 126
• Updated genes: 1182
• Splice variants: 5885 (18% of all loci)
• B-list: 1586
• Rejects: 2318

Page 31

Gene Confidence Rank

• Assigns confidence scores to all exons and gene models based on different types of experimental and computational evidence
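A minimal sketch of the idea of scoring exons from evidence and rolling the scores up into a gene-model rank is shown below. The evidence categories, weights, 0 to 5 star scale and mean-based aggregation are all hypothetical choices for illustration and are not TAIR's actual ranking scheme.

```python
# Hypothetical evidence weights; not TAIR's actual confidence ranking.
EVIDENCE_WEIGHTS = {
    "cdna": 3,
    "est": 2,
    "rnaseq": 2,
    "peptide": 3,
    "homology": 1,
}
MAX_EXON_SCORE = sum(EVIDENCE_WEIGHTS.values())


def exon_score(evidence):
    """Score one exon from the set of evidence types that support it."""
    return sum(EVIDENCE_WEIGHTS[e] for e in set(evidence))


def model_rank(exon_evidence, stars=5):
    """Map the mean normalised exon score of a gene model onto 0..stars."""
    if not exon_evidence:
        return 0
    fractions = [exon_score(ev) / MAX_EXON_SCORE for ev in exon_evidence]
    return round(stars * sum(fractions) / len(fractions))
```

Averaging rewards models whose exons are mostly supported; taking the minimum instead would let a single unsupported exon pull the whole model down, which is another defensible choice.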

Page 32

Assigning a Confidence Rank

[Figure: example gene model with exons E1 to E4]

Page 33

[Figure: confidence scale from full support to no support]

Page 34

New and updated tools at TAIR

• N-Browse
• GBrowse
• Synteny viewer

Page 35

New and updated tools at TAIR

• N-Browse (in collaboration with the Kris Gunsalus Lab, NYU)
  o More than 7,000 experimental interactions
  o Interactions curated by TAIR, IntAct and BioGRID
  o Tutorial at http://www.arabidopsis.org/tools/nbrowse.jsp#nb-tut

Page 36

N-Browse

Page 37

N-Browse: Finding information about edges (interactions)

Page 38

N-Browse: How to select and move nodes

Page 39

N-Browse: How to visualize GO terms from a selected set of nodes

Page 40

N-Browse: How to load your own file and overlay it with the curated interaction data
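Overlaying user data on curated interactions amounts to merging two undirected edge lists while remembering where each edge came from. The sketch below assumes a hypothetical two-column tab-separated file; it is not N-Browse's actual upload format, and `GENE_A` and similar names are placeholders rather than real interaction data.

```python
import csv
import io
from collections import defaultdict


def load_edges(handle, source):
    """Read 'nodeA<TAB>nodeB' lines into undirected edges tagged with their source.

    The two-column format is a hypothetical example, not N-Browse's upload format.
    """
    edges = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, short and comment lines
        # Normalise identifiers and sort the pair so A-B and B-A are the same edge.
        a, b = sorted((row[0].strip().upper(), row[1].strip().upper()))
        edges[(a, b)].add(source)
    return edges


def overlay(*edge_sets):
    """Union several edge sets, recording which sources report each edge."""
    merged = defaultdict(set)
    for edges in edge_sets:
        for pair, sources in edges.items():
            merged[pair] |= sources
    return merged


curated = load_edges(io.StringIO("GENE_A\tGENE_B\nGENE_A\tGENE_C\n"), "curated")
user = load_edges(io.StringIO("gene_b\tgene_a\nGENE_D\tGENE_A\n"), "user")
merged = overlay(curated, user)
```

Edges tagged with both sources are the ones a viewer would highlight as confirmed by the curated set.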

Page 41

N-Browse: How to save your session and export your data

Page 42

New Tools at TAIR

• N-Browse
• GBrowse
• Synteny viewer

Page 43

GBrowse

[Figure: GBrowse layout showing the header, main browser window and track menu]

Page 44

Alternative gene annotations

• Eugene (transcripts, proteins and more): Sebastien Aubourg
• Gnomon (transcripts, proteins): Souvorov (NCBI)
• AceView (transcripts): Thierry-Mieg (NCBI)
• Hanada et al. 2007 (3633 predicted genes)

Page 45

Proteomic data
• High-density Arabidopsis proteome map (Baerenfaller et al., 2008)

[Figure: example of an incorrect start codon]

Page 46

VISTA plot GBrowse track

Page 47

Transcriptome data

Page 48

Orthologs and Gene Families

Page 49

Variation

Page 50

Promoter Elements

Page 51

Methylation

Page 52

Decorated FASTA file

Page 53

Decorated FASTA file

Page 54

Decorated FASTA file
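One simple way to "decorate" a FASTA sequence is case masking: write everything in lower case and raise annotated features (for example exons) to upper case, then wrap the sequence into fixed-width lines. The case convention, the 60-column width and the function names are assumptions for illustration; the decorated FASTA export shown on these slides may use other styling.

```python
def decorate(seq, features):
    """Lower-case the sequence, then upper-case each feature interval.

    Features are 0-based half-open (start, end) pairs; out-of-range parts are clipped.
    """
    chars = list(seq.lower())
    for start, end in features:
        for i in range(max(0, start), min(len(chars), end)):
            chars[i] = chars[i].upper()
    return "".join(chars)


def to_fasta(header, seq, width=60):
    """Format a single FASTA record with the sequence wrapped at `width` columns."""
    lines = [f">{header}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"
```

For example, `decorate("ACGTACGTAC", [(2, 5)])` returns `acGTAcgtac`, marking bases 3 to 5 as the feature.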

Page 55

New Tools at TAIR

• N-Browse
• GBrowse
• Synteny viewer

Data provided by Pedro Pattyn at the University of Ghent

Page 56

[Figure: example region showing loci AT5G47990, AT5G48000 and AT5G48010]

Page 57

Page 58

www.arabidopsis.org

[email protected]

www.plantcyc.org

[email protected]

Page 59

Page 60

Example 2: Augustus update

Page 61

GBrowse

[Figure: GBrowse layout showing the header, main browser window and track menu]

Page 62

GBrowse