![Page 1: Making best use of TAIR tools and datasets Philippe Lamesch Donghui Li The Arabidopsis Information Resource contact us: curator@arabidopsis.org](https://reader031.vdocument.in/reader031/viewer/2022013100/5515088e550346c77d8b46ee/html5/thumbnails/1.jpg)
Making best use of TAIR tools and datasets
Philippe Lamesch, Donghui Li
The Arabidopsis Information Resource, www.arabidopsis.org
contact us: curator@arabidopsis.org
TAIR: The Arabidopsis Information Resource
• Collect, curate and distribute information on Arabidopsis
• Information freely available from arabidopsis.org
Outline
• Gene structure – Philippe Lamesch
• Gene function – Donghui Li
• Metabolic pathway – Donghui Li
• New tools – Philippe Lamesch
Slides available from TAIR www.arabidopsis.org
TAIR is used worldwide
Visits per month (source: Google Analytics)
TAIR usage in Asia: June 2009-June 2010
What we do: (1) Arabidopsis genome annotation
What we do: (2) manual literature curation
• Controlled vocabulary annotations
Gene Ontology (GO) http://www.geneontology.org/
Plant Ontology (PO) http://www.plantontology.org/
• Gene name, symbol
• Allele, phenotype
• Summary statement composition
What we do: (3) metabolic pathway curation
AraCyc
A metabolic pathway database for Arabidopsis thaliana that contains information about both predicted and experimentally determined pathways, reactions, compounds, genes and enzymes.
PlantCyc and PMN (Plant Metabolic Network)
What we do: (4) work with ABRC to distribute research material
Part I: The Arabidopsis genome annotation
• A new approach for improving the Arabidopsis genome annotation
• Where to find gene structure related data at TAIR
• The Arabidopsis gene structure confidence ranking
Arabidopsis genome annotation
• Arabidopsis genome sequenced almost 10 years ago
• High-quality sequence with few gaps
• TIGR did the initial genome annotation
• TAIR took over responsibility in 2005
• Current TAIR9 stats: 27,379 protein-coding genes; 4827 pseudogenes or transposable elements; 1312 ncRNAs
Genome annotation at TAIR
• Add novel genes
• Update exon/intron structures of existing genes
• Delete mispredicted genes
• Merge and split genes
• Change gene types
• Add splice variants
Genome annotation at TAIR
Annotate 'atypical' gene classes:
• Short protein-coding genes
• Transposable element genes
• Pseudogenes
• uORFs (upstream ORFs located within the 5′ UTR of other genes)
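To make the uORF class concrete: a uORF begins with an ATG and ends at an in-frame stop codon upstream of the main coding sequence. The Python sketch below is not TAIR's annotation method; it simply scans a 5′ UTR string for such ORFs. The function name `find_uorfs` and the `min_codons` cutoff are hypothetical choices made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr5: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of ATG-initiated ORFs
    that both start and stop inside the given 5' UTR sequence.

    min_codons is a hypothetical cutoff: the minimum number of codons
    (ATG included, stop codon excluded) for an ORF to be reported.
    """
    seq = utr5.upper()
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in this frame until an in-frame stop is found.
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                break
        # ORFs without a stop inside the UTR would overlap the main ORF; they are skipped.
    return orfs


print(find_uorfs("CCATGAAATAGCC"))
```

Real uORF annotation would also use transcript evidence and conservation; this only shows the sequence-level definition.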
Arabidopsis gene structure annotation: a new approach
TAIR6-TAIR9: use ESTs and cDNAs and an assembly tool called PASA (Program to Assemble Spliced Alignments) to improve gene structures
TAIR10: use new experimental data and new prediction tools to further improve gene structure predictions
Genome annotation TAIR6-TAIR9: using PASA and ESTs/cDNAs
• Clustered transcripts (NCBI) → resulting gene model
• Resulting gene model compared with the previous gene model
• Outcomes: novel genes, new splice variants, gene structure updates
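The comparison step can be illustrated by comparing intron chains, since two models with the same introns share the same exon-intron structure. This is a simplified Python sketch, not PASA's actual rule set: the `Model` class, the `classify_assembly` function and its decision rules are hypothetical, and a production pipeline would need to weigh further criteria (for example UTR extensions and coding frame).

```python
from dataclasses import dataclass

Intron = tuple[int, int]  # (first intron base, last intron base), genomic coordinates


@dataclass(frozen=True)
class Model:
    """Hypothetical minimal gene model: an identifier plus its intron chain."""
    name: str
    introns: tuple[Intron, ...]


def classify_assembly(assembly_introns, overlapping_models):
    """Classify a transcript assembly against the existing models it overlaps.

    Hypothetical rules, checked in order:
    - no overlapping model                     -> "novel gene"
    - identical intron chain to some model     -> "no change"
    - keeps all introns of a model, adds more  -> "structure update"
    - anything else                            -> "new splice variant"
    """
    chain = set(assembly_introns)
    if not overlapping_models:
        return "novel gene"
    if any(chain == set(m.introns) for m in overlapping_models):
        return "no change"
    if any(chain > set(m.introns) for m in overlapping_models):  # proper superset
        return "structure update"
    return "new splice variant"


existing = Model("GENE1.1", ((100, 200), (300, 400)))  # hypothetical model
print(classify_assembly([(100, 200), (300, 400), (500, 650)], [existing]))
```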
ESTs
cDNAs
Radish sequence alignments
Eugene prediction
dicot sequence alignments
monocot sequence alignments
Aceview gene predictions
2 gene isoforms
Manual annotation at TAIR: Apollo
Short MS peptide
TAIR10: using proteomics and RNA-seq data to improve genome annotation
4-step process:
1. Mapping RNA-seq reads and peptides
2. Assembly/gene build
3. Manual review
4. Integration (genome release/GBrowse)
Mapping and Assembly
1. Mapping
• RNA-seq sequences (TopHat (C. Trapnell), SuperSplat (T.C. Mockler))
• Peptides (6-frame translation, spliced exon graph)
2. Assembly approaches
• Augustus (M. Stanke)
  o Uses spliced RNA-seq reads, peptides
  o Aim: identify additional splice variants, update existing genes
• TAU (T.C. Mockler)
  o Uses spliced RNA-seq reads
  o Aim: identify additional splice variants
• Cufflinks (C. Trapnell)
  o Uses spliced and unspliced RNA-seq data
  o Aim: identify novel genes
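Peptide mapping by 6-frame translation can be sketched as follows. This minimal Python example assumes exact string matching and the standard genetic code; it translates a genomic sequence in all six reading frames and reports where a peptide occurs. It does not handle peptides that span splice junctions, which is the case the spliced exon graph addresses. The names `translate_dna` and `six_frame_hits` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in TTT, TTC, TTA, TTG, TCT, ... order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate_dna(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_hits(genome, peptide):
    """Return (strand, frame, nt_offset) for every exact peptide match.

    nt_offset is 0-based on the forward sequence for '+', and on the
    reverse-complement sequence for '-'.
    """
    genome = genome.upper()
    revcomp = genome.translate(COMPLEMENT)[::-1]
    hits = []
    for strand, seq in (("+", genome), ("-", revcomp)):
        for frame in range(3):
            protein = translate_dna(seq[frame:])
            start = protein.find(peptide)
            while start != -1:  # also catches overlapping matches
                hits.append((strand, frame, frame + 3 * start))
                start = protein.find(peptide, start + 1)
    return hits


print(six_frame_hits("ATGGCTTGGTAA", "MAW"))  # [('+', 0, 0)]
```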
Augustus
RNA-seq datasets (Mockler Lab, Ecker Lab)
→ 200 million aligned RNA-seq reads (TopHat, SuperSplat)
→ 203,000 clustered spliced RNA-seq junctions
→ 145,000 RNA-seq junctions based on >1 read
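The read-support filter can be sketched like this in Python. It assumes each spliced read has already been reduced to a hypothetical `(chromosome, strand, intron_start, intron_end)` tuple; identical junctions are clustered by counting, and only junctions observed in more than one read are kept, mirroring the ">1 read" criterion above.

```python
from collections import Counter


def supported_junctions(read_junctions, min_reads=2):
    """Cluster identical junctions and keep those seen in at least min_reads reads.

    min_reads=2 corresponds to 'based on >1 read'.
    """
    counts = Counter(read_junctions)  # identical junction tuples are clustered together
    return {junction: n for junction, n in counts.items() if n >= min_reads}


reads = [("Chr1", "+", 1001, 1200)] * 3 + [("Chr1", "+", 1500, 1620)]  # hypothetical reads
print(supported_junctions(reads))  # keeps only the junction seen in 3 reads
```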
Augustus inputs: 145,000 RNA-seq junctions based on >1 read; 260,000 peptides (Baerenfaller et al., Castellana et al.)
Augustus gene prediction
+ ESTs & cDNAs
+ AGI models
• 11% of RNA-seq junctions incorporated into Augustus models
• 64% of peptide sequences incorporated into Augustus models
Predicted Augustus models: 5461 distinct models; 1596 novel models
Categorisation/Review
TAU Models
RNA-seq Junctions
Augustus Model
TAIR confidence rank
TAIR Model
Peptides
(Splice variants, NMD targets)
(correction)
(colour reflects matching model)
Incorrect junction in TAIR model
Unsupported exon
Example Augustus update
Example 2 Augustus update
Example Augustus splice variant
Example 2 Augustus splice variant
Augustus/TAU/Cufflinks
Augustus
• Incorporates 64% of peptides not contained in TAIR (11% for RNA-seq junctions)
• 5461 potential updated genes
• 1596 potential novel genes
TAU
• 30,083 junctions distinct from Augustus or TAIR models
• 10,902 junctions incorporated into 10,491 TAU models
Cufflinks
• 367 novel assemblies that pass the 100 bp and >15 FPKM filter
# TE filter applied to Augustus (AUG) and Cufflinks models
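The length and expression filter on the Cufflinks assemblies can be illustrated as follows. FPKM (fragments per kilobase of transcript per million mapped fragments) is computed here in simplified form as fragments × 10^9 / (transcript length in bp × total mapped fragments); Cufflinks itself uses an effective length and a statistical model. The 100 bp and 15 FPKM thresholds come from the slide, while the function names and the strict ">" comparisons are assumptions.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Simplified FPKM: fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def passes_filter(length_bp, fpkm_value, min_length=100, min_fpkm=15.0):
    """Hypothetical filter: keep assemblies longer than min_length and above min_fpkm."""
    return length_bp > min_length and fpkm_value > min_fpkm


# Hypothetical assembly: 300 fragments on a 1 kb transcript, 20 million mapped fragments.
value = fpkm(300, 1000, 20_000_000)
print(value, passes_filter(1000, value))
```

With these numbers the assembly sits exactly at 15 FPKM and is therefore rejected by a strict ">15" rule.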
Preliminary Results
Augustus/TAU/Cufflinks predicted models are classified into categories:
• Novel genes: 21
• Updated genes: 812
• Splice variants: 2134
• B-list: 1586
• Rejects: 2318
Where can you find gene structure data on TAIR?
• ON GENE MODEL PAGE
  • Graphic of exon-intron structure
  • Coordinates of each exon
• ON GBROWSE
  • Graphic display of structure and overlapping evidence data
• ON FTP SITE
  • GFF files with exact structures of each gene model
  • Files with gene confidence ranking information
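As a sketch of working with the GFF files from the FTP site, the following Python function parses GFF3 lines and collects exon coordinates per transcript through the `Parent` attribute. It assumes the standard nine tab-separated GFF3 columns and 1-based inclusive coordinates; the exact feature types and attribute names in a given TAIR release file should be checked before relying on it. The function name and the sample IDs are hypothetical.

```python
from collections import defaultdict


def exons_by_transcript(gff_lines):
    """Map each Parent ID to its exons as sorted (chrom, start, end, strand) tuples.

    GFF3 coordinates are 1-based and inclusive; they are kept as-is.
    """
    exons = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue
        chrom, _source, _type, start, end, _score, strand, _phase, attributes = fields
        attrs = dict(item.split("=", 1) for item in attributes.split(";") if "=" in item)
        # One exon may be shared by several transcripts (comma-separated Parent list).
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((chrom, int(start), int(end), strand))
    return {parent: sorted(coords, key=lambda c: c[1]) for parent, coords in exons.items()}


sample = [  # hypothetical GFF3 lines
    "##gff-version 3\n",
    "Chr1\tsrc\texon\t500\t600\t.\t+\t.\tParent=GENE1.1\n",
    "Chr1\tsrc\texon\t100\t200\t.\t+\t.\tParent=GENE1.1,GENE1.2\n",
    "Chr1\tsrc\tgene\t100\t600\t.\t+\t.\tID=GENE1\n",
]
print(exons_by_transcript(sample))
```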
Gene Locus Page
Gene Model Page
Gbrowse
GBrowse header
Main Browser Window
Track Menu
FTP site
Gene Confidence Rank
• Assigns confidence scores to all exons and gene models based on different types of experimental and computational evidence
Assigning A Confidence Rank
E1
E4
Full support
No support
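The idea of ranking exons and gene models by their supporting evidence can be illustrated with a toy scoring scheme. Everything below is a hypothetical illustration, not TAIR's published criteria: the evidence keys, the rule that any evidence type supports an exon, and the mapping of the supported fraction onto a 0 to 5 scale (rounded down).

```python
def exon_supported(evidence):
    """Hypothetical rule: an exon is supported if any evidence type covers it."""
    return any(evidence.values())


def confidence_stars(exon_evidence, max_stars=5):
    """Hypothetical rank: fraction of supported exons scaled to 0..max_stars, rounded down."""
    if not exon_evidence:
        return 0
    supported = sum(exon_supported(ev) for ev in exon_evidence)
    return max_stars * supported // len(exon_evidence)


model = [  # hypothetical evidence for exons E1 to E4
    {"cDNA": True, "EST": True},        # E1: full support
    {"cDNA": False, "peptide": True},   # E2: peptide only
    {"cDNA": False, "RNA-seq": False},  # E3: no support
    {},                                 # E4: no evidence at all
]
print(confidence_stars(model))
```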
New Tools at TAIR
• N-Browse (in collaboration with the Kris Gunsalus Lab, NYU)
• GBrowse
• Synteny viewer
N-Browse
N-Browse: Finding information about edges (interactions)
N-Browse: How to select and move nodes
N-Browse: How to visualize GO terms from a selected set of nodes
N-Browse: How to load your own file and overlay it with the curated interaction data
N-Browse: How to save your session and export your data
New Tools at TAIR
• N-Browse
• GBrowse
• Synteny viewer
GBrowse header
Main Browser Window
Track Menu
Alternative gene annotations
• Eugene (transcript, proteins +) Sebastien Aubourg
• Gnomon (transcript, proteins) Souvorov (NCBI)
• Aceview (transcript) Thierry-Mieg (NCBI)
• Hanada et al. 2007 (3633 predicted genes)
Identify possible corrections
Proteomic Data
• High-density Arabidopsis proteome map (Baerenfaller et al., 2008)
Incorrect start codon
VISTA plot GBrowse track
Transcriptome data
Orthologs and Gene Families
Variation
Promoter Elements
Methylation
Decorated Fasta file
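A "decorated" FASTA file highlights features within a sequence. As a minimal Python sketch, assuming the decoration is simply upper case for exons and lower case for introns and flanking sequence (the TAIR tool offers its own display options, which this does not reproduce), a record can be rendered from exon coordinates. The function name `decorate` is hypothetical.

```python
def decorate(seq, exons, header, width=60):
    """Return a FASTA record with exon bases upper-cased and all other bases lower-cased.

    exons: list of (start, end) pairs, 1-based and inclusive, relative to seq.
    """
    chars = list(seq.lower())
    for start, end in exons:
        for i in range(start - 1, end):  # 1-based inclusive -> 0-based indices
            chars[i] = chars[i].upper()
    decorated = "".join(chars)
    lines = [decorated[i:i + width] for i in range(0, len(decorated), width)]
    return "\n".join([f">{header}"] + lines)


print(decorate("acgtacgtac", [(1, 3), (8, 10)], "demo", width=4))
```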
New Tools at TAIR
• N-Browse
• GBrowse
• Synteny viewer
Data provided by Pedro Pattyn at the University of Ghent
AT5G48000
AT5G48010
AT5G47990
www.arabidopsis.org
www.arabidopsis.org/biocyc
www.plantcyc.org
Acknowledgements
Curators:
- Peifen Zhang
- Tanya Berardini
- David Swarbreck
- Kate Dreher
- Rajkumar Sasidharan
Tech Team:
- Bob Muller
- Larry Ploetz
- Raymond Chetty
- Anjo Chi
- Vanessa Kirkup
- Cynthia Lee
- Tom Meyer
- Shanker Singh
- Chris Wilks
AraCyc and TAIR
PI and Co-PI:
- Eva Huala
- Sue Rhee
Metabolic Pathway Software:
- Peter Karp and SRI group
Automated pipeline at TAIR: Program to Assemble Spliced Alignments (PASA)
Clustered transcripts
Resulting gene model
Previous gene model
Based on a set of rules a decision is made
comparison
NCBI
Gene structure annotation in Arabidopsis
NEW: 282 genes; 1056 exons. UPDATED: 1254 models; 1144 exons
NEW: 1291 genes; 683 exons. UPDATED: 3811 models; 4007 exons
NEW: 681 genes; 828 exons. UPDATED: 10,792 models; 14,050 exons
TAIR6
How do MOD curators annotate genomes?
Experimental & Computational Evidence
Automatic pipeline
Manual annotation
Genome annotation
ESTs cDNAs
Alternative gene modelsShort MS peptidesCommunity submissions…
Manual annotation at different MODs
Genome editing tool + evidence set + set of annotation rules
• Genome editing tools: Apollo (Arabidopsis, Fly), Aceview (Worm), Zmap/Otterlace (Human), Artemis (Pathogen Project)…
• Evidence set: nucleotide sequence, short peptides, protein similarity, alternative predictions…
• Annotation rules: exon size, intron size, number of UTRs, coding/non-coding ratio, splice junctions…
Responsibilities of a gene structure curator
(Diagram: gene model labelled with the start codon ATG, GT…AG intron boundaries and the stop codon TGA)
• Delete wrongly predicted genes
• Update mispredicted exon-intron structure, using cDNA evidence
• Annotate splice variants
• Annotate 'atypical' gene classes: short protein-coding genes, transposable element genes, pseudogenes, uORFs (genes within the UTR of other genes)
• Define gene type: protein-coding, tRNA, snRNA, snoRNA, rRNA…
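As an example of an 'atypical' gene class, uORFs can be located by scanning a 5' UTR for ATG-initiated open reading frames that end in an in-frame stop codon. The Python sketch below is a simplified, assumed approach (single strand, standard stop codons, a hypothetical `min_codons` cutoff); it is not the method TAIR used.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr5: str, min_codons: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) of ATG-initiated ORFs whose in-frame stop codon lies
    entirely within the 5' UTR (0-based, end exclusive).

    min_codons counts codons before the stop, including the ATG (hypothetical cutoff).
    """
    seq = utr5.upper()
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame of this ATG
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                break  # first in-frame stop ends the ORF
    return orfs


print(find_uorfs("CCATGAAATGACC"))  # [(2, 11)]
```

The second ATG in the example (position 7) runs off the end of the UTR without a stop codon, so it is not reported.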
Categorisation/Review
• 17,915 total gene models
• Categorise/prioritise (CDS length, BLAST similarity, gene confidence rank)
(Figure: genome browser view with tracks for TAU models (splice variants, NMD targets), RNA-seq junctions, the Augustus model (correction), the TAIR model with its TAIR confidence rank, and peptides; colour reflects the matching model. Callouts mark an incorrect junction in the TAIR model and an unsupported exon.)
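Prioritising the 17,915 models for review amounts to sorting them on the listed criteria. In the Python sketch below, the record fields, the scale of the confidence rank (higher means better supported) and the sort order (least confident, then shortest CDS, then weakest BLAST similarity first) are all assumptions for illustration; the gene identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneModel:
    locus: str             # hypothetical identifier
    cds_length: int        # nucleotides
    blast_identity: float  # best-hit percent identity, 0-100
    confidence_rank: int   # assumed scale: higher = better supported


def review_order(models: List[GeneModel]) -> List[GeneModel]:
    """Hypothetical prioritisation: least confident, shortest, least conserved first."""
    return sorted(models, key=lambda m: (m.confidence_rank, m.cds_length, m.blast_identity))


models = [
    GeneModel("gene_a", 900, 80.0, 3),
    GeneModel("gene_b", 300, 95.0, 1),
    GeneModel("gene_c", 150, 40.0, 1),
]
print([m.locus for m in review_order(models)])  # ['gene_c', 'gene_b', 'gene_a']
```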
Augustus
• RNA-seq junctions: raw spliced RNA-seq reads (8,819,162 reads) were clustered by junction assembly into 203,317 junctions
• Augustus input: RNA-seq junctions, peptides, ESTs/cDNAs, TAIR models
• Evidence types are ranked and given bonus scores
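Junction assembly collapses many spliced reads into far fewer junctions, because every read that spans the same intron supports the same junction. The Python sketch below assumes each spliced read has already been reduced to a (chromosome, strand, intron start, intron end) tuple; that representation and the `min_reads` filter are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical read representation: (chromosome, strand, intron_start, intron_end)
Junction = Tuple[str, str, int, int]


def assemble_junctions(spliced_reads: Iterable[Junction], min_reads: int = 1) -> Counter:
    """Collapse reads sharing the same intron coordinates into one junction
    and keep junctions supported by at least min_reads reads."""
    counts = Counter(spliced_reads)
    return Counter({j: n for j, n in counts.items() if n >= min_reads})


reads = [
    ("Chr1", "+", 1001, 1100),
    ("Chr1", "+", 1001, 1100),
    ("Chr1", "+", 2001, 2300),
    ("Chr1", "+", 1001, 1100),
]
print(assemble_junctions(reads))
```

The read count per junction can then serve as a simple measure of support when the junctions are passed on as evidence.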
Examples of large-scale community datasets recently integrated into the Arabidopsis annotation
• Transposable elements (Quesneville Lab)
• Pseudogenes (Gerstein Lab)
• Short MS peptides (Baerenfaller et al., Castellana et al.)
• Short genes (Hanada et al.)
Model Organism Databases
Augustus: results
Augustus models were classified into the following categories:
• Novel genes: 20
• Updated genes: 897
• Splice variants: 1826
• B-list: 1173
• Rejects: 3137
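Summing the category counts gives the number of classified Augustus models and each category's share, assuming the categories are mutually exclusive (the slide does not say so explicitly).

```python
from typing import Dict, Tuple

# Counts copied from the slide
augustus_categories: Dict[str, int] = {
    "Novel genes": 20,
    "Updated genes": 897,
    "Splice variants": 1826,
    "B-list": 1173,
    "Rejects": 3137,
}


def summarise(counts: Dict[str, int]) -> Tuple[int, Dict[str, float]]:
    """Return the total and each category's percentage, rounded to one decimal."""
    total = sum(counts.values())
    return total, {name: round(100 * n / total, 1) for name, n in counts.items()}


total, shares = summarise(augustus_categories)
print(total)  # 7053
```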
Arabidopsis gene structure annotation: a new approach
• TAIR6-TAIR9: ESTs and cDNAs served as the main source of experimental data for Arabidopsis genome annotation
• cDNAs & ESTs feed automated annotation with PASA (Program to Assemble Spliced Alignments), complemented by manual annotation, to produce the annotated Arabidopsis genome
• New evidence added to this workflow: MS peptides and RNA-seq data