UMR 1095 - ASP: Structural & Comparative Genomics in Bread Wheat

TriAnnotPipeline: A LifeGrid Project based on AUVERGRID

F. Giacomoni, M. Reichstadt, P. Leroy

Génétique, Diversité & Ecophysiologie des Céréales - Clermont-Ferrand, France

3rd EGEE User Forum, February 12th, 2008



Wheat as a challenge for Genomics

• Important economic crop

• Large genome size

[Figure: genome sizes and repeat-sequence content. A. thaliana: 140 Mb; rice: 380 Mb; maize: 2,800 Mb; human: ~3,000 Mb; barley: 4,800 Mb; bread wheat: 17,000 Mb. Repeat-sequence labels: 10%, 50%, 50-80%, 70-80%, 85%.]

I.N.R.A. Work on the Wheat Genome

• Sequencing

• Annotating: discover genes, find transposable elements, study other biological components

DNA sequence: AAAATCGATATAGAGTATGTAGACAAATTTTAAACCCGGGGGAGAGAGAGA

Results after annotation of the DNA sequence

[Figure: annotation result with tracks labelled Eugene, GenemarkHMM and GeneID]

General Pipeline Structure of TriAnnot

[Figure: overview of the TriAnnot pipeline. Recoverable labels:]

• TriAnnot Pipeline (GRID)

• Database (chado) & viewers (GBrowse): http://urgi.versailles.inra.fr/projects/TriAnnot/

• Genes: TriSet, GeneFarm, manual curation, training data set

• TEs: TREPcons, REPET, manual curation

• DNA sequences

• DataBanks

• WEB / Pipeline, Production: GBrowse, login/password

• WEB / Pipeline, Development

• On line / local access: upload, login/password

• Tools: RepeatMasker, est2genome, Gmap, BLAST, HMMPfam

• Download: gff/ARTEMIS, gameXml/APOLLO

• Manual curation: APOLLO

• GnpDB, GnpGenome (GFF)

• Users

TriAnnotPipelineGRID Architecture

GRID & Cluster

Input: BAC sequence (FASTA format)

Panel 1: Transposable elements & repeats (masking and annotation)

• Block1a, Block1b

• RepeatMasker: TREPnr, TREPtotal, RepBase

• BLASTx / TREPprot

• TRF (SSR)

• Output: BAC with masked TEs

Panel 2: Gene annotation

Gene structure

• Block2: ab initio prediction (GeneMarkHMM, GeneID, EuGene, GENSCAN, GeneZilla)

• Block3a, Block3b: BLASTx against SwissProt / TrEMBL; BLAST/Gmap with transcripts (FL-cDNA, EST, mRNA)

• Block3c: gene model with EVM + PASA (US), RAP-like (Japan), EUGENE (France)

• Output: BAC with masked TEs & genes

Gene function

• Block4: best-hit proteins (At, Os)

• Categories: Known Protein, Putative Protein, Domain Containing Protein, Expressed Gene, Conserved Hypothetical Gene, Hypothetical Gene

• IWGSC annotation guideline

Panel 3: Other biological target searches

• Block5a, Block5b, Block5c, Block5d

• BLASTn: UGset / IRGSP/TIGR pseudo

• Databanks: nt, sts, htgs, gss, …

• tRNA

• miRNA

• mtDNA, cpDNA
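Panel 1 hands a "BAC with masked TE" to the gene-prediction blocks. The idea of masking can be sketched as follows, assuming the repeat hits are already available as 1-based inclusive (start, end) coordinates. The function name and the example coordinates are hypothetical; this is an illustration only, not the pipeline's code, which relies on RepeatMasker.

```python
def mask_intervals(sequence, intervals, mask_char="N"):
    """Return a copy of `sequence` with each 1-based inclusive
    (start, end) interval replaced by `mask_char`."""
    chars = list(sequence)
    for start, end in intervals:
        if start < 1 or end > len(chars) or start > end:
            raise ValueError(f"invalid interval: {start}-{end}")
        # Convert to 0-based slice bounds and overwrite the repeat region.
        chars[start - 1:end] = mask_char * (end - start + 1)
    return "".join(chars)


# First 15 bases of the DNA sequence shown earlier in the slides;
# the repeat coordinates (5-8) are made up for the example.
bac = "AAAATCGATATAGAG"
masked = mask_intervals(bac, [(5, 8)])
print(masked)  # AAAANNNNTATAGAG
```

Masking with "N" keeps the sequence length unchanged, so coordinates reported by later blocks still refer to positions in the original BAC.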

TriAnnotPipelineGRID Detailed Architecture

WEB INTERFACE PART:

* Upload of the BAC sequence in FASTA format (Wheat Seq)

* Setting of the annotation parameters for the 5 blocks

* Production of a step.xml file

PIPELINE PART:

STEP_0:

* 3 RepeatMasker vs 3 DataBanks

STEP_1:

* 8 BLASTn vs 8 DataBanks

* 1 BLASTx vs 1 DataBank

* 1 Tandem Repeat Finder

STEP_2:

* 1 EugeneIMM Rice

* 1 GeneId

* 4 GeneMarkHMM with 4 matrices

STEP_3:

* 1 tBLASTx vs 1 DataBank

* 1 BLASTn vs 1 DataBank

* 1 BLASTx vs 1 DataBank

STEP_4:

* 2 tBLASTn vs 2 DataBanks

RESULTS FILES (GFF format)
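The web interface turns the chosen parameters into a step.xml file that drives the steps listed above. The slides do not show the file's layout, so the sketch below only illustrates the idea: the element and attribute names (`pipeline`, `step`, `program`, `runs`) and the file name `bac.fasta` are assumptions, while the job counts are copied from the slide.

```python
import xml.etree.ElementTree as ET

# Job counts per step, as listed on the slide.
STEPS = {
    "STEP_0": [("RepeatMasker", 3)],
    "STEP_1": [("BLASTn", 8), ("BLASTx", 1), ("TandemRepeatFinder", 1)],
    "STEP_2": [("EugeneIMM", 1), ("GeneId", 1), ("GeneMarkHMM", 4)],
    "STEP_3": [("tBLASTx", 1), ("BLASTn", 1), ("BLASTx", 1)],
    "STEP_4": [("tBLASTn", 2)],
}


def build_step_xml(sequence_file, steps=STEPS):
    """Build a hypothetical step description as an XML element tree."""
    root = ET.Element("pipeline", sequence=sequence_file)
    for step_name, programs in steps.items():
        step = ET.SubElement(root, "step", name=step_name)
        for program, runs in programs:
            ET.SubElement(step, "program", name=program, runs=str(runs))
    return root


root = build_step_xml("bac.fasta")
total_runs = sum(int(p.get("runs")) for p in root.iter("program"))
print(ET.tostring(root, encoding="unicode"))
print(total_runs)  # 24 program runs for one BAC
```

Counting the runs this way shows why the workload matters: a single BAC already triggers 24 program executions in this configuration.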

WEB INTERFACE PART:

* Upload of the BAC sequence in FASTA format (Wheat Seq)

* Setting of the annotation parameters for the 5 blocks

* Production of a step.xml file

PIPELINE PART:

PIPELINE_GRID PART I (STEP_1A):

* 5 RepeatMasker (RM)

PIPELINE LOCAL PART:

STEP_1B:

* 1 TRF

STEP_2:

* 1 EugeneIMM Rice

* 1 GeneId

* 4 GeneMarkHMM

STEP_3C:

* 3 Gene Modelling

PIPELINE_GRID PART II (STEP_1B, 3A, 3B, 4A, 4B, 5A and 5D):

* 3 BLASTx

* 8 GMap

* 6 BLASTp

* 1 PFAM

* 1 tBLASTn

* 14 BLASTn

RESULTS FILES (GFF format)
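Both the grid and the local parts deliver their results as GFF files. As a reminder of what such a record looks like, here is a minimal sketch that formats one feature as a nine-column, tab-separated GFF line. The slides do not say which GFF version the pipeline writes; the `key=value` attribute style below follows GFF3, and the identifiers and coordinates are made up for the example.

```python
def gff_line(seqid, source, feature_type, start, end,
             score=None, strand="+", phase=None, attributes=None):
    """Format one feature as a 9-column, tab-separated GFF line.
    Missing score, phase or attributes are written as '.'."""
    attrs = ";".join(f"{key}={value}" for key, value in (attributes or {}).items())
    columns = [
        seqid,
        source,
        feature_type,
        str(start),
        str(end),
        "." if score is None else str(score),
        strand,
        "." if phase is None else str(phase),
        attrs or ".",
    ]
    return "\t".join(columns)


# Hypothetical gene prediction on a hypothetical BAC.
line = gff_line("BAC_0001", "GeneID", "gene", 120, 980,
                strand="-", attributes={"ID": "gene1"})
print(line)
```

Because every tool's output ends up in the same column layout, predictions from different programs can be loaded into one database and displayed side by side in a viewer such as GBrowse.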

TriAnnotPipelineGRID Architecture

[Figure: server part and grid part. Recoverable labels: Server / User Interface (UI), JDL, Computing Element (CE), SE, DB update service, bioinformatic algorithms, bioinformatic databases, bioinformatic package.]

Tasks:

• Get the parameters

• Create the XML step file

• Get the input (sequence) file

• Create the grid environment (JDL, shell scripts)

• Mask the repeated sequences

• Run RepeatMasker / BLAST / GMap / HMMer

• Retrieve the output

• Fill the database

Numbered flow between Server, UI and CE:

1- Parameters + input file

2- Creation of the XML file

3- Copy of the input files

4- Creation of the environment

5- Job submission

6- Job running (BLAST/HMMer/RepeatMasker/GMap)

7- Job output

8- Output transfer

9- DB filling
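Step 4 of the flow above creates the grid environment, that is, a JDL file plus shell scripts for each job. The slides do not show the generated files, so the following is only a sketch of how such a JDL description could be produced. It uses the usual JDL attributes (`Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox`); the script and file names are hypothetical.

```python
def jdl_list(items):
    """Render a Python list as a JDL list of quoted strings."""
    return "{" + ", ".join(f'"{item}"' for item in items) + "}"


def make_jdl(executable, arguments, input_files, output_files):
    """Return the text of a minimal JDL job description."""
    input_sandbox = jdl_list([executable] + list(input_files))
    output_sandbox = jdl_list(["std.out", "std.err"] + list(output_files))
    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{arguments}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        f"InputSandbox = {input_sandbox};",
        f"OutputSandbox = {output_sandbox};",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical BLAST job on one BAC sequence.
jdl = make_jdl("run_blast.sh", "bac.fasta", ["bac.fasta"], ["result.gff"])
print(jdl)
```

One such description per program run keeps the jobs independent, so that the RepeatMasker, BLAST, GMap and HMMer runs can be submitted separately and their outputs transferred back before the database is filled.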

2007-2008

TriAnnotPipelineGRID Partners

• F. Giacomoni, C. Charpentier, N. Guilhot, F. Choulet, P. Leroy, C. Feuillet

• T. Tanaka, H. Ikawa, H. Numa, T. Itoh

• M. Alaux, T. Flutre, I. Blanc-Lenfle, S. Reboux, H. Quesneville

• B. Haas, F. Legeai

• B. Kronmiller

• M. Reichstadt, A. Claude, M. Liauzu, A. Mahul