Mitochondrial Genome Annotation: develop an automated pipeline for mtDNA annotation by refining taxon...
TRANSCRIPT
Introduction | Materials and Methods | Results | Conclusion and Outlook
Mitochondrial Genome Annotation: Protein Genes
Marwa Al Arab (1,2)
(1) Institute of Bioinformatics, University of Leipzig
(2) Department of Bioinformatics, Lebanese University
TBI Bled 2015
Marwa Al Arab Mitochondrial Genome Annotation
Outline
Introduction: Mitochondrial DNA, Problem
Materials and Methods: Tools, Training, Annotation
Results
Mitochondrial DNA
- Circular molecule located in mitochondria within eukaryotic cells.
- Mitochondria transform energy into a form used by the cells.
- Length: about 16,500 nucleotides.
- 13 protein-coding genes
- 22 tRNA genes
- 2 rRNA genes
[Wikipedia]
Problem
- RefSeq is the most widely used repository for mitochondrial genome annotation.
- RefSeq suffers from several inconsistencies and errors in annotation (Bernt et al. 2012):
  - Missing or incorrect strand
  - Confusion between trnL1/trnL2 and between trnS1/trnS2
  - Inconsistencies in gene names
- These errors are a problem when developing automated analyses of mitochondrial data.
Objective
- Develop an automated pipeline for mtDNA annotation by refining taxon-specific HMMs and covariance models.
Tools
- HMMER: an implementation of profile HMMs (Sean Eddy and his group).
- hmmbuild: build a model from a multiple sequence alignment
- hmmalign: align sequences to a model
- hmmsearch: search a model against a sequence database
- hmmscan: search a sequence (here, a translated genome) against a model database
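
The four programs above can be chained from a script. The following is a minimal sketch that only assembles the command lines; it does not run HMMER. All file names (nad1.sto, models.hmm, frames.faa, hits.domtbl and so on) are hypothetical placeholders, and the choice of the --domtblout table output is an assumption, not something stated on the slide.

```python
import shlex
from typing import List


def hmmbuild_cmd(hmm_out: str, alignment: str) -> List[str]:
    """Build a profile HMM from a multiple sequence alignment."""
    return ["hmmbuild", hmm_out, alignment]


def hmmalign_cmd(hmm: str, sequences: str, out: str) -> List[str]:
    """Align sequences to an existing profile HMM, writing the alignment to `out`."""
    return ["hmmalign", "-o", out, hmm, sequences]


def hmmsearch_cmd(hmm: str, seqdb: str, domtbl: str) -> List[str]:
    """Search one profile HMM against a sequence database."""
    return ["hmmsearch", "--domtblout", domtbl, hmm, seqdb]


def hmmpress_cmd(hmmdb: str) -> List[str]:
    """Index a concatenated HMM file; hmmscan needs this before it can run."""
    return ["hmmpress", hmmdb]


def hmmscan_cmd(hmmdb: str, query: str, domtbl: str) -> List[str]:
    """Search query sequences (e.g. translated frames) against an HMM database."""
    return ["hmmscan", "--domtblout", domtbl, hmmdb, query]


if __name__ == "__main__":
    # Hypothetical file names, printed for inspection only.
    for cmd in (
        hmmbuild_cmd("nad1.hmm", "nad1.sto"),
        hmmalign_cmd("nad1.hmm", "new_nad1.faa", "nad1_realigned.sto"),
        hmmsearch_cmd("nad1.hmm", "proteins.faa", "nad1_hits.domtbl"),
        hmmpress_cmd("models.hmm"),
        hmmscan_cmd("models.hmm", "frames.faa", "hits.domtbl"),
    ):
        print(shlex.join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) once HMMER is installed; keeping the arguments as lists avoids shell quoting problems.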
Training Data
- Build taxon-specific models along the nodes of the phylogenetic tree
- Step 1: Build protein models for the leaf sequences
- Step 2: Recursively lift up the model with the best score
- Step 3: Build the models database
[Figure: phylogenetic tree with leaves A, B, C, D and inner nodes AB, CD, ABCD; the nodes carry the models MA, MB, MC, MD, MAB, MCD, MABCD]
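
The slide does not spell out how a model is "lifted up", so the following is only one possible reading, sketched under stated assumptions: every leaf X gets a model named MX, and each inner node adopts whichever of its children's models scores best. The Node class, the lift_models function, and the score callback are hypothetical names introduced for illustration; the toy scores are made up and are not results from the talk.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """A node of the taxonomy tree; leaves have no children."""
    name: str
    children: List["Node"] = field(default_factory=list)


def lift_models(
    node: Node,
    score: Callable[[str, Node], float],
    chosen: Dict[str, str],
) -> str:
    """Post-order traversal: record in `chosen` the model selected for each node.

    Assumption: a leaf X is represented by a model called "M" + X, and an
    inner node takes the best-scoring model among those lifted from its children.
    """
    if not node.children:
        chosen[node.name] = "M" + node.name
        return chosen[node.name]
    candidates = [lift_models(child, score, chosen) for child in node.children]
    chosen[node.name] = max(candidates, key=lambda model: score(model, node))
    return chosen[node.name]


if __name__ == "__main__":
    # Tree shape taken from the figure: ((A, B), (C, D)).
    tree = Node("ABCD", [
        Node("AB", [Node("A"), Node("B")]),
        Node("CD", [Node("C"), Node("D")]),
    ])
    # Made-up scores, for illustration only.
    toy_scores = {"MA": 0.9, "MB": 0.7, "MC": 0.5, "MD": 0.8}
    chosen: Dict[str, str] = {}
    lift_models(tree, lambda model, node: toy_scores[model], chosen)
    for name, model in chosen.items():
        print(name, model)
```

In a real pipeline the score callback would presumably wrap an HMMER search of the candidate model against the sequences below the node; that part is left out here because the source gives no details.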
Example: Part of nad1 Alignment
Annotation
- Taxonomic-level database
[Flowchart: Start -> Input FASTA -> Translate in 6 frames -> Search DB -> Overlap? -> yes: Best hit -> Output BED; no: Output BED]
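
Two steps of the flowchart, the six-frame translation and the overlap/best-hit decision, can be sketched as follows. This is an illustration under several assumptions: NCBI translation table 2 (vertebrate mitochondrial) is used, although the appropriate genetic code depends on the taxon; the genome is treated as linear, ignoring circularity; overlapping hits are resolved greedily by score; and the Hit tuple and all function names are hypothetical. The database search itself is not shown.

```python
from typing import Dict, List, NamedTuple

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial); an assumption, other
# taxa use other tables.
AA_TABLE_2 = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODONS: Dict[str, str] = {
    a + b + c: AA_TABLE_2[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(CODONS.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(genome: str) -> Dict[str, str]:
    """Return the translations of frames +1..+3 and -1..-3."""
    genome = genome.upper()
    frames = {}
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


class Hit(NamedTuple):
    gene: str
    start: int   # 0-based, inclusive (BED convention)
    end: int     # exclusive
    strand: str
    score: float


def resolve_overlaps(hits: List[Hit]) -> List[Hit]:
    """Greedy choice: visit hits by descending score, drop any that overlap a kept hit."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if all(hit.end <= k.start or hit.start >= k.end for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)


def to_bed(chrom: str, hits: List[Hit]) -> List[str]:
    """Format hits as six-column BED lines (chrom, start, end, name, score, strand)."""
    return [f"{chrom}\t{h.start}\t{h.end}\t{h.gene}\t{h.score}\t{h.strand}" for h in hits]
```

Note that real mitochondrial genes can overlap by a few nucleotides, so a production version would likely tolerate small overlaps instead of rejecting every one.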
Results
- Train on 3843 mt genome sequences from RefSeq release 63
- Test on 925 genomes that are newly annotated in RefSeq release 69
- Scan against the taxonomic-level models database
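
One way to obtain such a test set is to take the genomes present in the later release but absent from the earlier one. The sketch below assumes that "newly annotated" means exactly this set difference, which the slide does not confirm; the identifiers are placeholders, not real accessions.

```python
from typing import Set


def newly_added(train_release: Set[str], test_release: Set[str]) -> Set[str]:
    """Genome identifiers that appear in the later release only."""
    return test_release - train_release


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    release_old = {"genome_1", "genome_2", "genome_3"}
    release_new = {"genome_2", "genome_3", "genome_4", "genome_5"}
    print(sorted(newly_added(release_old, release_new)))
```

Keeping the test genomes disjoint from the training genomes avoids scoring the models on sequences they were built from.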
Phylum and Class Models
Sn = TP / (TP + FN)    Sp = TP / (TP + FP)
Phylum models
Sn = 0.986 Sp = 0.985
Class models
Sn = 0.989 Sp = 0.988
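
The two measures follow directly from the counts of true positives (TP), false negatives (FN), and false positives (FP), using the definitions given on this slide. The counts in the example are made up for illustration and are not the counts behind the reported values.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Sn = TP / (TP + FN): fraction of reference genes that were found."""
    if tp + fn == 0:
        raise ValueError("TP + FN must be positive")
    return tp / (tp + fn)


def specificity(tp: int, fp: int) -> float:
    """Sp = TP / (TP + FP), as defined on the slide: fraction of predictions that are correct."""
    if tp + fp == 0:
        raise ValueError("TP + FP must be positive")
    return tp / (tp + fp)


if __name__ == "__main__":
    # Toy counts, not taken from the talk.
    tp, fn, fp = 90, 10, 30
    print(f"Sn = {sensitivity(tp, fn):.3f}")
    print(f"Sp = {specificity(tp, fp):.3f}")
```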
Conclusion and Outlook
- An automated pipeline to annotate protein-coding genes in mtDNA by refining taxon-specific HMMs
- Outlook
  - Find the best taxonomic-level database and the best parameters to minimize FN and maximize TP
  - Analyse the results in depth to improve the RefSeq annotation
  - Apply the approach to tRNAs
Thanks to
- Matthias Bernt
- Christian Honer
- Frank Juhling
- Abdullah Sahyoun
- Peter Stadler