mitochondrial genome annotationi develop an automated pipeline for mtdna annotation by re ning taxon...

19
Introduction Materials and Methods Results Conclusion and Outlook Mitochondrial Genome Annotation Protein Genes Marwa Al Arab 1,2 1 Institute of Bioinformatics University of Leipzig 2 Department of Bioinformatics Lebanese University TBI Bled 2015 Marwa Al Arab Mitochondrial Genome Annotation

Upload: others

Post on 14-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Mitochondrial Genome AnnotationI Develop an automated pipeline for mtdna annotation by re ning taxon speci c hmm and covariance models . Marwa Al Arab Mitochondrial Genome Annotation

IntroductionMaterials and Methods

ResultsConclusion and Outlook

Mitochondrial Genome AnnotationProtein Genes

Marwa Al Arab1,2

1Institute of BioinformaticsUniversity of Leipzig

2Department of BioinformaticsLebanese University

TBI Bled 2015

Marwa Al Arab Mitochondrial Genome Annotation

Page 2: Mitochondrial Genome AnnotationI Develop an automated pipeline for mtdna annotation by re ning taxon speci c hmm and covariance models . Marwa Al Arab Mitochondrial Genome Annotation

IntroductionMaterials and Methods

ResultsConclusion and Outlook

Outline

IntroductionMitochondrial DNAProblem

Materials and MethodsToolsTrainingAnnotation

Results

Marwa Al Arab Mitochondrial Genome Annotation

Page 3: Mitochondrial Genome AnnotationI Develop an automated pipeline for mtdna annotation by re ning taxon speci c hmm and covariance models . Marwa Al Arab Mitochondrial Genome Annotation

IntroductionMaterials and Methods

ResultsConclusion and Outlook

Mitochondrial DNAProblem

Mitochondrial DNA

I Circular molecule located inmitochondria withineurkaryotic cells.

I transform energy to a formused by the cells

I Length about 16500nucleotide.

I 13 protein coding genes

I 22 trna genes

I 2 rrna

[wikipedia]

Marwa Al Arab Mitochondrial Genome Annotation

Page 4: Mitochondrial Genome AnnotationI Develop an automated pipeline for mtdna annotation by re ning taxon speci c hmm and covariance models . Marwa Al Arab Mitochondrial Genome Annotation

IntroductionMaterials and Methods

ResultsConclusion and Outlook

Mitochondrial DNAProblem

Problem

I Refseq is the most used repository for mitochondrial genomeannotation

I Refseq suffers form several inconsistencies and errors inannotation

I Missing or incorrect strandI Confusing trnL1/trnL2I trnS1/trnS2I Inconsistencies in gene names(Bernt et. al 2012)

I Problem in developing automated analysis for mitochondrialdata

Marwa Al Arab Mitochondrial Genome Annotation

Page 5: Mitochondrial Genome AnnotationI Develop an automated pipeline for mtdna annotation by re ning taxon speci c hmm and covariance models . Marwa Al Arab Mitochondrial Genome Annotation

IntroductionMaterials and Methods

ResultsConclusion and Outlook

Mitochondrial DNAProblem

Objective

I Develop an automated pipeline for mtdna annotation byrefining taxon specific hmm and covariance models .

Marwa Al Arab Mitochondrial Genome Annotation

Page 6: Mitochondrial Genome AnnotationI Develop an automated pipeline for mtdna annotation by re ning taxon speci c hmm and covariance models . Marwa Al Arab Mitochondrial Genome Annotation

IntroductionMaterials and Methods

ResultsConclusion and Outlook

ToolsTrainingAnnotation

Tools

I HMMER an implementation ofprofile HMMs (Sean Eddy and hisgroup ).

I hmmbuild build a model frommultiple sequences

I hmmalign algin a model tosequences

I hmmsearch search a model insequences database

I hmmscan search a genome inmodels database

Marwa Al Arab Mitochondrial Genome Annotation

Page 7: Mitochondrial Genome AnnotationI Develop an automated pipeline for mtdna annotation by re ning taxon speci c hmm and covariance models . Marwa Al Arab Mitochondrial Genome Annotation

IntroductionMaterials and Methods

ResultsConclusion and Outlook

ToolsTrainingAnnotation

Training Data

I Build taxon specific modelsalong the phylogenetic treenodes

I Step 1: Build protein modelsfor leaf sequences

I Step 2: Recursively lift upthe model with best score

I Step 3: Build the modelsdatabase

A

B

AB

C

D

CD

ABCD

MA

MB

MC

MD

MCD

MAB

MABCD

Marwa Al Arab Mitochondrial Genome Annotation

Page 8: Mitochondrial Genome AnnotationI Develop an automated pipeline for mtdna annotation by re ning taxon speci c hmm and covariance models . Marwa Al Arab Mitochondrial Genome Annotation

IntroductionMaterials and Methods

ResultsConclusion and Outlook

ToolsTrainingAnnotation

Training Data

I Build taxon specific modelsalong the phylogenetic treenodes

I Step 1: Build protein modelsfor leaf sequences

I Step 2: Recursively lift upthe model with best score

I Step 3: Build the modelsdatabase

A

B

AB

C

D

CD

ABCD

MA

MB

MC

MD

MCD

MAB

MABCD

Marwa Al Arab Mitochondrial Genome Annotation

Page 9: Mitochondrial Genome AnnotationI Develop an automated pipeline for mtdna annotation by re ning taxon speci c hmm and covariance models . Marwa Al Arab Mitochondrial Genome Annotation

IntroductionMaterials and Methods

ResultsConclusion and Outlook

ToolsTrainingAnnotation

Training Data

I Build taxon specific modelsalong the phylogenetic treenodes

I Step 1: Build protein modelsfor leaf sequences

I Step 2: Recursively lift upthe model with best score

I Step 3: Build the modelsdatabase

A

B

AB

C

D

CD

ABCD

MA

MB

MC

MD

MCD

MAB

MABCD

Marwa Al Arab Mitochondrial Genome Annotation

Page 10: Mitochondrial Genome AnnotationI Develop an automated pipeline for mtdna annotation by re ning taxon speci c hmm and covariance models . Marwa Al Arab Mitochondrial Genome Annotation

IntroductionMaterials and Methods

ResultsConclusion and Outlook

ToolsTrainingAnnotation

Training Data

I Build taxon specific modelsalong the phylogenetic treenodes

I Step 1: Build protein modelsfor leaf sequences

I Step 2: Recursively lift upthe model with best score

I Step 3: Build the modelsdatabase

A

B

AB

C

D

CD

ABCD

MA

MB

MC

MD

MCD

MAB

MABCD

Marwa Al Arab Mitochondrial Genome Annotation

Page 11: Mitochondrial Genome AnnotationI Develop an automated pipeline for mtdna annotation by re ning taxon speci c hmm and covariance models . Marwa Al Arab Mitochondrial Genome Annotation

IntroductionMaterials and Methods

ResultsConclusion and Outlook

ToolsTrainingAnnotation

Example: Part of nad1 Alignment

Marwa Al Arab Mitochondrial Genome Annotation

Page 12: Mitochondrial Genome AnnotationI Develop an automated pipeline for mtdna annotation by re ning taxon speci c hmm and covariance models . Marwa Al Arab Mitochondrial Genome Annotation

IntroductionMaterials and Methods

ResultsConclusion and Outlook

ToolsTrainingAnnotation

Annotation

I Taxonomic level database

StartInput Fasta

Translate in 6 Frames

Search DB

Overlap Best hit

Output Bed

yes

no

Marwa Al Arab Mitochondrial Genome Annotation

Page 13: Mitochondrial Genome AnnotationI Develop an automated pipeline for mtdna annotation by re ning taxon speci c hmm and covariance models . Marwa Al Arab Mitochondrial Genome Annotation

IntroductionMaterials and Methods

ResultsConclusion and Outlook

Results

I Train 3843 mt genome sequence from refseq63

I Test on 925 genome which are newly annotated in refseq69

I Scan against level models database

Marwa Al Arab Mitochondrial Genome Annotation

Page 14: Mitochondrial Genome AnnotationI Develop an automated pipeline for mtdna annotation by re ning taxon speci c hmm and covariance models . Marwa Al Arab Mitochondrial Genome Annotation

IntroductionMaterials and Methods

ResultsConclusion and Outlook

Phylum and Class Models

Sn = TPTP+FN Sp = TP

TP+FP

Phylum models

Sn = 0.986 Sp = 0.985

Class models

Sn = 0.989 Sp = 0.988

Marwa Al Arab Mitochondrial Genome Annotation

Page 15: Mitochondrial Genome AnnotationI Develop an automated pipeline for mtdna annotation by re ning taxon speci c hmm and covariance models . Marwa Al Arab Mitochondrial Genome Annotation

IntroductionMaterials and Methods

ResultsConclusion and Outlook

Phylum and Class Models

Sn = TPTP+FN Sp = TP

TP+FP

Phylum models

Sn = 0.986 Sp = 0.985

Class models

Sn = 0.989 Sp = 0.988

Marwa Al Arab Mitochondrial Genome Annotation

Page 16: Mitochondrial Genome AnnotationI Develop an automated pipeline for mtdna annotation by re ning taxon speci c hmm and covariance models . Marwa Al Arab Mitochondrial Genome Annotation

IntroductionMaterials and Methods

ResultsConclusion and Outlook

Phylum and Class ModelsSn = TP

TP+FN Sp = TPTP+FP

Phylum models

Sn = 0.986 Sp = 0.985

Class models

Sn = 0.989 Sp = 0.988

Marwa Al Arab Mitochondrial Genome Annotation

Page 17: Mitochondrial Genome AnnotationI Develop an automated pipeline for mtdna annotation by re ning taxon speci c hmm and covariance models . Marwa Al Arab Mitochondrial Genome Annotation

IntroductionMaterials and Methods

ResultsConclusion and Outlook

Phylum and Class ModelsSn = TP

TP+FN Sp = TPTP+FP

Phylum models

Sn = 0.986 Sp = 0.985

Class models

Sn = 0.989 Sp = 0.988

Marwa Al Arab Mitochondrial Genome Annotation

Page 18: Mitochondrial Genome AnnotationI Develop an automated pipeline for mtdna annotation by re ning taxon speci c hmm and covariance models . Marwa Al Arab Mitochondrial Genome Annotation

IntroductionMaterials and Methods

ResultsConclusion and Outlook

Conclusion and Outlook

I An automated pipeline to annotate protein coding genes inmtDNA by refining taxon specific hmm

I OutlookI Find the best level database and best parameters to minimize

FN and maximize TPI Analyse the results in deep to improve the refseq annotationI Apply on tRna

Marwa Al Arab Mitochondrial Genome Annotation

Page 19: Mitochondrial Genome AnnotationI Develop an automated pipeline for mtdna annotation by re ning taxon speci c hmm and covariance models . Marwa Al Arab Mitochondrial Genome Annotation

IntroductionMaterials and Methods

ResultsConclusion and Outlook

Thanks to

I Matthias Bernt

I Christian Honer

I Frank Juhling

I Abdullah Sahyoun

I Peter Stadler

Marwa Al Arab Mitochondrial Genome Annotation