Post on 22-Dec-2015

Why Manual Genome Annotation?

Even the best gene predictors and genome annotation pipelines rarely exceed accuracies of 80% at the exon level, meaning that most gene annotations contain at least one mis-annotated exon. (Yandell and Ence, 2012, Nature Reviews)

Automated annotation is often not good enough for genes you really care about!

Yandell and Ence, 2012, Nature Reviews: http://www.yandell-lab.org/publications/pdf/euk_genome_annotation_review.pdf

Different lines of evidence go into modern gene annotation pipelines:
1. Computational prediction (Open Reading Frames, etc.)
2. Evidence-based prediction (ESTs, RNA-seq, etc.)
3. Homology-based prediction (BLAST, etc.)
These are synthesized into a consensus gene annotation, which still may be wrong!

Bees (Order Hymenoptera, Family Apidae)

Western Honey Bee (Apis mellifera)

Common Eastern Bumble Bee (Bombus impatiens)

Buff-Tailed Bumble Bee (Bombus terrestris)

Dwarf Asian Honey Bee (Apis florea)

NADPH + H+ + O2 + R-H → NADP+ + H2O + R-OH

cytochrome P450 monooxygenase enzymes

classification: CYP 3 A 4 *15 A-B

3 = family (>40% amino acid sequence homology)

A = sub-family (>55% amino acid sequence homology)

4 = isoenzyme

*15 A-B = allele
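The naming scheme above can be split mechanically into its parts. The following is a minimal sketch, assuming names follow the pattern CYP + family number + sub-family letters + isoenzyme number, with an optional allele after an asterisk. The function name `parse_cyp` and the regular expression are illustrative, not part of any official nomenclature tool.

```python
import re

# Assumed pattern: CYP<family digits><sub-family letters><isoenzyme digits>[*allele]
CYP_NAME = re.compile(
    r"^CYP(?P<family>\d+)(?P<subfamily>[A-Z]+)(?P<isoenzyme>\d+)"
    r"(?:\*(?P<allele>[\w-]+))?$"
)

def parse_cyp(name):
    """Split a P450 name into family, sub-family, isoenzyme and allele."""
    compact = name.replace(" ", "").upper()  # tolerate "CYP 3 A 4" spacing
    match = CYP_NAME.match(compact)
    if match is None:
        raise ValueError(f"not a recognizable CYP name: {name!r}")
    return match.groupdict()  # allele is None when no '*' part is present

print(parse_cyp("CYP 3 A 4 *15 A-B"))
print(parse_cyp("CYP6AS3"))
```

Bee P450 names such as CYP6AS3 have two-letter sub-families, which is why the sub-family group accepts one or more letters.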

Chemical signalling??? (pheromone synthesis and breakdown)

Detoxication (toxin and pesticide metabolism)

Hormone synthesis (highly conserved orthologs) + Detoxication

Organism                  P450s   Food / environment
Nasonia vitripennis       92      fly pupae
Apis mellifera            46      nectar and pollen / homeostatic nest
Anopheles gambiae         106     blood and detritus / standing water
Drosophila melanogaster   85      rotting fruit
Tribolium castaneum       131     seeds

Organism                  P450s   Mito   CYP2   CYP3   CYP4
Drosophila melanogaster   85      11     6      36     32
Apis mellifera            46      6      8      28     4
Nasonia vitripennis       87      6      7      45     29

Repeats

Intron splice sites are highly conserved

P450s: ~500 amino acids (1500 nucleotides)

Highly conserved heme-binding site (cysteine)

Basic Annotation Rules

CDS Start: amino acid M, nucleotides ATG

CDS Stop: amino acid *, nucleotides TAA/TAG/TGA
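These two rules are easy to check automatically before looking at anything else. The sketch below assumes the standard genetic code and a coding sequence given as a plain DNA string with introns already removed. The function name `check_cds` and the example sequences are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def check_cds(cds):
    """Return a list of rule violations for a spliced coding sequence."""
    cds = cds.upper()
    # Split into complete codons, ignoring any trailing 1-2 nucleotides.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of 3")
    if not codons or codons[0] != "ATG":
        problems.append("does not start with ATG")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("contains an internal stop codon")
    return problems

print(check_cds("ATGGAATAA"))     # []
print(check_cds("ATGTAAGAATGA"))  # ['contains an internal stop codon']
```

An empty list means the sequence passes the basic rules. It does not mean the gene model is correct, only that it is worth inspecting further.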

Translation Frames

Frame 1, Frame 2, Frame 3
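To see what the three forward frames look like for a stretch of DNA, a short script is enough. This is a minimal sketch assuming the standard genetic code. The helper names `translate` and `three_frames` are illustrative, and unknown codons (for example ones containing N) are shown as X.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(seq):
    """Translate complete codons from the start of seq; '*' marks a stop."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )

def three_frames(seq):
    """Return the translation of forward frames 1, 2 and 3."""
    return {frame + 1: translate(seq[frame:]) for frame in range(3)}

print(three_frames("ATGGAATAA"))  # {1: 'ME*', 2: 'WN', 3: 'GI'}
```

Only frame 1 of this toy sequence starts with M and ends with a stop, which is the pattern to look for when choosing the frame of a first exon.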

http://en.wikipedia.org/wiki/File:Exon_and_Intron_classes.png

http://doc.goldenhelix.com/SVS/latest/_images/splice_site_diagram.png

Intron splice sites

GT-AG

“(\w)”

“\1 ”

‘GT’ intron donor site

1 nucleotide “G” left over for the next codon = Phase 1 intron

‘AG’ intron acceptor site

2 nucleotides “AA” before the first full codon

Combine the “G” at the end of exon 1 with the “AA” at the start of exon 2

This makes the codon “GAA” for glutamic acid (E)
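The splice-site walk-through above can be reproduced in a few lines. The sketch below assumes exon 1 starts at the first nucleotide of the start codon, so that the intron phase is the exon length modulo 3. The exon and intron sequences are invented for illustration, and the function names are hypothetical.

```python
def has_canonical_sites(intron):
    """True if the intron starts with the GT donor and ends with the AG acceptor."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")

def intron_phase(exon1_coding):
    """Number of codon nucleotides left over at the end of exon 1 (0, 1 or 2)."""
    return len(exon1_coding) % 3

def junction_codon(exon1_coding, intron, exon2_coding):
    """Return the codon that spans the exon 1 / exon 2 junction after splicing."""
    if not has_canonical_sites(intron):
        raise ValueError("intron does not have GT-AG splice sites")
    phase = intron_phase(exon1_coding)
    if phase == 0:
        return None  # the intron sits between two complete codons
    spliced = (exon1_coding + exon2_coding).upper()
    start = len(exon1_coding) - phase
    return spliced[start:start + 3]

# Made-up example: exon 1 ends with a lone "G", exon 2 begins with "AA".
exon1 = "ATGCCTG"
intron = "GTAAGTTTTCAG"
exon2 = "AATTTTAA"
print(intron_phase(exon1))                   # 1
print(junction_codon(exon1, intron, exon2))  # GAA
```

A phase 1 intron leaves one nucleotide on the exon 1 side, so the two nucleotides that complete the codon must come from the start of exon 2, exactly as in the “G” + “AA” example.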

This start looks good!

Jamboree!

Search for paralogs using one of these genes from Apis mellifera in the protein database on GenBank (e.g. CYP9R1 AND Apis mellifera)

CYP9R1, CYP6AS3, CYP6BD1, CYP6AQ1, CYP4G11

Use BLASTP to find predicted paralogs in the NCBI “nr” database. Select one of the following bees for the Organism:

Apis florea
Bombus impatiens
Bombus terrestris
Megachile rotundata

Copy and paste verified amino acid sequences (FASTA formatted) into a text file:

Add comments to the header and include a gene identifier
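If you prefer to build the FASTA records with a script rather than by hand, something like the following would do. It is only a sketch: the header layout (identifier, species in brackets, then a free-text comment) is an assumed convention, and the identifier and sequence shown are placeholders, not real records.

```python
def fasta_record(gene_id, species, comment, sequence, width=60):
    """Format one protein as FASTA with an annotated header line."""
    header = f">{gene_id} [{species}] {comment}"
    # Wrap the sequence so that each line holds at most `width` residues.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines) + "\n"

# Placeholder values only; substitute the verified sequence and its identifier.
record = fasta_record(
    gene_id="EXAMPLE_ID",
    species="Bombus impatiens",
    comment="CYP9R1 paralog candidate, start and splice sites checked",
    sequence="MAAAA",
)
print(record)
```

Several records can be concatenated and written to a single text file.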

Send to me at: johnson.5005@osu.edu

Thanks!!
