![Page 1: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/1.jpg)
Annotation of Drosophila virilis
Chris Shaffer GEP workshop, 2006
![Page 2: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/2.jpg)
Annotation of D. virilis
Outline of general technique and then one practical example
This technique may not be the best with other projects (e.g. corn, bacteria)
The technique is optimized for projects with:
– A moderately close, well-annotated neighbor species
– No EST, mRNA or expression data available
![Page 3: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/3.jpg)
Helpful Hints
Evolutionary distance between D. virilis and D. melanogaster is much larger than chimp to human
– Conservation will be at the protein domain level
– Synteny is detectable in some fosmids
– Most genes stay on the same chromosome (3 exceptions seen in ~40 genes)
![Page 4: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/4.jpg)
D. virilis
Average gene size will be smaller than in mammals
Very low density of pseudogenes
Almost all genes in virilis will have the same basic structure as their melanogaster orthologs; mapping exon by exon works well for most genes
![Page 5: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/5.jpg)
How to proceed
First, identify features of interest:
1. Genscan results
• Watch out for ends - fused or split genes
2. Regions of high similarity with D. melanogaster protein, identified by BLAST
• Overlapping genes usually on opposite strand
• Be vigilant for partial genes at fosmid ends
3. Regions with high similarity to known genes (i.e. BLAST to nr) not covered above
![Page 6: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/6.jpg)
Basic Procedure
For each feature of interest:
1. Identify the likely ortholog in D. m.
2. Use D. m. database to find gene model of ortholog and identify all exons
3. Use BLASTX to identify locations and frames of each exon, one by one
4. Based on locations, frames, and gene predictions, find donor and acceptor splice sites that link frames together; identify the exact base location (start and stop) of each coding exon
5. Double-check your results by translation
![Page 7: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/7.jpg)
Basic procedure (graphically)
[Figure: a fosmid containing one feature. BLASTX of the predicted gene to melanogaster proteins suggests this region is orthologous to a Dm gene with 5 exons. BLASTX of each exon locates its region of similarity; the five exons are labeled 1 3 3 2 1.]
![Page 8: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/8.jpg)
Basic procedure (graphically)
Zoom in on the ends of the exons and find the first Met, the matching intron donor (GT) and acceptor (AG) sites, and the final stop codon
Once these have been identified, write down the exact location of the first base and last base of each exon. Use these numbers to check your gene model
[Figure: the five exons (labeled 1 3 3 2 1) with a zoom on the start of the gene showing the Met and the first GT donor site. First bases: 1121 1402 1754 2122 2601. Last bases: 1187 1591 1939 2434 2789.]
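The coordinate bookkeeping on this slide can be sketched in a few lines of Python. This is only an illustration: it assumes the ten numbers on the slide are five first-base coordinates followed by five last-base coordinates, and the helper name `introns_between` is hypothetical.

```python
# Exon coordinates read off the slide
# (assumption: five first bases, then five last bases).
starts = [1121, 1402, 1754, 2122, 2601]
ends = [1187, 1591, 1939, 2434, 2789]
exons = list(zip(starts, ends))


def introns_between(exons):
    """Return (first_base, last_base) of each intron implied by ordered exons.

    Coordinates are 1-based and inclusive.
    """
    introns = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        if next_start <= prev_end + 1:
            raise ValueError("exons overlap or touch; check the coordinates")
        introns.append((prev_end + 1, next_start - 1))
    return introns


for first, last in exons:
    print(f"exon   {first}-{last}  length {last - first + 1}")
for first, last in introns_between(exons):
    print(f"intron {first}-{last}  (look for GT starting at {first}, AG ending at {last})")

total = sum(last - first + 1 for first, last in exons)
print("total coding length:", total, "multiple of 3:", total % 3 == 0)
```

Under that reading the five exons add up to 945 bases, a multiple of three, which is what a complete coding sequence should give.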
![Page 9: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/9.jpg)
Example Annotation
Open Safari and go to goose.wustl.edu
Click on Genome Browser
![Page 10: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/10.jpg)
Example Annotation
Settings are: Insect; D. virilis; Mar. 2005; chr10 (chr10 is a fosmid from 2005)
Click submit
![Page 11: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/11.jpg)
Example Annotation
Seven predicted Genscan genes
Each one would be investigated
![Page 12: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/12.jpg)
Investigate 10.4
All putative genes will need to be analyzed; we will focus on 10.4 in this example
To zoom in on this gene enter chr10:15000-21000 in the position box
Then click the jump button
![Page 13: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/13.jpg)
Step 1: Find Ortholog
If this is a real gene it will probably have at least some homology to a D. melanogaster protein
Step one: do a BLAST search with the predicted protein sequence of 10.4 to all proteins in D. melanogaster
![Page 14: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/14.jpg)
Step 1: Find Ortholog
Click on one of the exons in gene 10.4
On the Genscan report page click on Predicted Protein
Select and copy the sequence
Do a blastp search of the predicted sequence to the D. melanogaster “Annotated Proteins” database at http://flybase.net/blast
![Page 15: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/15.jpg)
Step 1: Find Ortholog
The results show a significant hit to the “A” and “B” isoforms of the gene “mav”
![Page 16: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/16.jpg)
Step 1: Results of Ortholog search
The alignment looks right for virilis vs. melanogaster: regions of high similarity interspersed with regions of little or no similarity
We have a probable ortholog: maverick
![Page 17: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/17.jpg)
Step 2: Gene model
What does mav look like?
Go to ENSEMBL to get exons and map them to regions:
– Web browser: go to www.ensembl.org
![Page 18: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/18.jpg)
Step 2: Gene model
Click on Drosophila
Search for mav (top right search box)
Click on “Ensembl Gene: CG1901”
Scroll down to map and notice two isoforms
![Page 19: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/19.jpg)
Step 2: Gene model
We now have a gene model (two-exon gene, two isoforms).
We will annotate isoform A since it is the largest.
Due to time constraints, our policy so far is to have students pick and annotate only one isoform for each feature.
If more than one isoform exists, pick the largest or the one with the most exons
Here the student should choose to annotate isoform A (largest)
All isoforms should be annotated eventually
![Page 20: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/20.jpg)
Step 3: Investigate Exons
Since we are annotating isoform A, we need the peptide sequences of exons 1 and 2 for the BLASTX searches
Click on [Peptide info] for isoform A on the right, just above the map
Scroll down to find the peptide sequence with exons in different colors:
YNASSNKYSLINVSQSKNFPQLFNKKLSVQWINTVPIQSRQTRETRDIGLETKRHSKPSKRVDETRLKHLVLKGLGIKKLPDMRKVNISQAEYSSKYIEYLSRLRSNQEKGNSYFNNFMGASFTRDLHFLSITTNGFNDISNKRLRHRRSLKKINRLNQNPKKHQNYGDLLRGEQDTMNILLHFPLTNAQDANFHHDK
![Page 21: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/21.jpg)
Step 3: Investigate Exons
Start with exon 1
We will use a variant of the BLAST program called bl2seq (BLAST 2 Sequences). This version compares two sequences instead of comparing a sequence to a database
Best to search entire fosmid DNA sequence (easier to keep track of positions) with the amino acid sequence of exon 1
![Page 22: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/22.jpg)
Step 3: Investigate Exons
Create 3 tabs in Safari
In the first tab, go to the goose browser chr10 of virilis; click the DNA button, then click “get DNA”
In the second tab, go to www.ensembl.org and get the peptide sequence for the melanogaster mav gene
These first two tabs now have the two sequences you are going to compare
In the third tab go to NCBI blast page and click on “Align two sequences (bl2seq)”
![Page 23: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/23.jpg)
Step 3: Investigate Exons
Copy and paste the genomic sequence from tab 1 into sequence box 1 of tab 3
Copy and paste the peptide sequence of exon 1 from tab 2 into sequence box 2
Since we are comparing a DNA sequence to a protein we need to run BLASTX
Turn off the filter
Leave other values at default for now
Click “align” button to run the comparison
![Page 24: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/24.jpg)
Step 3: Investigate Exons
No significant homology found
Either the mav ortholog is not in this fosmid (unlikely given the original blastp hit) or this exon is not well conserved
Let's look for similarities of lower quality
Click the back button to go back to the bl2seq page
Change the expect value to 1000 and click align
![Page 25: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/25.jpg)
Step 3: Investigate Exons
We have a weak alignment (50 identities and 94 similarities), but we have seen worse when comparing single exons from these two species
Notice the location of the hit (bases 16866 to 17504) and frame +3
![Page 26: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/26.jpg)
Step 3: Investigate Exons
A similar search with exon 2 sequences gives a location of chr10:18476-19744 and frame +2
For larger genes continue with each exon, searching with bl2seq (adjusting E cutoff if necessary) and noting location and frame of region of similarity
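The frame reported for each hit follows directly from the coordinate of its first aligned base. A minimal sketch, assuming 1-based coordinates and a hit on the plus strand; `forward_frame` is a hypothetical helper name, not part of any BLAST tool:

```python
def forward_frame(first_base):
    """Return the plus-strand reading frame (1, 2 or 3) of a codon that
    starts at the given 1-based coordinate."""
    if first_base < 1:
        raise ValueError("coordinates are 1-based")
    return (first_base - 1) % 3 + 1


# Hits from the example: exon 1 starts at 16866, exon 2 at 18476.
for name, start in [("exon 1", 16866), ("exon 2", 18476)]:
    print(f"{name}: frame +{forward_frame(start)}")
```

For the two hits above this gives +3 and +2, matching the frames shown in the BLASTX output. The same check is handy when choosing a start codon or splice site: a candidate position that falls in a different frame from the hit cannot simply be joined to it.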
![Page 27: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/27.jpg)
Step 4: Create Gene Model
Pick ATG (Met) at start of gene: the first Met in frame with the coding region of similarity (+3)
For each putative intron/exon boundary, compare the location of the BLASTX result with gene finder results to locate the exact first and last base of the exon, and check that the intron starts with “GT” and ends with “AG”
Exons: 16515-17504; 18473-19744
Intron GT and AG present
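The splice-site check can also be done programmatically once the fosmid sequence is saved as text. The sketch below uses a made-up toy sequence rather than real fosmid data, and the function names are hypothetical; coordinates are 1-based and inclusive, as in the browser.

```python
def intron_signals(genomic, exon_end, next_exon_start):
    """Return the first two and last two bases of the intron that lies
    between an exon ending at exon_end and one starting at next_exon_start."""
    # Python slices are 0-based and end-exclusive, so this selects
    # bases exon_end+1 .. next_exon_start-1 in 1-based coordinates.
    intron = genomic[exon_end:next_exon_start - 1]
    return intron[:2].upper(), intron[-2:].upper()


def has_canonical_splice_sites(genomic, exon_end, next_exon_start):
    """True if the intron starts with GT and ends with AG."""
    donor, acceptor = intron_signals(genomic, exon_end, next_exon_start)
    return donor == "GT" and acceptor == "AG"


# Toy sequence (not real data): exon 1 = bases 1-6, intron = 7-14, exon 2 = 15-20.
toy = "ATGGCC" + "GTAAGTAG" + "TTTTAA"
print(has_canonical_splice_sites(toy, 6, 15))

# Exon coordinates from the example; the intron spans 17505-18472.
exons = [(16515, 17504), (18473, 19744)]
coding_length = sum(last - first + 1 for first, last in exons)
print(coding_length, "bases; remainder mod 3 =", coding_length % 3)
```

With the example coordinates the two exons total 2262 bases, a multiple of three, so the model at least has a length consistent with an intact reading frame. With the real fosmid sequence, the call would be `has_canonical_splice_sites(fosmid, 17504, 18473)`.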
![Page 28: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/28.jpg)
Step 4: Confirm Gene Model
As a final check we need to create the putative mRNA, translate it and make sure the protein we get out is similar to expected:
1. Enter coordinates for each exon in browser
2. Click “DNA” button at top then “get DNA”
3. Copy the sequence into a text file
4. Repeat for each exon, adding DNA to file
5. Go to http://us.expasy.org/tools/dna.html
6. Enter your entire sequence, hit “Translate Sequence”; should get one long protein
7. Compare the protein sequence to ortholog using bl2seq
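Steps 1 to 6 above amount to joining the exons and translating the result. The following is a minimal sketch of that idea, not the workshop's tool: it uses the standard genetic code, a made-up toy sequence, and hypothetical function names.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def splice(genomic, exons):
    """Concatenate exon sequences given 1-based inclusive (first, last) pairs."""
    return "".join(genomic[first - 1:last] for first, last in exons)


def translate(cds):
    """Translate a coding sequence codon by codon ('X' for unknown codons)."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, len(cds) - 2, 3))


def looks_like_one_orf(protein):
    """True if the protein starts with M and its only stop is the last residue."""
    return (protein.startswith("M") and protein.endswith("*")
            and "*" not in protein[:-1])


# Toy sequence (not real data): exon 1 = bases 1-6, intron = 7-14, exon 2 = 15-20.
toy = "ATGGCC" + "GTAAGTAG" + "TTTTAA"
protein = translate(splice(toy, [(1, 6), (15, 20)]))
print(protein, looks_like_one_orf(protein))
```

For the toy sequence this prints `MAF* True`. An internal `*` in the translation of a real model usually means an exon boundary is off by a base or two, so the coordinates should be rechecked before comparing the protein to the ortholog.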
![Page 29: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/29.jpg)
Step 4: confirm model (Future)
We have a web page under construction which will simplify confirmation
This web site will double-check intron-exon boundaries, translate the putative message and create a data file suitable for uploading
![Page 30: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/30.jpg)
Considerations
Some exons are very hard to find (small or non-conserved); keep increasing the E value to find any hits (10,000,000 is not unheard of)
Donor “GC” seen on rare occasions
We have seen one example where the only reasonable interpretation was that an intron had moved (out of about 70 genes)
Without EST and expression data you may get stuck; use your best judgment
![Page 31: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/31.jpg)
Gene Function
In addition to annotation of the genes we ask the students to look into the function of each gene and discuss what they found in their final paper on annotation
For genes in Drosophila, the best place to begin investigating gene function is the online Drosophila database called Flybase.
![Page 32: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/32.jpg)
Flybase
www.flybase.org
flybase.bio.indiana.edu
![Page 33: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/33.jpg)
Flybase gene info
Search for gene name
Will find links to info pages with many helpful references
Remember many genes have functions assigned based only on similarity data
This is especially true for anonymous genes “CG#####”. Take any functional assignment with large amounts of skepticism; consider it a guess at best
![Page 34: Annotation of Drosophila virilis](https://reader036.vdocument.in/reader036/viewer/2022062500/568158ec550346895dc62b84/html5/thumbnails/34.jpg)
Gene function for Mav