genome analysis2

Gene identification

Upload: malla-reddy-college-of-pharmacy

Post on 16-Jul-2015

TRANSCRIPT

Page 1: Genome analysis2

Gene identification

Page 2: Genome analysis2

Open Reading frame

Page 3: Genome analysis2

Six reading frames of dsDNA

Page 4: Genome analysis2

Six reading frames of dsDNA

Page 5: Genome analysis2

Complication with Introns

Page 6: Genome analysis2

Genome annotation

•Annotation : Obtaining biological information from unprocessed sequence data

•Structural annotation : Identification of genes and other important sequence elements

•Functional annotation : The determination of the functional roles of genes in the organism

Page 7: Genome analysis2

Genome annotation

•Raw genomic sequence can be annotated by:

i. Comparison with databases of previously cloned genes and ESTs

ii. Gene prediction based on consensus features such as promoters, splice sites, polyadenylation sites and ORFs
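The consensus-feature approach can be illustrated with a minimal sketch that scans a DNA string for two short signal motifs. The patterns used here (a simplified TATA box pattern and the canonical AATAAA polyadenylation hexamer) and the function name `scan_signals` are assumptions chosen for illustration; real annotation tools use weighted models rather than exact patterns.

```python
import re

# Simplified signal patterns (illustrative assumptions, not a full model).
SIGNALS = {
    "TATA box": re.compile(r"TATA[AT]A[AT]"),
    "polyA signal": re.compile(r"AATAAA"),
}

def scan_signals(seq):
    """Return a sorted list of (position, signal name) hits on the given strand."""
    seq = seq.upper()
    hits = []
    for name, pattern in SIGNALS.items():
        for match in pattern.finditer(seq):
            hits.append((match.start(), name))
    return sorted(hits)

print(scan_signals("GGTATAAAAGGCCAATAAAGG"))
```

Exact-pattern hits like these are only candidates: short motifs occur by chance, which is why such signals are normally combined with ORF and similarity evidence.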

Page 8: Genome analysis2

Gene identification

Gene finding in eukaryotes is difficult

Genome              Genes
Bacterial genome    80-85%
Yeast               70%
Fruit fly           25%
Human genome        3-5%

In the human genome:
 - Typical exon = 150 bp
 - Intron = several kb
 - Complete gene = hundreds of kb

Page 9: Genome analysis2

ORF prediction

•Three reading frames are possible from each strand of a DNA molecule, so a "six-frame translation" is used
 - Result is 6 potential protein sequences
 - The longest frame uninterrupted by a stop codon is usually taken to be the correct one
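The six-frame idea can be sketched as follows: for each of the three frames on each strand, find the longest run of codons that is not interrupted by a stop codon. The function names and the toy input are assumptions for illustration; the sketch ignores start codons and works on the DNA alphabet only.

```python
# Standard stop codons in the DNA alphabet.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def longest_open_stretch(seq, frame):
    """Return (start, end) of the longest stop-free run of codons in one frame."""
    best = (frame, frame)
    run_start = frame
    last = frame
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            if i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = i + 3  # next run begins after the stop codon
        last = i + 3
    # A run may also extend to the end of the sequence.
    if last - run_start > best[1] - best[0]:
        best = (run_start, last)
    return best

def six_frame_open_stretches(seq):
    """Return (strand, frame, start, end, length) for all six reading frames."""
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start, end = longest_open_stretch(s, frame)
            results.append((strand, frame, start, end, end - start))
    return results

for row in six_frame_open_stretches("ATGAAATAGCCC"):
    print(row)
```

Coordinates on the "-" strand refer to the reverse-complemented sequence. Picking the row with the largest length implements the "longest uninterrupted frame" heuristic, which is a rule of thumb and can be misled by short genes or by introns.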

•Finding the end of an ORF is easier than finding its beginning. The beginning can be found using:
 - Start codon
 - Kozak sequence (CCGCCAUGG) flanking the start codon
 - CpG islands
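As a rough sketch of using sequence context to rank candidate start codons, the snippet below checks each ATG for two features of the Kozak sequence quoted above: a purine three bases upstream and a G immediately after the ATG. The scoring scheme (0 to 2) and the function name `kozak_context` are assumptions for illustration, not an established scoring method.

```python
def kozak_context(seq):
    """Return (position, score) for every ATG; score counts matching context bases."""
    seq = seq.upper()
    hits = []
    pos = seq.find("ATG")
    while pos != -1:
        minus3 = seq[pos - 3] if pos >= 3 else ""           # base at position -3
        plus4 = seq[pos + 3] if pos + 3 < len(seq) else ""  # base right after ATG
        score = int(minus3 in ("A", "G")) + int(plus4 == "G")
        hits.append((pos, score))
        pos = seq.find("ATG", pos + 1)
    return hits

print(kozak_context("CCGCCATGGTTATGC"))
```

In this toy input the first ATG sits in a full Kozak-like context and scores 2, while the second matches only the upstream purine and scores 1.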

Page 10: Genome analysis2

Software programs for gene identification

•Advantage : Speed – annotation can be carried out concurrently with sequencing itself.

•Disadvantage : Accuracy

•Two strategies used are, - Homology searching - ab initio prediction

Page 11: Genome analysis2
Page 12: Genome analysis2
Page 13: Genome analysis2
Page 14: Genome analysis2
Page 15: Genome analysis2

ab initio prediction

Based on type of algorithm:

GRAIL – Based on neural networks
 - Predicts exons, genes, promoters, polyAs, CpG islands, EST similarities, repetitive elements

GeneFinder – Rule-based system

GENSCAN, GENIE, HMMGene, GeneMark.hmm, FGENEH – Hidden Markov model
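To give a feel for the hidden Markov model approach, here is a toy two-state model (coding "C" and non-coding "N") decoded with the Viterbi algorithm. All probabilities are made-up assumptions for illustration, and the programs listed above use far richer state structures (exons, introns, signals) than this sketch.

```python
import math

STATES = ("C", "N")  # C = coding, N = non-coding
START = {"C": 0.5, "N": 0.5}
TRANS = {"C": {"C": 0.9, "N": 0.1}, "N": {"C": 0.1, "N": 0.9}}
# Invented emission probabilities: coding is GC-rich, non-coding is AT-rich.
EMIT = {
    "C": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "N": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

def viterbi(seq):
    """Return the most probable state path for seq as a string of 'C'/'N'."""
    seq = seq.upper()
    if not seq:
        return ""
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[t][s] = best previous state for state s at step t + 1
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))

print(viterbi("GGGGGGAAAAAA"))
```

Log probabilities are used so that long sequences do not underflow. With these invented parameters, a GC-rich run followed by an AT-rich run is segmented into a coding block and a non-coding block.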

Page 16: Genome analysis2

GENSCAN

Page 17: Genome analysis2

ab initio prediction

1. Feature-dependent methods
Features of eukaryotic genes recognized are:
 - Control signals such as TATA box, cap site, Kozak consensus and polyadenylation sites

HEXON and MZEF are gene prediction programs that predict only a single feature, the exon.

2. A few programs depend on differences in base composition between coding and non-coding DNA
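A base-composition signal can be sketched very simply by computing GC content in windows along the sequence. The window and step sizes and the function names are assumptions for illustration; actual programs use more elaborate statistics (for example codon-position or hexamer frequencies).

```python
def gc_fraction(seq):
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def windowed_gc(seq, window, step):
    """Return (start, GC fraction) for each full window along seq."""
    return [
        (i, gc_fraction(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, step)
    ]

print(windowed_gc("GGCCAATT", window=4, step=4))
```

Regions whose composition departs from the local background become candidates for closer inspection; composition alone is weak evidence and is normally combined with other features.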

Page 18: Genome analysis2

ab initio prediction

Accuracy problem – Algorithms are not 100% accurate

Errors include:
 - Incorrect calling of exon boundaries
 - Missed exons
 - Failure to detect entire genes

Solution: Running different programs on a single genome
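One simple way to use the output of several programs is to keep only the exons that enough of them agree on. The sketch below assumes each program's output has already been parsed into a list of (start, end) exon coordinates; the program names, coordinates and the `min_votes` threshold are hypothetical.

```python
from collections import Counter

def consensus_exons(predictions, min_votes=2):
    """Keep exons (start, end) reported by at least min_votes programs."""
    votes = Counter()
    for exons in predictions.values():
        votes.update(set(exons))  # each program votes once per exon
    return sorted(exon for exon, n in votes.items() if n >= min_votes)

# Hypothetical parsed predictions from three programs.
predictions = {
    "progA": [(100, 250), (400, 520)],
    "progB": [(100, 250), (410, 520)],
    "progC": [(100, 250), (400, 520), (900, 950)],
}
print(consensus_exons(predictions))
```

Exact-coordinate voting is strict: exons whose boundaries differ by a few bases count as different predictions, which directly reflects the boundary-calling errors listed above. Overlap-based voting would be a more lenient alternative.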

Page 19: Genome analysis2

Homology searching

•Finding genes in long sequences by looking for matches with sequences that are known to be transcribed, e.g. a cDNA, an EST or a known gene

Programs used are based on BLAST (Basic Local Alignment Search Tool): BLASTN, BLASTX, BLASTP etc.
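The "local alignment" in BLAST's name can be illustrated with a small Smith-Waterman scoring function, which finds the best-scoring matching region between two sequences. This is a swapped-in exact dynamic-programming method shown for clarity: BLAST itself uses a faster seed-and-extend heuristic. The scoring values (match 2, mismatch -1, gap -2) are assumptions for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

# A genomic fragment and a transcribed sequence sharing the region ACGT.
print(smith_waterman_score("GGGACGTCCC", "TTACGTAA"))
```

A high local score between a genomic region and a known cDNA or EST is the kind of evidence homology searching relies on.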

Page 20: Genome analysis2

Homology searching or ab initio ?

•Algorithms that take similarity data into account are better at gene prediction – Reese et al. (2000), Fortna et al. (2001)

The latest gene prediction algorithms combine similarity data with ab initio methods. Examples: Grail/Exp, GenieEST, GenomeScan

tRNAscan-SE : For tRNA identification

Page 21: Genome analysis2

Advanced gene finding programs

Page 22: Genome analysis2

GLIMMER

•Gene Locator and Interpolated Markov ModelER

•For finding genes in microbial DNA
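The Markov-model idea behind this kind of gene finder can be sketched with a fixed-order model: train one model on known coding sequence and one on non-coding sequence, then score a candidate region by the log-odds between them. This is a simplification and not GLIMMER's actual method, since an interpolated Markov model combines contexts of several lengths. The training strings, add-one smoothing and function names are assumptions for illustration.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_markov(seqs, order):
    """Count next-base occurrences for every context of length `order`."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1
    return counts

def log_prob(seq, counts, order):
    """Log probability of seq under the model, with add-one smoothing."""
    total = 0.0
    for i in range(order, len(seq)):
        ctx = counts.get(seq[i - order:i], {})
        n = sum(ctx.values())
        total += math.log((ctx.get(seq[i], 0) + 1) / (n + len(ALPHABET)))
    return total

def coding_log_odds(seq, coding, noncoding, order):
    """Positive values favour the coding model, negative the non-coding one."""
    return log_prob(seq, coding, order) - log_prob(seq, noncoding, order)

# Toy training data (invented, far too small for real use).
coding = train_markov(["ATGATGATGATG"], order=2)
noncoding = train_markov(["AAAAAAAAAAAA"], order=2)
print(coding_log_odds("ATGATG", coding, noncoding, order=2))
print(coding_log_odds("AAAAAA", coding, noncoding, order=2))
```

Higher-order contexts capture more signal but need more training data, which is the trade-off that interpolating across orders is meant to address.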

Page 23: Genome analysis2

GLIMMER

Page 24: Genome analysis2

GLIMMER

Page 25: Genome analysis2

GeneMark

Page 26: Genome analysis2

GeneMark

Page 27: Genome analysis2

GENSCAN