csce555 bioinformatics lecture 3 gene finding meeting: mw 4:00pm-5:15pm swgn2a21 instructor: dr....

Post on 14-Jan-2016


CSCE555 Bioinformatics

Lecture 3 Gene Finding

Meeting: MW 4:00PM-5:15PM SWGN2A21

Instructor: Dr. Jianjun Hu

Course page: http://www.scigen.org/csce555

University of South Carolina

Department of Computer Science and Engineering

2008 www.cse.sc.edu

Roadmap

Transcription and Translation

Structure and Organization of Genes

Gene Finding in genomes of Prokaryotic organisms

Introduction to Sequence Alignment

Summary


How to Do Great Bioinformatics?

You need to understand biology

You need to understand the NEEDS of biologists

You need to know how to identify the key problems in biology that have become addressable today

Transcription & Translation

Prokaryotic Cells Eukaryotic Cells

Transcription Process: RNA Polymerase

Translation: How Ribosome Synthesizes Proteins

Ribosomes manufacture proteins based on mRNA instructions. Each ribosome reads mRNA, recruits tRNA molecules to fetch amino acids, and assembles the amino acids in the proper order.
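The reading step described above can be sketched in a few lines of Python. This is only an illustration, assuming the standard genetic code; the names `CODON_TABLE` and `translate` are hypothetical and not from the lecture. The table is built from the conventional TCAG ordering of the 64 codons, with `*` marking the three stop codons.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
# '*' marks a stop codon. (Names here are illustrative, not from the slides.)
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Read the sequence codon by codon and stop at the first stop codon."""
    seq = mrna.upper().replace("U", "T")  # accept RNA (U) or DNA (T) letters
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, `translate("AUGGCCUAA")` reads AUG (M), GCC (A), then the stop codon UAA, and returns `"MA"`.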

Genetic Code

Gene Structure of Prokaryotic Cells

Stop codons: TAA, TGA, TAG

Genes in Eukaryotic Cells

Pre-mRNA Splicing Process

1-to-M Alternative Splicing

Gene Info:

1) A DNA sequence coding for the pre-mRNA

2) An additional DNA code or other regulating process, which regulates the alternative splicing.

Core Promoter Structure


How to Find Genes

Start codon: ATG

Stop codons: TAA, TGA, TAG

Gene-Finding Algorithm

Input: DNA sequences, a threshold gene length K

Output: All possible ORF sequences of length at least K

Procedure:

Scan each of the 3 reading frames, and find subsequences that start with ATG and end with one of (TAA, TAG, TGA)

Repeat the above for the complementary strand as well
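The procedure above can be sketched as follows. This is a minimal illustration rather than the lecture's own implementation: the function names are hypothetical, the threshold is expressed in codons (including the start and stop codon), and for each stop codon only the ORF beginning at the first ATG in that frame is reported.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, read in its own 5'->3' direction."""
    return dna.translate(_COMPLEMENT)[::-1]


def orfs_in_strand(dna, min_codons):
    """Scan the 3 reading frames of one strand for ATG...stop stretches."""
    found = []
    for frame in range(3):
        start = None  # index of the first ATG seen since the last stop codon
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = dna[start:i + 3]
                if len(orf) // 3 >= min_codons:
                    found.append(orf)
                start = None
    return found


def find_orfs(dna, min_codons):
    """ORFs on the given strand, followed by ORFs on the complementary strand."""
    dna = dna.upper()
    return (orfs_in_strand(dna, min_codons)
            + orfs_in_strand(reverse_complement(dna), min_codons))
```

For example, `find_orfs("CCATGAAATAGCC", 3)` returns `["ATGAAATAG"]`, found in the third reading frame of the given strand; raising the threshold to 4 codons returns an empty list.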

Risk of the Simple Gene Finding Algorithm

The identified ORFs may arise just from randomness.

How likely is it for an ORF to be a result of random sequences?

Significance of an ORF being a gene:

◦We expect the likelihood of an ORF being the result of random sequences to be less than p.

Calculating p

3 out of 64 codons are stop codons

P(run of k non-stop codons) = (61/64)^k

(61/64)^62 = 0.051

Setting K = 64 (62 + 1 ATG + 1 stop codon) makes it unlikely (about 5%) that an identified ORF is the result of a random sequence.
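The arithmetic above can be checked directly. The sketch below assumes, as the slide does, that all 64 codons are equally likely; the function names are illustrative only.

```python
import math

# Probability that a random codon is not a stop codon (61 of 64),
# assuming all 64 codons are equally likely.
P_NON_STOP = 61 / 64


def prob_no_stop(k):
    """Probability that k consecutive random codons contain no stop codon."""
    return P_NON_STOP ** k


def min_run_for_significance(alpha):
    """Smallest k for which prob_no_stop(k) drops strictly below alpha."""
    return math.floor(math.log(alpha) / math.log(P_NON_STOP)) + 1


print(round(prob_no_stop(62), 3))
print(min_run_for_significance(0.05))
```

Under this model a run of 62 non-stop codons has probability of about 0.051, which matches the figure on the slide. Getting strictly below 0.05 takes one more codon (k = 63).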

Permutation Test/Randomization Test

A generic method to estimate significance level (p value)

Example: how likely is it that a 10-codon ORF is the result of random permutation?

Method:

◦Randomly generate (or permute given sequences) 10,000 sequences

◦Draw a histogram of the lengths of the sequences that have a stop codon (null distribution)

◦Calculate the percentage of random ORFs that have lengths >= 10.
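The method above can be sketched with a small simulation. This is a rough illustration under stated assumptions: sequences are generated at random with equal base frequencies (rather than by permuting a real genome), ORF length is counted as the number of non-stop codons before the first stop codon, and the function names and the `max_codons` safety cap are hypothetical.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def random_orf_length(rng, max_codons=1000):
    """Draw random codons until a stop codon appears; return the run length."""
    length = 0
    while length < max_codons:
        codon = "".join(rng.choice("ACGT") for _ in range(3))
        if codon in STOP_CODONS:
            return length
        length += 1
    return length


def estimate_p_value(observed_codons, n_trials=10_000, seed=0):
    """Fraction of random ORFs at least as long as the observed one."""
    rng = random.Random(seed)
    null_lengths = [random_orf_length(rng) for _ in range(n_trials)]
    hits = sum(1 for n in null_lengths if n >= observed_codons)
    return hits / n_trials
```

With equal base frequencies the estimate for a 10-codon ORF should land close to the theoretical value (61/64)^10, roughly 0.62, so an ORF this short is not significant.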

Estimating cut-off K for gene finding algorithm

Exact theoretical calculation: sensitive to the assumptions, equal probability of codons, etc.

Randomized test: do a permutation test, find a length k such that <5% of random ORFs have lengths greater than k.

Sequence Alignment: the Problem

Given two sequences, measure their similarity

ATAACTTTAATTAAATCCTTTTACTAAA
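The slide does not name a particular similarity measure. One common choice is the global alignment score computed by dynamic programming (Needleman-Wunsch), sketched below. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions made for illustration only.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of a and b (Needleman-Wunsch recurrence)."""
    # prev[j] holds the best score for aligning the processed prefix of a
    # with the first j characters of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]
```

For example, `global_alignment_score("ACGT", "AGT")` is 2: three matches and one gap. Higher scores mean more similar sequences under the chosen scoring scheme.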

Web Tool to Align Two Sequences

http://www.ebi.ac.uk/emboss/align

Applications of Sequence Alignment

Prediction of function (of genes, proteins, promoters) from homology

Database search

◦Find sequences that are similar to our query sequence (e.g. a new gene)

Gene finding by genome comparison

Sequence divergence/phylogeny

Sequence Assembly

Summary

Transcription, Translation

Gene structures of Prokaryotic and Eukaryotic cells

Finding genes (ORFs) for prokaryotic cells

Sequence alignment applications
