csce555 bioinformatics lecture 3 gene finding meeting: mw 4:00pm-5:15pm swgn2a21 instructor: dr....
Post on 14-Jan-2016
CSCE555 Bioinformatics
Lecture 3: Gene Finding
Meeting: MW 4:00PM-5:15PM SWGN2A21
Instructor: Dr. Jianjun Hu
Course page: http://www.scigen.org/csce555
University of South Carolina
Department of Computer Science and Engineering
2008 www.cse.sc.edu
Roadmap
Transcription and Translation
Structure and Organization of Genes
Gene Finding in genomes of Prokaryotic organisms
Introduction to Sequence Alignment
Summary
How to Do Great Bioinformatics?
You need to understand biology
You need to understand the NEEDS of biologists
You need to know how to identify the key problems in biology that have become addressable today
Transcription & Translation
Prokaryotic Cells Eukaryotic Cells
Transcription Process: RNA Polymerase
Translation: How Ribosome Synthesizes Proteins
Ribosomes manufacture proteins based on mRNA instructions. Each ribosome reads mRNA, recruits tRNA molecules to fetch amino acids, and assembles the amino acids in the proper order.
Genetic Code
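The way a ribosome reads mRNA codon by codon can be sketched in a few lines of Python. This is only an illustration, assuming the standard genetic code and assuming the input DNA is the coding strand; the names `CODON_TABLE`, `transcribe`, and `translate` are made up for this example and are not from the lecture.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order U, C, A, G
# (first base varies slowest). "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Simplified transcription: assumes `dna` is the coding strand (T -> U)."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first AUG up to (not including) the first in-frame stop.

    Raises KeyError if the sequence contains characters other than A, C, G, U.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, `translate(transcribe("CCATGAAATAGCC"))` reads AUG (M), AAA (K), then stops at UAG.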
Gene Structure of Prokaryotic Cells
Stop codons: TAA, TGA, TAG
Genes in Eukaryotic Cells
Pre-mRNA Splicing Process
Alternative Splicing
Gene Info:
1) A DNA sequence coding for the pre-mRNA
2) An additional DNA code or other regulating process, which regulates the alternative splicing.
Core Promoter Structure
Roadmap
Transcription and Translation
Structure and Organization of Genes
Gene Finding in genomes of Prokaryotic organisms
Introduction to Sequence Alignment
Summary
How to Find Genes
Start codon: ATG
Stop codons: TAA, TGA, TAG
Gene-Finding Algorithm
Input: DNA sequences, a threshold gene length K
Output: All possible ORF sequences (of length at least K)
Procedure:
Scan each of the 3 reading frames and find subsequences that start with ATG and end with one of (TAA, TAG, TGA)
Repeat the above for the complementary strand (the reverse complement) as well
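The procedure above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own implementation: it assumes the threshold is measured in codons (start and stop codon included), that an ORF runs from the first ATG in a frame to the next in-frame stop codon, and that the input contains only A, C, G, T. The names `find_orfs` and `orfs_in_strand` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons):
    """Scan the 3 reading frames of one strand.

    Returns (frame, start, end, orf) tuples; `end` is exclusive and the
    coordinates refer to the strand that was scanned.
    """
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open an ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i + 3 - start) // 3
                if n_codons >= min_codons:     # apply the length threshold K
                    found.append((frame, start, i + 3, seq[start:i + 3]))
                start = None                   # look for the next ATG
    return found


def find_orfs(seq, min_codons):
    """Find ORFs on both strands; '+' is the given strand, '-' its reverse complement."""
    seq = seq.upper()
    forward = [("+",) + hit for hit in orfs_in_strand(seq, min_codons)]
    reverse = [("-",) + hit
               for hit in orfs_in_strand(reverse_complement(seq), min_codons)]
    return forward + reverse
```

Calling `find_orfs("CCATGAAATAGCC", 3)` reports a single 3-codon ORF, `ATGAAATAG`, in frame 2 of the forward strand; raising the threshold to 4 codons returns nothing.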
Risk of the Simple Gene Finding Algorithm
The identified ORFs may arise just from randomness.
How likely is it for an ORF to be the result of a random sequence?
Significance of an ORF being a gene:
◦We expect the likelihood of the ORF being the result of a random sequence to be less than p.
Calculating p
3 out of 64 codons are stop codons
P(run of k non-stop codons) = (61/64)^k
(61/64)^62 = 0.051
Setting k=64 (62 + 1 ATG + 1 stop codon) makes it unlikely (p ≈ 0.05) that an identified ORF is the result of a random sequence.
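The arithmetic can be checked with a short Python sketch. It assumes, as the calculation above does, that all 64 codons are equally likely and independent; the function names are made up for this example.

```python
import math

P_NON_STOP = 61 / 64  # 61 of the 64 codons are not stop codons


def prob_run(k):
    """Probability that k consecutive random codons contain no stop codon."""
    return P_NON_STOP ** k


def min_run_length(p):
    """Smallest k such that prob_run(k) <= p."""
    return math.ceil(math.log(p) / math.log(P_NON_STOP))
```

`prob_run(62)` evaluates to about 0.051, matching the figure above. Under the same assumptions, `min_run_length(0.05)` gives 63, so a run of 62 codons sits just above the 5% level and 63 just below it.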
Permutation Test/Randomization Test
A generic method to estimate the significance level (p value)
Example: how likely is it that a 10-codon ORF is the result of random permutation?
Method:
◦Randomly generate 10,000 sequences (or permute the given sequences)
◦Draw a histogram of the ORF lengths of the sequences that have a stop codon (null distribution)
◦Calculate the percentage of random ORFs that have lengths >=10.
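A minimal Python sketch of this permutation test follows. Several details are assumptions made for the illustration rather than choices stated in the lecture: the null sequences are shuffles of the given sequence, only the three forward reading frames are scanned, ORF length is counted in codons with the start and stop codon included, and the function names are hypothetical. The histogram step is omitted; the null lengths are simply collected in a list.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_lengths(seq):
    """Lengths (in codons) of all ATG-to-stop ORFs in the 3 forward frames."""
    lengths = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                lengths.append((i + 3 - start) // 3)
                start = None
    return lengths


def permutation_p_value(seq, observed_codons, n_permutations=10_000, seed=0):
    """Fraction of ORFs in shuffled sequences that are at least `observed_codons` long."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    bases = list(seq.upper())
    null_lengths = []                  # the null distribution of ORF lengths
    for _ in range(n_permutations):
        rng.shuffle(bases)             # permuting keeps the base composition
        null_lengths.extend(orf_lengths("".join(bases)))
    if not null_lengths:
        return 0.0                     # no random ORFs were found at all
    hits = sum(1 for n in null_lengths if n >= observed_codons)
    return hits / len(null_lengths)
```

A call such as `permutation_p_value(genome, 10)` then estimates how often a random ORF reaches 10 codons; `genome` here stands for whatever DNA string is being analyzed.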
Estimating cut-off K for gene finding algorithm
Exact theoretical calculation: sensitive to the assumptions (equal probability of codons, etc.)
Randomized test: do a permutation test and find a length k such that <5% of random ORFs have lengths greater than k.
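Given a list of ORF lengths from random sequences (the null distribution), the cut-off can be read off directly. The sketch below is one simple way to do it, assuming the lengths are already collected in a Python list; `cutoff_length` is a hypothetical name.

```python
def cutoff_length(null_lengths, alpha=0.05):
    """Smallest length k such that fewer than `alpha` of the null ORFs exceed k."""
    n = len(null_lengths)
    if n == 0:
        raise ValueError("null distribution is empty")
    # The fraction of ORFs longer than k only changes at observed lengths,
    # so it is enough to try those, from shortest to longest.
    for k in sorted(set(null_lengths)):
        longer = sum(1 for length in null_lengths if length > k)
        if longer / n < alpha:
            return k
```

For a toy null distribution with lengths 1 through 100, `cutoff_length` returns 96, since only 4% of the lengths exceed 96.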
Sequence Alignment: the Problem
Given two sequences, measure their similarity
ATAACTTTAATTAAATCCTTTTACTAAA
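One common way to turn "similarity" into a number is a global alignment score computed by dynamic programming (the Needleman-Wunsch approach). The slide does not say which method is meant, so the sketch below is only an illustration; the scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of sequences a and b (score only, no traceback)."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]                        # a[:i] aligned against an empty prefix
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]
```

With these scores, two identical 7-letter sequences score 7, and `global_alignment_score("ACGT", "AGT")` is 2 (three matches and one gap).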
Web Tool to Align Two Sequences
http://www.ebi.ac.uk/emboss/align
Applications of Sequence Alignment
Prediction of functions (of genes/proteins/promoters) by homology
Database search
◦Find sequences that are similar to our query sequence (e.g. new gene)
Gene finding by genome comparison
Sequence divergence/phylogeny
Sequence Assembly
Summary
Transcription, Translation
Gene structures of Prokaryotic and Eukaryotic cells
Finding genes (ORFs) for prokaryotic cells
Sequence alignment applications