CSCE555 Bioinformatics, Lecture 3: Gene Finding. Meeting: MW 4:00PM-5:15PM, SWGN2A21. Instructor: Dr. Jianjun Hu. Course page: http://www.scigen.org/csce555. University of South Carolina, Department of Computer Science and Engineering, 2008. www.cse.sc.edu

Upload: amelia-jennings

Post on 14-Jan-2016

TRANSCRIPT

Page 1: CSCE555 Bioinformatics

Lecture 3: Gene Finding

Meeting: MW 4:00PM-5:15PM, SWGN2A21

Instructor: Dr. Jianjun Hu

Course page: http://www.scigen.org/csce555

University of South Carolina, Department of Computer Science and Engineering, 2008

www.cse.sc.edu

Page 2: Roadmap

Transcription and Translation

Structure and Organization of Genes

Gene Finding in Genomes of Prokaryotic Organisms

Introduction to Sequence Alignment

Summary

Page 3: How to Do Great Bioinformatics?

You need to understand biology.

You need to understand the NEEDS of biologists.

You need to know how to identify the key problems in biology that have become addressable today.

Page 4: Transcription & Translation

Figure panels: Prokaryotic Cells, Eukaryotic Cells

Page 5: Transcription Process: RNA Polymerase

Page 6: Translation: How Ribosome Synthesizes Proteins

Ribosomes manufacture proteins based on mRNA instructions. Each ribosome reads mRNA, recruits tRNA molecules to fetch amino acids, and assembles the amino acids in the proper order.

Genetic Code

Page 7: Genetic Code
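As a rough illustration of how the genetic code maps codons to amino acids, here is a minimal Python sketch. It assumes the standard genetic code and reads the coding DNA strand (T in place of the mRNA's U). The names `CODON_TABLE` and `translate` are made up for this example and are not from the lecture.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
# "*" marks the three stop codons (TAA, TAG, TGA).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA sequence codon by codon up to the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":  # stop codon: the ribosome releases the protein
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCAAGTAA"))  # MAK
```

The example sequence is hypothetical: ATG (Met), GCC (Ala), AAG (Lys), followed by the stop codon TAA.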

Page 8: Gene Structure of Prokaryotic Cells

Figure labels: TAA, TGA, TAG (stop codons)

Page 9: Genes in Eukaryotic Cells

Page 10: Pre-mRNA Splicing Process

Page 11: M Alternative Splicing

Gene Info:

1) A DNA sequence coding for the pre-mRNA

2) An additional DNA code or other regulating process, which regulates the alternative splicing.

Page 12: Core Promoter Structure

Page 13: Roadmap

Transcription and Translation

Structure and Organization of Genes

Gene Finding in Genomes of Prokaryotic Organisms

Introduction to Sequence Alignment

Summary

Page 14: How to Find Genes

Figure labels: TAA, TGA, TAG (stop codons); ATG (start codon)

Page 15: Gene-Finding Algorithm

Input: DNA sequences, a threshold gene length K

Output: All possible ORF sequences of length at least K

Procedure:

Scan each of the 3 reading frames and find subsequences that start with ATG and end with one of (TAA, TAG, TGA).

Repeat the above for the complementary strand as well.
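The procedure above can be sketched in Python as follows. This is only one possible reading of the slide: it assumes the threshold K is counted in codons (start and stop codons included), and for each stop codon it reports only the longest ORF, i.e. the one beginning at the first ATG in that frame. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons):
    """Yield (start, end, orf) for ATG...stop ORFs in the 3 frames of one strand."""
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    yield start, i + 3, seq[start:i + 3]
                start = None


def find_orfs(dna, min_codons):
    """Return (strand, start, end, orf) tuples for both strands.

    Coordinates on the "-" strand refer to the reverse-complemented sequence.
    """
    dna = dna.upper()
    hits = [("+", s, e, orf) for s, e, orf in orfs_in_strand(dna, min_codons)]
    hits += [("-", s, e, orf)
             for s, e, orf in orfs_in_strand(reverse_complement(dna), min_codons)]
    return hits


# Hypothetical toy sequence with one 4-codon ORF in the third reading frame.
print(find_orfs("CCATGAAACCCTAGGG", min_codons=4))
# [('+', 2, 14, 'ATGAAACCCTAG')]
```

On a real genome `min_codons` would be set to the cut-off K rather than the toy value used here.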

Page 16: Risk of the Simple Gene Finding Algorithm

The identified ORFs may arise just from randomness.

How likely is it for an ORF to be the result of a random sequence?

Significance of an ORF being a gene:

◦ We expect the likelihood of the ORF being the result of a random sequence to be less than p.

Page 17: Calculating p

3 out of 64 codons are stop codons.

P(run of k non-stop codons) = (61/64)^k

(61/64)^62 = 0.051

Setting K = 64 (62 + 1 ATG + 1 stop codon) makes it unlikely (p ≈ 0.05) that an identified ORF is the result of a random sequence.
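The arithmetic on this slide can be checked with a few lines of Python. The sketch assumes, as the slide does, that all 64 codons are equally likely; `p_random_run` is a name chosen for this example.

```python
P_SENSE = 61 / 64  # chance that a uniformly random codon is not a stop codon


def p_random_run(k):
    """Probability that k consecutive random codons contain no stop codon."""
    return P_SENSE ** k


print(round(p_random_run(62), 3))  # 0.051

# Smallest run length whose chance probability falls strictly below 5%.
k = 1
while p_random_run(k) >= 0.05:
    k += 1
print(k)  # 63
```

A run of 62 codons sits just above the 5% level and a run of 63 just below it, so the slide's value is a rounded threshold.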

Page 18: Permutation Test/Randomization Test

A generic method to estimate the significance level (p value)

Example: how likely is it that a 10-codon ORF is the result of random permutation?

Method:

◦ Randomly generate 10,000 sequences (or permute the given sequences)

◦ Draw a histogram of the lengths of the random ORFs, i.e. sequences that have a stop codon (null distribution)

◦ Calculate the percentage of random ORFs that have lengths >= 10.
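A minimal simulation of the method above might look like this in Python. It is a sketch under several assumptions not stated on the slide: bases are drawn uniformly, each random sequence is 200 codons long, only the first reading frame is read, and the ORF length is the number of codons before the first stop codon. All names are hypothetical.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_before_stop(seq):
    """Count codons read in frame 0 before the first stop codon (or the end)."""
    for n, i in enumerate(range(0, len(seq) - 2, 3)):
        if seq[i:i + 3] in STOP_CODONS:
            return n
    return len(seq) // 3


def null_lengths(n_sequences=10_000, n_codons=200, seed=0):
    """ORF lengths observed in uniformly random DNA (the null distribution)."""
    rng = random.Random(seed)
    return [
        codons_before_stop("".join(rng.choices("ACGT", k=3 * n_codons)))
        for _ in range(n_sequences)
    ]


def fraction_at_least(lengths, min_codons):
    """Share of random ORFs whose length is >= min_codons (empirical p value)."""
    return sum(length >= min_codons for length in lengths) / len(lengths)


lengths = null_lengths()
print(fraction_at_least(lengths, 10))
```

Under these assumptions the printed fraction should land near (61/64)^10 ≈ 0.62, which suggests a 10-codon ORF is not significant on its own. Plotting a histogram of `lengths` gives the null distribution mentioned in the method.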

Page 19: Estimating cut-off K for gene finding algorithm

Exact theoretical calculation: sensitive to the assumptions (equal probability of codons, etc.)

Randomized test: do a permutation test and find a length k such that <5% of random ORFs have lengths greater than k.
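The randomized estimate of the cut-off can be sketched as below. The base frequencies are a parameter so that the effect of the "equal probability" assumption is visible. The AT-rich weights are invented for illustration and do not describe any particular genome, and the function names are hypothetical.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def null_orf_lengths(rng, weights=None, n_sequences=10_000, n_codons=200):
    """Codons before the first stop in random DNA; weights are for A, C, G, T."""
    lengths = []
    for _ in range(n_sequences):
        seq = "".join(rng.choices("ACGT", weights=weights, k=3 * n_codons))
        length = n_codons  # used when no stop codon occurs at all
        for n in range(n_codons):
            if seq[3 * n:3 * n + 3] in STOP_CODONS:
                length = n
                break
        lengths.append(length)
    return lengths


def cutoff(lengths, alpha=0.05):
    """Smallest k such that fewer than alpha of the lengths are greater than k."""
    k = 0
    while sum(length > k for length in lengths) >= alpha * len(lengths):
        k += 1
    return k


rng = random.Random(0)
k_uniform = cutoff(null_orf_lengths(rng))
# Hypothetical AT-rich composition: A = T = 0.35, C = G = 0.15.
k_at_rich = cutoff(null_orf_lengths(rng, weights=[0.35, 0.15, 0.15, 0.35]))
print(k_uniform, k_at_rich)
```

With uniform bases the estimate should come out close to the theoretical value of about 62 codons. Because all three stop codons are AT-rich, the AT-rich composition produces stops more often and so yields a smaller cut-off, which is the sensitivity to assumptions that the slide warns about.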

Page 20: Sequence Alignment: the Problem

Given two sequences, measure their similarity

ATAACTTTAATTAAATCCTTTTACTAAA
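One common way to turn "measure their similarity" into a number is a global alignment score. The slide does not name a method, so the sketch below uses the standard Needleman-Wunsch dynamic program as a stand-in. The scoring values (match +1, mismatch -1, gap -1) and the example sequences are assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score of the best global alignment of a and b."""
    # prev[j] holds the best score for aligning the processed prefix of a
    # with b[:j]; only two rows of the table are kept in memory.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))  # 2
```

In the example the best alignment pairs ACGT with A-GT: three matches and one gap, for a score of 2. Higher scores mean more similar sequences under the chosen scoring scheme.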

Page 21: Web Tool to Align Two Sequences

http://www.ebi.ac.uk/emboss/align

Page 22: Applications of Sequence Alignment

Prediction of function (genes, proteins, promoters) by homology

Database search

◦ Find sequences that are similar to our query sequence (e.g. a new gene)

Gene finding by genome comparison

Sequence divergence/phylogeny

Sequence assembly

Page 23: Summary

Transcription, Translation

Gene structures of Prokaryotic and Eukaryotic cells

Finding genes (ORFs) for prokaryotic cells

Sequence alignment applications