intro to molbiol and tics

Upload: nik-dim

Post on 07-Apr-2018


  • 8/3/2019 Intro to Molbiol and tics

    1/81

  • 8/3/2019 Intro to Molbiol and tics

    2/81

    The Hierarchical Structure of an organism

    Level of Organization

    organism

    organs

    tissues

    cells

    chromosomes
  • 8/3/2019 Intro to Molbiol and tics

    3/81

    http://www.brooklyn.cuny.edu/bc/ahp/LAD/C4/C4_Chromosomes.html

    Chromosomes are made up of protein and DNA

  • 8/3/2019 Intro to Molbiol and tics

    4/81

    Chromosomes

    Figure 4-14. Two closely related species of deer with very different chromosome numbers. In the evolution of the Indian muntjac, initially separate chromosomes fused, without having a major effect on the animal. These two species have roughly the same number of genes. (Adapted from M.W. Strickberger, Evolution, 3rd edition, 2000, Sudbury, MA: Jones & Bartlett Publishers.)

  • 8/3/2019 Intro to Molbiol and tics

    5/81

    distancelearning.ksi.edu/demo/bio378/lecture.html

    DNA is a macromolecule composed of four basic units

  • 8/3/2019 Intro to Molbiol and tics

    6/81

    A chromosome is a long sequence

  • 8/3/2019 Intro to Molbiol and tics

    7/81

    A simple view of the central dogma

  • 8/3/2019 Intro to Molbiol and tics

    8/81

    All organisms on earth use the same operating system

    http://www.cbs.dtu.dk/staff/dave/roanoke/central_new_dogma.gif
  • 8/3/2019 Intro to Molbiol and tics

    9/81

    Central Dogma of Gene Expression

  • 8/3/2019 Intro to Molbiol and tics

    10/81

    Dinner discussion: Integrative Bioinformatics & Genomics VU

    metabolome

    proteome

    genome

    transcriptome

    physiome

    Genomics

  • 8/3/2019 Intro to Molbiol and tics

    11/81

    Genetic information is carried as a three-base genetic code

    Four bases (A G C T/U) must encode 20 a.a.

    Therefore a combination is required: 4^3 = 64

    Each triplet is called a CODON, and reading must begin at a precise site

    Of 64, 61 specify individual a.a.; and three are STOP codons

    The starting codon is AUG (methionine)

    Code is universal, synonymous, degenerate

    Reading frame

    3rd base in codon: wobble

    Frameshifts/deletions/insertions (MUTATIONS)
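The counting argument on this slide can be checked with a short sketch. The Python below is an illustration and not part of the original slides; the names `BASES`, `AMINO_ACIDS` and `CODON_TABLE` are chosen here for convenience. It builds the standard genetic code and confirms that three-base words over four bases give 64 codons, of which 61 encode the 20 amino acids and 3 are stops.

```python
from itertools import product

# Standard genetic code with codons ordered U, C, A, G at each position.
# '*' marks a stop codon. All names are illustrative, not from the slides.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
amino_acids = {aa for aa in CODON_TABLE.values() if aa != "*"}

print(len(CODON_TABLE))    # number of possible codons (4 ** 3)
print(len(sense_codons))   # codons that specify an amino acid
print(stop_codons)         # the three stop codons
print(len(amino_acids))    # distinct amino acids encoded
print(CODON_TABLE["AUG"])  # the start codon encodes methionine (M)
```

Because 61 sense codons cover only 20 amino acids, most amino acids are encoded by more than one codon, which is what "degenerate" means on the slide.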

  • 8/3/2019 Intro to Molbiol and tics

    12/81

    The genetic code

  • 8/3/2019 Intro to Molbiol and tics

    13/81

    All organisms use the same genetic code

  • 8/3/2019 Intro to Molbiol and tics

    14/81

    Three different reading frames
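As a small illustration of the three forward reading frames, the sketch below splits one RNA string into codons starting at offsets 0, 1 and 2. The function name `reading_frames` and the short example sequence are assumptions made for this example.

```python
def reading_frames(seq):
    """Return the list of complete codons for each of the three forward frames."""
    frames = []
    for offset in range(3):
        # Step through the sequence three bases at a time from this offset,
        # dropping any incomplete codon at the end.
        codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        frames.append(codons)
    return frames


frames = reading_frames("AUGGCCCUGUGG")
for offset, codons in enumerate(frames):
    print(offset, codons)
```

The same bases give entirely different codons in each frame, which is why translation must begin at a precise site.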

  • 8/3/2019 Intro to Molbiol and tics

    15/81

    Codon Bias

    Gene finders are often organism-specific

    Coding regions are often modelled by a 5th-order Markov chain (hexamers/di-codons)
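A 5th-order Markov chain predicts each base from the five bases before it, so its parameters are hexamer frequencies. The sketch below is a minimal illustration under stated assumptions: the function names, the pseudocount smoothing and the toy training strings are all invented here, and real gene finders typically use frame-aware models trained on large annotated gene sets.

```python
import math
from collections import defaultdict


def train_markov5(sequences, alphabet="ACGT", pseudocount=1.0):
    """Estimate P(next base | previous 5 bases) from training sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(5, len(seq)):
            counts[seq[i - 5:i]][seq[i]] += 1

    def prob(context, base):
        # Additive smoothing so unseen hexamers keep a non-zero probability.
        ctx = counts.get(context, {})
        total = sum(ctx.values()) + pseudocount * len(alphabet)
        return (ctx.get(base, 0.0) + pseudocount) / total

    return prob


def log_likelihood(seq, prob):
    """Log-probability of seq under a 5th-order model (first 5 bases skipped)."""
    return sum(math.log(prob(seq[i - 5:i], seq[i])) for i in range(5, len(seq)))


def coding_score(seq, coding_prob, noncoding_prob):
    """Log-likelihood ratio: positive values favour the coding model."""
    return log_likelihood(seq, coding_prob) - log_likelihood(seq, noncoding_prob)


# Toy training data, purely illustrative.
coding = train_markov5(["ATGGCCATGGCCATGGCC"])
background = train_markov5(["ATATATATATATATATAT"])
print(coding_score("ATGGCCATGGCC", coding, background))
```

Because the hexamer frequencies are learned from one organism's genes, a model trained this way reflects that organism's codon bias, which is one reason gene finders are organism-specific.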

  • 8/3/2019 Intro to Molbiol and tics

    16/81

    Important Features

    DNA contains the "genetic template" for proteins.

    DNA is found in the nucleus

    Protein synthesis occurs in the cytoplasm, on the ribosome.

    "Genetic information" must be transferred to the cytoplasm where proteins are synthesized.

  • 8/3/2019 Intro to Molbiol and tics

    17/81

    DNA, protein, genetic code

    There are between 30,000 and 40,000 genes in the human genome

    The human gene inventory corresponds to ~1.5% of the genome (coding regions)
  • 8/3/2019 Intro to Molbiol and tics

    18/81

    The average gene

    The average human gene is 1.4 kb long, but distributed in exons over an average of 30 kb.

    There are about 11 genes per Mb of DNA, with chromosome 19 having the highest density (close to 30).

    Ashurst, J.L. and Collins, J.E. 2003. Annu Rev Genomics Hum Genet 4: 69-88.

  • 8/3/2019 Intro to Molbiol and tics

    19/81

  • 8/3/2019 Intro to Molbiol and tics

    20/81

    Component Regions of RNA

    The leading and trailing regions, the 5′ and 3′ untranslated regions (UTR), are important for cellular positioning, message stability, and the efficiency with which protein can be made from the mRNA. The open reading frame codes for the protein.

  • 8/3/2019 Intro to Molbiol and tics

    21/81

    Most eukaryotic genes contain introns

    Insulin gene (scale bar: 100 bp): promoter; +1; exon 1, intron 1, exon 2, intron 2, exon 3; poly-A signal; +1411

    transcription

    primary transcript (1431 nt), 5′ end to 3′ end: exon 1, intron 1, exon 2, intron 2, exon 3

    splicing; capping, polyadenylation

    mRNA (465 nt + poly-A tail), 5′ end to 3′ end: exon 1, exon 2, exon 3, AAAAAAAAAAAAAAAAAAAAAAAAAA

  • 8/3/2019 Intro to Molbiol and tics

    22/81

  • 8/3/2019 Intro to Molbiol and tics

    23/81

    AGCCCUCCAGGACAGGCUGCAUCAGAAGAGGCCAUCAAGCAGAUCACUGUCCUUCUGCCAUGGCCCUGUGGAUGCG

    CCUCCUGCCCCUGCUGGCGCUGCUGGCCCUCUGGGGACCUGACCCAGCCGCAGCCUUUGUGAACCAACACCUGUGC

    GGCUCACACCUGGUGGAAGCUCUCUACCUAGUGUGCGGGGAACGAGGCUUCUUCUACACACCCAAGACCCGCCGGG

    AGGCAGAGGACCUGCAGGUGGGGCAGGUGGAGCUGGGCGGGGGCCCUGGUGCAGGCAGCCUGCAGCCCUUGGCCCU

    GGAGGGGUCCCUGCAGAAGCGUGGCAUUGUGGAACAAUGCUGUACCAGCAUCUGCUCCCUCUACCAGCUGGAGAAC

    UACUGCAACUAGACGCAGCCUGCAGGCAGCCCCACACCCGCCGCCUCCUGCACCGAGAGAGAUGGAAUAAAGCCCU

    UGAACCAGC

    Insulin Gene mature mRNA sequence

    Positions 1 to 465, 5′ to 3′, followed by the poly-A tail (AAAAAAAAAAAAAAAAAAAAAAAAAAAAA)

    1-59 = 5′ UTR

    60-388 = protein-coding region

    392-465 = 3′ UTR
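The region boundaries can be checked against the sequence itself. The sketch below is illustrative and not from the slides: it takes the first AUG as the start codon (a simplifying assumption) and scans in frame for the first stop codon, using the 465-nt sequence as transcribed above. The variable names are invented here.

```python
# Mature insulin mRNA as transcribed on this slide (465 nt, poly-A tail omitted).
MRNA = (
    "AGCCCUCCAGGACAGGCUGCAUCAGAAGAGGCCAUCAAGCAGAUCACUGUCCUUCUGCCAUGGCCCUGUGGAUGCG"
    "CCUCCUGCCCCUGCUGGCGCUGCUGGCCCUCUGGGGACCUGACCCAGCCGCAGCCUUUGUGAACCAACACCUGUGC"
    "GGCUCACACCUGGUGGAAGCUCUCUACCUAGUGUGCGGGGAACGAGGCUUCUUCUACACACCCAAGACCCGCCGGG"
    "AGGCAGAGGACCUGCAGGUGGGGCAGGUGGAGCUGGGCGGGGGCCCUGGUGCAGGCAGCCUGCAGCCCUUGGCCCU"
    "GGAGGGGUCCCUGCAGAAGCGUGGCAUUGUGGAACAAUGCUGUACCAGCAUCUGCUCCCUCUACCAGCUGGAGAAC"
    "UACUGCAACUAGACGCAGCCUGCAGGCAGCCCCACACCCGCCGCCUCCUGCACCGAGAGAGAUGGAAUAAAGCCCU"
    "UGAACCAGC"
)
STOPS = {"UAA", "UAG", "UGA"}

# Assumption: translation starts at the first AUG in the message.
start = MRNA.find("AUG")  # 0-based index
# First in-frame stop codon downstream of the start.
stop = next(i for i in range(start, len(MRNA) - 2, 3) if MRNA[i:i + 3] in STOPS)

print("length:", len(MRNA))
print("5' UTR: 1 to", start)                          # 1-based, inclusive
print("start codon at:", start + 1)
print("stop codon:", MRNA[stop:stop + 3], "at", stop + 1, "to", stop + 3)
print("codons before the stop:", (stop - start) // 3)
```

On this transcription the start codon falls at position 60, in agreement with the slide, and the first in-frame stop (UAG) occupies positions 390-392, so the slide's boundaries of 388 and 392 appear to be approximate by a position or two.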

  • 8/3/2019 Intro to Molbiol and tics

    24/81

    RNA Processing, including capping and splicing, is co-transcriptional

  • 8/3/2019 Intro to Molbiol and tics

    25/81

    TYPES OF INTRONS

    GU-AG INTRONS;

    AU-AC INTRONS;

  • 8/3/2019 Intro to Molbiol and tics

    26/81

    The GT-AG (or GU-AG) Rule

    Intron boundaries are defined by the nucleotides GU (GT in DNA) and AG. Called the GT-AG rule.

    Splicing enhancers (and silencers) are found in the exons.

    The majority of animal and plant introns are removed by the spliceosome that recognizes GT-AG introns.

    However, plants and animals (but not fungi) have a second alternative spliceosome that is responsible for splicing non-canonical introns.

    After removal from the primary transcript, virtually all introns are degraded.

    Alternative splicing is thought to explain our complexity despite our limited number of protein coding genes (~30,000).
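The GT-AG rule translates directly into a boundary check. The sketch below is a toy illustration: the function names, the coordinate convention (0-based, half-open) and the short example gene are assumptions made here, not taken from the slides.

```python
def is_canonical_intron(intron_dna):
    """True if the intron starts with GT and ends with AG (the GT-AG rule)."""
    s = intron_dna.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


def splice(gene, introns):
    """Join the exons of `gene`, removing introns given as (start, end) pairs.

    Coordinates are 0-based and half-open. Non-canonical introns are rejected.
    """
    exons, pos = [], 0
    for start, end in sorted(introns):
        if not is_canonical_intron(gene[start:end]):
            raise ValueError(f"intron {start}-{end} does not follow the GT-AG rule")
        exons.append(gene[pos:start])
        pos = end
    exons.append(gene[pos:])
    return "".join(exons)


# Toy gene: exon 1 + intron + exon 2 (hypothetical sequence).
gene = "ATGGCC" + "GTAAGTCTTTCAG" + "CTGTGA"
print(splice(gene, [(6, 19)]))
```

The check is necessary but far from sufficient: GT and AG dinucleotides are common, so real splice-site prediction also relies on the surrounding sequence context.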

  • 8/3/2019 Intro to Molbiol and tics

    27/81

    HOW DOES SPLICING START?

    Andrew P. Read, Human Molecular Genetics

  • 8/3/2019 Intro to Molbiol and tics

    28/81

    The Splicing Reaction

    An unusual 5′-2′ linkage is made between the branch point nucleotide and the 5′ splice site.

    Then the free 3′ end of the 5′ exon will displace the 3′ splice site.

    This liberates the so-called lariat structure, which is degraded.

  • 8/3/2019 Intro to Molbiol and tics

    29/81

    The Branch Point Attack in More Detail

  • 8/3/2019 Intro to Molbiol and tics

    30/81

    A mature mRNA transcript looks like this

  • 8/3/2019 Intro to Molbiol and tics

    31/81

  • 8/3/2019 Intro to Molbiol and tics

    32/81

    Alternative Splicing

    The pre-mRNA contains the introns and the exons encoded in the DNA. For the mRNA to produce a functional protein, the introns must be removed. In removing the introns, a variety of potential exon combinations are possible, i.e., different combinations of exons may be joined together to generate different forms of the same protein.

  • 8/3/2019 Intro to Molbiol and tics

    33/81

    female

    male

    Alternative splicing can generate different polypeptides

    487 amino acid polypeptide

    549 amino acid polypeptide

  • 8/3/2019 Intro to Molbiol and tics

    34/81

    α-tropomyosin splicing in different cell types

  • 8/3/2019 Intro to Molbiol and tics

    35/81

    Summary

    A Drosophila homolog of human Down syndrome cell adhesion molecule (DSCAM), an immunoglobulin superfamily member, is required for the formation of axon pathways in the embryonic central nervous system. cDNA and genomic analyses reveal the existence of multiple forms of Dscam with a conserved architecture containing variable Ig and transmembrane domains. Alternative splicing can potentially generate more than 38,000 Dscam isoforms. This molecular diversity may contribute to the specificity of neuronal connectivity.

    (12) (2) (48) (33)

    A huge amount of diversity can be derived from a single gene
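The figure of "more than 38,000" isoforms follows from multiplying the numbers of alternatives shown in parentheses, assuming one alternative is chosen independently from each cluster. The short sketch below only does that arithmetic; the exon labels are an assumption based on the published Dscam gene structure and do not appear on the slide.

```python
from math import prod

# Number of mutually exclusive alternatives per variable exon cluster.
# Counts are from the slide; the exon labels are assumed for illustration.
variants = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

isoforms = prod(variants.values())
print(isoforms)  # 12 * 48 * 33 * 2
```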

  • 8/3/2019 Intro to Molbiol and tics

    36/81

    Forms of alternative splicing

    Exon skipping / inclusion

    Alternative 3′ splice site

    Alternative 5′ splice site

    Mutually exclusive exons

    Intron retention

    Constitutive exon Alternatively spliced exons

  • 8/3/2019 Intro to Molbiol and tics

    37/81

    Alt splicing as a mechanism of gene regulation

    Functional domains can be added/subtracted, giving protein diversity

    Can introduce early stop codons, resulting in truncated proteins or unstable mRNAs

    It can modify the activity of transcription factors, affecting the expression of genes

    It is observed in nearly all metazoans

    Estimated to occur in 30%-60% of human genes

  • 8/3/2019 Intro to Molbiol and tics

    38/81

    How to study alternative splicing?

  • 8/3/2019 Intro to Molbiol and tics

    39/81

    Mapping the human genome

    Cloning and Sequencing

    1953 DNA double helix (Watson and Crick)
    1972 Recombinant DNA (Berg, et al.)
    1977 DNA sequencing (Maxam and Gilbert, and Sanger)
    1980 Physical mapping by RFLPs (Botstein, Davis, and White)
    1985 PCR (Mullis)
    1986 Automated DNA sequencing machine (Hood and Smith)
    1987 YACs (Burke, Olson, and Carle); fluorescent chain-terminating dideoxynucleotides (DuPont); commercial DNA sequencing machine (Applied Biosystems)
    1991 Expressed Sequence Tag (EST, Venter et al.)
    1992 Bacterial Artificial Chromosomes (BACs, Shizuya et al.)
    1997 Capillary sequencing machine (Molecular Dynamics)

  • 8/3/2019 Intro to Molbiol and tics

    40/81

    What is bioinformatics?

    Interface of biology and computers

    Analysis of proteins, genes and genomes using computer algorithms and computer databases

    Genomics is the analysis of genomes.

    The tools of bioinformatics are used to make sense of the billions of base pairs of DNA that are sequenced by genomics projects.

  • 8/3/2019 Intro to Molbiol and tics

    41/81

    Mapping the human genome

    Bioinformatics

    1970 Global alignment, dynamic programming (Needleman and Wunsch)
    1981 Local alignment (Smith and Waterman)
    1990 Basic Local Alignment Search Tool (BLAST, Altschul et al.)
    1994 Hidden Markov Model (HMM, Krogh et al.): protein domain, gene structure
    1995 Phred/phrap (Phil Green and Brent Ewing)
         Phred: assign confidence score to sequenced nucleotide
         Phrap: assemble sequences

    Align genome and cDNA/EST: sim4, spidey, BLAT
    Gene prediction: GENSCAN, FGENESH, Genie, GeneWise

  • 8/3/2019 Intro to Molbiol and tics

    42/81

    Basic characteristics of the human genome

    Length of the human genome

    [Bar chart: length of each chromosome; x-axis: chromosome number (1-22, X, Y); y-axis ticks 20-240]

    Estimated base-pairs in the genome: 3,231,365,992

  • 8/3/2019 Intro to Molbiol and tics

    43/81

    GC content of the human genome

    [Figure: counts of A, C, G and T per chromosome; x-axis in Mbp (mega base-pairs), ticks 10-70; colors correspond to chromosomes]

    T 841,181,383
    A 839,853,043
    C 581,077,516
    G 581,490,131

    Currently in finished form: 2,843,602,073 base-pairs

    It is estimated that 88% is in finished form.

    Estimated base-pairs in the genome: 3,231,365,992
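The base counts on this slide are enough to compute the overall GC content. The sketch below uses only the numbers quoted above; the variable names are chosen here for illustration.

```python
# Base counts quoted on the slide for the finished sequence.
counts = {
    "A": 839_853_043,
    "C": 581_077_516,
    "G": 581_490_131,
    "T": 841_181_383,
}
estimated_total = 3_231_365_992  # estimated base-pairs in the whole genome

finished = sum(counts.values())
gc_fraction = (counts["G"] + counts["C"]) / finished
finished_fraction = finished / estimated_total

print(finished)                    # total finished base-pairs
print(round(gc_fraction, 4))       # fraction of G + C
print(round(finished_fraction, 2)) # fraction of the genome in finished form
```

The four counts sum to the "finished form" total given on the slide, and G + C make up about 41% of those bases.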

  • 8/3/2019 Intro to Molbiol and tics

    44/81

    IHGSC. Nature (2001) 409 860-921

    GC content of the human genome

  • 8/3/2019 Intro to Molbiol and tics

    45/81

    The average chromosome 6

    167,000,000 bp

    6p21: Major Histocompatibility Complex

    2,190 genes

    61% of the genes have CpG islands 2 kb upstream and 1 kb downstream of 5′ and 3′ ends of genes.

    Mean gene length 32,530 bp

    Mean exon length 318 bp

    Mean transcripts per gene 1.79

    Mean exons per gene 5.28

    Mungall, A.J. et al. 2003. The DNA sequence and analysis of human chromosome 6. Nature 425: 805-811.

  • 8/3/2019 Intro to Molbiol and tics

    46/81

    Human genome: individuals 0.1% different

    Lots of variation!

    3.2 x 10^9 bp/genome x 0.001 changes/bp = 3.2 x 10^6 changes/genome

    Two major types of variation:

    SNPs, RFLPs

    Repeated DNA - short to long repeats

    For every person . . .

  • 8/3/2019 Intro to Molbiol and tics

    47/81

    Distribution of DNA According to Function

  • 8/3/2019 Intro to Molbiol and tics

    48/81

    Genome structure: GC content

    GC vs. genes GC vs. introns/exons

  • 8/3/2019 Intro to Molbiol and tics

    49/81

    Protein-coding genes

  • 8/3/2019 Intro to Molbiol and tics

    50/81

    Pawson, T. et al., Trends in Cell Biology Vol.11 No.12 December 2001

    The SH2 domain is found embedded in a wide variety of metazoan proteins that regulate functionally diverse processes.

    Sequence definition

    Distinct regions of protein sequence that are highly conserved in evolution.

    Recurrence of domains

  • 8/3/2019 Intro to Molbiol and tics

    51/81

    Recurrence of chromosomal segments

    Science v. 291, pp 1304-1351

  • 8/3/2019 Intro to Molbiol and tics

    52/81

    Recurrence of segments

    Science v. 291, pp 1304-1351

  • 8/3/2019 Intro to Molbiol and tics

    53/81

    There are dead genes in the genome

    www.people.virginia.edu/~rjh9u/hbmut.html

  • 8/3/2019 Intro to Molbiol and tics

    54/81

    Finding genes -- computer searches

    Computer searches locate most genes in prokaryotes, Archaea, and yeast, but only ~1/3 of human genes are identified correctly.

    Criteria:
    Protein start, stop signals, splicing signals . . .
    Codon bias
    Comparisons to other genomes (mouse, rat, fish, fly, mosquito, worm, yeast . . .)

    Some hard problems: small genes, post-translational modifications, unique genes, spliced genes, alternative splicing, gene rearrangements (e.g. IgGs) . . .

  • 8/3/2019 Intro to Molbiol and tics

    55/81

    Completed Genomes (as of 2004)

    Virus: 1,377 viral genomes and 36 viroids
    Organelle: mitochondria (~600); chloroplasts (~40)
    Microbial: archaea (~20); bacteria (~700)
    Eukaryota:
      Yeast: Saccharomyces cerevisiae (baker's); Schizosaccharomyces pombe (fission)
      Metazoa: Homo sapiens (human); Pan troglodytes (chimp); Mus musculus (mouse); Rattus norvegicus (rat); Gallus gallus (chicken); Drosophila melanogaster (fly); Anopheles gambiae (mosquito); Caenorhabditis elegans (worm); Fugu rubripes (puffer fish); Danio rerio (zebrafish)
      Plants: Arabidopsis thaliana (thale cress); Oryza sativa (rice); Avena sativa (oat); Glycine max (soybean); Hordeum vulgare (barley); Lycopersicon esculentum (tomato); Triticum aestivum (bread wheat); Zea mays (corn)
      Others: Encephalitozoon cuniculi; Guillardia theta nucleomorph; Plasmodium falciparum; Leishmania major

  • 8/3/2019 Intro to Molbiol and tics

    56/81

    Similar genes are found across organisms

    Protein kinase, cAMP-dependent, catalytic, alpha

    Mus TCTTAGACAAGCAGAAGGTGGTGAAGCTAAAGCAGATCGAGCACACTCTGAATGAGAAGC

    Rat TCTTGGACAAGCAGAAGGTGGTGAAGCTGAAGCAGATCGAGCACACTCTGAATGAGAAGC

    Chinese TCTTGGACAAACAGAAGGTGGTGAAGCTGAAGCAGATTGAGCACACTCTAAATGAGAAGC

    Oryctolagus TCCTCGACAAACAGAAGGTGGTGAAGCTGAAACAGATCGAGCACACCCTGAACGTTAAAC

    Canis TCCTCGACAAACAGAAGGTCGTGAAGCTGAAACAGATTGAGCATACCCTGAACGAAAAGC

    Ovis TCCTCGACAAACAGAAGGTGGTGAAGCTGAAACAGATTGAGCACACCCTGAACGAGAAGC

    B.taurus TCCTCGACAAACAGAAGGTGGTGAAGCTGAAACAGATTGAGCACACCCTGAATGAGAAGC

    Homo TCCTCGACAAACAGAAGGTGGTGAAACTGAAACAGATCGAACACACCCTGAATGAAAAGC

    ** * ***** ******** ***** ** ** ***** ** ** ** ** ** * ** *
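The asterisks mark alignment columns that are identical in all eight sequences. A simpler pairwise measure is percent identity, sketched below for the mouse (Mus) and human (Homo) rows copied from the alignment above. The function name is an assumption, and the sketch presumes the sequences are already aligned and gap-free, as they are here.

```python
def percent_identity(a, b):
    """Percentage of positions at which two aligned, equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Rows copied from the alignment on this slide.
mus = "TCTTAGACAAGCAGAAGGTGGTGAAGCTAAAGCAGATCGAGCACACTCTGAATGAGAAGC"
homo = "TCCTCGACAAACAGAAGGTGGTGAAACTGAAACAGATCGAACACACCCTGAATGAAAAGC"

print(len(mus), len(homo))
print(percent_identity(mus, homo))
```

Over this 60-base window the two sequences differ at 9 positions, which is the kind of similarity that lets a gene known in one organism be found in another.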

  • 8/3/2019 Intro to Molbiol and tics

    57/81

    The Minimal Genome

    [Venn diagram comparing the gene sets of E. coli, H. influenzae and M. genitalium; region labels as extracted: 1,146; 1,129; 88918; 10; 239; 1]

  • 8/3/2019 Intro to Molbiol and tics

    58/81

    Organism                   # Chromosomes   # Genes   Exons                             Introns
    Mycoplasma genitalium      1               500       500 (1/gene)                      0
    Deinococcus radiodurans    2               3200      3500 (1.02/gene)                  61
    Saccharomyces cerevisiae   16              6200      6500 (1.04/gene)                  220
    C. elegans                 6               18,000    91,000 (5/gene)                   73,000 (4/gene)
    Drosophila melanogaster    5               14,000    54,000 (4/gene)                   44,000 (3/gene; 60 bp/intron)
    Arabidopsis thaliana       5               25,000    133,000 (5/gene; 247/exon)        107,000 (4/gene; 169 bp/intron)
    Homo sapiens               23              30,000    310,000 (8+/gene; 455 bp/exon)    250,000 (7/gene; 3400 bp/intron)

  • 8/3/2019 Intro to Molbiol and tics

    59/81

    Needles in Haystacks...

    Only 2% of the human genome is coding regions

    Intron-exon structure of genes

    Large introns (average 3365 bp)

    Small exons (average 145 bp)

    Long genes (average 27 kb)

  • 8/3/2019 Intro to Molbiol and tics

    60/81

    ESTs (Expressed Sequence Tags)

    Single-pass sequencing of a small (end) piece of cDNA

    Typically 200-500 nucleotides long

    It may contain coding and/or non-coding region

  • 8/3/2019 Intro to Molbiol and tics

    61/81

    ESTs

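    Cells from a specific organ, tissue or developmental stage

    mRNA extraction: mRNA carries a poly-A tail (AAAAAA) at its 3′ end

    Add oligo-dT primer (TTTTTT), which anneals to the poly-A tail

    Reverse transcriptase: synthesizes the first DNA strand, giving an RNA-DNA hybrid

    Ribonuclease H: degrades the RNA strand of the hybrid

    DNA polymerase and Ribonuclease H: synthesize the second DNA strand

    Double stranded cDNA

    Legend: RNA, DNA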

  • 8/3/2019 Intro to Molbiol and tics

    62/81

    ESTs

    Clone the double-stranded cDNA into a vector

    Multiple cDNA clones

    5′ EST and 3′ EST

    Single-pass sequence reads

  • 8/3/2019 Intro to Molbiol and tics

    63/81

    Sampling the Transcriptome with ESTs

    [Diagram: genomic DNA is transcribed into a primary transcript; splicing yields splice variants; an oligo-dT primer and reverse transcriptase give cDNA clones (double stranded); the clones are sampled as 5′ and 3′ EST sequences (single-pass sequence reads)]

  • 8/3/2019 Intro to Molbiol and tics

    64/81

    Large scale EST-sequencing coupled to Genome sequencing

  • 8/3/2019 Intro to Molbiol and tics

    65/81

    EST sequencing

    Is fast and cheap

    Gives direct information about the gene sequence

    Partial information

    Resulting ESTs (DB searches): known gene; similar to known gene; contaminant; novel gene

  • 8/3/2019 Intro to Molbiol and tics

    66/81

    dbEST release 20 February 2004

    Number of public entries: 20,039,613

    Summary by organism

    Homo sapiens (human)                    5,472,005
    Mus musculus + domesticus (mouse)       4,056,481
    Rattus sp. (rat)                          583,841
    Triticum aestivum (wheat)                 549,926
    Ciona intestinalis                        492,511
    Gallus gallus (chicken)                   460,385
    Danio rerio (zebrafish)                   450,652
    Zea mays (maize)                          391,417
    Xenopus laevis (African clawed frog)      359,901

  • 8/3/2019 Intro to Molbiol and tics

    67/81

    EST lengths

    Human EST length distribution (dbEST Sep. 2003)

    ~ 450 bp

  • 8/3/2019 Intro to Molbiol and tics

    68/81

    ESTs provide expression data

    eVOC Ontologies http://www.sanbi.ac.za/evoc/

    Anatomical System: The tissue, organ or anatomical system from which the sample was prepared. Examples are digestive, lung and retina.

    Cell Type: The precise cell type from which a sample was prepared. Examples are: B-lymphocyte, fibroblast and oocyte.

    Pathology: The pathological state of the sample from which the sample was prepared. Examples are: normal, lymphoma, and congenital.

    Developmental Stage: The stage during the organism's development at which the sample was prepared. Examples are: embryo, fetus, and adult.

    Pooling: Indicates whether the tissue used to prepare the library was derived from single or multiple samples. Examples are pooled, pooled donor and pooled tissue.

    J Kelso et al. Genome Research 2002

  • 8/3/2019 Intro to Molbiol and tics

    69/81

    Exon Size

    [Bar chart: y-axis ticks 0-35; x-axis exon size bins 1-100, 100-200, 200-300, 300-500, >500; series: Fungi, Vertebrate]

  • 8/3/2019 Intro to Molbiol and tics

    70/81

    Intron Size

    [Bar chart: only the y-axis ticks 0-70 survive in the transcript]

  • 8/3/2019 Intro to Molbiol and tics

    71/81

    Intron Prevalence

    [Bar chart: y-axis ticks 0-100; x-axis categories 0, 1, >1; series: Yeast, Fungi, Mammal]

  • 8/3/2019 Intro to Molbiol and tics

    72/81

    Gene Finding Challenges

    Need the correct reading frame

    Introns can interrupt an exon in mid-codon

    There is no hard and fast rule for identifying donor and acceptor splice sites

    Signals are very weak

  • 8/3/2019 Intro to Molbiol and tics

    73/81

    Codon Bias

    Gene finders are often organism-specific

    Coding regions are often modelled by a 5th-order Markov chain (hexamers/di-codons)

  • 8/3/2019 Intro to Molbiol and tics

    74/81

    Overpredicting Genes

    Easy to predict all exons

    Report all sequences flanked by ..AG and GT.. as exons

    Sensitivity = 100%

    Specificity ~ 0%
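The trade-off on this slide can be made concrete with a toy example. The sketch below reports every interval preceded by AG and followed by GT as an exon and then scores the result. It assumes the convention common in gene-finding evaluation, where sensitivity is TP/(TP+FN) and "specificity" is TP/(TP+FP); the function names, the toy sequence and the "true" exon coordinates are all invented for illustration.

```python
def naive_exon_candidates(dna):
    """Every (start, end) interval preceded by AG and followed by GT.

    Coordinates are 0-based and half-open.
    """
    starts = [i + 2 for i in range(len(dna) - 1) if dna[i:i + 2] == "AG"]
    ends = [j for j in range(len(dna) - 1) if dna[j:j + 2] == "GT"]
    return {(s, e) for s in starts for e in ends if s < e}


def sensitivity_specificity(predicted, actual):
    """Sensitivity = TP/(TP+FN); specificity here = TP/(TP+FP)."""
    tp = len(predicted & actual)
    fn = len(actual - predicted)
    fp = len(predicted - actual)
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp


dna = "TTAGGCCGTAAAGCTGGTCC"      # hypothetical sequence
true_exons = {(4, 7), (13, 16)}   # hypothetical annotation
predicted = naive_exon_candidates(dna)
sn, sp = sensitivity_specificity(predicted, true_exons)
print(sorted(predicted), sn, sp)
```

Even on 20 bases the naive rule already adds a false exon. On a genome-sized sequence the number of AG/GT pairs grows roughly quadratically, so every true exon is still reported while the specificity approaches zero.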

  • 8/3/2019 Intro to Molbiol and tics

    75/81

    Locating ORFs

    The simplest method of predicting coding regions is to search for open reading frames (ORFs)

    An open reading frame begins with a start (AUG) codon and ends with one of three stop codons

    Six total reading frames
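A minimal ORF scan over all six reading frames might look like the sketch below: three frames on the given strand and three on its reverse complement. It is an illustration only, with invented function names and a toy sequence, and it reports only ORFs that have both an AUG and an in-frame stop.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")
STOPS = {"UAA", "UAG", "UGA"}


def reverse_complement(rna):
    """Reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def find_orfs(rna, min_codons=1):
    """Return (strand, frame, orf) for each AUG...stop ORF in six frames."""
    orfs = []
    for strand, seq in (("+", rna), ("-", reverse_complement(rna))):
        for frame in range(3):
            i = frame
            while i + 3 <= len(seq):
                if seq[i:i + 3] == "AUG":
                    for j in range(i, len(seq) - 2, 3):
                        if seq[j:j + 3] in STOPS:
                            if (j - i) // 3 >= min_codons:
                                orfs.append((strand, frame, seq[i:j + 3]))
                            i = j  # resume scanning after this stop codon
                            break
                    else:
                        break  # no in-frame stop: nothing more in this frame
                i += 3
    return orfs


print(find_orfs("CCAUGAAAUAGCC"))
```

Raising `min_codons` filters out short ORFs, which occur often by chance.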

  • 8/3/2019 Intro to Molbiol and tics

    76/81

    Locating ORFs

    Example from HW#1:

    AUUGCAAUGGAAUUAGUAAUCUCUAUUUCCGCCCUUAUUAUAGUUGAAUAGAUAGCCGUA

    M E L V I S I S A L I I V E
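The homework example can be reproduced with a few lines of Python. The sketch below is illustrative: it builds the standard codon table, takes the first AUG as the start and translates until the first in-frame stop. The function and variable names are assumptions made here.

```python
from itertools import product

# Standard genetic code, codons ordered U, C, A, G at each position; '*' = stop.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_first_orf(rna):
    """Translate from the first AUG up to (not including) the first in-frame stop."""
    start = rna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


hw1 = "AUUGCAAUGGAAUUAGUAAUCUCUAUUUCCGCCCUUAUUAUAGUUGAAUAGAUAGCCGUA"
print(translate_first_orf(hw1))
```

The translation includes the initial methionine (M) encoded by the AUG start codon and ends at the UAG stop.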

  • 8/3/2019 Intro to Molbiol and tics

    77/81

    Locating ORFs

    Prokaryotes: DNA sequences coding for proteins are generally transcribed into mRNA, which is translated into protein with very little modification

    Locating an open reading frame from a start codon to a stop codon can give a strong indication of protein-coding regions

    Longer ORFs are more likely to predict protein-coding regions than shorter ORFs.

  • 8/3/2019 Intro to Molbiol and tics

    78/81

    Locating ORFs

    Eukaryotes: mRNA undergoes processing to remove introns before the protein is translated

    An ORF corresponding to a gene may be interrupted by stop codons found within intronic regions

    Posttranscriptional modification makes gene prediction more difficult

  • 8/3/2019 Intro to Molbiol and tics

    79/81

    Locating Similar Sequences

    Take the new DNA sequence and translate it in all six reading frames

    Compare each translation to protein sequence databases

    Locates known open reading frames

  • 8/3/2019 Intro to Molbiol and tics

    80/81

    Locating Similar Sequences

  • 8/3/2019 Intro to Molbiol and tics

    81/81

    Top ten challenges for bioinformatics

    [1] Precise models of where and when transcription will occur in a genome (initiation and termination)

    [2] Precise, predictive models of alternative RNA splicing

    [3] Precise models of signal transduction pathways; ability to predict cellular responses to external stimuli

    [4] Determining protein:DNA, protein:RNA, protein:protein recognition codes

    [5] Accurate ab initio protein structure prediction