
Page 1: WSSP-12 Chapter 4 Sequencing DNA

6/26/12

WSSP-12 Chapter 4: Sequencing DNA

atttaccgtg ttggattgaa attatcttgc atgagccagc tgatgagtat gatacagttt tccgtattaa taacgaacgg ccggaaatag gatcccgatc atgattgctt caatattttc acttcaatga ttggttctaa gcattcgaat gcgtacccgt ttgattaata tttccatttc tgtcccagtt tttaattttc atttcttttg gttaaaaaat tcccagtctc ttgaatgctt ttctaaaatc tttaattcaa ttatttatta gaatcttctg ttttgagaac tttgtaatgt aattaaataa tttgatgaaa tgattatgaa tgcgaataaa ttattaattt accgtgctga ttggattgaa attatcttgc atgagccagc tgatgagtat gatacagttt tccgtattaa taacgaacgg ccggaaatag gatcccgatc atgattgctt caatattttc acttcaatga ttggttctaa gcattcgaat gcgtacccgt ttgattaata tttccatttc tgtcccagtt tttaattttc atttcttttg gttaaaaaat tcccagtctc ttgaatgctt ttctaaaatc tttaattcaa ttatttatta gaatcttctg ttttgagaac tttgtaatgt aattaaataa tttgatgaaa tgattatgaa tgcgaataaa ttattaattt accgtgttgg attgaaggta attatcttgc atgagccagc tgatgagtat gatacagttt !!

Cloning Wolffia a. cDNA fragments into pTriplEX2

Determine the size of the insert by PCR and digests

Sequencing DNA:

•  Rapid DNA sequencing methods were first developed in the mid-1970s.

•  DNA sequencing has developed rapidly; many genomes are completely sequenced.

•  1995 bacterium H. influenzae 1.8 x 10^6 bp ~1,700 genes

•  1996 yeast Saccharomyces cerevisiae 12 x 10^6 bp ~6,000 genes

•  1998 nematode Caenorhabditis elegans 97 x 10^6 bp ~20,000 genes

•  2003 Human genome! 3 x 10^9 bp ~25,000 genes

Handling the Explosion of Sequence Data


GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.

Dec. 2008

Currently:

•  >1.0 x 10^12 bases in 1.08 x 10^6 sequence records in the traditional GenBank divisions.

•  1.5 x 10^12 bases in 4.8 x 10^7 sequence records in the Whole Genome Shotgun (WGS) division.

Why are we sequencing these Genomes?

•  The information generated from these projects will serve as a blueprint for investigating the structure, function, and expression patterns of genes that are involved in various cellular processes (this was controversial at the time).

•  A goal of this project is to make you familiar with genes and searching nucleotide and protein databases.

Once a gene is sequenced, a lot of information can be determined about the gene:

•  The number and location of all restriction sites, without restriction mapping.

•  After conceptual translation of the DNA sequence into protein sequence, possible similarities to other proteins.

AATTCGAGTTTGTG
Frame 1: ASN-SER-SER-LEU
Frame 2: ILE-ARG-VAL-CYS
Frame 3: PHE-GLU-PHE-VAL

•  Structure predictions of the encoded protein based on the protein sequence.
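Both ideas, finding restriction sites without mapping and conceptual translation in three frames, can be tried in a few lines of Python. This is only a minimal sketch: the function names `find_sites` and `translate_frames` are made up for illustration, the three enzymes are an arbitrary selection, and the standard genetic code is assumed.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Recognition sites of three common enzymes (illustrative selection only).
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "KpnI": "GGTACC"}


def find_sites(seq, enzymes=ENZYMES):
    """Return {enzyme: [1-based start positions]} for every site in seq."""
    seq = seq.upper()
    hits = {}
    for name, site in enzymes.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)
            start = seq.find(site, start + 1)
        hits[name] = positions
    return hits


def translate_frames(seq):
    """Translate the three forward reading frames (one-letter amino acids)."""
    seq = seq.upper()
    frames = {}
    for offset in range(3):
        codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        frames[offset + 1] = "".join(CODON_TABLE[c] for c in codons)
    return frames


print(find_sites("ccggaaataggatcccgatc"))   # a stretch of the title-slide sequence
print(translate_frames("AATTCGAGTTTGTG"))   # the 14-base example above
```

For the 14-base example, the three frames come out as NSSL, IRVC and FEFV in one-letter code. All three sites listed are palindromes, so scanning one strand is enough for these enzymes.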


1980 Nobel Prize: Fred Sanger and Walter Gilbert each developed methods for DNA sequencing in the 1970s

Gilbert (Chemical Method)

Sanger (Enzymatic Method)

Almost everyone uses Sanger's method (or variants thereof) today. New methods are being developed.

How does it work? The fundamental idea behind both methods is the same.

•  One needs a known or fixed starting point on one end of the DNA to be sequenced.

•  DNA fragments are then generated that are random in length but end with a defined type of base: either A, G, C or T.

•  The random populations of DNA fragments are then separated using high-resolution gels or chromatography.

•  This gel system can separate fragments that differ by as little as one base in length.

No one really knows why DNA synthesis can't start fresh (without adding onto something that's already there, such as a primer). Sequencing takes advantage of this requirement.

The new strand is synthesized in the opposite direction to the template strand!

[Figure: new strand (5' to 3') paired with the template strand, which runs in the opposite direction]

Deoxyribonucleotides (dA, dC, dG, dT) have an OH at the 3’ position that allows the next base to add onto the chain. Dideoxyribonucleotides (ddA, ddC, ddG, ddT) have an H at the 3’ position, which prevents another base from adding onto the chain.

[Figure: structures of dTTP (OH at the 3’ position) and ddTTP (H at the 3’ position), with the sugar carbons numbered 1’ to 5’]

p. 4-3


What happens if equal amounts of ddC and dC are added?

3' -TACCGCAATGCAACT - 5'

5' -ATGGC
5' -ATGGCGTTAC
5' -ATGGCGTTACGTTGA…
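The ddC question can be worked through with a small model. The sketch below assumes that "equal amounts" means a 50% chance of adding ddC at every C position, which is a simplification (a real polymerase does not use the two nucleotides equally well). The function names are illustrative only.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def new_strand(template_3to5):
    """Complementary strand, written 5'->3', for a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def product_fractions(template_3to5, dd_base, p_stop=0.5):
    """Expected fraction of each product when dd_base terminates with p_stop."""
    product = new_strand(template_3to5)
    fractions = {}
    remaining = 1.0  # fraction of chains still growing
    for i, base in enumerate(product):
        if base == dd_base:
            fractions[product[:i + 1]] = remaining * p_stop
            remaining *= 1 - p_stop
    # Chains that never picked up a dideoxy base read through to the end.
    fractions[product] = fractions.get(product, 0.0) + remaining
    return fractions


template = "TACCGCAATGCAACT"  # 3'->5', from the slide
for fragment, fraction in product_fractions(template, "C").items():
    print(f"5'-{fragment}  {fraction:.2f}")
```

Under this assumption half of the chains stop at the first C (ATGGC), a quarter stop at the second C (ATGGCGTTAC), and the remaining quarter read through. Every terminated fragment ends in C, which is what makes the fragment lengths informative.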


Reading sequence the old way…

Fluorescent dye terminators and automated DNA sequencing

The figure on the right shows the action spectra of the four dyes that are normally linked to ddNTPs for automated DNA sequencing. Each dye fluoresces a different color when illuminated by a laser beam.

BASE   DYE      WAVELENGTH (nm)
A      dR6G     570
G      dROX     620
C      dR110    540
T      dTAMRA   600

[Figure: dye spectra, labeled ddATP, ddTTP, ddCTP, ddGTP]

p. 4-3

• Since four different dyes are used, all the reactions can be done in a single tube, thus increasing throughput

• Some of the new sequencing machines use a small column (capillary), which can be reused.

• Sensitive lasers are used to determine the 3’ nucleotide of each successive fragment that migrates off the column
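Because each terminator carries its own dye, the color of a fragment identifies its last base. The sketch below uses the wavelengths and base assignments from the table on this slide and simply picks the nearest one. That nearest-wavelength rule is an assumption made for illustration, not how a sequencing instrument actually resolves overlapping dye signals, and `call_base` is a hypothetical name.

```python
# Wavelengths (nm) and base assignments as listed in the slide's table.
DYE_PEAKS = {"A": 570, "G": 620, "C": 540, "T": 600}


def call_base(peak_nm, dye_peaks=DYE_PEAKS):
    """Return the base whose dye wavelength is closest to the observed peak."""
    return min(dye_peaks, key=lambda base: abs(dye_peaks[base] - peak_nm))


def call_sequence(peaks_nm):
    """Call one base per fragment, in the order the fragments leave the column."""
    return "".join(call_base(peak) for peak in peaks_nm)


# Four made-up detector readings, shortest fragment first.
print(call_sequence([541, 598, 571, 619]))
```

Since all four colors can be told apart, the four termination reactions no longer need separate lanes, which is why a single tube is enough.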

How does it work?

•  Need a known starting point on the DNA.

•  Need a method to detect where each base is positioned on the DNA strand.

•  Separate products at single-base resolution.


CTCGGAAGCGCGCCATTGTGTTGG
CTCGGAAGCGCGCCATT
CTCGGAAGCGCGCCAT
CTCGGAAGCGCGCCA
CTCGGAAGCGCGCCATTGTGTT
CTCGGAAGCGCGCCATTGT
CTCGGAAGCGCGCCATTG
CTCGGAAGCGCGCCATTGTGTTG
CTCGGAAGCGCGCCATTGTGT
CTCGGAAGCGCGCCATTGTG
CTCGGAAGCGCGCCATTGTGTTGGT
CTCGGAAGCGCGCCATTGTGTTGGTACCC
CTCGGAAGCGCGCCATTGTGTTGGTACC
CTCGGAAGCGCGCCATTGTGTTGGTAC
CTCGGAAGCGCGCCATTGTGTTGGTA

Primer    5’ CTCGGAA 3’
Template  3’ GAGCCTTCGCGCGGTAACACAACCATGGGCCCTT 5’

Separate fragments by size

See the Cycle Sequence Tutorial
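Reading the sequence from such a mixture amounts to sorting the fragments by length and noting the last base of each. A minimal sketch, using the fifteen fragments shown on this slide (the function name `read_ladder` is made up for illustration):

```python
# The fragments from the slide, in the unsorted order shown.
FRAGMENTS = [
    "CTCGGAAGCGCGCCATTGTGTTGG",
    "CTCGGAAGCGCGCCATT",
    "CTCGGAAGCGCGCCAT",
    "CTCGGAAGCGCGCCA",
    "CTCGGAAGCGCGCCATTGTGTT",
    "CTCGGAAGCGCGCCATTGT",
    "CTCGGAAGCGCGCCATTG",
    "CTCGGAAGCGCGCCATTGTGTTG",
    "CTCGGAAGCGCGCCATTGTGT",
    "CTCGGAAGCGCGCCATTGTG",
    "CTCGGAAGCGCGCCATTGTGTTGGT",
    "CTCGGAAGCGCGCCATTGTGTTGGTACCC",
    "CTCGGAAGCGCGCCATTGTGTTGGTACC",
    "CTCGGAAGCGCGCCATTGTGTTGGTAC",
    "CTCGGAAGCGCGCCATTGTGTTGGTA",
]


def read_ladder(fragments):
    """Sort fragments by size and read the terminal base of each one."""
    if not fragments:
        return ""
    ordered = sorted(fragments, key=len)
    lengths = [len(fragment) for fragment in ordered]
    # Single-base resolution: every length must appear exactly once.
    if lengths != list(range(lengths[0], lengths[0] + len(ordered))):
        raise ValueError("ladder has a gap or a duplicate length")
    return "".join(fragment[-1] for fragment in ordered)


print(read_ladder(FRAGMENTS))
```

The result covers only the positions represented in the ladder (bases 15 to 29 of the product here), and the read comes out 5' to 3' because the shortest fragment ends closest to the primer.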


LOCUS       AB231879                1383 bp    mRNA    linear   INV 07-JUN-2006
DEFINITION  Artemia franciscana mRNA for zinc finger protein Af-Zic, complete
            cds.
ACCESSION   AB231879
VERSION     AB231879.1  GI:94966317
KEYWORDS    .
SOURCE      Artemia franciscana
  ORGANISM  Artemia franciscana
            Eukaryota; Metazoa; Arthropoda; Crustacea; Branchiopoda; Anostraca;
            Artemiidae; Artemia.
REFERENCE   1
  AUTHORS   Aruga,J., Kamiya,A., Takahashi,H., Fujimi,T.J., Shimizu,Y.,
            Ohkawa,K., Yazawa,S., Umesono,Y., Noguchi,H., Shimizu,T.,
            Saitou,N., Mikoshiba,K., Sakaki,Y., Agata,K. and Toyoda,A.
  TITLE     A wide-range phylogenetic analysis of Zic proteins: Implications
            for correlations between protein structure conservation and body
            plan complexity
  JOURNAL   Genomics 87 (6), 783-792 (2006)
   PUBMED   16574373
REFERENCE   2  (bases 1 to 1383)
  AUTHORS   Aruga,J. and Toyoda,A.
  TITLE     Direct Submission
  JOURNAL   Submitted (10-AUG-2005) Jun Aruga, RIKEN Brain Science Institute,
            Laboratory for Comparative Neurogenesis; 2-1 Hirosawa, Wako-shi,
            Saitama 351-0198, Japan (E-mail:[email protected],
            URL:http://www.brain.riken.go.jp/labs/lcn/, Tel:81-48-467-9791,
            Fax:81-48-467-9792)
FEATURES             Location/Qualifiers
     source          1..1383
                     /organism="Artemia franciscana"
                     /mol_type="mRNA"
                     /db_xref="taxon:6661"
     gene            1..1383
                     /gene="Af-Zic"
     CDS             1..1383
                     /gene="Af-Zic"
                     /codon_start=1
                     /product="zinc finger protein Af-Zic"
                     /protein_id="BAE94140.1"
                     /db_xref="GI:94966318"
                     /translation="MTASLSASVMNPSFIKRESPASATALFVPNQFSAVPNFGFHHVP
                     SACATEQSSEMLNPFVDNHLRLNDQSNFQGYHHPHHGQIQQHHLGSYAARDFLFRRDM
                     GLGMGLEAHHTHAAQHHHMFDPSHAAAAAHHAMFTGFDHNTMRLPTEMYTRDASGYAA
                     QQFHQMGSMAPMAHPASAGAFLRYMRTPIKQELHCLWVDPEQPSPKKTCGKTFGSMHE
                     GKVFARSENLKIHKRTHTGEKPFKCEFEGCDRRFANSSDRKKHSHVHTSDKPYNCKVR
                     GCDKSYTHPSSLRKHMKVHGKSPPPASSGCDSDENESIADTNSDSAASPSPSSHDSSQ
                     VQVNHNRPPNHHNLGLGFTNPGHIGDWYVHQSAPDMPVPPATEHSPIGPPMHHPPNSL
                     NYFKTELVQN"
ORIGIN
        1 atgactgcta gtttaagtgc aagcgtgatg aatccaagtt ttataaagag ggaaagtcct
       61 gcatcggcta cagccctgtt cgtaccaaac caatttagtg cagtgcctaa ttttggattt
      121 caccatgttc ctagtgcttg tgcaactgag caaagtagtg aaatgctgaa cccttttgtg
(Note: the rest of the DNA sequence was deleted to save space)

GenBank DNA sequence report
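The CDS annotation in this report can be checked against the ORIGIN sequence. The sketch below takes the three ORIGIN lines shown (180 bases), translates them from /codon_start=1 with the standard genetic code, and compares the result with the first 60 residues of the /translation qualifier. The helper names are illustrative, and only the part of the record printed here is used.

```python
import re

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

ORIGIN_LINES = """\
        1 atgactgcta gtttaagtgc aagcgtgatg aatccaagtt ttataaagag ggaaagtcct
       61 gcatcggcta cagccctgtt cgtaccaaac caatttagtg cagtgcctaa ttttggattt
      121 caccatgttc ctagtgcttg tgcaactgag caaagtagtg aaatgctgaa cccttttgtg
"""

# First 60 residues of the /translation qualifier in the record.
TRANSLATION_START = (
    "MTASLSASVMNPSFIKRESPASATALFVPNQFSAVPNFGFHHVP" "SACATEQSSEMLNPFV"
)


def origin_to_sequence(origin_text):
    """Drop the position numbers and the spacing from ORIGIN lines."""
    return "".join(re.findall(r"[acgt]+", origin_text.lower()))


def translate(seq, codon_start=1):
    """Translate from a 1-based codon_start, as in the CDS qualifier."""
    seq = seq.upper()
    first = codon_start - 1
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(first, len(seq) - 2, 3))


sequence = origin_to_sequence(ORIGIN_LINES)
protein = translate(sequence, codon_start=1)
print(len(sequence), "bases ->", len(protein), "residues")
print("matches /translation:", protein == TRANSLATION_START)
```

The same two helpers would work on the full 1383-base record if the deleted ORIGIN lines were restored.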

General Databases:

NCBI       DNA and protein sequences (USA database)
EMBL       DNA sequences (European Molecular Biology Laboratory)
GenEMBL    GenBank and EMBL sequences combined
DDBJ       DNA sequences (Japan’s equivalent of GenBank)
PIR        Protein Identification Resource (protein sequences)
SwissProt  Protein sequences (Switzerland and EMBL)
Genpept    Translations of DNA based on authors’ information
PDB        Coordinates for protein 3D structure (now maintained at Rutgers)

Organism Specific Databases:

Sanger     Worm sequence and genomic database
SGD        Saccharomyces Genome Database
YPD        Yeast Protein Database
WPD        Worm Protein Database
WormBase   C. elegans
FlyBase    Drosophila sequence and genetic database
Human      Many


DNA search programs

BLAST: basic local alignment search tool

BLASTn: you provide a nucleotide sequence; the program compares and reports nucleotide similarity alignment.

BLASTp: you provide a protein sequence; the program compares and reports protein similarity alignment.

BLASTx: you provide a nucleotide sequence; the program translates it in all six reading frames, then compares and reports protein similarity alignment.

All three of these programs will be used in this project.
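The "six reading frames" that BLASTx searches are the three frames of the query plus the three frames of its reverse complement. The sketch below shows only that translation step, not the database comparison, and it is not BLAST's own code. The function names are illustrative, and the standard genetic code is assumed.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq, offset):
    """Translate seq starting at a 0-based offset; ignore a trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3))


def six_frames(seq):
    """Frames +1..+3 on the given strand, -1..-3 on the reverse complement."""
    forward = seq.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward, offset)
        frames[f"-{offset + 1}"] = translate(reverse, offset)
    return frames


for name, protein in six_frames("AATTCGAGTTTGTG").items():
    print(name, protein)
```

Translating both strands matters for cDNA clones, because the insert may sit in the vector in either orientation and the coding frame is not known in advance.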

Next Generation DNA Sequencing

•  Traditional Sanger Sequencing
   –  700-1000 bp
   –  96 samples/run

•  Roche 454
   –  200-400 bp
   –  1 million/run

•  NextGen: SOLiD/Illumina short read sequencing
   –  25-50 bp
   –  >300 million/run
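Multiplying read length by reads per run shows why short reads still win on throughput. A quick calculation with the figures from this slide (the ">300 million" is treated as exactly 300 million, so the last row is a lower bound, and the dictionary layout is just for illustration):

```python
# Read length range (bp) and reads per run, as listed on the slide.
PLATFORMS = {
    "Sanger": {"read_bp": (700, 1000), "reads_per_run": 96},
    "Roche 454": {"read_bp": (200, 400), "reads_per_run": 1_000_000},
    "SOLiD/Illumina": {"read_bp": (25, 50), "reads_per_run": 300_000_000},
}


def bases_per_run(platform):
    """Return (low, high) total bases per run for one platform."""
    low, high = PLATFORMS[platform]["read_bp"]
    reads = PLATFORMS[platform]["reads_per_run"]
    return low * reads, high * reads


for name in PLATFORMS:
    low, high = bases_per_run(name)
    print(f"{name:15s} {low:>14,} to {high:>14,} bases/run")
```

On these numbers a Sanger run yields under 100,000 bases, a 454 run a few hundred million, and a short-read run several billion.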

Genomic scaffold

[Figure: SOLiD System Overview. © 2008 Applied Biosystems]