DNA SEQUENCING & ASSEMBLY

Evolution of DNA sequencing technologies - 1980 to present day (Stratton, Nature 458: 719, 2009)

Upload: sheena

Post on 22-Feb-2016

TRANSCRIPT

  • DNA SEQUENCING & ASSEMBLY
    Evolution of DNA sequencing technologies - 1980 to present day (Stratton, Nature 458: 719, 2009)

  • $$$ Motivation to "spur DNA sequencing technologies, boost accuracy and drive down costs"
    The X Prize Foundation of Playa Vista, California, is offering a $10-million prize to the first team to accurately sequence the genomes of 100 people aged 100 or older, for $1,000 or less apiece and within 30 days [beginning September 5, 2013], with an accuracy of ... (see Nature 487:417, July 26, 2012)

  • Sanger chain termination method (Fred Sanger, 1977) (Fig. 4.2)
    - enzymatic synthesis of DNA strand complementary to template of interest
    - ... but it stops when a dideoxynucleotide is incorporated
    - 4 parallel sets of reactions: ddATP + 4 dNTPs, ddCTP + 4 dNTPs, etc.
    - Nobel Prizes: Sanger 1958 (protein structure), 1980 (DNA sequencing)

  • Fig. 4.2
    - set of products each terminating with ddA
    - their sizes reflect positions of T in template DNA
    - ratio of ddATP:dATP is important to get an appropriate size range of products
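The relationship between the ddA products and the positions of T in the template can be sketched in a few lines of Python. This is a minimal illustration, not a model of the real chemistry: the function names and the example template are made up, primer length is ignored, and every possible termination product is assumed to be present (which is what a suitable ddNTP:dNTP ratio is meant to achieve).

```python
# Minimal sketch of the four Sanger reactions (illustrative names, toy template).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def termination_lengths(template_3to5: str, dd_base: str) -> list[int]:
    """Lengths of new strands that end with the given dideoxynucleotide.

    The template is written in the direction the polymerase travels along
    it (3'->5'), so template position i pairs with position i of the new
    strand. Primer length is ignored (assumption).
    """
    return [
        i
        for i, base in enumerate(template_3to5, start=1)
        if COMPLEMENT[base] == dd_base
    ]


def read_gel(template_3to5: str) -> str:
    """Read the new strand by ordering the products of all four lanes by size."""
    lanes = {dd: termination_lengths(template_3to5, dd) for dd in "ACGT"}
    base_at_length = {n: dd for dd, lengths in lanes.items() for n in lengths}
    return "".join(base_at_length[n] for n in sorted(base_at_length))


template = "TGCATTAGT"  # hypothetical template, written 3'->5'
dda_products = termination_lengths(template, "A")
t_positions = [i for i, b in enumerate(template, start=1) if b == "T"]
new_strand = read_gel(template)
```

Here `dda_products` and `t_positions` are the same list, which is the point of the slide: each ddA-terminated product marks one T in the template. Reading all four lanes from shortest to longest product gives the sequence of the newly synthesized strand.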

  • Products (each differing in length by 1 nt) are resolved on denaturing polyacrylamide gels (autoradiograph) or by capillary electrophoresis (automated sequencing profile) (Figs. 4.1, 4.3)

  • PRIMERS FOR SEQUENCING (Fig. 4.5)
    1. Universal - forward & reverse
    2. Custom-designed internal - use new sequence info to design primer to sequence next stretch
    If insert is too long to completely sequence using universal primers, can use this strategy to close a sequencing gap

  • ... or can find another clone in library that has overlap, and sequence it using universal primers (Fig. 3.35)

  • What if there is a physical gap? (Fig. 4.17)
    - if a particular region of the genome is not represented in the clone library (maybe the region was unstable in the first vector), can use a different vector to prepare a second clone library
    - then use probes (e.g. oligomers) mapping to ends of contigs from first library to screen second library

  • Example of closing a physical gap (Fig. 4.11)
    - You have 9 contigs & design oligomers mapping close to their ends (#1-18)
    - Which contigs are adjacent? Find out by screening by hybridization or by PCR
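To get a feel for how many PCR reactions the 9-contig example implies, the candidate primer pairs can be enumerated. This is only a sketch: the numbering convention (oligomers 2k-1 and 2k sit at the two ends of contig k) and the "positive" results are assumptions made up for illustration, not taken from Fig. 4.11.

```python
from itertools import combinations

N_CONTIGS = 9


def contig_of(oligo: int) -> int:
    """Contig that an oligomer maps to (assumed numbering: 1,2 -> contig 1, etc.)."""
    return (oligo + 1) // 2


oligos = range(1, 2 * N_CONTIGS + 1)  # oligomers #1-18

# Only pairs from two different contigs can amplify across a gap.
candidate_pairs = [
    (a, b) for a, b in combinations(oligos, 2) if contig_of(a) != contig_of(b)
]


def adjacent_contigs(positive_pairs):
    """Translate primer pairs that gave a PCR product into adjacent contigs."""
    return sorted(
        {tuple(sorted((contig_of(a), contig_of(b)))) for a, b in positive_pairs}
    )


# Hypothetical outcome: products obtained with oligomer pairs 2+13 and 3+14.
hypothetical_positives = [(2, 13), (3, 14)]
neighbours = adjacent_contigs(hypothetical_positives)
```

With 18 oligomers there are 144 pairs that span two different contigs, which is why pooling reactions or screening by hybridization becomes attractive as the number of contigs grows.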

  • What if physical gap is very short (< 10 kb or so)?
    - could use oligomers mapping to ends of contigs in PCR reactions with uncloned DNA template
    - then sequence the PCR product directly
    - this slide also illustrates a method for finding overlapping clones

  • ASSEMBLING INFO FROM CLONES INTO CONTIGS
    1. CHROMOSOME WALKING by hybridization (Fig. 4.12)
    - sequence from one clone is used as probe to screen library of clones to find overlapping one
    - repeat to walk along genome

  • But what if probe contains repeated sequences? (Fig. 3.34)
    - then it hybridizes to multiple clones
    - problem avoided if use short unique-sequence probe (e.g. oligomer) mapping close to end of clone, or if pre-hybridize with repeat sequence

  • 2. CHROMOSOME WALKING by PCR (Fig. 4.13)
    - design primer pairs based on sequence at end of clone
    - use other clones in library for template DNA
    - will get PCR amplicon for any new clones with that sequence
    - reactions can be carried out as pools for more rapid screening (combinatorial screening) (Fig. 4.14)
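Why pooling speeds up the screen can be shown with a small sketch. It assumes the library is arrayed in an 8 x 12 plate and that one pool is made per row and one per column; that layout is an assumption for illustration and may differ from the scheme in Fig. 4.14.

```python
ROWS, COLS = 8, 12  # assumption: clones arrayed in a 96-well plate


def pooled_screen(positive_wells):
    """Simulate row and column pools for a set of truly positive wells.

    Returns the row pools and column pools that give an amplicon, plus the
    wells at their intersections (the candidates to re-test individually).
    """
    rows = sorted({r for r, _ in positive_wells})
    cols = sorted({c for _, c in positive_wells})
    candidates = [(r, c) for r in rows for c in cols]
    return rows, cols, candidates


n_individual_reactions = ROWS * COLS  # one PCR per clone
n_pooled_reactions = ROWS + COLS      # one PCR per row pool and per column pool

# One overlapping clone in well (row 2, column 5): the pools identify it directly.
rows, cols, candidates = pooled_screen({(2, 5)})

# Two overlapping clones: intersections are ambiguous and need a second round.
_, _, ambiguous = pooled_screen({(2, 5), (6, 9)})
```

In this layout 20 pooled reactions replace 96 individual ones. When more than one clone is positive, the row/column intersections include false candidates, so a follow-up PCR on the individual candidate wells is still needed.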

  • 3. CLONE FINGERPRINTING (Fig. 4.15A)
    To identify overlapping clones: by finding features that they share
    - restriction profile fingerprint
    - or clones having STS in common (Fig. 4.15D)
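The idea of comparing restriction fingerprints can be sketched as counting fragments of matching size in two clones. The fragment sizes, the 2% size tolerance, and the function name below are all hypothetical; real fingerprinting software uses statistical scoring rather than a simple count.

```python
def shared_fragments(sizes_a, sizes_b, tolerance=0.02):
    """Count restriction fragments of matching size in two clones.

    Two fragments match if their sizes differ by at most `tolerance`
    (relative), to allow for gel measurement error. Each fragment is
    matched at most once.
    """
    a, b = sorted(sizes_a), sorted(sizes_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance * max(a[i], b[j]):
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


# Hypothetical fingerprints (fragment sizes in bp) for three clones.
clone_a = [500, 1200, 2300, 4100, 6000]
clone_b = [900, 1210, 2300, 4100, 7500]
clone_c = [300, 800, 3300, 5200]

overlap_ab = shared_fragments(clone_a, clone_b)
overlap_ac = shared_fragments(clone_a, clone_c)
```

Clones A and B share three fragments and are good candidates for overlapping clones, whereas A and C share none. The same comparison works for STS content if the lists hold STS identifiers and matching is exact.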

  • Haemophilus genome project 1995 (1.8 Mbp) (Fig. 4.10)
    1. DNA sonicated, fragments (1.6-2 kb) cloned in plasmid vectors
    2. Shotgun sequencing of insert ends: ~ 20,000 clones analyzed, 11 Mbp of sequence
    3. Assembled into 140 contigs: scaffolds with sequencing gaps & physical gaps
    4. Screened for overlapping clones - reduced to 42 contigs
    5. Assumed gaps represented genome regions unstable in plasmid vector - switched to lambda vector
    6. Probed λ library with oligomers from contig ends or used PCR with primer pairs from contig ends
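The numbers on this slide give a quick estimate of sequencing redundancy. The sketch below also applies an idealized Poisson model (reads landing uniformly at random), which is an assumption brought in for illustration and is not part of the slide; it suggests why some gaps are expected even at about 6-fold coverage, before any cloning bias is considered.

```python
import math

GENOME_SIZE_BP = 1.8e6   # Haemophilus genome, from the slide
SEQUENCE_READ_BP = 11e6  # total shotgun sequence, from the slide

# Average number of times each base was sequenced.
coverage = SEQUENCE_READ_BP / GENOME_SIZE_BP

# Idealized assumption: under random (Poisson) sampling, the chance that a
# given base is never sequenced is exp(-coverage).
fraction_unsequenced = math.exp(-coverage)
expected_unsequenced_bp = GENOME_SIZE_BP * fraction_unsequenced

print(f"coverage ~ {coverage:.1f}x")
print(f"expected unsequenced ~ {expected_unsequenced_bp:.0f} bp")
```

Under these assumptions the coverage is roughly 6x and a few thousand bases would be missed by chance alone, scattered over many small gaps. Regions that are unstable in the plasmid vector add physical gaps on top of that, which is what steps 5 and 6 address.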

  • Next generation sequencing technologies
    Cost per Megabase of DNA Sequence (National Human Genome Research Institute) (or "Why biologists panic about computing")
    - major challenge to correctly assemble the massive amount of sequence data generated and to interpret it!

  • 1. Pyrosequencing (Fig. 4.9; Genome Res 11:3, 2001)
    - one dNTP is added at a time + enzyme (apyrase) that degrades dNTP if not incorporated into new strand, then next dNTP added
    - incorporation detected by chemiluminescence triggered by the released pyrophosphate (PPi)
    www.youtube.com/watch?v=kYAGFrbGl6E&feature=related
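The add-one-dNTP-at-a-time cycle can be sketched as a toy simulation. It assumes an idealized reaction in which the light signal is exactly proportional to the number of nucleotides incorporated in a flow and apyrase removes all leftover dNTP before the next flow; the function names, the cyclic A, C, G, T dispensation order, and the template are illustrative choices.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def flowgram(template_3to5: str, dispensation_order: str = "ACGT", max_flows: int = 100):
    """Simulate the light signal for each nucleotide flow.

    In each flow one dNTP is added. The polymerase incorporates it as many
    times as the next template bases allow (a homopolymer run gives a
    proportionally stronger signal). Leftover dNTP is assumed to be fully
    degraded by apyrase before the next flow.
    """
    position = 0
    signals = []
    for flow in range(max_flows):
        if position >= len(template_3to5):
            break
        nucleotide = dispensation_order[flow % len(dispensation_order)]
        incorporated = 0
        while (
            position < len(template_3to5)
            and COMPLEMENT[template_3to5[position]] == nucleotide
        ):
            incorporated += 1
            position += 1
        signals.append((nucleotide, incorporated))
    return signals


def call_bases(signals) -> str:
    """Convert flow signals back into the sequence of the new strand."""
    return "".join(nucleotide * count for nucleotide, count in signals)


template = "TTGCAAAT"  # hypothetical template, written 3'->5'
signals = flowgram(template)
new_strand = call_bases(signals)
```

For this template the flows give signals of 2, 1, 1, 3, 1 for A, C, G, T, A, so the run of three T's is read from a single triple-strength signal. In practice signal strength becomes harder to resolve for long homopolymer runs, which is a known source of error for this chemistry.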

  • Massively-parallel pyrosequencing (on beads or chips) - 454 technology (Medini, Nat Rev Microbiol. 6:419, 2008)
    - DNA sheared, adaptors ligated, attached to bead & PCR amplified
    - beads captured in wells & pyrosequencing carried out in parallel on each DNA fragment
    - average read of ~ 700 (?) bp ... but up to 1.6 million reactions can be carried out in parallel on a 6.4 cm² slide
    - expect ~ 500 million nucleotides of sequence data per 10 hour run (July 2010)
    Figure labels: Genomic DNA, Sample preparation, PCR, Pyrosequencing, Enzymes on beads and primer, Polymerase, PPi, Light

  • 2. Illumina sequencing (parallel microchip) - SOLEXA technology (Medini, Nat Rev Microbiol. 6:419, 2008)
    - average read of ~ 40-100 bp (short-read)
    - add adaptors to sheared DNA, attach to chip, then PCR bridge amplification
    - denature clusters of ~ 1000 copies of DNA molecules & sequential sequencing using four fluorophore-labelled nts
    www.youtube.com/watch?v=HtuUFUnYB9Y&feature=related

    (Illumina website Sept. 2012)
                                HiSeq 2000      HiSeq 1000
    Output (2 x 100 bp)         600 Gb          300 Gb
    Run Time (2 x 100 bp)       ~11 days        ~8.5 days
    Paired-end Reads            6 Billion       3 Billion
    Single Reads                3 Billion       1.5 Billion
    Maximum Read Length         2 x 100 bp      2 x 100 bp
    Bases Above Q30             > 85% (2 x 50 bp), > 80% (2 x 100 bp)
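The HiSeq figures can be cross-checked with simple arithmetic. The sketch below assumes that "Paired-end Reads" counts individual reads (both ends) and that every read reaches the full 100 bp; the dictionary layout and function names are illustrative. The Q30 conversion uses the standard Phred definition, error probability = 10^(-Q/10).

```python
# Figures from the table (Illumina website, Sept. 2012).
instruments = {
    "HiSeq 2000": {"paired_end_reads": 6e9, "read_length_bp": 100, "run_days": 11.0},
    "HiSeq 1000": {"paired_end_reads": 3e9, "read_length_bp": 100, "run_days": 8.5},
}


def yield_gb(reads: float, read_length_bp: int) -> float:
    """Total output in gigabases, assuming every read is full length."""
    return reads * read_length_bp / 1e9


def phred_error_probability(q: float) -> float:
    """Probability that a base call is wrong for Phred quality score q."""
    return 10 ** (-q / 10)


for name, spec in instruments.items():
    total = yield_gb(spec["paired_end_reads"], spec["read_length_bp"])
    per_day = total / spec["run_days"]
    print(f"{name}: {total:.0f} Gb per run, ~{per_day:.0f} Gb per day")

q30_error = phred_error_probability(30)  # 1 error in 1000 base calls
```

Under these assumptions the read counts reproduce the quoted outputs of 600 Gb and 300 Gb, and "Bases Above Q30" means that more than 80-85% of base calls have an estimated error rate below 1 in 1000.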

  • 3. Single molecule real-time sequencing (Helicos, Pacific Biosciences) (Metzker, Nature Reviews Genetics 11:31, 2010)
    - continuous monitoring of nt incorporation (rather than termination as in Sanger method) and no amplification
    - formation of phosphodiester bond releases fluorophore
    - nanoscale wells on chip so ~ one DNA polymerase molecule per well
    - read length 25 to 55 bases, 21-35 Gigabases per run (Helicos website Sept. 2012)

  • Chin et al. New Eng J Med 364:33, 2011
    Press release, Dec 9, 2010: PacBio & Harvard Use Fast Gene Sequencer to Crack DNA Code of Haitian Cholera Strain
    - H1 and H2 strains were sequenced in < 24 hr with enough reads to cover the genomes 60 and 32 times, respectively.