Ultrasequencing: Methods and Applications of the New Generation Sequencing Platforms

Laura Moya Andérico
Master in Advanced Genetics, Genomics Class
December 16th, 2015


Brief Overview

First-generation sequencing: Sanger

• Sample preparation: DNA fragments are labeled with dyes
• Capillary electrophoresis: arranges fragments by length
• Average read length ~800 bp
• Expensive, low throughput, high turnaround time

Second-generation sequencing: Illumina

• Higher throughput
• “Wash-and-scan” process
• Longer turnaround time
• Requires PCR amplification prior to sequencing
• Dephasing (loss of synchrony between copies of the template) adds noise and limits read length

Transition to TGS

Ion Torrent semiconductor sequencer

• Simplifies sequencing process
• Pros and cons

Helicos Genetic Analysis Platform

• First available instrument with SMS technology
• High turnaround time but no PCR is required
• Process is similar to “wash-and-scan” in SGS

Single molecule sequencing: improves on the sequencing by synthesis (SBS) technology used in SGS

and ultimately, throughput is limited as well, compared with what SMS platforms will be capable of achieving.

Sitting even closer to the TGS boundary is the Helicos Genetic Analysis Platform, the first commercially available sequencing instrument to carry out SMS (24–27). The Helicos sequencing instrument works by imaging individual DNA molecules affixed to a planar surface as they are extended using a defined primer and a modified polymerase as well as proprietary fluorescently labeled nucleotide analogues, referred to as Virtual Terminator nucleotides, in which the dye is attached to the nucleotide via a chemically cleavable group that allows for step-wise sequencing to be carried out (25). Because halting is still required in this process (similar to SGS technologies), the time to sequence a single nucleotide is high, and the read lengths realized are ~32 nucleotides long. However, given the SMS nature of this technology, no PCR is required for sequencing, a significant advantage over SGS technologies. However, also due to the single-molecule nature of this technology (and all of the SMS technologies), the raw read error rates are generally at or >5%, although the highly parallel nature of this technology can deliver high fold coverage and a consensus or finished read accuracy of >99%. This technology is capable of sequencing an entire human genome, albeit at significant cost by today’s standards (roughly $50 000 in reagents) (28). It can follow roughly one billion individual DNA molecules as they are sequenced over the course of many days. Unlike SGS, these many hundreds of millions of sequencing reactions can be carried out asynchronously, a hallmark of TGS. Further, given individual monitoring of templates, the enzymatic incorporation step does not need to be driven to completion, which serves to reduce the overall mis-incorporation error rate. As with the other TGS technologies discussed below, deletions and insertions are a significant issue.
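The step from a raw error rate of about 5% to a consensus accuracy above 99% can be illustrated with a simple majority-vote model. The sketch below is only an illustration under stated assumptions, not the Helicos base-calling method: each read covering a position is treated as an independent trial, every wrong read is counted against the correct base, and the function name `consensus_accuracy` is hypothetical.

```python
from math import comb


def consensus_accuracy(raw_error: float, coverage: int) -> float:
    """Probability that a strict majority of independent reads is correct.

    Assumption: errors are independent between reads and each read is
    simply right or wrong at the position (no indel modelling).
    """
    p_correct = 1.0 - raw_error
    needed = coverage // 2 + 1  # smallest count that is a strict majority
    return sum(
        comb(coverage, k) * p_correct**k * raw_error ** (coverage - k)
        for k in range(needed, coverage + 1)
    )


# Illustrative depths only; 0.05 is the raw error rate quoted in the excerpt.
for depth in (1, 3, 5, 15):
    print(depth, consensus_accuracy(0.05, depth))
```

Under these assumptions, 3-fold coverage at a 5% raw error rate already gives a consensus accuracy of about 0.9928. Real single-molecule data are dominated by insertions and deletions, which this model ignores, so the depth needed in practice may be higher.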

The sample preparation part of this technology involves fragmenting genomic DNA into smaller pieces, adding a 3’ poly(A) tail to the fragments, labeling and blocking by terminal transferase. These templates are then captured onto a surface with covalently bound 5’ dT(50) oligonucleotides via hybridization (25). The surface is then imaged using charge-coupled device (CCD) sensors, where those templates that have been appropriately captured are identified and then tracked for SBS. The process then resembles the ‘wash-and-scan’ steps of SGS in which a labeled nucleotide and polymerase mixture are flooded onto the system and incubated for a period of time, the surface is then washed to remove the synthesis mixture and scanned to detect the fluorescent label. The dye–nucleotide linker is then cleaved to release the dye, and this process is repeated.

Not only can this technology be used to sequence DNA, but the DNA polymerase can be replaced with a reverse transcriptase enzyme to sequence RNA directly (29), without requiring the conversion of RNA to cDNA or without the need for ligation/amplification steps, something all existing SGS technologies require for RNA sequencing (5). Instead, each RNA molecule is polyadenylated and 3’-blocked and captured on a surface coated with dT(50) oligonucleotides, similar to the DNA sequencing process. Sequencing is then carried out as described for DNA, but using reverse transcriptase instead of DNA polymerase. In

Table 1. Comparison of first-generation sequencing, SGS and TGS

Columns: First generation; Second generation (a); Third generation (a)

Fundamental technology
• First generation: Size-separation of specifically end-labeled DNA fragments, produced by SBS or degradation
• Second generation: Wash-and-scan SBS
• Third generation: SBS, by degradation, or direct physical inspection of the DNA molecule

Resolution
• First generation: Averaged across many copies of the DNA molecule being sequenced
• Second generation: Averaged across many copies of the DNA molecule being sequenced
• Third generation: Single-molecule resolution

Current raw read accuracy
• First generation: High
• Second generation: High
• Third generation: Moderate

Current read length
• First generation: Moderate (800–1000 bp)
• Second generation: Short, generally much shorter than Sanger sequencing
• Third generation: Long, 1000 bp and longer in commercial systems

Current throughput
• First generation: Low
• Second generation: High
• Third generation: Moderate

Current cost
• First generation: High cost per base, low cost per run
• Second generation: Low cost per base, high cost per run
• Third generation: Low-to-moderate cost per base, low cost per run

RNA-sequencing method
• First generation: cDNA sequencing
• Second generation: cDNA sequencing
• Third generation: Direct RNA sequencing and cDNA sequencing

Time from start of sequencing reaction to result
• First generation: Hours
• Second generation: Days
• Third generation: Hours

Sample preparation
• First generation: Moderately complex, PCR amplification not required
• Second generation: Complex, PCR amplification required
• Third generation: Ranges from complex to very simple depending on technology

Data analysis
• First generation: Routine
• Second generation: Complex because of large data volumes and because short reads complicate assembly and alignment algorithms
• Third generation: Complex because of large data volumes and because technologies yield new types of information and new signal processing challenges

Primary results
• First generation: Base calls with quality values
• Second generation: Base calls with quality values
• Third generation: Base calls with quality values, potentially other base information such as kinetics

(a) There are many TGS technologies in development but few have been reduced to practice. While there is significant potential of TGS to radically improve current throughput and read-length characteristics (among others), the ultimate practical limits of these technologies remain to be explored. Furthermore, there is active development of SGS technologies that will also improve read-length and throughput characteristics.

R230 Human Molecular Genetics, 2010, Vol. 19, Review Issue 2


Table Comparing First, Second, and Third Generation Sequencing

Table 1. Schadt E, et al (2010). Human Molecular Genetics 19(2): 227-240.

Third-generation Sequencing

Single-molecule real-time sequencing: PacBio

• Observes a single molecule of DNA polymerase synthesizing a strand of DNA using zero-mode waveguide technology
• Pros: fast turnaround time, requires minimal amounts of reagents/sample preparation, no PCR amplification needed, >1000 bp average read lengths
• Cons: >5% read error rates, low throughput

Third-generation Sequencing (cont.)

Single-molecule real-time sequencing — PacBio

Third-Generation Sequencing:

Despite the technological differences, the three categories of second-generation sequencing techniques have similar workflows for the sample preparation and analysis [21]. They also harness the need of an amplification step for template DNA as the NGS techniques are not designed to detect single fluorescent events [12]. PCR amplification is associated with PCR bias and has the possibility of base sequence errors or favoring certain sequences over the others. These biases can be avoided if a single molecule is used for sequencing without a prior amplification step. Also, the data generated by NGS techniques is massive, approximately 300 G bases or more by Illumina HiSeq 2000 instrument. The time-to-result is also long (many days) due to the large number of scanning and washing cycles required. Loss of synchronicity with addition of each nucleotide is another disadvantage of the technique and may lead to noise and second-generation sequencing errors [22] and short read lengths. The sequencing biochemistry, configuration and generation of array varies for second and third-generation sequencing techniques, few of which are discussed below.
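The link between loss of synchronicity and short read lengths can be pictured with a toy model. The sketch below is an illustration under stated assumptions rather than a description of any instrument: it assumes that in every cycle a fixed fraction of the amplified copies falls out of phase and never recovers, and the function names and the 1% loss figure are hypothetical.

```python
import math


def in_phase_fraction(per_cycle_loss: float, cycles: int) -> float:
    """Fraction of template copies still in phase after `cycles` cycles."""
    return (1.0 - per_cycle_loss) ** cycles


def cycles_until(per_cycle_loss: float, min_fraction: float) -> int:
    """Largest cycle count that keeps at least `min_fraction` in phase."""
    if not 0.0 < per_cycle_loss < 1.0:
        raise ValueError("per_cycle_loss must be between 0 and 1 (exclusive)")
    if not 0.0 < min_fraction <= 1.0:
        raise ValueError("min_fraction must be in (0, 1]")
    return math.floor(math.log(min_fraction) / math.log(1.0 - per_cycle_loss))


# Hypothetical numbers: 1% of copies dephase per cycle, and base calling
# is assumed to need at least half of the copies in phase.
usable_cycles = cycles_until(0.01, 0.5)
print(usable_cycles, in_phase_fraction(0.01, usable_cycles))
```

Because the in-phase fraction decays geometrically, even a small per-cycle loss caps the number of usable cycles, which is one way to read the short read lengths mentioned above. Single-molecule methods avoid this particular limit because there is no ensemble to keep synchronized.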

Single-molecule sequencing (SMS)

SMS also known as single-template technology provides several advantages over second-generation sequencing. Two devices, Pacific Biosciences Single Molecule Real-time (SMRT™) sequencing and the Helicos Biosciences true Single Molecule Sequencing (tSMS) were the providers of the first commercial 3G instruments. The techniques utilize the sequencing by synthesis approach, similar to a few of the NGS techniques but differ by not requiring amplification and hence, reduce the sequencing errors due to amplification as in NGS, reducing compositional bias [18] producing long sequences and supporting a short run-around time. The technique uses a DNA polymerase to drive the reaction and is based on real-time imaging of fluorescent labeled nucleotides as they are synthesized along the template DNA molecules [23]. This imaging is performed by dense array of zero-mode waveguide (ZMW) nanostructures that allow optical interrogation of single fluorescent molecules (Figure 2). Single functioning DNA polymerase is immobilized at the bottom of each ZMW to process fluorescently labeled nucleotide substrates. The four bases (G, C, T, A) are differentially labeled with deoxyribonucleoside pentaphosphate (DN5Ps) substrates. The fluorescent substrate is linked to the phosphate chain rather than the base and therefore the phosphate chain is cleaved when the nucleotide is incorporated into the DNA strand. Thus, on the incorporation of the phospholinked nucleotide, DNA polymerase frees the substrate molecule from the nucleotide when it cleaves the phosphate chain. The label is quickly removed and does not halt the DNA polymerase activity; which halts after incorporating few base-labeled nucleotides. This sequencing technique generated long read lengths of ~1000 bp in 2009. The recent PacBio® RS II sequencer (Pacific Biosciences, California) released in 2013 can generate 8.5 kb reads by combining the P5 DNA polymerase with C3 chemistry (P5-C3), with the longest reads exceeding 30,000 bases. This SMRT sequencer was reported to be least biased and good coverage in extreme GC content (both GC-rich and GC-poor) compared to Illumina and Ion Torrent sequencers [24]. Although the SMRT technique have many advantages over the NGS techniques, a number of challenges still remain such as excess of 5% error rates by insertions and deletions when assembling genomes [25].

Figure 2: Principle of single molecule-real time sequencing. A) A single molecule of DNA template-DNA polymerase complex is immobilized at the bottom of ZMW illuminated by laser light from the bottom. The ZMW enables detection of the each incorporated phospholinked nucleotide by the polymerase against the bulk background of nucleotides. B) Schematic representation of the phospholinked dNTP incorporation cycle, with a corresponding expected time trace of detected fluorescence intensity from the ZMW. (1) Cognate association of a phospholinked nucleotide with the template in the polymerase active site, (2) Increased fluorescence output on the corresponding color channel. (3) Formation of a phosphodiester bond liberates the dye-linker-pyrophosphate product that diffuses out of the ZMW, and ends the fluorescence pulse. (4) Translocation of the polymerase to the next position, and (5) binding of the next cognate molecule to the active site, thereby beginning the subsequent pulse (Figure reproduced from Eid et al with permission [24]).

The tSMS technique is slightly different from the SMRT by using a fluorophore tagged DNA polymerase which in proximity to a nucleotide, tagged with an acceptor fluorophore, emits a fluorescence resonance energy transfer (FRET) signal. The fluorophore label is released after incorporation. These two 3G techniques, SMRT and tSMS have a series of advantages compared to the second-generation such as higher throughput, longer read lengths to enhance de novo assembly, direct detection of haplotype, whole-chromosome phasing, higher consensus accuracy of identifying rare variants, less sample requirement, making it useful for clinical application. The tSMS still retains many characteristics of second-generation sequencers such as the sequencing approach and chemistry, but the ability to perform direct RNA-sequencing signifies its clear improvement over second-generation techniques.

Transmission electron microscopy (TEM) offered by Halcyon Molecular is another SMS based technique that images and chemically detects atoms comprising DNA templates [26]. Scanning tunneling microscopy can detect DNA bases according to specific electronic differences among the four bases. Although these ideas appear to be straightforward, it has yet to make a long journey due to its challenges to

Citation: Srinivasan S, Batra J (2014) Four Generations of Sequencing- Is it ready for the Clinic Yet? Next Generat Sequenc & Applic 1: 107. doi:10.4172/jngsa.1000107


Figure 2. Srinivasan S, Batra J (2014). Next Generat Sequenc & Applic 1: 107.

Third-generation Sequencing (cont.)

Single molecule sequencing: transmission electron microscope (TEM) by Halcyon Molecular

• Directly images and chemically detects atoms that identify the nucleotides comprising a DNA template
• DNA bases are detected with scanning tunneling microscopy
• Challenges: requires preparation of stretched ssDNA on a surface, expensive microscopes are needed, and requires more specific equipment

Third-generation Sequencing (cont.)

TEM-based DNA sequencing using electron microscopy — ZS Genetics

• Directly visualizes DNA by labeling atoms within the nucleotides using annular dark-field scanning transmission electron microscopy (ADF-STEM)

• Promises reads longer than 10 kb at a rate of 1.7 billion bases a day

• Challenges: reduce label losses, increase detectable differences between DNA base types, and increase the fraction of bases labeled within a molecule

Third-generation Sequencing (cont.)

TEM-based DNA sequencing using electron microscopy: ZS Genetics

reactions sequence-specific because polymerase reactions are intrinsically sequence specific.

METHODS

The “test pattern” DNA was built from a synthetic gene (provided by DNA 2.0, Menlo Park, CA, USA) with a 3,072 base-pair segment with all the thymines of one strand in a repeating pattern, . . . TNTNNNNNNNNN . . . , where T represents thymine and N represents any of the other three nucleobases. This pseudo-repeating region was amplified using flanking priming sites and standard polymerase chain reaction methods, with one standard primer and one biotinylated primer. The product was purified by centrifugation filter and bound to Dynabeads. Single-stranded DNA was obtained by denaturation, and the template strand was then used as template in a one-primer, one-cycle polymerization reaction using Bst polymerase standard reaction conditions, replacing 1 mM of dTTP with 1.5 mM CH3-Hg-S-dUTP (Livingston et al., 1976). The DNA product was gel purified. Final concentration and buffer exchange was done on centrifugation filter. The efficacy of label inclusion was tested with restriction enzymes, which were seen not to react with modified recognition sites (Banfalvi & Sarkar, 1995), confirming the presence of modifications at those sites. The DNA was also assayed by inductively coupled plasma mass spectrometry, which confirmed the presence of mercury. Single stranded M13 and primers were processed in the same manner.
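The layout of the "test pattern" strand can be sketched in a few lines. This is only an illustration of the pattern as quoted, not the authors' synthetic gene: it assumes the repeat unit is exactly the string shown between the ellipses, fills every N with a pseudo-random choice of A, C or G, and uses a hypothetical function name `build_test_pattern`.

```python
import random

PATTERN = "TNTNNNNNNNNN"  # repeat unit as quoted in the excerpt (assumed complete)
NON_T = "ACG"             # N stands for any base other than thymine


def build_test_pattern(length: int, seed: int = 0) -> str:
    """Return a strand whose thymines sit only where the pattern has a T."""
    rng = random.Random(seed)  # seeded so the illustration is reproducible
    return "".join(
        "T" if PATTERN[i % len(PATTERN)] == "T" else rng.choice(NON_T)
        for i in range(length)
    )


seq = build_test_pattern(3072)  # segment length given in the excerpt
t_positions = [i for i, base in enumerate(seq) if base == "T"]
print(len(seq), len(t_positions), t_positions[:6])
```

With the thymines at known, regular positions, the heavy-atom labels should appear at predictable spacings along the imaged molecule, which is what makes such a strand useful as a test pattern.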

Mercury-labeled DNA was deposited and aligned on an amorphous carbon film on a 400 mesh Au transmission electron microscopy (TEM) grid using a method similar to Bensimon et al. (1994). The sample was vacuum dried for 2 min, then placed immediately into the STEM apparatus.

TEM imaging was conducted on FEI T-12 TEM (FEI Company, Hillsboro, OR) at 80 kV. ADF-STEM imaging was performed by an aberration-corrected STEM, Carl Zeiss Libra 200-80kV (Carl Zeiss, Oberkochen, Germany) with Cs = −1.2 mm, 80 kV with elastic scattering using the

Figure 1. Heavy atoms labels detected within DNA molecules. a: Schematic showing heavy atoms deflecting portion of the raster scanned electron beam. Highly deflected electrons are detected on the ADF detector. b: Unlabeled DNA bases scatter fewer electrons than the heavy-atom-labeled bases, distinguished by detector current.

Figure 2. Heavy-atom-labeling strategy. A single stranded template is primed with a complementary oligonucleotide primer. For simplicity, the lengths of the primer and the template have been shortened. In the presence of polymerase, the template directs the synthesis of a complementary strand. Thymine deoxyribose nucleotide triphosphates in the primer extension reaction have been completely replaced with a heavy-atom-modified analog. Consequently, the resulting double-stranded DNA molecule is modified with heavy atoms on the thymine bases of the synthetic strand. These heavy atoms provide signal to the dark-field detector of a STEM system.

Figure 3. DNA alignment. a: Bright-field TEM image of multiple DNA molecules linearized on amorphous carbon surface. b: Dark-field STEM image of linearized DNA molecule on thin amorphous carbon substrate.

1050 David C. Bell et al.


Figure 1a, 3b. Bell David, et al. 2012. Microscopy and Microanalysis 18(5):1049-1053.

Third-generation Sequencing (cont.)

Nanopore Sequencing: Oxford Nanopore

• Single DNA molecules pass through a nanopore chamber
• Uses microwells within an array chip for sample preparation, detection, and analysis
• Predictions: a billion DNA bases in ~6 hours for ~800 €
• Nanopore sequencing variations: exonuclease-assisted, NanoTag-SBS, Optipore, sequencing by electronic tunneling, among others
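The predicted figures on this slide can be turned into a rate and a unit cost with simple arithmetic. The sketch below is a back-of-the-envelope calculation that takes the slide's approximate numbers at face value; the function name `throughput_and_cost` is hypothetical.

```python
def throughput_and_cost(total_bases: float, hours: float, cost_eur: float):
    """Return (bases per second, euros per megabase) for one run."""
    bases_per_second = total_bases / (hours * 3600)
    eur_per_megabase = cost_eur / (total_bases / 1e6)
    return bases_per_second, eur_per_megabase


# Slide prediction: about one billion bases in about 6 hours for about 800 EUR.
rate, unit_cost = throughput_and_cost(1e9, 6, 800)
print(f"{rate:.0f} bases/s, {unit_cost:.2f} EUR per megabase")
```

Since the inputs are rounded predictions, the results (roughly 46,000 bases per second and 0.80 € per megabase) are order-of-magnitude estimates only.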

Third-generation Sequencing (cont.)

Nanopore Sequencing: Oxford Nanopore

Wang et al. The evolution of nanopore sequencing

FIGURE 2 | Schematic illustration of a nanopore sequencing device. (A) A U-tube supports the lipid bilayer membrane bathed in electrolyte solution in which a 120 mV bias is applied. During DNA translocation through the nanopore, ionic current is recorded by a PCA connected to the cis (negative) and trans (positive) chambers. (B) When an ssDNA molecule traverses through α-HL from cis to trans chamber, the open pore current drops to four different levels (Ib: current blockage), each for a certain time (τ: dwell time). (C) Current signals could reveal sequence information (Deamer and Branton, 2002; Bayley, 2006). Reproduced by copyright permissions of American Chemical Society and Elsevier.

that MspA pore had great potential to sequence DNA (Derrington et al., 2010). The feature of single recognition site of MspA pore seems to be more advantageous than α-HL (Manrao et al., 2011). This group replaced negatively-charged amino acids with neutral asparagine residues in pore’s constriction site, and with positively charged basic residues in pore’s entrance through genetic engineering (Butler et al., 2008), which enabled easy DNA capture and deceleration of DNA translocation through the pore. They later demonstrated that the engineered MspA pore exhibited better base resolution than α-HL pore by generating larger signal difference between bases (Derrington et al., 2010). However, development of new methods is needed to avoid signal overlapping between different bases, particularly deoxynucleotides adenine and guanine (Derrington et al., 2010; Manrao et al., 2011), to increase accuracy. For precise SBR, the length of the recognition region of a nanopore shouldn’t be larger than ∼0.5-nm, equivalent to phosphorus-phosphorus distance of a nucleotide (or base spacing) in an ssDNA strand (Wilson et al., 2009; Cherf et al., 2012). The constriction region of MspA is about 0.6 nm long (Manrao et al., 2012), which means signal interference from adjacent bases (Manrao et al., 2011, 2012; Laszlo et al., 2013, 2014). Their previous work showed that about four bases together around the constriction region contribute to the overall current blockage (Manrao et al., 2011, 2012). Recently, they have resorted to tetramer maps, which are the standard electric signal curves collected by measuring current blockage signals when each of the combination of 256 possible 4-mers is translocating through the pore, and algorithms to circumvent this problem (Laszlo et al., 2014).
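The figure of 256 possible 4-mers, and the way overlapping 4-mers relate back to a base sequence, can be shown with a short sketch. It only illustrates the combinatorics; it contains no current measurements and is not the published decoding algorithm, and the function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
ALL_4MERS = ["".join(p) for p in product(BASES, repeat=4)]  # 4**4 combinations


def overlapping_kmers(seq: str, k: int = 4) -> list:
    """k-mers seen as the strand advances through the pore one base at a time."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def reconstruct(kmers: list) -> str:
    """Rebuild the sequence from consecutive k-mers that overlap by k-1 bases."""
    if not kmers:
        return ""
    return kmers[0] + "".join(kmer[-1] for kmer in kmers[1:])


windows = overlapping_kmers("ACGTACGGT")
print(len(ALL_4MERS), windows, reconstruct(windows))
```

In a tetramer-map approach, each measured current level would be matched to one of these 256 entries; because consecutive windows share three bases, each new window contributes one new base to the read.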

Single base resolution by graphene and other solid-state nanopores. Besides α-HL, researchers have long been investigating solid-state nanopores, with an initial hope to tackle problems in the protein nanopores, such as instability and dimension-tuning difficulties. In 2001, Golovchenko, Branton and colleagues demonstrated that nanopores as small as 1.8 nm in diameter

www.frontiersin.org January 2015 | Volume 5 | Article 449 | 3

Figure 2. Wang Y, et al. 2015. Frontiers in Genetics 5(449):1-20.

Third-generation Sequencing (cont.)

TGS Applications

• Gene expression analysis
• ChIP-seq
• De novo genome sequencing
• Metagenomics
• Non-invasive prenatal testing
• Disease gene identification
• Pharmacogenomics

All applications require a minimum number of reads of a predefined length to support sound conclusions
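One common way to estimate the minimum number of reads is the expected-coverage relation C = N × L / G, where N is the number of reads, L the read length and G the size of the target. The sketch below applies it under stated assumptions: the 3.1 Gb genome size and 30-fold target are illustrative values rather than figures from the slides, and the function name `reads_needed` is hypothetical.

```python
def reads_needed(genome_size: int, fold_coverage: int, read_length: int) -> int:
    """Smallest read count N such that N * read_length / genome_size >= coverage."""
    total_bases = genome_size * fold_coverage
    return -(-total_bases // read_length)  # ceiling division on integers


GENOME = 3_100_000_000  # assumed human genome size in bases (illustrative)
for length in (100, 1_000, 10_000):
    print(length, reads_needed(GENOME, 30, length))
```

The estimate treats reads as uniformly distributed and error-free, so it is a lower bound; applications such as rare-variant detection or metagenomics generally need more depth than this simple relation suggests.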

Conclusions

Sequencing techniques have progressed greatly within a short time span

• Higher throughput
• Longer read lengths
• More economical: closer to the $100 genome

DNA sequencing is not the only use of TGS technologies

TGS has many applications: will they be added to the clinical diagnostic setting?

Scientists and informaticians must collaborate to keep up with the large amount of data generated

References Schadt E, Turner S, Kasarskis A (2010). A Window into Third-Generation Sequencing. Human Molecular Genetics 19(2): 227-240.

Srinivasan S, Batra J (2014). Four Generations of Sequencing- Is it ready for the Clinic Yet? Next Generat Sequenc & Applic 1: 107.

Bell D, et al (2012). DNA Base Identification by Electron Microscopy. Microscopy and Microanalysis 18(5):1049-1053.

Wang Y, Yang Q, Wang Z (2015). The Evolution of Nanopore Sequencing. Frontiers in Genetics 5(449):1-20.

Buermans H, den Dunnen J (2014). Next Generation Sequencing Technology: Advances and Applications. Biochimica et Biophysica Acta 1842:1932-1941.

Bahassi E, Stambrook P (2014). Next-generation Sequencing Technologies: Breaking the Sound Barrier of Human Genetics. Mutagenesis 29(5):303-310.

Thank you for listening!

Questions?