dna sequencing

39
DNA sequencing The term DNA sequencing refers to methods for determining the order of the nucleotide bases, adenine , guanine , cytosine , and thymine , in a molecule of DNA . The first DNA sequences were obtained by academic researchers, using laborious methods based on 2-dimensional chromatography in the early 1970s. Following the development of dye -based sequencing methods with automated analysis, DNA sequencing has become easier and orders of magnitude faster. Knowledge of DNA sequences of genes and other parts of the genome of organisms has become indispensable for basic research studying biological processes, as well as in applied fields such as diagnostic or forensic research. The advent of DNA sequencing has significantly accelerated biological research and discovery. The rapid speed of sequencing attained with modern DNA sequencing technology has been instrumental in the sequencing of the human genome , in the Human Genome Project . Related projects, often by scientific collaboration across continents, have generated the complete DNA sequences of many animal, plant, and microbial genomes. DNA Sequence Trace RNA sequencing, which is technically easier to perform than DNA sequencing, was one of the earliest forms of nucleotide sequencing. The major landmark of RNA sequencing is the sequence of the first complete gene and the complete genome of Bacteriophage MS2 , identified and published by Walter Fiers and his coworkers at the University of Ghent (Ghent , Belgium ), between 1972 [1] and 1976. [2] Prior to the development of rapid DNA sequencing methods in the early 1970s by Frederick Sanger at the University of Cambridge , in England and Walter Gilbert and Allan Maxam at Harvard , [3] [4] a number of laborious methods were used. For instance, in 1973, Gilbert and Maxam reported the sequence of 24 basepairs using a method known as wandering-spot analysis. [5] The chain-termination method developed by Sanger and coworkers in 1975 soon became the method of choice, owing to its relative ease and reliability. [6] [7] ] Maxam-Gilbert sequencing In 1976-1977, Allan Maxam and Walter Gilbert developed a DNA sequencing method based on chemical modification of DNA and subsequent cleavage at specific bases. [3] Although Maxam and Gilbert published their chemical sequencing method two years after the ground-breaking paper of Sanger and

Upload: mymobile502

Post on 19-Nov-2014

991 views

Category:

Documents


0 download

TRANSCRIPT

DNA sequencing

The term DNA sequencing refers to methods for determining the order of the nucleotide bases, adenine, guanine, cytosine, and thymine, in a molecule of DNA. The first DNA sequences were obtained by academic researchers, using laborious methods based on 2-dimensional chromatography in the early 1970s. Following the development of dye-based sequencing methods with automated analysis, DNA sequencing has become easier and orders of magnitude faster. Knowledge of DNA sequences of genes and other parts of the genome of organisms has become indispensable for basic research studying biological processes, as well as in applied fields such as diagnostic or forensic research. The advent of DNA sequencing has significantly accelerated biological research and discovery. The rapid speed of sequencing attained with modern DNA sequencing technology has been instrumental in the sequencing of the human genome, in the Human Genome Project. Related projects, often by scientific collaboration across continents, have generated the complete DNA sequences of many animal, plant, and microbial genomes.

DNA Sequence Trace

RNA sequencing, which is technically easier to perform than DNA sequencing, was one of the earliest forms of nucleotide sequencing. The major landmark of RNA sequencing is the sequence of the first complete gene and the complete genome of Bacteriophage MS2, identified and published by Walter Fiers and his coworkers at the University of Ghent (Ghent, Belgium), between 1972[1] and 1976.[2]

Prior to the development of rapid DNA sequencing methods in the early 1970s by Frederick Sanger at the University of Cambridge, in England and Walter Gilbert and Allan Maxam at Harvard,[3][4] a number of laborious methods were used. For instance, in 1973, Gilbert and Maxam reported the sequence of 24 basepairs using a method known as wandering-spot analysis. [5]

The chain-termination method developed by Sanger and coworkers in 1975 soon became the method of choice, owing to its relative ease and reliability.[6][7]

] Maxam-Gilbert sequencing

In 1976-1977, Allan Maxam and Walter Gilbert developed a DNA sequencing method based on chemical modification of DNA and subsequent cleavage at specific bases.[3] Although Maxam and Gilbert published their chemical sequencing method two years after the ground-breaking paper of Sanger and Coulson on plus-minus sequencing,[6][8] Maxam-Gilbert sequencing rapidly became more popular, since purified DNA could be used directly, while the initial Sanger method required that each read start be cloned for production of single-stranded DNA. However, with the improvement of the chain-termination method (see below), Maxam-Gilbert sequencing has fallen out of favour due to its technical complexity prohibiting its use in standard molecular biology kits, extensive use of hazardous chemicals, and difficulties with scale-up.

The method requires radioactive labelling at one end and purification of the DNA fragment to be sequenced. Chemical treatment generates breaks at a small proportion of one or two of the four nucleotide bases in each of four reactions (G, A+G, C, C+T). Thus a series of labelled fragments is generated, from the radiolabelled end to the first 'cut' site in each molecule. The fragments in the four reactions are arranged side by side in gel electrophoresis for size separation. To visualize the fragments, the gel is exposed to X-ray film for autoradiography, yielding a series of dark bands each corresponding to a radiolabelled DNA fragment, from which the sequence may be inferred.

Also sometimes known as 'chemical sequencing', this method originated in the study of DNA-protein interactions (footprinting), nucleic acid structure and epigenetic modifications to DNA, and within these it still has important applications.

Chain-termination methods

Part of a radioactively labelled sequencing gel

Because the chain-terminator method (or Sanger method after its developer Frederick Sanger) is more efficient and uses fewer toxic chemicals and lower amounts of radioactivity than the method of Maxam and Gilbert, it rapidly became the method of choice. The key principle of the Sanger method was the use of dideoxynucleotide triphosphates (ddNTPs) as DNA chain terminators.

The classical chain-termination method requires a single-stranded DNA template, a DNA primer, a DNA polymerase, radioactively or fluorescently labeled nucleotides, and modified nucleotides that terminate DNA strand elongation. The DNA sample is divided into four separate sequencing reactions, containing all four of the standard deoxynucleotides (dATP, dGTP, dCTP and dTTP) and the DNA polymerase. To each reaction is added only one of the four dideoxynucleotides (ddATP, ddGTP, ddCTP, or ddTTP) which are the chain-terminating nucleotides, lacking a 3'-OH group required for the formation of a phosphodiester bond between two nucleotides, thus terminating DNA strand extension and resulting in various DNA fragments of varying length.

The newly synthesized and labeled DNA fragments are heat denatured, and separated by size (with a resolution of just one nucleotide) by gel electrophoresis on a denaturing polyacrylamide-urea gel with each of the four reactions run in one of four individual lanes (lanes A, T, G, C); the DNA bands are then visualized by autoradiography or UV light, and the DNA sequence can be directly read off the X-ray film or gel image. In the image on the right, X-ray film was exposed to the gel, and the dark bands correspond to DNA fragments of different lengths. A dark band in a lane indicates a DNA fragment that is the result of chain termination after incorporation of a dideoxynucleotide (ddATP, ddGTP, ddCTP, or ddTTP). The relative positions of the different bands among the four lanes are then used to read (from bottom to top) the DNA sequence.

DNA fragments are labeled with a radioactive or fluorescent tag on the primer (1), in the new DNA strand with a labeled dNTP, or with a labeled ddNTP. (click to expand)

Technical variations of chain-termination sequencing include tagging with nucleotides containing radioactive phosphorus for radiolabelling, or using a primer labeled at the 5’ end with a fluorescent dye. Dye-primer sequencing facilitates reading in an optical system for faster and more economical analysis and automation. The later development by Leroy Hood and coworkers [9][10] of fluorescently labeled ddNTPs and primers set the stage for automated, high-throughput DNA sequencing.

Sequence ladder by radioactive sequencing compared to fluorescent peaks (click to expand)

Chain-termination methods have greatly simplified DNA sequencing. For example, chain-termination-based kits are commercially available that contain the reagents needed for sequencing, pre-aliquoted and ready to use. Limitations include non-specific binding of the primer to the DNA, affecting accurate read-out of the DNA sequence, and DNA secondary structures affecting the fidelity of the sequence.

[edit] Dye-terminator sequencing

Capillary electrophoresis (click to expand)

Dye-terminator sequencing utilizes labelling of the chain terminator ddNTPs, which permits sequencing in a single reaction, rather than four reactions as in the labelled-primer method. In dye-terminator sequencing, each of the four dideoxynucleotide chain terminators is labelled with fluorescent dyes, each of which with different wavelengths of fluorescence and emission. Owing to its greater expediency and speed, dye-terminator sequencing is now the mainstay in automated sequencing. Its limitations include dye effects due to differences in the incorporation of the dye-labelled chain terminators into the DNA fragment, resulting in unequal peak heights and shapes in the electronic DNA sequence trace chromatogram after capillary electrophoresis (see figure to the right). This problem has been addressed with the use of modified DNA polymerase enzyme systems and dyes that minimize incorporation variability, as well as methods for eliminating "dye blobs". The dye-terminator sequencing method, along with automated high-throughput DNA sequence analyzers, is now being used for the vast majority of sequencing projects.

[edit] Challenges

Common challenges of DNA sequencing include poor quality in the first 15-40 bases of the sequence and deteriorating quality of sequencing traces after 700-900 bases. Base calling software typically gives an estimate of quality to aid in quality trimming.

In cases where DNA fragments are cloned before sequencing, the resulting sequence may contain parts of the cloning vector. In contrast, PCR-based cloning and emerging sequencing technologies based on pyrosequencing often avoid using cloning vectors.

[edit] Automation and sample preparation

View of the start of an example dye-terminator read (click to expand)

Automated DNA-sequencing instruments (DNA sequencers) can sequence up to 384 DNA samples in a single batch (run) in up to 24 runs a day. DNA sequencers carry out capillary electrophoresis for size separation, detection and recording of dye fluorescence, and data output as fluorescent peak trace chromatograms. Sequencing reactions by thermocycling, cleanup and re-suspension in a buffer solution before loading onto the sequencer are performed separately. A number of commercial and non-commercial software packages can trim low-quality DNA traces automatically. These programs score the quality of each peak and remove low-quality base peaks (generally located at the ends of the sequence). The accuracy of such algorithms is below visual examination by a human operator, but sufficient for automated processing of large sequence data sets.

[edit] Large-scale sequencing strategies

Current methods can directly sequence only relatively short (300-1000 nucleotides long) DNA fragments in a single reaction.[11] The main obstacle to sequencing DNA fragments above this size limit is insufficient power of separation for resolving large DNA fragments that differ in length by only one nucleotide.

Genomic DNA is fragmented into random pieces and cloned as a bacterial library. DNA from individual bacterial clones is sequenced and the sequence is assembled by using overlapping DNA regions.(click to expand)

Large-scale sequencing aims at sequencing very long DNA pieces, such as whole chromosomes. Common approaches consist of cutting (with restriction enzymes) or shearing (with mechanical forces) large DNA fragments into shorter DNA fragments. The fragmented DNA is cloned into a DNA vector, and amplified in Escherichia coli. Short DNA fragments purified from individual bacterial colonies are individually sequenced and assembled electronically into one long, contiguous sequence. This method does not require any pre-existing information about the sequence of the DNA and is referred to as de novo sequencing. Gaps in the assembled sequence may be filled by primer walking. The different strategies have different tradeoffs in speed and accuracy; shotgun methods are often used for sequencing large genomes, but its assembly is complex and difficult, particularly with sequence repeats often causing gaps in genome assembly.

New sequencing methods

High-throughput sequencing

The high demand for low-cost sequencing has driven the development of high-throughput sequencing technologies that parallelize the sequencing process, producing thousands or millions of sequences at once.[12][13] High-throughput sequencing technologies are intended to lower the cost of DNA sequencing beyond what is possible with standard dye-terminator methods.

In vitro clonal amplification

Molecular detection methods are not sensitive enough for single molecule sequencing, so most approaches use an in vitro cloning step to amplify individual DNA molecules. Emulsion PCR isolates individual DNA molecules along with primer-coated beads in aqueous droplets within an oil phase. Polymerase chain reaction (PCR) then coats each bead with clonal copies of the DNA molecule followed by immobilization for later sequencing. Emulsion PCR is used in the methods by Marguilis et al. (commercialized by 454 Life Sciences), Shendure and Porreca et al. (also known as "polony sequencing") and SOLiD sequencing, (developed by Agencourt, now Applied Biosystems).[14][15][16] Another method for in vitro clonal amplification is bridge PCR, where fragments are amplified upon primers attached to a solid surface. The single-molecule method developed by Stephen Quake's laboratory (later commercialized by Helicos) skips this amplification step, directly fixing DNA molecules to a surface.[17]

Parallelized sequencing

DNA molecules are physically bound to a surface, and sequenced in parallel.Sequencing by synthesis, like dye-termination electrophoretic sequencing, uses a DNA polymerase to determine the base sequence. Reversible terminator methods (used by Illumina and Helicos) use reversible versions of dye-terminators, adding one nucleotide at a time, detect fluorescence at each position in real time, by repeated removal of the blocking group

to allow polymerization of another nucleotide. Pyrosequencing (used by 454) also uses DNA polymerization, adding one nucleotide species at a time and detecting and quantifying the number of nucleotides added to a given location through the light emitted by the release of attached pyrophosphates.[14][18]

Sequencing by ligation

This enzymatic sequencing method uses a DNA ligase to determine the target sequence.[15][16][19] Used in the polony method and in the SOLiD technology, it uses a pool of all possible oligonucleotides of a fixed length, labeled according to the sequenced position. Oligonucleotides are annealed and ligated; the preferential ligation by DNA ligase for matching sequences results in a signal informative of the nucleotide at that position.

Microfluidic Sanger Sequencing

In microfluidic Sanger sequencing the entire thermocycling amplification of DNA fragments as well as their separation by electrophoresis is done on a single chip (approximately 100 cm in diameter) thus reducing the reagent usage as well as cost.[citation needed] In some instances researchers[who?] have shown that they can increase the through-put of conventional sequencing through the use of microchips.[citation needed] Research will still need to be done in order to make this use of technology effective.

Other sequencing technologies

Sequencing by hybridization is a non-enzymatic method that uses a DNA microarray. A single pool of DNA whose sequence is to be determined is fluorescently labeled and hybridized to an array containing known sequences. Strong hybridization signals from a given spot on the array identifies its sequence in the DNA being sequenced.[20] Mass spectrometry may be used to determine mass differences between DNA fragments produced in chain-termination reactions.[21]

DNA sequencing methods currently under development include labeling the DNA polymerase,[22] reading the sequence as a DNA strand transits through nanopores,[23][24] and microscopy-based techniques, such as AFM or electron microscopy that are used to identify the positions of individual nucleotides within long DNA fragments (>5,000 bp) by nucleotide labeling with heavier elements (e.g., halogens) for visual detection and recording.[25]

In October 2006, the X Prize Foundation established an initiative to promote the development of full genome sequencing technologies, called the Archon X Prize, intending to award $10 million to "the first Team that can build a device and use it to sequence 100 human genomes within 10 days or less, with an accuracy of no more than one error in every 100,000 bases sequenced, with sequences accurately covering at least 98% of the genome, and at a recurring cost of no more than $10,000 (US) per genome."[26]

Sanger sequencing

Part of a radioactively labelled sequencing gel

In chain terminator sequencing (Sanger sequencing), extension is initiated at a specific site on the template DNA by using a short oligonucleotide 'primer' complementary to the template at that region. The oligonucleotide primer is extended using a DNA polymerase, an enzyme that replicates DNA. Included with the primer and DNA polymerase are the four deoxynucleotide bases (DNA building blocks), along with a low concentration of a chain terminating nucleotide (most commonly a di-deoxynucleotide). Limited incorporation of the chain terminating nucleotide by the DNA polymerase results in a series of related DNA fragments that are terminated only at positions where that particular nucleotide is used. The fragments are then size-separated by electrophoresis in a slab polyacrylamide gel, or more commonly now, in a narrow glass tube (capillary) filled with a viscous polymer.

View of the start of an example dye-terminator read (click to expand)

An alternative to the labelling of the primer is to label the terminators instead, commonly called 'dye terminator sequencing'. The major advantage of this approach is the complete sequencing set can be performed in a single reaction, rather than the four needed with the labeled-primer approach. This is accomplished by labelling each of the dideoxynucleotide chain-terminators with a separate fluorescent dye, which fluoresces at a different wavelength. This method is easier and quicker than the dye primer approach, but may produce more uneven data peaks (different heights), due to a template dependent difference in the incorporation of the large dye chain-terminators. This problem has been significantly reduced with the introduction of new enzymes and dyes that minimize incorporation variability.

This method is now used for the vast majority of sequencing reactions as it is both simpler and cheaper. The major reason for this is that the primers do not have to be separately labelled (which can be a significant expense for a single-use custom primer), although this is less of a concern with frequently used 'universal' primers.

Sanger Method for DNA Sequencing

DNA sequencing, first devised in 1975, has become a powerful technique in molecular biology, allowing analysis of genes at the nucleotide level. For this reason, this tool has been applied to many areas of research.

For example, the polymerase chain reaction (PCR), a method which rapidly produces numerous copies of a desired piece of DNA, requires first knowing the flanking sequences of this piece. Another important use of DNA sequencing is identifying restriction sites in plasmids. Knowing these restriction sites is useful in cloning a foreign gene into the plasmid. Before the advent of DNA sequencing, molecular biologists had to sequence proteins directly; now amino acid sequences can be determined more easily by sequencing a piece of cDNA and finding an open reading frame. In eukaryotic gene expression, sequencing has allowed researchers to identify conserved sequence motifs and determine their importance in the promoter region. Furthermore, a molecular biologist can utilize sequencing to identify the site of a point mutation. These are only a few examples illustrating the way in which DNA sequencing has revolutionized molecular biology.

Dideoxynucleotide sequencing represents only one method of sequencing DNA. It is commonly called Sanger sequencing since Sanger devised the method. This technique utilizes 2',3'-dideoxynucleotide triphospates (ddNTPs), molecules that differ from deoxynucleotides by the having a hydrogen atom attached to the 3' carbon rather than an OH group. (Figure 1). These molecules terminate DNA chain elongation because they cannot form a phosphodiester bond with the next deoxynucleotide.

In order to perform the sequencing, one must first convert double stranded DNA into single stranded DNA. This can be done by denaturing the double stranded DNA with NaOH. A Sanger reaction consists of the following: a strand to be sequenced (one of the single strands which was denatured using NaOH), DNA primers (short pieces of DNA that are both complementary to the strand which is to be sequenced and radioactively labelled at the 5' end), a mixture of a particular ddNTP (such as ddATP) with its normal dNTP (dATP in this case), and the other three dNTPs (dCTP, dGTP, and dTTP). The concentration of ddATP should be 1% of the concentration of dATP. The logic behind this ratio is that after DNA polymerase is added, the polymerization will take place and will terminate whenever a ddATP is incorporated into the growing strand. If the ddATP is only 1% of the total concentration of dATP, a whole series of labeled strands will result (Figure 1). Note that the lengths of these strands are dependent on the location of the base relative to the 5' end.

This reaction is performed four times using a different ddNTP for each reaction. When these reactions are completed, a polyacrylamide gel electrophoresis (PAGE) is performed. One reaction is loaded into one lane for a total of four lanes (Figure 2). The gel is transferred to a nitrocellulose filter and autoradiography is performed so that only the bands with the radioactive label on the 5' end will appear. In PAGE, the shortest fragments will migrate the farthest. Therefore, the bottom-most band indicates that its particular dideoxynucleotide was added first to the labeled primer. In Figure 2, for example, the band that migrated the farthest was in the ddATP reaction mixture. Therefore, ddATP must have been added first to the primer, and its complementary base, thymine, must have been the base present on the 3' end of the sequenced strand. One can continue reading in this fashion. Note in Figure 2 that if one reads the bases from the bottom up, one is reading the 5' to 3' sequence of the strand complementary to the sequenced strand. The sequenced strand can be read 5' to 3' by reading top to bottom the bases complementary to the those on the gel.

Figure 1. This figure shows the structure of a dideoxynucleotide (notice the H atom attached to the 3' carbon). Also depicted in this figure are the ingredients for a Sanger reaction. Notice the different lengths of labeled strands produced in this reaction.

Figure 2. This figure is a representation of an acrylamide sequencing gel. Notice that the sequence of the strand of DNA complementary to the sequenced strand is 5' to 3' ACGCCCGAGTAGCCCAGATT while the sequence of the sequenced strand, 5' to 3', is AATCTGGGCTACTCGGGCGT.

Figure 7-29. Sanger (dideoxy) method for sequencing DNA fragments. (a) A single strand of the DNA to be sequenced (blue line) is hybridized to a 5′-end-labeled synthetic deoxyribonucleotide primer. The primer is elongated in four separate reaction mixtures containing the four normal deoxyribonucleoside triphosphates (dNTPs) plus one of the four dideoxyribonucleoside triphosphates (ddNTPs) in a ratio of 100 to 1. A ddNTP molecule can add at the position of the corresponding normal dNTP, but when this occurs, chain elongation stops because the ddNTP lacks a 3′ hydroxyl. In time, each reaction mixture will contain a mixture of prematurely terminated chains ending at every occurrence of the ddNTP (yellow). (b) Three of the labeled chains that would be generated in the presence of ddGTP from the specific DNA sequence shown in blue. (c) An actual autoradiogram of a polyacrylamide gel in which more than 300 bases can be read. Each reaction was carried out in duplicate using Sequenase™, a commercial preparation of the DNA polymerase from bacteriophage T7. [Part (c) courtesy of United States Biochemical Corporation.]

Microfluidic Sanger Sequencing

The completion of the Human Genome Project (HGP) has been a cornerstone in the advancement of biological studies. The outcomes of obtaining a complete reference map (including the sequence) of the human genome have ushered in the post-genome era of studies. Genomics will (if it hasn’t already) revolutionize medicine, forensics, molecular biology, biotechnology, and many other related and even unrelated disciplines in the future.[1][2]

Sequencing of DNA has largely been based on dideoxy chain termination developed by Sanger et al. [3]. However, the ability of the HGP in obtaining the full human genomic sequence meant that modifications were

required to be made to this method. In particular, the incorporation of technological innovation, making sequencing automated and high-throughput, made this decade-long worldwide effort successful [4].

Briefly, in its modern inception, high-throughput genome sequencing (also referred to as Whole Genome Shot-gun Sequencing) involves fragmenting the genome into small single-stranded pieces, followed by amplification of the fragments by Polymerase Chain Reaction (PCR). Adopting the Sanger method, each DNA fragment is irreversibly terminated with the incorporation of a fluorescently labeled dideoxy chain-terminating nucleotide, thereby producing a DNA “ladder” of fragments that each differ in length by one base and bear a base-specific fluorescent label at the terminal base. Amplified base ladders are then separated by Capillary Array Electrophoresis (CAE) with automated, in situ “finish-line” detection of the fluorescently labeled ssDNA fragments, which provides an ordered sequence of the fragments. These sequence reads are then computer assembled into overlapping or contiguous sequences (termed "contigs") which resemble the full genomic sequence once fully assembled.[5]

Rapid technological developments have now emerged as a result of the HGP. In particular Massively Parallel Sequencing approaches such as those now in wide commercial use (Illumins/Solexa, Roche/454 Pyrosequencing, and ABI SOLiD) are proving to be attractive tools for sequencing.

Typically MPS methods can only obtain short read lengths (35bp with Illumina platforms to a maximum of 200-300bp by 454 Pyrosequencing). Sanger Methods on the other hand achieve read lengths of approximately 800bp (typically 500-600bp with non-enriched DNA). The longer read lengths in Sanger methods display significant advantages over MPS tools especially in terms of sequencing repetitive regions of the genome. A challenge of short-read sequence data is particularly an issue in sequencing new genomes (de novo) and in sequencing highly rearranged genome segments, typically those seen of cancer genomes or in regions of chromosomes that exhibit structural variation.[6]

Microfluidic Sanger Sequencing

Microfluidic Sanger sequencing is a lab-on-a-chip application for DNA sequencing, in which the Sanger sequencing steps (thermal cycling, sample purification, and capillary electrophoresis) are integrated on a wafer-scale chip using nanoliter-scale sample volumes. This technology generates long and accurate sequence reads, while obviating many of the significant shortcomings of the conventional Sanger method (e.g. high consumption of expensive reagents, reliance on expensive equipment, personnel-intensive manipulations, etc.) by integrating and automating the Sanger sequencing steps.

Applications of Microfluidic Sequencing Technologies

Other useful applications of DNA sequencing include single nucleotide polymorphism (SNP) detection, single-strand conformation polymorphism (SSCP) hetroduplex analysis, and short tandem repeat (STR) analysis. Resolving DNA fragments according to differences in size and/or conformation is the most critical step in studying these features of the genome[5].

Device design

A microfluidic sequencing chip developed by Richard Mathies and colleagues (University of California, Berkeley)[7].

The sequencing chip has a four-layer construction, consisting of three 100-mm-diameter glass wafers (on which device elements are microfabricated) and a polydimethylsiloxane (PDMS) membrane. Reaction chambers and capillary electrophoresis channels are etched between the top two glass wafers, which are thermally bonded. Three-dimensional channel interconnections and microvalves are formed by the PDMS and bottom manifold glass wafer.

The device consists of three functional units, each corresponding to the Sanger sequencing steps. The Thermal Cycling (TC) unit is comprised of a 250-nanoliter reaction chamber with integrated resistive temperature

detector, microvalves, and a surface heater. Movement of reagent between the top all-glass layer and the lower glass-PDMS layer occurs through 500-μm-diameter via-holes. After thermal-cycling, the reaction mixture undergoes purification in the capture/purification chamber, and then is injected into the capillary electrophoresis (CE) chamber. The CE unit consists of a 30-cm capillary which is folded into a compact switchback pattern via 65-μm-wide turns.

Sequencing chemistry

Thermal cycling

In the TC reaction chamber, dye-terminator sequencing reagent, template DNA, and primers are loaded into the TC chamber and thermal-cycled for 35 cycles ( at 95°C for 12 seconds and at 60°C for 55 seconds).

Purification

The charged reaction mixture (containing extension fragments, template DNA, and excess sequencing reagent) is conducted through a capture/purification chamber at 30°C via a 33-Volts/cm electric field applied between capture outlet and inlet ports. The capture gel through which the sample is driven, consists of 40 μM of oligonucleotide (complementary to the primers) covalently bound to a polyacrylamide matrix. Extension fragments are immobilized by the gel matrix, and excess primer, template, free nucleotides, and salts are eluted through the capture waste port. The capture gel is heated to 67-75°C to release extension fragments.

Capillary electrophoresis

Extension fragments are injected into the CE chamber where they are electrophoresed through a 125-167-V/cm field.

Platforms

The Apollo 100 platform (Microchip Biotechnologies Inc., Dublin, CA)[8] integrates the first two Sanger sequencing steps (thermal cycling and purification) in a fully automated system. The manufacturer claims that samples are ready for capillary electrophoresis within three hours of the sample and reagents being loaded into the system. The Apollo 100 platform requires sub-microliter volumes of reagents.

Comparisons to other sequencing techniques

The ultimate goal of high-throughput sequencing is to develop systems that are low-cost, and extremely efficient at obtaining extended (longer) read lengths. Longer read lengths of each single electrophoretic separation, substantially reduces the cost associated with de novo DNA

The Sanger Method

By Sarah Obenrader

_____________________________________________

This web page was produced as an assignment for an undergraduate course at Davidson College.

Background Information

DNA sequencing enables us to perform a thorough analysis of DNA because it provides us with the most basic information of all: the sequence of nucleotides. With this knowledge, for example, we can locate regulatory and gene sequences, make comparisons between homologous genes across species and identify mutations. Scientists recognized that this could potentially be a very powerful tool, and so there was competition to create a method that would sequence DNA. Then in 1974, two methods were independently developed by an American team and an English team to do exactly this. The Americans, lead by Maxam and Gilbert, used a “chemical cleavage protocol”, while the English, lead by Sanger, designed a procedure similar to the natural process of DNA replication. Even though both teams shared the 1980 Nobel Prize, Sanger’s method became the standard because of its practicality (Speed, 1992).

Sanger’s method, which is also referred to as dideoxy sequencing or chain termination, is based on the use of dideoxynucleotides (ddNTP’s) in addition to the normal nucleotides (NTP’s) found in DNA. Dideoxynucleotides are essentially the same as nucleotides except they contain a hydrogen group on the 3’ carbon instead of a hydroxyl group (OH). These modified nucleotides, when integrated into a sequence, prevent the addition of further nucleotides. (Speed, 1992).This occurs because a phosphodiester bond cannot form between the dideoxynucleotide and the next incoming nucleotide, and thus the DNA chain is terminated.

The Method

Before the DNA can be sequenced, it has to be denatured into single strands using heat. Next a primer is annealed to one of the template strands. This primer is specifically constructed so that its 3' end is located next to the DNA sequence of interest. Either this primer or one of the nucleotides should be radioactively or fluorescently labeled so that the final product can be detected on a gel (Russell, 2002). Once the primer is

attached to the DNA, the solution is divided into four tubes labeled "G", "A", "T" and "C". Then reagents are added to these samples as follows:

"G" tube: all four dNTP's, ddGTP and DNA polymerase

"A" tube: all four dNTP's, ddATP and DNA polymerase

"T" tube: all four dNTP's, ddTTP and DNA polymerase

"C" tube: all four dNTP's, ddCTP and DNA polymerase

As shown above, all of the tubes contain a different ddNTP present, and each at about one-hundreth the concentration of the the normal precursors (Russell, 2002). As the DNA is synthesized, nucleotides are added on to the growing chain by the DNA polymerase. However, on occasion a dideoxynucleotide is incorporated into the chain in place of a normal nucleotide, which results in a chain-terminating event. For example if we looked at only the "G" tube, we might find a mixture of the following products:

Figure 1: An example of the potential fragments that could be produced in the "G" tube. The fragments are all different lengths due to the random integration of the ddGTP's (Metzenberg).

The key to this method, is that all the reactions start from the same nucleotide and end with a specific base. Thus in a solution where the same chain of DNA is being synthesized over and over again, the new chain will terminate at all positions where the nucleotide has the potential to be added because of the integration of the dideoxynucleotides (Russell, 2002). In this way, bands of all different lengths are produced. Once these reactions are completed, the DNA is once again denatured in preparation for electrophoresis. The contents of each of the four tubes are run in separate lanes on a polyacrylmide gel in order to separate the different sized bands from one another. After the contents have been run across the gel, the gel is then exposed to either UV light or X-Ray, depending on the method used for labeling the DNA.

Figure 2: This is a polyacrylmide gel of the reactions in the "G" tube (the same sequences seen in figure 1). The longer fragments of DNA traveled shorter distances than the smaller fragments because of their heavier molecular weight.The blue section indicates the primer, the black section indicates the newly synthesized strand and the red denotes a ddGTP, which terminated the chain (Metzenberg).

As shown in Figure 2, smaller fragments are produced when the ddNTP is added closer to the primer because the chains are smaller and therefore migrate faster across the gel. If all of the reactions from the four tubes are combined on one gel, the actual DNA sequence in the 5' to 3' direction can be determined by reading the banding pattern from the bottom of the gel up. It is important to remember though that this sequence is complementary to the template strand from the beginning.

Figure 3: This is an autoradiogram of a dideoxy sequencing gel. The letters over the lanes indicate which dideoxy nucleotide was used in the sample being represented by that lane. When you read from the bottom up, you are reading the complementary sequence of the template strand (Metzenberg).

Automated Sequencing

With the many advancements in technology that we have achieved since 1974, it is no surprise that the Sanger method has become outdated. However, the new technology that has emerged to replace this method is based on the same principles of Sanger's method. Automated sequencing has been developed so that more DNA can be sequenced in a shorter period of time. With the automated procedures the reactions are performed in a single tube containing all four ddNTP's, each labeled with a different color dye (Russell, 2002).

Figure 4: In automated sequencing, the oligonucleotide primers can be "end-labeled" with different color dyes, one for each ddNTP. These dyes fluoresce at different wavelengths, which are read via a machine (Metzenberg).

As in Sanger's method, the DNA is separated on a gel, but they are all run on the same lane as opposed to four different ones.

Figure 5: Results of gel electrophoresis for the dye labeled DNA in automated sequencing. The image on the left shows what the gel looks like if the four reactions are run in different lanes, as opposed to the image on the right which shows a gel where all the DNA is run in one lane (Metzenberg).

Since the four dyes fluoresce at different wavelengths, a laser then reads the gel to determine the identity of each band according to the wavelengths at which it fluoresces. The results are then depicted in the form of a chromatogram, which is a diagram of colored peaks that correspond to the nucleotide in that location in the sequence (Russell, 2002).

Figure 6: Results from an automated sequence shown in the form of a chromatogram. The colors represent the four bases: blue is C, green is A, black is G and red is T (Metzenberg).

Sanger Dideoxy Method

This method is based on DNA replication. DNA replication terminated at different sites will produce DNA fragments of variable lengths.

Controlled termination at specific sites can be achieved with the use of 2’,3’-dideoxy analog of the four nucleotides

                            

                            

These analogs lack the 3’- OH group required to form the next phosphodiester bond with the incoming nucleotide.

DNA replication terminates at the site where a dideoxy analog is incorporated.

          

            

DNA replication is performed in four separate tubes, each containing:

               i.         Single stranded DNA to be sequenced

              ii.         DNA polymerase

            iii.         Primers

            iv.         The four dNTPs (dATP, dCTP, dTTP and dGTP)

              v.         Small amount of one of the four 2’,3’-dideoxy analog (ddATP or ddCTP or ddTTP or ddGTP)

Either the primers or the dNTPs are radiolabeled with 32P .

       The amount of dideoxy analog added is small enough (~1% of total dNTP) that termination will occur only occasionally.

       The correct nucleotide will be inserted sometimes and the dideoxy analog other times.

       In this way, all possible DNA fragments will be produced.

               

              http://www.mcb.mcgill.ca/~hallett/GEP/Lecture15/Image31.gif

       The products of all four reactions will be separated by gel electrophoresis in four separate lanes.

       Polyacrylamide gels are used to separate fragments containing up to 1000bp.

       More porous agarose gels are used to resolve mixtures of larger fragments, up to 20kb.

       Smaller DNA fragments runs faster (towards the positive electrode as DNA is negatively charged) and appear at the bottom of the gel.

       The base sequence of the new DNA is read from the autoradiogram of the gel in 5’ 3’direction starting from the smallest fragment.

Fluorescent Detection of Oligonucleotides

       Fluorescent detection is a highly effective alternative for visualizing DNA.

       This method eliminates the use of radioactive reagents and can be readily automated.

       Either fluorescent-tagged terminators (dideoxy analogs) or florescent-tagged primers can be used.

       When using fluorescent-tagged terminators, each of the four dideoxy nucleotides should carry a tag with a different color.

       If fluorescent-tagged primer is the choice, the primers in each of the four separate mixtures should carry tags of different colors.

       The DNA mixture will be separated by gel electrophoresis.

       Lasers will be used to activate the fluorescent dideoxy analogs or primers and a detector to distinguish the colors.

       The last base incorporated into the DNA can be determined from the color of the detected DNA fragment.

                                

                            Adapted from: http://www.licor.com/bio/Images/IR2Schem.jpg

                               

                             http://www.dls.ym.edu.tw/ol_biology2/ultranet/FluorDideoxySeq.gif

New Developments

More robust and high-throughput methods are currently being developed to meet the need of whole genome projects, where millions of bases need to be sequenced.

One of the growing techniques developed is pyrosequencing.

Pyrosequencing is based on the detection of released pyrophosphate (PPi) during DNA synthesis.

DNA template can be immobilized on a solid phase and the four nucleotides are added in a stepwise fashione; only one of the four dNTP in the reaction mixture.

If the nucleotide is incorporated, PPi is released.

Subsequent addition of ATP sulfurylase converts PPi to ATP, which provides energy for luciferase to generate light.

The light is easily detected by a photodiode, photomultiplier tube, or a charge-coupled device camera (CCD) camera.

Because the added nucleotide is known, the sequence of nucleotide can be determined.

              

                                        Genome Research Vol. 11, Issue 1, 3-11, January 2001

The methods described in this chapter provide some useful approaches for DNA sequencing of templates produced by PCR. These procedures have been employed successfully for large-scale DNA sequencing of cosmid fragments subcloned in plasmid or M13 vectors, and for sequence analysis of cDNAs cloned in bacteriophage lambda vectors. In addition, the method describing direct sequencing from PEG-precipitated PCR product has been used successfully for analysis of Caenorhabditis elegans genomic and cDNA sequences. It is important to reiterate that for every combination of amplification primer pair and target DNA, there is an optimal method for PCR amplification; the ability to sequence the products of any PCR experiment directly will also vary. A coupled PCR/DNA sequencing method that works well for one experimental system may work

quite poorly with others. Hence, a few days or hours spent optimizing PCR amplification conditions and selecting the best DNA sequencing method for the target DNA of interest will be time well spent.

Radiolabeled sequencing gel preparation, loading, and electrophoresis (26,29)

To prepare polyacrylamide gels for DNA sequencing, the appropriate amount of urea is dissolved by heating in water and electrophoresis buffer, the respective amount of deionized acrylamide-bisacrylamide solution is added, and ammonium persulfate and TEMED are added to initiate polymerization. Immediately after the addition of the polymerizing agents, the gel solution is poured between two glass plates, taped together and separated by thin spacers corresponding to the desired thickness of the gel, taking care to avoid and eliminate air bubbles. Prior to taping, these glass plates are cleaned with Alconox detergent and hot water, are rinsed with double distilled water, and dried with a Kimwipe. Typically, the notched glass plate is treated with a silanizing reagent and then rinsed with double distilled water. After pouring, the gel immediately is laid horizontally and a well forming comb is inserted into the gel and held in place by metal clamps. The polyacrylamide gels are allowed to polymerize for at least 30 minutes prior to use. After polymerization, the comb and the tape at the bottom of the gel are removed. The vertical electrophoresis apparatus is assembled by clamping the top and bottom buffer wells onto the gel, and adding running buffer to the buffer chambers. The wells are cleaned by circulating buffer into the wells with a syringe and, immediately prior to the loading of each sample, the urea in each well is suctioned out with a mouth pipette.

Each base-specific sequencing reaction terminated with the short termination mix is loaded using a mouth pipette onto a 0.15 mm X 50 cm X 20 cm, denaturing 5% polyacrylamide gel and electrophoresed for 2.25 hours at 22 mA. The reactions terminated with the long termination mix typically are divided in half and loaded onto two 0.15 mm X 70 cm X 20 cm denaturing 4% polyacrylamide gels. One gel is electrophoresed at 15 mA for 8-9 hours and the other is electrophoresed for 20-24 hours at 15 mA. After electrophoresis, the glass plates are separated and the gel is blotted to Whatman paper, covered with plastic wrap, dried by heating on a Hoefer vacuum gel drier, and exposed to X-ray film. Depending on the intensity of the signal and whether the radiolabel is 32-P or 35-S, exposure times varied from 4 hours to several days. After exposure, the films are developed by processing in developer and fixer solutions, rinsed with water, and air dried. The autoradiogram then is placed on a light-box and the sequence is manually read and the data typed into a computer.

C. Taq-polymerase catalyzed cycle sequencing using fluorescent-labeled dye primers (10,26)

Each base-specific fluorescent-labeled cycle sequencing reaction routinely included approximately 100 or 200 ng Biomek isolated single-stranded DNA for A and C or G and T reactions, respectively. Double-stranded cycle sequencing reactions similarly contained approximately 200 or 400 ng of plasmid DNA, isolated using either the standard alkaline lysis or the diatomaceous earth modified alkaline lysis procedures. All reagents except template DNA are added in one pipetting step from a premix of previously aliquotted stock solutions stored at -20degC (see Appendix B). To prepare the reaction premixes, reaction buffer is combined with the base-specific nucleotide mixes. Prior to use, the base-specific reaction premixes are thawed and combined with diluted Taq DNA polymerase and the individual fluorescent end-labeled universal primers (see Appendix C) to yield the final reaction mixes, that are sufficient for 24 template samples.

Once the above mixes are prepared, four aliquots of single or double-stranded DNA are pipetted into the bottom of each 0.2 ml thin-walled reaction tube, corresponding to the A, C, G, and T reactions, and then an aliquot of the respective reaction mixes is added to the side of each tube. These tubes are part of a 96-tube/retainer set tray in a microtiter plate format, which fits into a Perkin Elmer Cetus Cycler 9600. Strip caps are sealed onto the tube/retainer set and the plate is centrifuged briefly. The plate then is placed in the cycler whose heat block had been preheated to 95deg C, and the cycling program immediately started. The cycling protocol consisted of 15-30 cycles of seven-temperatures:

95degC denaturation

55degC annealing

72degC extension

95degC denaturation

72degC extension

95degC denaturation, and

72degC extension, linked to a 4deg C final soak file.

At this stage, the reactions frequently are frozen and stored at -20degC for up to several days. Prior to pooling and precipitation, the plate is centrifuged briefly to reclaim condensation. The primer and base-specific reactions are pooled into ethanol, and the DNA is precipitated and dried. These sequencing reactions could be stored for several days at -20degC.

D. Taq-polymerase catalyzed cycle sequencing using fluorescent-labeled dye terminator reactions

One of the major problems in DNA cycle sequencing is that when fluorescent primers (1) are used the reaction conditions are such that the nested fragment set distribution is highly dependent upon the template concentration in the reaction mix. We have recently observed that the nested fragment set distribution for the DNA cycle sequencing reactions using the fluorescent labelled terminators (8) is much less sensitive to DNA concentration than that obtained with the fluorescent labelled primer reactions as described above. In addition, the fluorescent terminator reactions require only one reaction tube per template while the fluorescent labelled primer reactions require one reaction tube for each of the four terminators. This latter point allows the fluorescent labelled terminator reactions to be pipetted easily in a 96 well format. The protocol used, as described below, is easily interfaced with the 96 well template isolation and 96 well reaction clean-up procedures also described herein. By performing all three of these steps in a 96 well format, the overall procedure is highly reproducable and therefore less error prone.

E. Sequenase[TM] catalyzed sequencing with dye-labeled terminators (29-32)

Single-stranded dye-terminator reactions required approximately 2 ug of phenol extracted M13-based template DNA. The DNA is denatured and the primer annealed by incubating DNA, primer, and buffer at 65degC. After the reaction cooled to room temperature, alpha-thio-deoxynucleotides, fluorescent-labeled dye-terminators, and diluted Sequenase[TM] DNA polymerase are added and the mixture is incubated at 37degC. The reaction is stopped by adding ammonium acetate and ethanol, and the DNA fragments are precipitated and dried. To aid in the removal of unincorporated dye-terminators, the DNA pellet is rinsed twice with ethanol. The dried sequencing reactions could be stored up to several days at -20degC.

Double-stranded dye-terminator reactions required approximately 5 ug of diatomaceous earth modified-alkaline lysis midi-prep purified plasmid DNA. The double-stranded DNA is denatured by incubating the DNA in sodium hydroxide at 65degC, and after incubation, primer is added and the reaction is neutralized by adding an acid-buffer. Reaction buffer, alpha-thio-deoxynucleotides, fluorescent-labeled dye-terminators, and diluted Sequenase[TM] DNA polymerase then are added and the reaction is incubated at 37degC. Ammonium acetate is added to stop the reaction and the DNA fragments similarly are precipitated, rinsed, dried, and stored.

F. Fluorescent-labeled sequencing gel preparation, pre-electrophoresis, sample loading, electrophoresis, data collection, and analysis on the ABI 373A DNA sequencer

Polyacrylamide gels for fluorescent DNA sequencing are prepared as described above except that the gel mix is filtered prior to polymerization. Optically-ground, low fluorescence glass plates are carefully cleaned with hot water, distilled water, and ethanol to remove potential fluorescent contaminants prior to taping. Denaturing 6% polyacrylamide gels are poured into 0.3 mm X 89 cm X 52 cm taped plates and fitted with 36 well forming combs. After polymerization, the tape and the comb are removed from the gel and the outer surfaces of the glass plates are cleaned with hot water, and rinsed with distilled water and ethanol. The gel is assembled into an ABI sequencer, and the checked by laser-scanning. If baseline alterations are observed on the ABI-associated Macintosh computer display, the plates are recleaned. Subsequently, the buffer wells are attached, electrophoresis buffer is added, and the gel is pre-electrophoresed for 10-30 minutes at 30 W.

Prior to sample loading, the pooled and dried reaction products are resuspended in formamide/EDTA loading buffer by vortexing and then heated at 90degC. A sample sheet is created within the ABI data collection software on the Macintosh computer which indicated the number of samples loaded and the fluorescent-labeled

mobility file to use for sequence data processing. After cleaning the sample wells with a syringe, the odd-numbered sequencing reactions are loaded into the respective wells using a micropipettor equipped with a flat-tipped gel-loading tip. The gel then is electrophoresed for 5 minutes before the wells are cleaned again and the even numbered samples are loaded. The filter wheel used for dye-primers and dye-terminators is specified on the ABI 373A CPU, also where electrophoresis conditions are adjusted. Typically electrophoresis and data collection are for 10 hours at 30W on the ABI 373A that is fitted with a heat-distributing aluminum plate in contact with the outer glass gel plate in the region between the laser stop and the sample loading wells (26).

After data collection, an image file is created by the ABI software which related the fluorescent signal detected to the corresponding scan number. The software then determined the sample lane positions based on the signal intensities. After the lanes are tracked, the cross-section of data for each lane are extracted and processed by baseline subtraction, mobility calculation, spectral deconvolution, and time correction. On the Macintosh computer, the collected data can be viewed in several formats. The overall graphics image of the gel can be displayed to assess the accuracy of lane tracking, and the data from each sample lane can be viewed as either a four-color raw fluorescent signal versus scan number, as a chromatogram of processed sequence data, or as a string of nucleotides. After processing, the sequence data files are transferred to a SPARCstation 2 using NFS Share.

G. Double-stranded sequencing of cDNA clones containing long poly(A) tails using anchored poly(dT) primers

Sequencing double stranded DNA templates has become a common and efficient procedure (10) for rapidly obtaining sequence data while avoiding preparation of single stranded DNA. Double stranded templates of cDNAs containing long poly(A) tracts are difficult to sequence with vector primers which anneal downstream of the poly(A) tail. Sequencing with these primers results in a long poly(T) ladder followed by a sequence which is difficult to read. In an attempt to solve this problem we synthesized three primers which contain (dT)17 and either (dA) or (dC) or (dG) at the 3' end. We reasoned that the presence of these three bases at the 3' end would 'anchor' the primers at the upstream end of the poly(A) tail and allow sequencing of the region immediately upstream of the poly(A) region.

Using this protocol, over 300 bp of readable sequence could be obtained. We have applied this approach to several other poly(A)-containing cDNA clones with similar results. Sequencing of the opposite strand of these cDNAs using insert-specific primers occurred directly upstream of the poly(A) region. The ability to directly obtain sequence immediately upstream from the poly(A) tail of cDNAs should be of particular importance to large scale efforts to generate sequence-tagged sites (STSs) (11) from cDNAs (12,13).

H. cDNA sequencing based on PCR and random shotgun cloning

The following is a rapid and efficient method for sequencing cloned cDNAs based on PCR amplification (14), random shotgun cloning (1,3,15), and automated fluorescent sequencing (16). This method was developed in our laboratory because once the sequence of a genomic DNA containing cosmid is obtained and putative exons are predicted, the corresponding cDNAs should be sequenced in a timely manner. However, the presently implemented directed cDNA sequencing strategies, i.e. primer walking (17) and exonuclease III deletion (18), are both time consuming and labor intensive, while the alternative, i.e. randomly shearing the intact plasmid followed by shotgun sequencing (1,3,15), leads to a significant number of clones containing the original cDNA cloning vector rather than the desired cDNA insert.

This is a PCR-based approach where the "universal" forward and/or reverse priming sites were excluded from the resulting PCR product by choosing a primer pair that lay between the usual "universal" forward and reverse priming sites and the multiple cloning sites of the Stratagene Bluescript vector. These two PCR primers, with the sequence 5'-TCGAGGTCGACGGTATCG-3' for the forward or -16bs primer and 5'-GCCGCTCTAGAACTAG TG-3' for the reverse or +19bs primer, now have been used to amplify sufficient quantities of cDNA inserts in the 1.2 to 3.4 kb size range so that the random shotgun sequencing approach described below could be implemented.

Automated Fluorescent DNA Sequencing  -  Background

Automated fluorescent DNA sequencing using capillary DNA sequencing machines like the ABI/PRISM 3100 Genetic Analyzers in the CGC is based on the use of a different colored fluorescent dye for each of the four DNA bases. Attaching these dyes to the four dideoxynucleotide terminators used in the standard Sanger chain termination DNA sequencing reaction results in all the fragments ending in one of the four fluorescent dye-labeled terminator corresponding to the dideoxy bases. These fragments are then separated on a liquid denaturing gel pumped into each capillary. The fluorescence is detected as the fragments electrophorese through a transparent section of the capillary which runs in front of a CCD camera, while being excited with laser light. The use of four different dyes allows the sequencing reaction to be performed in a single tube and the resulting fragments to be loaded in a single well, resulting in greater sample capacity per “lane” as compared to what is possible with radioactive labeled fragments. The dyes can also be attached to the primer used to initiate the extension by a DNA polymerase. This method is used less frequently since dye-labeled primers are expensive to manufacture and four separate reactions must be set up for each sequence (they can still be separated in a single lane though). The resulting sequence is, however, generally cleaner than dye-terminator sequencing. The dye-terminator approach can be used with different types of DNA polymerases.

Sequenase (T7) polymerase has been a widely used enzyme for DNA sequencing with radioactive nucleotides because of its high processivity and low error rate. However using this type of enzyme for fluorescent DNA sequencing is not optimal because the amount of fluorescent DNA produced is low and thus a large amount of template is required to produce good fluorescent signals. The enzyme currently in general use for automated fluorescent DNA sequencing is a variant of Taq DNA polymerase. The thermostability of Taq polymerase allows the sequencing reactions to be carried out like PCR reactions, and the reactions are called "cycle sequencing" reactions. Cycle sequencing reactions are analogous to PCRs except only a single primer is used and only single stranded products are generated. The advantage of using the thermostable Taq polymerase over the use of Sequenase is that multiple rounds of sequencing can be performed without the need to add fresh enzyme. This allows the use of much less template DNA, making this the method of choice in the majority of circumstances. Modifications (point mutations affecting amino acid residues in or near the active site) to the Taq DNA polymerase have enabled it to incorporate the fluorescent dye-labeled terminators more evenly and efficiently, resulting in very even peak heights over a sequence read. Attempts have been made to develop polymerases which have some of the advantageous properties of Sequenase but are thermostable like Taq. One of these efforts is described in more detail at http://www.atp.nist.gov/eao/sp950-2/chapt3-2.htm   Another important innovation was the introduction of dideoxy terminators that use two fluorescent dye labels to take advantage of fluorescence resonance energy transfer (FRET) (i.e. ABI BigDye).  Taking advantage of FRET using the dual-labeled terminators has allowed improvement of signal intensity and spectral separation to the point where very small amounts of templates can be used. Fluorescent sequencing methods are now robust enough that sequencing large templates such as BACS, P1s, etc. that used to be impractical have become reasonably easy, and sequencing plasmids and PCR products is trivial.

A typical processed electropherogram generated using ABI BigDye v3.1 cycle sequencing chemistry and run on a 3100 is shown at right. The G bases are displayed in black, the A's in green, the C's in blue and the T's in red.

The picture below is a screen capture from the data collection software. You can see that the cycle sequencing reaction products from 16 wells of a 96-well plate are being electrophoresed simultaneously. The large pane at the bottom of the screen shows the fluorescent DNA bands which have been separated up to this point in each of

the 16 capillaries lined up next to each other. Since the 3100 is a 16 capillary machine, it takes 6 runs to process a full plate of samples.

Next Section

Abstract

DNA sequencing by synthesis (SBS) on a solid surface during polymerase reaction offers a paradigm to decipher DNA sequences. We report here the construction of such a DNA sequencing system using molecular engineering approaches. In this approach, four nucleotides (A, C, G, T) are modified as reversible terminators by attaching a cleavable fluorophore to the base and capping the 3′-OH group with a small chemically reversible moiety so that they are still recognized by DNA polymerase as substrates. We found that an allyl moiety can be used successfully as a linker to tether a fluorophore to 3′-O-allyl-modified nucleotides, forming chemically cleavable fluorescent nucleotide reversible terminators, 3′-O-allyl-dNTPs-allyl-fluorophore, for application in SBS. The fluorophore and the 3′-O-allyl group on a DNA extension product, which is generated by incorporating 3′-O-allyl-dNTPs-allyl-fluorophore in a polymerase reaction, are removed simultaneously in 30 s by Pd-catalyzed deallylation in aqueous buffer solution. This one-step dual-deallylation reaction thus allows the reinitiation of the polymerase reaction and increases the SBS efficiency. DNA templates consisting of homopolymer regions were accurately sequenced by using this class of fluorescent nucleotide analogues on a DNA chip and a four-color fluorescent scanner.

Shotgun sequencing

In genetics, shotgun sequencing, also known as shotgun cloning, is a method used for sequencing long DNA strands. It is named by analogy with the rapidly-expanding, quasi-random firing pattern of a shotgun.

Since the chain termination method of DNA sequencing can only be used for fairly short strands (100 to 1000 basepairs), longer sequences must be subdivided into smaller fragments, and subsequently re-assembled to give the overall sequence. Two principal methods are used for this: chromosome walking, which progresses through the entire strand, piece by piece, and shotgun sequencing, which is a faster but more complex process, and uses random fragments.

In shotgun sequencing [1] [2], DNA is broken up randomly into numerous small segments, which are sequenced using the chain termination method to obtain reads. Multiple overlapping reads for the target DNA are obtained by performing several rounds of this fragmentation and sequencing. Computer programs then use the overlapping ends of different reads to assemble them into a continuous sequence [1].

Example

For example, consider the following two rounds of shotgun reads:

Strand Sequence

Original AGCATGCTGCAGTCATGCTTAGGCTA

First shotgun sequenceAGCATGCTGCAGTCATGCT--------------------------TAGGCTA

Second shotgun sequence AGCATG--------------------

------CTGCAGTCATGCTTAGGCTA

Reconstruction AGCATGCTGCAGTCATGCTTAGGCTA

In this extremely simplified example, none of the reads cover the full length of the original sequence, however, the four reads can be assembled into the original sequence using the overlap of their ends to align and order them. In reality, this process uses enormous amounts of information that are rife with ambiguities and sequencing errors. Assembly of complex genomes is additionally complicated by the great abundance of repetitive sequence, meaning similar short reads could come from completely different parts of the sequence.

Many overlapping reads for each segment of the original DNA are necessary to overcome these difficulties and accurately assemble the sequence. For example, to complete the Human Genome Project, most of the human genome was sequenced at 12X or greater coverage; that is, each base in the final sequence was present, on average, in 12 reads. Even so, current methods have failed to isolate or assemble reliable sequence for approximately 1% of the (euchromatic) human genome.

Whole genome shotgun sequencing

Whole genome shotgun sequencing for small (4000 to 7000 basepair) genomes was already in use in 1979 [1] broader application benefited from pairwise end sequencing, known colloquially as double-barrel shotgun sequencing. As sequencing projects began to take on longer and more complicated DNAs, multiple groups began to realize that useful information could be obtained by sequencing both ends of a fragment of DNA. Although sequencing both ends of the same fragment and keeping track of the paired data was more cumbersome than sequencing a single end of two distinct fragments, the knowledge that the two sequences were oriented in opposite directions and were about the length of a fragment apart from each other was valuable in reconstructing the sequence of the original target fragment. The first published description of the use of paired ends was in 1990 [3] as part of the sequencing of the human HPRT locus, although the use of paired ends was limited to closing gaps after the application of a traditional shotgun sequencing approach. The first theoretical description of a pure pairwise end sequencing strategy, assuming fragments of constant length, was in 1991[4]. At the time, there was community consensus that the optimal fragment length for pairwise end sequencing would be three times the sequence read length. In 1995 Roach et al.[5] introduced the innovation of using fragments of varying sizes, and demonstrated that a pure pairwise end-sequencing strategy would be possible on large targets. The strategy was subsequently adopted by The Institute for Genomic Research (TIGR) to sequence the genome of the bacterium Haemophilus influenzae in 1995 [6] , and then by Celera Genomics to sequence the drosophila melanogaster (fruit fly) genome in 2000 [7], and subsequently the human genome.

To apply the strategy, high-molecular-weight DNA is sheared into random fragments, size-selected (usually 2, 10, 50, and 150 kb), and cloned into an appropriate vector. The clones are then sequenced from both ends using the chain termination method yielding two short sequences. Each sequence is called an end-read or read and two reads from the same clone are referred to as mate pairs. Since the chain termination method usually can only produce reads between 500 and 1000 bases long, in all but the smallest clones, mate pairs will rarely overlap.

The original sequence is reconstructed from the reads using sequence assembly software. First, overlapping reads are collected into longer composite sequences known as contigs. Contigs can be linked together into scaffolds by following connections between mate pairs. The distance between contigs can be inferred from the mate pair positions if the average fragment length of the library is known and has a narrow window of deviation.

Coverage

Coverage is the average number of reads representing a given nucleotide in the reconstructed sequence. It can be calculated from the length of the original genome (G), the number of reads(N), and the average read length(L) as NL / G. For example, a hypothetical genome with 2,000 base pairs reconstructed from 8 reads with an average length of 500 nucleotides will have 2x redundancy. This parameter also enables one to estimate other quantities, such as the percentage of the genome covered by reads (the coverage). A high coverage in shotgun sequencing is desired because it can overcome errors in base calling and assembly. The subject of DNA sequencing theory addresses the relationships of such quantities.

Proponents of this approach argue that it is possible to sequence the whole genome at once using large arrays of sequencers, which makes the whole process much more efficient than more traditional approaches. Detractors argue that although the technique quickly sequences large regions of DNA, its ability to correctly link these regions is suspect, particularly for genomes with repeating regions. As sequence assembly programs become more sophisticated and computing power becomes cheaper, it may be possible to overcome this limitation[citation

needed].

Next-Generation Sequencing

Although shotgun sequencing was the most advanced technique for sequencing genomes from about 1995-2005, other technologies started surfacing, called next-generation sequencing. These technologies produce shorter reads (anywhere from 25-500bp) but many hundreds of thousands or million reads in a relatively short time (on the order of a day). This results in high coverage, but the assembly process is much more computationally expensive. These technologies are vastly superior to shotgun sequencing due to the high volume of data and the relatively short time it takes to sequence a whole genome. The major disadvantage is that the accuracies are usually lower (although this is compensated by the high coverage).