New Generation Sequencing Technologies: an overview

Sequencing technologies – the next generation Paolo Dametto 30.08.2011

Upload: paolo-dametto

Post on 21-Nov-2014


DESCRIPTION

Adapted version of my technical Journal club presentation on the new sequencing technologies.

TRANSCRIPT

Page 1

Sequencing technologies –

the next generation

Paolo Dametto

30.08.2011

Page 2

1953: Discovery of the structure of the DNA double helix

Nobel Prize in Physiology or Medicine, 1962

Page 3

History of DNA sequencing

1953 Discovery of the structure of the DNA double helix

1972 Development of recombinant DNA technology, which permits isolation of defined fragments of DNA; prior to this, the only accessible samples for sequencing were from bacteriophage or virus DNA.

1977 The first complete DNA genome to be sequenced is that of bacteriophage φX174

1977 Frederick Sanger publishes "DNA sequencing with chain-terminating inhibitors"

1984 Medical Research Council scientists decipher the complete DNA sequence of the Epstein-Barr virus, 170 kb.

1987 Applied Biosystems markets the first automated sequencing machine, the model ABI 370.

1990 The U.S. National Institutes of Health (NIH) begins large-scale sequencing trials on Mycoplasma capricolum, Escherichia coli, Caenorhabditis elegans, and Saccharomyces cerevisiae

1995 Craig Venter, Hamilton Smith, and colleagues at The Institute for Genomic Research (TIGR) publish the first complete genome of a free-living organism, the bacterium Haemophilus influenzae. The circular chromosome contains 1,830,137 bases and its publication in the journal Science marks the first use of whole-genome shotgun sequencing, eliminating the need for initial mapping efforts.

1996 Pål Nyrén and his student Mostafa Ronaghi at the Royal Institute of Technology in Stockholm publish their method of pyrosequencing

1998 Phil Green and Brent Ewing of the University of Washington publish "phred" for sequencer data analysis.

2001 A draft sequence of the human genome is published

2004 454 Life Sciences markets a parallelized version of pyrosequencing. The first version of their machine reduced sequencing costs 6-fold compared to automated Sanger sequencing, and was the second of a new generation of sequencing technologies, after MPSS.

Page 4

Sanger sequencing: chain-terminating inhibitors

Page 5

A breakthrough: fluorescent chain-terminating inhibitors

Automated DNA sequencer

• Capillary electrophoresis
• Costs reduced by 90%
• Human operation 15 min/day/machine
• 1 million bp/day

3730xl DNA Analyzer

First generation DNA sequencer

• Manual preparation of acrylamide gels
• Manual loading of samples
• Reads of 500-600 bp
• 2.4 million bp/year (1,000 years needed to sequence the human genome)

ABI PRISM 377
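The dye-terminator principle behind these machines can be made concrete with a toy Python model (an illustration, not instrument software). It assumes one hypothetical dye colour per terminating ddNTP, no sequencing errors, and a template written 3'->5' so that the newly synthesised strand is simply its base-by-base complement; sorting the terminated fragments by length stands in for capillary electrophoresis.

```python
# Toy model of dye-terminator Sanger sequencing (illustration only).
# Assumptions: one hypothetical dye colour per ddNTP, no sequencing errors,
# and a template written 3'->5' so the new strand is its base-by-base complement.

DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}  # hypothetical colours
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def terminated_fragments(template: str) -> list[tuple[int, str]]:
    """Return (length, dye) for every chain-terminated product.

    A fragment of length n stops with the ddNTP complementary to template[n-1],
    so it carries that terminator's dye.
    """
    return [(n, DYE[COMPLEMENT[base]]) for n, base in enumerate(template, start=1)]


def read_capillary(fragments: list[tuple[int, str]]) -> str:
    """Order fragments by size (shortest first, as in electrophoresis) and decode dyes."""
    colour_to_base = {colour: base for base, colour in DYE.items()}
    return "".join(colour_to_base[colour] for _, colour in sorted(fragments))


if __name__ == "__main__":
    fragments = terminated_fragments("TACGGATC")
    fragments.reverse()  # the reaction mixture holds fragments in no particular order
    print(read_capillary(fragments))  # ATGCCTAG
```

Because each fragment carries its own colour, all four terminators can be read from a single capillary, which is what makes the automated, single-lane read-out on this slide possible.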

Page 6

Next-generation sequencing (NGS): newer methods for DNA sequencing

"The potential of NGS technologies is akin to the early days of PCR, with one's imagination being the primary limitation of its use" (Metzker ML, 2010, Nature Reviews Genetics)

NGS platforms produce an enormous volume of data cheaply, which expands the realm of experimentation beyond just determining the order of bases:

• gene-expression studies (RNA-seq): identification of rare transcripts without prior knowledge of a particular gene, and identification of alternative splicing
• large-scale comparative and evolutionary studies
• re-sequencing of human genomes to enhance our understanding of how genetic differences affect health and disease

Page 7

The variety of NGS features makes it likely that multiple platforms will coexist in the marketplace, with some having clear advantages over others for particular applications

NGS platforms differ in template preparation, sequencing and imaging, and data analysis

Commercially available technologies:
• Roche/454
• Illumina/Solexa
• Helicos BioSciences
• Life/APG – SOLiD system
• Pacific Biosciences
• Ion Torrent technology

Experimental: nanopore sequencing

NGS technologies overview

Page 8

Roche/454 - Pyrosequencing

1. Emulsion-based sample preparation (emPCR)

Several thousand copies of the same template sequence on each bead

On average 1.6 million wells

Page 9

2. Pyrosequencing: a non-electrophoretic bioluminescence method that measures the release of inorganic pyrophosphate by proportionally converting it into visible light through a series of enzymatic reactions

Roche/454 - Pyrosequencing

$(\mathrm{DNA})_n + \mathrm{dNTP} \xrightarrow{\text{DNA polymerase}} (\mathrm{DNA})_{n+1} + \mathrm{PP_i}$

Nucleotide incorporation generates light, seen as a peak in the Pyrogram trace

Video http://www.youtube.com/watch?v=kYAGFrbGl6E
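The light-per-flow idea can be sketched as a toy flowgram generator and base caller in Python. The cyclic dispensation order `TACG`, the noise-free integer signal and the rounding-based caller are simplifying assumptions for illustration; real Pyrogram analysis is considerably more involved.

```python
# Toy model of a pyrosequencing flowgram (Pyrogram) and its base calling.
# Assumptions: cyclic dispensation order "TACG" and a noise-free signal equal
# to the number of bases incorporated during each nucleotide flow.

FLOW_ORDER = "TACG"


def pyrogram(sequence: str, n_flows: int) -> list[int]:
    """Signal per flow for the strand being synthesised (one value per dNTP flow)."""
    signals = []
    pos = 0
    for flow in range(n_flows):
        nucleotide = FLOW_ORDER[flow % len(FLOW_ORDER)]
        run = 0
        # The polymerase extends through a whole homopolymer in a single flow,
        # releasing one PPi (and therefore light) per incorporated base.
        while pos < len(sequence) and sequence[pos] == nucleotide:
            run += 1
            pos += 1
        signals.append(run)
    return signals


def call_bases(signals: list[float]) -> str:
    """Convert flow intensities back to bases by rounding each peak height."""
    return "".join(FLOW_ORDER[i % len(FLOW_ORDER)] * round(s)
                   for i, s in enumerate(signals))


if __name__ == "__main__":
    flows = pyrogram("TTACGGGA", 8)
    print(flows)              # [2, 1, 1, 3, 0, 1, 0, 0]
    print(call_bases(flows))  # TTACGGGA
```

Rounding works while peaks are sharp; once the signal spread grows with homopolymer length, a peak of height 6.5 could be called as 6 or 7 bases, which is the homopolymer ambiguity that limits this platform.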

Page 10

Roche/454 - Pyrosequencing

3. Imaging

Sequencing and de novo assembly of the Mycoplasma genitalium genome:
• 25 million bases in one four-hour run
• 96% coverage at 99.96% accuracy
• 100-fold increase in throughput over current Sanger sequencing

Most errors result from a broadening of the signal distribution, particularly for long homopolymers (seven or more bases), leading to ambiguous base calls

Future directions:
• increasing throughput by miniaturization of the fibre-optic reactors
• improvements to reduce cross-talk between adjacent wells
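Accuracy figures such as the 99.96% quoted for the M. genitalium run are usually expressed on the logarithmic Phred scale associated with the phred base caller, Q = -10 log10(P_error). The small Python helpers below are our own illustration of that conversion, not part of phred itself.

```python
import math


def phred_quality(accuracy: float) -> float:
    """Q = -10 * log10(P_error), where P_error = 1 - accuracy."""
    return -10.0 * math.log10(1.0 - accuracy)


def error_probability(q: float) -> float:
    """Inverse conversion: P_error = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    for acc in (0.99, 0.9996, 0.99999):
        print(f"{acc:.3%} accurate -> Q{phred_quality(acc):.0f}")
```

On this scale 99.96% accuracy corresponds to roughly Q34, i.e. about one error in 2,500 bases.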

Page 11

Over 1300 publications...

Roche/454 - Pyrosequencing

Applications:
• Whole genome sequencing
• Targeted resequencing
• Sequencing-based transcriptome analysis
• Metagenomics

Page 12

Page 13

Illumina/Solexa

1. Solid-phase amplification can produce 100-200 million spatially separated clusters, providing free ends to which a universal sequencing primer can be hybridized to initiate the NGS reaction

Page 14

Sequencing by Cyclic Reversible Termination (CRT): CRT uses reversible terminators in a cyclic method that comprises nucleotide incorporation, fluorescence imaging and cleavage

1. A DNA polymerase bound to the primed template incorporates just one fluorescently modified nucleotide

2. Unincorporated nucleotides are washed away and four-color images are acquired by total internal reflection fluorescence (TIRF) using two lasers

3. A cleavage step (TCEP, a reducing agent) removes the terminating group and the fluorescent dye, restoring the 3'-OH group
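The incorporate-image-cleave loop can be sketched in Python for a single cluster. Because only one reversibly terminated nucleotide is added per cycle, each imaging step yields one base: the brightest of the four colour channels. The channel order, signal and background values are hypothetical, and real Illumina base calling also has to correct for effects such as channel cross-talk and phasing.

```python
# Toy model of cyclic reversible termination (CRT) read-out for one cluster.
# Hypothetical values: channel order A, C, G, T; signal 1.0; background 0.1.

BASES = "ACGT"

Intensities = tuple[float, float, float, float]


def simulate_cycles(sequence: str, signal: float = 1.0,
                    background: float = 0.1) -> list[Intensities]:
    """One four-colour image per cycle for the bases the polymerase adds."""
    images = []
    for base in sequence:
        # Incorporation: exactly one dye-labelled, 3'-blocked nucleotide is added.
        channels = [background] * 4
        channels[BASES.index(base)] = signal
        images.append(tuple(channels))
        # Cleavage (not modelled numerically) removes dye and terminator,
        # restoring the 3'-OH so the next cycle can add the next base.
    return images


def call_cycle(image: Intensities) -> str:
    """Call the base whose channel is brightest in this cycle's image."""
    return BASES[max(range(4), key=lambda i: image[i])]


def crt_read(images: list[Intensities]) -> str:
    """One base per imaging cycle, in synthesis order."""
    return "".join(call_cycle(image) for image in images)


if __name__ == "__main__":
    print(crt_read(simulate_cycles("GATTACA")))  # GATTACA
```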

Illumina/Solexa

Page 15

3. Imaging

Illumina/Solexa

Page 16

Paired reads are very powerful in all areas of the analysis because they provide very accurate read alignment and thus improve the accuracy and coverage of consensus sequence and SNP calling
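Why pairs help alignment can be illustrated with a simple concordance check: a read that falls in a repeat may align to several places, but usually only one placement is consistent with the position, strand and expected distance of its mate. The Python sketch below assumes a forward/reverse (FR) library with a hypothetical 200-600 bp insert-size window; the `Alignment` class and helper names are invented for the example and are not any aligner's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alignment:
    """Hypothetical minimal record of where one read aligned."""
    chrom: str
    start: int     # leftmost aligned position (0-based)
    end: int       # one past the rightmost aligned position
    forward: bool  # True if the read aligned to the + strand


def is_concordant(a: Alignment, b: Alignment,
                  min_insert: int = 200, max_insert: int = 600) -> bool:
    """Are two mates consistent with an FR library of the assumed insert size?"""
    if a.chrom != b.chrom or a.forward == b.forward:
        return False
    left, right = (a, b) if a.start <= b.start else (b, a)
    if not left.forward:  # FR layout: the leftmost mate points inwards (+ strand)
        return False
    return min_insert <= right.end - left.start <= max_insert


def resolve_placement(candidates: list[Alignment], mate: Alignment,
                      **insert_limits: int) -> Optional[Alignment]:
    """Pick the only candidate placement consistent with the mate, if unique."""
    hits = [c for c in candidates if is_concordant(c, mate, **insert_limits)]
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    mate = Alignment("chr1", 1000, 1100, True)
    repeat_copies = [Alignment("chr1", 50000, 50100, False),
                     Alignment("chr1", 1300, 1400, False)]
    print(resolve_placement(repeat_copies, mate))
```

Here the read has two equally good placements on its own, but only the copy 400 bp from its mate fits the assumed insert size, so the ambiguity is resolved.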

Illumina/Solexa

Video http://www.youtube.com/watch?v=77r5p8IBwJk

Page 17

Applications:
• DNA sequencing
• Gene regulation analysis
• Sequencing-based transcriptome analysis
• SNP and SV discovery
• Cytogenetic analysis
• ChIP-sequencing
• Small RNA discovery and analysis

Illumina/Solexa

A whole human genome sequence was determined in 8 weeks to an average depth of ~40X, discovering ~4 million new SNPs and ~400,000 SVs (with error rates <1% for both over-calls and under-calls)
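What "an average depth of ~40X" implies can be checked with back-of-the-envelope arithmetic. The sketch assumes a haploid genome size of about 3.1 Gb and a hypothetical read length of 100 bp, and uses the Lander-Waterman (Poisson) approximation, in which the expected fraction of bases never sequenced is e^(-c) at mean depth c. Real data deviate from this ideal because of repeats and coverage bias.

```python
import math

GENOME_SIZE = 3.1e9  # assumed approximate haploid human genome size (bp)


def bases_needed(genome_size: float, depth: float) -> float:
    """Total sequenced bases for a given mean depth (c = N * L / G)."""
    return genome_size * depth


def reads_needed(genome_size: float, depth: float, read_length: int) -> float:
    """Number of reads N of length L needed to reach depth c."""
    return bases_needed(genome_size, depth) / read_length


def fraction_uncovered(depth: float) -> float:
    """Lander-Waterman / Poisson estimate of bases with zero coverage: e^-c."""
    return math.exp(-depth)


if __name__ == "__main__":
    print(f"{bases_needed(GENOME_SIZE, 40):.3g} bases")                 # 1.24e+11 bases
    print(f"{reads_needed(GENOME_SIZE, 40, 100):.3g} reads of 100 bp")  # 1.24e+09 reads of 100 bp
    for c in (1, 5, 40):
        print(c, fraction_uncovered(c))
```

At 1X roughly a third of the genome is expected to be missed, while at 40X the idealised uncovered fraction is negligible, which is why deep coverage is needed before variants can be called with confidence.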

Whole human genome sequencing may become a clinical tool in the near future, helping to unravel the complexities of human variation in cancer and other diseases and paving the way for the use of personal genome sequences in medicine and healthcare

1861 publications...

Page 18

Helicos BioSciences

The use of PCR is problematic for two reasons:

1. PCR introduces an uncontrolled bias in template representation, because its efficiency varies as a function of template properties

2. PCR introduces errors (generating false-positive SNPs)

Single-molecule sequencing has been developed to circumvent these problems
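The first point can be quantified with a simple compound-growth model: if each PCR cycle multiplies a template by (1 + E), small differences in the efficiency E compound over many cycles. The efficiencies 0.9 and 0.8 and the 20 cycles below are hypothetical values chosen for illustration.

```python
def amplified_copies(initial: float, efficiency: float, cycles: int) -> float:
    """Copies after `cycles` rounds if each cycle multiplies by (1 + efficiency)."""
    return initial * (1 + efficiency) ** cycles


if __name__ == "__main__":
    # Hypothetical: two templates start at equal abundance but amplify
    # with different efficiencies (e.g. because of GC content or length).
    good = amplified_copies(1_000, efficiency=0.9, cycles=20)
    poor = amplified_copies(1_000, efficiency=0.8, cycles=20)
    print(f"fold over-representation: {good / poor:.2f}")  # about 2.95
```

In this model two templates that start in equal amounts end up about three-fold apart after 20 cycles, the kind of representation bias that PCR-free single-molecule methods avoid.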

Page 19

1. Template preparation: one-pass sequencing

The library preparation process is simple and fast and does not require the use of PCR. It results in single-stranded poly(dA)-tailed templates

Poly(dT) oligonucleotides are covalently anchored to a glass cover slip at random positions; they are used both to capture the template strands and as primers for sequencing

Helicos BioSciences

Page 20

Each cycle consists of:

1. adding the polymerase and one of the labeled nucleotides

2. rinsing and imaging multiple positions

3. cleavage of the dye labels

224 cycles were performed to sequence the genome of the M13 virus to an average depth of >150X with 100% coverage

Helicos BioSciences

2. Sequencing
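A deliberately simplified Python model shows how one-nucleotide-per-cycle chemistry behaves and why homopolymers are troublesome. Assumptions, all hypothetical: a fixed addition order `CTAG`, perfect incorporation, no terminating group, and imaging that only reports whether a labelled base was added in a cycle, so a homopolymer run is recorded as a single base.

```python
# Toy model of single-molecule sequencing with one labelled nucleotide per cycle.
# Assumptions (hypothetical): addition order "CTAG", perfect incorporation, and
# imaging that only reports whether a labelled base was added in a cycle.

CYCLE_ORDER = "CTAG"


def single_molecule_read(sequence: str, n_cycles: int) -> tuple[str, int]:
    """Return (observed read, number of bases actually synthesised)."""
    observed = []
    pos = 0
    for cycle in range(n_cycles):
        nucleotide = CYCLE_ORDER[cycle % len(CYCLE_ORDER)]
        start = pos
        # Without a terminator, every base of a homopolymer run is added at once.
        while pos < len(sequence) and sequence[pos] == nucleotide:
            pos += 1
        if pos > start:  # a spot lights up: one base recorded, run length lost
            observed.append(nucleotide)
    return "".join(observed), pos


if __name__ == "__main__":
    print(single_molecule_read("ACCCTG", 8))  # ('ACTG', 6)
```

In this model the 6-base molecule `ACCCTG` is read as `ACTG`, a two-base deletion caused by multiple incorporation in the homopolymer. Each cycle adds at most one kind of base, so read length depends on the number of cycles and on sequence composition rather than equalling the cycle count.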

Page 21

3. Imaging

Helicos BioSciences

The system showed higher error rates compared to the previous platforms, mostly due to multiple incorporations in the presence of homopolymers

Two-pass sequencing improved the overall quality

Page 22

Helicos BioSciences

Template preparation: two-pass sequencing

Page 23

ChIP-seq: Goren A et al. (2010). Chromatin profiling by directly sequencing small quantities of immunoprecipitated DNA. Nat Methods 7, 47-49.

Methyl-seq: Pastor WA et al. (2011). Genome-wide mapping of 5-hydroxymethylcytosine in embryonic stem cells. Nature 473(7347), 394-397 (2011 May 19; Epub 2011 May 8).

Direct RNA sequencing: Ozsolak F et al. (2010). Comprehensive polyadenylation site maps in yeast and human reveal pervasive alternative polyadenylation. Cell 143, 1018-1029.

cDNA-based DGE, RNA-Seq and small RNA sequencing:
Ting DT et al. (2011). Aberrant overexpression of satellite repeats in pancreatic and other epithelial cancers. Science 331, 593-596.
Lipson D et al. (2009). Quantification of the yeast transcriptome by single-molecule sequencing. Nat Biotechnol 27, 652-658.

Helicos BioSciences

Video http://www.youtube.com/watch?v=TboL7wODBj4

Page 24

Life/APG – SOLiD platform

Sequencing by ligation (SBL) uses another cyclic method, which differs from CRT in its use of DNA ligase and two-base-encoded probes

Life/APG has commercialized their SBL platform called support oligonucleotide ligation detection (SOLiD)

Page 25

Two-base-encoded probes: oligonucleotides in which two interrogation bases are associated with a particular dye (e.g. AA, CC, GG and TT are all encoded by a blue dye). There are 16 possible dinucleotide combinations, so each of the four dyes is associated with four of them

The term 1,2-probe indicates that the first and second nucleotides are the interrogation bases. The remaining bases are either degenerate or universal bases

A phosphorothiolate linkage is present between the fifth and sixth nucleotides of the probe sequence; it is cleaved with silver ions.

Life/APG – SOLiD platform

SOLiD sequencing chemistry
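Two-base encoding maps neatly onto a 2-bit code. In the Python sketch below, bases are assumed to be coded A=0, C=1, G=2, T=3 and the colour of a dinucleotide is the XOR of its two codes; this reproduces the example above (AA, CC, GG and TT all give colour 0, shown as blue). The colour names and helper functions are illustrative, not Life/APG software.

```python
# Toy model of SOLiD two-base ("colour-space") encoding.
# Assumed convention: A=0, C=1, G=2, T=3; colour = XOR of the two base codes.

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"
COLOUR_NAMES = ("blue", "green", "yellow", "red")  # colours 0..3


def encode(seq: str) -> list[int]:
    """One colour per overlapping dinucleotide (16 pairs -> 4 colours)."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base: str, colours: list[int]) -> str:
    """Translate colours back to bases; requires the known first base."""
    bases = [first_base]
    for colour in colours:
        bases.append(BASES[CODE[bases[-1]] ^ colour])
    return "".join(bases)


if __name__ == "__main__":
    colours = encode("ATGGCA")
    print([COLOUR_NAMES[c] for c in colours])  # ['red', 'green', 'blue', 'red', 'green']
    print(decode("A", colours))                # ATGGCA
```

Two properties follow: colour space can only be translated into bases when the first base is known, and a genuine single-base change alters two adjacent colours, whereas a single misread colour corrupts every downstream base during decoding. That asymmetry is what lets colour-space analysis distinguish likely SNPs from measurement errors.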

Page 26

1. Emulsion-based sample preparation (emPCR)

Life/APG – SOLiD platform

2. Chemical crosslinking to an amino-coated glass surface

Page 27

Life/APG – SOLiD platform

3. SBL protocol

After a universal primer is annealed, a library of 1,2-probes is added and complementary probes are ligated.

Four-color imaging

The ligated 1,2-probes are chemically cleaved with silver ions to generate a 5’-PO4 group

The SOLiD cycle is repeated nine more times (ten ligation cycles in total)

Page 28

The extended primer is then stripped and four more ligation rounds are performed, each with ten ligation cycles

3. SBL protocol

Life/APG – SOLiD platform
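Which template positions get interrogated can be worked out with a short script. It assumes, following the probe design described above, that each ligation cycle extends the product by five bases (the probe is cleaved between its fifth and sixth nucleotides), that 1,2-probes read the first two of those bases, and that the five primer rounds start at offsets 0, -1, -2, -3 and -4 (primers n, n-1, ..., n-4). The offsets and helper names are assumptions for illustration.

```python
from collections import Counter

# Assumed primer resets for the five rounds: primers n, n-1, ..., n-4.
PRIMER_OFFSETS = [0, -1, -2, -3, -4]


def interrogated_positions(primer_offset: int, cycles: int = 10,
                           step: int = 5) -> list[int]:
    """Template positions (1-based) read by one primer round of 1,2-probes."""
    positions = []
    for cycle in range(cycles):
        first = 1 + primer_offset + cycle * step  # each ligation advances 5 bases
        positions.extend([first, first + 1])      # 1,2-probe reads two bases
    return positions


def coverage_by_position() -> Counter:
    """How often each position is interrogated across all primer rounds."""
    counts: Counter = Counter()
    for offset in PRIMER_OFFSETS:
        counts.update(interrogated_positions(offset))
    return counts


if __name__ == "__main__":
    print(interrogated_positions(0)[:6])  # [1, 2, 6, 7, 11, 12]
    counts = coverage_by_position()
    print(sorted({counts[p] for p in range(1, 47)}))  # [2]
```

Under these assumptions every position from 1 to 46 is read exactly twice, by two different dinucleotide probes, which is the redundancy that colour-space decoding relies on.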

Page 29

Video http://www.youtube.com/watch?v=nlvyF8bFDwM&feature=related

Life/APG – SOLiD platform

ChIP-seq:
• Chromatin immunoprecipitation sequencing (ChIP-Seq) on the SOLiD™ System. Nature Methods (2009)
• Chromosome length influences replication-induced topological stress. Nature (2011)

Methyl-seq: Increased methylation variation in epigenetic domains across cancer types. Nature Genetics (2011)

Metagenomics: The carnivorous bladderwort (Utricularia, Lentibulariaceae): a system inflates. Journal of Experimental Botany (2010)

cDNA-based DGE, RNA-Seq and small RNA sequencing: Evolution of yeast noncoding RNAs reveals an alternative mechanism for widespread intron loss. Science (2010)

Page 30

Pacific Biosciences

Page 31

Pacific Biosciences

All the aforementioned methods use enzymatic activities and various termination approaches, leading to short sequence reads (max. 350 bp)

Real-time DNA sequencing aims to exploit the high catalytic rate and high processivity of DNA polymerase, using the enzyme as a real-time sequencing engine in order to obtain longer reads. To fully harness the intrinsic speed, fidelity and processivity of the DNA polymerase (DNApol), several technical challenges must be met simultaneously:

• The speed at which each polymerase synthesizes DNA fluctuates stochastically, so polymerases must be observed individually
• A high nucleotide concentration is required, so the observation volume must be reduced to allow single-molecule detection
• DNApol has to work with dNTPs that are 100% fluorescently labeled
• A surface chemistry is required that retains the activity of DNApol and inhibits nonspecific adsorption of labeled dNTPs

Page 32

Pacific Biosciences

Single Molecule Real Time (SMRT) DNA sequencing

The zero-mode waveguide (ZMW) design reduces the observation volume to the zeptolitre range (10⁻²¹ L), reducing the number of stray fluorescently labeled molecules that enter the detection layer in a given period

The residence time of a phospholinked nucleotide in the polymerase active site is on the millisecond scale, and each such binding event corresponds to a recorded fluorescence pulse
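The need for a tiny observation volume follows from simple arithmetic: the expected number of labelled nucleotides in a detection volume V at concentration C is N = C × V × N_A. The sketch compares the zeptolitre (10⁻²¹ L) ZMW volume with an assumed femtolitre-scale (10⁻¹⁵ L) confocal spot, at a hypothetical 1 µM labelled-nucleotide concentration.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_in_volume(concentration_molar: float, volume_litres: float) -> float:
    """Expected number of molecules, N = C * V * N_A."""
    return concentration_molar * volume_litres * AVOGADRO


if __name__ == "__main__":
    conc = 1e-6  # hypothetical 1 uM labelled-nucleotide concentration
    print(molecules_in_volume(conc, 1e-15))  # femtolitre confocal spot (assumed)
    print(molecules_in_volume(conc, 1e-21))  # zeptolitre ZMW volume
```

About 600 labelled molecules would sit in the femtolitre spot, swamping a single incorporation event, whereas the zeptolitre volume holds on average far fewer than one, so the millisecond pulse from a nucleotide held in the active site stands out.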

Page 33

Pacific Biosciences

Video http://www.youtube.com/watch?v=_B_cUZ8hSYU

Page 34

Pacific Biosciences

The initial single-read accuracy was estimated at 83% at 1X. Common errors were insertions, deletions and mismatches. At 15X coverage, the authors demonstrated a consensus accuracy of >99%

In 2009, Pacific Biosciences reported improvements to their platform: E. coli was sequenced at 38X, covering 99.3% of the genome with an accuracy of >99.999% (average read length: 964 bp)
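Why accuracy climbs from 83% for a single read to >99% at 15X can be illustrated with a binomial majority-vote model. The sketch assumes independent errors and, pessimistically, that all erroneous reads agree on the same wrong base; real consensus algorithms for SMRT data are more sophisticated.

```python
from math import comb


def majority_error(per_read_accuracy: float, depth: int) -> float:
    """P(wrong reads outnumber correct ones) at one position, binomial model.

    Assumes independent errors and, pessimistically, that every erroneous
    read reports the same wrong base. Ties are not counted as errors.
    """
    p = 1.0 - per_read_accuracy
    k_min = depth // 2 + 1  # wrong reads needed for a strict majority
    return sum(comb(depth, k) * p**k * (1 - p) ** (depth - k)
               for k in range(k_min, depth + 1))


if __name__ == "__main__":
    for depth in (1, 5, 15):
        print(depth, f"{1 - majority_error(0.83, depth):.4f}")
```

With 83% single-read accuracy, this pessimistic model already gives a consensus error of roughly 0.14% at 15X (about 99.86% accuracy), consistent with the >99% reported.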

Page 35

Comparison of next-generation sequencing platforms

Page 36

NGS technologies and personal genomes

Human genome studies aim to catalogue SNPs and SVs and their association with phenotypic differences, with the eventual goal of personalized genomics for medical purposes, such as pharmacogenomics

Somatic mutations associated with acute myeloid leukemia have been identified using Illumina/Solexa (Ley T.J. et al. 2008 Nature)

Elucidation of both allelic variants in a family with a recessive form of Charcot-Marie-Tooth disease using the SOLiD platform (Lupski J.R. et al. in press N.Engl.J.Med.)

The Cancer Genome Atlas aims at discovering SNPs and SVs associated with major cancers (The Cancer Genome Atlas Research Network, 2011 Nature)

Beijing Genomics Institute (BGI) is working on the "1000 Plant & Animal Reference Genomes Project", which aims to generate reference genomes for 1,000 economically and scientifically important plant/animal species. They use the Illumina/Solexa and SOLiD platforms

Page 37

Page 38

Sequencing services and the $1,000 genome

Illumina announced a personal genome sequencing service that provides 30-fold base coverage for the price of $48,000.

Complete Genomics offers a similar service with 40-fold coverage priced at $5,000. Its business model relies on a very large customer volume. It uses a newly optimized SBL protocol based on combinatorial probe anchor ligation (cPAL). Reagent cost: $4,400

The greatest challenge for current technology developers is to close the gap between $10,000 and $1,000 for a single genome. The timetable for the $1,000 draft genome is difficult to predict

Nanopore sequencing?

Page 39

Nanopore sequencing

The system uses the Staphylococcus aureus toxin α-hemolysin, a robust heptameric protein that normally forms holes in membranes.

DNA and RNA can be electrophoretically driven through a nanopore of suitable diameter (Kasianowicz J.J. et al 1996 PNAS)

Page 40

Nanopore sequencing – how does it work?

When a small voltage (~100 mV) is applied across a nanopore in a membrane separating two chambers containing aqueous electrolytes, the ionic current through the pore can be measured

Molecules passing through the nanopore disrupt the ionic current, and by measuring these disruptions the molecules can be identified.

Figure: lipid bilayer with high electrical resistance, ionic current, hemolysin pore
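Identifying molecules from current disruptions amounts to detecting blockade events and comparing their mean current with reference levels. In the Python sketch below, the 100 pA open-pore current, the 80% detection threshold and the per-nucleotide reference levels are all hypothetical numbers chosen for illustration; real levels depend on the pore, adaptor, salt concentration and voltage.

```python
# Hypothetical residual-current levels (pA); NOT measured values.
REFERENCE_PA = {"A": 45.0, "C": 38.0, "G": 50.0, "T": 41.0}


def detect_events(trace: list[float], open_pore_pa: float,
                  threshold_fraction: float = 0.8) -> list[list[float]]:
    """Group consecutive samples that fall below the blockade threshold."""
    threshold = threshold_fraction * open_pore_pa
    events: list[list[float]] = []
    current: list[float] = []
    for sample in trace:
        if sample < threshold:
            current.append(sample)
        elif current:
            events.append(current)
            current = []
    if current:
        events.append(current)
    return events


def classify(event: list[float]) -> str:
    """Assign the nucleotide whose reference level is closest to the event mean."""
    mean = sum(event) / len(event)
    return min(REFERENCE_PA, key=lambda base: abs(REFERENCE_PA[base] - mean))


if __name__ == "__main__":
    trace = [100, 99, 45, 46, 44, 100, 101, 38, 37, 100, 50, 51, 100]
    events = detect_events(trace, open_pore_pa=100.0)
    print("".join(classify(e) for e in events))  # ACG
```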

Page 41

Nanopore – exonuclease sequencing

Figure: exonuclease, aminocyclodextrin adaptor, DNA to be sequenced

Page 42

The DNA polymer passes through the nanopore itself

The nanopore is engineered to allow single-base resolution within the strand

A DNA polymerase coupled to the α-hemolysin pore synthesizes a new DNA strand, using the polymer emerging from the pore as the template

Video nanopore: http://www.youtube.com/watch?v=_rRrOT9gfpo&feature=related

Nanopore – strand sequencing

Figure: DNA polymerase

Page 43

Nanopore sequencing

Advantages:
• minimal sample preparation
• no requirement for polymerase or ligase
• potential for very long read lengths (>10,000-50,000 nt)
• it might well achieve the $1,000 per mammalian genome goal
• the instrument is inexpensive

Challenges:
• slowing down DNA translocation from microseconds per base to milliseconds per base
• reducing the stochastic motion of the DNA molecule in transit in order to increase the signal-to-noise ratio
• a stable support for the hemolysin heptamer
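The need to slow translocation is again a matter of arithmetic: the number of current samples recorded per base is dwell time × sampling rate, and averaging n samples of independent (white) noise improves the signal-to-noise ratio by roughly sqrt(n). The 10 kHz sampling rate below is an assumed value for illustration.

```python
import math


def samples_per_base(dwell_time_s: float, sampling_rate_hz: float) -> float:
    """Current samples recorded while one base occupies the sensing region."""
    return dwell_time_s * sampling_rate_hz


def averaged_snr(single_sample_snr: float, n_samples: float) -> float:
    """SNR after averaging n samples of independent white noise (~sqrt(n) gain)."""
    return single_sample_snr * math.sqrt(n_samples)


if __name__ == "__main__":
    rate = 10_000  # hypothetical 10 kHz recording rate
    for dwell in (1e-6, 1e-3):  # 1 microsecond vs 1 millisecond per base
        print(dwell, samples_per_base(dwell, rate))
```

At 1 µs per base a 10 kHz recorder sees only 0.01 samples per base, so most bases pass unrecorded, whereas at 1 ms per base there are 10 samples to average.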

Page 44

Ion Torrent technology

http://lifetech-it.hosted.jivesoftware.com/videos/1016
