Next generation sequencing: an overview

A. I. Bhat

Indian Institute of Spices Research

Calicut

DNA sequencing

• Chain termination method (Sanger et al., 1977): the sequence of a single-stranded DNA molecule is determined by enzymatic synthesis of complementary polynucleotide chains that terminate at specific nucleotide positions.

• Chemical degradation method (Maxam and Gilbert, 1977): the sequence of a double-stranded DNA molecule is determined by treatment with chemicals that cut the molecule at specific nucleotide positions.

Chain termination method

Dye-terminator sequencing

• Uses labelling of the chain-terminator ddNTPs, which permits sequencing in a single reaction.

• Each of the four dideoxynucleotide chain terminators is labelled with a different fluorescent dye (ddA green, ddT red, ddG yellow and ddC blue), each of which fluoresces at a different emission wavelength.

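• Fragments terminating at each base position are separated by size, and the dye on each fragment's terminal ddNTP is excited by a laser and detected as the fragment migrates past the detector in the gel or capillary.

• Owing to its speed and convenience, dye-terminator sequencing is now the mainstay of automated sequencing.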
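The read-out logic of dye-terminator sequencing can be sketched in a few lines of Python. This is an idealized model, not an instrument algorithm: it assumes every possible terminated fragment is produced, that electrophoresis separates fragments perfectly by length, and it uses the dye colours listed above. The template is assumed to be written in the order the polymerase reads it, and the function names are hypothetical.

```python
# Idealized dye-terminator read-out (assumptions: every terminated fragment is
# made once, perfect size separation, dye colours as listed in the text).

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DYE = {"A": "green", "T": "red", "G": "yellow", "C": "blue"}  # ddNTP -> dye
COLOUR_TO_BASE = {colour: base for base, colour in DYE.items()}


def terminated_fragments(template: str) -> list[tuple[int, str]]:
    """Return (length, dye) for every fragment synthesized on the template.

    A fragment of length k ends with the ddNTP complementary to template
    position k-1 and therefore carries that terminator's dye.
    """
    fragments = []
    for k in range(1, len(template) + 1):
        last_base = COMPLEMENT[template[k - 1]]
        fragments.append((k, DYE[last_base]))
    return fragments


def read_electropherogram(fragments: list[tuple[int, str]]) -> str:
    """Sort fragments by size, as electrophoresis does, and decode the colours."""
    ordered = sorted(fragments)  # shortest fragments reach the detector first
    return "".join(COLOUR_TO_BASE[colour] for _, colour in ordered)


template = "TACGGATC"
fragments = terminated_fragments(template)
print(read_electropherogram(fragments))  # ATGCCTAG, the complementary strand
```

Because the colour order, not the lane, carries the base identity, all four terminators can share one reaction and one capillary.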

Capillary electrophoresis

The Sanger method can sequence only up to 1000–1200 bp in one reaction

View of dye-terminator read

Genome sequencing

1970s: Bacteriophage (ΦX174, 1977)

1992: The first eukaryotic chromosome sequence (yeast chromosome III)

1995: The bacterium Haemophilus influenzae

Followed by several other bacteria and archaea

Many eukaryotes, including several plants and their pathogens

2001–2006: Human genome (draft published in 2001; last chromosome sequence published in 2006)

Until the mid-2000s, all genome sequencing used Sanger chemistry

Shotgun sequencing

Human Genome Project

Genomic DNA is enzymatically or mechanically broken into small fragments

Cloned into sequencing vectors

Sequenced individually

Sequencing numerous DNA fragments and assembling them by their overlaps gave birth to genome informatics and, ultimately, to next-generation sequencing
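Shotgun sequencing only works because overlapping fragments can be stitched back together. The idea can be illustrated with a minimal greedy overlap assembler in Python. It assumes error-free reads, no repeats and exact suffix/prefix overlaps; real assemblers are far more sophisticated, and the names and example reads here are hypothetical.

```python
# Toy greedy overlap assembler (assumptions: error-free reads, no repeats,
# exact suffix/prefix overlaps of at least min_len bases).

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # Drop reads wholly contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            return contigs  # no overlaps left: the remaining sequences are contigs
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


genome = "ATGGCGTACGTTAGC"
reads = ["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]  # overlapping "shotgun" fragments
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```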

Whole genome sequencing

The core philosophy of the massively parallel sequencing used in next-generation sequencing (NGS) is adapted from shotgun sequencing

NGS: breaking the entire genome into small pieces

Ligating the DNA fragments to designated adapters

DNA synthesis (sequencing-by-synthesis)

Massively parallel sequencing

Coverage (the number of short reads that overlap each position within a specific genomic region)

Sufficient coverage is critical for accurate assembly of the genomic sequence.

To ensure the correct identification of genetic variants, short-read coverage of at least 30× is recommended in whole-genome scans

(Zhang et al., 2011. J Genet Genomics, 38:95-109)
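Coverage targets such as 30× translate into a required number of reads through simple arithmetic. The sketch below assumes reads land uniformly at random (the Lander–Waterman model), so mean coverage is C = N × L / G and the expected fraction of uncovered bases is e^(−C). The 3.1 Gbp genome size and 200 bp per read pair are illustrative assumptions, not values from the text.

```python
import math

# Coverage arithmetic (assumption: reads placed uniformly at random).

def mean_coverage(n_reads: int, read_len: int, genome_size: int) -> float:
    """Average coverage C = N * L / G."""
    return n_reads * read_len / genome_size


def reads_needed(target_cov: float, read_len: int, genome_size: int) -> int:
    """Number of reads needed to reach a target average coverage."""
    return math.ceil(target_cov * genome_size / read_len)


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases with zero reads, e^-C under a Poisson model."""
    return math.exp(-coverage)


def depth_per_position(read_starts: list[int], read_len: int, genome_size: int) -> list[int]:
    """Count how many reads overlap each position (per-base coverage)."""
    depth = [0] * genome_size
    for start in read_starts:
        for pos in range(start, min(start + read_len, genome_size)):
            depth[pos] += 1
    return depth


# 30x of a hypothetical 3.1 Gbp genome using 2 x 100 bp pairs (200 bp per pair).
print(reads_needed(30, 200, 3_100_000_000))     # 465000000 read pairs
print(f"{fraction_uncovered(30):.2e}")          # 9.36e-14
print(depth_per_position([0, 2, 4], 4, 8))      # [1, 1, 2, 2, 2, 2, 1, 1]
```

Mean coverage alone hides local dips, which is why per-position depth is what variant callers actually check.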

Next generation sequencing

• Enables a genome to be sequenced within hours to days.

• The 454 FLX Pyrosequencer from Roche Applied Sciences was the first next-generation sequencer to become commercially available, in 2004.

• The Solexa 1G Genetic Analyzer from Illumina was commercialized in 2006.

• SOLiD (Supported Oligonucleotide Ligation and Detection) System from Applied Biosystems launched in 2007

Next-next generation or third generation sequencing

• Single molecule sequencing

http://www.roche.com/

http://www.illumina.com/

http://www.appliedbiosystems.com/


Platforms on NGS technologies

Technology | Amplification | Read length | Throughput | Sequencing method

Currently available

Roche/GS-FLX Titanium | Emulsion PCR | 400–600 bp | 500 Mbp/run | Pyrosequencing
Illumina/HiSeq 2000, HiScan | Bridge PCR (cluster PCR) | 2 × 100 bp | 200 Gbp/run | Reversible terminators
ABI/SOLiD 5500xl | Emulsion PCR | 50–100 bp | >100 Gbp/run | Sequencing-by-ligation (octamers)
Polonator/G.007 | Emulsion PCR | 26 bp | 8–10 Gbp/run | Sequencing-by-ligation (monomers)
Helicos/HeliScope | No | 35 (25–55) bp | 21–37 Gbp/run | True single-molecule sequencing (tSMS)

In development

Pacific BioSciences/RS | No | 1000 bp | N/A | Single-molecule real time (SMRT)
Visigen Biotechnologies | No | >100 Kbp | N/A | N/A
U.S. Genomics | No | N/A | N/A | Single-molecule mapping
Genovoxx | No | N/A | N/A | Single-molecule sequencing by synthesis
Oxford Nanopore Technologies | No | 35 bp | N/A | Nanopores/exonuclease-coupled
NABsys | No | N/A | N/A | Nanopores
Electronic BioSciences | No | N/A | N/A | Nanopores
BioNanomatrix/nanoAnalyzer | No | 400 Kbp | N/A | Nanochannel arrays
GE Global Research | No | N/A | N/A | Closed complex/nanoparticle
IBM | No | N/A | N/A | Nanopores
LingVitae | No | N/A | N/A | Nanopores
Complete Genomics | No | 70 bp | N/A | DNA nanoball arrays
base4 innovation | No | N/A | N/A | Nanostructure arrays
CrackerBio | No | N/A | N/A | Nanowells
Reveo | No | N/A | N/A | Nano-knife edge
Intelligent BioSystems | No | N/A | N/A | Electronics
LightSpeed Genomics | N/A | N/A | N/A | Direct-read sequencing by EM

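Next (2nd) generation platforms

Platform (company) | Read length × reads per run | Main application
3130xl (Applied Biosystems; capillary Sanger, for comparison) | 700 bp × 96 | Specific targets (PCR products, clones)
GS-FLX Titanium (Roche) | 400 bp × 1 million | De novo sequencing
Genome Analyzer (Illumina) | 100 bp × 2 billion | Re-sequencing (can also be used for de novo sequencing)
SOLiD (Applied Biosystems) | 50 bp × 2.4 billion | Re-sequencing (can also be used for de novo sequencing)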

Roche GS-FLX 454 Genome Sequencer

Longest reads (up to 600 bp) among the NGS platforms

Generates ~400–600 Mb of sequence reads per run

Suited to de novo assembly of microbial genomes and to metagenomics

Reported raw base accuracy is very good (over 99%)

Chemistry

• Nucleotide incorporation releases pyrophosphate (PPi)

• ATP sulfurylase quantitatively converts PPi to ATP in the presence of adenosine 5′-phosphosulfate (APS).

• This ATP drives the luciferase-mediated conversion of luciferin to oxyluciferin, which generates visible light in amounts proportional to the amount of ATP.

• The light produced in the luciferase-catalyzed reaction is detected by a camera and analyzed by software.

• Unincorporated nucleotides and ATP are degraded by apyrase, and the reaction can restart with another nucleotide.

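Because one PPi is released per incorporated base, the light recorded for each nucleotide flow is proportional to the length of the homopolymer run incorporated in that flow. A minimal Python sketch of signal generation and base calling, assuming a cyclic flow order of TACG and noise-free signals (both assumptions); `flowgram` and `call_bases` are hypothetical names.

```python
# Pyrosequencing sketch (assumptions: flow order TACG, noise-free light that is
# exactly proportional to the number of bases incorporated per flow).

FLOW_ORDER = "TACG"  # assumed order of nucleotide flows, repeated every cycle


def flowgram(strand: str, n_cycles: int) -> list[tuple[str, int]]:
    """Simulate (nucleotide, signal) per flow for the strand being synthesized.

    In each flow the polymerase extends through the whole run of bases that
    match the flowed nucleotide, releasing one PPi (and so light) per base.
    """
    signals, pos = [], 0
    for _ in range(n_cycles):
        for nucleotide in FLOW_ORDER:
            count = 0
            while pos < len(strand) and strand[pos] == nucleotide:
                count += 1
                pos += 1
            signals.append((nucleotide, count))
    return signals


def call_bases(signals: list[tuple[str, float]]) -> str:
    """Round each flow's signal to a run length and emit that many bases."""
    return "".join(nucleotide * round(value) for nucleotide, value in signals)


strand = "TTACGGGA"  # sequence incorporated (complement of the template)
fg = flowgram(strand, n_cycles=3)
print(fg[:6])          # [('T', 2), ('A', 1), ('C', 1), ('G', 3), ('T', 0), ('A', 1)]
print(call_bases(fg))  # TTACGGGA
```

With real, noisy signals the rounding step is where homopolymer-length errors arise: a signal of 1.9 is called as two bases, 2.6 as three.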

Illumina/Solexa Genome Analyzer

Superior data quality and appropriate read lengths have made it the system of choice for many genome sequencing projects.

The majority of published NGS papers have used the Genome Analyzer.

It uses a proprietary reversible terminator-based method that enables detection of single bases as they are incorporated into growing DNA strands.

A fluorescently-labeled terminator is imaged as each dNTP is added and then cleaved to allow incorporation of the next base.

Since all four reversible terminator-bound dNTPs are present during each sequencing cycle, natural competition minimizes incorporation bias.

The end result is true base-by-base sequencing, which Illumina promotes as giving the industry’s most accurate data for a broad range of applications.

Figure: Solexa-based whole-genome sequencing (adapted from Richard Wilson, School of Medicine, Washington University, “Sequencing the Cancer Genome”, http://tinyurl.com/5f3alk)
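With reversible terminators, each cycle adds exactly one base to every cluster, so a base can be called from a single four-colour image per cycle. The sketch below assumes one cluster, four channel intensities per cycle (in A, C, G, T order) and a simple brightest-channel rule; real base callers also correct for dye cross-talk and phasing, which are omitted. Names and intensities are hypothetical.

```python
# Reversible-terminator base calling sketch (assumptions: one cluster, four
# channel readings per cycle in A, C, G, T order, brightest channel wins).

BASES = "ACGT"


def call_cluster(intensities: list[list[float]]) -> tuple[str, list[float]]:
    """Return the called read and a simple per-cycle purity score.

    Exactly one terminator is incorporated per cycle, so each cycle yields
    exactly one base, even inside a homopolymer run.
    """
    read, purity = [], []
    for channels in intensities:
        best = max(range(4), key=lambda i: channels[i])
        read.append(BASES[best])
        total = sum(channels)
        # Brightest channel as a share of total signal (0 if the image is blank).
        purity.append(channels[best] / total if total > 0 else 0.0)
    return "".join(read), purity


# Hypothetical intensities for three cycles of one cluster.
cycles = [
    [0.9, 0.1, 0.05, 0.05],  # A
    [0.1, 0.1, 0.1, 0.8],    # T
    [0.1, 0.1, 0.1, 0.8],    # T again: second base of a homopolymer, own cycle
]
read, purity = call_cluster(cycles)
print(read)  # ATT
```

Unlike the pyrosequencing flowgram, homopolymer length never has to be inferred from signal strength here, since each base occupies its own cycle.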

ABI SOLiD platform

The latest model is the 5500xl SOLiD System (previously known as SOLiD 4hq)

It can generate over 2.4 billion reads per run with a raw base accuracy of 99.94%

The SOLiD 4 platform probably provides the best data quality as a result of its sequencing-by-ligation approach, but the DNA library preparation procedures prior to sequencing can be tedious and time-consuming.

Preferred for re-sequencing rather than de novo sequencing.

(Zhang et al., 2011)
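SOLiD's sequencing-by-ligation is usually described with two-base ("colour-space") encoding, in which each colour reports a pair of adjacent bases. The sketch below assumes the commonly cited scheme (A=0, C=1, G=2, T=3, colour = XOR of the two base codes); it illustrates the idea and is not ABI software.

```python
# Two-base (colour-space) encoding sketch (assumption: A=0, C=1, G=2, T=3,
# each colour is the XOR of the codes of two adjacent bases).

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def encode(seq: str) -> tuple[str, list[int]]:
    """Return the first base plus one colour per adjacent base pair."""
    colours = [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colours


def decode(first_base: str, colours: list[int]) -> str:
    """Rebuild the sequence from a known first base and the colour calls."""
    bases = [first_base]
    for c in colours:
        bases.append(BASE[CODE[bases[-1]] ^ c])
    return "".join(bases)


first, colours = encode("ATGGCA")
print(colours)                          # [3, 1, 0, 3, 1]
print(decode(first, colours))           # ATGGCA
print(decode(first, [3, 1, 0, 2, 1]))   # ATGGAC: one bad colour corrupts the rest
```

Because each base takes part in two colours, a true single-base change alters two adjacent colours, whereas a single miscalled colour corrupts every base decoded after it. This difference is what lets colour-space analysis separate measurement errors from real variants.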

Next generation sequencing using Roche 454

Sample Preparation

Nucleic acid isolation

Double-stranded cDNA synthesis

Rapid library preparation: fragmentation (nebulization/shearing) into smaller fragments of 400 to 1000 bp

Addition of adapters

Removal of small fragments (<300 bp)

Library Quality Assessment

Emulsion based clonal amplification (emPCR)

•Preparation of reagents and of emulsion oil

•Preparation of amplification mix (addition of additive, amplification mix, primers, enzyme mix and PPiase)

•DNA library capture (one DNA molecule per bead, and one bead per aqueous microreactor, insulated from other beads by the surrounding oil)

•Emulsification (shaking the captured library to form a water-in-oil emulsion)

•Amplification (emulsified beads are clonally amplified)

•Bead recovery and enrichment
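Achieving one DNA molecule per bead depends on dilution. Assuming molecules are distributed among beads at random (a Poisson model, which the text does not state), the fractions of empty, clonal and mixed beads follow from λ, the mean number of molecules per bead. The function names are hypothetical.

```python
import math

# Bead-loading sketch (assumption: molecules per bead ~ Poisson(lam)).

def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a bead captures exactly k molecules."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def bead_fractions(lam: float) -> dict[str, float]:
    """Fractions of beads that are empty, clonal (one molecule) or mixed."""
    empty = poisson_pmf(0, lam)
    clonal = poisson_pmf(1, lam)
    return {"empty": empty, "clonal": clonal, "mixed": 1.0 - empty - clonal}


for lam in (0.1, 0.5, 1.0):
    f = bead_fractions(lam)
    # Share of DNA-carrying beads that are clonal, i.e. give a clean read.
    usable = f["clonal"] / (1.0 - f["empty"])
    print(f"lam={lam}: clonal={f['clonal']:.3f}, mixed={f['mixed']:.3f}, "
          f"clonal among loaded beads={usable:.3f}")
```

Lowering λ makes almost every DNA-carrying bead clonal but leaves most beads empty; the bead enrichment step then recovers the DNA-carrying beads.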

Sequencing

Clonally amplified fragments (on beads) are loaded onto a PicoTiterPlate device for sequencing (the diameter of the plate wells allows only one bead per well)

After addition of the sequencing enzymes, the fluidics subsystem of the sequencing instrument flows individual nucleotides in a fixed order across all wells

Incorporation of one (or more) nucleotide(s) complementary to the template strand produces a chemiluminescent signal that is recorded by the CCD camera within the instrument

During the nucleotide flows, hundreds of thousands to over a million beads, each carrying millions of copies of a single-stranded DNA molecule, are sequenced in parallel

Each 10-h sequencing run will typically produce over 1,000,000 flowgrams (one flowgram per bead)

Base calling (including a quality check of each read)

Trimming of primer sequences

Production of contigs
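The post-run steps of quality checking and primer trimming can be sketched in Python. The sketch assumes Phred quality scores (Q = −10 log10 of the error probability) stored as ASCII characters with an offset of 33, and a hypothetical primer/key sequence `PRIMER` at the 5′ end of every read. The trimming rule (drop trailing bases below Q20) is one simple choice among many.

```python
# Read processing sketch (assumptions: Phred+33 quality strings, a hypothetical
# 5' primer/key PRIMER, and 3' trimming of bases below min_q).

PRIMER = "TCAG"  # hypothetical; substitute the primer actually used


def phred_to_error(q: int) -> float:
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def trim_read(seq: str, qual: str, min_q: int = 20) -> tuple[str, str]:
    """Strip the leading primer, then drop low-quality bases from the 3' end."""
    if seq.startswith(PRIMER):
        seq, qual = seq[len(PRIMER):], qual[len(PRIMER):]
    scores = [ord(ch) - 33 for ch in qual]
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


seq = "TCAGACGTTAGC"
qual = "IIIIIIIIII#+"  # 'I' = Q40, '#' = Q2, '+' = Q10
print(trim_read(seq, qual))  # ('ACGTTA', 'IIIIII')
print(phred_to_error(30))    # about 0.001, i.e. 1 error in 1,000 bases
```

The trimmed, quality-checked reads are what then go into contig assembly.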

NGS platforms under development (third-generation sequencers)

Aim: sequencing of single DNA molecules (without amplification)

Expected to provide accurate data with long reads

i) Fluorescence-based single-molecule sequencing (Pacific Biosciences; US Genomics)

ii) Nanotechnologies for single-molecule sequencing (Oxford Nanopore Technologies, NABsys, BioNanomatrix, Electronic BioSciences, CrackerBio)

iii) Electronic detection for single-molecule sequencing (Reveo, Intelligent BioSystems)

iv) Electron microscopy for single-molecule sequencing (LightSpeed Genomics, Halcyon Molecular, ZS Genetics)

Single Molecule Sequencing (Helicos Biosciences, USA)

Billions of single molecules of sample DNA are captured on an application-specific proprietary surface, where they serve as templates for sequencing-by-synthesis

Polymerase and one fluorescently labeled nucleotide (C, G, A or T) are added. The polymerase catalyzes the sequence-specific incorporation of fluorescent nucleotides into nascent complementary strands on all the templates.

After a wash step, which removes all free nucleotides, the incorporated nucleotides are imaged and their positions recorded.

The fluorescent group is removed in a highly efficient cleavage process, leaving behind the incorporated nucleotide.

The process continues through each of the other three bases.

Multiple four-base cycles produce complementary strands more than 25 bases long on billions of templates, providing a read of more than 25 bases from each individual template.


Ion Sequencing (Rothberg et al., Life Technologies; Nature, July 2011)

A non-optical method of genome sequencing

Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on a massively parallel semiconductor-sensing device, the ion chip

The ion chip contains 1.2 million ion-sensitive wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions.

The performance of the system was demonstrated by sequencing three bacterial genomes and one human genome

World’s smallest solid-state pH meter

DNA is fragmented, ligated to adapters, and clonally amplified onto beads.

Sequencing primers and DNA polymerase are then bound to the templates, which are pipetted into the chip’s loading port. Individual beads are loaded into individual sensor wells by spinning; the well depth allows only a single bead to occupy a well.

All four nucleotides are provided in a stepwise fashion during an automated run. When the nucleotide in the flow is complementary to the template base directly downstream of the sequencing primer, the nucleotide is incorporated into the nascent strand by the bound polymerase.

This extends the sequencing primer by one base (or more, if a homopolymer stretch lies directly downstream of the primer) and results in the hydrolysis of the incoming nucleotide triphosphate, which causes the net liberation of a single proton for each nucleotide incorporated during that flow.

The released protons shift the pH of the surrounding solution in proportion to the number of nucleotides incorporated in the flow (0.02 pH units per single-base incorporation). This shift is detected by the sensor at the bottom of each well, converted to a voltage and digitized by off-chip electronics. Signal generation and detection occur over 4 s.

After the flow of each nucleotide, a wash is used to ensure nucleotides do not remain in the well.
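Since the pH shift is proportional to the number of bases incorporated (0.02 pH units per base, as stated above), a well's signal for each flow can be turned into a homopolymer length by dividing and rounding. The sketch assumes a hypothetical repeating flow order and shift magnitudes already corrected for drift; it is an illustration, not the instrument's signal-processing pipeline.

```python
# Ion semiconductor base calling sketch (assumptions: hypothetical flow order,
# drift-corrected pH-shift magnitudes, 0.02 pH units per base from the text).

PH_PER_BASE = 0.02   # pH change per incorporated nucleotide
FLOW_ORDER = "TACG"  # hypothetical repeating flow order


def calls_from_ph(shifts: list[float]) -> str:
    """Each flow's pH shift divided by 0.02 gives the number of bases incorporated."""
    bases = []
    for i, shift in enumerate(shifts):
        nucleotide = FLOW_ORDER[i % len(FLOW_ORDER)]
        n = round(shift / PH_PER_BASE)  # homopolymer length in this flow
        bases.append(nucleotide * n)    # n <= 0 contributes nothing
    return "".join(bases)


# Hypothetical measured shifts for eight flows in one well.
shifts = [0.021, 0.0, 0.039, 0.001, 0.0, 0.062, 0.019, 0.0]
print(calls_from_ph(shifts))  # TCCAAAC
```

As in pyrosequencing, the signal scales with homopolymer length, so the difference between, for example, 0.12 and 0.14 pH units must be resolved to distinguish six from seven identical bases.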

Sequencing methods

Mining NGS data to obtain meaningful information

An average NGS experiment generates gigabytes to terabytes of raw data

Existing bioinformatics tools fall into several general categories: (1) alignment of reads to a reference sequence, (2) de novo assembly, (3) reference-based assembly, (4) detection of genetic variation (such as SNVs and indels), (5) genome annotation and (6) utilities for data analysis.

The most important step in NGS data analysis is the successful assembly of the reads, or their alignment to a reference genome.
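Read alignment, the first category listed above, is commonly built on an index of short substrings (k-mers) of the reference. The following is a toy seed-and-extend sketch under stated assumptions (forward strand only, substitutions only, exact seed matches). It is not how any particular aligner works internally, and all names are hypothetical.

```python
from collections import defaultdict

# Seed-and-extend alignment sketch (assumptions: forward strand only, exact
# k-mer seeds, at most max_mismatches substitutions, no indels).

def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def align(read: str, reference: str, index: dict[str, list[int]], k: int,
          max_mismatches: int = 2) -> list[tuple[int, int]]:
    """Return (position, mismatches) for each acceptable placement of the read."""
    hits = set()
    # Non-overlapping seeds, so one sequencing error cannot destroy every seed.
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                hits.add(start)
    results = []
    for start in sorted(hits):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            results.append((start, mismatches))
    return results


reference = "GATTACAGATTTACCGGATCCA"
idx = build_index(reference, k=4)
print(align("TTTACCGGTTCC", reference, idx, k=4))  # [(9, 1)]
```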

After successful alignment and assembly, the next step is to interpret the large number of putative novel genetic variants (or mutations), many of which may be present only by chance

Recognition of functional variants is at the center of NGS data analysis and bioinformatics
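Variant detection starts from a pileup of aligned reads at each reference position. The sketch below calls candidate SNVs where enough reads support a non-reference base. It assumes reads are already placed without indels and that every base call is equally trusted; the depth and allele-fraction thresholds are illustrative, not recommended values.

```python
from collections import Counter

# Pileup-based SNV sketch (assumptions: ungapped alignments, equal trust in
# every base call, illustrative thresholds).

def call_snvs(reference: str, alignments: list[tuple[int, str]],
              min_depth: int = 3, min_alt_fraction: float = 0.3
              ) -> list[tuple[int, str, str, int, int]]:
    """Return (position, ref_base, alt_base, alt_count, depth) for candidate SNVs."""
    pileup = [Counter() for _ in reference]
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            if pos < len(reference):
                pileup[pos][base] += 1
    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to separate a variant from chance errors
        ref_base = reference[pos]
        alt_base, alt_count = max(
            ((b, n) for b, n in counts.items() if b != ref_base),
            key=lambda item: item[1], default=(None, 0))
        if alt_base is not None and alt_count / depth >= min_alt_fraction:
            calls.append((pos, ref_base, alt_base, alt_count, depth))
    return calls


reference = "ACGTACGTAC"
alignments = [(0, "ACGTACG"), (1, "CGTTCGT"), (2, "GTTCGTA"),
              (3, "TTCGTAC"), (0, "ACGAACG")]
print(call_snvs(reference, alignments))  # [(4, 'A', 'T', 3, 5)]
```

The single A seen at position 3 is rejected as a likely chance error, while the T supported by three of five reads at position 4 is kept. Deciding which such candidates are functional is the interpretation step described above.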

Thanks
