next gen sequencing (ngs) technology overview

Upload: dominic-suciu

Post on 11-May-2015

DESCRIPTION

This is part of a talk I gave at MSFT a few years ago

TRANSCRIPT

Page 1: Next Gen Sequencing (NGS) Technology Overview

Next Gen Sequencing [NGS]

• History of DNA Sequencing
  – Maxam-Gilbert
  – Sanger
  – ABI

• NGS Technologies:
  – 454, Illumina, PacBio, ABI, Helicos, Ion Torrent, Nanopores

• Applications:
  – Genomes, RNA-Seq, ChIP-Seq, CGH, Cancer Genome, Environmental

Human Genome: 1990-2000

Presented by Dominic Suciu, Ph.D.

Page 2: Next Gen Sequencing (NGS) Technology Overview

Preliminaries: Central Dogma

Gene ~ Protein ~ Enzyme

Genome (DNA) [Hard drive]

Gene (DNA) [Program in directory]

Messenger RNA

Protein (Polypeptide) [Program in RAM]

Enzyme ~ Functional agent
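
The gene-to-protein flow on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a bioinformatics tool: it assumes the input is the coding strand of a gene that starts at a start codon, and it uses the standard genetic code. The function names and the toy gene are made up for the example.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> messenger RNA (T becomes U)."""
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> str:
    """Messenger RNA -> polypeptide, one letter per amino acid, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3].replace("U", "T")]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


toy_gene = "ATGGCCAAGTAA"          # hypothetical gene: Met-Ala-Lys-stop
toy_mrna = transcribe(toy_gene)
toy_protein = translate(toy_mrna)
```

In the slide's analogy, `toy_gene` is the program on disk, `toy_mrna` is the copy being loaded, and `toy_protein` is the program running in RAM.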

Page 3: Next Gen Sequencing (NGS) Technology Overview

Preliminaries: Phages

Bacteriophages are viruses that infect bacteria.

Some bacteria are immune to certain phages [Hamilton O. Smith, early 70’s].

Restriction endonucleases: enzymes that specifically cleave certain DNA sequences.

Bacterial cells use these as a crude anti-phage defense mechanism.

Page 4: Next Gen Sequencing (NGS) Technology Overview

Preliminaries: Restriction Enzymes

• Molecular scissors
• Their discovery allowed researchers to physically map genomes
• Big confirmatory clue that genome sequence determines species and even individuals
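
To make "molecular scissors" concrete, here is a small sketch of an in-silico restriction digest. It uses EcoRI, which recognizes GAATTC and cuts after the first G. The function name and the toy sequence are assumptions for illustration, and only one strand is modeled.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[str]:
    """Cut `sequence` at every occurrence of `site`.

    The cut falls `cut_offset` bases into the recognition site.
    Only the top strand is modeled.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


# EcoRI: G^AATTC (cuts one base into the site)
toy_dna = "AAGAATTCTTTTGAATTCGG"
fragments = digest(toy_dna, "GAATTC", 1)
fragment_lengths = [len(f) for f in fragments]
```

The list of fragment lengths is what a gel would show, and comparing such patterns across enzymes is the basis of a physical map.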

Page 5: Next Gen Sequencing (NGS) Technology Overview

Preliminaries: Cloning

Start with picograms of DNA
End up with micrograms of highly purified copies
Each colony is highly enriched
Each colony is endlessly amplifiable

pBR322 is a vector, an engineered plasmid. It can reproduce itself inside a bacterial host and do nothing else.

Page 6: Next Gen Sequencing (NGS) Technology Overview

Preliminaries: PCR [1985]

As long as you know the beginning and end of a sequence, you can amplify anything
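
A rough sketch of that idea in Python: given a template and two primers, find the region they bracket, and compute the idealized number of copies after a given number of cycles. Everything here is simplified (exact primer matches, perfect doubling every cycle), and the function names and sequences are invented for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def amplicon(template: str, forward_primer: str, reverse_primer: str):
    """Return the stretch of `template` bracketed by the two primers, or None.

    The reverse primer anneals to the top strand, so on the top strand
    we look for its reverse complement downstream of the forward primer.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


def copies_after(cycles: int, start_copies: int = 1) -> int:
    """Idealized PCR: every cycle doubles the number of copies."""
    return start_copies * 2 ** cycles


toy_template = "TTTACGTACCGGATATCCAAGGTTCAGGG"
product = amplicon(toy_template, "ACGTAC", "TGAACC")
```

Thirty ideal cycles turn one molecule into `copies_after(30)`, roughly a billion copies, which is why only the two ends of the target need to be known.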

Page 7: Next Gen Sequencing (NGS) Technology Overview

Deconstructing Sequencing

• DNA source: gel-purified fragment, cloning product, random fragmentation.

• DNA Amplification: need enough to be able to detect the signal given off by base interrogation.

• DNA Seq Method: base interrogation method to uniquely detect G, A, T, C bases.

• Sequence Positioning: need an organizing principle to place these bases into a sequence.

The methods presented here represent unique ways to solve each of these issues

Page 8: Next Gen Sequencing (NGS) Technology Overview

Maxam-Gilbert 1975

Fragment population distribution corresponds to appearance of base within sequence
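
The principle that fragment lengths encode base positions can be sketched as follows. This is an idealized model that assumes one clean lane per base; the real Maxam-Gilbert chemistry used partially overlapping reactions (for example G and A+G), which this toy ignores. Function names are made up for the example.

```python
def lanes_from_sequence(seq: str) -> dict[str, list[int]]:
    """Simulate an ideal gel: each lane lists the lengths of the end-labeled
    fragments produced by cleaving at that base."""
    lanes = {base: [] for base in "GATC"}
    for position, base in enumerate(seq, start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the sequence off the gel from the shortest fragment to the longest."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[length] for length in sorted(base_at_length))


toy_lanes = lanes_from_sequence("GATTACA")
toy_read = read_gel(toy_lanes)
```

Reading the bands in order of size across the four lanes recovers the sequence, which is the organizing principle shared by the gel-based methods that follow.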

Page 9: Next Gen Sequencing (NGS) Technology Overview

Maxam-Gilbert 1975

Chemical Sequencing

Issues:
• Need perfectly pure single species of DNA
• Nasty chemicals
• Radioactive end-labeling
• 4 lanes/read
• Sequence only what you can purify

Advantages:
- 1st DNA sequencing method available
- 200-300 bp/read

Fragment population distribution corresponds to appearance of base within sequence

Page 10: Next Gen Sequencing (NGS) Technology Overview

Sanger “Sequencing-by-Synthesis” 1977

Issues:
- Radioactive end-labeling
- 4 lanes/read
- Sequencing gels

Advantages:
- 400-500 bp/read
- Radioactive incorporation
- Primer gives you control

dNTP ddNTP

Page 11: Next Gen Sequencing (NGS) Technology Overview

PCR Dye-Terminator 1990’s

Issues:
- Sequencing gels
- 1 run/day

Advantages:
- 600-700 bp/read
- 96 reads/run
- Each terminator dye has a different color. Lets you combine all 4 reactions in one lane.
- Single lane/read
- Primer gives you control

Page 12: Next Gen Sequencing (NGS) Technology Overview

Human Genome Project (15 years)

Hierarchical Shotgun Sequencing [start 1990]

- Randomly insert human DNA into BAC clones (~150 kbp each)
- Combine these BAC clones to create a scaffold of the human genome. Each BAC clone will be mapped to a region on a human chromosome
- Pass BAC clones to different Genome Centers throughout the US
- At each center, each vector is sequenced using shotgun sequencing
- Wait 15 years for results.

Page 13: Next Gen Sequencing (NGS) Technology Overview

Issues with Shotgun Sequencing

• Reads -> contigs -> scaffolds -> genome reconstruction
• Repeat regions can confuse contig assemblers.
• It was hoped that by focusing each shotgun run on a single 40-150 kb region, these issues would be minimized.
• According to Venter, it simply multiplied the number of times one encountered the same problem.
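
The reads-to-contigs step can be illustrated with a toy greedy overlap assembler. This is only a sketch of the general idea, not the algorithm used by any particular genome center or assembler: it assumes error-free reads from one strand and repeatedly merges the pair with the longest suffix/prefix overlap. All names are invented for the example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`
    (0 if shorter than `min_len`)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [
            c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)
        ] + [merged]


toy_reads = ["ATGGCGTG", "GCGTGCAA", "GCAATCC"]
toy_contigs = greedy_assemble(toy_reads)
```

A repeat longer than the reads makes several overlaps look equally good, so a greedy merge like this one can join the wrong neighbors. That is the confusion the slide refers to.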

Page 14: Next Gen Sequencing (NGS) Technology Overview

Shotgun Sequencing: Venter 1997

Same approach is used throughout NGS

Paired-end sequencing:
1. Randomly cut genomic DNA.
2. Use gel purification to make three libraries of random DNA fragments: 2 kb, 10 kb, 50 kb.
3. Sequence from both ends.
4. Use distance information to assemble contigs into scaffolds.

Distance information allows you to ‘jump’ over repeat regions.
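
A minimal sketch of how the distance information is used: if the two reads of a pair land in different contigs and the fragment size is known from the library, the gap between the contigs can be estimated. The coordinate conventions, function names, and numbers below are assumptions chosen for illustration.

```python
from statistics import mean


def gap_from_pair(insert_size: int, left_contig_len: int,
                  left_read_start: int, right_read_end: int) -> int:
    """Estimate the gap between two contigs from one read pair.

    left_read_start: where the fragment begins inside the left contig
    right_read_end:  where the fragment ends inside the right contig
    """
    inside_left = left_contig_len - left_read_start
    inside_right = right_read_end
    return insert_size - inside_left - inside_right


def estimate_gap(pairs, insert_size: int, left_contig_len: int):
    """Average the per-pair estimates, since real fragment sizes vary."""
    return mean(
        gap_from_pair(insert_size, left_contig_len, start, end)
        for start, end in pairs
    )


# Three hypothetical pairs from the 2 kb library spanning the same two contigs
toy_pairs = [(4200, 700), (4500, 1020), (4000, 480)]
toy_gap = estimate_gap(toy_pairs, insert_size=2000, left_contig_len=5000)
```

If the unassembled stretch between the contigs is a repeat, the pairs still place the contigs in the right order and at roughly the right distance, which is the "jump".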

This approach allowed Venter to ‘jump’ over the federal sequencing project

Page 15: Next Gen Sequencing (NGS) Technology Overview

NGS Revolution: Roche / 454 -> [2005]

ABI 3700 state of the art in 1997

- 1 sample per rxn (96 rxns) in 2 hrs

- Each sample had to be individually manipulated

454 solved both these problems

PPi + H+

Paired-end reads can be done by including both primers on each micro-bead

Emulsion PCR:

Page 16: Next Gen Sequencing (NGS) Technology Overview

Roche / 454 -> [2005]

• emPCR: No need for cells
• Each well is a single sequencing run.
• Very fast reaction
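
In 454 pyrosequencing each well is read by flowing one nucleotide type at a time and measuring the signal released on incorporation (the PPi shown on the previous slide). The sketch below simulates that readout for one well. It is idealized: the flow order "TACG" is an assumption, the signal is taken to be exactly proportional to the homopolymer length, and the function names are invented.

```python
def flowgram(read: str, flow_order: str = "TACG", max_flows: int = 400):
    """Simulate the per-flow signal for one well.

    Each flow offers a single nucleotide; the signal is the number of
    bases incorporated in that flow (0 if the next base does not match).
    """
    signals = []
    pos = 0
    flow = 0
    while pos < len(read) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals


def call_bases(signals) -> str:
    """Turn the flow signals back into a base sequence."""
    return "".join(base * run for base, run in signals)


toy_signals = flowgram("TTACGG")
toy_call = call_bases(toy_signals)
```

Because a run of identical bases is read in a single flow, the length of long homopolymers has to be inferred from signal intensity, which is where real instruments of this type tend to make errors.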

Page 17: Next Gen Sequencing (NGS) Technology Overview

Illumina [Solexa 2007]

No need for Cell-based amplification

Bridge Amplification: PCR on a surface

Page 18: Next Gen Sequencing (NGS) Technology Overview

Illumina

Advantages:
• No need for cells
• Each cluster of DNA molecules is a single reaction.
• Enormous number of reads
• Paired ends: sequence from both sides.

Disadvantages:
• Slow
• Short reads
• Reagent costs

Page 19: Next Gen Sequencing (NGS) Technology Overview

Ion Torrent/LifeTechnologies [2010]

Method:
• Emulsion PCR
• Each bead is placed in a single well.
• CHEAP/Rugged

Disadvantages:
• Low density
• Sample prep

PPi + H+

Page 20: Next Gen Sequencing (NGS) Technology Overview

ABI-SOLiD

Advantages:
• Extremely accurate

Disadvantages:
• Takes a long time
• Expensive reagent costs

12/cycles/position

Page 21: Next Gen Sequencing (NGS) Technology Overview

Complete Genomics

Advantages:
• Whole genome in 3 months
• 40x coverage!!!

Disadvantages:
• Labor intensive; takes a long time: 3 months sample prep
• Expensive: $10-20k/GENOME
• No instrument: CRO model
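
The 40x figure is a depth of coverage: on average each base of the genome is covered by 40 reads. A quick way to relate read count, read length, and genome size is coverage = reads x read length / genome size. The sketch below applies that formula; the 3-billion-base genome size and the read lengths are round numbers assumed for the example, not figures from the talk.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth of coverage: total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


def reads_needed(depth: float, read_length: int, genome_size: int) -> int:
    """Number of reads required to reach a target average depth."""
    return math.ceil(depth * genome_size / read_length)


# Assumed round numbers: ~3 Gb genome, 100 bp reads, 40x target depth
toy_reads_for_40x = reads_needed(40, 100, 3_000_000_000)
```

This is only an average. Actual depth varies along the genome, so some regions end up well below the nominal figure.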

Page 22: Next Gen Sequencing (NGS) Technology Overview

Helicos

Advantages:
• No amplification: single-molecule detection

Disadvantages:
• It doesn’t work

8-10 days

Page 23: Next Gen Sequencing (NGS) Technology Overview

PacBio

Key Factors:
• Zero-mode waveguide
• Zeptoliter volume
• Continuous process
• Lariat sequencing
• Low reagent costs

Disadvantages:
• Low number of reads

Page 24: Next Gen Sequencing (NGS) Technology Overview

Next-Next Generation: Nanopores

Illumina/Oxford Nanopore
Roche/IBM all-semiconductor
Stratos Genomics
NabSys (Graphene monolayer)

Page 25: Next Gen Sequencing (NGS) Technology Overview

Applications: Genome Sequencing

Sequencing of whole genomes: bacterial, animal, human.

De novo Genome Sequencing: Even with the large number of reads, putting a genome together from raw sequence reads is still a non-trivial task, due to sample prep and inherent complexity.

Re-sequencing: Sequencing an individual with a genetic disease in order to find hereditary mutations. Read depth allows one to compute allele frequencies.

454: Due to its long reads, this method is best for de novo. Useful for scaffolding.

SOLiD, Illumina: used for re-sequencing
SOLiD: wins out on accuracy, loses on complexity/cost
Complete Genomics: CRO model, depth 40x
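
As a small illustration of how read depth yields allele frequencies in re-sequencing: collect the bases that all overlapping reads report at one genome position and count them. The pileup string below is invented, and real pipelines also weigh base and mapping quality, which this sketch ignores.

```python
from collections import Counter


def allele_frequencies(pileup: str) -> dict[str, float]:
    """Fraction of reads supporting each base at a single position.

    `pileup` holds one character per read covering the position.
    """
    counts = Counter(pileup)
    depth = sum(counts.values())
    return {base: n / depth for base, n in counts.items()}


# Hypothetical position covered by 40 reads: 22 report A, 18 report G
toy_pileup = "A" * 22 + "G" * 18
toy_freqs = allele_frequencies(toy_pileup)
```

A split close to 50/50 suggests a heterozygous site, while a base seen in only one or two of 40 reads is more likely a sequencing error. Deeper coverage makes that distinction easier.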

Page 26: Next Gen Sequencing (NGS) Technology Overview

Applications: Exon Sequencing

Mutational screening: what are the mutations in the actual coding regions?

Most heritable disease models have mutations in the coding regions.

Use enrichment to focus sequencing to expressed space. Then make as many reads as possible in order to accurately compute mutations.

Illumina, 454, ABI

Page 27: Next Gen Sequencing (NGS) Technology Overview

Enrichment: Microarrays are Not dead!

Why? In order to focus the sequencing run on the region you are interested in.

Ex:
• Expressed region of genome (1%)
• Genes of interest: mutational studies.

Four ways:
• Micro-droplet PCR: each droplet has a unique set of amplification primers.
• MIP-PCR
• On-chip enrichment, using microarrays.
• On-bead enrichment: make oligo pools, use them to capture targets for sequencing.

Page 28: Next Gen Sequencing (NGS) Technology Overview

Two approaches for finding causative mutation responsible for Miller Syndrome

Sequence Whole Genome: Complete Genomics
• Sequenced mother, father and 2 kids (both affected), 1 kindred
• Regions where they share both copies from parents (22%)
• Both diseases are rare: look for locations with low-prevalence SNPs (dbSNP)
• Narrowed down to 4 genes
• 2 of these were found to be the causative agent in the exome sequencing study

Exome Array: Just sequence expressed sequence space (1%): Illumina GAII
• Sequenced genomes from 4 affected individuals in 3 kindreds
• Found 4600 mutants
• Ignored any previously discovered SNPs from dbSNP
• Looked for mutations that appeared in all 3 kindreds
• Focused on damaging mutations: non-synonymous, stop codon
• Discovered causative locus by elimination
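
The elimination logic of the exome approach can be sketched as a set filter: drop known SNPs, keep damaging changes, and keep only genes hit in every kindred. The variant records, identifiers, and gene names below are entirely hypothetical and only illustrate the shape of the filter, not the published analysis.

```python
def candidate_genes(variants_by_kindred, known_snps,
                    damaging=("nonsynonymous", "stop")):
    """Genes carrying a novel, damaging variant in every kindred."""
    genes_per_kindred = []
    for variants in variants_by_kindred.values():
        genes = {
            v["gene"]
            for v in variants
            if v["id"] not in known_snps and v["effect"] in damaging
        }
        genes_per_kindred.append(genes)
    if not genes_per_kindred:
        return set()
    return set.intersection(*genes_per_kindred)


# Hypothetical toy data
toy_known_snps = {"rs_known"}
toy_variants = {
    "kindred_1": [
        {"id": "v1", "gene": "GENE_A", "effect": "nonsynonymous"},
        {"id": "rs_known", "gene": "GENE_B", "effect": "nonsynonymous"},
        {"id": "v2", "gene": "GENE_C", "effect": "synonymous"},
    ],
    "kindred_2": [
        {"id": "v3", "gene": "GENE_A", "effect": "stop"},
        {"id": "v4", "gene": "GENE_D", "effect": "nonsynonymous"},
    ],
    "kindred_3": [
        {"id": "v5", "gene": "GENE_A", "effect": "nonsynonymous"},
        {"id": "rs_known", "gene": "GENE_B", "effect": "nonsynonymous"},
    ],
}
toy_candidates = candidate_genes(toy_variants, toy_known_snps)
```

Note that the filter works at the gene level: each kindred may carry a different damaging mutation, as long as all of them fall in the same gene.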

Page 29: Next Gen Sequencing (NGS) Technology Overview

Applications: RNA-Seq

Microarrays are Dead!

Don’t have to design probes ahead of time, just sequence mRNA and count the number of sequences for each gene.

Read count ~ Expression level

In environmental genomics, sequencing can be used to determine which genes are being expressed in a sample.

Illumina: Only method that has the read depth to get useful spread between high and low-expressed genes. Its Dynamic Range far surpasses microarrays in this respect, especially for smaller genomes.
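
A minimal sketch of "read count ~ expression level": tally how many reads were assigned to each gene, then normalize by gene length and total read count (reads per kilobase per million reads, RPKM, a common normalization). The read assignments, gene names, and gene lengths below are invented, and the mapping of reads to genes is assumed to have been done already.

```python
from collections import Counter


def read_counts(assigned_genes):
    """Count reads per gene from a list with one gene name per mapped read."""
    return Counter(assigned_genes)


def rpkm(counts, gene_lengths):
    """Reads per kilobase of gene per million mapped reads."""
    total_reads = sum(counts.values())
    return {
        gene: n * 1e9 / (gene_lengths[gene] * total_reads)
        for gene, n in counts.items()
    }


# Hypothetical data: 10 mapped reads over three genes
toy_assignments = ["geneA"] * 6 + ["geneB"] * 3 + ["geneC"]
toy_lengths = {"geneA": 2000, "geneB": 1000, "geneC": 500}
toy_counts = read_counts(toy_assignments)
toy_rpkm = rpkm(toy_counts, toy_lengths)
```

In this toy example geneA has twice the reads of geneB but is also twice as long, so the two end up with the same normalized expression. With more reads, the counts for low-expressed genes become large enough to measure, which is the dynamic-range argument made above.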

Page 30: Next Gen Sequencing (NGS) Technology Overview

Applications: ChIP-Seq

ChIP: Chromatin Immunoprecipitation

Illumina, ABI-SOLiD

Where does my DNA-binding transcription factor bind within the genome?

Page 31: Next Gen Sequencing (NGS) Technology Overview

Environmental Genomics

GAM: Genome Annotation Machine:
• Genome Annotation
• Gene Identification
• Comparative Genomics
• Functional characterization
• Phylogenetic char.
• Protein Structural char.

who / what

Page 32: Next Gen Sequencing (NGS) Technology Overview

Summary