next-generation sequencing (ngs) technologies – overview ngs targeted re-sequencing – fishing...

Next–generation DNA sequencing technologies – theory & practice

Upload: malik-noyce

Post on 15-Dec-2015


Page 1: Next-Generation sequencing (NGS) technologies – overview  NGS targeted re-sequencing – fishing out the regions of interest  NGS workflow: data collection

Next–generation DNA sequencing technologies – theory & practice

Outline

• Next-Generation sequencing (NGS) technologies – overview
• NGS targeted re-sequencing – fishing out the regions of interest
• NGS workflow: data collection and processing – the exome sequencing pipeline

PART I: NGS technologies

Next-Generation sequencing (NGS) technologies – overview

DNA Sequencing – the next generation

The automated Sanger method is considered a 'first-generation' technology; newer methods are referred to as next-generation sequencing (NGS).

Landmarks in DNA sequencing

1953 Discovery of DNA double helix structure
1977
◦ A Maxam and W Gilbert: "DNA seq by chemical degradation"
◦ F Sanger: "DNA sequencing with chain-terminating inhibitors"
1984 DNA sequence of the Epstein-Barr virus, 170 kb
1987 Applied Biosystems - first automated sequencer
1991 Sequencing of human expressed sequence tags begins in Venter's lab
1996 P Nyrén and M Ronaghi - pyrosequencing
2001 A draft sequence of the human genome
2003 Human genome completed
2004 454 Life Sciences markets first NGS machine

DNA Sequencing – the next generation

Random genome sequencing
• 25 Mb
• 300k reads
• 110 bp

Sanger sequencing
• Targeted
• 700-1000 bp

DNA Sequencing – the next generation

The newer technologies constitute various strategies that rely on a combination of
◦ Library/template preparation
◦ Sequencing and imaging

DNA Sequencing – the next generation

Commercially available technologies
◦ Roche – 454: GS FLX Titanium, GS Junior
◦ Illumina: HiSeq 2000, MiSeq
◦ Life: SOLiD 5500xl, Ion Torrent
◦ Helicos BioSciences – HeliScope
◦ Pacific Biosciences – PacBio RS


DNA Sequencing – the next generation

Template preparation: STEP1

Produce a non-biased source of nucleic acid material from the genome.

Current methods:
◦ Randomly break genomic DNA into smaller fragments
◦ Ligate adaptors
◦ Attach or immobilize the template to a solid surface or support
◦ The spatially separated template sites allow thousands to billions of sequencing reactions to be performed simultaneously
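The first two steps listed above, random fragmentation and adaptor ligation, can be mimicked in silico. The following is only a minimal sketch under stated assumptions: the function names, the adaptor sequences and the 50-150 bp size range are illustrative placeholders, not taken from any vendor protocol.

```python
import random


def fragment(genome: str, min_len: int, max_len: int, rng: random.Random) -> list[str]:
    """Cut a sequence into consecutive pieces of random length (hypothetical helper)."""
    fragments = []
    pos = 0
    while pos < len(genome):
        size = rng.randint(min_len, max_len)
        # The final piece may be shorter than min_len if the sequence runs out.
        fragments.append(genome[pos:pos + size])
        pos += size
    return fragments


def ligate_adaptors(fragments: list[str], left: str, right: str) -> list[str]:
    """Flank every fragment with the same pair of adaptor sequences."""
    return [left + frag + right for frag in fragments]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
genome = "".join(rng.choice("ACGT") for _ in range(1000))  # toy 1 kb "genome"

# Placeholder adaptors, not real platform sequences.
ADAPTOR_A = "AAAACCCC"
ADAPTOR_B = "GGGGTTTT"

library = ligate_adaptors(fragment(genome, 50, 150, rng), ADAPTOR_A, ADAPTOR_B)
print(len(library), "adaptor-flanked templates")
```

Real shearing acts on many copies of the genome at once and therefore yields overlapping fragments; the sketch cuts a single copy only to keep the bookkeeping simple.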

Template preparation

Clonal amplification
◦ Roche – 454
◦ Illumina – HiSeq
◦ Life – SOLiD

Single molecule sequencing
◦ Helicos BioSciences – HeliScope
◦ Pacific Biosciences – PacBio RS

Template preparation: Clonal amplification

In solution – emulsion PCR (emPCR)
◦ Roche – 454
◦ Life – SOLiD

Solid phase – Bridge PCR
◦ Illumina – HiSeq


Template preparation: Clonal amplification - emPCR


Sequencing

SOLiD 454


Pyrosequencing

Picotitre plate Pyrosequencing


Sequencing by ligation


Template preparation: Clonal amplification – Bridge PCR

Template preparation: Single molecule templates

HeliScope
PacBio

HiSeq
HeliScope

DNA Sequencing – the next generation

• The major advance offered by NGS is the ability to cheaply produce an enormous volume of data.
• The arrival of NGS technologies in the marketplace has changed the way we think about scientific approaches in basic, applied and clinical research.

PART II: NGS targeted re-sequencing

Fishing out the regions of interest

The beginning

Random genome sequencing

??? ???

Sanger sequencing
• Targeted
• 700-1000 bp

DNA Sequencing – the next generation

◦ Library/template preparation
◦ Library enrichment for target
◦ Sequencing and imaging

Target enrichment strategies

• Random genome sequencing
• Hybrid capture
• PCR based
• Sanger sequencing


Target enrichment strategies: MIP

Hybrid Capture

In solution
• Agilent
• NimbleGen
• ...

Solid phase
• Agilent
• NimbleGen
• Febit
• ...

Hybrid Capture

In solution
• Relatively cheap
• High throughput is possible
• Small amounts of DNA sufficient

Solid phase
• Straightforward method
• Flexible
• Higher amounts of DNA

PCR based approaches

• Uniplex
• Multiplex
• Fluidigm
• RainDance
• Multiplicom

• Long-range PCR products
• RainDance


PCR based approaches: Raindance

PCR based approaches: Fluidigm
• 48.48 Access Array

Target enrichment strategies

PART III: NGS workflow

Data collection and processing – the exome sequencing pipeline

Whole Exome Sequencing

The human genome
◦ Genome = 3 Gb
◦ Exome = 30 Mb
◦ 180 000 exons

Protein-coding genes
◦ Constitute only approximately 1% of the human genome
◦ An estimated 85% of the mutations with large effects on disease-related traits are found in exons or splice sites
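The "approximately 1%" figure follows directly from the genome and exome sizes quoted on this slide. A short worked check, using only the slide's rounded numbers (the variable names are illustrative):

```python
GENOME_BP = 3_000_000_000   # 3 Gb, as quoted on the slide
EXOME_BP = 30_000_000       # 30 Mb, as quoted on the slide
N_EXONS = 180_000           # exon count quoted on the slide

exome_fraction = EXOME_BP / GENOME_BP
mean_exon_bp = EXOME_BP / N_EXONS

print(f"Exome share of the genome: {exome_fraction:.1%}")  # 1.0%
print(f"Mean exon length: {mean_exon_bp:.0f} bp")          # 167 bp
```

The mean exon length is an average implied by the rounded slide figures, not a measured value.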

Exome sequencing

gDNA (3 Gb) → Exome (38 Mb) → NGS

The past, present & future

                      1/01/2010   1/08/2010   1/01/2011
exome capture              1100         860         300
Seq - 2.5 Gbases           5900        2600        1000
total cost                 7000        3460        1300

Exome sequencing capacity

HiSeq specifications:
◦ 2 flow cells
◦ 16 lanes (8 per flow cell)
◦ 200-300 Gbases per flow cell
◦ 10 days for a single run

Exome throughput
◦ 96 exomes @ 60x coverage per run
◦ 3000 exomes @ 60x coverage per year
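A few derived quantities help relate the instrument specifications to the exome throughput. The sketch below uses only the numbers on this slide and assumes, for illustration, that exomes are spread evenly over all lanes and that the run yield is shared equally among them.

```python
FLOW_CELLS = 2
LANES_PER_FLOW_CELL = 8
GB_PER_FLOW_CELL = (200, 300)  # low and high end of the quoted yield
RUN_DAYS = 10
EXOMES_PER_RUN = 96
EXOMES_PER_YEAR = 3000

lanes = FLOW_CELLS * LANES_PER_FLOW_CELL
exomes_per_lane = EXOMES_PER_RUN / lanes

# Raw yield available per exome if the run output is shared equally.
gb_per_exome = tuple(FLOW_CELLS * gb / EXOMES_PER_RUN for gb in GB_PER_FLOW_CELL)

# Number of runs, and instrument days, implied by the yearly figure.
runs_per_year = EXOMES_PER_YEAR / EXOMES_PER_RUN
instrument_days = runs_per_year * RUN_DAYS

print(f"{lanes} lanes, {exomes_per_lane:.0f} exomes per lane")
print(f"{gb_per_exome[0]:.1f}-{gb_per_exome[1]:.1f} Gbases of raw sequence per exome")
print(f"{runs_per_year:.2f} runs, {instrument_days:.1f} instrument days per year")
```

Under these assumptions the figures work out to 6 exomes per lane, roughly 4.2 to 6.3 Gbases of raw sequence per exome, and about 31 runs (312.5 instrument days) per year.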

Data processing workflow

1. Data formatting & QC
2. Mapping & QC
3. Variant calling
4. Variant annotation
5. Variant filtering/comparison


Data processing

DATA STORAGE
DATA GENERATION
DATA PROCESSING
REPORTING & VALIDATION
RESULTS INTERPRETATION

DATA GENERATION

1. Prepare sample library
2. Perform exome capture
3. Perform sequencing

DATA GENERATION
DATA PROCESSING
• Image processing
• Base calling
DATA STORAGE
• Sequence data: 10-15 Gb / exome

NGS data processing: overview

1. Mapping
2. Duplicate marking
3. Local realignment
4. Base quality recalibration
5. Analysis-ready mapped reads
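Of the steps above, duplicate marking is the easiest to illustrate in a few lines. The sketch below is a deliberately simplified version, assuming single-end reads and treating reads with the same chromosome, start position and strand as copies of one template. The `Read` class and `mark_duplicates` function are hypothetical names; production tools apply more elaborate rules.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    chrom: str
    pos: int               # leftmost mapped position
    strand: str            # "+" or "-"
    base_quals: list[int]  # per-base quality scores
    is_duplicate: bool = False


def mark_duplicates(reads: list[Read]) -> None:
    """Flag all but the best read in each (chrom, pos, strand) group."""
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.pos, read.strand)].append(read)
    for group in groups.values():
        # Keep the read with the highest summed base quality; flag the rest.
        best = max(group, key=lambda r: sum(r.base_quals))
        for read in group:
            read.is_duplicate = read is not best


# Toy reads, invented for illustration.
reads = [
    Read("r1", "chr1", 100, "+", [30, 30, 30]),
    Read("r2", "chr1", 100, "+", [35, 35, 35]),  # same start and strand as r1
    Read("r3", "chr1", 100, "-", [20, 20, 20]),  # opposite strand, not a duplicate
    Read("r4", "chr2", 100, "+", [25, 25, 25]),
]
mark_duplicates(reads)
duplicates = [r.name for r in reads if r.is_duplicate]
print(duplicates)  # ['r1']
```

Duplicates are flagged rather than deleted, so later steps can decide whether to ignore them.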

DATA GENERATION
DATA PROCESSING
• Image processing
• Base calling
• QC sequencing
• Mapping sequences
• QC capture experiment
DATA STORAGE
• Sequence data: 10-15 Gb / exome

DATA PROCESSING

• QC NGS
• Mapping
• QC HC

DATA GENERATION
DATA PROCESSING
• Image processing
• Base calling
• QC sequencing
• Mapping sequences
• QC capture experiment
• Variant calling
• Variant annotation
DATA STORAGE
• Sequence data: 10-15 Gb / exome
• Mapping results: 5 Gb / exome

DATA GENERATION
DATA PROCESSING
• Image processing
• Base calling
• QC sequencing
• Mapping sequences
• QC capture experiment
• Variant calling
• Variant annotation
DATA STORAGE
• Sequence data: 10-15 Gb / exome
• Mapping results: 5 Gb / exome
• Variant calls: 100 Mb / exome
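The three storage figures add up to a per-exome footprint, which can be scaled by the yearly throughput of 3000 exomes quoted on the capacity slide. This is a rough estimate only: it takes the slide's units at face value, treats 100 Mb as 0.1 Gb and uses 1000 Gb per Tb.

```python
SEQUENCE_GB = (10, 15)  # raw sequence data per exome, low and high end
MAPPING_GB = 5          # mapping results per exome
VARIANTS_GB = 0.1       # 100 Mb of variant calls per exome

per_exome_gb = tuple(seq + MAPPING_GB + VARIANTS_GB for seq in SEQUENCE_GB)

N_EXOMES = 3000  # yearly throughput from the capacity slide
per_year_tb = tuple(gb * N_EXOMES / 1000 for gb in per_exome_gb)

print(f"Per exome: {per_exome_gb[0]:.1f}-{per_exome_gb[1]:.1f} Gb")
print(f"Per year:  {per_year_tb[0]:.1f}-{per_year_tb[1]:.1f} Tb")
```

With these inputs the estimate is about 15 to 20 Gb per exome and 45 to 60 Tb per year, dominated by the raw sequence data.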

SNPs vs Indels

[Chart: variant counts for INDEL and SNP, axis 0 to 1 200 000]

exonic vs non-exonic

[Chart: variant counts by category, axis 0 to 1 000 000. Categories: stopgain SNV, nonsynonymous SNV, nonframeshift insertion, nonframeshift deletion, non-coding, frameshift insertion, frameshift deletion]

Exonic

[Chart: exonic variant counts by category, axis 0 to 20 000. Categories: synonymous SNV, stoploss SNV, stopgain SNV, nonsynonymous SNV, nonframeshift insertion, nonframeshift deletion, frameshift insertion, frameshift deletion]
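The category labels in the chart above can be derived from the standard genetic code and the length change of an indel. The sketch below is a simplified illustration, assuming the reference and alternate codons are already known and in frame; `classify_snv` and `classify_indel` are hypothetical helper names, not the functions of any annotation tool.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snv(ref_codon: str, alt_codon: str) -> str:
    """Label a single-nucleotide change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous SNV"
    if alt_aa == "*":
        return "stopgain SNV"
    if ref_aa == "*":
        return "stoploss SNV"
    return "nonsynonymous SNV"


def classify_indel(ref: str, alt: str) -> str:
    """Label an insertion or deletion by whether it preserves the reading frame."""
    delta = len(alt) - len(ref)
    if delta == 0:
        raise ValueError("ref and alt have equal length; not an indel")
    kind = "insertion" if delta > 0 else "deletion"
    frame = "nonframeshift" if delta % 3 == 0 else "frameshift"
    return f"{frame} {kind}"


print(classify_snv("GAA", "GAG"))   # synonymous SNV (Glu -> Glu)
print(classify_snv("TGG", "TGA"))   # stopgain SNV (Trp -> stop)
print(classify_indel("A", "ACGT"))  # nonframeshift insertion (+3 bases)
print(classify_indel("ACG", "A"))   # frameshift deletion (-2 bases)
```

A real annotator also has to locate each variant within a transcript and handle strand and splice sites, which this sketch leaves out.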

Exonic

[Chart: exonic variant counts by category, axis 0 to 500. Categories: stoploss SNV, stopgain SNV, nonframeshift insertion, nonframeshift deletion, frameshift insertion, frameshift deletion]

DATA GENERATION
DATA PROCESSING
• Image processing
• Base calling
• QC sequencing
• Mapping sequences
• QC capture experiment
• Variant calling
• Variant annotation
• Variant filtering
DATA STORAGE
• Sequence data: 10-15 Gb / exome
• Mapping results: 5 Gb / exome
• Variant calls: 100 Mb / exome
• Database of known variants (public & private)
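Variant filtering against known variants amounts to a set lookup. The sketch below assumes each variant can be identified by chromosome, position, reference allele and alternate allele; the `Variant` type, the `filter_novel` function and all coordinates are invented for illustration.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_novel(calls: list[Variant], known: set[Variant]) -> list[Variant]:
    """Return the calls that are absent from the known-variant set, in order."""
    return [v for v in calls if v not in known]


# Toy stand-ins for a public and a private (in-house) variant database.
public_db = {Variant("chr1", 1000, "A", "G")}
private_db = {Variant("chr2", 2000, "C", "T")}
known_variants = public_db | private_db

calls = [
    Variant("chr1", 1000, "A", "G"),   # present in the public set
    Variant("chr2", 2000, "C", "T"),   # present in the private set
    Variant("chr3", 3000, "G", "GA"),  # not seen before
]
candidates = filter_novel(calls, known_variants)
print(candidates)
```

Only the third call survives the filter. Exact matching on all four fields is the simplest possible rule and assumes both sides describe variants in the same normalized form.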

DATA GENERATION
DATA PROCESSING
• Image processing
• Base calling
• QC sequencing
• Mapping sequences
• QC capture experiment
• Variant calling
• Variant annotation
• Variant filtering
DATA STORAGE
• Sequence data: 10-15 Gb / exome
• Mapping results: 5 Gb / exome
• Variant calls: 100 Mb / exome
• Database of known variants (public & private)
REPORTING & VALIDATION
RESULTS INTERPRETATION
Validated variants in candidate genes