How will new sequencing technologies enable the HMP? Elaine Mardis, Ph.D. Associate Professor of Genetics Co-Director, Genome Sequencing Center Washington University School of Medicine [email protected]

Upload: sheryl-harrington

Post on 20-Jan-2016


TRANSCRIPT

Page 1:

How will new sequencing technologies enable the HMP?

Elaine Mardis, Ph.D.
Associate Professor of Genetics
Co-Director, Genome Sequencing Center
Washington University School of Medicine

Page 2:

Advantages of Next Gen Platforms

• No sub-cloning, no use of E. coli as host
  - cloning bias abolished
  - one FTE can keep several instruments busy
• Each sequence is from a unique DNA molecule
  - quantitation is possible through “counting”
  - enhanced dynamic range
  - detection of rare variants
• Multiple sequence-based assays on one platform
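The "counting" idea above can be sketched in a few lines: if each read comes from a unique DNA molecule, the share of reads assigned to a taxon estimates its relative abundance. This is a minimal illustration, not the method used in the talk. The taxon labels and the `relative_abundance` helper are hypothetical, and the read assignments are toy data.

```python
from collections import Counter


def relative_abundance(read_labels):
    """Turn per-read labels into relative abundances by simple counting."""
    counts = Counter(read_labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical toy data: one label per read (not from the slides).
reads = ["taxon_A"] * 6 + ["taxon_B"] * 3 + ["taxon_C"] * 1
abundance = relative_abundance(reads)

for label, fraction in sorted(abundance.items()):
    print(f"{label}: {fraction:.2f}")
```

With more reads per sample, rarer taxa accumulate enough counts to be seen, which is the sense in which deeper sequencing extends dynamic range.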

Page 3:

New Sequencing Platforms

• Roche FLX Sequencer
• Illumina 1G Analyzer
• ABI SOLiD Sequencer
• Helicos Single-molecule Sequencer

Page 4:

Roche FLX: Vital Statistics

Currently:
• >100 Mb data / 7 hours / $16K
• Read lengths average 250 bp
• Accuracy is hindered by homopolymer run in/dels
• Coverage model is higher than for 3730 data

By year’s end:
• Improved pipeline and read assembly software
• Paired end reads
• 400 bp read lengths
• Bar-code tagging of libraries

© Elaine Mardis, Ph.D.

Page 5:

Illumina 1G Analyzer: Vitals

Currently:
• 1 Gb / 4 days / $3,000-5,000
• 40 bp read lengths, 8 channel flow cell
• Read accuracy is highest in 1st 25 bp, ~1% overall error rate
• Biased representation of high AT regions

By year’s end:
• Paired end read capability
• 50 bp read lengths
• Improved short read mapping, assembly algorithms (?)

Page 6:

Cross-Platform Comparisons

Criterion          3730       Roche                              Illumina
Platform cost      $350K      $500K                              $395K
Read length        650 bp +   250 bp                             40-50 bp
Cost/run           $55        $16,000                            $3-5,000
Mbp/day            1.4        200                                333
Cost/Mbp           $880       $160                               $5
Accuracy           high       No subs, indels at homopolymers    high
Paired end reads   Yes        Coming                             Yes*
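The Cost/Mbp row follows from cost per run divided by yield per run. The sketch below reproduces that arithmetic using the per-run yields quoted on the platform slides (about 100 Mb for the Roche FLX, 1 Gb for the Illumina 1G at the upper $5,000 run cost). The function names are illustrative, and the 3730 yield is back-calculated from the table rather than stated in the source.

```python
import math


def cost_per_mbp(cost_per_run_usd, yield_mbp_per_run):
    """Sequencing cost per megabase for a single run."""
    return cost_per_run_usd / yield_mbp_per_run


def runs_needed(target_mbp, yield_mbp_per_run):
    """Whole runs required to reach a target amount of sequence."""
    return math.ceil(target_mbp / yield_mbp_per_run)


# Figures taken from the slides.
roche_cost = cost_per_mbp(16_000, 100)        # $16K per ~100 Mb run
illumina_cost = cost_per_mbp(5_000, 1_000)    # upper-end $5,000 per 1 Gb run

# Derived, not stated in the source: per-run yield implied by the
# 3730 column ($55 per run at $880 per Mbp).
implied_3730_yield_mbp = 55 / 880

# Example: Roche runs needed to collect 1 Gb of raw sequence.
roche_runs_for_1gb = runs_needed(1_000, 100)

print(roche_cost, illumina_cost, implied_3730_yield_mbp, roche_runs_for_1gb)
```

These are raw-yield figures only; they ignore library preparation, labor, and the higher coverage the Roche slide says is needed relative to 3730 data.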


Page 7:

AB SOLiD™: Vital Statistics

• 500 Mb-1 Gb / 5 days / ?$$
• 50 base pair read lengths / paired end or fragment reads
• Ligation based sequencing with high accuracy due to 2-base encoding
• Analysis software is unknown
• Early access platform due Q3 of ‘07
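The slide names 2-base encoding without describing it. The sketch below assumes the commonly described four-color dinucleotide scheme, in which each color reports on a pair of adjacent bases and can be computed as the XOR of two 2-bit base codes (A=0, C=1, G=2, T=3). That scheme, the helper names, and the example sequences are assumptions for illustration, not details from the talk.

```python
# Assumed scheme: color = XOR of 2-bit codes of adjacent bases.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colors(seq):
    """Encode a base sequence as one color (0-3) per adjacent base pair."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base, colors):
    """Recover bases from colors, given the known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


reference = "ATGGCA"   # hypothetical reference
variant = "ATGTCA"     # same sequence with one base substituted

ref_colors = encode_colors(reference)
var_colors = encode_colors(variant)
changed = [i for i, (r, v) in enumerate(zip(ref_colors, var_colors)) if r != v]

print(ref_colors, var_colors, changed)
```

Because every base is interrogated by two overlapping colors, a real single-base substitution changes two adjacent colors, while an isolated single-color difference is more likely a measurement error. That redundancy is the usual explanation for the accuracy gain.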

Page 8:

HeliScope sequencer

• Single molecule detection obviates PCR amplification step
• >25 Mbp/hour initial data rate, 1000 Mbp/hour ultimately with <1% error rate
• Short read lengths, single molecule sequencing with high fidelity
• Two 25 channel flow cells
• Read mapping/assembly capability (?)

Page 9:

Comparative metagenomics: Cecal contents of obese mice (ob/ob) and lean littermates

• EXPERIMENTAL DESIGN:
1) Remove cecal contents of 2 ob/ob, 2 +/+, and 1 ob/+ C57Bl/6J mice and isolate DNA.
2) 454 pyrosequencing of total DNA - 350,000 reads/mouse (one ob/ob, one +/+ mouse).
3) Compare data from each mouse to all known bacterial sequences.
4) Use data clustering methods to examine similarities and differences between all 5 mice that were sequenced.
5) Perform microbiota transplantation to test for ability to transfer phenotype to gnotobiotic mice.

Page 10:

Next Gen RNA Sequencing

• Our laboratory has developed a robust full-length cDNA process for 454-based sequencing of eukaryotic transcriptomes that features low input of total RNA, enzyme-based normalization and the ability to preferentially sequence the 5’ ends of cDNAs.
• We presently are working to modify this approach for sequencing microbiotal transcriptomes and clinical isolates likely to contain viral RNA genomes (e.g. nasal lavage samples).

Page 11:

Illumina ‘Mockagenomics’ Experiment

• We created two mock metagenomic samples by combining known bacterial and human genomic DNAs and sequenced them on the Illumina platform to generate short (30 bp) reads.
• We plan to compare the relative strengths of classification by assembly and alignment to those of “signature” characterization (GC content, k-mer analysis) for short read data.
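The two "signature" features named above can be computed directly from a read without any alignment. The sketch below shows one simple way to do that. The function names and the 30 bp read are hypothetical, and the slides do not say how the features were actually computed or used for classification.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty read)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical 30 bp read, matching the read length on the slide.
read = "ATGCGCGATATCGCGCATTAGCGCGATATC"

read_gc = gc_content(read)
read_4mers = kmer_counts(read, 4)

print(f"GC content: {read_gc:.3f}")
print(read_4mers.most_common(3))
```

A 30 bp read yields only 27 overlapping 4-mers, so per-read signatures are noisy. That is one reason to compare them against assembly- and alignment-based classification, as the slide proposes.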

Page 12:

Practical Issues

• DNA quality and quantity
• Value of paired end vs. fragment reads
• Normalization vs. quantitation
• Depth of “search space”

Page 13:

Sample prep

Fragment reads:
• Evaluate DNA
• Fragment (2-500bp)
• Repair ends
• Adapter ligate
• Enrich
• Amplify on bead (Roche/AB) or on glass slide (Illumina)

Paired end reads:
• Evaluate DNA
• Fragment (2.5kb)
• Repair ends
• Adapter ligate
• Methylate
• Restrict adapters
• Circularize
• 2° restriction with type IIS enzyme
• Purify tags+adapter
• Amplify

Page 14:

Paired End Libraries

[Figure: Mate Pair Library. Labels: Internal Adapter; 25 base Tag #1; 25 base Tag #2; EcoP15I or fragmentation]

Page 15:

Sequencing: 3-primer PE method

[Figure: paired-end sequencing schematic. Recoverable labels:]
• Graft: P7:P7diol:9TUP5, with [P7+P7diol] = [9TUP5]
• P7diol & 9TUP5 linearisable; P7 non-linearisable
• Cluster formation: heterogeneous clusters containing P7/9TUP5 bridges and P7diol/9TUP5 bridges
• Read 1 (25 to 40 cycles), Read 2 (25-40 cycles); total 50-80 cycles
• Other labels: PESP#1, PESP#2; SBS8, SBS3; NaIO4, USER

Page 16:

What are the issues?

• Consented sample availability!!
• Read length and accuracy
• Sample complexity
• Sensitivity to detect
• Coverage and cost
• DNA vs. RNA
• Bioinformatics-based analyses

Page 17:

Bioinformatics Challenges

• Most daunting issue: the ability to analyze enormous data sets intelligently and efficiently

• Metagenomic analysis tools are now emerging for next gen sequence data

• Testing and implementation into analysis pipelines will follow

• Output is only as good as the depth of the search space and the depth of coverage for any given combination of sample & sequencer
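The dependence on depth of coverage can be made concrete with the standard back-of-envelope estimate: expected coverage is total sequenced bases divided by genome size, scaled by the fraction of the sample that belongs to the organism of interest. The sketch below is a simplification under stated assumptions. The 350,000 reads figure comes from the mouse cecal experiment, while the 250 bp read length (the FLX average quoted earlier), the 5 Mb genome size, and the 1% abundance are hypothetical inputs chosen for illustration.

```python
def expected_coverage(n_reads, read_len_bp, genome_size_bp, abundance_fraction=1.0):
    """Average fold coverage of one genome within a mixed sample.

    abundance_fraction is the share of sequenced bases assumed to come
    from that genome (a simplification of real community structure).
    """
    return n_reads * read_len_bp * abundance_fraction / genome_size_bp


# 350,000 reads is from the slides; the other inputs are assumptions.
rare_member = expected_coverage(
    n_reads=350_000,
    read_len_bp=250,
    genome_size_bp=5_000_000,
    abundance_fraction=0.01,
)
dominant_member = expected_coverage(
    n_reads=350_000,
    read_len_bp=250,
    genome_size_bp=5_000_000,
    abundance_fraction=0.50,
)

print(f"1% member:  {rare_member:.3f}x")
print(f"50% member: {dominant_member:.2f}x")
```

Under these assumed inputs a community member at 1% abundance is sampled at well below 1x, which illustrates why the choice of sequencer and run count has to be matched to sample complexity.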
