
© 2013 Illumina, Inc. All rights reserved. Illumina, IlluminaDx, BaseSpace, BeadArray, BeadXpress, cBot, CSPro, DASL, DesignStudio, Eco, GAIIx, Genetic Energy, Genome Analyzer, GenomeStudio, GoldenGate, HiScan, HiSeq, Infinium, iSelect, MiSeq, Nextera, NuPCR, SeqMonitor, Solexa, TruSeq, TruSight, VeraCode, the pumpkin orange color, and the Genetic Energy streaming bases design are trademarks or registered trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners.

Introduction to the MiSeq: Technology and Applications

Joseph Aman

Field Applications Scientist

2

Introduction to the MiSeq

Sample preparation kits

– DNA

– RNA

16S Sequencing Overview

Illumina sequencing portfolio update

Overview

3

MiSeq System

Proven Pedigree – Bench top Friendly

4

Illumina Sequencing Workflow

Data Analysis

Sequencing

Cluster Generation

Library Preparation

5

Library Preparation

mRNA

Small RNA

Other Apps ChIP-Sequencing

Genomic DNA Active Chromatin

6

The aim of the sample prep step is to obtain nucleic acid fragments with adapters attached on both ends

Sample Prep is Critical for Successful Sequencing

Dual Index Library shown

7

The flow cell

Everything except sample preparation is completed on the flow cell

• Template annealing (1 - 96 samples)

• Template amplification

• Sequencing primer hybridization

• Sequencing-by-synthesis reaction

• Generation of fluorescent signal

8

Cluster Generation

Bind single DNA molecules to surface

Amplify on surface

~1000 molecules per ~1 µm cluster

9

MiSeq: Industry Best Data Quality, Paired-End Reads, High Output, and Flexible Read Length

Read Length | Quality Scores >Q30 | Data Output
1x36 bp | >90% bases | 610 Mb
2x25 bp | >90% bases | 850 Mb
2x100 bp | >85% bases | 3.4 Gb
2x150 bp | >80% bases | 5.1 Gb
2x250 bp | >75% bases | 8.5 Gb
2x300 bp | >75% bases | 15 Gb
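The output figures on this slide are consistent with simple arithmetic: output ≈ clusters passing filter × reads per cluster × read length. As a rough sketch (the function name and table layout are illustrative assumptions, not part of any Illumina software), the cluster count implied by each quoted figure can be backed out like this:

```python
# Output figures as quoted on the slide:
# configuration -> (reads per cluster, read length in bp, output in bases).
SLIDE_OUTPUT = {
    "1x36": (1, 36, 610e6),
    "2x25": (2, 25, 850e6),
    "2x100": (2, 100, 3.4e9),
    "2x150": (2, 150, 5.1e9),
    "2x250": (2, 250, 8.5e9),
    "2x300": (2, 300, 15e9),
}


def implied_clusters(n_reads, read_len, output_bases):
    """Clusters implied by: output = clusters * n_reads * read_len."""
    return output_bases / (n_reads * read_len)


if __name__ == "__main__":
    for config, (n_reads, read_len, output) in SLIDE_OUTPUT.items():
        millions = implied_clusters(n_reads, read_len, output) / 1e6
        print(f"{config}: ~{millions:.0f} M clusters")
```

On these numbers, every configuration up to 2x250 implies roughly 17 million clusters, and the 2x300 figure implies about 25 million.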

10

MiSeq Applications

PCR

Amplicon (TruSight

Panels)

Targeted

RNA

Expression

TruSeq

Amplicon

11

Illumina Sequencing Publications

More than 5,200 publications using Illumina’s SBS technology

Greater than 85% of all SRA projects done on MiSeq

[Chart: projects in NCBI's Sequence Read Archive (SRA) as of 01-02-14, by platform (MiSeq, PGM, GS Jr, Proton): 13,473 MiSeq projects; shares shown are 86%, 8%, 5%, and 0.3%]

[Chart: publications per year, 2007 to 2013]

12

Illumina Sequencing

TruSeq DNA

Nextera Mate Pair

Nextera DNA

Nextera XT

TruSeq Custom Amplicon

TruSeq Amplicon

Cancer Panel

Nextera Enrichment

Custom/Exome

TruSeq ChIP

TruSeq RNA v2

TruSeq Targeted RNA

TruSeq Stranded mRNA / Total RNA

TruSeq small RNA

Illumina Sample Preparation Portfolio

13

Questions to ask yourself

What kind of organism/s am I working with?
– Is there a genome or transcriptome reference? Do I need to make one?
– What level of ploidy does it have? 1n, 2n, 4n, 8n?
– How big is the genome/region of interest?
– Does it have a balanced base composition?

What kind of samples do I have access to?
– Fresh? Frozen? FFPE? Dried? Ancient? Environmental?
– Quantity. How much nucleic acid starting material can I get? Do I only get one biological replicate?
– Quality. Will my starting material be degraded or contain potential contaminants?

How many samples do I need?
– What number of samples will I need to be confident of my discovery? What false positive/false negative rates will I tolerate?
– How much biological/technical replication is prudent?

What are my limitations?
– Time, budget, manpower, skills, tools, access to instrumentation
– Do I need to operate within a regulatory framework?
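The question "How big is the genome/region of interest?" feeds directly into run planning, because average depth of coverage is total sequenced bases divided by target size. The sketch below is a simplified estimate only: it assumes every sequenced base is usable and that samples are pooled perfectly evenly, and the 5 Mb genome and 50x depth are hypothetical inputs chosen for illustration.

```python
def mean_coverage(yield_bases, genome_size_bp):
    """Average depth of coverage: total sequenced bases / target size."""
    return yield_bases / genome_size_bp


def samples_per_run(yield_bases, genome_size_bp, target_depth):
    """Number of evenly pooled samples that each reach target_depth in one run."""
    return int(yield_bases // (genome_size_bp * target_depth))


if __name__ == "__main__":
    run_yield = 5.1e9    # 2x150 output quoted earlier in the deck
    genome_size = 5e6    # hypothetical 5 Mb bacterial genome
    print(mean_coverage(run_yield, genome_size))        # depth if one sample uses the run
    print(samples_per_run(run_yield, genome_size, 50))  # samples at 50x each
```

Real planning would also discount for duplicates, low-quality reads, and uneven pooling, so treat the result as an upper bound.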

14

Illumina DNA Sequencing Applications Portfolio: The growing family of sample prep

DNA Sample Prep

Kit | Summary | Time | Apps | Input | Indexing | Quality
TruSeq Nano DNA | The new gold standard with low input | ~6 h | WGRS | 100 ng+ | 96 | Best!
TruSeq DNA PCR-Free | Eliminates PCR-induced bias | ~5 h | WGRS | 1 µg+ | 96 | Best!
Nextera Mate Pair | Long-insert; gel-free | 1.5 d | De novo, SVs | 1 µg+ | 48 (gel-free) | +++
Nextera | Low-input, fast, MiSeq | 1.5 h | WGRS | 50 ng | 96 | +++
Nextera XT | Lowest-input, fast, MiSeq, small genomes | 1.5 h | Amplicons, plasmids, small genomes | 1 ng | 96 | +++
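One practical way to read the DNA sample prep comparison is to filter kits by how much DNA is available. The following is a minimal sketch using only the input amounts quoted on the slide; the dictionary and function names are illustrative, and a real choice would also weigh application, time, and indexing needs.

```python
# Minimum DNA input per kit in nanograms, as quoted on the slide.
KIT_MIN_INPUT_NG = {
    "TruSeq Nano DNA": 100,
    "TruSeq DNA PCR-Free": 1000,
    "Nextera Mate Pair": 1000,
    "Nextera": 50,
    "Nextera XT": 1,
}


def kits_for_input(available_ng):
    """Return kits whose quoted minimum input is met, sorted by name."""
    return sorted(
        kit for kit, min_ng in KIT_MIN_INPUT_NG.items() if available_ng >= min_ng
    )


if __name__ == "__main__":
    print(kits_for_input(60))    # only the low-input Nextera kits qualify
    print(kits_for_input(1000))  # every kit qualifies
```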

15

TruSeq DNA Sample Prep Workflow

DNA RNA

Gel size selection, if needed

16

Nextera and Nextera XT Sample Prep Kit Features

Highlights
• Fast sample prep, 90 min.
• Single read or paired-end compatibility

Sample input and indices
• 50 ng (Nextera) or 1 ng (Nextera XT) DNA per sample
• Index up to 96 samples
• Parallel processing of up to 96 samples

Specific considerations
• Construct completed at the PCR step
• Gel-free protocol
• Enzymatic fragmentation

Suitable for:
• Large and small whole genomes
• Now used for all enrichment workflows
• Nextera Rapid Enrichment!

17

p7

Index 1 Read 2 Sequencing Primer

p5

Index 1 Read 2 Sequencing Primer

Transposons

Genomic DNA

~ 300 bp

Tagmentation

Reduced-Cycle

PCR Amplification

p7 Index 1 Rd2 SP Rd1 SP p5 Index 2

Sequencing-Ready Fragment

Nextera DNA Sample Prep

Enrichment

18

TruSeq Custom Amplicon Assay Time Go from DNA to called variants in ~2 days

DesignStudio for custom panels; fixed panels also available

1536 amplicons,

96 samples per plate

Automated

sequencing

Day 1

Receive custom

oligos;

Hybridization setup

Day 1

Assay

biochemistry

Day 1-2

Cluster gen and

sequencing

on MiSeq

Day 2

Finished at 5:00PM

Real-time

analysis

Simple, efficient,

automatic data analysis

and variant calling

19

Nextera® Mate-Pair Sample Prep Kit Industry’s only gel-free protocol, lowest DNA input, ideal for de novo seq

Enables wide-range of whole-genome apps

– De novo assembly of small genomes & complex

genomes (cancer)

– Genome finishing: ‘close-to-finished’ ref. genomes from

single library type

– Samples with limited input DNA (metagenomics)

– Detection of structural variation

Optimized combination of Nextera and TruSeq DNA Sample Prep

Gel-Free option: 3-15 kb gap size

– 1 µg input; 1.5 days; 3 hrs hands-on time

– Gel+ option for more refined gap size, 4 µg input

20

TruSeq ChIP Seq - How does this work?

TS ChIP Sample Prep

1. Cross-link with formalin, shear

2. Remaining DNA protected by proteins

3. Immunoprecipitate with antibody against target protein of interest

4. Reverse crosslinks, use DNA as input for library generation

5. Sequence; reads indicate where protein was bound

Figure from Szalkowski, A.M., and Schmid, C.D. (2010). Rapid innovation in ChIP-seq peak-calling algorithms is outdistancing benchmarking efforts. Briefings in Bioinformatics.

21

TruSeq RNA

TruSeq Stranded RNA

Stranded mRNA

Stranded Total RNA

TruSeq RNA v2

mRNA

TruSeq Small RNA

Small RNA

Illumina RNA Portfolio

22

Sample Prep Solutions for RNA from Discovery to

Validation Transcriptome to Targeted

Whole Transcriptome Analysis

RNA-seq for Expression

Profiling

TruSeq

Targeted

RNA

TruSeq RNA-Seq Portfolio

TruSeq Stranded mRNA

TruSeq Stranded Total RNA w/RiboZero

• HRM

• Gold

• Plant

• Globin

TruSeq RNA

TruSeq Targeted RNA Expression

TruSeq Small RNA

23

TruSeq mRNA Sample Prep Workflow

RNA

24

TruSeq Total RNA Sample Prep Workflow

Total RNA

Add rRNA Removal Solution

Add rRNA Removal Beads

Remove RRB-bound rRNA

Ribosomal Depleted RNA

25

Accurate targeting of virtually the entire transcriptome for Human, Mouse, Rat

– Assay specific gene families including alternative isoforms

Individual exons and splice junctions

cSNP detection for allele specific expression

– Non-coding RNA transcripts

Validation of over 10,000 assay designs

Add custom content to Fixed Panels

TruSeq Targeted RNA Create custom panels, select pre-validated fixed, or add-on custom

Pre-Validated Fixed Panels

Immune Response Cardiotox

Lung Cancer Apoptosis

Breast Cancer Neuro Panel

Stem Cell Prostate Cancer

P53 Pathway Wnt Pathway

Cytochrome P450 Cell Cycle

NFκB Pathway Hedgehog

Design Studio: Custom Panel Creation

26

Rapid Workflow

Sample to answer in 1.5 days with < 4 hrs hands-on time

Sample prep for 48-384 samples per run

Single MiSeq run equivalent to 15,000 qPCR reactions or 40 384-well plates

On instrument analysis with MSR

Modified from existing DASL Sample Prep Chemistry

27

TruSeq Small RNA Workflow

Interrogate regulatory miRNAs

Starts directly from total RNA: 1.0 µg or less input

Pre-pool samples for single gel excision step


16S Metagenomics Analysis on the MiSeq System

29

Metagenomics & Microbiomics The unexplored living world

Metagenomics is the analysis of genomic DNA from a whole community

– 16S rDNA

– Whole Genome

– Targeted Gene

Survey of microorganisms in specific environments

– Soil

– Aqueous / Marine

– Medical / Ag

What can we learn

– Taxonomic diversity (‘who is there’),

– Physiology (‘what are they doing’)

– Gene discovery

> 99% of the bacteria present in nature are non-culturable

Operational Taxonomic Unit - OTU

sciencemolecular.blogspot.com/

Metagenomics

30

Approaches to Study the Microbiome and Metagenomics

31

16S Ribosomal RNA Gene is conserved across all bacteria

32

Biological question asked?

How many samples per run?

ROI amplicon size?

Data analysis?

Choosing a 16S protocol: Four Questions to answer

33

Available options:

Nextera XT Protocol
• <96 samples
• For amplicons >400 bp
• MSR/BaseSpace Analysis

Illumina Demonstrated Protocol, 2x PCR
• <96 samples
• Targets regions <400 bp
• MSR/BaseSpace Analysis

Caporaso et al. Protocol (ISME)
• >96 samples, up to 2167 barcodes
• Targets 16S V4 region; ~300 bp
• Export FASTQ to QIIME
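The three options can be read as a small decision rule driven by sample count and amplicon size. The sketch below is a deliberate simplification under stated assumptions: it treats exactly 96 samples as fitting a 96-index run and exactly 400 bp as falling on the Demonstrated Protocol side, neither of which the slide specifies, and it ignores the biological question and the analysis tooling.

```python
def choose_16s_protocol(n_samples, amplicon_bp):
    """Pick one of the three slide options from sample count and amplicon size.

    Boundary handling (96 samples, 400 bp) is an assumption for illustration.
    """
    if n_samples > 96:
        # Only the Caporaso et al. protocol is listed for more than 96 samples.
        return "Caporaso et al. protocol"
    if amplicon_bp > 400:
        return "Nextera XT protocol"
    return "Illumina Demonstrated Protocol (2x PCR)"


if __name__ == "__main__":
    print(choose_16s_protocol(48, 1500))  # long amplicon, few samples
    print(choose_16s_protocol(48, 300))   # short amplicon, few samples
    print(choose_16s_protocol(500, 300))  # many samples
```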

34

Option #1 - Illumina Demonstrated Protocol

35

Taxonomic path | Forward primer | Reverse primer | Total number of sequences found | Number of sequences w/o sequence data at primer start | Number of matches | Coverage (%) | Start | End | ~Amplicon (bp) | Source | Mis | HVR
silva;Bacteria | S-D-Bact-0343-a-S-15 | S-D-Bact-0785-a-A-18 | 282323 | 56 | 236961 | 83.9 | 343 | 802 | 459 | Table 17 | 0 | V3-V4
silva;Bacteria | S-D-Bact-0343-a-S-15 | S-D-Bact-0785-a-A-18 | 282323 | 56 | 262442 | 93.0 | 343 | 802 | 459 | Table 18 | 1 | V3-V4
silva;Bacteria | S-D-Bact-0343-a-S-15 | S-D-Bact-0785-b-A-18 | 282323 | 56 | 238918 | 84.6 | 343 | 802 | 459 | Table 17 | 0 | V3-V4
silva;Bacteria | S-D-Bact-0343-a-S-15 | S-D-Bact-0785-b-A-18 | 282323 | 56 | 262456 | 93.0 | 343 | 802 | 459 | Table 18 | 1 | V3-V4
silva;Bacteria | S-D-Bact-0347-a-S-19 | S-D-Bact-0787-b-A-20 | 282323 | 56 | 257513 | 91.2 | 347 | 806 | 459 | Table 18 | 1 | V3-V4
silva;Bacteria | S-D-Bact-0371-a-S-20 | S-D-Bact-0787-b-A-20 | 282323 | 48 | 234238 | 83.0 | 371 | 806 | 435 | Table 17 | 0 | V3-V4
silva;Bacteria | S-D-Bact-0371-a-S-20 | S-D-Bact-0787-b-A-20 | 282323 | 48 | 258210 | 91.5 | 371 | 806 | 435 | Table 18 | 1 | V3-V4
silva;Bacteria | S-D-Bact-0371-a-S-20 | S-D-Bact-0785-a-A-21 | 282323 | 48 | 233701 | 82.8 | 371 | 805 | 434 | Table 17 | 0 | V3-V4
silva;Bacteria | S-D-Bact-0371-a-S-20 | S-D-Bact-0785-a-A-21 | 282323 | 48 | 256593 | 90.9 | 371 | 805 | 434 | Table 18 | 1 | V3-V4
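The slide does not define how the coverage and amplicon columns were computed. A reading that reproduces the tabulated values is: coverage = matches / (sequences found minus sequences without data at the primer start), and amplicon length = End minus Start. The sketch below encodes that inferred reading; the function names are illustrative.

```python
def primer_coverage(total_found, without_primer_data, matches):
    """Percent of evaluable sequences matched by the primer pair (inferred formula)."""
    evaluable = total_found - without_primer_data
    return 100.0 * matches / evaluable


def amplicon_length(start, end):
    """Approximate amplicon length from the Start and End positions."""
    return end - start


if __name__ == "__main__":
    # First row of the table: 0 mismatches allowed.
    print(round(primer_coverage(282323, 56, 236961), 1))
    # Second row: same primer pair with 1 mismatch allowed.
    print(round(primer_coverage(282323, 56, 262442), 1))
    print(amplicon_length(343, 802))
```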

36

Optimal Primers for MiSeq Sequencing – 341F and 785R

37

Three main steps

PCR #1
• Amplify genomic ROI
• “Tails” on PCR primers

PCR #2
• Amplify amplicons from PCR #1 using indexed adapter oligos from ILMN
• Produces barcoded amplicons ready for MiSeq
• Pool up to 96 samples

MiSeq
• Sequence 96-sample pooled library in a single MiSeq run
• MSR demultiplexes data to uniquely assign reads to samples
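Demultiplexing a dual-indexed pool amounts to looking up each read's pair of index sequences in the sample sheet. The sketch below illustrates the idea only and is not how MSR is implemented: the index sequences and sample names are made up, and it requires exact matches on both indices, whereas production demultiplexers may tolerate mismatches.

```python
from collections import defaultdict

# Hypothetical sample sheet: (Index 1, Index 2) -> sample name.
SAMPLE_SHEET = {
    ("ATCACG", "TTAGGC"): "sample_A",
    ("ATCACG", "ACTTGA"): "sample_B",
    ("CGATGT", "TTAGGC"): "sample_C",
}


def demultiplex(reads, sample_sheet):
    """Group read sequences by sample using exact matches on both index reads."""
    bins = defaultdict(list)
    for index1, index2, sequence in reads:
        sample = sample_sheet.get((index1, index2), "undetermined")
        bins[sample].append(sequence)
    return dict(bins)


if __name__ == "__main__":
    reads = [
        ("ATCACG", "TTAGGC", "ACGT"),
        ("ATCACG", "ACTTGA", "GGCC"),
        ("GGGGGG", "TTAGGC", "TTTT"),  # unknown Index 1
    ]
    for sample, sequences in demultiplex(reads, SAMPLE_SHEET).items():
        print(sample, sequences)
```

Note that sample_A and sample_B share Index 1 and are separated only by Index 2, which is why both index reads are needed to identify a sample uniquely.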

38

Step ① 1st round PCR to amplify region-of-interest (ROI)

DNA

F

R

Locus-specific sequence

Overhang adapter sequence used in Step 2

cleanup

39

Step ② 2nd round PCR to add indices + adapters

Index adapter oligos from ILMN: contain P5/P7 adapters to make template compatible with flow cell, also contain a unique sample index

[Diagram: P5, Index 1, insert to be sequenced, Index 2, P7]

cleanup

40

Step ③ Pool and sequence on MiSeq

Sequencing order on MiSeq system

– Read 1 – sequence amplicons in forward direction, up to 250 nt

– Index 1 – read first barcode

– Index 2 – read second barcode (software can now uniquely identify the sample)

– Read 2 (optional) – sequence amplicons in reverse direction, up to 250 nt

[Diagram: P5, Index 1, insert to be sequenced, Index 2, P7, with Read 1, Read 2 (optional), Index 1, and Index 2 reads marked]
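When both reads are sequenced, whether they overlap in the middle of the amplicon depends on read length versus amplicon length. As a back-of-the-envelope sketch, assuming both reads start at the amplicon ends and run their full length (quality trimming would shorten the usable overlap), the ~459 bp amplicon from the primer table gives:

```python
def paired_end_overlap(amplicon_bp, read_len):
    """Bases covered by both reads; a negative value means a gap between them."""
    return 2 * read_len - amplicon_bp


if __name__ == "__main__":
    print(paired_end_overlap(459, 250))  # 2x250 reads on a ~459 bp amplicon
    print(paired_end_overlap(459, 150))  # 2x150 reads leave a gap
```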

41

Option #2 - Nextera XT

42

16S Ribosomal RNA Gene is conserved across all bacteria

43

A. Nextera XT transposome with adapters is combined with template DNA

B. Tagmentation to fragment and add adapters

C. Limited-cycle PCR to add sequencing primer sequences and indices

NexteraXT design

44

Alternative methods allow increased multiplexing on MiSeq and HiSeq

However, more modifications are required…

Method for Multiplexing >96 samples

Option #3 - the Caporaso et al method (ISME 2012)

45

16S metagenomics using the Caporaso et al. (2012) ISME protocol

Libraries are prepared by direct amplification with primer sets that amplify the region (V4 in this case) and add the clustering and sequencing primer regions.

Protocols developed for both the MiSeq and HiSeq platforms; available on the Earth Microbiome Project (EMP) web site (www.earthmicrobiome.org).

Average 15M reads on the MiSeq (2x150) and 70M on the HiSeq (2x100).

Employs a 12-base index tag (Golay code).

46

MSR/BaseSpace Classification Summary Output

47

Collaboration with San Diego Zoo Global: Conservation of the Desert Tortoise

Wildlife Disease Laboratories

– Bruce Rideout (Director)

– Josephine Braun (Scientist)

– Goal is to remove disease as a roadblock to conservation

– Carry out the following tasks for the animals at the San Diego Zoo, San Diego Zoo Safari Park, and field conservation programs:

Disease surveillance efforts

Diagnostic test development

Outbreak investigations

48

Desert Tortoise 1 - Healthy

Species Desert Tortoise

Individual 18919

Disease Severity 0 HEALTHY CONTROL

Sample Name DNA-732

Origin Nasal flush

Total # Sequences PF 322,916

Unclassified (Chorny & Probert) 44,769

Classified (Chorny & Probert) 278,147

Rank | Species | Total # sequences in sample | Total % sequences in sample | Cumulative % total | Notes
1 | Calothrix parietina | 106,004 | 38.11% | 38.1% | Blue-green algae on rocks.
2 | Rickettsia sp. | 15,631 | 5.62% | 43.7% | Tick, flea, lice borne - some diseases.
3 | Acinetobacter sp. | 5,764 | 2.07% | 45.8% | Soil bacteria.
4 | Symploca atlantica | 4,917 | 1.77% | 47.6% | Algae.
5 | Proteobacteria | 4,109 | 1.48% | 49.0% | Variable. Some pathogens known. Needs more detail.
6 | Thiomonas thermosulfata | 3,847 | 1.38% | 50.4% | Extremophile.
7 | Bacteria | 3,789 | 1.36% | 51.8% | ??
8 | Blautia sp. | 3,705 | 1.33% | 53.1% | Gut bacteria.
9 | Blautia coccoides | 3,330 | 1.20% | 54.3% | Gut bacteria.
10 | Pedobacter | 2,881 | 1.04% | 55.4% | Soil bacteria.
Total | | 153,977 | | |

Other Species of Note
Mycoplasma agassizii | 138 | 0.05%
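The percentage columns in this table appear to be computed against the classified read count (278,147) rather than the total passing-filter count; that reading is inferred because it reproduces the slide's values. A minimal sketch of the ranking, percentage, and cumulative-percentage calculation, using the top three taxa from the healthy control (function name illustrative):

```python
def abundance_table(counts, classified_total):
    """Rank taxa by read count; return (rank, taxon, count, percent, cumulative percent)."""
    rows = []
    cumulative = 0
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for rank, (taxon, count) in enumerate(ranked, start=1):
        cumulative += count
        rows.append((
            rank,
            taxon,
            count,
            100.0 * count / classified_total,
            100.0 * cumulative / classified_total,
        ))
    return rows


if __name__ == "__main__":
    counts = {
        "Rickettsia sp.": 15631,
        "Calothrix parietina": 106004,
        "Acinetobacter sp.": 5764,
    }
    for rank, taxon, count, pct, cum in abundance_table(counts, 278147):
        print(f"{rank} {taxon} {count:,} {pct:.2f}% {cum:.1f}%")
```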

49

Desert Tortoise 6 - Diseased

Species Desert Tortoise

Individual 16025

Severity 4 DISEASED

Sample Name DNA-1283

Origin Nasal flush

Total # Sequences 198,672

Unclassified (Chorny & Probert) 23,077

Classified (Chorny & Probert) 175,595

Rank | Species | Total # sequences in sample | Total % sequences in sample | Cumulative % total | Notes
1 | Mycoplasma agassizii | 31,924 | 18.2% | 18.2% | Proven as an etiologic agent of URTD in Desert Tortoise (Brown et al., 1994).
2 | Myroides odoratus | 11,654 | 6.6% | 24.8% |
3 | Flavobacteriaceae | 11,481 | 6.5% | 31.4% |
4 | Chelonobacter sp. | 11,384 | 6.5% | 37.8% | Associated with diseased tortoises (Gregersen et al., 2009).
5 | Pedobacter sp. | 7,894 | 4.5% | 42.3% | Soil bacteria.
6 | Flavobacterium swingsii | 5,492 | 3.1% | 45.5% |
7 | Proteus penneri | 5,263 | 3.0% | 48.5% | Found in intestinal tract. Invasive pathogen. MDR. Infects urinary tract.
8 | Pseudomonas brenneri | 4,538 | 2.6% | 51.0% | Water borne. Biofilms.
9 | Pseudomonas sp. | 4,193 | 2.4% | 53.4% | Water borne. Biofilms.
10 | Pseudomonas marginalis | 4,160 | 2.4% | 55.8% | Water borne. Biofilms.
Total | | 97,983 | | |

50

Desert Tortoise 8 - Diseased

Species Desert Tortoise

Individual 13498

Severity 4 DISEASED

Sample Name DNA-1300 (S10)

Origin Nasal flush

Total # Sequences PF 271,248

Unclassified (Chorny & Probert) 25,323

Classified (Chorny & Probert) 245,925

Rank | Species | Total # sequences in sample | Total % sequences in sample | Cumulative % total | Notes
1 | Chelonobacter sp. | 141,734 | 57.6% | 57.6% | Associated with diseased tortoises (Gregersen et al., 2009).
2 | Chelonobacter oris | 37,883 | 15.4% | 73.0% | Associated with diseased tortoises (Gregersen et al., 2009).
3 | Mycoplasma agassizii | 7,639 | 3.1% | 76.1% | Proven as an etiologic agent of URTD in Desert Tortoise (Brown et al., 1994).
4 | Granulicatella adiacens | 5,825 | 2.4% | 78.5% | Normal commensal of human mucosal membranes.
5 | Flavobacteriaceae | 5,443 | 2.2% | 80.7% | Water borne.
6 | Myroides odoratus | 4,855 | 2.0% | 82.7% | Human nosocomial infection.
7 | Pedobacter sp. | 4,108 | 1.7% | 84.4% | Soil bacteria.
8 | Deinococcus sp. | 2,051 | 0.8% | 85.2% | World's Toughest Bacteria. Soil, water.
9 | Flavobacterium swingsii | 1,800 | 0.7% | 85.9% | Water borne.
10 | Proteus penneri | 1,625 | 0.7% | 86.6% | Found in intestinal tract. Invasive pathogen. MDR. Infects urinary tract.
Total | | 212,963 | | |

51

Summary: Nasal Microbiome of Desert Tortoises

Mycoplasma agassizii is present in the majority of tortoises

– 8 of 9 diseased Desert Tortoises contain the pathogen

– In 7 of 9 diseased Desert Tortoise nasal flush samples, this pathogen is in the top ten bacteria

Tortoise-16025: number one bacterium (18.2%)

Tortoise-18963: number two bacterium (13.9%)

Tortoise-13498: number three bacterium (3.1%)

– Previously recognized as etiologic agent of URTD in Desert Tortoise

Chelonobacter sp. is also present in many of the tortoises

– In 8 of 10 Desert Tortoise nasal flush samples, this bacterium is in the top ten

– Chelonobacter previously linked/associated with URTD

Healthy Control Desert Tortoise is dominated by a diverse microbiome of soil and water-borne bacteria

– Points to “healthy” microbiome constituent species?

– Healthy Control Desert Tortoise also contains a trace of M. agassizii, at 0.05%!

52

New Illumina Sequencing Portfolio

MiSeq (Focused Power): Speed and simplicity for targeted and small genome sequencing

NextSeq 500 (Flexible Power): Speed and simplicity for personal scale genomics

HiSeq 2000/2500 (Production Power): Power and efficiency for large-scale genomics

HiSeq X Ten (Population Power): $1000 human genome and extreme throughput for population-scale sequencing

53

Introducing NextSeq 500

A new sequencer that combines high throughput NGS applications with the speed, ease of use and affordability of a desktop sequencer

The most flexible applications of any desktop sequencer

– Exome, transcriptome, whole genome sequencing in a single run

– Industry-leading SBS chemistry: >75% >Q30, no homopolymer issues

Sample size flexibility

– 2 output modes: high and mid flow cells and reagents

Push-button simplicity

– Load & Go workflows

– Integrated sample-to-results solution: streamlined informatics on-premise or in cloud

Accessible affordability

– Runs starting at $1,000 (Human Genome ~$4k)

– System list price $250K

54

Fast Applications

2 x 150 bp: Human Genome, 30 hours

2 x 75 bp: Exome | T-ome, 18 hours

1 x 75 bp: NIPT | GEx, 12 hours

55

One System, Two output modes

High-Output: Up to 120 Gb; 400M clusters PF; 1 x 75 bp to 2 x 150 bp

– 20 GEX profiles

– NIPT

– 30x genome

– 6-12 exomes

– RNA-Seq

Mid-Output: Up to 40 Gb; 130M clusters PF; 2 x 75 bp to 2 x 150 bp

– 6-36 panels

– 2-3 exomes

– 2-4 samples RNA-Seq