
FIND MEANING IN COMPLEXITY

© Copyright 2014-2015 by Pacific Biosciences of California, Inc. All rights reserved.

For Research Use Only. Not for use in diagnostic procedures.

Tyson A. Clark, Ph.D. February 11, 2015

Iso-Seq™ Method: Sample Prep and Experimental Design for Full-Length cDNA Sequencing


Outline

1. Introduction to Full-length cDNA Sequencing

2. The Iso-Seq™ Method

3. Size Selection

4. Applications

5. Experimental Design Considerations

Transcript Diversity

Drosophila DSCAM Gene – 38,000 Isoforms


Schmucker D, et al. 2000. Cell 101:671–684
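The 38,000 figure comes from combinatorial, mutually exclusive splicing. The minimal Python sketch below multiplies the cluster sizes to recover the total. It assumes the exon-cluster sizes reported in Schmucker et al. (2000): 12 alternatives for exon 4, 48 for exon 6, 33 for exon 9, and 2 for exon 17. These counts are not given on the slide.

```python
from math import prod

# Alternative exons per mutually exclusive cluster in Drosophila Dscam,
# as reported by Schmucker et al. (2000); these counts are not on the slide.
DSCAM_CLUSTERS = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}


def isoform_count(clusters: dict[str, int]) -> int:
    """Each transcript includes exactly one exon from every cluster, so the
    number of possible isoforms is the product of the cluster sizes."""
    return prod(clusters.values())


print(isoform_count(DSCAM_CLUSTERS))  # 38016
```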

One Gene, Two Isoforms with Opposite Effects

The bcl-x gene produces two mRNA isoforms:

• bcl-xL: inhibits cell death

• bcl-xS: activates cell death

Current State of Transcript Assembly

“The way we do RNA-seq now is… you take the transcriptome, you blow it up into pieces and then you try to figure out how they all go back together again… If you think about it, it’s kind of a crazy way to do things.”

Michael Snyder, Stanford University

Tal Nawy (2013) End-to-end RNA sequencing, Nature Methods 10: 1144–1145

Ian Korf (2013) Genomics: the state of the art in RNA-seq analysis. Nature Methods 10: 1165-1166

Determination of Transcript Isoforms

• Short-read technologies: reads spanning splice junctions give insufficient connectivity, which leaves splice isoform uncertainty

• PacBio's Iso-Seq solution: full-length cDNA sequence reads give splice isoform certainty, with no assembly required

[Figure: a gene and its mRNA isoforms]
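The following toy Python sketch makes the connectivity problem concrete. The exon names and isoform pairs are hypothetical and do not come from the slides. It builds two different isoform mixtures that produce exactly the same set of splice junctions. A short read spanning a single junction cannot distinguish the mixtures, whereas a full-length read reports the whole exon chain.

```python
# Toy gene model (hypothetical): constitutive exons E1 and E3, plus
# alternative exons E2a/E2b and E4a/E4b.
Isoform = tuple[str, ...]


def junctions(isoforms: set[Isoform]) -> set[tuple[str, str]]:
    """Splice junctions observable by short reads: adjacent exon pairs."""
    return {(a, b) for iso in isoforms for a, b in zip(iso, iso[1:])}


mixture_1 = {("E1", "E2a", "E3", "E4a"), ("E1", "E2b", "E3", "E4b")}
mixture_2 = {("E1", "E2a", "E3", "E4b"), ("E1", "E2b", "E3", "E4a")}

# Short-read view: both mixtures yield identical junction evidence.
print(junctions(mixture_1) == junctions(mixture_2))  # True
# Full-length view: each read is an entire isoform, so the mixtures differ.
print(mixture_1 == mixture_2)  # False
```

Linking junctions back into isoforms in this situation would require phasing information that no single short read carries.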

The Iso-Seq Method for High-quality, Full-length Transcripts

Experimental Pipeline

1. PolyA mRNA

2. cDNA synthesis with adapters

3. Size partitioning & PCR amplification

4. SMRTbell™ ligation

5. PacBio® RS II Sequencing

Informatics Pipeline

1. PacBio raw sequence reads → Reads of Insert

2. Remove adapters, remove artifacts → clean sequence reads

3. Reads clustering → isoform clusters

4. Consensus calling → nonredundant transcript isoforms

5. Quality filtering → final isoforms

6. Map to reference genome → evidence-based gene models

[Figure: structure of a full-length read of insert: SMRT® adapter, 5' primer, 5' UTR, coding sequence, 3' UTR, polyA tail (AAA)n, 3' primer, SMRT® adapter]

https://github.com/PacificBiosciences/cDNA_primer/
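A read of insert is treated as full-length when the 5' primer, the 3' primer, and the polyA tail are all present. The Python sketch below illustrates only that rule. It makes three simplifying assumptions. First, it uses hypothetical placeholder primer sequences, not the actual Clontech SMARTer primers. Second, it requires exact primer matches. Third, it treats every trailing A as part of the tail. Real reads contain sequencing errors, so a production classifier needs an error-tolerant primer search.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholder primer sites, written as they appear on the sense
# strand of a full-length read: 5' primer ... transcript ... polyA ... 3' primer.
FIVE_PRIME_SITE = "AAGCAGTGG"
THREE_PRIME_SITE = "GTACTCTGCG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Classification:
    full_length: bool
    strand: Optional[str] = None  # "+" if the read is sense, "-" if antisense
    insert: Optional[str] = None  # transcript without primers or polyA tail


def classify_read(read: str, min_polya: int = 8, window: int = 50) -> Classification:
    """Full-length if, on either strand, the 5' primer lies within `window`
    bases of the start, the 3' primer lies within `window` bases of the end,
    and at least `min_polya` A's directly precede the 3' primer."""
    for strand, seq in (("+", read), ("-", revcomp(read))):
        p5 = seq.find(FIVE_PRIME_SITE, 0, window)
        p3 = seq.rfind(THREE_PRIME_SITE, max(0, len(seq) - window))
        if p5 == -1 or p3 == -1:
            continue  # a primer is missing on this strand
        body = seq[p5 + len(FIVE_PRIME_SITE):p3]
        transcript = body.rstrip("A")  # all trailing A's count as tail
        if transcript and len(body) - len(transcript) >= min_polya:
            return Classification(True, strand, transcript)
    return Classification(False)


read = FIVE_PRIME_SITE + "ATGCCGTTAGC" + "A" * 12 + THREE_PRIME_SITE
print(classify_read(read))
print(classify_read(revcomp(read)))
```

Checking both orientations matters because a read of insert can come from either strand of the SMRTbell template.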

Detailed Clontech Workflow for Conversion of cDNA into SMRTbell™ Libraries

1. Total RNA, with optional poly-A selection to obtain polyA+ RNA

2. Reverse transcription to full-length 1st strand cDNA

3. PCR optimization

4. Large-scale amplification to produce amplified cDNA

5. Size selection (BluePippin™, SageELF™, or gel) into 1-2 kb, 2-3 kb, and 3-6 kb fractions, plus an optional 5-10 kb fraction

6. Re-amplification

7. SMRTbell™ template preparation

8. Optional size selection of the 3-6 kb and 5-10 kb SMRTbell libraries (BluePippin or SageELF)

9. SMRT® Sequencing


Clontech SMARTer™ PCR cDNA Synthesis Kit


Testing PCR Enzymes to Improve Representation of Long cDNAs

[Figure: comparison of Phusion, KAPA HiFi, and SeqAmp polymerases]

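2nd Amplification (after size selection)

[Figure: gel of brain cDNA re-amplified with KAPA polymerase after size selection into 1-2 kb, 2-3 kb, 3-6 kb, 5-10 kb, 6-10 kb, 8-12 kb, and 10-15 kb fractions; marker bands labeled 500, 800, 1250, 2000, and 4000]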

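2nd Amplification (after size selection)

[Figure: heart cDNA (1-2 kb, 2-3 kb, 3-6 kb, 5-10 kb, and 8-12 kb fractions) and liver cDNA (1-2 kb, 2-3 kb, 3-6 kb, and 5-10 kb fractions) re-amplified with KAPA polymerase; marker bands labeled 500, 800, 1250, 2000, and 4000]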

Size Distribution of Amplified cDNA from Multiple Tissues

[Figure: amplified cDNA from brain, heart, and liver alongside a size marker]


Options for Size Fractionation

• No Size Selection

• Agarose Gel

• Sage BluePippin™ System

• SageELF™ System


No Size Selection

• Advantages

– Decreased sample prep time and effort

– Additional equipment is not required

• Drawbacks

– Full-length sequences will come predominantly from smaller transcripts


No Size Selection vs. 4 Size Bins

[Figure: amplified cDNA without size selection and after fractionation into 4 size bins, each shown with a size marker]

BluePippin™ Size Selection

[Figure: Bioanalyzer traces (size in bp) showing the size distribution of size-selected SMRTbell™ libraries from the 1-2 kb, 2-3 kb, 3-6 kb, and 5-10 kb fractions]

Size Fractionation with SageELF™ System


Brain Amplified cDNA – Size Selected

[Figure: gel of brain amplified cDNA (KAPA polymerase) size-selected on the SageELF (lanes 1-12) and BluePippin systems; fractions labeled 800-1600, 1600-2700, 2700-4800, and 4800-8000; marker (M) bands labeled 100 to 3000]

SageELF – 12 size bins (Amplified cDNA)


Amplified cDNA After Size Fractionation on SageELF System

[Figure: amplified cDNA after SageELF fractionation, grouped into 0.8-2 kb, 2-3 kb, 3-5 kb, and >5 kb ranges and shown alongside a size marker]

SageELF SMRTbell Libraries

[Figure: SMRTbell libraries from the 0.8-2 kb, 2-3 kb, 3-5 kb, and >5 kb SageELF fractions, shown alongside a size marker]

Distribution of FL Reads from SageELF Fractionated Libraries

[Figure: full-length (FL) read distributions for the 0.8-2 kb, 2-3 kb, 3-5 kb, and >5 kb libraries]


SMRTbell Template Size Selection

[Figure: 3-6 kb SMRTbell library before and after BluePippin (BP) size selection, each shown with a size marker]


Applications

Transcript Identification and Annotation

[Figure panels: brain, heart, liver]

Identification of Alternatively Spliced Isoforms

[Figure panels: brain, heart, liver]

Targeted Sequencing


Allele-Specific Transcriptomes

Tilgner et al. (2014) PNAS 111: 9869-9874

Normalization

• Normalization reduces the representation of highly expressed genes

• Increases the number of distinct transcripts sampled per sequencing read

• Potential issues:

– Transcripts with secondary structure may be degraded

– Long transcripts may be preferentially removed

– Rare isoforms of an abundant gene may be lost

• Further work to better understand these methods is ongoing
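The diversity gain from normalization can be quantified with a standard sampling formula. Suppose reads are drawn independently in proportion to abundance, and transcript i has relative abundance p_i. The expected number of distinct transcripts seen in N reads is then the sum over i of 1 - (1 - p_i)^N. The Python sketch below evaluates this formula for a hypothetical library and an idealized normalization that makes every transcript equally abundant. It deliberately ignores the potential issues listed above.

```python
def expected_distinct(abundances: list[float], n_reads: int) -> float:
    """Expected number of distinct transcripts observed at least once when
    n_reads are drawn independently in proportion to abundance:
    E[D] = sum_i 1 - (1 - p_i) ** n_reads."""
    total = sum(abundances)
    return sum(1 - (1 - a / total) ** n_reads for a in abundances)


# Hypothetical library: one highly expressed gene and 99 rare ones.
raw = [1000.0] + [1.0] * 99
# Idealized normalization: every transcript brought to equal abundance.
normalized = [1.0] * 100

print(f"raw: {expected_distinct(raw, 100):.1f} distinct transcripts")
print(f"normalized: {expected_distinct(normalized, 100):.1f} distinct transcripts")
```

With 100 reads, this hypothetical library reveals several times more distinct transcripts after idealized normalization. The caveats listed above are the ways real normalization departs from this idealization.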


Experimental Design Considerations

• Targeted or Full Transcriptome

• Size Selection

– Methods:

− No Size Selection

− Agarose Gel

− Sage BluePippin System

− SageELF System

– Number of Size Bins

• Typical Results:

– ~20,000 to 25,000 full-length transcript sequences per SMRT Cell

– Larger size fractions will have a lower percentage of FL reads
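A toy model helps explain why larger fractions yield a lower percentage of FL reads. The Python sketch below rests on two loudly stated assumptions that are not PacBio specifications. First, polymerase read lengths follow an exponential distribution. Second, a read is full-length only if it traverses the whole insert at least once. The 5 kb mean read length is an arbitrary illustration value. The model also assumes equal molar input and no loading bias, so it isolates the read-length effect alone.

```python
import math


def fl_probability(insert_kb: float, mean_read_kb: float) -> float:
    """Toy model: with exponentially distributed read lengths, P(FL) is the
    probability that a read is at least as long as the insert."""
    return math.exp(-insert_kb / mean_read_kb)


def fl_share(insert_sizes_kb: list[float], mean_read_kb: float) -> dict[float, float]:
    """Fraction of all FL reads contributed by each insert size, assuming
    equal molar input and no loading bias (both simplifications)."""
    probs = {size: fl_probability(size, mean_read_kb) for size in insert_sizes_kb}
    total = sum(probs.values())
    return {size: p / total for size, p in probs.items()}


for size, share in fl_share([1, 2, 4, 8], mean_read_kb=5).items():
    print(f"{size} kb insert: P(FL) = {fl_probability(size, 5):.2f}, "
          f"share of FL reads = {share:.2f}")
```

The same monotonic decline explains the drawback of skipping size selection: in an unfractionated library, short transcripts dominate the full-length reads.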


How Many SMRT Cells?

Number of SMRT Cells (per sample): Experimental Goals

• 1: Targeted, gene-specific isoform characterization

• 1-8: General survey of full-length isoforms in a transcriptome (moderate to high expression levels), with or without size selection

• 12-16: Comprehensive survey of full-length isoforms in the transcriptome across 3-4 size fractions

• >16: Deep sequencing for comprehensive isoform discovery and identification of low-abundance transcripts across 3-4 size fractions
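Simple sampling arithmetic can help rationalize the SMRT Cell tiers. The Python sketch below is not the authors' sizing method and rests on three assumptions. First, every SMRT Cell yields the low-end figure of 20,000 FL reads quoted under Typical Results. Second, FL reads are independent draws in proportion to transcript abundance, ignoring size-fraction and length effects. Third, the relative abundances used are hypothetical.

```python
import math

FL_READS_PER_CELL = 20_000  # low end of the ~20,000-25,000 FL reads per SMRT Cell


def detection_probability(rel_abundance: float, n_cells: int,
                          reads_per_cell: int = FL_READS_PER_CELL) -> float:
    """P(at least one FL read) for a transcript making up `rel_abundance`
    of all FL reads, treating reads as independent draws."""
    n_reads = n_cells * reads_per_cell
    return 1 - (1 - rel_abundance) ** n_reads


def cells_needed(rel_abundance: float, target: float = 0.95,
                 reads_per_cell: int = FL_READS_PER_CELL) -> int:
    """Smallest number of SMRT Cells giving detection probability >= target."""
    reads = math.log(1 - target) / math.log(1 - rel_abundance)
    return math.ceil(reads / reads_per_cell)


for f in (1e-3, 1e-4, 1e-5):
    print(f"relative abundance {f:g}: {cells_needed(f)} SMRT Cell(s) for 95% detection")
```

Under these assumptions, a transcript that makes up 1 in 100,000 FL reads needs about 15 SMRT Cells to have a 95% chance of being seen even once. This falls in the range of the 12-16 SMRT Cell tier. Requiring several reads per isoform for consensus would raise the estimate further.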


Resources

• Iso-Seq Website (general information):

– http://www.pacb.com/isoseq

• Iso-Seq Analysis Information:

– https://github.com/PacificBiosciences/cDNA_primer/wiki

• Protocols:

– http://www.pacb.com/support/pubmap/documentation.html

• Available Datasets:

– MCF-7 Cancer Cell Line: http://blog.pacificbiosciences.com/2013/12/data-release-human-mcf-7-transcriptome.html

– Human Normal Tissues (Brain, Heart, Liver): http://blog.pacificbiosciences.com/2014/10/data-release-whole-human-transcriptome.html

Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell and Iso-Seq are trademarks of Pacific Biosciences, and BluePippin and SageELF are trademarks of Sage Science. All other trademarks are the sole property of their respective owners.
