
FIND MEANING IN COMPLEXITY © Copyright 2014-2015 by Pacific Biosciences of California, Inc. All rights reserved.

For Research Use Only. Not for use in diagnostic procedures.

Tyson A. Clark, Ph.D. February 11, 2015

Iso-Seq™ Method: Sample Prep and Experimental Design for Full-Length cDNA Sequencing


Outline

1. Introduction to Full-length cDNA Sequencing

2. The Iso-Seq™ Method

3. Size Selection

4. Applications

5. Experimental Design Considerations


Transcript Diversity


Drosophila DSCAM Gene – 38,000 Isoforms


Schmucker D, et al. 2000. Cell 101:671–684


One Gene, Two Isoforms with Opposite Effects

bcl-x gene mRNA isoforms:

• bcl-xL: inhibits cell death

• bcl-xS: activates cell death


Current State of Transcript Assembly

“The way we do RNA-seq now is… you take the transcriptome, you blow it up into pieces and then you try to figure out how they all go back together again… If you think about it, it’s kind of a crazy way to do things.”

Michael Snyder, Stanford University

Tal Nawy (2013) End-to-end RNA sequencing. Nature Methods 10: 1144–1145

Ian Korf (2013) Genomics: the state of the art in RNA-seq analysis. Nature Methods 10: 1165–1166


Determination of Transcript Isoforms

• Short-read technologies: reads spanning splice junctions provide insufficient connectivity, leaving splice isoform uncertainty

• PacBio’s Iso-Seq solution: full-length cDNA sequence reads provide splice isoform certainty, with no assembly required

[Figure: a gene and its mRNA isoforms]


The Iso-Seq Method for High-quality, Full-length Transcripts

Experimental Pipeline

1. PolyA mRNA

2. cDNA synthesis with adapters

3. Size partitioning & PCR amplification

4. SMRTbell™ ligation

5. PacBio® RS II Sequencing

Informatics Pipeline

1. PacBio raw sequence reads

2. Remove adapters and artifacts to obtain clean sequence reads

3. Reads clustering to obtain isoform clusters

4. Consensus calling to obtain nonredundant transcript isoforms

5. Quality filtering to obtain final isoforms

6. Map to reference genome to obtain evidence-based gene models

[Figure: a raw read contains SMRT® adapters and is split into reads of insert; a full-length read of insert spans the 5’ primer, 5’ UTR, coding sequence, 3’ UTR, polyA tail, and 3’ primer]

https://github.com/PacificBiosciences/cDNA_primer/
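As a rough illustration of the adapter-removal step, the sketch below calls a read of insert "full-length" when it carries the 5’ primer, the 3’ primer, and a polyA run just before the 3’ primer. The primer sequences, the `MIN_POLYA` threshold, and the function names are hypothetical placeholders, and exact string matching stands in for the error-tolerant matching a real pipeline would presumably need. It is not the cDNA_primer implementation.

```python
# Placeholder values for illustration only (assumptions, not the kit primers).
FIVE_PRIME = "ACGTTGCA"
THREE_PRIME = "TGACCTGA"
MIN_POLYA = 8  # assumed minimum length of the polyA run

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def classify_read(read):
    """Return (is_full_length, insert) for one read of insert.

    Both strand orientations are tried. The returned insert has the primers
    and trailing A bases removed; note that rstrip also removes any genuine
    A bases at the very end of the transcript.
    """
    for candidate in (read, revcomp(read)):
        if candidate.startswith(FIVE_PRIME) and candidate.endswith(THREE_PRIME):
            insert = candidate[len(FIVE_PRIME):len(candidate) - len(THREE_PRIME)]
            if insert.endswith("A" * MIN_POLYA):
                return True, insert.rstrip("A")
    return False, None
```

Reads that fail the check are not necessarily useless; in this sketch they are simply reported as not full-length.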


Detailed Clontech Workflow for Conversion of cDNA into SMRTbell™ Libraries

1. Total RNA

2. Optional Poly-A Selection (polyA+ RNA)

3. Reverse Transcription

4. Full-Length 1st Strand cDNA

5. PCR Optimization

6. Large Scale Amplification

7. Amplified cDNA

8. Size Selection (BluePippin™, SageELF™, or gel) into 1-2 kb, 2-3 kb, and 3-6 kb size fractions, plus an optional 5-10 kb size fraction

9. Re-Amplification

10. SMRTbell™ Template Preparation

11. Optional Size Selection (BluePippin or SageELF)

12. SMRT® Sequencing


Clontech SMARTer™ PCR cDNA Synthesis Kit


Testing PCR Enzymes to Improve Representation of Long cDNAs

[Figure: comparison of Phusion, Kapa HiFi, and SeqAmp polymerases]


2nd Amplification (after size selection)

[Figure: brain cDNA re-amplified with Kapa polymerase; size fractions 1-2 kb, 2-3 kb, 3-6 kb, 5-10 kb, 6-10 kb, 8-12 kb, and 10-15 kb; size markers at 500, 800, 1250, 2000, and 4000]


2nd Amplification (after size selection)

[Figure: heart (1-2 kb, 2-3 kb, 3-6 kb, 5-10 kb, 8-12 kb) and liver (1-2 kb, 2-3 kb, 3-6 kb, 5-10 kb) cDNA size fractions re-amplified with Kapa polymerase; size markers at 500, 800, 1250, 2000, and 4000]


Size Distribution of Amplified cDNA from Multiple Tissues

[Figure: amplified cDNA size distributions for brain, heart, and liver, with marker]


Options for Size Fractionation

• No Size Selection

• Agarose Gel

• Sage BluePippin™ System

• SageELF™ System


No Size Selection

• Advantages

– Decreased sample prep time and effort

– Additional equipment is not required

• Drawbacks

– Full-length sequences will be predominately from smaller transcripts

[Figure: amplified cDNA with no size selection compared with 4 size bins, with marker]


BluePippin™ Size Selection

Size Distribution of Size-selected SMRTbell™ Libraries

[Figure: Bioanalyzer traces, size (bp), for the 1-2 kb, 2-3 kb, 3-6 kb, and 5-10 kb libraries]


Size Fractionation with SageELF™ System


Brain Amplified cDNA – Size Selected

[Figure: brain cDNA amplified with Kapa polymerase; SageELF fractions 1 to 12 with marker (M), alongside BluePippin fractions 800-1600, 1600-2700, 2700-4800, and 4800-8000; size markers at 100, 300, 500, 800, 1500, and 3000]


SageELF – 12 size bins (Amplified cDNA)


Amplified cDNA After Size Fractionation on SageELF System

[Figure: amplified cDNA fractions with marker, grouped into 0.8-2 kb, 2-3 kb, 3-5 kb, and >5 kb size bins]


SageELF SMRTbell Libraries

[Figure: SMRTbell libraries for the 0.8-2 kb, 2-3 kb, 3-5 kb, and >5 kb size bins, with marker]


Distribution of FL Reads from SageELF Fractionated Libraries

[Figure: distributions of full-length (FL) reads for the 0.8-2 kb, 2-3 kb, 3-5 kb, and >5 kb libraries]


SMRTbell Template Size Selection

[Figure: 3-6 kb SMRTbell library before and after BluePippin (BP) size selection, with marker]



Applications


Transcript Identification and Annotation

[Figure: brain, heart, and liver]


Identification of Alternatively Spliced Isoforms

[Figure: brain, heart, and liver]


Targeted Sequencing


Allele-Specific Transcriptomes

Tilgner et al. (2014) PNAS 111: 9869-9874


Normalization

• Normalization reduces the representation of highly expressed genes

• Increases the diversity on a per-sequence basis

• Potential Issues:

– Transcripts with secondary structure may be degraded

– Long transcripts may be preferentially removed

– Rare isoforms of an abundant gene may be lost

• Further work to better understand these methods is ongoing


Experimental Design Considerations

• Targeted or Full Transcriptome

• Size Selection

– Methods:

− No Size Selection

− Agarose Gel

− Sage BluePippin System

− SageELF System

– Number of Size Bins

• Typical Results:

– ~20,000 to 25,000 full-length transcript sequences per SMRT Cell

– Larger size fractions will have a lower percentage of FL reads
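The typical yield quoted above lends itself to a quick back-of-the-envelope calculation. The sketch below simply multiplies the per-cell range by the number of SMRT Cells; the function names are illustrative. It assumes yield scales linearly with cell count, which the slides do not state, and since larger size fractions give a lower percentage of FL reads, the result is best read as a rough estimate.

```python
# Per-cell yield range quoted on the slide (full-length transcript sequences).
FL_PER_CELL_LOW = 20_000
FL_PER_CELL_HIGH = 25_000


def expected_fl_sequences(n_cells):
    """Return the (low, high) expected number of full-length sequences."""
    if n_cells < 0:
        raise ValueError("n_cells must be non-negative")
    return n_cells * FL_PER_CELL_LOW, n_cells * FL_PER_CELL_HIGH


def cells_needed(target_fl):
    """Return a conservative SMRT Cell count, using the low end of the range."""
    if target_fl < 0:
        raise ValueError("target_fl must be non-negative")
    return -(-target_fl // FL_PER_CELL_LOW)  # ceiling division


print(expected_fl_sequences(8))  # (160000, 200000)
print(cells_needed(100_000))     # 5
```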


How Many SMRT Cells?

Number of SMRT Cells (per sample) and Experimental Goals:

• 1: Targeted, gene-specific isoform characterization

• 1-8: General survey of full-length isoforms in a transcriptome (moderate to high expression levels), with or without size selection

• 12-16: A comprehensive survey of full-length isoforms in the transcriptome across 3-4 size fractions

• >16: Deep sequencing for comprehensive isoform discovery and identification of low-abundance transcripts across 3-4 size fractions
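For planning purposes, the recommendations above can be kept as a small lookup table. This is only a sketch: the tuple layout and the `goals_for` helper are illustrative, and ">16" is encoded as 17 or more with no upper bound. The goal strings are shortened from the slide.

```python
# (min_cells, max_cells, goal); max_cells is None where only a lower bound is given.
RECOMMENDATIONS = [
    (1, 1, "Targeted, gene-specific isoform characterization"),
    (1, 8, "General survey of full-length isoforms"),
    (12, 16, "Comprehensive survey across 3-4 size fractions"),
    (17, None, "Deep sequencing for low-abundance transcripts"),
]


def goals_for(n_cells):
    """Return the goals whose recommended range includes n_cells."""
    return [
        goal
        for low, high, goal in RECOMMENDATIONS
        if n_cells >= low and (high is None or n_cells <= high)
    ]
```

Cell counts that fall between the listed ranges (9 to 11) return an empty list, mirroring the gap in the slide's table rather than guessing at an interpolation.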


Resources

• Iso-Seq Website (general information):

– http://www.pacb.com/isoseq

• Iso-Seq Analysis Information:

– https://github.com/PacificBiosciences/cDNA_primer/wiki

• Protocols:

– http://www.pacb.com/support/pubmap/documentation.html

• Available Datasets:

– MCF-7 Cancer Cell Line

− http://blog.pacificbiosciences.com/2013/12/data-release-human-mcf-7-transcriptome.html

– Human Normal Tissues (Brain, Heart, Liver)

− http://blog.pacificbiosciences.com/2014/10/data-release-whole-human-transcriptome.html


For Research Use Only. Not for use in diagnostic procedures. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell and Iso-Seq are trademarks of Pacific Biosciences, and BluePippin and SageELF are trademarks of Sage Science. All other trademarks are the sole property of their respective owners.