FIND MEANING IN COMPLEXITY
© Copyright 2014-2015 by Pacific Biosciences of California, Inc. All rights reserved.
For Research Use Only. Not for use in diagnostic procedures.
Tyson A. Clark, Ph.D. February 11, 2015
Iso-Seq™ Method: Sample Prep and Experimental Design for Full-Length cDNA Sequencing
Outline
1. Introduction to Full-length cDNA Sequencing
2. The Iso-Seq™ Method
3. Size Selection
4. Applications
5. Experimental Design Considerations
Transcript Diversity
Drosophila DSCAM Gene – 38,000 Isoforms
Schmucker D, et al. 2000. Cell 101:671–684
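The 38,000 figure is a product of independent exon choices. The sketch below works through that arithmetic, assuming the alternative-exon counts usually quoted from Schmucker et al. (12, 48, 33, and 2 for the four variable exon clusters); these counts are not on the slide and should be checked against the paper.

```python
from math import prod

# Alternative exons per variable cluster of Dscam.
# Assumption: counts as commonly quoted from Schmucker et al. (2000).
alternative_exons = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

# One alternative is used per cluster, so the choices multiply.
potential_isoforms = prod(alternative_exons.values())
print(potential_isoforms)  # 38016
```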
One Gene, Two Isoforms with Opposite Effects
bcl-x gene mRNA isoforms:
• bcl-xL: inhibits cell death
• bcl-xS: activates cell death
Current State of Transcript Assembly
“The way we do RNA-seq now is… you take the transcriptome, you blow it up into pieces and then you try to figure out how they all go back together again… If you think about it, it’s kind of a crazy way to do things.”
Michael Snyder, Stanford University
Tal Nawy (2013) End-to-end RNA sequencing. Nature Methods 10: 1144–1145
Ian Korf (2013) Genomics: the state of the art in RNA-seq analysis. Nature Methods 10: 1165–1166
Determination of Transcript Isoforms
• Short-read technologies: reads spanning splice junctions provide insufficient connectivity, leaving splice isoform uncertainty
• PacBio's Iso-Seq solution: full-length cDNA sequence reads provide splice isoform certainty, with no assembly required
[Figure labels: Gene; mRNA isoforms]
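A toy illustration of the connectivity problem, assuming a hypothetical gene with two distant alternative sites (exons A1/A2 and B1/B2) and reads too short to span both. The names and counts are invented for the example. Two different isoform mixtures give identical per-site counts, so only reads covering both sites can tell them apart.

```python
from collections import Counter

def site_counts(mixture):
    """Collapse full-length isoform counts to per-site exon counts,
    mimicking reads that cover only one alternative site each."""
    counts = Counter()
    for (site1_exon, site2_exon), n in mixture.items():
        counts[site1_exon] += n
        counts[site2_exon] += n
    return counts

# Hypothetical isoform mixtures: keys are (site 1 exon, site 2 exon).
mixture_a = {("A1", "B1"): 50, ("A2", "B2"): 50}
mixture_b = {("A1", "B2"): 50, ("A2", "B1"): 50}

print(site_counts(mixture_a) == site_counts(mixture_b))  # True: same short-read view
print(mixture_a == mixture_b)                            # False: different isoforms
```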
The Iso-Seq Method for High-quality, Full-length Transcripts
Experimental Pipeline
1. PolyA mRNA
2. cDNA synthesis with adapters
3. Size partitioning & PCR amplification
4. SMRTbell™ ligation
5. PacBio® RS II Sequencing
Informatics Pipeline
1. PacBio raw sequence reads
2. Remove adapters, remove artifacts → clean sequence reads
3. Reads clustering → isoform clusters
4. Consensus calling → nonredundant transcript isoforms
5. Quality filtering → final isoforms
6. Map to reference genome → evidence-based gene models
[Figure: read structure, showing SMRT® adapter, 5’ primer, 5’ UTR, coding sequence, 3’ UTR, polyA tail (AAA)n, 3’ primer, SMRT® adapter; Reads of Insert]
https://github.com/PacificBiosciences/cDNA_primer/
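The read structure above (5’ primer, transcript, polyA tail, 3’ primer) suggests a simple full-length check. The following is a minimal sketch and not the cDNA_primer implementation. The primer sequences and the minimum tail length are hypothetical placeholders, matching is exact, and the read is assumed to be in sense orientation, whereas real reads contain errors and may be reverse-complemented.

```python
import re

# Hypothetical placeholder primers; substitute those used in library prep.
FIVE_PRIME = "ACGTACGTAC"
THREE_PRIME = "GTCAGTCAGT"
MIN_POLYA = 15  # assumed minimum polyA tail length

def classify_read(seq):
    """Return (is_full_length, insert) for a read in sense orientation.

    A read counts as full-length here only if it starts with the 5' primer,
    ends with the 3' primer, and carries a polyA tail just before the 3' primer.
    """
    if not seq.startswith(FIVE_PRIME) or not seq.endswith(THREE_PRIME):
        return False, None
    body = seq[len(FIVE_PRIME):len(seq) - len(THREE_PRIME)]
    tail = re.search(r"A{%d,}$" % MIN_POLYA, body)
    if tail is None:
        return False, None
    # Trim the polyA tail so only the transcript sequence remains.
    return True, body[:tail.start()]

full = FIVE_PRIME + "GATTACAGATTACC" + "A" * 20 + THREE_PRIME
print(classify_read(full))             # (True, 'GATTACAGATTACC')
print(classify_read(full[:-5]))        # (False, None): 3' primer truncated
```

Exact string matching keeps the sketch short; a practical version would use approximate alignment to tolerate sequencing errors.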
Detailed Clontech Workflow for Conversion of cDNA into SMRTbell™ Libraries
1. Total RNA
2. Optional Poly-A Selection → polyA+ RNA
3. Reverse Transcription → Full Length 1st Strand cDNA
4. PCR Optimization
5. Large Scale Amplification → Amplified cDNA
6. Size Selection (BluePippin™, SageELF™, or gel) into 1-2 kb, 2-3 kb, 3-6 kb, and optional 5-10 kb size fractions
7. Re-Amplification
8. SMRTbell™ Template Preparation
9. Optional Size Selection (BluePippin or SageELF), shown for the 3-6 kb and 5-10 kb fractions
10. SMRT® Sequencing
Clontech SMARTer™ PCR cDNA Synthesis Kit
Testing PCR Enzymes to Improve Representation of Long cDNAs
Enzymes tested: Phusion, Kapa HiFi, SeqAmp
2nd Amplification (after size selection)
[Gel image: brain cDNA, Kapa polymerase; size fractions 1-2 kb, 2-3 kb, 3-6 kb, 5-10 kb, 6-10 kb, 8-12 kb, 10-15 kb; marker bands labeled 4000, 2000, 1250, 800, 500]
2nd Amplification (after size selection)
[Gel image: Kapa polymerase; heart size fractions 1-2 kb, 2-3 kb, 3-6 kb, 5-10 kb, 8-12 kb; liver size fractions 1-2 kb, 2-3 kb, 3-6 kb, 5-10 kb; marker bands labeled 4000, 2000, 1250, 800, 500]
Size Distribution of Amplified cDNA from Multiple Tissues
[Gel image: lanes for brain, heart, liver, and marker]
Options for Size Fractionation
• No Size Selection
• Agarose Gel
• Sage BluePippin™ System
• SageELF™ System
No Size Selection
• Advantages
– Decreased sample prep time and effort
– Additional equipment is not required
• Drawbacks
– Full-length sequences will come predominantly from smaller transcripts
[Gel image: amplified cDNA with no size selection compared with 4 size bins, each beside a marker lane]
BluePippin™ Size Selection
Size Distribution of Size-selected SMRTbell™ Libraries
[Bioanalyzer traces: 1-2 kb, 2-3 kb, 3-6 kb, and 5-10 kb libraries; x-axis: Size (bp)]
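The four size bins overlap between 5 and 6 kb, so a transcript in that range can appear in two fractions. A small sketch of the bin assignment, assuming half-open boundaries (lower bound included, upper bound excluded); real gel cuts are not this sharp, and the function name is illustrative.

```python
# Size bins from the slides, in kb. Note that 3-6 kb and 5-10 kb overlap.
SIZE_BINS_KB = [(1, 2), (2, 3), (3, 6), (5, 10)]

def bins_for_length(length_bp):
    """Return the labels of every size bin that covers a cDNA of this length."""
    kb = length_bp / 1000
    return [f"{lo}-{hi} kb" for lo, hi in SIZE_BINS_KB if lo <= kb < hi]

print(bins_for_length(1500))  # ['1-2 kb']
print(bins_for_length(5500))  # ['3-6 kb', '5-10 kb']
print(bins_for_length(500))   # []: below the smallest bin
```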
Size Fractionation with SageELF™ System
Brain Amplified cDNA – Size Selected
[Gel image, Kapa polymerase: SageELF fractions in lanes 12 to 1 beside a marker (M) lane; BluePippin fractions 800-1600, 1600-2700, 2700-4800, 4800-8000; marker bands labeled 3000, 1500, 800, 500, 300, 100]
SageELF – 12 size bins (Amplified cDNA)
Amplified cDNA After Size Fractionation on SageELF System
[Gel images with marker lanes; the second is labeled with 0.8-2 kb, 2-3 kb, 3-5 kb, and >5 kb fractions]
SageELF SMRTbell Libraries
[Gel image: 0.8-2 kb, 2-3 kb, 3-5 kb, and >5 kb libraries beside a marker lane]
Distribution of FL Reads from SageELF Fractionated Libraries
[Plot panels: 0.8-2 kb, 2-3 kb, 3-5 kb, >5 kb]
SMRTbell Template Size Selection
[Figure with marker: 3-6 kb SMRTbell library before and after BP (BluePippin) size selection]
Applications
Transcript Identification and Annotation
[Figure panels: Brain, Heart, Liver]
Identification of Alternatively Spliced Isoforms
[Figure panels: Brain, Heart, Liver]
Targeted Sequencing
Allele-Specific Transcriptomes
Tilgner et al. (2014) PNAS 111: 9869-9874
Normalization
• Normalization reduces the representation of highly expressed genes
• Increases the diversity on a per-sequence basis
• Potential Issues:
– Transcripts with secondary structure may be degraded
– Long transcripts may be preferentially removed
– Rare isoforms of an abundant gene may be lost
• Further work to better understand these methods is ongoing
Experimental Design Considerations
• Targeted or Full Transcriptome
• Size Selection
– Methods:
− No Size Selection
− Agarose Gel
− Sage BluePippin System
− SageELF System
– Number of Size Bins
• Typical Results:
– ~20,000 to 25,000 full-length transcript sequences per SMRT Cell
– Larger size fractions will have a lower percentage of FL reads
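The typical yield above gives a rough planning range. A back-of-the-envelope sketch, assuming the ~20,000 to 25,000 full-length reads per SMRT Cell holds uniformly; since larger size fractions yield a lower percentage of FL reads, treat the result as an optimistic range for those fractions. The function name is illustrative.

```python
# Typical full-length (FL) reads per SMRT Cell, from the slide.
FL_PER_CELL = (20_000, 25_000)

def expected_fl_reads(n_cells):
    """Return the (low, high) range of FL reads expected from n_cells SMRT Cells."""
    if n_cells < 0:
        raise ValueError("n_cells must be non-negative")
    low, high = FL_PER_CELL
    return n_cells * low, n_cells * high

print(expected_fl_reads(1))  # (20000, 25000)
print(expected_fl_reads(8))  # (160000, 200000)
```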
How Many SMRT Cells?
Number of SMRT Cells (per sample) | Experimental Goals
1 | Targeted, gene-specific isoform characterization
1-8 | General survey of full-length isoforms in a transcriptome (moderate to high expression levels), with or without size selection
12-16 | A comprehensive survey of full-length isoforms in the transcriptome across 3-4 size fractions
>16 | Deep sequencing for comprehensive isoform discovery and identification of low-abundance transcripts across 3-4 size fractions
Resources
• Iso-Seq Website (general information):
– http://www.pacb.com/isoseq
• Iso-Seq Analysis Information:
– https://github.com/PacificBiosciences/cDNA_primer/wiki
• Protocols:
– http://www.pacb.com/support/pubmap/documentation.html
• Available Datasets:
– MCF-7 Cancer Cell Line
− http://blog.pacificbiosciences.com/2013/12/data-release-human-mcf-7-transcriptome.html
– Human Normal Tissues (Brain, Heart, Liver)
− http://blog.pacificbiosciences.com/2014/10/data-release-whole-human-transcriptome.html
For Research Use Only. Not for use in diagnostic procedures. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell and Iso-Seq are trademarks of Pacific Biosciences, and BluePippin and SageELF are trademarks of Sage Science. All other trademarks are the sole property of their respective owners.