For Research Use Only. Not for use in diagnostic procedures. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell and Iso-Seq are trademarks of Pacific Biosciences of California, Inc. All other trademarks are the property of their respective owners. © 2015 Pacific Biosciences of California, Inc. All rights reserved.
Single Molecule, Real-Time Sequencing of Full-length cDNA Transcripts Uncovers Novel Alternatively Spliced Isoforms
Tyson A. Clark, Ting Hon, and Elizabeth Tseng
Pacific Biosciences, 1380 Willow Road, Menlo Park, CA 94025
Abstract
In higher eukaryotic organisms, the majority of multi-exon genes are alternatively spliced. Different mRNA isoforms from the same gene can produce proteins that have distinct properties such as structure, function, or subcellular localization. Thus, the importance of understanding the full complement of transcript isoforms with potential phenotypic impact cannot be overstated. While microarrays and NGS-based methods have become useful for studying transcriptomes, these technologies yield short, fragmented transcript sequences that remain a challenge for accurate, complete reconstruction of splice variants. The Iso-Seq™ protocol developed at PacBio offers the only solution for direct sequencing of full-length, single-molecule cDNA sequences to survey transcriptome isoform diversity useful for gene discovery and annotation. Knowledge of the complete isoform repertoire is also key for accurate quantification of isoform abundance. As most transcripts range from 1-10 kb, fully intact RNA molecules can be sequenced using SMRT® Sequencing (avg. read length: 10-15 kb) without requiring fragmentation or post-sequencing assembly. Our open-source computational pipeline delivers high-quality, non-redundant sequences for unambiguous identification of alternative splicing events, alternative transcriptional start sites, polyA tails, and gene fusion events. The standard Iso-Seq protocol workflow available for all researchers is presented using a deep dataset of full-length cDNA sequences from the MCF-7 cancer cell line and multiple tissues (brain, heart, and liver). Detected novel transcripts approaching 10 kb and alternative splicing events are highlighted. Even in extensively profiled samples, the method uncovered large numbers of novel alternatively spliced isoforms and previously unannotated genes.
Sample Prep Improvements
SageELF™ Size Fractionation
Iso-Seq Sample Preparation Workflow
Total RNA → Optional PolyA Selection → polyA+ RNA → Reverse Transcription (Clontech® SMARTer® PCR cDNA Synthesis Kit) → Full-length 1st Strand cDNA → PCR Optimization → Large-scale Amplification → Amplified cDNA → Size Selection (BluePippin™ System or Gel; fractions 1-2 kb, 2-3 kb, 3-6 kb) → Re-Amplification → SMRTbell™ Template Preparation → Optional Size Selection (BluePippin System; labeled 3-6 kb and 5-10 kb) → SMRT Sequencing
Size Distribution of Amplified cDNA From Multiple Tissues
[Figure: amplified cDNA from brain, heart, and liver.]
Sample Preparation Methods
Summary and Resources
Targeted Full-Length cDNA Sequencing
Full-Length Human Tissue Transcriptomes
PacBio Sequencing of Iso-Seq Libraries From 3 Human Tissues
Full-Length Non-Redundant Transcript Sequences
Sequencing of Full-Length RT-PCR Products Shows Differential Alternative Splicing Across Three Tissues
SageELF Allows For Collection of cDNA Molecules in 12 Fractions Across the Entire Size Distribution
Bioanalyzer® Traces of SageELF Size-Selected cDNA from Human Brain
Protocol Adjustments Improve Representation of Longer Transcripts
[Figure labels: Phusion, Kapa HiFi, SeqAmp.]
[Figure: traces for brain, heart, and liver cDNA amplified with Kapa HiFi; ladder marks at 500, 800, 1250, 2000, and 4000; size fractions labeled 1-2 kb, 2-3 kb, 3-6 kb, 5-10 kb, 6-10 kb, 8-12 kb, and 10-15 kb.]
PacBio human three tissue dataset available here: http://blog.pacificbiosciences.com/2014/10/data-release-whole-human-transcriptome.html
PacBio MCF-7 transcriptome dataset available here: http://blog.pacificbiosciences.com/2013/12/data-release-human-mcf-7-transcriptome.html
Additional information and Iso-Seq protocols: http://www.pacb.com/applications/isoseq/index.html
Details on data analysis of Iso-Seq data can be found here: https://github.com/PacificBiosciences/cDNA_primer/wiki
Sage Science’s BluePippin Size Fractionation
Summary:
• The Iso-Seq method provides full-length cDNA sequences without the need for assembly.
• Improved sample prep and size-selection methods allow for sequencing of transcripts up to 10 kb.
• Alternatively spliced transcripts can be easily identified from either whole transcriptome or targeted sequencing.
Example Bioanalyzer trace of four size-selected Iso-Seq libraries
Changing the PCR enzyme allows for amplification of transcripts in the 5-10 kb size range from tissue samples that have significant expression of cDNAs in that size range.
Two examples of genes with differential alternative splicing across the three tissues
Overview of the dataset showing numbers of transcripts of various sizes and the number of isoforms per gene
SageELF increases the flexibility of size selection and allows for isolation of amplified cDNAs from several hundred bp up to more than 10 kb in size.
Amplified cDNAs after size selection on either SageELF or BluePippin.
PacBio sequencing of full-length RT-PCR products simplifies identification of alternatively spliced isoforms and allows for relative quantification of isoform abundance.
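The relative quantification mentioned above can be illustrated with a minimal sketch. It assumes each full-length read has already been assigned to one isoform of the targeted gene; the function name and the isoform labels are hypothetical and are not part of the Iso-Seq pipeline.

```python
from collections import Counter


def relative_isoform_abundance(read_assignments):
    """Return {isoform_id: fraction of reads} for one gene in one sample.

    read_assignments: iterable of isoform labels, one per full-length read
    (hypothetical input; how reads are assigned to isoforms is not shown).
    """
    counts = Counter(read_assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {isoform: n / total for isoform, n in counts.items()}


# Hypothetical read assignments for one RT-PCR product in one tissue.
example_reads = ["isoform_A"] * 6 + ["isoform_B"] * 2
print(relative_isoform_abundance(example_reads))
```

Comparing these fractions between tissues gives a simple view of differential splicing. The fractions are relative within one amplicon and say nothing about absolute expression.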
RNA is converted into first-strand cDNA using the Clontech SMARTer PCR cDNA Synthesis Kit, followed by universal amplification. Amplified cDNA is size fractionated and converted into SMRTbell templates for sequencing on the PacBio® RS II.
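As a rough illustration of the size fractionation step, the sketch below maps a cDNA length to the nominal fractions named in the workflow (1-2 kb, 2-3 kb, 3-6 kb, 5-10 kb). The half-open boundaries and the function name are assumptions made for this example. Physical size selection on a gel or BluePippin does not produce sharp cutoffs.

```python
# Nominal size fractions from the workflow, in kb (lower bound inclusive,
# upper bound exclusive; this boundary convention is an assumption).
SIZE_FRACTIONS_KB = {
    "1-2 kb": (1.0, 2.0),
    "2-3 kb": (2.0, 3.0),
    "3-6 kb": (3.0, 6.0),
    "5-10 kb": (5.0, 10.0),
}


def matching_fractions(length_bp):
    """Return the names of all nominal fractions that contain length_bp."""
    length_kb = length_bp / 1000
    return [
        name
        for name, (low, high) in SIZE_FRACTIONS_KB.items()
        if low <= length_kb < high
    ]


# The 3-6 kb and 5-10 kb fractions overlap, so a 5.5 kb cDNA matches both.
for length in (500, 1500, 5500):
    print(length, matching_fractions(length))
```

Returning a list instead of a single label keeps the overlap between the 3-6 kb and 5-10 kb fractions visible.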