

Molecular and Cellular Pathobiology

A Study of TP53 RNA Splicing Illustrates Pitfalls of RNA-seq Methodology

Sunali Mehta1,2,3, Peter Tsai4,5, Annette Lasham2,3,4, Hamish Campbell6, Roger Reddel6, Antony Braithwaite1,2,3,6, and Cristin Print2,3,4,5

Abstract

TP53 undergoes multiple RNA-splicing events, resulting in at least nine mRNA transcripts encoding at least 12 functionally different protein isoforms. Antibodies specific to p53 protein isoforms have proven difficult to develop, thus researchers must rely on the transcript information to infer isoform abundance. In this study, we used deep RNA-seq, droplet digital PCR (ddPCR), and real-time quantitative reverse transcriptase PCR (RT-qPCR) from nine human cell lines and RNA-seq data available for tumors in The Cancer Genome Atlas to analyze TP53 splice variant expression. All three methods detected expression of the FL/Δ40TP53α_T1 variant in most human tumors and cell lines. However, other less abundant variants were only detected with PCR-based methods. Using RNA-seq simulation analysis, we determined why RNA-seq is unable to detect less abundant TP53 transcripts and discuss the implications of these findings for the general interpretation of RNA-seq data. Cancer Res; 76(24); 1–9. ©2016 AACR.

Introduction

The p53 tumor suppressor protein plays a central role in maintaining the integrity of the genome by promoting repair, senescence, or apoptosis of DNA-damaged cells (1, 2). Thus, it is not surprising that TP53 is the most frequently mutated gene in most human cancers (3). Recently, the clinical importance of assessing factors beyond just TP53 mutation status, such as the relative abundance of functionally distinct TP53 RNA variants and the protein isoforms they encode, has been recognized. In humans, at least 9 different protein-encoding transcripts are expressed from the TP53 locus (Fig. 1).

Multiple studies have now shown that p53 isoforms have distinct functions and frequently contribute to diseases including cancer (4–18). Elevated expression of the Δ40p53 isoform has been associated with metastatic melanoma and triple-negative breast cancers (7, 19). In contrast, elevated levels of TP53β were found to be negatively associated with tumor size and positively associated with disease-free survival of breast cancer patients with TP53 mutations (7). Opposing functions of these two isoforms were further demonstrated in melanoma cell lines, where Δ40p53 was shown to inhibit p53-dependent transcription of downstream target genes, whereas p53β stimulated transactivation (19). p53β also promoted replicative senescence in T cells, which was blocked by Δ133p53 (4). Δ133p53 has also been shown to inhibit p53-dependent apoptosis and G1 arrest, but not G2 arrest (20), suggesting that Δ133p53 does not just act as a dominant negative, and the zebrafish homolog of Δ133p53, Δ113p53, also inhibits p53-initiated apoptosis (21). Studies in mice suggest that the Δ40p53 isoform controls the switch from pluripotency to differentiation (22) as well as controlling β-cell proliferation and glucose homeostasis (23). Δ133p53α has been shown to stimulate proliferation (6) and angiogenesis of tumor cells (6), and Δ133p53β promotes stem cell differentiation by upregulating pluripotency factors such as SOX2, OCT3/4, and NANOG (24). A mouse model of Δ133p53, designated Δ122p53, was shown to promote hyperproliferation, tumorigenesis, and inflammation (9), and overexpression of Δ122p53 promoted cell migration, invasion through three-dimensional (3D) matrices (16), and upregulation of metastasis-associated proteins (14). Finally, recent data from the Δ122p53 mouse have shown that there is also cooperativity, as Δ122p53 enhanced the survival of p53-mutant mice (25) and full-length p53 increased the ability of Δ122p53 to promote cell migration (16). This is consistent with the isoforms functioning in a complex network to determine cell fate outcomes (17).

Thus, given the potential importance of p53 isoforms in normal cell physiology and in disease, it is important to be able to quantitate the expression levels of the isoforms in cells and tissues. Unfortunately, as antibodies specific to p53 protein isoforms have proved difficult to develop (26), isoform abundance can presently only be inferred from RNA transcript levels, although transcript and protein levels will not always correlate directly. Methods to detect TP53 RNA splice variants include RT-qPCR (27), droplet digital PCR (ddPCR), expression microarrays, and NanoString. ddPCR has the particular advantage of providing absolute quantitation of target TP53 sequences (28). At the whole transcriptome level, RNA-seq is progressively displacing traditional microarray methods, as it can in principle identify novel transcripts generated by alternative splicing, gene fusion events, strand-specific and antisense transcription, and small and noncoding RNAs, as well as assessing allelic bias (29–32).

1Department of Pathology, University of Otago, Dunedin, New Zealand. 2Maurice Wilkins Centre for Molecular Biodiscovery, University of Auckland, Auckland, New Zealand. 3Maurice Wilkins Centre for Molecular Biodiscovery, University of Otago, Dunedin, New Zealand. 4Department of Molecular Medicine and Pathology, Faculty of Medicine, University of Auckland, Auckland, New Zealand. 5Bioinformatics Institute, University of Auckland, Auckland, New Zealand. 6Children's Medical Research Institute, University of Sydney, Westmead, New South Wales, Australia.

Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).

A. Braithwaite and C. Print are co-senior authors of this article.

Corresponding Author: Sunali Mehta, Department of Pathology, Dunedin School of Medicine, University of Otago, PO Box 56, Dunedin 9054, New Zealand. Phone: 642-1175-5384; Fax: 643-479-7136; E-mail: [email protected]

doi: 10.1158/0008-5472.CAN-16-1624

©2016 American Association for Cancer Research.

Cancer Research, www.aacrjournals.org. Cancer Res; 76(24) December 15, 2016.

Published OnlineFirst October 20, 2016; DOI: 10.1158/0008-5472.CAN-16-1624

Downloaded from cancerres.aacrjournals.org on May 7, 2018. © 2016 American Association for Cancer Research.

Figure 1.

The genomic complexity at the TP53 locus using GRCh37/hg19 as a reference. A, Schematic of the human TP53 gene structure. The nine TP53 RNA transcripts encoding 12 protein isoforms (FLp53α, FLp53β, FLp53γ, Δ40p53α, Δ40p53β, Δ40p53γ, Δ133p53α, Δ133p53β, Δ133p53γ) generated by alternative splicing (α, β, and γ) and alternative promoter usage (P1 and P2) are indicated along with other TP53 transcripts reported in the literature. At the top of the figure, exons are numbered and illustrated in purple, with the regions of supplementary exon 9b that are included in the alternatively spliced β and γ variants shown in pink and blue, respectively. Intron 1 was truncated due to space constraints (represented by double vertical dashed lines). SINE repeat regions along the gene locus are shown in orange. Genomic sequences that correspond to the RT-qPCR products generated in this study are shown: pFL, pΔ40, and pΔ133 correspond to the full-length, Δ40, and Δ133 alternative 5′ ends of TP53 transcripts, respectively, while pα, pβ, and pγ correspond to the α, β, and γ alternative 3′ ends of TP53 transcripts, respectively. Four 125-nt RNA sequence regions against which RNA-seq reads were counted are also shown; R1 is located in exon 4, R2 is common to the AluJb element and the Δ133TP53 5′ UTR sequence, R3 is unique to the Δ133TP53 5′ UTR sequence and excludes the AluJb element, and R4 spans exons 5 and 6. B, The intron 1–exon 2 junctions of FL/Δ40TP53_T2 and FL/Δ40TP53_T1 mRNA transcripts are compared, showing the additional CAG sequence incorporated at the start of exon 2 in the FL/Δ40TP53_T1 splice variant family. C, The partial overlap between the 5′ UTR of the Δ133P53 splice variant family and an AluJb repeat element is identified, with the sequence of this overlap shown below. D, The overlap between transcripts from the adjacent gene WRAP53 and exon 1 of TP53 relative to the forward strand of chromosome 17. Gene structures were drawn using Fancy Gene v1.4 (49).

Use of RNA-seq data from large cancer genomic data repositories (33, 34) potentially provides an excellent resource to investigate expression of TP53 transcript variants. While assessing the abundance of transcripts expressed from the TP53 locus as a whole is straightforward, identifying individual TP53 transcript variants is not trivial, due to the complexity of the TP53 gene locus encoding multiple transcripts, several of which appear to be expressed at low levels. TP53 gene transcription can be initiated either at a promoter upstream of exon 1 (P1), expressing several forms of full-length protein (FLp53), or from an internal promoter in intron 4 (P2), expressing the Δ133p53 isoforms. The Δ133TP53 transcript also encodes the Δ160p53 protein, which lacks the first 160 amino acids and is generated by alternative initiation of translation (10). In addition, internal ribosome entry site (IRES)-mediated translation of TP53 mRNA leads to expression of the Δ40p53 isoforms (35). Of note, two sets of transcripts capable of encoding both the full-length (FL) and Δ40 isoforms, FL/Δ40TP53_T1 and FL/Δ40TP53_T2, differ by only three nucleotides in length, due to the incorporation of CAG at the intron–exon junction of exon 2 (Fig. 1B). Another mechanism by which Δ40TP53 transcripts are generated depends on G-quadruplex structures in intron 3, which affect the splicing of intron 2 (36). Finally, alternative splicing between exon 9 and exons 9B/10 generates three alternative 3′ ends for TP53 transcripts (TP53α, β, and γ; ref. 12). Thus, the human TP53 gene can express at least 9 transcript variants (Fig. 1) encoding at least 12 protein isoforms (FLp53α, β, and γ; Δ40p53α, β, and γ; Δ133p53α, β, and γ; Δ160p53α, β, and γ). In addition to these protein-coding transcripts, several noncoding transcripts generated through alternative splicing have been reported, for example, p53Ψ, a transcriptionally inactive p53 isoform with an ability to reprogram cells toward a metastatic-like state (Fig. 1; refs. 31, 37).

To characterize TP53 RNA splice variant expression, in this study, we quantified the expression of all known TP53 transcripts using RNA-seq data from The Cancer Genome Atlas (TCGA) and 9 human cell lines. Our results show that irrespective of tumor type and cell line, and even when using unusually high RNA-seq read depth, commonly used bioinformatic analysis pipelines struggled to detect low abundance TP53 RNA splice variants and had lower sensitivity compared with PCR-based methods. Our analysis of simulated RNA-seq data showed that in the presence of unequal TP53 splice variant abundance or low RNA-seq read counts, significant biases in TP53 splice variant quantification occur. Our data also suggest that only the FL/Δ40TP53α_T1 variant is highly expressed in human cancers and in the 9 cell lines, whereas other isoforms are expressed at much lower levels. These results have both biological and technical implications for TP53 research as well as broader implications for the many studies that quantify RNA splice variants in RNA-seq data.

Materials and Methods

Retrieval of RNA-seq data from TCGA datasets and data analysis

We downloaded TCGA RNA-seq data, which was processed using the RNA-seq by Expectation–Maximization (RSEM) method (38) and normalized to a fixed upper quartile (TCGA MapspliceRSEM version 0.7 pipeline using the GRCh37/hg19 reference), for 10,310 tumors over 32 cancer types (Supplementary Table S1; level 3 data downloaded on October 20, 2015 from the TCGA data portal). A detailed description of the processing protocol can be found in the TCGA open access FTP download directories (https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/). All data analyses and visualizations were performed using the R statistical framework.

RNA-seq of cell lines and data analysis

Total RNA was extracted using TRIzol reagent (Thermo Fisher Scientific) from 6 human osteosarcoma cell lines (HAL, KPD, U2OS, Saos-2, ZK58, and OHS) and three human fibroblast cell lines (IIICF/c, LFS-05F-24 (LFS), and JFCF-6 (JFCF)) obtained from R. Reddel (Children's Medical Research Institute, New South Wales, Australia) over the period of 2014–2015. All cell lines were validated for authenticity by CellBank Australia (www.cellbankaustralia.com) using STR profiling. Strand-specific libraries for TruSeq total RNA sequencing were generated and 2 × 15 cycles of PCR amplification were carried out as per the manufacturer's recommendation prior to sequencing of the 9 samples. Sequencing was performed using 3 lanes of the Illumina HiSeq platform (2 × 125 bp, paired end) with each library multiplexed across all 3 lanes. Library and run parameters are summarized in Supplementary Table S2. Raw sequence files for the cell lines are deposited in SRA and their accession numbers are SAMN05725778 (HAL), SAMN05725779 (KPD), SAMN05725784 (IIICF/c), SAMN05725785 (LFS), SAMN05725780 (U2OS), SAMN05725781 (Saos-2), SAMN05725782 (ZK58), SAMN05725786 (JFCF), and SAMN05725783 (OHS), respectively. RNA-seq reads were quality trimmed using cutadapt v1.9.1 (39) to remove surplus adapters, bases with Phred score <20, and paired reads <50 bp post-trimming. Reads were aligned using bowtie-2 v2.2.7 (40) via RSEM v1.2.28 (38) with recommended RSEM parameters for stranded RNA-seq data. Gene and transcript abundance was quantified using RSEM. Sequencing reads were also aligned using bowtie-2 v2.2.7 (40) via TopHat v2.1.1 (41) and quantified using Cufflinks v2.2.1 (42), using the default settings.
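The read-filtering criteria stated above (3′ bases with Phred score <20 removed, read pairs shorter than 50 bp after trimming discarded) can be illustrated with a minimal Python sketch. This is not cutadapt's actual trimming algorithm, which is more elaborate; the simple "strip trailing low-quality bases" rule, the function names, and the choice to discard a pair when either mate is too short are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

MIN_PHRED = 20   # quality threshold stated in the text
MIN_LENGTH = 50  # minimum post-trimming read length stated in the text

Read = Tuple[str, List[int]]  # (sequence, per-base Phred scores)


def trim_3prime(seq: str, quals: List[int], min_phred: int = MIN_PHRED) -> Read:
    """Drop trailing bases whose Phred score is below min_phred (simplified rule)."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_phred:
        end -= 1
    return seq[:end], quals[:end]


def filter_pair(read1: Read, read2: Read,
                min_length: int = MIN_LENGTH) -> Optional[Tuple[Read, Read]]:
    """Trim both mates; return None if either mate ends up shorter than min_length."""
    t1 = trim_3prime(*read1)
    t2 = trim_3prime(*read2)
    if len(t1[0]) < min_length or len(t2[0]) < min_length:
        return None  # assumption: the whole pair is discarded
    return t1, t2


# Hypothetical 125-nt reads: one clean, one with a poor-quality tail, one mostly poor.
read_good = ("A" * 125, [30] * 125)
read_tail = ("C" * 125, [30] * 60 + [10] * 65)
read_short = ("G" * 125, [30] * 40 + [5] * 85)
```

In this sketch, a pair made of `read_good` and `read_tail` is kept with the second mate trimmed to 60 nt, whereas any pair containing `read_short` is dropped because only 40 nt survive trimming.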

The mRNA sequence for 7 TP53 transcripts (FL/Δ40TP53α_T1, FL/Δ40TP53α_T2, FL/Δ40TP53β, FL/Δ40TP53γ, and Δ133TP53α, β, and γ) was downloaded from the UCSC genome browser (43) along with the Locus Reference Genomic (LRG) database IDs (44) for the reference track set to GRCh37/hg19 (Supplementary Table S3). Reads were then simulated for each of these 7 TP53 transcripts using dwgsim v0.1.11 (https://github.com/nh13/DWGSIM; ref. 45). Simulated reads were aligned and transcript abundance was quantified using either bowtie-2 (40) via RSEM (38), or bowtie-2 (40) via TopHat (41) followed by quantification using Cufflinks (42). Read counts in each sample overlapping 4 specific genomic regions were counted by using the sequences and their reverse complements as bespoke references, then mapping the reads to these references using bowtie-2 (40), followed by counting with HTSeq Python scripts (46).

ddPCR and RT-qPCR

The RNA (2 µg) extracted from the 9 cell lines was DNase I treated (Thermo Fisher Scientific) and then reverse transcribed using qScript cDNA SuperMix (Quanta Biosciences), according to the manufacturer's instructions. Primers were designed for specific TP53 transcript subclasses (FL/Δ40TP53_T1, FL/Δ40TP53_T2 and Δ133TP53, TP53α, β, and γ). Primer sequences, amplification efficiencies, and the method used to determine amplification efficiency for each primer pair are provided in Supplementary Table S4. Transcript abundance was measured using the Bio-Rad QX200 ddPCR System (Bio-Rad). In brief, 0.5–50 ng of cDNA was added to 10 µL of QX200 ddPCR EvaGreen Supermix (Bio-Rad) with 100 nmol/L each of forward and reverse primers in a total volume of 20 µL. Droplets were generated with Droplet Generation Oil for EvaGreen (Bio-Rad), transferred to a 96-well PCR plate, and an endpoint PCR run in a C1000 Touch Thermal Cycler as follows: 95°C for 5 minutes, then 40 cycles of 95°C for 30 seconds and 60°C for 1 minute, then 1 cycle each of 4°C for 5 minutes, and then 90°C for 5 minutes. Droplets were read using the Bio-Rad QX200 reader and the data analyzed using QuantaSoft software (Supplementary Fig. S1). The limit of detection was based on obtaining >1 but <2 droplets for each primer pair in the volume used for the ddPCR reaction, which was corrected for the actual amount of cDNA in the reaction to calculate copies/µg. On the basis of this analysis, the limit of detection for ddPCR was defined as 30 copies/µg.
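The conversion from droplet counts to copies/µg described above can be sketched as follows. The Poisson correction is the standard way of turning a fraction of positive droplets into a mean copy number per droplet; the droplet volume of 0.85 nL is an assumed nominal value that is not given in the text, and the function names are hypothetical. This is an illustration of the arithmetic, not a reproduction of the QuantaSoft analysis.

```python
import math

DROPLET_VOLUME_NL = 0.85   # assumed nominal droplet volume (not stated in the text)
REACTION_VOLUME_UL = 20.0  # total reaction volume stated in the text


def copies_per_reaction(positive: int, total: int) -> float:
    """Poisson-corrected number of target copies in the whole reaction."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total droplets")
    lam = -math.log(1.0 - positive / total)           # mean copies per droplet
    copies_per_ul = lam / (DROPLET_VOLUME_NL * 1e-3)  # convert nL to uL
    return copies_per_ul * REACTION_VOLUME_UL


def copies_per_ug(copies: float, cdna_ng: float) -> float:
    """Normalise a copy number to the mass of cDNA loaded (ng converted to ug)."""
    return copies / (cdna_ng / 1000.0)
```

As a worked example under these assumptions, 1.5 copies (between 1 and 2) in a reaction loaded with 50 ng (0.05 µg) of cDNA corresponds to 1.5 / 0.05 = 30 copies/µg, which matches the stated limit of detection; whether the authors derived the figure in exactly this way is not stated in the text.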

RT-qPCR was done as follows: 66 ng of cDNA was added to 5 µL of SYBR Premix Ex Taq II (Tli RNase H Plus; Takara Bio), 0.4 µL of Rox dye, and 200 nmol/L of each primer in a final volume of 10 µL. The PCRs were run on a QuantStudio 12K Flex Real-Time PCR System (Life Technologies) as follows: 95°C for 2 minutes, then 40 cycles of 95°C for 30 seconds and 60°C for 1 minute, followed by a melt curve stage to visualize the dissociation curves of the product (Supplementary Fig. S2). RT-qPCR was performed for each sample with each primer pair in triplicate. Transcript abundance from RT-qPCR was calculated using the equation:

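$$\text{Transcript abundance} = E^{\left(\bar{C}_{t,\mathrm{ref}} - C_{t,TP53}\right)}$$

where $E$ is the amplification efficiency of the primer pair, $\bar{C}_{t,\mathrm{ref}}$ is the geometric mean of the reference gene threshold Ct values, and $C_{t,TP53}$ is the threshold Ct of the TP53 product.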

To avoid detection of any false positive products, the limit of detection was set at a Ct value of 32, as the precision of RT-qPCR decreases and the number of false positive hits increases at higher Ct values. The RT-qPCR products were Sanger sequenced and the sequences of the amplicons are provided in Supplementary Fig. S3.
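The RT-qPCR calculation described above (amplification efficiency raised to the difference between the geometric mean reference Ct and the TP53 Ct, with a detection limit at Ct 32) can be sketched in Python. The function name is hypothetical, and two details are assumptions: efficiency is expressed as fold amplification per cycle (2.0 for perfect doubling), and a Ct strictly above 32 is treated as not detected.

```python
from statistics import geometric_mean
from typing import Optional, Sequence

CT_LIMIT = 32.0  # limit of detection stated in the text


def relative_abundance(efficiency: float,
                       reference_cts: Sequence[float],
                       target_ct: float,
                       ct_limit: float = CT_LIMIT) -> Optional[float]:
    """Return efficiency ** (geometric mean reference Ct - target Ct).

    Returns None when the target Ct exceeds the detection limit
    (assumption: values strictly above the limit count as undetected).
    """
    if target_ct > ct_limit:
        return None
    delta_ct = geometric_mean(reference_cts) - target_ct
    return efficiency ** delta_ct
```

For instance, with a hypothetical efficiency of 2.0, reference Ct values of 20 and 20, and a TP53 Ct of 25, the target is 2^(-5), roughly 3% of the reference level; a TP53 Ct of 33 would be reported as not detected.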

Results and Discussion

RNA-seq detects abundantly expressed TP53 transcriptional variants

To estimate the expression of TP53 transcript variants, we analyzed TCGA RNA-seq data from 10,310 human tumors across 32 tumor types. In 99% of TCGA tumors, the overall TP53 gene expression was above the median expression of all genes and in 75% of tumors TP53 was expressed above the 75th percentile of all genes (Fig. 2A). However, despite the abundance of TP53, only two RNA splice variants were readily identified across the different tumor types. These included FL/Δ40TP53α_T1 and an uncharacterized variant (uc010cne.1), while other TP53 transcripts, including the canonical FL/Δ40TP53α_T2, FL/Δ40TP53β, and FL/Δ40TP53γ transcripts and the Δ133TP53α, β, and γ transcripts, were not detected (Fig. 2B).

Similar to the tumors, the only transcript variant from RSEM-processed RNA-seq data detected in 6 of the 9 cell lines was FL/Δ40TP53α_T1. All other TP53 transcript variants were barely detected, including uc010cne.1 (Fig. 2C).
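The within-tumor percentile comparison reported for the TCGA data (TP53 above the median or the 75th percentile of all genes) amounts to ranking one gene against every other gene in the same sample. A minimal sketch is given below; the gene names other than TP53 and all expression values are invented toy data, and the strict "below" definition of percentile rank is an assumption rather than the authors' stated procedure.

```python
from typing import Dict, List


def percentile_rank(expression: Dict[str, float], gene: str) -> float:
    """Percentage of genes in one sample expressed strictly below `gene`."""
    target = expression[gene]
    below = sum(1 for value in expression.values() if value < target)
    return 100.0 * below / len(expression)


def fraction_above(samples: List[Dict[str, float]], gene: str, percentile: float) -> float:
    """Fraction of samples in which `gene` ranks above the given percentile."""
    ranks = [percentile_rank(sample, gene) for sample in samples]
    return sum(rank > percentile for rank in ranks) / len(ranks)


# Toy samples (hypothetical values, not TCGA data).
tumor_a = {"GENE_A": 0.0, "GENE_B": 3.0, "GENE_C": 12.0, "TP53": 40.0, "GENE_D": 95.0}
tumor_b = {"GENE_A": 5.0, "GENE_B": 1.0, "GENE_C": 2.0, "TP53": 90.0, "GENE_D": 10.0}
```

With these toy samples, TP53 sits above the median in both tumors but above the 75th percentile in only one of them, which is the kind of summary reported in Fig. 2A.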

Deep RNA-seq, RT-qPCR, and ddPCR of human cancer cell lines

To determine whether deep RNA-seq data (121–139 million reads per sample) analyzed using the RSEM protocol of the 9 cell lines was an accurate reflection of the transcript abundance, ddPCR and RT-qPCR were also performed for each of the TP53 transcript subclasses (FL/Δ40TP53_T1, FL/Δ40TP53_T2, and Δ133TP53, TP53α, β, and γ). All three methods detected expression from the TP53 locus in IIICF/c, LFS, U2OS, JFCF, and OHS cells, but not in Saos-2, HAL, KPD, or ZK58 cells (Fig. 3; Supplementary Table S5). As we observed for the TCGA data, deep RNA-seq detected the FL/Δ40TP53α_T1 transcript across the 5 cell lines with an expected read count of between 83 and 10,032 reads (Supplementary Table S5). Similarly, expression of the transcript subclasses FL/Δ40TP53_T1 and TP53α using ddPCR (Fig. 3A and B) and RT-qPCR (Supplementary Table S5) was much higher than other TP53 transcripts across these 5 cell lines. However, even deep RNA-seq struggled to detect low abundance TP53 transcript subclasses including FL/Δ40TP53_T2, Δ133TP53, and TP53β (Supplementary Table S5), whereas these were readily detected by ddPCR (Fig. 3C–E; Supplementary Table S5) and RT-qPCR (Supplementary Table S5). For example, in U2OS cells, ddPCR detected 1,574 copies/µg of Δ133TP53 mRNA, whereas RNA-seq failed to detect any transcript except FL/Δ40TP53α_T1 (Fig. 3; Supplementary Table S5). Moreover, across these 5 cell lines, relative expression of the Δ133TP53 transcript was 1% of the FL/Δ40TP53_T1 transcript except in IIICF/c cells (Supplementary Fig. S4). Similarly, the relative levels of FL/Δ40TP53_T2 and TP53β were 2% and 20% of the FL/Δ40TP53_T1 transcript across the cell lines, respectively (Supplementary Fig. S4). The expression of these transcripts relative to the FL/Δ40TP53_T1 transcript further highlights their rarity. These data suggest that even deep RNA-seq both fails to detect and underestimates the abundance of rare TP53 transcripts. None of the three methods detected the expression of TP53γ across the cell lines, implying that this transcript is expressed at very low levels in all samples examined. Despite RNA-seq data on tumors from TCGA barely detecting Δ133TP53 transcripts, high levels of this transcript have been detected using RT-qPCR in cohorts of brain, prostate, and colon cancers (unpublished data). These results suggest that there are marked differences in sensitivity between RNA-seq and PCR-based methods.

Lack of detection of TP53 transcript subclasses with low levels of expression (FL/Δ40TP53_T2, Δ133TP53, and TP53β) by deep RNA-seq was partially explained by the fact that ≤5 individually counted RNA-seq reads were found to contain both PCR primers for the FL/Δ40TP53_T2, Δ133TP53, and TP53β transcript subclasses (Supplementary Table S6). Also, the detection of the Δ133TP53 transcript subclass is complicated by the overlap between the AluJb sequence and the 5′-UTR of Δ133TP53 (Fig. 1C). Once again, ≤6 reads from deep RNA-seq spanned the 125-nt region unique to the 5′-UTR of Δ133TP53, suggesting low expression of this splice variant relative to other variants, given that much higher numbers of reads were counted in 125-nt regions of exon 4 and exons 5/6. A higher number of RNA-seq reads in the region overlapping the AluJb sequence than in the adjacent 5′-UTR of Δ133TP53 suggests that artifactual read counts could have been recorded against this repeat region due to expressed AluJb sequences from elsewhere in the genome (Supplementary Table S7).

Simulation analysis of RNA-seq read assignment reveals mapping uncertainty for low abundance transcripts

In addition to a very low number of reads, assigning a set of short RNA-seq reads (125 nt), few of which cross RNA splice sites, to numerous alternatively spliced transcripts in a complex locus like TP53 is theoretically a difficult task. To test this, we replicated the RSEM analysis done by TCGA using simulated RNA-seq reads. Initially, the input for RSEM was 1,000 simulated reads, separately aligned and counted, for each of FL/Δ40TP53_T1, FL/Δ40TP53_T2, FL/Δ40TP53β, FL/Δ40TP53γ, and Δ133TP53α, β, and γ. As expected, RSEM could accurately assign the reads to the correct TP53 transcript when the input data consisted of only one of the transcripts. Furthermore, when the simulated input consisted of an equal mixture of these transcripts, each with 1,000 reads, RSEM assigned the reads to the correct TP53 transcript variants with a maximum error of ±5%. However, RSEM failed to accurately assign reads to the correct TP53 transcript variant when the simulated input consisted of a mixture of TP53α, β, and γ transcripts each with only 8 reads (Fig. 4). Similar results were observed when the same simulated RNA-seq data were analyzed using TopHat2-Cufflinks as an alternative method. This suggests that, perhaps unsurprisingly, the accuracy of multiple RNA-seq analysis methods for enumerating transcript variants degrades with very low read counts.

Figure 2.

The transcript T1 is the most frequently detected TP53 transcript from RSEM-analyzed data. A, Percentile rank of TP53 gene expression within individual tumors of RNA-seq RSEM analyzed data. Within each tumor, genes were sorted by increasing expression of the TP53 gene along the x-axis, with the median gene expression across tumors shown in the dotted gray line. B, Distribution of the expression of either any RNA encoded by the TP53 gene locus or specific TP53 transcript variants from RNA-seq data analyzed using the RSEM method for 32 cancer types from TCGA. Details for each of the tumor types, including definition of the abbreviations, are summarized in Supplementary Table S1. C, Distribution of the expression of either the TP53 gene locus or transcript variant from RNA-seq data analyzed using the RSEM method for nine human cell lines. The line in the middle of each box represents the median, and the top and bottom outlines of the box represent the first and third quartile, respectively.

We then tested the robustness of RSEM for detecting mixtures containing high and low abundance TP53 transcripts. The input for RSEM consisted of 1,000 simulated reads for FL/Δ40TP53α_T1, representing an abundantly expressed transcript, with low abundance transcripts represented by a simulated input of either 400, 40, or 8 reads for FL/Δ40TP53α_T2, FL/Δ40TP53β, FL/Δ40TP53γ, Δ133TP53α, β, and γ, respectively. In all 3 simulations (400, 40, or 8 reads), RSEM assigned the reads to FL/Δ40TP53α_T1 with a maximum error of ±10%. However, as the number of simulated input reads for the various TP53 transcripts decreased compared with FL/Δ40TP53α_T1, assignment error by the RSEM algorithm increased, resulting in overrepresentation of some TP53 transcripts and lack of detection of others (Fig. 4). For example, as the number of simulated input reads decreased for the Δ133TP53 subclass, it resulted in artifactual overrepresentation of Δ133TP53α (blue bar) and artifactual underrepresentation of Δ133TP53β (purple bar) and Δ133TP53γ (light blue bar; Fig. 4). Similarly, an overrepresentation of FL/Δ40TP53β (orange bar) and FL/Δ40TP53γ (pink bar) and an underrepresentation of FL/Δ40TP53α_T2 (green bar) were observed across the different simulations (Fig. 4). To confirm that TP53 splice variant quantification was not influenced by TP53 gene mutation, we showed that the 20 most frequent COSMIC TP53 single-nucleotide variants and indels caused no significant degradation in the mapping of sequence reads to the correct position in TP53 (Supplementary Table S8). These read assignment errors suggest that the accuracy of RNA-seq for quantifying splice variants is reduced when there is a mixture of high and very low abundance splice variants transcribed from complex loci such as TP53.
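The dependence of read assignment on a small number of informative reads can be illustrated with a toy expectation-maximization (EM) sketch. This is not RSEM itself and was not part of the study: it assumes equal effective transcript lengths, error-free reads, and a hypothetical two-isoform locus, and the names `em_expected_counts`, `iso_a`, and `iso_b` are illustrative only. Each read is represented by the set of isoforms with which it is compatible, and reads shared between isoforms are apportioned according to the current abundance estimates.

```python
def em_expected_counts(reads, isoforms, n_iter=200):
    """Toy EM: return expected read counts per isoform.

    reads    -- list of sets; each set holds the isoforms a read is compatible with
    isoforms -- list of isoform names (hypothetical labels)
    Assumes equal effective lengths, which real quantifiers do not.
    """
    if any(len(compat) == 0 for compat in reads):
        raise ValueError("every read must be compatible with at least one isoform")
    n_reads = len(reads)
    # Start from a uniform abundance estimate.
    theta = {iso: 1.0 / len(isoforms) for iso in isoforms}
    counts = {iso: 0.0 for iso in isoforms}
    for _ in range(n_iter):
        # E-step: split each read among compatible isoforms in proportion to theta.
        counts = {iso: 0.0 for iso in isoforms}
        for compat in reads:
            total = sum(theta[iso] for iso in compat)
            for iso in compat:
                counts[iso] += theta[iso] / total
        # M-step: update abundances from the expected counts.
        theta = {iso: counts[iso] / n_reads for iso in isoforms}
    return counts


# Hypothetical input: 90 reads fall in shared sequence, 10 are isoform-specific.
reads = (
    [{"iso_a", "iso_b"}] * 90
    + [{"iso_a"}] * 8
    + [{"iso_b"}] * 2
)
counts = em_expected_counts(reads, ["iso_a", "iso_b"])
print({iso: round(c, 2) for iso, c in counts.items()})
```

In this sketch the 90 ambiguous reads are divided roughly 80:20, a split determined entirely by the 10 isoform-specific reads. When only a handful of such reads exist, small sampling fluctuations in them propagate to the assignment of every shared read, which is consistent with the loss of accuracy seen at low read numbers.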

Analyses of RNA-seq data from TCGA (10,310 tumors over 32 cancer types) and deep RNA-seq, ddPCR, and RT-qPCR data from 9 human cell lines led to several key findings. First, FL/Δ40TP53α_T1 was the most abundantly expressed TP53 transcript across all tumor types (Fig. 2B) and in the cell lines detected by RNA-seq (Fig. 2C). Similarly, RT-qPCR and ddPCR detected abundant expression of the FL/Δ40TP53_T1 and TP53α transcript subclasses in these cell lines (Fig. 3A and B). In contrast, RNA-seq struggled to detect TP53 transcripts expressed with low abundance (e.g., Δ133p53) across the cell lines (Fig. 2), which were, however, detected by ddPCR (Fig. 3C) and RT-qPCR (Supplementary Table S1), consistent with previous reports (4, 6, 10–12, 37, 47). Interestingly, RNA-seq analyses of tumor RNA identified the noncanonical TP53 transcript uc010cne.1 to be abundantly expressed; however, this was not observed from the RNA-seq data in the 9 cell lines and therefore needs to be validated using PCR-based methods (Fig. 2B and C). Finally, lack of detection of TP53 transcripts expressed at low levels by RNA-seq is at least in part due to the very low numbers of reads produced for these transcripts,

Figure 3.

ddPCR detects all TP53 transcript subclasses except TP53γ. ddPCR of the various TP53 transcript subclasses was performed on the nine cell lines, and the subclass abundance per microgram of RNA was calculated using the QuantaSoft software. The lower limit of detection was <30 copies/μg RNA. FL/Δ40TP53_T1 (A), TP53α (B), FL/Δ40TP53_T2 (C), Δ133TP53 (D), and TP53β (E). TP53 transcripts are expressed at different levels, hence the y-axis scales are different between the plots.


even from the relatively deep RNA-seq performed here (Supplementary Tables S2 and S3). This appears to be compounded by reduced reliability of RNA-seq analysis methods when faced with a wide range of abundance of splice variant reads that need to be mapped to a single complex locus such as TP53 (Fig. 4). It is important, however, to remember that low transcript abundance and limited ability of detection do not imply a lack of biological or clinical significance, as the functional consequences of even low abundance TP53 isoform expression can be profound (see Introduction).
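One way to see why very low read numbers limit detection is a simple sampling calculation. The sketch below assumes, purely for illustration, that the number of reads observed for a transcript follows a Poisson distribution with a given expected count; this model and the function name `prob_at_least_k_reads` are assumptions and were not used in the study.

```python
import math


def prob_at_least_k_reads(expected_reads, k=1):
    """P(X >= k) for X ~ Poisson(expected_reads).

    expected_reads -- mean number of reads expected for the transcript
    k              -- minimum number of reads required to call it detected
    """
    if expected_reads < 0 or k < 0:
        raise ValueError("expected_reads and k must be non-negative")
    # Sum the Poisson probabilities for 0..k-1 reads, then take the complement.
    below_k = sum(
        math.exp(-expected_reads) * expected_reads ** i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - below_k


# Hypothetical expected read counts for a low abundance transcript.
for mean_reads in (0.5, 1.0, 8.0):
    print(mean_reads, round(prob_at_least_k_reads(mean_reads, k=1), 3))
```

Under this assumption, a transcript expected to yield a single read is missed entirely in roughly a third of samples, before any read assignment error is considered.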

Several factors potentially influence the ability of current RNA-seq methods to measure the abundance of a large dynamic range of RNA transcripts. In addition to the bioinformatic factors, these include genomic factors such as: fragmentation site biases, biases in the efficiency of cDNA synthesis, representational bias of sense versus antisense transcripts, stochastic amplification biases, GC bias, template switching, and transcript length bias (30, 32). However, we show using simulations that over and above these genomic issues, there are significant bioinformatic challenges when assigning a mixture of very low and moderately high abundance short reads to a complex locus with largely overlapping splice variants, such as TP53.

On the basis of our findings, RNA-seq data for the TP53 locus and other transcriptionally complex loci, from publicly available information, can be reliably used to quantify total expression of all RNA splice variants from a gene, or individual RNA splice variants that are expressed at high levels, for example, FL/Δ40TP53α_T1. However, RNA-seq assignment of alternatively spliced transcripts from complex single loci such as TP53, expressed at low and/or wide ranging levels, may be inaccurate. In some instances, however, where good coverage of the targeted loci can be obtained, it may be possible to improve the detection of the low abundance transcripts by using deep targeted sequencing approaches. In the future, the use of PacBio and similar technologies that produce long reads (48), or long RT-qPCR, may circumvent the challenge of quantifying RNA transcripts with potential splice variants at both ends. However, for now, accurate quantification of low abundance transcripts in the TP53 and other complex loci can best be done using empirical PCR-based methods.

Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

Authors' Contributions

Conception and design: S. Mehta, H. Campbell, A. Braithwaite, C. Print

Development of methodology: S. Mehta, A. Lasham, C. Print

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): S. Mehta, A. Lasham, H. Campbell

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): S. Mehta, P. Tsai, A. Lasham, A. Braithwaite, C. Print

Writing, review, and/or revision of the manuscript: S. Mehta, P. Tsai, A. Lasham, H. Campbell, R.R. Reddel, A. Braithwaite, C. Print

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): S. Mehta, R.R. Reddel, C. Print

Study supervision: S. Mehta, A. Lasham, A. Braithwaite, C. Print

Figure 4.

Accurate quantitation of transcripts by RSEM is dependent on RNA-seq read depth and relative abundance. From left to right, the first stacked bar shows the ability of RSEM to assign simulated reads to the correct TP53 transcript variant with an error of about ±5% when 1,000 simulated input reads were mixed together for FL/Δ40TP53α_T1, FL/Δ40TP53α_T2, FL/Δ40TP53β, FL/Δ40TP53γ, Δ133TP53α, Δ133TP53β, Δ133TP53γ (reads = 1,000/transcript), respectively. The second stacked bar illustrates a slightly degraded ability of RSEM to assign the reads to the correct TP53 transcript when the number of simulated input reads was reduced 125-fold from 1,000 to 8 reads for all of FL/Δ40TP53α_T1, FL/Δ40TP53α_T2, Δ133TP53α, Δ133TP53β, Δ133TP53γ, FL/Δ40TP53β, and FL/Δ40TP53γ, respectively. Stacked bars 3–5 demonstrate the progressive loss of RSEM's ability to assign reads to the correct TP53 transcript variant as the number of simulated input reads are diluted 2.5-fold (400 reads), 25-fold (40 reads), and 125-fold (8 reads) for FL/Δ40TP53α_T2, Δ133TP53α, Δ133TP53β, Δ133TP53γ, FL/Δ40TP53β, and FL/Δ40TP53γ, respectively, representing low abundance transcripts in the presence of undiluted FL/Δ40TP53α_T1 transcript (1,000 simulated input reads) representing an abundantly expressed transcript. Each stacked bar plot shows the proportion of the expected counts (RSEM)/number of simulated input reads. For example, when RSEM accurately assigns the reads, the height of each stacked bar should equal 1. Inaccuracies by RSEM reduce the height of the stacked bar to <1 when the number of expected count for each variant is less than the input reads and >1 when the expected count for each variant is greater than the input reads.
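The quantity plotted in Fig. 4, the expected count divided by the number of simulated input reads, can be computed as in the following sketch. The numbers are hypothetical placeholders chosen for illustration, not values from the study, and `recovery_ratios` and the transcript labels are illustrative names.

```python
def recovery_ratios(expected_counts, input_reads):
    """Return expected_count / simulated_input_reads for each transcript.

    A ratio of 1 means the reads were assigned accurately; <1 indicates
    underrepresentation and >1 indicates overrepresentation.
    """
    ratios = {}
    for name, n_input in input_reads.items():
        if n_input <= 0:
            raise ValueError(f"input read count for {name} must be positive")
        # A transcript missing from the quantifier output counts as 0 reads.
        ratios[name] = expected_counts.get(name, 0.0) / n_input
    return ratios


# Hypothetical simulation: one abundant transcript plus two rare variants.
input_reads = {"abundant_T1": 1000, "rare_variant_x": 8, "rare_variant_y": 8}
expected_counts = {"abundant_T1": 1000.0, "rare_variant_x": 12.0, "rare_variant_y": 4.0}

ratios = recovery_ratios(expected_counts, input_reads)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

With these placeholder values the abundant transcript has a ratio of 1, whereas the two rare variants are over- and underrepresented by the same four reads, which is a large relative error when only 8 reads were supplied.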


Acknowledgments

We are grateful for the support provided by The University of Auckland, New Zealand, University of Otago, New Zealand, and Children's Medical Research Institute, Sydney, Australia. We would also like to thank New Zealand Genomics Limited for preparing the RNA-seq libraries for the nine cell lines and Jane Noble for authenticating the cell lines.

Grant Support

This work was supported by the Health Research Council of New Zealand grant (awarded to A.W. Braithwaite), the Royal Society of New Zealand Marsden Fund (awarded to A.W. Braithwaite), the Maurice Wilkins Centre, New Zealand (S.Y. Mehta, C.G. Print, and A.W. Braithwaite), and Cancer Council of New South Wales, Australia (R.R. Reddel).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Received June 12, 2016; revised September 2, 2016; accepted September 27, 2016; published OnlineFirst October 20, 2016.

References

1. Lane DP. Cancer. p53, guardian of the genome. Nature 1992;358:15–6.

2. Braithwaite AW, Royds JA, Jackson P. The p53 story: layers of complexity. Carcinogenesis 2005;26:1161–9.

3. Olivier M, Hollstein M, Hainaut P. TP53 mutations in human cancers: origins, consequences, and clinical use. Cold Spring Harb Perspect Biol 2010;2:a001008.

4. Mondal AM, Horikawa I, Pine SR, Fujita K, Morgan KM, Vera E, et al. p53 isoforms regulate aging- and tumor-associated replicative senescence in T lymphocytes. J Clin Invest 2013;123:5247–57.

5. Hafsi H, Santos-Silva D, Courtois-Cox S, Hainaut P. Effects of Δ40p53, an isoform of p53 lacking the N-terminus, on transactivation capacity of the tumor suppressor protein p53. BMC Cancer 2013;13:134.

6. Bernard H, Garmy-Susini B, Ainaoui N, Van Den Berghe L, Peurichard A, Javerzat S, et al. The p53 isoform, Δ133p53α, stimulates angiogenesis and tumour progression. Oncogene 2013;32:2150–60.

7. Avery-Kiejda KA, Morten B, Wong-Brown MW, Mathe A, Scott RJ. The relative mRNA expression of p53 isoforms in breast cancer is associated with clinical features and outcome. Carcinogenesis 2013;35:586–96.

8. Hofstetter G, Berger A, Berger R, Zoric A, Braicu EI, Reimer D, et al. The N-terminally truncated p53 isoform Δ40p53 influences prognosis in mucinous ovarian cancer. Int J Gynecol Cancer 2012;22:372–9.

9. Slatter TL, Hung N, Campbell H, Rubio C, Mehta R, Renshaw P, et al. Hyperproliferation, cancer, and inflammation in mice expressing a Δ133p53-like isoform. Blood 2011;117:5166–77.

10. Marcel V, Perrier S, Aoubala M, Ageorges S, Groves MJ, Diot A, et al. Delta160p53 is a novel N-terminal p53 isoform encoded by Δ133p53 transcript. FEBS Lett 2010;584:4463–8.

11. Fujita K, Mondal AM, Horikawa I, Nguyen GH, Kumamoto K, Sohn JJ, et al. p53 isoforms Δ133p53 and p53β are endogenous regulators of replicative cellular senescence. Nat Cell Biol 2009;11:1135–42.

12. Bourdon J-C. p53 and its isoforms in cancer. Br J Cancer 2007;97:277–82.

13. Campbell H, Slatter T, Jeffs A, Mehta R, Rubio C, Baird M, et al. Does Δ133p53 isoform trigger inflammation and autoimmunity? Cell Cycle 2012;11:446–50.

14. Sawhney S, Hood K, Shaw A, Braithwaite AW, Stubbs R, Hung NA, et al. Alpha-enolase is upregulated on the cell surface and responds to plasminogen activation in mice expressing a Δ133p53α mimic. PLoS One 2015;10:e0116270.

15. Slatter T, Hung N, Bowie S, Campbell H, Rubio C, Speidel D, et al. Δ122p53, a mouse model of Δ133p53α, enhances the tumor-suppressor activities of an attenuated p53 mutant. Cell Death Dis 2015;6:e1783.

16. Roth I, Campbell H, Rubio C, Vennin C, Wilson M, Wiles A, et al. The Δ133p53 isoform and its mouse analogue Δ122p53 promote invasion and metastasis involving pro-inflammatory molecules interleukin-6 and CCL2. Oncogene 2016;35:4981–9.

17. Joruiz SM, Bourdon J-C. p53 isoforms: key regulators of the cell fate decision. Cold Spring Harb Perspect Med 2016;6:pii: a026039.

18. Nutthasirikul N, Limpaiboon T, Leelayuwat C, Patrakitkomjorn S, Jearanaikoon P. Ratio disruption of the Δ133p53 and TAp53 isoform equilibrium correlates with poor clinical outcome in intrahepatic cholangiocarcinoma. Int J Oncol 2013;42:1181–8.

19. Avery-Kiejda KA, Zhang XD, Adams LJ, Scott RJ, Vojtesek B, Lane DP, et al. Small molecular weight variants of p53 are expressed in human melanoma cells and are induced by the DNA-damaging agent cisplatin. Clin Cancer Res 2008;14:1659–68.

20. Aoubala M, Murray-Zmijewski F, Khoury MP, Fernandes K, Perrier S, Bernard H, et al. p53 directly transactivates Δ133p53α, regulating cell fate outcome in response to DNA damage. Cell Death Differ 2011;18:248–58.

21. Chen J, Ng SM, Chang C, Zhang Z, Bourdon J-C, Lane DP, et al. p53 isoform Δ113p53 is a p53 target gene that antagonizes p53 apoptotic activity via BclxL activation in zebrafish. Genes Dev 2009;23:278–90.

22. Ungewitter E, Scrable H. Delta40p53 controls the switch from pluripotency to differentiation by regulating IGF signaling in ESCs. Genes Dev 2010;24:2408–19.

23. Hinault C, Kawamori D, Liew CW, Maier B, Hu J, Keller SR, et al. Δ40 Isoform of p53 controls beta-cell proliferation and glucose homeostasis in mice. Diabetes 2011;60:1210–22.

24. Arsic N, Gadea G, Lagerqvist EL, Busson M, Cahuzac N, Brock C, et al. The p53 isoform Δ133p53β promotes cancer stem cell potential. Stem Cell Reports 2015;4:531–40.

25. Slatter TL, Ganesan P, Holzhauer C, Mehta R, Rubio C, Williams G, et al. p53-mediated apoptosis prevents the accumulation of progenitor B cells and B-cell tumors. Cell Death Differ 2010;17:540–50.

26. Marcel V, Khoury MP, Fernandes K, Diot A, Lane DP, Bourdon JC. Detecting p53 isoforms at protein level. Methods Mol Biol 2013;962:15–29.

27. Khoury MP, Marcel V, Fernandes K, Diot A, Lane DP, Bourdon JC. Detecting and quantifying p53 isoforms at mRNA level in cell lines and tissues. Methods Mol Biol 2013;962:1–14.

28. Hindson CM, Chevillet JR, Briggs HA, Gallichotte EN, Ruf IK, Hindson BJ, et al. Absolute quantification by droplet digital PCR versus analog real-time PCR. Nat Methods 2013;10:1003–5.

29. Prensner JR, Cao X, Wu Y-M, Robinson D, Wang R, Chen G, et al. The landscape of antisense gene expression in human cancers. Genome Res 2015;25:1068–79.

30. Xuan J, Yu Y, Qing T, Guo L, Shi L. Next-generation sequencing in the clinic: promises and challenges. Cancer Lett 2013;340:284–95.

31. Mercer TR, Gerhardt DJ, Dinger ME, Crawford J, Trapnell C, Jeddeloh JA, et al. Targeted RNA sequencing reveals the deep complexity of the human transcriptome. Nat Biotech 2012;30:99–104.

32. Ozsolak F, Milos PM. RNA sequencing: advances, challenges and opportunities. Nat Rev Genet 2011;12:87–98.

33. The Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, Shaw KRM, Ozenberger BA, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 2013;45:1113–20.

34. Zhang J, Baran J, Cros A, Guberman JM, Haider S, Hsu J, et al. International Cancer Genome Consortium Data Portal–a one-stop shop for cancer genomics data. Database 2011;2011:bar026.

35. Sharathchandra A, Katoch A, Das S. IRES mediated translational regulation of p53 isoforms. Wiley Interdiscip Rev RNA 2014;5:131–9.

36. Marcel V, Tran PL, Sagne C, Martel-Planche G, Vaslin L, Teulade-Fichou MP, et al. G-quadruplex structures in TP53 intron 3: role in alternative splicing and in production of p53 mRNA isoforms. Carcinogenesis 2011;32:271–8.

37. Senturk S, Yao Z, Camiolo M, Stiles B, Rathod T, Walsh AM, et al. p53Ψ is a transcriptionally inactive p53 isoform able to reprogram cells toward a metastatic-like state. Proc Natl Acad Sci 2014;111:E3287–96.

38. Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 2011;12:323.

39. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 2011;17:10–2.


40. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods 2012;9:357–9.

41. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 2009;25:1105–11.

42. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, Van Baren MJ, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnol 2010;28:511–5.

43. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The human genome browser at UCSC. Genome Res 2002;12:996–1006.

44. Dalgleish R, Flicek P, Cunningham F, Astashyn A, Tully RE, Proctor G, et al. Locus Reference Genomic sequences: an improved basis for describing human DNA variants. Genome Med 2010;2:1–7.

45. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics 2009;25:2078–9.

46. Anders S, Pyl PT, Huber W. HTSeq–a Python framework to work with high-throughput sequencing data. Bioinformatics 2015;31:166–9.

47. Boldrup L, Bourdon JC, Coates PJ, Sjostrom B, Nylander K. Expression of p53 isoforms in squamous cell carcinoma of the head and neck. Eur J Cancer 2007;43:617–23.

48. Rhoads A, Au KF. PacBio sequencing and its applications. Genomics Proteomics Bioinformatics 2015;13:278–89.

49. Rambaldi D, Ciccarelli FD. FancyGene: dynamic visualization of gene structures and protein domain architectures on genomic loci. Bioinformatics 2009;25:2281–2.


Sunali Mehta, Peter Tsai, Annette Lasham, et al. A Study of TP53 RNA Splicing Illustrates Pitfalls of RNA-seq Methodology. Cancer Res. Published OnlineFirst October 20, 2016.

Updated version: Access the most recent version of this article at doi: 10.1158/0008-5472.CAN-16-1624

Supplementary Material: Access the most recent supplemental material at http://cancerres.aacrjournals.org/content/suppl/2016/10/20/0008-5472.CAN-16-1624.DC1

   

   

   
