![Page 1: Software for Robust Transcript Discovery and Quantification from RNA- Seq](https://reader035.vdocument.in/reader035/viewer/2022081505/568161d7550346895dd1dca9/html5/thumbnails/1.jpg)
Software for Robust Transcript Discovery and Quantification from RNA-Seq
Ion Mandoiu, Alex Zelikovsky, Serghei Mangul
![Page 2: Software for Robust Transcript Discovery and Quantification from RNA- Seq](https://reader035.vdocument.in/reader035/viewer/2022081505/568161d7550346895dd1dca9/html5/thumbnails/2.jpg)
Outline
• Background
• Existing approaches
• Proposed Flow
• Datasets
![Page 3: Software for Robust Transcript Discovery and Quantification from RNA- Seq](https://reader035.vdocument.in/reader035/viewer/2022081505/568161d7550346895dd1dca9/html5/thumbnails/3.jpg)
Alternative Splicing
![Page 4: Software for Robust Transcript Discovery and Quantification from RNA- Seq](https://reader035.vdocument.in/reader035/viewer/2022081505/568161d7550346895dd1dca9/html5/thumbnails/4.jpg)
RNA-Seq
• Make cDNA & shatter into fragments
• Sequence fragment ends
• Map reads
• Tasks: Gene Expression (GE), Isoform Discovery (ID), Isoform Expression (IE)
[Figure: diagram with segments labeled A B C D E and the combinations A B C, A C, and D E]
![Page 5: Software for Robust Transcript Discovery and Quantification from RNA- Seq](https://reader035.vdocument.in/reader035/viewer/2022081505/568161d7550346895dd1dca9/html5/thumbnails/5.jpg)
Existing approaches
• Genome-guided reconstruction
– Exon identification
– Genome-guided assembly
• Genome-independent reconstruction
– Genome-independent assembly
• Annotation-guided reconstruction
– Explicitly use existing annotation during assembly
![Page 6: Software for Robust Transcript Discovery and Quantification from RNA- Seq](https://reader035.vdocument.in/reader035/viewer/2022081505/568161d7550346895dd1dca9/html5/thumbnails/6.jpg)
Genome-guided reconstruction (GGR)
• Scripture (2010)
– Reports all isoforms
• Cufflinks (2010)
– Reports a minimal set of isoforms
Trapnell, C. et al., May 2010; Guttman, M. et al., May 2010
![Page 7: Software for Robust Transcript Discovery and Quantification from RNA- Seq](https://reader035.vdocument.in/reader035/viewer/2022081505/568161d7550346895dd1dca9/html5/thumbnails/7.jpg)
Genome-independent reconstruction (GIR)
• Trinity (2011), Velvet (2008), TransABySS (2008)
– de Bruijn k-mer graph
• Efficiently construct the graph from a large amount of raw data
• Scoring algorithm to recover all plausible splice forms
• Robustness to the noise stemming from sequencing errors
Grabherr, M. et al. Nat. Biotechnol. JULY 2011
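To make the de Bruijn k-mer graph named on this slide concrete, here is a minimal Python sketch. It is not the Trinity, Velvet, or TransABySS implementation: the function name `build_de_bruijn`, the toy reads, and the choice k = 4 are assumptions made only for illustration. Nodes are (k-1)-mers, and each k-mer seen in a read adds one unit of weight to the edge from its prefix to its suffix.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return {prefix (k-1)-mer: {suffix (k-1)-mer: number of supporting k-mers}}."""
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {u: dict(vs) for u, vs in graph.items()}


# Hypothetical toy reads that overlap by five bases.
toy_reads = ["ACGTAC", "CGTACG"]
toy_graph = build_de_bruijn(toy_reads, k=4)
```

One possible use of the edge counts, again as an assumption rather than a description of any of the tools above, is to discard edges with very low support, since k-mers created by sequencing errors tend to be rare.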
![Page 8: Software for Robust Transcript Discovery and Quantification from RNA- Seq](https://reader035.vdocument.in/reader035/viewer/2022081505/568161d7550346895dd1dca9/html5/thumbnails/8.jpg)
GGR vs GIR
Garber, M. et al. Nat. Methods JUNE 2011
![Page 9: Software for Robust Transcript Discovery and Quantification from RNA- Seq](https://reader035.vdocument.in/reader035/viewer/2022081505/568161d7550346895dd1dca9/html5/thumbnails/9.jpg)
Max Set vs Min Set
Garber, M. et al. Nat. Methods JUNE 2011
![Page 10: Software for Robust Transcript Discovery and Quantification from RNA- Seq](https://reader035.vdocument.in/reader035/viewer/2022081505/568161d7550346895dd1dca9/html5/thumbnails/10.jpg)
Reconstruction Strategies Comparison
Grabherr, M. et al. Nat. Biotechnol. MAY 2011
![Page 11: Software for Robust Transcript Discovery and Quantification from RNA- Seq](https://reader035.vdocument.in/reader035/viewer/2022081505/568161d7550346895dd1dca9/html5/thumbnails/11.jpg)
IsoEM
• EM Algorithm for IE
– Single and/or paired reads
– Fragment length distribution
– Strand information
– Base quality scores
Nicolae, M. et al.
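The EM idea behind isoform expression estimation can be sketched in a few lines of Python. This is a simplified illustration, not the IsoEM implementation: it assumes each read is described only by the set of isoforms it is compatible with, and it ignores the fragment length distribution, strand information, and base quality scores listed above. The function name `em_isoform_frequencies`, the isoform lengths, and the read counts are all hypothetical.

```python
def em_isoform_frequencies(read_compat, lengths, iterations=200):
    """Estimate relative isoform frequencies by EM.

    read_compat: list of lists; read_compat[r] holds the isoforms read r is compatible with.
    lengths: dict mapping isoform id to its (assumed known) effective length.
    """
    isoforms = list(lengths)
    freq = {i: 1.0 / len(isoforms) for i in isoforms}  # uniform starting point
    for _ in range(iterations):
        counts = {i: 0.0 for i in isoforms}
        # E-step: split each read among its compatible isoforms in
        # proportion to the current frequency estimates.
        for compat in read_compat:
            total = sum(freq[i] for i in compat)
            if total == 0:
                continue
            for i in compat:
                counts[i] += freq[i] / total
        # M-step: divide expected read counts by length, then renormalize.
        per_base = {i: counts[i] / lengths[i] for i in isoforms}
        norm = sum(per_base.values())
        freq = {i: per_base[i] / norm for i in isoforms}
    return freq


# Hypothetical toy data: 6 reads unique to ABC, 2 unique to AC, 4 shared.
toy_lengths = {"ABC": 300, "AC": 200}
toy_reads = [["ABC"]] * 6 + [["AC"]] * 2 + [["ABC", "AC"]] * 4
toy_freq = em_isoform_frequencies(toy_reads, toy_lengths)
```

The length normalization in the M-step reflects the assumption that a longer isoform yields proportionally more reads at the same abundance.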
![Page 12: Software for Robust Transcript Discovery and Quantification from RNA- Seq](https://reader035.vdocument.in/reader035/viewer/2022081505/568161d7550346895dd1dca9/html5/thumbnails/12.jpg)
IsoEM Validation on MAQC Samples
RNA-Seq: 6 MAQC libraries, 47-92M 35bp reads each [Bullard et al. 10]
qPCR: Quadruplicate measurements for 832 Ensembl genes [MAQC Consortium 06]
[Figure: r² (axis ticks 0.35 to 0.85) plotted against Million Mapped Bases, with one series each for IsoEM and Cufflinks on HBRR 1X, HBRR 1A, UHRR 1X, UHRR 1A, UHRR 2, UHRR 3, UHRR 4, and UHRR 5]
![Page 13: Software for Robust Transcript Discovery and Quantification from RNA- Seq](https://reader035.vdocument.in/reader035/viewer/2022081505/568161d7550346895dd1dca9/html5/thumbnails/13.jpg)
VSEM: Virtual String EM
• Estimate total frequency of missing transcripts
• Identify read spectrum sequenced from missing transcripts
Mangul, S. et al.
[Flowchart: (Incomplete) panel + virtual string with 0-weights in virtual string → EM → ML estimates of string frequencies → Compute expected read frequencies → Update weights of reads in virtual string → Virtual string frequency change > ε? YES: return to EM; NO: Output string frequencies]
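The loop in the flowchart on this slide can be sketched as follows. This is an illustrative reading, not the authors' code: the slide does not give the weight-update rule, so the rule used here (add the gap between observed and expected read frequency, clipped at zero) is an assumption, as are the names `em` and `vsem`, the normalization of the virtual string's weights, and the toy panel.

```python
def em(emit, obs, iterations=200):
    """ML string frequencies for emission matrix emit[i][j] and observed read frequencies obs[j]."""
    n, m = len(emit), len(obs)
    freq = [1.0 / n] * n
    for _ in range(iterations):
        new = [0.0] * n
        for j in range(m):
            total = sum(freq[i] * emit[i][j] for i in range(n))
            if total == 0:
                continue  # read j is not emitted by any string yet
            for i in range(n):
                new[i] += obs[j] * freq[i] * emit[i][j] / total
        norm = sum(new)
        freq = [x / norm for x in new]
    return freq


def vsem(panel, obs, eps=1e-6, max_rounds=50):
    """Return frequencies of the panel strings followed by the virtual string."""
    m = len(obs)
    weights = [0.0] * m  # virtual string starts with 0-weights
    prev_virtual = None
    freq = []
    for _ in range(max_rounds):
        wsum = sum(weights)
        virtual = [w / wsum for w in weights] if wsum > 0 else weights
        emit = panel + [virtual]
        freq = em(emit, obs)  # ML estimates of string frequencies
        expected = [sum(freq[i] * emit[i][j] for i in range(len(emit)))
                    for j in range(m)]  # expected read frequencies
        # Assumed update rule: raise the weight of under-explained reads.
        weights = [max(0.0, w + (o - e))
                   for w, o, e in zip(weights, obs, expected)]
        if prev_virtual is not None and abs(freq[-1] - prev_virtual) <= eps:
            break  # virtual string frequency change <= eps
        prev_virtual = freq[-1]
    return freq


# Hypothetical panel: two known strings over four read classes; the last
# read class is emitted by neither, i.e. it comes from a missing transcript.
toy_panel = [[0.5, 0.5, 0.0, 0.0],
             [0.0, 0.5, 0.5, 0.0]]
toy_obs = [0.2, 0.3, 0.1, 0.4]
toy_result = vsem(toy_panel, toy_obs)
```

In this toy case the last entry of `toy_result` plays the role of the estimated total frequency of missing transcripts, and the final virtual-string weights indicate which reads it accounts for.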
![Page 14: Software for Robust Transcript Discovery and Quantification from RNA- Seq](https://reader035.vdocument.in/reader035/viewer/2022081505/568161d7550346895dd1dca9/html5/thumbnails/14.jpg)
Proposed Flow
• Step 1: Read error correction
• Step 2: Maximum likelihood estimation of isoform frequencies and identification of unexplained reads
• Step 3: Read clustering
• Step 4: Read graph construction and candidate transcript generation, then continue from Step 2
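The control flow of the four steps can be sketched as a loop with pluggable stages. Everything below is hypothetical scaffolding: the slide does not say when the iteration stops, so stopping when no unexplained reads or no new candidates remain is an assumption, and the toy stand-in functions only exercise the loop; they do not perform real error correction, maximum likelihood estimation, clustering, or graph assembly.

```python
def discovery_loop(reads, transcripts, correct, estimate, cluster, assemble,
                   max_rounds=10):
    """Run Step 1 once, then repeat Steps 2-4 until nothing new is found."""
    reads = correct(reads)                                   # Step 1
    freqs = {}
    for _ in range(max_rounds):
        freqs, unexplained = estimate(reads, transcripts)    # Step 2
        if not unexplained:
            break
        clusters = cluster(unexplained)                      # Step 3
        candidates = [t for c in clusters for t in assemble(c)]  # Step 4
        new = [t for t in dict.fromkeys(candidates) if t not in transcripts]
        if not new:
            break
        transcripts = transcripts + new                      # back to Step 2
    return transcripts, freqs


# Toy stand-ins (all hypothetical): reads and transcripts are plain strings.
def toy_correct(reads):
    return [r.upper() for r in reads]


def toy_estimate(reads, transcripts):
    unexplained = [r for r in reads if not any(r in t for t in transcripts)]
    freqs = {t: 1.0 / len(transcripts) for t in transcripts}  # placeholder, not ML
    return freqs, unexplained


def toy_cluster(reads):
    return [reads]  # a single cluster holding every unexplained read


def toy_assemble(cluster):
    return list(cluster)  # each read becomes its own candidate transcript


final_transcripts, final_freqs = discovery_loop(
    ["acg", "TTT"], ["AACGG"],
    toy_correct, toy_estimate, toy_cluster, toy_assemble)
```

Passing the stages in as functions keeps the loop independent of any particular error-correction, estimation, clustering, or assembly method.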
![Page 15: Software for Robust Transcript Discovery and Quantification from RNA- Seq](https://reader035.vdocument.in/reader035/viewer/2022081505/568161d7550346895dd1dca9/html5/thumbnails/15.jpg)
SOLiD RNA-Seq Datasets
| Statistic | MCF7-SOLiD4 (April 2010) Paired End | MCF7-SOLiD5500 (December 2010) Paired End | MCF7-SOLiD5500 (December 2010) Frag Color | MCF7-SOLiD5500 (December 2010) Frag ECC Base |
| --- | --- | --- | --- | --- |
| Total BAM records processed (valid records) | 540,187,060 | 964,677,956 | 447,491,122 | 442,406,834 |
| Total unmapped records | 135,285,131 | 249,120,112 | 0 | 0 |
| Total not primary records | 0 | 0 | 0 | 0 |
| Total low mapQV (<10) records | 125,776,254 | 302,827,913 | 116,983,995 | 149,380,139 |
| Not in any chromosome in the dictionary | 12,483,859 | 26,731,194 | 18,800,675 | 9,338,242 |
| Total reads passing filters | 266,641,816 | 385,998,737 | 311,706,452 | 283,688,453 |
| Counted on exons | 202,347,590 | 282,998,093 | 232,539,004 | 209,808,863 |
| Counted on introns | 32,366,424 | 53,218,659 | 44,321,422 | 42,017,833 |
| Counted intergenic | 31,927,802 | 49,781,985 | 34,846,026 | 31,861,757 |
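The counts on this slide can be cross-checked with a little arithmetic. The Python sketch below uses only the numbers reported above; the dictionary keys and variable names are made up for illustration. It checks that subtracting the filtered categories from the total gives the reads passing filters, that exonic, intronic, and intergenic counts add up to the same figure, and then computes the exonic fraction per dataset.

```python
# Columns: total, unmapped, not primary, low mapQV, off-dictionary,
# passing filters, exons, introns, intergenic (copied from the slide).
datasets = {
    "SOLiD4 Paired End": (540_187_060, 135_285_131, 0, 125_776_254,
                          12_483_859, 266_641_816,
                          202_347_590, 32_366_424, 31_927_802),
    "SOLiD5500 Paired End": (964_677_956, 249_120_112, 0, 302_827_913,
                             26_731_194, 385_998_737,
                             282_998_093, 53_218_659, 49_781_985),
    "SOLiD5500 Frag Color": (447_491_122, 0, 0, 116_983_995,
                             18_800_675, 311_706_452,
                             232_539_004, 44_321_422, 34_846_026),
    "SOLiD5500 Frag ECC Base": (442_406_834, 0, 0, 149_380_139,
                                9_338_242, 283_688_453,
                                209_808_863, 42_017_833, 31_861_757),
}

consistent = {}
exon_fraction = {}
for name, row in datasets.items():
    total, unmapped, not_primary, low_q, off_dict, passing, exons, introns, intergenic = row
    filters_ok = total - unmapped - not_primary - low_q - off_dict == passing
    regions_ok = exons + introns + intergenic == passing
    consistent[name] = filters_ok and regions_ok
    exon_fraction[name] = exons / passing

for name in datasets:
    print(f"{name}: consistent={consistent[name]}, exonic={exon_fraction[name]:.1%}")
```

Both identities hold for all four datasets, which suggests the filter categories were counted as disjoint.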
![Page 16: Software for Robust Transcript Discovery and Quantification from RNA- Seq](https://reader035.vdocument.in/reader035/viewer/2022081505/568161d7550346895dd1dca9/html5/thumbnails/16.jpg)
Validation Datasets
• MAQC Sample: 1K transcripts
– HBR (brain sample)
– UHR (universal human reference)
![Page 17: Software for Robust Transcript Discovery and Quantification from RNA- Seq](https://reader035.vdocument.in/reader035/viewer/2022081505/568161d7550346895dd1dca9/html5/thumbnails/17.jpg)
Available Annotations
• NCBI
• UCSC
• Ensembl
• AceView
[Arrow label alongside the list: Less conservative]
![Page 18: Software for Robust Transcript Discovery and Quantification from RNA- Seq](https://reader035.vdocument.in/reader035/viewer/2022081505/568161d7550346895dd1dca9/html5/thumbnails/18.jpg)
Q/A