Cloud Implementation of GT-FAR (Genome and Transcriptome-Free Analysis of RNA-Seq) University of Southern California

Upload: kelly-carroll

Post on 04-Jan-2016



GT-FAR Pipeline


GT-FAR Components

1. Read quality control and adaptor trimming for the input read file
2. Sequential ungapped mapping to reference gene models/genome*
3. Gapped alignment to reference gene models/genome to facilitate splice variant prediction*
4. Sample quantification
   a) A reference-based version concerning gene/junction/exon/pre-mRNA expression*
   b) A reference-free quantification of read/k-mer sequences
5. Output
   a) Quantification data, visualization, and an alignment SAM file for further analysis
   b) Capable of including >99% in reference-based output in high-quality human samples

* When a reference genome and GTF file are available. If one is not available, only a sequence/k-mer-based analysis (4b) is performed.
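The branching described in the footnote, together with the reference-free quantification of step 4b, can be sketched in Python as follows. This is a minimal illustration and not GT-FAR's actual code: the function names `plan_steps` and `count_kmers`, the step labels, and the default k-mer length of 5 are all assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, List, Optional


def plan_steps(genome: Optional[str], gtf: Optional[str]) -> List[str]:
    """Return the steps to run (hypothetical labels).

    Reference-based steps are included only when both a genome
    and a GTF annotation file are supplied.
    """
    steps = ["1_qc_and_trimming"]
    if genome and gtf:
        steps += [
            "2_ungapped_mapping",
            "3_gapped_alignment",
            "4a_reference_quantification",
        ]
    # The sequence/k-mer analysis does not need a reference.
    steps += ["4b_kmer_quantification", "5_output"]
    return steps


def count_kmers(reads: Iterable[str], k: int = 5) -> Counter:
    """Count every k-mer in the reads, skipping k-mers that contain N."""
    counts: Counter = Counter()
    for read in reads:
        read = read.strip().upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


if __name__ == "__main__":
    print(plan_steps(genome=None, gtf=None))
    print(count_kmers(["ACGTAC", "ACGNAC"], k=3).most_common(3))
```

Counting k-mers directly from the reads needs no genome or annotation, which is why this part of the analysis stays available when the reference-based steps are skipped.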


Pegasus WMS on the Cloud

• Allows scientists to design an analysis at a high level without worrying about how to invoke or execute it
• Provides Python, Java, and Perl APIs for workflow creation
• Automatically executes computations on computational resources available to the community or individual
• When failures occur, it tries to recover from them using a variety of mechanisms
• Records provenance
• Used in a number of domains: astronomy, bioinformatics, earthquake science, helioseismology, gravitational-wave physics, seismology, etc.
• Detailed documentation on workflow design and execution at http://pegasus.isi.edu
• Pegasus tutorial on Amazon AWS: http://pegasus.isi.edu/wms/docs/latest/vm_amazon.php
• User support available at [email protected]
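The ideas listed for Pegasus on this slide (describing an analysis at a high level, executing jobs in dependency order, recovering from failures, and recording provenance) can be illustrated with a small standard-library sketch. It does not use the Pegasus API and is far simpler than a real workflow management system: `run_workflow`, the retry limit, and the format of the provenance log are hypothetical names and choices made only for this example. It assumes Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_workflow(
    jobs: Dict[str, Callable[[], None]],
    depends_on: Dict[str, Set[str]],
    max_attempts: int = 3,
) -> List[dict]:
    """Run jobs in dependency order, retry failures, return a provenance log."""
    provenance: List[dict] = []
    graph = {name: depends_on.get(name, set()) for name in jobs}
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                jobs[name]()
            except Exception as exc:
                # Record the failure and try again (simple retry recovery).
                provenance.append(
                    {"job": name, "attempt": attempt, "status": f"failed: {exc}"}
                )
            else:
                provenance.append({"job": name, "attempt": attempt, "status": "ok"})
                break
        else:
            # Every attempt failed: stop the workflow.
            raise RuntimeError(f"job {name!r} failed after {max_attempts} attempts")
    return provenance


if __name__ == "__main__":
    # Trimming must finish before mapping, which precedes quantification.
    log = run_workflow(
        jobs={"trim": lambda: None, "map": lambda: None, "quantify": lambda: None},
        depends_on={"map": {"trim"}, "quantify": {"map"}},
    )
    for entry in log:
        print(entry)
```

The user only states which job depends on which; the ordering, retries, and logging are handled by the runner, which is the separation of concerns the slide describes.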


GT-FAR Cloud Based Pipeline

• The GT-FAR pipeline is available as a cloud-based solution hosted on Amazon EC2 (http://genomics.isi.edu)
• The pipeline is executed on distributed resources using the Pegasus Workflow Management System (http://pegasus.isi.edu)

Capabilities

• Investigators can start an EC2 instance with a GUI/GT-FAR
• Users can upload input files (FastQ file in gzip format) using a web browser
• Tracks running workflows
• Users are able to download the outputs to their local laptops
• Outputs are also made available in Amazon S3
• Allows for error reporting and debugging
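Because this slide states that input must be a FastQ file in gzip format, a quick local check before uploading can catch obvious problems. The sketch below is only an illustration and is not part of GT-FAR: the function name `looks_like_gzipped_fastq` and the `max_records` limit are assumptions, and the check is deliberately strict (it expects the plain four-line record layout with no blank lines).

```python
import gzip
from pathlib import Path


def looks_like_gzipped_fastq(path: str, max_records: int = 1000) -> bool:
    """Return True if the file is gzip data whose first records look like FASTQ."""
    p = Path(path)
    with p.open("rb") as raw:
        if raw.read(2) != b"\x1f\x8b":  # gzip magic bytes
            return False
    records = 0
    with gzip.open(p, "rt") as handle:
        while records < max_records:
            header = handle.readline()
            if not header:
                break  # end of file
            seq = handle.readline().rstrip("\n")
            plus = handle.readline()
            qual = handle.readline().rstrip("\n")
            # A record is: @header, sequence, + separator, quality string.
            if not header.startswith("@") or not plus.startswith("+"):
                return False
            if not seq or len(seq) != len(qual):
                return False
            records += 1
    return records > 0


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, looks_like_gzipped_fastq(name))
```

Only the first `max_records` records are inspected, so the check stays fast on large read files; it is a sanity check rather than full validation.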


GTFAR Success Email


GTFAR Failure Email


Expression of APOL1

• APOL1 has moderate expression, and all of it comes from a few exons and their matching junctions
  – Hence, it is driven by a single transcript.


RNA-seq Analysis Workflows

• GT-FAR (Read-based RNA-seq Analysis)
  – New Functions: Novel Splice Junctions, Reference-free analysis
  – Pegasus WMS: http://pegasus.isi.edu
  – Pegasus GT-FAR (genome and transcriptome free analysis of RNA): http://genomics.isi.edu/gtfar
  – Pegasus tutorial on Amazon AWS: http://pegasus.isi.edu/wms/docs/latest/vm_amazon.php
  – GitHub: https://github.com/pegasus-isi/pegasus-gtfar

• RseqFlow (Standard RNA-seq Analysis)
  – Command line based
  – Functions: RPKM, Differential Expression, Variants
  – Google: https://code.google.com/p/rseqflow/
  – GitHub: https://github.com/herstein/RseqFlow
  – SourceForge: http://sourceforge.net/projects/rseqflow
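For readers unfamiliar with RPKM (reads per kilobase of transcript per million mapped reads), listed among the RseqFlow functions, the usual definition is the gene's read count scaled by gene length in kilobases and by sequencing depth in millions of reads. The sketch below implements that textbook formula; it is not taken from RseqFlow, and the function and parameter names are illustrative assumptions.

```python
def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase per million mapped reads (standard definition)."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # 500 reads on a 2,000 bp gene in a library of 10 million mapped reads
    print(rpkm(500, 2_000, 10_000_000))
```

Normalizing by both length and depth makes values comparable across genes of different sizes and across samples sequenced to different depths.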