RNA-Seq Analysis Overview


Upload: volien

Post on 31-Dec-2016


Page 1: RNA-Seq Analysis Overview

RNA-Seq Analysis Using Pathogen Portal’s RNA-Seq Analysis Pipeline, RNA-Rocket

Overview

- Creating an account

- Exploring the site

- Getting data

- Checking quality

- Starting analysis

- Further analysis


Create an account

Step 1: Create a login account:

I. Go to http://pathogenportal.org

II. Click on RNA Rocket.

III. Click on Create account.

IV. Fill in the required information.


Exploring the site: Launch Pad

- Interactive concept diagram

- Task oriented menu system

- Designed for novice users


Exploring the site: Launch Pad → Trim Reads

- User guide to why, what, and how

- Details required inputs and expected outputs

- Helps organize files into project spaces


Exploring the site: Project View

- View existing projects

- Download files

- View metadata

- Stream to BRC sites

- Manage space allocation

- Share projects


Exploring the site: Shared Data → Published Projects

- View shared projects

- Import into your project space

- Share with collaborators

- Provide data for presentations…


Getting Data

A. Importing shared data

B. Transferring ENA/SRA data

C. Uploading your own

Getting Data: A. Importing shared data

1. Click on “Shared Data” → “Published Projects”.

2. Click on the title of the Project you wish to import


3. Click “Import History” to import the Project into your Project View


Getting Data: B. Transferring ENA/SRA data

1. Navigate to the ‘Launch Pad’ page and click the ‘Get fastq files from SRA/ENA’ link

2. Click the ‘Continue’ button


3. Search for the SRA or ENA accession in the search box provided. Alternatively, search for GEO, ArrayExpress, SRA, or ENA identifiers in the global search box at the top.

4. Click on the Nucleotide Sequences Record title you wish to import.


5. On the subsequent ENA record page, click the ‘File’ link in the ‘Fastq files (galaxy)’ column for each file you wish to transfer.


Getting Data: C. Uploading your own

1. To upload data from your computer or a remote computer, click the ‘Upload Files’ link on the Launch Pad page.

2. On the subsequent page, use the ‘Choose File’ button to upload files from your own computer (limited to 2 GB), the ‘URL/Text’ box to paste URLs for files on remote computers, or the FTP instructions to transfer files over FTP (better for larger files).

[Screenshot of the upload page, with callouts: “Paste the FastQ URLs here”, “Choose files from your computer here”, “Instructions for using FTP”]
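The choice between the three upload routes described above can be sketched as a small helper. This is only an illustration of the decision, not part of RNA-Rocket: the function name `suggest_transfer_method` is hypothetical, and reading the 2 GB limit as 2 × 1024³ bytes is an assumption.

```python
from typing import Optional

# Assumption: the "2 GB" browser limit is interpreted as 2 * 1024**3 bytes.
BROWSER_UPLOAD_LIMIT_BYTES = 2 * 1024**3

# URL schemes treated as "files on remote computers" (illustrative choice).
REMOTE_PREFIXES = ("http://", "https://", "ftp://")


def suggest_transfer_method(source: str, size_bytes: Optional[int] = None) -> str:
    """Suggest which option on the upload page fits a given file (hypothetical helper)."""
    if source.lower().startswith(REMOTE_PREFIXES):
        # Remote files are pasted as URLs rather than uploaded from disk.
        return "URL/Text box"
    if size_bytes is None:
        raise ValueError("size_bytes is required for local files")
    if size_bytes <= BROWSER_UPLOAD_LIMIT_BYTES:
        return "Choose File button"
    # Larger local files are better transferred over FTP.
    return "FTP"


small = suggest_transfer_method("reads.fastq", 500 * 1024**2)
large = suggest_transfer_method("reads.fastq", 5 * 1024**3)
remote = suggest_transfer_method("https://example.org/reads.fastq.gz")
```

For a local file, the size could be obtained with `os.path.getsize` before calling the helper.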


Checking quality

Read base quality can affect how the reads map to the genome. Different sequencing technologies can have different quality and base-call error profiles. Depending on the quality of the base calls, you may wish to trim your read sequences or adjust the alignment parameters to account for this.

Two tools are available: FastQC, for checking the average base call quality in a FASTQ file, and SAMStat, for checking the number of reads aligned.

An example is provided in Shared Data → Published Projects → RNASeq_QC_Demo. It contains two classes of files:

1. the original reads

2. a trimmed version of those reads with low-quality ends removed

For both classes we give the FastQC and SAMStat reports.

Original fastq & analysis

Trimmed fastq & analysis

Click the eye icon to see the contents of a file or report.
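To make "average base call quality" concrete, the following is a minimal sketch of how a mean quality per read position could be computed from a FASTQ file. It is not FastQC's implementation. It assumes Phred+33 quality encoding and strict four-line records; the function names and the demo reads are made up for illustration.

```python
from io import StringIO
from typing import Iterable, Iterator, List, Tuple

# Assumption: qualities use the Phred+33 (Sanger / Illumina 1.8+) encoding.
PHRED_OFFSET = 33


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from strict four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip stray blank lines between records
        seq = next(it, "").rstrip("\n")
        next(it, "")  # the '+' separator line
        qual = next(it, "").rstrip("\n")
        yield header[1:], seq, qual


def mean_quality_per_position(records) -> List[float]:
    """Average Phred score at each read position across all records."""
    totals: List[int] = []
    counts: List[int] = []
    for _name, _seq, qual in records:
        for i, ch in enumerate(qual):
            if i == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - PHRED_OFFSET
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Two made-up reads: 'I' encodes Q40, '5' Q20, '+' Q10, '!' Q0.
demo = "@r1\nACGT\n+\nIIII\n@r2\nACG\n+\n5+!\n"
profile = mean_quality_per_position(read_fastq(StringIO(demo)))
print(profile)  # [30.0, 25.0, 20.0, 40.0]
```

A profile that drops towards the end of the read is the usual reason for trimming low-quality ends.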


From the FastQC report we see that the average base call quality is improved by trimming the reads.
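As an illustration of what "removing low-quality ends" means, here is a minimal sketch of 3'-end trimming. The slides do not say which trimming algorithm RNA-Rocket uses, so this simple rule is an assumption: drop trailing bases below a threshold. The threshold of 20, the Phred+33 encoding, and the function name are likewise assumptions.

```python
from typing import Tuple

# Assumption: qualities use the Phred+33 encoding.
PHRED_OFFSET = 33


def trim_low_quality_end(seq: str, qual: str, min_q: int = 20) -> Tuple[str, str]:
    """Remove trailing bases whose Phred score is below min_q (illustrative rule)."""
    end = len(qual)
    # Walk back from the 3' end until a base meets the threshold.
    while end > 0 and ord(qual[end - 1]) - PHRED_OFFSET < min_q:
        end -= 1
    return seq[:end], qual[:end]


# 'I' encodes Q40; '#' and '!' encode Q2 and Q0, so the last two bases are dropped.
trimmed = trim_low_quality_end("ACGTAC", "IIII#!")
print(trimmed)  # ('ACGT', 'IIII')
```

Because only the tail is removed, the mean quality at the remaining positions near the end of the read goes up, which is consistent with the improvement seen in the FastQC report.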

From the SAMStat report we see that the number of unaligned reads only shows a slight improvement with trimming. Modern alignment software is often able to account for the base call quality in determining alignments. Also of note is that the ‘Mean Base Quality’ profile is not substantially different for MAPQ >=30 and MAPQ < 3.
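The MAPQ categories mentioned above can be tallied directly from SAM records, where the FLAG is the second column and MAPQ the fifth. This is a rough sketch, not SAMStat's code; the middle bin and the function name are illustrative choices, and the demo alignments are made up.

```python
from collections import Counter
from typing import Iterable


def mapq_summary(sam_lines: Iterable[str]) -> Counter:
    """Count reads per mapping-quality bin from SAM-formatted lines."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2: bitwise FLAG
        mapq = int(fields[4])  # column 5: mapping quality
        if flag & 0x4:  # 0x4 marks an unmapped read
            counts["unaligned"] += 1
        elif mapq >= 30:
            counts["MAPQ >= 30"] += 1
        elif mapq < 3:
            counts["MAPQ < 3"] += 1
        else:
            counts["3 <= MAPQ < 30"] += 1  # illustrative middle bin
    return counts


# Made-up records covering each bin.
demo_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr\t200\t1\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t0\tchr\t300\t15\t4M\t*\t0\t0\tACGT\tIIII",
]
summary = mapq_summary(demo_sam)
```

Comparing such a summary for the original and the trimmed reads shows whether trimming changed the number of unaligned reads.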


Starting Analysis

Test datasets for starting an alignment and transcript assembly job have been provided at Shared Data → Published Projects → RNASeq_Run_Demo.

- To begin, import this history into your own workspace using the ‘Import history’ functionality demonstrated previously.

- After the Project is imported, it should appear in your ‘Project View’.


- Proceed to the ‘Launch Pad’ page and click the ‘Align Reads & Assemble Transcripts’ link.

- On the next page, choose the type of analysis (we are analyzing a paired-end prokaryotic sample).

- Next, select the target project from the drop-down menu. You should have a project called ‘imported: RNASeq_Run_Demo’. Once you select the correct project, you should see the two FASTQ files listed. Next, click ‘Continue’.


The following page allows you to configure the parameters for the various tools that will run as part of the analysis you have selected. Here we describe the bare minimum for running a job; take more care when customizing the analysis for your own data.

First, populate the Upstream and Downstream Read Files with READ1_SHORT.fastq and READ2_SHORT.fastq, respectively.

Select the reference organism ‘Salmonella enterica subsp. Typhimurium 14028S’ from the dropdown. It may take a moment for the dropdown to appear once clicked due to the number of organisms.
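For paired-end data, the upstream and downstream read files are expected to list the same reads in the same order. The sketch below shows one way to check this for your own data before submitting a job. It is not part of RNA-Rocket, and it assumes strict four-line FASTQ records and optional `/1` and `/2` mate suffixes; the function names and demo records are hypothetical.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Optional


def read_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield the read name of each four-line FASTQ record, without '@' or mate suffix."""
    for i, line in enumerate(lines):
        if i % 4 == 0 and line.strip():
            name = line.strip()[1:].split()[0]
            if name.endswith(("/1", "/2")):
                name = name[:-2]  # drop the mate suffix so mates compare equal
            yield name


def first_pair_mismatch(upstream: Iterable[str], downstream: Iterable[str]) -> Optional[int]:
    """Return the index of the first record whose names differ, or None if all match."""
    pairs = zip_longest(read_names(upstream), read_names(downstream))
    for idx, (a, b) in enumerate(pairs):
        if a != b:  # also triggers when one file has fewer records
            return idx
    return None


# Made-up records standing in for the upstream and downstream files.
up = ["@a/1", "ACGT", "+", "IIII", "@b/1", "ACGT", "+", "IIII"]
down_ok = ["@a/2", "TTTT", "+", "IIII", "@b/2", "GGGG", "+", "IIII"]
down_short = ["@a/2", "TTTT", "+", "IIII"]

ok = first_pair_mismatch(up, down_ok)
bad = first_pair_mismatch(up, down_short)
```

With real data, the two arguments could be open file handles for the upstream and downstream FASTQ files.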


Select ‘Run Workflow’ at the bottom of the page

If the workflow is successfully queued you should see the following


Next, go to the ‘Project View’ page to see the status of your jobs. In the rightmost panel, grey jobs are pending, yellow jobs are running, and green jobs are complete.


Further Analysis

Test datasets have been provided for testing the RNA-Seq visualization capabilities at PATRIC. Navigate to Shared Data → Published Projects → RNASeq_Analysis_Demo.

Each of the files displayed has a visualization component on the PATRIC site. To view it, first click the dataset title to expand the dataset section, then click the ‘display at PATRIC’ link.

Displaying BAM at PATRIC

Read Quality View

Expression View


Displaying BigWig at PATRIC

Displaying GFF at PATRIC


Displaying GeneList file at PATRIC