Posted on 01-Aug-2020


How to do differential expression analysis from FASTQ format data on HPC

Written by: yiran.zhang

Pipeline

• ShortRead: Quality assessment, filtering and trimming

• FastQC: Quality control

• Btrim: Filtering and trimming

• HISAT2: Align the reads to the reference genome

• HTSeq (htseq-count): Count the reads mapped to each gene

• DESeq: Differential gene expression analysis based on the negative binomial distribution

ShortRead

• Introduction: The ShortRead package provides functionality for working with FASTQ files from high-throughput sequence analysis.

• Environment required: R

• Functions: quality assessment; filtering and trimming


• Write your own function to make sure that no read contains 'N' (an ambiguous base call).
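The slides do this step in R with ShortRead, and the author's function is not shown. The following is an illustration only: a minimal Python sketch of the same idea for paired-end files. It assumes plain 4-line FASTQ records and that each read1/read2 file pair lists mates in the same order. A pair is dropped if either mate contains 'N', which keeps the two output files in sync. The function names are hypothetical.

```python
import io
from itertools import islice
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]  # (header, sequence, "+" line, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    while True:
        lines = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not lines:
            return
        if len(lines) < 4 or not lines[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield lines[0], lines[1], lines[2], lines[3]


def drop_pairs_with_n(r1_in: TextIO, r2_in: TextIO,
                      r1_out: TextIO, r2_out: TextIO) -> Tuple[int, int]:
    """Copy only the pairs in which neither mate contains 'N'.

    Returns (kept, dropped). zip() stops at the shorter file, so read1/read2
    files of different lengths should be checked separately.
    """
    kept = dropped = 0
    for rec1, rec2 in zip(read_fastq(r1_in), read_fastq(r2_in)):
        if "N" in rec1[1].upper() or "N" in rec2[1].upper():
            dropped += 1
            continue
        r1_out.write("\n".join(rec1) + "\n")
        r2_out.write("\n".join(rec2) + "\n")
        kept += 1
    return kept, dropped


if __name__ == "__main__":
    r1 = io.StringIO("@p1/1\nACGT\n+\nIIII\n@p2/1\nACNT\n+\nIIII\n")
    r2 = io.StringIO("@p1/2\nTTGC\n+\nIIII\n@p2/2\nTTGC\n+\nIIII\n")
    out1, out2 = io.StringIO(), io.StringIO()
    print(drop_pairs_with_n(r1, r2, out1, out2))  # (1, 1)
```

On real data the four handles would come from open(), for example on gu2_read1.fq and gu2_read2.fq. The in-memory example only shows that the second pair is dropped because one mate contains an N.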

> qaSummary[["baseCalls"]]
                    A        C        G        T    N
gu2_read1.fq 21685857 28412307 28130219 21767568 4049
gu2_read2.fq 21729895 28722063 27816884 21730591  567
gu3_read1.fq 21723444 28346939 28174527 21751407 3683
gu3_read2.fq 21734048 28697947 27824570 21742736  699
ye1_read1.fq 21675483 28443112 28095517 21781660 4228
ye1_read2.fq 21702486 28762839 27785294 21748745  636
ye3_read1.fq 21795076 28360347 27968237 21872354 3986
ye3_read2.fq 21807964 28695030 27618484 21877864  658
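The baseCalls table can be checked with a few lines of arithmetic:

- Every row sums to exactly 100,000,000 bases.
- In each sample, the read-1 file has roughly 5 to 7 times as many N calls as the read-2 file (for example 4,049 vs. 567 for gu2).

Here is a small Python sketch that recomputes these figures from the counts shown. The dictionary and function names are hypothetical; only the numbers come from the table.

```python
from typing import Dict, Tuple

# (A, C, G, T, N) counts copied from the qaSummary[["baseCalls"]] table above.
BASE_CALLS: Dict[str, Tuple[int, int, int, int, int]] = {
    "gu2_read1.fq": (21685857, 28412307, 28130219, 21767568, 4049),
    "gu2_read2.fq": (21729895, 28722063, 27816884, 21730591, 567),
    "gu3_read1.fq": (21723444, 28346939, 28174527, 21751407, 3683),
    "gu3_read2.fq": (21734048, 28697947, 27824570, 21742736, 699),
    "ye1_read1.fq": (21675483, 28443112, 28095517, 21781660, 4228),
    "ye1_read2.fq": (21702486, 28762839, 27785294, 21748745, 636),
    "ye3_read1.fq": (21795076, 28360347, 27968237, 21872354, 3986),
    "ye3_read2.fq": (21807964, 28695030, 27618484, 21877864, 658),
}


def summarize(counts: Tuple[int, int, int, int, int]) -> Dict[str, float]:
    """Total bases, N calls per million bases, and GC percentage for one file."""
    a, c, g, t, n = counts
    total = a + c + g + t + n
    return {
        "total": total,
        "n_per_million": 1e6 * n / total,
        "gc_percent": 100 * (c + g) / total,
    }


if __name__ == "__main__":
    for name, counts in BASE_CALLS.items():
        s = summarize(counts)
        print(f"{name}\t{s['total']}\t{s['n_per_million']:.2f}\t{s['gc_percent']:.2f}")
```

N calls are rare in every file (about 40 per million bases at most). That is why dropping N-containing reads removes little data.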

FastQC: Quality control

• Introduction: FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high-throughput sequencing pipelines.
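FastQC's per-base quality plot summarises the quality scores observed at each read position. The following sketch shows the core arithmetic behind such a summary; it is not FastQC's implementation. It assumes Phred+33 encoding (Sanger / Illumina 1.8+) and computes only the mean score per position, whereas FastQC also reports quartiles. Function names are hypothetical.

```python
from typing import Iterable, List


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string (Sanger / Illumina 1.8+) into scores."""
    return [ord(ch) - 33 for ch in qual]


def mean_quality_per_position(quals: Iterable[str]) -> List[float]:
    """Mean Phred score at each read position; reads may differ in length."""
    totals: List[int] = []
    counts: List[int] = []
    for qual in quals:
        for i, score in enumerate(phred33(qual)):
            if i == len(totals):  # first read long enough to reach position i
                totals.append(0)
                counts.append(0)
            totals[i] += score
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


if __name__ == "__main__":
    print(mean_quality_per_position(["II5", "I+"]))  # [40.0, 25.0, 20.0]
```

Variable read lengths are allowed, so the same function also works on reads after trimming.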


Btrim

• ShortRead drops the reads containing 'N', but low-quality bases still appear to remain, so we filter and trim the ShortRead output further with Btrim.

• Introduction: Btrim is a fast and lightweight tool to trim adapters and low-quality regions in reads from ultra high-throughput next-generation sequencing machines.

• Note: Run FastQC again on the filtered and trimmed reads to check whether they are reasonable. Just edit the FastQC command in your previous fastqc.pbs script and resubmit it.
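The slides do not describe Btrim's own trimming algorithm or options, so the following is not Btrim's method. It is a generic sliding-window quality trimmer in Python, included only to illustrate what "trimming low-quality regions" means. The function name and the defaults (window 5, mean Phred threshold 20, minimum length 25) are hypothetical choices.

```python
from typing import Optional, Tuple


def trim_low_quality(seq: str, qual: str, window: int = 5, threshold: float = 20.0,
                     min_length: int = 25, offset: int = 33) -> Optional[Tuple[str, str]]:
    """Cut a read at the start of the first window whose mean quality is below threshold.

    Returns the trimmed (sequence, quality) pair, or None if fewer than
    min_length bases survive (the read would then be discarded).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = [ord(ch) - offset for ch in qual]  # Phred+33 by default
    cut = len(scores)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < threshold:
            cut = start
            break
    if cut < min_length:
        return None
    return seq[:cut], qual[:cut]


if __name__ == "__main__":
    # 'I' = Q40 and '#' = Q2 in Phred+33
    print(trim_low_quality("ACGTACGTAC", "IIIII#####", window=3, min_length=3))
    # ('ACGT', 'IIII')
```

Cutting at the start of the failing window is deliberately conservative. As the example shows, it can remove a good base or two just before a low-quality stretch.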

HISAT2

• Introduction: HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes (as well as against a single reference genome).

• Advantage: Highly efficient

• Note: HISAT2 writes its alignments to a SAM file, which htseq-count can use directly in the next step.
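Before counting, it can help to sanity-check the SAM file. The following Python sketch counts mapped and unmapped primary records using the FLAG bits from the SAM specification: 0x4 (unmapped), 0x100 (secondary) and 0x800 (supplementary). Secondary and supplementary records are skipped so that each read is counted once; for paired-end data, each mate is its own record. The function name is hypothetical. HISAT2 also prints its own alignment summary, and this sketch is only an independent check.

```python
from typing import Dict, Iterable

# FLAG bits from the SAM specification
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_primary_records(sam_lines: Iterable[str]) -> Dict[str, int]:
    """Count mapped and unmapped primary records in SAM text."""
    counts = {"mapped": 0, "unmapped": 0}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or header line (@HD, @SQ, @PG, ...)
        flag = int(line.split("\t", 2)[1])  # FLAG is the 2nd mandatory field
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        counts["unmapped" if flag & UNMAPPED else "mapped"] += 1
    return counts


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.0\tSO:unsorted",
        "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
        "r1\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tACGT\tIIII",
        "r2\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    print(count_primary_records(example))  # {'mapped': 2, 'unmapped': 1}
```

For a real run, pass an open file handle (for example `open("gu2.sam")`, a hypothetical file name) instead of the list.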


HTSeq

• Introduction: HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays.

• Usage: htseq-count [options] <alignment_file> <gff_file>
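Each htseq-count run writes a two-column table (gene ID, count) to standard output. The table ends with summary counters whose names start with "__", such as __no_feature and __ambiguous. For DESeq, the per-sample tables must be merged into one gene-by-sample matrix. The following Python sketch does that, assuming one alignment file per htseq-count run. Function names and sample labels are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def parse_htseq_counts(lines: Iterable[str]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split one htseq-count output into per-gene counts and '__' summary counters."""
    genes: Dict[str, int] = {}
    special: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, value = line.split("\t")
        (special if name.startswith("__") else genes)[name] = int(value)
    return genes, special


def build_count_matrix(samples: Dict[str, Iterable[str]]
                       ) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (gene_ids, sample_names, rows): one row per gene, one column per sample."""
    per_sample = {name: parse_htseq_counts(lines)[0] for name, lines in samples.items()}
    sample_names = list(per_sample)
    gene_ids = sorted(set().union(*(g.keys() for g in per_sample.values())))
    # a gene missing from one table is filled with 0 (defensive; runs against
    # the same annotation should list the same genes)
    rows = [[per_sample[s].get(g, 0) for s in sample_names] for g in gene_ids]
    return gene_ids, sample_names, rows


if __name__ == "__main__":
    gu2 = ["geneA\t10", "geneB\t0", "__no_feature\t5", "__ambiguous\t1"]
    ye1 = ["geneA\t7", "geneB\t3", "__no_feature\t2", "__ambiguous\t0"]
    print(build_count_matrix({"gu2": gu2, "ye1": ye1}))
```

The "__" counters are kept apart rather than discarded. A large __no_feature count is a useful warning that the annotation or strandedness setting may not match the data.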

DESeq

• Introduction: Differential gene expression analysis based on the negative binomial distribution.
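DESeq models the count K_ij of gene i in sample j as negative binomial with mean mu_ij = s_j * q_i,rho(j). Here s_j is a per-sample size factor and q_i,rho(j) is the expression level of gene i in the condition rho(j) of sample j. DESeq estimates each size factor with the median-of-ratios method:

s_j = median over genes i of K_ij / (K_i1 * ... * K_im)^(1/m), for m samples.

The following Python sketch implements that one step for illustration; it is not a replacement for the R package. Genes with a zero count in any sample are skipped because their geometric mean is 0. The median is taken on the log scale and then back-transformed. Function names are hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence


def size_factors(counts: Sequence[Sequence[int]]) -> List[float]:
    """Median-of-ratios size factors; counts[i][j] is the count of gene i in sample j."""
    if not counts:
        raise ValueError("empty count matrix")
    m = len(counts[0])
    log_ratios: List[List[float]] = [[] for _ in range(m)]
    for row in counts:
        if any(k == 0 for k in row):
            continue  # geometric mean would be 0, so the gene is skipped
        log_geo_mean = sum(math.log(k) for k in row) / m
        for j, k in enumerate(row):
            log_ratios[j].append(math.log(k) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("every gene has a zero count in at least one sample")
    # median on the log scale, then back-transform
    return [math.exp(median(r)) for r in log_ratios]


def normalized_counts(counts: Sequence[Sequence[int]]) -> List[List[float]]:
    """Divide each column by its size factor so samples are comparable."""
    sf = size_factors(counts)
    return [[k / s for k, s in zip(row, sf)] for row in counts]


if __name__ == "__main__":
    toy = [[10, 20], [20, 40], [30, 60], [0, 5]]  # sample 2 sequenced ~2x deeper
    print(size_factors(toy))  # approximately [0.7071, 1.4142]
```

In the toy matrix, the second sample has exactly twice the counts of the first for every usable gene. The size factors therefore differ by a factor of 2, and the normalized counts become equal.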

Further work

• Connect the whole pipeline so that it runs with fewer commands and steps.

• Adapt the programs to our system to handle ambiguous read mapping.
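As a starting point for connecting the steps, the following Python driver builds and optionally runs the per-sample commands for FastQC, HISAT2 and htseq-count. It uses only options these tools document: `fastqc -o`, `hisat2 -p -x -1 -2 -S`, and `htseq-count -f -s -i`. The choices `-s no` (unstranded library) and `-i gene_id` (GTF annotation) are assumptions about this data set. The index and annotation names are hypothetical placeholders. The ShortRead and Btrim steps are left out because their exact commands are not shown in the slides.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def pipeline_commands(sample: str, r1: str, r2: str, index: str, gtf: str,
                      outdir: str = "results", threads: int = 4) -> List[List[str]]:
    """Per-sample command lines for FastQC -> HISAT2 -> htseq-count."""
    out = Path(outdir)
    sam = str(out / f"{sample}.sam")
    return [
        ["fastqc", "-o", str(out / "fastqc"), r1, r2],
        ["hisat2", "-p", str(threads), "-x", index, "-1", r1, "-2", r2, "-S", sam],
        # -s no assumes an unstranded library; -i gene_id assumes a GTF annotation
        ["htseq-count", "-f", "sam", "-s", "no", "-i", "gene_id", sam, gtf],
    ]


def run_sample(sample: str, r1: str, r2: str, index: str, gtf: str,
               outdir: str = "results", threads: int = 4, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them one after another."""
    cmds = pipeline_commands(sample, r1, r2, index, gtf, outdir, threads)
    if dry_run:
        for cmd in cmds:
            print(shlex.join(cmd))
        return
    # FastQC expects its -o directory to exist already
    (Path(outdir) / "fastqc").mkdir(parents=True, exist_ok=True)
    for cmd in cmds[:-1]:
        subprocess.run(cmd, check=True)  # stop the pipeline if a step fails
    # htseq-count prints its count table on stdout, so send it to a file
    with open(Path(outdir) / f"{sample}.counts", "w") as fh:
        subprocess.run(cmds[-1], check=True, stdout=fh)


if __name__ == "__main__":
    run_sample("gu2", "gu2_read1.fq", "gu2_read2.fq", "genome_index", "genes.gtf")
```

The dry run only prints the commands, so they can be reviewed or pasted into a PBS job script before anything is submitted.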

Thank you
