Biases in RNA-Seq data

Upload: minna

Post on 24-Feb-2016

Page 1: Biases in RNA-Seq data

Biases in RNA-Seq data

Page 2: Biases in RNA-Seq data

Transcript length bias

Two transcripts of length 50 and 100 have the same abundance in a control sample.

The expression of both transcripts is doubled in a treatment sample.

The biological variance is the same for both transcripts.

They have the same level of differential expression.

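$$
t = \frac{\mathrm{mean}_{\mathrm{treatment}} - \mathrm{mean}_{\mathrm{control}}}{\sqrt{\dfrac{\mathrm{var}_{\mathrm{treatment}}}{n_{\mathrm{treatment}}} + \dfrac{\mathrm{var}_{\mathrm{control}}}{n_{\mathrm{control}}}}}
$$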

The transcripts are fragmented into short reads of 10 bases, which the RNA-Seq experiment reports.

More reads will hit the 100-base transcript, so its n will be larger and it will be reported as more significantly changed.
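The effect can be checked with a small worked example. The sketch below uses a two-sample t-style statistic (difference of means divided by the square root of the summed var/n terms) and assumes, as the argument on this slide does, that n is the number of 10-base reads a transcript yields. The means (1.0 and 2.0), the shared variance (1.0), and the function names are hypothetical values chosen for illustration; they are not taken from the slides.

```python
import math

READ_LENGTH = 10  # read length used in the slide's example


def t_statistic(mean_control, mean_treatment, var_control, var_treatment,
                n_control, n_treatment):
    """Difference of means divided by its standard error."""
    standard_error = math.sqrt(var_control / n_control
                               + var_treatment / n_treatment)
    return (mean_treatment - mean_control) / standard_error


def reads_per_transcript(transcript_length, read_length=READ_LENGTH):
    """Assumption: n scales with the number of reads a transcript yields."""
    return transcript_length // read_length


def length_bias_demo(lengths=(50, 100)):
    """Same means and variances for every transcript; only n differs."""
    results = {}
    for length in lengths:
        n = reads_per_transcript(length)
        # Hypothetical values: expression doubles (1.0 -> 2.0), variance 1.0.
        results[length] = t_statistic(1.0, 2.0, 1.0, 1.0, n, n)
    return results


if __name__ == "__main__":
    for length, t in length_bias_demo().items():
        print(f"length {length}: t = {t:.3f}")
```

Under these assumptions the 100-base transcript gets a statistic larger by a factor of sqrt(2) than the 50-base transcript, although both have the same fold change and the same variance.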

Page 3: Biases in RNA-Seq data

Oshlack and Wakefield 2009, Biology Direct, 4, 14

Page 4: Biases in RNA-Seq data

Random priming aims to sample transcripts uniformly, rather than from just one end (such as with the oligo dT primer ……)

Page 5: Biases in RNA-Seq data

Counts of reads along gene Apoe in different tissues of the Wold data. (a) brain, (b) liver, (c) skeletal muscle. Each vertical line stands for the count of reads starting at that position. The grey lines are counts in the UTR regions and a further 100 bp. Here introns are deleted and exons are connected into a single piece.

Li et al. 2010, Genome Biology, 11, R50
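The per-position counts described in the caption can be computed along the following lines. This is a minimal sketch, assuming read start positions are already expressed as 0-based offsets on the spliced gene (introns removed, exons joined); the function name and the input layout are hypothetical.

```python
from collections import Counter


def read_start_counts(start_positions, gene_length):
    """Count how many reads start at each position of a spliced gene.

    start_positions: iterable of 0-based read start offsets (assumed input).
    Returns a list of length gene_length; entry i is the number of reads
    starting at position i. Starts outside the gene are ignored.
    """
    counts = Counter(p for p in start_positions if 0 <= p < gene_length)
    return [counts[i] for i in range(gene_length)]


if __name__ == "__main__":
    # Hypothetical start positions on a 10-base gene.
    print(read_start_counts([0, 0, 3, 7, 3, 3, 12], 10))
```

Under uniform sampling the counts would be roughly flat along the gene; the figure shows that they are not.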

Page 6: Biases in RNA-Seq data

Nucleotide frequencies versus position for stringently mapped reads. For each experiment, mapped reads were extended upstream of the 5′-start position, such that the first position of the actual read is 1 and positions 0 to −20 are obtained from the genome. The first hexamer of the read is shaded. Brief experimental protocols are indicated in the key.

Hansen et al. 2010, Nucleic Acids Research, 38, e131

Biases are caused by hexamer priming that is not random
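A simple way to look for this signature in a set of reads is to tabulate nucleotide frequencies at each of the first few read positions. The sketch below is an illustration only: it covers positions within the read (it does not extend upstream into the genome as the figure does), and the function name and default of six positions are assumptions.

```python
from collections import Counter

BASES = "ACGT"


def nucleotide_frequencies_by_position(reads, positions=6):
    """Per-position base frequencies over the first `positions` bases.

    reads: iterable of read sequences (strings).
    Returns a list of dicts, one per position, mapping A/C/G/T to the
    fraction of reads carrying that base there. Non-ACGT calls (e.g. N)
    and reads shorter than the position are skipped.
    """
    counts = [Counter() for _ in range(positions)]
    for read in reads:
        for i, base in enumerate(read[:positions].upper()):
            if base in BASES:
                counts[i][base] += 1
    frequencies = []
    for counter in counts:
        total = sum(counter.values())
        frequencies.append(
            {b: (counter[b] / total if total else 0.0) for b in BASES}
        )
    return frequencies


if __name__ == "__main__":
    # Hypothetical reads for illustration.
    example = ["ACGTAC", "ACGTTT", "TCGAAC", "ACG"]
    for pos, freq in enumerate(nucleotide_frequencies_by_position(example), 1):
        print(pos, freq)
```

With truly random priming, the frequencies in the first hexamer would match those further along the read; a position-specific pattern in the first six bases points to priming bias.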

Page 7: Biases in RNA-Seq data

Roberts et al. 2011, Genome Biology, 12, R22

Page 8: Biases in RNA-Seq data

Panels: Human experiment (SRA012427); Yeast experiment (SRA020818_RH)

GC content biases some RNA-Seq experiments, but not at the same level in all experiments.

Roberts et al. 2011, Genome Biology, 12, R22
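One rough check for GC bias in a given experiment is to group transcripts by GC content and compare mean read counts across the groups. The sketch below assumes the input is a list of (sequence, read count) pairs; the function names and the choice of four bins are hypothetical, and this is not the correction method of Roberts et al.

```python
def gc_fraction(sequence):
    """Fraction of called bases (A, C, G, T) that are G or C."""
    called = [b for b in sequence.upper() if b in "ACGT"]
    if not called:
        return 0.0
    return sum(b in "GC" for b in called) / len(called)


def mean_count_by_gc_bin(transcripts, n_bins=4):
    """Mean read count per GC-content bin.

    transcripts: iterable of (sequence, read_count) pairs (assumed layout).
    Bins split the range 0..1 evenly; a GC fraction of exactly 1.0 falls
    in the last bin. Empty bins are reported as None.
    """
    sums = [0.0] * n_bins
    sizes = [0] * n_bins
    for sequence, count in transcripts:
        index = min(int(gc_fraction(sequence) * n_bins), n_bins - 1)
        sums[index] += count
        sizes[index] += 1
    return [s / k if k else None for s, k in zip(sums, sizes)]


if __name__ == "__main__":
    # Hypothetical sequences and counts for illustration.
    data = [("AAAA", 10), ("AATT", 20), ("GGCC", 100), ("ACGT", 40)]
    print(mean_count_by_gc_bin(data))
```

A trend in mean count across bins would suggest GC bias in that experiment; because the bias differs between experiments, the check has to be run per experiment.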

Page 9: Biases in RNA-Seq data

Affymetrix GeneChips are the dominant platform for microarray observations, and have been so for almost a decade. More than one hundred thousand hybridizations are in the public domain, and only a handful of standardised protocols have been used. This huge dataset allows sensitive meta-analysis.

Next-generation sequencing is rapidly evolving. There is no market leader, and only a relatively small number of RNA-Seq studies have been published, even for the most popular NGS platforms. There are clearly biases in the data, and the protocols and chemistry used to generate the data leave signatures. It is hard to perform meta-analysis.

Page 10: Biases in RNA-Seq data

Affymetrix

Page 11: Biases in RNA-Seq data

Applied Biosystems

Page 12: Biases in RNA-Seq data

Illumina

Page 13: Biases in RNA-Seq data

Life Technologies

Page 14: Biases in RNA-Seq data

Pacific Biosciences

Page 15: Biases in RNA-Seq data

Helicos 1 year

Helicos since 2007