Introduction to DESeq and edgeR packages
Peter A.C. ’t Hoen
Poisson distribution
• discrete probability distribution that expresses the probability
of a number of events occurring in a fixed period of time if
these events occur with a known average rate and
independently of the time since the last event
λ = expected number of occurrences; k = number of occurrences
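As a minimal sketch, the probability of observing k events under this definition is the standard Poisson mass function, P(k; λ) = λ^k e^(−λ) / k!. The function name `poisson_pmf` is illustrative only and is not part of edgeR or DESeq.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(K = k) for a Poisson distribution with expected count lam (lam > 0)."""
    if k < 0:
        return 0.0
    # Work in log space so that large k does not overflow the factorial.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

# Probabilities of seeing 0..4 events when 2 events are expected on average.
probs = [poisson_pmf(k, 2.0) for k in range(5)]
for k, prob in enumerate(probs):
    print(f"P(K={k}) = {prob:.4f}")
```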
Count process
• Poisson distribution
$Y_t \sim \mathrm{Poisson}(\lambda_t)$ with $\lambda_t = p\,n_t$
t: tag
λ: true expression
Y: observed expression
p: probability
n: total number of RNA molecules
• Truncated Poisson distribution: zero can mean not expressed or not counted
• Count variance ~ $\lambda_t$
• Murray F Freeman and John W Tukey. Ann Math Statist, 21:607-611, (1950)
Negative binomial distribution
• discrete probability distribution of the number of successes in
a sequence of Bernoulli trials before a specified (non-random)
number r of failures occurs
• also arises as a continuous mixture of Poisson distributions
where the mixing distribution of the Poisson rate is a gamma
distribution. That is, we can view the negative binomial as a
Poisson(λ) distribution, where λ is itself a random variable,
distributed according to Gamma(r, p/(1 − p)).
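The mixture view can be checked by simulation. The sketch below assumes Gamma(r, p/(1 − p)) means shape r and scale p/(1 − p), and uses hypothetical values r = 3 and p = 0.5. The helper names are illustrative. For this parameterization the theoretical mean is r·p/(1 − p) and the variance is r·p/(1 − p)², so the variance exceeds the mean.

```python
import math
import random

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def sample_gamma_poisson(r: float, p: float, rng: random.Random) -> int:
    """Draw lambda from a gamma distribution, then a count from Poisson(lambda)."""
    lam = rng.gammavariate(r, p / (1.0 - p))  # shape r, scale p / (1 - p)
    return sample_poisson(lam, rng)

rng = random.Random(1)
r, p = 3.0, 0.5  # hypothetical parameters
draws = [sample_gamma_poisson(r, p, rng) for _ in range(20000)]

mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(f"sample mean {mean:.2f}, theoretical {r * p / (1 - p):.2f}")
print(f"sample variance {var:.2f}, theoretical {r * p / (1 - p) ** 2:.2f}")
```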
edgeR (1)
• Robinson, Smyth (Biostatistics, 2008; Bioinformatics 2007)
• Package available from Bioconductor with very informative
vignette
$Y_{ij} \sim \mathrm{NB}(\mu_{ij}, \phi)$
$\mathrm{Var}(Y_{ij}) = \mu_{ij}(1 + \mu_{ij}\phi)$
• Negative binomial (gamma-Poisson) with mean μ
• φ is the overdispersion parameter (biological variation)
• φ = 0 gives the Poisson distribution
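A small sketch of the variance function Var(Y) = μ(1 + μφ) shows how the dispersion inflates the variance relative to the Poisson case. The mean of 100 and the dispersion values are hypothetical, and `nb_variance` is an illustrative name.

```python
def nb_variance(mu: float, phi: float) -> float:
    """Variance of a negative binomial count with mean mu and dispersion phi."""
    return mu * (1.0 + mu * phi)

mu = 100.0  # hypothetical mean count
for phi in (0.0, 0.01, 0.1):
    variance = nb_variance(mu, phi)
    # With phi = 0 the variance equals the mean, as for a Poisson count.
    print(f"phi={phi}: variance={variance:.1f}, variance/mean={variance / mu:.1f}")
```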
Overdispersion in our data
edgeR (2)
• Test per gene
$Y_{gij} \sim \mathrm{NB}(\mu_{gij}, \phi_g)$ where $\mu_{gij} = M_j \times p_{gi}$
$\mathrm{Var}(Y_{gij}) = \mu_{gij}(1 + \mu_{gij}\phi_g)$
$p_{gi}$ is the proportion of tags for tag g in sample i
$M_j$ is the library size for sample i and library j
$\phi_g$ is the dispersion parameter for tag g
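To illustrate the mean model, the sketch below computes the expected count as library size times tag proportion for two libraries of different depth. All numbers and names are hypothetical. The same proportion gives different expected counts, which is why library size enters the model.

```python
def expected_count(library_size: int, proportion: float) -> float:
    """Expected count of one tag in one library: library size times tag proportion."""
    return library_size * proportion

# Hypothetical values, for illustration only.
library_sizes = {"library_1": 1_000_000, "library_2": 2_500_000}  # M_j
p_gi = 2e-5  # proportion of all tags that are tag g

expected = {name: expected_count(m_j, p_gi) for name, m_j in library_sizes.items()}
for name, mu in expected.items():
    print(f"{name}: expected count {mu:.1f}")
```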
edgeR (3)
• Estimation of the common dispersion parameter by conditioning,
for each tag g, on the sum of counts and maximizing the common
likelihood
$l_C(\phi) = \sum_g l_g(\phi)$
• Common dispersion parameter OR weighted linear
combination of common and individual likelihoods
$WL(\phi_g) = l_g(\phi_g) + \alpha\, l_C(\phi_g)$, where α is the weight given to the common likelihood
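The idea of a common likelihood and a weighted likelihood can be sketched with a grid search. This is a simplification and not the edgeR implementation: it uses the plain (unconditional) NB log-likelihood with each tag's sample mean plugged in, assumes equal library sizes, and uses hypothetical toy counts and a hypothetical weight `alpha`. A tag whose counts are all zero would need special handling.

```python
import math

def nb_loglik(counts, mu, phi):
    """Log-likelihood of counts under NB with mean mu and dispersion phi (mu, phi > 0)."""
    r = 1.0 / phi  # NB "size" parameter
    total = 0.0
    for y in counts:
        total += (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                  + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return total

def tag_loglik(counts, phi):
    """Per-tag log-likelihood, with the sample mean plugged in for mu (a simplification)."""
    return nb_loglik(counts, sum(counts) / len(counts), phi)

def common_loglik(tags, phi):
    """Common log-likelihood: sum of the per-tag log-likelihoods at a shared phi."""
    return sum(tag_loglik(counts, phi) for counts in tags)

def argmax_on_grid(func, grid):
    """Return the grid value at which func is largest."""
    return max(grid, key=func)

# Hypothetical toy data: rows are tags, columns are replicate libraries.
tags = [
    [12, 30, 18, 25],
    [100, 160, 75, 130],
    [5, 0, 9, 3],
    [40, 41, 39, 44],
]
grid = [0.005 * i for i in range(1, 401)]  # candidate dispersions 0.005 .. 2.0

phi_common = argmax_on_grid(lambda phi: common_loglik(tags, phi), grid)

alpha = 0.5  # hypothetical weight on the common likelihood
phi_tagwise = [
    argmax_on_grid(lambda phi, c=counts: tag_loglik(c, phi) + alpha * common_loglik(tags, phi), grid)
    for counts in tags
]
print("common dispersion:", round(phi_common, 3))
print("tagwise dispersions:", [round(phi, 3) for phi in phi_tagwise])
```

A larger `alpha` pulls every tagwise estimate closer to the common value; `alpha = 0` leaves each tag with its own estimate.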
edgeR (4)
• Exact test replacing hypergeometric probabilities with NB-
derived probabilities (qCML) for single factor experiment
• Generalized linear models and Cox-Reid profile-adjusted
likelihood (CR) method for multifactorial experiments
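The exact test can be sketched for the simplest case. The code below assumes equal library sizes, a known dispersion φ, and two groups with n1 and n2 replicate libraries. Under the null hypothesis the group sums are negative binomial with sizes n1/φ and n2/φ, and conditioning on their total removes the unknown mean. The function names are illustrative and this is not the edgeR code.

```python
import math

def log_nb_coef(y: int, size: float) -> float:
    """log of Gamma(y + size) / (Gamma(size) * y!), the NB combinatorial term."""
    return math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)

def conditional_probs(total: int, n1: int, n2: int, phi: float):
    """P(sum in group 1 = a | overall sum = total) for a = 0..total under the null."""
    r = 1.0 / phi
    log_denom = log_nb_coef(total, (n1 + n2) * r)
    return [
        math.exp(log_nb_coef(a, n1 * r) + log_nb_coef(total - a, n2 * r) - log_denom)
        for a in range(total + 1)
    ]

def nb_exact_test(sum1: int, sum2: int, n1: int, n2: int, phi: float) -> float:
    """Two-sided p-value: total probability of outcomes no more likely than the observed one."""
    probs = conditional_probs(sum1 + sum2, n1, n2, phi)
    observed = probs[sum1]
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))

# Hypothetical group sums over 3 libraries per group.
print(nb_exact_test(15, 15, 3, 3, phi=0.2))
print(nb_exact_test(60, 5, 3, 3, phi=0.2))
```

Replacing the NB terms with binomial coefficients would give the hypergeometric probabilities of Fisher's exact test, which is the analogy the slide draws.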
edgeR: what is new?
• Exact test cannot handle confounders;
replaced by a generalized linear model with a log-likelihood
ratio test
• Abundance trending in dispersion estimates
Dispersion trend
[Plot: dispersion versus abundance]
Dispersion trending (after filtering for low abundance)
[Plot: dispersion versus abundance]
DESeq (1)
• Anders and Huber: Genome Biology (2010) 11:R106
• Roughly same principles as edgeR
• No multifactorial analysis implemented yet
DESeq (2)
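(1) $Y_{ij} \sim \mathrm{NB}(\mu_{ij}, \sigma^2_{ij})$
(2) $\mu_{ij} = s_j\, q_{i,\rho(j)}$
$s_j$: scaling factor for sample j
$q_{i,\rho(j)}$: proportional concentration of tag i in condition ρ(j)
(3) $\sigma^2_{ij} = \mu_{ij} + s_j^2\, \nu_{i,\rho(j)}$
$\mu_{ij}$: count noise; $s_j^2\, \nu_{i,\rho(j)}$: extra variance
$\nu_{i,\rho(j)}$ is a smooth function depending on $q_{i,\rho(j)}$ (concentration)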
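The variance decomposition can be sketched as follows. DESeq estimates the smooth function ν from the data; the quadratic stand-in ν(q) = φq² used here is only an assumption for illustration. It was chosen because it reproduces the constant-dispersion NB variance μ(1 + μφ). All names and numbers are hypothetical.

```python
def deseq_mean(s_j: float, q: float) -> float:
    """Expected count: scaling factor of the sample times tag concentration."""
    return s_j * q

def deseq_variance(s_j: float, q: float, nu: float) -> float:
    """Count noise (the mean) plus extra variance (squared scaling factor times nu)."""
    return deseq_mean(s_j, q) + s_j ** 2 * nu

def quadratic_nu(q: float, phi: float) -> float:
    """Hypothetical smooth function nu(q) = phi * q^2 (a stand-in, not DESeq's fit)."""
    return phi * q ** 2

s_j, q, phi = 2.0, 50.0, 0.1  # hypothetical values
mu = deseq_mean(s_j, q)
sigma2 = deseq_variance(s_j, q, quadratic_nu(q, phi))
print(f"mean={mu:.1f}, variance={sigma2:.1f}, extra variance={sigma2 - mu:.1f}")
```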
DESeq (3): variance trend with expression
Purple: Poisson
Dashed orange: edgeR (before trending)
Orange: DESeq
You can derive: squared CV is 1/μ + φ
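The derivation is short if one assumes the NB variance Var = μ(1 + μφ): CV² = Var/μ² = μ(1 + μφ)/μ² = 1/μ + φ. The sketch below checks this numerically with a hypothetical dispersion of 0.04. The 1/μ term (count noise) dominates at low expression, and the squared CV levels off at φ for high expression.

```python
def squared_cv(mu: float, phi: float) -> float:
    """Squared coefficient of variation computed from the NB variance."""
    variance = mu * (1.0 + mu * phi)  # NB variance
    return variance / mu ** 2         # CV^2 = variance / mean^2

phi = 0.04  # hypothetical dispersion
for mu in (1.0, 10.0, 100.0, 10000.0):
    # The two columns should agree: direct computation versus 1/mu + phi.
    print(f"mu={mu:>8}: CV^2={squared_cv(mu, phi):.4f}, 1/mu + phi={1.0 / mu + phi:.4f}")
```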
DESeq (3)
• Differences with edgeR:
• Complete shrinkage to trended dispersion; limited tagwise
dispersion estimates
• Different variance estimates for different sample groups allowed
• Deals better with samples with large differences in read depth?
DESeq (4): statistical testing
• In analogy to the initial edgeR implementation: exact test on the
NB probabilities in the two conditions
Conclusions
• edgeR and DESeq are comparable implementations of
statistical tests using the NB distribution
• edgeR and DESeq produce largely similar results
• Implementation of generalized linear models in edgeR allows
for testing with confounders
• Results comparable to limma for medium- to high-expressed
genes: modeling of stochastic effects is particularly important
for low-expressed genes
Comparison to limma (on sqrt scaled data)