Bayesian Modelling of Differential Gene Expression
Lewin A (1), Richardson S (1), Marshall C (1), Glazier A (2) and Aitman T (2) (2006), Biometrics 62, 1-9.
(1) Imperial College Dept. Epidemiology
(2) Imperial College Microarray Centre
Outline
Introduction to microarrays and differential expression
Bayesian hierarchical model for differential expression
Decision rules
Predictive model checks
Gene Ontology analysis for differentially expressed genes
Further work
Microarrays measure gene expression (mRNA)
(1) Array contains thousands of spots; millions of strands of DNA of known sequence are fixed to each spot
(2) Sample (unknown sequences of cDNA) is labelled with fluorescent dye
(3) Matching sequences of DNA and cDNA hybridize together (example: DNA TGCT pairs with cDNA ACGA)
(4) Array is washed so that only matching samples are left (the fluorescent spots show which)
Pictures courtesy of Affymetrix
Microarray experiment to find genes associated with Cd36
Cd36: gene known to be important in insulin resistance (Aitman et al 1999, Nature Genet 21:76-83)
Microarray Data
3 SHR compared with 3 transgenic rats (with Cd36)
3 wildtype (normal) mice compared with 3 mice with Cd36 knocked out
12000 genes on each array
Biological Question
Find genes which are expressed differently between animals with and without Cd36.
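Bayesian hierarchical model for differential expression
1st level:
$y_{g1r} \mid \alpha_g, \delta_g, \sigma_{g1} \sim N(\alpha_g - \tfrac{1}{2}\delta_g + \beta_{rs}(\alpha_g)\big|_{s=1},\ \sigma_{g1}^2)$
$y_{g2r} \mid \alpha_g, \delta_g, \sigma_{g2} \sim N(\alpha_g + \tfrac{1}{2}\delta_g + \beta_{rs}(\alpha_g)\big|_{s=2},\ \sigma_{g2}^2)$
$y_{gsr}$ is log gene expression (gene g, condition s, replicate r)
$\alpha_g$: overall gene expression (fixed effect)
$\delta_g$: differential effect for gene g between 2 conditions (fixed effect or mixture prior)
$\beta_{rs}(\alpha_g)$: array effect or normalisation (function of $\alpha_g$)
$\sigma_{gs}^2$: variance for each gene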
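As a rough illustration of the first level, the sketch below simulates log expression values for one gene under the two conditions. It is not the authors' code: the function name `simulate_gene`, its arguments, and the default of a zero array effect are assumptions made for this example.

```python
import random

def simulate_gene(alpha, delta, sigma1, sigma2, n_rep, rng,
                  array_effect=lambda alpha, s, r: 0.0):
    """Draw y[s][r] ~ N(alpha -/+ delta/2 + array effect, sigma_s^2) for s = 1, 2.

    array_effect is a hypothetical stand-in for the normalisation term;
    it defaults to zero, i.e. arrays that need no normalisation.
    """
    y = {1: [], 2: []}
    for r in range(n_rep):
        mean1 = alpha - 0.5 * delta + array_effect(alpha, 1, r)
        mean2 = alpha + 0.5 * delta + array_effect(alpha, 2, r)
        y[1].append(rng.gauss(mean1, sigma1))
        y[2].append(rng.gauss(mean2, sigma2))
    return y

rng = random.Random(1)
# Illustrative values only: overall level 7, log fold change 1, 3 replicates.
y = simulate_gene(alpha=7.0, delta=1.0, sigma1=0.3, sigma2=0.3, n_rep=3, rng=rng)
ybar1 = sum(y[1]) / len(y[1])
ybar2 = sum(y[2]) / len(y[2])
print("difference in condition means:", ybar2 - ybar1)
```

With only 3 replicates per condition the difference in means is a noisy estimate of delta, which is the motivation for borrowing information across genes at the next level.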
Prior for gene variances
2nd level: $\sigma_{gs}^2 \mid \mu_s, \tau_s \sim \text{logNorm}(\mu_s, \tau_s)$
Hyper-parameters μs and τs can be influential, so these are estimated in the model.
3rd level: $\mu_s \sim N(c, d)$, $\tau_s \sim \text{Gamma}(e, f)$
Variances estimated using information from all measurements (~12000 x 3) rather than just 3
[Figure: 3 wildtype mice]
Prior for array effects (Normalization)
Spline curve: the array effect $\beta_{rs}(\alpha_g)$ is quadratic in the gene effect $\alpha_g$ for $a_{rs(k-1)} \le \alpha_g \le a_{rs(k)}$, with coeff $(b_{rsk}^{(1)}, b_{rsk}^{(2)})$, k = 1, … #breakpoints
Locations of break points not fixed
Must do sensitivity checks on # break points
[Figure: spline curve with break points a0, a1, a2, a3]
Array effect as function of gene effect
[Figure legend: loess; Bayesian posterior mean]
Decision Rules for Inference: Fixed Effects Model
Inference on δ
(1) $d_g = E(\delta_g \mid \text{data})$, the posterior mean
Like point estimate of log fold change.
Decision Rule: gene g is DE if $|d_g| > \delta_{cut}$ (δcut reflects biological interest)
(2) $p_g = P(|\delta_g| > \delta_{cut} \mid \text{data})$, the posterior probability (incorporates uncertainty)
Decision Rule: gene g is DE if $p_g > p_{cut}$ (δcut reflects biological interest, pcut statistical confidence)
This allows the biologist to specify what size of effect is interesting (not just statistical significance)
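The two decision rules can be sketched from posterior draws of δg for a single gene. The draws below are made up for illustration, and the function names are hypothetical; in practice the draws would come from the MCMC output of the fitted model.

```python
import math

def decision_summaries(delta_samples, delta_cut):
    """Return (d_g, p_g) for one gene from posterior draws of delta_g."""
    n = len(delta_samples)
    d_g = sum(delta_samples) / n                                # posterior mean
    p_g = sum(abs(d) > delta_cut for d in delta_samples) / n    # P(|delta_g| > delta_cut | data)
    return d_g, p_g

def is_de_rule1(d_g, delta_cut):
    """Rule (1): threshold the posterior mean."""
    return abs(d_g) > delta_cut

def is_de_rule2(p_g, p_cut):
    """Rule (2): threshold the posterior probability."""
    return p_g > p_cut

# Hypothetical posterior draws of delta_g for one gene.
samples = [0.9, 1.1, 0.5, 0.8, 1.2, 0.6, 1.0, 0.75, 0.95, 0.4]
delta_cut = math.log(2)   # a two-fold change on the natural-log scale
d_g, p_g = decision_summaries(samples, delta_cut)
print("rule 1 calls DE:", is_de_rule1(d_g, delta_cut))
print("rule 2 calls DE:", is_de_rule2(p_g, p_cut=0.8))
```

For these draws the posterior mean exceeds the cut-off while the posterior probability does not reach 0.8, so the two rules disagree: rule (2) penalises the gene for the uncertainty in δg.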
Illustration of decision rule
3 wildtype v. 3 knock-out mice
$p_g = P(|\delta_g| > \log(2) \text{ and } \alpha_g > 4 \mid \text{data})$
[Figure legend: x marks genes with pg > 0.8; Δ marks genes with t-statistic > 2.78 (95% CI)]
Predictive Model Checks
Key Points
Predict new data from the model (using the posterior distribution)
Get Bayesian p-value for each gene
Use all genes together (1000's) to assess model fit (p-value distribution close to Uniform if model is good)
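A minimal sketch of these key points follows. It assumes the checked statistic is the gene's sample variance and that posterior draws of the gene variance are available; both the toy data and the function names are hypothetical, and the "posterior draws" are simply fixed at the true variance so that the model is correct by construction.

```python
import random

def sample_variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def posterior_predictive_p_value(s2_obs, sigma2_draws, n_rep, rng):
    """Fraction of predicted sample variances that exceed the observed one."""
    exceed = 0
    for sigma2 in sigma2_draws:
        # Predict a new set of replicates for this draw of the gene variance.
        y_pred = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n_rep)]
        if sample_variance(y_pred) > s2_obs:
            exceed += 1
    return exceed / len(sigma2_draws)

def histogram(p_values, n_bins=10):
    """Counts of p-values in equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for p in p_values:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts

rng = random.Random(42)
p_values = []
for _ in range(500):                       # 500 toy genes, 3 replicates each
    y_obs = [rng.gauss(0.0, 1.0) for _ in range(3)]
    p_values.append(
        posterior_predictive_p_value(sample_variance(y_obs), [1.0] * 200, 3, rng)
    )
print(histogram(p_values))                 # roughly flat if the model fits
```

A histogram that piles up near 0 or 1 would point to genes whose variability the model does not reproduce.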
Mixed Predictive Checks
[Diagram: graphical model with nodes μ, τ, σg, σg pred, ybar g, Sg, Sg post. pred., Sg mixed pred.]
Mixed prediction is less conservative than posterior prediction
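The mixed predictive idea can be sketched as follows: instead of reusing the gene's own σg, a new σg pred is drawn from the second-level prior given (μ, τ), and the statistic Sg is predicted from that. This is an illustration under assumptions, not the authors' implementation: τ is treated as a precision on the log scale (the WinBUGS convention for `dlnorm`), and the hyper-parameter draws are made up.

```python
import math
import random

def sample_variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def mixed_predictive_p_value(s2_obs, hyper_draws, n_rep, rng):
    """Fraction of mixed-predictive sample variances exceeding the observed one.

    hyper_draws: list of (mu, tau) posterior draws of the hyper-parameters.
    """
    exceed = 0
    for mu, tau in hyper_draws:
        # Assumption: tau is a precision, so the log-scale s.d. is 1/sqrt(tau).
        sigma2_pred = math.exp(rng.gauss(mu, 1.0 / math.sqrt(tau)))
        y_pred = [rng.gauss(0.0, math.sqrt(sigma2_pred)) for _ in range(n_rep)]
        if sample_variance(y_pred) > s2_obs:
            exceed += 1
    return exceed / len(hyper_draws)

rng = random.Random(7)
hyper_draws = [(-1.0, 2.0)] * 1000         # hypothetical, held fixed for simplicity
p = mixed_predictive_p_value(0.4, hyper_draws, n_rep=3, rng=rng)
print("mixed predictive p-value:", p)
```

Because the predicted variance does not depend on the gene's own data, the gene's observed statistic has less influence on its own reference distribution than in the posterior predictive version.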
Bayesian predictive p-values
Gene Ontology: network of terms
Directed Acyclic Graph
~16,000 terms
Links connect more general to more specific terms
Picture from Gene Ontology website
Annotations of genes to a node
Each term may have 1000s of genes annotated (or none)
A gene may be annotated to several GO terms
A gene annotated to term A is also annotated to all ancestors of A
Picture from Gene Ontology website
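The ancestor rule can be sketched on a toy directed acyclic graph. The term and gene names below are placeholders, not real GO identifiers, and the function names are hypothetical.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def propagate(direct, parents):
    """Extend each gene's direct annotations to all ancestor terms."""
    full = {}
    for gene, terms in direct.items():
        annotated = set(terms)
        for term in terms:
            annotated |= ancestors(term, parents)
        full[gene] = annotated
    return full

# Toy DAG: term C has two parents (A and B), which share the parent "root".
parents = {"C": ["A", "B"], "A": ["root"], "B": ["root"]}
direct = {"gene1": {"C"}, "gene2": {"A"}}
full = propagate(direct, parents)
print(sorted(full["gene1"]))
print(sorted(full["gene2"]))
```

Because a term can have several parents, one direct annotation can add a gene to many nodes, which is one source of the dependencies between tests on different GO terms.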
GO annotations of genes associated with the insulin-resistance gene Cd36
Compare GO annotations of genes most and least differentially expressed
Most differentially expressed ↔ pg > 0.5 (280 genes)
Least differentially expressed ↔ pg < 0.2 (11171 genes)
GO annotations of genes associated with the insulin-resistance gene Cd36
Use Fisher's test to compare GO annotations of genes most and least differentially expressed (one test for each GO term)
None significant with simple multiple testing adjustment, but there are many dependencies
Inflammatory response recently found to be important in insulin resistance
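For a single GO term, the test can be sketched as a one-sided Fisher (hypergeometric) calculation. The group sizes 280 and 11171 are the ones quoted for the most and least differentially expressed genes; the per-term counts (150 annotated genes, 12 of them in the most-DE group) are invented for illustration, and the function name is hypothetical.

```python
from math import comb

def fisher_one_sided(k, n_de, n_annotated, n_total):
    """P(X >= k) when n_de genes are drawn from n_total, n_annotated of which carry the term."""
    denom = comb(n_total, n_de)
    upper = min(n_annotated, n_de)
    numer = sum(
        comb(n_annotated, i) * comb(n_total - n_annotated, n_de - i)
        for i in range(k, upper + 1)
    )
    return numer / denom

n_most_de = 280
n_least_de = 11171
n_total = n_most_de + n_least_de

# Hypothetical term: annotated to 150 of the genes, 12 of them in the most-DE group.
p_term = fisher_one_sided(k=12, n_de=n_most_de, n_annotated=150, n_total=n_total)
print("one-sided p-value for over-representation:", p_term)
```

Repeating this for every GO term gives one p-value per term, which is where the multiple testing adjustment comes in.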
Summary of work in Biometrics paper
Bayesian hierarchical model flexible, estimates variances robustly
Predictive model checks show exchangeable prior good for gene variances
Useful to find GO terms over-represented in the most differentially-expressed genes
BGmix: mixture model for differential expression
Group genes into 3 classes: non-DE, over-expressed, under-expressed
Estimation and classification is simultaneous
Change the prior on the differential expression parameters δg
BGmix: mixture model for differential expression
Choice of Null Distribution:
True log fold changes = 0
'Nugget' null: true log fold changes = small but not necessarily zero
Choice of DE genes distributions:
Gammas
Uniforms
Normal
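One way to write the three-class prior on δg is given below, as a sketch only. The weights $w_0, w_{+}, w_{-}$, the allocation variable $z_g$, and the component densities $f$ are notation assumed here for illustration, not the exact specification used in BGmix.

```latex
\delta_g \mid z_g \sim
\begin{cases}
f_0(\delta_g)    & \text{if } z_g = 0 \quad \text{(non-DE: point mass at 0, or a narrow `nugget')} \\
f_{+}(\delta_g)  & \text{if } z_g = + \quad \text{(over-expressed)} \\
f_{-}(\delta_g)  & \text{if } z_g = - \quad \text{(under-expressed)}
\end{cases}
\qquad
P(z_g = k) = w_k, \quad w_0 + w_{+} + w_{-} = 1
```

Here $f_{+}$ and $f_{-}$ stand for whichever of the listed choices (Gamma, Uniform, Normal) is used for the differentially expressed genes.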
BGmix: mixture model for differential expression
Outputs:
Point estimates (and s.d.) of log fold changes (stabilised and smoothed)
Posterior probability for gene to be in each group
Estimate of proportion of differentially expressed genes based on grouping (parameter of model)
BGmix: mixture model for differential expression
Obtaining gene lists:
Threshold on posterior probabilities (posterior probability of classification in the null < threshold → gene is DE)
Estimate of False Discovery Rate for any gene list (estimate = average of posterior probabilities)
Very simple estimate!
Choice of decision rule:
Bayes Rule
Fix False Discovery Rate
More complex rules for mixture of 3 components
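The gene-list and False Discovery Rate calculations can be sketched as below. The posterior null probabilities are made up, and the function names are hypothetical rather than part of the BGmix package.

```python
def select_de_genes(p_null, threshold):
    """Genes whose posterior probability of being in the null is below the threshold."""
    return sorted(g for g, p in p_null.items() if p < threshold)

def estimated_fdr(p_null, genes):
    """Average posterior null probability over the genes in the list."""
    if not genes:
        return 0.0
    return sum(p_null[g] for g in genes) / len(genes)

def list_for_target_fdr(p_null, target):
    """Largest list, built in order of increasing null probability, with estimated FDR <= target."""
    ordered = sorted(p_null, key=p_null.get)
    chosen, running_sum = [], 0.0
    for gene in ordered:
        running_sum += p_null[gene]
        if running_sum / (len(chosen) + 1) > target:
            break
        chosen.append(gene)
    return chosen

# Hypothetical posterior probabilities of classification in the null component.
p_null = {"g1": 0.01, "g2": 0.05, "g3": 0.30, "g4": 0.90}

de_list = select_de_genes(p_null, threshold=0.5)
print(de_list, estimated_fdr(p_null, de_list))
print(list_for_target_fdr(p_null, target=0.15))
```

The first two functions correspond to thresholding the posterior probabilities and then estimating the FDR of the resulting list; the third is one reading of the "fix False Discovery Rate" rule.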
Predictive Checks for Mixture Model
Model checks for differential expression parameters δg
More complex for mixture model
Important point: we check each mixture component separately
[Diagram: graphical model with nodes μ, τ, η, w, zg, σg, σg pred, ybar g, Sg, and mixed predictive copies of ybar g and Sg]
Bayesian p-values for Mixture Model
[Figure panels: simulated data from incorrect model; simulated data from correct model]
Acknowledgements
Co-authors
Sylvia Richardson, Clare Marshall (IC Epidemiology)
Tim Aitman, Anne-Marie Glazier (IC Microarray Centre)
Collaborators on BGX Grant
Anne-Mette Hein, Natalia Bochkina (IC Epidemiology)
Helen Causton (IC Microarray Centre)
Peter Green (Bristol)
BBSRC Exploiting Genomics Grant
Papers and Software
Software:
WinBUGS code for model in Biometrics paper
BGmix (R package) includes mixture model
Papers:
BGmix paper, submitted
Paper on predictive checks for mixture prior, in preparation
http://www.bgx.org.uk/