methods for the detection of conservation of methylation ...marjoram... · methods for the...
TRANSCRIPT
Methods for the detection of conservation of methylation
(R21, year 2)
(or, how statisticians can make your life more complicated)
Let’s detect conservation!
Darryl Shibata
1. The phenotype of a cell is determined by its epigenome.
2. Develop methods to rank genes/pathways/regions according to their degree of conservation of methylation status.
3. Thus, we will identify the genes/pathways most important for growth of an individual Colorectal cancer
4. (A small step towards) Personalized medicine!
That sounds great, but…
1. What exactly do you mean by conservation? How do we measure it?
2. How does your technology work?
3. What kind of error rates do you get using your technology (Illumina Infinium MethylationEPIC BeadChip Kit)?
4. You need this by when?!?
What does conservation mean?
Scherer, et al., Nucleic Acids Research, 2020, Vol. 48, No. 8
What does conservation mean?
Scherer, et al., Nucleic Acids Research, 2020, Vol. 48, No. 8
Pooled Data
6
Samples from left and right side of tumor.
Our preliminary data are pooled, rather than cell-level. • Output = “beta values”. • After much QC, we can think of the beta value for a given
site as the “proportion of cells that are methylated at that site in that sample”.
Statistics to measure conservation
• Variance - by site or by region • Manhattan distance • Proportion of sites with extreme methylation
proportions ([0,0.2] or [0.8,1])
7
Overall strategy
• e.g., for a gene-based analysis: • Calculate the observed statistic value for
each gene • Rank genes according to those values • Pick-off the genes with highest(lowest)
value, as the most conserved.
8
• Look for genes in which methylation is statistically significantly conserved across normal tissue.
• Belief: those genes are “important”. • ‘Validate’: Look at expression of those conserved genes
in the Expression Atlas database. Is it high? • Colon • Small intestine • Endometrium
Proof of concept / QC
Variance of Statistic
• Suppose n sites in the region/gene.
• Proportion of sites with extreme methylation proportions ([0,0.2] or [0.8,1]) • Binomial distribution: variance is O(1/n)
• Variance • Variance of variance:
10
=(n� 1)2µ4
n3� (n� 1)(n� 3)µ2
2
n3= O(1/n)
<latexit sha1_base64="uEADy/LAHmOHSF2KL6d+syM/X8o=">AAACKnicbZDLTgIxFIY7XhFvqEs3jcQEFuAMkKgLEtSNOzGRSwLDpFM60NDpTNqOCZnwPG58FTcsNMStD2K5LBA8SZs//3dO2vO7IaNSmebE2Njc2t7ZTewl9w8Oj45TJ6d1GUQCkxoOWCCaLpKEUU5qiipGmqEgyHcZabiDhylvvBIhacBf1DAkto96nHoUI6UtJ3VXbnsC4TjDc1a2U4BtP3JKo5h3iiOYg0tMX8XslBY6hTkvP2WsK551Umkzb84KrgtrIdJgUVUnNW53Axz5hCvMkJQtywyVHSOhKGZklGxHkoQID1CPtLTkyCfSjmerjuCldrrQC4Q+XMGZuzwRI1/Koe/qTh+pvlxlU/M/1oqUd2PHlIeRIhzPH/IiBlUAp7nBLhUEKzbUAmFB9V8h7iMdj9LpJnUI1urK66JeyFul/O1zKV25X8SRAOfgAmSABa5BBTyCKqgBDN7AB/gEX8a7MTYmxve8dcNYzJyBP2X8/AJycaOd</latexit>
Bootstrapping:
• For a gene with m CpG sites: • Construct a large number of ‘null genes’ that also
have m CpG sites. • Calculate statistic value for each null gene. • Rank the statistic value for the observed gene
within the set of values for null genes to get a p-value.
• Do this for all genes of all lengths, and look at those with lowest resulting p-values.
11
Software: Methcon5
12
JCO Clin Cancer Inform 4:100-107. © 2020 by American Society of Clinical Oncology
github.com/USCbiostats/MethCon5
More complex models?
𝛼k: region-specific tendency to be methylated𝜌n: density of CpGs at nth CpG𝛽k: region-specific dependence on densitydn: distance between nth CpG and (n-1)th CpGγk:region-specificcorrelationparameter
Software: DNAMeda (Shiny App - Visualization)
Islands
Shores
Non-island
github.com/USCbiostats/DNAMeda
Software: mutational signatures
15
9
A Unifying Model to Test Difference?
Somatic mutations
pmsignature EstimatedProportions,
!"
Are !"different in
two groups?
e.g., Wilcoxon
Uncertainty in Proportions
HiLDA
HiLDA = “Hierarchical Latent Dirichlet Allocation”
!!~#$% &!", . . . , &!#, )!!"~#$% &"", . . . , &"#, )"
HILDA
16github.com/USCbiostats/HiLDA
Software: iMutSig
17
iMutSig - Shiny app.
18
Acknowledgements
19
USC: Emil Hvitfeldt, Kim Siegmund, Darryl Shibata, Zhi Yang
JH: Hari Easwaren, Tom Pisanic
And many thanks to ITCR