a brief introduction to epistasis detection
A brief introduction to epistasis detection in GWAS
2014. 01. 27.
Hyun-hwan Jeong
Agenda
• Introduction
• Problem definition
• Computational detection methods
• Challenges
Introduction
Single Nucleotide Polymorphism
• A single-letter change in the DNA sequence
  • DNA sequences are 99.9% identical between individuals
• A common type of genetic variation
• Present in ≥ 1% of the general population
…ATTCGCCGGCTGCAACGTTAGA…
…ATTCGCCGGCTGCAGCGTTAGA…
…ATTCGCCGGCTGCATCGTTAGA…
Genotype, phenotype and Allele
http://en.wikipedia.org/wiki/Phenotype
genotype
phenotype
allele
Genome-Wide Association Study for the relation between a single SNP and disease
Manhattan plot of the GWAS of the discovery cohort comprising 2,346 SSc cases and 5,193 healthy controls. - Nature Genetics 42, 426–429 (2010)
Why is detecting epistasis needed in GWAS?
An illustration of interaction pattern between two SNPs with no marginal effect. -Bioinformatics 26, 30-37 (2010)
Problem Definition
Problem definition – Epistasis detection problem
• Objective
  • Detect the SNPs that are causative for a disease
  • Find the SNP combination with the maximum value of a defined measure
• Dataset
  • 0.5M ~ 1M SNPs
  • 4,000 ~ 5,000 subjects
  • Binary disease status (case/control)
  • 100MB ~ 1GB genotype data file
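To give a feel for the dataset sizes above, the following minimal sketch counts how many SNP combinations an exhaustive scan would have to score. The function name `n_combinations` is made up for this illustration; only the SNP counts come from the slide.

```python
from math import comb


def n_combinations(n_snps: int, order: int) -> int:
    """Number of distinct SNP sets of a given interaction order."""
    return comb(n_snps, order)


# SNP counts taken from the slide; orders 2 and 3 are illustrative choices.
for n_snps in (500_000, 1_000_000):
    for order in (2, 3):
        total = n_combinations(n_snps, order)
        print(f"{n_snps:>9,} SNPs, order {order}: {total:.3e} candidate sets")
```

Even at order 2 the count is above 10^11 for 0.5M SNPs, so each candidate must be scored very cheaply or the search space must be reduced.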
Problem definition – Data format
SNP0  SNP1  SNP2  SNP3  SNP4  SNP5  SNP6  SNP7  SNP8  SNP9  CLASS
   1     1     0     0     0     0     1     0     1     1      1
   0     0     1     0     0     0     1     1     0     2      1
   0     0     0     0     0     0     1     0     0     0      1
   1     1     0     0     0     0     0     1     0     2      1
   0     0     0     0     0     0     0     1     0     0      1
   0     0     0     0     0     0     0     0     0     1      0
   1     1     0     1     0     0     0     1     1     1      0
   0     0     0     1     1     1     0     1     1     1      0
   0     1     0     2     0     0     0     1     0     1      0
   0     0     0     1     0     0     1     2     1     0      0
• 3 values for SNP columns: 0 (AA), 1 (Aa/aA), 2 (aa)
• Binary values for CLASS: 1 (case/affected subjects), 0 (control/normal)
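A minimal sketch of how data in this format could be read and tabulated, assuming whitespace-separated integers with the class label in the last column. The names `parse` and `contingency` are invented for the example, and the choice of SNP7 and SNP9 is arbitrary.

```python
from collections import Counter

# The ten example subjects from the slide: ten SNP columns, then CLASS.
ROWS = """\
1 1 0 0 0 0 1 0 1 1 1
0 0 1 0 0 0 1 1 0 2 1
0 0 0 0 0 0 1 0 0 0 1
1 1 0 0 0 0 0 1 0 2 1
0 0 0 0 0 0 0 1 0 0 1
0 0 0 0 0 0 0 0 0 1 0
1 1 0 1 0 0 0 1 1 1 0
0 0 0 1 1 1 0 1 1 1 0
0 1 0 2 0 0 0 1 0 1 0
0 0 0 1 0 0 1 2 1 0 0
"""


def parse(text):
    """Split each line into a genotype vector and a class label."""
    genotypes, labels = [], []
    for line in text.strip().splitlines():
        *snps, label = map(int, line.split())
        genotypes.append(snps)
        labels.append(label)
    return genotypes, labels


def contingency(genotypes, labels, snp_a, snp_b):
    """Count subjects per (class, genotype of snp_a, genotype of snp_b)."""
    counts = Counter()
    for row, label in zip(genotypes, labels):
        counts[(label, row[snp_a], row[snp_b])] += 1
    return counts


genotypes, labels = parse(ROWS)
table = contingency(genotypes, labels, 7, 9)
for key in sorted(table):
    print(key, table[key])
```

The resulting counts are the raw material for the contingency-table measures on the following slides.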
Problem definition – measure (1/3)
• Computed on a contingency table
• Popular measures in epistasis detection
  • χ² test
  • Mutual information

Genotype  AABB  AABb  AAbb  AaBB  AaBb  Aabb  aaBB  aaBb  aabb   sum
Case        39    91    95    92    14    31    63     4    71   500
Control    100    15    55     5    22   150    50    93    10   500
sum        139   106   150    97    36   181   113    97    81  1000
Problem definition – measure (2/3)
• χ² test, based on the χ² distribution
  • H0: no association between SNPs and disease status

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

Observed contingency table
Genotype  AABB  AABb  AAbb  AaBB  AaBb  Aabb  aaBB  aaBb  aabb   sum
Case        39    91    95    92    14    31    63     4    71   500
Control    100    15    55     5    22   150    50    93    10   500
sum        139   106   150    97    36   181   113    97    81  1000

Expected contingency table
Genotype  AABB  AABb  AAbb  AaBB  AaBb  Aabb  aaBB  aaBb  aabb   sum
Case      69.5    53    75  48.5    18  90.5  56.5  48.5  40.5   500
Control   69.5    53    75  48.5    18  90.5  56.5  48.5  40.5   500
sum        139   106   150    97    36   181   113    97    81  1000

χ² value: 379.07, p-value = 2.76 × 10^-77
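The χ² value on this slide can be reproduced with a few lines of plain Python. This is a sketch: the function name `chi_square` is an assumption of the example, expected counts are derived from the row and column totals, and the p-value is left out because it needs the χ² distribution function.

```python
# Observed counts from the slide, one entry per two-locus genotype.
CASE = [39, 91, 95, 92, 14, 31, 63, 4, 71]
CONTROL = [100, 15, 55, 5, 22, 150, 50, 93, 10]


def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for a 2D table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count under H0 (no association).
            e = row_totals[i] * col_totals[j] / total
            stat += (o - e) ** 2 / e
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


stat, dof = chi_square([CASE, CONTROL])
print(f"chi-square = {stat:.2f}, d.f. = {dof}")  # chi-square = 379.07, d.f. = 8
```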
Problem definition – measure (3/3)
• Mutual information (1/3)
  • Non-parametric measure
Problem definition – measure (3/3)
• Mutual information (2/3)

Genotype  AABB  AABb  AAbb  AaBB  AaBb  Aabb  aaBB  aaBb  aabb   sum
Case        39    91    95    92    14    31    63     4    71   500
Control    100    15    55     5    22   150    50    93    10   500
sum        139   106   150    97    36   181   113    97    81  1000

Frequency Table
Genotype   AABB   AABb   AAbb   AaBB   AaBb   Aabb   aaBB   aaBb   aabb    sum
Case      0.039  0.091  0.095  0.092  0.014  0.031  0.063  0.004  0.071  0.500
Control   0.100  0.015  0.055  0.005  0.022  0.150  0.050  0.093  0.010  0.500
sum       0.139  0.106  0.150  0.097  0.036  0.181  0.113  0.097  0.081  1.000
Problem definition – measure (3/3)
• Mutual information (3/3)

Entropy Table (each cell is −p log₂ p of the corresponding frequency)
Genotype   AABB   AABb   AAbb   AaBB   AaBb   Aabb   aaBB   aaBb   aabb    sum
Case      0.183  0.315  0.323  0.317  0.086  0.155  0.251  0.032  0.271  0.500
Control   0.332  0.091  0.230  0.038  0.121  0.411  0.216  0.319  0.066  0.500
sum       0.396  0.343  0.411  0.326  0.173  0.446  0.355  0.326  0.294

$$I(\text{genotype}; \text{disease}) = H(\text{genotype}) + H(\text{disease}) - H(\text{genotype}, \text{disease}) = 3.07 + 1.00 - 3.76 = 0.31$$
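The same numbers can be checked directly from the observed counts. The sketch below assumes base-2 logarithms, which is what the entropy table values correspond to. The helper names and the variables `h_genotype`, `h_disease`, `h_joint` are chosen for this example only.

```python
from math import log2

# Observed counts from the slide, one entry per two-locus genotype.
CASE = [39, 91, 95, 92, 14, 31, 63, 4, 71]
CONTROL = [100, 15, 55, 5, 22, 150, 50, 93, 10]


def entropy(probs):
    """Shannon entropy in bits; zero-probability cells contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)


total = sum(CASE) + sum(CONTROL)
# Joint distribution over (disease status, genotype).
joint = [c / total for c in CASE] + [c / total for c in CONTROL]
# Marginal distributions, matching the "sum" row and column of the frequency table.
genotype = [(a + b) / total for a, b in zip(CASE, CONTROL)]
disease = [sum(CASE) / total, sum(CONTROL) / total]

h_genotype = entropy(genotype)
h_disease = entropy(disease)
h_joint = entropy(joint)
mi = h_genotype + h_disease - h_joint
print(f"H(genotype)={h_genotype:.2f} H(disease)={h_disease:.2f} "
      f"H(genotype,disease)={h_joint:.2f} I={mi:.2f}")
```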
Methods to detect epistasis
Methods – Computational Approaches
• Multifactor Dimensionality Reduction (Ritchie et al. 2002)
• SNPHarvester (Yang et al. 2009)
• SNPRuler (Wan et al. 2010)
• Mutual Information With Clustering (Leem et al. 2014)
Methods – Multifactor dimensionality reduction (1/2)
Methods – Multifactor dimensionality reduction (2/2)
• Model-free, non-parametric method
• Pattern-based method
  • Association rule for each combination of SNPs and phenotypes
  • e.g. if SNP10 = 0 and SNP13 = 4 then class = 1
• Exhaustive search
  • Computational burden
• Cross-validation consistency
  • Used to select the best model
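The points above can be illustrated with a heavily simplified sketch in the spirit of MDR. It is not the published implementation: it labels each genotype cell high-risk when cases are at least as frequent as controls (an assumed threshold of 1.0), scores every SNP pair exhaustively on training accuracy, and omits cross-validation. All function names and the toy data are invented for the example.

```python
from collections import Counter
from itertools import combinations, product


def cell(row, snps):
    """Multi-locus genotype of one subject at the chosen SNPs."""
    return tuple(row[s] for s in snps)


def high_risk_cells(rows, labels, snps, threshold=1.0):
    """Cells whose case count is at least threshold times the control count."""
    cases, controls = Counter(), Counter()
    for row, label in zip(rows, labels):
        (cases if label == 1 else controls)[cell(row, snps)] += 1
    return {c for c in cases if cases[c] >= threshold * controls[c]}


def accuracy(rows, labels, snps, risky):
    """Fraction of subjects whose class matches the high/low-risk rule."""
    hits = sum((cell(row, snps) in risky) == (label == 1)
               for row, label in zip(rows, labels))
    return hits / len(rows)


def exhaustive_search(rows, labels, order=2):
    """Score every SNP combination of the given order and return the best."""
    scored = []
    for snps in combinations(range(len(rows[0])), order):
        risky = high_risk_cells(rows, labels, snps)
        scored.append((accuracy(rows, labels, snps, risky), snps))
    return max(scored)


# Toy data: class depends only on the combination of SNP0 and SNP2 (XOR),
# so neither SNP shows a marginal effect; SNP1 is noise.
rows = [[a, noise, b] for a, noise, b in product((0, 1), (0, 1, 2), (0, 1))]
labels = [row[0] ^ row[2] for row in rows]

best_accuracy, best_snps = exhaustive_search(rows, labels)
print(best_snps, best_accuracy)
```

The loop over `combinations` is where the computational burden comes from: its length grows combinatorially with the number of SNPs and the interaction order.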
Methods – SNPHarvester (1/2)
Methods – SNPHarvester (2/2)
• Local search
  • Local optima problem
• PathSeeker algorithm
  • Successive runs
• Score function: χ² value
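As an illustration of local search with a χ² score, the sketch below runs a generic hill climb with random restarts: it starts from a random SNP set and keeps swapping in single SNPs while the score improves. It is not the PathSeeker algorithm itself. The function names, the `restarts` parameter and the toy data are assumptions of this example.

```python
import random
from collections import Counter
from itertools import product


def chi_square(rows, labels, snps):
    """Pearson chi-square of class versus joint genotype at the chosen SNPs."""
    cells, geno_totals = Counter(), Counter()
    for row, label in zip(rows, labels):
        geno = tuple(row[s] for s in snps)
        cells[(label, geno)] += 1
        geno_totals[geno] += 1
    n = len(rows)
    class_totals = {1: sum(labels), 0: n - sum(labels)}
    stat = 0.0
    for geno, g_total in geno_totals.items():
        for label, c_total in class_totals.items():
            expected = g_total * c_total / n
            if expected > 0:
                stat += (cells[(label, geno)] - expected) ** 2 / expected
    return stat


def local_search(rows, labels, k=2, restarts=5, seed=0):
    """Hill climbing over k-SNP sets, repeated from several random starts."""
    rng = random.Random(seed)
    n_snps = len(rows[0])
    best_set, best_score = None, -1.0
    for _ in range(restarts):
        current = rng.sample(range(n_snps), k)
        score = chi_square(rows, labels, current)
        improved = True
        while improved:
            improved = False
            for pos in range(k):
                for candidate in range(n_snps):
                    if candidate in current:
                        continue
                    # Try replacing one SNP of the current set.
                    trial = current[:pos] + [candidate] + current[pos + 1:]
                    trial_score = chi_square(rows, labels, trial)
                    if trial_score > score:
                        current, score, improved = trial, trial_score, True
        if score > best_score:
            best_set, best_score = sorted(current), score
    return best_set, best_score


# Toy data: class is the XOR of SNP0 and SNP2; SNP1 is noise.
rows = [[a, noise, b] for a, noise, b in product((0, 1), (0, 1, 2), (0, 1))]
labels = [row[0] ^ row[2] for row in rows]
best_set, best_score = local_search(rows, labels)
print(best_set, best_score)
```

With only three SNPs every starting pair already contains one of the interacting SNPs, so a single swap reaches the optimum. With many irrelevant SNPs, a start that contains neither interacting SNP would see no score change from any single swap and stall, which is the local optima problem that repeated runs are meant to mitigate.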
Methods – SNPRuler
• Pattern-based method
  • Predictive rule
• Branch-and-bound algorithm
  • Uses an upper bound on the χ² value with d.f. = 1
Methods – Mutual Information With Clustering (1/2)
[Figure: SNPs and causative SNPs grouped around Centroid 1, Centroid 2 and Centroid 3. Labels: "3 SNPs with the highest mutual information value"; "m candidates" for each of the three clusters; distanceScore = d1 + d2.]
Methods – Mutual Information With Clustering (2/2)
• Mutual information
  • Used as the distance measure for clustering
• K-means clustering algorithm
  • Candidate selection
  • Reduces the search space dramatically
• Can detect high-order epistatic interactions
  • Also shows better performance (power, execution time) than previous methods
Challenges in epistasis detection
Challenges
• Reducing computational burden
  • Filtering
  • Parallel processing
• Higher-order epistatic interaction detection
  • Order larger than 2
• Novel measures of association between SNPs and disease