statistical approaches to detect epistasis in genome wide

92
Statistical approaches to detect epistasis in Genome Wide Association Studies Virginie Stanislas Supervisors: Cyril Dalmasso and Christophe Ambroise Laboratoire de Mathématiques et Modélisation d’Évry December 18th, 2017

Upload: others

Post on 26-May-2022

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Statistical approaches to detect epistasis in Genome Wide

Statistical approaches to detect epistasis in GenomeWide Association Studies

Virginie StanislasSupervisors: Cyril Dalmasso and Christophe Ambroise

Laboratoire de Mathématiques et Modélisation d’Évry

December 18th, 2017

Page 2: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Summary1 General context

Complex diseasesGWASEpistasis

2 A new methodGeneral modeling approachInteractions constructionCoefficients estimation

3 Evaluation and comparisonSimulation designs and scenariosSetting parametersComparison with G-GEECase-control methods comparisonsNon parametric interaction modeling approach

4 ApplicationAnkylosing SpondylitisCrohn’s DiseaseAnalysis and results

5 ConclusionsV. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 1 / 54

Page 3: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Summary1 General context

Complex diseasesGWASEpistasis

2 A new methodGeneral modeling approachInteractions constructionCoefficients estimation

3 Evaluation and comparisonSimulation designs and scenariosSetting parametersComparison with G-GEECase-control methods comparisonsNon parametric interaction modeling approach

4 ApplicationAnkylosing SpondylitisCrohn’s DiseaseAnalysis and results

5 Conclusions

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 2 / 54

Page 4: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Complex diseases

Monogenic disease Complex disease

Manolio et al. J Clin Invest. 2008 ;118(5):1590-1605.

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 3 / 54

Page 5: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Genome-Wide Association Studies

GWAS characteristics:Objective: find associations between genetic markers(SNPi,j ∈ 0, 1, 2) and a phenotypic trait (Yi ∈ 0, 1 or Yi ∈ R)

c©Pasieka, Science Photo Library

Genetic markers : SNP

https://genomainternational.com

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 4 / 54

Page 6: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Genome-Wide Association Studies

GWAS characteristics:Objective: find associations between genetic markers(SNPi,j ∈ 0, 1, 2) and a phenotypic trait (Yi ∈ 0, 1 or Yi ∈ R)

c©Pasieka, Science Photo Library

Genetic markers : SNP

https://genomainternational.com

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 4 / 54

Page 7: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Genome-Wide Association Studies

SNP analysisDifferences between cases and controls at a specific SNP

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 5 / 54

Page 8: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Genome-Wide Association StudiesSNP analysisDifferences between cases and controls at a specific SNP

GWAS limits:

Reproductibility

Genetic factors missing

Factors:

High dimension (p » n)

Small effects

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 5 / 54

Page 9: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Genome-Wide Association StudiesSNP analysisDifferences between cases and controls at a specific SNP

GWAS limits:

Reproductibility

Genetic factors missing

Missing heritability

Missing heritability factors:

Non consideration of rare variants(MAF < 0.1%)

Non consideration of structural variants(insertion, deletion, copy numbers...)

Incorrect estimation measure of heritability

Complex structure of genetic data

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 5 / 54

Page 10: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Epistasis - Definition

Epistasis: Interaction of alleles effects from different markers

locus 1locus 2 bb bB BB

aa 0 0 0aA 0 1 1AA 0 1 1

Different definitions according to disciplines with two major distinctions:

Biological epistasis

Statistical epistasis

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 6 / 54

Page 11: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Epistasis - Definition

Epistasis: Interaction of alleles effects from different markers

locus 1locus 2 bb bB BB

aa 0 0 0aA 0 1 1AA 0 1 1

Different definitions according to disciplines with two major distinctions:

Biological epistasis

Statistical epistasis

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 6 / 54

Page 12: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Epistasis - Definition

Biological epistasis:Physical interaction at the individual level

Moore & Williams Bioessays 2005 ;27(6):637-646.

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 7 / 54

Page 13: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Epistasis - Definition

Statistical epistasis:Deviation from additive effects of genetic variants at the population level

Moore & Williams Bioessays 2005 ;27(6):637-646.

A possible model:

logit[P(y = 1|x1, x2)] = β0 + β1x1 + β2x2 + x1x2

withy a binary phenotypex1, x2 the individual effect of both markers

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 8 / 54

Page 14: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Epistasis - Definition

Statistical epistasis:Deviation from additive effects of genetic variants at the population level

Moore & Williams Bioessays 2005 ;27(6):637-646.

A possible model:

logit[P(y = 1|x1, x2)] = β0 + β1x1 + β2x2 + β3x1x2

withy a binary phenotypex1, x2 the individual effect of both markers

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 8 / 54

Page 15: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Epistasis - Definition

Statistical epistasis:Deviation from additive effects of genetic variants at the population level

Moore & Williams Bioessays 2005 ;27(6):637-646.

A possible model:

logit[P(y = 1|x1, x2)] = β0 + β1x1 + β2x2 + β3x1x2

withy a binary phenotypex1, x2 the individual effect of both markers

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 8 / 54

Page 16: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Epistasis - Challenges to detect it

Methodological

: 5× 1011 pairwise interactions to investigate for a GWAS with 106 SNPs: Curse of dimensionality: Correlation (linkage disequilibrium):

between observed markersbetween observed and causal markers

: Distinction between marginal and interaction effects

Interpretation: Moving from statistical estimate of epistasis to biological epistasis

Epistasis is ubiquitous in human biologyInvestigation indispensable to understand genetic data

Large number of approaches proposed

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 9 / 54

Page 17: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Epistasis - Challenges to detect it

Methodological: 5× 1011 pairwise interactions to investigate for a GWAS with 106 SNPs

: Curse of dimensionality: Correlation (linkage disequilibrium):

between observed markersbetween observed and causal markers

: Distinction between marginal and interaction effects

Interpretation: Moving from statistical estimate of epistasis to biological epistasis

Epistasis is ubiquitous in human biologyInvestigation indispensable to understand genetic data

Large number of approaches proposed

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 9 / 54

Page 18: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Epistasis - Challenges to detect it

Methodological: 5× 1011 pairwise interactions to investigate for a GWAS with 106 SNPs: Curse of dimensionality

: Correlation (linkage disequilibrium):between observed markersbetween observed and causal markers

: Distinction between marginal and interaction effects

Interpretation: Moving from statistical estimate of epistasis to biological epistasis

Epistasis is ubiquitous in human biologyInvestigation indispensable to understand genetic data

Large number of approaches proposed

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 9 / 54

Page 19: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Epistasis - Challenges to detect it

Methodological: 5× 1011 pairwise interactions to investigate for a GWAS with 106 SNPs: Curse of dimensionality: Correlation (linkage disequilibrium):

between observed markers

between observed and causal markers: Distinction between marginal and interaction effects

Interpretation: Moving from statistical estimate of epistasis to biological epistasis

Epistasis is ubiquitous in human biologyInvestigation indispensable to understand genetic data

Large number of approaches proposed

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 9 / 54

Page 20: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Epistasis - Challenges to detect it

Methodological: 5× 1011 pairwise interactions to investigate for a GWAS with 106 SNPs: Curse of dimensionality: Correlation (linkage disequilibrium):

between observed markersbetween observed and causal markers

: Distinction between marginal and interaction effects

Interpretation: Moving from statistical estimate of epistasis to biological epistasis

Epistasis is ubiquitous in human biologyInvestigation indispensable to understand genetic data

Large number of approaches proposed

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 9 / 54

Page 21: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Epistasis - Challenges to detect it

Methodological: 5× 1011 pairwise interactions to investigate for a GWAS with 106 SNPs: Curse of dimensionality: Correlation (linkage disequilibrium):

between observed markersbetween observed and causal markers

: Distinction between marginal and interaction effects

Interpretation: Moving from statistical estimate of epistasis to biological epistasis

Epistasis is ubiquitous in human biologyInvestigation indispensable to understand genetic data

Large number of approaches proposed

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 9 / 54

Page 22: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Epistasis - Challenges to detect it

Methodological: 5× 1011 pairwise interactions to investigate for a GWAS with 106 SNPs: Curse of dimensionality: Correlation (linkage disequilibrium):

between observed markersbetween observed and causal markers

: Distinction between marginal and interaction effects

Interpretation: Moving from statistical estimate of epistasis to biological epistasis

Epistasis is ubiquitous in human biologyInvestigation indispensable to understand genetic data

Large number of approaches proposed

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 9 / 54

Page 23: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Epistasis - Challenges to detect it

Methodological: 5× 1011 pairwise interactions to investigate for a GWAS with 106 SNPs: Curse of dimensionality: Correlation (linkage disequilibrium):

between observed markersbetween observed and causal markers

: Distinction between marginal and interaction effects

Interpretation: Moving from statistical estimate of epistasis to biological epistasis

Epistasis is ubiquitous in human biologyInvestigation indispensable to understand genetic data

Large number of approaches proposed

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 9 / 54

Page 24: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Epistasis - Challenges to detect it

Methodological: 5× 1011 pairwise interactions to investigate for a GWAS with 106 SNPs: Curse of dimensionality: Correlation (linkage disequilibrium):

between observed markersbetween observed and causal markers

: Distinction between marginal and interaction effects

Interpretation: Moving from statistical estimate of epistasis to biological epistasis

Epistasis is ubiquitous in human biologyInvestigation indispensable to understand genetic data

Large number of approaches proposed

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 9 / 54

Page 25: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Epistasis - A variety of methods

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 10 / 54

Page 26: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Epistasis - Scale of interactions

Existing methods: : mainly SNP x SNP: some at a group scale

Group definition:: genes: haplotypes: ...

Advantages of group scaleapproaches:: genetic effects more detectable: reduce the number of variables: consideration of the correlation: results biologically interpretable

U.S. National Library of Medicine

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 11 / 54

Page 27: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Epistasis - Gene scale methods

Gene level test outside a regression framework:

Aggregating interaction testsCo-association tests

Gene level regression based approaches:

PCA

PLS

Kernel

+

Objectives: To develop a new gene scale method that:: considers a more accurate definition of interaction variables,: is applicable to numerous genes,: resorts to a group penalty

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 12 / 54

Page 28: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Epistasis - Gene scale methods

Gene level test outside a regression framework:

Aggregating interaction testsCo-association tests

Gene level regression based approaches:

PCA

PLS

Kernel

+ logistic regression(He 2011, Li 2009, Zhang 2008, Wang T 2009)

Objectives: To develop a new gene scale method that:: considers a more accurate definition of interaction variables,: is applicable to numerous genes,: resorts to a group penalty

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 12 / 54

Page 29: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Epistasis - Gene scale methods

Gene level test outside a regression framework:

Aggregating interaction testsCo-association tests

Gene level regression based approaches:

PCA

PLS

Kernel

+ penalized regression(D’Angelo 2009, Wang X 2014)

Objectives: To develop a new gene scale method that:: considers a more accurate definition of interaction variables,: is applicable to numerous genes,: resorts to a group penalty

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 12 / 54

Page 30: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Epistasis - Gene scale methods

Gene level test outside a regression framework:

Aggregating interaction testsCo-association tests

Gene level regression based approaches:

PCA

PLS

Kernel

+ penalized regression(D’Angelo 2009, Wang X 2014)

Objectives: To develop a new gene scale method that:: considers a more accurate definition of interaction variables,: is applicable to numerous genes,: resorts to a group penalty

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 12 / 54

Page 31: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Summary1 General context

Complex diseasesGWASEpistasis

2 A new methodGeneral modeling approachInteractions constructionCoefficients estimation

3 Evaluation and comparisonSimulation designs and scenariosSetting parametersComparison with G-GEECase-control methods comparisonsNon parametric interaction modeling approach

4 ApplicationAnkylosing SpondylitisCrohn’s DiseaseAnalysis and results

5 Conclusions

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 13 / 54

Page 32: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Group modeling approach

SNP1,1 .. SNP1,p1 .. SNPG,1 .. SNPG,pGPheno

Ind1 1 0 0 1 y1Ind2 0 0 2 1 y2. 2 1 1 2 .. 0 1 0 0 .Indi 0 2 1 0 yi︸ ︷︷ ︸

gene1︸ ︷︷ ︸

geneG

We noteSNP1,1 = X1,1

r , s two genes

model:

g(E [y |X ]) =∑g

∑pg

βg ,pg X g ,pg︸ ︷︷ ︸Main effects

+∑r ,s

γr ,sZ r ,s︸ ︷︷ ︸Interaction effects

β =

β1,1, β1,2, · · · , β1,p1︸ ︷︷ ︸gene1

, · · · , βG,1, · · · , βG,pG︸ ︷︷ ︸geneG

T

γ =

γ12, ... γ1G︸︷︷︸γ1G,1,··· ,γ1G,q

...,γ(G−1)G

q: # of interaction variables for a couple

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 14 / 54

Page 33: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Group modeling approach

SNP1,1 .. SNP1,p1 .. SNPG,1 .. SNPG,pGPheno

Ind1 1 0 0 1 y1Ind2 0 0 2 1 y2. 2 1 1 2 .. 0 1 0 0 .Indi 0 2 1 0 yi︸ ︷︷ ︸

gene1︸ ︷︷ ︸

geneG

We noteSNP1,1 = X1,1

r , s two genes

model:

g(E [y |X ]) =∑g

∑pg

βg ,pg X g ,pg︸ ︷︷ ︸Main effects

+∑r ,s

γr ,sZ r ,s︸ ︷︷ ︸Interaction effects

β =

β1,1, β1,2, · · · , β1,p1︸ ︷︷ ︸gene1

, · · · , βG,1, · · · , βG,pG︸ ︷︷ ︸geneG

T

γ =

γ12, ... γ1G︸︷︷︸γ1G,1,··· ,γ1G,q

...,γ(G−1)G

q: # of interaction variables for a couple

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 14 / 54

Page 34: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Interaction variables construction:

Based on literature proposal:

methods criteria interaction term

Principal Component analysis (PCA) var(X rv) and var(X sv)∑q

j=1∑q

k=1 γrsjkT

rj T

sk

Partial Least Square (PLS) cov2(YX rc,X sw)∑q

j=1 γrsj T rs

j

Canonical Correlation Analysis (CCA) cor(X ra,X sb)∑q

j=1 γrsj U r

j Vsj

Original proposition: Gene-Gene Eigen Epistasis (G-GEE)We consider fu(X r ,X s) to represent the interaction between genes r , s.We can choose fu(X r ,X s) following two conditions:

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 15 / 54

Page 35: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Interaction variables construction:

Based on literature proposal:

methods criteria interaction term

Principal Component analysis (PCA) var(X rv) and var(X sv)∑q

j=1∑q

k=1 γrsjkT

rj T

sk

Partial Least Square (PLS) cov2(YX rc,X sw)∑q

j=1 γrsj T rs

j

Canonical Correlation Analysis (CCA) cor(X ra,X sb)∑q

j=1 γrsj U r

j Vsj

Original proposition: Gene-Gene Eigen Epistasis (G-GEE)We consider fu(X r ,X s) to represent the interaction between genes r , s.We can choose fu(X r ,X s) following two conditions:

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 15 / 54

Page 36: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Interaction variables construction:

Based on literature proposal:

methods criteria interaction term

Principal Component analysis (PCA) var(X rv) and var(X sv)∑q

j=1∑q

k=1 γrsjkT

rj T

sk

Partial Least Square (PLS) cov2(YX rc,X sw)∑q

j=1 γrsj T rs

j

Canonical Correlation Analysis (CCA) cor(X ra,X sb)∑q

j=1 γrsj U r

j Vsj

Original proposition: Gene-Gene Eigen Epistasis (G-GEE)We consider fu(X r ,X s) to represent the interaction between genes r , s.We can choose fu(X r ,X s) following two conditions:

criteria

: cov2((X r ,X s), fu(X r ,X s))

: cov2(y , fu(X r ,X s))

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 15 / 54

Page 37: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Interaction variables construction:

Based on literature proposal:

methods criteria interaction term

Principal Component analysis (PCA) var(X rv) and var(X sv)∑q

j=1∑q

k=1 γrsjkT

rj T

sk

Partial Least Square (PLS) cov2(YX rc,X sw)∑q

j=1 γrsj T rs

j

Canonical Correlation Analysis (CCA) cor(X ra,X sb)∑q

j=1 γrsj U r

j Vsj

Original proposition: Gene-Gene Eigen Epistasis (G-GEE)We consider fu(X r ,X s) to represent the interaction between genes r , s.We can choose fu(X r ,X s) following two conditions:

criteria methods

: cov2((X r ,X s), fu(X r ,X s)) G-GEEc1

: cov2(y , fu(X r ,X s)) G-GEEc2

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 15 / 54

Page 38: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Interaction variables construction: G-GEEc1

We set: fu(X r ,X s) = F rsu with F rs = X rijX

sik

j=1··· ,pr ;k=1,··· ,psi=1···n

u = arg minu,‖u‖=1

ˆcov2(X ,F rsu)

with X = (X r ,X s)

minu,‖u‖=1

|| ˆcov[F rsu,X ]||2 = minu,‖u‖=1

uTF rsTXXTF rsu

u : eigen vector associated to the smallest eigenvalue of F rsTXXTF rs

We then obtain for each couple (r , s) : Z rs = F rsu

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 16 / 54

Page 39: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Interaction variables construction: G-GEEc2

We set: fu(X r ,X s) = F rsu with F rs = X rijX

sik

j=1··· ,pr ;k=1,··· ,psi=1···n

u = arg maxu,‖u‖=1

ˆcov2(y ,F rsu)

maxu,‖u‖=1

|| ˆcov[F rsu, y ]||2 = maxu,‖u‖=1

uTF rsTyyTF rsu

u : eigen vector associated to the largest eigenvalue of F rsTyyTF rs

u = F rsTy

We then obtain for each couple (r , s) : Z rs = F rsu = F rsF rsTy

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 17 / 54

Page 40: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Interaction variable modeling approaches comparison

methods criteria interaction term

G-GEEc1 cov2(X , fu(X r ,X s)) F rsuγrs

G-GEEc2 cov2(Y , fu(X r ,X s)) F rsuγrs

PCA var(X rv) and var(X sv)∑q

j=1∑q

k=1 γrsjkT

rj T

sk

PLS cov2(YX rc ,X sw)∑q

j=1 γrsj T rs

j

CCA cor(X ra,X sb)∑q

j=1 γrsj U r

j Vsj

g(E [y |X ]) =∑g

∑pg

βg ,pg X g ,pg +∑r ,s

γr ,sZ r ,s

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 18 / 54

Page 41: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Interaction variable modeling approaches comparison

methods criteria interaction term

G-GEEc1 cov2(X , fu(X r ,X s)) F rsuγrs

G-GEEc2 cov2(Y , fu(X r ,X s)) F rsuγrs

PCA var(X rv) and var(X sv)∑q

j=1∑q

k=1 γrsjkT

rj T

sk

PLS cov2(YX rc ,X sw)∑q

j=1 γrsj T rs

j

CCA cor(X ra,X sb)∑q

j=1 γrsj U r

j Vsj

g(E [y |X ]) =∑g

∑pg

βg ,pg X g ,pg +∑r ,s

γr ,sZ r ,s

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 18 / 54

Page 42: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Coefficients estimation

Group LASSO regression

(β, γ) = argminβ,γ

∑i

−logL(yi ; X iβ + Z iγ) + λ

∑g

√pg ||βg ||2 +

∑rs

√prps ||γrs ||2

Limits of the groupLASSO regression:

P(S∗ ⊂ S) −→n→+∞

1 but |S | |S∗|

Difficult to compute p-value or confidence interval

Adaptive-Ridge Cleaning Becu JM et al., 2017

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 19 / 54

Page 43: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Coefficients estimation

Group LASSO regression

(β, γ) = argminβ,γ

∑i

−logL(yi ; X iβ + Z iγ) + λ

∑g

√pg ||βg ||2 +

∑rs

√prps ||γrs ||2

Limits of the groupLASSO regression:

P(S∗ ⊂ S) −→n→+∞

1 but |S | |S∗|

Difficult to compute p-value or confidence interval

Adaptive-Ridge Cleaning Becu JM et al., 2017

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 19 / 54

Page 44: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Coefficients estimation

Group LASSO regression

(β, γ) = argminβ,γ

∑i

−logL(yi ; X iβ + Z iγ) + λ

∑g

√pg ||βg ||2 +

∑rs

√prps ||γrs ||2

Limits of the groupLASSO regression:

P(S∗ ⊂ S) −→n→+∞

1 but |S | |S∗|

Difficult to compute p-value or confidence interval

Adaptive-Ridge Cleaning Becu JM et al., 2017

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 19 / 54

Page 45: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Coefficients estimation: Adaptive-Ridge Cleaning

Setting: Hθ = Xβ + ZγSplit randomly H in two subsets H1 and H2 of size n/2

First stage: Screening on H1

θ = argminθ

(∑i

−logL(yi ; H1iθ) + λ

[∑g

√pg ||θg ||2

])

: S : support of the selected groups

Second stage: Cleaning on H2

θ = argminθ ; θj=0 if j /∈S

(∑i

−logL(yi ; H2iθ) + µ

[∑g

∑j∈g

λ√pg

||θg ||2θ2j

])

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 20 / 54

Page 46: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Coefficients estimation: Adaptive-Ridge Cleaning

Setting: Hθ = Xβ + ZγSplit randomly H in two subsets H1 and H2 of size n/2

First stage: Screening on H1

θ = argminθ

(∑i

−logL(yi ; H1iθ) + λ

[∑g

√pg ||θg ||2

])

: S : support of the selected groups

Second stage: Cleaning on H2

θ = argminθ ; θj=0 if j /∈S

(∑i

−logL(yi ; H2iθ) + µ

[∑g

∑j∈g

λ√pg

||θg ||2θ2j

])

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 20 / 54

Page 47: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Coefficients estimation: Adaptive-Ridge Cleaning

Setting: Hθ = Xβ + ZγSplit randomly H in two subsets H1 and H2 of size n/2

First stage: Screening on H1

θ = argminθ

(∑i

−logL(yi ; H1iθ) + λ

[∑g

√pg ||θg ||2

])

: S : support of the selected groups

Second stage: Cleaning on H2

θ = argminθ ; θj=0 if j /∈S

(∑i

−logL(yi ; H2iθ) + µ

[∑g

∑j∈g

λ√pg

||θg ||2θ2j

])

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 20 / 54

Page 48: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Coefficients estimation: Adaptive-Ridge Cleaning

Setting: Hθ = Xβ + ZγSplit randomly H in two subsets H1 and H2 of size n/2

First stage: Screening on H1

θ = argminθ

(∑i

−logL(yi ; H1iθ) + λ

[∑g

√pg ||θg ||2

])

: S : support of the selected groups

Second stage: Cleaning on H2

θ = argminθ ; θj=0 if j /∈S

(∑i

−logL(yi ; H2iθ) + µ

[∑g

∑j∈g

λ√pg

||θg ||2θ2j

])

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 20 / 54

Page 49: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Coefficients estimation: Adaptive-Ridge Cleaning

Significance of θ: Permutation test based on a Fisher test approach

Fg =

∑i (yi − yωi )

2 −∑

i (yi − yΩi )

2∑i (yi − yΩ

i )2

F ∗g =

∑i (yi − yωi )

2 −∑

i (yi − yΩ∗i )2∑

i (yi − yΩ∗i )2

With:yω: predicted values obtained without the group gyΩ: predicted values using all groups g ∈ SyΩ∗: predicted values using all groups g ∈ S on the matrix H∗ ofpermuted elements for columns corresponding to group g

Empirical p-values:

pg =1B

B∑b=1

1Fg≤F∗bg

with B the number of permutations

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 21 / 54

Page 50: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Coefficients estimation: Adaptive-Ridge Cleaning

Significance of θ: Permutation test based on a Fisher test approach

Fg =

∑i (yi − yωi )

2 −∑

i (yi − yΩi )

2∑i (yi − yΩ

i )2

F ∗g =

∑i (yi − yωi )

2 −∑

i (yi − yΩ∗i )2∑

i (yi − yΩ∗i )2

With:yω: predicted values obtained without the group gyΩ: predicted values using all groups g ∈ S

yΩ∗: predicted values using all groups g ∈ S on the matrix H∗ ofpermuted elements for columns corresponding to group g

Empirical p-values:

pg =1B

B∑b=1

1Fg≤F∗bg

with B the number of permutations

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 21 / 54

Page 51: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Coefficients estimation: Adaptive-Ridge Cleaning

Significance of θ: Permutation test based on a Fisher test approach

Fg =

∑i (yi − yωi )

2 −∑

i (yi − yΩi )

2∑i (yi − yΩ

i )2

F ∗g =

∑i (yi − yωi )

2 −∑

i (yi − yΩ∗i )2∑

i (yi − yΩ∗i )2

With:yω: predicted values obtained without the group gyΩ: predicted values using all groups g ∈ SyΩ∗: predicted values using all groups g ∈ S on the matrix H∗ ofpermuted elements for columns corresponding to group g

Empirical p-values:

pg =1B

B∑b=1

1Fg≤F∗bg

with B the number of permutations

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 21 / 54

Page 52: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Coefficients estimation: Adaptive-Ridge Cleaning

Significance of θ: Permutation test based on a Fisher test approach

Fg =

∑i (yi − yωi )

2 −∑

i (yi − yΩi )

2∑i (yi − yΩ

i )2

F ∗g =

∑i (yi − yωi )

2 −∑

i (yi − yΩ∗i )2∑

i (yi − yΩ∗i )2

With:yω: predicted values obtained without the group gyΩ: predicted values using all groups g ∈ SyΩ∗: predicted values using all groups g ∈ S on the matrix H∗ ofpermuted elements for columns corresponding to group g

Empirical p-values:

pg =1B

B∑b=1

1Fg≤F∗bg

with B the number of permutationsV. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 21 / 54

Page 53: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Summary1 General context

Complex diseasesGWASEpistasis

2 A new methodGeneral modeling approachInteractions constructionCoefficients estimation

3 Evaluation and comparisonSimulation designs and scenariosSetting parametersComparison with G-GEECase-control methods comparisonsNon parametric interaction modeling approach

4 ApplicationAnkylosing SpondylitisCrohn’s DiseaseAnalysis and results

5 Conclusions

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 22 / 54

Page 54: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Simulations design: Genotype

Completely simulated genotype:

X i ∼ Np(0,Σ) with Σ a block diagonal correlation matrix(ρ correlation level for two SNPs in the same gene)

MAF j ∼ U [0.05, 0.5] with fixed MAFj if j causal SNP

Genotype from real data:

From a real data set composed of 763 individuals and 63,340 SNPsstructured in 7216 genes.

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 23 / 54

Page 55: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Simulations design: Phenotype

from Wang X et al., 2014:

g(E [y i |(X i ,Z i )]) = β0 +∑g

βg

(∑k∈C

X gik

)+∑rs

γrs

∑(j,k)∈C2

X rijX

sik

PCA model:

g(E [y i |(X i ,Z i )]) = β0 +∑g

βg

(∑k∈C

X gik

)+∑rs

γrsCri1C

si1.

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 24 / 54

Page 56: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Simulations design

Adjustment of the strength of association for continuous outcomes

: εi generated from N (0, σ2): σ2 determined from R2 coefficient

We note Hθ = [X ,Z ]

[βγ

], and R2 =

∑(Hiθ − y)2∑

(Hiθ + εi − y)2

We can determined an expression for σ2

σ2 =(1− R2)

∑(Hiθ − y)2

R2(n − 2)

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 25 / 54

Page 57: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Simulations studies

First comparison: PCA, PLS and CCAChoosing the parameters

Second comparison: with G-GEEc1 and G-GEEc2Using completely simulated genotypeUsing genotype from a real data set

Third comparison: Case-control methods

Fourth comparison: Investigation of new interaction variable definitions

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 26 / 54

Page 58: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

First comparison: methods issued of the literature

Design: Completely simulated genotypeContinuous phenotype from Wang X et al., 2014

Parameters:

Correlation among SNPs ρMAF values of causal SNPsValues of β and γ

Number of componentsR2

Number of genes

Number of SNPs by genesNumber of causal SNPs bycausal genesNumber of subjectsMarginal or/and interactioneffects

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 27 / 54

Page 59: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

First comparison: methods issued of the literature

Design: Completely simulated genotypeContinuous phenotype from Wang X et al., 2014

Parameters:

Correlation among SNPs ρMAF values of causal SNPsValues of β and γ

Number of componentsR2

Number of genes

Number of SNPs by genesNumber of causal SNPs bycausal genesNumber of subjectsMarginal or/and interactioneffects

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 27 / 54

Page 60: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

First comparison: methods issued of the literature

CCA PCA PLS

pow

erM

ain

0.0

0.2

0.4

0.6

0.8

1.0

CCA PCA PLS

pow

erP

air

0.0

0.2

0.4

0.6

0.8

1.0

CCA PCA PLS

FP

mai

n0.

00.

20.

40.

60.

81.

0

CCA PCA PLS

FP

pair

0.0

0.2

0.4

0.6

0.8

1.0

MainEff noMainEff

CCA PCA PLS

pow

erM

ain

0.0

0.2

0.4

0.6

0.8

1.0

CCA PCA PLS

pow

erP

air

0.0

0.2

0.4

0.6

0.8

1.0

CCA PCA PLS

FP

mai

n0.

00.

20.

40.

60.

81.

0CCA PCA PLS

FP

pair

0.0

0.2

0.4

0.6

0.8

1.0

R2=0.2 R2=0.05

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 28 / 54

Page 61: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

First comparison: methods issued of the literature

Parameters:

ρ = 0.8MAF = 0.2β = γ = 2Number of components =2R2

Number of genes=6

Number of SNPs by genes=6Number of causal SNPs bycausal genes=2Number of subjects=600Marginal or/and interactioneffects

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 29 / 54

Page 62: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Simulations studies

First comparison: PCA, PLS and CCAChoosing the parameters

Second comparison: with G-GEEc1 and G-GEEc2Using completely simulated genotypeUsing genotype from a real data set

Third comparison: Case-control methods

Fourth comparison: Investigation of new interaction variable definitions

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 30 / 54

Page 63: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Second comparison: G-GEE and simulated genotypes

Wang model

PCA model

0.0 0.2 0.4 0.6 0.8

0.0

0.2

0.4

0.6

0.8

1.0

Gen

e P

air

Pow

er

R2

0.0 0.2 0.4 0.6 0.8

0.0

0.2

0.4

0.6

0.8

1.0

Gen

e P

air

Pow

er

R2

0.0 0.2 0.4 0.6 0.8

0.0

0.2

0.4

0.6

0.8

1.0

Gen

e P

air

Pow

er

R2

0.0 0.2 0.4 0.6 0.8

0.0

0.2

0.4

0.6

0.8

1.0

Gen

e P

air

Pow

er

R2

GGEE1 GGEE2 CCA PCA PLS

: Main effects:gene 1gene 2

: Interaction effects:gene 1 x gene 2

: Interaction effects:gene 3 x gene 4

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 31 / 54

Page 64: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Discoveries matrix - an example

GGEE PCA PLS

X.Gene5.Gene6

X.Gene4.Gene6

X.Gene4.Gene5

X.Gene3.Gene6

X.Gene3.Gene5

X.Gene3.Gene4

X.Gene2.Gene6

X.Gene2.Gene5

X.Gene2.Gene4

X.Gene2.Gene3

X.Gene1.Gene6

X.Gene1.Gene5

X.Gene1.Gene4

X.Gene1.Gene3

X.Gene1.Gene2

Gene6

Gene5

Gene4

Gene3

Gene2

Gene1

0.0

0.2

0.4

0.6

0.8

1.0

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 32 / 54

Page 65: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Second comparison: G-GEE and simulated Genotypes - R2 = 0.2

Wang model

PCA model

GGEE1 GGEE2 CCA PCA PLS

X.Genes5.Genes6X.Genes4.Genes6X.Genes4.Genes5X.Genes3.Genes6X.Genes3.Genes5X.Genes3.Genes4X.Genes2.Genes6X.Genes2.Genes5X.Genes2.Genes4X.Genes2.Genes3X.Genes1.Genes6X.Genes1.Genes5X.Genes1.Genes4X.Genes1.Genes3X.Genes1.Genes2

Genes6Genes5Genes4Genes3Genes2Genes1

R2 = 0.2

GGEE1 GGEE2 CCA PCA PLS

X.Genes5.Genes6X.Genes4.Genes6X.Genes4.Genes5X.Genes3.Genes6X.Genes3.Genes5X.Genes3.Genes4X.Genes2.Genes6X.Genes2.Genes5X.Genes2.Genes4X.Genes2.Genes3X.Genes1.Genes6X.Genes1.Genes5X.Genes1.Genes4X.Genes1.Genes3X.Genes1.Genes2

Genes6Genes5Genes4Genes3Genes2Genes1

R2 = 0.2

GGEE1 GGEE2 CCA PCA PLS

X.Genes5.Genes6X.Genes4.Genes6X.Genes4.Genes5X.Genes3.Genes6X.Genes3.Genes5X.Genes3.Genes4X.Genes2.Genes6X.Genes2.Genes5X.Genes2.Genes4X.Genes2.Genes3X.Genes1.Genes6X.Genes1.Genes5X.Genes1.Genes4X.Genes1.Genes3X.Genes1.Genes2

Genes6Genes5Genes4Genes3Genes2Genes1

R2 = 0.2

GGEE1 GGEE2 CCA PCA PLS

X.Genes5.Genes6X.Genes4.Genes6X.Genes4.Genes5X.Genes3.Genes6X.Genes3.Genes5X.Genes3.Genes4X.Genes2.Genes6X.Genes2.Genes5X.Genes2.Genes4X.Genes2.Genes3X.Genes1.Genes6X.Genes1.Genes5X.Genes1.Genes4X.Genes1.Genes3X.Genes1.Genes2

Genes6Genes5Genes4Genes3Genes2Genes1

R2 = 0.2

: Main effects:gene 1gene 2

: Interaction effects:gene 1 x gene 2

: Interaction effects:gene 3 x gene 4

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 33 / 54

Page 66: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Second comparison: G-GEE and real genotypes

Settings

Main effects:gene 1gene 2

Interaction effects:gene 1 x gene 2

Main effects:gene 1gene 2

Interaction effects:gene 3 x gene 4

Main effects:-

Interaction effects:gene 1 x gene 2

Wangsimulationmodel

PCAsimulationmodel

0.0

0.2

0.4

0.6

0.8

1.0

0.1 0.2 0.3 0.4R2

GGEE PCA PLS

0.0

0.2

0.4

0.6

0.8

1.0

0.1 0.2 0.3 0.4R2

GGEE PCA PLS

0.0

0.2

0.4

0.6

0.8

1.0

0.1 0.2 0.3 0.4R2

GGEE PCA PLS

0.0

0.2

0.4

0.6

0.8

1.0

0.1 0.2 0.3 0.4R2

GGEE PCA PLS

0.0

0.2

0.4

0.6

0.8

1.0

0.1 0.2 0.3 0.4R2

GGEE PCA PLS

0.0

0.2

0.4

0.6

0.8

1.0

0.1 0.2 0.3 0.4R2

GGEE PCA PLS

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 34 / 54

Page 67: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Second comparison: G-GEE and real genotypes - R2 = 0.2

Settings

Main effects:gene 1gene 2

Interaction effects:gene 1 x gene 2

Main effects:gene 1gene 2

Interaction effects:gene 3 x gene 4

Main effects:-

Interaction effects:gene 1 x gene 2

Wangsimulationmodel

PCAsimulationmodel

GGEE PCA PLSX.Gene5.Gene6X.Gene4.Gene6X.Gene4.Gene5X.Gene3.Gene6X.Gene3.Gene5X.Gene3.Gene4X.Gene2.Gene6X.Gene2.Gene5X.Gene2.Gene4X.Gene2.Gene3X.Gene1.Gene6X.Gene1.Gene5X.Gene1.Gene4X.Gene1.Gene3X.Gene1.Gene2

Gene6Gene5Gene4Gene3Gene2Gene1

GGEE PCA PLSX.Gene5.Gene6X.Gene4.Gene6X.Gene4.Gene5X.Gene3.Gene6X.Gene3.Gene5X.Gene3.Gene4X.Gene2.Gene6X.Gene2.Gene5X.Gene2.Gene4X.Gene2.Gene3X.Gene1.Gene6X.Gene1.Gene5X.Gene1.Gene4X.Gene1.Gene3X.Gene1.Gene2

Gene6Gene5Gene4Gene3Gene2Gene1

GGEE PCA PLSX.Gene5.Gene6X.Gene4.Gene6X.Gene4.Gene5X.Gene3.Gene6X.Gene3.Gene5X.Gene3.Gene4X.Gene2.Gene6X.Gene2.Gene5X.Gene2.Gene4X.Gene2.Gene3X.Gene1.Gene6X.Gene1.Gene5X.Gene1.Gene4X.Gene1.Gene3X.Gene1.Gene2

Gene6Gene5Gene4Gene3Gene2Gene1

GGEE PCA PLSX.Gene5.Gene6X.Gene4.Gene6X.Gene4.Gene5X.Gene3.Gene6X.Gene3.Gene5X.Gene3.Gene4X.Gene2.Gene6X.Gene2.Gene5X.Gene2.Gene4X.Gene2.Gene3X.Gene1.Gene6X.Gene1.Gene5X.Gene1.Gene4X.Gene1.Gene3X.Gene1.Gene2

Gene6Gene5Gene4Gene3Gene2Gene1

GGEE PCA PLSX.Gene5.Gene6X.Gene4.Gene6X.Gene4.Gene5X.Gene3.Gene6X.Gene3.Gene5X.Gene3.Gene4X.Gene2.Gene6X.Gene2.Gene5X.Gene2.Gene4X.Gene2.Gene3X.Gene1.Gene6X.Gene1.Gene5X.Gene1.Gene4X.Gene1.Gene3X.Gene1.Gene2

Gene6Gene5Gene4Gene3Gene2Gene1

GGEE PCA PLSX.Gene5.Gene6X.Gene4.Gene6X.Gene4.Gene5X.Gene3.Gene6X.Gene3.Gene5X.Gene3.Gene4X.Gene2.Gene6X.Gene2.Gene5X.Gene2.Gene4X.Gene2.Gene3X.Gene1.Gene6X.Gene1.Gene5X.Gene1.Gene4X.Gene1.Gene3X.Gene1.Gene2

Gene6Gene5Gene4Gene3Gene2Gene1

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 35 / 54

Page 68: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Simulations studies

First comparison: PCA, PLS and CCAChoosing the parameters

Second comparison: G-GEEc1 and G-GEEc2Using completely simulated genotypeUsing genotype from a real data set

Third comparison: Case-control methods

Fourth comparison: Investigation of new interaction variable definitions

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 36 / 54

Page 69: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Third comparison: Case-control methods

Methods defined outside a regression framework

Aggregating tests: minP (Emily et al. ,2016): GATES (Li et al., 2011)

Co-association test: PLSPM (Zhang et al., 2013)

LD based test: CLD (Rajapakse et al., 2012)

Entropy based method: GBIBM (Li et al., 2015)

Package R: GeneGeneInteR (Emily et al. ,2017)

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 37 / 54

Page 70: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Third comparison: Case-control methods

Design:Real GenotypesContinuous phenotype simulation from Wang X et al., 2014 :

GGEE CLD PLSPM GBIGM minP GATES

X.Gene5.Gene6X.Gene4.Gene6X.Gene4.Gene5X.Gene3.Gene6X.Gene3.Gene5X.Gene3.Gene4X.Gene2.Gene6X.Gene2.Gene5X.Gene2.Gene4X.Gene2.Gene3X.Gene1.Gene6X.Gene1.Gene5X.Gene1.Gene4X.Gene1.Gene3X.Gene1.Gene2

Gene6Gene5Gene4Gene3Gene2Gene1

GGEE CLD PLSPM GBIGM minP GATES

X.Gene5.Gene6X.Gene4.Gene6X.Gene4.Gene5X.Gene3.Gene6X.Gene3.Gene5X.Gene3.Gene4X.Gene2.Gene6X.Gene2.Gene5X.Gene2.Gene4X.Gene2.Gene3X.Gene1.Gene6X.Gene1.Gene5X.Gene1.Gene4X.Gene1.Gene3X.Gene1.Gene2

Gene6Gene5Gene4Gene3Gene2Gene1

Main effects:gene 1gene 2

Interaction effects:gene 1 x gene 2-

Main effects:gene 1gene 2

Interaction effects:gene 3 x gene 4-

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 38 / 54

Page 71: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Third comparison: Case-control methods

Design:Completely simulated GenotypesContinuous phenotype simulation from Wang X et al., 2014 :

GGEE minPX.Gene5.Gene6X.Gene4.Gene6X.Gene4.Gene5X.Gene3.Gene6X.Gene3.Gene5X.Gene3.Gene4X.Gene2.Gene6X.Gene2.Gene5X.Gene2.Gene4X.Gene2.Gene3X.Gene1.Gene6X.Gene1.Gene5X.Gene1.Gene4X.Gene1.Gene3X.Gene1.Gene2

Gene6Gene5Gene4Gene3Gene2Gene1

GGEE minPX.Gene5.Gene6X.Gene4.Gene6X.Gene4.Gene5X.Gene3.Gene6X.Gene3.Gene5X.Gene3.Gene4X.Gene2.Gene6X.Gene2.Gene5X.Gene2.Gene4X.Gene2.Gene3X.Gene1.Gene6X.Gene1.Gene5X.Gene1.Gene4X.Gene1.Gene3X.Gene1.Gene2

Gene6Gene5Gene4Gene3Gene2Gene1

Main effects:gene 1gene 2

Interaction effects:gene 1 x gene 2-

Main effects:gene 1gene 2

Interaction effects:gene 3 x gene 4-

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 39 / 54

Page 72: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Simulations studies

First comparison: PCA, PLS and CCAChoosing the parameters

Second comparison: with G-GEEc1 and G-GEEc2Using completely simulated genotypeUsing genotype from a real data set

Third comparison: Case-control methods

Fourth comparison: Investigation of new interaction variable definitions

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 40 / 54

Page 73: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Fourth comparison: Machine Learning based approaches

With G-GEEc2, we looked for:

u = arg maxu,‖u‖=1

cov2(y , fu(X r ,X s))

with fu(X r ,X s) = F rsu and F rs = X rijX

sik

j=1··· ,pr ;k=1,··· ,psi=1···n

We now find new functions fu(X r ,X s) that maximized the criteria:

EX r ,X s ,Y [(y − fu(X r ,X s))2]

With the following non parametric approaches:Random ForestsBoostingSVMNeural Network

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 41 / 54

Page 74: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Fourth comparison: Machine Learning based approaches -R2 = 0.4

Design:Real GenotypesContinuous phenotype simulation from Wang X et al., 2014 :

GGEE RF RF_F BOOST NN

X.Gene5.Gene6X.Gene4.Gene6X.Gene4.Gene5X.Gene3.Gene6X.Gene3.Gene5X.Gene3.Gene4X.Gene2.Gene6X.Gene2.Gene5X.Gene2.Gene4X.Gene2.Gene3X.Gene1.Gene6X.Gene1.Gene5X.Gene1.Gene4X.Gene1.Gene3X.Gene1.Gene2

Gene6Gene5Gene4Gene3Gene2Gene1

GGEE RF RF_F BOOST NN

X.Gene5.Gene6X.Gene4.Gene6X.Gene4.Gene5X.Gene3.Gene6X.Gene3.Gene5X.Gene3.Gene4X.Gene2.Gene6X.Gene2.Gene5X.Gene2.Gene4X.Gene2.Gene3X.Gene1.Gene6X.Gene1.Gene5X.Gene1.Gene4X.Gene1.Gene3X.Gene1.Gene2

Gene6Gene5Gene4Gene3Gene2Gene1

Main effects:gene 1gene 2

Interaction effects:gene 1 x gene 2-

Main effects:gene 1gene 2

Interaction effects:gene 3 x gene 4-

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 42 / 54

Page 75: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Fourth comparison: Machine Learning based approaches -R2 = 0.4

Design:Real GenotypesContinuous phenotype simulation from Wang X et al., 2014 :

GGEE SVMlin SVMpol SVMpol5 SVMrad

X.Gene5.Gene6X.Gene4.Gene6X.Gene4.Gene5X.Gene3.Gene6X.Gene3.Gene5X.Gene3.Gene4X.Gene2.Gene6X.Gene2.Gene5X.Gene2.Gene4X.Gene2.Gene3X.Gene1.Gene6X.Gene1.Gene5X.Gene1.Gene4X.Gene1.Gene3X.Gene1.Gene2

Gene6Gene5Gene4Gene3Gene2Gene1

GGEE SVMlin SVMpol SVMpol5 SVMrad

X.Gene5.Gene6X.Gene4.Gene6X.Gene4.Gene5X.Gene3.Gene6X.Gene3.Gene5X.Gene3.Gene4X.Gene2.Gene6X.Gene2.Gene5X.Gene2.Gene4X.Gene2.Gene3X.Gene1.Gene6X.Gene1.Gene5X.Gene1.Gene4X.Gene1.Gene3X.Gene1.Gene2

Gene6Gene5Gene4Gene3Gene2Gene1

Main effects:gene 1gene 2

Interaction effects:gene 1 x gene 2-

Main effects:gene 1gene 2

Interaction effects:gene 3 x gene 4-

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 43 / 54

Page 76: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Summary1 General context

Complex diseasesGWASEpistasis

2 A new methodGeneral modeling approachInteractions constructionCoefficients estimation

3 Evaluation and comparisonSimulation designs and scenariosSetting parametersComparison with G-GEECase-control methods comparisonsNon parametric interaction modeling approach

4 ApplicationAnkylosing SpondylitisCrohn’s DiseaseAnalysis and results

5 Conclusions

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 44 / 54

Page 77: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Ankylosing Spondylitis

Chronic inflammatory disease of the axial skeleton

Epidemiology:Age at first symptoms: 20 - 30 yearsSexe: predominance for men (sex ratio 2M:1W)Prevalence: depend of populations (0.1% - 1.4%)

http://b4tea.com/

Risk factors:Strong genetic component(heritability >90%)Importance of HLA complex

HLA complex:Localized on chromosome 6Regroup about 200 genesCoding the immunity systemAntigen HLA-B27 : associated toSPA

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 45 / 54

Page 78: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Crohn’s Disease

Form of chronic inflammation bowel disease

Epidemiology:

Prevalence: 10-30 per 100, 000(Europe and North America)More common in theindustrialized worldMedian onset of disease: 30 years

Ananthakrishnan, Nat. Rev. Gastroenterol. Hepatol 2015

Multiple risk factors:

EnvironmentalMicrobiotaGenetic

Genetic factors:: NOD2, first identified mutation: Potential interactions:

NOD2 and TLR proteinsNOD2 and CTLA4IL23R and CTLA4NOD2 and IBD5IBD5, ATGL16L1 and IL23R

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 46 / 54

Page 79: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Quality controls and filtering

Markers filtering:

SNP call rate ≤ 95%

MAF ≤ 5%

Deviation from Hardy Weinberg Equilibrium in controls (p < 1× 10−5)

Duplicates

SNPs not belonging to one unique gene

Subject filtering:

Sample call rate ≤ 93%

Duplicates

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 47 / 54

Page 80: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Ankylosing Spondylitis

Data set: International Genetics of Ankylosing Spondylitis study

401 cases357 controls6 611 genes51 287 SNPs

: 29 known genes: 62 genes from an univariate analysis: 91 genes to investigate

Significant resultsG-GEE NKX2-3 x HCG27PLS HLA-B

HCP5HLAB x HCG27

PCA HLA-BEOMES x HCP5IL1R2 x MICB

ZFP57 x LOC101929772TRIM31 x HCG26

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 48 / 54

Page 81: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Ankylosing Spondylitis

Data set: International Genetics of Ankylosing Spondylitis study

401 cases357 controls6 611 genes51 287 SNPs

: 29 known genes: 62 genes from an univariate analysis: 91 genes to investigate

Significant resultsG-GEE NKX2-3 x HCG27PLS HLA-B

HCP5HLAB x HCG27

PCA HLA-BEOMES x HCP5IL1R2 x MICB

ZFP57 x LOC101929772TRIM31 x HCG26

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 48 / 54

Page 82: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Ankylosing Spondylitis

Data set: International Genetics of Ankylosing Spondylitis study

401 cases357 controls6 611 genes51 287 SNPs

: 29 known genes: 62 genes from an univariate analysis: 91 genes to investigate

Significant resultsG-GEE NKX2-3 x HCG27PLS HLA-B

HCP5HLAB x HCG27

PCA HLA-BEOMES x HCP5IL1R2 x MICB

ZFP57 x LOC101929772TRIM31 x HCG26

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 48 / 54

Page 83: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Ankylosing Spondylitis

Data set: International Genetics of Ankylosing Spondylitis study

401 cases357 controls6 611 genes51 287 SNPs

: 29 known genes: 62 genes from an univariate analysis: 91 genes to investigate

Significant resultsG-GEE NKX2-3 x HCG27PLS HLA-B

HCP5HLAB x HCG27

PCA HLA-BEOMES x HCP5IL1R2 x MICB

ZFP57 x LOC101929772TRIM31 x HCG26

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 48 / 54

Page 84: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Crohn’s Disease

Data set: Wellcome Trust Case-Control Consortium

1938 cases1500 controls17 304 genes140 487 SNPs

: 72 known genes: 60 genes from an univariate analysis(22 known): 110 genes to investigate

Significant resultsG-GEE LOC105369715 x STAT1

STAT1 x CD6PLS IFNGR1 x SBNO2

IRGM x NOD2PCA IRGM

LOC101929544 x TLR4BATF x IL10

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 49 / 54

Page 85: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Crohn’s Disease

Data set: Wellcome Trust Case-Control Consortium

1938 cases1500 controls17 304 genes140 487 SNPs

: 72 known genes: 60 genes from an univariate analysis(22 known): 110 genes to investigate

Significant resultsG-GEE LOC105369715 x STAT1

STAT1 x CD6PLS IFNGR1 x SBNO2

IRGM x NOD2PCA IRGM

LOC101929544 x TLR4BATF x IL10

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 49 / 54

Page 86: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Crohn’s Disease

Data set: Wellcome Trust Case-Control Consortium

1938 cases1500 controls17 304 genes140 487 SNPs

: 72 known genes: 60 genes from an univariate analysis(22 known): 110 genes to investigate

Significant resultsG-GEE LOC105369715 x STAT1

STAT1 x CD6PLS IFNGR1 x SBNO2

IRGM x NOD2PCA IRGM

LOC101929544 x TLR4BATF x IL10

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 49 / 54

Page 87: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Crohn’s Disease

Data set: Wellcome Trust Case-Control Consortium

1938 cases1500 controls17 304 genes140 487 SNPs

: 72 known genes: 60 genes from an univariate analysis(22 known): 110 genes to investigate

Significant resultsG-GEE LOC105369715 x STAT1

STAT1 x CD6PLS IFNGR1 x SBNO2

IRGM x NOD2PCA IRGM

LOC101929544 x TLR4BATF x IL10

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 49 / 54

Page 88: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Summary1 General context

Complex diseasesGWASEpistasis

2 A new methodGeneral modeling approachInteractions constructionCoefficients estimation

3 Evaluation and comparisonSimulation designs and scenariosSetting parametersComparison with G-GEECase-control methods comparisonsNon parametric interaction modeling approach

4 ApplicationAnkylosing SpondylitisCrohn’s DiseaseAnalysis and results

5 Conclusions

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 50 / 54

Page 89: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Conclusions and perspectives

Contributions:

: Proposition of a new Group LASSO framework

: Proposition of an original interaction modeling

Pubication, software and presentations:

: Package G-GEE available on Github

: Stanislas, V., Dalmasso, C., and Ambroise, C. (2017). Eigen-Epistasis fordetecting gene-gene interactions. BMC Bioinformatics, 18(1):54.

: 4 talks and 3 posters in international conferences

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 51 / 54

Page 90: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Conclusions and perspectives

Limitations:

: Number of SNPs by genes to analyze

: Computation costs for estimation coefficients

: Choice of the genes to consider

: Confusion phenomenon

: Sensitive to group definition

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 52 / 54

Page 91: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Conclusions and perspectives

Perspectives:

: Explore new fu(X r ,X s) definitions

: Optimization of the computational cost of F rs

: New selection of the parameter λ

: Using another penalization regression framework

: Gene selection using biological knowledge

: Investigate other grouping definitions

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 53 / 54

Page 92: Statistical approaches to detect epistasis in Genome Wide

General context A new method Evaluation and comparison Application Conclusions

Thank you for your attention !

V. Stanislas Statistical approaches to detect epistasis in Genome Wide Association Studies 54 / 54