Epidemiology 719
Quantitative methods in genetic epidemiology Bhramar Mukherjee and Sebastian Zoellner [email protected] [email protected]
Acknowledgements
• Peter Kraft (HSPH)
• Ken Rice (UW)
• Nilanjan Chatterjee (NCI)
• Stephen Channock (NCI)
• Lu Wang (UM)
• Nan Laird (HSPH)
• Goncalo Abecasis (UM)
Course OverviewA brave new world
Reverse Effects
Central Course Theme
Genetic Association and Gene-Environment Interaction
Course Advice for You:
Assigned Paper 1
Assigned Paper 1
• GWAS of Age-related macular degeneration
• Initial GWAS identified four loci explaining one-half of the heritability. Appreciable predictive power.
• Additional GWAS to explain remaining heritability. Combined scan vs replication. Meta-Analysis.
Assigned Paper 2
Assigned Paper 2
• Collaborative Association Study of Psoriasis
• Examined ~1,500 cases / ~1,500 controls at ~500,000 SNPs
• • Examined 20 promising SNPs in extra ~5,000 cases / ~5,000 controls
• Outcome: 7 regions of confirmed association with psoriasis
Assigned Paper 3
Assigned Paper 3
• Meta-analysis of colorectal cancer (COGENT study) .
• A thorough evaluation of ten confirmed loci for colorectal cancer. Very detailed. Supplementary material also available online.
• Interesting combination of various study design.
Tests for Association
Basic principle of GWAS
Depends on study design
• Case-control study
• Family-based study: case-parent triad, case-sib pairs being popular choices
• Longitudinal Cohort Study
• Looking at a secondary outcome under case-control sampling
The GWAS Mantra!
Primary Analysis
• Single marker association tests
• Genetic susceptibility model
- Dominant, recessive, co-dominant
• Which test to use
• Multiple testing correction
Case-Control Study: Standard Analysis
Pros and Cons
• Simple, Complete.
• Robust to misspecification of the true dominance pattern
• Less powerful.
• Unreliable for sparse table
Pros and Cons
• Test statistic has single df, so more powerful.
• Simple to report.
• Not robust to true mode of dominance
• Does not present entire information in the data.
Armitage’s trend test
• Test linear trend in log(OR) with # A allele
• Test statistic still has single d.f.• Simplicity, use information from the 2 x 3 array• More robust than 2 x 2 tests, but less robust
than the 2 d.f. test.
Allelic test
• Previous tests were based on genotype
• Can also treat allele as the unit of observation.
• You have doubled the sample size!!
But…
• Serious impact on Type 1 error under departures from HWE
• Interpretation becomes trickier.
Example
AIC: Akaike information criterion, lower the value, better is model fit
Using logistic regression• Trick: Just code genotype differently• Dominant: G=1 if AA or Aa, 0 otherwise• Recessive: G=1 if AA, 0 otherwise• Trend: G=# A alleles, thus G=2 if AA, =1 if Aa
and 0 if aa• Two df test: Create two dummy variables:G1=1 if Aa and 0 otherwiseG2=1 if AA and 0 otherwisePerform likelihood ratio test of full (G1 and G2) vs
reduced model (No G1, G2).• Adjust for other variables, fit a multivariate model
Example
Flip Flops
• Under a co-dominant model you see a non-monotone trend, i.e.,
• OR(Aa)<1 and OR(AA)>1
• You will likely miss these under the trend test.
Alternative tests• Use alternative maximal test statistic• Calculate dominant, recessive, trend, co-
dominabt: take maximum test statistic• Use permutation to get right P-valueCaveats• Resist temptations of going on to a fishing
expedition. “MOST SIGNIFICANT CODING”
• Mode of inheritance models were developed for simple mendelian disease with near-complete penetrance, much more difficult to believe for complex diseases.
Reliable Test
• Co-dominant is model free
• Not much loss of power unless AA (homozygous carriers) are very rare.
• Log additive is what is reported most of the time to risk some false positives but enhance power.
QQ Plots• Nice visual tools for checking association and
systematic biases
• Plot observed (-log10)P-values versus expected under the global null (i.e. quantiles draw from U[0,1])
• Since vast majority (>99%) of tested markers are not associated with the trait, plot should fall along y=x line (if we are lucky we will see a few departures in the tail).
• Departures could be due to stratification, cryptic relatedness, differential genotyping error, incorrect test.
Clear population stratification bias
Family Based Studies
• You heard about population stratification.• Solutions: -Match on self-reported ethnicity-Adjust for Principal components extracted from
markers.-Use family based controlsCase-sib (conditional logistic regression)Case-parent (TDT, FBAT)Nuclear Families Extended Pedigrees
Why use families
• Robust tests.
• Detect genotyping error with inheritance impossibilities. (mother and father AA, offspring Aa).
• Do not have to think about selection of “good” controls.
Why not use families
• Case-control typically more powerful unless the disease is very rare.
• Not just logistic regression or trend chi-squared as analysis tools.
• Much harder to recruit (depending on the disease : late onset or childhood disorder).
Hypotheses
• Case-control design
• Family-based designs have no power to detect association unless linkage is present.
• When testing for association in family-based design, HA is always: both linkage and association is present between the marker and disease susceptibility locus (DSL) underlying the trait.
The null hypotheses
Mendelian Transmission
Transmission Disequilibrium Test
• Spielman et al, AJHG, 1998
• H0: Association but no linkage
A simple statistical test
• No variation in diagonal elements (homozygote parents have no uncertainty in determining the conditional genotype distribution of the offspring).
• Under the null (i.e., Mendelian transmission) x| x+y~Binomial (n=x+y, p=1/2)
• Similarly, like McNemar’s test:
TDT Example
Test statistic:
FBAT : Family based association tests
• Extending TDT beyond case-parent trios
• Test statistic for FBAT mimics a natural covariance function between trait and genotype.
• i: family j: individual Sum over all i and j
• T: Trait (centered) X: Coded Genotype
• S: parental genotype or a sufficient statistic for parental genotype
Details
• The E(X|S) is calculated under Mendelian transmission.
• X-E(X|S): residual of the transmission of parental genotype to offspring.
• You basically assess whether there is any association between the trait and this genotype residual.
Test statistic
• Under all three nulls, E(U)=0.
• Note all expectation and variance are on X, conditional on parental genotype and trait T.
Test Statistic:
Another Tool: Conditional Likelihood
• Used in case-sib studies, in general in matched studies. Strata: Family/pair.
• Breslow et al (1978) first proposed this tool for matched case-control data.
• R function clogit does this. Underlying codes use survival model as there is connection with partial likelihood from Cox’s proportional hazard model.
• For case-sib studies, sib is the control and the genotype is the exposure. The contribution of a given pair to the conditional likelihood is:
• exp[β Genotype(case)] exp[β Genotype(case)]+exp[β Genotype(control)] -Obtain variance-covariance of conditional MLE
using inverse Fisher information.
Summary
• Different tests for association in case-control studies: 2 by 3 table and logistic regression.
• Family based studies: tests and hypotheses.
• Study design choices, power, recruitment.