Introduction to Genetic Association Analysis in Families
Hae-Won Uh
Department of Medical Statistics and Bioinformatics, LUMC
Leiden, June 24, 2011
1 / 43
Overview
1 Background
2 Genetic association analysis in families
  Score test
  Genetic correlation
  Using imputed data
  Incorporating family history
3 Summary and future work
2 / 43
Challenge: finding genetic variants affecting human health
Chromosomes: 22 pairs of autosomes, plus X and Y
Genes: ca. 23,000 protein-coding genes
Base pairs: ca. 3 billion DNA base pairs
Variants: 99.9% of bases are identical between any two people → ca. 3 million SNPs
3 / 43
Terminology
DNA: strings of bases, A, T, G or C
Gene: a segment of DNA
Single Nucleotide Polymorphism (SNP): a one-letter variation in the DNA sequence
4 / 43
Genetic concept
Genotype (or SNP) AA, Aa, aa is coded as
  autosome: 0, 1, or 2 copies of the minor allele a
  X: 0, 1, or 2 for females & 0 or 2 for males
Hardy-Weinberg Equilibrium (HWE) assumption
  the alleles at the pair of chromosomes are independent
  $p$: frequency of the minor allele
  $P(AA) = (1-p)^2$, $P(Aa) = 2p(1-p)$, $P(aa) = p^2$
Genetic model
  specifies the relationship between genotypes and the disease
  dominant, recessive, additive, X-linked
5 / 43
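The HWE probabilities and the genotype codings can be checked numerically. A minimal Python sketch, assuming the common indicator codings for the dominant (at least one a) and recessive (aa only) models; the function names are hypothetical and X-linked coding is left out:

```python
def hwe_genotype_probs(p):
    """P(AA), P(Aa), P(aa) under HWE, with p the minor-allele (a) frequency."""
    return {"AA": (1 - p) ** 2, "Aa": 2 * p * (1 - p), "aa": p ** 2}

def code_genotype(n_minor, model):
    """Code an autosomal genotype (0, 1 or 2 copies of a) under a genetic model.

    The dominant/recessive indicator codings are an assumed convention.
    """
    if model == "additive":
        return n_minor               # 0, 1, 2
    if model == "dominant":
        return int(n_minor >= 1)     # carries at least one a
    if model == "recessive":
        return int(n_minor == 2)     # aa only
    raise ValueError(f"unknown model: {model}")

def mean_var(p, model):
    """Mean and variance of the coded genotype under HWE."""
    probs = hwe_genotype_probs(p)
    n_minor = {"AA": 0, "Aa": 1, "aa": 2}
    mean = sum(probs[g] * code_genotype(n_minor[g], model) for g in probs)
    second = sum(probs[g] * code_genotype(n_minor[g], model) ** 2 for g in probs)
    return mean, second - mean ** 2

print(mean_var(0.3, "additive"))   # approx. (2p, 2p(1 - p)) = (0.6, 0.42)
print(mean_var(0.3, "recessive"))  # approx. (p^2, p^2(1 - p^2)) = (0.09, 0.0819)
```

Under additive coding this reproduces $\mathrm{E}\,X = 2p$ and $\mathrm{Var}\,X = 2p(1-p)$, the moments used for the sibling-pair variances later on.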
Overview
Background
Genetic association analysis in families
  Score test
  Genetic correlation
  Using imputed data
  Incorporating family history
Summary and future work
6 / 43
Methods for finding genetic variants
Linkage: studies linkage of marker alleles and disease within families
Association: seeks a marker allele that is present more frequently in cases than in controls; can be more powerful than linkage methods for common diseases
7 / 43
Use of families in population-based association studies
Advantage: most common diseases demonstrate familial aggregation; more powerful than (traditional) family-based tests
Challenge: need tests that correct for relatedness between subjects
8 / 43
Example: sibling pair - control study
Leiden Longevity Study (LLS)
9 / 43
LLS genome-wide association study (GWAS)
Study sample: sibling pairs from 421 families (mean age 94 y); 1670 controls (mean age 58 y)
(genotyped) SNPs: 500K
After imputation: 2.5 million
Score test: retrospective likelihood; corrects for correlation within families
10 / 43
Selective genotyping
Notation
  $X_i$: the genotype 0, 1, or 2 for subject $i$ (for $i = 1, \dots, n$)
  $Y_i$: the phenotype
  $\bar{Y}$: the mean of $Y$, or the proportion of cases in case-control studies
  $S$: the ascertainment event
Retrospective likelihood: $P(X \mid Y, S) = P(X \mid Y)$
12 / 43
Score test for independent cases
The score statistic $U_X = \sum_{i=1}^{n} (Y_i - \bar{Y}) X_i$
$U_X$ is asymptotically normally distributed under $H_0$ with zero mean and variance
$$\mathrm{Var}\, U_X = V_X \sum_{i=1}^{n} (Y_i - \bar{Y})^2,$$
where $V_X$ is the variance of $X_i$.
Under $H_0$, the ratio $U_X^2 / \mathrm{Var}\, U_X \sim \chi^2_{(1)}$.
To account for the correlations between relatives, modify $V_X$ (Uh et al, BMC Proc, 2009)
13 / 43
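A minimal Python sketch of this score test for independent subjects. The function name is hypothetical, $V_X$ is estimated here by the sample variance of the genotypes (an assumption), and the 1-df chi-square p-value uses the identity $P(\chi^2_1 > s) = \operatorname{erfc}(\sqrt{s/2})$:

```python
import math

def score_test_independent(y, x):
    """Score test for n independent subjects.

    y: phenotypes (0/1 for case-control, or quantitative values)
    x: genotypes coded 0, 1, 2
    Assumption: V_X is the sample variance of x (divisor n).
    Returns (U_X, Var U_X, chi-square statistic, p-value on 1 df).
    """
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    v_x = sum((xi - x_bar) ** 2 for xi in x) / n
    u = sum((yi - y_bar) * xi for yi, xi in zip(y, x))
    var_u = v_x * sum((yi - y_bar) ** 2 for yi in y)
    stat = u ** 2 / var_u
    # Upper tail of chi-square(1): P(Z^2 > s) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return u, var_u, stat, p_value

# Two cases followed by two controls (toy data)
print(score_test_independent([1, 1, 0, 0], [2, 1, 0, 1]))  # (1.0, 0.5, 2.0, ~0.157)
```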
Overview
Background
Genetic association analysis in families
  Score test
  Genetic correlation
  Using imputed data
  Incorporating family history
Summary and future work
14 / 43
Genetic correlation
To account for the correlations between relatives we modify $V_X$ using
  Correlation coefficients $\rho_{ij}$
  Kinship coefficients $\Phi_{ij}$
  Identity by descent (IBD) sharing
15 / 43
Identity by descent (IBD)
Two alleles that are copies of a common ancestral allele are said to be IBD.
Subjects 3 and 4 share IBD = 1 (the paternal allele, a)
But in these families, they share 0 and 2 alleles IBD, respectively
16 / 43
IBD sharing by relative pairs
When unable to assign IBD sharing, we can assess the probabilities that two individuals share 0, 1 or 2 alleles IBD: $(\pi_0, \pi_1, \pi_2)$
Prior IBD probabilities are the probabilities of IBD sharing conditional only on the relationship between the 2 subjects
The prior values for a sibling pair: $(\pi_0, \pi_1, \pi_2) = (1/4, 1/2, 1/4)$
17 / 43
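The sib-pair prior $(1/4, 1/2, 1/4)$ follows from enumerating the 16 equally likely Mendelian transmissions, assuming the four parental alleles are distinct by descent (non-inbred parents). A short Python sketch with a hypothetical function name:

```python
from fractions import Fraction
from itertools import product

def sib_ibd_probs():
    """Prior IBD probabilities (pi0, pi1, pi2) for a full-sib pair.

    Each sib independently receives one of the father's two alleles and one
    of the mother's two alleles, each with probability 1/2.
    """
    counts = [0, 0, 0]
    # (paternal allele of sib 1, maternal of sib 1, paternal of sib 2, maternal of sib 2)
    for f1, m1, f2, m2 in product((0, 1), repeat=4):
        ibd = int(f1 == f2) + int(m1 == m2)  # alleles shared IBD
        counts[ibd] += 1
    total = sum(counts)  # 16 equally likely transmissions
    return tuple(Fraction(c, total) for c in counts)

print(sib_ibd_probs())  # (Fraction(1, 4), Fraction(1, 2), Fraction(1, 4))
```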
Kinship coefficients $\Phi_{ij}$
Measure of relatedness between two individuals $i$ and $j$
Probability that two alleles, one sampled at random from each of the two individuals, are IBD
Derived from prior IBD probabilities:
$$\Phi_{ij} = \frac{1}{4}\pi_1 + \frac{1}{2}\pi_2.$$
18 / 43
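A small Python table of $\Phi_{ij}$ computed from the standard prior IBD probabilities of some non-inbred relatives. The relationship list is illustrative, and the printed $\rho = 2\Phi$ is the relation stated on the correlation slides:

```python
from fractions import Fraction as F

# Prior IBD probabilities (pi0, pi1, pi2), non-inbred relatives
PRIOR_IBD = {
    "MZ twins": (F(0), F(0), F(1)),
    "parent-offspring": (F(0), F(1), F(0)),
    "full sibs": (F(1, 4), F(1, 2), F(1, 4)),
    "half sibs": (F(1, 2), F(1, 2), F(0)),
    "grandparent-grandchild": (F(1, 2), F(1, 2), F(0)),
    "first cousins": (F(3, 4), F(1, 4), F(0)),
}

def kinship(pi):
    """Phi_ij = (1/4) pi1 + (1/2) pi2."""
    _, pi1, pi2 = pi
    return F(1, 4) * pi1 + F(1, 2) * pi2

for rel, pi in PRIOR_IBD.items():
    phi = kinship(pi)
    print(f"{rel}: Phi = {phi}, rho = 2 * Phi = {2 * phi}")
```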
Calculation of $\Phi_{ij}$
19 / 43
IBD sharing and kinship by relationship
20 / 43
Correlation coefficients of relatives
Define the correlation matrix K as follows
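$$K = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{12} & 1 & \cdots & \rho_{2n} \\ \vdots & & \ddots & \vdots \\ \rho_{1n} & \rho_{2n} & \cdots & 1 \end{pmatrix}.$$
$\rho_{ij}$ is twice the prior kinship coefficient $\Phi_{ij}$.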
21 / 43
IBD sharing, kinship and correlation coefficients
ρ = 2× Φ
22 / 43
Correlation coefficients of relatives II
$T$: the transition matrix from the ITO matrices of Li and Sacks (Biometrics, 1954)
$$\rho_{ij} = \pi_2 + \pi_1 \rho_T = \left(\frac{1}{2}\right)^R,$$
where $\pi_k$ is the probability that the specified relatives share $k$ alleles IBD and $R$ is the degree of relationship. For autosomal loci, $\rho_T$ equals 1/2.
23 / 43
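A quick check, with exact fractions in Python, that $\pi_2 + \pi_1 \rho_T$ with $\rho_T = 1/2$ gives $(1/2)^R$ for a few single-line relationships, using their standard prior IBD probabilities. This is an illustrative sketch; the function name is hypothetical:

```python
from fractions import Fraction as F

def li_sacks_corr(pi1, pi2, rho_t):
    """Genotypic correlation of two relatives: rho_ij = pi2 + pi1 * rho_T."""
    return pi2 + pi1 * rho_t

# (pi1, pi2) and degree of relationship R
cases = {
    "parent-offspring": ((F(1), F(0)), 1),
    "full sibs": ((F(1, 2), F(1, 4)), 1),
    "half sibs": ((F(1, 2), F(0)), 2),
    "grandparent-grandchild": ((F(1, 2), F(0)), 2),
    "first cousins": ((F(1, 4), F(0)), 3),
}
for name, ((pi1, pi2), degree) in cases.items():
    rho = li_sacks_corr(pi1, pi2, F(1, 2))  # autosomal: rho_T = 1/2
    assert rho == F(1, 2) ** degree
    print(f"{name} (R = {degree}): rho = {rho}")
```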
Correlation coefficients of relatives III
For autosomal genes
Multiplicative effect
  grandparent-grandchild: $0 + 1/2 \cdot \rho_T = 1/4$
  double first cousins: $1/16 + 6/16 \cdot \rho_T = 1/4$
Recessive effect
  Under HWE, $\rho_T = p/(1+p)$ with minor allele frequency $p$. The correlation of a sib pair is
$$\rho^{rec}_{ij} = \frac{1}{4} + \frac{1}{2}\left(\frac{p}{1+p}\right) = \frac{1+3p}{4(1+p)}$$
24 / 43
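The recessive sib-pair correlation can be verified from first principles. Given $k$ alleles shared IBD, both sibs are aa only if all $4 - k$ distinct alleles involved are a, so $P(\text{both aa}) = \sum_k \pi_k p^{4-k}$. The Python sketch below (hypothetical names) compares this with the closed form, taking the variance of the aa indicator, a Bernoulli($p^2$) variable, as $p^2(1-p^2)$:

```python
def recessive_sib_corr(p):
    """Correlation of the indicators 1{aa} for a full-sib pair under HWE.

    Given k alleles shared IBD, the pair carries 4 - k distinct alleles
    (k shared plus 2 * (2 - k) unshared); all must be a for both to be aa.
    """
    pi = (0.25, 0.5, 0.25)  # prior IBD probabilities for full sibs
    p_both = sum(pi[k] * p ** (4 - k) for k in range(3))
    mean = p ** 2
    var = p ** 2 * (1 - p ** 2)  # Bernoulli(p^2) variance
    return (p_both - mean ** 2) / var

def recessive_sib_corr_formula(p):
    """Closed form (1 + 3p) / (4 (1 + p))."""
    return (1 + 3 * p) / (4 * (1 + p))

for p in (0.05, 0.2, 0.5):
    print(p, recessive_sib_corr(p), recessive_sib_corr_formula(p))
```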
Correlation coefficients of relatives IV
For X-linked SNPs, four basic correlations:
$\rho_{T;f,f} = 1/2$, $\rho_{T;f,m} = \rho_{T;m,f} = 1/\sqrt{2}$, $\rho_{T;m,m} = 0$,
where the subscripts indicate female pairs, mixed pairs, and male pairs, respectively.
For example, full sisters: $1/2 + 1/2\,\rho_{T;f,f} = 3/4$
maternal uncle and niece: $1/4\,\rho_{T;m,f} = 1/4 \cdot 1/\sqrt{2}$
26 / 43
Variance of autosomal SNP for sibling pair (s1, s2)
Multiplicative effect
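$\mathrm{E}\,X_{s1} = 2p$, $\mathrm{Var}\,X_{s1} = 2p(1-p)$
$\mathrm{Var}(X_{s1} + X_{s2}) = \mathrm{Var}\,X_{s1} + \mathrm{Var}\,X_{s2} + 2\,\mathrm{Cov}(X_{s1}, X_{s2})$
$\mathrm{Cov}(X_{s1}, X_{s2}) = \rho_{s1,s2}\sqrt{\mathrm{Var}\,X_{s1}}\sqrt{\mathrm{Var}\,X_{s2}} = (1/2)\{2p(1-p)\}$
Recessive effect
$\mathrm{E}\,X_{s1} = p^2$, $\mathrm{Var}\,X_{s1} = p^2(1-p^2)$
$\mathrm{Cov}(X_{s1}, X_{s2}) = \dfrac{1+3p}{4(1+p)}\, p^2(1-p^2)$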
27 / 43
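A Python sketch of $\mathrm{Var}(X_{s1} + X_{s2})$ for a sib pair under both models. It assumes HWE and takes the recessive variance as that of a Bernoulli($p^2$) indicator, $p^2(1-p^2)$; for the multiplicative model the result is $6p(1-p)$. The function name is hypothetical:

```python
def sib_pair_sum_variance(p, model):
    """Var(X_s1 + X_s2) = 2 Var X + 2 Cov for a full-sib pair under HWE."""
    if model == "multiplicative":
        var = 2 * p * (1 - p)                # additive 0/1/2 coding
        rho = 0.5
    elif model == "recessive":
        var = p ** 2 * (1 - p ** 2)          # indicator of aa, Bernoulli(p^2)
        rho = (1 + 3 * p) / (4 * (1 + p))
    else:
        raise ValueError(f"unknown model: {model}")
    cov = rho * var  # rho * sqrt(Var X_s1) * sqrt(Var X_s2), equal variances
    return 2 * var + 2 * cov

print(sib_pair_sum_variance(0.3, "multiplicative"))  # 6p(1 - p) = 1.26, up to rounding
print(sib_pair_sum_variance(0.2, "recessive"))
```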
Variance of X-linked SNP for sibling pair
Females carry 2 copies: $X \in \{0, 1, 2\}$ and $\mathrm{E}\,X_f = 2p$
Males carry 1 copy: $X \in \{0, 2\}$ and $\mathrm{E}\,X_m = 2p$
Variance of females or males
Females: $\sigma^2_f = 2p(1-p)$
Males: $\sigma^2_m = 4p(1-p)$
Covariance of sibling pairs
Sister-Sister: $[1/2 + 1/2\,\rho_{T;f,f}]\,\sigma^2_f = (3/4)\,2p(1-p)$
Brother-Brother: $[1/2 + 0\,\rho_{T;m,m}]\,\sigma^2_m = 2p(1-p)$
Sister-Brother: $[0 + 1/2\,\rho_{T;m,f}]\sqrt{\sigma^2_m}\sqrt{\sigma^2_f} = p(1-p)$
29 / 43
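The three X-linked sibling covariances can be reproduced with a short Python sketch. It uses the basic correlations $\rho_{T;\cdot,\cdot}$ from the X-linked correlation slide and the $(\pi_1, \pi_2)$ weights that appear in the bracketed factors of the sibling covariances; the names are hypothetical:

```python
import math

# Basic X-linked transmission correlations (female/male pairs)
RHO_T = {("f", "f"): 0.5, ("f", "m"): 1 / math.sqrt(2),
         ("m", "f"): 1 / math.sqrt(2), ("m", "m"): 0.0}

def x_variance(p, sex):
    """Variance of an X-linked genotype: females coded 0/1/2, males 0/2."""
    return 2 * p * (1 - p) if sex == "f" else 4 * p * (1 - p)

def x_sib_cov(p, sex1, sex2, pi1, pi2):
    """Cov = [pi2 + pi1 * rho_T] * sqrt(var1) * sqrt(var2)."""
    rho = pi2 + pi1 * RHO_T[(sex1, sex2)]
    return rho * math.sqrt(x_variance(p, sex1)) * math.sqrt(x_variance(p, sex2))

p = 0.3
print(x_sib_cov(p, "f", "f", 0.5, 0.5))  # sister-sister: (3/4) 2p(1 - p)
print(x_sib_cov(p, "m", "m", 0.0, 0.5))  # brother-brother: 2p(1 - p)
print(x_sib_cov(p, "m", "f", 0.5, 0.0))  # sister-brother: p(1 - p)
```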
Score test for related cases
The score statistic $U_X = (Y - \bar{Y})^\top X$, with
$$\mathrm{Var}\, U_X = (Y - \bar{Y})^\top K (Y - \bar{Y})\, \sigma^2_X,$$
for binary & quantitative outcome $Y$
Under $H_0$, the ratio $U_X^2 / \mathrm{Var}\, U_X \sim \chi^2_{(1)}$
Implemented in CCassoc & QTassoc (www.msbi.nl/uh)
31 / 43
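A minimal Python sketch of the score test for related subjects. It is not the CCassoc/QTassoc implementation: the function name is hypothetical, $K$ is supplied directly, and $\sigma^2_X$ is estimated as $2\hat p(1-\hat p)$ from the sample allele frequency (assuming HWE and additive coding):

```python
import math

def score_test_related(y, x, k):
    """Score test with Var U = (Y - Ybar)' K (Y - Ybar) sigma_X^2.

    y: phenotypes, x: genotypes 0/1/2, k: n x n correlation matrix (list of lists).
    Assumption: sigma_X^2 = 2 p_hat (1 - p_hat), p_hat = sample minor-allele frequency.
    Returns (chi-square statistic, p-value on 1 df).
    """
    n = len(y)
    y_bar = sum(y) / n
    r = [yi - y_bar for yi in y]
    p_hat = sum(x) / (2 * n)
    sigma2 = 2 * p_hat * (1 - p_hat)
    u = sum(ri * xi for ri, xi in zip(r, x))
    quad = sum(r[i] * k[i][j] * r[j] for i in range(n) for j in range(n))
    stat = u ** 2 / (quad * sigma2)
    return stat, math.erfc(math.sqrt(stat / 2))

# Subjects 0 and 1: a case sib pair (rho = 1/2); 2 and 3: unrelated controls.
K = [[1.0, 0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
print(score_test_related([1, 1, 0, 0], [2, 1, 0, 1], K))  # statistic 1.6
```

In this toy example the positive sib correlation inflates $\mathrm{Var}\,U_X$ relative to $K = I$ (which would give a statistic of 2.0), so ignoring relatedness would overstate the evidence.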
Overview
Background
Genetic association analysis in families
  Score test
  Genetic correlation
  Using imputed data
  Incorporating family history
Summary and future work
32 / 43
When using imputed SNP data
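Instead of $X \in \{0, 1, 2\}$ we have the posterior probabilities $\varpi = (\varpi_0, \varpi_1, \varpi_2)^\top$
obtained by imputation software.
In the score statistic, replace $X$ with its expectation $\mathrm{E}(X_i \mid \varpi_i) = \varpi_{i1} + 2\varpi_{i2}$:
$U_X = (Y - \bar{Y})^\top \mathrm{E}(X \mid \varpi)$
The variance of the score statistic is
$\mathrm{Var}\, U_X = I_{imp} = I_{comp} - I_{comp|imp}$
The loss of information due to uncertainty for 1 subject:
$I_{comp|imp;i} = \varpi_{i1}(1 - \varpi_{i1}) + 4\varpi_{i2}(1 - \varpi_{i2}) - 4\varpi_{i1}\varpi_{i2}$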
33 / 43
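A Python sketch of the per-subject quantities, writing $w$ for $\varpi$ (names hypothetical). Expanding $\mathrm{E}(X^2 \mid \varpi) - \mathrm{E}(X \mid \varpi)^2 = \varpi_1 + 4\varpi_2 - (\varpi_1 + 2\varpi_2)^2$ shows that the information loss for one subject equals the conditional variance of its genotype given the posterior probabilities:

```python
def dosage(post):
    """Expected genotype E(X) = w1 + 2 w2 from posterior probs (w0, w1, w2)."""
    _, w1, w2 = post
    return w1 + 2 * w2

def info_loss(post):
    """Per-subject loss: w1(1 - w1) + 4 w2(1 - w2) - 4 w1 w2."""
    _, w1, w2 = post
    return w1 * (1 - w1) + 4 * w2 * (1 - w2) - 4 * w1 * w2

def conditional_variance(post):
    """Var(X | posterior) = E(X^2) - E(X)^2, for comparison."""
    _, w1, w2 = post
    return (w1 + 4 * w2) - dosage(post) ** 2

post = (0.2, 0.5, 0.3)
print(dosage(post), info_loss(post), conditional_variance(post))  # ~1.1, ~0.49, ~0.49
```

A perfectly imputed genotype, e.g. $\varpi = (0, 0, 1)$, gives zero loss.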
Efficiency measure $R^2_T$
Relative efficiency measure for case-control design (Uh et al, BMC Genet, 2009)
$R^2_T = I_{imp} / I_{comp}$
Post-imputation: to assess the quality of imputed genotypes
  MACH $r^2$, IMPUTE info, BEAGLE $R^2$
Post-analysis: to assess the quality of imputed genotypes w.r.t. the parameter of association
  SNPTEST info, CCassoc & QTassoc $R^2_T$
34 / 43
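For independent subjects, one way to turn the per-subject loss into $R^2_T$ is sketched below in Python. This is an assumption for illustration, not the published definition: it takes $I_{comp} = \sigma^2_X \sum_i (Y_i - \bar Y)^2$ and weights each subject's loss by $(Y_i - \bar Y)^2$; the function names are hypothetical:

```python
def subject_info_loss(w1, w2):
    """I_comp|imp;i = w1(1 - w1) + 4 w2(1 - w2) - 4 w1 w2."""
    return w1 * (1 - w1) + 4 * w2 * (1 - w2) - 4 * w1 * w2

def r2t_independent(y, posteriors, sigma2_x):
    """Hypothetical R2_T = I_imp / I_comp for independent subjects.

    ASSUMPTIONS: I_comp = sigma2_x * sum_i (Y_i - Ybar)^2 and
    I_comp|imp = sum_i (Y_i - Ybar)^2 * subject_info_loss(w_i1, w_i2).
    posteriors: list of (w0, w1, w2) tuples from imputation software.
    """
    y_bar = sum(y) / len(y)
    weights = [(yi - y_bar) ** 2 for yi in y]
    i_comp = sigma2_x * sum(weights)
    i_loss = sum(wt * subject_info_loss(w1, w2)
                 for wt, (_, w1, w2) in zip(weights, posteriors))
    return (i_comp - i_loss) / i_comp

y = [1, 1, 0, 0]
certain = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
fuzzy = [(0.05, 0.9, 0.05)] * 4
print(r2t_independent(y, certain, 0.5))  # 1.0: no imputation uncertainty
print(r2t_independent(y, fuzzy, 0.5))    # approx. 0.8
```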
Efficiency measure $R^2_T$ for (extra) QC
$\lambda_{GC} = \mathrm{median}(\text{observed test statistics}) / 0.456$
[Figure panels: $R^2_T > 0.30$ and $R^2_T > 0.98$]
35 / 43
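A Python sketch of $\lambda_{GC}$ and of recomputing it within an $R^2_T$ quality filter, as in the two panels. The constant 0.456 is approximately the median of the $\chi^2_1$ distribution; the function names, threshold argument and toy statistics are hypothetical:

```python
import statistics

def genomic_control_lambda(chi2_stats):
    """lambda_GC = median(observed 1-df chi-square statistics) / 0.456."""
    return statistics.median(chi2_stats) / 0.456

def lambda_after_filter(chi2_stats, r2t, threshold):
    """lambda_GC over the SNPs whose R2_T exceeds a QC threshold."""
    kept = [s for s, q in zip(chi2_stats, r2t) if q > threshold]
    return genomic_control_lambda(kept)

stats = [0.1, 0.456, 2.0, 5.0]     # toy 1-df statistics
quality = [0.99, 0.99, 0.99, 0.5]  # toy R2_T values
print(lambda_after_filter(stats, quality, 0.98))  # 1.0: low-quality SNP dropped
print(lambda_after_filter(stats, quality, 0.30))  # about 2.69
```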
Overview
Background
Genetic association analysis in families
  Score test
  Genetic correlation
  Using imputed data
  Incorporating family history
Summary and future work
36 / 43
Additional phenotypic information in LLS
37 / 43
More efficient test
CCassoc deals with related cases with ascertainment
Want: incorporate (available) phenotypic information of un-genotyped relatives for optimal weighting
How? The MQLS test (Thornton & McPeek, 2007) is an allelic test; we want to develop a test directly from the score statistic, in fact by modifying CCassoc
38 / 43
Multiplex-Case Control Score (MCCS) test
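Let $Y_N$ and $Y_M$ be the phenotypes of subjects with non-missing (N) and missing (M) genotypes; $K$ is the $N \times N$ correlation matrix of the genotyped subjects and $K_{N,M}$ the $N \times M$ matrix of correlations between genotyped and un-genotyped subjects.
$U_X = (Y - \bar{Y})^\top X$ in CCassoc
Include phenotypic information in
$$Y^* = Y_N + K^{-1} K_{N,M} Y_M$$
Then $U^*_X = (Y^* - \bar{Y}^*)^\top X$ in MCCS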
39 / 43
Example
1 case from 1 affected & 2 unaffected siblings:
$Y^* = 1 + (1/2,\ 1/2)(0,\ 0)^\top = 1$
1 case from 2 affected & 1 unaffected siblings:
$Y^* = 1 + (1/2,\ 1/2)(1,\ 0)^\top = 1.5$
1 case selected & typed from 3 affected siblings:
$Y^* = 1 + (1/2,\ 1/2)(1,\ 1)^\top = 2$
40 / 43
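The three examples can be reproduced with a short Python sketch of $Y^* = Y_N + K^{-1}K_{N,M}Y_M$. It solves $K z = K_{N,M} Y_M$ by Gauss-Jordan elimination instead of forming $K^{-1}$; the function names are hypothetical, and full siblings are assumed to have correlation 1/2:

```python
def solve(a, b):
    """Solve the linear system a z = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def mccs_phenotype(y_n, k_nn, k_nm, y_m):
    """Y* = Y_N + K^{-1} K_{N,M} Y_M for the genotyped subjects."""
    b = [sum(k * y for k, y in zip(row, y_m)) for row in k_nm]
    z = solve(k_nn, b)
    return [yn + zi for yn, zi in zip(y_n, z)]

# One genotyped case (K = [[1]]) with two un-genotyped full sibs (rho = 1/2).
K_NN, K_NM = [[1.0]], [[0.5, 0.5]]
for y_m in ([0, 0], [1, 0], [1, 1]):
    print(y_m, mccs_phenotype([1], K_NN, K_NM, y_m))  # Y* = 1.0, 1.5, 2.0
```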
Weight for LLS data
[Figure: weight versus family size (0 to 14), one panel for autosomal SNPs and one for X-linked SNPs; weight axis from 0.0 to 3.0]
427 nonagenarian sibling pairs; additional phenotypic information: age at death of family members (n = 2425 for controls & n = 277 for cases)
41 / 43
Summary and future work
We developed CCassoc & MCCS
  to test SNPs for multiplicative, dominant, and recessive effects in a related sample
  to test X-linked SNPs in a related sample
  for genome-wide scan data (e.g. 650K) and imputed genotypes (e.g. 2.5 million)
  to incorporate phenotypic information of un-genotyped relatives for optimal weighting
We want to extend MCCS:
  direct use of family history score (Houwing-Duistermaat, 2009)
  extension to other outcomes
42 / 43
Acknowledgement
Medical Statistics: Jeanine Houwing-Duistermaat, Quinta Helmer
Molecular Epidemiology: Joris Deelen, Marian Beekman, Eline Slagboom
Financial support: VIDI grant (NWO 917.66.344) from the Netherlands Organization for Scientific Research; IOP genomics/SenterNovem (IGE05007)
43 / 43