
Introduction to Genetic Association Analysis in Families

Hae-Won Uh

Department of Medical Statistics and Bioinformatics, LUMC


Leiden, June 24, 2011


Overview

1 Background

2 Genetic association analysis in families
  Score test
  Genetic correlation
  Using imputed data
  Incorporating family history

3 Summary and future work


Challenge: finding genetic variants affecting human health

Chromosomes: 22 pairs of autosomes, X, Y

Genes: ca. 23,000 protein-coding genes

Base pairs: ca. 3 billion DNA base pairs

Variants: 99.9% of bases are identical between all people → ca. 3 million SNPs


Terminology

DNA: strings of bases, A, T, G or C

Gene: a segment of DNA

Single Nucleotide Polymorphism (SNP): a one-letter variation in the DNA sequence


Genetic concept

Genotype (or SNP) AA, Aa, aa is coded
  autosome: 0, 1, or 2 minor alleles a
  X: 0, 1, or 2 for females & 0 or 2 for males

Hardy-Weinberg Equilibrium (HWE) assumption
  the alleles at the pair of chromosomes are independent
  $p$: frequency of the minor allele
  $P(AA) = (1-p)^2$, $P(Aa) = 2p(1-p)$, $P(aa) = p^2$

Genetic model
  specifies the relationship between genotypes and the disease
  dominant, recessive, additive, X-linked
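The HWE probabilities and the 0/1/2 coding can be checked with a small sketch. The function names and the example frequency p = 1/5 are illustrative assumptions, not part of the slides; exact fractions are used so that the identities E X = 2p and Var X = 2p(1 − p) hold without rounding.

```python
from fractions import Fraction

def hwe_genotype_probs(p):
    """Return (P(AA), P(Aa), P(aa)) under HWE for minor allele frequency p."""
    return (1 - p) ** 2, 2 * p * (1 - p), p ** 2

def additive_moments(p):
    """Mean and variance of the minor-allele count (0, 1, 2) under HWE."""
    probs = hwe_genotype_probs(p)
    mean = sum(k * pk for k, pk in enumerate(probs))
    var = sum(k * k * pk for k, pk in enumerate(probs)) - mean ** 2
    return mean, var

p = Fraction(1, 5)                      # hypothetical minor allele frequency
assert sum(hwe_genotype_probs(p)) == 1  # the three probabilities sum to one
mean, var = additive_moments(p)
assert mean == 2 * p                    # E X = 2p
assert var == 2 * p * (1 - p)           # Var X = 2p(1 - p)
```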


Methods for finding genetic variants

Linkage
  studies linkage of marker alleles and disease within families

Association
  seeks a marker allele that is present more frequently in cases than in controls
  can be more powerful than linkage methods for common diseases


Use of families in population-based association studies

Advantage
  Most common diseases demonstrate familial aggregation
  More powerful than (traditional) family-based tests

Challenge
  Need tests that correct for relatedness between subjects


Example: sibling pair - control study

Leiden Longevity Study (LLS)


LLS genome-wide association study (GWAS)

Study sample
  sibling pairs from 421 families: mean age 94y
  1670 controls: mean age 58y

(genotyped) SNPs: 500K

After imputation: 2.5 million

Score test
  retrospective likelihood
  corrects for correlation within families


Selective genotyping

Notation
  $X_i$: the genotype 0, 1, or 2 for subject $i$ (for $i = 1, \dots, n$)
  $Y_i$: the phenotype
  $\bar{Y}$: the mean of $Y$, or the proportion of cases in case-control studies
  $S$: ascertainment event

Retrospective likelihood
  $P(X \mid Y, S) = P(X \mid Y)$


Score test for independent cases

The score statistic

\[ U_X = \sum_{i=1}^{n} (Y_i - \bar{Y}) X_i \]

$U_X$ is asymptotically normally distributed under $H_0$ with zero mean and variance

\[ \operatorname{Var} U_X = V_X \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \]

where $V_X$ is the variance of $X_i$.

Under $H_0$, the ratio $U_X^2 / \operatorname{Var} U_X \sim \chi^2_{(1)}$

To account for the correlations between relatives, modify $V_X$ (Uh et al., BMC Proc, 2009)
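A minimal sketch of the score test for independent subjects, written directly from the formulas on this slide. The function name, the toy data and the value of V_X are illustrative assumptions; this is not the CCassoc implementation. The p-value uses the identity P(χ²₁ > s) = erfc(√(s/2)).

```python
import math

def score_test_independent(y, x, var_x):
    """Score test of association for independent subjects.

    y     : phenotypes (0/1 in a case-control study, or quantitative)
    x     : genotype scores (0, 1, 2)
    var_x : variance of the genotype score under H0 (V_X)
    """
    n = len(y)
    y_bar = sum(y) / n
    resid = [yi - y_bar for yi in y]
    u = sum(r * xi for r, xi in zip(resid, x))   # U_X
    var_u = var_x * sum(r * r for r in resid)    # Var U_X
    stat = u * u / var_u                         # chi-square with 1 df under H0
    p_value = math.erfc(math.sqrt(stat / 2))     # upper tail of chi-square(1)
    return u, var_u, stat, p_value

# Toy data: 3 cases followed by 3 controls, with an assumed V_X of 0.5
u, var_u, stat, p_value = score_test_independent(
    y=[1, 1, 1, 0, 0, 0], x=[2, 1, 1, 0, 1, 0], var_x=0.5)
```

In practice V_X would be estimated, for example as 2p(1 − p) under HWE with p taken from the pooled sample.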


Genetic correlation

To account for the correlations between relatives we modify $V_X$ using

Correlation coefficients $\rho_{ij}$

Kinship coefficients $\Phi_{ij}$

Identity by descent (IBD) sharing


Identity by descent (IBD)

Two alleles which are copies of a common ancestral allele are said to be IBD.

Subjects 3 and 4 share IBD = 1 (the paternal allele, a)

But in these families, they share 0 and 2 alleles IBD, respectively


IBD sharing by relative pairs

When unable to assign IBD sharing, we can assess the probability that two individuals share 0, 1 or 2 alleles IBD: $(\pi_0, \pi_1, \pi_2)$

Prior IBD probabilities are the probabilities of IBD sharing conditional only on the relationship between 2 subjects

The prior values for a sibling pair: $(\pi_0, \pi_1, \pi_2) = (1/4, 1/2, 1/4)$


Kinship coefficients $\Phi_{ij}$

Measure of relatedness between two individuals i and j

Probability that two alleles, one sampled at random from each of two individuals, are IBD

Derived from prior IBD probabilities:

\[ \Phi_{ij} = \tfrac{1}{4}\pi_1 + \tfrac{1}{2}\pi_2. \]
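The kinship formula can be evaluated for a few relationships as a quick check. Only the sibling-pair prior (1/4, 1/2, 1/4) comes from the slides; the other priors below are the standard autosomal values, added here as assumptions for illustration, and the dictionary and function names are hypothetical.

```python
from fractions import Fraction as F

# Prior IBD probabilities (pi0, pi1, pi2) for some autosomal relationships
PRIOR_IBD = {
    "full siblings": (F(1, 4), F(1, 2), F(1, 4)),   # from the slides
    "parent-offspring": (F(0), F(1), F(0)),         # standard value (assumption)
    "half siblings": (F(1, 2), F(1, 2), F(0)),      # standard value (assumption)
    "unrelated": (F(1), F(0), F(0)),                # standard value (assumption)
}

def kinship(pi):
    """Kinship coefficient Phi = pi1 / 4 + pi2 / 2 from prior IBD probabilities."""
    pi0, pi1, pi2 = pi
    return pi1 / 4 + pi2 / 2

for relation, pi in PRIOR_IBD.items():
    print(relation, kinship(pi))
```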


Calculation of $\Phi_{ij}$


IBD sharing and kinship by relationship


Correlation coefficients of relatives

Define the correlation matrix K as follows

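\[
K = \begin{pmatrix}
1 & \rho_{12} & \dots & \rho_{1n} \\
\rho_{12} & 1 & \dots & \rho_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{1n} & \rho_{2n} & \dots & 1
\end{pmatrix}.
\]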

$\rho_{ij}$ is twice the prior kinship coefficient $\Phi_{ij}$.


IBD sharing, kinship and correlation coefficients

$\rho = 2 \times \Phi$


Correlation coefficients of relatives II

$T$: the transition matrix from the ITO matrices of Li and Sacks (Biometrics, 1954)

\[ \rho_{ij} = \pi_2 + \pi_1 \rho_T = \left(\tfrac{1}{2}\right)^{R}, \]

where $\pi_k$ is the probability that the specified relatives share $k$ alleles IBD and $R$ is the degree of relationship.
For autosomal loci $\rho_T$ equals 1/2.


Correlation coefficients of relatives III

For autosomal genes

Multiplicative effect
  grandparent-grandchild: $0 + 1/2 \cdot \rho_T = 1/4$
  double first cousins: $1/16 + 6/16 \cdot \rho_T = 1/4$

Recessive effect
  Under HWE, $\rho_T = p/(1+p)$ with minor allele frequency $p$. The correlation of a sib-pair is

\[ \rho^{rec}_{ij} = \frac{1}{4} + \frac{1}{2}\left(\frac{p}{1+p}\right) = \frac{1+3p}{4(1+p)} \]
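The examples on this slide all follow the pattern ρ = π₂ + π₁ρ_T, which the following sketch evaluates with exact fractions. The function names and the example frequency p = 1/5 are illustrative assumptions.

```python
from fractions import Fraction as F

def relative_correlation(pi1, pi2, rho_t):
    """Genotype correlation of a relative pair: pi2 + pi1 * rho_T."""
    return pi2 + pi1 * rho_t

def rho_t_recessive(p):
    """rho_T for a recessive effect under HWE: p / (1 + p)."""
    return p / (1 + p)

half = F(1, 2)  # rho_T for a multiplicative effect at an autosomal locus

# Multiplicative effect
assert relative_correlation(F(1, 2), F(0), half) == F(1, 4)       # grandparent-grandchild
assert relative_correlation(F(6, 16), F(1, 16), half) == F(1, 4)  # double first cousins

# Recessive effect, sibling pair, with a hypothetical p = 1/5
p = F(1, 5)
rho_rec = relative_correlation(F(1, 2), F(1, 4), rho_t_recessive(p))
assert rho_rec == (1 + 3 * p) / (4 * (1 + p))
```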


Correlation coefficients of relatives IV

For X-linked SNPs
Four basic correlations:

\[ \rho_{T_{f,f}} = 1/2, \quad \rho_{T_{f,m}} = \rho_{T_{m,f}} = 1/\sqrt{2}, \quad \rho_{T_{m,m}} = 0, \]

where the subscripts indicate female pairs, mixed pairs, and male pairs, respectively.

For example,
  full sisters: $1/2 + 1/2 \, \rho_{T_{f,f}} = 3/4$
  maternal uncle and niece: $1/4 \, \rho_{T_{m,f}} = 1/4 \cdot 1/\sqrt{2}$


Variance of autosomal SNP for sibling pair (s1, s2)

Multiplicative effect

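$\mathrm{E}\,X_{s_1} = 2p, \quad \operatorname{Var} X_{s_1} = 2p(1-p)$

$\operatorname{Var}(X_{s_1} + X_{s_2}) = \operatorname{Var} X_{s_1} + \operatorname{Var} X_{s_2} + 2 \operatorname{Cov}(X_{s_1}, X_{s_2})$

$\operatorname{Cov}(X_{s_1}, X_{s_2}) = \rho_{s_1,s_2} \sqrt{\operatorname{Var} X_{s_1}} \sqrt{\operatorname{Var} X_{s_2}} = (1/2)\{2p(1-p)\}$

Recessive effect

$\mathrm{E}\,X_{s_1} = p^2, \quad \operatorname{Var} X_{s_1} = p^2(1-p^2)$

$\operatorname{Cov}(X_{s_1}, X_{s_2}) = \dfrac{1+3p}{4(1+p)} \, p^2(1-p^2)$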
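The sibling-pair correlations can be checked by brute force: enumerate the four parental alleles under HWE and the Mendelian transmissions to two children, then compute exact moments. This is a sketch under stated assumptions: the recessive score is taken to be the indicator of genotype aa (consistent with E X = p²), and the function name and the example frequency p = 1/4 are hypothetical.

```python
from fractions import Fraction
from itertools import product

def sib_pair_moments(p, code):
    """Exact mean, variance and covariance of coded genotypes for a sib pair.

    p    : minor allele frequency (a Fraction)
    code : maps the number of minor alleles (0, 1, 2) to a genotype score
    """
    e1 = e11 = e12 = Fraction(0)
    # Parental alleles (father 1, father 2, mother 1, mother 2); 1 = minor allele
    for alleles in product((0, 1), repeat=4):
        w_par = Fraction(1)
        for a in alleles:
            w_par *= p if a else 1 - p
        f, m = alleles[:2], alleles[2:]
        # Each child receives one paternal and one maternal allele at random
        for i1, j1, i2, j2 in product((0, 1), repeat=4):
            w = w_par / 16
            x1 = code(f[i1] + m[j1])
            x2 = code(f[i2] + m[j2])
            e1 += w * x1
            e11 += w * x1 * x1
            e12 += w * x1 * x2
    return e1, e11 - e1 * e1, e12 - e1 * e1

p = Fraction(1, 4)

# Multiplicative coding: score = number of minor alleles
mean, var, cov = sib_pair_moments(p, lambda k: k)
assert (mean, var) == (2 * p, 2 * p * (1 - p))
assert cov / var == Fraction(1, 2)

# Recessive coding: score = 1 for genotype aa, else 0
mean, var, cov = sib_pair_moments(p, lambda k: int(k == 2))
assert mean == p ** 2
assert cov / var == (1 + 3 * p) / (4 * (1 + p))
```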


Variance of X-linked SNP for sibling pair

Females carry 2 copies: $X \in \{0, 1, 2\}$ and $\mathrm{E}\,X_f = 2p$
Males carry 1 copy: $X \in \{0, 2\}$ and $\mathrm{E}\,X_m = 2p$

Variance of females or males

Females: $\sigma_f^2 = 2p(1-p)$

Males: $\sigma_m^2 = 4p(1-p)$

Covariance of sibling pairs

Sister-Sister: $[1/2 + 1/2 \, \rho_{T_{f,f}}] \, \sigma_f^2 = (3/4) \, 2p(1-p)$

Brother-Brother: $[1/2 + 0 \, \rho_{T_{m,m}}] \, \sigma_m^2 = 2p(1-p)$

Sister-Brother: $[0 + 1/2 \, \rho_{T_{m,f}}] \sqrt{\sigma_m^2} \sqrt{\sigma_f^2} = p(1-p)$
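The three sibling covariances can be verified by enumerating the parental X alleles and the maternal transmissions. This sketch assumes the coding on this slide (females scored 0/1/2, males 0/2) and HWE in the parents; the function name and the example frequency are hypothetical.

```python
from fractions import Fraction
from itertools import product

def x_linked_sib_cov(p, sex1, sex2):
    """Exact covariance of X-linked genotype scores for a sibling pair.

    Females ("f") are scored paternal + maternal allele (0, 1, 2);
    males ("m") are scored twice their single maternal allele (0 or 2).
    """
    def score(sex, father_allele, mother_allele):
        return father_allele + mother_allele if sex == "f" else 2 * mother_allele

    e1 = e2 = e12 = Fraction(0)
    # Father carries one X allele, mother carries two (1 = minor allele)
    for fa, m1, m2 in product((0, 1), repeat=3):
        w_par = Fraction(1)
        for a in (fa, m1, m2):
            w_par *= p if a else 1 - p
        # Each sibling inherits one of the two maternal alleles at random
        for t1, t2 in product((0, 1), repeat=2):
            w = w_par / 4
            x1 = score(sex1, fa, (m1, m2)[t1])
            x2 = score(sex2, fa, (m1, m2)[t2])
            e1 += w * x1
            e2 += w * x2
            e12 += w * x1 * x2
    return e12 - e1 * e2

p = Fraction(1, 5)
q = p * (1 - p)
assert x_linked_sib_cov(p, "f", "f") == Fraction(3, 4) * 2 * q  # sister-sister
assert x_linked_sib_cov(p, "m", "m") == 2 * q                   # brother-brother
assert x_linked_sib_cov(p, "f", "m") == q                       # sister-brother
```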


Score test for related cases

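The score statistic

\[ U_X = (Y - \bar{Y})^\top X \]

\[ \operatorname{Var} U_X = (Y - \bar{Y})^\top K (Y - \bar{Y}) \, \sigma_X^2, \]

For binary & quantitative outcome $Y$

Under $H_0$, the ratio $U_X^2 / \operatorname{Var} U_X \sim \chi^2_{(1)}$

Implemented in CCassoc & QTassoc (www.msbi.nl/uh)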
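A minimal sketch of the score test with the correlation matrix K, written from the formulas on this slide. It is not the CCassoc or QTassoc code; the function name, the toy data (two sibling cases and two unrelated controls) and the value of σ²_X are assumptions for illustration.

```python
import math

def score_test_related(y, x, K, sigma2_x):
    """Score test that corrects Var U_X for relatedness through K.

    y        : phenotypes
    x        : genotype scores
    K        : n x n correlation matrix of the genotypes (list of lists)
    sigma2_x : variance of the genotype score under H0
    """
    n = len(y)
    y_bar = sum(y) / n
    r = [yi - y_bar for yi in y]
    u = sum(ri * xi for ri, xi in zip(r, x))                      # (Y - Ybar)' X
    quad = sum(r[i] * K[i][j] * r[j] for i in range(n) for j in range(n))
    var_u = quad * sigma2_x                                       # (Y - Ybar)' K (Y - Ybar) sigma^2
    stat = u * u / var_u
    return u, var_u, stat, math.erfc(math.sqrt(stat / 2))

y = [1, 1, 0, 0]
x = [2, 1, 0, 1]
K_sibs = [[1, 0.5, 0, 0], [0.5, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
K_indep = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

stat_sibs = score_test_related(y, x, K_sibs, sigma2_x=0.5)[2]
stat_indep = score_test_related(y, x, K_indep, sigma2_x=0.5)[2]
```

In this toy example, treating the two sibling cases as independent gives a larger test statistic than the version that accounts for their correlation of 1/2.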


When using imputed SNP data

Instead of $X \in \{0, 1, 2\}$ we have the posterior probability $\varpi = (\varpi_0, \varpi_1, \varpi_2)^\top$ obtained by imputation software.

In the score statistic replace $X$ with its expectation

\[ U_X = (Y - \bar{Y})^\top X \]

The variance of the score statistic is

\[ \operatorname{Var} U_X = I_{imp} = I_{comp} - I_{comp|imp} \]

The loss of information due to uncertainty for 1 subject

\[ I_{comp|imp;i} = \varpi_{i1}(1 - \varpi_{i1}) + 4\varpi_{i2}(1 - \varpi_{i2}) - 4\varpi_{i1}\varpi_{i2} \]
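The per-subject loss term equals the posterior variance of the genotype given the imputation probabilities, which the sketch below checks numerically. The function names and the example probabilities are hypothetical, and the posterior probabilities are assumed to sum to one.

```python
def dosage_and_information_loss(post):
    """Expected genotype and per-subject information loss from imputation.

    post : (w0, w1, w2), posterior probabilities of genotypes 0, 1, 2
    """
    w0, w1, w2 = post
    dosage = w1 + 2 * w2                                     # expectation of X
    loss = w1 * (1 - w1) + 4 * w2 * (1 - w2) - 4 * w1 * w2   # I_comp|imp;i
    return dosage, loss

def posterior_variance(post):
    """Var(X) under the posterior: E[X^2] - (E[X])^2."""
    w0, w1, w2 = post
    return (w1 + 4 * w2) - (w1 + 2 * w2) ** 2

uncertain = (0.1, 0.6, 0.3)   # poorly imputed genotype
certain = (0, 0, 1)           # genotype known to be 2

dosage, loss = dosage_and_information_loss(uncertain)
assert abs(loss - posterior_variance(uncertain)) < 1e-12
assert dosage_and_information_loss(certain) == (2, 0)  # no loss when certain
```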


Efficiency measure $R_T^2$

Relative efficiency measure for case-control design (Uh et al., BMC Genet, 2009)

\[ R_T^2 = I_{imp} / I_{comp} \]

Post-imputation: to assess the quality of imputed genotypes
  MACH $r^2$, IMPUTE info, BEAGLE $R^2$

Post-analysis: to assess the quality of imputed genotypes w.r.t. the parameter of association
  SNPTEST info, CCassoc & QTassoc $R_T^2$


Efficiency measure $R_T^2$ for (extra) QC

$\lambda_{GC} = \{\text{median (observed test statistic)}\} / 0.456$

[Figure: two panels, $R_T^2 > 0.30$ and $R_T^2 > 0.98$]
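The genomic control factor is a one-line computation; a sketch is given below. The function name and the toy statistics are assumptions, and the divisor 0.456 is the approximate median of the χ² distribution with 1 degree of freedom, as used on the slide.

```python
import statistics

def lambda_gc(chi2_stats, null_median=0.456):
    """Genomic control inflation factor: median observed statistic / null median."""
    return statistics.median(chi2_stats) / null_median

# Toy list of 1-df test statistics; values near 1 indicate no inflation
print(lambda_gc([0.1, 0.456, 3.0]))
```

Computing this factor separately for SNP sets filtered at different $R_T^2$ thresholds gives one way to see whether poorly imputed SNPs distort the test statistics.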


Additional phenotypic information in LLS


More efficient test

CCassoc deals with related cases with ascertainment

Want: incorporate (available) phenotypic information of un-genotyped relatives for optimal weighting

How?
  MQLS test (Thornton & McPeek, 2007) is an allelic test
  Want to develop a test directly from the score statistic
  In fact we want to modify CCassoc


Multiplex-Case Control Score (MCCS) test

Let $Y_N$ and $Y_M$ be the phenotypes of subjects with Non-missing and Missing genotypes.
$K_{N,M}$ is the $N \times (N \cup M)$ correlation matrix.

$U_X = (Y - \bar{Y})^\top X$ in CCassoc

Include phenotypic information in

\[ Y^* = Y_N + K^{-1} K_{N,M} Y_M \]

Then $U_X^* = (Y^* - \bar{Y}^*)^\top X$ in MCCS


Example

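1 case from 1 affected & 2 unaffected siblings:

\[ Y^* = 1 + \begin{pmatrix} 1/2 & 1/2 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \end{pmatrix} = 1 \]

1 case from 2 affected & 1 unaffected siblings:

\[ Y^* = 1 + \begin{pmatrix} 1/2 & 1/2 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = 1.5 \]

1 case selected & typed from 3 affected siblings:

\[ Y^* = 1 + \begin{pmatrix} 1/2 & 1/2 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = 2 \]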
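The three weights above can be reproduced with a short sketch. It covers only the special case of a single genotyped subject, in which K reduces to the scalar 1 and no matrix inverse is needed; the function name and argument names are hypothetical.

```python
def mccs_weight_single_typed(y_typed, rho_to_untyped, y_untyped):
    """Y* for one genotyped subject with un-genotyped relatives.

    y_typed        : phenotype of the genotyped subject
    rho_to_untyped : correlations between the typed subject and each untyped relative
    y_untyped      : phenotypes of the untyped relatives
    """
    return y_typed + sum(r * y for r, y in zip(rho_to_untyped, y_untyped))

sib = [0.5, 0.5]  # correlation 1/2 with each of two untyped siblings

assert mccs_weight_single_typed(1, sib, [0, 0]) == 1    # 1 affected, 2 unaffected
assert mccs_weight_single_typed(1, sib, [1, 0]) == 1.5  # 2 affected, 1 unaffected
assert mccs_weight_single_typed(1, sib, [1, 1]) == 2    # 3 affected
```

With several genotyped relatives in one family the general form with $K^{-1}$ applies, which requires solving a linear system instead of the plain sum used here.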


Weight for LLS data

[Figure: weight versus family size, two panels (Autosomal SNPs, X-linked SNPs); x-axis: Family size (0 to 14), y-axis: Weight (0.0 to 3.0)]

427 nonagenarian sibling pairs
additional phenotypic information: age at death of family members (n = 2425 for controls & n = 277 for cases)



Summary and future work

We developed CCassoc & MCCS
  To test SNPs for multiplicative, dominant, and recessive effects in a related sample
  To test X-linked SNPs in a related sample
  For genome-wide scan data (e.g. 650K) and imputed genotypes (e.g. 2.5 million)
  To incorporate phenotypic information of un-genotyped relatives for optimal weighting

We want to extend MCCS:
  Direct use of family history score (Houwing-Duistermaat, 2009)
  Extend to other outcomes


Acknowledgement

Medical Statistics
  Jeanine Houwing-Duistermaat
  Quinta Helmer

Molecular Epidemiology
  Joris Deelen, Marian Beekman, Eline Slagboom

Financial support
  VIDI grant (NWO 917.66.344) from the Netherlands Organization for Scientific Research
  IOP genomics/SenterNovem (IGE05007)
