Post on 07-Feb-2018

GWAS for quantitative traits

Peter M. Visscher

peter.visscher@qimr.edu.au

Queensland Institute of Medical Research

Overview

• Darwin and Mendel

• Background: population genetics

• Background: quantitative genetics

• GWAS

– Examples

– Analysis

– Statistical power

[Galton, 1889]

Mendelian Genetics

Following a single (or several) genes that we can directly score

Phenotype highly informative as to genotype

Darwin & Mendel

• Darwin (1859) Origin of Species

– Instant classic, major immediate impact

– Problem: model of inheritance

• Darwin assumed blending inheritance

• Offspring = average of both parents

• zo = (zm + zf)/2

• Fleeming Jenkin (1867) pointed out the problem

– Var(zo) = Var[(zm + zf)/2] = (1/2) Var(parents)

– Hence, under blending inheritance, half the variation is removed each generation and this must somehow be replenished by mutation.

Mendel

• Mendel (1865), Experiments in Plant Hybridization

• No impact, paper essentially ignored

– Ironically, Darwin had an apparently unread copy in his library

– Why ignored? Perhaps too mathematical for 19th century biologists

• Rediscovery in 1900 (by three independent groups)

• Mendel’s key idea: Genes are discrete particles passed on intact from parent to offspring

The height vs. pea debate

(early 1900s)

Do quantitative traits have the same hereditary and evolutionary properties as discrete characters?

Biometricians Mendelians

RA Fisher (1918). Transactions of the Royal Society of Edinburgh 52: 399-433.

[Figure: trait distributions for genotypes qq, Qq and QQ, with genotypic means m-a, m+d and m+a]

Population Genetics

• Allele and genotype frequencies

• Hardy-Weinberg Equilibrium

• Linkage (dis)equilibrium

Allele and Genotype Frequencies

Given genotype frequencies, we can always compute allele frequencies, e.g.,

$$p_i = \mathrm{freq}(A_iA_i) + \frac{1}{2}\sum_{j \neq i} \mathrm{freq}(A_iA_j)$$

The converse is not true: given allele frequencies we cannot uniquely determine the genotype frequencies

For n alleles, there are n(n+1)/2 genotypes

If we are willing to assume random mating, the genotype frequencies follow Hardy-Weinberg proportions:

$$\mathrm{freq}(A_iA_j) = \begin{cases} p_i^2 & \text{for } i = j \\ 2p_ip_j & \text{for } i \neq j \end{cases}$$
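The two relations above can be illustrated with a small sketch. The function names and the example genotype frequencies are hypothetical, chosen only for illustration; genotypes are represented as unordered allele-index pairs (i, j) with i <= j.

```python
from itertools import combinations_with_replacement

def allele_freqs(genotype_freqs, n_alleles):
    """Allele frequencies from genotype frequencies.

    genotype_freqs maps unordered pairs (i, j), i <= j, to frequencies.
    """
    p = [0.0] * n_alleles
    for (i, j), f in genotype_freqs.items():
        if i == j:
            p[i] += f            # homozygote: both copies are allele i
        else:
            p[i] += 0.5 * f      # heterozygote: half of the copies each
            p[j] += 0.5 * f
    return p

def hardy_weinberg(p):
    """Expected genotype frequencies under random mating."""
    pairs = combinations_with_replacement(range(len(p)), 2)
    return {(i, j): p[i] ** 2 if i == j else 2 * p[i] * p[j]
            for i, j in pairs}

# Made-up genotype frequencies for a two-allele locus.
observed = {(0, 0): 0.5, (0, 1): 0.2, (1, 1): 0.3}
p = allele_freqs(observed, 2)
expected = hardy_weinberg(p)
print(p)
print(expected)
```

Here the observed genotype frequencies differ from the Hardy-Weinberg expectation even though both imply the same allele frequencies, which is the sense in which the converse mapping is not unique.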

Hardy-Weinberg

• Prediction of genotype frequencies from allele freqs

• Allele frequencies remain unchanged over generations, provided:

• Infinite population size (no genetic drift)

• No mutation

• No selection

• No migration

• Under HW conditions, a single generation of random mating gives genotype frequencies in Hardy-Weinberg proportions, and they remain forever in these proportions

QC in GWAS studies

Linkage equilibrium

Random mating and recombination eventually change gamete frequencies so that they are in linkage equilibrium (LE).

Once in LE, gamete frequencies do not change (unless acted on by other forces)

At LE, alleles in gametes are independent of each other:

freq(AB) = freq(A) * freq(B)

freq(ABC) = freq(A) * freq(B) * freq(C)

Linkage disequilibrium

When linkage disequilibrium (LD) is present, alleles are no longer independent: knowing that one allele is in the gamete provides information on alleles at other loci:

freq(AB) ≠ freq(A) * freq(B)

The disequilibrium between alleles A and B is given by

D_AB = freq(AB) - freq(A) * freq(B)

GWAS relies on LD between markers and causal variants
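As a minimal sketch, the disequilibrium coefficient can be computed directly from gamete and allele frequencies. The r² formula, D² / [p_A(1-p_A) p_B(1-p_B)], is the standard squared-correlation measure of LD and is added here as an assumption about which LD measure is meant; the function names and numbers are illustrative.

```python
def ld_coefficient(freq_ab, freq_a, freq_b):
    """D_AB = freq(AB) - freq(A) * freq(B)."""
    return freq_ab - freq_a * freq_b

def ld_r2(freq_ab, freq_a, freq_b):
    """Squared correlation between alleles at two loci (standard r^2)."""
    d = ld_coefficient(freq_ab, freq_a, freq_b)
    return d ** 2 / (freq_a * (1 - freq_a) * freq_b * (1 - freq_b))

# Illustrative values: A and B each at frequency 0.5, AB gamete at 0.4.
d_ab = ld_coefficient(0.4, 0.5, 0.5)
r2 = ld_r2(0.4, 0.5, 0.5)
print(d_ab, r2)

# Under linkage equilibrium freq(AB) = freq(A) * freq(B), so D is zero.
print(ld_coefficient(0.25, 0.5, 0.5))
```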

[Figure: gametes under linkage equilibrium (Q1 M1, Q2 M2, Q1 M2 and Q2 M1 all present) versus linkage disequilibrium (only Q1 M1 and Q2 M2 present)]

The Decay of Linkage Disequilibrium

The frequency of the AB gamete is given by

freq(AB) = freq(A)*freq(B) + D_AB

If the recombination frequency between the A and B loci is c, the disequilibrium in generation t is

D(t) = D(0) (1 - c)^t

Note that D(t) -> zero, although the approach can be slow when c is very small
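A short sketch of the decay formula, to make the "slow when c is very small" remark concrete. The helper names are hypothetical; the half-life expression simply solves (1 - c)^t = 1/2 for t.

```python
import math

def ld_after(d0, c, t):
    """Disequilibrium after t generations: D(t) = D(0) * (1 - c)^t."""
    return d0 * (1 - c) ** t

def generations_to_fraction(c, fraction):
    """Generations needed for D(t)/D(0) to fall to the given fraction."""
    return math.log(fraction) / math.log(1 - c)

for c in (0.10, 0.01, 0.001):
    remaining = ld_after(1.0, c, 100)
    half_life = generations_to_fraction(c, 0.5)
    print(f"c={c}: D(100)/D(0)={remaining:.3f}, half-life={half_life:.1f} generations")
```

With tight linkage (c = 0.001) most of the initial disequilibrium is still present after 100 generations, which is why closely spaced markers remain informative about nearby causal variants.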

[Figure: decay of LD (y-axis, 0.00 to 1.00) over generations 0 to 100 (x-axis) for c = 0.10, c = 0.01 and c = 0.001]

NB: Gene mapping & GWAS

Forces that Generate LD

• Drift (finite population size)

• Selection

• Migration (admixture)

• Mutation

• Population structure (stratification)

Effective population size determines the number of markers needed for GWAS

Quantitative Genetics

The analysis of traits whose variation is determined by both a number of genes and environmental factors

Phenotype is highly uninformative as to underlying genotype

[Figure: trait distributions for genotypes qq, Qq and QQ, with genotypic means m-a, m+d and m+a]

Complex (or Quantitative) trait

• No (apparent) simple Mendelian basis for variation in the trait

• May be a single gene strongly influenced by environmental factors

• May be the result of a number of genes of equal (or differing) effect

• Most likely, a combination of both multiple genes and environmental factors.

• Example: Blood pressure, cholesterol levels, IQ, height, etc.

Basic model of Quantitative Genetics

Basic model: P = G + E

G = average phenotypic value for that genotype if we are able to replicate it over the universe of environmental values, G = E[P]

G x E interaction: G values are different across environments. Basic model now becomes P = G + E + GE

Biometrical model for single diallelic Quantitative Trait Locus (QTL)

Contribution of the QTL to the Mean (X)

Genotypes            AA     Aa     aa
Frequencies, f(x)    p²     2pq    q²
Effect, x            a      d      -a

$$\mu = \sum_i x_i f(x_i) = a\,p^2 + d\,(2pq) - a\,q^2$$

Mean (X) = a(p-q) + 2pqd

Example: Apolipoprotein E & Alzheimer’s

Genotype                ee      Ee      EE
Average age of onset    68.4    75.5    84.3

2a = G(EE) - G(ee) = 84.3 - 68.4 = 15.9, so a = 7.95

d = G(Ee) - [G(EE) + G(ee)]/2 = 75.5 - 76.35 = -0.85

d/a = -0.11: only a small amount of dominance
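The arithmetic in this example can be checked with a few lines. This is a sketch only: the function names are hypothetical, and the allele frequency passed to `qtl_mean` is a made-up value, since the slide does not give one.

```python
def additive_dominance(g_low, g_het, g_high):
    """Return (a, d) from the three genotypic values."""
    a = (g_high - g_low) / 2             # half the homozygote difference
    d = g_het - (g_high + g_low) / 2     # heterozygote deviation from midpoint
    return a, d

def qtl_mean(a, d, p):
    """Contribution of the QTL to the mean: a(p - q) + 2pqd, with q = 1 - p."""
    q = 1 - p
    return a * (p - q) + 2 * p * q * d

# Average ages of onset from the slide: ee, Ee, EE.
a, d = additive_dominance(68.4, 75.5, 84.3)
print(f"a = {a:.2f}, d = {d:.2f}, d/a = {d / a:.2f}")

# Mean deviation from the homozygote midpoint for an assumed frequency p = 0.85.
print(qtl_mean(a, d, 0.85))
```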

Biometrical model for single diallelic QTL

Contribution of the QTL to the Variance (X)

Genotypes            AA     Aa     aa
Frequencies, f(x)    p²     2pq    q²
Effect, x            a      d      -a

$$\mathrm{Var}(X) = \sum_i (x_i - \mu)^2 f(x_i) = (a-\mu)^2 p^2 + (d-\mu)^2\, 2pq + (-a-\mu)^2 q^2 = V_{QTL}$$

(assuming HW proportions)

Biometrical model for single diallelic QTL

$$\mathrm{Var}(X) = (a-\mu)^2 p^2 + (d-\mu)^2\, 2pq + (-a-\mu)^2 q^2$$

$$= 2pq\,[a + (q-p)d]^2 + (2pqd)^2$$

$$= V_{A,QTL} + V_{D,QTL}$$

Additive effects: the main effects of individual alleles

Dominance effects: represent the interaction between alleles
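The identity between the direct variance and the additive plus dominance components can be checked numerically. A minimal sketch with arbitrary illustrative values for a, d and p (the function name is hypothetical):

```python
def qtl_variance_components(a, d, p):
    """Return (total, additive, dominance) variance for a diallelic QTL
    in Hardy-Weinberg proportions."""
    q = 1 - p
    freqs = (p * p, 2 * p * q, q * q)    # AA, Aa, aa
    effects = (a, d, -a)
    mu = sum(f * x for f, x in zip(freqs, effects))
    total = sum(f * (x - mu) ** 2 for f, x in zip(freqs, effects))
    v_a = 2 * p * q * (a + (q - p) * d) ** 2
    v_d = (2 * p * q * d) ** 2
    return total, v_a, v_d

total, v_a, v_d = qtl_variance_components(a=1.0, d=0.5, p=0.3)
print(total, v_a, v_d, v_a + v_d)
```

Note that the additive variance depends on d unless p = q, so dominance at the locus still contributes to the additive component.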

Biometrical model for single biallelic QTL

[Figure: genotypic values -a, d and a for genotypes aa, Aa and AA, relative to the midpoint m]

Var (X) = Regression Variance + Residual Variance

= Additive Variance + Dominance Variance

Fisher 1918

Association (GWAS)

• State of play

• Model

• Analysis method

• Power of detection

• GWAS works

• Effect sizes are typically small

– Disease: OR ~1.1 to ~1.3

– Quantitative traits: % var explained <<1%

Disease                              Number of loci   Percent of heritability measure explained   Heritability measure
Age-related macular degeneration     5                50%                                         Sibling recurrence risk
Crohn’s disease                      32               20%                                         Genetic risk (liability)
Systemic lupus erythematosus         6                15%                                         Sibling recurrence risk
Type 2 diabetes                      18               6%                                          Sibling recurrence risk
HDL cholesterol                      7                5.2%                                        Phenotypic variance
Height                               40               5%                                          Phenotypic variance
Early onset myocardial infarction    9                2.8%                                        Phenotypic variance
Fasting glucose                      4                1.5%                                        Phenotypic variance

Effect sizes QT (104 SNPs)

[Figure: histogram of % variance explained, quantitative traits; x-axis from 0.1 to 1.7, y-axis frequency from 0 to 35]

Linear model for single SNP

• Allelic (additive model)

Y = µ + b*x + e

x = 0, 1, 2 for genotypes aa, Aa and AA

• Genotypic (additive + dominance model)

Y = µ + G_i + e

G_i = effect of the genotype group corresponding to genotypes aa, Aa and AA

Method

• Linear regression

• ANOVA

• (other: maximum likelihood, Bayesian)
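A minimal sketch of fitting the two single-SNP models by least squares with NumPy. The data are simulated under assumed values (allele frequency 0.3, b = 0.5, unit residual variance) purely for illustration; the function names are hypothetical.

```python
import numpy as np

def fit_allelic(x, y):
    """Least-squares fit of Y = mu + b*x + e; returns (mu_hat, b_hat)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def fit_genotypic(x, y):
    """Genotypic model: one fitted mean (mu + G_i) per genotype group."""
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    return np.array([y[x == g].mean() for g in (0, 1, 2)])

# Simulated genotypes in Hardy-Weinberg proportions and an additive effect.
rng = np.random.default_rng(1)
n, p = 5000, 0.3
x = rng.binomial(2, p, size=n)
y = 10.0 + 0.5 * x + rng.normal(0.0, 1.0, size=n)

mu_hat, b_hat = fit_allelic(x, y)
group_means = fit_genotypic(x, y)
print(mu_hat, b_hat)
print(group_means)
```

The allelic model spends one degree of freedom on the SNP, while the genotypic model spends two and can therefore also capture a heterozygote mean that departs from the midpoint of the two homozygotes.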

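Test statistic (allelic model)

$$T = \hat{b} / \sigma(\hat{b}) \sim t_N \approx N(0,1)$$

$$T^2 = \hat{b}^2 / \mathrm{var}(\hat{b}) \sim F_{1,N} \approx \chi^2_1$$

$$\mathrm{var}(\hat{b}) = \frac{\sigma_e^2}{N\,\mathrm{var}(x)} = \frac{\sigma_e^2}{2Np(1-p)}$$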
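The test statistic can be sketched as follows, using NumPy and simulated data. The simulation settings (N = 2000, p = 0.4, b = 0.2, unit residual variance) and the function names are assumptions for illustration. The residual variance is estimated with N - 2 degrees of freedom, which for GWAS-sized N is indistinguishable from the large-sample forms.

```python
import numpy as np

def allelic_test(x, y):
    """Regress y on allele count x; return (b_hat, var_b, T, T squared)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)                       # equals N * var(x)
    b_hat = np.sum(xc * (y - y.mean())) / sxx
    resid = (y - y.mean()) - b_hat * xc
    sigma2_e = np.sum(resid ** 2) / (n - 2)     # residual variance estimate
    var_b = sigma2_e / sxx
    t_stat = b_hat / np.sqrt(var_b)
    return b_hat, var_b, t_stat, t_stat ** 2

def approx_var_b(sigma2_e, n, p):
    """Large-sample var(b_hat) under HWE, where var(x) = 2p(1 - p)."""
    return sigma2_e / (2 * n * p * (1 - p))

rng = np.random.default_rng(7)
n, p = 2000, 0.4
x = rng.binomial(2, p, size=n)
y = 0.2 * x + rng.normal(0.0, 1.0, size=n)

b_hat, var_b, t_stat, chi2_stat = allelic_test(x, y)
print(b_hat, var_b, approx_var_b(1.0, n, p))
print(t_stat, chi2_stat)
```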

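Statistical Power (additive model)

Proportion of phenotypic variance explained by the QTL:

$$q^2 = \frac{2p(1-p)\,[a + d(1-2p)]^2}{\sigma_p^2}$$

Non-centrality parameter of χ² test:

$$\lambda = \frac{N q^2}{1 - q^2} \approx N q^2$$

Required sample size given type-I (α) and type-II (β) error:

$$N = \frac{1 - q^2}{q^2}\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \approx \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{q^2}$$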
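The sample-size formula can be evaluated with the standard library alone. In this sketch the function names are hypothetical, and the example inputs (q² = 0.1%, α = 5e-8, β = 0.2) are illustrative choices rather than values taken from the slides.

```python
from statistics import NormalDist

def variance_explained(a, d, p, sigma2_p):
    """q^2 = 2p(1-p)[a + d(1-2p)]^2 / sigma_p^2."""
    return 2 * p * (1 - p) * (a + d * (1 - 2 * p)) ** 2 / sigma2_p

def noncentrality(n, q2):
    """Non-centrality parameter: N q^2 / (1 - q^2)."""
    return n * q2 / (1 - q2)

def required_n(q2, alpha, beta):
    """Sample size for type-I error alpha and type-II error beta."""
    z = NormalDist().inv_cdf
    return (1 - q2) / q2 * (z(1 - alpha / 2) + z(1 - beta)) ** 2

# Illustrative: a variant explaining 0.1% of variance, alpha = 5e-8, 80% power.
n_needed = required_n(q2=0.001, alpha=5e-8, beta=0.2)
print(round(n_needed), noncentrality(n_needed, 0.001))
```

Because N scales with 1/q², halving the variance explained roughly doubles the required sample size.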

LD again

r² = squared LD correlation between QTL and genotyped SNP

Proportion of variance explained at SNP = r²q²

Required sample size for detection

$$N \approx \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{r^2 q^2}$$
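A brief sketch of how incomplete LD inflates the required sample size, using the approximate formula above. The function name and the input values are illustrative assumptions.

```python
from statistics import NormalDist

def required_n_at_marker(q2, r2, alpha, beta):
    """Approximate N when the genotyped SNP tags the QTL with LD r^2."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(1 - beta)) ** 2 / (r2 * q2)

for r2 in (1.0, 0.8, 0.5):
    n = required_n_at_marker(q2=0.005, r2=r2, alpha=5e-8, beta=0.2)
    print(f"r2 = {r2}: N ~ {n:.0f}")
```

The required N scales as 1/r², so a marker with r² = 0.5 to the causal variant needs about twice the sample size of the causal variant itself.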

Genetic Power Calculator (Shaun Purcell)

http://pngu.mgh.harvard.edu/~purcell/gpc/

Serum bilirubin: if all GWAS were so simple…

[Figure: mean phenotype with 95% CI by genotype at rs2070959 (A allele count 2, 1, 0); y-axis from -0.500 to 2.000]

38% of phenotypic variance explained

1984
