estimation of quantitative genetic parameters from distant ... 8 l4 p… · estimation of...

36
Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher [email protected] 1

Upload: others

Post on 17-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Estimationofquantitativegeneticparametersfromdistant

relativesusingmarkerdata

[email protected]

1

Page 2: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Keyconcepts• DenseSNPpanelsallowtheestimationoftheexpected

geneticcovariancebetweendistantrelatives(‘unrelateds’)• AmodelbaseduponestimatedrelationshipsfromSNPsis

equivalenttoamodelfittingallSNPssimultaneously• ThetotalgeneticvarianceduetoLDbetweencommonSNPs

and(unknown)causalvariantscanbeestimated• GeneticvariancecapturedbycommonSNPscanbeassigned

tochromosomesandchromosomesegments

2

Page 3: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

1886

3

Page 4: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

4

Page 5: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

100yearslaterHeritabilityofhumanheight

h2 ~ 80%5

Page 6: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

6

Page 7: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Disease Number of loci

Percent of Heritability Measure Explained

Heritability Measure

Age-related macular degeneration

5 50% Sibling recurrence risk

Crohn’s disease 32 20% Genetic risk (liability)

Systemic lupus erythematosus

6 15% Sibling recurrence risk

Type 2 diabetes 18 6% Sibling recurrence risk

HDL cholesterol 7 5.2% Phenotypic variance

Height 40 5% Phenotypic variance

Early onset myocardial infarction

9 2.8% Phenotypic variance

Fasting glucose 4 1.5% Phenotypic variance

WhereistheDarkMatter?

7

Page 8: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Hypothesistestingvs.Estimation

• GWAS=hypothesistesting– Stringentp-valuethreshold– Estimatesofeffectsbiased(“Winner’sCurse”)

• E(bhat|test(bhat)>T)>b{bfixed}• var(bhat)=var(b)+var(bhat|b){brandom}

• CanweestimatethetotalproportionofvariationaccountedforbyallSNPs?

8

Page 9: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Basicidea• Estimatesofadditivegeneticvariancefromknownpedigreeisunbiased– Ifmodeliscorrect– Despitevariationinidentitygiventhepedigree– PedigreegivescorrectexpectedIBD

• Unknownpedigree:estimategenome-wideIBDfrommarkerdata– Estimateadditivegeneticvariancegiventhisestimateofrelatedness

• Ideaisnotnew– (Evolutionary)geneticsliterature(Ritland,Lynch,Hill,others)

9

Page 10: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Closevsdistantrelatives

• Detectionofcloserelatives(fullsibs,parent-offspring,halfsibs)frommarkerdataisrelativelystraightforward

• Butcloserelativesmayshareenvironmentalfactors– Biasedestimatesofgeneticvariance

• Solution:useonly(very)distantrelatives

10

Page 11: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

AmodelforasinglecausalvariantAA AB BB

frequency (1-p)2 2p(1-p) p2

x 0 1 2effect 0 b 2bz=[x-E(x)]/sx -2p/√{2p(1-p)} (1-p)/√{2p(1-p)} 2(1-p)/√{2p(1-p)}

yj = µ’ +xijbi +ej x=0,1,2{standardassociationmodel}

yj = µ +zijuj +ej u=bsx;µ =µ’+bsx

11

Page 12: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Multiple(m)causalvariants

yj = µ +Szijuj +ej

= µ +gj +ej

y = µ1 +g +e

= µ1 +Zu +e

12

Page 13: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Equivalence

Letubearandomvariable,u~N(0,su2)

Thensg2 =msu

2 and

var(y) =ZZ’su2 +Ise

2

=ZZ’(sg2/m)+Ise

2

=Gsg2 +Ise

2

13

Model with individual genome-wide additive values using relationships (G) at the causal variants is equivalent to a model fitting all causal variants

We can estimate genetic variance just as if we would do using pedigree relationships

Page 14: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Butwedon’thavethecausalvariantsIfweestimateG fromSNPs:

– loseinformationduetoimperfectLDbetweenSNPsandcausalvariants

– howmuchwelosedependson• densityofSNPs• allelefrequencyspectrumofSNPsvs.causalvariants

– estimateofvarianceàmissingheritability

14

LetA betheestimateofG fromNSNPs:

Ajk =(1/N)S {xij – 2pi)(xik – 2pi)/{2pi(1-pi)}

=(1/N)S zijzik

Page 15: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Data• ~4000 ‘unrelated’ individuals• Ancestry ~British Isles• Measurement on height (self-report or clinically measured)• GWAS on 300k (‘adults’) or 600k (16-year olds) SNPs

15

Page 16: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Lack of evidence for population stratification within the Australian sample

16

Page 17: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Methods• Estimate realised relationship matrix from

SNPs• Estimate additive genetic variance

22)var( eg ss IAVy +==

)1(2),cov(

)var()var(),cov(

ii

ikij

iikiij

iikiijijk pp

xxaxax

axaxA

-==

iii egy +=

Base population = current population

ïïî

ïïí

ì

=-

++-+

¹-

--

==

å

åå

kjpp

pxpxN

kjpp

pxpxN

AN

A

iii

iijiij

iii

iikiij

i ijkjk

,)1(22)21(11

,)1(2

)2)(2(11

22

17

Page 18: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Statistical analysis

22)var( eg ss IAVy +==

y standardised ~N(0,1)

No fixed effects other than mean

A estimated from SNPs

Residual maximum likelihood (REML)

18

Page 19: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

h2 ~ 0.5 (SE 0.1)

19

Page 20: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Partitioning variation• If we can estimate the variance captured by

SNPs genome-wide, we should be able to partition it and attribute variance to regions of the genome

• “Population based linkage analysis”

20

Page 21: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Genome partitioning

21

• Partition additive genetic variance according to groups of SNPs– Chromosomes– Chromosome segments– MAF bins– Genic vs non-genic regions– Etc.

• Estimate genetic relationship matrix from SNP groups

• Analyse phenotypes by fitting multiple relationship matrices

• Linear model & REML (restricted maximum likelihood)

Page 22: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Application: the GENEVA Consortium• Data

– ~14,000 European Americans• ARIC• NHS• HPFS

– Affy 6.0 genotype data• ~600,000 after stringent QC

– Phenotypes on height, BMI, vWF and QT Interval

22

Page 23: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

QC of SNPs

• 780,062 SNPs after QC steps listed in the table.

• Exclude 141,772 SNPs with MAF < 0.02 in European-ancestry group.

• Exclude 36,949 SNPs with missingness > 2% in all samples.

• Include autosomal SNPs only.

• End up with 577,778 SNPs.23

Page 24: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Results (genome-wide)

24

Page 25: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

1

2

3

4

5

6

789

101112

13

14

15

16

17

1819

20

21

22

0

0.01

0.02

0.03

0.04

0.05

0 50 100 150 200 250

Varia

nceexplainedbyeachchromosom

e

Chromosomelength(Mb)

Slope=1.6×10-4

P =1.4×10-6

R2 =0.695

Height(combined)

12

34

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

2122

0

0.005

0.01

0.015

0.02

0.025

0 50 100 150 200 250

Varia

nceexplainedbyeachchromosom

eChromosomelength(Mb)

Slope=2.3×10-5

P =0.214R2 =0.076

BMI(combined)

25

Genome-partitioning: longer chromosomes explain more variation

Page 26: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

1

234

5

6

7

8

9

10

11

12

13 14

15

16

17

1819

20

21 22

R²=0.511

0.000

0.002

0.004

0.006

0.008

0.010

0.00 0.01 0.02 0.03 0.04 0.05

Varia

nceexplainedbyGIANTheightSNPson

eachch

romosom

e

Varianceexplainedbyeachchromosome

Height (11,586 unrelated)

1

2

34

5

6

7

8

9

10

11

12

13

14

1516

17

18

19

20

2122

0.000

0.004

0.008

0.012

0.016

0.020

0.000 0.004 0.008 0.012 0.016 0.020

Varia

nceexplainedbychrom

osom

e(adjustedfortheFTO

SNP)

Varianceexplainedbychromosome(noadjustment)

BMI(11,586unrelated)

FTO

26

Results are consistent with reported GWAS

Page 27: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

1

234

5

6

78

9

1011

12

13

14

151617

18

192021220.00

0.02

0.04

0.06

0.08

0.10

0.12

0.14

0.16

0 50 100 150 200 250

Varia

nceexplainedbyeachchromosom

e

Chromosomelength(Mb)

Slope=6.9×10-5

P =0.524R2 =0.021

vWF(ARIC)

27

Inference robust with respect to genetic architecture

1

2

3

4

5

6

7

8

9

1011

12

13

14

151617

18

192021

22

0.00

0.03

0.06

0.09

0.12

0.15

0.00 0.03 0.06 0.09 0.12 0.15

Varia

nceexplainedbychrom

osom

e(adjustedfortheABO

SNP)

Varianceexplainedbychromosome(noadjustment)

vWF(6,662unrelated)

ABO

Page 28: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

0

0.01

0.02

0.03

0.04

0.05

0.06

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

Varia

nceexplained

Chromosome

intergenic(± 20Kb)

genic(± 20Kb)

Height(combined)17,277proteincoding geneshGg

2 =0.328(s.e.=0.024)hGi

2 =0.126(s.e.=0.022)Coverageofgenicregions=49.4%P(observedvs.expected)=2.1x10-10

0

0.005

0.01

0.015

0.02

0.025

0.03

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

Varia

nceexplained

Chromosome

intergenic(± 20Kb)

genic(± 20Kb)

BMI(combined)17,277proteincoding geneshGg

2 =0.117(s.e.=0.023)hGi

2 =0.047(s.e.=0.022)Coverageofgenicregions=49.4%P(observedvs.expected)=0.022

28

Genic regions explain variation disproportionately

Page 29: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Using imputed sequence data

• How much information is gained by using SNP array data imputed to a fully sequenced reference?

• How much is lost relative to whole genome sequencing?

29Yang et al. 2015 (Nature Genetics)

Page 30: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Accounting for LD and MAF spectrum allows unbiased estimation of genetic variance (GREML-LDMS)

30

0.0#

0.2#

0.4#

0.6#

0.8#

1.0#

1.2#

Random# More#common# Rarer# Rarer#&#DHS#

Herita

bility*es,m

ate*

GREML:SC#GREML:MS#LDAK#LDAK:MS#LDres#LDres:MS#GREML:LDMS#

Yang et al. 2015 (Nature Genetics)

0.0#

0.2#

0.4#

0.6#

0.8#

1.0#

Random# More#common# Rarer# Rarer#&#DHS#

Heritab

ility*es

,mate*

7MAF_4LD#

7MAF_3LD#

7MAF_2LD#

2MAF_2LD#

Page 31: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Very little difference in “taggability” between SNP chips

31

Genetic variation captured after imputation96% due to common variants73% due to rare variants

0.0#

0.2#

0.4#

0.6#

0.8#

1.0#

0# 0.1# 0.2# 0.3# 0.4# 0.5# 0.6# 0.7# 0.8# 0.9#

Prop

or%o

n'of'varia%o

n'captured

'

Imputa%on'R2'threshold'

Common#1#Affymetrix#6#

Common#1#Affymetrix#Axiom#

Common#1#Illumina#OmniExpress#

Common#1#Illumina#Omni2.5#

Common#1#Illumina#CoreExome#

Rare#1#Affymetrix#6#

Rare#1#Affymetrix#Axiom#

Rare#1#Illumina#OmniExpress#

Rare#1#Illumina#Omni2.5#

Rare#1#Illumina#CoreExome#

Yang et al. 2015 (Nature Genetics)

Page 32: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

n = 45k data on height and BMI

32

Totals~60% for height~30% for BMI

0.00#

0.05#

0.10#

0.15#

0.20#

0.25#

<#0.1# 0.1#~#0.2# 0.2#~#0.3# 0.3#~#0.4# 0.4#~#0.5#

Varia

nce(explaine

d(

MAF(stra2fied(variant(group(

Height# BMI#

Yang et al. 2015 (Nature Genetics)

0.00#

0.02#

0.04#

0.06#

0.08#

0.10#

0.12#

0.14#

2.5e+5#~#0.001#

0.001#~#0.01#

0.01#~#0.1#

0.1#~#0.2#

0.2#~#0.3#

0.3#~#0.4#

0.4#~#0.5#

Varia

nce(explaine

d(

MAF(

1st#quar4le#(low#LD)#

2nd#quar4le#

3rd#quar4le#

4th#quar4le#(high#LD)#

0.00#

0.01#

0.02#

0.03#

0.04#

0.05#

0.06#

0.07#

2.5e,5#~#0.001#

0.001#~#0.01#

0.01#~#0.1#

0.1#~#0.2#

0.2#~#0.3#

0.3#~#0.4#

0.4#~#0.5#

Varia

nce(explaine

d(

MAF(

Page 33: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

100%

80%

45%

16%

SlidebyRobertMaier 33

h2 overestimation?untaggedrarevariants?

better tagging of ungenotyped variants

samplesize/power

Partitioning variance of height

TotalvarianceHeritability (based on Twin or family studies)SNP heritability from imputation to sequenced referenceSNP-heritability (variance explained by all genotyped SNPs ontheChip)VarianceexplainedbygenomewidesignificantSNPs

missingheritability60%

Page 34: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Scaling revisitedu = bsx ~ N(0, su

2) implies

b2 proportional to su2/[2p(1-p)], so rare variants have larger allelic effect: natural

selection

If b2 = su2 then no relationship between frequency and effect size: neutral

model

In between: b2 = su2 [2p(1-p)]-s

Variance explained by SNP: 2p(1-p)su2[2p(1-p)]-s

= su2 [2p(1-p)]1-s

s = 0: common SNPs explain more variations = 1: all SNPs explain the same amount of variation

34

Page 35: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Multiple methods to estimate additive genetic variance

• Individual-level data– GREML– Haseman-Elston regression(yjyj) = µ + bAij

• Summary dataLDscore regression

• Consideration:– data availability– model assumptions– computation

35

Page 36: Estimation of quantitative genetic parameters from distant ... 8 L4 P… · Estimation of quantitative genetic parameters from distant relatives using marker data Peter M. Visscher

Key concepts• Dense SNP panels allow the estimation of the expected

genetic covariance between distant relatives (‘unrelateds’)

• A model based upon estimated relationships from SNPs is equivalent to a model fitting all SNPs simultaneously

• The total genetic variance due to LD between common SNPs and (unknown) causal variants can be estimated

• Genetic variance captured by common SNPs can be assigned to chromosomes and chromosome segments

36