Download - Estimation of genetic variation and SNP- heritability from

1

EstimationofgeneticvariationandSNP-heritabilityfromGWASdata

• DenseSNPpanelsallowtheestimationoftheexpectedgeneticcovariancebetweendistantrelatives

• AmodelbaseduponestimatedrelationshipsfromSNPsisequivalenttoamodelfittingallSNPssimultaneously

• ThetotalgeneticvarianceduetoLDbetweencommonSNPsand(unknown)causalvariantscanbeestimated

• GeneticvariancecapturedbycommonSNPscanbepartitionedacrossthegenome

• DifferentmethodstoestimaterelatednessfromSNPsassumedifferentgenetictraitarchitectures

2

Keyconcepts

EstimationofSNP-heritabilityfromGWASdata

Background– 2008:GWASwasperceivedbymanytohavefailedasanexperimentaldesign

– Missingheritability:discrepancybetweenpedigreeheritabilityandvariancecapturedbyassociatedSNPs

3

Disease Number of loci

Percent of Heritability Measure Explained

Heritability Measure

Age-related macular degeneration

5 50% Sibling recurrence risk

Crohn’s disease 32 20% Genetic risk (liability)

Systemic lupus erythematosus

6 15% Sibling recurrence risk

Type 2 diabetes 18 6% Sibling recurrence risk

HDL cholesterol 7 5.2% Phenotypic variance

Height 40 5% Phenotypic variance

Early onset myocardial infarction

9 2.8% Phenotypic variance

Fasting glucose 4 1.5% Phenotypic variance

WhereistheDarkMatter?

4

Hypothesistestingvs.Estimation

GWAS=hypothesistesting– Stringentp-valuethreshold– Estimatesofeffectsbiased(“Winner’sCurse”)

Canweestimate thetotalproportionofvariationaccountedforbyallSNPs?

5

Amodelforasinglecausalvariant

AA AB BBfrequency (1-p)2 2p(1-p) p2

x 0 1 2effect 0 b 2bw =[x-E(x)]/sx -2p/√{2p(1-p)} (1-2p)/√{2p(1-p)} 2(1-p)/√{2p(1-p)}

yj = µ’ +xijbi +ej x=0,1,2{standardassociationmodel}

yj = µ +wijuj +ej u=bsx;µ =µ’+bsx

6

yj = µ +Swijuj +ej

= µ +gj +ej

y = µ1 +g +e

= µ1 +Wu +e

7

Weighting scheme 1

Multiple(M)causalvariants

Letubearandomvariable,u~N(0,su2)

Thensg2 =Msu

2

var(y)=WW’ su2 +Ise

2

=WW’(sg2/M)+Ise

2

=Gsg2 +Ise

2

8

Model with individual genome-wide additive values using relationships (G) at the causal variants is equivalent to a model

fitting all causal variants

We can estimate genetic variance just as if we would do using pedigree relationships

Equivalence

IfweestimateG fromSNPs:– loseinformationduetoimperfectLDbetweenSNPsandcausalvariants

– howmuchwelosedependson• densityofSNPs• allelefrequencyspectrumofSNPsvs.causalvariants

– estimateofvarianceàmissingheritability

9

G fromMSNPs:

Gjk =(1/M)S {xij – 2pi)(xik – 2pi)/{2pi(1-pi)}

=(1/M)S wijwik

Butwedon’thavethecausalvariants

• EstimaterealisedrelationshipmatrixfromSNPs• Estimateadditivegeneticvariance

y =Xb +e =Wu +e,var(y)=Gsg2 +Ise

2

Gjk =(1/M)S {xij – 2pi)(xik – 2pi)/{2pi(1-pi)}=(1/M)S wijwik

• Basepopulation=currentpopulation• Weightingscheme1 10

Methods(Yangetal.2010)

var(y) =V =Gσ g2 + Iσ e

2

y standardised~N(0,1)

Nofixedeffectsotherthanmean

G estimatedfromSNPs

Residualmaximumlikelihood(REML)

11

Statisticalanalysis

h2 ~ 0.5 (SE 0.1)

12

Results

[Yang et al. 2010, Nature Genetics]

13[Visscher et al. 2010, Twin Research and Human Genetics]

Checkingforpopulationstructure

GeneticvarianceassociatedwithallSNPscanbeestimatedfromGWASdata

– useSNPstoestimateG– usephenotypeson“unrelated”individualsandGtoestimategeneticvariance

Empiricalresults:mostadditivegeneticvariationforheightiscapturedbycommonSNPs

– little‘missing’heritability– GWASworksfine

14

ConclusionsYangetal.2010

y = mean + g1 + g2 + g3 + g4 + g5 + evar(gi) = (WiWi’/Mi)σi

2 for SNPs in group i

Examples of groupings:• chromosome• genome annotation• MAF• LD

15

Partitioning of genetic variation

IfwecanestimatethevariancecapturedbySNPsgenome-wide,weshouldbeabletopartitionitandattributevariancetoregionsofthegenome

“Populationbasedlinkageanalysis”

Application(2):partitioningvariation

16

17

Exampleonquantitativetraits


1

2

3

4

5

6

789

101112

13

14

15

16

17

1819

20

21

22

0

0.01

0.02

0.03

0.04

0.05

0 50 100 150 200 250

Varia

nceexplainedbyeachchromosom

e

Chromosomelength(Mb)

Slope=1.6×10-4

P =1.4×10-6

R2 =0.695

Height(combined)

12

34

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

2122

0

0.005

0.01

0.015

0.02

0.025

0 50 100 150 200 250

Varia


e


Slope=2.3×10-5

P =0.214R2 =0.076

BMI(combined)

18

longerchromosomesexplainmorevariation

Partitioning onchromosomes


1

234

5

6

7

8

9

10

11

12

13 14

15

16

17

1819

20

21 22

R²=0.511

0.000

0.002

0.004

0.006

0.008

0.010

0.00 0.01 0.02 0.03 0.04 0.05

Varia

nceexplainedbyGIANTheightSNPson

eachch

romosom

e

Varianceexplainedbyeachchromosome

Height (11,586 unrelated)

1

2

34

5

6

7

8

9

10

11

12

13

14

1516

17

18

19

20

2122

0.000

0.004

0.008

0.012

0.016

0.020

0.000 0.004 0.008 0.012 0.016 0.020

Varia

nceexplainedbychrom

osom

e(adjustedfortheFTO

SNP)

Varianceexplainedbychromosome(noadjustment)

BMI(11,586unrelated)

FTO

19

ResultsareconsistentwithreportedGWAS


1

234

5

6

78

9

1011

12

13

14

151617

18

192021220.00

0.02

0.04

0.06

0.08

0.10

0.12

0.14

0.16

0 50 100 150 200 250

Varia


e


Slope=6.9×10-5

P =0.524R2 =0.021

vWF(ARIC)

20

Inferencerobustwithrespecttogeneticarchitecture

1

2

3

4

5

6

7

8

9

1011

12

13

14

151617

18

192021

22

0.00

0.03

0.06

0.09

0.12

0.15

0.00 0.03 0.06 0.09 0.12 0.15

Varia

nceexplainedbychrom

osom

e(adjustedfortheABO

SNP)

Varianceexplainedbychromosome(noadjustment)

vWF(6,662unrelated)

ABO


0

0.01

0.02

0.03

0.04

0.05

0.06

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

Varia

nceexplained

Chromosome

intergenic(± 20Kb)

genic(± 20Kb)

Height(combined)17,277proteincoding geneshGg

2 =0.328(s.e.=0.024)hGi

2 =0.126(s.e.=0.022)Coverageofgenicregions=49.4%P(observedvs.expected)=2.1x10-10

0

0.005

0.01

0.015

0.02

0.025

0.03

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

Varia

nceexplained

Chromosome

intergenic(± 20Kb)

genic(± 20Kb)

BMI(combined)17,277proteincoding geneshGg

2 =0.117(s.e.=0.023)hGi

2 =0.047(s.e.=0.022)Coverageofgenicregions=49.4%P(observedvs.expected)=0.022

21

Genic regionsexplainvariationdisproportionately


Partitioning ongenomeannotation

Application(3):Usingimputedsequencedata

HowmuchinformationisgainedbyusingSNParraydataimputedtoafullysequencedreference?

Howmuchislostrelativetowholegenomesequencing?

PartitionvariationaccordingtoMAFandLD

22[Yang et al. 2015 Nature Genetics]

AccountingforLDandMAFspectrumallowsunbiasedestimationofgeneticvariance

23

0.0#

0.2#

0.4#

0.6#

0.8#

1.0#

1.2#

Random# More#common# Rarer# Rarer#&#DHS#

Herita

bility*es,m

ate*

GREML:SC#GREML:MS#LDAK#LDAK:MS#LDres#LDres:MS#GREML:LDMS#

[Yang et al. 2015 Nature Genetics]

0.0#

0.2#

0.4#

0.6#

0.8#

1.0#

Random# More#common# Rarer# Rarer#&#DHS#

Heritab

ility*es

,mate*

7MAF_4LD#

7MAF_3LD#

7MAF_2LD#

2MAF_2LD#

Verylittledifferencein“taggability”betweenSNPchips

24

Genetic variation captured after imputation:96% due to common variants73% due to rare variants

0.0#

0.2#

0.4#

0.6#

0.8#

1.0#

0# 0.1# 0.2# 0.3# 0.4# 0.5# 0.6# 0.7# 0.8# 0.9#

Prop

or%o

n'of'varia%o

n'captured

'

Imputa%on'R2'threshold'

Common#1#Affymetrix#6#

Common#1#Affymetrix#Axiom#

Common#1#Illumina#OmniExpress#

Common#1#Illumina#Omni2.5#

Common#1#Illumina#CoreExome#

Rare#1#Affymetrix#6#

Rare#1#Affymetrix#Axiom#

Rare#1#Illumina#OmniExpress#

Rare#1#Illumina#Omni2.5#

Rare#1#Illumina#CoreExome#


n=45kdataonheightandBMI

25

Totals~60% for height~30% for BMI

0.00#

0.05#

0.10#

0.15#

0.20#

0.25#

<#0.1# 0.1#~#0.2# 0.2#~#0.3# 0.3#~#0.4# 0.4#~#0.5#

Varia

nce(explaine

d(

MAF(stra2fied(variant(group(

Height# BMI#


0.00#

0.02#

0.04#

0.06#

0.08#

0.10#

0.12#

0.14#

2.5e+5#~#0.001#

0.001#~#0.01#

0.01#~#0.1#

0.1#~#0.2#

0.2#~#0.3#

0.3#~#0.4#

0.4#~#0.5#

Varia

nce(explaine

d(

MAF(

1st#quar4le#(low#LD)#

2nd#quar4le#

3rd#quar4le#

4th#quar4le#(high#LD)#

0.00#

0.01#

0.02#

0.03#

0.04#

0.05#

0.06#

0.07#

2.5e,5#~#0.001#

0.001#~#0.01#

0.01#~#0.1#

0.1#~#0.2#

0.2#~#0.3#

0.3#~#0.4#

0.4#~#0.5#

Varia

nce(explaine

d(

MAF(

100%

80%

45%

16%

SlidebyRobertMaier 26

h2 overestimation?untaggedrarevariants?

better tagging of ungenotyped variants

samplesize/power

Partitioningvarianceofheight

TotalvarianceHeritability (based on Twin or family studies)SNP heritability from imputation to sequenced referenceSNP-heritability (variance explained by all genotyped SNPs ontheChip)VarianceexplainedbygenomewidesignificantSNPs

missingheritability60%

Estimated relatedness and trait architecture

𝐺"#$ = (1

∑ 2𝑝+ 1 − 𝑝+ -./0+1-

)3 𝑧+#𝑧+$ 2𝑝+ 1 − 𝑝+ /0

+1-

If G describes the genetic covariance between individuals [var(g) = Gsg

2], then what is the equivalent linear model in terms of SNP effects?

27

Equivalent models

𝐲 = 𝟏𝜇 + ∑ 𝐗#𝛽#;# + e

𝛽#~𝑁(0, 2𝑝#(1 − 𝑝#)/ 𝜎AB)

ℎ#B = 2𝑝# 1 − 𝑝# 𝐸 𝛽B = 2𝑝#(1 − 𝑝#)-./𝜎AB

28

S = -1

𝛽#~𝑁(0, 2𝑝#(1 − 𝑝#F- 𝜎AB)

ℎ#B = 2𝑝#(1 − 𝑝#)G𝜎AB = 𝜎AB

• Weighting scheme 1• All SNPs contribute equally to heritability• Rare variants have bigger effects• “Purifying selection model”

29

S = 0

𝛽#~𝑁 0, 2𝑝# 1 − 𝑝#G 𝜎AB ~𝑁(0, 𝜎AB)

ℎ#B = 2𝑝#(1 − 𝑝#)-𝜎AB

• Weighting scheme 2• Common SNPs contribute more to heritability• Rare and common variants have same effects• “Neutral model”

30

Weighting scheme and genetic architecture

• Weighting schemes 1 and 2 can be justified in two ways:– IBD vs IBS– A priori assumption about the relationship

between allele frequency of effect size (natural selection)

• Can we estimate genetic architecture from the data?

31

Bayesian mixture model (BayesS)

𝒚 = 𝟏𝜇 +3 𝑿#𝛽#�

#+ 𝒆

𝛽#~𝑁 0, 2𝑝#𝑞#/𝜎AB 𝜋 + 𝜙 1 − 𝜋

• S measures the relationship between effect size and MAF 𝑝#– S = 0: independence– S < 0: negatively related (rare variant tends to have large effect)– S > 0: positively related (common variant tends to have large effect)– GCTAdefault:S=-1

• 𝜋 is the polygenicity (proportion of SNPs with non-zero effects)

• ℎ/_`B = 𝑉𝑎𝑟 𝑔 /𝜎fB where g = ∑ 𝑿#𝛽#�#

• Simultaneously estimate SNP effects and genetic architecture parameters using MCMC

• Account for LD between SNPs

Zeng …. Yang 2017 (BioRxiv)

Direction of S distinguishes stabilising selection from directional and disruptive selection

S = 0 S = 0.14 S = 0.13 S = −0.19 S = 0.09

S = 3.70

Neutral Directional (+) Directional (−) Stabilizing Disruptive

0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.5

0.0

0.5

1.0

1.5

2.0

2.5

MAF Bin

Varia

nce

of C

oded

Alle

le E

ffect

s

𝛽#~𝑁 0, 1 + 2𝑝#𝑞#/𝜎AB


Height vs. BMI

Polygenicity

Heritability

S

0.04 0.06 0.08 0.10

0.25 0.30 0.35 0.40 0.45 0.50 0.55

−0.5 −0.4 −0.3 −0.20

5

10

15

20

0

30

60

90

0

100

200

300

Value

Density

Height

BMI

• Both height and BMI have

been under selection

• Selection has been stronger for

height than BMI-associated

SNPs

• Height is more heritable than

BMI.

• BMI is more polygenic than

height.

Posterior distribution of genetic architecture parameters


●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

S Heritability Polygenicity

−0.75 −0.50 −0.25 0.00 0.25 0.1 0.2 0.3 0.4 0.5 0.00 0.05 0.10 0.15Fluid intelligence score

Neuroticism scoreMDD

Birth weightBody fat percentage

Diastolic blood pressureBMI

WeightT2D

BaldnessSystolic blood pressure

Peak expiratory flowEducational attainment

Forced expiratory volumeBasal metabolic rate

Hand grip strength leftAge menarche

Age at first live birthHeel BMD T score

Heel QUIHand grip strength right

Forced vital capacityHeight

HCadjBMIMean time to correctly identify matches

WHRadjBMIWCadjBMIPulse rate

Age at menopause

Highest Probability Density

40000

60000

80000

100000

120000N

29 traits in UKB


24 traits with significant 𝑆h

0.00

0.25

0.50

0.75

1.00

0.0 0.1 0.2 0.3 0.4 0.5MAF

Cum

ulat

ive g

enet

ic v

aria

nce

expl

aine

d

Age at menopause

Pulse rate

Mean time to correctly identify matches

WCadjBMI

Age at first live birth

WHRadjBMI

Hand grip strength right

HCadjBMI

Hand grip strength left

Forced vital capacity

Age menarche

Educational attainment

Forced expiratory volume

Basal metabolic rate

Systolic blood pressure

Heel QUI

Peak expiratory flow

Heel BMD T score

Height

Weight

Diastolic blood pressure

BMI

Body fat percentage

Baldness

●

●

●

●

●

●

●

●

●●

●●

●

●

●● ●

●

●

●

●

●

●

●

0.52

0.54

0.56

0.58

0.60

0.3 0.4 0.5 0.6

Absolute value of S

AUC


0

5

10

15

−0.6 −0.3 0.0 0.3 0.6Posterior Mode

Cou

nt

S

Heritability

Polygenicity

●

●

●●● ● ●●● ●●

●●● ●●●●●

●● ●

●●

●

●

●●

●

r = 0.048

0.0

0.2

0.4

0.6

0.1 0.2 0.3 0.4 0.5Heritability

Abso

lute

val

ue o

f S

●

●

●● ●● ●●●●●

●● ●●●●●●

● ● ●

●●

●

●

●●

●

r = −0.359

0.0

0.2

0.4

0.6

0.00 0.05 0.10Polygenicity

Abso

lute

val

ue o

f S

● ●

●

●

●

●

●

●

●

●●

●●

●

●

●

●

●●

●

●

●

●

●

●

● ●●

●

r = −0.021

0.1

0.2

0.3

0.4

0.5

0.00 0.05 0.10Polygenicity

Her

itabi

lity

Mean Median

S -0.350 -0.367

h2SNP 0.223 0.222

𝜋 5.8% 5.1%

Summarize over 29 traits


Multiplemethodstoestimateadditivegeneticvariance

Individual-leveldata- GREML- Haseman-Elston regression

(yjyk)=mean +bGjk +e

Summarydata- LDscore regression

Consideration:- dataavailability- modelassumptions- computation

38

• Dense SNP panels allow the estimation of the expected genetic covariance between distant relatives

• A model based upon estimated relationships from SNPs is equivalent to a model fitting all SNPs simultaneously

• The total genetic variance due to LD between common SNPs and (unknown) causal variants can be estimated

• Genetic variance captured by common SNPs can be partitioned across the genome

• Different methods to estimate relatedness from SNPs assume different genetic trait architectures

39

Key concepts

Download - Estimation of genetic variation and SNP- heritability from

Top Related