Partitioning heritability using GWAS summary statistics with LD Score regression

Partitioning heritability by functional annotation using summary statistics

Hilary Finucane
MIT Department of Mathematics
HSPH Department of Epidemiology
October 21, 2014

Acknowledgements

• Brendan Bulik-Sullivan
• Alkes Price
• Ben Neale
• Alexander Gusev
• Nick Patterson
• Po-Ru Loh
• Gosia Trynka
• Han Xu
• Verneri Anttila
• Yakir Reshef
• Chongzhi Zang
• Stephan Ripke
• Schizophrenia Working Group of the PGC
• Shaun Purcell
• Mark Daly
• Eli Stahl
• Soumya Raychaudhuri
• Sara Lindstrom

Partitioning heritability by functional annotation is an important goal

• Learn about genetic architecture of disease
  – Where does the heritability lie?
• Learn about disease biology
  – What are the relevant cell types?
• Learn about the functional annotations
  – Which functional annotations show the highest enrichments?
• Downstream applications
  – Fine mapping
  – Risk prediction
  – GWAS priors

Maurano et al. 2012 Science; Trynka et al. 2013 Nat Genet; Pickrell 2014 AJHG

What is partitioned heritability?

• Our model is

$$Y = \sum_j X_j \beta_j + \varepsilon$$

where
  – Y is an individual’s phenotype,
  – X_j is an individual’s genotype at the j-th SNP (normalized to mean 0 and variance 1),
  – β_j is the effect of SNP j, and
  – ε is noise and random environmental effects.

• We define heritability (taking Y to have variance 1) as

$$h^2 = \sum_j \beta_j^2$$

and the heritability of a category C as

$$h^2(C) = \sum_{j \in C} \beta_j^2$$
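As a minimal sketch of these definitions, the additive model can be simulated directly. Everything concrete here is an assumption made for illustration: the sample sizes, the hypothetical category (the first 20% of SNPs), the effect-size scales, independent genotypes (no LD), and a phenotype scaled so that Var(Y) is close to 1, which lets heritability be read off as a sum of squared effects.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10_000, 200                       # individuals, SNPs (hypothetical sizes)
category = np.arange(m) < 40             # hypothetical annotation: first 20% of SNPs

# Draw effects, giving category SNPs a larger per-SNP variance (illustrative).
beta = rng.normal(0.0, np.where(category, 0.08, 0.03))
h2 = np.sum(beta ** 2)                   # total heritability
h2_cat = np.sum(beta[category] ** 2)     # heritability of the category

# Genotypes with mean 0 and variance 1; independent columns, i.e. no LD.
X = rng.standard_normal((n, m))
eps = rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
Y = X @ beta + eps                       # Var(Y) is close to 1 by construction

prop_h2 = h2_cat / h2
prop_snps = category.mean()
print(f"h2 = {h2:.3f}, h2(C) = {h2_cat:.3f}, Var(Y) = {Y.var():.3f}")
print(f"{prop_h2:.0%} of h2 lies in {prop_snps:.0%} of SNPs")
```

With real data the effects are unknown, so these sums cannot be computed directly; the sketch only makes concrete what quantity is being estimated.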

Partitioning heritability using variance components has yielded many insights

• 31% of schizophrenia SNP-heritability lies in CNS+ gene regions spanning 20% of the genome [1].

• 28% of Tourette syndrome SNP-heritability and 29% of OCD SNP-heritability lie in parietal lobe eQTLs spanning 5% of the genome [2].

• 79% of SNP-heritability, averaged across WTCCC and WTCCC2 traits, lies in DHS regions spanning 16% of the genome [3].

[1] Lee et al. 2012 Nat Genet
[2] Davis et al. 2013 PLoS Genet
[3] Gusev et al. in press AJHG

A method for partitioning heritability from summary statistics is needed

• Variance components methods are intractable at very large sample sizes.

• There is lots of information in large meta-analyses.

• Lots of publicly available summary statistics allow us to compare many phenotypes and many annotations to get a big picture.

Our method partitions heritability from summary statistics

• Input:
  – Sample size and p-value for every SNP tested in a large GWAS of a quantitative or case-control trait
  – LD information from a reference panel like 1000G
  – Genome annotation of interest
  – Other genome annotations to include in the model
• Output:
  – Estimated proportion of heritability that falls within the annotation of interest
  – Enrichment = (% of heritability) / (% of SNPs)
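The enrichment ratio is simple arithmetic. The sketch below is an illustration only (the function name `enrichment` is hypothetical); it plugs in the conserved-region figures quoted in the Conclusions: 2.6% of SNPs explaining about 30% of heritability.

```python
def enrichment(prop_h2: float, prop_snps: float) -> float:
    """Enrichment = (proportion of heritability) / (proportion of SNPs)."""
    if not 0.0 < prop_snps <= 1.0:
        raise ValueError("prop_snps must be in (0, 1]")
    return prop_h2 / prop_snps


# Conserved regions: 2.6% of SNPs, about 30% of heritability on average.
print(round(enrichment(0.30, 0.026), 1))  # prints 11.5
```

An enrichment of 1 means the category carries exactly its share of heritability; values above 1 indicate enrichment and values below 1 indicate depletion.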

Outline

• Description of method

• Validation on simulated data

• Results on real data

LD is important for summary statistics-based methods

• Some SNPs have a lot of LD to other SNPs in the same category.

• Some SNPs have a lot of LD to SNPs in other categories.

• Some SNPs do not have a lot of LD to other SNPs.

Our solution: LD Score Regression. See Bulik-Sullivan et al., bioRxiv (under revision, Nat Genet) and ASHG 2014 poster 1787T (Bulik-Sullivan).

LD Score Regression: basic intuition

[Figure: chi-square statistics (y-axis) for SNPs in a high LD region and in a low LD region]

• Polygenicity causes more chi-square statistic inflation in high LD regions than in low LD regions.

Mean chi-square for high LD region: high
Mean chi-square for low LD region: low
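The intuition can be checked with a small simulation. This is a sketch under stated assumptions, not the analysis from the talk: genotypes in the high LD region share a latent factor within blocks of 10 SNPs (pairwise correlation 0.9), SNPs in the low LD region are independent, every SNP is causal with the same effect-size variance, and the association statistic is taken to be N times the squared genotype-phenotype correlation. All sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, block_size, n_blocks, n_single = 4000, 10, 50, 500
rho = 0.9                                   # within-block correlation (r^2 = 0.81)

# High LD region: SNPs in a block share a latent factor.
shared = np.repeat(rng.standard_normal((n, n_blocks)), block_size, axis=1)
noise = rng.standard_normal((n, n_blocks * block_size))
X_high = np.sqrt(rho) * shared + np.sqrt(1 - rho) * noise
# Low LD region: independent SNPs.
X_low = rng.standard_normal((n, n_single))

X = np.hstack([X_high, X_low])
X = (X - X.mean(axis=0)) / X.std(axis=0)    # mean 0, variance 1 per SNP
m = X.shape[1]

# Polygenic phenotype: every SNP is causal, same effect variance everywhere.
h2 = 0.5
beta = rng.normal(0.0, np.sqrt(h2 / m), size=m)
y = X @ beta + rng.normal(0.0, np.sqrt(1 - h2), size=n)
y = (y - y.mean()) / y.std()

# Marginal association statistic per SNP: chi^2_j = N * corr(X_j, y)^2.
chi2 = n * (X.T @ y / n) ** 2

is_high = np.arange(m) < n_blocks * block_size
mean_high = chi2[is_high].mean()
mean_low = chi2[~is_high].mean()
print(f"mean chi-square, high LD: {mean_high:.1f}; low LD: {mean_low:.1f}")
```

Per-SNP effects are drawn identically in both regions, so any gap in mean chi-square comes from LD: a SNP in a block tags the effects of its neighbors as well as its own.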

Multivariate LD Score Regression: basic intuition

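Enriched category: BIG difference between lots of LD vs little LD to the category
  – Lots of LD to the category: high chi-square
  – Little LD to the category: low chi-square

Depleted category: SMALL difference between lots of LD vs little LD to the category
  – Lots of LD to the category: low chi-square
  – Little LD to the category: low chi-square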

Multivariate LD Score regression allows us to partition SNP heritability

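• Multivariate LD Score: the sum over all SNPs in a category of r^2 with that SNP.

$$\ell(j, C) = \sum_{k \in C} r_{jk}^2$$

• Derivations based on a polygenic model give (in the absence of confounding):

$$E[\chi_j^2] = N \sum_C \tau_C \, \ell(j, C) + 1$$

where N is the sample size and τ_C is the per-SNP heritability of category C, so that for disjoint categories h^2(C) = M(C) τ_C with M(C) the number of SNPs in C.

• Easily extends to overlapping categories.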

To estimate partitioned heritability:
• Estimate LD Scores from a reference panel.
• Regress chi-square statistics on LD Scores.
• The slopes give the partitioned heritability.
• For best results, use many categories!
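The three steps can be sketched on simulated summary statistics. This is a toy illustration under several assumptions that are not from the talk: two disjoint categories A and B, independent LD blocks with a known correlation r_jk = rho^|j-k| standing in for a reference panel, z-scores drawn from the usual large-sample approximation, an intercept fixed at 1 (no confounding), and plain unweighted least squares. It is not the ldsc software, and all sizes and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n_gwas = 20_000                    # GWAS sample size N (hypothetical)
n_blocks, b, rho = 2_000, 10, 0.9  # independent LD blocks of b SNPs each
true_h2 = {"A": 0.3, "B": 0.2}     # category A: ~20% of SNPs, 60% of h2

# Population LD within a block: r_jk = rho^|j-k| (stands in for a reference panel).
idx = np.arange(b)
R = rho ** np.abs(idx[:, None] - idx[None, :])
in_A = rng.random((n_blocks, b)) < 0.2          # disjoint categories A and B
M = {"A": in_A.sum(), "B": (~in_A).sum()}

# Step 1: LD Scores, l(j, C) = sum of r^2_jk over SNPs k in category C.
ld_A = in_A.astype(float) @ R**2
ld_B = (~in_A).astype(float) @ R**2

# Simulate summary statistics: z ~ N(sqrt(N) * R @ beta, R) within each block.
tau = np.where(in_A, true_h2["A"] / M["A"], true_h2["B"] / M["B"])
beta = rng.normal(0.0, np.sqrt(tau))
L = np.linalg.cholesky(R)
z = np.sqrt(n_gwas) * beta @ R + rng.standard_normal((n_blocks, b)) @ L.T
chi2 = z**2

# Step 2: regress (chi2 - 1) on N * LD Scores (intercept fixed at 1, unweighted).
design = n_gwas * np.column_stack([ld_A.ravel(), ld_B.ravel()])
tau_hat, *_ = np.linalg.lstsq(design, chi2.ravel() - 1.0, rcond=None)

# Step 3: slopes are per-SNP heritabilities; scale by category size.
h2_hat = {"A": tau_hat[0] * M["A"], "B": tau_hat[1] * M["B"]}
prop_h2_A = h2_hat["A"] / (h2_hat["A"] + h2_hat["B"])
enrichment_A = prop_h2_A / (M["A"] / (M["A"] + M["B"]))
print(f"h2(A) = {h2_hat['A']:.3f} (true 0.3), h2(B) = {h2_hat['B']:.3f} (true 0.2)")
print(f"enrichment of A = {enrichment_A:.2f} (true value close to 3)")
```

The regression only needs chi-square statistics, the sample size, and LD Scores, which is why no individual-level genotypes appear after the simulation step. Fixing the intercept and skipping regression weights keeps the sketch short; both are simplifications.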

Multivariate LD Score regression works in simulations

• Standard errors are over 100 simulations.
• Simulated quantitative phenotype with h2 = 0.5.
• M = 110,444, N = 2,713

Simulation                                          True h2(DHS)   REML (2 cat)    LD Score (27 cat)
Null simulations                                    0.092          0.089 (0.006)   0.086 (0.012)
DHS 3x enriched                                     0.276          0.281 (0.006)   0.278 (0.013)
FANTOM5 Enhancer* causal, excluded from the model   0.379          0.531 (0.007)   0.361 (0.015)

* Andersson et al. 2014 Nature
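To compare the two methods across scenarios, one option is the relative bias of each estimate against the true h2(DHS). The sketch below only re-expresses the simulation results above; the helper `relative_bias` and the scenario labels are illustrative names, not part of the original analysis.

```python
# Values copied from the simulation results above: (true, REML, LD Score) h2(DHS).
results = {
    "Null simulations": (0.092, 0.089, 0.086),
    "DHS 3x enriched": (0.276, 0.281, 0.278),
    "FANTOM5 Enhancer causal, excluded from the model": (0.379, 0.531, 0.361),
}


def relative_bias(estimate: float, truth: float) -> float:
    """Signed error of an estimate as a fraction of the true value."""
    return (estimate - truth) / truth


bias = {
    name: (relative_bias(reml, true), relative_bias(ldsc, true))
    for name, (true, reml, ldsc) in results.items()
}
for name, (b_reml, b_ldsc) in bias.items():
    print(f"{name}: REML {b_reml:+.1%}, LD Score {b_ldsc:+.1%}")
```

In the FANTOM5 scenario the two-category REML estimate overshoots the truth by about 40%, while the 27-category LD Score estimate is within about 5%; in the other two scenarios both methods land within a few percent.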


Datasets analyzed

Phenotype                 Citation                                    Sample size
Schizophrenia             SCZ working grp of the PGC, 2014 Nature     70,100
Bipolar Disorder          Bip working grp of the PGC, 2011 Nat Genet  16,731
Rheumatoid Arthritis*     Okada et al., 2014 Nature                   38,242
Crohn’s Disease*          Jostins et al., 2012 Nature                 20,883
Ulcerative Colitis*       Jostins et al., 2012 Nature                 27,432
Height                    Wood et al., 2014 Nature Genetics           253,280
BMI                       Speliotes et al., 2010 Nature Genetics      123,865
Coronary Artery Disease   Schunkert et al., 2011 Nature Genetics      86,995
College (yes/no)          Rietveld et al., 2013 Science               126,559
Type 2 Diabetes           Morris et al., 2012 Nature Genetics         69,033

*HLA locus excluded from all analyses for autoimmune traits

Annotations used

Mark                                      Source/reference
Coding, 3’ UTR, 5’ UTR, Promoter, Intron  UCSC; Gusev et al., in press AJHG
Digital Genomic Footprint, TFBS           ENCODE; Gusev et al., in press AJHG
CTCF binding site, Promoter Flanking,     ENCODE; Hoffman et al., 2012 Nucleic Acids Research
  Repressed, Transcribed, TSS, Enhancer,
  Weak Enhancer
DHS, fetal DHS, H3K4me1, H3K4me3, H3K9ac  Trynka et al., 2013 Nature Genetics*
Conserved                                 Lindblad-Toh et al., 2011 Nature
FANTOM5 Enhancer                          Andersson et al., 2014 Nature
lincRNAs                                  Cabili et al., 2011 Genes Dev
DHS and DHS promoter                      Maurano et al., 2012 Science
H3K27ac                                   Roadmap; PGC2 2014 Nature

*Post-processed from ENCODE and Roadmap data by S. Raychaudhuri and X. Liu labs

Coding, Intergenic, Enhancer, H3K4me3, and DHS enrichments in six phenotypes

(Bars indicate 95% confidence intervals)

Coding, Intergenic, Enhancer, H3K4me3, DHS, and Conserved* enrichments in six phenotypes

*Lindblad-Toh et al., 2011 Nature


Coding, Intergenic, Enhancer, H3K4me3, DHS, and FANTOM5 Enhancer* enrichments in six phenotypes


*Andersson et al., 2014 Nature

Cell-type specific H3K27ac enrichments inform trait biology

• We group 56 cell types into 7 basic categories.
• For each trait (10 traits)
  – For each category (7 categories)
    • We assess the significance of improvement to the model from adding that category.

Conclusions

• Many annotations are enriched in many phenotypes.

• Conserved regions, 2.6% of SNPs, are estimated to explain 30% of heritability on average.

• FANTOM5 Enhancers are extremely enriched in autoimmune traits.

• H3K27ac cell-type enrichment matches and extends our understanding of disease biology.
