Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

Partitioning heritability by functional annotation using summary statistics Hilary Finucane MIT Department of Mathematics HSPH Department of Epidemiology October 21, 2014

Upload: bbuliksullivan

Post on 01-Dec-2014

DESCRIPTION

Hilary Finucane ASHG 2014 talk

TRANSCRIPT

Page 1: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

Partitioning heritability by functional annotation using summary statistics

Hilary Finucane

MIT Department of Mathematics

HSPH Department of Epidemiology

October 21, 2014

Page 2: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

Acknowledgements

• Brendan Bulik-Sullivan
• Alkes Price
• Ben Neale
• Alexander Gusev
• Nick Patterson
• Po-Ru Loh
• Gosia Trynka
• Han Xu
• Verneri Anttila
• Yakir Reshef
• Chongzhi Zang
• Stephan Ripke
• Schizophrenia Working Group of the PGC
• Shaun Purcell
• Mark Daly
• Eli Stahl
• Soumya Raychaudhuri
• Sara Lindstrom

Page 3: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

Partitioning heritability by functional annotation is an important goal

• Learn about genetic architecture of disease
  – Where does the heritability lie?
• Learn about disease biology
  – What are the relevant cell types?
• Learn about the functional annotations
  – Which functional annotations show the highest enrichments?
• Downstream applications
  – Fine mapping
  – Risk prediction
  – GWAS priors

Maurano et al. 2012 Science; Trynka et al. 2013 Nat Genet; Pickrell 2014 AJHG

Page 4: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

What is partitioned heritability?

• Our model is

$$Y = \sum_j X_j \beta_j + \varepsilon$$

where
• Y is an individual’s phenotype,
• X_j is an individual’s genotype at the j-th SNP (normalized to mean 0 and variance 1),
• β_j is the effect of SNP j, and
• ε is noise and random environmental effects.

Page 6: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

What is partitioned heritability?

• Our model is

• We define heritability as

and the heritability of a category as
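
The two definitions on this slide were equations that did not survive transcription. A reconstruction consistent with the symbols defined for the model, and with the definitions used in the published LD Score regression work, is sketched below. It assumes the phenotype Y is scaled to variance 1, so that squared effects of standardized genotypes are in units of phenotypic variance. The symbols h^2, h^2(C), and C (a category of SNPs) are notation chosen here and may differ from the slide.

$$h^2 = \sum_j \beta_j^2, \qquad h^2(C) = \sum_{j \in C} \beta_j^2$$

Under this reading, the proportion of heritability in a category is h^2(C) / h^2.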

Page 7: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

Partitioning heritability using variance components has yielded many insights

• 31% of schizophrenia SNP-heritability lies in CNS+ gene regions spanning 20% of the genome [1].

• 28% of Tourette syndrome SNP-heritability and 29% of OCD SNP-heritability lie in parietal lobe eQTLs spanning 5% of the genome [2].

• 79% of SNP-heritability, averaged across WTCCC and WTCCC2 traits, lies in DHS regions spanning 16% of the genome [3].

[1] Lee et al. 2012 Nat Genet
[2] Davis et al. 2013 PLoS Genet
[3] Gusev et al. in press AJHG

Page 8: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

A method for partitioning heritability from summary statistics is needed

• Variance components methods are intractable at very large sample sizes.

• There is lots of information in large meta-analyses.

• Lots of publicly available summary statistics allow us to compare many phenotypes and many annotations to get a big picture.

Page 10: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

Our method partitions heritability from summary statistics

• Input:
  – Sample size and p-value for every SNP tested in a large GWAS of a quantitative or case-control trait
  – LD information from a reference panel like 1000G
  – Genome annotation of interest
  – Other genome annotations to include in the model

• Output:
  – Estimated proportion of heritability that falls within the annotation of interest
  – Enrichment = (% of heritability) / (% of SNPs)
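
As a small worked example of the enrichment ratio, the sketch below applies it to two figures quoted elsewhere in this talk. The function name `enrichment` is hypothetical, and the DHS example treats "16% of the genome" as a stand-in for the share of SNPs, which is an assumption rather than something the slides state.

```python
def enrichment(prop_h2: float, prop_snps: float) -> float:
    """Enrichment = (proportion of heritability) / (proportion of SNPs)."""
    if not 0 < prop_snps <= 1:
        raise ValueError("prop_snps must be in (0, 1]")
    return prop_h2 / prop_snps


# Conserved regions: 2.6% of SNPs, about 30% of heritability on average
# (figures from the Conclusions slide).
conserved = enrichment(0.30, 0.026)

# DHS regions: 79% of SNP-heritability in 16% of the genome
# (assumption: genome share used as a proxy for SNP share).
dhs = enrichment(0.79, 0.16)

print(f"conserved: {conserved:.1f}x, DHS: {dhs:.1f}x")
```

An enrichment of 1 means the annotation carries exactly its proportional share of heritability; values above 1 indicate enrichment and values below 1 indicate depletion.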

Page 11: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

Outline

• Description of method

• Validation on simulated data

• Results on real data

Page 14: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

LD is important for summary statistics-based methods

• Some SNPs have a lot of LD to other SNPs in the same category.

• Some SNPs have a lot of LD to SNPs in other categories.

• Some SNPs do not have a lot of LD to other SNPs.

Our solution: LD Score Regression. See Bulik-Sullivan et al. bioRxiv (under revision, Nat Genet) and ASHG 2014 poster 1787T (Bulik-Sullivan).

Page 15: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

LD Score Regression: basic intuition

[Figure: two panels, "High LD region" and "Low LD region"; y-axis: chi-square]

• Polygenicity causes more chi-square statistic inflation in high LD regions than in low LD regions.

Mean chi-square for high LD region: high

Mean chi-square for low LD region: low
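
The intuition can be illustrated with a toy simulation. The sketch below assumes the single-category expectation E[chi2_j] = N * h2 * l_j / M + 1 from the published LD Score regression work (no confounding), draws each statistic independently, and uses ordinary least squares. All numeric values are hypothetical, and the independence of SNPs is a simplification that real data do not satisfy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy values, not taken from the talk.
M = 50_000       # number of SNPs
N = 50_000       # GWAS sample size
h2_true = 0.5    # total SNP-heritability

# Stand-in LD Scores; in practice these are estimated from a reference panel.
ld_score = rng.uniform(1.0, 200.0, size=M)

# Polygenic model without confounding: E[chi2_j] = N * h2 * l_j / M + 1.
expected_chi2 = N * h2_true * ld_score / M + 1.0
# Each statistic is a scaled chi-square(1) draw (SNPs treated as independent).
chi2 = expected_chi2 * rng.chisquare(df=1, size=M)

# Slide intuition: mean chi-square is higher where LD is higher.
high_ld = ld_score > np.median(ld_score)
mean_high = chi2[high_ld].mean()
mean_low = chi2[~high_ld].mean()

# Regress chi-square on LD Score; the slope rescaled by M / N estimates h2.
slope, intercept = np.polyfit(ld_score, chi2, deg=1)
h2_hat = slope * M / N

print(f"mean chi2 (high LD) = {mean_high:.1f}, (low LD) = {mean_low:.1f}")
print(f"estimated h2 = {h2_hat:.3f}")
```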

Page 16: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

Multivariate LD Score Regression: basic intuition

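• Enriched category: BIG difference between lots of LD vs little LD to the category
  – Lots of LD to the category: high chi-square
  – Little LD to the category: low chi-square

• Depleted category: SMALL difference between lots of LD vs little LD to the category
  – Lots of LD to the category: low chi-square
  – Little LD to the category: low chi-square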

Page 19: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

Multivariate LD Score regression allows us to partition SNP heritability

• Multivariate LD Score of a SNP with respect to a category: the sum, over all SNPs in the category, of r^2 with that SNP.

• Derivations based on a polygenic model give:

• Easily extends to overlapping categories.
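
The equation that followed "Derivations based on a polygenic model give" was an image and is missing from the transcript. The form below is a reconstruction from the published LD Score regression papers rather than from the slide itself, so the notation is an assumption: l(j, C) is the LD Score of SNP j with respect to category C, τ_C is the per-SNP heritability contribution of category C, N is the GWAS sample size, and a captures confounding such as population stratification.

$$\ell(j, C) = \sum_{k \in C} r_{jk}^2, \qquad E[\chi_j^2] = N \sum_C \tau_C \, \ell(j, C) + N a + 1$$

With disjoint categories, the heritability of a category containing M_C SNPs would then be h^2(C) = M_C τ_C. With overlapping categories, each SNP's contribution is the sum of τ_C over the categories that contain it, and h^2(C) sums those contributions over the SNPs in C.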

Page 20: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

Multivariate LD Score regression allows us to partition SNP heritability

To estimate partitioned heritability:
• Estimate LD Scores from a reference panel.
• Regress chi-square statistics on LD Scores.
• The slopes give the partitioned heritability.
• For best results, use many categories!
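
The three steps can be sketched on toy data as follows. This is an illustration under stated assumptions, not the authors' software: the helper `ld_scores` is hypothetical and sums r^2 over all SNPs with no windowing or bias correction, the regression is plain least squares without weighting, the two categories are disjoint, and every numeric value is made up.

```python
import numpy as np

rng = np.random.default_rng(1)


def ld_scores(genotypes: np.ndarray, annot: np.ndarray) -> np.ndarray:
    """Return l(j, C) = sum over SNPs k in C of r^2_jk.

    genotypes: (individuals x SNPs) reference panel.
    annot: (SNPs x categories) 0/1 membership matrix.
    """
    r2 = np.corrcoef(genotypes, rowvar=False) ** 2
    return r2 @ annot


# Step 1: estimate LD Scores from a (tiny, simulated) reference panel.
panel = rng.binomial(2, 0.3, size=(200, 50)).astype(float)
panel_annot = np.zeros((50, 2))
panel_annot[:10, 0] = 1.0    # first 10 SNPs in category A
panel_annot[10:, 1] = 1.0    # remaining 40 SNPs in category B
panel_ld = ld_scores(panel, panel_annot)    # shape (50, 2)

# Steps 2-3 on a larger simulated GWAS, with LD Scores drawn directly.
M_cat = np.array([8_000, 32_000])    # SNPs per category (A, B)
M, N = int(M_cat.sum()), 50_000
ld = np.column_stack([rng.uniform(0, 40, M), rng.uniform(0, 160, M)])
h2_true = np.array([0.25, 0.25])
tau_true = h2_true / M_cat           # per-SNP heritability by category

# E[chi2_j] = N * sum_C tau_C * l(j, C) + 1 (no confounding term here).
expected = N * (ld @ tau_true) + 1.0
chi2 = expected * rng.chisquare(df=1, size=M)

# Step 2: regress chi-square statistics on the LD Scores (with intercept).
X = np.column_stack([np.ones(M), ld])
coef, *_ = np.linalg.lstsq(X, chi2, rcond=None)

# Step 3: slopes / N give tau_C; multiply by M_C for partitioned heritability.
tau_hat = coef[1:] / N
h2_hat = tau_hat * M_cat
prop_h2_a = h2_hat[0] / h2_hat.sum()
enrichment_a = prop_h2_a / (M_cat[0] / M)

print("h2 by category:", np.round(h2_hat, 3))
print(f"category A enrichment: {enrichment_a:.2f}")
```

Category A holds 20% of the SNPs but half of the simulated heritability, so its enrichment should come out near 2.5.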

Page 21: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

Outline

• Description of method

• Validation on simulated data

• Results on real data

Page 24: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

Multivariate LD Score regression works in simulations

• Standard errors are over 100 simulations.
• Simulated quantitative phenotype with h2 = 0.5.
• M = 110,444 SNPs, N = 2,713 individuals.

Scenario | True h2(DHS) | REML (2 cat) | LD Score
Null simulations | 0.092 | 0.089 (0.006) | 0.086 (0.012), 27 cat
DHS 3x enriched | 0.276 | 0.281 (0.006) | 0.278 (0.013), 27 cat
FANTOM5 Enhancer* causal | 0.379 | 0.531 (0.007) | 0.361 (0.015), 27 cat
FANTOM5 Enhancer* causal, excluded from the model | 0.379 | 0.531 (0.007) | 0.318 (0.014), 26 cat

* Andersson et al. 2014 Nature
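
One way to read these simulation results is to express each estimate's deviation from the truth in units of its reported standard error. The sketch below does that arithmetic on the values from this slide. It treats the parenthesised numbers as standard errors of the reported estimates, which is an interpretation of "standard errors are over 100 simulations" rather than something the slide spells out, and the function name is hypothetical.

```python
def deviation_in_se(estimate: float, se: float, truth: float) -> float:
    """Deviation of an estimate from the true value, in standard-error units."""
    return (estimate - truth) / se


# (scenario, method, true h2(DHS), estimate, standard error) from the slide.
rows = [
    ("Null simulations", "REML (2 cat)", 0.092, 0.089, 0.006),
    ("Null simulations", "LD Score (27 cat)", 0.092, 0.086, 0.012),
    ("DHS 3x enriched", "REML (2 cat)", 0.276, 0.281, 0.006),
    ("DHS 3x enriched", "LD Score (27 cat)", 0.276, 0.278, 0.013),
    ("FANTOM5 Enhancer causal", "REML (2 cat)", 0.379, 0.531, 0.007),
    ("FANTOM5 Enhancer causal", "LD Score (27 cat)", 0.379, 0.361, 0.015),
    ("FANTOM5 Enhancer causal", "LD Score (26 cat)", 0.379, 0.318, 0.014),
]

deviations = {
    (scenario, method): deviation_in_se(est, se, truth)
    for scenario, method, truth, est, se in rows
}

for (scenario, method), dev in deviations.items():
    print(f"{scenario:26s} {method:18s} {dev:+.1f} SE")
```

Under this reading, the two-category REML estimate is far from the truth when FANTOM5 enhancers are causal, while the LD Score estimate stays close when the causal category is in the model and drifts moderately when it is left out.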

Page 25: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

Outline

• Description of method

• Validation on simulated data

• Results on real data

Page 26: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

Datasets analyzed

Phenotype | Citation | Sample size
Schizophrenia | SCZ working grp of the PGC, 2014 Nature | 70,100
Bipolar Disorder | Bip working grp of the PGC, 2011 Nat Genet | 16,731
Rheumatoid Arthritis* | Okada et al., 2014 Nature | 38,242
Crohn’s Disease* | Jostins et al., 2012 Nature | 20,883
Ulcerative Colitis* | Jostins et al., 2012 Nature | 27,432
Height | Wood et al., 2014 Nature Genetics | 253,280
BMI | Speliotes et al., 2010 Nature Genetics | 123,865
Coronary Artery Disease | Schunkert et al., 2011 Nature Genetics | 86,995
College (yes/no) | Rietveld et al., 2013 Science | 126,559
Type 2 Diabetes | Morris et al., 2012 Nature Genetics | 69,033

*HLA locus excluded from all analyses for autoimmune traits

Page 27: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

Annotations used

Mark | Source/reference
Coding, 3’ UTR, 5’ UTR, Promoter, Intron | UCSC; Gusev et al., in press AJHG
Digital Genomic Footprint, TFBS | ENCODE; Gusev et al., in press AJHG
CTCF binding site, Promoter Flanking, Repressed, Transcribed, TSS, Enhancer, Weak Enhancer | ENCODE; Hoffman et al., 2012 Nucleic Acids Research
DHS, fetal DHS, H3K4me1, H3K4me3, H3K9ac | Trynka et al., 2013 Nature Genetics*
Conserved | Lindblad-Toh et al., 2011 Nature
FANTOM5 Enhancer | Andersson et al., 2014 Nature
lincRNAs | Cabili et al., 2011 Genes Dev
DHS and DHS promoter | Maurano et al., 2012 Science
H3K27ac | Roadmap; PGC2 2014 Nature

*Post-processed from ENCODE and Roadmap data by S. Raychaudhuri and X. Liu labs

Page 28: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

Coding, Intergenic, Enhancer, H3K4me3, and DHS enrichments in six phenotypes

(Bars indicate 95% confidence intervals)

Page 29: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

Coding, Intergenic, Enhancer, H3K4me3, DHS, and Conserved enrichments in six phenotypes

*Lindblad-Toh et al., 2011 Nature

(Bars indicate 95% confidence intervals)

Page 30: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

Coding, Intergenic, Enhancer, H3K4me3, DHS, and FANTOM5 Enhancer enrichments in six phenotypes

(Bars indicate 95% confidence intervals)

*Andersson et al., 2014 Nature

Page 31: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

Cell-type specific H3K27ac enrichments inform trait biology

• We group 56 cell types into 7 basic categories.
• For each trait (10 traits)
  – For each category (7 categories)
    • We assess the significance of improvement to the model from adding that category.

Page 32: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression
Page 33: Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

Conclusions

• Many annotations are enriched in many phenotypes.

• Conserved regions, 2.6% of SNPs, are estimated to explain 30% of heritability on average.

• FANTOM5 Enhancers are extremely enriched in autoimmune traits.

• H3K27ac cell-type enrichment matches and extends our understanding of disease biology.
