Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Mixed linear model analyses of human complex traits using SNP data

Jian Yang

Queensland Brain Institute, The University of Queensland


Why do we need a mixed linear model?

Linear model

•  $y = b_0 + x_1 b_1 + x_2 b_2 + \dots + x_p b_p + e$

   $y$ = phenotype
   $x_i$ = independent variable
   $b_0$ = mean term
   $b_1, \dots, b_p$ = effect sizes (regression coefficients)
   $e$ = residual, $e \sim N(0, \sigma_e^2)$
   so that $y \sim N(b_0 + x_1 b_1 + x_2 b_2 + \dots + x_p b_p, \sigma_e^2)$

Linear model

•  In matrix form: $y = Xb + e$
   $y = \{y_j\}_{n \times 1}$; $X = \{x_{ij}\}_{n \times p}$; $b = \{b_i\}_{p \times 1}$; $e = \{e_j\}_{n \times 1}$

•  Estimation
   $\hat{b} = (X^T X)^{-1} X^T y$
   $\mathrm{var}(\hat{b}) = \sigma_e^2 (X^T X)^{-1}$
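The estimator above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the talk: the function name `ols` is hypothetical, and estimating $\sigma_e^2$ from the residuals with $n - p$ degrees of freedom is an assumption added here, because the slides treat $\sigma_e^2$ as given.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares: b_hat = (X'X)^-1 X'y and var(b_hat) = sigma2_e (X'X)^-1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    XtX = X.T @ X
    # Solve the normal equations instead of forming the inverse explicitly.
    b_hat = np.linalg.solve(XtX, X.T @ y)
    resid = y - X @ b_hat
    # Assumption: residual variance estimated with n - p degrees of freedom.
    sigma2_e = resid @ resid / (n - p)
    var_b = sigma2_e * np.linalg.inv(XtX)
    return b_hat, var_b
```

A column of ones in `X` plays the role of the mean term $b_0$.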

Special cases

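•  Simple regression: $y = b_0 + x_1 b_1 + e$
   $\hat{b}_1 = x_1^T y / (x_1^T x_1)$ (with $x_1$ and $y$ centred on their means)
   $E(\hat{b}_1) = b_1 = \mathrm{cov}(x_1, y) / \mathrm{var}(x_1)$
   $\mathrm{var}(\hat{b}_1) = \sigma_e^2 / [n \, \mathrm{var}(x_1)]$

•  Conditional analysis: $y \mid b_2, \dots, b_p = b_0 + x_1 b_1 + e$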

Limitations

•  n > p: the sample size needs to be greater than the number of parameters

•  All the effect sizes are treated as fixed, so the model tells us nothing about the variation in effect sizes

•  What if n << p?
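The n << p case can be checked numerically: with fewer individuals than parameters, $X^T X$ has rank at most $n$ and cannot be inverted. The sketch below uses simulated Gaussian covariates, and the function name and seed are purely illustrative.

```python
import numpy as np

def normal_equations_rank(n, p, seed=0):
    """Rank of X'X for a random n x p design matrix (simulated, for illustration only)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    # X'X is p x p, but its rank cannot exceed min(n, p).
    return int(np.linalg.matrix_rank(X.T @ X))
```

For example, `normal_equations_rank(5, 20)` is smaller than 20, so the 20 effect sizes cannot all be estimated as fixed effects from 5 individuals.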

What is a mixed linear model (MLM)?

•  $y = Xb + Zu + e$

   Fixed effects: $b$ (special case: $X = 1$ and $b = b_0$)
   Random effects: $u = \{u_i\}$, $u \sim N(0, \sigma_u^2 A)$
   $A$ = correlation matrix between $u_i$ and $u_j$
   $E(y) = Xb$
   $\mathrm{var}(y) = V = Z A Z^T \sigma_u^2 + I \sigma_e^2$

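Parameter estimation

•  Estimation of variance components ($\sigma_u^2$) by maximising the restricted (REML) log-likelihood
   $\log L = -\frac{1}{2} \left( \log|V| + \log|X^T V^{-1} X| + y^T P y \right)$
   $P = V^{-1} - V^{-1} X (X^T V^{-1} X)^{-1} X^T V^{-1}$

•  Prediction of random effects ($u$)
   $\hat{u} = \hat{\sigma}_u^2 A Z^T P y$ (which reduces to $\hat{\sigma}_u^2 Z^T P y$ when $A = I$)

•  Estimation of fixed effects ($b$)
   $\hat{b} = (X^T V^{-1} X)^{-1} X^T V^{-1} y$
   Compare the linear model: $\hat{b} = (X^T X)^{-1} X^T y$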
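A minimal NumPy sketch of these quantities follows, assuming the variance components are already known. In practice they would be found by maximising the log-likelihood numerically, which is not shown here. The function name `mlm_quantities` is hypothetical. The code uses the generalised least squares form $\hat{b} = (X^T V^{-1} X)^{-1} X^T V^{-1} y$ and the predictor $\hat{u} = \sigma_u^2 A Z^T P y$, and omits additive constants from the log-likelihood.

```python
import numpy as np

def mlm_quantities(y, X, Z, A, sigma2_u, sigma2_e):
    """Restricted log-likelihood (up to a constant), b_hat and u_hat for y = Xb + Zu + e."""
    n = y.shape[0]
    V = sigma2_u * (Z @ A @ Z.T) + sigma2_e * np.eye(n)
    Vinv = np.linalg.inv(V)
    XtVinvX = X.T @ Vinv @ X
    XtVinvX_inv = np.linalg.inv(XtVinvX)
    # P projects out the fixed effects: P @ X is a matrix of zeros.
    P = Vinv - Vinv @ X @ XtVinvX_inv @ X.T @ Vinv
    _, logdet_V = np.linalg.slogdet(V)
    _, logdet_XtVinvX = np.linalg.slogdet(XtVinvX)
    logL = -0.5 * (logdet_V + logdet_XtVinvX + y @ P @ y)
    b_hat = XtVinvX_inv @ X.T @ Vinv @ y   # fixed effects (GLS)
    u_hat = sigma2_u * (A @ Z.T @ P @ y)   # random effects (BLUP)
    return logL, b_hat, u_hat
```

Explicit inverses keep the sketch close to the formulas; for large n a Cholesky-based solve would be the more usual design choice.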

MLM analysis of human complex traits

•  Animal and plant breeding
   –  predicting breeding values
   –  linkage mapping (QTL mapping)

•  Human genetics (before 2007)
   –  pedigree-based analysis of variance (heritability)
   –  linkage mapping

•  Human genetics (after 2007)
   –  estimating SNP-based heritability
   –  association analysis
   –  genetic risk prediction

Background

Mendelian traits: cystic fibrosis

Complex traits: human height, schizophrenia, obesity

Major questions

•  Are these traits heritable?

•  If so, what is the heritability?

•  How many genes are involved, and where are they located?

[Figure: Risk of schizophrenia (%). Heritability = ~80%]

[Figure: Resemblance between twins for human height. Heritability = ~80%]

Resemblance between relatives for body mass index (BMI). Heritability = 40%~60%

Relatedness    Correlation
Full-sibs      0.36
Father-son     0.28

Complex traits such as height, BMI and schizophrenia (SCZ) are highly heritable.

Identifying genes underlying complex traits

8 genes for human complex traits before 2002 (Glazier et al. 2002 Science)

Genome-wide Association Study (GWAS)

Manolio 2010 NEJM

Genome-wide threshold: $P = 5 \times 10^{-8}$

Linear model (simple regression)
   $y = b_0 + x_1 b_1 + e$
   $y$ = trait value
   $x_1$ = SNP genotype (0, 1 or 2)
   $\hat{b}_1 = x_1^T y / (x_1^T x_1) = \mathrm{cov}(x_1, y) / \mathrm{var}(x_1)$
   $SE^2(\hat{b}_1) = \sigma_e^2 / [n \, \mathrm{var}(x_1)]$
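The single-SNP test can be sketched as below. This is an illustration under stated assumptions rather than the method of any particular GWAS package: the function name is hypothetical, $\sigma_e^2$ is estimated from the residuals with $n - 2$ degrees of freedom, and the p-value uses a normal approximation to the test statistic, which is reasonable only for large samples.

```python
import math
import numpy as np

def snp_association(x, y):
    """Simple-regression association test for one SNP (genotypes coded 0, 1 or 2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    xc = x - x.mean()                    # centring absorbs the mean term b0
    yc = y - y.mean()
    b1 = (xc @ yc) / (xc @ xc)           # cov(x1, y) / var(x1)
    resid = yc - xc * b1
    sigma2_e = resid @ resid / (n - 2)   # assumption: n - 2 degrees of freedom
    se = math.sqrt(sigma2_e / (xc @ xc)) # xc @ xc equals n * var(x1)
    z = b1 / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided, normal approximation
    return b1, se, p_value
```

A SNP would be declared genome-wide significant when `p_value` falls below the threshold of 5e-8 quoted above.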


An explosion of gene discoveries

~5000 genetic variants associated with ~650 traits / diseases

[Figure: number of SNPs (0 to 6000) by year, 2006 to the first half of 2013, contrasting discoveries prior to GWAS (Glazier et al. 2002 Science) with those from GWAS]

The missing heritability problem

Height (Lango Allen et al. 2010 Nature):
•  180 loci
•  ~180K samples
•  < 10% of variance explained
•  heritability = ~80%

Schizophrenia (Ripke et al. 2013 Nat Genet):
•  22 loci
•  ~21K cases / ~38K controls
•  < 3% of variance explained
•  heritability = ~80%

BMI (Speliotes et al. 2010 Nat Genet):
•  32 loci
•  ~250K samples
•  ~1% of variance explained
•  heritability = 40% ~ 60%

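Fitting all SNPs in an MLM

•  $y = Wu + e$
   $W = \{w_{ij}\}_{n \times m}$, $w_{ij}$ = standardised SNP genotype
   $u \sim N(0, I\sigma_u^2)$
   $\mathrm{var}(y) = W W^T \sigma_u^2 + I\sigma_e^2$
   variance explained = $m\sigma_u^2 / (m\sigma_u^2 + \sigma_e^2)$

•  Let $g = Wu$
   $y = g + e$
   $g \sim N(0, A\sigma_g^2)$, $A$ = genetic relationship matrix
   $\mathrm{var}(y) = A\sigma_g^2 + I\sigma_e^2$
   variance explained = $\sigma_g^2 / (\sigma_g^2 + \sigma_e^2)$

•  The two forms are the same model: $\mathrm{var}(y) = (1/m) W W^T (m\sigma_u^2) + I\sigma_e^2$, so
   $A = W W^T / m$ and $\sigma_g^2 = m\sigma_u^2$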
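The genetic relationship matrix can be computed directly from a genotype matrix, as in the sketch below. The slides say only that the genotypes are standardised. The specific standardisation used here (subtract $2p_i$ and divide by $\sqrt{2p_i(1 - p_i)}$, with $p_i$ the allele frequency of SNP $i$ estimated from the sample) is an assumption, as is the requirement that every SNP is polymorphic. The function name `grm` is hypothetical.

```python
import numpy as np

def grm(genotypes):
    """Genetic relationship matrix from an n x m matrix of genotypes coded 0, 1 or 2."""
    X = np.asarray(genotypes, dtype=float)
    m = X.shape[1]
    p = X.mean(axis=0) / 2.0                         # sample allele frequencies
    # Assumed standardisation; monomorphic SNPs (p = 0 or 1) must be removed first.
    W = (X - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return W @ W.T / m                               # n x n matrix, averaged over SNPs
```

Because each column of the standardised matrix is centred, every row of the resulting matrix sums to zero, which is a quick sanity check on real data.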

•  Family studies: comparing phenotypic similarity to family relatedness
   –  Our method: comparing phenotypic similarity to genetic similarity (estimated from SNPs) in unrelated individuals

•  GWAS: testing one SNP at a time in unrelated samples
   –  Our method: estimating the contribution from all SNPs together

Reconciling family studies and GWAS

~50% of variation explained by all SNPs for height vs. ~10% from GWAS

GWAS vs all-SNP estimation

[Figure: variance explained (0% to 50%) for height, schizophrenia and obesity (BMI), comparing GWAS with our method]

Yang et al. 2010 Nat Genet; Yang et al. 2011 Nat Genet; Lee et al. 2012 Nat Genet

Many genetic variants, each with a small effect, contribute to the trait variation

Genome partitioning

•  Single-component MLM
   $y = g + e$ (or $y = Wu + e$)

•  Multi-component MLM
   $y = g_1 + g_2 + \dots + g_{22} + e$
   $\mathrm{var}(y) = A_1\sigma_{g_1}^2 + A_2\sigma_{g_2}^2 + \dots + A_{22}\sigma_{g_{22}}^2 + I\sigma_e^2$
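The multi-component variance structure can be assembled as follows, assuming one relationship matrix per chromosome and variance components that have already been estimated. Both function names are hypothetical. Expressing each component as a share of the total phenotypic variance is an illustrative choice that matches the "variance explained" definition of the single-component model.

```python
import numpy as np

def multi_component_V(grms, sigma2_g, sigma2_e):
    """var(y) = sum_c A_c * sigma2_g[c] + I * sigma2_e for a list of relationship matrices."""
    n = grms[0].shape[0]
    V = sigma2_e * np.eye(n)
    for A_c, s2 in zip(grms, sigma2_g):
        V = V + s2 * np.asarray(A_c, dtype=float)
    return V

def variance_explained(sigma2_g, sigma2_e):
    """Share of phenotypic variance attributed to each genetic component."""
    total = sum(sigma2_g) + sigma2_e
    return [s2 / total for s2 in sigma2_g]
```

With 22 autosomal components, the 22 shares returned by `variance_explained` are the per-chromosome quantities that can be plotted against chromosome length.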

Partitioning the genetic variance into individual chromosomes

[Figure: heritability attributed to each chromosome (1 to 22) plotted against chromosome length (Mb), with one panel each for schizophrenia, height and BMI. Sample sizes shown: ~12,000 individuals; 9000 cases and 12,000 controls; ~25,000 individuals]

Yang et al. 2011 Nat Genet; Lee et al. 2012 Nat Genet; Yang et al. unpublished

Genetic variants are distributed across the whole genome

Partitioning the genetic variance based on functional annotation

[Figure: pie charts of the genetic variance partitioned by annotation. Schizophrenia: CNS+ genes, other genes and intergenic (segments of 30%, 35% and 35%). Height and BMI: genic vs intergenic estimates (segments of 83% / 17% and 68% / 32%)]

Yang et al. 2011 Nat Genet; Lee et al. 2012 Nat Genet

Genetic signals are enriched in or close to functional genes

More …

•  Bivariate analysis: estimating the genetic correlation between two traits or two diseases using SNP data (Deary et al. 2012 Nature; Lee et al. 2013 Nat Genet)

•  Fitting a mixture distribution rather than a single normal distribution to the random effects, e.g. Zhou et al. 2013 PLoS Genet

Linear model (simple regression-based association test)

   $y = b_0 + x_1 b_1 + e$
   $y$ = trait value; $x_1$ = SNP genotype (0, 1 or 2)
   $\hat{b}_1 = x_1^T y / (x_1^T x_1) = \mathrm{cov}(x_1, y) / \mathrm{var}(x_1)$
   $SE^2(\hat{b}_1) = \sigma_e^2 / [n \, \mathrm{var}(x_1)]$

Assumption: $e$ is independent and identically distributed

Issues:
1)  Relatedness: there are relatives in the sample, which inflates the false positive rate
2)  Population stratification: individuals of different ancestries give rise to spurious associations; e.g. trait = eating with chopsticks, data = a random sample of the US population.

Population stratification estimated from SNP data

Solution: MLM-based association analysis

•  $y = Xb + Zu + e$ or $y = Xb + g + e$
   $V = \mathrm{var}(y) = A\sigma_g^2 + I\sigma_e^2$

•  Testing for fixed effects given sample structure
   $\hat{b} = (X^T V^{-1} X)^{-1} X^T V^{-1} y$
   $\mathrm{var}(\hat{b}) = (X^T V^{-1} X)^{-1}$

•  Issue: a SNP is fitted twice (once as a fixed effect and once as part of $g$).
   Solution: exclude the SNP from the calculation of the genetic relationship matrix
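One way to picture the test with the candidate SNP left out of the relationship matrix is the sketch below. It is a simplified illustration, not the implementation of any software tool. It assumes the variance components `sigma2_g` and `sigma2_e` are already known, that `W` holds standardised genotypes, and it takes $\mathrm{var}(\hat{b}) = (X^T V^{-1} X)^{-1}$, which holds when $V$ is the full phenotypic variance matrix. The function name and the `snp_index` argument are hypothetical.

```python
import numpy as np

def mlm_assoc(y, x_snp, W, snp_index, sigma2_g, sigma2_e):
    """Test one SNP as a fixed effect, excluding it from the relationship matrix."""
    n, m = W.shape
    # Drop the tested SNP so that it is not fitted twice.
    W_rest = np.delete(W, snp_index, axis=1)
    A = W_rest @ W_rest.T / (m - 1)
    V = sigma2_g * A + sigma2_e * np.eye(n)
    X = np.column_stack([np.ones(n), x_snp])   # mean term and SNP genotype
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    XtVinvX = X.T @ Vinv_X
    b_hat = np.linalg.solve(XtVinvX, X.T @ Vinv_y)
    var_b = np.linalg.inv(XtVinvX)
    chi2 = b_hat[1] ** 2 / var_b[1, 1]         # 1-df test statistic for the SNP effect
    return b_hat[1], var_b[1, 1], chi2
```

Setting `sigma2_g` to zero recovers the ordinary simple-regression estimate, which is a convenient check that the sample-structure correction is the only difference between the two tests.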

Software tool: http://gump.qimr.edu.au/gcta/

Acknowledgements

Complex Traits Genomics Group (UQ)
•  Peter Visscher
•  Naomi Wray
•  Hong Lee

University of Melbourne
•  Mike Goddard

QIMR cohort
•  Nick Martin
•  Grant Montgomery

GENEVA Consortium
•  Teri Manolio
•  Bruce Weir

dbGaP

The Australian Neurogenetics Conference at the Queensland Brain Institute (QBI), The University of Queensland, on September 11th and 12th, 2014

http://web.qbi.uq.edu.au/anc2014/
