Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Jian Yang, Queensland Brain Institute, The University of Queensland

Upload: australian-bioinformatics-network

Post on 10-May-2015


DESCRIPTION

Most traits and common diseases in humans, such as height, cognitive ability, psychiatric disorders and obesity, are influenced by many genes and their interplay with environmental factors. These diseases and traits are called “complex” traits to differentiate them from “Mendelian” traits, which are caused by single genes. Understanding the genetic architecture of human complex traits is essential to diagnosis, discovery of new drug targets and prevention: how much of the difference between people’s susceptibilities to disease is accounted for by differences in their DNA sequence, how many genes are involved in the etiology of a disease, where those genes are located and how large their effects on disease risk are. To date, thousands of gene loci, represented by single nucleotide polymorphisms (SNPs), have been found to be associated with hundreds of human complex traits by the genome-wide association study (GWAS) technique. In this lecture, I will introduce the use of mixed linear models in the analysis of GWAS data: to estimate the proportion of variance for a trait that can be explained by all SNPs (the SNP heritability), to quantify the extent to which two traits (or diseases) share a common genetic basis (genetic correlation) using all SNPs, and to control for population structure in genome-wide association analyses of individual SNPs. First presented at the 2014 Winter School in Mathematical and Computational Biology http://bioinformatics.org.au/ws14/program/

TRANSCRIPT

Page 1: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Mixed linear model analyses of human complex traits using SNP data

Jian Yang
Queensland Brain Institute
The University of Queensland

Page 2: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Why do we need a mixed linear model?

Page 3: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Linear model

• $y = b_0 + x_1 b_1 + x_2 b_2 + \dots + x_p b_p + e$
  y = phenotype
  $x_i$ = independent variable
  $y \sim N(b_0 + x_1 b_1 + x_2 b_2 + \dots + x_p b_p,\ \sigma_e^2)$
  $b_0$ = mean term
  $b_1 \dots b_p$ = effect sizes (regression coefficients)
  e = residual, $e \sim N(0, \sigma_e^2)$

Page 4: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Linear model

• In matrix form
  $y = Xb + e$
  $y = \{y_j\}_{n \times 1}$; $X = \{X_{ij}\}_{n \times p}$; $b = \{b_i\}_{p \times 1}$; $e = \{e_j\}_{n \times 1}$

• Estimation
  $\hat{b} = (X^T X)^{-1} X^T y$
  $\mathrm{var}(\hat{b}) = \sigma_e^2 (X^T X)^{-1}$
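The estimation formulas above can be sketched numerically. This is a minimal illustration on simulated data, not code from the lecture; the function name `ols`, the sample size and the true effect sizes are assumptions made for the example, and the residual variance is estimated with the usual n − p divisor.

```python
import numpy as np

def ols(X, y):
    """Least-squares estimates and their sampling variances for y = Xb + e."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b_hat = XtX_inv @ X.T @ y              # b-hat = (X'X)^-1 X'y
    resid = y - X @ b_hat
    sigma2_e = resid @ resid / (n - p)     # estimate of the residual variance
    var_b = sigma2_e * XtX_inv             # var(b-hat) = sigma2_e (X'X)^-1
    return b_hat, var_b

# Simulated example (all values hypothetical)
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # first column is the mean term
b_true = np.array([1.0, 0.5, -0.3])
y = X @ b_true + rng.normal(scale=1.0, size=n)

b_hat, var_b = ols(X, y)
print(b_hat, np.sqrt(np.diag(var_b)))
```

The explicit inverse mirrors the slide formula; for larger problems `np.linalg.lstsq` or a linear solve would be the more stable choice.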

Page 5: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

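Special cases

• Simple regression
  $y = b_0 + x_1 b_1 + e$
  $\hat{b}_1 = x_1^T y / (x_1^T x_1)$ (with $x_1$ and y mean-centred)
  $E(\hat{b}_1) = b_1 = \mathrm{cov}(x_1, y) / \mathrm{var}(x_1)$
  $\mathrm{var}(\hat{b}_1) = \sigma_e^2 / [n\,\mathrm{var}(x_1)]$

• Conditional analysis
  $y \mid b_2 \dots b_p = b_0 + x_1 b_1 + e$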

 

Page 6: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Limitations

• n > p: the sample size needs to be larger than the number of parameters

• All the effect sizes are treated as fixed (we have no idea about the variation in effect sizes)

• What if n << p?

Page 7: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

What is a mixed linear model (MLM)?

• $y = Xb + Zu + e$

Fixed effects: b (special case: X = 1 and $b = b_0$)
Random effects: $u = \{u_i\}$, $u \sim N(0, \sigma_u^2 A)$
A = correlation matrix between $u_i$ and $u_j$
$E(y) = Xb$
$\mathrm{var}(y) = V = Z A Z^T \sigma_u^2 + I \sigma_e^2$
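To make the variance structure concrete, the sketch below builds V for a toy design. The sizes, the correlation matrix `A` and the variance components are made-up values for illustration only.

```python
import numpy as np

n, q = 6, 3                                   # 6 observations, 3 random-effect levels (hypothetical)
Z = np.zeros((n, q))
Z[np.arange(n), np.arange(n) % q] = 1.0       # observation i belongs to level i mod 3

A = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])               # assumed correlation matrix among the u_i
sigma2_u, sigma2_e = 2.0, 1.0                 # assumed variance components

# var(y) = V = Z A Z' sigma2_u + I sigma2_e
V = Z @ A @ Z.T * sigma2_u + np.eye(n) * sigma2_e
print(V)
```

Observations that share a level (for example 0 and 3) covary by $\sigma_u^2$, observations on correlated levels (0 and 1) covary by $0.5\,\sigma_u^2$, and the residual variance only adds to the diagonal.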

Page 8: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Parameter estimation

• Estimation of variance components ($\sigma_u^2$) by restricted maximum likelihood (REML)
  $\log L = -\tfrac{1}{2}\left(\log|V| + \log|X^T V^{-1} X| + y^T P y\right)$
  $P = V^{-1} - V^{-1} X (X^T V^{-1} X)^{-1} X^T V^{-1}$

• Prediction of random effects (u)
  $\hat{u} = \hat{\sigma}_u^2 A Z^T P y$

• Estimation of fixed effects (b)
  $\hat{b} = (X^T V^{-1} X)^{-1} X^T V^{-1} y$
  Linear model: $\hat{b} = (X^T X)^{-1} X^T y$
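The three quantities on this slide can be evaluated directly for given variance components. The sketch below is an illustration under stated assumptions, not the algorithm used by any particular software: the function name `reml_parts` and the simulated data are hypothetical, the formulas are taken in their standard form (with $V^{-1}$ in the fixed-effect estimate and A in the predictor of u), and the variance component is chosen by a crude grid search with $\sigma_e^2$ held at 1 rather than by an iterative REML algorithm.

```python
import numpy as np

def reml_parts(y, X, Z, A, sigma2_u, sigma2_e):
    """REML log-likelihood (up to a constant), GLS fixed effects and BLUP of u."""
    n = len(y)
    V = sigma2_u * Z @ A @ Z.T + sigma2_e * np.eye(n)
    Vinv = np.linalg.inv(V)
    XtVinvX = X.T @ Vinv @ X
    P = Vinv - Vinv @ X @ np.linalg.inv(XtVinvX) @ X.T @ Vinv
    logL = -0.5 * (np.linalg.slogdet(V)[1]
                   + np.linalg.slogdet(XtVinvX)[1]
                   + y @ P @ y)
    b_hat = np.linalg.solve(XtVinvX, X.T @ Vinv @ y)   # (X'V^-1 X)^-1 X'V^-1 y
    u_hat = sigma2_u * A @ Z.T @ P @ y                 # sigma2_u A Z' P y
    return logL, b_hat, u_hat

# Simulated data: 200 observations in 20 groups (hypothetical values)
rng = np.random.default_rng(2)
n, q = 200, 20
X = np.ones((n, 1))
Z = np.zeros((n, q))
Z[np.arange(n), rng.integers(0, q, size=n)] = 1.0
A = np.eye(q)
u = rng.normal(scale=np.sqrt(2.0), size=q)
y = 1.0 + Z @ u + rng.normal(size=n)

# Crude grid search over sigma2_u with sigma2_e fixed at 1 (for illustration only)
grid = [0.1, 0.5, 1.0, 2.0, 4.0, 8.0]
logLs = [reml_parts(y, X, Z, A, s, 1.0)[0] for s in grid]
best = grid[int(np.argmax(logLs))]
print(best)
```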

Page 9: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

MLM analysis of human complex traits

• Animal and plant breeding
  – predicting breeding values
  – linkage mapping (QTL mapping)

• Human genetics (before 2007)
  – pedigree based analysis of variance (heritability)
  – linkage mapping

• Human genetics (after 2007)
  – estimating SNP-based heritability
  – association analysis
  – genetic risk prediction

Page 10: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Background  

Page 11: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Mendelian traits: cystic fibrosis

Complex traits: human height, schizophrenia, obesity

Page 12: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Major questions

• Are these traits heritable?

• If so, what is the heritability?

• How many genes are involved and where are they located?

Page 13: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Complex traits such as height, BMI and schizophrenia (SCZ) are highly heritable.

[Figure: Risk of schizophrenia (%)] Heritability = ~80%

[Figure: Resemblance between twins for human height] Heritability = ~80%

Resemblance between relatives for body mass index (BMI): heritability = 40%~60%

Relatedness    Correlation
Full-sibs      0.36
Father-son     0.28

Page 14: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Identifying genes underlying complex traits

8 genes for human complex traits before 2002 (Glazier et al. 2002 Science)

[Figure: values shown are 1700, 30 and 8]

Page 15: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Genome-wide Association Study (GWAS)

Manolio 2010 NEJM

Genome-wide threshold $P = 5 \times 10^{-8}$

Linear model (simple regression)
$y = b_0 + x_1 b_1 + e$
y = trait value
$x_1$ = SNP genotype (0, 1 or 2)
$\hat{b}_1 = x_1^T y / (x_1^T x_1) = \mathrm{cov}(x_1, y) / \mathrm{var}(x_1)$
$SE^2(\hat{b}_1) = \sigma_e^2 / [n\,\mathrm{var}(x_1)]$
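A single-SNP association test along these lines might look as follows. This is a sketch on simulated data; the function name `snp_assoc`, the allele frequency and the effect size are assumptions, the genotype and trait are mean-centred before applying the formula, and the p-value uses a normal approximation to the test statistic.

```python
import numpy as np
from math import erfc, sqrt

def snp_assoc(x, y):
    """Simple-regression association test of one SNP (genotype coded 0, 1 or 2)."""
    n = len(y)
    xc = x - x.mean()
    yc = y - y.mean()
    b1 = xc @ yc / (xc @ xc)             # = cov(x1, y) / var(x1)
    resid = yc - xc * b1
    sigma2_e = resid @ resid / (n - 2)   # residual variance estimate
    se = sqrt(sigma2_e / (xc @ xc))      # xc @ xc equals n * var(x1)
    z = b1 / se
    p = erfc(abs(z) / sqrt(2))           # two-sided p-value, normal approximation
    return b1, se, p

# Simulated SNP with allele frequency 0.3 and a true effect of 0.2 (hypothetical)
rng = np.random.default_rng(3)
n = 2000
x = rng.binomial(2, 0.3, size=n).astype(float)
y = 0.2 * x + rng.normal(size=n)

b1, se, p = snp_assoc(x, y)
print(b1, se, p, p < 5e-8)               # compare against the genome-wide threshold
```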

Page 16: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

An explosion of gene discoveries

~5000 genetic variants associated with ~650 traits / diseases

[Figure: number of SNPs (0 to 6000) by year, 2006 to first half of 2013, contrasting the period prior to GWAS (Glazier et al. 2002 Science) with the GWAS era]

Page 17: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

                     

                     

The missing heritability problem

Height (Lango Allen et al. 2010 Nature):
• 180 loci
• ~180K samples
• < 10% of variance explained
• heritability = ~80%

BMI (Speliotes et al. 2010 Nat Genet):
• 32 loci
• ~250K samples
• ~1% of variance explained
• heritability = 40% ~ 60%

Schizophrenia (Ripke et al. 2013 Nat Genet):
• 22 loci
• ~21K cases / ~38K controls
• < 3% of variance explained
• heritability = ~80%

Page 18: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

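Fitting all SNPs in an MLM

• $y = Wu + e$
  $W = \{w_{ij}\}_{n \times m}$, $w_{ij}$ = standardised SNP genotype
  $u \sim N(0, I\sigma_u^2)$
  $\mathrm{var}(y) = W W^T \sigma_u^2 + I\sigma_e^2$
  variance explained = $m\sigma_u^2 / (m\sigma_u^2 + \sigma_e^2)$

• Let $g = Wu$
  $y = g + e$
  $g \sim N(0, A\sigma_g^2)$, A = genetic relationship matrix
  $\mathrm{var}(y) = A\sigma_g^2 + I\sigma_e^2$
  variance explained = $\sigma_g^2 / (\sigma_g^2 + \sigma_e^2)$

• $\mathrm{var}(y) = (1/m) W W^T (m\sigma_u^2) + I\sigma_e^2$
  so $A = W W^T / m$ and $\sigma_g^2 = m\sigma_u^2$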
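The genetic relationship matrix can be computed from a genotype matrix in a few lines. The sketch below is an illustration, not the lecture's code: the function name `grm` and the simulated genotypes are hypothetical, and it assumes one common standardisation, centring each SNP by twice its sample allele frequency p and scaling by $\sqrt{2p(1-p)}$. Monomorphic SNPs are dropped because they cannot be standardised.

```python
import numpy as np

def grm(genotypes):
    """Genetic relationship matrix from an n x m array of 0/1/2 allele counts."""
    p = genotypes.mean(axis=0) / 2.0              # sample allele frequency per SNP
    keep = (p > 0.0) & (p < 1.0)                  # drop monomorphic SNPs
    G, p = genotypes[:, keep], p[keep]
    W = (G - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))   # standardised genotypes (assumed scaling)
    return W @ W.T / W.shape[1]                   # matrix of relationships over the kept SNPs

# Simulated unrelated individuals (hypothetical sizes and frequencies)
rng = np.random.default_rng(4)
n, m = 50, 2000
maf = rng.uniform(0.1, 0.5, size=m)
genotypes = rng.binomial(2, maf, size=(n, m)).astype(float)

A = grm(genotypes)
print(A.shape, np.diag(A).mean())
```

For unrelated individuals the diagonal averages close to 1 and the off-diagonal elements are small, so the information about $\sigma_g^2$ comes from many slightly different pairwise relationships.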

Page 19: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Reconciling family studies and GWAS

Family studies: comparing phenotypic similarity to family relatedness
– Our method: comparing phenotypic similarity to genetic similarity (estimated from SNPs) in unrelated individuals

GWAS: testing one SNP at a time in unrelated samples
– Our method: estimating the contribution from all SNPs together

~50% of variation explained by all SNPs for height vs. ~10% from GWAS

Page 20: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

GWAS vs all-SNP estimation

Many genetic variants each with a small effect contributing to the trait variation

[Figure: bar chart of variance explained (0% to 50%) for height, schizophrenia and obesity (BMI), comparing GWAS with our method]

Yang et al. 2010 Nat Genet; Yang et al. 2011 Nat Genet; Lee et al. 2012 Nat Genet

Page 21: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Genome partitioning

• Single component MLM
  $y = g + e$ (or $y = Wu + e$)

• Multi-component MLM
  $y = g_1 + g_2 + \dots + g_{22} + e$
  $\mathrm{var}(y) = A_1\sigma_{g_1}^2 + A_2\sigma_{g_2}^2 + \dots + A_{22}\sigma_{g_{22}}^2 + I\sigma_e^2$
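The multi-component variance structure can be assembled by building one relationship matrix per chromosome. The sketch below only constructs V for given variance components; the helper names, the three-chromosome toy data and the component values are hypothetical, and estimating the components would additionally require a REML fit.

```python
import numpy as np

def standardise(G):
    """Centre and scale 0/1/2 genotypes using sample allele frequencies (assumed scaling)."""
    p = G.mean(axis=0) / 2.0
    return (G - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))

def partitioned_V(G, chrom, sigma2_g, sigma2_e):
    """V = sum_c A_c * sigma2_g[c] + I * sigma2_e, with A_c built from SNPs on chromosome c."""
    n = G.shape[0]
    V = sigma2_e * np.eye(n)
    grms = {}
    for c in np.unique(chrom):
        W_c = standardise(G[:, chrom == c])
        grms[int(c)] = W_c @ W_c.T / W_c.shape[1]
        V += sigma2_g[int(c)] * grms[int(c)]
    return V, grms

# Toy data: 40 individuals, 300 SNPs spread over 3 chromosomes (hypothetical)
rng = np.random.default_rng(5)
n, m = 40, 300
maf = rng.uniform(0.2, 0.5, size=m)
G = rng.binomial(2, maf, size=(n, m)).astype(float)
chrom = np.repeat([1, 2, 3], m // 3)
sigma2_g = {1: 0.3, 2: 0.2, 3: 0.1}      # assumed per-chromosome genetic variances
sigma2_e = 0.4

V, grms = partitioned_V(G, chrom, sigma2_g, sigma2_e)
print(V.shape, sorted(grms))
```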

Page 22: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Partitioning the genetic variance into individual chromosomes

Genetic variants distributed across the whole genome

[Figure: three panels (height, schizophrenia, BMI), each plotting the heritability attributed to each chromosome (1 to 22) against chromosome length (Mb). Sample sizes shown: ~12,000 individuals; 9000 cases and 12,000 controls; ~25,000 individuals]

Yang et al. 2011 Nat Genet; Lee et al. 2012 Nat Genet; Yang et al. unpublished

Page 23: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Partitioning the genetic variance based on functional annotation

Genetic signals are enriched in or close to functional genes

[Figure: pie charts. Schizophrenia: CNS+ genes, other genes and intergenic (segments of 30%, 35% and 35%). Height: 83% and 17%. BMI: 68% and 32%. Legend for height and BMI: genic estimate, intergenic estimate]

Yang et al. 2011 Nat Genet; Lee et al. 2012 Nat Genet

Page 24: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

More  …  

• Bivariate analysis – estimating the genetic correlation between two traits or two diseases using SNP data (Deary et al. 2012 Nature; Lee et al. 2013 Nat Genet)

• Fitting a mixture distribution rather than a single normal distribution to the random effects – e.g. Zhou et al. 2013 PLoS Genet

 

Page 25: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Linear model (simple regression based association test)

$y = b_0 + x_1 b_1 + e$
y = trait value; $x_1$ = SNP genotype (0, 1 or 2)
$\hat{b}_1 = x_1^T y / (x_1^T x_1) = \mathrm{cov}(x_1, y) / \mathrm{var}(x_1)$
$SE^2(\hat{b}_1) = \sigma_e^2 / [n\,\mathrm{var}(x_1)]$

Assumption: e is independent and identically distributed

Issues:
1) Relatedness: there are relatives in the sample – inflated false positive rate
2) Population stratification: individuals of different ancestries – spurious association; e.g. trait = eating with chopsticks, data = a random sample of the US population.

Page 26: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Population stratification estimated from SNP data

 

Page 27: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

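Solution: MLM based association analysis

• $y = Xb + Zu + e$ or $y = Xb + g + e$
  $V = \mathrm{var}(y) = A\sigma_g^2 + I\sigma_e^2$

• Testing for fixed effects given sample structure
  $\hat{b} = (X^T V^{-1} X)^{-1} X^T V^{-1} y$
  $\mathrm{var}(\hat{b}) = (X^T V^{-1} X)^{-1}$

• Issue: the SNP being tested is fitted twice, once as a fixed effect and once through the genetic relationship matrix.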

Page 28: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Excluding the SNP from calculating the genetic relationship matrix
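One way to picture this is a generalised least squares test of a single SNP in which the relationship matrix is rebuilt without that SNP. The sketch below is an assumption-laden illustration rather than the procedure of any specific tool: the function name `mlm_test_snp` is hypothetical, the variance components are treated as known (in practice they would be estimated, for example by REML), the genotype standardisation uses sample allele frequencies, and the p-value uses a normal approximation.

```python
import numpy as np
from math import erfc, sqrt

def mlm_test_snp(G, y, j, sigma2_g, sigma2_e):
    """GLS test of SNP j with the GRM computed from all other SNPs."""
    n, m = G.shape
    p = G.mean(axis=0) / 2.0
    W = (G - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    W_rest = np.delete(W, j, axis=1)               # exclude the tested SNP from the GRM
    A = W_rest @ W_rest.T / (m - 1)
    V = A * sigma2_g + np.eye(n) * sigma2_e
    X = np.column_stack([np.ones(n), G[:, j]])     # mean term plus the tested SNP
    Vinv_X = np.linalg.solve(V, X)                 # V^-1 X
    XtVinvX = X.T @ Vinv_X
    b_hat = np.linalg.solve(XtVinvX, Vinv_X.T @ y) # (X'V^-1 X)^-1 X'V^-1 y
    var_b = np.linalg.inv(XtVinvX)                 # var(b-hat) = (X'V^-1 X)^-1
    se = sqrt(var_b[1, 1])
    z = b_hat[1] / se
    return b_hat[1], se, erfc(abs(z) / sqrt(2))    # effect, SE, two-sided p-value

# Toy data (hypothetical sizes; trait simulated with no genetic effect)
rng = np.random.default_rng(6)
n, m = 100, 200
maf = rng.uniform(0.2, 0.5, size=m)
G = rng.binomial(2, maf, size=(n, m)).astype(float)
y = rng.normal(size=n)

b, se, pval = mlm_test_snp(G, y, j=0, sigma2_g=0.5, sigma2_e=0.5)
print(b, se, pval)
```

Setting `sigma2_g` to zero removes the sample-structure term, and the estimate then reduces to the simple-regression slope.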

Page 29: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Software tool http://gump.qimr.edu.au/gcta/

Page 30: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

Acknowledgements

Complex Traits Genomics Group (UQ)
• Peter Visscher
• Naomi Wray
• Hong Lee

University of Melbourne
• Mike Goddard

QIMR cohort
• Nick Martin
• Grant Montgomery

GENEVA Consortium
• Teri Manolio
• Bruce Weir

dbGaP

Page 31: Jian Yang - Mixed linear model analyses of human complex traits using SNP data

The Australian Neurogenetics Conference at the Queensland Brain Institute (QBI), The University of Queensland, on September 11th and 12th, 2014

http://web.qbi.uq.edu.au/anc2014/