Genome-wide Association Studies John S. Witte

Post on 22-Dec-2015

Page 1: Genome-wide Association Studies John S. Witte. Association Studies Hirschhorn & Daly, Nat Rev Genet 2005 Candidate Gene or GWAS

Genome-wide Association Studies

John S. Witte

Page 2

Association Studies

Hirschhorn & Daly, Nat Rev Genet 2005

Candidate Gene or GWAS

Page 3

Affymetrix Array

Genome-wide Association Studies

Altshuler & Clark, Science 2005

Page 4

Genome-wide Association Studies (GWAS)

Page 5

GWAS+ Strategy

Discovery: Multi-stage GWAS+

Confirmation / Characterization: Follow-up Genotyping+

Clarification: Sequencing+

[Figure axes: # Markers, # Samples, Time]

Page 7

One- and Two-Stage GWA Designs

[Figure: two panels, "One-Stage Design" and "Two-Stage Design". Each shows a grid of samples (1, 2, 3, …, N) by SNPs (1, 2, 3, …, M); the two-stage panel is split into Stage 1 and Stage 2 along the samples and markers axes.]

Page 8

[Figure: samples-by-SNPs grids divided into Stage 1 and Stage 2, contrasting "Replication-based analysis" with "Joint analysis"; panel captions read "One-Stage Design" and "Two-Stage Design".]

Page 9

Multistage Designs

• Joint analysis has more power than replication-based analysis.

• The p-value threshold in Stage 1 must be liberal.

• Multistage designs lower cost; they do not gain power.

• CaTS power calculator: http://www.sph.umich.edu/csg/abecasis/CaTS/index.html
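
The first bullet can be explored with a small simulation. The sketch below is not the CaTS calculator; it assumes one truly associated SNP, normally distributed stage-specific z-scores, and a joint statistic that weights each stage by the square root of its sample fraction (one common form, not taken from the slides). All function and parameter names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def two_stage_power(ncp, pi_samples, pi_markers, n_markers,
                    alpha=0.05, n_sim=20000, seed=1):
    """Monte Carlo power of replication-based vs joint analysis for one true SNP.

    ncp        : expected z-score if ALL samples were genotyped (assumed input)
    pi_samples : fraction of samples genotyped in Stage 1
    pi_markers : fraction of markers carried forward to Stage 2
    """
    nd = NormalDist()
    # Liberal Stage 1 threshold: roughly pi_markers of null markers pass.
    c1 = nd.inv_cdf(1 - pi_markers / 2)
    # Replication: Bonferroni over the markers actually tested in Stage 2.
    c_rep = nd.inv_cdf(1 - alpha / (2 * n_markers * pi_markers))
    # Joint: genome-wide Bonferroni threshold (conservative, because it
    # ignores the fact that the marker already passed Stage 1).
    c_joint = nd.inv_cdf(1 - alpha / (2 * n_markers))

    w1, w2 = math.sqrt(pi_samples), math.sqrt(1 - pi_samples)
    rng = random.Random(seed)
    hits_rep = hits_joint = 0
    for _ in range(n_sim):
        z1 = rng.gauss(ncp * w1, 1.0)   # Stage 1 z-score
        z2 = rng.gauss(ncp * w2, 1.0)   # Stage 2 z-score
        if abs(z1) <= c1:
            continue                    # marker not carried forward
        if abs(z2) > c_rep and z1 * z2 > 0:
            hits_rep += 1               # replicated, same direction
        if abs(w1 * z1 + w2 * z2) > c_joint:
            hits_joint += 1             # significant in the joint analysis
    return hits_rep / n_sim, hits_joint / n_sim


if __name__ == "__main__":
    rep, joint = two_stage_power(ncp=5.5, pi_samples=0.3,
                                 pi_markers=0.01, n_markers=500_000)
    print(f"replication power = {rep:.3f}, joint power = {joint:.3f}")
```

Varying `pi_markers` shows why the Stage 1 threshold has to be liberal: a strict first stage discards true signals before either analysis can recover them.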

Page 10

QC Steps

• Filter SNPs and individuals: low MAF, low call rates.

• Test for HWE among controls and within ethnic groups. Use a conservative alpha level.

• Check for relatedness using identity-by-state calculations.
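
A minimal sketch of these QC steps for a single SNP, using only the Python standard library. The thresholds (MAF 1%, call rate 95%, HWE alpha 1e-6) and all function names are illustrative assumptions, not values from the slides; production pipelines normally use dedicated tools.

```python
import math


def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """1-df chi-square test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)           # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(stat / 2))     # chi-square survival function, 1 df


def snp_passes_qc(case_genos, control_genos,
                  maf_min=0.01, call_rate_min=0.95, hwe_alpha=1e-6):
    """Genotypes are allele counts 0/1/2, or None for a failed call."""
    all_genos = list(case_genos) + list(control_genos)
    called = [g for g in all_genos if g is not None]
    if not called:
        return False
    call_rate = len(called) / len(all_genos)
    freq = sum(called) / (2 * len(called))
    maf = min(freq, 1 - freq)
    # HWE is tested among controls only, as on the slide.
    controls = [g for g in control_genos if g is not None]
    if not controls:
        return False
    hwe_p = hwe_chisq_pvalue(controls.count(2), controls.count(1), controls.count(0))
    return call_rate >= call_rate_min and maf >= maf_min and hwe_p >= hwe_alpha


def mean_ibs(genos_1, genos_2):
    """Mean proportion of alleles shared identical-by-state by two individuals."""
    pairs = [(a, b) for a, b in zip(genos_1, genos_2)
             if a is not None and b is not None]
    return sum(2 - abs(a - b) for a, b in pairs) / (2 * len(pairs))
```

Pairs of individuals with an unusually high `mean_ibs` would be flagged as possible duplicates or relatives; the cut-off is study-specific.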

Page 11

Analysis of GWAS

• Most common approach: look at each SNP one at a time.

• Possibly add in multi-marker information.

• Further investigate / report top SNPs only.

• Or backwards replication…

P-values

Page 12

GWAS Analysis

• Most commonly a trend test.

• Log-additive model, fitted by logistic regression.

• Adjust for potential population stratification.
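
The trend test in the first bullet is usually the Cochran-Armitage test, which corresponds to the score test for the log-additive logistic model. A minimal sketch for one SNP follows; the function name and the additive weights (0, 1, 2) are assumptions, and no adjustment for population stratification is included.

```python
import math


def armitage_trend_test(case_counts, control_counts, weights=(0, 1, 2)):
    """Cochran-Armitage trend test on a 2 x 3 genotype table.

    case_counts / control_counts: numbers of individuals carrying
    0, 1, and 2 copies of the tested allele.
    Returns (z, two-sided p-value).
    """
    col_totals = [r + s for r, s in zip(case_counts, control_counts)]
    n_cases, n_controls = sum(case_counts), sum(control_counts)
    n = n_cases + n_controls

    # T = sum_i t_i * (r_i * S - s_i * R)
    t_stat = sum(w * (r * n_controls - s * n_cases)
                 for w, r, s in zip(weights, case_counts, control_counts))
    # Var(T) = (R * S / N) * (N * sum t_i^2 n_i - (sum t_i n_i)^2)
    sum_w2 = sum(w * w * c for w, c in zip(weights, col_totals))
    sum_w = sum(w * c for w, c in zip(weights, col_totals))
    variance = (n_cases * n_controls / n) * (n * sum_w2 - sum_w ** 2)

    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail
    return z, p_value


if __name__ == "__main__":
    z, p = armitage_trend_test((10, 40, 50), (50, 40, 10))
    print(f"z = {z:.2f}, p = {p:.2e}")
```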

Page 13

Quantile-Quantile (QQ) Plot

Page 14

Example: GWAS of Prostate Cancer

Witte, Nat Genet 2007

http://cgems.cancer.gov

[Figure: results by chromosome.]

Multiple prostate cancer loci on 8q24

[Figure: -log(p-value) (0 to 30) against position on 8q24 (128.10 to 128.70 Mb) for Gudmundsson et al., Haiman et al., Yeager et al., and combined (adjusted) results. Region 1, Region 2, and Region 3 are marked, with SNPs rs6983267, rs1447295, and rs16901979 labelled.]

Page 15

Prostate Cancer Replications

Witte, Nat Rev Genet 2009

Chr region  SNP         Alleles  Freq (controls)  Freq (cases)  OR    p value      Nearby genes / function
2p15        rs721048    G/A      0.19             0.21          1.15  7.7x10^-9    EHBP1: endocytic trafficking
3p12        rs2660753   C/T      0.10             0.12          1.30  2.7x10^-8    Intergenic
6q25        rs9364554   C/T      0.29             0.33          1.21  5.5x10^-10   SLC22A3: drugs and toxins
7q21        rs6465657   T/C      0.46             0.50          1.19  1.1x10^-9    LMTK2: endosomal trafficking
8q24 (2)    rs16901979  C/A      0.04             0.06          1.52  1.1x10^-12   Intergenic
8q24 (3)    rs6983267   T/G      0.50             0.56          1.25  9.4x10^-13   Intergenic
8q24 (1)    rs1447295   C/A      0.10             0.14          1.42  6.4x10^-18   Intergenic
10q11       rs10993994  C/T      0.38             0.46          1.38  8.7x10^-29   MSMB: suppressor properties
10q26       rs4962416   T/C      0.27             0.32          1.18  2.7x10^-8    CTBP2: antiapoptotic activity
11q13       rs7931342   T/G      0.51             0.56          1.21  1.7x10^-12   Intergenic
17q12       rs4430796   G/A      0.49             0.55          1.22  1.4x10^-11   HNF1B: suppressor properties
17q24       rs1859962   T/G      0.46             0.51          1.20  2.5x10^-10   Intergenic
19q13       rs2735839   A/G      0.83             0.87          1.37  1.5x10^-18   KLK2/KLK3: PSA
Xp11        rs5945619   T/C      0.36             0.41          1.29  1.5x10^-9    NUDT10, NUDT11: apoptosis

Modest ORs
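
To see how modest these effects are, an allelic odds ratio can be approximated from the control and case allele frequencies in the table. This is a rough sketch with a hypothetical function name: the tabulated frequencies are rounded and the reported ORs come from the original studies, so the two need not agree exactly.

```python
def allelic_odds_ratio(freq_controls, freq_cases):
    """Odds of carrying the allele in cases divided by the odds in controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


if __name__ == "__main__":
    # rs1447295 at 8q24 (1): control frequency 0.10, case frequency 0.14
    print(round(allelic_odds_ratio(0.10, 0.14), 2))   # reported OR on the slide: 1.42
```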

Page 17

SNPs Missed in Replication?

Witte, Nat Rev Genet, 2009

[Table: same loci as the Prostate Cancer Replications table on Page 15; the rs10993994 (10q11) row appears set apart from the rest.]

24,223rd smallest P-value!

Page 18

Manolio et al., J Clin Invest 2008

www.genome.gov/gwastudies

Prostate Cancer

Page 19

Population Attributable Risks for GWAS

Jorgenson & Witte, 2009

[Figure: "Population Attributable Risk and GWAs". PAR (0 to 0.75) plotted against risk allele frequency (0 to 1) for GWAS findings in MI, type 2 diabetes, obesity, Crohn's, type 1 diabetes, prostate cancer, lung cancer, and breast cancer. Smoking & lung cancer and BRCA1 & breast cancer are also marked.]
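
The slide does not give the formula behind the plotted PAR values. As a hedged illustration, the sketch below uses Levin's formula for a binary exposure and, for a risk allele, a version that assumes Hardy-Weinberg proportions and a multiplicative per-allele relative risk (approximated by the OR). Both function names are hypothetical.

```python
def levin_par(exposure_prevalence, relative_risk):
    """Levin's population attributable risk for a binary exposure."""
    excess = exposure_prevalence * (relative_risk - 1)
    return excess / (1 + excess)


def allele_par(risk_allele_freq, per_allele_rr):
    """PAR for a risk allele under HWE and a multiplicative model (assumption).

    The mean relative risk versus non-carriers is (q + p * RR)^2,
    and PAR = 1 - 1 / mean relative risk.
    """
    mean_rr = (1 - risk_allele_freq + risk_allele_freq * per_allele_rr) ** 2
    return 1 - 1 / mean_rr


if __name__ == "__main__":
    # rs10993994 from the prostate cancer table: control frequency 0.38, OR 1.38
    print(round(allele_par(0.38, 1.38), 3))
```

A common allele with a modest OR can therefore carry a sizeable PAR, while a rare high-risk variant has a small one.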

Page 20

Limitations of GWAS

• Not very predictive

Witte, Nat Rev Genet 2009

Example: AUC for Breast Cancer Risk

Gail model = 58%

SNPs = 58.9%

Gail + SNPs = 61.8%

Wacholder et al. NEJM 2010

Page 21

Limitations of GWAS

• Not very predictive

• Explain little heritability

• Focus on common variation

• Many associated variants are not causal

Page 22

Where’s the Heritability?

McCarthy et al., 2008

Many more of these?

See: NEJM, April 30, 2009

Common disease rare variant (CDRV) hypothesis: diseases due to multiple rare variants with intermediate penetrances (allelic heterogeneity)

Page 23

Will GWAS results explain more heritability?

• Possibly, if…

1. Causal SNPs not yet detected due to power / practical issues (e.g., not yet included in replication studies).

2. Stronger effects for causal SNPs: an associated SNP may only serve as a marker for multiple different causal SNPs.

Page 24

Imputation of SNP Genotypes

• Estimate unmeasured or missing genotypes.

• Based on measured SNPs and external info (e.g., haplotype structure of HapMap).

• Increase GWAS power.

• Allow for combining data across different platforms (e.g., Affy & Illumina) (for replication / meta-analysis).

Page 25

Imputation Example

Observed Genotypes (Study Sample)

. . . . A . . . . . . . A . . . . A . . .

. . . . G . . . . . . . C . . . . A . . .

Reference Haplotypes (HapMap/1K genomes)

C G A G A T C T C C T T C T T C T G T G C
C G A G A T C T C C C G A C C T C A T G G
C C A A G C T C T T T T C T T C T G T G C
C G A A G C T C T T T T C T T C T G T G C
C G A G A C T C T C C G A C C T T A T G C
T G G G A T C T C C C G A C C T C A T G G
C G A G A T C T C C C G A C C T T G T G C
C G A G A C T C T T T T C T T T T G T A C
C G A G A C T C T C C G A C C T C G T G C
C G A A G C T C T T T T C T T C T G T G C

Gonçalo Abecasis

Page 26

Identify Match with Reference

Observed Genotypes

. . . . A . . . . . . . A . . . . A . . .

. . . . G . . . . . . . C . . . . A . . .

Reference Haplotypes

C G A G A T C T C C T T C T T C T G T G C
C G A G A T C T C C C G A C C T C A T G G
C C A A G C T C T T T T C T T C T G T G C
C G A A G C T C T T T T C T T C T G T G C
C G A G A C T C T C C G A C C T T A T G C
T G G G A T C T C C C G A C C T C A T G G
C G A G A T C T C C C G A C C T T G T G C
C G A G A C T C T T T T C T T T T G T A C
C G A G A C T C T C C G A C C T C G T G C
C G A A G C T C T T T T C T T C T G T G C

Gonçalo Abecasis

Page 27

Phase chromosomes, impute missing genotypes

Observed Genotypes

c g a g A t c t c c c g A c c t c A t g g

c g a a G c t c t t t t C t t t c A t g g

Reference Haplotypes

C G A G A T C T C C T T C T T C T G T G C
C G A G A T C T C C C G A C C T C A T G G
C C A A G C T C T T T T C T T C T G T G C
C G A A G C T C T T T T C T T C T G T G C
C G A G A C T C T C C G A C C T T A T G C
T G G G A T C T C C C G A C C T C A T G G
C G A G A T C T C C C G A C C T T G T G C
C G A G A C T C T T T T C T T T T G T A C
C G A G A C T C T C C G A C C T C G T G C
C G A A G C T C T T T T C T T C T G T G C

Gonçalo Abecasis

http://www.sph.umich.edu/csg/abecasis/MACH
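
The matching idea on the last three slides can be sketched as a toy copying model: each study haplotype is treated as a mosaic of reference haplotypes, and the most likely mosaic (a Viterbi path) fills in the untyped sites. This is not MaCH. It assumes the two observed rows are already-phased haplotypes, and the `switch` and `error` rates and the function name are made up for illustration.

```python
import math

# The ten reference haplotypes from the slides, one string per haplotype.
REFERENCE = [
    "CGAGATCTCCTTCTTCTGTGC",
    "CGAGATCTCCCGACCTCATGG",
    "CCAAGCTCTTTTCTTCTGTGC",
    "CGAAGCTCTTTTCTTCTGTGC",
    "CGAGACTCTCCGACCTTATGC",
    "TGGGATCTCCCGACCTCATGG",
    "CGAGATCTCCCGACCTTGTGC",
    "CGAGACTCTTTTCTTTTGTAC",
    "CGAGACTCTCCGACCTCGTGC",
    "CGAAGCTCTTTTCTTCTGTGC",
]


def impute_haplotype(observed, reference, switch=0.05, error=0.001):
    """Fill '.' sites by copying from the most likely mosaic of reference haplotypes.

    Typed alleles are kept in upper case; imputed alleles are lower case.
    """
    k, n_sites = len(reference), len(observed)
    log_stay = math.log(1 - switch + switch / k)   # keep copying the same haplotype
    log_jump = math.log(switch / k)                # switch to a given haplotype

    def emit(h, site):
        if observed[site] == ".":
            return 0.0                             # untyped site: no information
        match = reference[h][site] == observed[site]
        return math.log(1 - error) if match else math.log(error)

    score = [emit(h, 0) for h in range(k)]
    back = []
    for site in range(1, n_sites):
        best_prev = max(range(k), key=lambda i: score[i])
        pointers, new_score = [], []
        for h in range(k):
            stay = score[h] + log_stay
            jump = score[best_prev] + log_jump
            if stay >= jump:
                pointers.append(h)
                new_score.append(stay + emit(h, site))
            else:
                pointers.append(best_prev)
                new_score.append(jump + emit(h, site))
        back.append(pointers)
        score = new_score

    # Trace the best path back from the last site.
    h = max(range(k), key=lambda i: score[i])
    path = [h]
    for pointers in reversed(back):
        h = pointers[h]
        path.append(h)
    path.reverse()

    return "".join(
        observed[s] if observed[s] != "." else reference[path[s]][s].lower()
        for s in range(n_sites)
    )


if __name__ == "__main__":
    for row in ("....A.......A....A...", "....G.......C....A..."):
        print(impute_haplotype(row, REFERENCE))
```

With only three typed sites several reference haplotypes fit equally well, so ties are broken by panel order. The second row needs a switch between reference haplotypes, and where this toy model places that switch may differ from the slide.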

Page 28

Imputation Application

TCF7L2 gene region & T2D from the WTCCC data

[Figure: association results by chromosomal position. Observed genotypes in black; imputed genotypes in red.]

Marchini, Nature Genetics 2007

http://www.stats.ox.ac.uk/~marchini/#software

Page 29

Genome-wide Sequence Studies

• Trade-off between number of samples, depth, and genomic coverage.

Sample size  Depth  MAF 0.5-1%  MAF 2-5%
1,000        20x    perfect     perfect
2,000        10x    r2=0.98     r2=0.995
4,000        5x     r2=0.90     r2=0.98

Gonçalo Abecasis

Page 30

Near-term Design Choices

• For example, between:

1. Sequencing few subjects with extreme phenotypes:
   • e.g., 200 cases, 200 controls, 4x coverage. Then follow up in a larger population.

2. 10M SNP chip based on 1,000 genomes.
   • 5K cases, 5K controls.

• Which design will work best…?

Page 31

Polygenic Models

ISC / Purcell et al. Nature 2009

• Do many weak associations combine to affect risk?

• Score model:

$$x_j = \sum_{i=1}^{m} \ln(OR_i) \cdot SNP_{ij}$$

where
– ln(OR_i) = ‘score’ for SNP_i from the ‘discovery’ sample
– SNP_ij = # of alleles (0, 1, 2) for SNP_i, person j in the ‘validation’ sample
– Large number of SNPs (m)

• Is x_j associated with disease?
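
The score model can be written in a few lines. This is a minimal sketch with a hypothetical function name: the odds ratios stand in for estimates from a ‘discovery’ sample, and each inner list holds one ‘validation’ person's allele counts (0, 1, 2) in the same SNP order.

```python
import math


def polygenic_scores(odds_ratios, genotypes):
    """x_j = sum over SNPs i of ln(OR_i) * SNP_ij, for each person j."""
    weights = [math.log(odds_ratio) for odds_ratio in odds_ratios]
    return [sum(w * g for w, g in zip(weights, person)) for person in genotypes]


if __name__ == "__main__":
    # Three illustrative ORs and three people carrying 0, 1, and 2 copies at every SNP.
    discovery_ors = [1.42, 1.38, 1.25]
    validation_genotypes = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
    for score in polygenic_scores(discovery_ors, validation_genotypes):
        print(round(score, 3))
```

Whether x_j is associated with disease would then be checked in the validation sample, for example by comparing scores between cases and controls.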

Page 32

Purcell / ISC et al. Nature 2009

Application of Model

Page 33

Application to CGEMs PCa GWAS

• 1,172 cases, 1,157 controls from the PLCO Trial.

• Oversampled more aggressive cases.

• Illumina 550K array.

• PCa overall and stratified by disease aggressiveness.

• Split into halves, resampling:
– one as ‘discovery’ sample;
– other as ‘validation’.

• LD filter: r2 = 0.5.

Witte & Hoffman 2010

Page 34

Results for Prostate Cancer

Page 35

Common Polygenic Model for Prostate and Breast Cancer?

Nat Rev Cancer 2010;10:205-212

- CGEMs GWAS data on prostate and breast cancer.

- Use one cancer as ‘discovery’ sample, the other as ‘validation’.

Page 36

Results for PCa & BrCa

Page 37

Complex diseases

[Figure: causal diagram with nodes for genetic susceptibility, diet, physical activity, obesity, diabetes, hypertension, hyperlipidemia, atherosclerosis, vulnerable plaques, and MI.]

Complex diseases: Many causes = many causal pathways!

Page 38

Pathways

• Many websites / companies provide ‘dynamic’ graphic models of molecular and biochemical pathways.

• Example: BioCarta: http://www.biocarta.com/

• May be interested in potential joint and/or interaction effects of multiple genes in one pathway.

Page 39

Moving Beyond Genome

Transcriptome: All messenger RNA molecules (‘transcripts’)

Proteome: All proteins in a cell or organism

Metabolome: All metabolites in a biological organism (end products of its gene expression).

Systems Biology