Post on 22-Dec-2015


Genome-wide Association Studies

John S. Witte

Association Studies

Hirschhorn & Daly, Nat Rev Genet 2005

Candidate Gene or GWAS

Affymetrix Array

Genome-wide Association Studies

Altshuler & Clark, Science 2005

Genome-wide Association Studies (GWAS)

GWAS+ Strategy

• Discovery: multi-stage GWAS+

• Confirmation / characterization: follow-up genotyping+

• Clarification: sequencing+

[Figure axes: # markers, # samples, time]


One-Stage Design

Two-Stage Design

[Figure: samples (1, 2, 3, …, N) by SNPs (1, 2, 3, …, M). The one-stage design genotypes all samples for all SNPs; the two-stage design divides the samples into Stage 1 and Stage 2 (axes: samples, markers).]

One- and Two-Stage GWA Designs

[Figure: SNPs by samples for the one-stage design and for the two-stage design (Stage 1, Stage 2), contrasting replication-based analysis with joint analysis.]

Multistage Designs

• Joint analysis has more power than replication

• The Stage 1 p-value threshold must be liberal

• Lower cost, but no gain in power

• CaTS power calculator: http://www.sph.umich.edu/csg/abecasis/CaTS/index.html
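The claim that joint analysis beats replication-based analysis can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not the CaTS method: `ncp` (the expected z statistic if all samples were analysed together), the Bonferroni-style Stage 2 threshold, and all parameter values are hypothetical. Comparing the joint statistic against the ordinary genome-wide threshold ignores the Stage 1 selection, which makes that test conservative; an exact calculation would account for the selection.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def crit(alpha):
    """Two-sided critical value: reject H0 when |z| > crit(alpha)."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2)


def design_power(ncp, frac1, alpha1, m_snps, fwer=0.05, n_sims=20_000, seed=0):
    """Approximate power at one associated SNP for three designs (hypothetical setup).

    ncp    -- assumed expected z statistic if all samples were analysed together
    frac1  -- fraction of samples genotyped in Stage 1
    alpha1 -- liberal Stage 1 p-value threshold
    m_snps -- number of SNPs tested in Stage 1 (Bonferroni correction)
    """
    c_gw = crit(fwer / m_snps)               # genome-wide threshold
    c1 = crit(alpha1)                        # Stage 1 screening threshold
    c_rep = crit(fwer / (alpha1 * m_snps))   # Stage 2 threshold over ~alpha1*m SNPs carried forward
    w1, w2 = frac1 ** 0.5, (1 - frac1) ** 0.5

    # One-stage design: power of a two-sided z test at the genome-wide threshold.
    one_stage = STD_NORMAL.cdf(ncp - c_gw) + STD_NORMAL.cdf(-ncp - c_gw)

    rng = random.Random(seed)
    rep_hits = joint_hits = 0
    for _ in range(n_sims):
        z1 = rng.gauss(ncp * w1, 1.0)        # Stage 1 statistic
        z2 = rng.gauss(ncp * w2, 1.0)        # Stage 2 statistic (independent samples)
        if abs(z1) <= c1:
            continue                         # SNP not carried forward to Stage 2
        if abs(z2) > c_rep:
            rep_hits += 1                    # replication-based analysis: Stage 2 alone
        if abs(w1 * z1 + w2 * z2) > c_gw:
            joint_hits += 1                  # joint analysis: both stages combined
    return one_stage, rep_hits / n_sims, joint_hits / n_sims


one, rep, joint = design_power(ncp=6.0, frac1=0.5, alpha1=0.01, m_snps=300_000)
print(f"one-stage {one:.2f}  replication {rep:.2f}  joint {joint:.2f}")
```

With these inputs the joint analysis recovers most of the one-stage power while genotyping only the carried-forward SNPs in Stage 2, whereas replication-based analysis loses much more.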

QC Steps

• Filter SNPs and individuals: minor allele frequency (MAF), low call rates.

• Test for Hardy-Weinberg equilibrium (HWE) among controls and within ethnic groups, using a conservative alpha level.

• Check for relatedness using identity-by-state (IBS) calculations.
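A minimal sketch of these QC steps, assuming genotypes coded as 0/1/2 allele counts with `None` for a missing call. The thresholds (95% call rate, MAF 1%, HWE alpha 1e-6) are hypothetical placeholders, and the HWE test is the simple 1-df chi-square rather than an exact test. To test within ethnic groups, call `hwe_pvalue` separately on each group's controls.

```python
import math


def call_rate(genos):
    """Fraction of non-missing calls. Genotypes are 0/1/2 allele counts; None = missing."""
    return sum(g is not None for g in genos) / len(genos)


def minor_allele_freq(genos):
    called = [g for g in genos if g is not None]
    p = sum(called) / (2 * len(called))      # frequency of the counted allele
    return min(p, 1 - p)


def hwe_pvalue(genos):
    """1-df chi-square test of Hardy-Weinberg equilibrium (no continuity correction)."""
    called = [g for g in genos if g is not None]
    n = len(called)
    p = sum(called) / (2 * n)
    q = 1 - p
    observed = [called.count(k) for k in (0, 1, 2)]
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2))    # upper tail of chi-square with 1 df


def passes_snp_qc(genos, is_control, min_call=0.95, min_maf=0.01, hwe_alpha=1e-6):
    """Hypothetical thresholds; HWE is tested among controls only, at a conservative alpha."""
    if call_rate(genos) < min_call or minor_allele_freq(genos) < min_maf:
        return False
    controls = [g for g, c in zip(genos, is_control) if c and g is not None]
    return not controls or hwe_pvalue(controls) >= hwe_alpha


def mean_ibs(g1, g2):
    """Average identity-by-state (alleles shared / 2) over SNPs called in both people."""
    shared = [2 - abs(a - b) for a, b in zip(g1, g2) if a is not None and b is not None]
    return sum(shared) / (2 * len(shared))
```

Pairs of individuals with unusually high `mean_ibs` across many SNPs are candidates for duplicates or relatives.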

Analysis of GWAS

• Most common approach: look at each SNP one at a time.

• Possibly add in multi-marker information.

• Further investigate / report top SNPs only.

• Or backwards replication…

P-values

GWAS Analysis

• Most commonly, the trend test.

• Log-additive model, logistic regression.

• Adjust for potential population stratification.
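A sketch of the Cochran-Armitage trend test for one SNP, using additive weights 0, 1, 2 for the number of risk alleles. This test corresponds to the score test of a logistic regression with the genotype coded additively (the log-additive model). The genotype counts below are hypothetical, and this version does not adjust for population stratification; that would require covariates (for example, principal components) in a logistic regression.

```python
import math


def trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2x3 genotype table.

    cases, controls -- genotype counts ordered by number of risk alleles (0, 1, 2)
    weights         -- additive scores for the three genotypes
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    r, s = sum(cases), sum(controls)
    n = r + s
    col = [a + b for a, b in zip(cases, controls)]      # genotype totals
    t = sum(w * (a * s - b * r) for w, a, b in zip(weights, cases, controls))
    var_t = (r * s / n) * (
        sum(w * w * c * (n - c) for w, c in zip(weights, col))
        - 2 * sum(weights[i] * weights[j] * col[i] * col[j]
                  for i in range(3) for j in range(i + 1, 3))
    )
    chi2 = t * t / var_t
    return chi2, math.erfc(math.sqrt(chi2 / 2))         # chi-square(1) upper tail


# Hypothetical counts: cases carry more risk alleles than controls.
chi2, p = trend_test(cases=(10, 20, 30), controls=(30, 20, 10))
print(f"chi2 = {chi2:.1f}, p = {p:.2g}")
```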

Quantile-Quantile (QQ) Plot

[Figure axis: chromosome]

http://cgems.cancer.gov
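A QQ plot compares the sorted observed -log10 p-values with those expected under the null of no association. A related summary is the genomic inflation factor lambda: the median 1-df chi-square statistic divided by its null median (about 0.455). Values near 1 are expected under the null, while values well above 1 can signal population stratification. The sketch below computes both, assuming two-sided p-values in (0, 1] and the common (i - 0.5)/m convention for expected quantiles. Plotting is left to any plotting library.

```python
import math
import statistics
from statistics import NormalDist

STD_NORMAL = NormalDist()


def qq_points(pvalues):
    """(expected, observed) -log10 p pairs for a QQ plot, both sorted ascending."""
    m = len(pvalues)
    observed = sorted(-math.log10(p) for p in pvalues)
    expected = sorted(-math.log10((i - 0.5) / m) for i in range(1, m + 1))
    return list(zip(expected, observed))


def genomic_inflation(pvalues):
    """lambda: median 1-df chi-square statistic divided by its null median (~0.455)."""
    chi2 = [STD_NORMAL.inv_cdf(p / 2) ** 2 for p in pvalues]   # two-sided p -> chi-square
    null_median = STD_NORMAL.inv_cdf(0.25) ** 2
    return statistics.median(chi2) / null_median
```

Under genomic control, each chi-square statistic is divided by lambda before p-values are recomputed.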

Example: GWAS of Prostate Cancer

Witte, Nat Genet 2007

[Figure: Multiple prostate cancer loci on 8q24. -log(p-value) (0 to 30) versus position on 8q24 (128.10 to 128.70 Mb) for Gudmundsson et al., Haiman et al., Yeager et al., and the combined (adjusted) analysis; Region 1 (rs1447295), Region 2 (rs16901979), Region 3 (rs6983267).]

Prostate Cancer Replications

Witte, Nat Rev Genet 2009

Modest ORs

Chr region  SNP         Alleles  Ctrl freq  Case freq  OR    p value      Nearby genes / function
2p15        rs721048    G/A      0.19       0.21       1.15  7.7x10^-9    EHBP1: endocytic trafficking
3p12        rs2660753   C/T      0.10       0.12       1.30  2.7x10^-8    Intergenic
6q25        rs9364554   C/T      0.29       0.33       1.21  5.5x10^-10   SLC22A3: drugs and toxins
7q21        rs6465657   T/C      0.46       0.50       1.19  1.1x10^-9    LMTK2: endosomal trafficking
8q24 (2)    rs16901979  C/A      0.04       0.06       1.52  1.1x10^-12   Intergenic
8q24 (3)    rs6983267   T/G      0.50       0.56       1.25  9.4x10^-13   Intergenic
8q24 (1)    rs1447295   C/A      0.10       0.14       1.42  6.4x10^-18   Intergenic
10q11       rs10993994  C/T      0.38       0.46       1.38  8.7x10^-29   MSMB: suppressor prop.
10q26       rs4962416   T/C      0.27       0.32       1.18  2.7x10^-8    CTBP2: antiapoptotic activity
11q13       rs7931342   T/G      0.51       0.56       1.21  1.7x10^-12   Intergenic
17q12       rs4430796   G/A      0.49       0.55       1.22  1.4x10^-11   HNF1B: suppressor properties
17q24       rs1859962   T/G      0.46       0.51       1.20  2.5x10^-10   Intergenic
19q13       rs2735839   A/G      0.83       0.87       1.37  1.5x10^-18   KLK2/KLK3: PSA
Xp11        rs5945619   T/C      0.36       0.41       1.29  1.5x10^-9    NUDT10, NUDT11: apoptosis


SNPs Missed in Replication?

Witte, Nat Rev Genet, 2009

[The replication table above is repeated on this slide, with the 10q11 rs10993994 (MSMB) row set apart.]

24,223rd smallest p-value!

Manolio et al., J Clin Invest 2008; www.genome.gov/gwastudies

Population Attributable Risk and GWAS

Jorgenson & Witte, 2009

[Figure: population attributable risk (PAR, 0 to 0.75) versus risk allele frequency (0 to 1) for GWAS loci in MI, type 2 diabetes, obesity, Crohn's disease, type 1 diabetes, prostate cancer, lung cancer, and breast cancer, compared with smoking & lung cancer and BRCA1 & breast cancer.]
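PAR depends on both risk allele frequency and effect size, which is why common alleles with modest ORs can still have non-trivial PARs. The sketch below uses Levin's formula over genotype categories. It assumes Hardy-Weinberg proportions, a multiplicative per-allele relative risk, and that the OR approximates the relative risk (reasonable for a rare disease). The example inputs are taken from the rs1447295 row of the replication table purely for illustration.

```python
def par_levin(prevalences, relative_risks):
    """Levin's PAR for exposure categories vs. an unexposed reference group (RR = 1)."""
    excess = sum(p * (rr - 1) for p, rr in zip(prevalences, relative_risks))
    return excess / (1 + excess)


def par_snp(risk_allele_freq, per_allele_rr):
    """PAR for one SNP, assuming HWE and a multiplicative per-allele relative risk."""
    f = risk_allele_freq
    prevalences = [2 * f * (1 - f), f * f]       # heterozygotes, risk-allele homozygotes
    rrs = [per_allele_rr, per_allele_rr ** 2]    # relative to non-carriers
    return par_levin(prevalences, rrs)


# Illustrative inputs: rs1447295, control frequency 0.10, OR 1.42 treated as an RR.
print(f"PAR ~ {par_snp(0.10, 1.42):.3f}")
```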

Limitations of GWAS

• Not very predictive

Witte, Nat Rev Genet 2009

Example: AUC for Breast Cancer Risk

• Gail model: AUC = 58%
• SNPs: AUC = 58.9%
• Gail + SNPs: AUC = 61.8%

Wacholder et al. NEJM 2010
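The AUC can be read as the probability that a randomly chosen case has a higher risk score than a randomly chosen control. On that scale, 50% is chance and 100% is perfect discrimination, so values of 58% to 62% indicate modest discrimination. The minimal sketch below computes the AUC in exactly this pairwise form, counting ties as one half. The scores are whatever risk model is being evaluated (for example, a Gail score or a SNP score). It is quadratic in sample size, so it is meant for illustration rather than large data sets.

```python
def auc(case_scores, control_scores):
    """AUC: probability a random case scores above a random control (ties count 1/2)."""
    wins = 0.0
    for c in case_scores:
        for d in control_scores:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```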

Limitations of GWAS

• Not very predictive
• Explain little of the heritability
• Focus on common variation
• Many associated variants are not causal

Where’s the Heritability?

McCarthy et al., 2008

Many more of these?

See: NEJM, April 30, 2009

Common disease, rare variant (CDRV) hypothesis: diseases are due to multiple rare variants with intermediate penetrances (allelic heterogeneity).

Will GWAS results explain more heritability?

• Possibly, if:

1. Causal SNPs have not yet been detected because of power or practical issues (e.g., they have not yet been included in replication studies).

2. Causal SNPs have stronger effects: an associated SNP may serve only as a marker for multiple different causal SNPs.

Imputation of SNP Genotypes

• Estimate unmeasured or missing genotypes.
• Based on measured SNPs and external information (e.g., the haplotype structure of HapMap).
• Increase GWAS power.
• Allow data from different platforms (e.g., Affy & Illumina) to be combined (for replication / meta-analysis).

Imputation Example

Observed Genotypes (Study Sample)

. . . . A . . . . . . . A . . . . A . . .
. . . . G . . . . . . . C . . . . A . . .

Reference Haplotypes (HapMap/1000 Genomes)

C G A G A T C T C C T T C T T C T G T G C
C G A G A T C T C C C G A C C T C A T G G
C C A A G C T C T T T T C T T C T G T G C
C G A A G C T C T T T T C T T C T G T G C
C G A G A C T C T C C G A C C T T A T G C
T G G G A T C T C C C G A C C T C A T G G
C G A G A T C T C C C G A C C T T G T G C
C G A G A C T C T T T T C T T T T G T A C
C G A G A C T C T C C G A C C T C G T G C
C G A A G C T C T T T T C T T C T G T G C

Gonçalo Abecasis

Identify Match with Reference


Gonçalo Abecasis

Phase chromosomes, impute missing genotypes

Observed Genotypes (imputed alleles in lowercase)

c g a g A t c t c c c g A c c t c A t g g
c g a a G c t c t t t t C t t t c A t g g


Gonçalo Abecasis

http://www.sph.umich.edu/csg/abecasis/MACH
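The matching idea behind the imputation example can be shown with a toy rule. For each haplotype, find the reference haplotypes that best agree at the typed sites, then fill each untyped site with their consensus allele. This is a deliberately simplified stand-in, not MaCH's algorithm. MaCH uses a hidden Markov model that lets an imputed haplotype switch between reference haplotypes, which a single best-match rule cannot do. The sketch also assumes the two rows of observed genotypes are already phased, as on the final slide. With the slide's reference panel, the first haplotype comes out exactly as shown on the slide, but the second does not.

```python
from collections import Counter

# Reference haplotypes transcribed from the slide (HapMap / 1000 Genomes panel).
REFERENCE = [
    "CGAGATCTCCTTCTTCTGTGC",
    "CGAGATCTCCCGACCTCATGG",
    "CCAAGCTCTTTTCTTCTGTGC",
    "CGAAGCTCTTTTCTTCTGTGC",
    "CGAGACTCTCCGACCTTATGC",
    "TGGGATCTCCCGACCTCATGG",
    "CGAGATCTCCCGACCTTGTGC",
    "CGAGACTCTTTTCTTTTGTAC",
    "CGAGACTCTCCGACCTCGTGC",
    "CGAAGCTCTTTTCTTCTGTGC",
]


def impute(observed, reference=REFERENCE):
    """Fill untyped sites ('.') of one phased haplotype; imputed alleles are lowercase.

    Toy rule (not MaCH): copy the consensus of the reference haplotypes with the
    fewest mismatches at the typed sites.
    """
    typed = [i for i, allele in enumerate(observed) if allele != "."]

    def mismatches(hap):
        return sum(hap[i] != observed[i] for i in typed)

    best = min(mismatches(h) for h in reference)
    donors = [h for h in reference if mismatches(h) == best]
    result = []
    for i, allele in enumerate(observed):
        if allele != ".":
            result.append(allele)                    # keep the typed allele
        else:
            consensus = Counter(h[i] for h in donors).most_common(1)[0][0]
            result.append(consensus.lower())         # imputed allele
    return "".join(result)


print(impute("....A.......A....A..."))
print(impute("....G.......C....A..."))
```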

Imputation Application

[Figure axis: chromosomal position]

Marchini et al., Nature Genetics 2007

http://www.stats.ox.ac.uk/~marchini/#software

TCF7L2 gene region & T2D from the WTCCC data

Observed genotypes in black; imputed genotypes in red.

Genome-wide Sequence Studies

• Trade-off between number of samples, depth, and genomic coverage.

Sample size   Depth   MAF 0.5-1%   MAF 2-5%
1,000         20x     perfect      perfect
2,000         10x     r² = 0.98    r² = 0.995
4,000         5x      r² = 0.90    r² = 0.98

Gonçalo Abecasis
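All three rows of the table spend the same total sequencing effort: samples times depth equals 20,000 in each case. The sketch below makes that arithmetic explicit. It also shows why low depth makes individual genotype calls uncertain, using an idealized model in which reads at a site arrive as a Poisson process with mean equal to the depth, split evenly between the two alleles of a heterozygote. This model is an assumption added here; the table's r² values come from the cited analysis, not from this calculation.

```python
import math


def total_sequencing(samples, depth):
    """Total sequencing effort in sample-x units (samples times average depth)."""
    return samples * depth


def p_both_alleles_seen(depth):
    """P(>=1 read of each allele at a heterozygous site); reads ~ Poisson, half per allele."""
    return (1 - math.exp(-depth / 2)) ** 2


for samples, depth in [(1_000, 20), (2_000, 10), (4_000, 5)]:
    print(samples, depth, total_sequencing(samples, depth),
          round(p_both_alleles_seen(depth), 3))
```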

Near-term Design Choices

• For example, choosing between:

1. Sequencing a few subjects with extreme phenotypes (e.g., 200 cases and 200 controls at 4x coverage), then following up in a larger population.

2. A 10M-SNP chip based on the 1000 Genomes Project, genotyped in 5K cases and 5K controls.

• Which design will work best…?

• Do many weak associations combine to affect risk?

• Score model:

$$x_j = \frac{1}{m}\sum_{i=1}^{m} \ln(\mathrm{OR}_i)\,\mathrm{SNP}_{ij}$$

where
– ln(OR_i) = ‘score’ for SNP i from the ‘discovery’ sample;
– SNP_ij = number of alleles (0, 1, 2) for SNP i, person j in the ‘validation’ sample;
– m = number of SNPs (large).

• Is x_j associated with disease?
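The score model can be sketched end to end. First, estimate ln(OR_i) per SNP in the discovery sample. Next, compute x_j for each person in the validation sample. Finally, test whether x_j is associated with case status. The choices below are assumptions for illustration, not the cited studies' exact procedure: allelic ORs come from 2x2 allele counts with 0.5 added to each cell, and the association test is the 1-df statistic N·r² between score and case status (the same form as the trend test). The 1/m scaling does not affect this test, because r² is unchanged by rescaling the score.

```python
import math


def allelic_log_or(case_alleles, control_alleles):
    """ln(OR) from allele counts (risk, other) in cases and controls; adds 0.5 to each cell."""
    a, b = (x + 0.5 for x in case_alleles)
    c, d = (x + 0.5 for x in control_alleles)
    return math.log((a * d) / (b * c))


def polygenic_score(log_ors, genotypes):
    """x_j = (1/m) * sum_i ln(OR_i) * SNP_ij for one person (genotypes are 0/1/2 counts)."""
    return sum(w * g for w, g in zip(log_ors, genotypes)) / len(log_ors)


def score_association(scores, status):
    """1-df test of score vs. case status (1 = case) via N * r^2; returns (chi2, p)."""
    n = len(scores)
    ms, my = sum(scores) / n, sum(status) / n
    sxy = sum((x - ms) * (y - my) for x, y in zip(scores, status))
    sxx = sum((x - ms) ** 2 for x in scores)
    syy = sum((y - my) ** 2 for y in status)
    chi2 = n * sxy * sxy / (sxx * syy)
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```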

Polygenic Models

ISC / Purcell et al. Nature 2009

Purcell / ISC et al. Nature 2009

Application of Model

Application to CGEMS PCa GWAS

• 1,172 cases, 1,157 controls from the PLCO Trial
• Oversampled more aggressive cases
• Illumina 550K array

• PCa overall and stratified by disease aggressiveness
• Split into halves, with resampling:
– one half as the ‘discovery’ sample;
– the other as the ‘validation’ sample.

• LD filter: r² = 0.5.

Witte & Hoffman 2010
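The split-half resampling and LD filter can be sketched as two small helpers. The first randomly splits the samples into ‘discovery’ and ‘validation’ halves (repeat with different seeds to resample). The second greedily keeps SNPs whose r² with every already-kept SNP is below the threshold. The greedy order, the use of squared genotype correlation as the LD measure, and the treatment of monomorphic SNPs are assumptions for illustration; the study's exact pruning procedure is not described here.

```python
import random


def split_halves(sample_ids, seed):
    """Randomly split samples into 'discovery' and 'validation' halves (one resample)."""
    shuffled = list(sample_ids)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def genotype_r2(g1, g2):
    """Squared correlation of 0/1/2 allele counts, a common proxy for LD r^2."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    sxx = sum((a - m1) ** 2 for a in g1)
    syy = sum((b - m2) ** 2 for b in g2)
    if sxx == 0 or syy == 0:
        return 0.0                     # monomorphic SNP: treat as uncorrelated
    return sxy * sxy / (sxx * syy)


def ld_prune(snp_genotypes, threshold=0.5):
    """Greedy filter: keep a SNP only if r^2 < threshold with every SNP already kept."""
    kept = []
    for idx, g in enumerate(snp_genotypes):
        if all(genotype_r2(g, snp_genotypes[k]) < threshold for k in kept):
            kept.append(idx)
    return kept
```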

Results for Prostate Cancer

Nat Rev Cancer 2010;10:205-212

Common Polygenic Model for Prostate and Breast Cancer?

- CGEMS GWAS data on prostate and breast cancer.
- Use one cancer as the ‘discovery’ sample and the other as the ‘validation’ sample.

Results for PCa & BrCa

[Figure: causal diagram linking genetic susceptibility, diet, physical activity, obesity, diabetes, hypertension, hyperlipidemia, atherosclerosis, vulnerable plaques, and MI.]

Complex diseases: Many causes = many causal pathways!

Pathways

• Many websites / companies provide ‘dynamic’ graphic models of molecular and biochemical pathways.

• Example: BioCarta: http://www.biocarta.com/

• One may be interested in the joint and/or interaction effects of multiple genes in a pathway.
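One common way to express joint and interaction effects for two genes in a pathway is a logistic model with a product term. This is a generic illustration, not the specific model used in the talk. The notation is introduced here: D_j is disease status for person j, G_{1j} and G_{2j} are that person's allele counts at the two genes, and the β's are regression coefficients.

$$\operatorname{logit} P(D_j = 1) = \beta_0 + \beta_1 G_{1j} + \beta_2 G_{2j} + \beta_{12}\, G_{1j} G_{2j}$$

Testing β12 = 0 asks whether the genes interact on the log-odds (multiplicative) scale. The 3-df joint test of β1 = β2 = β12 = 0 asks whether the two genes together are associated with disease at all.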

Moving Beyond the Genome

Transcriptome: All messenger RNA molecules (‘transcripts’)

Proteome: All proteins in a cell or organism

Metabolome: All metabolites in an organism (the end products of its gene expression)

Systems Biology
