epi519 gwas talk
DESCRIPTION
A lecture for UW EPI 519 providing background for genome-wide association studies, a few examples of recent papers in the CVD GWAS literature, and some lessons and new directions. The talk was originally given in 2008 (in collaboration with a colleague); this version has been updated slightly for 2010 and includes references for further reading. Some of the typefaces may have been mangled on conversion; the file download should be more reliable.
TRANSCRIPT
Genome-wide Association Studies
EPI 519, 21 October 2010
Joshua C. Bis, PhD
University of Washington, Cardiovascular Health Research Unit
The Type 1 Diabetes Genetics Consortium. Nature Genetics, 2009 May 10
Complex phenotypes
Manolio et al. J. Clin. Invest. 118:1590-1605 (2008).
rationale for association studies
Balding. Nature Reviews Genetics. 2006; 7:781-791
candidate genes
Manolio, Boerwinkle, O’Donnell, Wilson. Arterioscler Thromb Vasc Biol. 2004;24:1567-1577.
Trait                      Gene   Polymorphism    Frequency
Deep vein thrombosis       F5     Arg506Gln       0.015
Graves’ disease            CTLA4  Thr17Ala        0.62
Type 1 diabetes            INS    5’ VNTR         0.67
HIV infection              CCR5   32 bp Ins/Del   0.05-0.07
Alzheimer’s disease        APOE   Epsilon 2/3/4   0.16-0.24
Creutzfeldt-Jakob disease  PRNP   Met129Val       0.37
highly consistent associations*
Hirschhorn: Genet Med, Volume 4(2).March/April 2002.45-61
* Associations between polymorphisms and disease for which at least 75% of identified studies achieved statistical significance (out of 600 gene–disease associations reviewed).
“genomics” The field within genetics concerned with the structure and function of the entire DNA sequence of an individual or population.
-- Thomas Roderick, McDonald’s Raw Bar, 1986
genome-wide association study “… a study of common genetic variation across the entire human genome designed to identify genetic associations with observable traits.”
-- National Institutes of Health, “Policy for sharing of data obtained in NIH-sponsored or conducted GWAS”
“A major strength of the genome-wide approach … has been its freedom from reliance on prior knowledge.”
-- Manolio, Brooks, Collins, “A HapMap harvest of insights into the genetics of common disease”
genome-wide publication epidemic
genome.gov/GWAStudies :: 2 November 2009
[Figure: costs per genotype ($1.00 down to $0.01, log scale) versus number of SNPs per assay (1 to 10^6), 2001 to 2005, by platform: ABI TaqMan, ABI SNPplex, Illumina GoldenGate, Affymetrix 10K, Affymetrix MegAllele, Illumina Infinium/Sentrix, Perlegen, Affymetrix 100K/500K, Illumina 2.5M. S. Chanock, NCI]
Modified from http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helpsnpfaq
haplotypes
The International HapMap Consortium. Nature | Vol 437 | 27 October 2005
“… to create a public, genome-wide database of common human sequence variation, providing information needed as a guide to genetic studies of clinical phenotypes.”
-- October 2002
Ben Fry, for Genome Research. November 2005
imputation
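Uses patterns of correlated variation (linkage disequilibrium) in the HapMap reference panels to impute genotypes at SNPs that were not directly genotyped. This increases power by allowing association testing at untyped markers, and allows comparisons across studies and genotyping platforms by using a common set of SNPs.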
Li, Willer, Sanna, Abecasis. Annu Rev Genomics Hum Genet. 2009;10:387-406
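To make the idea concrete, here is a minimal sketch of imputation, assuming a tiny hypothetical panel of phased reference haplotypes (alleles coded 0/1) and a study haplotype whose untyped SNPs are marked "?". It fills each gap by majority vote among the reference haplotypes that agree at every typed SNP. This is a deliberately simple stand-in, not the hidden Markov model methods reviewed by Li et al.; the names `REFERENCE_HAPLOTYPES` and `impute_haplotype` are illustrative only.

```python
from collections import Counter

# Hypothetical reference panel: phased haplotypes over five SNPs, alleles coded 0/1.
# A real panel (e.g. HapMap) has far more haplotypes and markers.
REFERENCE_HAPLOTYPES = ["00110", "00111", "00111", "11000", "11001", "01110"]


def impute_haplotype(observed, reference=REFERENCE_HAPLOTYPES):
    """Fill untyped positions ('?') by majority vote among reference haplotypes
    that agree with `observed` at every typed position."""
    typed = [i for i, allele in enumerate(observed) if allele != "?"]
    matches = [h for h in reference if all(h[i] == observed[i] for i in typed)]
    if not matches:
        return observed  # no compatible reference haplotype: leave gaps unfilled
    result = []
    for i, allele in enumerate(observed):
        if allele != "?":
            result.append(allele)  # keep the directly genotyped allele
        else:
            votes = Counter(h[i] for h in matches)
            result.append(votes.most_common(1)[0][0])  # most common allele at this SNP
    return "".join(result)


# SNPs 3 and 5 were not on the array; infer them from the panel.
print(impute_haplotype("00?1?"))  # 00111
```

Real study data are unphased genotypes, and real imputation returns probabilities (or dosages) rather than hard calls, which is why the published methods are considerably more involved.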
(SOME PRACTICAL CONSIDERATIONS)
genotyping
[Figure: genotype cluster plot; ratio of intensities from the two allele-specific channels. Calls = 2,461; no calls = 27]
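A toy version of genotype calling from two-channel intensities, assuming hypothetical fixed cut-offs on the fraction of signal in the B-allele channel (real calling software fits clusters separately for each SNP). Samples falling between clusters, or with too little signal, get no call; the call rate is then the fraction of samples called, here 2,461 / (2,461 + 27). The function names and thresholds are illustrative assumptions.

```python
def call_genotype(x, y, min_total=0.2):
    """Call a genotype from allele-A (x) and allele-B (y) channel intensities.

    Thresholds are hypothetical; real calling algorithms fit clusters per SNP.
    """
    total = x + y
    if total < min_total:
        return None              # signal too weak: no call
    b_fraction = y / total       # share of the signal coming from the B-allele channel
    if b_fraction < 0.2:
        return "AA"
    if 0.35 <= b_fraction <= 0.65:
        return "AB"
    if b_fraction > 0.8:
        return "BB"
    return None                  # between clusters: no call


def call_rate(n_called, n_no_call):
    """Fraction of samples (or SNPs) that received a genotype call."""
    return n_called / (n_called + n_no_call)


# Counts from the cluster plot: 2,461 calls and 27 no-calls.
print(f"call rate = {call_rate(2461, 27):.3f}")  # call rate = 0.989
```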
genotyping: raw data
analysis × 2.5 million
association study
[Figure: genotypes (CC, CT, TT) at a single SNP, listed for each control and each case]
Odds ratio for C allele: 1.35, p = 6.3 × 10^-7
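The single-SNP test repeated for each marker can be sketched as an allelic 2×2 chi-square test, one common choice among several (trend tests and logistic regression are also widely used). The genotype counts below are hypothetical and are not the data shown on the slide.

```python
import math


def allele_counts(genotypes, allele="C"):
    """Return (copies of `allele`, copies of the other allele) for genotypes like 'CT'."""
    n_allele = sum(g.count(allele) for g in genotypes)
    return n_allele, 2 * len(genotypes) - n_allele


def allelic_test(cases, controls, allele="C"):
    """Allelic odds ratio and 1-df chi-square test on the 2x2 table of allele counts."""
    a, b = allele_counts(cases, allele)     # cases: tested allele, other allele
    c, d = allele_counts(controls, allele)  # controls: tested allele, other allele
    odds_ratio = (a * d) / (b * c)
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return odds_ratio, chi2, p_value


# Hypothetical genotype counts, not the data on the slide.
cases = ["CC"] * 300 + ["CT"] * 500 + ["TT"] * 200
controls = ["CC"] * 250 + ["CT"] * 500 + ["TT"] * 250
or_c, chi2, p = allelic_test(cases, controls)
print(f"OR for C allele = {or_c:.2f}, chi2 = {chi2:.2f}, p = {p:.2g}")
```

Counting alleles rather than people treats the two alleles of each person as independent, which is reasonable only when genotypes are close to Hardy-Weinberg proportions.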
Manhattan plot
(McCarthy et al.,Nature Reviews Genetics, May 2008)
p-value: the probability of seeing your data, or more extreme data, if the null hypothesis is true.
By chance alone, with 1,000,000 statistical tests:
• a threshold of p = 0.05 would show 50,000 “significant” associations (comparable power needs 360 cases : 360 controls)
• a threshold of p = 0.05/1,000,000 (5 × 10^-8) would show 0.05 “significant” associations (comparable power needs 1,590 cases : 1,590 controls)
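The arithmetic above, written out as a short sketch. The last line adds one step the slide leaves implicit: if the tests were independent (an assumption; neighbouring SNPs are correlated), the Bonferroni threshold keeps the chance of even one false positive just under 5%.

```python
n_tests = 1_000_000
alpha = 0.05

# Expected number of false-positive "hits" if every null hypothesis is true.
expected_at_nominal = n_tests * alpha                    # 50,000

# Bonferroni correction: divide the desired error rate by the number of tests.
bonferroni_threshold = alpha / n_tests                   # 5e-8
expected_at_bonferroni = n_tests * bonferroni_threshold  # 0.05

# Probability of at least one false positive, assuming independent tests.
fwer = 1 - (1 - bonferroni_threshold) ** n_tests

print(expected_at_nominal, bonferroni_threshold, expected_at_bonferroni, round(fwer, 4))
```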
study design considerations
• Case-control or cohort
• Sample size
• Phenotype definition
• Comparability of cases and controls
• Genotyping quality
• Population substructure
• Laboratory procedures, genotyping, data cleaning
population stratification
Confounding by population stratification requires subpopulations that differ in both allele frequency and disease prevalence.
Balding. Nature Reviews Genetics. 2006; 7:781-791
(modified from McCarthy et al.,Nature Reviews Genetics, May 2008)
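A small numeric sketch of why both differences are needed, using hypothetical allele counts: within each subpopulation the allele has no effect (OR = 1), yet pooling the two produces a strong spurious association because population 1 has both a higher allele frequency and a larger share of cases. The Mantel-Haenszel odds ratio, stratified on ancestry, is used here as one classic adjustment; GWAS more commonly adjust with principal components or genomic control.

```python
def odds_ratio(a, b, c, d):
    """a, b: risk/other allele counts in cases; c, d: risk/other allele counts in controls."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Stratum-adjusted odds ratio pooled across (a, b, c, d) tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical allele counts (cases risk, cases other, controls risk, controls other).
pop1 = (800, 200, 400, 100)  # allele frequency 0.8, many cases sampled
pop2 = (20, 80, 200, 800)    # allele frequency 0.2, few cases sampled
pooled = tuple(x + y for x, y in zip(pop1, pop2))

print("within pop 1:", odds_ratio(*pop1))
print("within pop 2:", odds_ratio(*pop2))
print("pooled, ignoring ancestry:", round(odds_ratio(*pooled), 2))
print("Mantel-Haenszel, stratified by ancestry:", mantel_haenszel_or([pop1, pop2]))
```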
Q-Q plots
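A Q-Q plot sets the observed test statistics, sorted, against the values expected under the null; systematic departure along the whole curve suggests stratification or other bias, while departure only in the tail is what true associations look like. A minimal sketch, assuming p-values from 1-df tests: it computes the plotting coordinates and the genomic-control inflation factor λ (median observed chi-square divided by its null median, about 0.455), a common companion summary. Function names are illustrative.

```python
import math
from statistics import NormalDist, median

# Median of a 1-df chi-square variable = (75th percentile of N(0, 1))**2, about 0.455.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def qq_points(p_values):
    """(expected, observed) -log10 p-value pairs for a Q-Q plot under a uniform null."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


def genomic_inflation(chi2_stats):
    """Genomic-control lambda: observed median chi-square over its null expectation."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

A λ close to 1 suggests little overall inflation; values well above 1 are a signal to revisit case-control comparability and population substructure.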
TA Manolio et al. Nature 461, 747-753 (2009) doi:10.1038/nature08494
Feasibility of identifying genetic variants by risk allele frequency and strength of genetic effect (odds ratio).
Allele frequency & effect size
reasons for larger sample size: • More genotypes / tests
• More genotype error or misclassification
• Higher heterogeneity of association
• Lower effect size
• Lower frequency of risk allele
• Lower correlation between marker allele and risk allele.
power & sample size
(Rice, personal communication)
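A rough sample-size calculation that reflects several of the reasons listed above (stricter threshold, smaller effect, rarer risk allele). It uses the standard normal approximation for comparing two proportions, applied to allele counts, and assumes equal numbers of cases and controls, a multiplicative allelic model, and that the tested SNP is itself the causal variant. It is a sketch, not the calculation behind the figure credited to Rice.

```python
import math
from statistics import NormalDist


def case_allele_freq(p0, odds_ratio):
    """Risk-allele frequency in cases implied by control frequency p0 and an allelic OR."""
    odds = odds_ratio * p0 / (1 - p0)
    return odds / (1 + odds)


def cases_needed(p0, odds_ratio, alpha=5e-8, power=0.80):
    """Cases (= controls) for a two-sided allelic test, normal approximation."""
    p1 = case_allele_freq(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    alleles_per_group = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2 / (p1 - p0) ** 2
    return math.ceil(alleles_per_group / 2)  # two alleles per person


for p0, or_ in [(0.30, 1.5), (0.30, 1.2), (0.05, 1.2)]:
    print(p0, or_, cases_needed(p0, or_))
```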
Multi-stage discovery: carry forward a large number of potential associations through multiple, narrowing stages.
• Protect against false positives via replication
• Minimize false negative results via permissive early thresholds
From Hoover, R. Epidemiology. 18(1):13-17, January 2007.
Meta-analysis: combine results from several studies, using traditional meta-analysis methods, to increase power.
• Allows first-stage discovery of small effect sizes
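One traditional method, sketched here under the assumption of a fixed-effect (inverse-variance) model on the log odds ratio scale, with each study's standard error recovered from its 95% confidence interval. The three studies are hypothetical; none is significant on its own, but the pooled estimate is.

```python
import math


def fixed_effect_meta(studies):
    """Inverse-variance fixed-effect meta-analysis of odds ratios.

    studies: list of (odds_ratio, lower_95, upper_95).
    Returns (pooled OR, SE of the pooled log OR, two-sided p-value).
    """
    z95 = 1.959964  # standard normal quantile for a 95% interval
    sum_w = sum_wb = 0.0
    for or_, lo, hi in studies:
        beta = math.log(or_)                            # log odds ratio
        se = (math.log(hi) - math.log(lo)) / (2 * z95)  # SE recovered from the CI
        w = 1 / se ** 2                                 # inverse-variance weight
        sum_w += w
        sum_wb += w * beta
    beta_pooled = sum_wb / sum_w
    se_pooled = math.sqrt(1 / sum_w)
    p_value = math.erfc(abs(beta_pooled / se_pooled) / math.sqrt(2))
    return math.exp(beta_pooled), se_pooled, p_value


# Three hypothetical studies, each with a 95% CI that includes 1.
studies = [(1.15, 0.95, 1.39), (1.18, 0.98, 1.42), (1.12, 0.96, 1.31)]
print(fixed_effect_meta(studies))
```

A random-effects model would be the usual alternative when the studies show heterogeneity of association.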
(SELECTED EXAMPLES)
Wellcome Trust Case Control Consortium
• One of the biggest projects undertaken to identify genetic variation that may be associated with disease
• £9 million in funding from the Wellcome Trust
• GWAS of seven common diseases: 2,000 cases each and 3,000 shared controls
• All genotyping data available to the scientific community
www.wtccc.org.uk; (Nature, vol 447, 7 June 2007)
Lon Cardon, SISG 2007
UK Control groups are NOT very different
Lon Cardon, SISG 2007
WTCCC results
Samani, NEJM 2007
Coronary disease GWAS: 9p21
McPherson (Science, May 2007): 3-stage case-control; discovery OHS-1; replication OHS-2, OHS-3, ARIC, CCHS, DHS; cases: severe premature CHD, age at onset <60
Helgadottir (Science, May 2007): case-control; discovery deCODE Iceland A; replication Iceland B, 3 U.S. case-control studies; cases: MI, age at onset <70 males, <75 females
Samani (NEJM, August 2007): case-control; discovery WTCCC; replication German Family Study; cases: MI or revascularization + family history of CAD, age at onset <66
Larson (BMC Med Gen, Sept 2007): cohort; discovery Framingham Heart Study; cases: incident MI
9p21 results
Helgadottir, Science 2007 McPherson, Science 2007
study               SNP         locus  hazard/odds ratio                           PAR
ARIC                rs10757274  9p21   AB: 1.18 (1.02-1.37)  BB: 1.29 (1.09-1.52)  12-15%
CCHS                rs10757274  9p21   AB: 1.26 (1.12-1.42)  BB: 1.38 (1.19-1.60)  10-13%
deCODE              rs10757278  9p21   AB: 1.26 (1.16-1.36)  BB: 1.64 (1.47-1.82)  21%
deCODE early onset  rs10757278  9p21   AB: 1.49 (1.31-1.69)  BB: 2.02 (1.72-2.36)  31%
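The PAR column is the fraction of cases that would not occur if everyone carried the baseline genotype. The slide does not say how these PARs were computed; a common formula for a three-level exposure is sketched below, assuming Hardy-Weinberg genotype frequencies, a hypothetical risk-allele frequency of 0.5, and the deCODE odds ratios used as approximate relative risks (reasonable only when the disease is rare). It is not expected to reproduce the table exactly.

```python
def population_attributable_risk(genotype_freqs, relative_risks):
    """PAR for a multi-level exposure: sum f_i (RR_i - 1) / (1 + sum f_i (RR_i - 1))."""
    excess = sum(f * (rr - 1) for f, rr in zip(genotype_freqs, relative_risks))
    return excess / (1 + excess)


p = 0.5  # hypothetical risk-allele (B) frequency
freqs = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]  # AA, AB, BB under Hardy-Weinberg
rrs = [1.0, 1.26, 1.64]  # deCODE odds ratios, treated as approximate relative risks
print(f"PAR = {population_attributable_risk(freqs, rrs):.0%}")  # PAR = 22%
```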
9p21 gene region
• 9p21 locus is not located within a known gene
• surrounding region contains the CDKN2A and CDKN2B genes
  • role in cell proliferation, cell aging and apoptosis, important features of atherogenesis
• sequencing did not reveal obvious candidates
• may implicate a previously unrecognized gene or regulatory element
• same region also associated with type 2 diabetes
MIGen: Population
Study                                   Cases (mean age)  Controls (mean age)
Italian ATVB (Italy)                    1,693 (39 y)      1,668 (39 y)
Heart Attack Risk in Puget Sound (USA)  505 (46 y)        559 (45 y)
REGICOR (Spain)                         312 (46 y)        317 (46 y)
MGH (USA)                               204 (47 y)        260 (54 y)
FINRISK (Finland)                       167 (47 y)        172 (47 y)
Malmö Diet & Cancer (Sweden)            86 (47 y)         99 (49 y)
MIGen: Design
MIGen: Early MI SNPs (locus, genes of interest)
✔ 1p13 CELSR2-PSRC1-SORT1
✔ 1q41 MIA3
✘ 2p36 --
✪ 2q33 WDR12
✪ 3q22.3 MRAS
✪ 6p24.1 PHACTR1
✘ 6q25 MTHFD1L
✔ 9p21 CDKN2A-CDKN2B
✔ 10q11 CXCL12
✘ 15q22 SMAD3
✪ 19p13.2 LDLR
✪ 21q22 MRPS6-SLC5A3-KCNE2
MIGen: Risk Score
4 replicated loci: CDKN2A-B, CELSR2-PSRC1-SORT1, MIA3, CXCL12
5 new loci: SLC5A3-MRPS6-KCNE2, PHACTR1, WDR12, LDLR, PCSK9
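The slide does not say how the MIGen risk score was constructed. A common simple form, assumed here, counts risk alleles across the associated loci, optionally weighting each locus (for example by its log odds ratio). Locus names, alleles, and weights below are placeholders, not MIGen values.

```python
def genetic_risk_score(person, risk_alleles, weights=None):
    """Sum of risk-allele copies across loci, optionally weighted per locus.

    person: locus -> genotype string such as "Rr"; risk_alleles: locus -> risk allele.
    """
    score = 0.0
    for locus, allele in risk_alleles.items():
        copies = person[locus].count(allele)  # 0, 1 or 2 risk alleles at this locus
        weight = 1.0 if weights is None else weights[locus]
        score += weight * copies
    return score


# Hypothetical loci and genotypes; "R" marks the risk allele, "r" the other allele.
risk_alleles = {"locus_1": "R", "locus_2": "R", "locus_3": "R"}
person = {"locus_1": "RR", "locus_2": "Rr", "locus_3": "rr"}
print(genetic_risk_score(person, risk_alleles))  # 3.0
```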
(LESSONS, QUESTIONS, DIRECTIONS)
Published genome-wide associations through 6/2010: 904 published GWA at p < 5 × 10^-8 for 165 traits
NHGRI GWA Catalog www.genome.gov/GWAStudies
new biology: genomic context
Manolio, NEJM 2010
new biology: mechanisms
Manolio, NEJM 2010
new biology: connections
Manolio, NEJM 2010
missing heritability (2009)
Manolio, Nature 2009
Trait                             Number of loci  % of heritability explained
Age-related macular degeneration  5               50%
Crohn’s disease                   32              20%
Type 2 diabetes                   18              6%
HDL cholesterol                   7               5%
Height                            40              5%
Early-onset MI                    9               2.8%
Fasting glucose                   4               1.5%
missing heritability
• many variants with small effects yet to be found
  • larger sample sizes have revealed more loci
• true positives below significance threshold
• contribution of rare variants
• failure to identify true causal variant
• structural variants poorly captured by arrays
• previous estimates of heritability flawed
• GxG or GxE interactions
missing heritability (update)
• Meta-analysis of >100,000 individuals discovers 59 new associations with blood lipids (Teslovich, Nature 2010)
• SNPs explain ~12% of trait variability and ~25% of heritability
• Some predict MI risk; point to LDL/HDL differences
disease prediction
hope: highly predictive and affordable genetic tests
reality: low discriminatory and predictive ability
Manolio, NEJM 2010
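Discriminatory ability is usually summarized by the area under the ROC curve (AUC, or C statistic): the probability that a randomly chosen case has a higher score than a randomly chosen control, where 0.5 is no better than chance. A minimal sketch, assuming hypothetical risk-allele counts that overlap heavily between cases and controls, as expected when each variant has a small effect.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve: probability that a randomly chosen case scores
    higher than a randomly chosen control (ties count one half)."""
    wins = 0.0
    for s_case in case_scores:
        for s_control in control_scores:
            if s_case > s_control:
                wins += 1.0
            elif s_case == s_control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk-allele counts for four cases and four controls.
cases = [6, 7, 8, 9]
controls = [5, 7, 8, 8]
print(auc(cases, controls))  # 0.59375
```

The pairwise loop is quadratic in sample size; for large studies the same quantity is obtained from ranks (the Mann-Whitney U statistic).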
next steps
• Ever larger sample sizes
• Studies of non-European ethnic populations
• Sequencing implicated genetic regions
• More complex genetic models
  • gene × gene interactions
  • pooling of rare variants
• Functional biology: work in basic science and animal models
summary
• GWAS have led to new biology
• Small effect sizes
• Not useful in prediction
• Much yet to be discovered
• More complicated than we thought
Don’t forget:
• case definition
• QC measures
• sample size and power
• multiple testing
• independent replication
“There have been few, if any, similar bursts of discovery in the history of medical research”
-- “Drinking from the fire hose …” (Hunter & Kraft)
Consumer Genotyping Toys
Sources / References / Reading
1. The International HapMap Consortium.* A haplotype map of the human genome. Nature, 2005. 437(7063): p. 1299-320. [16255080]
2. The Type 1 Diabetes Genetics Consortium.* Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes. Nature Genetics, 2009 May 10. [19430480]
3. Myocardial Infarction Genetics Consortium.* Genome-wide association of early-onset myocardial infarction with single nucleotide polymorphisms and copy number variants. Nat Genet, 2009. 41(3): p. 334-41. [19198609]
4. The Wellcome Trust Case Control Consortium.* Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 2007. 447(7145): p. 661-78. [17554300]
5. Balding, D.J. A tutorial on statistical methods for population association studies. Nat Rev Genet, 2006. 7(10): p. 781-91. [16983374]
6. Christensen, K. and J.C. Murray. What genome-wide association studies can do for medicine. N Engl J Med, 2007. 356(11): p. 1094-7. [17360987]
7. Frazer, K.A., et al. A second generation human haplotype map of over 3.1 million SNPs. Nature, 2007. 449(7164): p. 851-61. [17943122]
8. Hirschhorn, J.N., et al. A comprehensive review of genetic association studies. Genet Med, 2002. 4(2): p. 45-61. [11882781]
9. Hoover, R. The evolution of epidemiologic research: from cottage industry to "big" science. Epidemiology, 2007. 18(1): p. 13-7. [17179754]
10. Hunter, D.J. and P. Kraft. Drinking from the fire hose--statistical issues in genomewide association studies. N Engl J Med, 2007. 357(5): p. 436-9. [17634446]
11. Li, Y., C. Willer, S. Sanna, and G. Abecasis. Genotype imputation. Annu Rev Genomics Hum Genet, 2009. 10: p. 387-406. [19715440]
12. Johnson, A.D. and C.J. O’Donnell. An open access database of genome-wide association results. BMC Med Genet, 2009. 10: p. 6.
13. Manolio, T.A., et al. Genetics of ultrasonographic carotid atherosclerosis. Arterioscler Thromb Vasc Biol, 2004. 24(9): p. 1567-77. [15256397]
14. Manolio, T.A., L.D. Brooks, and F.S. Collins. A HapMap harvest of insights into the genetics of common disease. J Clin Invest, 2008. 118(5): p. 1590-605. [18451988]
15. Manolio, T.A., F.S. Collins, N.J. Cox, et al. Finding the missing heritability of complex diseases. Nature, 2009. 461(7265): p. 747-53. [19812666]
16. Manolio, T.A. Genomewide association studies and assessment of the risk of disease. N Engl J Med, 2010. 363(2): p. 166-76. [20647212]
17. McCarthy, M.I., et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet, 2008. 9(5): p. 356-69. [18398418]
18. Pearson, T.A. and T.A. Manolio. How to interpret a genome-wide association study. JAMA, 2008. 299(11): p. 1335-44. [18349094]
19. Samani, N.J., J. Erdmann, A.S. Hall, et al. Genomewide association analysis of coronary artery disease. N Engl J Med, 2007. 357(5): p. 443-53. [17634449]
20. Teslovich, T.M., K. Musunuru, A.V. Smith, et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature, 2010. 466(7307): p. 707-13. [20686565]
21. NHGRI catalog of published GWA studies (http://genome.gov/GWASstudies)