how to interpret a genome-wide association...

12
current as of October 12, 2010. Online article and related content http://jama.ama-assn.org/cgi/content/full/299/11/1335 . 2008;299(11):1335-1344 (doi:10.1001/jama.299.11.1335) JAMA Thomas A. Pearson; Teri A. Manolio How to Interpret a Genome-wide Association Study Correction Contact me if this article is corrected. http://jama.ama-assn.org/cgi/content/full/jama;299/18/2150-a Correction is appended to this PDF and also available at Citations Contact me when this article is cited. This article has been cited 89 times. Topic collections Contact me when new articles are published in these topic areas. Statistics and Research Methods; Genetics; Genetics, Other http://pubs.ama-assn.org/misc/permissions.dtl [email protected] Permissions http://jama.com/subscribe Subscribe [email protected] Reprints/E-prints http://jamaarchives.com/alerts Email Alerts at University of Colorado - Denver HSL on October 12, 2010 www.jama.com Downloaded from

Upload: others

Post on 29-Sep-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: How to Interpret a Genome-wide Association Studypsych.colorado.edu/~carey/pdfFiles/GWAS_Pearson.pdf · Genome-wide association study Any study of genetic variation across the entire

current as of October 12, 2010. Online article and related content

http://jama.ama-assn.org/cgi/content/full/299/11/1335

. 2008;299(11):1335-1344 (doi:10.1001/jama.299.11.1335) JAMA

Thomas A. Pearson; Teri A. Manolio

How to Interpret a Genome-wide Association Study

Correction Contact me if this article is corrected.

http://jama.ama-assn.org/cgi/content/full/jama;299/18/2150-aCorrection is appended to this PDF and also available at

Citations Contact me when this article is cited. This article has been cited 89 times.

Topic collections Contact me when new articles are published in these topic areas.

Statistics and Research Methods; Genetics; Genetics, Other

http://pubs.ama-assn.org/misc/[email protected]

http://jama.com/subscribeSubscribe

[email protected]/E-prints

http://jamaarchives.com/alertsEmail Alerts

at University of Colorado - Denver HSL on October 12, 2010 www.jama.comDownloaded from

Page 2: How to Interpret a Genome-wide Association Studypsych.colorado.edu/~carey/pdfFiles/GWAS_Pearson.pdf · Genome-wide association study Any study of genetic variation across the entire

SPECIAL COMMUNICATION

How to Interpret a Genome-wideAssociation StudyThomas A. Pearson, MD, MPH, PhDTeri A. Manolio, MD, PhD

IN THE PAST 2 YEARS, THERE HAS BEENa dramatic increase in genomic dis-coveries involvng complex, non-Mendelian diseases, with nearly

100 loci for as many as 40 common dis-eases robustly identified and repli-cated in genome-wide association(GWA) studies (T.A.M.; unpublisheddata, 2008). These studies use high-throughput genotyping technologies toassay hundreds of thousands of themost common form of genetic variant,the single-nucleotide polymorphism(SNP), and relate these variants to dis-eases or health-related traits.1 Nearly 12million unique human SNPs have beenassigned a reference SNP (rs) numberin the National Center for Biotechnol-ogy Information’s dbSNP database2 andcharacterized as to specific alleles (al-ternate forms of the SNP), summary al-lele frequencies, and other genomic in-formation.3

The GWA approach is revolution-ary because it permits interrogation ofthe entire human genome at levels ofresolution previously unattainable, inthousands of unrelated individuals, un-constrained by prior hypotheses re-garding genetic associations with dis-ease.4 However, the GWA approach canalso be problematic because the mas-sive number of statistical tests per-formed presents an unprecedented po-tential for false-positive results, leadingto new stringency in acceptable levelsof statistical significance and require-ments for replication of findings.5

The genome-wide, nonhypothesis-driven nature of GWA studies repre-sents an important step beyond candi-

date gene studies, in which the high costof genotyping had limited the numberof variants assayed to several hundredat most. This required careful selec-tion of variants to be studied, oftenbased on imperfect understanding of thebiologic pathways relating genes to dis-ease.6 Many such associations failed tobe replicated in subsequent studies,7,8

leading to calls for all genetic associa-tion reports to include documented rep-lication of findings as a prerequisite forpublication.9,10

For non-Mendelian conditions,GWA studies also represent a valuableadvance over family-based linkage stud-ies, in which multiply affected fami-lies are arduously assembled and in-heritance patterns are related to severalhundred markers throughout the ge-nome. Family-based linkage studies, al-

though successful in identifying genesof large effect in Mendelian diseasessuch as cystic fibrosis and neurofibro-matosis, have had more limitedsuccess in common diseases like ath-erosclerosis and asthma.11 Major limi-tations of linkage studies are relativelylow power for complex disorders in-fluenced by multiple genes, and thelarge size of the chromosomal regionsshared among family members (oftencomprising hundreds of genes), inwhom it can be difficult to narrow the

Author Affiliations: Office of Population Genomics,National Human Genome Research Institute, Na-tional Institutes of Health, Bethesda, Maryland (DrsPearson and Manolio); and Clinical and TranslationalScience Institute, University of Rochester Medical Cen-ter, Rochester, New York (Dr Pearson).Corresponding Author: Teri A. Manolio, MD, PhD, Of-fice of Population Genomics, National Human GenomeResearchInstitute,31CenterDr,Room4B-09,MSC2154,Bethesda, MD 20892-2154 ([email protected]).

Genome-wide association (GWA) studies use high-throughput genotyping tech-nologies to assay hundreds of thousands of single-nucleotide polymor-phisms (SNPs) and relate them to clinical conditions and measurable traits.Since 2005, nearly 100 loci for as many as 40 common diseases and traits havebeen identified and replicated in GWA studies, many in genes not previouslysuspected of having a role in the disease under study, and some in genomicregions containing no known genes. GWA studies are an important advancein discovering genetic variants influencing disease but also have importantlimitations, including their potential for false-positive and false-negative re-sults and for biases related to selection of study participants and genotypingerrors. Although these studies are clearly many steps removed from actual clini-cal use, and specific applications of GWA findings in prevention and treat-ment are actively being pursued, at present these studies mainly represent avaluable discovery tool for examining genomic function and clarifying patho-physiologic mechanisms. This article describes the design, interpretation, ap-plication, and limitations of GWA studies for clinicians and scientists for whomthis evolving science may have great relevance.JAMA. 2008;299(11):1335-1344 www.jama.com

©2008 American Medical Association. All rights reserved. (Reprinted) JAMA, March 19, 2008—Vol 299, No. 11 1335

at University of Colorado - Denver HSL on October 12, 2010 www.jama.comDownloaded from

Page 3: How to Interpret a Genome-wide Association Studypsych.colorado.edu/~carey/pdfFiles/GWAS_Pearson.pdf · Genome-wide association study Any study of genetic variation across the entire

linkage signal sufficiently to identify acausative gene.

GWA studies build on the valuablelessons learned from candidate gene andfamily linkage studies, as well as the ex-panding knowledge of the relation-ships among SNP variants generated bythe International HapMap Project,12,13

to capture the great majority of com-mon genetic differences among indi-viduals and relate them to health anddisease. These studies not only repre-sent a powerful new tool for identifi-cation of genes influencing commondiseases, but also use new terminolo-gies (BOX 1), apply new models, andpresent new challenges in interpreta-tion. GWA studies rely on the “com-mon disease, common variant” hypoth-esis, which suggests that geneticinfluences on many common diseaseswill be at least partly attributable to alimited number of allelic variantspresent in more than 1% to 5% of thepopulation.14 Many important disease-causing variants may be rarer than thisand are unlikely to be detected with thisapproach.

Although GWA discovery studiesprovide important clues to genomicfunction and pathophysiologic mecha-nisms, they are as yet many steps re-moved from actual clinical applica-tion. Nonetheless, they have gainedconsiderable media attention and havethe potential for generating queries frompatients about whether to get tested forthe “new gene for disease X” based onthe latest report. In this article, we de-scribe the design, interpretation, appli-cation, and limitations of GWA stud-ies for clinicians and scientists for whomthis evolving science may have great rel-evance.

Overview of GWA StudiesA GWA study is defined by theNational Institutes of Health as astudy of common genetic variationacross the entire human genomedesigned to identify genetic associa-t ions with observable tra i t s . 1 5

Although family linkage studies andstudies comprising tens of thousandsof gene-based SNPs also assay genetic

variation across the genome,16 theNational Institutes of Health defini-tion requires sufficient density andselection of genetic markers to cap-ture a large proportion of the com-mon variants in the study population,measured in enough individuals toprovide sufficient power to detectvariants of modest effect.

The present discussion focuses onstudies attempting to assay at least100 000 SNPs selected to serve as prox-ies for the largest possible number ofSNPs.12 The typical GWA study has 4parts: (1) selection of a large numberof individuals with the disease or traitof interest and a suitable comparisongroup; (2) DNA isolation, genotyp-ing, and data review to ensure highgenotyping quality; (3) statistical testsfor associations between the SNPs pass-ing quality thresholds and the disease/trait; and (4) replication of identifiedassociations in an independent popu-lation sample or examination of func-tional implications experimentally.

Most of the roughly 100 GWA stud-ies published by the end of 2007 weredesigned to identify SNPs associatedwith common diseases. However, thetechnique can also be used to identifygenetic variants related to quantita-tive traits such as height17 or electro-cardiographic QT interval,18 and to rankthe relative importance of previouslyidentified susceptibility genes, such asAPOE*e4 in Alzheimer disease19 andCARD15 and IL23R in Crohn dis-ease.20

GWA studies can also demonstrategene-gene interactions, or modifica-tion of the association of one geneticvariant by another, as with GAB2 andAPOE in Alzheimer disease,21 and candetect high-risk haplotypes or combi-nations of multiple SNPs within a singlegene, as in exfoliation glaucoma22 andatrial fibrillation.23 These studies havealso been used to identify SNPs asso-ciated with gene expression, either asconfirmation of a phenotypic associa-tion, such as asthma and ORMDL3 ex-pression,24 or more globally.25 Thus,GWA studies have broader applica-tions than those solely involving dis-

covery of individual SNPs associatedwith discrete disease end points.

Study Designs Used in GWABy far the most frequently used GWAstudy design to date has been the case-control design, in which allele frequen-cies in patients with the disease of in-terest are compared to those in adisease-free comparison group. Thesestudies are often easier and less expen-sive to conduct than studies using otherdesigns, especially if sufficient num-bers of case and control participants canbe assembled rapidly. This design alsocarries the most assumptions, which ifnot met, can lead to substantial biasesand spurious associations (TABLE 1).The most important of these biases in-volve the selected, often unrepresen-tative nature of the study case partici-pants, who are typically sampled fromclinical sources and thus may not in-clude fatal, mild, or silent cases notcoming to clinical attention; and thelack of comparability of case and con-trol participants, who may differ in im-portant ways that could be related bothto genetic risk factors and to diseaseoutcomes.26

If well-established principles of epi-demiologic design are followed, case-control studies can produce valid re-sults that, especially for rare diseases,may not be obtainable in any other way.However, genetic association studiesusing case-control methodologies haveoften not always adhered to these prin-ciples. The often sharply abbreviated de-scriptions of case and control partici-pants and lack of comparison of keycharacteristics in GWA reports27 canmake evaluation of potential biases andreplication of findings quite difficult.28

The trio design includes the af-fected case participant and both of hisor her parents.29 Phenotypic assess-ment (classification of affected status)is performed only in the offspring andonly affected offspring are included, butgenotyping is performed in all 3 triomembers. The frequency with which anallele is transmitted to an affected off-spring from heterozygous parents isthen estimated.29 Under the null hy-

INTERPRETING A GENOME-WIDE ASSOCIATION STUDY

1336 JAMA, March 19, 2008—Vol 299, No. 11 (Reprinted) ©2008 American Medical Association. All rights reserved.

at University of Colorado - Denver HSL on October 12, 2010 www.jama.comDownloaded from

Page 4: How to Interpret a Genome-wide Association Studypsych.colorado.edu/~carey/pdfFiles/GWAS_Pearson.pdf · Genome-wide association study Any study of genetic variation across the entire

pothesis of no association with dis-ease, the transmission frequency foreach allele of a given SNP will be 50%,but alleles associated with the diseasewill be transmitted in excess to the af-fected case individual. Because the triodesign studies allele transmission fromparents to offspring, it is not suscep-tible to population stratification, or ge-

netic differences between case and con-trol participants unrelated to disease butdue to sampling them from popula-tions of different ancestry.30 A signifi-cant challenge of the trio design inGWA studies is its sensitivity to evensmall degrees of genotyping error,4,31

which can distort transmission propor-tions between parents and offspring, es-

pecially for uncommon alleles. There-fore, standards for genotyping qualityin trio studies may need to be morestringent than for other designs.

Cohort studies involve collectingextensive baseline information in alarge number of individuals who arethen observed to assess the incidenceof disease in subgroups defined by

Box 1. Terms Frequently Used in Genome-wide Association StudiesAlleles

Alternate forms of a gene or chromosomal locus that differ in DNAsequence

Candidate geneA gene believed to influence expression of complex phenotypesdue to known biological and/or physiological properties ofits products, or to its location near a region of association orlinkage

Copy number variantsStretches of genomic sequence of roughly 1 kb to 3 Mb in size thatare deleted or are duplicated in varying numbers

False discovery rate59,60

Proportion of significant associations that are actually false posi-tives

False-positive report probability61

Probability that the null hypothesis is true, given a statistically sig-nificant finding

Functional studiesInvestigations of the role or mechanism of a genetic variant in cau-sation of a disease or trait

Gene-environment interactionsModification of gene-disease associations in the presence of envi-ronmental factors

Genome-wide association studyAny study of genetic variation across the entire human genomedesigned to identify genetic association with observable traits orthe presence or absence of a disease, usually referring to studieswith genetic marker density of 100 000 or more to represent alarge proportion of variation in the human genome

Genotyping call rateProportion of samples or SNPs for which a specific allele SNP canbe reliably identified by a genotyping method

HaplotypeA group of specific alleles at neighboring genes or markers thattend to be inherited together

HapMap12,13

Genome-wide database of patterns of common human gen-etic sequence variation among multiple ancestral populationsamples

Hardy Weinberg equilibriumPopulation distribution of 2 alleles (with frequencies p and q)such that the distribution is stable from generation to generationand genotypes occur at frequencies of p2, 2pq, and q2 for themajor allele homozygote, heterozygote, and minor allele homozy-gote, respectively

Linkage disequilibriumAssociation between 2 alleles located near each other on a chromo-some, such that they are inherited together more frequently thanexpected by chance

Mendelian diseaseCondition caused almost entirely by a single major gene, such ascystic fibrosisorHuntington’sdisease, inwhichdiseaseismanifestedin only 1 (recessive) or 2 (dominant) of the 3 possible genotypegroups

Minor alleleThe allele of a biallelic polymorphism that is less frequent in thestudy population

Minor allele frequencyProportion of the less common of 2 alleles in a population (with2 alleles carried by each person at each autosomal locus) rang-ing from less than 1% to less than 50%

Modest effectAssociation between a gene variant and disease or trait that isstatistically significant but carries a small odds ratio (usually !1.5)

Non-Mendelian disease (also “common” or “complex” disease)Condition influenced by multiple genes and environmental fac-tors and not showing Mendelian inheritance patterns

Nonsynonymous SNPApolymorphismthat results inachange intheaminoacidsequenceof a protein (and therefore may affect the function of the protein)

PlatformArrays or chips on which high-throughput genotyping is performed

PolymorphicA gene or site with multiple allelic forms. The term polymorphismusually implies a minor allele frequency of at least 1%

Population attributable riskProportion of a disease or trait in the population that is due to aspecific cause, such as a genetic variant

Population stratification (also “population structure”)A form of confounding in genetic association studies caused by ge-neticdifferencesbetweencasesandcontrolsunrelatedtodiseasebutdue to sampling them from populations of different ancestries

PowerA statistical term for the probability of identifying a differencebetween 2 groups in a study when a difference truly exists

Single-nucleotide polymorphismMost common form of genetic variation in the genome, in whicha single-base substitution has created 2 forms of a DNA se-quence that differ by a single nucleotide

Tag SNPA readily measured SNP that is in strong linkage disequilibriumwith multiple other SNPs so that it can serve as a proxy for theseSNPs on large-scale genotyping platforms

TrioGeneticstudydesignincludinganaffectedoffspringandbothparents

Abbreviation: SNP, single-nucleotide polymorphism.

INTERPRETING A GENOME-WIDE ASSOCIATION STUDY

©2008 American Medical Association. All rights reserved. (Reprinted) JAMA, March 19, 2008—Vol 299, No. 11 1337

at University of Colorado - Denver HSL on October 12, 2010 www.jama.comDownloaded from

Page 5: How to Interpret a Genome-wide Association Studypsych.colorado.edu/~carey/pdfFiles/GWAS_Pearson.pdf · Genome-wide association study Any study of genetic variation across the entire

genetic variants. Although cohortstudies are typically more expensiveand take longer to conduct than case-control studies, they often includestudy participants who are more rep-resentative than clinical series of thepopulation from which they aredrawn, and they typically include avast array of health-related character-istics and exposures for which geneticassociations can be sought.17,18 Forthese reasons, genome-wide genotyp-ing has recently been added to cohortstudies such as the Framingham HeartStudy32 and the Women’s HealthStudy.33

Many GWA studies use multistage de-signs to reduce the number of false-positive results while minimizing thenumber of costly genome-wide scansperformed and retaining statisticalpower.4 Genome-wide scans are typi-cally performed on an initial group ofcase and control participants and thena smaller number of associated SNPs isreplicated in a second or third group ofcase and control participants (TABLE 2).Some studies begin with small num-bers of participants in the initial scan butcarry forward large numbers of SNPs to

minimize false-negative results.34 Otherstudies begin with more participants butcarry forward a smaller proportion of as-sociated SNPs.35 Optimal proportions ofstudy participants and SNPs in eachphase have yet to be determined,36 butcarrying forward a small proportion(!5%) of stage 1 SNPs will often meanlimiting the associations ultimately iden-tified to those having a relatively largeeffect.37

Selection of Study ParticipantsMany genetic studies, whether GWAor otherwise, focus on case partici-pants more likely to have a geneticbasis for their disease, such as early-onset cases or those with multipleaffected relatives. Misclassification ofcase participants can markedly reducestudy power and bias study resultstoward no association, particularlywhen large numbers of unaffectedindividuals are misclassified asaffected. For diseases that are difficultto diagnose reliably, ensuring thatcases are truly affected (as by invasivetesting or imaging), is probably moreimportant than ensuring generaliz-ability, although the limitations on

diagnostic reliability and generaliz-ability should be clearly described sothat clinicians can judge the relevanceto their patients.

The control participants should bedrawn from the same population as thecase participants and should be at riskto develop the disease and be detectedin the study. Inclusion of women as con-trols in genetic association studies of dis-eases limited to men, for example, isproblematic in that this approach addsindividuals to the control group who hadno chance of developing the disease (butmight have done so had they also in-herited a Y chromosome), thus mixingthe controls with possible latent cases.This artificially reduces the differencesin allele frequencies between cases andcontrols and limits the ability of thestudy to detect a true difference (ie, re-duces study power).

If the disease is common, such ascoronary heart disease or hyperten-sion in the United States, efforts shouldbe made to ensure that the controls aretruly disease free. Some studies ad-dress this by using super-controls orpersons at high risk but without evenearly evidence of disease, such as per-

Table 1. Study Designs Used in Genome-wide Association StudiesCase-Control Cohort Trio

Assumptions Case and control participants are drawnfrom the same population

Case participants are representativeof all cases of the disease,or limitations on diagnostic specificityand representativeness areclearly specified

Genomic and epidemiologic data arecollected similarly in cases andcontrols

Differences in allele frequencies relate tothe outcome of interest rather thandifferences in background populationbetween cases and controls

Participants under study are morerepresentative of the populationfrom which they are drawn

Diseases and traits are ascertainedsimilarly in individuals with andwithout the gene variant

Disease-related alleles are transmitted inexcess of 50% to affected offspringfrom heterozygous parents

Advantages Short time frameLarge numbers of case and control

participants can be assembledOptimal epidemiologic design for

studying rare diseases

Cases are incident (developing duringobservation) and free of survival bias

Direct measure of riskFewer biases than case-control studiesContinuum of health-related measures

available in population samples notselected for presence of disease

Controls for population structure;immune to population stratification

Allows checks for Mendelian inheritancepatterns in genotyping quality control

Logistically simpler for studies ofchildren’s conditions

Does not require phenotyping of parentsDisadvantages Prone to a number of biases including

population stratificationCases are usually prevalent cases,

may exclude fatal or short episodes,or mild or silent cases

Overestimate relative risk for commondiseases

Large sample size needed forgenotyping if incidence is low

Expensive and lengthy follow-upExisting consent may be insufficient for

GWA genotyping or data sharingRequires variation in trait being studiedPoorly suited for studying rare diseases

May be difficult to assemble bothparents and offspring, especially indisorders with older ages of onset

Highly sensitive to genotyping error

INTERPRETING A GENOME-WIDE ASSOCIATION STUDY

1338 JAMA, March 19, 2008—Vol 299, No. 11 (Reprinted) ©2008 American Medical Association. All rights reserved.

at University of Colorado - Denver HSL on October 12, 2010 www.jama.comDownloaded from

Page 6: How to Interpret a Genome-wide Association Studypsych.colorado.edu/~carey/pdfFiles/GWAS_Pearson.pdf · Genome-wide association study Any study of genetic variation across the entire

sons with diabetes of long duration butwithout microalbuminuria in a studyof diabetic nephropathy.38 The suc-cess of recent GWA studies using con-trol groups of questionable represen-tativeness due to volunteer bias, suchas the blood donor cohort in the Well-come Trust Case-Control Consortium,39

suggests that initial identification ofSNPs associated with disease may be ro-bust to these biases, especially givensubsequent evidence of replication ofthese associations in studies using moretraditional control groups.40-42

Of more concern may be the risk offalse-negative findings, as many biasestend to reduce the magnitude of ob-served associations toward the null. Useof convenience controls such as blooddonors, however, may also be problem-atic in examining potential modifica-tion of genetic associations by environ-mental exposures and socioculturalfactors, and in the identification of lessstrongly associated SNPs.

Akeycomponent inarticles reportingresults in the epidemiology literatureof observational study is an initialtable comparing relevant characteristicsof thosewithandwithoutdisease, allow-ing assessment of comparability andgeneralizability of the 2 groups. Suchcomparisons are infrequent in GWAstudies,28 but theyare importantbecausecommondiseasesaretypicallyinfluencedbymultipleenvironmental (aswellasge-netic) factors. Important differencesshould be adjusted for in the analysis ifpossible, to avoid the risk of identifyinggenetic associationsnotwith thediseaseof interestbutwithaconfoundingfactor,such as smoking43 or obesity.44

Confoundingduetopopulationstrati-fication(alsocalledpopulationstructure)hasbeencitedasamajor threat to theva-lidity of genetic association studies, butitstrueimportanceisamatterofdebate.45,46

Whenvariationsoccurinallelefrequencybetween population subgroups, such asthosedefinedbyethnicityorgeographicorigin, that in turn differ in their risk fordisease, GWA studies may then falselyidentifythesubgroup-associatedgenesasrelated todisease.30 Populationstructureshouldbeassessedandreported inGWA

studies, typically by examining the dis-tributionof test statisticsgenerated fromthe thousands of association tests per-formed(eg,the"2 test)andassessingtheirdeviationfromthenulldistribution(thatexpectedunder thenullhypothesisofnoSNPassociatedwiththetrait)inaquantile-quantile or “Q-Q,” plot (FIGURE 1). Inthese plots, observed association statis-

tics or calculated P values for each SNParerankedinorder fromsmallest to larg-est and plotted against the values ex-pectedhadtheybeensampledfromadis-tribution of known form (such as the "2

distribution).39 Deviations from the di-agonal identity line suggest that eitherthe assumed distribution is incorrect orthat the sample contains values arising

Table 2. Examples of Multistage Designs in Genome-wide Association Studiesa

Stage

3-Stage Studyb 4-Stage Studyc

Case Participants/Control Participants SNPs Analyzed

Case Participants/Control Participants SNPs Analyzed

1 400/400 500 000 2000/2000 100 0002 4000/4000 25 000 2000/2000 10003 20 000/20 000 25 2000/2000 204 2000/2000 5

Abbreviation: SNP, single-nucleotide polymorphism.aBased on hypothetical data.bFive SNPs associated with disease.cTwo SNPs associated with disease.

Figure 1. Hypothetical Quantile-Quantile Plots in Genome-wide Association Studies

0

5

10

15

20

25

5 10 15 20 25

Obs

erve

d !2

Obs

erve

d !2

Expected !2 Expected !2

0

5

10

15

20

25

5 10 15 20 25

A BBefore-and-after exclusion of most strongly associated locus

Before-and-after adjustment for population stratification

All SNPsExclusion of moststrongly associated locus

Unadjusted

!2 Statistics

Adjusted

The Q-Q plot is used to assess the number and magnitude of observed associations between genotyped single-nucleotide polymorphisms (SNPs) and the disease or trait under study, compared to the association statistics ex-pected under the null hypothesis of no association.39 Observed association statistics (eg, "2 or t statistics) or !log10

P values calculated from them, are ranked in order from smallest to largest on the y-axis and plotted against thedistribution that would be expected under the null hypothesis of no association on the x-axis. Deviations from theidentity line suggest either that the assumed distribution is incorrect or that the sample contains values arising insome other manner, as by a true association.39 A, Observed "2 statistics of all polymorphic SNPs (dark blue) in ahypothetical genome-wide association study of a complex disease vs. the expected null distribution (black line).The sharp deviation above an expected "2 value of approximately 8 could be due to a strong association of thedisease with SNPs in a heavily genotyped region such as the major histocompatibility locus (MHC) on chromo-some 6p21 in multiple sclerosis or rheumatoid arthritis.70 Exclusion of SNPs from such a locus may leave a residualupward deviation (light blue) identifying more associated SNPs with higher observed "2 values (exceeding ap-proximately 17) than expected under the null hypothesis. B, Observed (dark purple) vs expected (black line) "2

statistics for a hypothetical genome-wide association study of a complex disease. Deviation from the expecteddistribution is observed above an expected "2 of approximately 5. Inflation of observed statistics due to related-ness and potential population structure can be estimated by the method of genomic control.49 Correction for thisinflation by simple division reduces the unadjusted "2 statistics (dark purple) to the adjusted levels (light purple),showing deviation only above an expected "2 of approximately 15. The region between expected "2 of approxi-mately 5 to approximately 15 is suggestive of broad differences in allele frequencies that are more likely due topopulation structure than disease susceptibility genes.

INTERPRETING A GENOME-WIDE ASSOCIATION STUDY

©2008 American Medical Association. All rights reserved. (Reprinted) JAMA, March 19, 2008—Vol 299, No. 11 1339

at University of Colorado - Denver HSL on October 12, 2010 www.jama.comDownloaded from

Page 7: How to Interpret a Genome-wide Association Studypsych.colorado.edu/~carey/pdfFiles/GWAS_Pearson.pdf · Genome-wide association study Any study of genetic variation across the entire

in some other manner, as by a true as-sociation.39

Since the underlying assumption inGWA studies is that the vast majority ofassayedSNPsarenotassociatedwith thetrait, strongdeviations fromthenull sug-gest either a very highly associated andheavilygenotyped locus(Figure1,A),orsignificantdifferencesinpopulationstruc-ture(Figure1,B).Severaleffectivestatis-tical methods are available to correct forpopulation structure and are a standardcomponentofrigorousGWAanalyses.28,30

Genotyping and Quality Controlin GWA StudiesGWA studies rely on the typically strongassociations among SNPs located neareach other on a chromosome, which tend

to be inherited together more often thanexpected by chance.50 This nonrandomassociation is called linkage disequilib-rium; alleles of SNPs in high linkage dis-equilibrium are almost always inher-ited together and can serve as proxies foreach other. Their correlation with eachother in the population is measured bythe r2 statistic, which is the proportionof variation of one SNP explained by theother, and ranges from 0 (no associa-tion) to 1 (perfect correlation).

Genomic coverage of GWA genotyp-ing platforms (arrays or chips on whichgenotyping is performed) is often esti-mated by the percent of common SNPshaving an r2 of 0.8 or greater with atleast 1 SNP on the platform.13 Geno-typing platforms comprising 500 000 to

1 000 000 SNPs have been estimated tocapture 67% to 89% of common SNPvariation in populations of Europeanand Asian ancestry and 46% to 66% ofvariation in individuals of recent Afri-can ancestry.13 Higher density plat-forms now also include probes for copynumber variants that are not well taggedby SNPs. Copy number variants, inwhich stretches of genomic sequenceare deleted or are duplicated in vary-ing numbers, have gained increasing at-tention because of their apparent ubi-quity and potential dosage effect ongene expression.51 Newer genotypingplatforms are increasingly being fo-cused on capturing copy number vari-ants, but other structural variants suchas insertions, deletions, and inver-sions, remain difficult to assay.52

GWA studies frequently identify as-sociations with multiple SNPs in a chro-mosomal region and display the asso-ciation statistics by their genomiclocation on a portion of a chromosome(FIGURE 2). For ease of display, asso-ciation statistics are typically shown asthe - log10 of the P value (the probabil-ity of the observed association arisingby chance alone), so that P=.01 wouldbe plotted as “2” on the y-axis andP=10!7 as “7.” Such displays also of-ten plot a matrix of r2 values for eachpair of SNPs in the region, with largerr2 values more intensely shaded. Theseplots can be used to identify linkage dis-equilibrium blocks containing SNPs as-sociated with disease, allowing estima-tion of the independence of the SNPassociations observed.53

Genotyping errors, especially if oc-curring differentially between cases andcontrols, are an important cause of spu-rious associations and must be dili-gently sought and corrected.54 A num-ber of quality control features shouldbe applied both on a per-sample and aper-SNP basis. Checks on sample iden-tity to avoid sample mix-ups should bedescribed and a minimum rate of suc-cessfully genotyped SNPs per sample(usual ly 80%-90% of SNPs at -tempted) should be reported. Oncesamples failing these thresholds are re-moved, individual SNPs across the re-

Figure 2. Associations in the IL23R Gene Region Identified by a Genome-wide AssociationStudy of Inflammatory Bowel Disease

Genome-wideassociationstudies frequently identifyassociationswithmanyhighlycorrelatedsingle-nucleotidepoly-morphisms (SNPs) in a chromosomal region, due in part to linkage disequilibrium, among the SNPs. This can makeit difficult to determine which SNP within a group is likely to be the causative or functional variant. A, Genomic lo-cations of 2 genes, the interleukin 23 receptor (IL23R) and the interleukin 12 receptor, beta-2 (IL12Rb2), and a hy-pothetical protein, NM_001013674, between positions 62700000 and 67580000 of the short arm of chromosome1 at region 1p31, are shown. B, The !log10 P values for association with inflammatory bowel disease are plotted foreach SNP genotyped in the region; those reaching a prespecified value of !log10 of 7 or greater are presumed toshow association with disease. Several strong associations, at !log10 P values or greater, are seen in the region justtelomeric of position approximately 67400000 and extending just centromeric of position approximate 67450000.C, Pairwise linkage disequilibrium estimates between SNPs (measured as r2) are plotted for the region. Higher r2 val-ues are indicated by darker shading. The region contains 4 “triangles” or “blocks” of linkage disequilibrium, 2 oneither side of position 67400000 in the IL23R gene, another in the hypothetical protein telomeric of IL23R, and afourth in the IL12RB2 gene at the centromeric end of the region. The 2 IL23R linkage disequilibrium regions eachcontain SNPs associated with inflammatory bowel disease, while the IL12RB2 region does not. Reproduced withpermission from Duerr et al.53

INTERPRETING A GENOME-WIDE ASSOCIATION STUDY

1340 JAMA, March 19, 2008—Vol 299, No. 11 (Reprinted) ©2008 American Medical Association. All rights reserved.

at University of Colorado - Denver HSL on October 12, 2010 www.jama.comDownloaded from

Page 8: How to Interpret a Genome-wide Association Studypsych.colorado.edu/~carey/pdfFiles/GWAS_Pearson.pdf · Genome-wide association study Any study of genetic variation across the entire

maining samples are subjected to fur-ther checks or filters for probablegenotyping errors, including: (1) theproportion of samples for which a SNPcan be measured (the SNP call rate,typically #95%); (2) the minor allelefrequency (often #1%, as rarer SNPsare difficult to measure reliably; (3) se-vere violations of Hardy-Weinberg equi-librium; (4) Mendelian inheritance er-rors in trio studies; and (5) concordancerates in duplicate samples (typically#99.5%).

Additional checks on genotypingquality should include careful visual in-spection of genotype cluster plots, orintensity values generated by the geno-typing assay to ensure that the stron-gest associations do not merely reflectgenotyping artifact.28,39 Genotyping themost strongly associated SNPs shouldalso be confirmed using a differentmethod.28 Associations with any known“positive controls,” such as TCF7L2 intype 2 diabetes mellitus55 or HLA-DRB1 in rheumatoid arthritis,47 shouldbe reported to increase confidence inthe consistency of findings with priorreports.

Analysis and Presentationof GWA ResultsAssociations with the 2 alleles of eachSNP are tested in a relatively straight-forward manner by comparing the fre-quency of each allele in cases and con-trols (TABLE 3). Because each individualcarries 2 copies of each autosomal SNP,the frequency of each of 3 possiblegenotypes can also be compared(Table 3). Exploratory analyses mayalso include testing of different ge-netic models (dominant, recessive, or

additive), although additive models, inwhich each copy of the allele is as-sumed to increase risk by the sameamount, tend to be the most common(T.A.M.; unpublished data, 2008). Oddsratios of disease associated with the riskallele or genotype(s) can then be cal-culated and are typically modest, of-ten in the range of 1.2 to 1.3. Manystudies also calculate population at-tributable risk, classically defined as theproportion of disease in the popula-tion associated with a given risk factor(in this case, a genetic variant).57

Such estimates are nearly always in-flated because odds ratios overesti-mate relative risks (especially for com-mon diseases58) needed for populationattributable risk calculations, and be-cause odds ratios and allele frequen-cies in published reports have wide con-

fidence intervals so that those selectedby exceeding a specified threshold forstatistical significance tend to be bi-ased upwards, an effect of ascertain-ment known as the “winner’s curse.”59

This exaggerated initial estimate of theodds ratio often leads to replicationstudies that lack sufficient sample sizeand power to replicate the associationbecause larger samples are needed to de-tect smaller odds ratios.

Complexity in analysis emerges dueto the multiple testing carried out inGWA studies, in that the associationtests shown in Table 2 are repeated foreach of the 100 000 to more than 1 mil-lion SNPs assayed (FIGURE 3). At theconventional P! .05 level of signifi-cance, an association study of 1 mil-lion SNPs will show 50 000 SNPs to be“associated” with disease, almost all

Figure 3. Genome-wide Association Findings in Rheumatoid Arthritis

12

!lo

g 10

P v

alue 8

10

6

4

210

9

11

7

5

3

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

17 18 19 20 21 22

X

PTPN22MHC

TRAF1–C5

Genome-wide association studies assume a priori hypotheses about candidate genes or regions that might beassociated with disease; rather, they test single-nucleotide polymorphisms (SNPs) throughout the genome forpossible evidence of genetic susceptibility. Associations plotted as !log10 P values for a genome-wide associa-tion study in 1522 cases with rheumatoid arthritis and 1850 controls, showing single data points for SNPs withP !10!4 (lower horizontal red line) for 22 autosomes and the X chromosome. The predefined level of signifi-cance, at 5$10!8 is shown with a horizontal blue line. SNPs at PTPN22 on chromosome 1, the major histo-compatibility comples (MHC) on chromosome 6, and the TRAF1-C5 locus on chromosome 9 exceed this thresh-old. Reproduced with permission from Plenge et al.47

Table 3. Association of Alleles and Genotypes of rs6983267 on Chromosome 8q24 With Colorectal Cancera

Number and Frequency of rs6983267 Allelesin Colorectal Cancer

Number and Frequency of rs6983267Genotypes in Colorectal Cancer

C T "2 (1df) P Value OR CC CT TT "2 (2df) P Value OR ORCases 875 (56.5) 675 (43.5) 24.8 6.3 $ 10!7 1.35b 250 (32.3) 375 (48.4) 150 (19.4) 24.5 4.7 $ 10!6 1.33c 1.81d

Controls 1860 (48.9) 1940 (51.1) 460 (24.2) 940 (49.4) 500 (26.3)Abbreviation: OR, odds ratio.aData are hypothetical; adapted from Tomlinson et al.56

bDenotes allelic odds ratio.cDenotes heterozygote odds ratio.dDenotes homozygote odds ratio.

INTERPRETING A GENOME-WIDE ASSOCIATION STUDY

©2008 American Medical Association. All rights reserved. (Reprinted) JAMA, March 19, 2008—Vol 299, No. 11 1341

at University of Colorado - Denver HSL on October 12, 2010 www.jama.comDownloaded from

Page 9: How to Interpret a Genome-wide Association Studypsych.colorado.edu/~carey/pdfFiles/GWAS_Pearson.pdf · Genome-wide association study Any study of genetic variation across the entire

falsely positive and due to chance alone.The most common manner of dealingwith this problem is to reduce the false-positive rate by applying the Bonfer-roni correction, in which the conven-tional P value is divided by the numberof tests performed.60 A 1 million SNPsurvey would thus use a threshold ofP! .05/106, or 5$10!8, to identify as-sociations unlikely to have occurred bychance. This correction has been criti-cized as overly conservative because itassumes independent associations ofeach SNP with disease even though in-dividual SNPs are known to be corre-lated to some degree due to linkage dis-equilibrium.

Other approaches have been pro-posed, including estimation of the falsediscovery rate or proportion of signifi-cant associations that are actually falsepositive associations,61,62 false-positive re-port probability, or probability that thenull hypothesis is true given a statisti-cally significant finding,63 and estima-

tion of Bayes factors that incorporate theprior probability of association based oncharacteristics of the disease or the spe-cific SNP.39 To date, Bonferroni correc-tion has generally been the most com-monly used correction for multiplecomparisons in GWA reports (T.A.M.;unpublished data, 2008).

Replication and FunctionalStudiesGiven the major challenge of separat-ing the many false-positive associationsfrom the few true-positive associationswith disease in GWA studies, an impor-tant strategy has been replication of re-sults in independent samples.28 This istypically included in a single GWA re-port as part of a multistage design34,35 ormay be reported separately.39,64 Consen-sus criteria for replication have recentlybeen published and include study of thesame or very similar phenotype andpopulation, and demonstration of a simi-lar magnitude of effect and significance

(in the same genetic model and same di-rection) for the same SNP and the sameallele as the initial report.28 Replicationis usually first attempted in studies assimilar as possible to the initial report,but then may be extended to related phe-notypes (such as fat mass in addition toobesity44), different populations (such asWest Africans in addition to Iceland-ers65), or different study designs53 to re-fine and extend the initial findings andincrease confidence in verity.

Lack of reproducibility of genetic as-sociations has been frequently ob-served and has been varyingly attrib-uted to population stratification,phenotype differences, selection bi-ases, genotyping errors, and other fac-tors.28,66 At present, the best way of re-solving these inconsistencies appears tobe additional replication studies withlarger sample sizes, although this maynot be feasible for rare conditions or forassociations identified in unique popu-lations.28

Identification of a robustly replicat-ing SNP-disease association is a cru-cial first step in identifying disease-causing genetic variants and developingsuitable treatments, but it is only a firststep. Association studies essentiallyidentify a genomic location related todisease but provide little informationon gene function unless SNPs with pre-dictable effects on gene expression orthe transcribed product happened to beidentified. Few of the associations iden-tified to date have involved genes pre-viously suspected of being related to thedisease under study, and some havebeen in genomic locations harboring noknown genes.27,67 Examination ofknown SNPs in high linkage disequi-librium with the associated SNP mayidentify variants with plausible bio-logic effects, or sequencing of a suit-able surrounding interval may be un-dertaken to identify rarer variants withmore obvious functional implica-tions. Tissue samples or cell lines canbe examined for expression of the genevariant. Other functional studies mayinclude genetic manipulations in cell oranimal models, such as knockouts orknock-ins.68

Box 2. Ten Basic Questions to Ask About a Genome-wide Association StudyReporta

1. Are the cases defined clearly and reliably so that they can be compared withpatients typically seen in clinical practice?

2. Are case and control participants demonstrated to be comparable to each otheron important characteristics that might also be related to genetic variation and tothe disease?

3. Was the study of sufficient size to detect modest odds ratios or relative risks(1.3-1.5)?

4. Was the genotyping platform of sufficient density to capture a large proportionof the variation in the population studied?

5. Were appropriate quality control measures applied to genotyping assays, in-cluding visual inspection of cluster plots and replication on an independent geno-typing platform?

6. Did the study reliably detect associations with previously reported and repli-cated variants (known positives)?

7. Were stringent corrections applied for the many thousands of statistical testsperformed in defining the P value for significant associations?

8. Were the results replicated in independent population samples?

9. Were the replication samples comparable in geographic origin and phenotypedefinition, and if not, did the differences extend the applicability of the findings?

10. Was evidence provided for a functional role for the gene polymorphism iden-tified?

aFor a more detailed description of interpretation of genome-wide association studies, seeNCI/NHGRI Working Group on Replication in Association Studies.28

INTERPRETING A GENOME-WIDE ASSOCIATION STUDY

1342 JAMA, March 19, 2008—Vol 299, No. 11 (Reprinted) ©2008 American Medical Association. All rights reserved.

at University of Colorado - Denver HSL on October 12, 2010 www.jama.comDownloaded from

Page 10: How to Interpret a Genome-wide Association Studypsych.colorado.edu/~carey/pdfFiles/GWAS_Pearson.pdf · Genome-wide association study Any study of genetic variation across the entire

Limitations of GWA StudiesThe potential for false-positive results,lack of information on gene function, in-sensitivity to rare variants and struc-tural variants, requirement for largesample sizes, and possible biases due tocase and control selection and genotyp-ing errors, are important limitations ofGWA studies. The often limited infor-mation available about environmentalexposures and other non-genetic risk fac-tors in GWA studies will make it diffi-cult to identify gene-environment inter-actions or modification of gene-diseaseassociations in the presence of environ-mental factors. Clinicians and scien-tists should understand the unique as-pects of these studies and be able to assessand interpret GWA results for them-selves and their patients. Ten basic ques-tions to ask about GWA studies, manyof which also apply generically to asso-ciation studies of nongenetic risk fac-tors, are outlined in BOX 2. Most of thesequestions should be answered in the af-firmative for a reliable report; how-ever, many GWA reports lack suffi-cient detail to assess them.28

Many of the design and analysis fea-tures of GWA studies deal with mini-mizing the false-positive rates whilemaintaining power to identify true-positive associations. These same ef-forts to reduce false-positive results, how-ever, may result in overlooking a trueassociation, especially if only a smallnumber of SNPs are carried over from theinitial scan into replication studies. Themost robust findings, ie, those that “sur-vive” multiple rounds of replication, areoften not the most statistically signifi-cant associations in the initial scan, andmay not even be in the top few hundredassociations.69,70 Another cause of false-negative results is the lack of the ge-netic variant of relevance on the geno-typing platform, or lack of variation inthat SNP in the population under study.As the number of SNPs and diversity ofpopulations represented on genotypingplatforms increase, this should becomeless of a problem.

An important question generated bythese early GWA studies relates to thesmall proportion of heritability, or fa-

milial clustering explained by the ge-netic variants identified to date. Most ofthese variants have very modest effectson disease risk, increasing it by only 20%to 50%, and explaining only a small frac-tion of population risk or total esti-mated heritability for most condi-tions.39,71 Might the rest of the geneticinfluence reside in a long “tail” of com-mon SNPs with very small oddsratios, in copy number variants or otherstructural variants, rarer variants of largereffect, or interactions among commonvariants? Or has familial clustering dueto genetic factors been overestimated andimportant environmental influences,either acting alone or in combinationwith genetic variants, been overlooked?This remains to be determined, but it isimportant to realize that even small oddsratios or rare variants can suggest im-portant therapeutic strategies such as thedevelopment of HMG-CoA reductase in-hibitors arising from identification ofLDL-receptor mutations in familial hy-percholesterolemia.72

Clinical Applicationsof GWA FindingsDespite the considerable media atten-tion that GWA reports frequently re-ceive, these studies are clearly manysteps removed from actual clinical ap-plication. The primary use for GWAstudies for the foreseeable future islikely to be in investigation of biologicpathways of disease causation and nor-mal health and development. This is notto suggest that some early successesmay not occur in the near future,through rapid development of treat-ment strategies such as inhibitors ofcomplement activation in age-relatedmacular degeneration.73 Use of GWAfindings in screening for disease risk,while beginning to be marketed com-mercially, is more problematic. Al-though obtaining the latest “gene test”may be alluring to a technology-focused society, evidence is needed thatsuch screening adds information toknown risk factors (such as age, obe-sity, and family history for diabetes),that effective interventions are avail-able, that improved outcomes justify the

associated costs, and that obtaining thisinformation does not have serious ad-verse consequences for patients andtheir families. Such evidence is likelyto be some ways off, but the initial burstof discovery generated by GWA scanshas now mandated a concerted effortto search for these answers.

Author Contributions: Dr Manolio had full access toall of the data in the study and takes responsibility forthe integrity of the data and the accuracy of the dataanalysis.Financial Disclosures: None reported.Funding/Support: This work supported in part by a Clini-cal and Translational Science Award (RR024160) fromthe National Center for Research Resources, NationalInstitutes of Health, to the University of Rochester.Additional Contributions: The authors wish to ac-knowledge the valuable comments of Francis Col-lins, MD, PhD, National Human Genome Research In-stitute, National Institutes of Health; and StephenChanock, MD, National Cancer Institute, National In-stitutes of Health; and the contributions of MaureenMarcello, Clinical and Translational Science Institute,University of Rochester, in the preparation of this ar-ticle. None of these individuals received compensa-tion for their work in association with this article.

REFERENCES

1. Christensen K, Murray JC. What genome-wide as-sociation studies can do for medicine. N Engl J Med.2007;356(11):1094-1097.2. National Center for Biotechnology Information, Na-tional Library of Medicine. Database of Single Nucleo-tide Polymorphisms. http://www.ncbi.nlm.nih.gov/SNP/. Accessed February 14, 2008.3. Wheeler DL, Barrett T, Benson DA, et al. Data-base resources of the National Center for Biotechnol-ogy Information. Nucleic Acids Res. 2008;36(databaseissue):D13-D21.4. Hirschhorn JN, Daly MJ. Genome-wide associa-tion studies for common diseases and complex traits.Nat Rev Genet. 2005;6(2):95-108.5. Hunter DJ, Kraft P. Drinking from the fire hose–statistical issues in genomewide association studies.N Engl J Med. 2007;357(5):436-439.6. Tabor HK, Risch NJ, Myers RM. Candidate-geneapproaches for studying complex genetic traits. NatRev Genet. 2002;3(5):391-397.7. Hirschhorn JN, Lohmueller K, Byrne E, HirschhornK. A comprehensive review of genetic associationstudies. Genet Med. 2002;4(2):45-61.8. Morgan TM, Krumholz HM, Lifton RP, Spertus JA.Nonvalidation of reported genetic risk factors for acutecoronary syndrome in a large-scale replication study.JAMA. 2007;297(14):1551-1561.9. Todd JA. Statistical false positive or true diseasepathway? Nat Genet. 2006;38(7):731-733.10. Patterson M, Cardon L. Replication publication.PLoS Biol. 2005;3(9):e327.11. Altmuller J, Palmer LJ, Fischer G, Scherb H, WjstM. Genome-wide scans of complex human diseases.Am J Hum Genet. 2001;69(5):936-950.12. International HapMap Consortium. A haplotypemap of the human genome. Nature. 2005;437(7063):1299-1320.13. Frazer KA, Ballinger DG, Cox DR, et al; Interna-tional HapMap Consortium. A second generation hu-man haplotype map of over 3.1 million SNPs. Nature.2007;449(7164):851-861.14. Collins FS, Guyer MS, Chakravarti A. Variationson a theme: cataloging human DNA sequencevariation. Science. 1997;278(5343):1580-1581.

INTERPRETING A GENOME-WIDE ASSOCIATION STUDY

©2008 American Medical Association. All rights reserved. (Reprinted) JAMA, March 19, 2008—Vol 299, No. 11 1343

at University of Colorado - Denver HSL on October 12, 2010 www.jama.comDownloaded from

Page 11: How to Interpret a Genome-wide Association Studypsych.colorado.edu/~carey/pdfFiles/GWAS_Pearson.pdf · Genome-wide association study Any study of genetic variation across the entire

15. National Institutes of Health. Policy for sharingof data obtained in NIH supported or conducted ge-nome-wide association studies (GWAS). Federal Reg-ist. 2007;72(166):49290-49297. http://www.grants.nih.gov/grants/guide/notice-files/NOT-OD-07-088.html. Accessed February 14, 2007.16. Hampe J, Franke A, Rosenstiel P, et al. A genome-wide association scan of nonsynonymous SNPs iden-tifies a susceptibility variant for Crohn disease inATG16L1. Nat Genet. 2007;39(2):207-211.17. Weedon MN, Lettre G, Freathy RM, et al. A com-mon variant of HMGA2 is associated with adult andchildhood height in the general population. Nat Genet.2007;39(10):1245-1250.18. Arking DE, Pfeufer A, Post W, et al. A commongenetic variant in the NOS1 regulator NOS1AP modu-lates cardiac repolarization (QT interval). Nat Genet.2006;38(6):644-651.19. Coon KD, Myers AJ, Craig DW, et al. A high-density whole-genome association study reveals thatAPOE is the major susceptibility gene for sporadic late-onset Alzheimer’s disease. J Clin Psychiatry. 2007;68(4):613-618.20. Rioux JD, Xavier RJ, Taylor KD, et al. Genome-wide association study identifies new susceptibility locifor Crohn disease and implicates autophagy in dis-ease pathogenesis. Nat Genet. 2007;39(5):596-604.21. Reiman EM, Webster JA, Myers AJ, et al. GAB2alleles modify Alzheimer’s risk in APOE !4 carriers.Neuron. 2001;54(5):713-720.22. Thorleifsson G, Magnusson KP, Sulem P, et al.Common sequence variants in the LOXL1 gene con-fer susceptibility to exfoliation glaucoma. Science. 2007;317(5843):1397-1400.23. Gudbjartsson DF, Arnar DO, Helgadottir A, et al.Variants conferring risk of atrial fibrillation on chro-mosome 4q25. Nature. 2007;448(7151):353-357.24. Moffatt MF, Kabesch M, Liang L, et al. Geneticvariants regulating ORMDL3 expression contribute tothe risk of childhood asthma. Nature. 2007;448(7152):470-473.25. Dixon AL, Liang L, Moffatt MF, et al. A genome-wide association study of global gene expression. NatGenet. 2007;39(10):1202-1207.26. Manolio TA, Bailey-Wilson JE, Collins FS. Genes,environment and the value of prospective cohortstudies. Nat Rev Genet. 2006;7(10):812-820.27. Libioulle C, Louis E, Hansoul S, et al. Novel Crohndisease locus identified by genome-wide associationmaps to a gene desert on 5p13.1 and modulates ex-pression of PTGER4. PLoS Genet. 2007;3(4):e58.28. Chanock SJ, Manolio T, Boehnke M, et al; NCI-NHGRI Working Group on Replication in AssociationStudies. Replicating genotype-phenotype associations.Nature. 2007;447(7145):655-660.29. Spielman RS, McGinnis RE, Ewens WJ. Transmis-sion test for linkage disequilibrium. Am J Hum Genet.1993;52(3):506-516.30. Cardon LR, Palmer LJ. Population stratification andspurious allelic association. Lancet. 2003;361(9357):598-604.31. Mitchell AA, Cutler DJ, Chakravarti A. Undetectedgenotypingerrorscauseapparentovertransmissionofcom-mon alleles in the transmission/disequilibrium test. AmJ Hum Genet. 2003;72(3):598-610.32. Cupples LA, Arruda HT, Benjamin EJ, et al. TheFramingham Heart Study 100K SNP genome-wide as-sociation study resources. BMC Med Genet. 2007;8(suppl 1):S1.33. Ridker PM, Chasman DI, Zee RY, et al. Rationale,design,andmethodologyoftheWomen’GenomeHealthStudy. Clin Chem. 2008;54(2):249-255.

34. Easton DF, Pooley KA, Dunning AM, et al. Ge-nome-wide association study identifies novel breastcancer susceptibility loci. Nature. 2007;447(7148):1087-1093.35. Zanke BW, Greenwood CM, Rangrej J, et al. Ge-nome-wide association scan identifies a colorectal can-cer susceptibility locus or a chromosome 8q24. NatGenet. 2007;39(8):989-994.36. Skol AD, Scott LJ, Abecasis GR, Boehnke M. Op-timal designs for two-stage genome-wide associa-tion studies. Genet Epidemiol. 2007;31(7):776-788.37. Hoover RN. The evolution of epidemiologicresearch. Epidemiology. 2007;18(1):13-17.38. Mueller PW, Rogus JJ, Cleary PA, et al. Geneticsof Kidneys in Diabetes (GoKinD) study. J Am SocNephrol. 2006;17(7):1782-1790.39. Wellcome Trust Case Control Consortium. Ge-nome-wide association study of 14 000 cases of sevencommon diseases and 3,000 shared controls. Nature.2007;447(7145):661-678.40. Hakonarson H, Grant SF, Bradfield JP, et al. A ge-nome-wide association study identifies KIAA0350 asa type 1 diabetes gene. Nature. 2007;448(7153):591-594.41. Samani NJ, Erdmann J, Hall AS, et al; WTCCC andthe Cardiogenics Consortium. Genomewide associa-tion analysis of coronary artery disease. N Engl J Med.2007;357(5):443-453.42. Scott LJ, Mohlke KL, Bonnycastle LL, et al. A ge-nome-wide association study of type 2 diabetes in Finnsdetects multiple susceptibility variants. Science. 2007;316(5829):1341-1345.43. Dewan A, Liu M, Hartman S, et al. HTRA1 pro-moter polymorphism in wet age-related maculardegeneration. Science. 2006;314(5801):989-992.44. Frayling TM, Timpson NJ, Weedon MN, et al. Acommon variant in the FTO gene is associated withbody mass index and predisposes to childhood andadult obesity. Science. 2007;316(5826):889-894.45. Thomas DC, Witte JS. Point: population stratifi-cation: a problem for case-control studies of candidate-gene associations? Cancer Epidemiol Biomarkers Prev.2002;11(6):505-512.46. Wacholder S, Rothman N, Caporaso N. Coun-terpoint: bias from population stratification is not a ma-jor threat to the validity of conclusions from epide-miological studies of common polymorphisms andcancer. Cancer Epidemiol Biomarkers Prev. 2002;11(6):513-520.47. Plenge RM, Seielstad M, Padyukov L, et al. TRAF1–C5 as a risk locus for rheumatoid arthritis—a genome-wide study. N Engl J Med. 2007;357(12):1199-1209.48. Stacey SN, Manolescu A, Sulem P, et al. Com-mon variants on chromosomes 2q35 and 16q12 con-fer susceptibility to estrogen receptor-positive breastcancer. Nat Genet. 2007;39(7):865-869.49. Devlin B, Roeder K. Genomic control for associa-tion studies. Biometrics. 1999;55(4):997-1004.50. Gabriel SB, Schaffner SF, Nguyen H, et al. Thestructure of haplotype blocks in the human genome.Science. 2002;296(5576):2225-2229.51. McCarroll SA, Altshuler DM. Copy-number varia-tion and association studies of human disease. NatGenet. 2007;39(7)(suppl):S37-S42.52. Feuk L, Marshall CR, Wintle RF, Scherer SW. Struc-tural variants. Hum Mol Genet. 2006;15(spec no 1):R57-R66.53. Duerr RH, Taylor KD, Brant SR, et al. A genome-wide association study identifies IL23R as an inflam-matory bowel disease gene. Science. 2006;314(5804):1461-1463.54. Moskvina V, Craddock N, Holmans P, et al. Ef-

fects of differential genotyping error rate on the typeI error probability of case-control studies. Hum Hered.2006;61(1):55-64.55. Sladek R, Rocheleau G, Rung J, et al. A genome-wide association study identifies novel risk loci for type2 diabetes. Nature. 2007;445(7130):881-885.56. Tomlinson I, Webb E, Carvajal-Carmona L, et al.A genome-wide association scan of tag SNPs identi-fies a susceptibility variant for colorectal cancer at8q2421. Nat Genet. 2007;39(8):984-988.57. Last JM, ed. Dictionary of Epidemiology. NewYork, NY: Oxford University Press; 1983:7.58. Gordis L, ed. Epidemiology. 2nd ed. Philadel-phia, PA: WB Saunders Co; 2000:165.59. Zollner S, Pritchard JK. Overcoming the winner’scurse: estimating penetrance parameters from case con-trol data. Am J Hum Genet. 2007;80(4):605-615.60. Yang Q, Cui J, Chazaro I, Cupples LA, DemissieS. Power and type I error rate of false discovery rateapproaches in genome-wide association studies. BMCGenet. 2005;6(suppl 1):S134.61. Hochberg Y, Benjamini Y. More powerful proce-dures for multiple significance testing. Stat Med. 1990;9(7):811-818.62. Sabatti C, Service S, Freimer N. False discoveryrate in linkage and association genome screens for com-plex disorders. Genetics. 2003;164(2):829-833.63. Wacholder S, Chanock S, Garcia-Closas M, ElGhormli L, Rothman N. Assessing the probability thata positive report is false. J Natl Cancer Inst. 2004;96(6):434-442.64. Todd JA, Walker NM, Cooper JD, et al. Robustassociations of four new chromosome regions fromgenome-wide analyses of type 1 diabetes. Nat Genet.2007;39(7):857-864.65. Helgason A, Palsson S, Thorleifsson G, et al. Re-fining the impact of TCF7L2 gene variants on type 2diabetes and adaptive evolution. Nat Genet. 2007;39(2):218-225.66. Khoury MJ, Little J, Gwinn M, Ioannidis JPA. Onthe synthesis and interpretation of consistent but weakgene-disease associations in the era of genome-wideassociation studies. Int J Epidemiol. 2007;36(2):439-445.67. Yeager M, Orr N, Hayes RB, et al. Genome-wideassociation study of prostate cancer identifies a sec-ond risk locus at 8q24. Nat Genet. 2007;39(5):645-649.68. Frayling TM, McCarthy MI. Genetic studies of dia-betes following the advent of the genome-wide as-sociation study: where do we go from here?Diabetologia. 2007;50(11):2229-2233.69. Hunter DJ, Kraft P, Jacobs KB, et al. A genome-wide association study identifies alleles in FGFR2 as-sociated with risk of sporadic postmenopausal breastcancer. Nat Genet. 2007;39(7):870-874.70. Hafler DA, Compston A, Sawcer S, et al; The In-ternational Multiple Sclerosis Genetics Consortium. Riskalleles for multiple sclerosis identified by a genome-wide study. N Engl J Med. 2007;357(9):851-862.71. Saxena R, Voight BF, Lyssenko V, et al. Genome-wide association analysis identifies loci for type 2 dia-betes and triglyceride levels. Science. 2007;316(5829):1331-1336.72. Slater EE, MacDonald JS. Mechanism of action andbiological profile of HMG CoA reductase inhibitors: anew therapeutic alternative. Drugs. 1988;36(suppl 3):72-82.73. Gehrs KM, Anderson DH, Johnson LV, Hage-man GS. Age-related macular degeneration–emerging pathogenetic and therapeutic concepts. AnnMed. 2006;38(7):450-471.

INTERPRETING A GENOME-WIDE ASSOCIATION STUDY

1344 JAMA, March 19, 2008—Vol 299, No. 11 (Reprinted) ©2008 American Medical Association. All rights reserved.

at University of Colorado - Denver HSL on October 12, 2010 www.jama.comDownloaded from

Page 12: How to Interpret a Genome-wide Association Studypsych.colorado.edu/~carey/pdfFiles/GWAS_Pearson.pdf · Genome-wide association study Any study of genetic variation across the entire

conduct such studies are likely to assume that their workfalls under this category. Conversely, public health and healthservices researchers might be confused to see the term ap-plied to their work.

Furthermore, using “clinical” in both terms perpetu-ates the tendency of the medical profession to viewhealth research through the clinician’s lens alone. Fiscellaet al do include “organizational- and community-focused” research within their definition of applied clini-cal research, but labeling health interventions outside theclinic as “clinical” research may be a forced fit. Pros andcons exist with other potential terms such as knowledgetranslation—the term discussed by Dr Graham and MsTetroe—but all of them are an improvement over theambiguity of T2.

Graham and Tetroe call attention to the excellent workof the Canadian Institutes of Health Research. Canadian in-vestigators and institutions have played a leadership role notonly in writing about the need for researchers to align theirwork with the information needs of end users1 but also inmaking real commitments in programs and funding to fa-cilitate T2 as a nation.2 The United States would do well tofollow the Canadian example.

T1 is among a group of clinical research movementsthat are attracting attention and resources but are ulti-mately unhelpful to patients without T2. Recently, politi-cians and industry have announced plans to channel mil-lions of dollars per year into research on “comparativeeffectiveness”3 and “personalized medicine”4 while keep-ing funding for health services research threadbare.5

Popular research initiatives address worthy questions:whether a treatment can be produced (T1), whether itimproves health (evidence-based medicine), which treat-ment is best (comparative effectiveness), and which isbest for an individual patient (personalized medicine).But the answers remain academic if the patient cannotobtain or use the intervention. Overcoming suchobstacles so that the products of research benefit all thosein need is itself a crucial research priority.

Steven H. Woolf, MD, [email protected] of Family MedicineVirginia Commonwealth UniversityRichmond

Financial Disclosures: None reported.

1. Ross S, Lavis J, Rodriguez C, Woodside J, Denis JL. Partnership experiences:involving decision-makers in the research process. J Health Serv Res Policy. 2003;8(suppl 2):26-34.2. Lomas J. Using ‘linkage and exchange’ to move research into policy at a Ca-nadian foundation. Health Aff (Millwood). 2000;19(3):236-240.3. US House of Representatives Committee on Ways and Means Subcommitteeon Health: hearing on strategies to increase information on comparative clinicaleffectiveness [Tuesday, June 12, 2007]. http://waysandmeans.house.gov/hearings.asp?formmode=detail&hearing=565. Accessed February 29, 2008.4. US Department of Health and Human Services: remarks prepared for the Hon-orable Mike Leavitt, Secretary of Health and Human Services, on personalized medi-cine coalition. http://www.hhs.gov/news/speech/2007/sp20070919a.html. Ac-cessed February 29, 2008.5. Coalition for Health Services Research: federal funding for health services re-search (as of December 26, 2007). http://www.chsr.org/2008.pdf. Accessed Feb-ruary 29, 2008.

CORRECTIONSIncorrect Data: In the Perspectives on Care at the Close of Life article titled “Man-aging an Acute Pain Crisis in a Patient With Advanced Cancer: ‘This Is as Much ofa Crisis as a Code’ ” published in the March 26, 2008, issue of JAMA (2008;299(12):1457-1467), an incorrect dose ratio appeared in Table 2. The hydromorphone-to-methadone ratio for less than 330 mg/24 hours of hydromorphone that read“16:1” should have been “1.6:1.”

Incorrect Legend: In the Special Communication entitled “How to Interpret a Ge-nome-wide Association Study” published in the March 19, 2008, issue of JAMA(2008;299[11]:1335-1344), an integral word was omitted from the Figure 3 leg-end. The sentence that read, “Genome-wide association studies assume a priorihypotheses about candidate genes or regions that might be associated with dis-ease; rather, they test single-nucleotide polymorphisms (SNPs) throughout the ge-nome for possible evidence of genetic susceptibility” should have read, “Genome-wide association studies assume no a priori hypotheses about candidate genes orregions that might be associated with disease; rather, they test single-nucleotidepolymorphisms (SNPs) throughout the genome for possible evidence of geneticsusceptibility.”

Unreported Research Funding: In the Research Letter titled “Exhaled Carbon Mon-oxide With Waterpipe Use in US Students,” published in the January 2, 2008, is-sue of JAMA (2008;299[1]:36-38), the Financial Disclosures should have in-cluded the following: Dr Hammond reports that she has received research fundingfor studies on environmental tobacco smoke from the National Institutes of Healthand from the Flight Attendants Medical Research Institute. However, none of thesegrants were used to support the study reported in this Research Letter.

LETTERS

2150 JAMA, May 14, 2008—Vol 299, No. 18 (Reprinted) ©2008 American Medical Association. All rights reserved.

at University of Colorado - Denver HSL on October 12, 2010 www.jama.comDownloaded from