An Introduction to Genome-Wide Association Studies
Shen-Chih Chang, PhD
Epi 295
Oct 2, 2009
What is a genome-wide association study?
Any study of genetic variation across the entire human genome designed to identify genetic association with a disease.
It usually refers to studies with 100,000 or more genetic markers, chosen to represent a large proportion of the variation in the human genome.
It allows efficient and comprehensive analysis of common genomic variation without a priori hypotheses based on gene function or disease pathways.
It requires very large series of cases and controls to ensure adequate statistical power, and multiple subsequent studies to confirm the initial findings.
Why are such studies possible now?
With the completion of the Human Genome Project in 2003 and the International HapMap Project in 2005, researchers now have a set of research tools that make it possible to find the genetic contributions to common diseases.
The tools include
◦ computerized databases that contain the reference human genome sequence
◦ a map of human genetic variation
◦ a set of new technologies that can quickly and accurately analyze whole-genome samples.
Molecular Epidemiology Before/After GWAS
Before GWAS
◦ A few markers at a time
◦ Gene functions
◦ Disease pathways
◦ Biological mechanisms
→ Association studies
After GWAS
◦ Hundreds of thousands of markers at a time
◦ Gene hunting
◦ Association studies
→ Gene functions, disease pathways, biological mechanisms
Terms Frequently Used in GWAS
Single-nucleotide polymorphism (SNP)
◦ DNA sequence variation resulting from a single-base substitution.
◦ Most common form of genetic variation in the genome.
Nonsynonymous SNP
◦ A polymorphism that results in a change in the amino acid sequence of a protein (and therefore may affect the function of the protein)
Alleles
◦ Alternative DNA sequences at the same physical gene locus, which may or may not result in different phenotypic traits.
*Adapted from Pearson et al., 2008.
Terms Frequently Used in GWAS (continued)
Minor allele
◦ The allele of a bi-allelic polymorphism that is less frequent in the study population.
Minor allele frequency (MAF)
◦ Proportion of the less common allele in a population, ranging from less than 1% to 50%.
Haplotype
◦ A group of specific alleles at neighboring genes or markers that tend to be inherited together
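As an illustration of the minor allele frequency definition above, here is a minimal sketch that computes the MAF of a bi-allelic SNP from genotype counts. The function name and the example counts are hypothetical and are not taken from any study in these slides.

```python
def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Return the MAF of a bi-allelic SNP from counts of AA, AB and BB genotypes."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)  # each person carries two alleles
    if total_alleles == 0:
        raise ValueError("no genotyped samples")
    freq_a = (2 * n_aa + n_ab) / total_alleles  # frequency of allele A
    # The minor allele is whichever of A or B is less frequent.
    return min(freq_a, 1.0 - freq_a)


# Hypothetical counts: 640 AA, 320 AB, 40 BB gives a MAF of about 0.2.
example_maf = minor_allele_frequency(640, 320, 40)
print(round(example_maf, 3))
```

By construction the result never exceeds 0.5, which matches the "less than 1% to 50%" range given in the definition.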
Terms Frequently Used in GWAS (continued)
Linkage disequilibrium (LD)
◦ Association between 2 alleles located near each other on a chromosome, such that they are inherited together more frequently than expected by chance.
Tag SNP
◦ A SNP in strong LD with other SNPs, so that it can serve as a proxy for them. Only the tag SNPs need to be genotyped to represent the variation of the gene(s).
Hardy-Weinberg equilibrium (HWE)
◦ The population distribution of two alleles (with frequencies p and q) is stable from generation to generation.
◦ Genotypes occur at frequencies of p², 2pq, and q² for the major allele homozygote, heterozygote, and minor allele homozygote, respectively.
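The HWE expectation above can be checked with a simple goodness-of-fit test. The sketch below is one common approach (a 1-degree-of-freedom chi-square test on genotype counts). It is not necessarily the test used in the studies discussed later, and the function name and example counts are illustrative.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test (1 df) of observed genotype counts against p^2, 2pq, q^2."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("monomorphic SNP: HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts that sit exactly at HWE for p = 0.8.
chi2, p_value = hwe_chi_square(640, 320, 40)
print(chi2, p_value)
```

A small p-value flags a departure from HWE, which in practice is often a sign of genotyping error rather than biology.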
Other types of common DNA variation
Deletion/Insertion
Copy number variation
Single Nucleotide Polymorphism
Person 1
AAGTCAGTCTAGGATCGGG
TTCAGTCAGATCCTAGCCC
Person 2
AAGTCAGTCTAGGGTCGGG
TTCAGTCAGATCCCAGCCC
SNP: the two sequences differ at a single base pair (A/T in Person 1, G/C in Person 2)
Insertion/Deletion
[Figure: double-stranded DNA sequences from Person 1 and Person 2 that differ by a short insertion/deletion]
Copy Number Variation
Person 1
AAGTGTCGTCGTCGTCTCGGG
TTCACAGCAGCAGCAGAGCCC
Person 2
AAGTGTCGTCGTCTCGGG
TTCACAGCAGCAGAGCCC
3 vs. 4 trinucleotide repeats
Genotyping in GWA Studies
GWA studies rely on LD information.
Usually at least 1 SNP within a group of SNPs with high LD (r² ≥ 0.8) will be included on the platform.
Genotyping platforms comprising 500,000 to 1,000,000 SNPs have been estimated to capture:
◦ 67% to 89% of common SNP variation in European and Asian ancestry
◦ 46% to 66% in African ancestry
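To make the r² ≥ 0.8 tagging criterion concrete, the following sketch computes the two usual pairwise LD measures, r² and |D'|, from haplotype and allele frequencies. The function name and the example frequencies are hypothetical. In real data the haplotype frequencies must first be estimated from genotypes, which is not shown here.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float):
    """Return (r2, abs_d_prime) for two bi-allelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom <= 0:
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    r2 = d * d / denom
    # D' rescales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max


# Hypothetical pair: allele A only ever occurs together with allele B.
r2, d_prime = ld_stats(p_ab=0.3, p_a=0.3, p_b=0.4)
print(round(r2, 3), round(d_prime, 3))
can_tag = r2 >= 0.8  # tagging criterion quoted on the slide
```

In this example |D'| is 1 while r² is well below 0.8, so a pair can show complete D' and still be a poor proxy when the allele frequencies differ.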
Genotyping Platforms
Affymetrix
◦ Genome-Wide Human SNP Array 6.0
  More than 906,600 SNPs
  More than 946,000 copy number probes
Illumina
◦ HumanOmni1-Quad / Human1M-Duo
  ~1 million markers
◦ Human660W-Quad
  ~650,000 markers
◦ HumanCytoSNP-12
  ~300,000 markers
Quality Control in GWA Studies
Certain thresholds should be set up to ensure genotyping quality:
◦ the SNP call rate, typically > 95%
◦ the minor allele frequency, typically > 1%
◦ violations of Hardy-Weinberg equilibrium
◦ concordance rates in duplicate samples, typically > 99.5%.
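The thresholds above could be applied per SNP roughly as follows. This is a minimal sketch: the `SnpQc` record and the example SNPs are hypothetical, and the HWE p-value cutoff of 1e-4 is an assumption, since the slide gives no number for it.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    snp_id: str
    call_rate: float  # fraction of samples with a genotype call
    maf: float        # minor allele frequency
    hwe_p: float      # p-value from a Hardy-Weinberg test


def passes_qc(snp: SnpQc, min_call_rate=0.95, min_maf=0.01, min_hwe_p=1e-4) -> bool:
    """Apply the call-rate, MAF and HWE filters to one SNP."""
    return (
        snp.call_rate > min_call_rate
        and snp.maf > min_maf
        and snp.hwe_p >= min_hwe_p  # assumed cutoff, not from the slide
    )


def duplicates_ok(concordance: float, min_concordance=0.995) -> bool:
    """Check the genotype concordance rate between duplicate samples."""
    return concordance > min_concordance


# Hypothetical SNPs: only the first one passes every filter.
snps = [
    SnpQc("snp1", 0.99, 0.20, 0.50),
    SnpQc("snp2", 0.93, 0.20, 0.50),   # low call rate
    SnpQc("snp3", 0.99, 0.005, 0.50),  # too rare
    SnpQc("snp4", 0.99, 0.20, 1e-8),   # violates HWE
]
kept = [s.snp_id for s in snps if passes_qc(s)]
print(kept)
```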
Study Design in GWA Studies
A multistage approach can reduce the amount of genotyping required, without sacrificing power.
In stage 1, a full set of SNPs is genotyped in a fraction of samples, and a p-value threshold is used to identify a subset of SNPs with putative associations.
In the second and possibly third stages, the SNPs identified from the first stage are re-tested in populations that are larger or of a similar size.
The replication results can be used to distinguish the few true-positive associations identified in stage 1 from the many false-positive results that occur by chance.
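The staged logic described above can be sketched in a few lines. This is an illustration only: the SNP names and p-values are made up, and correcting stage 2 for just the SNPs carried forward is an assumption of this sketch rather than a rule stated in the slides.

```python
def two_stage_scan(stage1_p, stage2_p, stage1_threshold, alpha=0.05):
    """Return SNPs that pass the stage 1 screen and then replicate in stage 2."""
    # Stage 1: keep SNPs whose discovery p-value is below the screening threshold.
    candidates = [snp for snp, p in stage1_p.items() if p < stage1_threshold]
    if not candidates:
        return []
    # Stage 2: Bonferroni-correct only for the SNPs carried forward (assumption).
    stage2_threshold = alpha / len(candidates)
    return [snp for snp in candidates if stage2_p.get(snp, 1.0) < stage2_threshold]


# Hypothetical p-values for four made-up SNP identifiers.
stage1 = {"snpA": 3e-8, "snpB": 2e-6, "snpC": 0.04, "snpD": 9e-7}
stage2 = {"snpA": 1e-4, "snpB": 0.30, "snpD": 0.01}
replicated = two_stage_scan(stage1, stage2, stage1_threshold=1e-5)
print(replicated)
```

Here three SNPs pass the screen, and the one that fails in stage 2 is treated as a likely false positive from stage 1.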
Multistage Study Designs
Joel N. Hirschhorn & Mark J. Daly, Nature Reviews Genetics 6, 95-108, 2005
Examining GWAS Data
Population stratification (population structure)
◦ Refers to confounding in genetic association studies caused by genetic differences between cases and controls that are unrelated to disease but arise from sampling them from populations of different ancestries.
Quantile-Quantile plot (Q-Q plot)
◦ In a GWAS, a Q-Q plot can be used to assess the magnitude of population stratification.
◦ Observed association statistics (χ² or -log10 P values) are ranked from smallest to largest on the y-axis and plotted against the distribution expected under the null hypothesis of no association on the x-axis.
◦ Deviations from the identity line suggest either a very highly associated locus or significant differences in population structure.
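The Q-Q construction described above can be sketched as follows for -log10 P values. The expected quantiles use the simple i/(n+1) plotting position, which is one common convention and an assumption here. The function name and the example p-values are hypothetical.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) pairs of -log10 p-values, smallest to largest."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Under the null, p-values are uniform; the i-th smallest is about i / (n + 1).
    expected = sorted(-math.log10(i / (n + 1)) for i in range(1, n + 1))
    return list(zip(expected, observed))


# Hypothetical scan of three SNPs in which one shows a very strong signal.
points = qq_points([1e-8, 0.5, 0.75])
for expected, observed in points:
    print(round(expected, 3), round(observed, 3))
```

Points that sit on the identity line behave as expected under the null. A departure only in the upper tail suggests a few strong loci, while a departure along the whole line points to stratification.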
Pearson, T. A. et al. JAMA 2008;299:1335-1344.
Hypothetical Quantile-Quantile Plots in Genome-wide Association Studies
The sharp deviation above an expected value of approximately 8 could be due to a strong association of the disease with SNPs in a heavily genotyped region.
Inflation of observed statistics is more likely due to population structure than disease susceptibility genes.
Analyzing GWA Studies
Different genetic models
◦ Additive model: each copy of the allele is assumed to increase risk by the same amount.
◦ Dominant model: rare allele carriers compared to homozygotes of the common allele
◦ Recessive model: homozygotes of the rare allele compared to common allele carriers
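In a regression analysis, the three genetic models above differ only in how a genotype is coded as a predictor. A minimal sketch of that coding, with a hypothetical function name:

```python
def encode_genotype(rare_allele_count: int, model: str) -> int:
    """Code a genotype (0, 1 or 2 copies of the rare allele) under a genetic model."""
    if rare_allele_count not in (0, 1, 2):
        raise ValueError("allele count must be 0, 1 or 2")
    if model == "additive":
        return rare_allele_count                 # 0, 1, 2: same step per copy
    if model == "dominant":
        return 1 if rare_allele_count >= 1 else 0  # any carrier vs. non-carrier
    if model == "recessive":
        return 1 if rare_allele_count == 2 else 0  # rare homozygote vs. the rest
    raise ValueError(f"unknown model: {model}")


for model in ("additive", "dominant", "recessive"):
    print(model, [encode_genotype(g, model) for g in (0, 1, 2)])
```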
Correction for multiple comparisons
◦ Bonferroni correction has been the most commonly used correction in GWAS:
◦ A threshold of P = 0.05/number of tests performed
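As a worked example of the Bonferroni threshold, the sketch below uses the 306,207 SNPs tested in the deCODE study described later in these slides. The function names are illustrative.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_genome_wide_significant(p_value: float, n_tests: int, alpha: float = 0.05) -> bool:
    return p_value < bonferroni_threshold(n_tests, alpha)


threshold = bonferroni_threshold(306_207)
print(f"{threshold:.2e}")  # roughly 1.6e-07, which the slides round to 2e-07
print(is_genome_wide_significant(5e-16, 306_207))
```

Bonferroni treats all tests as independent, so with SNPs in LD it tends to be conservative.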
Selected Genome-Wide Association Studies on Lung Cancer
deCODE Genetics, Iceland
Population: European descent
Outcome: smoking quantity
First Stage – discovery
10,995 Icelandic smokers
Infinium HumanHap300 SNP chips (Illumina)
306,207 SNPs
Significance threshold = 2 × 10⁻⁷ ≈ 0.05/306,207
• Identified allele T of rs1051730 as most strongly associated with smoking quantity (P = 5 × 10⁻¹⁶)
• On chromosome 15q24
• Within the CHRNA3 gene, in a linkage disequilibrium block containing CHRNA5 and CHRNB4 (which encode nicotinic acetylcholine receptors)
Second Stage - replication
Genotyping rs1051730 on additional samples (Centaurus)
523 smokers from Spain
1,375 smokers from The Netherlands
Third Stage - expansion: association with lung cancer and peripheral arterial disease
For lung cancer: three case-control studies from Iceland, Spain, and The Netherlands
For peripheral arterial disease: five case-control studies from Iceland, New Zealand, Austria, Sweden, and Italy
*Adjusted on sex and year of birth.
IARC Central Europe lung cancer study
Population: European descent
Outcome: lung cancer
First Stage - discovery
1,926 lung cancer cases/2,522 controls
Illumina Sentrix HumanHap300 BeadChip (Illumina)
310,023 SNPs (≈ 80% of common genomic variation)
Significance threshold = 5 × 10⁻⁷
• Identified two SNPs on chromosome 15q25:
  • rs1051730 (P = 5 × 10⁻⁹); allelic ORadj, 1.30 (1.19-1.43)
  • rs8034191 (P = 9 × 10⁻¹⁰); allelic ORadj, 1.32 (1.21-1.45)
Second Stage - expansion
Genotyping 34 additional 15q25 SNPs (Taqman):
◦ SNPs with an association P-value < 10⁻⁶ from the Centre d'Etude du Polymorphisme Humain Utah (CEU) HapMap (using an imputation method)
◦ SNPs of CHRNA5 and CHRNA3 from previous studies on nicotine dependence
◦ All non-synonymous SNPs in dbSNP from the six genes within or near the association region
Findings:
◦ 23 showed evidence of association exceeding the genome-wide significance level
◦ Strong LD region
Third Stage - confirmation
Genotyping rs8034191 (from the first panel) and rs16969968 (from the second panel)
In five independent studies of lung cancer
◦ the European Prospective Investigation in Cancer and Nutrition (EPIC) cohort study (781 cases/1,578 controls)
◦ the Beta-Carotene and Retinol Efficacy Trial (CARET) cohort study (764 cases/1,515 controls)
◦ the Health Study of Nord-Trondelag (HUNT) and Tromso cohort studies (235 cases/392 controls)
◦ the Liverpool lung cancer case-control study (403 cases/814 controls)
◦ the Toronto lung cancer case-control study (330 cases/453 controls)
Third Stage - replication
Findings:
• An increased risk for both heterozygous and homozygous carriers was observed in all five replication samples.
• The two SNPs are in high LD (D' = 1.00, r² = 0.92).
• An increased risk was also observed among non-smokers.
Fourth Stage – expansion: association with head and neck cancer
Genotyping rs8034191 in two separate studies of head and neck cancer in Europe
◦ Five of the six countries in the original GWAS, overlapping with the lung cancer controls (726 cases/694 controls)
◦ The ARCAGE study with eight countries in Europe (1,536 cases/1,443 controls)
Findings:
◦ No association with head and neck cancer (HNC) was observed.
◦ No evidence of an association with nicotine addiction.
MD Anderson
Population: European descent
Outcome: lung cancer
First Stage - discovery
1,154 lung cancer cases/1,137 controls (all smokers)
Illumina HumanHap300 v1.1 BeadChips (Illumina)
315,860 SNPs
Significance threshold = 4.9 × 10⁻⁵ (the least significant result among the ten SNPs retained for follow-up)
Chose the 10 SNPs with the most significant associations
Second Stage - replication
Genotyping the 10 most significant associations from the discovery phase (Taqman) in two studies:
◦ Texas (711 cases/632 controls)
◦ UK (2,013 cases/3,062 controls)
Findings:
◦ Elevated risks were replicated for 2 of the 10 SNPs: rs1051730 and rs8034191
◦ The two SNPs are in high LD (r² > 0.8)
*Adjusted on age, sex, packyears, and center
A Catalog of Published Genome-Wide Association Studies
http://www.genome.gov/26525384#1
GWA Replication Studies
Lung Cancer
Bladder Cancer
Head and Neck Cancer
• Manuscript submitted to JNCI
• Organizer: International Lung Cancer Consortium (ILCCO)
• Participants: 21 studies (11,645 cases/14,954 controls)
  • 9 from North America
  • 8 from Europe
  • 4 from Asia
Table 1 – Summary of the participating studies from ILCCO
Ref. | Coordinating institute | Study location | Recruitment period | Eligibility | Control source | Cases* | Controls*
Whites
- | MD Anderson Cancer Center | Texas, US | - | Only ever smokers | Hospital | 709 | 629
- | Karmanos Cancer Institute (KCI), Wayne State University | Michigan, US | 1988-2007 | - | Population | 575 | 860
- | University of Hawaii | Hawaii, US | 1992-1997 | 26-79 years old | Population | 138 | 175
- | Mayo Clinic | Minnesota, US | 1997-2006 | - | Hospital | 1,644 | 1,021
- | Norris Cotton Cancer Center, Dartmouth Medical School (NELCS study) | New Hampshire, US | 2005-2008 | - | Population | 228 | 162
16 | Penn State University College of Medicine | Florida, US | 2000-2003 | 18-79 years old, within 1 year of diagnosis, no previous cancer history | Community | 447 | 733
18 | University of California, Los Angeles (UCLA) | California, US | 1999-2004 | 18-65 years old | Population | 319 | 581
13 | University of California, San Francisco (UCSF) | San Francisco, US | 1998-2003, 2005-2009 | >18 years old | Population and community | 1,804 | 558
17 | National Institute of Occupational Health | Norway | 1986-2005 | Current smokers or quit smoking <5 years | Population | 443 | 436
- | University of Sheffield | United Kingdom | 2005-2009 | Diagnosed under age 61 or reported family history | Population recruited through family | 114 | 133
- | INSERM U794 | France | - | Only ever smokers | - | 135 | 146
10, 12, 14, 15 | Helmholtz Center Munich; University of Göttingen Medical School; German Cancer Research Center (DKFZ); German Cancer Research Center (DKFZ) | Munich, Göttingen, Germany; Munich, Germany; Heidelberg, Germany; Heidelberg, Germany | 2000-2008; 1990-1998; 1997-2007; 1994-1998 | LUCY: 18-51 years old; INRA: all ages; DKFZ: from 18 years old; EPIC: from 18 years old | Population | 1,839 | 3,336
8 | German Cancer Research Center (DKFZ) | Saarland, Germany | 2000-2003 | 50-75 years old | Population | 198 | 203
- | University Hospital of Cologne | Cologne, Germany | 2005-2008 | - | Hospital | 450 | 327
6 | Division of Medical Oncology, University Hospital | Zaragoza, Spain | 2006-2008 | - | Hospital | 350 | 1,227
6 | Radboud University Nijmegen Medical Centre | Netherlands | 2008 | 18-75 years old | Population | 396 | 2,068
- | CHS National Cancer Control Center at Carmel Medical Center and Technion | Haifa, Israel | 2007-2009 | - | Population | 212 | 197
Asians
18 | University of California, Los Angeles (UCLA) | California, US | 1999-2004 | 18-65 years old | Population | 58 | 53
- | University of Hawaii | Hawaii, US | 1992-1997 | 26-79 years old | Population | 100 | 170
- | Samuel Lunenfeld Research Institute | Ontario, Canada | 1997-2002 | Residents of Greater Toronto Area | Population and hospital | 65 | 98
9 | Seoul National University | Korea | 2001-2008 | Hospital | Hospital | 271 | 276
- | National University of Singapore | Singapore | 2005-2007 | Only women | Hospital | 484 | 813
11 | Aichi Cancer Center | Japan | 2000-2005 | 20-79 years old | Hospital | 716 | 716
Total | | | | | | 11,645 | 14,954
* Maximum number of cases and controls of European and Asian ethnic groups with DNA
SNPs selected
Gene location | Gene name | Gene symbol | SNP ID | Nucleotide change | Amino acid change | Rare allele freq*, Han Chinese | Rare allele freq*, European
15q25 | Cholinergic receptor, nicotinic, alpha 5 | CHRNA5 | rs16969968 | Ex5-54G>A | Asp398Asn | 0.03 (A) | 0.42 (A)
15q25 | Cholinergic receptor, nicotinic, alpha 5 | CHRNA5 | rs931794 | -31881G>A | - | 0.31 (G) | 0.43 (G)
15q25 | Cholinergic receptor, nicotinic, alpha 3 | CHRNA3 | rs12914385 | IVS4-4117G>A | - | 0.25 (A) | 0.43 (A)
15q25 | Cholinergic receptor, nicotinic, alpha 3 | CHRNA3 | rs1317286 | T>C | - | 0.08 (C) | 0.41 (C)
15q25 | Similar to RIKEN cDNA C630028N24 gene | LOC123688 | rs8034191 | IVS2+256T>C | - | 0.04 (C) | 0.43 (C)
6p | Nucleolar protein 5B pseudogene | NOL5BP | rs4324798 | G>A | - | 0.00 (A) | 0.11 (A)
6p | (Unknown) | (Unknown) | rs2256543 | C>T | - | 0.17 (T) | 0.43 (T)
5p15 | Cisplatin resistance related protein CRR9p | CLPTM1L | rs402710 | IVS16+9G>A | - | 0.27 (A) | 0.33 (A)
5p15 | Telomerase reverse transcriptase | TERT | rs2736100 | IVS2-3777G>T | - | 0.41 (G) | 0.45 (T)
*From HapMap
Gene function
◦ CHRNA5, CHRNA3: Nicotinic acetylcholine receptors are members of a superfamily of ligand-gated ion channels that mediate fast signal transmission at synapses.
◦ LOC123688: A hypothetical gene
◦ CLPTM1L: CLPTM1L is a predicted transmembrane protein that is expressed in a range of normal and malignant tissues including skin, lung, breast, ovary and cervix.
◦ TERT: The enzyme consists of a protein component with reverse transcriptase activity, encoded by this gene, and an RNA component which serves as a template for the telomere repeat.
• For Caucasians: two variants in the 15q25 locus (rs8034191 and rs16969968), two variants in 5p15 (rs402710 and rs2736100), and two variants in 6p21 (rs4324798 and rs2256543).
• For Asians: three additional variants were selected in the 15q25 region (rs12914385, rs1317286 and rs931794), and the variants in 6p21 were not genotyped because of their low prevalence in these populations.
Methods
Study population
◦ The control group was frequency matched to cases on age and sex in most of the studies.
◦ Some other studies also matched on ethnicity, residence or smoking status
◦ Only Whites or Asians were included
Genotyping method
◦ Genotyping was performed locally using Taqman probes supplied centrally from IARC
◦ Toronto and France studies used data from the Illumina HumanHap300 BeadChip
◦ German Multicentre and Saarland studies used Sequenom's iPLEX assay
◦ Spain and Netherlands studies used the Centaurus (Nanogen) platform
Methods (continued)
• Quality control
◦ 90 standard DNAs were used as inter-lab controls
◦ A study was excluded from the analysis of a variant if more than one discrepancy for that variant was found
◦ Average call rates per SNP: 97.1% to 99.6%
◦ No deviation from HWE was observed (P cutoff = 0.0005, considering 100 independent tests)
• Statistical analysis
◦ Unconditional logistic regression to estimate ORs and 95% CIs
◦ Adjusted on age, sex, center, and smoking packyears
◦ Cochran's Q test for heterogeneity
◦ SAS 9.1 software was used
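For readers unfamiliar with Cochran's Q, the sketch below shows the textbook form of the statistic for study-level log odds ratios with inverse-variance weights. It is not the SAS code used in the analysis, and the example estimates are hypothetical.

```python
def cochran_q(log_ors, std_errs):
    """Return (Q, degrees_of_freedom, pooled_log_or) for study-level estimates."""
    if len(log_ors) != len(std_errs) or len(log_ors) < 2:
        raise ValueError("need matching estimates from at least two studies")
    weights = [1.0 / se ** 2 for se in std_errs]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    # Q sums the weighted squared deviations of each study from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    return q, len(log_ors) - 1, pooled


# Hypothetical log ORs and standard errors from three studies.
q, df, pooled = cochran_q([0.20, 0.25, 0.30], [0.05, 0.08, 0.10])
print(round(q, 3), df, round(pooled, 3))
```

Under homogeneity, Q is compared with a chi-square distribution with k - 1 degrees of freedom, where k is the number of studies.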
Table 2 – Distribution of selected demographic variables by ethnic group
Variable | Whites cases, n (%) | Whites controls, n (%) | Asians cases, n (%) | Asians controls, n (%)
Sex
Male | 5,741 (57.7) | 7,325 (57.1) | 838 (49.5) | 902 (42.4)
Female | 4,210 (42.3) | 5,503 (42.9) | 856 (50.5) | 1,224 (57.6)
Age (years)
<50 | 1,252 (12.6) | 2,969 (23.1) | 182 (10.7) | 243 (11.4)
50-59 | 2,499 (25.1) | 3,443 (26.8) | 426 (25.1) | 492 (23.1)
60-69 | 3,273 (32.9) | 3,859 (30.1) | 565 (33.4) | 679 (31.9)
70-79 | 2,451 (24.6) | 2,310 (18.0) | 443 (26.2) | 612 (28.8)
≥80 | 476 (4.8) | 247 (1.9) | 78 (4.6) | 100 (4.7)
Smoking status
Never | 962 (9.7) | 4,136 (32.2) | 674 (39.8) | 1,270 (59.7)
Former smoker | 4,125 (41.4) | 4,491 (35.0) | 461 (27.2) | 470 (22.1)
Current smoker | 4,644 (46.7) | 3,173 (24.7) | 526 (31.1) | 308 (14.5)
Ex or current | 134 (1.3) | 455 (3.6) | 23 (1.4) | 20 (0.9)
Missing | 86 (0.9) | 573 (4.5) | 10 (0.6) | 58 (2.7)
Histology (cases only)
Adenocarcinoma | 3,892 (39.1) | - | 329 (30.1) | -
Squamous cell | 2,370 (23.8) | - | 317 (29.0) | -
Large cell | 413 (4.2) | - | 96 (8.8) | -
Small cell | 1,235 (12.4) | - | 109 (10.0) | -
Other or not specified | 2,041 (20.5) | - | 243 (22.2) | -
Total | 9,951 | 12,828 | 1,694 | 2,126
Variant | Risk allele | Allele freq. | Cases ref/het./hom. | Controls ref/het./hom. | Heterozygotes OR (95% CI) | Homozygotes OR (95% CI) | Per allele OR (95% CI) | p-trend | p-heterogeneity (by study)
Whites, Chr 15q25
rs16969968 | A | 0.35 | 3,371 / 4,523 / 1,484 | 4,827 / 5,019 / 1,373 | 1.33 (1.24-1.41) | 1.54 (1.41-1.69) | 1.26 (1.21-1.32) | 2 × 10⁻²⁶ | 0.09
rs8034191 | G | 0.35 | 2,586 / 3,488 / 1,185 | 4,036 / 4,256 / 1,171 | 1.33 (1.24-1.43) | 1.62 (1.47-1.79) | 1.29 (1.23-1.35) | 6 × 10⁻²⁵ | 0.15
Whites, Chr 5p15
rs2736100 | C | 0.51 | 1,878 / 4,526 / 2,722 | 2,853 / 5,817 / 3,142 | 1.16 (1.07-1.25) | 1.32 (1.21-1.43) | 1.15 (1.10-1.20) | 1 × 10⁻¹⁰ | 0.60
rs402710 | G | 0.65 | 873 / 3,847 / 4,140 | 1,115 / 4,178 / 3,905 | 1.16 (1.04-1.28) | 1.30 (1.18-1.45) | 1.14 (1.09-1.19) | 5 × 10⁻⁸ | 0.73
Whites, Chr 6p
rs2256543 | A | 0.43 | 2,898 / 4,519 / 1,803 | 3,860 / 5,813 / 2,260 | 1.03 (0.96-1.10) | 1.07 (0.98-1.16) | 1.03 (0.99-1.08) | 0.14 | 0.92
rs4324798 | A | 0.08 | 8,066 / 1,630 / 111 | 10,580 / 1,911 / 99 | 1.04 (0.96-1.12) | 1.39 (1.04-1.87) | 1.07 (0.99-1.14) | 0.07 | 0.11
Asians, Chr 15q25
rs16969968 | A | 0.03 | 1,591 / 98 / 2 | 1,986 / 125 / 5 | 0.98 (0.75-1.30) | 0.44 (0.08-2.31) | 0.94 (0.73-1.23) | 0.67 | 0.07
rs8034191 | G | 0.03 | 1,583 / 104 / 3 | 1,992 / 122 / 3 | 1.06 (0.81-1.40) | 1.06 (0.21-5.36) | 1.06 (0.82-1.37) | 0.66 | 0.09
rs12914385 | T | 0.30 | 728 / 647 / 148 | 584 / 762 / 177 | 1.05 (0.91-1.21) | 1.04 (0.81-1.32) | 1.03 (0.93-1.14) | 0.58 | 0.10
rs1317286 | G | 0.10 | 1,223 / 291 / 13 | 1,521 / 313 / 22 | 1.18 (0.99-1.41) | 0.73 (0.36-1.47) | 1.10 (0.94-1.30) | 0.23 | 0.10
rs931794 | G | 0.37 | 591 / 721 / 213 | 764 / 828 / 264 | 1.12 (0.96-1.29) | 1.01 (0.82-1.25) | 1.03 (0.93-1.14) | 0.54 | 0.10
Asians, Chr 5p15
rs2736100 | C | 0.39 | 538 / 836 / 312 | 775 / 1,014 / 312 | 1.24 (1.07-1.43) | 1.51 (1.24-1.83) | 1.23 (1.12-1.35) | 2 × 10⁻⁵ | 0.32
rs402710 | G | 0.68 | 144 / 694 / 842 | 219 / 917 / 981 | 1.15 (0.91-1.46) | 1.32 (1.05-1.66) | 1.15 (1.04-1.27) | 0.007 | 0.22
Ref.: reference class; Het.: heterozygote; Hom.: homozygote for the risk allele
Risk allele frequencies are calculated among controls
ORs are adjusted on age, sex, study
• Among Whites:
  • Increased risk was observed for two SNPs on Chr15 and two SNPs on Chr5
• Among Asians:
  • Increased risk was observed for two SNPs on Chr5
Table 3 – Summary estimates of the main effects of the selected variants in White and Asian ethnic groups
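To show how genotype counts like those in Table 3 relate to a per-allele odds ratio, here is a minimal sketch of a crude (unadjusted) allelic OR with a Woolf-type 95% CI. The published ORs come from logistic regression adjusted for age, sex and study, so this calculation is only an approximation of them. The function name is illustrative.

```python
import math


def allelic_odds_ratio(case_counts, control_counts, z=1.96):
    """Crude allelic OR and CI from (ref_hom, het, risk_hom) genotype counts."""
    def allele_counts(counts):
        ref_hom, het, risk_hom = counts
        # (number of risk alleles, number of reference alleles)
        return 2 * risk_hom + het, 2 * ref_hom + het

    a, b = allele_counts(case_counts)     # risk and reference alleles in cases
    c, d = allele_counts(control_counts)  # risk and reference alleles in controls
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# rs16969968 in Whites, genotype counts as listed in Table 3.
or_crude, ci_low, ci_high = allelic_odds_ratio((3371, 4523, 1484), (4827, 5019, 1373))
print(round(or_crude, 2), round(ci_low, 2), round(ci_high, 2))
```

For this SNP the crude estimate lands close to the adjusted per-allele OR reported in the table.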
CHRNA5 (rs16969968)
[Forest plot; x-axis: OR, from 1.0 to 1.6]
Stratum | Ca | Co | OR | 95% CI
Heterozygous | 4,523 | 5,019 | 1.33 | 1.24-1.41
Homozygous | 1,484 | 1,373 | 1.54 | 1.41-1.69
Co-dominant | 9,378 | 11,219 | 1.26 | 1.21-1.32
By histology (p-heterogeneity=0.38)
Adenocarcinomas | 3,776 | 11,219 | 1.22 | 1.15-1.29
Squamous | 2,128 | 11,219 | 1.31 | 1.23-1.41
Large cell | 384 | 10,896 | 1.23 | 1.06-1.43
Small cell | 1,106 | 10,597 | 1.21 | 1.10-1.33
By smoking status (p-heterogeneity=0.0001)
Never smokers | 922 | 3,706 | 1.02 | 0.91-1.14
Former smokers | 3,994 | 4,181 | 1.27 | 1.18-1.37
Current smokers | 4,277 | 2,875 | 1.39 | 1.29-1.50
By packyears (p-heterogeneity=0.75)
>0-<10 packyears | 453 | 1,591 | 1.19 | 0.99-1.43
10-<20 packyears | 771 | 1,271 | 1.22 | 1.05-1.42
20-<30 packyears | 1,076 | 1,165 | 1.22 | 1.07-1.39
30-<40 packyears | 1,373 | 1,057 | 1.28 | 1.13-1.46
40-<50 packyears | 1,257 | 691 | 1.11 | 0.97-1.29
50-<60 packyears | 919 | 400 | 1.29 | 1.07-1.56
>=60 packyears | 2,069 | 693 | 1.14 | 1.00-1.30
By age (p-trend=0.002)
<50 | 1,177 | 2,323 | 1.49 | 1.34-1.66
50-60 | 2,356 | 3,084 | 1.31 | 1.21-1.42
60-70 | 3,095 | 3,530 | 1.19 | 1.11-1.29
>=70 | 2,750 | 2,282 | 1.16 | 1.06-1.27
By gender (p-heterogeneity=0.88)
Men | 5,264 | 6,445 | 1.26 | 1.19-1.33
Women | 4,114 | 4,774 | 1.27 | 1.19-1.35
Figure 1 – Stratified analysis for rs16969968 (Chr 15) in Whites
• No association among never smokers
• Stronger associations in current smokers than in former smokers
rs16969968 (CHRNA5)
Genotype | All: n | All: mean (95% CI) | Controls: n | Controls: mean (95% CI) | Cases: n | Cases: mean (95% CI)
GG | 5,425 | 20.74 (20.36-21.12) | 2,610 | 17.99 (17.45-18.53) | 2,815 | 22.68 (22.14-23.22)
GA | 6,597 | 21.85 (21.49-22.20) | 2,701 | 19.20 (18.67-19.78) | 3,896 | 23.70 (23.22-24.18)
AA | 2,039 | 23.48 (22.92-24.04) | 746 | 20.56 (19.68-21.44) | 1,293 | 25.56 (24.84-26.28)
p-trend | | 7 × 10⁻¹⁹ | | 6 × 10⁻⁹ | | 5 × 10⁻¹²
Table 4 – Association between rs16969968 and smoking intensity, expressed in cigarettes per day, in the White ethnic group
Means are adjusted by age, sex, study and case/control status when appropriate
• The mean number of cigarettes per day was higher among homozygous carriers of the risk allele than among carriers of the common allele.
TERT (rs2736100)
[Forest plots for Whites and Asians; x-axis: OR, from 0.8 to 1.8]
Stratum | Whites: Ca | Whites: Co | Whites: OR (95% CI) | Asians: Ca | Asians: Co | Asians: OR (95% CI)
Heterozygous | 4,526 | 5,817 | 1.16 (1.07-1.25) | 836 | 1,014 | 1.24 (1.07-1.43)
Homozygous | 2,722 | 3,142 | 1.32 (1.21-1.43) | 312 | 312 | 1.51 (1.24-1.83)
Co-dominant | 9,162 | 11,812 | 1.15 (1.10-1.20) | 1,686 | 2,101 | 1.23 (1.12-1.35)
By histology (p-heterogeneity: Whites 0.0002, Asians 0.01)
Adenocarcinomas | 3,551 | 11,666 | 1.20 (1.13-1.27) | 927 | 2,101 | 1.32 (1.18-1.48)
Squamous | 2,162 | 11,812 | 1.06 (0.99-1.13) | 317 | 2,101 | 0.93 (0.78-1.12)
Large cell | 405 | 11,344 | 1.33 (1.15-1.54) | 94 | 2,101 | 1.17 (0.87-1.59)
Small cell | 1,205 | 11,733 | 1.00 (0.92-1.09) | 109 | 2,101 | 1.00 (0.75-1.33)
By smoking status (p-heterogeneity: Whites 0.44, Asians 0.75)
Never smokers | 934 | 3,972 | 1.22 (1.09-1.35) | 671 | 1,264 | 1.27 (1.10-1.46)
Former smokers | 3,699 | 4,095 | 1.14 (1.07-1.23) | 458 | 454 | 1.24 (1.02-1.51)
Current smokers | 4,309 | 2,760 | 1.12 (1.04-1.20) | 524 | 305 | 1.15 (0.93-1.42)
By age (p-heterogeneity: Whites 0.13, Asians 0.63)
<50 | 1,192 | 2,708 | 1.18 (1.06-1.31) | 181 | 242 | 1.24 (0.92-1.67)
50-60 | 2,327 | 3,234 | 1.24 (1.15-1.35) | 423 | 487 | 1.20 (0.99-1.44)
60-70 | 2,986 | 3,536 | 1.11 (1.04-1.20) | 562 | 674 | 1.34 (1.14-1.59)
>=70 | 2,657 | 2,334 | 1.09 (1.00-1.20) | 520 | 698 | 1.15 (0.98-1.37)
By gender (p-heterogeneity: Whites 0.02, Asians 0.03)
Men | 5,288 | 6,758 | 1.10 (1.05-1.17) | 834 | 886 | 1.10 (0.96-1.26)
Women | 3,874 | 5,054 | 1.22 (1.14-1.30) | 852 | 1,215 | 1.35 (1.19-1.54)
Figure 2 – Stratified analysis for rs2736100 and rs402710 (Chr 5) in Whites and Asians
• More important in adenocarcinomas and large cell carcinomas.
• Stronger association in women (no heterogeneity by gender was observed in the adenocarcinoma-only analysis)
CLPTM1L (rs402710)
[Forest plots for Whites and Asians; x-axis: OR, from 0.8 to 1.8]
Stratum | Whites: Ca | Whites: Co | Whites: OR (95% CI) | Asians: Ca | Asians: Co | Asians: OR (95% CI)
Heterozygous | 3,847 | 4,178 | 1.16 (1.04-1.28) | 694 | 917 | 1.15 (0.91-1.46)
Homozygous | 4,140 | 3,905 | 1.30 (1.18-1.45) | 842 | 981 | 1.32 (1.05-1.66)
Co-dominant | 8,860 | 9,198 | 1.14 (1.09-1.19) | 1,680 | 2,117 | 1.15 (1.04-1.27)
By histology (p-heterogeneity: Whites 0.03, Asians 0.58)
Adenocarcinomas | 3,583 | 9,052 | 1.18 (1.11-1.26) | 921 | 2,117 | 1.20 (1.06-1.36)
Squamous | 2,004 | 9,198 | 1.15 (1.07-1.25) | 316 | 2,117 | 1.15 (0.95-1.39)
Large cell | 364 | 8,731 | 1.21 (1.03-1.42) | 95 | 2,117 | 1.07 (0.78-1.46)
Small cell | 1,102 | 8,571 | 1.00 (0.91-1.10) | 109 | 2,117 | 0.98 (0.73-1.31)
By smoking status (p-heterogeneity: Whites 0.27, Asians 0.67)
Never smokers | 845 | 3,070 | 1.15 (1.01-1.30) | 663 | 1,263 | 1.08 (0.92-1.25)
Former smokers | 3,617 | 3,155 | 1.20 (1.11-1.30) | 460 | 469 | 1.20 (0.98-1.47)
Current smokers | 4,217 | 2,547 | 1.09 (1.01-1.18) | 525 | 307 | 1.07 (0.87-1.33)
By age (p-heterogeneity: Whites 0.51, Asians 0.61)
<50 | 1,144 | 2,108 | 1.12 (1.00-1.25) | 182 | 241 | 1.01 (0.75-1.35)
50-60 | 2,202 | 2,642 | 1.21 (1.10-1.32) | 420 | 491 | 1.19 (0.97-1.45)
60-70 | 2,910 | 2,783 | 1.11 (1.02-1.20) | 562 | 676 | 1.24 (1.04-1.48)
>=70 | 2,604 | 1,665 | 1.13 (1.02-1.24) | 516 | 709 | 1.09 (0.92-1.31)
By gender (p-heterogeneity: Whites 0.85, Asians 0.92)
Men | 4,977 | 5,472 | 1.13 (1.07-1.20) | 836 | 899 | 1.15 (1.00-1.32)
Women | 3,883 | 3,726 | 1.14 (1.07-1.23) | 844 | 1,218 | 1.14 (0.99-1.31)
Figure 2 – Stratified analysis for rs2736100 and rs402710 (Chr 5) in Whites and Asians (continued)
• Heterogeneity by histology observed in Whites only
Number of risk alleles | Ca | Co | OR | 95% CI | p-value
Total | 7,964 | 8,212 | | |
0 | 95 | 153 | 1.00 | reference |
1 | 551 | 765 | 1.16 | 0.87-1.55 | 0.32
2 | 1,538 | 1,856 | 1.30 | 0.98-1.71 | 0.06
3 | 2,364 | 2,458 | 1.53 | 1.16-2.01 | 0.003
4 | 2,097 | 1,955 | 1.72 | 1.30-2.26 | 1 × 10⁻⁴
5 | 1,099 | 883 | 1.98 | 1.49-2.63 | 2 × 10⁻⁶
6 | 220 | 142 | 2.64 | 1.86-3.74 | 4 × 10⁻⁸
Per risk allele | | | 1.15 | 1.12-1.18 | 1 × 10⁻²⁶
Table 6 – Association between risk of lung cancer and combined genotypes of rs402710, rs2736100 and rs16969968 in Whites
• An OR of 2.64 was found for homozygous carriers of the three risk variants compared to individuals with no risk allele.
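The "number of risk alleles" variable in Table 6 is a simple count across the three SNPs. A minimal sketch of that count follows, using the risk alleles as listed in Table 3 (A for rs16969968, C for rs2736100, G for rs402710). The genotype-string format and the function name are assumptions made for illustration.

```python
# Risk alleles as listed in Table 3 of these slides.
RISK_ALLELES = {"rs402710": "G", "rs2736100": "C", "rs16969968": "A"}


def count_risk_alleles(genotypes: dict) -> int:
    """Sum risk alleles (0 to 6) over the three SNPs.

    genotypes maps each SNP ID to a two-letter genotype string such as "AG".
    """
    total = 0
    for snp, risk_allele in RISK_ALLELES.items():
        genotype = genotypes[snp]
        if len(genotype) != 2:
            raise ValueError(f"unexpected genotype for {snp}: {genotype!r}")
        total += genotype.count(risk_allele)
    return total


# Hypothetical person: homozygous for the risk allele at all three SNPs.
print(count_risk_alleles({"rs402710": "GG", "rs2736100": "CC", "rs16969968": "AA"}))
```

This count treats every risk allele as equivalent, which is what the per-risk-allele OR in the table assumes.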
Conclusion
The largest pool of independent studies not included in previous GWAS.
For 15q25
◦ Replicated the results from GWAS in Whites.
◦ Extended to Asians, with no association observed.
For 5p15
◦ Confirmed the results in Whites.
◦ Reported an association in Asians.
For 6p21
◦ Results were not replicated.
• MD Anderson
• European descent
• Bladder cancer
• Nature Genetics, 41, 991-995 (2009).
First Stage - discovery
969 bladder cancer cases/957 controls
Illumina HumanHap610 BeadChip (Illumina)
556,426 SNPs
• None of the SNPs reached genome-wide significance.
• After removing highly linked SNPs, three SNPs had a P-value < 10⁻⁵ and 50 SNPs showed a P-value < 10⁻⁴.
No evidence for inflation of the chi-squared test statistics (none of the SNPs reached genome-wide significance)
Second Stage - replication
Genotyping the top 50 SNPs and the top 10 additional SNPs in 8q24 in 3 additional US sites:
◦ New Hampshire (800 cases/912 controls)
◦ Texas (764 cases/2,807 controls)
◦ MSKCC (149 cases/152 controls)
One SNP, rs2294008, showed consistent results in the discovery and replication phases.
9 additional European populations were used to replicate this SNP.
Overall allelic OR, 1.15 (95% CI, 1.10-1.20).
No heterogeneity between populations was observed.
Similar associations were observed across different strata of gender, smoking status, and age.
Third Stage - expansion
rs2294008 is located in exon 1 of the PSCA gene (prostate stem cell antigen), which is upregulated in most bladder tumors.
Resequenced the genomic region of PSCA in 106 individuals of European ancestry.
◦ 27 SNPs were identified.
◦ All of the high-frequency SNPs are in strong LD with rs2294008.
◦ 7 of the SNPs were genotyped in the discovery set, and ORs identical to that of rs2294008 were observed.
Fourth Stage - expansion
Bladder cancer cell line study of rs2294008
◦ the T allele-containing haplotypes showed significantly lower promoter activity
◦ substitution of C to T significantly reduced promoter activity
◦ substitution of T to C increased promoter activity
◦ rs2294008 is a functional variant in vitro
Paradoxical
◦ T allele reduces the transcriptional activity of the PSCA promoter
◦ PSCA has been shown to be overexpressed in bladder tumors
◦ The functional consequence of the T allele in vivo is still unclear
The following slides are from Dr. McKay's presentation at the 2009 INHANCE meeting, Paris
Total sample numbers
Genotypes, replication (total)
Study | Ca | Co
Bremen | 163 | 189
South America | 1,228 | 1,076
Rome | 235 | 222
Seattle | 208 | 413
UNC | 1,288 | 1,362
Penn State | 429 | 685
UCLA | 319 | 934
ORC | 477 | 487
Brown | 568 | 651
Pittsburgh | 633 | 793
Netherlands | 454 | 304
Total replicates | 6,002 | 7,116
Updated: 06/26/09
Discovery phase: 4,429 ca/5,996 co. Replication: 5,322 ca/6,218 co.
Marker | Segment | Gene symbol | Coding status | Reason | Discovery OR (95% CI) | Discovery P | Replication OR (95% CI) | Replication P
rs1573496 | 4q | ADH7 | SYNON | p_all<1x10-5 | 0.70 (0.62-0.79) | 8E-09 | 0.78 (0.71-0.85) | 6E-08
rs7431530 | 3p2 | RBMS3 | - | p_all<1x10-5 | 0.81 (0.74-0.88) | 2E-06 | 0.94 (0.88-1.00) | 0.03
rs4767364 | 12q24.13a | FLJ13089 | - | p_all<1x10-5 | 1.21 (1.12-1.32) | 2E-06 | 1.12 (1.04-1.19) | 0.001
rs10801805 | 1p | ZNF326 | - | p_all<1x10-5 | 1.20 (1.11-1.29) | 3E-06 | 1.03 (0.97-1.09) | 0.29
rs2287802 | 19p | COL5A3 | - | p_all<1x10-5 | 1.19 (1.11-1.28) | 4E-06 | 1.02 (0.96-1.08) | 0.46
rs4799863 | 18q1 | FHOD3 | - | p_all<1x10-5 | 0.84 (0.78-0.91) | 5E-06 | 0.95 (0.90-1.00) | 0.06
rs11067362 | 12q24.21b | TBX3 | - | p_all<1x10-5 | 1.35 (1.19-1.54) | 6E-06 | 1.01 (0.92-1.10) | 0.89
rs2299851 | 6p | MSH5 | - | p_all<1x10-5 | 0.72 (0.62-0.83) | 6E-06 | 1.09 (0.98-1.22) | 0.10
rs1431918 | 8q | ASPH | - | p_all<1x10-5 | 1.19 (1.10-1.28) | 7E-06 | 1.04 (0.98-1.10) | 0.18
rs7924284 | 10q | CWF19L1 | - | p_all<1x10-5 | 1.38 (1.20-1.59) | 8E-06 | 0.97 (0.88-1.07) | 0.59
rs2517452 | 6p | C6orf15 | - | p_oral<5x10-7 | 0.86 (0.80-0.93) | 8E-05 | 1.01 (0.94-1.08) | 0.75
rs16837730 | 1p3 | OPRD1 | - | P_heavy<5x10-7 | 1.35 (1.14-1.60) | 4E-04 | 1.11 (0.98-1.25) | 0.11
rs1041973 | 2q | IL1RL1 | NONSYN | NONSYN p<1x10-4 | 0.83 (0.76-0.90) | 3E-05 | 0.93 (0.87-0.99) | 0.01
rs3810481 | 20q13 | PRIC285 | NONSYN | NONSYN p<1x10-4 | 1.22 (1.11-1.34) | 6E-05 | 1.06 (0.98-1.15) | 0.14
rs2012199 | 1q | FCRL5 | NONSYN | NONSYN p<1x10-4 | 1.24 (1.11-1.38) | 1E-04 | 1.07 (0.99-1.16) | 0.10
rs484870 | 19p | FLJ35784 | NONSYN | NONSYN p<1x10-4 | 1.16 (1.08-1.26) | 1E-04 | 0.98 (0.93-1.04) | 0.59
rs1494961 | 4q | HEL308 | NONSYN | NONSYN p<1x10-4 | 1.15 (1.07-1.24) | 1E-04 | 1.11 (1.05-1.17) | 0.0001
Additional SNPs genotyped, but not selected from GWAS study
rs1229984 | 4q | ADH1B | NONSYN | ADH1B | 0.49 (0.40-0.60) | 2E-12 | 0.68 (0.60-0.78) | 8E-09
rs16969968 | 15q25 | CHRN | NONSYN | Lung cancer smoking variant | 1.04 (0.96-1.13) | 0.351 | 1.09 (1.03-1.15) | 0.002
Alcohol dehydrogenase 7: G → C substitution; Gly → Ala; MAF: Caucasian 0.09, Asian 0-0.5 (?)
RNA binding motif, single stranded interacting protein: C → T substitution; intron; MAF: Caucasian 0.28, Asian 0.41
Chromosome 12 open reading frame 30: A → G substitution; intron; MAF: Caucasian 0.65, Asian 0.03
Interleukin 1 receptor-like 1: C → A substitution; Ala → Glu; MAF: Caucasian 0.13, Asian 0.17
Helicase, POLQ-like: G → A substitution; Val → Ile; MAF: Caucasian 0.48, Asian 0.23
Alcohol dehydrogenase 1B: A → G substitution; His → Arg; MAF: Caucasian 0.99, Asian 0.23
Nicotinic acetylcholine receptors: G → A substitution; Asp → Asn; MAF: Caucasian 0.42, Asian 0.03
Limitations in GWAS
Potential false-positive (and false-negative) results.
Lack of information on gene function.
Insensitivity to rare variants.
Cannot assay insertion/deletion variants.
Requirement of large sample sizes.
Bias due to case and control selection.
Findings are many steps away from actual clinical application.
What have we learned from the GWAS?
International collaboration.
Don't give up on negative results.
Be an active thinker, explore all possibilities.
What's next after GWAS?
Thank you!!
References
Thorgeirsson TE, Geller F, Sulem P, Rafnar T, Wiste A, Magnusson KP,
Manolescu A, Thorleifsson G, Stefansson H, Ingason A, Stacey SN, Bergthorsson JT, Thorlacius S, Gudmundsson J, Jonsson T, Jakobsdottir M, Saemundsdottir J, Olafsdottir O, Gudmundsson LJ, Bjornsdottir G, Kristjansson K, Skuladottir H, Isaksson HJ, Gudbjartsson T, Jones GT, Mueller T, Gottsäter A, Flex A, Aben KK, de Vegt F, Mulders PF, Isla D, Vidal MJ, Asin L, Saez B, Murillo L, Blondal T, Kolbeinsson H, Stefansson JG, Hansdottir I, Runarsdottir V, Pola R, Lindblad B, van Rij AM, Dieplinger B, Haltmayer M, Mayordomo JI, Kiemeney LA, Matthiasson SE, Oskarsson H, Tyrfingsson T, Gudbjartsson DF, Gulcher JR, Jonsson S, Thorsteinsdottir U, Kong A, Stefansson K. A variant associated with nicotine dependence, lung cancer and peripheral arterial disease. Nature. 2008 Apr 3;452(7187):638-42.
Hung RJ, McKay JD, Gaborieau V, Boffetta P, Hashibe M, Zaridze D, Mukeria A, Szeszenia-Dabrowska N, Lissowska J, Rudnai P, Fabianova E, Mates D, Bencko V, Foretova L, Janout V, Chen C, Goodman G, Field JK, Liloglou T, Xinarianos G, Cassidy A, McLaughlin J, Liu G, Narod S, Krokan HE, Skorpen F, Elvestad MB, Hveem K, Vatten L, Linseisen J, Clavel-Chapelon F, Vineis P, Bueno-de-Mesquita HB, Lund E, Martinez C, Bingham S, Rasmuson T, Hainaut P, Riboli E, Ahrens W, Benhamou S, Lagiou P, Trichopoulos D, Holcátová I, Merletti F, Kjaerheim K, Agudo A, Macfarlane G, Talamini R, Simonato L, Lowry R, Conway DI, Znaor A, Healy C, Zelenika D, Boland A, Delepine M, Foglio M, Lechner D, Matsuda F, Blanche H, Gut I, Heath S, Lathrop M, Brennan P. A susceptibility locus for lung cancer maps to nicotinic acetylcholine receptor subunit genes on 15q25. Nature. 2008 Apr 3;452(7187):633-7.
Amos CI, Wu X, Broderick P, Gorlov IP, Gu J, Eisen T, Dong Q,
Zhang Q, Gu X, Vijayakrishnan J, Sullivan K, Matakidou A, Wang Y, Mills G, Doheny K, Tsai YY, Chen WV, Shete S, Spitz MR, Houlston RS. Genome-wide association scan of tag SNPs identifies a susceptibility locus for lung cancer at 15q25.1. Nat Genet. 2008 May;40(5):616-22. Epub 2008 Apr 2.
Wu X, Ye Y, Kiemeney LA, Sulem P, Rafnar T, Matullo G, Seminara D, Yoshida T, Saeki N, Andrew AS, Dinney CP, Czerniak B, Zhang ZF, Kiltie AE, Bishop DT, Vineis P, Porru S, Buntinx F, Kellen E, Zeegers MP, Kumar R, Rudnai P, Gurzau E, Koppova K, Mayordomo JI, Sanchez M, Saez B, Lindblom A, de Verdier P, Steineck G, Mills GB, Schned A, Guarrera S, Polidoro S, Chang SC, Lin J, Chang DW, Hale KS, Majewski T, Grossman HB, Thorlacius S, Thorsteinsdottir U, Aben KK, Witjes JA, Stefansson K, Amos CI, Karagas MR, Gu J. Genetic variation in the prostate stem cell antigen gene PSCA confers susceptibility to urinary bladder cancer. Nat Genet. 2009 Sep;41(9):991-5. Epub 2009 Aug 2.
Pearson TA, Manolio TA. How to interpret a genome-wide association study. JAMA. 2008 Mar 19;299(11):1335-44. Erratum in: JAMA. 2008 May 14;299(18):2150.