Supplementary Methods

Nicotine dependence studies. Our study included European/European American (EUR)

and African American (AA) ever-smokers from 15 independent studies with data obtained from

original study investigators and/or the database of Genotypes and Phenotypes (dbGaP). Five of

the studies (total N=17,074 EUR) were included in our prior GWAS meta-analysis of nicotine

dependence:1 Collaborative Genetic Study of Nicotine Dependence (COGEND)2; Chronic

Obstructive Pulmonary Disease Gene (COPDGene)3; deCODE Genetics; Environment and

Genetics in Lung Cancer Etiology Study (EAGLE)4, 5; and Study of Addiction: Genetics and

Environment (SAGE)6. The present study included another 10 studies: African American

Nicotine Dependence (AAND); COGEND2; Center for Oral Health Research in Appalachia 1

(COHRA1)7; Finnish Twin Cohort (FTC); Genetic Association Information Network (GAIN)

GWAS of schizophrenia8; Jackson Heart Study (JHS) / Atherosclerosis Risk in Communities

(ARIC); Molecular Genetics of Schizophrenia—nonGAIN study; Netherlands Twin Registry

(NTR); University of Wisconsin-Transdisciplinary Tobacco Use Research Center (UW-

TTURC)9; and Yale-Penn. Ever-smokers were defined by having reported smoking 100 or more

cigarettes in their lifetime, unless otherwise stated. As before,1 our analyses centered around

studies with Fagerström Test for Nicotine Dependence (FTND) data available.10, 11

AAND. AAND was designed as an AA-focused, community-based study

to compare nicotine dependent smokers with smokers who never developed

nicotine dependence symptoms. Its participants were recruited from the

Chicago area between 2010 and 2013 and administered the FTND to

determine study eligibility. Genotyping was conducted using the Illumina

Omni Express array. After applying our standard QC steps, the final analysis

data set included 1,687 AAs with data on the FTND phenotype and

covariates (age, sex, and principal component [PC] eigenvectors computed

to remove any residual bias due to population stratification).

COGEND. AAND was modeled after COGEND, a case–control study that began

in 2001 comparing nicotine dependent smokers to smokers who never developed dependence.2

COGEND recruited participants aged 25 to 44 years old from St. Louis and Detroit and

administered the FTND to determine study eligibility. Current smokers with an FTND score of

≥4 were recruited as nicotine dependent cases, while smokers who reported smoking ≥100 cigarettes

during their lifetime but FTND score <1 were recruited as controls. Participants were genotyped

as part of (1) the Study of Addiction: Genetics and Environment (SAGE)6 using the Illumina

Human1M-Duo array (dbGaP accession number phs000092.v1.p1) or (2) Gene Environment

Association Studies Initiative (GENEVA)12 using the Illumina HumanOmni2.5 array (dbGaP

accession number phs000404.v1.p1). In each subset, we retained genotyped SNPs with a call rate

>98% and HWE P>1×10⁻⁴. We then combined the subsets and removed duplicated participants

and first-degree relatives. We carried forward only the SNPs genotyped at the intersection of the

different arrays to circumvent potential bias from combining participants genotyped on different

arrays.13 We applied our standard QC procedures on the combined COGEND subsets, resulting

in 1,935 EAs and 704 AAs with FTND phenotype and covariate (age, sex, and PC eigenvectors)

data for analysis.

COGEND2. COGEND2 was carried out for new recruitment following the COGEND

study design. The EA and AA participants were ascertained between 2011 and 2014 and were

genotyped on the Illumina Omni Express array, alongside the AAND participants.

Following our standard QC, there were 292 EAs and 313 AAs for analysis with FTND

phenotype and covariate (age, sex, and PC eigenvectors) data.

COHRA1. COHRA1 is one part of a four-site study designed to conduct genome-

wide association study (GWAS) analyses of dental caries.7 COHRA1 had FTND data available,

unlike the other dental caries GWAS studies. COHRA1 recruitment began in 2003 from four

rural Appalachian counties, two in West Virginia and two in Pennsylvania, and an urban

Appalachian region. Families with at least one adult and one biological child residing in the same

household were recruited. Genotyping was conducted using the Illumina Human610 array. We

obtained their genotyping data from dbGaP accession number phs000095.v2.p1 and their FTND

data from the original study investigators. In families with FTND data available on both

parent(s) and child, our QC step for relatedness entailed selecting a single family member based

first on FTND data availability and secondly on highest call rate if more than one family member

had FTND data available. Following QC, there were 243 EAs with FTND phenotype and

covariate (age, sex, and PC eigenvectors) data used for analysis. Of these, 12 participants were

<18 years old.
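
The relatedness step described for COHRA1 amounts to a per-family selection rule: prefer a member with FTND data, then break ties by call rate. The following is a minimal sketch of that rule only. The field names (`family_id`, `has_ftnd`, `call_rate`) and the example records are hypothetical and are not taken from the study's data files.

```python
from collections import defaultdict


def select_one_per_family(members):
    """Pick one member per family: FTND data first, then highest call rate.

    `members` is a list of dicts with hypothetical keys
    'id', 'family_id', 'has_ftnd' (bool), and 'call_rate' (float).
    """
    families = defaultdict(list)
    for m in members:
        families[m["family_id"]].append(m)

    selected = []
    for fam_id in sorted(families):
        # Tuples compare left to right: FTND availability, then call rate.
        best = max(families[fam_id],
                   key=lambda m: (m["has_ftnd"], m["call_rate"]))
        selected.append(best["id"])
    return selected


# Made-up records for illustration only.
example = [
    {"id": "A1", "family_id": "A", "has_ftnd": True, "call_rate": 0.991},
    {"id": "A2", "family_id": "A", "has_ftnd": True, "call_rate": 0.998},
    {"id": "B1", "family_id": "B", "has_ftnd": False, "call_rate": 0.999},
    {"id": "B2", "family_id": "B", "has_ftnd": True, "call_rate": 0.985},
]
print(select_one_per_family(example))
```

In family A both members have FTND data, so the higher call rate decides; in family B the member with FTND data is kept despite the lower call rate.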

COPDGene. COPDGene is a multicenter observational study of COPD cases and

controls from across the United States.3 Participants were aged 45 to 80 years old and reported a

history of smoking (currently or past) and 10 or more cigarette pack-years. The present study

included Non-Hispanic white (henceforth referred to as EUR) and AA current smokers with

FTND data available. Among the COPD cases, the Global Initiative for Chronic Obstructive

Lung Disease (GOLD) criteria were used to stage disease severity based on post-bronchodilator

pulmonary function measures. COPD controls had pulmonary function measures in the normal

range for their sex, age and height. Acute and chronic respiratory disease, cancer and other

conditions were used as exclusion criteria. Participants were genotyped on the Illumina

HumanOmni1-Quad array, and their data were made available via dbGaP (accession number phs000765.v1.p2).

Following QC, we analyzed 2,211 EAs and 2,115 AAs with FTND phenotype and covariate

(age, sex, disease severity [stages of 0, 1-2, and 3-4], and PC eigenvectors) data.

deCODE Genetics. deCODE Genetics, a large population-based study from Iceland

collected over 18 years (1996–2014), included participants that were originally recruited for

different genetic studies of smoking-related and other phenotypes along with population controls.

The deCODE Genetics study was approved by the Data Protection Commission of Iceland and

the National Bioethics Committee of Iceland. The participants’ cigarette smoking data, as

previously described,14 were gathered using questionnaire data on cigarettes per day (CPD) and

the other FTND components. A third-party system was used to encrypt personal identifiers

associated with phenotypic information and blood samples.15 As used before,1 the present study

included 9,090 smokers who were genotyped using Illumina SNP array chips in one of several

GWAS conducted by deCODE Genetics. Their QC steps were previously described.16 Among

those classified as having mild dependence, 1,558 had FTND data and another 4,313 were low-

intensity smokers who had CPD data available (see “Categorical nicotine dependence

phenotype” for additional details) as well as age and sex to include as covariates.

EAGLE. The population-based EAGLE study recruited newly diagnosed lung cancer

cases and matched controls, aged 35 to 79 years old, from the Italian region of Lombardy.4, 5

EAGLE participants were genotyped on the Illumina HumanHap550v3 array, as part of

GENEVA.12 Their data were deposited in the database of Genotypes and Phenotypes (dbGaP;

accession number phs000093.v2.p2), where we obtained their genome-wide SNP genotypes,

overall FTND score among current and former smokers, and other phenotype and covariate data.

Specific FTND item analyses were conducted by the original study investigators. After applying

all participant-level and SNP-level quality control (QC) procedures that were recommended in

the dbGaP release as well as our own set of standard QC procedures, the final analysis data set

for EAGLE included 3,006 participants with FTND phenotype and covariate (age, sex, lung

cancer case/control status, and PC eigenvectors) data. As before,1 since FTND data were

collected based on lifetime and not current smoking habits, lung cancer case/control status was

included as a covariate.

FTC. Participants originated from the following sub-cohorts of the FTC: (i) the Nicotine

Addiction Genetics (NAG) study of adult twins, who were born between 1938 and 1957 and

were concordant for ever smoking, and their family members (mainly siblings); (ii) a population-

based longitudinal study of five consecutive Finnish twin birth cohorts (1983–1987,

FinnTwin12); and (iii) a population‐based longitudinal study of five consecutive Finnish twin

birth cohorts (1975–1979, FinnTwin16).17, 18 Genotyping was conducted on the Illumina

Human670-QuadCustom or HumanCoreExome array. Their QC procedures have been

previously described.19 The FTC sub-cohorts were imputed separately by genotyping array and

then all available SNPs were merged for analysis with altogether 2,374 participants having

FTND phenotype and covariate (age, sex, birth cohort, and PC eigenvectors) data.

GAIN. The GAIN study originated from the U.S.-based Molecular Genetics of Schizophrenia

study.8 Their genotyping was conducted separately using the Affymetrix 6.0 array. We obtained

the data from the Molecular Genetics of Schizophrenia study participants genotyped under the

auspices of GAIN via dbGaP accession number phs000021.v3.p2. Using only the schizophrenia

controls, we applied our standard QC procedures, resulting in the inclusion of 774 EAs and 477

AAs with FTND phenotype and covariate (age, sex, and PC eigenvectors) data.

JHS/ARIC. JHS is a longitudinal cohort study of AAs from the Jackson, Mississippi

metropolitan area.20 AAs aged 35 to 84 years old were recruited from the general population,

along with family members aged 21 to 34 years old. Its study design was an extension of the

ARIC longitudinal cohort study that ascertained EA and AA participants (aged 45–64) from four

U.S. communities (Jackson, Mississippi; Forsyth County, North Carolina; suburbs of

Minneapolis, Minnesota; and Washington County, Maryland).21 In both JHS and ARIC, ever-

smokers were defined by report of smoking 400 or more cigarettes in their lifetime. Our study

focused on AA ever-smokers across JHS and ARIC. Like the approach taken in deCODE, we

used the participants with FTND data (all from JHS), supplemented with low-intensity smokers

from ARIC with only CPD data available (see “Categorical nicotine dependence phenotype” for

additional details).

We obtained genome-wide SNP genotype data, generated on the Affymetrix 6.0 array, via

dbGaP accession numbers phs000090.v1.p1 for ARIC and phs000286.v3.p1 for JHS. After

applying our standard QC, there were 1,143 AAs with FTND or CPD phenotype and covariate (age,

sex, and PC eigenvectors) data (N=628 from JHS and 515 from ARIC). JHS included

participants originally recruited under ARIC and others recruited under JHS only, but our QC

confirmed that there were no duplicated participants across the JHS and ARIC subsets used for

analysis (as confirmed by identity-by-state <0.9).

nonGAIN. The nonGAIN study originated from the same U.S.-based Molecular Genetics

of Schizophrenia study as the GAIN study. Genotyping data from the remainder of the

Molecular Genetics of Schizophrenia study8 participants (denoted nonGAIN) were obtained via

dbGaP accession number phs000167.v1.p1. We used only the schizophrenia controls, applied

our standard QC, and included 471 EAs with complete FTND and covariate (age, sex, and PC

eigenvectors) data in the final analysis data set.

NTR. Established in 1987, the NTR recruited twins and other multiple birth siblings for

longitudinal research in two collections: the Adult NTR including twins and their family

members, and the Young NTR including twins recruited at birth or in early life and their parents

and sibs.22 A subset of NTR participants have genome-wide genotype data derived from various

Affymetrix and Illumina arrays.23, 24 QC procedures have been described elsewhere.24 SNPs that

passed all QC checks were merged across different arrays and used for imputation. NTR filtered

out imputed SNPs with MAF <0.5%, HWE P<1×10⁻⁵, Rsq <0.3, Mendelian error rate >2%, or

absolute reference allele frequency difference >0.15 between NTR and 1000 Genomes (1000G).

We analyzed data from 3,599 NTR participants who passed QC and had FTND25 and covariate

information (age, sex, dummy variables to correct for genotyping array, and PC eigenvectors)

available.

SAGE*. SAGE encompasses three U.S.-based case–control studies of addictive disorders:

COGEND, the Collaborative Study on the Genetics of

Alcoholism,26 and the Family Study of Cocaine Dependence.27 COGEND participants were

removed to avoid redundancy in our study; the remaining participants (henceforth referred to as

SAGE*) were analyzed together as done in prior GWAS analyses.1, 6 We obtained their Illumina

Human1M-Duo genotype and phenotype data via dbGaP (accession number phs000092.v1.p1)

and applied our standard QC procedures, leaving 832 EAs and 633 AAs for analysis with FTND

phenotype data and covariates (age, sex, DSM-IV-defined cocaine dependence, DSM-IV-defined

alcohol dependence, and PC eigenvectors).

UW-TTURC. Beginning in 2001, UW-TTURC recruited smokers in Madison and

Milwaukee, Wisconsin for smoking cessation treatment clinical trials.9 Participants smoked at

least 10 CPD and reported being motivated to quit smoking. We obtained their Illumina

HumanOmni2.5 genotypes, FTND scores and other phenotypic data via dbGaP accession

number phs000404.v1.p1. After applying standard QC, there were 1,534 EAs and 247 AAs

included in our study with FTND phenotype and covariate (age, sex, and PC eigenvectors) data.

Yale-Penn. The Yale-Penn study included unrelated individuals (mostly) and small

nuclear families who were recruited from the eastern United States to carry out genetic studies

for alcohol, cocaine or opioid dependence.28-30 Nicotine dependence was not used as a basis for

subject selection, but FTND data were collected among smokers. Genotyping was conducted on

the Illumina HumanOmni1-Quad array. Following QC as previously described,31 there were

2,116 EAs and 2,115 AAs for analysis with the FTND phenotype and covariates (age, sex, DSM-

IV-defined cocaine dependence, DSM-IV-defined alcohol dependence, and PC

eigenvectors).

QC. Unless otherwise stated, we applied our own standard set of QC procedures to

remove participants with a missing rate >3%, sample duplication (identity-by-state >90%), first-

degree relatedness (identity-by-descent >40%), gender discordance (FST <0.2 for chromosome

X SNPs to confirm females and FST >0.8 to confirm males), excessive homozygosity (FST >0.5

or FST <-0.2), or chromosomal anomalies and to remove SNPs with a missing rate >3% or Hardy-

Weinberg equilibrium (HWE) P<1×10⁻⁴. Across all studies, participants with missing covariate

or phenotype data were excluded prior to genome-wide association study (GWAS) analysis.

Supplementary Table 1 provides additional details on the numbers of genotyped SNPs and

participants passing QC and included for imputation and GWAS analysis.
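
The thresholds in this paragraph can be read as simple pass/fail rules. The sketch below encodes them for illustration and is not the pipeline the authors ran. All dictionary keys are hypothetical. The statistic written as FST in the text is stored as `f_chrx` (chromosome X) and `f_genome`, and treating the homozygosity check as genome-wide is an assumption. Duplication and relatedness are simplified to a per-person maximum pairwise value, and the chromosomal anomaly check is omitted.

```python
# Thresholds taken from the QC paragraph above.
MAX_MISSING_RATE = 0.03
MAX_IBS = 0.90   # above this: treated as sample duplication
MAX_IBD = 0.40   # above this: treated as first-degree relatedness
MIN_HWE_P = 1e-4


def participant_passes_qc(p):
    """Apply the participant-level thresholds to one record.

    `p` is a dict with hypothetical keys. 'max_ibs' and 'max_ibd' are the
    largest pairwise values against any other participant (a simplification).
    """
    if p["missing_rate"] > MAX_MISSING_RATE:
        return False
    if p["max_ibs"] > MAX_IBS or p["max_ibd"] > MAX_IBD:
        return False
    # Sex check: reported females need f_chrx < 0.2, reported males > 0.8.
    if p["reported_sex"] == "F" and not p["f_chrx"] < 0.2:
        return False
    if p["reported_sex"] == "M" and not p["f_chrx"] > 0.8:
        return False
    # Excessive homozygosity (assumed here to be a genome-wide value).
    if p["f_genome"] > 0.5 or p["f_genome"] < -0.2:
        return False
    return True


def snp_passes_qc(s):
    """Apply the SNP-level thresholds (missing rate and HWE P value)."""
    return s["missing_rate"] <= MAX_MISSING_RATE and s["hwe_p"] >= MIN_HWE_P


# Made-up records for illustration only.
ok = {"missing_rate": 0.01, "max_ibs": 0.75, "max_ibd": 0.05,
      "reported_sex": "F", "f_chrx": 0.03, "f_genome": 0.01}
mismatch = dict(ok, f_chrx=0.95)  # reported female, chromosome X value looks male
print(participant_passes_qc(ok), participant_passes_qc(mismatch))
print(snp_passes_qc({"missing_rate": 0.002, "hwe_p": 0.31}))
```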

1000G imputation. Autosomal SNPs passing QC were used as input to pre-phase study

genotypes using SHAPEIT2 and then to perform imputation, using IMPUTE232 (except where

indicated in Supplementary Table 1) with reference to the 1000G ALL phase I integrated variant

set, as previously described.1 Prior to meta-analysis, NTR excluded imputed SNPs using a

composite quality measure that included a check for Mendelian inconsistencies. For all other

studies, we used the software-generated quality metrics to evaluate the SNP/indel imputation

quality post hoc rather than imposing an a priori imputation quality filter.

FTND and the categorical nicotine dependence phenotype. As in our previous study,1

our analyses focused on nicotine dependence as defined by the FTND. The FTND is a six-item

questionnaire that queries (1) time to first cigarette in the morning, (2) refraining from smoking

in forbidden places, (3) cigarette most hated to give up, (4) CPD, (5) smoking during the first

hours after waking vs. the rest of the day, and (6) smoking while ill. FTND scores range from 0

(no dependence) to 10 (highest dependence level). We used lifetime FTND10, 11 collected among

current and former smokers (score based on the time period that the participants reported

smoking the most) in AAND, EAGLE, COGEND, COGEND2, deCODE, FTC, GAIN,

nonGAIN, NTR, SAGE*, and Yale-Penn and current FTND (score at the time of interview

among current smokers) in COHRA1, COPDGene, and UW-TTURC. For these studies, we

defined nicotine dependence categorically using the FTND: scores of 0 to 3 for mild, 4 to 6 for

moderate, and 7 to 10 for severe. Two of the 15 studies (JHS/ARIC and deCODE) included

participants who had FTND data available and supplemented with additional low-intensity

smoker participants who had only CPD data that we used as a proxy measure to define mild

dependence (CPD ≤10) as we did before.1 For JHS/ARIC, we included 352 JHS participants with

FTND score 0–3 and another 515 smokers from ARIC with no FTND data available but with

CPD ≤10 reported. For deCODE, we included 1,558 participants with FTND score 0–3 and

another 4,313 smokers with no FTND data available but with CPD ≤10 reported. Moderate and

severe dependence were defined solely by FTND scores, because of our examination of the

concordance of our FTND and CPD categories among participants in two studies (4,947

deCODE, 5,803 COGEND EA, and 2,075 COGEND AA participants); all COGEND

participants were included in this phenotype analysis, compared to the subset included in our

GWAS analysis with genotypes available. Overall, we found high concordance between a report

of CPD ≤10 and FTND scores 0–3 (86.4%). We observed lower concordance between CPD=11–

20 and FTND scores 4–6 (52.7%) and between CPD ≥21 and FTND scores 7–10 (50.4%), which

largely resulted from classifying participants at a higher level of dependence (e.g., classifying

participants as severe according to CPD, when many would be classified as moderate according

to FTND) (Supplementary Figure 8). These results suggest that CPD can be used as a proxy to

increase sample size with little misclassification under the mild dependence category. Any

observed phenotype misclassification would be expected to underestimate the SNP associations

with dependence (i.e., conservatively bias results), reducing statistical power and attenuating

effect size estimates.33, 34
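
The category definitions in this paragraph can be written as a small function. This is a sketch under stated assumptions, not the authors' code. The function name and the `mild_cpd_max` parameter are hypothetical, and counting exactly 10 CPD as mild is an assumption about how the boundary is handled. CPD is used only as a proxy for the mild category, and only when no FTND score is available.

```python
# Numeric coding used for the categorical phenotype in the association tests.
CATEGORY_CODE = {"mild": 0, "moderate": 1, "severe": 2}


def dependence_category(ftnd=None, cpd=None, mild_cpd_max=10):
    """Return 'mild', 'moderate', 'severe', or None if unclassifiable.

    FTND takes priority: 0-3 mild, 4-6 moderate, 7-10 severe.
    Without FTND, a low CPD value is accepted as a proxy for mild only;
    higher CPD values are never used to assign moderate or severe.
    """
    if ftnd is not None:
        if not 0 <= ftnd <= 10:
            raise ValueError("FTND scores range from 0 to 10")
        if ftnd <= 3:
            return "mild"
        if ftnd <= 6:
            return "moderate"
        return "severe"
    if cpd is not None and cpd <= mild_cpd_max:
        return "mild"
    return None


print(dependence_category(ftnd=5))          # FTND available
print(dependence_category(cpd=5))           # CPD proxy, low-intensity smoker
print(dependence_category(cpd=25))          # CPD alone cannot assign severe
```

Returning None for a high CPD value without FTND data mirrors the decision to define moderate and severe dependence from FTND scores alone.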

Statistical analyses. In each ancestry group and study separately, we used linear

regression to test SNP/indel associations with categorical nicotine dependence (mild=0 /

moderate=1 / severe=2). Covariates included age, sex, principal components (the number needed

to account for >75% of the variability in nicotine dependence among the first 10 components),

and study-specific covariates, as needed, to account for the original study’s ascertainment

scheme (COPD disease severity in COPDGene, lung cancer case/control status in EAGLE, birth

cohort in FTC, and DSM-IV-defined alcohol and cocaine dependence in SAGE* and Yale-

Penn).
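
For a single variant, the model described above is an ordinary least squares fit of the 0/1/2 phenotype on the SNP genotype (or imputed dosage) plus covariates. The sketch below illustrates that model with NumPy on synthetic data. It is not the software used by the studies, the function name `snp_association` is hypothetical, and only age and sex are included as covariates for brevity.

```python
import numpy as np


def snp_association(dosage, phenotype, covariates):
    """OLS of phenotype on SNP dosage plus covariates.

    Returns the SNP effect estimate (beta) and its standard error.
    """
    n = len(phenotype)
    # Design matrix: intercept, SNP dosage, then covariates.
    X = np.column_stack([np.ones(n), dosage, covariates])
    beta, _, _, _ = np.linalg.lstsq(X, phenotype, rcond=None)
    resid = phenotype - X @ beta
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov_beta[1, 1])


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
n = 500
dosage = rng.binomial(2, 0.3, size=n).astype(float)
age = rng.normal(45, 10, size=n)
sex = rng.integers(0, 2, size=n).astype(float)
covariates = np.column_stack([age, sex])

# Phenotype coded mild=0 / moderate=1 / severe=2 from a latent score.
latent = 0.3 * dosage + rng.normal(0, 1, size=n)
phenotype = np.digitize(latent, [0.0, 1.0]).astype(float)

beta, se = snp_association(dosage, phenotype, covariates)
print(f"beta={beta:.3f}, se={se:.3f}, z={beta / se:.2f}")
```

Study-specific covariates and principal components would enter as additional columns of `covariates`.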

Three studies included relatives and thus required adjustment for family structure—FTC,

NTR, and Yale-Penn. For FTC and Yale-Penn, kinship matrices were computed and then used to

take into account relatedness when conducting GWAS analyses with the --mmscore option in

ProbABEL.35 We observed deflation of test statistics resulting from this conservative adjustment in

Yale-Penn, and we used genomic control to reverse this deflation as recommended by the

ProbABEL developers.36 The third family-based study, NTR, performed GWAS analyses in

PLINK37 using a sandwich correction to account for familial relatedness.
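
Genomic control in its standard form rescales 1-degree-of-freedom chi-square statistics by λ, the ratio of their observed median to the expected median under the null (about 0.455). When the statistics are deflated, λ is below 1 and the rescaling increases them. The sketch below shows that standard calculation. Whether the Yale-Penn correction was applied in exactly this way is an assumption, and the input values are made up.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chisq):
    """Return (lambda, rescaled statistics) for 1-df chi-square statistics."""
    lam = statistics.median(chisq) / CHI2_1DF_MEDIAN
    return lam, [c / lam for c in chisq]


# Made-up, deflated test statistics for illustration only.
deflated = [0.1, 0.2, 0.3, 0.4, 0.5]
lam, corrected = genomic_control(deflated)
print(f"lambda={lam:.3f}")
print(statistics.median(corrected))
```

After rescaling, the median of the corrected statistics equals the expected null median by construction.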

Fetal brain study used for SNP-methylation association analysis. This study included

166 fetal brain samples, encompassing both males and females and ranging from 56 to 169 days

post-conception, which were obtained from the Human Developmental Biology Resource and UK

Medical Research Council Brain Banks network. We used their previously published results to

assess SNP associations with DNA methylation (DNAm). Genotyping was conducted using the

Illumina HumanOmniExpress BeadChip and was used as input for 1000G (phase 3) imputation.

DNAm was measured using the Illumina HumanMethylation450 BeadChip. As previously

described,38 methylation quantitative trait locus (meQTL) results were generated using linear

regression modeling with additive SNP genotype as the predictor, DNAm β value as the

outcome, and covariates for age, sex and the first two genotype-derived principal components.

SNP-methylation association results at P<1×10⁻⁸ were made available at

http://epigenetics.essex.ac.uk/mQTL/.

Postmortem brain studies used for SNP-expression association analyses. To assess

top nicotine dependence-associated variation for expression QTL (eQTL) effects, we used the

Genotype-Tissue Expression (GTEx) project39, 40 (v6p release) that includes 1000G (phase I)-

imputed genotype data and RNA sequencing (RNAseq) measures across 13 postmortem human

brain tissues—amygdala (N=62 with genotype and RNAseq data), anterior cingulate cortex

(N=72), caudate (N=100), cerebellar hemisphere (N=89), cerebellum (N=103), cortex (N=96),

frontal cortex (N=92), hippocampus (N=81), hypothalamus (N=81), nucleus accumbens (N=93),

putamen (N=82), spinal cord (N=59), and substantia nigra (N=56). Gene RNA expression

(RNAexp) levels by tissue sample were measured as reads per kilobase transcript per million

reads (RPKM). Genotype data were derived from the Illumina Omni2.5 or Omni5 array and

obtained via dbGaP accession number phs000424.v6.p1. This GTEx release was comprised of

adults (20+ years old); 34.4% females; and 84.3% EAs, 13.7% AAs, 1.0% Asian, and 1.0%

unreported ancestry. The most common cause of death was traumatic injury among the youngest

subjects (54.3%, 20–39 years old) and heart disease among the oldest subjects (37.6%, 60–71

years old). Other causes of death included cerebrovascular, liver, renal, respiratory, and

neurological diseases. Smoking data were not available. Their QC, RNAseq pre-processing and

quantification, and analyses were described previously.40, 41 Briefly, linear regression

models were used to test associations of SNP genotypes with reads per kilobase transcript per

million reads (RPKM)-based and normalized RNAexp levels of nearby genes (+/- 1Mb around

the transcription start sites), adjusting for 3 genotyping principal components, genotyping array,

gender, and probabilistic estimation of expression residuals (PEER) factors to account for

technical artifacts in the RNAseq data. Tissues with a sample size <70 were not included in the

eQTL analysis for this release, resulting in 10 of the 13 brain tissues being used for analysis.
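
The sample-size rule can be checked directly against the tissue counts listed above. The sketch below applies the cutoff of 70 to those counts; the variable names are hypothetical.

```python
# Brain tissue sample sizes as listed above (genotype and RNAseq data).
GTEX_BRAIN_N = {
    "amygdala": 62,
    "anterior cingulate cortex": 72,
    "caudate": 100,
    "cerebellar hemisphere": 89,
    "cerebellum": 103,
    "cortex": 96,
    "frontal cortex": 92,
    "hippocampus": 81,
    "hypothalamus": 81,
    "nucleus accumbens": 93,
    "putamen": 82,
    "spinal cord": 59,
    "substantia nigra": 56,
}
MIN_SAMPLE_SIZE = 70  # tissues with fewer samples were not analyzed

included = [t for t, n in GTEX_BRAIN_N.items() if n >= MIN_SAMPLE_SIZE]
excluded = [t for t, n in GTEX_BRAIN_N.items() if n < MIN_SAMPLE_SIZE]

print(len(included), "tissues included")
print("excluded:", excluded)
```

The three excluded tissues are amygdala, spinal cord, and substantia nigra, which leaves the 10 tissues stated in the text.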

We followed up SNPs significantly associated with RNAexp from GTEx for independent

replication in the Brain eQTL Almanac (http://www.braineac.org/) data set to evaluate their

associations with RNAexp probes that were measured across 10 postmortem human brain

tissues, many of which overlap with the tissues represented in GTEx—cerebellar cortex (outer

layer of the cerebellum), frontal cortex, hippocampus, inferior olivary nucleus, intralobular white

matter, occipital cortex, putamen, substantia nigra, temporal cortex, and thalamus.42 The

postmortem measurements were made in the UK Brain Expression Consortium, comprised of

134 EUR decedents who were mostly adults free of neurodegenerative disorders. No data were

available on their smoking or any other substance use histories. Illumina Omni1-Quad and

Immunochip arrays were used for SNP genotyping, which was followed by 1000G (phase I)

imputation. RNAexp levels were measured using Affymetrix Human Exon 1.0 ST arrays. As

previously described,42 for SNPs within +/- 1Mb of each gene’s transcription start site, genotypes

were tested for association with RNAexp levels that had been normalized and adjusted for brain

bank, gender and batch effects.

Acknowledgements

AAND was supported by R01 DA025888 (Co-PIs Eric Johnson and Laura Bierut).

COGEND and COGEND2 were supported by grants from the National Cancer Institute

(NCI; grant number P01 CA089392, PI: Laura Bierut) and NIDA (R01 DA036583 and R01

DA025888, PI: Laura Bierut), both of NIH. Genotype data are available via dbGaP as part of the

“Genetic Architecture of Smoking and Smoking Cessation” (accession number

phs000404.v1.p1) and “Study of Addiction: Genetics and Environment (SAGE)” (accession

number phs000092.v1.p1). Funding support for genotyping, which was performed at CIDR, was

provided by 1 X01 HG005274-01 and by the NIH Genes, Environment and Health Initiative

[GEI] (U01 HG004422). CIDR is fully funded through a federal contract from the NIH to The

Johns Hopkins University, contract number HHSN268200782096C. Assistance with genotype

cleaning, as well as with general study coordination, was provided by the GENEVA

Coordinating Center (U01 HG004446).

Funding support for the Center for Oral Health Research in Appalachia (COHRA) Dental

Caries GWAS study (cohort designation COHRA1) was provided by the NIH, National Institute

of Dental and Craniofacial Research (NIDCR) grant number U01 DE018903 (PI: MLM). Data

and samples were provided by COHRA, a collaboration of the University of Pittsburgh and West

Virginia University funded by NIDCR R01 DE014899 (PIs: MLM and DWM). Genotype and

phenotype data are available through dbGaP (accession number phs000095.v2.p1).

The COPDGene® project was supported by award numbers R01 HL089897 and R01

HL089856 from the NHLBI. Research reported in this publication was also supported by the

NHLBI award number K01 HL125858. The content is solely the responsibility of the authors

and does not necessarily represent the official views of the NHLBI or NIH. The COPDGene®

project is also supported by the COPD Foundation

through contributions made to an Industry Advisory Board comprised of AstraZeneca,

Boehringer Ingelheim, Novartis, Pfizer, Siemens, Sunovion, and GlaxoSmithKline. The authors

acknowledge investigators of the COPDGene® project core units: Administrative (James Crapo

[Principal Investigator], Edwin Silverman [Principal Investigator], Barry Make, and

Elizabeth Regan); Genetic Analysis (Terri Beaty, Nan Laird, Christoph Lange, Michael Cho,

Stephanie Santorico, Dawn DeMeo, Nadia Hansel, Craig Hersh, Peter Castaldi, Merry-Lynn

McDonald, Emily Wan, Megan Hardin, Jacqueline Hetmanski, Margaret Parker, Marilyn

Foreman, Brian Hobbs, Robert Busch, Adel El-Bouiez, Megan Hardin, Dandi Qiao, Elizabeth

Regan, Eitan Halper-Stromberg, Ferdouse Begum, Sungho Won); Imaging (David Lynch,

Harvey Coxson, MeiLan Han, Eric Hoffman, Stephen Humphries, Francine Jacobson, Philip

Judy, Ella Kazerooni, John Newell, Jr., Elizabeth Regan, James Ross, Raul San Jose Estepar,

Berend Stoel, Juerg Tschirren, Eva van Rikxoort, Bram van Ginneken, George Washko, Carla

Wilson, Mustafa Al Qaisi, Teresa Gray, Alex Kluiber, Tanya Mann, Jered Sieren, Douglas

Stinson, Joyce Schroeder, Edwin Van Beek); Pulmonary Function Testing Quality Assurance

(Robert Jensen); Data Coordinating Center and Biostatistics (Douglas Everett, Anna Faino, Matt

Strand, Carla Wilson); and Epidemiology (Jennifer Black-Shinn, Gregory Kinney, Katherine

Pratte). The authors also acknowledge the clinical center investigators: Jeffrey Curtis, Carlos

Martinez, Perry G. Pernicano, Nicola Hanania, Philip Alapat, Venkata Bandi, Mustafa Atik,

Aladin Boriek, Kalpatha Guntupalli, Elizabeth Guy, Amit Parulekar, Arun Nachiappan, Dawn

DeMeo, Craig Hersh, George Washko, Francine Jacobson, R. Graham Barr, Byron Thomashow,

John Austin, Belinda D’Souza, Gregory D.N. Pearson, Anna Rozenshtein, Neil MacIntyre, Jr.,

Lacey Washington, H. Page McAdams, Charlene McEvoy, Joseph Tashjian, Robert Wise, Nadia

Hansel, Robert Brown, Karen Horton, Nirupama Putcha, Richard Casaburi, Alessandra Adami,

Janos Porszasz, Hans Fischer, Matthew Budoff, Dan Cannon, Harry Rossiter, Amir

Sharafkhaneh, Charlie Lan, Christine Wendt, Brian Bell, Marilyn Foreman, Gloria Westney,

Eugene Berkowitz, Russell Bowler, David Lynch, Richard Rosiello, David Pace, Gerard Criner,

David Ciccolella, Francis Cordova, Chandra Dass, Robert D’Alonzo, Parag Desai, Michael

Jacobs, Steven Kelsen, Victor Kim, A. James Mamary, Nathaniel Marchetti, Aditti Satti, Kartik

Shenoy, Robert M. Steiner, Alex Swift, Irene Swift, Gloria Vega-Sanchez, Mark Dransfield,

William Bailey, J. Michael Wells, Surya Bhatt, Hrudaya Nath, Joe Ramsdell, Paul Friedman,

Xavier Soler, Andrew Yen, Alejandro Cornellas, John Newell, Jr., Brad Thompson, MeiLan

Han, Ella Kazerooni, Fernando Martinez, Joanne Billings, Tadashi Allen, Frank Sciurba, Divay

Chandra, Joel Weissfeld, Carl Fuhrman, Jessica Bon, Antonio Anzueto, Sandra Adams, Diego

Maselli-Caceres, and Mario Ruiz.

The deCODE work was supported in part by NIDA by grants R01 DA035825 (Principal

Investigator [PI]: Dana Hancock) and R01 DA017932 (PI: Kari Stefansson). deCODE thanks the

participants in the genetic studies whose contributions made this work possible.

Funding support for the Environment and Genetics in Lung Cancer Etiology (EAGLE)

study was provided through the NIH GEI (Z01 CP 010200). The human participants

participating in this study and its companion “Prostate, Lung Colon and Ovary Screening Trial”

are supported by intramural resources of the NCI. Assistance with phenotype harmonization and

genotype cleaning, as well as with general study coordination, was provided by the GENEVA

Coordinating Center (U01 HG004446). Assistance with data cleaning was provided by the

National Center for Biotechnology Information. Funding support for genotyping, which was

performed at The Johns Hopkins University’s Center for Inherited Disease Research (CIDR),

was provided by the NIH GEI (U01 HG004438). The datasets used for the analyses described in

this manuscript were obtained from the database of Genotypes and Phenotypes (dbGaP,

http://www.ncbi.nlm.nih.gov/gap) through accession number phs000093.v2.p2.

The FTC authors warmly thank the participating twin pairs and their family members of

the Finnish Twin Cohort Study for their contribution. We would like to express our appreciation

to the skilled study interviewers A-M Iivonen, K Karhu, H-M Kuha, U Kulmala-Gråhn, M

Mantere, K Saanakorpi, M Saarinen, R Sipilä, L Viljanen, and E Voipio. Anja Häppölä and

Kauko Heikkilä are acknowledged for their valuable contribution in recruitment, data collection,

and data management. Phenotyping and genotyping of the Finnish twin cohorts were supported by

the Academy of Finland Center of Excellence in Complex Disease Genetics (grants 213506,

129680), the Academy of Finland (grants 100499, 205585, 118555, 141054, 265240, 263278 and

264146 to Jaakko Kaprio), NIH/NIAAA (grant numbers R37 AA012502, K05 AA000145, and

R01 AA009203 to R. J. Rose and R01 AA015416 and K02 AA018755 to D. M. Dick),

NIH/NIDA (grant number R01 DA012854, PI: Pamela A. F. Madden), Sigrid Juselius

Foundation (to Jaakko Kaprio), Global Research Award for Nicotine Dependence, Pfizer Inc. (to

Jaakko Kaprio), and the Wellcome Trust Sanger Institute, UK. Antti-Pekka Sarin and Samuli

Ripatti are acknowledged for genotype data quality controls and imputation. Association

analyses were run at the ELIXIR Finland node hosted at CSC – IT Center for Science for ICT

resources.

Funding support for the Genetic Association Information Network (GAIN) Genome-

Wide Association of Schizophrenia Study was provided by the National Institute of Mental

Health (NIMH) (R01 MH67257, R01 MH59588, R01 MH59571, R01 MH59565, R01

MH59587, R01 MH60870, R01 MH59566, R01 MH59586, R01 MH61675, R01 MH60879, R01

MH81800, U01 MH46276, U01 MH46289, U01 MH46318, U01 MH79469, and U01 MH79470)

and the genotyping of samples was provided through GAIN. The datasets used for the analyses

described in this manuscript were obtained from dbGaP through accession number

phs000021.v3.p2. Samples and associated phenotype data for the Genome-Wide Association of

Schizophrenia Study were provided by the Molecular Genetics of Schizophrenia Collaboration

(PI: Pablo V. Gejman, Evanston Northwestern Healthcare (ENH) and Northwestern University,

Evanston, IL, USA).

JHS (dbGaP accession no. phs000286.v3.p1) is supported and conducted in collaboration

with Jackson State University (HHSN268201300049C and HHSN268201300050C), Tougaloo

College (HHSN268201300048C), and the University of Mississippi Medical Center

(HHSN268201300046C and HHSN268201300047C) contracts from the NHLBI and the

National Institute for Minority Health and Health Disparities (NIMHD). Funding for genotyping

was provided by NHLBI contract N01-HC-65226. ARIC (dbGaP accession number

phs000090.v1.p1) is carried out as a collaborative study supported by National Heart, Lung, and

Blood Institute (NHLBI) contracts N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-

55019, N01-HC-55020, N01-HC-55021, N01-HC-55022, R01 HL087641, R01 HL59367 and

R01 HL086694; National Human Genome Research Institute (NHGRI) contract U01 HG004402;

and National Institutes of Health (NIH) contract HHSN268200625226C. The authors thank the

staff and participants of the ARIC study for their important contributions. Infrastructure was

partly supported by grant UL1RR025005, a component of the NIH and NIH Roadmap for

Medical Research. Genotyping was conducted by GENEVA. Funding for GENEVA was

provided by NHGRI grant U01 HG004402 (E. Boerwinkle).

Funding support for the Molecular Genetics of Schizophrenia - nonGAIN study was

provided by Genomics Research Branch at NIMH, and the genotyping and analysis of samples

was also provided through GAIN and under the MGS U01s: MH79469 and MH79470.

Assistance with data cleaning was provided by NCBI. The datasets used for the analyses

described in this manuscript were obtained via dbGaP accession number phs000167.v1.p1.

Samples and associated phenotype data for the nonGAIN study were collected under the

following grants: NIMH Schizophrenia Genetics Initiative U01s: MH46276 (CR Cloninger),

MH46289 (C Kaufmann), and MH46318 (MT Tsuang); and MGS Part 1 (MGS1) and Part 2

(MGS2) R01s: MH67257 (NG Buccola), MH59588 (BJ Mowry), MH59571 (PV Gejman),

MH59565 (Robert Freedman), MH59587 (F Amin), MH60870 (WF Byerley), MH59566 (DW

Black), MH59586 (JM Silverman), MH61675 (DF Levinson), and MH60879 (CR Cloninger).

Further details of collection sites, individuals, and institutions may be found in data supplement

Table 1 of Sanders et al. (2008; PMID: 18198266) and at the study dbGaP pages.

We thank all NTR participants whose data we analyzed in this study. Camelia C. Minica

and Michael M. Neale are supported by the NIDA grant R01 DA018673. We acknowledge the

Netherlands Organisation for Scientific Research (NWO) and the Netherlands Organisation for

Health Research and Development (ZonMW Addiction 31160008; ZonMW 940-37-024;

NWO/SPI 56-464-14192; NWO-400-5-717; NWO-MW 904-61-19; NWO-MagW 480-4-004;

NWO-Veni 016-115-035), the EMGO+ Institute for Health and Care Research, the

Neuroscience Campus Amsterdam, BBMRI–NL (184.021.007: Biobanking and Biomolecular

Resources Research Infrastructure), the Avera Institute, Sioux Falls, South Dakota (USA) and

the European Research Council (230374, 284167) for support. Genotyping was funded in part by

grants from the National Institutes of Health (4R37DA018673-6, RC2 MH089951). The

statistical analyses were carried out on the Genetic Cluster Computer

(http://www.geneticcluster.org) which is supported by the Netherlands Scientific Organization

(NWO 480-5-003), the Dutch Brain Foundation and the Department of Psychology and

Education of the VU University Amsterdam.

Funding support for SAGE was further provided through the NIH GEI (U01 HG004422).

Assistance with phenotype harmonization and genotype cleaning, as well as with general study

coordination, was provided by the GENEVA Coordinating Center (U01 HG004446). Assistance

with data cleaning was provided by the National Center for Biotechnology Information. Support

for collection of datasets and samples was provided by the Collaborative Study on the Genetics

of Alcoholism (COGA; U10 AA008401) and the Family Study of Cocaine Dependence (FSCD;

R01 DA013423, PI: Laura Bierut). Funding support for genotyping, which was performed at

CIDR, was provided by the NIH GEI (U01 HG004438), the National Institute on Alcohol Abuse

and Alcoholism (NIAAA), NIDA, and the NIH contract "High throughput genotyping for

studying the genetic contributions to human disease" (HHSN268200782096C). The datasets used

for the analyses described in this manuscript were obtained via dbGaP accession number

phs000092.v1.p1.

The University of Wisconsin-Transdisciplinary Tobacco Use Research Center (UW-

TTURC) study was accessed via dbGaP as part of “Genetic Architecture of Smoking and

Smoking Cessation” (accession number phs000404.v1.p1). Funding support for collection of the

UW-TTURC dataset and samples was provided by P50 DA019706 and P50 CA084724. Funding

support for genotyping, which was performed at CIDR, was provided by 1 X01 HG005274-01.

CIDR is fully funded through a federal contract from NIH to The Johns Hopkins University,

contract number HHSN268200782096C. Assistance with genotype cleaning, as well as with

general study coordination, was provided by the GENEVA Coordinating Center (U01

HG004446).

The Yale-Penn study was supported by NIH grants RC2 DA028909 (PI: Joel Gelernter),

R01 DA012690 (PI: Joel Gelernter), R01 DA012849 (PI: Joel Gelernter), R01 DA018432 (PI:

Henry Kranzler), R01 AA011330 (PI: Joel Gelernter), R01 AA017535 (PI: Joel Gelernter), and

the VA Connecticut and Philadelphia VA MIRECCs. Genotyping services for a part of the Yale-

Penn GWAS were provided by CIDR and Yale University (Center for Genome Analysis). CIDR

is fully funded through a federal contract from NIH to The Johns Hopkins University (contract

number N01-HG-65403). We are grateful to Ann Marie Lacobelle, Catherine Aldi, and Christa

Robinson for their excellent technical assistance, to the SSADDA interviewers, led by Yari

Nuñez and Michelle Slivinsky, who devoted substantial time and effort to phenotype the study,

and to John Farrell and Alexan Mardigan for database management assistance.

Supplementary Table 1. Details of single nucleotide polymorphism (SNP) genotyping, quality control (QC), imputation, and

statistical analysis across the 15 independent studies.

| Study and ancestry group | Genotyping platform | N (%) genotyped participants passing QC and used for imputation | N (%) autosomal genotyped SNPs passing QC and used for imputation | Imputation software | 1000 Genomes reference panel for imputation | N imputed participants with complete phenotype and covariate data | GWAS statistical analysis software |
|---|---|---|---|---|---|---|---|
| AAND and COGEND2 AAs^a | Illumina Omni Express | 2,047 (99.4) | 921,133 (99.5) | IMPUTE2^32 | ALL phase I (v3) haplotypes (20101123)^b | AAND: 1,687; COGEND2: 313 | ProbAbel |
| COGEND EAs | Illumina Human1M-Duo or HumanOmni2.5 | 1,952 (99.5) | 587,486 (99.9) | IMPUTE2^32 | ALL phase I (v3) haplotypes (20101123)^b | 1,935 | ProbAbel |
| COGEND AAs | Illumina Human1M-Duo or HumanOmni2.5 | 709 (99.6) | 587,483 (99.8) | IMPUTE2^32 | ALL phase I (v3) haplotypes (20101123)^b | 704 | ProbAbel |
| COGEND2 EAs | Illumina Omni Express | 411 (100) | 921,123 (99.5) | IMPUTE2^32 | ALL phase I (v3) haplotypes (20101123)^b | 292 | ProbAbel |
| COHRA1 EAs | Illumina Human610 | 1,964 (59.3)^d | 558,483 (97.6) | IMPUTE2^32 | ALL phase I (v3) haplotypes (20101123)^b | 243^c | ProbAbel |
| COPDGene EAs | Illumina HumanOmni1-Quad | 6,664 (99.9) | 622,037 (99.0) | IMPUTE2^32 | ALL phase I (v3) haplotypes (20101123)^b | 2,211 | ProbAbel |
| COPDGene AAs | Illumina HumanOmni1-Quad | 3,298 (99.9) | 674,808 (99.0) | IMPUTE2^32 | ALL phase I (v3) haplotypes (20101123)^b | 2,115 | ProbAbel |
| deCODE Icelanders | Illumina SNP arrays | 9,090^c | 697,913^e | Long-range phasing using the same model as used by IMPUTE | Variant set from whole-genome sequencing of 2,636 Icelanders | 9,090 | ProbAbel |
| EAGLE Italians | Illumina HumanHap550v3 | 3,879 (99.5) | 536,319 (98.0) | IMPUTE2^32 | ALL phase I (v3) haplotypes (20101123)^b | 3,006 | ProbAbel |
| FTC Finns | Illumina Human670-QuadCustom or HumanCoreExome array | 670 subset I: 2,373 (99); 670 subset II: 128 (99); CoreExome: 1,803 (95) | 670 subset I: 521,493 (89); 670 subset II: 467,103 (88); CoreExome: 342,370 (68) | IMPUTE2^32 | ALL phase I (v3) haplotypes (20130917) | 2,374 | ProbAbel |
| GAIN EAs | Affymetrix 6.0 | 2,560 (99.6) | 775,889 (96.7) | IMPUTE2^32 | ALL phase I (v3) haplotypes (20101123)^b | 774^f | ProbAbel |
| GAIN AAs | Affymetrix 6.0 | 1,890 (99.3) | 799,209 (96.7) | IMPUTE2^32 | ALL phase I (v3) haplotypes (20101123)^b | 477^f | ProbAbel |
| JHS/ARIC AAs | Affymetrix 6.0 | JHS: 1,967 (89.9); ARIC: 2,412 (99.5) | JHS: 835,368 (95.8)^c; ARIC: 746,334 (99.9)^c | IMPUTE2^32 | ALL phase I (v3) haplotypes (20101123)^b | JHS: 628; ARIC: 515 | ProbAbel |
| nonGAIN EAs | Affymetrix 6.0 | 2,482 (98.7) | 706,194 (94.0) | IMPUTE2^32 | ALL phase I (v3) haplotypes (20101123)^b | 471^e | ProbAbel |
| NTR Dutch | Illumina and Affymetrix SNP arrays | 12,240 (99.8) | 1,422,177 (99.9) | MACH12 | ALL phase1 release v3 20101123 as the first reference set and Genome of the Netherlands (GoNL) version 4 as the second reference set | 3,599 | PLINK |
| SAGE* EAs | Illumina Human1M-Duo | 1,471 (98.3) | 935,356 (99.6) | IMPUTE2^32 | ALL phase I (v3) haplotypes (20101123)^b | 832 | ProbAbel |
| SAGE* AAs | Illumina Human1M-Duo | 981 (96.7) | 935,355 (99.6) | IMPUTE2^32 | ALL phase I (v3) haplotypes (20101123)^b | 633 | ProbAbel |
| UW-TTURC EAs | Illumina HumanOmni2.5 | 1,566 (99.6) | 2,182,770 (99.2) | IMPUTE2^32 | ALL phase I (v3) haplotypes (20101123)^b | 1,534 | ProbAbel |
| UW-TTURC AAs | Illumina HumanOmni2.5 | 257 (100) | 2,182,669 (99.2) | IMPUTE2^32 | ALL phase I (v3) haplotypes (20101123)^b | 247 | ProbAbel |
| Yale-Penn EAs | Illumina HumanOmni1-Quad | 2,379 (97.1) | 889,659 (90.0) | IMPUTE2^32 | ALL phase I (v3) haplotypes (20120305)^b | 2,116 | ProbAbel |
| Yale-Penn AAs | Illumina HumanOmni1-Quad | 3,318 (99.1) | 889,659 (90.0) | IMPUTE2^32 | ALL phase I (v3) haplotypes (20120305)^b | 2,606 | ProbAbel |

AAs, African Americans; EAs, European Americans.

^a Because AAND and COGEND2 participants were genotyped together, the AAs from these studies were processed through QC together. COGEND2 EAs were processed separately.

^b Phase I integrated release (v3) haplotype data were downloaded from https://mathgen.stats.ox.ac.uk/impute/data_download_1000G_phase1_integrated.html.

^c When combining ARIC and JHS, the intersecting set of SNPs that passed QC across both studies was used as the input for imputation to avoid potential bias under the alternative approach of using the union of SNPs available in at least one of the studies.^13

^d COHRA1 comprises families, and most of the removed participants were excluded because only one participant with FTND data available was selected from each relative pair/cluster.

^e Because multiple chips were used at deCODE, the participant and SNP numbers used for genotyping vary. See Gudbjartsson et al. for further details of the QC methods.^43

^f Our final analysis data sets were restricted to only the schizophrenia controls from the GAIN and nonGAIN studies.
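The footnote on combining ARIC and JHS describes restricting the imputation input to SNPs that passed QC in both studies rather than in at least one. A minimal Python sketch of that intersection step follows. The function name and the toy SNP identifiers are hypothetical and are not taken from the study pipeline.

```python
def intersect_qc_snps(*snp_lists):
    """Return the SNP IDs that appear in every study's QC-passed list."""
    if not snp_lists:
        return set()
    shared = set(snp_lists[0])
    for snps in snp_lists[1:]:
        # Keep only SNPs that also passed QC in this study.
        shared &= set(snps)
    return shared


# Toy identifiers for illustration only (not real study data).
aric_pass = ["rs1", "rs2", "rs3"]
jhs_pass = ["rs2", "rs3", "rs4"]
print(sorted(intersect_qc_snps(aric_pass, jhs_pass)))
```

Using the union instead would feed each study some SNPs that were never genotyped or that failed QC in the other study, which is the source of potential bias that the footnote mentions.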

Supplementary Table 2. Characteristics of European/European American and African American participants across 15

studies used for genome-wide association study (GWAS) meta-analysis of nicotine dependence.

| Study | Total N | Male, N (%) | Mean age (SD) | Europeans/European Americans^a: mild dependence, N (%) | Europeans/European Americans^a: moderate dependence, N (%) | Europeans/European Americans^a: severe dependence, N (%) | African Americans^a: mild dependence, N (%) | African Americans^a: moderate dependence, N (%) | African Americans^a: severe dependence, N (%) |
|---|---|---|---|---|---|---|---|---|---|
| AAND | 1,687 | 718 (42.6) | 35.4 (6.11) | 0 | 0 | 0 | 526 (31.2) | 830 (49.2) | 331 (19.6) |
| COGEND | 2,639 | 1,011 (38.3) | 36.6 (5.57) | 941 (48.6) | 521 (26.9) | 473 (24.4) | 248 (35.2) | 283 (40.2) | 173 (24.6) |
| COGEND2 | 605 | 281 (46.4) | 34.4 (5.87) | 60 (20.5) | 91 (31.2) | 141 (48.3) | 13 (4.1) | 137 (43.8) | 163 (52.1) |
| COHRA1 | 243 | 114 (46.9) | 32.1 (9.1) | 79 (32.5) | 127 (52.3) | 37 (15.2) | 0 | 0 | 0 |
| COPDGene | 4,326 | 2,463 (56.9) | 55.6 (7.41) | 666 (30.1) | 964 (43.6) | 581 (26.3) | 603 (28.5) | 960 (45.4) | 552 (26.1) |
| deCODE | 9,090 | 4,253 (46.8) | 54.2 (16.7) | 5,871 (64.6) | 2,074 (22.8) | 1,145 (12.6) | 0 | 0 | 0 |
| EAGLE | 3,006 | 2,528 (84.1) | Not available^b | 1,416 (47.1) | 1,027 (34.2) | 563 (18.7) | 0 | 0 | 0 |
| FTC | 2,374 | 1,314 (55.3) | 45.9 (15.6) | 1,345 (56.7) | 793 (33.4) | 236 (9.9) | 0 | 0 | 0 |
| GAIN | 1,251 | 596 (47.6) | 52 (15.2) | 327 (42.3) | 280 (36.2) | 167 (21.6) | 221 (46.3) | 176 (36.9) | 80 (16.8) |
| JHS/ARIC | 1,143 | 502 (43.9) | 52.9 (9.2) | 0 | 0 | 0 | 867 (75.9) | 218 (19.0) | 58 (5.1) |
| nonGAIN | 671 | 349 (52.0) | 52.9 (15.5) | 298 (44.4) | 234 (34.9) | 139 (20.7) | 0 | 0 | 0 |
| NTR | 3,599 | 2,197 (61.0) | 46.3 (14.9) | 2,246 (62.4) | 1,063 (29.5) | 290 (8.1) | 0 | 0 | 0 |
| SAGE* | 1,465 | 816 (55.7) | 40 (9.88) | 243 (29.2) | 295 (35.5) | 294 (35.3) | 211 (33.3) | 272 (43.0) | 150 (23.7) |
| UW-TTURC | 1,781 | 741 (41.6) | 43.4 (11.2) | 311 (20.2) | 723 (47.1) | 500 (32.6) | 40 (16.2) | 119 (48.2) | 88 (35.6) |
| Yale-Penn | 4,722 | 2,672 (56.6) | 39.9 (9.9) | 381 (18.0) | 1,014 (47.9) | 721 (34.1) | 883 (33.9) | 1,326 (50.9) | 397 (15.2) |

AAND, African American Nicotine Dependence; ARIC, Atherosclerosis Risk in Communities; COGEND, Collaborative Genetic

Study of Nicotine Dependence; COHRA1, Center for Oral Health Research in Appalachia 1; COPDGene, Chronic Obstructive

Pulmonary Disease Gene Study; EAGLE, Environment and Genetics in Lung Cancer Etiology Study; FTC, Finnish Twin Cohort;

GAIN, Genetic Association Information Network GWAS of schizophrenia; JHS, Jackson Heart Study; nonGAIN, Molecular Genetics

of Schizophrenia—nonGAIN; NTR, Netherlands Twin Registry; SAGE*, Study of Addiction: Genetics and Environment (* indicates

that overlapping COGEND participants were excluded); SD, standard deviation; UW-TTURC, University of Wisconsin-

Transdisciplinary Tobacco Use Research Center.

^a Fagerström Test for Nicotine Dependence (FTND) scores were used to categorize nicotine dependence as mild (FTND score 0–3), moderate (FTND score 4–6) or severe (FTND score 7–10). For deCODE only, the mild category included 1,558 participants with FTND score 0–3 and an additional set of 4,313 low-intensity smokers with no FTND data available but with <10 cigarettes per day (CPD) reported. For ARIC only, FTND data were not available, and CPD was used as a proxy measure to define mild (CPD <10), moderate (CPD=11–20), and severe (CPD>21) ND.

^b Average age could not be calculated in EAGLE, because age was only available as a categorical variable: 23.2% aged 59 or less, 18.2% aged 60–64, 22.4% aged 65–69, 21.4% aged 70–74 and 14.8% aged 75–79.
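The categorization rules in the footnote on FTND scores can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, and the CPD proxy is coded exactly as the cut-offs are written in the footnote, so CPD values that the footnote does not assign to any category (for example 10 or 21) return `None` rather than a guessed category.

```python
def ftnd_category(score):
    """Map an FTND score (0-10) to the categorical phenotype."""
    if not 0 <= score <= 10:
        raise ValueError("FTND score must be between 0 and 10")
    if score <= 3:
        return "mild"
    if score <= 6:
        return "moderate"
    return "severe"


def cpd_proxy_category(cpd):
    """CPD proxy used for ARIC, with cut-offs as stated in the footnote.

    Values not covered by the stated ranges return None.
    """
    if cpd < 10:
        return "mild"
    if 11 <= cpd <= 20:
        return "moderate"
    if cpd > 21:
        return "severe"
    return None


print([ftnd_category(s) for s in (0, 3, 4, 6, 7, 10)])
print([cpd_proxy_category(c) for c in (5, 15, 30)])
```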

Supplementary Table 3. Study-specific associations of the rs910083-C allele in DNMT3B with each of the six specific

Fagerström Test for Nicotine Dependence (FTND) items. The items were tested separately in regression models (logistic regression

models for items 1, 3, and 4, or linear regression models for items 2, 5, and 6) with rs910083 genotype dosage as the predictor and

age, sex, principal component eigenvectors (if applicable), and other study-specific variables (if applicable) as covariates.

Columns: Study (and ancestry if more than one), followed by β and P for each of Items 1 through 6, in order.

Item 1: How soon after you wake up do/did you smoke your first cigarette?

Item 2: Do/did you find it difficult to refrain from smoking in places where it is forbidden?

Item 3: Which cigarette would you hate most to give up?

Item 4: How many cigarettes per day do/did you smoke?

Item 5: Do/did you smoke more frequently during the first hours after waking than during the rest of the day?

Item 6: Do/did you smoke if you are so ill that you are in bed most of the day?

AAND 0.053 0.22 0.10 0.30 0.056 0.56 0.029 0.27 0.022 0.81 0.22 0.024

COGEND AA -0.051 0.54 -0.070 0.61 -0.047 0.72 -0.046 0.44 0.11 0.42 -0.074 0.57

COGEND EA 0.041 0.32 0.038 0.61 0.070 0.32 0.026 0.47 0.11 0.16 0.063 0.38

COGEND2 AA -0.071 0.32 -0.19 0.38 0.19 0.44 0.012 0.89 0.40 0.056 0.074 0.74

COGEND2 EA 0.11 0.22 0.077 0.65 0.0072 0.97 0.057 0.51 0.14 0.43 0.32 0.088

COPDGene AA 0.081 0.039 0.13 0.12 0.15 0.074 0.0032 0.92 0.17 0.042 0.13 0.13

COPDGene EA 0.086 0.0092 -0.058 0.42 -0.053 0.44 0.016 0.51 -0.0016 0.98 -0.055 0.44

deCODE 0.013 0.52 -0.0048 0.92 0.012 0.80 0.028 0.11 0.014 0.77 0.028 0.52

COHRA1 -0.016 0.86 1.2×10-5 1.00 b b 0.037 0.60 0.31 0.11 0.052 0.80

EAGLE 0.0085 0.77 0.027 0.64 0.13 0.80 0.031 0.23 0.13 0.065 0.033 0.58

FTC -0.0012 0.97 0.040 0.64 -0.041 0.53 -0.0088 0.69 -0.081 0.26 -0.0082 0.90

GAIN AA 0.10 0.22 -0.16 0.38 0.10 0.50 0.094 0.14 0.23 0.18 -0.22 0.16

GAIN EA 0.11 0.067 0.15 0.27 0.068 0.53 0.055 0.31 0.0044 0.98 0.085 0.47

JHS 0.15 0.070 0.27 0.27 -0.019 0.89 0.12 0.056 0.16 0.29 0.33 0.10

nonGAIN -0.047 0.45 -0.32 0.020 0.11 0.51 -0.011 0.86 b b 0.029 0.82

SAGE* AA 0.11 0.14 0.17 0.27 0.040 0.79 -0.016 0.81 -0.11 0.49 0.19 0.20

SAGE* EA 0.10 0.051 0.16 0.13 0.032 0.77 0.075 0.13 0.16 0.14 0.036 0.74

UW-TTURC AA 0.11 0.18 0.46 0.064 0.19 0.43 0.037 0.62 0.043 0.84 -0.14 0.50

UW-TTURC EA 0.066 0.036 0.14 0.11 0.027 0.72 0.050 0.070 0.20 0.0070 0.11 0.12

Yale-Penn AA 0.00065 0.99 -0.012 0.85 -0.16 0.024 0.025 0.25 0.12 0.12 0.069 0.31

Yale-Penn EA 0.012 0.70 0.045 0.54 -0.024 0.70 0.0082 0.69 0.0028 0.96 -0.045 0.50

Meta-analysis 0.035 1.2×10-4 0.031 0.13 0.021 0.27 0.023 0.0011 0.060 0.0029 0.041 0.030

^a All European and African ancestry participants with FTND data in each study were included, except for NTR.

^b The regression model did not converge due to a high number of missing responses. The meta-analysis results for this FTND item reflect the cohorts with available results.

Supplementary Table 4. Nicotine dependence associations of 140 meQTL SNPs, located in

DNMT3B and across its 100kb flanking region. All 140 SNPs were associated with probe

cg13636640 methylation levels at P<3.69×10-13, based on Bonferroni correction for genome-

wide testing of all SNP-probe pairs in Hannon et al.38 (N=166 fetal brain samples). meQTL

results were obtained at http://epigenetics.essex.ac.uk/mQTL/. Results are sorted by nicotine

dependence GWAS meta-analysis p-values across EUR and AA studies (total N=38,602), with β

estimates corresponding to the coded allele. NCBI build 37 positions are indicated.

Columns: Chr. 20 position, meQTL P, SNP, and nicotine dependence meta-analysis results (coded allele, β, standard error, P).

31378690 2.3×10-26 rs910083 C 0.032 0.0057 3.7×10-8

31378343 5.9×10-26 rs73621668 CT 0.033 0.0062 1.0×10-7

31385814 5.8×10-22 rs2424921 T 0.031 0.0059 1.3×10-7

31389518 4.5×10-24 rs2424929 G 0.030 0.0058 2.1×10-7

31389009 4.5×10-24 rs2065576 T 0.030 0.0058 2.2×10-7

31363774 3.4×10-34 rs6087999 G 0.030 0.0057 2.2×10-7

31388461 7.3×10-21 rs6058892 T 0.029 0.0056 2.5×10-7

31381690 5.8×10-22 rs6057647 T 0.031 0.0059 2.5×10-7

31388636 4.5×10-24 rs2424928 C 0.030 0.0058 2.7×10-7

31383530 7.7×10-28 rs875041 T 0.029 0.0057 2.8×10-7

31386347 5.8×10-22 rs6058891 C 0.030 0.0059 3.0×10-7

31385452 5.2×10-21 rs2424920 C 0.030 0.0059 3.0×10-7

31387954 3.0×10-19 rs1997797 G 0.029 0.0056 3.0×10-7

31385269 4.7×10-27 rs992472 T 0.029 0.0056 3.3×10-7

31386449 5.8×10-22 rs2424922 C 0.030 0.0059 3.3×10-7

31392777 2.1×10-24 rs6058893 T 0.029 0.0056 3.9×10-7

31378448 9.0×10-23 rs2424915 T 0.028 0.0055 3.9×10-7

31380072 5.7×10-23 rs1040553 T 0.028 0.0055 4.6×10-7

31337487 3.3×10-43 rs1883730 A 0.029 0.0058 5.2×10-7

31380201 5.7×10-23 rs1040555 T 0.028 0.0055 5.4×10-7

31376490 5.0×10-23 rs4911260 C 0.030 0.0059 5.5×10-7

31380931 9.8×10-22 rs11480729 AT 0.028 0.0055 5.6×10-7

31381229 9.8×10-22 rs993419 A 0.027 0.0055 6.7×10-7

31343543 3.3×10-43 rs2424899 C 0.029 0.0058 7.5×10-7

31345752 1.5×10-41 rs1007124 C 0.029 0.0058 7.7×10-7

31348750 1.5×10-41 rs6058869 T 0.029 0.0058 8.0×10-7

31365719 5.0×10-23 rs6058883 C 0.029 0.0059 9.6×10-7

31374520 5.0×10-23 rs2424914 G 0.030 0.006 9.8×10-7

31387525 5.8×10-22 rs1855350 T 0.029 0.006 1.1×10-6

31391360 5.8×10-22 rs4911261 G 0.029 0.006 1.1×10-6

31390533 5.8×10-22 rs2064350 C 0.029 0.006 1.2×10-6

31359819 5.0×10-23 rs2424907 T 0.029 0.006 1.3×10-6

31330413 3.8×10-36 rs4911105 A 0.028 0.0058 1.5×10-6

31335978 1.3×10-26 rs2424895 C 0.029 0.006 1.7×10-6

31367079 8.6×10-30 rs1569686 T 0.028 0.006 1.9×10-6

31357844 1.0×10-23 rs4911256 G 0.029 0.006 2.0×10-6

31361861 8.6×10-30 rs2424909 C 0.028 0.006 2.0×10-6

31354935 2.7×10-27 rs6087992 C 0.029 0.0061 2.1×10-6

31374991 1.7×10-29 rs4911107 A 0.028 0.0059 2.1×10-6

31352585 1.3×10-26 rs4911253 G 0.029 0.0061 2.2×10-6

31330612 3.8×10-36 rs6119279 G 0.027 0.0058 2.3×10-6

31366598 8.6×10-30 rs1003521 C 0.028 0.0059 2.3×10-6

31382770 3.6×10-27 rs2424918 T 0.026 0.0056 2.6×10-6

31368560 2.3×10-23 rs1003522 C 0.028 0.0059 2.6×10-6

31363193 8.6×10-30 rs1883729 A 0.028 0.0059 2.7×10-6

31346001 1.3×10-26 rs1007122 T 0.028 0.006 2.7×10-6

31336676 2.8×10-37 rs12329424 T 0.028 0.006 2.7×10-6

31330115 3.8×10-36 rs4911252 T 0.027 0.0058 2.8×10-6

31393352 1.6×10-27 rs4911262 A 0.028 0.006 2.9×10-6

31361249 8.6×10-30 rs1333469 T 0.028 0.0059 3.0×10-6

31361709 8.6×10-30 rs11367842 T 0.028 0.0059 3.1×10-6

31383353 1.6×10-28 rs910085 G 0.028 0.0059 3.3×10-6

31376282 6.5×10-26 rs4911259 G 0.026 0.0056 3.5×10-6

31331330 3.8×10-36 rs4911106 G 0.027 0.0058 3.6×10-6

31358253 1.3×10-26 rs6087995 C 0.028 0.006 3.7×10-6

31370544 1.7×10-29 rs11469084 C 0.025 0.0055 3.8×10-6

31359391 8.6×10-30 rs6087420 A 0.026 0.0056 3.9×10-6

31398105 8.2×10-20 rs6058897 C 0.026 0.0057 4.1×10-6

31369136 1.7×10-29 rs3835238 A 0.025 0.0055 4.3×10-6

31380073 5.7×10-23 rs1040554 T 0.025 0.0055 4.8×10-6

31392366 3.0×10-25 rs4911110 G 0.027 0.0059 5.3×10-6

31346581 1.5×10-41 rs6119281 G 0.026 0.0058 5.8×10-6

31350664 2.9×10-37 rs742630 G 0.027 0.0059 5.9×10-6

31358356 2.5×10-34 rs8115823 A 0.025 0.0056 6.9×10-6

31357097 1.8×10-45 rs1855349 T 0.026 0.0057 7.4×10-6

31344512 5.2×10-36 rs2424901 G 0.027 0.006 7.5×10-6

31344342 5.2×10-36 rs2424900 C 0.027 0.006 7.7×10-6

31336960 2.8×10-37 rs6058860 C 0.027 0.006 7.7×10-6

31348445 5.2×10-36 rs6058868 C 0.027 0.006 8.2×10-6

31346925 5.2×10-36 rs2377667 G 0.027 0.006 8.3×10-6

31345840 5.2×10-36 rs1007123 A 0.027 0.006 8.4×10-6

31335157 1.4×10-31 rs6058859 G 0.027 0.0059 8.7×10-6

31375572 5.6×10-25 rs4911109 A 0.024 0.0054 9.5×10-6

31352316 2.9×10-37 rs2424904 C 0.027 0.006 9.6×10-6

31352927 1.8×10-31 rs2424905 C 0.026 0.0059 9.8×10-6

31349315 5.2×10-36 rs2424903 T 0.027 0.006 9.9×10-6

31384136 1.5×10-25 rs998382 G 0.024 0.0055 1.0×10-5

31382860 6.7×10-26 rs6088008 G 0.024 0.0054 1.2×10-5

31349908 1.8×10-45 rs6087990 C 0.025 0.0057 1.3×10-5

31353785 2.9×10-37 rs6058874 G 0.026 0.0059 1.3×10-5

31387737 3.4×10-25 rs72269343 C 0.024 0.0056 1.5×10-5

31371677 1.3×10-25 rs2889702 A 0.025 0.0058 1.7×10-5

31340997 1.4×10-31 rs6087989 G 0.026 0.0059 1.8×10-5

31336533 2.0×10-41 rs61542599 CAG 0.030 0.0071 1.8×10-5

31379184 3.9×10-18 rs2889703 A 0.023 0.0055 1.9×10-5

31373782 2.4×10-26 rs2377668 G 0.024 0.0056 2.1×10-5

31374259 5.5×10-19 rs2424913 T 0.023 0.0055 2.3×10-5

31369479 1.7×10-27 rs4911258 A 0.024 0.0056 2.7×10-5

31375311 2.4×10-26 rs4911108 G 0.025 0.006 2.8×10-5

31368960 1.7×10-27 rs6141814 A 0.023 0.0056 3.1×10-5

31359109 1.2×10-25 rs2424906 C 0.024 0.0058 3.4×10-5

31390840 1.0×10-26 rs17123658 A 0.024 0.0058 3.7×10-5

31380309 2.0×10-26 rs1474738 A 0.023 0.0055 4.1×10-5

31353564 2.9×10-37 rs6058873 T 0.030 0.0072 4.4×10-5

31379665 7.2×10-26 rs910084 T 0.022 0.0054 4.5×10-5

31353231 5.0×10-35 rs148275442 T 0.028 0.0071 7.1×10-5

31351623 3.1×10-38 rs2018002 T 0.023 0.0058 8.2×10-5

31400673 8.6×10-26 rs6058900 C 0.023 0.006 1.0×10-4

31329704 2.8×10-30 rs6119946 A 0.023 0.006 1.3×10-4

31340800 3.8×10-16 rs6087988 T 0.023 0.006 1.5×10-4

31335246 1.8×10-13 rs6087984 A 0.025 0.0065 1.7×10-4

31359574 3.0×10-22 rs4911257 C 0.022 0.006 1.8×10-4

31353790 3.1×10-38 rs6058875 G 0.022 0.0058 2.1×10-4

31397258 8.0×10-22 rs3092509 GATTC 0.022 0.006 2.8×10-4

31357579 5.2×10-36 rs4911254 A 0.021 0.006 3.9×10-4

31357580 5.2×10-36 rs4911255 T 0.021 0.006 4.0×10-4

31356095 1.4×10-15 rs11907740 A 0.023 0.0065 5.0×10-4

31337822 1.8×10-13 rs6141802 A 0.021 0.0061 5.8×10-4

31360383 4.7×10-14 rs2424908 T 0.020 0.006 0.0011

31337755 3.8×10-14 rs2424896 T 0.020 0.0062 0.0011

31398876 6.3×10-14 rs8118663 G 0.021 0.0065 0.0015

31420757 3.5×10-14 rs853854 A 0.018 0.0058 0.0018

31346525 3.2×10-14 rs12481172 G 0.020 0.0064 0.0021

31465374 2.5×10-14 rs6058925 G 0.017 0.0059 0.0030

31334245 3.7×10-14 rs11484368 GA 0.019 0.0065 0.0031

31460076 2.5×10-14 rs11697394 G 0.017 0.0059 0.0032

31464467 2.5×10-14 rs6058924 G 0.017 0.0059 0.0033

31465450 2.5×10-14 rs6058926 T 0.017 0.0059 0.0033

31459562 2.5×10-14 rs13037833 C 0.017 0.0059 0.0036

31470484 3.2×10-13 rs13038717 G 0.018 0.006 0.0036

31460718 2.5×10-14 rs12625153 C 0.017 0.0059 0.0037

31458773 2.5×10-14 rs6058922 G 0.017 0.0059 0.0044

31333787 6.5×10-14 rs6087982 G 0.019 0.0067 0.0051

31355998 3.4×10-40 NA NA NA NA NA

31343904 1.7×10-38 NA NA NA NA NA

31338098 3.8×10-36 NA NA NA NA NA

31356108 5.2×10-36 NA NA NA NA NA

31347512 2.7×10-29 NA NA NA NA NA

31353232 2.0×10-28 NA NA NA NA NA

31361418 2.9×10-28 NA NA NA NA NA

31397019 2.6×10-27 NA NA NA NA NA

31358416 1.3×10-26 NA NA NA NA NA

31394631 8.4×10-26 NA NA NA NA NA

31372847 1.3×10-20 NA NA NA NA NA

31360923 4.1×10-20 NA NA NA NA NA

31373668 1.1×10-19 NA NA NA NA NA

31385471 1.9×10-17 NA NA NA NA NA

31342224 9.0×10-15 NA NA NA NA NA

31458665 2.5×10-14 NA NA NA NA NA

31460244 2.5×10-14 NA NA NA NA NA

NA, not available
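Each row above reports β, its standard error, and P. As a rough consistency check, a P value can be recomputed from β and the standard error under the assumption of a two-sided normal (Wald) test, which the table does not state explicitly. The sketch below uses only the Python standard library, and the function name is hypothetical. Because the tabulated β and standard error are rounded, the recomputed value only approximates the reported P.

```python
import math


def wald_p_value(beta, se):
    """Two-sided P value for z = beta / se under a standard normal."""
    z = abs(beta / se)
    # erfc keeps precision for very small tail probabilities.
    return math.erfc(z / math.sqrt(2))


# Rounded values for rs910083 from the first row of the table.
print(wald_p_value(0.032, 0.0057))
```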

Supplementary Table 5. Associations of the rs910083-C allele with DNMT3B gene

expression across the brain tissues in GTEx. Results are sorted by p-value.

Columns: Brain tissue, N with RNA-sequencing and SNP genotype data available, β (SE), P.

Cerebellar hemisphere 89 0.44 (0.081) 7.0×10-7

Cerebellum 103 0.50 (0.099) 3.0×10-6

Anterior cingulate cortex 72 0.38 (0.095) 2.0×10-4

Hypothalamus 81 0.39 (0.11) 5.8×10-4

Frontal cortex 92 0.31 (0.11) 6.6×10-3

Nucleus accumbens 93 0.26 (0.10) 0.015

Cortex 96 0.24 (0.12) 0.052

Caudate (basal ganglia) 100 0.18 (0.099) 0.080

Hippocampus 81 0.22 (0.13) 0.095

Putamen (basal ganglia) 82 -0.019 (0.12) 0.87

Amygdala^a 62 NA NA

Spinal cord (cervical c-1)^a 59 NA NA

Substantia nigra^a 56 NA NA

^a GTEx did not conduct eQTL testing for tissues with fewer than 70 samples in version 6p.

Associations were presented in these tissues in version 6; rs910083 was not associated with

DNMT3B expression in amygdala (P=0.26), spinal cord (P=0.93), or substantia nigra (P=0.58).

Supplementary Table 6. Associations of the rs910083-C allele that surpassed the significance threshold (false discovery rate

<0.05) in GTEx for any nearby gene in any tissue. Results are sorted by p-value.

Columns: Gene, Tissue^a, N with RNA-sequencing and SNP genotype data available, β (SE), P.

MAPRE1 Skin - Sun Exposed (Lower leg) 302 0.40 (0.045) 7.3×10-17

MAPRE1 Esophagus – Mucosa 241 0.33 (0.037) 1.8×10-16

MAPRE1 Skin - Not Sun Exposed (Suprapubic) 196 0.38 (0.051) 4.0×10-12

MAPRE1 Whole Blood 338 0.20 (0.030) 1.1×10-10

DNMT3B Brain - Cerebellar Hemisphere 89 0.44 (0.081) 7.0×10-7

COMMD7 Colon – Transverse 169 0.31 (0.061) 1.0×10-6

COMMD7 Nerve – Tibial 256 0.23 (0.047) 1.2×10-6

COMMD7 Esophagus - Muscularis 218 0.20 (0.041) 2.0×10-6

COMMD7 Artery – Tibial 285 0.19 (0.039) 2.5×10-6

DNMT3B Brain – Cerebellum 103 0.50 (0.099) 3.0×10-6

MAPRE1 Pancreas 149 0.23 (0.049) 1.0×10-5

COMMD7 Testis 157 0.21 (0.045) 1.3×10-5

DNMT3B Esophagus - Muscularis 218 0.26 (0.059) 1.8×10-5

DNMT3B Colon - Transverse 169 0.33 (0.073) 1.8×10-5

BPIFB4 Artery - Tibial 285 -0.30 (0.072) 5.4×10-5

^a SNP-expression associations were tested in the 44 GTEx tissues with more than 70 samples available.

Supplementary Table 7. DNMT3B and DBH SNP associations with lung cancer.

Associations were assessed overall (N=27,349 cases and 54,472 controls) and by histological

subtype (N=10,487 cases and 53,505 controls for adenocarcinoma and N=6,937 cases and

53,649 controls for squamous cell lung carcinoma). Participants were all of European/European

American ancestry.

SNP allele (Gene) Lung cancer type P OR (95% CI)

rs910083-C (DNMT3B) All 0.082 1.02 (1.00–1.05)

Adenocarcinoma 0.47 0.99 (0.96–1.02)

Squamous cell carcinoma 0.0095 1.05 (1.01–1.09)

rs56116178-G (DBH) All 7.9×10-4 1.08 (1.03–1.12)

Adenocarcinoma 0.59 1.02 (0.96–1.08)

Squamous cell carcinoma 5.4×10-4 1.13 (1.06–1.22)

Supplementary Table 8. DNMT3B and DBH SNP associations with squamous cell carcinoma, with and without adjustment for

smoking, in N up to 50,994 (5,783 cases and 45,211 controls) of the 60,586 that comprised the GWAS meta-analysis for this

subtype. Analyses were carried out in the Transdisciplinary Research for Cancer in Lung of the International Lung Cancer

Consortium (TRICL-ILCCO) Oncoarray and previous lung cancer GWAS (MDACC, IARC, SLRI, GERMAN, Harvard, and

deCODE) studies with smoking data readily available. Among these studies, more than 90% had the data available to enable smoking-

adjusted analyses.

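| SNP allele (Gene) | Squamous cell carcinoma ~ SNP + age + sex + PCs: P | Squamous cell carcinoma ~ SNP + age + sex + PCs: OR (95% CI) | Squamous cell carcinoma ~ SNP + age + sex + PCs + smoking history (ever/never) + pack-years for ever-smokers (0 for never-smokers): P | Squamous cell carcinoma ~ SNP + age + sex + PCs + smoking history (ever/never) + pack-years for ever-smokers (0 for never-smokers): OR (95% CI) |
|---|---|---|---|---|
| rs910083-C (DNMT3B) | 0.026 | 1.05 (1.01–1.10) | 0.17 | 1.03 (0.99–1.09) |
| rs56116178-G (DBH) | 0.011 | 1.10 (1.02–1.19) | 0.29 | 1.05 (0.96–1.14) |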

CI, confidence interval; OR, odds ratio; PCs, principal components
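The odds ratios above are reported with 95% confidence intervals rather than standard errors. Assuming the intervals are symmetric on the log-odds scale (the usual Wald construction, which the table does not confirm), an approximate standard error and z statistic can be recovered as sketched below. The function names are hypothetical, and rounding of the tabulated values limits the precision of the result.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal


def se_from_or_ci(lower, upper):
    """Approximate SE of log(OR) from a 95% CI on the OR scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_975)


def z_from_or(odds_ratio, lower, upper):
    """Approximate Wald z statistic for log(OR)."""
    return math.log(odds_ratio) / se_from_or_ci(lower, upper)


# Unadjusted rs910083-C estimate: OR 1.05 (1.01-1.10).
print(se_from_or_ci(1.01, 1.10), z_from_or(1.05, 1.01, 1.10))
```

Comparing the z statistic with and without smoking adjustment gives a quick view of how much of the association is attenuated once smoking history and pack-years enter the model.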

Supplementary Table 9. HapMap-imputed SNPs implicated at P<5×10-7 in our cross-

ancestry GWAS meta-analysis of nicotine dependence (N=38,602 ever-smokers) and tested

for association with cigarettes per day (CPD, N=38,181 ever-smokers) and smoking

initiation (N=74,035 ever- vs never-smokers) in the Tobacco and Genetics (TAG)

consortium. SNPs are sorted by their nicotine dependence meta-analysis P value. NCBI build 37

positions are indicated. Linkage disequilibrium is presented with the top nicotine dependence-

associated SNP rs910083. β estimates were derived from linear regression models for nicotine

dependence and CPD and a logistic regression model for smoking initiation.

| SNP | Chr. 20 position | Coded allele | r² / D’ with rs910083 in 1000G EUR | r² / D’ with rs910083 in 1000G AFR | Nicotine dependence meta-analysis β (SE) | Nicotine dependence meta-analysis P | CPD meta-analysis (TAG consortium) β (SE) | CPD meta-analysis (TAG consortium) P | Smoking initiation meta-analysis (TAG consortium) β (SE) | Smoking initiation meta-analysis (TAG consortium) P |
|---|---|---|---|---|---|---|---|---|---|---|
| rs2424921 | 31,385,814 | T | 0.97 / 0.99 | 0.55 / 1 | 0.031 (0.0059) | 1.3×10⁻⁷ | 0.16 (0.082) | 0.055 | 0.012 (0.012) | 0.29 |
| rs2065576 | 31,389,009 | T | 0.99 / 1 | 0.56 / 0.90 | 0.030 (0.0058) | 2.2×10⁻⁷ | 0.15 (0.082) | 0.059 | 0.011 (0.012) | 0.33 |
| rs2424928 | 31,388,636 | T | 0.99 / 1 | 0.56 / 0.90 | -0.030 (0.0058) | 2.7×10⁻⁷ | -0.16 (0.082) | 0.052 | -0.014 (0.012) | 0.21 |
| rs6058891 | 31,386,347 | T | 0.99 / 1 | 0.52 / 1 | -0.030 (0.0059) | 3.0×10⁻⁷ | -0.16 (0.082) | 0.054 | -0.012 (0.012) | 0.30 |
| rs1997797 | 31,387,954 | C | 0.99 / 0.99 | 0.76 / 0.99 | -0.029 (0.0056) | 3.0×10⁻⁷ | -0.16 (0.082) | 0.056 | -0.012 (0.012) | 0.31 |
| rs992472 | 31,385,269 | T | 0.78 / 0.99 | 0.39 / 0.89 | 0.029 (0.0056) | 3.3×10⁻⁷ | 0.19 (0.084) | 0.027 | 0.023 (0.012) | 0.049 |
| rs2424922 | 31,386,449 | T | 0.99 / 1 | 0.52 / 1 | -0.030 (0.0059) | 3.3×10⁻⁷ | -0.16 (0.082) | 0.051 | -0.015 (0.012) | 0.20 |
| rs6058893 | 31,392,777 | T | 0.83 / 0.99 | 0.68 / 0.97 | 0.029 (0.0056) | 3.9×10⁻⁷ | 0.17 (0.083) | 0.044 | 0.018 (0.012) | 0.13 |
| rs2424915 | 31,378,448 | T | 0.97 / 1 | 0.29 / 1 | 0.028 (0.0055) | 3.9×10⁻⁷ | 0.16 (0.082) | 0.044 | 0.011 (0.012) | 0.34 |

Supplementary Figure 1. Quantile-quantile plots of p-values from GWAS meta-analyses of

nicotine dependence across 15 studies. Results are shown for the meta-analyses of (A) all

studies, (B) European/European American ancestry-only studies, and (C) African American-only

studies. The observed vs expected meta-analysis p-values (black dots) are plotted along the

identity line (red) with the corresponding genomic inflation factor (λ) indicated.
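
The genomic inflation factor shown on each panel can be illustrated with a short sketch. It assumes the common median-based definition, in which two-sided p-values are converted to 1-df chi-square statistics and their median is divided by the median of the null chi-square distribution (about 0.455); the source does not state which estimator was used. The function name and the example p-values are hypothetical.

```python
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def genomic_inflation(p_values):
    """Median-based genomic inflation factor from two-sided p-values.

    Each p-value must lie in (0, 1]; it is mapped to a 1-df chi-square
    statistic through the squared standard normal quantile of p / 2.
    """
    normal = NormalDist()
    chi2 = [normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Hypothetical, evenly spaced p-values that mimic a null distribution.
n = 1001
null_like = [(i + 0.5) / n for i in range(n)]
print(round(genomic_inflation(null_like), 3))
```

Values of λ close to 1 indicate little inflation of the test statistics; values well above 1 suggest residual confounding or polygenic signal.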

Supplementary Figure 2. Manhattan plots of SNP and indel associations with nicotine

dependence from ancestry-specific GWAS meta-analyses. (A) The European/European

American ancestry-only meta-analysis included 13 studies (total N=28,677), and (B) the African

American-only meta-analysis included 9 studies (total N=9,925). The –log10 meta-analysis p-

values are plotted by chromosomal position of SNPs (depicted as circles) and indels (depicted as

triangles). The genome-wide statistical significance threshold (P<5×10⁻⁸) is shown as a solid

black line.

Supplementary Figure 3. DNMT3B SNP associations with severe vs mild nicotine dependence across studies. Associations are

shown for the rs910083-C allele across European/European American and African American participants. Open diamonds indicate the

nicotine dependence association results, sorted by the size of the originating study, and the black-filled diamond indicates the meta-

analysis result. AAs, African Americans; EAs, European Americans.

Supplementary Figure 4. SNPs, located in DNMT3B and across its 100kb flanking region, that were significantly associated

with DNA methylation levels in fetal brain (N=166). Genotyped and 1000 Genomes-imputed SNPs (blue lines) were tested for

association with Illumina HumanMethylation450K probes (red lines). Associations at P<3.69×10⁻¹³, based on Bonferroni correction

for genome-wide testing of all SNP-probe pairs in Hannon et al.38, are marked as black lines to designate the SNP and DNA

methylation probe positions. NCBI build 37 positions are indicated. Results were plotted at http://epigenetics.essex.ac.uk/mQTL/.

Supplementary Figure 5. DNMT3B gene expression across all 13 brain tissues available in

GTEx. RNA expression levels are presented as log10 reads per kilobase transcript per million

reads (RPKM). The sample size for each brain tissue with RNA expression data is presented.
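
As a reminder of what the RPKM unit measures, the sketch below applies the standard definition: read count scaled by transcript length in kilobases and by library size in millions of mapped reads. The function name and the input numbers are hypothetical and are not GTEx values.

```python
import math


def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: 1,000 reads on a 2,000 bp transcript in a
# library of 10 million mapped reads.
value = rpkm(read_count=1_000, transcript_length_bp=2_000,
             total_mapped_reads=10_000_000)
print(value, math.log10(value))
```

The log10 transform is undefined at zero expression, so plotting pipelines typically add a small offset or drop zero values; the source does not say which convention applies to this figure.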

Supplementary Figure 6. DNMT3B RNA transcript expression across all 10 tissues available in the Brain eQTL Almanac.

RNA expression levels are presented as log-transformed normalized intensity values. Tissues include the following: cerebellar cortex

(CRBL), intralobular white matter (WHMT), substantia nigra (SNIG), temporal cortex (TCTX), frontal cortex (FCTX), inferior

olivary nucleus (sub-dissected from the medulla, MEDU), thalamus (THAL), occipital cortex (OCTX), hippocampus (HIPP), and

putamen (PUTM).

Supplementary Figure 7. DBH SNP associations with severe vs mild nicotine dependence across studies. Associations are

shown for the rs56116178-G allele across European/European American participants, as the allele had very low frequency among

African Americans and was thus not included in the meta-analysis. Open diamonds indicate the nicotine dependence association

results, sorted by the size of the originating study, and the black-filled diamond indicates the meta-analysis result. EAs, European

Americans.

Supplementary Figure 8. Cigarettes per day (CPD) reported in each dependence category,

as defined by the Fagerström Test for Nicotine Dependence (FTND). Data were combined

across all COGEND (N=7,878) and deCODE (N=4,947) participants with phenotype data

available to calculate concordance.

[Figure: x-axis, FTND score category (0 to 3, 4 to 6, 7 to 10); y-axis ticks from 0 to 4,000 in steps of 500; legend, CPD <= 10, CPD = 11–20, CPD >= 21.]
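
The cross-classification summarized in this figure can be sketched as follows, using the FTND and CPD category boundaries given in the axis and legend labels. The function names and the example records are hypothetical; they are not COGEND or deCODE data.

```python
from collections import Counter


def ftnd_category(score):
    """Map an FTND score (0 to 10) to the figure's three score ranges."""
    if not 0 <= score <= 10:
        raise ValueError("FTND score must be between 0 and 10")
    if score <= 3:
        return "0 to 3"
    if score <= 6:
        return "4 to 6"
    return "7 to 10"


def cpd_category(cpd):
    """Map cigarettes per day to the figure's three CPD ranges."""
    if cpd < 0:
        raise ValueError("CPD must be non-negative")
    if cpd <= 10:
        return "CPD <= 10"
    if cpd <= 20:
        return "CPD = 11-20"
    return "CPD >= 21"


def cross_tabulate(records):
    """Count participants in each (FTND category, CPD category) cell."""
    return Counter((ftnd_category(f), cpd_category(c)) for f, c in records)


# Hypothetical (FTND score, CPD) pairs for illustration only.
records = [(2, 8), (3, 15), (5, 20), (6, 25), (8, 30), (9, 40)]
for cell, count in sorted(cross_tabulate(records).items()):
    print(cell, count)
```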

References

1. Hancock DB, Reginsson GW, Gaddis NC, Chen X, Saccone NL, Lutz SM et al. Genome-wide

meta-analysis reveals common splice site acceptor variant in CHRNA4 associated

with nicotine dependence. Transl Psychiatry 2015; 5: e651.

2. Bierut LJ, Madden PA, Breslau N, Johnson EO, Hatsukami D, Pomerleau OF et al. Novel

genes identified in a high-density genome wide association study for nicotine

dependence. Hum Mol Genet 2007; 16(1): 24-35.

3. Regan EA, Hokanson JE, Murphy JR, Make B, Lynch DA, Beaty TH et al. Genetic

epidemiology of COPD (COPDGene) study design. COPD 2010; 7(1): 32-43.

4. Landi MT, Chatterjee N, Yu K, Goldin LR, Goldstein AM, Rotunno M et al. A genome-

wide association study of lung cancer identifies a region of chromosome 5p15 associated

with risk for adenocarcinoma. Am J Hum Genet 2009; 85(5): 679-691.

5. Landi MT, Consonni D, Rotunno M, Bergen AW, Goldstein AM, Lubin JH et al.

Environment And Genetics in Lung cancer Etiology (EAGLE) study: an integrative

population-based case-control study of lung cancer. BMC Public Health 2008; 8: 203.

6. Rice JP, Hartz SM, Agrawal A, Almasy L, Bennett S, Breslau N et al. CHRNB3 is more

strongly associated with Fagerstrom Test for Cigarette Dependence-based nicotine

dependence than cigarettes per day: phenotype definition changes genome-wide

association studies results. Addiction 2012; 107(11): 2019-2028.

7. Shaffer JR, Wang X, Feingold E, Lee M, Begum F, Weeks DE et al. Genome-wide

association scan for childhood caries implicates novel genes. J Dent Res 2011; 90(12):

1457-1462.

8. Manolio TA, Rodriguez LL, Brooks L, Abecasis G, Ballinger D, Daly M et al. New

models of collaboration in genome-wide association studies: the Genetic Association

Information Network. Nat Genet 2007; 39(9): 1045-1051.

9. Baker TB, Piper ME, McCarthy DE, Bolt DM, Smith SS, Kim SY et al. Time to first

cigarette in the morning as an index of ability to quit smoking: implications for nicotine

dependence. Nicotine Tob Res 2007; 9 Suppl 4: S555-570.

10. Fagerstrom KO. Measuring degree of physical dependence to tobacco smoking with

reference to individualization of treatment. Addict Behav 1978; 3(3-4): 235-241.

11. Heatherton TF, Kozlowski LT, Frecker RC, Fagerstrom KO. The Fagerstrom Test for

Nicotine Dependence: a revision of the Fagerstrom Tolerance Questionnaire. Br J Addict

1991; 86(9): 1119-1127.

12. Laurie CC, Doheny KF, Mirel DB, Pugh EW, Bierut LJ, Bhangale T et al. Quality

control and quality assurance in genotypic data for genome-wide association studies.

Genet Epidemiol 2010; 34(6): 591-602.

13. Johnson EO, Hancock DB, Levy JL, Gaddis NC, Saccone NL, Bierut LJ et al. Imputation

across genotyping arrays for genome-wide association studies: assessment of bias and a

correction strategy. Hum Genet 2013; 132: 509-522.

14. Thorgeirsson TE, Geller F, Sulem P, Rafnar T, Wiste A, Magnusson KP et al. A variant

associated with nicotine dependence, lung cancer and peripheral arterial disease. Nature

2008; 452(7187): 638-642.

15. Gulcher JR, Kristjansson K, Gudbjartsson H, Stefansson K. Protection of privacy by

third-party encryption in genetic research in Iceland. Eur J Hum Genet 2000; 8(10): 739-742.

16. Sulem P, Gudbjartsson DF, Walters GB, Helgadottir HT, Helgason A, Gudjonsson SA et

al. Identification of low-frequency variants associated with gout and serum uric acid

levels. Nat Genet 2011; 43(11): 1127-1130.

17. Kaprio J. The Finnish Twin Cohort Study: an update. Twin Res Hum Genet 2013; 16(1):

157-162.

18. Kaprio J. Twin studies in Finland 2006. Twin Res Hum Genet 2006; 9(6): 772-777.

19. Loukola A, Buchwald J, Gupta R, Palviainen T, Hallfors J, Tikkanen E et al. A Genome-

wide association study of a biomarker of nicotine metabolism. PLoS Genet 2015; 11(9):

e1005498.

20. Sempos CT, Bild DE, Manolio TA. Overview of the Jackson Heart Study: a study of

cardiovascular diseases in African American men and women. Am J Med Sci 1999;

317(3): 142-146.

21. The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives. The

ARIC investigators. Am J Epidemiol 1989; 129(4): 687-702.

22. Boomsma DI, de Geus EJ, Vink JM, Stubbe JH, Distel MA, Hottenga JJ et al.

Netherlands Twin Register: from twins to twin families. Twin Res Hum Genet 2006; 9(6):

849-857.

23. Willemsen G, de Geus EJ, Bartels M, van Beijsterveldt CE, Brooks AI, Estourgie-van

Burk GF et al. The Netherlands Twin Register biobank: a resource for genetic

epidemiological studies. Twin Res Hum Genet 2010; 13(3): 231-245.

24. Minica CC, Dolan CV, Hottenga JJ, Pool R, Genome of the Netherlands C, Fedko IO et

al. Heritability, SNP- and Gene-Based Analyses of Cannabis Use Initiation and Age at

Onset. Behav Genet 2015; 45(5): 503-513.

25. Vink JM, Willemsen G, Beem AL, Boomsma DI. The Fagerstrom Test for Nicotine

Dependence in a Dutch sample of daily smokers and ex-smokers. Addict Behav 2005;

30(3): 575-579.

26. Edenberg HJ. The collaborative study on the genetics of alcoholism: an update. Alcohol

Res Health 2002; 26(3): 214-218.

27. Bierut LJ, Strickland JR, Thompson JR, Afful SE, Cottler LB. Drug use and dependence

in cocaine dependent subjects, community-based individuals, and their siblings. Drug

Alcohol Depend 2008; 95(1-2): 14-22.

28. Gelernter J, Kranzler HR, Sherva R, Almasy L, Koesterer R, Smith AH et al. Genome-wide

association study of alcohol dependence: significant findings in African- and

European-Americans including novel risk loci. Mol Psychiatry 2014; 19(1): 41-49.

29. Gelernter J, Sherva R, Koesterer R, Almasy L, Zhao H, Kranzler HR et al. Genome-wide

association study of cocaine dependence and related traits: FAM53B identified as a risk

gene. Mol Psychiatry 2014; 19(6): 717-723.

30. Gelernter J, Kranzler HR, Sherva R, Koesterer R, Almasy L, Zhao H et al. Genome-wide

association study of opioid dependence: multiple associations mapped to calcium and

potassium pathways. Biol Psychiatry 2014; 76(1): 66-74.

31. Gelernter J, Kranzler HR, Sherva R, Almasy L, Herman AI, Koesterer R et al. Genome-

wide association study of nicotine dependence in American populations: identification of

novel risk loci in both African-Americans and European-Americans. Biol Psychiatry

2015; 77(5): 493-503.

32. Howie B, Marchini J, Stephens M. Genotype imputation with thousands of genomes. G3

(Bethesda) 2011; 1(6): 457-470.

33. Gordon D, Finch SJ, Nothnagel M, Ott J. Power and sample size calculations for case-

control genetic association tests when errors are present: application to single nucleotide

polymorphisms. Hum Heredity 2002; 54(1): 22-33.

34. Sinnott JA, Dai W, Liao KP, Shaw SY, Ananthakrishnan AN, Gainer VS et al. Improving

the power of genetic association tests with imperfect phenotype derived from electronic

medical records. Hum Genet 2014; 133(11): 1369-1382.

35. Aulchenko YS, Struchalin MV, van Duijn CM. ProbABEL package for genome-wide

association analysis of imputed data. BMC Bioinformatics 2010; 11: 134.

36. GenABEL tutorial. http://www.genabel.org/sites/default/files/pdfs/GenABEL-tutorial.pdf, 2014.

37. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D et al. PLINK: a

tool set for whole-genome association and population-based linkage analyses. Am J Hum

Genet 2007; 81(3): 559-575.

38. Hannon E, Spiers H, Viana J, Pidsley R, Burrage J, Murphy TM et al. Methylation QTLs

in the developing brain and their enrichment in schizophrenia risk loci. Nat Neurosci

2016; 19(1): 48-54.

39. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat Genet 2013;

45(6): 580-585.

40. GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot

analysis: multitissue gene regulation in humans. Science 2015; 348(6235): 648-660.

41. Mele M, Ferreira PG, Reverter F, DeLuca DS, Monlong J, Sammeth M et al. Human

genomics. The human transcriptome across tissues and individuals. Science 2015;

348(6235): 660-665.

42. Ramasamy A, Trabzuni D, Guelfi S, Varghese V, Smith C, Walker R et al. Genetic

variability in the regulation of gene expression in ten regions of the human brain. Nat

Neurosci 2014; 17(10): 1418-1428.

43. Gudbjartsson DF, Helgason H, Gudjonsson SA, Zink F, Oddson A, Gylfason A et al.

Large-scale whole-genome sequencing of the Icelandic population. Nat Genet 2015;

47(5): 435-444.
