

Genome and Epigenome

Sex Differences in Cancer Driver Genes and Biomarkers

Constance H. Li1,2, Syed Haider1, Yu-Jia Shiah1,2, Kevin Thai1, and Paul C. Boutros1,2,3

Abstract

Cancer differs significantly between men and women; even after adjusting for known epidemiologic risk factors, the sexes differ in incidence, outcome, and response to therapy. These differences occur in many but not all tumor types, and their origins remain largely unknown. Here, we compare somatic mutation profiles between tumors arising in men and in women. We discovered large differences in mutation density and sex biases in the frequency of mutation of specific genes; these differences may be associated with sex biases in DNA mismatch repair genes or microsatellite instability. Sex-biased genes include well-known drivers of cancer such as β-catenin and BAP1. Sex influenced biomarkers of patient outcome, where different genes were associated with tumor aggression in each sex. These data call for increased study and consideration of the molecular role of sex in cancer etiology, progression, treatment, and personalized therapy.

Significance: This study provides a comprehensive catalog of sex differences in somatic alterations, including in cancer driver genes, which influence prognostic biomarkers that predict patient outcome after definitive local therapy. Cancer Res; 78(19); 5527–37. ©2018 AACR.

Introduction

Sex differences in cancer have been known at least since 1949 (1), with repeated demonstration that males have higher cancer risk both in studies using North American (e.g., SEER; ref. 2) and international databases (e.g., IARC; ref. 3). Most, but not all, tumor types show increased incidence in men; thyroid cancer, for example, occurs ~2.5 times more frequently in women. These differences remain after controlling for known epidemiologic risk factors (3). At most tumor sites, cancers arising in men induce higher mortality (4); for example, there is a 3-fold increase in lethality from urinary bladder carcinomas in men relative to women (4). Further, there are significant differences in response to treatment: female patients with non–small cell lung cancer respond better to both surgery (5, 6) and chemotherapy (7, 8), even after accounting for differences in variables such as subtype. Female patients with colorectal cancer respond better to surgery, and this difference is driven by improved female survival in the rectal cancer subgroup (9). Similarly, female patients with colorectal cancer also respond better to chemotherapy, which is partially attributed to differences in tumor site and microsatellite instability (10). Finally, a propensity-matched study of nasopharyngeal carcinoma found that females have a survival advantage regardless of tumor stage, radiation technique, and chemotherapy regimen, but that this advantage declines and disappears during menopause (11). Some of these differences in treatment response may be attributed to differences in driver mutations between the sexes, and others to differences in epigenetics or chromatin conformation.

The origins and mechanisms of these sex differences remain a major unresolved question in cancer biology. They may be caused by differences in the expression of genes on the sex chromosomes, in hormone levels, in developmental biology, or in lifestyle features not reflected in current epidemiologic studies. Likely, a mixture of all these components contributes to sex differences in patient outcomes. We hypothesized that, independent of their mechanism, sex differences in cancer would be reflected by differences in somatic mutation profiles. That is, male and female tumors would acquire mutations at different rates and of different types. Recent intriguing data on missense mutations in melanoma support this hypothesis (12). We, therefore, undertook a systematic evaluation of sex-associated biases in mutations in cancer across a broad range of tumor types. Our study provides a comprehensive pan-cancer catalog of sex-biased mutations and a perspective on sex-specific prognostic biomarkers.

Materials and Methods

Data acquisition and processing

mRNA abundance, DNA genome-wide somatic copy-number and somatic mutation profiles for the Cancer Genome Atlas (TCGA) datasets were downloaded from Broad GDAC Firehose (https://gdac.broadinstitute.org/), release 2016-01-28. For mRNA abundance, Illumina HiSeq rnaseqv2 level 3 RSEM-normalized profiles were used. Genes with >75% of samples having zero reads were removed from the respective data set. GISTIC v2 (13) level 4 data were used for somatic copy-number analysis. mRNA abundance data were converted to log2 scale for subsequent analyses. Mutational profiles were based on TCGA-reported MutSig v2.0 calls. All preprocessing was performed in the R statistical environment (v3.1.3).

1 Computational Biology Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada. 2 Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada. 3 Department of Pharmacology and Toxicology, University of Toronto, Toronto, Ontario, Canada.

Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).

Corresponding Author: Paul C. Boutros, Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada. Phone: 647-258-4321; E-mail: [email protected]

doi: 10.1158/0008-5472.CAN-18-0362

©2018 American Association for Cancer Research.

CancerResearch

www.aacrjournals.org 5527

on September 23, 2020. © 2018 American Association for Cancer Research. cancerres.aacrjournals.org Downloaded from


Patients younger than 18, older than 85 or lacking sex annotation were excluded from analysis, resulting in a sample size of 7,131 across all tumor types for copy-number alterations (CNA; 1.5% excluded, Supplementary Table S1) and 6,073 for somatic single-nucleotide variants (SNV; 1.5% excluded; Supplementary Table S1). Genes were excluded if they were mutated in fewer than 20 patients (for CNAs) or 5% of patients (for SNVs). Gene filters were applied independently for pan-cancer and per individual tumor type data set. All analyses excluded genes on the X and Y chromosomes.

Mutation load

Mutation load per patient was calculated as the sum of SNVs across all genes on the autosomes. Mutation load was Box–Cox transformed, and transformed values were compared between the sexes using unpaired two-sided t tests for both pan-cancer and tumor type–specific analysis. A linear regression model was used to adjust mutation load for tumor type for the pan-cancer comparison. Tumor type–specific P values were adjusted using the Benjamini–Hochberg false discovery rate procedure. Tumor types with q values meeting an FDR threshold of 10% were further analyzed using linear regression to adjust for tumor type–specific variables described in Supplementary Table S1. A multivariate q value threshold of 0.05 was then used to determine statistical significance. Full results are in Supplementary Table S2.
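The two generic building blocks named above, the Box–Cox transform and the Benjamini–Hochberg adjustment, can be sketched in a few lines. This is an illustrative Python sketch, not the authors' R code: the function names are hypothetical, the Box–Cox parameter `lam` is supplied by the caller rather than estimated, and how zero mutation counts were handled is not stated in the text, so the sketch simply rejects non-positive input.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform with a caller-supplied lambda (assumption: not estimated here)."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q values in the same order as the input P values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


# Hypothetical per-tumor-type P values; keep those meeting the 10% FDR screen.
p_by_type = [0.01, 0.04, 0.03, 0.20]
q_by_type = benjamini_hochberg(p_by_type)
screened = [i for i, q in enumerate(q_by_type) if q < 0.10]
```

With these made-up P values, the first three tumor types would pass the univariate screen and move on to multivariate modeling.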

Genome instability

Genome instability was calculated as the percentage of the genome affected by copy-number alterations. The number of base pairs for each CNA segment was summed to obtain a total number of base pairs altered per patient. The total number of base pairs was divided by the number of assayed bases excluding the sex chromosomes (~7.8 million bp) to obtain the percentage of the genome altered (PGA). Box–Cox transformed PGA was treated as a continuous variable and compared by sex using two-sided unpaired t tests for all tumor types combined (pan-cancer) and separately (tumor type–specific). Linear regression models were used to adjust PGA for tumor type, age, and race for the pan-cancer comparison. Tumor types where univariate testing indicated putative sex biases in PGA (FDR threshold of 10%) were also adjusted for tumor type–specific variables (Supplementary Table S1). A q value threshold of 0.05 was used to determine statistical significance for multivariate results and full results are presented in Supplementary Table S2.
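As a worked illustration of the PGA definition, the following Python sketch sums the lengths of non-neutral segments and divides by the number of assayed bases. The segment layout (`start`, `end`, `call` tuples with inclusive coordinates and `0` meaning copy-neutral) and the function name are assumptions made for the example; the source does not describe the file format.

```python
def percent_genome_altered(segments, assayed_bp):
    """Percentage of assayed autosomal bases covered by non-neutral CNA segments.

    segments: iterable of (start, end, call) with inclusive coordinates,
              where call == 0 means copy-neutral (hypothetical layout).
    assayed_bp: number of assayed bases excluding the sex chromosomes.
    """
    if assayed_bp <= 0:
        raise ValueError("assayed_bp must be positive")
    altered_bp = sum(end - start + 1 for start, end, call in segments if call != 0)
    return 100.0 * altered_bp / assayed_bp


# Toy patient: one gain and one loss of 100 bp each out of 1,000 assayed bases.
toy_segments = [(1, 100, 1), (201, 300, 0), (301, 400, -1)]
toy_pga = percent_genome_altered(toy_segments, assayed_bp=1000)
```

The resulting per-patient PGA values would then be Box–Cox transformed before the sex comparison, as described above.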

Genome-spanning CNA analysis

Adjacent genes whose copy-number profiles across patients were highly correlated (Pearson r > 0.95) were binned. The copy-number call for each patient was taken to be the majority call across all genes in each bin. Copy-number calls were collapsed to ternary (loss, neutral, gain) representation by combining loss groups (monoallelic and biallelic) and gain groups (low and high). The number of loss, neutral, and gain calls was summed per bin and sex, and assessed using univariate and multivariate techniques. For univariate analysis, proportional differences between the sexes for gains and losses were tested for each bin using proportions tests. To account for multiple testing, FDR correction was performed and an FDR threshold of 10% was used to select bins for further multivariate analysis.
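The collapse to ternary calls and the per-bin majority vote can be sketched as follows. This Python sketch assumes gene-level calls are coded -2, -1, 0, 1, 2 (biallelic loss through high gain), which is an assumption about the input encoding; the function names are hypothetical, and ties in the majority vote are resolved in favor of the call seen first, a choice the source does not specify.

```python
from collections import Counter


def to_ternary(call):
    """Collapse a five-level call (-2..2, assumed coding) to -1 (loss), 0 (neutral), 1 (gain)."""
    return (call > 0) - (call < 0)


def bin_majority_call(gene_calls):
    """Majority ternary call across the genes of one bin for one patient."""
    counts = Counter(to_ternary(call) for call in gene_calls)
    # most_common keeps first-seen order among ties (assumed tie rule).
    return counts.most_common(1)[0][0]


def tally_by_sex(bin_calls, sexes):
    """Count loss/neutral/gain calls of one bin separately for each sex label."""
    tallies = {}
    for call, sex in zip(bin_calls, sexes):
        tallies.setdefault(sex, Counter())[call] += 1
    return tallies


# Toy bin of three genes in four patients (rows are patients).
patients = [[-2, -1, 0], [1, 2, 2], [0, 0, 1], [-1, -1, -1]]
calls = [bin_majority_call(row) for row in patients]
tally = tally_by_sex(calls, ["M", "F", "M", "F"])
```

The per-sex tallies are the counts that feed the proportions tests for gains and for losses.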

After identifying candidate pan-cancer significant bins from univariate proportions testing, generalized linear modeling was used to reduce false positives that may arise from unbalanced tumor type subsets of the pan-cancer data. Multivariate logistic regression (MLR) was used to adjust ternary CNA data for sex, age, race, and tumor type. The MLR sex term was tested for significance and FDR corrected to identify bins with pan-cancer sex biases (q < 0.05).

The same approach was applied to each tumor type individually. Proportions tests were used to select bins for multivariate analysis (q value < 0.1). MLR was again used to adjust ternary copy-number call for clinical variables. MLR modeling for each tumor type varies based on available clinical data. Tumor type–specific models were fit independently per univariately significant bin and variable significance for each bin was extracted from the fitted models. FDR correction was used and an FDR threshold of 0.05 was used. A description of pan-cancer and tumor type–specific models, along with a breakdown of the data for each group, can be found in Supplementary Table S1 and results can be found in Supplementary Tables S3–S5.

CNA-mRNA functional analysis

Genes in bins altered by sex-biased CNAs after multivariate adjustment for kidney clear cell and kidney papillary cell cancers were further investigated to determine sex-biased functional effects. Available mRNA samples were matched to those used in CNA analysis. For each gene affected by a sex-biased loss, its mRNA abundance was modeled against sex, copy-number loss status, and a sex–copy-number loss interaction term. The interaction term was used to identify genes with sex-biased mRNA changes. FDR-adjusted P values and fold changes were extracted for visualization. A q value threshold of 0.05 was used for statistical significance. For genes affected by sex-biased gains, the same procedure was applied using copy-number gains.

CNA-mRNA survival analysis

Genes found to have significant or trending (FDR threshold of 10%) sex biases in the CNA-mRNA functional analysis were further analyzed using Cox proportional hazards modeling. That is, we focused on genes that were both altered by sex-biased CNAs (MLR q value < 0.05) and showed mRNA abundance differences between the copy-number neutral and loss/gain groups for either sex (sex–loss interaction q < 0.1). For each gene, the mRNA abundance was median dichotomized over all samples to identify low- and high-expression groups. Cox proportional hazard regression models incorporating sex, mRNA group, and a sex–mRNA group interaction were fit for overall survival after checking the proportional hazards assumption. FDR-adjusted interaction P values and log2 hazard ratios were extracted for visualization. A q value threshold of 0.1 was used to identify genes with sex-influenced survival.
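The covariates entering such a Cox model can be illustrated with a small Python sketch that median-dichotomizes one gene's abundance and builds the sex, mRNA group, and interaction columns. The 0/1 coding of sex, the rule that values equal to the median fall in the low group, and the function names are all assumptions for illustration; the Cox fit itself is not shown.

```python
import statistics


def median_dichotomize(abundance):
    """Label samples 1 (high) if above the median over all samples, else 0 (low)."""
    cutoff = statistics.median(abundance)
    # Assumption: samples exactly at the median are placed in the low group.
    return [1 if value > cutoff else 0 for value in abundance]


def interaction_design(sex, abundance):
    """Rows of (sex, mRNA group, sex x mRNA group) for one gene.

    sex is assumed to be coded 0/1; which sex is the reference level is arbitrary here.
    """
    groups = median_dichotomize(abundance)
    return [(s, g, s * g) for s, g in zip(sex, groups)]


# Toy gene measured in four samples.
rows = interaction_design([0, 1, 0, 1], [5.0, 6.0, 7.0, 8.0])
```

In a model built on these columns, the coefficient on the third column is the sex–mRNA group interaction whose P value is FDR-adjusted in the text.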

Genome-spanning SNV analysis

We focused on genes mutated in at least 5% of patients. All genes tested are listed in Supplementary Table S6. Mutation data were binarized to indicate presence or absence of SNV in each gene per patient. Proportions of mutated genes were compared between the sexes using proportions tests for univariate analysis. FDR correction was used to adjust P values and a q value threshold of 0.1 used to select genes for multivariate analysis.
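A two-sample proportions test of the kind used for the univariate screen can be sketched as a chi-square test on a 2 × 2 table. The source does not say which variant was run, so this Python sketch is an assumption: it exposes an optional Yates continuity correction and uses the one-degree-of-freedom identity P = erfc(sqrt(χ²/2)); the function name and toy counts are hypothetical.

```python
import math


def two_proportion_test(mutated_a, total_a, mutated_b, total_b, correct=True):
    """Chi-square test (1 df) that two groups share the same mutation frequency.

    Returns (chi_square, p_value). correct=True applies a Yates continuity correction.
    """
    observed = [
        [mutated_a, total_a - mutated_a],
        [mutated_b, total_b - mutated_b],
    ]
    row_totals = [total_a, total_b]
    col_totals = [mutated_a + mutated_b, total_a + total_b - mutated_a - mutated_b]
    grand_total = total_a + total_b
    if min(row_totals) == 0 or min(col_totals) == 0:
        raise ValueError("test undefined when a row or column total is zero")
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            deviation = abs(observed[i][j] - expected)
            if correct:
                deviation = max(0.0, deviation - 0.5)
            chi_square += deviation ** 2 / expected
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Toy gene: mutated in 30 of 100 male-derived and 10 of 100 female-derived tumors.
chi_square, p_value = two_proportion_test(30, 100, 10, 100, correct=False)
```

The per-gene P values from such a test would then be FDR-adjusted, with genes at q < 0.1 passed on to logistic regression.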

After identifying pan-cancer univariately significant genes from proportions testing, binary logistic regression (LR) was used to reduce false positives that may arise from unbalanced tumor type subsets of the pan-cancer data. Age and race were also included in the pan-cancer model. FDR correction was again applied and genes with significant pan-cancer sex terms were extracted from the models (q value < 0.05).

Cancer Res; 78(19) October 1, 2018

LR was also used for multivariate analysis of each individual tumor type to adjust for clinical variables. The same model variables from the CNA MLR models were used. Tumor type–specific models were fitted independently per univariately selected gene and variable significance for each gene was extracted from the fitted models as P values. FDR correction was used to adjust P values and a LR q value threshold of 0.05 was used. A description of pan-cancer and tumor type–specific models can be found in Supplementary Table S1. A summary of results can be found in Supplementary Table S5.

Validation of sex biases

Copy-number data for tumor types with sex-biased CNAs were downloaded from the Progenetix database (14) as a meta-analysis data set. Matching genomic regions were analyzed using proportions tests to validate genes in sex-biased CNAs. Similarly, somatic SNV data were obtained from cBioPortal and the ICGC Data Portal and analyzed to validate sex-biased somatic SNV load and genes with sex-biased mutation frequencies. A description of validation data, data sources, and results is available in Supplementary Table S7.

Multigene prognostic models

Computationally purified tumor mRNA profiles for the Director's Challenge data were downloaded (15). The training and validation cohorts were processed and split as previously described and were checked for balance between male and female samples. Colon transcriptomic data were downloaded (16, 17) and reprocessed and normalized. Colon training and validation cohorts were balanced for data source, sex, and survival status. Survival modeling was performed using overall survival as the clinical endpoint for both datasets.

To identify genes univariately associated with survival, purified mRNA abundance was median dichotomized for each gene to identify low- and high-expression groups. Cox proportional hazard regression models included variables for sex, mRNA group, and the sex–mRNA group interaction, and P values and log2 hazard ratios were extracted for visualization. A P value threshold of 0.01 was used to determine statistical significance.

Ridge regression models were used to train 50,000 randomly generated 100-gene prognostic signatures. The glmnet package (v2.0-5) was used to run 10-fold cross-validation using glmnetcv (α = 0.1) and AUC as the type measure. Signatures were trained using the training cohort and validated in the validation cohort. Signatures were then run on male- and female-only validation patients, and Cox proportional hazards modeling was performed. Signatures that failed the proportional hazards assumption were removed from analysis. The same approach was used to train a signature using the top 100 univariately significant genes.

Statistical analysis and data visualization

All statistical analyses and data visualization were performed in the R statistical environment (v3.2.1) using the BPG (v5.3.4; ref. 18), mlogit (v0.2-4), glmnet (v2.0-5), and pROC (v1.8) packages.

Results

Sex biases in mutation burden

We leveraged data from TCGA studies comprising 7,131 matched tumor–normal pairs of 18 tumor types: 4,265 from males and 2,866 from females (Supplementary Table S1). We focused on somatic CNAs and SNVs in protein-coding genes as they are well-established driver events. These data are well powered to detect differences in driver-gene mutation frequencies between tumors arising in men and those arising in women (Supplementary Fig. S1). We excluded genes and regions of the X and Y chromosomes and analyzed autosomal differences (19).

We first compared pan-cancer mutational burden between tumors arising in men and those arising in women. Male-derived tumors exhibited a higher density of somatic-coding SNVs than female-derived tumors in univariate analysis (difference in means = 0.17; 95% CI, 0.14–0.20, P = 1.0 × 10^-29, unpaired Welch t test on Box–Cox transformed mutation load; Supplementary Fig. S2). This sex bias persisted even after multivariate analysis adjusting for imbalances in sample numbers across tumor type, race, and age (linear regression P = 4.5 × 10^-6; Supplementary Table S2). After finding sex differences on the pan-cancer level, we asked if there were such differences within individual cancer types and focused our analysis on each tumor type. Six of these showed univariate sex biases in mutation density (10% FDR threshold; Supplementary Fig. S2) and were further investigated using tumor type–specific multivariate modeling. Again, we used Box–Cox transformation and linear modeling to determine whether sex remained a significant variable after adjusting for possible confounders (linear regression q values given in Fig. 1A; model-specific variables described in Supplementary Table S1). Finally, because the association between sex and mutation load may be biased by later stage male-derived tumors (Supplementary Table S1), we created a sub–pan-cancer model using only tumor types with stage data and found that higher mutation prevalence in male-derived samples persisted after accounting for stage. A summary of univariate and multivariate results can be found in Supplementary Table S2.

Of the six tumor types with univariate sex differences (Fig. 1A), males exhibited more somatic-coding SNVs in bladder urothelial cancer (BLCA: difference in Box–Cox means = 0.55; 95% CI, 0.20–0.90; multivariate q = 3.6 × 10^-3), melanoma (SKCM: difference in Box–Cox means = 0.78; 95% CI, 0.29–1.3; multivariate q = 0.037), renal papillary cell cancer (KIRP: difference in Box–Cox means = 2.2; 95% CI, 0.81–3.6; multivariate q = 0.019), and liver hepatocellular cancer (LIHC: difference in Box–Cox means = 0.16; 95% CI, 0.049–0.27; multivariate q = 0.019). There was an opposite trend in glioblastoma where female-derived samples had higher mutation burden (GBM: difference in Box–Cox means = 1.6; 95% CI, 0.14–3.0; multivariate q = 0.094). Using independent sequencing datasets, we validated the male biases seen in bladder, liver, lung adenocarcinoma, and skin cancers (Supplementary Table S7).

To see if these sex differences affected multiple mutation types, we also compared the load of CNAs across tumor types based on the percentage of genome altered, which is a prognostic marker in several tumor types (20–22). A putative univariate sex bias in pan-cancer PGA was not significant after multivariate adjustment (Supplementary Fig. S2); however, 4/18 individual tumor types showed univariate sex differences in PGA (Supplementary Fig. S2). These were further analyzed with multivariate modeling to examine the influence of sex (Fig. 1B; Supplementary Table S2). Males showed elevated genomic instability in stomach and esophageal cancer (STES: difference in Box–Cox means = 1.7; 95% CI, 0.92–2.4; multivariate q = 9.7 × 10^-3), head and neck cancer (HNSC: difference in Box–Cox means = 1.9; 95% CI, 1.0–2.8; multivariate q = 0.016), and kidney clear cell cancer (KIRC: difference in Box–Cox means = 0.40; 95% CI, 0.14–0.67; multivariate q = 0.019). A strong opposite trend was seen in sarcoma, where PGA was higher in female-derived tumors (SARC: difference in Box–Cox means = 1.5; 95% CI, 0.41–2.7; multivariate q = 0.021).

Measures of mutation burden such as SNV load and PGA may be correlated with defects in DNA mismatch repair (MMR). For example, microsatellite instability (MSI), a marker of defective DNA MMR, is more common in some tumor types (23) and could be a confounder in the relationship between mutation burden and sex. We further examined three tumor types with available MSI-monodinucleotide assay data: colorectal, pancreatic, and stomach and esophageal cancers. In samples with MSI data (Supplementary Table S1), we found an association between MSI and sex in stomach and esophageal cancer (Pearson χ² P = 1.4 × 10^-5; 40% of female-derived samples vs. 26% of male-derived samples; Fig. 1C) and colorectal cancer (Pearson χ² P = 0.025; 33% of female-derived samples vs. 25% of male-derived samples; Supplementary Fig. S3). By contrast, MSI status was not sex associated in pancreatic cancer (Pearson χ² P = 0.63). Incorporating MSI into our analyses of SNV burden and PGA, we first noted that MSI was associated with increased SNV burden but not PGA in all three tumor types. We then used multivariate models including MSI to examine the interplay between sex, MSI, and mutation burden. Intriguingly, though there was no univariate relationship between sex and SNV burden in stomach and esophageal cancer, a novel sex bias emerged after adjusting for MSI (MV P = 0.023; Fig. 1D). We observed the same effect in an independent data set (Supplementary Table S7). The association between sex and PGA persisted in this new model, reinforcing the sex bias in PGA for this tumor type. Because MSI is thought to result from defective DNA MMR, we also looked for sex biases specifically in a set of seven MMR genes (24). Though we did not find sex biases in the mutation rates of DNA MMR genes, we observed significantly lower mRNA abundance in female-derived tumors for MLH1 (male mean = 8.89, female mean = 8.5, 95% CI, 0.19–0.62, t test q = 0.0011) and PMS2 (male mean = 9.0, female mean = 8.87, 95% CI, 0.05–0.21, t test q = 0.0060). Taken together, this suggests that differential mRNA abundance may form a link between MMR and sex biases in mutation load in stomach and esophageal cancer. We did not find novel sex biases in colorectal or pancreatic mutation burden after accounting for MSI (Supplementary Fig. S3).

To investigate whether sex-biased mutation load is generally associated with DNA MMR, we also looked specifically at MMR genes in all tumor types with sex-biased mutation load. We found decreased MSH2 (male mean = 8.45, female mean = 8.83, 95% CI, 0.22–0.53, t test q = 3.98 × 10^-6), MSH3 (male mean = 8.50, female mean = 8.71, 95% CI, 0.082–0.34, t test q = 1.51 × 10^-3), MSH6 (male mean = 9.12, female mean = 9.65, 95% CI, 0.37–0.67, t test q = 4.57 × 10^-10) and PMS1 (male mean = 7.71, female mean = 8.01, 95% CI, 0.14–0.46, t test q = 2.26 × 10^-4) mRNA abundance in male kidney papillary tumors, corresponding with higher male mutation prevalence. Similarly, male mRNA abundance of PMS2 (male mean = 8.84, female mean = 8.97, 95% CI, 0.025–0.24, t test q = 0.055) and MLH3 (male mean = 8.70, female mean = 8.87, 95% CI, 0.039–0.30, t test q = 0.055) was also lower than that of female-derived tumors in liver cancer. This suggests that for some tumor types, differences in mutation load may be explained by sex biases in the efficiency of MMR. Taken together, this analysis of mutation burden identified sex biases across several tumor types even after adjusting for race, tumor stage, and smoking history, among others. Indeed, a sex bias in stomach and esophageal cancer was only discovered after adjusting for MSI status, highlighting its importance. Finally, changes in the abundances of DNA MMR mRNA form a candidate mechanism for sex biases in mutation density.

Figure 1. Mutation burden is sex biased. We found sex differences in somatic mutation load (A) and genome instability (B). Each point represents a sample (male-derived, blue; female-derived, pink). We focused on tumor types with univariately significant sex differences in mutation and show q values from multivariate modeling here. Red lines show mean mutation burden for each group. C, Mosaic map showing the relationship between microsatellite instability and sex in stomach and esophageal cancer. D, Higher male mutation prevalence emerges after adjusting for microsatellite instability. Adjusted Box–Cox transformed data are shown.

Sex biases in somatic CNAs

Differences in mutation density might reflect changes in specific driver genes, or alternatively global changes as might be induced by differences in DNA damage or repair. To distinguish these possibilities, we compared male- and female-derived tumors in the entire pan-cancer cohort (Fig. 2A). We binned adjacent genes across all samples so that all genes within a bin had highly correlated sample CNA profiles (Pearson r > 0.95). We then calculated the average bin CNA profile per sample and compared the rates of bin gain and loss between the sexes using proportions tests. Bins that were significant in univariate analysis (10% FDR threshold) were further analyzed using MLR. Bins with MLR q values < 0.05 contain genes lost or gained at significantly different rates between the sexes.
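The binning step can be illustrated with a small Python sketch that walks genes in genomic order and extends the current bin whenever a gene's profile correlates with its immediate neighbor above the threshold. Comparing each gene only with the previous gene, treating constant profiles as uncorrelated, and the function names are assumptions for illustration; the exact grouping rule is not spelled out in the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length profiles; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a flat profile never joins a neighbor's bin
    return covariance / math.sqrt(var_x * var_y)


def bin_adjacent_genes(profiles, threshold=0.95):
    """Group gene indices (in genomic order) into bins of highly correlated neighbors."""
    bins = []
    for index, profile in enumerate(profiles):
        if bins and pearson_r(profiles[index - 1], profile) > threshold:
            bins[-1].append(index)
        else:
            bins.append([index])
    return bins


# Toy data: three adjacent genes, each with copy-number calls in four patients.
toy_profiles = [[0, 1, 2, 1], [0, 1, 2, 1], [2, 1, 0, 1]]
toy_bins = bin_adjacent_genes(toy_profiles)
```

Here the first two genes share a bin and the third starts a new one, so the sex comparison would be run on two bins rather than three genes.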

In pan-cancer analysis, we discovered sex-associated differential CNAs in broad genomic segments covering 3,442 of the 23,693 genes annotated to autosomes. The vast majority of these (94.5%; 3,251 genes) were amplifications. Concordant with PGA observations (Fig. 1B), most were more prevalent in male-derived tumors (Supplementary Tables S3 and S4). Numerous cancer driver genes were sex biased in their CNA profile. For example, the MYC oncogene was amplified in 48% of male-derived tumors vs. 37% of female-derived tumors (q = 0.037, MLR). Hence, sex biases are seen in both genome-wide and in pan-cancer gene-specific CNA mutation profiles.
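As an illustration of the univariate screen, the sketch below compares two proportions with a pooled two-sample z-test. This is an assumption about the form of the proportions test (no continuity correction is applied), and the counts are hypothetical, chosen only to mirror the 48% vs. 37% MYC amplification rates; it does not reproduce the multivariate q value reported above.

```python
from math import sqrt, erfc

def two_proportion_test(k1, n1, k2, n2):
    """Pooled two-sample z-test for a difference in proportions.

    k1/n1 and k2/n2 are event counts and group sizes.
    Returns (difference, z statistic, two-sided P value).
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return p1 - p2, 0.0, 1.0  # no variation in either group
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided standard normal tail
    return p1 - p2, z, p_value

# Hypothetical cohort: 1,000 male-derived and 1,000 female-derived tumors.
diff, z, p = two_proportion_test(480, 1000, 370, 1000)
print(f"difference = {diff:.2f}, z = {z:.2f}, P = {p:.2e}")
```

In the analysis described here, bins passing this kind of screen were then carried forward to multivariate modeling, so a univariate P value alone is not the final evidence for a sex bias.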

To evaluate if these large-scale pan-cancer differences in CNAs of specific genes also occurred in individual tumor types, we applied the same methodology to each tumor type. We created tumor type–specific gene bins and again used multivariate modeling to control for tumor type–specific factors (Methods; Supplementary Table S1). Sex-biased CNAs affecting thousands of genes were detected in eight tumor types: kidney clear cell, kidney papillary, head and neck, stomach and esophageal, liver, bladder, and both lung adenocarcinoma and squamous cell cancer (Fig. 2B; Supplementary Figs. S4–S10; Supplementary Table S5). Some sex-biased events were highly focal, such as female-biased loss of NCKAP5 in head and neck cancer (19% of male-derived vs. 37% of female-derived tumors; q = 0.046, MLR; Supplementary Fig. S5; Supplementary Table S3). Other sex-biased events covered broad genomic segments, such as whole-chromosome arms.

We performed pathway enrichment analysis for each tumor type to investigate functional consequences of sex-biased CNAs. Significant gene ontology terms related to genes in sex-biased gains and losses were found using g:Profiler (25) and interaction networks were visualized in Cytoscape (26) using Enrichment Map (Supplementary Fig. S11; ref. 27). The largest perturbed networks included metabolic processes in liver cancer, as well as nuclear organization and regulation processes in kidney clear cell cancer. In head and neck cancer, sex-biased CNAs affect genes related to lipoprotein and sterol activity. Immune-related processes were significant in several tumor types including stomach and esophageal and both kidney clear cell and papillary cancers. These pathway results suggest sex-biased CNAs may lead to downstream biases in biological processes.

To further characterize the consequences of sex-biased CNAs, we focused on kidney clear cell tumors (KIRC), a tumor type with robust statistical power (n_male = 336; n_female = 185; Supplementary Fig. S1) and strong evidence of sex-biased PGA (Fig. 1B). After multivariate adjustment for age, race, stage, and grade, we identified 3,581 genes contained in sex-biased losses and 138 genes contained in sex-biased gains. All of these were more commonly mutated in male-derived tumors (Fig. 2B; Supplementary Tables S4 and S5). All sex-biased CNAs were broad events, with large losses of chromosomes 3, 6p, 8q, and 9 (covering the driver genes TSC1 and CDKN2A (28)). Most prominent of these was a large region from 3p11.1 to 3p12.3 deleted in ~60% of male-derived tumors but only ~35% of female-derived tumors (q < 10⁻³ for all genes; MLR).

To determine if these sex-biased CNAs influence the tumor transcriptome, we evaluated mRNA abundances in matched patient samples. We first focused on genes within large segments of sex-biased losses. We used linear regression to model mRNA as a function of sex, copy-number loss vs. no copy-number loss, and the interaction between sex and copy-number loss status. This allowed us to identify not only mRNA changes associated with copy-number loss alone, but also interactions where sex and copy-number loss synergize for an additional effect on mRNA. Approximately half of genes in regions affected by sex-biased copy-number losses were associated with changes in mRNA abundance (Supplementary Fig. S12), indicating that sex-biased CNAs have transcriptional consequences. In addition, there were multiple genes with interaction effects (10% FDR threshold) on chromosomes 3, 6, and 9 (Supplementary Fig. S12, red lines), including genes where sex and copy-number loss together changed mRNA abundance over 2-fold relative to their effects in isolation. These sex-copy-number interactions suggested that sex-biased copy-number changes induce transcriptional changes, and in some cases these changes vary by sex.
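The structure of this regression can be sketched for a single gene. With two binary predictors and their interaction, the model mRNA = b0 + b1·male + b2·loss + b3·(male × loss) is saturated, so its least-squares coefficients are simple contrasts of the four group means. The sketch below uses that identity; the function name `interaction_fit`, the choice of female without loss as the reference group, and the toy abundances are assumptions, and no P or q values are computed.

```python
from statistics import mean

def interaction_fit(records):
    """Least-squares coefficients of mRNA ~ male + loss + male:loss.

    records: iterable of (is_male, has_loss, mrna) with 0/1 indicators.
    Every sex-by-loss cell must contain at least one sample.
    """
    cells = {(s, l): [] for s in (0, 1) for l in (0, 1)}
    for is_male, has_loss, mrna in records:
        cells[(is_male, has_loss)].append(mrna)
    m = {key: mean(values) for key, values in cells.items()}
    loss_effect = m[(0, 1)] - m[(0, 0)]  # effect of loss in females
    return {
        "intercept": m[(0, 0)],
        "male": m[(1, 0)] - m[(0, 0)],
        "loss": loss_effect,
        # extra effect of loss in males beyond the effect in females
        "male:loss": (m[(1, 1)] - m[(1, 0)]) - loss_effect,
    }

# Hypothetical log2 mRNA abundances for one gene.
toy = [
    (0, 0, 10.0), (0, 0, 10.0),  # female, no loss
    (0, 1, 9.0), (0, 1, 9.0),    # female, loss
    (1, 0, 10.0), (1, 0, 10.0),  # male, no loss
    (1, 1, 7.0), (1, 1, 7.0),    # male, loss
]
fit = interaction_fit(toy)
print(fit)
```

In this toy gene, loss lowers abundance by 1 unit in female-derived samples and by 3 units in male-derived samples, so the interaction coefficient is −2: the kind of sex-dependent effect the interaction term is meant to capture.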

Next, we extended our focus to all genes affected by sex-biased losses (proportions test q < 0.1 and MLR q < 0.05) whose mRNA was repressed across samples with the loss. Applying the same linear regression model, we examined the effect of copy-number loss in the sexes and again extracted both the copy-number loss and the sex-copy-number loss interaction terms. Of the 2,165 genes, 74% showed associations between copy-number loss and decreased mRNA abundance (Supplementary Fig. S13, black points). In addition, copy-number loss affected mRNA abundance differently between the sexes in 36 genes that showed significant interactions between sex and copy-number loss (sex-loss interaction, q < 0.1; Fig. 2C, red points). Thus, sex-biased CNAs are associated with divergent transcriptomes in male- and female-derived tumors.

Finally, to demonstrate that these transcriptomic divergences are functional and clinically relevant, we evaluated the association of the 36 genes with sex biases in both CNAs and mRNA abundance (sex–loss interaction q < 0.1) with overall patient survival. Using univariate Cox proportional hazards modeling, we identified 16 sex-biased genes associated with outcome in both male and female tumors (Fig. 2D). Several genes showed strikingly divergent clinical associations, and all 16 with sex-biased survival were more prognostic in female-derived samples than male. For instance, loss of LATS1 was a marker of poor prognosis in women (HR = 0.39; 95% CI, 0.17–0.85, q = 0.03), but not men (HR = 1.2; 95% CI, 0.80–1.8, q = 0.67; Fig. 2E). Conversely, UBAC1 loss was a marker of good overall survival in women (HR = 2.64; 95% CI, 1.5–4.6, q = 0.0037) but not men (HR = 1.4; 95% CI, 0.95–2.1, q = 0.34; Supplementary Fig. S14). Similar patterns of sex-associated CNAs inducing divergent transcriptomes associated with clinical aggressivity were observed for KIRP (Supplementary Fig. S15), demonstrating the generality of this phenomenon.

Taken together, these data demonstrate that the frequency of somatic CNAs in specific genes is sex biased in many, but not all tumor types. These differences do not appear to be a result of well-known clinical or epidemiologic factors. Sex-biased CNAs are associated with sex biases in the transcriptome (and presumably the proteome as well), and these transcriptomic differences are associated with differences in clinical outcome within and between the sexes.

Figure 2.

Functional sex differences in CNAs are associated with outcome. Sex differences in CNAs for pan-cancer (A) and kidney clear cell cancer (B). Each plot shows, from top to bottom, the q value showing significance of sex from multivariate modeling, with yellow (green) points corresponding to 0.01 < q < 0.05 and deep blue (red) points corresponding to q < 0.01; the proportion of samples with aberration; the difference in proportion between male and female groups for amplifications; the same repeated for deletions; and the CNA profile heat map. The columns represent genes ordered by chromosome. Light blue and pink points represent data for male- and female-derived samples, respectively. C, Transcriptome differences between the sexes are seen in the interaction between sex and copy-number loss status in mRNA abundance modeling. Red points are genes with significant sex–copy-number loss interaction terms (q < 0.05). D, Genes with sex-biased copy-number loss and mRNA changes are associated with differential overall survival outcomes between the sexes. Again, the interaction term estimate and q values were used to determine genes with sex biases in survival. E, LATS1 is a marker of poor overall survival in women, but not in men.

Sex biases in somatic SNVs

We next asked whether sex differences were specific to somatic CNAs or if they also occurred in other mutation types. We compared the proportions of male-derived (n_male = 3,591) and female-derived (n_female = 2,482) samples with SNVs in pan-cancer univariate analysis. Similar to our CNA analysis, we adjusted for unequal sample numbers of the tumor types and other factors using LR, here with a binary response variable indicating whether the gene harbored SNVs or not. In total, we tested 103 genes that were mutated in at least 5% of samples (Supplementary Table S6). Of these, four genes showed sex biases after adjustment for tumor type, age, and race, and all four showed elevated mutation rates in male-derived samples (Fig. 3A). Some of these mutations may be passengers and reflect increased DNA damage in male-derived tumors.

Similarly to our CNA analysis, we next evaluated each of the 18 tumor types independently. We screened for candidate mutations using univariate analyses and FDR adjustment and then performed multivariable modeling. We excluded genes mutated in less than 5% of samples, meaning many lower-frequency sex-biased genes have not yet been uncovered and our results represent a lower bound of sex biases in somatic SNVs. Of the 18 tumor types evaluated, four exhibited sex-biased mutations in specific genes: stomach and esophageal, hepatocellular carcinoma, and both kidney clear cell and kidney papillary cell cancers (Fig. 3B–D; Supplementary Fig. S16; Supplementary Table S6). In stomach and esophageal cancer, all 10 sex-biased genes were mutated in a greater fraction of female-derived samples, including a number of transcription factors such as ZFHX3 (95% CI of the difference, 2.5%–15%, q = 0.018, LR), ZBTB20 (95% CI of the difference, 2.2%–14%, q = 0.034, LR), and GTF3C1 (95% CI of the difference, 4.0%–16%, q = 0.012, LR; Fig. 3B; Supplementary Table S6).
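The FDR adjustment used in the univariate screen can be sketched as follows. The specific procedure is not named in this section, so the Benjamini–Hochberg step-up method is assumed here; the function name `benjamini_hochberg`, the gene labels, the P values, and the 10% screening threshold applied to this toy example are all hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (q values).

    The i-th smallest P value is scaled by m / i, and a running minimum
    taken from the largest P value downward keeps the q values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical univariate P values for four genes in one tumor type.
genes = ["geneA", "geneB", "geneC", "geneD"]
p_vals = [0.001, 0.02, 0.04, 0.30]
q_vals = benjamini_hochberg(p_vals)

# Genes passing an assumed 10% FDR screen go on to multivariable modeling.
candidates = [g for g, q in zip(genes, q_vals) if q < 0.1]
print(candidates)
```

Only the genes in `candidates` would then be tested in the multivariable model, which is why the screen is deliberately more permissive than the final threshold.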

The largest differences in mutation frequency were seen in liver carcinoma, where two genes showed dramatic sex biases in mutation frequency (Fig. 3C; Supplementary Table S6). Male tumors were strongly enriched for mutations in β-catenin (CTNNB1), with 33% of male-derived tumors harboring a mutation compared with 12% of female-derived tumors (95% CI for the difference, 12%–30%, q = 0.0014, LR). These large differences suggest mutational associations with etiologic factors. For example, CTNNB1 mutations occur more frequently in tumors associated with Hepatitis B (95% CI for the difference, −1.9 to 27%, P = 0.07), and sex remains significant even after accounting for viral and alcohol risk factors. We validated this higher male mutation frequency in CTNNB1 in an independent patient cohort from the Liver Cancer - NCC, JP project on the ICGC Data Portal (17% higher; 95% CI for the difference, 9.7%–25%, P = 2.7 × 10⁻⁵; Supplementary Table S7).

The deubiquitinating enzyme BAP1 was almost exclusively mutated in female-derived hepatocellular tumors, occurring in 14% of female-derived tumors and 1.6% of male-derived tumors (95% CI of difference, 5.6%–20%, q = 0.017, LR). This enrichment of BAP1 mutations was also seen in 15% of female-derived kidney clear cell tumors compared with 6.1% of male-derived tumors (95% CI of difference, 1.7%–15%, q = 0.001, LR; Fig. 3D; Supplementary Table S6); these tumors are not thought to be virally associated. BAP1 has been implicated as a tumor suppressor and is frequently inactivated in kidney clear cell cancer (28, 29). Comparison of mRNA abundance between hepatocellular carcinoma samples with mutated and wild-type BAP1 revealed striking sex differences: female-derived tumors with mutated BAP1 had 1.4-fold decreased mRNA abundance compared with those with wild-type BAP1, compared with a 4-fold decrease in male-derived samples (Supplementary Fig. S17). Indeed, linear modeling confirmed the significant interaction between sex and BAP1 mutation status (P = 5.8 × 10⁻⁵). The same sex-associated mRNA differences were not observed in kidney clear cell cancer (Fig. 3E), but we did observe a striking interaction in survival modeling. BAP1 mutation was associated with poor prognosis in female patients (HR = 2.59; 95% CI, 1.40–4.81, P = 0.0025; Fig. 3F) but not male patients (HR = 0.80; 95% CI, 0.32–1.97, P = 0.62). Indeed, the interaction between sex and BAP1 mutation was significant in Cox proportional hazards survival modeling (interaction q = 0.0025). Mutation of BAP1 is known to be associated with worse prognosis in kidney clear cell cancer, but evidence on its sex-biased prognostic value is conflicting (30).

We extended this mRNA and survival analysis to other sex-biased SNVs in liver, kidney papillary, and stomach and esophageal cancer but did not find additional sex–SNV interactions in these data (Supplementary Fig. S18). However, we noted that EP400 encodes a chromatin remodeling protein thought to be involved in ATM-mediated DNA damage response (31). We returned to the mutation prevalence data to investigate whether sex-biased EP400 mutation in stomach and esophageal cancer was associated with sex biases in mutation burden. Not only is mutated EP400 itself associated with higher SNV load (mutated EP400 mean SNV load = 4.82, wild-type EP400 mean = 3.82, 95% CI, 0.80–1.20, t test P = 4.70 × 10⁻¹³; Supplementary Fig. S19), there is also an interaction between EP400 mutation and sex, where mutation of this gene is associated with a greater increase in mutation burden in female-derived samples than in male-derived samples (Supplementary Fig. S19, interaction P = 0.009). Further, given the relationship between MSI, mutation burden, and sex we described previously, we examined whether there was a relationship between EP400 mutation and MSI-positive samples and found no association (P > 0.05). Finally, we also validated sex-biased EP400 mutation in an independent data set (Supplementary Table S7). Overall, our analysis of somatic SNVs revealed that sex-biased mutation frequency is associated with impacts on mRNA abundance, survival, and mutation burden in several tumor types.

Clinical relevance of transcriptomic sex differences

The differential clinical impact of sex-biased kidney renal cell genes (Figs. 2B–E, 3E and F) suggested that sex may influence the accuracy of biomarkers used to personalize therapy. We asked if sex-naïve approaches to prognostic biomarker development result in biomarkers that predict survival accurately across all samples, but better in one sex than the other. The sex biases in mutational profiles and transcriptional changes suggest that biomarkers developed using data from both sexes without annotation may contain predictive features biased toward the sex in which that tumor type most frequently occurs. We focused on multigene prognostic mRNA signatures, such as those developed for non–small cell lung cancer to identify early-stage patients who might benefit from intensification of therapy (32, 33). We used the benchmark Director's Challenge data set of 443 tumor samples (223 men and 220 women) with mRNA abundance profiles (34) after deconvolution of tumor and stromal expression (15).

Univariate Cox proportional hazards modeling identified stark differences between male- and female-derived tumors (Fig. 4A). Overall, 0.8% of genes were prognostic in both sexes (black points) and 1.5% were prognostic in patients of only one sex (blue and pink points). Strikingly, 79 genes (0.9%) had mRNA-based groups that interacted with sex for an additional effect on survival (red points, P < 0.01). These divergences could be of significant magnitude. For example, elevated tumor abundance of SPINK1 was associated with poor outcome in women only (interaction P = 0.0032; Fig. 4B), while FBXO46 was prognostic in males and not females (interaction P = 0.0070; Supplementary Fig. S20). To assess the generality of these results, we assessed a set of 783 patients with colorectal cancer with median 3.5-year survival. There were again large differences in the magnitude and even direction of association between expression and outcome between the sexes (Fig. 4C; Supplementary Fig. S21).

Figure 3.

Sex biases in driver SNVs. Sex differences in somatic SNVs for pan-cancer (A), stomach and esophageal carcinoma (B), hepatocellular carcinoma (C), and kidney clear cell cancer (D). Each plot shows, from top to bottom, the q value for significance of sex from multivariate modeling, with yellow points corresponding to 0.01 < q < 0.05 and green points corresponding to q < 0.01; proportion of samples with aberration; difference in proportion between male and female groups; mutation prevalence across all samples and a heat map showing mutation status for each sample. E, BAP1 mRNA abundance compared across sex and mutation status for kidney clear cell cancer. The P value for the sex–SNV interaction from mRNA modeling is shown. F, Mutated BAP1 is associated with poor prognosis in female patients, but not male patients in kidney clear cell cancer.

To assess the performance of multigene biomarkers, we applied ridge regression to the top 100 univariately prognostic genes found in the combined sex training cohort. The multigene signature attained an AUC of 0.63 and was prognostic in the validation cohort when sex was not considered (HR = 2.3; 95% CI, 1.32–4.01, P = 0.0035; Supplementary Fig. S22). However, this overall value hid significant sex bias: the signature performed very well in men (AUC = 0.73), but was indistinguishable from chance in women (AUC = 0.54; Fig. 4D). Finally, we verified that male- and female-derived tumors showed fundamentally distinct distributions using the independent training and validation cohorts defined by the data set generators (34) and empirically estimating the null distributions by training 50,000 randomly generated signatures (Supplementary Fig. S23; ref. 35). Together, these results show that large sex differences observed in driver genes lead to differences in the development and application of biomarkers for personalized therapy.
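The way a pooled AUC can mask a sex-specific failure is easy to reproduce on toy data. The sketch below computes the AUC as the fraction of (event, non-event) pairs that a risk score ranks correctly, overall and within each sex. It assumes a binary outcome label rather than the censored survival data used in the study, and the scores, labels, and function names `auc` and `auc_by_group` are hypothetical.

```python
def auc(scores, labels):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

def auc_by_group(scores, labels, groups):
    """Compute the AUC separately within each group label."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        result[g] = auc([scores[i] for i in idx], [labels[i] for i in idx])
    return result

# Hypothetical signature risk scores; label 1 = poor outcome.
scores = [0.9, 0.8, 0.2, 0.1, 0.9, 0.1, 0.8, 0.2]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
sexes = ["M", "M", "M", "M", "F", "F", "F", "F"]

overall = auc(scores, labels)
per_sex = auc_by_group(scores, labels, sexes)
print(overall, per_sex)
```

Here the pooled AUC looks moderate even though the score separates outcomes perfectly in one group and no better than chance in the other, which is the pattern that motivates reporting biomarker accuracy by sex.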

Figure 4.

Sex differences influence prognostic biomarker accuracy. Comparing female and male hazard ratios from univariate Cox proportional hazards modeling in non–small cell lung cancer (A) and colon cancer (C). Red points, genes with significant interaction terms between sex and risk group. Blue and pink points are genes prognostic only in males and females, respectively. Gray points, genes not significant in either sex. B, SPINK1 is prognostic in females but not males in non–small cell lung cancer. D, Sex-specific receiver operating characteristic curves for a 100-gene non–small cell lung cancer signature fit on the combined sex training cohort and tested on female and male test cohorts. Blue lines, males; pink lines, females.


Discussion

The broad and unexplained sex differences in cancer outcomes represent a major gap in our understanding of the disease. We evaluated the molecular origins of these differences by comparing somatic mutation profiles in male- and female-derived tumors across a broad range of tumor types. We discovered large differences in mutation density and sex biases in the frequency of mutation of specific genes. These differences, however, are not uniform across tumor types. Rather, some show very significant sex bias in their mutational profiles, while others show no detectable sex biases. Further, some tumor types show sex bias in SNV mutation profiles, others in CNA mutational profiles, and still others in both. The mechanisms by which these differences occur remain to be elucidated.

Candidate mechanisms include differential chromatin architecture, mutagen exposures, and DNA repair efficacy and bias. Indeed, our analysis of stomach and esophageal cancer suggests a complex relationship between sex and the cancer genome landscape. Our analysis of microsatellite instability in this tumor type posits a mechanism in dysfunctional DNA repair where baseline somatic SNV load is lower in female-derived samples. However, the high proportion of MSI-positive female-derived samples as well as lower mRNA abundance of DNA MMR genes MLH1 and PMS2 lead to higher SNV load in these individuals. Independently, EP400 is not only more frequently mutated in female-derived samples, it also has a greater impact and drives overall female somatic SNV burden higher. As a result, though overall SNV burden appears similar between male- and female-derived samples, more female samples harbor defects in DNA repair. Additional work is needed to further elucidate the interplay between microsatellite instability, DNA repair machinery, mutation load, and sex.

Our statistical modeling incorporates clinical and environmental variables to approach the true association of sex with the genomic characteristic of interest. However, it is important to note the limitations of this method in capturing all confounding variables. First, information on environmental variables is incomplete and may not be accurately reported. Second, adding variables increases model complexity and may decrease overall performance if there are insufficient data to support the model. Nevertheless, the tumor type–specific models in this analysis represent a foundation for putative sex differences, and our findings should be taken in context of each tumor type and its associated risk factors. Another challenge of our study lies in validating our findings in datasets with sufficient power and similar quality survival data. Though we were able to validate a subset of sex-biased CNAs and SNVs, there remain putative TCGA-specific sex biases. These validation challenges may be due to methodological differences between datasets included in meta-analysis and to the high level of heterogeneity in environmental factors that have yet to be accounted for.

Existing literature on sex differences in cancer genomics largely focuses on individual tumor types and on specific genes or on a single data type (23, 26, 36). A previous pan-cancer study incorporating multiple mutation types focused on male-biased loss of function on X chromosome genes (19). Our analysis complements these sex chromosome–specific findings with a more general methodology by broadly analyzing both SNV and CNA mutations using transparent tumor type–specific models to generate a catalog of sex-biased events. We also describe, for the first time, a relationship between mutation load and sex-biased DNA repair deficiency.

The potential consequences of sex-biased SNVs and CNAs range from perturbations of biological pathways such as metabolic processes to changes in mRNA abundance and prognostic biomarker performance. Significant insight into these questions on mechanism will arise from on-going primary tumor whole-genome sequencing and chromatin profiling efforts. Independent of their origins, these mutational sex biases have significant consequences for both preclinical and translational research. Preclinically, the sex of an experimental model (e.g., cell-line, organoid, patient-derived xenograft) may influence the effects of driver-gene mutations and, therefore, should be explicitly considered. From a translational perspective, our results suggest that in some cases, distinct multigene panels should be used to predict prognosis or drug sensitivity in men and women. Overall, these data call for increased study and consideration of the role of sex in cancer etiology, progression, treatment, and personalized therapy.

Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

Authors' Contributions

Conception and design: C.H. Li, Y.-J. Shiah, P.C. Boutros
Development of methodology: C.H. Li, P.C. Boutros
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): C.H. Li, S. Haider, Y.-J. Shiah, K. Thai
Writing, review, and/or revision of the manuscript: C.H. Li, K. Thai, P.C. Boutros
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): C.H. Li, K. Thai, P.C. Boutros
Study supervision: P.C. Boutros

Acknowledgments

This study was conducted with the support of the Ontario Institute for Cancer Research to P.C. Boutros through funding provided by the Government of Ontario. This work was supported by the Discovery Frontiers: Advancing Big Data Science in Genomics Research program, which is jointly funded by the Natural Sciences and Engineering Research Council (NSERC) of Canada, the Canadian Institutes of Health Research (CIHR), Genome Canada, and the Canada Foundation for Innovation (CFI). P.C. Boutros was supported by a Terry Fox Research Institute New Investigator Award and a CIHR New Investigator Award. This work was supported by an NSERC Discovery grant and by Canadian Institutes of Health Research, grant # SVB-145586, to P.C. Boutros. The authors thank all the members of the Boutros lab for insightful discussions. The results described here are in part based upon data generated by the TCGA Research Network: http://cancergenome.nih.gov/.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Received February 1, 2018; revised May 18, 2018; accepted June 26, 2018; published first October 1, 2018.

References

1. Clemmesen J, Busk T. Cancer mortality among males and females in Denmark, England, and Switzerland; incidence of accessible and inaccessible cancers in Danish towns and rural areas. Cancer Res 1949;9:415–21.

2. Cook MB, Dawsey SM, Freedman ND, Inskip PD, Wichner SM, Quraishi SM, et al. Sex disparities in cancer incidence by period and age. Cancer Epidemiol Biomarkers Prev 2009;18:1174–82.


3. Edgren G, Liang L, Adami H-O, Chang ET. Enigmatic sex disparities incancer incidence. Eur J Epidemiol 2012;27:187–96.

4. Cook MB, McGlynn KA, Devesa SS, Freedman ND, Anderson WF. Sexdisparities in cancer mortality and survival. Cancer Epidemiol BiomarkersPrev 2011;20:1629–37.

5. FergusonMK,Wang J, Hoffman PC,Haraf DJ, Olak J,Masters GA, et al. Sex-associated differences in survival of patients undergoing resection for lungcancer. Ann Thorac Surg 2000;69:245–9.

6. Minami H, Yoshimura M, Miyamoto Y, Matsuoka H, Tsubota N. Lungcancer in women: sex-associated differences in survival of patients under-going resection for lung cancer. Chest 2000;118:1603–9.

7. Wakelee HA, Wang W, Schiller JH, Langer CJ, Sandler AB, Belani CP, et al.Survival differences by sex for patients with advanced non-small cell lungcancer on Eastern Cooperative Oncology Group trial 1594. J Thorac Oncol2006;1:441–6.

8. Kris MG, Natale RB, Herbst RS, Lynch TJ Jr, Prager D, Belani CP, et al.Efficacy of gefitinib, an inhibitor of the epidermal growth factor receptortyrosine kinase, in symptomatic patients with non–small cell lung cancer: arandomized trial. JAMA 2003;290:2149–58.

9. Wichmann MW, Muller C, Hornung HM, Lau-Werner U, Schildberg FW.Gender differences in long-term survival of patients with colorectal cancer.Br J Surg 2001;88:1092–8.

10. Elsaleh H, Joseph D, Grieu F, Zeps N, Spry N, Iacopetta B. Association oftumor site and sex with survival benefit from adjuvant chemotherapy incolorectal cancer. Lancet 2000;355:1745–50.

11. OuYang PY, Zhang LN, Lan XW, Xie C, Zhang WW, Wang QX, et al. Thesignificant survival advantage of female sex in nasopharyngeal carcinoma:a propensity-matched analysis. Br J Cancer 2015;112:1554–61.

12. Gupta S, Artomov M, Goggins W, Daly M, Tsao H. Gender disparity andmutation burden in metastatic melanoma. J Natl Cancer Inst 2015;107:djv221.

13. Mermel CH, Schumacher SE, Hill B, Meyerson ML, Beroukhim R, Getz G.GISTIC2.0 facilitates sensitive and confident localization of the targets offocal somatic copy-number alteration in human cancers. Genome Biol2011;12:R41.

14. Cai H, Kumar N, Ai N, Gupta S, Rath P, Baudis M. Progenetix: 12 years ofoncogenomic data curation. Nucleic Acids Res 2014;42:D1055–62.

15. Quon G, Haider S, Deshwar AG, Cui A, Boutros PC, Morris Q. Compu-tational purification of individual tumor gene expression profiles leads tosignificant improvements in prognostic prediction. Genome Med 2013;5:29.

16. Jorissen RN, Gibbs P, Christie M, Prakash S, Lipton L, Desai J, et al.Metastasis-associated gene expression changes predict poor outcomes inpatients with dukes stage B and C colorectal cancer. Clin Cancer Res2009;15:7642–51.

17. Marisa L, de Reyni�es A, Duval A, Selves J, Gaub MP, Vescovo L, et al. Geneexpression classification of colon cancer into molecular subtypes: charac-terization, validation, and prognostic value. PLoSMed 2013;10:e1001453.

18. P'ng C, Green J, Chong LC, Waggott D, Prokopec SD, Shamsi M, et al.BPG: seamless, automated and interactive visualization of scientific data.bioRxiv 2017. doi:10.1101/156067.

19. Dunford A, Weinstock DM, Savova V, Schumacher SE, Cleary JP, Yoda A,et al. Tumor-suppressor genes that escape fromX-inactivation contribute tocancer sex bias. Nat Genet 2017;49:10–6.

20. Vollan HK, Rueda OM, Chin SF, Curtis C, Turashvili G, Shah S, et al.A tumor DNA complex aberration index is an independent predictorof survival in breast and ovarian cancer. Mol Oncol 2015;9:115–27.

21. Lalonde E, Ishkanian AS, Sykes J, Fraser M, Ross-Adams H, Erho N, et al.Tumor genomic and microenvironmental heterogeneity for integratedprediction of 5-year biochemical recurrence of prostate cancer: a retro-spective cohort study. Lancet Oncol 2014;15:1521–32.

22. Hieronymus H, Schultz N, Gopalan A, Carver BS, Chang MT, Xiao Y, et al.Copy number alteration burden predicts prostate cancer relapse. Proc NatlAcad Sci U S A 2014;111:11139–44.

23. Shah SN, Hile SE, Eckert KA. Defective mismatch repair, microsatellite mutation bias, and variability in clinical cancer phenotypes. Cancer Res 2010;70:431–5.

24. Li GM. Mechanisms and functions of DNA mismatch repair. Cell Res 2008;18:85–98.

25. Reimand J, Arak T, Adler P, Kolberg L, Reisberg S, Peterson H, et al. g:Profiler—a web server for functional interpretation of gene lists (2016 update). Nucleic Acids Res 2016;44:W83–9.

26. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003;13:2498–504.

27. Merico D, Isserlin R, Stueker O, Emili A, Bader GD. Enrichment map: a network-based method for gene-set enrichment visualization and interpretation. PLoS One 2010;5:e13984.

28. Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 2013;499:43–9.

29. Peña-Llopis S, Vega-Rubín-de-Celis S, Liao A, Leng N, Pavía-Jiménez A, Wang S, et al. BAP1 loss defines a new class of renal cell carcinoma. Nat Genet 2012;44:751–9.

30. Ricketts CJ, Linehan WM. Gender specific mutation incidence and survival associations in clear cell renal cell carcinoma (CCRCC). PLoS One 2015;10:e0140257.

31. Smith RJ, Savoian MS, Weber LE, Park JH. Ataxia telangiectasia mutated (ATM) interacts with p400 ATPase for an efficient DNA damage response. BMC Mol Biol 2016;17:22.

32. Kratz JR, He J, Van Den Eeden SK, Zhu ZH, Gao W, Pham PT, et al. A practical molecular assay to predict survival in resected non-squamous, non-small-cell lung cancer: development and international validation studies. Lancet 2012;379:823–32.

33. Lau SK, Boutros PC, Pintilie M, Blackhall FH, Zhu CQ, Strumpf D, et al. Three-gene prognostic classifier for early-stage non small-cell lung cancer. J Clin Oncol 2007;25:5562–9.

34. Director's Challenge Consortium for the Molecular Classification of Lung Adenocarcinoma, Shedden K, Taylor JM, Enkemann SA, Tsao MS, Yeatman TJ, et al. Gene expression–based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study. Nat Med 2008;14:822–7.

35. Boutros PC, Lau SK, Pintilie M, Liu N, Shepherd FA, Der SD, et al. Prognostic gene signatures for non-small-cell lung cancer. Proc Natl Acad Sci U S A 2009;106:2824–8.

36. Xiao D, Pan H, Li F, Wu K, Zhang X, He J. Analysis of ultra-deep targeted sequencing reveals mutation burden is associated with gender and clinical outcome in lung adenocarcinoma. Oncotarget 2016;7:22857–64.



Constance H. Li, Syed Haider, Yu-Jia Shiah, et al. Sex Differences in Cancer Driver Genes and Biomarkers. Cancer Res 2018;78:5527-5537.

Updated version: Access the most recent version of this article at: http://cancerres.aacrjournals.org/content/78/19/5527

Supplementary Material: Access the most recent supplemental material at: http://cancerres.aacrjournals.org/content/suppl/2018/10/06/78.19.5527.DC1

   

   
