Cell, Volume 158
Supplemental Information
Predicting cancer-specific vulnerability via data-driven detection of synthetic lethality
Livnat Jerby-Arnon, Nadja Pfetzer, Yedael Y. Waldman, Lynn McGarry, Daniel James, Emma Shanks, Brinton Seashore-Ludlow, Adam Weinstock, Tamar Geiger, Paul A Clemons, Eyal Gottlieb, and Eytan Ruppin
Extended Experimental Procedures
Evaluating the DAta-mIning SYnthetic-lethality-identification
pipeline (DAISY) based on experimentally detected Synthetic Lethal
(SL)-interactions
We tested the fit between the Synthetic Lethal (SL)-pairs identified by the DAta-mIning
SYnthetic-lethality-identification pipeline (DAISY), and those detected in six independent
synthetic lethality screens that were conducted in cancer cell lines: (1) An shRNA screen of 88
kinases conducted in renal carcinoma cells to identify the SL-partners of VHL (Bommi-Reddy et
al., 2008); (2) a screen of a small molecule library encompassing 1,200 drugs and drug-like
molecules that identified agents selectively lethal to endometrial adenocarcinoma cells lacking
functional MSH2 (Martin et al., 2009); (3-4) two high-throughput RNA interference (RNAi)
screens that identified determinants of sensitivity to a PARP1-inhibitor in breast cancer among
(3) DNA repair genes (Lord et al., 2008), and (4) kinases (Turner et al., 2008); (5) a genome-
wide shRNA screen (Luo et al., 2009) and (6) a large-scale siRNA screen (Steckel et al., 2012)
that identified genes selectively essential to KRAS-transformed colon cancer cells, but not to
derivatives lacking this oncogene.
We applied DAISY to identify the SL-partners of VHL, MSH2 and PARP1, and the Synthetic
Dosage Lethal (SDL)-partners of KRAS. DAISY examined overall 7,276 gene-pairs that were
experimentally examined in one of the screens described above. In the case of KRAS, for which
two large-scale screens were conducted, DAISY examined only genes that were tested in both
screens as potential KRAS SDL-partners. We considered a gene to be an experimentally
identified KRAS-SDL only if it was detected as a KRAS-SDL in both screens. For MSH2, we
mapped between the drugs that were utilized in the screen to their targets according to DrugBank
(Knox et al., 2011), and disregarded drugs with more than one target, to avoid ambiguity.
To rigorously evaluate DAISY's performance in identifying the SL and SDL partners of these
key cancer-associated genes, we used the p-values DAISY generated to classify, in an
unsupervised manner, between SL and non-SL gene pairs or SDL and non-SDL genes. As
described in the Experimental Procedures, DAISY computes for every dataset and every pair of
genes a p-value that denotes the significance of the association between the genes according to
the pertaining dataset (prior to the correction for multiple hypotheses testing). For every
inference procedure we combined the p-values obtained by its datasets into a single p-value per
gene-pair via Fisher's combined probability test, also known as Fisher's Method (Mosteller and
Fisher, 1948).
SoFpvalue(A,B)=Fisher's_Method({SoFpvalue,I(A,B)| I∈ SoFdatasets})
shRNApvalue(A,B)=Fisher's_Method({shRNApvalue,I(A,B)| I∈shRNAdatasets})
mRNApvalue(A,B)=Fisher's_Method({mRNApvalue,I(A,B)| I∈mRNAdatasets})
We further integrated the three combined p-values into one p-value per gene-pair, again via
Fisher's method, when considering all inference procedures or only the SoF and co-expression
procedures.
SoF_mRNApvalue(A,B)=Fisher's_Method({SoFpvalue(A,B), mRNApvalue(A,B)})
Allpvalue(A,B)=Fisher's_Method({SoFpvalue(A,B), shRNApvalue(A,B), mRNApvalue(A,B)})
We corrected the p-values for multiple hypotheses testing via Bonferroni correction, after their
combination.
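The combination and correction steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the example p-values are hypothetical, and using the 7,276 examined gene-pairs as the number of tests in the Bonferroni step is an assumption. The chi-square tail is computed in closed form, which is possible because Fisher's statistic always has an even number of degrees of freedom.

```python
import math

def fisher_combined_pvalue(pvalues):
    """Combine independent p-values (each in (0, 1]) with Fisher's method."""
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    # Fisher's statistic X = -2 * sum(ln p_i) follows a chi-square
    # distribution with 2k degrees of freedom under the null hypothesis.
    half_x = -sum(math.log(p) for p in pvalues)
    # Closed form of the chi-square survival function for even degrees of
    # freedom: P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return min(1.0, math.exp(-half_x) * total)

def bonferroni(pvalue, n_tests):
    """Bonferroni correction, capped at 1."""
    return min(1.0, pvalue * n_tests)

# Hypothetical per-dataset p-values for one gene pair (A, B).
sof = fisher_combined_pvalue([0.01, 0.04])          # SoF datasets
shrna = fisher_combined_pvalue([0.20])              # shRNA datasets
mrna = fisher_combined_pvalue([0.03, 0.50, 0.10])   # mRNA datasets
all_p = fisher_combined_pvalue([sof, shrna, mrna])  # all three procedures
corrected = bonferroni(all_p, n_tests=7276)         # assumed number of tests
```

With a single input the combined p-value equals the input, which is a convenient sanity check of the closed-form tail.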
Based on each one of the five p-values described above we generated a Receiver Operating
Characteristic (ROC) curve. The ROC-curve plots the fraction of true positives – correctly
predicted SLs or SDLs – out of the total actual positives (true positive rate) vs. the fraction of
false positives – falsely predicted SLs or SDLs – out of the total actual negatives (false positive
rate), across many decision threshold settings. The latter is an increasing p-value threshold,
starting from the most stringent definition that results in a very small and top-ranked set of
predicted SL and SDL pairs, and moving towards a more permissive setup in which more gene
pairs are predicted to interact. The resulting Area Under the Curve (AUC) of the ROC-curve is a
conventionally used measure of the overall performance of a classifier, where an AUC of 0.5
denotes the performance of a random predictor and an AUC of 1 denotes the performance of an
ideal predictor.
We computed an empirical p-value for the obtained AUC by randomly shuffling the labels
10,000 times, and re-computing the AUC with the random labels. We then counted the number
of times a random AUC was greater than or equal to the original AUC. This number divided by
10,000 is the empirical p-value of the AUC.
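The AUC and its label-shuffling p-value can be sketched as below. The function names are hypothetical, and the AUC is computed through its pairwise-ranking interpretation (the probability that a true pair receives a smaller p-value than a false pair) rather than by tracing the ROC curve explicitly; the two are equivalent. The quadratic pairwise loop is meant for illustration only and would be slow on thousands of gene-pairs.

```python
import random

def auc_from_pvalues(pvalues, labels):
    """AUC when a smaller p-value means a stronger positive prediction."""
    pos = [p for p, y in zip(pvalues, labels) if y]
    neg = [p for p, y in zip(pvalues, labels) if not y]
    if not pos or not neg:
        raise ValueError("need both positive and negative labels")
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum((p < n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def empirical_auc_pvalue(pvalues, labels, n_perm=10_000, seed=0):
    """Return (observed AUC, fraction of shuffles with AUC >= observed)."""
    rng = random.Random(seed)
    observed = auc_from_pvalues(pvalues, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random labels, same class sizes
        if auc_from_pvalues(pvalues, shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm
```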
Experimentally examining the SL-partners DAISY predicted for the
tumor suppressor VHL
Applying DAISY to predict the SL-interactions of VHL
DAISY was applied to detect the SL-partners of VHL (Rmin=0.3, see Experimental Procedures).
We considered only high confidence genes, that is, genes with a DAISY combined p-value
below 1e-05. To filter out genes that are essential in RCC4 cells regardless of the loss of VHL, we
predicted gene essentiality in this cell line by utilizing the SL-network and the SCNA and
mRNA profiles of RCC4 cells (Barretina et al., 2012). Genes that were predicted to be essential
in RCC4 (expressing pVHL) according to the supervised or unsupervised SL-based gene
essentiality predictors were discarded.
siRNA screening
An siRNA library of 44 targets predicted to reduce viability of VHL-deficient renal cancer cells
was purchased from Qiagen. Individual siRNA pools comprising four oligos were arrayed in
96 well plates. Isogenic RCC4 renal carcinoma cells either expressing or deficient in VHL were
reverse transfected with 25 nM siRNA using Lipofectamine RNAiMAX (Life Technologies).
Internal plate controls included a non-targeting siRNA control (siNTC) and Allstars HS Cell
Death Control (Qiagen). Cells were seeded at 2000 cells/well in DMEM + 10%FCS and,
following a 72h incubation at 37˚C/20%O2/5%CO2, cells were fixed with 4% formaldehyde,
washed with PBS and stained with DAPI dilactate. Images were acquired at 10x magnification
using the Operetta high content analysis system (Perkin Elmer), and the number of nuclei was
quantified. The screen was performed in triplicate, with two independent replicates to provide 6
data points/siRNA/cell line. The high quality of the screen is reflected in the excellent plate
statistics (mean robust Z prime ± SD = 0.77±0.09 RCC-VHL, 0.086±0.07 RCC+VHL).
Raw cell count data for each test siRNA were normalized to the median siNTC (n=8 wells) to
generate a percent inhibition (PI), where:
PI = ((median siNTC - test siRNA) / median siNTC control) * 100
The median of all 6 replicates was subjected to outlier analysis using a threshold based on the
interquartile range (IQR): a PI was classified as an outlier if ≤Q1-1.5*IQR or ≥Q3+1.5*IQR,
with a maximum of 2 outliers accepted. The differential in cell line sensitivity to siRNA
knockdown was calculated as
ΔPI = VHL deficient PI - VHL expressing PI.
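The normalization, outlier rule, and differential above can be sketched as follows. The function names are hypothetical, and the quartile convention (the default of Python's `statistics.quantiles`) is an assumption, since the text does not state how Q1 and Q3 were estimated.

```python
from statistics import median, quantiles

def percent_inhibition(test_count, ntc_counts):
    """PI of one test well relative to the median non-targeting control."""
    ntc = median(ntc_counts)
    return (ntc - test_count) / ntc * 100

def iqr_outliers(values):
    """Values at or beyond Q1 - 1.5*IQR or Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(values, n=4)  # assumed quartile convention
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v <= low or v >= high]

def delta_pi(pi_vhl_deficient, pi_vhl_expressing):
    """Differential sensitivity: positive means more inhibition without VHL."""
    return pi_vhl_deficient - pi_vhl_expressing
```

A positive ΔPI indicates that the knockdown inhibits VHL-deficient cells more strongly than VHL-expressing cells.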
Comparing our siRNA screen to the Bommi-Reddy screen
One of the genes DAISY predicted to be an SL-partner of VHL (MYT1) has been previously
identified as an SL-partner of VHL in a screen that searched for the SL-partners of VHL among
88 kinases (Bommi-Reddy et al., 2008). We treated MYT1 as a positive control anchor to
compare between our screen and the Bommi-Reddy et al. screen. While in our screen the
inhibition of 45.4% of the genes was at least as selective as the inhibition of MYT1, only 11.9%
of the genes examined in the Bommi-Reddy et al. screen had this property. Hence, according to
this joint positive control, our screen detected 3.83 times more SL-interactions than expected
according to the previous screen (Bernoulli p-value of 4.76e-09).
Drug screen
Nine drugs whose targets were predicted by DAISY to be selectively essential in VHL-deficient
renal cells were tested. All drugs were purchased from Sigma-Aldrich, Dorset, UK. Drugs were
diluted in water and serial dilutions were done by 1 in 3 dilution steps. Staurosporine was used as
a positive control. Water was used as a negative control. Cells were plated at 2000 cells/well in
198 µl medium in 96-well black clear bottom plates and cultured for 24h. Then drugs were added
in a volume of 2 µl. For each drug a range of concentrations was tested to identify a suitable
working concentration in which there was an effect on cell growth, but not complete death,
which is more likely to be due to non-specific toxicity; the final concentrations are given in
Table S2. Plates were incubated for 24 hours. Then cells were fixed with 4% formaldehyde and
stained with DAPI. Nuclei were counted using a high content imaging system (Operetta,
Perkin Elmer). Values from three treated wells were averaged and normalized to vehicle treated
cells. EC50 was calculated for each drug.
Examining the SL-network based on gene essentiality data
The utility of an SL-network can be examined by employing it in an unsupervised manner to
predict gene essentiality in a cell-line-specific manner, and testing whether these predictions are
supported by experimental results obtained in shRNA screens. The procedure is based on two
parameters:
Deletioncutoff − the SCNA level under which a gene is considered deleted.
SLessentialitycutoff − the minimal number of inactive SL-partners that renders a gene essential.
Given these parameters the procedure is performed as follows, for every cell line: (1)
Underexpressed genes that have an SCNA level below Deletioncutoff are defined as inactive; (2)
the number of inactive SL-partners of each gene denotes its predicted essentiality; (3) genes with
at least SLessentialitycutoff inactive SL-partners are predicted as essential.
To validate the SL-network in this manner we first reconstructed it without the shRNA datasets,
to avoid any circularity. We employed it to predict gene essentiality in 129 cancer cell lines. For
these cell lines we had both gene expression and SCNA data to generate the predictions, and
gene essentiality data for validation (Barretina et al., 2012; Cheung et al., 2011; Marcotte et al.,
2012). We defined Deletioncutoff as -0.1, based on the literature (Beroukhim et al., 2010), and
SLessentialitycutoff as 1; that is, a gene is said to be essential in a cell line if at least one of its SL-partners is
inactive. A gene was considered underexpressed if its expression was below the 10th percentile of
its expression across all samples in the dataset. We examined a range of Deletioncutoff and
SLessentialitycutoff parameters, demonstrating the robustness of the SL-network's performance
(Table S5).
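The three-step unsupervised procedure can be sketched for a single cell line as below. The data structures (a dictionary of SL-partners, a dictionary of SCNA levels, and a set of underexpressed genes) are hypothetical; the default cutoffs are the values stated in the text. Treating a gene with no SCNA measurement as not deleted is an assumption.

```python
def predict_essential_genes(sl_partners, scna, underexpressed,
                            deletion_cutoff=-0.1, sl_essentiality_cutoff=1):
    """Predict essential genes in one cell line from its inactive SL-partners.

    sl_partners:    gene -> iterable of its SL-partners
    scna:           gene -> SCNA level in this cell line
    underexpressed: set of genes underexpressed in this cell line
    """
    # (1) Inactive genes: underexpressed and with SCNA below the cutoff.
    #     Genes without SCNA data are assumed not deleted.
    inactive = {g for g in underexpressed
                if scna.get(g, 0.0) < deletion_cutoff}
    # (2) Predicted essentiality: number of inactive SL-partners.
    scores = {g: sum(p in inactive for p in partners)
              for g, partners in sl_partners.items()}
    # (3) Essential genes: at least `sl_essentiality_cutoff` inactive partners.
    essential = {g for g, s in scores.items() if s >= sl_essentiality_cutoff}
    return scores, essential
```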
We examined the gene essentiality predictions based on the experimental shRNA scores reported
in two different shRNA screens (Cheung et al., 2011; Marcotte et al., 2012). The lower the
shRNA-essentiality-score is, the more essential the gene is. The examination process was
performed as follows.
1. For each cell line we obtained four p-values:
a. Two one-sided Wilcoxon ranksum p-values, denoting whether the shRNA-essentiality-
scores of the predicted essential genes are significantly lower than those of genes predicted
as nonessential, when considering all genes or only SL-network genes as the background
model.
b. Two hypergeometric p-values, denoting if the predicted essential genes are significantly
enriched with experimentally identified essential genes, when considering all genes or only
SL-network genes as the background model. We defined a gene as experimentally essential
if its shRNA-essentiality-score in a given cell line was below the 10th percentile of the
shRNA-essentiality-score reported in the screen.
2. We computed, according to each one of these four p-values, the number of cell lines for
which the predictions significantly match the experimental findings (p-value<0.05).
To examine the significance of the results obtained by the SL-network we predicted gene-
essentiality based on 10,000 random networks of the same topology as the SL-network, and
evaluated their predictions. Based on the performances of the random networks we obtained four
empirical p-values, each denoting if the performance of the SL-network is significant according
to one of the four p-values described in (1) above.
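The hypergeometric enrichment test of step 1b can be sketched with exact integer arithmetic, as below. The function and argument names are hypothetical; the background size corresponds to either all genes or only SL-network genes, depending on the background model.

```python
from math import comb

def hypergeom_enrichment_pvalue(n_background, n_essential, n_predicted, n_overlap):
    """P(overlap >= n_overlap) under random sampling without replacement.

    n_background: genes in the background model
    n_essential:  experimentally essential genes in the background
    n_predicted:  genes predicted as essential
    n_overlap:    predicted genes that are experimentally essential
    """
    upper = min(n_essential, n_predicted)
    total = comb(n_background, n_predicted)
    tail = sum(comb(n_essential, k)
               * comb(n_background - n_essential, n_predicted - k)
               for k in range(n_overlap, upper + 1))
    return tail / total
```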
Examining the SDL-network based on drug efficacy measurements
We evaluated the validity of the SDL-network by employing it, in an unsupervised manner, to
predict the sensitivity of different cancer cell lines to various drugs, and testing the predictions
with drug efficacy measurements. The procedure is based on two parameters:
Overexpressioncutoff − a threshold for identifying overexpressed genes. For every gene we
computed the Overexpressioncutoff percentile of its expression level across the different
samples in the dataset, and defined a gene as overexpressed if its expression is above this
percentile.
SDLessentialitycutoff − the number of overexpressed SDL-partners that renders a gene
essential.
Given these two parameters, for every cell line: we identified its overexpressed genes, predicted
genes with at least SDLessentialitycutoff overexpressed SDL-partners as essential, and predicted the
cell line as sensitive to drugs whose targets were predicted as essential in it. We tested for each
drug whether its efficacy is higher in the cell lines that were predicted as sensitive compared to
its efficacy in cell lines that were predicted as resistant to its administration (one-sided Wilcoxon
ranksum test). We then computed the fraction of drugs for which the network significantly
differentiates (p-value <0.05) between sensitive and resistant cell lines. We repeated the process of
drug efficacy predictions based on 10,000 random networks of the same topology as the SDL-
network, and obtained empirical p-values, denoting the significance of SDL-network
performances in this task.
To test the predictions we used the data from the Cancer Genome Project (CGP) (Garnett et al.,
2012) and from the Cancer Therapeutics Response Portal (CTRP) (Basu et al., 2013)
pharmacological screens. The CGP data contains the IC50 values of 131 drugs across 639 cancer
cell lines. (The IC50 of a drug denotes the drug concentration required to eradicate 50% of the
cancer cells.) The CTRP data includes the sensitivities of 242 cancer cell lines to 354 small
molecules. The sensitivity measure in this case is termed area-under-the-dose-curve. We
extracted gene expression profiles of 593 out of the 639 cell lines used in the CGP data from the
CGP, and the expression profiles of 241 cell lines used in the CTRP from the Cancer Cell Line
Encyclopedia (CCLE) (Barretina et al., 2012). As our method exploits the SDL-network to
deduce the efficacy of each drug in a given context, we were able to perform the prediction only
for drugs that had at least one of their targets in the SDL-network − 37 and 50 drugs in the CGP
and CTRP data, respectively. We mapped the drugs to their targets based on the mapping
reported in the CGP, the CTRP, and DrugBank (Basu et al., 2013; Garnett et al., 2012; Knox et
al., 2011).
We set the parameters to an Overexpressioncutoff of 80, and an SDLessentialitycutoff of 2. Under
these definitions, we could predict the response of cells only to drugs that had targets with at
least two SDL-partners − 23 and 33 drugs in the CGP and CTRP data, respectively. We
examined the sensitivity of the predictions to the Overexpressioncutoff and SDLessentialitycutoff
parameters. The prediction performances across different parameter settings are provided in
Table S8. Lastly, to evaluate single SDL-interactions, we repeated this analysis for each SDL
pair alone, instead of using the entire SDL-network.
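The unsupervised drug-sensitivity prediction can be sketched for one cell line as follows. All names and data structures are hypothetical; the default cutoffs (80 and 2) are the values stated in the text, and the linear-interpolation percentile is an assumption, since the percentile convention is not specified.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def overexpressed_genes(expression, cell_line, overexpression_cutoff=80):
    """expression: gene -> {cell_line: level}, covering all samples."""
    return {g for g, levels in expression.items()
            if levels[cell_line] > percentile(list(levels.values()),
                                              overexpression_cutoff)}

def predict_sensitive_drugs(drug_targets, sdl_partners, overexpressed,
                            sdl_essentiality_cutoff=2):
    """Drugs with at least one target predicted as essential in the cell line."""
    essential = {g for g, partners in sdl_partners.items()
                 if sum(p in overexpressed for p in partners)
                 >= sdl_essentiality_cutoff}
    return {drug for drug, targets in drug_targets.items()
            if essential & set(targets)}
```

The per-drug comparison of efficacy between predicted-sensitive and predicted-resistant cell lines (the one-sided Wilcoxon ranksum test) would then be run on the output of this step across all cell lines.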
Supervised prediction: Data description
We constructed two types of neural network models. The first model predicts a gene-cell line
pair relation – that is, whether a specific gene is essential in a specific cancer cell line or not. The
second model predicts a drug-cell line pair relation – that is, the efficacy of a specific drug in a
given cell line. Both models use a similar set of 53 features, characterizing the gene's
neighborhood in the SL or SDL network and key genomic features of the cell-line addressed.
Below we describe the prediction models and the features used to construct them.
Supervised SL-based predictions of gene essentiality. The first type of model is given, for
each gene-cell pair, a set of 53 features (see section below), and predicts based on these features
whether the gene is essential in the cancer cell line. To generate the features we utilized the SL-
network that was reconstructed without the shRNA datasets, to avoid any potential circularity.
For each of the two gene essentiality datasets (Cheung et al., 2011; Marcotte et al., 2012) we
generated a separate gene essentiality predictor. The predictor is trained to predict the essentiality
of genes that are included in the SL-network and were tested in the pertaining screen.
To predict the gene essentiality data reported in (Marcotte et al., 2012) we generated a neural
network model that predicts the essentiality of 1,510 SL-network-genes in 46 cancer cell lines. If
the zGARP score of the gene in the cell line was below -1.289 (below the 10th percentile of the
zGARP scores), it was denoted as essential in this cell line, and the pair was labeled as 1,
otherwise it was labeled -1 (that is, non-essential). We performed the prediction for 69,460 gene-
cell line pairs, 8,994 (12.9%) of which were labeled as 1, and the rest as -1.
To predict gene essentiality data reported in (Cheung et al., 2011) we generated a neural network
model that predicts the essentiality of 744 SL-network-genes in 92 cancer cell lines. If the
shRNA score of the gene in the cell line was below -1.567 (below the 10th percentile of the
shRNA scores), it was denoted as essential in this cell line, and the pair was labeled as 1,
otherwise it was labeled -1. We performed the prediction for 66,960 gene-cell line pairs, 7,821
(11.7%) of which were labeled as 1, and the rest as -1 (1,488 pairs were omitted due to the lack
of data).
Supervised SDL-based predictions of drug efficacy. The second type of model is given
a set of features that define a drug-cell line pair, and predicts the efficacy of the drug
when administered to the cell line. We constructed such prediction models for each of the
pharmacologic datasets separately: (1) Models that predict log IC50 values and are trained and
tested on the CGP data (Garnett et al., 2012), and (2) models that predict the area-under-the-
dose-curve and are trained and tested on the CTRP data (Basu et al., 2013). The features used to
build the predictors were generated based on the SDL-network and the genomic profiles of the
cell lines (see next section). To generate the features we extracted from the CCLE the gene
expression and SCNA profiles of 414 and 241 of the cell lines used in the CGP and CTRP data,
respectively. As our method exploits the SDL-network to deduce the efficacy of each drug in a
given cell-line-specific genomic context, we were able to perform the prediction only for drugs
that had at least one of their targets in the SDL-network − 41 and 50 drugs in the CGP and CTRP
data, respectively. For the CGP data the resulting matrix of 414 cell lines by 41 drugs contains
9,657 IC50 values, with 7,317 missing values; overall we had 9,610 drug-cell line pairs, as 47
pairs were removed due to the lack of genomic data (missing mRNA or SCNA data). For the
CTRP data the resulting matrix of 241 cell lines by 50 drugs contains 8,287 efficacy values, with
3,763 missing values; overall we had 8,001 drug-cell line pairs, as 286 pairs were removed due
to the lack of genomic data.
Supervised prediction: Features
We extracted 53 features that describe the state of a given gene in a given cell line based on the
SL or SDL network combined with SCNA and mRNA data extracted from the CCLE (Barretina
et al., 2012):
1. The number of inactive SL-partners or overactive SDL-partners the gene has in the cell
line. (A gene is defined as inactive if it is underexpressed and its SCNA level is below
-0.3, and as overactive if it is overexpressed and its SCNA level is above 0.3.)
2-13. The sum, mean, minimal, and maximal levels of SCNA, mRNA, and normalized
mRNA measurements of the SL or SDL partners of the gene in the specific cell line
tested. (The mRNA measurements were normalized via z-score, such that the mean and
standard deviation of the expression of each gene across the samples are 0 and 1,
respectively.)
14-25. The sum, mean, minimal, and maximal levels of the SCNA, mRNA, and normalized
mRNA measurements of the SL or SDL partners of the gene across all cell lines.
26-27. The mRNA and SCNA levels of the gene in the cell line, times the number of inactive
SL-partners or overactive SDL-partners it has.
28-37. To capture key features of the gene's state in the SL and SDL networks we performed a
Principal Component Analysis (PCA) of the adjacency matrix of the networks. As the
networks are directional and not symmetric we also performed PCA with the transpose
of the network adjacency matrix. We then used the first five principal components of
the gene based on each one of these matrices.
38-39. The in- and out-degree of the gene in the SL or SDL network.
40-45. The mean, minimal and maximal SCNA and mRNA levels of the gene across the
different cell lines.
46-47. The mRNA and SCNA levels of the gene in the cell line.
48-53. The mean, minimal and maximal mRNA and SCNA levels measured in the cell line.
To predict drug efficacy in various cancer cell lines we transformed these gene-cell features to
drug-cell features. We mapped between the drug and its target genes, and computed the drug-cell
features as an average of the (target) gene-cell features. The mapping between drugs and their
targets was according to the CGP (Garnett et al., 2012), the CTRP (Basu et al., 2013), and
DrugBank (Knox et al., 2011).
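The transformation from gene-cell features to drug-cell features can be sketched as a per-feature average over the drug's targets. The function name and the dictionary layout are hypothetical, and skipping targets that have no feature vector is an assumption.

```python
def drug_cell_features(target_genes, gene_cell_features):
    """Average the feature vectors of a drug's targets in one cell line.

    target_genes:       genes targeted by the drug
    gene_cell_features: gene -> list of feature values in this cell line
    """
    # Assumption: targets without a feature vector are skipped.
    vectors = [gene_cell_features[g] for g in target_genes
               if g in gene_cell_features]
    if not vectors:
        raise ValueError("no target gene has features in this cell line")
    # zip(*vectors) iterates feature by feature across the targets.
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```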
Constructing supervised neural network predictors
We built neural network predictors by employing the MATLAB implementation of a
feed-forward multilayer perceptron (the function ‘fitnet’) with the default parameters. We defined
three different layers: input, hidden and output layer. The number of features (53, see above)
determined the number of input units. The number of hidden units was 20, and the perceptron
activation function was the sigmoid function. We performed a 5-fold cross-validation for
building our models: We separated the original dataset into five equally sized test sets, obtained
by randomly distributing all gene-cell or drug-cell pairs into five sets. In the discretized form
(gene-cell) each test set had the same ratio between positive and negative samples as in the full
dataset. In each iteration of the cross validation 60% of the data was used to train the model, 20%
was used for internal validation, and the remaining 20% − the test set − was used exclusively for
testing the model.
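The data partitioning can be sketched as below, in Python rather than MATLAB and without the network itself. The function names are hypothetical. How the 20% internal validation set was drawn is not stated, so using the fold that follows the test fold is an assumption made only for illustration.

```python
import random

def stratified_folds(labels, n_folds=5, seed=0):
    """Randomly split indices into folds that preserve the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)  # deal indices round-robin
    return folds

def cv_splits(labels, n_folds=5, seed=0):
    """Yield (train, validation, test) index lists: 60% / 20% / 20%."""
    folds = stratified_folds(labels, n_folds, seed)
    for k in range(n_folds):
        v = (k + 1) % n_folds  # assumed choice of validation fold
        train = [i for j, fold in enumerate(folds) if j not in (k, v)
                 for i in fold]
        yield train, folds[v], folds[k]
```

For the continuous drug-efficacy targets there are no classes to stratify on; passing a constant label for every pair reduces the split to a plain random partition.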
Predicting gene essentiality based on experimental sh/siRNA screens
The results obtained in gene essentiality screens can be quantified directly by measuring the level
of growth-inhibition observed when knocking-down a gene in a cell line, or indirectly by
measuring the depletion rate of the shRNA or siRNA probes that inhibit the gene. Either way, we
will refer to the output of the screens as gene essentiality scores, denoting for each gene-cell pair
the level of essentiality of the given gene in the given cell line, such that the higher the score the
more essential the gene is.
We assessed the fit between two gene essentiality screens that were conducted on the same cell
line(s), and generated competing predictors to our SL-based gene-essentiality predictors. To this
end, we utilized the gene essentiality levels obtained in one screen to predict the gene essentiality
observed in another screen, as follows. First, we defined which screen is to be predicted, and
which screen will function as a predictor. We labeled each gene-cell pair as true if the pertaining
gene was found to be essential in the given cell line in the predicted screen, and as false
otherwise. A gene was identified as essential in a screen if its essentiality score was among the
top 10% scores obtained in the screen.
We then defined, based on the predictor screen, all the possible valid predictions. A valid
prediction is such that if a certain gene is predicted to be essential in a given cell line, and that
gene has the gene essentiality level of X in the cell line according to the predictor screen, then
every other gene-cell pair that has a gene essentiality level equal to or greater than X will also be
predicted as true. Hence, the number of valid predictions based on a predictor screen equals the
number of unique gene essentiality values obtained in that screen. For each valid prediction we
then quantified its True Positive Rate (TPR) and False Positive Rate (FPR), to obtain the ROC
curve of the predictor. The AUC of the predictor represents the prediction accuracy of the
predictor screen, and can be compared to the AUC that was obtained by other predictors, such as
our SL-based predictors.
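The construction of valid predictions and the resulting ROC curve can be sketched as follows. The function names are hypothetical; each unique score in the predictor screen serves as one threshold, and the AUC is obtained by the trapezoidal rule.

```python
def roc_from_predictor_screen(scores, labels):
    """ROC points (FPR, TPR), one per unique predictor-screen score.

    scores: essentiality per gene-cell pair in the predictor screen
            (higher = more essential)
    labels: True if the pair is essential in the predicted screen
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # A valid prediction: every pair scoring >= threshold is called true.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```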
Experimentally validating the SL-based prediction of gene
essentiality in a breast cancer cell line: siRNA screening
Cells were grown in DMEM/F12 (1:1) medium (Cat#: 21331, with 10%FCS and 2mM
Glutamine), and reverse-transfected in duplicate with 25 nM of Dharmacon ON-TARGETplus
SMARTpools in 96-well plates using Lullaby transfectant reagent. A SMARTpool targeting
PLK1 and a non-targeting pool were used as positive and negative controls, respectively. After
24 hours, culture medium was topped up with fresh medium (200µl final vol) and cells were
incubated for a further 72 hours in designated incubators with 20% or 1% oxygen, respectively.
Then, cells were fixed with 4% formaldehyde and stained with DAPI. Nuclei were counted using
a high content imaging system (Operetta, Perkin Elmer). Inhibition was calculated as
(MEDIAN(NTC) - SAMPLE) / (MEDIAN(NTC) - MEDIAN(PLK1)) * 100
and the average of the 2 replicates was calculated.
Utilizing the SL-network to predict breast cancer prognosis
We analyzed the gene-expression profiles of 2,000 breast cancer clinical samples to examine the
prognostic-value embedded in the SL-network (Curtis et al., 2012). We disregarded samples
whose survival status was ambiguous or unknown, resulting in 1,586 samples. Based on the gene
expression of each one of the SL-pairs we defined two groups of patients:
1. SL- group, consisting of patients whose tumors underexpressed both of the SL-paired
genes; a gene is defined as underexpressed if its expression level in the sample is lower
than its median expression level across all the samples.
2. SL+ group, consisting of patients whose tumors expressed at least one of the SL-paired
genes; a gene is defined as expressed if its expression level in the sample is at least as
high as its median expression level across all the samples.
For each SL-pair we generated the 15-year survival Kaplan-Meier (KM) plots of its two
corresponding SL- and SL+ groups of patients, and obtained a logrank p-value denoting the
significance of the separation between the two groups in terms of their prognosis (Bland and
Altman, 2004). In addition, we defined a signed KM-score, whose magnitude (absolute value) is
-ln(logrank p-value), and hence the more significant the logrank p-value is the higher the
magnitude of the signed KM-score will be. The sign of the signed KM-score is positive if the SL-
group had better prognosis compared to the SL+ group, and negative otherwise. The rationale
behind the signed KM-score is that we assume the SL-pairs not only significantly separate
between groups of patients with respect to their prognosis (as reflected by the logrank p-value), but
do so in a directional manner: the SL- group is expected to have better prognosis as compared to
the SL+ group, since co-underexpression of paired SL genes is likely to increase the vulnerability
of the tumor.
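The patient grouping and the signed KM-score can be sketched as below. The function names are hypothetical, both expression dictionaries are assumed to cover the same samples, and the logrank p-value is taken as an input rather than computed here.

```python
import math
from statistics import median

def split_sl_groups(expr_a, expr_b):
    """Split samples into SL- and SL+ groups for one SL-pair (A, B).

    expr_a, expr_b: sample -> expression level of gene A / gene B
    """
    med_a = median(expr_a.values())
    med_b = median(expr_b.values())
    # SL-: both genes are below their median across all samples.
    sl_minus = {s for s in expr_a
                if expr_a[s] < med_a and expr_b[s] < med_b}
    # SL+: at least one of the two genes is expressed.
    sl_plus = set(expr_a) - sl_minus
    return sl_minus, sl_plus

def signed_km_score(logrank_pvalue, sl_minus_has_better_prognosis):
    """Magnitude -ln(p); positive when the SL- group fares better."""
    magnitude = -math.log(logrank_pvalue)
    return magnitude if sl_minus_has_better_prognosis else -magnitude
```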
We repeated the analysis described above with two groups of 10,000 randomly selected gene-
pairs: (a) Those that are selected from SL-network-genes, and (b) those that are selected from all
genes. We then compared the results (logrank p-values and signed KM-scores) obtained with the
original SL-network pairs to the results obtained with these control groups via a one-sided
Wilcoxon ranksum test.
For each SL-pair of genes we further performed a Cox-regression to evaluate whether its
prognostic value is significant even when accounting for the following clinical characteristics of
the breast cancer patients: Age at diagnosis, grade, tumor size, lymph nodes, estrogen receptor
expression, HER2 expression, progesterone receptor expression, and genomic instability index
(as previously defined (Bilal et al., 2013)). The logrank and Cox regression p-values that were
obtained for every SL-pair are given in Table S7. Correction for multiple hypotheses testing was
done according to the Benjamini-Hochberg algorithm (Benjamini and Hochberg, 1995).
Lastly, we classified the patients according to the overall SL-network behavior. Instead of
considering only the expression of a specific SL-pair, we considered the expression of the entire
set of SL-pairs in a given sample. We computed for each sample how many of the SL-pairs in
the network it co-underexpressed as the sample global SL-score. As a random model we
generated random networks of the same topology as the SL-network that consisted of essential
genes in breast cancer – 2,077 genes that obtained the lowest average zGARP score measured in
29 breast cancer cell lines (Marcotte et al., 2012). Each random network includes 2,077 genes,
as does the original SL-network. Based on each one of these networks we
computed for each sample the number of connected genes it co-underexpressed (its global SL-
scores); we uniformly divided the samples into four groups according to these scores. For each
random network we then computed a logrank p-value, denoting if the 15-year survival of the four
groups is significantly different. We also examined if the order of the four groups is as expected,
that is, if the groups with higher global SL-scores had better 15-year survival. We then counted
the number of random networks that obtained a logrank p-value which is at least as low as that
obtained by the SL-network, and also had the right order of groups in terms of survival.
In this analysis we did not use random networks that consist of the SL-network genes as a
control because the global SL-scores obtained by such networks are highly correlated with the
SL-scores of the original network (mean Spearman correlation coefficient of 0.927, p-value < 1e-30).
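The global SL-score and the division of samples into four groups can be sketched as follows. The function names are hypothetical, and reading "uniformly divided" as four equally sized groups ordered by score is an assumption.

```python
def global_sl_score(sl_pairs, underexpressed):
    """Number of SL-pairs whose two genes are both underexpressed in a sample."""
    return sum(1 for a, b in sl_pairs
               if a in underexpressed and b in underexpressed)

def quartile_groups(scores):
    """Assign each sample to group 0..3 by ascending global SL-score.

    scores: sample -> global SL-score
    Assumption: groups are of (near) equal size.
    """
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    return {sample: min(3, 4 * i // n) for i, sample in enumerate(ranked)}
```

The same two functions apply unchanged to each random network, since only the list of connected gene pairs differs.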
Supplemental Results
1. Characterizing the SL and SDL networks
1.1. The essentiality and evolutionary conservation of SL and SDL network genes
Genes that participate in SL and SDL interactions may be viewed as being context-specific
essential. Linking between synthetic lethality and essentiality, it has been shown that in yeast
there is a strong correlation between the number of SL-interactions a gene has and the fitness of
its single-mutant (Costanzo et al., 2010): Genes whose single mutants have severe fitness defects
tend to exhibit an increased number of SL-interactions. In light of this observation we examined
different properties of the SL and SDL network-genes to evaluate their level of essentiality.
We utilized a set of 2,472 essential genes in mouse and their orthologs in human (Georgi et al.,
2013). Based on this set we find that SL and SDL genes are significantly enriched with orthologs
of mouse essential genes (hypergeometric p-values << 1e-30). Furthermore, in concordance with
the findings in yeast, the likelihood of a gene to be an ortholog of a mouse essential gene is
increased if it has a high degree in the network (Figure S2A).
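For readers less familiar with the enrichment test used here, the upper-tail hypergeometric p-value can be sketched as below. The function name and the toy counts are hypothetical; only the test itself is taken from the text.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, draws, observed):
    """P(X >= observed) when `draws` genes are sampled without replacement
    from `population` genes, of which `successes` carry the annotation
    (for example, being an ortholog of a mouse essential gene)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes, 4 annotated, 3 network genes, 2 of them annotated.
p_toy = hypergeom_enrichment_p(10, 4, 3, 2)
```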
We examined if the SL and SDL genes tend to be more evolutionarily conserved compared to
other genes. To this end we utilized the dN/dS ratio as a measure of conservation, where dN
denotes the number of nonsynonymous substitutions per nonsynonymous site, and dS denotes
the number of synonymous substitutions per synonymous site. Hence, a low dN/dS ratio is an
indicator of conservation. We extracted dN/dS ratios obtained by comparing between human and
mouse and between human and rhesus macaque from BioMart (Kasprzyk, 2011). The ratios were
available for 16,960 and 17,364 genes for mouse and rhesus macaque, respectively. We find that
SL (SDL) genes are more conserved as compared to other genes both when examining the
conservation in relation to mice (Wilcoxon ranksum p-values of 2.99e-17 and 6.65e-46) and in
relation to rhesus (Wilcoxon ranksum p-values of 7.53e-18 and 5.47e-35). Once again, genes
with a higher degree in the network have even lower dN/dS ratios compared to other network
genes (Figures S2B-C).
1.2. The SL and SDL networks compared to the Protein-Protein Interaction (PPI) network
To examine the association between the Protein-Protein Interactions (PPI) and the SL networks
we extracted information regarding physical interactions from the Human Protein Reference
Database (HPRD), release 9 (Keshava Prasad et al., 2009). The PPI network contains 9,617
proteins and 39,174 interactions. When comparing between physical and SL (SDL) interactions,
we focused on 1,497 (2,083) proteins that are both in the PPI network and in the SL (SDL)
network.
First, we find that genes in the SL and SDL networks have a higher degree in the PPI network
compared to other genes, especially if their degree in the SL or SDL network is high (Wilcoxon
ranksum p-values of 2.19e-26 and 5.79e-22, respectively, Figure S2D). Likewise, the degree of a
gene in the SL or SDL network is weakly correlated to its degree in PPI sub-networks that
include only SL or SDL genes, respectively (Spearman correlation coefficients of 0.136 and
0.098, p-values of 1.34e-07 and 7.03e-06, respectively). Second, genes that interact in the SL or
SDL network are highly enriched with genes that interact in the PPI network (hypergeometric p-
values of 4.020e-07 and <1e-30, fold enrichment of 4.54 and 30.57 for the SL and SDL
networks, respectively). Next we examined if SL and SDL pairs tend to be closer in the PPI
network, though not necessarily neighbors. We computed for each SL-pair the distance between
its partners, that is, the length of the shortest path between its partners in the PPI-network. We
found that SL and SDL interacting genes are significantly closer compared to other gene-pairs
(Wilcoxon ranksum p-values of 1.79e-15 and 2.39e-14 for SL and SDL pairs, respectively).
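The distance defined above, the length of the shortest path between two partners in the PPI network, can be computed with a breadth-first search. The sketch below assumes an unweighted, undirected network stored as an adjacency dictionary; the function name and toy network are hypothetical.

```python
from collections import deque


def shortest_path_length(adjacency, source, target):
    """Number of edges on the shortest path between two proteins,
    or infinity if they are not connected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return float("inf")


# Toy PPI network: A-B-C-D chain plus an isolated protein E.
ppi = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"], "E": []}
```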
1.3. Genes in the SL and SDL networks are associated with cancer-specific-proliferation
We examined the association of the SL and SDL network genes to cancer-specific proliferation.
To this end we utilized the cancer Proliferation Index (cPI) and non-cancerous Proliferation
Index (nPI) as reported in (Waldman et al., 2013). The cPI of a gene is based on the association
between its expression levels and growth rates measured across the 60 cancer cell lines of the
NCI-60 panel. Positive cPI values indicate positive association with growth rate while negative cPI
values indicate negative association with growth rate. Similarly, nPI values are based on the
association between the gene expression levels and growth rates measured across 224
lymphoblastoid cell lines.
Interestingly we find that SL and SDL genes have significantly high cPI values, compared to
non-network genes, especially when considering genes with a high degree in the network
(Wilcoxon ranksum p-values of 8.08e-09, and 4.32e-36, for the SL and SDL networks,
respectively, Figure S2E). The nPI values of network genes are also higher than those of non-
network genes, though much less significantly (Wilcoxon ranksum p-values of 0.013, and 0.133,
for the SL and SDL networks, respectively, Figure S2F). These results imply that the network
genes are involved specifically in cancer proliferation.
1.4. Genes in the SL and SDL networks are overexpressed in normal tissues and in cancer
We processed gene expression profiles measured in 30 different normal human tissues (Su et al.,
2004), as previously described (Waldman et al., 2010). Analyzing these profiles we find that the
SL and SDL genes are expressed in significantly higher levels compared to other genes
(Wilcoxon ranksum p-values of 6.29e-08 and 1.30e-18, respectively, Figure S2G). Additionally,
the number of tissues in which SL and SDL genes are expressed, termed expression breadth, is
significantly high compared to other genes (Wilcoxon ranksum p-values of 9.45e-08 and 3.62e-
28, respectively, Figure S2H). Likewise, SL and SDL genes with a higher degree in the networks
have even higher expression and expression breadth (Figures S2G-H).
We then examined if SL and SDL genes are also overexpressed in cancer clinical samples. To
this end we reconstructed the networks without the TCGA data and utilized the mRNA
expression profiles of 6,296 cancer clinical samples extracted from TCGA (The Cancer Genome
Atlas Research et al., 2013). Indeed, SL and SDL genes are significantly overexpressed in cancer
clinical samples compared to other genes (Wilcoxon ranksum p-values of 3.40e-157 and 6.47e-
235, respectively). As in normal tissue, also in cancer samples the expression of genes is higher
if their degree in the SL or SDL network is higher (Figure S2I).
Lastly, the SL and SDL networks are enriched with cancer-associated genes, including:
anticancer drug targets (Knox et al., 2011), oncogenes and tumor suppressors (Chan et al., 2010;
Zhao et al., 2013), and cancer amplification and deletion drivers (Beroukhim et al., 2010) (Figure
S2J).
1.5. The genomic distribution of SL and SDL pairs
We examined the distribution of the genomic distance between SL and SDL-interacting genes.
We defined the distance between two genes as the genomic distance between them in base-pairs,
if they reside on the same chromosome, and infinity otherwise. We found that 97.6% of the SL-
pairs are located on different chromosomes, and that the distances between them are significantly
high compared to randomly selected gene pairs (Wilcoxon ranksum p-value of 3.62e-11, Figure
S3A). When examining the SDL-pairs, we found the opposite behavior, 84.5% of the SDL-pairs
reside on the same chromosome, and they are significantly close compared to randomly selected
gene pairs (Wilcoxon ranksum p-value <1e-30, Figure S3B).
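The distance definition used above can be written down directly. In this sketch each gene is represented by a chromosome and a single base-pair coordinate; the choice of a single coordinate per gene (rather than, say, the gap between gene boundaries) is our assumption, and the names and positions are hypothetical.

```python
import math


def genomic_distance(gene_a, gene_b):
    """Distance in base-pairs for genes on the same chromosome, infinity otherwise.

    Each gene is a (chromosome, position_bp) tuple.
    """
    chrom_a, pos_a = gene_a
    chrom_b, pos_b = gene_b
    if chrom_a != chrom_b:
        return math.inf
    return abs(pos_a - pos_b)


def fraction_on_different_chromosomes(pairs):
    """Share of gene pairs whose distance is infinite."""
    distances = [genomic_distance(a, b) for a, b in pairs]
    return sum(math.isinf(d) for d in distances) / len(distances)


toy_pairs = [
    (("chr1", 1_000_000), ("chr1", 1_250_000)),
    (("chr1", 5_000_000), ("chr7", 5_000_000)),
]
```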
One of the three inference strategies of DAISY, termed genomic Survival of the Fittest (SoF),
detects SL and SDL interactions based on Somatic Copy-Number Alterations (SCNA), which can
be affected by genomic linkage. Frequent co-amplification of two genomically proximal genes A
and B can lead to over-detection of events like "A is amplified → B is not deleted" that are used
to identify SDL pairs by the SoF filter (see Figure 1 in the main text). We hence examined if the
additional filters DAISY applies manage to prevent it from falsely detecting gene-pairs as SDLs
merely due to their genomic proximity.
First, we conducted an operative test in which we compared the SDL-network to alternative
SDL-networks in which the problem of false-positive detection due to genomic proximity is
alleviated. We constructed 11 such networks: (1) a network that is based on the two other
inference procedures without the SoF approach, and (2) 10 networks that were constructed under
an increasing cutoff that defines the minimal allowed genomic location distance between a pair
in the network, starting from 10%, up to 100% of the average chromosome length. Based on
each one of these networks we then predicted drug response, and examined the predictions
according to the CGP data (Garnett et al., 2012) and the CTRP data (Basu et al., 2013). The
predictive signal of the original SDL-network reported in the main text is significantly superior
to the signal obtained by these alternative networks (Figures S3C-D).
Second, in light of the strong predictive signal displayed we examined if SDL-interactions have a
true tendency towards genomic proximity. To this end we examined three SDL-networks that
were constructed based only on the shRNA-based functional examination approach (Figure 1,
Experimental Procedures), by using one of the three shRNA screens (Cheung et al., 2011; Luo et
al., 2008; Marcotte et al., 2012), and an additional SDL-network that we constructed from the
shRNA and gene expression data but again without using the SoF approach and copy-number
data. In three out of these four SDL-networks (that are devoid of the potential false positive bias
introduced by considering copy number data), SDL-pairs still tend to be significantly more
closely located on the genome than random gene pairs (Wilcoxon ranksum p-values of 4.36e-15,
5.26e-03, and 0.320 for the three shRNA-based networks, and a p-value of 1.80e-219 for the
SDL-network constructed independently of the SoF approach).
These findings support the notion that functionally meaningful SDL-pairs are more closely
located on the genome, an interesting observation whose investigation is beyond the scope of the
current study. Taken together, and especially in light of their operational utility, we did not filter
out SDL-interactions based on their genomic proximity.
2. Harnessing the SL-network to predict gene essentiality in cancer cell lines
2.1. Gene essentiality is cancer cell line specific
We quantified the extent to which gene essentiality in cancer is cell line dependent, and hence
potentially arising from synthetic lethality. We computed for each gene the number of cell lines
in which it was found to be essential, according to two shRNA screens (Cheung et al., 2011;
Marcotte et al., 2012). Among the genes that are essential in at least one cell line, most are
essential in only a few cell lines rather than across the board (Figure S4A). Interestingly, the
distribution of gene essentiality follows a power-law distribution.
2.2. Robustness analysis of SL-based essentiality prediction
To apply the SL-network for predicting gene essentiality in a cell line specific manner we
devised an approach that depends on two parameters: Deletioncutoff and SLessentialitycutoff. The
former denotes the SCNA level under which an underexpressed gene is considered inactive, and
the latter denotes the number of inactive SL-partners required to deduce that a gene is essential
(Extended Experimental Procedures). We applied this approach to predict gene essentiality based
on the SL-network (that was constructed without shRNA data) in overall 129 different cancer
cell lines, and examined the predictions based on the results obtained in two large-scale screens
(Cheung et al., 2011; Marcotte et al., 2012).
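The two-parameter rule can be sketched as follows. This is one reading of the description above and not the authors' implementation: the full procedure is given in the Extended Experimental Procedures, the set of underexpressed genes is taken as a given input, and all names are hypothetical.

```python
def inactive_genes(scna, underexpressed, deletion_cutoff=-0.1):
    """Underexpressed genes whose SCNA level falls below the deletion cutoff."""
    return {g for g in underexpressed if scna.get(g, 0.0) < deletion_cutoff}


def predict_essential(sl_partners, inactive, sl_essentiality_cutoff=1):
    """Genes with at least `sl_essentiality_cutoff` inactive SL-partners.

    sl_partners: dict mapping gene -> iterable of its SL-partners.
    """
    return {
        gene
        for gene, partners in sl_partners.items()
        if len(set(partners) & inactive) >= sl_essentiality_cutoff
    }


# Toy cell line: B is underexpressed but not deleted enough; C is deleted
# but not underexpressed; only A counts as inactive.
scna = {"A": -0.5, "B": -0.05, "C": -0.3}
inactive = inactive_genes(scna, underexpressed={"A", "B"})
sl_net = {"X": ["A", "B"], "Y": ["B", "C"], "Z": ["A"]}
```

Raising the second parameter makes the rule stricter: with a cutoff of 2, none of the toy genes has enough inactive partners to be called essential.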
In the main text we report the results obtained with a Deletioncutoff of -0.1 and an SLessentialitycutoff
of 1. However, we examined the network performance across a broad range of parameters. We
set the Deletioncutoff and SLessentialitycutoff parameters to 10 different values each, ranging from
-0.1 to -1, and from 1 to 10, respectively. In each setting we characterized the predictive signal of
the network by the four empirical p-values as described in the Extended Experimental
Procedures. A full report of the results obtained by each one of the 100 settings is given in Table
S5. Overall, we find that the network prediction performance is highly robust across a fairly
broad range of definitions (Table S5). However, the more stringent the gene loss and essentiality
definitions are, the fewer predictions can be made for the more genetically stable cell lines. Likewise,
genes that have a number of SL-partners that is below the SLessentialitycutoff parameter cannot
be predicted as essential in any cell line, regardless of the genomic profiles of the cell
lines. Below we discuss the Deletioncutoff and SLessentialitycutoff parameters and the tradeoff
between them.
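The 100 settings can be enumerated as below. The text states only the ranges and the number of values per parameter; the even spacing in steps of 0.1 is our assumption.

```python
from itertools import product

# Assumed evenly spaced: -0.1, -0.2, ..., -1.0
deletion_cutoffs = [-k / 10 for k in range(1, 11)]
# 1, 2, ..., 10 inactive SL-partners required
sl_essentiality_cutoffs = list(range(1, 11))

# Every (Deletioncutoff, SLessentialitycutoff) combination.
settings = list(product(deletion_cutoffs, sl_essentiality_cutoffs))
```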
The SCNA level of a gene is the observed vs. expected number of copies it has in a given
sample, on a log2 scale. Hence, if the reference state has two copies of a given gene, a SCNA
level of -1 is equivalent to a heterozygous loss of a gene, meaning, one copy. It should be noted,
that SCNA data is measured at the population-level, and hence contains the average SCNA level
of a given gene in a population of cells. If the sample is contaminated with normal cells, the copy
number of the cancer cells will be more extreme, that is, the SCNA level of the cancer cells will
be higher or lower if the measured SCNA level is positive or negative, respectively.
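A short numerical sketch may help with the log2 scale and the effect of normal-cell contamination. The correction below assumes a simple two-component mixture in which normal cells carry the expected number of copies and the measured ratio is the purity-weighted average; this model and the function names are our assumptions, not part of the original analysis.

```python
import math


def scna_level(observed_copies, expected_copies=2.0):
    """log2 of observed vs. expected copy number."""
    return math.log2(observed_copies / expected_copies)


def tumor_scna_level(measured_level, purity):
    """SCNA level of the cancer cells alone, given the measured level and
    the fraction of cancer cells in the sample (0 < purity <= 1).

    Assumes: measured_ratio = purity * tumor_ratio + (1 - purity) * 1.
    """
    measured_ratio = 2.0 ** measured_level
    tumor_ratio = (measured_ratio - (1.0 - purity)) / purity
    if tumor_ratio <= 0:
        return -math.inf  # consistent with a complete loss in the cancer cells
    return math.log2(tumor_ratio)


one_copy = scna_level(1)                      # heterozygous loss
measured_loss = math.log2(0.75)               # about -0.415
corrected_loss = tumor_scna_level(measured_loss, purity=0.5)
measured_gain = math.log2(1.25)               # about 0.322
corrected_gain = tumor_scna_level(measured_gain, purity=0.5)
```

In both toy cases the corrected level is further from zero than the measured one, in line with the statement above.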
A full deletion of a gene is a rare event − in 78.4% of the cancer SCNA profiles we analyzed
there is not a single gene with a SCNA level lower than -1 (Beroukhim et al., 2010). We
therefore tested several, more moderate, definitions of gene loss (setting the Deletioncutoff to 10
different values ranging from -0.1 to -1). As gene deletion was defined more permissively, one
(partially) deleted SL-partner may not be sufficient to render a gene essential. Hence, we
examined several thresholds on the number of inactive SL-partners that are required to make a
target gene essential (setting the SLessentialitycutoff parameter to 10 different values, ranging from
1-10).
2.3. The prediction-signal and genomic instability
It is more likely that the essentiality of more genes will arise due to synthetic lethality rather than
due to other unrelated causes in cell lines with many inactive genes. Hence, we postulated that
the SL-network will obtain more accurate gene-essentiality-predictions for cell lines with a
higher number of inactive genes. To examine this hypothesis, we computed the Spearman
correlation across all cell lines between the fraction of inactive genes and the prediction-p-
values. The latter were computed as explained in the Extended Experimental Procedures.
We find a significant negative correlation between the fraction of inactive genes in the cell lines
and their prediction-p-values, especially under more stringent Deletioncutoff and SLessentialitycutoff
definitions (Table S5). Hence, the more inactive genes the cell line has, the better the SL-
network predicts its essential genes (Figures S4B-C).
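The rank correlation used here can be sketched without external libraries as a Pearson correlation computed on average ranks. The function names and toy values are hypothetical, and the sketch does not compute the accompanying p-value.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: more inactive genes, lower prediction p-value.
fraction_inactive = [0.1, 0.2, 0.3, 0.4]
prediction_p = [0.5, 0.2, 0.05, 0.01]
rho = spearman(fraction_inactive, prediction_p)
```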
2.4. Comparison to the prediction-signal of a yeast-derived SL-network
We repeated the gene essentiality predictions, with the yeast-derived SL-network, originally
termed the inferred Human SL Network (iHSLN) (Conde-Pueyo et al., 2009), and evaluated the
predictions as described in the Extended Experimental Procedures. The results obtained by the
DAISY-derived-SL-network are significantly superior to those obtained by the iHSLN (Figures
S4D-E, Table S5).
3. Experimentally validating the SL-based predictions of gene essentiality in a
breast cancer cell line
To further examine the gene essentiality predictions obtained based on the SL-network we
conducted a whole genome siRNA screen in the triple negative cell line BT549 under normoxia
and hypoxia (Table S6). The gene essentiality of BT549 has been previously measured via
shRNA (Marcotte et al., 2012). Hence, we could examine the concordance between our
predictions and the experimental screens in comparison to the concordance between the two
experimental screens to each other. We predicted gene essentiality in BT549 by training an SL-
based neural-network model on the gene essentiality reported in the Marcotte screen after
omitting any information regarding BT549 (Extended Experimental Procedures).
The genes that were found as essential in BT549 according to the two experimental screens
significantly overlap, especially when applying a strict definition of gene essentiality (Figures
S4F-G). Notably, the fit between the SL-based gene essentiality predictions and the
experimentally identified gene essentiality is of similar magnitude to the fit between the two
experimental screens (Figures S4F-G). Testifying to the veracity of the predictions, the highest
observed overlap is between the SL-based predictions and the genes that were found as essential
in all screens (hypergeometric p-value of 2.46e-41, Figures S4F-G).
We then compared the predictive value of the SL-based predictions to the predictive value of the
experimental screens. To this end we defined four sets of essential genes in BT549:
1. Essnormxia – The top 10% essential genes according to the siRNA screen conducted under
normoxia.
2. Esshypoxia – The top 10% essential genes according to the siRNA screen conducted under
hypoxia.
3. EssMarcotte – The top 10% essential genes in BT549 according to the shRNA screen (Marcotte
et al., 2012).
4. Essconfident – the intersection between Essnormxia, Esshypoxia, and EssMarcotte.
We also defined four competing predictors of gene essentiality:
1. PsiRNA – the results obtained in the siRNA screen conducted under normoxia (Extended
Experimental Procedures).
2. PshRNA – the results obtained in the shRNA screen for BT549 (Marcotte et al., 2012)
(Extended Experimental Procedures).
3. PSL_Macrotte – the SL-based predictor that was obtained by training the neural network model
on the gene essentiality of other cancer cell lines (not BT549) as reported in the Marcotte
screen (Marcotte et al., 2012).
4. PSL_Achilles – the SL-based predictor that was obtained by training the neural network model
on the gene essentiality of other cancer cell lines as reported in the Achilles screen (Cheung
et al., 2011).
We examined the ability of each predictor to predict gene essentiality as defined by each of the
gene essentiality sets. The SL-based predictors – PSL_Macrotte and PSL_Achilles – predict Essnormxia and
Esshypoxia about as well as PshRNA (AUC ~0.6-0.65, Figures S4H-I). PSL_Macrotte improves
upon PsiRNA in predicting EssMarcotte (AUCs of 0.842 and 0.625, respectively, Figure S4J). Lastly,
the SL-based predictors – PSL_Macrotte and PSL_Achilles – obtain the highest AUCs when predicting
Essconfident (AUCs of 0.951 and 0.682, Figure S4K).
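The evaluation above pairs a "top 10%" essential set with an AUC for each predictor. A minimal sketch is given below; it computes the AUC as the probability that a randomly chosen essential gene outscores a randomly chosen non-essential gene, which is one standard formulation and not necessarily how the reported AUCs were computed. All names and scores are hypothetical.

```python
def top_fraction(scores, fraction=0.10):
    """Genes with the highest scores (higher score = more essential)."""
    k = max(1, round(len(scores) * fraction))
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def auc(predictor_scores, essential):
    """Area under the ROC curve via pairwise comparison; ties count as half.

    Quadratic in the number of genes, which is fine for a sketch.
    """
    pos = [s for g, s in predictor_scores.items() if g in essential]
    neg = [s for g, s in predictor_scores.items() if g not in essential]
    if not pos or not neg:
        raise ValueError("need at least one essential and one non-essential gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


toy_scores = {"g1": 0.9, "g2": 0.8, "g3": 0.3, "g4": 0.1}
toy_auc = auc(toy_scores, essential={"g1", "g3"})
screen = {f"gene{i}": float(i) for i in range(10)}
```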
Next, we utilized the SL-network to predict gene essentiality in BT549 in an unsupervised
manner, meaning, without learning from experimental gene essentiality measurements (Extended
Experimental Procedures). Genes that were predicted to be essential in BT549 are indeed
enriched with the top 10% of essential genes according to the experimental screens
(hypergeometric p-values of 6.88e-12, 3.04e-08, and 1.46e-08, for Essnormxia, Esshypoxia and
EssMarcotte, respectively). Reassuringly the genes that were predicted as essential are most
significantly enriched with Essconfident (hypergeometric p-value of 3.74e-13).
4. Utilizing the SDL-network to predict drug response
4.1. Self-SDL-interactions
Unlike SL-interactions, in the case of SDL-interactions a gene can potentially be an SDL-partner
of itself. Such an interaction implies that over-activation of this gene also induces its essentiality.
This phenomenon is quite frequent in cancer cells, and is termed oncogene addiction (Weinstein
and Joe, 2008). The SDL-network includes 534 inner loops, that is, self-interacting genes. To
assess the significance of these inner loops we performed the unsupervised predictions of drug
response reported in the main text based on the SDL-network with and without self-interactions,
and based only on the self-interactions (Table S8). The overall number of significantly predicted
drugs (Wilcoxon ranksum p-value <0.05) is 21, 17, and 9, when utilizing the SDL-network with
and without self-interactions, and based only on the self-interactions, respectively. As self
interactions improve the prediction performances of the network, we chose to retain them.
However, for many drugs self-interactions alone are insufficient to explain the response to the
drug.
4.2. Robustness analysis of the unsupervised SDL-based drug response predictions
We utilized the SDL-network to predict drug-efficacy in an unsupervised manner. As for the
SL-based prediction of gene essentiality, the prediction is based on two parameters:
Overexpressioncutoff and SDLessentialitycutoff (Extended Experimental Procedures). We repeated
the drug efficacy predictions with different definitions of gene overexpression
(Overexpressioncutoff) and gene essentiality (SDLessentialitycutoff), ranging from 50-90 and 1-5,
respectively. The predictive-signal obtained under each one of the different 25 settings is
reported in Table S8. The prediction-signal is highly robust across a fairly broad range of
definitions. However, when employing more stringent gene essentiality definitions
(SDLessentialitycutoff), we could not predict the response to drugs whose targets have a low
number of SDL-interactions.
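A possible reading of the two SDL parameters is sketched below. We assume that Overexpressioncutoff is a percentile of a gene's expression across cell lines and that a cell line is predicted sensitive when the drug target has at least SDLessentialitycutoff overexpressed SDL-partners; the exact rule is given in the Extended Experimental Procedures, and the names and data here are hypothetical.

```python
def percentile_threshold(values, percentile):
    """Nearest-rank percentile for an integer percentile in 1..100."""
    ordered = sorted(values)
    rank = max(1, (percentile * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def overexpressed_partners(expression, partners, cell_line, overexpression_cutoff):
    """Count partners whose level in this cell line exceeds the percentile cutoff.

    expression: dict gene -> dict cell line -> expression level.
    """
    count = 0
    for gene in partners:
        levels = expression[gene]
        threshold = percentile_threshold(levels.values(), overexpression_cutoff)
        if levels[cell_line] > threshold:
            count += 1
    return count


def predict_sensitive(expression, target_partners, cell_line,
                      overexpression_cutoff=90, sdl_essentiality_cutoff=1):
    n = overexpressed_partners(expression, target_partners, cell_line,
                               overexpression_cutoff)
    return n >= sdl_essentiality_cutoff


toy_expr = {
    "P1": {"c1": 1, "c2": 2, "c3": 3, "c4": 10},
    "P2": {"c1": 5, "c2": 1, "c3": 1, "c4": 9},
}
```

With a stricter SDLessentialitycutoff, targets that have few SDL-partners can never reach the threshold, which mirrors the limitation noted above.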
4.3. shRNA-based functional examination improves SDL-based drug response predictions
To examine the role of the shRNA-based functional examination in the identification of SDL-
interactions we generated an SDL-network without accounting for shRNA data, and utilized it to
predict drug response in an unsupervised manner. The performances of the resulting network in
drug response prediction compared to the performances of the original SDL-network
demonstrate that the inclusion of shRNA data boosts the predictability of the SDL-network
(Figure S7, Table S8).
4.4. Comparing the SDL-based drug response predictors to mutation and genomic
instability based predictors
We have shown that the SDL-network enables accurate prediction of the response of cancer cell
lines to various drugs. To further examine the quality of the SDL-based predictors in a
comparative manner we predicted drug response based on two other well established approaches,
and compared their performance to that obtained by the SDL-predictors.
The first approach is based on the notion that the mutation and copy-number status of the drug
target can be utilized to predict the drug response in cancer. Implementing this approach we
utilized the mutation status and SCNA level of the drug targets, extracted from (Barretina et al.,
2012; Garnett et al., 2012). We first obtained for each drug its single target predictors; each
accounts for one of the drug targets and predicts cell lines in which this specific target is
amplified or mutated (in a missense mutation) as sensitive. For each drug we then considered the
status of all of its targets by generating the following three predictors:
1. The best single target predictor – the predictor that predicted most accurately the
response to the drug (according to a Wilcoxon ranksum test that compared the observed
efficacy of the drug in the predicted sensitive and resistant cell lines).
2. A voting predictor that predicts a cell line to be sensitive to a drug if the majority of the
drug's single target predictors predicted it to be sensitive.
3. A combined predictor that predicts a cell line to be sensitive to a drug if at least one of
the drug's single target predictors predicted it to be sensitive.
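The voting and combined predictors in the list above can be sketched as simple set operations. We assume that "majority" means strictly more than half of the targets; the best single target predictor is omitted because it additionally requires the observed drug efficacy. All names and data are hypothetical.

```python
def single_target_predictions(targets, altered):
    """For each target, the cell lines predicted sensitive by that target alone.

    altered: dict gene -> set of cell lines in which the gene is amplified
    or carries a missense mutation.
    """
    return {t: set(altered.get(t, ())) for t in targets}


def voting_predictor(single, cell_line):
    """Sensitive if a strict majority of single target predictors say so."""
    votes = sum(cell_line in lines for lines in single.values())
    return votes > len(single) / 2


def combined_predictor(single, cell_line):
    """Sensitive if at least one single target predictor says so."""
    return any(cell_line in lines for lines in single.values())


single = single_target_predictions(
    ["T1", "T2", "T3"], {"T1": {"c1", "c2"}, "T2": {"c1"}}
)
```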
The SDL-predictor obtains more accurate predictions for 62.5%, 64.3%, and 78.6% of the drugs
when compared to each of the predictors described in (1)-(3), respectively. Likewise, if
considering only drugs that are significantly (p-value < 0.05) predicted by at least one of the
predictors, the SDL-predictor improves upon competing predictors (1), (2), and (3) in 81.5%,
77.8%, and 88.9% of the drugs, respectively. Lastly, while the SDL-predictor obtains significant
predictions for 22 drugs, competing predictors (1), (2), and (3) obtain significant predictions for
only 9, 4, and 4 drugs, respectively.
The second approach is based on the concept that genomic instability induces drug resistance. To
predict drug response according to genomic instability we computed the genomic instability
index of each cell line based on its SCNA profile, as previously described (Bilal et al., 2013). We
defined a cell line as genomically stable or unstable if its genomic instability index is below or
above the median genomic instability index across the cell lines, respectively. As expected, we
found that the stable cells were more sensitive to the majority of drugs compared to the unstable
cells. However, the SDL-predictor improves upon the genomic-instability predictor in 38
(67.9%) out of the 56 drugs. When considering only the 27 drugs that are significantly predicted
by one of the predictors, the SDL-predictor obtains more accurate predictions for 20 (74.1%)
drugs. Lastly, while the SDL-predictor obtains significant predictions for 22 drugs, the genomic
instability predictor obtains significant predictions for only 9 drugs.
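The median split used to define stable and unstable cell lines can be sketched as follows. The instability index itself is computed as in Bilal et al. (2013) and is taken here as a given input; leaving cell lines that fall exactly on the median unassigned follows the "below or above" wording, and the names and values are hypothetical.

```python
from statistics import median


def split_by_instability(instability_index):
    """Return (stable, unstable) sets of cell lines around the median index."""
    cutoff = median(instability_index.values())
    stable = {c for c, v in instability_index.items() if v < cutoff}
    unstable = {c for c, v in instability_index.items() if v > cutoff}
    return stable, unstable


stable, unstable = split_by_instability({"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.9})
```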
5. Predicting drug-response based on SL-interactions
The SL-network does not enable accurate prediction of the response of cancer cell lines to the
administration of different anticancer drugs (data not shown). This is possibly because anticancer
drugs usually target oncogenes, whose essentiality is mainly dictated by other types of genetic
interactions, such as SDL-interactions. Supporting this claim, the SL-network predicts best the
response to a PARP1 inhibitor (ABT-888, one-sided Wilcoxon ranksum p-value of 0.046, CGP
data), which is one of the few anticancer drugs that currently rely on synthetic lethality. For
comparison, as PARP1 is synthetically lethal with BRCA1/2 (Lord et al., 2008; Turner et al.,
2008), we divided the CGP cell lines according to their BRCA1/2 mutation-status and predicted
that the mutated cell lines will be sensitive to PARP-inhibition. We then compared the IC50
values of ABT-888 in the predicted sensitive and in the predicted resistant cell lines via a one-
sided Wilcoxon ranksum, and obtained a p-value of 0.889. We also used the SCNA and mRNA
levels of the BRCA genes to deduce which cell lines have an inactive form of BRCA1/2. When
predicting these cell lines as sensitive we obtained a one-sided Wilcoxon ranksum p-value of 0.902.
Hence, the SL-based predictions of the response to PARP1 inhibition improve upon those
obtained by accounting for the well-established SL-interactions between the BRCA genes and
PARP1.
Supplemental References
Benjamini, Y., and Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful
Approach to Multiple Testing. Journal of the Royal Statistical Society Series B (Methodological) 57,
289‐300.
Bland, J.M., and Altman, D.G. (2004). The logrank test. BMJ 328, 1073.
Chan, H.‐H., Tsai, S.‐J., and Sun, H.S. (2010). Tumor Associated Gene database
(http://www.binfo.ncku.edu.tw/TAG/GeneDoc.php).
Georgi, B., Voight, B.F., and Bućan, M. (2013). From Mouse to Human: Evolutionary Genomics Analysis
of Human Orthologs of Essential Genes. PLoS Genet 9, e1003484.
Kasprzyk, A. (2011). BioMart: driving a paradigm change in biological data management. Database 2011.
Keshava Prasad, T.S., Goel, R., Kandasamy, K., Keerthikumar, S., Kumar, S., Mathivanan, S., Telikicherla,
D., Raju, R., Shafreen, B., Venugopal, A., et al. (2009). Human Protein Reference Database—2009
update. Nucleic Acids Research 37, D767‐D772.
Knox, C., Law, V., Jewison, T., Liu, P., Ly, S., Frolkis, A., Pon, A., Banco, K., Mak, C., Neveu, V., et al. (2011).
DrugBank 3.0: a comprehensive resource for ‘Omics’ research on drugs. Nucleic Acids Research 39,
D1035‐D1041.
Mosteller, F., and Fisher, R.A. (1948). Questions and Answers. The American Statistician 2, 30‐31.
Su, A.I., Wiltshire, T., Batalov, S., Lapp, H., Ching, K.A., Block, D., Zhang, J., Soden, R., Hayakawa, M.,
Kreiman, G., et al. (2004). A gene atlas of the mouse and human protein‐encoding transcriptomes.
Proceedings of the National Academy of Sciences of the United States of America 101, 6062‐6067.
Waldman, Y.Y., Tuller, T., Shlomi, T., Sharan, R., and Ruppin, E. (2010). Translation efficiency in humans:
tissue specificity, global optimization and differences between developmental stages. Nucleic Acids
Research 38, 2964‐2974.
Weinstein, I.B., and Joe, A. (2008). Oncogene Addiction. Cancer Research 68, 3077‐3080.
Zhao, M., Sun, J., and Zhao, Z. (2013). TSGene: a web resource for tumor suppressor genes. Nucleic Acids
Research 41, D970‐D976.
Table S1. Data Description, Related to Figure 1
Type | Data type | Additional data | No. clinical samples | Reference
Clinical samples | SCNA | -- | 2,201 | (Beroukhim et al., 2010)
Clinical samples | SCNA, mRNA and mutations | -- | 6,296 (2,978 with mutation data) | The Cancer Genome Atlas (TCGA) (The Cancer Genome Atlas Research et al., 2013)
Cancer cell lines | SCNA | -- | 591 | (Beroukhim et al., 2010)
Cancer cell lines | SCNA & mRNA | -- | 995 | The Cancer Cell Line Encyclopedia (CCLE) (Barretina et al., 2012)
Cancer cell lines | mRNA | -- | 790 | CGP (Garnett et al., 2012)
Cancer cell lines | mRNA | -- | 997 | CCLE (Barretina et al., 2012)
Cancer cell lines | shRNA | SCNA and mRNA profiles (Barretina et al., 2012) | 92 | Achilles (Cheung et al., 2011)
Cancer cell lines | shRNA | SCNA and mRNA profiles (Barretina et al., 2012) | 46 | (Marcotte et al., 2012)
Cancer cell lines | shRNA | SCNA profiles (Beroukhim et al., 2010) | 9 | (Luo et al., 2008)