Cell, Volume 158

Supplemental Information

Predicting cancer-specific vulnerability via data-driven detection of synthetic lethality

Livnat Jerby-Arnon, Nadja Pfetzer, Yedael Y. Waldman, Lynn McGarry, Daniel James, Emma Shanks, Brinton Seashore-Ludlow, Adam Weinstock, Tamar Geiger, Paul A Clemons, Eyal Gottlieb, and Eytan Ruppin


Extended Experimental Procedures

Evaluating the DAta-mIning SYnthetic-lethality-identification pipeline (DAISY) based on experimentally detected Synthetic Lethal (SL)-interactions

We tested the fit between the Synthetic Lethal (SL)-pairs identified by the DAta-mIning SYnthetic-lethality-identification pipeline (DAISY) and those detected in six independent synthetic lethality screens conducted in cancer cell lines: (1) an shRNA screen of 88 kinases conducted in renal carcinoma cells to identify the SL-partners of VHL (Bommi-Reddy et al., 2008); (2) a screen of a small-molecule library encompassing 1,200 drugs and drug-like molecules that identified agents selectively lethal to endometrial adenocarcinoma cells lacking functional MSH2 (Martin et al., 2009); (3-4) two high-throughput RNA interference (RNAi) screens that identified determinants of sensitivity to a PARP1 inhibitor in breast cancer among (3) DNA repair genes (Lord et al., 2008) and (4) kinases (Turner et al., 2008); (5) a genome-wide shRNA screen (Luo et al., 2009) and (6) a large-scale siRNA screen (Steckel et al., 2012) that identified genes selectively essential to KRAS-transformed colon cancer cells, but not to derivatives lacking this oncogene.

We applied DAISY to identify the SL-partners of VHL, MSH2 and PARP1, and the Synthetic Dosage Lethal (SDL)-partners of KRAS. Overall, DAISY examined 7,276 gene pairs that were experimentally tested in one of the screens described above. For KRAS, for which two large-scale screens were conducted, DAISY examined as potential KRAS SDL-partners only genes that were tested in both screens, and we considered a gene to be an experimentally identified KRAS-SDL only if it was detected as a KRAS-SDL in both screens. For MSH2, we mapped the drugs used in the screen to their targets according to DrugBank (Knox et al., 2011) and, to avoid ambiguity, disregarded drugs with more than one target.

To rigorously evaluate DAISY's performance in identifying the SL and SDL partners of these key cancer-associated genes, we used the p-values DAISY generated to distinguish, in an unsupervised manner, SL from non-SL gene pairs, or SDL from non-SDL genes. As described in the Experimental Procedures, DAISY computes for every dataset and every pair of genes a p-value that denotes the significance of the association between the genes according to that dataset (prior to correction for multiple hypothesis testing). For every inference procedure, we combined the p-values obtained from its datasets into a single p-value per gene pair via Fisher's combined probability test, also known as Fisher's method (Mosteller and Fisher, 1948).

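$$\text{SoF}_{\text{pvalue}}(A,B) = \text{Fisher's\_Method}\left(\{\text{SoF}_{\text{pvalue},I}(A,B) \mid I \in \text{SoF}_{\text{datasets}}\}\right)$$

$$\text{shRNA}_{\text{pvalue}}(A,B) = \text{Fisher's\_Method}\left(\{\text{shRNA}_{\text{pvalue},I}(A,B) \mid I \in \text{shRNA}_{\text{datasets}}\}\right)$$

$$\text{mRNA}_{\text{pvalue}}(A,B) = \text{Fisher's\_Method}\left(\{\text{mRNA}_{\text{pvalue},I}(A,B) \mid I \in \text{mRNA}_{\text{datasets}}\}\right)$$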

We further integrated the three combined p-values into one p-value per gene pair, again via Fisher's method, considering either all inference procedures or only the SoF and co-expression procedures.

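$$\text{SoF\_mRNA}_{\text{pvalue}}(A,B) = \text{Fisher's\_Method}\left(\{\text{SoF}_{\text{pvalue}}(A,B),\ \text{mRNA}_{\text{pvalue}}(A,B)\}\right)$$

$$\text{All}_{\text{pvalue}}(A,B) = \text{Fisher's\_Method}\left(\{\text{SoF}_{\text{pvalue}}(A,B),\ \text{shRNA}_{\text{pvalue}}(A,B),\ \text{mRNA}_{\text{pvalue}}(A,B)\}\right)$$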

After combining the p-values, we corrected them for multiple hypothesis testing using the Bonferroni correction.
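Fisher's method is a standard construction: for k independent p-values, the statistic X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The Python sketch below (standard library only) illustrates the two-level combination and the Bonferroni step. The function names and the example p-values are hypothetical; the sketch assumes independent p-values in (0, 1], and the number of tests used for the Bonferroni correction is simply the length of the list passed in.

```python
import math
from typing import List, Sequence


def fisher_combine(pvalues: Sequence[float]) -> float:
    """Fisher's method: combine k independent p-values into one.

    Under the null, X = -2 * sum(ln p_i) is chi-square with 2k degrees of
    freedom; for an even number of degrees of freedom the survival function
    has the closed form exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    # p-values are assumed to lie in (0, 1]; clamp to avoid log(0).
    half_x = -sum(math.log(max(p, 1e-300)) for p in pvalues)  # X / 2
    term = total = 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return min(1.0, math.exp(-half_x) * total)


def bonferroni(pvalues: Sequence[float]) -> List[float]:
    """Bonferroni correction: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical per-dataset p-values for one gene pair (A, B).
sof_p = fisher_combine([0.01, 0.20, 0.03])
shrna_p = fisher_combine([0.04, 0.50])
mrna_p = fisher_combine([0.002, 0.10, 0.30, 0.08])
sof_mrna_p = fisher_combine([sof_p, mrna_p])
all_p = fisher_combine([sof_p, shrna_p, mrna_p])
```

With a single p-value the method returns that p-value unchanged, which is a quick sanity check on the closed-form survival function.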

Based on each of the five p-values described above, we generated a Receiver Operating Characteristic (ROC) curve. The ROC curve plots the fraction of true positives (correctly predicted SLs or SDLs) out of all actual positives, that is, the true positive rate, against the fraction of false positives (falsely predicted SLs or SDLs) out of all actual negatives, that is, the false positive rate, across many decision thresholds. Here the decision threshold is an increasing p-value cutoff, starting from the most stringent setting, which yields a very small, top-ranked set of predicted SL and SDL pairs, and moving towards a more permissive setting in which more gene pairs are predicted to interact. The Area Under the ROC Curve (AUC) is a conventional measure of the overall performance of a classifier: an AUC of 0.5 denotes the performance of a random predictor, and an AUC of 1 denotes that of an ideal predictor.


We computed an empirical p-value for the obtained AUC by randomly shuffling the labels 10,000 times and re-computing the AUC with the shuffled labels. We then counted the number of times a random AUC was greater than or equal to the original AUC; this number divided by 10,000 is the empirical p-value of the AUC.
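A minimal Python sketch of the ROC-based evaluation and the label-shuffling test, assuming that lower p-values denote stronger predicted interactions. Instead of tracing the ROC curve point by point, it uses the equivalent Mann-Whitney formulation of the AUC (the probability that a true SL or SDL pair receives a lower p-value than a non-interacting pair, with ties counted as one half). The function names and the fixed random seed are illustrative choices, not part of the original pipeline; the empirical p-value follows the count divided by 10,000 definition given above.

```python
import random


def auc_from_pvalues(pvalues, labels):
    """AUC of a p-value ranking (lower p = predicted interaction).

    Equals the area under the ROC curve traced by sweeping an increasing
    p-value threshold: the probability that a true pair gets a lower p-value
    than a non-interacting pair, counting ties as one half.
    """
    pos = [p for p, y in zip(pvalues, labels) if y]
    neg = [p for p, y in zip(pvalues, labels) if not y]
    if not pos or not neg:
        raise ValueError("need both positive and negative labels")
    wins = sum(1.0 if a < b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def empirical_auc_pvalue(pvalues, labels, n_perm=10_000, seed=0):
    """Fraction of label shufflings whose AUC is >= the observed AUC."""
    rng = random.Random(seed)
    observed = auc_from_pvalues(pvalues, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if auc_from_pvalues(pvalues, shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm
```

The pairwise comparison is quadratic in the number of gene pairs; for thousands of pairs and 10,000 shuffles, a rank-based formula for the same AUC would be much faster.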

Experimentally examining the SL-partners DAISY predicted for the tumor suppressor VHL

Applying DAISY to predict the SL-interactions of VHL

DAISY was applied to detect the SL-partners of VHL (R_min = 0.3; see Experimental Procedures). We considered only high-confidence genes, that is, genes with a DAISY combined p-value below 1e-05. To filter out genes that are essential in RCC4 cells regardless of the loss of VHL, we predicted gene essentiality in this cell line using the SL-network and the SCNA and mRNA profiles of RCC4 cells (Barretina et al., 2012). Genes predicted to be essential in RCC4 cells (expressing pVHL) by either the supervised or the unsupervised SL-based gene essentiality predictor were discarded.

siRNA screening

An siRNA library of 44 targets predicted to reduce the viability of VHL-deficient renal cancer cells was purchased from Qiagen. Individual siRNA pools, each comprising four oligos, were arrayed in 96-well plates. Isogenic RCC4 renal carcinoma cells, either expressing or deficient in VHL, were reverse transfected with 25 nM siRNA using Lipofectamine RNAiMAX (Life Technologies). Internal plate controls included a non-targeting siRNA control (siNTC) and the Allstars HS Cell Death Control (Qiagen). Cells were seeded at 2000 cells/well in DMEM + 10% FCS and, following a 72 h incubation at 37°C/20% O2/5% CO2, were fixed with 4% formaldehyde, washed with PBS and stained with DAPI dilactate. Images were acquired at 10x magnification using the Operetta high-content analysis system (Perkin Elmer), and the number of nuclei was quantified. The screen was performed in triplicate, with two independent replicates, providing 6 data points per siRNA per cell line. The high quality of the screen is reflected in the excellent plate statistics (mean robust Z prime ± SD = 0.77 ± 0.09 for RCC-VHL and 0.086 ± 0.07 for RCC+VHL).
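The section does not spell out the Z prime formula. One common definition of the robust Z prime replaces the means and standard deviations of the classic Z' factor, Z' = 1 - 3(σ_pos + σ_neg) / |μ_pos - μ_neg|, with medians and scaled median absolute deviations (MAD × 1.4826). The Python sketch below assumes that definition, and further assumes that the cell death control wells serve as the positive control and the siNTC wells as the negative control; neither choice is stated in the source.

```python
from statistics import median

MAD_SCALE = 1.4826  # makes the MAD consistent with the SD for normally distributed data


def scaled_mad(values):
    """Median absolute deviation, scaled to estimate a standard deviation."""
    m = median(values)
    return MAD_SCALE * median(abs(v - m) for v in values)


def robust_z_prime(positive, negative):
    """Robust Z' of one plate from positive- and negative-control well counts."""
    spread = scaled_mad(positive) + scaled_mad(negative)
    window = abs(median(positive) - median(negative))
    return 1.0 - 3.0 * spread / window
```

Values above 0.5 are conventionally taken to indicate a well-separated, high-quality assay window.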


Raw cell count data for each test siRNA were normalized to the median siNTC count (n = 8 wells) to generate a percent inhibition (PI):

$$\text{PI} = \frac{\text{median(siNTC)} - \text{test siRNA}}{\text{median(siNTC)}} \times 100$$

The median of all 6 replicates was subjected to outlier analysis using a threshold based on the interquartile range (IQR): a PI was classified as an outlier if it was ≤ Q1 - 1.5·IQR or ≥ Q3 + 1.5·IQR, with a maximum of 2 outliers accepted. The differential sensitivity of the two cell lines to siRNA knockdown was calculated as

$$\Delta\text{PI} = \text{PI}_{\text{VHL-deficient}} - \text{PI}_{\text{VHL-expressing}}$$
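The normalization and outlier steps can be sketched in Python as follows. The source leaves the order of operations partly open, so this sketch assumes that outliers are flagged among the 6 replicate PIs of one siRNA, that flagged replicates are dropped only when there are at most 2 of them (otherwise the siRNA is marked as failing QC and nothing is dropped), and that the remaining replicates are summarized by their median. Quartiles come from the default (exclusive) method of Python's `statistics.quantiles`, which may differ slightly from the software originally used; all function names are hypothetical.

```python
from statistics import median, quantiles


def percent_inhibition(count, ntc_median):
    """PI = (median siNTC - test count) / median siNTC * 100."""
    return 100.0 * (ntc_median - count) / ntc_median


def replicate_median_pi(pis, max_outliers=2):
    """Median PI after IQR-based outlier removal; returns (median, n_outliers, qc_ok).

    Assumption: if more than max_outliers replicates are flagged (which also
    happens when all PIs are identical and the IQR is 0), nothing is dropped
    and qc_ok is False.
    """
    q1, _, q3 = quantiles(pis, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    is_outlier = [p <= low or p >= high for p in pis]
    n_out = sum(is_outlier)
    if n_out > max_outliers:
        return median(pis), n_out, False
    kept = [p for p, out in zip(pis, is_outlier) if not out]
    return median(kept), n_out, True


def delta_pi(deficient_counts, deficient_ntc, expressing_counts, expressing_ntc):
    """Delta PI = PI(VHL-deficient) - PI(VHL-expressing), each normalized to its own siNTC wells."""
    ntc_def, ntc_exp = median(deficient_ntc), median(expressing_ntc)
    pi_def, _, _ = replicate_median_pi([percent_inhibition(c, ntc_def) for c in deficient_counts])
    pi_exp, _, _ = replicate_median_pi([percent_inhibition(c, ntc_exp) for c in expressing_counts])
    return pi_def - pi_exp
```

A positive ΔPI means the siRNA inhibits the VHL-deficient cells more strongly than the VHL-expressing cells, which is the signature expected of an SL-partner of VHL.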

Comparing our siRNA screen to the Bommi-Reddy screen

One of the genes DAISY predicted to be an SL-partner of VHL (MYT1) had previously been identified as an SL-partner of VHL in a screen that searched for the SL-partners of VHL among 88 kinases (Bommi-Reddy et al., 2008). We treated MYT1 as a positive-control anchor for comparing our screen with the Bommi-Reddy et al. screen. In our screen, the inhibition of 45.4% of the genes was at least as selective as the inhibition of MYT1, whereas only 11.9% of the genes examined in the Bommi-Reddy et al. screen had this property. Hence, according to this joint positive control, our screen detected 3.83 times more SL-interactions than expected based on the previous screen (Bernoulli p-value of 4.76e-09).
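The fold enrichment and its p-value can be illustrated with a one-sided binomial test, which treats each gene in our screen as a Bernoulli trial whose success probability under the null is the fraction observed in the earlier screen. This Python sketch assumes that reading of the Bernoulli p-value; the counts in the usage lines are hypothetical placeholders, not the screen's actual numbers.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enrichment_vs_reference(k, n, reference_fraction):
    """Fold enrichment of k/n over a reference fraction, with a one-sided binomial p-value."""
    fold = (k / n) / reference_fraction
    return fold, binomial_upper_tail(k, n, reference_fraction)


# Hypothetical counts: k of n tested genes at least as selective as the anchor gene,
# compared with the 11.9% observed in the earlier screen.
fold, pval = enrichment_vs_reference(k=20, n=44, reference_fraction=0.119)
```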

Drug screen

Nine drugs whose targets DAISY predicted to be selectively essential in VHL-deficient renal cells were tested. All drugs were purchased from Sigma-Aldrich (Dorset, UK), diluted in water, and serially diluted in 1-in-3 steps. Staurosporine was used as a positive control and water as a negative control. Cells were plated at 2000 cells/well in 198 µl medium in 96-well black clear-bottom plates and cultured for 24 h, after which drugs were added in a volume of 2 µl. For each drug, a range of concentrations was tested to identify a suitable working concentration at which there was an effect on cell growth, but not complete cell death, which is more likely to be due to non-specific toxicity; the final concentrations are given in Table S2. Plates were incubated for 24 hours, after which cells were fixed with 4% formaldehyde and stained with DAPI. Nuclei were counted using a high-content imaging system (Operetta, Perkin Elmer). Values from three treated wells were averaged and normalized to vehicle-treated cells, and the EC50 was calculated for each drug.
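The source does not state how the EC50 was fitted; dose-response data are usually fitted with a four-parameter logistic model. As a simpler stand-in, the Python sketch below builds the 1-in-3 dilution series, shows the 100-fold dilution implied by adding 2 µl of drug to 198 µl of medium, and estimates the EC50 by linear interpolation on log concentration between the two doses that bracket 50% of the vehicle-normalized response. The function names and the interpolation approach are assumptions for illustration.

```python
import math


def dilution_series(top_conc, n_steps, factor=3.0):
    """Serial 1-in-`factor` dilutions starting from top_conc."""
    return [top_conc / factor**i for i in range(n_steps)]


def stock_for_final(final_conc, drug_volume_ul=2.0, medium_volume_ul=198.0):
    """Stock concentration needed when drug_volume_ul is added to medium_volume_ul (100x here)."""
    return final_conc * (drug_volume_ul + medium_volume_ul) / drug_volume_ul


def ec50_log_interpolation(concs, responses):
    """Concentration at which the response (% of vehicle) crosses 50%.

    Simplified stand-in for a four-parameter logistic fit: linear
    interpolation on log10(concentration) between the bracketing doses.
    Returns None if the response never crosses 50%.
    """
    points = sorted(zip(concs, responses))
    for (c1, r1), (c2, r2) in zip(points, points[1:]):
        if r1 != r2 and (r1 - 50.0) * (r2 - 50.0) <= 0:
            frac = (r1 - 50.0) / (r1 - r2)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    return None
```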

Examining the SL-network based on gene essentiality data

The utility of an SL-network can be examined by employing it in an unsupervised manner to predict gene essentiality in a cell-line-specific manner, and testing whether these predictions are supported by experimental results obtained in shRNA screens. The procedure is based on two parameters:

Deletion_cutoff: the SCNA level below which a gene is considered deleted.

SL_essentiality_cutoff: the minimal number of inactive SL-partners that renders a gene essential.

Given these parameters, the procedure is performed as follows for every cell line: (1) underexpressed genes with an SCNA level below Deletion_cutoff are defined as inactive; (2) the number of inactive SL-partners of each gene denotes its predicted essentiality; (3) genes with at least SL_essentiality_cutoff inactive SL-partners are predicted to be essential.

To validate the SL-network in this manner, we first reconstructed it without the shRNA datasets to avoid any circularity. We employed it to predict gene essentiality in 129 cancer cell lines, for which we had both gene expression and SCNA data to generate the predictions and gene essentiality data for validation (Barretina et al., 2012; Cheung et al., 2011; Marcotte et al., 2012). We set Deletion_cutoff to -0.1, based on the literature (Beroukhim et al., 2010), and SL_essentiality_cutoff to 1; that is, a gene is predicted to be essential in a cell line if at least one of its SL-partners is inactive. A gene was considered underexpressed if its expression was below the 10th percentile of its expression across all samples in the dataset. We examined a range of Deletion_cutoff and SL_essentiality_cutoff values, demonstrating the robustness of the SL-network's performance (Table S5).
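The three-step procedure can be sketched in Python as follows, with the parameter values given above as defaults. The dictionary-based inputs, the function names, the use of `statistics.quantiles` for the 10th percentile, and treating a gene with no SCNA value as not deleted are illustrative assumptions; the sketch also stores each gene's SL-partners as a plain list and ignores the direction of the network's edges.

```python
from statistics import quantiles


def inactive_genes(cell_expr, cell_scna, expr_across_samples, deletion_cutoff=-0.1):
    """Step (1): underexpressed (below the gene's 10th percentile) and SCNA below deletion_cutoff."""
    inactive = set()
    for gene, level in cell_expr.items():
        tenth_pct = quantiles(expr_across_samples[gene], n=10)[0]
        # Assumption: a gene without SCNA data (default 0.0) is not considered deleted.
        if level < tenth_pct and cell_scna.get(gene, 0.0) < deletion_cutoff:
            inactive.add(gene)
    return inactive


def predict_essentiality(sl_partners, inactive, sl_essentiality_cutoff=1):
    """Steps (2)-(3): count inactive SL-partners; call genes essential at or above the cutoff."""
    scores = {g: sum(p in inactive for p in partners) for g, partners in sl_partners.items()}
    essential = {g for g, s in scores.items() if s >= sl_essentiality_cutoff}
    return scores, essential
```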

We evaluated the gene essentiality predictions against the experimental shRNA scores reported in two different shRNA screens (Cheung et al., 2011; Marcotte et al., 2012). The lower the shRNA-essentiality-score, the more essential the gene. The evaluation was performed as follows.


1. For each cell line we obtained four p-values:

a. Two one-sided Wilcoxon rank-sum p-values, denoting whether the shRNA-essentiality-scores of the genes predicted to be essential are significantly lower than those of the genes predicted to be nonessential, when considering all genes or only SL-network genes as the background model.

b. Two hypergeometric p-values, denoting whether the genes predicted to be essential are significantly enriched with experimentally identified essential genes, when considering all genes or only SL-network genes as the background model. We defined a gene as experimentally essential if its shRNA-essentiality-score in a given cell line was below the 10th percentile of the shRNA-essentiality-scores reported in the screen.

2. According to each of these four p-values, we computed the number of cell lines for which the predictions significantly match the experimental findings (p-value < 0.05).
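For step 1b, the enrichment p-value is the upper tail of the hypergeometric distribution. In this minimal Python sketch, `background` is the number of genes in the chosen background model (all genes or SL-network genes), `essential` the number of experimentally essential genes among them, `predicted` the number of genes predicted to be essential, and `overlap` the number of genes in both sets; these parameter names are hypothetical. The rank-sum test of step 1a is not reproduced here.

```python
from math import comb


def hypergeom_enrichment_p(overlap, background, essential, predicted):
    """P(X >= overlap) when `predicted` genes are drawn at random from
    `background` genes, `essential` of which are experimentally essential."""
    upper = min(essential, predicted)
    # math.comb returns 0 when asked to choose more items than exist.
    hits = sum(comb(essential, i) * comb(background - essential, predicted - i)
               for i in range(overlap, upper + 1))
    return hits / comb(background, predicted)
```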

To examine the significance of the SL-network's results, we predicted gene essentiality based on 10,000 random networks with the same topology as the SL-network and evaluated their predictions. Based on the performance of the random networks, we obtained four empirical p-values, each denoting whether the performance of the SL-network is significant according to one of the four p-values described in (1) above.
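The source does not specify how the random networks were generated. One way to obtain a network with exactly the same topology is to keep every edge and randomly permute the gene labels on the nodes, which preserves the degree sequence while scrambling which genes are connected. The Python sketch below assumes that construction; `score_fn` is a hypothetical placeholder for any of the four performance measures (for example, the number of cell lines with a significant match).

```python
import random


def relabeled_network(edges, rng):
    """Same topology (identical edge structure), gene labels randomly permuted."""
    genes = sorted({g for edge in edges for g in edge})
    shuffled = genes[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(genes, shuffled))
    return [(mapping[a], mapping[b]) for a, b in edges]


def empirical_network_pvalue(edges, score_fn, n_random=10_000, seed=0):
    """Fraction of random networks scoring at least as well as the real one."""
    rng = random.Random(seed)
    observed = score_fn(edges)
    hits = sum(score_fn(relabeled_network(edges, rng)) >= observed for _ in range(n_random))
    return observed, hits / n_random
```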

Examining the SDL-network based on drug efficacy measurements

We evaluated the validity of the SDL-network by employing it, in an unsupervised manner, to predict the sensitivity of different cancer cell lines to various drugs, and testing the predictions against drug efficacy measurements. The procedure is based on two parameters:

Overexpression_cutoff: a threshold for identifying overexpressed genes. For every gene, we computed the Overexpression_cutoff percentile of its expression level across the different samples in the dataset, and defined the gene as overexpressed if its expression is above this percentile.

SDL_essentiality_cutoff: the minimal number of overexpressed SDL-partners that renders a gene essential.


Given these two parameters, for every cell line we identified its overexpressed genes, predicted genes with at least SDL_essentiality_cutoff overexpressed SDL-partners to be essential, and predicted the cell line to be sensitive to drugs whose targets were predicted to be essential in it. For each drug, we tested whether its efficacy is higher in the cell lines predicted to be sensitive than in the cell lines predicted to be resistant to it (one-sided Wilcoxon rank-sum test). We then computed the fraction of drugs for which the network significantly differentiates (p-value < 0.05) between sensitive and resistant cell lines. We repeated the drug efficacy predictions based on 10,000 random networks with the same topology as the SDL-network and obtained empirical p-values denoting the significance of the SDL-network's performance in this task.
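A Python sketch of this procedure for a single drug, using the parameter values stated below (80th percentile, at least 2 overexpressed SDL-partners) as defaults. It assumes an efficacy score oriented so that higher means more effective (for IC50 data, for example, the negative log IC50) and computes the one-sided rank-sum p-value with the normal approximation and no tie correction; exact or tie-corrected implementations in statistics packages may give slightly different p-values. All names are hypothetical.

```python
import math
from statistics import quantiles


def overexpressed_genes(cell_expr, expr_across_samples, cutoff_pct=80):
    """Genes whose expression in this cell line exceeds their cutoff_pct-th percentile."""
    over = set()
    for gene, level in cell_expr.items():
        threshold = quantiles(expr_across_samples[gene], n=100)[cutoff_pct - 1]
        if level > threshold:
            over.add(gene)
    return over


def predicted_essential(sdl_partners, over, sdl_cutoff=2):
    """Genes with at least sdl_cutoff overexpressed SDL-partners."""
    return {g for g, partners in sdl_partners.items()
            if sum(p in over for p in partners) >= sdl_cutoff}


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def ranksum_greater_p(x, y):
    """One-sided p-value that values in x tend to be larger than values in y."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 0.5 * math.erfc(z / math.sqrt(2))


def drug_pvalue(efficacy, essential_by_cell, targets):
    """efficacy: cell line -> score (higher = more effective); sensitive if any target is essential."""
    target_set = set(targets)
    sensitive = [e for c, e in efficacy.items() if essential_by_cell[c] & target_set]
    resistant = [e for c, e in efficacy.items() if not (essential_by_cell[c] & target_set)]
    if not sensitive or not resistant:
        return None
    return ranksum_greater_p(sensitive, resistant)
```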

To test the predictions, we used data from the Cancer Genome Project (CGP) (Garnett et al., 2012) and Cancer Therapeutics Response Portal (CTRP) (Basu et al., 2013) pharmacological screens. The CGP data contain the IC50 values of 131 drugs across 639 cancer cell lines (the IC50 of a drug is the concentration required to reduce cell viability by 50%). The CTRP data include the sensitivities of 242 cancer cell lines to 354 small molecules; the sensitivity measure in this case is termed area-under-the-dose-curve. We extracted the gene expression profiles of 593 of the 639 cell lines used in the CGP data from the CGP, and the expression profiles of 241 cell lines used in the CTRP from the Cancer Cell Line Encyclopedia (CCLE) (Barretina et al., 2012). As our method exploits the SDL-network to deduce the efficacy of each drug in a given context, we could perform the prediction only for drugs that had at least one of their targets in the SDL-network: 37 and 50 drugs in the CGP and CTRP data, respectively. We mapped the drugs to their targets based on the mappings reported in the CGP, the CTRP, and DrugBank (Basu et al., 2013; Garnett et al., 2012; Knox et al., 2011).

We set Overexpression_cutoff to 80 and SDL_essentiality_cutoff to 2. Under these definitions, we could predict the response of cells only to drugs that had targets with at least two SDL-partners: 23 and 33 drugs in the CGP and CTRP data, respectively. We examined the sensitivity of the predictions to the Overexpression_cutoff and SDL_essentiality_cutoff parameters; the prediction performance across different parameter settings is provided in Table S8. Lastly, to evaluate single SDL-interactions, we repeated this analysis for each SDL pair alone, instead of using the entire SDL-network.

Supervised prediction: Data description

We constructed two types of neural network models. The first predicts a gene-cell line pair relation, that is, whether a specific gene is essential in a specific cancer cell line. The second predicts a drug-cell line pair relation, that is, the efficacy of a specific drug in a given cell line. Both models use a similar set of 53 features characterizing the gene's neighborhood in the SL or SDL network and key genomic features of the cell line in question. Below we describe the prediction models and the features used to construct them.

Supervised SL-based predictions of gene essentiality. For each gene-cell line pair, the first type of model is given a set of 53 features (see below) and predicts, based on these features, whether the gene is essential in the cancer cell line. To generate the features, we used the SL-network that was reconstructed without the shRNA datasets, to avoid any potential circularity. For each of the two gene essentiality datasets (Cheung et al., 2011; Marcotte et al., 2012), we generated a separate gene essentiality predictor, trained to predict the essentiality of genes that are included in the SL-network and were tested in the corresponding screen.

To predict the gene essentiality data reported in (Marcotte et al., 2012), we generated a neural network model that predicts the essentiality of 1,510 SL-network genes in 46 cancer cell lines. If the zGARP score of a gene in a cell line was below -1.289 (the 10th percentile of the zGARP scores), the gene was denoted as essential in that cell line and the pair was labeled 1; otherwise, it was labeled -1 (that is, non-essential). We performed the prediction for 69,460 gene-cell line pairs, 8,994 (12.9%) of which were labeled 1 and the rest -1.

To predict the gene essentiality data reported in (Cheung et al., 2011), we generated a neural network model that predicts the essentiality of 744 SL-network genes in 92 cancer cell lines. If the shRNA score of a gene in a cell line was below -1.567 (the 10th percentile of the shRNA scores), the gene was denoted as essential in that cell line and the pair was labeled 1; otherwise, it was labeled -1. We performed the prediction for 66,960 gene-cell line pairs, 7,821 (11.7%) of which were labeled 1 and the rest -1 (1,488 pairs were omitted due to lack of data).

Supervised SDL-based predictions of drug efficacy. The second type of model is given a set of features that define a drug-cell line pair and predicts the efficacy of the drug when administered to the cell line. We constructed such prediction models for each of the pharmacological datasets separately: (1) models that predict log IC50 values and are trained and tested on the CGP data (Garnett et al., 2012), and (2) models that predict the area-under-the-dose-curve and are trained and tested on the CTRP data (Basu et al., 2013). The features used to build the predictors were generated from the SDL-network and the genomic profiles of the cell lines (see next section). To generate the features, we extracted from the CCLE the gene expression and SCNA profiles of 414 and 241 of the cell lines used in the CGP and CTRP data, respectively. As our method exploits the SDL-network to deduce the efficacy of each drug in a given cell-line-specific genomic context, we could perform the prediction only for drugs that had at least one of their targets in the SDL-network: 41 and 50 drugs in the CGP and CTRP data, respectively. For the CGP data, the resulting matrix of 414 cell lines by 41 drugs contains 9,657 IC50 values and 7,317 missing values; overall we had 9,610 drug-cell line pairs, as 47 pairs were removed due to lack of genomic data (missing mRNA or SCNA data). For the CTRP data, the resulting matrix of 241 cell lines by 50 drugs contains 8,287 efficacy values and 3,763 missing values; overall we had 8,001 drug-cell line pairs, as 286 pairs were removed due to lack of genomic data.

Supervised prediction: Features

We extracted 53 features that describe the state of a given gene in a given cell line, based on the SL or SDL network combined with SCNA and mRNA data extracted from the CCLE (Barretina et al., 2012):

1. The number of inactive SL-partners or overactive SDL-partners the gene has in the cell line. (A gene is defined as inactive if it is underexpressed and its SCNA level is below -0.3, and as overactive if it is overexpressed and its SCNA level is above 0.3.)

2-13. The sum, mean, minimal, and maximal levels of the SCNA, mRNA, and normalized mRNA measurements of the gene's SL or SDL partners in the specific cell line tested. (The mRNA measurements were normalized via z-scores, such that the mean and standard deviation of the expression of each gene across the samples are 0 and 1, respectively.)

14-25. The sum, mean, minimal, and maximal levels of the SCNA, mRNA, and normalized mRNA measurements of the gene's SL or SDL partners across all cell lines.

26-27. The mRNA and SCNA levels of the gene in the cell line, each multiplied by the number of inactive SL-partners or overactive SDL-partners it has.

28-37. To capture key features of the gene's position in the SL and SDL networks, we performed a Principal Component Analysis (PCA) of the networks' adjacency matrices. As the networks are directed and their adjacency matrices are not symmetric, we also performed PCA on the transposed adjacency matrices. We then used the gene's scores on the first five principal components of each of these matrices.

38-39. The in- and out-degree of the gene in the SL or SDL network.

40-45. The mean, minimal and maximal SCNA and mRNA levels of the gene across the different cell lines.

46-47. The mRNA and SCNA levels of the gene in the cell line.

48-53. The mean, minimal and maximal mRNA and SCNA levels measured in the cell line.
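The network-derived features 28-39 can be sketched with NumPy as below: PCA is computed through a singular value decomposition of the column-centered adjacency matrix (one row per gene), once for the matrix and once for its transpose, and the in- and out-degrees are column and row sums. The orientation convention `adj[i, j] = 1` for an edge from gene i to gene j and the SVD-based PCA are assumptions of this sketch. Principal component signs are arbitrary, so another implementation may return sign-flipped scores.

```python
import numpy as np


def pca_scores(matrix, n_components=5):
    """Scores of each row on the leading principal components (via SVD)."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def network_features(adj, n_components=5):
    """Features 28-39 per gene: 5 PCs of adj, 5 PCs of adj.T, in-degree, out-degree."""
    adj = np.asarray(adj, dtype=float)
    pcs_rows = pca_scores(adj, n_components)    # gene viewed as the source of edges
    pcs_cols = pca_scores(adj.T, n_components)  # gene viewed as the target of edges
    in_degree = adj.sum(axis=0)
    out_degree = adj.sum(axis=1)
    return np.column_stack([pcs_rows, pcs_cols, in_degree, out_degree])
```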

To predict drug efficacy in various cancer cell lines, we transformed these gene-cell features into drug-cell features: we mapped each drug to its target genes and computed each drug-cell feature as the average of the corresponding gene-cell features of the drug's targets. The mapping between drugs and their targets was based on the CGP (Garnett et al., 2012), the CTRP (Basu et al., 2013), and DrugBank (Knox et al., 2011).

Constructing supervised neural network predictors

We built neural network predictors using the MATLAB implementation of a feed-forward multilayer perceptron (the function 'fitnet') with the default parameters. We defined three layers: input, hidden and output. The number of features (53, see above) determined the number of input units, the number of hidden units was 20, and the perceptron activation function was the sigmoid function. We performed 5-fold cross-validation to build our models: we separated the original dataset into five equally sized test sets, obtained by randomly distributing all gene-cell or drug-cell pairs into five sets. For the discrete (gene-cell) labels, each test set had the same ratio of positive to negative samples as the full dataset. In each iteration of the cross-validation, 60% of the data was used to train the model, 20% was used for internal validation, and the remaining 20% (the test set) was used exclusively for testing the model.
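The stratified 5-fold scheme with a 60/20/20 split can be sketched in Python as follows. Which fold serves as the internal validation set in each iteration is not stated in the source; this sketch assumes it is the fold after the test fold. The model itself (MATLAB `fitnet` with 20 hidden units in the original) is not reproduced, and the function names are hypothetical.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds, each with (nearly) the same class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cross_validation_splits(labels, k=5, seed=0):
    """Yield (train, validation, test) index lists: k-2 folds / 1 fold / 1 fold (60/20/20 for k=5)."""
    folds = stratified_folds(labels, k, seed)
    for t in range(k):
        v = (t + 1) % k  # assumption: the fold after the test fold is used for internal validation
        train = [i for f in range(k) if f not in (t, v) for i in folds[f]]
        yield train, folds[v], folds[t]
```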

Predicting gene essentiality based on experimental sh/siRNA screens

The results of gene essentiality screens can be quantified directly, by measuring the level of growth inhibition observed when knocking down a gene in a cell line, or indirectly, by measuring the depletion rate of the shRNA or siRNA probes that inhibit the gene. In either case, we refer to the output of the screens as gene essentiality scores, denoting for each gene-cell pair the level of essentiality of the given gene in the given cell line, such that the higher the score, the more essential the gene.

We assessed the agreement between pairs of gene essentiality screens conducted on the same cell line(s) and, in doing so, generated competing predictors to our SL-based gene essentiality predictors. To this end, we used the gene essentiality levels obtained in one screen to predict the gene essentiality observed in another screen, as follows. First, we defined which screen is to be predicted and which screen will function as the predictor. We labeled each gene-cell pair as true if the gene was found to be essential in the given cell line in the predicted screen, and as false otherwise. A gene was identified as essential in a screen if its essentiality score was among the top 10% of scores obtained in that screen.

We then defined, based on the predictor screen, all possible valid predictions. A prediction is valid if, whenever a gene with essentiality level X in a given cell line (according to the predictor screen) is predicted to be essential, every other gene-cell pair with an essentiality level equal to or greater than X is also predicted as true. Hence, the number of valid predictions based on a predictor screen equals the number of unique gene essentiality values obtained in that screen. For each valid prediction, we then quantified its True Positive Rate (TPR) and False Positive Rate (FPR) to obtain the ROC curve of the predictor. The AUC of this ROC curve represents the prediction accuracy of the predictor screen and can be compared to the AUCs obtained by other predictors, such as our SL-based predictors.
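A Python sketch of this screen-versus-screen comparison, assuming plain lists of scores aligned by gene-cell pair (higher = more essential, as defined above). Each unique score in the predictor screen defines one valid prediction (threshold), so the ROC curve has one point per unique value plus the origin. How ties at the top-10% boundary are resolved is not specified in the source; this sketch labels every pair whose score reaches the cutoff value as essential, which is an assumption. The threshold loop is quadratic and intended only for illustration.

```python
import math


def top_fraction_labels(scores, fraction=0.10):
    """True for pairs whose score is among the top `fraction` of the screen."""
    k = max(1, math.ceil(fraction * len(scores)))
    cutoff = sorted(scores, reverse=True)[k - 1]
    return [s >= cutoff for s in scores]


def screen_roc(predictor_scores, labels):
    """ROC points (FPR, TPR) and AUC; one threshold per unique predictor score."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(predictor_scores), reverse=True):
        tp = sum(1 for s, y in zip(predictor_scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(predictor_scores, labels) if s >= threshold and not y)
        points.append((fp / neg, tp / pos))
    auc = sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))
    return points, auc
```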

Experimentally validating the SL-based prediction of gene essentiality in a breast cancer cell line: siRNA screening

Cells were grown in DMEM/F12 (1:1) medium (Cat#: 21331, with 10% FCS and 2 mM glutamine) and reverse-transfected in duplicate with 25 nM Dharmacon ON-TARGETplus SMARTpools in 96-well plates using Lullaby transfection reagent. A SMARTpool targeting PLK1 and a non-targeting pool were used as positive and negative controls, respectively. After 24 hours, the culture medium was topped up with fresh medium (200 µl final volume) and cells were incubated for a further 72 hours in designated incubators with 20% or 1% oxygen. Cells were then fixed with 4% formaldehyde and stained with DAPI. Nuclei were counted using a high-content imaging system (Operetta, Perkin Elmer). Inhibition was calculated as

$$\text{Inhibition} = \frac{\text{MEDIAN(NTC)} - \text{SAMPLE}}{\text{MEDIAN(NTC)} - \text{MEDIAN(PLK1)}} \times 100$$

and the average of the 2 replicates was calculated.
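In this normalization, the non-targeting control defines 0% inhibition and the PLK1 positive control defines 100%. A minimal Python sketch, assuming per-plate lists of nuclei counts; the function name and the counts in the test are hypothetical.

```python
from statistics import mean, median


def normalized_inhibition(sample_counts, ntc_counts, plk1_counts):
    """Mean % inhibition over replicates: 0% = median NTC, 100% = median PLK1."""
    ntc, plk1 = median(ntc_counts), median(plk1_counts)
    per_replicate = [100.0 * (ntc - s) / (ntc - plk1) for s in sample_counts]
    return mean(per_replicate)
```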

Utilizing the SL-network to predict breast cancer prognosis

We analyzed the gene expression profiles of 2,000 breast cancer clinical samples to examine the prognostic value embedded in the SL-network (Curtis et al., 2012). We disregarded samples whose survival status was ambiguous or unknown, resulting in 1,586 samples. Based on the expression of the genes in each SL-pair, we defined two groups of patients:

1. The SL- group, consisting of patients whose tumors underexpressed both of the SL-paired genes; a gene is defined as underexpressed if its expression level in the sample is lower than its median expression level across all samples.

2. The SL+ group, consisting of patients whose tumors expressed at least one of the SL-paired genes; a gene is defined as expressed if its expression level in the sample is at least as high as its median expression level across all samples.

For each SL-pair, we generated 15-year survival Kaplan-Meier (KM) plots of its corresponding SL- and SL+ groups of patients and obtained a logrank p-value denoting the significance of the separation between the two groups in terms of prognosis (Bland and Altman, 2004). In addition, we defined a signed KM-score whose magnitude (absolute value) is -ln(logrank p-value); hence, the more significant the logrank p-value, the higher the magnitude of the signed KM-score. The sign of the signed KM-score is positive if the SL- group had a better prognosis than the SL+ group, and negative otherwise. The rationale behind the signed KM-score is that we expect the SL-pairs not only to significantly separate groups of patients with respect to their prognosis (as reflected by the logrank p-value), but to do so in a directional manner: the SL- group is expected to have a better prognosis than the SL+ group, since co-underexpression of paired SL genes is likely to increase the vulnerability of the tumor.
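The grouping, logrank test and signed KM-score can be sketched in Python as follows. The logrank statistic is the standard two-group version: observed minus expected deaths in the SL- group, squared, divided by its hypergeometric variance, and compared with a chi-square distribution with one degree of freedom. Censoring follow-up at 15 years, using observed-versus-expected deaths to decide which group fared better, and all function names are assumptions of this sketch rather than details from the source.

```python
import math
from statistics import median


def sl_minus_groups(expr_a, expr_b):
    """True for SL- samples (both genes below their median expression), False for SL+."""
    med_a, med_b = median(expr_a), median(expr_b)
    return [a < med_a and b < med_b for a, b in zip(expr_a, expr_b)]


def logrank_two_groups(times, events, in_group, horizon=15.0):
    """Two-group logrank test.

    Returns (p-value, observed - expected deaths in the flagged group).
    Follow-up beyond `horizon` is censored (assumed 15-year window).
    Raises ZeroDivisionError if the variance is zero (e.g. no events).
    """
    data = [(min(t, horizon), bool(e) and t <= horizon, g)
            for t, e, g in zip(times, events, in_group)]
    observed = expected = variance = 0.0
    for t in sorted({tt for tt, died, _ in data if died}):
        at_risk = [(tt, died, g) for tt, died, g in data if tt >= t]
        n = len(at_risk)
        n1 = sum(1 for _, _, g in at_risk if g)
        d = sum(1 for tt, died, _ in at_risk if tt == t and died)
        d1 = sum(1 for tt, died, g in at_risk if tt == t and died and g)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2)), observed - expected


def signed_km_score(times, events, expr_a, expr_b):
    """-ln(logrank p), positive when the SL- group has fewer deaths than expected."""
    p, o_minus_e = logrank_two_groups(times, events, sl_minus_groups(expr_a, expr_b))
    sign = 1.0 if o_minus_e < 0 else -1.0
    return sign * -math.log(p)
```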

We repeated the analysis described above with two groups of 10,000 randomly selected gene pairs: (a) pairs selected from SL-network genes, and (b) pairs selected from all genes. We then compared the results (logrank p-values and signed KM-scores) obtained with the original SL-network pairs to those obtained with these control groups via a one-sided Wilcoxon rank-sum test.

For each SL-pair, we further performed a Cox regression to evaluate whether its prognostic value remains significant when accounting for the following clinical characteristics of the breast cancer patients: age at diagnosis, grade, tumor size, lymph nodes, estrogen receptor expression, HER2 expression, progesterone receptor expression, and genomic instability index (as previously defined (Bilal et al., 2013)). The logrank and Cox regression p-values obtained for every SL-pair are given in Table S7. Correction for multiple hypothesis testing was performed using the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995).
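The Benjamini-Hochberg step-up adjustment is standard. This short Python sketch returns adjusted p-values in the original input order, so that SL-pairs whose adjusted value falls below the chosen false discovery rate can be declared significant; the function name is illustrative.

```python
def benjamini_hochberg(pvalues):
    """BH-adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```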

Lastly, we classified the patients according to the overall behavior of the SL-network. Instead of considering only the expression of a specific SL-pair, we considered the expression of the entire set of SL-pairs in a given sample, and defined the global SL-score of a sample as the number of SL-pairs in the network that are co-underexpressed in it. As a random model, we generated random networks with the same topology as the SL-network, consisting of genes essential in breast cancer: the 2,077 genes with the lowest average zGARP scores measured in 29 breast cancer cell lines (Marcotte et al., 2012). Each random network includes 2,077 genes, matching the number of genes in the original SL-network. Based on each of these networks, we computed for each sample the number of connected gene pairs it co-underexpressed (its global SL-score), and divided the samples into four equally sized groups according to these scores. For each random network, we then computed a logrank p-value denoting whether the 15-year survival of the four groups is significantly different. We also examined whether the order of the four groups is as expected, that is, whether the groups with higher global SL-scores had better 15-year survival. We then counted the number of random networks that obtained a logrank p-value at least as low as that obtained by the SL-network and also had the right order of groups in terms of survival.

In this analysis we did not use random networks composed of the SL-network genes as a control, because the global SL-scores obtained by such networks are highly correlated with those of the original network (mean Spearman correlation coefficient of 0.927, p-value <1e-30).


Supplemental Results

1. Characterizing the SL and SDL networks

1.1. The essentiality and evolutionary conservation of SL and SDL network genes

Genes that participate in SL and SDL interactions may be viewed as context-specific essential genes. Linking synthetic lethality to essentiality, it has been shown in yeast that there is a strong correlation between the number of SL-interactions a gene has and the fitness of its single mutant (Costanzo et al., 2010): genes whose single mutants have severe fitness defects tend to have more SL-interactions. In light of this observation, we examined different properties of the SL and SDL network genes to evaluate their level of essentiality.

We utilized a set of 2,472 essential genes in mouse and their orthologs in human (Georgi et al.,

2013). Based on this set we find that SL and SDL genes are significantly enriched with orthologs

of mouse essential genes (hypergeometric p-values << 1e-30). Furthermore, in concordance with

the findings in yeast, the likelihood that a gene is an ortholog of a mouse essential gene increases with its degree in the network (Figure S2A).
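The hypergeometric enrichment test used here (and the fold enrichment reported later for the PPI comparison) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the parameter names are assumptions: `population` is the set of genes considered, `successes` the number of orthologs of mouse essential genes among them, `draws` the number of network genes, and `k` the observed overlap.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = sum(comb(successes, i) * comb(population - successes, draws - i)
                for i in range(k, upper + 1))
    return total / comb(population, draws)


def fold_enrichment(k, population, successes, draws):
    """Observed overlap divided by the overlap expected by chance."""
    expected = draws * successes / population
    return k / expected
```

For large gene sets, `scipy.stats.hypergeom.sf(k - 1, population, successes, draws)` computes the same upper tail.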

We examined whether SL and SDL genes tend to be more evolutionarily conserved than other genes. To this end, we utilized the dN/dS ratio as a measure of conservation, where dN

denotes the number of nonsynonymous substitutions per non-synonymous site, and dS denotes

the number of synonymous substitutions per synonymous site. Hence, a low dN/dS ratio is an

indicator of conservation. We extracted dN/dS ratios, comparing human with mouse and human with rhesus macaque, from BioMart (Kasprzyk, 2011). The ratios were

available for 16,960 and 17,364 genes for mouse and rhesus macaque, respectively. We find that

SL (SDL) genes are more conserved than other genes, both when examining the

conservation in relation to mice (Wilcoxon ranksum p-values of 2.99e-17 and 6.65e-46) and in

relation to rhesus (Wilcoxon ranksum p-values of 7.53e-18 and 5.47e-35). Once again, genes

with a higher degree in the network have even lower dN/dS ratios compared to other network

genes (Figures S2B-C).
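To make the dN/dS definition concrete, the sketch below computes the ratio from already-counted substitutions and sites, applying the Jukes-Cantor correction as in the classical Nei-Gojobori counting approach. It is illustrative only: counting synonymous and nonsynonymous sites requires a codon alignment that is not shown, and this is not necessarily how the BioMart values used here were produced.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected substitutions per site from the proportion p of differing sites."""
    if p >= 0.75:
        raise ValueError("saturation: correction undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """dN/dS from counts of differences and sites (dS must be non-zero)."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)  # substitutions per non-synonymous site
    ds = jukes_cantor(syn_diffs / syn_sites)        # substitutions per synonymous site
    return dn / ds
```

A ratio well below 1, such as `dn_ds(1, 300, 10, 100)`, indicates purifying selection, which is the sense in which lower dN/dS marks stronger conservation.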


1.2. The SL and SDL networks compared to the Protein-Protein Interaction (PPI) network

To examine the association between the Protein-Protein Interaction (PPI) network and the SL networks, we extracted information regarding physical interactions from the Human Protein Reference Database (HPRD), release 9 (Keshava Prasad et al., 2009). The PPI network contains 9,617 proteins and 39,174 interactions. When comparing physical and SL (SDL) interactions, we focused on the 1,497 (2,083) proteins that are in both the PPI network and the SL (SDL) network.

First, we find that genes in the SL and SDL networks have a higher degree in the PPI network than other genes, especially if their degree in the SL or SDL network is high (Wilcoxon ranksum p-values of 2.19e-26 and 5.79e-22, respectively, Figure S2D). Likewise, the degree of a gene in the SL or SDL network is weakly correlated with its degree in the PPI sub-networks that include only SL or SDL genes, respectively (Spearman correlation coefficients of 0.136 and 0.098, p-values of 1.34e-07 and 7.03e-06, respectively). Second, gene pairs that interact in the SL or SDL network are highly enriched for pairs that interact in the PPI network (hypergeometric p-values of 4.020e-07 and <1e-30, fold enrichment of 4.54 and 30.57 for the SL and SDL networks, respectively). Next, we examined whether SL and SDL pairs tend to be closer in the PPI network, though not necessarily neighbors. We computed for each SL and SDL pair the distance between its partners, that is, the length of the shortest path between them in the PPI network. We found that SL and SDL interacting genes are significantly closer than other gene pairs (Wilcoxon ranksum p-values of 1.79e-15 and 2.39e-14 for SL and SDL pairs, respectively).
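The shortest-path distance between two partners in an unweighted PPI network can be computed with breadth-first search, as in the minimal sketch below. The function names are hypothetical, and treating disconnected pairs as infinitely distant is an assumption, since the text does not say how such pairs were handled.

```python
import math
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (protein_a, protein_b) interactions."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path_length(adjacency, source, target):
    """Breadth-first search distance in an unweighted graph; math.inf if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return math.inf
```

Because every edge has the same weight, BFS returns the exact shortest-path length without needing Dijkstra's algorithm.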

1.3. Genes in the SL and SDL networks are associated with cancer-specific-proliferation

We examined the association of the SL and SDL network genes with cancer-specific proliferation. To this end, we utilized the cancer Proliferation Index (cPI) and the non-cancerous Proliferation Index (nPI) as reported in (Waldman et al., 2013). The cPI of a gene is based on the association between its expression levels and the growth rates measured across the NCI-60 panel of 60 cancer cell lines. Positive cPI values indicate positive association with growth rate while negative cPI

values indicate negative association with growth rate. Similarly, nPI values are based on the

association between the gene expression levels and growth rates measured across 224

lymphoblastoid cell lines.


Interestingly, we find that SL and SDL genes have significantly higher cPI values than non-network genes, especially when considering genes with a high degree in the network (Wilcoxon ranksum p-values of 8.08e-09 and 4.32e-36 for the SL and SDL networks, respectively, Figure S2E). The nPI values of network genes are also higher than those of non-network genes, though much less significantly (Wilcoxon ranksum p-values of 0.013 and 0.133 for the SL and SDL networks, respectively, Figure S2F). These results suggest that the network genes are specifically involved in cancer proliferation.

1.4. Genes in the SL and SDL networks are overexpressed in normal tissues and in cancer

We processed gene expression profiles measured in 30 different normal human tissues (Su et al., 2004), as previously described (Waldman et al., 2010). Analyzing these profiles, we find that SL and SDL genes are expressed at significantly higher levels than other genes (Wilcoxon ranksum p-values of 6.29e-08 and 1.30e-18, respectively, Figure S2G). Additionally, the number of tissues in which SL and SDL genes are expressed, termed expression breadth, is significantly higher than that of other genes (Wilcoxon ranksum p-values of 9.45e-08 and 3.62e-28, respectively, Figure S2H). Likewise, SL and SDL genes with a higher degree in the networks have even higher expression and expression breadth (Figures S2G-H).

We then examined whether SL and SDL genes are also overexpressed in cancer clinical samples. To

this end we reconstructed the networks without the TCGA data and utilized the mRNA

expression profiles of 6,296 cancer clinical samples extracted from TCGA (The Cancer Genome

Atlas Research et al., 2013). Indeed, SL and SDL genes are significantly overexpressed in cancer

clinical samples compared to other genes (Wilcoxon ranksum p-values of 3.40e-157 and 6.47e-235, respectively). As in normal tissues, in cancer samples gene expression is higher for genes with a higher degree in the SL or SDL network (Figure S2I).

Lastly, the SL and SDL networks are enriched with cancer-associated genes, including:

anticancer drug targets (Knox et al., 2011), oncogenes and tumor suppressors (Chan et al., 2010;

Zhao et al., 2013), and cancer amplification and deletion drivers (Beroukhim et al., 2010) (Figure

S2J).


1.5. The genomic distribution of SL and SDL pairs

We examined the distribution of the genomic distance between SL- and SDL-interacting genes. We defined the distance between two genes as the genomic distance between them in base pairs if they reside on the same chromosome, and infinity otherwise. We found that 97.6% of the SL-pairs are located on different chromosomes, and that the distances between SL-partners are significantly larger than those between randomly selected gene pairs (Wilcoxon ranksum p-value of 3.62e-11, Figure S3A). The SDL-pairs show the opposite behavior: 84.5% of them reside on the same chromosome, and they are significantly closer than randomly selected gene pairs (Wilcoxon ranksum p-value <1e-30, Figure S3B).
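The distance definition above translates directly into code. In this minimal sketch each gene is assumed to be represented by a single `(chromosome, position)` coordinate, for example its start position. That representation and the function names are illustrative assumptions.

```python
import math


def genomic_distance(loc_a, loc_b):
    """Distance in base pairs between two (chromosome, position) loci; inf across chromosomes."""
    chrom_a, pos_a = loc_a
    chrom_b, pos_b = loc_b
    if chrom_a != chrom_b:
        return math.inf
    return abs(pos_a - pos_b)


def fraction_on_different_chromosomes(pairs, locations):
    """Fraction of gene pairs whose two genes lie on different chromosomes."""
    far = sum(1 for a, b in pairs
              if math.isinf(genomic_distance(locations[a], locations[b])))
    return far / len(pairs)
```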

One of the three inference strategies of DAISY, termed genomic Survival of the Fittest (SoF), detects SL and SDL interactions based on Somatic Copy-Number Alterations (SCNA), which can be affected by genomic linkage. Frequent co-amplification of two genomically proximal genes A and B can lead to over-detection of events like "A is amplified → B is not deleted", which the SoF filter uses to identify SDL pairs (see Figure 1 in the main text). We hence examined whether the additional filters DAISY applies prevent it from falsely detecting gene pairs as SDLs merely due to their genomic proximity.

First, we conducted an operative test in which we compared the SDL-network to alternative SDL-networks in which the problem of false-positive detection due to genomic proximity is alleviated. We constructed 11 such networks: (1) a network based only on the two other inference procedures, without the SoF approach, and (2) 10 networks constructed under an increasing cutoff on the minimal allowed genomic distance between the two genes of a pair, ranging from 10% to 100% of the average chromosome length. Based on each of these networks, we then predicted drug response and evaluated the predictions against the CGP data (Garnett et al., 2012) and the CTRP data (Basu et al., 2013). The predictive signal of the original SDL-network reported in the main text is significantly superior to the signal obtained by these alternative networks (Figures S3C-D).
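The 10 distance-filtered networks can be sketched as below. Several details are assumptions not stated in the text: the cutoffs are taken as 10%, 20%, ..., 100% of the average chromosome length, a pair is kept when its distance is at least the cutoff, and pairs on different chromosomes (infinite distance) are always kept. The function name is hypothetical.

```python
import math


def distance_filtered_networks(sdl_pairs, locations, chromosome_lengths):
    """Map each percentage (10..100) to the SDL-pairs that are at least that far apart.

    locations: gene -> (chromosome, position); chromosome_lengths: chromosome -> length in bp.
    """
    avg_length = sum(chromosome_lengths.values()) / len(chromosome_lengths)
    networks = {}
    for percent in range(10, 101, 10):
        cutoff = percent * avg_length / 100
        kept = []
        for a, b in sdl_pairs:
            (chrom_a, pos_a), (chrom_b, pos_b) = locations[a], locations[b]
            # Genes on different chromosomes are infinitely distant and always pass.
            distance = math.inf if chrom_a != chrom_b else abs(pos_a - pos_b)
            if distance >= cutoff:
                kept.append((a, b))
        networks[percent] = kept
    return networks
```

Each filtered network would then be passed to the same drug response predictor as the original SDL-network.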

Second, in light of the strong predictive signal displayed by the original SDL-network, we examined whether SDL-interactions have a genuine tendency towards genomic proximity. To this end, we examined three SDL-networks that were constructed based only on the shRNA-based functional examination approach (Figure 1, Experimental Procedures), each using one of the three shRNA screens (Cheung et al., 2011; Luo et al., 2008; Marcotte et al., 2012), and an additional SDL-network that we constructed from the shRNA and gene expression data, again without using the SoF approach and copy-number data. In three of these four SDL-networks (which are free of the potential false-positive bias introduced by copy-number data), SDL-pairs still tend to be located significantly closer on the genome than random gene pairs (Wilcoxon ranksum p-values of 4.36e-15, 5.26e-03, and 0.320 for the three shRNA-based networks, and a p-value of 1.80e-219 for the SDL-network constructed independently of the SoF approach).

These findings support the notion that functionally meaningful SDL-pairs are more closely

located on the genome, an interesting observation whose investigation is beyond the scope of the

current study. Taken together, and especially in light of their operational utility, we did not filter

out SDL-interactions based on their genomic proximity.

2. Harnessing the SL-network to predict gene essentiality in cancer cell lines

2.1. Gene essentiality is cancer cell line specific

We quantified the extent to which gene essentiality in cancer is cell-line dependent, and hence potentially arises from synthetic lethality. We computed for each gene the number of cell lines in which it was found to be essential, according to two shRNA screens (Cheung et al., 2011; Marcotte et al., 2012). Among the genes that are essential in at least one cell line, the majority are essential in only a few cell lines rather than across the board (Figure S4A). Interestingly, the number of cell lines in which a gene is essential follows a power-law distribution.
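The counting step and a quick power-law check can be sketched as follows. The least-squares slope on a log-log histogram is only a crude heuristic (maximum-likelihood estimators are generally preferred for formal power-law fits), and it is not necessarily how Figure S4A was produced. The function names are hypothetical.

```python
import math
from collections import Counter


def essential_cell_line_counts(essential_sets):
    """essential_sets maps cell line -> set of essential genes; returns gene -> number of cell lines."""
    counts = Counter()
    for genes in essential_sets.values():
        counts.update(genes)
    return counts


def loglog_slope(counts):
    """Least-squares slope of log(number of genes) versus log(number of cell lines).

    A roughly straight line with a negative slope is consistent with, but does not prove, a power law.
    """
    histogram = Counter(counts.values())
    xs = [math.log(k) for k in histogram]
    ys = [math.log(histogram[k]) for k in histogram]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```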


2.2. Robustness analysis of SL-based essentiality prediction

To apply the SL-network to predicting gene essentiality in a cell-line-specific manner, we devised an approach that depends on two parameters: Deletioncutoff and SLessentialitycutoff. The former denotes the SCNA level under which an underexpressed gene is considered inactive, and the latter denotes the number of inactive SL-partners required to deduce that a gene is essential (Extended Experimental Procedures). We applied this approach to predict gene essentiality based on the SL-network (which was constructed without shRNA data) in a total of 129 different cancer cell lines, and evaluated the predictions against the results obtained in two large-scale screens (Cheung et al., 2011; Marcotte et al., 2012).
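The decision rule defined by the two parameters can be sketched for a single cell line as below. This is a minimal reading of the text, not the authors' code. The assumptions are: a gene is inactive when it is underexpressed and its SCNA level is strictly below Deletioncutoff, a missing SCNA value counts as 0, and the defaults follow the main-text setting (-0.1 and 1). The function and variable names are hypothetical. The last lines build the 10 × 10 parameter grid discussed in the next paragraph.

```python
def predict_sl_essential(sl_partners, underexpressed, scna,
                         deletion_cutoff=-0.1, sl_essentiality_cutoff=1):
    """Predict the essential genes of one cell line.

    sl_partners: gene -> set of its SL-partners
    underexpressed: set of genes underexpressed in this cell line
    scna: gene -> SCNA level (log2 ratio) in this cell line
    """
    # Inactive genes: underexpressed AND SCNA level below the deletion cutoff.
    inactive = {g for g in underexpressed if scna.get(g, 0.0) < deletion_cutoff}
    # Essential genes: at least sl_essentiality_cutoff inactive SL-partners.
    return {gene for gene, partners in sl_partners.items()
            if len(partners & inactive) >= sl_essentiality_cutoff}


# The 10 x 10 grid of parameter settings (Deletioncutoff, SLessentialitycutoff).
deletion_cutoffs = [round(-0.1 * k, 1) for k in range(1, 11)]
essentiality_cutoffs = list(range(1, 11))
settings = [(d, e) for d in deletion_cutoffs for e in essentiality_cutoffs]
```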

In the main text we report the results obtained with a Deletioncutoff of -0.1 and an SLessentialitycutoff of 1. However, we examined the network's performance across a broad range of parameters. We set the Deletioncutoff and SLessentialitycutoff parameters to 10 different values each, ranging from -0.1 to -1 and from 1 to 10, respectively. In each setting we characterized the predictive signal of the network by the four empirical p-values described in the Extended Experimental Procedures. A full report of the results obtained in each of the 100 settings is given in Table S5. Overall, we find that the network's prediction performance is highly robust across a fairly broad range of definitions (Table S5). However, the more stringent the gene loss and essentiality definitions are, the fewer predictions can be made for the more genetically stable cell lines. Likewise, genes with fewer SL-partners than the SLessentialitycutoff cannot be predicted as essential in any cell line, regardless of the genomic profiles of the cell lines. Below we discuss the Deletioncutoff and SLessentialitycutoff parameters and the tradeoff between them.

The SCNA level of a gene is the log2 ratio between the observed and the expected number of copies of the gene in a given sample. Hence, if the reference state has two copies of a given gene, an SCNA level of -1 is equivalent to a heterozygous loss of the gene, that is, one remaining copy (log2(1/2) = -1). Note that SCNA data are measured at the population level and hence reflect the average SCNA level of a given gene in a population of cells. If the sample is contaminated with normal cells, the copy number of the cancer cells will be more extreme than the measured one, that is, the SCNA level of the cancer cells will be higher or lower than the measured level if the measured SCNA level is positive or negative, respectively.
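The arithmetic above can be made explicit with a short sketch. The contamination model is a simplifying assumption introduced here, not stated in the text. It treats the measured copy ratio as a purity-weighted average of cancer cells and normal cells, where normal cells are at the reference copy number, and it ignores ploidy changes. The function names and the purity value in the test are hypothetical.

```python
import math


def scna_level(copies, reference_copies=2):
    """SCNA level: log2 of observed over expected copy number."""
    return math.log2(copies / reference_copies)


def tumor_copy_number(measured_level, purity, reference_copies=2):
    """Copy number in the cancer cells, assuming the measured ratio is
    purity * (tumor copies / reference) + (1 - purity) * 1."""
    measured_ratio = 2 ** measured_level
    return reference_copies * (measured_ratio - (1 - purity)) / purity
```

For example, a sample that is 50% cancer cells carrying one copy measures as `scna_level(1.5)` (about -0.415), even though the cancer cells themselves are at -1. This illustrates why the true level is more extreme than the measured one.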


A full deletion of a gene is a rare event: in 78.4% of the cancer SCNA profiles we analyzed (Beroukhim et al., 2010), not a single gene has an SCNA level lower than -1. We therefore tested several more moderate definitions of gene loss (setting the Deletioncutoff to 10 different values ranging from -0.1 to -1). As gene deletion is defined more permissively, a single (partially) deleted SL-partner may not be sufficient to render a gene essential. Hence, we examined several thresholds on the number of inactive SL-partners required to make a target gene essential (setting the SLessentialitycutoff parameter to 10 different values, ranging from 1 to 10).

2.3. The prediction-signal and genomic instability

In cell lines with many inactive genes, the essentiality of more genes is likely to arise from synthetic lethality rather than from other, unrelated causes. Hence, we postulated that the SL-network would obtain more accurate gene essentiality predictions for cell lines with a higher number of inactive genes. To examine this hypothesis, we computed the Spearman correlation, across all cell lines, between the fraction of inactive genes and the prediction p-values. The latter were computed as explained in the Extended Experimental Procedures.

We find a significant negative correlation between the fraction of inactive genes in the cell lines

and their prediction-p-values, especially under more stringent Deletioncutoff and SLessentialitycutoff

definitions (Table S5). Hence, the more inactive genes the cell line has, the better the SL-

network predicts its essential genes (Figures S4B-C).
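The Spearman correlation used in this analysis is the Pearson correlation of ranks, with tied values receiving their average rank. The minimal sketch below computes the coefficient only. A p-value would additionally be needed, for example from `scipy.stats.spearmanr`. The function names and example inputs are illustrative.

```python
def average_ranks(values):
    """1-based ranks, with tied values receiving the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Here `x` would hold the fraction of inactive genes per cell line and `y` the corresponding prediction p-values. A negative coefficient means that cell lines with more inactive genes tend to have lower (better) p-values.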

2.4. Comparison to the prediction-signal of a yeast-derived SL-network

We repeated the gene essentiality predictions with the yeast-derived SL-network, originally

termed the inferred Human SL Network (iHSLN) (Conde-Pueyo et al., 2009), and evaluated the

predictions as described in the Extended Experimental Procedures. The results obtained by the

DAISY-derived-SL-network are significantly superior to those obtained by the iHSLN (Figures

S4D-E, Table S5).


3. Experimentally validating the SL-based predictions of gene essentiality in a

breast cancer cell line

To further examine the gene essentiality predictions obtained with the SL-network, we conducted a whole-genome siRNA screen in the triple-negative breast cancer cell line BT549 under normoxia and hypoxia (Table S6). The gene essentiality of BT549 has been previously measured via shRNA (Marcotte et al., 2012). Hence, we could compare the concordance between our predictions and the experimental screens to the concordance between the two experimental screens themselves. We predicted gene essentiality in BT549 by training an SL-based neural-network model on the gene essentiality reported in the Marcotte screen, after omitting any information regarding BT549 (Extended Experimental Procedures).

The genes found to be essential in BT549 by the two experimental screens overlap significantly, especially when a strict definition of gene essentiality is applied (Figures S4F-G). Notably, the fit between the SL-based gene essentiality predictions and the experimentally identified essential genes is of similar magnitude to the fit between the two experimental screens (Figures S4F-G). Testifying to the veracity of the predictions, the highest observed overlap is between the SL-based predictions and the genes found to be essential in all screens (hypergeometric p-value of 2.46e-41, Figures S4F-G).

We then compared the predictive value of the SL-based predictions to the predictive value of the

experimental screens. To this end we defined four sets of essential genes in BT549:

1. Essnormoxia – The top 10% essential genes according to the siRNA screen conducted under normoxia.

2. Esshypoxia – The top 10% essential genes according to the siRNA screen conducted under hypoxia.

3. EssMarcotte – The top 10% essential genes in BT549 according to the shRNA screen (Marcotte et al., 2012).

4. Essconfident – The intersection of Essnormoxia, Esshypoxia, and EssMarcotte.

We also defined four competing predictors of gene essentiality:

1. PsiRNA – The results obtained in the siRNA screen conducted under normoxia (Extended Experimental Procedures).

2. PshRNA – The results obtained in the shRNA screen for BT549 (Marcotte et al., 2012) (Extended Experimental Procedures).

3. PSL_Marcotte – The SL-based predictor obtained by training the neural network model on the gene essentiality of other cancer cell lines (not BT549) as reported in the Marcotte screen (Marcotte et al., 2012).

4. PSL_Achilles – The SL-based predictor obtained by training the neural network model on the gene essentiality of other cancer cell lines as reported in the Achilles screen (Cheung et al., 2011).

We examined the ability of each predictor to predict each of the gene essentiality sets. The SL-based predictors, PSL_Marcotte and PSL_Achilles, predict Essnormoxia and Esshypoxia as well as PshRNA does (AUC ~0.6-0.65, Figures S4H-I). PSL_Marcotte improves upon PsiRNA in predicting EssMarcotte (AUCs of 0.842 and 0.625, respectively, Figure S4J). Lastly, the SL-based predictors PSL_Marcotte and PSL_Achilles obtain the highest AUCs when predicting Essconfident (AUCs of 0.951 and 0.682, respectively, Figure S4K).

Next, we utilized the SL-network to predict gene essentiality in BT549 in an unsupervised manner, that is, without learning from experimental gene essentiality measurements (Extended Experimental Procedures). Genes predicted to be essential in BT549 are indeed enriched for the top 10% of essential genes according to the experimental screens (hypergeometric p-values of 6.88e-12, 3.04e-08, and 1.46e-08 for Essnormoxia, Esshypoxia, and EssMarcotte, respectively). Reassuringly, the genes predicted as essential are most significantly enriched for Essconfident (hypergeometric p-value of 3.74e-13).
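The construction of the top-10% essential gene sets and the AUC evaluation of the competing predictors can be sketched as follows. Several points are assumptions made for illustration. A higher screen score is taken to mean "more essential". The AUC is computed in its rank-based (Mann-Whitney) form, with ties counting one half. The function names are hypothetical.

```python
def top_fraction(essentiality, fraction=0.1):
    """Genes in the top `fraction` by essentiality score (higher score = more essential, by assumption)."""
    ranked = sorted(essentiality, key=essentiality.get, reverse=True)
    return set(ranked[:max(1, int(len(ranked) * fraction))])


def auc(predictor_scores, positives):
    """ROC AUC: probability that a random positive gene outscores a random negative one (ties count 1/2)."""
    pos = [s for g, s in predictor_scores.items() if g in positives]
    neg = [s for g, s in predictor_scores.items() if g not in positives]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Under these assumptions, Essconfident would be the set intersection `top_fraction(normoxia) & top_fraction(hypoxia) & top_fraction(marcotte)`, and each predictor's AUC against each set corresponds to one bar in Figures S4H-K.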

4. Utilizing the SDL-network to predict drug response

4.1. Self-SDL-interactions

Unlike in SL-interactions, in SDL-interactions a gene can potentially be its own SDL-partner. Such an interaction implies that over-activation of this gene also induces its essentiality. This phenomenon is quite frequent in cancer cells and is termed oncogene addiction (Weinstein and Joe, 2008). The SDL-network includes 534 inner loops, that is, self-interacting genes. To assess the significance of these inner loops, we performed the unsupervised predictions of drug response reported in the main text based on the SDL-network with and without self-interactions, and based only on the self-interactions (Table S8). The overall number of significantly predicted drugs (Wilcoxon ranksum p-value <0.05) is 21, 17, and 9 when using the SDL-network with self-interactions, without self-interactions, and with self-interactions only, respectively. As self-interactions improve the prediction performance of the network, we chose to retain them. However, for many drugs self-interactions alone are insufficient to explain the response to the drug.

4.2. Robustness analysis of the unsupervised SDL-based drug response predictions

We utilized the SDL-network to predict drug efficacy in an unsupervised manner. As in the SL-based prediction of gene essentiality, the prediction is based on two parameters: Overexpressioncutoff and SDLessentialitycutoff (Extended Experimental Procedures). We repeated the drug efficacy predictions with different definitions of gene overexpression (Overexpressioncutoff) and gene essentiality (SDLessentialitycutoff), ranging from 50 to 90 and from 1 to 5, respectively. The predictive signal obtained under each of the 25 different settings is reported in Table S8. The prediction signal is highly robust across a fairly broad range of definitions. However, when employing more stringent gene essentiality definitions (SDLessentialitycutoff), we could not predict the response to drugs whose targets have a low number of SDL-interactions.
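By analogy with the SL-based rule, the unsupervised SDL-based drug response prediction can be sketched as below. The exact definitions are in the Extended Experimental Procedures and are not reproduced here, so several points are assumptions. Overexpressioncutoff is read as a percentile of each gene's expression across cell lines. A cell line is predicted sensitive when at least SDLessentialitycutoff SDL-partners of the drug target are overexpressed in it. The percentile uses linear interpolation, like NumPy's default. All names are hypothetical.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def predict_sensitive(sdl_partners, expression, overexpression_cutoff=70, sdl_essentiality_cutoff=1):
    """Cell lines predicted sensitive to a drug whose target has the given SDL-partners.

    expression: gene -> {cell line -> expression level}
    """
    counts = {}
    for gene in sdl_partners:
        levels = expression[gene]
        threshold = percentile(levels.values(), overexpression_cutoff)
        for line, level in levels.items():
            # Count the SDL-partners overexpressed in each cell line.
            counts[line] = counts.get(line, 0) + (level > threshold)
    return {line for line, n in counts.items() if n >= sdl_essentiality_cutoff}
```

The 25 settings correspond to running this rule for every combination of `overexpression_cutoff` in (50, 60, 70, 80, 90) and `sdl_essentiality_cutoff` in 1 to 5.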

4.3. shRNA-based functional examination improves SDL-based drug response predictions

To examine the role of the shRNA-based functional examination in the identification of SDL-

interactions we generated an SDL-network without accounting for shRNA data, and utilized it to

predict drug response in an unsupervised manner. Comparing the drug response predictions of the resulting network with those of the original SDL-network demonstrates that the inclusion of shRNA data boosts the predictive power of the SDL-network (Figure S7, Table S8).

4.4. Comparing the SDL-based drug response predictors to mutation and genomic

instability based predictors


We have shown that the SDL-network enables accurate prediction of the response of cancer cell lines to various drugs. To further examine the quality of the SDL-based predictors in a comparative manner, we predicted drug response based on two other well-established approaches and compared their performance to that of the SDL-predictors.

The first approach is based on the notion that the mutation and copy-number status of the drug target can be utilized to predict drug response in cancer. Implementing this approach, we utilized the mutation status and SCNA level of the drug targets, extracted from (Barretina et al., 2012; Garnett et al., 2012). We first obtained, for each drug, its single-target predictors: each accounts for one of the drug targets and predicts the cell lines in which this specific target is amplified or mutated (by a missense mutation) as sensitive. For each drug we then considered the status of all of its targets by generating the following three predictors:

1. The best single-target predictor – the predictor that most accurately predicted the response to the drug (according to a Wilcoxon ranksum test that compared the observed efficacy of the drug in the predicted sensitive and resistant cell lines).

2. A voting predictor that predicts a cell line to be sensitive to a drug if the majority of the drug's single-target predictors predicted it to be sensitive.

3. A combined predictor that predicts a cell line to be sensitive to a drug if at least one of the drug's single-target predictors predicted it to be sensitive.
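The single-target, voting, and combined predictors listed above can be sketched as set operations. The sketch makes two assumptions. Mutation and amplification calls are given as per-cell-line gene sets. "Majority" means strictly more than half of the drug's single-target predictors. Selecting the best single-target predictor would additionally require the rank-sum test and is omitted. All names are hypothetical.

```python
def single_target_predictor(target, mutated, amplified, cell_lines):
    """Cell lines predicted sensitive because `target` is missense-mutated or amplified there.

    mutated / amplified: cell line -> set of genes with a missense mutation / amplification.
    """
    return {line for line in cell_lines
            if target in mutated.get(line, set()) or target in amplified.get(line, set())}


def voting_predictor(single_predictions, cell_lines):
    """Sensitive if more than half of the drug's single-target predictors call it sensitive."""
    return {line for line in cell_lines
            if sum(line in pred for pred in single_predictions) > len(single_predictions) / 2}


def combined_predictor(single_predictions):
    """Sensitive if at least one single-target predictor calls it sensitive."""
    return set().union(*single_predictions)
```

The combined predictor is the most permissive of the three and the voting predictor the most conservative, which helps explain why they can differ sharply in how many drugs they predict significantly.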

The SDL-predictor obtains more accurate predictions for 62.5%, 64.3%, and 78.6% of the drugs when compared to each of the predictors described in (1)-(3), respectively. Likewise, when considering only drugs that are significantly (p-value < 0.05) predicted by at least one of the predictors, the SDL-predictor improves upon the competing predictors (1), (2), and (3) in 81.5%, 77.8%, and 88.9% of the drugs, respectively. Lastly, while the SDL-predictor obtains significant predictions for 22 drugs, the competing predictors obtain significant predictions for only (1) 9, (2) 4, and (3) 4 drugs.

The second approach is based on the concept that genomic instability induces drug resistance. To predict drug response according to genomic instability, we computed the genomic instability index of each cell line from its SCNA profile, as previously described (Bilal et al., 2013). We defined a cell line as genomically stable or unstable if its genomic instability index is below or above the median genomic instability index across the cell lines, respectively. As expected, we found that the stable cells were more sensitive to the majority of drugs than the unstable cells. However, the SDL-predictor improves upon the genomic-instability predictor in 38 (67.9%) of the 56 drugs. When considering only the 27 drugs that are significantly predicted by at least one of the predictors, the SDL-predictor obtains more accurate predictions for 20 (74.1%) drugs. Lastly, while the SDL-predictor obtains significant predictions for 22 drugs, the genomic instability predictor obtains significant predictions for only 9 drugs.
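The median split and the per-drug comparison can be sketched as below. The genomic instability index itself (Bilal et al., 2013) is taken as a given input rather than computed. Cell lines exactly at the median are left unassigned, a case the text does not address. Comparing predictors by their per-drug p-values is an assumption about how "more accurate" was scored. All names are hypothetical.

```python
import statistics


def split_by_instability(instability_index):
    """Split cell lines into genomically stable (below median) and unstable (above median) sets."""
    median = statistics.median(instability_index.values())
    stable = {line for line, v in instability_index.items() if v < median}
    unstable = {line for line, v in instability_index.items() if v > median}
    return stable, unstable


def fraction_improved(sdl_pvalues, competitor_pvalues):
    """Fraction of shared drugs for which the SDL-predictor has the lower p-value."""
    drugs = sdl_pvalues.keys() & competitor_pvalues.keys()
    better = sum(1 for d in drugs if sdl_pvalues[d] < competitor_pvalues[d])
    return better / len(drugs)
```

In this framing, the genomic-instability predictor calls the stable set sensitive, and 38 of 56 drugs corresponds to `fraction_improved` returning about 0.679.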

5. Predicting drug-response based on SL-interactions

The SL-network does not accurately predict the response of cancer cell lines to the administration of different anticancer drugs (data not shown). This is possibly because anticancer drugs usually target oncogenes, whose essentiality is mainly dictated by other types of genetic interactions, such as SDL-interactions. Supporting this claim, the SL-network best predicts the response to a PARP1 inhibitor (ABT-888, one-sided Wilcoxon ranksum p-value of 0.046, CGP data), which is one of the few anticancer drugs that currently rely on synthetic lethality. For comparison, as PARP1 is synthetically lethal with BRCA1/2 (Lord et al., 2008; Turner et al., 2008), we divided the CGP cell lines according to their BRCA1/2 mutation status and predicted that the mutated cell lines would be sensitive to PARP inhibition. We then compared the IC50 values of ABT-888 in the predicted sensitive and the predicted resistant cell lines via a one-sided Wilcoxon ranksum test, and obtained a p-value of 0.889. We also used the SCNA and mRNA levels of the BRCA genes to deduce which cell lines have an inactive form of BRCA1/2. When predicting these cell lines as sensitive, we obtained a one-sided Wilcoxon ranksum p-value of 0.902. Hence, the SL-based predictions of the response to PARP1 inhibition improve upon those obtained by accounting for the well-established SL-interactions between the BRCA genes and PARP1.
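The one-sided Wilcoxon rank-sum comparison of IC50 values can be sketched with the normal approximation to the Mann-Whitney U statistic. This is a simplified illustration: it applies no tie or continuity correction, so it is only a rough guide for small samples. `scipy.stats.mannwhitneyu(x, y, alternative="less")` provides a more complete implementation. The function name is hypothetical.

```python
import math


def ranksum_one_sided_less(x, y):
    """One-sided Wilcoxon rank-sum (Mann-Whitney U) p-value for x tending to be smaller than y.

    Normal approximation without tie or continuity correction.
    """
    # U counts the pairs in which the x-value exceeds the y-value (ties count 1/2).
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n1, n2 = len(x), len(y)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    # Lower-tail probability of the standard normal distribution.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))
```

With `x` as the IC50 values of the predicted sensitive cell lines and `y` as those of the predicted resistant ones, a small p-value supports the prediction, since sensitive lines should have lower IC50 values.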

Supplemental References

Benjamini, Y., and Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B (Methodological) 57, 289–300.

Bland, J.M., and Altman, D.G. (2004). The logrank test. BMJ 328, 1073.

Chan, H.-H., Tsai, S.-J., and Sun, H.S. (2010). Tumor Associated Gene database (http://www.binfo.ncku.edu.tw/TAG/GeneDoc.php).

Georgi, B., Voight, B.F., and Bućan, M. (2013). From Mouse to Human: Evolutionary Genomics Analysis of Human Orthologs of Essential Genes. PLoS Genet 9, e1003484.

Kasprzyk, A. (2011). BioMart: driving a paradigm change in biological data management. Database 2011.

Keshava Prasad, T.S., Goel, R., Kandasamy, K., Keerthikumar, S., Kumar, S., Mathivanan, S., Telikicherla, D., Raju, R., Shafreen, B., Venugopal, A., et al. (2009). Human Protein Reference Database--2009 update. Nucleic Acids Research 37, D767–D772.

Knox, C., Law, V., Jewison, T., Liu, P., Ly, S., Frolkis, A., Pon, A., Banco, K., Mak, C., Neveu, V., et al. (2011). DrugBank 3.0: a comprehensive resource for 'Omics' research on drugs. Nucleic Acids Research 39, D1035–D1041.

Mosteller, F., and Fisher, R.A. (1948). Questions and Answers. The American Statistician 2, 30–31.

Su, A.I., Wiltshire, T., Batalov, S., Lapp, H., Ching, K.A., Block, D., Zhang, J., Soden, R., Hayakawa, M., Kreiman, G., et al. (2004). A gene atlas of the mouse and human protein-encoding transcriptomes. Proceedings of the National Academy of Sciences of the United States of America 101, 6062–6067.

Waldman, Y.Y., Tuller, T., Shlomi, T., Sharan, R., and Ruppin, E. (2010). Translation efficiency in humans: tissue specificity, global optimization and differences between developmental stages. Nucleic Acids Research 38, 2964–2974.

Weinstein, I.B., and Joe, A. (2008). Oncogene Addiction. Cancer Research 68, 3077–3080.

Zhao, M., Sun, J., and Zhao, Z. (2013). TSGene: a web resource for tumor suppressor genes. Nucleic Acids Research 41, D970–D976.


Table S1. Data Description, Related to Figure 1

Type | Data type | Additional data | No. samples | Reference
Clinical samples | SCNA | -- | 2,201 | (Beroukhim et al., 2010)
Clinical samples | SCNA, mRNA, and mutations | -- | 6,296 (2,978 with mutation data) | The Cancer Genome Atlas (TCGA) (The Cancer Genome Atlas Research et al., 2013)
Cancer cell lines | SCNA | -- | 591 | (Beroukhim et al., 2010)
Cancer cell lines | SCNA & mRNA | -- | 995 | The Cancer Cell Line Encyclopedia (CCLE) (Barretina et al., 2012)
Cancer cell lines | mRNA | -- | 790 | CGP (Garnett et al., 2012)
Cancer cell lines | mRNA | -- | 997 | CCLE (Barretina et al., 2012)
Cancer cell lines | shRNA | SCNA and mRNA profiles (Barretina et al., 2012) | 92 | Achilles (Cheung et al., 2011)
Cancer cell lines | shRNA | SCNA and mRNA profiles (Barretina et al., 2012) | 46 | (Marcotte et al., 2012)
Cancer cell lines | shRNA | SCNA profiles (Beroukhim et al., 2010) | 9 | (Luo et al., 2008)