a meta-learning approach for genomic survival analysis · 21/04/2020  · a meta-learning approach...

43
A meta-learning approach for genomic survival analysis Yeping Lina Qiu 1,2 , Hong Zheng 2 , Arnout Devos 3 , Olivier Gevaert 2,4,* 1 Department of Electrical Engineering, Stanford University 2 Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University 3 School of Computer and Communication Sciences, Swiss Federal Institute of Technology Lausanne (EPFL) 4 Department of Biomedical Data Science, Stanford University * To whom correspondence should be addressed: [email protected] Abstract RNA sequencing has emerged as a promising approach in cancer prognosis as sequencing data becomes more easily and affordably accessible. However, it remains challenging to build good predictive models especially when the sample size is limited and the number of features is high, which is a common situation in biomedical settings. To address these limitations, we propose a meta-learning framework based on neural networks for survival analysis and evaluate it in a genomic cancer research setting. We demonstrate that, compared to regular transfer- learning, meta-learning is a significantly more effective paradigm to leverage high-dimensional data that is relevant but not directly related to the problem of interest. Specifically, meta-learning explicitly constructs a model, from abundant data of relevant tasks, to learn a new task with few samples effectively. For the application of predicting cancer survival outcome, we also show that the meta- learning framework with a few samples is able to achieve competitive performance with learning from scratch with a significantly larger number of samples. Finally, we demonstrate that the meta-learning model implicitly prioritizes genes based on their contribution to survival prediction and allows us to identify important pathways in cancer. 1 Introduction Cancer is a leading cause of death in the world. Accurate prediction of its survival outcome has been an interesting and challenging problem in cancer research over the past decades. Quantitative methods have been developed to model the relationship between multiple explanatory variables and survival outcome, including fully parametric models (Hosmer et al., 2008; Klein and Moeschberger, 2006) and semi-parametric models such as the Cox proportional hazards model (Cox, 1972). The Cox model makes a parametric assumption about how the predictors affect the hazard function, but makes no assumption about the baseline hazard function itself (Harrell et al., 1982). In most real world scenarios, the form of the true hazard function is either unknown or too complex to model, making the Cox model the most popular method in survival analysis (Kleinbaum and Klein, 2012). In clinical practice, historically, survival analysis has relied on low-dimensional patient characteristics such as age, sex, and other clinical features in combination with histopathological evaluations such as stage and grade (Louis et al., 2016). With advances in high-throughput sequencing technology, a greater amount of high-dimensional genomic data is now available and more molecular biomarkers can be discovered to determine survival and improve treatment. With the cost of RNA sequencing coming down significantly, from an average of $100M per genome in 2001 to $1k per genome in . CC-BY-NC-ND 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918 doi: bioRxiv preprint

Upload: others

Post on 06-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

A meta-learning approach for genomic survivalanalysis

Yeping Lina Qiu1,2, Hong Zheng2, Arnout Devos3, Olivier Gevaert2,4,∗1Department of Electrical Engineering, Stanford University

2Stanford Center for Biomedical Informatics Research, Department of Medicine,Stanford University

3School of Computer and Communication Sciences,Swiss Federal Institute of Technology Lausanne (EPFL)

4Department of Biomedical Data Science, Stanford University∗To whom correspondence should be addressed: [email protected]

Abstract

RNA sequencing has emerged as a promising approach in cancer prognosis assequencing data becomes more easily and affordably accessible. However, itremains challenging to build good predictive models especially when the samplesize is limited and the number of features is high, which is a common situationin biomedical settings. To address these limitations, we propose a meta-learningframework based on neural networks for survival analysis and evaluate it in agenomic cancer research setting. We demonstrate that, compared to regular transfer-learning, meta-learning is a significantly more effective paradigm to leveragehigh-dimensional data that is relevant but not directly related to the problem ofinterest. Specifically, meta-learning explicitly constructs a model, from abundantdata of relevant tasks, to learn a new task with few samples effectively. For theapplication of predicting cancer survival outcome, we also show that the meta-learning framework with a few samples is able to achieve competitive performancewith learning from scratch with a significantly larger number of samples. Finally,we demonstrate that the meta-learning model implicitly prioritizes genes basedon their contribution to survival prediction and allows us to identify importantpathways in cancer.

1 Introduction

Cancer is a leading cause of death in the world. Accurate prediction of its survival outcome hasbeen an interesting and challenging problem in cancer research over the past decades. Quantitativemethods have been developed to model the relationship between multiple explanatory variables andsurvival outcome, including fully parametric models (Hosmer et al., 2008; Klein and Moeschberger,2006) and semi-parametric models such as the Cox proportional hazards model (Cox, 1972). TheCox model makes a parametric assumption about how the predictors affect the hazard function, butmakes no assumption about the baseline hazard function itself (Harrell et al., 1982). In most realworld scenarios, the form of the true hazard function is either unknown or too complex to model,making the Cox model the most popular method in survival analysis (Kleinbaum and Klein, 2012).

In clinical practice, historically, survival analysis has relied on low-dimensional patient characteristicssuch as age, sex, and other clinical features in combination with histopathological evaluations suchas stage and grade (Louis et al., 2016). With advances in high-throughput sequencing technology, agreater amount of high-dimensional genomic data is now available and more molecular biomarkerscan be discovered to determine survival and improve treatment. With the cost of RNA sequencingcoming down significantly, from an average of $100M per genome in 2001 to $1k per genome in

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 2: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

2015 (Park and Kim, 2016), it is becoming more feasible to use this technology to prognosticate.Such genomic data often has tens of thousands of variables which requires the development of newalgorithms that work well with data of high dimensionality.

To address these challenges, several implementations of regularized Cox models have been proposed(Goeman, 2010; Park and Hastie, 2007; Wong et al., 2018). A regularized model adds a modelcomplexity penalty to the Cox partial likelihood to reduce the chance of overfitting. More recently,the increasing modeling power of deep learning networks has aided in developing suitable survivalanalysis platforms for high dimensional feature spaces. For example, autoencoder architectureshave been employed to extract features from genomic data for liver cancer prognosis prediction(Chaudhary et al., 2018). The Cox model has also been integrated in a neural network setting to allowgreater modeling flexibility (Cheerla and Gevaert (2019); Ching et al. (2018); Luck et al. (2017);Yousefi et al. (2017).

In studying a specific rare cancer’s survival outcome, one interesting problem is whether it is possibleto make use of the abundant data that is available for more common relevant cancers and leveragethat information to improve the survival prediction. This problem is commonly approached withtransfer-learning (Pratt, 1993), where a model which has been trained on a single task (e.g., 1 ormore abundant cancers) is used to fine-tune on a related target task (rare cancer). In survival analysis,transfer-learning has shown to significantly improve prediction performance (Li et al., 2016). Deepneural networks used to analyze biomedical imaging data can also take advantage of informationtransfer from data in other settings. For example, multiple studies show that convolutional neuralnetworks pretrained on ImageNet data can be used to build performant survival models with histologyimages (Deng et al., 2009; Mobadersany et al., 2018).

In this context, meta-learning is an area in deep learning research that has gained much attention inrecent years which addresses the problem of “learning to learn” (Finn et al., 2017; Vilalta and Drissi,2002). A meta-learning model explicitly learns to adapt to new tasks quickly and efficiently, usuallywith a limited exposure to the new task environment. Such a framework may potentially adapt betterthan the traditional transfer-learning setting where there is no explicit adaptation incorporated in thepre-training algorithm. This problem setting with limited exposure to a new task is also known asfew-shot learning: learn to generalize well, given very few examples (called shots) of a new task (Liet al., 2016). Recent advances in meta-learning have shown that, compared to transfer-learning, it is amore effective approach to few-shot classification (Chen et al., 2017; Devos and Grossglauser, 2019),regression (Finn et al., 2017), and reinforcement learning (Duan et al., 2016; Finn et al., 2017). In thisstudy, we propose a meta-learning framework based on neural networks for survival analysis appliedin a cancer research setting. Specifically, for the application of predicting survival outcome, wedemonstrate that our method is a preferable choice compared to regular transfer-learning pre-trainingand other competing methods on three cancer datasets when the number of training samples from thespecific target cancer is very limited. Finally, we demonstrate that the meta-learning model implicitlyprioritizes genes based on their contribution to survival prediction and allows us to uncover biologicalpathways associated with cancer survival outcome.

2 Materials and methods

2.1 Datasets

We use the RNA sequencing data from The Cancer Genome Atlas (TCGA) pan-cancer datasets(Tomczak et al., 2015). We remove the genes with NA values and normalized the data by logtransformation and z-score transformation. The feature dimension is 17176 genes after preprocessing.The data contains 9707 samples from 33 cancer types. The outcome is the length of survival time inmonths. 78% of the patients are censored, which means that the subject leaves the study before anevent occurs or the study terminates before an event occurs to the subject.

2.2 Meta-learning

Our proposed survival prediction framework is based on a neural network extension of the Coxregression model that relies on semi-parametric modeling by using a Cox loss (Ching et al., 2018).The model consists of two modules: the feature extraction network and the Cox loss module (Figure1). We use a neural network with two hidden layers to extract features from the RNA sequencing

2

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 3: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

data input, which yields a lower dimensional feature vector for each patient. The features are thenfed to the Cox loss module, which performs survival prediction by doing a Cox regression with thefeatures as linear predictors of the hazard (Cox, 1972). The parameters of the Cox loss module β areoptimized by minimizing the negative of the partial log-likelihood:

L(β) = −∑

Yiuncensored

Ziβ − log

∑Yi≥Yi

eZjβ

(1)

where Yi is the survival length for patient i, Zi is the extracted feature for patient i, and β is thecoefficient weight vector between the feature and the output. Since Zi is the output of the featureextraction module, it can be further represented by:

Zi = fϕ (Xi) (2)

where Xi is the input predictor of patient i, f denotes a nonlinear mapping the neural network learnsto extract features form the predictor, and ϕ denotes the model parameters including the weightsand biases of each neural network layer. The feature extraction module parameters ϕ and Cox lossmodule parameters β are jointly trained in the model. For convenience in the following discussionwe denote the combined parameters as θ.

The optimization of parameters θ consists of two stages: a meta-learning stage, and a final learningstage. The meta-learning stage is the key process, where the model aims to learn a suitable parameterinitialization for the final learning stage, so that during final learning the model can adapt efficientlyto previously unseen tasks with a few training samples (Nichol et al., 2018). In order to reach thedesired intermediate state, a first order gradient-based meta-learning algorithm is used to train thenetwork during the meta-learning stage (Finn et al., 2017; Nichol et al., 2018).

Specifically, at the beginning of meta-learning training, the model is randomly initialized withparameter θ. Consider that the training samples for the meta-learning stage consist of n tasks Tτ ,τ = 1, 2 . . . n. A task is defined as a common learning theme shared by a subgroup of samples.Concretely, these samples come from a distribution on which we want to carry out a classificationtask, regression task or reinforcement learning task. The algorithm continues by sampling a task Tτand using samples of Tτ to update the inner-learner. The inner-learner learns Tτ by taking k steps ofstochastic gradient descent (SGD) and updating the parameters to θkτ :

θ0τ = θ

θ1τ ← θ0τ − αL′τ,0(θ0τ)

θ2τ ← θ1τ − αL′τ.1(θ1τ)

. . .

θkτ ← θk−1τ − αL′τ,k−1(θk−1τ

)(3)

where θkτ is the model parameter at step k for learning task τ , Lτ,k−1 is the loss computed on the kth

minibatch sampled from task τ , Lτ,1 is the loss computed on the second minibatch sampled fromtask τ and so on. The ‘prime’ symbol denotes differentiation, and α is the inner learner step size.Note that this learning process is the separate for all tasks, starting from the same initialization θ.

After an arbitrary m number of tasks are independently learnt by the above k-step SGD process, andobtaining θkτ , τ =1, 2, . . . m, we make one update across all tasks with the meta-learner to get a betterinitialization θ:

θ ← θ + γ1

m

m∑τ=1

(θkτ − θ) (4)

where γ is the learning step of the meta-learner. The term 1m

∑mτ=1(θ

kτ − θ) can be considered as a

gradient, so that for example a popular optimization algorithm like Adam (Kingma and Ba, 2014)can be used by the meta-learner to self-adjust learning rates for each parameter. The entire process

3

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 4: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Input

Cox Loss

Features Predicted Hazard

Feature Extraction Module

CoxLoss

Module

Figure 1: Schematic showing the survival prediction model architecture

of inner-learner update and meta-learner update is repeated until a chosen maximum number ofmeta-learning epochs is reached. This algorithm is shown to encourage the gradients of differentminibatches of a given task to align in the same direction, thereby improving generalization andefficient learning later on (Nichol et al., 2018).

In the final learning stage, the model is provided with a few-sample dataset of a new task. First,the model is initialized with the meta-learnt parameters θ, which are then fine-tuned with the newtask training data to θk

τ and finally the fine-tuned model is evaluated with testing data from thenew task. The training procedure of final learning does not require a special algorithm, and can beconducted with regular mini-batch stochastic gradient descent. This final learning stage is equal tothe inner-learning loop for a single task in Equation (3) without any outer loop.

2.3 Experimental setup

In order to assess the meta-learning method’s performance, we compare it with several alternativetraining schemes based on the same neural network architecture: regular pre-training, combinedlearning, and direct learning. First, meta-learning initially learns general knowledge from a datasetcontaining tasks that are relevant but not directly related to the target, and then learns task-specificknowledge from a very small target task dataset. We define the first dataset as the “multi-task trainingdata”, and the second as the “target task training data”. Secondly, regular pre-training also has atwo-stage learning process on the same datasets, but unlike meta-learning without explicitly focusingon learning to reach an initialization that is easy to adapt to new tasks. Thirdly, combined learningdoes not involve a two-stage learning process, but also leverages knowledge from the relevant tasksby combining the multi-task training data and the target task training data together in one dataset totrain a prediction model. Direct learning on the other hand, only uses the target task training samples.To illustrate the effectiveness of the methods with few samples, we consider three cases of directlearning: a large sample size, a medium sample size, and a small sample size which is the samesize as the “target task training data” used for the other methods (i.e. regular pre-training, combinedlearning and meta-learning) (Figure 2).

In our experiments, the “multi-task training data” is the pan-cancer RNA sequencing data containingsamples from any cancer sites except one cancer site that we define as the target cancer site. Theassociated target cancer data is considered as the “target task data”. This target task dataset is splitinto training data (80%) and testing data (20%), stratified by disease sub-type and censoring status.

4

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 5: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Meta Learn

Final Learn

Pre-train

Fine-tune

Multi-task training data

Target task training data(n=20)

Meta-learning

Combined Learn

Combined Learning

Target task testing data(20% of target task data)

25trials

25trials

25trials

Target task training data(n=150)

Target task training data(n=250)

25trials

Direct Learn

Direct Learning

Target task data

Regular Pre-training

Figure 2: Data flow for meta-learning, regular pre-training, and combined learning frameworks

For meta-learning, regular pre-training, and combined learning we will not use all 80% of the trainingset for the target task, as we want to assess the performances when the algorithm is exposed to only asmall number of target task training samples. Therefore, we will randomly draw 20 samples from thetraining dataset as one “target task training data”. We choose a small sample size of 20 because it isa possible case in real life situations where the target task is the study of rare diseases (Hee et al.,2017), or where new technologies are used to produce data which only have the capacity to produce asmall sample. For direct learning, we randomly draw three different sizes from the training datasetsto form training data, 20 for the small size, 150 for the medium size, and 250 for the large size. Allmethods are evaluated on the common testing data of the target task.

Finally, as a linear baseline, we use the combined learning training sample (multi-task training dataand target task training data) to train a linear cox regression model. We conduct 25 experiment trialsfor each method, where each trial is trained with a randomly drawn “target task training dataset” asdescribed above.

2.4 Evaluation

We evaluate the survival prediction model performances with two commonly used evaluation metrics:the concordance index (C-index) (Harrell et al., 1982) and the integrated Brier score (IBS) (Brier,1950). Fistly, the C-index is a standard performance measure of a model’s predictive ability in survivalanalysis. It is calculated by dividing the number of all pairs of subjects whose predicted survival timesare correctly ordered, by the number of pairs of subjects that can possibly be ordered. A pair cannotbe ordered if the earlier time in the pair is censored or both events in the pair are censored. A C-indexvalue of 1.0 indicates perfect prediction where all the predicted pairs are correctly ordered, and avalue of 0.5 indicates random prediction. Secondly, the integrated Brier score is used to evaluate theerror of survival prediction and is represented by the mean squared differences between observedsurvival status and the predicted survival probability at a given time point. The IBS provides anoverall calculation of the model performance at all available times. An IBS value of 0 indicatesperfect prediction, while 1 indicates entirely inaccurate prediction.

We select cancer sites from TCGA with the following two inclusion criteria: (1) a minimum of 450samples, providing enough training samples for different benchmarking training schemes and (2)a minimum of 30% non-censoring samples, enabling more accurate evaluation than more heavilycensored cohorts. This results in the following cancers: glioma, including glioblastoma (GBM) and

5

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 6: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Hyper-parameter ValueTask-level optimizer SGDTask-level learning rate 0.01Task-level gradient steps 5Task-level Batch size 100Meta-level optimizer ADAMMeta-level learning rate 0.0001Meta-level tasks batch size 10L2 regularization scale 0.1

Table 1: Selected hyper-parameters for meta-learning’s meta-learning stage

low-grade-glioma (LGG); non-small cell lung cancer, including lung adenocarcinoma (LUAD) andlung squamous cell carcinoma (LUSC), and head-neck squamous cell carcinoma (HNSC). In addition,these three types of cancers are of clinical interest, because gliomas are the most common type ofmalignant brain tumor, and lung cancer is the deadliest cancer in the world (Ceccarelli et al., 2016;Herbst and Lippman, 2007). HNSC, on the other hand, is a less widely studied type of cancer, whichnonetheless attracted growing attention in the recent decade since the release of the publicly availablelargest dataset in HNSC by TCGA (Brennan et al., 2017; Tonella et al., 2017).

We evaluate the C-index and IBS in 25 experimental trials for each method.

2.5 Hyper-parameter selection

To avoid overfitting, we do not conduct a separate hyper-parameter search for each of the cancerdatasets. Instead, we search for hyper-parameters on one type of cancer and apply the chosenparameters to all experiments. We select the largest cancer cohort, glioma, and use 5-fold cross-validation for hyper-parameter selection. For each given set of hyper-parameters, we average theresults from five validation sets (each is 20% of training data). Since there is similarity in thealgorithm between methods (combined learning, direct learning, and regular pre-training), we sharehyper-parameters between experiments when it makes sense, as detailed below.

All methods use the same neural network architecture with two hidden fully connected layers of size6000 and 2000, and an output fully connected feature layer of size 200. Each layer uses the ReLUactivation function (Nair and Hinton, 2010). Initially we experiment with 4 different structures: 1 or2 hidden layers with feature size of 200 or 50, respectively. We chose the optimal structure detailedbefore and use it as the architecture for all methods in our subsequent discussion.

For the regular pre-training model, we search for hyper-parameters for the pre-training stage andfine-tuning stage separately. For both stages, we test the mini-batch gradient descent and Adamoptimizers, and determine learning rates with grid search on a grid of [.1, .05, .01, .005, .001] forSGD and a grid of [.001, .0005, .0001,.00005, .00001] for Adam. We test batch sizes of 50, 100, 200,and 800 for pre-training. The selected parameters for the pre-train stage are: an SGD optimizer withlearning rate of .001, L2 regularization scale of 0.1 and batch size of 800. The selected parameters forthe fine-tune stage are: an SGD optimizer with learning rate of .001, L2 regularization scale of 0.1,and batch size of 20 which is the size of each target cancer training dataset. For the combined learningmodel and direct learning model, since the algorithm is very similar to the regular pre-trainingmodel’s pre-train stage, we use the same parameters selected for the pre-train. The batch sizes fordirect learning is half of the size of training data.

For the meta-learning model, we search for hyper-parameters for the meta-learning only. For the finallearning stage, we use the same hyper-parameters as in the fine-tune stage of the regular pre-trainingmodel, as both methods can use similar algorithms in the last stage of training. From our previousdiscussion, in the meta-learning stage an SGD optimizer and an Adam optimizer are suitable forthe inner learner and meta-learner respectively. For the learning rates, we perform grid search on agrid of [.1, .05, .01, .005, .001] for SGD and a grid of [.001, .0005, .0001,.00005, .00001] for Adam.Batch size is searched from [50, 100, 200, 800], the number of tasks for averaging one meta-learnerupdate is searched from [5, 10, 20], and the number of gradient descent steps for the inner-learner is

6

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 7: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Method Glioma Lung cancer HNSCDirect (250 samples) 0.24 ± 0.02 0.19 ± 0.01 0.20 ± 0.01Direct (150 samples) 0.25 ± 0.01 0.19 ± 0.01 0.21 ± 0.01

Direct 0.30 ± 0.02 0.24 ± 0.02 0.30 ± 0.02Combined 0.29 ± 0.02 0.21 ± 0.02 0.26 ± 0.02

Pre-training 0.31 ± 0.02 0.23 ± 0.02 0.26 ± 0.02Meta-learning 0.28 ± 0.01 0.16 ± 0.01 0.16 ± 0.00

Table 2: Integrated Brier scores (IBS) with 95% confidence intervals (n=X trials) for target cancersurvival prediction with 20 samples, unless specified otherwise. Lower value is better. Best performingmethod in bold.

searched from [3, 5, 10, 20]. The selected parameters for the meta-learning stage are shown in Table1.

Finally, in order to evaluate the effect of fluctuations of the meta-learning hyper-parameters, andensure that our results reflect the average performance over fluctuations, we conduct a series oftests on the validation data where in each experiment we vary one of the five unique meta-learninghyper-parameters from the chosen value by tuning it up or down by one grid, obtaining ten setsof varied hyper-parameters. We do 5-fold cross-validation for each set of varied hyper-parametersand compute the concordance index from the resulting fifty experiments. We also do fifty randomexperiments using the selected hyper-parameters and compare the average results of varied versusselected hyper-parameters. We conduct a two-sample t-test on the two results, and conclude that theresults obtained by varied parameters do not have a significant difference from the results obtained bythe chosen parameters (mean concordance index difference of 0.005 with p value 0.50). Therefore,our results are robust with respect to fluctuations of the hyper-parameters and our conclusions are notbased on excessive hyper-parameter tuning.

2.6 Interpretation of the genes prioritized by the meta-learning model

We apply risk score backpropagation (Yousefi et al., 2017) to the meta-learned models to investigatethe feature importance of genes for each of the three target cancer sites. For a given sample, each inputfeature is assigned a risk score by taking the partial derivatives of the risk with respect to the feature.A positive risk score with high absolute value means the feature is important in poor prediction(high risk), and a negative risk score with high absolute value means the feature is important in goodprediction (low risk). The features are ranked by the average of risk score across all samples.

Two approaches were adopted for annotating the genes with ranked risk scores generated by themeta-learning model. Firstly, the top 10% high-risk genes (genes with positive risk scores) and thetop 10% low-risk genes (genes with negative risk scores) from each cancer type were subjectedto gene set over-representation analysis, by comparing the genes against the gene sets annotatedwith well-defined biological functions and processes. We model the association between the genesand each gene set using a hypergeometric distribution and Fisher’s exact test. Secondly, instead ofarbitrary thresholding in the first approach, all the genes, together with their ranked risk scores wereincorporated in the gene set enrichment analysis with the fgsea R package (Sergushichev, 2016) whichcalculates a cumulative gene set enrichment statistic value for each gene set. The gene set databasesused in this analysis include Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa andGoto, 2000), The Reactome Pathway Knowledgebase (Croft et al., 2014) and WikiPathways (Slenteret al., 2018).

3 Results

3.1 Meta-learning outperforms regular pre-training and combined learning

For all of the target cancer sites, meta-learning achieves similar or better performance than regularpre-training or combined learning (Figure 3; Table 2). For the glioma cohort, the mean C-indexfor meta-learning is 0.86 (0.85-0.86 95% CI), compared to 0.84 (0.83-0.85 95% CI) for regular

7

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 8: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Lung cancerG

lioma

HN

SC

Combined learning

Pre−training

Meta−learning

0.50

0.55

0.60

0.65

0.70

0.80

0.85

0.450.500.550.600.650.700.75

Con

cord

ance

inde

x (C

−in

dex)

Combined learning

Pre−training

Meta−learning

Figure 3: C-Index for target cancer survival prediction, comparing combined learning, regularpre-training and meta-learning

pre-training and 0.81 (0.81-0.82 95% CI) for combined training. For the lung cancer cohort, themean C-index is 0.65 (0.65-0.66 95% CI) for meta-learning, 0.60 (0.58-0.61 95% CI) for regularpre-training, and 0.62 (0.62-0.63 95% CI) for combined training. For the HNSC cohort, the resultis 0.61 (0.59-0.63 95% CI) for meta-learning, 0.59 (0.57-0.61 95% CI) for regular pre-training and0.62 (0.61-0.64 95% CI) for combined training. Note that, the variance of the meta-learning resultsacross 25 random trials also tends to be the smallest, which is most observable for the lung cancer andglioma cohorts. In addition, each of these multi-layer neural networks also shows better performanceon average than a linear baseline model. The linear baseline model achieves a C-index of 0.61 forlung cancer (0.60-0.62 95% CI), 0.77 for glioma (0.74-0.80 95% CI), and 0.59 for HNSC (0.58-0.6295% CI).

3.2 Meta-learning achieves competitive predictive performance compared to direct learning

Next, we compare our meta-learning approach with regular direct learning on the target task trainingsamples with different cohort sizes. The performance of direct learning drops significantly whenthe number of training samples decreases from 250 to 20, which is anticipated because a greatamount of information is lost and the model can hardly learn well. However, meta-learning and pre-training can compensate for such lack of information by transferring knowledge from the pan-cancerdata explicitly and implicitly, respectively. We show that meta-learning achieves similar or betterprediction performance than large-sample direct training in lung cancer and HNSC, and reachescomparable performance with medium-sample direct training in glioma (Figure 4; Table 2). For thelung cancer cohort, the mean C-index is 0.57 (0.56-0.58 95% CI) for large sample direct learning,0.54 (0.52-0.56 95% CI) for medium sample direct learning, 0.53 (0.50-0.55 95% CI) for smallsample direct learning, and 0.65 (0.65-0.66 95% CI) for meta-learning. For the glioma cohort, themean C-indices for large sample, medium sample and small sample direct learning are 0.86 (0.86-0.8795% CI), 0.85 (0.85-0.86 95% CI), and 0.82 (0.81-0.84 95% CI) respectively, and for meta-learningthe mean C-index is 0.86 (0.85-0.86 95% CI). For the HNSC cohort, the mean C-index is 0.62(0.60-0.64 95% CI) for large sample direct learning, 0.57 (0.54-0.59 95% CI) for medium sampledirect learning, 0.53 (0.49-0.56 95% CI) for small sample direct learning, and 0.61 (0.59-0.63 95%

8

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 9: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

CI) for meta-learning. Thus, in all three cancer sites, meta-learning reaches competitive performancesas large sample direct learning and can outperform it in certain cases.

Lung cancerG

lioma

HN

SC

direct 20direct 150

direct 250

Meta−learning

0.400.450.500.550.600.65

0.75

0.80

0.85

0.400.450.500.550.600.650.70

Con

cord

ance

inde

x (C

−in

dex)

direct 20

direct 150

direct 250

Meta−learning

Figure 4: C-Index for target cancer survival prediction, comparing direct learning with large (250),medium (150) and small (20) size samples and meta-learning.

3.3 Risk score ranked genes are enriched in key cancer pathways

Next, we investigated for each cancer site what genes are most important in the meta-learning model(Figure 5, Supplementary Tables S1- S6). In gliomas, the high-risk genes are associated with viralcarcinogenesis (p value 0.002), Herpes simplex infection (p value 0.007), cell cycle (p value 0.03),apoptosis (p value 0.03), DNA damage response (p value 0.04), all of which are also enriched ingene set enrichment analysis with positive enrichment scores (Figure 5a, Table S2). The low-riskgenes are associated with HSF1 activation (p value 0.02) which is involved in hypoxia pathway, andtryptophan metabolism (p value 0.04), the latter is also enriched in gene set enrichment analysis withnegative enrichment score. Tryptophan catabolism has been increasingly recognized as an importantmicroenvironmental factor in anti-tumor immune responses (Platten et al., 2012) and it is a commontherapeutic target in cancer and neurodegeneration diseases (Platten et al., 2019).

In head and neck cancer, the high-risk genes are associated with PTK6 signaling (p value 0.01),which regulates cell cycle and growth, and cytokines and inflammatory response (p value 0.009).The low-risk genes are associated with autophagy (p value 0.02), which is also enriched in gene setenrichment analysis. Other enriched pathways include B cell receptor signaling pathway, cell cycle,and interleukin 1 signaling pathway (Figure 5b, Table S4). Interleukin 1 is an inflammatory cytokinewhich plays a key role in carcinogenesis and tumor progression (Mantovani et al., 2018).

In lung cancer, the top high-risk genes are associated with "non-small cell lung cancer" pathway (pvalue 0.01), tuberculosis (p value 0.008), Hepatitis B and C virus infection (p value 0.03), and manypathways implicated previously in cancer. These pathways are also enriched in gene set enrichmentanalysis (Figure 5c, Table S6). Pulmonary tuberculosis has been shown to increase the risk of lungcancer (Wu et al., 2011; Yu et al., 2011). The low-risk genes are associated with energy metabolism(p value 0.03), ferroptosis (p value 0.037) and AMPK signaling pathway (p value 0.046), all related toenergy metabolism, particularly lipid metabolism. AMPK signaling pathway activation by an AMPKagonist was shown to suppresses non-small cell lung cancer through inhibition of lipid metabolism(Chen et al., 2017). AMPK signaling and energy metabolism are also enriched in gene set enrichment

9

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 10: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Pathway Gene ranks NES pvalRenin−angiotensin system,KEGG 1.71 1.0e−02

Viral carcinogenesis,KEGG 1.53 1.4e−03Hepatitis C,KEGG 1.51 4.6e−03

Phototransduction,KEGG 1.51 4.2e−02Epstein−Barr virus infection,KEGG 1.49 3.4e−03

Small cell lung cancer,KEGG 1.40 3.0e−02Herpes simplex infection,KEGG 1.38 2.1e−02

p53 signaling pathway,KEGG 1.37 4.9e−02Cushing syndrome,KEGG 1.30 5.0e−02

Tryptophan metabolism,KEGG −1.55 2.3e−02TP53 Regulates Transcription of Cell Death Genes 1.92 1.1e−03

Mitotic G1−G1−S phases 1.65 1.7e−02Apoptosis 1.65 2.5e−03

Integrated Breast Cancer Pathway 1.62 6.3e−03Prostaglandin Synthesis and Regulation 1.60 1.2e−02

Oxidative Damage 1.57 1.7e−02Interferon alpha−beta signaling 1.53 3.9e−02

G1 to S cell cycle control 1.50 2.0e−02Preimplantation Embryo 1.50 2.7e−02IL17 signaling pathway 1.47 4.7e−02BDNF−TrkB Signaling 1.46 4.4e−02

DNA Damage Response 1.42 3.3e−02Cell Cycle 1.34 3.9e−02

Protein alkylation leading to liver fibrosis −1.42 4.5e−02Peptide GPCRs −1.42 4.2e−02

Tryptophan metabolism −1.45 4.0e−02Transcription factor regulation in adipogenesis −1.57 2.8e−02

0 4000 8000 12000 16000

(a) GliomaPathway Gene ranks NES pval

African trypanosomiasis,KEGG 1.71 6.4e−03Proteasome,KEGG 1.61 1.1e−02

Cell cycle,KEGG 1.52 6.4e−03beta−Alanine metabolism,KEGG 1.50 3.8e−02

Pathogenic Escherichia coli infection,KEGG 1.48 2.8e−02Mineral absorption,KEGG 1.45 4.0e−02

Proteoglycans in cancer,KEGG 1.35 1.8e−02Protein processing in endoplasmic reticulum,KEGG 1.34 2.9e−02

Hepatocellular carcinoma,KEGG 1.30 4.0e−02cAMP signaling pathway,KEGG −1.32 3.1e−02

Toxoplasmosis,KEGG −1.36 3.6e−02FoxO signaling pathway,KEGG −1.39 2.2e−02

Autophagy,KEGG −1.48 9.7e−03Fc epsilon RI signaling pathway,KEGG −1.54 1.4e−02

B cell receptor signaling pathway,KEGG −1.56 9.6e−03Autophagy,KEGG −1.48 9.7e−03

NAD+ biosynthetic pathways 1.68 1.4e−02Cell Cycle 1.56 4.0e−03

Cytokines and Inflammatory Response 1.52 4.4e−02Photodynamic therapy−induced NF−kB survival signaling 1.50 3.5e−02

Tumor suppressor activity of SMARCB1 1.50 3.7e−02Pathogenic Escherichia coli infection 1.48 2.8e−02

Interleukin−4 and Interleukin−13 signaling 1.44 2.1e−02Proteasome Degradation 1.44 3.1e−02

Parkin−Ubiquitin Proteasomal System pathway 1.37 5.0e−02Structural Pathway of Interleukin 1 (IL−1) −1.42 4.8e−02MicroRNAs in cardiomyocyte hypertrophy −1.49 1.6e−02

Hematopoietic Stem Cell Gene Regulation by GABP alpha−beta Complex −1.51 4.4e−02miRNA regulation of prostate cancer signaling pathways −1.53 3.1e−02

IL−1 signaling pathway −1.54 1.4e−02Differentiation of white and brown adipocyte −1.56 2.9e−02

Type II diabetes mellitus −1.67 1.6e−02Fatty Acid Biosynthesis −1.84 3.0e−03

0 4000 8000 12000 16000

(b) Head and neck cancerPathway Gene ranks NES pval

Pentose phosphate pathway,KEGG 1.99 5.6e−04Chronic myeloid leukemia,KEGG 1.73 1.3e−03

Glioma,KEGG 1.70 2.0e−03Bladder cancer,KEGG 1.68 6.3e−03

Melanoma,KEGG 1.68 3.2e−03Tuberculosis,KEGG 1.52 2.9e−03

Phospholipase D signaling pathway,KEGG 1.50 5.6e−03MicroRNAs in cancer,KEGG 1.45 9.7e−03

Type I diabetes mellitus,KEGG −1.67 7.7e−03Mammary gland development pathway − Pregnancy and lactation (Stage 3 of 4) 1.97 5.6e−04

Retinoblastoma Gene in Cancer 1.89 4.0e−05Non−genomic actions of 1,25 dihydroxyvitamin D3 1.88 3.2e−04

LTF danger signal response pathway 1.75 8.9e−03Fluoropyrimidine Activity 1.71 7.2e−03

Signaling of Hepatocyte Growth Factor Receptor 1.67 8.3e−03Non−small cell lung cancer 1.58 9.0e−03

Nanoparticle triggered autophagic cell death −1.83 3.7e−030 4000 8000 12000 16000

(c) Lung cancer

Figure 5: Gene set enrichment analysis of the ranked gene list in (a) Glioma (b) Head and neckcancer (c) Lung cancer. The gene set databases used in this analysis included Kyoto Encyclopediaof Genes and Genomes (KEGG) and WikiPathways. Pval, enrichment p-value; NES, normalizedenrichment score. In (a) and (b) the enrich pathways with p value below 0.05 were displayed. In (c)the enrich pathways with p value below 0.01 were displayed. Detailed information can be found insupplementary tables S2, S4 and S6.

analysis (Figure 5C). Other enriched pathways include Notch signaling, interleukin signaling, ErbBsignaling, and signaling pathways regulating pluripotency of stem cells.

10

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 11: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

4 Discussion

Previous studies have shown that when analyzing high-dimensional genomic data, deep learningsurvival models can achieve comparable or superior performance compared to other methods (e.g.,Cox elastic net regression, random survival forests) (Kim et al., 2019). However, the performanceof deep learning is often limited by the relatively small amount of available data (Yousefi et al.,2017). To address this issue, our work investigates different deep learning paradigms to improve theperformance of deep survival models, especially in the setting of small size training data.

In previous studies the most common way to build deep survival models is to train neural networkswith a large number of target task training samples from scratch, a process we call direct training.Direct training with a large sample size (e.g. n = 250) can thus be considered as a baseline. Asexpected, the performance of direct training drops when the number of training samples decreases(e.g. from n = 150 to n = 20). On the other hand, combined learning, regular pre-training, andmeta-learning all leverage additional data from other sources, thereby enabling them to achieve betterperformances when the training sample size is small. We use a small number of target cancer sitetraining samples (e.g. n = 20) with these methods and investigate their performance.

It is important to note that combined learning, regular pre-training and meta-learning are exposed toexactly the same information, but differ only in their algorithms. Combined learning is a one-stagelearning process, whereas pre-training and meta-learning are two-stage learning methods. Meta-learning shows better predictive performance than combined or regular pre-training, indicating that itis able to adapt to a new task more effectively due to the improved optimization algorithm targetingthe few-sample training environment.

It has been shown that methods which use only target task data (direct learning with different sizesamples) and methods which use additional information (combined, pre-training and meta-learning)perform differently, and one type of approach may be better than the other on different cancer sites.For example, on glioma, direct learning tends to do better overall; whereas on lung cancer, the othermethods outperform direct learning. This may be due to that fact that the amount of informationthat can be learnt from related data versus from the target data is different for each cancer site.If there is significant information within the target cancer samples alone, then direct training willbe more effective than learning from other cancer samples. On all three cancer sites we observethat meta-learning achieves similar or better performance than medium-size direct training, andoutperforms large-size direct training in some cases. However, the advantage of meta-learningmay not generalize to every cancer site. Certain cancers may have very unique characteristics sothat transfer of information from other cancers may not help in prediction regardless of improvedadaptivity. For the three cancer sites, the affinity of each target cancer to other types of cancers in thepan-cancer data aids the performance of meta-learning, which efficiently transfers the informationfrom other cancers to the target cancers. On the other hand, some cancers are more dissimilar fromother cancer sites which makes information transfer difficult. For example, for another cancer site,kidney cancer, specifically kidney renal clear cell carcinoma (KIRC) and kidney renal papillary cellcarcinoma (KIRP), both meta-learning and pre-training do not produce good survival prediction.This can be visualized in Figure S1, comparing the affinity between different target cancers with therest of the cancers on a t-distributed stochastic neighbor embedding (t-SNE) graph. Therefore, inorder for meta-learning to achieve good performance, the related tasks training data need to contain areasonable amount of transferable information to the target task.

The performance of meta-learning can be explained by the learned learning algorithm at the meta-learning stage where the model learns from related tasks. We further investigate how to optimizemeta-learning performance. We examine results from two sampling approaches when forming onetask, where we either draw samples only from one cancer, or draw samples from multiple types ofcancer. It is a more natural choice to consider each cancer type as a separate task, but we found thatthe latter leads to improved performance. To explain this improvement, we examine the gradientof the meta-learning loss function. It can be shown that the gradient of the loss function contains aterm that encourages the gradients from different minibatches for a given task to align in the samedirection (Appendix A). If the two minibatches contain samples from the same type of cancer, theirgradient might already be very similar and thus this higher order term would not have a large effect.On the other hand, if the second minibatch contains samples from a different type of cancer than thefirst, the algorithm will learn something that is common to both of them and thereby help to improvegeneralization.

11

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 12: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

The gene set enrichment analysis results validate our model for prioritizing the genes for survivalpredication. In the three cancer types investigated, the resulting gene lists are enriched in keypathways in cancer including cell cycle regulation, DNA damage response, cell death, interleukinsignaling, and NOTCH signaling pathway, etc.

Apart from the well-recognized cancer pathways, our results also reveal potential players affectingcancer development and prognosis, that are not well-studied yet. Viruses have been linked to thecarcinogenesis of several cancers, including human papilloma virus in cervical cancer, hepatitis Band C viruses in liver cancer, and Epstein-Barr virus in several lymphomas and nasopharyngealcarcinoma (Martin and Gutkind, 2008). Our results further suggest that viruses might also play arole in glioma and lung cancer, where the high-risk genes are enriched in several viral carcinogenesispathways. In gliomas, the enriched pathways that are unfavorable for survival include Epstein-Barrvirus and herpes simplex infection. In lung cancer, both hepatitis B and C virus infection pathwaysare enriched. This suggests that that hepatotropic viruses may affect the respiratory system, includingthe association with lung cancer. For example, hepatitis B virus infection has been associated withpoor prognosis in patients with advanced non-small cell lung cancer (Peng et al., 2015). The roleof Epstein-Barr virus in gliomagenesis have also been studied but the results remain inconclusive(Akhtar et al., 2018). Whether these viruses do play a role in carcinogenesis and further affect cancerprognosis, or the association we observed reflects an abnormal immune system that is unfavorable forthe survival of cancer patients remains to be investigated.

As for the enriched pathways that are favorable for cancer survival, we identified pathways relatedto metabolism, in particular, lipid metabolism, in all the three cancer types investigated. In glioma,the top enriched pathway favorable for cancer survival is adipogenesis regulation. In head and neckcancer, differentiation of adopocyte and fatty acid biosynthesis are top enriched favorable pathways.In lung cancer, ferroptosis and AMPK signaling pathway are both related to energy metabolism.Ferroptosis is a process driven by accumulated iron-dependent lipid ROS that leads to cell death.Small molecules-induced ferroptosis has a strong inhibition of tumor growth and enhances thesensitivity of chemotherapeutic drugs, especially in drug resistance (Lu et al., 2018). AMPK playsa central role in the control of cell growth, proliferation and autophagy through the regulation ofmTOR activity and lipid metabolism (Chen et al., 2017; Han et al., 2013). The link between cancerand metabolism is worth investigating in future studies.

5 Conclusion

In survival analyses one problem that researchers have encountered is the insufficient amount oftraining samples for machine learning algorithms to achieve good performances. We address thisproblem by adapting a meta-learning approach which learns effectively with only a small number oftarget task training samples. We show that the meta-learning framework is able to achieve similarperformance as learning from a significantly larger number of samples by using an efficient knowledgetransfer. Moreover, in the context of limited training sample exposure, we demonstrate that thisframework achieves superior predictive performance over both regular pre-training and combinedlearning methods on two types of target cancer sites. Finally, we show that meta-learning models areinterpretable and can be used to investigate biological phenomena associated with cancer survivaloutcome.

The problem of small data size may be a limiting factor in many biomedical analyses, especiallywhen studies are conducted with data that is expensive to produce, or in the case of multi-modaldata (Cheerla and Gevaert, 2019). Our work shows the promise of meta-learning for biomedicalapplications to alleviate the problem of limited data. In future work, we intend to extend this approachto analysis with medical imaging data, such as histopathology data and radiology data, for buildingpredictive models on multi-modal data with limited sets of patients.

6 Acknowledgements

AD acknowledges funding from the European Union’s Horizon 2020 research and innovation pro-gramme under the Marie Skłodowska-Curie grant agreement No. 754354.

12

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 13: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

ReferencesSaghir Akhtar, Semir Vranic, Farhan Sachal Cyprian, and Ala-Eddin Al Moustafa. Epstein–barr virus

in gliomas: cause, association, or artifact? Frontiers in oncology, 8:123, 2018.

K Brennan, JL Koenig, AJ Gentles, JB Sunwoo, and O Gevaert. Identification of an atypicaletiological head and neck squamous carcinoma subtype featuring the cpg island methylatorphenotype. EBioMedicine, 17:223–236, 2017.

Glenn W Brier. Verification of forecasts expressed in terms of probability. Monthly weather review,78(1):1–3, 1950.

Michele Ceccarelli, Floris P Barthel, Tathiane M Malta, Thais S Sabedot, Sofie R Salama, Bradley AMurray, Olena Morozova, Yulia Newton, Amie Radenbaugh, Stefano M Pagnotta, et al. Molecularprofiling reveals biologically discrete subsets and pathways of progression in diffuse glioma. Cell,164(3):550–563, 2016.

Kumardeep Chaudhary, Olivier B Poirion, Liangqun Lu, and Lana X Garmire. Deep learning–basedmulti-omics integration robustly predicts survival in liver cancer. Clinical Cancer Research, 24(6):1248–1259, 2018.

Anika Cheerla and Olivier Gevaert. Deep learning with multimodal representation for pancancerprognosis prediction. Bioinformatics, 35(14):i446–i454, 2019.

Xi Chen, Chun Xie, Xing-Xing Fan, Ze-Bo Jiang, Vincent Kam-Wai Wong, Jia-Hui Xu, Xiao-JunYao, Liang Liu, and Elaine Lai-Han Leung. Novel direct ampk activator suppresses non-small celllung cancer through inhibition of lipid metabolism. Oncotarget, 8(56):96089, 2017.

Travers Ching, Xun Zhu, and Lana X Garmire. Cox-nnet: an artificial neural network method forprognosis prediction of high-throughput omics data. PLoS computational biology, 14(4):e1006076,2018.

David R Cox. Regression models and life-tables. Journal of the Royal Statistical Society: Series B(Methodological), 34(2):187–202, 1972.

David Croft, Antonio Fabregat Mundo, Robin Haw, Marija Milacic, Joel Weiser, Guanming Wu,Michael Caudy, Phani Garapati, Marc Gillespie, Maulik R Kamdar, et al. The reactome pathwayknowledgebase. Nucleic acids research, 42(D1):D472–D477, 2014.

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scalehierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition,pages 248–255. Ieee, 2009.

Arnout Devos and Matthias Grossglauser. Subspace networks for few-shot classification. arXivpreprint arXiv:1905.13613, 2019.

Yan Duan, John Schulman, Xi Chen, Peter L Bartlett, Ilya Sutskever, and Pieter Abbeel. Rl2:Fast reinforcement learning via slow reinforcement learning. arxiv, 2016. URL https://arxiv.org/abs/1611.02779, 2016.

Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation ofdeep networks. In Proceedings of the 34th International Conference on Machine Learning-Volume70, pages 1126–1135. JMLR. org, 2017.

Jelle J Goeman. L1 penalized estimation in the cox proportional hazards model. Biometrical journal,52(1):70–84, 2010.

Dong Han, Shao-Jun Li, Yan-Ting Zhu, Lu Liu, and Man-Xiang Li. Lkb1/ampk/mtor signalingpathway in non-small-cell lung cancer. Asian Pacific Journal of Cancer Prevention, 14(7):4033–4039, 2013.

Frank E Harrell, Robert M Califf, David B Pryor, Kerry L Lee, and Robert A Rosati. Evaluating theyield of medical tests. Jama, 247(18):2543–2546, 1982.

13

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 14: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Siew Wan Hee, Adrian Willis, Catrin Tudur Smith, Simon Day, Frank Miller, Jason Madan, MartinPosch, Sarah Zohar, and Nigel Stallard. Does the low prevalence affect the sample size ofinterventional clinical trials of rare diseases? an analysis of data from the aggregate analysis ofclinicaltrials. gov. Orphanet journal of rare diseases, 12(1):44, 2017.

Roy S Herbst and Scott M Lippman. Molecular signatures of lung cancer–toward personalizedtherapy. New England Journal of Medicine, 356(1):76–78, 2007.

David W Hosmer, Stanley Lemeshow, and S May. Applied survival analysis: regression modeling oftime to event data (wiley series in probability and statistics), 2008.

Minoru Kanehisa and Susumu Goto. Kegg: kyoto encyclopedia of genes and genomes. Nucleic acidsresearch, 28(1):27–30, 2000.

Dong Wook Kim, Sanghoon Lee, Sunmo Kwon, Woong Nam, In-Ho Cha, and Hyung Jun Kim. Deeplearning-based survival prediction of oral cancer patients. Scientific reports, 9(1):1–10, 2019.

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprintarXiv:1412.6980, 2014.

John P Klein and Melvin L Moeschberger. Survival analysis: techniques for censored and truncateddata. Springer Science & Business Media, 2006.

David G Kleinbaum and Mitchel Klein. The cox proportional hazards model and its characteristics.In Survival analysis, pages 97–159. Springer, 2012.

Yan Li, Lu Wang, Jie Wang, Jieping Ye, and Chandan K Reddy. Transfer learning for survival analysisvia efficient l2, 1-norm regularized cox regression. In 2016 IEEE 16th International Conferenceon Data Mining (ICDM), pages 231–240. IEEE, 2016.

David N Louis, Arie Perry, Guido Reifenberger, Andreas Von Deimling, Dominique Figarella-Branger, Webster K Cavenee, Hiroko Ohgaki, Otmar D Wiestler, Paul Kleihues, and David WEllison. The 2016 world health organization classification of tumors of the central nervous system:a summary. Acta neuropathologica, 131(6):803–820, 2016.

Bin Lu, Xiao Bing Chen, Mei Dan Ying, Qiao Jun He, Ji Cao, and Bo Yang. The role of ferroptosisin cancer development and treatment response. Frontiers in pharmacology, 8:992, 2018.

Margaux Luck, Tristan Sylvain, Héloïse Cardinal, Andrea Lodi, and Yoshua Bengio. Deep learningfor patient-specific kidney graft survival analysis. arXiv preprint arXiv:1705.10245, 2017.

Alberto Mantovani, Isabella Barajon, and Cecilia Garlanda. Il-1 and il-1 regulatory pathways incancer progression and therapy. Immunological reviews, 281(1):57–61, 2018.

D Martin and JS Gutkind. Human tumor-associated viruses and new insights into the molecularmechanisms of cancer. Oncogene, 27(2):S31–S42, 2008.

Pooya Mobadersany, Safoora Yousefi, Mohamed Amgad, David A Gutman, Jill S Barnholtz-Sloan,José E Velázquez Vega, Daniel J Brat, and Lee AD Cooper. Predicting cancer outcomes fromhistology and genomics using convolutional networks. Proceedings of the National Academy ofSciences, 115(13):E2970–E2979, 2018.

Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted boltzmann machines. InProceedings of the 27th international conference on machine learning (ICML-10), pages 807–814,2010.

Alex Nichol, Joshua Achiam, and John Schulman. On first-order meta-learning algorithms. arXivpreprint arXiv:1803.02999, 2018.

Mee Young Park and Trevor Hastie. L1-regularization path algorithm for generalized linear models.Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(4):659–677, 2007.

Sang Tae Park and Jayoung Kim. Trends in next-generation sequencing and a new era for wholegenome sequencing. International neurourology journal, 20(Suppl 2):S76, 2016.

14

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 15: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Jie-Wen Peng, Dong-Ying Liu, Gui-Nan Lin, JJ Xiao, and Zhong-Jun Xia. Hepatitis b virus infectionis associated with poor prognosis in patients with advanced non small cell lung cancer. Asian PacJ Cancer Prev, 16(13):5285–5288, 2015.

Michael Platten, Wolfgang Wick, and Benoît J Van den Eynde. Tryptophan catabolism in cancer:beyond ido and tryptophan depletion. Cancer research, 72(21):5435–5440, 2012.

Michael Platten, Ellen AA Nollen, Ute F Röhrig, Francesca Fallarino, and Christiane A Opitz.Tryptophan metabolism as a common therapeutic target in cancer, neurodegeneration and beyond.Nature Reviews Drug Discovery, 18(5):379–401, 2019.

Lorien Y Pratt. Discriminability-based transfer between neural networks. In Advances in neuralinformation processing systems, pages 204–211, 1993.

Alexey Sergushichev. An algorithm for fast preranked gene set enrichment analysis using cumulativestatistic calculation. BioRxiv, page 060012, 2016.

Denise N Slenter, Martina Kutmon, Kristina Hanspers, Anders Riutta, Jacob Windsor, Nuno Nunes,Jonathan Mélius, Elisa Cirillo, Susan L Coort, Daniela Digles, et al. Wikipathways: a multifacetedpathway database bridging metabolomics to other omics research. Nucleic acids research, 46(D1):D661–D667, 2018.

Katarzyna Tomczak, Patrycja Czerwinska, and Maciej Wiznerowicz. The cancer genome atlas (tcga):an immeasurable source of knowledge. Contemporary oncology, 19(1A):A68, 2015.

Luca Tonella, Marco Giannoccaro, Salvatore Alfieri, Silvana Canevari, and Loris De Cecco. Geneexpression signatures for head and neck cancer patient stratification: are results ready for clinicalapplication? Current treatment options in oncology, 18(5):32, 2017.

Ricardo Vilalta and Youssef Drissi. A perspective view and survey of meta-learning. Artificialintelligence review, 18(2):77–95, 2002.

Kin Yau Wong, Cheng Fan, Maki Tanioka, Joel S Parker, Andrew B Nobel, Donglin Zeng, Dan-YuLin, and Charles M Perou. An integrative boosting approach for predicting survival time withmultiple genomics platforms. bioRxiv, page 338145, 2018.

Chen-Yi Wu, Hsiao-Yun Hu, Cheng-Yun Pu, Nicole Huang, Hsi-Che Shen, Chung-Pin Li, andYiing-Jeng Chou. Pulmonary tuberculosis increases the risk of lung cancer: a population-basedcohort study. Cancer, 117(3):618–624, 2011.

Safoora Yousefi, Fatemeh Amrollahi, Mohamed Amgad, Chengliang Dong, Joshua E Lewis, Con-gzheng Song, David A Gutman, Sameer H Halani, Jose Enrique Velazquez Vega, Daniel J Brat,et al. Predicting clinical outcomes from large scale cancer genomic profiles with deep survivalmodels. Scientific reports, 7(1):1–11, 2017.

Yang-Hao Yu, Chien-Chang Liao, Wu-Huei Hsu, Hung-Jen Chen, Wei-Chih Liao, Chih-Hsin Muo,Fung-Chang Sung, and Chih-Yi Chen. Increased lung cancer risk among patients with pulmonarytuberculosis: a population cohort study. Journal of Thoracic Oncology, 6(1):32–37, 2011.

15

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 16: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Appendix A

We compute the gradient of the meta-learning loss function. Suppose that, for each task Tτ , theinner-learner takes 2 steps of stochastic gradient descent and updates the parameters to θ2τ . FromEquation (3), we can write the gradient using a Taylor expansion:

g =θ0τ − θ2τα

= L′τ,0(θ0τ)+ L′τ,1

(θ0τ)− αL′′τ,1

(θ0τ)L′τ,0

(θ0τ)+O

(α2)

(5)

Lτ,0 is the loss computed on the first minibatch sampled from task τ , and Lτ,1 is the loss computedon the second minibatch sampled from task τ . The expectation of the first two terms in Equation (5)corresponds to the gradient of expected loss, and the expectation of the third term can be written as:

Eτ,0,1 [L′′1(θ)L′0(θ)] =1

2Eτ,0,1

[∂

∂θ(L′1(θ) · L′0(θ))

]. (6)

This term increases the inner product of the gradient of the first minibatch and the gradient of thesecond minibatch, which means it encourages the gradients from different minibatches for a giventask to align in the same direction.

16

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 17: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Supplementary information

−40

0

40

−80 −40 0 40

x

y

ident

gbm

lgg

others

(a) LGG/GBM

−40

0

40

−80 −40 0 40

x

y

ident

luad

lusc

others

(b) LUAD/LUSC

−40

0

40

80

−40 0 40 80

x

y

ident

hnsc

others

(c) HNSC

−40

0

40

−80 −40 0 40

x

y

ident

kirc

kirp

others

(d) KIRC/KIRP

Figure S1: Mapping of 33 types of cancers’ gene expression data with a t-distributed stochastic neigh-bor embedding (t-SNE), highlighting (S1a) LGG/GBM versus the rest of cancers; (S1b) LUAD/LUSCversus the rest of cancers; (S1c) HNSC versus the rest of cancers; (S1d) KIRC/KIRP versus the restof cancers. Renal cell carcinomas (i.e. KIRC and KIRP) are farther apart from other types of cancers,and also more heterogeneous within, thus information transfer from other cancers is considered moredifficult.

17

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 18: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S1: Glioma gene set over-representation

Pathway Gene setsize

p value p value(adjusted)

Jaccard In-dex

Overlapping Numberof overlap-ping

Genes Source

Protein export - Homo sapiens (human) 23 0.018223195 1 0.003460208 SPCS1; SEC62; SRPRB; IMMP1L; SEC61B;SEC11C

6 lowriskgenes KEGG

Sulfur relay system - Homo sapiens (human) 8 0.038146719 1 0.001741149 URM1; MPST; TST 3 lowriskgenes KEGGSulfur metabolism - Homo sapiens (human) 9 0.038146719 1 0.001741149 BPNT1; MPST; TST 3 lowriskgenes KEGGLegionellosis - Homo sapiens (human) 55 0.039741108 1 0.005675369 IL12B; HSPA1B; HSPA1A; IL6; ARF1;

HSPD1; NAIP; CXCL3; CASP8; CLK110 lowriskgenes KEGG

RNA transport - Homo sapiens (human) 171 0.040939816 1 0.011924119 AAAS; NCBP1; SUMO3; EIF3H; EIF3A;NUP210; RPP40; ACIN1; FXR2; THOC2;THOC6; XPO5; NUP133; EIF2B4; RBM8A;EIF1; RGPD1; GEMIN5; SMN1; FMR1;NUP160; EIF4E2

22 lowriskgenes KEGG

Tryptophan metabolism - Homo sapiens (hu-man)

40 0.041726265 1 0.004013761 IDO1; INMT; ALDH3A2; GCDH; OGDH;AADAT; ALDH9A1

7 lowriskgenes KEGG

Carbohydrate digestion and absorption - Homosapiens (human)

44 0.048176123 1 0.004011461 SLC37A4; PIK3CA; CACNA1D; PRKCB;HK1; ATP1B2; ATP1B1

7 lowriskgenes KEGG

NOSTRIN mediated eNOS trafficking 6 0.008571329 1 0.001744186 NOSTRIN; WASL; CAV1 3 lowriskgenes REACTOMEDisinhibition of SNARE formation 5 0.008571329 1 0.001744186 STXBP3; PRKCB; STX4 3 lowriskgenes REACTOMERUNX1 regulates transcription of genes in-volved in differentiation of myeloid cells

6 0.008571329 1 0.001744186 PRKCB; CBFB; CREBBP 3 lowriskgenes REACTOME

HSP90 chaperone cycle for steroid hormone re-ceptors (SHR)

19 0.008593 1 0.003466205 NR3C2; NR3C1; HSPA1A; HSP90AA1;HSPA1B; FKBP4

6 lowriskgenes REACTOME

Dectin-2 family 29 0.008593 1 0.003466205 CLEC10A; MUC20; MUC5B; FCER1G;MUC4; SYK

6 lowriskgenes REACTOME

PAOs oxidise polyamines to amines 2 0.010013404 1 0.001164144 PAOX; SMOX 2 lowriskgenes REACTOMETWIK-releated acid-sensitive K+ channel(TASK)

2 0.010013404 1 0.001164144 KCNK9; KCNK3 2 lowriskgenes REACTOME

Attenuation phase 28 0.011863238 1 0.004029937 FKBP4; HSPA1B; HSPA1A; HSBP1;HSP90AA1; SERPINH1; CREBBP

7 lowriskgenes REACTOME

Crosslinking of collagen fibrils 10 0.012809503 1 0.002320186 TLL1; LOXL3; PCOLCE; PXDN 4 lowriskgenes REACTOMELysine catabolism 12 0.012809503 1 0.002320186 AADAT; PIPOX; OGDH; GCDH 4 lowriskgenes REACTOMEPyrimidine salvage 11 0.012809503 1 0.002320186 UPP1; UCK1; TK2; DCK 4 lowriskgenes REACTOMERUNX1 regulates estrogen receptor mediatedtranscription

6 0.015871668 1 0.001743173 GPAM; KCTD6; CBFB 3 lowriskgenes REACTOME

HSF1 activation 31 0.021616644 1 0.004022989 FKBP4; HSPA1B; HSPA1A; HSBP1; RPA3;HSP90AA1; SERPINH1

7 lowriskgenes REACTOME

Metabolism of polyamines 40 0.023498306 1 0.004581901 GAMT; PAOX; CKMT1A; SMOX; CKM;AMD1; OAZ2; OAZ3

8 lowriskgenes REACTOME

Activation of RAC1 14 0.025669753 1 0.002317497 NCK2; ROBO1; SOS2; PAK4 4 lowriskgenes REACTOMEInterconversion of polyamines 3 0.028037765 1 0.001163467 PAOX; SMOX 2 lowriskgenes REACTOMESynthesis of glycosylphosphatidylinositol(GPI)

18 0.028222393 1 0.002888504 PIGB; PIGH; PIGO; PIGW; DPM2 5 lowriskgenes REACTOME

Metabolism of amino acids and derivatives 339 0.030727025 1 0.020263425 GCAT; OGDH; PIPOX; CKM; NAALAD2;PDHX; HAL; SERINC5; RPL11; RPL18;AMD1; RPL4; SERINC1; IYD; TXNRD1;DDO; OAZ2; IVD; GAMT; INMT; PAOX;RPL35A; IDO1; MPST; RPL7; QDPR;ALDH9A1; ACADSB; RPSA; GLUD1;RPS15; CKMT1A; TST; GCDH; RPS13;SARDH; AADAT; SMOX; OAZ3; SECISBP2

40 lowriskgenes REACTOME

Cohesin Loading onto Chromatin 10 0.038146719 1 0.001741149 RAD21; PDS5A; PDS5B 3 lowriskgenes REACTOMEHSF1-dependent transactivation 36 0.048176123 1 0.004011461 FKBP4; HSPA1B; HSPA1A; HSBP1;

HSP90AA1; SERPINH1; CREBBP7 lowriskgenes REACTOME

Model for regulation of MSMP expression incancer cells and its proangiogenic role in ovar-ian tumors

3 0.028037765 1 0.001163467 CCR2; CTCF 2 lowriskgenes Wikipathways

Viral carcinogenesis - Homo sapiens (human) 201 0.00190752 0.612313978 0.017094017 IL6ST; IKBKG; GSN; DLG1; STAT3;PIK3CB; PIK3CD; ATP6V0D2; CREB1;KRAS; HLA-C; GTF2E2; EGR3; CCNE1;HIST1H2BL; HIST1H2BH; TBPL1; TBP;CDC20; HDAC5; HDAC6; CDKN1B; C3;RANBP1; MRPS18B; KAT2B; EP300; BAD;HIST2H2BE; YWHAB; TRAF3; ATF6B

32 highriskgenes KEGG

Herpes simplex infection - Homo sapiens (hu-man)

185 0.007380866 1 0.013579576 IKBKG; EP300; OAS2; CSNK2A1; IFNAR2;SOCS3; RNASEL; IFIT1; GTF2IRD1; TN-FRSF1A; TBP; TAF10; MAPK10; STAT2;SKP1; OAS3; HLA-C; C3; TBPL1; CYCS;IL15; HLA-DOA; TRAF3; POU2F3; CFP

25 highriskgenes KEGG

Epstein-Barr virus infection - Homo sapiens(human)

200 0.013975163 1 0.014973262 IFNAR2; IKBKG; GADD45A; PIK3CD;OAS3; OAS2; MYC; IRAK4; HLA-C;CDKN1B; CCNE1; PSMD12; PIK3CB;MAPK14; MAPK10; NCOR2; STAT3; STAT2;CR2; CYCS; HLA-DOA; PSMD8; CD58;PSMD2; BID; PSMC3; TRAF3; MAP3K14

28 highriskgenes KEGG

Necroptosis - Homo sapiens (human) 162 0.018658053 1 0.012008734 CHMP2A; CAPN1; CAMK2A; IFNAR2;MAPK10; FTH1; TNFRSF1A; TNFRSF10B;TNFRSF10A; STAT3; STAT2; HIST2H2AA3;CHMP4B; VPS4B; GLUD2; PPIA;HIST1H2AL; HIST1H2AG; HIST1H2AC;BID; PLA2G4A; CASP1

22 highriskgenes KEGG

18

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 19: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S1: Glioma gene set over-representation (continued)

Pathway Gene setsize

p value p value(adjusted)

Jaccard In-dex

Overlapping Numberof overlap-ping

Genes Source

Circadian rhythm - Homo sapiens (human) 30 0.021491434 1 0.004027618 CRY1; PRKAA1; NR1D1; SKP1; BHLHE40;CREB1; FBXL3

7 highriskgenes KEGG

RNA degradation - Homo sapiens (human) 79 0.025072681 1 0.007323944 ZCCHC7; DCP2; PARN; PABPC4; EX-OSC3; EXOSC7; EXOSC10; TOB1; PABPC1;PATL1; PAPD7; PAPD5; PNLDC1

13 highriskgenes KEGG

Tuberculosis - Homo sapiens (human) 178 0.026606605 1 0.012965964 LSP1; CALM3; IRAK2; IRAK4; CAMK2A;CREB1; TLR1; IL10RB; CARD9; RFXAP;TNFRSF1A; MAPK14; MAPK10; C3;ATP6V0D2; EP300; HLA-DOA; TGFB2;FCGR3B; BAD; CYCS; CEBPG; BID;FCGR2B

24 highriskgenes KEGG

Human T-cell leukemia virus 1 infection -Homo sapiens (human)

219 0.028932505 1 0.015822785 IKBKG; DLG1; RAN; MAP3K3; PIK3CB;PIK3CD; MYC; KRAS; ADCY7; MAPK10;HLA-C; CRTC1; CCNE1; CHEK2; TBPL1;TNFRSF1A; TBP; CDC20; FOSL1; RANBP1;KAT2B; EP300; IL2RB; IL15; HLA-DOA;TGFB2; CREB1; CDKN2C; MAP3K14;ATF6B

30 highriskgenes KEGG

Cell cycle - Homo sapiens (human) 124 0.045580947 1 0.009911894 GADD45A; WEE1; CDKN2C; CDKN1B;CCNE1; CHEK2; CDC25B; STAG1; CDC14B;CDC14A; CDC20; CDC6; SKP1; EP300;MYC; TGFB2; YWHAB; MCM5

18 highriskgenes KEGG

Cytokine Signaling in Immune system 458 0.00015883 0.299552844 0.031476998 IL6ST; IFNAR2; PIK3CB; PIK3CD; IRAK2;IRAK4; CAMK2A; SNAP25; TNFSF13; TN-FSF15; IFIT1; HCK; SIGIRR; FLNB; PELI2;CISH; BRWD1; IL2RB; HLA-C; MAP3K14;IFITM3; IFITM1; CREB1; TNFRSF25;UBA52; IL10RB; TRIM8; MEF2A; SKP1;TNFSF9; TNFRSF6B; TNFSF4; HIST1H3H;OSM; IL27; MAP3K8; MAP3K3; TRIM29;INPP5D; IKBKG; TNFRSF18; GBP2; GBP7;PTPN2; PTPN6; STAT3; STAT2; TNFRSF1A;IL15; IL1RAP; CASP1; S100A12; OAS3;OAS2; RNASEL; SOCS3; EIF4G1; IL20RB;UBE2L6; TRIM46; MAPK14; MAPK10;PSMB8; ADAM17; TRAF3

65 highriskgenes REACTOME

Signaling by NOTCH 119 0.001166907 1 0.012195122 MOV10; FURIN; DLGAP5; MYC; CREB1;HDAC5; UBA52; YBX1; PRKCI; ARRB2;KAT2B; NCSTN; HDAC6; TLE1; NCOR2;POFUT1; SKP1; LFNG; EP300; ADAM17;ST3GAL4; PBX1

22 highriskgenes REACTOME

Regulation of mRNA stability by proteins thatbind AU-rich elements

37 0.002709399 1 0.005737235 TNFSF13; PABPC1; MAPK14; ANP32A; EX-OSC3; EXOSC7; PARN; YWHAB; DCP2;EIF4G1

10 highriskgenes REACTOME

Interferon alpha/beta signaling 70 0.002974208 1 0.007390563 HLA-C; IFNAR2; IFIT1; STAT2; OAS3;OAS2; IFITM3; IFITM1; PTPN6; SOCS3;RNASEL; PSMB8; GBP2

13 highriskgenes REACTOME

Immune System 1827 0.003342285 1 0.059848734 AGL; UBE2Q2; CD226; PIK3CB; PIK3CD;MUC1; ACLY; FBXL12; FBXL19; PSMD12;SIGIRR; PELI2; ACTR10; CD93; CFP; ASB3;CNPY3; IL10RB; TRIM8; MEF2A; SH3RF1;PRDX6; WIPF3; WIPF1; EP300; ATP6V0D2;CUL2; TRIM29; ARHGAP9; LAIR2; C4BPA;PTPN2; PTPN6; PTPN4; C3; FBXO41; SLPI;FCGR3B; MAPKAP1; IL1RAP; SEC61A1;UBR2; KRAS; SOCS3; ARPC1A; EIF4G1;IL20RB; ARPC2; RNF123; MAPK14;MAPK10; DAPP1; P2RX7; TRAF3; IL6ST;COMMD9; CAMK2A; CISH; PSMC3;SEC61G; MAP3K14; IKBKG; TMEM173;GNLY; VAT1; UBA52; NCSTN; SKP1;LCN2; HIST1H3H; MAP3K8; MAP3K3;INPP5D; CARD9; PSMB8; UBA6; UBE2C;IL15; ATP11A; RNASEL; FBXL3; TLR1;FAF2; CDC20; DTX3L; ADAM17; MUC16;RASGRP3; TANK; HCK; GZMM; SPSB2;BRWD1; C5AR1; CDH1; IFITM3; IFITM1;RNF19A; UNC93B1; VNN1; UBE2E3; TN-FRSF6B; YWHAB; AIM2; VAMP8; ABI1;IL27; PECAM1; PLD1; RNF217; RNF213;DOCK2; SMURF1; FTH1; PPIA; WASF2;CASP4; CASP1; HECTD3; S100A12; OAS3;OAS2; CAPN1; PI3; RNF220; TRIM46;LRMP; HLA-C; UBE2G1; IFNAR2; IRAK2;OSM; IRAK4; IL2RB; SNAP25; TNFSF13;TNFSF15; IFIT1; NFASC; FLNB; GGH; TN-FAIP6; PSMD2; ALDH3B1; NOS3; CREB1;PRCP; TNFRSF25; C1QA; ZBTB16; TN-FSF9; TNFSF4; TRIM69; RNF34; XRCC5;CLEC4G; CLEC4A; TNFRSF18; GBP2;GBP7; HERC1; HERC6; CR2; TNFRSF1A;STAT3; STAT2; CD68; HLA-DOA; CHIT1;LAT2; GSN; GOLGA7; APEH; UBE2L6;PTPRN2; FCGR2B; CD58; DOK3; SLC11A1;SIGLEC9; SIGLEC7; FOLR3

182 highriskgenes REACTOME

19

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 20: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S1: Glioma gene set over-representation (continued)

Pathway Gene setsize

p value p value(adjusted)

Jaccard In-dex

Overlapping Numberof overlap-ping

Genes Source

TNFs bind their physiological receptors 29 0.004906229 1 0.004608295 TNFSF13; TNFRSF25; TNFSF15; TN-FRSF18; TNFSF9; TNFRSF6B; TNFSF4;TNFRSF1A

8 highriskgenes REACTOME

Voltage gated Potassium channels 44 0.005076968 1 0.005169443 KCNG2; KCNA2; KCNS1; KCND2; KCNQ3;KCNQ1; KCNF1; KCNH7; KCNH2

9 highriskgenes REACTOME

Collagen degradation 35 0.006191992 1 0.004605642 MMP15; MMP10; ADAM17; FURIN;COL18A1; COL9A2; COL16A1; COL17A1

8 highriskgenes REACTOME

TNFR2 non-canonical NF-kB pathway 50 0.007903028 1 0.00627138 TNFSF13; TNFRSF25; TNFSF15; TN-FRSF18; SKP1; TNFSF9; TNFSF4; TRAF3;TNFRSF1A; TNFRSF6B; MAP3K14

11 highriskgenes REACTOME

HATs acetylate histones 142 0.00981636 1 0.011043622 DR1; YEATS2; YEATS4; KAT2B; EP400;TADA2A; HIST1H2BL; HIST1H2BH; PHF20;TAF10; ATXN7L3; EP300; HIST2H2AA3;HIST2H2BE; HIST1H2AL; HIST1H2AG;HIST1H2AC; HIST1H3H; MEAF6; RUVBL2

20 highriskgenes REACTOME

Synthesis of PA 40 0.011593687 1 0.004597701 LPCAT4; LIPH; PLA2G2D; PLA2G4A; PLD6;PLD1; AGPAT4; GPD1L

8 highriskgenes REACTOME

SCF-beta-TrCP mediated degradation of Emi1 10 0.012757504 1 0.00232288 SKP1; CDC20; UBA52; FBXO5 4 highriskgenes REACTOMELGI-ADAM interactions 14 0.012757504 1 0.00232288 STX1B; ADAM23; ADAM11; ADAM22 4 highriskgenes REACTOMEUCH proteinases 102 0.013842122 1 0.009470752 UBA52; PSMD12; ACTR5; INO80C; PSMA1;

PSMA4; HIST2H2AA3; PSMD2; PSMB8;PSMB6; PSMD8; HIST1H2AL; HIST1H2AG;HIST1H2AC; PSMC3; USP15; BAP1

17 highriskgenes REACTOME

Growth hormone receptor signaling 19 0.016938149 1 0.002895194 PTPN6; SOCS3; CISH; ADAM17; STAT3 5 highriskgenes REACTOMEKSRP (KHSRP) binds and destabilizes mRNA 16 0.016938149 1 0.002895194 DCP2; PARN; MAPK14; EXOSC3; EXOSC7 5 highriskgenes REACTOMEPolo-like kinase mediated events 16 0.016938149 1 0.002895194 EP300; CENPF; MYBL2; WEE1; FOXM1 5 highriskgenes REACTOMEPre-NOTCH Expression and Processing 42 0.019850024 1 0.004589788 PRKCI; MOV10; KAT2B; POFUT1; FURIN;

LFNG; ST3GAL4; EP3008 highriskgenes REACTOME

Degradation of the extracellular matrix 105 0.021458182 1 0.008923592 FURIN; ADAMTS4; CAPN1; CDH1; KLK7;BMP1; MMP15; MMP10; NCSTN; COL16A1;SPOCK3; ACAN; COL9A2; COL17A1;ADAM17; COL18A1

16 highriskgenes REACTOME

Metalloprotease DUBs 36 0.021491434 1 0.004027618 KAT2B; HIST1H2AL; HIST1H2AG;HIST1H2AC; HIST2H2AA3; EP300; UBA52

7 highriskgenes REACTOME

Interferon Signaling 158 0.021717868 1 0.011995638 TRIM46; IFNAR2; OAS3; OAS2; IFITM3;IFITM1; TRIM29; CAMK2A; RNASEL;SOCS3; EIF4G1; HLA-C; UBE2L6; IFIT1;GBP7; PTPN2; PTPN6; TRIM8; FLNB;STAT2; PSMB8; GBP2

22 highriskgenes REACTOME

Transcriptional activation of mitochondrial bio-genesis

27 0.022501875 1 0.003462204 CYCS; SOD2; SIRT5; SIRT4; CREB1;GLUD2

6 highriskgenes REACTOME

Signaling by Interleukins 254 0.025564278 1 0.017223382 IL6ST; OSM; IL27; S100A12; MAP3K8;MAP3K3; PIK3CB; PIK3CD; IRAK2; IRAK4;CREB1; INPP5D; SNAP25; SOCS3; UBA52;IL10RB; IL15; IL20RB; STAT3; PTPN6;SKP1; MEF2A; SIGIRR; IL1RAP; MAPK10;HCK; MAPK14; PELI2; BRWD1; IL2RB; IK-BKG; HIST1H3H; CASP1

33 highriskgenes REACTOME

Regulation of IFNA signaling 26 0.025570448 1 0.002320186 IFNAR2; STAT2; PTPN6; SOCS3 4 highriskgenes REACTOMEToxicity of botulinum toxin type C (BoNT/C) 3 0.027974833 1 0.001164822 SNAP25; STX1B 2 highriskgenes REACTOMEGABA A receptor activation 13 0.027974833 1 0.001164822 GABRB3; GABRB2 2 highriskgenes REACTOMEThe AIM2 inflammasome 3 0.027974833 1 0.001164822 CASP1; AIM2 2 highriskgenes REACTOMEmTORC1-mediated signalling 23 0.028095548 1 0.002891845 EEF2K; AKT1S1; YWHAB; RRAGD; EIF4G1 5 highriskgenes REACTOMEBudding and maturation of HIV virion 30 0.03325781 1 0.003458213 CHMP4B; CHMP2A; VPS28; VPS4B;

UBA52; PPIA6 highriskgenes REACTOME

Glyoxylate metabolism and glycine degradation 32 0.03325781 1 0.003458213 GRHPR; GNMT; LIPT1; ALDH4A1; BCK-DHA; DBT

6 highriskgenes REACTOME

Regulation of TP53 Activity through Associa-tion with Co-factors

14 0.03407681 1 0.002318841 BANP; TP73; PPP1R13L; PHF20 4 highriskgenes REACTOME

Mitochondrial biogenesis 53 0.035666601 1 0.004020678 CYCS; MAPK14; SOD2; SIRT5; SIRT4;CREB1; GLUD2

7 highriskgenes REACTOME

NCAM signaling for neurite out-growth 59 0.039471083 1 0.005681818 SPTB; CACNA1I; NRTN; ARTN; COL9A2;CREB1; KRAS; PSPN; GDNF; COL6A1

10 highriskgenes REACTOME

Disease 508 0.039528833 1 0.027738599 FGFR1OP2; PIK3CB; PIK3CD; SNAP25;CDKN1B; KSR2; HCK; VWF; TAF10;APOBEC3G; CHMP4B; BAD; VPS28;RNMT; MAP3K11; RAN; CDH1; UBA52;NUP85; NCSTN; GTF2E2; CHMP2A;DERL1; DERL3; SHH; TBP; SKP1; HDAC5;HDAC6; RANBP1; EP300; PRDX1; YWHAB;FURIN; ERBB3; ERLEC1; CREB1; DOCK2;KAT2B; XRCC5; XRCC4; PAPD7; STX1B;ARRB2; STAT3; VPS4B; PPIA; MAPKAP1;CAPN1; MYC; KRAS; AKT1S1; CDC25B;POLR2G; TAF7; NUP153; NCOR2; FGF19;ADAM17

59 highriskgenes REACTOME

Interleukin-1 family signaling 77 0.039638193 1 0.00676819 IKBKG; PELI2; MAP3K8; S100A12; SKP1;MAP3K3; IRAK2; IRAK4; IL1RAP; CASP1;UBA52; SIGIRR

12 highriskgenes REACTOME

NOTCH1 Intracellular Domain Regulates Tran-scription

47 0.040852221 1 0.005131129 HDAC5; HDAC6; TLE1; KAT2B; NCOR2;SKP1; UBA52; MYC; EP300

9 highriskgenes REACTOME

20

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 21: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S1: Glioma gene set over-representation (continued)

Pathway Gene setsize

p value p value(adjusted)

Jaccard In-dex

Overlapping Numberof overlap-ping

Genes Source

NCAM1 interactions 37 0.041501243 1 0.00401837 CACNA1I; NRTN; ARTN; COL9A2; PSPN;GDNF; COL6A1

7 highriskgenes REACTOME

p53-Dependent G1 DNA Damage Response 21 0.043039667 1 0.002888504 CCNE1; CDKN1B; UBA52; CHEK2; PHF20 5 highriskgenes REACTOMEp53-Dependent G1/S DNA damage checkpoint 21 0.043039667 1 0.002888504 CCNE1; CDKN1B; UBA52; CHEK2; PHF20 5 highriskgenes REACTOMEAssembly Of The HIV Virion 17 0.044031253 1 0.002317497 VPS28; UBA52; PPIA; FURIN 4 highriskgenes REACTOMEEicosanoid ligand-binding receptors 15 0.044031253 1 0.002317497 TBXA2R; LTB4R; PTGIR; CYSLTR1 4 highriskgenes REACTOMEAntigen processing: Ubiquitination & Protea-some degradation

263 0.044259631 1 0.016675352 UBE2Q2; ASB3; CUL2; UBR2; SOCS3;FBXL3; UBA52; RNF213; CDC20; UBE2L6;FBXL12; UBE2E3; HERC1; HERC6; UBA6;UBE2C; SKP1; DTX3L; SPSB2; ZBTB16;FBXO41; SH3RF1; RNF220; TRIM69;HECTD3; SMURF1; RNF34; FBXL19;RNF217; UBE2G1; RNF19A; RNF123

32 highriskgenes REACTOME

MAPK family signaling cascades 236 0.045869684 1 0.015287296 SPTB; RASAL3; GDNF; KRAS; ERBB3;CAMK2A; IL6ST; UBA52; RASGRP3;MMP10; KSR2; ANGPT1; NRTN; RAG1;CDC14B; CDC14A; VWF; IL17RD; ARRB2;FGF19; DUSP2; MYC; PSPN; IL2RB; ARTN;KALRN; MAP3K11; YWHAB; ACTN2

29 highriskgenes REACTOME

mTOR signalling 40 0.047921055 1 0.004016064 PRKAA1; EEF2K; AKT1S1; YWHAB;RRAGD; EIF4G1; STRADA

7 highriskgenes REACTOME

Signaling by NOTCH1 73 0.047955595 1 0.006760563 HDAC5; HDAC6; TLE1; KAT2B; NCSTN;ADAM17; NCOR2; EP300; SKP1; UBA52;MYC; ARRB2

12 highriskgenes REACTOME

Mammary gland development pathway - Invo-lution (Stage 4 of 4)

10 0.001626496 0.977524312 0.002905288 STAT3; SOCS3; IL6ST; CDH1; MYC 5 highriskgenes Wikipathways

Integrated Breast Cancer Pathway 66 0.006464185 1 0.007369615 RASGRP3; NUP85; XRCC3; ODC1; EP300;RAD54L; GADD45A; FOSL1; CDH1; MYC;CREB1; KRAS; AHR

13 highriskgenes Wikipathways

Oxidative Damage 40 0.013030469 1 0.005154639 CDKN1B; CYCS; BAD; C5AR1; MAPK10;GADD45A; TRAF3; CR2; C1QA

9 highriskgenes Wikipathways

Interferon alpha-beta signaling 21 0.014371125 1 0.003466205 IFITM3; GBP2; IFITM1; RNASEL; PSMB8;IFIT1

6 highriskgenes Wikipathways

Regulation of Apoptosis by ParathyroidHormone-related Protein

21 0.014371125 1 0.003466205 BID; BCL2L12; PTHLH; BCL2L15; MYC;PIK3CG

6 highriskgenes Wikipathways

NO-cGMP-PKG mediated Neuroprotection 47 0.015361467 1 0.005151689 CYCS; CNGB1; NPPA; NOS3; GUCY1B2;CAMK2A; CREB1; ACTN2; BAD

9 highriskgenes Wikipathways

Glucocorticoid and MineralcorticoidMetabolism

8 0.015820701 1 0.001745201 CYP11A1; CYP21A2; HSD11B2 3 highriskgenes Wikipathways

mir-124 predicted interactions with cell cycleand differentiation

6 0.015820701 1 0.001745201 SIX4; PRKAA1; PTBP1 3 highriskgenes Wikipathways

The human immune response to tuberculosis 22 0.018126628 1 0.003464203 IFNAR2; IFIT1; STAT2; IFITM1; PTPN2;PSMB8

6 highriskgenes Wikipathways

Prostaglandin Synthesis and Regulation 44 0.020919085 1 0.005145798 PTGS1; S100A10; PLA2G4A; CYP11A1;HSD11B2; TBXA2R; ANXA3; PTGIR;ABCC4

9 highriskgenes Wikipathways

Apoptosis Modulation and Signaling 91 0.027414659 1 0.008384572 TNFRSF25; BLK; TNFRSF10B; TN-FRSF10A; TNFRSF6B; TNFRSF1A; BAD;BIRC7; CYCS; BID; TRAF3; CASP6; CASP4;CASP1; MAP3K14

15 highriskgenes Wikipathways

Structural Pathway of Interleukin 1 (IL-1) 44 0.027792027 1 0.00513992 MAP3K8; TANK; MAP3K3; IRAK2; MYC;IRAK4; MAPK14; IL1RAP; MAP3K14

9 highriskgenes Wikipathways

Fluoropyrimidine Activity 34 0.030400651 1 0.004022989 SLC29A1; RRM2; RRM1; XRCC3; GGH;SMUG1; ABCC4

7 highriskgenes Wikipathways

Apoptosis 87 0.031966146 1 0.007847534 TNFRSF25; IKBKG; TNFRSF1A; TN-FRSF10B; MAPK10; MYC; TP73; CYCS;BAD; BID; TRAF3; CASP6; CASP4; CASP1

14 highriskgenes Wikipathways

Cell Cycle 120 0.033932118 1 0.009933775 GADD45A; MYC; CDKN2C; CDKN1B;CCNE1; CHEK2; CDC25B; STAG1; CDC14B;CDC14A; CDC20; CDC6; SKP1; EP300;WEE1; TGFB2; YWHAB; MCM5

18 highriskgenes Wikipathways

Mitotic G1-G1-S phases 20 0.035077937 1 0.002890173 CCNE1; FBXO5; RRM2; MYBL2; CDC6 5 highriskgenes WikipathwaysApoptosis Modulation by HSP70 19 0.035077937 1 0.002890173 CASP6; TNFRSF1A; BID; CYCS; MAPK10 5 highriskgenes WikipathwaysParkin-Ubiquitin Proteasomal System pathway 73 0.035883432 1 0.006772009 HSPA4; PSMD2; UBE2L6; PSMD12;

TUBA1A; CCNE1; UBE2G1; PSMC3;RNF19A; CASP1; PSMD8; HSPA14

12 highriskgenes Wikipathways

TNF alpha Signaling Pathway 93 0.038990875 1 0.008365867 IKBKG; MAP3K8; MAP3K3; CSNK2A1;KRAS; TANK; KSR2; TNFRSF1A; SKP1;SMPD2; PTPRCAP; BAD; PSMD2; BID;MAP3K14

15 highriskgenes Wikipathways

DNA Damage Response (only ATM dependent) 110 0.039497097 1 0.009407858 WNT10A; PIK3CD; PIK3CG; MYC; SOD2;PIK3C2B; PIK3C2A; KRAS; TCF7L1;CDKN1B; PIK3CB; WNT3A; MAPK10;FOSL1; TP73; WNT4; BAD

17 highriskgenes Wikipathways

Microglia Pathogen Phagocytosis Pathway 40 0.041661964 1 0.004576659 PIK3CB; PIK3CD; PIK3CG; PIK3C2A; HCK;PTPN6; C1QA; SIGLEC7

8 highriskgenes Wikipathways

Interleukin-1 family signaling 15 0.044031253 1 0.002317497 PTPN2; PTPN7; PTPN6; PTPN4 4 highriskgenes Wikipathways

21

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 22: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S2: Glioma gene set enrichment analysis (GSEA)

Pathway p value p value(adjusted)

ES NES n More Ex-treme

Size Leading Edge Genes Source

Tryptophan metabolism - Homo sapiens (hu-man)

0.023100107 0.979820681 -0.432518002 -1.550736633 1141 33 AADAT; ALDH9A1; ALDH3A2; IDO1; OGDH;INMT; GCDH

KEGG

Cushing syndrome - Homo sapiens (human) 0.049814344 0.979820681 0.274721105 1.298534716 2548 138 AHR; CCNE1; TCF7L1; WNT10A; ADCY7;CACNA1F; CDKN2C; CAMK2A; STAR; PBX1;WNT3A; CDKN1B; CYP11A1; CACNA1I; USP8;WNT4; CREB1; CYP21A2; ATF6B; GNAI3; FZD3;NCEH1; ADCY9; CDKN2B; CACNA1C; RAP1A;APC2; WNT16; PRKACA; KCNK2; CDK6

KEGG

Renin-angiotensin system - Homo sapiens (hu-man)

0.010151278 0.74865674 0.55331802 1.713839711 511 19 PRCP; KLK1; THOP1; PREP; ANPEP; CTSA; EN-PEP; CPA3; CTSG; AGT; NLN

KEGG

Small cell lung cancer - Homo sapiens (human) 0.029809018 0.979820681 0.316900196 1.400774125 1513 92 CCNE1; BIRC7; TRAF3; CYCS; CDKN1B;PIK3CB; MYC; IKBKG; GADD45A; PIK3CD;ITGA3; SKP2; PTEN; COL4A2; ITGA2; NOS2;DDB2; CDKN2B; BCL2L1; ITGAV; MAX; TP53;PTK2; TRAF1; CDK6; LAMB1; BAX; ITGB1;LAMB2; LAMC1; E2F3; TRAF2; PIK3R1; TRAF6;NFKBIA

KEGG

p53 signaling pathway - Homo sapiens (human) 0.048804487 0.979820681 0.327513728 1.372478184 2479 69 CCNE1; CD82; BID; CYCS; TP73; TNFRSF10B;RRM2; CHEK2; GADD45A; TP53I3; PTEN; DDB2;IGFBP3; BCL2L1; FAS; TP53; CCNB1; ATR;CDK6; BAX; PPM1D; PMAIP1; RCHY1; BBC3;SESN2

KEGG

Viral carcinogenesis - Homo sapiens (human) 0.001408837 0.415606778 0.309546872 1.528384025 71 188 CCNE1; HIST1H2BL; GTF2E2; HDAC6; YWHAB;KRAS; TBP; CDC20; C3; HIST1H2BH; TRAF3;RANBP1; EP300; CDKN1B; STAT3; IL6ST;HIST2H2BE; ATP6V0D2; DLG1; TBPL1; BAD;PIK3CB; HDAC5; GSN; IKBKG; HLA-C; CREB1;KAT2B; PIK3CD; ATF6B; EGR3; MRPS18B; HLA-B; SKP2; HLA-A; EIF2AK2; CDKN2B; TP53;RHOA; PRKACA; POLB; TRAF1; CDK6; BAX;TRADD; MAPKAPK2; SNW1; PMAIP1; GRB2;ACTN4; JAK3; CREB3L2; TRAF2; NRAS; PIK3R1;GTF2E1; NFKBIA; HDAC7; GTF2H3; IRF9;PRKACB; YWHAZ; NFKB1; DDB1; SCRIB; BAK1

KEGG

Hepatitis C - Homo sapiens (human) 0.004599636 0.452297534 0.321690611 1.513911872 234 134 YWHAB; KRAS; IFNAR2; TRAF3; BID; RNASEL;CYCS; OAS2; TNFRSF1A; STAT3; OAS3; SOCS3;STAT2; BAD; PIK3CB; MYC; IFIT1; IKBKG;PIK3CD; CLDN11; EIF2AK1; EIF2AK2; FAS;PPP2R2B; EIF2AK3; TP53; PPP2R1A; CDK6;BAX; TRADD; PSME3; GRB2; CLDN15; E2F3;TRAF2; CLDN16; NRAS; PIK3R1; TRAF6;CXCL10; NFKBIA; NR1H3; IRF9; YWHAZ;NFKB1

KEGG

Epstein-Barr virus infection - Homo sapiens(human)

0.003443486 0.452297534 0.302236718 1.4859104 175 182 CCNE1; HLA-DOA; CD58; IFNAR2; TRAF3; CR2;MAPK10; BID; CYCS; OAS2; CDKN1B; PSMD2;STAT3; OAS3; NCOR2; MAP3K14; PSMC3;STAT2; PIK3CB; MYC; IKBKG; PSMD12; HLA-C;GADD45A; IRAK4; PIK3CD; MAPK14; PSMD8;HLA-B; CALR; SKP2; NFKBIE; PSMD4; SIN3A;HLA-A; DDB2; EIF2AK2; PLCG2; PSMD3; FAS;TP53; ENTPD8; CDK6; BAX; TRADD; PSMC4;SNW1; NFKBIB; ITGAL; JAK3; E2F3; TRAF2;PIK3R1; TRAF6; CXCL10; NFKBIA; BTK; HLA-DRB1; SAP30; IRF9

KEGG

Phototransduction - Homo sapiens (human) 0.041858717 0.979820681 0.475625998 1.513433595 2115 21 CNGB1; CALM3; CALML4; GNB1; PDE6A;GNAT2; CALML3; PDE6G; GNGT1

KEGG

Herpes simplex infection - Homo sapiens (hu-man)

0.020837817 0.979820681 0.287566634 1.375665052 1064 150 IL15; CFP; POU2F3; HLA-DOA; TBP; C3; IF-NAR2; TRAF3; MAPK10; RNASEL; EP300;CYCS; OAS2; TAF10; TNFRSF1A; OAS3; SOCS3;TBPL1; STAT2; IFIT1; IKBKG; SKP1; GTF2IRD1;CSNK2A1; HLA-C; HLA-B; SKP2; SRPK1; HLA-A; EIF2AK1; EIF2AK2; TAF6; FAS; EIF2AK3;TP53; TRAF1; NXF1; IFIH1; PER1; NFKBIB;TRAF2; TRAF6; NXF3; NFKBIA; TAF4; HLA-DRB1; IRF9; MCRS1

KEGG

22

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 23: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S2: Glioma gene set enrichment analysis (GSEA) (continued)

Pathway p value p value(adjusted)

ES NES n More Ex-treme

Size Leading Edge Genes Source

Neutrophil degranulation 0.007895092 0.836411609 0.239588011 1.304953275 409 448 FTH1; PLD1; PRDX6; COMMD9; LCN2; VAT1;ATP11A; S100A12; LRMP; ALDH3B1; CFP;TMEM173; AGL; PRCP; GGH; C3; CD58; SLPI;DOCK2; GOLGA7; CHIT1; XRCC5; APEH; ACLY;FOLR3; SNAP25; CAPN1; VNN1; PPIA; PSMD2;SLC11A1; NFASC; NCSTN; PSMC3; PECAM1;CD68; DOK3; SIGLEC9; GSN; ARHGAP9;PSMD12; VAMP8; HLA-C; C5AR1; PTPN6; CD93;ACTR10; FAF2; TNFAIP6; MAPK14; FCGR3B;PTPRN2; HLA-B; CYFIP1; RHOG; GALNS; TM-BIM1; VCL; CAP1; HLA-A; OSTF1; CCT2; HPSE;ANPEP; COTL1; PLAU; CTSA; PYGB; PLEKHO2;RAB10; C1orf35; CD36; SIGLEC5; CTSZ; AGA;RAP2C; ANXA2; RAP1A; PSMD3; XRCC6;TXNDC5; ITGAV; FCGR2C; RHOA; DNAJC5; BPI;ENPP4; RAB27A; FUCA2; ANO6; ATP11B; GRN;ITGAM; CD300A; CYB5R3; LILRB3; RNASET2;HK3; CLEC4D; ATP8B4; MMP9; ALDOA;SERPINB1; NPC2; NIT2; CDA; LILRA3; IT-GAL; TBC1D10C; SERPINA3; S100A9; HRNR;SIRPA; PAFAH1B2; MLEC; HGSNAT; OLR1;NRAS; CYBA; CTSG; IQGAP2; RAB3A; LAMP1;CLEC5A; ALOX5; LTA4H; SLC2A5; SERPINA1;PA2G4; LAMP2; NFKB1; DSN1; PSMA2; RAB6A;SYNGR1; PGAM1; FPR1; TMEM179B; S100A11;EPX; ARL8A; NME1-NME2; ATG7; SLCO4C1;RNASE3; FABP5; CD59; DSP; CCT8; RAB37

Reactome

Recycling pathway of L1 0.031622634 0.88702983 -0.449249346 -1.532444194 1561 27 NUMB; RPS6KA5; RPS6KA2; MSN; RPS6KA3;AP2B1; EZR; DNM2; RDX; RPS6KA1; DPYSL2;CLTA

Reactome

Butyrate Response Factor 1 (BRF1) binds anddestabilizes mRNA

0.041723843 0.88702983 0.509530904 1.53051629 2097 17 YWHAB; EXOSC3; EXOSC7; DCP2; XRN1; MAP-KAPK2; EXOSC5; DCP1A

Reactome

Cytokine Signaling in Immune system 0.00922317 0.836411609 0.241344672 1.304909082 477 414 IL15; MEF2A; PELI2; TRIM29; S100A12; OSM;IL2RB; CAMK2A; HCK; IFNAR2; TRAF3; GBP2;GBP7; MAPK10; UBE2L6; EIF4G1; PTPN2;RNASEL; FLNB; OAS2; TNFSF4; TNFRSF1A;SNAP25; TNFRSF18; INPP5D; IL20RB; CISH;TNFRSF6B; TNFRSF25; STAT3; OAS3; TNFSF9;IRAK2; MAP3K14; IL6ST; BRWD1; SOCS3;STAT2; ADAM17; PIK3CB; TRIM8; TNFSF13;IFIT1; IKBKG; PSMB8; SIGIRR; SKP1; IL27;MAP3K8; HIST1H3H; HLA-C; TRIM46; IL10RB;CREB1; IRAK4; PIK3CD; PTPN6; MAPK14;MAP3K3; CASP1; IFITM3; UBA52; IFITM1;IL1RAP; TNFSF15; CD27; HLA-B; SAA1; EGR1;IL18; UBE2E1; HIST1H3J; HERC5; IRF1; ARIH1;HLA-A; EIF2AK2; IL18BP; IL1A; IFI30; PELI1;GBP6; FYN; TNFRSF13C; PPP2R1A; PRKACA;HIST2H3D; IL18R1; LIF; IL10; SDC1; HIST1H3I;CSF3; EIF4A2; MAPKAPK2; NCAM1; RIPK2; TN-FSF8; EIF4G3; NFKBIB; GRB2; IL2RG; IL12RB2;TAB3; JAK3; VRK3; UBA7; USP18; TRAF2;SUMO1; CNTFR; PIK3R1; TRAF6; NFKBIA;GAB2; IL1RN; HLA-DRB1; IRF9; TRIM38; YW-HAZ; CD4; NFKB1; TRIM5; CLCF1; HAVCR2;TMEM189

Reactome

Translocation of ZAP-70 to Immunologicalsynapse

0.026126833 0.88702983 -0.530111039 -1.598754197 1298 17 PTPN22; HLA-DQB2; CD247; HLA-DPB1; HLA-DPA1; ZAP70; HLA-DQA1

Reactome

HSP90 chaperone cycle for steroid hormone re-ceptors (SHR)

0.040996962 0.88702983 -0.49024243 -1.52459203 2037 19 FKBP4; HSPA1A; NR3C2; HSP90AA1; HSPA1B;NR3C1; PGR; HSPA1L

Reactome

Collagen degradation 0.005869449 0.836411609 0.499213775 1.730725335 296 29 FURIN; COL18A1; COL16A1; ADAM17; MMP15;COL9A2; COL17A1; MMP10; TMPRSS6; MMP12

Reactome

TP53 Regulates Transcription of Cell DeathGenes

0.019492294 0.884904638 0.407372139 1.541179624 988 42 RABGGTB; BID; CASP6; TP73; TNFRSF10B;TNFRSF10A; CASP1; TP53I3; TNFRSF10D;TP53BP2; IGFBP3; FAS; TP53; BAX; PMAIP1;BBC3; TMEM219; BCL6; STEAP3; TP53AIP1

Reactome

Endogenous sterols 0.009686916 0.836411609 0.532156676 1.717629863 486 22 AHR; CYP46A1; CYP7B1; CYP11A1; CYP21A2;FDXR; CYP27A1; PTGIS

Reactome

Polo-like kinase mediated events 0.037114999 0.88702983 0.524954793 1.550798269 1868 16 FOXM1; EP300; CENPF; MYBL2; WEE1; CCNB1;LIN9; RBBP4; LIN54; CDC25C

Reactome

RHO GTPases Activate WASPs and WAVEs 0.004305492 0.836411609 0.477078151 1.73230666 217 35 ABI1; ARPC1A; WIPF3; WASF2; WIPF1; ARPC2;CYFIP1; ACTR3; PTK2; ACTB; ACTG1; GRB2;WASF1; BTK

Reactome

SCF(Skp2)-mediated degradation of p27/p21 0.047345739 0.918893834 0.491828902 1.502615817 2383 18 CCNE1; CDKN1B; SKP1; UBA52; SKP2 ReactomeCyclin E associated events during G1/S transi-tion

0.020869153 0.88702983 0.455670233 1.579764134 1055 29 CCNE1; CDKN1B; MYC; WEE1; SKP1; UBA52;SKP2; MAX

Reactome

Activation of E2F1 target genes at G1/S 0.012868122 0.836411609 0.575808822 1.701028997 647 16 CCNE1; RRM2; CDC6; FBXO5; DHFR; TFDP2 ReactomeG1/S-Specific Transcription 0.012868122 0.836411609 0.575808822 1.701028997 647 16 CCNE1; RRM2; CDC6; FBXO5; DHFR; TFDP2 ReactomeMitotic G1-G1/S phases 0.026535135 0.88702983 0.314030359 1.404619144 1352 98 CCNE1; POLE2; CDKN2C; CDKN1B; RRM2;

MYBL2; CDC6; FBXO5; MYC; MCM5; WEE1;SKP1; UBA52; E2F5; SKP2; DHFR; CDKN2B;TFDP2; MAX; PRIM2; CCNB1; PPP2R1A; LIN9;CDK6; RBBP4

Reactome

Cyclin A:Cdk2-associated events at S phase en-try

0.044584099 0.88702983 0.424354953 1.471197121 2255 29 CCNE1; CDC25B; CDKN1B; WEE1; SKP1;UBA52; SKP2

Reactome

23

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 24: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S2: Glioma gene set enrichment analysis (GSEA) (continued)

Pathway p value p value(adjusted)

ES NES n More Ex-treme

Size Leading Edge Genes Source

Metabolism of steroid hormones 0.001403689 0.836411609 0.569680106 1.903149269 70 25 HSD11B2; STAR; STARD3NL; CYP11A1;CYP21A2; HSD17B2; HSD11B1; FDXR

Reactome

Signaling by NOTCH 0.021931113 0.88702983 0.308373207 1.406268922 1119 110 LFNG; TLE1; FURIN; PRKCI; HDAC6; DLGAP5;EP300; MOV10; PBX1; ARRB2; NCOR2; NCSTN;POFUT1; ADAM17; HDAC5; MYC; SKP1; CREB1;ST3GAL4; YBX1; KAT2B; UBA52; JAG1

Reactome

UCH proteinases 0.030116224 0.88702983 0.312156043 1.391053503 1533 96 PSMA1; USP15; HIST1H2AG; PSMB6; ACTR5;PSMD2; PSMC3; INO80C; PSMB8; PSMD12;PSMA4; HIST1H2AL; HIST2H2AA3; HIST1H2AC;PSMD8; UBA52; BAP1; PSMD4; PSMB2; PSMD3;HIST1H2AJ; INO80E; PSMC4; YY1; ACTB;INO80D; PSME3; ACTL6A; TGFB1; MBD6;PSMB10; SENP8; MCRS1; HIST1H2AH; PSMA2

Reactome

Cell surface interactions at the vascular wall 0.042538163 0.88702983 0.287973161 1.32922636 2167 119 KRAS; CD99L2; CD58; ANGPT1; INPP5D; PPIA;THBD; TNFRSF10B; PECAM1; PIK3CB; IGLL1;TNFRSF10A; PTPN6; ITGA3; SLC16A8; CD48;PROC; GRB14; SLC7A7; SDC4; TNFRSF10D;L1CAM; SLC16A3; APOB; SLC7A11; ITGAV;FYN; PF4V1; CD244; SDC1; ITGB1; ITGAM;GRB2; ITGAL; TGFB1; SIRPA; OLR1; NRAS;PIK3R1

Reactome

Ovarian tumor domain proteases 0.018941713 0.884904638 0.421710851 1.560663791 958 38 TRAF3; OTUB2; IKBKG; ZRANB1; UBA52;UBE2D1; PTEN; OTUD7B; TP53; RHOA;OTUD7A; RIPK2; IFIH1; OTUD3; TNIP1; TRAF6

Reactome

Deubiquitination 0.044164467 0.88702983 0.243746284 1.251046765 2274 254 HIST1H2BL; PSMA1; USP15; HIST1H2AG;CDC20; PSMB6; HIST1H2BH; TRAF3; RNF123;ACTR5; EP300; TAF10; ARRB2; PSMD2;HIST2H2BE; PSMC3; MYC; OTUB2; IKBKG;INO80C; USP8; PSMB8; PSMD12; PSMA4;HIST1H2AL; HIST2H2AA3; KAT2B; ZRANB1;HIST1H2AC; PSMD8; UBA52; BAP1; SKP2;UBE2D1; USP24; PSMD4; PTEN; PSMB2;OTUD7B; DDB2; ATXN3; PSMD3; HIST1H2AJ;TP53; RHOA; USP13; USP10; POLB; INO80E;OTUD7A; MAT2B; PSMC4; YY1; ACTB; RIPK2;INO80D; IFIH1; OTUD3; PSME3; ACTL6A;SNX3; TNIP1; USP9X; BECN1; TGFB1; USP18;IDE; TRAF2; USP42; MBD6; TRAF6; NFK-BIA; PSMB10; BRCC3; SENP8; CFTR; MCRS1;RAD23A; HIST1H2AH; PSMA2

Reactome

G1/S Transition 0.008919762 0.836411609 0.370382713 1.558707807 451 70 CCNE1; POLE2; CDKN1B; RRM2; CDC6; FBXO5;MYC; MCM5; WEE1; SKP1; UBA52; SKP2;DHFR; TFDP2; MAX; PRIM2; CCNB1; PPP2R1A

Reactome

Regulation of TP53 Activity through Methyla-tion

0.031415692 0.88702983 0.534796562 1.57987239 1581 16 EP300; EHMT2; CHEK2; UBA52; TP53; TTC5;SMYD2

Reactome

Synthesis of PIPs at the plasma membrane 0.019476039 0.884904638 0.381848951 1.51517898 987 52 PIK3C2A; PIK3CG; PIK3C2B; MTM1; INPP5D;PLEKHA8; PIK3CB; PIP4K2A; PIK3CD; RAB4A;PIP5K1B; PTEN

Reactome

Regulation of TP53 Activity 0.027457495 0.88702983 0.282017677 1.348191886 1404 150 PRKAA1; MEAF6; RNF34; TBP; PRDM1;PPP1R13L; DNA2; PHF20; TAF7; EP300; TAF10;EHMT2; TP73; MAPKAP1; CHEK2; PIP4K2A;CSNK2A1; MAPK14; BANP; BRD7; UBA52;TP53BP2; BRPF3; AURKA; TAF6; BRPF1; TP53;RICTOR; PPP2R1A; ATR; TTC5; SUPT16H;SMYD2; PDPK1; RBBP4; SSRP1; ZNF385A;PPP2R5C; PRR5; TAF12; PRKAB2; RAD9B; TAF4;TOPBP1

Reactome

Transcriptional Regulation by TP53 0.008834161 0.836411609 0.254178091 1.341368606 456 325 CCNE1; POLR2G; PRKAA1; RABGGTB; CDK12;MEAF6; RNF34; YWHAB; TBP; PRDM1;PPP1R13L; DNA2; PHF20; E2F7; PRDX1;BID; TAF7; EP300; CYCS; TAF10; MOV10;EHMT2; CASP6; CDKN1B; TP73; TNFRSF10B;MAPKAP1; CHEK2; RRAGD; PIP4K2A; DDIT4;TNFRSF10A; CSNK2A1; GADD45A; MAPK14;BANP; CASP1; BRD7; TP53I3; UBA52; CCNK;PLK2; FANCC; PTEN; CDK9; TNFRSF10D;DDB2; TP53BP2; IGFBP3; BRPF3; AURKA; TAF6;BRPF1; TFDP2; PMS2; ALK; FAS; TP53; RICTOR;CCNB1; PPP2R1A; ATR; TTC5; CNOT6; BAX;SUPT16H; SMYD2; PDPK1; CCNT1; RBBP4;SSRP1; PMAIP1; MLH1; CNOT1; ZNF385A;PPP2R5C; G6PD; PRR5; TAF12; RRAGA; BBC3;CDC25C; PRKAB2; GSR; SESN2; RAD9B; TAF4;COX6B1; PRDX5; TOPBP1; GTF2H3; POLR2D;TNRC6B; RHEB; COX11; YWHAZ; EXO1

Reactome

Phospholipid metabolism 0.033944847 0.88702983 0.261235641 1.297459942 1742 197 PLD1; LPCAT4; PHOSPHO1; PIK3C2A; PN-PLA2; AGPAT4; PIK3CG; PLD6; PIK3C2B; MTM1;INPP5D; LPIN2; PLEKHA8; PITPNM3; PIK3CB;PIP4K2A; STARD7; CSNK2A1; PEMT; DGAT2;PLA2G2D; PIK3CD; LPCAT3; LIPH; PLA2G4A;GPD1L; CEPT1; PLBD1; PITPNM2; PLA2G3;RAB4A; OSBPL10; NDUFS3; PIP5K1B; PTEN;TNFAIP8L1; TNFAIP8; DDHD1; PNPLA3; PCYT2

Reactome

The phototransduction cascade 0.041476476 0.88702983 0.43702527 1.488698054 2098 27 CNGB1; RGS9BP; PPEF1; FNTA; GNB1; PDE6A;PDE6G; GNGT1; METAP2; NMT2

Reactome

24

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 25: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S2: Glioma gene set enrichment analysis (GSEA) (continued)

Pathway p value p value(adjusted)

ES NES n More Ex-treme

Size Leading Edge Genes Source

Aflatoxin activation and detoxification 0.013192612 0.836411609 -0.550534601 -1.685703373 654 18 DPEP2; CYP3A5; GGT6; AKR7A2; GGT7; ACY3;CYP3A4; DPEP3

Reactome

APC/C-mediated degradation of cell cycle pro-teins

0.033248385 0.88702983 0.39855507 1.483460988 1682 39 CDC14A; NEK2; CDC20; UBE2C; FBXO5; SKP1;UBA52; UBE2D1; UBE2E1; ANAPC1; CDC26; AU-RKA; CCNB1

Reactome

Regulation of mitotic cell cycle 0.033248385 0.88702983 0.39855507 1.483460988 1682 39 CDC14A; NEK2; CDC20; UBE2C; FBXO5; SKP1;UBA52; UBE2D1; UBE2E1; ANAPC1; CDC26; AU-RKA; CCNB1

Reactome

Cytosolic tRNA aminoacylation 0.00762633 0.836411609 0.525207006 1.7343984 384 24 AIMP2; QARS; YARS; AARS; EEF1E1; RARS;GARS; EPRS; PPA1

Reactome

Regulation of mRNA stability by proteins thatbind AU-rich elements

0.01500099 0.884904638 0.432597239 1.590864699 757 37 YWHAB; EIF4G1; ANP32A; EXOSC3; TNFSF13;EXOSC7; DCP2; PARN; MAPK14; PABPC1;XRN1; ZFP36; MAPKAPK2

Reactome

Interferon alpha/beta signaling 0.006126872 0.836411609 0.404884881 1.631392666 310 56 IFNAR2; GBP2; RNASEL; OAS2; OAS3; SOCS3;STAT2; IFIT1; PSMB8; HLA-C; PTPN6; IFITM3;IFITM1; HLA-B; EGR1; IRF1; HLA-A

Reactome

Serotonin Neurotransmitter Release Cycle 0.02909532 0.88702983 0.527192939 1.583569073 1462 17 PPFIA2; SNAP25; UNC13B; PPFIA3; SYT1; SYN3;RIMS1; RAB3A

Reactome

cGMP effects 0.030534351 0.88702983 -0.514105346 -1.574159217 1515 18 KCNMA1; PRKG1; PDE1A; PDE11A; KCNMB4;PRKG2

Reactome

Norepinephrine Neurotransmitter Release Cy-cle

0.031534841 0.88702983 0.53463497 1.57939502 1587 16 PPFIA2; SNAP25; UNC13B; PPFIA3; SYT1;SLC22A1; RIMS1; RAB3A

Reactome

Interferon Signaling 0.018124707 0.884904638 0.294380964 1.393777097 928 140 TRIM29; CAMK2A; IFNAR2; GBP2; GBP7;UBE2L6; EIF4G1; PTPN2; RNASEL; FLNB; OAS2;OAS3; SOCS3; STAT2; TRIM8; IFIT1; PSMB8;HLA-C; TRIM46; PTPN6; IFITM3; IFITM1; HLA-B; EGR1; UBE2E1; HERC5; IRF1; ARIH1; HLA-A;EIF2AK2

Reactome

ISG15 antiviral mechanism 0.037860327 0.88702983 0.426422988 1.489971563 1910 30 UBE2L6; EIF4G1; FLNB; IFIT1; UBE2E1; HERC5;ARIH1; EIF2AK2; EIF4A2; EIF4G3; UBA7

Reactome

Antiviral mechanism by IFN-stimulated genes 0.037860327 0.88702983 0.426422988 1.489971563 1910 30 UBE2L6; EIF4G1; FLNB; IFIT1; UBE2E1; HERC5;ARIH1; EIF2AK2; EIF4A2; EIF4G3; UBA7

Reactome

TCF dependent signaling in response to WNT 0.044771222 0.88702983 -0.262438651 -1.285680698 2182 170 TNKS; KLHL12; CREBBP; LGR5; CXXC4;LGR4; HIST1H4J; KAT5; SOX6; MEN1; TLE4;WNT3; HIST1H3B; USP34; HIST1H4H; RNF43;HIST1H3F; FRAT2; TRRAP; SOX13; CTNNBIP1;HIST1H4E; FRAT1; BCL9; HIST1H4B; HIST1H3A;LEF1; SMURF2; CTNNB1; PPP2R5E; LRP5;TLE3; CSNK2B; LGR6; SOX9; DKK1; DACT1;HIST1H3G; HIST1H2BI; H2AFZ; HIST1H2BO;RNF146; PPP2R1B; UBC; SFRP1

Reactome

Steroid hormones 0.00383331 0.836411609 0.485293294 1.749989429 193 34 HSD11B2; STAR; STARD3NL; CYP11A1;CYP21A2; HSD17B2; HSD11B1; CUBN; FDXR

Reactome

Cell Cycle 0.039460775 0.784765749 0.292631447 1.343666031 2010 114 CCNE1; CDC14A; YWHAB; CDC20; CDKN2C;CDC25B; EP300; TGFB2; CDKN1B; CHEK2;CDC6; MYC; MCM5; WEE1; STAG1; SKP1;CDC14B; GADD45A; E2F5; SKP2; ANAPC1;ESPL1; SMC1A; CDKN2B; STAG2; TFDP2; TP53;CCNB1; ATR; CDK6

Wikipathways

Apoptosis 0.002525053 0.431784108 0.380687215 1.649678682 127 82 CASP4; TRAF3; MAPK10; BID; CYCS; TN-FRSF1A; CASP6; TNFRSF25; TP73; TNFRSF10B;BAD; MYC; IKBKG; CASP1; NFKBIE; IRF1;DFFB; BCL2L1; FAS; TP53; TRAF1; BAX;TRADD; PMAIP1; NFKBIB; HELLS; BCL2L2;BBC3; TRAF2; PIK3R1; NFKBIA

Wikipathways

Integrated Breast Cancer Pathway 0.006332746 0.721933071 0.39496863 1.6158355 319 61 AHR; KRAS; CDH1; EP300; FOSL1; ODC1; MYC;RAD54L; RASGRP3; XRCC3; GADD45A; CREB1;NUP85; ANXA1; GDI1; IMPA1; AURKA; RAP1A;BAX; TFPI; GRN

Wikipathways

Interferon alpha-beta signaling 0.038744395 0.784765749 0.480759919 1.529793683 1943 21 GBP2; RNASEL; IFIT1; PSMB8; IFITM3; IFITM1;EGR1

Wikipathways

Mitotic G1-G1-S phases 0.016970371 0.784765749 0.532948874 1.652437416 853 19 CCNE1; RRM2; MYBL2; CDC6; FBXO5; DHFR WikipathwaysIL17 signaling pathway 0.046633872 0.784765749 0.427732478 1.468976466 2350 28 IL17RA; TRAF3; STAT3; MAP3K14; IKBKG;

IL17RD; NFKBIB; TRAF6; CEBPB; NFKB1;IL17C

Wikipathways

Peptide GPCRs 0.042102716 0.784765749 -0.362677639 -1.423763055 2087 49 CCR7; CXCR1; TACR1; TAC4; MC1R; SSTR3; ED-NRA; ATP8A1; CCR2; CCKBR; GNRHR; CCR9;GALR2; CCR10; FPR2

Wikipathways

Preimplantation Embryo 0.027191051 0.784765749 0.394865977 1.500993518 1370 43 TCF7L1; SIX3; MXD1; CDH1; PBX1; FOXD1;IRX5; E2F5; EGR1; SOX2; POU5F1; AQP9;ZFP36L2; ZFP36

Wikipathways

Transcription factor regulation in adipogenesis 0.028016939 0.784765749 -0.491467316 -1.571775394 1395 21 SLC2A4; FOXO1; IL6; NR3C1; CEBPA; LPIN1;PPARG; PCK2

Wikipathways

BDNF-TrkB Signaling 0.044264314 0.784765749 0.407704852 1.456766535 2227 33 PIK3CG; KRAS; HOMER1; EEF2K; TRPC6;CREB1; GRB2; MKNK1; NRAS; BDNF; GAB2;RHEB; GAB1; RPS6KB1

Wikipathways

TP53 Regulates Transcription of Cell DeathGenes

0.001051295 0.359542984 0.558943936 1.919600523 52 28 BID; CASP6; TNFRSF10B; TNFRSF10A; CASP1;TP53I3; TNFRSF10D; IGFBP3; FAS; BAX;PMAIP1; BBC3; BCL6; STEAP3; TP53AIP1

Wikipathways

Oxidative Damage 0.017136178 0.784765749 0.422129083 1.56995052 864 39 C1QA; TRAF3; CR2; MAPK10; CYCS; CDKN1B;BAD; GADD45A; C5AR1; BAG4; NFKBIE;TRAF1; TRAF2; TRAF6; MAP3K9; NFKB1; BAK1

Wikipathways

25

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 26: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S2: Glioma gene set enrichment analysis (GSEA) (continued)

Pathway p value p value(adjusted)

ES NES n More Ex-treme

Size Leading Edge Genes Source

Protein alkylation leading to liver fibrosis 0.044603069 0.784765749 -0.360719175 -1.416074714 2211 49 PDGFRA; SMAD3; COL4A5; IL6; RELB; SMAD2;COL4A4; COL4A1; NFKB2; COL4A3; MAPK12;SMAD1; PPARG; TIMP2; TIMP1; MMP1; MMP2;COL4A6; MAPK13; PARP1; TNFSF10; FASLG;MAPK3

Wikipathways

G1 to S cell cycle control 0.019794925 0.784765749 0.370655465 1.501831771 999 58 CCNE1; POLE2; CDKN2C; CDKN1B; MYC;MCM5; WEE1; GADD45A; CREB1; ATF6B;CDKN2B; TFDP2; PRIM2; TP53; CCNB1; CDK6

Wikipathways

Tryptophan metabolism 0.040490649 0.784765749 -0.38724215 -1.452814372 2006 40 AADAT; ALDH9A1; ALDH3A2; IDO1; OGDH;INMT; GCDH

Wikipathways

DNA Damage Response 0.033421353 0.784765749 0.342399972 1.423833769 1695 66 CCNE1; BID; CYCS; CDKN1B; TNFRSF10B;CHEK2; MYC; TLK1; GADD45A; CREB1;RAD51; DDB2; SMC1A; FAS; TP53; CCNB1; ATR;CDK6; BAX; PMAIP1; BBC3; CDC25C; H2AFX;CDK4; TP53AIP1; GADD45G; CHEK1; CCNB2;TLK2; RRM2B; E2F1; PRKDC

Wikipathways

Prostaglandin Synthesis and Regulation 0.012317274 0.784765749 0.424236849 1.604102106 620 42 PTGIR; ABCC4; HSD11B2; TBXA2R; CYP11A1;ANXA3; PTGS1; S100A10; PLA2G4A; ANXA1;HSD11B1; PTGIS; ANXA2; EDN1

Wikipathways

26

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 27: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S3: HNSC gene set over-representation

Pathway Gene setsize

p value p value(adjusted)

Jaccard In-dex

Overlapping Numberof overlap-ping

Genes Source

Nicotine addiction - Homo sapiens (human) 40 0.018223195 1 0.003460208 GRIA3; CACNA1B; GRIN3A; GRIN1;GABRB2; GRIN2A

6 lowriskgenes KEGG

Glutamatergic synapse - Homo sapiens (hu-man)

114 0.018652582 1 0.009444444 DLGAP1; GRIN1; SLC38A3; GNG2;ADCY2; GRM5; PLCB4; ITPR2; ITPR3;GNG11; SHANK2; GRIA3; GLS2; GRIN3A;PPP3CC; GRIN2A; SLC1A1

17 lowriskgenes KEGG

Autophagy - other - Homo sapiens (human) 32 0.021616644 1 0.004022989 GABARAPL1; GABARAPL2; ATG16L1;ATG7; ATG5; ATG9A; ATG2B

7 lowriskgenes KEGG

Butanoate metabolism - Homo sapiens (human) 28 0.02261948 1 0.003458213 ACSM1; ACSM3; HMGCS1; HMGCL;HADH; ECHS1

6 lowriskgenes KEGG

Long-term potentiation - Homo sapiens (hu-man)

67 0.038039107 1 0.006221719 RAF1; EP300; GRM5; GRIN1; PLCB4;ITPR2; ITPR3; RPS6KA2; GRIN2A; PPP3CC;CREBBP

11 lowriskgenes KEGG

Base excision repair - Homo sapiens (human) 33 0.041726265 1 0.004013761 POLB; POLL; OGG1; POLD1; LIG3; PCNA;PARP3

7 lowriskgenes KEGG

Beta oxidation of butanoyl-CoA to acetyl-CoA 5 0.00370472 1 0.001745201 ECHS1; ACSM3; HADH 3 lowriskgenes REACTOMERegulation of TP53 Activity through Acetyla-tion

30 0.004941329 1 0.004602992 HDAC1; AKT1; MAP2K6; EP300; PIP4K2B;ING5; BRD1; ING2

8 lowriskgenes REACTOME

PI5P Regulates TP53 Acetylation 9 0.005029165 1 0.00232288 MAP2K6; EP300; ING2; PIP4K2B 4 lowriskgenes REACTOMEMiscellaneous substrates 12 0.012809503 1 0.002320186 CYP2D6; CYP4B1; CYP2S1; CYP4F3 4 lowriskgenes REACTOMESynthesis of pyrophosphates in the cytosol 11 0.012809503 1 0.002320186 NUDT3; NUDT4; PPIP5K2; PPIP5K1 4 lowriskgenes REACTOMEMitochondrial Fatty Acid Beta-Oxidation 38 0.014087781 1 0.004589788 MCAT; PCCA; ACAD11; ECHS1; ACSM3;

ACOT9; HADH; MMAA8 lowriskgenes REACTOME

CYP2E1 reactions 11 0.015871668 1 0.001743173 CYP2D6; CYP2A6; CYP2S1 3 lowriskgenes REACTOMEPIWI-interacting RNA (piRNA) biogenesis 14 0.018556833 1 0.002318841 TDRD1; TDRD6; MAEL; TDRD9 4 lowriskgenes REACTOMEInositol phosphate metabolism 50 0.018708496 1 0.005694761 MIOX; PLCH2; PLCB4; NUDT3; NUDT4;

INPP5A; INPP5D; INPP5J; PPIP5K2;PPIP5K1

10 lowriskgenes REACTOME

Signaling by NOTCH3 44 0.024346622 1 0.005136986 WWP2; HEY1; EGF; DLL1; EP300; PBX1;MAMLD1; UBC; CREBBP

9 lowriskgenes REACTOME

APEX1-Independent Resolution of AP Sites viathe Single Nucleotide Replacement Pathway

7 0.025727636 1 0.00174216 POLB; LIG3; OGG1 3 lowriskgenes REACTOME

Toxicity of botulinum toxin type G (BoNT/G) 3 0.028037765 1 0.001163467 VAMP1; VAMP2 2 lowriskgenes REACTOMEE2F-enabled inhibition of pre-replication com-plex formation

9 0.028037765 1 0.001163467 CDK1; MCM8 2 lowriskgenes REACTOME

TRAF3-dependent IRF activation pathway 13 0.03420592 1 0.002316155 IFIH1; IRF7; EP300; CREBBP 4 lowriskgenes REACTOMESynaptic adhesion-like molecules 21 0.03523289 1 0.002886836 GRIA3; LRFN3; GRIN2A; LRFN4; GRIN1 5 lowriskgenes REACTOMEIntraflagellar transport 41 0.03662046 1 0.004574042 DYNLL2; IFT20; TNPO1; IFT74; TTC26;

IFT46; IFT81; TTC21B8 lowriskgenes REACTOME

NOTCH3 Intracellular Domain Regulates Tran-scription

20 0.043225634 1 0.00288517 HEY1; EP300; MAMLD1; CREBBP; PBX1 5 lowriskgenes REACTOME

DNA Mismatch Repair 9 0.008339762 1 0.002321532 PCNA; MSH6; RFC1; POLD1 4 lowriskgenes WikipathwaysHematopoietic Stem Cell Gene Regulation byGABP alpha-beta Complex

19 0.008593 1 0.003466205 DNMT3B; GZMB; SMAD4; GABPB1;EP300; CREBBP

6 lowriskgenes Wikipathways

Senescence and Autophagy in Cancer 106 0.016982469 1 0.009449694 TNFSF15; IGF1; SLC39A2; ING1; ING2;VTN; SMAD4; RAF1; IRF7; GABARAPL1;GABARAPL2; ATG16L1; ATG7; ATG5;PCNA; PLAT; SQSTM1

17 lowriskgenes Wikipathways

Caloric restriction and aging 9 0.038146719 1 0.001741149 SIRT1; IGF1; AKT1 3 lowriskgenes WikipathwaysSudden Infant Death Syndrome (SIDS) Suscep-tibility Pathways

159 0.045005202 1 0.010917031 DEAF1; SCN4B; GRIN1; SLC25A4; GATA2;SCN3B; GAPDH; TSPYL1; LMX1B; ESR2;MAOA; CREBBP; HDAC1; EP300; SOX2;SPTBN1; VIPR2; NOS1AP; PBX1; VAMP2

20 lowriskgenes Wikipathways

Initiation of transcription and translation elon-gation at the HIV-1 LTR

32 0.04710214 1 0.003450259 HDAC1; NFKBIA; SUPT5H; EP300; PPP3CC;CREBBP

6 lowriskgenes Wikipathways

Pathways Affected in Adenoid Cystic Carci-noma

63 0.049386532 1 0.005668934 FOXP2; RAF1; EP300; MAGI2; BRCA1;BRD1; AKT1; HIST1H2AL; IL17RD;CREBBP

10 lowriskgenes Wikipathways

Sulfur relay system - Homo sapiens (human) 8 0.005018461 0.818003735 0.00232423 MOCS3; MPST; CTU1; TST 4 highriskgenes KEGGAfrican trypanosomiasis - Homo sapiens (hu-man)

34 0.005096596 0.818003735 0.005166475 IDO1; APOA1; PRKCA; IFNG; HPR; THOP1;HBA1; HBA2; IL6

9 highriskgenes KEGG

beta-Alanine metabolism - Homo sapiens (hu-man)

31 0.017851169 1 0.004027618 AOC3; ALDH3A1; ABAT; HADHA; GAD1;ALDH3B1; ALDH1B1

7 highriskgenes KEGG

Hepatocellular carcinoma - Homo sapiens (hu-man)

168 0.018840372 1 0.013484358 SMARCC1; NQO1; ARAF; NFE2L2; SHC3;ACTB; WNT7B; GADD45A; ARID2; TERC;FZD2; FZD5; TGFBR1; LEF1; WNT10A;LRP5; WNT3; WNT1; GSTM3; CDK4;CDK6; PLCG2; ACTL6A; PRKCA; E2F1

25 highriskgenes KEGG

Salmonella infection - Homo sapiens (human) 86 0.02651946 1 0.007851935 IFNG; CXCL1; CXCL2; ARPC3; ACTB;FLNB; FLNC; MYH9; ARPC5L; IL6;ARPC1B; PKN2; DYNC1H1; CASP1

14 highriskgenes KEGG

Malaria - Homo sapiens (human) 49 0.027884803 1 0.005136986 HBA1; HBA2; THBS2; IL6; IFNG; COMP;LRP1; CSF3; TLR2

9 highriskgenes KEGG

Mineral absorption - Homo sapiens (human) 51 0.031868233 1 0.005134056 SLC26A9; MT1X; MT1M; SLC34A2; TRPV6;FTL; ATP1A3; STEAP2; SLC8A1

9 highriskgenes KEGG

Tyrosine metabolism - Homo sapiens (human) 36 0.035764986 1 0.00401837 AOC3; ALDH3A1; FAHD1; GOT1; ADH1B;ALDH3B1; COMT

7 highriskgenes KEGG

27

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 28: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S3: HNSC gene set over-representation (continued)

Pathway Gene setsize

p value p value(adjusted)

Jaccard In-dex

Overlapping Numberof overlap-ping

Genes Source

Oocyte meiosis - Homo sapiens (human) 125 0.049580177 1 0.009387079 ESPL1; CAMK2A; CDC20; CDK2; ANAPC7;RPS6KA1; CALML4; SPDYA; PPP2R5E;SKP1; PTTG2; CPEB1; SPDYE1; CPEB2;YWHAB; YWHAG; CCNB2

17 highriskgenes KEGG

Synthesis of PE 14 0.004319961 1 0.002900232 CPT1B; PHOSPHO1; PCYT2; ETNK2;ETNK1

5 highriskgenes REACTOME

Scavenging of heme from plasma 12 0.004319961 1 0.002900232 LRP1; HPR; HBA1; HBA2; APOA1 5 highriskgenes REACTOMERUNX1 and FOXP3 control the development ofregulatory T lymphocytes (Tregs)

10 0.008322419 1 0.00232288 CTLA4; IL2RA; IFNG; CBFB 4 highriskgenes REACTOME

G2 Phase 5 0.008557166 1 0.001745201 CCNA2; CDK2; E2F1 3 highriskgenes REACTOMEReelin signalling pathway 5 0.008557166 1 0.001745201 DAB1; RELN; VLDLR 3 highriskgenes REACTOMEGlycerophospholipid biosynthesis 133 0.010844308 1 0.01103144 HADHA; GPAM; LPCAT4; ETNK2;

PCYT1A; ETNK1; PCYT2; CSNK2A2;CPT1B; PITPNB; PTDSS1; LCLAT1;PITPNM3; CPNE1; DDHD2; LPCAT3;PHOSPHO1; SLC44A5; PEMT; CPNE3

20 highriskgenes REACTOME

Collagen formation 92 0.012195197 1 0.008417508 COL11A1; COL11A2; P4HA3; COL23A1;PCOLCE; COL9A3; SERPINH1; PPIB;COL10A1; LOXL1; PLOD3; PLOD1; CTSB;ITGB4; COL6A2

15 highriskgenes REACTOME

Collagen biosynthesis and modifying enzymes 68 0.014463281 1 0.006798867 COL10A1; COL11A1; COL11A2; PCOLCE;P4HA3; PLOD3; PLOD1; COL9A3; SER-PINH1; COL23A1; COL6A2; PPIB

12 highriskgenes REACTOME

Binding and Uptake of Ligands by ScavengerReceptors

41 0.015415948 1 0.005148741 STAB2; APOA1; MSR1; HPR; HBA1; HBA2;FTL; LRP1; HSP90B1

9 highriskgenes REACTOME

PTK6 Regulates Cell Cycle 6 0.015846172 1 0.001744186 PTK6; CDK2; CDK4 3 highriskgenes REACTOMERHO GTPases Activate WASPs and WAVEs 37 0.019913623 1 0.004587156 CYFIP2; ARPC3; ACTB; ARPC1B; WIPF2;

WIPF1; ACTR3; BTK8 highriskgenes REACTOME

Neutrophil degranulation 485 0.024465713 1 0.02752729 HSPA8; DBNL; ORM1; SNAP23; TRAPPC1;PRG2; FAF2; FTL; PSMD12; SIRPB1;GLA; RAP1B; IGF2R; S100A9; TMEM63A;MME; PGLYRP1; GYG1; ALDH3B1;CD44; C5AR1; TMEM173; DYNLT1;METTL7A; TMEM179B; SLC11A1; GCA;PGAM1; ATP6V0C; PRDX6; SVIP; IQ-GAP1; DNAJC13; DYNC1H1; TRIM24;CTSB; CXCL1; XRCC5; PSMD11; KRT1;PSMA5; DSG1; PLAUR; MOSPD2; ATP11B;RNASE3; PA2G4; ADAM8; TLR2; ATP8B4;FUCA2; GLB1; CD58; DEFA1B; CPNE3;CPNE1; MMP8; GLIPR1

58 highriskgenes REACTOME

2-LTR circle formation 7 0.025687478 1 0.001743173 BANF1; XRCC5; XRCC4 3 highriskgenes REACTOMEErythrocytes take up oxygen and release carbondioxide

9 0.025687478 1 0.001743173 CA4; HBA1; HBA2 3 highriskgenes REACTOME

Immune System 1827 0.026871103 1 0.05670272 HSPA8; ORM1; TRAPPC1; CLEC2B;CLEC2D; LILRB4; ANAPC7; RPS6KA1;PSMD11; PSMD12; IGF2R; PELI1; PG-LYRP1; CBLB; CD44; EIF4E2; CFI; TRIM11;TRIM10; TMEM179B; GCA; ATP11B;CD300LF; PRDX6; SVIP; WIPF2; WIPF1;IFI27; TRIM29; TRIM24; CD209; C4BPB;EGR1; HLA-DQB1; RNF138; PTPN4;PSMA5; DSG1; LGALS9; PLAUR; SIRPB1;IFI35; PA2G4; SOCS3; ARPC1B; ADAM8;CCNF; IL4R; ARPC3; ATP8B4; HIST1H3J;NFKB2; DEFA1B; TRAF2; CUL5; CAMK2A;DBNL; CD1B; HSP90B1; TNFRSF4; GLMN;GLA; RAP1B; CD200R1; IL2RA; MME;PLCG2; CTSB; NCR1; SEC61B; FUCA2;TMEM173; NFATC3; SKP1; DNAJC13;UBE2J2; MAP3K8; CXCL1; ATG12;UBE2M; KRT1; NLRX1; IL19; IL6; MO-SPD2; IL23R; RNASE3; TLR2; RNF7;HLA-G; FAF2; SEC23A; CDC20; ACTR3;GLB1; CPNE3; CPNE1; USP18; IL27RA;COLEC10; RASGRP4; PRG2; HCK; S100A9;TMEM63A; AP1S2; GYG1; BTK; C5AR1;DYNLT1; IFITM2; RNF19B; RNF19A; CSF1;CSF3; ACTB; PROS1; UNC93B1; UBE2E1;FBXO22; IQGAP1; NOD1; YWHAB;DYNC1H1; MMP8; CTLA4; KIR3DL1;PPP2R5E; CYFIP2; YES1; CASP1; STUB1;KLHL5; BTN3A2; RIPK1; TXK; GLIPR1;IL2RB; SNAP23; ATP6V1F; ATP6V1A; FTL;FLNB; EIF4A1; ITGB1; ALDH3B1; CREB1;METTL7A; ITGB7; IL9R; PGAM1; BTLA;ATP6V0C; DUSP7; CD101; CD247; XRCC5;PRR5; TNFRSF17; GBP3; TNFRSF1A;ICAM4; PPL; GAN; UBOX5; ASB16; DNM2;IFNG; BTBD6; CD58; SEC31A; SEC22B;SLC11A1

173 highriskgenes REACTOME

PTK6 Down-Regulation 3 0.028006291 1 0.001164144 SRMS; PTK6 2 highriskgenes REACTOMESynthesis of glycosylphosphatidylinositol(GPI)

18 0.028158923 1 0.002890173 PIGG; PIGN; PIGP; PIGX; PIGY 5 highriskgenes REACTOME

28

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 29: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S3: HNSC gene set over-representation (continued)

Pathway Gene setsize

p value p value(adjusted)

Jaccard In-dex

Overlapping Numberof overlap-ping

Genes Source

SCF(Skp2)-mediated degradation of p27/p21 18 0.028158923 1 0.002890173 CCNA2; PTK6; SKP1; CDK4; CDK2 5 highriskgenes REACTOMESignaling by PTK6 54 0.031364096 1 0.005685048 PTK6; GPNMB; ERBB4; SRMS; BCAR1;

EPAS1; SOCS3; CDK2; CDK4; EREG10 highriskgenes REACTOME

Signaling by Non-Receptor Tyrosine Kinases 54 0.031364096 1 0.005685048 PTK6; GPNMB; ERBB4; SRMS; BCAR1;EPAS1; SOCS3; CDK2; CDK4; EREG

10 highriskgenes REACTOME

FGFR4 ligand binding and activation 14 0.038088888 1 0.00174216 FGF18; FGFR4; FGF17 3 highriskgenes REACTOMECHL1 interactions 9 0.038088888 1 0.00174216 ITGB1; HSPA8; CHL1 3 highriskgenes REACTOMEMismatch repair (MMR) directed byMSH2:MSH3 (MutSbeta)

14 0.044112555 1 0.002316155 RPA1; MLH1; RPA3; LIG1 4 highriskgenes REACTOME

TP53 Regulates Transcription of Genes In-volved in G1 Cell Cycle Arrest

14 0.044112555 1 0.002316155 CCNA2; E2F1; CDK2; E2F8 4 highriskgenes REACTOME

Early Phase of HIV Life Cycle 15 0.044112555 1 0.002316155 LIG1; BANF1; XRCC5; XRCC4 4 highriskgenes REACTOMEMismatch repair (MMR) directed byMSH2:MSH6 (MutSalpha)

14 0.044112555 1 0.002316155 MLH1; RPA1; RPA3; LIG1 4 highriskgenes REACTOME

Synthesis of PC 29 0.04698885 1 0.003452244 CPT1B; PHOSPHO1; SLC44A5; CSNK2A2;PCYT1A; PEMT

6 highriskgenes REACTOME

Kennedy pathway from Sphingolipids 13 0.000916315 0.550705476 0.003480278 PCYT1A; PTDSS1; PEMT; PCYT2; ETNK2;ETNK1

6 highriskgenes Wikipathways

Interleukin-10 signaling 38 0.007625464 1 0.00516055 CCR1; CCL20; CCL22; CXCL1; CXCL2; TN-FRSF1A; CSF1; CSF3; IL6

9 highriskgenes Wikipathways

Tumor suppressor activity of SMARCB1 31 0.007744666 1 0.004600345 SMARCC1; EED; H3F3A; H3F3B; GLI4;CDK6; CDK4; ACTL6A

8 highriskgenes Wikipathways

Cytokines and Inflammatory Response 28 0.008568904 1 0.003468208 CSF1; IFNG; CXCL1; CXCL2; CSF3; IL6 6 highriskgenes WikipathwaysSignaling by PTK6 2 0.010001747 1 0.001164822 SOCS3; PTK6 2 highriskgenes WikipathwaysAlanine and aspartate metabolism 12 0.012783485 1 0.002321532 ABAT; GPT; GOT1; GAD1 4 highriskgenes WikipathwaysRUNX1 and FOXP3 control the development ofregulatory T lymphocytes (Tregs)

7 0.015846172 1 0.001744186 IL2RA; IFNG; CTLA4 3 highriskgenes Wikipathways

Transcriptional Regulation by MECP2 30 0.022560624 1 0.003460208 GAD1; PVALB; MOBP; CREB1; PTPN4;TRPC3

6 highriskgenes Wikipathways

Purine metabolism 7 0.025687478 1 0.001743173 ATIC; ADA; AMPD1 3 highriskgenes WikipathwaysSecretion of Hydrochloric Acid in Parietal Cells 4 0.028006291 1 0.001164144 CCKBR; HRH2 2 highriskgenes WikipathwaysApoE and miR-146 in inflammation andatherosclerosis

8 0.038088888 1 0.00174216 TRAF4; TLR2; NFKB2 3 highriskgenes Wikipathways

Spinal Cord Injury 117 0.039683084 1 0.009402655 GADD45A; RTN4R; AQP4; IFNG; CXCL1;CXCL2; IL6; EGR1; CDK2; PRKCA; GJA1;EFNB2; VCAN; CDK4; FOXO3; SOX9; E2F1

17 highriskgenes Wikipathways

NAD+ biosynthetic pathways 22 0.043132587 1 0.002886836 CD38; IDO1; SIRT3; SIRT6; SIRT5 5 highriskgenes WikipathwaysBreast cancer pathway 154 0.044118246 1 0.011425462 ARAF; GADD45A; SHC3; FZD2; CSNK2A2;

WNT7B; RAD51; SKP1; WNT10A; FZD5;NFKB2; FGF17; LEF1; LRP5; WNT3; WNT1;CDK4; CDK6; FGF18; E2F1; DLL4

21 highriskgenes Wikipathways

Genotoxicity pathway 64 0.049223554 1 0.00567215 CBLB; TNFRSF17; NLRX1; B3GNT2;FBXO22; SEL1L; MEX3B; BRMS1L; E2F8;GADD45A

10 highriskgenes Wikipathways

29

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 30: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S4: HNSC gene set enrichment analysis (GSEA)

Pathway p value p value(adjusted)

ES NES n More Ex-treme

Size Leading Edge Genes Source

B cell receptor signaling pathway - Homo sapi-ens (human)

0.009615951 0.622169841 -0.372038015 -1.563666442 489 69 PPP3CC; NFKBIA; RAF1; SYK; INPP5D;FCGR2B; CD79A; CD19; RASGRP3; AKT1;RAC2; PIK3R2; MAP2K2; MAPK1; CARD11;PIK3CB; BLNK; MALT1; SOS2; PIK3CA; PPP3CA

KEGG

Fc epsilon RI signaling pathway - Homo sapi-ens (human)

0.014185095 0.622169841 -0.376578882 -1.538260898 720 60 PDPK1; RAF1; MAP2K6; SYK; INPP5D; FCER1A;MS4A2; AKT1; RAC2; MAPK10; PIK3R2;PLA2G4D; MAP2K2; MAPK1; ALOX5AP;MAPK8; PIK3CB; JMJD7-PLA2G4B; SOS2;PIK3CA; MAP2K7; AKT3

KEGG

Cell cycle - Homo sapiens (human) 0.00644264 0.622169841 0.325408317 1.516089026 314 118 CDK6; CCNA2; CCNB2; PTTG2; ESPL1; YWHAG;E2F1; GADD45A; MCM6; ANAPC7; YWHAB;CCNB3; TTK; CDK2; SKP1; CDC20; CDK4;CDKN2B; CHEK2; FZR1; CUL1; PRKDC; CDC7;E2F5; WEE1; CDKN1C; HDAC2; ANAPC10;CCNH; MDM2; YWHAZ; TP53

KEGG

beta-Alanine metabolism - Homo sapiens (hu-man)

0.037678517 0.734661535 0.434460516 1.499564472 1859 28 ALDH3B1; ALDH1B1; AOC3; HADHA; ABAT;ALDH3A1; GAD1; ALDH6A1; SMOX

KEGG

Protein processing in endoplasmic reticulum -Homo sapiens (human)

0.028814642 0.734661535 0.276750889 1.339975858 1397 153 TRAF2; EDEM3; STT3B; HSPA8; EIF2AK4;MAN1A2; HSPA4L; SVIP; STUB1; SEC31A;SEC23A; UBE2J2; SSR2; HSP90B1; NFE2L2;SEC61B; DNAJC10; PLAA; PPP1R15A; SKP1;SEL1L; UBQLN2; DERL3; PDIA4; CUL1; RNF185;DNAJB1; MAN1A1; SSR4; RPN1; DNAJC5B;CAPN1; HSP90AA1; UBE2D1; EIF2AK1; BAK1;DNAJC5; RPN2; STT3A; TRAM1; DNAJB11;BCL2; PREB; SEC24B; RAD23A

KEGG

Autophagy - animal - Homo sapiens (human) 0.009747426 0.622169841 -0.321369757 -1.479444734 498 115 PDPK1; ATG2B; GABARAPL2; RAF1;GABARAPL1; STK11; ATG9A; ATG7; ATG5;ATG16L1; AKT1; MAPK10; PIK3R2; RRAGD;MAP2K2; MAPK1; RRAGB; MAPK8; PIK3CB;SNAP29; ATG4A; RPS6KB1; MAP3K7; DDIT4;ATG2A; IRS1; PIK3CA; CAMKK2; GABARAP;RB1CC1; ITPR1; AKT3; PPP2CA; PPP2CB;TRAF6; NRBF2; PRKACA; NRAS; BAD; MTOR

KEGG

FoxO signaling pathway - Homo sapiens (hu-man)

0.022368961 0.733204828 -0.299966974 -1.394288531 1144 122 PDPK1; EGF; GABARAPL2; RAF1; SMAD4;IGF1; GABARAPL1; STK11; EP300; SIRT1;FBXO25; CREBBP; AKT1; MAPK10; PIK3R2;FOXO4; CDKN1B; MAP2K2; MAPK1; FOXO1;TGFB3; MAPK8; PIK3CB; ATM; FASLG; IRS1;PRKAG1; SOS2; PIK3CA; G6PC3; SLC2A4;AGAP2; GABARAP; PRMT1; AKT3; HOMER2;BCL2L11; CCND2

KEGG

African trypanosomiasis - Homo sapiens (hu-man)

0.006396588 0.622169841 0.472828815 1.711557555 314 34 PRKCA; HBA1; THOP1; HBA2; APOA1; IDO1;IFNG; HPR; IL6; IL18; ICAM1; IL12B; GNAQ;FAS; APOL1; PRKCG; IL1B; SELE; PLCB3;VCAM1; TNF; PLCB2; HBB

KEGG

Autophagy - other - Homo sapiens (human) 0.014763352 0.622169841 -0.470333594 -1.630410796 747 29 ATG2B; GABARAPL2; GABARAPL1; ATG9A;ATG7; ATG5; ATG16L1; ATG4A; ATG2A;GABARAP; PPP2CA; PPP2CB

KEGG

Hepatocellular carcinoma - Homo sapiens (hu-man)

0.039846049 0.734661535 0.266959778 1.303578308 1935 162 PRKCA; WNT3; CDK6; FZD2; WNT1; NQO1;TERC; ARAF; LRP5; WNT10A; ACTL6A; ACTB;E2F1; GADD45A; ARID2; WNT7B; FZD5;NFE2L2; GSTM3; SHC3; SMARCC1; LEF1;TGFBR1; PLCG2; CDK4; PLCG1; WNT9A;WNT10B; FZD8; IGF2; GSTA4; MGST1; TP53;BAK1; WNT2B; APC2; SMAD3; CSNK1A1L;ACTG1; PRKCG; WNT7A; MAPK3; WNT16;MGST3; TGFB2

KEGG

cAMP signaling pathway - Homo sapiens (hu-man)

0.031108092 0.734661535 -0.270346575 -1.323492499 1602 172 VIPR2; PDE4B; GRIA3; PTCH1; NFKBIA; RAF1;PDE3B; ATP1B1; PDE10A; EP300; GRIN3A;GRIN2A; ADCY2; ADORA1; CNGA4; CREBBP;ATP2A2; CHRM1; AKT1; RAC2; MAPK10;ATP2B3; GRIN1; PIK3R2; ROCK1; FFAR2;MYL9; MAP2K2; MAPK1; MAPK8; PIK3CB; PT-GER2; TIAM1; GLI1; RYR2; GHRL; PIK3CA;CALM2; ADCYAP1R1; CAMK2G; FXYD1; AKT3;CACNA1C; ATP2B4; CAMK2D; ATP1B2; ACOX3;GABBR2; PRKACA; BAD; RAPGEF4

KEGG

Mineral absorption - Homo sapiens (human) 0.03957623 0.734661535 0.3745293 1.445757807 1949 45 TRPV6; FTL; SLC26A9; SLC8A1; SLC34A2;ATP1A3; MT1M; STEAP2; MT1X; MT1F; TRPM6;CYBRD1

KEGG

Toxoplasmosis - Homo sapiens (human) 0.035961542 0.734661535 -0.297539113 -1.358098979 1843 109 PDPK1; LAMB2; XIAP; NFKBIA; PIK3R5;MAP2K6; HSPA1B; IL12A; HLA-DMB; BIRC7;LAMB1; HSPA1L; AKT1; MAPK10; CIITA;MAPK1; LAMA4; TGFB3; MAPK8; HLA-DOB;MAP3K7; PIK3R6; TYK2; HLA-DOA; HSPA1A;CYCS; IL10RA; AKT3; TLR4; GNAO1; IFNGR2

KEGG

Pathogenic Escherichia coli infection - Homosapiens (human)

0.028096261 0.734661535 0.370263637 1.476426685 1386 52 ARPC3; PRKCA; TUBA1A; ACTB; ITGB1;ARPC5L; ARPC1B; CTTN; NCK1; HCLS1; YW-HAZ; TUBB2B; TUBB; ACTG1; EZR; TUBA4A;WASL; ARPC4; ROCK2; CDC42; TUBA3D;ARPC2

KEGG

30

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 31: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S4: HNSC gene set enrichment analysis (GSEA) (continued)

Pathway p value p value(adjusted)

ES NES n More Ex-treme

Size Leading Edge Genes Source

Proteoglycans in cancer - Homo sapiens (hu-man)

0.017969898 0.662640002 0.270706229 1.353055352 867 192 PRKCA; WNT3; KDR; FZD2; WNT1; ERBB4;ARAF; VEGFA; WNT10A; ACTB; ITGB1; SHH;TLR2; TWIST2; IQGAP1; FLNC; WNT7B; FZD5;IHH; CD44; CAMK2A; CTTN; PLAUR; HSPB2;FLNB; PLCG2; PLCG1; HSPG2; WNT9A; MMP2;WNT10B; HCLS1; FZD8; ARHGEF12; PPP1R12C;MDM2; IL12B; HIF1A; IGF2; TP53; RRAS;MSN; WNT2B; ITGB5; VAV2; ACTG1; PPP1CC;FAS; THBS1; PTPN11; PRKCG; ITGB3; WNT7A;CAV1; ARHGEF1; EZR; MAPK3; WNT16; ERBB3;TGFB2

KEGG

Proteasome - Homo sapiens (human) 0.011155508 0.622169841 0.422994592 1.608623215 549 42 PSMA4; PSMD11; PSMA5; PSMD12; PSMB10;IFNG; PSMA7; PSMD7; PSME2; PSMD2; PSMC3;PSMA1; PSMB5; PSMD13; PSMC1

KEGG

Neutrophil degranulation 0.001029866 0.489701339 0.255683343 1.407114861 48 448 CD58; PGAM1; SNAP23; S100A9; XRCC5;ALDH3B1; DYNLT1; MMP8; FUCA2; MME;PRG2; TRAPPC1; HSPA8; DBNL; PRDX6;PSMD11; FTL; TMEM63A; METTL7A; RNASE3;ATP11B; TRIM24; TMEM179B; ATP6V0C;ATP8B4; SVIP; IGF2R; ORM1; CPNE1; PSMA5;DSG1; TLR2; GLA; IQGAP1; DYNC1H1;DNAJC13; GYG1; DEFA1B; RAP1B; PSMD12;GLIPR1; CPNE3; PA2G4; MOSPD2; SLC11A1;GLB1; PGLYRP1; GCA; FAF2; CD44; C5AR1;PLAUR; SIRPB1; CTSB; CXCL1; ADAM8;TMEM173; KRT1; PSMD7; RAB9B; C1orf35;LAMP1; SELL; RAB5B; MAN2B1; FPR2; CD68;PYGB; NCSTN; A1BG; FCGR3B; PSMD2; PPBP;AGA; NME1-NME2; ANO6; TICAM2; CAB39;PNP; TCIRG1; PSMC3; ALOX5; CXCR1; CAPN1;DOK3; HSP90AA1; COTL1; MGST1; PTPRC;NFAM1; CYFIP1; APRT; DNAJC5; S100A8;ARL8A; QSOX1; TUBB; CR1; RHOF; OSCAR;PLD1; STK10; ATAD3B; RHOG; PPIE; PSMD13;RAB10; STBD1; PYGL; PSEN1; EEF2; CD93;SLCO4C1; MNDA; ALDOC; PTX3; CTSC

Reactome

p75NTR signals via NF-kB 0.0492891 0.939960889 -0.511101082 -1.511559311 2495 16 NFKBIA; UBC; SQSTM1; PRKCI; UBA52 ReactomeSynthesis of Leukotrienes (LT) and Eoxins(EX)

0.009007588 0.939960889 -0.555523767 -1.724026397 456 19 ALOX15; CYP4F3; GGT5; CYP4B1; ALOX5AP;DPEP1; GGT1; MAPKAPK2; CYP4F22; DPEP3

Reactome

Metabolism of polyamines 0.031602892 0.939960889 0.40786959 1.501391018 1555 36 CPS1; NQO1; ODC1; SLC6A7; GOT1; PAOX;ADI1; CKM; ASL; GATM; AZIN1; SMOX; CKB

Reactome

Collagen formation 0.039773659 0.939960889 0.316265709 1.376092502 1939 80 COL10A1; PCOLCE; COL23A1; PLOD3;COL11A1; LOXL1; ITGB4; P4HA3; COL11A2;COL6A2; PLOD1; PPIB; COL9A3; CTSB; SER-PINH1; COL5A2; BMP1; ADAMTS14; PLEC;COL28A1; PXDN; P4HA1; COL3A1; COL6A1;ITGA6; COL8A2; COL4A4; MMP1; TLL1

Reactome

Xenobiotics 0.017827111 0.939960889 -0.573694865 -1.664943148 900 15 CYP2S1; CYP2D6; CYP2A6; CYP3A4; CYP3A7;ARNT2; CYP2C8; CYP2J2

Reactome

Mitochondrial Fatty Acid Beta-Oxidation 0.011525103 0.939960889 -0.458875791 -1.64222078 583 33 MCAT; ACSM3; MMAA; HADH; PCCA; ECHS1;ACAD11; ACOT9; ACAA2

Reactome

Fc epsilon receptor (FCERI) signaling 0.026078117 0.939960889 -0.342288788 -1.444041568 1333 70 TAB3; PDPK1; MAP3K1; NFKBIA; SYK; ITPR3;LAT2; AHCYL1; ITPR2; MAPK10; ITK; PIK3R2;MAPK1; CARD11; MAPK8; PIK3CB; MAP3K7;MALT1; PIK3CA; PPP3CA; MAP2K7; ITPR1;GRAP2

Reactome

Mitotic Spindle Checkpoint 0.024735561 0.939960889 0.309686101 1.404496566 1208 101 SPC24; KIF2C; ZW10; CLIP1; UBE2E1; PPP2R5E;ANAPC7; DYNC1H1; RANGAP1; CDCA8;ZWILCH; CENPQ; CDC20; CKAP5; NSL1;AHCTF1; SKA1; NDE1; SPC25; ERCC6L;ANAPC10; CENPO; AURKB; UBE2D1; UBE2C;BIRC5; PPP1CC; INCENP; RPS27; NDC80;CLASP1; KNTC1; ANAPC11

Reactome

Cell Cycle Checkpoints 0.040547233 0.939960889 0.25051819 1.272096593 1964 216 CCNA2; SPC24; CCNB2; KIF2C; HIST1H4E;MCM10; ZW10; CLIP1; UBE2E1; RAD9B;YWHAG; PPP2R5E; MCM6; ANAPC7; RPA3;DYNC1H1; YWHAB; RANGAP1; CDCA8;ZWILCH; RPA1; HIST1H4A; CDK2; TP53BP1;CENPQ; CDC20; CKAP5; RBBP8; NBN; CHEK2;NSL1; AHCTF1; SKA1; TOPBP1; HIST1H4D;CDC7; NDE1; SPC25; WEE1; ERCC6L; ANAPC10;MDM2; CENPO; YWHAZ; AURKB; TP53;UBE2D1; PHF20; HIST3H2BB; UBE2C; BIRC5;PPP1CC; HIST1H2BO; CDC25A; INCENP; RPS27;NDC80; BARD1; HUS1; CLASP1; KNTC1; RPA2;HIST1H2BF; ANAPC11; HIST1H2BD

Reactome

EPHB-mediated forward signaling 0.019582457 0.939960889 0.44202519 1.582257976 967 32 ARPC3; ACTR3; ACTB; YES1; ARPC1B; LIMK2;ACTG1; CFL1; WASL; ARPC4; RASA1; ROCK2;CDC42; ARPC2; SDC2

Reactome

G alpha (s) signalling events 0.01497082 0.939960889 -0.311660368 -1.435912002 766 115 VIPR2; PDE4B; RAMP3; GNAL; GNG2; GPR15;PDE3B; AVPR2; PDE10A; GNG11; ADM; ADCY2;MC1R; REEP3; REEP1; GRK6; ADCYAP1; GPR27;INSL3; PTGER2; REEP6; GNB2; CALCRL;OR2A7; VIPR1; ADCYAP1R1; HTR7; PTGER4;PDE8A; GPR84; PDE7B; REEP5; PTGDR

Reactome

31

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 32: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S4: HNSC gene set enrichment analysis (GSEA) (continued)

Pathway p value p value(adjusted)

ES NES n More Ex-treme

Size Leading Edge Genes Source

Biosynthesis of DHA-derived SPMs 0.04782256 0.939960889 -0.523833477 -1.520238392 2416 15 ALOX15; CYP2D6; GSTM4; HPGD ReactomeBiosynthesis of specialized proresolving media-tors (SPMs)

0.019611325 0.939960889 -0.544865464 -1.639949165 993 17 ALOX15; CYP2D6; GSTM4; HPGD; ALOX5AP;CYP3A4

Reactome

DNA Replication 0.029718777 0.939960889 0.332906198 1.420138441 1451 72 LIG1; CCNA2; MCM10; UBE2E1; GINS2; MCM6;ANAPC7; RPA3; RPA1; CDK2; SKP1; FZR1; RPA4;CUL1; PRIM1; CDC7; POLE3; ANAPC10; POLA1;UBE2D1; UBE2C; FEN1; CDT1; POLD2; GINS1;PRIM2; RPA2; ANAPC11; ANAPC2; DNA2

Reactome

NOTCH3 Intracellular Domain Regulates Tran-scription

0.007386049 0.939960889 -0.554602994 -1.74419649 373 20 PBX1; MAMLD1; EP300; HEY1; CREBBP;MAML3; SNW1; NOTCH3; MAML2; KAT2A

Reactome

Signaling by NOTCH3 0.044282478 0.939960889 -0.377169493 -1.435944029 2247 43 PBX1; EGF; DLL1; MAMLD1; EP300; HEY1;WWP2; UBC; CREBBP

Reactome

Striated Muscle Contraction 0.015764444 0.939960889 0.464731928 1.622506601 778 29 DES; TNNI2; TNNI3; TPM2; TMOD2; VIM;ACTN3; TMOD3; MYH3; MYBPC3

Reactome

Regulation of KIT signaling 0.046898424 0.939960889 0.510831564 1.517383448 2314 16 PRKCA; SH2B3; YES1 ReactomeRegulation of PLK1 Activity at G2/M Transi-tion

0.042299971 0.939960889 0.315595016 1.370125745 2064 79 TUBA1A; NEK2; CCNB2; YWHAG; PRKAR2B;CSNK1E; DYNC1H1; ODF2; CEP72; SKP1;CEP152; CKAP5; SDCCAG8; CEP135; CUL1;CEP192; CEP250; SSNA1; NDE1; CSNK1D;HSP90AA1

Reactome

Loss of Nlp from mitotic centrosomes 0.033305511 0.939960889 0.343269053 1.42815318 1635 63 TUBA1A; NEK2; YWHAG; PRKAR2B; CSNK1E;DYNC1H1; ODF2; CEP72; CEP152; CKAP5;SDCCAG8; CEP135; CEP192; CEP250; SSNA1;NDE1; CSNK1D; HSP90AA1

Reactome

Loss of proteins required for interphase micro-tubule organization from the centrosome

0.033305511 0.939960889 0.343269053 1.42815318 1635 63 TUBA1A; NEK2; YWHAG; PRKAR2B; CSNK1E;DYNC1H1; ODF2; CEP72; CEP152; CKAP5;SDCCAG8; CEP135; CEP192; CEP250; SSNA1;NDE1; CSNK1D; HSP90AA1

Reactome

Sema3A PAK dependent Axon repulsion 0.01120295 0.939960889 0.579954407 1.722707207 552 16 PLXNA1; SEMA3A; HSP90AA1; PLXNA4; CFL1;NRP1; PAK2; PAK3; PLXNA3

Reactome

AURKA Activation by TPX2 0.04087249 0.939960889 0.333095955 1.398178779 2004 66 TUBA1A; NEK2; YWHAG; PRKAR2B; CSNK1E;DYNC1H1; ODF2; CEP72; CEP152; CKAP5;SDCCAG8; CEP135; CEP192; CEP250; SSNA1;NDE1; CSNK1D; HSP90AA1

Reactome

Common Pathway of Fibrin Clot Formation 0.005393678 0.939960889 0.599478446 1.811376587 265 17 SERPINC1; PROCR; PROS1; PF4V1; PROC; F2;F13A1

Reactome

Formation of Fibrin Clot (Clotting Cascade) 0.03314783 0.939960889 0.434728921 1.517757877 1637 29 SERPINC1; PROCR; A2M; PROS1; PF4V1; PROC;F2; F13A1

Reactome

Regulation of TP53 Activity through Acetyla-tion

0.016126801 0.939960889 -0.472019643 -1.621441316 815 28 ING5; BRD1; MAP2K6; EP300; PIP4K2B; HDAC1;ING2; AKT1; RBBP4; CHD4; AKT3; MTA2;GATAD2A; PIP4K2A; PML

Reactome

Mitotic Prometaphase 0.005201801 0.939960889 0.300752591 1.471925804 252 165 NCAPG; TUBA1A; SPC24; NEK2; CCNB2; KIF2C;ZW10; CLIP1; YWHAG; PRKAR2B; PPP2R5E;CSNK1E; DYNC1H1; RANGAP1; CDCA8;ZWILCH; ODF2; CEP72; CENPQ; CEP152;CDC20; CSNK2A2; CKAP5; NSL1; AHCTF1;SDCCAG8; CEP135; SKA1; CEP192; CEP250;SMC2; SSNA1; NDE1; SPC25; ERCC6L; CSNK1D;CENPO; HSP90AA1; AURKB; TUBG2

Reactome

M Phase 0.022609993 0.939960889 0.243127544 1.285007054 1090 300 NCAPG; PRKCA; TUBA1A; SPC24; NEK2;CCNB2; H3F3B; ESPL1; KIF2C; H3F3A;HIST1H4E; ZW10; CLIP1; UBE2E1; BANF1;YWHAG; HIST1H3J; PRKAR2B; PPP2R5E;CSNK1E; ANAPC7; NUP153; DYNC1H1; RAN-GAP1; CDCA8; ZWILCH; ODF2; NUP214;HIST1H4A; CEP72; CENPQ; H2AFJ; CEP152;CDC20; CSNK2A2; CKAP5; NSL1; AHCTF1;SDCCAG8; CEP135; NCAPD3; SKA1; CEP192;KIF23; CEP250; HIST1H4D; SMC2; SSNA1;NDE1; SPC25; ERCC6L; TMPO; NEK6; ANAPC10;CSNK1D; CENPO; HSP90AA1; AURKB; UBE2D1;TUBG2; RAB1A; HIST3H2BB; TUBB; MCPH1;UBE2C; BIRC5; MASTL; PPP1CC

Reactome

WNT ligand biogenesis and trafficking 0.02404509 0.939960889 0.476837221 1.584745034 1185 24 WNT3; WNT1; SNX3; WNT10A; WNT7B;WNT9A; WNT10B; WNT2B; WNT7A; WNT16

Reactome

APC/C-mediated degradation of cell cycle pro-teins

0.045564574 0.939960889 0.383674179 1.437702677 2239 39 CCNA2; NEK2; UBE2E1; ANAPC7; CDK2;SKP1; CDC20; FZR1; CUL1; ANAPC10; AURKB;UBE2D1; UBE2C

Reactome

Regulation of mitotic cell cycle 0.045564574 0.939960889 0.383674179 1.437702677 2239 39 CCNA2; NEK2; UBE2E1; ANAPC7; CDK2;SKP1; CDC20; FZR1; CUL1; ANAPC10; AURKB;UBE2D1; UBE2C

Reactome

Anchoring of the basal body to the plasma mem-brane

0.02712322 0.939960889 0.318816093 1.410924951 1325 88 TUBA1A; NEK2; TMEM67; YWHAG; PRKAR2B;CSNK1E; DYNC1H1; ODF2; TMEM216; CEP72;CEP152; CKAP5; SDCCAG8; CEP135; CEP192;CEP250; SEPT2; SSNA1; NDE1; AHI1; CSNK1D;HSP90AA1; CEP97; TUBB; TCTN3; MARK4

Reactome

ISG15 antiviral mechanism 0.017670503 0.939960889 0.453998516 1.596648378 868 30 UBE2E1; EIF4E2; EIF4A1; FLNB; PLCG1; DDX58;MX1; EIF4E; PPM1B; ISG15; MAPK3; TRIM25;IFIT1; EIF4G2; NEDD4; STAT1; MX2; IRF3

Reactome

Antiviral mechanism by IFN-stimulated genes 0.017670503 0.939960889 0.453998516 1.596648378 868 30 UBE2E1; EIF4E2; EIF4A1; FLNB; PLCG1; DDX58;MX1; EIF4E; PPM1B; ISG15; MAPK3; TRIM25;IFIT1; EIF4G2; NEDD4; STAT1; MX2; IRF3

Reactome

32

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 33: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S4: HNSC gene set enrichment analysis (GSEA) (continued)

Pathway p value p value(adjusted)

ES NES n More Ex-treme

Size Leading Edge Genes Source

Intraflagellar transport 0.000511398 0.486339765 -0.517400034 -1.926267214 25 39 IFT81; TTC26; TNPO1; IFT74; DYNLL2; IFT46;IFT20; TTC21B; TRIP11; CLUAP1; IFT122; TC-TEX1D2; TCTEX1D1; KIF3B; DYNLRB1; IFT172;WDR34; KIF3A

Reactome

Cell Cycle 0.004008222 0.685405961 0.337713106 1.563630653 194 114 CDK6; CCNA2; CCNB2; PTTG2; ESPL1; YWHAG;E2F1; GADD45A; MCM6; ANAPC7; YWHAB;CCNB3; TTK; CDK2; SKP1; CDC20; CDK4;CDKN2B; CHEK2; FZR1; CUL1; PRKDC; CDC7;E2F5; WEE1; CDKN1C; HDAC2; ANAPC10;CCNH; MDM2; YWHAZ; TP53

Wikipathways

MicroRNAs in cardiomyocyte hypertrophy 0.01597737 0.875280385 -0.341910037 -1.485536233 818 82 PDPK1; EGF; RAF1; IGF1; MAP2K6; LIF; RCAN1;MYEF2; GATA4; AKT1; PIK3R2; CTF1; ROCK1;MAP2K2; MAPK1; MAPK8; PIK3CB; FZD1;NFATC4; MYLK; PIK3CA; HDAC5; PPP3CA;MAP2K7; AGT; MAP3K14; IKBKE

Wikipathways

Type II diabetes mellitus 0.016466505 0.875280385 -0.544795494 -1.6667906 834 18 PIK3R5; MAPK1; MAPK8; IRS1; SLC2A4;CACNA1A; PRKCZ; INS-IGF2; SOCS4; MTOR

Wikipathways

IL-1 signaling pathway 0.014160261 0.875280385 -0.384964932 -1.544912971 723 55 TAB3; MAP3K1; NFKBIA; MAP3K3; MAP2K6;SQSTM1; AKT1; PIK3R2; MAP2K2; MAPK1;MAPK8; IRAK3; ECSIT; MAP3K7; IL1A;MAP2K7; MAP3K14; PELI2; PRKCZ; MAP-KAPK2; IRAK2; TRAF6

Wikipathways

Proteasome Degradation 0.031333251 0.875280385 0.34492225 1.43605414 1528 64 PSMA4; PSMD11; PSMA5; PSMD12; PSMB10;IFNG; HLA-G; PSMA7; PSMD7; PSME2; UBA1;PSMD2; RPN1; PSMC3; PSMD9; UBE2D1;PSMA1; PSMB5; PSMD13; PSMC1; RPN2

Wikipathways

Pathogenic Escherichia coli infection 0.028055578 0.875280385 0.370263637 1.476113637 1368 52 ARPC3; PRKCA; TUBA1A; ACTB; ITGB1;ARPC5L; ARPC1B; CTTN; NCK1; HCLS1; YW-HAZ; TUBB2B; TUBB; ACTG1; EZR; TUBA4A;WASL; ARPC4; ROCK2; CDC42; TUBA3D;ARPC2

Wikipathways

Parkin-Ubiquitin Proteasomal System pathway 0.049528639 0.875280385 0.324952116 1.369635292 2421 68 TUBA1A; RNF19A; HSPA8; PSMD11; CASP1;STUB1; UBE2J2; PSMD12; GPR37; PSMD7;CUL1; UBA1; PSMD2; HSPA7; PSMC3; CASP8;PSMD9; TUBB2B; TUBB; PSMD13; PSMC1;TUBA4A

Wikipathways

Structural Pathway of Interleukin 1 (IL-1) 0.048421218 0.875280385 -0.370206006 -1.416644582 2471 44 TAB3; MAP3K1; SAFB; MAP3K3; MAP2K6;IRF7; MBP; MAP2K2; MAPK1; MAP3K7; IL1A;MAP2K7; MAP3K14; MAPKAPK2; IRAK2;TRAF6

Wikipathways

Differentiation of white and brown adipocyte 0.029121635 0.875280385 -0.47684147 -1.559342403 1481 23 PPARG; BMP7; BMP4; EBF3; PPARGC1B;CEBPA; HOXC9; SLC7A10; BMP2; SMAD9

Wikipathways

Fatty Acid Biosynthesis 0.00303078 0.685405961 -0.569256695 -1.838913595 153 22 HADH; ECHDC2; ECH1; SCD; ECHS1; ECHDC3;ACAA2; ACLY; MECR

Wikipathways

Photodynamic therapy-induced NF-kB survivalsignaling

0.035345617 0.875280385 0.423296372 1.500017557 1735 31 CXCL2; VEGFA; TNFRSF1A; NFKB2; IL6;ICAM1; MMP2; BIRC5; IL1B; EGLN2; SELE;MMP1; VCAM1; TNF; PTGS2; RELA

Wikipathways

NAD+ biosynthetic pathways 0.013747102 0.875280385 0.531102312 1.678044213 675 20 SIRT6; CD38; SIRT3; SIRT5; IDO1; TNKS; TNKS2;QPRT; PARP2; NADSYN1; SIRT2; SIRT7; NAMPT

Wikipathways

Hematopoietic Stem Cell Gene Regulation byGABP alpha-beta Complex

0.044246392 0.875280385 -0.487433096 -1.512889891 2243 19 GZMB; SMAD4; EP300; GABPB1; CREBBP;DNMT3B

Wikipathways

miRNA regulation of prostate cancer signalingpathways

0.030626487 0.875280385 -0.436269553 -1.525391842 1557 30 NFKBIA; RAF1; PDGFA; CREBBP; CDKN1B;MAP2K2; MAPK1; FOXO1; PIK3CA; AKT3; BAD;MTOR; CCND1; CDKN1A; CREB3L1; AR; SOS1;GSK3B

Wikipathways

Interleukin-4 and Interleukin-13 signaling 0.020518514 0.875280385 0.325636153 1.441366796 997 88 IL23R; BCL6; IL4R; VEGFA; HSPA8; CCL11;FOXO3; CCL22; SOCS3; ITGB1; PIM1; IL6;GATA3; IL18; ICAM1; VIM; IRF4; MMP2; IL6R;F13A1; OSM; ALOX5; IL12B; HIF1A; HSP90AA1;TP53; MCL1; BIRC5

Wikipathways

Tumor suppressor activity of SMARCB1 0.036534978 0.875280385 0.426008872 1.498118789 1794 30 CDK6; H3F3B; H3F3A; ACTL6A; GLI4; EED;SMARCC1; CDK4; EZH2

Wikipathways

Cytokines and Inflammatory Response 0.043501197 0.875280385 0.488153519 1.522703351 2143 19 CSF3; CXCL2; IFNG; IL6; CXCL1; CSF1; IL11;IL12B; IL7; IL15; IL1B

Wikipathways

33

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 34: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S5: Lung cancer gene set over-representation

Pathway Gene setsize

p value p value(adjusted)

Jaccard In-dex

Overlapping Numberof overlap-ping

Genes Source

Glycosphingolipid biosynthesis - globo andisoglobo series - Homo sapiens (human)

15 0.034141328 1 0.002317497 B3GALT5; GLA; ST8SIA1; FUT1 4 lowriskgenes KEGG

Ferroptosis - Homo sapiens (human) 40 0.036511049 1 0.004576659 ALOX15; VDAC2; SLC11A2; ATG7;MAP1LC3A; MAP1LC3B; LPCAT3; ACSL1

8 lowriskgenes KEGG

Glycosphingolipid biosynthesis - lacto and neo-lacto series - Homo sapiens (human)

27 0.03979562 1 0.003454231 B3GNT2; B3GALT5; B3GALT2; ST8SIA1;FUT5; FUT1

6 lowriskgenes KEGG

AMPK signaling pathway - Homo sapiens (hu-man)

120 0.046105873 1 0.009392265 CRTC2; IRS1; MAP3K7; PPP2R3A; CREB5;STRADA; STRADB; PIK3R3; PPARG;EIF4EBP1; PPP2R5A; PPP2R5B; TSC2;TSC1; RAB2A; PFKFB2; PFKFB4

17 lowriskgenes KEGG

Oxygen-dependent asparagine hydroxylation ofHypoxia-inducible Factor Alpha

3 0.000999475 1 0.001747234 HIF1AN; HIF1A; EPAS1 3 lowriskgenes REACTOME

PTK6 Expression 5 0.008557166 1 0.001745201 EPAS1; HIF1A; PELP1 3 lowriskgenes REACTOMEGABA A (rho) receptor activation 3 0.010001747 1 0.001164822 GABRR1; GABRR2 2 lowriskgenes REACTOMEInterleukin-20 family signaling 15 0.018520034 1 0.002320186 IL19; STAT3; IL22RA2; IL22RA1 4 lowriskgenes REACTOMEIntra-Golgi traffic 45 0.020991086 1 0.005142857 MAN2A1; NAPA; CUX1; MAN1C1; CYTH1;

TRIP11; CYTH3; RAB33B; GOSR19 lowriskgenes REACTOME

NrCAM interactions 7 0.025687478 1 0.001743173 DLG3; ANK1; NRP2 3 lowriskgenes REACTOMENucleobase catabolism 37 0.025756807 1 0.004022989 NT5M; NT5C; NT5C1B; ENTPD6; ENTPD5;

ENTPD3; NUDT187 lowriskgenes REACTOME

Inhibition of TSC complex formation by PKB 3 0.028006291 1 0.001164144 TSC2; TSC1 2 lowriskgenes REACTOMEBasigin interactions 26 0.033341329 1 0.003456221 SLC7A8; L1CAM; BSG; ITGA3; ATP1B1;

MAG6 lowriskgenes REACTOME

LDL clearance 19 0.035155359 1 0.002888504 LIPA; NCEH1; AP2B1; NPC2; CLTA 5 lowriskgenes REACTOMEPhosphate bond hydrolysis by NTPDase pro-teins

8 0.038088888 1 0.00174216 ENTPD6; ENTPD5; ENTPD3 3 lowriskgenes REACTOME

Nef and signal transduction 8 0.038088888 1 0.00174216 DOCK2; ELMO1; LCK 3 lowriskgenes REACTOMEPyruvate metabolism 30 0.04698885 1 0.003452244 LDHAL6B; PDPR; BSG; LDHC; PDP1; PDK4 6 lowriskgenes REACTOMEExercise-induced Circadian Regulation 48 0.005708783 1 0.006274957 STBD1; CBX3; PSMA4; CLDN5; KLF9;

G0S2; HERPUD1; UCP3; ETV6; GSTM3;GFRA1

11 lowriskgenes Wikipathways

Nanoparticle triggered autophagic cell death 22 0.011223673 1 0.003466205 TSC2; TSC1; ATG7; MAP1LC3A; ULK2;ATG4A

6 lowriskgenes Wikipathways

Globo Sphingolipid Metabolism 20 0.028158923 1 0.002890173 B3GALT5; ST8SIA1; FUT1; ST6GALNAC1;ST6GALNAC2

5 lowriskgenes Wikipathways

Energy Metabolism 47 0.031868233 1 0.005134056 TFB1M; UCP3; MYBBP1A; SIRT3; PPARG;EP300; PPP3CC; MEF2A; NRF1

9 lowriskgenes Wikipathways

Ferroptosis 40 0.036511049 1 0.004576659 ALOX15; VDAC2; SLC11A2; LPCAT3;MAP1LC3A; ATG7; MAP1LC3B; ACSL1

8 lowriskgenes Wikipathways

Bladder cancer - Homo sapiens (human) 41 0.004980754 0.582276332 0.005730659 EGFR; MAP2K1; RB1; THBS1; MAPK1;HRAS; MDM2; FGFR3; E2F2; E2F3

10 highriskgenes KEGG

One carbon pool by folate - Homo sapiens (hu-man)

20 0.006359388 0.582276332 0.003474233 ALDH1L2; MTHFD1L; AMT; FTCD; GART;DHFR

6 highriskgenes KEGG

Central carbon metabolism in cancer - Homosapiens (human)

65 0.006432965 0.582276332 0.007373795 MAP2K1; HRAS; AKT2; PFKP; PFKM; HK3;LDHA; EGFR; FGFR3; G6PD; PDK1; MET;MAPK1

13 highriskgenes KEGG

Tuberculosis - Homo sapiens (human) 178 0.008120665 0.582276332 0.014069264 AKT2; MRC1; IRAK1; RAB5C; CAMK2A;CAMK2B; TLR2; TLR1; VDR; CD209;CARD9; TNFRSF1A; STAT1; MAPK1; IL18;FCGR3A; FCGR3B; BAD; CEBPB; CEBPG;BID; NOD2; MALT1; NOS2; FCGR2B;CASP3

26 highriskgenes KEGG

Pentose phosphate pathway - Homo sapiens (hu-man)

30 0.009382451 0.582276332 0.004039238 H6PD; RBKS; DERA; TKTL1; PFKM; PFKP;G6PD

7 highriskgenes KEGG

MicroRNAs in cancer - Homo sapiens (human) 299 0.01088367 0.582276332 0.013057671 HDAC4; DDIT4; MDM2; FGFR3; THBS1;CD44; PRKCG; HRAS; HMOX1; MAP2K1;SPRY2; FZD3; MAPK1; IGF2BP1; SLC7A1;CASP3; DICER1; MET; EGFR; PLCG2;PLAU; E2F3; E2F2; CCNE2

24 highriskgenes KEGG

Homologous recombination - Homo sapiens(human)

41 0.013948896 0.603046601 0.004597701 XRCC2; RAD54B; RAD54L; POLD3;BARD1; BLM; RAD51; TOP3B

8 highriskgenes KEGG

ErbB signaling pathway - Homo sapiens (hu-man)

85 0.015029199 0.603046601 0.008417508 HRAS; EGFR; MAP2K1; BAD; ERBB4;MAPK1; CAMK2A; CAMK2B; JUN;PRKCG; PLCG2; AKT2; STAT5A; EREG;BTC

15 highriskgenes KEGG

Melanoma - Homo sapiens (human) 72 0.018403282 0.645613478 0.006798867 EGFR; BAD; RB1; MAPK1; HRAS; FGF5;MAP2K1; E2F3; E2F2; AKT2; MET; MDM2

12 highriskgenes KEGG

Glioma - Homo sapiens (human) 71 0.020112569 0.645613478 0.007336343 MAP2K1; EGFR; AKT2; RB1; MAPK1;HRAS; CAMK2A; CAMK2B; PRKCG;MDM2; E2F3; E2F2; PLCG2

13 highriskgenes KEGG

Hepatitis C - Homo sapiens (human) 155 0.025040284 0.717658621 0.011487965 AKT2; OAS1; RSAD2; CLDN2; TNFRSF1A;CLDN11; CLDN15; CLDN18; MAP2K1;TICAM1; STAT1; MAPK1; HRAS; E2F2;EGFR; RIPK1; BAD; BID; RB1; E2F3;CASP3

21 highriskgenes KEGG

Non-small cell lung cancer - Homo sapiens (hu-man)

66 0.029026771 0.717658621 0.006783493 MAP2K1; PLCG2; BAD; RB1; MAPK1;HRAS; EGFR; PRKCG; E2F3; E2F2; AKT2;STAT5A

12 highriskgenes KEGG

34

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 35: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S5: Lung cancer gene set over-representation (continued)

Pathway Gene setsize

p value p value(adjusted)

Jaccard In-dex

Overlapping Numberof overlap-ping

Genes Source

Hepatitis B - Homo sapiens (human) 144 0.029064056 0.717658621 0.010970927 PTK2B; AKT2; TLR2; JUN; MAP2K1;VDAC3; TICAM1; STAT1; MAPK1; HRAS;E2F2; STAT5A; BAD; CCNA2; RB1; PRKCG;E2F3; CASP3; ATF6B; CCNE2

20 highriskgenes KEGG

Cushing syndrome - Homo sapiens (human) 154 0.033390117 0.76558769 0.011462882 MEN1; CDKN2C; CAMK2A; CAMK2B;PDE8A; ADCY4; PLCB2; ORAI1; CCNE2;ATF6B; RAP1B; FZD2; FZD3; MAPK1;EGFR; MAP2K1; RB1; DVL2; E2F3; E2F2;WNT11

21 highriskgenes KEGG

Chronic myeloid leukemia - Homo sapiens (hu-man)

76 0.037176772 0.79558292 0.007311586 GAB2; MAP2K1; BAD; RB1; MAPK1;HRAS; MDM2; E2F3; E2F2; PTPN11; AKT2;STAT5A; CTBP1

13 highriskgenes KEGG

Toll-like receptor signaling pathway - Homosapiens (human)

104 0.049311336 0.817489614 0.007829978 AKT2; CXCL9; TLR2; TLR8; JUN; TLR1;TICAM1; STAT1; MAPK1; IRAK1; SPP1;RIPK1; MAP2K1; TAB1

14 highriskgenes KEGG

Melanogenesis - Homo sapiens (human) 101 0.049507574 0.817489614 0.008356546 TYRP1; CAMK2A; CAMK2B; ADCY4;HRAS; PLCB2; FZD2; FZD3; MAPK1;MC1R; MAP2K1; DVL2; ASIP; PRKCG;WNT11

15 highriskgenes KEGG

Signaling by cytosolic FGFR1 fusion mutants 18 0.004625502 1 0.003476246 ZMYM2; STAT1; TRIM24; GAB2; STAT5A;FGFR1OP

6 highriskgenes REACTOME

Phase 0 - rapid depolarisation 44 0.006216062 1 0.005169443 SCN7A; CACNB4; SCN2A; FGF12;CAMK2A; CAMK2B; CACNA2D3;CACNA2D4; SCN5A

9 highriskgenes REACTOME

Signaling by FGFR in disease 47 0.009134022 1 0.005163511 HRAS; GAB2; STAT1; FGFR3; ZMYM2;TRIM24; FGFR1OP; FGF5; STAT5A

9 highriskgenes REACTOME

ERBB2 Regulates Cell Motility 15 0.009168191 1 0.002900232 ERBB4; EGFR; EREG; BTC; MEMO1 5 highriskgenes REACTOMEFGFR1 mutant receptor activation 31 0.009382451 1 0.004039238 GAB2; STAT1; ZMYM2; TRIM24; FGFR1OP;

FGF5; STAT5A7 highriskgenes REACTOME

Resolution of D-loop Structures through Holli-day Junction Intermediates

33 0.009382451 1 0.004039238 GEN1; XRCC2; DNA2; BARD1; BLM;RAD51; EME2

7 highriskgenes REACTOME

Signaling by FGFR1 in disease 38 0.011554828 1 0.004600345 HRAS; GAB2; STAT1; ZMYM2; TRIM24;FGFR1OP; FGF5; STAT5A

8 highriskgenes REACTOME

GRB2 events in ERBB2 signaling 16 0.012638263 1 0.002898551 ERBB4; HRAS; EGFR; EREG; BTC 5 highriskgenes REACTOMEEstablishment of Sister Chromatid Cohesion 11 0.012731557 1 0.00232423 ESCO2; ESCO1; STAG1; PDS5A 4 highriskgenes REACTOMESHC1 events in ERBB2 signaling 22 0.014332229 1 0.003468208 HRAS; EGFR; ERBB4; PTPN12; EREG; BTC 6 highriskgenes REACTOMEResolution of D-Loop Structures 35 0.014531594 1 0.004034582 GEN1; XRCC2; DNA2; BARD1; BLM;

RAD51; EME27 highriskgenes REACTOME

Formation of the Editosome 9 0.015795255 1 0.001746217 APOBEC3H; APOBEC3C; APOBEC3B 3 highriskgenes REACTOMEmRNA Editing: C to U Conversion 9 0.015795255 1 0.001746217 APOBEC3H; APOBEC3C; APOBEC3B 3 highriskgenes REACTOMETLR3-mediated TICAM1-dependent pro-grammed cell death

6 0.015795255 1 0.001746217 TICAM1; RIPK1; RIPK3 3 highriskgenes REACTOME

Zinc efflux and compartmentalization by theSLC30 family

7 0.015795255 1 0.001746217 SLC30A5; SLC30A7; SLC30A3 3 highriskgenes REACTOME

Nitric oxide stimulates guanylate cyclase 25 0.022443232 1 0.003464203 PDE3A; PRKG1; PDE5A; NOS2; KCNMB3;GUCY1B2

6 highriskgenes REACTOME

ERBB2 Activates PTK6 Signaling 13 0.025520888 1 0.002321532 ERBB4; EGFR; EREG; BTC 4 highriskgenes REACTOMEHomology Directed Repair 139 0.026858093 1 0.01046832 SUMO2; BARD1; RNF4; EME2; XRCC2;

TIMELESS; CHEK1; PARP2; HIST1H2BD;BLM; POLE; POLD3; PIAS4; GEN1; DNA2;CCNA2; RAD51; TP53BP1; TIPIN

19 highriskgenes REACTOME

Insulin-like Growth Factor-2 mRNA BindingProteins (IGF2BPs/IMPs/VICKZs) bind RNA

3 0.027943391 1 0.001165501 IGF2BP1; IGF2BP3 2 highriskgenes REACTOME

Cholesterol biosynthesis via desmosterol 4 0.027943391 1 0.001165501 DHCR7; DHCR24 2 highriskgenes REACTOMECholesterol biosynthesis via lathosterol 4 0.027943391 1 0.001165501 DHCR7; DHCR24 2 highriskgenes REACTOMESignaling by Hippo 20 0.028032267 1 0.002893519 CASP3; DVL2; LATS1; AMOTL2; AMOT 5 highriskgenes REACTOMEHDR through Homologous Recombination(HR) or Single Strand Annealing (SSA)

133 0.031241616 1 0.009944751 SUMO2; BARD1; RNF4; EME2; XRCC2;TIMELESS; CHEK1; HIST1H2BD; BLM;POLE; POLD3; PIAS4; GEN1; DNA2;CCNA2; RAD51; TP53BP1; TIPIN

18 highriskgenes REACTOME

G0 and Early G1 25 0.033174432 1 0.003460208 MYBL2; CCNA2; CCNE2; E2F5; LIN9;RBBP4

6 highriskgenes REACTOME

N-Glycan antennae elongation 15 0.034012368 1 0.002320186 ST3GAL4; MGAT4B; B4GALT4; B4GALT6 4 highriskgenes REACTOMERegulation of TLR by endogenous ligand 16 0.034012368 1 0.002320186 CD36; TLR2; TLR1; S100A8 4 highriskgenes REACTOMESHC1 events in ERBB4 signaling 14 0.034012368 1 0.002320186 HRAS; EREG; BTC; ERBB4 4 highriskgenes REACTOMEmRNA Editing 11 0.037973374 1 0.001744186 APOBEC3H; APOBEC3C; APOBEC3B 3 highriskgenes REACTOMESEMA3A-Plexin repulsion signaling by inhibit-ing Integrin adhesion

14 0.043950039 1 0.002318841 SEMA3A; PLXNA3; PLXNA4; TLN1 4 highriskgenes REACTOME

S Phase 103 0.045740468 1 0.008361204 AKT2; PDS5A; CCNE2; STAG1; ANAPC5;GINS2; GINS3; POLE; POLD3; DNA2;CCNA2; ESCO2; ESCO1; RB1; MCM5

15 highriskgenes REACTOME

Downregulation of ERBB2 signaling 29 0.046762797 1 0.003456221 EGFR; AKT2; ERBB4; PTPN12; EREG; BTC 6 highriskgenes REACTOMESignaling of Hepatocyte Growth Factor Recep-tor

34 0.0050574 0.850185102 0.005172414 HRAS; RAP1B; PTK2B; MET; JUN; MAPK1;RASA1; PTPN11; MAP2K1

9 highriskgenes Wikipathways

ErbB Signaling Pathway 92 0.005787334 0.850185102 0.009518477 AKT2; CAMK2A; CAMK2B; MDM2; HRAS;JUN; BTC; MAPK1; PLCG2; MAP2K1; BAD;STAT5A; EREG; EGFR; ERBB4; PRKCG;PDK1

17 highriskgenes Wikipathways

Osteopontin Signaling 13 0.006415057 0.850185102 0.002901915 MAP2K1; MAPK1; SPP1; PLAU; MAP3K14 5 highriskgenes Wikipathways

35

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 36: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S5: Lung cancer gene set over-representation (continued)

Pathway Gene setsize

p value p value(adjusted)

Jaccard In-dex

Overlapping Numberof overlap-ping

Genes Source

Retinoblastoma Gene in Cancer 89 0.010481201 0.850185102 0.008963585 BARD1; MDM2; RBBP4; CCNE2; CHEK1;FAF1; POLE; FANCG; POLD3; DHFR;MSH6; RRM1; CCNA2; RB1; E2F3; E2F2

16 highriskgenes Wikipathways

Androgen Receptor Network in Prostate Cancer 72 0.011073971 0.850185102 0.007357102 HRAS; MAP2K1; RAP1B; SPRY2;SMARCA4; JUN; MAPK1; HSD17B3;SMARCD3; RASA1; PTPN11; PTK2B;FOXA1

13 highriskgenes Wikipathways

EPO Receptor Signaling 26 0.011754847 0.850185102 0.004036909 MAP2K1; STAT1; MAPK1; RASA1; EPOR;PDK1; STAT5A

7 highriskgenes Wikipathways

Non-small cell lung cancer 66 0.012563889 0.850185102 0.007352941 HRAS; MAP2K1; PLCG2; RB1; MAPK1;EGFR; PRKCG; AKT2; E2F3; E2F2; BAD;PDK1; STAT5A

13 highriskgenes Wikipathways

Nanoparticle triggered regulated necrosis 10 0.012731557 0.850185102 0.00232423 TICAM1; TNFRSF1A; RIPK1; RIPK3 4 highriskgenes WikipathwaysIL-2 Signaling Pathway 42 0.017922081 0.922700962 0.005151689 HRAS; GAB2; MAP2K1; STAT1; JUN;

MAPK1; PTPN11; PTK2B; STAT5A9 highriskgenes Wikipathways

Melatonin metabolism and effects 34 0.021429024 0.922700962 0.004029937 AANAT; CYP2D6; IRAK1; CAMK2A; PER1;ACHE; MAOA

7 highriskgenes Wikipathways

Differentiation of white and brown adipocyte 25 0.022443232 0.922700962 0.003464203 CEBPB; CEBPG; ZIC1; HSPB7; BMP7;SLC7A10

6 highriskgenes Wikipathways

Factors and pathways affecting insulin-likegrowth factor (IGF1)-Akt signaling

30 0.025610088 0.922700962 0.004027618 NEB; ACVR2B; PRKAB1; IGFBP5; TRIM63;TNFRSF1A; PDK1

7 highriskgenes Wikipathways

Mammary gland development pathway - Preg-nancy and lactation (Stage 3 of 4)

33 0.030315433 0.922700962 0.004025302 EGFR; ELF5; CEBPB; GAL; ORAI1; ERBB4;PNCK

7 highriskgenes Wikipathways

IL-7 Signaling Pathway 25 0.033174432 0.922700962 0.003460208 BAD; STAT1; PTK2B; MAPK1; MAP2K1;STAT5A

6 highriskgenes Wikipathways

Senescence and Autophagy in Cancer 106 0.033262463 0.922700962 0.008903728 MDM2; HRAS; JUN; THBS1; MAP1LC3C;IGFBP5; SPARC; MAPK1; MLST8;MAP2K1; CEBPB; GABARAPL2; RB1;ATG5; CD44; PLAU

16 highriskgenes Wikipathways

Serotonin Receptor 2 and ELK-SRF-GATA4signaling

20 0.035000627 0.922700962 0.002891845 HRAS; MAP2K1; HTR2B; MAPK1; GATA4 5 highriskgenes Wikipathways

Bladder Cancer 40 0.036292907 0.922700962 0.004581901 HRAS; EGFR; MAP2K1; RB1; THBS1;MAPK1; MDM2; FGFR3

8 highriskgenes Wikipathways

Vitamin D Metabolism 10 0.037973374 0.922700962 0.001744186 VDR; DHCR7; CYP27A1 3 highriskgenes WikipathwaysNifedipine Activity 8 0.037973374 0.922700962 0.001744186 PTK2B; MAP2K1; MAPK1 3 highriskgenes Wikipathwaysfig-met-1-last-solution 14 0.043950039 0.922700962 0.002318841 PRKAB1; G6PD; PDK1; LDHA 4 highriskgenes WikipathwaysHepatitis C and Hepatocellular Carcinoma 56 0.045850477 0.922700962 0.005131129 PODXL; MYOF; JUN; CXCR1; NOS2; CD44;

CASP3; E2F2; PTPN119 highriskgenes Wikipathways

Cardiac Progenitor Differentiation 53 0.04726475 0.922700962 0.004576659 ISL1; SOX2; NCAM1; ACTC1; POU5F1;GATA4; FOXA2; SCN5A

8 highriskgenes Wikipathways

Amyotrophic lateral sclerosis (ALS) 35 0.047793856 0.922700962 0.00401837 CAT; BAD; BID; CCS; CASP3; TNFRSF1A;PRPH

7 highriskgenes Wikipathways

MET in type 1 papillary renal cell carcinoma 59 0.048898709 0.922700962 0.005678592 RAP1B; MAP2K1; HRAS; MAPK1; STRN;AKT2; JUN; PTPN11; BAD; MET

10 highriskgenes Wikipathways

Toll-like Receptor Signaling Pathway 102 0.049311336 0.922700962 0.007829978 AKT2; IRAK1; CXCL9; TLR2; TLR8; JUN;TLR1; STAT1; TICAM1; MAPK1; SPP1;RIPK1; MAP2K1; TAB1

14 highriskgenes Wikipathways

36

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 37: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S6: Lung cancer gene set enrichment analysis (GSEA)

Pathway p value p value(adjusted)

ES NES n More Ex-treme

Size Leading Edge Genes Source

Non-small cell lung cancer - Homo sapiens (hu-man)

0.010898204 0.321497006 0.374045141 1.557379546 545 66 E2F2; RB1; HRAS; PLCG2; E2F3; AKT2;MAP2K1; EGFR; MAPK1; BAD; PRKCG; STAT5A;PDPK1; NRAS; ALK; PIK3R2; GADD45A; TP53;RXRB

KEGG

Chronic myeloid leukemia - Homo sapiens (hu-man)

0.001316629 0.186477295 0.405078284 1.734856777 65 76 E2F2; PTPN11; RB1; GAB2; HRAS; E2F3;AKT2; MAP2K1; CTBP1; MAPK1; MDM2;BAD; STAT5A; NRAS; BCL2L1; RUNX1; PIK3R2;GADD45A; TP53

KEGG

Fc epsilon RI signaling pathway - Homo sapi-ens (human)

0.037671233 0.44360956 0.347892645 1.421121415 1891 60 LCP2; GAB2; HRAS; PLCG2; AKT2; MAP2K1;MAPK1; PDPK1; NRAS; VAV1; LYN; PLA2G4D;MAP2K3; PIK3R2; MAP2K4; PLA2G4A; INPP5D;MAPK13; PIK3R1; MAPK11; FYN; MAP2K2;VAV3; PIK3CD; JMJD7-PLA2G4B

KEGG

Central carbon metabolism in cancer - Homosapiens (human)

0.031231926 0.400583398 0.351556535 1.440756657 1565 61 PFKM; HRAS; AKT2; PDK1; MAP2K1; EGFR;FGFR3; PFKP; MAPK1; MET; LDHA; G6PD; HK3;NRAS; GCK; PIK3R2; PGAM2; TP53; PTEN;SLC1A5

KEGG

Melanoma - Homo sapiens (human) 0.003160632 0.186477295 0.409587644 1.682858843 157 62 E2F2; FGF5; RB1; HRAS; E2F3; AKT2; MAP2K1;EGFR; MAPK1; MET; MDM2; BAD; NRAS;PDGFB; PDGFC; PIK3R2; GADD45A; TP53;PTEN; CDH1; PDGFA; CDK6; FGF19; PIK3R1

KEGG

Allograft rejection - Homo sapiens (human) 0.042946752 0.44360956 -0.412774741 -1.466925282 2142 32 FASLG; PRF1; GZMB; IL12B; HLA-A; HLA-C;HLA-F; HLA-DPA1; HLA-DOA; CD40LG; HLA-DQA2; FAS; CD28; HLA-DRB1; CD80

KEGG

Graft-versus-host disease - Homo sapiens (hu-man)

0.022933665 0.369265734 -0.424909347 -1.54239649 1143 35 FASLG; PRF1; GZMB; KLRC1; KLRD1; HLA-A; HLA-C; HLA-F; HLA-DPA1; HLA-DOA; HLA-DQA2; FAS; KIR3DL1; CD28; HLA-DRB1; CD80

KEGG

Cushing syndrome - Homo sapiens (human) 0.024663633 0.369265734 0.28906144 1.370608778 1240 138 E2F2; PDE8A; CCNE2; ADCY4; MEN1; CAMK2B;RB1; E2F3; WNT11; MAP2K1; ATF6B; FZD3;EGFR; RAP1B; MAPK1; FZD2; CAMK2A; ORAI1;PLCB2; DVL2; CDKN2C; ARNT; ADCY3; DVL3;CREB3; ADCY7; CACNA1H; WNT10A; WNT10B;CACNA1G; CDK2; CYP11A1; GNAI3

KEGG

Bladder cancer - Homo sapiens (human) 0.006262965 0.263939228 0.450013135 1.68471731 313 40 E2F2; RB1; THBS1; HRAS; E2F3; MAP2K1;EGFR; FGFR3; MAPK1; MDM2; NRAS; TYMP;MMP9; TP53; HBEGF; CDH1

KEGG

Longevity regulating pathway - multiplespecies - Homo sapiens (human)

0.022824054 0.369265734 0.369150302 1.493315141 1146 57 ADCY4; HRAS; AKT2; FOXA2; ATG5; PRKAB1;CAT; NRAS; ADCY3; ADCY7; RPS6KB1; PIK3R2;CLPB; CRYAB; PIK3R1; IRS2; IGF1R; EIF4EBP2

KEGG

Autoimmune thyroid disease - Homo sapiens(human)

0.02827712 0.397226206 -0.429783078 -1.527369773 1410 32 FASLG; PRF1; GZMB; TPO; HLA-A; HLA-C;HLA-F; HLA-DPA1; HLA-DOA; CD40LG; HLA-DQA2; FAS; CD28; HLA-DRB1; CD80

KEGG

GnRH signaling pathway - Homo sapiens (hu-man)

0.04962412 0.44360956 0.308654456 1.346055389 2481 84 JUN; ADCY4; CAMK2B; PTK2B; HRAS;MAP2K1; EGFR; MAPK1; GNRH1; CAMK2A;PLCB2; NRAS; ADCY3; PLA2G4D; ADCY7;MAP2K3; LHB; HBEGF; CALML3; MAP2K4;PLA2G4A; MAPK13; PLCB1; CALML6

KEGG

Breast cancer - Homo sapiens (human) 0.04747314 0.44360956 0.275960724 1.305509134 2385 136 E2F2; JUN; FGF5; RB1; HRAS; E2F3; AKT2;WNT11; DLL1; MAP2K1; FZD3; EGFR; MAPK1;FZD2; DVL2; NRAS; BRCA2; CSNK1A1L; DVL3;HES5; TNFSF11; HEYL; WNT10A; RPS6KB1;PIK3R2; GADD45A; WNT10B; TP53; PTEN;CSNK1A1; DLL4; DLL3; FRAT1; SHC2; CDK6;FGF19; TCF7L2; PIK3R1

KEGG

Type I diabetes mellitus - Homo sapiens (hu-man)

0.00769014 0.283573902 -0.4521162 -1.669986296 381 38 FASLG; PRF1; PTPRN; GZMB; IL12B; HLA-A;HLA-C; HLA-F; HLA-DPA1; HLA-DOA; GAD1;HLA-DQA2; FAS; ICA1; CD28; HLA-DRB1; PT-PRN2; CD80; LTA; TNF; HLA-DPB1

KEGG

One carbon pool by folate - Homo sapiens (hu-man)

0.03057396 0.400583398 0.514629501 1.573020565 1531 18 MTHFD1L; DHFR; ALDH1L2; FTCD; AMT;GART

KEGG

Axon guidance - Homo sapiens (human) 0.048569779 0.44360956 0.263061805 1.283049144 2439 168 SEMA4B; PTPN11; ROBO3; PLXNA4; LIMK1;CAMK2B; SEMA3E; HRAS; PLCG2; SSH1;RASA1; PDK1; FZD3; SEMA6A; MAPK1; MET;SEMA3A; RYK; CAMK2A; BMP7; PLXNA3;ROBO1; NRAS; TRPC3; SRGAP2; EFNB1;DPYSL5; NTNG2; TRPC6; ROBO2; DPYSL2;RRAS; PIK3R2; NFATC2; SEMA6B; NGEF;EPHB2; SEMA3B; GNAI3; TRPC1; EFNA2;UNC5A; EPHA2

KEGG

AMPK signaling pathway - Homo sapiens (hu-man)

0.041262282 0.44360956 -0.293033377 -1.34157647 2061 110 STRADB; TSC1; RAB2A; CREB5; PFKFB4;STRADA; EIF4EBP1; PPP2R5B; IRS1; PFKFB2;PPP2R3A; TSC2; MAP3K7; PIK3R3; CRTC2;PPARG; PPP2R5A; RAB14

KEGG

Other types of O-glycan biosynthesis - Homosapiens (human)

0.047304577 0.44360956 0.502863349 1.512858902 2370 17 OGT; LFNG; MFNG; POMT1; ST3GAL3; POFUT1;B4GALT1; ST6GAL1; B4GALT3; GXYLT2

KEGG

Signaling pathways regulating pluripotency ofstem cells - Homo sapiens (human)

0.015129497 0.343323191 0.304191843 1.426586927 759 128 OTX1; SOX2; INHBE; POU5F1; HRAS; ISL1;AKT2; WNT11; MAP2K1; FZD3; FGFR3; MAPK1;FZD2; SKIL; ACVR2B; DVL2; NRAS; DLX5;DVL3; PCGF1; ACVR1B; ACVR1; JAK2;WNT10A; PIK3R2; ESRRB; REST; WNT10B;TBX3; MEIS1; SMAD1; NODAL

KEGG

37

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 38: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S6: Lung cancer gene set enrichment analysis (GSEA) (continued)

Pathway p value p value(adjusted)

ES NES n More Ex-treme

Size Leading Edge Genes Source

Glioma - Homo sapiens (human) 0.00201031 0.186477295 0.404008132 1.703994205 100 70 E2F2; CAMK2B; RB1; HRAS; PLCG2; E2F3;AKT2; MAP2K1; EGFR; MAPK1; MDM2;CAMK2A; PRKCG; NRAS; PDGFB; PIK3R2;GADD45A; TP53; PTEN; CALML3; PDGFA;SHC2; CDK6; PIK3R1; CALML6

KEGG

VEGF signaling pathway - Homo sapiens (hu-man)

0.024662115 0.369265734 0.368339576 1.483868068 1238 56 HRAS; PLCG2; AKT2; MAP2K1; SH2D2A;MAPK1; BAD; PRKCG; NRAS; PLA2G4D;PIK3R2; NFATC2; MAPKAPK3; HSPB1; SHC2;PLA2G4A; MAPK13; PIK3R1; MAPK11; PTGS2;MAP2K2; PIK3CD; PRKCB; JMJD7-PLA2G4B

KEGG

C-type lectin receptor signaling pathway -Homo sapiens (human)

0.025034965 0.369265734 0.3141687 1.410882593 1252 99 PTPN11; CD209; JUN; HRAS; PLCG2; STAT1;AKT2; CARD9; MALT1; MAPK1; MDM2;MAP3K14; RELB; NRAS; STAT2; BCL3; IL1B;PYCARD; RRAS; PIK3R2; NFATC2; RRAS2;IL10; PLK3; CALML3; MAPK13; PAK1; NLRP3;PIK3R1; CALML6; EGR2; MAPK11; MRAS

KEGG

DNA replication - Homo sapiens (human) 0.047535422 0.44360956 0.395344119 1.444286718 2381 36 MCM5; DNA2; POLD3; POLE; PRIM1; POLD1;POLE4; PRIM2; POLD4; RNASEH2B; RFC1;POLA2; RPA2; POLA1; POLE3; RFC4; LIG1;MCM4; MCM2

KEGG

Tuberculosis - Homo sapiens (human) 0.002909468 0.186477295 0.314787174 1.522957884 145 159 CEBPB; CD209; CAMK2B; TLR1; VDR; BID;IRAK1; FCGR3A; STAT1; FCGR3B; AKT2;CARD9; CEBPG; NOS2; FCGR2B; MALT1;RAB5C; CASP3; MAPK1; MRC1; IL18; CAMK2A;TLR2; BAD; TNFRSF1A; NOD2; CD14; RFX5;JAK2; IRAK4; IL1B; CYP27B1; TCIRG1;ATP6V1H; RAB7A; LBP; IL10; C3; PLK3

KEGG

MAPK signaling pathway - Homo sapiens (hu-man)

0.041069685 0.44360956 0.241246726 1.249739286 2060 268 RASGRP3; RASGRP4; JUN; FGF5; MAPK8IP2;IRAK1; CACNB4; HRAS; RASA1; AKT2;DUSP10; MAP2K1; EGFR; RAP1B; FGFR3;CASP3; MAPK1; CACNA2D3; MET; TAB1;CACNA2D4; ERBB4; DUSP8; TAOK3; IL1RAP;PRKCG; EREG; MAP3K14; TNFRSF1A;RPS6KA3; RELB; CD14; RPS6KA2; NRAS;MAP3K12; RASGRF1; MEF2C; PLA2G4D;IRAK4; MAP2K3; TAOK2; CACNA1H; IL1B;CACNA2D2; PDGFB; RRAS; NLK; PDGFC;GADD45A; TP53; RAPGEF2; MAPKAPK3;MAP4K2; RRAS2; MAPK8IP1; HSPB1;CACNA1G; RPS6KA1; EFNA2; EPHA2; PDGFA;TAB2; MAP2K4; NTF3; PLA2G4A; MAPK13;FGF19; NGFR; PAK1; MAP4K4

KEGG

Phospholipase D signaling pathway - Homosapiens (human)

0.00558147 0.263939228 0.318209685 1.501945651 279 134 PTPN11; ADCY4; GRM3; AGPAT1; GAB2;PTK2B; HRAS; PLCG2; PIP5K1A; AKT2;MAP2K1; F2R; EGFR; MAPK1; CXCR1; RALA;PLCB2; AGPAT5; DGKA; NRAS; ADCY3;AVPR1A; DGKQ; LPAR4; PLA2G4D; ADCY7;CYTH4; AGPAT4; PDGFB; DNM1; RRAS; LPAR2;PDGFC; PIK3R2; CYTH2; RRAS2; PTGFR;DGKG; GRM8; AGPAT2; PDGFA; LPAR6; SHC2;PLA2G4A; DGKD; PIK3R1; PLCB1; RALB

KEGG

Homologous recombination - Homo sapiens(human)

0.012344446 0.331055597 0.457031152 1.634852561 617 33 RAD54B; BLM; POLD3; BARD1; XRCC2;RAD54L; TOP3B; RAD51; BRCA2; RAD50;POLD1; XRCC3; TOP3A; PALB2; POLD4;BRCC3; RPA2; TOPBP1

KEGG

MicroRNAs in cancer - Homo sapiens (human) 0.009658188 0.316573927 0.301807035 1.445153351 485 147 E2F2; SPRY2; IGF2BP1; CCNE2; DDIT4; HDAC4;THBS1; HRAS; PLCG2; E2F3; MAP2K1; FZD3;EGFR; FGFR3; CASP3; HMOX1; MAPK1; MET;SLC7A1; DICER1; CD44; MDM2; PLAU; PRKCG;NRAS; SERPINB5; MARCKS; CDCA5; ZFPM2;MMP9; PDGFB; PIK3R2; TP53; PTEN; TNXB;BMF

KEGG

Hepatitis C - Homo sapiens (human) 0.032611729 0.400852503 0.284472207 1.342705187 1635 134 E2F2; CLDN2; BID; RB1; RSAD2; HRAS; STAT1;E2F3; CLDN15; AKT2; CLDN18; MAP2K1;EGFR; TICAM1; CASP3; RIPK1; MAPK1; OAS1;CLDN11; BAD; TNFRSF1A; NRAS; STAT2;PSME3; PPP2R2C; PIK3R2; IRF7; TP53; CDK2;PPARA; CDK6; PPP2R2A; MAVS; TRAF3; CD81;PIK3R1

KEGG

Hepatitis B - Homo sapiens (human) 0.018812335 0.369265734 0.299394939 1.404090596 944 128 E2F2; VDAC3; CCNE2; JUN; CCNA2; RB1;PTK2B; HRAS; STAT1; E2F3; AKT2; MAP2K1;ATF6B; TICAM1; CASP3; MAPK1; TLR2; BAD;PRKCG; STAT5A; NRAS; STAT2; CREB3; MMP9;PIK3R2; IRF7; TP53; PTEN; NFATC2

KEGG

Pentose phosphate pathway - Homo sapiens (hu-man)

0.000559016 0.164909759 0.596920902 1.993460357 27 25 PFKM; DERA; TKTL1; PFKP; RBKS; H6PD;G6PD; PGM1

KEGG

Epstein-Barr virus infection - Homo sapiens(human)

0.018406621 0.369265734 0.275739831 1.360089822 922 182 E2F2; PSMD7; NEDD4; CCNE2; JUN; BID;CCNA2; RB1; IRAK1; PLCG2; STAT1; E2F3;AKT2; CASP3; RIPK1; PSMD3; OAS1; TAB1;PSMC5; PSMD4; CD44; MDM2; TLR2; MAP3K14;RELB; STAT2; TAP2; LYN; NFKBIE; IRAK4;MAP2K3; SKP2; NCOR2; PIK3R2; IRF7; HLA-B;USP7; GADD45A; TP53; CDK2; CD40; BCL2L11;TAB2; HLA-E; MAP2K4; CDK6; MAPK13; MAVS;TRAF3

KEGG

38

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 39: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S6: Lung cancer gene set enrichment analysis (GSEA) (continued)

Pathway p value p value(adjusted)

ES NES n More Ex-treme

Size Leading Edge Genes Source

Pancreatic cancer - Homo sapiens (human) 0.014815258 0.343323191 0.35077173 1.498679661 742 75 E2F2; RB1; STAT1; E2F3; AKT2; RALBP1;MAP2K1; EGFR; MAPK1; RALA; BAD; RAD51;BCL2L1; BRCA2; RPS6KB1; PIK3R2; GADD45A;TP53

KEGG

SHC1 events in ERBB2 signaling 0.049867714 0.693910801 0.464684132 1.484522511 2487 21 PTPN12; HRAS; EGFR; BTC; ERBB4; EREG;NRAS; NRG4; HBEGF

Reactome

FRS-mediated FGFR2 signaling 0.030219285 0.693910801 0.536081572 1.586811851 1508 16 PTPN11; FGF5; HRAS; NRAS ReactomeHDR through Homologous Recombination(HR) or Single Strand Annealing (SSA)

0.043549287 0.693910801 0.289907882 1.331498575 2185 113 CHEK1; SUMO2; RNF4; CCNA2; EME2; PIAS4;DNA2; BLM; POLD3; BARD1; POLE; TP53BP1;XRCC2; TIPIN; HIST1H2BD; GEN1; TIMELESS;RAD51; BRCA2; RAD50; POLD1; XRCC3;TOP3A; POLE4; HIST1H4J; PALB2; HERC2;CDK2; POLD4; RFC1; BRCC3; UBE2V2; UBC;UBB; RNF168; RNF8; RPA2; HIST1H2BJ;HIST1H4E; TOPBP1; CCNA1; POLE3; RFC4;HIST1H4C

Reactome

FRS-mediated FGFR3 signaling 0.003178665 0.494390184 0.642618712 1.868421535 158 15 PTPN11; FGF5; HRAS; FGFR3; NRAS ReactomeDownstream signaling of activated FGFR3 0.020473039 0.693910801 0.515134035 1.625519272 1023 20 PTPN11; FGF5; HRAS; FGFR3; NRAS ReactomeNegative regulation of FGFR3 signaling 0.038239466 0.693910801 0.457902363 1.516222337 1913 24 SPRY2; PTPN11; FGF5; FGFR3; MAPK1 ReactomeEpigenetic regulation of gene expression 0.049662672 0.693910801 -0.272053667 -1.294149107 2465 141 MYBBP1A; SAP18; HIST4H4; HIST1H2BC;

GTF2H5; ACTB; TET2; HIST1H4A; HIST1H2BE;DNMT1; UHRF1; EP300; HIST1H4I; CBX3;HIST1H3C; RRP8; KAT2B; SIN3B; POLR2F;H2AFB1; HIST1H2BO; TWISTNB; CHD4;ERCC6; HIST2H2BE; HIST1H2BL; CD3EAP;HIST2H4A; MBD2; POLR2H; SAP30; HIST1H3I;DNMT3B; POLR2L; HIST1H3D; HIST2H2AC;CCNH; HIST1H3J; HIST1H4B; PHF1

Reactome

TBC/RABGAPs 0.015726605 0.693910801 -0.406640483 -1.567474319 785 45 TSC1; MAP1LC3B; RAB33B; TBC1D24; TSC2;TBC1D10C; RAB5B; TBC1D17; TBC1D14;TBC1D3B; TBC1D10B; RAB11A; RAB5A;TBC1D15; SYTL1; RABGAP1; RAB6B; RAB33A;RAB4A; RABGEF1; GABARAP; TBC1D20;TBC1D16; RAB35

Reactome

Formation of Senescence-Associated Hete-rochromatin Foci (SAHF)

0.031981576 0.693910801 0.533205784 1.578299462 1596 16 EP400; RB1; HIST1H1E; H1F0; HMGA1;HIST1H1D; TP53; HIST1H1B; LMNB1

Reactome

DNA Damage/Telomere Stress Induced Senes-cence

0.010010569 0.595003191 0.500076038 1.691868331 501 26 CCNE2; EP400; CCNA2; RB1; RAD50; HIST1H1E;H1F0; HMGA1; HIST1H1D; TP53; CDK2; CCNA1;HIST1H1B; LMNB1

Reactome

Nucleobase catabolism 0.04347044 0.693910801 -0.420826818 -1.472772591 2174 30 NT5C1B; NT5M; NUDT18; ENTPD5; ENTPD6;ENTPD3; NT5C

Reactome

Toll-Like Receptors Cascades 0.021043838 0.693910801 0.28681624 1.371154075 1057 147 PTPN11; JUN; TLR1; IRAK1; S100A12; TLR8;PLCG2; MAP2K1; CD36; TICAM1; TMEM189;RIPK1; MAPK1; RIPK3; TAB1; TLR2; S100A8;IRAK3; NOD2; RPS6KA3; CD14; RPS6KA2;MEF2C; S100A1; IRAK4; MAP2K3; DNM1; IRF7;AGER; UNC93B1; MAPKAPK3; LBP; RPS6KA1;HSP90B1; VRK3; UBE2D3; TAB2; MAP2K4;CTSS; CNPY3; TRAF3; TLR6; UBC; UBB; ATF1;CTSK; MAPK11; TLR5; RIPK2; TLR7; MAP3K8;UBE2V1; NKIRAS2; ELK1; SAA1; CTSB; NOD1;SIGIRR; APOB; RPS27A

Reactome

Homology Directed Repair 0.040099751 0.693910801 0.288549124 1.335932347 2009 119 CHEK1; SUMO2; RNF4; PARP2; CCNA2;EME2; PIAS4; DNA2; BLM; POLD3; BARD1;POLE; TP53BP1; XRCC2; TIPIN; HIST1H2BD;GEN1; TIMELESS; RAD51; BRCA2; RAD50;POLD1; XRCC3; TOP3A; POLE4; HIST1H4J;PALB2; HERC2; CDK2; POLD4; RFC1; BRCC3;UBE2V2; UBC; UBB; RNF168; RNF8; RPA2;POLQ; HIST1H2BJ; HIST1H4E; TOPBP1; CCNA1;POLE3; RFC4; HIST1H4C

Reactome

Metabolism of porphyrins 0.049059395 0.693910801 0.520588921 1.51361847 2453 15 COX15; COX10; CPOX; HMOX1; PPOX; ALAS1;ALAS2

Reactome

Signaling by PDGF 0.025909009 0.693910801 0.372675253 1.483203733 1298 52 PTPN12; PTPN11; SPP1; THBS1; HRAS; BCAR1;STAT1; RASA1; STAT5A; NRAS; COL6A6;PDGFB; PDGFC; PIK3R2; GRB7

Reactome

Fc epsilon receptor (FCERI) signaling 0.048137329 0.693910801 0.324900889 1.371130902 2405 70 RASGRP4; JUN; LCP2; GAB2; HRAS; PLCG2;MALT1; MAPK1; TAB1; PDPK1; NRAS; VAV1;LYN; PIK3R2; NFATC2; TAB2; MAP2K4; PAK1;PIK3R1

Reactome

Association of TriC/CCT with target proteinsduring biosynthesis

0.0301135 0.693910801 0.40083201 1.495593668 1506 39 CCT7; CCNE2; AP3M1; FBXL3; NOP56; DCAF7;GBA; XRN2; CCT8; FBXW10; WRAP53; TP53;FBXW9

Reactome

Cyclin D associated events in G1 0.047890308 0.693910801 0.370686061 1.420415209 2395 44 E2F2; RB1; E2F3; E2F5; CDKN2C; MNAT1;PPP2R3B; LYN; JAK2; SKP2; CDK6; PPP2R2A;UBC; UBB

Reactome

G1 Phase 0.047890308 0.693910801 0.370686061 1.420415209 2395 44 E2F2; RB1; E2F3; E2F5; CDKN2C; MNAT1;PPP2R3B; LYN; JAK2; SKP2; CDK6; PPP2R2A;UBC; UBB

Reactome

39

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 40: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S6: Lung cancer gene set enrichment analysis (GSEA) (continued)

Pathway p value p value(adjusted)

ES NES n More Ex-treme

Size Leading Edge Genes Source

Mitotic G1-G1/S phases 0.000398153 0.378643096 0.387808644 1.739909886 19 98 E2F2; RBBP4; MCM5; CCNE2; FBXO5; CCNA2;RB1; DHFR; E2F3; POLE; AKT2; MYBL2;E2F5; LIN9; CDKN2C; PRIM1; MNAT1; CCNB1;PPP2R3B; POLE4; LYN; JAK2; SKP2; PRIM2;CDK2; POLA2; CDK6; PPP2R2A; RRM2; UBC;UBB; RPA2; MCM10; POLA1; CCNA1; POLE3

Reactome

Neuronal System 0.032953331 0.693910801 0.239610601 1.257606989 1664 300 GNG5; KCNJ5; KCNMB3; LRFN3; ADCY4;SLC18A3; CAMK2B; GLRA2; GRIK3; CHRNB2;KCNN3; KCNG2; CACNB4; KCNK17; LR-RTM4; HRAS; KCNH2; KCNF1; ACHE; KCNB1;KCNA6; LRFN1; MAPK1; SHANK3; CACNA2D3;PANX1; GNAL; MDM2; CAMK2A; ABCC9;IL1RAP; AP2A2; MAOA; PRKCG; PLCB2;KCNA3; RPS6KA3; KCNN4; RPS6KA2; PDPK1;KCNK10; ADCY3; RASGRF1; KCNN1; SYT2;KCNAB2; GRIK1; ADCY7; LRFN2; GNB5;KCND2; CACNA2D2; CHRNG; RRAS; COMT;GJC1; NLGN3; KCNV2; KCNQ3; KCNQ5;KCNJ10; AP2M1; DLGAP2; KCNJ9; RPS6KA1;KCNH1; GNAI3; SLC1A2; SLC38A1; GABBR2;GNB4; CHRNA5; ACTN2; KCNH8; EPB41L2;KCNK9; ALDH5A1; GABRB2; PLCB1; PTPRD;DNAJC5; SYN1; UNC13B; RTN3; APBA2; GNG10;GALNT8; GRM1; NRXN2; SLC6A1; HTR3A

Reactome

Orc1 removal from chromatin 0.039286643 0.693910801 0.485087392 1.530706284 1964 20 MCM5; CCNA2; SKP2; CDK2; UBC; UBB;CCNA1; MCM4; MCM2; RPS27A

Reactome

DNA Replication 0.005616406 0.494390184 0.377161398 1.600913733 280 72 MCM5; CCNE2; CCNA2; DNA2; POLD3; POLE;ANAPC5; GINS2; GINS3; PRIM1; POLD1; POLE4;SKP2; PRIM2; CDK2; POLD4; RFC1; POLA2;UBE2C; ANAPC1; UBC; UBB; RPA2; MCM10;POLA1; CCNA1; POLE3; RFC4; LIG1; MCM4;CDC7; MCM2; CDC23

Reactome

DNA strand elongation 0.024281417 0.693910801 0.436005842 1.553868982 1218 32 MCM5; DNA2; POLD3; GINS2; GINS3; PRIM1;POLD1; PRIM2; POLD4; RFC1; POLA2; RPA2;POLA1; RFC4; LIG1; MCM4; MCM2

Reactome

Synthesis of DNA 0.005718498 0.494390184 0.384786885 1.611552406 286 67 MCM5; CCNE2; CCNA2; DNA2; POLD3; POLE;ANAPC5; GINS2; GINS3; PRIM1; POLD1; POLE4;SKP2; PRIM2; CDK2; POLD4; RFC1; POLA2;UBE2C; ANAPC1; UBC; UBB; RPA2; POLA1;CCNA1; POLE3; RFC4; LIG1; MCM4; MCM2;CDC23

Reactome

S Phase 0.000997865 0.474484603 0.377374196 1.680267365 49 94 MCM5; CCNE2; CCNA2; RB1; DNA2; POLD3;POLE; AKT2; ESCO2; ANAPC5; PDS5A; ESCO1;STAG1; GINS2; GINS3; PRIM1; MNAT1; POLD1;CDCA5; POLE4; SKP2; STAG2; PRIM2; CDK2;POLD4; RFC1; POLA2; PDS5B; UBE2C; ANAPC1;UBC; UBB; RPA2; POLA1; CCNA1; POLE3; RFC4;LIG1; MCM4; MCM2; CDC23; PTK6

Reactome

Meiotic synapsis 0.041102204 0.693910801 -0.393306518 -1.456675064 2050 38 HIST4H4; HIST1H2BC; HIST1H4A; REC8;HIST1H2BE; SYCP2; HIST1H4I; HIST1H2BO;HIST2H2BE; HIST1H2BL; HIST2H4A

Reactome

EGFR downregulation 0.046340053 0.693910801 0.442271868 1.481488775 2323 25 SPRY2; PTPN12; EGFR; SH3GL2; EPN1;SH3KBP1; HGS; STAM2; UBC; UBB

Reactome

Signaling by EGFR 0.004887976 0.494390184 0.446018221 1.700639482 244 43 SPRY2; PTPN12; PTPN11; HRAS; EGFR;ADAM17; SH3GL2; NRAS; EPN1; ADAM10;PAG1; SH3KBP1; HGS; STAM2; PIK3R1; UBC;UBB

Reactome

Phase 0 - rapid depolarisation 0.012621573 0.666839747 0.445744637 1.622450036 631 35 FGF12; CAMK2B; SCN2A; CACNB4; SCN5A;CACNA2D3; SCN7A; CACNA2D4; CAMK2A

Reactome

Telomere C-strand (Lagging Strand) Synthesis 0.019878928 0.693910801 0.485981376 1.609198551 994 24 DNA2; POLD3; POLE; PRIM1; POLD1; POLE4;PRIM2; POLD4; RFC1; POLA2; RPA2; POLA1;POLE3; RFC4; LIG1

Reactome

Extension of Telomeres 0.026056676 0.693910801 0.44073632 1.543616374 1301 30 DNA2; POLD3; POLE; PRIM1; POLD1; POLE4;WRAP53; PRIM2; POLD4; RFC1; POLA2; RPA2;POLA1; POLE3; RFC4; LIG1

Reactome

activated TAK1 mediates p38 MAPK activation 0.041845772 0.693910801 0.481891353 1.520621099 2092 20 IRAK1; TMEM189; TAB1; NOD2; MAP2K3; MAP-KAPK3; TAB2; MAPK11; RIPK2; UBE2V1; NOD1;IRAK2; MAP2K6; IKBKG

Reactome

Ub-specific processing proteases 0.018451665 0.693910801 -0.273912588 -1.355853169 919 184 WDR20; USP20; POLB; IL33; PSMA4; CDC25A;VDAC2; FKBP8; HIST1H2BC; HIST1H2AH;SMAD7; PSMA6; ADRB2; PSMD10; USP33;HIST2H2BF; HIST1H2BE; CYLD; HIF1A; FOXO4;MAP3K7; USP2; USP30; RNF123; SMURF2;IFIH1; USP37; HIST1H2BO; TAF10; RCE1; AR;WDR48; USP47; MUL1; PSMA3; TAF9B; TNKS2;PSMD12; PSMD14; HIST2H2BE; USP25; USP22;PSMB3; HIST1H2BL; PSMB10; HIST3H2A;PSMA5

Reactome

40

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 41: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S6: Lung cancer gene set enrichment analysis (GSEA) (continued)

Pathway p value p value(adjusted)

ES NES n More Ex-treme

Size Leading Edge Genes Source

Rab regulation of trafficking 0.016865438 0.693910801 -0.310814143 -1.429285834 839 113 TSC1; DENND2D; DENND1C; MON1B;MAP1LC3B; RAB13; TRAPPC5; TRAPPC4;RAB33B; TBC1D24; TSC2; DENND3; SBF2;TBC1D10C; RAB14; RAB5B; DENND5B;TBC1D17; RAB3IL1; GDI2; TBC1D14; TBC1D3B;TBC1D10B; RAB11A; TRAPPC6B; RAB5A;TBC1D15; RINL; GDI1; CHML; DENND4B; CHM;TRAPPC2L; SYTL1; DENND2C; RABGAP1;RAB6B; TRAPPC3; RAB33A; RAB4A; RAB27B;RAB18; RAB27A; RABGEF1; GABARAP;TBC1D20; TBC1D16; RAB1A; RAB35

Reactome

GRB2 events in ERBB2 signaling 0.0290478 0.693910801 0.5474155 1.591617065 1452 15 HRAS; EGFR; BTC; ERBB4; EREG; NRAS; NRG4;HBEGF; NRG1

Reactome

Deubiquitination 0.025772574 0.693910801 -0.250968153 -1.293757621 1280 254 WDR20; USP20; POLB; IL33; PSMA4; CDC25A;OTUD5; VDAC2; FKBP8; HIST1H2BC;HIST1H2AH; ACTB; SMAD7; PSMA6; ADRB2;PSMD10; USP33; HIST2H2BF; HIST1H2BE;CYLD; FOXK2; HIF1A; EP300; TNFAIP3;FOXO4; MAP3K7; USP2; KAT2B; ACTR8;STAM; USP30; RNF123; RHOA; SMURF2; IFIH1;USP37; HIST1H2BO; TAF10; RCE1; AR; WDR48;RAD23A; USP47; MUL1; PSMA3; TAF9B; TNKS2;PSMD12; PSMD14; BAP1; KDM1B; HIST2H2BE;USP25; USP22; MCRS1; PSMB3; HIST1H2BL;TNIP3; PSMB10; HIST3H2A; PSMA5

Reactome

G1/S Transition 0.008623104 0.546704814 0.370644315 1.564175082 430 70 MCM5; CCNE2; FBXO5; CCNA2; RB1; DHFR;POLE; AKT2; PRIM1; MNAT1; CCNB1; PPP2R3B;POLE4; SKP2; PRIM2; CDK2; POLA2; RRM2;UBC; UBB; RPA2; MCM10; POLA1; CCNA1;POLE3; MCM4; CDC7; MCM2; PTK6

Reactome

Regulation of expression of SLITs and ROBOs 0.011538385 0.645470811 0.557460704 1.707824344 576 18 ROBO3; ISL1; MSI1; LHX4; ROBO1; HOXA2;ROBO2; DAG1; LHX2; LDB1; UBC; UBB; SLIT1

Reactome

Neurotransmitter receptors and postsynapticsignal transmission

0.035910224 0.693910801 0.291106071 1.347770564 1799 119 GNG5; KCNJ5; ADCY4; CAMK2B; GLRA2;GRIK3; CHRNB2; HRAS; MAPK1; GNAL; MDM2;CAMK2A; AP2A2; PRKCG; PLCB2; RPS6KA3;RPS6KA2; PDPK1; ADCY3; RASGRF1; GRIK1;ADCY7; GNB5; CHRNG; RRAS; KCNJ10; AP2M1;KCNJ9; RPS6KA1; GNAI3; GABBR2; GNB4;CHRNA5; ACTN2; GABRB2; PLCB1

Reactome

Signaling by SCF-KIT 0.002292435 0.494390184 0.468311909 1.776418915 114 42 CHEK1; PTPN11; SOCS6; GAB2; HRAS; SH2B3;STAT1; STAT5A; NRAS; VAV1; LYN; JAK2;MMP9; PIK3R2; GRAP; GRB7

Reactome

Signaling by FGFR in disease 0.002600676 0.494390184 0.479804168 1.768849487 129 37 FGF5; GAB2; HRAS; STAT1; TRIM24; FGFR1OP;FGFR3; ZMYM2; STAT5A; NRAS; ERLIN2

Reactome

DNA Replication Pre-Initiation 0.042837992 0.693910801 0.416569436 1.471162121 2140 31 MCM5; POLE; PRIM1; POLE4; PRIM2; CDK2;POLA2; UBC; UBB; RPA2; MCM10; POLA1;POLE3; MCM4; CDC7; MCM2; RPS27A; RPA1;GMNN

Reactome

M/G1 Transition 0.042837992 0.693910801 0.416569436 1.471162121 2140 31 MCM5; POLE; PRIM1; POLE4; PRIM2; CDK2;POLA2; UBC; UBB; RPA2; MCM10; POLA1;POLE3; MCM4; CDC7; MCM2; RPS27A; RPA1;GMNN

Reactome

Post NMDA receptor activation events 0.047347768 0.693910801 0.408297073 1.455118479 2376 32 CAMK2B; HRAS; MAPK1; CAMK2A; RPS6KA3;RPS6KA2; PDPK1; ADCY3; RASGRF1; RRAS;RPS6KA1; ACTN2

Reactome

Toll Like Receptor 4 (TLR4) Cascade 0.0301014 0.693910801 0.291253991 1.360552593 1510 126 PTPN11; JUN; TLR1; IRAK1; S100A12; PLCG2;MAP2K1; CD36; TICAM1; TMEM189; RIPK1;MAPK1; RIPK3; TAB1; TLR2; IRAK3; NOD2;RPS6KA3; CD14; RPS6KA2; MEF2C; IRAK4;MAP2K3; DNM1; IRF7; AGER; MAPKAPK3; LBP;RPS6KA1; VRK3; UBE2D3; TAB2; MAP2K4;TRAF3; TLR6; UBC; UBB; ATF1

Reactome

Signaling by cytosolic FGFR1 fusion mutants 0.02521699 0.693910801 0.533694057 1.606635804 1257 17 GAB2; STAT1; TRIM24; FGFR1OP; ZMYM2;STAT5A; FGFR1OP2; CPSF6; PIK3R1

Reactome

FGFR1 mutant receptor activation 0.008454468 0.546704814 0.515043237 1.725252787 423 25 FGF5; GAB2; STAT1; TRIM24; FGFR1OP;ZMYM2; STAT5A; ERLIN2; FGFR1OP2; CPSF6;PIK3R1

Reactome

Signaling by FGFR1 in disease 0.003983826 0.494390184 0.494024464 1.76063992 199 32 FGF5; GAB2; HRAS; STAT1; TRIM24; FGFR1OP;ZMYM2; STAT5A; NRAS; ERLIN2; FGFR1OP2;CPSF6; PIK3R1

Reactome

Gastrin-CREB signalling pathway via PKC andMAPK

0.005151643 0.494390184 0.602598536 1.814066261 256 17 HRAS; EGFR; MAPK1; RPS6KA3; RPS6KA2;NRAS; RPS6KA1; HBEGF

Reactome

MyD88:Mal cascade initiated on plasma mem-brane

0.046558947 0.693910801 0.30108643 1.345321599 2333 96 JUN; TLR1; IRAK1; S100A12; MAP2K1; CD36;TMEM189; MAPK1; TAB1; TLR2; IRAK3; NOD2;RPS6KA3; CD14; RPS6KA2; MEF2C; IRAK4;MAP2K3; AGER; MAPKAPK3; RPS6KA1; VRK3;TAB2; MAP2K4; TLR6; UBC; UBB; ATF1;MAPK11; RIPK2; MAP3K8; UBE2V1; NKIRAS2;ELK1; SAA1; NOD1; SIGIRR; RPS27A

Reactome

41

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 42: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S6: Lung cancer gene set enrichment analysis (GSEA) (continued)

Pathway p value p value(adjusted)

ES NES n More Ex-treme

Size Leading Edge Genes Source

Toll Like Receptor TLR1:TLR2 Cascade 0.046558947 0.693910801 0.30108643 1.345321599 2333 96 JUN; TLR1; IRAK1; S100A12; MAP2K1; CD36;TMEM189; MAPK1; TAB1; TLR2; IRAK3; NOD2;RPS6KA3; CD14; RPS6KA2; MEF2C; IRAK4;MAP2K3; AGER; MAPKAPK3; RPS6KA1; VRK3;TAB2; MAP2K4; TLR6; UBC; UBB; ATF1;MAPK11; RIPK2; MAP3K8; UBE2V1; NKIRAS2;ELK1; SAA1; NOD1; SIGIRR; RPS27A

Reactome

Toll Like Receptor TLR6:TLR2 Cascade 0.046558947 0.693910801 0.30108643 1.345321599 2333 96 JUN; TLR1; IRAK1; S100A12; MAP2K1; CD36;TMEM189; MAPK1; TAB1; TLR2; IRAK3; NOD2;RPS6KA3; CD14; RPS6KA2; MEF2C; IRAK4;MAP2K3; AGER; MAPKAPK3; RPS6KA1; VRK3;TAB2; MAP2K4; TLR6; UBC; UBB; ATF1;MAPK11; RIPK2; MAP3K8; UBE2V1; NKIRAS2;ELK1; SAA1; NOD1; SIGIRR; RPS27A

Reactome

Toll Like Receptor 2 (TLR2) Cascade 0.046558947 0.693910801 0.30108643 1.345321599 2333 96 JUN; TLR1; IRAK1; S100A12; MAP2K1; CD36;TMEM189; MAPK1; TAB1; TLR2; IRAK3; NOD2;RPS6KA3; CD14; RPS6KA2; MEF2C; IRAK4;MAP2K3; AGER; MAPKAPK3; RPS6KA1; VRK3;TAB2; MAP2K4; TLR6; UBC; UBB; ATF1;MAPK11; RIPK2; MAP3K8; UBE2V1; NKIRAS2;ELK1; SAA1; NOD1; SIGIRR; RPS27A

Reactome

FRS-mediated FGFR1 signaling 0.020411427 0.693910801 0.563807735 1.639277683 1020 15 PTPN11; FGF5; HRAS; NRAS ReactomeResolution of D-loop Structures through Holli-day Junction Intermediates

0.007078623 0.546704814 0.521156029 1.745728954 354 25 EME2; DNA2; BLM; BARD1; XRCC2; GEN1;RAD51; BRCA2; RAD50; XRCC3; TOP3A; PALB2

Reactome

Resolution of D-loop Structures throughSynthesis-Dependent Strand Annealing(SDSA)

0.014896684 0.693910801 0.505722889 1.655319564 743 23 DNA2; BLM; BARD1; XRCC2; RAD51; BRCA2;RAD50; XRCC3; TOP3A; PALB2

Reactome

Resolution of D-Loop Structures 0.008340317 0.546704814 0.50189357 1.714001031 417 27 EME2; DNA2; BLM; BARD1; XRCC2; GEN1;RAD51; BRCA2; RAD50; XRCC3; TOP3A; PALB2

Reactome

NOTCH1 regulation of human endothelial cellcalcification

0.012947907 0.429593065 0.565655774 1.703454736 644 17 SOX6; DLL1; FGFR3; DLL4; DLL3; MGP; ALPL;JAG2; ITGA1; CALU; SAT1

Wikipathways

Energy Metabolism 0.01705587 0.429593065 -0.404783261 -1.55838458 855 45 SIRT3; MYBBP1A; UCP3; MEF2A; PPP3CC;TFB1M; EP300; NRF1; PPARG; PPP3CB;PRKAA2; ATF2; PRMT1; MEF2B

Wikipathways

Fluoropyrimidine Activity 0.007188338 0.383959044 0.4843776 1.70963535 358 31 UPP2; DHFR; RRM1; GGH; UMPS; FPGS; XRCC3;TYMP; CES2; ABCC5; TP53; ABCG2; RRM2;ABCC3; TDG; UPP1

Wikipathways

IL-1 signaling pathway 0.02181235 0.429593065 0.374172869 1.503218196 1087 55 PTPN11; JUN; IRAK1; MAP2K1; MAPK1; TAB1;IL1RAP; MAP3K14; IRAK3; IRAK4; MAP2K3;IL1B; PIK3R2; TAB2; MAP2K4; PIK3R1

Wikipathways

IL-7 Signaling Pathway 0.023391696 0.429593065 0.474438774 1.586739115 1168 25 PTK2B; STAT1; MAP2K1; MAPK1; BAD; STAT5A;IL7; BCL2L1; PIK3R2

Wikipathways

Androgen Receptor Network in Prostate Cancer 0.044706917 0.490576187 0.334425835 1.390117319 2236 65 SPRY2; PTPN11; FOXA1; JUN; PTK2B; HRAS;SMARCA4; RASA1; MAP2K1; RAP1B; MAPK1;HSD17B3; SMARCD3

Wikipathways

Retinoblastoma Gene in Cancer 4e-05 0.013691501 0.431472262 1.891761227 1 86 FAF1; E2F2; RBBP4; CHEK1; FANCG; CCNE2;CCNA2; RB1; DHFR; POLD3; E2F3; BARD1;POLE; MSH6; RRM1; MDM2; PRIM1; CCNB1;ANLN; HMGB2; SKP2; TP53; DCK; ZNF655;CDK2; CDK6; MAPK13; RBBP7; RRM2

Wikipathways

Nanoparticle triggered autophagic cell death 0.003728145 0.318756355 -0.581397567 -1.832634063 186 20 TSC1; ATG7; ULK2; MAP1LC3A; ATG4A; TSC2;CHAF1A; SH3GLB1; BCL2

Wikipathways

Aryl Hydrocarbon Receptor 0.013704952 0.429593065 0.416211196 1.586078175 682 43 HPGDS; RB1; HRAS; MAP2K1; EGFR; CD36;MAPK1; NRAS; ARNT; GCLC; NCOR2; CDC37;CDK2

Wikipathways

Notch Signaling 0.049067576 0.490576187 0.370069984 1.417864534 2446 44 KCNJ5; LFNG; MFNG; DLL1; CTBP1; ADAM17;DVL2; DVL3; HES5; NCOR2; DTX3; DLL4; DLL3;MAML3

Wikipathways

Mammary gland development pathway - Preg-nancy and lactation (Stage 3 of 4)

0.00056065 0.06391414 0.556819695 1.965323405 27 31 CEBPB; GAL; EGFR; ELF5; PNCK; ERBB4;ORAI1; BCL2L1; TNFSF11; JAK2; PRLR; ATP2C2

Wikipathways

Bladder Cancer 0.031270032 0.486106862 0.400376699 1.492783629 1560 39 RB1; THBS1; HRAS; MAP2K1; EGFR; FGFR3;MAPK1; MDM2; NRAS; TYMP; MMP9; PIK3R2;TP53; HBEGF; CDH1

Wikipathways

IL-3 Signaling Pathway 0.034509033 0.490576187 0.375656486 1.459043847 1720 47 PTPN11; JUN; GAB2; HRAS; MAP2K1; MAPK1;BAD; STAT5A; BCL2L1; VAV1; LYN; JAK2;IL3RA; PIK3R2; CSF2RB

Wikipathways

IL1 and megakaryocytes in obesity 0.024731937 0.429593065 0.478149273 1.582769407 1233 24 TLR1; IRAK1; F2R; IL18; TLR2; IL1B; MMP9;HBEGF; NLRP3; TIMP2

Wikipathways

Kit receptor signaling pathway 0.04258049 0.490576187 0.34477382 1.405720859 2123 59 PTPN11; SOCS6; GAB2; HRAS; STAT1; MAP2K1;MAPK1; BAD; STAT5A; RPS6KA3; VAV1; LYN;JAK2; RPS6KB1; PIK3R2; GRB7; RPS6KA1

Wikipathways

Signaling of Hepatocyte Growth Factor Recep-tor

0.008262308 0.383959044 0.463582723 1.672953779 411 34 PTPN11; JUN; PTK2B; HRAS; RASA1; MAP2K1;RAP1B; MAPK1; MET

Wikipathways

Melatonin metabolism and effects 0.042413309 0.490576187 0.426013246 1.478593933 2115 29 IRAK1; AANAT; ACHE; CAMK2A; CYP2D6;MAOA; PER1; CRY1; PER2

Wikipathways

Preimplantation Embryo 0.025122401 0.429593065 0.396990647 1.512833405 1251 43 AQP3; SOX2; CDX2; POU5F1; SMARCA4;BATF3; E2F5; ZFP36; MYBL1; FOXD1; HMGA1;TBX3; AQP9; CDH1; GATA2; H2AFY2; GATA3;SOX11

Wikipathways

Exercise-induced Circadian Regulation 0.016795994 0.429593065 -0.397994164 -1.546332431 841 47 HERPUD1; GFRA1; PSMA4; ETV6; STBD1;UCP3; KLF9; G0S2; CLDN5; CBX3; GSTM3;ARNTL; PPP2CB; BTG1; PURA; GSTP1; NR1D2;NCKAP1

Wikipathways

42

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint

Page 43: A meta-learning approach for genomic survival analysis · 21/04/2020  · A meta-learning approach for genomic survival analysis Yeping Lina Qiu1 ;2, Hong Zheng , Arnout Devos3, Olivier

Table S6: Lung cancer gene set enrichment analysis (GSEA) (continued)

Pathway p value p value(adjusted)

ES NES n More Ex-treme

Size Leading Edge Genes Source

MET in type 1 papillary renal cell carcinoma 0.028346014 0.461635087 0.363248835 1.464864955 1411 56 PTPN11; JUN; HRAS; AKT2; MAP2K1; RAP1B;STRN; MAPK1; MET; BAD; NRAS; ALK; TFE3;RPL11

Wikipathways

Phosphodiesterases in neuronal function 0.040324845 0.490576187 0.377085467 1.444743246 2010 44 PDE8A; GUCY1B2; ADCY4; PDE5A; PDE3A;PDE4A; ADCY10; PDE4B; ADCY3; ADCY7

Wikipathways

Non-small cell lung cancer 0.008981498 0.383959044 0.37789964 1.576064839 449 66 E2F2; RB1; HRAS; PLCG2; E2F3; AKT2; PDK1;MAP2K1; EGFR; MAPK1; BAD; PRKCG; STAT5A;NRAS; ALK; PIK3R2; GADD45A; TP53; RXRB

Wikipathways

Breast cancer pathway 0.047455598 0.490576187 0.273185514 1.30107725 2377 142 E2F2; JUN; FGF5; RB1; HRAS; E2F3; AKT2;WNT11; DLL1; MAP2K1; FZD3; EGFR; MAPK1;FZD2; DVL2; RAD51; NRAS; BRCA2; RAD50;CSNK1A1L; DVL3; HES5; TNFSF11; HEYL;WNT10A; RPS6KB1; PIK3R2; GADD45A;WNT10B; TP53; PTEN; CSNK1A1; DLL4; DLL3;FRAT1; SHC2; CDK6; FGF19; TCF7L2; PIK3R1

Wikipathways

LTF danger signal response pathway 0.008870384 0.383959044 0.590021326 1.746165632 440 16 IRAK1; MAPK1; TLR2; CD14; IRAK4; IL1B;AGER; TREM1

Wikipathways

G1 to S cell cycle control 0.0226521 0.429593065 0.365984997 1.487035992 1129 58 E2F2; MCM5; CCNE2; RB1; E2F3; POLE; ATF6B;MDM2; CDKN2C; PRIM1; MNAT1; CCNB1;CREB3; GADD45A; TP53; PRIM2; CDK2; POLA2;CDK6

Wikipathways

DNA Replication 0.01926508 0.429593065 0.427640186 1.564624613 960 36 MCM5; POLD3; POLE; PRIM1; POLD1; PRIM2;CDK2; POLD4; RFC1; POLA2; UBC; RPA2;MCM10; POLA1; RFC4; MCM4; CDC7; MCM2

Wikipathways

Hedgehog Signaling Pathway 0.047153273 0.490576187 -0.52375165 -1.521369023 2366 15 SAP18; KIF7; GLI3 WikipathwaysIL-2 Signaling Pathway 0.045674089 0.490576187 0.380180057 1.433483893 2278 41 PTPN11; JUN; GAB2; PTK2B; HRAS; STAT1;

MAP2K1; MAPK1; STAT5AWikipathways

Type II interferon signaling (IFNG) 0.02016339 0.429593065 0.429058527 1.55989128 1006 35 PTPN11; PSMB9; STAT1; NOS2; CXCL9; OAS1;IRF8; STAT2; JAK2; IL1B; HLA-B

Wikipathways

ErbB Signaling Pathway 0.018199656 0.429593065 0.330621667 1.455654347 908 88 JUN; CAMK2B; HRAS; PLCG2; AKT2; PDK1;MAP2K1; EGFR; MAPK1; BTC; MDM2; ERBB4;CAMK2A; BAD; PRKCG; EREG; STAT5A; NRAS;ABL2; RPS6KB1; PIK3R2; TP53; NRG4; HBEGF;BCL2L11; MAP2K4; SHC2; PAK1; PIK3R1

Wikipathways

Serotonin Receptor 2 and ELK-SRF-GATA4signaling

0.048044558 0.490576187 0.483773588 1.503377097 2397 19 HRAS; GATA4; MAP2K1; MAPK1; HTR2B;NRAS; RASGRF1

Wikipathways

Toll-like Receptor Signaling Pathway 0.04829932 0.490576187 0.307545254 1.351237917 2413 87 JUN; TLR1; IRAK1; SPP1; TLR8; STAT1; AKT2;MAP2K1; TICAM1; RIPK1; MAPK1; CXCL9;TAB1; TLR2; CD14; IRAK4; MAP2K3; IL1B;PIK3R2; IRF7; LBP; CD40; TAB2; MAP2K4;MAPK13; TRAF3; TLR6; PIK3R1

Wikipathways

43

.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (whichthis version posted April 23, 2020. ; https://doi.org/10.1101/2020.04.21.053918doi: bioRxiv preprint