
www.sciencetranslationalmedicine.org/cgi/content/full/3/114/114ra127/DC1

Supplementary Materials for

Predicting Adverse Drug Events Using Pharmacological Network Models

Aurel Cami,* Alana Arnold, Shannon Manzi, Ben Reis

*To whom correspondence should be addressed. E-mail: [email protected]

Published 21 December 2011, Sci. Transl. Med. 3, 114ra127 (2011)

DOI: 10.1126/scitranslmed.3002774

This PDF file includes:

Methods
Table S1. Definition of covariates.
Table S2. List of drugs and their ATC codes.
Table S3. Number of missing observations for PubChem properties extracted for this study.
Table S4. Number of missing observations for DrugBank properties extracted for this study.
Table S5. Intercorrelation analysis of covariates.
Table S6. Prediction case studies.
Table S7. List of supplementary source code files.
Fig. S1. Newly associated ADEs per drug in each ATC top-level group.
Fig. S2. Newly associated drugs per ADE in each MedDRA top-level group.
Fig. S3. Comparative histograms of scores for the observed edges and non-edges by the three model types.
Fig. S4. Three-way Venn diagrams for the sets of true and false positives generated by models NET, TAX, and INT.
Fig. S5. Comparative histograms of selected network covariates for the predicted edges and non-edges.
Fig. S6. Comparative histograms of selected taxonomic covariates for the predicted edges and non-edges.
Fig. S7. Comparative histograms of the intrinsic covariates for the predicted edges and non-edges.
Fig. S8. Drug-specific AUROCs.
Fig. S9. ADE-specific AUROCs.

Other Supplementary Material for this manuscript includes the following: (available at www.sciencetranslationalmedicine.org/cgi/content/full/3/114/114ra127/DC1)

File “meddra_mapping_code.sas” (SAS code to perform MedDRA mapping).
File “NET_INT_covariates.R” (R code to compute network and intrinsic covariates).
File “TAX_covariates.sas” (SAS code to compute taxonomic covariates).
File “Fig2-highres.tif” (high-resolution version of Fig. 2).

Supplementary Methods

Mapping ADE names to MedDRA taxonomy

We used the following approach to map ADE names to the Medical Dictionary for Regulatory Activities (MedDRA) terminology. First, we performed exact matching of each ADE name against the lowest level terms (LLTs) of MedDRA. This step matched approximately 40% of the unique ADE names to LLTs. Next, for each unmatched ADE name we identified the two closest LLTs in terms of the generalized edit distance between strings (computed using the function COMPGED in the Statistical Analysis System (SAS) v9.2). Computer code to perform exact matching and to identify the two closest LLTs of an ADE name is provided below and as supplementary online files (table S7): meddra_mapping_code.sas, NET_INT_covariates.R, and TAX_covariates.sas. Then, we manually scanned the list of unmatched ADE names and their two closest LLTs and were able to match an ADE name to one of its two closest LLTs for approximately half of that list. We coded the ADE names that were still unmatched after this step (the final 30% of all unique ADE names) by performing term-based searches using a MedDRA browser. After all ADE names had been mapped to the MedDRA LLT level, we identified the unique preferred term (PT) corresponding to each LLT. Finally, we identified the list of high level terms (HLTs) that corresponded to each PT generated by the preceding step. In this study, all adverse events were represented by their MedDRA HLT codes.
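The first two automated steps of this mapping (exact matching, then the two closest LLTs by edit distance) can be sketched in Python. This is an illustrative sketch only, not the authors' code: a plain Levenshtein distance stands in for SAS's COMPGED, which applies its own weighted costs, so the scores will differ from those produced by SAS. The function names `edit_distance` and `map_ade_names` and the sample term lists are hypothetical.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance with unit costs (assumption: NOT SAS COMPGED)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character of a
                            curr[j - 1] + 1,      # insert a character of b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def map_ade_names(
    ade_names: List[str], llt_names: List[str]
) -> Tuple[Dict[str, str], Dict[str, List[Tuple[str, int]]]]:
    """Return exact matches, and for unmatched names the two closest LLTs."""
    llt_set = set(llt_names)
    exact: Dict[str, str] = {}
    candidates: Dict[str, List[Tuple[str, int]]] = {}
    for name in ade_names:
        if name in llt_set:
            exact[name] = name
        else:
            # Sort by (distance, LLT name) and keep the two best for manual review.
            scored = sorted((edit_distance(name, llt), llt) for llt in llt_names)
            candidates[name] = [(llt, dist) for dist, llt in scored[:2]]
    return exact, candidates


# Hypothetical sample data, for illustration only.
sample_llts = ["Nausea", "Headache", "Vomiting", "Dizziness"]
sample_ades = ["Nausea", "Headache NOS", "Vomitting"]
exact, candidates = map_ade_names(sample_ades, sample_llts)
```

The candidate pairs are meant for the manual review step described above; names for which neither candidate is acceptable would still need a term-based search.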

Source code

Any reuse of all or part of this code must reference this publication. The corresponding SAS and R files are provided as supplementary online material.

MedDRA mapping SAS code

/*****************************************************************
Macro to exact-match ADE names to MedDRA LLT names.

"in_ds" should be a SAS library containing two kinds of input files:

First, it should contain a list of unique ADE names occurring in the
drug-ADE database. This list is assumed to have been stored in a SAS
data set named "_<YEAR>_aes", where YEAR is 2005 or 2010. This data
set contains one column named "ae_name".

Second, the library should contain a list of unique LLT names
occurring in MedDRA. This list is assumed to be stored in a SAS data
set named "_<YEAR>_unique_llts_meddra". This data set should contain
one column named "llt_name" as well as other columns including the
MedDRA information pertaining to an LLT, such as the pt_code,
pt_name, and so on.
******************************************************************/
%macro exact_match(year= );
  proc sort data=in_ds._&year._aes;
    by ae_name;
  run; quit;

  proc sort data=in_ds._&year._unique_llts_meddra;
    by llt_name;
  run; quit;

  data out_ds._&year._aes_meddra;
    merge in_ds._&year._aes(in=a)
          in_ds._&year._unique_llts_meddra(in=b rename=(llt_name=ae_name));
    by ae_name;
    if(a);
  run; quit;
%mend exact_match;

/******************************************************************
Macro to compute the smallest GED distance between ADE names and
MedDRA LLT names.

"in_ds" should be a SAS library containing two input SAS data sets:

First, this library is supposed to contain the list of ADE names that
were not exact-matched after running the previous macro. This list is
assumed to be stored in the SAS data set
"_<YEAR>_unique_aes_llt_nomatch" where YEAR is either 2005 or 2010.
This data set contains one column named "ae_name".

Second, this library is supposed to contain the list of all unique
LLT names in MedDRA. This list is supposed to be stored in a SAS data
set named "llt_lltname_only". This file has only one column named
"llt_name".

out_ds: is a library that will contain output file(s) produced by the
macro.

To limit the running time to a few hours, this macro should be run in
a cluster, with each computing node processing a portion of the ADE
names contained in the input file "_<YEAR>_unique_aes_llt_nomatch".
This portion is defined by the macro variables:
  "jobnum": taking values 1,...
  "rows_per_job": number of ADE names in the job
  "total_rows": total number of ADE names to be processed

One output file per job will be produced. These partial output files
should in the end be merged together.

Each output file produced by a job contains the following fields:
  AE_name
  min_llt1: the closest LLT name in terms of GED
  min_score1: the GED between AE_name and min_llt1
  min_llt2: the second closest LLT name
  min_score2: the GED between AE_name and min_llt2

Notes on running time: The GED computation for all the ADE names
occurring in the drug-ADE database in our study took a few hours
using the Orchestra cluster (http://ritg.med.harvard.edu/cluster.html)
and a few hundred jobs.
******************************************************************/

/* The macro variables below should be re-defined as needed before
   submitting each job (e.g. using an external script). The values
   below are given for illustration purposes only */
%let jobnum=1;
%let rows_per_job=20;
%let total_rows=1000;

%macro find_min_ged_llt(year= ,num_llts= );
  /* create a local macro variable per LLT name */
  data _null_;
    set in_ds.llt_lltname_only;
    call symputx(cats("llt_name_", _N_), llt_name, "L");
  run; quit;

  /* find two closest LLTs to each ADE name */
  data out_ds._&year._unique_aes_geds_&jobnum;
    length min_llt1 min_llt2 $ 255;
    length min_score1 min_score2 8;
    set in_ds._&year._unique_aes_llt_nomatch;
    start_ob = (&jobnum - 1)* &rows_per_job + 1;
    end_ob = &jobnum * &rows_per_job;
    if (end_ob > &total_rows) then end_ob = &total_rows;
    if (_N_ < start_ob OR _N_ > end_ob) then delete;
    else do;
      *put "_N_ = " _N_;
      i = 1;
      min_score1 = 100000; min_score2 = 100000; /* very large values */
      min_llt1 = ""; min_llt2 = "";
      do while (i <= &num_llts);
        llt_name_i = SYMGET(cats("llt_name_", i));
        score = COMPGED(ae_name, llt_name_i);
        if (score < min_score1) then do;
          min_score2 = min_score1;
          min_score1 = score;
          min_llt2 = min_llt1;
          min_llt1 = llt_name_i;
        end;
        else if (score < min_score2) then do;
          min_score2 = score;
          min_llt2 = llt_name_i;

        end;
        i = i+1;
      end;
    end;
    keep AE_name min_llt1 min_score1 min_llt2 min_score2;
  run; quit;
  run; quit;
%mend find_min_ged_llt;

/* example calls */
*%exact_match(year=2010);
*%find_min_ged_llt(year=2010, num_llts=67159);
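Two pieces of logic in find_min_ged_llt may be easier to follow outside SAS: the row range assigned to each cluster job, and the single-pass update that keeps the two smallest scores. The Python sketch below mirrors both under stated assumptions. The function names `job_row_range` and `two_smallest` are hypothetical, and infinity replaces the SAS sentinel value 100000.

```python
import math
from typing import Iterable, Optional, Tuple

Scored = Tuple[Optional[str], float]


def job_row_range(jobnum: int, rows_per_job: int, total_rows: int) -> Tuple[int, int]:
    """1-based inclusive row range handled by one job (mirrors start_ob/end_ob)."""
    start = (jobnum - 1) * rows_per_job + 1
    end = min(jobnum * rows_per_job, total_rows)
    return start, end


def two_smallest(scored: Iterable[Tuple[str, float]]) -> Tuple[Scored, Scored]:
    """Single pass that tracks the best and second-best (name, score) pairs."""
    best: Scored = (None, math.inf)
    second: Scored = (None, math.inf)
    for name, score in scored:
        if score < best[1]:
            # New best: the previous best becomes the runner-up.
            second = best
            best = (name, score)
        elif score < second[1]:
            second = (name, score)
    return best, second
```

With the illustrative values in the listing (rows_per_job=20, total_rows=1000), job 1 covers rows 1 to 20 and job 50 covers rows 981 to 1000. A job number whose start exceeds its end has no rows to process.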

NET, INT covariates R code

#######################################################################
# The following functions require R libraries "network" and "sna".
# One way to acquire these libraries is to install the "statnet" suite
# of packages (www.statnetproject.org).
#
# Note on running time: The functions listed below were executed for
# 689,268 drug-ADE pairs in the network discussed in the paper
#
# This task was carried out using the Orchestra cluster
# (http://ritg.med.harvard.edu/cluster.html), with a few hundred jobs
# in parallel and took about one day to complete.
######################################################################

#######################################################################
# Function to compute the covariate "euclid-min" discussed in the
# paper.
#
# G is a network object
# node1 denotes the PubChem_Compound_ID of a drug
# node2 denotes the HLT code (MedDRA) of an ADE
#
# All properties of drugs and ADEs should have been stored as vertex
# attributes of the network object G
#
# This function returns a vector of length two. The first element of
# this vector denotes the value of covariate "euclid-min"
######################################################################
compute.value.euclid.dist.features = function(G, node1, node2) {
  all.attr.names = list.vertex.attributes(G)
  quant.attr.names = setdiff(all.attr.names,
                             c("node_id","drug_name","DrugCard_ID","na",
                               "PubChem_Compound_ID","stitch_compound_name1","vertex.names"))

  n.drugs = network.size(G)
  n.attributes = length(quant.attr.names)
  attr.mat = matrix(-999, nrow=n.drugs, ncol=n.attributes)
  for (i in 1:n.attributes) {
    attr.vec = get.vertex.attribute(G, quant.attr.names[i])
    attr.mat[,i] = attr.vec
  }
  attrs.node1 = attr.mat[node1, ]
  N2 = get.neighborhood(G, node2)
  N2 = setdiff(N2, node1)

  result.value.attr = numeric(2)
  min.feature.name = "min_euclid_dist"
  mean.feature.name = "mean_euclid_dist"
  names(result.value.attr) = c(min.feature.name, mean.feature.name)

  if (length(N2) == 0) {
    result.value.attr[min.feature.name] = 0
    result.value.attr[mean.feature.name] = 0
  } else {
    attr.mat.node2 = attr.mat[N2, ]
    merged.attr.mat = rbind(attrs.node1, attr.mat.node2)
    dist.mat = as.matrix(dist(merged.attr.mat))
    dist.vec = dist.mat[1,2:nrow(merged.attr.mat)]
    result.value.attr[min.feature.name] = min(dist.vec)
    result.value.attr[mean.feature.name] = mean(dist.vec)
  }
  return(result.value.attr)
}

#######################################################################
# Function to compute the distribution of Euclidean distances
# in the neighborhood of a drug-ADE pair. This distribution is used
# in the computation of covariate "euclid-KL" discussed in the
# paper.
#
# G is a network object
# node1 denotes the PubChem_Compound_ID of a drug
# node2 denotes the HLT code (MedDRA) of an ADE
#
# All properties of drugs and ADEs should have been stored as vertex
# attributes of the network object G
#
# This function returns a discretized version of the distribution
# of Euclidean distances in the neighborhood of pair (node1, node2)
######################################################################
compute.value.euclid.dist.features.full = function(G, node1, node2) {
  all.attr.names = list.vertex.attributes(G)
  quant.attr.names = setdiff(all.attr.names,
                             c("node_id","drug_name","DrugCard_ID","na",
                               "PubChem_Compound_ID","stitch_compound_name1","vertex.names"))

  n.drugs = network.size(G)
  n.attributes = length(quant.attr.names)
  attr.mat = matrix(-999, nrow=n.drugs, ncol=n.attributes)
  for (i in 1:n.attributes) {
    attr.vec = get.vertex.attribute(G, quant.attr.names[i])
    attr.mat[,i] = attr.vec
  }
  attrs.node1 = attr.mat[node1, ]
  N2 = get.neighborhood(G, node2)
  N2 = setdiff(N2, node1)

  nbins = 20
  result.value.attr = numeric(nbins)
  bin.names = character(nbins)
  for (i in 1:nbins) {
    bin.names[i] = paste("euclid_bin",i,sep="")
  }
  names(result.value.attr) = bin.names
  breaks.vec = c(seq(from=0, by=40, length.out=20), 10^10)

  if (length(N2) == 0) {
    result.value.attr=rep(0,20)
  } else {
    attr.mat.node2 = attr.mat[N2, ]
    merged.attr.mat = rbind(attrs.node1, attr.mat.node2)
    dist.mat = as.matrix(dist(merged.attr.mat))
    dist.vec = dist.mat[1,2:nrow(merged.attr.mat)]
    histogram.obj = hist(dist.vec, breaks=breaks.vec, plot=FALSE)
    result.value.attr = histogram.obj$density
  }
  return(result.value.attr)
}

#######################################################################
# Function to compute the covariates "degree-prod" and
# "degree-absdiff" discussed in the paper.
#
# G is a network object
# node1 denotes the PubChem_Compound_ID of a drug
# node2 denotes the HLT code (MedDRA) of an ADE
#
# All properties of drugs and ADEs should have been stored as vertex
# attributes of the network object G
#
# This function returns a vector of length four. The third and second
# elements of this vector denote "degree-prod" and "degree-absdiff",
# respectively.
######################################################################
compute.degree.features = function(G, node1, node2, D) {

  #D = degree(G, gmode="graph")
  D.node1 = D[node1]
  D.node2 = D[node2]
  result.degree = numeric(4)
  result.degree[1:4] <- NA
  names(result.degree) = c('degree_sum', 'degree_absdiff',
                           'degree_prod', 'degree_ratio')
  result.degree['degree_sum'] = D.node1 + D.node2
  result.degree['degree_absdiff'] = abs(D.node1 - D.node2)
  result.degree['degree_prod'] = D.node1 * D.node2
  result.degree['degree_ratio'] = D.node1/D.node2
  return(result.degree)
}

#######################################################################
# Function to compute the covariate "jackard-drug-max"
# discussed in the paper.
#
# G is a network object
# node1 denotes the PubChem_Compound_ID of a drug
# node2 denotes the HLT code (MedDRA) of an ADE
#
# All properties of drugs and ADEs should have been stored as vertex
# attributes of the network object G
#
# This function returns a vector of length three. The second
# element of this vector denotes covariate "jackard-drug-max"
######################################################################
compute.jackard.drug.features = function(G, node1, node2) {
  N1 = get.neighborhood(G, node1)
  N2 = get.neighborhood(G, node2)
  N2 = setdiff(N2, node1)
  n.neighbors = length(N2)
  result.jackard.drug = numeric(3)
  result.jackard.drug[1:3] <- NA
  names(result.jackard.drug) = c('jackard_drug_min','jackard_drug_max','jackard_drug_mean')
  if (n.neighbors == 0) {
    result.jackard.drug['jackard_drug_min'] = 0
    result.jackard.drug['jackard_drug_max'] = 0
    result.jackard.drug['jackard_drug_mean'] = 0
  } else {
    jackard.vector = numeric(n.neighbors)
    for (i in 1:n.neighbors) {
      neighbor.i = N2[i]
      N1.i = get.neighborhood(G, neighbor.i)
      intersection.i = intersect(N1, N1.i)
      union.i = union(N1, N1.i)
      jackard.vector[i] = length(intersection.i)/length(union.i)
    }

    result.jackard.drug['jackard_drug_min'] = min(jackard.vector)
    result.jackard.drug['jackard_drug_max'] = max(jackard.vector)
    result.jackard.drug['jackard_drug_mean'] = mean(jackard.vector)
  }
  return(result.jackard.drug)
}

#######################################################################
# Function to compute the distribution of Jaccard coefficients in the
# neighborhood of a drug-ADE pair. This distribution is used in the
# computation of covariate "jackard-drug-KL" discussed in the paper.
#
# G is a network object
# node1 denotes the PubChem_Compound_ID of a drug
# node2 denotes the HLT code (MedDRA) of an ADE
#
# All properties of drugs and ADEs should have been stored as vertex
# attributes of the network object G
#
# This function returns a discretized version of the distribution
# of Jaccard coefficients in the neighborhood of pair (node1, node2)
#######################################################################
compute.jackard.drug.features.full = function(G, node1, node2) {
  N1 = get.neighborhood(G, node1)
  N2 = get.neighborhood(G, node2)
  N2 = setdiff(N2, node1)
  n.neighbors = length(N2)
  nbins = 20
  result.jackard.drug = numeric(nbins)
  bin.names = character(nbins)
  for (i in 1:nbins) {
    bin.names[i] = paste("jackard_drugs_bin",i,sep="")
  }
  breaks.vec = seq(from=0, by=0.05, length.out=21)

  if (length(N2) == 0) {
    result.jackard.drug=rep(0,20)
  } else {
    jackard.vector = numeric(n.neighbors)
    for (i in 1:n.neighbors) {
      neighbor.i = N2[i]
      N1.i = get.neighborhood(G, neighbor.i)
      intersection.i = intersect(N1, N1.i)
      union.i = union(N1, N1.i)
      jackard.vector[i] = length(intersection.i)/length(union.i)
    }
    histogram.obj = hist(jackard.vector, breaks=breaks.vec, plot=FALSE)
    result.jackard.drug = histogram.obj$density
    #cat("sum ", sum(result.jackard.drug), "\n"); flush.console()
  }

  names(result.jackard.drug) = bin.names
  return(result.jackard.drug)
}

#######################################################################
# Function to compute the covariate "jackard-ADE-max" discussed in the
# paper.
#
# G is a network object
# node1 denotes the PubChem_Compound_ID of a drug
# node2 denotes the HLT code (MedDRA) of an ADE
#
# All properties of drugs and ADEs should have been stored as vertex
# attributes of the network object G
#
# This function returns a vector of length three. The second element
# of this vector denotes covariate "jackard-ADE-max"
#######################################################################
compute.jackard.ae.features = function(G, node1, node2) {
  N1 = get.neighborhood(G, node1)
  N2 = get.neighborhood(G, node2)
  N1 = setdiff(N1, node2)
  n.neighbors = length(N1)
  result.jackard.ae = numeric(3)
  result.jackard.ae[1:3] <- NA
  names(result.jackard.ae) = c('jackard_ae_min','jackard_ae_max','jackard_ae_mean')
  if (n.neighbors == 0) {
    result.jackard.ae['jackard_ae_min'] = 0
    result.jackard.ae['jackard_ae_max'] = 0
    result.jackard.ae['jackard_ae_mean'] = 0
  } else {
    jackard.vector = numeric(n.neighbors)
    for (i in 1:n.neighbors) {
      neighbor.i = N1[i]
      N2.i = get.neighborhood(G, neighbor.i)
      intersection.i = intersect(N2, N2.i)
      union.i = union(N2, N2.i)
      jackard.vector[i] = length(intersection.i)/length(union.i)
    }
    result.jackard.ae['jackard_ae_min'] = min(jackard.vector)
    result.jackard.ae['jackard_ae_max'] = max(jackard.vector)
    result.jackard.ae['jackard_ae_mean'] = mean(jackard.vector)
  }
  return(result.jackard.ae)
}

#######################################################################
# Function to compute the distribution of Jaccard coefficients
# in the neighborhood of a drug-ADE pair. This distribution is used
# in the computation of covariate "jackard-ADE-KL" discussed in the
# paper.
#

# G is a network object
# node1 denotes the PubChem_Compound_ID of a drug
# node2 denotes the HLT code (MedDRA) of an ADE
#
# All properties of drugs and ADEs should have been stored as vertex
# attributes of the network object G
#
# This function returns a discretized version of the distribution
# of Jaccard coefficients in the neighborhood of pair (node1, node2)
#######################################################################
compute.jackard.ae.features.full = function(G, node1, node2) {
  N1 = get.neighborhood(G, node1)
  N2 = get.neighborhood(G, node2)
  N1 = setdiff(N1, node2)
  n.neighbors = length(N1)
  nbins = 20
  result.jackard.ae = numeric(nbins)
  bin.names = character(nbins)
  for (i in 1:nbins) {
    bin.names[i] = paste("jackard_aes_bin",i,sep="")
  }
  breaks.vec = seq(from=0, by=0.05, length.out=21)

  if (n.neighbors == 0) {
    result.jackard.ae=rep(0,20)
  } else {
    jackard.vector = numeric(n.neighbors)
    for (i in 1:n.neighbors) {
      neighbor.i = N1[i]
      N2.i = get.neighborhood(G, neighbor.i)
      intersection.i = intersect(N2, N2.i)
      union.i = union(N2, N2.i)
      jackard.vector[i] = length(intersection.i)/length(union.i)
    }
    histogram.obj = hist(jackard.vector, breaks=breaks.vec, plot=FALSE)
    result.jackard.ae = histogram.obj$density
  }
  names(result.jackard.ae) = bin.names
  return(result.jackard.ae)
}

#######################################################################
# Function to compute the covariate "edge-density" discussed in the
# paper.
#
# G is a network object
# node1 denotes the PubChem_Compound_ID of a drug
# node2 denotes the HLT code (MedDRA) of an ADE
#
# All properties of drugs and ADEs should have been stored as vertex
# attributes of the network object G

#
# This function returns a vector of length one denoting the covariate
# "edge-density"
#######################################################################
compute.edge.density.features = function(G, node1, node2) {
  N1 = get.neighborhood(G, node1)
  N2 = get.neighborhood(G, node2)
  numer = sum(G[N2,N1])
  denom = length(N1)*length(N2)
  result.edge.dens = numeric(1)
  result.edge.dens[1] <- NA
  names(result.edge.dens) = c('edge_dens')
  result.edge.dens['edge_dens'] = numer/denom
  return(result.edge.dens)
}
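As a compact illustration of what the Jaccard and edge-density functions in the R listing compute, the following Python sketch applies the same definitions to a small hypothetical drug-ADE network stored as a dictionary of neighbor sets. It is a sketch under stated assumptions, not a port of the R code: the names `build_adjacency`, `jaccard`, `jaccard_drug_features`, and `edge_density` and the example edges are invented for illustration. It also returns 0.0 for an empty union and None for a zero denominator, whereas the R code would produce NaN in those cases.

```python
from statistics import mean
from typing import Dict, Iterable, Optional, Set, Tuple

Adjacency = Dict[str, Set[str]]


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Adjacency:
    """Undirected adjacency sets from (drug, ADE) pairs."""
    adj: Adjacency = {}
    for drug, ade in edges:
        adj.setdefault(drug, set()).add(ade)
        adj.setdefault(ade, set()).add(drug)
    return adj


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|a & b| / |a | b|; 0.0 for an empty union (assumption, R gives NaN)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_drug_features(adj: Adjacency, drug: str, ade: str) -> Dict[str, float]:
    """Min, max, and mean Jaccard similarity between `drug` and the other
    drugs already linked to `ade` (zeros when there are none)."""
    n1 = adj.get(drug, set())
    n2 = adj.get(ade, set()) - {drug}
    if not n2:
        return {"min": 0.0, "max": 0.0, "mean": 0.0}
    scores = [jaccard(n1, adj.get(other, set())) for other in n2]
    return {"min": min(scores), "max": max(scores), "mean": mean(scores)}


def edge_density(adj: Adjacency, drug: str, ade: str) -> Optional[float]:
    """Fraction of possible edges present between the neighbors of `ade`
    and the neighbors of `drug`; None if either neighborhood is empty."""
    n1 = adj.get(drug, set())
    n2 = adj.get(ade, set())
    denom = len(n1) * len(n2)
    if denom == 0:
        return None
    numer = sum(1 for d in n2 for a in n1 if a in adj.get(d, set()))
    return numer / denom


# Hypothetical network, for illustration only.
adj = build_adjacency([
    ("drugA", "nausea"), ("drugA", "rash"),
    ("drugB", "nausea"), ("drugB", "rash"), ("drugB", "headache"),
    ("drugC", "nausea"),
])
features = jaccard_drug_features(adj, "drugC", "rash")
density = edge_density(adj, "drugC", "rash")
```

In this toy network, drugC shares its only ADE with both drugs already linked to rash, so the candidate pair (drugC, rash) gets a maximum Jaccard value of 0.5 and an edge density of 1.0.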

TAX covariates SAS code

/*
 * Function to compute the covariate "atc-min" discussed in the
 * paper.
 *
 * all_pairs_ds is a list of all possible drug-ADE pairs
 *   node_id_1: drug id (PubChem_ID)
 *   node_id_2: AE id (HLT code)
 *
 * returns a new data set named <all_pairs_ds>_atc which also contains
 * the value of "atc-min" covariate for each drug-ADE pair
 */
%macro add_ATC_codes_min(all_pairs_ds= );
  %local macro_i;

  /* create hash tables */
  data in_ds.&all_pairs_ds._2005edge;
    set in_ds.&all_pairs_ds;
    where (is_old_edge EQ 1);
    keep node_id_1 node_id_2;
  run; quit;

  proc sort data=in_ds.&all_pairs_ds._2005edge;
    by node_id_2;
  run; quit;

  proc transpose data=in_ds.&all_pairs_ds._2005edge
                 out=in_ds.&all_pairs_ds._2005edge_t
                 prefix=node_id_1_;
    by node_id_2;
    var node_id_1;
  run; quit;

  data in_ds.&all_pairs_ds._2005edge_t;
    set in_ds.&all_pairs_ds._2005edge_t;
    drop node_id_1 _NAME_ _LABEL_;
  run; quit;

  /* perform ATC distance computations */
  data in_ds.&all_pairs_ds._atc;
    length node_id_1 8;
    length node_id_2 8;
    length atc_min_val 8;
    length atc_max_val 8;
    length atc_mean_val 8;
    %let macro_i=1;
    %do %while (&macro_i < 657); /* (1 + max AE degree) in the 2005 network */
      length node_id_1_&macro_i 8;
      %let macro_i = %eval(&macro_i + 1);
    %end;
    length PubChem_Compound_ID 8;
    length atc_code_min_dist 8;
    length atc_code_min_tmp 8;
    length atc_code1-atc_code11 $ 7;
    length L1 L3 L4 L1_prime L3_prime L4_prime $ 1;
    length L2 L2_prime $ 2;

    set in_ds.&all_pairs_ds;

    array atc_codes_d1{11} $ 8 atc_code_d1_1-atc_code_d1_11;
    array atc_codes_d2{11} $ 8 atc_code_d2_1-atc_code_d2_11;
    array atc_min_distances{657} 8;

    if (_N_ = 1) then do;
      /* for each drug there were 1-11 ATC codes */
      declare hash atcHash(dataset: 'in_ds._2005_drugs_atc_data');
      rc = atcHash.definekey('PubChem_Compound_ID');
      rc = atcHash.definedata('atc_code1', 'atc_code2', 'atc_code3',
                              'atc_code4', 'atc_code5', 'atc_code6',
                              'atc_code7', 'atc_code8', 'atc_code9',
                              'atc_code10', 'atc_code11');
      atcHash.definedone();

      /* for each HLT, the neighbors in 2005 network */
      declare hash neighborHash(dataset: "in_ds.&all_pairs_ds._2005edge_t");
      rc = neighborHash.definekey('node_id_2');
      rc = neighborHash.definedata(ALL: 'YES');
      neighborHash.definedone();
    end;

    atc_min_val = .;
    atc_max_val = .;
    atc_mean_val = .;

    PubChem_Compound_ID = node_id_1;
    rc = atcHash.find();
    if (rc NE 0) then
      put "Could not find" PubChem_Compound_ID;
    else do;
      atc_code_d1_1 = atc_code1;
      atc_code_d1_2 = atc_code2;
      atc_code_d1_3 = atc_code3;
      atc_code_d1_4 = atc_code4;
      atc_code_d1_5 = atc_code5;
      atc_code_d1_6 = atc_code6;
      atc_code_d1_7 = atc_code7;
      atc_code_d1_8 = atc_code8;
      atc_code_d1_9 = atc_code9;
      atc_code_d1_10 = atc_code10;
      atc_code_d1_11 = atc_code11;

      rc = neighborHash.find();
      if (rc NE 0) then
        put "Could not find" node_id_2;
      else do;
        %let macro_i = 1;
        %do %while (&macro_i < 657);
          atc_min_distances[&macro_i] = .;
          %let macro_i = %eval(&macro_i + 1);
        %end;

        %let macro_i = 1;
        %do %while (&macro_i < 657);
          PubChem_Compound_ID = node_id_1_&macro_i;
          if (PubChem_Compound_ID NE . AND
              PubChem_Compound_ID NE node_id_1) then do;
            rc = atcHash.find();
            if (rc NE 0) then
              put "Could not find " PubChem_Compound_ID;
            else do;
              atc_code_d2_1 = atc_code1;
              atc_code_d2_2 = atc_code2;
              atc_code_d2_3 = atc_code3;
              atc_code_d2_4 = atc_code4;
              atc_code_d2_5 = atc_code5;
              atc_code_d2_6 = atc_code6;
              atc_code_d2_7 = atc_code7;
              atc_code_d2_8 = atc_code8;
              atc_code_d2_9 = atc_code9;
              atc_code_d2_10 = atc_code10;
              atc_code_d2_11 = atc_code11;

              atc_code_min_dist = 99; /* very large value */
              i=1;

              do while (i <= 11); /* max number of unique ATC codes */
                if (atc_codes_d1[i] EQ "") then do;
                  leave;
                end;
                L1 = substrn(atc_codes_d1[i],1,1);
                L2 = substrn(atc_codes_d1[i],2,2);
                L3 = substrn(atc_codes_d1[i],4,1);
                L4 = substrn(atc_codes_d1[i],5,1);
                j=1;
                do while (j <= 11);
                  if (atc_codes_d2[j] EQ "") then do;
                    leave;
                  end;
                  L1_prime = substrn(atc_codes_d2[j],1,1);
                  L2_prime = substrn(atc_codes_d2[j],2,2);
                  L3_prime = substrn(atc_codes_d2[j],4,1);
                  L4_prime = substrn(atc_codes_d2[j],5,1);
                  if (L1 EQ L1_prime AND L2 EQ L2_prime AND
                      L3 EQ L3_prime AND L4 EQ L4_prime) then
                    atc_code_min_tmp = 2;
                  else if (L1 EQ L1_prime AND L2 EQ L2_prime AND
                           L3 EQ L3_prime) then
                    atc_code_min_tmp = 4;
                  else if (L1 EQ L1_prime AND L2 EQ L2_prime) then
                    atc_code_min_tmp = 6;
                  else if (L1 EQ L1_prime) then

                    atc_code_min_tmp = 8;
                  else
                    atc_code_min_tmp = 10;
                  if (atc_code_min_tmp < atc_code_min_dist) then
                    atc_code_min_dist = atc_code_min_tmp;
                  j = j+1;
                end; /* while j */
                i = i+1;
              end; /* while i */
              atc_min_distances[&macro_i] = atc_code_min_dist;
            end; /*else*/
          end; /* if Pubchem_compound_ID */
          %let macro_i = %eval(&macro_i + 1);
        %end; /*while macro i*/

        atc_min_val = min(of atc_min_distances{*});
        atc_max_val = max(of atc_min_distances{*});
        atc_mean_val = mean(of atc_min_distances{*});
      end;
    end;
    keep node_id_1 node_id_2 atc_min_val atc_max_val atc_mean_val
         is_old_edge is_old_edge_class is_new_edge is_new_edge_class
         is_test_pair is_test_pair_class;
  run; quit;
%mend add_ATC_codes_min;

/*
 * Function to compute the distribution of ATC distances
 * in the neighborhood of a drug-ADE pair. This distribution is used
 * in the computation of covariate "atc-KL" discussed in the
 * paper.
 *
 * all_pairs_ds is a list of all possible drug-ADE pairs
 *   node_id_1: drug id (PubChem_ID)
 *   node_id_2: AE id (HLT code)
 *
 * returns a new data set named <all_pairs_ds>_atcb which also contains
 * the distribution of ATC distances in the neighborhood of each pair

 */
%macro add_ATC_codes_bins(all_pairs_ds= );
  %local macro_i;

  /* create hash tables */
  data in_ds.&all_pairs_ds._05e;
    set in_ds.&all_pairs_ds;
    where (is_old_edge EQ 1);
    keep node_id_1 node_id_2;
  run; quit;

  proc sort data=in_ds.&all_pairs_ds._05e;
    by node_id_2;
  run; quit;

  proc transpose data=in_ds.&all_pairs_ds._05e
                 out=in_ds.&all_pairs_ds._05et
                 prefix=node_id_1_;
    by node_id_2;
    var node_id_1;
  run; quit;

  data in_ds.&all_pairs_ds._05et;
    set in_ds.&all_pairs_ds._05et;
    drop node_id_1 _NAME_ _LABEL_;
  run; quit;

  /* compute full distribution of distances */
  data in_ds.&all_pairs_ds._atcb;
    length node_id_1 8;
    length node_id_2 8;
    length atc_min_val 8;
    length atc_max_val 8;
    length atc_mean_val 8;
    length atc_bin1-atc_bin5 8;
    %let macro_i=1;
    %do %while (&macro_i < 657);
      length node_id_1_&macro_i 8;
      %let macro_i = %eval(&macro_i + 1);
    %end;
    length PubChem_Compound_ID 8;
    length atc_code_min_dist 8;
    length atc_code_min_tmp 8;
    length atc_code1-atc_code11 $ 7;
    length L1 L3 L4 L1_prime L3_prime L4_prime $ 1;
    length L2 L2_prime $ 2;

    set in_ds.&all_pairs_ds;

    array atc_codes_d1{11} $ 8 atc_code_d1_1-atc_code_d1_11;
    array atc_codes_d2{11} $ 8 atc_code_d2_1-atc_code_d2_11;
    array atc_min_distances{657} 8;

    if (_N_ = 1) then do;
      declare hash atcHash(dataset: 'in_ds._2005_drugs_atc_data');
      rc = atcHash.definekey('PubChem_Compound_ID');
      rc = atcHash.definedata('atc_code1', 'atc_code2', 'atc_code3',
                              'atc_code4', 'atc_code5', 'atc_code6',
                              'atc_code7', 'atc_code8', 'atc_code9',
                              'atc_code10', 'atc_code11');
      atcHash.definedone();

      declare hash neighborHash(dataset: "in_ds.&all_pairs_ds._05et");
      rc = neighborHash.definekey('node_id_2');
      rc = neighborHash.definedata(ALL: 'YES');
      neighborHash.definedone();
    end;

    atc_min_val = .;
    atc_max_val = .;
    atc_mean_val = .;
    atc_bin1 = 0;
    atc_bin2 = 0;
    atc_bin3 = 0;
    atc_bin4 = 0;
    atc_bin5 = 0;

    PubChem_Compound_ID = node_id_1;
    rc = atcHash.find();
    if (rc NE 0) then
      put "Could not find" PubChem_Compound_ID;
    else do;
      atc_code_d1_1 = atc_code1;
      atc_code_d1_2 = atc_code2;
      atc_code_d1_3 = atc_code3;
      atc_code_d1_4 = atc_code4;
      atc_code_d1_5 = atc_code5;
      atc_code_d1_6 = atc_code6;
      atc_code_d1_7 = atc_code7;
      atc_code_d1_8 = atc_code8;
      atc_code_d1_9 = atc_code9;
      atc_code_d1_10 = atc_code10;
      atc_code_d1_11 = atc_code11;

      rc = neighborHash.find();
      if (rc NE 0) then
        put "Could not find" node_id_2;
      else do;
        %let macro_i = 1;
        %do %while (&macro_i < 657);
          atc_min_distances[&macro_i] = .;
          %let macro_i = %eval(&macro_i + 1);
        %end;

        %let macro_i = 1;
        %do %while (&macro_i < 657);

          PubChem_Compound_ID = node_id_1_&macro_i;
          if (PubChem_Compound_ID NE . AND
              PubChem_Compound_ID NE node_id_1) then do;
            rc = atcHash.find();
            if (rc NE 0) then
              put "Could not find " PubChem_Compound_ID;
            else do;
              atc_code_d2_1 = atc_code1;
              atc_code_d2_2 = atc_code2;
              atc_code_d2_3 = atc_code3;
              atc_code_d2_4 = atc_code4;
              atc_code_d2_5 = atc_code5;
              atc_code_d2_6 = atc_code6;
              atc_code_d2_7 = atc_code7;
              atc_code_d2_8 = atc_code8;
              atc_code_d2_9 = atc_code9;
              atc_code_d2_10 = atc_code10;
              atc_code_d2_11 = atc_code11;

              atc_code_min_dist = 99;
              i=1;
              do while (i <= 11);
                if (atc_codes_d1[i] EQ "") then do;
                  leave;
                end;
                L1 = substrn(atc_codes_d1[i],1,1);
                L2 = substrn(atc_codes_d1[i],2,2);
                L3 = substrn(atc_codes_d1[i],4,1);
                L4 = substrn(atc_codes_d1[i],5,1);
                j=1;
                do while (j <= 11);
                  if (atc_codes_d2[j] EQ "") then do;
                    leave;
                  end;
                  L1_prime = substrn(atc_codes_d2[j],1,1);
                  L2_prime = substrn(atc_codes_d2[j],2,2);
                  L3_prime = substrn(atc_codes_d2[j],4,1);
                  L4_prime = substrn(atc_codes_d2[j],5,1);
                  if (L1 EQ L1_prime AND

                      L2 EQ L2_prime AND
                      L3 EQ L3_prime AND
                      L4 EQ L4_prime) then
                    atc_code_min_tmp = 2;
                  else if (L1 EQ L1_prime AND L2 EQ L2_prime AND
                           L3 EQ L3_prime) then
                    atc_code_min_tmp = 4;
                  else if (L1 EQ L1_prime AND L2 EQ L2_prime) then
                    atc_code_min_tmp = 6;
                  else if (L1 EQ L1_prime) then
                    atc_code_min_tmp = 8;
                  else
                    atc_code_min_tmp = 10;
                  if (atc_code_min_tmp < atc_code_min_dist) then
                    atc_code_min_dist = atc_code_min_tmp;
                  j = j+1;
                end; /* while j */
                i = i+1;
              end; /* while i */
              atc_min_distances[&macro_i] = atc_code_min_dist;
            end; /*else*/
          end; /* if Pubchem_compound_ID */
          %let macro_i = %eval(&macro_i + 1);
        %end; /*while macro i*/

        atc_min_val = min(of atc_min_distances{*});
        atc_max_val = max(of atc_min_distances{*});
        atc_mean_val = mean(of atc_min_distances{*});
        atc_min_nonmiss = N(of atc_min_distances{*});

        k = 1;
        do while (k <= dim(atc_min_distances));

          if (atc_min_distances[k] NE .) then do;
            if (atc_min_distances[k] EQ 2) then
              atc_bin1 = atc_bin1 + 1;
            if (atc_min_distances[k] EQ 4) then
              atc_bin2 = atc_bin2 + 1;
            if (atc_min_distances[k] EQ 6) then
              atc_bin3 = atc_bin3 + 1;
            if (atc_min_distances[k] EQ 8) then
              atc_bin4 = atc_bin4 + 1;
            if (atc_min_distances[k] EQ 10) then
              atc_bin5 = atc_bin5 + 1;
          end;
          k = k+1;
        end;
        atc_bin1 = atc_bin1/atc_min_nonmiss;
        atc_bin2 = atc_bin2/atc_min_nonmiss;
        atc_bin3 = atc_bin3/atc_min_nonmiss;
        atc_bin4 = atc_bin4/atc_min_nonmiss;
        atc_bin5 = atc_bin5/atc_min_nonmiss;
      end;
    end;
    keep node_id_1 node_id_2 is_old_edge
         atc_bin1 atc_bin2 atc_bin3 atc_bin4 atc_bin5;
  run; quit;
%mend add_ATC_codes_bins;

/*
  Function to compute the Kullback-Leibler (KL) distance between a
  distribution and a desired reference distribution. This function is
  used to compute all KL-based covariates discussed in the paper.

  dist_type: what type of distribution--to distinguish between NET,
             TAX and INT covariates.
  bin_ds: is a data set containing the (discrete) distribution
          associated with each drug-ADE pair
  nbins: is the number of bins in that discrete distribution
*/
%macro compute_kldist(dist_type= ,bin_ds= ,nbins= );
  proc means data=in_ds.&bin_ds mean noprint;
    var &dist_type._bin1-&dist_type._bin&nbins;
    where (is_old_edge = 1);
    output out=in_ds.&bin_ds.M;
  run; quit;

  data in_ds.&bin_ds.M;


  set in_ds.&bin_ds.M;
  where (_STAT_ EQ "MEAN");
  keep &dist_type._bin1-&dist_type._bin&nbins;
run;
quit;
proc means data=in_ds.&bin_ds mean noprint;
  var &dist_type._bin1-&dist_type._bin&nbins;
  where (is_old_edge = 0);
  output out=in_ds.&bin_ds.N;
run;
quit;
data in_ds.&bin_ds.N;
  set in_ds.&bin_ds.N;
  where (_STAT_ EQ "MEAN");
  keep &dist_type._bin1-&dist_type._bin&nbins;
run;
quit;
proc means data=in_ds.&bin_ds mean noprint;
  var &dist_type._bin1-&dist_type._bin&nbins;
  output out=in_ds.&bin_ds.Q;
run;
quit;
data in_ds.&bin_ds.Q;
  set in_ds.&bin_ds.Q;
  where (_STAT_ EQ "MEAN");
  keep &dist_type._bin1-&dist_type._bin&nbins;
run;
quit;
%local macro_i ;
data _null_;
  set in_ds.&bin_ds.M;
  if (_N_ = 1) then do;
    %let macro_i = 1;
    %do %while (&macro_i <= &nbins);
      corrected_bin = &dist_type._bin&macro_i + 0.000001;
      call symput("refBin1_&macro_i", corrected_bin);
      %let macro_i = %eval(&macro_i + 1);
    %end;
  end;
run;
quit;
data _null_;
  set in_ds.&bin_ds.N;
  if (_N_ = 1) then do;
    %let macro_i = 1;
    %do %while (&macro_i <= &nbins);
      corrected_bin = &dist_type._bin&macro_i + 0.000001;
      call symput("refBin0_&macro_i", corrected_bin);
      %let macro_i = %eval(&macro_i + 1);
    %end;
  end;
run;
quit;
data _null_;
  set in_ds.&bin_ds.Q;
  if (_N_ = 1) then do;


    %let macro_i = 1;
    %do %while (&macro_i <= &nbins);
      corrected_bin = &dist_type._bin&macro_i + 0.000001;
      call symput("refBin01_&macro_i", corrected_bin);
      %let macro_i = %eval(&macro_i + 1);
    %end;
  end;
run;
quit;
data in_ds.&bin_ds.K;
  length kl0_&dist_type kl1_&dist_type kl01_&dist_type 8;
  set in_ds.&bin_ds;
  kl0_&dist_type = 0;
  kl1_&dist_type = 0;
  kl01_&dist_type = 0;
  %let macro_i = 1;
  %do %while (&macro_i <= &nbins);
    if (&dist_type._bin&macro_i > 0) then do;
      kl0_&dist_type = kl0_&dist_type + &dist_type._bin&macro_i*(log2(&dist_type._bin&macro_i)-log2(&&refBin0_&macro_i));
      kl1_&dist_type = kl1_&dist_type + &dist_type._bin&macro_i*(log2(&dist_type._bin&macro_i)-log2(&&refBin1_&macro_i));
      kl01_&dist_type = kl01_&dist_type + &dist_type._bin&macro_i*(log2(&dist_type._bin&macro_i)-log2(&&refBin01_&macro_i));
    end;
    %let macro_i = %eval(&macro_i + 1);
  %end;
  drop is_old_edge;
run;
quit;
%mend compute_kldist;

/*
 * Function to compute the covariate "meddra-min" discussed in the
 * paper.
 *
 * all_pairs_ds is a list of all possible drug-ADE pairs
 * node_id_1: drug id (PubChem_ID)
 * node_id_2: AE id (HLT code)
 *
 * returns a new data set named <all_pairs_ds>_meddra_h which also contains
 * the value of "meddra-min" covariate for each drug-ADE pair
 */
%macro add_meddra_min_dist_hlt(all_pairs_ds= );
/* create hash tables */
data in_ds.&all_pairs_ds._2005edge;
  set in_ds.&all_pairs_ds;
  where (is_old_edge EQ 1);
  keep node_id_1 node_id_2;


run;
quit;
proc sort data=in_ds.&all_pairs_ds._2005edge;
  by node_id_1;
run;
quit;
proc transpose data=in_ds.&all_pairs_ds._2005edge out=in_ds.&all_pairs_ds._2005edge_t prefix=node_id_2_;
  by node_id_1;
  var node_id_2;
run;
quit;
data in_ds.&all_pairs_ds._2005edge_t;
  set in_ds.&all_pairs_ds._2005edge_t;
  drop node_id_2 _NAME_ _LABEL_;
run;
quit;
data meddra.mdhier_hlt_hlgt;
  set meddra.mdhier;
  keep hlt_code hlgt_code;
run;
quit;
proc sort data=meddra.mdhier_hlt_hlgt noduprecs;
  by hlt_code hlgt_code;
run;
quit;
proc transpose data=meddra.mdhier_hlt_hlgt out=meddra.mdhier_hlt_hlgt_t prefix=hlgt_code;
  by hlt_code;
  var hlgt_code;
run;
quit;
/* HLT to HLGT mapping */
data meddra.mdhier_hlt_hlgt_t;
  set meddra.mdhier_hlt_hlgt_t;
  drop _NAME_;
run;
quit;
data meddra.mdhier_hlgt_soc;
  set meddra.mdhier;
  keep hlgt_code soc_code;
run;
quit;
proc sort data=meddra.mdhier_hlgt_soc noduprecs;
  by hlgt_code soc_code;
run;
quit;
proc transpose data=meddra.mdhier_hlgt_soc out=meddra.mdhier_hlgt_soc_t prefix=soc_code;
  by hlgt_code;
  var soc_code;
run;
quit;


/* HLGT to SOC mapping */
data meddra.mdhier_hlgt_soc_t;
  set meddra.mdhier_hlgt_soc_t;
  drop _NAME_;
run;
quit;
/* perform meddra distance computations */
data in_ds.&all_pairs_ds._meddra_h;
  length meddra_h_min_val 8;
  length meddra_h_max_val 8;
  length meddra_h_mean_val 8;
  %let macro_i=1;
  %do %while (&macro_i < 213); /* (1 + max drug degree) in the 2005 network */
    length node_id_2_&macro_i 8;
    %let macro_i = %eval(&macro_i + 1);
  %end;
  length hlt_code hlgt_code 8;
  length hlgt_code1 hlgt_code2 8;
  length hlgt_code11 hlgt_code12 hlgt_code21 hlgt_code22 8;
  length soc_code1 soc_code2 8;
  length soc_code11 soc_code12 soc_code21 soc_code22 8;
  set in_ds.&all_pairs_ds;
  array meddra_h_min_distances{212} 8;
  if (_N_ = 1) then do;
    declare hash neighborHash(dataset: "in_ds.&all_pairs_ds._2005edge_t");
    rc = neighborHash.definekey('node_id_1');
    rc = neighborHash.definedata(ALL: 'YES');
    neighborHash.definedone();
    declare hash hltHash(dataset: "meddra.mdhier_hlt_hlgt_t");
    rc = hltHash.definekey('hlt_code');
    rc = hltHash.definedata('hlgt_code1','hlgt_code2');
    hltHash.definedone();
    declare hash hlgtHash(dataset: "meddra.mdhier_hlgt_soc_t");
    rc = hlgtHash.definekey('hlgt_code');
    rc = hlgtHash.definedata('soc_code1','soc_code2');
    hlgtHash.definedone();
  end;
  meddra_h_min_val = .;
  meddra_h_max_val = .;
  meddra_h_mean_val = .;
  rc = neighborHash.find();
  if (rc NE 0) then do;
    put "Could not find PubChem_Compound_ID " node_id_1;
  end;
  else do;


    %let macro_i = 1;
    %do %while (&macro_i < 213);
      meddra_h_min_distances[&macro_i] = .;
      %let macro_i = %eval(&macro_i + 1);
    %end;
    hlt_code = node_id_2;
    rc1 = hltHash.find();
    if (rc1 NE 0) then do;
      put "Could not find hlt1 in hltHash " hlt_code;
    end;
    else do;
      hlgt_code11 = hlgt_code1;
      hlgt_code12 = hlgt_code2;
    end;
    %let macro_i = 1;
    %do %while (&macro_i < 213);
      if (node_id_2_&macro_i NE . AND node_id_2_&macro_i NE node_id_2) then do;
        /* second HLT */
        hlt_code = node_id_2_&macro_i;
        rc1 = hltHash.find();
        if (rc1 NE 0) then do;
          put "Could not find hlt2 in hltHash " hlt_code;
        end;
        else do;
          hlgt_code21 = hlgt_code1;
          hlgt_code22 = hlgt_code2;
        end;
        if ((hlgt_code11 EQ hlgt_code21) OR
            (hlgt_code22 NE . AND hlgt_code11 EQ hlgt_code22) OR
            (hlgt_code12 NE . AND hlgt_code12 EQ hlgt_code21) OR
            (hlgt_code12 NE . AND hlgt_code22 NE . AND hlgt_code12 EQ hlgt_code22)) then do;
          meddra_h_min_distances[&macro_i] = 2;
        end;
        else do;
          /* hlgt_code11 */
          hlgt_code = hlgt_code11;
          rc1 = hlgtHash.find();
          if (rc1 NE 0) then do;
            put "Could not find hlgt_code11 in hlgtHash " hlgt_code;
          end;
          soc_code11 = soc_code1;
          soc_code12 = soc_code2;


          /* hlgt_code21 */
          hlgt_code = hlgt_code21;
          rc1 = hlgtHash.find();
          if (rc1 NE 0) then do;
            put "Could not find hlgt_code21 in hlgtHash " hlgt_code;
          end;
          soc_code21 = soc_code1;
          soc_code22 = soc_code2;
          if ((soc_code11 EQ soc_code21) OR
              (soc_code22 NE . AND soc_code11 EQ soc_code22) OR
              (soc_code12 NE . AND soc_code12 EQ soc_code21) OR
              (soc_code12 NE . AND soc_code22 NE . AND soc_code12 EQ soc_code22)) then do;
            meddra_h_min_distances[&macro_i] = 4;
          end;
          /* hlgt_code22 */
          if (hlgt_code22 NE .) then do;
            hlgt_code = hlgt_code22;
            rc1 = hlgtHash.find();
            if (rc1 NE 0) then do;
              put "Could not find hlgt_code22 in hlgtHash " hlgt_code;
            end;
            soc_code21 = soc_code1;
            soc_code22 = soc_code2;
            if ((soc_code11 EQ soc_code21) OR
                (soc_code22 NE . AND soc_code11 EQ soc_code22) OR
                (soc_code12 NE . AND soc_code12 EQ soc_code21) OR
                (soc_code12 NE . AND soc_code22 NE . AND soc_code12 EQ soc_code22)) then do;
              meddra_h_min_distances[&macro_i] = 4;
            end;
          end;
          /* hlgt_code12 */
          if (hlgt_code12 NE .) then do;
            hlgt_code = hlgt_code12;
            rc1 = hlgtHash.find();
            if (rc1 NE 0) then do;
              put "Could not find hlgt_code12 in hlgtHash " hlgt_code;
            end;


            soc_code11 = soc_code1;
            soc_code12 = soc_code2;
            /* hlgt_code21 */
            hlgt_code = hlgt_code21;
            rc1 = hlgtHash.find();
            if (rc1 NE 0) then do;
              put "Could not find hlgt_code 21 in hlgtHash " hlgt_code;
            end;
            soc_code21 = soc_code1;
            soc_code22 = soc_code2;
            if ((soc_code11 EQ soc_code21) OR
                (soc_code22 NE . AND soc_code11 EQ soc_code22) OR
                (soc_code12 NE . AND soc_code12 EQ soc_code21) OR
                (soc_code12 NE . AND soc_code22 NE . AND soc_code12 EQ soc_code22)) then do;
              meddra_h_min_distances[&macro_i] = 4;
            end;
            /* hlgt_code22 */
            if (hlgt_code22 NE .) then do;
              hlgt_code = hlgt_code22;
              rc1 = hlgtHash.find();
              if (rc1 NE 0) then do;
                put "Could not find in hlgtHash " hlgt_code;
              end;
              soc_code21 = soc_code1;
              soc_code22 = soc_code2;
              if ((soc_code11 EQ soc_code21) OR
                  (soc_code22 NE . AND soc_code11 EQ soc_code22) OR
                  (soc_code12 NE . AND soc_code12 EQ soc_code21) OR
                  (soc_code12 NE . AND soc_code22 NE . AND soc_code12 EQ soc_code22)) then do;
                meddra_h_min_distances[&macro_i] = 4;
              end;
            end;
          end; /* if hlgt_code12 NE . */
        end; /* else do */
        if (meddra_h_min_distances[&macro_i] EQ .) then
          meddra_h_min_distances[&macro_i] = 6;
      end; /* if node_id_2_&macro_i NE node_id_2 */


      %let macro_i = %eval(&macro_i + 1);
    %end;
    meddra_h_min_val = min(of meddra_h_min_distances{*});
    meddra_h_max_val = max(of meddra_h_min_distances{*});
    meddra_h_mean_val = mean(of meddra_h_min_distances{*});
    meddra_h_min_nonmiss = N(of meddra_h_min_distances{*});
  end; /* else do */
  keep node_id_1 node_id_2 meddra_h_min_val meddra_h_max_val meddra_h_mean_val
       is_old_edge is_old_edge_class is_new_edge is_new_edge_class
       is_test_pair is_test_pair_class;
run;
quit;
%mend add_meddra_min_dist_hlt;

/*
 * Function to compute the distribution of MedDRA distances
 * in the neighborhood of a drug-ADE pair. This distribution is used
 * in the computation of covariate "meddra-KL" discussed in the
 * paper.
 *
 * all_pairs_ds is a list of all possible drug-ADE pairs
 * node_id_1: drug id (PubChem_ID)
 * node_id_2: AE id (HLT code)
 *
 * returns a new data set named <all_pairs_ds>_medb which also contains
 * the distribution of MedDRA distances in the neighborhood of each pair
 */
%macro add_meddra_min_dist_bins(all_pairs_ds= );
data in_ds.&all_pairs_ds._05e;
  set in_ds.&all_pairs_ds;
  where (is_old_edge EQ 1);
  keep node_id_1 node_id_2;
run;
quit;
proc sort data=in_ds.&all_pairs_ds._05e;
  by node_id_1;
run;
quit;
proc transpose data=in_ds.&all_pairs_ds._05e out=in_ds.&all_pairs_ds._05e_t prefix=node_id_2_;
  by node_id_1;
  var node_id_2;
run;
quit;


data in_ds.&all_pairs_ds._05e_t;
  set in_ds.&all_pairs_ds._05e_t;
  drop node_id_2 _NAME_ _LABEL_;
run;
quit;
data meddra.mdhier_hlt_hlgt;
  set meddra.mdhier;
  keep hlt_code hlgt_code;
run;
quit;
proc sort data=meddra.mdhier_hlt_hlgt noduprecs;
  by hlt_code hlgt_code;
run;
quit;
proc transpose data=meddra.mdhier_hlt_hlgt out=meddra.mdhier_hlt_hlgt_t prefix=hlgt_code;
  by hlt_code;
  var hlgt_code;
run;
quit;
data meddra.mdhier_hlt_hlgt_t;
  set meddra.mdhier_hlt_hlgt_t;
  drop _NAME_;
run;
quit;
data meddra.mdhier_hlgt_soc;
  set meddra.mdhier;
  keep hlgt_code soc_code;
run;
quit;
proc sort data=meddra.mdhier_hlgt_soc noduprecs;
  by hlgt_code soc_code;
run;
quit;
proc transpose data=meddra.mdhier_hlgt_soc out=meddra.mdhier_hlgt_soc_t prefix=soc_code;
  by hlgt_code;
  var soc_code;
run;
quit;
data meddra.mdhier_hlgt_soc_t;
  set meddra.mdhier_hlgt_soc_t;
  drop _NAME_;
run;
quit;
data in_ds.&all_pairs_ds._medb;
  length meddra_h_min_val 8;
  length meddra_h_max_val 8;
  length meddra_h_mean_val 8;
  length med_bin1-med_bin3 8;


  %let macro_i=1;
  %do %while (&macro_i < 213);
    length node_id_2_&macro_i 8;
    %let macro_i = %eval(&macro_i + 1);
  %end;
  length hlt_code hlgt_code 8;
  length hlgt_code1 hlgt_code2 8;
  length hlgt_code11 hlgt_code12 hlgt_code21 hlgt_code22 8;
  length soc_code1 soc_code2 8;
  length soc_code11 soc_code12 soc_code21 soc_code22 8;
  set in_ds.&all_pairs_ds;
  array meddra_h_min_distances{212} 8;
  if (_N_ = 1) then do;
    declare hash neighborHash(dataset: "in_ds.&all_pairs_ds._05e_t");
    rc = neighborHash.definekey('node_id_1');
    rc = neighborHash.definedata(ALL: 'YES');
    neighborHash.definedone();
    declare hash hltHash(dataset: "meddra.mdhier_hlt_hlgt_t");
    rc = hltHash.definekey('hlt_code');
    rc = hltHash.definedata('hlgt_code1','hlgt_code2');
    hltHash.definedone();
    declare hash hlgtHash(dataset: "meddra.mdhier_hlgt_soc_t");
    rc = hlgtHash.definekey('hlgt_code');
    rc = hlgtHash.definedata('soc_code1','soc_code2');
    hlgtHash.definedone();
  end;
  meddra_h_min_val = .;
  meddra_h_max_val = .;
  meddra_h_mean_val = .;
  med_bin1 = 0;
  med_bin2 = 0;
  med_bin3 = 0;
  rc = neighborHash.find();
  if (rc NE 0) then do;
    put "Could not find PubChem_Compound_ID " node_id_1;
  end;
  else do;
    %let macro_i = 1;
    %do %while (&macro_i < 213);
      meddra_h_min_distances[&macro_i] = .;
      %let macro_i = %eval(&macro_i + 1);
    %end;
    hlt_code = node_id_2;
    rc1 = hltHash.find();
    if (rc1 NE 0) then do;
      put "Could not find hlt1 in hltHash " hlt_code;


    end;
    else do;
      hlgt_code11 = hlgt_code1;
      hlgt_code12 = hlgt_code2;
    end;
    %let macro_i = 1;
    %do %while (&macro_i < 213);
      if (node_id_2_&macro_i NE . AND node_id_2_&macro_i NE node_id_2) then do;
        /* second HLT */
        hlt_code = node_id_2_&macro_i;
        rc1 = hltHash.find();
        if (rc1 NE 0) then do;
          put "Could not find hlt2 in hltHash " hlt_code;
        end;
        else do;
          hlgt_code21 = hlgt_code1;
          hlgt_code22 = hlgt_code2;
        end;
        if ((hlgt_code11 EQ hlgt_code21) OR
            (hlgt_code22 NE . AND hlgt_code11 EQ hlgt_code22) OR
            (hlgt_code12 NE . AND hlgt_code12 EQ hlgt_code21) OR
            (hlgt_code12 NE . AND hlgt_code22 NE . AND hlgt_code12 EQ hlgt_code22)) then do;
          meddra_h_min_distances[&macro_i] = 2;
        end;
        else do;
          /* hlgt_code11 */
          hlgt_code = hlgt_code11;
          rc1 = hlgtHash.find();
          if (rc1 NE 0) then do;
            put "Could not find hlgt_code11 in hlgtHash " hlgt_code;
          end;
          soc_code11 = soc_code1;
          soc_code12 = soc_code2;
          /* hlgt_code21 */
          hlgt_code = hlgt_code21;
          rc1 = hlgtHash.find();
          if (rc1 NE 0) then do;
            put "Could not find hlgt_code21 in hlgtHash " hlgt_code;
          end;
          soc_code21 = soc_code1;
          soc_code22 = soc_code2;
          if ((soc_code11 EQ soc_code21) OR


              (soc_code22 NE . AND soc_code11 EQ soc_code22) OR
              (soc_code12 NE . AND soc_code12 EQ soc_code21) OR
              (soc_code12 NE . AND soc_code22 NE . AND soc_code12 EQ soc_code22)) then do;
            meddra_h_min_distances[&macro_i] = 4;
          end;
          /* hlgt_code22 */
          if (hlgt_code22 NE .) then do;
            hlgt_code = hlgt_code22;
            rc1 = hlgtHash.find();
            if (rc1 NE 0) then do;
              put "Could not find hlgt_code22 in hlgtHash " hlgt_code;
            end;
            soc_code21 = soc_code1;
            soc_code22 = soc_code2;
            if ((soc_code11 EQ soc_code21) OR
                (soc_code22 NE . AND soc_code11 EQ soc_code22) OR
                (soc_code12 NE . AND soc_code12 EQ soc_code21) OR
                (soc_code12 NE . AND soc_code22 NE . AND soc_code12 EQ soc_code22)) then do;
              meddra_h_min_distances[&macro_i] = 4;
            end;
          end;
          /* hlgt_code12 */
          if (hlgt_code12 NE .) then do;
            hlgt_code = hlgt_code12;
            rc1 = hlgtHash.find();
            if (rc1 NE 0) then do;
              put "Could not find hlgt_code12 in hlgtHash " hlgt_code;
            end;
            soc_code11 = soc_code1;
            soc_code12 = soc_code2;
            /* hlgt_code21 */
            hlgt_code = hlgt_code21;
            rc1 = hlgtHash.find();
            if (rc1 NE 0) then do;
              put "Could not find hlgt_code 21 in hlgtHash " hlgt_code;
            end;
            soc_code21 = soc_code1;
            soc_code22 = soc_code2;


            if ((soc_code11 EQ soc_code21) OR
                (soc_code22 NE . AND soc_code11 EQ soc_code22) OR
                (soc_code12 NE . AND soc_code12 EQ soc_code21) OR
                (soc_code12 NE . AND soc_code22 NE . AND soc_code12 EQ soc_code22)) then do;
              meddra_h_min_distances[&macro_i] = 4;
            end;
            /* hlgt_code22 */
            if (hlgt_code22 NE .) then do;
              hlgt_code = hlgt_code22;
              rc1 = hlgtHash.find();
              if (rc1 NE 0) then do;
                put "Could not find in hlgtHash " hlgt_code;
              end;
              soc_code21 = soc_code1;
              soc_code22 = soc_code2;
              if ((soc_code11 EQ soc_code21) OR
                  (soc_code22 NE . AND soc_code11 EQ soc_code22) OR
                  (soc_code12 NE . AND soc_code12 EQ soc_code21) OR
                  (soc_code12 NE . AND soc_code22 NE . AND soc_code12 EQ soc_code22)) then do;
                meddra_h_min_distances[&macro_i] = 4;
              end;
            end;
          end; /* if hlgt_code12 NE . */
        end; /* else do */
        if (meddra_h_min_distances[&macro_i] EQ .) then
          meddra_h_min_distances[&macro_i] = 6;
      end; /* if node_id_2_&macro_i NE node_id_2 */
      %let macro_i = %eval(&macro_i + 1);
    %end;
    meddra_h_min_val = min(of meddra_h_min_distances{*});
    meddra_h_max_val = max(of meddra_h_min_distances{*});
    meddra_h_mean_val = mean(of meddra_h_min_distances{*});
    meddra_h_min_nonmiss = N(of meddra_h_min_distances{*});
    k = 1;
    do while (k <= dim(meddra_h_min_distances));
      if (meddra_h_min_distances[k] NE .) then do;


        if (meddra_h_min_distances[k] EQ 2) then med_bin1 = med_bin1 + 1;
        if (meddra_h_min_distances[k] EQ 4) then med_bin2 = med_bin2 + 1;
        if (meddra_h_min_distances[k] EQ 6) then med_bin3 = med_bin3 + 1;
      end;
      k = k+1;
    end;
    med_bin1 = med_bin1/meddra_h_min_nonmiss;
    med_bin2 = med_bin2/meddra_h_min_nonmiss;
    med_bin3 = med_bin3/meddra_h_min_nonmiss;
  end; /* else do */
  keep node_id_1 node_id_2 is_old_edge med_bin1-med_bin3 ;
run;
quit;
%mend add_meddra_min_dist_bins;
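For readers who do not use SAS, the arithmetic performed by the compute_kldist macro can be restated in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names, the tuple layout of the result, and the toy input are assumptions made here. As in the macro, each reference distribution is the bin-wise mean over a subset of pairs (old edges, non-edges, or all pairs), 0.000001 is added to every reference bin, and empty bins of the pair's own distribution are skipped.

```python
import math

EPS = 1e-6  # smoothing constant added to every reference bin, as in the SAS macro


def mean_distribution(rows):
    """Bin-wise mean of a non-empty list of equal-length binned distributions."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def kl_distance(p, ref):
    """Sum over non-empty bins of p_b * (log2(p_b) - log2(ref_b + EPS))."""
    return sum(
        pb * (math.log2(pb) - math.log2(rb + EPS))
        for pb, rb in zip(p, ref)
        if pb > 0
    )


def kl_covariates(rows, is_old_edge):
    """Return (kl0, kl1, kl01) per row: distance to the non-edge mean,
    the old-edge mean, and the mean over all pairs (hypothetical layout)."""
    edges = [r for r, e in zip(rows, is_old_edge) if e == 1]
    non_edges = [r for r, e in zip(rows, is_old_edge) if e == 0]
    ref1 = mean_distribution(edges)
    ref0 = mean_distribution(non_edges)
    ref01 = mean_distribution(rows)
    return [
        (kl_distance(r, ref0), kl_distance(r, ref1), kl_distance(r, ref01))
        for r in rows
    ]


# Toy three-bin distributions (in the spirit of med_bin1-med_bin3) for four pairs.
rows = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]
is_old_edge = [1, 1, 0, 0]
result = kl_covariates(rows, is_old_edge)
```

In the toy input, the first pair resembles the old-edge mean much more than the non-edge mean, so its kl1 value is far smaller than its kl0 value.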


SUPPLEMENTARY TABLES

Table S1. Definition of covariates. Variable $i$ denotes a drug, variable $j$ denotes an ADR, and $N(i)$ denotes the set of neighbors of node $i$. Definition of taxonomic covariates relies on the pre-computed ATC- and MedDRA-based distances $d_{ATC}$, $d_{MedDRA}$ discussed in the paper. The definition of intrinsic covariates relies on the pre-computed Euclidean distances $d_{INT}$ discussed in the paper.

| Covariate name | Covariate definition | Additional information |
|---|---|---|
| Network covariate | | |
| degree-prod | $X_1(i,j) = \mathrm{degree}(i) \times \mathrm{degree}(j)$ | |
| degree-sum | $X_2(i,j) = \mathrm{degree}(i) + \mathrm{degree}(j)$ | |
| degree-ratio | $X_3(i,j) = \mathrm{degree}(i) / \mathrm{degree}(j)$ | |
| degree-absdiff | $X_4(i,j) = \lvert \mathrm{degree}(i) - \mathrm{degree}(j) \rvert$ | |
| jackard-ADE-max | $X_5(i,j) = \max_{k \in N(i) - \{j\}} \{ J(j,k) \}$ | $J(j,k) = \lvert N(j) \cap N(k) \rvert / \lvert N(j) \cup N(k) \rvert$ denotes the Jaccard coefficient between the sets $N(j)$ and $N(k)$ |
| jackard-ADE-KL | $X_6(i,j)$: Kullback-Leibler (KL) distance between the distribution $D_{ae}(i,j)$ of the variable $J(i,k),\ k \in N(j) - \{i\}$ and a reference distribution | The reference distribution $D_{ae}$ was computed as the mean of distributions $D_{ae}(i,j)$ over the training edges $(i,j)$ |
| jackard-drug-max | $X_7(i,j) = \max_{k \in N(j) - \{i\}} \{ J(i,k) \}$ | |
| jackard-drug-KL | $X_8(i,j)$: KL distance between the distribution $D_{drug}(i,j)$ of the variable $J(j,k),\ k \in N(i) - \{j\}$ and a reference distribution | The reference distribution $D_{drug}$ was computed as the mean of distributions $D_{drug}(i,j)$ over the training edges $(i,j)$ |
| edge-density | $X_9(i,j)$: The edge density in the subgraph induced by $N(i) \cup N(j) - \{i,j\}$ | |
| Taxonomic covariate | | |
| atc-min | $X_{10}(i,j) = \min_{k \in N(j) - \{i\}} \{ d_{ATC}(i,k) \}$ | |
| atc-KL | $X_{11}(i,j)$: KL distance between the distribution $D_{ATC}(i,j)$ of the variable $d_{ATC}(i,k),\ k \in N(j) - \{i\}$ and a reference distribution | The reference distribution $D_{ATC}$ was computed as the mean of distributions $D_{ATC}(i,j)$ over the training edges $(i,j)$ |
| meddra-min | $X_{12}(i,j) = \min_{k \in N(i) - \{j\}} \{ d_{MedDRA}(j,k) \}$ | |
| meddra-KL | $X_{13}(i,j)$: KL distance between the distribution $D_{MedDRA}(i,j)$ of the variable $d_{MedDRA}(j,k),\ k \in N(i) - \{j\}$ and a reference distribution | The reference distribution $D_{MedDRA}$ was computed as the mean of distributions $D_{MedDRA}(i,j)$ over all training edges $(i,j)$ |
| Intrinsic covariate | | |
| euclid-min | $X_{14}(i,j) = \min_{k \in N(j) - \{i\}} \{ d_{INT}(i,k) \}$ | |
| euclid-KL | $X_{15}(i,j)$: KL distance between the distribution $D_{INT}(i,j)$ of the variable $d_{INT}(i,k),\ k \in N(j) - \{i\}$ and a reference distribution | The reference distribution $D_{INT}$ was computed as the mean of distributions $D_{INT}(i,j)$ over the training edges $(i,j)$ |
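Several of the definitions in Table S1 can be illustrated on a toy bipartite network. The following Python sketch is an assumption-laden illustration rather than the supplementary R or SAS code: the adjacency dictionary, the node labels, and all function names are hypothetical. The ATC distance follows the pattern visible in the SAS listing (2, 4, 6, 8, or 10 depending on how many leading ATC levels two codes share, minimized over all code pairs of two drugs); returning 0 for identical codes is an assumption made here.

```python
ATC_PREFIX_LENGTHS = (1, 3, 4, 5, 7)  # character lengths of ATC levels 1 to 5


def atc_distance(a, b):
    """10 minus twice the number of leading ATC levels shared by codes a and b."""
    shared = 0
    for n in ATC_PREFIX_LENGTHS:
        if a[:n] != b[:n]:
            break
        shared += 1
    return 10 - 2 * shared  # assumption: identical codes give 0


def drug_atc_distance(codes_a, codes_b):
    """Minimum ATC distance over all pairs of codes of two drugs."""
    return min(atc_distance(a, b) for a in codes_a for b in codes_b)


def jaccard(a, b):
    """Jaccard coefficient of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def degree_covariates(nbrs, i, j):
    """X1 to X4 of Table S1 for drug i and ADE j."""
    di, dj = len(nbrs[i]), len(nbrs[j])
    return {
        "degree-prod": di * dj,
        "degree-sum": di + dj,
        "degree-ratio": di / dj if dj else float("nan"),
        "degree-absdiff": abs(di - dj),
    }


def jaccard_ade_max(nbrs, i, j):
    """X5: max over k in N(i) - {j} of J(j, k); None if the set is empty."""
    return max((jaccard(nbrs[j], nbrs[k]) for k in nbrs[i] - {j}), default=None)


def min_neighbor_distance(nbrs, dist, i, j):
    """Pattern of X10 and X14: min over k in N(j) - {i} of dist(i, k)."""
    return min((dist(i, k) for k in nbrs[j] - {i}), default=None)


# Toy network: three drugs (d1-d3), two ADEs (a1, a2); edges are hypothetical.
nbrs = {
    "d1": {"a1", "a2"}, "d2": {"a1"}, "d3": {"a2"},
    "a1": {"d1", "d2"}, "a2": {"d1", "d3"},
}
# ATC codes taken from Table S2, assigned to the toy drugs for illustration only.
codes = {"d1": ["C07AB03"], "d2": ["C07AB04"], "d3": ["C09AA01"]}


def dist(x, y):
    return drug_atc_distance(codes[x], codes[y])


deg = degree_covariates(nbrs, "d2", "a2")
x5 = jaccard_ade_max(nbrs, "d2", "a2")
x10 = min_neighbor_distance(nbrs, dist, "d2", "a2")
```

For the candidate pair (d2, a2), the neighbors of a2 are d1 and d3; d1 shares the first four ATC levels with d2, so the atc-min style value is 2.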


Table S2. List of drugs and their ATC codes.

Drug name atc1 atc2 atc3 atc4 atc5 atc6 atc7 atc8 atc9 atc10 atc11

abacavir J05AF06

acamprosate N07BB03

acarbose A10BF01

acebutolol C07AB04

acenocoumarol B01AA07

acetaminophen N02BE01

acetazolamide S01EC01

acetic acid G01AD02 S02AA10

acetylcholine S01EB09

acetylcysteine R05CB01 S01XA08 V03AB23

acitretin D05BB02

acyclovir J05AB01 D06BB03 S01AD03

adapalene D10AD03

adefovir J05AF08

adenosine C01EB10

albendazole P02CA03

albuterol R03AC02 R03CC02

alclometasone D07AB10 S01BA10

alcohol (ethyl) D01AE06 D08AX08 V03AB16 V03AZ01

alendronate M05BA04

alfentanil N01AH02

alfuzosin G04CA01

allopurinol M04AA01

almotriptan N02CC05

alosetron A03AE01

alprostadil C01EA01 G04BE01

altretamine L01XX03

amantadine N04BB01

ambenonium N07AA30


amcinonide D07AC11

amifostine V03AF05

amikacin D06AX12 J01GB06 S01AA21

aminocaproic acid B02AA01

aminolevulinic acid L01XD04

aminophylline R03DA05

aminosalicylic acid J04AA01

amiodarone C01BD01

amitriptyline N06AA09

amlexanox A01AD07 R03DX01

amobarbital N05CA02

amoxapine N06AA17

amoxicillin J01CA04

amphotericin b (conventional) A01AB04 A07AA07 G01AA03 J02AA01

ampicillin J01CA01 S01AA19

amsacrine L01XX01

amyl nitrite V03AB22

anagrelide L01XX35

anastrozole L02BG03

apomorphine G04BE07 N04BC07

apraclonidine S01EA03

aprepitant A04AD12

aprotinin B02AB01

argatroban B01AE03

arginine B05XB01

aripiprazole N05AX12

arsenic trioxide L01XX27

ascorbic acid G01AD03 S01XA15

aspirin A01AD05 B01AC06 N02BA01

atazanavir J05AE08


atenolol C07AB03

atomoxetine N06BA09

atorvastatin C10AA05

atovaquone P01AX06

atracurium M03AC04

atropine A03BA01 S01FA01

auranofin M01CB03

azelaic acid D10AX03

azelastine R01AC03 R06AX19 S01GX07

azithromycin J01FA10 S01AA26

aztreonam J01DF01

bacitracin D06AX05 R02AB04

baclofen M03BX01

balsalazide A07EC04

beclomethasone A07EA07 D07AC15 R01AD01 R03BA01

benazepril C09AA07

bentoquatam W99WW99*

benzocaine C05AD03 D04AB04 N01BA05 R02AD01

benzonatate R05DB01

benzphetamine W99WW99*

benztropine N04AC01

betamethasone A07EA04 C05AA05 D07AC01 D07XC01 H02AB01 R01AD06 R03BA04 S01BA06 S01CB04 S02BA07 S03BA03

betaxolol C07AB05 S01ED02

bethanechol N07AB02

bexarotene L01XX25

bezafibrate C10AB02

bicalutamide L02BB03

bimatoprost S01EE03

bismuth A07BB01

bisoprolol C07AB07

bivalirudin B01AE06


bleomycin L01DC01

bortezomib L01XX32

bosentan C02KX01

brimonidine S01EA05

brinzolamide S01EC04

bromazepam N05BA08

bromfenac S01BC11

bromocriptine G02CB01 N04BC01

brompheniramine R06AB01

budesonide A07EA06 D07AC09 R01AD05 R03BA02

bumetanide C03CA02

bupivacaine N01BB01

buprenorphine N02AE01 N07BC01

bupropion N06AX12

buspirone N05BE01

busulfan L01AB01

butabarbital N05CA23

butenafine D01AE23

butoconazole G01AF15

butorphanol N02AF01

cabergoline G02CB03 N04BC06

caffeine N06BC01

calcipotriene D05AX02

calcitriol A11CC04 D05AX03

calcium acetate A12AA12

calcium chloride A12AA07 B05XA07 G04BA03

candesartan C09CA06

capecitabine L01BC06

capreomycin J04AB30

captopril C09AA01

carbachol N07AB01 S01EB02


carbidopa W99WW99*

carbinoxamine R06AA08

carboprost tromethamine G02AD04

carisoprodol M03BA02

carmustine L01AD01

carteolol C07AA15 S01ED05

carvedilol C07AG02

caspofungin J02AX04

cefaclor J01DC04

cefadroxil J01DB05

cefdinir J01DD15

cefditoren J01DD16

cefepime J01DE01

cefixime J01DD08

cefotaxime J01DD01

cefotetan J01DC05

cefoxitin J01DC01

cefpodoxime J01DD13

cefprozil J01DC10

ceftazidime J01DD02

ceftibuten J01DD14

ceftizoxime J01DD07

cefuroxime J01DC02

celecoxib L01XX33 M01AH01

cephalexin J01DB01

cetirizine R06AE07

cetrorelix H01CC02

cevimeline N07AX03

chloral hydrate N05CC01

chlorambucil L01AA02

chloramphenicol D06AX02 D10AF03 G01AA05 J01BA01 S01AA01 S02AA01 S03AA08


chloroprocaine N01BA04

chloroquine P01BA01

chlorothiazide C03AA04

chlorpheniramine R06AB04

chlorpromazine N05AA01

chlorpropamide A10BB02

chlorthalidone C03BA04

chlorzoxazone M03BB03

cholecalciferol A11CC05

ciclopirox D01AE14 G01AX12

cidofovir J05AB12

cilazapril C09AA08

cilostazol B01AC31

cimetidine A02BA01

cinacalcet H05BX01

ciprofloxacin J01MA02 S01AX13 S02AA15 S03AA07

cisapride A03FA02

cisatracurium M03AC11

citalopram N06AB04

cladribine L01BB04

clarithromycin J01FA09

clemastine D04AA14 R06AA04

clindamycin D10AF01 G01AA10 J01FF01

clobazam N05BA09

clobetasol D07AD01

clocortolone D07AB21

clodronate M05BA02

clofarabine L01BB06

clomiphene G03GB02

clomipramine N06AA04

clopidogrel B01AC04


clorazepate N05BA05

clotrimazole A01AB18 D01AC01 G01AF02

cloxacillin J01CF02

clozapine N05AH02

cocaine N01BC01 R02AD03 S01HA01 S02DA02

codeine R05DA04

colchicine M04AC01

colesevelam C10AC04

colestipol C10AC02

colistimethate A07AA10 J01XB01

corticotropin H01AA01

cosyntropin H01AA03

crotamiton D04AX01

cyanocobalamin B03BA01

cyclizine R06AE03

cyclobenzaprine M03BX08

cyclopentolate S01FA04

cyclophosphamide L01AA01

cycloserine J04AB01

cyclosporine L04AD01 S01XA18

cyproheptadine R06AX02

cyproterone G03HA01

cysteamine A16AA04

cysteine R05CB16 S01XA21 V03AB36

dacarbazine L01AX04

danazol G03XA01

dantrolene M03CA01

dapsone J04BA02

darifenacin G04BD10

deferoxamine V03AC01

delavirdine J05AG02


demeclocycline D06AA01 J01AA01

desflurane N01AB07

desipramine N06AA01

desloratadine R06AX27

desmopressin H01BA02

desonide D07AB08 S01BA11

desoximetasone D07AC03 D07XC02

dexamethasone A01AC02 C05AA09 D07AB19 D07XB05 D10AA03 H02AB02 R01AD03 S01BA01 S01CB01 S02BA06 S03BA01

dexmedetomidine N05CM18

dexrazoxane V03AF02

dextroamphetamine N06BA02

dextromethorphan R05DA09

diazepam N05BA01

diazoxide C02DA01 V03AH01

dibucaine C05AD04 D04AB02 N01BB06 S01HA06

diclofenac D11AX18 M01AB05 M02AA15 S01BC03

dicloxacillin J01CF01

dicyclomine A03AA07

didanosine J05AF02

diethylpropion A08AA03

diflorasone D07AC10

diflunisal N02BA11

digoxin C01AA05

dihydroergotamine N02CA01

diltiazem C08DB01

dimenhydrinate R06AA02

dimethyl sulfoxide G04BX13 M02AX03

dinoprostone G02AD02

diphenhydramine D04AA32 R06AA02

dipivefrin S01EA02


dipyridamole B01AC07

disopyramide C01BA03

disulfiram N07BB01 P03AA04

dobutamine C01CA07

docetaxel L01CD02

docosanol D06BB11

dofetilide C01BD04

dolasetron A04AA04

domperidone A03FA03

donepezil N06DA02

dopamine C01CA04

dorzolamide S01EC03

doxapram R07AB01

doxazosin C02CA04

doxepin N06AA12

doxorubicin L01DB01

doxycycline A01AB22 J01AA02

dronabinol A04AD10

droperidol N01AX01 N05AD08

dutasteride G04CB02

dyclonine N01BX02 R02AD04

dyphylline R03DA01

echothiophate iodide S01EB03

econazole D01AC03 G01AF05

edrophonium W99WW99*

efavirenz J05AG03

eletriptan N02CC06

emedastine S01GX06

emtricitabine J05AF09

enalapril C09AA02

enflurane N01AB04


enfuvirtide J05AX07

entacapone N04BX02

entecavir J05AF10

epinastine R06AX24 S01GX10

epirubicin L01DB03

eplerenone C03DA04

epoprostenol B01AC09

eprosartan C09CA02

eptifibatide B01AC16

ergocalciferol A11CC01

ergoloid mesylates C04AE01

ergonovine G02AB03

ergotamine N02CA02

erlotinib L01XE03

ertapenem J01DH03

erythromycin D10AF02 J01FA01 S01AA17

escitalopram N06AB10

esmolol C07AB09

estazolam N05CD04

estradiol G03CA03

estramustine L01XX11

eszopiclone N05CF04

ethacrynic acid C03CC01

ethambutol J04AK02

ethanolamine oleate C05BB01

ethionamide J04AD03

ethosuximide N03AD01

ethotoin N03AB01

etodolac M01AB08

etomidate N01AX07

etoposide L01CB01


exemestane L02BG06

exenatide A10BX04

ezetimibe C10AX09

famciclovir J05AB09 S01AD07

famotidine A02BA03

felbamate N03AX10

felodipine C08CA02

fenofibrate C10AB05

fenoldopam C01CA19

fenoprofen M01AE04

fexofenadine R06AX26

finasteride D11AX10 G04CB01

flavoxate G04BD02

flecainide C01BC04

floxuridine L01BC54

fluconazole D01AC15 J02AC01

flucytosine D01AE21 J02AX01

fludarabine L01BB05

fludrocortisone H02AA02

flumazenil V03AB25

flunisolide R01AD04 R03BA03

fluocinonide C05AA11 D07AC08

fluorometholone C05AA06 D07AB06 D07XB04 D10AA01 S01BA07 S01CB05

fluorouracil L01BC02

fluoxymesterone G03BA01

flupenthixol N05AF01

fluphenazine N05AB02

flurandrenolide D07AC07

flurazepam N05CD01

flurbiprofen M01AE09 M02AA19 S01BC04

flutamide L02BB01


fluvastatin C10AA04

fluvoxamine N06AB08

folic acid B03BB01

fomepizole V03AB34

formoterol R03AC13

fosamprenavir J05AE07

foscarnet J05AD01

fosfomycin J01XX01

fosinopril C09AA09

fosphenytoin N03AB05

frovatriptan N02CC07

fulvestrant L02BA03

furosemide C03CA01

fusidic acid D06AX01 D09AA02 J01XC01 S01AA13

gabapentin N03AX12

gadopentetate dimeglumine V08CA01

galantamine N06DA04

gallium nitrate V03AG02

ganciclovir J05AB06 S01AD09

gatifloxacin J01MA16 S01AX21

gefitinib L01XE02

gemcitabine L01BC05

gemfibrozil C10AB04

gemifloxacin J01MA15

gentamicin D06AX07 J01GB03 S01AA11 S02AA14 S03AA06

gentian violet D01AE02

glatiramer acetate L03AX13

gliclazide A10BB09

glimepiride A10BB12

glipizide A10BB07

glutamine A16AA03


glyburide A10BB01

glycerin A06AG04 A06AX01

glycopyrrolate A03AB02

goserelin L02AE03

granisetron A04AA02

griseofulvin D01AA08 D01BA01

guanabenz C02KX04

halobetasol D07AC21

haloperidol N05AD01

hexachlorophene D08AE01

homatropine S01FA05

hydralazine C02DB02

hydrochlorothiazide C03AA03

hydrocortisone A01AC03 A07EA02 C05AA01 D07AA02 D07XA01 H02AB09 S01BA02 S01CB03 S02BA01

hydroxocobalamin B03BA03 V03AB33

hydroxychloroquine P01BA02

hydroxyurea L01XX05

hydroxyzine N05BB01

hyoscyamine A03BA03

ibandronate M05BA06

ibuprofen C01EB16 G02CC01 M01AE01 M02AA13

ibutilide C01BD05

ifosfamide L01AA06

iloprost B01AC11

imatinib L01XE01

imipramine N06AA02

imiquimod D06BB10

inamrinone C01CE01

indapamide C03BA11

indinavir J05AE02


indomethacin C01EB03 M01AB01 M02AA23 S01BC01

ipratropium R01AX03 R03BB01

irbesartan C09CA04

irinotecan L01XX19

isocarboxazid N06AF01

isoflurane N01AB06

isoniazid J04AC01

isoproterenol C01CA02 R03AB02 R03CB01

isosorbide dinitrate C01DA08 C05AE02

isosorbide mononitrate C01DA14

isotretinoin D10AD04 D10BA01

isradipine C08CA03

itraconazole J02AC02

ivermectin P02CF01

kanamycin A07AA08 J01GB04 S01AA24

ketamine N01AX03

ketoconazole D01AC08 G01AF11 J02AB02

ketoprofen M01AE03 M02AA10

ketorolac M01AB15 S01BC05

ketotifen R06AX17 S01GX08

labetalol C07AG01

lactic acid G01AD01

lactulose A06AD11

lansoprazole A02BC03

latanoprost S01EE01

leflunomide L04AA13

lepirudin B01AE02

letrozole L02BG04

leuprolide L02AE02

levetiracetam N03AX14


levobunolol S01ED03

levocarnitine A16AA01

levofloxacin J01MA12 S01AX19

levonorgestrel G03AC03

levorphanol N02AX53

levothyroxine H03AA01

lidocaine C01BB01 C05AD01 D04AB01 N01BB02 R02AD02 S01HA07 S02DA01

lincomycin J01FF02

lindane P03AB02

linezolid J01XX08

liothyronine H03AA02

liotrix H03AA06

lisinopril C09AA03

lithium N05AN01

lomustine L01AD02

loperamide A07DA03

loratadine R06AX13

losartan C09CA01

lovastatin C10AA02

loxapine N05AH01

magnesium oxide A02AA02 A06AD02 A12CC10

magnesium sulfate A06AD04 A12CC02 B05XA05 D11AX05 V04CC02

malathion P03AX03

mannitol A06AD16 B05BC01 B05CX04

maprotiline N06AA21

mebendazole P02CA01

mecamylamine C02BB01

mechlorethamine L01AA05

meclizine A04AB04 R06AE05

medroxyprogesterone G03AC06 G03DA02 L02AB02

mefenamic acid M01AG01


mefloquine P01BC02

megestrol G03AC05 G03DB02 L02AB01

meloxicam M01AC06

melphalan L01AA03

memantine N06DX01

mepenzolate A03AB12

meperidine N02AB02

mephobarbital N03AA01

mepivacaine N01BB03

meprobamate N05BC01

mercaptopurine L01BB02

meropenem J01DH02

mesalamine A07EC02

metaproterenol R03AB03 R03CB03 R03CB53

metaxalone M03AX02

methadone N07BC02

methamphetamine N06BA03

methazolamide S01EC05

methimazole H03BB02

methocarbamol M03BA03

methohexital N01AF01 N05CA15

methotrexate L01BA01 L04AX03

methotrimeprazine N05AA02

methoxsalen D05AD02 D05BA02

methsuximide N03AD03

methyclothiazide C03AA08

methyldopa C02AB01

methylergonovine G02AB01

methylphenidate N06BA04

methylprednisolone D07AA01 D10AA02 H02AB04


metipranolol S01ED04

metoclopramide A03FA01

metolazone C03BA08

metoprolol C07AB02

metyrapone V04CD01

metyrosine C02KB01

mexiletine C01BB02

micafungin J02AX05

midazolam N05CD08

midodrine C01CA17

mifepristone G03XB01

miglitol A10BF02

miglustat A16AX06

milrinone C01CE02

minocycline A01AB23 J01AA08

minoxidil C02DC01 D11AX01

mirtazapine N06AX11

misoprostol A02BB01 G02AD06

mitomycin L01DC03

mitotane L01XX23

mitoxantrone L01DB07

moclobemide N06AG02

modafinil N06BA07

moexipril C09AA13

molindone N05AE02

monobenzone D11AX13

montelukast R03DC03

moxifloxacin J01MA14 S01AX22

mupirocin D06AX09 R01AX06

nabilone A04AD11

nabumetone M01AX01


nadolol C07AA12

nafarelin H01CA02

nafcillin J01CF06

naftifine D01AE22

nalbuphine N02AF02

naloxone V03AB15

naltrexone N07BB04

nandrolone A14AB01 S01XA11

naproxen G02CC02 M01AE02 M02AA12

naratriptan N02CC02

natamycin A01AB10 A07AA03 D01AA02 G01AA02 S01AA10

nateglinide A10BX03

nedocromil R01AC07 R03BC03 S01GX04

nefazodone N06AX06

nelfinavir J05AE04

neomycin A01AB08 A07AA01 B05CA09 D06AX04 J01GB05 R02AB01 S01AA03 S02AA07 S03AA01

neostigmine N07AA01 S01EB06

nevirapine J05AG01

niacin C10AD02

nicardipine C08CA04

nicotine N07BA01

nifedipine C08CA05

nilutamide L02BB02

nisoldipine C08CA07

nitazoxanide P01AX11

nitisinone A16AX04

nitrazepam N05CD02

nitric oxide R07AX01

nitrofurantoin J01XE01

nitroglycerin C01DA02 D03AX07

nitroprusside C02DD01

nizatidine A02BA04

norepinephrine C01CA03

norethindrone G03AC01 G03DC02

norfloxacin J01MA06 S01AX12

nortriptyline N06AA10

nystatin A07AA02 D01AA01 G01AA01

octreotide H01CB02

ofloxacin J01MA01 S01AX11

olmesartan C09CA08

olopatadine R01AC08 S01GX09

olsalazine A07EC03

ondansetron A04AA01

orlistat A08AB01

orphenadrine M03BC01 N04AB02

oseltamivir J05AH02

oxacillin J01CF04

oxaliplatin L01XA03

oxandrolone A14AA08

oxaprozin M01AE12

oxazepam N05BA04

oxiconazole D01AC11 G01AF17

oxybutynin G04BD04

oxymetazoline R01AA05 R01AB07 S01GA04

oxymorphone N02AX54

oxytocin H01BB02

paclitaxel L01CD01

palonosetron A04AA05

pamidronate M05BA03

pancuronium M03AC01

pantoprazole A02BC02

papaverine A03AD01 G04BE02

paricalcitol A11CC07

paromomycin A07AA06

pemetrexed L01BA04

pemirolast S01GX52

penbutolol C07AA23

penciclovir D06BB06 J05AB13

penicillamine M01CC01

pentamidine P01CX01

pentazocine N02AD01

pentostatin L01XX08

pentoxifylline C04AD03

permethrin P03AC04

perphenazine N05AB03

phenazopyridine G04BX06

phendimetrazine A08AX02

phenelzine N06AF03

phenoxybenzamine C04AX02

phentermine A08AA01

phentolamine C04AB01 G04BE05

phenylephrine C01CA06 R01AA04 R01AB01 R01BA03 S01FB01 S01GA05

phenytoin N03AB02

physostigmine S01EB05 V03AB19

phytonadione B02BA01

pilocarpine N07AX01 S01EB01

pimecrolimus D11AX15

pimozide N05AG02

pindolol C07AA03

pioglitazone A10BG03

piperacillin J01CA12

piperazine P02CB01

pipotiazine N05AC04

pirbuterol R03AC08 R03CC07

piroxicam M01AC01 M02AA07 S01BC06

porfimer L01XD01

potassium chloride A12BA01 B05XA01

pralidoxime V03AB04

pramipexole N04BC05

pravastatin C10AA03

praziquantel P02BA01

prazosin C02CA01

prednicarbate D07AC18

prednisolone A07EA01 C05AA04 D07AA03 D07XA02 H02AB06 R01AD02 S01BA04 S01CB02 S02BA03 S03BA02

prednisone A07EA03 H02AB07

prilocaine N01BB04

primaquine P01BA03

primidone N03AA03

probenecid M04AB01

probucol C10AX02

procainamide C01BA02

procaine C05AD05 N01BA02 S01HA05

procarbazine L01XB01

prochlorperazine N05AB04

procyclidine N04AA04

progesterone G03DA04

promethazine D04AA10 R06AD02

propafenone C01BC03

propantheline A03AB05

proparacaine S01HA04

propofol N01AX10

propoxyphene N02AC04

propranolol C07AA05

propylthiouracil H03BA02

protriptyline N06AA11

pseudoephedrine R01BA02

pyrazinamide J04AK01

pyridostigmine N07AA02

pyridoxine A11HA02

pyrimethamine P01BD01

quazepam N05CD10

quinapril C09AA06

rabeprazole A02BC04

raloxifene G03XC01

raltitrexed L01BA03

ramelteon N05CH02

ramipril C09AA05

ranitidine A02BA02

remifentanil N01AH06

repaglinide A10BX02

reserpine C02AA02

ribavirin J05AB04

riboflavin A11HA04

rifabutin J04AB04

rifampin J04AB02

rifapentine J04AB05

rifaximin A07AA11 D06AX11

riluzole N07XX02

rimantadine J05AC02

rimexolone H02AB12 S01BA13

risedronate M05BA07

risperidone N05AX08

ritonavir J05AE03

rivastigmine N06DA03

rizatriptan N02CC04

rocuronium M03AC09

ropinirole N04BC04

ropivacaine N01BB09

rosiglitazone A10BG02

rosuvastatin C10AA07

salicylic acid D01AE12 S01BC08

salmeterol R03AC12

salsalate N02BA06

saquinavir J05AE01

secobarbital N05CA06

selegiline N04BD01

selenium sulfide D01AE13

sertaconazole D01AC14

sertraline N06AB06

sevelamer V03AE02

sevoflurane N01AB08

sibutramine A08AA10

sildenafil G04BE03

silver sulfadiazine D06BA01

simvastatin C10AA01

sirolimus L04AA10

sodium bicarbonate B05CB04 B05XA02

solifenacin G04BD08

sotalol C07AA07

spironolactone C03DA01

stavudine J05AF04

streptomycin A07AA04 J01GA01

streptozocin L01AD04

succimer V09CA02 V09IA03

succinylcholine M03AB01

sucralfate A02BX02

sulfacetamide S01AB04

sulfasalazine A07EC01

sulfisoxazole J01EB05 S01AB02

sulindac M01AB02

suramin P01CX02

tacrolimus D11AX14 L04AD02

tadalafil G04BE08

tamoxifen L02BA01

tamsulosin G04CA02

tazarotene D05AX05

tegaserod A03AE02

telithromycin J01FA15

telmisartan C09CA07

temazepam N05CD07

temozolomide L01AX03

teniposide L01CB02

tenofovir J05AF07

terazosin G04CA03

terbinafine D01AE15 D01BA02

terbutaline R03AC03 R03CC03

terconazole G01AG02

teriparatide H05AA02

testosterone G03BA03

tetrabenazine N07XX06

tetracycline A01AB13 D06AA04 J01AA07 S01AA09 S02AA08 S03AA02

thalidomide L04AX02

theophylline R03DA04

thiamine A11DA01

thioguanine L01BB03

thiopental N01AF03 N05CA19

thioridazine N05AC02

thiothixene N05AF04

tiaprofenic acid M01AE11

ticlopidine B01AC05

tigecycline J01AA12

tiludronate M05BA05

timolol C07AA06 S01ED01

tinidazole J01XD02 P01AB02

tioconazole D01AC07 G01AF08

tiotropium R03BB04

tipranavir J05AE09

tirofiban B01AC17

tobramycin J01GB01 S01AA12

tolazamide A10BB05

tolbutamide A10BB03 V04CA01

tolcapone N04BX01

tolmetin M01AB03 M02AA21

tolnaftate D01AE18

tolterodine G04BD07

topiramate N03AX11

topotecan L01XX17

toremifene L02BA02

torsemide C03CA04

trandolapril C09AA10

tranexamic acid B02AA02

tranylcypromine N06AF04

travoprost S01EE04

treprostinil B01AC21

triamcinolone A01AC01 D07AB09 D07XB02 H02AB08 R01AD11 R03BA06 S01BA05

triamterene C03DB02

triazolam N05CD05

trifluoperazine N05AB06

trifluridine S01AD02

trihexyphenidyl N04AA01

trimethobenzamide A04AD55

trimethoprim J01EA01

trimipramine N06AA06

tropicamide S01FA06

trospium G04BD09

urea A10BB32

urofollitropin G03GA04

ursodiol A05AA02

valrubicin L01DB09

valsartan C09CA03

vancomycin A07AA09 J01XA01

vardenafil G04BE09

vasopressin H01BA01

vecuronium M03AC03

venlafaxine N06AX16

verapamil C08DA01

verteporfin S01LA01

vigabatrin N03AG04

vinblastine L01CA01

vincristine L01CA02

vindesine L01CA03

vinorelbine L01CA04

vitamin a A11CA01

vitamin e A11HA03

voriconazole J02AC03

warfarin B01AA03

yohimbine G04BE04

zafirlukast R03DC01

zaleplon N05CF03

zanamivir J05AH01

zidovudine J05AF01

zileuton R03DX08

ziprasidone N05AE04

zoledronic acid M05BA08

zolmitriptan N02CC03

zolpidem N05CF02

zonisamide N03AX15

zopiclone N05CF01

zuclopenthixol N05AF05

Note: * indicates a placeholder code that we created in the case of four drugs for which we were unable to determine an ATC code.
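Each row of Table S2 pairs a drug name with one or more seven-character ATC codes, and the first letter of a code gives the ATC top-level group used in Fig. S1. The following is a minimal Python sketch of how such rows could be read. The function names `parse_atc_row` and `top_level_groups` and the regular expression are illustrative assumptions and are not part of the supplementary source code files.

```python
import re

# Assumed shape of a level-5 ATC code: letter, two digits, two letters, two digits.
ATC_CODE = re.compile(r"^[A-Z]\d{2}[A-Z]{2}\d{2}$")


def parse_atc_row(row):
    """Split one Table S2 row into (drug name, list of ATC codes)."""
    tokens = row.split()
    # Codes trail the name, so walk back from the end while tokens look like codes.
    split_at = len(tokens)
    while split_at > 0 and ATC_CODE.match(tokens[split_at - 1]):
        split_at -= 1
    name = " ".join(tokens[:split_at])
    codes = tokens[split_at:]
    return name, codes


def top_level_groups(codes):
    """Return the sorted, de-duplicated first letters of the given ATC codes."""
    return sorted({code[0] for code in codes})


name, codes = parse_atc_row("magnesium sulfate A06AD04 A12CC02 B05XA05 D11AX05 V04CC02")
print(name, codes, top_level_groups(codes))
```

Placeholder codes marked with * would not match the assumed pattern and would need separate handling.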

Table S3. Number of missing observations for PubChem properties extracted for this study. The total number of drugs in the study was 809. *XLogP3 and Tautomer Count were excluded from the study due to the missing values.

Property name                            Number of missing observations
Molecular Weight                         0
XLogP3*                                  41
H Bond Donor                             0
H Bond Acceptor                          0
Rotatable Bond Count                     0
Tautomer Count*                          336
Topol Polar Surface Area                 0
Heavy Atom Count                         0
Formal Charge                            0
Complexity                               0
Isotope Atom Count                       0
Defined Atom StereoCenter (SC) Count     0
Undefined Atom SC Count                  0
Defined Bond SC Count                    0
Undefined Bond SC Count                  0
Covalently Bonded (CB) Unit Count        0

Table S4. Number of missing observations for DrugBank properties extracted for this study. The total number of drugs in the study was 809. Melting Point and Half Life were excluded from the study owing to the missing values. The remaining two properties (Exp LogP Hydrophobicity and Protein Binding) were initially included in the study through data imputation. The effect of the imputed data on the predictive performance was assessed by excluding these two properties from the model as well.

Property name                Number of missing observations
Exp LogP Hydrophobicity      91
Protein Binding              218
Melting Point                261
Half Life                    450

Table S5. Intercorrelation analysis of all covariates. The highest positive and negative Pearson correlations are bolded.

Columns, left to right: (1) degree‐prod, (2) degree‐absdiff, (3) jackard‐ADE‐max, (4) jackard‐drug‐max, (5) jackard‐ADE‐KL, (6) jackard‐drug‐KL, (7) atc‐min, (8) meddra‐min, (9) atc‐KL, (10) meddra‐KL, (11) euclid‐min.

Covariate           (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)    (10)   (11)
degree‐absdiff      0.55
jackard‐ADE‐max     0.71   0.64
jackard‐drug‐max    0.5    0.4    0.57
jackard‐ADE‐KL      ‐0.62  ‐0.37  ‐0.77  ‐0.55
jackard‐drug‐KL     ‐0.36  ‐0.2   ‐0.4   ‐0.73  0.45
atc‐min             ‐0.47  ‐0.36  ‐0.61  ‐0.51  0.66   0.37
meddra‐min          ‐0.27  ‐0.23  ‐0.28  ‐0.41  0.25   0.37   0.22
atc‐KL              ‐0.15  ‐0.08  ‐0.18  ‐0.12  0.24   0.17   ‐0.02  0.05
meddra‐KL           ‐0.09  ‐0.04  ‐0.04  ‐0.16  0.06   0.21   0.02   0.17   0.02
euclid‐min          ‐0.08  ‐0.06  ‐0.11  ‐0.16  0.12   0.15   0.14   0.05   0.04   0
euclid‐KL           ‐0.37  ‐0.21  ‐0.53  ‐0.48  0.67   0.49   0.56   0.18   0.31   0.02   0.17
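For readers who want to reproduce this kind of intercorrelation analysis, the sketch below computes pairwise Pearson correlations and returns them in the same lower-triangular layout as Table S5. It is an illustration only: the helper names `pearson` and `lower_triangle` are assumptions, and the covariate values are toy numbers, not study data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lower_triangle(columns):
    """Map (row name, column name) to r for every pair below the diagonal.

    `columns` is a dict of covariate name -> list of values, in table order.
    """
    names = list(columns)
    result = {}
    for i in range(1, len(names)):
        for j in range(i):
            result[(names[i], names[j])] = pearson(columns[names[i]], columns[names[j]])
    return result


# Toy values chosen for illustration; they are not taken from the study.
toy = {
    "degree-prod": [1.0, 2.0, 3.0, 4.0],
    "degree-absdiff": [2.0, 4.0, 6.0, 8.0],
    "atc-min": [4.0, 3.0, 2.0, 1.0],
}
for pair, r in lower_triangle(toy).items():
    print(pair, round(r, 2))
```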

Table S6. Prediction case studies. The selected drug‐ADE pairs represent some prominent drug‐ADE associations newly discovered during the period of 2006 to 2010.

Drug name ADE Score Spec (score) PPV (score)

Aprotinin Anaphylaxis 0.15895 0.99 0.67

Ibandronate Osteonecrosis 0.05089 0.91 0.09

Norfloxacin Tendon ruptures 0.20902 0.95 0.32

Rosiglitazone Myocardial infarction 0.04632 0.97 0.33

Saquinavir Torsade de pointes 0.08833 0.94 0.49

Saquinavir Electrocardiogram QT prolonged 0.04748 0.89 0.39

Tegaserod Stroke 0.24184 0.98 0.05

Zonisamide Suicidal ideation 0.16818 0.93 0.04
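The columns Spec (score) and PPV (score) appear to report the specificity and positive predictive value obtained when the listed score is used as the classification cutoff. Under that reading, a minimal sketch of the calculation is given below. The function name `spec_and_ppv`, the use of `>=` at the cutoff, and the toy scores are all assumptions made for illustration.

```python
def spec_and_ppv(scores, labels, threshold):
    """Specificity and PPV when pairs scoring at or above `threshold` are called edges.

    `labels` holds 1 for an observed edge and 0 for a non-edge.
    The `>=` comparison is an assumption; the source does not state tie handling.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_edge = score >= threshold
        if predicted_edge and label == 1:
            tp += 1
        elif predicted_edge and label == 0:
            fp += 1
        elif not predicted_edge and label == 0:
            tn += 1
        else:
            fn += 1
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return specificity, ppv


# Toy scores and labels, not study data.
toy_scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 0, 1, 0, 0, 0]
print(spec_and_ppv(toy_scores, toy_labels, threshold=0.25))
```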

Table S7. List of supplementary source code files.

File name Comments

meddra_mapping_code.sas SAS code to perform MedDRA mapping

NET_INT_covariates.R R code to compute network and intrinsic covariates

TAX_covariates.sas SAS code to compute taxonomic covariates

SUPPLEMENTARY FIGURES

[Figure S1: bar chart. x-axis: ATC top-level group (A, B, C, D, G, H, J, L, M, N, P, R, S, V); y-axis: mean number of AEs.]

Fig. S1. Newly associated ADEs per drug in each ATC top‐level group. ATC top‐level groups: A, alimentary tract and metabolism; B, blood and blood forming organs; C, cardiovascular system; D, dermatologicals; G, genito‐urinary system and sex hormones; H, systemic hormonal preparations; J, anti‐infectives for systemic use; L, antineoplastic and immunomodulating agents; M, musculo‐skeletal system; N, nervous system; P, antiparasitic products, insecticides and repellents; R, respiratory system; S, sensory organs; V, various. Data are means and error bars represent 95% CIs.

[Figure S2: bar chart. x-axis: MedDRA top-level group; y-axis: mean number of drugs.]

Fig. S2. Newly associated drugs per ADE in each MedDRA top‐level group. MedDRA top‐level groups: blo, blood and lymphatic system disorders; car, cardiac disorders; con, congenital, familial and genetic disorders; ear, ear and labyrinth disorders; end, endocrine disorders; eye, eye disorders; gas, gastrointestinal disorders; gen, general disorders and administration site conditions; hep, hepatobiliary disorders; imm, immune system disorders; inf, infections and infestations; inj, injury, poisoning and procedural complications; inv, investigations; met, metabolism and nutrition disorders; mus, musculoskeletal and connective tissue disorders; neo, neoplasms benign, malignant and unspecified; ner, nervous system disorders; pre, pregnancy, puerperium and perinatal conditions; psy, psychiatric disorders; ren, renal and urinary disorders; rep, reproductive system and breast disorders; res, respiratory, thoracic and mediastinal disorders; ski, skin and subcutaneous tissue disorders; sur, surgical and medical procedures; vas, vascular disorders. Data are means and error bars represent 95% CIs.

[Figure S3: six histograms, panels A to F. x-axis: score; y-axis: percent. Panel A: Mean = 0.03, Std = 0.09. Panel B: Mean = 0.46, Std = 0.36. Panel C: Mean = 0.04, Std = 0.09. Panel D: Mean = 0.25, Std = 0.17. Panel E: Mean = 0.05, Std = 0.06. Panel F: Mean = 0.14, Std = 0.07.]

Fig. S3. Comparative histograms of scores for the observed edges and non‐edges by the three model types. (A and B) NET model, non‐edges (A) and edges (B). (C and D) TAX model, non‐edges (C) and edges (D). (E and F) INT model, non‐edges (E) and edges (F).

[Figure S4: two three-way Venn diagrams, one titled "True positives" and one titled "False positives".]

Fig. S4. Three‐way Venn diagrams for the sets of true positives and false positives generated by models NET, TAX, and INT. Specificity was fixed at 0.95.
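To illustrate what fixing specificity at 0.95 involves, the sketch below picks a score cutoff from the non-edge scores and then counts the seven regions of a three-way Venn diagram for the resulting prediction sets. This is a simplified illustration, not the authors' procedure: the function names, the rule that pairs scoring strictly above the cutoff are predicted as edges, and all toy values are assumptions.

```python
import math


def threshold_at_specificity(non_edge_scores, specificity):
    """Return a cutoff such that at least `specificity` of non-edges score at or below it.

    Pairs scoring strictly above the cutoff are treated as predicted edges (assumption).
    """
    ordered = sorted(non_edge_scores)
    k = math.ceil(specificity * len(ordered)) - 1
    return ordered[max(k, 0)]


def venn_regions(net, tax, intr):
    """Sizes of the seven disjoint regions of a three-way Venn diagram."""
    return {
        "NET only": len(net - tax - intr),
        "TAX only": len(tax - net - intr),
        "INT only": len(intr - net - tax),
        "NET and TAX only": len((net & tax) - intr),
        "NET and INT only": len((net & intr) - tax),
        "TAX and INT only": len((tax & intr) - net),
        "all three": len(net & tax & intr),
    }


# Toy non-edge scores 0.00, 0.01, ..., 0.19; not study data.
toy_non_edges = [i / 100 for i in range(20)]
cutoff = threshold_at_specificity(toy_non_edges, 0.95)
false_positives = [s for s in toy_non_edges if s > cutoff]
print(cutoff, len(false_positives))

# Toy sets of pair identifiers flagged by each model.
print(venn_regions({1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6}))
```

Applying `venn_regions` once to the true positives and once to the false positives of the three models would give the counts for the two diagrams.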

[Figure S5: paired histograms in two columns titled "Pairs predicted as non‐edges" and "Pairs predicted as edges". x-axes: degree-prod, degree-absdiff, jackard-ae-max, jackard-ae-KL; y-axis: percent.]

Fig. S5. Comparative histograms of selected network covariates for the predicted edges and non‐edges. The predictions were generated by fixing the specificity of model NET at 0.95.

[Figure S6: paired histograms. x-axes: atc-min, atc-KL; y-axis: percent.]

Fig. S6. Comparative histograms of selected taxonomic covariates for the predicted edges and non‐edges. The predictions were generated by fixing the specificity of model TAX at 0.95.

[Figure S7: paired histograms. x-axes: euclid-min, euclid-KL; y-axis: percent.]

Fig. S7. Comparative histograms of the intrinsic covariates for the predicted edges and non‐edges. The predictions were generated by fixing the specificity of model INT at 0.95.

[Figure S8: two panels. (A) x-axis: ATC top-level category (A, B, C, D, G, H, J, L, M, N, P, R, S, V); y-axis: AUROC, 0.0 to 1.0. (B) x-axis: number of newly associated ADEs; y-axis: AUROC, 0.0 to 1.0; legend: AUROC, regression line.]

Fig. S8. Drug‐specific AUROCs. (A) AUROCs were grouped by ATC top‐level category. Group means and 95% CIs (error bars) are shown in red. (B) AUROC was plotted against the number of newly associated ADEs. A regression line with slope of 0.00003 and P = 0.86 (F test) is shown.
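As an illustration of the two quantities in this figure, the sketch below computes an AUROC from the scores of positive and negative pairs and fits an ordinary least-squares slope of AUROC against a count. It is a minimal sketch under stated assumptions: the names `auroc` and `ols_slope` are hypothetical, ties are counted as one half, the toy numbers are not study data, and the F test for the slope is not reproduced.

```python
def auroc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Toy scores for one drug: newly associated ADEs (positives) versus the rest.
print(auroc([0.9, 0.4], [0.5, 0.3, 0.1]))

# Toy per-drug values: number of new ADEs versus AUROC.
print(ols_slope([10, 20, 30], [0.8, 0.8, 0.8]))
```

A slope near zero, as in the second toy call, corresponds to AUROC not varying with the count on the x-axis.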

[Figure S9: two panels. (A) x-axis: MedDRA top-level category; y-axis: AUROC, 0.0 to 1.0. (B) x-axis: number of newly associated drugs; y-axis: AUROC, 0.0 to 1.0; legend: AUROC, regression line.]

Fig. S9. ADE‐specific AUROCs. (A) AUROCs were grouped by MedDRA top‐level category. Group means and 95% CIs (error bars) are shown in red. (B) AUROC was plotted against the number of newly associated drugs. A regression line with slope of ‐0.0015 and P < 0.0001 (F test) is shown.
