www.sciencetranslationalmedicine.org/cgi/content/full/3/114/114ra127/DC1
Supplementary Materials for
Predicting Adverse Drug Events Using Pharmacological Network Models
Aurel Cami,* Alana Arnold, Shannon Manzi, Ben Reis
*To whom correspondence should be addressed. E-mail: [email protected]
Published 21 December 2011, Sci. Transl. Med. 3, 114ra127 (2011)
DOI: 10.1126/scitranslmed.3002774
This PDF file includes:
Methods
Table S1. Definition of covariates.
Table S2. List of drugs and their ATC codes.
Table S3. Number of missing observations for PubChem properties extracted for this study.
Table S4. Number of missing observations for DrugBank properties extracted for this study.
Table S5. Intercorrelation analysis of covariates.
Table S6. Prediction case studies.
Table S7. List of supplementary source code files.
Fig. S1. Newly associated ADEs per drug in each ATC top-level group.
Fig. S2. Newly associated drugs per ADE in each MedDRA top-level group.
Fig. S3. Comparative histograms of scores for the observed edges and non-edges by the three model types.
Fig. S4. Three-way Venn diagrams for the sets of true and false positives generated by models NET, TAX, and INT.
Fig. S5. Comparative histograms of selected network covariates for the predicted edges and non-edges.
Fig. S6. Comparative histograms of selected taxonomic covariates for the predicted edges and non-edges.
Fig. S7. Comparative histograms of the intrinsic covariates for the predicted edges and non-edges.
Fig. S8. Drug-specific AUROCs.
Fig. S9. ADE-specific AUROCs.
Other Supplementary Material for this manuscript includes the following: (available at www.sciencetranslationalmedicine.org/cgi/content/full/3/114/114ra127/DC1)
File “meddra_mapping_code.sas” (SAS code to perform MedDRA mapping).
File “NET_INT_covariates.R” (R code to compute network and intrinsic covariates).
File “TAX_covariates.sas” (SAS code to compute taxonomic covariates).
File “Fig2-highres.tif” (high-resolution version of Fig. 2).
Supplementary Methods
Mapping ADE names to MedDRA taxonomy
We employed the following approach to map ADE names to the Medical Dictionary for
Regulatory Activities (MedDRA) terminology. First, we performed exact matching of each ADE
name against the lowest level terms (LLTs) of MedDRA. This step matched approximately 40% of
the unique ADE names to LLTs. Next, for each non-matched ADE name we
identified the two closest LLTs in terms of the string generalized edit distance (computed using
the function COMPGED in the Statistical Analysis System (SAS) v9.2). Computer code to perform
exact matching and to identify the two closest LLTs of an ADE name is provided below and as
a supplementary online file (meddra_mapping_code.sas; see table S7). Then, we manually
scanned the list of non-matched ADE names and their two closest LLTs and were able to
determine a match between an ADE name and one of its two closest LLTs for approximately
half of that list. We coded the remaining ADE names (approximately 30% of all unique names),
which were still unmatched after the preceding step, by performing term-based searches using a
MedDRA browser. After all ADE names had been mapped to the MedDRA LLT level, we
identified the unique preferred term (PT) corresponding to each LLT. Finally, we identified the
high level terms (HLTs) that corresponded to each PT generated by the preceding step. In this
study, all adverse events were represented by their MedDRA HLT codes.
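The first two mapping steps can be illustrated with a minimal Python sketch. It is not the authors' code: it substitutes a plain Levenshtein distance for the SAS COMPGED generalized edit distance (which applies different costs to different edit operations), and the function names edit_distance and map_ade_names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance; a stand-in for SAS COMPGED, not an equivalent."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character of a
                            curr[j - 1] + 1,      # insert a character of b
                            prev[j - 1] + cost))  # substitute (or keep) a character
        prev = curr
    return prev[-1]


def map_ade_names(ade_names, llt_names):
    """Return (exact matches, two closest LLT candidates for each unmatched name)."""
    llt_set = set(llt_names)
    exact, candidates = {}, {}
    for name in ade_names:
        if name in llt_set:
            # Step 1: exact (case-sensitive) match against the LLT list.
            exact[name] = name
        else:
            # Step 2: keep the two LLTs with the smallest edit distance
            # so that a human reviewer can pick one or reject both.
            ranked = sorted(llt_names, key=lambda llt: edit_distance(name, llt))
            candidates[name] = ranked[:2]
    return exact, candidates


if __name__ == "__main__":
    exact, candidates = map_ade_names(["Nausea", "Headach"],
                                      ["Nausea", "Headache", "Rash"])
    print(exact)
    print(candidates)
```

The candidate lists only feed the manual review step described above; names with no acceptable candidate would still need a term-based search in a MedDRA browser.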
Source code
Any reuse of all or part of this code must reference this publication. The corresponding SAS
and R files are provided as supplementary online material.
MedDRA mapping SAS code
/*****************************************************************
Macro to exact-match ADE names to MedDRA LLT names.

"in_ds" should be a SAS library containing two kinds of input files:

First, it should contain a list of unique ADE names occurring in the
drug-ADE database. This list is assumed to have been stored in a SAS
data set named "<YEAR>_aes", where YEAR is 2005 or 2010. This data set
contains one column named "ae_name".

Second, the library should contain a list of unique LLT names
occurring in MedDRA. This list is assumed to be stored in a SAS data
set named "<YEAR>_unique_llts_meddra". This data set should contain
one column named "llt_name" as well as other columns including the
MedDRA information pertaining to an LLT, such as the pt_code, pt_name,
and so on.
******************************************************************/
%macro exact_match(year= );
  proc sort data=in_ds._&year._aes;
    by ae_name;
  run; quit;

  proc sort data=in_ds._&year._unique_llts_meddra;
    by llt_name;
  run; quit;

  data out_ds._&year._aes_meddra;
    merge in_ds._&year._aes(in=a)
          in_ds._&year._unique_llts_meddra(in=b rename=(llt_name=ae_name));
    by ae_name;
    if(a);
  run; quit;
%mend exact_match;

/******************************************************************
Macro to compute the smallest GED distance between ADE names and
MedDRA LLT names.

"in_ds" should be a SAS library containing two input SAS data sets:

First, this library is supposed to contain the list of ADE names that
were not exact-matched after running the previous macro. This list is
assumed to be stored in the SAS data set
"<YEAR>_unique_aes_llt_nomatch" where YEAR is either 2005 or 2010.
This data set contains one column named "ae_name".

Second, this library is supposed to contain the list of all unique LLT
names in MedDRA. This list is supposed to be stored in a SAS data set
named "llt_lltname_only". This file has only one column named
"llt_name".

out_ds: is a library that will contain output file(s) produced by the
macro.

To limit the running time to a few hours, this macro should be run on
a cluster, with each computing node processing a portion of the ADE
names contained in the input file "<YEAR>_unique_aes_llt_nomatch".
This portion is defined by the macro variables:
  "jobnum": taking values 1,...
  "rows_per_job": number of ADE names in the job
  "total_rows": total number of ADE names to be processed

One output file per job will be produced. These partial output files
should be merged together at the end.

Each output file produced by a job contains the following fields:
  AE_name
  min_llt1: the closest LLT name in terms of GED
  min_score1: the GED between AE_name and min_llt1
  min_llt2: the second closest LLT name
  min_score2: the GED between AE_name and min_llt2
Notes on running time: The GED computation for all the ADE names
occurring in the drug-ADE database in our study took a few hours using
the Orchestra cluster (http://ritg.med.harvard.edu/cluster.html) and a
few hundred jobs.
******************************************************************/

/* The macro variables below should be re-defined as needed before
   submitting each job (e.g. using an external script). The values
   below are given for illustration purposes only */
%let jobnum=1;
%let rows_per_job=20;
%let total_rows=1000;

%macro find_min_ged_llt(year= ,num_llts= );
  /* create a local macro variable per LLT name */
  data _null_;
    set in_ds.llt_lltname_only;
    call symputx(cats("llt_name_", _N_), llt_name, "L");
  run; quit;

  /* find two closest LLTs to each ADE name */
  data out_ds._&year._unique_aes_geds_&jobnum;
    length min_llt1 min_llt2 $ 255;
    length min_score1 min_score2 8;
    set in_ds._&year._unique_aes_llt_nomatch;
    start_ob = (&jobnum - 1)* &rows_per_job + 1;
    end_ob = &jobnum * &rows_per_job;
    if (end_ob > &total_rows) then end_ob = &total_rows;
    if (_N_ < start_ob OR _N_ > end_ob) then delete;
    else do;
      *put "_N_ = " _N_;
      i = 1;
      min_score1 = 100000; min_score2 = 100000; /* very large values */
      min_llt1 = ""; min_llt2 = "";
      do while (i <= &num_llts);
        llt_name_i = SYMGET(cats("llt_name_", i));
        score = COMPGED(ae_name, llt_name_i);
        if (score < min_score1) then do;
          min_score2 = min_score1;
          min_score1 = score;
          min_llt2 = min_llt1;
          min_llt1 = llt_name_i;
        end;
        else if (score < min_score2) then do;
          min_score2 = score;
          min_llt2 = llt_name_i;
        end;
        i = i+1;
      end;
    end;
    keep AE_name min_llt1 min_score1 min_llt2 min_score2;
  run; quit;
%mend find_min_ged_llt;

/* example calls */
*%exact_match(year=2010);
*%find_min_ged_llt(year=2010, num_llts=67159);
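The way the macro variables jobnum, rows_per_job, and total_rows split the unmatched ADE names across cluster jobs can be sketched in Python as follows. This is an illustrative sketch only; the helper names job_row_range and num_jobs are hypothetical and do not appear in the supplementary files.

```python
def job_row_range(jobnum: int, rows_per_job: int, total_rows: int):
    """1-based inclusive row range for a job, mirroring start_ob and end_ob in the SAS macro."""
    start_ob = (jobnum - 1) * rows_per_job + 1
    end_ob = min(jobnum * rows_per_job, total_rows)  # the last job may be shorter
    return start_ob, end_ob


def num_jobs(total_rows: int, rows_per_job: int) -> int:
    """Number of jobs needed so that every row is processed exactly once."""
    return (total_rows + rows_per_job - 1) // rows_per_job


if __name__ == "__main__":
    # Illustration values taken from the %let statements above.
    total_rows, rows_per_job = 1000, 20
    for jobnum in (1, 2, num_jobs(total_rows, rows_per_job)):
        print(jobnum, job_row_range(jobnum, rows_per_job, total_rows))
```

Each job keeps only the rows inside its range, so concatenating the per-job output files reproduces one result row per unmatched ADE name.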
NET, INT covariates R code
#######################################################################
# The following functions require R libraries "network" and "sna".
# One way to acquire these libraries is to install the "statnet" suite
# of packages (www.statnetproject.org).
#
# Note on running time: The functions listed below were executed for
# 689,268 drug-ADE pairs in the network discussed in the paper
#
# This task was carried out using the Orchestra cluster
# (http://ritg.med.harvard.edu/cluster.html), with a few hundred jobs
# in parallel and took about one day to complete.
######################################################################

#######################################################################
# Function to compute the covariate "euclid-min" discussed in the
# paper.
#
# G is a network object
# node1 denotes the PubChem_Compound_ID of a drug
# node2 denotes the HLT code (MedDRA) of an ADE
#
# All properties of drugs and ADEs should have been stored as vertex
# attributes of the network object G
#
# This function returns a vector of length two. The first element of
# this vector denotes the value of covariate "euclid-min"
######################################################################
compute.value.euclid.dist.features = function(G, node1, node2) {
  all.attr.names = list.vertex.attributes(G)
  quant.attr.names = setdiff(all.attr.names,
    c("node_id","drug_name","DrugCard_ID","na",
      "PubChem_Compound_ID","stitch_compound_name1","vertex.names"))
  n.drugs = network.size(G)
  n.attributes = length(quant.attr.names)
  attr.mat = matrix(-999, nrow=n.drugs, ncol=n.attributes)
  for (i in 1:n.attributes) {
    attr.vec = get.vertex.attribute(G, quant.attr.names[i])
    attr.mat[,i] = attr.vec
  }
  attrs.node1 = attr.mat[node1, ]
  N2 = get.neighborhood(G, node2)
  N2 = setdiff(N2, node1)
  result.value.attr = numeric(2)
  min.feature.name = "min_euclid_dist"
  mean.feature.name = "mean_euclid_dist"
  names(result.value.attr) = c(min.feature.name, mean.feature.name)
  if (length(N2) == 0) {
    result.value.attr[min.feature.name] = 0
    result.value.attr[mean.feature.name] = 0
  } else {
    attr.mat.node2 = attr.mat[N2, ]
    merged.attr.mat = rbind(attrs.node1, attr.mat.node2)
    dist.mat = as.matrix(dist(merged.attr.mat))
    dist.vec = dist.mat[1,2:nrow(merged.attr.mat)]
    result.value.attr[min.feature.name] = min(dist.vec)
    result.value.attr[mean.feature.name] = mean(dist.vec)
  }
  return(result.value.attr)
}

#######################################################################
# Function to compute the distribution of Euclidean distances
# in the neighborhood of a drug-ADE pair. This distribution is used
# in the computation of covariate "euclid-KL" discussed in the
# paper.
#
# G is a network object
# node1 denotes the PubChem_Compound_ID of a drug
# node2 denotes the HLT code (MedDRA) of an ADE
#
# All properties of drugs and ADEs should have been stored as vertex
# attributes of the network object G
#
# This function returns a discretized version of the distribution
# of Euclidean distances in the neighborhood of pair (node1, node2)
######################################################################
compute.value.euclid.dist.features.full = function(G, node1, node2) {
  all.attr.names = list.vertex.attributes(G)
  quant.attr.names = setdiff(all.attr.names,
    c("node_id","drug_name","DrugCard_ID","na",
      "PubChem_Compound_ID","stitch_compound_name1","vertex.names"))
  n.drugs = network.size(G)
  n.attributes = length(quant.attr.names)
  attr.mat = matrix(-999, nrow=n.drugs, ncol=n.attributes)
  for (i in 1:n.attributes) {
    attr.vec = get.vertex.attribute(G, quant.attr.names[i])
    attr.mat[,i] = attr.vec
  }
  attrs.node1 = attr.mat[node1, ]
  N2 = get.neighborhood(G, node2)
  N2 = setdiff(N2, node1)
  nbins = 20
  result.value.attr = numeric(nbins)
  bin.names = character(nbins)
  for (i in 1:nbins) {
    bin.names[i] = paste("euclid_bin",i,sep="")
  }
  names(result.value.attr) = bin.names
  breaks.vec = c(seq(from=0, by=40, length.out=20), 10^10)
  if (length(N2) == 0) {
    result.value.attr=rep(0,20)
  } else {
    attr.mat.node2 = attr.mat[N2, ]
    merged.attr.mat = rbind(attrs.node1, attr.mat.node2)
    dist.mat = as.matrix(dist(merged.attr.mat))
    dist.vec = dist.mat[1,2:nrow(merged.attr.mat)]
    histogram.obj = hist(dist.vec, breaks=breaks.vec, plot=FALSE)
    result.value.attr = histogram.obj$density
  }
  return(result.value.attr)
}

#######################################################################
# Function to compute the covariates "degree-prod" and
# "degree-absdiff" discussed in the paper.
#
# G is a network object
# node1 denotes the PubChem_Compound_ID of a drug
# node2 denotes the HLT code (MedDRA) of an ADE
#
# All properties of drugs and ADEs should have been stored as vertex
# attributes of the network object G
#
# This function returns a vector of length four. The third and second
# elements of this vector denote "degree-prod" and "degree-absdiff",
# respectively.
######################################################################
compute.degree.features = function(G, node1, node2, D) {
  #D = degree(G, gmode="graph")
  D.node1 = D[node1]
  D.node2 = D[node2]
  result.degree = numeric(4)
  result.degree[1:4] <- NA
  names(result.degree) = c('degree_sum', 'degree_absdiff',
                           'degree_prod', 'degree_ratio')
  result.degree['degree_sum'] = D.node1 + D.node2
  result.degree['degree_absdiff'] = abs(D.node1 - D.node2)
  result.degree['degree_prod'] = D.node1 * D.node2
  result.degree['degree_ratio'] = D.node1/D.node2
  return(result.degree)
}

#######################################################################
# Function to compute the covariate "jackard-drug-max"
# discussed in the paper.
#
# G is a network object
# node1 denotes the PubChem_Compound_ID of a drug
# node2 denotes the HLT code (MedDRA) of an ADE
#
# All properties of drugs and ADEs should have been stored as vertex
# attributes of the network object G
#
# This function returns a vector of length three. The second
# element of this vector denotes covariate "jackard-drug-max"
######################################################################
compute.jackard.drug.features = function(G, node1, node2) {
  N1 = get.neighborhood(G, node1)
  N2 = get.neighborhood(G, node2)
  N2 = setdiff(N2, node1)
  n.neighbors = length(N2)
  result.jackard.drug = numeric(3)
  result.jackard.drug[1:3] <- NA
  names(result.jackard.drug) =
    c('jackard_drug_min','jackard_drug_max','jackard_drug_mean')
  if (n.neighbors == 0) {
    result.jackard.drug['jackard_drug_min'] = 0
    result.jackard.drug['jackard_drug_max'] = 0
    result.jackard.drug['jackard_drug_mean'] = 0
  } else {
    jackard.vector = numeric(n.neighbors)
    for (i in 1:n.neighbors) {
      neighbor.i = N2[i]
      N1.i = get.neighborhood(G, neighbor.i)
      intersection.i = intersect(N1, N1.i)
      union.i = union(N1, N1.i)
      jackard.vector[i] = length(intersection.i)/length(union.i)
    }
    result.jackard.drug['jackard_drug_min'] = min(jackard.vector)
    result.jackard.drug['jackard_drug_max'] = max(jackard.vector)
    result.jackard.drug['jackard_drug_mean'] = mean(jackard.vector)
  }
  return(result.jackard.drug)
}

#######################################################################
# Function to compute the distribution of Jaccard coefficients in the
# neighborhood of a drug-ADE pair. This distribution is used in the
# computation of covariate "jackard-drug-KL" discussed in the paper.
#
# G is a network object
# node1 denotes the PubChem_Compound_ID of a drug
# node2 denotes the HLT code (MedDRA) of an ADE
#
# All properties of drugs and ADEs should have been stored as vertex
# attributes of the network object G
#
# This function returns a discretized version of the distribution
# of Jaccard coefficients in the neighborhood of pair (node1, node2)
#######################################################################
compute.jackard.drug.features.full = function(G, node1, node2) {
  N1 = get.neighborhood(G, node1)
  N2 = get.neighborhood(G, node2)
  N2 = setdiff(N2, node1)
  n.neighbors = length(N2)
  nbins = 20
  result.jackard.drug = numeric(nbins)
  bin.names = character(nbins)
  for (i in 1:nbins) {
    bin.names[i] = paste("jackard_drugs_bin",i,sep="")
  }
  breaks.vec = seq(from=0, by=0.05, length.out=21)
  if (length(N2) == 0) {
    result.jackard.drug=rep(0,20)
  } else {
    jackard.vector = numeric(n.neighbors)
    for (i in 1:n.neighbors) {
      neighbor.i = N2[i]
      N1.i = get.neighborhood(G, neighbor.i)
      intersection.i = intersect(N1, N1.i)
      union.i = union(N1, N1.i)
      jackard.vector[i] = length(intersection.i)/length(union.i)
    }
    histogram.obj = hist(jackard.vector, breaks=breaks.vec, plot=FALSE)
    result.jackard.drug = histogram.obj$density
    #cat("sum ", sum(result.jackard.drug), "\n"); flush.console()
  }
  names(result.jackard.drug) = bin.names
  return(result.jackard.drug)
}

#######################################################################
# Function to compute the covariate "jackard-ADE-max" discussed in the
# paper.
#
# G is a network object
# node1 denotes the PubChem_Compound_ID of a drug
# node2 denotes the HLT code (MedDRA) of an ADE
#
# All properties of drugs and ADEs should have been stored as vertex
# attributes of the network object G
#
# This function returns a vector of length three. The second element
# of this vector denotes covariate "jackard-ADE-max"
#######################################################################
compute.jackard.ae.features = function(G, node1, node2) {
  N1 = get.neighborhood(G, node1)
  N2 = get.neighborhood(G, node2)
  N1 = setdiff(N1, node2)
  n.neighbors = length(N1)
  result.jackard.ae = numeric(3)
  result.jackard.ae[1:3] <- NA
  names(result.jackard.ae) =
    c('jackard_ae_min','jackard_ae_max','jackard_ae_mean')
  if (n.neighbors == 0) {
    result.jackard.ae['jackard_ae_min'] = 0
    result.jackard.ae['jackard_ae_max'] = 0
    result.jackard.ae['jackard_ae_mean'] = 0
  } else {
    jackard.vector = numeric(n.neighbors)
    for (i in 1:n.neighbors) {
      neighbor.i = N1[i]
      N2.i = get.neighborhood(G, neighbor.i)
      intersection.i = intersect(N2, N2.i)
      union.i = union(N2, N2.i)
      jackard.vector[i] = length(intersection.i)/length(union.i)
    }
    result.jackard.ae['jackard_ae_min'] = min(jackard.vector)
    result.jackard.ae['jackard_ae_max'] = max(jackard.vector)
    result.jackard.ae['jackard_ae_mean'] = mean(jackard.vector)
  }
  return(result.jackard.ae)
}

#######################################################################
# Function to compute the distribution of Jaccard coefficients
# in the neighborhood of a drug-ADE pair. This distribution is used
# in the computation of covariate "jackard-ADE-KL" discussed in the
# paper.
#
# G is a network object
# node1 denotes the PubChem_Compound_ID of a drug
# node2 denotes the HLT code (MedDRA) of an ADE
#
# All properties of drugs and ADEs should have been stored as vertex
# attributes of the network object G
#
# This function returns a discretized version of the distribution
# of Jaccard coefficients in the neighborhood of pair (node1, node2)
#######################################################################
compute.jackard.ae.features.full = function(G, node1, node2) {
  N1 = get.neighborhood(G, node1)
  N2 = get.neighborhood(G, node2)
  N1 = setdiff(N1, node2)
  n.neighbors = length(N1)
  nbins = 20
  result.jackard.ae = numeric(nbins)
  bin.names = character(nbins)
  for (i in 1:nbins) {
    bin.names[i] = paste("jackard_aes_bin",i,sep="")
  }
  breaks.vec = seq(from=0, by=0.05, length.out=21)
  if (n.neighbors == 0) {
    result.jackard.ae=rep(0,20)
  } else {
    jackard.vector = numeric(n.neighbors)
    for (i in 1:n.neighbors) {
      neighbor.i = N1[i]
      N2.i = get.neighborhood(G, neighbor.i)
      intersection.i = intersect(N2, N2.i)
      union.i = union(N2, N2.i)
      jackard.vector[i] = length(intersection.i)/length(union.i)
    }
    histogram.obj = hist(jackard.vector, breaks=breaks.vec, plot=FALSE)
    result.jackard.ae = histogram.obj$density
  }
  names(result.jackard.ae) = bin.names
  return(result.jackard.ae)
}

#######################################################################
# Function to compute the covariate "edge-density" discussed in the
# paper.
#
# G is a network object
# node1 denotes the PubChem_Compound_ID of a drug
# node2 denotes the HLT code (MedDRA) of an ADE
#
# All properties of drugs and ADEs should have been stored as vertex
# attributes of the network object G
#
# This function returns a vector of length one denoting the covariate
# "edge-density"
#######################################################################
compute.edge.density.features = function(G, node1, node2) {
  N1 = get.neighborhood(G, node1)
  N2 = get.neighborhood(G, node2)
  numer = sum(G[N2,N1])
  denom = length(N1)*length(N2)
  result.edge.dens = numeric(1)
  result.edge.dens[1] <- NA
  names(result.edge.dens) = c('edge_dens')
  result.edge.dens['edge_dens'] = numer/denom
  return(result.edge.dens)
}
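The logic of the Jaccard-based drug covariates in the R functions above can be restated as a short, language-neutral Python sketch. It assumes the bipartite drug-ADE network is stored as a plain dictionary of neighbor sets rather than a "network" object, and the names jaccard and jaccard_drug_features are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |a & b| / |a | b|; defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_drug_features(neighbors: dict, drug: str, ade: str) -> dict:
    """Min, max, and mean Jaccard similarity between `drug` and the other drugs linked to `ade`.

    `neighbors` maps every node (drug or ADE) to the set of nodes adjacent to it.
    """
    drug_ades = neighbors[drug]              # ADEs already associated with the drug
    other_drugs = neighbors[ade] - {drug}    # drugs associated with the ADE, excluding `drug`
    if not other_drugs:
        # Same convention as the R code: all features are 0 when there are no neighbors.
        return {"min": 0.0, "max": 0.0, "mean": 0.0}
    scores = [jaccard(drug_ades, neighbors[d]) for d in other_drugs]
    return {"min": min(scores), "max": max(scores), "mean": sum(scores) / len(scores)}


if __name__ == "__main__":
    # Toy network: three drugs (d1..d3) and three ADEs (a1..a3).
    toy = {
        "d1": {"a1", "a2"}, "d2": {"a1", "a2", "a3"}, "d3": {"a3"},
        "a1": {"d1", "d2"}, "a2": {"d1", "d2"}, "a3": {"d2", "d3"},
    }
    print(jaccard_drug_features(toy, "d1", "a3"))
```

In the toy network, d1 shares two of three ADEs with d2 and none with d3, so the maximum feature for the pair (d1, a3) is driven by d2. The ADE-side covariates follow the same pattern with the roles of drugs and ADEs swapped.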
TAX covariates SAS code
/*
 * Function to compute the covariate "atc-min" discussed in the
 * paper.
 *
 * all_pairs_ds is a list of all possible drug-ADE pairs
 * node_id_1: drug id (PubChem_ID)
 * node_id_2: AE id (HLT code)
 *
 * returns a new data set named <all_pairs_ds>_atc which also contains
 * the value of "atc-min" covariate for each drug-ADE pair
 */
%macro add_ATC_codes_min(all_pairs_ds= );
  %local macro_i;

  /* create hash tables */
  data in_ds.&all_pairs_ds._2005edge;
    set in_ds.&all_pairs_ds;
    where (is_old_edge EQ 1);
    keep node_id_1 node_id_2;
  run; quit;

  proc sort data=in_ds.&all_pairs_ds._2005edge;
    by node_id_2;
  run; quit;

  proc transpose data=in_ds.&all_pairs_ds._2005edge
                 out=in_ds.&all_pairs_ds._2005edge_t
                 prefix=node_id_1_;
    by node_id_2;
    var node_id_1;
  run; quit;
  data in_ds.&all_pairs_ds._2005edge_t;
    set in_ds.&all_pairs_ds._2005edge_t;
    drop node_id_1 _NAME_ _LABEL_;
  run; quit;

  /* perform ATC distance computations */
  data in_ds.&all_pairs_ds._atc;
    length node_id_1 8;
    length node_id_2 8;
    length atc_min_val 8;
    length atc_max_val 8;
    length atc_mean_val 8;
    %let macro_i=1;
    %do %while (&macro_i < 657); /* (1 + max AE degree) in the 2005 network */
      length node_id_1_&macro_i 8;
      %let macro_i = %eval(&macro_i + 1);
    %end;
    length PubChem_Compound_ID 8;
    length atc_code_min_dist 8;
    length atc_code_min_tmp 8;
    length atc_code1-atc_code11 $ 7;
    length L1 L3 L4 L1_prime L3_prime L4_prime $ 1;
    length L2 L2_prime $ 2;
    set in_ds.&all_pairs_ds;
    array atc_codes_d1{11} $ 8 atc_code_d1_1-atc_code_d1_11;
    array atc_codes_d2{11} $ 8 atc_code_d2_1-atc_code_d2_11;
    array atc_min_distances{657} 8;

    if (_N_ = 1) then do;
      /* for each drug there were 1-11 ATC codes */
      declare hash atcHash(dataset: 'in_ds._2005_drugs_atc_data');
      rc = atcHash.definekey('PubChem_Compound_ID');
      rc = atcHash.definedata('atc_code1', 'atc_code2', 'atc_code3',
                              'atc_code4', 'atc_code5', 'atc_code6',
                              'atc_code7', 'atc_code8', 'atc_code9',
                              'atc_code10', 'atc_code11');
      atcHash.definedone();

      /* for each HLT, the neighbors in 2005 network */
      declare hash neighborHash(dataset: "in_ds.&all_pairs_ds._2005edge_t");
      rc = neighborHash.definekey('node_id_2');
      rc = neighborHash.definedata(ALL: 'YES');
      neighborHash.definedone();
    end;

    atc_min_val = .;
    atc_max_val = .;
    atc_mean_val = .;
    PubChem_Compound_ID = node_id_1;
    rc = atcHash.find();
    if (rc NE 0) then
      put "Could not find" PubChem_Compound_ID;
    else do;
      atc_code_d1_1 = atc_code1;
      atc_code_d1_2 = atc_code2;
      atc_code_d1_3 = atc_code3;
      atc_code_d1_4 = atc_code4;
      atc_code_d1_5 = atc_code5;
      atc_code_d1_6 = atc_code6;
      atc_code_d1_7 = atc_code7;
      atc_code_d1_8 = atc_code8;
      atc_code_d1_9 = atc_code9;
      atc_code_d1_10 = atc_code10;
      atc_code_d1_11 = atc_code11;

      rc = neighborHash.find();
      if (rc NE 0) then
        put "Could not find" node_id_2;
      else do;
        %let macro_i = 1;
        %do %while (&macro_i < 657);
          atc_min_distances[&macro_i] = .;
          %let macro_i = %eval(&macro_i + 1);
        %end;

        %let macro_i = 1;
        %do %while (&macro_i < 657);
          PubChem_Compound_ID = node_id_1_&macro_i;
          if (PubChem_Compound_ID NE . AND
              PubChem_Compound_ID NE node_id_1) then do;
            rc = atcHash.find();
            if (rc NE 0) then
              put "Could not find " PubChem_Compound_ID;
            else do;
              atc_code_d2_1 = atc_code1;
              atc_code_d2_2 = atc_code2;
              atc_code_d2_3 = atc_code3;
              atc_code_d2_4 = atc_code4;
              atc_code_d2_5 = atc_code5;
              atc_code_d2_6 = atc_code6;
              atc_code_d2_7 = atc_code7;
              atc_code_d2_8 = atc_code8;
              atc_code_d2_9 = atc_code9;
              atc_code_d2_10 = atc_code10;
              atc_code_d2_11 = atc_code11;
              atc_code_min_dist = 99; /* very large value */
              i=1;
              do while (i <= 11); /* max number of unique ATC codes */
                if (atc_codes_d1[i] EQ "") then do;
                  leave;
                end;
                L1 = substrn(atc_codes_d1[i],1,1);
                L2 = substrn(atc_codes_d1[i],2,2);
                L3 = substrn(atc_codes_d1[i],4,1);
                L4 = substrn(atc_codes_d1[i],5,1);
                j=1;
                do while (j <= 11);
                  if (atc_codes_d2[j] EQ "") then do;
                    leave;
                  end;
                  L1_prime = substrn(atc_codes_d2[j],1,1);
                  L2_prime = substrn(atc_codes_d2[j],2,2);
                  L3_prime = substrn(atc_codes_d2[j],4,1);
                  L4_prime = substrn(atc_codes_d2[j],5,1);
                  if (L1 EQ L1_prime AND
                      L2 EQ L2_prime AND
                      L3 EQ L3_prime AND
                      L4 EQ L4_prime) then
                    atc_code_min_tmp = 2;
                  else if (L1 EQ L1_prime AND
                           L2 EQ L2_prime AND
                           L3 EQ L3_prime) then
                    atc_code_min_tmp = 4;
                  else if (L1 EQ L1_prime AND
                           L2 EQ L2_prime) then
                    atc_code_min_tmp = 6;
                  else if (L1 EQ L1_prime) then
                    atc_code_min_tmp = 8;
                  else
                    atc_code_min_tmp = 10;
                  if (atc_code_min_tmp < atc_code_min_dist) then
                    atc_code_min_dist = atc_code_min_tmp;
                  j = j+1;
                end; /* while j */
                i = i+1;
              end; /* while i */
              atc_min_distances[&macro_i] = atc_code_min_dist;
            end; /*else*/
          end; /* if Pubchem_compound_ID */
          %let macro_i = %eval(&macro_i + 1);
        %end; /*while macro i*/

        atc_min_val = min(of atc_min_distances{*});
        atc_max_val = max(of atc_min_distances{*});
        atc_mean_val = mean(of atc_min_distances{*});
      end;
    end;
    keep node_id_1 node_id_2 atc_min_val atc_max_val atc_mean_val
         is_old_edge is_old_edge_class is_new_edge is_new_edge_class
         is_test_pair is_test_pair_class;
  run; quit;
%mend add_ATC_codes_min;

/*
 * Function to compute the distribution of ATC distances
 * in the neighborhood of a drug-ADE pair. This distribution is used
 * in the computation of covariate "atc-KL" discussed in the
 * paper.
 *
 * all_pairs_ds is a list of all possible drug-ADE pairs
 * node_id_1: drug id (PubChem_ID)
 * node_id_2: AE id (HLT code)
 *
 * returns a new data set named <all_pairs_ds>_atcb which also contains
 * the distribution of ATC distances in the neighborhood of each pair
 */
%macro add_ATC_codes_bins(all_pairs_ds= );
  %local macro_i;

  /* create hash tables */
  data in_ds.&all_pairs_ds._05e;
    set in_ds.&all_pairs_ds;
    where (is_old_edge EQ 1);
    keep node_id_1 node_id_2;
  run; quit;

  proc sort data=in_ds.&all_pairs_ds._05e;
    by node_id_2;
  run; quit;

  proc transpose data=in_ds.&all_pairs_ds._05e
                 out=in_ds.&all_pairs_ds._05et
                 prefix=node_id_1_;
    by node_id_2;
    var node_id_1;
  run; quit;

  data in_ds.&all_pairs_ds._05et;
    set in_ds.&all_pairs_ds._05et;
    drop node_id_1 _NAME_ _LABEL_;
  run; quit;

  /* compute full distribution of distances */
  data in_ds.&all_pairs_ds._atcb;
    length node_id_1 8;
    length node_id_2 8;
    length atc_min_val 8;
    length atc_max_val 8;
    length atc_mean_val 8;
    length atc_bin1-atc_bin5 8;
    %let macro_i=1;
    %do %while (&macro_i < 657);
      length node_id_1_&macro_i 8;
      %let macro_i = %eval(&macro_i + 1);
    %end;
    length PubChem_Compound_ID 8;
    length atc_code_min_dist 8;
    length atc_code_min_tmp 8;
    length atc_code1-atc_code11 $ 7;
    length L1 L3 L4 L1_prime L3_prime L4_prime $ 1;
    length L2 L2_prime $ 2;
    set in_ds.&all_pairs_ds;
    array atc_codes_d1{11} $ 8 atc_code_d1_1-atc_code_d1_11;
    array atc_codes_d2{11} $ 8 atc_code_d2_1-atc_code_d2_11;
    array atc_min_distances{657} 8;
    if (_N_ = 1) then do;
      declare hash atcHash(dataset: 'in_ds._2005_drugs_atc_data');
      rc = atcHash.definekey('PubChem_Compound_ID');
      rc = atcHash.definedata('atc_code1', 'atc_code2', 'atc_code3',
                              'atc_code4', 'atc_code5', 'atc_code6',
                              'atc_code7', 'atc_code8', 'atc_code9',
                              'atc_code10', 'atc_code11');
      atcHash.definedone();

      declare hash neighborHash(dataset: "in_ds.&all_pairs_ds._05et");
      rc = neighborHash.definekey('node_id_2');
      rc = neighborHash.definedata(ALL: 'YES');
      neighborHash.definedone();
    end;

    atc_min_val = .;
    atc_max_val = .;
    atc_mean_val = .;
    atc_bin1 = 0;
    atc_bin2 = 0;
    atc_bin3 = 0;
    atc_bin4 = 0;
    atc_bin5 = 0;

    PubChem_Compound_ID = node_id_1;
    rc = atcHash.find();
    if (rc NE 0) then
      put "Could not find" PubChem_Compound_ID;
    else do;
      atc_code_d1_1 = atc_code1;
      atc_code_d1_2 = atc_code2;
      atc_code_d1_3 = atc_code3;
      atc_code_d1_4 = atc_code4;
      atc_code_d1_5 = atc_code5;
      atc_code_d1_6 = atc_code6;
      atc_code_d1_7 = atc_code7;
      atc_code_d1_8 = atc_code8;
      atc_code_d1_9 = atc_code9;
      atc_code_d1_10 = atc_code10;
      atc_code_d1_11 = atc_code11;

      rc = neighborHash.find();
      if (rc NE 0) then
        put "Could not find" node_id_2;
      else do;
        %let macro_i = 1;
        %do %while (&macro_i < 657);
          atc_min_distances[&macro_i] = .;
          %let macro_i = %eval(&macro_i + 1);
        %end;

        %let macro_i = 1;
        %do %while (&macro_i < 657);
          PubChem_Compound_ID = node_id_1_&macro_i;
          if (PubChem_Compound_ID NE . AND
              PubChem_Compound_ID NE node_id_1) then do;
            rc = atcHash.find();
            if (rc NE 0) then
              put "Could not find " PubChem_Compound_ID;
            else do;
              atc_code_d2_1 = atc_code1;
              atc_code_d2_2 = atc_code2;
              atc_code_d2_3 = atc_code3;
              atc_code_d2_4 = atc_code4;
              atc_code_d2_5 = atc_code5;
              atc_code_d2_6 = atc_code6;
              atc_code_d2_7 = atc_code7;
              atc_code_d2_8 = atc_code8;
              atc_code_d2_9 = atc_code9;
              atc_code_d2_10 = atc_code10;
              atc_code_d2_11 = atc_code11;
              atc_code_min_dist = 99;
              i=1;
              do while (i <= 11);
                if (atc_codes_d1[i] EQ "") then do;
                  leave;
                end;
                L1 = substrn(atc_codes_d1[i],1,1);
                L2 = substrn(atc_codes_d1[i],2,2);
                L3 = substrn(atc_codes_d1[i],4,1);
                L4 = substrn(atc_codes_d1[i],5,1);
                j=1;
                do while (j <= 11);
                  if (atc_codes_d2[j] EQ "") then do;
                    leave;
                  end;
                  L1_prime = substrn(atc_codes_d2[j],1,1);
                  L2_prime = substrn(atc_codes_d2[j],2,2);
                  L3_prime = substrn(atc_codes_d2[j],4,1);
                  L4_prime = substrn(atc_codes_d2[j],5,1);
                  if (L1 EQ L1_prime AND
                      L2 EQ L2_prime AND
                      L3 EQ L3_prime AND
                      L4 EQ L4_prime) then
                    atc_code_min_tmp = 2;
                  else if (L1 EQ L1_prime AND
                           L2 EQ L2_prime AND
                           L3 EQ L3_prime) then
                    atc_code_min_tmp = 4;
                  else if (L1 EQ L1_prime AND
                           L2 EQ L2_prime) then
                    atc_code_min_tmp = 6;
                  else if (L1 EQ L1_prime) then
                    atc_code_min_tmp = 8;
                  else
                    atc_code_min_tmp = 10;
                  if (atc_code_min_tmp < atc_code_min_dist) then
                    atc_code_min_dist = atc_code_min_tmp;
                  j = j+1;
                end; /* while j */
                i = i+1;
              end; /* while i */
              atc_min_distances[&macro_i] = atc_code_min_dist;
            end; /*else*/
          end; /* if Pubchem_compound_ID */
          %let macro_i = %eval(&macro_i + 1);
        %end; /*while macro i*/

        atc_min_val = min(of atc_min_distances{*});
        atc_max_val = max(of atc_min_distances{*});
        atc_mean_val = mean(of atc_min_distances{*});
        atc_min_nonmiss = N(of atc_min_distances{*});
        k = 1;
        do while (k <= dim(atc_min_distances));
          if (atc_min_distances[k] NE .) then do;
            if (atc_min_distances[k] EQ 2) then atc_bin1 = atc_bin1 + 1;
            if (atc_min_distances[k] EQ 4) then atc_bin2 = atc_bin2 + 1;
            if (atc_min_distances[k] EQ 6) then atc_bin3 = atc_bin3 + 1;
            if (atc_min_distances[k] EQ 8) then atc_bin4 = atc_bin4 + 1;
            if (atc_min_distances[k] EQ 10) then atc_bin5 = atc_bin5 + 1;
          end;
          k = k+1;
        end;
        atc_bin1 = atc_bin1/atc_min_nonmiss;
        atc_bin2 = atc_bin2/atc_min_nonmiss;
        atc_bin3 = atc_bin3/atc_min_nonmiss;
        atc_bin4 = atc_bin4/atc_min_nonmiss;
        atc_bin5 = atc_bin5/atc_min_nonmiss;
      end;
    end;
    keep node_id_1 node_id_2 is_old_edge
         atc_bin1 atc_bin2 atc_bin3 atc_bin4 atc_bin5 ;
  run; quit;
%mend add_ATC_codes_bins;

/*
Function to compute the Kullback-Leibler (KL) distance between a
distribution and a desired reference distribution. This function is
used to compute all KL-based covariates discussed in the paper.

dist_type: what type of distribution--to distinguish between
           NET, TAX and INT covariates.
bin_ds: is a data set containing the (discrete) distribution
        associated with each drug-ADE pair
nbins: is the number of bins in that discrete distribution
*/
%macro compute_kldist(dist_type= ,bin_ds= ,nbins= );
  proc means data=in_ds.&bin_ds mean noprint;
    var &dist_type._bin1-&dist_type._bin&nbins;
    where (is_old_edge = 1);
    output out=in_ds.&bin_ds.M;
  run; quit;

  data in_ds.&bin_ds.M;
    set in_ds.&bin_ds.M;
    where (_STAT_ EQ "MEAN");
    keep &dist_type._bin1-&dist_type._bin&nbins;
  run; quit;

  proc means data=in_ds.&bin_ds mean noprint;
    var &dist_type._bin1-&dist_type._bin&nbins;
    where (is_old_edge = 0);
    output out=in_ds.&bin_ds.N;
  run; quit;

  data in_ds.&bin_ds.N;
    set in_ds.&bin_ds.N;
    where (_STAT_ EQ "MEAN");
    keep &dist_type._bin1-&dist_type._bin&nbins;
  run; quit;

  proc means data=in_ds.&bin_ds mean noprint;
    var &dist_type._bin1-&dist_type._bin&nbins;
    output out=in_ds.&bin_ds.Q;
  run; quit;

  data in_ds.&bin_ds.Q;
    set in_ds.&bin_ds.Q;
    where (_STAT_ EQ "MEAN");
    keep &dist_type._bin1-&dist_type._bin&nbins;
  run; quit;

  %local macro_i ;

  data _null_;
    set in_ds.&bin_ds.M;
    if (_N_ = 1) then do;
      %let macro_i = 1;
      %do %while (&macro_i <= &nbins);
        corrected_bin = &dist_type._bin&macro_i + 0.000001;
        call symput("refBin1_&macro_i", corrected_bin);
        %let macro_i = %eval(&macro_i + 1);
      %end;
    end;
  run; quit;

  data _null_;
    set in_ds.&bin_ds.N;
    if (_N_ = 1) then do;
      %let macro_i = 1;
      %do %while (&macro_i <= &nbins);
        corrected_bin = &dist_type._bin&macro_i + 0.000001;
        call symput("refBin0_&macro_i", corrected_bin);
        %let macro_i = %eval(&macro_i + 1);
      %end;
    end;
  run; quit;

  data _null_;
    set in_ds.&bin_ds.Q;
    if (_N_ = 1) then do;
      %let macro_i = 1;
      %do %while (&macro_i <= &nbins);
        corrected_bin = &dist_type._bin&macro_i + 0.000001;
        call symput("refBin01_&macro_i", corrected_bin);
        %let macro_i = %eval(&macro_i + 1);
      %end;
    end;
  run; quit;

  data in_ds.&bin_ds.K;
    length kl0_&dist_type kl1_&dist_type kl01_&dist_type 8;
    set in_ds.&bin_ds;
    kl0_&dist_type = 0;
    kl1_&dist_type = 0;
    kl01_&dist_type = 0;
    %let macro_i = 1;
    %do %while (&macro_i <= &nbins);
      if (&dist_type._bin&macro_i > 0) then do;
        kl0_&dist_type = kl0_&dist_type + &dist_type._bin&macro_i*(log2(&dist_type._bin&macro_i)-log2(&&refBin0_&macro_i));
        kl1_&dist_type = kl1_&dist_type + &dist_type._bin&macro_i*(log2(&dist_type._bin&macro_i)-log2(&&refBin1_&macro_i));
        kl01_&dist_type = kl01_&dist_type + &dist_type._bin&macro_i*(log2(&dist_type._bin&macro_i)-log2(&&refBin01_&macro_i));
      end;
      %let macro_i = %eval(&macro_i + 1);
    %end;
    drop is_old_edge;
  run; quit;
%mend compute_kldist;

/*
 * Function to compute the covariate "meddra-min" discussed in the
 * paper.
 *
 * all_pairs_ds is a list of all possible drug-ADE pairs
 * node_id_1: drug id (PubChem_ID)
 * node_id_2: AE id (HLT code)
 *
 * returns a new data set named <all_pairs_ds>_meddra_h which also contains
 * the value of "meddra-min" covariate for each drug-ADE pair
 */
%macro add_meddra_min_dist_hlt(all_pairs_ds= );
  /* create hash tables */
  data in_ds.&all_pairs_ds._2005edge;
    set in_ds.&all_pairs_ds;
    where (is_old_edge EQ 1);
    keep node_id_1 node_id_2;
  run; quit;

  proc sort data=in_ds.&all_pairs_ds._2005edge;
    by node_id_1;
  run; quit;

  proc transpose data=in_ds.&all_pairs_ds._2005edge out=in_ds.&all_pairs_ds._2005edge_t prefix=node_id_2_;
    by node_id_1;
    var node_id_2;
  run; quit;

  data in_ds.&all_pairs_ds._2005edge_t;
    set in_ds.&all_pairs_ds._2005edge_t;
    drop node_id_2 _NAME_ _LABEL_;
  run; quit;

  data meddra.mdhier_hlt_hlgt;
    set meddra.mdhier;
    keep hlt_code hlgt_code;
  run; quit;

  proc sort data=meddra.mdhier_hlt_hlgt noduprecs;
    by hlt_code hlgt_code;
  run; quit;

  proc transpose data=meddra.mdhier_hlt_hlgt out=meddra.mdhier_hlt_hlgt_t prefix=hlgt_code;
    by hlt_code;
    var hlgt_code;
  run; quit;

  /* HLT to HLGT mapping */
  data meddra.mdhier_hlt_hlgt_t;
    set meddra.mdhier_hlt_hlgt_t;
    drop _NAME_;
  run; quit;

  data meddra.mdhier_hlgt_soc;
    set meddra.mdhier;
    keep hlgt_code soc_code;
  run; quit;

  proc sort data=meddra.mdhier_hlgt_soc noduprecs;
    by hlgt_code soc_code;
  run; quit;

  proc transpose data=meddra.mdhier_hlgt_soc out=meddra.mdhier_hlgt_soc_t prefix=soc_code;
    by hlgt_code;
    var soc_code;
  run; quit;
  /* HLGT to SOC mapping */
  data meddra.mdhier_hlgt_soc_t;
    set meddra.mdhier_hlgt_soc_t;
    drop _NAME_;
  run; quit;

  /* perform meddra distance computations */
  data in_ds.&all_pairs_ds._meddra_h;
    length meddra_h_min_val 8;
    length meddra_h_max_val 8;
    length meddra_h_mean_val 8;
    %let macro_i=1;
    %do %while (&macro_i < 213); /* (1 + max drug degree) in the 2005 network */
      length node_id_2_&macro_i 8;
      %let macro_i = %eval(&macro_i + 1);
    %end;
    length hlt_code hlgt_code 8;
    length hlgt_code1 hlgt_code2 8;
    length hlgt_code11 hlgt_code12 hlgt_code21 hlgt_code22 8;
    length soc_code1 soc_code2 8;
    length soc_code11 soc_code12 soc_code21 soc_code22 8;
    set in_ds.&all_pairs_ds;
    array meddra_h_min_distances{212} 8;
    if (_N_ = 1) then do;
      declare hash neighborHash(dataset: "in_ds.&all_pairs_ds._2005edge_t");
      rc = neighborHash.definekey('node_id_1');
      rc = neighborHash.definedata(ALL: 'YES');
      neighborHash.definedone();
      declare hash hltHash(dataset: "meddra.mdhier_hlt_hlgt_t");
      rc = hltHash.definekey('hlt_code');
      rc = hltHash.definedata('hlgt_code1','hlgt_code2');
      hltHash.definedone();
      declare hash hlgtHash(dataset: "meddra.mdhier_hlgt_soc_t");
      rc = hlgtHash.definekey('hlgt_code');
      rc = hlgtHash.definedata('soc_code1','soc_code2');
      hlgtHash.definedone();
    end;
    meddra_h_min_val = .;
    meddra_h_max_val = .;
    meddra_h_mean_val = .;
    rc = neighborHash.find();
    if (rc NE 0) then do;
      put "Could not find PubChem_Compound_ID " node_id_1;
    end;
    else do;
      %let macro_i = 1;
      %do %while (&macro_i < 213);
        meddra_h_min_distances[&macro_i] = .;
        %let macro_i = %eval(&macro_i + 1);
      %end;
      hlt_code = node_id_2;
      rc1 = hltHash.find();
      if (rc1 NE 0) then do;
        put "Could not find hlt1 in hltHash " hlt_code;
      end;
      else do;
        hlgt_code11 = hlgt_code1;
        hlgt_code12 = hlgt_code2;
      end;
      %let macro_i = 1;
      %do %while (&macro_i < 213);
        if (node_id_2_&macro_i NE . AND node_id_2_&macro_i NE node_id_2) then do;
          /* second HLT */
          hlt_code = node_id_2_&macro_i;
          rc1 = hltHash.find();
          if (rc1 NE 0) then do;
            put "Could not find hlt2 in hltHash " hlt_code;
          end;
          else do;
            hlgt_code21 = hlgt_code1;
            hlgt_code22 = hlgt_code2;
          end;
          if ((hlgt_code11 EQ hlgt_code21) OR
              (hlgt_code22 NE . AND hlgt_code11 EQ hlgt_code22) OR
              (hlgt_code12 NE . AND hlgt_code12 EQ hlgt_code21) OR
              (hlgt_code12 NE . AND hlgt_code22 NE . AND hlgt_code12 EQ hlgt_code22)) then do;
            meddra_h_min_distances[&macro_i] = 2;
          end;
          else do;
            /* hlgt_code11 */
            hlgt_code = hlgt_code11;
            rc1 = hlgtHash.find();
            if (rc1 NE 0) then do;
              put "Could not find hlgt_code11 in hlgtHash " hlgt_code;
            end;
            soc_code11 = soc_code1;
            soc_code12 = soc_code2;
            /* hlgt_code21 */
            hlgt_code = hlgt_code21;
            rc1 = hlgtHash.find();
            if (rc1 NE 0) then do;
              put "Could not find hlgt_code21 in hlgtHash " hlgt_code;
            end;
            soc_code21 = soc_code1;
            soc_code22 = soc_code2;
            if ((soc_code11 EQ soc_code21) OR
                (soc_code22 NE . AND soc_code11 EQ soc_code22) OR
                (soc_code12 NE . AND soc_code12 EQ soc_code21) OR
                (soc_code12 NE . AND soc_code22 NE . AND soc_code12 EQ soc_code22)) then do;
              meddra_h_min_distances[&macro_i] = 4;
            end;
            /* hlgt_code22 */
            if (hlgt_code22 NE .) then do;
              hlgt_code = hlgt_code22;
              rc1 = hlgtHash.find();
              if (rc1 NE 0) then do;
                put "Could not find hlgt_code22 in hlgtHash " hlgt_code;
              end;
              soc_code21 = soc_code1;
              soc_code22 = soc_code2;
              if ((soc_code11 EQ soc_code21) OR
                  (soc_code22 NE . AND soc_code11 EQ soc_code22) OR
                  (soc_code12 NE . AND soc_code12 EQ soc_code21) OR
                  (soc_code12 NE . AND soc_code22 NE . AND soc_code12 EQ soc_code22)) then do;
                meddra_h_min_distances[&macro_i] = 4;
              end;
            end;
            /* hlgt_code12 */
            if (hlgt_code12 NE .) then do;
              hlgt_code = hlgt_code12;
              rc1 = hlgtHash.find();
              if (rc1 NE 0) then do;
                put "Could not find hlgt_code12 in hlgtHash " hlgt_code;
              end;
              soc_code11 = soc_code1;
              soc_code12 = soc_code2;
              /* hlgt_code21 */
              hlgt_code = hlgt_code21;
              rc1 = hlgtHash.find();
              if (rc1 NE 0) then do;
                put "Could not find hlgt_code 21 in hlgtHash " hlgt_code;
              end;
              soc_code21 = soc_code1;
              soc_code22 = soc_code2;
              if ((soc_code11 EQ soc_code21) OR
                  (soc_code22 NE . AND soc_code11 EQ soc_code22) OR
                  (soc_code12 NE . AND soc_code12 EQ soc_code21) OR
                  (soc_code12 NE . AND soc_code22 NE . AND soc_code12 EQ soc_code22)) then do;
                meddra_h_min_distances[&macro_i] = 4;
              end;
              /* hlgt_code22 */
              if (hlgt_code22 NE .) then do;
                hlgt_code = hlgt_code22;
                rc1 = hlgtHash.find();
                if (rc1 NE 0) then do;
                  put "Could not find in hlgtHash " hlgt_code;
                end;
                soc_code21 = soc_code1;
                soc_code22 = soc_code2;
                if ((soc_code11 EQ soc_code21) OR
                    (soc_code22 NE . AND soc_code11 EQ soc_code22) OR
                    (soc_code12 NE . AND soc_code12 EQ soc_code21) OR
                    (soc_code12 NE . AND soc_code22 NE . AND soc_code12 EQ soc_code22)) then do;
                  meddra_h_min_distances[&macro_i] = 4;
                end;
              end;
            end; /* if hlgt_code12 NE . */
          end; /* else do */
          if (meddra_h_min_distances[&macro_i] EQ .) then meddra_h_min_distances[&macro_i] = 6;
        end; /* if node_id_2_&macro_i NE node_id_2 */
        %let macro_i = %eval(&macro_i + 1);
      %end;
      meddra_h_min_val = min(of meddra_h_min_distances{*});
      meddra_h_max_val = max(of meddra_h_min_distances{*});
      meddra_h_mean_val = mean(of meddra_h_min_distances{*});
      meddra_h_min_nonmiss = N(of meddra_h_min_distances{*});
    end; /* else do */
    keep node_id_1 node_id_2 meddra_h_min_val meddra_h_max_val meddra_h_mean_val
         is_old_edge is_old_edge_class is_new_edge is_new_edge_class is_test_pair is_test_pair_class;
  run; quit;
%mend add_meddra_min_dist_hlt;

/*
 * Function to compute the distribution of MedDRA distances
 * in the neighborhood of a drug-ADE pair. This distribution is used
 * in the computation of covariate "meddra-KL" discussed in the
 * paper.
 *
 * all_pairs_ds is a list of all possible drug-ADE pairs
 * node_id_1: drug id (PubChem_ID)
 * node_id_2: AE id (HLT code)
 *
 * returns a new data set named <all_pairs_ds>_medb which also contains
 * the distribution of MedDRA distances in the neighborhood of each pair
 */
%macro add_meddra_min_dist_bins(all_pairs_ds= );
  data in_ds.&all_pairs_ds._05e;
    set in_ds.&all_pairs_ds;
    where (is_old_edge EQ 1);
    keep node_id_1 node_id_2;
  run; quit;

  proc sort data=in_ds.&all_pairs_ds._05e;
    by node_id_1;
  run; quit;

  proc transpose data=in_ds.&all_pairs_ds._05e out=in_ds.&all_pairs_ds._05e_t prefix=node_id_2_;
    by node_id_1;
    var node_id_2;
  run; quit;
  data in_ds.&all_pairs_ds._05e_t;
    set in_ds.&all_pairs_ds._05e_t;
    drop node_id_2 _NAME_ _LABEL_;
  run; quit;

  data meddra.mdhier_hlt_hlgt;
    set meddra.mdhier;
    keep hlt_code hlgt_code;
  run; quit;

  proc sort data=meddra.mdhier_hlt_hlgt noduprecs;
    by hlt_code hlgt_code;
  run; quit;

  proc transpose data=meddra.mdhier_hlt_hlgt out=meddra.mdhier_hlt_hlgt_t prefix=hlgt_code;
    by hlt_code;
    var hlgt_code;
  run; quit;

  data meddra.mdhier_hlt_hlgt_t;
    set meddra.mdhier_hlt_hlgt_t;
    drop _NAME_;
  run; quit;

  data meddra.mdhier_hlgt_soc;
    set meddra.mdhier;
    keep hlgt_code soc_code;
  run; quit;

  proc sort data=meddra.mdhier_hlgt_soc noduprecs;
    by hlgt_code soc_code;
  run; quit;

  proc transpose data=meddra.mdhier_hlgt_soc out=meddra.mdhier_hlgt_soc_t prefix=soc_code;
    by hlgt_code;
    var soc_code;
  run; quit;

  data meddra.mdhier_hlgt_soc_t;
    set meddra.mdhier_hlgt_soc_t;
    drop _NAME_;
  run; quit;

  data in_ds.&all_pairs_ds._medb;
    length meddra_h_min_val 8;
    length meddra_h_max_val 8;
    length meddra_h_mean_val 8;
    length med_bin1-med_bin3 8;
    %let macro_i=1;
    %do %while (&macro_i < 213);
      length node_id_2_&macro_i 8;
      %let macro_i = %eval(&macro_i + 1);
    %end;
    length hlt_code hlgt_code 8;
    length hlgt_code1 hlgt_code2 8;
    length hlgt_code11 hlgt_code12 hlgt_code21 hlgt_code22 8;
    length soc_code1 soc_code2 8;
    length soc_code11 soc_code12 soc_code21 soc_code22 8;
    set in_ds.&all_pairs_ds;
    array meddra_h_min_distances{212} 8;
    if (_N_ = 1) then do;
      declare hash neighborHash(dataset: "in_ds.&all_pairs_ds._05e_t");
      rc = neighborHash.definekey('node_id_1');
      rc = neighborHash.definedata(ALL: 'YES');
      neighborHash.definedone();
      declare hash hltHash(dataset: "meddra.mdhier_hlt_hlgt_t");
      rc = hltHash.definekey('hlt_code');
      rc = hltHash.definedata('hlgt_code1','hlgt_code2');
      hltHash.definedone();
      declare hash hlgtHash(dataset: "meddra.mdhier_hlgt_soc_t");
      rc = hlgtHash.definekey('hlgt_code');
      rc = hlgtHash.definedata('soc_code1','soc_code2');
      hlgtHash.definedone();
    end;
    meddra_h_min_val = .;
    meddra_h_max_val = .;
    meddra_h_mean_val = .;
    med_bin1 = 0;
    med_bin2 = 0;
    med_bin3 = 0;
    rc = neighborHash.find();
    if (rc NE 0) then do;
      put "Could not find PubChem_Compound_ID " node_id_1;
    end;
    else do;
      %let macro_i = 1;
      %do %while (&macro_i < 213);
        meddra_h_min_distances[&macro_i] = .;
        %let macro_i = %eval(&macro_i + 1);
      %end;
      hlt_code = node_id_2;
      rc1 = hltHash.find();
      if (rc1 NE 0) then do;
        put "Could not find hlt1 in hltHash " hlt_code;
      end;
      else do;
        hlgt_code11 = hlgt_code1;
        hlgt_code12 = hlgt_code2;
      end;
      %let macro_i = 1;
      %do %while (&macro_i < 213);
        if (node_id_2_&macro_i NE . AND node_id_2_&macro_i NE node_id_2) then do;
          /* second HLT */
          hlt_code = node_id_2_&macro_i;
          rc1 = hltHash.find();
          if (rc1 NE 0) then do;
            put "Could not find hlt2 in hltHash " hlt_code;
          end;
          else do;
            hlgt_code21 = hlgt_code1;
            hlgt_code22 = hlgt_code2;
          end;
          if ((hlgt_code11 EQ hlgt_code21) OR
              (hlgt_code22 NE . AND hlgt_code11 EQ hlgt_code22) OR
              (hlgt_code12 NE . AND hlgt_code12 EQ hlgt_code21) OR
              (hlgt_code12 NE . AND hlgt_code22 NE . AND hlgt_code12 EQ hlgt_code22)) then do;
            meddra_h_min_distances[&macro_i] = 2;
          end;
          else do;
            /* hlgt_code11 */
            hlgt_code = hlgt_code11;
            rc1 = hlgtHash.find();
            if (rc1 NE 0) then do;
              put "Could not find hlgt_code11 in hlgtHash " hlgt_code;
            end;
            soc_code11 = soc_code1;
            soc_code12 = soc_code2;
            /* hlgt_code21 */
            hlgt_code = hlgt_code21;
            rc1 = hlgtHash.find();
            if (rc1 NE 0) then do;
              put "Could not find hlgt_code21 in hlgtHash " hlgt_code;
            end;
            soc_code21 = soc_code1;
            soc_code22 = soc_code2;
            if ((soc_code11 EQ soc_code21) OR
                (soc_code22 NE . AND soc_code11 EQ soc_code22) OR
                (soc_code12 NE . AND soc_code12 EQ soc_code21) OR
                (soc_code12 NE . AND soc_code22 NE . AND soc_code12 EQ soc_code22)) then do;
              meddra_h_min_distances[&macro_i] = 4;
            end;
            /* hlgt_code22 */
            if (hlgt_code22 NE .) then do;
              hlgt_code = hlgt_code22;
              rc1 = hlgtHash.find();
              if (rc1 NE 0) then do;
                put "Could not find hlgt_code22 in hlgtHash " hlgt_code;
              end;
              soc_code21 = soc_code1;
              soc_code22 = soc_code2;
              if ((soc_code11 EQ soc_code21) OR
                  (soc_code22 NE . AND soc_code11 EQ soc_code22) OR
                  (soc_code12 NE . AND soc_code12 EQ soc_code21) OR
                  (soc_code12 NE . AND soc_code22 NE . AND soc_code12 EQ soc_code22)) then do;
                meddra_h_min_distances[&macro_i] = 4;
              end;
            end;
            /* hlgt_code12 */
            if (hlgt_code12 NE .) then do;
              hlgt_code = hlgt_code12;
              rc1 = hlgtHash.find();
              if (rc1 NE 0) then do;
                put "Could not find hlgt_code12 in hlgtHash " hlgt_code;
              end;
              soc_code11 = soc_code1;
              soc_code12 = soc_code2;
              /* hlgt_code21 */
              hlgt_code = hlgt_code21;
              rc1 = hlgtHash.find();
              if (rc1 NE 0) then do;
                put "Could not find hlgt_code 21 in hlgtHash " hlgt_code;
              end;
              soc_code21 = soc_code1;
              soc_code22 = soc_code2;
              if ((soc_code11 EQ soc_code21) OR
                  (soc_code22 NE . AND soc_code11 EQ soc_code22) OR
                  (soc_code12 NE . AND soc_code12 EQ soc_code21) OR
                  (soc_code12 NE . AND soc_code22 NE . AND soc_code12 EQ soc_code22)) then do;
                meddra_h_min_distances[&macro_i] = 4;
              end;
              /* hlgt_code22 */
              if (hlgt_code22 NE .) then do;
                hlgt_code = hlgt_code22;
                rc1 = hlgtHash.find();
                if (rc1 NE 0) then do;
                  put "Could not find in hlgtHash " hlgt_code;
                end;
                soc_code21 = soc_code1;
                soc_code22 = soc_code2;
                if ((soc_code11 EQ soc_code21) OR
                    (soc_code22 NE . AND soc_code11 EQ soc_code22) OR
                    (soc_code12 NE . AND soc_code12 EQ soc_code21) OR
                    (soc_code12 NE . AND soc_code22 NE . AND soc_code12 EQ soc_code22)) then do;
                  meddra_h_min_distances[&macro_i] = 4;
                end;
              end;
            end; /* if hlgt_code12 NE . */
          end; /* else do */
          if (meddra_h_min_distances[&macro_i] EQ .) then meddra_h_min_distances[&macro_i] = 6;
        end; /* if node_id_2_&macro_i NE node_id_2 */
        %let macro_i = %eval(&macro_i + 1);
      %end;
      meddra_h_min_val = min(of meddra_h_min_distances{*});
      meddra_h_max_val = max(of meddra_h_min_distances{*});
      meddra_h_mean_val = mean(of meddra_h_min_distances{*});
      meddra_h_min_nonmiss = N(of meddra_h_min_distances{*});
      k = 1;
      do while (k <= dim(meddra_h_min_distances));
        if (meddra_h_min_distances[k] NE .) then do;
          if (meddra_h_min_distances[k] EQ 2) then med_bin1 = med_bin1 + 1;
          if (meddra_h_min_distances[k] EQ 4) then med_bin2 = med_bin2 + 1;
          if (meddra_h_min_distances[k] EQ 6) then med_bin3 = med_bin3 + 1;
        end;
        k = k+1;
      end;
      med_bin1 = med_bin1/meddra_h_min_nonmiss;
      med_bin2 = med_bin2/meddra_h_min_nonmiss;
      med_bin3 = med_bin3/meddra_h_min_nonmiss;
    end; /* else do */
    keep node_id_1 node_id_2 is_old_edge med_bin1-med_bin3 ;
  run; quit;
%mend add_meddra_min_dist_bins;
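
The two MedDRA macros above assign a distance of 2 to a pair of distinct HLTs that share an HLGT, 4 to a pair whose HLGTs share an SOC, and 6 otherwise, and then summarize those distances over the ADEs already linked to a drug. The following is a minimal Python sketch of that logic, not the authors' SAS code. The toy mappings `HLT_TO_HLGT` and `HLGT_TO_SOC` and both function names are hypothetical, and treating an unknown code as having no parents is an assumption made for the sketch.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy hierarchy; real mappings would be derived from the MedDRA hierarchy table.
HLT_TO_HLGT: Dict[int, Set[int]] = {101: {10}, 102: {10}, 103: {11}, 104: {12}}
HLGT_TO_SOC: Dict[int, Set[int]] = {10: {1}, 11: {1}, 12: {2}}


def meddra_distance(hlt_a: int, hlt_b: int,
                    hlt_to_hlgt: Dict[int, Set[int]],
                    hlgt_to_soc: Dict[int, Set[int]]) -> int:
    """Distance between two distinct HLTs: 2 (shared HLGT), 4 (shared SOC), else 6."""
    hlgts_a = hlt_to_hlgt.get(hlt_a, set())  # assumption: unknown code -> no parents
    hlgts_b = hlt_to_hlgt.get(hlt_b, set())
    if hlgts_a & hlgts_b:
        return 2
    socs_a = set().union(*(hlgt_to_soc.get(g, set()) for g in hlgts_a))
    socs_b = set().union(*(hlgt_to_soc.get(g, set()) for g in hlgts_b))
    return 4 if socs_a & socs_b else 6


def neighborhood_bins(target_hlt: int, neighbor_hlts: List[int],
                      hlt_to_hlgt: Dict[int, Set[int]],
                      hlgt_to_soc: Dict[int, Set[int]]) -> Optional[dict]:
    """Summarize distances from target_hlt to the drug's other known ADEs.

    Returns None when the drug has no other known ADE, where the SAS code
    would leave the values missing.
    """
    distances = [meddra_distance(target_hlt, k, hlt_to_hlgt, hlgt_to_soc)
                 for k in neighbor_hlts if k != target_hlt]
    if not distances:
        return None
    n = len(distances)
    return {
        "min": min(distances),
        "max": max(distances),
        "mean": sum(distances) / n,
        # Fractions of neighbors at distance 2, 4 and 6 (med_bin1..med_bin3).
        "bins": [distances.count(d) / n for d in (2, 4, 6)],
    }


result = neighborhood_bins(101, [101, 102, 103, 104], HLT_TO_HLGT, HLGT_TO_SOC)
print(result)
```

The `min` entry corresponds to the "meddra-min" covariate, and the `bins` entry is the discrete distribution that would be passed on to the KL computation.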
SUPPLEMENTARY TABLES

Table S1. Definition of covariates. Variable $i$ denotes a drug, variable $j$ denotes an ADR, and $N(i)$ denotes the set of neighbors of node $i$. Definition of taxonomic covariates relies on the pre-computed ATC- and MedDRA-based distances $d_{ATC}$, $d_{MedDRA}$ discussed in the paper. The definition of intrinsic covariates relies on the pre-computed Euclidean distances $d_{INT}$ discussed in the paper.

Covariate name | Covariate definition | Additional information

Network covariate
degree-prod | $X_1(i,j) = \mathrm{degree}(i) \times \mathrm{degree}(j)$ |
degree-sum | $X_2(i,j) = \mathrm{degree}(i) + \mathrm{degree}(j)$ |
degree-ratio | $X_3(i,j) = \mathrm{degree}(i) / \mathrm{degree}(j)$ |
degree-absdiff | $X_4(i,j) = |\mathrm{degree}(i) - \mathrm{degree}(j)|$ |
jackard-ADE-max | $X_5(i,j) = \max_{k \in N(i) - \{j\}} \{J(j,k)\}$ | $J(j,k) = |N(j) \cap N(k)| / |N(j) \cup N(k)|$ denotes the Jackard coefficient between the sets $N(j)$ and $N(k)$
jackard-ADE-KL | $X_6(i,j)$: Kullback-Leibler (KL) distance between the distribution $D_{ae}(i,j)$ of the variable $J(i,k),\ k \in N(j) - \{i\}$ and a reference distribution | The reference distribution $D_{ae}$ was computed as the mean of distributions $D_{ae}(i,j)$ over the training edges $(i,j)$
jackard-drug-max | $X_7(i,j) = \max_{k \in N(j) - \{i\}} \{J(i,k)\}$ |
jackard-drug-KL | $X_8(i,j)$: KL distance between the distribution $D_{drug}(i,j)$ of the variable $J(j,k),\ k \in N(i) - \{j\}$ and a reference distribution | The reference distribution $D_{drug}$ was computed as the mean of distributions $D_{drug}(i,j)$ over the training edges $(i,j)$
edge-density | $X_9(i,j)$: The edge density in the subgraph induced by $N(i) \cup N(j) - \{i,j\}$ |

Taxonomic covariate
atc-min | $X_{10}(i,j) = \min_{k \in N(j) - \{i\}} \{d_{ATC}(i,k)\}$ |
atc-KL | $X_{11}(i,j)$: KL distance between the distribution $D_{ATC}(i,j)$ of the variable $d_{ATC}(i,k),\ k \in N(j) - \{i\}$ and a reference distribution | The reference distribution $D_{ATC}$ was computed as the mean of distributions $D_{ATC}(i,j)$ over the training edges $(i,j)$
meddra-min | $X_{12}(i,j) = \min_{k \in N(i) - \{j\}} \{d_{MedDRA}(j,k)\}$ |
meddra-KL | $X_{13}(i,j)$: KL distance between the distribution $D_{MedDRA}(i,j)$ of the variable $d_{MedDRA}(j,k),\ k \in N(i) - \{j\}$ and a reference distribution | The reference distribution $D_{MedDRA}$ was computed as the mean of distributions $D_{MedDRA}(i,j)$ over all training edges $(i,j)$

Intrinsic covariate
euclid-min | $X_{14}(i,j) = \min_{k \in N(j) - \{i\}} \{d_{INT}(i,k)\}$ |
euclid-KL | $X_{15}(i,j)$: KL distance between the distribution $D_{INT}(i,j)$ of the variable $d_{INT}(i,k),\ k \in N(j) - \{i\}$ and a reference distribution | The reference distribution $D_{INT}$ was computed as the mean of distributions $D_{INT}(i,j)$ over the training edges $(i,j)$
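
As an illustration of the degree-based and Jaccard-based network covariates in Table S1, here is a minimal Python sketch on a toy bipartite drug-ADE network. It is not the authors' R code. The edge list `EDGES`, the string node labels, and the function names are hypothetical, and returning `None` where a covariate is undefined (zero degree or an empty neighbor set) is an assumption made for the sketch.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Neighbors = Dict[str, Set[str]]

# Hypothetical toy network of (drug, ADE) edges.
EDGES = [("d1", "a1"), ("d1", "a2"), ("d2", "a1"),
         ("d2", "a2"), ("d3", "a2"), ("d3", "a3")]


def build_neighbors(edges: Iterable[Tuple[str, str]]) -> Neighbors:
    """Map every node (drug or ADE) to the set of nodes it is linked to."""
    neighbors: Neighbors = {}
    for drug, ade in edges:
        neighbors.setdefault(drug, set()).add(ade)
        neighbors.setdefault(ade, set()).add(drug)
    return neighbors


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def network_covariates(drug: str, ade: str,
                       neighbors: Neighbors) -> Dict[str, Optional[float]]:
    n_i = neighbors.get(drug, set())
    n_j = neighbors.get(ade, set())
    deg_i, deg_j = len(n_i), len(n_j)
    # jackard-ADE-max: compare ADE j with the other ADEs k linked to drug i.
    ade_scores = [jaccard(n_j, neighbors.get(k, set())) for k in n_i - {ade}]
    # jackard-drug-max: compare drug i with the other drugs k linked to ADE j.
    drug_scores = [jaccard(n_i, neighbors.get(k, set())) for k in n_j - {drug}]
    return {
        "degree-prod": deg_i * deg_j,
        "degree-sum": deg_i + deg_j,
        "degree-ratio": deg_i / deg_j if deg_j else None,
        "degree-absdiff": abs(deg_i - deg_j),
        "jackard-ADE-max": max(ade_scores) if ade_scores else None,
        "jackard-drug-max": max(drug_scores) if drug_scores else None,
    }


cov = network_covariates("d1", "a3", build_neighbors(EDGES))
print(cov)
```

In this toy network the pair (d1, a3) is a non-edge, so the covariates describe how strongly the existing neighborhoods of the drug and the ADE already overlap.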
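
The KL-based covariates in Table S1 compare the binned distribution of a drug-ADE pair with a reference distribution obtained by averaging the distributions of the training edges. The Python sketch below follows the arithmetic of the `compute_kldist` macro as far as it can be read from the source: base-2 logarithms, a correction of 0.000001 added to every reference bin, and empty bins of the pair's own distribution skipped. It is an illustration rather than the authors' implementation, and the function names and toy values are hypothetical.

```python
import math
from typing import List, Sequence

EPSILON = 1e-6  # correction added to each reference bin, as in the SAS macro


def mean_distribution(distributions: Sequence[Sequence[float]]) -> List[float]:
    """Bin-wise mean of several discrete distributions (the reference distribution)."""
    n = len(distributions)
    return [sum(column) / n for column in zip(*distributions)]


def kl_distance(p: Sequence[float], reference: Sequence[float],
                eps: float = EPSILON) -> float:
    """Sum over non-empty bins of p_b * (log2(p_b) - log2(reference_b + eps))."""
    return sum(p_b * (math.log2(p_b) - math.log2(q_b + eps))
               for p_b, q_b in zip(p, reference) if p_b > 0)


# Hypothetical three-bin distributions for two training edges and one candidate pair.
training = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
reference = mean_distribution(training)
candidate = [0.0, 0.0, 1.0]
print(reference, kl_distance(candidate, reference))
```

A candidate whose mass falls in a bin that is empty in the reference receives a large but finite value, which is the practical effect of the small correction term.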
Table S2. List of drugs and their ATC codes.
Drug name atc1 atc2 atc3 atc4 atc5 atc6 atc7 atc8 atc9 atc10 atc11
abacavir J05AF06
acamprosate N07BB03
acarbose A10BF01
acebutolol C07AB04
acenocoumarol B01AA07
acetaminophen N02BE01
acetazolamide S01EC01
acetic acid G01AD02 S02AA10
acetylcholine S01EB09
acetylcysteine R05CB01 S01XA08 V03AB23
acitretin D05BB02
acyclovir J05AB01 D06BB03 S01AD03
adapalene D10AD03
adefovir J05AF08
adenosine C01EB10
albendazole P02CA03
albuterol R03AC02 R03CC02
alclometasone D07AB10 S01BA10
alcohol (ethyl) D01AE06 D08AX08 V03AB16 V03AZ01
alendronate M05BA04
alfentanil N01AH02
alfuzosin G04CA01
allopurinol M04AA01
almotriptan N02CC05
alosetron A03AE01
alprostadil C01EA01 G04BE01
altretamine L01XX03
amantadine N04BB01
ambenonium N07AA30
amcinonide D07AC11
amifostine V03AF05
amikacin D06AX12 J01GB06 S01AA21
aminocaproic acid B02AA01
aminolevulinic acid L01XD04
aminophylline R03DA05
aminosalicylic acid J04AA01
amiodarone C01BD01
amitriptyline N06AA09
amlexanox A01AD07 R03DX01
amobarbital N05CA02
amoxapine N06AA17
amoxicillin J01CA04
amphotericin b (conventional) A01AB04 A07AA07 G01AA03 J02AA01
ampicillin J01CA01 S01AA19
amsacrine L01XX01
amyl nitrite V03AB22
anagrelide L01XX35
anastrozole L02BG03
apomorphine G04BE07 N04BC07
apraclonidine S01EA03
aprepitant A04AD12
aprotinin B02AB01
argatroban B01AE03
arginine B05XB01
aripiprazole N05AX12
arsenic trioxide L01XX27
ascorbic acid G01AD03 S01XA15
aspirin A01AD05 B01AC06 N02BA01
atazanavir J05AE08
atenolol C07AB03
atomoxetine N06BA09
atorvastatin C10AA05
atovaquone P01AX06
atracurium M03AC04
atropine A03BA01 S01FA01
auranofin M01CB03
azelaic acid D10AX03
azelastine R01AC03 R06AX19 S01GX07
azithromycin J01FA10 S01AA26
aztreonam J01DF01
bacitracin D06AX05 R02AB04
baclofen M03BX01
balsalazide A07EC04
beclomethasone A07EA07 D07AC15 R01AD01 R03BA01
benazepril C09AA07
bentoquatam W99WW99*
benzocaine C05AD03 D04AB04 N01BA05 R02AD01
benzonatate R05DB01
benzphetamine W99WW99*
benztropine N04AC01
betamethasone A07EA04 C05AA05 D07AC01 D07XC01 H02AB01 R01AD06 R03BA04 S01BA06 S01CB04 S02BA07 S03BA03
betaxolol C07AB05 S01ED02
bethanechol N07AB02
bexarotene L01XX25
bezafibrate C10AB02
bicalutamide L02BB03
bimatoprost S01EE03
bismuth A07BB01
bisoprolol C07AB07
bivalirudin B01AE06
bleomycin L01DC01
bortezomib L01XX32
bosentan C02KX01
brimonidine S01EA05
brinzolamide S01EC04
bromazepam N05BA08
bromfenac S01BC11
bromocriptine G02CB01 N04BC01
brompheniramine R06AB01
budesonide A07EA06 D07AC09 R01AD05 R03BA02
bumetanide C03CA02
bupivacaine N01BB01
buprenorphine N02AE01 N07BC01
bupropion N06AX12
buspirone N05BE01
busulfan L01AB01
butabarbital N05CA23
butenafine D01AE23
butoconazole G01AF15
butorphanol N02AF01
cabergoline G02CB03 N04BC06
caffeine N06BC01
calcipotriene D05AX02
calcitriol A11CC04 D05AX03
calcium acetate A12AA12
calcium chloride A12AA07 B05XA07 G04BA03
candesartan C09CA06
capecitabine L01BC06
capreomycin J04AB30
captopril C09AA01
carbachol N07AB01 S01EB02
carbidopa W99WW99*
carbinoxamine R06AA08
carboprost tromethamine G02AD04
carisoprodol M03BA02
carmustine L01AD01
carteolol C07AA15 S01ED05
carvedilol C07AG02
caspofungin J02AX04
cefaclor J01DC04
cefadroxil J01DB05
cefdinir J01DD15
cefditoren J01DD16
cefepime J01DE01
cefixime J01DD08
cefotaxime J01DD01
cefotetan J01DC05
cefoxitin J01DC01
cefpodoxime J01DD13
cefprozil J01DC10
ceftazidime J01DD02
ceftibuten J01DD14
ceftizoxime J01DD07
cefuroxime J01DC02
celecoxib L01XX33 M01AH01
cephalexin J01DB01
cetirizine R06AE07
cetrorelix H01CC02
cevimeline N07AX03
chloral hydrate N05CC01
chlorambucil L01AA02
chloramphenicol D06AX02 D10AF03 G01AA05 J01BA01 S01AA01 S02AA01 S03AA08
chloroprocaine N01BA04
chloroquine P01BA01
chlorothiazide C03AA04
chlorpheniramine R06AB04
chlorpromazine N05AA01
chlorpropamide A10BB02
chlorthalidone C03BA04
chlorzoxazone M03BB03
cholecalciferol A11CC05
ciclopirox D01AE14 G01AX12
cidofovir J05AB12
cilazapril C09AA08
cilostazol B01AC31
cimetidine A02BA01
cinacalcet H05BX01
ciprofloxacin J01MA02 S01AX13 S02AA15 S03AA07
cisapride A03FA02
cisatracurium M03AC11
citalopram N06AB04
cladribine L01BB04
clarithromycin J01FA09
clemastine D04AA14 R06AA04
clindamycin D10AF01 G01AA10 J01FF01
clobazam N05BA09
clobetasol D07AD01
clocortolone D07AB21
clodronate M05BA02
clofarabine L01BB06
clomiphene G03GB02
clomipramine N06AA04
clopidogrel B01AC04
clorazepate N05BA05
clotrimazole A01AB18 D01AC01 G01AF02
cloxacillin J01CF02
clozapine N05AH02
cocaine N01BC01 R02AD03 S01HA01 S02DA02
codeine R05DA04
colchicine M04AC01
colesevelam C10AC04
colestipol C10AC02
colistimethate A07AA10 J01XB01
corticotropin H01AA01
cosyntropin H01AA03
crotamiton D04AX01
cyanocobalamin B03BA01
cyclizine R06AE03
cyclobenzaprine M03BX08
cyclopentolate S01FA04
cyclophosphamide L01AA01
cycloserine J04AB01
cyclosporine L04AD01 S01XA18
cyproheptadine R06AX02
cyproterone G03HA01
cysteamine A16AA04
cysteine R05CB16 S01XA21 V03AB36
dacarbazine L01AX04
danazol G03XA01
dantrolene M03CA01
dapsone J04BA02
darifenacin G04BD10
deferoxamine V03AC01
delavirdine J05AG02
demeclocycline D06AA01 J01AA01
desflurane N01AB07
desipramine N06AA01
desloratadine R06AX27
desmopressin H01BA02
desonide D07AB08 S01BA11
desoximetasone D07AC03 D07XC02
dexamethasone A01AC02 C05AA09 D07AB19 D07XB05 D10AA03 H02AB02 R01AD03 S01BA01 S01CB01 S02BA06 S03BA01
dexmedetomidine N05CM18
dexrazoxane V03AF02
dextroamphetamine N06BA02
dextromethorphan R05DA09
diazepam N05BA01
diazoxide C02DA01 V03AH01
dibucaine C05AD04 D04AB02 N01BB06 S01HA06
diclofenac D11AX18 M01AB05 M02AA15 S01BC03
dicloxacillin J01CF01
dicyclomine A03AA07
didanosine J05AF02
diethylpropion A08AA03
diflorasone D07AC10
diflunisal N02BA11
digoxin C01AA05
dihydroergotamine N02CA01
diltiazem C08DB01
dimenhydrinate R06AA02
dimethyl sulfoxide G04BX13 M02AX03
dinoprostone G02AD02
diphenhydramine D04AA32 R06AA02
dipivefrin S01EA02
dipyridamole B01AC07
disopyramide C01BA03
disulfiram N07BB01 P03AA04
dobutamine C01CA07
docetaxel L01CD02
docosanol D06BB11
dofetilide C01BD04
dolasetron A04AA04
domperidone A03FA03
donepezil N06DA02
dopamine C01CA04
dorzolamide S01EC03
doxapram R07AB01
doxazosin C02CA04
doxepin N06AA12
doxorubicin L01DB01
doxycycline A01AB22 J01AA02
dronabinol A04AD10
droperidol N01AX01 N05AD08
dutasteride G04CB02
dyclonine N01BX02 R02AD04
dyphylline R03DA01
echothiophate iodide S01EB03
econazole D01AC03 G01AF05
edrophonium W99WW99*
efavirenz J05AG03
eletriptan N02CC06
emedastine S01GX06
emtricitabine J05AF09
enalapril C09AA02
enflurane N01AB04
enfuvirtide J05AX07
entacapone N04BX02
entecavir J05AF10
epinastine R06AX24 S01GX10
epirubicin L01DB03
eplerenone C03DA04
epoprostenol B01AC09
eprosartan C09CA02
eptifibatide B01AC16
ergocalciferol A11CC01
ergoloid mesylates C04AE01
ergonovine G02AB03
ergotamine N02CA02
erlotinib L01XE03
ertapenem J01DH03
erythromycin D10AF02 J01FA01 S01AA17
escitalopram N06AB10
esmolol C07AB09
estazolam N05CD04
estradiol G03CA03
estramustine L01XX11
eszopiclone N05CF04
ethacrynic acid C03CC01
ethambutol J04AK02
ethanolamine oleate C05BB01
ethionamide J04AD03
ethosuximide N03AD01
ethotoin N03AB01
etodolac M01AB08
etomidate N01AX07
etoposide L01CB01
exemestane L02BG06
exenatide A10BX04
ezetimibe C10AX09
famciclovir J05AB09 S01AD07
famotidine A02BA03
felbamate N03AX10
felodipine C08CA02
fenofibrate C10AB05
fenoldopam C01CA19
fenoprofen M01AE04
fexofenadine R06AX26
finasteride D11AX10 G04CB01
flavoxate G04BD02
flecainide C01BC04
floxuridine L01BC54
fluconazole D01AC15 J02AC01
flucytosine D01AE21 J02AX01
fludarabine L01BB05
fludrocortisone H02AA02
flumazenil V03AB25
flunisolide R01AD04 R03BA03
fluocinonide C05AA11 D07AC08
fluorometholone C05AA06 D07AB06 D07XB04 D10AA01 S01BA07 S01CB05
fluorouracil L01BC02
fluoxymesterone G03BA01
flupenthixol N05AF01
fluphenazine N05AB02
flurandrenolide D07AC07
flurazepam N05CD01
flurbiprofen M01AE09 M02AA19 S01BC04
flutamide L02BB01
fluvastatin C10AA04
fluvoxamine N06AB08
folic acid B03BB01
fomepizole V03AB34
formoterol R03AC13
fosamprenavir J05AE07
foscarnet J05AD01
fosfomycin J01XX01
fosinopril C09AA09
fosphenytoin N03AB05
frovatriptan N02CC07
fulvestrant L02BA03
furosemide C03CA01
fusidic acid D06AX01 D09AA02 J01XC01 S01AA13
gabapentin N03AX12
gadopentetate dimeglumine V08CA01
galantamine N06DA04
gallium nitrate V03AG02
ganciclovir J05AB06 S01AD09
gatifloxacin J01MA16 S01AX21
gefitinib L01XE02
gemcitabine L01BC05
gemfibrozil C10AB04
gemifloxacin J01MA15
gentamicin D06AX07 J01GB03 S01AA11 S02AA14 S03AA06
gentian violet D01AE02
glatiramer acetate L03AX13
gliclazide A10BB09
glimepiride A10BB12
glipizide A10BB07
glutamine A16AA03
glyburide A10BB01
glycerin A06AG04 A06AX01
glycopyrrolate A03AB02
goserelin L02AE03
granisetron A04AA02
griseofulvin D01AA08 D01BA01
guanabenz C02KX04
halobetasol D07AC21
haloperidol N05AD01
hexachlorophene D08AE01
homatropine S01FA05
hydralazine C02DB02
hydrochlorothiazide C03AA03
hydrocortisone A01AC03 A07EA02 C05AA01 D07AA02 D07XA01 H02AB09 S01BA02 S01CB03 S02BA01
hydroxocobalamin B03BA03 V03AB33
hydroxychloroquine P01BA02
hydroxyurea L01XX05
hydroxyzine N05BB01
hyoscyamine A03BA03
ibandronate M05BA06
ibuprofen C01EB16 G02CC01 M01AE01 M02AA13
ibutilide C01BD05
ifosfamide L01AA06
iloprost B01AC11
imatinib L01XE01
imipramine N06AA02
imiquimod D06BB10
inamrinone C01CE01
indapamide C03BA11
indinavir J05AE02
indomethacin C01EB03 M01AB01 M02AA23 S01BC01
ipratropium R01AX03 R03BB01
irbesartan C09CA04
irinotecan L01XX19
isocarboxazid N06AF01
isoflurane N01AB06
isoniazid J04AC01
isoproterenol C01CA02 R03AB02 R03CB01
isosorbide dinitrate C01DA08 C05AE02
isosorbide mononitrate C01DA14
isotretinoin D10AD04 D10BA01
isradipine C08CA03
itraconazole J02AC02
ivermectin P02CF01
kanamycin A07AA08 J01GB04 S01AA24
ketamine N01AX03
ketoconazole D01AC08 G01AF11 J02AB02
ketoprofen M01AE03 M02AA10
ketorolac M01AB15 S01BC05
ketotifen R06AX17 S01GX08
labetalol C07AG01
lactic acid G01AD01
lactulose A06AD11
lansoprazole A02BC03
latanoprost S01EE01
leflunomide L04AA13
lepirudin B01AE02
letrozole L02BG04
leuprolide L02AE02
levetiracetam N03AX14
levobunolol S01ED03
levocarnitine A16AA01
levofloxacin J01MA12 S01AX19
levonorgestrel G03AC03
levorphanol N02AX53
levothyroxine H03AA01
lidocaine C01BB01 C05AD01 D04AB01 N01BB02 R02AD02 S01HA07 S02DA01
lincomycin J01FF02
lindane P03AB02
linezolid J01XX08
liothyronine H03AA02
liotrix H03AA06
lisinopril C09AA03
lithium N05AN01
lomustine L01AD02
loperamide A07DA03
loratadine R06AX13
losartan C09CA01
lovastatin C10AA02
loxapine N05AH01
magnesium oxide A02AA02 A06AD02 A12CC10
magnesium sulfate A06AD04 A12CC02 B05XA05 D11AX05 V04CC02
malathion P03AX03
mannitol A06AD16 B05BC01 B05CX04
maprotiline N06AA21
mebendazole P02CA01
mecamylamine C02BB01
mechlorethamine L01AA05
meclizine A04AB04 R06AE05
medroxyprogesterone G03AC06 G03DA02 L02AB02
mefenamic acid M01AG01
mefloquine P01BC02
megestrol G03AC05 G03DB02 L02AB01
meloxicam M01AC06
melphalan L01AA03
memantine N06DX01
mepenzolate A03AB12
meperidine N02AB02
mephobarbital N03AA01
mepivacaine N01BB03
meprobamate N05BC01
mercaptopurine L01BB02
meropenem J01DH02
mesalamine A07EC02
metaproterenol R03AB03 R03CB03 R03CB53
metaxalone M03AX02
methadone N07BC02
methamphetamine N06BA03
methazolamide S01EC05
methimazole H03BB02
methocarbamol M03BA03
methohexital N01AF01 N05CA15
methotrexate L01BA01 L04AX03
methotrimeprazine N05AA02
methoxsalen D05AD02 D05BA02
methsuximide N03AD03
methyclothiazide C03AA08
methyldopa C02AB01
methylergonovine G02AB01
methylphenidate N06BA04
methylprednisolone D07AA01 D10AA02 H02AB04
metipranolol S01ED04
metoclopramide A03FA01
metolazone C03BA08
metoprolol C07AB02
metyrapone V04CD01
metyrosine C02KB01
mexiletine C01BB02
micafungin J02AX05
midazolam N05CD08
midodrine C01CA17
mifepristone G03XB01
miglitol A10BF02
miglustat A16AX06
milrinone C01CE02
minocycline A01AB23 J01AA08
minoxidil C02DC01 D11AX01
mirtazapine N06AX11
misoprostol A02BB01 G02AD06
mitomycin L01DC03
mitotane L01XX23
mitoxantrone L01DB07
moclobemide N06AG02
modafinil N06BA07
moexipril C09AA13
molindone N05AE02
monobenzone D11AX13
montelukast R03DC03
moxifloxacin J01MA14 S01AX22
mupirocin D06AX09 R01AX06
nabilone A04AD11
nabumetone M01AX01
nadolol C07AA12
nafarelin H01CA02
nafcillin J01CF06
naftifine D01AE22
nalbuphine N02AF02
naloxone V03AB15
naltrexone N07BB04
nandrolone A14AB01 S01XA11
naproxen G02CC02 M01AE02 M02AA12
naratriptan N02CC02
natamycin A01AB10 A07AA03 D01AA02 G01AA02 S01AA10
nateglinide A10BX03
nedocromil R01AC07 R03BC03 S01GX04
nefazodone N06AX06
nelfinavir J05AE04
neomycin A01AB08 A07AA01 B05CA09 D06AX04 J01GB05 R02AB01 S01AA03 S02AA07 S03AA01
neostigmine N07AA01 S01EB06
nevirapine J05AG01
niacin C10AD02
nicardipine C08CA04
nicotine N07BA01
nifedipine C08CA05
nilutamide L02BB02
nisoldipine C08CA07
nitazoxanide P01AX11
nitisinone A16AX04
nitrazepam N05CD02
nitric oxide R07AX01
nitrofurantoin J01XE01
nitroglycerin C01DA02 D03AX07
nitroprusside C02DD01
nizatidine A02BA04
norepinephrine C01CA03
norethindrone G03AC01 G03DC02
norfloxacin J01MA06 S01AX12
nortriptyline N06AA10
nystatin A07AA02 D01AA01 G01AA01
octreotide H01CB02
ofloxacin J01MA01 S01AX11
olmesartan C09CA08
olopatadine R01AC08 S01GX09
olsalazine A07EC03
ondansetron A04AA01
orlistat A08AB01
orphenadrine M03BC01 N04AB02
oseltamivir J05AH02
oxacillin J01CF04
oxaliplatin L01XA03
oxandrolone A14AA08
oxaprozin M01AE12
oxazepam N05BA04
oxiconazole D01AC11 G01AF17
oxybutynin G04BD04
oxymetazoline R01AA05 R01AB07 S01GA04
oxymorphone N02AX54
oxytocin H01BB02
paclitaxel L01CD01
palonosetron A04AA05
pamidronate M05BA03
pancuronium M03AC01
pantoprazole A02BC02
papaverine A03AD01 G04BE02
paricalcitol A11CC07
paromomycin A07AA06
pemetrexed L01BA04
pemirolast S01GX52
penbutolol C07AA23
penciclovir D06BB06 J05AB13
penicillamine M01CC01
pentamidine P01CX01
pentazocine N02AD01
pentostatin L01XX08
pentoxifylline C04AD03
permethrin P03AC04
perphenazine N05AB03
phenazopyridine G04BX06
phendimetrazine A08AX02
phenelzine N06AF03
phenoxybenzamine C04AX02
phentermine A08AA01
phentolamine C04AB01 G04BE05
phenylephrine C01CA06 R01AA04 R01AB01 R01BA03 S01FB01 S01GA05
phenytoin N03AB02
physostigmine S01EB05 V03AB19
phytonadione B02BA01
pilocarpine N07AX01 S01EB01
pimecrolimus D11AX15
pimozide N05AG02
pindolol C07AA03
pioglitazone A10BG03
piperacillin J01CA12
piperazine P02CB01
pipotiazine N05AC04
pirbuterol R03AC08 R03CC07
piroxicam M01AC01 M02AA07 S01BC06
porfimer L01XD01
potassium chloride A12BA01 B05XA01
pralidoxime V03AB04
pramipexole N04BC05
pravastatin C10AA03
praziquantel P02BA01
prazosin C02CA01
prednicarbate D07AC18
prednisolone A07EA01 C05AA04 D07AA03 D07XA02 H02AB06 R01AD02 S01BA04 S01CB02 S02BA03 S03BA02
prednisone A07EA03 H02AB07
prilocaine N01BB04
primaquine P01BA03
primidone N03AA03
probenecid M04AB01
probucol C10AX02
procainamide C01BA02
procaine C05AD05 N01BA02 S01HA05
procarbazine L01XB01
prochlorperazine N05AB04
procyclidine N04AA04
progesterone G03DA04
promethazine D04AA10 R06AD02
propafenone C01BC03
propantheline A03AB05
proparacaine S01HA04
propofol N01AX10
propoxyphene N02AC04
propranolol C07AA05
propylthiouracil H03BA02
protriptyline N06AA11
pseudoephedrine R01BA02
pyrazinamide J04AK01
pyridostigmine N07AA02
pyridoxine A11HA02
pyrimethamine P01BD01
quazepam N05CD10
quinapril C09AA06
rabeprazole A02BC04
raloxifene G03XC01
raltitrexed L01BA03
ramelteon N05CH02
ramipril C09AA05
ranitidine A02BA02
remifentanil N01AH06
repaglinide A10BX02
reserpine C02AA02
ribavirin J05AB04
riboflavin A11HA04
rifabutin J04AB04
rifampin J04AB02
rifapentine J04AB05
rifaximin A07AA11 D06AX11
riluzole N07XX02
rimantadine J05AC02
rimexolone H02AB12 S01BA13
risedronate M05BA07
risperidone N05AX08
ritonavir J05AE03
rivastigmine N06DA03
rizatriptan N02CC04
rocuronium M03AC09
ropinirole N04BC04
ropivacaine N01BB09
rosiglitazone A10BG02
rosuvastatin C10AA07
salicylic acid D01AE12 S01BC08
salmeterol R03AC12
salsalate N02BA06
saquinavir J05AE01
secobarbital N05CA06
selegiline N04BD01
selenium sulfide D01AE13
sertaconazole D01AC14
sertraline N06AB06
sevelamer V03AE02
sevoflurane N01AB08
sibutramine A08AA10
sildenafil G04BE03
silver sulfadiazine D06BA01
simvastatin C10AA01
sirolimus L04AA10
sodium bicarbonate B05CB04 B05XA02
solifenacin G04BD08
sotalol C07AA07
spironolactone C03DA01
stavudine J05AF04
streptomycin A07AA04 J01GA01
streptozocin L01AD04
succimer V09CA02 V09IA03
succinylcholine M03AB01
sucralfate A02BX02
sulfacetamide S01AB04
sulfasalazine A07EC01
sulfisoxazole J01EB05 S01AB02
sulindac M01AB02
suramin P01CX02
tacrolimus D11AX14 L04AD02
tadalafil G04BE08
tamoxifen L02BA01
tamsulosin G04CA02
tazarotene D05AX05
tegaserod A03AE02
telithromycin J01FA15
telmisartan C09CA07
temazepam N05CD07
temozolomide L01AX03
teniposide L01CB02
tenofovir J05AF07
terazosin G04CA03
terbinafine D01AE15 D01BA02
terbutaline R03AC03 R03CC03
terconazole G01AG02
teriparatide H05AA02
testosterone G03BA03
tetrabenazine N07XX06
tetracycline A01AB13 D06AA04 J01AA07 S01AA09 S02AA08 S03AA02
thalidomide L04AX02
theophylline R03DA04
thiamine A11DA01
thioguanine L01BB03
thiopental N01AF03 N05CA19
thioridazine N05AC02
thiothixene N05AF04
tiaprofenic acid M01AE11
ticlopidine B01AC05
tigecycline J01AA12
tiludronate M05BA05
timolol C07AA06 S01ED01
tinidazole J01XD02 P01AB02
tioconazole D01AC07 G01AF08
tiotropium R03BB04
tipranavir J05AE09
tirofiban B01AC17
tobramycin J01GB01 S01AA12
tolazamide A10BB05
tolbutamide A10BB03 V04CA01
tolcapone N04BX01
tolmetin M01AB03 M02AA21
tolnaftate D01AE18
tolterodine G04BD07
topiramate N03AX11
topotecan L01XX17
toremifene L02BA02
torsemide C03CA04
trandolapril C09AA10
tranexamic acid B02AA02
tranylcypromine N06AF04
travoprost S01EE04
treprostinil B01AC21
triamcinolone A01AC01 D07AB09 D07XB02 H02AB08 R01AD11 R03BA06 S01BA05
triamterene C03DB02
triazolam N05CD05
trifluoperazine N05AB06
trifluridine S01AD02
trihexyphenidyl N04AA01
trimethobenzamide A04AD55
trimethoprim J01EA01
trimipramine N06AA06
tropicamide S01FA06
trospium G04BD09
urea A10BB32
urofollitropin G03GA04
ursodiol A05AA02
valrubicin L01DB09
valsartan C09CA03
vancomycin A07AA09 J01XA01
vardenafil G04BE09
vasopressin H01BA01
vecuronium M03AC03
venlafaxine N06AX16
verapamil C08DA01
verteporfin S01LA01
vigabatrin N03AG04
vinblastine L01CA01
vincristine L01CA02
vindesine L01CA03
vinorelbine L01CA04
vitamin a A11CA01
vitamin e A11HA03
voriconazole J02AC03
warfarin B01AA03
yohimbine G04BE04
zafirlukast R03DC01
zaleplon N05CF03
zanamivir J05AH01
zidovudine J05AF01
zileuton R03DX08
ziprasidone N05AE04
zoledronic acid M05BA08
zolmitriptan N02CC03
zolpidem N05CF02
zonisamide N03AX15
zopiclone N05CF01
zuclopenthixol N05AF05
Note: * indicates a placeholder code that we created in the case of four drugs for which we were unable to determine an ATC code.
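Each row of Table S2 pairs a drug name with one or more ATC codes, and the ATC top-level group referred to in Fig. S1 and Fig. S8 is the first letter of a code. The following is a minimal sketch of how such rows could be parsed; it is not the authors' code. The regular expression assumes the standard seven-character ATC code shape, so any asterisk-marked placeholder codes would need separate handling, and `parse_row` and `top_level_groups` are hypothetical helper names.

```python
import re

# Standard ATC level-5 shape: letter, two digits, two letters, two digits.
ATC_CODE = re.compile(r"^[A-Z]\d{2}[A-Z]{2}\d{2}$")

def parse_row(line):
    """Split a Table S2 row into (drug_name, [atc_codes]).

    Drug names may contain spaces (e.g. "nitric oxide"), so every token
    that does not look like an ATC code is treated as part of the name.
    """
    tokens = line.split()
    codes = [t for t in tokens if ATC_CODE.match(t)]
    name = " ".join(t for t in tokens if not ATC_CODE.match(t))
    return name, codes

def top_level_groups(codes):
    """Return the sorted ATC top-level groups (first letters) of the codes."""
    return sorted({code[0] for code in codes})

# Three rows copied from Table S2.
rows = [
    "nitric oxide R07AX01",
    "nitroglycerin C01DA02 D03AX07",
    "tacrolimus D11AX14 L04AD02",
]
table = dict(parse_row(r) for r in rows)
groups = {name: top_level_groups(codes) for name, codes in table.items()}
```

A drug with codes in several top-level groups, such as nitroglycerin (C and D), belongs to each of those groups under this reading.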
Table S3. Number of missing observations for PubChem properties extracted for this study. The total number of drugs in the study was 809. *XLogP3 and Tautomer Count were excluded from the study due to the missing values.
Property name Number of missing observations
Molecular Weight 0
XLogP3* 41
H Bond Donor 0
H Bond Acceptor 0
Rotatable Bond Count 0
Tautomer Count* 336
Topol Polar Surface Area 0
Heavy Atom Count 0
Formal Charge 0
Complexity 0
Isotope Atom Count 0
Defined Atom StereoCenter (SC) Count 0
Undefined Atom SC Count 0
Defined Bond SC Count 0
Undefined Bond SC Count 0
Covalently Bonded (CB) Unit Count 0
Table S4. Number of missing observations for DrugBank properties extracted for this study. The total number of drugs in the study was 809. Melting Point and Half Life were excluded from the study owing to the missing values. The remaining two properties (Exp LogP Hydrophobicity and Protein Binding) were initially included in the study through data imputation. The effect of the imputed data on the predictive performance was assessed by excluding these two properties from the model as well.
Property name Number of missing observations
Exp LogP Hydrophobicity 91
Protein Binding 218
Melting Point 261
Half Life 450
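To put the counts in Table S4 in proportion, the short sketch below converts them to fractions of the 809 drugs in the study. The counts and the total come from the table and its caption; the variable names are illustrative only.

```python
N_DRUGS = 809  # total number of drugs, from the table caption

# Missing-observation counts copied from Table S4.
missing = {
    "Exp LogP Hydrophobicity": 91,
    "Protein Binding": 218,
    "Melting Point": 261,
    "Half Life": 450,
}

# Fraction of drugs lacking each property.
missing_fraction = {name: count / N_DRUGS for name, count in missing.items()}

for name, fraction in missing_fraction.items():
    print(f"{name}: {fraction:.1%} missing")
```

The two imputed properties are missing for roughly 11% and 27% of drugs, and the two excluded properties for roughly 32% and 56%.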
Table S5. Intercorrelation analysis of all covariates. The highest positive and negative Pearson correlations are bolded.
Covariate degree-prod degree-absdiff jackard-ADE-max jackard-drug-max jackard-ADE-KL jackard-drug-KL atc-min meddra-min atc-KL meddra-KL euclid-min
degree-absdiff 0.55
jackard-ADE-max 0.71 0.64
jackard-drug-max 0.5 0.4 0.57
jackard-ADE-KL -0.62 -0.37 -0.77 -0.55
jackard-drug-KL -0.36 -0.2 -0.4 -0.73 0.45
atc-min -0.47 -0.36 -0.61 -0.51 0.66 0.37
meddra-min -0.27 -0.23 -0.28 -0.41 0.25 0.37 0.22
atc-KL -0.15 -0.08 -0.18 -0.12 0.24 0.17 -0.02 0.05
meddra-KL -0.09 -0.04 -0.04 -0.16 0.06 0.21 0.02 0.17 0.02
euclid-min -0.08 -0.06 -0.11 -0.16 0.12 0.15 0.14 0.05 0.04 0
euclid-KL -0.37 -0.21 -0.53 -0.48 0.67 0.49 0.56 0.18 0.31 0.02 0.17
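Table S5 reports pairwise Pearson correlations in lower-triangular form. As an illustration of how such a table can be produced, the sketch below computes the Pearson coefficient from its definition and fills the lower triangle for a set of named columns. The toy columns `cov_a`, `cov_b`, and `cov_c` are invented for the example and are not study data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lower_triangle(columns):
    """Map (row_name, column_name) to r for every pair below the diagonal."""
    names = list(columns)
    result = {}
    for i, row in enumerate(names):
        for col in names[:i]:
            result[(row, col)] = pearson(columns[row], columns[col])
    return result

# Invented toy covariates, not study data.
toy = {
    "cov_a": [1.0, 2.0, 3.0, 4.0],
    "cov_b": [2.0, 4.0, 6.0, 8.0],
    "cov_c": [4.0, 3.0, 2.0, 1.0],
}
corr = lower_triangle(toy)
```

With 12 covariates, as in Table S5, the lower triangle holds 66 coefficients.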
Table S6. Prediction case studies. The selected drug-ADE pairs represent some prominent drug-ADE associations newly discovered during the period of 2006 to 2010.
Drug name ADE Score Spec (score) PPV (score)
Aprotinin Anaphylaxis 0.15895 0.99 0.67
Ibandronate Osteonecrosis 0.05089 0.91 0.09
Norfloxacin Tendon ruptures 0.20902 0.95 0.32
Rosiglitazone Myocardial infarction 0.04632 0.97 0.33
Saquinavir Torsade de pointes 0.08833 0.94 0.49
Saquinavir Electrocardiogram QT prolonged 0.04748 0.89 0.39
Tegaserod Stroke 0.24184 0.98 0.05
Zonisamide Suicidal ideation 0.16818 0.93 0.04
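The columns Spec (score) and PPV (score) presumably give the specificity and positive predictive value obtained when the pair's own score is used as the classification threshold. The sketch below shows that calculation on invented data. Treating a score at or above the threshold as a predicted edge is an assumption, and `spec_and_ppv` is a hypothetical helper name.

```python
def spec_and_ppv(scores, labels, threshold):
    """Specificity and PPV when pairs scoring >= threshold are called edges.

    labels: 1 for a true edge (observed drug-ADE association), 0 otherwise.
    """
    tp = fp = tn = 0
    for score, label in zip(scores, labels):
        predicted_edge = score >= threshold  # assumed convention
        if predicted_edge and label == 1:
            tp += 1
        elif predicted_edge and label == 0:
            fp += 1
        elif not predicted_edge and label == 0:
            tn += 1
    specificity = tn / (tn + fp)  # true negatives over all non-edges
    ppv = tp / (tp + fp)          # true positives over all predicted edges
    return specificity, ppv

# Invented toy scores and labels, not study data.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]
specificity, ppv = spec_and_ppv(toy_scores, toy_labels, threshold=0.35)
```

A high specificity can coexist with a low PPV, as in the Tegaserod and Zonisamide rows, because non-edges greatly outnumber edges.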
Table S7. List of supplementary source code files.
File name Comments
meddra_mapping_code.sas SAS code to perform MedDRA mapping
NET_INT_covariates.r R code to compute network and intrinsic covariates
TAX_covariates.sas SAS code to compute taxonomic covariates
SUPPLEMENTARY FIGURES
[Plot: mean number of AEs (y-axis, 0 to 60) by ATC top-level group (x-axis: A, B, C, D, G, H, J, L, M, N, P, R, S, V).]
Fig. S1. Newly associated ADEs per drug in each ATC top‐level group. ATC top‐level groups: A, alimentary tract and metabolism; B, blood and blood forming organs; C, cardiovascular system; D, dermatologicals; G, genito‐urinary system and sex hormones; H, systemic hormonal preparations; J, anti‐infectives for systemic use; L, antineoplastic and immunomodulating agents; M, musculo‐skeletal system; N, nervous system; P, antiparasitic products, insecticides and repellents; R, respiratory system; S, sensory organs; V, various. Data are means and error bars represent 95% CIs.
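The caption reports group means with 95% CIs but does not state how the intervals were computed. As one common possibility, the sketch below uses the normal approximation (mean plus or minus 1.96 standard errors). This choice is an assumption, and the toy counts are invented.

```python
import math

def mean_ci95(values, z=1.96):
    """Mean and normal-approximation 95% CI (assumed method) of a sample."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance with the n - 1 denominator.
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    half_width = z * math.sqrt(variance / n)
    return mean, mean - half_width, mean + half_width

# Invented counts of newly associated ADEs for the drugs of one group.
toy_counts = [2, 4, 4, 4, 5, 5, 7, 9]
group_mean, ci_low, ci_high = mean_ci95(toy_counts)
```

For small groups, a t-based interval would be wider than this approximation.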
[Plot: mean number of drugs (y-axis, 0 to 60) by MedDRA top-level group (x-axis: blo, car, con, ear, end, eye, gas, gen, hep, imm, inf, inj, inv, met, mus, neo, ner, pre, psy, ren, rep, res, ski, sur, vas).]
Fig. S2. Newly associated drugs per ADE in each MedDRA top-level group. MedDRA top-level groups: blo, blood and lymphatic system disorders; car, cardiac disorders; con, congenital, familial and genetic disorders; ear, ear and labyrinth disorders; end, endocrine disorders; eye, eye disorders; gas, gastrointestinal disorders; gen, general disorders and administration site conditions; hep, hepatobiliary disorders; imm, immune system disorders; inf, infections and infestations; inj, injury, poisoning and procedural complications; inv, investigations; met, metabolism and nutrition disorders; mus, musculoskeletal and connective tissue disorders; neo, neoplasms benign, malignant and unspecified; ner, nervous system disorders; pre, pregnancy, puerperium and perinatal conditions; psy, psychiatric disorders; ren, renal and urinary disorders; rep, reproductive system and breast disorders; res, respiratory, thoracic and mediastinal disorders; ski, skin and subcutaneous tissue disorders; sur, surgical and medical procedures; vas, vascular disorders. Data are means and error bars represent 95% CIs.
[Panels A to F: histograms of score (x-axis) against percent (y-axis). Panel statistics: A, Mean = 0.03, Std = 0.09; B, Mean = 0.46, Std = 0.36; C, Mean = 0.04, Std = 0.09; D, Mean = 0.25, Std = 0.17; E, Mean = 0.05, Std = 0.06; F, Mean = 0.14, Std = 0.07.]
Fig. S3. Comparative histograms of scores for the observed edges and non-edges by the three model types. (A and B) NET model, non-edges (A) and edges (B). (C and D) TAX model, non-edges (C) and edges (D). (E and F) INT model, non-edges (E) and edges (F).
[Venn diagram panels: True positives; False positives.]
Fig. S4. Three‐way Venn diagrams for the sets of true positives and false positives generated by models NET, TAX, and INT. Specificity was fixed at 0.95.
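Fixing specificity at 0.95 amounts to choosing a score threshold that at least 95% of non-edges do not exceed. The sets of pairs above each model's threshold can then be compared, as in a Venn diagram. The sketch below illustrates both steps on invented data. For brevity it applies a single threshold to all three models, whereas in practice each model would presumably have its own threshold. The function names and the pair labels `d1-a1`, `d2-a2`, and `d3-a3` are hypothetical.

```python
import math

def threshold_for_specificity(non_edge_scores, target=0.95):
    """Smallest observed score t such that at least `target` of the
    non-edges score <= t; pairs scoring > t are then called edges."""
    ranked = sorted(non_edge_scores)
    # Small tolerance guards against floating-point error in target * n.
    k = math.ceil(target * len(ranked) - 1e-9)
    return ranked[k - 1]

def predicted_edges(scores, threshold):
    """Set of pairs whose score exceeds the threshold."""
    return {pair for pair, score in scores.items() if score > threshold}

# Invented non-edge scores: 0.00, 0.01, ..., 0.99.
non_edges = [i / 100 for i in range(100)]
threshold = threshold_for_specificity(non_edges)
achieved = sum(score <= threshold for score in non_edges) / len(non_edges)

# Invented scores of three true edges under each model type.
true_edge_scores = {
    "NET": {"d1-a1": 0.99, "d2-a2": 0.50, "d3-a3": 0.97},
    "TAX": {"d1-a1": 0.96, "d2-a2": 0.98, "d3-a3": 0.10},
    "INT": {"d1-a1": 0.95, "d2-a2": 0.20, "d3-a3": 0.30},
}
true_positives = {
    model: predicted_edges(scores, threshold)
    for model, scores in true_edge_scores.items()
}
# Central region of the three-way Venn diagram.
shared_by_all = true_positives["NET"] & true_positives["TAX"] & true_positives["INT"]
```

The other Venn regions follow from the same sets with set difference and pairwise intersection.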
[Histogram panels, in two columns: pairs predicted as non-edges and pairs predicted as edges. Y-axis: percent. X-axes, one row each: degree-prod (0 to 40000), degree-absdiff (0 to 700), jackard-ae-max (0 to 1), jackard-ae-KL (0 to 8).]
Fig. S5. Comparative histograms of selected network covariates for the predicted edges and non-edges. The predictions were generated by fixing the specificity of model NET at 0.95.
[Histogram panels, in two columns: pairs predicted as non-edges and pairs predicted as edges. Y-axis: percent. X-axes, one row each: atc-min (2 to 12), atc-KL (0 to 6).]
Fig. S6. Comparative histograms of selected taxonomic covariates for the predicted edges and non-edges. The predictions were generated by fixing the specificity of model TAX at 0.95.
[Histogram panels, in two columns: pairs predicted as non-edges and pairs predicted as edges. Y-axis: percent. X-axes, one row each: euclid-min (0 to 210), euclid-KL (0 to 6.8).]
Fig. S7. Comparative histograms of the intrinsic covariates for the predicted edges and non-edges. The predictions were generated by fixing the specificity of model INT at 0.95.
[Panel A: AUROC (y-axis, 0.0 to 1.0) by ATC top-level category (x-axis: A, B, C, D, G, H, J, L, M, N, P, R, S, V). Panel B: AUROC (y-axis, 0.0 to 1.0) against number of newly associated ADEs (x-axis, 0 to 200), with regression line.]
Fig. S8. Drug‐specific AUROCs. (A) AUROCs were grouped by ATC top‐level category. Group means and 95% CIs (error bars) are shown in red. (B) AUROC was plotted against the number of newly associated ADEs. A regression line with slope of 0.00003 and P = 0.86 (F test) is shown.
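A drug-specific AUROC can be read as the probability that a randomly chosen newly associated ADE of that drug scores higher than a randomly chosen non-associated ADE, with ties counted as one half. The sketch below computes it by direct pairwise comparison on invented scores. It illustrates the definition and is not taken from the authors' code.

```python
def auroc(pos_scores, neg_scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as one half (the Mann-Whitney formulation)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Invented scores for one drug: its new ADEs versus its non-associated ADEs.
new_ade_scores = [0.9, 0.8, 0.4]
other_ade_scores = [0.7, 0.3, 0.2, 0.1, 0.05]
drug_auroc = auroc(new_ade_scores, other_ade_scores)
```

The pairwise loop is quadratic in the number of scores; a rank-based computation gives the same value more efficiently for large inputs.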
[Panel A: AUROC (y-axis, 0.0 to 1.0) by MedDRA top-level category (x-axis: blo, car, con, ear, end, eye, gas, gen, hep, imm, inf, inj, inv, met, mus, neo, ner, pre, psy, ren, rep, res, ski, sur, vas). Panel B: AUROC (y-axis, 0.0 to 1.0) against number of newly associated drugs (x-axis, 0 to 140), with regression line.]
Fig. S9. ADE‐specific AUROCs. (A) AUROCs were grouped by MedDRA top‐level category. Group means and 95% CIs (error bars) are shown in red. (B) AUROC was plotted against the number of newly associated drugs. A regression line with slope of ‐0.0015 and P < 0.0001 (F test) is shown.
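The regression lines in Fig. S8 and Fig. S9 are summarized by a slope and an F-test P value. The sketch below, run on invented data, fits an ordinary least-squares line and computes the F statistic for the slope. The P value would then come from the F distribution with 1 and n - 2 degrees of freedom, which is not computed here. `ols_slope_and_f` is a hypothetical helper name.

```python
def ols_slope_and_f(x, y):
    """Least-squares slope, intercept, and F statistic for y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and regression sums of squares.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_reg = slope * slope * sxx
    # One regression degree of freedom, n - 2 residual degrees of freedom.
    f_stat = ss_reg / (ss_res / (n - 2))
    return slope, intercept, f_stat

# Invented data, not study data.
toy_x = [0, 1, 2, 3, 4]
toy_y = [1, 3, 2, 5, 4]
slope, intercept, f_stat = ols_slope_and_f(toy_x, toy_y)
```

In simple linear regression this F test is equivalent to the two-sided t test on the slope.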