

Supplementary Material on “Drug-Target Network”

Muhammed A. Yıldırım, Kwang-Il Goh, Michael E. Cusick, Albert-László Barabási & Marc Vidal

Contents

I. Chemical Similarity between Drugs Targeting the Same Protein

II. Construction of the Drug Target Network

III. Component Size Distributions of DN and TPN

IV. Topological Features of DN and TPN

V. Essentiality of Drug Targets

VI. Drug Targets in the Human Protein-Protein Interaction Network

VII. Topological Features of Drug Targets in the Human PPI Network

VIII. Human Disease Network

IX. Properties of Drug Targets in the Human Disease Network

X. Expression Profiles of Drug Targets and Disease Genes

XI. Supplementary References


I. Chemical Similarity between Drugs Targeting the Same Protein

Large numbers of drugs target common proteins, e.g. the Histamine H1 Receptor (HRH1) (targeted by 51 drugs), the Muscarinic 1 Cholinergic Receptor (CHRM1) (targeted by 48 drugs), the α1A Adrenergic Receptor (ADRA1A) (targeted by 42 drugs), and the Dopamine Receptor D2 (DRD2) (targeted by 40 drugs). These drugs are distinct chemical entities, but they might show chemical similarities due to a common ancestry. The DrugBank website provides a chemical search tool that finds drugs with structures similar to a submitted query chemical. We developed a web robot to query this tool, and we built chemical similarity networks for the drugs targeting the same protein. An exact match gives a score of 20; we set a cut-off of 15 to draw an edge between two drugs (Supplementary Fig. 1a). Many of the drugs targeting these heavily targeted proteins are not connected to any other drug in the chemical similarity network, showing that their chemical structures are unique. However, other drugs are connected to each other in clique-like groups. For instance, drugs targeting HRH1 form 5 such cliques composed of 36 drugs (Supplementary Fig. 1b), drugs targeting ADRA1A form 2 cliques composed of 14 drugs (Supplementary Fig. 1c), drugs targeting DRD2 form 3 cliques composed of 22 drugs (Supplementary Fig. 1d), and drugs targeting CHRM1 form 2 cliques composed of 21 drugs (Supplementary Fig. 1e).
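A minimal sketch of the network construction, assuming the pairwise DrugBank similarity scores have already been collected (for example by the web robot) into a dictionary keyed by unordered drug pairs. The drug labels D1 to D5 and their scores are hypothetical. The cut-off of 15 and the colour buckets follow the text and Supplementary Fig. 1, while the Bron–Kerbosch clique search and the minimum clique size of 3 are our own illustrative choices, not necessarily how the shaded cliques in the figure were delineated.

```python
# Sketch: chemical similarity network for drugs sharing a target, then its cliques.
CUTOFF = 15  # minimum DrugBank similarity score for an edge (an exact match scores 20)


def edge_color(score):
    """Colour bucket of Supplementary Fig. 1: 15-16 blue, 17-18 orange, 19-20 red."""
    if score >= 19:
        return "red"
    if score >= 17:
        return "orange"
    return "blue"


def similarity_network(scores, cutoff=CUTOFF):
    """Adjacency sets from {frozenset({drug_a, drug_b}): similarity score}."""
    adj = {}
    for pair, score in scores.items():
        a, b = tuple(pair)
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if score >= cutoff:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting: yields every maximal clique as a set of drugs."""
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


# Hypothetical scores for five drugs that target the same protein.
scores = {
    frozenset({"D1", "D2"}): 18,
    frozenset({"D1", "D3"}): 16,
    frozenset({"D2", "D3"}): 19,
    frozenset({"D3", "D4"}): 12,  # below the cut-off, so no edge
    frozenset({"D4", "D5"}): 20,
}
adj = similarity_network(scores)
cliques = sorted(sorted(c) for c in maximal_cliques(adj) if len(c) >= 3)
print(cliques)  # [['D1', 'D2', 'D3']]
```

Sorting the clique members only makes the printed result deterministic; set iteration order in Python is otherwise arbitrary.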

II. Construction of the Drug Target Network

DrugBank combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. We downloaded the drug and drug target information from the DrugBank database as of March 29, 2006, and used the SwissProt IDs of the targets to select only drugs targeting human proteins. There are 890 FDA-approved drugs and 808 experimental drugs meeting these criteria. The FDA-approved drugs target 394 human proteins, and the experimental drugs target 731 human proteins. Because these two sets overlap, the total number of distinct human drug target proteins reported in DrugBank is 1,011.

Using the drug–target associations, we generated the bipartite drug-target network (DTN) shown in Fig. 2. We used Pajek (http://vlado.fmf.uni-lj.si/pub/networks/pajek/) to visualize the network. In this map, circular nodes represent drugs and rectangular boxes represent drug target proteins. The area of each node is proportional to its degree in the network. Drug nodes were colored according to the Anatomical Therapeutic Chemical (ATC) classification code provided by DrugBank. ATC codes are controlled by the World Health Organization Centre for Drug Statistics Methodology. Drugs are classified into groups at five different levels, and we used the first level (the main category) to color the nodes. For drugs with more than one classification, we applied majority voting among the different codes. Target proteins were colored according to their cellular component profiles, obtained by filtering the Gene Ontology information to map the components to Membrane, Cytoplasm, Organelles, Nucleus, Secreted and Not Available (including other cellular components and unknown proteins). We considered proteins residing in organellar membranes as membrane proteins. Edge lengths were adjusted only to make the graph layout readable.
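The majority vote over ATC codes can be sketched as follows. The first ATC level is the single letter that opens each code (for example "N" for the nervous system). How the original analysis broke ties is not stated, so the alphabetical tie-break here is an assumption, and the example codes are hypothetical.

```python
from collections import Counter


def main_atc_group(codes):
    """First-level ATC group chosen by majority vote over a drug's ATC codes.

    Ties are broken alphabetically (an assumption; the text does not specify).
    """
    votes = Counter(code[0].upper() for code in codes if code)
    if not votes:
        return None  # drug without any ATC code stays unclassified
    best = max(votes.values())
    return min(group for group, n in votes.items() if n == best)


print(main_atc_group(["N05AA01", "N05AX", "A04AD"]))  # N (two votes against one)
```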

Next, we generated the drug network (DN) and the target protein network (TPN) projections of the DTN. In the DN, nodes represent drugs, and two drugs are connected when they share at least one target protein (Supplementary Fig. 2). In the protein-centric TPN, two proteins are connected if at least one drug targets both of them (Fig. 3). The edge thickness between two proteins in the TPN is proportional to the number of drugs targeting both proteins. The color scheme for the DN and TPN is the same as for the DTN. Layouts of all networks were generated by a simple force-directed algorithm, followed by local manual rearrangement for visual clarity that left the overall layout of the network unperturbed.
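The two projections can be sketched in a few lines, assuming the DTN is held as a Python dict from each drug to its set of targets (the drug and protein names below are hypothetical). Edge weights follow the text: the TPN weight counts the drugs targeting both proteins, and the DN weight counts the targets two drugs share.

```python
from collections import defaultdict
from itertools import combinations


def project(drug_targets):
    """Project a bipartite drug -> {targets} map onto the DN and the TPN.

    Returns (dn, tpn), each a dict frozenset({u, v}) -> edge weight. Nodes without
    any edge (e.g. a drug sharing no target) do not appear in the edge dicts.
    """
    dn = defaultdict(int)
    tpn = defaultdict(int)
    target_drugs = defaultdict(set)
    for drug, targets in drug_targets.items():
        for t in targets:
            target_drugs[t].add(drug)
        for a, b in combinations(sorted(targets), 2):
            tpn[frozenset((a, b))] += 1  # one more drug hits both proteins
    for drugs in target_drugs.values():
        for a, b in combinations(sorted(drugs), 2):
            dn[frozenset((a, b))] += 1  # one more target shared by both drugs
    return dict(dn), dict(tpn)


drug_targets = {  # hypothetical toy data
    "drugA": {"P1", "P2"},
    "drugB": {"P1", "P2", "P3"},
    "drugC": {"P4"},  # shares no target, so it is isolated in the DN
}
dn, tpn = project(drug_targets)
print(dn[frozenset({"drugA", "drugB"})])  # 2
```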

III. Component Size Distributions of DN and TPN

The DN and TPN exhibit different component size distributions from those of 10,000 control randomized networks, obtained by randomizing the drug–target protein associations while keeping fixed both the number of proteins each drug targets and the number of drugs targeting each protein (Supplementary Fig. 3a,b). The giant component of the DN contains 53% of all drugs, whereas the giant component of the TPN contains 31% of all drug target proteins. According to random graph theory, as the density of edges in a graph increases, a giant component forms whose size scales extensively, and the proportion of nodes in the giant component is generally around 80-90%¹. Our randomized networks follow this generalization; however, the giant components of the observed networks are much smaller than the random expectation. A similar pattern is observed after the inclusion of experimental drugs and their targets (Supplementary Fig. 3c,d). The pattern is reversed for networks of experimental drugs alone, for which the giant component is larger than in the random case (Supplementary Fig. 3e,f). Generally, the second and third largest components are larger than in the random control in all graphs.
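The null model described above can be sketched as follows. The text does not say how the degree-preserving randomization was implemented; this sketch assumes repeated edge swaps on the bipartite drug–target edge list (10 swaps per edge is our arbitrary choice). Component sizes are computed with union-find directly on the bipartite graph, since two drugs lie in the same DN component exactly when they are connected through targets. The edge list is toy data, and the default of 100 randomizations is far fewer than the 10,000 used in the analysis.

```python
import random
from collections import Counter


def swap_randomize(edges, n_swaps, rng):
    """Degree-preserving shuffle of bipartite (drug, target) edges.

    Each step picks edges (d1, t1), (d2, t2) and rewires them to (d1, t2), (d2, t1),
    skipping swaps that would duplicate an edge, so every drug and every target
    keeps its degree.
    """
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (d1, t1), (d2, t2) = edges[i], edges[j]
        if d1 == d2 or t1 == t2 or (d1, t2) in present or (d2, t1) in present:
            continue
        present -= {(d1, t1), (d2, t2)}
        present |= {(d1, t2), (d2, t1)}
        edges[i], edges[j] = (d1, t2), (d2, t1)
    return edges


def component_sizes(edges, side=0):
    """Component sizes, largest first, of the DN (side=0) or the TPN (side=1)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for d, t in edges:
        parent[find((0, d))] = find((1, t))  # union drug node with target node
    nodes = {(side, edge[side]) for edge in edges}
    return sorted(Counter(find(n) for n in nodes).values(), reverse=True)


def null_giant_component(edges, side=0, n_random=100, seed=0):
    """Mean and sample s.d. of the giant component size (needs n_random >= 2)."""
    rng = random.Random(seed)
    sizes = [
        component_sizes(swap_randomize(edges, 10 * len(edges), rng), side)[0]
        for _ in range(n_random)
    ]
    mean = sum(sizes) / len(sizes)
    sd = (sum((s - mean) ** 2 for s in sizes) / (len(sizes) - 1)) ** 0.5
    return mean, sd


edges = [("a", "P1"), ("b", "P1"), ("c", "P2"), ("d", "P3")]  # toy data
print(component_sizes(edges, side=0))  # [2, 1, 1]
# (2.0, 0.0): any degree-preserving rewiring keeps two drugs on one target.
print(null_giant_component(edges, side=0, n_random=20))
```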

For each drug category, the following table gives the total number of drugs, the number of drugs in the giant component, and the number of distinct components in which drugs of that category appear:

Drug Category      Total Drugs   Drugs in Giant Component   Components Containing Category
Anti-Infectives         16                  1                           4
Antineoplastics         95                  9                          46
Antiparasitic            7                  0                           2
Blood                   39                  1                          13
Cardiovascular         151                 93                          23
Dermatological          25                  1                           8
Genito-Urinary          52                 39                           6
Hormones                20                  0                           9
Metabolism              91                 40                          26
Musculoskeletal         54                 13                          10
Nervous System         211                194                           7
Respiratory             65                 57                           4
Sensory Organs          36                 22                           4
Various                 21                  6                          15

Tyrosine kinase inhibitors generally act by targeting many proteins at once. Especially after the success of Imatinib, many kinase inhibitors have been developed and are currently in the approval pipeline. These drugs might be responsible for the polypharmacology and the more random associations observed among the experimental drugs. To test this, we excluded all tyrosine kinase inhibitors, i.e. drugs with at least one tyrosine kinase target, from the giant component analysis. The observed giant component sizes for the experimental drugs and their targets were 616 and 596, respectively. After 10⁴ randomizations of the drug–target associations, the expected giant component sizes for the experimental drugs and their targets were 599 ± 12 and 551 ± 10. Polypharmacology therefore results in larger than expected giant components. Upon excluding the tyrosine kinase inhibitors, the giant component of the drug network decreased to 529, which is smaller than the expected giant component size of the randomized networks (549 ± 12). However, the observed giant component of the target network (472) remains larger than expected (424 ± 9). Hence, we cannot conclude that the tyrosine kinase inhibitors are the major contributors to the increased polypharmacology.
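To make these comparisons easier to read, the reported values can be expressed as z-scores, i.e. the number of randomized-ensemble standard deviations between the observed and expected sizes. This is an illustrative calculation only: the text does not report z-scores, and reading them as significance levels would assume an approximately normal null distribution.

```python
# (observed, randomized mean, randomized s.d.) exactly as reported in the text.
comparisons = {
    "experimental drugs, all": (616, 599, 12),
    "experimental targets, all": (596, 551, 10),
    "experimental drugs, no TKIs": (529, 549, 12),
    "experimental targets, no TKIs": (472, 424, 9),
}


def z_score(observed, mean, sd):
    """Distance of the observed value from the null mean, in null standard deviations."""
    return (observed - mean) / sd


for label, (obs, mu, sd) in comparisons.items():
    print(f"{label:30s} z = {z_score(obs, mu, sd):+.2f}")
```

With these numbers, the experimental target network stays more than five standard deviations above expectation after the tyrosine kinase inhibitors are removed ((472 - 424)/9 ≈ 5.3), which is the observation behind the final sentence of the paragraph.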

IV. Topological Features of DN and TPN

A scale-free degree distribution, where degree is the number of connections per node, suggests a preferential attachment model of network growth, particularly when new nodes are added one at a time, as in the growth of the drug target network. Both the DN and TPN exhibit an apparently scale-free degree (k) distribution (Supplementary Fig. 4a-d), with most drugs (proteins) linked to only a few other drugs (proteins), while a few drugs (proteins) act as hubs connected to a large number of distinct drugs (proteins). The exponents of the observed distributions are small compared with those of other biological networks. A small exponent indicates strong preferential attachment during network growth². These results imply that most new connections are made through a handful of existing nodes, especially highly targeted proteins. Upon inclusion of experimental drugs, the TPN degree distribution still appears scale-free, although the exponent decreases from 1.59 ± 0.17 to 1.05 ± 0.10. This indicates that experimental drugs introduce more random associations between the targeted proteins. There is no significant change in the exponent of the DN after the addition of experimental drugs.
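The text does not say how the exponents were fitted. As one possible approach, the sketch below uses the discrete maximum-likelihood approximation γ ≈ 1 + n / Σ ln(k/(k_min - 1/2)) over the n degrees k ≥ k_min, with standard error (γ - 1)/√n. The choice of estimator and of k_min are assumptions, the degree list is hypothetical, and this is not necessarily how the reported values (for example 1.59 ± 0.17) were obtained.

```python
import math


def powerlaw_exponent(degrees, kmin=1):
    """Approximate discrete maximum-likelihood estimate of gamma in P(k) ~ k**-gamma.

    Uses gamma = 1 + n / sum(ln(k / (kmin - 0.5))) over the n degrees k >= kmin,
    with asymptotic standard error (gamma - 1) / sqrt(n). Requires kmin >= 1.
    """
    tail = [k for k in degrees if k >= kmin]
    if kmin < 1 or not tail:
        raise ValueError("need kmin >= 1 and at least one degree >= kmin")
    n = len(tail)
    gamma = 1 + n / sum(math.log(k / (kmin - 0.5)) for k in tail)
    return gamma, (gamma - 1) / math.sqrt(n)


degrees = [1, 1, 1, 1, 2, 2, 3, 4, 7, 12, 25]  # hypothetical node degrees
gamma, err = powerlaw_exponent(degrees)
print(f"gamma = {gamma:.2f} +/- {err:.2f}")
```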

Both the DN and TPN display modular structure. A module in the DN consists of a group of related drugs that share more than one common target, while a module in the TPN corresponds to one or more drugs targeting all or most proteins in the module. In the TPN, modules appear as complete cliques, in which every protein is connected to all other members of the clique (Fig. 3). The average clustering coefficient measures the modularity, or cliquishness, of a network³,⁴. To assess the clustering coefficients, we generated randomized versions of the DN and the TPN by randomizing the connections between drugs and their targets in the bipartite DTN while keeping the degree distributions constant, and then projected the randomized DTN onto a randomized DN and TPN. In the DN, 509 out of 810 drugs have a clustering coefficient of 1, and the average clustering coefficient (0.839 ± 0.010) is many times larger than the average clustering coefficient of 10⁴ randomized networks (0.100 ± 0.004) (Supplementary Fig. 4e). Similarly, the average clustering coefficient of the TPN (0.617 ± 0.025) is an order of magnitude larger than that of the randomized networks (0.045 ± 0.006). Higher values of the clustering coefficient mostly come from drugs targeting three or more proteins, or proteins targeted by three or more drugs at once. Upon inclusion of experimental drugs, the average clustering coefficient of the TPN increased from 0.617 ± 0.025 to 0.783 ± 0.011 (Supplementary Fig. 4e), which is again higher than the average clustering coefficient of randomized networks (0.121 ± 0.004), hinting at two trends: (i) many experimental drugs target more than two new target proteins, and (ii) many experimental drugs introduce new connections between old targets.
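The average clustering coefficient can be computed from any adjacency structure, for instance a projected DN or TPN stored as a dict of neighbor sets. This sketch uses the local definition C_v = 2e_v / (k_v(k_v - 1)), where e_v is the number of edges among the k_v neighbors of v. Excluding nodes of degree below 2 from the average is our assumption, since conventions differ (some tools count them as 0). The toy graph, a triangle with one pendant node, is hypothetical.

```python
from itertools import combinations


def local_clustering(adj, v):
    """C_v = 2 e_v / (k_v (k_v - 1)); None when v has fewer than 2 neighbours."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return None
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean C_v over nodes of degree >= 2 (treatment of other nodes is an assumption)."""
    values = [c for c in (local_clustering(adj, v) for v in adj) if c is not None]
    return sum(values) / len(values) if values else 0.0


adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(round(average_clustering(adj), 3))  # 0.778, i.e. (1 + 1 + 1/3) / 3
```

A module that appears as a complete clique gives C_v = 1 for each of its members, which is why the clique-rich DN and TPN reach such high averages.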

V. Essentiality of Drug Targets

To predict the essentiality of a human gene, we used the phenotype information of the corresponding mouse ortholog. A human gene was defined as “essential” if a knockout of its mouse ortholog confers lethality. We obtained the human-mouse orthology and mouse phenotype data from Mouse Genome Informatics⁵ on January 3, 2006. We considered the embryonic/prenatal lethality and postnatal lethality classes as lethal phenotypes, and all remaining phenotypes as non-lethal. There were 1,267 mouse-lethal human orthologs, of which 77 are approved drug targets and 149 are targets of any drug (approved or experimental). The numbers and fractions of essential, non-essential and unknown proteins among approved targets and all targets are shown below.

Essentiality     Approved Targets   All Targets   All Proteins
Essential        77 (19%)           149 (15%)     1,267
Non-Essential    145 (37%)          287 (28%)     1,811
Unknown          172 (44%)          575 (57%)

Cancer drugs might act selectively to kill cancer cells by targeting essential gene products. To test this, we looked at the proportion of essential proteins among the targets of approved oncology drugs. Of the 61 protein targets of oncology drugs, 17 were essential (28%), which is only slightly higher than the proportion of essential proteins among the targets of all approved drugs (19%).
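The text reports only the two percentages. One way to ask whether 17 essential proteins among 61 oncology targets exceeds the 77/394 background is a one-sided hypergeometric tail, treating the oncology targets as a random draw from the approved targets. This test is our illustrative assumption, not an analysis reported in the original.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    return sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, draws + 1)
    ) / total


# 61 oncology-drug targets drawn from the 394 approved targets, 77 of which are essential.
p = hypergeom_upper_tail(17, population=394, successes=77, draws=61)
print(f"observed fraction 17/61 = {17 / 61:.0%}, one-sided P = {p:.3f}")
```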


VI. Drug Targets in the Human Protein-Protein Interaction Network

We looked at the distribution of drug targets in the human protein-protein interaction (PPI) map⁸,⁹. Starting from each drug target (or, for the randomized control, from any protein in the network), we used breadth-first search to compute the fraction of drug targets at each distance, where the distance between two nodes is the smallest number of edges connecting them. Compared with the random control, we saw an enhancement at distances 1 and 2, with the fraction of drug targets larger at distance 2 than at distance 1 (Supplementary Fig. 5a,b). If drug targets were simply clustered, we would expect the fraction to decrease monotonically with distance, as observed for the drug targets in the disease gene network (Fig. 5c). Hence the distance-2 enhancement is an inherent feature of the drug targets in the PPI network.
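The distance profile can be sketched with a plain breadth-first search from every start node. How counts were combined across start nodes is not stated, so pooling them (total targets at distance d divided by total proteins at distance d) is an assumption; the random control corresponds to passing arbitrary proteins as `sources`. The hypothetical toy chain T1 - X - T2 - Y mimics the GPCR-like case in which targets meet only through a shared interactor.

```python
from collections import defaultdict, deque


def target_fraction_by_distance(adj, targets, sources):
    """Pooled fraction of proteins at each BFS distance d >= 1 that are drug targets."""
    at_distance = defaultdict(int)
    targets_at = defaultdict(int)
    for s in sources:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # standard breadth-first search from s
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        for v, d in dist.items():
            if d > 0:
                at_distance[d] += 1
                targets_at[d] += int(v in targets)
    return {d: targets_at[d] / at_distance[d] for d in sorted(at_distance)}


adj = {"T1": {"X"}, "X": {"T1", "T2"}, "T2": {"X", "Y"}, "Y": {"T2"}}
targets = {"T1", "T2"}
result = target_fraction_by_distance(adj, targets, sources=targets)
print(result)  # {1: 0.0, 2: 1.0, 3: 0.0}
```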

This apparent increase at distances 1 and 2 also relates to the families of ‘druggable’ proteins. An increase at distance 2 means that drug targets tend to share a protein interactor rather than interact directly with each other. For instance, G protein-coupled receptors (GPCRs) show a much sharper peak at distance 2, which can be explained by the fact that GPCRs interact with G proteins and rarely interact with each other (Supplementary Fig. 5c). For kinases, by contrast, the peak at distance 1 is higher, but the peak at distance 2 is still prominent (Supplementary Fig. 5d). This shows that some kinases interact with each other, while others share a protein interactor without interacting directly. Another interesting feature of kinases is the decrease after distance 3 for both the kinase group and the random control. This result indicates that kinases are central in the network and can usually be reached from any node within 3 interactions.

We also measured the number of shared neighbors for pairs of proteins that are targeted simultaneously by the same drug. For both the targets of approved drugs and the targets of all drugs, the number of shared neighbors is significantly higher than expected, consistent with the distance-2 enhancement in the fraction measure. The number of shared neighbors for drug targets is given in the table below:

Shared Neighbors   Approved Targets   All Targets
Observed           91                 330
Expected           11 ± 3             96 ± 11
P-value            10⁻²²⁴⁴³           10⁻¹⁴⁸¹²
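A sketch of the shared-neighbor count, assuming the PPI network is a dict of neighbor sets and that the statistic is the total number of common neighbors summed over distinct co-targeted protein pairs. The text does not define the aggregation precisely, so both that choice and the toy data are assumptions. An expected value could be obtained by repeating the count after randomizing the drug–target associations.

```python
from itertools import combinations


def shared_neighbor_count(ppi, drug_targets):
    """Total PPI neighbours shared by protein pairs targeted by the same drug.

    Each co-targeted pair is counted once even if several drugs hit it;
    targets absent from the PPI network are ignored.
    """
    pairs = set()
    for targets in drug_targets.values():
        present = sorted(t for t in targets if t in ppi)
        pairs.update(combinations(present, 2))
    return sum(len(ppi[a] & ppi[b]) for a, b in pairs)


ppi = {"P1": {"H"}, "P2": {"H"}, "P3": {"Z"}, "H": {"P1", "P2"}, "Z": {"P3"}}
drug_targets = {"d1": {"P1", "P2"}, "d2": {"P1", "P3", "Q"}}  # hypothetical; Q not in PPI
print(shared_neighbor_count(ppi, drug_targets))  # 1 (P1 and P2 share H)
```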


VII. Topological Features of Drug Targets in the Human PPI Network

Aside from the degree of a node in the human PPI network (Fig. 4c), one can also measure the average betweenness (number of shortest paths passing through the node), the average closeness (inverse of the average length of the shortest paths from a particular node to all other nodes) and the average clustering coefficient (see Methods) of the nodes. We used Pajek to calculate all these topological features for the human PPI network.
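The analysis used Pajek; as an independent illustration of the same definitions, the sketch below computes closeness by breadth-first search and betweenness with Brandes' algorithm on an unweighted, undirected graph. Normalization may differ from Pajek's, and restricting closeness to the nodes reachable from v (for networks that are not connected) is our assumption.

```python
from collections import deque


def closeness(adj, v):
    """Inverse mean shortest-path length from v to the nodes it reaches (0 if isolated)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def betweenness(adj):
    """Brandes' algorithm, unweighted undirected graph, unnormalized pair counts."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # BFS that also counts shortest paths (sigma)
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # accumulate dependencies, farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted from both ends


path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(betweenness(path))   # {'A': 0.0, 'B': 1.0, 'C': 0.0}
print(closeness(path, "B"))  # 1.0
```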

The degree of a node in the human PPI network correlates well with essentiality¹⁰. To separate out the effect of essential proteins, which could inflate the average degree of the drug targets, we divided each protein group into essential and non-essential sets. In every protein group, the essential subset has a significantly higher average degree than the corresponding non-essential subset (Supplementary Fig. 5e). However, drug targets still have a higher average degree than the network average when the essential and non-essential subsets are compared separately. Hence, our conclusion about the degree of drug targets is not driven by the essential proteins in the group. This degree difference might be due to investigational bias towards known drug target proteins, which yields more reported interactors for them, especially in the literature-curated dataset. (PPI information obtained from methods such as yeast two-hybrid (Y2H) is unbiased⁸; however, Y2H is known to detect fewer interactions for membrane proteins, and drug targets are frequently membrane proteins.)

Using the degree information, we can also heuristically estimate the expected fraction of proteins that are drug targets, i.e. the ‘druggable genome’. First, we binned the proteins according to their degree, i.e. we assigned a range of degrees to a single bin. Initially, each bin corresponded to a single degree value (linear binning). Because protein interaction networks are known to be scale-free, we also used logarithmic binning with different logarithmic bases. The fraction of the druggable genome was estimated by:

$$\mathrm{ratio} = \frac{\sum_i \Pr(\mathrm{bin}_i)\, n_i}{\sum_i n_i}$$

where binᵢ is the i-th bin, Pr(binᵢ) is the ratio of the number of target proteins belonging to the i-th bin to the number of all target proteins, and nᵢ is the number of proteins in the i-th bin, whether or not they are targets. Linear binning gives a fraction of 10%. Increasing the base of the logarithm increased this ratio from 11% for base 1.25 to 20% for base 2. Larger bases lump large chunks of the data into single bins, so we believe that 10-20% is a reasonable interval, corresponding to 3,000 to 6,000 proteins. This estimate agrees well with earlier estimates of the number of disease-modifying genes, around 3,000-10,000 genes¹¹.
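The estimator can be sketched as follows: it computes Σᵢ Pr(binᵢ) nᵢ / Σᵢ nᵢ for a chosen binning, where Pr(binᵢ) is the fraction of targets falling in bin i and nᵢ counts all proteins in that bin. The protein degrees are hypothetical toy data, and assigning logarithmic bins by the floor of the logarithm is our assumption about how bins were delimited.

```python
from collections import Counter


def linear_bin(k):
    """Linear binning: every degree value is its own bin."""
    return k


def log_bin(k, base):
    """Logarithmic bin index j with base**j <= k < base**(j + 1), for k >= 1."""
    j = 0
    while base ** (j + 1) <= k:  # integer loop avoids float error in math.log
        j += 1
    return j


def druggable_ratio(degree, targets, bin_of):
    """ratio = sum_i Pr(bin_i) * n_i / sum_i n_i.

    degree: dict protein -> PPI degree, for all proteins (targets or not).
    targets: drug-target proteins, all present in `degree`.
    bin_of: function mapping a degree to a bin label.
    """
    n = Counter(bin_of(k) for k in degree.values())  # n_i, all proteins per bin
    t = Counter(bin_of(degree[p]) for p in targets)  # targets per bin
    n_targets = sum(t.values())
    weighted = sum(t[b] / n_targets * n[b] for b in n)  # sum of Pr(bin_i) * n_i
    return weighted / sum(n.values())


degree = {"P1": 1, "P2": 1, "P3": 2, "P4": 3}  # hypothetical PPI degrees
targets = {"P1", "P4"}
print(druggable_ratio(degree, targets, linear_bin))  # 0.375
print(druggable_ratio(degree, targets, lambda k: log_bin(k, 2)))  # 0.5
```

Even on this toy input, coarser (base-2) bins raise the estimate, mirroring the increase from 10% to 20% described above.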

Betweenness quantifies the expected information flow through a node. The betweenness of drug targets is larger than in the randomized case (Supplementary Fig. 5f). This shows that drugs preferentially target proteins that are more central to the cell in terms of information flow. As drugs change the phenotypic state of the cell, it is not surprising that drug targets have higher average betweenness. Closeness also measures centrality, and the different protein classes behave under closeness much as they do under betweenness (Supplementary Fig. 5g). The average clustering coefficient profile behaves differently from the other two measures. The average clustering coefficient of disease-related proteins is higher than that of the other protein classes (Supplementary Fig. 5h). Targets of all drugs (both approved and experimental) have a significantly lower average clustering coefficient. This might arise because drug targets in this class tend to lie at distance 2 from one another (Supplementary Fig. 5b), which is indicative of fewer connections among neighbors.

We also looked at the topological features of two protein families that are prominent ‘druggable’ targets: GPCRs and protein kinases. We obtained the list of GPCRs from the Molecular Class-Specific Information System (MCSIS) project website (http://www.gpcr.org) and the list of kinases from the Gene Ontology (GO) database. We found 285 GPCRs (excluding the taste receptors) and 464 protein kinases. Interestingly, GPCRs have very low degree compared with the other proteins, whereas kinases show high connectivity (Supplementary Fig. 5e). The same trend holds for betweenness (Supplementary Fig. 5f).

VIII. Human Disease Network

The human disease network (HDN) is reported elsewhere⁶; here we briefly describe the dataset and the procedure used to build it. The data were obtained from the Morbid Map (MM) of the Online Mendelian Inheritance in Man (OMIM)⁷, which contains the most up-to-date disorder-gene associations. The MM was downloaded on 21 December 2005. Each MM entry contains the name of the disorder, the associated gene symbols, the OMIM ID, and the chromosomal location. “Strong” associations, i.e. those for which at least one mutation in the gene has been proven to cause the disorder, carry the “(3)” label; there were 2,929 disorder terms with this tag. We merged these disorders into 1,284 “distinct” disorders, first automatically and then by manual validation. A broader classification into 22 disorder classes was done manually, including 155 disorders assigned to the “multiple” class for disorders with multiple clinical features, and 31 disorders assigned to the “unknown” class. The map in Fig. 5a was obtained by connecting disease nodes (circles) to gene nodes (rectangles). Node colors represent the disease classes. Genes associated with only a single disease are omitted from the map for clarity.
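The text lists the fields of each Morbid Map entry but not the file layout. The sketch below therefore assumes one entry per line with pipe-separated fields in the order disorder | gene symbols | OMIM ID | location, and the “(3)” tag at the end of the disorder field; the example lines are made up. It keeps only the “(3)” entries and collects disorder-gene links; the manual merging into distinct disorders is not covered.

```python
import re
from collections import defaultdict

STRONG = re.compile(r"\s*\(3\)\s*$")  # "(3)" tag: a causative mutation is known


def strong_associations(lines):
    """Map disorder name -> set of gene symbols, keeping only entries tagged "(3)".

    Assumed (hypothetical) layout: disorder (3)|GENE1, GENE2|OMIM ID|location
    """
    disorder_genes = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 4 or not STRONG.search(fields[0]):
            continue
        disorder = STRONG.sub("", fields[0]).strip()
        genes = {g.strip() for g in fields[1].split(",") if g.strip()}
        disorder_genes[disorder] |= genes
    return dict(disorder_genes)


lines = [  # made-up entries for illustration
    "Example disorder A (3)|GENEA, GENEB|000001|1p36",
    "Example disorder B (1)|GENEC|000002|2q11",
    "Example disorder A (3)|GENED|000003|3p21",
]
assoc = strong_associations(lines)
print(sorted(assoc["Example disorder A"]))  # ['GENEA', 'GENEB', 'GENED']
```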

IX. Properties of Drug Targets in the Human Disease Network

We looked at the average number of genes associated with each disease and the average number of disorders associated with each gene. When one of the genes associated with a disorder is also a drug target (for both approved and experimental targets), the average number of genes associated with that disease is higher than in the random case (Supplementary Fig. 6a). For genes that are also approved drug targets, the average number of associated disorders was higher than random, whereas this average was slightly lower for experimental drug targets (Supplementary Fig. 6b). These results indicate that experimental drug targets are more peripheral in the disease gene network. From these observations we would expect the average degree of approved drug targets to be the highest among all gene classes (Fig. 5b), but we see the opposite trend. This is largely because approved drug targets are also peripheral in the disease gene network. For instance, cancer genes targeted by approved drugs lie at the periphery of the cancer disease cluster, whereas most of the higher-degree genes reside in its center (Fig. 5b). For different categories of genes, the fraction of diseases associated with targets of approved drugs changes drastically. However, the average degree of targets of drugs approved after 1996 increases from 3.58 ± 0.47 to 4.17 ± 0.52. Note that the degree in the human disease gene network is correlated with both the number of diseases the gene is involved in and the number of other genes associated with the same diseases. This increase could indicate investigational bias towards diseases with few or no effective drugs, which would produce larger numbers of associated disease genes.


X. Expression Profiles of Drug Targets and Disease Genes

We used microarray data available for 36 normal human tissues¹². Expression information is available for 293 approved drug targets and for 808 of all (approved plus experimental) drug targets. First, we looked at the co-expression correlations between proteins targeted by the same drug (Supplementary Fig. 7a). We see significantly more co-expression among the targets of approved drugs (Kolmogorov–Smirnov test, P < 10⁻⁹).

To quantify how well at least one of the targets of a drug is co-expressed with the corresponding disease-causing genes, we took, for each drug–disease pair, the maximum of the Pearson correlation coefficients (PCC) between the drug's target proteins and the disease gene products, considering only pairs in which at least one drug target and one disease gene product were present in the expression dataset. The distribution has a second peak around 1 (Supplementary Fig. 7b), corresponding to drugs that directly target the disease-causing gene product. Apart from this peak, drug target–disease gene pairs show less correlation than random. For the random set, two groups of genes with the same numbers of disease genes and drug targets were selected randomly for each drug–disease pair. The neurological, respiratory, psychiatric and endocrine disease classes show more co-expression; in contrast, the ophthalmological, gastrointestinal and immunological disease classes show less (Supplementary Fig. 7c).
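The per-pair statistic can be sketched as follows, assuming expression profiles are stored as a dict from gene to a list of values across tissues (36 in the dataset; the four-value profiles here are made up). Pearson correlation is written out by hand to stay dependency-free; profiles must not be constant, since the correlation is then undefined. When a drug targets the disease gene itself, the gene correlates with itself and the maximum is 1, which is the origin of the second peak.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def max_pcc(drug_targets, disease_genes, expression):
    """Largest PCC between any target of a drug and any gene of a disease.

    Returns None when no target-gene pair is present in `expression`.
    """
    pairs = [(t, g) for t in drug_targets for g in disease_genes
             if t in expression and g in expression]
    if not pairs:
        return None
    return max(pearson(expression[t], expression[g]) for t, g in pairs)


expression = {"T1": [1, 2, 3, 4], "G1": [2, 4, 6, 8], "G2": [4, 3, 2, 1]}  # hypothetical
print(f"{max_pcc({'T1'}, {'G1', 'G2'}, expression):.3f}")  # 1.000
```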


XI. Supplementary References

1. Newman, M.E.J. Scientific collaboration networks. I. Network construction and fundamental results. Phys. Rev. E 64, 016131 (2001).

2. Albert, R. & Barabási, A.-L. Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47-97 (2002).

3. Ravasz, E., Somera, A.L., Mongru, D.A., Oltvai, Z.N. & Barabási, A.-L. Hierarchical organization of modularity in metabolic networks. Science 297, 1551-1555 (2002).

4. Watts, D.J. & Strogatz, S.H. Collective dynamics of 'small-world' networks. Nature 393, 440-442 (1998).

5. Eppig, J.T. et al. The Mouse Genome Database (MGD): from genes to mice--a community resource for mouse biology. Nucleic Acids Res. 33, D471-475 (2005).

6. Goh, K.I. et al. The human disease network. Proc. Natl. Acad. Sci. USA 104, 8685-8690 (2007).

7. Hamosh, A., Scott, A.F., Amberger, J.S., Bocchini, C.A. & McKusick, V.A. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 33, D514-517 (2005).

8. Rual, J.-F. et al. Toward a proteome-scale map of the human protein-protein interaction network. Nature 437, 1173-1178 (2005).

9. Stelzl, U. et al. A human protein-protein interaction network: A resource for annotating the proteome. Cell 122, 957-968 (2005).

10. Jeong, H., Mason, S.P., Barabási, A.-L. & Oltvai, Z.N. Lethality and centrality in protein networks. Nature 411, 41-42 (2001).

11. Hopkins, A.L. & Groom, C.R. The druggable genome. Nat. Rev. Drug Discov. 1, 727-730 (2002).

12. Ge, X. et al. Interpreting expression profiles of cancers by genome-wide survey of breadth of expression in normal tissues. Genomics 86, 127-141 (2005).


[Figure: a) example pairs of the query drug Chlorpheniramine with drugs at similarity scores from 15 to 20 (e.g. Carbinoxamine 19, Tropicamide 17, Chloroquine 16, Orphenadrine 15); b–e) chemical similarity networks of drugs targeting HRH1, ADRA1A, DRD2 and CHRM1, with drug names as node labels]

Supplementary Figure 1. Chemical similarity of drugs targeting the same protein. a) Drugs with different chemical similarity scores for the input drug Chlorpheniramine. b-e) Chemical similarity networks for drugs targeting the same protein. Two drugs are connected if their similarity score is at least 15. A blue edge corresponds to a similarity score of 15 or 16, an orange edge to a score of 17 or 18, and a red edge to a score of 19 or 20. Shaded regions are regions of high similarity showing a clique-like nature. b) Chemical similarity network for drugs targeting HRH1. c) Chemical similarity network for drugs targeting ADRA1A. d) Chemical similarity network for drugs targeting DRD2. e) Chemical similarity network for drugs targeting CHRM1.


[Supplementary Figure 2 image. Color legend (ATC classes): Metabolism, Blood, Cardiovascular, Dermatological, Genito-urinary, Hormones, Anti-infectives, Antineoplastic, Musculoskeletal, Nervous System, Antiparasitic, Respiratory, Sensory Organs, Various.]

Supplementary Figure 2. The Drug Network (DN). Each node corresponds to a distinct chemical entity and is colored according to the Anatomical Therapeutic Chemical (ATC) class it belongs to; the names of the 14 drug classes are shown on the right. The size of each node is proportional to the number of proteins targeted by the corresponding drug, and the thickness of each link is proportional to the number of target proteins shared by the two drugs it connects.
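The node-size and link-thickness rules can be illustrated with a small projection of a drug-to-target mapping. This is a sketch that assumes, as the link-thickness description implies, that two drugs are linked whenever they share at least one target protein; the drug names and function name are hypothetical.

```python
from itertools import combinations


def drug_network(drug_targets: dict[str, set[str]]):
    """Project a drug -> targets mapping onto a drug-drug network.

    Returns (node_size, link_weight): node_size[d] is the number of targets
    of drug d, and link_weight[(a, b)] is the number of targets that a and b
    share. Drug pairs with no shared target get no link.
    """
    node_size = {drug: len(targets) for drug, targets in drug_targets.items()}
    link_weight = {}
    for a, b in combinations(sorted(drug_targets), 2):
        shared = len(drug_targets[a] & drug_targets[b])
        if shared:
            link_weight[(a, b)] = shared
    return node_size, link_weight


# Hypothetical drugs; the target symbols are only illustrative.
example = {
    "drug_A": {"HRH1", "DRD2"},
    "drug_B": {"HRH1", "DRD2", "CHRM1"},
    "drug_C": {"ADRA1A"},
}
sizes, links = drug_network(example)
print(sizes)  # {'drug_A': 2, 'drug_B': 3, 'drug_C': 1}
print(links)  # {('drug_A', 'drug_B'): 2}
```

Applying the same function to the inverted mapping (target -> drugs) would analogously give a target-target projection in the spirit of the TPN.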


[Supplementary Figure 3 plots: Average Number of Components (log scale) versus Component Size, observed versus Random. Panel legends: a) App Drugs (DN), b) Approved Targets (TPN), c) All Drugs, d) All Targets, e) Experimental Drugs, f) Experimental Targets.]

Supplementary Figure 3. Component Size Distributions of DN and TPN. a) For the drug network (DN), b) for the target protein network (TPN), c) for the DN after addition of the experimental drugs, d) for the TPN after addition of the targets of the experimental drugs, e) for the DN consisting only of the experimental drugs, and f) for the TPN consisting only of the targets of the experimental drugs.
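A component size distribution like the ones plotted can be computed with a breadth-first search over an undirected graph. This minimal sketch assumes isolated nodes count as components of size 1; the randomization behind the "Random" curves is not described here and is not reproduced, so averaging such distributions over random realizations is left to the reader.

```python
from collections import Counter, defaultdict, deque


def component_size_distribution(nodes, edges) -> Counter:
    """Return Counter {component size: number of components}.

    Isolated nodes count as components of size 1.
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen = set()
    sizes = Counter()
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects the whole component containing `start`.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        sizes[size] += 1
    return sizes


dist = component_size_distribution("abcdef", [("a", "b"), ("b", "c"), ("d", "e")])
print(sorted(dist.items()))  # [(1, 1), (2, 1), (3, 1)]
```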


[Supplementary Figure 4 plots: a-d) log-log degree distributions with fitted power laws: a) Drug Network (DN), $\propto k^{-1.22 \pm 0.15}$; b) Target Protein Network (TPN), $\propto k^{-1.59 \pm 0.17}$; c) DN + Experimental Drugs, $\propto k^{-1.30 \pm 0.16}$; d) TPN + Targets of Experimental Drugs, $\propto k^{-1.05 \pm 0.1}$. e) Average Clustering Coefficient for DN, TPN, DN+EXP and TPN+EXP, Observed versus Random.]

Supplementary Figure 4. Degree Distributions and Clustering Coefficients of DN and TPN. Degree distributions a) of the drug network (DN), b) of the target protein network (TPN), c) of the DN after addition of the experimental drugs, and d) of the TPN after addition of the targets of the experimental drugs. e) Average clustering coefficient of each network compared with the random case.
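For reference, the average clustering coefficient in panel e) can be computed as sketched below. This assumes the standard local clustering coefficient averaged over all nodes, with nodes of degree below 2 counted as 0; whether the figure used that convention, and how its random networks were generated, is not stated here. The function name is hypothetical.

```python
def average_clustering(adj: dict[str, set[str]]) -> float:
    """Mean local clustering coefficient of an undirected simple graph.

    C_i = 2 * L_i / (k_i * (k_i - 1)), where L_i is the number of links among
    the k_i neighbours of node i. Nodes with k_i < 2 contribute C_i = 0 but
    still count in the average.
    """
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        nbr_list = sorted(nbrs)
        # Count links among the neighbours, each unordered pair once.
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbr_list[j] in adj[nbr_list[i]]
        )
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


# Triangle a-b-c with a pendant node d attached to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(round(average_clustering(adj), 4))  # 0.5833
```

Here C_a = 1/3, C_b = C_c = 1 and C_d = 0, giving an average of 7/12.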


[Supplementary Figure 5 plots: a-d) Fraction versus Distance (1 to 5) for Approved Targets, GPCRs, Kinases and All Targets, each compared with All; e-h) Average Degree, Average Betweenness, Average Closeness and Average Clustering Coefficient for All Proteins, Target Proteins (TP), TP + EXP, Disease, Kinases and GPCRs, each split into All, Non Essential and Essential.]

Supplementary Figure 5. Drug Targets and Some Druggable Families in the Human Protein-Protein Interaction Network. a) Fraction of proteins at each distance that are targets of approved drugs, in a breadth-first search started from either a target protein or a random protein in the human PPI network. b) The same analysis for GPCRs. c) The same analysis for kinases. d) The same analysis for targets of all drugs (approved and experimental). e) Average degree of several classes of proteins in the human PPI network. f) Average betweenness of several classes of proteins in the human PPI network. g) Average closeness of several classes of proteins in the human PPI network. h) Average clustering coefficient of several classes of proteins in the human PPI network.
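The breadth-first-search profile in panels a-d) can be sketched as follows. This is one plausible reading, not the authors' code: it assumes the per-source fractions at each distance are averaged over all starting proteins, and the function names and toy graph are hypothetical. Running it once with target proteins as `sources` and once with randomly chosen proteins gives the two curves being compared.

```python
from collections import defaultdict, deque


def bfs_distances(adj, source):
    """Shortest-path distance (in links) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def fraction_by_distance(adj, sources, members, max_distance=5):
    """For d = 1..max_distance, average over `sources` of the fraction of
    nodes at distance d from the source that belong to `members`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for source in sources:
        shells = defaultdict(list)
        for node, d in bfs_distances(adj, source).items():
            if 1 <= d <= max_distance:
                shells[d].append(node)
        for d, shell in shells.items():
            sums[d] += sum(node in members for node in shell) / len(shell)
            counts[d] += 1
    return {d: sums[d] / counts[d] for d in sorted(sums)}


# Toy path graph a-b-c-d; "b" and "d" play the role of drug targets.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(fraction_by_distance(adj, ["a"], {"b", "d"}))  # {1: 1.0, 2: 0.0, 3: 1.0}
```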


[Supplementary Figure 6 plots: a) Average Number of Associated Genes for Disorders (1286), Disease with AppTargets (239) and Disease with ExpTargets (234); b) Average Number of Associated Disorders for Disease genes (1777), Approved Targets (166) and Experimental Targets (210); c) Count of disorders per disease category, With Drug Targets versus No Drug Targets, with categories in ranked order: Endocrine, Hematological, Cardiovascular, Psychiatric, Unclassified, Connective tissue disorder, Renal, Immunological, Nutritional, Metabolic, Bone, Ear/Nose/Throat, Neurological, Developmental, Ophthalmological, Respiratory, Cancer, Muscular, Skeletal, multiple, Gastrointestinal, Dermatological.]

Supplementary Figure 6. Disease and Gene Statistics in the Human Disease Network. a) Average number of associated genes for all diseases in the data set, compared with diseases associated with genes that are also approved (App) or experimental (Exp) drug targets. Numbers in parentheses give the number of disorders in each category. b) Average number of disorders associated with each disease gene, compared with the average for genes that are also approved or experimental drug targets. Numbers in parentheses give the number of genes in each category. c) Count of disorders with genes that are also approved drug targets, for different disease categories. Disease categories are ranked by the ratio of the number of disorders associated with approved drug targets to all disorders in that category (except the Nutritional disease class, which contains only four disorders).
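The ranking rule for panel c) amounts to sorting categories by a ratio. The sketch below is illustrative only: the counts and category names are hypothetical, and the `exclude` parameter simply drops a category from the ranking, since the caption does not say where the excepted Nutritional class is placed.

```python
def rank_categories(with_targets: dict[str, int], totals: dict[str, int],
                    exclude=()) -> list[str]:
    """Rank categories by (disorders with approved-drug targets) / (all disorders).

    Categories with zero disorders, or listed in `exclude`, are left out.
    """
    ratios = {
        cat: with_targets.get(cat, 0) / n
        for cat, n in totals.items()
        if n > 0 and cat not in exclude
    }
    return sorted(ratios, key=ratios.get, reverse=True)


# Hypothetical counts for three hypothetical categories.
totals = {"cat_A": 10, "cat_B": 4, "cat_C": 20}
with_targets = {"cat_A": 5, "cat_B": 1, "cat_C": 2}
print(rank_categories(with_targets, totals))  # ['cat_A', 'cat_B', 'cat_C']
```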


[Supplementary Figure 7 plots: a) Frequency versus Expression PCC (-0.9 to 0.9) for Approved Target Proteins, All Target Proteins and All Proteins; b) Frequency versus Average Max PCC (-0.5 to 0.9) for Drug Targets vs Disease Genes and Random; c) Average Max PCC by disease category: Neurological (74), Respiratory (36), Unclassified (7), Psychiatric (44), Endocrine (50), Metabolic (27), Nutritional (7), Cardiovascular (142), Cancer (110), Hematological (44), Renal (6), Bone (26), Dermatological (4), Connective tissue disorder (54), multiple (7), Muscular (5), Ophthalmological (36), Gastrointestinal (11), Immunological (42).]

Supplementary Figure 7. Expression Profile Correlations. a) Frequency of expression correlations for protein pairs targeted simultaneously by approved (or approved plus experimental) drugs, compared with all calculated correlations. b) Frequency of the average maximum Pearson correlation coefficient (PCC) between drug targets and the corresponding disease genes. c) Average maximum PCC for various disease categories, ranked. Numbers in parentheses give the number of drug-disease pairs evaluated in each disease category.
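One way to compute an "average maximum PCC" for a single drug-disease pair is sketched below. The reading is an assumption, not taken from the source: for each target of the drug, take its highest Pearson correlation with any gene of the disease, then average those maxima over the drug's targets. Skipping a gene paired with itself, and all names and toy profiles, are likewise hypothetical.

```python
from math import sqrt
from typing import Optional


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("PCC is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


def average_max_pcc(targets, disease_genes, expression) -> Optional[float]:
    """Average, over drug targets, of each target's maximum PCC with any
    disease gene (a gene is never compared with itself)."""
    maxima = []
    for t in targets:
        pccs = [pearson(expression[t], expression[g])
                for g in disease_genes if g != t]
        if pccs:
            maxima.append(max(pccs))
    return sum(maxima) / len(maxima) if maxima else None


# Toy profiles: T1, T2 are hypothetical targets; G1, G2 hypothetical disease genes.
expression = {
    "T1": [1, 2, 3, 4], "T2": [4, 3, 2, 1],
    "G1": [2, 4, 6, 8], "G2": [1, 3, 2, 4],
}
print(round(average_max_pcc(["T1", "T2"], ["G1", "G2"], expression), 3))  # 0.1
```

T1's best match is G1 (PCC = 1.0) and T2's is G2 (PCC = -0.8), so their average is 0.1.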