Post on 27-Jan-2017


Integrating Pathway Information with Gene Expression Data to Identify Novel Pathway-specific Cancer Drugs

Charles Pei

Upper Arlington High School

Abstract:

Connectivity Map (CMap), an extensive database of drug-treatment gene expression data comprising over 6000 experiments across 1300 compounds, has proven to be a valuable asset in drug repositioning. It has been used to recognize drugs with common mechanisms of action (MOAs), discover new MOAs and identify new treatments. The goal of this project is to integrate publicly available pathway information with gene expression data from CMap in order to discover novel pathway-specific cancer drugs. We selected four major cancer-related pathways: the p53 signaling, PI3K/AKT signaling, PTEN signaling and Wnt/β-catenin signaling pathways. We applied a modified CMap algorithm to carry out pathway-specific queries across the two data sources and identified drugs that specifically perturb the pathways of interest. In doing so, we created a novel method for calculating a connectivity score from non-directional pathway information and ranked gene expression data. Applying the method, we identified many drugs that significantly affect the PTEN, PI3K/AKT and p53 pathways, though none were identified for the Wnt pathway. Some of these results were validated through Venn analysis against pathway drug information from Ingenuity Pathway Analysis (IPA), while others were hypothesized to be novel drug indications.

INTRODUCTION

The average cost to bring a single drug to market has surpassed $5 billion, a statistic driven by the fact that over 95% of experimental medicines fail due to toxicity or lack of efficacy [1]. Because of this unsustainably high price, drug repositioning, an unconventional approach in which established drug compounds are applied to new therapeutic indications, has gained prominence for its lower development costs and shorter path to approval compared with traditional drug development.

Our lab has successfully applied CMap to repurpose drugs in various disease areas. However, the current CMap method assesses effects on whole systems rather than on individual pathways. Since some drugs may have off-pathway effects, we are interested in finding pathway-specific drugs. We hypothesized that analyzing diseases at the pathway level, rather than genome-wide, could yield novel drug indications. We focus on four cancer-related pathways (the p53 signaling, PI3K/AKT signaling, PTEN signaling and Wnt/β-catenin signaling pathways) and identify drugs affecting each one.

First, the p53 pathway is a network of genes and gene products that respond to stress signals affecting the cellular mechanisms that monitor DNA replication, chromosome segregation and cell division [2]. In response to a stress signal, the p53 protein is activated and triggers either cell cycle arrest or apoptosis. Thus, mutations in pathway genes that eliminate functional p53 protein can lead to cancer. Second and third, the tumor suppressor PTEN is a negative regulator of the PI3K signaling pathway, a main regulator of cell growth, metabolism and survival [3]. Loss or mutation of PTEN in various cancers leads to hyperactive PI3K signaling. Finally, deregulation of the Wnt/β-catenin pathway is known to play a major role in human tumorigenesis [4]. By investigating the effects of the thousands of drug experiments in CMap on these four pathways, we were able to validate previously known cancer treatments and predict new cancer indications.

MATERIALS AND METHODS

Drug-treatment gene expression data were obtained from Connectivity Map (CMap), a database of over 6000 experiments across 1300 compounds, using MySQL. The raw expression data were ranked and read into R. Pathway information and known drug indications for each pathway were retrieved from Ingenuity Pathway Analysis (IPA), a tool used to model, analyze and understand complex biological and chemical systems. This information was read into R as well.
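As an illustration of the ranking step, the following Python sketch orders genes from most up-regulated to most down-regulated. The original analysis was done in R and MySQL; the function name `rank_genes`, the choice of a per-gene expression change as the ranking quantity, and the toy values are all assumptions made for this example, not details taken from the study.

```python
def rank_genes(expression_change):
    """Return gene names ordered from most up-regulated to most down-regulated.

    `expression_change` maps a gene name to a treatment-vs-control change
    (hypothetical input format, assumed for this sketch).
    """
    ordered = sorted(expression_change.items(), key=lambda item: item[1], reverse=True)
    return [gene for gene, _ in ordered]


# Toy values, invented purely for illustration.
changes = {"TP53": 0.2, "PTEN": -1.5, "AKT1": 2.1, "MDM2": 0.9}
print(rank_genes(changes))  # ['AKT1', 'MDM2', 'TP53', 'PTEN']
```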

Enrichment analysis of the non-directional pathway information was conducted by using a rank-based pattern-matching strategy based on the Kolmogorov-Smirnov statistic for nonparametric data in R. The aforementioned algorithm calculated a pathway enrichment score for each pathway-drug comparison, which was stored for later use.
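A minimal Python sketch of a Kolmogorov-Smirnov style enrichment score, of the kind used in the original CMap method, is shown below. The source does not spell out how the algorithm was modified for non-directional pathways, so treating the whole pathway as a single unsigned gene set is an assumption, as are the function and variable names. Under this convention, a positive score means the pathway genes concentrate near the top of the ranked list and a negative score means they concentrate near the bottom.

```python
def ks_enrichment(ranked_genes, pathway_genes):
    """Signed KS-style enrichment of a gene set within a ranked gene list.

    ranked_genes: genes ordered from most up- to most down-regulated.
    pathway_genes: unsigned (non-directional) pathway membership.
    """
    n = len(ranked_genes)
    pathway = set(pathway_genes)
    # 1-based positions of pathway genes, already in ascending order.
    positions = [i + 1 for i, gene in enumerate(ranked_genes) if gene in pathway]
    t = len(positions)
    if t == 0:
        return 0.0  # no pathway gene was measured, so no evidence either way
    # a: how far the pathway genes run ahead of a uniform spread (top-heavy).
    a = max((j + 1) / t - v / n for j, v in enumerate(positions))
    # b: how far they lag behind a uniform spread (bottom-heavy).
    b = max(v / n - j / t for j, v in enumerate(positions))
    return a if a > b else -b


# Toy ranked list of ten placeholder genes.
ranked = [f"g{i}" for i in range(1, 11)]
print(round(ks_enrichment(ranked, {"g1", "g2"}), 3))   # 0.8
print(round(ks_enrichment(ranked, {"g9", "g10"}), 3))  # -0.9
```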

A permutation method (10000 random permutations) was used to calculate a p-value for each pathway enrichment score from the prior step. These p-values were then corrected for multiple hypothesis testing by False Discovery Rate (FDR) adjustment to find significant drug indications for each pathway. The overlaps between known drugs and predicted drugs were analyzed using Venn analysis and the hypergeometric test with all CMap drugs as the background.
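The permutation and FDR steps can be sketched in Python as follows. Several details here are assumptions, since the source does not state them: the null distribution is built by drawing random gene sets of the same size as the pathway, the test is two-sided on the absolute score, a +1 correction keeps the p-value above zero, and the FDR adjustment is taken to be Benjamini-Hochberg. All function names are hypothetical.

```python
import random


def ks_score(positions, n):
    """Signed KS-style score for sorted 1-based positions in a list of length n."""
    t = len(positions)
    a = max((j + 1) / t - v / n for j, v in enumerate(positions))
    b = max(v / n - j / t for j, v in enumerate(positions))
    return a if a > b else -b


def permutation_p_value(observed, n_genes, set_size, n_perm=10000, seed=0):
    """Fraction of random gene sets scoring at least as extremely as `observed`."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        positions = sorted(rng.sample(range(1, n_genes + 1), set_size))
        if abs(ks_score(positions, n_genes)) >= abs(observed):
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # +1 correction (assumed)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy usage: one p-value per hypothetical drug experiment, then the FDR < 0.2 cut.
p_values = [permutation_p_value(s, n_genes=100, set_size=5, n_perm=1000)
            for s in (0.75, -0.40, 0.10)]
significant = [q < 0.2 for q in benjamini_hochberg(p_values)]
```

Fixing the seed makes the permutation p-values reproducible across runs; in practice the number of permutations bounds the smallest p-value that can be resolved.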

RESULTS

Predicted inhibitors and activators were defined by a significance threshold of FDR < 0.2. Using this threshold, we predicted p53 inhibitors (n=1272), p53 activators (n=663), PI3K inhibitors (n=241), PI3K activators (n=95), PTEN activators (n=238) and PTEN inhibitors (n=101). Interestingly, no Wnt activators or inhibitors were predicted.

Analysis of the overlaps among the known pathway drugs showed that three drugs, tretinoin, doxorubicin and daunorubicin, are known to affect all three of the PI3K, PTEN and p53 pathways. Tretinoin is an acne drug repurposed to treat acute promyelocytic leukemia, while doxorubicin and daunorubicin are used to treat a wide range of cancers. Our method found each of these drugs to be statistically significant in every one of the three pathways, with the single exception of daunorubicin in the PI3K pathway. This suggests that the method works for drugs with less specific indications.

When applied to more specific drugs, the method seems to work as well. Venn analysis of the predicted vs. known drugs shows significant overlap for each pathway except Wnt: the hypergeometric p-values, tested with all CMap drugs as the background, were a reported 0 for p53, 1.60e-08 for PI3K and 2.68e-05 for PTEN. With such low p-values, the method is recovering known drugs at a much higher rate than would be expected at random.

Figure 1. Predicted vs. known drug indications. 8 p53 drugs, 5 PTEN drugs and 13 PI3K predicted drugs were verified by our analysis. The Wnt pathway had no statistically significant drugs.
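For reference, the hypergeometric overlap test can be sketched in Python as below. The function name and the toy counts are assumptions made for illustration; they are not the counts used in the study. The p-value is the probability of seeing at least the observed number of known drugs among the predicted drugs if the predictions were drawn at random from the background.

```python
from math import comb


def hypergeom_overlap_p(background, known, predicted, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(background, known, predicted).

    background: total number of drugs considered
    known:      drugs already known to affect the pathway
    predicted:  drugs predicted by the method
    overlap:    drugs that are both known and predicted
    """
    total = comb(background, predicted)
    upper = min(known, predicted)
    tail = sum(comb(known, k) * comb(background - known, predicted - k)
               for k in range(overlap, upper + 1))
    return tail / total


# Toy counts, invented purely for illustration.
p = hypergeom_overlap_p(background=10, known=3, predicted=4, overlap=2)
print(round(p, 4))  # 0.3333
```

Because the tail is summed with exact integer arithmetic before the final division, very small p-values are not rounded away during the summation.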

rank  FDR    score  name                 dose (M)  cell line
1     0.109  0.37   chlorprothixene      1.14E-05  MCF7
2     0.109  0.37   mephenytoin          1.84E-05  HL60
3     0.109  0.36   metixene             1.16E-05  PC3
4     0.109  0.36   noscapine            9.60E-06  MCF7
5     0.109  0.36   acenocoumarol        1.14E-05  MCF7
6     0.109  0.36   clemastine           8.60E-06  MCF7
7     0.109  0.36   chlorpromazine       1.12E-05  HL60
8     0.109  0.36   R-atenolol           1.50E-05  MCF7
9     0.109  0.35   conessine            1.12E-05  MCF7
10    0.109  0.35   dihydroergocristine  5.60E-06  MCF7

Table 1. Top ten activators for PI3K. Experiments ranked by FDR, then pathway enrichment score.

rank  FDR    score   name                dose (M)  cell line
1     0.122  -0.385  sirolimus           1.00E-07  ssMCF
2     0.122  -0.358  sirolimus           1.00E-07  MCF7
3     0.122  -0.353  cytisine            2.10E-05  MCF7
4     0.122  -0.350  natamycin           6.00E-06  MCF7
5     0.122  -0.349  sulfamethoxypyrida  1.42E-05  MCF7
6     0.124  -0.347  trioxysalen         1.76E-05  MCF7
7     0.124  -0.344  hexamethonium       1.00E-05  MCF7
8     0.124  -0.334  ciprofloxacin       1.08E-05  MCF7
9     0.124  -0.332  metamizole sodium   1.20E-05  MCF7
10    0.124  -0.329  monorden            1.00E-07  MCF7

Table 2. Top ten inhibitors for PI3K. Experiments ranked by FDR, then pathway enrichment score.

The predicted drugs were also ranked, first by FDR and then by pathway enrichment score. The top ten inhibitors and activators for each pathway were taken by rank and analyzed. Tables 1 and 2 show the top ten activators and inhibitors, respectively, for PI3K. The bolded rows contain drugs that are also known drug indications. In Table 1, clemastine was found to be both known and predicted in the top ten activators. Clemastine is currently prescribed as an antihistamine for allergies, but has also been found to induce apoptosis in cutaneous T-cell lymphoma cell lines [5]. In Table 2, two different experiments involving the drug sirolimus were found to be the top and second-highest ranked inhibitors.

Sirolimus is known to have immunosuppressant and tumor-suppressant properties, and its effects on both the PTEN and PI3K pathways have already been reported [6].

The other two pathways with significant activators and inhibitors, PTEN and p53, did not have any known drugs in their top ten activator and inhibitor groupings. This may be because the predicted experiments ranked above the known drugs have a stronger effect, or because the method does not rank the experiments accurately in quantitative terms.

CONCLUSIONS

We have developed a pathway-based method of finding new drugs. Based on the significance of the overlap between known drug indications and predicted indications in three of the four pathways analyzed, we conclude that some of the previously unknown predicted drugs may be novel cancer drug indications.

Future work on this project would include creating a better enrichment score algorithm for greater pathway specificity and accuracy, analyzing the biology of the drugs’ effects on the pathways, and integrating the method with more extensive databases, such as the pathway databases KEGG and Reactome and the gene-expression database LINCS.

ACKNOWLEDGEMENTS

The authors would like to thank the other members of the Butte lab, the Stanford Institutes of Medicine Summer Program (SIMR) and Tianyi Wang for facilitating this research.

REFERENCES

[1] Herper, Matthew. (2013). “The cost of creating a new drug now $5 billion, pushing big pharma to change.” Forbes Magazine.
[2] http://www.nature.com/onc/journal/v24/n17/full/1208615a.html
[3] http://www.nature.com/onc/journal/v27/n41/full/onc2008247a.html
[4] http://www.boneandcancer.org/MOLab%20Publications%20in%20PDF%20files/Luu%20et%20al_Targeting%20Wnt%20bCat%20Review_CCDT_4-7-05.pdf
[5] http://www.ncbi.nlm.nih.gov/pubmed/23362870
[6] http://www.ncbi.nlm.nih.gov/pubmed/16039868
