reconstruction of the temporal signaling …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre...

71
RECONSTRUCTION OF THE TEMPORAL SIGNALING NETWORK IN SALMONELLA- INFECTED HUMAN CELLS A THESIS SUBMITTED TO THE GRADUATE SCHOOL OF INFORMATICS OF MIDDLE EAST TECHNICAL UNIVERSITY BY GÜNGÖR BUDAK IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN BIOINFORMATICS SEPTEMBER 2016

Upload: others

Post on 10-Jan-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

RECONSTRUCTION OF THE TEMPORAL SIGNALING NETWORK IN SALMONELLA-INFECTED HUMAN CELLS

A THESIS SUBMITTED TO THE GRADUATE SCHOOL OF INFORMATICS OF

MIDDLE EAST TECHNICAL UNIVERSITY

BY

GÜNGÖR BUDAK

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

IN BIOINFORMATICS

SEPTEMBER 2016

Page 2: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları
Page 3: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

RECONSTRUCTION OF THE TEMPORAL SIGNALING NETWORK IN SALMONELLA-INFECTED HUMAN CELLS

Submitted by Güngör BUDAK in partial fulfillment of the requirements for the degree of Master of Science in the Department of Bioinformatics, Middle East Technical University by, Prof. Dr. Nazife BAYKAL Director, Graduate School of Informatics, METU Assoc. Prof. Dr. Yeşim AYDIN SON Head of Department, Health Informatics, METU Assoc. Prof. Dr. Yeşim AYDIN SON Supervisor, Health Informatics, METU Dr. Nurcan TUNÇBAĞ Co-supervisor, Health Informatics, METU Examining Committee Members: Assoc. Prof. Dr. Rengül ÇETİN ATALAY Health Informatics, METU Assoc. Prof. Dr. Yeşim AYDIN SON Health Informatics, METU Assist. Prof. Dr. Aybar Can ACAR Health Informatics, METU Assoc. Prof. Dr. Banu GÜNEL KILIÇ Information Systems, METU Assist. Prof. Dr. Ceren SUCULARLI Bioinformatics, Hacettepe University

Date: 09.09.2016

Page 4: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları
Page 5: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

iii

I hereby declare that all information in this document has been obtained and presented in accordance with academic rules and ethical conduct. I also declare that, as required by these rules and conduct, I have fully cited and referenced all material and results that are not original to this work.

Name, Last name: Güngör BUDAK

Signature:

Page 6: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

iv

ABSTRACT

RECONSTRUCTION OF THE TEMPORAL SIGNALING NETWORK IN SALMONELLA-INFECTED HUMAN CELLS

Budak, Güngör MSc, Bioinformatics Graduate Program

Supervisor: Assoc. Prof. Dr. Yeşim AYDIN SON Co-supervisor: Dr. Nurcan TUNÇBAĞ

September 2016, 57 pages

Salmonella enterica is a bacterial pathogen whose mechanism of infection is usually

through food sources. The pathogen proteins are translocated into the host cells to

change the host signaling mechanisms either by activating or inhibiting the host proteins.

In order to obtain a more complete view of the biological processes and the signaling

networks and to reconstruct the temporal signaling network of the human host, we have

used two network modeling approaches, the Prize-collecting Steiner Forest (PCSF)

approach and the Integer Linear Programming (ILP) based edge inference approach by

integrating a published temporal phosphoproteomic dataset of Salmonella-infected

human cells and the human interactome. The final temporal signaling network conserves

the information about temporality and directionality, while showing hidden entities in the

signaling, such as the SNARE binding, mTOR signaling, immune response, cytoskeleton

organization, and apoptosis pathways. Although the targets of Salmonella effectors such

as CDC42, RHOA, 14-3-3δ, Syntaxin family, Oxysterol-binding proteins were not present

in the phosphoproteomic dataset, they were revealed in the reconstructed signaling

network. Structural analysis of these targets also revealed binding preferences of their

neighbors. The application of such integrated approaches has a high potential to identify

the clinical targets in infectious diseases, especially in the Salmonella infections.

Keywords: phosphoproteomic, network reconstruction, Salmonella infection, temporal

data integration, pathway analysis

Page 7: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

v

ÖZ

SALMONELLA İLE ENFEKTE OLMUŞ İNSAN HÜCRELERİNDE ZAMANA BAĞLI SİNYAL AĞLARININ YENİDEN KURULMASI

Budak, Güngör Yüksek Lisans, Biyoenformatik Yüksek Lisans Programı

Tez yöneticisi: Doç. Dr. Yeşim AYDIN SON Ortak tez yöneticisi: Dr. Nurcan TUNÇBAĞ

Eylül 2016, 57 sayfa

Salmonella enterica, enfeksiyon mekanizması genellikle yiyecek kaynakları yoluyla olan

bir bakteriyel patojendir. Patojen proteinleri, konak hücrelere konağın sinyal

mekanizmalarını ve konak proteinlerini etkinleştirerek ya da engelleyerek değiştirmek

üzere taşınır. Enfekte insan hücrelerindeki biyolojik yolakların ve zamana bağlı sinyal

ağlarının daha bütün olarak yeniden kurulması için, Salmonella ile enfekte olmuş insan

hücrelerinin zamana bağlı fosfoproteomik veriseti ile insan etkileşim haritası

birleştirilerek, Ödül-toplayan Steiner Orman (ÖTSO) ve Tam Sayı Doğrusal Programlama

(TSDP) temelli ilişki çıkarım yaklaşımları kullanıldı. Elde edilen zamana bağlı sinyal ağı,

zaman ve yön bilgisini korurken SNARE bağlanma, mTOR sinyali, bağışıklık tepkisi,

hücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

gösterdi. CDC42, RHOA, 14-3-3δ, Syntaxin ailesi, Oxysterol bağlanma proteinler gibi

Salmonella etkileyicilerinin hedefleri fosfoproteomik verisetinde olmamasına rağmen, bu

proteinler yeniden kurulan sinyal ağında ortaya çıktı. Bu hedeflerin yapısal analizi,

komşularının bağlanma eğilimlerini gösterdi. Bu gibi birleştirilmiş yaklaşımların

uygulaması, özellikle Salmonella enfeksiyonları olmak üzere, bulaşıcı hastalıklardaki

klinik hedeflerin tanımlanmasında yüksek potansiyele sahiptir.

Anahtar sözcükler: fosfoproteomik, yeniden ağ kurma, Salmonella enfeksiyonu, zamana

bağlı veri birleştirmesi, yolak çözümlemesi

Page 8: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

vi

DEDICATION

To my family and fiancée…

Page 9: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

vii

ACKNOWLEDGEMENTS

First and foremost, I would like to express my deepest gratitude to my co-supervisor, Dr.

Nurcan TUNÇBAĞ who has been a tremendous mentor for me. Her constant intellectual

support and extraordinary guidance have allowed this thesis to come alive in the first

place.

I would like to thank my supervisor Assoc. Prof. Dr. Yeşim AYDIN SON not only for her

support throughout this work, but also for her priceless guidance with a friendly attitude

since the day I met bioinformatics as a freshman biology student.

I would like to thank Dr. Öykü EREN ÖZSOY and Assoc. Prof. Dr. Tolga CAN for their

exceptional help with the ILP-based edge inference analysis used in this work.

I also would like to thank Assoc. Prof. Dr. Rengül ÇETİN ATALAY, Assoc. Prof. Dr. Banu

GÜNEL KILIÇ, Assist. Prof. Dr. Aybar Can ACAR and Assist. Prof. Dr. Ceren

SUCULARLI for taking time out from their busy schedule to attend the examining

committee and provide insightful comments.

Last but not the least important, I would like to thank my parents, Nergiz BUDAK and

Haydar BUDAK, my sister, Gözde BUDAK, and my fiancée, Sinem DEMİRKAYA for their

encouragement, unconditional and infinite support.

Page 10: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

viii

TABLE OF CONTENTS

ABSTRACT ..................................................................................................................... iv

ÖZ ................................................................................................................................... v

DEDICATION .................................................................................................................. vi

ACKNOWLEDGEMENTS .............................................................................................. vii

TABLE OF CONTENTS ................................................................................................ viii

LIST OF TABLES ............................................................................................................ x

LIST OF FIGURES .......................................................................................................... xi

LIST OF ABBREVIATIONS ............................................................................................ xii

CHAPTERS

1 INTRODUCTION .................................................................................................... 1

2 LITERATURE REVIEW .......................................................................................... 5

2.1 Systems Biology Analysis of Salmonella enterica ....................................... 5

2.2 Quantitative Phosphoproteomic Analysis .................................................... 7

2.3 Protein-Protein Interaction (PPI) Databases ................................................ 8

2.4 Network Modeling Methods ......................................................................... 9

2.5 Network Analysis and Visualization ........................................................... 12

3 MATERIALS AND METHODS .............................................................................. 13

3.1 Datasets ..................................................................................................... 13

3.2 Network Modeling ...................................................................................... 13

3.2.1 Prize-collecting Steiner Forest Approach .............................................. 15

3.2.2 Integer Linear Programming based Edge Inference Approach ............. 17

3.3 Assessment of the Improvement ............................................................... 19

3.4 Network Analysis and Clustering ............................................................... 19

3.5 Functional Enrichment Analysis ................................................................. 20

Page 11: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

ix

3.6 Structural Analysis of the Nodes in the Reconstructed Network ................ 20

3.7 Network Visualization ................................................................................ 21

4 RESULTS AND DISCUSION ................................................................................ 23

4.1 Reconstruction of Temporal Signaling Networks in Salmonella-infected Human Cells ............................................................................................................ 23

4.2 Salmonella-human Interactome and Functional Enrichment of the Reconstructed Network ............................................................................................ 27

4.3 Functional Enrichment of the Clusters in the Reconstructed Network ....... 29

4.4 Target Selection Approach the Reconstructed Network ............................ 34

4.5 CDC42 is a clinically important target in Salmonella infection ................... 36

4.6 The reconstructed signaling network revealed many other potential clinical targets 37

5 CONCLUSION ...................................................................................................... 43

5.1 Conclusions ............................................................................................... 43

5.2 Future Work ............................................................................................... 45

REFERENCES .............................................................................................................. 47

APPENDIX .................................................................................................................... 57

The abstract of the publication of this study on Frontiers in Microbiology. ................. 57

Page 12: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

x

LIST OF TABLES

Table 1: Representing overlapping proteins in different time points and different locations ........................................................................................................................ 27 Table 2. Targets of Salmonella effectors in the reconstructed network. ....................... 29 Table 3. Top ranking proteins in the reconstructed network ......................................... 35

Page 13: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

xi

LIST OF FIGURES

Figure 1: Salmonella-human host cell interaction for Salmonella replication .................. 6 Figure 2: A toy example for PCSF ................................................................................... 9 Figure 3: A comparison of the network modeling approaches to the PCSF approach embedded in Omics Integrator ...................................................................................... 11 Figure 4. The flowchart of complete analysis. ............................................................... 15 Figure 5: The reconstructed network using only ILP-based edge inference without using PCSF ............................................................................................................................. 25 Figure 6: Snapshot of the interactive online visualization of the reconstructed network 26 Figure 7. Visual representation of the reconstructed signaling network of Salmonella-infected host cell. ........................................................................................................... 28 Figure 6: Enriched Gene Ontology biological process terms for clusters in the reconstructed network ................................................................................................... 31 Figure 7: Enriched Gene Ontology cellular component terms for clusters in the reconstructed network ................................................................................................... 32 Figure 8: Enriched Gene Ontology molecular function terms for clusters in the reconstructed network ................................................................................................... 33 Figure 9: Enriched KEGG and Reactome pathways for clusters in the reconstructed network .......................................................................................................................... 34 Figure 10. CDC42 and its interactions in the reconstructed network ............................ 37 Figure 11: ACTG1 and its interactions in the reconstructed network ............................ 38 Figure 12: STX4 and its interactions in the reconstructed network ............................... 39 Figure 13. Visualization of the first degree neighbors of selected proteins ................... 41

Page 14: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

xii

LIST OF ABBREVIATIONS

CLR: Context likelihood relatedness DAVID: Database for Annotation, Visualization and Integrated Discovery GO: Gene Ontology HGNC: HUGO Gene Nomenclature Committee ILP: Integer Linear Programming MS: Mass spectrometry NP: Nondeterministic polynomial time PCSF: Prize-collecting Steiner Forest PCST: Prize-collecting Steiner Tree PDB: Protein Data Bank PPI: Protein-protein interaction SCV: Salmonella containing vacuole SHI: Salmonella–human interactome T3SS: Type III Secretion System

Page 15: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

1

CHAPTERS

CHAPTER 1

1 INTRODUCTION

Salmonella enterica is a bacterial pathogen that usually infects its host through food

sources. Translocation of the pathogen proteins into the host cells leads to changes in

the signaling mechanism either by activating or inhibiting the host proteins. These

changes can be quantified by the phosphoproteomic measurements from high-

throughput experiments in Salmonella-infected host cells. However, not all proteins that

are active in a signaling process can be quantified in a single snapshot, because some

proteins have very low levels of phosphorylation and they are in very low abundance or

the phosphorylated protein undergoes into a rapid degradation process (Johnson &

Hunter, 2004). Temporal quantification of the phosphorylated proteins provides a better

viewpoint about how the signaling occurs in the infected-host cell. However, the complex

molecular changes during the infection require a systems level network-based analysis

to identify potential clinical targets. At this point, network modeling approaches emerge

to complete lacking parts and to reconstruct the signaling network by integrating publicly

available interaction data.

Network modeling methods either infer the already known interactomes or reconstruct

the interactions from the input data which are usually obtained by curating experimental

results in the literature. There are several publicly available human interactomes retrieved

from high-throughput or low-throughput experiments to detect protein-protein interactions

(PPIs). These data are usually deposited in databases. Recent efforts have been spent

to consolidate these databases in a single source and score each interaction based on

the reliability of the data source (Keskin, Tuncbag, & Gursoy, 2016). There are many

tools to reconstruct the signaling networks including Omics Integrator which solves the

Prize-collecting Steiner forest (PCSF) problem. Omics Integrator is a powerful tool for

recovering hidden nodes in the reconstructed network that also incorporates weighted

negative evidence. From similar network modeling methods, KeyPathwayMiner can also

incorporate negative evidence but the negative evidence has to be given as hard cutoff

(Alcaraz et al., 2014; Tuncbag et al., 2016). Although PCSF is sufficient for revealing

hidden proteins and obtaining a more complete network, given temporal

Page 16: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

2

phosphoproteomic measurements where there are multiple datasets, it cannot perform

the integration (Tuncbag et al., 2016). The integration is necessary to understand the

temporality of the Salmonella-infection in human cells. Integer linear programming (ILP)

based edge inference method is able to integrate such time-series datasets quickly and

accurately (Eren Ozsoy, Aydin Son, & Can, 2015). Therefore, the combination of PCSF

and ILP-based edge inference can accurately reconstruct a more complete temporal

signaling network in Salmonella-infected human cells.

In this thesis, the aim is to reconstruct the temporal signaling network in Salmonella-

infected human cells and identify possible clinically important targets. The temporal

phosphoproteomic data of the Salmonella-infected human cells (Rogers, Brown, Fang,

Pelech, & Foster, 2011) have been used to model the altered signaling network in the

host cells. We have used a powerful combination of two different network modeling

approaches to construct the signaling network of Salmonella-infected human cells. First,

the temporal phosphoproteomic data of Salmonella-infected human cells are integrated

with protein interaction data to construct the signaling pathway at each time point. Then,

all constructed networks were merged together, and used as the input for the second part

of the network modeling step, in which directions are assigned to the interactions based

on the temporal data. Our approach allowed us to identify host pathways altered during

Salmonella infection. In addition, by using network analysis techniques, we have provided

a ranking of the proteins according to their importance during the infection. Additionally,

this temporal signaling network is supported with the structural data to show how some

proteins use the same binding site repeatedly and some others use different faces to

interact with their partners.

This thesis is divided into following chapters:

Chapter 2 provides the literature review including the information about the systems

biology of Salmonella enterica and the details of the most widely used quantitative

phosphoproteomic analysis methods. Moreover, Chapter 2 gives information about

protein-protein interaction (PPI) databases, network modeling methods, network analysis

and visualization of the biological networks.

Chapter 3 describes the experimental dataset used in this study, the protein-protein

interaction (PPI) databases that are selected to integrate using network modeling

methods and to validate the predicted interactions. It also gives several numbers about

the experimental dataset and the PPI databases. Furthermore, Chapter 3 describes the

methods used in reconstructing the final temporal signaling network, network analysis

and functional enrichment analysis.

Page 17: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

3

Chapter 4 shows the network and functional analysis results of the reconstructed

network. The important clinical targets which are revealed after the reconstruction in

Salmonella-infected human cells are also shown in Chapter 4. Furthermore, Chapter 4

discusses the results obtained along with the literature support.

Chapter 5 gives brief summary of the study and concludes the results and discussion

chapter by revisiting some of the points. This chapter also mentions the future direction

of the study.

Page 18: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

4

Page 19: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

5

CHAPTER 2

2 LITERATURE REVIEW

2.1 Systems Biology Analysis of Salmonella enterica

Salmonella enterica is a gastrointestinal pathogen that infects the human cells by

translocation of its effector proteins with a vacuole called the Salmonella containing

vacuole (SCV). The SCV compartment allows the pathogen to replicate and proliferate

within the host cells. The secretion system transfers the pathogen proteins, which are

called the effectors, directly into the cytosol of the host cells (Dandekar, Astrid, Jasmin,

& Hensel, 2012). Translocation is achieved by type III secretion systems (T3SSs) where

T3SS-1 is responsible for the regulation and replication of the SCV. These effectors have

interactions with the host proteins and can change cell functions such as apoptosis, post-

translational modifications, and intracellular signaling (Hannemann, Gao, & Galan, 2013)

(See Figure 1). Salmonella can adapt to a broad range of environmental conditions and

process many different metabolites (Dandekar et al., 2012). Although many efforts have

been invested to understand the adaptation mechanism of Salmonella, functions of its

effector proteins, the affected metabolic regulatory pathways, details of the host-

pathogen communication and the changes in the host signaling pathways are still

unknown. A systems level modeling approach has been performed on the effectors of

Salmonella to understand the adaptation process and 14 regulators have been identified

to play a critical role in the regulation of the genes responsible for Salmonella infection

(Yoon, McDermott, Porwollik, McClelland, & Heffron, 2009). Also the Salmonella’s

metabolic network during its replication has been modeled using flux balance analysis,

which has led to the identification of a set of metabolic pathways crucial during the

intracellular replication (Raghunathan, Reed, Shin, Palsson, & Daefler, 2009). The

invasion of the pathogen is mainly transduced by the protein kinase signaling cascades

in the host cell. Salmonella infection promotes apoptosis and adapts to the host cell’s

ubiquitination process (Steele-Mortimer, 2011). Regarding the regulome of Salmonella,

the context likelihood of relatedness (CLR) approach has been used to infer the

transcriptional regulatory connections by using mutual information in gene expression

data and several regulatory networks have been identified (Taylor et al., 2009).

Page 20: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

6

Figure 1: Salmonella-human host cell interaction for Salmonella replication.

Salmonella transfers its effector proteins into host cell to induce bacterial uptake and

maturation of macropinosomes where it replicates. Adapted from Hannemann S, Gao B,

Galán JE (2013) Salmonella Modulation of Host Cell Gene Expression Promotes Its

Intracellular Growth. PLoS Pathog 9(10): e1003668. doi:10.1371/journal.ppat.1003668.

Copyright 2013 Hannemann et al. under Creative Commons Attribution License.

Understanding the communication and the signaling between Salmonella and its host in

detail is crucial to improve the available treatment strategies for the Salmonella infection.

The recently released interactome of the Salmonella effectors and human proteins, which

has been curated from the literature, again revealed the enrichment of the MAPK

signaling and the apoptotic pathways for the studied protein set (Schleker et al., 2012).

Page 21: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

7

The advances in high-throughput omic technologies also allow the systems-level

identification of signaling components within the host cell. The analysis of mRNA

expression of ~4300 genes after a Salmonella infection in the human epithelial cells

showed that the NF-κB is a key transcription factor in the regulation of a wide range of

genes (Eckmann, Smith, Housley, Dwinell, & Kagnoff, 2000). Also several cytokines,

transcription factors and kinases are shown to be over-expressed in the same study. In

a temporal gene expression analysis, where the Bayesian network analysis is used, the

immune response, Wnt, PI3K, mTOR, TGF-β and many other signaling pathways were

found to be altered during the Salmonella infection. The host signal response was shown

to be activated during the earlier time points rather than later (Lawhon et al., 2011). In

another study, different gene expression datasets were integrated with protein-protein

interactions and compared to each other to find out the specific subnetworks altered by

Salmonella infection in the host (Dhal, Barman, Saha, & Das, 2014). In a global temporal

phosphoproteomic analysis of Salmonella-infected human cells, 9500 phosphorylation

events were quantified during the first 20 minutes of the infection and regulated host

pathways were identified. Clustering analysis showed that the effector SopB was mainly

responsible for the alterations of the phosphorylation events in the host cell (Rogers et

al., 2011).

2.2 Quantitative Phosphoproteomic Analysis

The development of mass spectrometry (MS) based methods made large-scale

phosphoproteomic analysis possible. Although these methods do not allow one to have

time-resolved and mechanistic insight into dynamic signaling pathways, such quantitative

information can be extracted in sequential experiments. These methods can be through

the use of an isotopic label which is incorporated into proteins in cell culture, (e.g., stable

isotope labelling by amino acids in cell culture (SILAC)), a label chemically introduced

into proteins (e.g., isotope-coded affinity tag (ICAT)) or peptides (e.g., iTRAQ) or can be

label-free. The SILAC experiment, which is the most widely used method, is conducted

by replacing an essential amino acid with a nonradioactive, isotopically labelled (heavy)

form of that amino acid in one culture, and the normal form (light) of that amino acid in a

parallel culture. After growing both cultures until near complete incorporation of the

isotopically labeled amino acid, the two cultures are combined into one sample and the

proteins are digested into peptides. Next, these peptides are analyzed by LC-MS/MS

(liquid chromatography coupled MS) experiment and the ratios of light and heavy forms

of peptide pairs are investigated such that if the ratio is one-to-one, there is no change in

the expression (Collins, Yu, & Choudhary, 2007).

Page 22: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

8

2.3 Protein-Protein Interaction (PPI) Databases

The PPI databases are primarily composed of accumulated results of high-throughput

and low-throughput experiments of protein interactions from the literature. These

databases are growing as the interaction-detection methods are improved and the

number of available experiment results increases. There are over 100 PPI databases,

which are mostly independent, as of 2015. Due to the redundancy, differences in curation

and annotation, and interoperability among these databases, the integration of all

information from different sources is necessary. There are two major databases which

integrate different sources, which are iRefWeb and String (Search Tool for Interacting

Genes and Proteins) (Franceschini et al., 2013; Turner et al., 2010). iRefWeb provides

details of interaction data and supporting evidence such as the number of publications

supporting the interaction, the type of experimental detection, and the agreement across

databases. String additionally provides protein−protein associations (indirect) based on

experimental evidence or predictions using text mining along with direct protein−protein

interactions (Keskin et al., 2016). In this study, we used iRefWeb as the global human

interactome and a Salmonella-human interactome which is generated through manual

literature and database search of journal articles and databases and contains 62 PPI

(Schleker et al., 2012). Although these PPI databases are useful for understanding the

interactions between proteins, a full understanding can only be obtained through

structural binding information of these proteins (Wang et al., 2012). Structure-based

predictions of PPIs can be done using docking or template-based approaches (Keskin et

al., 2016). Docking is a computational modeling method where the binding event of two

proteins is modeled by calculating the binding free energy, and finding the structural

assemblies. Template-based methods make use of the available experimental

information for homologous proteins to predict an unknown protein-protein interaction

(Vakser, 2014). Depending on quality and coverage of templates, template-based

approaches are more effective in terms of computational power than docking and they

provide relatively lower false positive rate compared to docking (M. Gao & Skolnick, 2010;

Y. Gao, Douguet, Tovchigrechko, & Vakser, 2007; Tuncbag, Keskin, Nussinov, & Gursoy,

2012; Vreven, Hwang, Pierce, & Weng, 2014). Interactome3D is a database of

interactions that are predicted by a template-based method where sequence and

structural similarity as well as the domain occurrence are investigated (Mosca, Ceol, &

Aloy, 2013). The other template-based PPI databases include INstruct, HOMCOS and

PRISM (Fukuhara & Kawabata, 2008; Meyer, Das, Wang, & Yu, 2013; Tuncbag, Gursoy,

Nussinov, & Keskin, 2011).

Page 23: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

9

2.4 Network Modeling Methods

Figure 2: A toy example for PCSF. Green circles indicate experimental hits that come

from high-throughput ‘omic’ experiments, while purple circles represent the hidden

intermediate molecules that the algorithm can locate in the reconstructed network using

a human interactome. Adapted from Esti Yeger-Lotem, Laura Riva, Linhui Julie Su,

Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie

S Valastyan, David R Karger, Susan Lindquist & Ernest Fraenkel (2009) Bridging high-

throughput genetic and transcriptional data reveals cellular responses to alpha-

synuclein toxicity. Nature Genetics 41, 316 - 323. doi:10.1038/ng.337. Copyright 2009

Nature Publishing Group under the license number 3936471346242 on Aug 26, 2016

Although omic technologies provide large amount of high dimensional data, the complete

map of the signaling pathways cannot be retrieved by the direct connections of omic hits,

because there are many intermediates which are not represented in the experimental

data. Signaling networks can also be modeled by optimization based approaches

(Dittrich, Klau, Rosenwald, Dandekar, & Muller, 2008; Gosline, Spencer, Ursu, &

Fraenkel, 2012; Huang et al., 2012; Huang & Fraenkel, 2009; Tuncbag et al., 2013;

Yeger-Lotem et al., 2009) where omic hits are defined as constraints. For example,

Page 24: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

10

among these approaches, Prize-collecting Steiner forest (PCSF) approach can locate

hidden intermediate nodes within the reconstructed network (Tuncbag et al., 2013) (See

Figure 2 and 3). Previously, various network modeling approaches have been applied

for the integration of the multiple omic sets of diseases and the disease networks of

various cancer types have been successfully reconstructed (Huang et al., 2012; Kim,

Wuchty, & Przytycka, 2011). Regulatory networks can be reconstructed by various

approaches by utilizing the gene expression data including Boolean networks, Bayesian

networks, and methods based on information theory and differential equations (De Jong,

2002; Hecker, Lambeck, Toepfer, Van Someren, & Guthke, 2009; Linde, Schulze,

Henkel, & Guthke, 2015). Analysis of the perturbation data has also been proposed for

the reconstruction networks (Aijo, Granberg, & Lahdesmaki, 2013; Bender et al., 2010;

Frohlich, Sahin, Arlt, Bender, & Beissbarth, 2009; Kiani & Kaderali, 2014; Markowetz,

Kostka, Troyanskaya, & Spang, 2007). For example Nested Effects Models (NEMs)

(Markowetz et al., 2007) use a set of knocked-down genes and their indirect effect on a

larger set of genes to reconstruct the network. Methods that utilize observations of

perturbed networks at a steady state or at several time points include (Dynamic)

Deterministic Effects Propagation Networks ((D)DEPNs) (Bender et al., 2010; Frohlich et

al., 2009), Sorad (Aijo et al., 2013), and Dynamic Probabilistic Boolean Threshold

Networks (DPTBNs) (Kiani & Kaderali, 2014). However, these network reconstruction

methods are computationally expensive and do not scale well for the reconstruction of

large networks. Recently, Linear Programming (LP) based approaches have also been

used to solve the network reconstruction problem (Eren Ozsoy & Can, 2013; Knapp &

Kaderali, 2013; Matos, Knapp, & Kaderali, 2015). LP-based methods model the

reconstruction problem as an optimization problem and are able to construct networks

from both perturbation and time-series assays. However, based on the optimization

function and the linear constraints, LP-based methods may be computationally

expensive, as well. For example, a very recent method, lpNet (Matos et al., 2015),

requires 3 days to reconstruct a 20 node network in the in silico dataset of the HPN-

DREAM breast cancer network inference challenge. The DREAM (Dialogue on Reverse-

Engineering Assessment and Methods) challenge aims to setup a joint effort between

computational and experimental biologists toward revealing the cellular networks from

multiple high-throughput data (Stolovitzky, Monroe, & Califano, 2007).

Page 25: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

11

Figure 3: A comparison of the network modeling approaches to the PCSF

approach embedded in Omics Integrator. Blue color = "Yes", white color = "No".

Adapted from Tuncbag N, Gosline SJC, Kedaigle A, Soltis AR, Gitter A, Fraenkel E

(2016) Network-Based Interpretation of Diverse High-Throughput Datasets through the

Omics Integrator Software Package. PLoS Comput Biol 12(4): e1004879.

doi:10.1371/journal.pcbi.1004879. Copyright 2016 Tuncbag et al. under Creative

Commons Attribution License.

An LP variant, the Integer Linear Programming (ILP) approach, by Melas et al. uses

several optimization steps to find and remove the inconsistencies between

measurements and the input network topology (Melas, Samaga, Alexopoulos, & Klamt,

2013). Additionally, ILP is a known NP-hard problem, and due to the large number of

variables, this method may not find solutions in a reasonable time, as it requires 64000

seconds for a 14 node network. A divide and conquer based ILP solution has been

previously proposed, which scales well for larger networks by merging the solutions of

the smaller subnetworks (Eren Ozsoy & Can, 2013). The main difference of the proposed

ILP approach was the definition of the optimization function as the minimization of the

discrepancy between a reference network and the inferred network. Recently, the ILP

approach has been extended to work on the temporal data (Eren Ozsoy et al., 2015) and

here we have directly apply that method for the construction of the temporal signaling

network in Salmonella-infected human cells. Although network modeling approaches are

Page 26: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

12

easily adaptable to identify signaling components in various disease states, to our best

knowledge, these approaches have not yet been applied for the reconstruction of

signaling networks in the human host cell during the Salmonella infection.

2.5 Network Analysis and Visualization

A simple network analysis is usually performed using several network statistics. These

network statistics include summary statistics of the network, centrality measures and

clustering. First, the network summary statistics provide us with a more local level of

understanding about how actors take part in. The simplest way of getting a local

understanding is to quantify the node’s degree which is number of incoming (in-degree)

and outgoing (out-degree) edges to that node. Second, centrality measures help us

understand how important actors in a network in a quantifiable way. There are several

common centrality measures; degree, closeness, betweenness. Degree centrality

quantifies the importance of nodes based on their degrees. Closeness centrality takes

the distance between nodes into account when quantifying the importance of nodes.

Betweenness centrality measures how much linked a node to other nodes in the network

and quantifies according to this information (Salter‐Townshend, White, Gollini, & Murphy,

2012). Clustering is another common step in the network analysis which focuses on

dividing the network into highly connected subnetworks. The Girvan-Newman clustering

method makes use of centrality measures for clustering (Girvan & Newman, 2002). Many

clustering algorithms try to find the cluster of nodes with maximized modularity (Danon,

Diaz-Guilera, Duch, & Arenas, 2005). For example, the Louvain method for community

detection divides networks into clusters with the highest modularity (Vincent, Jean-Loup,

Renaud, & Etienne, 2008). There are also methods that find overlapping clusters

including CFinder algorithm that makes use of k-cliques (Adamcsek, Palla, Farkas,

Derenyi, & Vicsek, 2006; Palla, Derenyi, Farkas, & Vicsek, 2005).

Cytoscape is the most commonly used software for analyzing and visualizing networks

(Shannon et al., 2003). It provides a direct access to PPI databases for multiple

organisms. There are also downloadable functional pathways in the software. Many

statistics and parameters about networks can be obtained using the built in network

analysis option. Visualization of node and edges can be customized with many options

directly or assigning data such as customizing size of nodes according to their degrees.

The software can be extended by installing plugins using the Cytoscape interface or its

website. CyToStruct, a Cytoscape plugin, can combine structural information by

integrating PPI databases at molecular level (Nepomnyachiy, Ben-Tal, & Kolodny, 2015).

Another Cytoscape plugin ClueGO integrates Gene Ontology, KEGG and BioCarta

functional annotations to do functional enrichment for the network (Bindea et al., 2009).

Page 27: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

13

CHAPTER 3

3 MATERIALS AND METHODS

3.1 Datasets

We have used the global temporal phosphoproteomic dataset, where four time points,

which are 2, 5, 10, and 20 minutes, after Salmonella infection in human cell, were

selected (Rogers et al., 2011). Another dimension of this dataset is the cellular

compartments where the phosphorylation site is identified as membrane, cytosol, or

nucleus. At each time point, if the change in the phosphorylation status of a peptide is

significantly altered compared to the uninfected cells (p-value < 0.05) and the variance

across biological replicates are small (variance < 15%) then that peptide is selected for

the next step of the analysis, so added to the dataset. Then, each peptide selected, has

been mapped to their HUGO Gene Nomenclature Committee (HGNC) symbols using the

Database for Annotation, Visualization and Integrated Discovery (DAVID) web server

(Huang da, Sherman, & Lempicki, 2009). If multiple peptides map to a single protein, then

the peptide with the maximum value of fold change for phosphorylation level is included

for further analysis. Since the overlap between the three cellular compartments within the

same time points were very small, we merged the phosphoproteins at the same time

point but in different compartments.

Besides the global phoshoproteomic data, the human protein interactome is used for the

data integration and modeling. The interaction data from iRefWeb has been downloaded

which has 113,248 weighted interactions between 15,684 proteins (Turner et al., 2010).

Also, a Salmonella effector to human host protein interactome, which consists of 40

effectors and 50 host proteins connected with 62 interactions, has been retrieved

(Schleker et al., 2012).

3.2 Network Modeling

The network modeling procedure is composed of two stages; (i) network construction

using the Prize-collecting Steiner Forest (PCSF) approach, and (ii) network

Page 28: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

14

reconstruction using the Integer Linear Programming (ILP) based edge inference

approach. These two approaches complement each other as the PCSF approach reveals

the hidden components in signaling by finding the high confidence regions in the human

interactome, and the ILP-based edge inference approach reconstructs interactions and

their directionality by using temporal data as constraints. First, the phosphopeptides with

p-value < 0.05 and variance <15% in the downloaded experimental dataset were mapped

to their HUGO symbols to obtain fold changes for distinct set of phosphoproteins. When

there are multiple fold changes for the same phosphoprotein, then the maximum fold

change was assigned to it. Since we saw that there is a very few overlap within different

cell compartments (nucleus, cytoplasm, membrane) for phosphoproteins in the

experimental dataset, the phosphoproteins from these compartments are merged. Next,

the phosphoproteins were split into fold changes datasets (containing the phosphoprotein

and its corresponding fold change at a time point) for the four time points (2, 5, 10 and

20 min). Then, these four fold changes datasets were separately used in the PCSF

approach along with a human interactome to discover hidden intermediate proteins (i.e.,

the Steiner nodes). The PCSF approach outputted a network for each time point that

includes the phosphoproteins from the experimental dataset, and hidden intermediate

proteins from the human interactome. Next, we converted the PCSF networks for the four

time points into a binary matrix which represents the presence (or activity) of each

phosphoprotein or hidden intermediate protein (in rows) at each time point (in columns;

1 shows that the protein is present at that time point, and 0 shows the protein is not

captured by the experiment or is not found as a hidden intermediate protein by PCSF).

This binary matrix and the same human interactome used in PCSF step are then used in

the ILP-based edge inference approach to obtain a single merged network whose

directionality is determined by using temporality information in the binary matrix. In Figure

4, the flowchart of our integrated approach is given.

Page 29: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

15

Figure 4. The flowchart of complete analysis. The dataset which includes temporal

fold changes of phosphopeptides at four different time points (t1 = 2 min, t2 = 5 min, t3 =

10 min, t4 = 20 min) and at three different locations (nucleus, cytoplasm, and membrane)

was split and converted into temporal fold changes datasets of the corresponding

phosphoproteins by taking the maximum fold change among phosphopeptides that were

observed at different locations and mapped to the same phosphoprotein. Next, we

applied PCSF approach for each fold changes dataset by integrating human interactome

in order to discover hidden intermediate proteins. The resulting networks (F1, F2, F3, F4)

are then used to form a binary matrix where the rows are time points and columns are

phosphoproteins. Each corresponding cell of the binary matrix represents a significant

change (p-value < 0.05 and variance <15%) in the phosphoprotein at the time point.

Finally, we applied an ILP-based edge inference approach by integrating human

interactome in order to validate and determine edges and edge directions.

3.2.1 Prize-collecting Steiner Forest Approach

PCSF is based on finding the high-confidence regions within a protein interactome, which

is used to recover the phosphoproteomic hits (i.e., the terminal nodes) and hidden

proteins from the global temporal phoshoproteomic data (i.e., the Steiner nodes) in this

Page 30: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

16

study (Tuncbag et al., 2013). In the optimization stage, two objectives are important;

avoiding the low-confidence protein-protein interactions in the final network and including

as many phosphoproteomic hits as possible. Each protein identified with a significantly

changed phosphorylation status at a time point is assigned a weight equal to the absolute

value of the log fold change in phosphorylation, which is called the “prize”. The cost of an

interaction is calculated from its confidence score where higher cost implies lower

confidence. The algorithm has to assign a cost for each interaction included in the

network, and pay a penalty for excluding a phosphoproteomic hit equal to its prize. For a

given, directed or undirected network G(V, E, c(e), p(v)) with a node set V and edge set

E, p(v) ≥ 0 assigns a prize to each node v ∈ V and c(e) > 0 assigns a cost to each edge

e ∈ E. The aim is to find a forest F(VF, EF) that minimizes the objective function:

𝒇(𝑭) = ∑ (𝛽 ∙ 𝑝(𝑣)) +𝒗∉𝑭

∑ (𝑐(𝑒))𝒆∈𝑭

+ 𝜔 ∙ 𝜅 (1)

where κ is the number of trees in the forest and β is the scaling factor. Another parameter

that is used at the optimization stage is the depth value (D) which represents the

maximum allowed number of edges from the root node to any terminal node. To convert

the PCSF problem into a Prize-collecting Steiner Tree (PCST) problem we have

introduced an extra root node v0 into the network connected to each terminal node t ∈ T

by an edge (t, v0) with cost ω where T ⊂ V. This optimization problem has been solved

with a message passing algorithm, the msgsteiner tool (Bailly-Bechet et al., 2010). The

forest F is defined as a disjoint collection of trees with all edges pointing to the roots. In

this work, depth is set to 10, ω is in the interval [1, 10] and β is in the interval [1, 10].

Optimum forests obtained by each parameter combination are merged together in order

to consider the suboptimal solutions. Finally, a PCSF is constructed for each time point.

The ω parameter is a scaling factor that controls the number of trees in a way that when

it is high, then we are not allowing high number of trees because it increases the overall

cost of having higher number of trees in Equation 1. Therefore, the lower ω will result in

solutions (or forests) with a small number of trees. The β parameter is also a scaling

factor which controls the trade-off between including more terminals and using less

reliable edges. As β controls the prize of the terminals nodes, with a high β, we can

include more terminals in the solution; however, the algorithm may also include these

terminals with less reliable edges from the human interactome, which is not useful.

Therefore, to obtain a useful solution, an interval for the β parameter should be

investigated and it should be picked where the number of terminal nodes and Steiner

nodes reaches the plateau and the number of terminal nodes has its maximum value

among other solutions. If the β parameter is given as zero, then the optimal solution can

be an empty forest.

Page 31: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

17

3.2.2 Integer Linear Programming based Edge Inference Approach

For constructing signaling networks using temporal phosphorylation data, we have used

an extended version of a previous ILP model that can handle both the temporal and

steady state perturbation data (Eren Ozsoy & Can, 2013). The extended ILP-based edge

inference approach proposed in (Eren Ozsoy et al., 2015) is highly scalable when

temporal data is available; therefore, in this study, we directly apply that method for the

construction of the whole Salmonella infection signaling network using a single integer

linear program. The details of the ILP model are given below.

A reference signaling network can be curated from literature or obtained from a public

database assuming that it is given as a directed graph G(V, E), where V represents the

node set (i.e., proteins) and E represents the edge set (i.e., protein-protein interactions),

with several source nodes si, and sink nodes tj. The steady state knock-down version of

this problem has been shown to be NP-complete (nondeterministic polynomial time

complete) (Hashemikhabir, Ayaz, Kavurucu, Can, & Kahveci, 2012), which means that it

is unlikely to find a solution to this problem which is fast and efficient. When the same

problem is formulated as a linear optimization problem, the solution of this optimization

problem provides a network, satisfying the experimental observations with minimum

number of changes (insertion or deletion) of the edges on the reference network. The raw

temporal phosphorylation data is assumed to be processed, and the binary activity data

is available for the proteins in the network. We have used the cutoffs p-value < 0.05 and

variance < 15 as described in the Section 3.1 to generate the binary activity data. Steiner

nodes are also assumed to be active based on their presence in the reconstructed PCSF

networks.

As the objective function of the model is to minimize the edit operations, i.e.,

insertions/deletions of edges, on the reference network, the proposed model also works

when there is no reference network available. For such cases, the smallest network

satisfying the phosphorylation data is sought. Let xij be the binary variable representing

the presence of the edge from node i to node j in the reference network. If the edge is

present, then the value of xij is 1, otherwise it is 0. Correspondingly, wij represents the

presence of the edge from node i to node j in the network to be reconstructed from

observations. For a graph G(V, E) with n nodes, the objective function is given in

Equation 2, which basically quantifies the difference between the reference network and

the reconstructed network.

In the solution phase, the binary matrix of state variables is used for the construction of

the linear constraints. A protein is assumed to be activated once the corresponding state

variable becomes 1. For the model, it does not matter what value the state variable is

Page 32: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

18

assigned thereafter. For the construction of the constraints, the kinematics of the system

is taken into consideration. A protein is assumed to be activated by any protein, which is

already activated at any previous time point and also a protein is able to activate any

protein at any of the following time points. The constraints are based on the following

assumptions:

1. Sources are the proteins which are activated at the first time point and sinks are

the proteins which are activated at the last time point.

2. Each source node and sink node has to be connected to the network.

3. At each time point, the proteins may only be activated by the upstream proteins,

which are active in preceding time points.

4. No direct edges from sources to sinks are allowed.

5. No edges between sources or sinks are allowed.

6. No self-edges are allowed.

Note that these assumptions do not allow an upstream edge. However, there may be

such edges in the reference network which are to be removed in the reconstructed

network. The algorithm may allow feedback loops if the temporal data and the human

interactome allow. For example, there can be a feedback loop with three nodes A (active

at t1 and t2), B (active at t2), and C (active at t2 and t3) and the edges from A to B, from B

to C and from C to A. Based on the above assumptions, the following graphical

constraints are derived.

1. There should be at least one edge going out of each source protein to the proteins

activated at the second time point.

2. There should be at least one edge going into each sink proteins from the proteins

activated at the last time point.

3. There should be at least one edge going into an intermediate node from the

upstream nodes activated at a previous time point.

4. There should be at least one edge going out of an intermediate node to one of

the downstream nodes, including the sink nodes.

Note that these constraints are derived only from the temporal phosphorylation data. It is

also possible to add additional constraints, if any perturbation experiment is available for

Page 33: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

19

the network. Let the set Vi be the set of proteins active at time point i. Let Vs be the set

of source nodes and Vt be the set of target (i.e. sink) nodes. The node set of the

reconstructed network V is the union of all source nodes, target nodes, and all the nodes

active at some time point. Let Vp be the set of nodes activated just before the sink nodes.

Let Vd be the set of downstream nodes that are activated after the activation of node i.

The overall Integer Linear Program is then given as:

Minimize ∑ ∑ |𝑥𝑖𝑗 − 𝑤𝑖𝑗|𝑛𝑗=1

𝑛𝑖=1 (2)

Subject to

∑ 𝑥𝑖𝑗 ≥ 1 for all 𝑖 ∈ 𝑉𝑠𝑗∈𝑉1 (3)

∑ 𝑥𝑖𝑗 ≥ 1 for all 𝑗 ∈ 𝑉𝑡𝑖∈𝑉𝑝

(4)

∑ 𝑥𝑖𝑗 ≥ 1 for all 𝑖 ∈ 𝑉𝑥𝑗∈𝑉𝑑

(5)

∑ 𝑥𝑖𝑗 ≥ 1 for all 𝑗 ∈ 𝑉𝑠+1𝑖∈(𝑉\𝑉𝑠) (6)

3.3 Assessment of the Improvement

In the first step, we only used the temporal phosphoproteomic data to reconstruct the

signaling networks without the PCSF algorithm. The fold changes datasets for each time

point are merged together into a binary matrix showing the presence of nodes at each

time point in order to determine edge directions and reconstruct the final network by using

the ILP-based edge inference approach. The human protein interactome described in 2.3

is assigned as the reference network for the ILP-based edge inference analysis. In the

second step, the PCSF and ILP-based edge inference approaches are combined to

provide the hidden intermediate nodes (from the same human protein interactome) based

on the proteins identified at different time points from the experimental data, in addition

to the direction information for the edges. Since the final network from the first step had

very poor connectivity compared to the final network from the second step, the resulting

directed network from the second step is then used for further analyses and visualization.

3.4 Network Analysis and Clustering

Critical nodes in the network were ranked by calculating four attributes: the path

frequency, in-degree, out-degree, the sum of in-degree and out-degree, and the

betweenness centrality. A simple path is an ordered sequence of nodes in a graph such

that each node occurs at most once in this sequence and each pair of consecutive nodes

is connected by an edge (Pavlopoulos et al., 2011). Given all the possible simple paths

between the terminal nodes at 2 minutes and the terminal nodes at 20 minutes, the path

Page 34: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

20

frequency of a node p is defined as the ratio of the simple paths that include p over all

simple paths. The network analysis has been performed by using the Python NetworkX

package (Schult & Swart, 2008). The network clustering is done using the Louvain

method for community detection, which computes the clusters with the highest modularity

without requiring a cluster size input (Vincent et al., 2008). Modularity is defined as a

metric that evaluates how much densely the nodes are connected to within a cluster as

compared to they would be in a random network (Newman & Girvan, 2004). We used a

Python implementation for the Louvain method (Aynaud, 2009). The network was

clustered into 25 partitions by the Louvain method and these partitions are used in

functional enrichment analysis. Python version 2.7.9 has been used in these analyses

(Python Software Foundation, 2016).

3.5 Functional Enrichment Analysis

After clustering the network using the Louvain method, functional enrichment of each

cluster has been performed using RDAVIDWebService package which is an R interface

to DAVID’s web server (Fresno & Fernandez, 2013; Jiao et al., 2012). We used the

following categories for functional enrichment analysis: Gene Ontology (GO) biological

process, GO cellular component, GO molecular function, KEGG pathways and Reactome

pathways. Then, we have collected the functional enrichment results for each cluster and

generated a matrix where rows are functional annotation terms and columns are

corresponding p-values for each cluster for all the data with adjusted p-value < 0.05. We

also took the negative logarithm of the p-values with base 10 for better understanding

and visualization during the matrix generation. Finally, the results were visualized as

heatmaps using pheatmap R package (Kolde, 2012). R version 3.2.2 has been used in

these analyses (R Development Core Team, 2016).

3.6 Structural Analysis of the Nodes in the Reconstructed Network

We also analyzed the molecular structure of certain nodes that appear to be important

based on our network analysis in order to discover binding preferences of the important

interacting nodes. We searched for these nodes on Interactome3D website for structural

molecular-based interactions (Mosca et al., 2013). Next, we downloaded the available

first neighbor structural interactions (only rank 1.0) of these nodes from Interactome3D in

Protein Data Bank (PDB) format. We only downloaded the ones that our integrated

approach has found. Then, we did structural alignment for each first neighbor to the

corresponding node by considering the node as reference using UCSF Chimera

(Pettersen et al., 2004). Finally, we exported the aligned structural interactions as images

in UCSF Chimera.

Page 35: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

21

3.7 Network Visualization

We used Cytoscape desktop version to generate the image files for the final

reconstructed network and several subnetworks from the final network for this study

(Shannon et al., 2003). Since the analysis included different types of nodes at different

time points, we used a certain style for all network images throughout the study. The

nodes were drawn as circles. The phosphoproteomic hits were shown with a single black

border, while the Steiner nodes also had a thick pink border. The nodes which are known

to be targeted by Salmonella effectors given in a Salmonella-human interactome are

labeled in purple, whereas the other host nodes were labeled in black (Schleker et al.,

2012). The fill color of node is used to represent the time points when the node is active.

If a node was active at multiple time points, the combination of multiple colors like a pie

chart was used for visualization. The activity of nodes at 2, 5, 10 and 20 min was indicated

in yellow, green, blue and red, respectively. The size of the circle was also set accordingly

based on the degree of the node. We also generated an interactive online visualization

of the final reconstructed network using Cytoscape Web (Lopes et al., 2010).

Page 36: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

22

Page 37: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

23

CHAPTER 4

4 RESULTS AND DISCUSION

4.1 Reconstruction of Temporal Signaling Networks in Salmonella-infected Human Cells

Modeling signaling networks is a more challenging task when the time dimension is taken

into account. In a given temporal omic data, one of the problems to be solved is how the

omic hits are connected and what are the upstream and downstream regulators in the

final signaling network. The Integer Linear Programming (ILP) based edge inference

approach uses the temporal information as constraints and reconstructs edges and their

direction between the omic hits towards solving this problem. But ILP-based edge

inference approach cannot identify the missing components in the omic hits for the

representation of the whole signaling system. The Prize-collecting Steiner Forest (PCSF)

approach solves this problem by searching for the most confident region of the

interactome that will include most of the experimental hits and adds hidden components,

called Steiner nodes. As a limitation, if the initial interactome is undirected, the PCSF

approach cannot assign directions to the edges and cannot use the temporal constraints,

which can be solved by the ILP-based edge inference approach. As the different aspects

of the ILP-based edge inference and PCSF network modeling approaches are

complementing each other, we have combined these two approaches to reconstruct the

temporal signaling networks in Salmonella-infected human cells.

In this work, we have used a temporal phosphoproteomic dataset of Salmonella-infected

human cells (Rogers et al., 2011). This dataset has been first divided into three parts at

each time point based on the cellular compartment (membrane, cytosol, and nucleus).

Additionally, the overlaps of the significantly phosphorylated proteins across different

time points and different cellular compartments are compared. We have noted that the

overlap between compartments within the same time points were very small, so the

phosphoproteins at the same time point but in different compartments are safely merged

(See Table 1). Next, for each time point, we have prepared a set of significantly

phosphorylated proteins along with their fold changes during phosphorylation. To show

Page 38: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

24

the importance of revealing hidden components that were not present in the

phosphoproteomic hit set, we first ran only the ILP-based edge inference approach on

the data. The result was a disconnected network with many small subnetworks composed

of three or four nodes which were not a representation of any pathway. The network

includes 451 nodes and 377 edges (See Figure 5).

Page 39: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

25

Fig

ure

5:

Th

e r

eco

nstr

uc

ted

netw

ork

usin

g o

nly

IL

P-b

ase

d e

dg

e i

nfe

ren

ce

wit

ho

ut

usin

g P

CS

F.

The

netw

ork

is d

isconn

ecte

d w

ith

45

1 n

ode

s a

nd

37

7 e

dg

es.

Page 40: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

26

Figure 5 clearly showed that experimental hits alone are not enough to represent the

complete network as there are missing components between these nodes. At this point,

we took advantage of the PCSF approach in revealing hidden nodes and prepared a

reference network to be used in the ILP-based edge inference approach. For this

purpose, multiple PCSFs have been constructed for each time point and merged together

to form a single network. To pipe this output into the ILP-based edge inference approach,

we have prepared a network matrix where rows are proteins, columns are time points.

When a node is present at a time point either as a phosphoproteomic hit or as a hidden

node revealed by the PCSF approach, it is labeled as 1, otherwise it is 0. The ILP-based

edge inference approach reconstructed a network based on the provided matrix as a

reference network. The final network was composed of 658 nodes and 869 edges. An

interactive visualization and related source information of the final network which is

developed using Cytoscape Web is available at

http://mistral.ii.metu.edu.tr/salmonella/salmonella_main.html (See Figure 6) (Lopes et

al., 2010). As shown in Figure 7, the resulting network keeps the temporal information

and also reveals hidden proteins and directions of the interactions. 547 out of 869

interactions were present in the reference human interactome. Remaining 322

interactions were novel, predicted interactions.

Figure 6: Snapshot of the interactive online visualization of the reconstructed

network

Page 41: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

27

Table 1: Representing overlapping proteins in different time points and different

locations. 2, 5, 10, and 20 are time points in minutes. M; membrane, C: cytoplasm, N: nucleus

2 5 10 20

M C N M C N M C N M C N

2 M 9 0 0 1 0 0 0 0 0 1 0 0

C 0 10 0 0 1 0 0 0 0 0 2 0

N 0 0 23 0 0 2 0 0 2 0 0 3

5 M 1 0 0 15 0 5 4 0 2 0 0 2

C 0 1 0 0 17 0 0 2 0 0 4 0

N 0 0 2 5 0 38 3 2 10 1 0 4

10 M 0 0 0 4 0 3 17 1 3 1 1 5

C 0 0 0 0 2 2 1 24 3 2 4 0

N 0 0 2 2 0 10 3 3 34 2 2 6

20 M 1 0 0 0 0 1 1 2 2 9 2 1

C 0 2 0 0 4 0 1 4 2 2 30 2

N 0 0 3 2 0 4 5 0 6 1 2 42

4.2 Salmonella-human Interactome and Functional Enrichment of the Reconstructed Network

To check if this reconstructed network is specific to Salmonella infection, we have

searched for the known targets of Salmonella effectors in the network. In this step, we

have used the interactome that has been curated and compiled from published studies

where 40 effectors interact with 50 human proteins through 62 interactions (Schleker et

al., 2012). We found that 13 proteins out of 50 were present in the reconstructed network,

which are known to be the targets of Salmonella effectors. The enrichment of Salmonella

effector targets in the reconstructed network is statistically significant when compared to

the overall human interactome (p-value = 8.2 x 10-8, by hypergeometric test) which

implies the specificity of the reconstructed network to Salmonella infection. In Table 2,

targets of Salmonella effectors present in the reconstructed network are listed with their

functions and whether they are phoshoproteomic hits or found by our approach as

intermediates. Additionally, we have checked the Gene Ontology and KEGG pathway

enrichments for the overall reconstructed network. Regulation of transcription (p-value =

2.0 x 10-3), apoptosis (p-value = 3.9 x 10-10), intracellular transport (p-value = 1.0 x 10-8),

cell cycle (p-value = 2.1 x 10-10), cytoskeleton organization (p-value = 2.3 x 10-17) are

some of the processes enriched in the reconstructed network.

Page 42: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

28

Fig

ure

7.

Vis

ua

l re

pre

sen

tati

on

of

the r

eco

nstr

uc

ted

sig

nali

ng

netw

ork

of

Salm

on

ella

-in

fec

ted

ho

st

ce

ll. T

he

reconstr

ucte

d n

etw

ork

is c

luste

red (

Clu

ste

r #0 is a

t th

e le

ft-t

op

and

Clu

ste

r #24

is a

t th

e r

ight-

bott

om

) and

th

e la

yo

ut

of

the

ne

two

rk h

as b

ee

n a

rra

ng

ed

acco

rdin

gly

fo

r vis

ua

liza

tion

. T

he

sty

le o

f th

e n

etw

ork

is g

ive

n in 3

.7.

Page 43: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

29

More specifically, SNARE interaction in vesicular transport (p-value = 1.0 x 10-6), mTOR

signaling (p-value = 1.5 x 10-4), and MAPK signaling (p-value = 9.9 x 10-6) pathways are

among the enriched pathways. In several studies, Salmonella infection was shown to

down-regulate mTOR pathway to induce apoptosis (Lee et al., 2014). The Salmonella

effector AvrA targets MAPK signaling, mTOR signaling, and NF-κB pathways to

manipulate the processes in the host cell (Liu, Lu, Xia, Wu, & Sun, 2010).

Table 2. Targets of Salmonella effectors in the reconstructed network. *Intermediate, Steiner node; pp-hit, Phosphoproteomic hit.

Host protein Function Pathogen effectors Type of the node in the network*

CDC42 Actin filament bundle assembly SopB, SopE intermediate

MTOR Protein serine/threonine kinase activity

AvrA intermediate

RHOA Actin cytoskeleton organization SifA, SseJ intermediate

YWHAG Negative regulation of protein serine/threonine kinase activity

SspH2 intermediate

TP53 Apoptotic activity AvrA intermediate

TLN1 Structural constituent of cytoskeleton

SseL intermediate

KLC1 Microtubule motor activity PipB2 pp-hit

FLNA Actin cytoskeleton reorganization

SseI, SrfH, SspH2 pp-hit

CTNNB1 Cytoskeletal anchoring at plasma membrane

AvrA pp-hit

VIM Intermediate filament organization

SptP pp-hit

IQGAP1 Ras GTPase activator activity SseI, SrfH pp-hit

JUP Cytoskeletal anchoring at plasma membrane

SseF pp-hit

KRT18 Intermediate filament cytoskeleton organization

SipC or SspC pp-hit

MAPK1 Activation of MAPK activity AvrA, SpvC pp-hit

OSBPL9, OSBPL10, OSBPL11

Lipid transport SseL pp-hit (except OSBPL11)

STX1A, STX3, STX4, STX5, STX7, STX12, STXBP5

Intracellular protein transport intermediate (except STX7, STX12)

4.3 Functional Enrichment of the Clusters in the Reconstructed Network

The reconstructed network is divided into 25 clusters using the Louvain method of

community detection. With an adjusted p-value (Benjamini-Hochberg p-value correction)

cutoff of 0.05 from DAVID functional enrichment results, half of the clusters showed

Page 44: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

30

significant GO terms, KEGG and Reactome pathways. A few clusters had very small

number of genes which resulted in very poor functional enrichment and several clusters

did not have any enrichment under adjusted p-value cutoff 0.05.

We did GO biological process enrichment analysis to understand the processes where

the phosphoproteins from different clusters take part in. GO biological process results

show Cluster 11 is enriched in processes such as organelle organization (adjusted p-

value = 1.0 x 10-4), actin cytoskeleton organization (adjusted p-value = 7.0 x 10-5),

cytoskeleton organization (adjusted p-value = 4.2 x 10-5) and actin filament-based

process. Cluster 10 and Cluster 5 are enriched in regulation of transcription (adjusted p-

value = 3.6 x 10-2 and 2.5 x 10-2, respectively). Cluster 5 is also enriched in interspecies

interaction between organisms (adjusted p-value = 2.0 x 10-5). Cluster 8 is enriched in

organelle organization (adjusted p-value = 3.6 x 10-2). Cluster 4 is enriched in membrane

organization (adjusted p-value = 1.9 x 10-2) and vesicle-mediated transport (adjusted p-

value = 1.9 x 10-2). GO biological process enrichment results are given in Figure 8.

Page 45: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

31

Figure 8: Enriched Gene Ontology biological process terms for clusters in the

reconstructed network. Some of the related terms have been zoomed in on the right

of the figure. Adjusted p-value cutoff was 0.05 for the enrichment results and the

adjusted p-values were transformed into negative logarithm base 10.

We have also checked GO cellular component enrichment to understand where the

phosphoproteins from different clusters in the network are localized. GO cellular

component results revealed that Cluster 8 is enriched in organelle part (adjusted p-value

= 8.2 x 10-6), intracellular organelle part (adjusted p-value = 7.9 x 10-6), macromolecular

Page 46: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

32

complex (adjusted p-value = 3.0 x 10-6), cytoskeletal part (adjusted p-value = 1.0 x 10-7),

actin cytoskeleton (adjusted p-value = 7.0 x 10-4), and microtubule cytoskeleton (adjusted

p-value = 8.0 x 10-3). Cluster 1 is enriched in non-membrane-bounded organelle (adjusted

p-value = 7.4 x 10-5) and intracellular non-membrane-bounded organelle (adjusted p-

value = 7.4 x 10-5). Cluster 5 is enriched in nucleus (adjusted p-value = 4.6 x 10-7). Cluster

21 is also enriched in nucleus (adjusted p-value = 4.7 x 10-5) and nuclear part (adjusted

p-value = 7.4 x 10-5). GO cellular component enrichment results are given in Figure 9.

Figure 9: Enriched Gene Ontology cellular component terms for clusters in the

reconstructed network. Some of the related terms have been zoomed in on the right

of the figure. Adjusted p-value cutoff was 0.05 for the enrichment results and the

adjusted p-values were transformed into negative logarithm base 10.

Page 47: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

33

We also did an enrichment analysis for GO molecular function to understand the functions

of the phosphoproteins from different clusters in the network. GO molecular function

enrichment results revealed that Cluster 11 is enriched in actin binding (adjusted p-value

= 1.0 x 10-4) and cytoskeletal protein binding (adjusted p-value = 5.8 x 10-6). Cluster 5 is

enriched in transcription factor binding (adjusted p-value = 4.0 x 10-3), NF-kappaB binding

(adjusted p-value= 5.0 x 10-2), and nucleic acid binding (adjusted p-value = 7.0 x 10-3)

according to the GO molecular function enrichment results (See Figure 10).

Figure 10: Enriched Gene Ontology molecular function terms for clusters in the

reconstructed network. Adjusted p-value cutoff was 0.05 for the enrichment results

and the adjusted p-values were transformed into negative logarithm base 10.

Page 48: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

34

The KEGG and Reactome pathways functional enrichment results also provide insights

about the clusters in the reconstructed network. KEGG pathway enrichment showed that

Cluster 8 is enriched in mTOR signaling pathway (adjusted p-value = 1.0 x 10-2) and

Cluster 4 is enriched in SNARE interactions in vesicular transport (adjusted p-value = 4.0

x 10-2) (See Figure 11A). Also, Cluster 11 is enriched in regulation of actin cytoskeleton

(adjusted p-value = 4.0 x 10-2). Reactome pathway enrichment results had fewer

pathways which include signaling by Rho GTPases (adjusted p-value = 3.0 x 10-4) for

Cluster 11 and apoptosis (adjusted p-value = 3.2 x 10-2) for Cluster 1 (See Figure 11B).

Figure 11: Enriched KEGG and Reactome pathways for clusters in the

reconstructed network. (A) KEGG pathway enrichment results for clusters. (B)

Reactome pathway enrichment results for clusters. Adjusted p-value cutoff was 0.05 for

both enrichment results and the adjusted p-values were transformed into negative

logarithm base 10.

4.4 Target Selection Approach the Reconstructed Network

With the help of network analysis techniques, we were able to rank the nodes present

based on their importance in the reconstructed network. The most central nodes (based

on different measures) are listed in Table 3 where 10 proteins that have known functions

in apoptosis are observed. 14-3-3ζ (YWHAZ) and p53 (TP53) are the most frequently

observed proteins on the simple paths passing through the hits from 2 minutes to 20

minutes. MAPK1, CDC42, MAP3K3, and 14-3-3δ (YWHAG) behave like signal

transducers where the number of incoming edges are very low compared to other nodes;

while MAPK1 also behaves like a signal receiver when the number of edges of each node

were compared. These proteins are critical for the structure of the network; in other words,

these proteins have the potential to reveal clinically important targets.

Page 49: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

35

Table 3. Top ranking proteins in the reconstructed network. * Steiner node, **

Salmonella–human interactome (SHI) node, *** Both Steiner node and SHI node.

Name Function Localization Path freq.

In degree

Out degree

Betw. centrality

ACTG1* structural constituent of cytoskeleton

actin cytoskeleton

7.3 2 5 0.0017

AIFM1* apoptotic process mitochondrion 6.8 1 1 0.0017

APBB2 extracellular matrix organization

lamellipodium 0.0 1 24 0.0001

ATXN1* DNA binding cytoplasm, nucleus

14.8 8 5 0.0021

CDC42*** actin cytoskeleton organization

cytoskeleton 4.2 1 13 0.0002

CFL1 actin cytoskeleton organization

cytoskeleton 24.3 8 1 0.001

CIC DNA binding nucleus 9.5 5 1 0.0008

CRTC3 cAMP response element binding protein binding

cytoplasm, nucleus

12.7 1 0 0.0

CTNNB1* alpha-catenin binding cytoskeleton 0.0 2 9 0.0008

EIF2AK2* protein serine/threonine kinase activity

cytoplasm, nucleus

1.6 6 2 0.001

EIF3G translation initiation factor activity

cytoplasm, nucleus

6.8 2 2 0.002

EIF4G1 translation factor activity, nucleic acid binding

cytosol, membrane

37.6 8 6 0.004

HSPB1 cellular component movement cytoskeleton 22.0 3 2 0.0012

MAP3K3* activation of MAPKK activity cytosol 0.0 0 9 0.0

MAPK1** MAP kinase activity cytoskeleton 11.1 12 1 0.0015

MPP6* maturation of 5.8S rRNA membrane 9.8 2 2 0.002

MPZL1 cell-cell signaling membrane 24.9 3 1 0.0005

RELA* sequence-specific DNA binding transcription factor activity

nucleus 0.0 3 11 0.0016

SRC* protein tyrosine kinase activity membrane, cytoskeleton

2.1 3 13 0.0006

TOP2A chromatin binding cytoplasm, nucleus

0.0 6 1 0.0006

TP53*** p53 binding cytoplasm, nucleus

32.8 15 14 0.011

UBE2I* SUMO ligase activity cytoplasm, nucleus

7.3 4 8 0.0016

YWHAG*** protein kinase binding cytoplasmic vesicle membrane

0.0 0 13 0.0

YWHAZ* protein kinase binding cytoplasmic vesicle membrane

65.1 34 11 0.0093

ZC3HAV1 cellular response to exogenous dsRNA

cytoplasm, nucleus

24.9 1 0 0.0

Page 50: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

36

4.5 CDC42 is a clinically important target in Salmonella infection

Two effectors of Salmonella, SopE and SopB, stimulate CDC42 (cell division control

protein 42) protein which induces rearrangements in the cytoskeleton. CDC42 were not

present in the set of phosphoproteomic hits; however, it was located in a central part of

the reconstructed network, which is revealed by the PCSF approach. Although CDC42

has a phosphorylation site at tyrosine 64 (Tu, Wu, Wang, & Cerione, 2003), which is not

present in the initial phosphoproteomic data, our approach correctly locates CDC42 in

the final reconstructed network. Based on the total degrees, CDC42 is among the top 10

ranking proteins. In the reconstructed network, CDC42 has only one incoming edge, but

has 13 outgoing edges, which is consistent with the infection mechanism of Salmonella

where stimulation of CDC42 leads to activation of many downstream signaling

components including p21-activated kinases (PAKs) and PBD domain containing

proteins (Galan & Zhou, 2000). PAK4 and PBD domain containing protein CDC42EP1

are among the downstream partners in the reconstructed network. Also, when we

zoomed into the first degree neighbors of CDC42, their downstream components in our

network can be observed (See Figure 12A). Some of these partners are active at 5 min,

or at 10 min, or at other time points. The CDC42 shows a hub-like character in the

reconstructed network. Hub proteins cannot interact with all their partners at the same

time. They either adapt multiple binding sites or use a single binding site repeatedly. This

property of hubs has been well-established for TP53 protein where four binding sites are

repeatedly used to interact with different partners (Tuncbag, Kar, Gursoy, Keskin, &

Nussinov, 2009). We have checked this property in CDC42 to understand its interactions

by searching for the available structural data in Protein Databank (PDB) (Berman et al.,

2000) and Interactome3D (Mosca et al., 2013). We have found six interactions out of 13

in atomic detail (See Figure 12B). Structural data provides information about at which

region two proteins are interacting. Analysis of the binding site for each protein pair has

shown that CDC42 is using the same binding region completely or partially to interact

with its partners and this property is a characteristic of hub proteins. Also, the downstream

partners of CDC42 are active at different time points as illustrated in Figure 12A, which

also shows the mutually exclusive character of the interactions. For example, PAK4 is in

the reconstructed network showing its effect after infection at time points 10 and 20

minutes. PARD6A and ARHGAP32 effect at 10 and 20 minutes, respectively. The partner

proteins are effective at different time points and CDC42 can bind to these proteins in a

mutually exclusive manner. IQGAP1 is another downstream component, which promotes

Salmonella invasion by binding to CDC42 and knockdown of IQGAP1 was shown to be

reducing the invasion (Brown, Bry, Li, & Sacks, 2007).

Page 51: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

37

Figure 12. CDC42 and its interactions in the reconstructed network. (A) The region

where CDC42 and its first neighbors are located in the reconstructed network. The style

of the network is given in 3.7 where CDC42, EIF2AK2, BCR, BAIAP2, ARHGAP32,

PARD6A, and PAK4 are Steiner nodes and others are phosphoproteomic hits. (B)

Structural details of CDC42 interactions where CDC42 uses the same binding site

completely or partially to interact with its partners. Here, PAK6 is a structural homolog

of PAK4; therefore, to show similar binding of PAK4, it is represented with PAK6.

4.6 The reconstructed signaling network revealed many other potential clinical targets

Besides CDC42, some other targets of Salmonella effectors were located correctly in the

reconstructed network although they were not present in the initial phosphoproteomic

data; such as 14-3-3δ, RHOA, TP53, TLN1 (Schleker et al., 2012). Also pathogen targets

such as β-catenin, MAPK1, IQGAP1 are observed in the reconstructed network as

phosphoproteomic hits (Schleker et al., 2012). We also observed additional proteins such

as ACTG1 (actin gamma 1) and STX4 (syntaxin 4) which are not phosphoproteomic hits

and they are not known targets of Salmonella effectors according to the Salmonella-

human interactome; however, they have similar binding pattern like CDC42 interacting

with their first neighbors at the same binding site according to Interactome3D and it has

been shown that a Salmonella effector SpvB interacts with mouse actin gamma (Mosca

et al., 2013; Tezcan-Merdol et al., 2001). Moreover, STX4 is known to form a SNARE

complex with SNAP23 under regulation of VAMP8 that is regulated by a Salmonella

effector SopB by increasing local levels of PI3P (Dai, Zhang, Weimbs, Yaffe, & Zhou,

2007).

ACTG1 (actin gamma 1) is one of the major proteins of muscle and cytoskeleton

Page 52: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

38

(Khaitlina, 2001). Since the Salmonella infection leads to actin-cytoskeleton

reorganization, ACTG1 has an important role during the infection (Finlay, Ruschkowski,

& Dedhar, 1991). ACTG1 is activated by proteins at 5 and 10 minutes and it interacts with

several proteins such as CFL1, CFL2 and DSTN at 10 minutes and in the reconstructed

signaling network (See Figure 13A). Similar to CDC42, the 5 outgoing edges from

ACTG1 suggests that it functions like a signal transducer. Among these neighbors we

found structure-based interactions for 3 of them in top ranked models in Interactome3D,

which are given in Figure 13B where CFL2 and DSTN use the same binding site but

CFL1 uses a different binding site (Mosca et al., 2013).

Figure 13: ACTG1 and its interactions in the reconstructed network . (A) The

region where ACTG1 and its first neighbors are located in the reconstructed network.

The style of the network is given in 3.7 where ACTG1, PRSS23, and RELA are Steiner

nodes and others are phosphoproteomic hits. (B) Structural details of ACTG1

interactions where ACTG1 uses the same binding site completely or partially to interact

with its partners CFL2 and DSTN but it uses a different binding site for CFL1.

STX4 (syntaxin 4) is a Syntaxin family protein that takes part in vesicle trafficking and it

has been shown to be required for GLUT4 (glucose transporter type 4) translocation

(Yang et al., 2001). STX4 is activated by proteins at 2, 5 and 10 minutes and it interacts

with several proteins such as VAMP2, VAMP4 and SNAP23 at 5 and 10 minutes in the

reconstructed signaling network (See Figure 14A). From these interactions, 3 of them

had structural information in Interactome3D which are given in Figure 14B. Although we

do not see many outgoing edges from STX4 in the network as in CDC42 and ACTG1, it

Page 53: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

39

binds at the same position to its first neighbors according to the top ranked models in

Interactome3D (See Figure 14B) (Mosca et al., 2013).

Figure 14: STX4 and its interactions in the reconstructed network. (A) The region

where STX4 and its first neighbors are located in the reconstructed network. The style

of the network is given in 3.7 where STX4, SNAP23, VAMP2 and VAMP4 are Steiner

nodes and others are phosphoproteomic hits. (B) Structural details of STX4 interactions

where STX4 uses the same binding site completely or partially to interact with its

partners.

In addition to these targets, mTOR pathway was found to be enriched in the reconstructed

network. mTOR signaling was known to be altered after Salmonella infection and mTOR

is a phosphoproteomic hit having significant effect at 10 minutes in our data. When we

have investigated the neighbors of the mTOR protein (See Figure 15A) RHEB, a direct

regulator of mTOR (Long, Lin, Ortiz-Vega, Yonezawa, & Avruch, 2005), is observed as

an interactor of mTOR in the reconstructed network. Also, RPTOR binding to mTOR and

EIF4EBP1 was recovered in our network. The mTOR - RHEB complex induces

phosphorylation of EIF4EBP1 (Long et al., 2005). Even though RHEB is an important

player in the activation of EIF4EBP1, it was not observed within the initial

phosphoproteomic hits. The proposed two-step modeling approach was able to locate

RHEB in the final network, completing the missing interactions of the signaling pathway.

According to our network, signaling in the RPTOR-mTOR-Rheb-EIF4EBP1 axis starts at

5 minutes and continues until 10 minutes.

Page 54: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

40

RHOA, is a GTPase that functions in the actin cytoskeleton organization (Hall, 1998). It

was a Steiner node in final network as a target of Salmonella effectors. It has 5 binding

partners in the reconstructed network of where four of them are upstream interactors and

only one is a downstream interactor (See Figure 15B). Among the upstream interactors,

RPS6KA4 is effective at 2 minutes and the remaining ones are active at 5 minutes. The

downstream interactor INPPL1 is effective at 10 minutes. So, the final reconstructed

network suggests that RHOA receives signals from proteins active at minute 2 and minute

5 and transmits these signals until minute 10.

The 14-3-3δ protein (YWHAG) shows a pattern similar to CDC42 where the incoming

interactions are not present, but there are many outgoing interactions from 14-3-3δ which

implies that 14-3-3δ is a signal mediator for the downstream components of the network.

In Figure 15C, first neighbors of 14-3-3δ are illustrated in the network. 14-3-3δ is a

Steiner node, a known target of Salmonella effectors, and it is effective at minutes 2, 5,

and 20. Its 13 outgoing edges suggest a function like a signal transducer, sending signals

to many downstream proteins at different time points.

Also, seven proteins from the Syntaxin family together with STX4 were present in the

reconstructed network and only three of them was a phosphoproteomic hit, others were

Steiner nodes found by our approach (See Figure 15D). Syntaxins function in vesicle

trafficking which is an important process in Salmonella replication and transport.

Salmonella effectors hijack syntaxins by binding them (Ramos-Morales, 2012).

Page 55: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

41

Figure 15. Visualization of the first degree neighbors of selected proteins. (A)

mTOR, (B) RHOA, (C) YWHAG, and (D) Syntaxins in the reconstructed network in

Figure 2. The style of the network is given in 3.7.

Finally, oxysterol-binding proteins (OSBPs) are also present in the reconstructed

network, which are known to be enhancing replication of Salmonella in the host cell by

interacting with the Salmonella effector SseL (Auweter, Yu, Arena, Guttman, & Finlay,

2012). This interaction can lead to the exploitation of OSBP dependent pathways altered

during the Salmonella infection.

Page 56: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

42

Page 57: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

43

CHAPTER 5

5 CONCLUSION

5.1 Conclusions

Improvements in the high-throughput technologies revolutionized the systems biology

era. Instead of comparing lists of genes or proteins, finding the interactions, regulations

and mechanisms within a set of significantly altered proteins or genes have gained

importance. In addition, integration of multiple ‘omic’ hits in a biologically meaningful way

is now crucial to better understand the functional pathways and cellular mechanisms that

are active during a disease or perturbation. For this purpose, several network modeling

approaches have been developed, which successfully reveals clinically valuable targets

and important pathways, especially in several cancer types. Another dimension of omic

data is its temporality, which makes the network modeling process more challenging, as

instead of simply connecting omic hits, the time related constraints have to be considered

during network reconstruction.

In this work, we have provided a proof-of-concept application of an integrative approach,

which benefits from two different network reconstruction methods, namely Prize-

collecting Steiner Forest (PCSF) and Integer Linear Programming (ILP) based edge

inference methods. Although both methods infer networks from experimental omic hits,

they perform better in different parts of the modeling. The former reveals the hidden

components of the signaling, but cannot handle time as a dimension. The latter can

integrate temporal information to reconstruct directed edges, but cannot add missing

signaling components to complete the lacking parts of the signaling. These

complementary aspects of the methods inspired us to combine both approaches to model

the signaling network of human cells after Salmonella infection based on the temporal

phosphoproteomic data. We have selected Salmonella infection, because the signaling

changes in the host cells are still unknown despite the efforts to understand the

communication details between the pathogen and the host. In addition, available

approaches have not been yet applied to model signaling changes in the host organism

during Salmonella infection. In the first stage, we have integrated the temporal

Page 58: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

44

phosphoproteomic data of Salmonella-infected human cells with confidence weighted

protein-protein interactions to reconstruct the signaling pathway for each time point with

the PCSF approach. Then, all the components were labeled with the corresponding time

points based on their presence in the reconstructed networks, and used as the input for

the ILP-based edge inference step, in which directions are assigned to the interactions

based on the temporal data. The final network with 658 proteins and 869 interactions

provided a rich source to analyze the signaling alterations and clinical target identification.

Our approach allowed us to identify host pathways functioning during the Salmonella

infection and to rank the proteins according to their importance for the infection based on

their centrality in the network. The resulting network conserves the information about

temporality, direction of interactions, while revealing the hidden entities in the signaling.

Several pathways such as SNARE binding, mTOR signaling, immune response,

cytoskeleton organization, and apoptosis, were found to be affected, many of which were

previously found to be altered in the host cell after Salmonella infection. Additionally, we

have shown that the reconstructed network is enriched in the protein targets of the

Salmonella effectors. The final network also involves enrichments in the cytoskeletal

organization and the regulation of cellular component size. Clustering of the resulting

network showed that the multiple GO terms (for biological process, cellular component

and molecular function), KEGG and Reactome pathways are enriched in half of the

clusters including actin cytoskeleton organization, interspecies interaction between

organisms and SNARE interactions in vesicular transport. These findings are in parallel

with the known infection mechanism of the Salmonella where the injected effector

proteins trigger the epithelial cell membrane by rearranging the cytoskeleton of the host

cell that results in invasion of the bacterium into the host cell.

Another benefit of the proposed two-step approach (See Figure 4) was that hidden

components of signaling can be revealed with network reconstruction. In this specific

demonstration, several known targets of Salmonella effectors have been accurately

included in the reconstructed network such as CDC42, RHOA, 14-3-3δ, Syntaxins

although they were not present in the initial phosphoproteomic data. These hidden

signaling components can be potential therapeutic targets. Among them CDC42 is a

target of the effector protein SopB and their interaction helps in the adaptation of

Salmonella to the intracellular condition of the host. CDC42 is responsible of downstream

signaling and behaves as a signal transmitter. From a medical point of view, targeting

CDC42 is a good approach both for blocking the adaptation of Salmonella in the host cell

and abnormal downstream signaling during infection (See Figure 12). ACTG1 is a

constituent of cytoskeleton and it has been found that a Salmonella effector SpvB targets

mouse actin gamma (Khaitlina, 2001; Tezcan-Merdol et al., 2001). Therefore, ACTG1

can be a therapeutic target for stopping Salmonella adaptation in the human cell (See

Figure 13). RHOA functions in cytoskeleton organization and it is a target of Salmonella

Page 59: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

45

effectors. The effector SifA activates RHOA during infection. Activated RHOA promotes

opening tubes in the membrane (Srikanth et al., 2010). Therefore, RHOA can be

considered as a therapeutic target and controlling its activation can be a good approach

in Salmonella treatment (See Figure 15B). Also, besides revealing the hidden

components, the reconstructed edge directions nicely showed how the signals are

transmitted temporally from one layer to another. For example, some host proteins

behave like a signal receiver such as MAPK1 and some others behave like a signal

transmitter such as, 14-3-3δ (See Figure 15C).

Understanding these communications and signaling details in the host is crucial to

improve the available treatment strategies for Salmonella infection in the near future,

especially as the new antibiotic-resistance species are on the rise. We believe that the

integrated approaches, such as the one presented here, have a high potential for

understanding the key molecular mechanisms in bacteria’s susceptibility or resistance to

the available antibiotics and for the identification of new clinical targets in infectious

diseases, especially in the Salmonella infection. We also believe that targeting the crucial

proteins in the infection mechanism can provide a better treatment alternative to

antibiotics.

5.2 Future Work

This study has shown that this kind of integrated network modeling approaches can

provide us with a better understanding of disease conditions and important clinical targets

that drive the diseases. In this thesis, we present an application of this approach in

Salmonella infection. To date, temporal data representing different cellular states or

molecular changes are getting accumulated for several diseases including different types

of cancers. We will further improve and apply the PCSF and the ILP-based edge

inference approaches to model temporal signaling networks in human cancers. Toward

this aim, we started modeling the networks in ovarian cancer to evaluate changes in the

phosphoproteome when there is a delay in freezing tissues following sample (tumor)

excision. The results will provide us the temporal ischemic effects in a patient-specific

manner.

Page 60: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

46

Page 61: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

47

REFERENCES

Adamcsek, B., Palla, G., Farkas, I. J., Derenyi, I., & Vicsek, T. (2006). CFinder: locating

cliques and overlapping modules in biological networks. Bioinformatics, 22(8), 1021-1023. doi:10.1093/bioinformatics/btl039

Aijo, T., Granberg, K., & Lahdesmaki, H. (2013). Sorad: a systems biology approach to predict and modulate dynamic signaling pathway response from phosphoproteome time-course measurements. Bioinformatics, 29(10), 1283-1291. doi:10.1093/bioinformatics/btt130

Alcaraz, N., Pauling, J., Batra, R., Barbosa, E., Junge, A., Christensen, A. G., . . . Baumbach, J. (2014). KeyPathwayMiner 4.0: condition-specific pathway analysis by combining multiple omics studies and networks with Cytoscape. BMC Syst Biol, 8, 99. doi:10.1186/s12918-014-0099-x

Auweter, S. D., Yu, H. B., Arena, E. T., Guttman, J. A., & Finlay, B. B. (2012). Oxysterol-binding protein (OSBP) enhances replication of intracellular Salmonella and binds the Salmonella SPI-2 effector SseL via its N-terminus. Microbes Infect, 14(2), 148-154. doi:10.1016/j.micinf.2011.09.003

Aynaud, T. (2009). Louvain Community Detection. Retrieved from https://bitbucket.org/taynaud/python-louvain

Bailly-Bechet, M., Borgs, C., Braunstein, A., Chayes, J., Dagkessamanskaia, A., Francois, J. M., & Zecchina, R. (2010). Finding undetected protein associations in cell signaling by belief propagation. Proc Natl Acad Sci U S A, 108(2), 882-887.

Bender, C., Henjes, F., Frohlich, H., Wiemann, S., Korf, U., & Beissbarth, T. (2010). Dynamic deterministic effects propagation networks: learning signalling pathways from longitudinal protein array data. Bioinformatics, 26(18), i596-602. doi:10.1093/bioinformatics/btq385

Page 62: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

48

Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., . . . Bourne, P. E. (2000). The Protein Data Bank. Nucleic Acids Res, 28(1), 235-242.

Bindea, G., Mlecnik, B., Hackl, H., Charoentong, P., Tosolini, M., Kirilovsky, A., . . . Galon, J. (2009). ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics, 25(8), 1091-1093. doi:10.1093/bioinformatics/btp101

Brown, M. D., Bry, L., Li, Z., & Sacks, D. B. (2007). IQGAP1 regulates Salmonella invasion through interactions with actin, Rac1, and Cdc42. J Biol Chem, 282(41), 30265-30272. doi:10.1074/jbc.M702537200

Collins, M. O., Yu, L., & Choudhary, J. S. (2007). Analysis of protein phosphorylation on a proteome-scale. Proteomics, 7(16), 2751-2768. doi:10.1002/pmic.200700145

Dai, S., Zhang, Y., Weimbs, T., Yaffe, M. B., & Zhou, D. (2007). Bacteria-generated PtdIns(3)P recruits VAMP8 to facilitate phagocytosis. Traffic, 8(10), 1365-1374. doi:10.1111/j.1600-0854.2007.00613.x

Dandekar, T., Astrid, F., Jasmin, P., & Hensel, M. (2012). Salmonella enterica: a surprisingly well-adapted intracellular lifestyle. Front Microbiol, 3, 164. doi:10.3389/fmicb.2012.00164

Danon, L., Diaz-Guilera, A., Duch, J., & Arenas, A. (2005). Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment, 2005(09), P09008.

De Jong, H. (2002). Modeling and simulation of genetic regulatory systems: a literature review. Journal of computational biology, 9(1), 67-103.

Dhal, P. K., Barman, R. K., Saha, S., & Das, S. (2014). Dynamic modularity of host protein interaction networks in Salmonella Typhi infection. PLoS One, 9(8), e104911. doi:10.1371/journal.pone.0104911

Dittrich, M. T., Klau, G. W., Rosenwald, A., Dandekar, T., & Muller, T. (2008). Identifying functional modules in protein-protein interaction networks: an integrated exact approach. Bioinformatics, 24(13), i223-231. doi:10.1093/bioinformatics/btn161

Page 63: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

49

Eckmann, L., Smith, J. R., Housley, M. P., Dwinell, M. B., & Kagnoff, M. F. (2000). Analysis by high density cDNA arrays of altered gene expression in human intestinal epithelial cells in response to infection with the invasive enteric bacteria Salmonella. J Biol Chem, 275(19), 14084-14094.

Eren Ozsoy, O., Aydin Son, Y., & Can, T. (2015). Reconstruction of Signaling and Regulatory Networks using Time-Series and Perturbation Experiments. manuscript under review.

Eren Ozsoy, O., & Can, T. (2013). A Divide and Conquer Approach for Construction of Large-Scale Signaling Networks from PPI and RNAi Data Using Linear Programming. IEEE/ACM Trans Comput Biol Bioinform. doi:FF8B32E0-6B62-469D-AC2B-5D933C2BC2B6

Finlay, B. B., Ruschkowski, S., & Dedhar, S. (1991). Cytoskeletal rearrangements accompanying salmonella entry into epithelial cells. J Cell Sci, 99 ( Pt 2), 283-296.

Franceschini, A., Szklarczyk, D., Frankild, S., Kuhn, M., Simonovic, M., Roth, A., . . . Jensen, L. J. (2013). STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res, 41(Database issue), D808-815. doi:10.1093/nar/gks1094

Fresno, C., & Fernandez, E. A. (2013). RDAVIDWebService: a versatile R interface to DAVID. Bioinformatics, 29(21), 2810-2811. doi:10.1093/bioinformatics/btt487

Frohlich, H., Sahin, O., Arlt, D., Bender, C., & Beissbarth, T. (2009). Deterministic Effects Propagation Networks for reconstructing protein signaling networks from multiple interventions. BMC Bioinformatics, 10, 322. doi:10.1186/1471-2105-10-322

Fukuhara, N., & Kawabata, T. (2008). HOMCOS: a server to predict interacting protein pairs and interacting sites by homology modeling of complex structures. Nucleic Acids Res, 36(Web Server issue), W185-189. doi:10.1093/nar/gkn218

Galan, J. E., & Zhou, D. (2000). Striking a balance: modulation of the actin cytoskeleton by Salmonella. Proc Natl Acad Sci U S A, 97(16), 8754-8761.

Gao, M., & Skolnick, J. (2010). Structural space of protein-protein interfaces is degenerate, close to complete, and highly connected. Proc Natl Acad Sci U S A, 107(52), 22517-22522. doi:10.1073/pnas.1012820107

Page 64: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

50

Gao, Y., Douguet, D., Tovchigrechko, A., & Vakser, I. A. (2007). DOCKGROUND system of databases for protein recognition studies: unbound structures for docking. Proteins, 69(4), 845-851. doi:10.1002/prot.21714

Girvan, M., & Newman, M. E. (2002). Community structure in social and biological networks. Proc Natl Acad Sci U S A, 99(12), 7821-7826. doi:10.1073/pnas.122653799

Gosline, S. J., Spencer, S. J., Ursu, O., & Fraenkel, E. (2012). SAMNet: a network-based approach to integrate multi-dimensional high throughput datasets. Integr Biol (Camb), 4(11), 1415-1427. doi:10.1039/c2ib20072d

Hall, A. (1998). Rho GTPases and the actin cytoskeleton. Science, 279(5350), 509-514.

Hannemann, S., Gao, B., & Galan, J. E. (2013). Salmonella modulation of host cell gene expression promotes its intracellular growth. PLoS Pathog, 9(10), e1003668. doi:10.1371/journal.ppat.1003668

Hashemikhabir, S., Ayaz, E. S., Kavurucu, Y., Can, T., & Kahveci, T. (2012). Large-scale signaling network reconstruction. IEEE/ACM Trans Comput Biol Bioinform, 9(6), 1696-1708. doi:10.1109/TCBB.2012.128

Hecker, M., Lambeck, S., Toepfer, S., Van Someren, E., & Guthke, R. (2009). Gene regulatory network inference: data integration in dynamic models—a review. Biosystems, 96(1), 86-103.

Huang da, W., Sherman, B. T., & Lempicki, R. A. (2009). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc, 4(1), 44-57. doi:10.1038/nprot.2008.211

Huang, S. S., Clarke, D. C., Gosline, S. J. C., Labadorf, A., Chouinard, C. R., Gordon, W., . . . Fraenkel, E. (2012). Linking Proteomic and Transcriptional Data through the Interactome and Epigenome Reveals a Map of Oncogene-induced Signaling. Plos Comp Biol.

Huang, S. S., & Fraenkel, E. (2009). Integrating proteomic, transcriptional, and interactome data reveals hidden components of signaling and regulatory networks. Sci Signal, 2(81), ra40. doi:10.1126/scisignal.2000350

Page 65: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

51

Jiao, X., Sherman, B. T., Huang da, W., Stephens, R., Baseler, M. W., Lane, H. C., & Lempicki, R. A. (2012). DAVID-WS: a stateful web service to facilitate gene/protein list analysis. Bioinformatics, 28(13), 1805-1806. doi:10.1093/bioinformatics/bts251

Johnson, S. A., & Hunter, T. (2004). Phosphoproteomics finds its timing. Nat Biotechnol, 22(9), 1093-1094. doi:10.1038/nbt0904-1093

Keskin, O., Tuncbag, N., & Gursoy, A. (2016). Predicting Protein-Protein Interactions from the Molecular to the Proteome Level. Chem Rev, 116(8), 4884-4909. doi:10.1021/acs.chemrev.5b00683

Khaitlina, S. Y. (2001). Functional specificity of actin isoforms. Int Rev Cytol, 202, 35-98.

Kiani, N. A., & Kaderali, L. (2014). Dynamic probabilistic threshold networks to infer signaling pathways from time-course perturbation data. BMC Bioinformatics, 15, 250. doi:10.1186/1471-2105-15-250

Kim, Y. A., Wuchty, S., & Przytycka, T. M. (2011). Identifying causal genes and dysregulated pathways in complex diseases. PLoS Comput Biol, 7(3), e1001095. doi:10.1371/journal.pcbi.1001095

Knapp, B., & Kaderali, L. (2013). Reconstruction of cellular signal transduction networks using perturbation assays and linear programming. PLoS One, 8(7), e69220.

Kolde, R. (2012). Pheatmap: pretty heatmaps. R package version, 61.

Lawhon, S. D., Khare, S., Rossetti, C. A., Everts, R. E., Galindo, C. L., Luciano, S. A., . . . Adams, L. G. (2011). Role of SPI-1 secreted effectors in acute bovine response to Salmonella enterica Serovar Typhimurium: a systems biology analysis approach. PLoS One, 6(11), e26869. doi:10.1371/journal.pone.0026869

Lee, C. H., Lin, S. T., Liu, J. J., Chang, W. W., Hsieh, J. L., & Wang, W. K. (2014). Salmonella induce autophagy in melanoma by the downregulation of AKT/mTOR pathway. Gene Ther, 21(3), 309-316. doi:10.1038/gt.2013.86

Linde, J., Schulze, S., Henkel, S., & Guthke, R. (2015). Data- and knowledge-based modeling of gene regulatory networks: An update. . EXCLI Journal(14), 346-378.

Page 66: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

52

Liu, X., Lu, R., Xia, Y., Wu, S., & Sun, J. (2010). Eukaryotic signaling pathways targeted by Salmonella effector protein AvrA in intestinal infection in vivo. BMC Microbiol, 10, 326. doi:10.1186/1471-2180-10-326

Long, X., Lin, Y., Ortiz-Vega, S., Yonezawa, K., & Avruch, J. (2005). Rheb binds and regulates the mTOR kinase. Curr Biol, 15(8), 702-713. doi:10.1016/j.cub.2005.02.053

Lopes, C. T., Franz, M., Kazi, F., Donaldson, S. L., Morris, Q., & Bader, G. D. (2010). Cytoscape Web: an interactive web-based network browser. Bioinformatics, 26(18), 2347-2348. doi:10.1093/bioinformatics/btq430

Markowetz, F., Kostka, D., Troyanskaya, O. G., & Spang, R. (2007). Nested effects models for high-dimensional phenotyping screens. Bioinformatics, 23(13), i305-312. doi:10.1093/bioinformatics/btm178

Matos, M. R., Knapp, B., & Kaderali, L. (2015). lpNet: a linear programming approach to reconstruct signal transduction networks. Bioinformatics. doi:10.1093/bioinformatics/btv327

Melas, I. N., Samaga, R., Alexopoulos, L. G., & Klamt, S. (2013). Detecting and removing inconsistencies between experimental data and signaling network topologies using integer linear programming on interaction graphs. PLoS Comput Biol, 9(9), e1003204.

Meyer, M. J., Das, J., Wang, X., & Yu, H. (2013). INstruct: a database of high-quality 3D structurally resolved protein interactome networks. Bioinformatics, 29(12), 1577-1579. doi:10.1093/bioinformatics/btt181

Mosca, R., Ceol, A., & Aloy, P. (2013). Interactome3D: adding structural details to protein networks. Nat Methods, 10(1), 47-53. doi:10.1038/nmeth.2289

Nepomnyachiy, S., Ben-Tal, N., & Kolodny, R. (2015). CyToStruct: Augmenting the Network Visualization of Cytoscape with the Power of Molecular Viewers. Structure, 23(5), 941-948. doi:10.1016/j.str.2015.02.013

Newman, M. E., & Girvan, M. (2004). Finding and evaluating community structure in networks. Phys Rev E Stat Nonlin Soft Matter Phys, 69(2 Pt 2), 026113. doi:10.1103/PhysRevE.69.026113

Page 67: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

53

Palla, G., Derenyi, I., Farkas, I., & Vicsek, T. (2005). Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043), 814-818. doi:10.1038/nature03607

Pavlopoulos, G. A., Secrier, M., Moschopoulos, C. N., Soldatos, T. G., Kossida, S., Aerts, J., . . . Bagos, P. G. (2011). Using graph theory to analyze biological networks. BioData Min, 4, 10. doi:10.1186/1756-0381-4-10

Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C., & Ferrin, T. E. (2004). UCSF Chimera--a visualization system for exploratory research and analysis. J Comput Chem, 25(13), 1605-1612. doi:10.1002/jcc.20084

Python Software Foundation. (2016). Python Language Reference, version 2.7. Retrieved from http://www.python.org

R Development Core Team. (2016). R: A Language and Environment for Statistical Computing. Retrieved from http://www.R-project.org

Raghunathan, A., Reed, J., Shin, S., Palsson, B., & Daefler, S. (2009). Constraint-based analysis of metabolic capacity of Salmonella typhimurium during host-pathogen interaction. BMC Syst Biol, 3, 38. doi:10.1186/1752-0509-3-38

Ramos-Morales, F. (2012). Impact of Salmonella enterica type III secretion system effectors on the eukaryotic host cell. ISRN Cell Biology, 2012.

Rogers, L. D., Brown, N. F., Fang, Y., Pelech, S., & Foster, L. J. (2011). Phosphoproteomic analysis of Salmonella-infected cells identifies key kinase regulators and SopB-dependent host phosphorylation events. Sci Signal, 4(191), rs9. doi:10.1126/scisignal.2001668

Salter‐Townshend, M., White, A., Gollini, I., & Murphy, T. B. (2012). Review of statistical network analysis: models, algorithms, and software. Statistical Analysis and Data Mining, 5(4), 243-264.

Schleker, S., Sun, J., Raghavan, B., Srnec, M., Müller, N., Koepfinger, M., . . . Klein‐Seetharaman, J. (2012). The current Salmonella‐host interactome.

PROTEOMICS-Clinical Applications, 6(1‐2), 117-133.

Page 68: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

54

Schult, D. A., & Swart, P. (2008). Exploring network structure, dynamics, and function using NetworkX. Paper presented at the Proceedings of the 7th Python in Science Conferences (SciPy 2008).

Shannon, P., Markiel, A., Ozier, O., Baliga, N. S., Wang, J. T., Ramage, D., . . . Ideker, T. (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res, 13(11), 2498-2504. doi:10.1101/gr.1239303

Srikanth, C., Wall, D. M., Maldonado-Contreras, A., Shi, H. N., Zhou, D., Demma, Z., . . . McCormick, B. A. (2010). Salmonella pathogenesis and processing of secreted effectors by caspase-3. Science, 330(6002), 390-393.

Steele-Mortimer, O. (2011). Exploitation of the ubiquitin system by invading bacteria. Traffic, 12(2), 162-169. doi:10.1111/j.1600-0854.2010.01137.x

Stolovitzky, G., Monroe, D., & Califano, A. (2007). Dialogue on Reverse‐Engineering Assessment and Methods. Ann N Y Acad Sci, 1115(1), 1-22.

Taylor, R. C., Singhal, M., Weller, J., Khoshnevis, S., Shi, L., & McDermott, J. (2009). A network inference workflow applied to virulence-related processes in Salmonella typhimurium. Ann N Y Acad Sci, 1158, 143-158. doi:10.1111/j.1749-6632.2008.03762.x

Tezcan-Merdol, D., Nyman, T., Lindberg, U., Haag, F., Koch-Nolte, F., & Rhen, M. (2001). Actin is ADP-ribosylated by the Salmonella enterica virulence-associated protein SpvB. Mol Microbiol, 39(3), 606-619.

Tu, S., Wu, W. J., Wang, J., & Cerione, R. A. (2003). Epidermal growth factor-dependent regulation of Cdc42 is mediated by the Src tyrosine kinase. J Biol Chem, 278(49), 49293-49300. doi:10.1074/jbc.M307021200

Tuncbag, N., Braunstein, A., Pagnani, A., Huang, S. S., Chayes, J., Borgs, C., . . . Fraenkel, E. (2013). Simultaneous reconstruction of multiple signaling pathways via the prize-collecting steiner forest problem. J Comput Biol, 20(2), 124-136. doi:10.1089/cmb.2012.0092

Tuncbag, N., Gosline, S. J., Kedaigle, A., Soltis, A. R., Gitter, A., & Fraenkel, E. (2016). Network-Based Interpretation of Diverse High-Throughput Datasets through the

Page 69: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

55

Omics Integrator Software Package. PLoS Comput Biol, 12(4), e1004879. doi:10.1371/journal.pcbi.1004879

Tuncbag, N., Gursoy, A., Nussinov, R., & Keskin, O. (2011). Predicting protein-protein interactions on a proteome scale by matching evolutionary and structural similarities at interfaces using PRISM. Nat Protoc, 6(9), 1341-1354. doi:10.1038/nprot.2011.367

Tuncbag, N., Kar, G., Gursoy, A., Keskin, O., & Nussinov, R. (2009). Towards inferring time dimensionality in protein-protein interaction networks by integrating structures: the p53 example. Mol Biosyst, 5(12), 1770-1778. doi:10.1039/B905661K

Tuncbag, N., Keskin, O., Nussinov, R., & Gursoy, A. (2012). Fast and accurate modeling of protein-protein interactions by combining template-interface-based docking with flexible refinement. Proteins, 80(4), 1239-1249. doi:10.1002/prot.24022

Turner, B., Razick, S., Turinsky, A. L., Vlasblom, J., Crowdy, E. K., Cho, E., . . . Wodak, S. J. (2010). iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence. Database, 2010, baq023.

Vakser, I. A. (2014). Protein-protein docking: from interaction to interactome. Biophys J, 107(8), 1785-1793. doi:10.1016/j.bpj.2014.08.033

Vincent, D. B., Jean-Loup, G., Renaud, L., & Etienne, L. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.

Vreven, T., Hwang, H., Pierce, B. G., & Weng, Z. (2014). Evaluating template-based and template-free protein-protein complex structure prediction. Brief Bioinform, 15(2), 169-176. doi:10.1093/bib/bbt047

Wang, X., Wei, X., Thijssen, B., Das, J., Lipkin, S. M., & Yu, H. (2012). Three-dimensional reconstruction of protein networks provides insight into human genetic disease. Nat Biotechnol, 30(2), 159-164. doi:10.1038/nbt.2106

Yang, C., Coker, K. J., Kim, J. K., Mora, S., Thurmond, D. C., Davis, A. C., . . . Pessin, J. E. (2001). Syntaxin 4 heterozygous knockout mice develop muscle insulin resistance. J Clin Invest, 107(10), 1311-1318. doi:10.1172/JCI12274

Page 70: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

56

Yeger-Lotem, E., Riva, L., Su, L. J., Gitler, A. D., Cashikar, A. G., King, O. D., . . . Fraenkel, E. (2009). Bridging high-throughput genetic and transcriptional data reveals cellular responses to alpha-synuclein toxicity. Nat Genet, 41(3), 316-323. doi:10.1038/ng.337

Yoon, H., McDermott, J. E., Porwollik, S., McClelland, M., & Heffron, F. (2009). Coordinated regulation of virulence during systemic infection of Salmonella enterica serovar Typhimurium. PLoS Pathog, 5(2), e1000306. doi:10.1371/journal.ppat.1000306

Page 71: RECONSTRUCTION OF THE TEMPORAL SIGNALING …etd.lib.metu.edu.tr/upload/12620456/index.pdfhücre iskeleti organizasyonu ve apoptoz yolakları gibi sinyallerdeki gizli fonksiyonları

57

APPENDIX

The abstract of the publication of this study on Frontiers in Microbiology.

This study has been published as, Reconstruction of the temporal signaling network in Salmonella-infected human cells

Gungor Budak, Oyku Eren Ozsoy, Yesim Aydin Son, Tolga Can and Nurcan Tuncbag*

Salmonella enterica is a bacterial pathogen that usually infects its host through food

sources. Translocation of the pathogen proteins into the host cells leads to changes in

the signaling mechanism either by activating or inhibiting the host proteins. Given that the

bacterial infection modifies the response network of the host, a more coherent view of the

underlying biological processes and the signaling networks can be obtained by using a

network modeling approach based on the reverse engineering principles. In this work, we

have used a published temporal phosphoproteomic dataset of Salmonella-infected

human cells and reconstructed the temporal signaling network of the human host by

integrating the interactome and the phosphoproteomic dataset. We have combined two

well-established network modeling frameworks, the Prize-collecting Steiner Forest

(PCSF) approach and the Integer Linear Programming (ILP) based edge inference

approach. The resulting network conserves the information on temporality, direction of

interactions, while revealing hidden entities in the signaling, such as the SNARE binding,

mTOR signaling, immune response, cytoskeleton organization, and apoptosis pathways.

Targets of the Salmonella effectors in the host cells such as CDC42, RHOA, 14-3-3δ,

Syntaxin family, Oxysterol-binding proteins were included in the reconstructed signaling

network although they were not present in the initial phosphoproteomic data. We believe

that integrated approaches, such as the one presented here, have a high potential for the

identification of clinical targets in infectious diseases, especially in the Salmonella

infections.

Budak G, Eren Ozsoy O, Aydin Son Y, Can T and Tuncbag N (2015) Reconstruction of

the temporal signaling network in Salmonella-infected human cells. Front. Microbiol.

6:730. doi: 10.3389/fmicb.2015.00730