iBEReX

Biomedical Entity-Relationship Explorer

TUTORIAL

http://berex.korea.ac.kr

Table of Contents:

CHAPTER 1: INTRODUCTION TO iBEReX
1. iBEReX Overview
1.1. Query iBEReX
1.2. iBEReX Visualization Page
1.3. Top/Bottom Panel
1.4. Visualization Panel
1.4.1. Menu
1.4.2. Save File
1.4.3. Expand Nodes
1.4.4. Add Shortest Paths By Option
1.4.5. Network Layout Option
1.4.6. Help
1.4.7. Visualization Tool
1.4.8. Delete/Expand Options
1.4.9. Search Option
1.5. Entity-Relationship Panel
1.5.1. Detailed Information on Entity (node)
1.5.2. Entity-Relationship Type
1.5.3. Entity-Relationship Tree
1.5.4. Graph Edit
1.5.5. Other Query Options

CHAPTER 2: Context-Specific Subnetwork Discovery (COSSY) Analysis Option
2. COSSY Overview
2.1. COSSY Analysis Option in iBEReX
2.2. File Formats
2.2.1. Gene cluster text format (*.gct)
2.2.2. Categorical class format (*.cls)
2.2.3. Chip format (*.chip) (optional)
2.3. COSSY Results
2.4. Linking Two Genes by Shortest Paths

REFERENCES

CHAPTER 1: INTRODUCTION TO iBEReX

1. iBEReX Overview

iBEReX is a new web-based tool for biomedical knowledge integration, search, and exploration. It integrates eight popular biomedical databases and builds an integrated network by combining the multi-layered biomedical entity-relationships extracted from these databases. When users enter keywords to search the integrated network, iBEReX returns a subnetwork that matches the keywords. iBEReX also allows users to upload their gene expression profiles and automatically discovers the subnetworks that best differentiate the phenotypes in their data. Users can interactively explore the resulting subnetwork.

Compared with other biomedical database exploration tools, iBEReX offers the following advantages: 1) reliable information extracted from widely used databases with proven utility in biomedical research; 2) integrated networks constructed by gathering information from these databases; 3) interactive exploration and visualization of the resulting networks; 4) context-specific subnetwork discovery from high-throughput transcriptomics data; and 5) a user-driven approach to automatically generate plausible and/or unexpected relationships from the derived relevant subnetwork.

The underlying iBEReX database is based on the recently published BEReX (Jeon et al., 2014) and integrates the subnetwork discovery algorithm COSSY (Saha et al., 2014) for high-throughput transcriptomics analysis. iBEReX uses the open-source Cytoscape Web code for its interactive visualization features. We believe that iBEReX will be a useful bioinformatics tool for biologists exploring complex biomedical entity-relationship networks.

iBEReX Webpage: http://berex.korea.ac.kr

This website is free and open to all users; there is no login requirement.

Figure 1.1: Homepage of iBEReX.


1.1. Query iBEReX

Figure 1.2: Two types of query inputs.

Users can query and explore biomedical entity-relationships in iBEReX (Figure 1.2) by entering keywords directly or by uploading a list of keywords to the web server. Here, we use Example 1 to illustrate the functionality of iBEReX. Clicking Example 1 populates the query box with three keywords: BCR, ABL1, and imatinib. Clicking the “submit” button then visualizes the entity-relationships between these keywords.

NOTE: iBEReX accepts any keywords as query inputs (e.g. gene symbols, diseases, pathways, gene ontology terms, drug names, etc.).

1.2. iBEReX Visualization Page

Figure 1.3: The visualization page of iBEReX.

The results page of iBEReX contains four panels: the top panel, the visualization panel, the entity-relationship panel, and the bottom panel.

Top Panel: Clicking the iBEReX logo in the top panel returns the user to the homepage.

Visualization Panel: The visualization panel contains the “Menu,” “Layout,” and “Help” options at the top left corner, the visualization screen in the middle, and the “visualization tool” at the bottom right corner. This is the main screen in which the user interactively navigates and explores the biomedical entity-relationship network.

Entity-Relationship Panel: The entity-relationship panel provides detailed information on the entity, the relationship, and the entity-relationship, as well as the option to edit the network (nodes and edges).

Bottom Panel: The bottom panel presents the query list, which displays the keywords queried by the user, and the color legend for the different biomedical entity types. The user can select an item from the query list to highlight that entity in the visualization panel. The user can also select all nodes of a certain type by clicking on the legend entry for that type.

1.3. Top/Bottom Panel

Figure 1.4: Top panel of the visualization page.

By clicking the iBEReX logo, users can return to the homepage of the iBEReX server.

Figure 1.5: Bottom panel of the visualization page.

The bottom panel contains the query list and the color legend for the nodes in the graph. As illustrated in Figure 1.5, the query list includes ABL1 and BCR (two gene symbols) and imatinib (a drug name), which are the query terms in Example 1. Clicking one of these entities highlights it in the visualization panel. For example, if “ABL1” is selected, the visualization panel shows a gray circle around “ABL1,” indicating that this gene has been selected. The entity-relationship panel also displays information about the selected gene (Figure 1.6).

To apply an operation to all nodes of the same type at once, users can click on the legend entry for that type and then perform the operation. For example, suppose that a user wants to change the color of all disease nodes in the current graph from the default color (white) to red. The user can click on the “disease” legend, which highlights all disease nodes in the graph, and then change the node color in the “Graph Edit” tab in the entity-relationship panel on the right.

Figure 1.6: ABL1 is selected and a gray circle highlights “ABL1” in the visualization panel. The entity-relationship panel displays the detailed information of the selected gene.

1.4. Visualization Panel

1.4.1. Menu

Three options are available in the “Menu,” including “Save,” “Expand by,” and “Add shortest paths by.”

1.4.2. Save File

Figure 1.7: Save option.

Users can save the network (entity-relationship) in iBEReX in the following formats: png, svg, pdf, xgmml, graphml, and sif.
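A saved sif file can be inspected outside iBEReX. The following is a minimal sketch, assuming the export follows the standard Cytoscape Simple Interaction Format (source, interaction type, one or more targets per line). The function name parse_sif and the interaction labels in the example are hypothetical; the labels iBEReX actually writes may differ.

```python
def parse_sif(text):
    """Read SIF text into (nodes, edges).

    Assumes standard Cytoscape SIF lines: source, interaction type, then one
    or more targets. Fields are tab-separated if the line contains a tab,
    otherwise whitespace-separated. A line with a single field is an
    isolated node.
    """
    nodes = set()
    edges = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        fields = line.split("\t") if "\t" in line else line.split()
        source = fields[0]
        nodes.add(source)
        if len(fields) >= 3:
            relation = fields[1]
            for target in fields[2:]:
                nodes.add(target)
                edges.append((source, relation, target))
    return nodes, edges


# Hypothetical export; the interaction labels are made up for illustration.
example_sif = "BCR\tinteracts\tABL1\nimatinib\ttargets\tABL1\nSRC\n"
nodes, edges = parse_sif(example_sif)
print(sorted(nodes))
print(edges)
```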

1.4.3. Expand Nodes

Figure 1.8: Expand option.

By default, iBEReX expands the graph by the five highest-ranking entities when the “expand” operation (explained in Section 1.4.8) is executed. Users can change this number by selecting the “Expand by” option under “Menu.”

1.4.4. Add Shortest Paths By Option

Figure 1.9: Add shortest paths by number of nodes option.

By default, iBEReX adds a single top-scoring shortest path between two selected nodes upon a user’s request (Section 2.4). Users can change this number by selecting the “Add shortest paths by” option under “Menu.”

1.4.5. Network Layout Option

Figure 1.10: Layout option.

Users can display the network using the automatic graph layout options provided under the “Layout”

menu. Three options are currently available: Radial (default), Circle, and Tree. Figure 1.11 illustrates the

three different layouts from querying Example 1.

Figure 1.11: Three layout options: A. Radial, B. Circle, and C. Tree.

1.4.6. Help

Figure 1.12: Help options.

The “Help” tab has two options: (i) “About,” which describes the iBEReX version, and (ii) “Tutorial,” which is this document.

1.4.7. Visualization Tool

Figure 1.13: Functions of the Visualization tool.

The visualization tool enables users to pan the entire network, zoom in and out, and expand the

visualization panel to fit the screen.

1.4.8. Delete/Expand Options

Figure 1.14: Delete/Expand options.

If users right-click on a node (entity), a pop-up menu appears. Through this menu, users can delete the node or expand its relationships with other entities (e.g. genes/proteins, diseases/symptoms, drugs, pathways, gene ontology terms, miRNAs, and transcription factors). By default, five entities are added per expansion (see Section 1.4.3). For example, Figure 1.15 is obtained by right-clicking “ABL1” and selecting “Expand by Gene/Protein.” Five new genes/proteins are added to the network.

Figure 1.15: Expand by Gene/Protein.

1.4.9. Search Option

Figure 1.16: Searching “SRC” in the Example 1 query.

Users can search for a particular entity in the visualization panel by using the “Search” box. If the entity is already present in the current network, it is highlighted. If it is not present, it is added to the current network. Figure 1.16 illustrates searching for “SRC” in Example 1. Since SRC is not present in the Example 1 network, it has been added to the network.

1.5. Entity-Relationship Panel

The entity-relationship panel contains four tabs that provide detailed information and link-out options for the entity and relationship selected in the visualization panel. It also provides edit options for the nodes and edges in the network.

Figure 1.17: Options in the Entity-Relationship Panel.

1.5.1. Detailed Information on Entity (node)

For example, if “ABL1” is selected in the visualization panel, the “Detail Info” tab displays information about this entity (Figure 1.18) and provides link-out options to external databases.

Figure 1.18: Detailed information on ABL1.

1.5.2. Entity-Relationship Type

Users can customize the entity-relationship type to be displayed in the visualization panel by clicking the

“Entity-Relationship Type” tab. By default, all types of entity-relationship are displayed. Users can

uncheck types in the drop-down menu as illustrated in Figure 1.19.

Figure 1.19: Entity-Relationship Type for ABL1.

1.5.3. Entity-Relationship Tree

Users can explore all the entity-relationships of the selected entity (node) in the “Entity-Relationship Tree” tab. Figure 1.20 illustrates the entity-relationship tree of “ABL1”; Figure 1.20B and Figure 1.20C show all the genes and pathways connected to “ABL1,” respectively.

Figure 1.20: Entity-Relationship Tree.

1.5.4. Graph Edit

The “Graph Edit” tab allows users to customize the nodes and edges in the visualization panel. Figure 1.21 illustrates the options in the “Graph Edit” tab.

Figure 1.21: Options in Graph Edit tab.

1.5.5. Other Query Options

Users can specify the colors of biomedical entities (e.g. genes, drugs, etc.) in their query by appending “|” followed by the desired color at the end of each query line. iBEReX will automatically extract and color the biomedical entity according to the color specified by the user.

See Example 2 (Figure 1.22):

ITGAX | red
ITGB2 | green
EZR | blue

Figure 1.22: Color Options in Example 2 query.
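The query syntax above can be checked before submission with a few lines of code. This is a minimal sketch, assuming each query line is a keyword optionally followed by “|” and a color; the function name parse_query is hypothetical and is not part of iBEReX.

```python
def parse_query(text):
    """Split query lines of the form 'KEYWORD | color' into (keyword, color).

    The color is None when a line has no '|' suffix.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        keyword, _, color = line.partition("|")
        entries.append((keyword.strip(), color.strip() or None))
    return entries


# Example 2 from this tutorial, plus one keyword without a color.
query_text = "ITGAX | red\nITGB2 | green\nEZR | blue\nBCR\n"
entries = parse_query(query_text)
for keyword, color in entries:
    print(keyword, color)
```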

CHAPTER 2: Context-Specific Subnetwork Discovery (COSSY) Analysis Option

2. COSSY Overview

The algorithm COSSY discovers important subnetworks that can differentiate between two phenotypes

(context) (Saha et al., 2014). It automatically finds differentially expressed subnetworks of closely

interacting molecules from molecular interaction networks by using gene expression profiles. COSSY

enables users to analyze their gene expression profile datasets on the iBEReX web server

(http://berex.korea.ac.kr).

2.1. COSSY Analysis Option in iBEReX

Figure 2.1: Context-Specific Subnetwork Discovery (COSSY) and Analysis of Gene Expression Profiles Webpage.

To perform the subnetwork discovery analysis, users need to provide gene expression profiles (*.gct format), class labels (*.cls format), and an optional chip file (*.chip format).

1. Click on the “Analysis” tab.
2. Upload the gene expression profile dataset file (file format: *.gct).
3. If the gene expression profile dataset already uses gene symbols (e.g. RNA-seq), check “no need to map probe ID -> gene ID.” If the dataset contains probe IDs, uncheck this option and provide the chip file (file format: *.chip).
4. Upload the class labels (file format: *.cls).
5. Click “Analyze & Explore.”

The right panel of the “Analysis” tab provides an example gene expression profile dataset (leukemia2.gct), chip file (leukemia2.chip), and class file (leukemia2.cls). We use this example to illustrate the functionality of COSSY. Clicking the “Analyze & Explore” button runs COSSY. The run typically takes several minutes, depending on the number of samples, to return the results.

NOTE:

Once a COSSY query is submitted, iBEReX acknowledges the receipt of the user dataset and shows a

link to the result page. Since a COSSY job typically takes several minutes, users may not see the result

graph populated in the results page immediately after the submission. Users are advised to keep the

browser open and refresh the result page until the result graph appears, or to bookmark the URL of the

result page and visit the webpage later.

2.2. File Formats

The iBEReX server uses common file formats for data input in the COSSY analysis. The detailed

description of the file formats used here is available online at

http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats.

2.2.1. Gene cluster text format (*.gct)

This is a tab-delimited file representing an expression dataset. The first line is always “#1.2”.

The second line contains the number of genes followed by the number of samples in the dataset.

Format:

(#genes)(tab)(#samples)

The third line contains the column titles of the expression dataset. The first two columns are “Name” and “Description,” and the remaining columns represent the expression profiles, one per sample.

Format:

Name(tab)Description(tab)(sample_1_name)(tab)(sample_2_name)(tab)...(sample_N_name)

The expression data start from the fourth line. Each line represents the expression levels of one gene across all the samples.

Format:

(gene name)(tab)(gene description)(tab)(col_1_data)(tab)(col_2_data)(tab)...(col_N_data)

The “Name” column must contain unique identifiers. It contains microarray probe IDs for microarray data, or RNA-seq IDs for RNA-seq profiles.

Note: For microarray probe annotations, users must provide a separate chip file, which will be mapped against the “Name” column of the gct file. For RNA-seq annotations, users must provide the gene symbols in the “Description” column.
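The layout described above can be validated before upload. The following is a minimal sketch of a gct reader that checks the rules stated in this section (fixed first line, declared dimensions, unique “Name” values). The function name parse_gct and the example values are hypothetical; this is not the parser used by the iBEReX server.

```python
def parse_gct(text):
    """Parse gct text into (sample_names, rows).

    rows maps each "Name" value to (description, list of expression values).
    Raises ValueError when the file breaks one of the rules described above.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if lines[0].strip() != "#1.2":
        raise ValueError("first line must be #1.2")
    n_genes, n_samples = (int(x) for x in lines[1].split("\t")[:2])
    header = lines[2].split("\t")
    samples = header[2:]
    if len(samples) != n_samples:
        raise ValueError("sample count does not match the second line")
    rows = {}
    for line in lines[3:]:
        fields = line.split("\t")
        name, description = fields[0], fields[1]
        values = [float(v) for v in fields[2:]]
        if len(values) != n_samples:
            raise ValueError("wrong number of values for " + name)
        if name in rows:
            raise ValueError("duplicate identifier in Name column: " + name)
        rows[name] = (description, values)
    if len(rows) != n_genes:
        raise ValueError("gene count does not match the second line")
    return samples, rows


# Hypothetical two-gene, three-sample dataset.
example_gct = (
    "#1.2\n"
    "2\t3\n"
    "Name\tDescription\tS1\tS2\tS3\n"
    "probe_a\tGENE_A\t1.0\t2.5\t3.0\n"
    "probe_b\tGENE_B\t0.5\t0.7\t0.9\n"
)
samples, rows = parse_gct(example_gct)
print(samples)
print(rows["probe_a"])
```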

2.2.2. Categorical class format (*.cls)

This is a tab or space delimited file.

The first line contains three numbers.

Format:

(number of samples)(tab)(number of classes)(tab)1

The second line contains the names of the classes and is preceded by a hash symbol (#).

Format:

# (tab)(class_0_name)(tab)(class_1_name)

The third and last line contains the class label of each sample, in sample order.

Format:

(sample_1_class)(tab)(sample_2_class)(tab)...(sample_N_class)

Note: COSSY works with binary categories, so exactly two classes are expected. By default, the label that is higher in lexical order is chosen as the positive class. For example, if “0” and “1” are the two class labels, then “0” is the negative class (control) and “1” is the positive class (case). Similarly, for the labels “tumor” and “normal,” “tumor” is the positive class and “normal” is the negative class.
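The class-selection rule in the note can be made concrete. This is a minimal sketch, assuming that “lexical order” corresponds to Python's default string ordering; the function name parse_cls and the example labels are hypothetical, and the code is not taken from COSSY.

```python
def parse_cls(text):
    """Parse cls text and report the positive and negative classes.

    Fields may be separated by tabs or spaces. The positive class is the
    label that sorts higher as a string (assumed meaning of lexical order).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_samples, n_classes = (int(x) for x in lines[0].split()[:2])
    if n_classes != 2:
        raise ValueError("exactly two classes are expected")
    if not lines[1].startswith("#"):
        raise ValueError("second line must start with #")
    class_names = lines[1].lstrip("#").split()
    labels = lines[2].split()
    if len(labels) != n_samples:
        raise ValueError("number of labels does not match the first line")
    distinct = sorted(set(labels))
    if len(distinct) != 2:
        raise ValueError("labels must take exactly two distinct values")
    negative, positive = distinct
    return {
        "class_names": class_names,
        "labels": labels,
        "negative": negative,
        "positive": positive,
    }


# Hypothetical four-sample file.
example_cls = "4 2 1\n# normal tumor\nnormal normal tumor tumor\n"
info = parse_cls(example_cls)
print(info["positive"], info["negative"])
```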

2.2.3. Chip format (*.chip) (optional)

This file contains the annotation information about a microarray. It is a tab-delimited file containing three columns: 1) Probe Set ID, 2) Gene Symbol, and 3) Gene Title.

Format:

(probe_id)(tab)(gene_symbol)(tab)(gene_title)

Note: COSSY can handle a probe that maps to multiple genes, i.e. a probe with multiple gene symbols. In that case, the gene symbols must be separated by a semicolon (;) or three consecutive slashes (///).
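The separator rule for multi-gene probes can be illustrated as follows. This is a minimal sketch with hypothetical probe and gene names; the function name parse_chip is an assumption, as is the handling of an optional header row whose first column reads “Probe Set ID”.

```python
import re


def parse_chip(text):
    """Map each probe ID to the list of gene symbols in a chip file.

    Gene symbols for one probe may be separated by ';' or '///'.
    A header row starting with 'Probe Set ID' is skipped if present
    (an assumption about the file, not a documented iBEReX rule).
    """
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if fields[0].strip().lower() == "probe set id":
            continue  # skip the header row
        probe_id = fields[0].strip()
        symbols = fields[1] if len(fields) > 1 else ""
        genes = [s.strip() for s in re.split(r";|///", symbols) if s.strip()]
        mapping[probe_id] = genes
    return mapping


# Hypothetical annotation rows.
example_chip = (
    "Probe Set ID\tGene Symbol\tGene Title\n"
    "p1\tGENE_A\texample gene A\n"
    "p2\tGENE_B /// GENE_C\texample genes B and C\n"
    "p3\tGENE_D;GENE_E\texample genes D and E\n"
)
probe_to_genes = parse_chip(example_chip)
print(probe_to_genes)
```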

2.3. COSSY Results

Figure 2.2: Subnetworks identified by COSSY from the Leukemia (Example) data set.

From the COSSY analysis, the list of genes identified from the example (leukemia) dataset includes PDPK1, PIK3CA, RPS6KB2, PRKCI, PLCG2, PPP3CB, PPP3CC, GRB2, CREB1, BIRC5, MYC, and WNT4. Red and green represent genes that are over-expressed and under-expressed, respectively, in Case (AML in this example) compared with Control (ALL in this example). These genes are used as the query in iBEReX and are listed in the bottom panel of the webpage. Users may click on any gene in this list to highlight the corresponding gene in the visualization panel.

The current implementation of COSSY is based on KEGG pathways (Saha et al., 2014). Given a user expression profile dataset, COSSY identifies the top 10 phenotype-correlated subnetworks from KEGG pathways and selects the five most differentially expressed genes from each of them. Each five-gene group is reported as a Molecular Interaction Subnetwork (MIS). Note that an MIS may contain fewer than five genes when microarray profiles are used, because multiple microarray probes may map to one gene ID. In Figure 2.2, MIS #1 consists of only four genes because two different probes in MIS #1 map to the same gene ID.

Also note that two extra genes, PPP3CA and STAT3, were added in the visualization panel showing the top three MISs (Figure 2.2). Because an MIS is formed by the top five differentially expressed genes from a subnetwork, its genes may not all be connected to each other. To glue the genes in an MIS together, iBEReX introduces the minimum number of additional genes required into the MIS.
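The reason an MIS can hold fewer than five genes is easy to reproduce. This is an illustrative sketch only, not COSSY's implementation: it assumes the five top-ranked probes are selected first and then mapped to gene IDs, and all probe and gene names are hypothetical.

```python
def mis_genes(ranked_probes, probe_to_gene, top_n=5):
    """Return the distinct genes behind the top_n ranked probes.

    ranked_probes is ordered from most to least differentially expressed.
    Two probes that map to the same gene contribute that gene only once.
    """
    genes = []
    for probe in ranked_probes[:top_n]:
        gene = probe_to_gene[probe]
        if gene not in genes:
            genes.append(gene)
    return genes


# Hypothetical ranking in which p1 and p4 measure the same gene.
ranked = ["p1", "p2", "p3", "p4", "p5", "p6"]
annotation = {
    "p1": "GENE_A",
    "p2": "GENE_B",
    "p3": "GENE_C",
    "p4": "GENE_A",
    "p5": "GENE_D",
    "p6": "GENE_E",
}
selected = mis_genes(ranked, annotation)
print(selected)
```

Here five probes yield four genes, which mirrors the situation described for MIS #1.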

Figure 2.3: Analysis Results.

The MISs are reported sorted by the strength of their correlation with the phenotypes in the user profiles. By default, iBEReX displays the top three MISs. Users may increase or decrease the number of MISs included in the visualization panel by sliding the MIS rule in the “Analysis Results” tab (Figure 2.3). Figure 2.4 illustrates the result after sliding the MIS rule from 3 (default) to 4.

Figure 2.4: Changing the “Top Phenotype-Correlated Subnetworks” from top three to top four.

Figure 2.5: Changing the “Top Phenotype-Correlated Subnetworks” from top three to top two.

Figure 2.5 illustrates reducing the top three MISs to the top two MISs by using the MIS rule in the “Analysis Results” tab. Here, only the differentially expressed genes identified in the top two MISs and a glue gene (PPP3CA) are visualized as a subnetwork. Users can click on the gene list in the bottom panel to highlight the genes in the visualization panel. Users can also add other entity-relationships by right-clicking the highlighted genes.

2.4. Linking Two Genes by Shortest Paths

Figure 2.6: Linking two genes by shortest paths.

In iBEReX, users can connect isolated subnetworks by adding the nodes that form the shortest paths between two selected biomedical entities. Here, we illustrate how to connect two remote subnetworks by selecting RPS6KB2 and PPP3CB. Once the two genes are selected (by shift-click), right-click on one of the selected genes. A pop-up menu will be displayed (Figure 2.6). Select the “Add shortest paths” option from this pop-up menu to find biomedical entities that link these two nodes.

Figure 2.7: Results of linking the two genes by shortest paths.

As illustrated in Figure 2.7, a GO term, “Immunity, innate” (purple node), is added to connect RPS6KB2 and PPP3CB. Users can expand or delete nodes from this network by repeating these procedures. By default, shortest paths are added one at a time. However, users can change this by clicking “Menu” and then “Add shortest paths by” in the drop-down menu to select the desired number of paths to add (see Section 1.4.4).
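The idea of linking two nodes through intermediate entities can be sketched with a plain breadth-first search. This is an illustration under stated assumptions, not the iBEReX algorithm: iBEReX ranks paths by a score that is not described here, whereas the sketch below only finds one path with the fewest edges. The toy graph reuses the RPS6KB2, “Immunity, innate,” PPP3CB connection from Figure 2.7; the other edges are made up for the example.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal as a list of nodes.

    graph maps each node to an iterable of its neighbours. Returns None
    when the two nodes are not connected.
    """
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue  # already visited
            previous[neighbour] = node
            if neighbour == goal:
                # Walk back from the goal to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Toy undirected network; edges other than those through the GO term
# are illustrative only.
toy_graph = {
    "RPS6KB2": ["PDPK1", "Immunity, innate"],
    "PDPK1": ["RPS6KB2"],
    "Immunity, innate": ["RPS6KB2", "PPP3CB"],
    "PPP3CB": ["Immunity, innate", "PPP3CC"],
    "PPP3CC": ["PPP3CB"],
}
path = shortest_path(toy_graph, "RPS6KB2", "PPP3CB")
print(path)
```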

References:

Jeon, M., Lee, S., Lee, K., Tan, A.C., and Kang, J. (2014). BEReX: Biomedical Entity-Relationship eXplorer. Bioinformatics. 30:135-136. [PMID: 24149052]

Saha, A., Tan, A.C., and Kang, J. (2014). Automatic Context-Specific Subnetwork Discovery from Large Interaction Networks. PLoS ONE. 9(1): e84227. doi:10.1371/journal.pone.0084227
