EGAN Tutorial: Loading Network Data
October, 2009
Jesse Paquette
UCSF Helen Diller Family Comprehensive Cancer Center
Preamble
• This document has many slides with multi-step animations
  – Best viewed in Slide Show mode
• The EGAN graphical user interface is evolving
  – Icons may change
  – Menus may change
  – Button/widget placement may change
  – This document probably won’t change as quickly
  – Please contact the developers if you notice major discrepancies between this and EGAN
Loading network data: An overview
• The EGAN pre-collated network represents only a fraction of available data
• Additional data can be loaded as
  – Gene sets/association nodes
    • Pathways, annotation terms, articles, transcription factor targets, miRNA targets, conserved domains, significant gene sets/clusters from experiments, etc.
  – Gene-gene edges
    • Protein-protein interactions, literature co-occurrence, expression correlation, sequence homology, transcription factor targets, kinase targets, etc.
• This document will outline the steps for loading additional gene sets and gene-gene edges into EGAN
Loading gene sets into EGAN
Loading gene sets into EGAN: Gene set file formats
• Two possible tab-delimited text formats
  – GMT
    • All default pre-collated gene sets in EGAN are specified via GMT files
    • Each row represents a different gene set
  – GMX
    • Transposed GMT
    • Each column represents a different gene set
• First two columns of GMT (or rows for GMX) specify
  – Gene set ID (first column)
    • Can potentially be used to link out to the gene set’s web page via URL
  – Gene set name (second column)
    • Can be empty or same as the ID
• Subsequent columns (or rows for GMX) list the genes in each set
  – Gene identifiers must be mappable to Entrez Gene IDs
    • EGAN provides a wide variety of mapping file options
      – Entrez Gene ID, HUGO Gene Symbol, assay-specific IDs, Ensembl, GenBank, UniProt, etc.
    • EGAN expects that all entity IDs are the same type for each file
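The GMT/GMX relationship described above can be sketched in a few lines of Python. This is only an illustration of the two layouts, not code taken from EGAN: the function names read_gmt and write_gmx and the example gene sets are hypothetical.

```python
import csv
import io
from itertools import zip_longest

def read_gmt(handle):
    """Parse GMT rows into (set_id, set_name, genes) tuples."""
    gene_sets = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        set_id, set_name = row[0], row[1]
        genes = [g for g in row[2:] if g]  # drop empty padding cells
        gene_sets.append((set_id, set_name, genes))
    return gene_sets

def write_gmx(gene_sets, handle):
    """Write gene sets as GMX: one column per gene set (transposed GMT)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([s[0] for s in gene_sets])  # first row: gene set IDs
    writer.writerow([s[1] for s in gene_sets])  # second row: gene set names
    # Remaining rows: genes, padded with empty cells for shorter sets
    for cells in zip_longest(*(s[2] for s in gene_sets), fillvalue=""):
        writer.writerow(cells)

# Hypothetical two-set GMT file; the second set has an empty name
gmt_text = "SET_A\tExample set A\tTP53\tBRCA1\tEGFR\nSET_B\t\tMYC\tKRAS\n"
gene_sets = read_gmt(io.StringIO(gmt_text))

out = io.StringIO()
write_gmx(gene_sets, out)
gmx_text = out.getvalue()
print(gmx_text)
```

Reading a real file works the same way: pass an open file handle (opened with newline="") instead of the io.StringIO object.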
Loading gene sets into EGAN: An example
First column: gene set IDs
Second column: gene set names
Later columns: gene identifiers
Each row is a gene set
Save as tab-delimited text
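If your gene sets live in a script rather than a spreadsheet, the same tab-delimited layout can be written directly. A minimal sketch, assuming gene sets are held in a dictionary; the set IDs, names, write_gmt function and output file name are all hypothetical.

```python
import csv
import io

# Hypothetical gene sets: ID -> (name, list of gene symbols)
my_sets = {
    "MY_SET_1": ("My first set", ["TP53", "MDM2", "CDKN1A"]),
    "MY_SET_2": ("MY_SET_2", ["EGFR", "ERBB2"]),  # name may repeat the ID
}

def write_gmt(sets, handle):
    """Write one tab-delimited row per gene set: ID, name, then genes."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for set_id, (name, genes) in sets.items():
        writer.writerow([set_id, name, *genes])

buf = io.StringIO()
write_gmt(my_sets, buf)
gmt_text = buf.getvalue()
print(gmt_text)

# To save to disk instead (hypothetical file name):
# with open("my_sets.gmt", "w", newline="") as f:
#     write_gmt(my_sets, f)
```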
• Download or construct a gene set file
  – This example will use c2.cgp.v2.5.symbols.gmt from MSigDB (download this file to follow along)
    • You’ll have to log in with your email address to download MSigDB gene sets
• Launch EGAN H. sapiens
Click on “7) Association Data”
Shown are the default pre-collated gene sets.
We want to load a new one.
Click “Browse…”
Select your GMT file and click “Specify gene association set”.
This GMT file uses Gene Symbols for gene identifiers.
Select “HUGO Gene Symbol” from the drop-down menu.
Now specify that these gene sets are of type “MSigDB C2: chemical and genetic perturbations” by selecting that option from the drop-down menu.
This MSigDB type has been pre-defined for EGAN, which is why it exists in this menu.
Finally, click “Add Set”.
When you are finished loading data, click “Finish – Launch EGAN”.
Whenever you change the network configuration by adding or removing files, you will be given the option to save the new configuration to a tab-delimited text file.
If you choose to save a .config file, next time you will only need to specify that file (item 3 in the Launch EGAN Wizard).
When EGAN finishes loading, your new set(s) will be available for exploration
Loading gene-gene edges into EGAN
Loading gene-gene edges into EGAN: File formats
• Two possible tab-delimited text formats
  – SIF (Simple Interaction Format), commonly used in Cytoscape
    • .sif extension (required in EGAN)
    • Each line represents a gene-gene relationship
    • Three columns
      – First column is first gene
      – Middle column is ignored in EGAN
      – Third column is second gene
  – EGAN interaction file format
    • .txt file extension
    • Three columns, like SIF
      – Middle column is a PubMed ID
• Gene identifiers must be mappable to Entrez Gene IDs
  – EGAN provides a wide variety of mapping file options
    • Entrez Gene ID, HUGO Gene Symbol, assay-specific IDs, Ensembl, GenBank, UniProt, etc.
  – EGAN expects that all entity IDs are the same type for each file
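The two edge formats differ only in the middle column, which a short Python sketch can make concrete. It assumes exactly three tab-delimited columns per line, as described above. The function names, the relationship label and the PubMed ID are placeholders chosen for illustration, and attaching one PubMed ID to every edge is a simplification.

```python
def read_sif(lines):
    """Parse three-column SIF lines into (first_gene, second_gene) pairs."""
    edges = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 tab-delimited columns, got {len(fields)}"
            )
        first, _relation, second = fields  # middle column is not used
        edges.append((first, second))
    return edges

def to_egan_txt(edges, pubmed_id):
    """Render edges in the three-column .txt layout with a PubMed ID in the middle."""
    return "".join(f"{a}\t{pubmed_id}\t{b}\n" for a, b in edges)

# Hypothetical SIF content; "phosphorylates" is an illustrative label only
sif_lines = [
    "CDK1\tphosphorylates\tRB1\n",
    "\n",
    "AKT1\tphosphorylates\tGSK3B\n",
]
edges = read_sif(sif_lines)

# "00000000" is a placeholder, not a real PubMed ID
egan_text = to_egan_txt(edges, "00000000")
print(egan_text)
```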
Loading gene-gene edges into EGAN: An example
Each row is a gene-gene relationship
First column: first gene
Third column: second gene
Save as tab-delimited text
• Download or construct a gene-gene edge file
  – This example will use HPN.sif, a set of kinase-target relationships available in the “.sif Gzip-ed files” link at NetworKIN (download this file to follow along)
    • You’ll have to accept the NetworKIN license in order to download data
• Launch EGAN H. sapiens
Click on “8) Gene Relationship Edges”
Shown are the default pre-collated gene-gene edge files.
We want to load a new one.
Click “Browse…”
Select your SIF (or EGAN .txt) file and click “Specify gene-gene edge set”
This SIF file uses Gene Symbols for gene identifiers.
Select “HUGO Gene Symbol” from the drop-down menu.
Now specify that these gene-gene edges are of type “NetworKIN” by selecting that option from the drop-down menu.
The NetworKIN type has been pre-defined for EGAN, which is why it exists in this menu.
Finally, click “Add Set”.
When you are finished loading data, click “Finish – Launch EGAN”.
Whenever you change the network configuration by adding or removing files, you will be given the option to save the new configuration to a tab-delimited text file.
If you choose to save a .config file, next time you will only need to specify that file (item 3 in the Launch EGAN Wizard).
When EGAN finishes loading, your new gene-gene edges will be available for exploration
Loading network data: Tips and hints
• Both the MSigDB and NetworKIN types were pre-defined in EGAN
  – This may not be the case for your new data
  – You can use the “Custom Node/Custom Edge” types as a default
• You can specify your own type definitions in a Type Definition file
  – Give your added nodes and edges distinct colors and links
  – See item 4 in the Launch EGAN Wizard
  – Use this type definition file as a template; just add the appropriate lines for your new types
• You can specify gene set, gene-gene edge and mapping files via URL (or .jar file, but that’s tricky)
  – Just type or paste the URL into the appropriate text field instead of clicking “Browse…”
• Potential issues to consider
  – Identifiers used in your gene set/gene-gene edge file might not be found in the mapping file
  – Genes in your mapping file might not be present in the network
  – These issues are written (rather crudely) to the Log
    • Inspect the log file if you notice unexpected behavior
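The first issue above (identifiers missing from the mapping file) can be checked before launching EGAN. The sketch below is not part of EGAN and does not read EGAN's actual mapping files, whose layout is not described here. It assumes you have already loaded a mapping into a Python dictionary from identifier to Entrez Gene ID; find_unmapped and the example identifiers are hypothetical.

```python
def find_unmapped(identifiers, mapping):
    """Return identifiers absent from the mapping, in first-seen order, without duplicates."""
    missing = []
    for ident in identifiers:
        if ident not in mapping and ident not in missing:
            missing.append(ident)
    return missing

# Assumed mapping: HUGO Gene Symbol -> Entrez Gene ID (small illustrative subset)
symbol_to_entrez = {"TP53": "7157", "EGFR": "1956", "MYC": "4609"}

# Hypothetical identifiers collected from a gene set or gene-gene edge file
ids_in_file = ["TP53", "NOT_A_GENE", "EGFR", "OLD_ALIAS", "NOT_A_GENE"]

unmapped = find_unmapped(ids_in_file, symbol_to_entrez)
print(unmapped)
```

An empty result means every identifier in the file has a mapping entry; it does not guarantee that the mapped genes are present in the network.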
Questions/comments?
• Visit http://groups.google.com/group/ucsf-egan for downloads, documentation and discussion
  – Requires an account with Google Groups