©2006 Ariadne Genomics. All Rights Reserved.
Pathway Studio Workgroup/Enterprise training course
DAY 1
Technology overview
System architecture
Products
• Pathway Studio Desktop
• Pathway Studio Workgroup
• Pathway Studio Enterprise
Main functionality:
1) Data mining and pathway building
2) Analysis of high-throughput data
3) Text-mining and fact extraction
Ariadne Corporate Offering
Software solution for knowledge management and pathway analysis of high-throughput data
[Diagram labels]
• Knowledge databases
• ResNet biological association networks
• Pathway building, pathway collection
• MedScan: 1000 abstracts/min
• Proprietary data
• Public interaction data
• Analysis of high-throughput data
• Text-mining
Accomplishments (April, 2007)
188 publications using AGI software and ResNet database
• Gene expression microarray analysis (105)
• Pathway analysis (80)
• Disease mechanism (64)
• Human genetics (7)
• Publications by Ariadne authors (13)
• Text processing (9)
• Reviews (6)
• Databases (3)
• Drug discovery (16)
• Toxicogenomics (4)
Pathway Studio Workgroup client-server architecture
[Diagram labels]
• Database
• Read-only users
• Data curators
• Third-party tools, in-house applications, API SQL interface, bulk data management
• PSW administrator
PathwayExpert Architecture
Bioinformaticians via Pathway Studio
Database
Application server
Read-only users via web browser
Data editors via web browser
Third-party tools, in-house applications, API SQL interface, bulk data management
“Everyone is an Expert” decentralized deployment schema
Hundreds or thousands of users, some with read-only and some with editor or publisher roles, access one central database via Pathway Studio and/or a web browser to analyze experiments, browse the pathway collection, mine the literature, and share data and analysis results.
“Bioinformatics service group” centralized deployment schema
A bioinformatics group serves scientists across the entire company by analyzing their experimental data and mining the literature. Analysis results are published to end users through the web browser interface.
Bioinformatics group:
1) Analysis of experimental data
2) Text-mining and pathway building
End users:
1) Experimental data
2) Search requests
View-only access to pathways and analysis networks annotated with experimental data, via web browser and links to PathwayExpert Web Services
“Disease area” decentralized clusters deployment schema
Disease area groups have bioinformaticians, biologists and chemists working as a team with a focus on one disease
• Cardiovascular group
• Cancer group
• Digestive disorders group
• CNS group
Day 1
Introduction to MedScan technology
Ariadne MedScan Text-To-Knowledge Technology
Extracting biological association networks from text
[Diagram labels]
• Knowledge databases
• ResNet biological association networks
• Pathway analysis in ResNet database
• MedScan: 1000 abstracts/min
• Pathway Studio to navigate knowledgebase
• MedScan output: RNEF XML
How does MedScan extract facts from text?
• Sentence in PubMed: “Axin binds beta-catenin and inhibits GSK-3beta.”
• Identify proteins in dictionary (red on the slide): Axin, beta-catenin, GSK-3beta
• Identify interaction type (black on the slide): binds, inhibits
• Extracted facts:
– Axin - beta-catenin, relation: Binding
– Axin -> GSK-3beta, relation: Regulation, effect: Negative
Syntactic layer: Noun Phrase, Verb Phrase, Noun Phrase
Semantic layer: Protein, Protein, Relations, Protein
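The steps on this slide can be illustrated with a toy sketch. It is not MedScan's implementation: the three-entry dictionary, the two verb patterns and the function name `extract_facts` are all assumptions made for illustration, and it only handles sentences of the form "SUBJECT verb OBJECT and verb OBJECT".

```python
import re

# Hypothetical mini-dictionary and verb patterns (illustrative only).
PROTEINS = ["Axin", "beta-catenin", "GSK-3beta"]
VERBS = {
    "binds": {"relation": "Binding", "directed": False},
    "inhibits": {"relation": "Regulation", "effect": "Negative", "directed": True},
}


def extract_facts(sentence):
    """Return one fact per 'verb OBJECT' pair, all sharing the sentence subject."""
    # Longest names first so a longer name is never shadowed by a shorter one.
    protein_re = "|".join(re.escape(p) for p in sorted(PROTEINS, key=len, reverse=True))
    verb_re = "|".join(VERBS)
    subject = re.match(rf"\s*({protein_re})\b", sentence)
    if subject is None:
        return []
    facts = []
    for verb, obj in re.findall(rf"\b({verb_re})\s+({protein_re})", sentence):
        fact = {"source": subject.group(1), "target": obj}
        fact.update(VERBS[verb])
        facts.append(fact)
    return facts


facts = extract_facts("Axin binds beta-catenin and inhibits GSK-3beta.")
for fact in facts:
    print(fact)
```

The sketch matches surface patterns only; the slide describes a syntactic and a semantic layer, which a real parser would need in order to handle other sentence structures.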
Describing MedScan
• Manually curated: dictionaries and grammar rules
• Fast: 14 million PubMed abstracts in 2 days on a modern PC
• Comprehensive: fact recovery rate > 90%
• Removes redundancy: 7,647,282 non-distinct relations => 1,000,000 distinct relations
• Accurate: false positive rate of 10%
• Customizable: dictionaries and patterns
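The redundancy-removal bullet can be pictured as collapsing repeated mentions of the same fact into one distinct relation that keeps all of its references. The sketch below is an assumption about how such grouping could work, not a description of MedScan internals; the reference labels and the choice to treat Binding as symmetric are hypothetical.

```python
from collections import defaultdict


def collapse_relations(mentions):
    """Group repeated mentions of the same fact and keep their references."""
    distinct = defaultdict(list)
    for source, relation, target, reference in mentions:
        # Assumption: Binding is symmetric, so A-B and B-A are the same fact.
        if relation == "Binding":
            source, target = sorted((source, target))
        distinct[(source, relation, target)].append(reference)
    return dict(distinct)


# Hypothetical mentions; the reference labels are placeholders.
mentions = [
    ("Axin", "Binding", "beta-catenin", "abstract-1"),
    ("beta-catenin", "Binding", "Axin", "abstract-2"),
    ("Axin", "Regulation", "GSK-3beta", "abstract-1"),
]
distinct = collapse_relations(mentions)
for key, references in distinct.items():
    print(key, len(references))
```

Keeping the reference list per distinct relation also gives a reference count, which the deck later mentions as a display style.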
MedScan Architecture
[Diagram labels]
• Modules: Entity recognizer, Semantic processor, Pattern matcher
• Functions: Entity detection, Relationship extraction
• Resources: Dictionaries, Rules, Patterns
• Customizable by user
• Cartridges: Mammals, Plants, Toxicology, Yeast, Drosophila, C. elegans
• Output: RNEF XML
Future:
• New modules: ConceptScan
• New cartridges: Immunology, Clinical
Overview of MedScan Architecture
[Diagram labels]
• Processing components: Tokenizer, Preprocessor, Syntactic Parser, Semantic Interpreter, Ontological interpreter, Pattern Matcher, Converter
• Data: Input text, Sequence of words, Tagged sentences, Sentence structure, Semantic tree, Extracted facts, Database of relations
• Resources: Protein names dictionary, Grammar, Lexicon, Extraction rules, Extraction patterns
Notes:
• Dictionary-based: identifies proteins and small molecules
• Context-free grammar
• Grammar and lexicon are proprietary. They are domain-independent by design but focused on the biomedical field.
• Rule-based: rules are equivalent to an ontology
MedScan Applications
[Diagram labels]
• Sources: PubMed, open access
• MedScan
• Entity-based index, semantic index
• Automatic reader’s digest, document summary
• Indexing the scientific literature
• Extracting interactions to create databases for systems biology
Text-mining tools in Pathway Studio
• Tools -> Start MedScan Reader
– Web browser enhanced with MedScan technology
– Search PubMed and manually select abstracts for fact extraction
– Search Google Scholar and extract facts from top 100 hits
– Search Google and extract facts from top 30 hits
– Search Highwire and BioMed Central and extract facts from the individual full-text articles
• Tools -> MedScan: Extract pathways from text
– search PubMed
– from file
– from location
• Tools -> Update pathway
• Tools -> Pathway Reference summary
– Export to EndNote
MedScan Reader settings
1) Specifying MedScan cartridge
2) Tracking favorite entities via highlight
3) Filtering for favorite entities and relations
4) Filtering against entities and relations
Day 1
Ariadne ResNet database construction
ResNet Mammal Database
• Shipped with >1,000,000 unique relations derived by MedScan between proteins, metabolites, chemicals, cell processes and diseases
• ResNet physical interactions are manually curated
• 712 manually curated pathways
• Gene Ontology
• Optional pathway updates:
– >300 Regulome pathways
– >2500 Biological processes pathways
– >200 Cellular component pathways
– High-throughput interaction data
• Automatic curation of ResNet is possible, to remove redundancy and clean up false positives
Pathways collection in ResNet
• Canonical pathways (included, curated)
• Signaling line pathways (included, curated)
• Regulome pathways (optional, automatic)
• Biological processes pathways (optional, automatic)
• Cellular component pathways (optional, automatic)
• KEGG metabolic pathways (optional, imported)
• STKE (commercial)
• Metabolic vision (commercial)
• PathArt (commercial)
Ariadne databases for other organisms
All databases contain:
- Relations extracted by the MedScan organism-specific cartridge from organism-specific abstracts and full-text articles
- Entrez Gene protein annotation
- Protein interactions from Entrez Gene (includes BIND, HPRD, BioGRID and EcoCyc datasets)
- Gene Ontology annotation
Model organism databases:
• ResNet Plant: >400,000 relations, supports 6 plant species
– Optional entity co-occurrence data
– Additional protein physical interactions predicted by TAIR
• ResNet Drosophila
– Additional interactions from published high-throughput datasets
• ResNet C. elegans
– Additional interactions from published high-throughput datasets
• ResNet Yeast
– Additional interactions from published high-throughput datasets
• ResNet Bacteria (beta version)
– Additional interactions from published high-throughput datasets
Databases for non-model organisms, containing interactions predicted from the closest model organism, are available from: http://www.ariadnegenomics.com/support/downloads/databases/
Additional Commercial Datasets
• KEGG: >130 metabolic pathways from Kyoto University
• STKE: >70 pathways from AAAS
• Metabolic vision: >10,000 curated pathways for 587 organisms from Integrated Genomics Inc
• Hynet: adds over 100,000 new protein physical interactions to ResNet 5.0 from Prolexys Inc
• PathArt: >600 disease pathways from Jubilant Inc
Day 1
Pathway Studio maintenance, administration and technical support
Hardware requirements for Pathway Studio
• Pathway Studio desktop or workgroup client
– CPU: 2 GHz or more
– RAM: 512 MB or more
– Disk space for application: 500 MB
– Disk space for one local database: 2 GB
• Pathway Studio workgroup server
– 1 CPU for 1-5 concurrent users: >3.0 GHz
– 2 CPUs for 6-10 concurrent users: >3.0 GHz
– RAM for 1-5 concurrent users: >2 GB
– RAM for 6-10 concurrent users: >3 GB
– Disk space: 20 GB for the database
– Optimal disk configuration:
  • for 1-5 concurrent users: 4 hard drives in RAID 0
  • for 6-10 concurrent users: RAID 10 mode
Pathway Studio software requirements
• Pathway Studio desktop or workgroup client
– Microsoft Windows Server (2000, 2003), Windows XP (Professional), Windows Vista (Professional, Ultimate, Corporate)
• Pathway Studio workgroup server
– MS SQL Server 2000 or 2005 (Developer, Workgroup, Standard or Enterprise Edition) on Windows 2000, Windows 2003 Server, Windows XP Professional
– Oracle 10g or later on any supported Oracle platform including Windows 2003 Server, Linux, etc.
Connecting to the central workgroup database
Connecting to the server enterprise database
Database Index folder
• Database statistics
• Viewing entities in the list pane
• Viewing pathways
• Viewing groups
• Expression experiments folder
• Simulation model folder
PS Workgroup Admin console
User roles in Workgroup environment
• Administrator
• Editor – can edit public objects
• Publisher – can publish private pathways
• Regular user – can work only in their own private space
Ask your PSW administrator for an account and choose your role
Ariadne Technical Support http://www.ariadnegenomics.com/products/support.html
Summary of the introduction slides
• MedScan technology
• Software architecture, hardware and software requirements
• User roles
• ResNet database overview
• Ariadne’s technical support
Summary for the rest of the day
• Working with objects in database
• Working with pathway diagram and layout algorithms
• Database search in PS
• Build pathway tool and strategy
• Data import/export
• Pathways in ResNet
• Pathway comparison and statistical algorithms: Find groups/pathways
• Text-mining in PS
• Microarray analysis: data import options and algorithms
• Pathway kinetics simulation in PS
DAY 1
Pathway Building in Pathway Studio
• Manual
• Automatic using Graph navigation tools
• Using text-mining with MedScan
Viewing and editing pathways in Pathway Studio
• Viewing entities in the List Pane
• Entity and relation tables
• Show all references
• Pathway Reference summary
• Export protein list
• Display styles: By type, By effect, By reference count
• UI options:
– magnifier
– fit text to entities
– simple and full graph view
– fit to window
– rotate
– move
– zoom by rectangle
– advanced graph scaling
• Resizing nodes in pathway pane
Finding entities and relations in Pathway Studio database
• Quick search
• String search
• Search by attribute
• Build pathway tool
Viewing and editing entity/relation properties
• Edit Entity property dialog, URN identifier
• Links to external databases
• Adding new properties, declaring new properties in the database
Palette pane
• Making a figure legend for your publication
• Viewing group display styles
• Drag & drop entity icon into pathway pane
Images pane
• Drag & drop images into pathway pane
• Importing your own images
• Image properties
KEGG pathways layout: node cloning in pathway graph
• 131 metabolic pathways
• 20,972 connected proteins
Several methods for adding objects and relations to Pathway pane
Adding objects:
• Drag & drop from the palette
• Drag & drop from the list pane
Adding relations:
• Connect selected entities button
• Enter a fact box
• Drag & drop from the list pane
Building pathways by manual curation in Pathway Studio
In GenMAPP
In Pathway Studio
Building pathways by manual curation in Pathway Studio
• Complex Nodes
• Adding components to Complex Nodes
In GenMAPP
In Pathway Studio
Questionnaire on the previous slides
• How many chemical reactions are in the ResNet database?
• What is the default image for a transcription factor in PS?
• How many images for the cell membrane can there be in PS?
• What is the quickest search in PS?
• What is the quickest way to add a relation to your pathway diagram?
DAY 1
Automatic Pathway Building using Graph navigation
Build pathway tool
Mining regulatory relations in database
Basic principle: regulatory interactions are mediated by the physical interaction network
– Regulomes
– Biological processes pathways
– Disease pathways
Build Pathway dialog
• Build pathway options
• Filtering by direction
• Number of steps
• Build pathway filter
The main application of the Build pathway tool is to quickly find connections between entities of interest; therefore, its button is available from all panes.
Build pathway filters
• Using entity filters to answer different biological questions
• Using relation filter to analyze different types of high-throughput data
• Filtering by properties
Build pathway Edit Results
• Display filtering
• Selecting results based on local connectivity
• IsNew column
Automatic layout options
• Direct force layout
– Charges and springs
– Good to find hubs in the pathway
• Hierarchical layout
– Directed graph
– Good for metabolic pathways (KEGG, ERGO)
• Symmetric layout (Centric graph)
– Good for Expand pathway
• Cell localization layout (circular and linear membrane). Configurable:
– Cell localization annotation
– Organelle images layout
– Association of Cell localization value and Organelle image
• Dynamic layout
– Direct-force like with adjustable spring force
– Use cell localization if organelle
Regulome pathways: algorithm input
Regulome pathways: algorithm result
Building pathways by data mining: converting regulatory network to protein physical interaction network for Cell Processes, Diseases, Regulomes
Disease networks
2300 diseases, 230 cancers in ResNet 5.0
Entities associated with Endothelial cells cancer in ResNet
Endothelial cells cancer network
Data-mining techniques and hints
• Different filter settings answer different biological questions. Know what each relation type means
• Use the directional filter to perform upstream/downstream analysis
• Relax the search by including Regulation relations
• To mine for more specific relations, search Relation by Sentence, include “your focus keyword”
– Find relations mentioned in a certain tissue
– Find a specific mechanism: trans-activation, cleavage, etc.
• Filter by relation confidence using the Relation table to increase network confidence
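A minimal sketch of three of these hints (direction, sentence keyword, confidence) applied to a small relation list. The record fields, the toy relations and their reference counts are hypothetical and do not reflect the ResNet schema; using the reference count as a stand-in for confidence is also an assumption.

```python
# Hypothetical relation records with made-up reference counts and sentences.
relations = [
    {"source": "TP53", "target": "CDKN1A", "type": "Expression", "refs": 40,
     "sentence": "p53 trans-activation of p21 in skin keratinocytes"},
    {"source": "MDM2", "target": "TP53", "type": "Regulation", "refs": 12,
     "sentence": "MDM2 inhibits p53 in liver"},
    {"source": "TP53", "target": "BAX", "type": "Expression", "refs": 1,
     "sentence": "p53 induces BAX"},
]


def filter_relations(relations, entity, direction, keyword=None, min_refs=1):
    """Keep relations up- or downstream of entity that pass the optional filters."""
    # Downstream relations start at the entity; upstream relations end at it.
    end = {"downstream": "source", "upstream": "target"}[direction]
    hits = []
    for rel in relations:
        if rel[end] != entity:
            continue
        if rel["refs"] < min_refs:
            continue
        if keyword and keyword.lower() not in rel["sentence"].lower():
            continue
        hits.append(rel)
    return hits


downstream = filter_relations(relations, "TP53", "downstream")
upstream = filter_relations(relations, "TP53", "upstream")
in_skin = filter_relations(relations, "TP53", "downstream", keyword="skin")
confident = filter_relations(relations, "TP53", "downstream", min_refs=5)
print(len(downstream), len(upstream), len(in_skin), len(confident))
```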
DAY 1 Build pathway settings asking different biological questions
Finding major regulators among DE genes
[Screenshot with numbered callouts 1-3 marking the first, second and third choice for expression data]
Upstream analysis of DE genes and gene clusters
[Screenshots with numbered callouts 1-3 marking the first, second and third choice for expression data]
Analysis of proteomics co-IP data
Analysis of proteomics phosphoprofiling experiments
Analysis of metabolomics experiment
[Screenshots with numbered callouts 1-3]
Importing metabolomics experiment
Relaxing Build pathway settings
• Replace “Find only direct interactions” with “Find shortest path”
• Increase “Maximum number of steps” in “Find common regulators” or in “Find shortest path”
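The effect of relaxing these settings can be sketched on a toy graph: with a step limit of 1 only direct interactions connect two entities, while a higher limit lets a path run through intermediates. This is a generic breadth-first search written for illustration, not the Build pathway implementation; the edge list and the function name `shortest_path` are assumptions.

```python
from collections import deque


def shortest_path(edges, start, goal, max_steps):
    """Breadth-first search on an undirected graph, limited to max_steps edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 == max_steps:
            continue  # step budget used up on this branch
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical chain of interactions A - B - C - D.
edges = [("A", "B"), ("B", "C"), ("C", "D")]
direct = shortest_path(edges, "A", "C", max_steps=1)   # direct interactions only
relaxed = shortest_path(edges, "A", "C", max_steps=2)  # one intermediate allowed
print(direct, relaxed)
```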
Day 1
Pathway Building by text-mining
Non-melanoma skin cancer: >1,000,000 cases (<2,000 deaths) in the USA
MedScan Reader: PubMed search
Keep searching and adding relations
At the end, send the extracted relations to Pathway Studio
MedScan Reader: Import top 100 Hits from Google Scholar search: downloads found articles and processes them with MedScan
MedScan Reader: Import top 30 Hits from Google search: downloads found web-pages and processes them with MedScan
Full-text article found on Highwire press with “non-melanoma skin cancer” text search
“Non-melanoma skin cancer” literature network – result of text-mining by MedScan Reader
Every entity in this network was mentioned in the context of non-melanoma skin cancer
Protein interaction network for non-melanoma skin cancer using information from entire ResNet
Compare this pathway with your experimental patient data
Text-mining techniques and hints: controlling relevance of literature networks
• Searching full-text articles by keyword, followed by MedScan fact extraction, associates keywords with facts only loosely: you find all facts mentioned in the same article as your keywords
• Searching PubMed abstracts by keyword, followed by MedScan fact extraction, gives better relevance of the extracted facts to your keywords: you find all facts mentioned in the same abstract as your keywords
• Searching sentences extracted by MedScan by keyword gives the highest relevance of the extracted facts to your keywords: you find all facts mentioned in the same sentence as your keywords
[Chart: Relevance vs. Recovery (%) for full-text, abstract and sentence searches]
DAY 1
Data Import/Export
Tools -> Import Protein List
• Choice of identifiers
• Lookup preview
• Paste and Load from file
• Import as New group of proteins
Tools -> Import Protein Network
• Choice of identifiers
• Lookup preview
• Paste and Load from file
• Import of Regulatory relations
Importing ChIP-on-chip data as PromoterBinding relations using Tools -> Import Protein Network
Import creates a new pathway with new relations
Database -> Import Wizard
• Importing from Internet
• Import formats and options
• Specifying source for entities and relations
• Specifying source folder for pathways
Database -> Export Wizard
• Exporting pathways
• Export filters
• Export strategy
• Exporting entity annotation in plain text format
DAY 1
Data management, pathway comparison, find groups/pathways
Working with groups in Pathway Studio
• Create group
• Add Entities to a group
• Add group as a node into pathway pane
• Select/Highlight by group
• Maintaining group hierarchy
Edit -> Combine Pathway
• Union
• Intersection
• Subtract
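The three Combine Pathway modes correspond to ordinary set operations. The sketch below assumes a pathway can be modelled as a set of entities plus a set of (source, target) relations; that model, the sample pathways and the rule of dropping relations that lose an endpoint are illustrative assumptions, not the tool's documented behavior.

```python
def combine(pathway_a, pathway_b, mode):
    """Combine two pathways by union, intersection or subtraction."""
    op = {
        "union": set.union,
        "intersection": set.intersection,
        "subtract": set.difference,
    }[mode]
    entities = op(pathway_a["entities"], pathway_b["entities"])
    relations = op(pathway_a["relations"], pathway_b["relations"])
    # Assumption: a relation is kept only if both endpoints survive.
    relations = {(s, t) for s, t in relations if s in entities and t in entities}
    return {"entities": entities, "relations": relations}


# Hypothetical pathways sharing one entity, Z.
a = {"entities": {"X", "Y", "Z"}, "relations": {("X", "Y"), ("Y", "Z")}}
b = {"entities": {"Z", "W"}, "relations": {("Z", "W")}}

union = combine(a, b, "union")
common = combine(a, b, "intersection")
only_a = combine(a, b, "subtract")
print(sorted(union["entities"]), sorted(common["entities"]), sorted(only_a["entities"]))
```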
Tools for pathways comparison in Pathway Studio
• Combine pathways
• Select
• Highlight
Statistical algorithms for pathway comparison in Pathway Studio
• Find Pathways
• Find Groups
• Gene Ontology analysis
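The slides do not say which statistic these tools use. One common choice for scoring the overlap between a gene list and a group or pathway is the hypergeometric tail probability, sketched here as an assumption; the function name and the example numbers are hypothetical.

```python
from math import comb


def overlap_p_value(universe_size, group_size, list_size, overlap):
    """P(X >= overlap) for list_size genes drawn without replacement from the universe."""
    total = comb(universe_size, list_size)
    upper = min(group_size, list_size)
    tail = sum(
        comb(group_size, k) * comb(universe_size - group_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20 genes in total, a group of 5, a list of 4, 3 shared.
p = overlap_p_value(universe_size=20, group_size=5, list_size=4, overlap=3)
print(p)
```

A small value suggests the overlap is larger than expected by chance; with many groups tested, a multiple-testing correction would normally follow.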
DAY 2
Analysis of high-throughput data in Pathway Studio
Experiment types
• Gene expression
– Find major regulators
– Find biomarkers
– Gene clustering
• Metabolomics
– Find major metabolism regulators
– Combined analysis with gene expression
• Proteomics
– Mass-spec protein level
– Finding major kinases/phosphatases for phosphoprofiles
Data model in ResNet database
Use different networks for different types of experimental data
Relation types:
• Expression
• PromoterBinding
• DirectRegulation
• ProtModification
• Binding
• MolSynthesis
• MolTransport
• Regulation
• … more …
Uses:
• Interpretation of gene expression data
• Interpretation of proteomics data
• Interpretation of metabolomics data, biomarker prediction and validation
Analysis of gene expression microarray data: import and selection of responsive genes
• Data import
– Tab-delimited and Excel files
– Affymetrix CEL files (with RMA normalization)
– GenePix (GPR)
Result: save the experiment in the Expression favorites
• Selection of responsive genes
– Find differentially expressed genes (significance analysis via t-test) for analysis of two samples measured in multiple replicas
– Gene clustering via correlation networks (Pearson correlation)
– Find responsive genes in third-party software for statistical analysis of microarray data and import them as a list (Tools -> Import protein list)
Result: save as a group of genes in the Groups folder
Analysis of gene expression microarray data: Pathway Analysis
• Network analysis
– Identification of differentially expressed protein complexes and physical networks
  • Build pathway: Find direct regulation, filter for physical interactions (Binding, DirectRegulation, ProtModification)
  • Build differentially expressed networks, filter by Binding (PS Enterprise only)
– Identification of major regulators and targets in expression network
  • Build pathway: Find direct regulation, filter for Expression and/or PromoterBinding interactions, use hierarchical layout
  • Find significant regulators (network enrichment analysis), filter by Expression, PromoterBinding (PS Enterprise only)
Result: save as pathway
• Functional analysis
– Find groups/pathways
  • Gene ontology analysis
  • Comparative gene ontology analysis
– Build pathway: Find common targets, filter by CellProcess
– Find DE groups/pathways (Gene Set Enrichment Analysis, GSEA)
Result: list of groups/pathways with p-values indicating statistical significance of differential expression. Save as a group, as analysis results, or export to Excel
Most common workflow for microarray analysis in Pathway Studio for disease
• Identify genes differentially expressed in the disease (DE genes)
• Identify genes known from the literature to be associated with the disease, using Pathway Studio
• Identify DE genes that are linked to known disease genes, using Pathway Studio
• Report novel disease genes
Expression Data Import wizard
• Generic tab-delimited format
– Import any expression data matrix containing expression values and/or p-values. Minimum requirement: one column with gene identifiers and one sample column
• Import of Affymetrix CEL (RMA averaging)
• Import of Molecular Devices GenePix format with Vera & Sam normalization
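The minimum requirement for the generic format (one identifier column plus at least one sample column) can be checked with a few lines of parsing. This sketch assumes the first row is a header and the first column holds the gene identifiers; the function name and the sample values are made up for illustration and say nothing about how the import wizard itself parses files.

```python
import csv
import io


def read_expression_matrix(handle):
    """Parse a tab-delimited matrix: identifier column first, then sample columns."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if len(header) < 2:
        raise ValueError("need one identifier column and at least one sample column")
    samples = header[1:]
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        matrix[row[0]] = [float(value) for value in row[1:]]
    return samples, matrix


# Hypothetical two-sample file with made-up values.
text = "Gene\tControl\tTreated\nTP53\t7.1\t9.4\nMYC\t8.0\t6.5\n"
samples, matrix = read_expression_matrix(io.StringIO(text))
print(samples, matrix)
```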
Expression experiment viewer in Pathway Studio
• Experiment properties
• Gene identifier column: views, sorting, find
• Heat map scale
• Filter genes by value
• Filter genes by pathway
• Text view for expression matrix
• Create group from selection
Finding differentially expressed genes in Pathway Studio (significance analysis): Two-sample t-test = Between groups t-test
Finds genes that are differentially expressed between two classes of samples measured independently on single-color microarrays.
Examples:
• multiple replicas of one untreated sample (1) and multiple replicas of one treated sample (2)
• multiple replicas of one normal sample (1) and multiple replicas of one disease sample (2)
Calculated p-values indicate the significance of the expression difference between replicas marked 1 and replicas marked 2.
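For one gene, the comparison of replicas marked 1 and replicas marked 2 can be sketched as follows. The slide does not state whether variances are pooled, so the use of Welch's unequal-variance form is an assumption, as are the function name and the example values. The sketch stops at the t statistic and degrees of freedom; turning them into a p-value needs a t-distribution from a statistics library.

```python
from statistics import mean, variance


def welch_t(values, labels):
    """Welch's t statistic and degrees of freedom for classes labelled 1 and 2."""
    group1 = [v for v, label in zip(values, labels) if label == 1]
    group2 = [v for v, label in zip(values, labels) if label == 2]
    # Squared standard error of each class mean.
    se1 = variance(group1) / len(group1)
    se2 = variance(group2) / len(group2)
    t = (mean(group2) - mean(group1)) / (se1 + se2) ** 0.5
    df = (se1 + se2) ** 2 / (
        se1 ** 2 / (len(group1) - 1) + se2 ** 2 / (len(group2) - 1)
    )
    return t, df


# Hypothetical expression values for one gene: three replicas per class.
labels = [1, 1, 1, 2, 2, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
t, df = welch_t(values, labels)
print(t, df)
```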
Finding differentially expressed genes in Pathway Studio (significance analysis): Paired samples t-test, usually for two channel microarray platform
Finds genes that are differentially expressed between two classes of samples when the comparison is performed within one experiment (two-color or two-channel microarray) but repeated multiple times. The first class is marked by a positive integer, and the corresponding sample from the second class, measured on the same array, is marked by the negative integer with the same absolute value. Calculated p-values indicate the significance of the expression difference between the two sample classes.
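The signed-integer labelling described above can be turned into pairs mechanically: label k and label -k come from the same array. The sketch below computes the paired t statistic for one gene under that convention; the function name, the example values and the choice to subtract the second class from the first are assumptions.

```python
from statistics import mean, stdev


def paired_t(values, labels):
    """Paired t statistic; label k pairs with label -k from the same array."""
    by_label = dict(zip(labels, values))
    diffs = [by_label[k] - by_label[-k] for k in sorted(by_label) if k > 0]
    t = mean(diffs) / (stdev(diffs) / len(diffs) ** 0.5)
    return t, len(diffs) - 1  # statistic and degrees of freedom


# Hypothetical values for one gene measured on three two-channel arrays.
labels = [1, -1, 2, -2, 3, -3]
values = [5.0, 4.0, 7.0, 5.0, 9.0, 6.0]
t, df = paired_t(values, labels)
print(t, df)
```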
Finding differentially expressed genes in Pathway Studio (significance analysis): DE genes in multiple experimental log ratio samples
If you have imported your data as pre-calculated log ratios of normalized expression values, use this test to find differentially expressed genes across multiple replicas. Calculated p-values indicate how far the ratio of a given gene deviates from the global mean of ratios across all genes and samples.
Gene expression clustering using Relevance network
Expression -> Build network from expression -> Pearson correlation
Parameters for Pearson correlation
Major parameters:
• Percent of genes to remove – removes less variable genes. Controls the number of vertices in the graph. Keep the number of proteins in the network under 1000.
• Threshold – allows correlation links above the threshold. Controls the number of edges in the graph.
• Number of permutations – turns on automatic Threshold calculation using randomized expression samples.
• P-value – selects the most non-random correlation links. Controls the number of edges in the graph. Value 0.01 corresponds to 10% of all possible links, where the number of all possible links equals (number of vertices)^2.
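How the first two parameters shape the graph can be seen in a small sketch. It is not the Pathway Studio implementation: the function names and expression values are hypothetical, the use of the absolute correlation is an assumption, and the permutation and P-value options are left out.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (neither may be constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def relevance_network(expr, percent_remove, threshold):
    """Drop the least variable genes, then link pairs with |r| >= threshold."""
    ranked = sorted(expr, key=lambda g: pvariance(expr[g]))  # least variable first
    n_drop = int(len(ranked) * percent_remove / 100)
    kept = ranked[n_drop:]  # vertices of the graph
    edges = [(a, b) for a, b in combinations(sorted(kept), 2)
             if abs(pearson(expr[a], expr[b])) >= threshold]
    return kept, edges

# Hypothetical expression profiles over four samples.
expr = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [4.0, 3.0, 2.0, 1.0],
    "D": [5.0, 5.1, 5.0, 5.1],  # nearly flat, removed by the variance filter
    "E": [1.0, 3.0, 1.0, 3.0],  # variable but weakly correlated with the rest
}
kept, edges = relevance_network(expr, percent_remove=20, threshold=0.9)
```

Raising `percent_remove` shrinks the vertex set, while raising `threshold` thins out the edges among the vertices that remain.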
Finding upstream regulator for a gene cluster using Build pathway option Find common regulators
Finding major transcription regulators among differentially expressed genes
Use Build pathway tool option Find direct interactions with filtering for PromoterBinding and Expression to reduce the complexity of your differential expression pattern
Build pathway filter stringencies
Gene Expression:
• Promoter Binding > Expression > Regulation > Co-occurrence
• Protein > Complex > Functional Class
Metabolomics:
• MolSynthesis > Regulation
Proteomics:
• Direct Regulation > ProtModification > Binding > Regulation
• Protein > Complex > Functional Class
Questionnaire Day 1
• What is the quickest Entity search in Pathway Studio?
• What is the most comprehensive Entity search in Pathway Studio?
• How to create a group in Pathway Studio and add entities to it?
• How to Build pathway from the up-regulated genes in your microarray experiment?
Workflow 1 Build pathway for EDG regulation
• Using the GenMAPP pathway as a guide, build the EDG1 pathway in Pathway Studio:
– Find proteins for EDG1 pathway
– Find relations for EDG1 pathway
– Create additional relations missing from ResNet database
– Arrange nodes by cell localization
– Save pathway as HTML for web publication
Workflow 2 Create a pathway containing groups and sub-pathways as nodes.
• Continue building EDG pathway by adding sub-pathways and groups
• Complete the pathway by text-mining search with filtering
Workflow 3 Find drugs regulating kinases
• Find kinases in the database with connectivity >0
– Search by attribute for Functional class = Kinase and Connectivity >0
• Find drugs regulating these kinases
– Expand pathway from kinases with filter by small molecules
– Select drugs in the expanded pathway
– Select neighbors for drugs
– Copy selection into the new pathway
Workflow 4 Find biological processes regulated by proteins involved in prostate cancer
• Find prostate cancer disease node
• Find proteins regulating prostate cancer
• Find cell processes affected by these proteins
• Sort found processes by number of prostate cancer protein regulators
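The last step of this workflow is a count over relations. The sketch below is an assumption-laden illustration, not part of the course: the protein and process names are placeholders and the relation tables are invented. In Pathway Studio the relations would come from the ResNet database.

```python
from collections import Counter

# Placeholder proteins found to regulate the disease node (hypothetical).
disease_regulators = {"P1", "P2", "P3"}

# Placeholder protein -> cell process relations (hypothetical).
protein_to_processes = {
    "P1": {"process_a", "process_b"},
    "P2": {"process_b", "process_c"},
    "P3": {"process_a", "process_b"},
    "P4": {"process_c"},  # not a disease regulator, so it is ignored
}

# Count, for every process, how many disease regulators affect it.
counts = Counter(
    process
    for protein in disease_regulators
    for process in protein_to_processes.get(protein, ())
)

# Most regulated processes first; ties are broken alphabetically.
ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Processes at the top of `ranked` are targeted by the largest number of disease-related proteins.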
Day 2 Advanced workflows in Pathway Studio
Workflow 1. Comparative Gene ontology analysis (Folberg’s experiment)
Import of CEL files
1) Calculation of the differentially expressed genes
2) Creating a group from DE genes
3) Finding statistically significant GO groups
4) Creating a pathway from GO groups
5) Comparing two lists of GO groups
6) Finding DE genes in GO groups
Comparing lists of the differentially expressed GO groups rather than DE genes is more sensitive when comparing the responses in two cell lines, patients and other samples.
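Steps 3 and 5 can be sketched in a few lines. The example assumes a one-sided hypergeometric test for GO group significance; the slides do not state which test Pathway Studio applies, and all names and numbers here are hypothetical.

```python
from math import comb

def enrichment_p(n_total, n_group, n_de, n_overlap):
    """P(overlap >= n_overlap) when n_de genes are drawn at random from n_total
    genes, of which n_group belong to the GO group (hypergeometric tail)."""
    upper = min(n_group, n_de)
    tail = sum(
        comb(n_group, k) * comb(n_total - n_group, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_total, n_de)

# Toy numbers: 10 genes measured, 4 in the GO group, 3 DE genes, 2 of them in the group.
p = enrichment_p(n_total=10, n_group=4, n_de=3, n_overlap=2)

# Step 5: compare two lists of significant GO groups (hypothetical group names).
significant_in_a = {"cell adhesion", "cell cycle"}
significant_in_b = {"cell cycle"}
unique_to_a = significant_in_a - significant_in_b
```

Subtracting the group lists keeps a GO group that is significant in only one sample set even when no single gene in it stands out, which is the sensitivity argument made above.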
Comparing lists of the differentially expressed GO groups rather than DE genes is more sensitive when comparing the responses in two cell lines, patients and other samples
Subtracting groups: 6 genes
Subtracting genes: no significant groups
Two groups of genes differentially expressed during growth in 3D culture vs. flat culture for aggressive and non-aggressive tumors are selected
[Figure: non-aggressive tumors: flat culture vs. 3D culture (no growth); aggressive tumors: flat culture vs. 3D culture (growth)]
1. Genes of interest
2. Groups of interest
Comparative GO group analysis of aggressive vs. non-aggressive uveal melanoma
Open DE GO groups from aggressive tumors
Compare with DE GO groups from non-aggressive tumors
Select GO groups related to your experimental goals (cell adhesion DE groups unique for aggressive tumors)
These groups are significant in aggressive melanoma when we compare its growth in 3D matrix vs. flat culture
These groups are NOT significant in non-aggressive melanoma when we compare its growth in 3D matrix vs. flat culture
A network of genes differentially expressed in aggressive uveal melanoma and involved in cell adhesion
23 SP1 targets among DE genes in cell adhesion network unique for aggressive uveal melanoma during 3D growth
Supportive evidence for SP1 role in melanoma aggressiveness
Workflow 2. Three methods to find biological processes affected by DE genes
1) Find groups from Biological processes Gene Ontology classification
2) Find pathways indicating biological processes
3) Build pathway option Find common targets filtering for Cell Process
Includes:
- Finding proteins using Search by attribute (cell localization) and then determining their biological processes
Workflow 3. Three ways to find biomarkers in Pathway Studio
• By text-mining
– Extract pathways from text: PubMed Search for your Disease
• By data-mining
– Search for disease of interest in the database
– Use Build Pathway: Expand option to find Disease biomarkers
• By gene expression data analysis
– Identify Differentially expressed genes
– Use Build pathway: Direct interaction option to find proteins that are downstream of many DE genes. These proteins are most likely biomarkers according to your expression data (See also next slide)
Workflow 4. Building disease network using Build pathway tool
Includes:
• Finding disease of interest in the database
• Finding proteins contributing to disease
• Finding biomarkers for a disease
• Building disease networks using:
– Build pathway Find direct interactions for proteins regulating disease
– Build pathway Expand pathway for protein biomarkers
– Combining two pathways
– Layout by cell localization
– Text-mining: updates?
Workflow 5. Building pathway by text-mining for Li-Fraumeni syndrome
Includes:
• Creating new local database
• Use of Search PubMed option (Db import)
• Consolidation of the db (db updates / groups)
• Understanding the major protein players in Li-Fraumeni syndrome
• Understanding regulators / targets / cell processes associated with Li-Fraumeni syndrome