1
Cytoscape Cyberinfrastructure
Leveraging Microservices to the Cloud and Beyond
Chapter 1
ISMB/NetBioSIG 2015, Dublin, Ireland, July 10, 2015
Keiichiro Ono, Dexter Pratt & Barry Demchak, Ideker Lab
2
Cytoscape’s 3 Wishes
More …
  memory for networks
  cores for analysis
  code reusability
  languages/libraries for coding
Better browser presence
Access to long-running calculations
Quicker/cheaper novel workflows
Higher quality, more shareable code
Even more vibrant NB community
3
Cytoscape Cyberinfrastructure (CI)
Internet-based computing ecosystem that
  Complements Cytoscape
  Supports producers, consumers, and operators (as COIs)
  Scales and evolves to support data acquisition, computing, storage, management, integration, mining, and visualization
  Sharable and Testable
  Coevolution: community w/ CI / community
Service Oriented Architecture (SOA)
  Microservices + data bus + discovery
4
Roadmap
Existing Ecosystem
Cyberinfrastructure (CI) & Network Biology
  Use Cases
  Strategy
Technology
  SOA & REST
  CX & middleware
CI Now & Later
Support
Call to Community
5
Existing Ecosystem
Visual Workflow Systems - Taverna & Galaxy (& MyExperiment)
Service Repositories - BioCatalogue
General programming languages & tools - Python, R, Java, Matlab, IPython/Jupyter
Network Analysis & Visualization - Cytoscape, cytoscape.js, GeneMANIA
Cytoscape CI
  shared workflows
  shared services
  novel workflows
  scalable
6
CI & Network Biology
[Diagram: a network biology workflow with the steps Identify network, Add data to network, Layout nodes, Color nodes, Publish. Two groups are labeled Clients and Services; the service boxes shown are BridgeDB and four boxes labeled "New Service".]
Critical CI Outcomes
  Cheap services ~ innovation
  Reproducible workflows
  Interoperable tool chains
  Code & algorithm reusability
  Community, community, community
7
CI - Future of Publishing
8
NAV – CI-based Workflow
9
Generic Microservices
[Diagram: two sequence diagrams along a Time axis. Direct: Producer sends StoreData(xxx) to Database; Database replies OK. Via Message Bus: Producer sends StoreData(xxx) over the Message Bus to Database; Database replies OK.]
For a service, the meaning of life: y = f(x)
Benefits
  Loose Coupling
  Late Binding
  Decentralized Governance
  Scalability
  Reusability
  Distributability
  Portability
  Composability
  Interoperability
  Testability
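The message-bus variant of the StoreData exchange can be sketched in a few lines of Python. This is a minimal in-process illustration, not the CI implementation: the `MessageBus` class, the topic names, and the `store_data` helper are all hypothetical, and a `queue.Queue` stands in for a real bus. It shows the loose coupling listed above: the producer addresses a topic, never the database itself.

```python
import queue
import threading


class MessageBus:
    """Toy in-process bus (hypothetical): one queue per named topic."""

    def __init__(self):
        self._topics = {}
        self._lock = threading.Lock()

    def topic(self, name):
        with self._lock:
            return self._topics.setdefault(name, queue.Queue())


STOP = object()  # sentinel that tells the service to shut down


def database_service(bus, store):
    """Service side: y = f(x), where x is a StoreData message and y is 'OK'."""
    requests = bus.topic("StoreData")
    while True:
        message = requests.get()
        if message is STOP:
            break
        payload, reply_to = message
        store.append(payload)
        bus.topic(reply_to).put("OK")


def store_data(bus, payload, reply_to="producer-replies"):
    """Producer side: publish a request, then wait for the reply."""
    bus.topic("StoreData").put((payload, reply_to))
    return bus.topic(reply_to).get(timeout=5)


bus = MessageBus()
stored = []
worker = threading.Thread(target=database_service, args=(bus, stored), daemon=True)
worker.start()

reply = store_data(bus, "xxx")
bus.topic("StoreData").put(STOP)
worker.join(timeout=5)
print(reply, stored)
```

Because the producer only knows the topic name, the database service could be replaced, moved, or replicated without touching producer code.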
10
Cytoscape CI
[Diagram: Applications and Services connected by a Message Bus (Internet).]
  Labels: Cytoscape Desktop, Journal Publishing, Personal Publishing, R/Python/Matlab, cyNetShare, NDEx (Store/Retrieve), NeXO, Analytics, Layout, GeneMANIA, BridgeDB, MCODE, Data Model, Layouts
CX is an aspect-oriented transfer format
CX carries networks and related data
11
CX Transfer Format
[Diagram: Example Graph with nodes 1, 2, 3 and edges 4, 5, broken out into aspects.]
nodes Aspect (3 nodes)
  ID=1
  ID=2
  ID=3
edges Aspect (2 edges)
  ID=4 (Source, Target)
  ID=5 (Source, Target)
cartesianLayout Aspect
  ID, X=100 Y=100
  ID, X=200 Y=200
  ID, X=100 Y=200
Aspect Relationships (arrow labels): Organizes, Positions
CX Encoding
Benefits
  Streamable (large networks)
  Lossless (BioPAX, SGML, OpenBEL…)
  Extensible (new aspects)
  Mature parsers (JSON)
  JSON-LD (RDF compatible)
  Purpose-optimized transfers (aspects)
  Community, community, community
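To make "purpose-optimized transfers" concrete, here is a hedged Python sketch that splits the example graph into aspects and serializes only the aspects a receiver asks for. The JSON shape copies the "CX in Action" slide; the helper names (`ref`, `build_aspects`, `encode`) are hypothetical, the edge endpoints are taken from that slide, and assigning the three layout positions to nodes 1, 2, 3 in order is an assumption.

```python
import json


def ref(identifier):
    """Format an ID the way the 'CX in Action' slide does, e.g. '_:1'."""
    return f"_:{identifier}"


def build_aspects(nodes, edges, positions):
    """Split one graph into independent aspects keyed by aspect name."""
    return {
        "nodes": [{"@id": ref(n)} for n in nodes],
        "edges": [
            {"@id": ref(e), "source": ref(s), "target": ref(t)}
            for e, (s, t) in edges.items()
        ],
        "cartesianLayout": [
            {"node": ref(n), "x": str(x), "y": str(y)}
            for n, (x, y) in positions.items()
        ],
    }


def encode(aspects, wanted):
    """Serialize only the requested aspects, one fragment per aspect."""
    return json.dumps([{name: aspects[name]} for name in wanted])


# Example graph from the slide: nodes 1-3, edges 4 and 5.
aspects = build_aspects(
    nodes=[1, 2, 3],
    edges={4: (1, 2), 5: (2, 3)},
    positions={1: (100, 100), 2: (200, 200), 3: (100, 200)},  # assumed node order
)

full = encode(aspects, ["nodes", "edges", "cartesianLayout"])
layout_only = encode(aspects, ["cartesianLayout"])
print(len(full), len(layout_only))
```

A layout service that only moves nodes could return `layout_only` rather than the whole network, which is the point of keeping aspects separate.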
12
CX in Action
[
{"nodes": [{"@id": "_:1"}, {"@id": "_:2"}]},
{"edges": [{"source": "_:1", "@id": "_:4", "target": "_:2"}]},
{"cartesianLayout": [{"x": "100", "node": "_:1", "y": "100"}]},
{"cartesianLayout": [{"x": "200", "node": "_:2", "y": "300"}]},
{"nodes": [{"@id": "_:3"}]},
{"edges": [{"source": "_:2", "@id": "_:5", "target": "_:3"}]},
{"cartesianLayout": [{"x": "100", "node": "_:3", "y": "200"}]}
]
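The listing above interleaves fragments of the same aspect, so a reader has to merge them by aspect name. A minimal Python sketch of that step follows; `collect_aspects` is a hypothetical helper, not part of any CX library, and for brevity it loads the whole document with `json.loads` where a streaming parser would handle one fragment at a time.

```python
import json
from collections import defaultdict

CX_TEXT = """
[
{"nodes": [{"@id": "_:1"}, {"@id": "_:2"}]},
{"edges": [{"source": "_:1", "@id": "_:4", "target": "_:2"}]},
{"cartesianLayout": [{"x": "100", "node": "_:1", "y": "100"}]},
{"cartesianLayout": [{"x": "200", "node": "_:2", "y": "300"}]},
{"nodes": [{"@id": "_:3"}]},
{"edges": [{"source": "_:2", "@id": "_:5", "target": "_:3"}]},
{"cartesianLayout": [{"x": "100", "node": "_:3", "y": "200"}]}
]
"""


def collect_aspects(fragments):
    """Merge aspect fragments, in arrival order, into one list per aspect."""
    merged = defaultdict(list)
    for fragment in fragments:
        for aspect_name, elements in fragment.items():
            merged[aspect_name].extend(elements)
    return dict(merged)


aspects = collect_aspects(json.loads(CX_TEXT))

node_ids = [node["@id"] for node in aspects["nodes"]]
edge_pairs = [(edge["source"], edge["target"]) for edge in aspects["edges"]]
layout = {
    point["node"]: (float(point["x"]), float(point["y"]))
    for point in aspects["cartesianLayout"]
}
print(node_ids, edge_pairs, layout)
```

A consumer that does not understand an aspect can skip its fragments without failing, which is what makes new aspects cheap to add.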
13
API Perspective - Simple
[Diagram: Client (with CX Library) and Service (with CX Library) connected over REST. Service call (w/CX) goes from Client to Service; Results return (w/CX) comes back.]
Long-running jobs require long-running clients
Allows only one service at a time
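The simple pattern is one blocking HTTP round trip: the client posts CX and waits for CX. The Python sketch below runs a stand-in service on a local port so the example is self-contained. The `LayoutService` handler, its `/layout` path, and its "layout" (nodes placed on a horizontal line) are all hypothetical, not an actual CI service.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class LayoutService(BaseHTTPRequestHandler):
    """Hypothetical service: accepts CX, returns a cartesianLayout aspect."""

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        fragments = json.loads(self.rfile.read(length))
        node_ids = [n["@id"] for f in fragments for n in f.get("nodes", [])]
        # Stand-in for a real layout algorithm: a horizontal line.
        layout = [
            {"node": node_id, "x": str(100 * index), "y": "0"}
            for index, node_id in enumerate(node_ids)
        ]
        body = json.dumps([{"cartesianLayout": layout}]).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def call_service(url, cx):
    """Client side: blocks until the service has produced its result."""
    request = urllib.request.Request(
        url,
        data=json.dumps(cx).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Bypass any configured proxy so the loopback call stays local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=10) as response:
        return json.loads(response.read())


server = HTTPServer(("127.0.0.1", 0), LayoutService)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/layout"
result = call_service(url, [{"nodes": [{"@id": "_:1"}, {"@id": "_:2"}]}])
server.shutdown()
server.server_close()
print(result)
```

`call_service` cannot return until the service finishes, so a job that runs for hours ties up the client for hours. That is the first limitation listed above.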
14
API Perspective - Elaborated
[Diagram: asynchronous service architecture.]
  Components: Client (with CX Library), Submit Agent, Load Balancer, Message Broker, Nodes (each with a Service, an Interface, and a CX Library), Results Collector, Results Database, Status Monitor, Monitor Database
  Job states: Queued, Running, Complete
  REST calls: Service call (w/CX) and Service return (jobID); Status call (jobID) and Status return; Results call (jobID) and Results return (w/CX)
  MQ calls: Service call (w/CX)
  Other labeled flows: Save results; Query status (jobID)
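The submit / status / results cycle in this architecture can be sketched in-process. This Python example is an illustration under stated assumptions, not the CI middleware: `JobSystem` is hypothetical, a `queue.Queue` stands in for the message broker, two dictionaries stand in for the monitor and results databases, and `count_nodes` with its `nodeCount` aspect is a made-up service.

```python
import queue
import threading
import time
import uuid


class JobSystem:
    """Hypothetical stand-in for submit agent, broker, node, and monitors."""

    def __init__(self, service):
        self._service = service
        self._broker = queue.Queue()  # stands in for the message broker
        self._status = {}             # stands in for the monitor database
        self._results = {}            # stands in for the results database
        self._lock = threading.Lock()
        threading.Thread(target=self._node, daemon=True).start()

    def submit(self, cx):
        """Service call (w/CX): returns a jobID immediately."""
        job_id = str(uuid.uuid4())
        with self._lock:
            self._status[job_id] = "Queued"
        self._broker.put((job_id, cx))
        return job_id

    def status(self, job_id):
        """Status call (jobID)."""
        with self._lock:
            return self._status[job_id]

    def results(self, job_id):
        """Results call (jobID): returns CX once the job is Complete."""
        with self._lock:
            return self._results[job_id]

    def _node(self):
        """Worker loop: take a job from the broker, run it, save results."""
        while True:
            job_id, cx = self._broker.get()
            with self._lock:
                self._status[job_id] = "Running"
            output = self._service(cx)
            with self._lock:
                self._results[job_id] = output
                self._status[job_id] = "Complete"


def count_nodes(cx):
    """Made-up service: reports how many nodes the CX input carries."""
    total = sum(len(fragment.get("nodes", [])) for fragment in cx)
    return [{"nodeCount": [{"count": total}]}]


jobs = JobSystem(count_nodes)
job_id = jobs.submit([{"nodes": [{"@id": "_:1"}, {"@id": "_:2"}]}])

deadline = time.monotonic() + 5
while jobs.status(job_id) != "Complete" and time.monotonic() < deadline:
    time.sleep(0.01)  # the client is free to do other work here

final_status = jobs.status(job_id)
result = jobs.results(job_id)
print(final_status, result)
```

Since `submit` returns a jobID right away, a client can disconnect, start other jobs, and come back later with the jobID. The simple pattern does not allow that.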
15
Implementation Perspective
[Diagram: the elaborated architecture annotated with implementation choices.]
  Client (any language) with CX Library
  Submit Agent (Python Flask)
  Node: Service (any language) with Interface (ZeroMQ) and CX Library
  Results Collector (Python) and Results Database
  Status Monitor (Python) and Monitor Database
  Transports: REST, MQ (ZeroMQ)
16
CI Now
[Diagram: current CI components and the formats connecting them.]
  Components: Cytoscape, cyREST, R / Python / Matlab / C#, cyNetShare, ScienceDirect, Cyrface, NDEx, NAV, App Store, cytoscape.js (shown five times)
  Services: Network Based Stratification, Heat Dissipation, ID Translation (BridgeDB)
  Formats and protocols: XGMML, .cyjs, WS/SOAP
17
CI Later
[Diagram: planned CI components, with CX as the connecting format.]
  Components: Cytoscape, cyREST/CX, R / Python / Matlab / C#, cyNetShare, ScienceDirect, Cyrface, NDEx, NAV, CIAuth, App Store, cytoscape.js (shown five times)
  Services: Network Based Stratification, Heat Dissipation, ID Translation (BridgeDB), Layouts, Clustering (?MCODE?), Network Prediction (?GeneMANIA?), Attribute Merge, Enrichment
  Candidates marked with question marks: ?DREAM?, ?GIANT?, ?Taverna?, ?Galaxy?
  Format: CX (shown four times)
18
CI Later w/Reuse
[Diagram: same component, service, and CX labels as the CI Later slide (slide 17).]
19
Support
National Resource for Network Biology (NRNB)
  Supports software and staging hardware
Pharma & NCI support NDEx
Elsevier
All sources open and on GitHub
20
Call to Community
App authorship: Cytoscape community thrives
  Pride of authorship, listing in App Store
  Tangible realization of useful research
  Valuable workflows for all to use
  Publishable results (e.g., F1000)
CI community inherits all of these! … but also:
  More direct path from algorithm to useful code
  Wider audience
  Easier coding & dissemination
  Better coding practices
  More resources
More [email protected]
21
Reading List
  http://martinfowler.com/articles/microservices.html
  http://home.ndexbio.org/about-ndex-2
  http://idekerlab.github.io/cy-net-share
  Lincoln Stein. Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges. http://www.nature.com/nrg/journal/v9/n9/full/nrg2414.html
  Barry Demchak, et al. PALMS: A Modern Coevolution of Community and Computing Using Policy Driven Development. https://sosa.ucsd.edu/ResearchCentral/view.jsp?id=203
  Stephen Goff, et al. The iPlant collaborative: cyberinfrastructure for plant biology. http://journal.frontiersin.org/article/10.3389/fpls.2011.00034/pdf
22
End of Deck
Backup slides are beyond here
23
Existing Ecosystem
Ratings, in column order: Sharing / Reuse, Novel workflows, Works w/Cy, Scales
Visual Workflow Systems
  Taverna & Galaxy (high level orchestration): ++ - - ++
  MyExperiment (sharing workflows): ++ - - ++
Service Repositories
  BioCatalogue: ++
General programming languages & tools
  Python, R, Java, Matlab, IPython/Jupyter: + ++ + -
Network Analysis & Visualization
  Cytoscape & cytoscape.js: + + ++ -
  GeneMANIA: + - +
Cytoscape Cyberinfrastructure (?): ++ ++ ++ ++
(The BioCatalogue and GeneMANIA rows carry fewer than four marks in the extracted text; which columns they belong to is not recoverable.)
24
CX Timings
Using Human network (18K nodes, 127K edges)
CX output around 150MB
Timings exclude accessing Cytoscape data model; Cytoscape data model increases timings by 2-4x

Aspect      Read (ms)   Write (ms)
Nodes       6           3
Edges       97          51
NodeAttrs   77          58
EdgeAttrs   1289        1077
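A short calculation puts these figures in proportion. It assumes the four aspects listed are the whole transfer, and it reads "increases timings by 2-4x" as multiplying the totals by 2 to 4; both are interpretations, not statements from the slide.

```python
# (read_ms, write_ms) per aspect, copied from the table above.
timings_ms = {
    "Nodes": (6, 3),
    "Edges": (97, 51),
    "NodeAttrs": (77, 58),
    "EdgeAttrs": (1289, 1077),
}

read_total = sum(read for read, _ in timings_ms.values())
write_total = sum(write for _, write in timings_ms.values())

# Fraction of total read time spent on edge attributes.
edge_attr_read_share = timings_ms["EdgeAttrs"][0] / read_total

# Assumed reading of "2-4x": multiply the totals by 2 and by 4.
read_with_model = (2 * read_total, 4 * read_total)
write_with_model = (2 * write_total, 4 * write_total)

print(f"read: {read_total} ms, write: {write_total} ms")
print(f"EdgeAttrs share of read time: {edge_attr_read_share:.0%}")
print(f"with data model: read {read_with_model} ms, write {write_with_model} ms")
```

Under those assumptions, edge attributes dominate both directions, so they are the first place to look when optimizing a transfer.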