Unclassified// Protected Information// Proprietary Information
Building Applications with the MeDICi Integration Framework
[Figure labels: Scientist/Analyst, Miners, Plumbers, Tool Builders]
Ian Gorton, Justin Almquist, Jack Chatterton, Adam Wynne
2
Outline
• What is the MeDICi Integration Framework (MIF)?
• What can the MIF do for you?
• How does it do it?
• What’s available right now and it better be fast …
• How do I get started?
3
What is the MeDICi Integration Framework (MIF)?
• Java-based integration technology
• Component-based API for creating analytical pipelines
  • Asynchronous component model for Java or non-Java (e.g. .exe, C/C++, R, Haskell, etc.) codes (flexible)
  • Components can be distributed or run in the MIF container (scalable)
  • Components communicate over a variety of protocols (e.g. JMS, Web Services, sockets, etc.) (configurable)
  • Non-pipeline architectures supported (e.g. feedback loops, worker pools)
• Built on robust, industry-tested Java technologies
  • Mule (ESB/SOA compliant)
  • JMS (e.g. JBoss, ActiveMQ, SonicMQ)
  • ehcache
4
[Figure: Example analytical pipeline message flow. Labels: Filter, Calc1, Proxy, Merge, Viz, DB Query, Format, Useful Code, Reference Database, Results Database, Data]
5
What can the MIF do for you?
• Provide a common API for designing components
  • Make downstream integration straightforward
  • Make iterative development and integration testing easy
• Make it easy to create applications:
  • using new/legacy components that were not designed to work together
  • that must execute in a distributed environment
• Support flexible deployments
  • Components are loosely-coupled
  • Components can be configured to suit deployment needs
[Figure labels: MIF API (repeated on six components), New]
6
How does it do it?
• Some components execute in the MIF container – Java
• Some execute outside the MIF container – language (who cares)
• MIF containers can be partitioned/replicated
[Figure labels: Filter, Calc1, Proxy, Merge, Viz, DB Query, Format, Useful Code; Distributed Component code; Configurable protocol]
7
Scaling MIF applications
[Figure: the Filter, DB Query, Format, Useful Code pipeline drawn three times. Labels: Replicated MIF, Partitioned MIF]
8
Example: Calculating Functional Overrepresentation Pipeline
[Figure: BRM GUI, BRM Business Logic (JBoss EJB), BRM Database and the MIF Component Pipeline. Pipeline labels:]
• Gene ID List
• Call Cross Reference EJB to collect identifiers for KEGG database
• Call EJB to get Pathways from KEGG DB
• Call EJB to get all Genes in each pathway from KEGG
• Call EJB to Calculate Functional Overrepresentation
• Functional Overrep.
9
Component Composition in MIF
[Figure labels: Single Module; Simple Pipeline; Component; Processing Module (Implementation Class, Inbound Endpoint, Outbound Endpoint); Data; MIF Component]
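The composition on this slide (a processing module wraps an implementation class between an inbound and an outbound endpoint, and modules chain into a simple pipeline) can be sketched in plain Java. This is only an illustration of the idea, not the MIF API: `SketchModule` and `CompositionSketch` are hypothetical names, and in-memory queues stand in for real endpoints such as JMS or VM queues.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical stand-in for a processing module (NOT the MIF API):
// an implementation function sitting between two queue "endpoints".
final class SketchModule<I, O> {
    private final BlockingQueue<I> inbound;
    private final Function<I, O> implementation;
    private final BlockingQueue<O> outbound;

    SketchModule(BlockingQueue<I> inbound, Function<I, O> implementation,
                 BlockingQueue<O> outbound) {
        this.inbound = inbound;
        this.implementation = implementation;
        this.outbound = outbound;
    }

    // Move one message from the inbound endpoint to the outbound endpoint.
    // Returns false when there was nothing to process.
    boolean processOne() {
        I message = inbound.poll();
        if (message == null) {
            return false;
        }
        outbound.add(implementation.apply(message));
        return true;
    }
}

final class CompositionSketch {
    // Simple pipeline: the two modules share the middle queue, so the outbound
    // endpoint of the first module is the inbound endpoint of the second.
    static Integer runSimplePipeline(String data) {
        BlockingQueue<String> in = new LinkedBlockingQueue<>();
        BlockingQueue<String> middle = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> out = new LinkedBlockingQueue<>();

        SketchModule<String, String> trim = new SketchModule<>(in, String::trim, middle);
        SketchModule<String, Integer> length = new SketchModule<>(middle, String::length, out);

        in.add(data);          // data arrives on the pipeline's inbound endpoint
        trim.processOne();     // first module
        length.processOne();   // second module
        return out.poll();     // result leaves on the pipeline's outbound endpoint
    }
}
```

In this sketch the modules know nothing about each other; they are wired together only through the queue they share, which is the loose coupling the endpoints are meant to provide.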
10
Chat Traffic Analysis Example
• A “real-world” example application
• Analysis of chat messages
• Utilizes many MIF constructs:
  • Pipeline
  • Components
  • Modules
  • Aggregators
  • Routing
  • Endpoints
  • Package structure
11
Chat Traffic Analysis Model
12
Chat Example Code - Main

Create a pipeline

    MifPipeline pipeline = new MifPipeline();

Set up the pipeline endpoints (input & output to application)

    MifEndpoint inEndp = pipeline.addMifEndpoint("inEndp", EndpointType.JMS, "topic/ChatDataTopic");
    MifEndpoint outEndp = pipeline.addMifEndpoint("outEndp", EndpointType.STREAM, "console.out?outputMessage=CHAT RESULT: ");

Wire the pipeline and start listening for messages

    Map<String, MifEndpoint> endps = new HashMap<String, MifEndpoint>();
    endps.put("chat-in", inEndp);
    endps.put("chat-out", outEndp);
    pipeline.addMifComponent(new ChatComponent("ChatComponent", endps));
    pipeline.start();
13
Chat Example Code - Component

Get the input/output endpoints

    MifEndpoint inChatEndp = getEndpoint("chat-in");
    MifEndpoint outChatEndp = getEndpoint("chat-out");

Ingest (subset)

    // construct the ingest module
    MifEndpoint outIngestKeywordEndp = pipeline.addMifEndpoint("outIngestKeywordEndp", EndpointType.VM, "ingest.keyword.queue");
    MifModule ingestModule = new MifModule("IngestModule", Ingest.class.getName(), inChatEndp, outIngestKeywordEndp, null);
    // add the module to the pipeline
    pipeline.addMifModule(ingestModule);

Keyword

    MifEndpoint inKeywordEndp = pipeline.addMifEndpoint("inKeywordEndp", EndpointType.VM, "ingest.keyword.queue");
    MifEndpoint outKeywordEndp = pipeline.addMifEndpoint("outKeywordEndp", EndpointType.VM, "keyword.queue");
    pipeline.addMifModule("KeywordModule", Keyword.class.getName(), inKeywordEndp, outKeywordEndp, null);
14
Chat Example Code – Component (continued)

Get the input/output endpoints

    MifEndpoint inKeywordAggEndp = pipeline.addMifEndpoint("inKeywordAggEndp", EndpointType.VM, "keyword.queue");

    // create the aggregator module which is just a place holder for the actual
    // aggregator construct. Note that this is the final module in the component
    // so the outbound endpoint is one specified outside the component (outChatEndp).
    MifModule chatAggregateModule = new MifModule("AggregateModule", ChatAggregate.class.getName(), inKeywordAggEndp, outChatEndp, null);

    // Add the aggregator to the pipeline and assign it to the module itself
    MifAggregator chatAnalysisAggregator = pipeline.addMifAggregator(new ChatAnalysisAggregator());
    chatAggregateModule.setAggregator(chatAnalysisAggregator);

    // finally, add the module to the pipeline and we're done configuring the component.
    pipeline.addMifModule(chatAggregateModule);
15
Chat Example Code – Processing Module

Blackout.java

    public class Blackout implements MifInOutProcessor {
        Logger log = Logger.getLogger(Blackout.class);
        private String pathToBlackoutFile = "blackout.txt";
        private static BlackoutId blackout = null;

        public Blackout() {
            initBlackout();
        }

        public Serializable listen(Serializable input) {
            MapWrapper data = (MapWrapper) input;
            HashMap message = data.getMap();
            if (blackout != null) {
                // Delegate to "real" implementation
                blackout.processContentAnalysis(message);
            }
            return new MapWrapper(message);
        }
        … … …
    }
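Because a processing module of this kind is a thin adapter around the real implementation, the same pattern can be exercised without a running container. The sketch below illustrates that with stand-in types only: `SketchProcessor`, `SketchKeywordCounter` and `SketchKeywordModule` are hypothetical and are not classes from MIF or from the chat example; only the `listen(Serializable)` shape is taken from the slide.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Locale;

// Hypothetical stand-in that mirrors the listen(Serializable) shape on the slide.
interface SketchProcessor {
    Serializable listen(Serializable input);
}

// The "real" implementation: plain Java with no framework dependency.
final class SketchKeywordCounter {
    int count(String text, String keyword) {
        int hits = 0;
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.equals(keyword)) {
                hits++;
            }
        }
        return hits;
    }
}

// Thin adapter: unpack the message, delegate, repack the result.
final class SketchKeywordModule implements SketchProcessor {
    private final SketchKeywordCounter counter = new SketchKeywordCounter();

    @Override
    public Serializable listen(Serializable input) {
        @SuppressWarnings("unchecked")
        HashMap<String, Object> message = (HashMap<String, Object>) input;
        // Copy the incoming message so the caller's map is left untouched.
        HashMap<String, Object> result = new HashMap<>(message);
        result.put("hits", counter.count((String) message.get("text"), "mif"));
        return result;
    }
}
```

Calling `new SketchKeywordModule().listen(message)` directly from a test, with a `HashMap` that has a `"text"` entry, is enough to check the module's behavior before it is wired to any endpoint.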
16
What’s available right now?
• The MIF API
  • Used/tested in several DICI projects
• Documentation
• Hooks for connecting to our provenance technology …
17
Capturing Provenance
• Metadata about workflows
  • What processes ran
  • What data we used in each step
• MIF API has extensions to communicate provenance data
  • Asynchronous JMS events
• Current implementation captures raw in/out data
  • Useful but not scalable
  • Designing a data virtualization layer to support refs from provenance to real data
[Figure: PNNL Provenance Architecture]
18
Using Provenance
19
and it better be fast …
• And of course scalable
• So we created a benchmark
  • A friction test
  • A measure of ‘middleware’ overhead
[Figure labels: Load Generator, JMS, MIF Container (Splitter, Component1, Component2, Aggregator), JMS, Results Collector]
Load Generator sends messages to both JMS queues at some known rate (e.g. 100 per second?).
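A friction test in this sense measures what the plumbing itself costs when the components do no real work. As a rough single-JVM illustration (this is not the benchmark on the slide: `FrictionSketch` is a hypothetical name, in-memory queues replace JMS, and there is no MIF container, splitter or aggregator), one can push messages through a pass-through stage and report the observed messages per second:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class FrictionSketch {
    // Push `count` messages of `size` bytes through a pass-through stage that
    // runs on its own thread, and return the observed messages/second.
    static double messagesPerSecond(int count, int size) {
        BlockingQueue<byte[]> in = new LinkedBlockingQueue<>();
        BlockingQueue<byte[]> out = new LinkedBlockingQueue<>();

        // The stage does no work: it only hands each message on.
        Thread stage = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    out.put(in.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        long start = System.nanoTime();
        stage.start();
        try {
            for (int i = 0; i < count; i++) {
                in.put(new byte[size]);   // plays the load generator
            }
            for (int i = 0; i < count; i++) {
                out.take();               // plays the results collector
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        }
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        return count * 1_000_000_000.0 / elapsedNanos;
    }
}
```

The number this returns depends entirely on the machine and JVM, so it says nothing about MIF itself; the point is only the shape of the measurement: a known load in, a no-op stage, and a count of what comes out per unit time.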
20
But you can trust us – we’re scientists
[Chart: Throughput - Messages/second. X axis: Message Size (1024, 16384, 32768, 65536, 131072, 262144). Y axis: Messages/sec (0 to 14000). Series: 1 to 7 servers]
[Chart: Throughput - GBytes/day. X axis: Message Size (1024, 16384, 32768, 65536, 131072, 262144). Y axis: GBytes (0 to 6000). Series: 1 to 7 servers]
• 1650 m/sec for 1K messages
  • Scales linearly to 7 servers
• Peak throughput of 5.4 TB/day for 128K messages on 2 servers
  • that rate swamped the cluster switch – hardware limitation!
  • 290 m/sec on 1 server (3.3 TB/day throughput)
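The TB/day figures follow from the message rate and the message size. As a quick check, assuming 128K means 131072 bytes and TB means 10^12 bytes (both are assumptions; the slide does not define its units), the single-server rate can be converted as follows. `ThroughputSketch` is a hypothetical helper name.

```java
final class ThroughputSketch {
    private static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Bytes moved per day at a steady message rate.
    static long bytesPerDay(long messagesPerSecond, long messageSizeBytes) {
        return messagesPerSecond * messageSizeBytes * SECONDS_PER_DAY;
    }

    // The same figure in decimal terabytes (1 TB = 10^12 bytes, an assumption).
    static double terabytesPerDay(long messagesPerSecond, long messageSizeBytes) {
        return bytesPerDay(messagesPerSecond, messageSizeBytes) / 1e12;
    }
}
```

`ThroughputSketch.terabytesPerDay(290, 131_072)` evaluates to about 3.28, which rounds to the 3.3 TB/day quoted for one server.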
Grove specs
• 9 nodes
• All connected via a single 1Gb switch
• Hardware
  • 1 Dell 2850 connected to RAID
  • 8 Dell 1850
  • Dual Intel Xeon processors (hyperthreaded) @ 3.0 GHz
  • 4GB RAM
  • 1 RAID @ ~5TB
• Software
  • Red Hat Enterprise Linux 4
  • Linux kernel 2.6.9-55.0.2.ELsmp
  • SonicMQ 7.5
  • java version "1.6.0_03"
21
How do I get started?
• We have a wiki - medici.pnl.gov/wiki
  • API docs and installation guide
  • Examples
  • Design and programming guidelines
  • More being added every day :-}
• And we’re available to help
  • Initial adoption/design
  • Support
  • ‘Consulting’
22
And finally - the MeDICi ‘Vision’
[Figure labels: Biologist; Design; Execution; BPEL Designer; Data; Analysis; Visualization; MIF Pipeline Builder; Analysis Component Designer; MIF Component Catalog; BPEL Generator; Model-driven code generation; <BPEL script> … </BPEL script>; BPEL Engine Deployment Node; MIF Deployment Node; Web Services; Provenance Store]
23
That’s all folks!
• We believe that the MIF can:
  • Help you deliver high quality solutions to clients
    • Faster, cheaper, especially for ‘integration’ projects
  • Help you easily leverage other internal/external codes in your solutions
  • Give us a ‘lingua franca’ – a step towards wide-scale component reuse
• But we’re just humble plumbers …
  • We need application partners to deliver to clients
    • You take the kudos, we write invisible plumbing sat in dark corners …
  • We need feedback on how to improve the technology