
Unclassified// Protected Information// Proprietary Information

Building Applications with the MeDICi Integration Framework

Scientist/Analyst, Miners, Plumbers, Tool Builders

Ian Gorton, Justin Almquist, Jack Chatterton, Adam Wynne

Outline

• What is the MeDICi Integration Framework (MIF)?
• What can the MIF do for you?
• How does it do it?
• What’s available right now, and it better be fast …
• How do I get started?

What is the MeDICi Integration Framework (MIF)?

• Java-based integration technology
• Component-based API for creating analytical pipelines
  • Asynchronous component model for Java or non-Java (e.g. .exe, C/C++, R, Haskell, etc.) codes (flexible)
  • Components can be distributed or run in the MIF container (scalable)
  • Components communicate over a variety of protocols (e.g. JMS, Web Services, sockets, etc.) (configurable)
  • Non-pipeline architectures supported (e.g. feedback loops, worker pools)
• Built on robust, industry-tested Java technologies
  • Mule (ESB/SOA compliant)
  • JMS (e.g. JBoss, ActiveMQ, SonicMQ)
  • Ehcache
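The asynchronous, queue-connected component model described above can be sketched in plain Java. This is a minimal illustration and not MIF code: the class name `QueuePipelineSketch`, the `EOF` marker, and the two toy stages are assumptions, and in-memory `BlockingQueue`s stand in for the JMS, Web Service, or socket endpoints the slide mentions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.UnaryOperator;

// Hypothetical sketch: NOT the MIF API. Each stage runs on its own thread and
// talks to its neighbours only through queues.
class QueuePipelineSketch {
    // Assumed end-of-stream marker so the stage threads can shut down cleanly.
    static final String EOF = "__EOF__";

    // Start one asynchronous stage that applies `work` to every message.
    static Thread stage(BlockingQueue<String> in, BlockingQueue<String> out,
                        UnaryOperator<String> work) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    String msg = in.take();
                    if (msg.equals(EOF)) {
                        out.put(EOF); // pass the marker downstream, then stop
                        return;
                    }
                    out.put(work.apply(msg));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Wire two stages together, feed the inputs, and collect the results.
    static List<String> run(List<String> inputs) {
        BlockingQueue<String> source = new LinkedBlockingQueue<>();
        BlockingQueue<String> middle = new LinkedBlockingQueue<>();
        BlockingQueue<String> sink = new LinkedBlockingQueue<>();
        stage(source, middle, String::trim);
        stage(middle, sink, String::toUpperCase);

        List<String> results = new ArrayList<>();
        try {
            for (String s : inputs) {
                source.put(s);
            }
            source.put(EOF);
            for (String msg = sink.take(); !msg.equals(EOF); msg = sink.take()) {
                results.add(msg);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running the sketch", e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(" hello ", " chat ")));
    }
}
```

Because the stages share nothing but queues, swapping a queue for a network transport would not change the stage code, which is the loose coupling the slide is pointing at.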

[Figure: Example analytical pipeline message flow. Components shown: Filter, Calc1, Proxy, Merge, Viz, DB Query, Format, Useful Code. Data stores shown: Data, Reference Database, Results Database.]

What can the MIF do for you?

• Provide a common API for designing components
  • Make downstream integration straightforward
  • Make iterative development and integration testing easy
• Make it easy to create applications:
  • using new/legacy components that were not designed to work together
  • that must execute in a distributed environment
• Support flexible deployments
  • Components are loosely coupled
  • Components can be configured to suit deployment needs

[Figure: several components, each labeled "MIF API", plus one labeled "New".]

How does it do it?

• Some components execute in the MIF container (Java)
• Some execute outside the MIF container (any language, who cares?)
• MIF containers can be partitioned/replicated

[Figure: the example pipeline (Filter, Calc1, Proxy, Merge, Viz, DB Query, Format, Useful Code), annotated "Distributed component code" and "Configurable protocol".]

Scaling MIF applications

[Figure: the Filter, DB Query, Format, Useful Code pipeline drawn three times, labeled "Replicated MIF" and "Partitioned MIF".]

Example: Calculating Functional Overrepresentation Pipeline

[Figure: BRM GUI, BRM Business Logic (JBoss EJB), BRM Database, and the MIF Component Pipeline. Labels on the flow: Gene ID List, Functional Overrep.]

MIF Component Pipeline:

• Call Cross Reference EJB to collect identifiers for KEGG database
• Call EJB to get Pathways from KEGG DB
• Call EJB to get all Genes in each pathway from KEGG
• Call EJB to Calculate Functional Overrepresentation

Component Composition in MIF

[Figure: two panels, "Single Module" and "Simple Pipeline". Each Processing Module is drawn with an Implementation Class, an Inbound Endpoint, and an Outbound Endpoint; Data passes between the modules inside a MIF Component.]

Chat Traffic Analysis Example

• A “real-world” example application
• Analysis of chat messages
• Utilizes many MIF constructs:
  • Pipeline
  • Components
  • Modules
  • Aggregators
  • Routing
  • Endpoints
  • Package structure

Chat Traffic Analysis Model

Chat Example Code - Main

Create a pipeline

    MifPipeline pipeline = new MifPipeline();

Set up the pipeline endpoints (input & output to application)

    MifEndpoint inEndp = pipeline.addMifEndpoint("inEndp", EndpointType.JMS, "topic/ChatDataTopic");
    MifEndpoint outEndp = pipeline.addMifEndpoint("outEndp", EndpointType.STREAM, "console.out?outputMessage=CHAT RESULT: ");

Wire the pipeline and start listening for messages

    Map<String, MifEndpoint> endps = new HashMap<String, MifEndpoint>();
    endps.put("chat-in", inEndp);
    endps.put("chat-out", outEndp);
    pipeline.addMifComponent(new ChatComponent("ChatComponent", endps));

    pipeline.start();

Chat Example Code - Component

Get the input/output endpoints

    MifEndpoint inChatEndp = getEndpoint("chat-in");
    MifEndpoint outChatEndp = getEndpoint("chat-out");

Ingest (subset)

    // construct the ingest module
    MifEndpoint outIngestKeywordEndp = pipeline.addMifEndpoint("outIngestKeywordEndp", EndpointType.VM, "ingest.keyword.queue");
    MifModule ingestModule = new MifModule("IngestModule", Ingest.class.getName(), inChatEndp, outIngestKeywordEndp, null);
    // add the module to the pipeline
    pipeline.addMifModule(ingestModule);

Keyword

    MifEndpoint inKeywordEndp = pipeline.addMifEndpoint("inKeywordEndp", EndpointType.VM, "ingest.keyword.queue");
    MifEndpoint outKeywordEndp = pipeline.addMifEndpoint("outKeywordEndp", EndpointType.VM, "keyword.queue");
    pipeline.addMifModule("KeywordModule", Keyword.class.getName(), inKeywordEndp, outKeywordEndp, null);

Chat Example Code – Component (continued)

Get the input/output endpoints

    MifEndpoint inKeywordAggEndp = pipeline.addMifEndpoint("inKeywordAggEndp", EndpointType.VM, "keyword.queue");

    // Create the aggregator module, which is just a placeholder for the actual aggregator construct.
    // Note that this is the final module in the component, so the outbound endpoint is the one
    // specified outside the component (outChatEndp).
    MifModule chatAggregateModule = new MifModule("AggregateModule", ChatAggregate.class.getName(), inKeywordAggEndp, outChatEndp, null);

    // Add the aggregator to the pipeline and assign it to the module itself.
    MifAggregator chatAnalysisAggregator = pipeline.addMifAggregator(new ChatAnalysisAggregator());
    chatAggregateModule.setAggregator(chatAnalysisAggregator);

    // Finally, add the module to the pipeline and we're done configuring the component.
    pipeline.addMifModule(chatAggregateModule);
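The slides do not show what `ChatAnalysisAggregator` does internally. As a rough illustration of the general aggregator idea (collect the partial results that belong to one message and release a merged result once all of them have arrived), here is a plain-Java sketch. `AggregatorSketch`, the correlation id, and the fixed expected part count are all assumptions made for this example and are not part of the MIF API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: NOT the MIF aggregator construct.
class AggregatorSketch {
    private final int expectedParts;
    // Partial results merged so far, keyed by correlation id.
    private final Map<String, Map<String, Object>> pending = new HashMap<>();
    // How many parts have arrived for each correlation id.
    private final Map<String, Integer> counts = new HashMap<>();

    AggregatorSketch(int expectedParts) {
        if (expectedParts < 1) {
            throw new IllegalArgumentException("expectedParts must be at least 1");
        }
        this.expectedParts = expectedParts;
    }

    // Accept one partial result. Returns the merged result once every
    // expected part for this correlation id has arrived, otherwise empty.
    Optional<Map<String, Object>> accept(String correlationId, Map<String, ?> part) {
        Map<String, Object> merged =
                pending.computeIfAbsent(correlationId, k -> new HashMap<>());
        merged.putAll(part);
        int seen = counts.merge(correlationId, 1, Integer::sum);
        if (seen < expectedParts) {
            return Optional.empty();
        }
        pending.remove(correlationId);
        counts.remove(correlationId);
        return Optional.of(merged);
    }

    public static void main(String[] args) {
        AggregatorSketch aggregator = new AggregatorSketch(2);
        System.out.println(aggregator.accept("msg-1", Map.of("keywords", "pipeline")));
        System.out.println(aggregator.accept("msg-1", Map.of("length", 42)));
    }
}
```

A real aggregator would also need a timeout for groups that never complete; that is left out here to keep the sketch short.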

Chat Example Code – Processing Module

Blackout.java

Delegate to the “real” implementation: blackout.processContentAnalysis(message);

    public class Blackout implements MifInOutProcessor {
        Logger log = Logger.getLogger(Blackout.class);
        private String pathToBlackoutFile = "blackout.txt";
        private static BlackoutId blackout = null;

        public Blackout() {
            initBlackout();
        }

        public Serializable listen(Serializable input) {
            MapWrapper data = (MapWrapper) input;
            HashMap message = data.getMap();

            if (blackout != null) {
                blackout.processContentAnalysis(message);
            }

            return new MapWrapper(message);
        }

        … … …
    }
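Since a processing module like the one above is an ordinary Java class with a single `listen` method, it can in principle be exercised without a running pipeline. The sketch below illustrates that idea under stated assumptions: `InOutProcessor` is a local stand-in that only mirrors the `listen(Serializable)` signature shown on the slide (the real `MifInOutProcessor` may declare more), `WordCountProcessor` is a made-up module, and a plain `HashMap` replaces the `MapWrapper` type, whose definition is not shown.

```java
import java.io.Serializable;
import java.util.HashMap;

// Local stand-in mirroring the listen(...) signature from the slide.
// Assumption: this is NOT the real MifInOutProcessor interface.
interface InOutProcessor {
    Serializable listen(Serializable input);
}

// Hypothetical module: annotates a chat message with its word count.
class WordCountProcessor implements InOutProcessor {
    @Override
    public Serializable listen(Serializable input) {
        // Copy the incoming map so the caller's message is left untouched.
        @SuppressWarnings("unchecked")
        HashMap<String, Object> message =
                new HashMap<>((HashMap<String, Object>) input);
        String text = String.valueOf(message.getOrDefault("text", "")).trim();
        int words = text.isEmpty() ? 0 : text.split("\\s+").length;
        message.put("wordCount", words);
        return message;
    }

    public static void main(String[] args) {
        HashMap<String, Object> chat = new HashMap<>();
        chat.put("text", "hello chat world");
        // Call the module directly, with no pipeline or endpoints involved.
        System.out.println(new WordCountProcessor().listen(chat));
    }
}
```

Keeping the module free of endpoint details is what lets the same class be wired to different endpoint types in the pipeline configuration.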

What’s available right now?

• The MIF API
  • Used/tested in several DICI projects
• Documentation
• Hooks for connecting to our provenance technology …

Capturing Provenance

• Metadata about workflows
  • What processes ran
  • What data we used in each step
• MIF API has extensions to communicate provenance data
  • Asynchronous JMS events
• Current implementation captures raw in/out data
  • Useful but not scalable
  • Designing a data virtualization layer to support refs from provenance to real data

[Figure: PNNL Provenance Architecture]
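To make the "refs instead of raw data" direction concrete, here is a small plain-Java sketch of a provenance event that carries references to a step's input and output rather than the payloads themselves. Everything in it is an assumption for illustration: the `ProvenanceSketch` and `Event` names, the use of a SHA-256 content hash as the reference, and the in-memory queue standing in for the asynchronous JMS destination. It does not reflect the actual MIF provenance extensions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: NOT the MIF provenance extension.
class ProvenanceSketch {
    // One provenance record: which process ran, plus references to its data.
    static final class Event {
        final String process;
        final String inputRef;
        final String outputRef;

        Event(String process, String inputRef, String outputRef) {
            this.process = process;
            this.inputRef = inputRef;
            this.outputRef = outputRef;
        }
    }

    // Assumed stand-in for the JMS destination that receives provenance events.
    final BlockingQueue<Event> outbox = new LinkedBlockingQueue<>();

    // Derive a short, stable reference from the data (assumption: a content hash).
    static String refFor(byte[] data) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder("sha256:");
            for (byte b : sha.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    // Queue the event and return immediately, so the processing step is not held up.
    void record(String process, byte[] input, byte[] output) {
        outbox.offer(new Event(process, refFor(input), refFor(output)));
    }

    public static void main(String[] args) {
        ProvenanceSketch provenance = new ProvenanceSketch();
        provenance.record("Ingest",
                "raw chat line".getBytes(StandardCharsets.UTF_8),
                "parsed chat line".getBytes(StandardCharsets.UTF_8));
        Event event = provenance.outbox.peek();
        System.out.println(event.process + " " + event.inputRef + " -> " + event.outputRef);
    }
}
```

The event stays the same size no matter how large the step's data is, which is the scalability concern the slide raises about capturing raw in/out data.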

Using Provenance

and it better be fast …

• And of course scalable
• So we created a benchmark
  • A friction test
  • A measure of ‘middleware’ overhead

[Figure: benchmark layout with a Load Generator, two JMS queues, a MIF Container holding a Splitter, Component1, Component2, and an Aggregator, and a Results Collector.]

Load Generator sends messages to both JMS queues at some known rate (e.g. 100 per second?).
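A load generator that emits messages at a known rate can be sketched with the standard `ScheduledExecutorService`. This is only an illustration of the idea and not the benchmark code used for the results: `LoadGeneratorSketch` and `sendAtRate` are hypothetical names, an in-memory queue stands in for a JMS queue, and each "message" is just a send timestamp.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: NOT the benchmark's actual load generator.
class LoadGeneratorSketch {
    // Put `total` timestamped messages on `queue` at roughly `messagesPerSecond`,
    // returning once the last one has been sent.
    static void sendAtRate(BlockingQueue<Long> queue, int messagesPerSecond, int total) {
        if (messagesPerSecond < 1 || messagesPerSecond > 1_000_000 || total < 0) {
            throw new IllegalArgumentException("rate must be 1..1000000 and total >= 0");
        }
        long periodMicros = 1_000_000L / messagesPerSecond;
        CountDownLatch remaining = new CountDownLatch(total);
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            // Guard so a tick that fires after the last message sends nothing.
            if (remaining.getCount() > 0) {
                queue.offer(System.nanoTime());
                remaining.countDown();
            }
        }, 0, periodMicros, TimeUnit.MICROSECONDS);
        try {
            remaining.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            timer.shutdownNow();
        }
    }

    public static void main(String[] args) {
        BlockingQueue<Long> queue = new LinkedBlockingQueue<>();
        long start = System.nanoTime();
        sendAtRate(queue, 100, 50);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("sent %d messages in %.2f s%n", queue.size(), seconds);
    }
}
```

Stamping each message with its send time lets a collector on the far side compute per-message latency, which is one way to measure the middleware overhead the slide calls friction.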

But you can trust us – we’re scientists

[Chart: Throughput - Messages/second. x-axis: Message Size (1024, 16384, 32768, 65536, 131072, 262144); y-axis: Messages/sec (0 to 14000); one series per server count, 1 to 7 servers.]

[Chart: Throughput - GBytes/day. x-axis: Message Size (1024, 16384, 32768, 65536, 131072, 262144); y-axis: GBytes (0 to 6000); one series per server count, 1 to 7 servers.]

• 1650 messages/sec for 1K messages
  • Scales linearly to 7 servers
• Peak throughput of 5.4 TB/day for 128K messages on 2 servers
  • that rate swamped the cluster switch: hardware limitation!
  • 290 messages/sec on 1 server (3.3 TB/day throughput)
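The throughput figures can be cross-checked with a little arithmetic. The sketch below assumes that "128K" means 131072 bytes (the value on the chart axis), that TB means 10^12 bytes, and that the 290 messages/sec figure refers to 128K messages; none of this is stated outright on the slide, and the class and method names are made up for the example.

```java
// Hypothetical helper for checking the slide's numbers; the unit
// conventions (131072-byte messages, decimal terabytes) are assumptions.
class ThroughputMath {
    static final long SECONDS_PER_DAY = 86_400L;

    // Message rate and size -> terabytes per day.
    static double terabytesPerDay(double messagesPerSecond, long messageBytes) {
        return messagesPerSecond * messageBytes * SECONDS_PER_DAY / 1e12;
    }

    // Terabytes per day and message size -> messages per second.
    static double messagesPerSecond(double terabytesPerDay, long messageBytes) {
        return terabytesPerDay * 1e12 / ((double) messageBytes * SECONDS_PER_DAY);
    }

    // Terabytes per day -> megabits per second of payload.
    static double megabitsPerSecond(double terabytesPerDay) {
        return terabytesPerDay * 1e12 * 8 / SECONDS_PER_DAY / 1e6;
    }

    public static void main(String[] args) {
        // 290 msg/s of 128K messages: about 3.28 TB/day, matching the quoted 3.3.
        System.out.printf("1 server: %.2f TB/day%n", terabytesPerDay(290, 131072));
        // 5.4 TB/day of 128K messages: about 477 msg/s.
        System.out.printf("2 servers: %.0f msg/s%n", messagesPerSecond(5.4, 131072));
        // 5.4 TB/day is 500 Mb/s of payload in one direction.
        System.out.printf("2 servers: %.0f Mb/s%n", megabitsPerSecond(5.4));
    }
}
```

Under these assumptions the peak corresponds to about 500 Mb/s of payload. If each message crosses the switch once on the way in and once on the way out, that is roughly 1 Gb/s, which would be consistent with the remark about the switch being the limit, though that reading is an inference and not something the slide states.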

Grove specs
• 9 nodes
• All connected via a single 1Gb switch

Hardware
• 1 Dell 2850 connected to RAID
• 8 Dell 1850
• Dual Intel Xeon processors (hyperthreaded) @ 3.0 GHz
• 4GB RAM
• 1 RAID @ ~5TB

Software
• Red Hat Enterprise Linux 4
• Linux kernel 2.6.9-55.0.2.ELsmp
• SonicMQ 7.5
• java version "1.6.0_03"

How do I get started?

• We have a wiki - medici.pnl.gov/wiki
  • API docs and installation guide
  • Examples
  • Design and programming guidelines
  • More being added every day :-}
• And we’re available to help
  • Initial adoption/design
  • Support
  • ‘Consulting’

And finally - the MeDICi ‘Vision’

[Figure: the MeDICi vision, split into Design and Execution. Labels shown: Biologist; BPEL Designer; MIF Pipeline Builder; Analysis Component Designer; MIF Component Catalog; BPEL Generator (model-driven code generation); BPEL script; BPEL Engine Deployment Node; MIF Deployment Node; Web Services; Provenance Store; Data; Analysis; Visualization.]

That’s all folks!

• We believe that the MIF can:
  • Help you deliver high quality solutions to clients
    • Faster, cheaper, especially for ‘integration’ projects
  • Help you easily leverage other internal/external codes in your solutions
  • Give us a ‘lingua franca’ – a step towards wide-scale component reuse
• But we’re just humble plumbers …
  • We need application partners to deliver to clients
    • You take the kudos, we write invisible plumbing sat in dark corners …
  • We need feedback on how to improve the technology

Questions?