Process/data API

Post on 19-Dec-2015


Process API - intro

• The workflow engine runs applications
  – Executable code in different languages
  – API methods
  – Web services
• Applications require setup to run
  – Where they are
  – Where they will run (farm, local machine, specific machine)
  – Data IO
  – Version, etc.

Process API - Intro

• We do this in two ways
• As a single object process
  – We have defined a data object to hold things
  – We can use the same idea for the process API
  – Set up the object and "doIt"
• As setup calls and an application call
  – Define setups for a process
  – Use a single call to run the process
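The two styles can be sketched side by side. This is only an illustration under assumptions: `SingleProcess`, `ProcessSetup` and `run_process` are hypothetical names, and the "run" step just returns a string where a real implementation would launch the application.

```python
class SingleProcess:
    """Style 1 (hypothetical): one object holds the setup and runs itself."""

    def __init__(self):
        self.settings = {}

    def set(self, key, value):
        self.settings[key] = value

    def doIt(self):
        # A real implementation would launch the application here.
        return "ran {process} on {location}".format(**self.settings)


class ProcessSetup:
    """Style 2 (hypothetical): the setup is defined separately from the call."""

    def __init__(self, **settings):
        self.settings = dict(settings)


def run_process(setup):
    # Single call that runs whatever the setup describes.
    return "ran {process} on {location}".format(**setup.settings)


# Style 1: set up the object, then "doIt".
proc = SingleProcess()
proc.set("process", "maxit")
proc.set("location", "server")
result_single = proc.doIt()

# Style 2: define the setup, then use one call to run it.
setup = ProcessSetup(process="maxit", location="server")
result_calls = run_process(setup)
```

Both styles carry the same information; the difference is only whether the object runs itself or is handed to a separate run call.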

ProcessAPI

• The following are the fields within the WFE process object (ignoring WFE-specific ones)
  – Name and human-readable name: not important
  – Type
  – File: where, could be a URL
  – Data: see later
  – Runtime/fail time: does the API monitor these?
  – Parameters

Process Object fields

• Type
  – I.e. is this an exec, URL, and so on
• Process
  – The actual mapped process name. A site-specific mapping will define the actual meaning of the process name
• Location
  – Where the application is to run (client/server/farm), or other things like a URL
  – Is it useful to have this in the WFE XML file, or as a separate process API XML setup? I would think the latter.
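A site-specific mapping kept apart from the workflow XML could look roughly like the sketch below. Everything in it is an assumption for illustration: the dictionary layout, the paths, and the `resolve_process` helper are hypothetical, and a real site setup might be loaded from its own XML file instead.

```python
# Hypothetical site-specific setup, kept outside the workflow XML.
SITE_PROCESS_MAP = {
    "maxit": {"type": "exec", "path": "/opt/site/bin/maxit", "location": "server"},
    "APIcopy": {"type": "method", "path": "dataAPI.copy", "location": "API"},
}


def resolve_process(name, site_map=SITE_PROCESS_MAP):
    """Return the site-specific details for a mapped process name."""
    try:
        return site_map[name]
    except KeyError:
        # An unmapped name is a configuration error the engine should see.
        raise LookupError("no site mapping for process %r" % name)


details = resolve_process("maxit")
```

The workflow then only ever mentions the mapped name, and each site decides what that name means.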

Process-API

• Data
  – The WFE data object defines input and output at run time – only mutability is class (static)
  – If we have to pass data to a process, then it might be sensible to put it in the process object
  – See the data API definition for the object
  – Some object containers are data in and some are data out – they need to have the same structure though

Process-API

• Runtime and failtime
  – These are WFE exception manager properties
  – It might not be a good idea to reproduce the exception outside the WFE, as the WFE needs to handle any failure. Process failure must not be hidden from the WFE
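One way to keep failures visible to the engine is to run the process in a thread and raise to the caller when the fail time passes. The sketch below is an assumption, not the WFE exception manager: `to_seconds`, `ProcessFailure` and `run_with_fail_time` are hypothetical names, and the timed-out thread is only abandoned, not killed.

```python
import threading
import time


def to_seconds(hhmmss):
    """Convert an "HH:MM:SS" string, as in failTime="00:00:10", to seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


class ProcessFailure(Exception):
    """Raised to the engine; never swallowed by the process layer."""


def run_with_fail_time(target, fail_seconds):
    """Run target in a thread and report any failure to the caller."""
    errors = []

    def worker():
        try:
            target()
        except Exception as exc:  # record it, the engine decides what to do
            errors.append(exc)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(fail_seconds)
    if thread.is_alive():
        raise ProcessFailure("process exceeded its fail time")
    if errors:
        raise ProcessFailure("process failed: %s" % errors[0])
    return "finished"
```

A call such as `run_with_fail_time(job, to_seconds("00:00:10"))` either returns `"finished"` or raises `ProcessFailure`, so the engine always learns the outcome.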

Process API

• Parameters
  – Probably a Python dictionary is best here.
  – Needs to be exposed to the WFE, since different parts of the workflow may need different parameters (consider MAXIT)
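A dictionary makes it easy for each task to override shared defaults. The following is a minimal sketch under assumptions: the parameter names are invented for illustration (they are not real MAXIT options), and `parameters_for` is a hypothetical helper.

```python
# Hypothetical defaults; the keys are illustrative, not real MAXIT options.
DEFAULT_PARAMETERS = {
    "maxit": {"operation": "convert", "verbose": False},
}


def parameters_for(process, overrides=None, defaults=DEFAULT_PARAMETERS):
    """Merge per-task overrides on top of the defaults for a process."""
    merged = dict(defaults.get(process, {}))
    merged.update(overrides or {})
    return merged


# Two tasks in the same workflow using the same program differently.
convert_params = parameters_for("maxit")
check_params = parameters_for("maxit", {"operation": "check", "verbose": True})
```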

Process API

• The problem I have is defining which data object is which. The data object needs a definition so the program knows what the data is – see process API.
  – Using a Python class object

    ProcOb = ApiProcess()
    ProcOb.set('name', 'myAlignProg')
    ProcOb.set('parameters', '-P 33 -x ddd')
    ProcOb.set('type', 'exec')
    ProcOb.add('input', data.ob['D1'])
    ProcOb.add('input', data.ob['D2'])
    ProcOb.add('output', data.ob['D3'])

These will of course be defined in the workflow engine variables. Note the adding of multiple data objects.
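For readers who want to try the calls above, here is a minimal stand-in for `ApiProcess`. It is a sketch under assumptions, not the real class: the internal layout (a field dictionary plus ordered input and output lists) is a guess, and the `data_ob` dictionary is a hypothetical substitute for the engine's `data.ob`.

```python
class ApiProcess:
    """Hypothetical stand-in: named fields plus ordered data objects."""

    def __init__(self):
        self.fields = {}
        self.data = {"input": [], "output": []}

    def set(self, key, value):
        self.fields[key] = value

    def add(self, role, data_object):
        # Keeping the role with each object is one answer to
        # "which data object is which".
        if role not in self.data:
            raise ValueError("role must be 'input' or 'output'")
        self.data[role].append(data_object)


# Stand-in for the engine's data.ob lookup.
data_ob = {"D1": "first input", "D2": "second input", "D3": "result"}

ProcOb = ApiProcess()
ProcOb.set("name", "myAlignProg")
ProcOb.set("type", "exec")
ProcOb.add("input", data_ob["D1"])
ProcOb.add("input", data_ob["D2"])
ProcOb.add("output", data_ob["D3"])
```

Because inputs are stored in the order they are added, a program can at least rely on position; a fuller definition would still be needed to say what each object means.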

Process API

• Program exec
  – Executable
  – Process: use a mapped name for the application – site specific
  – Location: local/server/farm – mapped names
  – How do we know which objects are which?

    ProcOb = ApiProcess()
    ProcOb.set('type', 'exec')
    ProcOb.set('process', 'maxit')
    ProcOb.set('location', 'server')
    ProcOb.add('input', data.ob['D1'])
    ProcOb.add('input', data.ob['D2'])
    ProcOb.add('output', data.ob['D3'])
    processAPI.run(ProcOb)

Process API

• DataAPI copy
  – Copy data
  – Parameters = new version
  – Data objects – see later

    ProcOb = ApiProcess()
    ProcOb.set('name', 'copy')
    ProcOb.set('parameters', 'newVersion')
    ProcOb.set('process', 'method')
    ProcOb.set('location', 'dataAPI')
    ProcOb.add('input', data.ob['D1'])
    ProcOb.add('output', data.ob['D3'])
    processAPI.run(ProcOb)

Automated questions in XML

<wf:task taskID="TD3" name="SequenceOK" nextTask="J1" breakpoint="false">
  <wf:description>Check whether the sequence align was OK</wf:description>
  <wf:decision type="AUTO">
    <wf:dataObjectsLocation>
      <wf:location dataID="D6" type="input"/>
    </wf:dataObjectsLocation>
    <wf:nextTasks>
      <wf:nextTask taskID="TW4">
        <wf:function dataID="D6" gte="20" less="200000000"/>
      </wf:nextTask>
      <wf:nextTask taskID="TM5">
        <wf:function dataID="D6" gte="2" less="20"/>
      </wf:nextTask>
      <wf:nextTask taskID="T9">
        <wf:function dataID="D6" gte="0" less="2"/>
      </wf:nextTask>
    </wf:nextTasks>
  </wf:decision>
</wf:task>

Decision data object

Decision option

More complex functions will require python methods specific to the question
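The simple range test in the decision XML can be evaluated generically. The sketch below is an assumption about how that might be done, not the engine's code: the namespace URI `urn:example:wf` is invented so the fragment parses, `choose_next_task` is a hypothetical helper, and `gte`/`less` are read as "greater than or equal" and "strictly less than".

```python
import xml.etree.ElementTree as ET

NS = {"wf": "urn:example:wf"}  # assumed namespace URI, for illustration only

DECISION_XML = """
<wf:decision xmlns:wf="urn:example:wf" type="AUTO">
  <wf:nextTasks>
    <wf:nextTask taskID="TW4">
      <wf:function dataID="D6" gte="20" less="200000000"/>
    </wf:nextTask>
    <wf:nextTask taskID="TM5">
      <wf:function dataID="D6" gte="2" less="20"/>
    </wf:nextTask>
    <wf:nextTask taskID="T9">
      <wf:function dataID="D6" gte="0" less="2"/>
    </wf:nextTask>
  </wf:nextTasks>
</wf:decision>
"""


def choose_next_task(decision, values):
    """Return the taskID of the first option whose range holds the value."""
    for next_task in decision.findall("wf:nextTasks/wf:nextTask", NS):
        function = next_task.find("wf:function", NS)
        value = values[function.get("dataID")]
        if int(function.get("gte")) <= value < int(function.get("less")):
            return next_task.get("taskID")
    return None  # no option matched; the engine must handle this case


decision = ET.fromstring(DECISION_XML)
```

With `values = {"D6": 5}` the second option matches, so the workflow would continue at `TM5`.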

Detailed description of the technology

• A data object is pre-declared in the XML
  – Data placeholder
  – Defines API object detail
• A task object can reference data objects
  – As input, output or both
• A process task is either:
  – An API method
  – An exec program

<wf:dataObject dataID="D1" name="dataToCopy" type="Object" mutable="false">
  <wf:description>General object to copy</wf:description>
  <wf:location namespace="__old_object" where="DM"/>
</wf:dataObject>

<wf:task taskID="T2" name="copyData" nextTask="T9" breakpoint="false">
  <wf:description>Run API task to copy data object</wf:description>
  <wf:process runTime="00:00:04" failTime="00:00:10">
    <wf:detail name="APIcopy" type="method" where="API"/>
    <wf:dataObjectsLocation>
      <wf:location dataID="D1" type="input"/>
      <wf:location dataID="D2" type="output"/>
    </wf:dataObjectsLocation>
  </wf:process>
</wf:task>

Creating data objects in WFE

    # the data object ID
    self.object.set("deposition-dataset-ID", depID)
    self.object.set("workflow-class-ID", classID)
    self.object.set("workflow-instance-ID", instID)

    self.type = data.getAttribute("type")
    self.object.set("return-type", data.getAttribute("type"))
    # access is stored as a literal; the XML has no "read-write" attribute to look up
    if data.getAttribute("mutable") == "true":
        self.object.set("access", "read-write")
    else:
        self.object.set("access", "read-only")

    # internal workflow cross reference
    self.name = data.getAttribute("dataID")
    self.nameHumanReadable = data.getAttribute("name")

    for detail in data.childNodes:
        if detail.nodeName == "wf:description":
            self.description = detail.firstChild.data
        elif detail.nodeName == "wf:location":
            self.nameSpace = detail.getAttribute("namespace")
            self.object.set("data-object-name", detail.getAttribute("namespace"))
            self.where = detail.getAttribute("where")
            self.object.set("data-object-location", detail.getAttribute("where"))

Each data XML statement is stored as a reference object

This object is a place holder which can be passed to processes

It contains information on where to access the data
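The reference object can be tried out in isolation with the standard `xml.dom.minidom` parser. This is a sketch under assumptions: the namespace URI `urn:example:wf` is invented so the fragment parses, `read_data_reference` is a hypothetical helper, and a plain dictionary stands in for the engine's object.

```python
from xml.dom import minidom

DATA_XML = """
<wf:dataObject xmlns:wf="urn:example:wf" dataID="D1" name="dataToCopy"
               type="Object" mutable="false">
  <wf:description>General object to copy</wf:description>
  <wf:location namespace="__old_object" where="DM"/>
</wf:dataObject>
"""


def read_data_reference(element):
    """Build a placeholder describing where and what the data is."""
    mutable = element.getAttribute("mutable") == "true"
    reference = {
        "dataID": element.getAttribute("dataID"),
        "name": element.getAttribute("name"),
        "return-type": element.getAttribute("type"),
        "access": "read-write" if mutable else "read-only",
    }
    # childNodes also contains whitespace text nodes; they match neither branch.
    for detail in element.childNodes:
        if detail.nodeName == "wf:description":
            reference["description"] = detail.firstChild.data
        elif detail.nodeName == "wf:location":
            reference["data-object-name"] = detail.getAttribute("namespace")
            reference["data-object-location"] = detail.getAttribute("where")
    return reference


document = minidom.parseString(DATA_XML.strip())
reference = read_data_reference(document.documentElement)
```

The result holds no payload, only the name, type, access mode and location that a process needs in order to fetch the data itself.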

The engine data object
  – May be a real or virtual payload of data
  – Where, what and type
  – Payload is passed between tasks
  – The WF is a data processing pipeline
• A real value can be examined to affect the WF
• The path is dependent on data values (auto/manual decisions are based on these values)
• The data version is WF instance data
  – Can be domain data (via dataAPI)
  – Can be WF data (via statusAPI) – scope defined by the object the data is stored in

Engine process manager

    import time

    def run(self):
        self.status = 1
        for key, value in self.inputObjects.items():
            istat = myApi.do(value)

        if self.task.uniqueType == "test":
            # test method - just counts for 5 seconds
            for i in range(5):
                time.sleep(1.0)
        elif self.task.uniqueType == "method":
            # this is an API process
            if self.task.uniqueWhere == "API":
                # this is an API method call
                self.processAPI.runMethod(self.task.uniqueName)
        elif self.task.uniqueType == "exec":
            # this is an exec program found "where"
            self.processAPI.runExec(self.task.uniqueName, self.task.uniqueWhere)

        for key, value in self.outputObjects.items():
            istat = myApi.do(value)

        self.statusAPI.setStatus("finished")

This is a thread – running inside exception manager

Send the request data objects

Get the response data objects

What sort of process is it ?

Workflow granularity

• It does not really matter
• A process can be as complex as you like
  – Depends on go-back granularity
  – Depends on "how much you would lose if it crashed"
• Data is the problem!
  – The workflow is a flow of data, so hiding data from the engine will collapse a workflow to nothing.
  – The pathway choice is all about data: the less visible the data, the less choice in the workflow.
  – If a process decides what to do with data, the consequence is:
    • Lose go-back ability
    • Lose track of the data and what is going on
    • Lose plug and play on the process
    • Lose exception management

Engine design examples

[Flowchart. Engine boxes: "Read XML – store objects and tasks", "Run tasks – follow path", "Start/restart (maybe at go-back point)", "Exit". Process task boxes: "Send data object requests", "Run process", "Get response data objects". Interface task boxes: "Send data objects to interface", "Wait for interface", "Send actionable events", "Get return action from interface".]
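The engine loop in the flowchart (start or restart, run tasks, follow the path, exit) can be sketched as below. It is an illustration under assumptions: tasks are plain dictionaries rather than parsed XML, `run_workflow` and the `kind` key are hypothetical, and branching decisions are left out so the path simply follows `nextTask`.

```python
def run_workflow(tasks, start, handlers, max_steps=100):
    """Follow nextTask links from start until a task has no successor."""
    path = []
    current = start
    while current is not None and len(path) < max_steps:
        task = tasks[current]
        handlers[task["kind"]](task)  # "process" or "interface" task
        path.append(current)
        current = task.get("nextTask")
    return path


# Hypothetical tasks, as they might look after reading the XML.
tasks = {
    "T1": {"kind": "process", "nextTask": "T2"},
    "T2": {"kind": "interface", "nextTask": "T9"},
    "T9": {"kind": "process"},
}

log = []
handlers = {
    "process": lambda task: log.append("process"),
    "interface": lambda task: log.append("interface"),
}

full_path = run_workflow(tasks, "T1", handlers)
# Restarting at a go-back point is the same call with a different start.
restart_path = run_workflow(tasks, "T2", handlers)
```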

John’s requirements 1

• 1) Identify, copy and archive an object
  – Object declaration

<wf:dataObject dataID="D1" name="dataToCopy" type="Object" mutable="false">
  <wf:description>General object to copy</wf:description>
  <wf:location namespace="__old_object" where="DM"/>
</wf:dataObject>
<wf:dataObject dataID="D2" name="dataCopy" type="Object" dependence="D1" mutable="true">
  <wf:description>General object - new copy of data</wf:description>
  <wf:location namespace="__new_object" where="DM"/>
</wf:dataObject>

  – Task declaration

<wf:task taskID="T2" name="copyData" nextTask="T9" breakpoint="false">
  <wf:description>Run API task to copy data object</wf:description>
  <wf:process runTime="00:00:04" failTime="00:00:10">
    <wf:detail name="APIcopy" type="method" where="API"/>
    <wf:dataObjectsLocation>
      <wf:location dataID="D1" type="input"/>
      <wf:location dataID="D2" type="output"/>
    </wf:dataObjectsLocation>
  </wf:process>
</wf:task>

Name reference

The actual data

The process – a method within the API

John’s requirement 2: make new data version

• Declare data
  – Input D1
  – Output D2
• Declare task
  – Method in API

<wf:dataObjects>
  <wf:dataObject dataID="D1" name="dataToAddNewVersion" type="Object" mutable="true">
    <wf:description>General object to copy</wf:description>
    <wf:location namespace="__object" where="DM"/>
  </wf:dataObject>
  <wf:dataObject dataID="D2" name="dataNewVersion" type="Object" dependence="D1" mutable="true">
    <wf:description>New version of data</wf:description>
    <wf:location namespace="__object" where="DM"/>
  </wf:dataObject>
</wf:dataObjects>

<wf:task taskID="T2" name="copyData" nextTask="T9" breakpoint="false">
  <wf:description>Run API task to create a new version of an object</wf:description>
  <wf:process runTime="00:00:04" failTime="00:00:10">
    <wf:detail name="APInewVersion" type="method" where="API"/>
    <wf:dataObjectsLocation>
      <wf:location dataID="D1" type="input"/>
      <wf:location dataID="D2" type="output"/>
    </wf:dataObjectsLocation>
  </wf:process>
</wf:task>

John’s requirement 3: get version list and show

• Data – 3 objects
  – D1 – object target
  – D2 – version list
  – D3 – which one to use
• Some tasks
  – Get list from API
  – Interface to choose (not shown)

<wf:dataObject dataID="D1" name="dataObjectTarget" type="Object" mutable="false">
  <wf:description>target object to query on</wf:description>
  <wf:location namespace="__object_name" where="DM"/>
</wf:dataObject>
<wf:dataObject dataID="D2" name="VersionList" type="List" mutable="false">
  <wf:description>Return version list</wf:description>
  <wf:location namespace="versionList" where="local"/>
</wf:dataObject>
<wf:dataObject dataID="D3" name="useVersion" type="Integer" mutable="true">
  <wf:description>Version to use</wf:description>
  <wf:location namespace="version" where="WF"/>
</wf:dataObject>

<wf:task taskID="T2" name="requestVersionList" nextTask="T3" breakpoint="false">
  <wf:description>Run API to get the version list of an object</wf:description>
  <wf:process runTime="00:00:04" failTime="00:00:10">
    <wf:detail name="APIversionList" type="method" where="API"/>
    <wf:dataObjectsLocation>
      <wf:location dataID="D1" type="input"/>
      <wf:location dataID="D2" type="output"/>
    </wf:dataObjectsLocation>
  </wf:process>
</wf:task>

John’s requirement 4/5: data selector

• A data object may need additional qualifiers to say what it is.
  – Selector value
  – "selection"
• It is likely that the qualifier will:
  – Need to be a WF class (static) variable
  – Need to be a WF inst (dynamic) variable

<wf:dataObject dataID="D2" name="dataToGetwithQualifier" type="String" mutable="true">
  <wf:description>general object with qualifier</wf:description>
  <wf:location namespace="__object" qualifier="_entity.id=1" where="DM"/>
</wf:dataObject>

<wf:dataObject dataID="D2" name="dataToGetwithQualifier" type="String" mutable="true">
  <wf:description>general object with qualifier</wf:description>
  <wf:location namespace="__object" qualifier="set_entity.type='protein' where entity.id=1" where="DM"/>
</wf:dataObject>

John’s requirement 6: length/size of object

<wf:dataObject dataID="D1" name="dataTarget" type="Object" mutable="false">
  <wf:description>General object to copy</wf:description>
  <wf:location namespace="__object" where="DM"/>
</wf:dataObject>
<wf:dataObject dataID="D2" name="dataLength" type="integer" dependence="D1" mutable="true">
  <wf:description>Length of data object</wf:description>
  <wf:location namespace="dataLength" where="WF"/>
</wf:dataObject>

<wf:process runTime="00:00:04" failTime="00:00:10">
  <wf:detail name="APIObjectSize" type="method" where="API"/>
  <wf:dataObjectsLocation>
    <wf:location dataID="D1" type="input"/>
    <wf:location dataID="D2" type="output"/>
  </wf:dataObjectsLocation>
</wf:process>

Define object and place holder for size value

Run task to input data to function, and return length

John’s requirement 7: format conversion

<wf:dataObjects>
  <wf:dataObject dataID="D1" name="dataObjectPDB" type="Object" mutable="false">
    <wf:description>General object to convert format</wf:description>
    <wf:location namespace="__object" where="DM"/>
  </wf:dataObject>
  <wf:dataObject dataID="D2" name="dataObjectMMCIF" type="Object" dependence="D1" mutable="true">
    <wf:description>New data in different format</wf:description>
    <wf:location namespace="__object" where="DF"/>
  </wf:dataObject>
  <wf:dataObject dataID="D3" name="status" type="string" dependence="D1" mutable="true">
    <wf:description>A status code return</wf:description>
    <wf:location namespace="__object" where="DF"/>
  </wf:dataObject>
</wf:dataObjects>

<wf:task taskID="T2" name="formatChange" nextTask="T9" breakpoint="false">
  <wf:description>Run API task to change the format of data</wf:description>
  <wf:process runTime="00:00:04" failTime="00:00:10">
    <wf:detail name="APIformatChangePDBtoPDBx" type="method" where="API"/>
    <wf:dataObjectsLocation>
      <wf:location dataID="D1" type="input"/>
      <wf:location dataID="D2" type="output"/>
      <wf:location dataID="D3" type="output"/>
    </wf:dataObjectsLocation>
  </wf:process>
</wf:task>

Input and output formats

Place holder for status – this might be so intrinsic to all tasks that it should probably be pre-declared and always present

And the API function to do this