S-Cube Learning Package: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis. Universidad Politécnica de Madrid (UPM)


Page 1: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

S-Cube Learning Package

Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Universidad Politécnica de Madrid (UPM)

Page 2

Learning Package Categorization

S-Cube

WP-JRA-2.2: Adaptable Coordinated Service Compositions

Models and Mechanisms for Coordinated Service Compositions

Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Page 3

Table of Contents

1 Introduction and Background

2 Motivation and Problem Statement

3 Overview of the Approach
• Contexts and Concept Lattices
• Horn Clause Programs
• Workflows in Horn Clause Form
• Sharing Analysis
• Obtaining and Interpreting Results

4 Application to Fragment Identification

5 Conclusions

These slides have been prepared for offline viewing. Throughout the presentation, running commentaries, notes and additional remarks will be displayed on the margins using the condensed font, like here.

Please refer to the publication list at the end for more details.

Page 4

1 Introduction and Background

Page 5

SOA and Web Services

Service Oriented Architecture (SOA):

• Flexible set of computing system design and implementation principles
• Emphasis on loose coupling between services and with OS
• Distribution over Internet/intranet
• Actors: service providers and consumers
• Intrinsic dynamism and adaptability
• Functionality often in the form of Web services

Web services:

• Interoperability, platform independence
• Data exchange standards: XML
• Several technological flavors:
  - WSDL/SOAP-based
  - RESTful
• Typical implementation platforms:
  - Java / .NET application servers, BPEL, Web server scripts (e.g. PHP)

We mention just some of the key features of SOA and Web Services, that correspond to a “high-level” view of the area, i.e. without taking into account the details of implementation technologies and infrastructure.

There are many standards and technologies in the service world. Also, different service provision platforms offer varying degrees of functionality to users and designers.

For a more detailed introduction to the SOA design philosophy, platforms, tools, and techniques, please refer to the list of publications.

Page 6

Service Compositions

Service compositions aggregate individual services to achieve a more complex or cross-organizational task:

• Combining loosely coupled components
• Compositions expose themselves as services
• Often described using workflows (control/data)
• Complex control and latent parallelism
• Potentially long-running
• Centralized control ⇒ orchestration
• Subject to migration, adaptation, fragmentation, etc.
• Described using abstract & executable formalisms & languages

Service compositions are typically designed to reflect some underlying technological or business process.

Compositions thus allow creation of new “higher level” functionality from existing services as building blocks. A service composition often involves services from different subsystems within an organization, as well as external services.

Orchestrations have a centralized control and data flow (a workflow of activities). They are usually described using a general-purpose or specialized language or formal notation.

Page 7

Data in Service Compositions

Data in service compositions represents inputs, intermediate results, internal messages, and final results:

• Workflow activities operate on data (access, combine, transform, etc.)
• Therefore data dependencies are as important as control
• Data is atomic or structured using rich information formats (XML trees)
• Uses query languages (e.g. XPath) to search and access fields and nested elements
• Behavior of control structures typically depends on data.

Analyzing data together with control allows us to answer questions about composition behavior depending on the input data and other received messages.

E.g., we can ask whether some conditional branch in the workflow will be taken for a given kind of data, or what values some intermediate data fields would have for the given input.

Problems of this kind have long been studied in program analysis, and getting exact answers is generally hard and undecidable in the presence of loops.

Page 8

2 Motivation and Problem Statement

Page 9

Example Medical Workflow

[Figure: a drug prescription workflow. Input x: Patient ID. A parallel block runs a1: Retrieve medical history (output y: Medical history) and a2: Retrieve medication record (output z: Medication record). A branch then leads, on ¬stable, to a4: Select new medication, or, on stable, to a3: Continue last prescription; both paths join at a5: Log treatment.]

Fig. 1. An example drug prescription workflow.

[Figure: the sub-workflow of a4. Inputs y: Medical history and z: Medication record. a41: Run tests to produce medication criteria (output c: Criterion) feeds a42: Search medication databases (output p: Prescription candidate); if the result is not sufficiently specific (“no”), the loop repeats, otherwise (“yes”) the sub-workflow ends.]

Fig. 2. Selection of new medication.

In order to make concepts useful for analysis, we need to organize them into concept lattices. A lattice is a mathematical structure (L, ≤, ∨, ∧) built around a set L (in our case containing concepts from a context), a partial order relation ≤, the least upper bound (LUB) operation ∨, and the greatest lower bound (GLB) operation ∧. For arbitrary x, y ∈ L, the element x ∨ y = z has the property x ≤ z and y ≤ z, but it is also the least such element, i.e., for any other w ∈ L such that x ≤ w and y ≤ w, we have z ≤ w. The case for the greatest lower bound operation ∧ is symmetric. In this paper, we deal only with finite and complete lattices, where for any arbitrary non-empty subset of lattice elements the LUB and the GLB exist in L; such lattices have unique greatest (⊤) and least (⊥) elements.

For concept lattices, the ordering relation ≤ between two concepts B1 and B2 holds iff B1 ⊆ B2, or, equivalently, iff B2′ ⊆ B1′: a higher concept includes all objects from a lower (or derived) one; lower concepts are derived from higher ones by adding attributes. Consequently, the LUB is obtained using B1 ∩ B2, and the GLB using B1 ∪ B2.

Context lattices are usually represented using a variant of Hasse diagrams (Fig. 5). Nodes correspond to concepts, with the top concept visually on the top, and the bottom concept placed accordingly. The annotations associated with a concept (using dashed lines) show the attributes introduced by the concept (besides the derived attributes from the higher concepts) above the line, and the objects that belong to that concept, but not to any of its derived concepts, below the line.

Concepts may have one or both parts of the annotation empty; in the latter case, the annotation is not shown.

Fig. 5 presents the concept lattices for the medical database contexts from Fig. 4. The most general concepts are shown on top of the lattices, and the most specific (empty in both cases) at the bottom.
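As a concrete illustration of how concepts arise from a context, the following Python sketch enumerates the formal concepts of a tiny, hypothetical identity-document context. The incidence relation below is invented for illustration; it is not the one in the paper's Fig. 4.

```python
# Minimal FCA sketch over a hypothetical context (objects -> attributes).
from itertools import combinations

context = {
    "Passport":       {"Name", "Address"},
    "NationalId":     {"Name", "Address", "PIN"},
    "SocialSecurity": {"Name", "SSN"},
}

def common_attrs(objs):
    """Derivation A -> A': attributes shared by all objects in objs."""
    sets = [context[o] for o in objs]
    return set.intersection(*sets) if sets else {a for s in context.values() for a in s}

def objects_with(attrs):
    """Derivation B -> B': objects having all attributes in attrs."""
    return {o for o, s in context.items() if attrs <= s}

# A formal concept is a pair (extent, intent) with extent' = intent and
# intent' = extent.  For a tiny context we can enumerate concepts by
# closing every subset of objects.
concepts = set()
objs = list(context)
for r in range(len(objs) + 1):
    for combo in combinations(objs, r):
        intent = common_attrs(set(combo))
        extent = objects_with(intent)
        concepts.add((frozenset(extent), frozenset(intent)))

# Lattice order: (A1,B1) <= (A2,B2) iff extent A1 is a subset of A2
# (equivalently, intent B2 is a subset of B1).
for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), "->", sorted(intent))
```

Ordering the printed extents by inclusion yields exactly the Hasse diagram structure described above, with the all-objects concept on top and the empty concept at the bottom.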

B. Describing Data with Concept Lattices

The data items that are input to the workflow need to be mapped to the appropriate objects in the input concept lattice. In the case of our example (Fig. 1), we would need to map the Patient ID input data item to either Passport, National ID, Driving License, or Social Security Card. In our example, each of those objects maps to a different concept in the lattice, but in general several objects can map to the same concept.

The prerequisite in order to use concept lattices is to create an adequate context at a level of abstraction that captures enough information to represent all relevant concepts and their attributes. Complex models can always be simplified by keeping only those attributes that really discriminate between different concepts. Existing tools (e.g., ConExp, Lattice Miner, Colibri and others [1]) facilitate the process of eliciting and exploring knowledge using FCA.

A relevant point is that some data sources may not appear explicitly as workflow inputs. In Fig. 1, activities a1 and a2 need to access some external source to extract records using the input Patient ID. The attributes of the retrieved records depend on properties of these data sources and therefore, they

Written using BPMN (Business Process Modeling Notation).
• A high-level (non-executable) description.

This workflow shows a simplified drug prescription process in a health organization. At the entry, the patient identifies him/herself (item x, Patient ID). The patient’s medical history (y) and medication record (z) are then retrieved in parallel (activities a1 and a2).

Page 10

Example Medical Workflow (contd.)

[Figs. 1 and 2 and the accompanying text are repeated here from Page 9.]

Aiming at fragmentation that respects data privacy.

Depending on whether the patient’s condition is stable or not, either the earlier prescription is continued (activity a3), or a new medication is selected (activity a4). Finally, the treatment of the patient is logged (activity a5).

In this example, we consider data privacy attributes. Data may contain confidential information on the patient’s medical and medication history, including insurance coverage. A fragment should contain activities that access data of only a certain privacy level. The fragments can then be distributed based on what privacy clearance they require.

Page 11

Example Sub-Workflow

[Figs. 1 and 2 and the accompanying text are repeated here from Page 9.]

Sub-Workflow for medication selection (component service a4):
• involves looping based on intermediate data.

To make things more interesting, we “unpack” one of the component services, a4, from the main workflow and represent it as a sub-workflow with its own inputs (items y and z), outputs (item p), and intermediate data (item c).

As soon as there is a loop involved (taking the “no” branch), the analysis becomes more complex. An exact analysis of the orchestration state after the loop would require a discovery of the loop invariant, which is a generally difficult problem. As we will show in the next section, we find our way around this obstacle by employing abstract interpretation techniques that give us a conservative approximation of the loop behavior.
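The conservative loop approximation just mentioned can be pictured as a fixpoint computation over a finite abstract domain. The Python fragment below is our own illustration (not the actual CiaoPP analysis): each data item is abstracted to the set of attributes it may carry, and the loop body's transfer function is iterated until the abstract state stops growing. The attribute names are assumed for the example.

```python
# Abstract state: data item -> set of attributes it may carry.

def transfer(state):
    """One pass of the a41/a42 loop body (names follow Fig. 2):
    criterion c is derived from y and z, and candidate p from c."""
    new = {k: set(v) for k, v in state.items()}
    new["c"] |= state["y"] | state["z"]   # a41 reads history + record
    new["p"] |= new["c"]                  # a42 derives candidate from criteria
    return new

def lfp(state):
    """Least fixpoint: iterate until nothing changes.  Termination is
    guaranteed because the domain (sets over a finite attribute
    universe) is finite and transfer only adds attributes."""
    while True:
        nxt = transfer(state)
        if nxt == state:
            return state
        state = nxt

init = {
    "y": {"medical", "confidential"},     # assumed input attributes
    "z": {"medication", "confidential"},
    "c": set(),
    "p": set(),
}
result = lfp(init)
print(result["p"])   # conservative attributes of the prescription candidate
```

However many times the loop actually runs, the fixpoint safely over-approximates every iteration, which is exactly the trade-off abstract interpretation makes to sidestep the loop-invariant problem.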

Page 12

Data Attributes

User-defined attributes can be used to characterize data in a given analysis domain:

• Application dependent view
• Simplified data model: sets of properties instead of complex structures
• User (designer) chooses relevant attributes, describing e.g.:
  - information content
  - privacy/confidentiality levels ⇐ our example
  - ownership
  - other aspects of quality
• Possibly: a combination of views
• Known or assumed for input data, implicit in control/data dependencies in the workflow

Question: How to infer attributes (i.e. properties) of intermediate and resulting data items?

• Based on control flow and data dependencies

Reasoning about the whereabouts of data in the execution of a service composition is simpler if we track only data attributes and not the entire complex data structures. This fits very well an approach to program analysis known as abstract interpretation, where infinite data domains are abstracted into finite ones.

E.g., knowing privacy levels of input data, we can try to infer the privacy levels of intermediate data and the individual activities in the workflow.

Of course, we have to know how data tests and operations depend on and affect data attributes on the abstract plane.
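The abstraction step itself can be pictured with a toy example of our own (not taken from the learning package): concrete records, which come from an effectively infinite domain, are mapped to a small finite domain of privacy levels, and combining data joins the levels upward. The field names and the three-level domain are assumptions made for the sketch.

```python
# Finite abstract domain, ordered public <= internal <= confidential.
LEVELS = ["public", "internal", "confidential"]

def alpha(record):
    """Abstraction: map a concrete record (a dict) to a privacy level.
    The field names here are hypothetical."""
    if "diagnosis" in record or "medication" in record:
        return "confidential"
    if "address" in record:
        return "internal"
    return "public"

def join(l1, l2):
    """Least upper bound on the privacy chain: combining data can only
    raise, never lower, the required clearance."""
    return max(l1, l2, key=LEVELS.index)

# An activity combining two inputs works on their abstractions only:
history = alpha({"name": "X", "diagnosis": "..."})
contact = alpha({"name": "X", "address": "..."})
print(join(history, contact))
```

The point of the sketch is that once alpha and join are fixed, the analysis never needs to look at concrete record contents again.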

Page 13

Knowing Data Attributes

Knowledge of data attributes at design time:

• Supporting fragmentation
  - Fragment: a part of orchestration that can be distributed for remote execution
  - What parts can be identified and enacted in a distributed fashion?
• Checking data compliance
  - Content of messages exchanged with/between component services in an orchestration
  - Is “sufficient” data passed to components?
• Robust top-down development
  - Modular structure of service orchestrations
  - Refining specifications of workflow (sub-)components

Also useful at runtime:

• Updating predictions with actual data
• On-demand analysis

Analysis of data attributes for components of a service orchestration at design time is an instance of static analysis, where properties are inferred from the specification, and not by running the orchestration.

The static analysis approach can be combined with monitoring and adaptation mechanisms, and the analysis can be performed on a live executing instance of the orchestration.

That can give more accurate results, because by looking at the live instance we can learn the actual values of data up to that point in execution, and update the analysis accordingly.

Page 14

Problem Statement

PROBLEM

To infer user-defined attributes for data items and activities on different levels in an orchestration, automatically from:

known attributes of input data
• defined or assumed by the designer

control structure
• including complex control structures, such as parallel flows, conditional branches and loops.

data operations
• reading or writing data, including tests, assignments and service invocations

The aim is to provide the automated, mechanical inference of data attributes, ideally using a tool that can be invoked at design time. The concrete tool implementation depends on the language in which the orchestration is written (e.g., BPEL, BPMN, etc.)

In this learning package, we present the general approach and the ideas for each step in the analysis process. These steps can be adapted to a particular orchestration language and turned into a fully automated tool chain.

Page 15

3 Overview of the Approach

Page 16

Overview

[Fig. 3 is split by a line into the user perspective (above) and the underlying techniques and artifacts (below). Above the line: the input data context (data items i1, i2, i3, ... against attributes α1, α2, α3, ...) and the workflow definition go in; the resulting context (outputs o1, o2, o3, ... against the same attributes) comes out. Below the line: the input and resulting concept lattices; the Horn clause program, e.g. w(X1,X2,A1,Y1,A2,Y2,A3,Z1,A4,Z2) :- A1=f1(X1), Y1=f1Y1(X1), A2=f2(X2), Y2=f2Y2(X2), A3=f3(Y1,Y2), ...; the input substitution, e.g. ... X1=f(U1,U2), X2=f(U1), X3=f, ...; the sharing analysis (abstract interpretation, sharing+freeness domain, CiaoDE / CiaoPP suite); and the abstract substitution it produces, [[X1,A1,Y1,A3,Z1],[A3,Z1,A4,Z2],[X2,A4,Z2],[X2,A2,Y2,A3,Z1,A4,Z2]].]

Fig. 3. Overview of the approach.

need to be mapped to appropriate objects (in this case the Medical history and the Medication record from Fig. 5(a)).

IV. APPLYING SHARING ANALYSIS

Our application of sharing analysis to elicit new knowledge about attributes of the workflow entities is based on three points: (a) representing the control structures of the workflow in a language amenable to analysis, (b) representing data links and activities in the workflow as explicit variables, and (c) representing attributes of these entities as additional hidden variables which can share with the variables set up in (b). Two variables share if there is some object which is reachable from both, maybe following a reference chain. By inferring how runtime variables can share in the programming language representation of the workflow we deduce the runtime attributes of data items and activities in the workflow.
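Reading an abstract substitution produced by such an analysis is mechanical: it is a set of sharing sets, and a variable may share with any variable that co-occurs with it in some set. The Python sketch below walks the abstract substitution shown in Fig. 3; the helper function is ours, added for illustration.

```python
# The sharing sets from Fig. 3: each set lists variables that may be
# bound to terms reaching a common object at runtime.
abstract_substitution = [
    {"X1", "A1", "Y1", "A3", "Z1"},
    {"A3", "Z1", "A4", "Z2"},
    {"X2", "A4", "Z2"},
    {"X2", "A2", "Y2", "A3", "Z1", "A4", "Z2"},
]

def may_share_with(var):
    """All variables that may share with var: the union of every
    sharing set containing var, minus var itself."""
    out = set()
    for s in abstract_substitution:
        if var in s:
            out |= s
    return out - {var}

# E.g., the variables (including hidden attribute variables) that may
# reach the data-item variable Z1:
print(sorted(may_share_with("Z1")))
```

Since attributes are encoded as hidden variables, `may_share_with` applied to a data-item variable directly reads off the attributes the analysis has inferred for it.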

A. Workflows as Horn Clauses

The sharing analysis tools we will use [7], [6] work on logic programs, and therefore the workflow under consideration needs to be represented in the form of a logic program [14]: a series of logical implications which can be operationally understood as stating which subgoals are needed to accomplish a given goal. Note that the translation into a logic program does not need to be operationally equivalent to the initial workflow; it only needs to represent the data flow and data aliasing correctly. We translate data flow into parameter passing and data aliasing into unification of logical variables. Here we build on [8], where examples of translations from an abstract notation for workflows into Horn clauses are given. Due to size constraints we cannot reproduce here the description of the translation. Instead, we kindly direct the reader to [8] for more details. A key ingredient of the translation is representing, for each activity in the workflow, the sets of data items read and written. This vantage point in workflow modeling is shared with the existing approaches to the analysis of soundness of Workflow Nets with Data (WFD nets) [18], as well as with the approaches to verifying validity of business process specifications using data-flow matrices [19]. However, unlike those higher-level conceptual views that are mainly concerned with various aspects of business process management, in our case we aim at inferring properties on a more technical level that takes into account details of (possibly complex and nested) control flow and data operations. For that purpose, WFD nets or UML activity diagrams are not sufficiently informative, while Horn clauses provide an adequate computation paradigm that has been extensively studied.

[Fig. 4. Two examples of contexts. (a) Characteristics of medical databases: objects Medical history and Medication record; attributes Symptoms, Tests, Coverage. (b) Types of identity documents: objects Passport, National Id Card, Driving License, Social Security Card; attributes Name, Address, PIN, SSN.]

As an illustration, we give here a commented translation of our workflow written in BPMN (Figs. 1 and 2) into Horn clauses. The translation for this case is given in Fig. 6 using Prolog syntax, and will be explained in the following text.

Lines 1-8 are a Horn clause that defines the predicate w for the workflow with a list of comma-separated goals in the body (lines 2-8) following the definition symbol “:-”. Character “%”

Above the line are artifacts that the user works with directly. The input data context describes user-defined attributes of the inputs to the orchestration, and accompanies the workflow definition, like the one in our example. The results are returned in the form of the resulting context which gives back the attributes for the intermediate data items, results, and activities.

The intermediate steps below the line are explained in the slides that follow.

Page 17

Overview (contd.)

The approach to automated attribute inference takes as input:

• an input data context that identifies the input data items to the workflow and their attributes
• a workflow definition in some appropriate formalism (e.g. BPMN in our example)

and gives at output:

• a resulting context that presents inferred attributes of all intermediate data items and activities in the given workflow.

The key steps in the process include:

• Conceptualizing the input data context in the form of a concept lattice, and preparing the input substitution for the analysis.
• Turning the given workflow definition into a Horn Clause program that is fed to the analysis, along with the input substitution.
• Performing sharing analysis and using its result, the abstract substitution, to construct the resulting concept lattice.
• Interpreting the resulting concept lattice to produce the resulting context.

Page 18

Outline of the Section

This Overview section starts with two subsections that introduce some important background notions.

• Subsection Contexts and Concept Lattices briefly introduces the key notions of Formal Concept Analysis (FCA), like contexts and concept lattices, that are used in the rest of the text for representing (and reasoning about) inputs and outputs of the proposed analysis approach.
• Subsection Horn Clause Programs presents the key ideas behind logic (Horn Clause) programs, gives an informal introduction to their form and meaning, and presents the notions of structured terms, substitutions, and unification, which are all referred to later. It also introduces Prolog syntax.

These two subsections do not describe steps of the approach as such. Rather, they supply the notions whose understanding is necessary for understanding the steps in the approach.

Page 19

Outline of the Section (contd.)

The rest of the subsections describe steps in the process of automated attribute inference:

• Subsection Workflows in Horn Clause Form starts from a rather generalized way of describing workflows that involve complex data and control dependencies, and describes how such workflows can be turned into a Horn Clause form amenable to sharing analysis.
• Next, subsection Sharing Analysis first defines the notion of sharing in logic programs, building on the notion of substitution introduced earlier. Next, it describes the notion of abstract substitution, which is used in the actual analysis as the domain for abstract interpretation. It also describes how an initial substitution for the analysis is set up using attributes from the input concept lattice.
• Finally, subsection Obtaining and Interpreting Results explains how the result of the sharing analysis, in the form of an abstract substitution, is turned into a resulting concept lattice, and then used to generate the resulting context, which is the end result.

Page 20

3.1 Contexts and Concept Lattices

Page 21

Contexts: Objects and Attributes

Use

rpe

rspe

ctiv

eU

nder

lyin

gte

chni

ques

and

artif

acts

α1 α2 α3 . . .

i1i2i3. . .

Input data context Workflow definition

α1 α2 α3 . . .

o1

o2

o3. . .

Resulting context

Input concept lattice Resulting concept lattice

w(X1,X2,A1,Y1,A2,Y2,A3,Z1,A4,Z2):-A1=f1(X1),Y1=f1Y1(X1),A2=f2(X2),Y2=f2Y2(X2),A3=f3(Y1,Y2),...

Horn clause program

...X1=f(U1,U2),X2=f(U1),X3=f,...

Input substitution

- Abstract interpretation- Sharing+freeness domain- CiaoDE / CiaoPP suite

Sharing analysis[[X1,A1,Y1,A3,Z1],[A3,Z1,A4,Z2],[X2,A4,Z2],[X2,A2,Y2,A3,Z1,A4,Z2]]

Abstract substitution

Fig. 3. Overview of the approach.

need to be mapped to appropriate objects (in this case theMedical history and the Medication record from Fig. 5(a)).

IV. APPLYING SHARING ANALYSIS

Our application of sharing analysis to elicit new knowledgeabout attributes of the workflow entities is based on threepoints: (a) representing the control structures of the workflowin a language amenable to analysis, (b) representing data linksand activities in the workflow as explicit variables, and (c)representing attributes of these entities as additional hiddenvariables which can share with the variables set up in (b). Twovariables share if there is some object which is reachable fromboth, maybe following a reference chain. By inferring howruntime variables can share in the programming language rep-resentation of the workflow we deduce the runtime attributesof data items and activities in the workflow.

A. Workflows as Horn Clauses

The sharing analysis tools we will use [7], [6] work on logicprograms, and therefore the workflow under consideration

[Fig. 4. Two examples of contexts: (a) characteristics of medical databases (objects: Medical history, Medication record; attributes: Symptoms, Tests, Coverage); (b) types of identity documents (objects: Passport, National Id Card, Driving License, Social Security Card; attributes: Name, Address, PIN, SSN).]

needs to be represented in the form of a logic program [14]: a series of logical implications which can be operationally understood as stating which subgoals are needed to accomplish a given goal. Note that the translation into a logic program does not need to be operationally equivalent to the initial workflow; it only needs to represent the data flow and data aliasing correctly. We translate data flow into parameter passing and data aliasing into unification of logical variables. Here we build on [8], where examples of translations from an abstract notation for workflows into Horn clauses are given. Due to size constraints we cannot reproduce here the description of the translation. Instead, we kindly direct the reader to [8] for more details. A key ingredient of the translation is representing, for each activity in the workflow, the sets of data items read and written. This vantage point in workflow modeling is shared with the existing approaches to the analysis of soundness of Workflow Nets with Data (WFD nets) [18], as well as with the approaches to verifying validity of business process specifications using data-flow matrices [19]. However, unlike those higher-level conceptual views that are mainly concerned with various aspects of business process management, in our case we aim at inferring properties on a more technical level that takes into account details of (possibly complex and nested) control flow and data operations. For that purpose, WFD nets or UML activity diagrams are not sufficiently informative, while Horn clauses provide an adequate computation paradigm that has been extensively studied.

As an illustration, we give here a commented translation of our workflow written in BPMN (Figs. 1 and 2) into Horn clauses. The translation for this case is given in Fig. 6 using Prolog syntax, and will be explained in the following text.

Lines 1-8 are a Horn clause that defines the predicate w for the workflow, with a list of comma-separated goals in the body (lines 2-8) following the definition symbol ":-". Character "%"

Notion of context in Formal Concept Analysis (FCA)

• Set A of attributes (columns)
• Set O of objects (rows)
• Boolean object-attribute relation ρ ⊆ O × A

Formal Concept Analysis is a branch of mathematical lattice theory concerned with knowledge representation and reasoning.

An FCA context is simply a table that associates objects with attributes.

The examples on the left show two contexts: one that describes the content of medical databases, and another that describes the information contained in different identity documents.

Objects (rows) stand for some meaningful entities, and attributes (columns) are chosen by the user to represent relevant notions in the application domain.

Page 22: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Concepts

The idea behind a concept is a close connection between subsets of objects and attributes.

Objects → Attributes
• For an arbitrary subset of objects B ⊆ O, let B′ = {a ∈ A | ∀o ∈ B, oρa}
  "all and only those attributes that belong to all objects from B"

Attributes → Objects
• For an arbitrary subset of attributes D ⊆ A, let D′ = {o ∈ O | ∀a ∈ D, oρa}
  "all and only those objects that have all attributes from D"

Iff B′ = D and D′ = B, then (B,D) is a concept
• B′′ = (B′)′ = D′ = B, D′′ = (D′)′ = B′ = D

From the definition, in concept (B,D) we need to know only B or D to find the other using (·)′. That means we can choose to work with objects or attributes, whatever is more convenient.

E.g., we can start from a single attribute a and calculate {a}′′ to find the most general concept that has a.

Or, we can start with an object o and calculate {o}′′ to find the most specific concept containing o.

Because B′′ = B and D′′ = D, we say that concepts are closed under (·)′′, i.e., (·)′′ is a closure.

Page 23: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Concept Lattices

Concept lattice with ordering ⊑
• (B1,D1) ⊑ (B2,D2) ⇔ B1 ⊆ B2 ⇔ D2 ⊆ D1
• Lesser concept adds attributes (D2 ⊆ D1, i.e. (B1,D1) is more specific)
• Greater concept includes more objects (B1 ⊆ B2, i.e. (B2,D2) is more general)
• Top (⊤): the most general concept (all objects)
• Bottom (⊥): the most specific concept (all attributes)

It is a complete lattice
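As a sketch (Python, same toy context as in the earlier example; an illustration, not the S-Cube tooling), one can enumerate all concepts by closing every attribute subset and then verify that the ordering is antitone, (B1,D1) ⊑ (B2,D2) ⇔ B1 ⊆ B2 ⇔ D2 ⊆ D1:

```python
from itertools import chain, combinations

# Enumerate the concepts of a small context by closing attribute subsets,
# then check the lattice ordering. Context as in the slides' example.
CONTEXT = {
    "Medical history": {"Symptoms", "Tests"},
    "Medication record": {"Symptoms", "Coverage"},
}
ATTRS = sorted(set().union(*CONTEXT.values()))

def objs_of(D):
    return frozenset(o for o, a in CONTEXT.items() if D <= a)

def attrs_of(B):
    common = set.intersection(*(CONTEXT[o] for o in B)) if B else set(ATTRS)
    return frozenset(common)

def concepts():
    """All pairs (B, D) with B' = D and D' = B."""
    subsets = chain.from_iterable(combinations(ATTRS, n) for n in range(len(ATTRS) + 1))
    return {(objs_of(set(D)), attrs_of(objs_of(set(D)))) for D in subsets}

cs = concepts()
# Antitone ordering: more objects <=> fewer attributes, for every pair.
ordered_ok = all((B1 <= B2) == (D2 <= D1) for B1, D1 in cs for B2, D2 in cs)
```

For this context the enumeration yields four concepts: the top (both databases, {Symptoms}), one per database, and the bottom (no objects, all attributes).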

introduces a comment line and helps relate parts of the body with activities from Fig. 1. Arguments to w are listed inside parentheses in line 1; following the Prolog syntax convention, variable names start with an uppercase letter. These variables correspond to the names of activities and data items from the BPMN diagrams, including those data sources which are not explicit in the diagram: in this case the databases from which the medical history and medication record are retrieved. These databases have to be characterized in the result lattice, and in order to do so they are assigned variables D and E. To expose results for the sub-workflow from Fig. 2, the arguments to w include all activities and data items from the sub-workflow.

Simple activities (including external service invocation) are translated as goals of the shape α = ϕ(Γ), where α stands for an activity, Γ is a sequence of all data items used as inputs by the activity, and ϕ is an uninterpreted function symbol (whose particular name is not relevant for sharing analysis, and has been chosen to recall the activity name). This is followed by goals of the same shape where the left-hand side of "=" stands for a data item produced by the activity, and the Γ part on the right-hand side includes the data items used in the computation of that data item. For instance, goals A1=f1(X,D) and Y=f1_Y(X,D) in lines 2 and 3 represent the fact that a1 uses data items x and d as inputs to produce data item y. The only exception in w is the goal for sub-workflow a4 (line 7), to be discussed below.

The ordering of activities in the body of a clause must respect data dependencies, in the sense that data items should appear as arguments in a goal only if they are produced by a preceding activity. The ordering also needs to respect control dependencies arising from explicit sequences and joins (AND and OR). Otherwise, as in the AND-split case, the relative order of activities as goals in the body of a Horn clause is

[Fig. 5. Concept lattices for the contexts from Fig. 4: (a) concept lattice for medical databases (attributes Symptoms, Tests, Coverage; objects Medical history, Medication record); (b) concept lattice for identity documents (attributes Name, Address, PIN, SSN; objects Passport, Driving License, National ID, Soc. Sec. Card).]

 1  w(X,D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5):-
 2      A1=f1(X,D),                      % a_1
 3      Y=f1_Y(X,D),
 4      A2=f2(X,E),                      % a_2
 5      Z=f2_Z(X,E),
 6      A3=f3(Y,Z),                      % a_3
 7      a_4(Y,Z,A4,A41,C,A42,P),         % a_4
 8      A5=f5(X).                        % a_5
 9
10  a_4(Y,Z,A4,A41,C,A42,P):-
11      w2(Y,Z,A41,C2,A42,P2),
12      A4=f(P2),
13      a_4x(Y,Z,C2,P2,C,P,A4,A41,A42).
14
15  a_4x(_,_,C,P,C,P,_,_,_).
16  a_4x(X,Z,_,_,C,P,A4,A41,A42):-
17      a_4(X,Z,A4,A41,C,A42,P).
18
19  w2(Y,Z,A41,C,A42,P):-
20      A41=f41(Y,Z),                    % a_41
21      C=f41_C(Y),
22      A42=f42(C),                      % a_42
23      P=f42_P(C).

Fig. 6. Horn clause program encoding for the medication prescription workflow.

not significant from the sharing analysis point of view [8], and one such ordering can always be found, unless there is a race condition between potentially parallelized activities that try to read/write the same data item. This is not the case in our example, and the possibility of this happening can be statically detected from the structure of the workflow. Also note that we include both branches of the XOR-split, since the data in a5 can be affected by either one of them. The workflow for the component activity a4 is effectively a repeat-until loop, and its body (activities a41 and a42) is translated in lines 19-23 in the same manner as w.

The goal for a4 in the definition of w (line 7) is a call to a predicate a_4 defined in lines 10-13. Its loop structure is translated by introducing auxiliary clauses in lines 15-17 that represent the case of loop exit (line 15) and the loop iteration by means of a recursive call. The call to the body of the loop (w2 in line 11) is translated before the call to the auxiliary predicate a_4x.

B. Input Substitutions

An input substitution sets up the initial sharing (and therefore which attributes are shared) between the input top-level variables. It is a mapping from the variables that represent the data items given as input to the workflow to subsets of the "hidden" variables which represent attributes.

Variable sharing can be represented as a lattice where nodes represent variable sets which share a unique hidden variable. The structure of the sharing lattice can be directly derived from the input concept lattice by assigning a hidden variable to each attribute in the input context. For clarity, hidden variables are named after the corresponding attributes. Next, the top-level variables are mapped to objects from the input context, and,


The concept lattice is often shown using a variant of Hasse diagrams (bottom left). Nodes represent concepts. The top concept is visually at the top, and the bottom concept is visually at the bottom.

Callouts show objects new to the concept (inherited by all upwards nodes) above the line, and attributes new to the concept (inherited by all downwards nodes) below the line.

The example concept lattices (bottom left) correspond to the sample contexts for medical databases and personal identification documents.

Page 24: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

3.2 Horn Clause Programs

Page 25: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

About Logic Programs

Logic programs represent a computation task as a set of logical rules and facts.

Logical rules model if-then inferences:
• B1 ∧ B2 ∧ · · · ∧ Bn → H: if B1, . . . , Bn are all true (n > 0), then we conclude that H is true.
• Often written as H ← B1 ∧ B2 ∧ · · · ∧ Bn
• H is the head of the rule
• B1 ∧ · · · ∧ Bn is the body of the rule
• H ← (the case n = 0) is a fact (H is always true)

Example
ancestor(x, y) ← parent(x, y)                    (a parent is an ancestor)
ancestor(x, y) ← parent(z, y) ∧ ancestor(x, z)   (a parent's ancestor is an ancestor)

Logic programming is one of the classical programming paradigms, along with imperative, object-oriented and functional programming.

The example gives rules for the "x is an ancestor of y" relation, written as ancestor(x, y), using also the parent(x, y) relation.

Logic programs are declarative, because they state the rules and the problem to be solved (e.g. finding somebody's ancestors or descendants), not the sequence of steps to solve it. That makes logic programs relatives of SQL, but far more powerful.

Page 26: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Elements of Horn Clause Programs

Elements of logic programs include:
• Predicates that describe logical properties or relations, such as ancestor/2 and parent/2 (where /n means "with n arguments")
• Atoms that apply predicates, such as "x is an ancestor of y", written as ancestor(x, y)
• Variables x, y, z that stand for arbitrary objects in a rule (implicitly ∀-quantified)
• Constants that name distinct objects (such as Alice, Bob, Carol and Dennis below)

In Horn clause program rules, H and each of B1, . . . , Bn are atoms.

Continued Example: Parent Fact Database
parent(Alice, Bob) ←
parent(Dennis, Bob) ←
parent(Carol, Dennis) ←

The elements of logic programs (predicates, terms, variables, constants, etc.) correspond to the notions in First Order Logic (FOL).

As in FOL, we assume that predicate and constant names refer to distinct entities, unlike differently named variables, which can refer to the same object.

Also, p/1 and p/2 are two different predicates (with one and two arguments, respectively) even though they share the same name p.

The simplified structure of Horn clause programs allows efficient reasoning, i.e. derivation of logical consequences from known facts and rules.

Page 27: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Executing Horn-Clause Programs

Executing a logic program means searching for a proof of a logical statement known as the query, finding variable values along the way.

Sample Query 1: Find Bob's ancestors
Query: ancestor(x, Bob)
Answers: x = Alice, x = Carol, x = Dennis

Sample Query 2: Find Carol's descendants
Query: ancestor(Carol, y)
Answers: y = Dennis, y = Bob

Sample Query 3: Find Alice's ancestors
Query: ancestor(x, Alice)
Answer: no solution (cannot prove for any x)

The sample queries compute different things (or fail, in the last case) depending on the query. In C or Java we would have to program separate procedures for "find person's ancestors", "find person's descendants", etc.

In case of success, the variables in the query may point to objects for which the query can be proven from the program.

The "magic" is done by the under-the-hood inference engine that takes the program and the query and performs a systematic search for a proof.

The result may be a failure, or a single or multiple solutions (possibly an infinite number of them).
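A Prolog engine answers these queries by top-down proof search; the same answer sets can be sketched in Python by bottom-up (forward-chaining) saturation of the two ancestor rules over the parent facts. This is an illustrative sketch, not how a Prolog engine actually works internally:

```python
# Forward-chaining sketch of the ancestor program from the slides.
#   ancestor(x,y) <- parent(x,y)
#   ancestor(x,y) <- parent(z,y) /\ ancestor(x,z)
PARENT = {("Alice", "Bob"), ("Dennis", "Bob"), ("Carol", "Dennis")}

def ancestor_pairs():
    anc = set(PARENT)          # first rule: every parent is an ancestor
    while True:                # apply the second rule up to a fixpoint
        new = {(x, y) for (z, y) in PARENT for (x, z2) in anc if z2 == z}
        if new <= anc:
            return anc
        anc |= new

ANC = ancestor_pairs()
bobs_ancestors = {x for (x, y) in ANC if y == "Bob"}        # ancestor(x, Bob)
carols_descendants = {y for (x, y) in ANC if x == "Carol"}  # ancestor(Carol, y)
alices_ancestors = {x for (x, y) in ANC if y == "Alice"}    # ancestor(x, Alice)
```

The three comprehensions correspond to the three sample queries: the first two succeed with the answer sets shown on the slide, and the last one comes back empty, matching the failed query.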

Page 28: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Handling Structures

Structured terms have the shape f(t1, t2, . . . , tm), m > 0, where f is a functor, and each ti is again a term.

Example: Peano Arithmetic
Program: number(0) ←
         number(s(x)) ← number(x)
         succ(x, s(x)) ←

Query 1: number(x)
Answers: x = 0, x = s(0), x = s(s(0)), x = s(s(s(0))), . . .

Query 2: succ(x, s(0))
Answer: x = 0

Lists are common structures, with constant [ ] representing the empty list, and functor "." (dot) used to put together the head and the tail:
• [4] is the same as .(4, [ ])
• [1, 2, 3, 4] is the same as .(1, .(2, .(3, .(4, [ ]))))

Note that f(t1, . . . , tn) is NOT a function call in the sense of C, Python or Haskell. It can be thought of as a data record with name f and n fields. For n = 0, f is simply a constant.

It goes without saying that structured terms can be (and often are) nested, as in the examples of Peano arithmetic and lists.

Lists are very frequently used data structures, and are a common tool in logic programming. However, structured terms can be used to represent nodes in a tree or a graph, records that store information, or other kinds of data containers we need.

Page 29: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Unification and Substitutions

An atom of the form t1 = t2 expresses syntactic equality.
• it succeeds if t1 and t2 are identical, or can be made identical by substituting some variables in t1 and t2 for terms.
• we are interested in the substitution which introduces the least amount of information: the most general unifier (MGU)
• a substitution maps (binds) variables to terms

Unification Examples
Unification            MGU
1 = 0                  none (failure)
s(0) = s(x)            θ = {x ↦ 0}
f(0) = s(x)            none (failure)
s(x) = s(y)            θ = {x ↦ y}
f(s(x), x) = f(z, 1)   θ = {z ↦ s(1), x ↦ 1}
f(s(x), y) = f(1, z)   none (failure)

Running a query means finding a substitution that makes the query true, built by composing the MGUs from each B1, . . . , Bn in a rule body.

Note that t1 = t2 is just a nicer way of writing =(t1, t2). Unification is implicit in parameter passing in clause heads. For instance, we can rewrite the rule "succ(x, s(x)) ←" as "succ(x, y) ← y = s(x)". Equally, the rule "number(s(x)) ← number(x)" can be rewritten as "number(y) ← y = s(x) ∧ number(x)".
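As an illustrative sketch (not the slides' own code), a tiny unifier in Python can reproduce the MGU table above. Variables are strings starting with "?", compound terms are tuples (functor, arg1, ..., argn), and the occurs check is omitted for brevity:

```python
# Minimal syntactic unification (no occurs check), reproducing the MGU table.
# Variables: strings starting with '?'. Terms: tuples (functor, *args).

def walk(t, sub):
    """Follow variable bindings until an unbound variable or a term."""
    while isinstance(t, str) and t in sub:
        t = sub[t]
    return t

def unify(t1, t2):
    """Return an MGU as a dict, or None if unification fails."""
    sub, stack = {}, [(t1, t2)]
    while stack:
        a, b = (walk(t, sub) for t in stack.pop())
        if a == b:
            continue
        if isinstance(a, str):                # bind a variable
            sub[a] = b
        elif isinstance(b, str):
            sub[b] = a
        elif a[0] == b[0] and len(a) == len(b):
            stack.extend(zip(a[1:], b[1:]))   # decompose same functor/arity
        else:
            return None                       # functor or arity clash
    return sub

def resolve(t, sub):
    """Apply the substitution exhaustively, for inspecting results."""
    t = walk(t, sub)
    return tuple([t[0]] + [resolve(x, sub) for x in t[1:]]) if isinstance(t, tuple) else t

zero, one, s = ("0",), ("1",), lambda t: ("s", t)
```

For example, unify(s(zero), s("?x")) yields the binding of x to 0 from the table, while unify(one, zero) fails because the functors clash.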

Page 30: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Prolog and Friends

Prolog is a programming language based on Horn clause rules.

Concrete language syntax:
• clauses end with a full stop (".")
• uses ":-" instead of "←", comma instead of "∧"
• variables start with uppercase letters or "_"
• predicate names, functors and constants start with lowercase letters

Powerful analysis tools and techniques based on "clean" program semantics.

Examples in Prolog
Ancestors:    ancestor(X,Y):- parent(X,Y).
              ancestor(X,Y):- parent(Z,Y), ancestor(X,Z).

Peano arith.: number(0).
              number(s(X)):- number(X).
              add(0, X, X):- number(X).
              add(s(X), Y, s(Z)):- add(X,Y,Z).

The full Prolog language includes "impure" features, such as dynamic fact updates and I/O. Modern Prolog systems contain extensions such as constraint logic programming (CLP) and tabling.

However, "pure" Prolog programs have a close relationship with logical theories. Reasoning about them in a sound fashion is easier than in other executable formalisms.

We will use Prolog to encode objects and attributes in an executable form which will capture the structure of a workflow, and Prolog analysis tools to automatically derive attributes.

Page 31: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

3.3 Workflows in Horn Clause Form

Page 32: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Anatomy of a Workflow

In general, workflows may contain complex data and control dependencies:
• sequences, conditional branches, and loops
• parallel flows, with pre- and post-conditions
• data items are read and written by activities
• inputs: possibly complex XML information sets

[Diagram: data items x, y, z and pairs 〈x, y〉, 〈y, z〉 flowing between activities; which items reach which points (x? y? 〈x, y〉?) is not obvious.]

Understanding how data is handled throughout the workflow is non-trivial.
• what information items / parts are used? where?

There are many concrete workflow definition languages that can be used to specify control structures and data operations. Here, we use an abstract workflow representation where both control and data dependencies are shown explicitly.

Analyzing the content of data items at all points in a workflow is an instance of a general program analysis problem. To solve it, especially in the presence of loops and complex data structures, approximation techniques such as abstract interpretation are usually needed.

Page 33: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Example of (Enriched) Workflow

To make the analysis of workflow control and data dependencies easier, let us first "distill" our BPMN workflow example into a simplified abstract form below (elements to be clarified in the slides that follow).
• We keep only the activity tags (a1, . . . , a5), control dependencies between them, and labels for data items read/written by the activities.
• We abstract the looping in the sub-workflow as a structured activity of repeat-until type with a separate body sub-workflow.

[Workflow diagram: activities a1 (reads x, d; writes y) and a2 (reads x, e; writes z) run in parallel (AND); after the join, either a3 (reads y, z) or a4 (reads y, z) executes; a4 is a repeat-until loop (exit depends on p) whose body contains a41 (reads y, z; writes c) followed by a42 (reads c; writes p); an OR-join leads to a5 (reads x).]

C′ = {pre–a42 ≡ done–a41}
C = {pre–a4 ≡ done–a1 ∧ ¬succ–a1 ∧ done–a2,
     pre–a3 ≡ done–a1 ∧ succ–a1 ∧ done–a2,
     pre–a5 ≡ done–a3 ∨ done–a4}

Page 34: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Example of (Enriched) Workflow (cont.)

[Workflow diagram and precondition sets C and C′ repeated from the previous slide.]

Workflow control structure:
• includes activities a1, . . . , a5
• arrows show control dependencies (e.g. a4 depends on a1 and a2)
• independent activities may run in parallel (e.g. a1 and a2)
• different join types (AND/OR)

Data dependencies based on read/write annotations:
• an Ri/Wi annotation for each activity ai
• Ri is the set of data items read, Wi is the set of data items written

Page 35: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Example of (Enriched) Workflow (cont.)

[Workflow diagram and precondition sets C and C′ repeated from the previous slide.]

Set C of logical control preconditions:
• activity preconditions pre–ai expressed using propositional formulas
• done–aj means "aj has finished"
• succ–aj means "aj has achieved its (user-defined) goal"
• easily models sequences and AND/OR/XOR parallel flows

Helps detect possible deadlocks and race conditions:
• deadlocks appear in case of circular dependencies (pre–ai → done–ai)
• race conditions appear when two activities that read/write the same data item can execute in parallel

Page 36: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Example of (Enriched) Workflow (cont.)

[Workflow diagram and precondition sets C and C′ repeated from the previous slide.]

Based on control preconditions, we can find legal orderings of activities that respect the preconditions:
• only if there are no deadlocks/race conditions (that can be efficiently checked using e.g. SAT solvers)

All legal orderings are equivalent from the point of view of data handling.
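One legal ordering can be obtained, for instance, by a topological sort over the done-dependencies read off the precondition set C. This is a sketch under simplifying assumptions: activity names follow the running example, and both XOR branches are kept, as in the translation:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# done-dependencies extracted from the precondition set C of the example.
DEPS = {
    "a1": set(), "a2": set(),
    "a3": {"a1", "a2"},     # pre-a3 needs done-a1 and done-a2
    "a4": {"a1", "a2"},     # pre-a4 needs done-a1 and done-a2
    "a5": {"a3", "a4"},     # pre-a5: done-a3 or done-a4; scheduling after
                            # both is safe since both branches are included
}
order = list(TopologicalSorter(DEPS).static_order())
```

Any ordering returned this way respects the preconditions; a circular dependency (a potential deadlock) would surface as a graphlib.CycleError instead of an ordering.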

Page 37: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Example of (Enriched) Workflow (cont.)

[Workflow diagram and precondition sets C and C′ repeated from the previous slide.]

Sub-workflows can be used to model complex constructs:
• in our case, activity a4 is a repeat-until loop
• the body of the loop is a sub-workflow (with a41 and a42)

Sub-workflows also allow modular development and/or assembly of workflows.

Page 38: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Workflow as a Horn Clause Program

We represent the workflow symbolically in Horn clause form for further analysis.
• the representation is not operationally equivalent

The predicate w stands for the workflow:
• clause body reflects a legal ordering of activities
• variables stand for data items and activities

Sub-workflows and complex activities are in separate predicates.

w(X,D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5):-

A1=f1(X,D), % a_1

Y=f1_Y(X,D),

A2=f2(X,E), % a_2

Z=f2_Z(X,E),

A3=f3(Y,Z), % a_3

a_4(Y,Z,A4,A41,C,A42,P), % a_4

A5=f5(X). % a_5

a_4(Y,Z,A4,A41,C,A42,P):-

w2(Y,Z,A41,C2,A42,P2),

A4=f4(P2),

a_4x(Y,Z,C2,P2,C,P,A4,A41,A42).

a_4x(_,_,C,P,C,P,_,_,_).

a_4x(X,Z,_,_,C,P,A4,A41,A42):-

a_4(X,Z,A4,A41,C,A42,P).

w2(Y,Z,A41,C,A42,P):-

A41=f41(Y,Z), % a_41

C=f41_C(Y),

A42=f42(C), % a_42

P=f42_P(C).

Note that in Prolog syntax, an underscore ("_") represents a new, fresh variable that stands for an arbitrary term.

Predicate w2 represents the body of the loop, and predicates a_4 and a_4x model the repeat-until construct.

Page 39: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Workflow as a Horn Clause Program (cont.)

For each activity, we model data dependencies with unifications:
• Ai = fi(Ri) stands for "activity ai reads data items from set Ri"
• for each item z ∈ Wi written by ai, z = fiz(Q) stands for "z is written using data items from Q ⊆ Ri"

Such a Horn clause representation can be derived mechanically and, in principle, automatically.

[Horn clause program repeated from the previous slide.]

Choice of functors (fi and fiz) is purely symbolic and is not significant for the subsequent sharing analysis. The purpose of the unifications in the Horn clause representation is to express functional dependencies between activities and data items in a workflow, and not to actually calculate them.
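The mechanical derivation can be sketched as simple string generation from the R/W annotations. The helper below is hypothetical (not part of the S-Cube tooling); the naming conventions follow the listing above:

```python
# Sketch: generate the unification goals of the Horn clause body from
# read/write annotations. For activity ai with reads Ri: "Ai=fi(Ri)";
# for each written item z with inputs Q subset of Ri: "z=fi_z(Q)".
def activity_goals(i, reads, writes):
    goals = ["A%s=f%s(%s)" % (i, i, ",".join(reads))]
    for item, q in writes.items():
        goals.append("%s=f%s_%s(%s)" % (item, i, item, ",".join(q)))
    return goals

# a1 reads x,d and writes y from both inputs (as in the example workflow).
g1 = activity_goals("1", ["X", "D"], {"Y": ["X", "D"]})
```

Applied to a1, this reproduces exactly the goals A1=f1(X,D) and Y=f1_Y(X,D) from the listing; a write-free activity such as a3 yields only its read goal.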

Page 40: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

3.4 Sharing Analysis

Page 41: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Sharing in Logic Programs

Sharing analysis of logic programs tries to infer how data is shared between variables:
• sharing is always relative to a substitution θ upon successful execution of a query
• two variables x, y are said to share if the terms xθ and yθ (i.e. after applying θ to x and y) contain some common variable.

Example
θ = {x ↦ s(y)}:                  xθ = s(y), yθ = y                → x and y share
θ = {}:                          xθ = x, yθ = y                   → x and y do NOT share
θ = {x ↦ s(w), y ↦ f(1, z)}:     xθ = s(w), yθ = f(1, z)          → x and y do NOT share
θ = {x ↦ [1, w, f(z)],
     y ↦ n(z, s(w))}:            xθ = [1, w, f(z)], yθ = n(z, s(w)) → x and y share w and z

Sharing analysis tries to find out all possible sharings between variables in case of successful executions. This requires inclusion of all possible substitutions on exit from a query.

That is generally impractical and often impossible, since there may be many or even an infinite number of possible substitutions to be taken into account.

To make sharing analysis viable, we often resort to some sort of approximation that reduces the repertoire of possible sharing cases to a finite, manageable size, while remaining safe, i.e. not missing any potential sharing.
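The definition of sharing can be turned into a direct check. This is a Python sketch with a toy term representation (variables are plain strings, compound terms are tuples), covering the four cases from the slide:

```python
# Two variables share under theta iff their instantiated terms contain a
# common variable. Variables: strings; compound terms: (functor, *args).
def term_vars(t, theta):
    """Set of variables occurring in t after applying theta."""
    if isinstance(t, str):
        return term_vars(theta[t], theta) if t in theta else {t}
    return set().union(set(), *(term_vars(a, theta) for a in t[1:]))

def share(x, y, theta):
    return bool(term_vars(x, theta) & term_vars(y, theta))

# The four example substitutions from the slide (lists written as a
# "list" functor for simplicity):
t1 = {"x": ("s", "y")}
t2 = {}
t3 = {"x": ("s", "w"), "y": ("f", ("1",), "z")}
t4 = {"x": ("list", ("1",), "w", ("f", "z")), "y": ("n", "z", ("s", "w"))}
```

Under t1 and t4 the check reports sharing (through y, and through w and z, respectively); under t2 and t3 it reports none, matching the table.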

Page 42: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Abstract Substitution Domain

Instead of looking at a (possibly infinite) number of concrete substitutions, we can perform analysis on a simplified abstract level.

Abstract substitutions approximate terms with sets of contained variables (not concerned with the exact shape of terms):

Concrete: θ = {x ↦ f(u, g(v)), y ↦ h(5, u), z ↦ i(v, w)}
Abstract: Θ = {{x, y}, {x, z}, {z}}
(u is shared by x and y; v by x and z; w by z alone)

By operating in the abstract substitution domain, the analysis task becomes simplified and finite. The shown abstract substitution domain is not the only applicable choice. For instance, we can work with pair-wise sharing, etc. Different sharing domains also differ with respect to the computational cost of the analysis and precision. The domain used here is known to be more precise (in the sense of avoiding over-approximation), but exponential in time with respect to the number of variables involved. It is also often combined with additional freeness or groundness information.
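The abstraction itself can be sketched as grouping: for every variable u occurring in the instantiated terms, collect the set of top-level variables whose terms reach u. This is an illustrative sketch with the same toy term representation as in the earlier examples, not the CiaoPP implementation:

```python
# Abstracting a concrete substitution into the set-sharing domain: each
# element of Theta is the set of top-level variables that share one
# particular variable occurring in their instantiated terms.
def term_vars(t, theta):
    if isinstance(t, str):
        return term_vars(theta[t], theta) if t in theta else {t}
    return set().union(set(), *(term_vars(a, theta) for a in t[1:]))

def abstract(theta, top_vars):
    sharers = {}
    for v in top_vars:
        for u in term_vars(v, theta):
            sharers.setdefault(u, set()).add(v)
    return {frozenset(s) for s in sharers.values()}

# The slide's example: u shared by x,y; v shared by x,z; w private to z.
theta = {"x": ("f", "u", ("g", "v")), "y": ("h", ("5",), "u"), "z": ("i", "v", "w")}
Theta = abstract(theta, ["x", "y", "z"])
```

Running this on the slide's θ produces exactly Θ = {{x, y}, {x, z}, {z}}: one sharing set per distinct variable in the instantiated terms.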

Page 43: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Workflow Input Substitutions

We include the information on user-defined attributes of input data to the workflow by setting up the initial substitution for the inputs (x, d, e in our case):

init1(X, D, E):-

X= f1(Name, Pin),

D= f2(Symptoms, Tests),

E= f3(Symptoms, Coverage).

Reflects positioning of inputs in the initial context / concept lattice.The initial concrete substitution coded here maps to the initial abstractsubstitution Θ = {{x}, {d , e}, {d}, {e}}

• “x has some components not shared with d or e”
• “d and e share something” (Symptoms), but
• “both d and e have some private (not shared) components” (Tests and Coverage, respectively)

Again, note that the choice of functors (f1, f2, f3) and of the variable names that stand for the attributes (Name, Pin, etc.) is not significant for the abstract sharing analysis.
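Since only which attributes are shared matters, the initial abstract substitution can be computed directly from the attribute sets, with the attributes playing the role of hidden variables (a sketch; the function name is illustrative):

```python
def sharing_from_attributes(attrs):
    """attrs maps each input variable to its set of user-defined
    attributes, which act as the hidden variables of the initial
    substitution; returns the corresponding abstract substitution."""
    universe = set().union(*attrs.values())
    return {frozenset(v for v, a in attrs.items() if u in a)
            for u in universe}

attrs = {"x": {"Name", "Pin"},
         "d": {"Symptoms", "Tests"},
         "e": {"Symptoms", "Coverage"}}
# Reproduces the initial abstract substitution {{x}, {d, e}, {d}, {e}}:
assert sharing_from_attributes(attrs) == {
    frozenset({"x"}), frozenset({"d", "e"}),
    frozenset({"d"}), frozenset({"e"})}
```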

Page 44: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

3.5 Obtaining and Interpreting Results

Page 45: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Sharing Results

init1(X,D,E) :-
    X = [Name,PIN],
    D = [Symptoms,Tests],
    E = [Symptoms,Coverage].

init2(X,D,E) :-
    X = [Name,Address,SSN],
    D = [Symptoms,Tests],
    E = [Symptoms,Coverage].

Fig. 7. Initial substitution for the two cases.

therefore, to subsets of the hidden input variables (which are nodes in the sharing lattice). The ordering a ⊑ b between top-level variables a and b in the sharing lattice holds iff A ⊆ B, where A and B are the corresponding subsets of the associated hidden input variables. It directly follows that ⊑ in the sharing lattice is the exact opposite of ≤ in the concept lattice.

In the text that follows, we will use two cases for input substitutions:

Case 1 Patient ID (item x) maps to the Passport object (has the attributes Name and PIN).

Case 2 Patient ID (item x) maps to the Social Security Card object (has the attributes Name, Address and SSN).

In both cases, the data source d for medical histories maps to the Medical history DB object (attributes Symptoms and Tests), and the data source e for medication records maps to the Medication record DB (attributes Symptoms and Coverage).

The input substitution is easy to produce: the input variables are made equal to (i.e., made to share with) terms that contain exactly the associated hidden variables from the input sharing lattice (Fig. 7). The actual shape of the terms is not significant, and therefore we just use lists of variables associated to the attributes.

C. Obtaining Sharing Analysis Results

The sharing analysis is applied to the program resulting from the translation of the workflow (Fig. 6) and the code that sets up the initial substitutions (Fig. 7). The underlying theoretical framework we use is abstract interpretation [2], which interprets a program by mapping concrete, possibly infinite sets of values onto (usually finite) abstract domains and reinterprets the operations of the language in a way that respects the original semantics of the language. The abstract approximations of the concrete behavior are safe in the sense that properties proved in the abstract domain necessarily hold in the concrete case. However, its precision depends in general on the problem and on the choice of the abstract domain.

The analysis is run using CiaoPP [7], [6], a tool for the analysis and transformation of logic programs featuring, among others, a powerful sharing analysis. While the sharing analysis we used is, in pathological cases, exponential in the number of variables in a clause, in our experience it exhibits

1 [X,D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5]
2 [X,D,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5]
3 [X,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5]
4 [X,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5]
5 [D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P]
6 [D,A1,Y,A3,A4,A41,C,A42,P]
7 [E,A2,Z,A3,A41]

(a) The resulting substitution

Top-level variables   Recovered hidden variables
X, A5                 {u1, u2, u3, u4}
E                     {u1, u3, u5, u7}
D                     {u1, u2, u5, u6}
A2, Z                 {u1, u2, u3, u4, u5, u7}
A1, Y, A42, C, P      {u1, u2, u3, u4, u5, u6}
A3, A4, A41           {u1, u2, u3, u4, u5, u6, u7}

(b) Points in the resulting sharing lattice.
Fig. 8. Abstract substitution and the recovered hidden variables.

Fig. 9. The resulting concept lattice (attribute nodes u1, ..., u7; object concepts: x, a5; d; e; a2, z; a1, y, p, a42, c; and, at the bottom, a3, a4, a41).

a reasonable speed in practice.¹

The output of the analysis is an abstract substitution (Fig. 8(a)), which is common to both cases of input data mapping. The difference will arise in the interpretation, as described in the next section. Each row (1-7) contains a subset of the top-level variables (representing items and activities in the workflow) that share at least one unique hidden variable. The minimal set of hidden variables which can explain that abstract substitution can be easily recovered [8]: for each line i = 1...7 in Fig. 8(a), we introduce a new output hidden variable ui (arbitrarily but uniquely named), and we assign to each top-level variable all hidden variables corresponding to the rows in which it appears. The result is shown in Fig. 8(b), where each row shows the top-level variables associated to a set of the output hidden variables.

V. FROM SHARING BACK TO CONCEPTS

The mapping of intermediate variables (those which are not initial top-level variables) to subsets of the hidden output variables carries the information on the relationship between these variables that the sharing analysis inferred. However, the

¹The results presented here were obtained in 1.192 ms using CiaoPP running on an Apple MacBook computer with an Intel Core Duo processor, 2 GB of RAM and MacOS X 10.6.5.

The resulting abstract substitution (a) shows sharing for data items and activities.

It is as if the data item and activity variables shared a set of hidden variables u1, ..., u7 (b).

The sharing results shown in (a) were obtained from the sharing and freeness (shfr) analysis in the CiaoPP analysis suite.

The results are safe in the sense that all possible sharing is included. However, they may contain a degree of over-approximation, i.e. the analysis may conservatively assume sharing where it cannot be ruled out with certainty.

From an abstract substitution, we do not know which variables are shared, so one possibility is to “recover” a sufficient number of hidden variables that are shared in a manner compatible with the abstract substitution.

Page 46: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Minimal Hidden Variable Recovery

How many hidden variables are needed to comply with the sharing results?

• As many as there are sharing settings.
• The hidden variables are counterparts of the user-defined attributes used in the input substitution.

A straightforward algorithm recovers a minimal set of resulting hidden variables U:

function RecoverSubstVars(V, Θ)
    n ← |Θ|;  U ← {u1, u2, ..., un}      ▹ n = |Θ| fresh variables in U
    S : V → ℘(U);  S ← const(∅)          ▹ the initial value for the result
    for x ∈ V, i ∈ {1..n} do             ▹ for each variable and subst. setting
        if x ∈ Θ[i] then                 ▹ if the variable appears in the setting
            S ← S[x ↦ S(x) ∪ {ui}]       ▹ add ui to its resulting set
        end if
    end for
    return U, S
end function

Fig. 5. The minimal substitution variable set recovery algorithm.
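The algorithm of Fig. 5 can be transcribed directly into Python (a sketch; the dictionary-based representation of S and the list-of-sets encoding of Θ are assumptions of this example):

```python
def recover_subst_vars(V, Theta):
    """Recover a minimal set U of hidden variables and a map S from each
    variable in V to the subset of U it shares, given the abstract
    substitution Theta (a list of sharing sets over V)."""
    U = [f"u{i}" for i in range(1, len(Theta) + 1)]  # one fresh var per setting
    S = {x: set() for x in V}
    for i, setting in enumerate(Theta):
        for x in setting:                # x appears in setting i+1, ...
            S[x].add(U[i])               # ... so u_{i+1} goes into S(x)
    return U, S

# The seven rows of Fig. 8(a), projected onto the inputs X, D, E:
rows = [{"X", "D", "E"}, {"X", "D"}, {"X", "E"}, {"X"},
        {"D", "E"}, {"D"}, {"E"}]
U, S = recover_subst_vars({"X", "D", "E"}, rows)
assert S["X"] == {"u1", "u2", "u3", "u4"}      # as in Fig. 8(b)
assert S["E"] == {"u1", "u3", "u5", "u7"}
assert S["D"] == {"u1", "u2", "u5", "u6"}
```

Running it on the projected rows of Fig. 8(a) reproduces the corresponding entries of Fig. 8(b).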

init1(f(W1,W2), f(W2)).
init2(f(W1), f(W2)).
init3(f(W2), f).

caseN(X1,X2,A1,Y1,A2,Y2,A3,Z1,A4,Z2) :-
    initN(X1,X2),
    w(X1,X2,A1,Y1,A2,Y2,A3,Z1,A4,Z2).

Case  Sharing results
1     [[X1,A1,Y1,A3,Z1], [X1,X2,A1,Y1,A2,Y2,A3,Z1,A4,Z2]]
2     [[X1,A1,Y1,A3,Z1], [X2,A2,Y2,A3,Z1,A4,Z2]]
3     [[X1,A1,Y1,A3,Z1]]

Input hidden variables:
{w1,w2}: 1: x1
{w1}:    1: x2;  2: x1
{w2}:    2: x2;  3: x1
{}:      3: x2

Output hidden variables:
{u1,u2}: 1: x1, a1, y1, a3, z1;  2: a3, z1
{u1}:    1: x2, a2, y2, a4, z2;  2: x1, a1, y1
{u2}:    2: x2, a2, y2, a4, z2;  3: x1, a1, y1, a3, z1
{}:      3: x2, a2, y2, a4, z2

Fig. 6. The initial settings and the sharing results.

oped which give users the possibility of running different analysis algorithms on input programs. We build on one of the sharing analyses available in CiaoPP.

In the sharing and freeness domain, an abstract substitution Θ is a subset of all possible sharing sets. Each sharing set is a subset of the variables in the program. The meaning of this set is that those variables may be bound at run-time to terms which share some common variable. E.g., if θ = {x ↦ f(u), y ↦ g(u,v), z ↦ w}, then the corresponding abstract substitution (projected over x, y, and z) is Θ = {{x,y},{y},{z}}. Note that if u is further substituted (with some term t), x and y will be jointly affected; if v is further substituted, only y will be affected, and if w is further substituted only z will be affected.

Although an abstract substitution represents an infinite family of concrete substitutions (e.g., θ′ = {x ↦ v, y ↦ h(w,v), z ↦ k(u)} in the previous example), it is always possible to construct a minimal concrete substitution exhibiting that sharing, by taking as many auxiliary variables as there are sharing sets, and constructing terms over them. Furthermore, if one is only interested in the sets of shared auxiliary variables, not the exact shape of the substituted terms, the simple algorithm from Figure 5 suffices. Note that only a ground variable x ∈ V can have S(x) = ∅.

4.3 Deriving Fragment Identification Information from Sharing

To derive fragment identification information from the sharing analysis results, the logic program workflow representation has to be analyzed in conjunction with some initial

The proof that it is sufficient to “invent” a hidden variable for each sharing setting in the resulting abstract sharing, and that fewer hidden variables than that would not do, follows from the definition of abstract sharing and the monotonicity of logic programs.

For any non-empty abstract substitution, there is an infinite number of compatible concrete substitutions even with a fixed set of hidden variables, because the shape of terms may be arbitrary.

That suffices in our case, because we just want to know what is shared and not exactly how.

Page 47: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Resulting Lattice (Recovered)


We can now construct the resulting concept lattice:

• activity and data item variables as objects
• recovered hidden variables as attributes
• activities are highlighted

To interpret the resulting lattice in terms of the original user-defined attributes, we observe that sharing analysis preserves the ordering between concepts in the lattice. That means that the “lesser” concepts in the resulting lattice inherit all original attributes from the input data items (shown in boldface).

Therefore, we use the resulting lattice over hidden variables as a skeleton to “paint” intermediate data items and activities with the original user-defined attributes.

Page 48: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Resulting Context

Finally, after assigning user-defined attributes to concepts in the resulting lattice, we can create the resulting context:

Item               Name  PIN  Symp.  Tests  Cover.
x                   X     X
d                               X      X
e                               X             X
a2, z               X     X     X             X
a1, y, p, a42, c    X     X     X      X
a3, a4, a41         X     X     X      X      X
a5                  X     X

Item               Name  Address  SSN  Symp.  Tests  Cover.
x                   X      X       X
d                                        X      X
e                                        X             X
a2, z               X      X       X     X             X
a1, y, p, a42, c    X      X       X     X      X
a3, a4, a41         X      X       X     X      X      X
a5                  X      X       X

Fig. 10. The resulting context for the two analysis cases.

meaning of these output hidden variables has to be interpreted in terms of the original attributes, starting with those of the input data items. The sharing analysis of course preserves the original relationship among the input top-level variables [8]: if two variables a and b were associated in the input sharing lattice to subsets of attributes A and B, respectively, such that A ⊆ B, then for the corresponding subsets A1 and B1 to which a and b map in the resulting sharing lattice, it is the case that A1 ⊆ B1.

The next step is to construct a result concept lattice (Fig. 9) based on the sharing analysis results, where data items and activities are considered as objects and the hidden variables in the result are considered as a new set of attributes. The activities are highlighted and framed, and the input data items from the input concept lattice are set in boldface. In this lattice we first assign the original attributes to the input data items, and then pass them down to all the lower-level concepts. We then obtain the resulting contexts (Fig. 10) for the two initial cases mentioned above. Note that only the attributes that are associated with some input data item may appear.
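The attribute-propagation step can be sketched in Python (an illustrative assumption of this example: a concept inherits the attributes of every input item whose recovered hidden-variable set its own set contains; the data below is taken from Fig. 8(b) and Case 1):

```python
def resulting_context(hidden, input_attrs):
    """hidden: variable -> recovered hidden-variable set (Fig. 8(b));
    input_attrs: input item -> its original user-defined attributes."""
    ctx = {}
    for v, uv in hidden.items():
        ctx[v] = set()
        for item, attrs in input_attrs.items():
            if hidden[item] <= uv:       # v lies below item in the lattice,
                ctx[v] |= attrs          # so v inherits item's attributes
    return ctx

hidden = {"x":  {"u1", "u2", "u3", "u4"},
          "d":  {"u1", "u2", "u5", "u6"},
          "e":  {"u1", "u3", "u5", "u7"},
          "a5": {"u1", "u2", "u3", "u4"},
          "z":  {"u1", "u2", "u3", "u4", "u5", "u7"},
          "y":  {"u1", "u2", "u3", "u4", "u5", "u6"},
          "a3": {"u1", "u2", "u3", "u4", "u5", "u6", "u7"}}
attrs = {"x": {"Name", "PIN"},
         "d": {"Symptoms", "Tests"},
         "e": {"Symptoms", "Coverage"}}
ctx = resulting_context(hidden, attrs)
assert ctx["a5"] == {"Name", "PIN"}
assert ctx["z"] == {"Name", "PIN", "Symptoms", "Coverage"}
assert ctx["a3"] == {"Name", "PIN", "Symptoms", "Tests", "Coverage"}
```

The assertions reproduce the Case 1 rows of Fig. 10 for a5, for a2/z, and for a3/a4/a41.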

It should be noted that the construction of the resulting concept lattice can be done in polynomial time with respect to the number of objects (data items, activities) and attributes [13]. Different algorithms for the construction of concept lattices differ in performance over different types of sparse contexts.

We want to note that in the most general case sharing analysis is undecidable, and the results of the analyzer can be a safe over-approximation which can indicate sharing between variables when it could not be proved that there is definitely no sharing. However, when it indicates no sharing, then this is definitely the case. The assignment of attributes to the workflow elements should be interpreted accordingly: the absence of an attribute is always certain, but its presence is not guaranteed.

We can now go back to the application cases mentioned in Section II and illustrate how the information in the contexts in Fig. 10 can be applied.

a) Fragmentation: The organization responsible for medicine prescription may want to split the workflow among several partners, based on what kind of information they are allowed to handle. The basis for fragmentation is the resulting contexts from Fig. 10. An example of fragmentation is shown in Fig. 11. The swim lanes correspond to the health organization and its partners. Registry and Archive cannot handle Symptoms, Tests, or Coverage data, and is therefore assigned activity a5. Medical examiners can at most see Symptoms and Tests, and are thus assigned the activities a1 and a42. Medication providers can only take care of Symptoms and Coverage, and are assigned activity a2. All other activities (a3, a4 and a41) need full access and remain centrally handled by the health organization.
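This kind of assignment can be sketched as a small covering check (the partners' permitted attribute sets below are an assumption made for illustration, not part of the analysis output; activity attributes come from the Case 1 context):

```python
def assign_activities(activity_attrs, permitted):
    """Give each activity to the partner with the smallest permitted
    attribute set that still covers everything the activity sees."""
    assignment = {}
    for act, attrs in activity_attrs.items():
        ok = [p for p, allowed in permitted.items() if attrs <= allowed]
        assignment[act] = min(ok, key=lambda p: len(permitted[p]))
    return assignment

ID = {"Name", "PIN"}                     # Case 1 patient identification
permitted = {"Registry & Archive":  ID,
             "Medical Examiners":   ID | {"Symptoms", "Tests"},
             "Medication Provider": ID | {"Symptoms", "Coverage"},
             "Health Organization": ID | {"Symptoms", "Tests", "Coverage"}}
activity_attrs = {"a1": ID | {"Symptoms", "Tests"},
                  "a2": ID | {"Symptoms", "Coverage"},
                  "a3": ID | {"Symptoms", "Tests", "Coverage"},
                  "a5": ID}
a = assign_activities(activity_attrs, permitted)
assert a["a5"] == "Registry & Archive"
assert a["a1"] == "Medical Examiners"
assert a["a2"] == "Medication Provider"
assert a["a3"] == "Health Organization"
```

Under these assumed policies, the computed assignment matches the fragmentation of Fig. 11.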

b) Data Compliance: It may be known that a particular kind of information identifying a patient, such as his/her SSN, is required for retrieving the patient's medication record (activity a2), and that the patient's address is required for sending the results of tests (activity a42). It can therefore be detected at design time that unless the patient is identified with a Social Security Card, these activities will fail. The designer may either restrict the use of the workflow by requiring the card, or select implementations of the mentioned activities with weaker success preconditions.

c) Robust Top-Down Development: Based on the characterization of the input data items, designers can derive the attributes of the data items in nested workflows. For instance, the attributes of the medicine search criterion (c) and the prescription candidate (p) are inferred in Fig. 10 in a safe way.

VI. CONCLUSIONS

We have shown how an FCA-based characterization of input data to a workflow can be enriched to include intermediate data items and internal activities. These are annotated with attributes which are inferred from emergent properties of the workflow which stem from the workflow structure and relationships between input data. We have shown how this task can be automated by translating (a) an initial FCA into a lattice from which sharing conditions are derived and (b) the workflow structure into a logic program. Then, (a) and (b) are subjected to a sharing analysis, and the results are mapped back to a resulting lattice and that to a resulting context, whose information can be used as a starting point for a number of other tasks. We have illustrated this methodology with a worked example.

As future work, we plan to address the development of automatic translations from common business process specification languages (BPEL, XPDL, YAWL, etc.) into logic programs amenable to sharing analysis in order to further test and refine the techniques proposed herein. Besides, we plan to explore other applications of the concept of sharing to services, aiming not only at (local) data sharing between activities, but also looking towards the representation of stateful service conversations and quality aspects of services.

• The input data items (above the line) keep the initial attributes
• Intermediate data items and activities (below the line) are added with the assigned attributes

The resulting context is a simple tabular form that is presented to the user as the result of the sharing analysis. The user starts with the input context (above the line) and the workflow definition, while all other steps are intermediate, mechanical, and ideally fully automated.

For activities, the attributes indicate the properties of the data visible to (read by) an activity. For data items, the attributes describe the information content of the data and what it was derived from.

Note that the sharing analysis is conservative in the sense that all attributes that cannot be decidedly ruled out are included.

Page 49: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

4 Application to Fragment Identification

Page 50: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Fragmentation Example (Information Flow)

Main medical workflow, with swim lanes for the Health Organization, Medical Examiners, Medication Provider, and Registry & Archive. Activities: a1: Retrieve medical history; a2: Retrieve medication record; a3: Continue last prescription (if the patient is stable); a4: Select new medication (otherwise); a5: Log treatment. The nested workflow for service a4 runs a41: Run tests to produce medication criteria and a42: Search medication databases, repeating until the result is sufficiently specific.

Fig. 11. An example fragmentation for the drug prescription workflow.

ACKNOWLEDGMENTS

The research leading to these results has received funding from the European Community's 7th Framework Programme under the NoE S-Cube (Grant Agreement no. 215483). The authors were also partially supported by Spanish MEC project 2008-05624/TIN DOVES and CM project P2009/TIC/1465 (PROMETIDOS).

REFERENCES

[1] Claudio Carpineto and Giovanni Romano. Concept Data Analysis: Theory and Applications. Wiley, 2004.

[2] P. Cousot and R. Cousot. Abstract Interpretation: a Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. In ACM Symposium on Principles of Programming Languages (POPL'77). ACM Press, 1977.

[3] B. A. Davey and H. A. Priestley. Introduction to Lattices and Order. Cambridge University Press, 2nd edition, 2002.

[4] Walid Fdhila, Ustun Yildiz, and Claude Godart. A Flexible Approach for Automatic Process Decentralization Using Dependency Tables. In ICWS, pages 847–855, 2009.

[5] Bernhard Ganter, Gerd Stumme, and Rudolf Wille, editors. Formal Concept Analysis, Foundations and Applications, volume 3626 of Lecture Notes in Computer Science. Springer, 2005.

[6] M. Hermenegildo, G. Puebla, F. Bueno, and P. López-García. Integrated Program Debugging, Verification, and Optimization Using Abstract Interpretation (and The Ciao System Preprocessor). Science of Computer Programming, 58(1–2), 2005.

[7] M. V. Hermenegildo, F. Bueno, M. Carro, P. López, E. Mera, J. F. Morales, and G. Puebla. An Overview of Ciao and its Design Philosophy. Theory and Practice of Logic Programming, 2011. http://arxiv.org/abs/1102.5497.

[8] Dragan Ivanović, Manuel Carro, and Manuel Hermenegildo. Automatic Fragment Identification in Workflows Based on Sharing Analysis. In Mathias Weske, Jian Yang, Paul Maglio, and Marcelo Fantinato, editors, Service-Oriented Computing – ICSOC 2010, number 6470 in LNCS. Springer Verlag, 2010.

[9] D. Jordan et al. Web Services Business Process Execution Language Version 2.0. Technical report, IBM, Microsoft, et al., 2007.

[10] R. Khalaf and F. Leymann. Role-based Decomposition of Business Processes using BPEL. In IEEE International Conference on Web Services (ICWS'06), 2006.

[11] Rania Khalaf. Note on Syntactic Details of Split BPEL-D Business Processes. Technical Report 2007/2, IAAS, U. Stuttgart, July 2007.

[12] Oliver Kopp, Rania Khalaf, and Frank Leymann. Deriving Explicit Data Links in WS-BPEL Processes. In International Conference on Services Computing (SCC), 2008.

[13] Sergei O. Kuznetsov and Sergei A. Obiedkov. Comparing performance of algorithms for generating concept lattices. J. Exp. Theor. Artif. Intell., 14(2-3):189–216, 2002.

[14] J. W. Lloyd. Foundations of Logic Programming. Springer, 2nd extended edition, 1987.

[15] Daniel Martin, Daniel Wutke, and Frank Leymann. A Novel Approach to Decentralized Workflow Enactment. In EDOC '08: Proceedings of the 12th International IEEE Enterprise Distributed Object Computing Conference, pages 127–136, Washington, DC, USA, 2008. IEEE Computer Society.

[16] F. Nielson, H. R. Nielson, and C. Hankin. Principles of Program Analysis. Springer, 2nd edition, 2005.

[17] Object Management Group. Business Process Modeling Notation (BPMN), Version 1.2, January 2009.

[18] Natalia Sidorova, Christian Stahl, and Nikola Trčka. Workflow soundness revisited: Checking correctness in the presence of data while staying conceptual. In CAiSE, pages 530–544, 2010.

[19] Sherry X. Sun, J. Leon Zhao, Jay F. Nunamaker, and Olivia R. Liu Sheng. Formulating the data-flow perspective for business process management. Information Systems Research, 17(4):374–391, 2006.

[20] W. M. P. van der Aalst and A. H. M. ter Hofstede. YAWL: Yet Another Workflow Language. Information Systems, 30(4):245–275, June 2005.

[21] Wil van der Aalst and Maja Pesic. DecSerFlow: Towards a Truly Declarative Service Flow Language. In The Role of Business Processes in Service Oriented Architectures, number 06291 in Dagstuhl Seminar Proceedings, 2006.

[22] Ping Yang, Shiyong Lu, Mikhail I. Gofman, and Zijiang Yang. Information flow analysis of scientific workflows. J. Comput. Syst. Sci., 76(6):390–402, 2010.

[23] Ustun Yildiz and Claude Godart. Information Flow Control with Decentralized Service Compositions. In ICWS, pages 9–17, 2007.

Distributing execution of the workflow(s) across organizations:
• Fragment: a subset of activities sharing a common property
• Fragments assigned to swim-lanes (partners)
• Property: access level to sensitive data
  ◦ Medical examiners cannot see insurance coverage
  ◦ Medication providers cannot see medical tests
  ◦ Registry can see only the patient ID.

Page 51: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Another Input Context Example

init2(X,D,E):-

X=f1(Name, Address, SSN),

D=f2(SSN, Tests, Coverage),

E=f3(SSN, Coverage).

u1 [[X,D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],

u2 [X,D,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],

u3 [X,A1,Y,A2,Z,A3,A4,A41,C,A42,P,A5],

u4 [D,E,A1,Y,A2,Z,A3,A4,A41,C,A42,P],

u5 [D,A1,Y,A3,A4,A41,C,A42,P]]

u1u2

u3

x, a5

u4e

u5d

z, a2

y , p, c, a1 , a3 , a4 , a41 , a42

Name Address SSN Tests Coveragex, a5 X X X

d X X Xe X X

z, a2 X X X X

y , p, c, other a〈·〉 X X X X X

Swimlane ActivitiesHealth Organization a1, a3, a4, a41, a42

Medical Examiners (empty )Medication Provider a2

Registry & Archive a5

INITIAL SUBSTITUTION1

SHARING RESULTS2

RESULTING LATTICE3

RESULTING CONTEXT4 RESULTING FRAGMENTATION SCHEME5
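The resulting context of this example follows from the same subset rule as before; a quick check (a sketch over this example's recovered hidden-variable sets, with names mirroring the slide):

```python
# Recovered hidden-variable sets for the second input context example.
hidden = {"x": {"u1", "u2", "u3"},
          "d": {"u1", "u2", "u4", "u5"},
          "e": {"u1", "u4"},
          "z": {"u1", "u2", "u3", "u4"},
          "y": {"u1", "u2", "u3", "u4", "u5"}}
inputs = {"x": {"Name", "Address", "SSN"},
          "d": {"SSN", "Tests", "Coverage"},
          "e": {"SSN", "Coverage"}}
# A variable inherits the attributes of every input whose set it contains.
ctx = {v: set().union(set(), *(inputs[i] for i in inputs
                               if hidden[i] <= hidden[v]))
       for v in hidden}
assert ctx["x"] == {"Name", "Address", "SSN"}
assert ctx["z"] == {"Name", "Address", "SSN", "Coverage"}
assert ctx["y"] == {"Name", "Address", "SSN", "Tests", "Coverage"}
```

In particular, z and a2 pick up Coverage (via e) but not Tests, which is why the Medication Provider, and not the Medical Examiners, can host a2 in this scenario.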

Page 52: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

5 Conclusions

Page 53: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Conclusions

Representing inputs, intermediate data, and activities using FCA contexts and concept lattices allows lattice-based formulation, interpretation, and reasoning on data attributes. Sharing analysis of logic programs is a powerful technique for (abstract) sharing analysis, including sharing of data attributes.

• Supports complex data and control structures
• Applicable both at design and run-time

Applications include fragmentation (as illustrated), but also:

Data compliance checking – to verify that sufficient information is exchanged between component services
Robust top-down development – by refining component / sub-workflow specifications.

Future work: developing translators from concrete executable languages (BPEL, XPDL, YAWL, etc.) into Horn clause programs to facilitate automatic analysis. Also: analyzing stateful conversations between compositions.

Page 54: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

References

The content of this presentation is based on the following publications:

Dragan Ivanović, Manuel Carro, and Manuel Hermenegildo. Automated Attribute Inference in Complex Service Workflows Based on Sharing Analysis. Proceedings of the 8th International Conference on Services Computing (IEEE SCC 2011), IEEE Press, 2011.

Dragan Ivanović, Manuel Carro, and Manuel Hermenegildo. Automatic Fragment Identification in Workflows Based on Sharing Analysis. In Mathias Weske, Jian Yang, Paul Maglio, and Marcelo Fantinato, editors, Service-Oriented Computing – ICSOC 2010, number 6470 in LNCS. Springer Verlag, 2010.

Page 55: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

References

Some pointers on Web service analysis and fragmentation:

Daniel Martin, Daniel Wutke, and Frank Leymann. A Novel Approach to Decentralized Workflow Enactment. In EDOC '08: Proceedings of the 12th International IEEE Enterprise Distributed Object Computing Conference, pages 127–136, Washington, DC, USA, 2008. IEEE Computer Society.

Ustun Yildiz and Claude Godart.Information Flow Control with Decentralized Service Compositions.In Proceedings of ICWS 2007, pages 9–17, 2007.

Oliver Kopp, Rania Khalaf, and Frank Leymann. Deriving Explicit Data Links in WS-BPEL Processes. In International Conference on Services Computing (SCC), 2008.

Rania Khalaf. Note on Syntactic Details of Split BPEL-D Business Processes. Technical Report 2007/2, IAAS, U. Stuttgart, July 2007.

Page 56: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

References

Some pointers to Formal Concept Analysis (FCA):

Bernhard Ganter, Gerd Stumme, and Rudolf Wille, editors. Formal Concept Analysis, Foundations and Applications. Volume 3626 of Lecture Notes in Computer Science. Springer, 2005.

Claudio Carpineto and Giovanni Romano. Concept Data Analysis: Theory and Applications. Wiley, 2004.

B. A. Davey and H. A. Priestley. Introduction to Lattices and Order. Cambridge University Press, 2nd edition, 2002.

Sergei O. Kuznetsov and Sergei A. Obiedkov. Comparing performance of algorithms for generating concept lattices. J. Exp. Theor. Artif. Intell., 14(2-3):189–216, 2002.

Page 57: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

References

Some pointers on program analysis in general and logic programming:

P. Cousot and R. Cousot. Abstract Interpretation: a Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. In ACM Symposium on Principles of Programming Languages (POPL'77). ACM Press, 1977.

F. Nielson, H. R. Nielson, and C. Hankin. Principles of Program Analysis. Springer, 2nd edition, 2005.

M. V. Hermenegildo, F. Bueno, M. Carro, P. López, E. Mera, J. F. Morales, and G. Puebla. An Overview of Ciao and its Design Philosophy. Theory and Practice of Logic Programming, 2011. http://arxiv.org/abs/1102.5497.

Page 58: S-CUBE LP: Data Dependency: Inferring Data Attributes in Service Orchestrations Based on Sharing Analysis

Acknowledgements

The research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under grant agreement 215483 (S-Cube).