A Proof of Concept: Provenance in a Service Oriented Architecture
Liming Chen, Victor Tan, Fenglian Xu, Alexis Biller, Paul Groth, Simon Miles, John Ibbotson, Michael Luck and Luc Moreau
Purpose
• Asking questions about the provenance of something, i.e. the process by which it came to be as it is, is essential in many domains
• We are working with bioinformaticians, medics, aerospace engineers, physicists and have found a wide range of questions they wish to ask
• A simple example application can:
– Clarify the requirements on software to aid answering those questions
– Be used to explain the issues involved to non-domain experts
– Be extended in controlled ways to explore issues that arise in ‘real’ applications
EU Provenance and PASOA
• Recent work of the EU Provenance project:
– Developed a logical architecture for software to aid answering provenance-related questions, along with other research on security, scalability and user tool support
– Now being applied to two project applications: organ transport management (UPC, Spain) and aerospace engineering (DLR, Germany)
– The logical architecture document should be released next week: keep an eye on www.gridprovenance.org
• Recent work of the PASOA project:
– Has focused on e-Science applications and has gathered requirements, developed protocols and software
– EU Provenance used PASOA software for the work described in this talk
– PASOA will be discussed in the following two presentations
Outline
• The example application
• Asking provenance-related questions
• The example as a service-oriented process
• Recording documentation of a process
• What does the example show us?
• What are the limits of the example?
• Conclusions
The example application
Baking a Victoria Sponge
• INGREDIENTS
– 110g (4oz) Butter
– 110g (4oz) Caster Sugar
– 110g (4oz) Self-raising Flour
– 2 Eggs
– Vanilla Essence or 1 tsp Grated Lemon Rind
• RECIPE
– Preheat oven to 190°C / 375°F / Gas 5.
– Whisk together the butter and sugar until light and creamy.
– Add the beaten eggs gradually with a little of the flour.
– Fold in the remaining sieved flour and add the flavouring.
– Divide equally between two 15cm (6 inch) sandwich tins.
– Bake for 20 - 25 minutes.
– Turn out on to a wire rack to cool.
• This is not so contrived an example!
www.thefoody.com
[Diagram: the baking process as a sequence of steps]
• Whisk 20g sugar and 20g butter together to get mixture 1
• Beat 2 eggs for 2 minutes
• Mix the beaten eggs with mixture 1 to obtain mixture 2
• Fold 100g flour together with mixture 2 to obtain mixture 3
• Put mixture 3 into the oven, set the baking time to 30min and the baking temperature to 180˚C
• Obtain a cake
After Baking
• Some questions can be asked after baking a cake
• Answers to the questions can be found if we record details of the baking process during its execution
• The details of the baking process are what we call the provenance of a cake
“What went wrong?” Questions
• Did we follow the recipe accurately?
– Did we use the correct ingredients at the right time?
– Did we provide the correct quantities? Correct units?
– Did we perform actions for the right duration?
We need to keep a record of all actions performed with all their parameters (such as the number of eggs used)
• Organ transplant example: Did the medics follow the correct procedure?
• Bioinformatics example: Did I analyse an amino acid sequence using tools that actually only apply to nucleotide sequences?
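
A minimal sketch of how a record of actions and their parameters could answer "did we follow the recipe accurately?". The `Action` type, the parameter names and the recorded values are hypothetical, chosen for illustration; the original application was written in Java and its actual data model is not shown in these slides.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """One step of the process: what was done and with which parameters."""
    name: str
    params: dict


# Hypothetical recipe, loosely based on the Victoria Sponge example.
RECIPE = [
    Action("whisk", {"butter_g": 110, "sugar_g": 110}),
    Action("beat_and_mix", {"eggs": 2}),
    Action("fold", {"flour_g": 110}),
    Action("bake", {"temperature_c": 190, "minutes": 25}),
]

# Hypothetical record of what was actually done (one egg too many).
RECORDED = [
    Action("whisk", {"butter_g": 110, "sugar_g": 110}),
    Action("beat_and_mix", {"eggs": 3}),
    Action("fold", {"flour_g": 110}),
    Action("bake", {"temperature_c": 190, "minutes": 25}),
]


def deviations(recipe, recorded):
    """Return (step, parameter, expected, actual) for every mismatch."""
    found = []
    for expected, actual in zip(recipe, recorded):
        if expected.name != actual.name:
            # A different action was performed at this position.
            found.append((expected.name, "action", expected.name, actual.name))
            continue
        for key, value in expected.params.items():
            if actual.params.get(key) != value:
                found.append((expected.name, key, value, actual.params.get(key)))
    # Steps the recipe requires but that were never recorded.
    for missing in recipe[len(recorded):]:
        found.append((missing.name, "action", missing.name, None))
    return found


print(deviations(RECIPE, RECORDED))
```

The check only works because every action was recorded together with all of its parameters; a record that omitted the number of eggs could not reveal this deviation.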
“What went wrong?” Questions
• Other factors can affect the baking process:
– Amount of flour required varies with altitude
– Oven is broken and baked at a different temperature
We need to know the “internal state” of the different entities participating in the baking process (such as actual oven temperature or oven altitude)
• Organ transplant example: By what criteria did a team decide to accept or reject an organ?
• Bioinformatics example: What script was used by the services to perform each stage of the experiment?
“Process Analysis” Questions
• Did we use the same amount of ingredients for baking cake 1 and cake 2? Or the same proportions?
• What was the longest step in the execution of a recipe?
• Why did we not finish the process? Where did we stop?
The process that led to a given cake should be delimited and analysable
• Organ transplant example: Which patient’s death led to the organ now being transplanted?
• Bioinformatics example: What samples led to the final analysis result?
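
A minimal sketch of two "process analysis" queries, assuming the recorded process has already been reduced to a mapping from each output to the inputs it was produced from. The mapping mirrors the baking steps of the example; the whisking and folding durations are hypothetical values, not taken from the talk.

```python
# Each recorded step maps an output to the inputs it was produced from.
DERIVED_FROM = {
    "Mixture 1": ["Butter", "Sugar"],
    "Mixture 2": ["Mixture 1", "Eggs"],
    "Mixture 3": ["Flour", "Mixture 2"],
    "Cake": ["Mixture 3"],
}

# Step durations in seconds. Beating (2 min) and baking (30 min) follow the
# example; the whisking and folding durations are assumptions.
STEP_SECONDS = {"Whisk": 180, "Beat & Mix": 120, "Fold": 60, "OvenBake": 1800}


def raw_inputs(item, derived_from):
    """Return the set of original inputs that led to `item`."""
    if item not in derived_from:  # not produced by any recorded step
        return {item}
    result = set()
    for source in derived_from[item]:
        result |= raw_inputs(source, derived_from)
    return result


def longest_step(step_seconds):
    """Return the name of the step that took the most time."""
    return max(step_seconds, key=step_seconds.get)


print(sorted(raw_inputs("Cake", DERIVED_FROM)))
print(longest_step(STEP_SECONDS))
```

The same traversal answers the "what samples led to the final analysis result?" style of question in the other domains: only the contents of the mapping change.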
“What Did Parties Do?” Questions
• Did the baker follow the user’s instructions (regardless of any claim from the baker)?
• Did each step of the baking process follow the user’s instructions?
– Did they receive the correct instructions?
– Did they follow the received instructions?
All entities should document their view of a process because it may vary
• Organ transplant example: Were there differing opinions on the suitability of an organ for transplant?
• Bioinformatics example: I claim I used a database in my experiments whose license allows me to patent my results: does the database owner confirm this?
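
A minimal sketch of why each party should document its own view: if the user and the baker record the same interaction independently, the two records can be compared. The interaction identifier, field names and values are hypothetical.

```python
def discrepancies(user_view, baker_view):
    """Compare two parties' records of the same interactions.

    Each view maps an interaction identifier to the content that party
    recorded for it. Returns only the interactions on which they disagree.
    """
    report = {}
    for interaction in sorted(set(user_view) | set(baker_view)):
        sent = user_view.get(interaction)
        received = baker_view.get(interaction)
        if sent != received:
            report[interaction] = {"user": sent, "baker": received}
    return report


# Hypothetical records: the user asked for 180 degrees, the baker
# documented receiving a request for 160 degrees.
user_view = {"bake-request": {"temperature_c": 180, "minutes": 30}}
baker_view = {"bake-request": {"temperature_c": 160, "minutes": 30}}

print(discrepancies(user_view, baker_view))
```

A discrepancy does not say which party is right; it only shows that the views differ, which is exactly what a single shared record could not reveal.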
Implementation
• We implemented the application as a set of Web Services, and then implemented clients that answered the provenance-related questions by querying the provenance store
• This involved mapping the scenario onto a service-oriented architecture
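Service-Oriented Process
[Diagram: the User, Baker, Whisk, Beat & Mix, Fold and OvenBake services and the messages they exchange]
• Baker: receives Sugar + Flour + Beating Time + Temperature, returns Cake
• Whisk: receives Butter + Sugar, returns Mixture 1
• Beat & Mix: receives Mixture 1 + Eggs + Beating Time, returns Mixture 2
• Fold: receives Flour + Mixture 2, returns Mixture 3
• OvenBake: receives Mixture 3 + Temperature + Baking Time, returns Cake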
Recording
[Diagram: the User, Baker, Whisk, Beat & Mix, Fold and OvenBake services and the Provenance Store]
After baking, the provenance store contains a trace of the different activities that were involved in the production of a cake.
The provenance of a cake is the documentation of the process that led to that cake.
Baker (Sugar, Flour, Beating Time, Temperature)
Whisk (Butter, Sugar)
WhiskReturn (Mixture 1)
Beat&Mix (Mixture 1, Eggs, Beating Time)
Beat&MixReturn (Mixture 2)
Fold (Flour, Mixture 2)
FoldReturn (Mixture 3)
OvenBake (Mixture 3, Temperature, Baking Time)
OvenBakeReturn (Cake)
BakerReturn (Cake)
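
A minimal sketch of how such a trace could be produced, assuming an in-memory store and a wrapper that records each service's invocation and return message. `ProvenanceStore`, `documented` and the stand-in service functions are hypothetical names for illustration; they are not the PASOA API, and the original application was written in Java as Web Services.

```python
class ProvenanceStore:
    """Hypothetical in-memory stand-in for a provenance store."""

    def __init__(self):
        self.records = []

    def record(self, message, payload):
        self.records.append((message, tuple(payload)))


def documented(store, name):
    """Wrap a service so its invocation and its result are both recorded."""
    def wrap(service):
        def call(*inputs):
            store.record(name, inputs)
            output = service(*inputs)
            store.record(name + "Return", (output,))
            return output
        return call
    return wrap


store = ProvenanceStore()


@documented(store, "Whisk")
def whisk(butter, sugar):
    return "Mixture 1"


@documented(store, "Beat&Mix")
def beat_and_mix(mixture, eggs, beating_time):
    return "Mixture 2"


@documented(store, "Fold")
def fold(flour, mixture):
    return "Mixture 3"


@documented(store, "OvenBake")
def oven_bake(mixture, temperature, baking_time):
    return "Cake"


@documented(store, "Baker")
def baker(sugar, flour, beating_time, temperature):
    # Assumption: butter, eggs and baking time are supplied by the baker,
    # since they are not among the inputs of the Baker message.
    mixture_1 = whisk("Butter", sugar)
    mixture_2 = beat_and_mix(mixture_1, "Eggs", beating_time)
    mixture_3 = fold(flour, mixture_2)
    return oven_bake(mixture_3, temperature, "Baking Time")


baker("Sugar", "Flour", "Beating Time", "Temperature")
for message, payload in store.records:
    print(f"{message} ({', '.join(payload)})")
```

Because the wrapper records around each call, the nested services appear between the Baker invocation and BakerReturn, so the order of the records reflects the structure of the process.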
What we have learnt
Process Documentation and Provenance
• We distinguish
– process documentation (the documentation recorded into a provenance store about a process)
– provenance (the information retrieved from a provenance store about a process)
• This is because we have found there to be different requirements on each
[Diagram: Process documentation → Processing → Provenance]
Process documentation
• Should allow questions about the provenance of entities to be answered
• Should follow a consistent, application-independent structure so that independent parties can record documentation that is easily combined
– e.g. oven may be owned by someone other than the user, but their documentation is combined to answer whether the requested temperature was used
• Should state exactly what those recording it know to have happened, not confuse it with what they guessed or inferred had happened
– e.g. baker states that it put the cake in the oven, not that the cake was successfully baked, because the oven may have been broken
Provenance
• Should give the client asking for the provenance of something control over the scope of the answer
– e.g. whether the process that produced the flour is included in the provenance of the cake
• Should provide the information relevant to answering a client’s or user’s questions (not swamp them with detail)
– e.g. report how much flour was used rather than giving the XML structure sent between application components
• May (in order to achieve the above) include inferred information
– e.g. infer from baker putting mixture in oven and getting cake out that the cake was successfully baked from the mixture
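
A minimal sketch of a provenance query whose scope is controlled by the client, assuming the process documentation has been reduced to a mapping from each item to the service that produced it and that service's inputs. The "Mill" step that produces the flour is a hypothetical upstream process added to show the effect of scoping.

```python
# item -> (service that produced it, inputs to that service)
STEPS = {
    "Cake": ("OvenBake", ["Mixture 3"]),
    "Mixture 3": ("Fold", ["Flour", "Mixture 2"]),
    "Mixture 2": ("Beat & Mix", ["Mixture 1", "Eggs"]),
    "Mixture 1": ("Whisk", ["Butter", "Sugar"]),
    "Flour": ("Mill", ["Wheat"]),  # hypothetical upstream process
}


def provenance(item, steps, exclude=frozenset()):
    """Return the steps that led to `item`, earliest first.

    Items listed in `exclude` are treated as given: the process that
    produced them is left out of the answer.
    """
    if item not in steps or item in exclude:
        return []
    service, inputs = steps[item]
    trace = []
    for source in inputs:
        trace += provenance(source, steps, exclude)
    trace.append(f"{service} -> {item}")
    return trace


print(provenance("Cake", STEPS))                     # includes the milling
print(provenance("Cake", STEPS, exclude={"Flour"}))  # flour taken as given
```

The answer lists one readable line per step rather than the raw messages, in the spirit of reporting what the client asked about instead of the structure exchanged between components.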
Provenance architectures
• Should allow different parties to record independent documentation if they want to
– e.g. user and baker can record independently, allowing discrepancies to be noticed
• Should have no dependence on any one workflow engine/language, and no requirement for (explicit) workflows to be used at all
– e.g. our example application was written in Java, and baking in reality follows a plan in someone’s head
• Should have independence from any one product of a process: should not be necessary to store process documentation with any one result of a process
– e.g. the provenance of the cake, the provenance of the ingredients and the provenance of the intermediate mixtures overlap, so cannot claim it ‘belongs’ to any
Limitations and Strengths
• The current example has limitations:
– Physical world treated as if it mapped directly to the electronic world: how does a baker record documentation in a provenance store Web Service? Through a GUI? What if the GUI goes wrong or they use the GUI wrongly: do we still have sound process documentation?
– None of the objects in the process have constituent parts that we may want to independently find the provenance of
– Assumes a single provenance store that every service happily submits documentation to
• …but the strength of the example is that it can be simply extended to remove these limitations
Conclusions
• The simple example allows us to determine the requirements on software to record process documentation and make it available to users
• We have used it as a testbed, extending it to explore other aspects of provenance (along with other applications)
• It is rich enough to continue extending to mirror, in a controlled way, issues discovered in the future
EU Provenance Partners
• IBM United Kingdom Limited
• University of Southampton
• University of Wales, Cardiff
• Deutsches Zentrum für Luft- und Raumfahrt e.V.
• Universitat Politècnica de Catalunya
• Magyar Tudományos Akadémia Számítástechnikai és Automatizálási Kutató Intézet