A Proof of Concept: Provenance in a Service Oriented Architecture Liming Chen, Victor Tan, Fenglian Xu, Alexis Biller, Paul Groth, Simon Miles, John Ibbotson, Michael Luck and Luc Moreau

Post on 28-Mar-2015

Purpose

• Asking questions about the provenance of something, i.e. the process by which it came to be as it is, is essential in many domains

• We are working with bioinformaticians, medics, aerospace engineers and physicists, and have found a wide range of questions they wish to ask

• A simple example application can:
– Clarify the requirements on software to aid answering those questions
– Be used to explain the issues involved to non-domain experts
– Be extended in controlled ways to explore issues that arise in ‘real’ applications

EU Provenance and PASOA

• Recent work of the EU Provenance project:
– Developed a logical architecture for software to aid answering provenance-related questions, along with other research on security, scalability and user tool support
– Now being applied to two project applications: organ transplant management (UPC, Spain) and aerospace engineering (DLR, Germany)
– The logical architecture document should be released next week: keep an eye on www.gridprovenance.org

• Recent work of the PASOA project:
– Has focused on e-Science applications and has gathered requirements, developed protocols and software
– EU Provenance used PASOA software for the work described in this talk
– PASOA will be discussed in the following two presentations

Outline

• The example application
• Asking provenance-related questions
• The example as a service-oriented process
• Recording documentation of a process
• What does the example show us?
• What are the limits of the example?
• Conclusions

The example application

Baking a Victoria Sponge

• INGREDIENTS
– 110g (4oz) Butter
– 110g (4oz) Caster Sugar
– 110g (4oz) Self-raising Flour
– 2 Eggs
– Vanilla Essence or 1 tsp Grated Lemon Rind

• RECIPE
– Preheat oven to 190°C: 375°F: Gas 5.
– Whisk together the butter and sugar until light and creamy.
– Add the beaten eggs gradually with a little of the flour.
– Fold in the remaining sieved flour and add the flavouring.
– Divide equally between two 15cm (6 inch) sandwich tins.
– Bake for 20 - 25 minutes.
– Turn out on to a wire rack to cool.

• This is not so contrived an example!

www.thefoody.com

Whisk 20g sugar and 20g butter together to get mixture 1

Beat the 2 eggs for 2 minutes, then mix the beaten eggs with mixture 1 to obtain mixture 2

Fold 100g flour together with mixture 2 to obtain mixture 3

Put mixture 3 into the oven, set the baking time to 30min and the baking temperature to 180˚C, and obtain a cake

We then set a time for baking

Cake

After Baking

• Some questions can be asked after baking a cake

• Answers to the questions can be found if we record details of the baking process during its execution

• The details of the baking process are what we call the provenance of a cake

“What went wrong?” Questions

• Did we follow the recipe accurately?
– Did we use the correct ingredients at the right time?
– Did we provide the correct quantities? Correct units?
– Did we perform actions for the right duration?

We need to keep a record of all actions performed with all their parameters (such as the number of eggs used)

• Organ transplant example: Did the medics follow the correct procedure?

• Bioinformatics example: Did I analyse an amino acid sequence using tools that actually only apply to nucleotide sequences?
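
A minimal sketch of how a record of actions and parameters could be checked against a recipe. The `Action` record, its field names and the `deviations` helper are illustrative assumptions, not the format used by the project's software; the quantities are those of the simplified baking steps shown earlier.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """One recorded step and its parameters (hypothetical record format)."""
    name: str
    params: dict


def deviations(recipe, performed):
    """Return readable differences between the recipe and what was recorded."""
    problems = []
    for step, (expected, actual) in enumerate(zip(recipe, performed), start=1):
        if expected.name != actual.name:
            problems.append(f"step {step}: expected {expected.name}, got {actual.name}")
            continue
        for key, want in expected.params.items():
            got = actual.params.get(key)
            if got != want:
                problems.append(
                    f"step {step} ({actual.name}): {key} was {got}, recipe says {want}"
                )
    if len(performed) < len(recipe):
        problems.append(f"process stopped after step {len(performed)} of {len(recipe)}")
    return problems


recipe = [
    Action("whisk", {"butter_g": 20, "sugar_g": 20}),
    Action("beat_and_mix", {"eggs": 2, "beating_min": 2}),
    Action("fold", {"flour_g": 100}),
    Action("oven_bake", {"temperature_c": 180, "baking_min": 30}),
]
performed = [
    Action("whisk", {"butter_g": 20, "sugar_g": 20}),
    Action("beat_and_mix", {"eggs": 3, "beating_min": 2}),  # one egg too many
    Action("fold", {"flour_g": 100}),
    # no oven_bake record: the process was not finished
]

for problem in deviations(recipe, performed):
    print(problem)
```

Because each parameter is recorded explicitly, both the wrong quantity and the unfinished process show up without any knowledge of baking beyond the recipe itself.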

“What went wrong?” Questions

• Other factors can affect the baking process:
– Amount of flour required varies with altitude
– Oven is broken and baked at a different temperature

We need to know the “internal state” of the different entities participating in the baking process (such as actual oven temperature or oven altitude)

• Organ transplant example: By what criteria did a team decide to accept or reject an organ?

• Bioinformatics example: What script was used by the services to perform each stage of the experiment?
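
As a rough illustration of why recorded internal state matters, the sketch below compares a requested temperature with the state an oven might report about itself. The `recorded_state` layout, the `oven_fault` helper and the 5-degree tolerance are all assumptions made for this example.

```python
def oven_fault(requested_c, oven_state, tolerance_c=5):
    """Return a description of the fault, or None if the oven behaved as asked."""
    actual_c = oven_state["actual_temperature_c"]
    if abs(actual_c - requested_c) > tolerance_c:
        return f"requested {requested_c} C but oven reported {actual_c} C"
    return None


# Hypothetical state documented by the oven itself during baking.
recorded_state = {"actor": "OvenBake", "actual_temperature_c": 150, "altitude_m": 1200}

print(oven_fault(180, recorded_state))
```

The messages exchanged between services would only show the requested 180 C; the fault is visible only because the oven also documented its own state.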

“Process Analysis” Questions

• Did we use the same amount of ingredients for baking cake 1 and cake 2? Or the same proportions?
• What was the longest step in the execution of a recipe?
• Why did we not finish the process? Where did we stop?

The process that led to a given cake should be delimited and analysable

• Organ transplant example: Which patient’s death led to the organ now being transplanted?
• Bioinformatics example: What samples led to the final analysis result?
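
Two of these questions can be sketched over recorded data as follows. The dictionaries are assumed shapes for what a query client might extract from the documentation; the whisk and fold durations are invented for illustration, and `same_proportions` assumes every quantity is a non-zero integer.

```python
from fractions import Fraction


def longest_step(durations):
    """Return the (step, minutes) pair with the greatest recorded duration."""
    return max(durations.items(), key=lambda item: item[1])


def same_proportions(cake_a, cake_b):
    """True if both cakes used the same ingredients in the same ratio."""
    if cake_a.keys() != cake_b.keys():
        return False
    ratios = {Fraction(cake_a[name], cake_b[name]) for name in cake_a}
    return len(ratios) == 1


durations_min = {"whisk": 3, "beat_and_mix": 2, "fold": 1, "oven_bake": 30}
cake_1 = {"butter_g": 20, "sugar_g": 20, "flour_g": 100}
cake_2 = {"butter_g": 40, "sugar_g": 40, "flour_g": 200}

print(longest_step(durations_min))
print(cake_1 == cake_2)                  # same amounts?
print(same_proportions(cake_1, cake_2))  # same proportions?
```

Exact fractions are used rather than floating-point division so that equal ratios compare as equal.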

“What Did Parties Do?” Questions

• Did the baker follow the user’s instructions (regardless of any claim from the baker)?

• Did each step of the baking process follow the user’s instructions?
– Did they receive the correct instructions?
– Did they follow the received instructions?

All entities should document their view of a process because it may vary

• Organ transplant example: Were there differing opinions on the suitability of an organ for transplant?

• Bioinformatics example: I claim I used a database in my experiments whose license allows me to patent my results: does the database owner confirm this?
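
One way to picture the comparison of views: each party documents the interactions it took part in, and a client compares the two accounts. The interaction identifier `bake-request` and the dictionary layout are hypothetical, chosen only for this sketch.

```python
def discrepancies(sender_view, receiver_view):
    """Compare two parties' records of the same interactions, keyed by interaction id."""
    found = []
    for interaction in sorted(sender_view.keys() | receiver_view.keys()):
        sent = sender_view.get(interaction)
        received = receiver_view.get(interaction)
        if sent is None:
            found.append((interaction, "not documented by sender"))
        elif received is None:
            found.append((interaction, "not documented by receiver"))
        elif sent != received:
            found.append((interaction, f"sender says {sent}, receiver says {received}"))
    return found


# The user documents what it asked for; the baker documents what it received.
user_view = {"bake-request": {"temperature_c": 180, "baking_min": 30}}
baker_view = {"bake-request": {"temperature_c": 190, "baking_min": 30}}

for interaction, problem in discrepancies(user_view, baker_view):
    print(interaction, "->", problem)
```

Neither account is treated as authoritative here; the sketch only reports where they differ, leaving the judgement to whoever asked the question.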

Implementation

• We implemented the application as a set of Web Services, and then implemented clients that answered the provenance-related questions by querying the provenance store

• This involved mapping the scenario onto a service-oriented architecture

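Service-Oriented Process

[Diagram: the actors User, Baker, Whisk, Beat & Mix, Fold and OvenBake, with the messages passed between them]

• Baker: Sugar + Flour + Beating Time + Temperature → Cake
• Whisk: Butter + Sugar → Mixture 1
• Beat & Mix: Mixture 1 + Eggs + Beating Time → Mixture 2
• Fold: Flour + Mixture 2 → Mixture 3
• OvenBake: Mixture 3 + Temperature + Baking Time → Cake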

Recording

[Diagram: User, Baker, Whisk, Beat & Mix, Fold, OvenBake and the Provenance Store, with the following recorded messages]

• Baker (Sugar, Flour, Beating Time, Temperature)
• Whisk (Butter, Sugar)
• WhiskReturn (Mixture 1)
• Beat&Mix (Mixture 1, Eggs, Beating Time)
• Beat&MixReturn (Mixture 2)
• Fold (Flour, Mixture 2)
• FoldReturn (Mixture 3)
• OvenBake (Mixture 3, Temperature, Baking Time)
• OvenBakeReturn (Cake)
• BakerReturn (Cake)

After baking, the provenance store contains a trace of the different activities that were involved in the production of a cake.

The provenance of a cake is the documentation of the process that led to that cake
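
The recording of invocation and return messages can be sketched as below. This is an in-process stand-in, not the PASOA software: the `ProvenanceStore` class, the `documented` wrapper and the single shared store are assumptions for illustration, plain function calls replace Web Service calls, and the Baker is given the full list of inputs rather than the abbreviated one on the slide.

```python
class ProvenanceStore:
    """Hypothetical in-memory store holding one line per recorded message."""

    def __init__(self):
        self.records = []

    def record(self, message, *contents):
        self.records.append(f"{message} ({', '.join(contents)})")


store = ProvenanceStore()


def documented(name):
    """Wrap a service so that its invocation and its return are both recorded."""
    def wrap(service):
        def call(*args):
            store.record(name, *args)
            result = service(*args)
            store.record(f"{name}Return", result)
            return result
        return call
    return wrap


@documented("Whisk")
def whisk(butter, sugar):
    return "Mixture 1"


@documented("Beat&Mix")
def beat_and_mix(mixture_1, eggs, beating_time):
    return "Mixture 2"


@documented("Fold")
def fold(flour, mixture_2):
    return "Mixture 3"


@documented("OvenBake")
def oven_bake(mixture_3, temperature, baking_time):
    return "Cake"


@documented("Baker")
def baker(butter, sugar, eggs, flour, beating_time, temperature, baking_time):
    mixture_2 = beat_and_mix(whisk(butter, sugar), eggs, beating_time)
    return oven_bake(fold(flour, mixture_2), temperature, baking_time)


baker("Butter", "Sugar", "Eggs", "Flour", "Beating Time", "Temperature", "Baking Time")
for line in store.records:
    print(line)
```

Running the sketch leaves ten records in the store, one invocation and one return per service, in the order the calls were made.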

What we have learnt

Process Documentation and Provenance

• We distinguish
– process documentation (the documentation recorded into a provenance store about a process)
– provenance (the information retrieved from a provenance store about a process)

• This is because we have found there to be different requirements on each

[Diagram: Process documentation → Processing → Provenance]
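
The distinction can be illustrated with a small processing step that takes recorded documentation as input and returns the provenance of one item. The record layout and the `provenance` function are assumptions made for this sketch; a real provenance store would expose its own query interface.

```python
# Process documentation: what was recorded, one entry per service call.
documentation = [
    {"service": "Whisk", "inputs": ["Butter", "Sugar"], "output": "Mixture 1"},
    {"service": "Beat&Mix", "inputs": ["Mixture 1", "Eggs"], "output": "Mixture 2"},
    {"service": "Fold", "inputs": ["Flour", "Mixture 2"], "output": "Mixture 3"},
    {"service": "OvenBake", "inputs": ["Mixture 3"], "output": "Cake"},
]


def provenance(item, records):
    """Return the records that led to `item`, in the order they were recorded."""
    produced_by = {record["output"]: record for record in records}
    relevant, pending = set(), [item]
    while pending:
        current = pending.pop()
        record = produced_by.get(current)
        if record is None or current in relevant:
            continue  # a raw ingredient, or something already visited
        relevant.add(current)
        pending.extend(record["inputs"])
    return [record for record in records if record["output"] in relevant]


for record in provenance("Mixture 2", documentation):
    print(record["service"], record["inputs"], "->", record["output"])
```

The same documentation answers different questions: asking for the provenance of Mixture 2 returns two records, while asking for that of the Cake returns all four.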

Process documentation

• Should allow questions about the provenance of entities to be answered

• Should follow a consistent, application-independent structure so that independent parties can record documentation that is easily combined
– e.g. oven may be owned by someone other than the user, but their documentation is combined to answer whether the requested temperature was used

• Should state exactly what those recording it know to have happened, not confuse it with what they guessed or inferred had happened
– e.g. baker states that it put the cake in the oven, not that the cake was successfully baked, because the oven may have been broken

Provenance

• Should give the client asking for the provenance of something control over the scope of the answer
– e.g. whether the process that produced the flour is included in the provenance of the cake

• Should provide the information relevant to answering a client’s or user’s questions (not swamp them with detail)
– e.g. report how much flour was used rather than giving the XML structure sent between application components

• May (in order to achieve the above) include inferred information
– e.g. infer from baker putting mixture in oven and getting cake out that the cake was successfully baked from the mixture
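
The second point, reporting how much flour was used rather than the raw message, might look like the sketch below. The XML layout in `RAW_MESSAGE` is invented for this example and is not the message format used by the project; parsing uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw message as it might sit in the provenance store.
RAW_MESSAGE = """
<invocation service="Fold">
  <argument name="flour" unit="g">100</argument>
  <argument name="mixture">Mixture 2</argument>
</invocation>
"""


def amount_used(raw_message, ingredient):
    """Return e.g. '100 g' for the named argument, or None if it is absent."""
    root = ET.fromstring(raw_message.strip())
    for argument in root.findall("argument"):
        if argument.get("name") == ingredient:
            unit = argument.get("unit", "")
            return f"{argument.text.strip()} {unit}".strip()
    return None


print("Flour used:", amount_used(RAW_MESSAGE, "flour"))
```

The client sees a short answer in domain terms, while the full message remains available in the store for anyone who needs the detail.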

Provenance architectures

• Should allow different parties to record independent documentation if they want to
– e.g. user and baker can record independently, allowing discrepancies to be noticed

• Should have no dependence on any one workflow engine/language, and no requirement for (explicit) workflows to be used at all
– e.g. our example application was written in Java, and baking in reality follows a plan in someone’s head

• Should have independence from any one product of a process: it should not be necessary to store process documentation with any one result of a process
– e.g. the provenance of the cake, the provenance of the ingredients and the provenance of the intermediate mixtures overlap, so we cannot claim the documentation ‘belongs’ to any one of them

Limitations and Strengths

• The current example has limitations:
– Physical world treated as if it mapped directly to the electronic world: How does a baker record documentation in a provenance store Web Service? Through a GUI? What if the GUI goes wrong or they use the GUI wrongly: do we still have sound process documentation?

– None of the objects in the process have constituent parts that we may want to independently find the provenance of

– Assumes a single provenance store that every service happily submits documentation to

• …but the strength of the example is that it can be simply extended to remove these limitations

Conclusions

• The simple example allows us to determine the requirements on software to record process documentation and make it available to users

• We have used it as a testbed, extending it to explore other aspects of provenance (along with other applications)

• It is rich enough to continue extending to mirror, in a controlled way, issues discovered in the future

EU Provenance Partners

• IBM United Kingdom Limited
• University of Southampton
• University of Wales, Cardiff
• Deutsches Zentrum für Luft- und Raumfahrt e.V.
• Universitat Politecnica de Catalunya
• Magyar Tudomanyos Akademia Szamitastechnikai es Automatizalasi Kutato Intezet