BiodiversityWorld GRID Workshop, NeSC, Edinburgh – 30 June and 1 July 2005
Andrew Jones: Interop. in changing infrastructure
Design Decisions
Interoperability in a changing architecture
Andrew Jones
BiodiversityWorld requirements (1)
• Biodiversity Problem Solving Environment:
  • Heterogeneous, diverse resources
  • Facilitating integration of both legacy and newly-developed resources
  • Flexible workflows
• Main challenges centre around metadata, interoperability, resource discovery, etc.
• High-performance computing secondary (though relevant)
BiodiversityWorld requirements (2)
• Distinctive features:
  • a biodiversity informatics GRID
  • interoperability with heterogeneous data, complex in structure
  • resilience to infrastructure change & interoperation with other GRIDs
  • interactive collaboration a secondary concern
• Assumptions about resources:
  • A resource worked either:
    • Essentially in ‘batch’ mode, or
    • Supporting a sequence of operations on a single resource, but involving exchange of minimal data
  • Reasonable to treat each resource (including databases) as a service offering its own, defined set of operations
BiodiversityWorld architectural overview
[Figure: architecture diagram. Labelled components: User interface; Presentation; Workflow enactment engine; Metadata repository; BGI API; BiodiversityWorld-GRID Interface (BGI); The GRID; Wrapped resources; Native BiodiversityWorld Resources.]
The BGI concept
• Standardised invocation mechanism
• Wrappers notionally divided into Grid-facing and resource-facing parts
[Figure: UML class diagram. Elements shown: <<interface>> BgiWrapperInterface; Bgi Implementation_1, Bgi Implementation_2, ...; <<abstract>> BdwAbstractWrapper; Concrete Wrapper_1, Concrete Wrapper_2, ...; multiplicity labels "1" and "1".]
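The split between the Grid-facing and resource-facing parts can be sketched as follows. This is a minimal illustration in Python, not the project's own code: the class names BgiWrapperInterface and BdwAbstractWrapper come from the slide, while `invoke_operation`, `perform`, `operations`, `SpeciesListWrapper` and `InProcessBgi` are hypothetical names introduced here, and the species names are made-up sample data.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List


class BdwAbstractWrapper(ABC):
    """Resource-facing part: exposes one resource's own set of operations."""

    @abstractmethod
    def operations(self) -> Dict[str, Callable[..., Any]]:
        """Return the operations this resource offers (hypothetical helper)."""

    def perform(self, operation: str, params: Dict[str, Any]) -> Any:
        ops = self.operations()
        if operation not in ops:
            raise KeyError(f"unsupported operation: {operation}")
        return ops[operation](**params)


class BgiWrapperInterface(ABC):
    """Grid-facing part: the single, standardised way to invoke a resource."""

    @abstractmethod
    def invoke_operation(self, operation: str, params: Dict[str, Any]) -> Any:
        ...


class SpeciesListWrapper(BdwAbstractWrapper):
    """Illustrative concrete wrapper around an in-memory list of names."""

    def __init__(self, names: List[str]) -> None:
        self._names = list(names)

    def operations(self) -> Dict[str, Callable[..., Any]]:
        return {"count": self._count, "search": self._search}

    def _count(self) -> int:
        return len(self._names)

    def _search(self, prefix: str) -> List[str]:
        return [n for n in self._names if n.startswith(prefix)]


class InProcessBgi(BgiWrapperInterface):
    """Simplest possible implementation: a direct call, no network involved."""

    def __init__(self, wrapper: BdwAbstractWrapper) -> None:
        self._wrapper = wrapper

    def invoke_operation(self, operation: str, params: Dict[str, Any]) -> Any:
        return self._wrapper.perform(operation, params)


bgi = InProcessBgi(SpeciesListWrapper(["Vicia faba", "Vicia sativa", "Pisum sativum"]))
print(bgi.invoke_operation("search", {"prefix": "Vicia"}))
```

Under this arrangement a client only ever sees `invoke_operation`, so a different Grid-facing implementation could be substituted without touching the concrete wrapper.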
Why we protected ourselves from ‘the Grid’(!)
• Rapidly evolving standards
  • Previous experience in GRAB
    • Globus 2 approach needed ‘canned queries’, temporary files, etc … unnatural for distributed request/response model
  • BiodiversityWorld
    • Globus and other software still evolving
    • Globus 3: Grid Services; Globus 4: WSRF; …
• Trade-off: abstraction layer (BGI); invocation mechanism
  • Insulates from change
  • Performance penalty
    • Assume computationally intensive applications lie in a single BDW resource
  • Proprietary invocation mechanism hinders interoperation with other Grid/Web services
Implementations of BGI
• RMI
• GT3 Grid Services (incomplete)
• Web services
• GT4/WSRF/Grid-Service-as-portal
Benefits & limitations
• Too many standards, so we defined a new one!!
• Interoperability with other projects restricted
  • Could wrap non-BDW resources, or
  • Implement alternative Grid-facing “glue” replacing invocation mechanism with some other standard
• Restrictions on highly interactive applications
  • BGI OK for coarse-grained interaction; not for dynamic interaction with potentially large data volumes
• Transmission and storage of intermediate results: method not specified
  • Can pass URI instead of data, but no specifications restricting what this might refer to
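The "URI instead of data" option could look something like the sketch below. Everything in it is an assumption made for illustration: the `Result` type, the `resolve` function, and in particular the rule that only `file:` URIs are accepted, which stands in for the kind of restriction the slide notes is currently unspecified.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse
from urllib.request import url2pathname


@dataclass(frozen=True)
class Result:
    """An operation result: either inline data or a URI pointing at it."""

    data: Optional[bytes] = None
    uri: Optional[str] = None

    def __post_init__(self) -> None:
        # Exactly one of the two fields must be supplied.
        if (self.data is None) == (self.uri is None):
            raise ValueError("exactly one of data or uri must be set")


def resolve(result: Result) -> bytes:
    """Return the bytes of a result, following its URI if necessary."""
    if result.data is not None:
        return result.data
    parsed = urlparse(result.uri)
    # Assumed restriction: only local file URIs are understood here.
    if parsed.scheme != "file":
        raise ValueError(f"unsupported URI scheme: {parsed.scheme!r}")
    return Path(url2pathname(parsed.path)).read_bytes()


def demo() -> bytes:
    """Write a small intermediate result to disk and pass it by reference."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp).resolve() / "intermediate.bin"
        path.write_bytes(b"occurrence records")
        return resolve(Result(uri=path.as_uri()))
```

A consumer calls `resolve` without caring which form it received, which is the property a uniform specification would need to guarantee.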
Transmission/storage of data
• Desirable to have uniform mechanisms for transmission and storage of data for:
  • Efficient operation of workflows
  • Re-use; composition of workflows
  • Supporting more flexible experimentation
Are workflows sufficient for flexible experimentation?
• Creating a workflow:
  • Workflows clearly good for capturing complex tasks
  • Good for ‘tweaking’ tasks
  • But is this how users think?
  • If not, we should provide an environment that supports a more exploratory approach too, e.g.
    • User tries out some small subtasks
    • (S)he joins results together
    • Builds larger workflows from fragments
• This requires recording of interactions, so re-usable workflows can be composed
  • Storage of intermediate data sets
  • Provenance metadata (extending MDR)
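One way to picture the exploratory approach is a session that records each interaction, keeps every intermediate data set, and can replay the recorded steps as a re-usable workflow. The sketch below assumes a simple linear chain of steps; `ExploratorySession`, `Step` and the two sample operations are hypothetical names, and the species names are made-up sample data.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Step:
    """One recorded interaction: what was run and with which parameters."""

    name: str
    func: Callable[..., Any]
    params: Dict[str, Any]


class ExploratorySession:
    def __init__(self, initial: Any) -> None:
        self.current = initial
        self.steps: List[Step] = []
        self.intermediate: List[Any] = [initial]  # stored intermediate data sets

    def apply(self, name: str, func: Callable[..., Any], **params: Any) -> Any:
        """Run one subtask on the current data and record the interaction."""
        self.current = func(self.current, **params)
        self.steps.append(Step(name, func, dict(params)))
        self.intermediate.append(self.current)
        return self.current

    def provenance(self) -> List[Tuple[str, Dict[str, Any]]]:
        """Minimal provenance record: step names and parameters, in order."""
        return [(s.name, s.params) for s in self.steps]

    def to_workflow(self) -> Callable[[Any], Any]:
        """Compose the recorded steps into a function usable on new input."""
        steps = tuple(self.steps)

        def workflow(data: Any) -> Any:
            for step in steps:
                data = step.func(data, **step.params)
            return data

        return workflow


def keep_genus(records: List[str], genus: str) -> List[str]:
    return [r for r in records if r.startswith(genus + " ")]


def to_upper(records: List[str]) -> List[str]:
    return [r.upper() for r in records]


session = ExploratorySession(["Vicia faba", "Pisum sativum"])
session.apply("filter by genus", keep_genus, genus="Vicia")
session.apply("upper-case", to_upper)
workflow = session.to_workflow()
```

The composed `workflow` can then be run on a different input list, which is the "builds larger workflows from fragments" step in miniature.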
How to achieve dynamic interaction?
• Some possibilities for future development
• Remote direct manipulation (and other remote interactions?)
  • BGI not well suited to fine-grained interaction with resources
  • Some resources may not be accessible except as stand-alone
  • May need (less portable) ‘by-pass’ mechanisms, e.g.
    • New BGI protocol
    • Using existing techniques, such as VNC
• Local direct manipulation, etc.
  • Achievable via component-based ‘plug-in’ approaches (e.g. using JavaBeans), but component interface must be defined
  • Requires data to be present locally; bandwidth concerns
  • Some bandwidth problems can be addressed by combining local specialised client component & remote server component (e.g. passing vectors, not bitmaps)
  • BGI may or may not be fast enough in this case
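The "vectors, not bitmaps" point is easy to quantify with a rough calculation. The sketch below assumes a polyline encoded as pairs of 32-bit floats and an uncompressed 24-bit raster; both encodings and all function names are illustrative choices, not anything specified by BDW.

```python
import struct
from typing import List, Tuple

Point = Tuple[float, float]


def encode_polyline(points: List[Point]) -> bytes:
    """Pack points as little-endian 32-bit float pairs (8 bytes per point)."""
    return b"".join(struct.pack("<2f", x, y) for x, y in points)


def decode_polyline(payload: bytes) -> List[Point]:
    """Inverse of encode_polyline, as a client-side component would use it."""
    return [pair for pair in struct.iter_unpack("<2f", payload)]


def bitmap_size(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size in bytes of an uncompressed raster image."""
    return width * height * bytes_per_pixel


points = [(float(i), float(i) * 0.5) for i in range(100)]
vector_bytes = len(encode_polyline(points))   # 100 points * 8 bytes
raster_bytes = bitmap_size(800, 600)          # 800 * 600 * 3 bytes
print(vector_bytes, raster_bytes)
```

With these assumed figures the server sends 800 bytes for the client to render locally, against 1,440,000 bytes for an 800 × 600 bitmap of the same drawing.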
How to achieve data transmission/intermediate result storage?
• Low level
  • E.g. orchestrate facilities such as GridFTP, GRAM, …
• Higher-level
  • E.g. Inferno, SRB
Additional considerations
• Again, have problem of committing to other, evolving standards
• Need at least a thin API layer to protect resources from change
• And don’t want to break existing BDW system
More direct database exploitation with OGSA-DAI
• BioDA project is investigating relevance & suitability of OGSA-DAI in relation to bioinformatics projects
• 2 main possibilities within BDW:
  1. Augment BGI so that queries can be included in workflows and sent directly to OGSA-DAI-enabled databases.
    • Distributed query processing facilities could assist in planning execution & distribution of data-orientated parts of a workflow. (For the current status of OGSA-DQP see Section 4.)
    • Very major revision to BDW protocols; also,
    • many resources of interest are simply not exposed as databases.
  2. Provide facilities within individual wrappers that benefit from OGSA-DAI.
• Current exemplar (under development) takes approach (2) …
BDW OGSA-DAI initial exemplar
[Figure: diagram of the initial exemplar. Components: OGSA-DAI Client; OGSA-DAI R5 GDS; BDWQueryActivity; Wrapper Module (several Wrappers); Web DBs; deliverFromURL(url); XSLTransform; Format file (xsl). Labelled steps: 1. BGI() / BGIinvokeOperation; 2. Create GDS and query; 3. Invoke wrapper; 4. Query (Web DBs); 5. Download URL; 6. url; 7. XSL transform to BDW format; 8. getOutPut() (pull data).]
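The numbered steps of the exemplar can be followed in a small stand-alone sketch. It makes no OGSA-DAI calls: the Web database is an in-memory dictionary, the XSL transform is replaced by a hand-written element renaming (the Python standard library has no XSLT processor), and the `bdwResult` / `record` / `name` element names are invented for illustration rather than taken from the real BDW format.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical stand-in for remote Web databases: result URL -> XML document.
FAKE_WEB: Dict[str, str] = {
    "http://example.org/result/1": "<rows><row><n>Vicia faba</n></row></rows>",
}


def query_web_db(query: str) -> str:
    """Steps 3-4: a wrapper queries a Web DB and hands back a result URL."""
    # The query is ignored here; a real wrapper would forward it.
    return "http://example.org/result/1"


def deliver_from_url(url: str) -> str:
    """Step 5: download the document the URL refers to."""
    return FAKE_WEB[url]


def to_bdw_format(xml_text: str) -> str:
    """Step 7 stand-in: restructure the source XML into the target layout."""
    source = ET.fromstring(xml_text)
    target = ET.Element("bdwResult")
    for row in source.iter("row"):
        record = ET.SubElement(target, "record")
        ET.SubElement(record, "name").text = row.findtext("n")
    return ET.tostring(target, encoding="unicode")


def bgi_invoke_operation(query: str) -> str:
    """Steps 1-8 chained: invoke, query, download, transform, return output."""
    url = query_web_db(query)       # steps 2-4 and 6
    raw = deliver_from_url(url)     # step 5
    return to_bdw_format(raw)       # steps 7-8


output = bgi_invoke_operation("genus = 'Vicia'")
```

The caller sees a single coarse-grained operation, which matches the slide's choice of approach (2): the OGSA-DAI machinery stays inside the wrapper.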
BDW OGSA-DAI exemplar extension
[Figure: diagram of the exemplar extension. Components: OGSA-DAI Client; OGSA-DAI R5 GDS; several XSLTransform activities; mergeOutput; deliverToURL / GFTP. Labelled steps: 1. BGI([ ]) / BGIInvokeOperation([ ]); 7. XSL transform to BDW format; 8. integrate output; 9. To WF unit.]
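The "integrate output" step of the extension, where several transformed results are merged before delivery to the next workflow unit, might be sketched as below. The `merge_output` name echoes the mergeOutput label on the slide but the function itself is hypothetical, as are the `bdwResult` and `record` element names used for the per-source documents.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def merge_output(documents: Iterable[str]) -> str:
    """Combine the records of several result documents under one root."""
    merged = ET.Element("bdwResult")
    for text in documents:
        root = ET.fromstring(text)
        # Move every child record of this document into the merged result.
        merged.extend(list(root))
    return ET.tostring(merged, encoding="unicode")


per_source = [
    "<bdwResult><record><name>Vicia faba</name></record></bdwResult>",
    "<bdwResult><record><name>Pisum sativum</name></record></bdwResult>",
]
combined = merge_output(per_source)
```

This sketch simply concatenates records in source order; removing duplicates or reconciling conflicting records would need rules that the slides do not specify.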
Conclusions
• BDW interoperation layer designed to meet requirements we were given
  • Suitable for high-level interactions
  • Not so good for dynamic interaction with resources (need for this now generally recognised)
  • Doesn’t specify how data is to be moved around
• Applicable to other domains meeting similar criteria
• Interesting possibilities for extension
• But we have achieved a sustainable architecture; this is an important feature to retain in future systems
Some discussion points (arising from Jaspreet’s and Andrew’s talks)
1. Balance of requirements for different kinds of GRIDS – (performance, resource discovery, sustainability, …) – how does this affect decisions about architectures, protocols, … ?
2. How can BDW protocols best be enhanced in future projects?
3. How can we best achieve interoperability between grids from different projects (including BDW)?
4. How can we make it easier for 3rd parties to
  • Introduce their resources to an existing BgiWrapperService?
  • Develop their own additional BgiWrapperServices?