OMII-UK Software Activities
Steven Newhouse, Director
©2
Our Mission…
OMII-UK aims to provide software and support to enable a sustained future for
the UK e-Science community and its international collaborators
• Promote the use of good-quality open-source software
• Reduce the risk of moving to the new e-infrastructure world
• Recognise distinct user communities: by domain and function
©3
The OMII-UK Partnership
Manchester
Southampton
Edinburgh
Cambridge
University of Manchester
Electronics and Computer Science
University of Edinburgh
European Bioinformatics Institute
• Southampton: 14 FTEs
• Manchester: 9 FTEs
• Edinburgh: 8 FTEs
• Community: ~8 FTEs
©4
OMII-UK in context
[Figure: layered context diagram. e-Science Users: EPSRC, ESRC, STFC, NERC, BBSRC, MRC, AHRC. User groups: Bioinformatics Users, Engineering Users, Information Retrieval (JISC), Industrial Partners. Layers: Ad hoc e-Infrastructure services; Higher-level services & tools; OMII-UK Services (Organisation & Composition); e-Infrastructure Services to enable e-Science (Globus, gLite, CROWN, NAREGI, Web Services, ..); National Grid Services (Data, Compute); International Grid Providers.]
©5
User Communities
• Users: Applied e-Researchers (applied research domain)
  • Casual User (Novice or Infrequent)
  • Intensive User (Expert or Focused)
• Technologists: Technology Specialists (domain & generic)
  • Assemblers of domain Components/Services/Tools
  • Builders of domain Components/Services/Tools
  • Assemblers of generic Components/Services/Tools
  • Builders of generic Components/Services/Tools
• Providers: e-Infrastructure Providers
  • VO Managers
  • Resource Owners
  • Helpdesk & Training
  • System Administrators
©6
Broad Software Activities
• Taverna: Composing workflows across distributed resources
• OGSA-DAI: Web service for integrating heterogeneous data resources
• GridSAM: Web service for job submission and job monitoring
• GRIMOIRES: Support for service publishing, discovery & annotation
• BPEL: Supporting scientific workflows through web services
Taverna
©8
Taverna - Background
• Emerged from myGrid project (UK EPSRC)
• Integration & Interoperability very difficult
  • Cut and paste between data sources & services
  • Can translate data formats by using other services
  • Quickly realised that this was not a viable solution…
• Focus on the challenges within the bioinformatics community
  • Everything is distributed: Data, Services and Scientists
  • Heterogeneous data sources
  • Many specifications: I/O, data representation, annotation
  • Everything is a string!
©9
Traditional Bioinformatics
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt 12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt 12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct 12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt 12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt 12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt 12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg 12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga 12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc 12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa 12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa
Manual workflow by PhD student
©10
Workflows
• Workflow language specifies how processes fit together
• High level workflow diagram separated from any lower level coding – you don’t have to be a coder to build workflows
• Workflow is the script or protocol that you configure when you run it
• Workflow is the integrator of knowledge
• Provides automation and repeatability
• Easily share & customise workflows
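As a rough illustration of the idea that a workflow only declares how processes fit together, the sketch below wires a few plain Python functions into a small dataflow graph and runs them in dependency order. It is not SCUFL and not the Taverna enactor: the `Workflow` class, the processor names and the toy sequence are all hypothetical.

```python
from graphlib import TopologicalSorter


class Workflow:
    """Toy dataflow workflow: named processors plus the links between them."""

    def __init__(self):
        self.processors = {}   # processor name -> callable
        self.inputs = {}       # processor name -> list of upstream names

    def add(self, name, func, after=()):
        self.processors[name] = func
        self.inputs[name] = list(after)

    def run(self, seed):
        """Run each processor once all of its upstream results exist."""
        results = {}
        for name in TopologicalSorter(self.inputs).static_order():
            upstream = [results[dep] for dep in self.inputs[name]]
            # Processors without upstream links receive the workflow input.
            args = upstream if upstream else [seed]
            results[name] = self.processors[name](*args)
        return results


# Hypothetical protocol: clean a DNA string, then compute its GC fraction.
wf = Workflow()
wf.add("clean", lambda seq: seq.replace(" ", "").upper())
wf.add("gc_count", lambda seq: sum(seq.count(b) for b in "GC"), after=["clean"])
wf.add("length", len, after=["clean"])
wf.add("gc_fraction", lambda gc, n: gc / n, after=["gc_count", "length"])

results = wf.run("acat ttct ac")
```

The links are data, separate from the processor code, so the same processors can be rewired or shared without touching their implementation.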
©11
Taverna Workflow Components
• SCUFL: Simple Conceptual Unified Flow Language
• Taverna: Writing, running workflows & examining results
• SOAPLAB: Makes applications available
[Figure: a SOAPLAB Web Service wrapping Any Application, alongside a Web Service e.g. DDBJ BLAST]
©12
Adding your own processes
• Consume web services
• SoapLab: Expose applications as web services
• Java API Consumer: import Java API of libSBML as workflow components
http://www.ebi.ac.uk/soaplab/
©13
Shield the Scientist – Bury the Complexity
[Figure: the Taverna Workbench (application) sits on the Scufl Model (Simple Conceptual Unified Flow Language) and the workflow enactor (workflow execution). Processor types shown: Plain Web Service, Soaplab, Local Java App, Enactor, BioMOBY, WSRF, BioMART, Styx client, R package, ...]
©14
Related Challenges
• Community Services
  • Services: Web Services, Web Forms, Local processors
  • Need to be annotated and made available for discovery
• Use Semantic Web Technologies
  • Annotate by function, type input & output using ontology
  • Discover services by function in Feta
  • Experimental workflow needs services to interact with
• Provenance
  • Record the services run within a workflow
  • Be able to go back and replay
  • Quality of the results based on the quality of input
OGSA-DAI
Mario [email protected]
©16
What is OGSA-DAI?
• Middleware providing data access and integration capabilities
  • Targeted at application developers
• Provides access to data through Web Services
  • Uniform access interface
  • Data integration capabilities
• Supports different types of data
  • Relational
  • XML
  • File system
  (note: it does not virtualize the underlying data model)
• Enacts a simple but powerful workflow: Query-Transform-Delivery-etc
  • Encapsulates multiple service interactions in one
  • Moves computation closer to data
©17
Why use OGSA-DAI?
• Use of Web Services
  • Platform independence/Language neutrality
• Transparencies
  • Location/Product
• Additional security layers
  • service-level/resource-level
• Provides extensive base functionality:
  • Query (SQL, XPath, XQuery, XUpdate, …)
  • Transformation (XSL, Compression, …)
  • Delivery (ftp, GridFTP, SOAPwAttachments, …)
  • Extensible (add your own!)
• Non-SOAP based delivery mechanisms
©18
More reasons …
• Out of the box solution
  • Saves application developer time
• Extensible and Versatile Framework
  • Can add or customise capabilities
• Plays Nicely with Other Grid Middleware
  • OMII-UK Distribution
  • Globus Toolkit 4.0.*
  • Tomcat with Axis
  • Soon: UNICORE 6 (OMII-Europe), gLite 3 (OMII-Europe)
• Good documentation
©19
Using OGSA-DAI
[Figure: deployment diagrams showing a Client, OGSA-DAI, a Data Source and an Application-specific service (or application-specific functionality), linked by control messages and data messages.]
©20
Usage Scenarios
[Figure: usage scenario diagrams. Labels: Client, OGSA-DAI, Data Source, FTP Server on Client, Data Source 1, Data Source 2, ..., Data Source n. Arrows distinguish control messages from data messages.]
©21
Relational Multi-Resources
[Figure: a Multiple Data Resource Accessor behind a multi-resource Data Service Resource (Multi → One). A single SQL Query is passed to M relational Data Service Resources, each returning Results; the combined Multi Results (Results x M) are returned through the Data Service.]
More sophisticated capabilities offered in conjunction with OGSA-DQP
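The multi-resource pattern (one SQL query, M relational resources, M result sets merged) can be sketched with the Python standard library. This is only an illustration of the fan-out idea using in-memory SQLite databases; it does not use OGSA-DAI, and the table, resource labels and rows are made-up toy values.

```python
import sqlite3


def make_resource(rows):
    """Create an in-memory relational resource with a toy 'genes' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE genes (name TEXT, length INTEGER)")
    conn.executemany("INSERT INTO genes VALUES (?, ?)", rows)
    return conn


def query_all(resources, sql, params=()):
    """Send the same SQL query to every resource and tag rows with their origin."""
    merged = []
    for label, conn in resources.items():
        for row in conn.execute(sql, params):
            merged.append((label,) + tuple(row))
    return merged


# Two hypothetical resources holding the same schema but different rows.
resources = {
    "site_a": make_resource([("g1", 700), ("g2", 300)]),
    "site_b": make_resource([("g3", 900), ("g4", 100)]),
}
rows = query_all(
    resources,
    "SELECT name FROM genes WHERE length > ? ORDER BY name",
    (250,),
)
```

Tagging each row with the resource it came from keeps the merged result traceable, which a plain union of result sets would lose.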
©22
Service Model
[Figure: Data Services (support different messaging infrastructures) receive a Perform Document and return a Response Document. Data Service Resources provide the core functionality. Data Resource Accessors (support different data source types) connect to Relational, XMLDB and Files resources.]
©23
Inside a Perform Document
[Figure: a DB Query activity produces data in blocks; a Pipe stores and provides access to the data blocks; a Delivery activity consumes the data blocks.]
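The producer, pipe and consumer arrangement inside a perform document can be mimicked with Python generators, where each activity pulls blocks from the previous one. This is a sketch of the block-streaming idea only; the function names are hypothetical and nothing here is OGSA-DAI's activity API.

```python
def query_activity(rows, block_size):
    """Produce result rows in fixed-size blocks (stand-in for a DB query)."""
    for start in range(0, len(rows), block_size):
        yield rows[start:start + block_size]


def transform_activity(blocks, func):
    """Consume blocks, apply func to every row, and pass the blocks on."""
    for block in blocks:
        yield [func(row) for row in block]


def delivery_activity(blocks, sink):
    """Consume blocks, append their rows to sink, and return the block count."""
    count = 0
    for block in blocks:
        sink.extend(block)
        count += 1
    return count


rows = list(range(1, 8))          # seven toy result rows
delivered = []
blocks = query_activity(rows, block_size=3)
blocks = transform_activity(blocks, lambda r: r * 10)
n_blocks = delivery_activity(blocks, delivered)
```

Because generators are lazy, only one block is in flight at a time, which is the point of streaming data between activities rather than materialising the whole result.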
©24
Activity Types
• Resource-specific
  • Relational
  • XMLDB
  • Files
  • Multi-resources
• Transformation
• Delivery
• Resource creation and destruction
• Can extend/customise the framework
©25
Recap
• Request
  • XML perform document submitted by client
  • Contains:
    • Connected set of activities to be executed by data resource
    • Flow control: sequential/parallel execution of activities
• An activity
  • Is an individual data-related operation
  • Has 0 or more inputs and 0 or more outputs
• Data Resource Service
  • Parses requests, executes activities, builds responses
• Response
  • Status of execution of a request, possibly with result data
  • XML response document returned to a client
©26
Next Release
• Complete re-write
• Release scheduled for Q2 2007
  • OGSA-DAI 3.0 (OMII-UK)
  • OGSA-DAI 3.0 (GT4.0.3)
• Offers a number of advantages …
©27
For the next release …
• Request model
  • Specify multiple data resources in a single request
  • Support more complex scenarios involving multiple data resources
• Activity framework
  • Improved support for:
    • Streaming of data
    • Handling BLOBs and binary data
  • Simpler activity API and activity input/output model
  • Support for iteration of activities in a single request
• Resource model
  • Data resources, sessions, requests, data sources, data sinks
©28
…and …
• Core APIs
  • Drive OGSA-DAI core functionality down from the presentation layer
  • Easier to write application-specific presentation layers
• Persistence
  • Persist the state of services and resources within files/database
• Security
  • Pluggable resource and activity authorization framework
  • Call outs to databases or remote delegation services
  • Improved support for:
    • Message-level security
    • Transport-level security
    • Delegation
• Scalability and robustness
  • Provision for future support to execute parts of a request on different JVMs
  • Provision for future support of clustering and load balancing of requests to different OGSA-DAI servers
©29
Further Information
• The OGSA-DAI project site: http://www.ogsadai.org.uk
• The DAIS-WG site: http://forge.gridforum.org/projects/dais-wg
• OGSA-DAI users mailing list: [email protected]
  • General discussion on OGSA-DAI, data and the grid
• Formal support for OGSA-DAI releases: http://www.ogsadai.org.uk/support [email protected]
GridSAM
©31
GridSAM
• A Job Submission and Monitoring Web Service
• Supports Job Submission Description Language (JSDL)
• It is not…
  • A scheduling service
    • That’s the role of the underlying launching mechanism
    • That’s the role of a super-scheduler that brokers jobs to a set of GridSAM services
  • A provisioning service
    • GridSAM runs what it has been told to run
    • GridSAM does not resolve software dependencies and resource requirements
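To give a feel for what a JSDL job description looks like, the sketch below builds a minimal one with Python's standard XML library. It uses the JSDL 1.0 core and POSIX application namespaces; the `build_jsdl` helper and the example command are hypothetical, and no GridSAM client call is shown because the slides do not describe that API.

```python
import xml.etree.ElementTree as ET

# JSDL 1.0 core and POSIX application namespaces.
JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"


def build_jsdl(executable, arguments):
    """Return a minimal JSDL JobDefinition as an XML string."""
    ET.register_namespace("jsdl", JSDL)
    ET.register_namespace("jsdl-posix", POSIX)
    job = ET.Element(f"{{{JSDL}}}JobDefinition")
    desc = ET.SubElement(job, f"{{{JSDL}}}JobDescription")
    app = ET.SubElement(desc, f"{{{JSDL}}}Application")
    posix = ET.SubElement(app, f"{{{POSIX}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX}}}Executable").text = executable
    for arg in arguments:
        ET.SubElement(posix, f"{{{POSIX}}}Argument").text = arg
    return ET.tostring(job, encoding="unicode")


# Hypothetical job: run /bin/echo with two arguments.
document = build_jsdl("/bin/echo", ["hello", "grid"])
```

The document only states what to run; where and when it runs is left to the underlying launching mechanism, in line with GridSAM not being a scheduler.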
©32
What is GridSAM to end-users?
• A set of command line tools and client APIs to:
  • Submit and Start Jobs
  • Monitor Jobs
  • Terminate Jobs
  • Transfer files
• Client-side submission scripting
• Client-side Java API
©33
What is GridSAM to resource providers?
• A Web Service to expose heterogeneous execution resources uniformly
  • Single machine through Forking or SSH
  • Condor Pool
  • Grid Engine 6 through DRMAA
  • Globus 2.4.3 exposed resources
  • OR use our plug-in API to implement …
©34
GridSAM Implementation
• Virtual File System API (Apache VFS)
  • FTP / GSIFTP / HTTP / WEBDAV / SFTP
• Event dispatch (OpenSymphony Quartz)
• Job Persistence (Hibernate - JDBC databases)
• Runtime Monitoring and Control (JMX)
©35
Latest Features
• JSDL extension to support MPI Applications
• Authorisation based on JSDL structure
  • Allow / deny submission based on a set of XPath rules and the identity of the submitter (e.g. distinguished name)
• Tracking Basic Execution Service (ogsa-bes)
• HPCP Interoperation at SC06
  • Will implement future HPCP revisions
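The XPath-based authorisation feature can be pictured as a small rule table evaluated against the submitted job document. The sketch below is an assumption-laden illustration, not GridSAM's rule format: the rule tuples, the first-match-wins and default-deny policy, the distinguished names and the simplified, namespace-free job XML are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: (effect, submitter DN or "*", XPath into the job document).
RULES = [
    ("deny", "*", ".//Executable[.='/bin/rm']"),
    ("allow", "CN=alice,O=example", ".//Executable"),
]


def authorise(job_xml, submitter_dn, rules=RULES):
    """Apply the first rule whose DN and XPath both match; deny otherwise."""
    root = ET.fromstring(job_xml)
    for effect, dn, xpath in rules:
        if dn not in ("*", submitter_dn):
            continue
        if root.find(xpath) is not None:
            return effect == "allow"
    return False  # default deny when no rule matches


# Simplified stand-ins for JSDL documents (no namespaces).
echo_job = "<Job><Application><Executable>/bin/echo</Executable></Application></Job>"
rm_job = "<Job><Application><Executable>/bin/rm</Executable></Application></Job>"

alice_echo = authorise(echo_job, "CN=alice,O=example")
alice_rm = authorise(rm_job, "CN=alice,O=example")
bob_echo = authorise(echo_job, "CN=bob,O=example")
```

Placing the deny rule first means a forbidden executable is rejected even for an otherwise trusted submitter.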
GRIMOIRES
©37
GRIMOIRES Service Registry
A Grimoire is a magician's manual for invoking demons
• Web Service for a service registry (UDDI)
• Certificate-based authentication and access control
• Lifetime management of entries
• Register WSDL and annotate entries with metadata
  • Metadata from third party annotation
• Search for services
  • Through WSDL interface
  • Through metadata annotation
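The register, annotate and search cycle described above can be sketched as a small in-memory registry. This is a toy model of the idea only; the `Registry` class, the service names and the example.org WSDL URLs are hypothetical and unrelated to the GRIMOIRES or UDDI interfaces.

```python
class Registry:
    """Toy in-memory service registry with third-party metadata annotation."""

    def __init__(self):
        self.entries = {}    # service name -> WSDL URL
        self.metadata = {}   # service name -> set of (key, value, author)

    def register(self, name, wsdl_url):
        self.entries[name] = wsdl_url
        self.metadata.setdefault(name, set())

    def annotate(self, name, key, value, author):
        """Attach a (key, value) annotation, recording who supplied it."""
        if name not in self.entries:
            raise KeyError(f"unknown service: {name}")
        self.metadata[name].add((key, value, author))

    def find_by_metadata(self, key, value):
        """Return the sorted names of services carrying the annotation."""
        return sorted(
            name for name, notes in self.metadata.items()
            if any(k == key and v == value for k, v, _ in notes)
        )


registry = Registry()
registry.register("blast", "http://example.org/blast?wsdl")
registry.register("clustal", "http://example.org/clustal?wsdl")
registry.annotate("blast", "function", "sequence-alignment", author="curator")
registry.annotate("clustal", "function", "sequence-alignment", author="user42")
registry.annotate("blast", "input", "fasta", author="curator")
matches = registry.find_by_metadata("function", "sequence-alignment")
```

Recording the author of each annotation keeps third-party metadata distinguishable from what the service owner published.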
©38
Implementation
• An open source implementation in Java
• Interoperable with standard UDDI tools: UDDI browser, uddi4j
• Deployable in multiple environments:
  • Tomcat/Axis and the OMII-UK software environment
  • GT4: Expose registry entities as WS-Resources
    • Support WS-ResourceProperties, WS-ResourceLifetime, and WS-Notification
• Uses an RDF triple store behind the scenes
  • Link entries to support reasoning search interface
©39
Features in Progress
• XPath and XML capability
  • Publishing XML documents
  • XPath-based query
• Scalability and performance improvement
  • Collaboration with EGEE
• Replication support based on WS-Notification
  • Notification and replication support outside WSRF
BPEL
©41
BPEL: Flexible Orchestration of Scientific Workflows
• BPEL: Business Process Execution Language
• BPEL is the industry standard for orchestration of web services
• Multiple providers, both commercial and open-source
• Funded project integrates open-source components for science
  • Visual modelling environment (Eclipse project)
    • UCL Developer with committer status
  • BPEL Enactment engine (ActiveBPEL)
©42
The OMII-UK BPEL Project
• Make the benefits of BPEL accessible to application scientists
• Need to overcome a number of issues:
  • Provide suitable set of abstractions to simplify creation of scientific workflows
  • Provide tool support to hide complexity of technologies inherent to BPEL
  • Provide integration of various middleware technologies
UCL Department of Computer Science
©43
Example Integration
©44
Graphical BPEL Designer
• Developed in cooperation with IBM and Oracle
• An Eclipse plug-in & project
• Features (will) include:
  • Pre-deployment validation
  • Automated deployment to various BPEL engines: ActiveBPEL, JBOSS, jBPM and many more
  • Automated client generation to run deployed workflows
  • Integration with graphical WSDL, XSD editors
  • Project management including CVS
  • Context-sensitive wizards
©45
ActiveBPEL Engine
• Open source implementation of BPEL4WS 1.1
• Support for WS-BPEL 2.0 via extensions
• Industrial-strength BPEL workflow enactment engine
  • Scalable
  • Persistence
  • Hot-deployment
• Provides Web-based management & monitoring console
©46
Scientific Workflows
• Need to combine web/grid services into:
  • Larger services
  • Experiments that connect services & processes
• Experiments need to be changed frequently to incorporate new insights and ideas
• Application scientists need ownership of their workflows
• Challenges: large data, many services & invocations, concurrent workflows (parameter sweeps)
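The parameter-sweep challenge, where the same workflow is invoked many times concurrently with different inputs, can be sketched with Python's standard thread pool. This is an illustration of the pattern only, not BPEL or ActiveBPEL; `run_workflow` is a hypothetical stand-in for one workflow invocation and the parameter names are invented.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_workflow(temperature, pressure):
    """Stand-in for one workflow invocation; returns a toy result."""
    return temperature * pressure


def parameter_sweep(temperatures, pressures, max_workers=4):
    """Run the workflow for every parameter combination, concurrently."""
    combos = list(product(temperatures, pressures))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with combos.
        results = list(pool.map(lambda c: run_workflow(*c), combos))
    return dict(zip(combos, results))


sweep = parameter_sweep([280, 300], [1, 2, 3])
```

The number of invocations grows as the product of the parameter ranges, which is why sweeps stress both the enactment engine and the services it calls.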
Community Interactions
©48
Support & Training
• Provide confidence in adopting e-Science solutions through software support and training.
• Provide collaborative mechanisms to enable the e-Science community to help itself.
• Engage with the international community to define, contribute and disseminate best practice and standards.
ISSGC06
©49
Standards Engagement: Consume & Contribute
• Open Grid Forum
  • Membership of the Board of Directors, Standards & e-Science
  • Active in WGs: OGSA-WG, BES-WG, JSDL-WG, DAIS-WG, Byte-IO
• W3C
  • Semantic Web/Grid (e.g. OWL, RDF)
  • Healthcare and Life Sciences SIG (LSIDs)
  • OGF Liaison
• OASIS
  • WSRF membership
  • Track relevant specifications
• Tracking European Standards Developments
©50
Website
• Information:
  • General pages - news, events & wiki (with RSS feeds)
  • About software being used in the community
  • Communities & projects using our software
• Software:
  • Download individual software components
    • Work that we have commissioned
    • Interim releases directly from individual development teams
    • Contributions from the community
  • Integrated software release (client & server bundles)
    • Wizards to install Tomcat, Axis, Services, Database, …
©51
Website Interactions
[Figure: interactions between the e-Science Community, the Open Source Development Community, and Open Source Developers funded by OMII-UK in the community, and the Website & Wiki, Software Repository and Software Catalogue. Arrow labels: information about software that you have found useful; information about software that you are developing that others might use; register a software project or activity; contributing software into the Repository.]
©52
OMII-UK User Community
[Figure: Applied Domain Researchers, Technologists and Providers use the Website & Wiki, Software Repository and Software Catalogue. They can download individual software components directly from the repository, or the OMII-UK Software Release (software components integrated and tested to form the OMII-UK software release).]
• Information about the community
• Advice & consultancy
• Community forums & feedback
• Support & Training
• Partnerships to provide software
©53
Commissioned Software Programme
• Invest in open-source community development activities
  • Approximately 8 FTEs a year
  • Projects of 12-18 months in length
  • Hardening of software, not research
• Funding mechanism
  • Through specific calls (e.g. portlets, GridAPIs, …)
  • Respond to specific proposals as they come in
• Deposit outputs in the repository and NeSCForge
Increase confidence & accelerate adoption of open-source software
©54
Funding from recent calls
• SAGA Implementation (Shantenu Jha)
  • Java & C++ implementation on OMII-UK & GT
• Shibboleth enabled portal (Richard Sinnott)
  • Portlet to specify resource access control policy
  • Portlets to manage attribute delegation
• Artefact sharing framework for portlets (Ian Taylor)
  • Controlled sharing of workflows
• Still under negotiation:
  • Sharing application descriptions based upon JSDL
  • Execution of JSDL documents on resources
  • Community outreach for new GridAPIs
©55
Very Flexible Funding Model
• Engagement with large open-source projects
  • Taverna, BPEL components, …
• Development of community driven projects
  • OGSA-DAI, GridSAM, Grimoires, …
• Not just UK focussed
  • Fund developers in USA & Amsterdam
• Community not technology focussed
  • Have own software release around plain WS
  • Fund activity consuming GT: OGSA-DAI, SAGA, Grimoires
©56
Other projects
• Knoggle
  • An open architecture for matchmaking and brokering
  • Consumes GridSAM, Grimoires, BPEL & Taverna
• Open Grid Manager
  • Monitoring and reporting of resources
  • Lightweight probes recording data in Grimoires
• Application Hosting Environment
  • Simplified lightweight interface to running applications
  • Uses WSRF::Lite to provide secure Perl WS
Release 3.2.0
©58
Perspectives on release 3.2.0
• Web Service Developer
  • Secure web services hosting environment
  • Infrastructure to help you build services
• Application Developer
  • Everything a Web Service Developer wants
  • Higher-level services that do ‘something’
• Service Provider
  • Easy installs, portability, composability – ‘pick & mix’
• End-User
  • Greater focus on the ‘client’ side
  • Workflow composition and execution
©59
For a Web Service Developer
• Tomcat, Axis & WS-Security: Integrated and deployed onto your machine
• Implementation of the WS-Eventing specification
• Implementation of the WS-ReliableMessaging and WS-Reliability specifications
©60
For an Application Developer
• Job Submission and Job monitoring web service that uses the OGF’s Job Submission Description Language (JSDL) to describe jobs
• UDDI compliant registry web service that can support the addition of extra service meta-data.
• OGSA-DAI as a framework for querying, processing and delivering data from and between heterogeneous sources via a web service interface.
©61
For an End User
• Taverna: Graphical workflow composition tool able to integrate different web, data and web service sources.
• Packaging & contribution to the open-source Oracle/IBM BPEL workflow editor and ActiveBPEL execution engine.
• A lightweight application hosting environment for running unmodified scientific applications across different grid infrastructures, which uses WSRF::Lite - a Perl implementation of the WS-RF specifications
• Accessing web and grid services from Matlab and Jython environments
©62
Next Production Release
• Updates from OGSA-DAI, Taverna, GridSAM, Grimoires
• Open Grid Manager: A framework for reporting on the status of grid resources into Grimoires and viewing the collected results.
• AuthZ Service: Integration of SAML based service for container wide authorisation policy.
• MANGO: Reference application using BPEL and GridSAM.
• Portlets: Integrating hosting environment and portlets.
Timeline
• Series of development releases (3.3.x)
• Production release in April 2007 (3.4.0)
©63
Summary
• What can you do to get involved?
  • Let others know about your project
  • Contribute a release of your software
  • Join the beta-testing programme
  • Download the complete software release or a component
• More Information:
  • Web: www.omii.ac.uk
  • Contact: [email protected]
  • Mail: [email protected]