Syntactic Mediation in Grid and Web Service Architectures
Martin Szomszor
Terry R. Payne
Luc Moreau
University of Southampton
myGrid [http://www.mygrid.org.uk]
Syntactic Mediation…
The Cambridge online dictionary defines mediate (verb [I or T]) as:
“to talk to two separate people or groups involved in a disagreement to try to help them to agree or find a solution to their problem”
Syntax: the structure and organisation of information
Syntactic mediation: to mediate between two parties who do not agree on syntax
… in Grid and Web Services
Questions to be answered:
When is syntactic mediation required?
What are the current solutions?
How can they be improved?
Use case: taken from a Bioinformatics Grid application
Our solution: an intermediary representation of data in OWL, with mappings between XML and OWL
Conclusions and further work
Grid and Web Services
Grid: coordinated resource access; a service-oriented view of resource access
Web Services Architecture: supports the service-oriented view; resources are exposed through Web Service interfaces
This powerful model enables complex collaboration of disparate resources, typically through the use of workflow
myGrid - Bioinformatics
In-silico experimentation: myGrid is service oriented
Steps in the experimental process correspond to Web Service invocations
Taverna - The Virtual Workbench: compose, edit, execute, monitor, view results
[Screenshot: Taverna Workbench]
A Bioinformatics Use Case
Get some sequence data and perform a sequence alignment
Sequence alignment checks for similarities between different sequences
Many sequence data repositories: XEMBL, DDBJ-XML
Many sequence alignment services: NCBI Blast, …
Bio Example - Workflow
Two stages, with more than one service available for some stages
[Figure: AccessionNumber -> Get Sequence Data (XEMBLService or DDBJ-XMLService) -> SequenceData -> Sequence Alignment (NCBI Blast) -> AlignmentResult]
Bio Example - Workflow
Two possible concrete workflows
Using the XEMBL service: AccessionNumber -> XEMBLService -> SequenceData -> NCBI BlastService -> AlignmentResult
Using the DDBJ-XML service: AccessionNumber -> DDBJ-XMLService -> SequenceData -> NCBI BlastService -> AlignmentResult
Syntactic Compatibility
Is the output from one service compatible with the input to another service?
[Figure: conceptually, XEMBLService produces SequenceData and NCBI BlastService consumes SequenceData; syntactically, XEMBLService produces a BSML Sequence Record while NCBI BlastService expects a FASTA Formatted Sequence]
Conceptually the same type, but different syntactic types
Syntactic Mediation
When a syntactic mismatch occurs, some additional processing is required
We define this processing as syntactic mediation
Current solutions: explicitly defined transforms (e.g. XSLT) or hard-coded black-box components
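The Python sketch below illustrates the hard-coded option by converting a sequence record straight into FASTA. It assumes a simplified input: a single `<Sequence>` element with an `ic-acckey` attribute and a `<seq-data>` child, echoing the fragments used later in these slides. Real BSML documents are more deeply nested. `bsml_to_fasta` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def bsml_to_fasta(bsml_xml: str, line_width: int = 60) -> str:
    """Hard-coded converter: simplified BSML-like record -> FASTA text."""
    root = ET.fromstring(bsml_xml)
    # Element and attribute names are fixed in the code, so any change to
    # either format means editing this component.
    accession = root.get("ic-acckey")
    residues = "".join(root.findtext("seq-data", default="").split())
    lines = [f">{accession}"]
    # FASTA bodies are conventionally wrapped at a fixed width.
    lines += [residues[i:i + line_width] for i in range(0, len(residues), line_width)]
    return "\n".join(lines) + "\n"


record = '<Sequence ic-acckey="AB000059"><seq-data>aatagagtg</seq-data></Sequence>'
fasta = bsml_to_fasta(record)
```

Such a converter handles exactly one pair of formats. That tight coupling is what the assisted approach aims to avoid.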
Assisted Mediation
How can we achieve this?
Use ontologies to provide a common conceptual model of the data
Describe how parts of the data schema (XML Schema) correspond to parts of the semantic schema (OWL)
Transform data sets into their corresponding conceptual representations
Use the conceptual representation as a common model to mediate between the services' data models
Syntactic Mediation
[Figure: XEMBLService takes an AccessionNumber and outputs a BSMLSequenceRecord; NCBI BlastService takes a FASTA Formatted Sequence and outputs AlignmentResults; the two are linked through a conceptual representation of sequence data]
Transform the XML output from the XEMBL service into an OWL concept instance
Serialise the OWL concept instance to XML for input to the NCBI Blast service
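The two steps can be sketched as a "lift" into a shared conceptual model followed by a "lower" into the target syntax. In this minimal sketch, a plain dict stands in for the OWL concept instance. The registries `LIFT` and `LOWER`, the format keys, and all function names are hypothetical. The input shape is the same simplified record assumed earlier.

```python
import xml.etree.ElementTree as ET


def lift_bsml(doc: str) -> dict:
    """XML output of the XEMBL-style service -> conceptual instance (dict stand-in for OWL)."""
    root = ET.fromstring(doc)
    return {"type": "Sequence_Data",
            "accession_id": root.get("ic-acckey"),
            "sequence": root.findtext("seq-data", default="").strip()}


def lower_fasta(concept: dict) -> str:
    """Conceptual instance -> FASTA text expected by the alignment service."""
    return f">{concept['accession_id']}\n{concept['sequence']}\n"


# One lift and one lower function per format, each relating that format to the shared model.
LIFT = {"BSML": lift_bsml}
LOWER = {"FASTA": lower_fasta}


def mediate(doc: str, source_format: str, target_format: str) -> str:
    concept = LIFT[source_format](doc)       # step 1: XML -> concept instance
    return LOWER[target_format](concept)     # step 2: concept instance -> target syntax


xembl_output = '<Sequence ic-acckey="AB000059"><seq-data>aatagagtg</seq-data></Sequence>'
blast_input = mediate(xembl_output, "BSML", "FASTA")
```

Adding a new format means registering one lift or lower function, not writing a converter for every other format.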
Bioinformatics Use Case: Sequence Data Ontology
[Figure: ontology diagram; the key distinguishes object properties from subconcept links]
Sequence_Data: description, accession_id, sequence, has_feature, has_reference
Reference: authors, journal, title
Sequence_Feature: location
Feature_Source: lab_host, isolate, mol_type, organism
Feature_CDS: translation, product, protein_id
Sequence_Location: start, end
BSML_Sequence_Data: date_created, date_last_updated
DDBJ_Sequence_Data: version, division
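One way to see the ontology's shape in code is to mirror it with Python dataclasses: subclassing stands in for subconcepts, and fields stand in for properties. The subconcept links are an assumption read from the diagram's layout. This sketch takes Feature_Source and Feature_CDS to be subconcepts of Sequence_Feature, and BSML_Sequence_Data and DDBJ_Sequence_Data to be subconcepts of Sequence_Data. It is only an analogy. OWL uses open-world semantics and lets an individual belong to several concepts, which class inheritance does not capture.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sequence_Location:
    start: int
    end: int


@dataclass
class Reference:
    authors: Optional[str] = None
    journal: Optional[str] = None
    title: Optional[str] = None


@dataclass
class Sequence_Feature:
    location: Optional[Sequence_Location] = None  # object property


@dataclass
class Feature_Source(Sequence_Feature):  # assumed subconcept of Sequence_Feature
    lab_host: Optional[str] = None
    isolate: Optional[str] = None
    mol_type: Optional[str] = None
    organism: Optional[str] = None


@dataclass
class Feature_CDS(Sequence_Feature):  # assumed subconcept of Sequence_Feature
    translation: Optional[str] = None
    product: Optional[str] = None
    protein_id: Optional[str] = None


@dataclass
class Sequence_Data:
    description: Optional[str] = None
    accession_id: Optional[str] = None
    sequence: Optional[str] = None
    has_feature: List[Sequence_Feature] = field(default_factory=list)  # object property
    has_reference: List[Reference] = field(default_factory=list)      # object property


@dataclass
class BSML_Sequence_Data(Sequence_Data):  # assumed subconcept of Sequence_Data
    date_created: Optional[str] = None
    date_last_updated: Optional[str] = None


@dataclass
class DDBJ_Sequence_Data(Sequence_Data):  # assumed subconcept of Sequence_Data
    version: Optional[str] = None
    division: Optional[str] = None


# Sample values from the AB000059 record used in these slides.
record = DDBJ_Sequence_Data(
    accession_id="AB000059",
    has_feature=[Feature_Source(location=Sequence_Location(1, 1755), isolate="Som1")],
)
```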
Transformation of XML to OWL
<Sequence ic-acckey="AB000059">
  <Feature-table>
    <Feature class="SOURCE">
      <Qualifier value-type="isolate" value="Som1"/>
      <Qualifier value-type="organism" value="Feline …"/>
      <Interval-loc startpos="1" endpos="1755"/>
    </Feature>
  </Feature-table>
</Sequence>
[Figure: the resulting OWL instance. A Sequence Data instance with Accession_ID AB000059 is linked by has-Feature to a Feature Source instance (isolate Som1, organism Feline …), which is linked by has-location to a Location instance (start 1, end 1755)]
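The shape of the resulting instance can be sketched with plain (subject, property, value) tuples standing in for the RDF triples a Jena model would hold. The extraction logic is hard-coded here only to show the target structure; in the approach described in these slides, a mapping would supply it. The blank-node names such as `_:feature0` and the use of the ontology's property names (has_feature, location) are assumptions.

```python
import xml.etree.ElementTree as ET

BSML = """<Sequence ic-acckey="AB000059">
  <Feature-table>
    <Feature class="SOURCE">
      <Qualifier value-type="isolate" value="Som1"/>
      <Qualifier value-type="organism" value="Feline …"/>
      <Interval-loc startpos="1" endpos="1755"/>
    </Feature>
  </Feature-table>
</Sequence>"""


def bsml_to_triples(xml_text: str) -> list:
    """Lift the XML record into (subject, property, value) tuples."""
    root = ET.fromstring(xml_text)
    seq = "_:seq"  # blank-node style identifiers, assumed for illustration
    triples = [(seq, "rdf:type", "Sequence_Data"),
               (seq, "accession_id", root.get("ic-acckey"))]
    for i, feature in enumerate(root.iter("Feature")):
        if feature.get("class") != "SOURCE":
            continue  # this sketch only handles source features
        f = f"_:feature{i}"
        triples += [(seq, "has_feature", f), (f, "rdf:type", "Feature_Source")]
        for q in feature.findall("Qualifier"):
            triples.append((f, q.get("value-type"), q.get("value")))
        interval = feature.find("Interval-loc")
        if interval is not None:
            loc = f"_:location{i}"
            triples += [(f, "location", loc),
                        (loc, "rdf:type", "Sequence_Location"),
                        (loc, "start", int(interval.get("startpos"))),
                        (loc, "end", int(interval.get("endpos")))]
    return triples


triples = bsml_to_triples(BSML)
```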
Transformation of OWL to XML
[Figure: the OWL instance. A Sequence Data instance with Accession_ID AB000059 is linked by has-Feature to a Feature Source instance (isolate Som1, organism Feline …), which is linked by has-location to a Location instance (start 1, end 1755)]
<DDBJXML>
  <ACCESSION>AB000059</ACCESSION>
  <FEATURES>
    <source>
      <location>1..1755</location>
      <qualifiers name="isolate">Som1</qualifiers>
      <qualifiers name="organism">Felis ...</qualifiers>
    </source>
  </FEATURES>
</DDBJXML>
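Going the other way, the sketch below walks a tuple-based instance (an assumed stand-in for a Jena model) and serialises it into the DDBJ-XML layout shown above. It joins start and end into the `start..end` location form. `objects` and `triples_to_ddbj` are hypothetical names. Only source features with isolate and organism qualifiers are handled.

```python
import xml.etree.ElementTree as ET

TRIPLES = [
    ("_:seq", "rdf:type", "Sequence_Data"),
    ("_:seq", "accession_id", "AB000059"),
    ("_:seq", "has_feature", "_:f0"),
    ("_:f0", "rdf:type", "Feature_Source"),
    ("_:f0", "isolate", "Som1"),
    ("_:f0", "organism", "Feline …"),
    ("_:f0", "location", "_:loc0"),
    ("_:loc0", "start", 1),
    ("_:loc0", "end", 1755),
]


def objects(triples, subject, predicate):
    """All values of `predicate` on `subject`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def triples_to_ddbj(triples, seq="_:seq") -> str:
    root = ET.Element("DDBJXML")
    ET.SubElement(root, "ACCESSION").text = objects(triples, seq, "accession_id")[0]
    features = ET.SubElement(root, "FEATURES")
    for f in objects(triples, seq, "has_feature"):
        source = ET.SubElement(features, "source")
        for loc in objects(triples, f, "location"):
            start = objects(triples, loc, "start")[0]
            end = objects(triples, loc, "end")[0]
            ET.SubElement(source, "location").text = f"{start}..{end}"
        for name in ("isolate", "organism"):
            for value in objects(triples, f, name):
                # The keyword argument becomes the XML attribute name="...".
                ET.SubElement(source, "qualifiers", name=name).text = value
    return ET.tostring(root, encoding="unicode")


ddbj_xml = triples_to_ddbj(TRIPLES)
```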
Mapping Language
We present a simple mapping language to describe the transformation of data between XML and OWL
Mappings are bi-directional: XML to OWL and OWL to XML
A hierarchical view is taken of both data models
A template is applied to match the source model and create the destination model
Data values are mapped from source to destination via variable assignment
Example Mapping 1
Mapping an attribute value
XML: <Sequence ic-acckey="AB000059">
OWL: a Sequence Data instance whose Accession_ID is AB000059
The value is carried by the variable $accession:
{xml}Sequence[ic-acckey = $accession]<->{owl}Sequence_Data( Accession_id($accession))
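Read in the XML-to-OWL direction, Example Mapping 1 works in three steps. First, check that the element is a `Sequence` carrying an `ic-acckey` attribute. Next, bind that attribute to `$accession`. Finally, emit a Sequence_Data instance whose Accession_id is the bound value. In this minimal Python sketch, a dict stands in for the OWL instance, and `map_accession` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def map_accession(xml_text: str) -> dict:
    """{xml}Sequence[ic-acckey = $accession] -> {owl}Sequence_Data(Accession_id($accession))"""
    elem = ET.fromstring(xml_text)
    if elem.tag != "Sequence" or "ic-acckey" not in elem.attrib:
        raise ValueError("source template does not match")
    accession = elem.get("ic-acckey")                           # bind $accession
    return {"type": "Sequence_Data", "Accession_id": accession}  # fill destination


instance = map_accession('<Sequence ic-acckey="AB000059"/>')
```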
Example Mapping 2
Mapping an element value
XML: <Sequence> <seq-data>aatagagtg…</seq-data></Sequence>
OWL: a Sequence Data instance whose sequence is aatagagtg…
The value is carried by the variable $sequence:
{xml}Sequence( seq-data($sequence))<->{owl}Sequence_Data( sequence($sequence))
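Example Mapping 2 can also be read in the OWL-to-XML direction. The mapping binds `$sequence` from the instance's sequence property, then creates a `seq-data` element under `Sequence` carrying that value. The dict input and the function name are illustrative assumptions. The sample value is only the visible prefix of the slide's sequence.

```python
import xml.etree.ElementTree as ET


def map_sequence_to_xml(instance: dict) -> str:
    """{owl}Sequence_Data(sequence($sequence)) -> {xml}Sequence(seq-data($sequence))"""
    sequence = instance["sequence"]                    # bind $sequence from the OWL property
    root = ET.Element("Sequence")
    ET.SubElement(root, "seq-data").text = sequence   # fill the element value
    return ET.tostring(root, encoding="unicode")


xml_out = map_sequence_to_xml({"type": "Sequence_Data", "sequence": "aatagagtg"})
```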
Example Mapping 3
Mapping a group of element values
XML: <Reference> <RefAuthors>Horiuchi M.</RefAuthors> <RefTitle>evolutionary…</RefTitle> <RefJournal>Unpublished</RefJournal></Reference>
OWL: a Reference instance with author "Horiuchi, M.", title "Evolutionary…" and journal "Unpublished"
{xml}Reference( RefAuthors($author), RefTitle($title), RefJournal($journal))<->{owl}Reference( author($author), title($title), journal($journal))
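Because one template pair drives both directions, Example Mapping 3 can be sketched as a shared table of (XML element, OWL property) pairs used by two functions. Converting to OWL and back reproduces the original element. Dicts stand in for OWL instances. `FIELDS`, `reference_to_owl`, and `reference_to_xml` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# (XML child element, OWL property) pairs, taken from the mapping above.
FIELDS = [("RefAuthors", "author"), ("RefTitle", "title"), ("RefJournal", "journal")]


def reference_to_owl(xml_text: str) -> dict:
    elem = ET.fromstring(xml_text)
    return {"type": "Reference", **{prop: elem.findtext(tag) for tag, prop in FIELDS}}


def reference_to_xml(instance: dict) -> str:
    root = ET.Element("Reference")
    for tag, prop in FIELDS:
        ET.SubElement(root, tag).text = instance[prop]
    return ET.tostring(root, encoding="unicode")


src = ("<Reference><RefAuthors>Horiuchi M.</RefAuthors>"
       "<RefTitle>evolutionary…</RefTitle><RefJournal>Unpublished</RefJournal></Reference>")
ref = reference_to_owl(src)
round_trip = reference_to_xml(ref)  # the same variable bindings work in both directions
```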
Example Mapping 4
Split and join
XML: <location>1:1755</location>
OWL: a Sequence_Location instance with start 1 and end 1755
{xml}location( split($start, ":", $end))<->{owl}Sequence_Location( start($start), end($end))
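The split and join operator can be sketched as a pair of inverse string operations. Splitting on the separator binds `$start` and `$end`, and joining rebuilds the text for the reverse direction. The function names are hypothetical. Values stay strings, as they are in the XML text.

```python
def split_location(text: str, sep: str = ":") -> dict:
    """{xml}location(split($start, ":", $end)) -> {owl}Sequence_Location(start, end).
    Raises ValueError if the separator is absent."""
    start, end = text.split(sep, 1)  # bind $start and $end
    return {"type": "Sequence_Location", "start": start, "end": end}


def join_location(instance: dict, sep: str = ":") -> str:
    """Reverse direction: join the two property values with the separator."""
    return f"{instance['start']}{sep}{instance['end']}"


loc = split_location("1:1755")
```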
Example Mapping 5
Mapping a sequence of elements
The <feature-table> element contains a sequence of <Reference> elements:
<feature-table> <Reference> <RefAuthors>Horiuchi M.</RefAuthors> <RefTitle>evolutionary…</RefTitle> <RefJournal>Unpublished</RefJournal> </Reference> <Reference> <RefAuthors>Horiuchi M.</RefAuthors> <RefJournal>EMBL/GenBank/DDBJ…</RefJournal> </Reference></feature-table>
Each <Reference> element corresponds to an instance of the ‘Reference’ concept:
[Figure: a Sequence_Data instance linked by has-reference to two Reference instances: one with author "Horiuchi, M.", title "Evolutionary…" and journal "Unpublished"; one with author "Horiuchi, M." and journal "EMBL/GenBank/DDBJ…"]
{xml}feature-table( Reference( RefAuthors($author), RefTitle($title), RefJournal($journal) )…)<->{owl}Sequence_Data( Reference( author($author), title($title), journal($journal) )…)
The ellipsis (…) construct denotes this behaviour: the enclosed template is applied once for each element in the sequence
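The ellipsis behaves like a loop over repeated elements. The sketch below applies the Reference template once per `<Reference>` child and collects the resulting instances under has_reference. The second reference has no `<RefTitle>`, so its title property is skipped. That handling is an assumption matching the diagram, not a documented rule of the language. Dicts stand in for OWL instances, and the names are hypothetical.

```python
import xml.etree.ElementTree as ET

FIELDS = [("RefAuthors", "author"), ("RefTitle", "title"), ("RefJournal", "journal")]

SAMPLE = """<feature-table>
  <Reference>
    <RefAuthors>Horiuchi M.</RefAuthors>
    <RefTitle>evolutionary…</RefTitle>
    <RefJournal>Unpublished</RefJournal>
  </Reference>
  <Reference>
    <RefAuthors>Horiuchi M.</RefAuthors>
    <RefJournal>EMBL/GenBank/DDBJ…</RefJournal>
  </Reference>
</feature-table>"""


def feature_table_to_owl(xml_text: str) -> dict:
    table = ET.fromstring(xml_text)
    references = []
    for ref in table.findall("Reference"):  # the ellipsis: repeat the template per element
        instance = {"type": "Reference"}
        for tag, prop in FIELDS:
            value = ref.findtext(tag)
            if value is not None:  # skip properties whose source element is absent
                instance[prop] = value
        references.append(instance)
    return {"type": "Sequence_Data", "has_reference": references}


sequence_data = feature_table_to_owl(SAMPLE)
```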
Mapping Language (BNF)
Mapping Language Engine
A Java component built on Dom4J and Jena
Input: a mapping and a data set (XML or OWL)
Output: a data set (XML or OWL)
Transformation Process
Four stages:
1. Create a data model for the input (Dom4J for XML, Jena for OWL)
2. Parse the mapping
3. Apply the source mapping to bind variables
4. Apply the destination mapping and create a new instance, filling variables with their corresponding values from the source
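The four stages can be sketched in Python under some simplifying assumptions. ElementTree replaces Dom4J. The input model is a nested dict: attributes are keyed with `@`, leaf elements are reduced to their text, and repeated child tags are not supported. Stage 2 is skipped: the source and destination templates are written directly as dicts with `$` variables, rather than parsed from the textual mapping syntax. All names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, Optional


def xml_to_tree(elem: ET.Element) -> Dict[str, Any]:
    """Stage 1: hierarchical model of an XML element (stand-in for Dom4J)."""
    node: Dict[str, Any] = {f"@{k}": v for k, v in elem.attrib.items()}
    for child in elem:
        is_leaf = len(child) == 0 and not child.attrib
        node[child.tag] = (child.text or "") if is_leaf else xml_to_tree(child)
    return node


def bind(template: Any, data: Any, env: Dict[str, Any]) -> bool:
    """Stage 3: match data against the source template, recording $variables in env."""
    if isinstance(template, str) and template.startswith("$"):
        env[template] = data
        return True
    if isinstance(template, dict):
        return isinstance(data, dict) and all(
            key in data and bind(sub, data[key], env) for key, sub in template.items())
    return template == data  # constants must match exactly


def fill(template: Any, env: Dict[str, Any]) -> Any:
    """Stage 4: build the destination structure, substituting bound variables."""
    if isinstance(template, str) and template.startswith("$"):
        return env[template]
    if isinstance(template, dict):
        return {key: fill(sub, env) for key, sub in template.items()}
    return template


def transform(xml_text: str, source_tpl: dict, dest_tpl: dict) -> Optional[dict]:
    root = ET.fromstring(xml_text)
    data = {root.tag: xml_to_tree(root)}      # stage 1
    env: Dict[str, Any] = {}                  # stage 2 skipped: templates given as dicts
    if not bind(source_tpl, data, env):       # stage 3
        return None                           # the mapping does not apply
    return fill(dest_tpl, env)                # stage 4


SOURCE = {"Sequence": {"@ic-acckey": "$accession", "seq-data": "$sequence"}}
DEST = {"Sequence_Data": {"accession_id": "$accession", "sequence": "$sequence"}}
RECORD = '<Sequence ic-acckey="AB000059"><seq-data>aatagagtg</seq-data></Sequence>'
result = transform(RECORD, SOURCE, DEST)
```

`bind` and `fill` only walk templates. Calling `bind` with the OWL-side template and `fill` with the XML-side template would run the same mapping in reverse.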
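Assisted data mediation for WS
[Figure: XEMBLService (input AccessionNumber) outputs a BSMLSequenceRecord, which a Mapping Language Engine, driven by a mapping, transforms into an OWL instance; a second Mapping Language Engine, driven by its own mapping, serialises the OWL instance into the FASTA Formatted Sequence input of NCBI BlastService, which outputs AlignmentResults]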
Comparison of Approaches
Number of mappings
  Direct mappings between compatible data formats: 1 mapping for each pair of compatible data formats; for n compatible formats, n(n-1)/2 mappings (on the order of n²) are required
  Mappings from each data format to a common conceptual model: 1 mapping from each data format to its conceptual model; for n compatible data formats, n mappings are required
Addition of a new data format
  Direct mappings: must create mappings to all other compatible formats
  Common conceptual model: create one mapping to the conceptual model
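The mapping counts in this comparison can be checked with a little arithmetic. Assume each direct mapping is bidirectional and covers one unordered pair of formats. Then n formats need n(n-1)/2 direct mappings, which grows with n², against n mappings to the common model. Adding one more format costs n new direct mappings but only one new conceptual mapping. The function names are illustrative.

```python
def direct_mappings(n: int) -> int:
    """Bidirectional mappings needed to connect every pair of n formats directly."""
    return n * (n - 1) // 2


def common_model_mappings(n: int) -> int:
    """Bidirectional mappings needed when each format maps to one conceptual model."""
    return n


for n in (2, 5, 10, 20):
    print(f"n={n:2d}  direct={direct_mappings(n):3d}  common model={common_model_mappings(n):2d}")
```

For very small n the direct approach is no worse: with three formats, both need three mappings. The common model pays off as n grows.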
Why OWL?
Expressive power: complex concept specifications
Reasoning power: subsumption and concept classification
Classification of data: given an XML document and some mappings, it would be possible to find which mappings are valid, and therefore what the possible conceptual models for the given document are
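The "which mappings are valid" check can be sketched by summarising each mapping's source template as a required root element, child paths, and attributes. The classifier then keeps the concepts whose requirements the document satisfies. The `MAPPINGS` table, its concept keys, and the requirement format are hypothetical. The OWL reasoning step is not shown; it could, for example, infer via subsumption that a BSML_Sequence_Data match is also a Sequence_Data.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical summary of each mapping's source template:
# (root element, required child paths, required attributes).
Requirement = Tuple[str, List[str], List[str]]

MAPPINGS: Dict[str, Requirement] = {
    "BSML_Sequence_Data": ("Sequence", ["seq-data"], ["ic-acckey"]),
    "DDBJ_Sequence_Data": ("DDBJXML", ["ACCESSION", "FEATURES"], []),
    "Reference": ("Reference", ["RefAuthors", "RefJournal"], []),
}


def is_valid(root: ET.Element, req: Requirement) -> bool:
    tag, paths, attrs = req
    return (root.tag == tag
            and all(root.find(p) is not None for p in paths)
            and all(a in root.attrib for a in attrs))


def classify(xml_text: str, mappings: Dict[str, Requirement] = MAPPINGS) -> List[str]:
    """Concepts whose mappings match the document, i.e. its candidate conceptual models."""
    root = ET.fromstring(xml_text)
    return [concept for concept, req in mappings.items() if is_valid(root, req)]


candidates = classify("<DDBJXML><ACCESSION>AB000059</ACCESSION><FEATURES/></DDBJXML>")
```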
Conclusions
By using mappings from XML documents to OWL instances, we can automatically transform a portion of data from one representation to another, provided both share a common conceptual model
This can be used to rectify data incompatibilities that occur in workflows where data flows between services (assisted mediation)
Further Work
Add regular expression support, which is more powerful than our existing split and join operators
Mapping Repository
The current implementation assumes that mappings are known
We cannot assume that end users are able to write mappings
Ideally, a repository of mappings would be available to users, allowing them to find the appropriate mapping
Domain experts could create mappings and upload them to the repository
This would allow us to support further automation, such as suggesting mappings to the user
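A single regular expression can show what regex support could add beyond split and join. One pattern accepts both the `1:1755` form from Example Mapping 4 and the `1..1755` form in the DDBJ-XML output. This is a hypothetical Python sketch using the standard `re` module, not a feature of the existing mapping language. Real sequence feature locations can also take forms such as `complement(...)` or `join(...)`, which this pattern rejects.

```python
import re

# Hypothetical pattern: start and end positions separated by ":" or "..".
LOCATION = re.compile(r"^\s*(?P<start>\d+)\s*(?::|\.\.)\s*(?P<end>\d+)\s*$")


def parse_location(text: str) -> dict:
    """Bind start and end from a location string; raise ValueError if it does not match."""
    m = LOCATION.match(text)
    if m is None:
        raise ValueError(f"unrecognised location: {text!r}")
    return {"type": "Sequence_Location", "start": int(m["start"]), "end": int(m["end"])}


colon_form = parse_location("1:1755")
ddbj_form = parse_location("1..1755")
```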
Questions and comments?