www.neresc.ac.uk ogsa-dqp - a service-based distributed query processor for the grid arijit...
www.neresc.ac.uk
OGSA-DQP - A Service-Based Distributed Query Processor for The Grid
Arijit Mukherjee, University of Newcastle
Motivation behind OGSA-DQP
1) High-level data access and integration services are needed if data-intensive distributed applications running on heterogeneous platforms are to benefit from the Grid.
2) Emerging standards for data access: OGSA-DAI supports the exposure of data resources on Grids.
3) DQP is an approach to delivering (1), given the availability of (2).
Service-based in what sense?
OGSA-DQP is service-based in two orthogonal senses:
It supports querying over data storage and analysis resources that are exposed as services, hence resource virtualisation via SOA.
The construction of distributed query plans and their execution over the Grid are themselves factored out as services.
OGSA-DQP Goals
To benefit from homogeneous access to heterogeneous data sources [OGSA-DAI].
To benefit from Grid abstractions for on-demand allocation of resources required for a task [Condor, OMII, GT*].
To provide transparent, implicit support for parallelism and distribution. [Polar*]
To orchestrate the composition of data retrieval and analysis services.
To expose this orchestration capability as a Grid data service.
OGSA-DQP Approach
OGSA-DQP uses a middleware approach.
It can be seen as a mediator over OGSA-DAI wrappers.
It promises bottom lines regarding:
efficiency: “leave it to us to schedule in parallel”;
effectiveness: “leave it to us to orchestrate your services”;
usability: “use it as a Grid data service”.
[Figure: OGSA-DQP takes a query and returns results, mediating over two DBMS data sources, each wrapped by OGSA-DAI.]
OGSA-DQP Innovations
OGSA-DQP dynamically allocates evaluators to do work on behalf of the mediator.
This allows for runtime circumstances to be taken into account when the optimiser decides how to partition and schedule.
OGSA-DQP uses a parallel physical algebra: most mediator-based query processors do not.
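To make the idea of load-aware allocation concrete, here is a minimal sketch of one way a scheduler could assign plan partitions to evaluators using runtime load figures. It is not the OGSA-DQP scheduler: the greedy policy, the `allocate` function, the evaluator names and the unit cost per partition are all assumptions made for illustration.

```python
from typing import Dict, List


def allocate(partitions: List[str], load: Dict[str, float]) -> Dict[str, str]:
    """Greedily assign each partition to the currently least-loaded evaluator.

    `load` maps a (hypothetical) evaluator name to its observed load.
    Ties are broken by evaluator name so the result is deterministic.
    """
    current = dict(load)  # copy, so the caller's measurements are untouched
    assignment: Dict[str, str] = {}
    for partition in partitions:
        target = min(current, key=lambda name: (current[name], name))
        assignment[partition] = target
        current[target] += 1.0  # assumed: every partition costs one load unit
    return assignment


if __name__ == "__main__":
    print(allocate(["p1", "p2", "p3"], {"qes-a": 0.5, "qes-b": 0.0}))
```

Because the load figures are read when the query is scheduled, the same query can be placed differently on different runs, which is the point of deferring allocation to run time.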
OGSA-DQP Architecture
Extends OGSA-DAI with two new services:
Grid Distributed Query Service (GDQS):
Exposed to the client
Finds and retrieves service descriptions
Parses, compiles, optimizes and schedules the query execution plans over a union of distributed data resources
Query Evaluation Service (QES):
Not exposed to the client
Implements the physical query algebra
Implements the query execution model and semantics
Evaluates a partition of the query execution plan generated by the GDQS
Interacts with other QESs/GDSs/Web Services
Example Query Plan
“select p.proteinId, Blast(p.sequence) from p in protein, t in proteinTerm
where t.termId=‘GO:0008372’ and p.proteinId=t.proteinId”
[Figure: three query plans for the query above, listed root first.]
(a) Single-node logical plan:
select(p.proteinId, blast)
operation_call(blast(p.sequence))
join(p.proteinId=t.proteinId)
join inputs: select(proteinId) over scan(proteinTerms) (termId=ABCD); select(proteinId, sequence) over scan(proteins)
(b) Single-node physical plan: as (a), with join replaced by hash_join and scan by table_scan.
(c) Partitioned plan: as (b), with an exchange operator between operation_call and hash_join and one exchange above each hash_join input. The figure annotates the partitions with the numbers 4, 5; 3, 6; 2, 3; and 6.
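The single-node physical plan (b) can be sketched as a chain of Python generators, one per operator. This is an illustrative sketch only, not OGSA-DQP code: the sample rows, the `blast_stub` function standing in for the Blast analysis service, and all helper names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, str]

# Hypothetical sample data; in OGSA-DQP the tables sit behind OGSA-DAI services.
PROTEINS: List[Row] = [
    {"proteinId": "P1", "sequence": "MKT"},
    {"proteinId": "P2", "sequence": "GAV"},
]
PROTEIN_TERMS: List[Row] = [
    {"proteinId": "P1", "termId": "GO:0008372"},
    {"proteinId": "P2", "termId": "GO:0000001"},
]


def table_scan(table: Iterable[Row],
               predicate: Callable[[Row], bool] = lambda r: True) -> Iterator[Row]:
    """Scan a table, keeping only the rows that satisfy the predicate."""
    for row in table:
        if predicate(row):
            yield row


def project(rows: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    """The select(...) nodes in the plan: keep only the named columns."""
    for row in rows:
        yield {c: row[c] for c in columns}


def hash_join(build: Iterable[Row], probe: Iterable[Row], key: str) -> Iterator[Row]:
    """Build a hash table on one input, then stream the other input past it."""
    table: Dict[str, List[Row]] = {}
    for row in build:
        table.setdefault(row[key], []).append(row)
    for row in probe:
        for match in table.get(row[key], []):
            yield {**match, **row}


def operation_call(rows: Iterable[Row], fn: Callable[[str], str],
                   arg: str, out: str) -> Iterator[Row]:
    """Call an analysis operation once per row and attach its result."""
    for row in rows:
        yield {**row, out: fn(row[arg])}


def blast_stub(sequence: str) -> str:
    """Stand-in for the Blast service; returns a fake result string."""
    return f"hits-for-{sequence}"


def run_plan() -> List[Row]:
    terms = project(
        table_scan(PROTEIN_TERMS, lambda r: r["termId"] == "GO:0008372"),
        ["proteinId"],
    )
    proteins = project(table_scan(PROTEINS), ["proteinId", "sequence"])
    joined = hash_join(terms, proteins, "proteinId")
    called = operation_call(joined, blast_stub, "sequence", "blast")
    return list(project(called, ["proteinId", "blast"]))


if __name__ == "__main__":
    print(run_plan())
```

With the sample data, only protein P1 carries the requested term, so the plan returns a single row holding its identifier and the stubbed Blast result.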
Another Example
OGSA-DQP: Query Evaluation
Query installation stage:
As many QE services are utilised as there are partitions specified.
Each partition is sent to the QE service it is scheduled for.
Query evaluation stage:
Each QES evaluates its partition using an iterator model.
Queries execute under pipelined and partitioned parallelism.
Results are conveyed to the client.
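The iterator model and partitioned parallelism can be illustrated with a small sketch. The operator classes expose the classic open/next/close interface, and `exchange_partition` routes rows to partitions by hashing a key. All class and function names here are hypothetical, the CRC32 hash is an arbitrary choice, and the partitions are drained one after another rather than on separate machines, so this shows the data flow and not real concurrency.

```python
import zlib
from typing import Callable, Dict, Iterator, List, Optional

Row = Dict[str, str]


class Scan:
    """Leaf operator: hands out the rows of an in-memory list one at a time."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self._it: Optional[Iterator[Row]] = None

    def open(self) -> None:
        self._it = iter(self._rows)

    def next(self) -> Optional[Row]:
        # Returns None once the input is exhausted.
        return next(self._it, None)

    def close(self) -> None:
        self._it = None


class Filter:
    """Pulls rows from its child on demand and passes on the matching ones."""

    def __init__(self, child: Scan, predicate: Callable[[Row], bool]) -> None:
        self._child = child
        self._predicate = predicate

    def open(self) -> None:
        self._child.open()

    def next(self) -> Optional[Row]:
        while (row := self._child.next()) is not None:
            if self._predicate(row):
                return row
        return None

    def close(self) -> None:
        self._child.close()


def drain(op) -> List[Row]:
    """Run an operator tree to completion, as a consumer at the plan root would."""
    op.open()
    out: List[Row] = []
    try:
        while (row := op.next()) is not None:
            out.append(row)
    finally:
        op.close()
    return out


def exchange_partition(rows: List[Row], key: str, n: int) -> List[List[Row]]:
    """Route each row to one of n partitions by a deterministic hash of its key."""
    buckets: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        buckets[zlib.crc32(row[key].encode()) % n].append(row)
    return buckets
```

Each bucket would be handed to a separate evaluator running the same operator subtree; the union of the per-partition outputs equals the output of the unpartitioned plan.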
OGSA-DQP Execution Flow
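[Figure: execution flow. Components shown: Client, GDQS, GDS-1, GDS-2, Web Service and three QES instances. Labels shown: resource list, query, schema, wsdl, results. Inside the GDQS, the Polar* Query Optimizer Engine comprises the OQL Parser, Logical Optimizer, Physical Optimizer, Partitioner and Scheduler. Partition 1, Partition 2 and Partition 3 are each attached to a QES.]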
What we provide
Resource virtualisation through a service-oriented architecture:
Data Resource Discovery using service registries;
Computational Resource Discovery via Index Services (not implemented yet);
Reliance on GDSs for metadata and data access
Coarse-grained services with document-oriented interfaces
By acquiring and manipulating data in a data-flow architecture that is constructed dynamically, OGSA-DQP constructs, on-the-fly, a lightweight Distributed Query Processing Engine.
Timeline
Release 1.0 in September 2003
Improved Release 2.0 in July 2004 (based on GT3.2 and OGSA-DAI 4.0)
Around 700 downloads
New Release 3.0 - coming soon!
What’s new:
Based on OGSA-DAI R7.0
GDQS closer to OGSA-DAI
GQES refactored as QES (WS-I)
Working on…
More friendly (!) - Use SQL
More portable - Support Cygwin, Solaris for the compiler/optimiser
Cygwin - DONE
Better performance - we are working on it
Some bottlenecks removed
More functional - Semi-structured data; Streams.
Working on incorporating XML DBs
More dynamic - Use Index Services; dynamically install services
QES is DynaSOAr-READY
More application test-beds - Sensor networks.
More adaptive - Queries may be long running and the environment is constantly changing, so static optimisation is likely to become stale fast. Monitor, assess and respond (e.g., switch operators/algorithms, spawn more copies, relocate).
Ongoing
More widely deployable - As OGSA-DAI
Introduce Virtual Machines
Looking into it.
Where to find out more
Papers -
M N Alpdemir, A Mukherjee, A Gounaris, A A A Fernandes, N W Paton, P Watson, J Smith. Service Based Distributed Querying on the Grid. 1st International Conference on Service Oriented Computing, 2003, LNCS 2910.
M N Alpdemir, A Mukherjee, A Gounaris, A A A Fernandes, N W Paton, P Watson, J Smith. OGSA-DQP: A Service for Distributed Querying on the Grid. Proceedings of Advances in Database Technology - EDBT 2004, LNCS 2992.
M N Alpdemir, A Mukherjee, A Gounaris, A A A Fernandes, N W Paton, P Watson, J Smith. An Experience Report on Designing and Building OGSA-DQP: A Service Based Distributed Query Processor for the Grid. GGF9 Workshop on Designing and Building Grid Services, 2003.
M N Alpdemir, A Mukherjee, A Gounaris, N W Paton, P Watson, A A A Fernandes, J Smith. OGSA-DQP: A Service-Based Distributed Query Processor for the Grid. 2nd UK e-Science All Hands Meeting, 2003.
J Smith, A Gounaris, P Watson, N W Paton, A A A Fernandes, R Sakellariou. Distributed Query Processing on the Grid. GRID 2002, LNCS 2536.
(papers available at http://www.cs.ncl.ac.uk/research/pubs/authors/byType.php?id=110 )
Software - http://www.ogsadai.org.uk/dqp
People and Partners
Prof. Paul Watson
Dr. Jim Smith
Arijit Mukherjee
Prof. Norman Paton
Dr. Alvaro A A Fernandes
Dr. Rizos Sakellariou
Anastasios Gounaris
Steven Lynden
Thank You?