www.neresc.ac.uk

OGSA-DQP - A Service-Based Distributed Query Processor for The Grid


Arijit Mukherjee, University of Newcastle

Motivation behind OGSA-DQP

1) High-level Data Access and Integration services are needed if data-intensive distributed applications running on heterogeneous platforms are to benefit from the Grid.

2) Emerging standards for data access: OGSA-DAI supports the exposure of data resources on Grids.

3) DQP is an approach to delivering (1), given the availability of (2).

Service-based in what sense?

OGSA-DQP is service-based in two orthogonal senses:

It supports querying over data storage and analysis resources that are exposed as services.

Hence resource virtualisation via SOA.

The construction of distributed query plans and their execution over the Grid are themselves factored out as services.

OGSA-DQP Goals

To benefit from homogeneous access to heterogeneous data sources [OGSA-DAI].

To benefit from Grid abstractions for on-demand allocation of resources required for a task [Condor, OMII, GT*].

To provide transparent, implicit support for parallelism and distribution [Polar*].

To orchestrate the composition of data retrieval and analysis services.

To expose this orchestration capability as a Grid data service.

OGSA-DQP Approach

OGSA-DQP uses a middleware approach.

It can be seen as a mediator over OGSA-DAI wrappers.

It makes three bottom-line promises:

efficiency: “leave it to us to schedule in parallel”;

effectiveness: “leave it to us to orchestrate your services”;

usability: “use it as a Grid data service”.

[Figure: a query goes in to OGSA-DQP and results come back; OGSA-DQP sits above two OGSA-DAI wrappers, each in front of a DBMS and its data.]

OGSA-DQP Innovations

OGSA-DQP dynamically allocates evaluators to do work on behalf of the mediator.

This allows for runtime circumstances to be taken into account when the optimiser decides how to partition and schedule.

OGSA-DQP uses a parallel physical algebra: most mediator-based query processors do not.
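As an illustration of what a parallel physical algebra adds, the sketch below shows a simple exchange-style operator that routes tuples to consumers by hashing the join key. It is a minimal sketch, not OGSA-DQP's implementation: the function names, the use of crc32 for hashing and the sample data are all assumptions made for the example.

```python
import zlib
from typing import Callable, Dict, Iterable, List


def bucket_of(key_value: object, n_consumers: int) -> int:
    """Deterministic bucket index for a key (crc32 is stable across runs)."""
    return zlib.crc32(repr(key_value).encode("utf-8")) % n_consumers


def exchange(tuples: Iterable[dict], key: Callable[[dict], object],
             n_consumers: int) -> Dict[int, List[dict]]:
    """Route each tuple to exactly one consumer bucket, chosen by its key.

    Tuples with equal keys always land in the same bucket, so each consumer
    can join its share of the data independently of the others.
    """
    buckets: Dict[int, List[dict]] = {i: [] for i in range(n_consumers)}
    for t in tuples:
        buckets[bucket_of(key(t), n_consumers)].append(t)
    return buckets


# Hypothetical sample data, loosely modelled on the protein example query.
proteins = [{"proteinId": f"P{i}", "sequence": "MKV" * (i + 1)} for i in range(6)]
protein_terms = [{"proteinId": f"P{i}", "termId": "GO:0008372"} for i in (1, 3, 4)]

left = exchange(proteins, lambda t: t["proteinId"], n_consumers=2)
right = exchange(protein_terms, lambda t: t["proteinId"], n_consumers=2)

# Each consumer joins only its own pair of buckets.
matched = set()
for i in range(2):
    ids_left = {t["proteinId"] for t in left[i]}
    ids_right = {t["proteinId"] for t in right[i]}
    matched |= ids_left & ids_right
    print(i, sorted(ids_left & ids_right))
```

Because both inputs are partitioned with the same function, matching tuples meet in the same bucket, which is what lets several evaluators work on one join at once.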

OGSA-DQP Architecture

Extends OGSA-DAI with two new services:

Grid Distributed Query Service (GDQS)
  Exposed to the client
  Finds and retrieves service descriptions
  Parses, compiles, optimises and schedules query execution plans over a union of distributed data resources

Query Evaluation Service (QES)
  Not exposed to the client
  Implements the physical query algebra
  Implements the query execution model and semantics
  Evaluates a partition of the query execution plan generated by the GDQS
  Interacts with other QESs/GDSs/Web Services

Example Query Plan

“select p.proteinId, Blast(p.sequence) from p in protein, t in proteinTerm where t.termId=‘GO:0008372’ and p.proteinId=t.proteinId”

(a) Single-node logical plan

select(p.proteinId, blast)
  operation_call(blast(p.sequence))
    join(p.proteinId=t.proteinId)
      select(proteinId, sequence)
        scan(proteins)
      select(proteinId)
        scan(proteinTerms)(termId=ABCD)

(b) Single-node physical plan

select(p.proteinId, blast)
  operation_call(blast(p.sequence))
    hash_join(p.proteinId=t.proteinId)
      select(proteinId, sequence)
        table_scan(proteins)
      select(proteinId)
        table_scan(proteinTerms)(termId=ABCD)

(c) Partitioned plan

select(p.proteinId, blast)
  operation_call(blast(p.sequence))
    exchange
      hash_join(p.proteinId=t.proteinId)
        exchange
          select(proteinId, sequence)
            table_scan(proteins)
        exchange
          select(proteinId)
            table_scan(proteinTerms)(termId=ABCD)

Node labels shown against the partitions in the original figure: 4, 5; 3, 6; 2, 3; 6.
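The step from the physical plan to the partitioned plan can be sketched as cutting the operator tree at each exchange. The code below is a minimal illustration under stated assumptions: the `Op` class, the `split_at_exchanges` helper and the exact placement of the three exchanges are hypothetical reconstructions for this example, not the optimiser's actual data structures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    """One operator in a query plan tree (hypothetical representation)."""
    name: str
    children: List["Op"] = field(default_factory=list)


def split_at_exchanges(root: Op) -> List[List[str]]:
    """Group operator names into partitions, cutting the tree at exchanges.

    Everything below an exchange starts a new partition; the exchange
    itself only marks the boundary and is not listed.
    """
    partitions: List[List[str]] = []

    def visit(op: Op, current: List[str]) -> None:
        if op.name == "exchange":
            new_part: List[str] = []
            partitions.append(new_part)
            for child in op.children:
                visit(child, new_part)
        else:
            current.append(op.name)
            for child in op.children:
                visit(child, current)

    top: List[str] = []
    partitions.append(top)
    visit(root, top)
    return partitions


# The partitioned plan, with exchanges above and below the hash join.
plan = Op("select(p.proteinId, blast)", [
    Op("operation_call(blast(p.sequence))", [
        Op("exchange", [
            Op("hash_join(p.proteinId=t.proteinId)", [
                Op("exchange", [
                    Op("select(proteinId, sequence)", [Op("table_scan(proteins)")]),
                ]),
                Op("exchange", [
                    Op("select(proteinId)", [Op("table_scan(proteinTerms)")]),
                ]),
            ]),
        ]),
    ]),
])

partitions = split_at_exchanges(plan)
for number, operators in enumerate(partitions, start=1):
    print(number, operators)
```

Each resulting group is a unit that could be handed to one or more evaluators, which is the role the node labels play in the figure.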

Another Example

OGSA-DQP: Query Evaluation

Query installation stage:
  As many QE services are utilised as there are partitions specified.
  Each partition is sent to the QE service it is scheduled for.

Query evaluation stage:
  Each QES evaluates its partition using an iterator model.
  Queries execute under pipelined and partitioned parallelism.
  Results are conveyed to the client.
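The iterator model mentioned above can be sketched with Python generators, where each operator pulls rows from its input on demand. This is a single-process illustration only, assuming in-memory tables and a stand-in `fake_blast` function; none of these names come from OGSA-DQP, and the plan's select operator is written here as `project` because it picks columns.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, str]


def table_scan(rows: Iterable[Row],
               predicate: Callable[[Row], bool] = lambda r: True) -> Iterator[Row]:
    """Yield the stored rows that satisfy the scan predicate."""
    for row in rows:
        if predicate(row):
            yield row


def project(source: Iterator[Row], columns: List[str]) -> Iterator[Row]:
    """Keep only the named columns of each row."""
    for row in source:
        yield {c: row[c] for c in columns}


def hash_join(build: Iterator[Row], probe: Iterator[Row], key: str) -> Iterator[Row]:
    """Build a hash table on one input, then stream the other through it."""
    table: Dict[str, List[Row]] = {}
    for row in build:
        table.setdefault(row[key], []).append(row)
    for row in probe:
        for match in table.get(row[key], []):
            yield {**match, **row}


def operation_call(source: Iterator[Row], func: Callable[[str], str],
                   arg: str, out: str) -> Iterator[Row]:
    """Call an external operation once per row and attach its result."""
    for row in source:
        yield {**row, out: func(row[arg])}


def fake_blast(sequence: str) -> str:
    """Stand-in for a remote analysis service (assumption for this sketch)."""
    return f"hits-for-{sequence}"


proteins = [{"proteinId": "P1", "sequence": "MKV", "name": "a"},
            {"proteinId": "P2", "sequence": "GAT", "name": "b"}]
protein_terms = [{"proteinId": "P1", "termId": "GO:0008372"},
                 {"proteinId": "P2", "termId": "GO:0000001"}]

terms = project(table_scan(protein_terms, lambda r: r["termId"] == "GO:0008372"),
                ["proteinId"])
prots = project(table_scan(proteins), ["proteinId", "sequence"])
joined = hash_join(build=terms, probe=prots, key="proteinId")
plan = project(operation_call(joined, fake_blast, "sequence", "blast"),
               ["proteinId", "blast"])

results = list(plan)
print(results)
```

Nothing runs until `list(plan)` pulls on the top operator, so rows flow through the probe side of the join, the operation call and the final projection one at a time; that pull-driven flow is the basis for pipelined evaluation.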

OGSA-DQP Execution Flow

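[Figure: OGSA-DQP execution flow. The client passes a resource list and a query to the GDQS. The GDQS gathers schemas from GDS-1 and GDS-2 and WSDL from a Web Service. Within the GDQS, the Polar* query optimiser engine comprises an OQL parser, logical optimiser, physical optimiser, partitioner and scheduler. Partitions 1 to 3 are sent to three QES instances, and results flow back to the client.]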

What we provide

Resource virtualisation through a service-oriented architecture:
  Data Resource Discovery using service registries;
  Computational Resource Discovery via Index Services (not implemented yet);
  Reliance on GDSs for metadata and data access.

Coarse-grained services with document-oriented interfaces.

By acquiring and manipulating data in a dynamically constructed data-flow architecture, OGSA-DQP builds, on the fly, a lightweight distributed query processing engine.

Timeline

Release 1.0 in September 2003

Improved Release 2.0 in July 2004 (based on GT3.2 and OGSA-DAI 4.0)

Around 700 downloads

New Release 3.0 - coming soon!

What’s new:
  Based on OGSA-DAI R7.0
  GDQS closer to OGSA-DAI
  GQES refactored as QES (WS-I)

Working on…

More friendly (!) - Use SQL

More portable - Support Cygwin, Solaris for the compiler/optimiser
  Cygwin - DONE

Better performance - we are working on it
  Some bottlenecks removed

More functional - Semi-structured data; Streams.
  Working on incorporating XML DBs

More dynamic - Use Index Services; dynamically install services
  QES is DynaSOAr-READY

More application test-beds - Sensor networks.

More adaptive - Queries may be long running and the environment is constantly changing, so static optimisation is likely to become stale fast. Monitor, assess and respond (e.g., switch operators/algorithms, spawn more copies, relocate).
  Ongoing

More widely deployable - As OGSA-DAI

Introduce Virtual Machines
  Looking into it.

Where to find out more

Papers:

M N Alpdemir, A Mukherjee, A Gounaris, A A A Fernandes, N W Paton, P Watson, J Smith. Service Based Distributed Querying on the Grid. 1st International Conference on Service Oriented Computing, 2003, LNCS 2910.

M N Alpdemir, A Mukherjee, A Gounaris, A A A Fernandes, N W Paton, P Watson, J Smith. OGSA-DQP: A Service for Distributed Querying on the Grid. In Proceedings of Advances in Database Technology - EDBT 2004, LNCS 2992.

M N Alpdemir, A Mukherjee, A Gounaris, A A A Fernandes, N W Paton, P Watson, J Smith. An Experience Report on Designing and Building OGSA-DQP: A Service Based Distributed Query Processor for the Grid. GGF9 Workshop on Designing and Building Grid Services, 2003.

M N Alpdemir, A Mukherjee, A Gounaris, N W Paton, P Watson, A A A Fernandes, J Smith. OGSA-DQP: A Service-Based Distributed Query Processor for the Grid. 2nd UK e-Science All Hands Meeting, 2003.

J Smith, A Gounaris, P Watson, N W Paton, A A A Fernandes, R Sakellariou. Distributed Query Processing on the Grid. GRID 2002, LNCS 2536.

(Papers available at http://www.cs.ncl.ac.uk/research/pubs/authors/byType.php?id=110)

Software: http://www.ogsadai.org.uk/dqp

People and Partners

Prof. Paul Watson

Dr. Jim Smith

Arijit Mukherjee

Prof. Norman Paton

Dr. Alvaro A A Fernandes

Dr. Rizos Sakellariou

Anastasios Gounaris

Steven Lynden

Thank You?
