data access and integration with ogsa-dai: ogsa-dqp

23
Data access and integration with OGSA-DAI: OGSA-DQP Steven Lynden University of Manchester

Upload: ranae

Post on 31-Jan-2016

52 views

Category:

Documents


0 download

DESCRIPTION

Data access and integration with OGSA-DAI: OGSA-DQP. Steven Lynden University of Manchester. Introduction. OGSA-DQP is a service based distributed query processor It evaluates queries over distributed data sources wrapped by OGSA-DAI It is built using OGSA-DAI extensibility points - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Data access and integration with OGSA-DAI: OGSA-DQP

Data access and integration with OGSA-DAI:

OGSA-DQP

Steven Lynden

University of Manchester

Page 2: Data access and integration with OGSA-DAI: OGSA-DQP

Data access & integration with OGSA-DAI: GGF 172

Introduction

OGSA-DQP is a service based distributed query processor

It evaluates queries over distributed data sources wrapped by OGSA-DAI

It is built using OGSA-DAI extensibility points People involved:

• University of Manchester• Tasos Gounaris, Steven Lynden, Alvaro Fernandes, Rizos Sakellariou,

Norman Paton

• University of Newcastle• Jim Smith, Arijit Mukherjee, Paul Watson

• OGSA-DAI Prototype release 3.0 available from the OGSA-DAI

website• Install on OGSA-DAI WSRF/WS-I 2.1

Page 3: Data access and integration with OGSA-DAI: OGSA-DQP

Data access & integration with OGSA-DAI: GGF 173

OGSA-DQP high-level overview

OGSA-DQP uses a middleware approach.

It can be seen as a mediator over OGSA-DAI wrappers.

Usability: use it as an OGSA-DAI data service.

DQP is capable of planning, scheduling and executing in parallel the distributed queries

Calls to analysis (Web) services can be declared within queries and invoked by DQP.

DBMS

data

OGSA-DQP

Query Results

OGSA-DAI

OGSA-DAI

DBMS

data

Page 4: Data access and integration with OGSA-DAI: OGSA-DQP

Data access & integration with OGSA-DAI: GGF 174

Using OGSA-DQP

All interactions are client-server based Firstly, configure OGSA-DQP by specifying the

data sources and analysis services to be used (administration)

DQP creates a global schema which can then be used to formulate queries

The user may then submit queries Infrastructural requirements:

- OGSA-DAI-wrapped relational databases

- Analysis services (optional)

- Evaluation infrastructure

Page 5: Data access and integration with OGSA-DAI: OGSA-DQP

Data access & integration with OGSA-DAI: GGF 175

OGSA-DQP architecture

OGSA-DAIdata service

perform

EvaluatorQE

EvaluatorQE

EvaluatorQE

The “OGSA-DQP service”, Grid Distributed Query Service (GDQS) AKA “Coordinator”

AKA Grid Query Evaluation Service (GQES)

DQP activities installed

Page 6: Data access and integration with OGSA-DAI: OGSA-DQP

Data access & integration with OGSA-DAI: GGF 176

OGSA-DQP architecture

DQP evaluator services:• Are plain Web services• Implement the QueryEvaluation port type:

•evaluate – the input is a query plan partition which is subsequently executed

•receiveData – allows the evaluator to receive data from other evaluators

OGSA-DAI extensions:• DQP resource – a resource which encapsulates a distributed

query infrastructure: DQP evaluator services, OGSA-DAI data services etc. Implemented as a data resource accessor.

• OQL query statement activity – enables the submission of a query in Object Query Language (OQL)

• DQP factory activity – enables the creation and configuration of DQP resources.

Page 7: Data access and integration with OGSA-DAI: OGSA-DQP

Data access & integration with OGSA-DAI: GGF 177

Example query

Given two DBMSs and one analysis tool (i.e., a Web service):• goTerm : a GO Gene Ontology table in a remote mySQL DB, exposed by an

OGSA-DAI data service• protein : a table in a protein sequence DB, exposed by an OGSA-DAI

data service• Blast (sequence alignment scoring Web service);

We want to obtain alignment scores for a sequence against proteins of a certain kind

The user submits a single query referencing data stored at multiple sites. The author of the query need not be aware of how/where data is stored. Queries are written in Object Query Language (OQL):

select p.proteinId, Blast(p.sequence)from protein p, goTerm twhere t.termId = ‘GO:0005942’ and p.proteinId=t.proteinId

Page 8: Data access and integration with OGSA-DAI: OGSA-DQP

Data access & integration with OGSA-DAI: GGF 178

Background: OQL

Why?• OGSA-DQP is based on a parallel distributed query processor

for object databases (Polar*)• The standard query language of object databases is OQL

Polar* is still used by DQP to parse, optimise and schedule queries

Instead of querying object databases, we are now querying relational databases

OQL queries are compiled by Polar* into distributed query plans.

During the execution of the query plan, DQP will query relational data sources using SQL.

Page 9: Data access and integration with OGSA-DAI: OGSA-DQP

Data access & integration with OGSA-DAI: GGF 179

Client interaction with OGSA-DQP

Two main client/server interactions:

1. Configuration: the client sends a perform document requesting the service to create a DQP data service resource

2. Query submission: the client sends a perform document requesting the service to execute an Object Query Language (OQL) query, using a DQP data service resource created in (1)

The data service resource created in (1) encapsulates the distributed query infrastructure used to execute queries. Differs from the typical OGSA-DAI data service resources e.g. relational data service resource

Page 10: Data access and integration with OGSA-DAI: OGSA-DQP

Data access & integration with OGSA-DAI: GGF 1710

DQP configuration

OGSA-DAIdata serviceperform

<perform><DQPFactory>Evaluator URLsOGSA-DAI data service resourcesWeb service URLs</DQPFactory> </perform> OGSA-DAI

data serviceGetRP

DQP factory activity

OGSA-DAIdata service

GetRP

creates

DQP DSR • Global schema of imported DBs & analysis services• Set of evaluators that can be used• Physical DB metadata (used to optimise queries)

Result: resource ID of created DSR

Page 11: Data access and integration with OGSA-DAI: OGSA-DQP

Data access & integration with OGSA-DAI: GGF 1711

DQP query evaluation

OGSA-DAIdata serviceperform

<perform><OQLQueryStatement><expression>OQL query</expression></OQLQueryStatement> </perform>

OGSA-DAIdata service

perform

OQLQueryStatement

DQP DSR

EvaluatorQE

transport

OGSA-DAIdata service

perform

Analysisservice

. . .EvaluatorQE

EvaluatorQE

Result: WebRowSet XML Stream

Page 12: Data access and integration with OGSA-DAI: OGSA-DQP

Data access & integration with OGSA-DAI: GGF 1712

Interacting with an OGSA-DQP service

Three options:• A command line client

• Allows configuration and query submission via the execution of Apache Ant scripts

• Client toolkit classes• Allow you to integrate OGSA-DQP into your own

applications

[The above utilities are part of the main OGSA-DQP download]

• GUI client

Page 13: Data access and integration with OGSA-DAI: OGSA-DQP

Data access & integration with OGSA-DAI: GGF 1713

Command-line client

Configuration example:

$ ant factory

-Ddqp.config.file=config.xml

-Durl=http://rpc122.cs.man.ac.uk/axis/services/service1

-Dresource.id=dqp-factory

Querying the global schema – example:

$ ant getschemas

-Durl=http://rpc122.cs.man.ac.uk/axis/services/service1

-Dresource.id=ogsadai-911acvd122

Page 14: Data access and integration with OGSA-DAI: OGSA-DQP

Data access & integration with OGSA-DAI: GGF 1714

Command-line client

Query submission example:

$ ant query

-Durl=http://rpc122.cs.man.ac.uk/axis/services/service1

-Dresource.id=ogsadai-911acvd122

-Dclient.query=“%print select i.id from i in go_goterms;”

-Dclient.output.file=results.xml

Results will be saved as a WebRowSet, the standard XML representation of relational results used by OGSA-DAI

Page 15: Data access and integration with OGSA-DAI: OGSA-DQP

Data access & integration with OGSA-DAI: GGF 1715

Client toolkit classes

Client toolkit classes are provided for the activities contributed by OGSA-DQP:•GDQSFactory class used to construct DQPFactory activities

•OQLQuery class used to construct OQLQueryStatement activities

The client toolkit allows the integration of DQP with other applications and seamless interaction with the OGSA-DAI client toolkit

OGSA-DQP client toolkit is Java only…

Page 16: Data access and integration with OGSA-DAI: OGSA-DQP

Data access & integration with OGSA-DAI: GGF 1716

Query execution using client toolkit

1 GenericServiceFetcher fetcher = GenericServiceFetcher.getInstance();2 DataService service = fetcher.getDataService(url,resourceID);3 OQLQuery oqlQuery = new OQLQuery(query);4 OutputStreamActivity outputStream = new OutputStreamActivity();5 outputStream.setInput( oqlQuery.getOutput() );6 ActivityRequest request = new ActivityRequest();7 request.add( oqlQuery );8 service.perform(request);9 oqlQuery.getResultSet();10 java.sql.ResultSet rs = outputStream.getResultSet();

Page 17: Data access and integration with OGSA-DAI: OGSA-DQP

Data access & integration with OGSA-DAI: GGF 1717

Demo: The GUI Client

The GUI allows you to:• Interact with OGSA-DQP services. The GUI is pre-configured

with the URL of a OGSA-DQP service we have deployed at EPCC.

• View the configuration parameters of DQP data service resources

• View the global schema maintained by a DQP data service resource

• Submit OQL queries to DQP data service resources

• View the results of queries

• View graphical and XML representations of query plans

Page 18: Data access and integration with OGSA-DAI: OGSA-DQP

Data access & integration with OGSA-DAI: GGF 1718

Services @ Newcastle University

Evaluatorservice

giga01.ncl.ac.uk

OGSA-DAI dataservice

GO Term DB

Evaluatorservice

giga02.ncl.ac.uk

OGSA-DAI dataservice

Protein interaction DB

Evaluatorservice

giga03.ncl.ac.uk

OGSA-DAI dataservice

Protein Term DB

Evaluatorservice

giga04.ncl.ac.uk

OGSA-DAI dataservice

Protein property DB

Page 19: Data access and integration with OGSA-DAI: OGSA-DQP

Data access & integration with OGSA-DAI: GGF 1719

Services @ Newcastle University

Evaluatorservice

giga05.ncl.ac.uk

OGSA-DAI dataservice

Protein Sequence DB

Evaluatorservice

giga06.ncl.ac.uk

Evaluatorservice

giga07.ncl.ac.uk

Evaluatorservice

giga08.ncl.ac.uk

Evaluatorservice

giga09.ncl.ac.uk

Entropyanalyserservice

Page 20: Data access and integration with OGSA-DAI: OGSA-DQP

Data access & integration with OGSA-DAI: GGF 1720

Database tables

name length sqlTypeName

id 32 varchar

type 55 varchar

name 255 varchar

GO Terms extent name: “goterms_goterms”

Protein interactionsExtent name: “interaction_protein_interactions”

name length sqlTypeName

ORF1 50 varchar

ORF2 50 varchar

baitProtein 50 varchar

interactionType 5 varchar

repeats 11 int

experimenter 100 varchar

Page 21: Data access and integration with OGSA-DAI: OGSA-DQP

Data access & integration with OGSA-DAI: GGF 1721

Database tables

Protein terms extent name: “protein_term_protein_goterm”

name length sqlTypeName

ORF 55 varchar

molecularWeight 12 float

hydrophobicity 12 float

Protein propertiesextent name: “protein_property_protein_propertys”

name length sqlTypeName

ORF 50 varchar

sequence 65535 text

Protein sequenceextent name: “protein_sequence_protein_sequences”

name length sqlTypeName

ORF 55 varchar

GOTermIdentifier

32 varchar

Page 22: Data access and integration with OGSA-DAI: OGSA-DQP

Data access & integration with OGSA-DAI: GGF 1722

DQP service @ EPCC

OGSA-DAI data service

DQP factory

GIGA resource

test.ogsadai.org.uk

Encapsulates the distributed query environment deployed at Newcastle

ogsadai-1092f60c1e1

Page 23: Data access and integration with OGSA-DAI: OGSA-DQP

Data access & integration with OGSA-DAI: GGF 1723

Conclusion

OGSA-DQP is a service based distributed query processor that is:

• Exposed as a service

• Implemented as an orchestration of services It provides an example of how the OGSA-DAI extensibility points can be

used…• The activity extensibility points are used

• New data resource accessors are implemented

• Dynamic resource deployment is used during configuration to create new resources

Benefits:• OGSA-DAI manages activity concurrency – we didn’t need to write concurrent

code

• OGSA-DQP can take advantage of the host of delivery options provided by OGSA-DAI

• OGSA-DQP is insulated from multiple platforms (WS-I, WSRF) by OGSA-DAI