m.nedim alpdemir, anastasios gounaris¹, arijit mukherjee², desmond fitzgerald, norman w. paton¹,...

23
M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹, Jim Smith 2 1 University of Manchester, UK 2 University of Newcastle, UK Acknowledgement: Much of the content in many of the slides has been authored by co-workers, especially Nedim Alpdemir and Jim Smith. Errors are mine, of course. Experience on Performance Evaluation with OGSA- DQP

Upload: megan-patterson

Post on 28-Mar-2015

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹,

Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹, Jim Smith2

1University of Manchester, UK2University of Newcastle, UK

Acknowledgement: Much of the content in many of the slides has been authored by co-workers, especially Nedim Alpdemir and Jim Smith. Errors are mine, of course.

Experience on Performance Evaluation with OGSA- DQP

Page 2: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

21 Sep 2005 AHM 2005 2

Outline of talk The OGSA-DQP system Impact of infrastructure layers Profiling

Page 3: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

21 Sep 2005 AHM 2005 3

OGSA-DQP system

Unified view of and access to remote DBMSs

Unified view of and access to remote DBMSs

Page 4: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

21 Sep 2005 AHM 2005 4

Selecting Resources in OGSA-DQP

Unified

schema

machines

Page 5: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

21 Sep 2005 AHM 2005 5

Evaluating Queries in OGSA-DQP

query plan

Page 6: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

21 Sep 2005 AHM 2005 6

High level architecture

Page 7: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

21 Sep 2005 AHM 2005 7

Brief tour: an illustration

G D Q S

GD S 1

GD S 2

W e bS e rv ice s

C lie n t

re s o u rce lis t

W S D L

D B S ch e m a

D B S ch e m a

G L o g ica lO pt im is e r G

Ph y s ica lO pt im is e r

G Pa rt it io n e r GS ch e du le r

G

OQ

L P

arse

r

Po la r* Q u e ry O pt im is e r En g in e

GD S Q u e r yR e qu e s t D oc .

O Q LQ u e r y

pr in t

e xc h an g e

h as h join

s c ane x ch a n g e

s c an

P1

P2

P3

GGQ ES 3

GGQ ES 2

GGQ ES 1

Distributed QueryExecution Engine

sub- pla n

sub- pla n

da ta b lock s

da ta b lock s

s u b-qu e ry

s u b-qu e ry

o pe ra t io n ca ll

<?xml version="1.0" encoding="UTF-8"?>

<GDQDataSourceList xmlns="http://dqp.ogsadai.org.uk/schema/gdqs" >

<importedDataSource>

<GDSFactoryHandle>http://phoebus.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>

<GDSFactoryHandle>http://rpc676.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>

<GDSFactoryHandle>http://mygrid.ncl.cs.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory</GDSFactoryHandle>

</importedDataSource>

<importedService>

<wsdlURL>http://phoebus.cs.man.ac.uk:9090/axis/services/EntropyAnalyserService?WSDL</wsdlURL>

</importedService>

</GDQDataSourceList>

<?xml version="1.0" encoding="UTF-8"?>

<databaseSchema xmlns="">

<logicalSchema>

<table name="goterm">

<column fullName="goterm_id" length="32" name="id">

<sqlTypeName>varchar</sqlTypeName>

<sqlJavaTypeID>12</sqlJavaTypeID>

</column>

<column fullName="goterm_type" length="55" name="type">

<sqlTypeName>varchar</sqlTypeName>

<sqlJavaTypeID>12</sqlJavaTypeID>

</column>

<column fullName="goterm_name" length="255" name="name">

<sqlTypeName>varchar</sqlTypeName>

<sqlJavaTypeID>12</sqlJavaTypeID>

</column>

<primaryKey>

<columnFullName>id</columnFullName>

</primaryKey>

</table>

</logicalSchema>

<physicalSchema>

<hostMachine>130.88.192.230</hostMachine>

<database join_buffer_size="131072" max_join_size="4294967295">

<physTable avgRowLength="67" dataLength="766784" indexLength="126976" name="goterm" rowFormat="Dynamic" rows="11369"/>

</database>

</physicalSchema>

<GDSFHandle>http://phoebus.cs.man.ac.uk:9090/ogsa/services/ogsadai/GridDataServiceFactory</GDSFHandle>

</databaseSchema>

<?xml version="1.0" encoding="UTF-8"?>

<Partitions>

<Partition>

<evaluatorURI>http://130.88.198.195:9090/ogsa/services/ogsadai/dqp/GridQueryEvaluationFactory/hash-11025450-1076603541049</evaluatorURI>

<Operator operatorID="0" operatorType="TABLE_SCAN">

<tupleType>

<type>goterm</type>

<name>goterm.OID</name>

<type>string</type>

<name>goterm.id</name>

<type>string</type>

<name>goterm.type</name>

<type>string</type>

<name>goterm.name</name>

</tupleType>

<TABLE_SCAN>

<dataResourceName> goterms </dataResourceName>

<GDSHandle> http://130.88.192.230:9090/ogsa/services/ogsadai/GridDataServiceFactory/hash-31056514-1076603576481</GDSHandle>

<tableName> goterms </tableName>

<predicateExpr>

<predicate>

<comparativeOperator>LIKE</comparativeOperator>

<leftOperand name=" goterm.id" type="13"/>

<rightOperand name=" GO:0000%" type="16"/>

</predicate>

</predicateExpr>

</TABLE_SCAN>

</Operator> . . .

</Partition> . . .

</Partitions>

Page 8: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

21 Sep 2005 AHM 2005 8

Where we are The OGSA-DQP system Impact of infrastructure layers Profiling • Work based on publicly available releases OGSA-DQP 2.0, OGSA-DAI 4.0, GT 3.2.

!! Some of the figures presented do not describe the behaviour of the system any more and refer to pre-optimisation stages.

• Complementarily, see OGSA-DAI papers in AHM2004-5.

Page 9: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

21 Sep 2005 AHM 2005 9

Data Sources Protein_goterm (16803 rows @ 24B ~ 404KB)

ORF1 [varchar(50)]

ORF2 [varchar(50)]

baitProtein [varchar(50)]

interactionType [varchar(5)]

repeats [int(11)]

experimenter [varchar(100)]

YER081W YIL074C YIL074C Y2H 0 Uetz et al YER081W YIL074C YER081W Y2H 2 Ito et al

● Protein_interaction (4716 rows @ 47B ~ 227KB)

ORF [varchar (55)] GoTermIdentifier [varchar(32)] Q0010 GO:0000004 YAL037W GO:0005554

Page 10: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

21 Sep 2005 AHM 2005 10

Representation in WebRowSet

Table name Original data size

XML overhead per row

Total XML overhead

Total Size

protein_goterm 404 KB 160 B 2.68 MB ~3 MB protein_interaction 226 KB 376 B 1.77 MB ~2 MB

<currentRow><columnValue>YAL037W</columnValue><columnValue>GO:0000004</columnValue>

</currentRow>

Page 11: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

21 Sep 2005 AHM 2005 11

(1) Access Techniques

Scan-1, a full scan of the protein_goterm table: select * from protein_goterm;

Scan-2, a full scan of the protein_interaction table: select * from protein_interaction;

Join, which is an equi-join of the two tables: select i.ORF2 from protein_goterm as p, protein_interaction as i where p.ORF=i.ORF1;

Page 12: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

21 Sep 2005 AHM 2005 12

Configurations

JDBC local/remote

GDS local/remote sync/async

OGSA-DQP GQES co-located with GDS GQES calls GDS asynchronously

Page 13: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

21 Sep 2005 AHM 2005 13

Scan-1 (protein_goterm)

local JDBC

remote JDBC

local Synch-GDS

remote Synch-GDS

local Asynch-GDS

remote Asynch-GDS

OGSA-DQP-scan

0.0020.0040.0060.0080.00

100.00120.00140.00160.00180.00200.00220.00

0.43 5.33 6.33

117.00

218.33

156.37

Data Access Mode

Ex

ec

. Tim

e (

sec

.)

Page 14: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

21 Sep 2005 AHM 2005 14

Scan-2 (protein_interaction)

local JDBC

remote JDBC

local Synch-GDS

remote Synch-GDS

local Asynch-GDS

remote Asynch-GDS

OGSA-DQP-scan

0.00

10.00

20.00

30.00

40.00

50.00

60.00

0.34 2.00 3.00

32.33

63.00

47.27

Data Access Mode

Qu

ery

Ex

ec. T

ime

(se

c.)

Page 15: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

21 Sep 2005 AHM 2005 15

Join (both tables)

local JDBC

remote JDBC

local Synch-GDS

remote Synch-GDS

local Asynch-GDS

remote Asynch-GDS

OGSA-DQP-join

0.00

50.00

100.00

150.00

200.00

250.00

300.00

350.00

0.24 2.00 3.00

97.67

197.00

349.07

Data Access Mode

Qu

ery

Ex

ec. T

ime

(se

c.)

Page 16: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

21 Sep 2005 AHM 2005 16

Remote Access by Block

1 5 10 20 30 40 500

10

20

30

40

50

60

protein_interactions

Tuples per Block

Ela

pse

d T

ime

(s)

1 5 10 20 30 40 500

25

50

75

100

125

150

175

200

225

protein_goterms

Synch

Asynch

Tuples per Block

Ela

pse

d T

ime

(s)

Block access benefits local cases too (e.g. DQP access to protein_goterms 156 -> 28s)

Page 17: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

21 Sep 2005 AHM 2005 17

(2) Breaking Down Costs

8.1

26.8 (4.8+1+22)

15.1 (9.6+5.5)

Total Cost: 28.6 secs

Page 18: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

21 Sep 2005 AHM 2005 18

(3) Parallelizing Operation Call

select p.ORF, go.id, calculateEntropySlow(p.sequence)

from protein_sequences p, goterms go, protein_goterms pg

where go.id=pg.GOTermIdentifier andp.ORF=pg.ORF and pg.ORF like "YCL0\%" and go.id like "GO:0\%";

Page 19: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

21 Sep 2005 AHM 2005 19

Configuration

Two parameters: a number of copies

of WS are available;

a number of spare machines are available – compiler plants an op-call on each.

Page 20: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

21 Sep 2005 AHM 2005 20

Measurements

1 2 3 4 5 60

20

40

60

80

100

120

140

160

2 Op call ops

3

4

5

6

Service copies

Ela

pse

d T

ime

(s)

Page 21: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

21 Sep 2005 AHM 2005 21

Lessons Increase granularity of inter-service

communication, like access to GDS. Reduce delivery cost – coalesce root

evaluator within GDQS – in progress. Parallelizing expensive operations

can be beneficial – ongoing work. Translations performed in transfers

and XML WebRowSet processing can be expensive – ongoing work.

Page 22: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

21 Sep 2005 AHM 2005 22

Current Work New Release

Supports WS-RF/ WS-I Builds on top of OGSA-DAI 7.0

(due in September) Includes optimisations

Investigating adaptive and fault-tolerant mechanisms

Page 23: M.Nedim Alpdemir, Anastasios Gounaris¹, Arijit Mukherjee², Desmond Fitzgerald, Norman W. Paton¹, Paul Watson², Rizos Sakellariou¹, Alvaro A.A. Fernandes¹,

21 Sep 2005 AHM 2005 23

Contact OGSA-DAI and OGSA-DQP software

http://www.ogsadai.org.uk http://www.ogsadai.org.uk/dqp

Project site, mentioning adaptivity and fault-tolerance

http://www.ncl.ac.uk/polarstar