Scoped and Approximate Queries in a Relational Grid Information Service

Dong Lu , Peter A. Dinda , Jason A. Skicewicz

Prescience Lab, Dept. of Computer Science

Northwestern University, Evanston, IL 60201

Outline

• Introduction and motivation

– Powerful queries, but expensive to execute

– Trade-off between result size and query time

• Our solutions: scoped query, approximate query, scoped approximate query

– Nondeterministic query (SC talk on Tuesday)

• Performance evaluation

What is RGIS?

• GIS: A Grid Information Service stores information about the resources and services in a distributed computing environment and answers queries about it.

• RGIS: A Grid Information Service based on the relational data model.

Why RGIS?

1. RGIS can answer complex compositional queries

• Relational algebra (SQL)

• Joins

• Difficult in a hierarchical model (directory service)

2. Other reasons

• Indexes separate from data model

• Schema evolution

• Transactional insert/update/delete

• Consistency

RGIS Model of a Grid

[Figure: annotated network topology graph. Entity labels: module, endpoint, host, router, iplink, macswitch, maclink, connectorswitch, connectorlink. Layer labels: Software, Network, Data link, Physical.]

• Annotated network topology graph

• Annotation examples

– Hosts: memory, disk, OS, NICs, etc.

– Router/Switch: backplane bandwidth, ports

– Link: latency and bandwidth

• Highly dynamic data in streams, not DB

• Virtualization, Futures, Leases

– Virtual machines

The RGIS Design (Per Site)

[Figure: per-site architecture diagram. Recoverable labels follow.]

• Users and Applications

• Web Interface

• SOAP Interface

• Authenticated Direct Interface

• Content Delivery Network Interface (for loose consistency)

• Query Manager and Rewriter

• Update Manager

• Oracle 9i Front End: transactional inserts and updates using stored procedures, queries using select statements (uses database’s access control)

• Oracle 9i Back End: Windows, Linux, Parallel Server, etc.

• RDBMS: use of Oracle is not a requirement of the approach

• Schema, type hierarchy, indices, PL/SQL stored procedures for each object

• Site-to-site (tentative)

• Updates encrypted using asymmetric cryptography on the network. Only those with appropriate keys have access.

Challenge/Trade off

• Complex queries to a relational database can take a long time.

– Hours, days, or even weeks when we want seconds.

• Typically, the returned result set is unnecessarily big.

– The query returns all results.

• We need mechanisms to trade off query time against the size of the result set.

Challenge/Trade off

[Figure: result sets compared. Labels: All results, Scoped results, Nondeterministic results, Approximate results.]

Example: Cluster Finder

[Figure: network diagram. Labels: Cluster, Routers, IP links, Hosts.]

Find N hosts connected to the same router, all running Linux, with total memory of at least N*512 MB, such that the bisection bandwidth of the cluster is no less than 100 Mbits/sec.

Original SQL for 2 Host Cluster Finder

SELECT [scoped-approx] h1.distip, h2.distip
FROM hosts h1, hosts h2, iplinks l1, iplinks l2, routers r
WHERE h1.mem_mb + h2.mem_mb >= 1024
  AND h1.os = 'linux' AND h2.os = 'linux'
  AND ((l1.src = r.distip AND l2.src = r.distip AND l1.dest = h1.distip AND l2.dest = h2.distip)
    OR (l1.dest = r.distip AND l2.dest = r.distip AND l1.src = h1.distip AND l2.src = h2.distip))
  AND h1.distip <> h2.distip
  AND l1.bw_mbs >= 100 AND l2.bw_mbs >= 100
[SCOPED BY r.distip = X]
WITHIN 100 seconds;

Original SQL for Cluster Finder

• Finding an N-node cluster takes a (2*N+1)-way join: N hosts, N IP links, and one router. This is not scalable.

[Figure: network diagram. Labels: Cluster 1, Cluster 2, Routers, IP links, Hosts.]
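To make the join growth concrete, here is a minimal sketch that builds the exact, unscoped query text for N hosts by generalizing the 2-host SQL shown earlier. The helper name `standard_cluster_query` and its parameters are hypothetical, and the generalization of the pairwise `<>` condition to N hosts is an assumption; the table and column names are the ones used in the deck.

```python
def standard_cluster_query(n, mem_per_host_mb=512, min_bw=100):
    """Build the exact cluster-finder SQL for n hosts (hypothetical helper)."""
    if n < 2:
        raise ValueError("a cluster needs at least 2 hosts")
    hosts = [f"h{i}" for i in range(1, n + 1)]
    links = [f"l{i}" for i in range(1, n + 1)]
    # n host aliases + n link aliases + 1 router alias = a (2n+1)-way join.
    tables = [f"hosts {h}" for h in hosts] + [f"iplinks {lk}" for lk in links] + ["routers r"]

    conds = [" + ".join(f"{h}.mem_mb" for h in hosts) + f" >= {n * mem_per_host_mb}"]
    conds += [f"{h}.os = 'linux'" for h in hosts]
    conds += [f"{lk}.bw_mbs >= {min_bw}" for lk in links]
    # As in the 2-host query: all links leave the router, or all links enter it.
    outward = " and ".join(
        f"{lk}.src = r.distip and {lk}.dest = {h}.distip" for h, lk in zip(hosts, links))
    inward = " and ".join(
        f"{lk}.dest = r.distip and {lk}.src = {h}.distip" for h, lk in zip(hosts, links))
    conds.append(f"(({outward}) or ({inward}))")
    # Assumption: every pair of hosts must be distinct.
    conds += [f"{a}.distip <> {b}.distip"
              for i, a in enumerate(hosts) for b in hosts[i + 1:]]

    select = ", ".join(f"{h}.distip" for h in hosts)
    return f"select {select}\nfrom {', '.join(tables)}\nwhere " + "\n  and ".join(conds)
```

Calling `standard_cluster_query(2)` reproduces the five-table FROM clause of the 2-host query, and each additional host adds two more tables to the join.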

Scoped Cluster Finder

[Figure: network diagram. Labels: Routers, IP links, Hosts.]

• Query the hosts around a random router.

Scoped Cluster Finder

SELECT H1.DISTIP, H2.DISTIP
FROM HOSTS H1, HOSTS H2, IPLINKS L1, IPLINKS L2, ROUTERS R
WHERE H1.MEM_MB + H2.MEM_MB >= 1024
  AND H1.OS = 'LINUX' AND H2.OS = 'LINUX'
  AND ((L1.SRC = R.DISTIP AND L2.SRC = R.DISTIP AND L1.DEST = H1.DISTIP AND L2.DEST = H2.DISTIP)
    OR (L1.DEST = R.DISTIP AND L2.DEST = R.DISTIP AND L1.SRC = H1.DISTIP AND L2.SRC = H2.DISTIP))
  AND H1.DISTIP <> H2.DISTIP
  AND L1.BW_MBS >= 100 AND L2.BW_MBS >= 100
  AND R.DISTIP = X;
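The scoped query can be tried on a tiny in-memory database. This is a sketch only: it uses SQLite rather than Oracle, the toy schema keeps just the columns the query touches, the host and router names are made up, and the router X is passed as a bound parameter.

```python
import sqlite3

def make_toy_grid():
    """Create a made-up grid: two routers with two hosts each."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE routers (distip TEXT PRIMARY KEY);
        CREATE TABLE hosts (distip TEXT PRIMARY KEY, mem_mb INTEGER, os TEXT);
        CREATE TABLE iplinks (src TEXT, dest TEXT, bw_mbs INTEGER);
    """)
    db.executemany("INSERT INTO routers VALUES (?)", [("r1",), ("r2",)])
    db.executemany("INSERT INTO hosts VALUES (?, ?, ?)", [
        ("a", 512, "LINUX"), ("b", 768, "LINUX"),     # attached to r1
        ("c", 256, "LINUX"), ("d", 2048, "WINDOWS"),  # attached to r2
    ])
    db.executemany("INSERT INTO iplinks VALUES (?, ?, ?)", [
        ("r1", "a", 100), ("r1", "b", 1000), ("r2", "c", 100), ("r2", "d", 100),
    ])
    return db

# The scoped 2-host query from the slide, with X as a bound parameter.
SCOPED_SQL = """
SELECT H1.DISTIP, H2.DISTIP
FROM HOSTS H1, HOSTS H2, IPLINKS L1, IPLINKS L2, ROUTERS R
WHERE H1.MEM_MB + H2.MEM_MB >= 1024
  AND H1.OS = 'LINUX' AND H2.OS = 'LINUX'
  AND ((L1.SRC = R.DISTIP AND L2.SRC = R.DISTIP AND L1.DEST = H1.DISTIP AND L2.DEST = H2.DISTIP)
    OR (L1.DEST = R.DISTIP AND L2.DEST = R.DISTIP AND L1.SRC = H1.DISTIP AND L2.SRC = H2.DISTIP))
  AND H1.DISTIP <> H2.DISTIP
  AND L1.BW_MBS >= 100 AND L2.BW_MBS >= 100
  AND R.DISTIP = ?
"""

def scoped_cluster_finder(db, router):
    """Return all qualifying host pairs attached to the given router."""
    return db.execute(SCOPED_SQL, (router,)).fetchall()
```

In this toy grid only router `r1` has two Linux hosts with enough combined memory, so scoping to `r2` returns nothing even though a cluster exists elsewhere. That is the trade-off the scoped query makes.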

Approximate Cluster Finder

• When searching for N hosts with total memory of at least N*512 MB, we can approximate the query with “search for N hosts, each with at least 512 MB of memory”.

• This reduces the number of joins, or avoids them altogether.

• However, this won’t find, say, N/2 hosts with 256 MB and N/2 hosts with 768 MB.
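The difference between the two predicates can be checked directly. A minimal sketch, with hypothetical function names, comparing the exact total-memory condition against the per-host approximation:

```python
def exact_ok(mems_mb, per_host_mb=512):
    """Exact condition: total memory of the N hosts is at least N * per_host_mb."""
    return sum(mems_mb) >= len(mems_mb) * per_host_mb

def approx_ok(mems_mb, per_host_mb=512):
    """Approximate condition: every host individually has at least per_host_mb."""
    return all(m >= per_host_mb for m in mems_mb)

mixed = [256, 768]    # satisfies the exact condition only
uniform = [512, 512]  # satisfies both conditions
```

Any host set that passes `approx_ok` also passes `exact_ok`, so the approximate query returns a subset of the exact results. The mixed 256/768 case shows that the subset can be strict.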

Approximate Cluster Finder

SELECT R.DISTIP, H1.DISTIP
FROM HOSTS H1, IPLINKS L1, ROUTERS R
WHERE H1.MEM_MB >= 512 AND H1.OS = 'LINUX' AND L1.BW_MBS >= 100
  AND ((L1.SRC = R.DISTIP AND L1.DEST = H1.DISTIP)
    OR (L1.DEST = R.DISTIP AND L1.SRC = H1.DISTIP))
  AND R.DISTIP IN (
    SELECT R.DISTIP
    FROM HOSTS H1, IPLINKS L1, ROUTERS R
    WHERE H1.MEM_MB >= 512 AND H1.OS = 'LINUX' AND L1.BW_MBS >= 100
      AND ((L1.SRC = R.DISTIP AND L1.DEST = H1.DISTIP)
        OR (L1.DEST = R.DISTIP AND L1.SRC = H1.DISTIP))
    GROUP BY R.DISTIP
    HAVING COUNT(*) >= 2)
ORDER BY R.DISTIP;

Scoped Approximate Cluster Finder

• Combine the approximate query with the scoped query.

• Scope the query to one randomly chosen router at a time. If no results are found, choose another random router and repeat the query.

• Approximate the N-host join for N*512 MB of total memory with a search for N hosts, each with at least 512 MB.

• Always a three-way join.

– Regardless of the size of the cluster being searched for, so it is very scalable.

– May need to search multiple routers.

Scoped Approximate Cluster Finder

SELECT H1.DISTIP
FROM HOSTS H1, IPLINKS L1, ROUTERS R
WHERE H1.MEM_MB >= 512 AND H1.OS = 'LINUX' AND L1.BW_MBS >= 100
  AND ((L1.SRC = R.DISTIP AND L1.DEST = H1.DISTIP)
    OR (L1.DEST = R.DISTIP AND L1.SRC = H1.DISTIP))
  AND R.DISTIP = X
  AND ROWNUM <= 2

The scoped approximate cluster finder has a fixed number of joins.
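The retry-over-routers procedure can be sketched as a small driver loop. Several things here are assumptions rather than the RGIS implementation: SQLite's `LIMIT` stands in for Oracle's `ROWNUM`, routers are visited in shuffled order without repeats, a router counts as a hit only when it yields exactly N rows, and the toy data and the names `make_toy_grid` and `find_cluster` are made up.

```python
import random
import sqlite3

# LIMIT ? is SQLite's stand-in for the ROWNUM <= N condition on the slide.
SCOPED_APPROX_SQL = """
SELECT H1.DISTIP
FROM HOSTS H1, IPLINKS L1, ROUTERS R
WHERE H1.MEM_MB >= 512 AND H1.OS = 'LINUX' AND L1.BW_MBS >= 100
  AND ((L1.SRC = R.DISTIP AND L1.DEST = H1.DISTIP)
    OR (L1.DEST = R.DISTIP AND L1.SRC = H1.DISTIP))
  AND R.DISTIP = ?
LIMIT ?
"""

def make_toy_grid():
    """Made-up grid: only router r2 has three qualifying hosts."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE routers (distip TEXT PRIMARY KEY);
        CREATE TABLE hosts (distip TEXT PRIMARY KEY, mem_mb INTEGER, os TEXT);
        CREATE TABLE iplinks (src TEXT, dest TEXT, bw_mbs INTEGER);
    """)
    db.executemany("INSERT INTO routers VALUES (?)", [("r1",), ("r2",), ("r3",)])
    db.executemany("INSERT INTO hosts VALUES (?, ?, ?)", [
        ("a", 256, "LINUX"), ("b", 256, "LINUX"),
        ("c", 512, "LINUX"), ("d", 1024, "LINUX"), ("e", 2048, "LINUX"),
        ("f", 4096, "WINDOWS"),
    ])
    db.executemany("INSERT INTO iplinks VALUES (?, ?, ?)", [
        ("r1", "a", 100), ("r1", "b", 100),
        ("r2", "c", 100), ("r2", "d", 100), ("r2", "e", 1000),
        ("r3", "f", 100),
    ])
    return db

def find_cluster(db, n, rng):
    """Try routers in random order; return (router, hosts) or None."""
    routers = [row[0] for row in db.execute("SELECT DISTIP FROM ROUTERS")]
    rng.shuffle(routers)
    for router in routers:
        # One three-way join per attempt, whatever the value of n.
        hosts = [row[0] for row in db.execute(SCOPED_APPROX_SQL, (router, n))]
        if len(hosts) == n:
            return router, hosts
    return None
```

Usage: `find_cluster(make_toy_grid(), 3, random.Random(0))` returns router `r2` with its three hosts, while asking for four hosts returns `None` after every router has been tried.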

Time bounded queries

• The query rewriter starts the query as a child process.

• The parent kills the child process if no results are returned within the deadline.
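The parent/child deadline pattern can be sketched with Python's standard `subprocess` module. This illustrates the pattern only, not the RGIS query rewriter: the function name `run_with_deadline` is hypothetical, and the child here is a stand-in command rather than a database query.

```python
import subprocess
import sys

def run_with_deadline(argv, deadline_s):
    """Run argv as a child process; return its stdout, or None on a missed deadline."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=deadline_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before raising.
        return None
    return done.stdout

# Stand-in for a long-running query: a child that sleeps for 30 seconds.
SLOW_QUERY = [sys.executable, "-c", "import time; time.sleep(30)"]
```

For example, `run_with_deadline(SLOW_QUERY, 0.5)` returns `None` after about half a second. The caller can then report no results, or move on to another scope.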

Limitations of Scoped and Approximate queries

• The returned results are a subset of the results of the original query, so a scoped or approximate query may report no results even though the original query would return some after running for a long time.

• Not all queries can be rewritten as scoped or approximate queries.

• It is hard to automate scoped and approximate query rewriting.

Performance Evaluation

• We need to populate the database with a large amount of data.

• Computational grids are still in their early stages.

– No large data sets are available.

– We use Smith MDS data for memory.

• We generate synthetic grids that are representative of the Internet.

– We can generate very large grids.

GridG Generated Synthetic Grids

• Three-level network: WAN, MAN, LAN. Nodes on the WAN and MAN levels are routers, while nodes on the LAN level are hosts.

• Links: IP links annotated with bandwidth and latency.

• Hosts: annotated with memory size, architecture, number of processors, CPU clock rate, disk size, etc.

• The user can control all the distributions and the size of the network.
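To show the shape of such a data set, here is a toy generator for a three-level hierarchy with annotated hosts and links. It is not GridG: it builds a plain tree, enforces no power laws, leaves WAN routers unconnected to each other, and the value lists for memory, OS, and bandwidth are placeholder distributions chosen purely for illustration.

```python
import random

def toy_grid(n_wan, mans_per_wan, hosts_per_man, rng,
             mem_choices=(256, 512, 1024, 2048),
             os_choices=("LINUX", "WINDOWS"),
             bw_choices=(10, 100, 1000)):
    """Return (routers, hosts, links) for a toy WAN/MAN/LAN tree (illustrative only)."""
    routers, hosts, links = [], [], []

    def add_link(src, dest):
        # IP links carry bandwidth and latency annotations.
        links.append({"src": src, "dest": dest,
                      "bw_mbs": rng.choice(bw_choices),
                      "latency_ms": rng.uniform(0.1, 50.0)})

    for w in range(n_wan):
        wan = f"wan{w}"
        routers.append(wan)            # WAN-level nodes are routers
        for m in range(mans_per_wan):
            man = f"{wan}.man{m}"
            routers.append(man)        # MAN-level nodes are routers
            add_link(wan, man)
            for h in range(hosts_per_man):
                host = f"{man}.host{h}"
                hosts.append({"distip": host,   # LAN-level nodes are hosts
                              "mem_mb": rng.choice(mem_choices),
                              "os": rng.choice(os_choices)})
                add_link(man, host)
    return routers, hosts, links
```

Passing the value lists as parameters mirrors the idea that the user controls the distributions and the network size. Rows in this form map directly onto hosts, routers, and iplinks tables.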

GridG: Synthesizing Realistic Computational Grids

http://www.cs.northwestern.edu/~urgis/GridG

[Figure: GridG tool chain. Labels: Base Topology Generator (Tiers); Structured Topology; Translation To Common Format; GridG Power Law Enforcer; Structured Topology that obeys power laws; GridG Annotator; Grid; Other transformations on common format (Cluster maker, etc); RGIS Database; GIS Simulator; DOT Visualization; Other Tools.]

SC talk on Tuesday!

Experimental Setup

• Dell PowerEdge 4400: dual Xeon 1 GHz processors, 2 GB memory, 240 GB RAID 5 storage system.

• Oracle 9i Enterprise Edition, Red Hat Linux 7.1.

• Each test is repeated either 25 or 100 times, and we provide the average value.

Performance of Various Query Techniques with Cluster Finder

(Time to run query, in seconds)

Cluster size | Standard | Scoped | Approx | Scoped Approx
2 | 21.44 | 2.27 | 7.62 | 1.16
4 | >7200 | 2047.9 | 7.48 | 1.32
8 | >9000 | >3600 | 7.46 | 1.43
16 | N/A | >3600 | 7.51 | 1.45
32 | N/A | >3600 | 7.65 | 5.96
64 | N/A | >3600 | >120 | 9.58
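The 2-host row is the only one where all four techniques finished, so it is the one row where speedups over the standard query can be computed exactly. A small sketch using the times from the table:

```python
# Query times in seconds for a 2-host cluster, copied from the table.
times_s = {"standard": 21.44, "scoped": 2.27, "approx": 7.62, "scoped_approx": 1.16}

# Speedup of each technique relative to the standard (unmodified) query.
speedup = {name: times_s["standard"] / t for name, t in times_s.items()}

for name, factor in speedup.items():
    print(f"{name}: {factor:.1f}x")
```

At this size the scoped approximate query is about 18 times faster than the standard query. For larger clusters the standard query did not finish within the reported bounds, so only lower bounds on its speedup are available.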

Performance of Scoped Approximate Queries

• Cluster Finder: Find N hosts, each running Linux, with total memory of at least N*512 MB, all connected to the same router, with bisection bandwidth of at least 100 Mbits/sec.

– Our running example

• Non-network query: Find N hosts with total memory of at least N*512 MB.

– No joins needed at all

Performance of Scoped Approximate Queries (2)

• Scalability with database size.

• Scalability with the complexity of queries.

• Scalability with concurrent users and update load.

Performance of Scoped Approximate Query (9.8K hosts, Cluster Finder)

Performance of Scoped Approximate Query (101K hosts, Cluster Finder)

Performance of Scoped Approximate Query (980K hosts, Cluster Finder)

Performance of Scoped Approximate Query (9.8K hosts, Non-network query)

Performance of Scoped Approximate Query (101K hosts, Non-network query)

Performance of Scoped Approximate Query (980K hosts, Non-network query)

Scalability with multiple concurrent users and background load

• Other research has shown that GIS servers must process frequent updates while serving requests.

• GIS servers serve multiple concurrent users.

• We evaluate scoped approximate queries with concurrent users and update load.

• Concurrent users: execute queries repeatedly.

• Update load: execute transactional updates on randomly selected hosts as fast as possible.

– About 200 updates/second

Performance of Scoped Approximate Query (9.8K hosts, Cluster Finder, with Concurrent Users, looking for 64 nodes)

Performance of Scoped Approximate Query (9.8K hosts, Non-network query, with Concurrent Users, looking for 64 nodes)

Conclusions

• We described and evaluated two query techniques that trade off query time against the size of the result set: scoped queries and approximate queries.

• Combining scoped and approximate queries can dramatically reduce response time and server load.

For more information

• GridG and related paper: http://www.cs.northwestern.edu/~urgis/GridG

“Synthesizing Realistic Computational Grids”, in Proceedings of SC03.

• RGIS and related paper: http://www.cs.northwestern.edu/~urgis/

“Nondeterministic Queries in a Relational Grid Information Service”, in Proceedings of SC03.