
1

Scoped and Approximate Queries in a Relational Grid Information Service

Dong Lu , Peter A. Dinda , Jason A. Skicewicz

Prescience Lab, Dept. of Computer Science

Northwestern University, Evanston, IL 60201

2

Outline

• Introduction and motivation

– Powerful queries, but expensive to execute

– Trade-off between result size and query time

• Our solutions: scoped query, approximate query, scoped approximate query

– Nondeterministic query (SC talk on Tuesday)

• Performance evaluation

3

What is RGIS?

• GIS: A Grid Information Service stores information about the resources and services in a distributed computing environment and answers queries about it.

• RGIS: A Grid Information Service based on the relational data model.

4

Why RGIS?

1. RGIS can answer complex compositional queries

• Relational algebra (SQL)

• Joins

• Difficult in a hierarchical model (directory service)

2. Other reasons

• Indexes separate from data model

• Schema evolution

• Transactional insert/update/delete

• Consistency

5

RGIS Model of a Grid

[Figure: grid model diagram. Labels: module, endpoint, maclink, macswitch, iplink, router, host, connectorswitch, connectorlink. Layers: Network, Data link, Physical, Software.]

• Annotated network topology graph

• Annotation examples

– Hosts: memory, disk, OS, NICs, etc.

– Router/Switch: backplane bandwidth, ports

– Link: latency and bandwidth

• Highly dynamic data in streams, not in the DB

• Virtualization, Futures, Leases

– Virtual machines

6

The RGIS Design (Per Site)

[Figure: per-site architecture diagram. Components and labels follow.]

• RDBMS: Oracle 9i Back End (Windows, Linux, Parallel Server, etc.). Use of Oracle is not a requirement of the approach.

• Oracle 9i Front End: transactional inserts and updates using stored procedures, queries using select statements (uses the database’s access control)

• Schema, type hierarchy, indices, PL/SQL stored procedures for each object

• Update Manager

• Query Manager and Rewriter

• Web Interface

• SOAP Interface

• Authenticated Direct Interface

• Content Delivery Network Interface (for loose consistency)

• Site-to-site (tentative)

• Clients: Users, Applications

• Updates are encrypted using asymmetric cryptography on the network. Only those with the appropriate keys have access.

7

Challenge/Trade off

• Complex queries to a relational database can take a long time: hours, days, or even weeks when we want seconds.

• Typically, the returned result set is unnecessarily large.

– The database returns all results.

• We need mechanisms to trade off query time against the size of the result set.

8

Challenge/Trade off

[Figure: result sets compared. Labels: All results, Scoped results, Nondeterministic results, Approximate results.]

9

Example: Cluster Finder

[Figure: network diagram. Labels: Cluster, Routers, IP links, Hosts.]

Find N hosts connected to the same router, with total memory of at least N*512 MB, all running Linux, such that the bisection bandwidth of the cluster is no less than 100 Mbit/s.

10

Original SQL for 2 Host Cluster Finder

SELECT [scoped-approx] h1.distip, h2.distip
FROM hosts h1, hosts h2, iplinks l1, iplinks l2, routers r
WHERE h1.mem_mb + h2.mem_mb >= 1024
  AND h1.os = 'linux' AND h2.os = 'linux'
  AND ((l1.src = r.distip AND l2.src = r.distip
        AND l1.dest = h1.distip AND l2.dest = h2.distip)
    OR (l1.dest = r.distip AND l2.dest = r.distip
        AND l1.src = h1.distip AND l2.src = h2.distip))
  AND h1.distip <> h2.distip
  AND l1.bw_mbs >= 100 AND l2.bw_mbs >= 100
[SCOPED BY r.distip = X]
WITHIN 100 seconds;

11

Original SQL for Cluster Finder

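• Finding an N-node cluster requires a (2*N+1)-way join: N hosts, N IP links, and one router. This does not scale.

[Figure: network diagram with two candidate clusters. Labels: Cluster 1, Cluster 2, Routers, IP links, Hosts.]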
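To see where the 2*N+1 comes from, the following Python sketch generates the exact cluster-finder SQL for an arbitrary N and counts the joined tables. It is an illustration only: the function name `cluster_finder_sql`, its parameters, and the pairwise `<>` generalization of `h1.distip<>h2.distip` are assumptions, not part of RGIS.

```python
def cluster_finder_sql(n, mem_per_host_mb=512, min_bw_mbs=100):
    """Build the exact, unscoped cluster-finder query for n hosts.

    Returns (sql_text, number_of_joined_tables). Hypothetical helper;
    table and column names follow the 2-host query on the slide.
    """
    hosts = [f"h{i}" for i in range(1, n + 1)]
    links = [f"l{i}" for i in range(1, n + 1)]
    # n host aliases + n link aliases + 1 router = 2n+1 joined tables
    tables = ([f"hosts {h}" for h in hosts]
              + [f"iplinks {l}" for l in links]
              + ["routers r"])
    pairs = list(zip(hosts, links))

    conds = [" + ".join(f"{h}.mem_mb" for h in hosts)
             + f" >= {n * mem_per_host_mb}"]
    conds += [f"{h}.os = 'linux'" for h in hosts]
    conds += [f"{l}.bw_mbs >= {min_bw_mbs}" for l in links]
    # every host hangs off the same router, links stored in either direction
    down = " and ".join(f"{l}.src = r.distip and {l}.dest = {h}.distip"
                        for h, l in pairs)
    up = " and ".join(f"{l}.dest = r.distip and {l}.src = {h}.distip"
                      for h, l in pairs)
    conds.append(f"(({down}) or ({up}))")
    # assumption: all hosts must be pairwise distinct
    conds += [f"{a}.distip <> {b}.distip"
              for i, a in enumerate(hosts) for b in hosts[i + 1:]]

    sql = (f"select {', '.join(h + '.distip' for h in hosts)} "
           f"from {', '.join(tables)} where {' and '.join(conds)}")
    return sql, len(tables)


sql, ways = cluster_finder_sql(2)   # ways == 5, as in the 2-host query
```

For N = 16 the same function reports a 33-way join, which makes the scalability problem concrete.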

12

Scoped Cluster Finder

[Figure: network diagram. Labels: Routers, IP links, Hosts.]

Query only the hosts around a randomly chosen router.

13

Scoped Cluster Finder

SELECT H1.DISTIP, H2.DISTIP
FROM HOSTS H1, HOSTS H2, IPLINKS L1, IPLINKS L2, ROUTERS R
WHERE H1.MEM_MB + H2.MEM_MB >= 1024
  AND H1.OS = 'LINUX' AND H2.OS = 'LINUX'
  AND ((L1.SRC = R.DISTIP AND L2.SRC = R.DISTIP
        AND L1.DEST = H1.DISTIP AND L2.DEST = H2.DISTIP)
    OR (L1.DEST = R.DISTIP AND L2.DEST = R.DISTIP
        AND L1.SRC = H1.DISTIP AND L2.SRC = H2.DISTIP))
  AND H1.DISTIP <> H2.DISTIP
  AND L1.BW_MBS >= 100 AND L2.BW_MBS >= 100
  AND R.DISTIP = X;
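The effect of scoping can be tried on a toy grid. The sketch below uses Python's built-in sqlite3 rather than Oracle, and the schema (column types, the four hosts, the two routers `r1` and `r2`) is invented for illustration. It shows that the scoped query returns a subset of what the unscoped query returns.

```python
import sqlite3

# 2-host cluster finder from the slides, without the scope predicate
BASE = """
SELECT h1.distip, h2.distip
FROM hosts h1, hosts h2, iplinks l1, iplinks l2, routers r
WHERE h1.mem_mb + h2.mem_mb >= 1024
  AND h1.os = 'linux' AND h2.os = 'linux'
  AND ((l1.src = r.distip AND l2.src = r.distip
        AND l1.dest = h1.distip AND l2.dest = h2.distip)
    OR (l1.dest = r.distip AND l2.dest = r.distip
        AND l1.src = h1.distip AND l2.src = h2.distip))
  AND h1.distip <> h2.distip
  AND l1.bw_mbs >= 100 AND l2.bw_mbs >= 100
"""


def make_toy_grid():
    """Hypothetical two-router grid; not real RGIS data."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE routers (distip TEXT PRIMARY KEY);
        CREATE TABLE hosts (distip TEXT PRIMARY KEY, mem_mb INTEGER, os TEXT);
        CREATE TABLE iplinks (src TEXT, dest TEXT, bw_mbs INTEGER);
    """)
    db.executemany("INSERT INTO routers VALUES (?)", [("r1",), ("r2",)])
    db.executemany("INSERT INTO hosts VALUES (?, ?, ?)", [
        ("a", 512, "linux"), ("b", 512, "linux"),    # behind r1
        ("c", 1024, "linux"), ("d", 256, "linux"),   # behind r2
    ])
    db.executemany("INSERT INTO iplinks VALUES (?, ?, ?)", [
        ("r1", "a", 100), ("r1", "b", 100),
        ("r2", "c", 1000), ("r2", "d", 100),
    ])
    return db


def find_pairs(db, router=None):
    """Run the cluster finder, optionally scoped to a single router."""
    if router is None:
        return set(db.execute(BASE))
    return set(db.execute(BASE + " AND r.distip = ?", (router,)))


db = make_toy_grid()
everything = find_pairs(db)        # pairs behind both routers
scoped = find_pairs(db, "r1")      # only the pairs behind r1
```

On this data the scoped call returns the pairs (a, b) and (b, a), while the unscoped call also returns the two pairs behind `r2`.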

14

Approximate Cluster Finder

• When searching for N hosts with total memory of at least N*512 MB, we can approximate the query with “search for N hosts that each have at least 512 MB of memory”.

• This reduces the number of joins, or avoids them altogether.

• However, this won’t find, say, N/2 hosts with 256 MB and N/2 hosts with 768 MB.
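The relationship between the exact and the approximate memory condition can be checked with a few lines of Python. The function names are illustrative only. The per-host condition implies the total-memory condition, but not the other way round.

```python
def exact_ok(mem_mb, per_host_mb=512):
    """Original condition: total memory is at least N * per_host_mb."""
    return sum(mem_mb) >= per_host_mb * len(mem_mb)


def approx_ok(mem_mb, per_host_mb=512):
    """Approximate condition: every host has at least per_host_mb."""
    return all(m >= per_host_mb for m in mem_mb)


uniform = [512, 512, 512, 512]   # accepted by both conditions
mixed = [256, 256, 768, 768]     # total is 2048 MB, yet two hosts are too small

print(exact_ok(uniform), approx_ok(uniform))   # True True
print(exact_ok(mixed), approx_ok(mixed))       # True False
```

Because the per-host test looks at one host at a time, it needs no join across host rows, which is what makes the approximate query cheap.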

15

Approximate Cluster Finder

SELECT R.DISTIP, H1.DISTIP
FROM HOSTS H1, IPLINKS L1, ROUTERS R
WHERE H1.MEM_MB >= 512 AND H1.OS = 'LINUX' AND L1.BW_MBS >= 100
  AND ((L1.SRC = R.DISTIP AND L1.DEST = H1.DISTIP)
    OR (L1.DEST = R.DISTIP AND L1.SRC = H1.DISTIP))
  AND R.DISTIP IN (
    SELECT R.DISTIP
    FROM HOSTS H1, IPLINKS L1, ROUTERS R
    WHERE H1.MEM_MB >= 512 AND H1.OS = 'LINUX' AND L1.BW_MBS >= 100
      AND ((L1.SRC = R.DISTIP AND L1.DEST = H1.DISTIP)
        OR (L1.DEST = R.DISTIP AND L1.SRC = H1.DISTIP))
    GROUP BY R.DISTIP
    HAVING COUNT(*) >= 2)
ORDER BY R.DISTIP;

16

Scoped Approximate Cluster Finder

• Combine the approximate query with the scoped query.

• Scope the query to one randomly chosen router at a time. If no results are found, choose another random router and repeat the query.

• Approximate the N-host join for 512*N MB of total memory with a search for N hosts that each have at least 512 MB.

• Always a three-way join

– regardless of the size of the cluster being searched for, and thus very scalable

– may need to search multiple routers

17

Scoped Approximate Cluster Finder

SELECT H1.DISTIP
FROM HOSTS H1, IPLINKS L1, ROUTERS R
WHERE H1.MEM_MB >= 512 AND H1.OS = 'LINUX' AND L1.BW_MBS >= 100
  AND ((L1.SRC = R.DISTIP AND L1.DEST = H1.DISTIP)
    OR (L1.DEST = R.DISTIP AND L1.SRC = H1.DISTIP))
  AND R.DISTIP = X
  AND ROWNUM <= 2

The scoped approximate cluster finder has a fixed number of joins.
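The retry loop around the scoped approximate query might look like the following Python sketch. It runs against sqlite3, so Oracle's `ROWNUM <= N` is replaced with `LIMIT`. The toy schema, the function names, the `max_tries` cap, and the choice to visit routers in shuffled order without repeats are all assumptions made for illustration.

```python
import random
import sqlite3

# Three-way join, independent of the cluster size n
SCOPED_APPROX = """
SELECT h1.distip
FROM hosts h1, iplinks l1, routers r
WHERE h1.mem_mb >= 512 AND h1.os = 'linux' AND l1.bw_mbs >= 100
  AND ((l1.src = r.distip AND l1.dest = h1.distip)
    OR (l1.dest = r.distip AND l1.src = h1.distip))
  AND r.distip = ?
LIMIT ?
"""


def make_toy_grid():
    """Hypothetical two-router grid; not real RGIS data."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE routers (distip TEXT PRIMARY KEY);
        CREATE TABLE hosts (distip TEXT PRIMARY KEY, mem_mb INTEGER, os TEXT);
        CREATE TABLE iplinks (src TEXT, dest TEXT, bw_mbs INTEGER);
    """)
    db.executemany("INSERT INTO routers VALUES (?)", [("r1",), ("r2",)])
    db.executemany("INSERT INTO hosts VALUES (?, ?, ?)", [
        ("a", 512, "linux"), ("b", 512, "linux"),    # behind r1
        ("c", 1024, "linux"), ("d", 256, "linux"),   # behind r2
    ])
    db.executemany("INSERT INTO iplinks VALUES (?, ?, ?)", [
        ("r1", "a", 100), ("r1", "b", 100),
        ("r2", "c", 1000), ("r2", "d", 100),
    ])
    return db


def find_cluster(db, n, max_tries=10, rng=random):
    """Try randomly ordered routers until one yields n qualifying hosts."""
    routers = [row[0] for row in db.execute("SELECT distip FROM routers")]
    rng.shuffle(routers)                     # random order, no repeats
    for router in routers[:max_tries]:
        hosts = [row[0] for row in db.execute(SCOPED_APPROX, (router, n))]
        if len(hosts) == n:
            return router, hosts
    return None                              # gave up: no cluster found


db = make_toy_grid()
found = find_cluster(db, 2)    # only r1 has two hosts with >= 512 MB
missing = find_cluster(db, 3)  # None: no router has three such hosts
```

Each attempt costs one small three-way join, so the total cost grows with the number of routers tried rather than with the cluster size.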

18

Time bounded queries

• The query rewriter starts the query as a child process.

• The parent kills the child process if it returns no results before the deadline.
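A minimal sketch of this parent/child pattern in Python is shown below. It is not the RGIS query rewriter: the child here is just a short Python snippet standing in for a query, and `run_with_deadline` is a hypothetical name. It relies on `subprocess.run`, which kills the child and raises `TimeoutExpired` when the timeout elapses.

```python
import subprocess
import sys


def run_with_deadline(child_code, deadline_s):
    """Run child_code in a child Python process.

    Returns the child's stdout, or None if the deadline passes first.
    """
    try:
        done = subprocess.run(
            [sys.executable, "-c", child_code],
            capture_output=True, text=True, timeout=deadline_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child here
        return None
    return done.stdout.strip()


fast = run_with_deadline("print('r1 a b')", 10)                 # finishes in time
slow = run_with_deadline("import time; time.sleep(30)", 0.5)    # killed, gives None
```

The deadline plays the role of the `WITHIN 100 seconds` clause in the extended query syntax shown earlier.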

19

Limitations of Scoped and Approximate queries

• The returned results are a subset of those of the original query, so a scoped or approximate query may report no results even though the original query would eventually return some after running for a long time.

• Not all queries can be written as scoped or approximate queries.

• It is hard to automate the rewriting into scoped and approximate queries.

20

Performance Evaluation

• We need to populate the database with a large amount of data.

• Computational grids are still in their early stages.

– No large data sets are available.

– Use Smith MDS data for memory

• We generate synthetic grids that are representative of the Internet.

– Can generate very large grids

21

GridG Generated Synthetic Grids

• Three-level network: WAN, MAN, LAN. Nodes on the WAN and MAN levels are routers, while nodes on the LAN level are hosts.

• Links: IP links annotated with bandwidth and latency.

• Hosts: annotated with memory size, architecture, number of processors, CPU clock rate, disk size, etc.

• The user can control all the distributions and the size of the network.

22

GridG: Synthesizing Realistic Computational Grids

http://www.cs.northwestern.edu/~urgis/GridG

[Figure: GridG tool chain diagram. Labels: Base Topology Generator (Tiers); Structured Topology; Translation To Common Format; GridG Power Law Enforcer; Structured Topology that obeys power laws; GridG Annotator; Grid; Other transformations on common format (Cluster maker, etc.); GIS Simulator; DOT Visualization; Other Tools; RGIS Database.]

SC talk on Tuesday!

23

Experimental Setup

• Dell PowerEdge 4400: dual 1 GHz Xeon processors, 2 GB memory, 240 GB RAID 5 storage system.

• Oracle 9i Enterprise Edition, Red Hat Linux 7.1.

• Each test is repeated either 25 or 100 times, and we report the average value.
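A repeated-measurement harness of this kind can be sketched in a few lines of Python. This is not the authors' test driver: `mean_query_time` and the placeholder workload are hypothetical, and the timer is Python's `time.perf_counter`.

```python
import time


def mean_query_time(run_query, repeats=25):
    """Call run_query `repeats` times and return the mean wall-clock seconds."""
    elapsed = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)


# Placeholder workload standing in for one query execution
average = mean_query_time(lambda: sum(range(10_000)), repeats=25)
```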

24

Performance of Various Query Techniques with Cluster Finder

Cluster size | Standard | Scoped | Approx | Scoped Approx
2 | 21.44 | 2.27 | 7.62 | 1.16
4 | >7200 | 2047.9 | 7.48 | 1.32
8 | >9000 | >3600 | 7.46 | 1.43
16 | N/A | >3600 | 7.51 | 1.45
32 | N/A | >3600 | 7.65 | 5.96
64 | N/A | >3600 | >120 | 9.58

(Time to run the query, in seconds)

25

Performance of Scoped Approximate Queries

• Cluster Finder: Find N hosts, each running Linux, with total memory of at least N*512 MB, all connected to the same router, with a bisection bandwidth of at least 100 Mbit/s.

– Our running example

• Non-network query: Find N hosts with total memory of at least N*512 MB.

– No joins needed at all

26

Performance of Scoped Approximate Queries (2)

• Scalability with database size.

• Scalability with the complexity of queries.

• Scalability with concurrent users and update load.

27

Performance of Scoped Approximate Query (9.8K hosts, Cluster Finder)

28

Performance of Scoped Approximate Query (101K hosts, Cluster Finder)

29

Performance of Scoped Approximate Query (980K hosts, Cluster Finder)

30

Performance of Scoped Approximate Query (9.8K hosts, Non-network query)

31

Performance of Scoped Approximate Query (101K hosts, Non-network query)

32

Performance of Scoped Approximate Query (980K hosts, Non-network query)

33

Scalability with multiple concurrent users and background load

• Other research has shown that GIS servers must handle frequent updates while serving requests.

• GIS servers serve multiple concurrent users.

• Evaluate scoped approximate queries with concurrent users and update load.

• Concurrent users: execute queries repeatedly

• The update load: execute transactional updates on randomly selected hosts as fast as possible.

– About 200 updates/second

34

Performance of Scoped Approximate Query (9.8K hosts, Cluster Finder, with Concurrent Users, looking for 64 nodes)

35

Performance of Scoped Approximate Query (9.8K hosts, Non-network query, with Concurrent Users, looking for 64 nodes)

36

Conclusions

• We described and evaluated two query techniques that trade off query time against the size of the result set: scoped queries and approximate queries.

• Combining scoped and approximate queries can dramatically reduce response time and server load.

37

For more information

• GridG and related paper: http://www.cs.northwestern.edu/~urgis/GridG

“Synthesizing Realistic Computational Grids”, in Proceedings of SC03.

• RGIS and related paper: http://www.cs.northwestern.edu/~urgis/

“Nondeterministic Queries in a Relational Grid Information Service”, in Proceedings of SC03.
