Nondeterministic Queries in a Relational Grid Information Service
Peter A. Dinda, Dong Lu
Prescience Lab
Department of Computer Science
Northwestern University
http://plab.cs.northwestern.edu
2
Overview
• RGIS: GIS system based on the relational data model using SQL
• Complex compositional queries can be posed
– “Find me 16 hosts on the same LAN that together have 32 GB of RAM”
• Can be very expensive to answer
– Joins: worst case O(n^m) for m tables of size n
• Introduce nondeterminism
– User gets random sample of result set
– Automated query transformation
3
Outline
• Overview
• Model
• Implementation
• Nondeterministic queries
• Performance evaluation
• Related work
• Conclusions
D. Lu and P. Dinda, Synthesizing Realistic Computational Grids, SC 2003
D. Lu, J. Skicewicz, and P. Dinda, Scoped and Approximate Queries in a Relational Grid Information Service, Grid 2003
4
RGIS Model of a Grid
• Annotated network topology graph
• Annotation examples
– Hosts: memory, disk, OS, NICs, etc.
– Router/Switch: backplane bandwidth, ports
– Link: latency and bandwidth
• Highly dynamic data in streams, not DB
• Virtualization, Futures, Leases
– Virtual machines
[Figure: layered model of a grid. Software: module, endpoint. Network: host, router, iplink. Data link: macswitch, maclink. Physical: connectorswitch, connectorlink.]
6
[Figure: RGIS schema, tables grouped by layer]
• resource metadata
– insertids: id => time, rand
– virtuals: id => {id}
– futures: id => id
– leases: id => endtime
• security
– Users, Groups, Capabilities, Sessions, Resource limits…
– Essential views: user => {capability}, user => {resource limit}
• software layer
– moduleexecs: id, type, name, blob
– datasources: id, type
– endpoints: id, type, module, datasource
– modules: id, moduleexec
• network layer
– hosts: id, typeinfo, distip
– routers: id, typeinfo, distip
– routerbenchmarks: id, typeinfo, blob
– hostbenchmarks: id, typeinfo, blob
– iplinks: id, src, dest
– ipassocs: distip => {ip}
– ippaths: id, src, dest
• valid types (each with id, name, desc)
– archtypes, routertypes, switchtypes, linktypes, ostypes, osvendors, hardwarevendors, linktypes, pathtypes, moduletypes, endpointtypes, datatypes, hostbenchmarktypes, routerbenchmarktypes
• data link layer
– macswitches: id, typeinfo, distadx
– maclinks: id, typeinfo, src, dest
– macassocs: distadx => {adx}
– ipmacassoc: ip => macaddr
– connectormacassoc: adx => macaddr
• physical layer
– connectorswitches: id, typeinfo, distadx
– connectorlinks: id, typeinfo, src, dest
– connectorassocs: distadx => {adx}
Legend: Software, Network, Data Link, Physical, Metadata, Types, Security
8
RGIS Design (Per Site)
[Figure: per-site architecture diagram; components listed below]
• RDBMS: Oracle 9i Back End (Windows, Linux, Parallel Server, etc.)
– Use of Oracle is not a requirement of approach
– Schema, type hierarchy, indices, PL/SQL stored procedures for each object
• Oracle 9i Front End
– Transactional inserts and updates using stored procedures, queries using select statements (uses database’s access control)
• Update Manager
– Updates encrypted using asymmetric cryptography on network. Only those with appropriate keys have access
• Query Manager and Rewriter
• Interfaces for Users and Applications
– Web Interface
– SOAP Interface
– Authenticated Direct Interface
• Content Delivery Network Interface
– For loose consistency
– site-to-site (tentative)
9
RGIS Design (Intersite)
[Figure: three RGIS servers at sites A, B, and C, with “Update Push To Friend Site” arrows between them]
• Site RGIS server pushes local updates to friend sites
• Site RGIS server consolidates updates from site and friend sites
• Site RGIS server answers all queries originating from its site
10
Insert/Update/Delete
[Figure: insert, update, and delete performance versus Number of Hosts in Database (50,000; 500,000; 5,000,000). Series: Insert (Single), Insert (Bulk), Update (Single), Update (Batches of 100), Update (Bulk), Delete (Bulk).]
Dual Xeon 1 GHz, 2 GB, 8x36 GB RAID5, Oracle 9i
11
• 2,700 lines of authored SQL
• 4,000 lines of generated PL/SQL
• 22,000 lines of authored Perl
Main dependencies
• DBI to Oracle 9i
• SOAP::Lite
• CGI
Not finished yet!
12
RGIS Design (Per Site)
[Figure: the per-site design diagram repeated, with one component marked “This talk”]
14
Motivation
• Queries for compositions of resources easily expressed in SQL:
– “Find 2 hosts with Linux that together have 3 GB of RAM”
– select h1.insertid, h2.insertid from hosts h1, hosts h2 where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072
• But such queries can be very expensive to execute
• However, we typically don’t need the entire result set, just some rows, and not always the same ones
• And we need them in a bounded amount of time
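The two-host query from this slide can be tried on toy data. The following is a minimal sketch, assuming SQLite (via Python's `sqlite3`) instead of Oracle and a handful of invented host rows; the table layout uses only the columns the query itself names, and the `order by` clause is added here only to make the output stable.

```python
import sqlite3

# Toy data: (insertid, os, mem_mb). These rows are invented for illustration.
HOSTS = [
    (1, "LINUX", 2048),
    (2, "LINUX", 1024),
    (3, "WINDOWS", 4096),
    (4, "LINUX", 512),
]

def find_host_pairs(rows, min_total_mb=3072):
    """Run the slide's two-host query against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table hosts (insertid integer primary key, os text, mem_mb integer)"
    )
    conn.executemany("insert into hosts values (?, ?, ?)", rows)
    pairs = conn.execute(
        "select h1.insertid, h2.insertid from hosts h1, hosts h2 "
        "where h1.os='LINUX' and h2.os='LINUX' "
        "and h1.mem_mb + h2.mem_mb >= ? "
        "order by h1.insertid, h2.insertid",  # added only for stable output
        (min_total_mb,),
    ).fetchall()
    conn.close()
    return pairs

pairs = find_host_pairs(HOSTS)
print(pairs)  # [(1, 1), (1, 2), (2, 1)]
```

Note that the query as written examines every ordered pair of hosts, including a host paired with itself; a condition such as `h1.insertid < h2.insertid` would exclude those rows, but it is not part of the slide's query.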
15
Why Not Just Limit?
• Oracle rownum, MySQL limit clause
• “Return first k rows of result set”
• Problem: Always get the SAME answer
• Problem: May STILL take a long time
– Results not discovered until near the end
• Problem: Query time related to DATA as well as k
16
Query Approaches
• All results
• Scoped results (available in Grid 2003 paper)
• Approximate results (available in Grid 2003 paper)
• Nondeterministic results (this paper)
– Return random sample of result set
17
Nondeterministic Version of Query
select nondeterministically h1.insertid, h2.insertid
from hosts h1, hosts h2
where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072
within 2 seconds
18
Implementing non-deterministic queries
Using Oracle-Specific Extensions
The query
select nondeterministically h1.insertid, h2.insertid
from hosts h1, hosts h2
where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072
within 2 seconds
is rewritten by the Query Manager and Rewriter to
SELECT H1.INSERTID, H2.INSERTID
FROM HOSTS H1 SAMPLE(P), HOSTS H2 SAMPLE(P)
WHERE (H1.OS='LINUX' AND H2.OS='LINUX' AND H1.MEM_MB+H2.MEM_MB>=3072)
• Random sample of input tables with selection probability P determined by time constraint and server load
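A short piece of arithmetic shows why sampling the input tables bounds the work. If each of m table aliases of n rows is sampled independently with selection probability p, each combination of m rows survives with probability p^m, so the expected number of candidate combinations drops from n^m to (p*n)^m. The sketch below only evaluates that formula; the function name and the example numbers are illustrative assumptions, and the figure is a count of candidate combinations before the where clause is applied, not a query time.

```python
def expected_candidates(n, m, p=1.0):
    """Expected size of the cross product when each of m tables of n rows
    is sampled independently with selection probability p."""
    return (p * n) ** m

# Two-table join over a 50,000-host grid, unsampled and sampled at p = 0.001.
full = expected_candidates(50_000, 2)
sampled = expected_candidates(50_000, 2, p=0.001)
print(f"{full:.3g} candidate pairs without sampling")
print(f"{sampled:.3g} expected candidate pairs with p = 0.001")
```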
19
Implementing non-deterministic queries
Using Our Schema (Not Oracle-Specific): Rest of Talk
The query
select nondeterministically h1.insertid, h2.insertid
from hosts h1, hosts h2
where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072
within 2 seconds
is rewritten by the Query Manager and Rewriter to
SELECT H1.INSERTID, H2.INSERTID
FROM HOSTS H1, HOSTS H2, INSERTIDS TEMP_H1, INSERTIDS TEMP_H2
WHERE (H1.OS='LINUX' AND H2.OS='LINUX' AND H1.MEM_MB+H2.MEM_MB>=3072)
AND (H1.INSERTID=TEMP_H1.INSERTID AND TEMP_H1.rand > 982663452.975047 AND TEMP_H1.rand <= 1025613125.93505)
AND (H2.INSERTID=TEMP_H2.INSERTID AND TEMP_H2.rand > 1877769069.94039 AND TEMP_H2.rand <= 1920718742.90039)
• Random sample of input tables with selection probability P determined by time constraint and server load
20
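Implementing non-deterministic queries
[Figure: each host insertid has an associated random_number on a line from 0 to N. The sample is the set of rows whose random numbers fall in the window from x to x+y, where x is a random starting point and y=P*N.]
• Reshuffling requirement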
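The window scheme above can be sketched end to end. This is an illustrative sketch only, not the RGIS rewriter: it assumes SQLite instead of Oracle, invented host rows, N = 2^32 (an inference: the window widths in the rewritten query on the earlier slide are consistent with N = 2^32 and P = 0.01), and a starting point x drawn uniformly so that the window stays inside 0 to N. The helper names are hypothetical, and reshuffling is not modeled.

```python
import random
import sqlite3

N = 2 ** 32  # assumed range of the rand column (inferred, not stated on the slides)

def sample_window(p, rng=random):
    """Pick a random window (x, x + y] of width y = p * N inside [0, N]."""
    y = p * N
    x = rng.uniform(0, N - y)  # random starting point
    return x, x + y

def rewrite_two_host_query(p, rng=random):
    """Return (sql, params) for the two-host query, one window per table alias."""
    lo1, hi1 = sample_window(p, rng)
    lo2, hi2 = sample_window(p, rng)
    sql = (
        "select h1.insertid, h2.insertid "
        "from hosts h1, hosts h2, insertids temp_h1, insertids temp_h2 "
        "where (h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072) "
        "and (h1.insertid=temp_h1.insertid and temp_h1.rand > ? and temp_h1.rand <= ?) "
        "and (h2.insertid=temp_h2.insertid and temp_h2.rand > ? and temp_h2.rand <= ?)"
    )
    return sql, (lo1, hi1, lo2, hi2)

def build_demo_db(num_hosts=200, seed=0):
    """Create toy hosts and insertids tables; every host gets a random number."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table hosts (insertid integer primary key, os text, mem_mb integer)"
    )
    conn.execute("create table insertids (insertid integer primary key, rand real)")
    for i in range(num_hosts):
        conn.execute("insert into hosts values (?, 'LINUX', 2048)", (i,))
        conn.execute("insert into insertids values (?, ?)", (i, rng.uniform(0, N)))
    return conn

conn = build_demo_db()
sql, params = rewrite_two_host_query(p=0.1, rng=random.Random(1))
sampled = conn.execute(sql, params).fetchall()
everything = conn.execute(
    "select h1.insertid, h2.insertid from hosts h1, hosts h2 "
    "where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072"
).fetchall()
print(len(sampled), "of", len(everything), "result rows returned")
```

Every row the rewritten query returns is also a row of the full result set; a different starting point x yields a different subset, which is the source of the nondeterminism.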
21
Deadlines
• Hard-limiting
– Time-limited thread or process forked
• Climbing
– Start with low probability p and issue query; if no results, double probability and try again; keep going until no more time or have results
• Estimation
– Like climbing, but do polynomial estimation over previous runs to estimate if next run will exceed deadline
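The climbing strategy can be written as a short loop. This is a minimal sketch under stated assumptions: `run_query` is a hypothetical caller-supplied function that executes the rewritten query at selection probability p and returns a list of rows, and the starting probability and the cap at 1.0 are illustrative choices, not values from the talk.

```python
import time

def climb(run_query, deadline_s, p_start=0.0001, p_max=1.0, clock=time.monotonic):
    """Climbing: retry with doubled selection probability until there are
    results, the probability cannot grow further, or the deadline has passed.

    Returns (rows, p), where p is the last probability that was tried.
    """
    start = clock()
    p = p_start
    while True:
        rows = run_query(p)
        if rows:
            return rows, p
        if p >= p_max or clock() - start >= deadline_s:
            return [], p
        p = min(2 * p, p_max)

def fake_query(p):
    # Hypothetical stand-in: pretends results only appear once p >= 0.001.
    return [(1, 2)] if p >= 0.001 else []

rows, p_used = climb(fake_query, deadline_s=2.0)
print(rows, p_used)
```

The loop checks the clock only between runs, so a single slow run can still overshoot the deadline; that gap is what hard limiting or estimation would have to cover.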
23
GridG: Synthesizing Realistic Computational Grids
http://www.cs.northwestern.edu/~urgis/GridG
• Generates a Grid as an annotated layer 3 topology
– Hosts, routers, links
• Graph conforms to power laws of Internet topology
• Annotations include:
– memory, clock speed, cpu type, number of CPUs, operating system type, link bandwidths, router bandwidths, etc.
– Memory distribution according to Smith study of MDS contents
24
Test Grids
Grid Size (Hosts)   Query
50,000              “Find n hosts with 3 GB of memory”
500,000             “Find n hosts with 3 GB of memory”
5,000,000           “Find n hosts with 3 GB of memory”
10,000              “Find 2 close hosts”
50,000              “Find 2 close hosts”
100,000             “Find 2 close hosts”
25
Nondeterministic query performance
Select two hosts that together have >3GB of RAM
[Figure: Query Time and Number of Results versus Selection Probability (0.0001, 0.001, 0.002)]
Meaningful tradeoff between query processing time and result set size is possible
26
Nondeterministic query performance
Select n hosts that together have >3GB of RAM, holding query time constant
[Figure: Query Time and Number of Results versus Number of Hosts, annotated with selection probabilities p=0.0001, p=0.00001, p=0.000005, p=0.00000001]
Can use tradeoff to control query time independent of query complexity
27
Deadlines
Find 2 hosts with collective 600 GB RAM (VERY RARE) in 50K host grid
[Figure: Min and Max for each Mechanism (Climbing, Climbing+Hard Limiting, Estimation, Estimation+Hard Limiting), shown against the Target Deadline]
28
Extending RGIS to Support Grid Computing On Virtual Machines
• Virtuals
– Each RGIS object has a unique id
– Virtualization table associates unique id of virtual resources with unique ids of their constituent physical resources
– Virtual nature of resource is hidden unless query explicitly requests it
• Futures
– An RGIS object that does not exist yet
– Futures table of unique ids
– Future nature of resource hidden unless query explicitly requests it
29
Related Work
• SLP, X.500, LDAP
• Condor ClassAds
• MDS
• R-GMA
• Redline
• Random sampling from databases
– Olsen, others
30
Conclusions
• GIS system based on relational data model
• Powerful queries, but expensive to execute
• Nondeterminism to control query time
– Can be implemented without RDBMS support
– Automated query translation in RGIS
• Several techniques to implement deadlines for queries
31
People and Acknowledgements
• Students
– Jason Skicewicz, Andrew Weinrich (Web + SOAP), Jack Lange (CDN)
• Collaborator
– Relational Grid Resources Project at Indiana: Beth Plale, http://www.cs.indiana.edu/~plale/projects/RGR
• Funder
– NSF
32
For More Information
• URGIS Site
– http://www.cs.northwestern.edu/~urgis
• Prescience Lab
– http://plab.cs.northwestern.edu
Special Advertising Section: Join The User Comfort Study! http://comfort.cs.northwestern.edu