IrisNet
Cache-and-Query for Wide Area Sensor Databases
Amol Deshpande, UC Berkeley
Suman Nath, CMU
Phillip Gibbons, Intel Research Pittsburgh
Srinivasan Seshan, CMU
Presented by David Yates, April 9, 2004
June 2003 2IrisNet
Outline
• Overview of IrisNet
• Example application: Parking Space Finder
• Query processing in IrisNet
  • Data partitioning
  • Distributed query execution
• Conclusions
• Critique

Internet-scale Resource-intensive Sensor Network Services (IrisNet)
• Motivation
  • Proliferation of resource-intensive sensors attached to powerful devices
    • Webcams, pressure gauges, microphones
  • Rich data sources with high data volumes
  • Typically distributed over wide geographical areas
  • Useful services utilizing such sensors are missing
• IrisNet: An infrastructure to support deployment of sensor services over such sensors

IrisNet: Design Goals
• Ease of deployment of sensor services
  • Minimal requirements from the service provider
• Distributed data storage and querying for high throughput
• Ease of querying
  • XML as the data format, XPATH as the query language
  • Natural geographical hierarchy on data as well as queries
• Continuously evolving data
• Location transparency
  • Logical view of the entire distributed database as a single centralized XML document

IrisNet Architecture
• Sensing Agents (SA)
  • PDA/PC-class processor, MBs–GBs storage
  • Collect & process data from sensors, as dictated by “senselet” code uploaded by OAs
  • Processed data sent to the OAs for update in-place
• Organizing Agents (OA)
  • PC/Server-class processor, GBs storage
  • Provide data storage, discovery, querying facilities
  • Use an off-the-shelf database to store data locally
  • Interface with the local database using XPATH/XSLT
[Figure: SAs sending processed data to OAs]

Example Application: Parking Space Finder (PSF)
• Webcams monitor parking spaces and provide real-time information about their availability
• Image processing to extract availability information
• Natural geographical hierarchy on the data
[Figure: hierarchy County (Allegheny) > City (Pittsburgh) > Neighborhood (Oakland) > Block 1, Block 2, Block 3 > Parking space 1, Parking space 2, Parking space 3]

Example XML Fragment for PSF
<State id="Pennsylvania">
  <County id="Allegheny">
    <City id="Pittsburgh">
      <Neighborhood id="Oakland">
        <total-spaces>200</total-spaces>
        <Block id="1">
          <GPS>…</GPS>
          <pSpace id="1">
            <in-use>no</in-use>
            <metered>yes</metered>
          </pSpace>
          <pSpace id="2">
            …
          </pSpace>
        </Block>
      </Neighborhood>
      <Neighborhood id="Shadyside">
        …

Example XML Fragment for PSF (tree view)
[Figure: the same fragment drawn as a tree: State (Pennsylvania) > County (Allegheny) > City (Pittsburgh) > Neighborhoods (Oakland, Shadyside) > Blocks (1, 2) > pSpaces (1, 2), with leaf elements in-use, metered, Price, GPS and total-spaces]

Example Queries
• Users issue queries against the document as a whole
  • Find all available parking spots in Oakland
    /State[@id="Pennsylvania"]/County[@id="Allegheny"]/City[@id="Pittsburgh"]/Neighborhood[@id="Oakland"]/Block/pSpace[in-use = "no"]
  • Find all blocks in Allegheny that have more than 20 metered parking spots
    /State[@id="Pennsylvania"]/County[@id="Allegheny"]//Block[count(./pSpace[metered = "yes"]) > 20]
  • Find the cheapest parking spot in Oakland Block 1
    /State[@id="Pennsylvania"]/County[@id="Allegheny"]/City[@id="Pittsburgh"]/Neighborhood[@id="Oakland"]/Block[@id="1"]/pSpace[not(../pSpace/price < ./price)]
• Challenge: Evaluate arbitrary XPATH queries against the document even though the document may be partitioned across multiple OAs
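As a rough illustration of what these three queries compute, the sketch below runs them against a toy, single-site copy of the document using Python's standard xml.etree.ElementTree. The sample data (three parking spaces and their price values) is made up, and the helper name blocks_with_metered is hypothetical. ElementTree supports only a subset of XPath, so the count() and cheapest-price predicates are computed in plain Python; this is not how IrisNet itself evaluates queries.

```python
import xml.etree.ElementTree as ET

# Toy PSF document. Element names follow the slides; the values are invented.
PSF_XML = """
<State id="Pennsylvania">
  <County id="Allegheny">
    <City id="Pittsburgh">
      <Neighborhood id="Oakland">
        <Block id="1">
          <pSpace id="1"><in-use>no</in-use><metered>yes</metered><price>2.0</price></pSpace>
          <pSpace id="2"><in-use>yes</in-use><metered>yes</metered><price>1.5</price></pSpace>
          <pSpace id="3"><in-use>no</in-use><metered>no</metered><price>0.5</price></pSpace>
        </Block>
      </Neighborhood>
    </City>
  </County>
</State>
"""

root = ET.fromstring(PSF_XML)  # root is the <State> element

# Paths are relative to <State>, so the /State[...] step is implicit here.
ALLEGHENY = "./County[@id='Allegheny']"
OAKLAND = ALLEGHENY + "/City[@id='Pittsburgh']/Neighborhood[@id='Oakland']"

# Query 1: available spots in Oakland.
available = root.findall(OAKLAND + "/Block/pSpace[in-use='no']")
available_ids = [p.get("id") for p in available]


def blocks_with_metered(threshold):
    """Query 2: ids of blocks in Allegheny with more than `threshold` metered spots."""
    county = root.find(ALLEGHENY)
    return [b.get("id") for b in county.iter("Block")
            if len(b.findall("pSpace[metered='yes']")) > threshold]


# Query 3: cheapest spot in Oakland Block 1 (no other spot has a lower price).
block1 = root.find(OAKLAND + "/Block[@id='1']")
cheapest = min(block1.findall("pSpace"), key=lambda p: float(p.findtext("price")))

print(available_ids, blocks_with_metered(1), cheapest.get("id"))
```

In the real system the same logical queries are posed against the whole document, and the work of locating the fragments that hold the answer is hidden from the user.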

Data Partitioning and Query Processing: Overview
• Maintain data partitioning invariants
  • Used to guarantee that an OA always has sufficient information to participate correctly in a query
• Use DNS to maintain the data distribution information and to route queries to data
• Convert the XPATH query to an XSLT query that:
  • Walks the document recursively
  • Evaluates the part of the query that can be done locally
  • Gathers missing information by asking subqueries

Partitioning Granularity
• Definition: An IDable node in the document
  • Has an “id” attribute with value unique among its siblings
  • All its ancestors in the document are IDable
[Figure: PSF document tree rooted at City (Pittsburgh), with Neighborhood (Oakland, Shadyside), Block (1, 2) and pSpace (1, 2) nodes carrying ids, and leaf elements in-use, metered, Price, GPS and total-spaces]

Partitioning Granularity
• Definition: Local Information of an IDable node
  • All its attributes and all its non-IDable descendants
  • IDs of all its IDable children
[Figure: the same PSF document tree rooted at City (Pittsburgh)]
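The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming the node passed in is itself IDable and reading "unique among its siblings" literally (the id value is compared across all siblings regardless of element name). The function names and the sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def idable_children(node):
    """Children of an IDable node that are IDable themselves: they carry an
    'id' attribute whose value is unique among their siblings."""
    counts = Counter(c.get("id") for c in node if c.get("id") is not None)
    return [c for c in node
            if c.get("id") is not None and counts[c.get("id")] == 1]


def local_information(node):
    """Local information of an IDable node: its attributes, its non-IDable
    descendants, and the ids of its IDable children."""
    idable = idable_children(node)
    idable_keys = {id(c) for c in idable}
    return {
        "tag": node.tag,
        "attributes": dict(node.attrib),
        # A non-IDable child takes its whole subtree with it, because a node
        # can only be IDable if all of its ancestors are.
        "non_idable": [ET.tostring(c, encoding="unicode").strip()
                       for c in node if id(c) not in idable_keys],
        "idable_child_ids": [(c.tag, c.get("id")) for c in idable],
    }


# Sample fragment modeled on the Oakland neighborhood from the slides.
oakland = ET.fromstring(
    '<Neighborhood id="Oakland">'
    '<total-spaces>200</total-spaces>'
    '<Block id="1"><pSpace id="1"><in-use>no</in-use></pSpace></Block>'
    '<Block id="2"/>'
    '</Neighborhood>'
)
info = local_information(oakland)
print(info)
```

Note that the contents of Block 1 do not appear in Oakland's local information; only the block's id does, which is what lets the blocks be stored on a different OA.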

Data Partitioning
• Data storage and ownership are always in units of the local information corresponding to the IDable nodes in the document
  • These form a nearly-disjoint partitioning of the overall document
  • Granularity can be controlled using the “id” attributes
  • A partitioning unit can be uniquely identified using the “id”s on the path to the root of the document
• Data ownership:
  • Each partitioning unit is owned by exactly one OA

Data Partitioning
• Data stored locally at each OA:
  • A document fragment consisting of a union of partitioning units
• Constraints:
  • Must store the document fragment it owns
  • If it stores the “id” of an IDable node, it must also store the local information of all its ancestors
  • We minimize the amount of information that must be stored (details in paper)
    • Only need to store IDs of all ancestors, and of their children
• Invariant:
  • If an OA has the “id” of an IDable node, it either
    • Has the local information for the node, or
    • Has the “id”s on the path to the root, allowing it to locate the local information for that node
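One way to picture the invariant is as a check over what a single OA has stored. The sketch below is an assumption-laden model, not the paper's data structure: each stored node is keyed by its path of (element, id) pairs from the root, and marked either as having full local information or only an id. The names LOCAL_INFO, ID_ONLY and missing_ancestors, and both example stores, are hypothetical.

```python
LOCAL_INFO, ID_ONLY = "local-info", "id-only"


def missing_ancestors(store):
    """Return {path: [ancestor paths absent from the store]} for every stored
    node whose path to the root is not fully present."""
    problems = {}
    for path in store:
        absent = [path[:k] for k in range(1, len(path)) if path[:k] not in store]
        if absent:
            problems[path] = absent
    return problems


COUNTY = (("County", "Allegheny"),)
CITY = COUNTY + (("City", "Pittsburgh"),)
OAKLAND = CITY + (("Neighborhood", "Oakland"),)
SHADYSIDE = CITY + (("Neighborhood", "Shadyside"),)

# A store in the minimized form: full local information for the node of
# interest, ids only for its ancestors and for the ancestors' children.
ok_store = {COUNTY: ID_ONLY, CITY: ID_ONLY,
            OAKLAND: LOCAL_INFO, SHADYSIDE: ID_ONLY}

# A store that knows Oakland but nothing about the path leading to it.
broken_store = {OAKLAND: LOCAL_INFO}

print(missing_ancestors(ok_store))
print(missing_ancestors(broken_store))
```

With the path present, an OA that holds only the id of a node can still name the node completely, which is what allows it to find the OA holding the local information.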

Data Partitioning: Example
[Figure: document tree (County Allegheny > City Pittsburgh > Neighborhoods Oakland, Shadyside > Blocks 1, 2 > pSpaces 1, 2, 3) divided into a region owned by OA 1 and a region owned by OA 2]

Data Partitioning: Example
Data storage configuration at OA 1
[Figure: the same tree, with each region labeled as "local information required" or "local information optional" at OA 1]

Data Partitioning: Example
Data storage configuration at OA 2
[Figure: the same tree, with each region labeled as "local information required" or "local information optional" at OA 2]

Mapping Data to OAs
• Mapping of nodes to physical OAs maintained using DNS
• For each IDable node, create a unique DNS-style name by concatenating the IDs on the path to the root
• Mapped to OA 1:
  • Allegheny-County….iris.net
  • Pittsburgh-City.Allegheny-County….iris.net
• Mapped to OA 2:
  • Oakland-Neighborhood.Pittsburgh-City.Allegheny-County….iris.net
  • 1-Block.Oakland-Neighborhood.Pittsburgh-City.Allegheny-County….iris.net
  • 1-pSpace.1-Block.Oakland-Neighborhood.Pittsburgh-City.Allegheny-County….iris.net
  • …
[Figure: document tree (County Allegheny > City Pittsburgh > Neighborhoods Oakland, Shadyside > Blocks 1, 2 > pSpaces 1, 2, 3) divided into a region owned by OA 1 and a region owned by OA 2]
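The naming rule above can be sketched as a small function. This is an illustration only: the function name dns_name is hypothetical, the suffix parking.intel-iris.net is taken from the full name shown on the self-starting queries slide, and the RECORDS dictionary merely stands in for real DNS records.

```python
def dns_name(path, suffix):
    """Build a DNS-style name from a root-to-node path of (element, id) pairs.
    The node's own label comes first, the root's label last."""
    labels = [f"{node_id}-{tag}" for tag, node_id in reversed(path)]
    return ".".join(labels + [suffix])


SUFFIX = "parking.intel-iris.net"

STATE = [("State", "Pennsylvania")]
COUNTY = STATE + [("County", "Allegheny")]
CITY = COUNTY + [("City", "Pittsburgh")]
OAKLAND = CITY + [("Neighborhood", "Oakland")]
BLOCK1 = OAKLAND + [("Block", "1")]

# Stand-in for the DNS records: name -> OA, following the slide's assignment.
RECORDS = {
    dns_name(COUNTY, SUFFIX): "OA 1",
    dns_name(CITY, SUFFIX): "OA 1",
    dns_name(OAKLAND, SUFFIX): "OA 2",
    dns_name(BLOCK1, SUFFIX): "OA 2",
}


def owner(path):
    """Look up which OA a node is mapped to."""
    return RECORDS[dns_name(path, SUFFIX)]


print(dns_name(BLOCK1, SUFFIX), "->", owner(BLOCK1))
```

Because the name is derived purely from the ids on the path, any party that knows a node's path can compute where to send a request without consulting a central directory.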

Self-Starting Distributed Queries
• Each query has a hierarchical prefix
  /State[@id='Pennsylvania']/County[@id='Allegheny']/City[@id='Pittsburgh']/Neighborhood[@id='Oakland']/Block/pSpace
• Simple parsing of the query to extract the least common ancestor (LCA) of the possible query result
• Send the query to Oakland-Neighborhood.Pittsburgh-City.Allegheny-County.Pennsylvania-State.parking.intel-iris.net
• Name extracted from query without any global or per-service state
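The "simple parsing" step could look roughly like the sketch below, which peels off leading steps of the form /Element[@id='value'] and stops at the first step that does not pin down an id. The regular expression and the function names are assumptions made for illustration; the slides do not show the actual parser.

```python
import re

# One leading step of the form /Element[@id='value'] (single or double quotes).
STEP = re.compile(r"/([A-Za-z_][\w.-]*)\[@id=(['\"])([^'\"]*)\2\]")


def hierarchical_prefix(xpath):
    """Return the (element, id) pairs of the query's hierarchical prefix."""
    path, pos = [], 0
    while True:
        match = STEP.match(xpath, pos)
        if match is None:
            return path
        path.append((match.group(1), match.group(3)))
        pos = match.end()


def lca_name(xpath, suffix):
    """DNS-style name of the LCA, built from the query text alone."""
    labels = [f"{node_id}-{tag}" for tag, node_id in reversed(hierarchical_prefix(xpath))]
    return ".".join(labels + [suffix])


QUERY = ("/State[@id='Pennsylvania']/County[@id='Allegheny']"
         "/City[@id='Pittsburgh']/Neighborhood[@id='Oakland']/Block/pSpace")

print(lca_name(QUERY, "parking.intel-iris.net"))
```

A query such as /State[@id='Pennsylvania']//Block stops after the first step, so it would be routed to the state-level name.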

QEG Details
• Nesting depth of an XPATH query
  • Maximum depth at which a location path that traverses over IDable nodes occurs in the query
• Examples (query, nesting depth):
  • /a[@id='x']/b[@id='y']/c : 0
  • /a[@id='x']//c : 0
  • /a[./b/c]/b : 1 (if b is IDable)
  • /a[count(./b/[./c[@id='1']]) : 2
• Complexity of evaluating a query increases with nesting depth

Queries with Nesting Depth = 0
• Any predicate in the query can be evaluated using just the local information for an IDable node
  • Example: …/Block[@id='1'][./available-spaces > 10]
• Sketch of the XSLT program:
  • Walk the document recursively
  • If local information for the node under consideration is available, evaluate the part of the query that refers to that node; otherwise tag the returned answer with the tag “asksubquery”
• Postprocessor finds the missing information by asking subqueries
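The recursive walk can be mimicked in Python to show the control flow. This is a stand-in for the generated XSLT program, not a reproduction of it: the query is simplified to a list of (element, id-or-None) steps, and the attribute status="id-only" is a hypothetical marker meaning that the local OA holds only the id of that node.

```python
import xml.etree.ElementTree as ET


def walk(node, steps, path=()):
    """Evaluate the remaining `steps` below `node`.

    Returns a list of ("answer", path) entries for matches found locally and
    ("asksubquery", path, remaining_steps) entries for nodes whose local
    information is not available at this OA."""
    here = path + ((node.tag, node.get("id")),)
    if node.get("status") == "id-only":
        return [("asksubquery", here, steps)]
    if not steps:
        return [("answer", here)]
    tag, wanted = steps[0]
    results = []
    for child in node:
        if child.tag == tag and (wanted is None or child.get("id") == wanted):
            results.extend(walk(child, steps[1:], here))
    return results


# Local fragment at one OA: Block 1 is stored in full, Block 2 by id only.
FRAGMENT = ET.fromstring(
    '<Neighborhood id="Oakland">'
    '<Block id="1"><pSpace id="1"/><pSpace id="2"/></Block>'
    '<Block id="2" status="id-only"/>'
    '</Neighborhood>'
)

# Corresponds to the tail .../Block/pSpace of a query routed to Oakland.
results = walk(FRAGMENT, [("Block", None), ("pSpace", None)])
for entry in results:
    print(entry)
```

A postprocessor would then turn each "asksubquery" entry into a subquery addressed to the OA that owns that node and merge the returned answers into the result.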

Caching
• A site can add to its document any fragment as long as the data partitioning constraints are satisfied
• We generalize subqueries to fetch the smallest superset of the answer that satisfies the constraints and cache it
• Data time-stamped at the time of caching
• Queries can specify freshness requirements
Further Details in PaperFurther Details in Paper
•Queries with Nesting Depth > 0
•Schema changes
•Data partitioning changes
•Implementation details and experimental study

Conclusions
• Identified the challenges in query processing over a distributed XML document
• Developed formal framework and techniques that
  • Allow for flexible document partitioning
  • Integrate caching seamlessly
  • Correctly and efficiently answer XPATH queries
• Experimental results demonstrate the advantages of flexible data partitioning and caching

Further Information
• IrisNet project website
  • http://www.intel-iris.net

Outline
• Overview of IrisNet
• Example application: Parking Space Finder
• Query processing in IrisNet
  • Data partitioning
  • Distributed query execution
• Conclusions
  • Performance Study
• Critique

Performance Study Setup
• Current prototype written in Java
• A cluster of nine 2 GHz Pentium IV machines
• Apache Xindice used as the backend XML database
• Artificially generated database
  • 2400 parking spaces with 2 cities, 6 neighborhoods and 120 blocks
• Five query workloads
  • QW-1: Asking for a single block
  • QW-2: Asking for two blocks from a single neighborhood
  • QW-3: Asking for two blocks from two neighborhoods
  • QW-4: Asking for two blocks from two cities
  • QW-Mix: 40% each of QW-1 and QW-2, 15% QW-3, 5% QW-4
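For readers who want to reproduce a workload of this shape, the sketch below draws query types according to the QW-Mix proportions, reading "40% of QW-1 and QW-2" as 40% each so that the shares sum to 100%. The per-block figure assumes the 2400 spaces are spread evenly over the 120 blocks, which the slides do not state. All names are hypothetical.

```python
import random

# QW-Mix proportions (assumed: 40% each for QW-1 and QW-2).
QW_MIX = {"QW-1": 0.40, "QW-2": 0.40, "QW-3": 0.15, "QW-4": 0.05}

# Database shape from the slide; an even spread is an assumption.
SPACES, BLOCKS = 2400, 120
SPACES_PER_BLOCK = SPACES // BLOCKS


def sample_workload(n, seed=0):
    """Draw n query types according to the QW-Mix proportions."""
    rng = random.Random(seed)
    return rng.choices(list(QW_MIX), weights=list(QW_MIX.values()), k=n)


workload = sample_workload(1000)
print(SPACES_PER_BLOCK, {qw: workload.count(qw) for qw in QW_MIX})
```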

Architectures Compared
[Figure: two architectures over the City > Neighborhood > Block > Parking Space hierarchy, each receiving queries and SA updates: (1) Centralized; (2) Centralized querying, distributed update]

Caching
• Architecture already allows for caching data
  • An OA is allowed to store more data than it owns
  • Data time-stamped at the time of caching
  • Queries can specify freshness tolerance

Architectures Compared
[Figure: two further architectures over the City > Neighborhood > Block > Parking Space hierarchy, each with SA updates and a DNS server: (3) Distributed querying/update, fixed two-level organization; (4) Distributed querying/updates, hierarchical organization]

Query Throughputs
[Figure: query throughput results; the chart is not recoverable from the transcript]

Data Partitioning: Example 2
[Figure: document tree (County Allegheny > City Pittsburgh > Neighborhood Oakland > Blocks 1, 2 > pSpaces 1, 2, 3) divided into a region owned by OA 1 and a region owned by OA 2]
• e.g., OA 2 must store local information of the County (Allegheny) node

Conclusions
• Location transparency
  • Distributed DB hidden from user
• Flexible data partitioning
• Low latency queries & query scalability
  • Direct query routing to LCA of the answer
  • Query-driven caching, supporting partial matches
  • Load shedding; no per-service state needed at web servers
• Support query-based consistency
• Use off-the-shelf DB components

Example XML Fragment for PSF
…
<County id="Allegheny">
  <City id="Pittsburgh">
    <Neighborhood id="Oakland">
      <available-spaces>8</available-spaces>
      <Block id="1">
        <pSpace id="1">
          <in-use>no</in-use>
          <metered>yes</metered>
        </pSpace>
        …
      </Block>
    </Neighborhood>
  </City>
</County>
…
[Figure: the fragment drawn as a tree: County (Allegheny) > City (Pittsburgh) > Neighborhood (Oakland) > Blocks 1, 2 > pSpaces 1, 2, 3, with leaf elements in-use, GPS and metered]

What I liked (strengths)
• In general, this is a very good idea paper, but a mediocre evaluation paper
• Application scenario is different from other sensor database work; the data model is novel and doesn’t share constraints with some other work
• Location transparency is elegant: a logical view of the distributed database as a single centralized database
• XML has some distinct advantages, e.g., it facilitates dynamic update of the database schema
• XML also provides standard query interfaces, e.g., XPATH and XSLT
• Query-based consistency lets an application bypass a cache if the data is too stale (i.e., old)
• Partial match caching is a clever optimization that leverages the cache invariants in the distributed XML database

What I didn’t like (weaknesses)
• Proposed cache-and-query system is tied to TCP/IP networks, and to DNS in particular
• Implemented distributed query processing without true distributed caching; the authors admit that selective bypassing of caching is needed (at a minimum)
• The experimental setup used is not realistic (a distributed database that isn’t really distributed)
• Evaluation covers only queries, without concurrent updates; it really needs both, e.g., 100% queries (baseline); 95% queries with 5% updates; 90% queries with 10% updates; 80% with 20%; 60% with 40%

Possible Future Work
• Perform evaluation in a distributed environment with more realistic network problems (e.g., network latency, packet delay and loss); perhaps this would make caching more important
• Add distributed caching, e.g., selective bypass of caches
• Perform evaluation with a query + update workload
• Experiment with caching policies other than “cache everything everywhere”
• Explore other distributed database schemes (for XML)
• Explore other techniques for distributing data and distributing caching