
IrisNet

Cache-and-Query for Wide Area Sensor Databases

Amol Deshpande, UC Berkeley
Suman Nath, CMU
Phillip Gibbons, Intel Research Pittsburgh
Srinivasan Seshan, CMU

Presented by David Yates, April 9, 2004

June 2003 2IrisNet

Outline

• Overview of IrisNet
• Example application: Parking Space Finder
• Query processing in IrisNet
  • Data partitioning
  • Distributed query execution
• Conclusions
• Critique

Internet-scale Resource-intensive Sensor Network Services (IrisNet)

• Motivation
  • Proliferation of resource-intensive sensors attached to powerful devices
    • Webcams, pressure gauges, microphones
  • Rich data sources with high data volumes
  • Typically distributed over wide geographical areas
  • Useful services that utilize such sensors are missing
• IrisNet: an infrastructure to support deployment of sensor services over such sensors

IrisNet: Design Goals

• Ease of deployment of sensor services
  • Minimal requirements from the service provider
  • Distributed data storage and querying for high throughput
• Ease of querying
  • XML as the data format, XPATH as the query language
  • Natural geographical hierarchy on data as well as queries
• Continuously evolving data
• Location transparency
  • Logical view of the entire distributed database as a single centralized XML document

IrisNet Architecture

• Sensing Agents (SA)
  • PDA/PC-class processor, MBs–GBs storage
  • Collect & process data from sensors, as dictated by “senselet” code uploaded by OAs
  • Processed data sent to the OAs for update in-place
• Organizing Agents (OA)
  • PC/Server-class processor, GBs storage
  • Provide data storage, discovery, querying facilities
  • Use an off-the-shelf database to store data locally
  • Interface with the local database using XPATH/XSLT

[Figure: SAs connected to OAs]


Example Application: Parking Space Finder (PSF)

• Webcams monitor parking spaces and provide real-time information about their availability
• Image processing to extract availability information
• Natural geographical hierarchy on the data

[Figure: hierarchy County (Allegheny) > City (Pittsburgh) > Neighborhood (Oakland) > Block 1, Block 2, Block 3 > Parking space 1, Parking space 2, Parking space 3]

Example XML Fragment for PSF

<State id="Pennsylvania">
  <County id="Allegheny">
    <City id="Pittsburgh">
      <Neighborhood id="Oakland">
        <total-spaces>200</total-spaces>
        <Block id="1">
          <GPS>…</GPS>
          <pSpace id="1">
            <in-use>no</in-use>
            <metered>yes</metered>
          </pSpace>
          <pSpace id="2">
            …
          </pSpace>
        </Block>
      </Neighborhood>
      <Neighborhood id="Shadyside">
        …

[Figure: the PSF XML fragment drawn as a tree. State (id = 'Pennsylvania') > County (id = 'Allegheny') > City (id = 'Pittsburgh') > Neighborhood (id = 'Oakland', with total-spaces 200; id = 'Shadyside') > Block (id = '1', id = '2') > pSpace (id = '1', id = '2'), with leaf elements in-use (no), metered (yes), Price, and GPS]

Example Queries

• Users issue queries against the document as a whole
  • Find all available parking spots in Oakland:
    /State[@id="Pennsylvania"]/County[@id="Allegheny"]/City[@id="Pittsburgh"]/Neighborhood[@id="Oakland"]/Block/pSpace[in-use = "no"]
  • Find all blocks in Allegheny that have more than 20 metered parking spots:
    /State[@id="Pennsylvania"]/County[@id="Allegheny"]//Block[count(./pSpace[metered = "yes"]) > 20]
  • Find the cheapest parking spot in Oakland Block 1:
    /State[@id="Pennsylvania"]/County[@id="Allegheny"]/City[@id="Pittsburgh"]/Neighborhood[@id="Oakland"]/Block[@id="1"]/pSpace[not(../pSpace/price < ./price)]
• Challenge: evaluate arbitrary XPATH queries against the document even though the document may be partitioned across multiple OAs
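To make the "available parking spots in Oakland" query concrete, here is a minimal sketch that runs it against a single, unpartitioned copy of the document. It is not IrisNet code: the miniature document, its values, and the helper name `available_spaces` are assumptions for illustration. Python's standard `xml.etree.ElementTree` supports only a subset of XPATH, so the `in-use` test is applied in Python rather than inside the path expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the PSF document; ids and values are illustrative only.
DOC = """
<State id="Pennsylvania">
  <County id="Allegheny">
    <City id="Pittsburgh">
      <Neighborhood id="Oakland">
        <Block id="1">
          <pSpace id="1"><in-use>no</in-use><metered>yes</metered></pSpace>
          <pSpace id="2"><in-use>yes</in-use><metered>yes</metered></pSpace>
        </Block>
        <Block id="2">
          <pSpace id="1"><in-use>no</in-use><metered>no</metered></pSpace>
        </Block>
      </Neighborhood>
    </City>
  </County>
</State>
"""


def available_spaces(root, county, city, neighborhood):
    """Return (block id, pSpace id) pairs whose in-use element is 'no'.

    `root` is the State element, so the path starts at County.
    """
    path = (f"./County[@id='{county}']/City[@id='{city}']"
            f"/Neighborhood[@id='{neighborhood}']/Block")
    found = []
    for block in root.findall(path):
        for space in block.findall("pSpace"):
            if space.findtext("in-use") == "no":
                found.append((block.get("id"), space.get("id")))
    return found


root = ET.fromstring(DOC)
result = available_spaces(root, "Allegheny", "Pittsburgh", "Oakland")
print(result)
```

The hard part the slides address is what this sketch leaves out: here the whole document sits in one process, whereas in IrisNet it is split across OAs.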

Data Partitioning and Query Processing: Overview

• Maintain data partitioning invariants
  • Used to guarantee that an OA always has sufficient information to participate correctly in a query
• Use DNS to maintain the data distribution information and to route queries to data
• Convert the XPATH query to an XSLT query that:
  • Walks the document recursively
  • Evaluates the part of the query that can be done locally
  • Gathers missing information by asking subqueries


Partitioning Granularity

• Definition: an IDable node in the document
  • Has an “id” attribute with a value unique among its siblings
  • All its ancestors in the document are IDable

[Figure: document tree rooted at City (id = 'Pittsburgh'), with Block (id = '1', id = '2'), pSpace (id = '1', id = '2'), and the elements in-use (no), metered (yes), Price, GPS, and total-spaces (200)]

• Definition: local information of an IDable node
  • All its attributes and all its non-IDable descendants
  • IDs of all its IDable children

[Figure: the document tree with Block (id = '1', id = '2'), pSpace (id = '1', id = '2'), in-use, metered, Price, GPS, and total-spaces, illustrating the local information of a node]
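The two definitions (IDable node, local information) can be sketched in a few lines. This is an illustrative reading of the definitions, not the authors' implementation: the function names `idable_children` and `local_information` and the sample document are assumptions, and uniqueness is checked on the "id" value alone.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample document; values are illustrative only.
DOC = """
<City id="Pittsburgh">
  <Neighborhood id="Oakland">
    <total-spaces>200</total-spaces>
    <Block id="1">
      <GPS>unknown</GPS>
      <pSpace id="1"><in-use>no</in-use><metered>yes</metered></pSpace>
      <pSpace id="2"><in-use>yes</in-use></pSpace>
    </Block>
  </Neighborhood>
</City>
"""


def idable_children(node):
    """Children with an 'id' value unique among their siblings.

    `node` itself is assumed to be IDable, which covers the
    'all ancestors are IDable' half of the definition.
    """
    ids = [c.get("id") for c in node if c.get("id") is not None]
    return [c for c in node
            if c.get("id") is not None and ids.count(c.get("id")) == 1]


def local_information(node):
    """New element holding the node's attributes, its non-IDable
    descendants, and only the ids of its IDable children."""
    idable = idable_children(node)
    info = ET.Element(node.tag, dict(node.attrib))
    info.text = node.text
    for child in node:
        if child in idable:
            # IDable child: keep only a stub carrying its id
            ET.SubElement(info, child.tag, {"id": child.get("id")})
        else:
            # Non-IDable child: every descendant is non-IDable too, keep it all
            info.append(copy.deepcopy(child))
    return info


root = ET.fromstring(DOC)
block = root.find("./Neighborhood/Block")
info = local_information(block)
summary = [(c.tag, c.get("id"), len(c)) for c in info]
print(summary)
```

For Block 1 the result keeps GPS in full but reduces each pSpace to an id-only stub, which is the unit of storage and ownership used in the partitioning slides.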


Data Partitioning

• Data storage and ownership are always in units of the local information corresponding to the IDable nodes in the document
  • These form a nearly-disjoint partitioning of the overall document
  • Granularity can be controlled using the “id” attributes
  • A partitioning unit can be uniquely identified using the “id”s on the path to the root of the document
• Data ownership:
  • Each partitioning unit is owned by exactly one OA

Data Partitioning

• Data stored locally at each OA:
  • A document fragment consisting of a union of partitioning units
• Constraints:
  • An OA must store the document fragment it owns
  • If it stores the “id” of an IDable node, it must also store the local information of all that node's ancestors
  • We minimize the amount of information that must be stored (details in paper)
    • Only need to store the IDs of all ancestors, and of their children
• Invariant:
  • If an OA has the “id” of an IDable node, it either
    • Has the local information for the node, or
    • Has the “id”s on the path to the root, allowing it to locate the local information for that node

Data Partitioning: Example

[Figure: document tree with County (id = 'Allegheny'), Neighborhood (id = 'Shadyside'), Block (id = '1', id = '2'), and pSpace (id = '1', id = '2', id = '3'), divided into the part OA 1 owns and the part OA 2 owns]

[Figure: data storage configuration at OA 1 over the example tree (Block id = '1', id = '2'; pSpace id = '1', id = '2', id = '3'), marking nodes whose local information is required and nodes whose local information is optional]




[Figure: data storage configuration at OA 2 over the example tree (Block id = '1', id = '2'; pSpace id = '1', id = '2', id = '3')]




Mapping Data to OAs

• Mapping of nodes to physical OAs maintained using DNS
• For each IDable node, create a unique DNS-style name by concatenating the IDs on the path to the root
• Mapped to OA 1:
  • Allegheny-County….iris.net
  • Pittsburgh-City.Allegheny-County….iris.net
• Mapped to OA 2:
  • Oakland-Neighborhood.Pittsburgh-City.Allegheny-County….iris.net
  • 1-Block.Oakland-Neighborhood.Pittsburgh-City.Allegheny-County….iris.net
  • 1-pSpace.1-Block.Oakland-Neighborhood.Pittsburgh-City.Allegheny-County….iris.net
  • …
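The naming rule can be sketched as follows. The function name `dns_name`, the `(element, id)` path representation, and the dictionary standing in for DNS records are assumptions for illustration. The suffix `parking.intel-iris.net` is the one shown on the "Self-Starting Distributed Queries" slide; the elided part of the names above is left as the slides give it.

```python
def dns_name(path, suffix):
    """Build a DNS-style name from a root-to-node path.

    `path` is a list of (element name, id) pairs, root first.
    Labels are written node first, root last, then the service suffix.
    """
    labels = [f"{node_id}-{element}" for element, node_id in reversed(path)]
    return ".".join(labels + [suffix])


SUFFIX = "parking.intel-iris.net"
OAKLAND = [("State", "Pennsylvania"), ("County", "Allegheny"),
           ("City", "Pittsburgh"), ("Neighborhood", "Oakland")]

# Plain dictionary standing in for DNS records (name -> owning OA).
records = {
    dns_name(OAKLAND[:2], SUFFIX): "OA 1",
    dns_name(OAKLAND[:3], SUFFIX): "OA 1",
    dns_name(OAKLAND, SUFFIX): "OA 2",
    dns_name(OAKLAND + [("Block", "1")], SUFFIX): "OA 2",
}

print(dns_name(OAKLAND, SUFFIX))
```

Because the name is a pure function of the ids on the path, any party that knows the path can compute it without consulting a directory.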


[Figure: example tree with Neighborhood (id = 'Shadyside'), Block (id = '1', id = '2'), and pSpace (id = '1', id = '2', id = '3'), divided into the part OA 1 owns and the part OA 2 owns]


Self-Starting Distributed Queries

• Each query has a hierarchical prefix:
  /State[@id='Pennsylvania']/County[@id='Allegheny']/City[@id='Pittsburgh']/Neighborhood[@id='Oakland']/Block/pSpace
• Simple parsing of the query to extract the least common ancestor (LCA) of the possible query results
• Send the query to Oakland-Neighborhood.Pittsburgh-City.Allegheny-County.Pennsylvania-State.parking.intel-iris.net
• Name extracted from the query without any global or per-service state
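A minimal sketch of the "simple parsing" step, assuming the hierarchical prefix is the leading run of steps of the form `/Element[@id='value']`. The regular expression and the function name `lca_dns_name` are illustrative assumptions, not the parser used in IrisNet.

```python
import re

# One leading step of the form /Element[@id='value'] (single or double quotes).
STEP = re.compile(r"/(\w[\w-]*)\[@id=['\"]([^'\"]+)['\"]\]")


def lca_dns_name(query, suffix):
    """Derive the DNS-style name of the LCA from the query text alone."""
    labels = []
    pos = 0
    while True:
        match = STEP.match(query, pos)
        if match is None:
            break  # first step without an id ends the hierarchical prefix
        labels.append(f"{match.group(2)}-{match.group(1)}")
        pos = match.end()
    return ".".join(list(reversed(labels)) + [suffix])


QUERY = ("/State[@id='Pennsylvania']/County[@id='Allegheny']"
         "/City[@id='Pittsburgh']/Neighborhood[@id='Oakland']/Block/pSpace")
name = lca_dns_name(QUERY, "parking.intel-iris.net")
print(name)
```

The function needs nothing but the query string and the service suffix, which is the point of the last bullet: no global or per-service state is consulted to find the starting OA.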

QEG Details

• Nesting depth of an XPATH query
  • Maximum depth at which a location path that traverses over IDable nodes occurs in the query
• Examples (query, then nesting depth):
  • /a[@id='x']/b[@id='y']/c : 0
  • /a[@id='x']//c : 0
  • /a[./b/c]/b : 1 (if b is IDable)
  • /a[count(./b/[./c[@id='1']]) : 2
• Complexity of evaluating a query increases with nesting depth

Queries with Nesting Depth = 0

• Any predicate in the query can be evaluated using just the local information for an IDable node
  • Example: …/Block[@id='1'][./available-spaces > 10]
• Sketch of the XSLT program:
  • Walk the document recursively
  • If the local information for the node under consideration is available, evaluate the part of the query that refers to that node; otherwise tag the returned answer with the tag “asksubquery”
• Postprocessor finds the missing information by asking subqueries
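The recursive walk can be sketched in Python as a stand-in for the generated XSLT. Everything here is an assumption made for illustration: the dictionary layout of a fragment, the `(element, id, predicate)` form of a query step, and the function names. A node whose `local` field is `None` represents a node for which the OA stores only the id.

```python
def make_node(tag, node_id, local=None, children=()):
    """Fragment node; local=None means only the id is stored at this OA."""
    return {"tag": tag, "id": node_id, "local": local, "children": list(children)}


def evaluate(node, steps, path=()):
    """Walk the fragment; return (answers, subqueries) for a depth-0 query.

    Each step is (element name, required id or None, predicate or None),
    where the predicate looks only at a node's local information.
    """
    tag, wanted_id, predicate = steps[0]
    if node["tag"] != tag or (wanted_id is not None and node["id"] != wanted_id):
        return [], []
    here = path + ((node["tag"], node["id"]),)
    if node["local"] is None:
        # Local information missing: tag this spot so a subquery can fill it in
        return [], [{"asksubquery": here, "remaining": steps}]
    if predicate is not None and not predicate(node["local"]):
        return [], []
    if len(steps) == 1:
        return [here], []
    answers, subqueries = [], []
    for child in node["children"]:
        found, missing = evaluate(child, steps[1:], here)
        answers += found
        subqueries += missing
    return answers, subqueries


# Hypothetical fragment at one OA: Block 2 is known by id only.
fragment = make_node("Neighborhood", "Oakland", {"total-spaces": 200}, [
    make_node("Block", "1", {"available-spaces": 12}),
    make_node("Block", "2"),
])
steps = [("Neighborhood", "Oakland", None),
         ("Block", None, lambda info: info["available-spaces"] > 10)]
answers, subqueries = evaluate(fragment, steps)
print(answers, [s["asksubquery"] for s in subqueries])
```

A postprocessor would then send one subquery per `asksubquery` entry to the OA that owns that node and merge the replies into the answer.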

Caching

• A site can add any fragment to its document as long as the data partitioning constraints are satisfied
• We generalize subqueries to fetch the smallest superset of the answer that satisfies the constraints, and cache it
• Data is time-stamped at the time of caching
• Queries can specify freshness requirements
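The last two bullets (time-stamping and per-query freshness) can be sketched as a small cache. The class name `FragmentCache`, its methods, and the use of plain numbers for time are assumptions for illustration; the sketch does not enforce the data partitioning constraints.

```python
class FragmentCache:
    """Cache of fragments, each stamped with the time it was cached."""

    def __init__(self):
        self._entries = {}  # name -> (fragment, cached_at)

    def put(self, name, fragment, now):
        """Store a fragment and record when it was cached."""
        self._entries[name] = (fragment, now)

    def get(self, name, now, max_age):
        """Return the fragment if it meets the query's freshness bound.

        Returns None when the entry is absent or older than max_age,
        in which case the caller would ask the owning OA instead.
        """
        entry = self._entries.get(name)
        if entry is None:
            return None
        fragment, cached_at = entry
        if now - cached_at > max_age:
            return None
        return fragment


cache = FragmentCache()
cache.put("1-Block.Oakland-Neighborhood", {"in-use": "no"}, now=100.0)
fresh = cache.get("1-Block.Oakland-Neighborhood", now=105.0, max_age=10.0)
stale = cache.get("1-Block.Oakland-Neighborhood", now=130.0, max_age=10.0)
print(fresh, stale)
```

Putting the freshness bound on the query rather than on the cache entry lets a tolerant query reuse data that a stricter query would bypass.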

Further Details in Paper

• Queries with Nesting Depth > 0
• Schema changes
• Data partitioning changes
• Implementation details and experimental study

Conclusions

• Identified the challenges in query processing over a distributed XML document
• Developed formal framework and techniques that
  • Allow for flexible document partitioning
  • Integrate caching seamlessly
  • Correctly and efficiently answer XPATH queries
• Experimental results demonstrate the advantages of flexible data partitioning and caching

Further Information

• IrisNet project website
  • http://www.intel-iris.net


Performance Study Setup

• Current prototype written in Java
• A cluster of nine 2 GHz Pentium IV machines
• Apache Xindice used as the backend XML database
• Artificially generated database
  • 2400 parking spaces with 2 cities, 6 neighborhoods and 120 blocks
• Five query workloads
  • QW-1: Asking for a single block
  • QW-2: Asking for two blocks from a single neighborhood
  • QW-3: Asking for two blocks from two neighborhoods
  • QW-4: Asking for two blocks from two cities
  • QW-Mix: 40% each of QW-1 and QW-2, 15% QW-3, 5% QW-4
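The sizes above are consistent with an even split, which the following sketch assumes: 3 neighborhoods per city, 20 blocks per neighborhood, and 20 spaces per block. That split is inferred from the totals and is not stated on the slide. The function names and the seeded random generator are likewise illustrative, and QW-Mix is read as 40/40/15/5.

```python
import random

# Assumed even split that reproduces the slide's totals.
CITIES = 2
NEIGHBORHOODS_PER_CITY = 3
BLOCKS_PER_NEIGHBORHOOD = 20
SPACES_PER_BLOCK = 20

# QW-Mix percentages, read as 40% QW-1, 40% QW-2, 15% QW-3, 5% QW-4.
MIX = {"QW-1": 40, "QW-2": 40, "QW-3": 15, "QW-4": 5}


def build_database():
    """Map (city, neighborhood, block) to a list of in-use flags."""
    db = {}
    for city in range(1, CITIES + 1):
        for hood in range(1, NEIGHBORHOODS_PER_CITY + 1):
            for block in range(1, BLOCKS_PER_NEIGHBORHOOD + 1):
                db[(city, hood, block)] = ["no"] * SPACES_PER_BLOCK
    return db


def draw_workload(count, seed=0):
    """Draw `count` query types according to the QW-Mix percentages."""
    rng = random.Random(seed)
    return rng.choices(list(MIX), weights=list(MIX.values()), k=count)


db = build_database()
total_spaces = sum(len(spaces) for spaces in db.values())
print(len(db), total_spaces, draw_workload(5))
```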

Architectures Compared

[Figure: two architectures, each drawn over the hierarchy City > Neighborhood > Block > Parking Space and showing where Queries and SA Updates arrive: (1) Centralized; (2) Centralized querying, distributed update]

Caching

• Architecture already allows for caching data
  • An OA is allowed to store more data than it owns
  • Data is time-stamped at the time of caching
  • Queries can specify freshness tolerance

Architectures Compared

[Figure: two further architectures, each with a DNS server and SA Updates over the hierarchy City > Neighborhood > Block > Parking Space: (3) Distributed querying/update, fixed two-level organization; (4) Distributed querying/updates, hierarchical organization]

Query Throughputs

Data Partitioning: Example 2

[Figure: tree with Neighborhood (id = 'Oakland'), Block (id = '1', id = '2'), and pSpace (id = '1', id = '2', id = '3'), divided into the part OA 1 owns and the part OA 2 owns]

• e.g. OA 2 must store the local information of the County (Allegheny) node

Conclusions

• Location transparency
  • Distributed DB hidden from user
• Flexible data partitioning
• Low-latency queries & query scalability
  • Direct query routing to LCA of the answer
  • Query-driven caching, supporting partial matches
  • Load shedding; no per-service state needed at web servers
• Support query-based consistency
• Use off-the-shelf DB components

Example XML Fragment for PSF

…
<County id="Allegheny">
  <City id="Pittsburgh">
    <Neighborhood id="Oakland">
      <available-spaces>8</available-spaces>
      <Block id="1">
        <pSpace id="1">
          <in-use>no</in-use>
          <metered>yes</metered>
        </pSpace>
        …
      </Block>
    </Neighborhood>
  </City>
</County>
…

[Figure: tree view with Block (id = '1', id = '2'), pSpace (id = '1', id = '2', id = '3'), and leaf elements in-use, GPS, and metered (values yes, no)]


What I liked (strengths)

• In general, this is a very good idea paper, but a mediocre evaluation paper
• The application scenario is different from other sensor database work, the data model is novel, and it doesn’t share constraints with some other work
• Location transparency is elegant: a logical view of the distributed database as a single centralized database
• XML has some distinct advantages, e.g., it facilitates dynamic update of the database schema
• XML also provides standard query interfaces, e.g., XPATH and XSLT
• Query-based consistency lets an application bypass a cache if the data is too stale (i.e., old)
• Partial-match caching is a clever optimization that leverages the cache invariants in the distributed XML database

What I didn’t like (weaknesses)

• The proposed cache-and-query system is tied to TCP/IP networks, and to DNS in particular
• Distributed query processing is implemented without true distributed caching; the authors admit that selective bypassing of caching is needed (at a minimum)
• The experimental setup is not realistic (a distributed database that isn’t really distributed)
• Evaluation covers only queries (without concurrent updates); it really needs both, e.g., 100% queries (baseline); 95% queries with 5% updates; 90% queries with 10% updates; 80% with 20%; 60% with 40%

Possible Future Work

• Perform evaluation in a distributed environment with more realistic network problems (e.g., network latency, packet delay and loss); perhaps this would make caching more important
• Add distributed caching, e.g., selective bypass of caches
• Perform evaluation with a query + update workload
• Experiment with caching policies other than “cache everything everywhere”
• Explore other distributed database schemes (for XML)
• Explore other techniques for distributing data and distributing caching
