beth plale computer science dept. indiana university calder: ogsa-dai access to data in streams
TRANSCRIPT
![Page 1: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/1.jpg)
Beth Plale Computer Science Dept.
Indiana University
Calder: OGSA-DAI access to data in streams
![Page 2: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/2.jpg)
Contributors
• PhD students:– Nithya Vijayakumar– Ying Liu
• Funding agencies:– Department of Energy
• Early Career grant
– National Science Foundation• LEAD project
![Page 3: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/3.jpg)
Problem Statement• Data streams (e.g., from sensor networks,
instruments) are growing in pervasivness and importance
• Bringing streaming sources to a grid by means of wrapping each sensor and instrument as grid or web service is a naïve solution.
• It is the data in the streams that is of value to growing groups of user, not the instruments.– Select few will ‘steer’ instruments
• Need fresh solution that provides access to collections of data streams.
![Page 4: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/4.jpg)
Outline
• Characterization of data stream systems
![Page 5: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/5.jpg)
Data streams:• Indefinite sequence of events (or
messages, tuples)
• Often time marked– Generation time, that is, timestamp, and– Logical time
• Events continuously generated– pushed or pulled from providers to remote
consumers
![Page 6: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/6.jpg)
Types of Data Stream Systems
Data manipulationsystems-process large amounts of data
Stream routing systems- delivery of events
Stream detectionsystems-detect unusual behavior
sizeand numberof stream“chunks”analyzed
timeliness demandson response
overlap
![Page 7: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/7.jpg)
Stream Routing Systems• Known by various names
– Publish/subscribe, selective data dissemination, document filtering, message oriented middleware (MOM)
• Decisions made event-by-event– Set of queries (usually very large number)
managed over long time duration, arriving event matched against set of queries.
• Stream routing projects– Xfilter (UMaryland), Xyleme (INRIA),
XPushMachine (UWashington), NaradaBroker (IndianaU), Bayou(XeroxParc), Echo (GeorgiaTech)
route
detectmanip
![Page 8: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/8.jpg)
Stream Routing Example
stock quote stream
queries added through user interface
Each arriving event matched against set of queries
Results multicast to owners of satisfied queries
- Timeliness requirement necessitates focus on efficient matching
Long-standingqueries
![Page 9: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/9.jpg)
Data Manipulation Systems
• Event streams subject to transformation, filtering, aggregation.
• Looser timeliness requirements on results• Long running queries, often periodic (based
on assumption of synchronous streams)• Results in generation of new streams• Projects:
– Antarctic Monitoring(UNottingham), sensor network query layer (Cornell), dQUOB (IndianaU), STREAM (Stanford), Fjords (Berkeley), NiagraCQ (UWisconsin)
route
detectmanip
![Page 10: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/10.jpg)
Stream Detection Systems
• Event-oriented (versus periodic)• Less predictable, asynchronous streams• Intent is to detect anomalous behavior• Timeliness is critical, time markers key to
decision making• Result is notification message• Examples:
– R-GMA (EU DataGrid), dQUOB (IndianaU), Conquer (GeorgiaTech), Gigascope (AT&T), Fjords (Berkeley)
route
detectmanip
![Page 11: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/11.jpg)
Claim: streams systems that qualify as Data Stream Resource are only those circled
• A Data resource has coherence and meaning• Both systems qualify as ‘data resource’ because: -- distributed global snapshot on stream behavior alone -- distributed global snapshot has meaning and coherence
Data manipulationsystems
Stream routing systems
Stream detectionsystemsoverlap
Justification:
![Page 12: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/12.jpg)
Rowsetgrid dataservice
Rowsetgrid dataservice
Rowsetgrid dataservice
Rowsetgrid dataservice
OGSA-DAI OGSI access to data sources
RowsetGrid dataservice
RowsetGrid dataservice
Grid dataservice
registry
Grid dataservice
registry
Grid dataservice
registry
Grid dataservice
registryGrid dataService(GDS)
Grid dataService(GDS)
R-DBMSR-DBMS
Data description - database description
Data access - query/update access
Data management - monitor service
Data factory - create rowset instance
![Page 13: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/13.jpg)
Rowsetgrid dataservice
Rowsetgrid dataservice
Rowsetgrid dataservice
Rowsetgrid dataservice
Database Access using OGSA-DAI OGSI
RowsetGrid dataservice
RowsetGrid dataservice
grid dataservice
registry
grid dataservice
registry
grid dataservicefactory
grid dataservicefactory
Grid dataservice
registry
Grid dataservice
registry
Grid dataservice
registry
Grid dataservice
registry
Grid dataService(GDS)
Grid dataService(GDS)
R-DBMS
Grid servicehandle offactory
createGDS
handle ofGDS
query
servicecreate
response(row-by-row)
![Page 14: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/14.jpg)
Rowsetgrid dataservice
Rowsetgrid dataservice
Rowsetgrid dataservice
Rowsetgrid dataservice
Database Access using OGSA-DAI OGSI
RowsetGrid dataservice
RowsetGrid dataservice
grid dataservice
registry
grid dataservice
registry
grid dataservicefactory
grid dataservicefactory
Grid dataservice
registry
Grid dataservice
registry
Grid dataservice
registry
Grid dataservice
registry
Grid dataService(GDS)
Grid dataService(GDS)
Grid servicehandle offactory
createGDS
handle ofGDS
query
servicecreate
response(row-by-row)
![Page 15: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/15.jpg)
Calder: presents db interface to application and supports multiple stream systems (like ogsa-dai)
![Page 16: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/16.jpg)
![Page 17: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/17.jpg)
Specifying Long-Running Queries: An Example
Select from 3 data types, CAPS radar, ACARS data, and NEXRAD Doppler level II,
Filter out events not falling within 80 mile radius around New OrleansExecute beginning immediately and terminating execution in 1 hour. Set sliding window size (RANGE) as window over which joins are carried out
SELECT (caps, acars, nexradII) WHERE REGION(90W, 30N, 62.5KM) START now EXPIRE 1hr RANGE 6 min
![Page 18: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/18.jpg)
Retrieving results from rowset service: Rowset Request API
Results obtained through request to rowset service of form:-- single event based on timestamp,-- range of events bounded by time range,-- most recent n events, or-- stream of events.
getTuple(timestamp, ringBufferID)getRangeTuple(startTS, endTS, ringBufferID)getMostRecent(lastRecent, num_events, ringBufferID)getStream(ringbufferID)
![Page 19: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/19.jpg)
![Page 20: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/20.jpg)
![Page 21: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/21.jpg)
Experimental Environment• GDS, rowset server, stream processing server
– Dual Dell 2.8 GHz Precision workstation, 2 GB memory, RHEL
• Planner– Solaris UltraSPARC 502MHz, 1GB memory,
SunOS 5.8• Computational mesh (1 node)
– Xeon Intel 2.8 Ghz, 2GB RAM, RedHat 8.0• 1 Gbps switched Ethernet network• OGSA-DAI OGSI v4.0• dQUOB v1.0
![Page 22: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/22.jpg)
Evaluation
• Benchmark following steps – GDS setup - ogsa-dai factory call– Query plan time
• Plan how query is to be distributed
– Query compile - compile into portable script form– Distribute query - deploy script into computational
mesh (1 node in test)– Ring buffer setup - allocate space in rowset
service, return handle to GDS.
![Page 23: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/23.jpg)
Benchmark results: average taken over 100 different runs
![Page 24: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/24.jpg)
Time interval (a) obtained bygetRangeTuple(startTS, end TS, ringBufferID)drops as # queries at node increases; turnaround increases
![Page 25: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/25.jpg)
Related Research
• Common Middleware Instrument Architecture (CIMA), McMullen, Bramley et al. • GATES,
– Aggarwal, HPDC04
• Data Cutter, Saltz• Grid Stream Database Manager (GSDM)
– Koparanova and Risch, AxGrids 2003– Stores into O-O DBMS
• DQP – Manchester and Newcastle
• DB community: Widom, Borealis, NiagaraCQ
![Page 26: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/26.jpg)
Summary
• Model of data stream store provides conceptual framework for retrieving data from streams by means of rich and meaningful queries
• GGF DAIS framework of OGSA-DAI is intuitive abstraction for accessing data stream store
• It is not about the instruments and sensors … it is about the streams!
Real-Time WRF Real-Time WRF executed on executed on Grid when Grid when
environment environment primed and primed and
storms presentstorms present
On-DemandOn-DemandResourceResource
SchedulingScheduling
![Page 27: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/27.jpg)
Parting Views
• Our work brings data streams to the grid in way that user intuitively thinks about accessing data resources
• Querying heterogeneous data management tools is a difficult problem. Heterogeneous query languages will exist– I.e., continuous query languages will always be
different from database query languages.
• Scientists need a lot of coaching to understand why their monolithic “pile of perl” is not a grid service, and further why they should invest in breaking up that “pile of perl”.
![Page 28: Beth Plale Computer Science Dept. Indiana University Calder: OGSA-DAI access to data in streams](https://reader033.vdocument.in/reader033/viewer/2022051517/5697bfed1a28abf838cb9220/html5/thumbnails/28.jpg)
http://www.cs.indiana.edu/dde/projects/Calder.html
Calder extends dQUOB (http://www.cs.indiana.edu/dde/projects/dquob.html).
dQUOB v1.0, which is available for release as open source, includes the stream processing system components of Calder.