![Page 1: 1 Data Stream Management Systems CS240B Notes by Carlo Zaniolo](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d7e5503460f94a61e3a/html5/thumbnails/1.jpg)
Data Stream Management Systems
CS240B Notes by
Carlo Zaniolo
Data Streams
Continuous, unbounded, rapid, time-varying streams of data elements
Occur in a variety of modern applications:
• Network monitoring and traffic engineering
• Sensor networks, RFID tags
• Telecom call records
• Financial applications
• Web logs and click-streams
• Manufacturing processes
DSMS = Data Stream Management System
Many Research Projects …
• Amazon/Cougar (Cornell) – sensors
• Aurora (Brown/MIT) – sensor monitoring, dataflow
• Hancock (AT&T) – Telecom streams
• Niagara (OGI/Wisconsin) – Internet DBs & XML
• OpenCQ (Georgia) – triggers, view maintenance
• Stream (Stanford) – general-purpose DSMS
• Tapestry (Xerox) – publish/subscribe filtering
• Telegraph (Berkeley) – adaptive engine for sensors
• Tribeca (Bellcore) – network monitoring
• Stream Mill (UCLA) – power & extensibility
• Gigascope (AT&T Labs) – network monitoring
The (Simplified) Big Picture
[Figure: architecture diagram. Labels: Input streams, DSMS (Server), Register Query, Streamed Result, Scratch Store, Archive, Stored Relations, Clients.]
Databases vs Data Streams
Database Systems:
• Model: persistent data
• Table: set|bag of tuples
• Updates: all
• Query: transient
• Query answer: exact
• Query evaluation: multi-pass
• Operators: blocking OK
• Query plan: fixed
Data Stream Systems:
• Model: transient data
• Infinite sequence of tuples
• Updates: append only
• Query: persistent
• Query answer: often approximate
• Query evaluation: one-pass
• Operators: nonblocking only
• Query plan: adaptive
Research Challenges
• Data Models
  • Relational streams first, XML streams important too
  • Tuple time-stamping
  • Order is important
  • Windows or other synopses
• Query Languages: SQL or XQuery + extensions
  • Blocking operators and expressive power
• Query Plans: optimized scheduling for response time or memory
• Quality of Service (QoS) & Approximation: load shedding, sampling
• Support for Advanced Applications: data stream mining
Data Models
• Relational Data Streams
  • Each data stream consists of relational tuples
  • The stream can be modelled as an append-only relation
  • But repetitions are allowed and order is very important!
  • Order is based on timestamps or on arrival order
• Streaming XML Data
  • A stream of structured SAX elements
Timestamps
• Data streams are (basically) ordered according to their timestamps
• The meaning of windows, unions and joins is based on timestamps
• External
  • Injected by the data source
  • Model the real-world event represented by the tuple
  • Tuples may be out of order, but if they are nearly ordered they can be reordered with small buffers
• Internal
  • Introduced as a special field by the DSMS
  • Approximate: based on the time the tuples arrived
• Missing (called latent in Stream Mill)
  • The system assigns no timestamp to arriving tuples,
  • but tuples are still processed as ordered sequences
  • by operators whose semantics expects timestamps…
  • Thus operators might instantiate timestamps as/when needed
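Reordering nearly ordered tuples with a small buffer can be sketched as follows. This is only an illustration: the function name `reorder` and the lateness bound `max_delay` are assumptions introduced here, not part of any DSMS mentioned in these notes.

```python
import heapq

def reorder(stream, max_delay):
    """Yield (ts, payload) tuples in timestamp order.

    Assumes (hypothetically) that no tuple arrives more than max_delay
    time units behind the largest timestamp seen so far."""
    heap = []      # min-heap keyed on timestamp
    seq = 0        # tie-breaker, so payloads are never compared
    high = None    # largest external timestamp seen so far
    for ts, payload in stream:
        heapq.heappush(heap, (ts, seq, payload))
        seq += 1
        high = ts if high is None else max(high, ts)
        # Tuples at or below high - max_delay can no longer be overtaken.
        while heap and heap[0][0] <= high - max_delay:
            t, _, p = heapq.heappop(heap)
            yield t, p
    # End of a finite test input: flush whatever is still buffered.
    while heap:
        t, _, p = heapq.heappop(heap)
        yield t, p

arrivals = [(1, "a"), (3, "b"), (2, "c"), (5, "d"), (4, "e"), (8, "f")]
ordered = list(reorder(arrivals, max_delay=2))
```

The buffer never holds more tuples than can arrive within `max_delay` time units, which is why near-ordered streams need only small buffers.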
Data Stream Query Languages
Continuous Queries and Blocking Operators
Query Operators: Sample Stream
Traffic (sourceIP,   % source IP address
         sourcePort, % port number on source
         destIP,     % destination IP address
         destPort,   % port number on destination
         length,     % length in bytes
         time        % time stamp
);
Blocking Query Operators
• No output until the entire input has been seen, i.e., until the last tuple in the input, which is often detected only when we hit the EOF.
• Streams: the input never ends, thus blocking operators cannot be used as such
• Traditional SQL aggregates are blocking
• Many SQL operators have DBMS implementations that are blocking but are not intrinsically blocking
  • group by, sort join can be implemented in blocking and nonblocking ways
• Other operators are intrinsically blocking
• Can we formally characterize which is which?
  • We will see that nonblocking operators are the monotonic ones
Problematic Operators for Data Streams
• Blocking query operators are those that must see everything in the input before they can return anything in the output
• Nonblocking query operators are those that can return results now, without seeing the rest of the stream
• Selection and projection are nonblocking
• Set difference and traditional aggregates are blocking
• Continuous aggregates are not.
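The distinction can be illustrated with Python generators. This is a minimal sketch; the function names are chosen here for illustration and do not come from any particular DSMS.

```python
from itertools import count, islice

def select(stream, pred):
    # Nonblocking: each qualifying tuple is emitted as soon as it is seen.
    for t in stream:
        if pred(t):
            yield t

def total(stream):
    # Blocking: the single answer exists only after the input has ended,
    # so this call would never return on an unbounded stream.
    return sum(stream)

def running_total(stream):
    # Continuous (cumulative) aggregate: one updated value per input tuple.
    acc = 0
    for t in stream:
        acc += t
        yield acc

# count(1) is an unbounded stream 1, 2, 3, ...; the nonblocking operators
# still produce output, and islice just stops reading after a few results.
first_evens = list(islice(select(count(1), lambda x: x % 2 == 0), 3))
first_totals = list(islice(running_total(count(1)), 4))
```

Calling `total(count(1))` would loop forever, which is exactly the behavior that rules out blocking operators on streams.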
Aggregate Invocation: Two Forms
Traditional:
select G, F1 from S where P group by G having F2 op J
G: grouping attributes; F1, F2: aggregate expressions
With windows (SQL:2003 OLAP functions):
traffic (sourceIP, sourcePort, destIP, destPort, length, Time)
select sourceIP, Time, avg(length) over(order by Time, partition by sourceIP 50 rows preceding)
Cumulative (running) window:
... over(order by Time, partition by sourceIP unlimited preceding)
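The windowed form can be emulated in a few lines of Python. This is a sketch under two assumptions made here: tuples are reduced to `(sourceIP, Time, length)`, and "50 rows preceding" is read as the current row plus up to 50 earlier rows of the same partition.

```python
from collections import defaultdict, deque

def windowed_avg(stream, rows_preceding=50):
    """For each (source_ip, time, length) tuple, yield the average length over
    the current tuple and up to rows_preceding earlier tuples with the same
    source_ip. Tuples are assumed to arrive in Time order."""
    windows = defaultdict(lambda: deque(maxlen=rows_preceding + 1))
    sums = defaultdict(int)
    for source_ip, time, length in stream:
        w = windows[source_ip]
        if len(w) == w.maxlen:
            sums[source_ip] -= w[0]   # oldest value is about to be evicted
        w.append(length)
        sums[source_ip] += length
        yield source_ip, time, sums[source_ip] / len(w)

traffic = [("a", 1, 10), ("a", 2, 20), ("b", 3, 100), ("a", 4, 60)]
result = list(windowed_avg(traffic, rows_preceding=1))
```

One result is produced per input tuple, so the windowed aggregate is nonblocking; the state per partition is bounded by the window size.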
Aggregate Function Properties
1. distributive: sum, count, min, max
2. algebraic: AVG
3. holistic: count-distinct, median
4. On-line aggregates such as exponentially decaying AVG
5. User-Defined Aggregates (UDAs)
• Sliding window invocation on 1–2: efficient computation for memory and CPU
• Sliding window invocation on 3?
• Continuous window on these? Yes, also for 5. UDAs can be similar to any of those
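As an illustration of an on-line aggregate (item 4), one common form of exponentially decaying average keeps a single number of state. The smoothing factor `alpha` and the choice to seed the average with the first value are assumptions of this sketch.

```python
def decayed_avg(stream, alpha=0.5):
    """Yield an exponentially decaying average after each input value.

    Each new value gets weight alpha; the weight of older values shrinks
    by a factor (1 - alpha) at every step."""
    avg = None
    for x in stream:
        avg = x if avg is None else alpha * x + (1 - alpha) * avg
        yield avg

values = list(decayed_avg([10, 20, 40], alpha=0.5))
```

No window buffer is needed, since old tuples fade out through the decay factor instead of being explicitly expired.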
Avoiding Blocking Behavior
• Windows: aggregates on a limited-size window are approximate and nonblocking
• DSMSs support windows of all kinds:
  • Sliding windows (same as OLAP functions)
  • Tumbles: restart every new window (traditional definition)
  • Panes: the window is broken up into panes
• Punctuation [Tucker, Maier, Sheard, Fegaras]
  • Assertion about future stream contents
  • Unblocks operators, reduces state
• Constructs used for avoiding blocking are also useful for avoiding infinite memory
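Tumbles and panes can be sketched for a count-based window as follows. The function names and the restriction to `sum` (a distributive aggregate) are assumptions made for this illustration.

```python
from collections import deque

def tumbling_sum(stream, size):
    """Emit one sum per non-overlapping window of `size` tuples."""
    acc, n = 0, 0
    for x in stream:
        acc += x
        n += 1
        if n == size:
            yield acc
            acc, n = 0, 0   # restart for the next window

def paned_sum(stream, pane_size, panes_per_window):
    """Sliding sum that advances one pane at a time.

    Only one partial sum per pane is stored, rather than every tuple
    in the window."""
    panes = deque(maxlen=panes_per_window)
    for pane_total in tumbling_sum(stream, pane_size):
        panes.append(pane_total)
        if len(panes) == panes_per_window:
            yield sum(panes)

tumbles = list(tumbling_sum([1, 2, 3, 4, 5, 6, 7], size=3))
slides = list(paned_sum(range(1, 9), pane_size=2, panes_per_window=2))
```

In the tumbling example the trailing tuple 7 produces no output because its window never fills; both operators hold bounded state, which is the link between avoiding blocking and avoiding infinite memory.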
Joins
• General case problematic on streams: may need to join arbitrarily far-apart stream tuples
• Equijoin on timestamps is easy to compute, but not very useful
• Majority of work focuses on joins between one stream and a window specified on the other:
select A.sourceIP, B.sourceIP
from Traffic1 as A [window TA], Traffic2 as B
where A.destIP = B.destIP
• The symmetric case is also common: … Traffic2 as B [window TB] …
• Multi-joins are less common but possible.
Join of Stream S with a Table T (where T is a DB relation or a Window on a Stream)
• When a new tuple z with timestamp ts(z) arrives in S, join it with all the tuples in T.
  • ts(z) is the timestamp of the tuples so produced
• If T is a window on a stream S’:
  • T must contain all the tuples up to and including ts(z): a cumulative window on S’
  • But we do not have infinite memory, so we must approximate T with a synopsis, e.g., the 30 minutes preceding
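A minimal sketch of joining S with a time-based window on S’ is given below. It assumes, for illustration only, that both inputs are already in timestamp order, that tuples are `(ts, payload)` pairs, and that the join keys are extracted by caller-supplied functions; the name `stream_window_join` is invented here.

```python
import heapq
from collections import deque

def stream_window_join(s, s_prime, span, key_s, key_t):
    """Join each tuple z of `s` with the tuples of `s_prime` whose timestamps
    lie in [ts(z) - span, ts(z)]; results carry the timestamp ts(z)."""
    window = deque()   # synopsis: the last `span` time units of s_prime
    probes = ((ts, 1, p) for ts, p in s)
    builds = ((ts, 0, p) for ts, p in s_prime)   # tag 0 sorts first on ties
    merged = heapq.merge(probes, builds, key=lambda r: r[:2])
    for ts, is_probe, payload in merged:
        if not is_probe:
            window.append((ts, payload))
            continue
        while window and window[0][0] < ts - span:
            window.popleft()               # expire tuples that left the window
        for _, t_payload in window:
            if key_s(payload) == key_t(t_payload):
                yield ts, payload, t_payload

# Payloads are (sourceIP, destIP); join on destIP with a 30-unit window.
s = [(10, ("a", "x")), (50, ("b", "x"))]
s_prime = [(5, ("c", "x")), (12, ("d", "y")), (45, ("e", "x"))]
dest = lambda p: p[1]
joined = list(stream_window_join(s, s_prime, 30, dest, dest))
```

The tuple of S’ at time 5 matches the probe at time 10 but has expired by time 50, so the answer is only an approximation of the join with the cumulative window.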
Multi-way Sliding Window Joins
• Evaluation of n-way sliding window join queries
  • n streams with associated sliding windows
  • continuously evaluate the joins of all n windows
• Two natural join strategies
  • eager: the join is evaluated each time a new tuple arrives in any of the input streams
  • lazy: the join is evaluated with some pre-specified frequency, e.g., every t time units
• Computation is incremental, as in differential fixpoint of recursive rules.
Query Optimization and Scheduling
Scheduling to minimize response time or minimize memory, with no real change in CPU time
Optimization based on sharing, query plans, operators, buffers, …
A Query Plan
[Figure: query plan with two join (⋈) operators over Stream1, Stream2, and Stream3, feeding queries Q1 and Q2.]
Scheduler
• Given: query plan and selectivity estimates
• Schedule: tuples through operator chains
Schedulers and QoS Metrics
• Round Robin (RR) is perhaps the most basic
  • Operators in a circular queue are given a fixed time slice
  • Starvation is avoided, but there is little adaptivity
• FIFO: takes the first tuple in input and moves it through the chain
  • Minimal latency, poor memory
• Greedy Algorithms:
  • Buffers with most tuples first
  • Tuples that waited longest first
  • Operators that release more memory first
Memory Optimization on a Chain [Babcock, Babu, Datar, Motwani]
[Figure: progress chart plotting net selectivity against time for operators σ1, σ2, σ3 between Input and Output. Labels: selectivity = 0.6, selectivity = 0.2, selectivity = 0.0, best slope, starvation point.]
Main ideas
• Operators are thought of as filters which
  • Operate on a set of tuples
  • Produce s tuples in return
• s: selectivity of an operator
• If s = 0.2 we can interpret the value in two ways
  • Out of every 10 tuples, the operator outputs 2 tuples
  • If the input requires 1 unit of memory, the output will require 0.2 units of memory
The lower envelope
• Imagine a line from the current operator point to every operator point (ti, si) to its right
• The operator point reached by the line with the steepest descending slope is called the “steepest descent operator point”
The Lower Envelope
• By starting at the first point (t0, s0) and repeatedly calculating the steepest descent operator point we find the lower envelope P’ for a progress chart P
• Notice that the segments become progressively less steep: the magnitudes of their slopes are non-increasing
• The operators in each segment form a chain.
• FIFO within a chain, greedy across chains
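The lower-envelope construction can be sketched as below. The representation of the progress chart as a list of `(t, s)` points, the function names, and the sample numbers are assumptions made for illustration, not values taken from the figure.

```python
def lower_envelope(points):
    """points: progress chart [(t0, s0), (t1, s1), ...] with increasing t,
    where s_i is the fraction of a tuple's size left after t_i time units.
    Returns the indices of the points on the lower envelope."""
    envelope = [0]
    i = 0
    while i < len(points) - 1:
        t_i, s_i = points[i]
        # Steepest descent: the point to the right reached by the most
        # negative slope. Ties go to the farthest point, so collinear
        # points end up on one segment.
        best = min(
            range(i + 1, len(points)),
            key=lambda j: ((points[j][1] - s_i) / (points[j][0] - t_i), -j),
        )
        envelope.append(best)
        i = best
    return envelope

def chains(points):
    """Group operators into chains, one per envelope segment.
    Operator k is the one that moves a tuple from point k-1 to point k."""
    env = lower_envelope(points)
    return [list(range(a + 1, b + 1)) for a, b in zip(env, env[1:])]

# Hypothetical three-operator path: each operator takes 1 time unit.
chart = [(0, 1.0), (1, 0.9), (2, 0.3), (3, 0.0)]
envelope = lower_envelope(chart)
operator_chains = chains(chart)
```

Here operators 1 and 2 form the first chain (slope -0.35) and operator 3 the second (slope -0.3); a scheduler would then favor tuples waiting at the chain with the steepest slope.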
Scheduling
• Chain minimizes memory, which may be required in special overload situations
  • But it increases response time (latency)
  • Typically, though, we want to optimize for response time
• Different scheduling protocols optimize different objectives: latency, inaccuracy, memory use, computation, starvation, …
  • Computational complexity is independent of the scheduler
  • Different policies give significantly different results only for bursty loads
• Research Issues:
  • Complex query plans (beyond simple paths)
  • Minimization of response time
  • Adaptive strategies: how do we switch between the two to adapt to load changes?
Optimization by Sharing
• In traditional multi-query optimization, sharing (of expressions, results, etc.) among queries can lead to improved performance
• Similar issues arise when processing queries on streams. Examples:
  • sharing of query operators and expressions
  • sharing of sliding windows
Multi-query Processing on Streams
• Opportunities for optimization when windows are shared, e.g.:
select sum (A.length)
from Traffic1 A [window 1 hour], Traffic2 B [window 1 hour]
where A.destIP = B.destIP

select count (distinct A.sourceIP)
from Traffic1 A [window 1 min], Traffic2 B [window 1 min]
where A.destIP = B.destIP
• Strategies for scheduling the evaluation of shared joins:
  • Largest window only
  • Smallest window first
  • Process at any instant the tuple that is likely to benefit the largest number of joins (maximize throughput)
Shared Predicates [Niagara, Telegraph]
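Predicates for R.A: R.A > 1, R.A > 7, R.A > 11, R.A < 3, R.A < 5, R.A = 6, R.A = 8, R.A ≠ 9
[Figure: shared predicate index for R.A. The constants are grouped by comparison operator (>: 1, 7, 11; <: 3, 5; =: 6, 8; ≠: 9), and an incoming tuple with A = 8 is checked against the index.]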
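The idea of evaluating many predicates on R.A with one shared structure can be sketched as follows. This is not the Niagara or Telegraph implementation: sorted lists with binary search stand in for their index structures, and the class name `PredicateIndex` is invented for this example.

```python
from bisect import bisect_left, bisect_right

class PredicateIndex:
    """Groups single-attribute predicates by comparison operator so that one
    lookup per operator finds every predicate that a value satisfies."""

    def __init__(self, predicates):
        # predicates: (op, constant) pairs with op in {">", "<", "=", "!="}
        predicates = list(predicates)
        self.gt = sorted(c for op, c in predicates if op == ">")
        self.lt = sorted(c for op, c in predicates if op == "<")
        self.eq = {c for op, c in predicates if op == "="}
        self.ne = [c for op, c in predicates if op == "!="]

    def matches(self, a):
        # A > c holds for the constants strictly below a.
        out = [(">", c) for c in self.gt[:bisect_left(self.gt, a)]]
        # A < c holds for the constants strictly above a.
        out += [("<", c) for c in self.lt[bisect_right(self.lt, a):]]
        if a in self.eq:
            out.append(("=", a))
        out += [("!=", c) for c in self.ne if c != a]
        return out

index = PredicateIndex([(">", 1), (">", 7), (">", 11), ("<", 3), ("<", 5),
                        ("=", 6), ("=", 8), ("!=", 9)])
satisfied = index.matches(8)
```

For the tuple with A = 8 this reports A > 1, A > 7, A = 8 and A ≠ 9, without testing each range predicate one by one.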
QoS and Load Shedding
When input stream rate exceeds system capacity a stream manager can shed load (tuples)
Load shedding affects queries and their answers
Introducing load shedding in a data stream manager is a challenging problem
Random and semantic load shedding
DSMS Quality of Service (QoS)
Approximation and Load Shedding
QoS via Synopses and Approximation
• Synopsis: bounded-memory history-approximation
  • Succinct summary of old stream tuples
  • Like indexes/materialized-views, but base data is unavailable
• Examples
  • Sliding windows
  • Samples
  • Sketching techniques
  • Histograms
  • Wavelet representation
• Approximate algorithms: e.g., median, quantiles, …
• Fast and light data mining algorithms
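As one concrete example of a sample-based synopsis, reservoir sampling keeps a uniform random sample of fixed size k over a stream of unknown length. The notes do not name a specific sampling method, so this choice is an assumption made for illustration.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of k items from a stream of unknown
    length, using O(k) memory (the classic reservoir method)."""
    rng = rng or random.Random()
    sample = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            sample.append(item)        # fill the reservoir first
        else:
            j = rng.randrange(n)       # uniform in [0, n)
            if j < k:
                sample[j] = item       # item n is kept with probability k/n
    return sample

synopsis = reservoir_sample(range(1000), k=10, rng=random.Random(0))
```

The sample can then be used to answer approximate queries, such as an estimated median, after the base tuples are gone.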
QoS and Load Shedding
When input stream rate exceeds system capacity a stream manager can shed load (tuples)
Load shedding affects queries and their answers: drop the tasks and the tuples that will cause least loss
Introducing load shedding in a data stream manager is a challenging problem
Random load shedding or semantic load shedding
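The two styles of load shedding can be sketched as follows. The function names, the `utility` function, and the rescaling of a SUM by the keep probability are assumptions introduced for this example.

```python
import random

def shed_randomly(stream, keep_prob, rng=None):
    """Random load shedding: keep each tuple with probability keep_prob."""
    rng = rng or random.Random()
    for t in stream:
        if rng.random() < keep_prob:
            yield t

def approx_sum(stream, keep_prob, rng=None):
    """Estimate the SUM over the full stream from the surviving tuples by
    scaling up with 1 / keep_prob."""
    return sum(shed_randomly(stream, keep_prob, rng)) / keep_prob

def shed_semantically(stream, utility, threshold):
    """Semantic load shedding: drop the tuples whose utility (a hypothetical,
    application-supplied score) is below the threshold."""
    for t in stream:
        if utility(t) >= threshold:
            yield t

estimate = approx_sum(range(100), keep_prob=0.5, rng=random.Random(1))
important = list(shed_semantically([1, 5, 10], utility=lambda x: x, threshold=5))
```

Random shedding treats all tuples alike and corrects the answer statistically, while semantic shedding uses knowledge of the queries to drop the tuples that cause the least loss.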
XML Data Streams
XML Data Streams: Applications
• An XML data stream is a sequence of tokens
• Data and application integration
• Distributed monitoring of computing systems
• Message-based web services
• Purchase orders, retail transactions
• Personalized content delivery
XML Streams: Data Model
XML data: tree structure
<Purchase_Doc>
  <PR_Number val="50"/>
  <Supp_Name>ABC</Supp_Name>
  <Address><City>Florham Park</City><State>New Jersey</State></Address>
  <Line_Items>
    <Item><Part_Number val="1050"/><Quantity val="20"/></Item>
Data stream: ~ SAX events
[element Purchase_Doc anyType]
[element PR_Number anyType]
[attribute val anySimpleType][chardata 50][end-attribute][end-element]
[element Supp_Name anyType][text ABC][end-element]…
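The view of an XML document as a stream of events can be reproduced with Python's standard `xml.sax` module. This is a sketch only: the event tuples below are a simplified format chosen for illustration and do not carry the type annotations (anyType, anySimpleType) shown above.

```python
import xml.sax

class EventCollector(xml.sax.ContentHandler):
    """Records SAX callbacks as a flat list of event tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("element", name))
        for attr_name in attrs.getNames():
            self.events.append(("attribute", attr_name, attrs.getValue(attr_name)))

    def characters(self, content):
        if content.strip():            # ignore whitespace-only text
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end-element", name))

def sax_events(xml_text):
    handler = EventCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events

doc = '<Purchase_Doc><PR_Number val="50"/><Supp_Name>ABC</Supp_Name></Purchase_Doc>'
events = sax_events(doc)
```

A streaming query processor would consume each event as it is produced rather than collecting them in a list; the list is used here only to make the event sequence visible.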
XML Query Languages
• XML query languages: XQuery, XSLT, XPath
• Declarative matching of structured data and text
• Easy restructuring to meet needs of data consumers
XML Streams: Research Issues
• Efficient processing of single/multiple queries (e.g., XFilter/YFilter)
• Blocking operators/constructs in XQuery, e.g., XQuery's new function definition mechanisms are blocking
• Integration of relational and XML DSMSs, just as relational and XML DBMSs are now being integrated.
Prototype Systems
• Aurora (Brandeis, Brown, MIT) [CCC+02]
• Gigascope (AT&T) [CJSS03]
• Hancock (AT&T) [CFP+00]
• STREAM (Stanford) [MWA+03]
• Telegraph (Berkeley) [CCD+03]
• …
• Stream Mill (UCLA)
Aurora (Brandeis, Brown, MIT)
• Geared towards monitoring applications (streams, triggers, imprecise data, real-time requirements)
• Specified set of operators, connected in a data flow graph
• Optimization of the data flow graph
• Three query modes (continuous, ad-hoc, view)
• Aurora accepts QoS specifications and attempts to optimize QoS for the outputs produced
• Real-time scheduling, introspection and load shedding
AT&T: Hancock and Gigascope
• Hancock: a C-based domain-specific language which facilitates signature extraction from transactional data streams.
  • Signature: characterizes the behavior of customers or services
  • Support for efficient and tunable representation of signature collections
  • Support for custom scalable persistent data structures
  • Elaborate statistics collection from streams
• Gigascope: SQL-based DSMS for monitoring of network data
STREAM [Stanford University]
• General-purpose stream data manager
• CQL (continuous query language) for declarative query specification
• Consider query plan generation
• Resource management:
  • Operator scheduling
  • Static and dynamic approximations
Telegraph [UCB]
• Continuous query processing system
• Support for stream-oriented operators
• Support for adaptivity in query processing
• Various aspects of optimized multi-query stream processing
Commercial Systems
• Sybase: publish-subscribe using MQ (Memory Queues)
  • MQs are in-memory tables processed using active rules and stored procedures
  • Similar solutions in Oracle and Teradata. But IBM's MQSeries and Microsoft's MSMQ are web-service oriented: Java Message Service (JMS), WebSphere, CORBA.
• Two DSMS startups:
  • CORAL8: http://coral8.com/
  • Streambase: http://www.streambase.com/
More Tutorial Talks
• Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, Jennifer Widom. http://theory.stanford.edu/~rajeev/pods-full-talk.ppt
• Nick Koudas and Divesh Srivastava. Data stream query processing. Tutorial presented at International Conference on Very Large Databases (VLDB), 1149, 2003.
• Nick Koudas et al. Matching XML Documents Approximately (with S. Yahia and D. Srivastava). Tutorial delivered at ICDE 2003.
• Nick Koudas et al. Stream Data Management: Research Directions and Opportunities. Invited talk at IDEAS 2002.
• Nick Koudas et al. Mining Data Streams (with S. Guha). Invited tutorial delivered at PAKDD 2003.
Implementation Approaches for Continuous Queries on Streaming XML
• Automata-based techniques:
  • XFilter [AF00]: finite state machine per path expression
  • XTrie [CFGR02]: shares common sub-paths of PC paths
  • YFilter [DF03]: single NFA for all path expressions
  • [GMOS03]: single DFA, limitations on flexibility
  • XPush [GS03]: pushdown automaton for tree patterns
• Index-based techniques:
  • MatchMaker [LP02]: shared tree patterns
  • IndexFilter [BGKS03]: shared path expressions, comparison
XML Stream Processing: Key Ideas
• Obtain bindings of for clause path expression variables
  • Ordered sequence, no duplicates
• Filter bindings using where clause path expression predicates
  • Existential check suffices
• Compute bindings of return clause path expressions
  • Ordered (possibly null) sequence
• Goal: efficient matching/binding of XML path expressions
  • Very large number of path expressions