Continuous Data Stream Processing
MAKE Lab
Post-Excellence Project, Subproject 6
Date: 2006/03/07
Music Virtual Channel
[Figure: system architecture. Components: music collections, music metadata, clustering engine, Internet, V.C. player, filtering engine, music channel simulator, interface, profile monitor, channel monitor, favorite channel, cluster monitor, cluster coordinator, peer search engine, profile database, MusicXML database, XML filtering engine]
Research Directions
Streaming Data
Management
Mining
Filtering
Temporal Query Processing
Spatial Query Processing
Aggregate Query Processing
Frequent Tree Pattern Mining
Frequent Itemset Mining (sliding window)
Sequence Query Matching
Episode Query Matching
Range Search
KNN Search
Top-K Search
Closed Tree Pattern Mining
Frequent Itemset Mining (landmark model)
Sequence Query Matching
Given a set of sequence queries (SQs), how can we continuously monitor the event stream and, as soon as a segment arrives, report it if it is an approximate answer to some query within that query's error bound?
Event Stream: <a,b,c,d> <c,e> <a,b,c> <b,d> <a,d> <e,f> <a,e> <a,b,c> <e,f> <a,b,c> <e> <b,c,e> <d,f> ...
Sequence Query: <a,b,c> <b,d> <a,c,d> <e,f> <a,e>, ε=1
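The slide does not say how the error between a stream segment and a query is measured. The following is a minimal sketch under one assumed measure: a segment has the same number of itemsets as the query, and its error is the number of query items missing from the aligned itemsets. The names `segment_error` and `monitor` are hypothetical.

```python
from collections import deque

def segment_error(query, segment):
    # Assumed error measure: query items missing from the aligned itemsets.
    return sum(len(q - s) for q, s in zip(query, segment))

def monitor(stream, query, eps):
    """Yield the end index of every window of len(query) itemsets within eps."""
    window = deque(maxlen=len(query))  # sliding window over the latest itemsets
    for i, itemset in enumerate(stream):
        window.append(frozenset(itemset))
        if len(window) == len(query) and segment_error(query, window) <= eps:
            yield i

# Event stream and sequence query taken from the slide.
stream = ["abcd", "ce", "abc", "bd", "ad", "ef", "ae",
          "abc", "ef", "abc", "e", "bce", "df"]
query = [frozenset(s) for s in ["abc", "bd", "acd", "ef", "ae"]]
matches = list(monitor(stream, query, eps=1))
```

Under this assumed measure, the only window reported for the slide's example is <a,b,c> <b,d> <a,d> <e,f> <a,e>, which misses the single item c of the query.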
Episode Query Matching
Knowledge Discovery from Telecommunication Network Alarm Databases [ICDE96]
If an alarm of type A occurs, then an alarm of type B occurs within 30 seconds with probability 0.8
If alarms of types A and B occur within 5 seconds, then an alarm of type C occurs within 60 seconds with probability 0.7
If an alarm of type A precedes an alarm of type B, and C precedes D, all within 15 seconds, then E will follow within 4 minutes with probability 0.6
[Figure: episode diagrams for the three rules: a single alarm A; alarms A and B within 5 seconds; A before B and C before D within 15 seconds]
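As an illustration of the first kind of rule, the probability can be estimated from an alarm log as the fraction of antecedent alarms that are followed by the consequent alarm within the time window. This is only a sketch: the function name `rule_confidence` and the sample log are invented for the example and do not come from [ICDE96].

```python
def rule_confidence(events, antecedent, consequent, window):
    """Fraction of `antecedent` alarms followed by `consequent` within `window` seconds.

    `events` is a list of (timestamp_seconds, alarm_type) sorted by timestamp.
    """
    hits = total = 0
    for i, (t, kind) in enumerate(events):
        if kind != antecedent:
            continue
        total += 1
        for t2, kind2 in events[i + 1:]:
            if t2 - t > window:
                break  # later alarms are outside the window
            if kind2 == consequent:
                hits += 1
                break
    return hits / total if total else 0.0

# Hypothetical alarm log, for illustration only.
log = [(0, "A"), (10, "B"), (50, "A"), (100, "B"), (120, "A"),
       (140, "B"), (200, "A"), (215, "C"), (225, "B"), (300, "A")]
confidence = rule_confidence(log, "A", "B", window=30)
```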
Top-K Query
Suppose two continuous queries are registered. Then another continuous query is registered.
[Figure: a coordinator connected to Server 1, Server 2, Server 3, and Server 4]
Queries
Which two web documents are the most popular across the first and second servers?
Which two web documents are the most popular across the third and fourth servers?
Which two web documents are the most popular across the second and third servers?
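To make the three queries concrete, here is a naive baseline in which the coordinator sums the per-server hit counts for the servers a query names and keeps the k largest. It is a sketch of the problem, not of the slides' solution: it ignores communication cost and shares nothing between queries. The counts and the name `top_k` are made up for the example.

```python
from collections import Counter

def top_k(server_counts, servers, k):
    """Return the k documents with the highest total count over `servers`."""
    total = Counter()
    for s in servers:
        total.update(server_counts[s])  # add this server's hit counts
    # Sort by descending count, breaking ties by document id.
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc for doc, _ in ranked[:k]]

# Hypothetical hit counts per server.
server_counts = {
    1: {"d1": 5, "d2": 3, "d3": 1},
    2: {"d1": 1, "d2": 4, "d4": 6},
    3: {"d3": 7, "d4": 2},
    4: {"d1": 2, "d5": 9},
}
first = top_k(server_counts, [1, 2], k=2)
second = top_k(server_counts, [3, 4], k=2)
third = top_k(server_counts, [2, 3], k=2)
```

The second and third queries both read the counts of servers that another query already needs, which is where sharing information between queries can pay off.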
Main Difficulties
Heavy Communication Cost
  Each server updates its current data only when necessary
Multiple Continuous Queries
  Most papers focus on one-time top-k queries or a single continuous top-k query
  Information sharing is necessary
Spatial Query Processing
Continuous queries for moving objects in high-dimensional space
  Range search
  KNN search
[Figure: a search engine and several V.C. players; labels: user profile, channel, recommended channel, selected channel, Vote Mechanism]
Problem Definition
Given a set of objects with their positions in an N-dimensional (N>20) region
The set of objects is highly dynamic: each object can move in an unrestricted fashion, i.e., we do not assume any pattern of motion
Continuously monitor the results of each query point
  Range Query
  KNN Query
Main Difficulties
Heavy Communication Cost
  Object updates occur only when the results for some queries might change
  • Safe Region [SIGMOD05]
Incremental Update
  Efficiently maintain the effective results
Multiple Continuous Queries
  Decide the quarantine area for each query
Mixed Types of Queries
  Support both the range query and the KNN query
[Figure: pairs of queries Q1 and Q2]
Range Query
Query Q: (x,y), r
Cell C
  A: max < r
  B: min ≤ r ≤ max
  C: min > r
  max: maximum dis(query, cell)
  min: minimum dis(query, cell)
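The three cell labels can be computed directly from the minimum and maximum distance between the query point and a cell. The sketch below assumes that cells are axis-aligned boxes given by their lower and upper corners and that dis is the Euclidean distance; neither is stated on the slide. The function names are hypothetical.

```python
import math

def min_max_dist(q, lo, hi):
    """Minimum and maximum Euclidean distance from point q to the box [lo, hi]."""
    dmin2 = dmax2 = 0.0
    for x, a, b in zip(q, lo, hi):
        near = max(a - x, 0.0, x - b)         # 0 if x lies inside [a, b]
        far = max(abs(x - a), abs(x - b))     # distance to the farther face
        dmin2 += near * near
        dmax2 += far * far
    return math.sqrt(dmin2), math.sqrt(dmax2)

def classify_cell(q, r, lo, hi):
    """Label a cell A (fully inside), B (partially covered), or C (fully outside)."""
    dmin, dmax = min_max_dist(q, lo, hi)
    if dmax < r:
        return "A"
    if dmin > r:
        return "C"
    return "B"

labels = [classify_cell((0, 0), 5, lo, hi)
          for lo, hi in [((1, 1), (2, 2)), ((3, 3), (4, 4)), ((6, 6), (7, 7))]]
```

The loop over coordinates works for any number of dimensions, so the same test applies to the N-dimensional setting of the problem definition.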
Range Query (Cont.)
Moving Query MQ
How to maintain the result for an MQ?
Range Query (Cont.)
When to update?
Q1  Q2  Q3  Action
A   A   A   No update and no recalculation
A   A   B   Update and recalculate for some queries
A   A   C   No update and no recalculation
We only need to consider those objects marked with B
[Figure: client holding flag = 0/1; server holding Q1, Q2, Q3]
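One way to read the flag is as a per-object bit that is set exactly when the object's cell is labeled B for at least one query, so that only those objects report their movements. This reading is an assumption, and the function names below are hypothetical.

```python
def update_flag(cell_labels):
    """Return 1 if any query labels the object's cell B, else 0."""
    return 1 if "B" in cell_labels.values() else 0

def queries_to_recalculate(cell_labels):
    """Queries whose results may change when an object moves inside this cell."""
    return [q for q, label in cell_labels.items() if label == "B"]

# The three rows of the table, as labels of one cell under Q1, Q2, Q3.
rows = [
    {"Q1": "A", "Q2": "A", "Q3": "A"},
    {"Q1": "A", "Q2": "A", "Q3": "B"},
    {"Q1": "A", "Q2": "A", "Q3": "C"},
]
flags = [update_flag(row) for row in rows]
```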
Range Query (Cont.)
For a range query Q
  Result list: O3 O5 O7
  Covered cells
    A: C3 C4 C5
    B: C2 C7 C9
For a cell C (C2 in the figure)
  Affected queries
    A: Q2 Q4 Q7
    B: Q3 Q6 Q9
Query Motion
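The bookkeeping named on this slide (a result list and covered cells per query, affected queries per cell) could be held in structures like the following. This is a sketch under assumptions: the class and field names are invented, the distance is Euclidean, and only the handling of an object update is shown, not query motion.

```python
import math
from dataclasses import dataclass, field

@dataclass
class RangeQuery:
    center: tuple
    radius: float
    result: set = field(default_factory=set)      # result list (object ids)
    covered: dict = field(default_factory=dict)   # cell id -> "A" or "B"

@dataclass
class Cell:
    affected: dict = field(default_factory=dict)  # query id -> "A" or "B"

def on_object_update(obj_id, pos, cell, queries):
    """Re-check only the queries that label the object's cell B."""
    changed = []
    for qid, label in cell.affected.items():
        if label != "B":
            continue  # an A cell lies fully inside the range: nothing to re-check
        q = queries[qid]
        inside = math.dist(pos, q.center) <= q.radius
        if inside != (obj_id in q.result):
            if inside:
                q.result.add(obj_id)
            else:
                q.result.discard(obj_id)
            changed.append(qid)
    return changed

queries = {
    "Q2": RangeQuery((0, 0), 100.0, result={"O3"}, covered={"C2": "A"}),
    "Q3": RangeQuery((0, 0), 5.0, result={"O3"}, covered={"C2": "B"}),
}
cell_c2 = Cell(affected={"Q2": "A", "Q3": "B"})
changed = on_object_update("O3", (6, 0), cell_c2, queries)
```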
KNN Query
Query Q: (x,y), k = 3
Object Update
[Figure: object updates labeled "update the order" and "re-computation"]
KNN Query (Cont.)
Query Q: (x,y), k = 3
Query Q’: (x’,y’), r
r = d’max
KNN Query (Cont.)
Query Q: (x,y), k = 3
Query Q’: (x’,y’), r
r = dmax + dquery
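One reading of this slide is that when the query point moves by dquery, the new k nearest neighbors can be found with a range query of radius r = dmax + dquery around the new position. By the triangle inequality that range still contains the k old neighbors, so it contains at least k objects and therefore the new answer. The sketch below assumes that dmax is the distance from the old query point to its k-th neighbor and that the objects themselves have not moved; the function names are hypothetical.

```python
import math

def knn(points, q, k):
    """Brute-force k nearest neighbors of q."""
    return sorted(points, key=lambda p: math.dist(p, q))[:k]

def knn_after_query_move(points, old_q, old_knn, new_q, k):
    """Answer the moved KNN query with a range query of radius dmax + dquery."""
    dmax = max(math.dist(p, old_q) for p in old_knn)
    dquery = math.dist(old_q, new_q)
    r = dmax + dquery
    candidates = [p for p in points if math.dist(p, new_q) <= r]
    return knn(candidates, new_q, k)

# Hypothetical object positions.
points = [(0, 0), (1, 0), (0, 2), (5, 5), (6, 1), (-3, 0), (2, 2)]
old_q, new_q = (0, 0), (4, 2)
old_knn = knn(points, old_q, 3)
new_knn = knn_after_query_move(points, old_q, old_knn, new_q, 3)
```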
KNN Query (Cont.)
Query Q: (x,y), k = 3
Query Q’: (x’,y’), r
r = dmax + dcell
Tree Pattern Mining
As the trees stream in, find the subtrees that occur more than θ·N times, where N is the number of trees received so far and 0 ≦ θ ≦ 1
[Figure: input trees T1, T2, T3; STMer; output: Frequent Tree Patterns]
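The frequency criterion can be illustrated with a deliberately simplified stand-in. The sketch below is not STMer, which the slide does not describe: it counts exactly, and it uses the smallest possible pattern, a single parent-child edge, instead of general subtrees. The tree encoding `(label, [children])` and all names are assumptions made for the example.

```python
from collections import Counter

def edges(tree):
    """Return the set of distinct (parent, child) label pairs in a tree."""
    label, children = tree
    found = set()
    for child in children:
        found.add((label, child[0]))
        found |= edges(child)
    return found

class EdgeCounter:
    """Exact per-edge counts over a stream of trees (simplified stand-in)."""
    def __init__(self, theta):
        self.theta, self.n, self.counts = theta, 0, Counter()

    def add(self, tree):
        self.n += 1
        self.counts.update(edges(tree))  # each edge counted once per tree

    def frequent(self):
        # Patterns occurring in more than theta * N of the trees seen so far.
        return {p for p, c in self.counts.items() if c > self.theta * self.n}

# Hypothetical input trees.
t1 = ("A", [("B", [("C", []), ("D", [])])])
t2 = ("A", [("B", [("C", [])])])
t3 = ("B", [("C", []), ("D", [])])
counter = EdgeCounter(theta=0.7)
for t in (t1, t2, t3):
    counter.add(t)
frequent_edges = counter.frequent()
```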
Closed Tree Pattern Mining
Mining closed frequent subtrees over data streams
A subtree is closed if none of its proper supertrees has the same support as it has
[Figure: input trees over the labels A, B, C, D and their frequent subtrees, each annotated with its support (2 or 3); the closed subtrees are marked]
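The closedness test itself is a simple filter once the supports are known. In the sketch below each pattern is represented as a frozenset of parent-child edges and "proper supertree" is approximated by "proper superset of edges"; both are simplifying assumptions, and the supports are example values, not read off the figure. Checking only frequent patterns is enough, because a supertree with the same support as a frequent pattern is itself frequent.

```python
def closed_patterns(support):
    """Keep the patterns with no proper superset of equal support."""
    return {
        p for p, s in support.items()
        if not any(p < q and s == t for q, t in support.items())
    }

AB, BC, BD = ("A", "B"), ("B", "C"), ("B", "D")
# Hypothetical supports of the frequent patterns.
support = {
    frozenset([AB]): 2,
    frozenset([BC]): 3,
    frozenset([BD]): 2,
    frozenset([AB, BC]): 2,
    frozenset([BC, BD]): 2,
}
closed = closed_patterns(support)
```

Here the single edges A-B and B-D are not closed, because each has a supertree with the same support of 2, while B-C is closed because every supertree of it has lower support.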