continuous data stream processing


Post on 05-Jan-2016


DESCRIPTION

Continuous Data Stream Processing. MAKE Lab. Post-Excellence Project Subproject 6. Date: 2006/03/07. Peer search engine. Profile database. Cluster coordinator. Cluster monitor. Music channel simulator. XML Filtering engine. MusicXML database. Music Virtual Channel. Clustering - PowerPoint PPT Presentation

TRANSCRIPT

  • Continuous Data Stream Processing. MAKE Lab. Date: 2006/03/07. Post-Excellence Project, Subproject 6.

    Continuous Data Stream Processing*

    Music Virtual Channel
    [Architecture diagram. Labels: clustering engine; music metadata; music collections; Internet; V.C. players; filtering engine; interface; profile monitor; channel monitor; favorite channel]

    Research Directions
    Streaming Data Management: Mining; Filtering; Temporal Query Processing; Spatial Query Processing; Aggregate Query Processing
    Topics: Frequent Tree Pattern Mining; Closed Tree Pattern Mining; Frequent Itemset Mining (sliding window); Frequent Itemset Mining (landmark model); Sequence Query Matching; Episode Query Matching; Range Search; KNN Search; Top-K Search

    Sequence Query Matching
    Given a set of sequence queries (SQs), how can we continuously monitor the event stream and report, as soon as they arrive, the segments that approximately answer a query within that query's error bound?
    [Figure labels: Event Stream; Sequence Query, =1]
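
One way to picture the problem is a sliding-window matcher. The sketch below is only an illustration under assumptions the slide does not state: events are single characters, every query is a fixed-length pattern with an integer error bound, and the error of a segment is the number of positions where it differs from the pattern. The names `monitor` and `q1` are hypothetical.

```python
from collections import deque


def monitor(stream, queries):
    """Yield (end_position, query_name, segment) for approximate matches.

    queries maps a name to (pattern, error_bound). The error of a segment is
    the number of positions where it differs from the pattern (an assumed
    measure; the slide does not name one).
    """
    longest = max(len(pattern) for pattern, _ in queries.values())
    window = deque(maxlen=longest)  # keep only the most recent events
    for position, event in enumerate(stream):
        window.append(event)
        recent = list(window)
        for name, (pattern, bound) in queries.items():
            if len(recent) < len(pattern):
                continue  # not enough events yet for this query
            segment = recent[-len(pattern):]
            errors = sum(a != b for a, b in zip(segment, pattern))
            if errors <= bound:
                yield position, name, "".join(segment)


# Hypothetical stream and query, used only to exercise the sketch.
matches = list(monitor("AABCDAB", {"q1": ("ABD", 1)}))
```

Each arriving event triggers one check per query, so a match is reported at the position where its last event arrives.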

    Episode Query Matching
    Knowledge Discovery from Telecommunication Network Alarm Databases [ICDE96]
    • If an alarm of type A occurs, then an alarm of type B occurs within 30 seconds with probability 0.8.
    • If alarms of types A and B occur within 5 seconds, then an alarm of type C occurs within 60 seconds with probability 0.7.
    • If an alarm of type A precedes an alarm of type B, and C precedes D, all within 15 seconds, then E will follow within 4 minutes with probability 0.6.

    AABCDAB
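
The probability attached to a rule such as "if A occurs, then B occurs within 30 seconds" can be estimated from a timestamped alarm log. The following is a minimal sketch, assuming the log is a list of `(timestamp_seconds, alarm_type)` pairs sorted by time; the function name and the sample log are hypothetical and are not taken from the cited paper.

```python
def rule_confidence(alarms, antecedent, consequent, within):
    """Fraction of `antecedent` alarms followed by `consequent` within `within` seconds.

    alarms is a list of (timestamp_seconds, alarm_type) sorted by timestamp.
    """
    triggers = 0
    followed = 0
    for i, (t, kind) in enumerate(alarms):
        if kind != antecedent:
            continue
        triggers += 1
        for later_t, later_kind in alarms[i + 1:]:
            if later_t - t > within:
                break  # sorted input: nothing later can be in range
            if later_kind == consequent:
                followed += 1
                break
    return followed / triggers if triggers else 0.0


# Hypothetical log, not data from the cited paper.
log = [(0, "A"), (10, "B"), (50, "A"), (100, "C"), (120, "A"), (140, "B")]
confidence = rule_confidence(log, "A", "B", within=30)
```

Here two of the three A alarms are followed by a B within 30 seconds, so the estimate is 2/3.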

    Top-K Query
    Suppose two continuous queries are already registered. Then another continuous query is registered.

    [Figure: a coordinator connected to Server 1, Server 2, Server 3, and Server 4]
    Queries:
    • Which two web documents are the most popular across the first and second servers?
    • Which two web documents are the most popular across the third and fourth servers?
    • Which two web documents are the most popular across the second and third servers?
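
As a baseline for the three queries above, the coordinator could sum per-server counts and rank the totals. This is a sketch under assumptions: each server exposes a hit count per document, popularity is the sum of counts, and the names `top_k`, `server_counts`, and the documents `d1` to `d3` are hypothetical.

```python
from collections import Counter


def top_k(server_counts, servers, k):
    """Return the k most popular documents summed over the chosen servers."""
    total = Counter()
    for server in servers:
        total.update(server_counts[server])
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(total.items(), key=lambda item: (-item[1], item[0]))
    return [doc for doc, _ in ranked[:k]]


# Hypothetical hit counts per server; the document names are made up.
server_counts = {
    1: {"d1": 5, "d2": 3, "d3": 1},
    2: {"d1": 1, "d2": 4, "d3": 2},
    3: {"d1": 0, "d2": 1, "d3": 6},
    4: {"d1": 2, "d2": 0, "d3": 3},
}
q1 = top_k(server_counts, [1, 2], 2)
q2 = top_k(server_counts, [3, 4], 2)
q3 = top_k(server_counts, [2, 3], 2)
```

Recomputing from full counts like this ignores communication cost; note that the third query reuses data from servers already involved in the first two.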

    Main Difficulties
    • Heavy communication cost: the server updates its current data only when necessary.
    • Multiple continuous queries: most papers focus on one-time top-k queries or a single continuous top-k query. Information sharing is necessary.
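
To illustrate "updates only when necessary", the sketch below uses a slack-based filter: a server reports a document's count only after it has drifted more than a fixed slack from the last reported value. This is a common generic technique chosen here for illustration, not necessarily the method behind the slides; the `Server` class, the `slack` parameter, and the dictionary standing in for the coordinator are all hypothetical.

```python
class Server:
    """Reports a document's count only when it drifts past a slack."""

    def __init__(self, slack):
        self.slack = slack
        self.current = {}   # true local counts
        self.reported = {}  # counts last sent to the coordinator
        self.messages = 0   # number of updates actually sent

    def hit(self, doc, coordinator):
        self.current[doc] = self.current.get(doc, 0) + 1
        drift = self.current[doc] - self.reported.get(doc, 0)
        if drift > self.slack:
            # Send the accumulated difference in one message.
            coordinator[doc] = coordinator.get(doc, 0) + drift
            self.reported[doc] = self.current[doc]
            self.messages += 1


coordinator_view = {}  # stand-in for the coordinator's totals
server = Server(slack=2)
for _ in range(7):
    server.hit("d1", coordinator_view)
```

Seven hits cost two messages, and the coordinator's view stays within the slack of the true count.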

    Spatial Query Processing
    Continuous queries for moving objects in high-dimensional space:
    • Range search
    • KNN search
    [Figure labels: search engine; V.C. players; user profile; channel; recommended channel; selected channel; vote mechanism]

    Problem Definition
    • Given: a set of objects with their positions in an N-dimensional (N > 20) region.
    • The set of objects is highly dynamic: each object can move in an unrestricted fashion, i.e., we do not assume any pattern of motion.
    • Goal: continuously monitor the results of each query point, for both range queries and KNN queries.

    Main Difficulties
    • Heavy communication cost: object updates occur only when the results of some queries might change (Safe Region [SIGMOD05]).
    • Incremental update: efficiently maintain the effective results.
    • Multiple continuous queries: decide the quarantine area for each query.
    • Mixed types of queries: support both the range query and the KNN query.

    Range Query
    For a cell C and a query with radius r:
    • A: max < r
    • B: min ≤ r ≤ max
    • C: min > r
    where max is the largest dis(query, cell) and min is the smallest dis(query, cell).
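
The three cases can be checked directly once min and max are computed. The sketch below assumes each cell is an axis-aligned box given by its lower and upper corners and that dis is Euclidean distance; neither is stated on the slide, and the function names are hypothetical.

```python
import math


def min_max_distance(query, lo, hi):
    """Smallest and largest Euclidean distance from query to the box [lo, hi]."""
    min_sq = 0.0
    max_sq = 0.0
    for q, low, high in zip(query, lo, hi):
        # Nearest point of the box along this axis is q clamped to [low, high].
        nearest_gap = max(low - q, 0.0, q - high)
        # Farthest point along this axis is whichever face is farther from q.
        farthest_gap = max(abs(q - low), abs(q - high))
        min_sq += nearest_gap ** 2
        max_sq += farthest_gap ** 2
    return math.sqrt(min_sq), math.sqrt(max_sq)


def classify_cell(query, r, lo, hi):
    """Return 'A' (max < r), 'B' (min <= r <= max), or 'C' (min > r)."""
    d_min, d_max = min_max_distance(query, lo, hi)
    if d_max < r:
        return "A"
    if d_min > r:
        return "C"
    return "B"


# Hypothetical 2-D cells around a query at the origin with radius 5.
inside = classify_cell((0.0, 0.0), 5.0, (1.0, 1.0), (2.0, 2.0))
partial = classify_cell((0.0, 0.0), 5.0, (3.0, 3.0), (6.0, 6.0))
outside = classify_cell((0.0, 0.0), 5.0, (6.0, 0.0), (7.0, 1.0))
```

Objects in an A cell belong to the result without a distance check, objects in a C cell can be ignored, and only B cells need per-object distance tests.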

    Range Query (Cont.)
    Moving Query MQ

    How to maintain the result for an MQ?

    Range Query (Cont.)
    When to update?
    [Figure labels: queries Q1, Q2, Q3; cells marked A, B, C]

    Range Query (Cont.)
    For a range query Q:
    • Result list: O3, O5, O7
    • Covered cells: A: C3, C4, C5; B: C2, C7, C9
    For a cell C:
    • Affected queries: A: Q2, Q4, Q7; B: Q3, Q6, Q9

    KNN Query
    Query Q: (x,y), 3

    KNN Query (Cont.)
    Query Q: (x,y), 3

    Query Q: (x,y), r, where r = dmax
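
The step from a KNN query to a range query with r = dmax can be sketched by brute force. The code assumes Euclidean distance and reads dmax as the distance from the query to its k-th nearest object; the object ids and coordinates are hypothetical.

```python
import math


def knn_with_radius(query, objects, k):
    """Return (ids of the k nearest objects, dmax) by brute force.

    objects maps an id to a coordinate tuple; dmax is the distance from the
    query to its k-th nearest object. Requires 1 <= k <= len(objects).
    """
    ranked = sorted(objects, key=lambda oid: (math.dist(query, objects[oid]), oid))
    nearest = ranked[:k]
    d_max = math.dist(query, objects[nearest[-1]])
    return nearest, d_max


def range_query(query, objects, r):
    """Ids of all objects within distance r of the query."""
    return sorted(oid for oid, pos in objects.items() if math.dist(query, pos) <= r)


# Hypothetical object positions.
objects = {"o1": (1, 0), "o2": (0, 2), "o3": (3, 0), "o4": (5, 5), "o5": (-4, 0)}
nearest, d_max = knn_with_radius((0, 0), objects, 3)
same_as_range = range_query((0, 0), objects, d_max)
```

With r = dmax the range query returns exactly the three nearest objects as long as no other object ties at distance dmax.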

    KNN Query (Cont.)
    Query Q: (x,y), 3

    Query Q: (x,y), r, where r = dmax + dquery
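
A plausible reading of r = dmax + dquery is that dquery is the distance the query point has moved. Under that assumption, and assuming the objects stay put in the meantime, the triangle inequality places each of the old k nearest objects within dmax + dquery of the new position, so the new k nearest neighbors also lie in that ball. The sketch below checks this on hypothetical data; the function names are illustrative.

```python
import math


def candidates_after_move(old_query, new_query, objects, k):
    """Objects that can still be among the k nearest after the query moves.

    Assumes dquery is the distance the query point moved and that the
    objects themselves did not move in the meantime.
    """
    distances = sorted(math.dist(old_query, pos) for pos in objects.values())
    d_max = distances[k - 1]                    # k-th nearest before the move
    d_query = math.dist(old_query, new_query)   # how far the query moved
    r = d_max + d_query
    return sorted(oid for oid, pos in objects.items()
                  if math.dist(new_query, pos) <= r)


def brute_force_knn(query, objects, k):
    """Reference answer: ids of the k nearest objects."""
    ranked = sorted(objects, key=lambda oid: (math.dist(query, objects[oid]), oid))
    return ranked[:k]


# Hypothetical object positions; the query moves from (0, 0) to (2, 0).
objects = {"o1": (1, 0), "o2": (0, 2), "o3": (3, 0), "o4": (5, 5), "o5": (-4, 0)}
candidates = candidates_after_move((0, 0), (2, 0), objects, 3)
new_knn = brute_force_knn((2, 0), objects, 3)
```

Only the candidates need to be re-ranked after the move; objects outside the enlarged radius cannot enter the result.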

    KNN Query (Cont.)
    Query Q: (x,y), 3

    Query Q: (x,y), r, where r = dmax + dcell

    Tree Pattern Mining
    As the trees stream in, find the subtrees that occur more than a given fraction of N times, where N is the number of trees received so far and the fraction is a threshold between 0 and 1.
    [Figure labels: T1; STMer; Frequent Tree Patterns]
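
The counting task can be illustrated with exact counts over a restricted pattern class. The sketch below is not STMer: it assumes unordered labeled trees written as `(label, [children])`, counts only subtrees rooted at a node and including all its descendants, counts a pattern at most once per tree, and keeps every pattern in memory. All names are hypothetical.

```python
from collections import Counter


def canonical(tree):
    """Canonical string of an unordered labeled tree given as (label, [children])."""
    label, children = tree
    return "(" + label + "".join(sorted(canonical(c) for c in children)) + ")"


def bottom_up_subtrees(tree):
    """Set of canonical forms of the subtrees rooted at each node of `tree`."""
    found = {canonical(tree)}
    for child in tree[1]:
        found |= bottom_up_subtrees(child)
    return found


class StreamCounter:
    def __init__(self, threshold):
        self.threshold = threshold  # assumed support fraction in (0, 1]
        self.n = 0                  # number of trees received so far
        self.counts = Counter()

    def add(self, tree):
        self.n += 1
        # Count each pattern at most once per tree.
        self.counts.update(bottom_up_subtrees(tree))

    def frequent(self):
        """Patterns occurring in more than threshold * n of the trees seen."""
        return sorted(p for p, c in self.counts.items() if c > self.threshold * self.n)


# Hypothetical stream of three small trees.
counter = StreamCounter(threshold=0.5)
counter.add(("a", [("b", []), ("c", [])]))
counter.add(("a", [("c", []), ("b", [])]))
counter.add(("d", [("b", [])]))
patterns = counter.frequent()
```

Sorting the children's encodings makes the first two trees count as the same pattern even though their children arrive in different orders.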

    Closed Tree Pattern Mining
    Mining closed frequent subtrees over data streams: a subtree is closed if none of its proper supertrees has the same support as it has.
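
The closedness test can be shown on a small table of pattern supports. This sketch makes simplifying assumptions: trees are ordered and written as nested tuples `(label, (children...))`, a supertree is any listed pattern that contains the smaller tree rooted at one of its nodes, and the supports are hypothetical values rather than mined results.

```python
def subtrees(tree):
    """All subtrees rooted at a node of `tree`; a tree is (label, (children...))."""
    yield tree
    for child in tree[1]:
        yield from subtrees(child)


def is_proper_supertree(big, small):
    """True if `small` occurs inside `big` and the two differ."""
    return big != small and any(s == small for s in subtrees(big))


def closed_patterns(support):
    """Keep patterns that have no proper supertree with equal support."""
    return [
        p for p in support
        if not any(is_proper_supertree(q, p) and support[q] == support[p]
                   for q in support)
    ]


# Hypothetical patterns and supports.
b = ("b", ())
c = ("c", ())
abc = ("a", (b, c))
db = ("d", (b,))
support = {b: 3, c: 2, abc: 2, db: 1}
closed = closed_patterns(support)
```

Pattern `c` is dropped because its supertree `abc` has the same support of 2, while `b` stays because neither of its supertrees matches its support of 3.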