Transcript

Continuous Data Stream Processing

Music Virtual Channel – extensions
Data Stream Monitoring – tree pattern mining
Continuous Query Processing – sequence queries

Date: 2005/10/21
Post-Excellence Project
Subproject 6

Continuous Data Stream Management


Music Virtual Channel Extensions

[Figure: system architecture. Components: music collections, Internet, music channel simulator, V.C. player, interface, filtering engine, XML filtering engine, clustering engine, cluster coordinator, peer search engine, profile monitor, cluster monitor, channel monitor, favorite channel, music metadata, profile database, MusicXML database. Numbered labels 1, 2, …, N also appear.]


An Extension on Virtual Channel

After a player starts a range (or kNN) search:
  It updates its profile periodically
  The search results are continuously maintained

[Figure: bar charts of genre profiles (POP, BLUE, ROCK, LATIN, JAZZ, DANCE; vertical axis 0% to 50%) for the V.C. player issuing the query and for its peer V.C. players.]
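A minimal sketch of the continuous range and kNN search over player profiles. It assumes that a profile is a distribution over the six genres in the charts and that similarity is Euclidean distance; the class name `PeerSearch` and its methods are hypothetical, and results are recomputed by brute force rather than through an index.

```python
import math

GENRES = ["POP", "BLUE", "ROCK", "LATIN", "JAZZ", "DANCE"]


def distance(p, q):
    """Euclidean distance between two genre profiles (missing genres count as 0)."""
    return math.sqrt(sum((p.get(g, 0.0) - q.get(g, 0.0)) ** 2 for g in GENRES))


class PeerSearch:
    """Hypothetical brute-force peer search; every query rescans all profiles."""

    def __init__(self):
        self.profiles = {}  # player name -> genre profile

    def update_profile(self, player, profile):
        # Players call this periodically; later searches see the new profile.
        self.profiles[player] = dict(profile)

    def range_search(self, player, radius):
        """Peers whose profile lies within `radius` of the player's profile."""
        me = self.profiles[player]
        return sorted(
            other
            for other, profile in self.profiles.items()
            if other != player and distance(me, profile) <= radius
        )

    def knn_search(self, player, k):
        """The k peers closest to the player's profile, nearest first."""
        me = self.profiles[player]
        ranked = sorted(
            (distance(me, profile), other)
            for other, profile in self.profiles.items()
            if other != player
        )
        return [other for _, other in ranked[:k]]
```

Because every player both updates its own profile and issues a search, re-running `range_search` after each `update_profile` call is the simplest way to keep the results current; an index would avoid the full rescan.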


An Extension on Virtual Channel

Compared with the clustering engine:
  A flexible definition of “clusters”
  Update is more natural than insertion/deletion
  No need for parameter setting and re-clustering
  Indexing can relieve the pain of frequent updates

Compared with the problem of moving objects:
  Movements are in a high-dimensional feature space
  In most cases every object is also a query
  Prediction of object movement is possible


An Extension on Favorite Channel

When a music piece is played on a channel:
  The corresponding MusicXML file can be obtained
  A query can be a portion of MusicXML or XQuery
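As a rough illustration of matching a query against the MusicXML of the piece being played, the sketch below uses Python's standard `xml.etree.ElementTree`. That module supports only a limited XPath subset, so it is a stand-in for a real XQuery engine. The score fragment and the helper `matches` are made up for the example.

```python
import xml.etree.ElementTree as ET

# A tiny, hand-written MusicXML-like fragment (illustrative only).
PIECE = """<score-partwise>
  <part id="P1">
    <measure number="1">
      <note><pitch><step>C</step><octave>4</octave></pitch><duration>4</duration></note>
      <note><pitch><step>G</step><octave>4</octave></pitch><duration>4</duration></note>
    </measure>
  </part>
</score-partwise>"""


def matches(musicxml_text, path):
    """Return True if the path query selects at least one element of the piece."""
    root = ET.fromstring(musicxml_text)
    return root.find(path) is not None


# Does the piece contain a note whose pitch step is G?
has_g = matches(PIECE, ".//note/pitch[step='G']")
# Does it contain a note whose pitch step is B?
has_b = matches(PIECE, ".//note/pitch[step='B']")
```

A filtering engine would evaluate many such queries against each incoming file, which is where shared subqueries and indexing become relevant.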


An Extension on Favorite Channel

Compared with query segments:
  More musical semantics in a query
  Does not interfere with the music playback
  Matching on complex tree structures
    • Common subqueries are still useful


Research Issues

Peer Search Engine
  An indexing method to support continuous query processing for high-dimensional moving objects
  A prediction-based bounding mechanism to reduce the frequency of profile update

XML Filtering Engine
  An online method to enable tree pattern mining over a data stream
  An indexing mechanism to support XML filtering

Discovering Frequent Tree Patterns over Data Streams

Submitted for publication


Problem Definition

As the query trees stream in, find the subtrees that occur more than θ·N times, where N is the number of trees received so far and 0 ≤ θ ≤ 1

[Figure: query trees T1, T2, T3 stream into STMer, which outputs the frequent tree patterns.]


Problem Definition (Cont.)

Labeled ordered tree
Induced subtree

[Figure: as ordered trees, B with children D, C differs from B with children C, D. A tree pattern is shown next to a query tree with root A, children B and E, and leaves C and D.]
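The two notions can be made concrete with a small checker. This is a sketch under the usual definitions, which the slide does not spell out: an induced subtree keeps parent-child edges, and in an ordered tree the left-to-right order of siblings must be preserved. Trees are written as `(label, [children])` pairs, and the query tree below assumes that C and D are the children of B.

```python
def matches_at(pattern, node):
    """True if `pattern` can be embedded with its root mapped onto `node`."""
    p_label, p_children = pattern
    n_label, n_children = node
    if p_label != n_label:
        return False
    # Pattern children must map, in order, onto a subsequence of node children.
    i = 0
    for p_child in p_children:
        while i < len(n_children) and not matches_at(p_child, n_children[i]):
            i += 1
        if i == len(n_children):
            return False
        i += 1  # this child is used; later pattern children must lie to its right
    return True


def nodes(tree):
    """Yield every node of the tree in depth-first order."""
    yield tree
    for child in tree[1]:
        yield from nodes(child)


def is_induced_subtree(pattern, tree):
    """True if `pattern` occurs as an induced, ordered subtree anywhere in `tree`."""
    return any(matches_at(pattern, node) for node in nodes(tree))


# Assumed reading of the figure: A has children B and E; B has children C and D.
QUERY_TREE = ("A", [("B", [("C", []), ("D", [])]), ("E", [])])
B_CD = ("B", [("C", []), ("D", [])])
B_DC = ("B", [("D", []), ("C", [])])
```

With these definitions `B_CD` occurs in `QUERY_TREE` while `B_DC` does not, and a pattern such as A with a direct child C is rejected because C is a grandchild of A.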


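An Example

Given θ = 0.6

[Figure: three query trees stream into STMer one after another: A with children B and C; B with children D and E; A with children B and F.]

Frequent tree patterns (occurrence > 0.6*1): A; B; C; A with child B; A with child C; A with children B and C
Frequent tree patterns (occurrence > 0.6*2): B
Frequent tree patterns (occurrence > 0.6*3): A; B; A with child B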
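The example can be checked with an exact, brute-force counter. This is not the STMer method, only a reference computation. It assumes the three streamed trees are A(B, C), B(D, E), and A(B, F), as the figure appears to show, and that a pattern is counted once per tree. Patterns are printed as label-level strings such as A1B2.

```python
from collections import Counter
from itertools import product


def rooted_patterns(node):
    """All induced ordered subtrees whose root is `node`."""
    label, children = node
    # For each child: either leave it out (None) or attach one of its patterns.
    options = [[None] + rooted_patterns(child) for child in children]
    return [
        (label, tuple(c for c in choice if c is not None))
        for choice in product(*options)
    ]


def all_patterns(tree):
    """Set of induced subtrees rooted at any node of `tree`."""
    found = set(rooted_patterns(tree))
    for child in tree[1]:
        found |= all_patterns(child)
    return found


def encode(pattern, level=1):
    """Depth-first (label, level) string, e.g. A1B2C2."""
    label, children = pattern
    return f"{label}{level}" + "".join(encode(c, level + 1) for c in children)


def frequent_after_each_tree(trees, theta):
    """Frequent patterns (count > theta * N) after each of the N trees so far."""
    counts = Counter()
    results = []
    for n, tree in enumerate(trees, start=1):
        counts.update(encode(p) for p in all_patterns(tree))
        results.append(sorted(p for p, c in counts.items() if c > theta * n))
    return results


# Assumed reading of the example figure.
T1 = ("A", (("B", ()), ("C", ())))
T2 = ("B", (("D", ()), ("E", ())))
T3 = ("A", (("B", ()), ("F", ())))
RESULTS = frequent_after_each_tree([T1, T2, T3], theta=0.6)
```

Enumerating every subtree of every tree grows exponentially with fan-out, which is why a streaming method needs a more careful generation and maintenance scheme.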


Main Difficulties

The properties of data streams:
  One pass: traditional tree mining methods fail
  Fast input rate: efficiency is critical
  Incremental: an incremental algorithm is required
  Unbounded: approximate counting is needed
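One well-known way to count approximately over an unbounded stream is Lossy Counting (Manku and Motwani). The slides do not state which scheme STMer uses, so the sketch below is only an illustration of the idea; the class name `LossyCounter` and the choice to pass one set of patterns per arriving tree are assumptions.

```python
import math


class LossyCounter:
    """Lossy Counting over a stream of item sets (one set per arriving tree)."""

    def __init__(self, epsilon):
        self.epsilon = epsilon                 # error parameter
        self.width = math.ceil(1 / epsilon)    # bucket width in number of sets
        self.n = 0                             # sets received so far
        self.entries = {}                      # item -> [count, max undercount]

    def add(self, items):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        for item in set(items):
            if item in self.entries:
                self.entries[item][0] += 1
            else:
                # A new item may have been seen and pruned in earlier buckets.
                self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Bucket boundary: drop entries that cannot be frequent yet.
            self.entries = {
                item: entry
                for item, entry in self.entries.items()
                if entry[0] + entry[1] > bucket
            }

    def frequent(self, theta):
        """Items whose stored count is at least (theta - epsilon) * n."""
        threshold = (theta - self.epsilon) * self.n
        return sorted(
            item for item, (count, _) in self.entries.items() if count >= threshold
        )
```

The stored count of an item never exceeds its true count and falls short of it by at most epsilon times the number of sets seen, so memory stays bounded at the cost of a controlled error.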


An Overview of Our Method

[Figure: overview of STMer. Labels: incoming tree T1, subtree generation, subtree maintenance, a candidate pool, requests on demand.]


String Representation

Visiting the nodes of a tree T in DFS order and recording each as (label, level) gives the node sequence S
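A minimal sketch of this encoding, assuming the root is at level 1 (consistent with strings such as A1B2C2 elsewhere in the slides). Trees are `(label, [children])` pairs and the function names are hypothetical.

```python
def to_sequence(tree, level=1):
    """Depth-first list of (label, level) pairs; the root is at level 1."""
    label, children = tree
    sequence = [(label, level)]
    for child in children:
        sequence.extend(to_sequence(child, level + 1))
    return sequence


def to_string(tree):
    """Concatenate the pairs into a compact string such as A1B2C2."""
    return "".join(f"{label}{level}" for label, level in to_sequence(tree))


# A with children B and C.
SMALL = ("A", [("B", []), ("C", [])])
# A with children B and C, where C has child D and D has child E.
DEEP = ("A", [("B", []), ("C", [("D", [("E", [])])])])
```

Because each level number tells how far to climb back up before attaching the next node, the string determines the ordered tree uniquely, so two patterns can be compared by comparing their strings.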


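Subtree Generation

[Figure: at time t1 the node (A, 1) arrives from the data stream; the buffer holds A1 and the subtree A1 is generated. At time t2 the node (B, 2) arrives; the buffer holds A1B2 and the subtrees B1 and A1B2 are generated. TD is drawn as the tree received so far.]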


Subtree Generation (Cont.)

[Figure: at time t3 the node (C, 2) arrives; the buffer holds A1B2C2 and the subtrees C1, A1C2, and A1B2C2 are generated, alongside A1, B1, and A1B2 from t1 and t2. TD is now A with children B and C.]


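Subtree Generation (Cont.)

[Figure: the tree TD with nodes A, B, C, D, E; a buffer starting with A1B2, with the labels F2, C2, D3, E4 beside it; and a prefix tree labeled APT, rooted at Φ, whose nodes include A1, B1, B2, C1, C2, D1, D2, D3, E1, E2, E3, E4.]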


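Subtree Maintenance

[Figure: buffer A1B2E2; a prefix tree APT rooted at Φ with nodes A1, B1, E1, B2, E2, E2; a prefix tree GPT rooted at Φ with entries (A1, 5, 0), (B2, 4, 1), (C3, 2, 1) and a new entry (E2, 1, 3); matched entries are incremented (+1). #query trees received = 321.]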


Experiments on Sensitivity

[Figure: two plots, one varying the minimum support and one varying the error parameter.]


Experiments on Comparison

[Figure: comparison with StreamT (ICDM’02).]


Conclusion

Contribution
  A novel technique is proposed for efficient subtree generation
  A compact structure is employed to reduce the memory requirement of the candidate pool

Current work
  Mining closed frequent subtrees over data streams

[Figure: four subtrees annotated with counts: A with children B and C (2); A with child B (5); A with child C (2); A alone (5).]

