Online Event-driven Subsequence Matching over Financial Data Streams Huanmei Wu, Betty Salzberg, Donghui Zhang Northeastern University, College of Computer & Information Science Presented by: Evangelos Kanoulas


Page 1

Online Event-driven Subsequence Matching over Financial Data Streams

Huanmei Wu, Betty Salzberg, Donghui Zhang

Northeastern University, College of Computer & Information Science

Presented by: Evangelos Kanoulas

Page 2

SIGMOD 2004, NU CCIS

Motivation (1)

Given an incoming stream of stock market data, analyze it to support trend prediction, pattern recognition, dynamic clustering of multiple data streams, and rule discovery.

Subsequence matching is the main component of all of these tasks.

Page 3

Motivation (2)

Subsequence similarity over financial data streams has unique properties:

1. The piecewise linear representation (PLR) has a zigzag shape.

2. The relative positions of the end points are important.

3. The price change (amplitude) is more important than the time interval.

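[Figure: price vs. time for piecewise linear sequences S1 (end points 1 to 5) and S2 (end points 1' to 5'), and a second panel comparing sequences S1, S2, and S3]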

Page 4

Outline

1. Motivation

2. Data Stream Processing

3. Subsequence Matching

4. Trend Prediction

5. Performance

6. Conclusion

Page 5

Data Stream Processing (1): Aggregation and Smoothing

Incoming data arrives at arbitrary times, but the piecewise linear representation (PLR) requires a single value for each time interval. Two steps produce it:

1. Aggregation of the raw data within each time interval

2. Smoothing of the aggregated values using a moving average
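A minimal Python sketch of these two steps follows. It assumes ticks arrive as `(timestamp, price)` pairs, aggregates each interval by its mean price, and smooths with a simple (unweighted) moving average. The function names `aggregate` and `moving_average`, the choice of the mean, and the unweighted average are illustrative assumptions, not details taken from the slides.

```python
from statistics import mean


def aggregate(ticks, interval):
    """Collapse (timestamp, price) ticks into one value per time interval.

    Assumption: each interval is summarized by the mean price of its ticks;
    intervals that received no ticks are skipped.
    """
    buckets = {}
    for ts, price in ticks:
        buckets.setdefault(int(ts // interval), []).append(price)
    # One value per interval, ordered by interval index.
    return [mean(buckets[k]) for k in sorted(buckets)]


def moving_average(values, window):
    """Simple moving average over the last `window` aggregated values.

    The first window - 1 outputs average over the shorter available prefix.
    """
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / min(i + 1, window))
    return out
```

For example, ticks `[(0.1, 10), (0.5, 12), (1.2, 11), (2.7, 13), (2.9, 15)]` with `interval=1` aggregate to `[11, 11, 14]`, and a window of 2 smooths that to `[11.0, 11.0, 12.5]`.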

Page 6

Data Stream Processing (2): Segmentation

A PLR of the aggregated data may not have a zigzag shape. The end points of the PLR should be the points at which the trend changes dramatically; all other points are considered noise and should be eliminated.

[Figure: aggregated data stream]

Page 7

Data Stream Processing (3): The %b Data Stream as the Base for Linear Segmentation

Why use %b (Bollinger Band Percent)?

1. %b is a widely used financial indicator

2. %b has a smooth moving trend similar to that of the aggregated data stream

3. %b is a normalized value (most values lie between -1 and 2), which allows uniform segmentation criteria

[Figure: aggregated data stream and the corresponding %b data stream]
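For reference, the standard Bollinger %b of a price is its position within the bands, (price - lower) / (upper - lower), where the bands are a moving average plus or minus k standard deviations. The sketch below assumes the common defaults n = 20 and k = 2 and a population standard deviation; the slides do not state which parameters the authors used, and `percent_b` and `percent_b_stream` are hypothetical names.

```python
from statistics import mean, pstdev


def percent_b(prices, n=20, k=2.0):
    """Bollinger %b of the last price, computed over the last n prices.

    %b = (price - lower) / (upper - lower), where
    upper = SMA(n) + k * sd and lower = SMA(n) - k * sd.
    The defaults n=20, k=2 are assumed textbook values.
    """
    window = prices[-n:]
    mid = mean(window)
    sd = pstdev(window)
    if sd == 0:
        # Flat window: the bands collapse. Returning 0.5 (price on the
        # middle band) is an assumed convention.
        return 0.5
    lower = mid - k * sd
    upper = mid + k * sd
    return (window[-1] - lower) / (upper - lower)


def percent_b_stream(prices, n=20, k=2.0):
    """%b for each position that has at least n prices behind it."""
    return [percent_b(prices[: i + 1], n, k) for i in range(n - 1, len(prices))]
```

For example, `percent_b([1, 2, 3], n=3)` is about 0.81: the last price sits above the middle band but inside the upper band. Values above 1 or below 0 mean the price is outside the bands, which is why %b can leave the 0 to 1 range.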

Page 8

Data Stream Processing (4): Segmentation over %b

[Figure: price (x) vs. time t, showing points 1 to 13 and a sliding window]

In the current sliding window, where Pj(Xj, tj) is the current point, Pi(Xi, ti) is an upper end point if:

Xi = max(X values of the current sliding window), and

Xi > Xj + δ, where δ is the given error threshold.

Pi(Xi, ti) is the last point satisfying both conditions.

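The upper-end-point rule can be written as a small function over one window of %b values. This is a sketch: `upper_end_point` and `lower_end_point` are hypothetical names, the window is assumed to be a time-ordered list of `(x, t)` pairs ending at the current point Pj, and the lower-end-point rule is assumed to be the mirror image of the stated upper rule.

```python
def upper_end_point(window, delta):
    """Return the upper end point (x, t) of a sliding window, or None.

    `window` is a time-ordered list of (x, t) points whose last element is
    the current point Pj. A point Pi qualifies when
      1. Xi equals the maximum X value in the window, and
      2. Xi > Xj + delta (delta is the error threshold).
    Among qualifying points, the last one is returned.
    """
    if not window:
        return None
    xj = window[-1][0]
    x_max = max(x for x, _ in window)
    if x_max <= xj + delta:
        return None
    # Scan backwards so the last point attaining the maximum wins.
    for x, t in reversed(window):
        if x == x_max:
            return (x, t)
    return None


def lower_end_point(window, delta):
    """Mirror-image rule for lower end points (an assumption: the slide
    only states the upper-end-point rule)."""
    flipped = [(-x, t) for x, t in window]
    hit = upper_end_point(flipped, delta)
    return None if hit is None else (-hit[0], hit[1])
```

For example, with the window `[(0.2, 1), (0.9, 2), (0.5, 3), (0.3, 4)]` and δ = 0.3, the point `(0.9, 2)` is reported because 0.9 > 0.3 + 0.3; with δ = 0.7 no end point is reported.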

Page 9

Data Stream Processing (5): Two-Step Pruning

a. Filter step on %b streams

b. Refine step on the raw sequence stream to eliminate false positives

[Figure: end points at times t0 to t5 on the aggregated (Agg.) price stream and on the %b stream, with threshold labels δpb and δpd]
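The slide names the two steps but not the refine criterion, so the sketch below uses a hypothetical one: a candidate end point produced by the %b filter step is kept only if the aggregated price has moved by at least `delta_price` since the last confirmed end point. Both the criterion and the name `filter_and_refine` are assumptions for illustration.

```python
def filter_and_refine(candidates, prices, delta_price):
    """Two-step pruning sketch.

    candidates:  time indices proposed as end points by the filter step on
                 the %b stream (computed elsewhere).
    prices:      aggregated price stream, indexed by the same time index.
    delta_price: hypothetical refine threshold on the price change.

    Refine step (an assumed criterion, not taken from the slides): keep a
    candidate only if its price differs from the last confirmed end point by
    at least delta_price; otherwise treat it as a false positive.
    """
    confirmed = []
    for t in candidates:
        if not confirmed or abs(prices[t] - prices[confirmed[-1]]) >= delta_price:
            confirmed.append(t)
    return confirmed
```

The refine step only reads the price stream at candidate positions, so its cost grows with the number of candidates rather than with the length of the stream.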

Page 10

Outline

1. Motivation

2. Data Stream Processing

3. Subsequence Matching

4. Trend Prediction

5. Performance

6. Conclusion

Page 11

Subsequence Similarity (1): Event-driven Subsequence Matching

Identifying a new potential end point triggers a subsequence matching search.

The search algorithm finds subsequences in the historical data that are similar to a query subsequence.

The query subsequence consists of the n most recent end points.

[Figure: price vs. time t (t5 to t40), with end points labeled 1 to 4]
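The event-driven loop can be sketched as a class that treats each new end point as an event. The sketch assumes end points are `(x, t)` pairs, that candidates are all earlier runs of n end points that do not overlap the query, and that the distance function and threshold are supplied by the caller; `EventDrivenMatcher` and `on_end_point` are hypothetical names.

```python
class EventDrivenMatcher:
    """Sketch of event-driven subsequence matching.

    Every newly confirmed end point is an event: the n most recent end points
    form the query, and every earlier length-n run of end points is compared
    against it with a caller-supplied distance function.
    """

    def __init__(self, n, distance, epsilon):
        self.n = n
        self.distance = distance
        self.epsilon = epsilon
        self.end_points = []  # all end points seen so far, as (x, t)

    def on_end_point(self, point):
        """Record a new end point; return start offsets of matching runs."""
        self.end_points.append(point)
        if len(self.end_points) < self.n:
            return []
        query = self.end_points[-self.n:]
        matches = []
        # Candidates must end before the query starts (no overlap, assumed).
        last_start = len(self.end_points) - 2 * self.n
        for s in range(last_start + 1):
            candidate = self.end_points[s:s + self.n]
            if self.distance(query, candidate) < self.epsilon:
                matches.append(s)
        return matches
```

This linear scan over history is the simplest possible search; it only illustrates when matching is triggered (once per new end point, not once per tick), not how historical data would be indexed.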

Page 12

Subsequence Similarity (2): New Similarity Measure

$$S = \{(X_1, t_1), (X_2, t_2), \ldots, (X_n, t_n)\}$$

$$S' = \{(X'_1, t'_1), (X'_2, t'_2), \ldots, (X'_n, t'_n)\}$$

S and S' are similar if they satisfy the following two conditions:

1. The relative positions of the end points of S and S' are the same.

2. $d(S, S') < \varepsilon$, where

$$d(S, S') = \sum_{i=1}^{n-1} \Big( \alpha \, \big| |X_{i+1} - X_i| - |X'_{i+1} - X'_i| \big| + \beta \, \big| (t_{i+1} - t_i) - (t'_{i+1} - t'_i) \big| \Big)$$

and $\alpha, \beta, \varepsilon \ge 0$ are user-defined parameters.
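The distance d(S, S') translates directly into code. In this sketch the two user-defined weights on the amplitude term and the time-interval term are called `alpha` and `beta` (names chosen here), subsequences are equal-length lists of `(X, t)` end points, and `similarity_distance` is a hypothetical name. The first condition (same relative position of end points) is not computed here.

```python
def similarity_distance(s, s_prime, alpha, beta):
    """Weighted sum, over consecutive end points, of the difference in
    absolute price change (amplitude) and the difference in time interval.

    s, s_prime: equal-length lists of (X, t) end points.
    alpha, beta: user-defined non-negative weights.
    """
    if len(s) != len(s_prime):
        raise ValueError("subsequences must have the same number of end points")
    total = 0.0
    for (x0, t0), (x1, t1), (y0, u0), (y1, u1) in zip(s, s[1:], s_prime, s_prime[1:]):
        amp = abs(abs(x1 - x0) - abs(y1 - y0))   # amplitude difference
        gap = abs((t1 - t0) - (u1 - u0))         # time-interval difference
        total += alpha * amp + beta * gap
    return total
```

Because only absolute price changes enter the amplitude term, the measure ignores the direction of each segment; the direction and ordering of end points are what the relative-position condition is for. Setting `beta` small relative to `alpha` reflects the point that amplitude matters more than the time interval.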

Page 13

Subsequence Similarity (3): Subsequence Permutation

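$$S = \{(X_1, t_1), (X_2, t_2), \ldots, (X_n, t_n)\}$$

Separate the upper and lower points (odd and even positions):

$$S' = \{[(X_1, t_1), (X_3, t_3), \ldots, (X_{n-1}, t_{n-1})], [(X_2, t_2), (X_4, t_4), \ldots, (X_n, t_n)]\}$$

Sort each group separately by X value:

$$S'' = \{[(X_{i_1}, t_{i_1}), (X_{i_3}, t_{i_3}), \ldots, (X_{i_{n-1}}, t_{i_{n-1}})], [(X_{i_2}, t_{i_2}), (X_{i_4}, t_{i_4}), \ldots, (X_{i_n}, t_{i_n})]\}$$

Read off the indices to get the subsequence permutation:

$$\{i_1, i_3, \ldots, i_{n-1}, i_2, i_4, \ldots, i_n\}$$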
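The three permutation steps can be sketched as follows. The sketch assumes 1-based indices, an ascending sort by X, and stable tie-breaking, and it assumes (our reading, not stated on the slide) that two subsequences have the same relative position of end points exactly when their permutations are equal. Both function names are hypothetical.

```python
def subsequence_permutation(s):
    """Permutation of 1-based end-point indices: separate odd- and
    even-position end points, sort each group by X, concatenate indices.
    Ascending order and stable tie-breaking are assumptions."""
    indexed = list(enumerate(s, start=1))             # (i, (X, t))
    odd = [(i, p) for i, p in indexed if i % 2 == 1]  # positions 1, 3, 5, ...
    even = [(i, p) for i, p in indexed if i % 2 == 0] # positions 2, 4, 6, ...
    odd_sorted = sorted(odd, key=lambda ip: ip[1][0])
    even_sorted = sorted(even, key=lambda ip: ip[1][0])
    return tuple(i for i, _ in odd_sorted) + tuple(i for i, _ in even_sorted)


def same_relative_position(s, s_prime):
    """Assumed reading: equal permutations mean the same relative positions."""
    return subsequence_permutation(s) == subsequence_permutation(s_prime)
```

For S = `[(0.2, 1), (0.9, 2), (0.1, 3), (0.7, 4)]` the permutation is `(3, 1, 4, 2)`: among odd positions, point 3 has the smaller X, and among even positions, point 4 does. The permutation depends only on the ordering of prices, so it is unaffected by price level or scale.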

Page 14

Outline

1. Motivation

2. Data Stream Processing

3. Subsequence Matching

4. Trend Prediction

5. Performance

6. Conclusion

Page 15

Trend Prediction: A Subsequence Matching Application

Trend-k at a point p measures the price change over the next k points.

There are three trends: UP, DOWN, and NOTREND.

[Figure: price vs. time t (t5 to t40)]
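A sketch of Trend-k labeling and of one way to turn matches into a prediction. The dead-band threshold `theta`, the majority vote over the Trend-k labels that followed each historical match, and the names `trend_k` and `predict_trend` are all assumptions; the slides only name the three trend classes.

```python
from collections import Counter


def trend_k(prices, p, k, theta):
    """Label the price change from point p to point p + k.

    `prices` is assumed to be the sequence of end-point prices. `theta` is a
    hypothetical dead-band: changes within +/- theta count as NOTREND.
    """
    change = prices[p + k] - prices[p]
    if change > theta:
        return "UP"
    if change < -theta:
        return "DOWN"
    return "NOTREND"


def predict_trend(match_trends):
    """Predict the current trend as the most common Trend-k label among the
    historical matches (majority vote is an assumed aggregation rule)."""
    if not match_trends:
        return "NOTREND"
    return Counter(match_trends).most_common(1)[0][0]
```

For example, with end-point prices `[10, 11, 12, 9]`, `trend_k(prices, 0, 2, 0.5)` is `"UP"` and `trend_k(prices, 2, 1, 0.5)` is `"DOWN"`.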

Page 16

Outline

1. Motivation

2. Data Stream Processing

3. Subsequence Matching

4. Trend Prediction

5. Performance

6. Conclusion

Page 17

Performance (1): Similarity Measure

[Chart: Correctness % (axis 30 to 70) for the similarity measures Perm+Amp, Amp Only, Perm Only, Perm+Euc, and Euc Only (Perm: permutation, Amp: amplitude, Euc: Euclidean distance)]

Page 18

Performance (2): Event-driven vs. Fixed Time Periods

[Charts: Correctness % (axis 30 to 70) and Relative CPU cost (axis 0% to 100%) for the event-driven approach and the fixed time periods FT1, FT5, FT10, FT15, FT25, FT30, and FT20]

Page 19

Outline

1. Motivation

2. Data Stream Processing

3. Subsequence Matching

4. Trend Prediction

5. Performance

6. Conclusion

Page 20

Conclusion

Proposed an online segmentation and pruning algorithm

Defined an alternative subsequence similarity measure

Introduced an event-driven online similarity matching algorithm

Achieved 70% correct predictions using real-world data