burst detection using shifted aggregation tree. content burst detection problem and shifted binary...

38
Burst Detection using Shifted Aggregation Tree

Post on 20-Dec-2015

222 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Burst Detection using Shifted Aggregation Tree

Page 2: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Content

• Burst Detection Problem and Shifted Binary Tree• Aggregation Pyramid and Aggregation Tree• Shifted Aggregation Tree (SAT) and Detection

Algorithm• Search for Best SAT Structure• Experiments and Discussion

Page 3: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Aggregation Pyramid (AP)

• N-level pyramid-shape data structure built on a sliding window of length N– Level 1 has N cells containing origi

nal data, from left to right– Level 2 has N-1 cells, storing the ag

gregates for 2 consecutive data, i.e, a sliding sub-window of length 2

– Level h has N-h+1 cells, storing the aggregates for h consecutive data, i.e, a sliding sub-window of length h

12-level Aggregation Pyramid

Xin: Lots of double counting. This is not a generalization of SBT. (why having double counting can’t be a generalization? A SAT could be worse than SBT.)

Page 4: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Embed SBT in AP

• Each node in Shifted Binary Tree is either an original data or an aggregate, which is a cell in Aggregation Pyramid, SBT can be embedded in AP.

• Figure shows how each cell in SBT is embedded in the cell with the same color in AP.

Embed SBT in AP

Page 5: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Aggregation Pyramid as Host Structure

• Any structure composed of only aggregates and the original data can be embedded in AP.

• It provides a host structure and a tool to visualize and compare different data structures. Another Embedded Structure

Page 6: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Aggregation Tree• A tree relation defined on a subset cells of Aggregation Py

ramid containing all the first level cells

• For any cell(t,h) in its domain, if cell(t,h) aggregates cell(t1,h1), cell(t2,h2)…cell(tn,hn), then cell(t,h) is the parent, cell(t1,h1) is the first child, …, cell(tn,hn) is the nth child.

• Call cell(t,h) as node(t,k) if cell(t,h) is in the kth layer in the aggregation tree. Node(t,k) has the height of h in the aggregation pyramid, thus corresponds to a window of length h.

Page 7: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Notations

Sliding Window AggregationPyramid

(Shifted)Aggregation

TreeWindows oflength h

Level h Layer k withheight h_k

A windowstarting from t-h+1, ending at t

Cell(t,h) Node(t,k)

An overlappingwindow betweenwin(t-h+1,t) andwin(t+s-h+1,t+s),which is win(t+s-h+1,t)

Cell(t,h-s) Overlap ofNode(t,k) andNode(t+s,k)

The overlap of node(12,3) and node(16,3), i.e. cell(12,12) and cell(16,12), is cell(12,9), the dark gray cell.

Page 8: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Shifted Aggregation Tree (SAT)• Aggregation Tree w/ the following constraints (SAT

Constraints)– Nodes in the same layer have the same sub-tree structure, i.e.,

if node(t,h) aggregates node(t1,h1), node(t2,h2), etc, node(t+s,h) aggregates node(t1+s,h1), node(t2+s,h2) and so on. All the children’s window lengths and relative orders in time are same, only shift in time.

– Every node in the same layer shifts the same duration in time from its previous node.

– The shift at layer l is a multiple of the shift at layer l-1. – The overlap window length of two adjacent nodes at layer l is

greater than or equal to the window length at layer l-1

Page 9: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

SAT generalizes SBTSBT SAT

A node at layer l aggregates 2 children nodes at layer l-1

A node at layer l aggregates 2 or more children nodes at layer l-1 and/or lower layer(s)

Shift at layer l is twice as shift at layer l-1

Shift at layer l is k times as shift at layer l-1, k is any natural number

The overlap window length of two adjacent nodes at layer l is equal to the window length at layer l-1

The overlap window length of two adjacent nodes at layer l is greater than or equal to the window length at layer l-1

Page 10: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

SAT ExamplesXin: does this generalize? (yes)

Page 11: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

SAT Representation• Layer k can be represented by a triple (hk, sk, {ck1, ck2,…ckn}),

where– hk, the corresponding level h in Aggregation Pyramid– sk, the shift, distance at layer 1 from leftmost leaf of node to

the leftmost leaf of next node at same level.– {ckl, ck2, …ckn}, the layers for all its children

• (hk, sk) in short when no confusion.• A SAT can be represented by a list of (hk, sk), the first triple rep

resents the first layer and so on.• For example, {(1,1), (2,1), (4,2), (8,4)} represents a SWT of he

ight 8

Page 12: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

SAT Properties

• The overlap window length owlk of two adjacent nodes at layer k is hk-sk

• A window length h between hok-1 + 2 and hok +1 is always covered by a node at layer k

Page 13: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

SAT Benefits• Structure not unique, defining a family of

structures.• Variable (h,s) means variable bounding ratio T,

controllable false alarm ratio and controllable detail search region.

• Adaptive to the data!

Page 14: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Detection Algorithm w/ SAT• For each time point t,

– start from layer 2, i.e., k=2– while (a window ends) // need update node(t,k) (1)

• update node(t,k) (2) • if node(t,k) exceeds the threshold of some length h

in its Detail Search Region DSR(t,k) (3)– detail search DSR(t,k)Needs proof that you want

to do this now. (4)– end– go to next layer, k ++

Page 15: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Detail Search Region (DSR)

• DSR(t,k) : the region when node(t,k) updated, where to do detail search for real burst

• A set of cells in aggregation pyramid within node(t,k)’s coverage

• Exclude cells before t-sk which have been searched by node(t-sk,k), where sk is the shift at layer k

• Exclude cells within DSR(t,k-1) which have been searched by node(t,k-1) Xin: I don’t understand this one. This may look for a different threshold. (Correct, for the whole coverage, you can do detail search, just not necessary if some part can be covered by lower nodes)

• Loose DSR vs Tight DSR

Page 16: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Loose DSR

• hok : the overlap window length of 2 adjacent nodes at layer k

• A window length h between hok-1 + 2 and hok +1 is always covered by a node at layer k

• A quadrilateral shape bounded by [t-sk+1, t] and [hk-1-sk-1+2, h

k-sk+1]

• Used by SBT

Page 17: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Tight DSR• A window length h between hok-1 + 2 and

hk could be covered by a window hk , depending on the ending time– cell(t-1,hk-1) is covered by node(t,k), but cell

(t-2, hk-1) is not

• A m-nary fork shape, m=sk/sk-1

– For SBT, m=2, i.e, “L” shape

• Has the same probability to raise an alarm as Loose DSR, but has less detail search cost on average, since some cells will be detected by a tight bound, especially with bound-&-prune detail search.

Page 18: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Naïve Detail Search

• Search every cell in DSR one by one.• Cell(t+1,h+1)/cell(t-1/h-1) can be computed by

adding/removing cell(t+1,1)/cell(t,1) from cell(t,h).

• Starting from one seed cell, populate the whole interested DSR.

Page 19: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Grid-based Bound-&-Prune Detail Search

• Given node(t,k), i.e., cell(t,hk), there are several cells at lower layers having the same starting time or the same ending time.

• By subtracting these cells, we can get some intermediate cells within DSR.

• These cells form a grid and partition DSR.

• Each cell has its own small DSR, if it doesn’t exceed the minimal threshold, no need to check its DSR.

• Binary partition DSR, check big DSR first, then small DSR.

Page 20: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Grid-based Bound-&-Prune Detail Search

• By subtracting a lower layer cell starting at the same time on the left, we can get a cell with the same color on the right.

• The intermediate cells partitions the “L” shape tight DSR bounded by the red lines

• cell(28,20) has its DSR bounded by the pink lines

Page 21: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Algorithm Complexity• Depend on specific SAT structure and the data to be

detected

• (Should have an analysis in the average case for the best SAT structure for the given data)

Page 22: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Search for Best SAT Structure

• Given the above detection algorithm and the data to be detected, the best SAT structure minimizes the total running time.

• Two major parts in the detection algorithm

– updating the SAT structure, step (1) and (2)

– detail searching DSR, step (3) and (4)

• Best SAT structure balances between the updating time and the searching time.

Page 23: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Optimization Goal

• The goal is to minimize

• But how to quantitate the updating time and the searching time?– Theoretical method: estimate number of cell-access

– Experiment method: count machine time on sample data

)( SearchTimeUpdateTime

Page 24: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Estimate time by Number of Cell-access

• Most part of the algorithm, i.e. step (2),(3),(4) are to access either the original data or an aggregate which is the same type as original data.

• The number of cell-access implies how many operations are executed, thus an estimation of the running time.

• Use the expected number of cell-access as the measurement of the expected running time.

• Step (1) has the same number of execution as step (2), counted in step (2) by multiplying a weight learned from experiment.

Page 25: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Cell-access of node(t,l)

• The updating step (2) requires to read all its children and write the aggregate to node(t,l), thus the number of cell-access is sizeof(children) + 1

• Step (3) can be done by a binary search to compare node(t,l) against the interested thresholds within DSR, requiring log2W+1 comparison, where W is the number of interested window lengths in this DSR.

Page 26: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Cell-access of node(t,l) (2)

• For naïve detail search, to check one cell requires 4 cell-access (2 read, 1 write, 1 comparison against the threshold).

• The probability to search a cell(t,h) is the probability to raise an alarm at level h given its covering node. Probalarm can be learned from sample data.

• Total number of cell-access is DSRin cells

)Prob*4( alarm

Page 27: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Time Cost of a SAT• Cost(SAT):

– For theoretical method:

– For experiment method: actual machine time to test on sample data with this SAT

• Normalized Cost(SAT):

– Where t is the total time points when counting or testing, and hL i

s the window length of the top layer

– Comparable value for different SAT with different time cycles and different max window lengths

• Best SAT is the one with minimal normalized cost which can cover the max interested window length N.

Lht

SATCost

*

)(

nodes All

)),((numAccess ltNode

Page 28: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

H-S grid as a search space• Map layer (h, s) to a grid cell in a

regular h-s grid, origin at (1,1).

• Link these grid cells by the SAT list order, a SAT can be mapped to a path in the h-s grid.

• Shown two SAT paths– {(1,1)(3,3)(6,3)(12,3)} in red

– {(1,1)(2,2)(4,2)(8,4)(12,4)} in blue

• Grayed cells don’t satisfy SAT constraints, because h < s h-s grid and 2 SAT paths

Page 29: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Best SAT as shortest path

• Any path ending above the line h-s+1=N, can at least cover window length N, thus no need for another layer.

• Call the grid cells above the line h-s+1=N as final states, Best SAT is the shortest path starting from (1,1) to one of the final states and satisfying SAT constraints.

• Multiple Single-Source-Shortest-Path (SSSP) problems between (1,1) and one of final states in a directed graph.

Page 30: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

SSSP w/ Constraints

• Not all the paths between (1,1) and a final state are legal, i.e, satisfying SAT constraints.

• Unlimited final states, when to stop?

• Large graph if considering all the possible edges between layer l-1 and layer l, many of them are not likely even in a good SAT, say {(1,1)(900,900)} is not likely in a SAT covering length 1000.

Page 31: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Best-first Graph Search• Best-first search in a dynamically-generated directed gr

aph– Dynamically add edges to the graph for the node with best n

ormalized cost– Guarantee all the paths are legal– Avoid to generate the unlikely edges– Stop searching after reaching M final states.

• Dijastra-style cost update to track the shortest path– If the cost of a new path ending at current node is better than

the cost kept in this node, update this node with the new cost

Page 32: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Search Algorithm• insert (1,1) into the graph and a heap based on the normalized cost• while the heap is not empty

– pop the first node in the heap– if it’s a final node

• if it’s already in the graph– if the new cost is better than the cost stored, update the cost

• else insert the node into the graph and store its cost• if already reach M final nodes, stop

– else• generate a set of next possible nodes for this node• for each next possible node

– evaluate the normalized cost– if the new node is in the graph

» If the new cost is better than the cost stored, update the cost– else insert the node into the graph and store its cost– insert this node into the heap

• end while• output the best node and corresponding path with the best cost

Page 33: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Experiments

• Test data: random normal-distributed N(6,1), 1M

• Sample data: 20K

• Interested windows are all the windows from length l up to the max length N

• For window length h, the threshold Th is set to

where bp is the real burst probability, is the normal cumulative function

)1(* 1 bpstdavg

Page 34: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

SAT w/ naïve detail search vs SBT

• Total running time in machine clock

• SAT_EP: SAT found using experiment time cost

• SAT_TH: SAT found using theoretical time cost

• SBT: Shifted Binary Tree

burst probability

max window length SAT_EP SAT_TH SBT SAT_EP SAT_TH SBT SAT_EP SAT_TH SBT SAT_EP SAT_TH SBT

10 1212 700 2413 1661 521 2123 1021 581 1953 1552 1201 1972

50 2083 1221 2952 1712 1171 2904 1501 1121 2753 2583 6318 2604

100 3094 2062 3715 2383 1972 3565 2843 3885 3555 2704 11065 3385

500 17675 8762 30572 18567 8162 30163 20408 8201 30092 8751 31895 29943

0.00010.1 0.01 0.001

Page 35: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

SAT w/ naïve detail search vs SBT (2)

burstProb=0.1

0

10000

20000

30000

40000

10 50 100 500

max window length

mac

hin

e cl

ock SAT_EP

SAT_TH

SWT

burstProb=0.01

0

10000

20000

30000

40000

10 50 100 500

max window length

mach

ine c

lock

SAT_EP

SAT_TH

SWT

burstProb=0.001

01000020000

3000040000

10 50 100 500

max window length

mac

hine

clo

ck SAT_EP

SAT_TH

SWT

burstProb=0.0001

0

10000

20000

30000

40000

10 50 100 500

max window length

mac

hin

e cl

ock SAT_EP

SAT_TH

SWT

Page 36: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Alarm probability is always large• In the testing data, even the real burst probability Probtrue is 0.

0001, the probability for a window length l to exceed the threshold for length l-d, increases rapidly even d increases a little.

• Figure right shows this probability for all the window length pairs less than 100 with Probtrue=0.0001.

• When Probtrue is considerably large, updating time decides the SAT structure, since the detecting time are very close.

Page 37: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

Discussion

• SAT_EP is sensitive to CPU time, since it depends on the testing time on a small amount of sample data. Different runs may give different SAT structures.

• It’s stable in the sense it always finds some SAT better than SBT.

• SAT_TH doesn’t give an accurate estimation of actual running time.

• But when Probalarm is large, it produces better comparison of updating cost between different SAT, thus better result.

Page 38: Burst Detection using Shifted Aggregation Tree. Content Burst Detection Problem and Shifted Binary Tree Aggregation Pyramid and Aggregation Tree Shifted

SAT w/ naïve detail search vs SBT (3)

• Breakdown time in machine clock for SAT_EP

burst probabilitymax window length update search update search update search update search

10 452 280 500 711 320 260 440 69250 531 992 420 950 350 720 531 1430

100 431 2263 320 1563 651 1550 341 1893500 451 16864 471 17655 381 19447 401 7809

0.1 0.01 0.001 0.0001