TAR: Temporal Association Rules on Evolving Numerical Attributes
Wei Wang, Jiong Yang, and Richard Muntz
Speaker: Sarah Chan
CSIS DB Seminar
May 7, 2003



Presentation Outline

• Introduction
• Problem Definition
• Mining Algorithms
• Performance Evaluation
• Conclusions

Introduction

Association rule mining
• $X \Rightarrow Y$ (X and Y are itemsets)
• Existence of X implies existence of Y

Earlier work focused on binary attributes and intra-transaction relationships
• E.g. “ham $\Rightarrow$ bread” means “A customer who buys ham is likely to buy bread as well”
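A minimal Python sketch of this binary, intra-transaction setting, using a few made-up baskets. It measures how often “ham” baskets also contain “bread” with the standard support and confidence measures; confidence is not defined on these slides, and the data and function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, lhs, rhs):
    """Among transactions containing lhs, the fraction that also contain rhs."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


# Hypothetical market baskets: each item is a binary attribute (bought or not).
baskets = [
    {"ham", "bread", "milk"},
    {"ham", "bread"},
    {"ham", "eggs"},
    {"bread", "milk"},
]

print(support(baskets, {"ham", "bread"}))       # 0.5
print(confidence(baskets, {"ham"}, {"bread"}))  # 2 of the 3 ham baskets
```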

Introduction

Cannot describe relationships such as:
• If the price of item A falls below $1, then monthly sales of item B rise by a margin between 10K and 20K.
• People aged between 35 and 45 with a salary between 80K and 120K are likely to buy a house whose price is between 300K and 400K within 2 years of marriage.

Goal: to mine ARs involving numerical attributes and temporal evolution

Problem Definition

• Each object has a set of numerical attributes
• Database: a sequence of snapshots $S_1, S_2, \ldots, S_t$ of objects
• Evolution: temporal changes of the values of some attribute of some object
  • E.g. evolution of the “salary” attribute over 3 snapshots:
    $(\text{salary} \in [40000, 45000]) \rightarrow (\text{salary} \in [47500, 55000]) \rightarrow (\text{salary} \in [60000, 70000])$

Problem Definition

TARs (on evolving numerical attributes): ARs that capture correlations among attr. evolutions

Scope of paper: only consider correlations of simultaneous evolutions (i.e. attr. evolutions over same set of snapshots)

Mining Quantitative ARs

Srikant and Agrawal (SIGMOD’96)
• Divide the domain of each quantitative attr. into intervals
• Combine adjacent intervals as long as their support is less than the max-sup threshold
• Items: the original and the combined intervals
• Apply a traditional AR mining algorithm

Mining Quantitative ARs

BitOp (Lent et al., ICDE’97)
• Rule form: $A \wedge B \Rightarrow C$ (A, B quantitative; C categorical)
• Partition the attribute domains into 2-D grids
• For each value of attr. C:
  • Examine the data in each grid cell to see if the AR applies
  • Represent the result by a bit in a 2-D bitmap
  • Combine ARs with adjacent LHS attr. values to form a clustered AR
• Smoothing: to cover “small holes” in a big cluster

[Figure: 2-D bitmap with intervals a1 to a6 of attribute A on the horizontal axis and b1 to b4 of attribute B on the vertical axis; an x marks each cell where the AR applies]
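The bitmap idea can be illustrated with a simplified Python sketch, assuming each row of the grid (one interval of B) is stored as an integer bitmap over the intervals of A, with a bit set where the AR holds. ANDing the bitmaps of adjacent rows leaves exactly the columns where every row in the band has a 1, so each run of set bits is a candidate rectangle, i.e. a clustered AR. The grid and helper names are hypothetical; smoothing and the selection of maximal clusters are left out.

```python
def runs(bits, width):
    """Yield (first, last) columns of each run of consecutive 1-bits."""
    start = None
    for col in range(width + 1):
        on = col < width and (bits >> col) & 1
        if on and start is None:
            start = col
        elif not on and start is not None:
            yield (start, col - 1)
            start = None


def rectangles(rows, width):
    """Enumerate rectangles of 1-bits by ANDing bitmaps of adjacent rows.

    rows[r] is the bitmap for interval b_(r+1) of attribute B; bit c is set
    when the AR holds in grid cell (a_(c+1), b_(r+1)). Returns tuples
    (first_row, last_row, first_col, last_col), all 0-based.
    """
    found = set()
    for top in range(len(rows)):
        acc = (1 << width) - 1  # all columns still possible
        for bottom in range(top, len(rows)):
            acc &= rows[bottom]  # columns where every row so far has a 1
            if acc == 0:
                break
            for first, last in runs(acc, width):
                found.add((top, bottom, first, last))
    return found


# Hypothetical 3 x 4 grid: bit 0 is the leftmost column a1.
grid = [0b0110, 0b0110, 0b0010]
print(sorted(rectangles(grid, 4)))
# [(0, 0, 1, 2), (0, 1, 1, 2), (0, 2, 1, 1), (1, 1, 1, 2), (1, 2, 1, 1), (2, 2, 1, 1)]
```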

Mining TARs

SR algorithm (based on Srikant and Agrawal, SIGMOD’96)
• Map numerical attribute evolutions to binary attrs.
• Apply any traditional AR mining algorithm
• Transform the binary attr. values in the rules back to numerical ranges
• Complexity
  • For a numerical attr. quantized to b intervals, $O(b^2)$ items are needed to represent all possible sub-ranges
  • For t snapshots, $O(b^{2t})$ items are needed to encode all possible evolutions
  • Huge number of items: very inefficient
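To see the blow-up concretely, here is a Python sketch that assumes b equal base intervals and treats every contiguous run of them as a sub-range, giving $b(b+1)/2 = O(b^2)$ sub-ranges per snapshot. An SR-style encoding then needs one binary item per combination of sub-ranges over the t snapshots, and even a single object history switches on thousands of items. The helper names are hypothetical.

```python
from itertools import product


def subranges(b):
    """All contiguous ranges [i, j] of b base intervals: b*(b+1)/2, i.e. O(b^2)."""
    return [(i, j) for i in range(b) for j in range(i, b)]


def num_evolution_items(b, t):
    """One binary item per combination of sub-ranges over t snapshots."""
    return len(subranges(b)) ** t


def items_for_history(history, b):
    """Items (one sub-range per snapshot) that a single object history supports.

    history[k] is the index of the base interval holding the value at snapshot k.
    """
    per_snapshot = [
        [(i, j) for (i, j) in subranges(b) if i <= bucket <= j]
        for bucket in history
    ]
    return list(product(*per_snapshot))


print(num_evolution_items(10, 1), num_evolution_items(10, 3))  # 55 166375
print(len(items_for_history([2, 5, 7], 10)))  # 17280 = 24 * 30 * 24
```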

Mining TARs

LE algorithm (based on BitOp)
• Quantize domains
• Map each possible evolution of the RHS attr. into an item
• For each rule form, generate clustered rules for each possible value of each possible RHS attr.
• Complexity
  • For a RHS attr. quantized to b intervals, consider its evolution over t snapshots: there could be on the order of $b^{2t}$ distinct evolutions
  • The total no. of possible evolutions increases exponentially with the no. of attrs. and the no. of snapshots

Mining TARs

TAR algorithm

The Model: Evolution and Its Space

Given attr. $A_i$ and m snapshots
• Evolution $E(A_i) = (A_i \in [l_1, u_1]) \rightarrow (A_i \in [l_2, u_2]) \rightarrow \cdots \rightarrow (A_i \in [l_m, u_m])$
  • Length of evolution = m
• Evolution space of $A_i$: an m-dimensional space (the jth dimension is associated with the value of $A_i$ at the jth snapshot)

The Model: Evolution and Its Space

E.g. $E_1 = (\text{salary} \in [40000, 45000]) \rightarrow (\text{salary} \in [47500, 55000]) \rightarrow (\text{salary} \in [60000, 70000])$

The Model: Evolution Conj. and Its Space

Given n attrs. $A_1, A_2, \ldots, A_n$, each with an evolution of length m
• Evolution conjunction: $E(A_1) \wedge E(A_2) \wedge \cdots \wedge E(A_n)$
• Evolution space: an $n \times m$ dimensional space (each dimension is associated with the value of one attr. at one snapshot)

The Model: TAR

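TAR: $X \Rightarrow Y$ (X and Y are evolution conjunctions)
• Symmetric relationship
• Assumption: Y only contains the evolution of one attr.
  • $E(A_1) \wedge \cdots \wedge E(A_{k-1}) \wedge E(A_{k+1}) \wedge \cdots \wedge E(A_n) \Rightarrow E(A_k)$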

The Model: Window

Window
• Subsequence of m consecutive snapshots
• For t available snapshots $S_1, S_2, \ldots, S_t$, there are $t - m + 1$ windows of width m

The Model: Object History

Object history of an object o over a window W
• The sequence of changes of o over W
• Follows an evolution $E(A_i)$ iff, for each snapshot in the window, the value of $A_i$ in the object history falls into the corresponding interval specified in $E(A_i)$
• Follows an evolution conjunction $E(A_1) \wedge E(A_2) \wedge \cdots \wedge E(A_n)$ iff it follows every evolution in it
• o satisfies the TAR $X \Rightarrow Y$ iff it has an object history that follows both X and Y
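These definitions translate almost directly into code. The Python sketch below assumes an object is stored as a mapping from attribute name to its values over all t snapshots, and an evolution as a list of closed intervals, one per snapshot; it slides a window of width m over the $t - m + 1$ possible positions. The salary evolution is $E_1$ from the model slides; the distance evolution and the person's values are hypothetical.

```python
def follows_evolution(values, evolution):
    """True iff values[j] lies in [l_j, u_j] for every snapshot j of the evolution."""
    return len(values) == len(evolution) and all(
        lo <= v <= hi for v, (lo, hi) in zip(values, evolution)
    )


def follows_conjunction(history, conjunction):
    """history: attr -> values over one window; conjunction: attr -> evolution."""
    return all(follows_evolution(history[a], ev) for a, ev in conjunction.items())


def satisfies(series, x, y):
    """True iff some window of the object's series follows both X and Y.

    series: attr -> values over all t snapshots. X and Y are evolution
    conjunctions of a common length m, so t - m + 1 windows are checked.
    """
    m = len(next(iter(x.values())))
    t = len(next(iter(series.values())))
    for start in range(t - m + 1):
        window = {a: vals[start:start + m] for a, vals in series.items()}
        if follows_conjunction(window, x) and follows_conjunction(window, y):
            return True
    return False


E1 = [(40000, 45000), (47500, 55000), (60000, 70000)]  # salary evolution E1
X = {"salary": E1}
Y = {"distance": [(4, 7), (8, 10), (11, 15)]}  # hypothetical evolution
person = {"salary": [38000, 42000, 50000, 65000], "distance": [5, 6, 9, 12]}
print(satisfies(person, X, Y))  # True: the window S2..S4 follows X and Y
```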

The Model: TAR as Hypercube

Each object history can be mapped to a point in evolution space of involved attributes

TAR: a hypercube in this space, which contains the set of object histories satisfying the rule

Support, density & strength thresholds: constraints on number & distribution of object histories in hypercube
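A Python sketch of the geometric view, assuming each object history is a mapping from attribute to its values over one window, and that support is counted as the fraction of object histories falling inside the hypercube (the slides only say the thresholds constrain the number and distribution of histories). Flattening n attributes over m snapshots gives a point with $n \times m$ coordinates. The histories and the hypercube are hypothetical.

```python
def to_point(history, attrs):
    """Flatten an object history into a point in the n x m evolution space.

    Coordinates run attribute by attribute, then snapshot by snapshot.
    """
    return tuple(v for a in attrs for v in history[a])


def in_hypercube(point, cube):
    """cube: one (lo, hi) interval per dimension of the evolution space."""
    return all(lo <= p <= hi for p, (lo, hi) in zip(point, cube))


def support(points, cube):
    """Fraction of object-history points that fall inside the hypercube."""
    return sum(in_hypercube(p, cube) for p in points) / len(points)


attrs = ["salary", "distance"]  # n = 2 attributes, m = 2 snapshots
histories = [
    {"salary": [42000, 50000], "distance": [6, 9]},
    {"salary": [44000, 52000], "distance": [5, 8]},
    {"salary": [30000, 31000], "distance": [2, 2]},
    {"salary": [41000, 60000], "distance": [7, 9]},
]
points = [to_point(h, attrs) for h in histories]
cube = [(40000, 45000), (47500, 55000), (4, 7), (8, 10)]
print(support(points, cube))  # 0.5
```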

The Model: Rule Set

Rule set $\langle r_{min}, r_{max} \rangle$: the set of all rules r such that r is a specialization of $r_{max}$ and a generalization of $r_{min}$

Each rule set can summarize a large no. of valid rules
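A small Python sketch of rule-set membership, under two assumptions not spelled out on the slide: a rule is represented by its hypercube as a tuple of integer (base-interval) bounds, one per dimension, and r′ is a specialization of r when each interval of r′ lies inside the corresponding interval of r. Counting the members of a rule set shows how a single pair $\langle r_{min}, r_{max} \rangle$ can stand for many rules.

```python
def is_specialization(r_spec, r_gen):
    """Assumed meaning: every interval of r_spec lies inside the matching
    interval of r_gen (same attributes and snapshots, same order)."""
    return len(r_spec) == len(r_gen) and all(
        g_lo <= s_lo and s_hi <= g_hi
        for (s_lo, s_hi), (g_lo, g_hi) in zip(r_spec, r_gen)
    )


def in_rule_set(r, r_min, r_max):
    """r belongs to <r_min, r_max> iff it generalizes r_min and specializes r_max."""
    return is_specialization(r_min, r) and is_specialization(r, r_max)


def rule_set_size(r_min, r_max):
    """Number of rules in <r_min, r_max> when bounds are base-interval indices."""
    size = 1
    for (min_lo, min_hi), (max_lo, max_hi) in zip(r_min, r_max):
        size *= (min_lo - max_lo + 1) * (max_hi - min_hi + 1)
    return size


r_min = ((3, 4), (6, 7))
r_max = ((1, 6), (5, 9))
print(in_rule_set(((2, 5), (6, 8)), r_min, r_max))  # True
print(in_rule_set(((3, 4), (4, 7)), r_min, r_max))  # False: 4 is outside r_max
print(rule_set_size(r_min, r_max))  # 54 = (3 * 3) * (2 * 3)
```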

Mining TARs: TAR algorithm
• Find density-based (subspace) clusters
• Find all valid rule sets

Mining TARs: TAR algorithm
• Find density-based (subspace) clusters
  • Create base intervals for each attribute
  • Form base cubes from base intervals: n = 1, m = 1
  • Bottom-up clustering algorithm
  • Density of an evolution cube: the object history concentration of the sparsest base cube in it
  • The Apriori property holds on density
• Find all valid rule sets
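The density definition from the clustering step can be sketched in Python as follows, assuming base-cube counts are kept in a dictionary and a cube is dense when its sparsest base cube holds at least `threshold` times the average count per base cube (matching the “2 times the average density” setting used in the experiments). Because density is a minimum over base cubes, shrinking a cube can only keep or raise it: every sub-cube of a dense cube is dense, which is the monotone, Apriori-style behaviour a bottom-up search relies on. The grid values are hypothetical.

```python
from itertools import product


def cube_density(counts, cube):
    """Density of an evolution cube: the count of its sparsest base cube.

    counts maps base-cube index tuples to object-history counts (missing = 0);
    cube gives an inclusive (lo, hi) index range per dimension.
    """
    cells = product(*[range(lo, hi + 1) for lo, hi in cube])
    return min(counts.get(cell, 0) for cell in cells)


def is_dense(counts, cube, num_histories, num_base_cubes, threshold=2.0):
    """Assumed test: sparsest base cube >= threshold x average per base cube."""
    average = num_histories / num_base_cubes
    return cube_density(counts, cube) >= threshold * average


# Hypothetical 3 x 3 grid of base cubes holding 45 object histories.
counts = {(0, 0): 10, (0, 1): 12, (1, 0): 9, (1, 1): 11, (2, 2): 3}
print(is_dense(counts, [(0, 1), (0, 1)], 45, 9))  # False: sparsest cell has 9 < 10
print(is_dense(counts, [(0, 0), (0, 1)], 45, 9))  # True: sub-cube's minimum is 10
```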

Mining TARs: TAR algorithm
• Find density-based (subspace) clusters
• Find all valid rule sets
  • Make use of the strength and support metrics. For rule $X \Rightarrow Y$:
    $\text{strength}(X \Rightarrow Y) = \dfrac{Sup(X \wedge Y)}{Sup(X) \times Sup(Y)}$
  • Strength is used to prune the search space
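A direct Python transcription of the strength formula, assuming supports are fractions of object histories so that a strength of 1 means X and Y co-occur exactly as often as independence would predict. The formula is symmetric in X and Y. The ten histories below are hypothetical and are not the paper's data.

```python
def strength(histories, x, y):
    """strength(X => Y) = Sup(X and Y) / (Sup(X) * Sup(Y)).

    Assumption: supports are fractions of object histories. x and y are
    predicates telling whether a history follows X (resp. Y).
    """
    n = len(histories)
    sup_x = sum(1 for h in histories if x(h)) / n
    sup_y = sum(1 for h in histories if y(h)) / n
    sup_xy = sum(1 for h in histories if x(h) and y(h)) / n
    return sup_xy / (sup_x * sup_y)


# Hypothetical histories: did the person get a raise / move further out?
histories = (
    [{"raise": True, "moved": True}] * 4
    + [{"raise": True, "moved": False}]
    + [{"raise": False, "moved": True}]
    + [{"raise": False, "moved": False}] * 4
)
s = strength(histories, lambda h: h["raise"], lambda h: h["moved"])
print(s, s >= 1.3)  # about 1.6 = 0.4 / (0.5 * 0.5), so it passes a 1.3 threshold
```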

Pruning with the Strength Threshold

Property 1
• For any rule r, there exists a base rule $br_i$ which is a specialization of r and whose strength is $\geq$ that of r.

Implication
• Only have to examine rules which are generalizations of the base rules in BR (the set of base rules) whose strength is $\geq$ the threshold.

Pruning with the Strength Threshold

Property 2
• For any two rules r and r’ where
  • r’ is a specialization of r, and
  • strength of r’ < strength of r,
  there exists another base rule $br_i$ which is
  • a specialization of r but not of r’, and
  • of strength > strength of r.

Implication
• Can skip rules which are generalizations of r’ but which do not contain any other base rule in BR.

Finding Rule Sets from Each Cluster

• Find BR
• For each subset BR’ of BR, explore the corresponding search region, starting from rule r (the minimum bounding box of the rules in BR’)
  • If strength of r < threshold, ignore the region
  • min-rule: if sup of r $\geq$ threshold, the min-rule is r; if sup of r < threshold, search for its valid generalizations within the region, stopping when strength < threshold
  • max-rule: search similarly until a rule is found such that all of its generalizations either violate the strength requirement or include another base rule; there can be multiple max-rules for a min-rule
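The outer loop of this procedure can be sketched in Python, assuming rules are tuples of integer intervals and that strength and support are supplied as callbacks (here toy functions of the box volume stand in for counting object histories). It forms the minimum bounding box of each non-empty subset of BR and decides whether to prune the region, accept the box as the min-rule, or search its generalizations; the max-rule search is omitted. All names are hypothetical.

```python
from itertools import combinations


def bounding_box(rules):
    """Minimum bounding box of rules given as tuples of (lo, hi) intervals."""
    return tuple(
        (min(r[d][0] for r in rules), max(r[d][1] for r in rules))
        for d in range(len(rules[0]))
    )


def plan_regions(base_rules, strength_of, support_of, min_strength, min_support):
    """Decide what to do with the starting rule r of each non-empty BR' of BR."""
    plan = {}
    for k in range(1, len(base_rules) + 1):
        for subset in combinations(base_rules, k):  # 2^|BR| - 1 subsets overall
            r = bounding_box(subset)
            if strength_of(r) < min_strength:
                plan[subset] = "ignore region"
            elif support_of(r) >= min_support:
                plan[subset] = "min-rule is r"
            else:
                plan[subset] = "search generalizations of r"
    return plan


def volume(rule):
    """Number of base cells covered by a rule's box (toy stand-in for counts)."""
    v = 1
    for lo, hi in rule:
        v *= hi - lo + 1
    return v


br1, br2, br3 = ((1, 2), (3, 4)), ((2, 3), (4, 5)), ((8, 9), (8, 9))
plan = plan_regions(
    [br1, br2, br3],
    strength_of=lambda r: 30 / (volume(r) + 10),  # toy: bigger boxes are weaker
    support_of=lambda r: volume(r) / 100,         # toy: bigger boxes cover more
    min_strength=1.3,
    min_support=0.05,
)
for subset, action in plan.items():
    print(len(subset), bounding_box(subset), action)
```

In this toy run the close pair br1, br2 yields a box strong and frequent enough to be a min-rule, while any subset containing the distant br3 produces a box too weak to explore.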

Performance Evaluation

• 300MHz CPU with 128MB memory
• Three synthetic datasets
  • 100,000 objects with 5 attributes
  • 100 snapshots
  • Embedded 500 rules of length 5 or less
• User-specified thresholds
  • Density: 2 (2 times the average density)
  • Support: 5%
  • Strength: 1.3

Performance Evaluation

Precision: 100% for all algorithms

[Figure: Recall]

Performance Evaluation

Observations
• TAR is faster than SR and LE
  • Strength is used to prune the search space in TAR
  • TAR searches a smaller set of candidate rules
• The response time of TAR increases more slowly than that of SR and LE as the number of base intervals grows

Performance Evaluation

Real dataset
• 20,000 objects (persons)
• 5 attributes: age, title, salary, family status (single, married, head of household), and distance between the person’s house and a major city
• 10 snapshots (one per year)
• No. of base intervals: 100; support: 3%; density: 2; strength: 1.3

Performance Evaluation

Performance of TAR alg. on real dataset
• Time taken: 260s to mine 347 rule sets
• Examples of TARs
  • People receiving a salary raise tend to move further away from the city center.
  • If people with a salary between 70K and 100K get a raise, the raise will likely be between 7K and 15K.

Conclusions

A TAR model is proposed to represent correlations among numerical attribute evolutions.

A novel approach to mine TARs by first discovering clusters and then efficiently constructing rule sets is introduced.

Empirical evaluation shows TAR algorithm outperforms alternative algs. by a large margin.

References

W. Wang, J. Yang, and R. Muntz. TAR: Temporal association rules on evolving numerical attributes, ICDE’01.

R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables, SIGMOD’96.

B. Lent, A. Swami, and J. Widom. Clustering association rules, ICDE’97.

R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications, SIGMOD’98.