Mining frequency counts from sensor set data, Loo Kin Kong, 25th June 2003


Mining frequency counts from sensor set data

Loo Kin Kong
25th June 2003


Outline

- Motivation
- Sensor set data
- Finding frequency counts of itemsets from sensor set data
- Future work


Stock quotes

Closing prices of some HK stocks…

Date   0005   0023   0511   2388
09/6   95.00  15.30  28.10  7.90
10/6   94.00  15.45  27.80  7.85
11/6   93.50  15.60  27.90  7.85
12/6   95.00  15.50  27.80  7.75
13/6   95.25  15.55  27.70  7.90
16/6   95.25  15.30  27.85  7.95
17/6   97.00  15.55  27.60  8.00
18/6   97.00  15.60  26.45  8.10
19/6   96.50  15.55  27.20  8.05
20/6   96.00  15.55  28.00  8.10
23/6   94.25  15.30  27.70  7.95


Stock quotes

[Figure: intra-day stock price of TVB (0511) on 23rd June 2003. Source: quamnet.com]


Motivation

- Fluctuation of the price of a stock may be related to that of another stock or to other conditions
- Online analysis tools can give more insight into such variations
- The case of the stock market can be generalized...
- We use "sensors" to monitor some conditions, for example:
  - We monitor the prices of stocks by getting quotations from a finance website
  - We monitor the weather by measuring temperature, humidity, air pressure, wind, etc.


Sensors

Properties of a sensor include:

- A sensor reports values, either spontaneously or on request, reflecting the state of the condition being monitored
- Once a sensor reports a value, the value remains valid until the sensor reports again
- The lifespan of a value is defined as the length of time during which the value is valid
- The value reported must be one of the possible states of the condition
- The set of all possible states of a sensor is its state set

[Figure: time axis on which a sensor reports a state s at each of the times t1, t2, ..., t6]


Sensor set data

- A set of sensors (say, n of them) is called a sensor set
- At any time, we can obtain an n-tuple, composed of the values of the n sensors, with a time stamp attached:
      <t, (v1, v2, ..., vn)>
  where:
  - t is the time when the n-tuple is obtained
  - vx is the value of the x-th sensor
- If the n sensors have the same state set, we call the sensor set homogeneous


Mining association rules from sensor set data

- An association rule is a rule of the form
      X → Y
  where X and Y are two disjoint itemsets, satisfying certain support and confidence restrictions
- We redefine the support to reflect the time factor in sensor set data:
      supp(X) = lifespan(X) / length of history
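The time-based support can be illustrated with a minimal Python sketch. The names `support`, `readings`, `condition` and `end_time` are hypothetical and do not come from the slides. The sketch also assumes that the last n-tuple stays valid until a caller-supplied `end_time`, since the slides do not say how the history ends.

```python
from typing import Dict, Hashable, Sequence, Tuple

# One n-tuple <t, (v1, ..., vn)>: a timestamp plus one value per sensor.
Reading = Tuple[float, Tuple[Hashable, ...]]


def support(readings: Sequence[Reading],
            condition: Dict[int, Hashable],
            end_time: float) -> float:
    """Fraction of the history during which every (sensor, state) in
    `condition` holds. Sensors are addressed by 0-based position here."""
    start = readings[0][0]
    valid = 0.0
    for i, (t, values) in enumerate(readings):
        # A tuple is valid until the next tuple arrives (or until end_time).
        t_next = readings[i + 1][0] if i + 1 < len(readings) else end_time
        if all(values[x] == s for x, s in condition.items()):
            valid += t_next - t
    return valid / (end_time - start)


# Two sensors observed over the interval [0, 10).
readings = [(0, ("up", "down")), (2, ("up", "up")), (5, ("down", "up"))]
supp_first_up = support(readings, {0: "up"}, end_time=10)
supp_both_up = support(readings, {0: "up", 1: "up"}, end_time=10)
```

In this example the first sensor is "up" for 5 of the 10 time units, and both sensors are "up" together for 3 of them.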


Transformations of sensor-set data

- The n-tuples <t, (v1, v2, ..., vn)> need to be transformed before frequent itemsets can be found
- Transformation 1:
  - Each (zx, sy) pair, where zx is a sensor and sy is a state of zx, is treated as an item in traditional association rule mining
  - Hence, the i-th n-tuple is transformed as
        <t(i+1) - ti, {(z1, v1), (z2, v2), ..., (zn, vn)}>
    where ti is the timestamp of the i-th n-tuple
  - Thus, association rules of the form
        {(z1, v1), (z2, v2), ..., (zn, vn)} → {(zx, vx)}
    can be obtained
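As a rough illustration of Transformation 1, the sketch below turns a list of n-tuples into (lifespan, itemset) pairs. The function name `transform_1` and the `end_time` parameter are assumptions made for this example; in particular, the slides do not specify the lifespan of the final tuple, so it is taken to last until `end_time`.

```python
from typing import FrozenSet, Hashable, List, Sequence, Tuple

Reading = Tuple[float, Tuple[Hashable, ...]]
Item = Tuple[int, Hashable]  # (sensor number x, value vx)


def transform_1(readings: Sequence[Reading],
                end_time: float) -> List[Tuple[float, FrozenSet[Item]]]:
    """Map the i-th n-tuple to <t(i+1) - ti, {(z1, v1), ..., (zn, vn)}>."""
    transactions = []
    for i, (t, values) in enumerate(readings):
        t_next = readings[i + 1][0] if i + 1 < len(readings) else end_time
        # Sensors are numbered from 1, matching z1, ..., zn on the slide.
        items = frozenset(enumerate(values, start=1))
        transactions.append((t_next - t, items))
    return transactions


readings = [(0, ("up", "down")), (2, ("up", "up")), (5, ("down", "up"))]
transactions = transform_1(readings, end_time=10)
```

Each resulting transaction carries its own lifespan, so a miner can weight it by time instead of counting it once.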


Transformations of sensor-set data

- Transformation 2:
  - Assuming a homogeneous sensor set, each state s in the state set is treated as an item in traditional association rule mining
  - The i-th n-tuple is transformed as
        <t(i+1) - ti, {(e1, s1), (e2, s2), ..., (em, sm)}>
    where ti is the timestamp of the i-th n-tuple and ex is a boolean value showing whether the state sx exists in the tuple
  - Thus, association rules of the form
        {s1, s2, ..., sj} → {sk}
    can be obtained
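Transformation 2 can be sketched in the same style. The names `transform_2`, `state_set` and `end_time` are hypothetical, and the final tuple is again assumed to remain valid until `end_time`.

```python
from typing import Hashable, List, Sequence, Tuple

Reading = Tuple[float, Tuple[Hashable, ...]]
Flag = Tuple[bool, Hashable]  # (ex, sx): is state sx present in the tuple?


def transform_2(readings: Sequence[Reading],
                state_set: Sequence[Hashable],
                end_time: float) -> List[Tuple[float, Tuple[Flag, ...]]]:
    """Map the i-th n-tuple to <t(i+1) - ti, {(e1, s1), ..., (em, sm)}>
    for a homogeneous sensor set with states s1, ..., sm."""
    transactions = []
    for i, (t, values) in enumerate(readings):
        t_next = readings[i + 1][0] if i + 1 < len(readings) else end_time
        present = set(values)
        # Which sensor reported a state is deliberately forgotten here.
        flags = tuple((s in present, s) for s in state_set)
        transactions.append((t_next - t, flags))
    return transactions


readings = [(0, ("up", "down")), (2, ("up", "up")), (5, ("down", "up"))]
transactions = transform_2(readings, state_set=("up", "down"), end_time=10)
```

Compared with Transformation 1, the item universe shrinks from (sensor, state) pairs to states alone, at the cost of no longer knowing which sensor was in which state.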


The Lossy Counting (LC) Algorithm for items

- User specifies the support threshold s and error tolerance ε
- Transactions of single items are conceptually kept in buckets of size 1/ε
- At the end of each bucket, counts smaller than the error tolerance are discarded
- Counts of items are kept in a data structure D in the form (e, f, Δ), where
  - e is the item
  - f is the frequency of e since the entry was inserted in D
  - Δ is the maximum count of e before the entry was added to D


The Lossy Counting (LC) Algorithm for items

1.  D ← ∅; N ← 0
2.  w ← 1/ε; b ← 1
3.  e ← next transaction; N ← N + 1
4.  if (e,f,Δ) exists in D do
5.      f ← f + 1
6.  else do
7.      insert (e,1,b-1) to D
8.  endif
9.  if N mod w = 0 do
10.     prune(D, b); b ← b + 1
11. endif
12. Goto 3;

D: The set of all counts
N: Current length of stream
e: Transaction (of item)
w: Bucket width
b: Current bucket id


The Lossy Counting (LC) Algorithm for items

1. function prune(D, b)
2.     for each entry (e,f,Δ) in D do
3.         if f + Δ ≤ b do
4.             remove the entry from D
5.         endif
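The main loop and the prune step can be combined into a short runnable Python sketch. It follows the pseudocode on these slides, with a few assumptions labeled here and in the comments: the bucket width is rounded up to an integer, the stream is finite, and the helper `frequent` applies the usual Lossy Counting output rule f ≥ (s − ε)N, which is not shown on the slides. All function and variable names are hypothetical.

```python
import math
from typing import Dict, Hashable, Iterable, List, Tuple


def lossy_count(stream: Iterable[Hashable],
                epsilon: float) -> Tuple[Dict[Hashable, List[int]], int]:
    """Return (D, N), where D maps each surviving item e to [f, delta]."""
    counts: Dict[Hashable, List[int]] = {}
    w = math.ceil(1 / epsilon)  # bucket width (assumption: rounded up)
    b = 1                       # current bucket id
    n = 0                       # current length of the stream
    for e in stream:
        n += 1
        if e in counts:
            counts[e][0] += 1
        else:
            counts[e] = [1, b - 1]
        if n % w == 0:
            # prune(D, b): drop entries whose count cannot exceed b.
            stale = [k for k, (f, delta) in counts.items() if f + delta <= b]
            for k in stale:
                del counts[k]
            b += 1
    return counts, n


def frequent(counts: Dict[Hashable, List[int]], n: int,
             s: float, epsilon: float) -> List[Hashable]:
    """Assumed output rule: report items with f >= (s - epsilon) * N."""
    return [e for e, (f, _) in counts.items() if f >= (s - epsilon) * n]


stream = ["a", "b", "a", "c", "a", "d", "a", "e"]
counts, n = lossy_count(stream, epsilon=0.25)
result = frequent(counts, n, s=0.5, epsilon=0.25)
```

With ε = 0.25 the buckets hold four items each, so the one-off items b, c, d and e are pruned at the bucket boundaries and only a survives.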


The Lossy Counting (LC) Algorithm for itemsets

- Transactions are kept in buckets
- Multiple (say m) buckets are processed at a time. The value m depends on the amount of memory available
- For each transaction E, essentially, every subset of E is enumerated and treated as if it were an item in the LC algorithm for items
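The subset enumeration can be sketched with the standard library. This is a brute-force illustration only: a transaction with k items has 2^k − 1 non-empty subsets, so a practical implementation would need to limit the enumeration. The helper name `nonempty_subsets` is hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Hashable, Iterable, Iterator


def nonempty_subsets(transaction: Iterable[Hashable]) -> Iterator[FrozenSet]:
    """Yield every non-empty subset of a transaction as a frozenset,
    so that each subset can be used as a dictionary key like an item."""
    items = sorted(transaction)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


subsets = list(nonempty_subsets({"a", "b", "c"}))
```

Each yielded frozenset can then be counted exactly as a single item is counted in the LC algorithm for items.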


Extending the LC Algorithm for sensor-set data

- We can extend the LC Algorithm to find approximate frequency counts of itemsets for sensor-set data (SSD):
  - Instead of using a fixed-size bucket, the size of which is determined by ε, we can use a bucket which can hold an arbitrary number of transactions
  - During the i-th bucket, when a count is inserted into D, we set
        Δ = ε T1,i-1
    where Ti,j denotes the total time elapsed from bucket i up to bucket j
  - At the end of the i-th bucket, we prune D by removing the counts such that
        f + Δ ≤ ε T1,i


Extending the LC Algorithm for sensor-set data

1.  D ← ∅; N ← 0
2.  w ← (user defined value); b ← 1
3.  E ← next transaction; N ← N + 1
4.  foreach subset e of E
5.      if (e,f,Δ) exists in D do
6.          f ← f + 1
7.      else do
8.          insert (e,1,ε T1,b-1) to D
9.      endif
10. if N mod w = 0 do
11.     prune(D, ε T1,b); b ← b + 1
12. endif
13. Goto 3;

D: The set of all counts
N: Current length of stream
E: Transaction (of itemset)
w: Bucket width
b: Current bucket id
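A possible Python reading of the extended algorithm is sketched below, under several loudly stated assumptions. First, each transaction is a (lifespan, itemset) pair, as produced by the transformations earlier in the talk. Second, Δ and the pruning bound are both scaled by ε, that is, Δ = ε·T(1,b−1) and entries with f + Δ ≤ ε·T(1,b) are removed. Third, f is increased by the transaction's lifespan rather than by 1 as in the pseudocode, so that f is measured in the same time units as the bound. None of the names below come from the slides.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set, Tuple

Transaction = Tuple[float, Set[Hashable]]  # (lifespan, itemset)


def subsets(items: Iterable[Hashable]) -> List[FrozenSet]:
    """All non-empty subsets of an itemset (brute force)."""
    pool = sorted(items)
    return [frozenset(c) for k in range(1, len(pool) + 1)
            for c in combinations(pool, k)]


def lossy_count_ssd(transactions: Iterable[Transaction], epsilon: float,
                    w: int) -> Tuple[Dict[FrozenSet, List[float]], float]:
    """Return (D, total elapsed time). D maps itemset -> [f, delta]."""
    counts: Dict[FrozenSet, List[float]] = {}
    elapsed = 0.0       # T(1,b): time covered so far, current bucket included
    elapsed_prev = 0.0  # T(1,b-1): time covered by the completed buckets
    n = 0
    for lifespan, items in transactions:
        n += 1
        elapsed += lifespan
        for e in subsets(items):
            if e in counts:
                counts[e][0] += lifespan  # assumption: time-weighted count
            else:
                counts[e] = [lifespan, epsilon * elapsed_prev]
        if n % w == 0:  # end of the current bucket of w transactions
            bound = epsilon * elapsed
            stale = [k for k, (f, d) in counts.items() if f + d <= bound]
            for k in stale:
                del counts[k]
            elapsed_prev = elapsed
    return counts, elapsed


transactions = [(2, {"x", "y"}), (3, {"x"}), (1, {"y", "z"}), (4, {"x", "y"})]
counts, total_time = lossy_count_ssd(transactions, epsilon=0.25, w=2)
```

In this run the itemsets containing z are valid for only 1 of 10 time units and are pruned at the end of the second bucket, while {x}, {y} and {x, y} are retained.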


Observations

- The choice of w can affect the efficiency of the algorithm
  - A small w may cause the pruning procedure to be invoked too frequently
  - A big w may cause many transactions to be kept in memory
  - It may be possible to derive a good w with respect to the mean lifespan of the transactions
- If the lifespans of the transactions are short, we potentially need to prune D frequently
- The difference between adjacent transactions may be small


Future work

Evaluate the efficiency of the LC Algorithm for sensor-set data

Investigate how to exploit the observation that adjacent transactions may be very similar


Q & A
