Mining frequency counts from sensor set data
Loo Kin Kong
25th June 2003
Outline
- Motivation
- Sensor set data
- Finding frequency counts of itemsets from sensor set data
- Future work
Stock quotes
Closing prices of some HK stocks…
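| Date | 0005  | 0023  | 0511  | 2388 |
|------|-------|-------|-------|------|
| 09/6 | 95.00 | 15.30 | 28.10 | 7.90 |
| 10/6 | 94.00 | 15.45 | 27.80 | 7.85 |
| 11/6 | 93.50 | 15.60 | 27.90 | 7.85 |
| 12/6 | 95.00 | 15.50 | 27.80 | 7.75 |
| 13/6 | 95.25 | 15.55 | 27.70 | 7.90 |
| 16/6 | 95.25 | 15.30 | 27.85 | 7.95 |
| 17/6 | 97.00 | 15.55 | 27.60 | 8.00 |
| 18/6 | 97.00 | 15.60 | 26.45 | 8.10 |
| 19/6 | 96.50 | 15.55 | 27.20 | 8.05 |
| 20/6 | 96.00 | 15.55 | 28.00 | 8.10 |
| 23/6 | 94.25 | 15.30 | 27.70 | 7.95 |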
Stock quotes
Intra-day stock price of TVB (0511) on 23rd June 2003 (Source: quamnet.com)
Motivation
- Fluctuation of the price of a stock may be related to that of another stock or to other conditions
- Online analysis tools can help to give more insight into such variations
- The case of the stock market can be generalized...
- We use “sensors” to monitor some conditions, for example:
  - We monitor the prices of stocks by getting quotations from a finance website
  - We monitor the weather by measuring temperature, humidity, air pressure, wind, etc.
Sensors
- Properties of a sensor include:
  - A sensor reports values, either spontaneously or on request, reflecting the state of the condition being monitored
  - Once a sensor reports a value, the value remains valid until the sensor reports again
  - The lifespan of a value is defined as the length of time during which the value is valid
  - The value reported must be one of the possible states of the condition
  - The set of all possible states of a sensor is its state set
[Figure: timeline showing a sensor reporting a value s at each of the times t1, ..., t6]
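The lifespan definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming a sensor's history is a time-sorted list of (timestamp, value) reports and that the last value stays valid until a given end-of-history time; the function name, the numeric timestamps and the example states are hypothetical, not taken from the slides.

```python
from collections import defaultdict

def lifespans(reports, end_time):
    """Total time each value was valid, given time-sorted (timestamp, value) reports."""
    total = defaultdict(float)
    for i, (t, value) in enumerate(reports):
        # A value stays valid until the next report (or the end of the history).
        next_t = reports[i + 1][0] if i + 1 < len(reports) else end_time
        total[value] += next_t - t
    return dict(total)

# Hypothetical sensor whose state set is {"up", "down"}.
reports = [(0, "up"), (3, "down"), (4, "up"), (9, "down")]
result = lifespans(reports, end_time=10)
```

Here "up" is valid during [0, 3) and [4, 9), and "down" during [3, 4) and [9, 10).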
Sensor set data
- A set of sensors (say, n of them) is called a sensor set
- At any time, we can obtain an n-tuple, which is composed of the values of the n sensors, attached with a time stamp
  <t, (v1, v2, ..., vn)>
  where:
  - t is the time when the n-tuple is obtained
  - vx is the value of the x-th sensor
- If the n sensors have the same state set, we call the sensor set homogeneous
Mining association rules from sensor set data
An association rule is a rule, satisfying certain support and confidence restrictions, in the form
X → Y
where X and Y are two disjoint itemsets
We redefine the support to reflect the time factor in sensor set data
supp(X) = lifespan(X) / length of history
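A minimal sketch of this time-based support, assuming the history is a time-sorted list of (t, values) tuples, that each tuple stays valid until the next one (or until a given end time for the last), and that an itemset is a set of (sensor index, state) pairs. The function name and the sample data are hypothetical.

```python
def support(history, end_time, itemset):
    """Time-weighted support: lifespan(itemset) / length of history."""
    start = history[0][0]
    held = 0
    for i, (t, values) in enumerate(history):
        # Each n-tuple is assumed valid until the next one is obtained.
        next_t = history[i + 1][0] if i + 1 < len(history) else end_time
        if all(values[z] == s for z, s in itemset):
            held += next_t - t
    return held / (end_time - start)

# Hypothetical history of a sensor set with two sensors (indices 0 and 1).
history = [
    (0, ("up", "up")),
    (2, ("up", "down")),
    (5, ("down", "down")),
    (6, ("up", "up")),
]
supp_both_up = support(history, end_time=10, itemset={(0, "up"), (1, "up")})
supp_first_up = support(history, end_time=10, itemset={(0, "up")})
```

Both sensors report "up" for 6 of the 10 time units, while sensor 0 alone reports "up" for 9 of them, so a short-lived tuple contributes less than a long-lived one.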
Transformations of sensor-set data
- The n-tuples <ts, (v1, v2, ..., vn)> need transformation for finding frequent itemsets
- Transformation 1:
  - Each (zx, sy) pair, where zx is a sensor and sy a state for zx, is treated as an item in traditional association rule mining
  - Hence, the i-th n-tuple is transformed as
    <t(i+1) - ti, {(z1, v1), (z2, v2), ..., (zn, vn)}>
    where ti is the timestamp of the i-th n-tuple
  - Thus, association rules of the form
    {(z1, v1), (z2, v2), ..., (zn, vn)} → {(zx, vx)}
    can be obtained
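Transformation 1 might be sketched as follows. The slide's formula leaves the lifespan of the last n-tuple undefined, so this sketch assumes an explicit end-of-history time; the function name and the zero-based sensor indices are also assumptions made for illustration.

```python
def transform1(history, end_time):
    """Map <t_i, (v1, ..., vn)> tuples to <lifespan, {(sensor, value), ...}> transactions."""
    transactions = []
    for i, (t, values) in enumerate(history):
        next_t = history[i + 1][0] if i + 1 < len(history) else end_time
        # Each (sensor index, value) pair becomes one item; indices start at 0 here.
        items = frozenset(enumerate(values))
        transactions.append((next_t - t, items))
    return transactions

history = [(0, ("up", "down")), (4, ("up", "up"))]
transactions = transform1(history, end_time=5)
```

Every transaction produced this way has exactly n items, one per sensor.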
Transformations of sensor-set data
- Transformation 2:
  - Assuming a homogeneous sensor set, each s in the state set is treated as an item in traditional association rule mining
  - The i-th n-tuple is transformed as
    <t(i+1) - ti, {(e1, s1), (e2, s2), ..., (em, sm)}>
    where ti is the timestamp of the i-th n-tuple, and ex is a boolean value showing whether the state sx exists in the tuple
  - Thus, association rules of the form
    {s1, s2, ..., sj} → {sk}
    can be obtained
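A comparable sketch for Transformation 2, assuming a homogeneous sensor set. Instead of storing a boolean ex for every state, it keeps only the states that actually occur in the tuple, which carries the same information; the function name and the end-of-history parameter are assumptions.

```python
def transform2(history, end_time):
    """Map <t_i, (v1, ..., vn)> tuples to <lifespan, {states present}> transactions."""
    transactions = []
    for i, (t, values) in enumerate(history):
        next_t = history[i + 1][0] if i + 1 < len(history) else end_time
        # Keep the set of states reported by at least one sensor (e_x is true).
        transactions.append((next_t - t, frozenset(values)))
    return transactions

# Hypothetical homogeneous sensor set with state set {"up", "down", "flat"}.
history = [(0, ("up", "down", "up")), (2, ("flat", "flat", "flat"))]
transactions = transform2(history, end_time=3)
```

Unlike Transformation 1, the identity of the sensor is lost: a transaction records which states occur, not which sensor reports them.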
The Lossy Counting (LC) Algorithm for items
- User specifies the support threshold s and error tolerance ε
- Transactions of single item are conceptually kept in buckets of size ⌈1/ε⌉
- At the end of each bucket, counts smaller than the error tolerance are discarded
- Counts, kept in a data structure D, of items are kept in the form (e, f, Δ), where
  - e is the item
  - f is the frequency of e since the entry is inserted in D
  - Δ is the maximum count of e before the entry is added to D
The Lossy Counting (LC) Algorithm for items
1. D ← ∅; N ← 0
2. w ← ⌈1/ε⌉; b ← 1
3. e ← next transaction; N ← N + 1
4. if (e, f, Δ) exists in D do
5.   f ← f + 1
6. else do
7.   insert (e, 1, b-1) to D
8. endif
9. if N mod w = 0 do
10.   prune(D, b); b ← b + 1
11. endif
12. Goto 3

D: The set of all counts
N: Current length of stream
e: Transaction (of item)
w: Bucket width
b: Current bucket id
The Lossy Counting (LC) Algorithm for items
1. function prune(D, b)
2. for each entry (e, f, Δ) in D do
3.   if f + Δ ≤ b do
4.     remove the entry from D
5.   endif
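The main loop and the prune step for single items can be put together in a short Python sketch. It assumes a finite, in-memory stream and a dictionary standing in for D; the function name and the sample stream are illustrative only.

```python
import math

def lossy_count(stream, epsilon):
    """Lossy Counting over a stream of single items; returns {item: (f, delta)}."""
    counts = {}                       # D: item -> [f, delta]
    width = math.ceil(1 / epsilon)    # w: bucket width
    bucket = 1                        # b: current bucket id
    for n, item in enumerate(stream, start=1):
        if item in counts:
            counts[item][0] += 1
        else:
            counts[item] = [1, bucket - 1]
        if n % width == 0:
            # prune(D, b): drop every entry with f + delta <= b
            counts = {e: fd for e, fd in counts.items() if fd[0] + fd[1] > bucket}
            bucket += 1
    return {e: tuple(fd) for e, fd in counts.items()}

stream = list("aabacadaea")
result = lossy_count(stream, epsilon=0.5)
```

With ε = 0.5 the bucket width is 2, so each of the rare items b, c, d and e is inserted and then pruned at the end of its own bucket, while a survives with its exact count of 6.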
The Lossy Counting (LC) Algorithm for itemsets
- Transactions are kept in buckets
- Multiple (say m) buckets are processed at a time. The value m depends on the amount of memory available
- For each transaction E, essentially, every subset of E is enumerated and treated as if it were an item in the LC algorithm for items
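The subset enumeration can be illustrated with `itertools.combinations`. This is a naive sketch, assuming non-empty subsets only and items that can be sorted; the helper name is hypothetical.

```python
from itertools import combinations

def nonempty_subsets(transaction):
    """Yield every non-empty subset of a transaction as a frozenset."""
    items = sorted(transaction)
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            yield frozenset(subset)

subsets = list(nonempty_subsets({"a", "b", "c"}))
```

A transaction of k items has 2^k - 1 non-empty subsets (7 for the three items here), which is why exhaustive enumeration is only practical for short transactions.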
Extending the LC Algorithm for sensor-set data
- We can extend the LC Algorithm for finding approximate frequency counts of itemsets for SSD:
  - Instead of using a fixed sized bucket, size of which is determined by ε, we can use a bucket which can hold an arbitrary number of transactions
  - During the i-th bucket, when a count is inserted to D, we set
    Δ = T1,i-1
    where Ti,j denotes the total time elapsed since bucket i up to bucket j
  - At the end of the i-th bucket, we prune D by removing the counts such that
    Δ + f ≤ T1,i
Extending the LC Algorithm for sensor-set data
1. D ← ∅; N ← 0
2. w ← (user defined value); b ← 1
3. E ← next transaction; N ← N + 1
4. foreach subset e of E
5.   if (e, f, Δ) exists in D do
6.     f ← f + 1
7.   else do
8.     insert (e, 1, T1,b-1) to D
9.   endif
10. if N mod w = 0 do
11.   prune(D, T1,b); b ← b + 1
12. endif
13. Goto 3

D: The set of all counts
N: Current length of stream
E: Transaction (of itemset)
w: Bucket width
b: Current bucket id
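One way to make the extension concrete is the time-weighted sketch below. It deliberately departs from the pseudocode in two places, and both are assumptions rather than the author's stated method: f accumulates the lifespan of each transaction instead of being incremented by 1, and both Δ and the pruning threshold are scaled by ε, by analogy with the item version, where the bucket id b equals εN at a bucket boundary. Transactions are assumed to be (lifespan, itemset) pairs, as produced by the transformations described earlier.

```python
from itertools import combinations

def lossy_count_ssd(transactions, epsilon, width):
    """Time-weighted Lossy Counting sketch; returns ({itemset: (f, delta)}, elapsed)."""
    counts = {}      # D: frozenset -> [f, delta]
    elapsed = 0.0    # T(1, b): time elapsed up to the current transaction
    closed = 0.0     # T(1, b-1): time elapsed up to the end of the previous bucket
    for n, (lifespan, items) in enumerate(transactions, start=1):
        elapsed += lifespan
        ordered = sorted(items)
        for size in range(1, len(ordered) + 1):
            for subset in combinations(ordered, size):
                key = frozenset(subset)
                if key in counts:
                    counts[key][0] += lifespan   # assumption: f accumulates time
                else:
                    counts[key] = [lifespan, epsilon * closed]
        if n % width == 0:
            # Assumption: prune entries with f + delta <= epsilon * T(1, b).
            threshold = epsilon * elapsed
            counts = {e: fd for e, fd in counts.items() if fd[0] + fd[1] > threshold}
            closed = elapsed
    return {e: tuple(fd) for e, fd in counts.items()}, elapsed

transactions = [(4, {"x", "y"}), (1, {"x"}), (1, {"z"}), (4, {"x", "y"})]
result, total_time = lossy_count_ssd(transactions, epsilon=0.5, width=2)
```

In this run {z} holds for only 1 of the 10 time units and is pruned at the end of the second bucket, while {x}, {y} and {x, y} keep their full accumulated lifespans of 9, 8 and 8.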
Observations
- The choice of w can affect the efficiency of the algorithm
  - A small w may cause the pruning procedure to be invoked too frequently
  - A big w may cause many transactions to be kept in memory
  - It may be possible to derive a good w w.r.t. the mean lifespan of the transactions
- If the lifespans of the transactions are short, potentially we need to prune D frequently
- The difference between adjacent transactions may be small
Future work
Evaluate the efficiency of the LC Algorithm for sensor-set data
Investigate how to exploit the observation that adjacent transactions may be very similar
Q & A