association rule mining. mining association rules in large databases association rule mining ...
Post on 22-Dec-2015
![Page 1: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/1.jpg)
Association Rule Mining
![Page 2: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/2.jpg)
Mining Association Rules in Large Databases
Association rule mining
Algorithms Apriori and FP-Growth
Max and closed patterns
Mining various kinds of association/correlation rules
![Page 3: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/3.jpg)
Max-patterns & Closed-patterns
If there are frequent patterns with many items, enumerating all of them is costly.
We may be interested in finding the ‘boundary’ frequent patterns.
Two types…
![Page 4: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/4.jpg)
Max-patterns
A frequent pattern {a1, …, a100} has $\binom{100}{1} + \binom{100}{2} + \dots + \binom{100}{100} = 2^{100} - 1 \approx 1.27 \times 10^{30}$ frequent sub-patterns!
Max-pattern: a frequent pattern with no proper frequent super-pattern.
In the transactions below, BCDE and ACD are max-patterns.
BCD is not a max-pattern, because BCDE is a frequent super-pattern of it.
Tid Items
10 A,B,C,D,E
20 B,C,D,E
30 A,C,D,F
Min_sup=2
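The definition can be checked by brute force on the three transactions above. The following is a minimal sketch, not an efficient miner: it enumerates every candidate itemset, which is only feasible for toy data. The helper names (`support`, `frequent_itemsets`, `max_patterns`) are illustrative choices, not part of the slides.

```python
from itertools import combinations

# Transactions from the slide (Tid -> items); min_sup is an absolute count.
transactions = {
    10: {"A", "B", "C", "D", "E"},
    20: {"B", "C", "D", "E"},
    30: {"A", "C", "D", "F"},
}
MIN_SUP = 2


def support(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in db.values() if itemset <= items)


def frequent_itemsets(db, min_sup):
    """Enumerate all frequent itemsets by brute force (tiny data only)."""
    universe = sorted(set().union(*db.values()))
    found = []
    for k in range(1, len(universe) + 1):
        for combo in combinations(universe, k):
            candidate = frozenset(combo)
            if support(candidate, db) >= min_sup:
                found.append(candidate)
    return found


def max_patterns(frequent):
    """Keep only frequent itemsets that have no proper frequent superset."""
    return [p for p in frequent if not any(p < q for q in frequent)]


frequent = frequent_itemsets(transactions, MIN_SUP)
maximal = sorted("".join(sorted(p)) for p in max_patterns(frequent))
print(len(frequent), maximal)
```

On this data the two max-patterns stand in for all 19 frequent itemsets, which is the motivation for mining the boundary instead of the full set.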
![Page 5: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/5.jpg)
MaxMiner: Mining Max-patterns
Idea: generate the complete set-enumeration tree one level at a time, pruning where applicable.
(ABCD)
A (BCD) B (CD) C (D) D ()
AB (CD) AC (D) AD () BC (D) BD () CD ()
ABC (D) ABD () ACD () BCD ()
ABCD ()
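The tree can be generated mechanically. The sketch below assumes the items are ordered alphabetically and writes each node as a head followed by its tail in parentheses, as in the figure; the function name `set_enumeration_levels` is an illustrative choice.

```python
def set_enumeration_levels(items):
    """Return the tree level by level as lists of (head, tail) string pairs.

    A child is formed by moving one tail item into the head; the child's
    tail keeps only the items that come after that item in the ordering.
    """
    levels = [[("", "".join(items))]]
    while True:
        next_level = []
        for head, tail in levels[-1]:
            for pos, item in enumerate(tail):
                next_level.append((head + item, tail[pos + 1:]))
        if not next_level:
            return levels
        levels.append(next_level)


levels = set_enumeration_levels("ABCD")
for depth, nodes in enumerate(levels):
    print(depth, "  ".join(f"{head} ({tail})" for head, tail in nodes))
```

Every subset of the items appears exactly once as a head, so the full tree over n items has 2^n nodes; pruning is what keeps the search tractable.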
![Page 6: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/6.jpg)
Local Pruning Techniques (e.g. at node A)
Check the frequency of ABCD and of AB, AC, AD.
If ABCD is frequent, prune the whole sub-tree.
If AC is NOT frequent, remove C from the parentheses before expanding.
(ABCD)
A (BCD) B (CD) C (D) D ()
AB (CD) AC (D) AD () BC (D) BD () CD ()
ABC (D) ABD () ACD () BCD ()
ABCD ()
![Page 7: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/7.jpg)
Algorithm MaxMiner
Initially, generate one node $N$ with $h(N) = \emptyset$ and $t(N) = \{A, B, C, D\}$.
Consider expanding $N$:
If $h(N) \cup t(N)$ is frequent, do not expand $N$.
If for some $i \in t(N)$, $h(N) \cup \{i\}$ is NOT frequent, remove $i$ from $t(N)$ before expanding $N$.
Apply global pruning techniques…
(ABCD)
![Page 8: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/8.jpg)
Global Pruning Technique (across sub-trees)
When a max pattern is identified (e.g. ABCD), prune all nodes $N$ (e.g. B, C and D) where $h(N) \cup t(N)$ is a subset of it (e.g. ABCD).
(ABCD)
A (BCD) B (CD) C (D) D ()
AB (CD) AC (D) AD () BC (D) BD () CD ()
ABC (D) ABD () ACD () BCD ()
ABCD ()
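The local and global pruning rules can be combined into a short breadth-first search. This is a simplified sketch rather than the published MaxMiner algorithm: it counts support by scanning the transactions directly, and it filters the tail before testing h(N) ∪ t(N), a small reordering of the steps listed earlier that can find a max pattern one level sooner. The name `max_miner` and the final subsumption filter are assumptions made for illustration.

```python
from collections import deque


def support(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for transaction in db if itemset <= transaction)


def max_miner(db, min_sup):
    """Breadth-first set-enumeration search with local and global pruning."""
    items = sorted(set().union(*db))
    candidates = []  # frequent itemsets that may turn out to be maximal
    queue = deque([(frozenset(), items)])  # (head h(N), ordered tail t(N))
    while queue:
        head, tail = queue.popleft()
        # Local pruning: drop tail items i whose h(N) + {i} is infrequent.
        tail = [i for i in tail if support(head | {i}, db) >= min_sup]
        full = head | frozenset(tail)
        # Global pruning: h(N) + t(N) is inside an already found pattern.
        if any(full <= found for found in candidates):
            continue
        # Local pruning: h(N) + t(N) is frequent, so do not expand N.
        if full and support(full, db) >= min_sup:
            candidates.append(full)
            continue
        for pos, item in enumerate(tail):
            queue.append((head | {item}, tail[pos + 1:]))
    # Discard candidates subsumed by a larger candidate found later.
    return [m for m in candidates if not any(m < other for other in candidates)]


db = [set("ABCDE"), set("BCDE"), set("ACDF")]
result = sorted("".join(sorted(m)) for m in max_miner(db, min_sup=2))
print(result)
```

On the three-transaction example only the root and the nodes A and B need a superset count; C, D and E are removed by global pruning once BCDE is known.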
![Page 9: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/9.jpg)
Example
Tid Items
10 A,B,C,D,E
20 B,C,D,E
30 A,C,D,F
Min_sup=2
Root node (ABCDEF)
Items Frequency
ABCDEF 0
A 2
B 2
C 3
D 3
E 2
F 1
F is infrequent, so it is removed from the tail before expanding:
A (BCDE) B (CDE) C (DE) D (E) E ()
Max patterns: none yet
![Page 10: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/10.jpg)
Example
Tid Items
10 A,B,C,D,E
20 B,C,D,E
30 A,C,D,F
Min_sup=2
(ABCDEF)
A (BCDE) B (CDE) C (DE) D (E) E ()
Node A
Items Frequency
ABCDE 1
AB 1
AC 2
AD 2
AE 1
AB and AE are infrequent, so B and E are removed from the tail of A before expanding:
AC (D) AD ()
Max patterns: none yet
![Page 11: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/11.jpg)
Example
Tid Items
10 A,B,C,D,E
20 B,C,D,E
30 A,C,D,F
Min_sup=2
(ABCDEF)
A (BCDE) B (CDE) C (DE) D (E) E ()
AC (D) AD ()
Node B
Items Frequency
BCDE 2
BC
BD
BE
BCDE is frequent, so node B is not expanded.
Max patterns: BCDE
![Page 12: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/12.jpg)
Example
Tid Items
10 A,B,C,D,E
20 B,C,D,E
30 A,C,D,F
Min_sup=2
(ABCDEF)
A (BCDE) B (CDE) C (DE) D (E) E ()
AC (D) AD ()
ACD ()
Node AC
Items Frequency
ACD 2
Max patterns: BCDE, ACD
![Page 13: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/13.jpg)
Frequent Closed Patterns
For a frequent itemset X, if there exists no item y such that every transaction containing X also contains y, then X is a frequent closed pattern.
“ab” is a frequent closed pattern.
Concise representation of frequent patterns.
Reduces the number of patterns and rules.
N. Pasquier et al. In ICDT’99
TID Items
10 a, b, c
20 a, b, c
30 a, b, d
40 a, b, d
50 e, f
Min_sup=2
![Page 14: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/14.jpg)
Max Pattern vs. Frequent Closed Pattern
max pattern => closed pattern
If itemset X is a max pattern, adding any item to it would not give a frequent pattern; thus there exists no item y such that every transaction containing X also contains y.
closed pattern does not imply max pattern
“ab” is a closed pattern, but not max.
TID Items
10 a, b, c
20 a, b, c
30 a, b, d
40 a, b, d
50 e, f
Min_sup=2
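Both definitions can be compared on the five transactions above. The sketch below uses brute-force enumeration (suitable for toy data only) and relies on an equivalent form of the closed-pattern definition: X is closed exactly when no proper superset of X has the same support. The variable names are illustrative.

```python
from itertools import combinations

db = [set("abc"), set("abc"), set("abd"), set("abd"), set("ef")]
MIN_SUP = 2


def support(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for transaction in db if itemset <= transaction)


items = sorted(set().union(*db))
# Map each frequent itemset to its support count.
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        count = support(frozenset(combo))
        if count >= MIN_SUP:
            frequent[frozenset(combo)] = count

# Closed: no proper superset with the same support.
closed = [p for p, s in frequent.items()
          if not any(p < q and frequent[q] == s for q in frequent)]
# Max: no proper superset that is frequent at all.
maximal = [p for p in frequent if not any(p < q for q in frequent)]


def label(patterns):
    return sorted("".join(sorted(p)) for p in patterns)


print("closed:", label(closed))
print("max:   ", label(maximal))
```

Here “ab” (support 4) is closed because its supersets abc and abd each have support 2, yet it is not max because those supersets are still frequent.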
![Page 15: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/15.jpg)
Mining Frequent Closed Patterns: CLOSET
Flist: list of all frequent items in support ascending order
Flist: d-a-f-e-c
Divide search space
Patterns having d
Patterns having a but not d, etc.
Find frequent closed patterns recursively
Among the transactions having d, cfa is frequent closed, so cfad is a frequent closed pattern
J. Pei, J. Han & R. Mao. "CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets", DMKD'00.
TID Items
10 a, c, d, e, f
20 a, b, e
30 c, e, f
40 a, c, d, f
50 c, e, f
Min_sup=2
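The first CLOSET step on this database can be traced in a few lines. This sketch only builds the Flist and the d-conditional database; it is not the full recursive algorithm from the paper. Items c, e and f all have support 4, so their relative order in the Flist is arbitrary: the slide lists d-a-f-e-c, while the sketch breaks ties alphabetically.

```python
from collections import Counter

db = {
    10: set("acdef"),
    20: set("abe"),
    30: set("cef"),
    40: set("acdf"),
    50: set("cef"),
}
MIN_SUP = 2

# Flist: frequent items in ascending order of support (ties alphabetical).
counts = Counter(item for items in db.values() for item in items)
flist = sorted((i for i in counts if counts[i] >= MIN_SUP),
               key=lambda i: (counts[i], i))

# d-conditional database: transactions containing d, with d removed.
d_db = [items - {"d"} for items in db.values() if "d" in items]

# Items present in every transaction of the d-conditional database
# always co-occur with d, so together with d they form a closed pattern.
always_with_d = set.intersection(*d_db)
closed_with_d = "".join(sorted(always_with_d | {"d"}))
d_support = len(d_db)

print(flist, closed_with_d, d_support)
```

The result is the itemset {a, c, d, f} with support 2, which is the pattern the slide writes as cfad. Item b is absent from the Flist because it occurs in only one transaction.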
![Page 16: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/16.jpg)
Multiple-Level Association Rules
Items often form a hierarchy.
Items at the lower level are expected to have lower support.
Rules regarding itemsets at appropriate levels could be quite useful.
A transactional database can be encoded based on dimensions and levels.
We can explore shared multi-level mining.
[Figure: concept hierarchy. Food branches into milk and bread; milk into skim and 2% fat; bread into white and wheat; brand-level items include Garelick and Wonder, ....]
![Page 17: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/17.jpg)
Mining Multi-Level Associations
A top-down, progressive deepening approach:
First find high-level strong rules: milk => bread [20%, 60%].
Then find their lower-level “weaker” rules: 2% fat milk => wheat bread [6%, 50%].
Variations in mining multiple-level association rules:
Level-crossed association rules: skim milk => Wonder wheat bread
Association rules with multiple, alternative hierarchies: full fat milk => Wonder bread
![Page 18: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/18.jpg)
Multi-level Association: Uniform Support vs. Reduced Support
Uniform Support: the same minimum support for all levels
+ One minimum support threshold. No need to examine itemsets containing any item whose ancestors do not have minimum support.
– Lower level items do not occur as frequently. If the support threshold is
too high => miss low level associations
too low => generate too many high level associations
![Page 19: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/19.jpg)
Multi-level Association: Uniform Support vs. Reduced Support
Reduced Support: reduced minimum support at lower levels
There are 4 search strategies:
Level-by-level independent: independent search at all levels (no misses)
Level-cross filtering by k-itemset: prune a k-pattern if the corresponding k-pattern at the upper level is infrequent
Level-cross filtering by single item: prune an item if its parent node is infrequent
Controlled level-cross filtering by single item: consider ‘subfrequent’ items that pass a passage threshold
![Page 20: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/20.jpg)
Uniform Support
Multi-level mining with uniform support
Level 1, min_sup = 5%: Milk [support = 10%]
Level 2, min_sup = 5%: full fat Milk [support = 6%], Skim Milk [support = 4%] (X: pruned, below min_sup)
![Page 21: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/21.jpg)
Reduced Support
Multi-level mining with reduced support
Level 1, min_sup = 5%: Milk [support = 10%]
Level 2, min_sup = 3%: full fat Milk [support = 6%], Skim Milk [support = 4%]
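The two threshold settings can be compared directly on the milk hierarchy. This is a minimal sketch that applies level-cross filtering by single item (an item is examined only if its parent is frequent); the dictionary layout and the name `frequent_items` are assumptions made for illustration.

```python
# Support values and levels are taken from the two figures above.
hierarchy = {
    "milk": {"level": 1, "parent": None, "support": 0.10},
    "full fat milk": {"level": 2, "parent": "milk", "support": 0.06},
    "skim milk": {"level": 2, "parent": "milk", "support": 0.04},
}


def frequent_items(hierarchy, min_sup_by_level):
    """Return frequent items, visiting higher levels before lower ones."""
    frequent = []
    by_level = sorted(hierarchy.items(), key=lambda entry: entry[1]["level"])
    for name, node in by_level:
        # Level-cross filtering by single item: skip children of
        # infrequent parents.
        parent_ok = node["parent"] is None or node["parent"] in frequent
        if parent_ok and node["support"] >= min_sup_by_level[node["level"]]:
            frequent.append(name)
    return frequent


uniform = frequent_items(hierarchy, {1: 0.05, 2: 0.05})
reduced = frequent_items(hierarchy, {1: 0.05, 2: 0.03})
print("uniform:", uniform)
print("reduced:", reduced)
```

With a uniform 5% threshold skim milk is lost at level 2; lowering the level-2 threshold to 3% keeps it.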
![Page 22: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/22.jpg)
Interestingness Measurements
Objective measures
Two popular measurements: support and confidence
Subjective measures
A rule (pattern) is interesting if it is unexpected (surprising to the user) and/or actionable (the user can do something with it)
![Page 23: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/23.jpg)
Criticism of Support and Confidence
Example 1: Among 5000 students
3000 play basketball
3750 eat cereal
2000 both play basketball and eat cereal
play basketball => eat cereal [40%, 66.7%] is misleading, because the overall percentage of students eating cereal is 75%, which is higher than 66.7%.
play basketball => not eat cereal [20%, 33.3%] is far more accurate, although with lower support and confidence.

| | basketball | not basketball | sum(row) |
|---|---|---|---|
| cereal | 2000 | 1750 | 3750 |
| not cereal | 1000 | 250 | 1250 |
| sum(col.) | 3000 | 2000 | 5000 |
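The numbers in this example can be recomputed from the four counts. The sketch below also reports the ratio of the rule's confidence to the base rate of its consequent, which makes the comparison with the 75% figure explicit; the function name `rule_stats` is an illustrative choice.

```python
n_students = 5000
n_basketball = 3000
n_cereal = 3750
n_both = 2000


def rule_stats(n_antecedent, n_consequent, n_joint, n_total):
    """Support, confidence, and confidence relative to the base rate."""
    support = n_joint / n_total
    confidence = n_joint / n_antecedent
    base_rate = n_consequent / n_total
    return support, confidence, confidence / base_rate


# play basketball => eat cereal
cereal_rule = rule_stats(n_basketball, n_cereal, n_both, n_students)
# play basketball => not eat cereal
no_cereal_rule = rule_stats(
    n_basketball, n_students - n_cereal, n_basketball - n_both, n_students
)

for name, (sup, conf, ratio) in [("cereal", cereal_rule),
                                 ("not cereal", no_cereal_rule)]:
    print(f"{name}: support={sup:.1%} confidence={conf:.1%} ratio={ratio:.2f}")
```

A ratio below 1 for the first rule shows that playing basketball makes eating cereal less likely than average, even though the rule's support and confidence look strong.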
![Page 24: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/24.jpg)
Criticism of Support and Confidence (Cont.)
Example 2: X and Y are positively correlated and X and Z are negatively related, yet the support and confidence of X=>Z dominate.
We need a measure of dependent or correlated events:
$$\mathrm{corr}_{A,B} = \frac{P(A \wedge B)}{P(A)\,P(B)}$$
P(B|A)/P(B) is also called the lift of rule A => B
X 1 1 1 1 0 0 0 0
Y 1 1 0 0 0 0 0 0
Z 0 1 1 1 1 1 1 1

| Rule | Support | Confidence |
|---|---|---|
| X=>Y | 25% | 50% |
| X=>Z | 37.50% | 75% |
![Page 25: Association Rule Mining. Mining Association Rules in Large Databases Association rule mining Algorithms Apriori and FP-Growth Max and closed patterns](https://reader035.vdocument.in/reader035/viewer/2022062407/56649d7a5503460f94a5d9c6/html5/thumbnails/25.jpg)
Other Interestingness Measures: Interest
Interest (correlation, lift): $\frac{P(A \wedge B)}{P(A)\,P(B)}$
taking both P(A) and P(B) into consideration
$P(A \wedge B) = P(B) \cdot P(A)$ if A and B are independent events, so the value is 1 in that case
A and B are negatively correlated if the value is less than 1, and positively correlated if it is greater than 1
X 1 1 1 1 0 0 0 0
Y 1 1 0 0 0 0 0 0
Z 0 1 1 1 1 1 1 1

| Itemset | Support | Interest |
|---|---|---|
| X,Y | 25% | 2 |
| X,Z | 37.50% | 0.86 |
| Y,Z | 12.50% | 0.57 |
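The interest values can be recomputed from the eight X, Y, Z rows. This is a small sketch in which each list position is treated as one transaction; the helper names `prob` and `interest` are illustrative.

```python
from itertools import combinations

# One list position per transaction, copied from the table above.
columns = {
    "X": [1, 1, 1, 1, 0, 0, 0, 0],
    "Y": [1, 1, 0, 0, 0, 0, 0, 0],
    "Z": [0, 1, 1, 1, 1, 1, 1, 1],
}


def prob(*names):
    """Fraction of transactions in which all named items occur."""
    n_rows = len(columns[names[0]])
    hits = sum(all(columns[name][row] for name in names)
               for row in range(n_rows))
    return hits / n_rows


def interest(a, b):
    """P(A and B) / (P(A) * P(B)); equals 1 for independent items."""
    return prob(a, b) / (prob(a) * prob(b))


for a, b in combinations(columns, 2):
    print(f"{a},{b}: support={prob(a, b):.2%} interest={interest(a, b):.2f}")
```

X and Z have the highest support of the three pairs, yet their interest is below 1, which is the negative relationship that support and confidence alone fail to show.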