Kuo-Yu Huang NCU CSIE DBLab 1
The Concept of Maximal Frequent Itemsets
NCU CSIE Database Laboratory
Kuo-Yu Huang
2002-04-15
Outline
• Introduction
• Max-Miner
• MAFIA
• GenMax
• Conclusion
Introduction (1/2)
• Interesting datasets with long patterns
  – Questionnaire results
  – Transaction databases
    • Contain many frequently occurring items
    • A large average record length
• Apriori-like algorithms are inadequate
  – They enumerate every single frequent itemset
Introduction (2/2)
• Maximal Frequent Itemsets
  – A frequent itemset is maximal if it has no superset that is frequent.
  – e.g.
    • Items: a, b, c, d, e
    • Frequent itemset: {a, b, c}
    • {a, b, c, d}, {a, b, c, e}, {a, b, c, d, e} are not frequent itemsets.
    • Maximal frequent itemset: {a, b, c}
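The definition above can be checked by brute force on a tiny database. The sketch below is only an illustration: the toy transactions, the `min_count` threshold, and the function names are assumptions and do not come from the slides.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_count):
    """Brute force: every itemset contained in at least min_count transactions."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_count:
                frequent.append(cand)
    return frequent

def maximal_itemsets(frequent):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]

# Hypothetical toy database over the items a..e.
transactions = [frozenset("abc"), frozenset("abc"), frozenset("abcd"), frozenset("e")]
frequent = frequent_itemsets(transactions, min_count=2)
maximal = maximal_itemsets(frequent)
```

Here all seven non-empty subsets of {a, b, c} are frequent, yet only {a, b, c} itself is maximal, which is why reporting the maximal sets alone is so much more compact on long patterns.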
Max-Miner (1/4)
• Efficiently mining long patterns from databases
  – R. J. Bayardo
  – ACM SIGMOD '98
• Max-Miner
  – Abandons a bottom-up traversal
  – Attempts to "look ahead"
  – Identify a long frequent itemset, prune all its subsets.
Max-Miner (2/4)
• Set-enumeration tree
• Breadth-first search
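A minimal sketch of a breadth-first walk over a set-enumeration tree, assuming a fixed item order. Each node carries the itemset it enumerates and the ordered items that may still extend it. The function name and the four-item example are illustrative assumptions, not Max-Miner's actual implementation.

```python
from collections import deque

def set_enumeration_bfs(items):
    """Yield (head, tail) nodes of the set-enumeration tree level by level."""
    queue = deque([((), tuple(items))])  # root: empty head, all items in the tail
    while queue:
        head, tail = queue.popleft()
        yield head, tail
        # Each child extends the head with one tail item; only the items
        # that come later in the ordering remain in the child's tail.
        for i, item in enumerate(tail):
            queue.append((head + (item,), tail[i + 1:]))

nodes = list(set_enumeration_bfs([1, 2, 3, 4]))
```

Every subset of the items appears exactly once, so without pruning the tree over four items has 16 nodes.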
Max-Miner (3/4)
• Candidate group
  – Head: h(g)
    • Itemset enumerated by the node.
  – Tail: t(g)
    • An ordered set containing all items not in h(g)
  – e.g. Node {1}
    • h(g): {1}
    • t(g): {2, 3, 4}
Max-Miner (4/4)
• Support counting
  – Count h(g), h(g) ∪ t(g), and h(g) ∪ {i} for all i ∈ t(g)
  – If h(g) ∪ t(g) is frequent, then any itemset enumerated by a sub-node will also be frequent but not maximal.
  – If h(g) ∪ {i} is infrequent, then any head of a sub-node that contains item i will also be infrequent.
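The two pruning rules can be sketched for a single candidate group as follows. This is a simplified illustration, not Bayardo's implementation: it recounts supports by scanning the transactions, and the names `support`, `expand_group`, and the toy database are assumptions.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def expand_group(head, tail, transactions, min_count):
    """Apply both pruning rules to one candidate group g = (head, tail).

    Returns ("maximal_candidate", itemset) when h(g) ∪ t(g) is frequent,
    otherwise ("children", list of (head, tail) sub-groups).
    """
    head = frozenset(head)
    hut = head | frozenset(tail)
    if support(hut, transactions) >= min_count:
        # Every sub-node enumerates a subset of h(g) ∪ t(g): skip them all.
        return "maximal_candidate", hut
    # Drop every tail item i for which h(g) ∪ {i} is infrequent.
    kept = [i for i in tail if support(head | {i}, transactions) >= min_count]
    children = [(head | {item}, kept[pos + 1:]) for pos, item in enumerate(kept)]
    return "children", children

# Hypothetical toy database.
db = [frozenset({1, 2, 3}), frozenset({1, 2, 3}), frozenset({1, 3, 4}), frozenset({2, 4})]
first = expand_group({1}, [2, 3, 4], db, min_count=2)
second = expand_group({1, 2}, [3], db, min_count=2)
```

For the group with head {1}, item 4 is dropped because {1, 4} is infrequent, so no sub-node ever considers it. The group with head {1, 2} stops immediately because {1, 2, 3} is already frequent.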
MAFIA (1/4)
• MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases.
  – D. Burdick, M. Calimlim, and J. Gehrke.
  – ICDE '01
• MAFIA
  – Integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms
MAFIA (2/4)
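MAFIA (3/4)
• HUTMFI
  – Check whether the Head Union Tail (HUT) is in the MFI, i.e. has a superset among the maximal itemsets found so far
    • If so, stop searching and return
• PEP (Parent Equivalence Pruning)
  – newNode = C ∪ i
  – Check newNode.support == C.support
    • If so, move i from C.tail to C.head
• FHUT (Frequent Head Union Tail)
  – newNode = C ∪ i
  – Check whether i is the leftmost child in the tail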
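A minimal sketch of the PEP and HUTMFI checks, assuming supports are compared through transaction-id sets. MAFIA itself uses a different, more efficient internal representation. The function names and toy database below are hypothetical.

```python
def tidset(itemset, transactions):
    """Ids of the transactions that contain every item of itemset."""
    return {tid for tid, t in enumerate(transactions) if itemset <= t}

def parent_equivalence_prune(head, tail, transactions):
    """Move each tail item i with support(head ∪ {i}) == support(head) into the head."""
    head = frozenset(head)
    head_tids = tidset(head, transactions)
    new_tail = []
    for item in tail:
        if tidset(head | {item}, transactions) == head_tids:
            # Same transactions as the parent: item can always be added.
            head = head | {item}
        else:
            new_tail.append(item)
    return head, new_tail

def hut_in_mfi(head, tail, mfi):
    """True when head ∪ tail is covered by an already known maximal itemset."""
    hut = frozenset(head) | frozenset(tail)
    return any(hut <= m for m in mfi)

# Hypothetical toy database.
db = [frozenset("abc"), frozenset("abd"), frozenset("abcd"), frozenset("cd")]
pruned = parent_equivalence_prune({"a"}, ["b", "c", "d"], db)
```

Because every transaction containing a also contains b, item b moves straight into the head and the subtree that would branch on it is never explored.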
MAFIA (4/4)
GenMax (1/2)
• Efficiently Mining Maximal Frequent Itemsets
  – Karam Gouda and Mohammed J. Zaki.
  – ICDM '01
• GenMax
  – A backtrack search based algorithm for mining maximal frequent itemsets.
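A bare-bones backtracking search for maximal frequent itemsets might look like the sketch below. It deliberately leaves out GenMax's optimizations and recounts supports by scanning the database. The names `backtrack_mfi` and `extend` and the toy data are assumptions made for illustration.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def backtrack_mfi(transactions, min_count):
    """Depth-first backtracking search that collects maximal frequent itemsets."""
    items = sorted(set().union(*transactions))
    mfi = []

    def extend(current, candidates):
        # Combine set: candidates that still give a frequent extension.
        combine = [x for x in candidates
                   if support(current | {x}, transactions) >= min_count]
        if not combine:
            # Leaf: keep current unless a known maximal itemset covers it.
            if current and not any(current <= m for m in mfi):
                mfi.append(current)
            return
        for pos, x in enumerate(combine):
            extend(current | {x}, combine[pos + 1:])

    extend(frozenset(), items)
    return mfi

# Hypothetical toy database.
db = [frozenset("abcd"), frozenset("abc"), frozenset("abd"),
      frozenset("cd"), frozenset("cd")]
result = backtrack_mfi(db, min_count=2)
```

The superset check at each leaf is what keeps non-maximal itemsets such as {a, c} out of the result. Because the search visits itemsets in lexicographic order, any maximal superset of a leaf has already been recorded by the time the leaf is reached.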
GenMax (2/2)
• Superset checking techniques
  – Do superset check only for I_{l+1} ∪ P_{l+1}
  – Using check_status flag
  – Local maximal frequent itemsets
• Reordering the combine set
• Diffsets propagation
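Diffset propagation can be illustrated on a small example. The sketch assumes the usual definition: the diffset of PX is the set of transaction ids that contain the prefix P but not PX, so d(PXY) = d(PY) − d(PX) and support(PXY) = support(PX) − |d(PXY)|. The toy database and variable names are hypothetical.

```python
def tidset(itemset, transactions):
    """Ids of the transactions that contain every item of itemset."""
    return {tid for tid, t in enumerate(transactions) if itemset <= t}

# Hypothetical toy database.
db = [frozenset("abcd"), frozenset("abc"), frozenset("abd"),
      frozenset("cd"), frozenset("cd")]

# Prefix P = {a}, extended by X = b and Y = c.
t_p = tidset(frozenset("a"), db)

# Diffsets relative to the prefix: transactions with P but without the extension.
d_px = t_p - tidset(frozenset("ab"), db)
d_py = t_p - tidset(frozenset("ac"), db)
support_px = len(t_p) - len(d_px)

# Propagation step: only the two diffsets are needed, not the tidsets.
d_pxy = d_py - d_px
support_pxy = support_px - len(d_pxy)
```

The propagated support of {a, b, c} matches a direct count, while the sets carried down the search tree are usually much smaller than full tidsets on dense data.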
Conclusion (1/4)

| Type | Database | # of items | Average length | # of records | Maximal pattern length |
|---|---|---|---|---|---|
| Type I | Chess | 76 | 37 | 3196 | 23 (20%) |
| Type I | Pumsb | 7117 | 74 | 49046 | 27 (40%) |
| Type II | Connect | 130 | 43 | 67557 | 31 (2.5%) |
| Type II | Pumsb* | 7117 | 50 | 49046 | 43 (2.5%) |
| Type III | T10I4D100K | 1000 | 10 | 100,000 | 13 (0.01%) |
| Type III | T40I10D100K | 1000 | 40 | 100,000 | 25 (0.1%) |

• Type I:
  – Normal MFI distribution with not too long maximal patterns.
• Type II:
  – Left-skewed distribution with longer patterns
• Type III:
  – Exponential decay distribution with short maximal patterns
Conclusion (2/4)
Conclusion (3/4)
Conclusion (4/4)