林俊宏 2010.06.01
Parallel Association Rule Mining based on FI-Growth Algorithm
Bundit Manaskasemsak, Nunnapus Benjamas, Arnon Rungsawang
Outline
1) Introduction
2) FI-Growth algorithm
3) Parallel FI-Growth
4) Experiments and results
5) Conclusion
Introduction
Association rule mining is one of the most important techniques in data mining.
It consists of two main steps:
• frequent itemset generation, which extracts the most frequent patterns;
• rule generation, which uses these frequent patterns to generate interesting rules.
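The two steps above can be illustrated with a minimal sketch. It uses brute-force counting rather than any algorithm from the paper; the toy transactions, the function names and the thresholds are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy transactions, not taken from the slides.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def frequent_itemsets(transactions, min_count):
    """Step 1: count every itemset by brute force; keep those seen >= min_count times."""
    counts = {}
    for t in transactions:
        for size in range(1, len(t) + 1):
            for itemset in combinations(sorted(t), size):
                counts[itemset] = counts.get(itemset, 0) + 1
    return {s: c for s, c in counts.items() if c >= min_count}

def rules(frequent, min_conf):
    """Step 2: derive rules X -> Y with confidence = count(X and Y together) / count(X)."""
    out = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(itemset, size):
                rhs = tuple(i for i in itemset if i not in lhs)
                conf = count / frequent[lhs]  # every subset of a frequent itemset is frequent
                if conf >= min_conf:
                    out.append((lhs, rhs, conf))
    return out

frequent = frequent_itemsets(transactions, min_count=2)
print(rules(frequent, min_conf=0.7))
```

Brute-force enumeration is exponential in the transaction length, which is why the algorithms discussed in these slides exist.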
Two fundamental algorithms have been proposed for finding frequent itemsets in large databases:
• Apriori algorithm
• Closed algorithm
Algorithms proposed to reduce the cost of finding frequent itemsets:
• FP-growth algorithm
• FI-growth algorithm
Transaction-oriented databases are usually very large.
Mining useful rules from such large and volatile databases is a challenging problem.
Fast association rule mining inevitably requires large computing resources.
Cluster computing technology offers a potential solution:
• parallel Apriori approach
• parallel FP-growth approach
The objectives of this paper:
• utilize parallelization on a computing cluster environment for fast extraction of frequent itemsets from large dense databases;
• propose an alternative approach: parallel association rule mining based on the FI-growth algorithm.
FI-Growth algorithm
Similar to the FP-growth algorithm, FI-growth represents the data set as a prefix-sharing tree, called an “FI-tree”.
It consists of two phases:
• FI-tree construction
• Mining
Constructing an FI-tree requires scanning the database only twice:
• the first scan creates the header table;
• the second scan creates the items-tree.
Item counts: A 3, B 1, C 4, D 2, E 4, F 4
Header table (without B): A 3, C 4, D 2, E 4, F 4
Note that the items in all lists must be in the same relative order.
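The two-scan construction can be sketched as follows. This is a generic prefix-sharing tree in the style of FP-growth, not the paper's exact FI-tree layout, which the slides do not spell out. The transactions are hypothetical, chosen only to reproduce the item counts listed above. The minimum count of 2 and the alphabetical item order are likewise assumptions.

```python
from collections import Counter

class Node:
    """One node of a prefix-sharing tree: an item, its count, and child nodes."""
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}

def build_tree(transactions, min_count):
    # First scan: count items and build the header table.
    counts = Counter(item for t in transactions for item in set(t))
    header = {item: c for item, c in counts.items() if c >= min_count}
    # Fix one relative order for all lists (alphabetical here, as an assumption).
    rank = {item: r for r, item in enumerate(sorted(header))}
    # Second scan: insert each filtered, ordered transaction into the tree.
    root = Node(None)
    for t in transactions:
        kept = sorted((i for i in set(t) if i in rank), key=rank.__getitem__)
        node = root
        for item in kept:
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return header, root

# Hypothetical transactions giving A 3, B 1, C 4, D 2, E 4, F 4.
transactions = [
    ["A", "C", "E", "F"],
    ["A", "B", "C", "F"],
    ["A", "C", "D", "E"],
    ["C", "D", "E", "F"],
    ["E", "F"],
]
header, root = build_tree(transactions, min_count=2)
print(sorted(header.items()))
```

Because every transaction is inserted in the same item order, transactions that share a prefix share a path, which is what keeps the tree compact.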
Combining operation: identical sub-paths are grouped and their counts summed.
The combining operation has the following properties:
1) Self-reflective property: tree(a) © tree(a) is equal to tree(a) itself.
2) Commutative property: tree(a1) © tree(a2) is equal to tree(a2) © tree(a1).
3) Associative property: (tree(a1) © tree(a2)) © tree(a3) is equal to tree(a1) © (tree(a2) © tree(a3)).
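The grouping-and-summing idea can be sketched as a recursive merge of two small trees. This is an illustration, not the paper's definition of ©. The nested-dictionary representation and the toy trees are assumptions. The sketch checks only the commutative and associative properties and does not attempt to model the self-reflective property, for which the slides give too little detail.

```python
def combine(t1, t2):
    """Merge two trees: identical sub-paths are grouped and their counts summed.

    A tree is a dict mapping item -> (count, subtree); this layout is an
    assumption made for the sketch.
    """
    merged = {}
    for item in sorted(set(t1) | set(t2)):
        c1, kids1 = t1.get(item, (0, {}))
        c2, kids2 = t2.get(item, (0, {}))
        merged[item] = (c1 + c2, combine(kids1, kids2))
    return merged

# Hypothetical toy sub-trees.
t1 = {"d": (2, {"f": (1, {})})}
t2 = {"d": (1, {"e": (1, {})}), "e": (1, {"f": (1, {})})}
t3 = {"e": (2, {"f": (2, {})})}

print(combine(t1, t2))
print(combine(t1, t2) == combine(t2, t1))                            # commutative
print(combine(combine(t1, t2), t3) == combine(t1, combine(t2, t3)))  # associative
```

Commutativity and associativity matter for the parallel version: sub-trees can be merged in any order and grouping without changing the result.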
[Figure: example of the combining operation on sub-trees with nodes e:1, d:2, f:1, f:1]
The result (grey nodes) replaces the old one that is linked from root.
[Figure: FI-tree example with root, a:3, and c, d, e and f nodes with their counts]
Steps:
• Branching step
• Subset finding step
• Pruning step
Parallel FI-Growth
A parallel version of the FI-growth algorithm:
• employs a data parallelism technique on a PC cluster;
• partitions the transactions;
• uses a one-time synchronization in which processors exchange their sub-trees.
Hierarchical minimum support
Two solutions to avoid such a problem:
1) All processors synchronize their lists of item counts.
2) Utilize two values of minimum support:
• min_supL1 is defined and used to prune the local header table.
• min_supL2 is defined to prune the local items-tree.
In this paper, we use the second approach.
Parallelization
min_supL1 = 1 (20%), min_supL2 = 2 (40%)
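The relation between the percentages and the absolute counts can be checked with a short sketch. The local partition size of 5 transactions is an assumption inferred from "1 (20%)" and "2 (40%)". The helper name and the local item counts are hypothetical. How min_supL2 is applied to the items-tree is not modelled here; the second filter is shown only to compare the two thresholds.

```python
def min_count(percent, n_transactions):
    """Smallest integer count that is at least percent% of n_transactions."""
    return (percent * n_transactions + 99) // 100  # integer ceiling, no floats

n_local = 5  # assumption: local partition size implied by 1 = 20% and 2 = 40%
min_supL1 = min_count(20, n_local)
min_supL2 = min_count(40, n_local)

# Hypothetical local item counts on one processor.
local_counts = {"a": 3, "b": 1, "c": 2}

# Looser threshold: prune the local header table.
header = {item: c for item, c in local_counts.items() if c >= min_supL1}
# Stricter threshold, applied to the same counts for comparison only.
strict = {item: c for item, c in local_counts.items() if c >= min_supL2}

print(min_supL1, min_supL2)
print(header, strict)
```

With the looser first threshold, an item such as "b" survives in the local header table even though it would fail the stricter second threshold.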
FI-Tree synchronization
Exchanging of local header table:
• To reduce the communication overhead, only the list of items is broadcast to other processors.
Sending of local sub-tree:
• Each processor determines which local sub-tree(s) should be kept and which should be sent to the target processors.
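The keep-or-send decision can be sketched as a single-process simulation with no MPI calls. The slides do not say how target processors are chosen, so the round-robin ownership rule below is purely an assumption, as are the function name and the example item lists.

```python
def plan_exchange(broadcast_items, rank):
    """Decide which local sub-trees a processor keeps and which it sends.

    broadcast_items[p] is the item list that processor p broadcast.
    Ownership is assigned round-robin over the sorted global item list;
    this rule is hypothetical and not taken from the paper.
    """
    nprocs = len(broadcast_items)
    global_items = sorted(set().union(*broadcast_items))
    owner = {item: idx % nprocs for idx, item in enumerate(global_items)}
    keep, send = [], {}
    for item in broadcast_items[rank]:
        if owner[item] == rank:
            keep.append(item)  # the sub-tree rooted at this item stays local
        else:
            send.setdefault(owner[item], []).append(item)  # goes to its owner
    return keep, send

# Hypothetical item lists broadcast by two processors.
broadcast_items = [["a", "c", "d"], ["a", "b", "e"]]
for rank in range(len(broadcast_items)):
    print(rank, plan_exchange(broadcast_items, rank))
```

Every processor sees the same broadcast lists, so each can compute the same ownership map locally, and only the sub-trees themselves need to be transferred afterwards.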
Experiments and results
Hardware and environment configuration:
• Tested on a cluster of x86-64 based SMP machines named “Bedrocks”.
• Each machine consists of dual 3.2GHz Intel quad-core processors, 4GB of main memory, and an 80GB SATA disk.
• Each machine is equipped with a Linux-based operating system.
• The machines are interconnected via a 1000Base-TX Ethernet switch.
• The parallel algorithm is written in the C language and uses the MPICH message passing library version 1.2.7.
• All experiments were run under no-load conditions.
Data set:
For the test data set, we utilized the standard “IBM synthetic data generator” to synthesize a transaction database:
• 1000 unique items
• 16 million records (with an average transaction length of 10)
Conclusion
This paper proposes a parallel FI-growth algorithm to accelerate association rule mining.
Future work includes research in many areas, including:
• effects of partitioning
• run-time memory requirements
• reducing the communication overhead
• load balancing